Moodify
W205-1: Rock Baek, Saru Mehta, Vincent Chio, Walter Erquingo Pezo


1. Introduction

Moodify is a music web application that recommends songs to users based on mood. There are two ways a user can interact with the application. First, the user can select a mood that is supported by the system, and the application displays a list of the songs that are classified with the highest probability for that mood. Second, the user can explore trending moods through an interactive Google map that displays the currently most popular mood in each state or country around the world.

The rest of the paper discusses the technical details of the proposed system. Section 2 discusses the system architecture. Section 3 discusses the data retrieval strategies. Section 4 discusses implementation details and improvements. Section 5 concludes the paper.

2. System Architecture

The system involves two major components: 1) a backend system that fetches music metadata and generates a mood categorization for each song; 2) a user-facing frontend web application that accepts user mood queries and responds with a list of songs that match the user input. The backend system builds all the necessary, properly indexed data that is then consumed by the frontend web application.

There are two types of requirements from the frontend: 1) search songs by mood; 2) browse moods by location. To support the first requirement, the backend system needs to build an index on the probability of a song falling into a certain mood category. The probability is then multiplied by the song hotness to calculate a mood frequency - song hotness (mf-sh) score. The details of calculating the mf-sh score are elaborated in section 2.5. To support the second requirement, the backend needs to associate a location with a mood for a song, then aggregate all the moods by location and build an index on the location-mood count to be consumed by the frontend.
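As a rough illustration of the first index, the mf-sh score can be sketched in a few lines of Python. The function names, the dictionary layout of a mood vector and the sample values below are assumptions made for this sketch, not the project's actual code; only the mood names and the formula (mood probability multiplied by hotness) come from the paper.

```python
MOODS = ("anger", "disgust", "fear", "joy", "love", "sad", "surprise")


def mood_probabilities(mood_vectors):
    """Share of a song's 0/1 mood vectors that are tagged with each mood."""
    total = len(mood_vectors)
    if total == 0:
        return {mood: 0.0 for mood in MOODS}
    return {
        mood: sum(vector.get(mood, 0) for vector in mood_vectors) / total
        for mood in MOODS
    }


def mf_sh_scores(mood_vectors, hotness):
    """Multiply each mood probability by the song hotness (a value in 0..1)."""
    probabilities = mood_probabilities(mood_vectors)
    return {mood: p * hotness for mood, p in probabilities.items()}


def top_songs_for_mood(songs, mood, limit=10):
    """Titles of the songs with the highest mf-sh score for one mood."""
    ranked = sorted(songs, key=lambda song: song["mf_sh"][mood], reverse=True)
    return [song["title"] for song in ranked[:limit]]


# Hypothetical sample: four tagged texts for one song with hotness 0.8.
sample_vectors = [{"joy": 1}, {"joy": 1, "love": 1}, {"sad": 1}, {}]
sample_scores = mf_sh_scores(sample_vectors, hotness=0.8)
```

Sorting by the product rather than by the raw probability keeps a rarely discussed song with a single enthusiastic comment from outranking a widely discussed one.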
In order to support the requirements from the frontend, the backend needs to: 1) compile a list of trending songs (trending means the songs that are most listened to or mentioned, and has no correlation with mood); 2) associate moods with each song; 3) associate a location with each of the moods; 4) build the indexes that are consumed by the frontend. Sections 2.1 to 2.4 discuss the technical implementation of the backend system. Section 2.5 discusses the frontend system. Section 2.6 shows the system diagram.

2.1 Data Fetching Component

To tackle requirement 1, the system refers to Echonest for the list of trending songs. Echonest hosts one of the most versatile music databases in the world, with over 30 million songs and 3 million artist records. The data set includes not only basic metadata such as song title, artist, album and genre, but also some intelligent attributes such as energy, danceability, tempo and hotness. We used the hotness attribute as the sorting criterion to gather the list of trending songs.

To tackle requirement 2, the system requires supplementary human behavioral information in order to accurately predict the moods associated with a song. Using human behavioral data to tag a song with moods has an advantage over static data such as lyrics, because a listener's mood toward a song may change over time whereas the lyrics stay the same. The dynamic nature of the mood analysis provides real-time music recommendations that more accurately reflect the current trend. This data is obtained from two social media sites: 1) Twitter and 2) YouTube. By using song title and artist as filter criteria, we can fetch the most relevant tweets and YouTube comments for each song and associate moods with each text.

Requirement 3 depends on the location data for each text gathered for requirement 2 above. Twitter supports location-based tweets; YouTube comments currently do not. Thus, the system only uses geo-enabled tweets to associate a location with a mood, and only the moods with an associated location are aggregated and displayed in the interactive mood map. Note that the process of aggregating location moods has no effect on the process of aggregating moods for a song, and thus has no impact on frontend requirement 1 (search songs by mood). These two ETL processes are discussed later in section 2.3.

2.2 Mood Analysis

The system supports the following mood categories: anger, disgust, fear, joy, love, sad, surprise. Our main guideline for building our corpora is the paper "EmpaTweet: Annotating and Detecting Emotions on Twitter", which describes how to tag tweets with similar categories. The main approach for tweets is to manually categorize about 1500 tweets (200 per category) and use them as the corpus for a series of Multinomial Naive Bayes classifiers, one for each category.
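The one-classifier-per-mood setup can be illustrated with a small, self-contained sketch. The paper's classifiers were built with scikit-learn; the from-scratch Multinomial Naive Bayes below (Laplace smoothing, standard library only) is a stand-in written for illustration, and the class name, helper names and toy training data are all assumptions.

```python
import math
from collections import Counter

MOODS = ("anger", "disgust", "fear", "joy", "love", "sad", "surprise")


class BinaryMultinomialNB:
    """Tiny Multinomial Naive Bayes deciding 'has this mood' vs 'does not'."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: matching list of 0/1 flags.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        # Smoothed class prior, so a mood with no examples never hits log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # words unseen in training are ignored
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, tokens):
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def train_mood_classifiers(labeled_docs):
    """labeled_docs: list of (tokens, set of moods). One classifier per mood."""
    docs = [tokens for tokens, _ in labeled_docs]
    return {
        mood: BinaryMultinomialNB().fit(
            docs, [int(mood in moods) for _, moods in labeled_docs]
        )
        for mood in MOODS
    }


def mood_vector(classifiers, tokens):
    """0/1 flag per mood for one tweet or comment."""
    return {mood: clf.predict(tokens) for mood, clf in classifiers.items()}


# Hypothetical, hand-labeled toy corpus (already tokenized and lowercased).
toy_corpus = [
    (["love", "this", "song"], {"love"}),
    (["so", "happy", "today"], {"joy"}),
    (["love", "you", "forever"], {"love"}),
    (["this", "makes", "me", "cry"], {"sad"}),
]
toy_classifiers = train_mood_classifiers(toy_corpus)
toy_vector = mood_vector(toy_classifiers, ["love", "song"])
```

Training one binary classifier per mood, rather than a single multi-class model, lets a text carry several moods at once or none at all, which matches the 0/1 mood vectors used later in the pipeline.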
In this categorization, non-alphabetic characters, stop words and hashtags are removed. Non-mood-related hashtags such as event and topic hashtags are removed because they appear frequently as part of a trend rather than because of their sentiment value. In addition, a Porter stemmer is used. The same process is repeated for YouTube comments, whose language differs from that of tweets: a tweet is limited to 140 characters and many tweets are simply hashtags, whereas YouTube comments do not have these restrictions. Thus, 14 classifiers are needed in total: 7 categories for YouTube and 7 categories for Twitter. The system uses the NLTK library to clean up the comments and tweets and scikit-learn for the classifiers.

These classifiers are then used to tag each tweet and YouTube comment with moods. The result is a mood vector for each tweet and comment. The vector lists the mood categories, each with a 0/1 value indicating whether the text falls into that mood category. These vectors are consumed by the ETL process to aggregate all the moods associated with a song.

2.3 ETL process

The system involves two aggregation processes that generate the indexes for frontend consumption. The first ETL process aggregates all the mood vectors for each song. This can be accomplished by a MapReduce job. The mapper reads the mood vectors for each song and emits the song id as key and the mood vector as value. The reducer sums the values for each mood of a song and divides each sum by the total number of references to get the probability. The result of the MapReduce job is a probability mood vector for each song. Each entry in the vector indicates the probability of that mood occurring among all the texts collected for the song.

The second ETL process aggregates all the moods for each state/country across the entire database. This can also be accomplished by a MapReduce job. The mapper reads only the mood vectors that are associated with a location and emits (location, mood) as key. The reducer simply counts the occurrences of each key. The result of the MapReduce job is used to build a location-mood model. The root level of the location-mood model is keyed by location. The second level is keyed by mood, sorted by the total count of each mood for that location.

2.4 Data Storage Component

MongoDB is the primary data storage component for the whole system. MongoDB has several advantages over a traditional SQL database. First, the data fetching component consists of multiple data sources with different data schemas. Using MongoDB avoids the overhead of schema definition and of potential schema migrations should we decide to add attributes or data sources. This allows us to implement the data fetching component efficiently.
Second, the schema of the system is relatively simple, considering that the frontend web application only requires the two indexes plus song title and artist (finding the video of a song can be done on demand using title and artist once the user selects the song). There is no need for data normalization. Third, the flexibility of the document-oriented storage system allows us to augment the data structure with added functionality such as mood analysis without modifying a database schema to accommodate the new model.

Data from the data fetching component is stored directly into MongoDB. The data can then be exported into a CSV file to be consumed by the ETL. Similarly, the mood vectors generated by the mood analysis component for each tweet and comment are stored directly into MongoDB. The results of the MapReduce jobs from the ETL processes, however, are first stored in the file system; a separate process is then triggered to transform the aggregated result for each key into the corresponding index in MongoDB.

2.5 Data Presentation

A web frontend application is built to present users with two major functions: 1) search songs by mood category; 2) explore moods by region in an interactive map. The first type of request can be answered by the ETL process that builds the mood vectors for each song. The probability of a song falling into a specific mood category is calculated by dividing the number of the song's mood vectors that reference that mood by the total number of mood vectors for the song. This probability is then multiplied by the hotness score fetched from the Echonest data source. Multiplying by the hotness score counterbalances the case where a less popular song has few mood vectors, which inflates its probability for a specific mood category. It also reflects the idea that more popular songs should have a higher chance of being shown. The mood frequency - song hotness (mf-sh) score is used as the sorting criterion to display the list of songs for a specific category.

The second type of request can be answered by the ETL process that builds the location-mood model. Each region in the map is displayed with its most referenced mood, which is found by counting all the references to each mood in that region across all songs.

2.6 System Diagram

[Figure: system diagram]

3. Data Retrieval Strategy

There are three major data sources that are consumed by the system. Each sub-section discusses the challenges we faced for a data source and the corresponding data retrieval strategy. For each data source, we stored only the necessary attributes in the database and dropped everything else. Thus, we were able to keep the final database size down to 1.37GB even though we crawled a large amount of data.

3.1 Echonest

Table 1 in appendix 3 provides a summary of the collected Echonest data. A script was written to fetch trending songs from Echonest sorted by the hotness attribute. Hotness is a fractional number between 0 and 1. Due to API constraints, the search parameter for hotness only accepts up to 2 decimal places and the API only returns up to 1000 records for each search. To fetch as many songs as possible, the following strategy is used: 1) specify both lower limit and upper limit for hotness, e.g ; 2) search up to 1000 results for each hotness range (0.01). Using this strategy, we are able to fetch trending songs from Echonest. The process of fetching the data was finished within half an hour. However, the trending songs returned using this strategy contain duplicates. These duplicated songs have different song ids but the same artist name and song title. Another script was written to eliminate the duplicated songs. As a result, we extracted unique songs.

3.2 Twitter

Table 2 in appendix 3 provides a summary of the collected Twitter data. A script was written to fetch up to 500 tweets for each song using the Twitter API. Song title and artist name are used as search filter criteria to fetch related tweets. Using this strategy, we are able to fetch tweets for unique songs. The process of fetching the data was finished within 1 day under the search API rate limit of tweets / 15 minutes. However, shortly after we used a sample of the tweet corpus as the training dataset for the classifier, we discovered a severe quality issue in the retrieved tweets. Most of the tweets have the format "Listening to" or "Now playing", and many of them are promotional tweets for marketing purposes. Thus, any mood analysis based on this set of low-quality tweets would be irrelevant.
Most of the tweets ended up with 0 probability for every mood using this classifier. A second iteration of tweet acquisition was run to fetch more relevant tweets using different criteria. We transformed the song title into a hashtag and used it as the filter criterion. We also filtered out non-English tweets and tweets that contain the words "watch", "now playing" and "video". Since we recognized this issue at a very late stage of the project, we only fetched up to 100 tweets for each song in order to speed up the process. With this approach the tweet corpus contains much higher quality text. As a result, the distribution of the probability mood vectors is significantly improved. We are able to fetch tweets for unique songs.

3.3 Tweet Location

In order to fetch the state/country data for geo-enabled tweets, a program was written to perform reverse geocoding, transforming location coordinates into state and country for higher-level aggregation. For countries that have administrative regions such as states, the state is used as the aggregation point; otherwise, the country is used. By using Nominatim service, we are able to fetch locations for unique tweets. Then, we used the Google Map service to transform the state/country back into coordinates, which are used to display the mood icon in Google Map in the frontend application. This process was finished in a day.

3.4 YouTube

Table 3 in appendix 3 provides a summary of the collected YouTube data. In addition to using tweets to analyze the moods for each top song retrieved from our Echonest corpus, we use YouTube comments as an equal measure to gauge the moods of a song. Similar to the first tweet fetching strategy, song title and artist name were used as search filter criteria to fetch related videos. For each unique song, 5 YouTube videos were used as comment references and 20 comments were fetched for each video. Thus a maximum of 100 comments were fetched for each unique song. Retrieving song comments from a variety of videos ensures that we have a large random sample of user comments on which to perform the sentiment analysis. Using this strategy, we are able to fetch comments for unique songs. The rate limit for retrieving data from the YouTube API is 50,000 requests/day, so we can search 500 songs a day. The songs were split into three sections and assigned to three teammates to perform the data retrieval concurrently. The process of fetching the data was finished within 10 days.

4. Implementation

4.1 Data Storage

Since the data acquisition process was split between teammates, there are two major ways of storing the initial dataset. One way is to store the intermediate data in a local MongoDB. We used this strategy for the Echonest and Twitter data. Since only one teammate was responsible for this part of the data retrieval, storing the data in a local MongoDB removes the unnecessary complexity of migrating data to the system MongoDB server. The other way is to store the intermediate data in AWS S3.
We used this strategy for the YouTube data because this process was split between three teammates. The YouTube data was fetched from S3 and stored in a local MongoDB afterward.

After the data acquisition process was completed, we created an EC2 instance in AWS and ran a MongoDB server on the instance to serve as the main data storage for the system. We reused the database backup and restore programs written in assignment 3 to migrate all the data from the local to the remote MongoDB. We preferred this strategy over writing to the remote MongoDB during the data acquisition process because of performance concerns. The remote MongoDB runs on a t2.micro EC2 instance that takes advantage of the AWS free tier. The processing power of this type of EC2 instance is very limited. Performing database insertions for millions of records would take hours to days, whereas the backup and restore strategy takes only a few minutes. The remote MongoDB server serves as the backbone for all the subsequent system processes.

4.2 Mood Analysis

NLTK is used for the Porter stemmer, the English stop words, and tokenization that removes non-alphabetic characters. We also use sklearn for the Multinomial Naive Bayes algorithm and the CountVectorizer of documents. sklearn has an advantage over NLTK in the performance of Naive Bayes classification. We manually tagged a random sample of the tweet and YouTube comment corpora and used it to train the 14 classifiers. A program was also written to automatically tag all the tweets and comments, generate the mood vectors, and store the vectors in MongoDB.

4.3 ETL

After the mood vectors for the tweets and YouTube comments were generated, we ran the two MapReduce jobs, implemented using MRJob, for the two ETL processes described earlier. We also wrote a program to calculate the mf-sh score for each song from the aggregated mood vector in the MRJob output and store the mf-sh score back into the system MongoDB. The location-mood aggregation result from the MRJob output was stored in a new MongoDB collection which is consumed directly by the frontend application.

4.4 Data Modeling

The following table shows the data models that we used to store the results of each of the components described earlier in MongoDB.

echonest_songs: id, title, artists_name, song_hotttnesss, youtube_mf_sh, tweet_mf_sh
tweets_v2: id, song_id, text, coordinates, user, love, joy, sad, disgust, anger, surprise, fear, Geolocation
youtube_comments: id, song_id, text, love, joy, sad, disgust, anger, surprise, fear
location_moods: location, love, joy, sad, disgust, anger, surprise, fear, longitude, latitude

echonest_songs is the collection model that holds information for each individual song. id, title, artist_name and song_hotttnesss are attributes crawled from the Echonest data source. They are stored in MongoDB without any modification.
youtube_mf_sh and tweet_mf_sh are objects that hold the mf_sh score for each mood. Each mf_sh object holds 7 attributes keyed by mood name, and the value of each mood key is the calculated mf_sh score. For example, youtube_mf_sh may look like:

    { "love": 0.11, "joy": 0.019, "sad": 0.037, "disgust": 0.028, "anger": 0.037, "surprise": 0.084, "fear": 0.009 }

tweets_v2 is the collection model that holds information for each tweet. id, text, coordinates and user are attributes crawled from the Twitter data source. song_id is the corresponding Echonest song id for a tweet. The song_id is used for association with the echonest_songs collection and is also used as the aggregation key for the ETL processes. love, joy, sad, disgust, anger, surprise and fear are all 0/1 valued attributes used to store the mood analysis result. These 7 attributes make up the mood vector discussed throughout the paper. Geolocation stores the result of reverse geocoding the coordinates. For example, the attribute may look like:

    { "city": "West Hollywood", "house_number": "463", "country": "United States of America", "county": "Los Angeles County", "state": "California", "postcode": "90036", "country_code": "us" }

This attribute is used to construct the location key that is used in the location-mood ETL process, which generates the location key in the location_moods collection.

youtube_comments is the collection model that holds information for each YouTube comment. text is the comment crawled from the YouTube data source. id is generated by the system because we did not record the original id during the data acquisition process. As it turned out, we never needed the id to fetch more data from YouTube, so the attribute is never reused. As in the tweets_v2 collection, song_id and the mood attributes have exactly the same meaning and function.

location_moods is the collection model that holds the aggregated mood vector for each location. location is the text representation of the state/country discussed earlier for the second ETL process. This attribute is also used as the key for each document. An example of the attribute is "british columbia,canada".
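The path from a tweet's Geolocation attribute to a location_moods document can be sketched as follows. This is an illustrative, single-process stand-in for the MRJob-based location-mood ETL; the function names are assumptions, and the lowercased "state,country" key format is inferred from the single example above.

```python
from collections import Counter, defaultdict

MOODS = ("anger", "disgust", "fear", "joy", "love", "sad", "surprise")


def location_key(geolocation):
    """Build a lowercased 'state,country' key, or 'country' when no state exists."""
    country = geolocation.get("country")
    if not country:
        return None
    state = geolocation.get("state")
    parts = [state, country] if state else [country]
    return ",".join(part.lower() for part in parts)


def aggregate_location_moods(tweets):
    """Sum the 0/1 mood flags of geo-tagged tweets per location key."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        key = location_key(tweet.get("Geolocation") or {})
        if key is None:
            continue  # tweets without a resolved location are skipped
        for mood in MOODS:
            counts[key][mood] += tweet.get(mood, 0)
    return counts


def top_mood(mood_counts):
    """Most referenced mood for one location, as shown on the mood map."""
    return mood_counts.most_common(1)[0][0]


# Hypothetical sample tweets; only the second-to-last field layout is from the paper.
california = {"state": "California", "country": "United States of America"}
sample_tweets = [
    {"Geolocation": california, "joy": 1},
    {"Geolocation": california, "joy": 1, "love": 1},
    {"Geolocation": {"country": "Singapore"}, "sad": 1},
    {"joy": 1},  # not geo-enabled, ignored by the aggregation
]
sample_counts = aggregate_location_moods(sample_tweets)
```

Because the values are sums rather than probabilities, the mood attributes of a location document can exceed 1.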
longitude and latitude are the coordinates of the location attribute. These two attributes are used to create mood tags in the Google Map in the web application. The remaining mood attributes are the aggregation results from the ETL process, so they may have values higher than 1. A summary statistic for each of the collections is also reported in Appendix 3.

4.5 Web Application

The web application serves as a pure presentation layer for the mood recommendation system. It exposes two main RESTful endpoints, one serving the mood category requests and the other serving the location mood requests. We used the Ruby on Rails framework to develop the application. Since two of the teammates already had experience with the technology and Ruby on Rails is well suited to developing web applications quickly, we chose it for productivity reasons. The web application can be accessed through

4.6 Scalability

The architecture of the system was planned to scale to millions of songs. Due to the simplicity of the data model, MongoDB serves the functionality of the system well while keeping the system design simple. Since MongoDB handles database sharding out of the box, the database should support the two main types of queries from the web application without performance degradation.

However, each component of the system currently requires a manual trigger. Ideally, a scheduler should be implemented to pipeline the whole process. Based on a refresh period, the scheduler would automatically fetch data from Echonest and then from YouTube and Twitter. We could also increase the API rate limits for all the data source providers when the existing limits severely impact the performance of the data acquisition process, but this should not be considered a scalability issue.

Since the mood classifiers can be reused, constructing them does not impose any scalability drawback. But as more tweets and YouTube comments are logged into the system, the classifying process can become the bottleneck. Instead of tagging the text sequentially, we can split the whole corpus by database page, id space or shard id, and the system can then run the classifiers concurrently.

The performance of the two ETL processes can also be improved significantly when the corpus for each data source grows or when more data sources are added to the system. Since the MapReduce jobs are implemented using MRJob, they can easily be configured to run on Amazon EMR clusters and take advantage of the additional computing power.

4.7 Improvement

One major challenge during the data acquisition process is aggregating relevant human behavioral data. Fetching relevant YouTube comments is fairly straightforward.
When simply searching videos in YouTube using the song title and artist name, the returned list of videos, especially those ranked at the top, is likely to include the official video of the song. The comments on YouTube videos also closely represent the emotions or feelings of the commenters. However, as discussed in section 3.2, using a similar strategy for Twitter ended up with a large list of lower quality tweets that result in 0 for all the mood classifiers. Although the second iteration of the acquisition significantly improved the quality, the distribution of the moods in the tweet corpus is still heavily skewed toward the love and joy moods. The following two diagrams demonstrate the issue:

Figure 1: YouTube comments mood distribution

Figure 2: Tweet mood distribution

Because we can only fetch location information from tweets, most of the moods in the mood map are either love or joy. Several improvements can be made to the system: 1) add more data sources, e.g. 8tracks and SoundCloud, where user comments are directly linked to songs as on YouTube; 2) diversify the sampling of the training dataset for the classifiers to include multiple languages; 3) include more features in the classifiers, such as punctuation, and training text from other corpora; 4) weight and combine the mf_sh scores from the different data sources into a single mf_sh score to be used as the only sorting criterion. This would improve the user experience of the web application.

5. Conclusion

Moodify exposes a new way of exploring music using real-time user behavioral information. As opposed to traditional mood classification based on static song attributes, the real-time behavioral information adds a dynamic layer to the mood classification algorithm that produces a more accurate prediction based on the current trend. In addition, the mood map allows users to explore the current mood in the world through the lens of music appetite. Once the improvements discussed in section 4.7 are achieved, we anticipate that users would use Moodify to dynamically construct a music playlist based on the current mood or a location of interest. The experience would be similar to clicking one of the many playlists in existing streaming music services.

Appendix 1: Project Repository

Appendix 2: AWS S3

Appendix 3: Data Summary

Table 1: Echonest data summary
Columns: Steps | # of songs | Size | MongoDB collection
Echonest data before cleaning 31, MB echonest_song
Echonest data after cleaning duplicates 14, MB echonest_songs

Table 2: Twitter data summary
Columns: Steps | # of tweets | # of unique referenced songs | Size | MongoDB collection
First iteration 1,785, MB tweets
Second iteration 575, MB tweets_v2

Table 3: YouTube data summary
Columns: # of comments | # of unique referenced songs | Size | MongoDB collection
739, MB youtube_comments

Table 4: Location mood data summary
Columns: # of unique location (state,country) | # of tweets with geolocation | Size | MongoDB collection
MB location_moods

Appendix 4: Tools and Libraries

pyechonest: fetch song metadata from the Echonest API service
tweepy: search tweets that are related to songs fetched from Echonest
pymongo: manage the MongoDB
apiclient, oauth2client: fetch YouTube comments from the YouTube API service
nltk: used for the Porter stemmer, the English stop words, and tokenization that removes non-alphabetic characters
sklearn: used for the Multinomial Naive Bayes algorithm and the CountVectorizer of documents; its metrics are also used to plot the performance of the algorithm
geopy: fetch geolocation information using the Nominatim and Google Map services
boto: store and retrieve data in AWS S3
matplotlib: plot data
pandas: read and parse CSV files
mrjob: used for the MapReduce job implementation
Ruby on Rails: web framework used to build the frontend application


Data Analytics with HPC. Data Streaming

Data Analytics with HPC. Data Streaming Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

CSCI6900 Assignment 1: Naïve Bayes on Hadoop

CSCI6900 Assignment 1: Naïve Bayes on Hadoop DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF GEORGIA CSCI6900 Assignment 1: Naïve Bayes on Hadoop DUE: Friday, January 29 by 11:59:59pm Out January 8, 2015 1 INTRODUCTION TO NAÏVE BAYES Much of machine

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Reminder: Social networks Catch-all term for social networking sites Facebook microblogging sites Twitter blog sites (for some purposes) 1 2 Ways we can use

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information
