Cloud-based Twitter sentiment analysis for Ranking of hotels in the Cities of Australia


Distributed Computing Project Final Report
Elisa Mena (633144)
Supervisor: Richard Sinnott

A Dissertation Presented By ELISA MENA
Submitted to the University of Melbourne in partial fulfilment of the requirements for the degree of MASTER OF DISTRIBUTED COMPUTING
November 2014
Department of Computing and Information Systems

DECLARATION OF AUTHENTICITY

I hereby state that this dissertation is my original work, gathered and utilized especially to fulfil the purposes and objectives of this study, and that it has not been previously submitted to any other university for a higher degree. I also declare that the publications cited in this work have been personally consulted.

ACKNOWLEDGMENTS

I express my gratitude to my supervisor Richard Sinnott for always being willing to help; his patience and support during the development of this project were essential to completing it. I also want to thank Farzad Khodadadi for his valuable help and the SENESCYT for its financial support during my stay in Australia. Finally, I thank my family and friends for their continuous encouragement to achieve my objectives and never give up.

ABSTRACT

The tourism industry has been promoting its products and services based on reviews that people often write on travel web sites such as TripAdvisor. These reviews have a big impact on the decision-making process when people evaluate alternative places to visit or hotels to book. Nonetheless, there have been indications that these reviews may not always be truthful. To examine this, we built a system for sentiment analysis that gathered data from the Twitter streaming API for five cities of Australia: Sydney, Melbourne, Brisbane, Perth and Adelaide. We gathered over 53 million tweets. We then processed that data by removing non-English characters, emoticons and hashtags, and classified tweets as positive, negative or neutral using a Naïve Bayes classifier. This classifier was trained using two data sets: a natural language toolkit corpus (1000 movie reviews) and 400 manually classified hotel-related tweets gathered from the Twitter API. The accuracy for both data sets was similar: 68.35% and 68.18% respectively. To analyse the results we used the dynamic dashboard Kibana in conjunction with Elasticsearch and Couchbase. We found more objective than subjective data. We analysed the data for the city of Melbourne and found that the hotel with the most tweets was the Crown and the one with the fewest was The Blackman. Furthermore, the data relevant to hotels was scarce: in the best case, we found around 300 tweets relevant to a hotel out of 53 million.

TABLE OF CONTENTS

1. INTRODUCTION
2. LITERATURE REVIEW
3. ARCHITECTURE AND DESIGN
3.1. DATA COLLECTION
3.2. TEXT PREPARATION
3.3. SENTIMENT DETECTION AND CLASSIFICATION
3.4. WEB INTERFACE
4. EXPERIMENTS AND RESULTS
4.1. SENTIMENT ANALYSIS
4.2. HOTEL REVIEWS IN TWITTER
5. CONCLUSIONS
REFERENCES

1. Introduction

The tourism industry has long captured reviews of restaurants, hotels and holiday places; these reviews are often gathered from sources such as TripAdvisor. TripAdvisor claims to be one of the largest travel sites in the world, with more than 170 million reviews (TripAdvisorUK, 2014). Nevertheless, it has been criticized for allowing unsubstantiated reviews from people who never stayed in the reviewed place (Technology, 2012). Since TripAdvisor has a great impact on the reputation of travel destinations and its reviews may not always be truthful, it is important to evaluate their accuracy.

One great source of public data is Twitter, a microblogging site that has been widely used since its launch. It receives around 340 million tweets per day (Twitter, 2012). It allows users to write messages (tweets) of up to 140 characters that correspond to thoughts, ideas and feelings written in real time. Because Twitter's relationship model allows people to keep up to date with the latest news, even when that news comes from a user who is not part of their social circle, it has been considered a great resource for data mining (Russell, 2013) and for uses such as sentiment analysis. Hence, the data collected from Twitter can potentially help to identify public opinion about hotels that have already been reviewed on TripAdvisor.

To analyse the data gathered from Twitter, it is necessary to perform several steps: collecting and preparing the data, classifying it and presenting it visually. To collect data, the Twitter streaming API was used to gather tweets from five cities of Australia: Sydney, Melbourne, Brisbane, Perth and Adelaide. This data was stored in a Couchbase database instance. To obtain hotel-related data, Elasticsearch was used to perform full-text search over a large corpus of tweets (over 53 million) that had been harvested on the Cloud.
This data was stored, and sentiment analysis was performed using a supervised learning approach with a Naïve Bayes classifier. For this method we used two different training data sets: the natural language toolkit (NLTK) movie reviews corpus, which has 1000 reviews, and a manually labelled data set that contains 400 hotel-related tweets. We calculated the accuracy obtained with the NLTK corpus and with our manually classified data set, and in both cases we got approximately 68%. For the presentation layer of the system we used the Kibana web interface, which gave us a flexible tool to analyse the data processed for the five cities of Australia.

In this document we present a Literature Review section, where we evaluate the approaches of different authors for handling Twitter data. In the third section, Architecture and Design, we show the methodology and layout used to build the infrastructure required by the system. In the fourth section we present the Experiments and Results by evaluating the two training data sets and their accuracy. We also include the evaluation of the ranking of the hotels in the city of Melbourne. Finally, in the fifth section we present the conclusions of this work as a whole.

2. Literature Review

Sentiment analysis is one of the most researched areas in computer science; around 7000 articles have been published on it according to (Feldman, 2013). There have been a number of projects on social media data analysis in areas such as politics, movie ranking and marketing. According to (Kaur & Gupta, 2013), the most popular approaches to sentiment analysis are: the subjective lexicon, where a list of words labelled as positive, negative or neutral is given; the N-gram model, where groups of n words are given as a training data set; and machine learning, where classification is performed using a set of features extracted from the text. Some authors use and compare the three approaches for different types of data, including Twitter and other data from the web.

(Kasper & Vela, 2011) gathered data from different web sites using a crawler and compiled it into one system called BESAHOT. They processed data written only in German and used a statistical polarity classifier together with linguistic information extraction. The polarity values were assigned to text segments only. They achieved 72% accuracy when the identified topic polarity had neutral content and 75% with multi-topic content.

(Gräbner, Zanker, Fliedl, & Fuchs, 2012) used a corpus of TripAdvisor reviews and constructed a domain-specific lexicon using part-of-speech (POS) tags. This lexicon contained high-frequency words such as hotel or room. They evaluated the system by measuring precision and recall for two ratings: a 5-star rating and a 3-way rating (positive, negative and neutral).
They demonstrated that the classification accuracy was significantly better when they used fewer labels for the data (3-way rating) with a larger training data set.

These approaches ((Kasper & Vela, 2011) and (Gräbner et al., 2012)) process and analyse data gathered from the web, specifically hotel reviews on travel web sites. They rely on linguistic information extraction or POS tags, which are applicable to data that is more structured than Twitter data, with its restriction of 140 characters.

In this context, (Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2011) explored the use of tree kernels in order to avoid the use of feature vectors. They used POS-specific prior polarity features and showed how tree kernels and a set of newly proposed features perform. They also used a 3-way classification and compared a unigram model, a feature-based model and a tree kernel based model. The unigram model was used as a baseline for the feature-based model. They also used an approach that included new features, and for the tree kernel they designed a tree representation of tweets. They demonstrated that with a set of 100 features they could achieve accuracy similar to that of a unigram model comprising 10,000 features. The tree kernel approach, compared to the two others, had a marked impact on the accuracy of the system. Moreover, they found that Twitter-specific features such as hashtags and emoticons add only slight value to the classifier. For their purpose, they used 5,000 manually labelled tweets.

(Kouloumpis, Wilson, & Moore, 2011) investigated further the more informal language used in microblogging for 3-way classification. They used supervised machine learning and included three different corpora as training data sets: emoticons, hashtags and manually labelled data. Since microblogging data is different from well-structured data, the features and techniques used in other approaches may not be applicable to it. They used POS and microblogging features, including hashtags and emoticons. They found that POS features decreased the performance of their classifier, whereas including emoticons in the training data improved it. Furthermore, the inclusion of microblogging features such as emoticons, intensifiers and abbreviations was identified as beneficial.

Many authors use general Twitter data covering different topics. We instead used a Naïve Bayes classifier with two different types of corpora: the natural language toolkit (NLTK) movie reviews and a manually labelled hotel-related Twitter data set.

3. Architecture and Design

Sentiment analysis is related to sociology, with a focus on emotions and feelings. These emotions are sometimes part of the decision-making process that a person follows to purchase something (Rambocas & Gama, 2013).
In the case of hotel reviews, the emotions that other people express have a great impact on people's opinion of a given place. In order to capture those emotions, we used the following process:


[Diagram: Data Collection, Text Preparation, Sentiment Detection, Sentiment Classification, Presentation]
Figure 1 Sentiment Analysis Process. Source: (Rambocas & Gama, 2013)

Figure 1 shows the process for sentiment analysis. The data collection stage involves gathering data from one or more sources, including blogs, reviews and microblogs. Very often this data varies in the type of language used, e.g. slang, context and language. In the text preparation stage, the NLTK provides tools to prepare the data and clean it of non-textual content or content that is not relevant to the study. The third stage, sentiment detection, involves detecting emotions and subjectivity in the text in order to classify new data using previously collected data. Finally, the results are presented graphically so that visual analytics can be performed.

3.1. Data Collection

The data gathered from the Twitter API is stored in JSON format. This data has a defined structure based on Twitter-defined fields such as id, user or text. To collect tweets in real time, Twitter offers a Streaming API. Using a developer account it is possible to retrieve 1% of all the tweets requested from this API. In this project we collected tweets based on bounding boxes around the coordinates of the cities of Sydney, Melbourne, Perth, Brisbane and Adelaide (Figure 2). Using the API we collected 1.6 million tweets in total. The city with the most tweets was Sydney, with 783,000 tweets.
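The bounding-box idea described above can be sketched as follows. This is a minimal illustration and not the project's harvester: the box coordinates are rough, hypothetical values chosen only for the example, and the helper names `city_for_point` and `city_for_tweet` are assumptions. It relies only on the fact that a geotagged tweet's JSON carries a GeoJSON point in longitude, latitude order under its `coordinates` field.

```python
from typing import Optional

# (west_lon, south_lat, east_lon, north_lat).
# Illustrative values only, NOT the coordinates used in the project.
HYPOTHETICAL_BOXES = {
    "melbourne": (144.5, -38.5, 145.5, -37.5),
    "sydney": (150.5, -34.2, 151.5, -33.5),
}


def city_for_point(lon: float, lat: float) -> Optional[str]:
    """Return the first city whose bounding box contains the point."""
    for city, (west, south, east, north) in HYPOTHETICAL_BOXES.items():
        if west <= lon <= east and south <= lat <= north:
            return city
    return None


def city_for_tweet(tweet: dict) -> Optional[str]:
    """Map a tweet (parsed JSON) to a city, or None if it has no usable geotag."""
    point = (tweet.get("coordinates") or {}).get("coordinates")
    if not point:
        return None
    lon, lat = point  # GeoJSON order: longitude first, then latitude
    return city_for_point(lon, lat)


if __name__ == "__main__":
    sample = {"id": 1, "text": "I'm at Crown Casino",
              "coordinates": {"type": "Point", "coordinates": [144.96, -37.82]}}
    print(city_for_tweet(sample))
```

A check like this is only a local filter; in a streaming setup the same boxes would normally be passed to the API so that filtering happens before the tweets are delivered.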


Figure 2 Melbourne coordinates

Not all of those tweets could be used, because only a small number of them were relevant to this research, i.e. specifically related to hotels. To find the hotel-related tweets, it was necessary to use a search engine that performs full-text search and retrieves only the tweets that contained, for example, the word hotel in their text field (Figure 3).

Figure 3 hotel-related tweets

We initially harvested around 1.6 million tweets into an Apache CouchDB instance for the five cities of Australia, but we got only 200 tweets related to hotels. Thus, we decided to gather more data from a bigger database that had been harvesting tweets since May. This new database had over 53 million tweets for these five cities. In addition, this new database had its data indexed in an Elasticsearch instance, so when we queried the database for the word hotel, we got around 11 thousand tweets. To obtain more data, the two databases were integrated into one Couchbase instance as shown in Figure 4:


[Diagram: Twitter data sources (1.6 million tweets; 53 million tweets, hotel-related queries), Migrators, XDCR]
Figure 4 Data Collection

Two Python scripts were used to migrate the data from the CouchDB instance and the hotel-related tweets from the Elasticsearch store. CouchDB and Couchbase each have advantages and disadvantages. CouchDB predates Couchbase, which inherits some features from CouchDB and from Membase. Couchbase uses buckets instead of databases, and these buckets are the containers of documents. One big advantage of Couchbase is that it has more documentation and support than CouchDB. In addition, it is horizontally scalable, which allows query performance to be increased. One similarity is that both are NoSQL databases, which is useful for handling Twitter data (Couchbase, 2014). To perform queries over the text field of a tweet, Elasticsearch was used.

[Diagram: XDCR]
Figure 5 Indexing to Elasticsearch

Elasticsearch performs full-text search through its JSON-based query domain-specific language (DSL). Each query to Elasticsearch can have different filters, which are shown in Section 3.4. It also provides a distributed infrastructure that can be used to increase search performance. In our case, we built a cluster of two Elasticsearch instances.

Couchbase instances replicate data to other nodes in different data centres using cross datacentre replication (XDCR). This feature is also used for indexing data in Elasticsearch. The big advantage of using XDCR is that every change made to a document in a Couchbase bucket is reflected in Elasticsearch. The sentiment classification of tweets was performed in the database and then reflected in the Elasticsearch cluster. For the presentation, only the data stored in the Elasticsearch indexes was used.

3.2. Text Preparation

For this stage, regular expressions were used to remove all characters that were not part of the English alphabet. This process removed the links attached to the text, the hashtag symbols and emoticons. It also converted the text to lowercase. We did not consider methods such as part-of-speech tagging, since (Kouloumpis et al., 2011) demonstrated that those features might not be useful for Twitter data. Nor did we use an emoticon corpus, because we found in the Twitter hotel data that people use Unicode emoticons instead of text such as :). The result of this process is shown here.

Original text: I like it raw, #Melbourne #Australia Crown Casino Melbourne #crowncasino
Clean text: i like it raw melbourne australia crowncasino crown casino Melbourne

3.3. Sentiment Detection and Classification

Several techniques support sentiment detection from text. It is important to consider how subjective the text to be analysed is. Most tweets are written to share information rather than emotions, i.e. they have neutral sentiment.
For instance: I'm at Crown Casino

We used the TextBlob library for Python. TextBlob is built on top of the natural language toolkit (NLTK), so it was possible to do sentiment analysis of the Twitter text using NLTK tools. The NLTK sentiment analyser uses a corpus of movie reviews that contains 1000 reviews in the polarity data set and 5331 sentences in the sentence polarity data set (Lee, 2005). The sentiment analyser uses a Naïve Bayes classifier based on the movie reviews corpus (Toolkit, 2014). The TextBlob library uses the NLTK to provide an interface for determining whether a tweet is positive, negative or neutral. In addition, it returns a number representing the subjectivity of the text.

For classification we used two different corpora: the NLTK movie reviews and a manually labelled Twitter data set. Thus, after migrating the data from the CouchDB and Elasticsearch databases, the script Classifier was used to prepare the text as in Section 3.2 and then to perform sentiment classification on the clean text. Following this, tweets were updated with a new label field as shown in Figure 6:

[Diagram: the Classifier reads and writes tweets; XDCR]
Figure 6 Tweets classification

The label field contains the polarity of the text and the subjectivity. Using this new field, it was possible to determine in the presentation layer of the system how many tweets were positive, negative or neutral.

3.4. Web Interface

Once the data had been classified, it was necessary to present it in a way that is easy to visualise and analyse. Kibana is a tool used to handle data dynamically and create customizable dashboards. It was created using HTML and JavaScript, so no extra modules are needed on a standard web server (Elasticsearch, 2014). The Elasticsearch DSL supports the generation of dynamic queries to the Elasticsearch engine and the retrieval of the results for presentation in Kibana. Since Couchbase replicates real-time data to Elasticsearch, it is possible to use streaming data in Kibana as well.


Figure 7 Kibana dashboard for Melbourne

For example, this pie chart shows the percentage of positive, negative and neutral tweets for the hotel Crown in Melbourne. The data stored in Elasticsearch was first migrated from the Elasticsearch and CouchDB databases and then stored in another Couchbase instance. Once classified, it was replicated to Elasticsearch. This data was related to hotels in Melbourne, such as the Crown shown here, although there are many other hotels. The Elasticsearch DSL is useful for filtering data as needed. An example of a query to Elasticsearch is shown in Figure 8:

curl -X POST "http://<host>:<port>/melbourne/_search?pretty=true" -d '{
  "query": {
    "bool": {
      "must": [
        { "term": { "couchbasedocument.doc.text": "crown" } }
      ],
      "must_not": [
        { "term": { "couchbasedocument.doc.user.screen_name": "animaliberaaus" } }
      ],
      "should": [
        { "term": { "couchbasedocument.doc.text": "casino" } },
        { "term": { "couchbasedocument.doc.text": "hotel" } }
      ]
    }
  },
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}'

Figure 8 Elasticsearch query

In this example, the request is sent to the host and port of the Elasticsearch instance, shown as <host> and <port>. The index in which the documents are looked up is melbourne, the _search endpoint selects the Elasticsearch search API, and the body of the request has to be sent in JSON format. The variable pretty is set to true in order to show the results in a more legible format. In the query above, the term crown must be present in the couchbasedocument.doc.text field of the document. The terms casino and hotel are optional. The term animaliberaaus must not appear in the field couchbasedocument.doc.user.screen_name. Finally, the number of results retrieved can be limited; in this case the limit is set to ten results. Furthermore, it is also possible to sort the results and to add more filters to the query by defining the facets field. The result is delivered as a JSON object:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : {
    "total" : 2203,
    "max_score" : ,
    "hits" : [ {
      "_index" : "melbourne",
      "_type" : "couchbasedocument",
      "_id" : " ",
      "_score" : ,
      "_source":{"doc":{"contributors":null,"truncated":false,"text": CROWN CASINO ","in_reply_to_status_id":null,"id": ,"favorite_count":0,"source":"<a href=\" rel=\"nofollow\">..

Figure 9 Results from Elasticsearch (Excerpt)
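The request body of Figure 8 can also be assembled in code, which is roughly what a migration script or a dashboard does when it generates such queries dynamically. The sketch below is an illustration under assumptions: the function name `build_hotel_query` and its parameters are hypothetical, and it only builds and prints the JSON body without contacting any Elasticsearch instance. The field names and the must / must_not / should structure are taken from Figure 8.

```python
import json

# Field names as they appear in the query of Figure 8.
TEXT_FIELD = "couchbasedocument.doc.text"
USER_FIELD = "couchbasedocument.doc.user.screen_name"


def build_hotel_query(must_term, should_terms, excluded_user, size=10):
    """Build a bool query body: one required term, optional terms, one excluded user."""
    return {
        "query": {
            "bool": {
                "must": [{"term": {TEXT_FIELD: must_term}}],
                "must_not": [{"term": {USER_FIELD: excluded_user}}],
                "should": [{"term": {TEXT_FIELD: term}} for term in should_terms],
            }
        },
        "from": 0,      # offset of the first result
        "size": size,   # maximum number of results returned
        "sort": [],
        "facets": {},   # kept to mirror Figure 8
    }


body = build_hotel_query("crown", ["casino", "hotel"], "animaliberaaus")

if __name__ == "__main__":
    # The printed JSON is what would be sent as the -d payload of the curl call.
    print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serialising it with `json.dumps` avoids the unbalanced braces that are easy to introduce when such queries are written by hand.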

The result in Figure 9 shows the time taken to retrieve the data: twenty milliseconds in this case. The query produced no timeouts and the search completed successfully in all 5 shards of the index. Each shard represents a working unit of the Elasticsearch search engine. An index usually points to primary and replica shards that can be distributed over different nodes in the Elasticsearch cluster. If there is a failure, these shards can be redistributed over other nodes. The number of items relevant to the query is given by the hits; in the example given here the total number of hits is 2203. Kibana generates this type of query dynamically when a new pie chart, table, map or filter is included in the dashboard. It is worth noting that we also used these types of queries to Elasticsearch when we migrated hotel-related tweets from the 53-million tweet database.

4. Experiments and Results

4.1. Sentiment Analysis

For the NLTK corpus, we calculated the accuracy of the NLTK classifier by classifying 400 tweets with the NLTK sentiment analyser and then classifying the same 400 tweets manually. The accuracy of the NLTK sentiment analyser was 68.35%. For the manually labelled Twitter data, we trained a Naïve Bayes classifier using those 400 manually classified tweets as a training set and then tested it with 50 new tweets. In this experiment we got an accuracy of 68.18%. Even though the second classifier used a single feature, the text of the tweet, its accuracy was similar to that of the NLTK classifier based on the movie reviews corpus. In addition, it is important to point out that we labelled only 400 tweets, compared to the 5000 sentences in the NLTK corpus.
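The second experiment (train on labelled tweets, test on held-out tweets, report accuracy) can be sketched as below. This is a from-scratch unigram Naïve Bayes with add-one smoothing written with the standard library only; it is not the NLTK or TextBlob implementation used in the project. The `clean_text` function follows the description in Section 3.2, and the handful of training and test tweets are made-up examples, not data from the study.

```python
import math
import re
from collections import Counter, defaultdict


def clean_text(text):
    """Follow Section 3.2: drop links and non-alphabetic characters, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # '#', emoticons, digits, punctuation
    return " ".join(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes over unigrams with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(clean_text(text).split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in clean_text(text).split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            scores[label] = math.log(n_docs / total_docs) + sum(
                math.log((counts[w] + 1) / denom) for w in words
            )
        return max(scores, key=scores.get)


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the manual label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical labelled tweets, for illustration only.
train = [
    ("Loved the room and the view #hotel", "positive"),
    ("Great service at the hotel bar", "positive"),
    ("Terrible wifi and a rude receptionist", "negative"),
    ("Worst hotel stay ever, dirty room", "negative"),
    ("I'm at Crown Casino", "neutral"),
    ("Checking in at the hotel now", "neutral"),
]
test = [
    ("Loved the great view", "positive"),
    ("dirty room and rude staff", "negative"),
    ("at crown casino", "neutral"),
]

model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])

if __name__ == "__main__":
    print(accuracy(model, [t for t, _ in test], [y for _, y in test]))
```

With a few hundred labelled tweets the same code applies unchanged; only the `train` and `test` lists grow.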
Most of the classified data corresponds to the neutral class because some tweets relate to news or to factual information.

4.2. Hotel Reviews in Twitter

We gathered the top 10 hotels of the city of Melbourne as ranked by TripAdvisor; the first in the ranking is The Langham and the last is the Marriott. After performing analytics with Kibana, the results were as shown in Figure 10.

[Chart: positive, negative and neutral tweets and effective hits per hotel in Melbourne]
Figure 10 Results Melbourne

According to Figure 10, the Crown Complex is the hotel with the most tweets in Melbourne. The top 3 hotels were Crown Towers, Sofitel and The Langham. To rank these three hotels we considered the total number of hits and the neutral content. In general, positive emotions outnumber negative ones, but both are below the number of neutral tweets. We can assume that this is because these hotels are among the 5-star hotels in Melbourne.

The neutral sentiment is also important to consider, because there is always a motive for sharing thoughts and emotions. In the case of the tweet "I'm at Crown Casino", the classification is neutral; however, the question remains: why does this person share that? According to (Cocker, 2014) there are three motives to share: to build social capital, self-enhancement and altruism. The person who shared that he was at the Crown may have been trying to self-enhance, because Crown Casino gives him a good social status. In this sense, the tweet may be considered something positive for the Crown Complex.

The hotel with the most negative sentiment was the Marriott. On the 3rd of October 2014, news appeared about a Marriott hotel using a jamming system to prevent guests from using mobile networks on their phones (Fung, 2014), and several users shared the news. Even though this event took place in Tennessee, US, Twitter users in Australia were outraged.

The hotel with the fewest tweets was The Blackman.

Figure 11 The Blackman review in TripAdvisor. Source: (Advisor, 2014)

According to TripAdvisor, this user rated The Blackman hotel four points out of five; however, the review describes a problem that she experienced. That may be the reason she shared a review, her first according to her TripAdvisor profile (1 review). Therefore, this hotel may not be motivating its guests enough to share opinions (positive, negative or neutral) on Twitter.

5. Conclusions

The aim of analysing Twitter data was to identify general opinion on hotels accurately and to compare how these sentiments relate to hotel guides in the major cities of Australia. For this purpose we used sentiment analysis techniques that include supervised machine learning classification. We evaluated a Naïve Bayes classifier with two different training data sets. The first contained around 5000 labelled movie-review sentences and the second 400 hotel-related tweets. Although the second data set contained significantly less data than the first, the accuracy for both systems was similar: around 68%.

This classification was oriented to subjective (emotional) data, but the objective (neutral) data is also worthy of evaluation. In this sense, we found that people tend not to share thoughts unless they have a motive, either positive or negative sentiment. According to (Rimé, Mesquita, Boca, & Philippot, 1991), people share emotional experiences only with those close to them and with people who may find the experience relevant. In some cases people share objective data that may influence others' opinions, as with the Marriott news. Neutral data can also reveal implied positive meanings, e.g. the user who posted a tweet indicating that he was at the Crown hotel.

The data of TripAdvisor and the data of Twitter are completely different in terms of structure and meaning.
TripAdvisor is a social medium for sharing reviews about travel-related places, while Twitter is a microblogging web site that facilitates sharing of opinions and thoughts about any topic. Finding relevant data in Twitter is a complex process that requires a deep analysis of the language used, the type of user, the event and the topic. We could count on a database of about 53 million tweets, but when we analysed the public opinion of a hotel, the best result was 305 tweets, while TripAdvisor has thousands of reviews.

References

Advisor, T. (2014). The Blackman hotel reviews. Retrieved 30/10/2014, from d r art_series_the_blackman- Melbourne_Victoria.html - CHECK_RATES_CONT

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of twitter data. Paper presented at the Proceedings of the Workshop on Languages in Social Media.

Cocker, B. (2014). Advertising primer. A background to Advertising. Internet Marketing subject slides. The University of Melbourne. Melbourne.

Couchbase. (2014). Couchbase vs. Apache CouchDB. A comparison of two open source NoSQL database technologies.

Elasticsearch. (2014). visualize logs and time-stamped data. Retrieved 30/10/2014.

Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4).

Fung, B. (2014). FCC to Marriott: No, you can't force your customers onto terrible hotel WiFi.

Gräbner, D., Zanker, M., Fliedl, G., & Fuchs, M. (2012). Classification of customer reviews based on sentiment analysis.

Kasper, W., & Vela, M. (2011). Sentiment analysis for hotel reviews. Paper presented at the Computational Linguistics-Applications Conference.

Kaur, A., & Gupta, V. (2013). A Survey on Sentiment Analysis and Opinion Mining Techniques. Journal of Emerging Technologies in Web Intelligence, 5(4).

Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the omg! ICWSM, 11.

Lee, B. P. a. L. (2005). Movie Review Data. Retrieved 30/10/2014.

Rambocas, M., & Gama, J. (2013). Marketing Research: The Role of Sentiment Analysis: Universidade do Porto, Faculdade de Economia do Porto.

Rimé, B., Mesquita, B., Boca, S., & Philippot, P. (1991). Beyond the emotional event: Six studies on the social sharing of emotion. Cognition and Emotion, 5(5-6).

Russell, M. A. (2013). Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More: O'Reilly Media, Inc.

Russell, M. A., & Russell, M. (2011). 21 Recipes for Mining Twitter: O'Reilly Media, Inc.

Technology, B. N. (2012, 01/02/2012). TripAdvisor rebuked over 'trust' claims on review site. Retrieved 30/10/2014.

Toolkit, N. L. (2014, 21/08/2014). Natural Language Toolkit. Retrieved 30/10/2014.

TripAdvisorUK. (2014). Fact Sheet. Retrieved 30/10/2014.

Twitter. (2012, 21/03/2012). Twitter turns six. Retrieved 30/10/2014, from blog.twitter.com/2012/twitter-turns-six
