Sentiment Analysis for Mobile SNS Data

SeonHwan Kim, Il-Kyu Ha, Bong-Hyun Back and Byoungchul Ahn
Department of Computer Engineering, Yeungnam University, Gyeongsan, Gyeongbuk, Korea

Abstract

Every day, a large amount of diverse data expressing individual opinions and preferences is generated on Social Network Services (SNS). These data can greatly affect many areas of society, such as politics, public opinion, economics, services and entertainment. It is therefore necessary to extract new information from SNS data and to understand the true intentions of users or customers. Doing so requires techniques that analyze a large amount of SNS data, extract the meaningful parts, and generate new information from them. This paper presents an efficient method that processes unstructured big data from social networks, extracts sentiment information, and derives user preferences from that information. The processing time of the proposed method grows as O(n) with the number of data items.

Keywords: Big data, SNS, Sentiment

1 Introduction

Social Networking Services (SNS) are widely used on smartphones, and the number of users has increased rapidly in recent years. As a result, the amount of data expressing personal opinions and interests is growing exponentially. Critical information in SNS data can have a great impact on the formation of public opinion in fields such as politics, the economy, services and entertainment. It is therefore necessary to develop methods or algorithms that extract and process meaningful information from the large amount of data generated by SNS. It is also necessary to capture opinions in real time, to use this information in various application fields, and to present it visually. We propose a big data processing method that efficiently handles the various unstructured data collected from SNS.
Further, we suggest sentiment analysis algorithms that extract sentiment information and classify customers' preferences about a particular issue, together with how those preferences change over time.

2 Related work

Most data generated on SNS are unstructured: the data are not standardized, and their structure and shape are complex, unlike video, image and document data [1]. To extract meaningful information from a large amount of unstructured SNS data, the unstructured data must first be processed. Various technologies for processing unstructured data have been studied, mainly based on morphological analysis. However, symbol words and new buzzwords coined by young people can act as barriers to data analysis. For this reason, big data processing and sentiment analysis by computer have become more difficult. Research on text mining, which extracts information from semi-structured or atypical text data using natural language processing techniques, has therefore been developed [5-7]. These studies use statistical and periodic algorithms based on machine learning to extract meaningful information and to refine it from massive text data. In addition, research on opinion mining, which determines whether a text expresses a positive, negative or neutral preference, has also been carried out [8-9].

Currently, a variety of open source projects for processing big data are in progress under the name of the Hadoop ecosystem [10]. Databases used to process big data are often NoSQL (Not-Only SQL) databases, which store and retrieve data under consistency models less restrictive than those of traditional relational databases [11]. As with relational databases (RDBMS), the choice of a NoSQL database depends on the situation. Many studies on NoSQL databases are currently underway in academia and industry. Representative examples are Google BigTable, Amazon DynamoDB, and the open source projects Apache HBase, Cassandra and MongoDB [10][12][13][14][16].
In particular, MongoDB, which is used in this study, is classified as a CP-type database, providing Consistency and Partition tolerance in terms of the CAP theorem (Consistency, Availability, Partition tolerance). It is developed as an open source project.

Sentiment is an emotion felt in the mind in response to some event or phenomenon [16]. Sentiment analysis is a process that discovers and extracts subjective information from source data using computational linguistics, natural language processing and text analytics [16]. Several studies analyze sentiment in big data [17-19]. The work of analyzing and classifying sentiment can be broadly divided into three stages. In the first stage, sentences that contain sentiment information expressing subjective thoughts and feelings are extracted. In the next stage, the polarity of the sentence or document is classified as positive, negative or neutral. In the final stage, an intensity classification determines the strength of subjectivity of the text [20-21].

3 Sentiment Analysis of Unstructured SNS Data

3.1 System Model

We propose a big data processing system that can efficiently handle various unstructured SNS data. The proposed system consists of a parallel HDFS (Hadoop Distributed File System) and MapReduce. The parallel HDFS, which is based on the Hadoop ecosystem, is used to collect and store large and varied SNS data reliably. MapReduce [22] is used to analyze large amounts of unstructured data for user sentiment. The configuration of the proposed system is shown in Figure 1.

Figure 1. The proposed system

3.3 MapReduce Functions

MapReduce is a software framework developed by Google to support distributed computing and parallel programming, built around the concept of a map function. In this paper, the map stage is divided into four special map functions, which perform positive/negative context analysis, morphological analysis, token analysis and prohibited word analysis, respectively. Table 2 shows the four proposed functions and their operations.

Table 2. Functions of the proposed sentiment analysis
Sentiment analysis function | Operations | Referenced dictionary
Positive/negative context analysis function | context analysis using sentence pattern matching | positive/negative context dictionary
Morphological analysis function | elimination of needless elements, calculation of the result count | positive/negative word dictionary
Token analysis function | creation of tokens, calculation of the result count | positive/negative word dictionary
Prohibited words analysis function | calculation of the prohibited word score | prohibited word dictionary

3.2 Composition of HDFS

HDFS is a file system with a distributed processing structure. It is configured as the parallel servers shown in Figure 1. The system connects four Linux-based servers in parallel, and the chunk size that each node uses to store data is set to 64 MB.
It duplicates the name server over NFS for disaster recovery. The functions of the servers are described in Table 1.

Table 1. HDFS Servers
Server | Components | Functions
PrimaryServer (Master Node) | NameNode, MapReduce, Crawler | Main server for parallel distributed processing; name node (controlling other servers)
SecondaryServer (Slave Node 1) | Secondary NameNode | Backup server of main server
DataServer1 (Slave Node 2) | |
DataServer2 (Slave Node 3) | |

The four MapReduce functions are applied in sequence. First, the positive/negative context analysis function is performed. It examines the context sentence by sentence to improve accuracy and matches each sentence against the patterns in the positive context dictionary and the negative context dictionary. It counts the positive and negative contexts; if the two counts are equal and nonzero, the sentence is treated as positive. If the context analysis cannot classify the sentence, the sentence is passed to the morphological analysis. The algorithm for context analysis is shown in Figure 2.

Second, the morphological analysis function is performed. This function uses a morphological analyzer to remove components that are unnecessary for the analysis, such as special symbols. It then counts matches by comparing the sentence with the positive and negative word dictionaries. If the positive counter equals the negative counter, the sentence is treated as positive. If the morphological analysis cannot classify the polarity, the sentence is passed to the token analysis.

Third, the token analysis is performed. After splitting the source sentence into tokens at spaces, the function counts positive and negative words by comparing the tokens with the positive and negative word dictionaries. If the positive counter equals the negative counter, the sentence is treated as positive. If the token analysis cannot classify the polarity, the sentence is passed to the prohibited word analysis.

Fourth, the prohibited word analysis is performed.
It calculates the prohibited word score based on the prohibited word dictionary. The algorithms for morphological analysis, token analysis and prohibited word analysis are described in Figure 3.

//Context Analysis
//input: keyword, source
//  keyword: target word for decision of positive or negative sentiment
//  source: source data in text form that is processed by HDFS
Input keyword and source
Initialize result    //a criterion for sentiment decision

//pre-processing
Change the keyword to lower-case
Change the source to lower-case
Eliminate the needless characters in source text
Initialize positive_count and negative_count

//Context Analysis
Repeat until there is no minimum sentence unit:
    Get the minimum sentence unit from the source
    //Computation of the positive_count and negative_count
    if (minimum sentence unit == positive) then positive_count++
    if (minimum sentence unit == negative) then negative_count++

//Computation of the result by positive_count and negative_count
if (positive_count == 0 and negative_count == 0) then result = 0    //undecidable
else if (positive_count == negative_count) then result = 1    //positive
else result = positive_count - negative_count

Figure 2. Context analysis function

3.4 Dictionaries for Sentiment Analysis

The proposed MapReduce functions use five dictionaries: a positive context dictionary, a negative context dictionary, a positive word dictionary, a negative word dictionary and a prohibited word dictionary. Each entry of the prohibited word dictionary consists of a polarity and a score. The role of each dictionary is shown in Table 3.
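The four analysis functions and five dictionaries described above can be sketched in Python as a cascade in which each stage runs only when the previous one returns 0 (undecided). This is a minimal illustration, not the authors' implementation: the dictionary contents are hypothetical toy entries, context patterns are matched as plain substrings, and a simple regular expression stands in for the morphological analyzer, which the paper does not name. Only the sign of the result is meant to carry meaning.

```python
import re

# Hypothetical toy dictionaries; the paper's actual dictionaries are not published.
POSITIVE_CONTEXT = ["so happy", "really love"]
NEGATIVE_CONTEXT = ["not happy", "never again"]
POSITIVE_WORDS = {"happy", "good", "great"}
NEGATIVE_WORDS = {"sad", "bad", "awful"}
# Prohibited word dictionary: word -> (polarity, score), following Section 3.4.
PROHIBITED_WORDS = {"damn": ("negative", 2)}


def preprocess(text):
    """Lower-case the text and replace needless characters with spaces."""
    return re.sub(r"[^a-z0-9\s.!?']", " ", text.lower())


def decide(positive_count, negative_count):
    """Decision rule of Figure 2: 0 means undecided, a nonzero tie counts as positive."""
    if positive_count == 0 and negative_count == 0:
        return 0
    if positive_count == negative_count:
        return 1
    return positive_count - negative_count


def context_analysis(source):
    """Stage 1: count context-pattern matches sentence by sentence."""
    pos = neg = 0
    for sentence in re.split(r"[.!?]+", preprocess(source)):
        pos += sum(pattern in sentence for pattern in POSITIVE_CONTEXT)
        neg += sum(pattern in sentence for pattern in NEGATIVE_CONTEXT)
    return decide(pos, neg)


def word_analysis(units):
    """Stages 2 and 3: count dictionary words among morphemes or tokens."""
    pos = sum(unit in POSITIVE_WORDS for unit in units)
    neg = sum(unit in NEGATIVE_WORDS for unit in units)
    return decide(pos, neg)


def prohibited_analysis(tokens):
    """Stage 4: sum the prohibited word scores per polarity."""
    pos = neg = 0
    for token in tokens:
        polarity, score = PROHIBITED_WORDS.get(token, (None, 0))
        if polarity == "positive":
            pos += score
        elif polarity == "negative":
            neg += score
    return pos - neg


def analyze(source):
    """Run the stages in order; each later stage runs only if the result is still 0."""
    cleaned = preprocess(source)
    result = context_analysis(source)
    if result == 0:
        # Assumption: a regex word extractor stands in for the morphological analyzer.
        result = word_analysis(re.findall(r"[a-z']+", cleaned))
    if result == 0:
        result = word_analysis(cleaned.split())
    if result == 0:
        result = prohibited_analysis(cleaned.split())
    return result
```

A positive return value means positive sentiment, a negative value means negative sentiment, and 0 means that no stage could classify the text. In a MapReduce job, a function like `analyze` would play the role of the map step, with the reduce step aggregating the per-record results.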
//Morphological Analysis
if (result == 0) in previous stage
    Input source
    Initialize result-s    //a criterion for sentiment decision
    //pre-processing source
    Eliminate the needless characters in source text
    Initialize positive_count_s and negative_count_s
    //Computation of the positive_count_s and negative_count_s using
    //the positive/negative word dictionary
    Repeat until there is no morpheme unit:
        Compute positive_count_s, negative_count_s
    if (positive_count_s == 0 and negative_count_s == 0) then result-s = 0
    else if (positive_count_s == negative_count_s) then result-s = positive_count_s
    else result-s = positive_count_s - negative_count_s

//Token Analysis
if (result-s == 0) in previous stage
    Create tokens
    Initialize positive_count_s and negative_count_s
    //Computation of the positive_count_s and negative_count_s using
    //the positive/negative word dictionary
    Repeat until there is no token:
        Compute positive_count_s, negative_count_s
    if (positive_count_s == 0 and negative_count_s == 0) then result-s = 0
    else if (positive_count_s == negative_count_s) then result-s = positive_count_s
    else result-s = positive_count_s - negative_count_s

//Prohibited word Analysis
if (result-s == 0) in previous stage
    //Computation of the positive_count_s and negative_count_s using
    //the prohibited word dictionary
    Compute positive_count_s, negative_count_s
    result-s = positive_count_s - negative_count_s

Figure 3. Morphological, token and prohibited word analysis

Table 3. Dictionaries for sentiment analysis
Dictionary | Role / application | Analysis stage
Positive Context | compute the number of positive contexts in the source sentence / set of positive context patterns | Context Analysis
Negative Context | compute the number of negative contexts in the source sentence / set of negative context patterns | Context Analysis
Positive Word | compute the number of positive words in the source sentence / set of positive word patterns | Morphological/Token Analysis
Negative Word | compute the number of negative words in the source sentence / set of negative word patterns | Morphological/Token Analysis
Prohibited Word | compute the number of prohibited words in the source sentence / set of prohibited words | Prohibited Word Analysis

4 Experiment and Results

4.1 Data Collection and Experimental Environment

The data collection performance of the proposed system is analyzed using Twitter and Topsy. Topsy analyzes user activity on SNS services such as Google Plus and Twitter, processing about 500 million data items per day and providing the analyzed data. After the historical data are acquired, Twitter4j is used to collect the continuously arriving incremental data. Twitter provides only one week of data, and an API key may be used for 450 queries per 15 minutes. In this study, the data collection module is run every 4 hours by the crawler. The experimental environment of the proposed system for performance analysis is described in Table 4. The proposed system consists of four Hadoop-based parallel servers and uses 64-bit CentOS 6.3 as the operating system.

Table 4. Experimental environment
Components | Roles
OS, RE | Use of Hadoop for distributed storage; supporting the Java environment for processing some business logic
Crawler, HDFS Layer | Crawler: gathering the source data from various SNSs; HDFS: distributed file system, data storage
MapReduce Layer | Sentence analysis, text mining, sentiment analysis
MongoDB | Storing the results analyzed by MapReduce in MongoDB
WAS, Web Server | Supporting Web applications using the analyzed results
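The collection limits quoted in Section 4.1 imply a simple query budget, which the following Python sketch works out. It uses only the figures stated in the text (450 queries per 15-minute window, one crawl every 4 hours); the function names are hypothetical, and the number of records returned per query is not stated in the paper, so the sketch counts queries only.

```python
import math

QUERIES_PER_WINDOW = 450    # from the text: 450 queries per window
WINDOW_MINUTES = 15         # from the text: window length in minutes
CRAWL_INTERVAL_HOURS = 4    # from the text: the crawler runs every 4 hours


def windows_per_crawl_interval():
    """Number of 15-minute rate-limit windows between two crawler runs."""
    return CRAWL_INTERVAL_HOURS * 60 // WINDOW_MINUTES


def max_queries_per_crawl_interval():
    """Upper bound on queries one key could issue between two crawler runs."""
    return windows_per_crawl_interval() * QUERIES_PER_WINDOW


def windows_needed(query_count):
    """Minimum number of rate-limit windows needed to issue query_count queries."""
    return math.ceil(query_count / QUERIES_PER_WINDOW)
```

For instance, a crawl that needs at most 450 queries fits into a single window, so running the crawler every 4 hours stays well inside the limit as long as each run issues no more queries than one window allows.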

4.2 Analysis and Evaluation

The following four tests have been carried out to analyze the performance of the proposed system. The first experiment measures system performance as a function of the number of data items. System load and acquisition time are measured using the seven Twitter data sets in Table 5. Each data set is collected using the Topsy API.

Table 5. Data sets for experiment and analysis
Data set number | Number of data | Extraction period (day) | API
1 | 2,106 | 1 | Topsy API
2 11, , , , , ,

Figure 4 compares the HDFS loading time and the crawling time for each data set. Figures 5 and 6 show the memory load and CPU load of each HDFS node while each data set is crawled and loaded. For the data set of 2,106 items, the crawl time is 6 seconds and the HDFS loading time is 1 second. For the data set of 100,497 items, the crawl time is 70 seconds and the HDFS loading time is 10 seconds, as shown in Figure 4. Both the HDFS loading time and the crawl time increase in proportion to the number of data items. Thus, the network load and system load caused by collecting and loading data remain small for the proposed system, and data collection and loading are processed stably in a short time.

Figure 4. Crawling Time and HDFS Loading Time

The memory usage of slave nodes SN1 to SN3 ranges from a minimum of 0.03% to a maximum of 3.93%. The master node M uses between 0.6% and 7.31%, as shown in Figure 5. The slave nodes use little memory because the data loading is distributed among them. The master node uses more memory than the slave nodes.

Figure 5. Memory Usage for Data Crawling and HDFS Loading

In Figure 6, slave nodes SN1 and SN2 show CPU usage between 0.0% and 2.8%, whereas slave node SN3 shows CPU usage between 0.0% and 11.4%. The reason is that slave node SN3 loads data in parallel and distributed processing. The CPU usage of the master node ranges from 5.0% to 7.9%. Therefore, the proposed system provides a stable environment when it collects and loads data.

Figure 6. CPU Usage for Data Crawling and HDFS Loading

Figure 7. Time of MapReduce Processing and Sentiment Analysis
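The crawl and loading times quoted for Figure 4 can be turned into a rough cost model. The Python sketch below fits a straight line through the two measurements stated in the text (2,106 items and 100,497 items), under the assumption that time grows linearly with a fixed startup cost. It is a two-point illustration only: the values of the other five data sets are not available in the text, and the function names are hypothetical.

```python
def fit_line(point_a, point_b):
    """Return (slope, intercept) of the line through two (items, seconds) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def predict(model, n_items):
    """Estimate the time in seconds for n_items under the fitted line."""
    slope, intercept = model
    return slope * n_items + intercept


# Measurements quoted in the text for Figure 4.
CRAWL_MODEL = fit_line((2106, 6.0), (100497, 70.0))
LOAD_MODEL = fit_line((2106, 1.0), (100497, 10.0))

if __name__ == "__main__":
    for name, model in (("crawl", CRAWL_MODEL), ("HDFS load", LOAD_MODEL)):
        slope, intercept = model
        print(f"{name}: {slope * 1000:.3f} ms per item + {intercept:.2f} s fixed cost")
```

Under this assumption, crawling costs a fixed few seconds plus well under a millisecond per item, which is consistent with the linear growth the text reports; a fit over all seven data sets would be needed to confirm it.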

Sentiment analysis time and system load are tested while increasing the number of data items. The experiment measures the degree of system load and the analysis time for sentiment analysis. Figure 7 compares the sentiment analysis time for each data set. Figures 8 and 9 show the CPU load and memory load of each node. The sentiment analysis time takes from 68 seconds to 35 seconds for the seven data sets. The analysis time increases linearly with the number of data items, as shown in Figure 7.

In Figure 8, the master node does not perform the actual analysis but manages the slave nodes. Its CPU usage is low, while the slave nodes use most of the CPU resources. When a data set contains fewer than 40,000 items, each slave node processes data in parallel. When a data set contains more than 40,000 items, all slave nodes use as much CPU as the number of data items requires. Therefore, the proposed system operates stably as the number of data items increases, because it switches to parallel mode when the CPU load increases.

Figure 8. CPU Consumption of MapReduce Processing and Sentiment Analysis

In Figure 9, the memory usage of the master node is low, while the memory load is spread over the slave nodes and is balanced among them during the analysis. Therefore, the proposed system distributes the workload equally to the slave nodes and maintains load balancing. The system and algorithm of the proposed method show O(n) processing time. The system provides a stable distributed analysis environment in which processing is not concentrated on a single node.

Figure 9. Memory Consumption of MapReduce Processing and Sentiment Analysis

The accuracy of the sentiment analysis is also measured. The word "Happy" is used as the keyword for the sentiment analysis. Figure 10 compares the results of the proposed system with the results of manual classification.

Figure 10. Comparison between the results of the proposed functions and the results of manual sorting
In Figure 10, the error ratio for neutral sentiment is relatively high, while the error ratio for positive and negative sentiment is relatively small. Overall, the sentiment analysis results of the proposed system are very close to those of manual classification.

5 Conclusions

A big data processing system and algorithms are proposed to analyze user sentiment from the large amounts of unstructured data generated by SNS. The proposed system is composed of a parallel HDFS system based on the Hadoop ecosystem and four special functions for MapReduce. In addition, it uses five types of dictionary for sentiment analysis. The data loading time of the proposed system remains small as the number of data items increases. The analysis is not processed by one node but is distributed to all nodes for load balancing. When the proposed sentiment analysis functions process the data, the system load is distributed equally to all slave nodes, so load balancing is maintained. The sentiment analysis results of the proposed system are very close to those of manual classification.

Please address any questions about this paper to Byoungchul Ahn (b.ahn@yu.ac.kr).

6 Acknowledgement

This work (Grants No. C ) was supported by Business for Cooperative R&D between Industry, Academy, and Research Institute funded by the Korea Small and Medium Business Administration.

References

[1] McKinsey, Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey & Company, 2011 [Online]
[2] Chang-Shing Lee, Mei-Hui Wang, Automated ontology construction for unstructured text documents, Data & Knowledge Engineering, Vol.60, Iss.3, 2007
[3] B. Lee, J. Lim, J. Yoo, Utilization of Social Media Analysis using Big Data, Jour. of the Korea Contents Association, Vol.13, No.2, 2013
[4] M. Song, S. Kim, A Study of improving on prediction model by analyzing method Big data, The Journal of Digital Policy & Management, Vol.11, No.6, 2013
[5] Ah Tan, Text mining: The state of the art and the challenges, Proc. of the PAKDD 1999, 1999
[6] Q. Mei, C. Zhai, Discovering evolutionary theme patterns from text: an exploration of temporal text mining, Proc. of the 11th ACM SIGKDD international conference on knowledge discovery in data mining, 2005
[7] K. Park, K. Hwang, A Bio-Text Mining System Based on Natural Language Processing, Jour. of KISS: computing practices, Vol.17, No.4, 2011
[8] B. Pang, L. Lee, Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, Vol.2, No.1-2, pp.1-135, 2008
[9] B. Kang, M. Song, A Study on Opinion Mining of Newspaper Texts based on Topic Modeling, Jour. of the Korean Library and Information Science Society, Vol.47, No.4, 2013
[10]
[11] Jing Han, Kian Du, Survey on NoSQL database, Proc. of 6th International Conference on Pervasive Computing and Applications (ICPCA), 2011
[12] Fay Chang, R.E. Gruber, Bigtable: A Distributed Storage System for Structured Data, ACM Transactions on Computer Systems, Vol.26, Iss.2, 2008
[13] S. Sivasubramanian, Amazon DynamoDB: a seamlessly scalable non-relational database service, Proc. of the 2012 ACM SIGMOD 12, 2012
[14] Lars George, HBase: The Definitive Guide, O'Reilly, 2011
[15] Kristina Chodorow, MongoDB: The Definitive Guide, 2nd Edition, O'Reilly, 2013
[16] B. Pang, L. Lee, Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, Vol.2, No.1-2, pp.1-135, 2008
[17] S. Mukherjee, P. Bhattacharyya, Sentiment Analysis in Twitter with Lightweight Discourse Analysis, Proc. of COLING 2012, 2012
[18] N. Godbole, S. Skiena, Large-Scale Sentiment Analysis for News and Blogs, Proc. of the ICWSM 2007, 2007
[19] A. Pak, P. Paroubek, Twitter as a Corpus for Sentiment Analysis and Opinion Mining, Proc. of the LREC 2010, 2010
[20] H. Tang, S. Tan, X. Cheng, A survey on sentiment detection of reviews, Expert Systems with Applications, Vol.36, 2009
[21] Seth Gilbert, Nancy Lynch, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, ACM SIGACT News 33(2)
[22] J. Dean, S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Communications of the ACM, Vol.51, No.1, 2008

Research Article MapReduce Functions to Analyze Sentiment Information from Social Big Data

Research Article MapReduce Functions to Analyze Sentiment Information from Social Big Data Hindawi Publishing Corporation International Journal of Distributed Sensor Networks Volume 2015, Article ID 417502, 11 pages http://dx.doi.org/10.1155/2015/417502 Research Article MapReduce Functions to

More information

An Efficient Informal Data Processing Method by Removing Duplicated Data

An Efficient Informal Data Processing Method by Removing Duplicated Data An Efficient Informal Data Processing Method by Removing Duplicated Data Jaejeong Lee 1, Hyeongrak Park and Byoungchul Ahn * Dept. of Computer Engineering, Yeungnam University, Gyeongsan, Korea. *Corresponding

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

SMCCSE: PaaS Platform for processing large amounts of social media

SMCCSE: PaaS Platform for processing large amounts of social media KSII The first International Conference on Internet (ICONI) 2011, December 2011 1 Copyright c 2011 KSII SMCCSE: PaaS Platform for processing large amounts of social media Myoungjin Kim 1, Hanku Lee 2 and

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Sentiment Analysis for Customer Review Sites

Sentiment Analysis for Customer Review Sites Sentiment Analysis for Customer Review Sites Chi-Hwan Choi 1, Jeong-Eun Lee 2, Gyeong-Su Park 2, Jonghwa Na 3, Wan-Sup Cho 4 1 Dept. of Bio-Information Technology 2 Dept. of Business Data Convergence 3

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2

Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2 2nd International Conference on Materials Science, Machinery and Energy Engineering (MSMEE 2017) Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2 1 Information Engineering

More information

HADOOP FRAMEWORK FOR BIG DATA

HADOOP FRAMEWORK FOR BIG DATA HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further

More information

Twitter data Analytics using Distributed Computing

Twitter data Analytics using Distributed Computing Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE

More information

Online Bill Processing System for Public Sectors in Big Data

Online Bill Processing System for Public Sectors in Big Data IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer

More information

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory

More information

Hadoop, Yarn and Beyond

Hadoop, Yarn and Beyond Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

A MODEL OF EXTRACTING PATTERNS IN SOCIAL NETWORK DATA USING TOPIC MODELLING, SENTIMENT ANALYSIS AND GRAPH DATABASES

A MODEL OF EXTRACTING PATTERNS IN SOCIAL NETWORK DATA USING TOPIC MODELLING, SENTIMENT ANALYSIS AND GRAPH DATABASES A MODEL OF EXTRACTING PATTERNS IN SOCIAL NETWORK DATA USING TOPIC MODELLING, SENTIMENT ANALYSIS AND GRAPH DATABASES ABSTRACT Assane Wade 1 and Giovanna Di MarzoSerugendo 2 Centre Universitaire d Informatique

More information

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),

More information
