SMART SONGS SELECTION IN PLAYLISTS USING PARALLEL K-MEANS CLUSTERING


International Journal of Civil Engineering and Technology (IJCIET), Volume 9, Issue 3, March 2018, Article ID: IJCIET_09_03_077
IAEME Publication, Scopus Indexed. editor@iaeme.com

Pradyun Manoj, Department of Computer Science, Christ [Deemed to be University], Hosur Road, Bhavani Nagar, Bengaluru, Karnataka, India
Saleema JS, Department of Computer Science, Christ [Deemed to be University], Hosur Road, Bhavani Nagar, Bengaluru, Karnataka, India

ABSTRACT
Songs differ in tempo, pitch and time signature. In a music player application, the typical shuffle picks the next or previous song at random, with no parameters guiding the choice. Depending on genre, a song's tempo can lie anywhere between forty and three hundred beats per minute. In this paper, the fast and efficient parallel k-means clustering algorithm is implemented in Hadoop on the Million Song Dataset subset to cluster songs by tempo and pitch. The aim is to reduce the variation, in tempo or in other parameters, that occurs when a typical shuffle picks the next song at random. The clusters, and in turn the reduced variation in tempo, can be used in a new smart shuffle: once the clusters have been formed, the smart shuffle picks songs only from within a specific cluster. The target is a 50% reduction in variation, which should make transitions smoother and more pleasing to the listener.

Keywords: Hadoop, Parallel K-Means Clustering, Tempo, Variation
Cite this Article: Pradyun Manoj and Saleema JS, Smart Songs Selection in Playlists using Parallel K-Means Clustering, International Journal of Civil Engineering and Technology, 9(3), 2018.

1. INTRODUCTION
With the growth of social media, business analytics, healthcare analysis and online shopping portals, the volume of data being produced has passed the petabyte threshold. Much of the data collected by analytics sectors is unstructured, and it varies in both variety and volume; this is what is referred to today as big data. Many scalable frameworks exist to process and structure this data efficiently, and MapReduce is one of the most widely used in big data analysis. It is typically used to process multi-terabyte datasets of different variety.

A MapReduce job splits the input dataset into independent chunks, which map tasks then process in parallel. Each map task receives a set of <key, value> pairs and outputs processed <key, value> pairs [1]. The aim of parallelism is to gain speed without losing accuracy. In a typical MapReduce job, once the input has been split into blocks, the Mapper class applies a map function to the data. The mapper output is then sorted (and optionally combined) and passed as input to the reduce task, which aggregates the values for each key [1]. The final output of the reducer is written back to HDFS (Hadoop Distributed File System) [2]. This output can either be the final result of the job or be fed back into the MapReduce framework as the input of a further job, and the process iterates until the desired solution is reached. The framework shares its basic idea with the divide-and-conquer approach.

Music is a form of art comprising three dimensions, namely rhythm, melody and harmony, and is often described as organized sound. The elements that form the basis of music are pitch, rhythm, dynamics, timbre and texture. Pitch encompasses melody and harmony, while the concepts associated with rhythm are tempo, meter and articulation. Dynamics refers to the loudness or softness of a sound, timbre to its tonal quality or color, and texture to how the layers of sound in a piece are combined. The most prominent elements of a song are pitch and rhythm. Pitch is a perceptual property of sounds that allows them to be ordered on a frequency-related scale [3]; more commonly, it is the quality that makes it possible to classify a sound as higher or lower [4].
A sound's pitch can be determined only if the sound can be distinguished from noise, which is unclear and unstable [5]. In music theory, an ordered sequence of notes, ascending or descending in frequency, forms a scale. Every scale has two components, the tonic and the interval pattern: the tonic is the starting note of the scale, and the interval pattern determines the type of scale [6]. The Western chromatic scale has 12 notes, written A, A#, B, C, C#, D, D#, E, F, F#, G and G#; in this paper they are represented as the numbers 0 to 11. The tempo of a song is measured in beats per minute (BPM) and directly reflects its speed. Tempo can influence the genre of a musical piece and the performer's interpretation. Most pieces have a specific tempo, which can range anywhere between 40 and 300 beats per minute. For example, a song could have a tempo of 80 BPM in 4/4 time. A typical shuffle in a music player does not consider tempo when picking the next song, so the following song could have a tempo of 140 BPM. Such jumps can cause the listener some discomfort, since successive songs may differ in genre or danceability, and they can also affect the listener's mood. Hence, in this paper, the parallel k-means clustering algorithm is used to cluster songs by pitch and tempo so as to reduce the variation in the transitions between songs. A parallel algorithm is used because the clusters must be formed quickly and efficiently, and it exploits the parallelism offered by MapReduce. In k-means, an unsupervised learning technique, the distance between a given point (vector) and every centroid is computed to find the nearest centroid.
The vector is then assigned to its closest centroid, and the same is done for all vectors. When this first iteration over all vectors is complete, each centroid is recalculated as the mean of the vectors assigned to it, and the second iteration is performed. The process continues until the centroids change little or not at all, and the final clusters are then used for analysis. In this algorithm, the distance calculation is the most time-consuming step. Since the distance between one vector and a centroid is independent of the distance between any other vector and a centroid, the distance calculations can be executed in parallel. Hence, this paper implements the Parallel K-Means algorithm, which parallelizes the distance calculation.

2. LITERATURE REVIEW
In serial execution, the instructions of a program run sequentially on a single processor: one instruction does not start until the previous one has completed. This is time consuming, especially on large datasets, and parallel processing was developed to address it. Its main advantage over serial processing is a shorter execution time: although it requires more hardware, it reduces the time a single program takes to finish. Parallel processing is possible wherever the execution of one instruction does not depend on the execution of another [7]. Hadoop's MapReduce is one framework that takes full advantage of parallel processing, spreading the work across many machines to process large volumes of data quickly and efficiently. The framework is built around two main classes, the mapper class and the reducer class. The mapper class has a map function that receives input data read from the Hadoop Distributed File System, processes it according to the given parameters and emits it as small chunks of data in the form of <key, value> pairs [1,8].
The degree of parallelism depends on the size of the data. The number of map tasks equals the number of input splits, so it grows or shrinks with the input; for a small dataset a few mappers are enough to process all the data, and more would be unnecessary [8]. The split size can also be tuned to the requirements of the job; by default it equals the HDFS block size configured for the cluster. Once the mappers have emitted their <key, value> pairs, the framework shuffles and sorts them by key and passes them to the reducer class, whose reduce function aggregates the values for each key [8]. To summarize: the mapper class receives data from the Hadoop Distributed File System and processes it into a list of <key, value> pairs; the framework shuffles and sorts this list so that each reducer receives a key together with its list of values; and the reducer aggregates these values to produce the final <key, value> pair or pairs [9]. G. Dunn [10], Music Preferences Based on Audio Features and Its Relation to Personality. The author studied 165 male and 189 female participants to find how music preferences, linked to objective audio features, relate to an individual's personality. The study applied principal component analysis to audio features extracted and computationally derived from the audio clips. The results showed that excitement-seeking was positively related to preference for music with a greater number of percussive events, and negatively related to preference for music with fewer percussive events [10]. Weizhong Zhao et al. [11], Parallel K-Means Clustering Based on MapReduce.
The authors adapted the k-means algorithm to the MapReduce framework as implemented in Hadoop, which substantially increases clustering speed and efficiency and makes the algorithm applicable to large-scale data. By assigning the <key, value> pairs appropriately, k-means can be executed in parallel, and the resulting PK-Means algorithm requires only one kind of MapReduce job, built from three functions. For the map function, the input dataset is stored in HDFS as a sequence file of <key, value> pairs; the dataset is split and distributed to all the mappers, so the distances are computed in parallel. According to the authors, distance calculation is the most time-consuming step, and since the distance between one pair of vectors is independent of the distance between any other pair, it can be executed in parallel. The combine function then combines the intermediate data from the same map task, and its output is the input of the reduce function. The reduce function sums the samples assigned to each cluster, counts them and computes the new centroids, which are used in the next iteration. This process continues until the centroids change little or not at all. The authors evaluate the speed-up, scale-up and size-up of the algorithm: performance improves as the dataset grows, and the algorithm scales well. The results show that it can process large datasets quickly and efficiently on commodity hardware [11]. To the best of the authors' knowledge, no study has clustered songs by the audio features of tempo and pitch to reduce the variation in transitions between songs. With a typical shuffle, the average variation can lie anywhere between 40 and 150 beats per minute.
This paper aims to reduce the average variation by 50%. Smooth transitions with minimal variation between songs should be more pleasing to the listener. Since the dataset is relatively large, the parallel k-means clustering approach is adopted.

3. PARALLEL K-MEANS CLUSTERING
The analysis requires a fast and effective clustering algorithm, and traditional serial k-means clustering would not produce the result within the required time. Since parallel k-means clustering achieves good speed-up, scale-up and size-up on the given data, this algorithm was implemented. It has three functions: map, combine and reduce. Algorithm 1 shows the map function [11].

Algorithm 1: Map
1. Construct the sample instance from value;
2. Assign Double.MAX_VALUE to the minimum distance mindis;
3. index = -1;
4. for i = 0 to centers.length - 1 do
     dis = ComputeDist(instance, centers[i]);
     if dis < mindis { mindis = dis; index = i; }
5. end for
6. Take index as key;
7. Construct value as a string comprising the values of the different dimensions;
8. output <key, value> pair;
9. End
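The map step of Algorithm 1 can be sketched in Python as below. This is a minimal, single-record illustration under stated assumptions, not the authors' Hadoop code: the record format (a comma-separated "pitch,tempo" string), the function names `compute_dist` and `pk_map`, and the example centroids are all hypothetical. `compute_dist` is the Euclidean distance used by the algorithm.

```python
import math
from typing import List, Sequence, Tuple


def compute_dist(instance: Sequence[float], center: Sequence[float]) -> float:
    """Euclidean distance between a sample and a centroid."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(instance, center)))


def pk_map(value: str, centers: List[Sequence[float]]) -> Tuple[int, str]:
    """Map step of Algorithm 1 for a single record.

    Assumption (hypothetical format): value is a "pitch,tempo" string.
    """
    # Step 1: construct the sample instance from the value
    instance = [float(field) for field in value.split(",")]
    # Steps 2-3: float("inf") plays the role of Double.MAX_VALUE
    min_dis = float("inf")
    index = -1
    # Step 4: find the closest centroid
    for i, center in enumerate(centers):
        dis = compute_dist(instance, center)
        if dis < min_dis:
            min_dis = dis
            index = i
    # Steps 6-8: the cluster index is the key, the dimensions form the value
    return index, ",".join(str(x) for x in instance)


centers = [(2.0, 90.0), (7.0, 140.0)]  # hypothetical (pitch, tempo) centroids
print(pk_map("3,95.5", centers))  # (0, '3.0,95.5')
```

Because the raw values are used, tempo (tens to hundreds of BPM) dominates the distance over pitch (0 to 11); scaling the two features would change that balance.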

In step 4, the center closest to the given sample is found, where ComputeDist returns the distance between the center centers[i] and instance. The <key, value> pairs output by the map function are then sent to the combine function, shown in Algorithm 2 [11].

Algorithm 2: Combine
1. Initialize one array to record the sum of the values of each dimension of the samples in the same cluster, i.e. in the value list V;
2. Initialize a counter num = 0 to record the number of samples in the same cluster;
3. while (V.hasNext()) {
     Construct the sample instance from V.next();
     Add the values of the different dimensions of instance to the array;
     num++;
4. }
5. Take key as key;
6. Construct value as a string comprising the sums of the different dimensions and num;
7. output <key, value> pair;
8. End

A combiner is used after each map task to combine the intermediate data from the same map task. The input of the reduce function is the data received from the combine function of each host, and the reduce function computes the final output, the new centroids. Algorithm 3 shows the reduce function [11].

Algorithm 3: Reduce
1. Initialize one array to record the sum of the values of each dimension of the samples in the same cluster, i.e. of the partial sums in the value list V;
2. Initialize a counter num = 0 to record the number of samples in the same cluster;
3. while (V.hasNext()) {
     Construct the sample instance from V.next();
     Add the values of the different dimensions of instance to the array;
     Add the sample count carried by instance to num;
4. }
5. Divide the entries of the array by num to get the new center's coordinates;
6. Take key as key;
7. Construct value as a string comprising the center's coordinates;
8. output <key, value> pair;
9. End

The final output of the reduce function provides the calculated centroids and their cluster numbers.
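To see how the map, combine and reduce functions fit together across iterations, the following in-memory simulation runs PK-Means without Hadoop. It is a sketch under assumptions, and every function name in it is illustrative: each input split is emulated as a round-robin slice of a Python list, each emulated map task gets its own combiner, grouping by key stands in for Hadoop's shuffle, a cluster that receives no samples keeps its previous centroid, and convergence is declared when no centroid moves more than a hypothetical tolerance `tol`. In the reduce step, `num` grows by the count carried in each combiner output rather than by one, because each value already summarizes several samples.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Center = Tuple[float, ...]


def pk_map(value: str, centers: Sequence[Center]) -> Tuple[int, str]:
    """Algorithm 1: key = index of the nearest centroid (Euclidean distance)."""
    instance = [float(x) for x in value.split(",")]
    # min() keeps the first index on ties, like the strict "<" in Algorithm 1
    index = min(range(len(centers)), key=lambda i: math.dist(instance, centers[i]))
    return index, value


def pk_combine(key: int, values: List[str]) -> Tuple[int, str]:
    """Algorithm 2: per-dimension partial sums followed by the sample count."""
    sums: List[float] = []
    num = 0
    for v in values:
        instance = [float(x) for x in v.split(",")]
        if not sums:
            sums = [0.0] * len(instance)
        for d, x in enumerate(instance):
            sums[d] += x
        num += 1  # each value here is a single sample
    return key, ",".join(str(s) for s in sums) + f",{num}"


def pk_reduce(key: int, values: List[str]) -> Tuple[int, Center]:
    """Algorithm 3: merge the partial sums and divide by the total count."""
    sums: List[float] = []
    num = 0
    for v in values:
        *parts, count = v.split(",")
        if not sums:
            sums = [0.0] * len(parts)
        for d, x in enumerate(parts):
            sums[d] += float(x)
        num += int(count)  # add the combiner's sample count, not 1
    return key, tuple(s / num for s in sums)


def run_iteration(records: List[str], centers: List[Center],
                  n_splits: int = 2) -> List[Center]:
    """One MapReduce job, emulated in memory with n_splits map tasks."""
    shuffled: Dict[int, List[str]] = defaultdict(list)
    for s in range(n_splits):
        map_output: Dict[int, List[str]] = defaultdict(list)
        for record in records[s::n_splits]:  # hypothetical round-robin split
            key, value = pk_map(record, centers)
            map_output[key].append(value)
        for key, values in map_output.items():  # combiner runs per map task
            ckey, cvalue = pk_combine(key, values)
            shuffled[ckey].append(cvalue)  # grouping by key stands in for shuffle
    new_centers = list(centers)  # assumption: empty clusters keep old centroid
    for key, values in shuffled.items():
        _, new_centers[key] = pk_reduce(key, values)
    return new_centers


def pk_means(records: List[str], centers: List[Center],
             tol: float = 1e-6, max_iter: int = 100) -> List[Center]:
    """Repeat jobs until no centroid moves more than tol (hypothetical threshold)."""
    for _ in range(max_iter):
        new_centers = run_iteration(records, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


songs = ["1,60", "2,62", "1,61", "10,180", "11,182", "10,181"]  # hypothetical "pitch,tempo"
print(pk_means(songs, [(0.0, 0.0), (12.0, 200.0)]))
```

Because the combiner emits one partial sum per cluster per map task, the amount of data shuffled to the reducers depends on k and the number of map tasks rather than on the number of songs.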
The <key, value> pairs output by the reduce function are used as the new centroids in the next iteration, and the process repeats until the centroids change little or not at all. The distance in the parallel k-means clustering algorithm is the Euclidean distance. Equation (1) gives the Euclidean distance [11] between two points p and q in n-dimensional space:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \qquad (1)$$

In this paper, the two-dimensional Euclidean distance is used [11], with pitch and tempo as the two dimensions. Equation (2) gives the same:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2} \qquad (2)$$

The parallel distance computation in the MapReduce framework is best explained with a diagram. Fig. 1 shows the parallel execution of the map tasks, each calculating mutually independent distances.

Figure 1 Block diagram of generic parallel distance calculation with pitch and tempo

The distance calculation is executed in parallel in the map tasks. This greatly increases the speed of the clustering algorithm, because distance calculation consumes the most time and each calculation can be performed independently. For each vector, the nearest centroid is found and its cluster number is assigned as the key; the value contains a string of the vector's dimensions. The reduce function recalculates each centroid as the average of all the vectors in its cluster, and its output is in the form of <key, value> pairs.

4. IMPLEMENTATION
4.1. Dataset Description
The Million Song Dataset [12] is a collection of audio features, including timbre (tonal quality), pitch, pitch confidence and tempo, together with other data for a million popular contemporary music tracks. The entire dataset is around 300 GB. It contains no audio, only the extracted audio features and metadata of the respective tracks. For research purposes, this experiment is run on the Million Song Dataset subset, a random sample of 10,000 songs (1% of the full dataset, about 1.8 GB). This is sufficient to demonstrate the reduction of variation in tempo with the parallel k-means clustering algorithm. The two audio features needed for the analysis, pitch and tempo, are extracted.
Table 1 gives the feature description.

Table 1 Extracted audio feature description
Feature | Type | Range
Pitch | Integer | 0 to 11
Tempo | Float | 0.0 to … BPM

The pitch is determined by the scale of a song. The 12 notes of music theory, A, A#, B, C, C#, D, D#, E, F, F#, G and G#, are each assigned an integer value from 0 to 11. Tempo is measured as the number of beats per minute; in the dataset its type is float, and it ranges from 0.0 to … beats per minute.

4.2. Experimental Results
First, consider the case where no clustering is applied before the shuffle. Seven songs are picked at random, and the variation in tempo between consecutive songs is calculated, as shown in Table 2.

Table 2 Variation calculation without clustering (rows: Song 1 to Song 7; columns: tempo in BPM, variation |Tempo(i+1) - Tempo(i)|; final row: average variation)

The calculated average variation is … bpm, and the aim of this analysis is to reduce it by 50%. Before clustering, the data points are plotted on a scatter plot, as seen in Fig. 2.

Figure 2 Pitch and tempo data points before clustering (scatter plot; axes: pitch, tempo)
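The tempo variation reported in Tables 2, 4 and 5 is the absolute tempo change between consecutive songs, |Tempo(i+1) - Tempo(i)|, averaged over the playlist. The sketch below computes it. It assumes the average is taken over the n - 1 transitions between n songs (the divisor is not preserved in the tables), and the function names and tempos in the example are hypothetical, not values from the dataset.

```python
from typing import List, Sequence


def transition_variations(tempos: Sequence[float]) -> List[float]:
    """|Tempo(i+1) - Tempo(i)| for each consecutive pair of songs."""
    return [abs(nxt - cur) for cur, nxt in zip(tempos, tempos[1:])]


def average_variation(tempos: Sequence[float]) -> float:
    """Mean tempo change per transition (assumed divisor: number of transitions)."""
    diffs = transition_variations(tempos)
    if not diffs:
        raise ValueError("at least two songs are needed")
    return sum(diffs) / len(diffs)


playlist = [80.0, 140.0, 100.0, 120.0]  # hypothetical tempos in BPM
print(transition_variations(playlist))  # [60.0, 40.0, 20.0]
print(average_variation(playlist))      # 40.0
```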

For the parallel k-means implementation, k is set to 4. After the algorithm forms clusters on the two parameters, the songs within each cluster show low variation in tempo and pitch. The centroids of the clusters are given in Table 3.

Table 3 Resultant cluster centroids after parallel k-means execution on pitch and tempo (columns: cluster number, pitch, tempo)

If 7 songs are picked at random from cluster 1, the average variation is as calculated in Table 4, with the corresponding scatter plot in Fig. 3.

Table 4 Variation calculation after shuffle on cluster 1 (rows: Song 1 to Song 7; columns: tempo in BPM, variation |Tempo(i+1) - Tempo(i)|; final row: average variation)

Figure 3 Scatter plot for cluster 1

The calculated average variation for the randomly selected songs in cluster 1 is … bpm. Samples of randomly selected songs were generated within each cluster, and the reduction in variation was always greater than 50%. Another example, with songs selected at random from cluster 3, is shown in Table 5, with the corresponding scatter plot in Fig. 4.

Table 5 Variation calculation after shuffle on cluster 3 (rows: Song 1 to Song 7; columns: tempo in BPM, variation |Tempo(i+1) - Tempo(i)|; final row: average variation)

Figure 4 Scatter plot for cluster 3

The calculated average variation for the randomly selected songs in cluster 3 is … bpm, substantially less than the variation calculated without clustering. The results were better than expected: the aim was a 50% reduction in variation, and this sample shows a reduction of more than 50%. The process of selecting songs at random within the same cluster was repeated, and in every case the recorded reduction met the aim of this paper. An aggregated view of all clusters in a single scatter plot is shown in Fig. 5.

Figure 5 Scatter plot for aggregated clusters (legend: Cluster 1, Cluster 2, Cluster 3, Cluster 4)
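The within-cluster shuffle and the reduction in variation used to compare runs can be sketched as follows. The sketch assumes the clusters are available as a mapping from cluster number to song identifiers, picks distinct songs at random from a single cluster, and defines the reduction in variation as (before - after) / before × 100, which is an assumed reading of the columns in Table 6. The function names, song identifiers and variation values are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def smart_shuffle(clusters: Dict[int, List[str]], cluster_id: int, n: int,
                  rng: random.Random) -> List[str]:
    """Pick up to n distinct songs at random, but only from one cluster."""
    pool = clusters[cluster_id]
    return rng.sample(pool, min(n, len(pool)))


def reduction_percent(before: float, after: float) -> float:
    """Percentage reduction in average variation (assumed definition)."""
    if before <= 0:
        raise ValueError("baseline variation must be positive")
    return (before - after) * 100.0 / before


def average_reduction(runs: Sequence[Tuple[float, float]]) -> float:
    """Mean of the per-run reductions; each run is a (before, after) pair."""
    return mean(reduction_percent(b, a) for b, a in runs)


clusters = {1: ["song_a", "song_b", "song_c", "song_d"], 3: ["song_x", "song_y"]}
print(smart_shuffle(clusters, 1, 3, random.Random(42)))  # 3 songs from cluster 1
print(reduction_percent(40.0, 10.0))  # 75.0
```

Averaging the per-run percentages, rather than computing one percentage from the averaged variations, matches the "average reduction" row of a per-run table, but the two can differ when runs have different baselines.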

To compare the reduction in variation, 10 runs were conducted after the analysis, as seen in Table 6. Any cluster can be used for this demonstration; this comparison uses results from cluster 1.

Table 6 Comparison of tempo variations for 7 randomly selected songs in the smart shuffle (columns: run number, variation before clustering, variation after clustering, reduction in variation %; rows: Run 1 to Run 10; final row: average reduction in variation, 88.94%)

The average reduction in variation is 88.94%. As Fig. 5 shows, songs can be selected either horizontally or vertically. If songs are selected horizontally, successive songs have similar tempo and increasing pitch; if they are selected vertically, successive songs have similar pitch and increasing tempo.

5. CONCLUSION
The aim of this paper was to reduce the average variation by 50%. Over the 10 runs, the average reduction in variation was 88.94%, so the aim was met. Because the typical shuffle has a very high variation, it can disrupt the listener's mood and sense of continuity. The new smart shuffle reduces the variation by an average of 88.94%, which should be more pleasing to the listener. Since its parameters can be changed to meet different application requirements, there is ample opportunity to develop the smart shuffle further. It can be incorporated into any music player application that seeks to enhance its features or to reach a specific, musically inclined audience.

REFERENCES
[1] J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, 51(1), 2008.
[2] D. Borthakur, The Hadoop distributed file system: Architecture and design, Hadoop Project Website, 2007, 21.
[3] A. Klapuri, Introduction to music transcription, in Signal Processing Methods for Music Transcription, Boston: Springer, 2006.
[4] C. J. Plack, A. J. Oxenham and R. R. Fay, Pitch: Neural Coding and Perception, New York: Springer, 2005, pp. 1-6.
[5] D. Michael Randel, The Harvard Dictionary of Music, Harvard University Press.
[6] M. John Hewitt, Musical Scales of the World, Note Tree.
[7] G. D. Logan, Parallel and serial processing, Stevens' Handbook of Experimental Psychology, 2002.
[8] Apache Software Foundation, MapReduce Tutorial.
[9] J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Proc. of Operating Systems Design and Implementation, San Francisco, CA, 2004.
[10] G. Dunn, Music preferences based on audio features and its relation to personality, ESCOM 2009: 7th Triennial Conference of European Society for the Cognitive Sciences of Music.
[11] W. Zhao, H. Ma and Q. He, Parallel k-means clustering based on MapReduce, IEEE International Conference on Cloud Computing, Berlin: Springer, 2009.
[12] Million Song Dataset, official website by T. Bertin-Mahieux.
[13] C. Das, S. Bose, M. Chattopadhyay and S. Chattopadhyay, A Novel Distance Based Modified K-Means Clustering Algorithm for Estimation of Missing Values in Micro-Array Gene Expression Data, International Journal of Information Technology & Management Information System (IJITMIS), 5(3), September-December 2014.
[14] D. Khurana and M. P. S. Bhatia, Dynamic Approach to K-Means Clustering Algorithm, International Journal of Computer Engineering & Technology (IJCET), 4(3), May-June 2013.


More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

A brief history on Hadoop

A brief history on Hadoop Hadoop Basics A brief history on Hadoop 2003 - Google launches project Nutch to handle billions of searches and indexing millions of web pages. Oct 2003 - Google releases papers with GFS (Google File System)

More information

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

Accelerating Unique Strategy for Centroid Priming in K-Means Clustering IJIRST International Journal for Innovative Research in Science & Technology Volume 3 Issue 07 December 2016 ISSN (online): 2349-6010 Accelerating Unique Strategy for Centroid Priming in K-Means Clustering

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

Mounica B, Aditya Srivastava, Md. Faisal Alam

Mounica B, Aditya Srivastava, Md. Faisal Alam International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 3 ISSN : 2456-3307 Clustering of large datasets using Hadoop Ecosystem

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DISTRIBUTED FRAMEWORK FOR DATA MINING AS A SERVICE ON PRIVATE CLOUD RUCHA V. JAMNEKAR

More information

An improved MapReduce Design of Kmeans for clustering very large datasets

An improved MapReduce Design of Kmeans for clustering very large datasets An improved MapReduce Design of Kmeans for clustering very large datasets Amira Boukhdhir Laboratoire SOlE Higher Institute of management Tunis Tunis, Tunisia Boukhdhir _ amira@yahoo.fr Oussama Lachiheb

More information

DATA POOL: A STRUCTURE TO STORE VOLUMINOUS DATA

DATA POOL: A STRUCTURE TO STORE VOLUMINOUS DATA International Journal of Computer Engineering & Technology (IJCET) Volume 9, Issue 5, September-October 2018, pp. 167 180, Article ID: IJCET_09_05_020 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=5

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Review on Managing RDF Graph Using MapReduce

Review on Managing RDF Graph Using MapReduce Review on Managing RDF Graph Using MapReduce 1 Hetal K. Makavana, 2 Prof. Ashutosh A. Abhangi 1 M.E. Computer Engineering, 2 Assistant Professor Noble Group of Institutions Junagadh, India Abstract solution

More information

A Review of K-mean Algorithm

A Review of K-mean Algorithm A Review of K-mean Algorithm Jyoti Yadav #1, Monika Sharma *2 1 PG Student, CSE Department, M.D.U Rohtak, Haryana, India 2 Assistant Professor, IT Department, M.D.U Rohtak, Haryana, India Abstract Cluster

More information

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters. Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Research Article Apriori Association Rule Algorithms using VMware Environment

Research Article Apriori Association Rule Algorithms using VMware Environment Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,

More information

Survey on MapReduce Scheduling Algorithms

Survey on MapReduce Scheduling Algorithms Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used

More information

Efficient Algorithm for Frequent Itemset Generation in Big Data

Efficient Algorithm for Frequent Itemset Generation in Big Data Efficient Algorithm for Frequent Itemset Generation in Big Data Anbumalar Smilin V, Siddique Ibrahim S.P, Dr.M.Sivabalakrishnan P.G. Student, Department of Computer Science and Engineering, Kumaraguru

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

Batch Inherence of Map Reduce Framework

Batch Inherence of Map Reduce Framework Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287

More information

Document Clustering with Map Reduce using Hadoop Framework

Document Clustering with Map Reduce using Hadoop Framework Document Clustering with Map Reduce using Hadoop Framework Satish Muppidi* Department of IT, GMRIT, Rajam, AP, India msatishmtech@gmail.com M. Ramakrishna Murty Department of CSE GMRIT, Rajam, AP, India

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

University of Waterloo. Storing Directed Acyclic Graphs in Relational Databases

University of Waterloo. Storing Directed Acyclic Graphs in Relational Databases University of Waterloo Software Engineering Storing Directed Acyclic Graphs in Relational Databases Spotify USA Inc New York, NY, USA Prepared by Soheil Koushan Student ID: 20523416 User ID: skoushan 4A

More information

Enhancing K-means Clustering Algorithm with Improved Initial Center

Enhancing K-means Clustering Algorithm with Improved Initial Center Enhancing K-means Clustering Algorithm with Improved Initial Center Madhu Yedla #1, Srinivasa Rao Pathakota #2, T M Srinivasa #3 # Department of Computer Science and Engineering, National Institute of

More information

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

Implementation of Aggregation of Map and Reduce Function for Performance Improvisation 2016 IJSRSET Volume 2 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Implementation of Aggregation of Map and Reduce Function for Performance Improvisation

More information

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014 MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)

More information

INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES

INDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES Al-Badarneh et al. Special Issue Volume 2 Issue 1, pp. 200-213 Date of Publication: 19 th December, 2016 DOI-https://dx.doi.org/10.20319/mijst.2016.s21.200213 INDEX-BASED JOIN IN MAPREDUCE USING HADOOP

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 11, November 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

TDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran

TDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran TDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran M-Tech Scholar, Department of Computer Science and Engineering, SRM University, India Assistant Professor,

More information

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points

Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Efficiency of k-means and K-Medoids Algorithms for Clustering Arbitrary Data Points Dr. T. VELMURUGAN Associate professor, PG and Research Department of Computer Science, D.G.Vaishnav College, Chennai-600106,

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Two-layer Distance Scheme in Matching Engine for Query by Humming System

Two-layer Distance Scheme in Matching Engine for Query by Humming System Two-layer Distance Scheme in Matching Engine for Query by Humming System Feng Zhang, Yan Song, Lirong Dai, Renhua Wang University of Science and Technology of China, iflytek Speech Lab, Hefei zhangf@ustc.edu,

More information

Storm Identification in the Rainfall Data Using Singular Value Decomposition and K- Nearest Neighbour Classification

Storm Identification in the Rainfall Data Using Singular Value Decomposition and K- Nearest Neighbour Classification Storm Identification in the Rainfall Data Using Singular Value Decomposition and K- Nearest Neighbour Classification Manoj Praphakar.T 1, Shabariram C.P 2 P.G. Student, Department of Computer Science Engineering,

More information

PRE HADOOP AND POST HADOOP VALIDATIONS FOR BIG DATA

PRE HADOOP AND POST HADOOP VALIDATIONS FOR BIG DATA International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 10, October 2017, pp. 608 616, Article ID: IJMET_08_10_066 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=10

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

732A54/TDDE31 Big Data Analytics

732A54/TDDE31 Big Data Analytics 732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks

More information

Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved

Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved Implementation of K-Means Clustering Algorithm in Hadoop Framework Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India Abstract Drastic growth

More information

Introduction to MapReduce (cont.)

Introduction to MapReduce (cont.) Introduction to MapReduce (cont.) Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com USC INF 553 Foundations and Applications of Data Mining (Fall 2018) 2 MapReduce: Summary USC INF 553 Foundations

More information

A Survey on Comparative Analysis of Big Data Tools

A Survey on Comparative Analysis of Big Data Tools Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

CLUSTERING is one major task of exploratory data. Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset

CLUSTERING is one major task of exploratory data. Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset 1 Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset Jiawei Yuan, Member, IEEE, Yifan Tian, Student Member, IEEE Abstract Clustering techniques have been widely adopted

More information

Enhancing the Efficiency of Radix Sort by Using Clustering Mechanism

Enhancing the Efficiency of Radix Sort by Using Clustering Mechanism Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

CHAPTER 8 Multimedia Information Retrieval

CHAPTER 8 Multimedia Information Retrieval CHAPTER 8 Multimedia Information Retrieval Introduction Text has been the predominant medium for the communication of information. With the availability of better computing capabilities such as availability

More information

Performance Analysis of Hadoop Application For Heterogeneous Systems

Performance Analysis of Hadoop Application For Heterogeneous Systems IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), PP 30-34 www.iosrjournals.org Performance Analysis of Hadoop Application

More information

A Review Approach for Big Data and Hadoop Technology

A Review Approach for Big Data and Hadoop Technology International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Review Approach for Big Data and Hadoop Technology Prof. Ghanshyam Dhomse

More information

The amount of data increases every day Some numbers ( 2012):

The amount of data increases every day Some numbers ( 2012): 1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect

More information

REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK

REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK 1 Dr.R.Kousalya, 2 T.Sindhupriya 1 Research Supervisor, Professor & Head, Department of Computer Applications, Dr.N.G.P Arts and Science College, Coimbatore

More information

2/26/2017. The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012): The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards

Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Workshop W14 - Audio Gets Smart: Semantic Audio Analysis & Metadata Standards Jürgen Herre for Integrated Circuits (FhG-IIS) Erlangen, Germany Jürgen Herre, hrr@iis.fhg.de Page 1 Overview Extracting meaning

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at  ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 341 348 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Parallel Approach

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information

Data Analytics Framework and Methodology for WhatsApp Chats

Data Analytics Framework and Methodology for WhatsApp Chats Data Analytics Framework and Methodology for WhatsApp Chats Transliteration of Thanglish and Short WhatsApp Messages P. Sudhandradevi Department of Computer Applications Bharathiar University Coimbatore,

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

The MapReduce Framework

The MapReduce Framework The MapReduce Framework In Partial fulfilment of the requirements for course CMPT 816 Presented by: Ahmed Abdel Moamen Agents Lab Overview MapReduce was firstly introduced by Google on 2004. MapReduce

More information

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Dr. Chatti Subba Lakshmi

Dr. Chatti Subba Lakshmi International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Case Study on Static k-means ering Algorithm Dr.

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Redefining and Enhancing K-means Algorithm

Redefining and Enhancing K-means Algorithm Redefining and Enhancing K-means Algorithm Nimrat Kaur Sidhu 1, Rajneet kaur 2 Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India 1 Assistant Professor,

More information

Hadoop/MapReduce Computing Paradigm

Hadoop/MapReduce Computing Paradigm Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications

More information

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM Saroj 1, Ms. Kavita2 1 Student of Masters of Technology, 2 Assistant Professor Department of Computer Science and Engineering JCDM college

More information

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming

Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming Music Signal Spotting Retrieval by a Humming Query Using Start Frame Feature Dependent Continuous Dynamic Programming Takuichi Nishimura Real World Computing Partnership / National Institute of Advanced

More information

Distributed Face Recognition Using Hadoop

Distributed Face Recognition Using Hadoop Distributed Face Recognition Using Hadoop A. Thorat, V. Malhotra, S. Narvekar and A. Joshi Dept. of Computer Engineering and IT College of Engineering, Pune {abhishekthorat02@gmail.com, vinayak.malhotra20@gmail.com,

More information

An Optimization Algorithm of Selecting Initial Clustering Center in K means

An Optimization Algorithm of Selecting Initial Clustering Center in K means 2nd International Conference on Machinery, Electronics and Control Simulation (MECS 2017) An Optimization Algorithm of Selecting Initial Clustering Center in K means Tianhan Gao1, a, Xue Kong2, b,* 1 School

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

A Naïve Soft Computing based Approach for Gene Expression Data Analysis

A Naïve Soft Computing based Approach for Gene Expression Data Analysis Available online at www.sciencedirect.com Procedia Engineering 38 (2012 ) 2124 2128 International Conference on Modeling Optimization and Computing (ICMOC-2012) A Naïve Soft Computing based Approach for

More information

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web Graph Algorithms using Map-Reduce Graphs are ubiquitous in modern society. Some

More information

A REVIEW PAPER ON BIG DATA ANALYTICS

A REVIEW PAPER ON BIG DATA ANALYTICS A REVIEW PAPER ON BIG DATA ANALYTICS Kirti Bhatia 1, Lalit 2 1 HOD, Department of Computer Science, SKITM Bahadurgarh Haryana, India bhatia.kirti.it@gmail.com 2 M Tech 4th sem SKITM Bahadurgarh, Haryana,

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model

More information

URBAN GROWTH MODELLING USING CELLULAR AUTOMATA BASED CLASSIFIER

URBAN GROWTH MODELLING USING CELLULAR AUTOMATA BASED CLASSIFIER International Journal of Civil Engineering and Technology (IJCIET) Volume 8, Issue 12, December 2017, pp. 302-308, Article ID: IJCIET_08_12_035 Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=8&itype=12

More information

PARTICLE SWARM OPTIMIZATION FOR MULTIDIMENSIONAL CLUSTERING OF NATURAL LANGUAGE DATA

PARTICLE SWARM OPTIMIZATION FOR MULTIDIMENSIONAL CLUSTERING OF NATURAL LANGUAGE DATA International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 11, November 2018, pp. 139 149, Article ID: IJCIET_09_11_014 Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=11

More information

Mining Large-Scale Music Data Sets

Mining Large-Scale Music Data Sets Mining Large-Scale Music Data Sets Dan Ellis & Thierry Bertin-Mahieux Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA {dpwe,thierry}@ee.columbia.edu

More information

Data Analytics for. Transmission Expansion Planning. Andrés Ramos. January Estadística II. Transmission Expansion Planning GITI/GITT

Data Analytics for. Transmission Expansion Planning. Andrés Ramos. January Estadística II. Transmission Expansion Planning GITI/GITT Data Analytics for Andrés Ramos January 2018 1 1 Introduction 2 Definition Determine which lines and transformers and when to build optimizing total investment and operation costs 3 Challenges for TEP

More information