14th Iran Media Technology Conference. Gathered & presented by H. Shah-Hosseini, 12 Dec.
Topics

Big data:
- Big data and its four V's: volume, velocity, variety, and veracity
- Another two V's for big data: valence and value

Data science:
- Data science and its five P's
- The data science process: acquire, prepare, analyze, report, act
- More on analysis (data mining): classification, regression, clustering, association analysis (rules), graph analytics

Hadoop:
- Hadoop Distributed File System (HDFS)
- Hadoop YARN
- Hadoop MapReduce
Big data: Volume
Big data: Volume (2)
Volume refers to the sheer size and exponential growth of data. Every minute:
- 204 million emails are sent
- Facebook: 200,000 photos are uploaded and 1.8 million likes are given
- YouTube: 1.3 million videos are viewed and 72 hours of video are uploaded
Challenges: storage, access, and processing.
Big data: Velocity
Big data: Velocity (2)
Velocity is the speed at which data is created, and the corresponding need for speed in storing and analyzing it. Big data demands real-time action: late decisions mean missed opportunities, customers lost while visiting your online store, or loss of lives in healthcare or disaster response. Thus, real-time processing is preferred over batch processing.
Big data: Variety
Big data: Variety (2)
Variety is related to the complexity of data structure. Axes of data variety:
- Structural variety: formats and models
- Media variety: the medium in which data is delivered
- Semantic variety: how to interpret and operate on the data
- Availability variations: real-time? intermittent?
There is variety even within a single email:
- Sender, receiver, date: well-structured
- Body of the text: free text
- Attachments: multimedia
- Who-sends-to-whom: network data
- A current email cannot reference a future one: semantics
- Real-time or intermittent delivery: availability
Big data: Veracity
Big data: Veracity (2)
Veracity refers to the quality of data:
- Accuracy: data can be noisy, imprecise, biased, or full of uncertainty
- Reliability of the source: where the data comes from and how it was generated also matter
Example: ordinary citizens volunteer to report when they or someone in their family are experiencing symptoms of influenza-like illness (ILI). Flu Near You, a system run by the HealthMap initiative cofounded by Brownstein at Boston Children's Hospital, was launched in 2011 and now has 46,000 participants, covering 70,000 people.
Big data: Characteristics: the 4+2 V's of big data
- Valence refers to the connectedness of big data, that is, how interconnected the data items are. As the connections among the data grow, the complexity of the analysis increases.
- Value is the benefit we get from big data.
Data science and its five components (five P's)
Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured (from Wikipedia). It is a continuation of data analysis fields such as statistics, machine learning, data mining, and predictive analytics, and is similar to Knowledge Discovery in Databases (KDD).
The five P's of data science are people, purpose, process, platforms, and programmability. For example:
- People: the data scientists, as a team
- Purpose: the big data challenge, for example, the rate of spread and the direction of a wildfire
Data scientist skills: the top five
The top five skills needed by a data scientist (the first two are technical):
1) Programming: the ability to analyze large datasets and create tools to do better data science
2) Quantitative analysis: experimental design and analysis, modeling of complex economic or growth systems (e.g., churn models), machine learning
3) Product intuition: generating hypotheses, defining metrics, debugging analyses
4) Communication: communicating insights, data visualization and presentation, general communication
5) Teamwork: being selfless, constant iteration, sharing knowledge with others
Data science process
The data science process includes five steps:
1) Acquire: identify datasets and retrieve them
2) Prepare: composed of two sub-steps, explore and preprocess
3) Analyze: select analytical techniques and build models
4) Report: evaluate the analytical results and create reports
5) Act: apply the results
The five steps can be repeated as needed to serve the original purpose.
Step 1: Acquire
Determine what data is available and acquire it. For this purpose, we should identify suitable data and make use of all data relevant to our problem. Data comes from many different sources, structured or unstructured, with different velocities, and different technologies are needed to access it.
Step 1: Acquire: example: traditional databases
We use SQL and query browsers to acquire data from these databases. Here, the data is structured.
Step 1: Acquire: example: files (such as text files and Excel spreadsheets)
We often use scripting languages to acquire data from files.
Step 1: Acquire: example: websites
There are a variety of formats and services that webpages can use based on W3C standards:
- Formats include XML and HTML, in which webpages are written
- Web services are also hosted by websites to give programmatic access to their data
Step 1: Acquire: example: NoSQL storage
NoSQL storage systems are used to manage a variety of data types, including big data. In these systems, data is not stored as rows and columns. NoSQL storage systems provide APIs to access their data, and most also provide web services (such as REST) to interface with it.
Step 1: Acquire: a use case: wildfire
- Sensor data from weather stations is stored in relational databases, so we can use SQL to access it; this data can be used to model the fire.
- Real-time weather station data arrives via a WebSocket service. It is processed and compared against the patterns found by our model to assess the situation.
- Twitter feeds can be retrieved via hashtags related to any fire occurring near the region of interest. People's tweets can be used for sentiment analysis to measure how they feel (fear, anger, or indifference to the fire), which in turn helps measure the urgency of the fire.
A similar scenario can be designed for earthquakes.
Step 2a: Exploring data
The first step after acquiring data is to explore it, that is, to understand it. In the explore step we look for things such as correlations, outliers, and general trends; without this step, we cannot use the data effectively.
- Correlation graphs show the dependencies between variables in the data.
- Graphing the general trend shows whether there is a consistent direction in which the variables are moving, such as sales prices going up or down.
- An outlier is a data point that is distant from the other data points. Outliers must be detected and handled carefully.
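The exploration checks above can be sketched in a few lines of plain Python. This is a hypothetical sketch: the sample ages and the z-score cutoff of 2 are illustrative assumptions, not from the slides.

```python
def describe(values):
    """Summary statistics used to sanity-check a field during exploration."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return {"min": min(values), "max": max(values), "mean": mean, "std": std}

def outliers(values, z=2.0):
    """Flag points more than z standard deviations from the mean."""
    s = describe(values)
    return [x for x in values if abs(x - s["mean"]) > z * s["std"]]

ages = [21, 25, 30, 28, 24, -3]       # a negative age signals bad data
print(describe(ages)["min"])          # -3: something is wrong in this field
print(outliers(ages))
```

Here the impossible value shows up both in the summary (a negative minimum for age) and as a statistical outlier.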
Step 2a: Exploring data (2)
We may also use statistics to describe our data with numerical values. These numbers give us an idea of the nature of the data. For example, a negative minimum in an age field indicates that something is wrong in our data.
Step 2a: Exploring data (3)
Visualization provides a quick look at the data in this preliminary analysis step:
- Heat maps quickly show where the hot spots are.
- A histogram shows the distribution of the data and may reveal an unusual spread.
Step 2b: Preprocess
Raw data is rarely in the format we need. In the preprocess step, we first clean the data, then transform it to make it suitable for analysis. Real data is messy; data quality issues include:
- Inconsistent values
- Missing values
- Duplicate records
- Invalid data (such as a postal code that is too long)
- Outliers (values very different from the rest of the data)
We need to detect and correct these quality issues.
Step 2b: Preprocess: addressing data quality issues: cleaning the data
To handle incomplete or incorrect data, we need domain knowledge, such as knowledge of the application, how the data was collected, the users of the application, etc.
Step 2b: Preprocess: data munging
Here, we manipulate the cleaned data into the format needed for the analysis. Other names: data wrangling, data preprocessing. Operations used in this step include feature selection, scaling, dimensionality reduction, transformation, and manipulation. Data preparation is essential for meaningful analysis.
Step 2b: Preprocess: data munging: scaling
Scaling means changing the range of values so that they fall within a specified range. Scaling prevents large values from dominating the results of the analysis. To scale to [0, 1]: x_new = (x - min(x)) / (max(x) - min(x)). Alternatively, the data can be made zero-mean and unit-variance.
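Both scalings from this slide can be written directly in Python; the sample ages are made up for illustration.

```python
def min_max_scale(values):
    """Scale a list of numbers into [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Scale to zero mean and unit variance (z-scores)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / std for x in values]

ages = [18, 25, 40, 60, 90]
print(min_max_scale(ages))   # first element 0.0, last 1.0
print(standardize(ages))     # values centered around 0
```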
Step 2b: Preprocess: data munging: transformation
We can also transform data to make it better suited for analysis, for example, to reduce noise or variability. Aggregation (an averaging filter) is such a transformation: it reduces detail and variability. For example, daily sales figures have many irregular changes; aggregating them into weekly or monthly figures produces smoother data. Point: such a transformation removes detail from the data, so care must be taken if detail is needed for the application.
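A minimal sketch of the aggregation idea, assuming made-up daily sales figures and a 7-day window:

```python
def aggregate(values, window):
    """Aggregate consecutive samples by averaging, e.g. daily -> weekly figures."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

daily_sales = [10, 12, 8, 11, 30, 9, 10, 11, 12, 10, 9, 31, 10, 9]
weekly = aggregate(daily_sales, 7)
print(weekly)  # two weekly averages, much smoother than the daily series
```

The irregular daily spikes (30 and 31) are smoothed away in the weekly figures, which is exactly the detail-versus-smoothness trade-off noted above.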
Step 2b: Preprocess: data munging: transformation: denoising
An example of denoising is a neural-network-based autoencoder built with Keras and Python.
Step 2b: Preprocess: data munging: feature selection
Feature selection is the process of selecting a subset of relevant features (variables, predictors) that are useful for building a good predictor (model). Feature selection can involve:
- Removing irrelevant or redundant features, which makes the analysis easier
- Combining features
- Creating new features
For example, if two features are highly correlated, one of them can be removed without negatively affecting the analysis. Feature selection algorithms broadly fall into three categories: filter, wrapper, and embedded models.
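The "highly correlated features are redundant" check can be sketched with a Pearson correlation in plain Python. The feature values below are invented: height in cm versus inches is an intentionally redundant pair.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]   # same information in other units
shoe_size = [36, 44, 38, 43, 41]

# |r| near 1 marks a redundant pair: one of the two features can be dropped
print(pearson(height_cm, height_in))   # close to 1.0
print(pearson(height_cm, shoe_size))   # much weaker
```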
Step 2b: Preprocess: data munging: feature selection: wrapper approach
Example: wrapper feature selection implemented in KNIME using a Naive Bayes classifier.
Step 2b: Preprocess: data munging: dimensionality reduction
Dimensionality reduction is useful when each record in the dataset has a large number of dimensions (features). It involves finding a smaller set of dimensions that captures most of the variation in the data. By doing this, we remove irrelevant features and reduce the number of features, which leads to simpler analysis. It can also be used for data compression. In the figure, a cat image is represented by 1, 2, and 5 components instead of 100 pixels, and MDS maps 3D data to 2D for visualization.
Step 2b: Preprocess: data munging: transformation into feature space
Example: using PCA (Principal Component Analysis) to transform data into its eigenspace.
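A hypothetical PCA sketch with NumPy on synthetic correlated data; the data itself is made up, only the center-then-eigendecompose recipe matters.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the second feature is roughly 2x the first, plus noise
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

# Center the data, then eigendecompose its covariance matrix
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]           # columns are the principal axes

projected = centered @ components        # the data expressed in eigenspace
print(eigvals[order])                    # first component dominates
```

Because the two features are almost linearly dependent, nearly all the variance lands on the first principal component, so a single dimension suffices.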
Step 2b: Preprocess: data munging: data manipulation
Raw data often has to be manipulated into the correct format for analysis. For example, from samples recording daily changes in stock prices, we may be interested in the price changes of a particular market segment, such as real estate or healthcare, which has to be extracted from the data. This requires determining which stocks belong to which market segment, then grouping them together and perhaps computing the mean, range, or standard deviation for each group. (In the figure, each block shows a record in the dataset.)
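The group-then-summarize step can be sketched as follows; the tickers, segments, and price changes are hypothetical.

```python
from collections import defaultdict

# Hypothetical daily price changes, each record tagged with a market segment
records = [
    ("ACME-RE", "real_estate", +0.8),
    ("HOMES",   "real_estate", -0.2),
    ("MEDCO",   "healthcare", +1.1),
    ("PHARMA",  "healthcare", +0.5),
]

by_segment = defaultdict(list)
for ticker, segment, change in records:
    by_segment[segment].append(change)

means = {seg: sum(v) / len(v) for seg, v in by_segment.items()}
print(means)  # one mean price change per market segment
```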
Step 3: Analyze
We build a model from the input data; the model generates the output data. Since there are different types of problems, there are different techniques for analysis, such as:
- Classification
- Regression
- Clustering
- Association analysis (rules)
- Graph analytics (graph mining)
- Recommendation systems
Model building: input data -> analysis technique -> model -> output data
Step 3: Analyze: classification
Classification predicts the category of the input data. If there are only two categories, we call it binary classification. For handwritten digits, how many categories do we have? Another example: a spam filter for emails has two classes, spam vs. non-spam, so it is a binary classification problem.
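As a toy sketch of a binary classifier, here is a 1-nearest-neighbor spam classifier. The two features (number of links, number of capitalized words) and the training points are invented for illustration, not taken from the slides.

```python
def nearest_neighbor_classify(point, training):
    """A minimal 1-nearest-neighbor binary classifier."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Label of the closest training example wins
    return min(training, key=lambda item: dist(point, item[0]))[1]

# Toy training set: (feature vector, label); features could be
# (number of links, number of all-caps words) in an email
training = [((0, 1), "nonspam"), ((1, 0), "nonspam"),
            ((8, 9), "spam"), ((9, 7), "spam")]

print(nearest_neighbor_classify((7, 8), training))  # spam
```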
Step 3: Analyze: classification (2)
Example: deep learning for image classification, using a deep learning model trained on ImageNet.
Step 3: Analyze: regression
Regression is used when we have to predict a numeric value instead of a category, for example, to predict the price of a stock, gold, or oil, or to approximate a function from its data points. Example below: linear regression.
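Simple linear regression has a closed-form least-squares solution, sketched here on made-up points that lie exactly on a line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# Points lying exactly on y = 3x + 1
a, b = fit_line([0, 1, 2, 3], [1, 4, 7, 10])
print(a, b)  # slope 3.0, intercept 1.0
```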
Step 3: Analyze: clustering
In clustering, the goal is to organize similar items into groups, for example, a clustering with three clusters. Example: customer segmentation. The figure shows clustering with DBSCAN in KNIME; gray points are considered noise by DBSCAN.
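K-means is the classic clustering algorithm (the slide's DBSCAN example comes from KNIME; this is a different, simpler technique). A minimal sketch on two obvious made-up groups of 2-D points:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to the nearest centroid, recompute."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # New centroid = mean of its cluster (keep the old one if empty)
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c
                     else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, 2)
print(clusters)  # the two spatial groups are separated
```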
Step 3: Analyze: association
In association analysis, the goal is to find rules that capture associations between items. An example is market-basket analysis, in which we want to discover which items frequently appear together in baskets. An association rule has the form i -> j, where i is a set of items, i = {i1, i2, ..., ik}, and j is a single item. The rule means that if all the items in i appear in some basket, then j is likely to appear in that basket as well. Point: frequent itemsets are computed first in order to derive association rules.
Step 3: Analyze: association (2)
Example: consider eight baskets over the itemset {b, c, m, p, j}:
B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}
If i is a set of items, the support of i is the number of baskets of which i is a subset, and confidence(i -> j) = support(i with j added) / support(i).
For the rule {m, b} -> c: {m, b, c} appears in B1 and B6, while {m, b} appears in B1, B3, B5, and B6, so confidence = 2/4 = 50%.
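The support and confidence computation for this exact example can be checked in a few lines of Python:

```python
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset):
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def confidence(i, j):
    """confidence(i -> j) = support(i plus j) / support(i)."""
    return support(i | {j}) / support(i)

print(support({"m", "b"}))           # 4 (baskets B1, B3, B5, B6)
print(confidence({"m", "b"}, "c"))   # 0.5
```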
Step 3: Analyze: graph analytics
When data can be transformed into a graph, we can use the graph structure to find connections between entities. For example, graph analytics can be used to explore the spread of a disease or an epidemic by analyzing hospital or doctors' records, or by analyzing social networks related to a specific region. Example: community detection.
Step 3: Analyze: node importance: PageRank, degree centrality
In the figure, PageRank scores have been normalized to sum to 100, and the importance of each node is visualized by its size: more in-links lead to more importance. Also shown: degree centrality for the karate club graph.
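A power-iteration PageRank sketch on a tiny hypothetical link graph (the four-page graph is invented; it just demonstrates that the page with the most in-links ends up most important):

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {node: [out-links]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(graph[v])
            for u in graph[v]:
                new[u] += share   # each page passes rank along its out-links
        rank = new
    return rank

# Page "d" has three in-links, the others at most one
graph = {"a": ["d"], "b": ["d"], "c": ["d"], "d": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # d
```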
Step 3: Analyze: recommendation systems
There are different approaches to recommendation systems:
- Content-based
- Collaborative filtering
- Latent factors
Example: book recommendation.
Step 3: Analyze: modelling
Modelling includes selecting the technique, building the model, and validating the model. Validation depends on the technique used; for example, we may apply the model to new data samples it has not seen before. For classification, we compare the predicted values with the correct values in the test set.
Step 4: Reporting
Reporting communicates your insights, and it depends on the audience. The first thing to do is determine what to present, by answering the following questions:
- What are the main results (the punchline)?
- What added value do these results provide?
- How do the results compare to the success criteria determined at the beginning of the project?
The results may be puzzling or counter to what you were hoping to find; you must report them too.
Step 4: Reporting: visualization tools
Commonly used visualization tools include Python, KNIME, R, and Tableau (the first three are open source).
Step 5: Act: turning insights into action
Determine what actions should be taken. For example:
- Is there something in the process that should be changed to remove bottlenecks?
- Is there data that should be added to the application to make it more accurate?
- Should we segment our population into more well-defined groups?
And how should the actions be implemented?
- What should be added to the process?
- How should it be automated?
- Stakeholders need to be identified and involved.
Step 5: Act: evaluation
We need to assess the impact of the action by monitoring and measuring its effect on the process or the application, which finally leads to evaluation. Evaluation determines the next steps: should we revisit some data? We also need to determine which actions should happen in real time and automate them.
Hadoop: What is it?
Apache Hadoop is an open-source software framework for the storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop was created by Doug Cutting and Mike Cafarella in 2005; Cutting named the project after his son's toy elephant. Some of Hadoop's main features:
- Moving computation to the data instead of moving data to the computation
- Scalability
- Reliability
- A new kind of analysis: simple algorithms on large data
Hadoop works on the basic assumption that hardware failures are common; these failures are taken care of by the Hadoop framework.
Hadoop is layered
The figure shows a layered example. For instance, Storm, Spark, and Flink can be used for real-time and in-memory processing.
Hadoop: HDFS
HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework, derived from the Google File System. It is intended for large files and batch inserts (write once, read many times).
From Hadoop 1.0 to Hadoop 2.0: YARN
YARN was introduced to separate resource management from data processing. YARN schedules applications in order to prioritize tasks and maintains big data analytics systems. As one part of a greater architecture, YARN aggregates and sorts data to conduct specific queries for data retrieval. It helps allocate resources to particular applications and manages other kinds of resource monitoring tasks.
Hadoop Ecosystem
The Hadoop ecosystem refers to the various components of the Apache Hadoop software library: a set of tools and accessories that address particular needs in processing big data. In other words, a set of different modules interacting together forms the Hadoop ecosystem. Question: how do we figure out this zoo?
Hadoop Zoo: Examples
Facebook's stack:
Hadoop Zoo: Examples (2)
Yahoo's stack:
Hadoop Zoo: Examples (3)
LinkedIn's stack:
Hadoop Zoo: Examples (4)
Cloudera's stack:
Hadoop's major components: Sqoop
Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Hadoop's major components: HBase
- A column-oriented database management system; a key-value store based on Google's Bigtable
- Can hold extremely large data; dynamic data model; not a relational DBMS
- Supports both batch-style computation using MapReduce and point queries (random reads)
- Consistent read/write performance for data used by Hadoop applications
- Allows stored data to be aggregated or processed using MapReduce
- A data platform for analytics and machine learning
- Bulk storage of logs, documents, real-time activity feeds, and raw imported data
Hadoop's major components: Pig, Hive, Oozie
- Pig: high-level programming on top of Hadoop MapReduce; expresses data analysis problems as data flows; originally developed at Yahoo
- Hive: data warehouse software that facilitates querying and managing large datasets residing in distributed storage; it projects structure onto this data and queries it using a SQL-like language called HiveQL
- Oozie: a workflow scheduler system to manage Apache Hadoop jobs
Hadoop's major components: ZooKeeper
ZooKeeper provides operational services for a Hadoop cluster:
- Maintaining configuration information
- Naming services
- Distributed synchronization
- Group services
HDFS Architecture: Summary
- A single NameNode: a master server that manages the file system namespace and regulates access to files by clients.
- Multiple DataNodes: typically one per node in the cluster. A DataNode's functions: managing storage; serving read/write requests from clients; block creation, deletion, and replication based on instructions from the NameNode.
HDFS: Block size
The default block size is 64 MB, which is good for large files. For example, a 10 GB file will be broken into 10 x 1024 / 64 = 160 blocks. Why a small block size is not good:
- NameNode memory usage: every block is represented as an object
- Number of map tasks: data is typically processed one block at a time
- Network load: the number of checks with DataNodes is proportional to the number of blocks
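The block-count arithmetic above can be checked directly (a trivial sketch; real HDFS also pads the last partial block):

```python
import math

block_mb = 64                      # default HDFS block size in this deck
file_mb = 10 * 1024                # a 10 GB file, expressed in MB
blocks = math.ceil(file_mb / block_mb)
print(blocks)  # 160
```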
Map and Reduce: An example
Problem definition: we have a huge text document and want to count the number of times each distinct word appears in it. Some applications: analyzing web server logs, statistics for query terms in search engines. We need to define two functions; the group-by-key step between them is handled by Hadoop:
- Map: scan each line of the file and extract something we care about (keys)
- Group by key: sort and shuffle (done by Hadoop)
- Reduce: aggregate, summarize, filter, or transform
Wordcount: a serial code
A serial code: 1) get a word, 2) look the word up in a table, 3) add 1 to its count. But how would you count all the words in all the Star Wars scripts, books, blogs, and so on? Solution: the Map/Reduce strategy.
Wordcount: Mapper
Let <word, 1> be the <key, value> pair, and let Hadoop do the hard work. The Mapper, in a loop until done: get a word, emit <word, 1>.
Wordcount: Shuffling and sorting
This step is done by Hadoop.
Wordcount: the Reducer
Loop over the key-value pairs: get the next <word, value>. If <word> is the same as the previous word, add <value> to the count; otherwise emit the previous <word, count> and restart the count for the new word.
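The whole word-count pipeline described above can be sketched in Python, with a plain `sorted()` standing in for Hadoop's sort-and-shuffle step (the two-line "document" is made up):

```python
from itertools import groupby

def mapper(lines):
    """Map: emit a <word, 1> pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def reducer(pairs):
    """Reduce: sum the counts per word; pairs must arrive sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(v for _, v in group))

document = ["the force is strong", "the force awakens"]
shuffled = sorted(mapper(document))   # stands in for Hadoop's sort-and-shuffle
counts = dict(reducer(shuffled))
print(counts)  # {'awakens': 1, 'force': 2, 'is': 1, 'strong': 1, 'the': 2}
```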
Wordcount, summary: Map/Reduce
Wordcount: summary: Example
Point: Shuffling is done by a hash function in Hadoop.
Map/Reduce: In parallel
Partitioning, sorting, grouping, etc. are done by Hadoop. The system uses a default partition function: hash(key) mod (number of reducers).
Refinement to Map/Reduce: use combiners
A combiner combines the values of all keys of a single mapper (single node). A Map often produces many pairs with the same key: <key, value1>, <key, value2>, <key, value3>, ... We can aggregate these pairs with a combiner (similar to a reducer): combine(<key, [value1, value2, ...]>) -> <key, value_final>, where value_final = value1 + value2 + ... Much less data then needs to be shuffled or copied.
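A minimal combiner sketch for the word-count setting, assuming one mapper's local output (the pairs are made up):

```python
from collections import Counter

def combiner(mapper_output):
    """Combine <word, 1> pairs locally on one mapper node before the shuffle."""
    combined = Counter()
    for word, count in mapper_output:
        combined[word] += count
    return list(combined.items())

# One mapper emitted these pairs; the combiner collapses the duplicates,
# so far fewer pairs cross the network during the shuffle
local_pairs = [("the", 1), ("force", 1), ("the", 1), ("the", 1)]
print(combiner(local_pairs))
```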
End of the session on Big Data
More informationChapter 3. Foundations of Business Intelligence: Databases and Information Management
Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional
More informationSTATS Data Analysis using Python. Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns
STATS 700-002 Data Analysis using Python Lecture 7: the MapReduce framework Some slides adapted from C. Budak and R. Burns Unit 3: parallel processing and big data The next few lectures will focus on big
More informationBig Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition
Big Data Analytics: What is Big Data? Stony Brook University CSE545, Fall 2016 the inaugural edition What s the BIG deal?! 2011 2011 2008 2010 2012 What s the BIG deal?! (Gartner Hype Cycle) What s the
More informationManagement Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management
Management Information Systems Review Questions Chapter 6 Foundations of Business Intelligence: Databases and Information Management 1) The traditional file environment does not typically have a problem
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationOverview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::
Module Title Duration : Cloudera Data Analyst Training : 4 days Overview Take your knowledge to the next level Cloudera University s four-day data analyst training course will teach you to apply traditional
More information<Insert Picture Here> Introduction to Big Data Technology
Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationA brief history on Hadoop
Hadoop Basics A brief history on Hadoop 2003 - Google launches project Nutch to handle billions of searches and indexing millions of web pages. Oct 2003 - Google releases papers with GFS (Google File System)
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationDatabases 2 (VU) ( / )
Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More informationHadoop Online Training
Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the
More informationMicrosoft Big Data and Hadoop
Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common
More informationDHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI
DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI Department of Information Technology IT6701 - INFORMATION MANAGEMENT Anna University 2 & 16 Mark Questions & Answers Year / Semester: IV / VII Regulation: 2013
More informationIntroduction to Big-Data
Introduction to Big-Data Ms.N.D.Sonwane 1, Mr.S.P.Taley 2 1 Assistant Professor, Computer Science & Engineering, DBACER, Maharashtra, India 2 Assistant Professor, Information Technology, DBACER, Maharashtra,
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationChapter 6 VIDEO CASES
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationREVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK
REVIEW ON BIG DATA ANALYTICS AND HADOOP FRAMEWORK 1 Dr.R.Kousalya, 2 T.Sindhupriya 1 Research Supervisor, Professor & Head, Department of Computer Applications, Dr.N.G.P Arts and Science College, Coimbatore
More informationThe amount of data increases every day Some numbers ( 2012):
1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect
More information2/26/2017. The amount of data increases every day Some numbers ( 2012):
The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to
More informationDepartment of Information Technology, St. Joseph s College (Autonomous), Trichy, TamilNadu, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 5 ISSN : 2456-3307 A Survey on Big Data and Hadoop Ecosystem Components
More informationIntroduction to MapReduce (cont.)
Introduction to MapReduce (cont.) Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com USC INF 553 Foundations and Applications of Data Mining (Fall 2018) 2 MapReduce: Summary USC INF 553 Foundations
More informationTop 25 Big Data Interview Questions And Answers
Top 25 Big Data Interview Questions And Answers By: Neeru Jain - Big Data The era of big data has just begun. With more companies inclined towards big data to run their operations, the demand for talent
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large
More informationPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page
More informationMap Reduce. Yerevan.
Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate
More informationexam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0
70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationParallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce
Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationHadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)
Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationCIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench
CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench Abstract Implementing a Hadoop-based system for processing big data and doing analytics is a topic which has been
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : About Quality Thought We are
More informationConfiguring and Deploying Hadoop Cluster Deployment Templates
Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page
More informationIntroduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data
Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationData Informatics. Seon Ho Kim, Ph.D.
Data Informatics Seon Ho Kim, Ph.D. seonkim@usc.edu HBase HBase is.. A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage. Designed to operate
More informationA Glimpse of the Hadoop Echosystem
A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other
More informationBig Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture
Big Data Syllabus Hadoop YARN Setup Programming in YARN framework j Understanding big data and Hadoop Big Data Limitations and Solutions of existing Data Analytics Architecture Hadoop Features Hadoop Ecosystem
More informationWhat is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationBIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,
BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 1 OBJECTIVES ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29, 2016 2 WHAT
More informationWebinar Series TMIP VISION
Webinar Series TMIP VISION TMIP provides technical support and promotes knowledge and information exchange in the transportation planning and modeling community. Today s Goals To Consider: Parallel Processing
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationStream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...
More informationOracle Big Data Science
Oracle Big Data Science Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com @VlamisSoftware Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri
More informationD DAVID PUBLISHING. Big Data; Definition and Challenges. 1. Introduction. Shirin Abbasi
Journal of Energy and Power Engineering 10 (2016) 405-410 doi: 10.17265/1934-8975/2016.07.004 D DAVID PUBLISHING Shirin Abbasi Computer Department, Islamic Azad University-Tehran Center Branch, Tehran
More informationBig Data Specialized Studies
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
More informationScalable Web Programming. CS193S - Jan Jannink - 2/25/10
Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*
More informationA Review Paper on Big data & Hadoop
A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining
More informationBig Data Analytics using Apache Hadoop and Spark with Scala
Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important
More informationdocs.hortonworks.com
docs.hortonworks.com : Getting Started Guide Copyright 2012, 2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,
More informationNew Approaches to Big Data Processing and Analytics
New Approaches to Big Data Processing and Analytics Contributing authors: David Floyer, David Vellante Original publication date: February 12, 2013 There are number of approaches to processing and analyzing
More informationHADOOP FRAMEWORK FOR BIG DATA
HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further
More informationTackling Big Data Using MATLAB
Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate
More informationIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce Who Am I - Ryan Tabora - Data Developer at Think Big Analytics - Big Data Consulting - Experience working with Hadoop, HBase, Hive, Solr, Cassandra, etc. 2 Who Am I -
More informationHadoop course content
course content COURSE DETAILS 1. In-detail explanation on the concepts of HDFS & MapReduce frameworks 2. What is 2.X Architecture & How to set up Cluster 3. How to write complex MapReduce Programs 4. In-detail
More informationOnline Bill Processing System for Public Sectors in Big Data
IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 10 March 2018 ISSN (online): 2349-6010 Online Bill Processing System for Public Sectors in Big Data H. Anwer
More information