Processing Rat Brain Neuronal Signals Using a Hadoop Computing Cluster
|
|
- Claire Berry
- 5 years ago
- Views:
Transcription
1 Processing Rat Brain Neuronal Signals Using a Hadoop Computing Cluster Jadin C. Jackson, PhD Biology University of St. Thomas jadincjackson@stthomas.edu Bradley S. Rubin, PhD Graduate Programs in Software University of St. Thomas bsrubin@stthomas.edu 1
2 Introduc)on: Hippocampus as a memory structure Human lesions - HM Medial Temporal Lobe resec)on for epilepsy IQ = 112 Unable to form las)ng declara)ve memory Corkin et al. (1997) J. Neurosci., 17(10):
3 Hippocampus Human Rat Paxinos & Watson 1997
4 Introduc)on Naviga)onal Memory Rat lesions Neural Correlates The Place Cell R TT04-01 OF
5 Recording neural ensembles n LeY: Hyperdrive n 12 tetrodes allow for simultaneous recording of up to 100 neurons, or more.
6 Tetrodes Neurons generate electrical signals we can record Tetrodes let us dis)nguish these signals based on their spa)al distribu)on. A C B
7 Neural signals Neurons are Highly Organized: Connec)vity Spa)al Distribu)on Organized Connec6vity and Distribu6on: Summa)on of currents Buzsaki (2004)
8 Signals we can get from Tetrodes 600 Hz 6 khz + ~1 msec Ac)on Poten)als Spikes 1 Hz 475 Hz + ~100 msec Local Field Poten)als (LFPs) or EEG Note: Electroencephalogram (EEG) usually refers to signals recorded from outside the brain.
9 Hippocampal Place Field
10 Analyzing LFP data
11 Filtering Hippocampal LFP + = 500 ms
12 Filtering Hippocampal LFP Theta (θ) = 6-10 Hz 500 ms Gamma γ Slow = 35-80Hz γ Fast = Hz
13 Convolu)on ()me domain) f and g are two real- valued func)ons t is )me τ is a dummy variable n is sample index k is a dummy index hfp://en.wikipedia.org/wiki/convolu)on
14 Con)nuous Wavelet Transform x is the signal ψ is the wavelet t is )me a is the scaling factor b is the shiy ψ =
15 Con)nuous Wavelet Transform x is the signal ψ is the wavelet t is )me a is the scaling factor b is the shiy ψ = = small
16 Con)nuous Wavelet Transform x is the signal ψ is the wavelet t is )me a is the scaling factor b is the shiy ψ = = large
17
18 Frequency Time
19 Wavelet Transform Signal (Samar et al., 2002)
20 Convolu)on (frequency domain) F {} is the Fourier Transform F{f(t) g(t)}= F{f(t)} F{g(t)} f and g are two real- valued func)ons t is )me Convolu)on = Bofom Line: Mul)plying the Frequency Response of two func)ons = The Frequency Response of the convolu)on of those func)ons. Take the FFT of two func)ons, mul)ply the results, and convert back to the )me domain.
21 Channel Averaging Frequency (Hz) Average
22 Event- triggered Analysis: subselng Frequency (Hz) t i t i +τ
23 Frequency (Hz) Applying the Wavelet Transform Theta (θ) Time γ Slow γ Fast
24 What is phase locking? Imagine we have + 2 to 3 dis6nct oscilla6ons being generated by local neural circuitry. = (+ Noise) If they are all coordinated with each other, then they would be phase- locked.
25 Compu)ng phase locking 360 o 0 o
26 Compu)ng phase locking 360 o 0 o
27 Compu)ng phase locking 360 o 0 o
28 Compu)ng phase locking 360 o 360 o 360 o 0 o 0 o 0 o Average by phase
29 Gamma Fast gamma and Slow gamma are on separate theta cycles. Fast gamma = Medial Entorhinal Ctx. Slow gamma = CA3 Bofom Line: Different gamma = Different inputs
30 Descisions Choice Point Behaviors at the Choice Point What inputs is the hippocampal network processing during this decision?
31 GPS Hadoop Cluster 24 Nodes (1 master, 23 slaves), running Ubuntu Server Nodes are Sun Fire X2200M2 with 2x AMD Opteron 2214 DualCore 2.2GHz processors Master: 18GB RAM, 2x 1 TB drives (RAID 1 mirrored) NameNode, SecondaryNameNode, JobTracker, MySQL for Hive metastore, user local home directories Slaves: 12GB RAM, 250 GB + 1 TB drives DataNode, TaskTracker, 2 CPU core slots for mappers and reducers 31!
32 Single Rat Run Processing Flow 6.6 million records/channel 15 channels 99 million records total 1.3 GB total 196 records 950 KB total Convolution Output 23.5 GB/channel 353 GB total 32!
33 Convolution Step Each HDFS file contains pairs of )me and voltage values for a single rat run channel We set the input to non- splifable, so the default 64MB HDFS block size is ignored, and each file is sent in its en)rety to a single mapper In the mapper, we read in the input file into a buffer (buffer size = power of 2) Once the buffer is completely loaded, in the close() method, we perform the FFT, complex number mul)plica)on, and inverse FFT for kernel frequencies from Hz in 1- Hz increments There is no reduce step, which helped with output file naming This approach does not work well with the percentage complete job monitoring! 33!
34 Directory Structure We write the convolu)on output data using this directory structure so that we can leverage Hive par))oning /neuro/output/rats dt= rat=r184 rat=r channel=1a channel=2a... R a R a 34
35 Hive External Table Creation Hive s external table crea)on capability allows SQL query capability over HDFS- resident output files CREATE EXTERNAL TABLE rats (time INT, frequency SMALLINT, convolution FLOAT) PARTITIONED BY(rat STRING, dt STRING, channel STRING) ROW FORMAT SERDE 'neurohadoop.ratserde' STORED AS SEQUENCEFILE LOCATION '/neuro/output/rats'; ALTER TABLE rats ADD PARTITION(rat='R184',dt=' ',channel='1a');...! Hive s par))oning capability limits the data traversal to only the subdirectories needed to answer the query, yielding a major performance boost for most subsequent opera)ons 35!
36 Hive Serde It is most convenient to use text files for Hadoop and Hive input and output Because we have a large amount of data to pass between Hadoop output and Hive input, we wanted to use a binary data format, called a SequenceFile Also, Hive ignores Keys in SequenceFiles, and only treats a Value as a Hive column, so we needed to create a custom Java serde (serializer/deserializer) for this complex value object consis)ng of an int, a short, and a float to map to the three columns This is not documented well, so here is our code hfp://pastebin.com/xuy36kxg Also, we used Snappy block- level compression Result: Bytes/convolu)on output record > !
37 Result! Amplitude! Frequency! 37!
38 Result Frequency! Phase! 38!
39 Performance Overview Performance growth is sub- linear for each group of 3 rat runs because ayer that point, we then use up all mapper slots (15/rat run * 3 rat runs = 45 slots, and we have 46 available) 39!
40 Performance Detail Channel Averaging Convolu)on 40!
41 Convolution Performance Load Kernel: 326 Load Data: Signal FFT: 1081 Kernel FFT: 604 Product: 203 Inverse FFT: 1070 Output Data: Kernel FFT: 418 Product: 85 Inverse FFT: 755 Output Data: Kernel FFT: 554 Product: 88 Inverse FFT: 859 Output Data: All 196 kernels are loaded from the distributed cache, then all the signal data for a channel is loaded and FFT processed Then each kernel is FFT processed, complex mul)plied with the signal FFT, and the result is processed with an inverse FFT, and then the output data is wrifen to HDFS, repeated for each kernel Performance results in milliseconds Average Java/Hadoop )me/convolu)on = 37 sec Average Matlab )me/convolu)on = 11 sec! We used the JTransforms FFT library: hfps://sites.google.com/site/piotrwendykier/soyware/jtransforms 41!
42 Overall Comparison with Matlab An apples- apples comparison is difficult because single worksta)on memory limita)ons dictate a non- op)mal processing flow for the Matlab approach The Hadoop approach does the processing in this order: Squaring, Averaging the 12 channels, Compu)ng the mean and standard devia)on of the average channel values, Subselng the )me intervals, Z- scoring This takes about 4 hours to process using 46 compu)ng slots The Matlab approach does the processing in this order: Squaring, Z- scoring, Subselng the )me intervals, Averaging the 12 channels This takes about 10 hours to process, using a small number of process cores 42!
43 Qualitative Hadoop/Hive Benefits vs. Matlab With Hadoop We were able to implement the op)mal processing path, unconstrained by single worksta)on memory All input and output data is online All intermediate data is online and available for subsequent ad hoc processing Easy total batch processing consis)ng of a sequence of Java MapReduce and Hive steps dump data into HDFS, issue command, come back when all done 43!
44 Amazon Elastic MapReduce We ported most of our applica)on to run on Amazon s Elas)c MapReduce service The only Java code change needed was changing file references to use an s3n://bucketname prefix AWS allows data in/out directly from persistent S3 storage, or transient HDFS storage (we did the former) Performance was comparable to our cluster, cost is in the low 10s of dollars. 44!
45 Amazon Elastic MapReduce Surprised Hive metadata is transient, unless a MySQL instance is configured, so tables created in one process step disappear in subsequent steps hfp://docs.amazonwebservices.com/elas)cmapreduce/latest/developerguide/ UsingEMR_Hive.html#emr- dev- create- metastore- outside S3 has 5 GB limit, so we needed to enable mul)- part upload hfp://docs.amazonwebservices.com/elas)cmapreduce/latest/developerguide/ Config_Mul)part.html?r=3305 An 8- core instance has 7 slots available for MR processing Small errors consume seconds, but billing rounds up to the nearest hour 45!
Apache Hive. CMSC 491 Hadoop-Based Distributed Compu<ng Spring 2016 Adam Shook
Apache Hive CMSC 491 Hadoop-Based Distributed Compu
More informationHive SQL over Hadoop
Hive SQL over Hadoop Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Introduction Apache Hive is a high-level abstraction on top of MapReduce Uses
More informationInforma)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies
Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:
More informationHadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)
Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:
More informationBig Data for Engineers Spring Resource Management
Ghislain Fourny Big Data for Engineers Spring 2018 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationitpass4sure Helps you pass the actual test with valid and latest training material.
itpass4sure http://www.itpass4sure.com/ Helps you pass the actual test with valid and latest training material. Exam : CCD-410 Title : Cloudera Certified Developer for Apache Hadoop (CCDH) Vendor : Cloudera
More informationHadoop MapReduce Framework
Hadoop MapReduce Framework Contents Hadoop MapReduce Framework Architecture Interaction Diagram of MapReduce Framework (Hadoop 1.0) Interaction Diagram of MapReduce Framework (Hadoop 2.0) Hadoop MapReduce
More informationEsgynDB Enterprise 2.0 Platform Reference Architecture
EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed
More informationBig Data 7. Resource Management
Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage
More informationFacilitating Consistency Check between Specification & Implementation with MapReduce Framework
Facilitating Consistency Check between Specification & Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, Keijiro ARAKI Kyushu University, Japan 2 Our expectation Light-weight formal
More information3. Monitoring Scenarios
3. Monitoring Scenarios This section describes the following: Navigation Alerts Interval Rules Navigation Ambari SCOM Use the Ambari SCOM main navigation tree to browse cluster, HDFS and MapReduce performance
More informationIntroduction to Hive Cloudera, Inc.
Introduction to Hive Outline Motivation Overview Data Model Working with Hive Wrap up & Conclusions Background Started at Facebook Data was collected by nightly cron jobs into Oracle DB ETL via hand-coded
More informationMI-PDB, MIE-PDB: Advanced Database Systems
MI-PDB, MIE-PDB: Advanced Database Systems http://www.ksi.mff.cuni.cz/~svoboda/courses/2015-2-mie-pdb/ Lecture 10: MapReduce, Hadoop 26. 4. 2016 Lecturer: Martin Svoboda svoboda@ksi.mff.cuni.cz Author:
More informationMapReduce, Hadoop and Spark. Bompotas Agorakis
MapReduce, Hadoop and Spark Bompotas Agorakis Big Data Processing Most of the computations are conceptually straightforward on a single machine but the volume of data is HUGE Need to use many (1.000s)
More informationLecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018
Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018 K. Zhang (pic source: mapr.com/blog) Copyright BUDT 2016 758 Where
More informationApache Hive for Oracle DBAs. Luís Marques
Apache Hive for Oracle DBAs Luís Marques About me Oracle ACE Alumnus Long time open source supporter Founder of Redglue (www.redglue.eu) works for @redgluept as Lead Data Architect @drune After this talk,
More informationHive and Shark. Amir H. Payberah. Amirkabir University of Technology (Tehran Polytechnic)
Hive and Shark Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Hive and Shark 1393/8/19 1 / 45 Motivation MapReduce is hard to
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationShark: Hive (SQL) on Spark
Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce
More informationIntroduction to BigData, Hadoop:-
Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,
More informationDec 2, 2013: Hippocampal spa:al map forma:on
MATH:7450 (22M:305) Topics in Topology: Scien:fic and Engineering Applica:ons of Algebraic Topology Dec 2, 2013: Hippocampal spa:al map forma:on Fall 2013 course offered through the University of Iowa
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationCCA Administrator Exam (CCA131)
CCA Administrator Exam (CCA131) Cloudera CCA-500 Dumps Available Here at: /cloudera-exam/cca-500-dumps.html Enrolling now you will get access to 60 questions in a unique set of CCA- 500 dumps Question
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationORC Files. Owen O June Page 1. Hortonworks Inc. 2012
ORC Files Owen O Malley owen@hortonworks.com @owen_omalley owen@hortonworks.com June 2013 Page 1 Who Am I? First committer added to Hadoop in 2006 First VP of Hadoop at Apache Was architect of MapReduce
More informationCloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ]
s@lm@n Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ] Question No : 1 Which two updates occur when a client application opens a stream
More informationPerformance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton
More informationYour First Hadoop App, Step by Step
Learn Hadoop in one evening Your First Hadoop App, Step by Step Martynas 1 Miliauskas @mmiliauskas Your First Hadoop App, Step by Step By Martynas Miliauskas Published in 2013 by Martynas Miliauskas On
More informationHadoop Map Reduce 10/17/2018 1
Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018
More informationIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce Who Am I - Ryan Tabora - Data Developer at Think Big Analytics - Big Data Consulting - Experience working with Hadoop, HBase, Hive, Solr, Cassandra, etc. 2 Who Am I -
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationGuoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.
Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14 Page 1 Introduction & Notations Multi-Job optimization Evaluation Conclusion
More informationMapReduce, Apache Hadoop
NDBI040: Big Data Management and NoSQL Databases hp://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 2 MapReduce, Apache Hadoop Marn Svoboda svoboda@ksi.mff.cuni.cz 11. 10. 2016 Charles University
More informationHIVE MOCK TEST HIVE MOCK TEST III
http://www.tutorialspoint.com HIVE MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hive. You can download these sample mock tests at your local machine
More informationMapReduce, Apache Hadoop
Czech Technical University in Prague, Faculty of Informaon Technology MIE-PDB: Advanced Database Systems hp://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-mie-pdb/ Lecture 12 MapReduce, Apache Hadoop Marn
More informationBig Data Hadoop Course Content
Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux
More informationShark. Hive on Spark. Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker
Shark Hive on Spark Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Ion Stoica, Scott Shenker Agenda Intro to Spark Apache Hive Shark Shark s Improvements over Hive Demo Alpha
More informationBig Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela
More informationTITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP
TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop
More informationHadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here
Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera
More informationHadoop. Introduction to BIGDATA and HADOOP
Hadoop Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big Data and Hadoop What is the need of going ahead with Hadoop? Scenarios to apt Hadoop Technology in REAL
More informationPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page
More informationTuning the Hive Engine for Big Data Management
Tuning the Hive Engine for Big Data Management Copyright Informatica LLC 2017. Informatica, the Informatica logo, Big Data Management, PowerCenter, and PowerExchange are trademarks or registered trademarks
More informationData Storage Infrastructure at Facebook
Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow
More informationClustering Lecture 8: MapReduce
Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationPyTables. An on- disk binary data container. Francesc Alted. May 9 th 2012, Aus=n Python meetup
PyTables An on- disk binary data container Francesc Alted May 9 th 2012, Aus=n Python meetup Overview What PyTables is? Data structures in PyTables The one million song dataset Advanced capabili=es in
More informationData Access 3. Managing Apache Hive. Date of Publish:
3 Managing Apache Hive Date of Publish: 2018-07-12 http://docs.hortonworks.com Contents ACID operations... 3 Configure partitions for transactions...3 View transactions...3 View transaction locks... 4
More informationDistributed Systems 16. Distributed File Systems II
Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS
More informationHADOOP FRAMEWORK FOR BIG DATA
HADOOP FRAMEWORK FOR BIG DATA Mr K. Srinivas Babu 1,Dr K. Rameshwaraiah 2 1 Research Scholar S V University, Tirupathi 2 Professor and Head NNRESGI, Hyderabad Abstract - Data has to be stored for further
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle
More informationVendor: Cloudera. Exam Code: CCA-505. Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam.
Vendor: Cloudera Exam Code: CCA-505 Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam Version: Demo QUESTION 1 You have installed a cluster running HDFS and MapReduce
More informationHDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung
HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per
More informationHADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)
HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation) Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big
More informationEnergy Efficient Transparent Library Accelera4on with CAPI Heiner Giefers IBM Research Zurich
Energy Efficient Transparent Library Accelera4on with CAPI Heiner Giefers IBM Research Zurich Revolu'onizing the Datacenter Datacenter Join the Conversa'on #OpenPOWERSummit Towards highly efficient data
More informationGhislain Fourny. Big Data 6. Massive Parallel Processing (MapReduce)
Ghislain Fourny Big Data 6. Massive Parallel Processing (MapReduce) So far, we have... Storage as file system (HDFS) 13 So far, we have... Storage as tables (HBase) Storage as file system (HDFS) 14 Data
More informationEnhanced Hadoop with Search and MapReduce Concurrency Optimization
Volume 114 No. 12 2017, 323-331 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Enhanced Hadoop with Search and MapReduce Concurrency Optimization
More informationsqoop Easy, parallel database import/export Aaron Kimball Cloudera Inc. June 8, 2010
sqoop Easy, parallel database import/export Aaron Kimball Cloudera Inc. June 8, 2010 Your database Holds a lot of really valuable data! Many structured tables of several hundred GB Provides fast access
More informationHadoop-PR Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer)
Hortonworks Hadoop-PR000007 Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer) http://killexams.com/pass4sure/exam-detail/hadoop-pr000007 QUESTION: 99 Which one of the following
More informationDatabase Applications (15-415)
Database Applications (15-415) Hadoop Lecture 24, April 23, 2014 Mohammad Hammoud Today Last Session: NoSQL databases Today s Session: Hadoop = HDFS + MapReduce Announcements: Final Exam is on Sunday April
More informationTuning Intelligent Data Lake Performance
Tuning Intelligent Data Lake Performance 2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without
More informationThe Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler
The Hadoop Distributed File System Konstantin Shvachko Hairong Kuang Sanjay Radia Robert Chansler MSST 10 Hadoop in Perspective Hadoop scales computation capacity, storage capacity, and I/O bandwidth by
More informationComputer Memory. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1
Computer Memory Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1 Warm Up public int sum1(int n, int m, int[][] table) { int output = 0; for (int i = 0; i < n; i++) { for (int j = 0; j
More informationComparative Analysis of K means Clustering Sequentially And Parallely
Comparative Analysis of K means Clustering Sequentially And Parallely Kavya D S 1, Chaitra D Desai 2 1 M.tech, Computer Science and Engineering, REVA ITM, Bangalore, India 2 REVA ITM, Bangalore, India
More information1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions
1Z0-449 Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions Table of Contents Introduction to 1Z0-449 Exam on Oracle Big Data 2017 Implementation Essentials... 2 Oracle 1Z0-449
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationBest Practices and Performance Tuning on Amazon Elastic MapReduce
Best Practices and Performance Tuning on Amazon Elastic MapReduce Michael Hanisch Solutions Architect Amo Abeyaratne Big Data and Analytics Consultant ANZ 12.04.2016 2016, Amazon Web Services, Inc. or
More informationAaron Sun, in collaboration with Taehoon Kang, William Greene, Ben Speakmon and Chris Mills
Aaron Sun, in collaboration with Taehoon Kang, William Greene, Ben Speakmon and Chris Mills INTRO About KIXEYE An online gaming company focused on mid- core and hard- core games Founded in 00 Over 00 employees
More information10 Million Smart Meter Data with Apache HBase
10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on
More informationForget about the Clouds, Shoot for the MOON
Forget about the Clouds, Shoot for the MOON Wu FENG feng@cs.vt.edu Dept. of Computer Science Dept. of Electrical & Computer Engineering Virginia Bioinformatics Institute September 2012, W. Feng Motivation
More informationGhislain Fourny. Big Data Fall Massive Parallel Processing (MapReduce)
Ghislain Fourny Big Data Fall 2018 6. Massive Parallel Processing (MapReduce) Let's begin with a field experiment 2 400+ Pokemons, 10 different 3 How many of each??????????? 4 400 distributed to many volunteers
More informationGoogle File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information
Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute
More informationApril Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationIntroduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data
Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction
More informationOracle Big Data SQL High Performance Data Virtualization Explained
Keywords: Oracle Big Data SQL High Performance Data Virtualization Explained Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data SQL, SQL, Big Data, Hadoop, NoSQL Databases, Relational Databases,
More informationHADOOP 3.0 is here! Dr. Sandeep Deshmukh Sadepach Labs Pvt. Ltd. - Let us grow together!
HADOOP 3.0 is here! Dr. Sandeep Deshmukh sandeep@sadepach.com Sadepach Labs Pvt. Ltd. - Let us grow together! About me BE from VNIT Nagpur, MTech+PhD from IIT Bombay Worked with Persistent Systems - Life
More informationExam Questions CCA-500
Exam Questions CCA-500 Cloudera Certified Administrator for Apache Hadoop (CCAH) https://www.2passeasy.com/dumps/cca-500/ Question No : 1 Your cluster s mapred-start.xml includes the following parameters
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationLocating ego-centers in depth for hippocampal place cells
204 5th Joint Symposium on Neural Computation Proceedings UCSD (1998) Locating ego-centers in depth for hippocampal place cells Kechen Zhang,' Terrence J. Sejeowski112 & Bruce L. ~cnau~hton~ 'Howard Hughes
More informationDistributed Systems. CS422/522 Lecture17 17 November 2014
Distributed Systems CS422/522 Lecture17 17 November 2014 Lecture Outline Introduction Hadoop Chord What s a distributed system? What s a distributed system? A distributed system is a collection of loosely
More informationTechno Expert Solutions An institute for specialized studies!
Course Content of Big Data Hadoop( Intermediate+ Advance) Pre-requistes: knowledge of Core Java/ Oracle: Basic of Unix S.no Topics Date Status Introduction to Big Data & Hadoop Importance of Data& Data
More informationBatch Inherence of Map Reduce Framework
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationThe ATLAS EventIndex: Full chain deployment and first operation
The ATLAS EventIndex: Full chain deployment and first operation Álvaro Fernández Casaní Instituto de Física Corpuscular () Universitat de València CSIC On behalf of the ATLAS Collaboration 1 Outline ATLAS
More informationBig Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.
Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop
More informationAlbis: High-Performance File Format for Big Data Systems
Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference
More informationTuning Enterprise Information Catalog Performance
Tuning Enterprise Information Catalog Performance Copyright Informatica LLC 2015, 2018. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationCluster Setup. Table of contents
Table of contents 1 Purpose...2 2 Pre-requisites...2 3 Installation...2 4 Configuration... 2 4.1 Configuration Files...2 4.2 Site Configuration... 3 5 Cluster Restartability... 10 5.1 Map/Reduce...10 6
More informationWhat is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?
Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation
More informationA Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System
A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal
More informationConfiguring Ports for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2
Configuring s for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2 Copyright Informatica LLC 2016, 2017. Informatica, the Informatica logo, Big
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationCertified Big Data and Hadoop Course Curriculum
Certified Big Data and Hadoop Course Curriculum The Certified Big Data and Hadoop course by DataFlair is a perfect blend of in-depth theoretical knowledge and strong practical skills via implementation
More informationInria, Rennes Bretagne Atlantique Research Center
Hadoop TP 1 Shadi Ibrahim Inria, Rennes Bretagne Atlantique Research Center Getting started with Hadoop Prerequisites Basic Configuration Starting Hadoop Verifying cluster operation Hadoop INRIA S.IBRAHIM
More information