Cloud Architectures. Jinesh Varia. Amazon Web Services. Technology Evangelist
|
|
- Daniel Morris
- 5 years ago
- Views:
Transcription
1 Cloud Architectures Jinesh Varia Technology Evangelist Amazon Web Services
2 ANIMOTO.COM
3 Scale: 50 servers to 3500 servers in 3 days
4
5
6 Winner December 2007 Prize:Golden Hammer Photo: Smashing the hardware
7
8
9 TimesMachine from NY Times Articles TIFF -> PDF Input: 11 Million Articles (4TB of data) What did he do? 100 EC2 Instances for 24 hours All data on S3 Output: 1.5 TB of Data Hadoop, itext, JetS3t Under $400
10
11
12
13 CS290F : Scalable Internet Services USCB Fall 2006 Prof created an app to manage team usage Ruby on Rails Complete Stack: From Load balancer, App Server to DB Learn how to scale: Simulated load Generated Graphs All course contents, students assignments, lessons learned are on the Wiki
14 CS345a : Data Stanford Tools used: Shell/Linux/Java Hadoop on EC2 Data set on S3 Datasets :NetFlix, Alexa, IR datasets from TREC Class organization: Stanford Winter Students Each Team spawns Hadoop slave nodes TA created Getting- Started AMIs (& scripts) TA managed the students usage
15
16 Cloud Architectures
17 Cloud Architectures Hardware Infrastructure Job execution time time
18 Cloud Architectures Hardware Infrastructure Job execution time time
19 Shrink your processing time CPUs time
20 Shrink your processing time CPUs time
21 Main Problems Technical How to co-ordinate jobs between machines (distributed processing)? What if a machine fails? How will I Scale-out? Business How do I get management signoff? Resources to manage the infrastructure? How do I get rid of the Idle Infrastructure? Hadoop Web Services Cloud Computing
22 Let s take a usecase Web Company : Analyze large-data sets of clickstream logs Social Networking Company : Analyze demographic and market data Phone Company : Locate all customers who have called in a given area Large Retailer Chain : Wants to know what items a particular customer bought last month Surveillance Company : Wants to transcodevideo for last several years PharmaCompany : Wants locate people who were prescribed a certain drug
23 GrepTheWeb
24 What s so cool about GrepTheWeb? RegEx WWW
25 Examples of Patterns Source Code intx = 40 + i Any thing with punctuation Hey! he said, Are you ok? Case Sensitive Function CallOrderController() Equations f(x) = x^2 Other Patterns (dis)integration of life, Address
26 Zoom Level 1 RegEx Input dataset (List of Document Urls) Alexa GrepTheWeb Service GetStatus Subset of document URLs that matched the RegEx
27 Zoom Level 2 Amazon SQS Distributed Transient Buffer Never Lose a message StartGrep RegEx Amazon SQS Input Files (Alexa Crawl) Amazon S3 Infinitely Scalable Storage in the cloud Ideal for small short-lived Highly Available, Amazon Durable SimpleDB and Reliable Amazon EC2 Database in the cloud messages Manage phases Resizable Computing Private and Public Storage Capacity in the cloud Controller Access control Pay by the GB Lightweight Query-able User info, Attribute Launch, Monitor, Store Spawn Server Instances Job status info Shutdown Message using a Locking Web Service call Root Level Access Amazon SimpleDB DB Amazon EC2 Cluster Amazon S3 GetStatus Get Output Pay by the hour Distributed and Partitioned Input Output Pay by GB, Pay per Query
28 Zoom Level 3 Amazon SQS Billing Queue StartGrep Launch Queue Monitor Queue Shut down Queue Billing Service Controller Launch Monitor Shutdown Controller Controller Controller Billing Controller Insert JobID, Status launch Insert EC2 info ping Get EC2 Info Shutdown Check for results GetStatus Status DB Master M Slaves N HDFS Put File Output Input Get Amazon HadoopCluster on File SimpleDB Amazon EC2 Amazon S3 Get Output Input Files (Alexa Crawl)
29 Zoom Level 4 User1 StartJob 1 Servic e User2 StartJob 2 Map Map Map.. Map Tasks Map Map Map.. Map Tasks Combin e Reduce Hadoop Job Combine Reduce Hadoop Job StopJob 1 StopJob 2 Store status and results Get Result
30 SideTrack: WordCountExample MAPPER: For each input record, extract a set of key/value pairs that we care about the each record Hi Hadoop, Bye Hadoop Input Map Input key value pairs ( Hi, 1), ( Hadoop, 1), ( Bye, 1), ( Hadoop, 1) key 1 Values.. key 3 Values.. REDUCER: For each extracted key/value pair, combine it with other values that share the same key ( Hadoop, [1,1]) Key 1 All Values.. Reduce Aggregate ( Hadoop, 2) Final Key 1 Values.. Source: Doug Cutting s Slide Deck on Hadoop
31 Zoom Level 5 (HadoopMapReduce) MAPPER: For each input record, extract a set of key/value pairs that we care about the each record (LineNumber, s3pointer) Input Map Input key value pairs (s3pointer, [matches]) key 1 Values.. key 3 Values.. REDUCER: For each extracted key/value pair, combine it with other values that share the same key Key 1 All Values.. Reduce Aggregate Identity Function Final Key 1 Values.. Source: Doug Cutting s Slide Deck on Hadoop
32 The Open Source Hadoop framework is giving developers the power to do some pretty extraordinary things.
33 The Open Source Hadoop frameworkon Amazon EC2/S3is giving every developerthe power to do some pretty extraordinary things.
34 References Running Hadoop MapReduce on Amazon EC2 and Amazon S3 D=873 Hadoop on Amazon EC2 Step By Step Wiki Taking Massive Distributed Computing to the Common Man - Hadoop on Amazon EC2/S3
35
36 Thank you! Jinesh Varia /s3 /ec2 /sdb /sqs /forums /resources /blog
Amazon Web Services Cloud Computing in Action. Jeff Barr
Amazon Web Services Cloud Computing in Action Jeff Barr jbarr@amazon.com Who am I? Software development background Programmable applications and sites Microsoft Visual Basic and.net Teams Startup / venture
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationPrincipal Solutions Architect. Architecting in the Cloud
Matt Tavis Principal Solutions Architect Architecting in the Cloud Cloud Best Practices Whitepaper Prescriptive guidance to Cloud Architects Just Search for Cloud Best Practices to find the link ttp://media.amazonwebservices.co
More informationWhat is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?
What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)? What is Amazon Machine Image (AMI)? Amazon Elastic Compute Cloud (EC2)?
More information2013 AWS Worldwide Public Sector Summit Washington, D.C.
2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic
More informationAmazon Web Services CSE 490H. This presentation incorporates content licensed under the Creative Commons Attribution 2.5 License.
Amazon Web Services CSE 490H This presentation incorporates content licensed under the Creative Commons Attribution 2.5 License. Overview Questions about Project 3? EC2 S3 Putting them together Brief Virtualization
More informationFault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together
Fault-Tolerant Computer System Design ECE 695/CS 590 Putting it All Together Saurabh Bagchi ECE/CS Purdue University ECE 695/CS 590 1 Outline Looking at some practical systems that integrate multiple techniques
More informationCIT 668: System Architecture. Amazon Web Services
CIT 668: System Architecture Amazon Web Services Topics 1. AWS Global Infrastructure 2. Foundation Services 1. Compute 2. Storage 3. Database 4. Network 3. AWS Economics Amazon Services Architecture Regions
More informationBig Data 7. Resource Management
Ghislain Fourny Big Data 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage
More informationCPET 581 Cloud Computing: Technologies and Enterprise IT Strategies
CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies Lecture 8 Cloud Programming & Software Environments: High Performance Computing & AWS Services Part 2 of 2 Spring 2015 A Specialty Course
More informationScalable Web Programming. CS193S - Jan Jannink - 2/25/10
Scalable Web Programming CS193S - Jan Jannink - 2/25/10 Weekly Syllabus 1.Scalability: (Jan.) 2.Agile Practices 3.Ecology/Mashups 4.Browser/Client 7.Analytics 8.Cloud/Map-Reduce 9.Published APIs: (Mar.)*
More informationWelcome to the New Era of Cloud Computing
Welcome to the New Era of Cloud Computing Aaron Kimball The web is replacing the desktop 1 SDKs & toolkits are there What about the backend? Image: Wikipedia user Calyponte 2 Two key concepts Processing
More informationIntroduction to MapReduce
Introduction to MapReduce April 19, 2012 Jinoh Kim, Ph.D. Computer Science Department Lock Haven University of Pennsylvania Research Areas Datacenter Energy Management Exa-scale Computing Network Performance
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationCS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014
CS15-319 / 15-619 Cloud Computing Recitation 3 September 9 th & 11 th, 2014 Overview Last Week s Reflection --Project 1.1, Quiz 1, Unit 1 This Week s Schedule --Unit2 (module 3 & 4), Project 1.2 Questions
More informationAmazon Web Services: Building a Web-Scale Computing Infrastructure. Jeff Barr Senior Web Services Evangelist
Amazon Web Services: Building a Web-Scale Computing Infrastructure Jeff Barr Senior Web Services Evangelist Survey Says Are you an Amazon retail customer? Have you heard of the Amazon Web Services? Have
More informationHow can you implement this through a script that a scheduling daemon runs daily on the application servers?
You ve been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups
More informationDISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing WHAT IS CLOUD COMPUTING? 2. Slide 3. Slide 1. Why is it called Cloud?
DISTRIBUTED SYSTEMS [COMP9243] Lecture 8a: Cloud Computing Slide 1 Slide 3 ➀ What is Cloud Computing? ➁ X as a Service ➂ Key Challenges ➃ Developing for the Cloud Why is it called Cloud? services provided
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 26: Parallel Databases and MapReduce CSE 344 - Winter 2013 1 HW8 MapReduce (Hadoop) w/ declarative language (Pig) Cluster will run in Amazon s cloud (AWS)
More informationPrototyping Data Intensive Apps: TrendingTopics.org
Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page
More informationKillTest *KIJGT 3WCNKV[ $GVVGT 5GTXKEG Q&A NZZV ]]] QORRZKYZ IUS =K ULLKX LXKK [VJGZK YKX\OIK LUX UTK _KGX
KillTest Q&A Exam : AWS-SysOps Title : AWS Certified SysOps Administrator Associate Version : Demo 1 / 4 1.A user has created photo editing software and hosted it on EC2. The software accepts requests
More informationHow to host and manage enterprise customers on AWS: TOYOTA, Nippon Television, UNIQLO use cases
How to host and manage enterprise customers on AWS: TOYOTA, Nippon Television, UNIQLO use cases Kazutaka Goto - Evangelist, cloudpack Ken Tamagawa - Sr. Manager, Solutions Architecture, Amazon Web Services
More informationProgramming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines
A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs
More informationWhere We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344
Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)
More informationDesigning Fault-Tolerant Applications
Designing Fault-Tolerant Applications Miles Ward Enterprise Solutions Architect Building Fault-Tolerant Applications on AWS White paper published last year Sharing best practices We d like to hear your
More informationHadoop An Overview. - Socrates CCDH
Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected
More informationBig Data for Engineers Spring Resource Management
Ghislain Fourny Big Data for Engineers Spring 2018 7. Resource Management artjazz / 123RF Stock Photo Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationLarge Scale Processing with Hadoop
Large Scale Processing with Hadoop William Palmer Some slides courtesy of Per Møldrup-Dalum (State and University Library, Denmark) and Sven Schlarb (Austrian National Library) SCAPE Information Day British
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationThe Bits Are In The Clouds
The Bits Are In The Clouds Jeff Barr Senior Web Services Evangelist jbarr@amazon.com www.storage-developer.org Hello! I m Jeff Barr Amazon employee since 2002 Background: Microsoft, consulting, startups
More informationHadoop/MapReduce Computing Paradigm
Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications
More informationDATA MINING II - 1DL460
DATA MINING II - 1DL460 Spring 2017 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt17 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationAmazon Elastic MapReduce. API Reference API Version
Amazon Elastic MapReduce API Reference Amazon Elastic MapReduce: API Reference Copyright 2011-2012 Amazon Web Services LLC or its affiliates. All rights reserved. Welcome... 1 Actions... 2 AddInstanceGroups...
More informationMapReduce: Simplified Data Processing on Large Clusters 유연일민철기
MapReduce: Simplified Data Processing on Large Clusters 유연일민철기 Introduction MapReduce is a programming model and an associated implementation for processing and generating large data set with parallel,
More informationOracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
Oracle NoSQL Database and Cisco- Collaboration that produces results 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. What is Big Data? SOCIAL BLOG SMART METER VOLUME VELOCITY VARIETY
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More informationThe MapReduce Framework
The MapReduce Framework In Partial fulfilment of the requirements for course CMPT 816 Presented by: Ahmed Abdel Moamen Agents Lab Overview MapReduce was firstly introduced by Google on 2004. MapReduce
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationIntroduction to MapReduce (cont.)
Introduction to MapReduce (cont.) Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com USC INF 553 Foundations and Applications of Data Mining (Fall 2018) 2 MapReduce: Summary USC INF 553 Foundations
More informationBasics of Cloud Computing Lecture 2. Cloud Providers. Satish Srirama
Basics of Cloud Computing Lecture 2 Cloud Providers Satish Srirama Outline Cloud computing services recap Amazon cloud services Elastic Compute Cloud (EC2) Storage services - Amazon S3 and EBS Cloud managers
More informationProject Presentation
Project Presentation Saad Arif Dept. of Electrical Engineering and Computer Science University of Central Florida - Orlando, FL November 7, 2013 1 Introduction 1 Introduction 2 Gallery 1 Introduction 2
More informationOutline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins
MapReduce 1 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins 2 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce
More informationPage 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24
Goals for Today" CS162 Operating Systems and Systems Programming Lecture 24 Capstone: Cloud Computing" Distributed systems Cloud Computing programming paradigms Cloud Computing OS December 2, 2013 Anthony
More informationIntro to Software as a Service (SaaS) and Cloud Computing
UC Berkeley Intro to Software as a Service (SaaS) and Cloud Computing Armando Fox, UC Berkeley Reliable Adaptive Distributed Systems Lab 2009-2012 Image: John Curley http://www.flickr.com/photos/jay_que/1834540/
More informationAmazon AWS-Solution-Architect-Associate Exam
Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationCS November 2018
Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationRails on HBase. Zachary Pinter and Tony Hillerson RailsConf 2011
Rails on HBase Zachary Pinter and Tony Hillerson RailsConf 2011 What we will cover What is it? What are the tradeoffs that HBase makes? Why HBase is probably the wrong choice for your app Why HBase might
More informationLocality-Aware Dynamic VM Reconfiguration on MapReduce Clouds. Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng
Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng Virtual Clusters on Cloud } Private cluster on public cloud } Distributed
More informationamazon.com s Journey to the Cloud Jon Jenkins AWS Summit June 13, 2011
amazon.com s Journey to the Cloud Jon Jenkins jjenkin@amazon.com AWS Summit June 13, 2011 1995-2010 + First real data center Distribution Center Isolation Decouple Service Oriented Architecture Scale
More informationOpen Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments
Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing
More informationIntroduction to Amazon Web Services. Jeff Barr Senior AWS /
Introduction to Amazon Web Services Jeff Barr Senior AWS Evangelist @jeffbarr / jbarr@amazon.com What Does It Take to be a Global Online Retailer? The Obvious Part And the Not-So Obvious Part How Did
More informationAWS Solutions Architect Associate (SAA-C01) Sample Exam Questions
1) A company is storing an access key (access key ID and secret access key) in a text file on a custom AMI. The company uses the access key to access DynamoDB tables from instances created from the AMI.
More informationDesign Patterns for the Cloud. MCSN - N. Tonellotto - Distributed Enabling Platforms 68
Design Patterns for the Cloud 68 based on Amazon Web Services Architecting for the Cloud: Best Practices Jinesh Varia http://media.amazonwebservices.com/aws_cloud_best_practices.pdf 69 Amazon Web Services
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationMapReduce, Hadoop and Spark. Bompotas Agorakis
MapReduce, Hadoop and Spark Bompotas Agorakis Big Data Processing Most of the computations are conceptually straightforward on a single machine but the volume of data is HUGE Need to use many (1.000s)
More informationCS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud
CS15-319: Cloud Computing Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud Lecture Outline Discussion On Course Project Amazon Web Services 2 Course Project Course Project Phase I-A
More informationAhead in the Cloud. Matt Wood TECHNOLOGY EVANGELIST
Ahead in the Cloud Matt Wood TECHNOLOGY EVANGELIST Hello. Thank you. Don t be afraid to be a bit technical! There Will Be Code 3 1 Building blocks Infrastructure services Compute Storage Databases
More informationCS 345A Data Mining. MapReduce
CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Dis Commodity Clusters Web data sets can be ery large Tens to hundreds of terabytes
More informationIndrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel The University of Texas at Austin
Airavat: Security and Privacy for MapReduce Indrajit Roy, Srinath T.V. Setty, Ann Kilzer, Vitaly Shmatikov, Emmett Witchel The University of Texas at Austin Computing in the year 201X 2 Data Illusion of
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system
More informationPaaS Cloud mit Java. Eberhard Wolff, Principal Technologist, SpringSource A division of VMware VMware Inc. All rights reserved
PaaS Cloud mit Java Eberhard Wolff, Principal Technologist, SpringSource A division of VMware 2009 VMware Inc. All rights reserved Agenda! A Few Words About Cloud! PaaS Platform as a Service! Google App
More informationAWS Solution Architecture Patterns
AWS Solution Architecture Patterns Objectives Key objectives of this chapter AWS reference architecture catalog Overview of some AWS solution architecture patterns 1.1 AWS Architecture Center The AWS Architecture
More informationEmbedded Technosolutions
Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DISTRIBUTED FRAMEWORK FOR DATA MINING AS A SERVICE ON PRIVATE CLOUD RUCHA V. JAMNEKAR
More informationThe Gearman Cookbook OSCON Eric Day Senior Software Rackspace
The Gearman Cookbook OSCON 2010 Eric Day http://oddments.org/ Senior Software Engineer @ Rackspace Thanks for being here! OSCON 2010 The Gearman Cookbook 2 Ask questions! Grab a mic for long questions.
More informationLaunch and Scale your Social Game in the Cloud with AltEgo, Amazon Web Services and RightScale
Launch and Scale your Social Game in the Cloud with AltEgo, Amazon Web Services and RightScale November 16, 2010 Your Panel Today Presenting: Josh Fraser: VP, Business Development, RightScale Jeff Barr:
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model
More informationAWS Certified Solutions Architect - Associate 2018 (SAA-001)
AWS Certified Solutions Architect - Associate 2018 (SAA-001) Amazon AWS Certified Solutions Architect Associate 2018 Dumps Available Here at: /amazon-exam/aws-certified-solutionsarchitect-associate-2018-dumps.html
More information1. Stratified sampling is advantageous when sampling each stratum independently.
Quiz 1. 1. Stratified sampling is advantageous when sampling each stratum independently. 2. All outliers within a dataset are invalid observations. 3. Consider a dataset comprising a set of (single value)
More informationA Review Paper on Big data & Hadoop
A Review Paper on Big data & Hadoop Rupali Jagadale MCA Department, Modern College of Engg. Modern College of Engginering Pune,India rupalijagadale02@gmail.com Pratibha Adkar MCA Department, Modern College
More informationIntroduction to. Amazon Web Services. Thilina Gunarathne Salsa Group, Indiana University. With contributions from Saliya Ekanayake.
Introduction to Amazon Web Services Thilina Gunarathne Salsa Group, Indiana University. With contributions from Saliya Ekanayake. Introduction Fourth Paradigm Data intensive scientific discovery DNA Sequencing
More informationTraining on Amazon AWS Cloud Computing. Course Content
Training on Amazon AWS Cloud Computing Course Content 15 Amazon Web Services (AWS) Cloud Computing 1) Introduction to cloud computing Introduction to Cloud Computing Why Cloud Computing? Benefits of Cloud
More informationSAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions
SAA-C01 AWS Solutions Architect Associate Exam Summary Syllabus Questions Table of Contents Introduction to SAA-C01 Exam on AWS Solutions Architect Associate... 2 AWS SAA-C01 Certification Details:...
More informationThe Evolution of a Data Project
The Evolution of a Data Project The Evolution of a Data Project Python script The Evolution of a Data Project Python script SQL on live DB The Evolution of a Data Project Python script SQL on live DB SQL
More informationMySQL in the Cloud Tricks and Tradeoffs
MySQL in the Cloud Tricks and Tradeoffs Thorsten von Eicken CTO RightScale 1 MySQL & Amazon EC2 @RightScale Operating in Amazon EC2 since fall 2006 Cloud Computing Management System Replicated MySQL product
More informationThe ATLAS EventIndex: Full chain deployment and first operation
The ATLAS EventIndex: Full chain deployment and first operation Álvaro Fernández Casaní Instituto de Física Corpuscular () Universitat de València CSIC On behalf of the ATLAS Collaboration 1 Outline ATLAS
More informationApache Giraph: Facebook-scale graph processing infrastructure. 3/31/2014 Avery Ching, Facebook GDM
Apache Giraph: Facebook-scale graph processing infrastructure 3/31/2014 Avery Ching, Facebook GDM Motivation Apache Giraph Inspired by Google s Pregel but runs on Hadoop Think like a vertex Maximum value
More informationSensor Data Collection and Processing
Sensor Data Collection and Processing Applying Web Scale To Sensor Data Today s speaker Josh Patterson josh@cloudera.com / twitter: @jpatanooga Master s Thesis: self-organizing mesh networks Published
More informationData-Intensive Distributed Computing
Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 5: Analyzing Relational Data (1/3) February 8, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationProgramming Systems for Big Data
Programming Systems for Big Data CS315B Lecture 17 Including material from Kunle Olukotun Prof. Aiken CS 315B Lecture 17 1 Big Data We ve focused on parallel programming for computational science There
More informationHadoop Map Reduce 10/17/2018 1
Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018
More informationParallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce
Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The
More informationWHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud.
WHITEPAPER AMAZON ELB: Your Master Key to a Secure, Cost-Efficient and Scalable Cloud www.cloudcheckr.com TABLE OF CONTENTS Overview 3 What Is ELB? 3 How ELB Works 4 Classic Load Balancer 5 Application
More informationAWS_SOA-C00 Exam. Volume: 758 Questions
Volume: 758 Questions Question: 1 A user has created photo editing software and hosted it on EC2. The software accepts requests from the user about the photo format and resolution and sends a message to
More informationwhat is cloud computing?
what is cloud computing? (Private) Cloud Computing with Mesos at Twi9er Benjamin Hindman @benh scalable virtualized self-service utility managed elastic economic pay-as-you-go what is cloud computing?
More informationAWS Interview Questions and Answers
AWS Interview Questions and Answers AWS Interview Questions and Answers AWS Certified Solutions Architect Drives to the 15 Top Paying IT Certifications. Absolutely, AWS Solution Architect position is an
More informationDatabase Systems CSE 414
Database Systems CSE 414 Lecture 19: MapReduce (Ch. 20.2) CSE 414 - Fall 2017 1 Announcements HW5 is due tomorrow 11pm HW6 is posted and due Nov. 27 11pm Section Thursday on setting up Spark on AWS Create
More informationMapReduce: Recap. Juliana Freire & Cláudio Silva. Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec
MapReduce: Recap Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec MapReduce: Recap Sequentially read a lot of data Why? Map: extract something we care about map (k, v)
More informationDistributed Computing.
Distributed Computing at Hai.Thai@rackspace.com About: Me ME About: Me ME 09 Tech grad B.S. Computer Engineering 4 years at rackspace About: Rackspace About: Rackspace Managed + Cloud hosting Cloud Applications:
More informationHaLoop Efficient Iterative Data Processing on Large Clusters
HaLoop Efficient Iterative Data Processing on Large Clusters Yingyi Bu, Bill Howe, Magdalena Balazinska, and Michael D. Ernst University of Washington Department of Computer Science & Engineering Presented
More informationCS 345A Data Mining. MapReduce
CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes
More information7/22/2008. Transformations
Bandwidth Consumed by s Global Websites Bandwidth Consumed by What is? 7 Countries More than 76 million active customer accounts Approximately 1.3 million active seller accounts Hundreds of thousand of
More informationFacilitating Consistency Check between Specification & Implementation with MapReduce Framework
Facilitating Consistency Check between Specification & Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, Keijiro ARAKI Kyushu University, Japan 2 Our expectation Light-weight formal
More information