SciDB An Open Source Data Base Project. Paul Brown* *medical emergency trumped presence
Transcription
1 SciDB: An Open Source Data Base Project, by Paul Brown* (*medical emergency trumped presence)
2 Outline: Science data; why science folks are unhappy with RDBMS; our project (what we are doing about it).
3 O(100) petabytes
4 Nearest neighbor queries, time series queries
5 Snow Cover in the Sierras
6 Why SciDB? Big science is very unhappy with RDBMS: astronomy, HEP, fusion, bio, remote sensing, oceanography.
7 Why? Experience of Sequoia 2000 (mid 1990s): tried to use Postgres for science databases; failed badly. The main science data type is an array, and it is horribly inefficient to simulate arrays on top of tables. Required features are absent (provenance, uncertainty, version control). SQL operations are wrong (regrid, not join).
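The "regrid, not join" point can be made concrete with a small sketch. It assumes that regrid means downsampling an array by aggregating non-overlapping blocks of cells; the function name `regrid` and the choice of averaging are illustrative assumptions, not SciDB's implementation.

```python
from statistics import mean


def regrid(grid, block_rows, block_cols):
    """Downsample a 2-D list by averaging non-overlapping blocks.

    Hypothetical helper for illustration; edge blocks may be smaller
    when the grid size is not a multiple of the block size.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, block_rows):
        out_row = []
        for c0 in range(0, cols, block_cols):
            # Collect every cell that falls inside the current block.
            cells = [
                grid[r][c]
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            ]
            out_row.append(mean(cells))
        out.append(out_row)
    return out


fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = regrid(fine, 2, 2)  # 4x4 grid reduced to 2x2
```

On an array the operation is a walk over neighboring cells by index. On a table of (row, column, value) tuples, the same result would need a computed block key per tuple followed by a group-by, which is one way to read the inefficiency the slide describes.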
8 Why SciDB? Net result: a mentality of "roll your own" from the ground up for every new science project. Realization by the science community that this is long-term suicide. The community seemingly wants to get behind something better. Great commonality of needs among domains.
9 A Little Context: XLDB-1 (Oct 2007). The message from the previous slides came across loud and clear. DeWitt/Stonebraker agreed to "move the ball down the field".
10 A Little Context: Asilomar (March day workshop people). Flesh out requirements (the biggie is an open source, commercial quality, petabyte-scale DBMS). Considerable commonality across science disciplines. Core team of science and DBMS types identified to push things forward. Research issues identified.
11 A Little Context: The Next Year. Initial design completed, along with an initial implementation. Shown at VLDB/Lyons. Recruiting of initial team. Detailed use cases specified.
12 Funding Situation: Tried and failed to get $$$ from NSF/DOE/NASA. Tried and failed to get $$$ from foundations. Tried and failed to get $$$ from industry. Last resort was VCs. Company (Zetics) funded in March 10.
13 Present Day Structure: About 25 employees/consultants/volunteers, co-ordinated by Suchi Raman. Design co-ordinated by Mike Stonebraker and Paul Brown. Support/marketing/business/website co-ordinated by Marilyn Matz.
14 Data Model: Nested multi-dimensional arrays are a natural representation for spatially or temporally ordered data. Array cells can be tuples of values or other arrays; the type system is extensible. Arrays can be very sparse, with user-definable handling for null or missing data.
15 Data Storage: Arrays are chunked (in multiple dimensions) in storage. Chunks are partitioned across a collection of nodes. Each node has processor and storage (shared nothing). Chunks have overlap to support neighborhood operations.
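The chunking scheme on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the function names (`chunk_ranges`, `plan_chunks`), the fixed overlap width, and round-robin placement across nodes are all hypothetical choices, not SciDB's actual storage layout.

```python
from itertools import product


def chunk_ranges(length, chunk, overlap):
    """Split one dimension into chunks.

    Returns tuples (core_start, core_stop, stored_start, stored_stop).
    The stored range extends the core range by `overlap` cells on each
    side, clipped to the array bounds.
    """
    ranges = []
    for start in range(0, length, chunk):
        stop = min(start + chunk, length)
        ranges.append(
            (start, stop, max(0, start - overlap), min(length, stop + overlap))
        )
    return ranges


def plan_chunks(shape, chunk_shape, overlap, num_nodes):
    """Chunk an N-dimensional array and assign each chunk to a node."""
    per_dim = [chunk_ranges(n, c, overlap) for n, c in zip(shape, chunk_shape)]
    plan = []
    for i, combo in enumerate(product(*per_dim)):
        plan.append(
            {
                "core": [(r[0], r[1]) for r in combo],
                "stored": [(r[2], r[3]) for r in combo],
                "node": i % num_nodes,  # round-robin placement (assumption)
            }
        )
    return plan


plan = plan_chunks(shape=(8, 8), chunk_shape=(4, 4), overlap=1, num_nodes=3)
```

With an overlap of one cell, the first chunk owns cells 0..3 in each dimension but also stores cell 4, so a neighborhood operation such as a 3x3 window at the chunk edge can run on the local node without fetching data from another node.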
16 Architecture: Shared nothing cluster; 10s to 1000s of nodes; commodity hardware. Queries refer to arrays as if not distributed. Query planner optimizes queries for efficient data access & processing. Query plan runs on a node's local executor/storage manager. Runtime supervisor coordinates execution.
[Figure: architecture diagram with three layers (Application Layer, Server Layer, Storage Layer). Labels: Application, Language Specific UI, Runtime Supervisor, Query Interface and Parser, Plan Generator, Node 1, Node 2, Node 3, Local Executor, Storage Manager.]
17 AQL: an array & analytics query language. DBMS-style operations: filter, combine (join). Array operations: multiply, transpose. Extensible: Postgres-style UDFs. Interfaces to other open source packages: MatLab, R.
18 Other Features Which Science Guys Want (These Could Be in RDBMS, but Aren't). Uncertainty: data has error bars, which must be carried along in the computation (interval arithmetic).
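As a rough illustration of carrying error bars through a computation with interval arithmetic, here is a minimal sketch. The `Interval` class and the `from_error_bar` helper are hypothetical names introduced for this example; the slide does not say how SciDB represents uncertain values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] bounding an uncertain value."""

    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # The smallest difference pairs our low with their high.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # With mixed signs any corner can be the extreme, so check all four.
        corners = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ]
        return Interval(min(corners), max(corners))


def from_error_bar(value, err):
    """Turn a measurement 'value +/- err' into an interval."""
    return Interval(value - err, value + err)


a = from_error_bar(10.0, 1.0)  # [9.0, 11.0]
b = from_error_bar(2.0, 0.5)   # [1.5, 2.5]
total = a + b
diff = a - b
prod = a * b
```

Each result is itself an interval, so the uncertainty travels with the data through every step instead of being tracked on the side.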
19 Other Features: Provenance (lineage). What calibration generated the data? What was the "cooking" algorithm? In general: repeatability of data derivation.
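One way to picture provenance is to have every derived dataset remember the operation, the parameters, and the input that produced it. The sketch below assumes a single-parent derivation chain; the names `Dataset`, `derive`, `lineage`, `calibrate`, and `cook` are hypothetical and chosen only to mirror the slide's wording.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    values: list
    operation: str = "raw"
    params: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)


def derive(name, parent, operation, func, **params):
    """Apply func to every value and record how the result was made."""
    values = [func(v, **params) for v in parent.values]
    return Dataset(name, values, operation, params, [parent])


def lineage(ds):
    """Return the derivation steps from raw data to ds, oldest first."""
    steps = []
    while ds.parents:  # single-parent chain in this sketch
        steps.append((ds.operation, ds.params))
        ds = ds.parents[0]
    steps.append((ds.operation, ds.params))
    return list(reversed(steps))


def calibrate(v, gain, offset):
    return gain * v + offset


def cook(v, threshold):
    # Illustrative "cooking" step: suppress values below a threshold.
    return v if v >= threshold else 0.0


raw = Dataset("raw_counts", [1.0, 2.0, 3.0])
calibrated = derive("calibrated", raw, "calibrate", calibrate, gain=2.0, offset=0.5)
cooked = derive("cooked", calibrated, "cook", cook, threshold=4.0)
history = lineage(cooked)
```

Because each step keeps its parameters, the derivation can be replayed from the raw data, which is the repeatability the slide asks for.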
20 Other Features: Time travel: don't fix errors by overwrite, i.e., keep all of the data. Spatial support. Named versions: recalibration is usually handled this way.
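The no-overwrite idea and named versions can be sketched together. This is an illustration only: the `VersionedArray` class and its methods are hypothetical, and copying the whole array for each version is a simplification (a real system would more plausibly store deltas).

```python
class VersionedArray:
    """Sparse array (cell -> value) where updates never overwrite history."""

    def __init__(self, initial):
        self._versions = [dict(initial)]  # version 0
        self._names = {}

    def update(self, changes):
        """Create a new version with `changes` applied; return its number."""
        new = dict(self._versions[-1])  # full copy, for simplicity
        new.update(changes)
        self._versions.append(new)
        return len(self._versions) - 1

    def name_version(self, name, version=None):
        """Attach a name to a version (the latest one by default)."""
        latest = len(self._versions) - 1
        self._names[name] = latest if version is None else version

    def read(self, cell, as_of=None):
        """Read a cell from the latest version, a version number, or a name."""
        if as_of is None:
            index = len(self._versions) - 1
        elif isinstance(as_of, str):
            index = self._names[as_of]
        else:
            index = as_of
        return self._versions[index][cell]


arr = VersionedArray({(0, 0): 1.0, (0, 1): 2.0})
arr.name_version("calibration-A")
v1 = arr.update({(0, 1): 2.5})  # a recalibration, stored as a new version
arr.name_version("calibration-B")

latest = arr.read((0, 1))
original = arr.read((0, 1), as_of="calibration-A")
```

The corrected value becomes the default read, while the earlier value stays reachable by version number or name, so results computed against the old calibration remain reproducible.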
21 Status: Open source SciDB 0.5 is available now from our web site. Good for PoCs; not good for Pbytes. Better versions are imminent. Come to the community meeting this afternoon for a roadmap.