Basics of Data Management

1 Basics of Data Management
Chaitan Baru

2 Objectives of this Module
- Introduce concepts and technologies for managing structured, semi-structured, and unstructured data
- Obtain a grounding in traditional data management

3 Outline
- The Data Lifecycle
- Data Genres: structured, semi-structured, unstructured
- Traditional Data Management: a grounding in the basics

4 Data Roles
- The Data Owner
- The Data User
- The Application/Database Programmer
- The Database Administrator
- The Systems Administrator (Storage Systems Administrator)
- The Data Security Officer (Data Quality)

5 The Data Lifecycle
- Acquisition: obtaining the data
- Modeling: representing the data
  - Implies a requirements analysis step
- Storage: storing the data
- Access and analysis (includes data integration)
- Sharing data: standards for sharing
- Preserving the data for the long term

6 Data Management Tasks
- Obtaining the data
  - Primary: data obtained by you
  - Secondary: data from existing sources, accessed by you and then transformed for your use
  - Accessing existing data resources
- Modeling the data
  - Data tends to stay around longer than one thinks, or plans for
  - Some amount of formal modeling can be useful
- Storing data
  - Depends very much on:
    - Available skill sets
    - Plans for how to use the data
  - The storage system selected for the data has a huge impact on downstream tasks

7 Data Management Tasks 2
- Accessing and analyzing the data
  - Methods employed depend upon the storage choice, e.g. files vs. DBMS vs. HDFS vs. graph data stores, etc.
- Sharing the data
  - Need accepted norms for the structure and semantics of the data
  - Age: time period from birth to now, rounded to the nearest year
    - Represented by an integer
  - Absorbed Nitrates, Age of Rock, ...

8 Data Management Tasks 3
- Preserving the data for the long term
  - Users typically want to preserve ALL data, until cost is factored in
  - Decide how long "long term" is
  - Find a service or service provider who can preserve the data at a cost that you can afford
    - Could be campus libraries in future
  - Use immutable methods for storing the bits and referring to the bits
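One possible reading of "immutable methods for storing the bits and referring to the bits" is content addressing: the identifier of a dataset is derived from its bytes, so the reference cannot silently point at changed data. The sketch below is only an illustration under that assumption; the `store` dictionary, the `archive` helper, and the sample bytes are hypothetical and not from the slides.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies these exact bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical in-memory archive: identifier -> bytes.
store = {}

def archive(data: bytes) -> str:
    """Store the bytes under their content identifier and return it."""
    key = content_id(data)
    store.setdefault(key, data)  # an existing object is never overwritten
    return key

sample = b"sample_id,value\nA1,3.2\n"  # made-up sample bytes
ref = archive(sample)
```

Archiving the same bytes again yields the same reference, while any change to the bytes yields a different one.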

9 Data Genres: Classes of Data
- Structured: spreadsheets, relational databases, graphs?
- Semi-structured: web logs, XML documents, ...
- Unstructured: text documents, key-value pairs, ...

10 Representing / Storing Data
- Structured, e.g. relational databases
  - Created via a requirements analysis and design process, using a Data Definition Language (DDL)
  - Information about the structure (the schema) is in metadata tables, aka the database catalog
  - The structured model includes/enables a formal query language, e.g. SQL
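As a minimal sketch of DDL and the catalog, the following uses SQLite through Python's standard `sqlite3` module. The choice of SQLite is an assumption made for runnability; the slides do not name a product. In SQLite the catalog is exposed as the `sqlite_master` table, and other systems use different catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure before any data exists.
conn.execute("""
    CREATE TABLE Staff (
        SSN       TEXT PRIMARY KEY,
        LastName  TEXT NOT NULL,
        FirstName TEXT NOT NULL
    )
""")

# The schema itself is stored as data in the catalog and can be queried with SQL.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Column-level metadata (SQLite-specific PRAGMA; row[1] is the column name).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Staff)")]
```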

11 Representing / Storing Data 2
- Semi-structured, e.g. XML data
  - An extensible template for representing data, e.g. tree, graph
  - Extensibility is built in, e.g. adding nodes, subtrees, subgraphs, edges
  - Schema specification is available for XML
  - XML extensions to relational databases
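The extensibility point can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`staff`, `member`, `project`, `effort`) are invented for this example; a new subtree is attached to one record without any change to a schema or to other records.

```python
import xml.etree.ElementTree as ET

# A tiny tree-shaped document with one record.
staff = ET.fromstring(
    "<staff><member><lastName>Doe</lastName></member></staff>"
)
member = staff.find("member")

# Extend this one record with a new node; nothing else has to change.
project = ET.SubElement(member, "project", {"effort": "0.6"})
project.text = "Sample Project"

tags = [child.tag for child in member]
serialized = ET.tostring(staff, encoding="unicode")
```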

12 Representing / Storing Data 3
- Unstructured, e.g. text documents
  - Created by applications in any language using file operations
  - Information about the structure resides with the application

13 Traditional Data Management Tasks
- Design and implement the logical schema (the data "template")
- Design and implement the physical schema (files, tables, indexes)
- Load the data
- Access data
- Update data
- Index, save, backup/restore data
- Update schema

14 Traditional Data Management
- Application-driven design
  - Requirements analysis to obtain the application needs
- Data management steps
  1. Design: conceptual or logical schema
     - How the user thinks about the data
     - The data model: data representation + query language
  2. Implementation: physical schema
     - How the computer internally implements the logical model (using memory, disk files, disk blocks, etc.)
  3. Data access
     - Query processing
     - Data loading
     - Data updating: insert, delete, update
  4. Utilities: indexing, statistics, backup/restore, ...

15 Structured Data: Spreadsheets
- Commonly used by science users
- Logical model
  - Set of rows with a set of columns
  - Columns can have data types
  - Columns can be easily added/removed
- Language
  - GUI-based manipulation
  - Macros in Visual Basic
  - ODBC interface to the data in the spreadsheet
    - SQL
  - Can create linkages/dependencies among columns

16 Data Example
- Staff members working on Projects
  - A Project may involve multiple Staff members
  - A Staff member could work on multiple Projects
  - A Staff member works on a Project for a given percentage of time (effort)
- Excel

17 Sharing and Archiving Excel Sheets
- DataONE (2012): DataUp, dataup.org
  - Create metadata
  - Archive dataset
- GEON (2005): Contribute Resource, portal.geongrid.org
  - Create metadata
  - Convert to relational database
  - Archive dataset

18 Spreadsheets: Limitations
- Based on physical implementation
  - E.g. 1M rows, 16K columns
  - Number of worksheets depends upon the amount of available memory
- Utilities
  - Data load: smart import facility
  - Indexing, backup/restore, etc.: not available

19 Considerations in Choosing Appropriate Data Management Technologies
- Available skill sets
- Available software/hardware
- Nature of the problem
- Amount of data
- Number of users
- Longevity of the project
  - Always longer than expected
  - Data professionals like to take the long view
  - Scientists/users tend to take the short view

20 Structured Data: Relational Databases
- Logical model
  - Set of relations, each of which is a set of tuples (rows)
  - Each tuple has a set of columns (fields)
- Physical model
  - Set of tables (rows + columns)
  - Data stored in tablespaces
  - Indexes
  - Extended field types, e.g. blobs, text, image, ...

21 The Staff-Projects Example
- Logical/conceptual schema: E-R diagram, UML
- [Diagram: Staff, WorksOn, Project]

22 The Staff-Projects Data Example
- Staff: SSN, Last Name, First Name
- Project: Project Index, Project Name, Funding Agency, Start Date, End Date
- WorksOn: Staff SSN, Project Index, %Time, Start Date, End Date
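The three tables could be declared roughly as follows. This is a sketch in SQLite via Python's `sqlite3`; the column types, the key choices, the `CHECK` constraint, the identifier spellings (for example `PctTime` for %Time), and all sample values are assumptions, since the slide lists only column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Staff (
    SSN       TEXT PRIMARY KEY,
    LastName  TEXT NOT NULL,
    FirstName TEXT NOT NULL
);
CREATE TABLE Project (
    ProjIndex     INTEGER PRIMARY KEY,
    ProjectName   TEXT NOT NULL,
    FundingAgency TEXT,
    StartDate     TEXT,
    EndDate       TEXT
);
-- WorksOn resolves the many-to-many relationship between Staff and Project.
CREATE TABLE WorksOn (
    StaffSSN  TEXT    NOT NULL REFERENCES Staff(SSN),
    ProjIndex INTEGER NOT NULL REFERENCES Project(ProjIndex),
    PctTime   REAL    NOT NULL CHECK (PctTime > 0 AND PctTime <= 1),
    StartDate TEXT,
    EndDate   TEXT
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO Staff VALUES ('000-00-0001', 'Doe', 'Jane')")
conn.execute("INSERT INTO Project VALUES (1, 'Sample Project', 'Agency X', '2012-01-01', NULL)")
conn.execute("INSERT INTO WorksOn VALUES ('000-00-0001', 1, 0.6, '2012-01-01', NULL)")

# A WorksOn row for a Staff member who does not exist is rejected.
try:
    conn.execute("INSERT INTO WorksOn VALUES ('999-99-9999', 1, 0.5, '2012-01-01', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

works_on_count = conn.execute("SELECT COUNT(*) FROM WorksOn").fetchone()[0]
```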

23 Relational Databases: Physical Schema
- Tables are defined in tablespaces
  - A tablespace is a collection of tablespace containers
  - Tablespace containers can be files or raw devices
  - Different tablespaces for data, indexes, large objects (BLOBs)
- Data are organized by rows
- Language
  - SQL: operates on tables
  - Select, Project, Join
  - Aggregate, Group-By, Sort

24 DBMS Architecture
- [Layer diagram; labels: Applications, DBMS, Filesystem, DBMS, Operating System, Devices]

25 Making Relational Databases Efficient
- Schema design
  - Can reduce the number of tables to reduce the number of joins

    SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
    FROM Staff, Project, Effort
    WHERE Staff.SSN = Effort.StaffSSN
      AND Project.ProjIndex = Effort.ProjIndex
      AND Effort.Time > 0.5
    ORDER BY Effort.Time, Staff.LastName, Staff.FirstName

- Define views to simplify SQL queries

    CREATE VIEW Over50PerCent AS
    SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
    FROM ... WHERE ...

- Materialized views to make access efficient
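The join and the view on this slide can be tried end to end with a small sketch in SQLite via Python's `sqlite3`. The table definitions and sample rows are made up, and the view body is assumed to reuse the FROM and WHERE clauses of the join query, which the slide elides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff   (SSN TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Project (ProjIndex INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE Effort  (StaffSSN TEXT, ProjIndex INTEGER, Time REAL);

-- Made-up sample data.
INSERT INTO Staff   VALUES ('1', 'Ann', 'Lee'), ('2', 'Bob', 'Ray');
INSERT INTO Project VALUES (10, 'Alpha'), (20, 'Beta');
INSERT INTO Effort  VALUES ('1', 10, 0.75), ('1', 20, 0.25), ('2', 20, 0.6);

-- The view hides the three-way join behind a single name.
CREATE VIEW Over50PerCent AS
SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
FROM Staff, Project, Effort
WHERE Staff.SSN = Effort.StaffSSN
  AND Project.ProjIndex = Effort.ProjIndex
  AND Effort.Time > 0.5;
""")

# Queries against the view no longer need to spell out the join.
rows = conn.execute(
    "SELECT * FROM Over50PerCent ORDER BY Time, LastName, FirstName"
).fetchall()
```

A plain view like this is re-evaluated on every query. A materialized view stores the result instead; SQLite has no such feature, so it is not shown here.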

26 Relational Databases: Loading Data
- Data loading
  - Bulk loading of data
  - Incremental load (append)
  - Load with update option
  - Load with no-log option
- Insert, Update, Delete operations
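Dedicated load utilities are product-specific, so the sketch below only approximates three of the loading modes with ordinary SQL in SQLite via Python's `sqlite3`: a batch insert in a single transaction, an append, and an insert-or-replace standing in for "load with update". The CSV text and names are made up, and the mapping to `INSERT OR REPLACE` is an assumption.

```python
import csv
import io
import sqlite3

csv_text = "SSN,LastName,FirstName\n1,Lee,Ann\n2,Ray,Bob\n"  # made-up input file

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (SSN TEXT PRIMARY KEY, LastName TEXT, FirstName TEXT)"
)

# Bulk load: all rows go in as one batch inside a single transaction.
reader = csv.DictReader(io.StringIO(csv_text))
with conn:
    conn.executemany(
        "INSERT INTO Staff VALUES (:SSN, :LastName, :FirstName)", reader
    )

# Incremental load (append): add new rows to the existing table.
with conn:
    conn.execute("INSERT INTO Staff VALUES (?, ?, ?)", ("3", "Kim", "Cy"))

# Load with update: a row whose key already exists replaces the old row.
with conn:
    conn.execute(
        "INSERT OR REPLACE INTO Staff VALUES (?, ?, ?)", ("2", "Ray", "Robert")
    )

row_count = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
first_name_2 = conn.execute(
    "SELECT FirstName FROM Staff WHERE SSN = '2'"
).fetchone()[0]
```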

27 Improving Performance
- Indexing
  - Single-column indexes
  - Multicolumn indexes for efficient joins
  - Multidimensional indexing, e.g. spatial (lat/long), spatiotemporal
- Query processing
  - Data-parallel processing: multiple threads per operator (e.g. select, join)
  - Dataflow processing: between consecutive operators, e.g. from select to join
- Tune database parameters (heap sizes, process pools, temp space)
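The effect of a multicolumn index can be observed by asking the database for its query plan before and after creating the index. This sketch uses SQLite's `EXPLAIN QUERY PLAN` via Python's `sqlite3`; the index name and the generated rows are made up, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Effort (StaffSSN TEXT, ProjIndex INTEGER, Time REAL)")
conn.executemany(
    "INSERT INTO Effort VALUES (?, ?, ?)",
    [(str(i), i % 10, 0.5) for i in range(1000)],  # made-up rows
)

def plan(sql):
    """Return the human-readable plan text for a query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Effort WHERE StaffSSN = '42' AND ProjIndex = 2"

before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_effort_staff_proj ON Effort (StaffSSN, ProjIndex)")
after = plan(query)   # the planner can now search the multicolumn index
```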

28 Data Placement for Intra-query Parallelism
- [Diagram: Staff in Tablespace A, Effort in Tablespace B]
- Distribution across disks in a storage system
  - Parallel I/O: read two tables at the same time
  - Use the tablespace abstraction to define disjoint sets of tablespace containers
- Clustering together of similar records from different tables
  - Use join columns to define clustering
  - Grab multiple rows with the same join values across different tables in a single (or a few) read operations

29 Data Placement Across Nodes
- [Diagram: nodes P, P, P, P, ..., each holding a partition of Staff and Effort]
- Data distribution schemes
  - Hash partition: Staff and Effort on SSN
  - Range partition data, e.g. all data belonging to a particular region
  - Round-robin partition
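The three distribution schemes reduce to three small functions that map a row to a node number. The sketch below is illustrative only: the node count, the use of CRC32 as the hash, and the boundary values are assumptions, not details from the slides.

```python
import bisect
import zlib

NUM_NODES = 4  # assumed cluster size

def hash_partition(key, n=NUM_NODES):
    """Node chosen by hashing the key; equal keys always land on the same node."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % n

def range_partition(key, boundaries):
    """Node chosen by value range; boundaries is a sorted list of split points."""
    return bisect.bisect_right(boundaries, key)

def round_robin_partition(row_number, n=NUM_NODES):
    """Node chosen by arrival order, which spreads rows evenly."""
    return row_number % n

# Hashing Staff.SSN and Effort.StaffSSN with the same function co-locates
# matching rows, so the join on SSN can run locally on each node.
staff_node = hash_partition("000-00-0001")
effort_node = hash_partition("000-00-0001")
```

Range partitioning keeps related values together, which suits queries restricted to one range, while round-robin balances load but gives no locality for joins.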

30 Embedding Application Logic in the Database Query
- User-Defined Functions (UDFs)
- Application process space: where user applications execute
- Database process space: where SQL queries execute
- [Diagram labels: UDFs; Application Heap (memory)]
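A minimal UDF sketch using Python's `sqlite3`, whose `create_function` registers a Python callable so that SQL can call it by name. The function `effort_band` and the sample rows are hypothetical. Note one difference from the slide: SQLite is an embedded library, so here the UDF and the query run in the same process, whereas the slide distinguishes application and database process spaces.

```python
import sqlite3

def effort_band(fraction):
    """Hypothetical application logic: label a fractional effort value."""
    if fraction is None:
        return None
    return "majority" if fraction > 0.5 else "minority"

conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name; 1 is the argument count.
conn.create_function("effort_band", 1, effort_band)

conn.execute("CREATE TABLE Effort (StaffSSN TEXT, ProjIndex INTEGER, Time REAL)")
conn.executemany(
    "INSERT INTO Effort VALUES (?, ?, ?)",
    [("1", 10, 0.75), ("2", 20, 0.25)],  # made-up rows
)

# The application logic is now evaluated as part of the query itself.
bands = conn.execute(
    "SELECT StaffSSN, effort_band(Time) FROM Effort ORDER BY StaffSSN"
).fetchall()
```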

31 Relational Databases
- The "winner" in data management since the 1980s
- Have agglomerated functionality over time
  - Support for new data types, e.g. strings, large binary objects (in GBs), images, spatial data, video, XML
- Have become high performance
  - >300,000 transactions/minute!
  - 100s to 1000s of concurrent users
  - Very fast query processing
- Robustness and availability
  - Failover support, on-line backups, indexing, loading, ...
- But all within a strict consistency model
  - A database transitions from one consistent state to the next
  - An inconsistent database is unusable
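The consistency model can be seen in miniature with a transaction that fails halfway. In this sketch (SQLite via Python's `sqlite3`; the table, the `CHECK` rule, and the values are made up), the second update violates a constraint, so the first update is rolled back as well and the database stays in its previous consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Effort (
        StaffSSN  TEXT,
        ProjIndex INTEGER,
        Time      REAL CHECK (Time >= 0 AND Time <= 1)
    )
""")
conn.execute("INSERT INTO Effort VALUES ('1', 10, 0.5), ('1', 20, 0.5)")
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE Effort SET Time = Time - 0.25 "
            "WHERE StaffSSN = '1' AND ProjIndex = 10"
        )
        # 0.5 + 0.75 exceeds 1, so this statement violates the CHECK constraint.
        conn.execute(
            "UPDATE Effort SET Time = Time + 0.75 "
            "WHERE StaffSSN = '1' AND ProjIndex = 20"
        )
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the first update, was undone

times = [row[0] for row in conn.execute("SELECT Time FROM Effort ORDER BY ProjIndex")]
```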

32 Shared-nothing Parallel Databases
- Invented in the 1980s
  - Teradata, IBM DB2 Parallel Edition (now DB2 EEE)
- But
  - The only interface to the DBMS was SQL/ODBC
  - No APIs at the storage distribution level (HDFS)
  - Nor at the process execution level (MapReduce)
- Hadoop and its ecosystem have exposed these internals
  - But wait, things might get better!


More information

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

HAWQ: A Massively Parallel Processing SQL Engine in Hadoop

HAWQ: A Massively Parallel Processing SQL Engine in Hadoop HAWQ: A Massively Parallel Processing SQL Engine in Hadoop Lei Chang, Zhanwei Wang, Tao Ma, Lirong Jian, Lili Ma, Alon Goldshuv Luke Lonergan, Jeffrey Cohen, Caleb Welton, Gavin Sherry, Milind Bhandarkar

More information

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy

CompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some

More information

Sql 2008 Copy Table Structure And Database To

Sql 2008 Copy Table Structure And Database To Sql 2008 Copy Table Structure And Database To Another Table Different you can create a table with same schema in another database first and copy the data like Browse other questions tagged sql-server sql-server-2008r2-express.

More information

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems

DATABASE MANAGEMENT SYSTEMS. UNIT I Introduction to Database Systems DATABASE MANAGEMENT SYSTEMS UNIT I Introduction to Database Systems Terminology Data = known facts that can be recorded Database (DB) = logically coherent collection of related data with some inherent

More information

Talend Open Studio for Data Quality. User Guide 5.5.2

Talend Open Studio for Data Quality. User Guide 5.5.2 Talend Open Studio for Data Quality User Guide 5.5.2 Talend Open Studio for Data Quality Adapted for v5.5. Supersedes previous releases. Publication date: January 29, 2015 Copyleft This documentation is

More information

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10 RAJIV GANDHI COLLEGE OF ENGINEERING & TECHNOLOGY, KIRUMAMPAKKAM-607 402 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

More information

Bonus Content. Glossary

Bonus Content. Glossary Bonus Content Glossary ActiveX control: A reusable software component that can be added to an application, reducing development time in the process. ActiveX is a Microsoft technology; ActiveX components

More information

The Relational Model. Database Management Systems

The Relational Model. Database Management Systems The Relational Model Fall 2017, Lecture 2 A relationship, I think, is like a shark, you know? It has to constantly move forward or it dies. And I think what we got on our hands is a dead shark. Woody Allen

More information

Systems:;-'./'--'.; r. Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington

Systems:;-'./'--'.; r. Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Data base 7\,T"] Systems:;-'./'--'.; r Modelsj Languages, Design, and Application Programming Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant

More information

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,... Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle

More information

Using Relational Databases for Digital Research

Using Relational Databases for Digital Research Using Relational Databases for Digital Research Definition (using a) relational database is a way of recording information in a structure that maximizes efficiency by separating information into different

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Chapter 2: Intro. To the Relational Model Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Database Management System (DBMS) DBMS is Collection of

More information

CSE544: Principles of Database Systems. Lectures 5-6 Database Architecture Storage and Indexes

CSE544: Principles of Database Systems. Lectures 5-6 Database Architecture Storage and Indexes CSE544: Principles of Database Systems Lectures 5-6 Database Architecture Storage and Indexes 1 Announcements Project Choose a topic. Set limited goals! Sign up (doodle) to meet with me this week Homework

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Fall 2016 1 HW8 is out Last assignment! Get Amazon credits now (see instructions) Spark with Hadoop Due next wed CSE 344 - Fall 2016

More information

EXAMGOOD QUESTION & ANSWER. Accurate study guides High passing rate! Exam Good provides update free of charge in one year!

EXAMGOOD QUESTION & ANSWER. Accurate study guides High passing rate! Exam Good provides update free of charge in one year! EXAMGOOD QUESTION & ANSWER Exam Good provides update free of charge in one year! Accurate study guides High passing rate! http://www.examgood.com Exam : C2090-610 Title : DB2 10.1 Fundamentals Version

More information

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server What We Have Already Learned CSE 444: Database Internals Lectures 19-20 Parallel DBMSs Overall architecture of a DBMS Internals of query execution: Data storage and indexing Buffer management Query evaluation

More information

Writing Queries Using Microsoft SQL Server 2008 Transact- SQL

Writing Queries Using Microsoft SQL Server 2008 Transact- SQL Writing Queries Using Microsoft SQL Server 2008 Transact- SQL Course 2778-08; 3 Days, Instructor-led Course Description This 3-day instructor led course provides students with the technical skills required

More information

Introduction to Data Science

Introduction to Data Science UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

Tables. Tables. Physical Organization: SQL Server Partitions

Tables. Tables. Physical Organization: SQL Server Partitions Tables Physical Organization: SQL Server 2005 Tables and indexes are stored as a collection of 8 KB pages A table is divided in one or more partitions Each partition contains data rows in either a heap

More information

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information

More information

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON.

Fundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON. Fundamentals of Database Systems 5th Edition Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

MarkLogic Technology Briefing

MarkLogic Technology Briefing MarkLogic Technology Briefing Edd Patterson CTO/VP Systems Engineering, Americas Slide 1 Agenda Introductions About MarkLogic MarkLogic Server Deep Dive Slide 2 MarkLogic Overview Company Highlights Headquartered

More information

Physical Organization: SQL Server 2005

Physical Organization: SQL Server 2005 Physical Organization: SQL Server 2005 Tables Tables and indexes are stored as a collection of 8 KB pages A table is divided in one or more partitions Each partition contains data rows in either a heap

More information

Relational Database Systems Part 01. Karine Reis Ferreira

Relational Database Systems Part 01. Karine Reis Ferreira Relational Database Systems Part 01 Karine Reis Ferreira karine@dpi.inpe.br Aula da disciplina Computação Aplicada I (CAP 241) 2016 Database System Database: is a collection of related data. represents

More information

Bottom line: A database is the data stored and a database system is the software that manages the data. COSC Dr.

Bottom line: A database is the data stored and a database system is the software that manages the data. COSC Dr. COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca What is a database? A database is a collection of logically related data for

More information