Basics of Data Management
- Scot Andrews
1 Basics of Data Management
Chaitan Baru
2 Objectives of this Module
- Introduce concepts and technologies for managing structured, semi-structured, and unstructured data
- Obtain a grounding in traditional data management
3 Outline
- The Data Lifecycle
- Data Genres: Structured, Semi-structured, Unstructured
- Traditional Data Management: A grounding in the basics
4 Data Roles
- The Data Owner
- The Data User
- The Application/Database Programmer
- The Database Administrator
- The Systems Administrator (Storage Systems Administrator)
- The Data Security Officer (Data Quality)
5 The Data Lifecycle
- Acquisition: Obtaining the data
- Modeling: Representing the data
  - Implies a Requirements Analysis step
- Storage: Storing the data
- Access and Analysis (includes data integration)
- Sharing data: Standards for sharing
- Preserving the data for the long term
6 Data Management Tasks
- Obtaining the data
  - Primary: Data obtained by you
  - Secondary: Data from existing sources, accessed by you and then transformed for your use
  - Accessing existing data resources
- Modeling the data
  - Data tends to stay around longer than one thinks, or plans for
  - Some amount of formal modeling can be useful
- Storing data
  - Depends very much on:
    - Available skill sets
    - Plans for how to use the data
  - The storage system selected for the data has a huge impact on downstream tasks
7 Data Management Tasks 2
- Accessing and Analyzing the Data
  - Methods employed depend upon storage choice
  - E.g. files vs. DBMS vs. HDFS vs. graph data stores, etc.
- Sharing the Data
  - Need accepted norms for the structure and semantics of the data
  - Age: Time elapsed from birth to now, rounded to the nearest year
    - Represented by an integer
  - Absorbed Nitrates, Age of Rock, ...
8 Data Management Tasks 3
- Preserving the Data for the Long Term
  - Users typically want to preserve ALL data!
    - Until cost is factored in
  - Decide how long "long term" is
  - Find a service or service provider who can preserve the data at a cost that you can afford
    - Could be campus libraries in the future
  - Use immutable methods for storing the bits and referring to the bits
9 Data Genres: Classes of data
- Structured
  - Spreadsheets, relational databases, graphs?
- Semi-structured
  - Web logs, XML documents, ...
- Unstructured
  - Text documents, key-value pairs, ...
10 Representing / storing data
- Structured
  - E.g., relational databases
  - Created via a requirements analysis and design process
  - Defined using a Data Definition Language (DDL)
  - Information about the structure (the schema) is kept in metadata tables, aka the Database Catalog
  - The structured model includes/enables a formal query language, e.g. SQL
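To make the DDL and catalog bullets concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational DBMS. The choice of SQLite and the Staff column names are assumptions for illustration; other systems expose their catalog through different tables (for example an information schema).

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in for a DBMS.
conn = sqlite3.connect(":memory:")

# DDL: the structure (schema) is declared before any data is stored.
conn.execute("""
    CREATE TABLE Staff (
        SSN       TEXT PRIMARY KEY,
        LastName  TEXT NOT NULL,
        FirstName TEXT NOT NULL
    )
""")

# The catalog is itself queryable: SQLite keeps schema metadata in sqlite_master.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column; field 1 is the column name.
staff_columns = [row[1] for row in conn.execute("PRAGMA table_info(Staff)")]

print(catalog_rows)
print(staff_columns)
```

The point of the sketch is that the schema lives in the database, not in the application, so any client can discover it with a query.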
11 Representing / storing data 2
- Semi-structured
  - E.g., XML data
  - An extensible template for representing data, e.g. tree, graph
  - With extensibility built in, e.g. adding nodes, subtrees, subgraphs, edges
  - Schema specification available for XML
  - XML extensions to relational databases
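The extensibility point can be sketched with Python's standard xml.etree.ElementTree module. The element and attribute names below (staff, member, project, effort) are hypothetical, chosen only to show a node and a subtree being added to an existing document without any schema change.

```python
import xml.etree.ElementTree as ET

# A small XML document; element names are hypothetical.
doc = ET.fromstring("<staff><member><lastName>Doe</lastName></member></staff>")
member = doc.find("member")

# Extensibility: add a new node to an existing element ...
ET.SubElement(member, "firstName").text = "Jane"

# ... and a whole new subtree, with an attribute, in the same way.
project = ET.SubElement(member, "project", {"effort": "0.5"})
ET.SubElement(project, "name").text = "Hypothetical Project"

tags = [child.tag for child in member]
print(tags)
print(ET.tostring(doc, encoding="unicode"))
```

In a relational table the same change would require altering the schema first; here the tree simply grows.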
12 Representing / storing data 3
- Unstructured
  - E.g. text documents
  - Created by applications in any language using file operations
  - Information about the structure resides with the application
13 Traditional Data Management Tasks
- Design and implement the Logical Schema (the data "template")
- Design and implement the Physical Schema (files, tables, indexes)
- Load the data
- Access data
- Update data
- Index, save, backup/restore data
- Update schema
14 Traditional Data Management
- Application-driven design
  - Requirements analysis to obtain the application needs
- Data Management Steps
  1. Design: Conceptual or Logical Schema
     - How the user thinks about the data
     - The Data Model: Data Representation + Query Language
  2. Implementation: Physical Schema
     - How the computer internally implements the logical model (using memory, disk files, disk blocks, etc.)
  3. Data Access
     - Query processing
     - Data loading
     - Data updating: Insert, Delete, Update
  4. Utilities: Indexing, Statistics, Backup/Restore, ...
15 Structured Data: Spreadsheets
- Commonly used by science users
- Logical Model
  - Set of rows with a set of columns
  - Columns can have data types
  - Columns can be easily added/removed
- Language
  - GUI-based manipulation
  - Macros in Visual Basic
  - ODBC interface to the data in the spreadsheet (SQL)
  - Can create linkages/dependencies among columns
16 Data example
- Staff members working on Projects
  - A Project may involve multiple Staff members
  - A Staff member could work on multiple Projects
  - A Staff member works on a Project for a given percentage of time (effort)
- [Figure: the example data in an Excel sheet]
17 Sharing and Archiving Excel Sheets
- DataONE (2012)
  - DataUp: dataup.org
  - Create metadata
  - Archive dataset
- GEON (2005)
  - Contribute Resource: portal.geongrid.org
  - Create metadata
  - Convert to relational database
  - Archive dataset
18 Spreadsheets
- Limitations
  - Based on physical implementation
  - E.g. 1M rows, 16K columns
  - Number of worksheets depends upon the amount of available memory
- Utilities
  - Data load: Smart Import facility
  - Indexing, backup/restore, etc.: Not available
19 Considerations in choosing appropriate data management technologies
- Available skill sets
- Available software / hardware
- Nature of the problem
  - Amount of data
  - Number of users
- Longevity of the project
  - Always longer than expected
  - Data professionals like to take the long view
  - Scientists / users tend to take the short view
20 Structured Data: Relational Databases
- Logical Model
  - Set of relations, each of which is a set of tuples (rows)
  - Each tuple has a set of columns (fields)
- Physical Model
  - Set of tables (rows + columns)
  - Data stored in tablespaces
  - Indexes
  - Extended field types, e.g. blobs, text, image, ...
21 The Staff-Projects Example
- Logical/Conceptual Schema
  - E-R Diagram, UML
- [Diagram: entity Staff, relationship WorksOn, entity Project]
22 The Staff-Projects Data Example
- Staff: SSN, Last Name, First Name
- Project: Project Index, Project Name, Funding Agency, Start Date, End Date
- WorksOn: Staff SSN, Project Index, %Time, Start Date, End Date
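One way the three tables might be declared is sketched below with Python's sqlite3 module. The column types, the key choices (in particular the composite primary key on WorksOn), the CHECK range, and the sample rows are all assumptions for illustration; the slide lists only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Column types and constraints are assumptions; the slide gives only column names.
conn.executescript("""
CREATE TABLE Staff (
    SSN       TEXT PRIMARY KEY,
    LastName  TEXT,
    FirstName TEXT
);
CREATE TABLE Project (
    ProjectIndex  INTEGER PRIMARY KEY,
    ProjectName   TEXT,
    FundingAgency TEXT,
    StartDate     TEXT,
    EndDate       TEXT
);
CREATE TABLE WorksOn (
    StaffSSN     TEXT    NOT NULL REFERENCES Staff(SSN),
    ProjectIndex INTEGER NOT NULL REFERENCES Project(ProjectIndex),
    PctTime      REAL CHECK (PctTime BETWEEN 0 AND 100),
    StartDate    TEXT,
    EndDate      TEXT,
    PRIMARY KEY (StaffSSN, ProjectIndex)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Staff VALUES ('000-00-0001', 'Doe', 'Jane')")
conn.execute("INSERT INTO Project VALUES (1, 'Hypothetical Project', 'Hypothetical Agency', '2020-01-01', '2021-01-01')")
conn.execute("INSERT INTO WorksOn VALUES ('000-00-0001', 1, 50.0, '2020-01-01', '2020-12-31')")

# A WorksOn row that points at a non-existent Staff member is rejected.
try:
    conn.execute("INSERT INTO WorksOn VALUES ('999-99-9999', 1, 10.0, NULL, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

works_on_count = conn.execute("SELECT COUNT(*) FROM WorksOn").fetchone()[0]
print(rejected, works_on_count)
```

WorksOn carries the many-to-many relationship: each row links one Staff member to one Project and records the effort for that pairing.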
23 Relational Databases: Physical Schema
- Tables are defined in Tablespaces
  - Tablespaces are a collection of Tablespace Containers
  - Tablespace containers can be files or raw devices
  - Different Tablespaces for Data, Indexes, Large Objects (BLOBs)
- Data are organized by rows
- Language
  - SQL: Operates on tables
  - Select, Project, Join
  - Aggregate, Group-By, Sort
24 DBMS Architecture
- [Diagram labels, in the order transcribed: Applications, DBMS, Filesystem, DBMS, Operating System, Devices]
25 Making Relational Databases Efficient
- Schema design
  - Can reduce the number of tables to reduce the number of joins
  - Example query joining three tables:
      SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
      FROM Staff, Project, Effort
      WHERE Staff.SSN = Effort.StaffSSN
        AND Project.ProjIndex = Effort.ProjIndex
        AND Effort.Time > 0.5
      ORDER BY Effort.Time, Staff.LastName, Staff.FirstName
- Define views to simplify SQL queries
      CREATE VIEW Over50PerCent AS
      SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
      FROM ... WHERE ...
- Materialized views to make access efficient
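The join and the view from this slide can be run end to end in SQLite as a rough check. This is a sketch: the table and column names follow the slide's query, while the column types, the explicit JOIN syntax, and the three sample rows are assumptions. SQLite has no materialized views, so that bullet is not illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff   (SSN TEXT PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Project (ProjIndex INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE Effort  (StaffSSN TEXT, ProjIndex INTEGER, Time REAL);

-- Hypothetical sample data.
INSERT INTO Staff   VALUES ('S1', 'Ann', 'Lee'), ('S2', 'Bob', 'Kim');
INSERT INTO Project VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO Effort  VALUES ('S1', 1, 0.8), ('S1', 2, 0.2), ('S2', 2, 0.6);

-- The view hides the three-table join behind a single name.
CREATE VIEW Over50PerCent AS
SELECT Staff.FirstName, Staff.LastName, Project.ProjectName, Effort.Time
FROM Staff
JOIN Effort  ON Staff.SSN = Effort.StaffSSN
JOIN Project ON Project.ProjIndex = Effort.ProjIndex
WHERE Effort.Time > 0.5;
""")

# Callers query the view like a table; the join is re-evaluated on each access.
rows = conn.execute(
    "SELECT * FROM Over50PerCent ORDER BY Time, LastName, FirstName"
).fetchall()
print(rows)
```

A plain view only simplifies the query text. A materialized view, where the DBMS supports one, additionally stores the result so the join is not repeated on every read.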
26 Relational Databases: Loading Data
- Data Loading
  - Bulk loading of data
  - Incremental load (append)
  - Load with Update option
  - Load with no-log option
- Insert, Update, Delete Operations
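A rough sketch of the difference between a batch load and row-level operations, again using sqlite3. SQLite has no dedicated bulk loader or no-log option, so the batch insert inside a single transaction is only an approximation of the bulk-load idea; the Effort table layout and the generated rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Effort (StaffSSN TEXT, ProjIndex INTEGER, Time REAL)")

# Hypothetical batch of 1000 rows; ProjIndex cycles through 0..4.
batch = [("S%03d" % i, i % 5, 0.5) for i in range(1000)]

# Batch load: one transaction for the whole batch instead of one per row.
with conn:
    conn.executemany("INSERT INTO Effort VALUES (?, ?, ?)", batch)
loaded = conn.execute("SELECT COUNT(*) FROM Effort").fetchone()[0]

# Row-level operations: insert, update, delete.
with conn:
    conn.execute("INSERT INTO Effort VALUES ('S-new', 1, 0.25)")
    conn.execute("UPDATE Effort SET Time = 1.0 WHERE StaffSSN = 'S000'")
    conn.execute("DELETE FROM Effort WHERE ProjIndex = 4")

remaining = conn.execute("SELECT COUNT(*) FROM Effort").fetchone()[0]
s000_time = conn.execute(
    "SELECT Time FROM Effort WHERE StaffSSN = 'S000'"
).fetchone()[0]
print(loaded, remaining, s000_time)
```

Production systems usually ship a separate load utility that bypasses per-row logging; the options on the slide refer to that kind of tool.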
27 Improving Performance
- Indexing
  - Single-column indexes
  - Multicolumn indexes for efficient joins
  - Multidimensional indexing, e.g. spatial (lat/long), spatiotemporal
- Query processing
  - Data parallel processing: Multiple threads per operator (e.g. select, join)
  - Dataflow processing: Between consecutive operators, e.g. from select to join
- Tune database parameters (heap sizes, process pools, temp space)
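Whether an index is actually used can be checked by asking the DBMS for its query plan. The sketch below does this in SQLite with EXPLAIN QUERY PLAN; the index names and the Staff table layout are hypothetical, and the exact plan wording differs between SQLite versions, so only the index name is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (SSN TEXT, LastName TEXT, FirstName TEXT)")

# A single-column index and a multicolumn index (names are hypothetical).
conn.execute("CREATE INDEX idx_staff_ssn ON Staff (SSN)")
conn.execute("CREATE INDEX idx_staff_name ON Staff (LastName, FirstName)")

def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in rows)  # field 3 holds the plan text

plan_by_ssn = plan("SELECT * FROM Staff WHERE SSN = ?", ("S1",))
plan_by_name = plan(
    "SELECT * FROM Staff WHERE LastName = ? AND FirstName = ?", ("Lee", "Ann")
)
print(plan_by_ssn)
print(plan_by_name)
```

Multidimensional (spatial) indexing and the parallel query processing bullets depend on the specific DBMS and are not shown here.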
28 Data Placement For Intra-query Parallelism
- [Figure: Staff and Effort tables placed in Tablespace A and Tablespace B]
- Distribution across disks in a storage system
  - Parallel I/O: Read two tables at the same time
  - Use the Tablespace abstraction to define disjoint sets of tablespace containers
- Clustering together of similar records from different tables
  - Use join columns to define clustering
  - Grab multiple rows with the same join values across different tables in a single (or a few) read operations
29 Data Placement Across Nodes
- [Figure: nodes P, P, P, P, ..., each holding a partition of Staff and a partition of Effort]
- Data distribution schemes
  - Hash partition: Staff and Effort on SSN
  - Range partition: E.g. all data belonging to a particular region
  - Round-robin partition
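The three distribution schemes can be sketched as small functions that map a row to a node number. Everything here is illustrative: the node count, the use of CRC32 as the hash, the boundary values, and the sample rows are assumptions, not how any particular parallel DBMS implements placement.

```python
import bisect
import zlib
from typing import Sequence

NUM_NODES = 4  # hypothetical cluster size

def hash_partition(key: str, num_nodes: int = NUM_NODES) -> int:
    """Same key always maps to the same node (CRC32 is deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def range_partition(value: float, boundaries: Sequence[float]) -> int:
    """Node i holds values below boundaries[i]; the last node holds the rest."""
    return bisect.bisect_right(boundaries, value)

def round_robin_partition(row_number: int, num_nodes: int = NUM_NODES) -> int:
    """Rows are dealt out to nodes in turn, regardless of content."""
    return row_number % num_nodes

# Hash partitioning Staff and Effort on SSN places matching rows on the same
# node, so the join on SSN can run locally on every node.
staff_rows = [("S1", "Lee"), ("S2", "Kim")]
effort_rows = [("S1", 1, 0.8), ("S2", 2, 0.6), ("S1", 2, 0.2)]
staff_nodes = {ssn: hash_partition(ssn) for ssn, _ in staff_rows}
co_located = all(
    hash_partition(ssn) == staff_nodes[ssn] for ssn, _, _ in effort_rows
)

range_nodes = [range_partition(v, [10, 20, 30]) for v in (5, 10, 15, 35)]
rr_nodes = [round_robin_partition(i) for i in range(5)]
print(co_located, range_nodes, rr_nodes)
```

Round-robin balances load evenly but gives no co-location, so a join on SSN would need to move rows between nodes; hash partitioning on the join column avoids that.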
30 Embedding application logic in the database query
- User-Defined Functions (UDFs)
- Application Process Space: Where user applications execute
- Database Process Space: Where SQL queries execute
- [Figure labels: UDFs, Application Heap (memory)]
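As a small illustration of a UDF, the sketch below registers a Python function with SQLite and calls it from SQL. The function name effort_band, its threshold, and the sample rows are hypothetical. Note one difference from the slide: SQLite is an embedded library, so the UDF runs inside the application's own process rather than in a separate database process space.

```python
import sqlite3

def effort_band(time_fraction):
    """Hypothetical application logic: classify an effort fraction."""
    if time_fraction is None:
        return None
    return "major" if time_fraction > 0.5 else "minor"

conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL-visible name taking 1 argument.
conn.create_function("effort_band", 1, effort_band)

conn.execute("CREATE TABLE Effort (StaffSSN TEXT, ProjIndex INTEGER, Time REAL)")
conn.executemany(
    "INSERT INTO Effort VALUES (?, ?, ?)",
    [("S1", 1, 0.8), ("S1", 2, 0.2), ("S2", 2, 0.6)],
)

# The UDF is evaluated per row as part of query execution.
rows = conn.execute(
    "SELECT StaffSSN, ProjIndex, effort_band(Time) "
    "FROM Effort ORDER BY StaffSSN, ProjIndex"
).fetchall()
print(rows)
```

In a client-server DBMS, where the UDF executes (inside the database process or fenced off in its own process) is a trade-off between speed and isolation from faulty user code.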
31 Relational databases
- The "winner" in data management since the 1980s
- Have agglomerated functionality over time
  - Support for new data types
  - E.g., strings, large binary objects (in GBs), images, spatial data, video, XML
- Have become high performance
  - >300,000 transactions/minute!
  - 100s-1000s of concurrent users
  - Very fast query processing
- Robustness and availability
  - Failover support, on-line backups, indexing, loading, ...
- But all within a strict consistency model
  - A database transitions from one consistent state to the next
  - An inconsistent database is unusable
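The strict consistency model can be seen in a small transaction sketch: either every statement in a transaction takes effect, or none does. The Effort table, the CHECK constraint, and the values are assumptions for illustration, using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes a consistency rule: effort is a fraction in [0, 1].
conn.execute("""
    CREATE TABLE Effort (
        StaffSSN  TEXT,
        ProjIndex INTEGER,
        Time      REAL CHECK (Time >= 0 AND Time <= 1)
    )
""")
conn.execute("INSERT INTO Effort VALUES ('S1', 1, 0.8)")
conn.execute("INSERT INTO Effort VALUES ('S1', 2, 0.2)")
conn.commit()

# Try to move effort between projects; the second update breaks the rule.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE Effort SET Time = Time - 0.3 WHERE StaffSSN = 'S1' AND ProjIndex = 1"
        )
        conn.execute(
            "UPDATE Effort SET Time = Time + 0.9 WHERE StaffSSN = 'S1' AND ProjIndex = 2"
        )
    failed = False
except sqlite3.IntegrityError:
    failed = True

# The first update was undone too: the database is back in its prior state.
state = conn.execute(
    "SELECT ProjIndex, Time FROM Effort ORDER BY ProjIndex"
).fetchall()
print(failed, state)
```

The half-applied change never becomes visible, which is what "transitions from one consistent state to the next" means in practice.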
32 Shared-nothing Parallel Databases
- Invented in the 1980s
  - Teradata, IBM DB2 Parallel Edition (now DB2 EEE)
- But
  - The only interface to the DBMS was SQL/ODBC
  - No APIs at the storage distribution level (cf. HDFS)
  - Nor at the process execution level (cf. MapReduce)
- Hadoop and its ecosystem have exposed these internals
- But, wait, things might get better!
Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Fall 2016 1 HW8 is out Last assignment! Get Amazon credits now (see instructions) Spark with Hadoop Due next wed CSE 344 - Fall 2016
More informationEXAMGOOD QUESTION & ANSWER. Accurate study guides High passing rate! Exam Good provides update free of charge in one year!
EXAMGOOD QUESTION & ANSWER Exam Good provides update free of charge in one year! Accurate study guides High passing rate! http://www.examgood.com Exam : C2090-610 Title : DB2 10.1 Fundamentals Version
More informationWhat We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server
What We Have Already Learned CSE 444: Database Internals Lectures 19-20 Parallel DBMSs Overall architecture of a DBMS Internals of query execution: Data storage and indexing Buffer management Query evaluation
More informationWriting Queries Using Microsoft SQL Server 2008 Transact- SQL
Writing Queries Using Microsoft SQL Server 2008 Transact- SQL Course 2778-08; 3 Days, Instructor-led Course Description This 3-day instructor led course provides students with the technical skills required
More informationIntroduction to Data Science
UNIT I INTRODUCTION TO DATA SCIENCE Syllabus Introduction of Data Science Basic Data Analytics using R R Graphical User Interfaces Data Import and Export Attribute and Data Types Descriptive Statistics
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationTables. Tables. Physical Organization: SQL Server Partitions
Tables Physical Organization: SQL Server 2005 Tables and indexes are stored as a collection of 8 KB pages A table is divided in one or more partitions Each partition contains data rows in either a heap
More informationIBM Spectrum Protect Version Introduction to Data Protection Solutions IBM
IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information
More informationFundamentals of. Database Systems. Shamkant B. Navathe. College of Computing Georgia Institute of Technology PEARSON.
Fundamentals of Database Systems 5th Edition Ramez Elmasri Department of Computer Science and Engineering The University of Texas at Arlington Shamkant B. Navathe College of Computing Georgia Institute
More informationMAPR DATA GOVERNANCE WITHOUT COMPROMISE
MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance
More informationMarkLogic Technology Briefing
MarkLogic Technology Briefing Edd Patterson CTO/VP Systems Engineering, Americas Slide 1 Agenda Introductions About MarkLogic MarkLogic Server Deep Dive Slide 2 MarkLogic Overview Company Highlights Headquartered
More informationPhysical Organization: SQL Server 2005
Physical Organization: SQL Server 2005 Tables Tables and indexes are stored as a collection of 8 KB pages A table is divided in one or more partitions Each partition contains data rows in either a heap
More informationRelational Database Systems Part 01. Karine Reis Ferreira
Relational Database Systems Part 01 Karine Reis Ferreira karine@dpi.inpe.br Aula da disciplina Computação Aplicada I (CAP 241) 2016 Database System Database: is a collection of related data. represents
More informationBottom line: A database is the data stored and a database system is the software that manages the data. COSC Dr.
COSC 304 Introduction to Systems Introduction Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca What is a database? A database is a collection of logically related data for
More information