Joining Tables with SQL: The most important econometrics lesson you may ever learn


Western Kentucky University
From the SelectedWorks of Matt Bogard
Summer June, 2015
Matt Bogard, Western Kentucky University
Available at:

Introduction

Students of econometrics often spend their days learning proofs and theorems, and if they are lucky they will get their hands on some data and access to software to actually practice some applied work, whether it be for a class project or part of a thesis or dissertation. I have written before about the large gap between theoretical and applied econometrics, but there is another gap to speak of, and it has nothing to do with theoretical properties of estimators or interpreting output from Stata, SAS or R. This gap has to do with raw coding, hacking, and data manipulation skills: the ability to tease out relevant observations and measures from large structured transactional databases or from unstructured log files or tweet streams.

This gap becomes more of an issue as econometricians move from academic environments to corporate environments, and especially so for those economists who begin to take on roles as data scientists. In these environments, not only is it true that problems don't fit the standard textbook solutions (see the article "Applied Econometrics" under Further Reading), but the data does not exist in any form that resembles the simple data sets used in textbooks. You cannot always expect your IT people to be able to just dump you a flat file with all the variables and formats that will work for your research project. In fact, the absolute best you might hope for in many environments is a SQL or Oracle database with hundreds or thousands of tables, with the tiny bits of information you need spread across a number of them. How do you bring all of this information together to do an analysis?

This can be complicated, but for the uninitiated I will present some toy examples to give a feel for executing basic database queries that bring together different pieces of information housed in separate tables in order to produce a toy analytics-ready data set. We'll use SQL (Structured Query Language) commands in SAS using PROC SQL. Something similar could be done using the sqldf package in R, or other tools.

The Business Scenario

Suppose you are a seed company and you have a data warehouse that houses all of your customer data, as well as data about your products (hybrids) and technology*. You have also implemented a big-data IoT (internet of things) project where customers that plant your hybrids upload their yields via a web app. This is very important, because now, instead of limiting your marketing and R&D work to expensive planned field experiments, you can tap terabytes of observational data related to your product performance.

You are an econometrician just hired as a data scientist to analyze this data. But you find that it is spread across four different tables in the corporate enterprise data warehouse (here we ignore the many challenges related to schemas, cardinality, etc., and more advanced architectures like Hadoop commonly used in big data applications). This is one of the most common ways that data is stored in business and analytic environments.

The CUSTOMER table has demographic information, history, customer IDs, etc. We will consider just the number of planted acres, but this table could in theory include a lot of other important information we would want to consider in an analysis.

    ID  ACRES
    1   1800
    2   1970
    3   980
    4   960
    5   970
    6   1500
    7   700
    8   2500
    9   2980

The PRODUCT table tells which products the customer bought, tying product to customer ID.

    ID  TYPE
    1   G8590
    2   G8590
    3   G8484
    4   P8787
    5   P8787
    6   G8590
    7   G8590
    8   P8787
    9   G8590

The TECH or technology table contains each product sold by the company and the technology or particular trait associated with that product. In this table, for each product type there is a corresponding trait or technology (again a big oversimplification of actual corn hybrid data).

    TYPE   TRAIT
    G8590  BT
    G8484  RR
    P8787  RW

Suppose that you store all of the uploaded web-app yield data in a table called CUSTOMER_YIELD.

    ID  YIELD
    1   160
    2   165
    3   180
    4   200
    5   175
    6   149
    7   168
    8   300
    9   170
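Before joining anything, it can help to check which columns can safely act as keys, since the scenario above deliberately sets cardinality issues aside. The following is a minimal sketch, not part of the original SAS workflow: it loads the four toy tables (values taken from the appendix) into an in-memory SQLite database using Python's standard sqlite3 module and compares COUNT(*) with COUNT(DISTINCT key). The helper name key_is_unique is hypothetical and introduced only for illustration.

```python
import sqlite3

# Toy data from the appendix; table and column names follow the article.
customer = [(1, 1800), (2, 1970), (3, 980), (4, 960), (5, 970),
            (6, 1500), (7, 700), (8, 2500), (9, 2980)]
product = [(1, "G8590"), (2, "G8590"), (3, "G8484"), (4, "P8787"), (5, "P8787"),
           (6, "G8590"), (7, "G8590"), (8, "P8787"), (9, "G8590")]
tech = [("G8590", "BT"), ("G8484", "RR"), ("P8787", "RW")]
customer_yield = [(1, 160), (2, 165), (3, 180), (4, 200), (5, 175),
                  (6, 149), (7, 168), (8, 300), (9, 170)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER, acres INTEGER)")
con.execute("CREATE TABLE product (id INTEGER, type TEXT)")
con.execute("CREATE TABLE tech (type TEXT, trait TEXT)")
con.execute("CREATE TABLE customer_yield (id INTEGER, yield INTEGER)")
con.executemany("INSERT INTO customer VALUES (?, ?)", customer)
con.executemany("INSERT INTO product VALUES (?, ?)", product)
con.executemany("INSERT INTO tech VALUES (?, ?)", tech)
con.executemany("INSERT INTO customer_yield VALUES (?, ?)", customer_yield)


def key_is_unique(con, table, key):
    """Hypothetical helper: True if `key` has no repeated values in `table`."""
    total, distinct = con.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return total == distinct


# Check every (table, column) pair that is used as a join key in the article.
checks = {
    (table, key): key_is_unique(con, table, key)
    for table, key in [("customer", "id"), ("product", "id"),
                       ("product", "type"), ("tech", "type"),
                       ("customer_yield", "id")]
}
for (table, key), unique in checks.items():
    print(f"{table}.{key} unique: {unique}")
```

In this toy data every ID column is unique, while TYPE repeats in PRODUCT but is unique in TECH, so joining PRODUCT to TECH on TYPE is a many-to-one lookup that should not multiply rows. In real warehouses this check often fails, which is one reason joins can silently inflate row counts.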

Using SQL to Create an Analytics Ready Data Source

In SAS the PROC SQL procedure allows you to bring these disparate sources of data together. If you observe the tables closely, you should be able to see that you can link the CUSTOMER table to the PRODUCT table by matching on the variable ID, which is in each table. If we simply want to keep all of the existing data (all of the customers) in the CUSTOMER table and add in the products they bought (from the PRODUCT table), then we execute what is referred to as a LEFT JOIN, using the following code in SAS to create a new combined table called TEMP1_ADD_PRODUCT.

    PROC SQL;
    CREATE TABLE TEMP1_ADD_PRODUCT AS
    SELECT A.*, B.TYPE
    FROM CUSTOMER A
    LEFT JOIN PRODUCT B
    ON A.ID = B.ID;
    QUIT;

R:

    library(sqldf) # required sqldf library
    temp1_add_product <- sqldf('select a.*, b.type
                                from customer a
                                left join product b
                                on a.id = b.id')

Python:

    import pandas as pd # package for data manipulation
    temp1_add_product = pd.merge(customer, product[['type','id']], on='id', how='left')

The SELECT statement tells the procedure which variables from each table to select for the new data set. Each table is referenced by an alias: A is the designated alias for the CUSTOMER table, while B is the designated alias for the PRODUCT table. The reference A.* indicates we want to select all of the variables in the data set associated with A. The reference to B.TYPE indicates that we are only interested in adding the TYPE variable from the PRODUCT table. (In more complex real world applications, tables could contain numerous variables and we often only want certain key variables from each table.) We tell the procedure to get the data FROM the CUSTOMER table (designated with alias A) and execute a LEFT JOIN with the PRODUCT table (designated with the alias B). We tell the procedure to join the two tables based ON the ID variable in each respective table. (Variables like this that link information between different tables are often called keys.) The syntax is different in Python using pandas, but the logic is analogous. The output data set is below:

    ID  ACRES  TYPE
    1   1800   G8590
    2   1970   G8590
    3   980    G8484
    4   960    P8787
    5   970    P8787
    6   1500   G8590
    7   700    G8590
    8   2500   P8787
    9   2980   G8590

Now we have customer demographics (ID and acres in this simplified example) combined with the product or variety of seed they planted. Next we want to determine what kind of technology or genetic trait is associated with each type of seed or variety they purchased. We can do this by executing another LEFT JOIN between TEMP1_ADD_PRODUCT and the data in the TECH table. Notice that this time the common key between these two tables, which links the product to its associated technology, is the variable TYPE (which we just added when we created TEMP1_ADD_PRODUCT). The result of this next join is a table we will call TEMP2_ADD_TECH.

    PROC SQL;
    CREATE TABLE TEMP2_ADD_TECH AS
    SELECT A.*, B.TRAIT
    FROM TEMP1_ADD_PRODUCT A
    LEFT JOIN TECH B
    ON A.TYPE = B.TYPE;
    QUIT;

R:

    temp2_add_tech <- sqldf('select a.*, b.trait
                             from temp1_add_product a
                             left join tech b
                             on a.type = b.type')

Python:

    temp2_add_tech = pd.merge(temp1_add_product, tech[['type','trait']], on='type', how='left')

    ID  ACRES  TYPE   TRAIT
    3   980    G8484  RR
    1   1800   G8590  BT
    2   1970   G8590  BT
    6   1500   G8590  BT
    7   700    G8590  BT
    9   2980   G8590  BT
    4   960    P8787  RW
    5   970    P8787  RW
    8   2500   P8787  RW
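The choice of a LEFT JOIN in both steps matters most when the right-hand table is missing a key. The sketch below illustrates this with Python's standard sqlite3 module rather than SAS. It uses three customers from the toy data plus one hypothetical customer (ID 10, 1200 acres) who has no row in PRODUCT; that extra row is an assumption added purely for illustration and does not appear in the original data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER, acres INTEGER);
    CREATE TABLE product  (id INTEGER, type TEXT);
    INSERT INTO customer VALUES (1, 1800), (2, 1970), (3, 980);
    INSERT INTO product  VALUES (1, 'G8590'), (2, 'G8590'), (3, 'G8484');
    -- Hypothetical customer with no purchase on record (not in the article's data).
    INSERT INTO customer VALUES (10, 1200);
""")

# LEFT JOIN: every row of CUSTOMER is kept; unmatched rows get NULL for TYPE.
left_rows = con.execute("""
    SELECT a.id, a.acres, b.type
    FROM customer a
    LEFT JOIN product b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

# INNER JOIN: only customers that have a matching PRODUCT row survive.
inner_rows = con.execute("""
    SELECT a.id, a.acres, b.type
    FROM customer a
    INNER JOIN product b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print("left join :", left_rows)
print("inner join:", inner_rows)
```

With the left join, customer 10 stays in the result with a missing TYPE, so the analyst can see and handle the gap. With the inner join that customer silently disappears, which can bias any downstream analysis.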

Finally we want to get the yield data for each customer's selected variety (product) and the associated trait or technology. Since customers upload their yield data via a web or mobile app, each yield is going to be associated with the customer ID, so we can join yield data based on the common key ID. (A big simplification in this example is that a customer has only one yield data point; in real world applications they could have multiple farms and fields, with multiple data points within each field.)

    PROC SQL;
    CREATE TABLE TEMP3_ADD_YIELD AS
    SELECT A.*, B.YIELD
    FROM TEMP2_ADD_TECH A
    LEFT JOIN CUSTOMER_YIELD B
    ON A.ID = B.ID;
    QUIT;

R:

    temp3_add_yield <- sqldf('select a.*, b.yield
                              from temp2_add_tech a
                              left join customer_yield b
                              on a.id = b.id')

Python:

    temp3_add_yield = pd.merge(temp2_add_tech, customer_yield[['id','yield']], on='id', how='left')

    ID  ACRES  TYPE   TRAIT  YIELD
    1   1800   G8590  BT     160
    2   1970   G8590  BT     165
    3   980    G8484  RR     180
    4   960    P8787  RW     200
    5   970    P8787  RW     175
    6   1500   G8590  BT     149
    7   700    G8590  BT     168
    8   2500   P8787  RW     300
    9   2980   G8590  BT     170

Now we have an analytics-ready data set that we could use to analyze differences in yield by product TYPE. In each case we added the specific information we required from specific tables, building out the final data set. We did this in a number of separate SQL statements executed in SAS. With each additional join, we created an intermediate data set. For illustrative purposes or with small data sets this approach is fine. In more realistic applications, where each intermediate table might consist of millions of rows of data, we would want to be more efficient. The following block of SAS code completes all of the joins and produces the final data set at once, rather than creating a number of temporary intermediate data sets.

    PROC SQL;
    CREATE TABLE TEMP1_ADD_ALL AS
    SELECT A.*, B.TYPE, C.TRAIT, D.YIELD
    FROM CUSTOMER A
    LEFT JOIN PRODUCT B ON A.ID = B.ID
    LEFT JOIN TECH C ON B.TYPE = C.TYPE
    LEFT JOIN CUSTOMER_YIELD D ON A.ID = D.ID;
    QUIT;

R:

    temp1_add_all <- sqldf('select a.*, b.type, c.trait, d.yield
                            from customer a
                            left join product b on a.id = b.id
                            left join tech c on b.type = c.type
                            left join customer_yield d on a.id = d.id')

Python:

    temp1_add_all = pd.merge(
        pd.merge(
            pd.merge(customer, product[['type','id']], on='id', how='left'),
            tech[['type','trait']], on='type', how='left'),
        customer_yield[['id','yield']], on='id', how='left')

Conclusion

My goal is not to teach anyone to be a SQL programmer, but simply to introduce you to the paradigm of transactional databases and what it takes to derive a data set suitable for analysis in a non-academic setting. For further reading about big data applications and econometrics see below.

Further Reading:

Applied Econometrics
Economists as Data Scientists
Econometrics and Big Data
Is machine learning trending with economists?
Big data, John Deere, and the internet of things

The Data Science Venn Diagram
Big Ag Meets Big Data

Notes:

1) Bt refers to a technology or genetic trait in plants that allows them to express Bt proteins, which are toxic to certain pests.
2) RR refers to a technology or genetic trait in plants that allows them to be resistant to the herbicide Roundup.

Appendix: SAS, R and Python Code for Building Demo Data Mart

SAS Code:

    *SET UP TOY CUSTOMER DATA BASE;
    DATA CUSTOMER;
    INPUT ID ACRES;
    CARDS;
    1 1800
    2 1970
    3 980
    4 960
    5 970
    6 1500
    7 700
    8 2500
    9 2980
    ;
    RUN;

    DATA PRODUCT;
    INPUT ID TYPE $;
    CARDS;
    1 G8590
    2 G8590
    3 G8484
    4 P8787
    5 P8787
    6 G8590
    7 G8590
    8 P8787
    9 G8590
    ;
    RUN;

    DATA TECH;
    INPUT TYPE $ TRAIT $;
    CARDS;
    G8590 BT
    G8484 RR
    P8787 RW
    ;
    RUN;

    DATA CUSTOMER_YIELD;
    INPUT ID YIELD;
    CARDS;
    1 160
    2 165
    3 180
    4 200
    5 175
    6 149
    7 168
    8 300
    9 170
    ;
    RUN;

R Code:

    # generate data fields
    id <- c(1,2,3,4,5,6,7,8,9)
    acres <- c(1800,1970,980,960,970,1500,700,2500,2980)
    type <- c('G8590','G8590','G8484','P8787','P8787','G8590','G8590','P8787','G8590')
    trait <- c('BT','RR','RW')
    yield <- c(160,165,180,200,175,149,168,300,170)

    # generate fact tables
    customer <- cbind.data.frame(id,acres)
    product <- cbind.data.frame(id,type)
    customer_yield <- cbind.data.frame(id,yield)

    # generate special tech lookup table
    type2 <- c('G8590','G8484','P8787')
    tech <- cbind.data.frame(type2,trait)
    tech$type <- tech$type2
    tech$type2 <- NULL

Python:

    import pandas as pd

    # create customer table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'acres':[1800,1970,980,960,970,1500,700,2500,2980]}
    customer = pd.DataFrame(data, columns=['id','acres'])

    # create product table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'type':['G8590','G8590','G8484','P8787','P8787','G8590','G8590','P8787','G8590']}
    product = pd.DataFrame(data, columns=['id','type'])

    # create tech table
    data = {'type':['G8590','G8484','P8787'],'trait':['BT','RR','RW']}
    tech = pd.DataFrame(data, columns=['type','trait'])

    # create customer yield table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'yield':[160,165,180,200,175,149,168,300,170]}
    customer_yield = pd.DataFrame(data, columns=['id','yield'])
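As a companion to the appendix, the single-statement join from the main text can also be run against a real SQL engine without SAS. The following is a minimal sketch that assumes an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for the corporate data warehouse. It builds the four toy tables, creates the analytics-ready table in one CREATE TABLE ... AS SELECT statement, and then summarizes yield by product TYPE, which is the kind of analysis the article has in mind. The variable name by_type and the summary query are illustrative additions, not part of the original code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER, acres INTEGER)")
con.execute("CREATE TABLE product (id INTEGER, type TEXT)")
con.execute("CREATE TABLE tech (type TEXT, trait TEXT)")
con.execute("CREATE TABLE customer_yield (id INTEGER, yield INTEGER)")

# Toy data from the appendix.
con.executemany("INSERT INTO customer VALUES (?, ?)",
                [(1, 1800), (2, 1970), (3, 980), (4, 960), (5, 970),
                 (6, 1500), (7, 700), (8, 2500), (9, 2980)])
con.executemany("INSERT INTO product VALUES (?, ?)",
                [(1, "G8590"), (2, "G8590"), (3, "G8484"), (4, "P8787"),
                 (5, "P8787"), (6, "G8590"), (7, "G8590"), (8, "P8787"),
                 (9, "G8590")])
con.executemany("INSERT INTO tech VALUES (?, ?)",
                [("G8590", "BT"), ("G8484", "RR"), ("P8787", "RW")])
con.executemany("INSERT INTO customer_yield VALUES (?, ?)",
                [(1, 160), (2, 165), (3, 180), (4, 200), (5, 175),
                 (6, 149), (7, 168), (8, 300), (9, 170)])

# One statement performs all three joins, mirroring the TEMP1_ADD_ALL step.
con.execute("""
    CREATE TABLE temp1_add_all AS
    SELECT a.*, b.type, c.trait, d.yield
    FROM customer a
    LEFT JOIN product b ON a.id = b.id
    LEFT JOIN tech c ON b.type = c.type
    LEFT JOIN customer_yield d ON a.id = d.id
""")

# Illustrative follow-up: number of customers and mean yield per product type.
by_type = con.execute("""
    SELECT type, trait, COUNT(*) AS n, AVG(yield) AS mean_yield
    FROM temp1_add_all
    GROUP BY type, trait
    ORDER BY type
""").fetchall()

for type_, trait, n, mean_yield in by_type:
    print(f"{type_} ({trait}): n={n}, mean yield={mean_yield:.1f}")
```

Because every join key on the right-hand side is unique in this toy data, the joined table keeps exactly one row per customer. A raw difference in mean yield by type is of course only descriptive; it is the starting point for an econometric analysis, not a causal estimate.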

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

Efficiently Join a SAS Data Set with External Database Tables

Efficiently Join a SAS Data Set with External Database Tables ABSTRACT Paper 2466-2018 Efficiently Join a SAS Data Set with External Database Tables Dadong Li, Michael Cantor, New York University Medical Center Joining a SAS data set with an external database is

More information

The Hadoop Paradigm & the Need for Dataset Management

The Hadoop Paradigm & the Need for Dataset Management The Hadoop Paradigm & the Need for Dataset Management 1. Hadoop Adoption Hadoop is being adopted rapidly by many different types of enterprises and government entities and it is an extraordinarily complex

More information

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples. Instructions to the Examiners: 1. May the Examiners not look for exact words from the text book in the Answers. 2. May any valid example be accepted - example may or may not be from the text book 1. Attempt

More information

Lecture 1 Getting Started with SAS

Lecture 1 Getting Started with SAS SAS for Data Management, Analysis, and Reporting Lecture 1 Getting Started with SAS Portions reproduced with permission of SAS Institute Inc., Cary, NC, USA Goals of the course To provide skills required

More information

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management Management Information Systems Review Questions Chapter 6 Foundations of Business Intelligence: Databases and Information Management 1) The traditional file environment does not typically have a problem

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

Shine a Light on Dark Data with Vertica Flex Tables

Shine a Light on Dark Data with Vertica Flex Tables White Paper Analytics and Big Data Shine a Light on Dark Data with Vertica Flex Tables Hidden within the dark recesses of your enterprise lurks dark data, information that exists but is forgotten, unused,

More information

MySQL for Beginners Ed 3

MySQL for Beginners Ed 3 MySQL for Beginners Ed 3 Duration: 4 Days What you will learn The MySQL for Beginners course helps you learn about the world's most popular open source database. Expert Oracle University instructors will

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

Dealing with Data Especially Big Data

Dealing with Data Especially Big Data Dealing with Data Especially Big Data INFO-GB-2346.01 Fall 2017 Professor Norman White nwhite@stern.nyu.edu normwhite@twitter Teaching Assistant: Frenil Sanghavi fps241@stern.nyu.edu Administrative Assistant:

More information

Technology In Action, Complete, 14e (Evans et al.) Chapter 11 Behind the Scenes: Databases and Information Systems

Technology In Action, Complete, 14e (Evans et al.) Chapter 11 Behind the Scenes: Databases and Information Systems Technology In Action, Complete, 14e (Evans et al.) Chapter 11 Behind the Scenes: Databases and Information Systems 1) A is a collection of related data that can be stored, sorted, organized, and queried.

More information

Anurag Sharma (IIT Bombay) 1 / 13

Anurag Sharma (IIT Bombay) 1 / 13 0 Map Reduce Algorithm Design Anurag Sharma (IIT Bombay) 1 / 13 Relational Joins Anurag Sharma Fundamental Research Group IIT Bombay Anurag Sharma (IIT Bombay) 1 / 13 Secondary Sorting Required if we need

More information

Business Analytics Nanodegree Syllabus

Business Analytics Nanodegree Syllabus Business Analytics Nanodegree Syllabus Master data fundamentals applicable to any industry Before You Start There are no prerequisites for this program, aside from basic computer skills. You should be

More information

Efficient and Scalable Friend Recommendations

Efficient and Scalable Friend Recommendations Efficient and Scalable Friend Recommendations Comparing Traditional and Graph-Processing Approaches Nicholas Tietz Software Engineer at GraphSQL nicholas@graphsql.com January 13, 2014 1 Introduction 2

More information

Implementing Table Operations Using Structured Query Language (SQL) Using Multiple Operations. SQL: Structured Query Language

Implementing Table Operations Using Structured Query Language (SQL) Using Multiple Operations. SQL: Structured Query Language Implementing Table Operations Using Structured Query Language (SQL) Using Multiple Operations Show Only certain columns and rows from the join of Table A with Table B The implementation of table operations

More information

Databases vs. Spreadsheets

Databases vs. Spreadsheets Databases vs. Spreadsheets Databases A database is a structured set of data that are easily accessible in various ways Database software tools facilitate the automated management of data Storing, modifying,

More information

MySQL for Developers Ed 3

MySQL for Developers Ed 3 Oracle University Contact Us: 0845 777 7711 MySQL for Developers Ed 3 Duration: 5 Days What you will learn This MySQL for Developers training teaches developers how to plan, design and implement applications

More information

An Enchanted World: SAS in an Open Ecosystem

An Enchanted World: SAS in an Open Ecosystem An Enchanted World: SAS in an Open Ecosystem Tuba Islam SAS Global Technology Practice C opyr i g ht 2016, SAS Ins titut e Inc. All rights res er ve d. Diversity can bring power if there is collaboration

More information

Lesson 1. Introduction to Programming OBJECTIVES

Lesson 1. Introduction to Programming OBJECTIVES Introduction to Programming If you re new to programming, you might be intimidated by code and flowcharts. You might even wonder how you ll ever understand them. This lesson offers some basic ideas and

More information

High-Performance Distributed DBMS for Analytics

High-Performance Distributed DBMS for Analytics 1 High-Performance Distributed DBMS for Analytics 2 About me Developer, hardware engineering background Head of Analytic Products Department in Yandex jkee@yandex-team.ru 3 About Yandex One of the largest

More information

MySQL for Developers Ed 3

MySQL for Developers Ed 3 Oracle University Contact Us: 1.800.529.0165 MySQL for Developers Ed 3 Duration: 5 Days What you will learn This MySQL for Developers training teaches developers how to plan, design and implement applications

More information

Lesson 14 Transcript: Triggers

Lesson 14 Transcript: Triggers Lesson 14 Transcript: Triggers Slide 1: Cover Welcome to Lesson 14 of DB2 on Campus Lecture Series. Today, we are going to talk about Triggers. My name is Raul Chong, and I'm the DB2 on Campus Program

More information

When, Where & Why to Use NoSQL?

When, Where & Why to Use NoSQL? When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),

More information

Greenplum Architecture Class Outline

Greenplum Architecture Class Outline Greenplum Architecture Class Outline Introduction to the Greenplum Architecture What is Parallel Processing? The Basics of a Single Computer Data in Memory is Fast as Lightning Parallel Processing Of Data

More information

Learn SQL by Calculating Customer Lifetime Value

Learn SQL by Calculating Customer Lifetime Value Learn SQL Learn SQL by Calculating Customer Lifetime Value Setup, Counting and Filtering 1 Learn SQL CONTENTS Getting Started Scenario Setup Sorting with ORDER BY FilteringwithWHERE FilteringandSorting

More information

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP 07.29.2015 LANDING STAGING DW Let s start with something basic Is Data Lake a new concept? What is the closest we can

More information

How to speed up a database which has gotten slow

How to speed up a database which has gotten slow Triad Area, NC USA E-mail: info@geniusone.com Web: http://geniusone.com How to speed up a database which has gotten slow hardware OS database parameters Blob fields Indices table design / table contents

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

Improving the ROI of Your Data Warehouse

Improving the ROI of Your Data Warehouse Improving the ROI of Your Data Warehouse Many organizations are struggling with a straightforward but challenging problem: their data warehouse can t affordably house all of their data and simultaneously

More information

SOFTWARE DEVELOPMENT: DATA SCIENCE

SOFTWARE DEVELOPMENT: DATA SCIENCE PROFESSIONAL CAREER TRAINING INSTITUTE SOFTWARE DEVELOPMENT: DATA SCIENCE www.pcti.edu/data-science applicant@pcti.edu 832-484-9100 PROGRAM OVERVIEW Prepare for a life changing career as a data scientist

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

End-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved.

End-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved. End-to-End data mining feature integration, transformation and selection with Datameer Fastest time to Insights Rapid Data Integration Zero coding data integration Wizard-led data integration & No ETL

More information

E(xtract) T(ransform) L(oad)

E(xtract) T(ransform) L(oad) Gunther Heinrich, Tobias Steimer E(xtract) T(ransform) L(oad) OLAP 20.06.08 Agenda 1 Introduction 2 Extract 3 Transform 4 Load 5 SSIS - Tutorial 2 1 Introduction 1.1 What is ETL? 1.2 Alternative Approach

More information

Relational Databases. Relational Databases. SQLite Manager For Firefox. https://addons.mozilla.org/en-us/firefox/addon/sqlite-manager/

Relational Databases. Relational Databases. SQLite Manager For Firefox. https://addons.mozilla.org/en-us/firefox/addon/sqlite-manager/ Relational Databases Charles Severance Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License. http://creativecommons.org/licenses/by/3.0/.

More information

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial) From this diagram, you can see that the aggregated mining model preserves the overall range and trends in values while minimizing the fluctuations in the individual data series. Conclusion You have learned

More information

7 BEST PRACTICES TO BECOME A TABLEAU NINJA FOR OBIEE

7 BEST PRACTICES TO BECOME A TABLEAU NINJA FOR OBIEE 7 BEST PRACTICES TO BECOME A TABLEAU NINJA FOR OBIEE Whitepaper Abstract By connecting directly from Tableau to OBIEE, BI Connector has empowered business users with Self-Service capabilities like never

More information

Data Analysis and Data Science

Data Analysis and Data Science Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical

More information

@Pentaho #BigDataWebSeries

@Pentaho #BigDataWebSeries Enterprise Data Warehouse Optimization with Hadoop Big Data @Pentaho #BigDataWebSeries Your Hosts Today Dave Henry SVP Enterprise Solutions Davy Nys VP EMEA & APAC 2 Source/copyright: The Human Face of

More information

Introducing Oracle R Enterprise 1.4 -

Introducing Oracle R Enterprise 1.4 - Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I

More information

BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data IBM Corporation

BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data IBM Corporation BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data 2013 IBM Corporation A Big Data architecture evolves from a traditional BI architecture

More information

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with

More information

Accelerate your SAS analytics to take the gold

Accelerate your SAS analytics to take the gold Accelerate your SAS analytics to take the gold A White Paper by Fuzzy Logix Whatever the nature of your business s analytics environment we are sure you are under increasing pressure to deliver more: more

More information

Microsoft Analytics Platform System (APS)

Microsoft Analytics Platform System (APS) Microsoft Analytics Platform System (APS) The turnkey modern data warehouse appliance Matt Usher, Senior Program Manager @ Microsoft About.me @two_under Senior Program Manager 9 years at Microsoft Visual

More information

Data Architecture Whitepaper MODERN DATA ARCHITECTURE BY W H INMON

Data Architecture Whitepaper MODERN DATA ARCHITECTURE BY W H INMON Data Architecture Whitepaper MODERN DATA ARCHITECTURE BY W H INMON DATA WAREHOUSE Data warehouse is an established concept and discipline that is discussed in books, conferences and seminars. Indeed data

More information

Data Structures & Algorithms In Java Download Free (EPUB, PDF)

Data Structures & Algorithms In Java Download Free (EPUB, PDF) Data Structures & Algorithms In Java Download Free (EPUB, PDF) Data Structures and Algorithms in Java, Second Edition is designed to be easy to read and understand although the topic itself is complicated.

More information

Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze

Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze Projected by: LUKA CECXLADZE BEQA CHELIDZE Superviser : Nodar Momtsemlidze About HBase HBase is a column-oriented database management system that runs on top of HDFS. It is well suited for sparse data

More information

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018 NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE DACE https://dace.unige.ch Data and Analysis Center for Exoplanets. Facility to store, exchange and analyse data

More information

2997 Yarmouth Greenway Drive, Madison, WI Phone: (608) Web:

2997 Yarmouth Greenway Drive, Madison, WI Phone: (608) Web: Getting the Most Out of SAS Enterprise Guide 2997 Yarmouth Greenway Drive, Madison, WI 53711 Phone: (608) 278-9964 Web: www.sys-seminar.com 1 Questions, Comments Technical Difficulties: Call 1-800-263-6317

More information

Relational Databases. Charles Severance

Relational Databases. Charles Severance Relational Databases Charles Severance Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License. http://creativecommons.org/licenses/by/3.0/.

More information

Lecture 8. Database Management and Queries
