Joining Tables with SQL: The most important econometrics lesson you may ever learn
Matt Bogard, Western Kentucky University
From the SelectedWorks of Matt Bogard
Summer June, 2015
Available at:
Introduction

Students of econometrics often spend their days learning proofs and theorems, and if they are lucky they will get their hands on some data and access to software to practice some applied work, whether for a class project or as part of a thesis or dissertation. I have written before about the large gap between theoretical and applied econometrics, but there is another gap to speak of, and it has nothing to do with the theoretical properties of estimators or with interpreting output from STATA, SAS or R. It has to do with raw coding, hacking, and data manipulation skills: the ability to tease out relevant observations and measures from large structured transactional databases as well as from unstructured log files or tweetstreams.

This gap becomes more of an issue as econometricians move from academic environments to corporate environments, and especially so for economists who begin to take on roles as data scientists. In these environments, not only is it true that problems don't fit the standard textbook solutions (see the article Applied Econometrics), but the data does not exist in any form that resembles the simple data sets used in textbooks. You cannot always expect your IT people to hand you a flat file with all the variables and formats that will work for your research project. In fact, the best you might hope for in many environments is a SQL or Oracle database with hundreds or thousands of tables, and the tiny bits of information you need spread across a number of them.

How do you bring all of this information together to do an analysis? This can be complicated, but for the uninitiated I will present some toy examples to give a feel for executing basic database queries that bring together pieces of information housed in separate tables in order to produce a toy analytics ready data set. We'll use SQL (structured query language) commands in SAS using PROC SQL. Something similar could be done using the sqldf package in R, or other tools.

The Business Scenario

Suppose you are a seed company and you have a data warehouse that houses all of your customer data, as well as data about your products (hybrids) and technology*. You have also implemented a big-data IoT (internet of things) project in which customers that plant your hybrids upload their yields via a web app. This is very important, because now, instead of limiting your marketing and R&D work to expensive, planned field experiments, you can tap terabytes of observational data related to your product performance.

You are an econometrician just hired as a data scientist to analyze this data. But you find that it is spread across four different tables in the corporate enterprise data warehouse (here we ignore the many challenges related to schemas, cardinality etc., as well as more advanced architectures like Hadoop commonly used in big data applications). This is one of the most common ways that data is stored in business and analytic environments.

The CUSTOMER table has demographic information, history, customer IDs etc. We will consider just the number of planted acres, but this table could in theory include a lot of other important information we would want to consider in an analysis.
    ID  ACRES
     1   1800
     2   1970
     3    980
     4    960
     5    970
     6   1500
     7    700
     8   2500
     9   2980

The PRODUCT table tells which products each customer bought, tying product to customer ID.

    ID  TYPE
     1  G8590
     2  G8590
     3  G8484
     4  P8787
     5  P8787
     6  G8590
     7  G8590
     8  P8787
     9  G8590

The TECH, or technology, table lists each product sold by the company and the technology or particular trait associated with that product. In this table, each product type has one corresponding trait or technology (again, a big oversimplification of actual corn hybrid data).

    TYPE   TRAIT
    G8590  BT
    G8484  RR
    P8787  RW

Suppose that you store all of the uploaded web-app yield data in a table called CUSTOMER_YIELD.

    ID  YIELD
     1    160
     2    165
     3    180
     4    200
     5    175
     6    149
     7    168
     8    300
     9    170
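Before writing any joins, it can help to see which columns the four tables share, since those shared columns are what the joins in the next section match on. The following is a minimal Python sketch, assuming pandas is available; the table values are copied from the appendix, while the helper `shared_columns` is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# Toy tables, with values copied from the appendix code.
customer = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "acres": [1800, 1970, 980, 960, 970, 1500, 700, 2500, 2980],
})
product = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "type": ["G8590", "G8590", "G8484", "P8787", "P8787",
             "G8590", "G8590", "P8787", "G8590"],
})
tech = pd.DataFrame({
    "type": ["G8590", "G8484", "P8787"],
    "trait": ["BT", "RR", "RW"],
})
customer_yield = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "yield": [160, 165, 180, 200, 175, 149, 168, 300, 170],
})


def shared_columns(left, right):
    """Return the column names two tables have in common (candidate join keys)."""
    return sorted(set(left.columns) & set(right.columns))


# Which column could link each pair of tables?
links = {
    "customer-product": shared_columns(customer, product),
    "product-tech": shared_columns(product, tech),
    "customer-customer_yield": shared_columns(customer, customer_yield),
}
print(links)

# A column that is unique within a table identifies exactly one row there.
# TYPE repeats in PRODUCT (many customers buy the same hybrid) but not in TECH.
print(customer["id"].is_unique, tech["type"].is_unique, product["type"].is_unique)
```

In this toy data, ID links CUSTOMER to PRODUCT and to CUSTOMER_YIELD, and TYPE links PRODUCT to TECH. In a real warehouse the shared column names are only a hint, and the actual keys would need to be confirmed against the schema.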
Using SQL to Create an Analytics Ready Data Source

In SAS, the PROC SQL procedure allows you to bring these disparate sources of data together. If you observe the tables closely, you should be able to see that you can link the CUSTOMER table to the PRODUCT table by matching on the variable ID, which is in each table. If we simply want to keep all of the existing data (all of the customers) in the CUSTOMER table and add in the products they bought (from the PRODUCT table), then we execute what is referred to as a LEFT JOIN, using the following code in SAS to create a new combined table called TEMP1_ADD_PRODUCT.

    PROC SQL;
      CREATE TABLE TEMP1_ADD_PRODUCT AS
      SELECT A.*, B.TYPE
      FROM CUSTOMER A LEFT JOIN PRODUCT B
      ON A.ID = B.ID;
    QUIT;

R:

    library(sqldf) # required sqldf library
    temp1_add_product <- sqldf('select a.*, b.type
                                from customer a left join product b
                                on a.id = b.id')

Python:

    import pandas as pd # package for data manipulation
    temp1_add_product = pd.merge(customer, product[['type','id']], on='id', how='left')

The SELECT statement tells the procedure which variables from each table to select for the new data set. Each table is referenced by an alias: A is the designated alias for the CUSTOMER table, while B is the designated alias for the PRODUCT table. The reference A.* indicates we want to select all of the variables in the data set associated with A. The reference to B.TYPE indicates that we are only interested in adding the TYPE variable from the PRODUCT table. (In more complex real world applications, tables could contain numerous variables, and we often only want certain key variables from each table.) We tell the procedure to get the data FROM the CUSTOMER table (designated with alias A) and execute a LEFT JOIN with the PRODUCT table (designated with the alias B). We tell the procedure to join the two tables based ON the ID variable in each respective table. (Variables like this that link information between different tables are often called keys.) The syntax is different in Python using pandas, but the logic is analogous. The output data set is below:
    ID  ACRES  TYPE
     1   1800  G8590
     2   1970  G8590
     3    980  G8484
     4    960  P8787
     5    970  P8787
     6   1500  G8590
     7    700  G8590
     8   2500  P8787
     9   2980  G8590

Now we have customer demographics (ID and acres in this simplified example) combined with the product or variety of seed each customer planted. Next we want to determine what kind of technology or genetic trait is associated with each type of seed or variety they purchased. We can do this by executing another LEFT JOIN between TEMP1_ADD_PRODUCT and the data in the TECH table. Notice that this time the common key between the two tables, which links the product to its associated technology, is the variable TYPE (which we just added when we created TEMP1_ADD_PRODUCT). The result of this next join is a table we will call TEMP2_ADD_TECH.

    PROC SQL;
      CREATE TABLE TEMP2_ADD_TECH AS
      SELECT A.*, B.TRAIT
      FROM TEMP1_ADD_PRODUCT A LEFT JOIN TECH B
      ON A.TYPE = B.TYPE;
    QUIT;

R:

    temp2_add_tech <- sqldf('select a.*, b.trait
                             from temp1_add_product a left join tech b
                             on a.type = b.type')

Python:

    temp2_add_tech = pd.merge(temp1_add_product, tech[['type','trait']], on='type', how='left')

    ID  ACRES  TYPE   TRAIT
     3    980  G8484  RR
     1   1800  G8590  BT
     2   1970  G8590  BT
     6   1500  G8590  BT
     7    700  G8590  BT
     9   2980  G8590  BT
     4    960  P8787  RW
     5    970  P8787  RW
     8   2500  P8787  RW
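In the toy tables every key finds a match, so the defining behaviour of a LEFT JOIN is easy to miss. The sketch below, in Python with pandas, uses a few hypothetical rows that are not part of the paper's data (customer 10, customer 11 and product type X0000) to show what happens to unmatched keys. It also uses the `validate` argument of `pd.merge` as one possible guard on the lookup join; that guard is an addition for illustration, not part of the original code.

```python
import pandas as pd

# Hypothetical rows for illustration only:
#   customer 10 has no row in PRODUCT,
#   customer 11 appears in PRODUCT but not in CUSTOMER,
#   product type X0000 is missing from the TECH lookup table.
customer = pd.DataFrame({"id": [1, 2, 10], "acres": [1800, 1970, 500]})
product = pd.DataFrame({"id": [1, 2, 11], "type": ["G8590", "X0000", "P8787"]})
tech = pd.DataFrame({"type": ["G8590", "G8484", "P8787"],
                     "trait": ["BT", "RR", "RW"]})

# LEFT JOIN on ID: every CUSTOMER row is kept. Customer 10 gets a missing
# TYPE, and customer 11 is dropped because it is not in the left table.
step1 = pd.merge(customer, product, on="id", how="left")

# LEFT JOIN on TYPE: validate="many_to_one" raises an error if TYPE were
# duplicated in TECH, which would otherwise silently multiply customer rows.
step2 = pd.merge(step1, tech, on="type", how="left", validate="many_to_one")

print(step2)
```

The result still has exactly one row per customer in the left table. Only customer 1 ends up with a trait; the other two rows carry missing values that would need to be handled before any analysis.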
Finally, we want to get the yield data for each customer's selected variety (product) and the associated trait or technology. Since customers upload their yield data via a web or mobile app, each yield is going to be associated with the customer ID, so we can join the yield data based on the common key ID. (A big simplification in this example is that a customer has only one yield data point; in real world applications they could have multiple farms and fields, with multiple data points within each field.)

    PROC SQL;
      CREATE TABLE TEMP3_ADD_YIELD AS
      SELECT A.*, B.YIELD
      FROM TEMP2_ADD_TECH A LEFT JOIN CUSTOMER_YIELD B
      ON A.ID = B.ID;
    QUIT;

R:

    temp3_add_yield <- sqldf('select a.*, b.yield
                              from temp2_add_tech a left join customer_yield b
                              on a.id = b.id')

Python:

    temp3_add_yield = pd.merge(temp2_add_tech, customer_yield[['id','yield']], on='id', how='left')

    ID  ACRES  TYPE   TRAIT  YIELD
     1   1800  G8590  BT       160
     2   1970  G8590  BT       165
     3    980  G8484  RR       180
     4    960  P8787  RW       200
     5    970  P8787  RW       175
     6   1500  G8590  BT       149
     7    700  G8590  BT       168
     8   2500  P8787  RW       300
     9   2980  G8590  BT       170

Now we have an analytic ready data set that we could use to analyze differences in yield by product TYPE. In each case we added the specific information we required from specific tables, building out the final data set. We did this in a number of separate SQL statements executed in SAS. With each additional join, we created an intermediate data set. For illustrative purposes, or with small data sets, this approach is fine. In more realistic applications, where each intermediate table might consist of millions of rows of data, we would want to be more efficient. The following block of SAS code completes all of the joins and produces the final data set at once, instead of creating a number of temporary intermediate data sets.

    PROC SQL;
      CREATE TABLE TEMP1_ADD_ALL AS
      SELECT A.*, B.TYPE, C.TRAIT, D.YIELD
      FROM CUSTOMER A
      LEFT JOIN PRODUCT B ON A.ID = B.ID
      LEFT JOIN TECH C ON B.TYPE = C.TYPE
      LEFT JOIN CUSTOMER_YIELD D ON A.ID = D.ID;
    QUIT;

R:

    temp1_add_all <- sqldf('select a.*, b.type, c.trait, d.yield
                            from customer a
                            left join product b on a.id = b.id
                            left join tech c on b.type = c.type
                            left join customer_yield d on a.id = d.id')

Python:

    temp1_add_all = pd.merge(
        pd.merge(
            pd.merge(customer, product[['type','id']], on='id', how='left'),
            tech[['type','trait']], on='type', how='left'),
        customer_yield[['id','yield']], on='id', how='left')

Conclusion

My goal is not to teach anyone to be a SQL programmer, but simply to introduce you to the paradigm of transactional databases and what it takes to derive a data set suitable for analysis in a non-academic setting. For further reading about big data applications and econometrics, see below.

Further Reading:

Applied Econometrics
Economists as Data Scientists
Econometrics and Big Data
Is machine learning trending with economists?
Big data, John Deere, and the internet of things.
The Data Science Venn Diagram
Big Ag Meets Big Data

Notes:

1) Bt refers to a technology or genetic trait in plants that allows them to express Bt proteins, which are toxic to certain pests.
2) RR refers to a technology or genetic trait in plants that allows them to be resistant to the herbicide Roundup.

Appendix: SAS, R and Python Code for Building Demo Data Mart

SAS Code:

    * SET UP TOY CUSTOMER DATA BASE;
    DATA CUSTOMER;
    INPUT ID ACRES;
    CARDS;
    1 1800
    2 1970
    3 980
    4 960
    5 970
    6 1500
    7 700
    8 2500
    9 2980
    ;
    RUN;

    DATA PRODUCT;
    INPUT ID TYPE $;
    CARDS;
    1 G8590
    2 G8590
    3 G8484
    4 P8787
    5 P8787
    6 G8590
    7 G8590
    8 P8787
    9 G8590
    ;
    RUN;

    DATA TECH;
    INPUT TYPE $ TRAIT $;
    CARDS;
    G8590 BT
    G8484 RR
    P8787 RW
    ;
    RUN;

    DATA CUSTOMER_YIELD;
    INPUT ID YIELD;
    CARDS;
    1 160
    2 165
    3 180
    4 200
    5 175
    6 149
    7 168
    8 300
    9 170
    ;
    RUN;

R Code:

    # generate data fields
    id <- c(1,2,3,4,5,6,7,8,9)
    acres <- c(1800,1970,980,960,970,1500,700,2500,2980)
    type <- c('G8590','G8590','G8484','P8787','P8787','G8590','G8590','P8787','G8590')
    trait <- c('BT','RR','RW')
    yield <- c(160,165,180,200,175,149,168,300,170)

    # generate fact tables
    customer <- cbind.data.frame(id,acres)
    product <- cbind.data.frame(id,type)
    customer_yield <- cbind.data.frame(id,yield)

    # generate special tech lookup table
    type2 <- c('G8590','G8484','P8787')
    tech <- cbind.data.frame(type2,trait)
    tech$type <- tech$type2
    tech$type2 <- NULL

Python:

    import pandas as pd # package for data manipulation

    # create customer table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'acres':[1800,1970,980,960,970,1500,700,2500,2980]}
    customer = pd.DataFrame(data, columns=['id','acres'])

    # create product table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'type':['G8590','G8590','G8484','P8787','P8787','G8590','G8590','P8787','G8590']}
    product = pd.DataFrame(data, columns=['id','type'])

    # create tech table
    data = {'type':['G8590','G8484','P8787'], 'trait':['BT','RR','RW']}
    tech = pd.DataFrame(data, columns=['type','trait'])

    # create customer yield table
    data = {'id':[1,2,3,4,5,6,7,8,9],
            'yield':[160,165,180,200,175,149,168,300,170]}
    customer_yield = pd.DataFrame(data, columns=['id','yield'])
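As a cross-check on the single-statement join from the main text, the following sketch runs the same query from Python against an in-memory SQLite database, using only the standard library. SQLite stands in for SAS PROC SQL here, which is an assumption of this example rather than something the paper uses. Table and column names follow the paper, and the values are the appendix data. The last query summarizes yield by product TYPE, the kind of comparison the final data set is meant to support.

```python
import sqlite3

# In-memory SQLite database; a small stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, acres INTEGER);
    CREATE TABLE product (id INTEGER, type TEXT);
    CREATE TABLE tech (type TEXT, trait TEXT);
    CREATE TABLE customer_yield (id INTEGER, yield INTEGER);
""")

# Toy data copied from the appendix.
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9]
acres = [1800, 1970, 980, 960, 970, 1500, 700, 2500, 2980]
types = ["G8590", "G8590", "G8484", "P8787", "P8787",
         "G8590", "G8590", "P8787", "G8590"]
yields = [160, 165, 180, 200, 175, 149, 168, 300, 170]

conn.executemany("INSERT INTO customer VALUES (?, ?)", zip(ids, acres))
conn.executemany("INSERT INTO product VALUES (?, ?)", zip(ids, types))
conn.executemany("INSERT INTO tech VALUES (?, ?)",
                 [("G8590", "BT"), ("G8484", "RR"), ("P8787", "RW")])
conn.executemany("INSERT INTO customer_yield VALUES (?, ?)", zip(ids, yields))

# All three LEFT JOINs in one statement, with no intermediate tables.
conn.execute("""
    CREATE TABLE temp1_add_all AS
    SELECT a.*, b.type, c.trait, d.yield
    FROM customer a
    LEFT JOIN product b ON a.id = b.id
    LEFT JOIN tech c ON b.type = c.type
    LEFT JOIN customer_yield d ON a.id = d.id
""")

# Read the combined table back, one row per customer.
rows = conn.execute(
    "SELECT id, acres, type, trait, yield FROM temp1_add_all ORDER BY id"
).fetchall()

# Average yield by product TYPE.
avg_yield = dict(conn.execute(
    "SELECT type, AVG(yield) FROM temp1_add_all GROUP BY type"
).fetchall())

for row in rows:
    print(row)
print(avg_yield)
conn.close()
```

A raw comparison of averages like this is only a starting point; it says nothing about acres, location, or the other factors an econometric model would control for.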
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationDatabases in Discovery
Databases in Discovery By Craig Ball I loathe the practice of law from forms, but bow to its power. Lawyers love forms; so, to get lawyers to use more efficient and precise prose in their discovery requests,
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationTaming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems
1 Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems The Defacto Choice For Convergence 2 ABSTRACT & SPEAKER BIO Dealing with enormous data growth is a key challenge for
More informationInformatics 1: Data & Analysis
Informatics 1: Data & Analysis Lecture 5: Relational Algebra Ian Stark School of Informatics The University of Edinburgh Tuesday 31 January 2017 Semester 2 Week 3 https://blog.inf.ed.ac.uk/da17 Tutorial
More informationData Management Lecture Outline 2 Part 2. Instructor: Trevor Nadeau
Data Management Lecture Outline 2 Part 2 Instructor: Trevor Nadeau Data Entities, Attributes, and Items Entity: Things we store information about. (i.e. persons, places, objects, events, etc.) Have relationships
More informationCTL.SC4x Technology and Systems
in Supply Chain Management CTL.SC4x Technology and Systems Key Concepts Document This document contains the Key Concepts for the SC4x course, Weeks 1 and 2. These are meant to complement, not replace,
More informationAre your spreadsheets filled with unnecessary zero s, cluttering your information and making it hard to identify significant results?
Declutter your Spreadsheets by Hiding Zero Values Are your spreadsheets filled with unnecessary zero s, cluttering your information and making it hard to identify significant results? Undertaking data
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationTodd Walter Chief Technologist Teradata Corporation
Todd Walter Chief Technologist Teradata Corporation 10/14/2013 1 The following solely represents the opinions of Todd Walter not the opinions of Teradata Corporation Nothing in this document may be construed
More informationBringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security
Bringing OpenStack to the Enterprise An enterprise-class solution ensures you get the required performance, reliability, and security INTRODUCTION Organizations today frequently need to quickly get systems
More informationTUTORIAL FOR IMPORTING OTTAWA FIRE HYDRANT PARKING VIOLATION DATA INTO MYSQL
TUTORIAL FOR IMPORTING OTTAWA FIRE HYDRANT PARKING VIOLATION DATA INTO MYSQL We have spent the first part of the course learning Excel: importing files, cleaning, sorting, filtering, pivot tables and exporting
More informationModelling Structures in Data Mining Techniques
Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor
More informationStep-by-step data transformation
Step-by-step data transformation Explanation of what BI4Dynamics does in a process of delivering business intelligence Contents 1. Introduction... 3 Before we start... 3 1 st. STEP: CREATING A STAGING
More informationHow to integrate data into Tableau
1 How to integrate data into Tableau a comparison of 3 approaches: ETL, Tableau self-service and WHITE PAPER WHITE PAPER 2 data How to integrate data into Tableau a comparison of 3 es: ETL, Tableau self-service
More informationCreate A Relational Database Schema For The Following Library System
Create A Relational Database Schema For The Following Library System Define data atomicity as it relates to the definition of relational databases. Define the following concepts:. Key Design the schema
More informationLearning PHP, MySQL, JavaScript, And CSS: A Step-by-Step Guide To Creating Dynamic Websites PDF
Learning PHP, MySQL, JavaScript, And CSS: A Step-by-Step Guide To Creating Dynamic Websites PDF Learn how to build interactive, data-driven websitesâ even if you donâ t have any previous programming experience.
More informationOracle Big Data Discovery
Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It
More informationWhat is the Best Way for Children to Learn Computer Programming?
What is the Best Way for Children to Learn Computer Programming? Dr Alex Davidovic One of the defining characteristics of today s society is that the computers and mobile devices are the integral and natural
More informationTest bank for accounting information systems 1st edition by richardson chang and smith
Test bank for accounting information systems 1st edition by richardson chang and smith Chapter 04 Relational Databases and Enterprise Systems True / False Questions 1. Three types of data models used today
More informationIntroduction to Oracle
Class Note: Chapter 1 Introduction to Oracle (Updated May 10, 2016) [The class note is the typical material I would prepare for my face-to-face class. Since this is an Internet based class, I am sharing
More information