INTRODUCTION TO DATA SCIENCE
- Francis Collins
- 6 years ago
1 DATA11001 INTRODUCTION TO DATA SCIENCE EPISODE 2
2 TODAY'S MENU
1. DATABASES
2. DATA TRANSFORMATIONS
3. FILTERING AND IMPUTATION
3 DATABASES
This isn't a course on databases: hopefully you've already taken one.
But we'll refresh some basics to be able to access data in databases:
- sqlite
- elementary SQL
For the most part, we'll just extract the data we need and manipulate it in, e.g., Python and command-line tools.
4 EXAMPLE: KAGGLE: EUROPEAN SOCCER DATABASE
5 EUROPEAN SOCCER DATABASE
Create an account on Kaggle unless you already have one.
Chief Data Scientist's advice: Do Kaggle competitions. [ ] Preprocessing, missing values, using libraries [ ]
You can find the soccer database here
The database is a single zip file: database.sqlite.zip
Zipped 34 MB, unzipped 313 MB
6 EUROPEAN SOCCER DATABASE
Easy to use from the command line: sqlite3

    $ sqlite3 database.sqlite
    SQLite version :17:19
    Enter ".help" for usage hints.
    sqlite> SELECT player_name FROM Player LIMIT 10;
    Aaron Appindangoye
    Aaron Cresswell
    Aaron Doran
    Aaron Galindo
    Aaron Hughes
    Aaron Hunt
    Aaron Kuhl
    Aaron Lennon
    Aaron Lennox
    Aaron Meijers
7 EUROPEAN SOCCER DATABASE
Same in Python:

    import sqlite3

    database = 'database.sqlite'
    conn = sqlite3.connect(database)
    c = conn.cursor()
    query = "SELECT player_name FROM Player;"
    c.execute(query)
    rows = c.fetchmany(10)
    print(rows)
    conn.close()

    [('Aaron Appindangoye',), ('Aaron Cresswell',), ('Aaron Doran',), ('Aaron Galindo',), ('Aaron Hughes',), ('Aaron Hunt',), ('Aaron Kuhl',), ('Aaron Lennon',), ('Aaron Lennox',), ('Aaron Meijers',)]
8 EUROPEAN SOCCER DATABASE
Same in Python with pandas (note the formatting, incl. header):

    import sqlite3
    import pandas as pd

    database = 'database.sqlite'
    conn = sqlite3.connect(database)
    query = "SELECT player_name FROM Player;"
    rows = pd.read_sql(query, conn)
    print(rows[0:10])
    conn.close()

              player_name
    0  Aaron Appindangoye
    1     Aaron Cresswell
    2         Aaron Doran
    3       Aaron Galindo
    4        Aaron Hughes
    5          Aaron Hunt
    6          Aaron Kuhl
    7        Aaron Lennon
    8        Aaron Lennox
    9       Aaron Meijers
9 EUROPEAN SOCCER DATABASE
Simple SQL tricks:

    sqlite> SELECT player_name, height FROM Player
       ...> ORDER BY height
       ...> LIMIT 10;
    Juan Quero
    Diego Buonanotte
    Maxi Moralez
    Anthony Deroin
    Bakari Kone
    Edgar Salli
    Fouad Rachid
    Frederic Sammaritano
    Lorenzo Insigne
    Pablo Piatti
10 EUROPEAN SOCCER DATABASE
TABLE Player: id, player_api_id, player_name, player_fifa_api_id, birthday, height, weight
TABLE Player_Attributes: id, player_fifa_api_id, player_api_id, date, overall_rating, potential, preferred_foot, attacking_work_rate, defensive_work_rate, crossing, finishing, heading_accuracy, short_passing, volleys, dribbling, curve, free_kick_accuracy, long_passing, ball_control, acceleration, sprint_speed, agility, reactions, balance, shot_power, jumping, stamina, strength, long_shots, aggression, interceptions, positioning, vision, penalties, marking, standing_tackle, sliding_tackle, gk_diving, gk_handling, gk_kicking, gk_positioning, gk_reflexes
11 EUROPEAN SOCCER DATABASE
Joining tables:

    sqlite> SELECT * FROM
       ...> (SELECT player_name, height, weight,
       ...> player_api_id AS p_id FROM Player) a
       ...> INNER JOIN Player_attributes b
       ...> ON a.p_id = b.player_api_id
       ...> LIMIT 10;
    Aaron Appindangoye :00: right medium medium
    Aaron Appindangoye :00: right medium medium
12 EUROPEAN SOCCER DATABASE
More SQL tricks: CREATE TABLE, GROUP BY, aggregate functions (MAX)

    sqlite> CREATE TABLE player_max_date
       ...> AS SELECT player_api_id AS p_id,
       ...> MAX(date) AS date
       ...> FROM player_attributes
       ...> GROUP BY p_id;
    sqlite> SELECT * FROM player_max_date LIMIT 3;
    :00: :00: :00:00
13 EUROPEAN SOCCER DATABASE
Three-way join:

    sqlite> SELECT * FROM
       ...> (SELECT player_name, height, weight,
       ...> player_api_id AS p_id
       ...> FROM player) a
       ...> INNER JOIN
       ...> player_attributes b
       ...> ON a.p_id = b.player_api_id
       ...> INNER JOIN player_max_date c
       ...> ON b.player_api_id = c.p_id AND
       ...> b.date = c.date;
    Aaron Appindangoye :00: right medium medium :00:00
    Aaron Cresswell :00: left high medium :00:00
    ...
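The GROUP BY / MAX step and the three-way join can be tried out without downloading the Kaggle file. The following is a minimal sketch that assumes a tiny in-memory stand-in for the database: the tables are cut down to a few columns and all rows ('Player A', 'Player B', the dates and ratings) are made up for illustration. It keeps only the most recent attribute row per player.

```python
import sqlite3

# Tiny in-memory stand-in for the soccer database; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Player (player_api_id INTEGER, player_name TEXT, height REAL);
CREATE TABLE Player_Attributes (player_api_id INTEGER, date TEXT, overall_rating INTEGER);
INSERT INTO Player VALUES (1, 'Player A', 180.0), (2, 'Player B', 175.0);
INSERT INTO Player_Attributes VALUES
    (1, '2015-01-01 00:00:00', 60),
    (1, '2016-01-01 00:00:00', 65),
    (2, '2014-06-01 00:00:00', 70);
""")

# Step 1: latest attribute date per player (GROUP BY + MAX).
# ISO-formatted date strings sort correctly as plain text.
conn.execute("""
CREATE TABLE player_max_date AS
SELECT player_api_id AS p_id, MAX(date) AS date
FROM Player_Attributes
GROUP BY player_api_id
""")

# Step 2: three-way join that keeps only the latest attribute row per player.
latest = conn.execute("""
SELECT a.player_name, b.date, b.overall_rating
FROM Player a
INNER JOIN Player_Attributes b ON a.player_api_id = b.player_api_id
INNER JOIN player_max_date c ON b.player_api_id = c.p_id AND b.date = c.date
ORDER BY a.player_name
""").fetchall()
conn.close()

print(latest)
```

Joining on both the player id and the date is what drops the older attribute rows; joining on the id alone would return every historical row again.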
14 2. DATA TRANSFORMATIONS
15 TRANSFORMATIONS

    sqlite> .mode csv
    sqlite> .headers on
    sqlite> .output player_stats.csv
    sqlite> SELECT * FROM
       ...> (SELECT player_name, height, weight,
       ...> player_api_id AS p_id FROM player) a
       ...> INNER JOIN
       ...> player_attributes b
       ...> ON a.p_id = b.player_api_id
       ...> INNER JOIN player_max_date c
       ...> ON b.player_api_id = c.p_id AND
       ...> b.date = c.date;
    sqlite> .output stdout
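The same export can also be done from Python instead of the sqlite3 shell. This is a sketch under stated assumptions: the helper name query_to_csv is hypothetical, and the one-row in-memory table stands in for the real database. It writes a header row taken from the cursor description, followed by the result rows.

```python
import csv
import io
import sqlite3

def query_to_csv(conn, query, out):
    """Write the result of `query` to the file-like object `out` as CSV with a header row."""
    cur = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)  # rows are pulled from the cursor one at a time

# Made-up one-row table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player (player_name TEXT, height REAL)")
conn.execute("INSERT INTO player VALUES ('Player A', 180.0)")

buf = io.StringIO()
query_to_csv(conn, "SELECT player_name, height FROM player", buf)
conn.close()

csv_text = buf.getvalue()
print(csv_text)
```

To write a real file, pass open("player_stats.csv", "w", newline="") instead of the StringIO buffer.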
16 TRANSFORMATIONS
17 TRANSFORMATIONS
csv => json is easy with Python and pandas!

    import pandas as pd
    import json

    data = pd.read_csv("player_stats.csv")
    print(data.to_json(orient='records', lines=True))

    {"player_name":"Aaron Appindangoye","height": ,"weight":187,"p_id":505942,"id":1,"player_fifa_api_id":218353,"player_api_id": ,"date":" :00:00","overall_rating":67.0,"potential":71.0,"preferred_foot":"right","attacking_work_rate":"medium","defensive_work_rate":"medium","crossing":49.0,"finishing":44.0,"heading_accuracy":71.0,"short_passing":61.0,"volleys":44.0,"dribbling":51.0,"curve":45.0,"free_kick_accuracy":39.0,"long_passing":64.0,"ball_control":49.0,"acceleration":60.0,"sprint_speed":64.0,"agility":59.0,"reactions":47.0,"balance":65.0,"shot_power":
18 OTHER TRANSFORMATIONS HTML => e.g. CSV "Scraping!" (dirty business)
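As a rough illustration of HTML => CSV, the sketch below pulls the cells out of an HTML table using only the Python standard library. The class name TableToRows and the inline HTML snippet are made up for this example; real pages are far messier (nested tables, broken markup), which is where the "dirty business" comes in.

```python
import csv
import io
from html.parser import HTMLParser

class TableToRows(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Made-up HTML fragment; a real scraper would download the page first.
html = ("<table><tr><th>name</th><th>height</th></tr>"
        "<tr><td>Player A</td><td>180</td></tr></table>")

parser = TableToRows()
parser.feed(html)
parser.close()

out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```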
19 TRANSFORMATIONS
Content transformations:
- string to numeric, " " => (float)
- dates (mind the formats, 9/5/2017 vs )
- NA/ /0/99/etc can mean missing entries
- splitting: name = "Teemu Roos" => first = "Teemu", last = "Roos"
- ...
Especially for text, it may be important to:
- downcase: SuperMan => superman
- remove punctuation
- stem: 'swimming' => 'swim'
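The content transformations listed above can be sketched with a few small helper functions. All function names here are hypothetical, the set of missing-value markers is an assumption that has to be checked against each data set, and crude_stem is a deliberately naive stand-in for a real stemmer.

```python
import string
from datetime import datetime

# Assumed missing-value markers; which ones apply depends on the data set.
MISSING = {"", "NA", "N/A", "99"}

def to_float(text):
    """Return a float, or None if the string is a missing-value marker."""
    text = text.strip()
    if text in MISSING:
        return None
    return float(text)

def parse_date(text, fmt):
    """Parse a date with an explicit format, since 9/5/2017 is ambiguous."""
    return datetime.strptime(text, fmt).date()

def split_name(name):
    """Split at the first space: 'Teemu Roos' => ('Teemu', 'Roos')."""
    first, _, last = name.partition(" ")
    return first, last

def normalize(text):
    """Downcase and remove ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def crude_stem(word):
    """Very naive stemmer (illustration only): strip 'ing', then undouble the final letter."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    return word

print(to_float(" 182.88 "), to_float("NA"))
print(parse_date("9/5/2017", "%d/%m/%Y"), parse_date("9/5/2017", "%m/%d/%Y"))
print(split_name("Teemu Roos"), normalize("SuperMan!"), crude_stem("swimming"))
```

Passing the date format explicitly, rather than letting a library guess, is the design choice that avoids silently reading 9 May as 5 September.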
20 3. FILTERING AND IMPUTATION
21 FILTERING
Subsetting: columns and/or rows
Many of these are conveniently done using command-line tools such as grep, cut, awk, sed
For big data, it is important to avoid reading all the data into memory before starting: in typical use, the above tools process the data as a stream, a little at a time, so memory consumption stays roughly constant regardless of input size
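The same streaming idea carries over to Python: iterate over the file row by row instead of loading it whole. Below is a minimal sketch, assuming a CSV with a header row; the function name filter_rows and the sample data are made up. It selects rows (like grep) and columns (like cut) in one pass.

```python
import csv
import io

def filter_rows(lines, column, wanted, keep_columns):
    """Yield the selected columns of rows whose `column` equals `wanted`, one row at a time."""
    reader = csv.DictReader(lines)
    for row in reader:  # only the current row is held in memory
        if row[column] == wanted:
            yield [row[c] for c in keep_columns]

# Made-up sample; in practice pass an open file handle instead of a StringIO.
sample = io.StringIO(
    "player_name,preferred_foot,height\n"
    "Player A,right,180\n"
    "Player B,left,175\n"
)

left_footed = list(filter_rows(sample, "preferred_foot", "left", ["player_name", "height"]))
print(left_footed)
```

Because filter_rows is a generator, its output can be written out row by row as well; the list() call here is only for the small demonstration.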
22 IMPUTATION
Missing values can be a show-stopper for many analysis methods
A simple way is to filter out all records with missing entries
This may, however, lose a lot of important data
Another option is to impute, i.e., enter "fake" data in the place of the missing entries:
- average for numeric columns
- mode (most typical value) for categorical columns
- also possible to use machine learning to predict the missing entries based on the others
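The two simple imputation rules can be sketched with the standard library alone. This is an illustration under assumptions: missing entries are represented as None, the function name impute is hypothetical, and the sample columns are made up.

```python
from statistics import mean, mode

def impute(values, strategy):
    """Replace None entries with the mean ('mean') or the most common value ('mode') of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

heights = [180.0, None, 170.0]           # numeric column
feet = ["right", "left", None, "right"]  # categorical column

print(impute(heights, "mean"))
print(impute(feet, "mode"))
```

The fill value is computed from the observed entries only, so the imputed column keeps the same mean (or the same most common category) as the data that was actually recorded.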