Python With Data Science

Similar documents
Specialist ICT Learning

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Certified Data Science with Python Professional VS-1442

Data Science Bootcamp Curriculum. NYC Data Science Academy

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

Python Certification Training

Data Science. Data Analyst. Data Scientist. Data Architect

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

SCIENCE. An Introduction to Python Brief History Why Python Where to use

Ch.1 Introduction. Why Machine Learning (ML)?

ARTIFICIAL INTELLIGENCE AND PYTHON

Python for. Data Science. by Luca Massaron. and John Paul Mueller

Data Science with Python Course Catalog

ML 프로그래밍 ( 보충 ) Scikit-Learn

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

Python & Spark PTT18/19

Supervised Learning Classification Algorithms Comparison

CIS192 Python Programming

Data Science Course Content

Data Analytics and Machine Learning: From Node to Cluster

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.

Contents. Preface to the Second Edition

Python Certification Training

BIG DATA SCIENTIST Certification. Big Data Scientist

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Evaluation of Machine Learning Algorithms for Satellite Operations Support

Machine Learning in Python. Rohith Mohan GradQuant Spring 2018

Performance Evaluation of Various Classification Algorithms

About Intellipaat. About the Course. Why Take This Course?

Introduction to R Programming

Rapid growth of massive datasets

Introducing Categorical Data/Variables (pp )

Oracle Big Data Discovery

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Network Traffic Measurements and Analysis

Data Analytics Training Program

Machine Learning with Python

SUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018

Dealing with Data Especially Big Data

Supervised vs unsupervised clustering

EPL451: Data Mining on the Web Lab 5

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

ADVANCED CLASSIFICATION TECHNIQUES

90 Hours for online Live Training

CSE 158. Web Mining and Recommender Systems. Midterm recap

Using Existing Numerical Libraries on Spark

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

HANDS ON DATA MINING. By Amit Somech. Workshop in Data-science, March 2016

Machine Learning in Action

Chapter 1 - The Spark Machine Learning Library

scikit-learn (Machine Learning in Python)

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

R (2) Data analysis case study using R for readily available data set using any one machine learning algorithm.

Python Certification Training

Writing Queries Using Microsoft SQL Server 2008 Transact- SQL

Oracle Machine Learning Notebook

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Data Science Training

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

pandas: Rich Data Analysis Tools for Quant Finance

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

SOFTWARE DEVELOPMENT: DATA SCIENCE

Data Analyst Nanodegree Syllabus

Machine Learning: Think Big and Parallel

Intel Distribution for Python* и Intel Performance Libraries

Introduction to Data Science

Data Analyst Nanodegree Syllabus

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Chapter 1, Introduction

Computational Databases: Inspirations from Statistical Software. Linnea Passing, Technical University of Munich

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Classification Algorithms in Data Mining

Data Analytics Training Program using

Predicting Popular Xbox games based on Search Queries of Users

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Data Mining. Jeff M. Phillips. January 7, 2019 CS 5140 / CS 6140

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

7 Techniques for Data Dimensionality Reduction

Learning Alliance Corporation, Inc. For more info: go to

Clustering algorithms and autoencoders for anomaly detection

Connecting ArcGIS with R and Conda. Shaun Walbridge

Python for Data Analysis

Slides for Data Mining by I. H. Witten and E. Frank

Using Numerical Libraries on Spark

Topics In Feature Selection

Data Preprocessing. Javier Béjar. URL - Spring 2018 CS - MAI 1/78 BY: $\

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

Accelerated Machine Learning Algorithms in Python

Naïve Bayes for text classification

Introduction to Data Mining and Data Analytics

Data Preprocessing. Javier Béjar AMLT /2017 CS - MAI. (CS - MAI) Data Preprocessing AMLT / / 71 BY: $\

Machine Learning in the Process Industry. Anders Hedlund Analytics Specialist

SQL Server 2017: Data Science with Python or R?

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods

Data Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?

Visualization and text mining of patent and non-patent data

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

Introduction to Text Mining. Hongning Wang

Transcription:

Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers, IT Architects, and Technical Managers. Participants should have the general knowledge of statistics and programming and also be familiar with Python. Course Objectives NumPy, pandas, Matplotlib, scikit-learn; Python REPLs; Jupyter Notebooks; Data analytics life-cycle phases; Data repairing and normalizing; Data aggregation and grouping; Data visualization; Data science algorithms for supervised and unsupervised; Machine Learning. Course Outline 1 PYTHON FOR DATA SCIENCE Using Modules Listing Methods in a Module Creating Your Own Modules List Comprehension Dictionary Comprehension String Comprehension Python 2 vs Python 3 Sets (Python 3+) Python Idioms Python Data Science Ecosystem NumPy NumPy Arrays NumPy Idioms pandas Data Wrangling with pandas' DataFrame SciPy Scikit-learn SciPy or scikit-learn? Matplotlib Python vs R Python on Apache Spark Python Dev Tools and REPLs Anaconda IPython Visual Studio Code Jupyter Jupyter Basic Commands View Course Dates & Register Today This is a 2-day class Page 1

2 APPLIED DATA SCIENCE What is Data Science? Data Science Ecosystem Data Mining vs. Data Science Business Analytics vs. Data Science Data Science, Machine Learning, AI? Who is a Data Scientist? Data Science Skill Sets Venn Diagram Data Scientists at Work Examples of Data Science Projects An Example of a Data Product Applied Data Science at Google Data Science Gotchas 3 DATA ANALYTICS LIFE-CYCLE PHASES Big Data Analytics Pipeline Data Discovery Phase Data Harvesting Phase Data Priming Phase Data Logistics and Data Governance Exploratory Data Analysis Model Planning Phase Model Building Phase Communicating the Results Production Roll-out 4 REPAIRING AND NORMALIZING DATA Repairing and Normalizing Data Dealing with the Missing Data Sample Data Set Getting Info on Null Data Dropping a Column Interpolating Missing Data in pandas Replacing the Missing Values with the Mean Value Scaling (Normalizing) the Data Data Preprocessing with scikit-learn Scaling with the scale() Function The MinMaxScaler Object Page 2

5 DESCRIPTIVE STATISTICS COMPUTING FEATURES IN PYTHON Descriptive Statistics Non-uniformity of a Probability Distribution Using NumPy for Calculating Descriptive Statistics Measures Finding Min and Max in NumPy Using pandas for Calculating Descriptive Statistics Measures Correlation Regression and Correlation Covariance Getting Pairwise Correlation and Covariance Measures Finding Min and Max in pandas DataFrame 6 DATA AGGREGATION AND GROUPING Data Aggregation and Grouping Sample Data Set The pandas.core.groupby.seriesgroupby Object Grouping by Two or More Columns Emulating the SQL's WHERE Clause The Pivot Tables Cross-Tabulation 7 DATA VISUALIZATION WITH MATPLOTLIB Data Visualization What is matplotlib? Getting Started with matplotlib The Plotting Window The Figure Options The matplotlib.pyplot.plot() Function The matplotlib.pyplot.bar() Function The matplotlib.pyplot.pie () Function Subplots Using the matplotlib.gridspec.gridspec Object The matplotlib.pyplot.subplot() Function Figures Saving Figures to File Visualization with pandas Working with matplotlib in Jupyter Notebooks 8 DATA SCIENCE AND ML ALGORITHMS IN SCIKIT- LEARN Data Science, Machine Learning, AI? Types of Machine Learning Terminology: Features and Observations Continuous and Categorical Features (Variables) Terminology: Axis The scikit-learn Package scikit-learn Estimators Models, Estimators, and Predictors Page 3

Common Distance Metrics The Euclidean Metric The LIBSVM format Scaling of the Features The Curse of Dimensionality Supervised vs Unsupervised Machine Learning Supervised Machine Learning Algorithms Unsupervised Machine Learning Algorithms Choose the Right Algorithm Life-cycles of Machine Learning Development Data Split for Training and Test Data Sets Data Splitting in scikit-learn Classification Examples Classifying with k-nearest Neighbors (SL) k-nearest Neighbors Algorithm k-nearest Neighbors Algorithm The Error Rate Dimensionality Reduction The Advantages of Dimensionality Reduction Principal component analysis (PCA) Data Blending Decision Trees (SL) Decision Tree Terminology Decision Tree Classification in Context of Information Theory Information Entropy Defined The Shannon Entropy Formula The Simplified Decision Tree Algorithm Using Decision Trees Random Forests SVM Naive Bayes Classifier (SL) Naive Bayesian Probabilistic Model in a Nutshell Bayes Formula Classification of Documents with Naive Bayes Unsupervised Learning Type: Clustering Clustering Examples k-means Clustering (UL) k-means Clustering in a Nutshell k-means Characteristics Regression Analysis Simple Linear Regression Model Linear vs Non-Linear Regression Linear Regression Illustration Major Underlying Assumptions for Regression Analysis Least-Squares Method (LSM) Locally Weighted Linear Regression Regression Models in Excel Multiple Regression Analysis Logistic Regression Regression vs Classification Time-Series Analysis Page 4

Decomposing Time-Series 9 LAB EXERCISES Lab 1 - Learning the Lab Environment Lab 2 - Using Jupyter Notebook Lab 3 - Repairing and Normalizing Data Lab 4 - Computing Descriptive Statistics Lab 5 - Data Grouping and Aggregation Lab 6 - Data Visualization with matplotlib Lab 7 - Data Splitting Lab 8 - k-nearest Neighbors Algorithm Lab 9 - The k-means Algorithm Lab 10 - The Random Forest Algorithm Page 5