Data Science Bootcamp Curriculum. NYC Data Science Academy

Similar documents
Python With Data Science

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

SCIENCE. An Introduction to Python Brief History Why Python Where to use

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Specialist ICT Learning

Data Analytics Training Program

About Intellipaat. About the Course. Why Take This Course?

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Python Certification Training

Python Certification Training

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Python Certification Training

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Data Science Training

Data Analytics Job Guarantee Program

Machine Learning in Action

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Data Analytics Training Program using

ML 프로그래밍 ( 보충 ) Scikit-Learn

Byte Academy. Python Fullstack

Data Analyst Nanodegree Syllabus

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Scalable Machine Learning in R. with H2O

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Higher level data processing in Apache Spark

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Apache Spark and Scala Certification Training

SOFTWARE DEVELOPMENT: DATA SCIENCE

Data Science. Data Analyst. Data Scientist. Data Architect

Scaled Machine Learning at Matroid

Integrating Advanced Analytics with Big Data

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data Infrastructures & Technologies

Data Analyst Nanodegree Syllabus

Analyzing Big Data with Microsoft R

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Certified Data Science with Python Professional VS-1442

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

Python for. Data Science. by Luca Massaron. and John Paul Mueller

BEST BIG DATA CERTIFICATIONS

Chapter 1 - The Spark Machine Learning Library

Automation.

Using Existing Numerical Libraries on Spark

Tutorial on Machine Learning Tools

Data Science Course Content

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

Machine Learning with Python

90 Hours for online Live Training

Big Data and Large Scale Machine Learning

Innovatus Technologies

Big Data and FrameWorks; Perspectives to Applied Machine Learning

Overview and Practical Application of Machine Learning in Pricing

Overview. Audience profile. At course completion. Course Outline. : 20773A: Analyzing Big Data with Microsoft R. Course Outline :: 20773A::

Contents. Preface to the Second Edition

BIG DATA SCIENTIST Certification. Big Data Scientist

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

Unifying Big Data Workloads in Apache Spark

SparkSQL 11/14/2018 1

Data Analytics and Machine Learning: From Node to Cluster

Big Data Analytics using Apache Hadoop and Spark with Scala

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

COPYRIGHT DATASHEET

BUILT FOR THE SPEED OF BUSINESS

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

Programming for Data Science Syllabus

Apache SystemML Declarative Machine Learning

Hadoop & Big Data Analytics Complete Practical & Real-time Training

An Introduction to Apache Spark

MLI - An API for Distributed Machine Learning. Sarang Dev

Apache Spark 2.0. Matei

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Lecture 25: Review I

Lotus IT Hub. Module-1: Python Foundation (Mandatory)

Hadoop. Introduction to BIGDATA and HADOOP

Classifying Building Energy Consumption Behavior Using an Ensemble of Machine Learning Methods

Oracle Big Data Connectors

UCF DATA ANALYTICS AND VISUALIZATION BOOT CAMP

Big Data, Data Science Master Course

Dealing with Data Especially Big Data

Machine Learning With Spark

STAT 1291: Data Science

ACHIEVEMENTS FROM TRAINING

Applying Supervised Learning

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Machine Learning. Chao Lan

CSE 158. Web Mining and Recommender Systems. Midterm recap

pandas: Rich Data Analysis Tools for Quant Finance


Big Data Hadoop Stack

We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

KNIME for the life sciences Cambridge Meetup

Practical Guidance for Machine Learning Applications

Using Numerical Libraries on Spark

Certified Big Data Hadoop and Spark Scala Course Curriculum

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Transcription:

Data Science Bootcamp Curriculum NYC Data Science Academy

100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations of statistics, regressions, classifications, model selections, unsupervised learning, time series analysis, NLP, deep learning, Tensorflow, etc. Machine learning theory defense, Capstone project presentations. Code reviews, resume workshop, mock interviews, career day Prework Week 1-4 Week 5-9 Week 10-12 Get Hired Data Analysis and Visualization Linux system, Git, SQL Data analysis and visualization with R and Python R Shiny Web scraping with Python Big Data with Hadoop & Spark Spark, Spark SQL, Spark MLlib, Hadoop and MapReduce, Hive, Pig

Pre-work Once students are enrolled in the bootcamp, they are granted access to our online, self-paced pre-work materials: 20-30 hours: Introductory Python (Optional) 35-45 hours: Data Analysis and Visualization with R 20-30 hours: Data Analysis and Visualization with Python Students are also invited to join their cohort s Slack channel, where they meet their future classmates, instructors, and get support on pre-work assignments. Enrolled bootcamp students can also choose to take part-time, beginner-level courses hosted at our NYC campus. 100% tuition credited to bootcamp.

Week 1 Data Science Toolkit Linux, Git, Bash, and SQL Data Science with R Data Analytics Part I Linux system o Operating Systems and Linux o File System and File Operations o Text-processing commands o Other useful commands Git o What is Version Control and Git? o Installing Git o Getting Started with Git o Git Tips o Undoing Changes o What is Github? o Working With Remotes SQL o Intro to SQL o Tables and schemas o SQL queries SELECT o MySQL database management o Joins Programming foundation in R I o Introduction to R o Introduction to RStudio o R objects o Functional programming: apply Programming foundation in R II o More data types o Control statements o Functions o Data Transformations Week 2 Data Science with R Data Analytics Part II Data manipulation with dplyr o Introduction to dplyr o Built-in functions Updated April 10, 2017 1

o Join data sets o Groupwise operations Data Visualization with "ggplot2" o Why ggplot2? o The Grammar of Graphics o Constructing a ggplot2 plot o Scatterplots o Bar charts o Histograms o Visualizing big data o Saving Graphs o Customizing Graphics Lab: Data Visualization from Scratch Introduction to Shiny o Shiny introduction o Design the User-interface o Control widgets o Build reactive output o Use data table in Shiny Apps o Use R scripts, data and packages o UI and server for the App o Make Shiny perform quickly o Matrix-based visualizations o Use reactive expressions o Share and deploy Shiny apps Lab: Build a Shiny app from Scratch Week 3 Data Science with R Machine Learning Part I Data Science with Python - Data Analytics Part I Foundations of Statistics o All About Your Data o Statistical Inference o Introduction to Machine Learning Get Started with Python o Installing and using ipython o Simple values and expressions Updated April 10, 2017 2

o Lambda functions and named functions o Lists o Functional operators: map and filter Strings and Data Structures o String operations o File Input and Output o Searching in files o Data Structures Conditionals and Control Flows o Conditionals o For loops o List Comprehensions o While loops o Errors and Exceptions Project Day: Exploratory Visualization & Shiny Project 1 Due: Exploratory Visualization & Shiny Week 4 Data Science with Python Data Analytics Part II Advanced Topics o Multiple-list operations: map and zip o Functional operators: reduce o Object Oriented Programming Introduction to Web Scraping o Regular Expressions o Introduction to HTML o Basics of Beautifulsoup o Examples Introduction to Scrapy o An example o Getting Started o Items/spider/pipelines/settings.py o In Class Lab Introduction to Numpy o Ndarray o Subscripting and slicing o Operations o Matrix and linear algebra Updated April 10, 2017 3

o Random Sampling Introduction to Pandas o Data Structure o Data Manipulation o Handling missing data o Grouping and aggregation Week 5 Data Science with Python - Data Analytics Part III Data Science with R - Machine Learning Part I Matplotlib & Seaborn o In-class Lab Missingness & Imputation o Missing Data o Basic Methods of Imputation o K-Nearest Neighbors Linear Regression I o Simple Linear Regression o Assumptions & Diagnostics o Transformations o The Coefficient of Determination R 2 Project Day: Web Scraping Project 2 Due: Web Scraping Week 6 Data Science with R - Machine Learning Part II Linear Regression II o Multiple Linear Regression o Assumptions & Diagnostics o Research Questions of Interest o Extending Model Flexibility Generalized Linear Models o Logistic Regression o Maximum Likelihood Estimation o Model Interpretation o Assessing Model Fit Updated April 10, 2017 4

The Curse of Dimensionality o Ridge Regression o Lasso Regression o Cross-Validation o Bias/Variance Tradeoff Tree Methods o Decision Trees o Bagging o Random Forest o Variable Importance Week 7 Data Science with R - Machine Learning Part III Data Science with Python - Machine Learning Part I Support Vector Machines o Maximal Margin Classifier o Support Vector Classifier o Support Vector Machines o Multi-Class SVMs Association Rules & Naïve Bayes o Association Rule Mining o Naïve Bayes Python - Linear Regression o What is Machine Learning o Introduction to Scikit-Learn o Simple Linear Regression o Multiple Linear Regression o Statsmodels Python - Classification Part I o Limitation of Linear Regression o Logistic Regression o Discriminant Analysis: Motivation o Discriminant Analysis: Models Updated April 10, 2017 5

o Naïve Bayes Python - Model Selection o Cross-Validation o Bootstrap o Feature Selection o Regularization o Grid Search Week 8 Data Science with Python - Machine Learning Part II Data Science with R - Machine Learning Part IV Python - Classification Part II o Support Vector Machines o Tree-Based Methods Principal Component Analysis o Taking a New Perspective o Dimension Reduction o Vectors of Highest Variance o The PCA Procedure Cluster Analysis o Intro to Cluster Analysis o K-Means Clustering o Hierarchical Clustering o Clustering Takeaways Python - Unsupervised Learning o Intro to Unsupervised Learning o Principal Component Analysis o Clustering Project Day: Machine Learning Project 3 Due: Machine Learning Week 9 Data Science with R - Machine Learning (Continued) Big Data Time Series Analysis o o The Nature of Time Series Analysis Learn from the Examples Updated April 10, 2017 6

o Decomposition of Time Series Data o Examples of Stationary Non-White-Noise Time Series o ARMA and ARIMA Models o Assessing Model Fit Introduction to Spark o What is Apache Spark o Initializing Spark o RDDs, Transformations and Actions o Working with Key-Value Paris o Performance & Optimization Introduction to Spark SQL o Overview o Spark Session o Working with DataFrames o Using HiveQL in Spark SQL Spark Mllib o Spark Machine Learning Workflow o How ML Pipeline Works o ML Pipeline Example: Predicting Diamonds Price o Extracting, transforming and select features o Train Validation Splitting o Building the ML Pipeline with DecisionTreeRegressor o Model Evaluation o Model Tuning Week 10 Big Data (Continued) Advanced Machine Learning Topics Neural Network with Tensorflow Natural Language Processing with Deep Learning Hadoop and MapReduce: o What is Hadoop o HDFS o MapReduce o Combiner o Hadoop Monitoring Ports Apache Hive: Updated April 10, 2017 7

o Databases for Hadoop o Hive o Compiling HiveQL to MapReduce o Technical aspects of Hive o Extending Hive with TRANSFORM Apache Pig: o Pig Overview o An introductory example o Pig Latin Basics o Compiling Pig to MapReduce Week 11 SQL, R, & Python Code Review Machine Learning Theory Defense A/B Testing Machine Learning Theory Defense Practice Machine Learning Theory Defense Project Day - Capstone Week 12 SQL, R, & Python Code Review Machine Learning Theory Defense Capstone Project Presentations SQL Code Review Session R Code Review Session Python Code Review Session Machine Learning Theory Defense From the beginning of Bootcamp, you will work on hands-on projects. Now your Capstone Project lets you create your own data product that showcases your interests and talents. Students are free to use anything covered in class on this project. Updated April 10, 2017 8