DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

Similar documents
Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Python With Data Science

Data Science Bootcamp Curriculum. NYC Data Science Academy

Data Science. Data Analyst. Data Scientist. Data Architect

90 Hours for online Live Training

Data Analytics Training Program

Certified Data Science with Python Professional VS-1442

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus

Python Certification Training

Data Analytics Training Program using

Specialist ICT Learning

ACHIEVEMENTS FROM TRAINING

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office)

Contents. Preface to the Second Edition

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Python for. Data Science. by Luca Massaron. and John Paul Mueller

About Intellipaat. About the Course. Why Take This Course?

BIG DATA SCIENTIST Certification. Big Data Scientist

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Python Certification Training

Data Management - 50%

SCIENCE. An Introduction to Python Brief History Why Python Where to use

Data Science Training

Data Science with Python Course Catalog

Predictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA

CS6220: DATA MINING TECHNIQUES

Ivy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)

Table Of Contents: xix Foreword to Second Edition

Introduction to R Programming

SAS (Statistical Analysis Software/System)

SQL Server Machine Learning Marek Chmel & Vladimir Muzny

Python Certification Training

Data Science Course Content

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Slides for Data Mining by I. H. Witten and E. Frank

Python Training. Complete Practical & Real-time Trainings. A Unit of SequelGate Innovative Technologies Pvt. Ltd.

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Quantitative - One Population

Machine Learning Project Report

Supervised Learning Classification Algorithms Comparison

Business Club. Decision Trees

Doing the Data Science Dance

ARTIFICIAL INTELLIGENCE AND PYTHON

SAS (Statistical Analysis Software/System)

Chapter 1, Introduction

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

Pre-Requisites: CS2510. NU Core Designations: AD

Machine Learning with Python

Performance Evaluation of Various Classification Algorithms

STAT 1291: Data Science

Applied Regression Modeling: A Business Approach

Overview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8

Python & Spark PTT18/19

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Ch.1 Introduction. Why Machine Learning (ML)?

Webgurukul Programming Language Course

Machine Learning in Action

What is KNIME? workflows nodes standard data mining, data analysis data manipulation

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.

Machine Learning in the Process Industry. Anders Hedlund Analytics Specialist

Machine Learning Techniques for Data Mining

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

Overview. Data Mining for Business Intelligence. Shmueli, Patel & Bruce

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

STATISTICS (STAT) 200 Level Courses. 300 Level Courses. Statistics (STAT) 1

Excel to R and back 1

Data Analysis Guidelines

JMP Book Descriptions

Hal Varian, Google s Chief Economist The McKinsey Quarterly, Jan 2009

GETTING STARTED WITH DATA MINING

Generalized least squares (GLS) estimates of the level-2 coefficients,

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-

Lecture 25: Review I

Network Traffic Measurements and Analysis

K-Nearest Neighbors. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Antrix Academy of Data Science TM

R (2) Data analysis case study using R for readily available data set using any one machine learning algorithm.

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007

Using Machine Learning to Optimize Storage Systems

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Why I Use Python for Academic Research

Data Mining: Exploring Data

Ensemble Learning. Another approach is to leverage the algorithms we have via ensemble methods

Scalable Machine Learning in R. with H2O

STATISTICS (STAT) Statistics (STAT) 1

Python for Data Analysis

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Analysis Multiple Regression

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225

MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by

Supervised vs unsupervised clustering

Apache SystemML Declarative Machine Learning

Fathom Dynamic Data TM Version 2 Specifications

Transcription:

DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business Analytic Practitioners. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. The course is a combination of various data science concepts such as machine learning, visualization, data mining, programming, data munging, etc. There are three components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is manual calculations will be shown on how formulae s are used behind the logics. The third is a practical introduction to the tools that will be used in the program like R Programming and EXCEL. Course features: Exclusive doubt clarification session on every weekend Real Time Case Study driven approach Placement Assistance Pre-Requisite / Qualification: Any Graduate. No programming and statistics knowledge or skills required Duration of the course: 90 Hours (On working days-one hour and weekends-3hrs). Mode of course delivery: Classroom/Online Training INTRODUCTION What is Data Science? Introduction. What background is required? Why Data Science? Importance of Data Science. Demand for Data Science Professional. Brief Introduction to Big data and Data Analytics. Lifecycle of data science. Tools and Technologies used in data Science. What is Machine Learning? Different types of Data Science Tasks.

BUSINESS STATISTICS Descriptive statistics and Inferential Statistics Sample and Population Variables and Data types Percentiles Measures of Central Tendency Measures of Spread Skeweness, Kurtosis Degrees of freedom Variance, Covariance, Correlation Standardization/Scaling Probability Expected of x Sampling Distribution Standard Probability Distribution Functions Bernoulli, Binomial, Normal distributions Standard Normal Deviate Decision Making Rules Test of Hypothesis One sample t-test, Chi-square Two sample t-test Analysis of Variance (ANOVA) EXPLORATORY DATA ANALYSIS AND VISUALIZATION Summary Statistics Data Transformations Outlier Detection and Management Charts and Graphs One Dimensional Chart Box plots Bar graph Histogram Scatter plots Multi-Dimensional Charts Fancy Charts - Bubble charts DATA PRE-PROCESSING Data Types and Conversions Binning, Scaling, Standardization, Normalization Min-max Scaling Missing values Treatment Imputation

PREDICTION ANALYTICS Simple Linear Regression Multiple Linear Regression Estimation of Model Parameters Hypothesis Testing in Multiple Linear Regression Extra sum of squares R Square, R- Square Adjusted Variable Selection a. All Possible Regressions b. Sequential Selection (Forward, Backward, Stepwise) Multicollinearity VIF Residual Analysis/Regression Diagnostics. Polynomial Regression Transformations a. Bulging Rules b. Box Tidwell c. Box cox d. Weighted Least Square Dummy variables a. General Concepts of Indicator variables. Predicted Error sum of squares (PRESS) Assessing Performance a. Variance Biased Trade-off b. Resampling Methods c. Cross Validation d. Leave one out Cross validation e. k-fold Cross Validation f. Bootstrap Logistic Regression A Case Study will be presented on Logistic Regression

MACHINE LEARNING Introduction to Supervised and unsupervised Learning Neural Networks a. Network Topology b. Single Layer Perceptron c. Multi-Layer perceptron d. Feed forward and Back propagation Models Introduction to Deep Learning Association Rules a. Market Basket Analysis b. APRIORI c. Support, Lift, Confidence Nearest-Neighbour Methods (KNN Classifier) a. Euclidian Distance b. Hamming Distance Decision Tree a. Finding Root Node, Intermediate Nodes, Terminal Nodes b. Construction of Rules c. Miss classification d. Gini Index e. Overfitting and Prunning f. Regression Trees Boosting, Bagging and Random Forest a. Resampling Methods b. Resampling methods with Replacement c. Resampling methods without Replacement d. Random Forest Dimensional Reduction Techniques 1. Principle Component Analysis a. Eigen values and Eigen Vectors 2. Cluster Analysis a. Hierarchal Clustering b. Linkage Methods c. Non- Hierarchal Clustering d. K-Means Clustering

Text Mining / Natural Language processing a. Unstructured Data b. Text Analytics c. Cleaning Text data d. Tokenization e. Pre-processing f. Word counts and word clouds g. Sentiment Analysis h. Text classification i. Distance measures Introduction to probabilistic methods Introduction a. Naive Bayes b. Joint and Condition probabilities c. Classification using Naive Bayes Approach Support Vector Machines a. Maximum Margin Classifier b. Support vector Classifier c. Support vector machines d. Kernels Linear and Non Linear PYTHON - PROGRAMMING How to install python (Anaconda) How to install scikit Learn (Anaconda) How to work with Jupyter Notebook How to work with Spyder IDE Strings Lists Tuples Sets Dictionaries Control Flows Functions Formal/Positional/Keyword arguments Predefined functions (range, len, enumerates etc ) Data Frames Packages required for data Science in Python Lab/Coding

Introduction to NumPy One-dimensional Array Two-dimensional Array Pr-defined functions (arrange, reshape, zeros, ones, empty) Basic Matrix operations Scalar addition, subtraction, multiplication, division Matrix addition, subtraction, multiplication, division and transpose Slicing Indexing Looping Shape Manipulation Stacking Introduction to Pandas Series DataFrame df.groupby df.crosstab df.apply df.map Apache Spark Analytics What is Spark Introduction to Spark RDD Introduction to Spark SQL and Dataframes Using R-Spark for machine learning Hands-on: installation and configuration of Spark Hands on Spark RDD programming Hands on of Spark SQL Dataframe programming Using R-Spark for machine learning programming

R PROGRAMMING 1. Getting R 1.1 Downloading R 1.2 R Version 1.3 32-bit versus 64-bit 1.4 Installing 2. The R Environment 2.1 Command Line Interface 2.2 RStudio 3. R Packages 3.1 Installing Packages 3.2 Loading Packages 4. Reading Data into R 4.1 Reading CSVs 4.2 Excel Data 4.3 Clipboard 5. Advanced Data Structures 5.1 Data.frames 5.2 Lists 5.3 Matrices 5.4 Arrays 5.5. Factors 6. Basics of R 6.1 Basic Math 6.2 Variables 6.3 Data Types 6.4 Vectors 6.5 Calling Functions 6.6 Function Documentation 6.7 Missing Data 7. Control Statements 7.1 if and else 7.2 switch 7.3 ifelse 8. Loops 8.1 for Loops

8.2 while Loops 8.3 Controlling Loops 9. Group Manipulation 9.1 Apply Family 9.2 aggregate 10. Data Reshaping 10.1 cbind and rbind 10.2 Joins 10.3 Reshape2 11. String Theory 11.1 paste 11.2 sprintf 11.3 Extracting Text/ Regular Expressions 12. Graphs with R and GGPlot2 12.1 Basic and Interactive Plots 12.2 Dendrograms 12.3 Pie Chart and Its Alternatives 12.4 Adding the Third Dimension 12.5 Visualizing Continuous Data 13. Basic Statistics 13.1 Summary Statistics 13.2 Correlation and Covariance 13.3 T-Tests 13.4 ANOVA 14. Probability Distributions 14.1 Normal Distribution 14.2 Binomial Distribution Course Highlights A Dedicated Portal For Practicing. Real Time Project Data Models to Work 1-1 Mentorship Internship Offers for Freshers. Weekly Assignments. Weekly Doubt Sessions\ Resume Preparation Tips Interview Guidance And Support. Dedicated HR Team for Job Support And Placement Assistance.