DEVNET-2163: Machine Learning with Python
Dmitry Figol, SE, WW Enterprise Sales (@dmfigol)
Cisco Spark
Questions? Use Cisco Spark to communicate with the speaker after the session
How:
1. Find this session in the Cisco Live Mobile App
2. Click "Join the Discussion"
3. Install Spark or go directly to the space
4. Enter messages/questions in the space
cs.co/ciscolivebot#devnet-2163
© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public
Agenda
- Machine Learning introduction
- Supervised Machine Learning
- Unsupervised Machine Learning
- Machine Learning in Python with the scikit-learn library
- Application of Machine Learning in networking
Google reCAPTCHA
More examples
- YouTube, Spotify, Amazon recommender systems
- Speech recognition: Siri, Google Home, Alexa
- IBM Watson
- Google ranking results
- Facebook's facial recognition
- Many more
What is Machine Learning?
Machine learning is a collection of algorithms that enable a computer to learn from data.
Problems with machine learning
- Dirty or missing data
- Not enough data
- Unstructured data is hard to convert to mathematical values
- A good model is hard to find
- Do the results make sense? Is it a bug or the expected result? Is the model incorrect?
- A number (or an army) of data scientists is needed. Or you could become one ;)
"All models are wrong, but some are useful." George Box, 1978
Data in Machine Learning
- It is all about data
- Bad data leads to bad results
- Large amounts of data are usually required to get good results
- Data should be cleaned first
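A minimal sketch of the "clean the data first" step in plain Python, assuming two common cleaning rules: records without a label are dropped, and missing numeric features are filled with the column mean. The records, field names and the `clean` helper are hypothetical, made up for illustration; real pipelines often do this with pandas.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value.
records = [
    {"years_experience": 5, "region": "EMEA", "salary": 70000},
    {"years_experience": None, "region": "APJ", "salary": 65000},
    {"years_experience": 12, "region": "Americas", "salary": None},
    {"years_experience": 8, "region": "EMEA", "salary": 90000},
]

def clean(records, label="salary", numeric=("years_experience",)):
    """Drop unlabeled records, then fill missing numeric features with the column mean."""
    # A record without a label cannot be used for supervised training.
    # dict(r) copies each record so the original list is left untouched.
    labeled = [dict(r) for r in records if r.get(label) is not None]
    for col in numeric:
        known = [r[col] for r in labeled if r[col] is not None]
        fill = mean(known)  # raises StatisticsError if the whole column is missing
        for r in labeled:
            if r[col] is None:
                r[col] = fill
    return labeled

cleaned = clean(records)
```

Mean imputation is only one option; dropping incomplete records or using the median are equally common, and the right choice depends on how much data is missing.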
Machine Learning Algorithms
Types of Machine Learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Deep Learning
Supervised Learning
Solves classification and regression problems based on labeled training data
- Classification: assign input data to groups (classes) based on previous data
- Regression: predict real-valued outputs for input data based on previous data
Supervised Learning Algorithms
- Linear regression
- Logistic regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision trees and random forests
- Many more
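To make one of these algorithms concrete, here is a small K-Nearest Neighbors classifier in plain Python: a new point gets the majority label of its k closest training points, using Euclidean distance. The helper name `knn_predict`, the toy data and k=3 are assumptions for illustration; scikit-learn ships a production-ready version as `KNeighborsClassifier`.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort (features, label) pairs by distance to x and keep the k closest.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per point, two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 5.0), (7.0, 7.0), (6.5, 6.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.2, 1.4)))  # A
print(knn_predict(train_X, train_y, (6.8, 6.1)))  # B
```

KNN has no real training phase: the "model" is the stored training data, which makes it simple but slow to predict on large data sets.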
Data in Supervised Learning
- Each data entry has features $(x_1, x_2, \dots, x_n)$ and a label $y$
- Features represent different characteristics of the data; the label represents the outcome/result
- The training data set contains records whose label is known
- The data is fed into the algorithm, which computes a model
- The model represents the relationship between the features and the label
- The machine can now predict labels for new feature values
Data in Supervised Learning - Example
- A large number of network engineers filled in a survey with questions about their skills, experience, number of employees in their company, region and salary
- Skills, years of experience, company information and region are features
- Salary is the label
- A model describing the relationship between the features and the label is found from the survey data
- Now we can predict the salary of a person who didn't take the survey
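The survey example needs one step before any algorithm can run: turning each answer into numeric features $x_1, \dots, x_n$ and a label $y$. This sketch assumes hypothetical field names and made-up values, and uses one-hot encoding for the categorical region (one 0/1 column per region); that is one common choice, not the only one.

```python
# Hypothetical survey answers; field names and values are illustrative only.
survey = [
    {"years_experience": 3, "employees": 500, "region": "EMEA", "salary": 60000},
    {"years_experience": 10, "employees": 20000, "region": "Americas", "salary": 120000},
    {"years_experience": 6, "employees": 1500, "region": "APJ", "salary": 75000},
]

FEATURES = ["years_experience", "employees"]
REGIONS = sorted({r["region"] for r in survey})  # fixed column order for one-hot encoding

def to_vector(record):
    """Turn one record into numeric features: numeric columns plus one-hot region.

    A region not seen in the survey becomes all zeros in the one-hot part.
    """
    numeric = [float(record[f]) for f in FEATURES]
    one_hot = [1.0 if record["region"] == reg else 0.0 for reg in REGIONS]
    return numeric + one_hot

X = [to_vector(r) for r in survey]  # feature vectors x_1..x_n
y = [r["salary"] for r in survey]   # labels

# A person who didn't take the survey is encoded the same way, just without a salary.
new_person = to_vector({"years_experience": 4, "employees": 800, "region": "APJ"})
```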
Linear Regression
- The simplest supervised learning algorithm
- Linear relation between features and label: $y = w_0 + w_1 x_1 + \dots + w_n x_n$
- With only one feature, the model is a straight line: $y = w_0 + w_1 x_1$
- Intuition: find coefficients $w_0, w_1, \dots, w_n$ so that the error between the training labels and the predicted labels is minimal
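Assuming the error is measured as the sum of squared differences (ordinary least squares, the most common choice; the slide does not name a specific error measure), the intuition above becomes the following objective over $m$ training examples $(x^{(i)}, y^{(i)})$. For a single feature it has a well-known closed-form solution, where $\bar{x}$ and $\bar{y}$ are the means of the training features and labels.

```latex
% Squared-error objective over m training examples (x^{(i)}, y^{(i)})
\min_{w_0, \dots, w_n} \; \sum_{i=1}^{m} \Big( y^{(i)} - \big( w_0 + w_1 x_1^{(i)} + \dots + w_n x_n^{(i)} \big) \Big)^2

% Closed-form solution for a single feature (n = 1, writing x for x_1)
w_1 = \frac{\sum_{i=1}^{m} \big(x^{(i)} - \bar{x}\big)\big(y^{(i)} - \bar{y}\big)}{\sum_{i=1}^{m} \big(x^{(i)} - \bar{x}\big)^2},
\qquad
w_0 = \bar{y} - w_1 \bar{x}
```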
Linear Regression - Example
House prices (figure: price plotted against square meters)
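A plain-Python sketch of the one-feature case, using the closed-form least-squares solution $w_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $w_0 = \bar{y} - w_1 \bar{x}$. The house sizes and prices are invented and deliberately perfectly linear (price = 25,000 + 2,500 per square meter) so the fitted coefficients are easy to check; `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (w0, w1) for y = w0 + w1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of co-deviations of x and y divided by sum of squared deviations of x.
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means (x_bar, y_bar).
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Hypothetical data: size in square meters and price (arbitrary currency units).
sizes = [50, 70, 90, 110, 130]
prices = [150_000, 200_000, 250_000, 300_000, 350_000]

w0, w1 = fit_line(sizes, prices)
predicted = w0 + w1 * 100  # estimated price of a 100 m^2 house: 275000.0
```

Real data is noisy, so the fitted line will not pass through every point. With several features, libraries such as scikit-learn (`LinearRegression`) solve the general case.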
Unsupervised Machine Learning
Solves clustering and association problems based on unlabeled training data
- Clustering: discovering groupings in the data
- Association: finding rules that describe the data
Unsupervised Learning Algorithms
- K-means
- Hierarchical clustering
- Apriori algorithm
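The Apriori algorithm finds itemsets that often appear together (their support) and builds association rules from them (their confidence). Below is a compact sketch in plain Python under simple assumptions: transactions are sets of items, `min_support` is a fraction of all transactions, and candidate itemsets grow one item at a time and are pruned with the Apriori property (every subset of a frequent itemset must itself be frequent). Function names and the shopping-basket data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        result.update({s: support(s) for s in current})
        k += 1
    return result

def confidence(supports, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(both together) / support(antecedent)."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return supports[a | c] / supports[a]

# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
itemsets = apriori(baskets, min_support=0.5)
```

Here every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0, while {milk, butter} appears in only one of four baskets and is filtered out.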
K-means
Finds K groups within the data. Algorithm:
1. Select K random points as centroids
2. Assign every data point to its nearest centroid
3. Move each centroid to the mean of the points assigned to it
4. Repeat steps 2 and 3 until the centroids do not change
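The four steps above map directly to code. This plain-Python sketch assumes 2-D points, Euclidean distance, a fixed random seed so runs are repeatable, and an iteration cap as a safety net; `kmeans` and `mean_point` are hypothetical helper names. In practice scikit-learn's `KMeans` adds smarter initialization (k-means++) and multiple restarts.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Step 1: select k distinct data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
```

K-means only finds a local optimum, so the result can depend on the initial random centroids; running it several times with different seeds and keeping the tightest clustering is a common remedy.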
Reinforcement Learning
- Based on exploring the world
- Each action yields a reward that can be observed
- Start with random actions and record the reward of every action
- Build a policy by preferring actions that lead to higher rewards
- Keep improving the policy with every experience
- Examples: AlphaGo and OpenAI's Dota 2 bot both played against themselves thousands of times until they learned strategies strong enough to beat top human players in complex games
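The explore-then-prefer loop can be illustrated with an epsilon-greedy multi-armed bandit, the simplest reinforcement learning setting (a single state, several actions). This is a swapped-in toy example, not how AlphaGo or OpenAI's bot were trained: after trying each action once, the agent picks a random action with probability `epsilon` and otherwise the action with the highest average reward so far. The reward functions and parameter values are hypothetical.

```python
import random

def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays best.

    rewards: one zero-argument function per action, returning that action's reward.
    Returns (estimated average reward per action, index of the best action).
    """
    rng = random.Random(seed)
    n = len(rewards)
    estimates = [0.0] * n  # running average reward per action
    counts = [0] * n       # how often each action was taken
    for _ in range(steps):
        if 0 in counts:
            action = counts.index(0)  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n)  # explore: random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit: best so far
        reward = rewards[action]()
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    # The learned policy: prefer the action with the highest estimated reward.
    best = max(range(n), key=lambda a: estimates[a])
    return estimates, best

# Hypothetical environment: three actions with noisy rewards around 1.0, 2.0 and 0.5.
noise = random.Random(42)
rewards = [lambda m=m: noise.gauss(m, 0.1) for m in (1.0, 2.0, 0.5)]
estimates, best = epsilon_greedy(rewards)  # best should be action 1
```

The `epsilon` parameter trades exploration for exploitation: too low and the agent may lock onto a mediocre action early, too high and it keeps wasting steps on actions it already knows are worse.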
Deep Learning
- State of the art in machine learning
- Mostly used for supervised learning
- Uses neural networks (originally inspired by models of brain neurons)
- More complex
- Requires more data
- Requires more time to train
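Neural networks are built from artificial neurons. As a minimal, hedged illustration, this sketch trains a single neuron (a perceptron with a step activation) to compute logical AND in plain Python. A single neuron is not deep learning: deep networks stack many layers of neurons with differentiable activations and train them with backpropagation, typically through libraries such as TensorFlow. Function names and parameters here are illustrative.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs plus bias is positive, else 0."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train one artificial neuron with the perceptron rule; returns (weights, bias).

    samples: list of feature tuples; labels: 0 or 1 per sample.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            output = predict(weights, bias, x)
            # Perceptron rule: shift weights and bias in proportion to the error.
            error = target - output
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so the perceptron rule is guaranteed to converge.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
```

A single perceptron cannot learn functions that are not linearly separable (XOR is the classic example); adding hidden layers removes that limitation, which is one reason deeper networks need more data and training time.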
Deep Learning applications
- Speech and image recognition
- Natural language processing
- Machine translation (DeepL)
- Self-driving cars
- Many more
Machine Learning with Python
- ML libraries: scikit-learn, TensorFlow, Apache Spark
- Data science libraries: NumPy/SciPy, pandas
Machine Learning with Python demo
Machine Learning in Networking
Encrypted Traffic Analytics (ETA)
- Start from known malware traffic and known benign traffic
- Extract observable features in the data
- Employ machine learning techniques to build detectors
- Result: known malware sessions detected in encrypted traffic with 99% accuracy
Source: "Identifying encrypted malware traffic with contextual flow data", AISec '16, Blake Anderson and David McGrew (Cisco Fellow)
How it works
- Initial Data Packet: make the most of the unencrypted fields
- Sequence of Packet Lengths and Times: identify the content type through the size and timing of packets
- Threat Intelligence Map: the who's who of the Internet's dark side; broad behavioral information about the servers on the Internet
(Diagram labels: data exfiltration, self-signed certificate, C2 message)
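To give a feel for "extracting observable features" from traffic whose payload is encrypted, this sketch summarizes one flow from packet metadata only (timestamps, lengths, direction). The input format, feature names and choice of features are assumptions for illustration; they are not the actual ETA feature set described in the paper.

```python
from statistics import mean, pstdev

def flow_features(packets, first_n=5):
    """Summarize one flow as a dictionary of numeric features.

    packets: list of (timestamp_ms, length_bytes, direction) tuples, where
    direction is +1 for client->server and -1 for server->client.
    Only metadata is used, so this works even when the payload is encrypted.
    """
    times = [t for t, _, _ in packets]
    lengths = [size for _, size, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "packets": len(packets),
        "bytes_out": sum(size for _, size, d in packets if d > 0),
        "bytes_in": sum(size for _, size, d in packets if d < 0),
        "mean_len": mean(lengths),
        "std_len": pstdev(lengths),
        "mean_gap_ms": mean(gaps) if gaps else 0,
        "duration_ms": times[-1] - times[0],
        # Signed lengths of the first packets: a short "sequence of packet lengths".
        "first_lengths": [size * d for _, size, d in packets[:first_n]],
    }

# Hypothetical flow: a small request, a large two-packet response, a short follow-up.
flow = [(0, 517, +1), (50, 1460, -1), (60, 1460, -1), (300, 120, +1)]
features = flow_features(flow)
```

Vectors like this, computed for labeled malware and benign flows, are what a supervised classifier would be trained on.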
Malware detection with ETA
- Initial Data Packet
- Sequence of Packet Lengths and Times
- Threat Intelligence Map
- Cloud-based machine learning
All three elements reinforce each other inside the analytics engine that uses them.
Machine Learning for wired and wireless networks
- End-to-end visibility
- Predictive analytics
DNA Center Analytics
- Distributed stream processing: continuous processing, aggregating, correlating and analyzing of data in motion
- Distributed analytics pipeline runtime and programming model
- Real-time or near real-time
- Analytics operations: time series analysis, complex event processing, machine learning
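As a small illustration of time series analysis on data in motion, this sketch flags values that deviate strongly from a rolling window of recent values (a z-score test). It is a generic technique chosen for illustration, not DNA Center's actual analytics; the window size, threshold and latency readings are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev

def detect_anomalies(stream, window=10, threshold=3.0):
    """Yield (index, value) for values far from the rolling mean of the last `window` values.

    A value is flagged when it is more than `threshold` standard deviations away
    from the mean of the preceding window. It consumes one value at a time,
    so it also works on an unbounded stream.
    """
    history = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        history.append(value)

# Hypothetical latency readings in milliseconds with one spike.
readings = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 45, 11, 10]
anomalies = list(detect_anomalies(readings))  # [(10, 45)]
```

One design caveat: once an anomaly enters the window it inflates the mean and standard deviation for the next `window` values; robust statistics such as the median reduce that effect.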
Complete Your Online Session Evaluation
- Please complete your Online Session Evaluations after each session
- Complete 4 Session Evaluations and the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt
- All surveys can be completed via the Cisco Live Mobile App or the Communication Stations
Don't forget: Cisco Live sessions will be available for viewing on-demand after the event at www.ciscolive.com/global/on-demand-library/.
Continue Your Education
Start with the prerequisites:
- Programming: Python, R, MATLAB, etc.
- Linear algebra
- Statistics
Machine Learning resources:
- Machine Learning by Andrew Ng: online course (Coursera/Stanford)
- Reinforcement Learning course by David Silver: recordings
- CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
Continue Your Education
- DNA-C Analytics booth in WoS
- Related sessions:
  - Unlocking the Mystery of Machine Learning and Big Data [BRKIOT-2394]
  - A Cloud-based Machine Learning / Analytics architecture for DNA (wireless/wired) Assurance [BRKEWN-2033]
  - Inside Cisco IT: Using Machine Learning Technologies to Drive Digital Transformation [BRKCOC-2017]
Thank you