Spam Filtering with Naive Bayes Classifier
- Michael Hines
- 6 years ago
Transcription
1 Spam Filtering with Naive Bayes Classifier Yuriy Arabskyy June 6, 2017
2 Table of contents
- What is spam?
- Different spam types
- Anti-Spam Techniques
- Probability theory basics
- Conditional probability
- Bayes Theorem
- Naive Bayes Theorem
- Spam filtering with Naive Bayes Classifier (NBC)
- Definition of terms
- Feature representation
- Evaluation
- Comparison to Logistic Classifier (LC)
5 What is spam?
Spam: mass-mailing of a message over the internet, for the purposes of advertising.
"HOT sing1e women seeking your attention near this AREA!!!%%###!!! Just follow this link..."
"I am Wumi Abdul; the only Daughter of late Mr and Mrs George Abdul. My father was a very wealthy cocoa merchant in Abidjan, he was poisoned to death by his business associates... I seek for a foreign partner. Please provide a Bank account where this money would be transferred to."
6 Different spam types Figure: Spam chart
8 Anti-Spam Techniques
End-user techniques
- Discretion
- Address munging
- Ham passwords
Mail server level filtering
- Realtime Blackhole Lists
- Spamtrapping
- SMTP callback verification
- Statistical spam filtering
10 Probability theory basics
Conditional probability:
$$\Pr[X \cap Y] = \Pr[Y \mid X]\,\Pr[X] \tag{1}$$
Figure: Weather - conditional probability
13 Probability theory basics
$$\Pr[X \cap Y] = \Pr[Y \mid X]\,\Pr[X] = \Pr[X \mid Y]\,\Pr[Y] \tag{2}$$
Bayes' theorem:
$$\Pr[Y \mid X] = \frac{\Pr[X \mid Y]\,\Pr[Y]}{\Pr[X]} \tag{3}$$
Bayes' theorem is a way of updating what we think about the world, based on what we know about it.
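As a worked illustration of equation (3), the sketch below computes the probability that a message is spam given that it contains a particular word. The function name `posterior` and all the numbers (prior and likelihoods) are made up for illustration and do not come from the slides.

```python
def posterior(prior_y, likelihood_x_given_y, likelihood_x_given_not_y):
    """Return Pr[Y | X] via Bayes' theorem for a binary event Y."""
    # Pr[X] by the law of total probability over Y and not-Y.
    evidence = (likelihood_x_given_y * prior_y
                + likelihood_x_given_not_y * (1.0 - prior_y))
    return likelihood_x_given_y * prior_y / evidence


# Hypothetical numbers: 40% of all mail is spam; the word "link" appears
# in 50% of spam messages and in 5% of legitimate ones.
p_spam_given_link = posterior(0.4, 0.5, 0.05)
print(round(p_spam_given_link, 3))  # 0.87
```

With these assumed numbers, seeing the word raises the belief that the message is spam from the prior of 0.4 to roughly 0.87.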
16 Probability theory basics
Multiple variables:
$$\Pr[x_1, x_2, \ldots, x_n] = \Pr[x_1 \mid x_2, x_3, \ldots, x_n]\,\Pr[x_2, x_3, \ldots, x_n] \tag{4}$$
$$\Pr[x_2, x_3, \ldots, x_n] = \Pr[x_2 \mid x_3, x_4, \ldots, x_n]\,\Pr[x_3, x_4, \ldots, x_n] \tag{5}$$
Assuming $x_i$ and $x_j$ are independent:
$$\Pr[x_i \mid x_j] = \Pr[x_i] \tag{6}$$
The previous formula can then be simplified to:
$$\Pr[x_1, x_2, \ldots, x_n] = \Pr[x_1]\,\Pr[x_2] \cdots \Pr[x_n] \tag{7}$$
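To make the step from (4)-(6) to (7) explicit, here is the case n = 3 written out. It relies on the slightly stronger assumption that each variable is independent of all the remaining ones jointly, which the slides leave implicit.

```latex
\begin{align*}
\Pr[x_1, x_2, x_3]
  &= \Pr[x_1 \mid x_2, x_3]\,\Pr[x_2, x_3]                 && \text{by (4)} \\
  &= \Pr[x_1 \mid x_2, x_3]\,\Pr[x_2 \mid x_3]\,\Pr[x_3]   && \text{by (5)} \\
  &= \Pr[x_1]\,\Pr[x_2]\,\Pr[x_3]                          && \text{by independence, as in (6)}
\end{align*}
```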
18 Spam filtering with NBC
Bayes' theorem rewritten using the naive assumption (the features $x_1, \ldots, x_n$ are assumed to be conditionally independent given the class $c$):
$$\Pr(c \mid x_1, x_2, \ldots, x_n) = \frac{\Pr(c)\,\Pr(x_1 \mid c)\,\Pr(x_2 \mid c) \cdots \Pr(x_n \mid c)}{\Pr(x_1, x_2, \ldots, x_n)} \tag{8}$$
$$\text{Class of } d_i = \arg\max_c \Pr(c \mid d_i) \tag{9}$$
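The decision rule in (8) and (9) can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes binary features, probabilities strictly between 0 and 1, and hypothetical names (`classify`, `priors`, `cond_probs`) and parameter values. The denominator of (8) is the same for every class, so it is dropped, and logarithms are summed instead of multiplying probabilities to avoid numerical underflow.

```python
import math


def classify(x, priors, cond_probs):
    """Return argmax_c Pr(c) * prod_i Pr(x_i | c) for a binary vector x.

    priors:     dict mapping class -> Pr(c)
    cond_probs: dict mapping class -> list of Pr(x_i = 1 | c)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        # Work in log space: log Pr(c) + sum_i log Pr(x_i | c).
        score = math.log(prior)
        for x_i, p in zip(x, cond_probs[c]):
            score += math.log(p if x_i else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical parameters for a 3-word vocabulary.
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {"spam": [0.8, 0.6, 0.1], "ham": [0.1, 0.2, 0.5]}
print(classify([1, 1, 0], priors, cond_probs))  # spam
```

For the vector `[1, 1, 0]` the unnormalised scores are 0.4 · 0.8 · 0.6 · 0.9 = 0.1728 for spam and 0.6 · 0.1 · 0.2 · 0.5 = 0.006 for ham, so spam wins.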
19 Spam filtering with NBC
Definition of terms:
- Vocabulary (V) is an ordered collection of words, i.e., $V = (v_1, v_2, v_3, \ldots, v_n)$, used to classify an email.
- Document (D) is an ordered collection of the words used in a message, $D = (w_1, w_2, w_3, \ldots, w_n)$.
- The classifier is a machine that, when given a document D and a collection of parameters θ, deterministically returns the class of the document.
21 Spam filtering with NBC
Document representation: a binary vector of length $|V|$ is used to represent a document. $x_i = 0$ means the absence of the word $v_i$ in the specified document, and $x_i = 1$ its presence.
Bernoulli event model:
$$\Pr[x_i \mid c_k] = p_{ki}^{x_i}\,(1 - p_{ki})^{1 - x_i} \tag{10}$$
$p_{ki}$ is the probability of class $c_k$ generating the word $v_i$ and can be calculated as follows:
$$p_{ki} = \frac{\sum_{d \in c_k} \mathrm{ispresent}(v_i, d)}{\#\text{ of documents in } c_k} \tag{11}$$
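The document representation and equations (10) and (11) can be sketched in a few lines. The function names, the toy vocabulary, and the example documents below are all hypothetical; the estimate is the plain relative frequency from (11), with no smoothing.

```python
def to_binary_vector(document, vocabulary):
    """x_i = 1 if vocabulary word v_i occurs in the document, else 0."""
    words = set(document)
    return [1 if v in words else 0 for v in vocabulary]


def estimate_p(documents, vocabulary):
    """Equation (11): fraction of a class's documents that contain each v_i."""
    counts = [0] * len(vocabulary)
    for d in documents:
        for i, x_i in enumerate(to_binary_vector(d, vocabulary)):
            counts[i] += x_i
    return [count / len(documents) for count in counts]


def bernoulli_likelihood(x_i, p_ki):
    """Equation (10): p_ki^x_i * (1 - p_ki)^(1 - x_i)."""
    return p_ki ** x_i * (1.0 - p_ki) ** (1 - x_i)


# Toy data: four already-tokenised spam messages.
vocabulary = ["link", "money", "meeting"]
spam_docs = [
    ["follow", "this", "link"],
    ["money", "link"],
    ["send", "money"],
    ["hot", "link"],
]
p_spam = estimate_p(spam_docs, vocabulary)
print(p_spam)  # [0.75, 0.5, 0.0]
```

Note that "meeting" never occurs in the toy spam set, so its estimate is exactly 0, and any spam-class product containing that word becomes 0 as well. Practical filters usually smooth the estimate to avoid this, which goes beyond what the slides describe.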
22 Evaluation
                      Legitimate   Spam
Classifier accepted   a            b
Classifier rejected   c            d
b: accepted even though it was spam
c: legitimate mail is classified as spam (very bad!)
$$\text{Recall} = \frac{a}{a + c}, \qquad \text{Precision} = \frac{a}{a + b} \tag{12}$$
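Equation (12) translates directly into code. The sketch below treats legitimate mail as the positive class, as the table does; the function name and the counts are invented for illustration.

```python
def recall_precision(a, b, c):
    """Equation (12), with legitimate mail as the positive class.

    a: legitimate mail that was accepted
    b: spam that was accepted
    c: legitimate mail that was rejected
    """
    recall = a / (a + c)
    precision = a / (a + b)
    return recall, precision


# Hypothetical counts from an evaluation run.
recall, precision = recall_precision(a=90, b=5, c=10)
print(recall, precision)
```

With these counts, recall is 90/100 and precision is 90/95. The count d (spam correctly rejected) does not enter either measure.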
23 Comparison to Logistic Classifier
Advantage: NBC requires less training data to function properly.
Disadvantage: the Logistic Classifier can reach a lower error rate when given enough data.
24 Comparison to Logistic Classifier
Figure: dashed line: LC; solid line: NBC; y-axis: error; x-axis: m (1000 random train splits)
27 Thank you for listening!
And remember, what do we say to Nigerian princes who want to do business with you? :)
If you're interested, listen to James Veitch's talk about answering spam: happens_when_you_reply_to_spam_