CISC 4631 Data Mining, Lecture 01: Introduction to Data Mining
- Adrian Collins
- 6 years ago
1. Introduction to Data Mining
2. Let's Start by Seeing What You Know
Quick quiz:
- Do you know what data mining is?
- Do you know of any examples of data mining?
3. What Is Data Mining?
Data mining has many definitions:
- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
4. Alternative Names
Data mining was/is known by these other names (although many have fallen out of favor over time):
- Knowledge discovery in databases (KDD)
- Knowledge extraction
- Data/pattern analysis
- Data archeology, data dredging, information harvesting, business intelligence, etc.
Recently introduced names (possibly with different emphases):
- Data Science
- Big Data
5. Some Examples
- Netflix and Amazon use data mining to recommend products (recommender systems)
- Companies use data mining for marketing:
  - Who should be mailed a catalog
  - Who should see which online ads (Google AdWords)
- Fordham's WISDM project uses smartphone accelerometer data to classify user activities (walking, jogging, sitting, etc.)
- Some search engines cluster retrieved documents into meaningful groups, e.g., grouping pages about "Jaguar" into car pages and cat pages
6. Why Data Mining and Why Now?
Data mining was not very popular until about … years ago.
Quick quiz: What do you think changed?
7. Why Mine Data?
There are now tremendous amounts of data that are automatically collected and warehoused. What are some examples?
- Web data, e-commerce
- Store purchases
- Bank/credit card transactions
- Cell phone GPS information
- Smartphone and smartwatch sensor data
8. Why Mine Data?
What technological changes have helped make data mining so prevalent now?
- Computers: cheaper and more powerful
- Smaller mobile devices are exploding in popularity
- Disk and other storage: greater capacity and cheaper
- Increased use of online resources and the Internet
We shouldn't discount the advances in algorithms, but most data mining algorithms are relatively mature.
9. Why Mine Data?
In business, competitive pressure is strong:
- Provide better, customized services for an edge (e.g., in Customer Relationship Management)
- CRM is a relatively big deal now: how do we get the most out of the customer over the long run?
- Example: customer churn analysis
10. Why Mine Data?
- Information hidden in data is often not evident
- Analysts may take weeks to discover useful information
- Much of the data is never analyzed at all
- There is simply too much data to analyze without assistance
11. Scientific Need
Data is collected at enormous speeds:
- Remote sensors on satellites
- Telescopes scanning the skies
- Microarrays generating gene expression data
- Scientific simulations
Traditional techniques are infeasible at this scale.
12. How Big Is the Data?
Examples of large data sets:
- AT&T's 26 TB call detail database (2003)
- eBay 6 PB, IRS 150 TB data warehouse
- Yahoo has a 2 PB database to analyze the behavior of ½ billion web visitors/month (24 billion events/day)
- Wal-Mart has a 583 TB database (2006)
- The indexed web contains about 20 billion pages
- Sites like Facebook, Flickr, and Twitter contain lots of data
- Google is estimated (in 2011) to have 900,000 servers to handle its data!
13. How Much Data Is Being Created?
- 5 exabytes of new data created (2002, UC Berkeley)
- Humans created/copied 161/281 exabytes in 2006/2007 (IDC)
  - 1 exabyte = stacks of books stretching from Earth to the Sun
  - 3 million times the books ever written
- Not all data is stored at once (includes temporary data)
- In …, 2.8 ZB (2,800 EB) of data will be created/copied
- Forecast for 2020: 40 ZB (57 times the number of grains of sand on Earth)
OK, we get the point already! Head hurts.
14. Why Data Mining? Why Now?
According to BabyCenter.com, one in three children born in the United States today already has an online presence (usually in the form of a sonogram) before being born. That number grows to 92% by the time they are two. In 2012 the average "digital birth" of children occurs at approximately six months, with a third of all children's photos and information posted online within weeks of their birth.
What will it mean to live in a world where our every moment, from birth to death, is digitally chronicled and preserved in vast cloud-based databases, forever?
During the first day of a baby's life, the amount of data generated by humanity is equivalent to 70 times the information contained in the Library of Congress.
15. Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems*
- Traditional techniques may be unsuitable due to:
  - Enormity of data
  - High dimensionality
  - Heterogeneous and distributed data
[Figure: diagram placing Data Mining at the overlap of Statistics, Artificial Intelligence, Machine Learning, Pattern Recognition, and Database systems]
* Databases currently have limited impact; data mining is rarely done in a database but rather on flat files.
16. Statistics vs. Data Mining
Experience has shown that students with statistics backgrounds are often confused by data mining if the differences aren't highlighted. Compared to data mining:
- Statistics is more theory-based, whereas data mining methods are often based on heuristic algorithms
- Statistics is based firmly on mathematics (e.g., probability)
- Statistics is more focused on testing hypotheses than on finding interesting relationships
- Statistics makes more assumptions about the data
17. The Process of Data Mining
Data mining is a process, sometimes referred to as the knowledge discovery process. Within this process, a data mining step applies data mining algorithms to extract knowledge. About 80% of our class will focus on the data mining step, but in the real world 80% of the time is spent on the other steps (e.g., preparing the data).
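The following minimal Python sketch separates the preparation steps from the mining step. The field names (`income`, `churned`), the records, and the chosen steps are all hypothetical. The steps are: drop incomplete records, apply min-max scaling, then compute a per-group average as a stand-in for the mining step. A real project would use much richer preprocessing and an actual mining algorithm.

```python
from statistics import mean

# Hypothetical raw records; one has a missing income value.
RAW = [
    {"income": 40, "churned": True},
    {"income": None, "churned": False},
    {"income": 100, "churned": False},
    {"income": 70, "churned": False},
]

def clean(records):
    """Preprocessing: drop records with any missing (None) value."""
    return [r for r in records if None not in r.values()]

def normalize(records, field):
    """Transformation: min-max scale one numeric field into [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against division by zero if all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field, label):
    """Stand-in 'mining' step (hypothetical): mean of `field` per value of `label`."""
    groups = {}
    for r in records:
        groups.setdefault(r[label], []).append(r[field])
    return {key: mean(vals) for key, vals in groups.items()}

prepared = normalize(clean(RAW), "income")
print(mine(prepared, "income", "churned"))  # {True: 0.0, False: 0.75}
```

Most of the code handles preparation, not mining. This matches the observation that most real-world effort goes into the other steps.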
18. Second Part of Introduction: Data Mining Tasks
19. Top-Level Data Mining Tasks
At the highest level, data mining tasks can be divided into:
- Prediction tasks (supervised learning): use some variables to predict unknown or future values of other variables
- Description tasks (unsupervised learning): find human-interpretable patterns that describe the data
20. Key Data Mining Tasks
Overview of the major data mining tasks studied in this course:
- Prediction tasks
  - Classification
  - Regression
- Description tasks
  - Clustering
  - Association rule discovery
21. Classification: Definition
- Given a collection of records (the training set):
  - Each record contains a set of attributes; one of the attributes is the class, which is to be predicted.
- Find a model for the class attribute as a function of the values of the other attributes.
  - The model maps a record to a class value.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model.
Can you think of classification tasks?
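Here is a minimal sketch of the training-set/test-set protocol described above. It assumes records are Python dicts. The "model" is a deliberately trivial majority-class baseline, not a method from the lecture, so the focus stays on how accuracy is measured on records held out from training. The data is hypothetical.

```python
from collections import Counter

def train_majority(training_set, class_attr):
    """'Learn' a baseline model: always predict the most frequent training class."""
    majority = Counter(r[class_attr] for r in training_set).most_common(1)[0][0]
    return lambda record: majority  # the model maps a record to a class value

def accuracy(model, test_set, class_attr):
    """Fraction of labeled test records whose predicted class matches the true class."""
    correct = sum(model(r) == r[class_attr] for r in test_set)
    return correct / len(test_set)

# Hypothetical labeled data; the test records are not used during training.
train = [{"x": 1, "cls": "No"}, {"x": 2, "cls": "No"}, {"x": 3, "cls": "Yes"}]
test = [{"x": 4, "cls": "No"}, {"x": 5, "cls": "Yes"}]

model = train_majority(train, "cls")
print(accuracy(model, test, "cls"))  # 0.5
```

Any real classifier can be swapped in for `train_majority`. The accuracy function stays the same because it only needs a function from records to class values.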
22. Classification Example
Training set:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Test set:
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?
Training set → Learn classifier → Model; the model is then applied to the test set.
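The model in this sketch is a hypothetical hand-built decision tree. It splits on Refund, then Marital Status, then an assumed income threshold of 80K, and it happens to classify every training record correctly. The code does not learn this tree, and a classifier-learning algorithm could produce a different model. The sketch checks training accuracy and then assigns a class to each test record from the tables above.

```python
# Training set: (Tid, Refund, Marital Status, Taxable Income in $K, Cheat)
TRAINING = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

# Test set: (Refund, Marital Status, Taxable Income in $K); class unknown
TEST = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]

def tree_model(refund, marital_status, income_k):
    """Hypothetical hand-built decision tree mapping a record to a Cheat value."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on an assumed income threshold of 80K
    return "No" if income_k < 80 else "Yes"

train_acc = sum(
    tree_model(refund, status, income) == cheat
    for _, refund, status, income, cheat in TRAINING
) / len(TRAINING)
predictions = [tree_model(*record) for record in TEST]

print(train_acc)    # 1.0
print(predictions)  # ['No', 'No', 'No', 'No', 'No', 'No']
```

This tree predicts "No" for all six test records. The test set's true classes are shown as "?", so test accuracy cannot be computed until those labels are known.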
23. Classification: Application 1
Direct marketing
- Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
- Approach:
  - Use the data for a similar product introduced before.
  - We know which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
  - Collect various demographic, lifestyle, and company-interaction-related information about all such customers: type of business, where they stay, how much they earn, etc.
  - Use this information as input attributes to learn a classifier model.
24. Classification: Application 2
Fraud detection
- Goal: predict fraudulent cases in credit card transactions.
- Approach:
  - Use credit card transactions and information about account holders as attributes: when and what a customer buys, how often they pay on time, etc.
  - Label past transactions as fraudulent or fair. This forms the class attribute.
  - Learn a model for the class of the transactions.
  - Use this model to detect fraud by observing credit card transactions on an account.
25. Classification: Application 3
Sky survey cataloging
- Goal: predict the class (star or galaxy) of sky objects, especially visually faint ones, based on telescopic survey images (from Palomar Observatory) with 23,040 x 23,040 pixels per image.
- Approach:
  - Segment the image.
  - Measure image attributes (features): 40 of them per object.
  - Model the class based on these features.
- Success story: found 16 new high red-shift quasars, some of the farthest objects, which are difficult to find!
From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining
26. Classifying Galaxies
[Figure: example galaxy images labeled Early, Intermediate, and Late]
- Class: stages of formation (early, intermediate, late)
- Attributes: image features, characteristics of light waves received, etc.
- Data size:
  - 72 million stars, 20 million galaxies
  - Object catalog: 9 GB
  - Image database: 150 GB
27. Regression
- Predict the value of a given continuous (numerical) variable based on the values of other variables.
- Extensively studied in statistics.
- Examples:
  - Predicting the sales of a new product based on advertising expenditure
  - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  - Time series prediction of stock market indices
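Simple linear regression is the most basic instance of this task. The sketch below fits y ≈ a + b*x by ordinary least squares, using the closed-form estimates b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b*x̄. The advertising-spend and sales numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: advertising spend (x) vs. sales (y), arbitrary units.
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

a, b = fit_line(ad_spend, sales)
print(a, b)        # 1.0 2.0
print(a + b * 5)   # predicted sales at spend 5: 11.0
```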
28. Clustering
Given a set of data points, find clusters so that:
- Data points in the same cluster are similar
- Data points in different clusters are dissimilar
You try it on the Simpsons: how can we cluster these 5 data points?
29. What Is a Natural Grouping Among These Objects?
[Figure: images of the objects to be grouped]
30. What Is a Natural Grouping Among These Objects?
Clustering is subjective: the same objects can be grouped as Simpson's family vs. school employees, or as females vs. males.
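One common way to turn the clustering definition into an algorithm is k-means. The lecture does not name it; it is used here only as an example. This minimal sketch assumes 2-D points and initial centroids supplied by the caller. It alternates between two steps until the assignments stop changing:
- Assign each point to its nearest centroid.
- Move each centroid to the mean of its assigned points.

The data is hypothetical.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: returns (labels, centroids) for the given 2-D points."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if it has no points
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups, starting centroids chosen by hand.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the chosen k, the starting centroids, and the distance function. This echoes the point that clustering is subjective.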
31. What Is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
Similarity is hard to define, but "we know it when we see it." The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.
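In practice, a pragmatic approach usually means picking a concrete similarity (or distance) function suited to the data. The sketch below shows two common choices, not the course's definitions:
- Jaccard similarity, for sets of attributes.
- A similarity derived from Euclidean distance, for numeric vectors. The 1 / (1 + d) transform is one convention among several.

The attribute sets and points are hypothetical.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def euclidean_similarity(p, q):
    """Map Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1 / (1 + math.dist(p, q))

# Hypothetical attribute sets and numeric feature vectors.
print(jaccard({"yellow", "adult", "male"}, {"yellow", "adult", "female"}))  # 0.5
print(euclidean_similarity((0, 0), (3, 4)))  # distance 5, so similarity 1/6
```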
32. Clustering: Application 1
Market segmentation
- Goal: subdivide a market into distinct subsets of similar customers.
- Approach:
  - Collect different attributes of customers based on their geographical and lifestyle-related information.
  - Find clusters of similar customers.
  - Measure the clustering quality by comparing the buying patterns of customers in the same cluster with those of customers in different clusters.
33. Clustering: Application 2
Document clustering
- Goal: find groups of documents that are similar to each other based on the words appearing in them.
- Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster.
- Uses: information retrieval can use the clusters to relate a new document or search term to clustered documents.
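The document clustering approach can be sketched with raw term-frequency vectors and cosine similarity. The sketch makes three simplifying assumptions:
- Tokenization is lowercase text split on whitespace.
- Weights are raw counts rather than, say, TF-IDF.
- The three "documents" are hypothetical snippets echoing the Jaguar example.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Term-frequency vector: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())

def cosine(tf_a, tf_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical snippets echoing the "Jaguar" example.
docs = ["Jaguar car speed", "Jaguar car price", "Jaguar cat jungle"]
tfs = [term_frequencies(d) for d in docs]
print(cosine(tfs[0], tfs[1]))  # about 0.667: shares "jaguar" and "car"
print(cosine(tfs[0], tfs[2]))  # about 0.333: shares only "jaguar"
```

A clustering algorithm could then use these pairwise similarities to put the two car snippets together and keep the cat snippet apart.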
34. Association Rule Discovery
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that will predict the occurrence of an item based on the occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
Rules discovered:
- {Milk} --> {Coke}
- {Diaper, Milk} --> {Beer}
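Rules like these are commonly judged by two measures, which the slide does not define:
- Support: the fraction of transactions that contain all items in the rule.
- Confidence: how often the consequent appears in transactions that contain the antecedent.

The sketch below computes both measures for the two discovered rules on the five transactions in the table. Confidence is computed from counts to avoid floating-point rounding.

```python
# The five transactions from the table.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions containing the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from transaction counts."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs, TRANSACTIONS),
          "confidence:", round(confidence(lhs, rhs, TRANSACTIONS), 3))
```

The rule {Milk} --> {Coke} holds in 3 of the 4 transactions that contain Milk (confidence 0.75, support 0.6). The rule {Diaper, Milk} --> {Beer} holds in 2 of 3 (confidence about 0.667, support 0.4).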
35. Association Rule Discovery: Application
Marketing and sales promotion:
- Let the rule discovered be {Bagels, …} --> {Potato Chips}
- Potato Chips as the consequent: can be used to determine what should be done to boost its sales.
- Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
- Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with bagels to promote the sale of potato chips!
Association rules can also help determine where to position store items (supermarket shelf management). Did you ever notice that some stores have bananas in the cereal aisle?
36. Challenges of Data Mining
- Scalability
- Dimensionality
- Complex and heterogeneous data
- Data quality
- Data ownership and distribution
- Privacy preservation
- Streaming data
37. What Is (and Is Not) Data Mining?
Based on the definitions of data mining, are these data mining or not?
- Finding a phone number in a directory: not data mining (trivial?; a database query)
- Grouping related documents returned by a search engine: is data mining (not trivial; clustering)
- Identifying who has a disease based on symptoms: is data mining (not trivial; classification)
- Web search on a keyword using a search engine: may be data mining**
** More of an information retrieval task than a data mining task. However, since Google does much more than keyword matching, there will be a data mining component. For example, Google mines the link structure of the Web to decide which pages are important (link mining is a type of data mining).
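Link mining of this kind is often illustrated with a simplified PageRank-style power iteration. The sketch below is not Google's actual ranking system. The four-page graph, the damping factor of 0.85, and the fixed iteration count are assumptions. The sketch also requires every page to have at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to; every page must have at
    least one outgoing link and every target must also be a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share; the rest of its score flows along links.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # C
```

Page C receives links from all three other pages and ends up with the highest score. No page links to D, so D keeps only the baseline (1 - 0.85) / 4 = 0.0375.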
38. If You Are Interested in Data Mining
- Data sets: NYC Open Data, UCI Data Repository
- Visit KDnuggets, an online newsletter and more
  - You can arrange to have the newsletter emailed to you
  - Also includes job openings
- ACM SIGKDD is the professional organization associated with data mining
  - ACM Special Interest Group (SIG) on data mining
  - You can join SIGKDD for $22, or for $54 also join ACM as a student member
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationData Mining: Data. What is Data? Lecture Notes for Chapter 2. Introduction to Data Mining. Properties of Attribute Values. Types of Attributes
0 Data Mining: Data What is Data? Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Collection of data objects and their attributes An attribute is a property or characteristic
More informationData Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining
10 Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 What is Data? Collection of data objects
More information3 Data, Data Mining. Chengkai Li
CSE4334/5334 Data Mining 3 Data, Data Mining Chengkai Li Department of Computer Science and Engineering University of Texas at Arlington Fall 2018 (Slides partly courtesy of Pang-Ning Tan, Michael Steinbach
More informationFall 2017 ECEN Special Topics in Data Mining and Analysis
Fall 2017 ECEN 689-600 Special Topics in Data Mining and Analysis Nick Duffield Department of Electrical & Computer Engineering Teas A&M University Organization Organization Instructor: Nick Duffield,
More informationDatabase and Knowledge-Base Systems: Data Mining. Martin Ester
Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro
More informationA Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997
1 of 8 5/24/02 4:43 PM A Systems Approach to Dimensional Modeling in Data Marts By Joseph M. Firestone, Ph.D. White Paper No. One March 12, 1997 OLAP s Purposes And Dimensional Data Modeling Dimensional
More informationTIM 50 - Business Information Systems
TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz Nov 10, 2016 Class Announcements n Database Assignment 2 posted n Due 11/22 The Database Approach to Data Management The Final Database Design
More informationData Preprocessing UE 141 Spring 2013
Data Preprocessing UE 141 Spring 2013 Jing Gao SUNY Buffalo 1 Outline Data Data Preprocessing Improve data quality Prepare data for analysis Exploring Data Statistics Visualization 2 Document Data Each
More informationInternational Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications
Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 5, Issue 01, January -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 A Survey
More informationData Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application
Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find
More informationWeb Mining Evolution & Comparative Study with Data Mining
Web Mining Evolution & Comparative Study with Data Mining Anu, Assistant Professor (Resource Person) University Institute of Engineering and Technology Mahrishi Dayanand University Rohtak-124001, India
More informationKnowledge Engineering and Data Mining. Knowledge engineering has 6 basic phases:
Knowledge Engineering and Data Mining Knowledge Engineering The process of building intelligent knowledge based systems is called knowledge engineering Knowledge engineering has 6 basic phases: 1. Problem
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams
More informationOracle9i Data Mining. An Oracle White Paper December 2001
Oracle9i Data Mining An Oracle White Paper December 2001 Oracle9i Data Mining Benefits and Uses of Data Mining... 2 What Is Data Mining?... 3 Data Mining Concepts... 4 Using the Past to Predict the Future...
More informationData Mining Clustering
Data Mining Clustering Jingpeng Li 1 of 34 Supervised Learning F(x): true function (usually not known) D: training sample (x, F(x)) 57,M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0
More informationData Mining Concepts. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech
http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Data Mining Concepts Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on
More informationIntroduction to Data Mining
Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Sistemas Informáticos and Computación Universidad Politécnica de Valencia, Spain jorallo@dsic.upv.es Horsens, Denmark, 26th September 2005
More informationData Platforms and Pattern Mining
Morteza Zihayat Data Platforms and Pattern Mining IBM Corporation About Myself IBM Software Group Big Data Scientist 4Platform Computing, IBM (2014 Now) PhD Candidate (2011 Now) 4Lassonde School of Engineering,
More informationISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationKDD E A MINERAÇÃO DE DADOS. Daniela Barreiro Claro
KDD E A MINERAÇÃO DE DADOS Daniela Barreiro Claro Outline Introduction KDD Pré-Processamento Mineração de Dados Tarefas Pós-Processamento Prof. Daniela Barreiro Claro 2 de X;X= BIG Data Huge amount of
More informationWhat Is Data Mining? CMPT 354: Database I -- Data Mining 2
Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT
More informationDATA MINING LECTURE 1. Introduction
DATA MINING LECTURE 1 Introduction What is data mining? After years of data mining there is still no unique answer to this question. A tentative definition: Data mining is the use of efficient techniques
More informationData mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014
Data Mining Data mining processes What technological infrastructure is required? Data mining is a system of searching through large amounts of data for patterns. It is a relatively new concept which is
More informationKNOWLEDGE DISCOVERY AND DATA MINING
KNOWLEDGE DISCOVERY AND DATA MINING Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION MANAGEMENT TECHNOLOGIES DATA WAREHOUSE DECISION SUPPORT SYSTEMS
More informationDr. SubraMANI Paramasivam. Think & Work like a Data Scientist with SQL 2016 & R
Dr. SubraMANI Paramasivam Think & Work like a Data Scientist with SQL 2016 & R About the Speaker Group Leader Dr. SubraMANI Paramasivam PhD., MVP, MCT, MCSE (x2), MCITP (x2), MCP, MCTS (x3), MCSA CEO,
More information