Distributed recipe mining to combine historic and social media knowledge

1 Distributed recipe mining to combine historic and social media knowledge. Sigurd Sippel. Supervisor: Kai von Luck

2 Motivation: a recipe develops over time. Social media: redundant adaptations. Historic books. What is the main recipe?

3 Search on cluster map

4 Inner cluster

5 KDD process [FPSS96]: knowledge discovery in databases and data mining (Usama Fayyad; Gregory Piatetsky-Shapiro; SIGKDD, KDD Conference). Steps: defining goal, data selection, data cleaning, data reduction, mining function, mining algorithm, data mining, interpretation, performance.

6 Defining goal: domain and feature selection. Predictive: trend analysis, recommendation. Descriptive: describing patterns, human readable.

7 Feature selection (data selection): author, date, description, title, ingredients, units, tools, preparation, glassware.

8 Data cleaning: plausibility check, handling missing data, removing noise, implicit information, fuzzy logic.

9 Data reduction & projection: feature extraction, keyword normalization, database, keywords, tags, supporting data. Projection: {tag1, tag2, ..., tagn}.
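The projection onto a tag set could be sketched as follows. This is only an illustration under assumptions: the synonym table and the helper names `normalize_keyword` and `project_to_tags` are hypothetical and do not come from the slides.

```python
import re

# Hypothetical synonym table; the slides do not specify any mapping.
SYNONYMS = {"limes": "lime", "lime juice": "lime", "sugar syrup": "syrup"}


def normalize_keyword(keyword: str) -> str:
    """Lowercase, collapse whitespace and map known synonyms to one tag."""
    cleaned = re.sub(r"\s+", " ", keyword.strip().lower())
    return SYNONYMS.get(cleaned, cleaned)


def project_to_tags(keywords):
    """Project raw recipe keywords onto a set {tag1, tag2, ..., tagn}."""
    return frozenset(normalize_keyword(k) for k in keywords if k.strip())


tags = project_to_tags(["  Lime Juice", "limes", "Rum", "Sugar  Syrup", ""])
print(sorted(tags))
```

Using a set drops duplicates that differ only in spelling or spacing, which is one plausible reading of "keyword normalization" here.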

10 Distance function: Euclidean distance [JMF99]; dynamic time warping [HLWG08] (speech recognition); Levenshtein distance (string recognition).
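The three distance functions named on this slide can be written down in a few lines. The sketch below uses the textbook definitions, not code from the talk; the function names and the absolute-difference local cost in `dtw` are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.dist(a, b)


def levenshtein(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dtw(x, y):
    """Dynamic time warping cost between two numeric sequences."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed local cost
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


print(euclidean((0, 0), (3, 4)))
print(levenshtein("mojito", "mohito"))
print(dtw([1, 2, 3], [1, 2, 2, 3]))
```

Levenshtein distance suits misspelled ingredient names, while dynamic time warping compares sequences of different length, for example ordered preparation steps encoded as numbers.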

11 Choosing mining function [FKPT07, p. 512] [LW06, p. 6]: classification, regression, clustering [JMF99, p. 277].

12 Chemical model [MHC06, p. 80]

13 (Graph) representation [MHC06, p. 80]: trees, Bayesian network, neural network.

14 Choosing mining algorithm (classification): Bayes, support vector machines, neural networks, gradient boosting machines.

15 Data mining: classification. Training set, test set. Classification with SVM: feature vector -> {true, false}.

16 Interpretation: removing redundant patterns, translating for users, visualization.

17 Optimizing: training set, gradient descent technique, hill climbing, feature complexity, performance.
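Hill climbing, one of the optimization techniques named on this slide, can be sketched generically. The score function below is a made-up toy curve standing in for classifier performance as a function of the number of features; it is an assumption for illustration, as are all names.

```python
def hill_climb(score, start, neighbors, max_steps=1000):
    """Greedy local search: move to the best neighbor while the score improves."""
    current = start
    current_score = score(current)
    for _ in range(max_steps):
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= current_score:
            break  # local optimum reached
        current, current_score = best, score(best)
    return current, current_score


def toy_score(k):
    """Hypothetical performance curve with its peak at k = 7 features."""
    return -(k - 7) ** 2


def step_neighbors(k):
    """Neighbors differ by one feature, restricted to the range 1..20."""
    return [n for n in (k - 1, k + 1) if 1 <= n <= 20]


best_k, best_score = hill_climb(toy_score, 1, step_neighbors)
print(best_k, best_score)
```

Hill climbing only finds a local optimum, so on a score surface with several peaks the result depends on the starting point.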

18 Large-scale Twitter mining of drug-related adverse events [BTY12]. Social media: Facebook, Twitter. Example tweet: "Its hot weather, Tamoxifen is a nightmare". Mining: is it drug related? (hot weather, Tamoxifen) => nightmare.

19 Classification training. User training set: potential drug users, corresponding keywords in timeline. Tweet training set: medicine ontology, Unified Medical Language System.

20 Mining process to find users [BTY12, p. 27]

21 Feature extraction of tweets: bag of words, action, state of drug using, hash tags count, pronouns count, semantic types, reply tags, URLs, drug names, relevance, semantic groups, relevance in Unified Medical Language System.
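A few of the textual features listed here (bag of words, hash tag, reply tag, pronoun and URL counts) can be sketched with the standard library alone. The tokenizer, the pronoun list and the feature names are assumptions for illustration; the semantic features based on the Unified Medical Language System are not covered.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")
# Hypothetical, deliberately short pronoun list.
PRONOUNS = {"i", "me", "my", "we", "our", "you", "your"}


def tokenize(text):
    """Lowercase the text, drop URLs and split into words, #hashtags and @replies."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    return re.findall(r"[#@]?[a-z0-9']+", without_urls)


def tweet_features(text):
    """Return a bag of words plus simple count features for one tweet."""
    tokens = tokenize(text)
    return {
        "bag_of_words": Counter(tokens),
        "hashtag_count": sum(t.startswith("#") for t in tokens),
        "reply_tag_count": sum(t.startswith("@") for t in tokens),
        "pronoun_count": sum(t in PRONOUNS for t in tokens),
        "url_count": len(URL_PATTERN.findall(text)),
    }


features = tweet_features("Its hot weather, Tamoxifen is a nightmare")
print(features["bag_of_words"])
```

To feed such features into a classifier, the bag of words would still have to be mapped to a fixed-length vector over a shared vocabulary.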

22 Support vector machine: decision function g(x) = sign(f(x)); hyperplane <w, x> + b = 0 [LW06, p. 6] [BTY12, p. 30]
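For the linear case the decision function can be evaluated directly. The sketch assumes f(x) = <w, x> + b, which matches the hyperplane on the slide; the weights in the example are invented, and mapping f(x) = 0 to +1 is a convention chosen here.

```python
def f(w, x, b):
    """Signed score of x relative to the hyperplane <w, x> + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def g(w, x, b):
    """Decision function g(x) = sign(f(x)); points on the hyperplane map to +1."""
    return 1 if f(w, x, b) >= 0 else -1


w, b = (1.0, -1.0), 0.0  # hypothetical trained parameters
print(g(w, (2.0, 1.0), b))
print(g(w, (1.0, 3.0), b))
```

Training an SVM means choosing w and b so that the margin between the two classes is maximal; the sketch only covers prediction with given parameters.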

23 Recipe recommendation [TLA12]. Preparation: dishes, removed quantities, removed temperature. Ingredient combination: key ingredients, modification options, nutrition. Classification: gradient boosting machine [Fri01].

24 Co-occurrence network: user preferences (sweet or savory) [TLA12, p. 301]; pointwise mutual information.
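Pointwise mutual information for ingredient pairs can be computed from co-occurrence counts. The sketch uses the standard definition PMI(a, b) = log(p(a, b) / (p(a) p(b))), with probabilities estimated as the fraction of recipes containing the ingredient or pair; the toy recipes and the function name are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(recipes):
    """Map each co-occurring ingredient pair (a, b), a < b, to its PMI."""
    n = len(recipes)
    single = Counter()
    pair = Counter()
    for recipe in recipes:
        items = sorted(set(recipe))
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        (a, b): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


# Hypothetical toy data, not taken from [TLA12].
recipes = [
    {"rum", "lime", "mint"},
    {"rum", "lime", "sugar"},
    {"gin", "lime"},
    {"gin", "tonic"},
]
table = pmi_table(recipes)
print(table[("gin", "tonic")])
print(table[("gin", "lime")])
```

A positive value means two ingredients appear together more often than chance would suggest, so such pairs are natural candidates for edges in a co-occurrence network.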

25 Substitution network based on modification options [TLA12, p. 302]

26 Prediction performance [TLA12, p. 307]

27 Roadmap, project 1: distance function; testing principles; classification (libsvm, svmlight, R, Julia).

28 Sources
[BTY12] Bian, Jiang ; Topaloglu, Umit ; Yu, Fan: Towards Large-scale Twitter Mining for Drug-related Adverse Events. In: Proceedings of the 2012 International Workshop on Smart Health and Wellbeing. New York, NY, USA : ACM, 2012 (SHB '12)
[FKPT07] Fahrmeir, Ludwig ; Künstler, Rita ; Pigeot, Iris ; Tutz, Gerhard: Statistik. Springer-Verlag Berlin Heidelberg, 2007 (Springer-Lehrbuch)
[FPSS96] Fayyad, Usama ; Piatetsky-Shapiro, Gregory ; Smyth, Padhraic: The KDD Process for Extracting Useful Knowledge from Volumes of Data. In: Commun. ACM 39 (1996), November, No. 11
[Fri01] Friedman, Jerome H.: Greedy function approximation: a gradient boosting machine. In: Annals of Statistics (2001)
[JMF99] Jain, A. K. ; Murty, M. N. ; Flynn, P. J.: Data Clustering: A Review. In: ACM Comput. Surv. 31 (1999), September, No. 3
[LW06] Lovell, Brian C. ; Walder, Christian J.: Support vector machines for business applications. (2006)
[MHC06] Maulik, U. ; Holder, L. B. ; Cook, D. J.: Advanced Methods for Knowledge Discovery from Complex Data. Springer, 2006 (Advanced Information and Knowledge Processing)
[TLA12] Teng, Chun-Yuen ; Lin, Yu-Ru ; Adamic, Lada A.: Recipe Recommendation Using Ingredient Networks. In: Proceedings of the 3rd Annual ACM Web Science Conference. New York, NY, USA : ACM, 2012 (WebSci '12)

Recommendations for cocktail recipes. Sigurd Sippel

Recommendations for cocktail recipes. Sigurd Sippel Recommendations for cocktail recipes Sigurd Sippel Goal recommendation by example Leading question Application: Recommendation for cocktails in bar Have a distance function a sufficient precision for recommendation?

More information

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database.

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database. Volume 6, Issue 5, May 016 ISSN: 77 18X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Fuzzy Logic in Online

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Using Graphs to Improve Activity Prediction in Smart Environments based on Motion Sensor Data

Using Graphs to Improve Activity Prediction in Smart Environments based on Motion Sensor Data Using Graphs to Improve Activity Prediction in Smart Environments based on Motion Sensor Data S. Seth Long and Lawrence B. Holder Washington State University Abstract. Activity Recognition in Smart Environments

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 4, April 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Discovering Knowledge

More information

9. Conclusions. 9.1 Definition KDD

9. Conclusions. 9.1 Definition KDD 9. Conclusions Contents of this Chapter 9.1 Course review 9.2 State-of-the-art in KDD 9.3 KDD challenges SFU, CMPT 740, 03-3, Martin Ester 419 9.1 Definition KDD [Fayyad, Piatetsky-Shapiro & Smyth 96]

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Query Disambiguation from Web Search Logs

Query Disambiguation from Web Search Logs Vol.133 (Information Technology and Computer Science 2016), pp.90-94 http://dx.doi.org/10.14257/astl.2016. Query Disambiguation from Web Search Logs Christian Højgaard 1, Joachim Sejr 2, and Yun-Gyung

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

Data Mining: An experimental approach with WEKA on UCI Dataset

Data Mining: An experimental approach with WEKA on UCI Dataset Data Mining: An experimental approach with WEKA on UCI Dataset Ajay Kumar Dept. of computer science Shivaji College University of Delhi, India Indranath Chatterjee Dept. of computer science Faculty of

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data Jesse Read 1, Albert Bifet 2, Bernhard Pfahringer 2, Geoff Holmes 2 1 Department of Signal Theory and Communications Universidad

More information

A New Approach To Graph Based Object Classification On Images

A New Approach To Graph Based Object Classification On Images A New Approach To Graph Based Object Classification On Images Sandhya S Krishnan,Kavitha V K P.G Scholar, Dept of CSE, BMCE, Kollam, Kerala, India Sandhya4parvathy@gmail.com Abstract: The main idea of

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Feature weighting classification algorithm in the application of text data processing research

Feature weighting classification algorithm in the application of text data processing research , pp.41-47 http://dx.doi.org/10.14257/astl.2016.134.07 Feature weighting classification algorithm in the application of text data research Zhou Chengyi University of Science and Technology Liaoning, Anshan,

More information

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44 Data Mining Piotr Paszek piotr.paszek@us.edu.pl Introduction (Piotr Paszek) Data Mining DM KDD 1 / 44 Plan of the lecture 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software

More information

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER P.Radhabai Mrs.M.Priya Packialatha Dr.G.Geetha PG Student Assistant Professor Professor Dept of Computer Science and Engg Dept

More information

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei

Data Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A Multi-Dimensional

More information

A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

Data mining fundamentals

Data mining fundamentals Data mining fundamentals Elena Baralis Politecnico di Torino Data analysis Most companies own huge bases containing operational textual documents experiment results These bases are a potential source of

More information

Summary. Machine Learning: Introduction. Marcin Sydow

Summary. Machine Learning: Introduction. Marcin Sydow Outline of this Lecture Data Motivation for Data Mining and Learning Idea of Learning Decision Table: Cases and Attributes Supervised and Unsupervised Learning Classication and Regression Examples Data:

More information

Differential Privacy. CPSC 457/557, Fall 13 10/31/13 Hushiyang Liu

Differential Privacy. CPSC 457/557, Fall 13 10/31/13 Hushiyang Liu Differential Privacy CPSC 457/557, Fall 13 10/31/13 Hushiyang Liu Era of big data Motivation: Utility vs. Privacy large-size database automatized data analysis Utility "analyze and extract knowledge from

More information

Towards Semantic Data Mining

Towards Semantic Data Mining Towards Semantic Data Mining Haishan Liu Department of Computer and Information Science, University of Oregon, Eugene, OR, 97401, USA ahoyleo@cs.uoregon.edu Abstract. Incorporating domain knowledge is

More information

New Orleans, Louisiana, February/March Knowledge Discovery from Telecommunication. Network Alarm Databases. K. Hatonen M. Klemettinen H.

New Orleans, Louisiana, February/March Knowledge Discovery from Telecommunication. Network Alarm Databases. K. Hatonen M. Klemettinen H. To appear in the 12th International Conference on Data Engineering (ICDE'96), New Orleans, Louisiana, February/March 1996. Knowledge Discovery from Telecommunication Network Alarm Databases K. Hatonen

More information

Search Computing: Business Areas, Research and Socio-Economic Challenges

Search Computing: Business Areas, Research and Socio-Economic Challenges Search Computing: Business Areas, Research and Socio-Economic Challenges Yiannis Kompatsiaris, Spiros Nikolopoulos, CERTH--ITI NEM SUMMIT Torino-Italy, 28th September 2011 Media Search Cluster Search Computing

More information

INTRODUCTION TO DATA MINING

INTRODUCTION TO DATA MINING INTRODUCTION TO DATA MINING 1 Chiara Renso KDDLab - ISTI CNR, Italy http://www-kdd.isti.cnr.it email: chiara.renso@isti.cnr.it Knowledge Discovery and Data Mining Laboratory, ISTI National Research Council,

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

arxiv: v1 [cs.db] 7 Dec 2011

arxiv: v1 [cs.db] 7 Dec 2011 Using Taxonomies to Facilitate the Analysis of the Association Rules Marcos Aurélio Domingues 1 and Solange Oliveira Rezende 2 arxiv:1112.1734v1 [cs.db] 7 Dec 2011 1 LIACC-NIAAD Universidade do Porto Rua

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio

Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio Adela Ioana Tudor, Adela Bâra, Simona Vasilica Oprea Department of Economic Informatics

More information

Oleksandr Kuzomin, Bohdan Tkachenko

Oleksandr Kuzomin, Bohdan Tkachenko International Journal "Information Technologies Knowledge" Volume 9, Number 2, 2015 131 INTELLECTUAL SEARCH ENGINE OF ADEQUATE INFORMATION IN INTERNET FOR CREATING DATABASES AND KNOWLEDGE BASES Oleksandr

More information

DATA MINING OF NS-2 TRACE FILE

DATA MINING OF NS-2 TRACE FILE International Journal of Wireless & Mobile Networks (IJWMN) Vol. 6, No. 5, October 214 DATA MINING OF NS-2 TRACE FILE ABSTRACT Ahmed Jawad Kadhim IRAQ- Ministry of Education- General Directorate of Education

More information

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things Koji Maruhashi An important problem in information and communications technology (ICT) is classifying

More information

Practical Guidance for Machine Learning Applications

Practical Guidance for Machine Learning Applications Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

More information

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA) International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 12 No. 1 Nov. 2014, pp. 217-222 2014 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/

More information

SEMANTIC ENHANCED UDDI USING OWL-S PROFILE ONTOLOGY FOR THE AUTOMATIC DISCOVERY OF WEB SERVICES IN THE DOMAIN OF TELECOMMUNICATION

SEMANTIC ENHANCED UDDI USING OWL-S PROFILE ONTOLOGY FOR THE AUTOMATIC DISCOVERY OF WEB SERVICES IN THE DOMAIN OF TELECOMMUNICATION Journal of Computer Science 10 (8): 1418-1422, 2014 ISSN: 1549-3636 2014 doi:10.3844/jcssp.2014.1418.1422 Published Online 10 (8) 2014 (http://www.thescipub.com/jcs.toc) SEMANTIC ENHANCED UDDI USING OWL-S

More information

C00-2. 資料探勘簡介 Data Mining 吳漢銘國立臺北大學統計學系.

C00-2. 資料探勘簡介 Data Mining 吳漢銘國立臺北大學統計學系. C00-2 資料探勘簡介 Data Mining 吳漢銘國立臺北大學統計學系 為什麼要使用 R 做為資料探勘工具? 2/30 Why R? R is a high-quality, cross-platform, flexible, widely used open source, free language for statistics, graphics, mathematics, and data

More information

Data Mining in Bioinformatics: Study & Survey

Data Mining in Bioinformatics: Study & Survey Data Mining in Bioinformatics: Study & Survey Saliha V S St. Joseph s college Irinjalakuda Abstract--Large amounts of data are generated in medical research. A biological database consists of a collection

More information

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.

Mining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors

More information

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that

More information

Reliable Data Mining Tasks and Techniques for Industrial Applications

Reliable Data Mining Tasks and Techniques for Industrial Applications Reliable Data Mining Tasks and Techniques for Industrial Applications G.Sabarmathi 1, Dr.R.Chinnaiyan 2 1 Department of Computer Science, Christ Academy Institute for Advance Studies, Bangalore University,

More information

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem Additive Regression Applied to a Large-Scale Collaborative Filtering Problem Eibe Frank 1 and Mark Hall 2 1 Department of Computer Science, University of Waikato, Hamilton, New Zealand eibe@cs.waikato.ac.nz

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Linking Entities in Chinese Queries to Knowledge Graph

Linking Entities in Chinese Queries to Knowledge Graph Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn

More information

Data Science Course Content

Data Science Course Content CHAPTER 1: INTRODUCTION TO DATA SCIENCE Data Science Course Content What is the need for Data Scientists Data Science Foundation Business Intelligence Data Analysis Data Mining Machine Learning Difference

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

KBSVM: KMeans-based SVM for Business Intelligence

KBSVM: KMeans-based SVM for Business Intelligence Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2004 Proceedings Americas Conference on Information Systems (AMCIS) December 2004 KBSVM: KMeans-based SVM for Business Intelligence

More information

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining

ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, Comparative Study of Classification Algorithms Using Data Mining ANALYSIS COMPUTER SCIENCE Discovery Science, Volume 9, Number 20, April 3, 2014 ISSN 2278 5485 EISSN 2278 5477 discovery Science Comparative Study of Classification Algorithms Using Data Mining Akhila

More information

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,

More information

PARALLEL CLASSIFICATION ALGORITHMS

PARALLEL CLASSIFICATION ALGORITHMS PARALLEL CLASSIFICATION ALGORITHMS By: Faiz Quraishi Riti Sharma 9 th May, 2013 OVERVIEW Introduction Types of Classification Linear Classification Support Vector Machines Parallel SVM Approach Decision

More information

Study and Analysis of Recommendation Systems for Location Based Social Network (LBSN)

Study and Analysis of Recommendation Systems for Location Based Social Network (LBSN) , pp.421-426 http://dx.doi.org/10.14257/astl.2017.147.60 Study and Analysis of Recommendation Systems for Location Based Social Network (LBSN) N. Ganesh 1, K. SaiShirini 1, Ch. AlekhyaSri 1 and Venkata

More information

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR P.SHENBAGAVALLI M.E., Research Scholar, Assistant professor/cse MPNMJ Engineering college Sspshenba2@gmail.com J.SARAVANAKUMAR B.Tech(IT)., PG

More information

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

Data Mining Technology Based on Bayesian Network Structure Applied in Learning , pp.67-71 http://dx.doi.org/10.14257/astl.2016.137.12 Data Mining Technology Based on Bayesian Network Structure Applied in Learning Chunhua Wang, Dong Han College of Information Engineering, Huanghuai

More information

Iris recognition using SVM and BP algorithms

Iris recognition using SVM and BP algorithms International Journal of Engineering Research and Advanced Technology (IJERAT) DOI: http://dx.doi.org/10.31695/ijerat.2018.3262 E-ISSN : 2454-6135 Volume.4, Issue 5 May -2018 Iris recognition using SVM

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Classification by Support Vector Machines

Classification by Support Vector Machines Classification by Support Vector Machines Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Practical DNA Microarray Analysis 2003 1 Overview I II III

More information

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set Renu Vashist School of Computer Science and Engineering Shri Mata Vaishno Devi University, Katra,

More information

Sentiment Web Mining Architecture - Shahriar Movafaghi

Sentiment Web Mining Architecture - Shahriar Movafaghi Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 26 (2011) 191 197 COINs 2010 Sentiment Web Mining Architecture - Shahriar Movafaghi Shahria Movafaghi a, Jack Bullock

More information

Data Mining: STATISTICA

Data Mining: STATISTICA Outline Data Mining: STATISTICA Prepare the data Classification and regression (C & R, ANN) Clustering Association rules Graphic user interface Prepare the Data Statistica can read from Excel,.txt and

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University
