Applicability of Process Mining Techniques in Business Environments

1 Applicability of Process Mining Techniques in Business Environments
Annual Meeting, IEEE Task Force on Process Mining
Andrea Burattin (andreaburattin)
September 8, 2014

2 Brief Curriculum Vitæ
- 2009: M.Sc. in Computer Science (A.I. program), University of Padova
- Ph.D., supervisor Prof. Alessandro Sperduti, joint school of the University of Bologna and the University of Padova; thesis defended in April
- Postdoc: Prompt project (prompt.processmining.it), University of Padova
[Image: Specola, Padova.]

3 Ph.D. Inception
Ph.D. background
- Inception during the M.Sc. thesis
- Companies: study on process mining
- A company (Siav S.p.A.) funded my Ph.D.
- Aim: investigate the applicability of process mining techniques in business scenarios
- Interaction with companies: interesting! (but sometimes...)
Outcome
- Applicability of Process Mining Techniques in Business Environments

4 Quick Recap of Process Mining
[Figure: process mining framework. An operational model is implemented in an information system that controls the operational environment; the system creates event logs; process mining relates event logs to an analytical model through discovery, conformance, and extension.]
Source: C. Günther, Process mining in Flexible Environments. PhD thesis, TU/e, Eindhoven.

10 Theoretical vs. Industrial-related Open Problems
Some open problems from the literature
- Duplicate tasks
- Exploiting all data available
- Holistic mining: different perspectives from different sources
- Noise and incompleteness
Open problems from case studies
- Using process mining tools and configuring algorithms
- Results interpretation
- Readable results
- Computational power and storage capacity required
The two sets do not overlap.

11 Possible Industry Scenarios
Four possible industry scenarios, along two dimensions
- Process-aware vs. process-unaware companies
- Process-aware vs. process-unaware software
[Figure: 2x2 grid. Columns, left to right: Process Unaware Information Systems, Process Aware Information Systems. Row "Process Aware Companies": Company 4, Company 3. Row "Process Unaware Companies": Company 1, Company 2.]

12 Thesis Structure and Organization
[Figure: thesis structure diagram with the blocks Data Preparation, Process Mining Capable Event Logs, Process Mining Capable Event Stream, Control-flow Mining, Stream Control-flow Mining, Process Extension, Process Representation, Results Evaluation, and Model Evaluation.]

13 Overview: Data Preparation
[Figure: thesis structure diagram, as on slide 12.]

16 Problems with Data Preparation
Problems arise at different levels of complexity and abstraction. Examples:
- Adaptation of existing data (a syntax problem, easy)
- Introduction of new information (difficult)
Typical set of required fields: (case-id; activity; timestamp; [process-name]; [originator])
Our context: the company is process aware; the information system is process unaware
Structure of the available log: (activity; timestamp; originator; info_1; ...; info_n)

18 Problems with Data Preparation (cont.)
Case-id from the info_i fields
- Candidate case-id fields: a-priori knowledge
- Event chains: string similarity functions
- Selection of the maximal chain: most activities or simplest chain
Process name is not a problem
- All events belong to the same process
Example log:
Act.   info_1   info_2
a_1    AB-01    BB-01
a_2    AA-02    AB-01
a_3    AB-01    BB-02
a_4    AB-01    BB-03
a_1    AA-03    BB-04
a_5    AA-03    BB-05
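The chain idea can be illustrated on the example table with a minimal sketch. It assumes that a chain is the set of events sharing the same value in any candidate field, and that "maximal" means covering the most distinct activities. Exact string equality stands in for the string similarity functions mentioned on the slide, and the helper names `build_chains` and `maximal_chain` are hypothetical.

```python
from collections import defaultdict

# Rows of the example table: (activity, info_1, info_2).
events = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]

def build_chains(rows):
    """Group activities by every value found in a candidate field.

    Assumption: exact equality replaces the string similarity function.
    """
    chains = defaultdict(list)
    for activity, *infos in rows:
        for value in set(infos):  # count an event once per distinct value
            chains[value].append(activity)
    return dict(chains)

def maximal_chain(chains):
    """Pick the chain with the most distinct activities (ties: most events)."""
    return max(chains.items(), key=lambda kv: (len(set(kv[1])), len(kv[1])))

chains = build_chains(events)
best_value, best_chain = maximal_chain(chains)
print(best_value, best_chain)
```

On this table the value AB-01 links a1, a2, a3 and a4 across both fields, so it is the natural case-id candidate under these assumptions.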

19 Overview: Control-flow Mining
[Figure: thesis structure diagram, as on slide 12.]

21 Exploiting Data Available
- Events with a duration instead of instantaneous events
- Generalization of Heuristics Miner to exploit this new information
[Figure: a main activity spanning from Start to End on a timeline, composed of sub-activities 1 to n.]
[Figure: a process over activities A, B, C, D shown twice on a timeline: once with events as time intervals, once with instantaneous events.]
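A small sketch of why intervals carry more information than instantaneous events: when two events overlap in time they can be treated as concurrent, while non-overlapping events give direct successions. This is only an illustration under assumed definitions (the functions `overlaps`, `parallel_pairs` and `direct_successions` and the trace values are hypothetical), not the generalized Heuristics Miner itself.

```python
# One trace of interval events: (activity, start, end). Values are illustrative.
trace = [("A", 0, 2), ("B", 3, 6), ("C", 4, 7), ("D", 8, 9)]

def overlaps(e1, e2):
    """True when the two intervals share some time."""
    return e1[1] < e2[2] and e2[1] < e1[2]

def parallel_pairs(trace):
    """Unordered pairs of activities observed overlapping in time."""
    return {
        frozenset((e1[0], e2[0]))
        for i, e1 in enumerate(trace)
        for e2 in trace[i + 1:]
        if overlaps(e1, e2)
    }

def direct_successions(trace):
    """Pairs (x, y) where y starts after x ends and no third event
    lies entirely between the end of x and the start of y."""
    pairs = set()
    for e1 in trace:
        for e2 in trace:
            if e1 is e2 or e2[1] < e1[2]:
                continue  # same event, or e2 starts before e1 has ended
            blocked = any(
                e3 is not e1 and e3 is not e2
                and e3[1] >= e1[2] and e3[2] <= e2[1]
                for e3 in trace
            )
            if not blocked:
                pairs.add((e1[0], e2[0]))
    return pairs

print(parallel_pairs(trace))
print(direct_successions(trace))
```

With instantaneous events the same trace would look like a plain sequence A, B, C, D; with intervals, B and C are seen to run concurrently between A and D.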

24 Not-expert Users
Our users: not experts in process mining, but with notions of BPM
Observations
- Process mining algorithms require configuration
- Typically, algorithm configurations are thresholds on measures
- The log being mined is finite, so only a finite number of configurations is possible
- We are able to discretize the parameter values
[Figure: parameters τ_1, ..., τ_4 with unknown values, next to alternative candidate models over activities A to F.]
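The discretization argument can be sketched in a few lines: if a parameter is a threshold on a measure, and the finite log yields only finitely many measure values, then the distinct observed values are the only thresholds worth trying. The measure values and the function names below are made up for illustration.

```python
def candidate_thresholds(measure_values):
    """Distinct observed values, in increasing order.

    Any threshold strictly between two consecutive values selects the same
    edges as the larger of the two, so it yields no new model.
    """
    return sorted(set(measure_values))

def edges_kept(measures, threshold):
    """Edges whose measure reaches the threshold."""
    return {edge for edge, value in measures.items() if value >= threshold}

# Hypothetical dependency measures computed from a log.
measures = {("A", "B"): 0.9, ("B", "C"): 0.5, ("A", "C"): 0.5, ("C", "D"): 0.75}

thresholds = candidate_thresholds(measures.values())
models = {t: edges_kept(measures, t) for t in thresholds}
for t, edges in models.items():
    print(t, sorted(edges))
```

Here four edges give only three distinct thresholds, hence at most three different models to show to the user instead of a continuous parameter range.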

26 Model Selection Approaches
User-guided approach
- Hierarchical clustering of models
- Average linkage
- Any model-to-model metric
- Navigation of the dendrogram
Automatic approach
- Hill climbing with a maximum number of plateau steps
- Random restarts (local optimum)
- $h_{MDL} = \arg\min_{h \in H} \, L(h) + L(D \mid h)$
MDL encodings
- MDL by Calders et al.
- Simplified heuristics
[Figure: dendrogram over the mined process models (Process 1 to Process 10).]
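A minimal sketch of the automatic approach, assuming the discretized configurations are indexed 0 to n-1 and that the neighbours of a configuration are the adjacent indices. The cost function `toy_description_length` is a hypothetical stand-in for L(h) + L(D | h); it is not one of the MDL encodings named on the slide.

```python
import random

def hill_climb(cost, n_values, restarts=5, max_plateau=3, seed=0):
    """Minimize cost over indices 0..n_values-1 with random restarts.

    Moves to the best neighbour while it improves; equal-cost moves are
    allowed for at most max_plateau consecutive steps.
    """
    rng = random.Random(seed)
    best_idx, best_cost = None, float("inf")
    for _ in range(restarts):
        idx = rng.randrange(n_values)
        current, plateau = cost(idx), 0
        while True:
            neighbours = [j for j in (idx - 1, idx + 1) if 0 <= j < n_values]
            if not neighbours:
                break
            nxt = min(neighbours, key=cost)
            if cost(nxt) < current:
                idx, current, plateau = nxt, cost(nxt), 0
            elif cost(nxt) == current and plateau < max_plateau:
                idx, plateau = nxt, plateau + 1
            else:
                break  # local optimum, or plateau budget exhausted
        if current < best_cost:
            best_idx, best_cost = idx, current
    return best_idx, best_cost

def toy_description_length(i):
    """Hypothetical L(h) + L(D|h): model cost grows with i, data cost shrinks."""
    return i + 100 / (i + 1)

best, length = hill_climb(toy_description_length, n_values=20)
print(best, length)
```

Because this toy cost has a single minimum, every restart reaches it; restarts matter when the real description length has several local optima.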

27 Overview: Results Evaluation
[Figure: thesis structure diagram, as on slide 12.]

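29 Evaluation Metrics
Model-to-model metric
- A complex process is decomposed into permitted relations and forbidden relations
- Generation rules (based on the Alpha algorithm):
  - $A \rightarrow B$ gives $A > B$, $B \ngtr A$
  - $A \parallel B$ gives $A > B$, $B > A$
  - $A \,\#\, B$ gives $A \ngtr B$, $B \ngtr A$
- Comparison as Jaccard similarity on the two sets ($>$ and $\ngtr$)
Model-to-log metric
- Healthiness measures for a Declare constraint $\pi$ and a trace $\sigma$:
  - Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$
  - Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$
  - Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$
  - Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$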
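Both metrics are simple enough to sketch. The model encoding as `(kind, a, b)` triples, the weight `alpha` used to combine the two Jaccard scores, and the reading of n, n_a, n_v, n_f, n_c as event and activation counts are assumptions made for this illustration; the slide only states that the comparison uses Jaccard similarity on the two sets.

```python
def jaccard(x, y):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

def relations(constructs):
    """Turn (kind, a, b) constructs into permitted and forbidden relations."""
    permitted, forbidden = set(), set()
    for kind, a, b in constructs:
        if kind == "seq":      # a -> b: a > b permitted, b > a forbidden
            permitted.add((a, b))
            forbidden.add((b, a))
        elif kind == "par":    # a || b: both orders permitted
            permitted.update({(a, b), (b, a)})
        elif kind == "excl":   # a # b: both orders forbidden
            forbidden.update({(a, b), (b, a)})
    return permitted, forbidden

def model_similarity(m1, m2, alpha=0.5):
    """Weighted mean of the two Jaccard scores (alpha is an assumption)."""
    p1, f1 = relations(m1)
    p2, f2 = relations(m2)
    return alpha * jaccard(p1, p2) + (1 - alpha) * jaccard(f1, f2)

def healthiness(n, n_a, n_v, n_f, n_c):
    """Healthiness measures; ratios are 0.0 when nothing is activated."""
    def ratio(k):
        return k / n_a if n_a else 0.0
    return {
        "activation_sparsity": 1 - n_a / n,
        "fulfillment_ratio": ratio(n_f),
        "violation_ratio": ratio(n_v),
        "conflict_ratio": ratio(n_c),
    }

m1 = [("seq", "A", "B"), ("par", "B", "C")]
m2 = [("seq", "A", "B"), ("excl", "B", "C")]
similarity = model_similarity(m1, m2)
health = healthiness(n=10, n_a=4, n_v=1, n_f=2, n_c=1)
print(similarity, health)
```

The two example models agree on the sequence A, B but disagree on whether B and C are parallel or exclusive, which lowers both Jaccard scores.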

30 Overview: Process Extension
[Figure: thesis structure diagram, as on slide 12.]

32 Multiperspective Mining
Given
- A log with information on originators
- A process model
Assumption
- Roles are characterized by a consistent set of originators
We add roles to the model
1. Dependencies as handover of roles
2. Remove dependencies below threshold; the connected components are candidate roles
3. Merge candidate roles if the similarity of their user sets is above threshold
Entropy-based metric to tune the thresholds
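Steps 2 and 3 can be sketched as a graph computation. The edge scores below are hypothetical (the actual handover measure is not given on the slide), the sketch assumes that a higher score means the two activities stay within the same role, and Jaccard similarity on user sets is a stand-in for the similarity used in the merge step.

```python
def candidate_roles(activities, scored_edges, threshold):
    """Drop edges scoring below threshold; return connected components."""
    parent = {a: a for a in activities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scored_edges.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for a in activities:
        groups.setdefault(find(a), set()).add(a)
    return list(groups.values())

def jaccard(x, y):
    return len(x & y) / len(x | y) if (x or y) else 1.0

def merge_roles(roles, users_of, threshold):
    """Repeatedly merge two candidate roles with similar user sets."""
    roles = [set(r) for r in roles]

    def users(role):
        return set().union(*(users_of[a] for a in role))

    merged = True
    while merged:
        merged = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                if jaccard(users(roles[i]), users(roles[j])) >= threshold:
                    roles[i] |= roles.pop(j)
                    merged = True
                    break
            if merged:
                break
    return roles

# Hypothetical inputs: dependency scores and originators per activity.
activities = ["A", "B", "C", "D", "E"]
scores = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.8, ("D", "E"): 0.1}
users_of = {"A": {"ann", "bob"}, "B": {"ann"}, "C": {"carl"},
            "D": {"carl", "dan"}, "E": {"ann", "bob"}}

candidates = candidate_roles(activities, scores, threshold=0.5)
roles = merge_roles(candidates, users_of, threshold=0.6)
print(roles)
```

In this toy input, activity E is disconnected from A and B after thresholding, but it is performed by the same users, so the merge step puts it back in their role.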

33 Overview: Stream Control-flow Mining
[Figure: thesis structure diagram, as on slide 12.]

36 Stream Context
Stream mining peculiarities
- Cannot store the entire stream: approximation
- Backtracking not feasible: one pass over the data
- Variable system conditions (e.g., fluctuating stream rates)
- Adapt the model to new data: concept drifts
Completely new problems!
Principle
- Recent observations are more important than older ones
Three versions of Heuristics Miner
- Based on Sliding Window
- Based on Lossy Counting
- Based on Budget Lossy Counting
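For readers unfamiliar with Lossy Counting, here is a minimal sketch of the standard algorithm (Manku and Motwani) applied to a stream of items such as directly-follows pairs. It shows the one-pass, bounded-memory idea only; how the streaming Heuristics Miner uses the counts is not shown, and the class name `LossyCounter` is hypothetical.

```python
import math

class LossyCounter:
    """Approximate frequency counts over a stream with error at most epsilon * N."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        # At each bucket boundary, forget items that are too infrequent.
        if self.n % self.width == 0:
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def counts(self):
        return {k: v[0] for k, v in self.entries.items()}

# Illustrative stream of directly-follows pairs.
stream = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "B"), ("C", "D"), ("A", "B")]
counter = LossyCounter(epsilon=0.5)
for pair in stream:
    counter.add(pair)
print(counter.counts())
```

Rare pairs are dropped at bucket boundaries, so memory stays bounded while frequent relations survive, which matches the principle that the model must be approximated in a single pass.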

37 Overview
[Figure: thesis structure diagram, as on slide 12.]

39 Extra: Processes and Logs Generator
- Companies are reluctant to share their data
- Researchers need to run tests (there were no BPI challenges at that time)
Processes and Logs Generator
- A stochastic context-free grammar generates random processes
- Rules to simulate a process and produce an event log
- The reference model is used to evaluate control-flow mining algorithms
[Figure: grammar productions (a process starts with a_start and ends with a_end, with sub-processes composed in sequence and in split/join patterns) and an example generated process.]
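The generator idea can be sketched with a toy stochastic grammar. The productions, their probabilities, and the names `generate` and `simulate` are assumptions for illustration and do not reproduce the grammar on the slide; only the a_start and a_end delimiters are taken from it.

```python
import random

def generate(rng, depth=0, max_depth=3, counter=None):
    """Randomly expand the non-terminal G into a nested tuple.

    Hypothetical productions: G -> activity | (G ; G) | (G AND G) | (G XOR G).
    """
    counter = counter if counter is not None else [0]
    r = rng.random()
    if depth >= max_depth or r < 0.4:
        counter[0] += 1
        return ("act", f"t{counter[0]}")
    op = "seq" if r < 0.7 else ("and" if r < 0.85 else "xor")
    return (op,
            generate(rng, depth + 1, max_depth, counter),
            generate(rng, depth + 1, max_depth, counter))

def simulate(tree, rng):
    """Play out one execution of the process and return its activities."""
    kind = tree[0]
    if kind == "act":
        return [tree[1]]
    if kind == "xor":  # exactly one branch is executed
        return simulate(rng.choice(tree[1:]), rng)
    left, right = simulate(tree[1], rng), simulate(tree[2], rng)
    if kind == "seq":
        return left + right
    merged = []        # "and": random interleaving of the two branches
    while left or right:
        source = left if (left and (not right or rng.random() < 0.5)) else right
        merged.append(source.pop(0))
    return merged

rng = random.Random(42)
process = generate(rng)
log = [["a_start"] + simulate(process, rng) + ["a_end"] for _ in range(5)]
for trace_events in log:
    print(trace_events)
```

Since the process that produced the log is known, a model mined from `log` can be compared against `process`, which is the role of the reference model in the last bullet.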

40 Detailed Map of Performed Activities
[Figure: map of the activities performed, with the blocks: Legacy, Process-unaware Information Systems; Data Preparation; Process Mining Capable Event Logs; Process Mining Capable Event Stream; Control-flow Mining Algorithm Exploiting More Data; User-guided Discovery Algorithm Configuration; Automatic Algorithm Configuration; Event Logs Generator; Stream Control-flow Mining Framework; Process Representation (e.g. Dependency Graph, Petri Net); Extension of Process Models with Organizational Roles; Random Process Generator; Model-to-model Metric; Model-to-log Metric; Model Evaluation (wrt Log / Original Model).]

41 Thanks!
Doing the Ph.D. has been amazing! A huge "Thank you!" to
- My supervisor, Alessandro Sperduti
- Siav S.p.A. and Roberto Pinelli
- My internal examiners: Tullio Vardanega, Paolo Baldan
- My external examiners: Barbara Weber, Diogo Ferreira
- All the process mining community!


More information
