Next Steps in Data Mining
Decision Support Systems (Sistemas de Apoio à Decisão)
Cláudia Antunes
Temporal Data Mining Cláudia Antunes
Data Mining
Knowledge Discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. [Frawley, KDD 1995]
Process: Data sources → Preprocessing → Data Mining → Evaluation → Info
Data Mining: Open Issues
Process: Data sources → Preprocessing → Data Mining → Evaluation → Info
Open issues: new visualization forms; privacy issues
Data Sources
Time series; bio sequences; social nets; data streams
Mining Data Streams
Streams are a continuous, online process. For example, how can network packets be monitored for intruders? How should concept drift and environment drift be handled? Typical sources include RFID network and sensor network data.
Requirements: small constant time per record; fixed amount of memory; at most one scan of the data; a model that is always available; a model that is kept up to date.
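The stream requirements above (constant time per record, fixed memory, a single scan, a model that is always available) can be illustrated with a minimal sketch. It uses Welford's running mean and variance as a stand-in "model"; the class name `RunningStats` and the sample values are illustrative assumptions, not part of the original slides.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method) in O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Constant work per record; the record itself is never stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance, readable at any point in the stream.
        return self._m2 / self.n if self.n else 0.0


# Hypothetical stream of readings, consumed exactly once.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The summary can be queried after any record, which is one simple way to satisfy "model always available" and "model up to date"; handling concept drift would need more than this, for example a decay factor or a sliding window.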
Mining Networks
Community and social networks: linked data between emails, Web pages, blogs, citations, sequences and people; static and dynamic structural behavior.
Mining in and for computer networks: detect anomalies (e.g., sudden traffic spikes due to a DoS (Denial of Service) attack). Need to handle 10 Gigabit Ethernet links: (a) detect, (b) trace back, (c) drop packet.
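As a rough illustration of the "detect" step for sudden traffic spikes, the sketch below flags intervals whose packet count far exceeds an exponentially weighted moving average of past traffic. The function name `detect_spikes`, the parameters `alpha`, `factor` and `warmup`, and the traffic numbers are all assumptions made for this example; a detector for real 10 Gigabit links would look quite different.

```python
def detect_spikes(counts, alpha=0.2, factor=3.0, warmup=3):
    """Return indices whose count exceeds `factor` times the running baseline."""
    spikes = []
    baseline = None
    for i, count in enumerate(counts):
        if baseline is None:
            baseline = float(count)  # first interval seeds the baseline
            continue
        if i >= warmup and count > factor * baseline:
            spikes.append(i)
            # Skip the update so the spike does not become the new "normal".
            continue
        # Exponentially weighted moving average of normal traffic.
        baseline = alpha * count + (1 - alpha) * baseline
    return spikes


# Hypothetical packets-per-interval counts with a burst at positions 5 and 6.
traffic = [100, 110, 95, 105, 100, 900, 950, 102, 98]
spike_indices = detect_spikes(traffic)
```

Only one number is kept between intervals, so the per-record cost and memory stay constant regardless of link speed.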
Sequential and Time Series Data
How to efficiently and accurately cluster, classify and predict the trends?
Time series data used for predictions are contaminated by noise. How to do accurate short-term and long-term predictions?
Signal processing techniques introduce lags in the filtered data, which reduces accuracy.
Key factors: source selection, domain knowledge in rules, and optimization methods.
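The lag introduced by filtering can be seen with a small sketch, assuming a trailing moving average as the smoothing filter (the slides do not name a specific technique). The helper `trailing_moving_average` and the step-shaped signal are illustrative assumptions.

```python
def trailing_moving_average(series, window):
    """Average each point with up to `window - 1` preceding points."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical signal: the level jumps from 0 to 10 at index 5.
signal = [0.0] * 5 + [10.0] * 10
smoothed = trailing_moving_average(signal, window=4)

# First index at which each series reaches the new level.
first_raw = next(i for i, v in enumerate(signal) if v >= 10.0)
first_smooth = next(i for i, v in enumerate(smoothed) if v >= 10.0)
lag = first_smooth - first_raw  # equals window - 1 for this filter
```

A wider window removes more noise but delays the filtered series further, which is the accuracy trade-off the slide points to.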
Mining Bio and Environmental Data
New problems raise new questions, large-scale problems especially so.
Biological data mining, such as HIV vaccine design: DNA, chemical properties, 3D structures, and functional properties → need to be fused.
Environmental data mining: mining for solving the energy crisis.
"If you want a second opinion, I will ask my computer."
Artificial Intelligence - Learning, by Cláudia Antunes
Guiding the Discovery
Using domain knowledge to inform the methods.
How to represent knowledge? How to guide the process? How to prevent the discovery of unknown patterns?
Security, Privacy and Data Integrity
How to ensure the users' privacy while their data are being mined?
How to do data mining for protection of security and privacy?
Knowledge integrity assessment: data are intentionally modified in order to misinform the recipients.
Scaling Up for Big Data
Using Hadoop / MapReduce; distributed file system.
Must scale (linearly) with: amount of data; number of machines; problem complexity.
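The MapReduce programming model mentioned above can be sketched in plain Python as three phases: map, shuffle, and reduce. This is a single-process illustration of the idea only, not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job`, and the input records, are assumptions for this example.

```python
from collections import defaultdict


def map_phase(record):
    # Emit one (key, value) pair per word in the record.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group values by key, as the framework would do between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values that share a key.
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical input records.
counts = run_job(["data mining", "mining data streams", "data"])
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be spread across machines, which is what makes scaling with the amount of data and the number of machines plausible.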
The traditional classification process (since the late 50's)
Fixed/engineered features (or fixed kernel) + trainable classifier
Hand-crafted Feature Extractor → Simple Trainable Classifier
Deep Learning
Deep learning = feature learning
Trainable features (or kernel) + trainable classifier
Trainable Feature Extractor → Trainable Classifier
Representation Learning
A set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification.
pixels → edges → object parts (combination of edges) → object models
Deep Learning
Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.
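The idea of composing simple but non-linear modules can be sketched as follows. Each module here is a weighted sum followed by `tanh`; the helpers `make_layer` and `compose`, the choice of `tanh`, and the fixed, untrained weights are all assumptions for illustration. In an actual deep-learning method the weights would be learned from data.

```python
import math


def make_layer(weights, biases):
    """Build one module: a linear map followed by a tanh non-linearity."""
    def layer(x):
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


def compose(*layers):
    """Chain modules so each one transforms the previous representation."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


# Hypothetical, untrained weights: 3 raw inputs -> 2 features -> 1 output.
layer1 = make_layer([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])
layer2 = make_layer([[1.0, -1.0]], [0.0])
network = compose(layer1, layer2)

output = network([1.0, 2.0, 3.0])
```

The output of `layer1` plays the role of an intermediate representation that `layer2` builds on, mirroring the level-by-level abstraction described above.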
Deep Learning Alternatives
Feed-Forward: multilayer neural nets, convolutional nets
Feed-Back: Stacked Sparse Coding, Deconvolutional Nets
Bi-Directional: Deep Boltzmann Machines, Stacked Auto-Encoders