A Knowledge Framework For Sensor Data Quality Control


Anshuman Sahu, Bo Yang, Member, IEEE

Abstract--Data-driven approaches are increasingly used in grid monitoring and decision support for power system operation and planning. Because more and more critical business decisions are based on measurement data, the communication and information infrastructure must meet high reliability requirements. Data quality is a major concern for many power companies due to communication network inefficiency, malfunctioning sensors, and inappropriate data ingestion. This paper proposes a framework for sensor data quality control which is data driven and requires minimal domain expert input. The proposed framework identifies poor quality data (outliers, adverse operational data) based on feature extraction and machine learning techniques. The framework also generates additional insights into the data which can be leveraged by the operator. We demonstrate a prototype of the framework on real data.

Index Terms--Sensor data quality, data complexity, machine learning models.

I. INTRODUCTION

With the proliferation of digital measuring devices, such as smart meters on distribution systems and phasor measurement units (PMUs) on transmission systems, power systems are becoming more observable through field measurements. As a result, measurement data are increasingly used in grid monitoring and decision support for power system operation and planning, which previously relied heavily on information derived from model simulation. For instance, conventional transmission grid operations rely on remedial control strategies that are determined by offline analysis. Such strategies may not be optimal for real-time operation. PMUs and other grid sensors provide improved situational awareness, so that abnormalities such as oscillations and unexpected generation drops can be detected from high-resolution synchronized field measurements.
In recent years, research on grid analysis facilitated by PMU data has attracted great attention and covered a wide range of topics: state estimation [1]; adaptive control and protection [2]; voltage stability assessment [3]; and low frequency oscillation detection [4]. A similar trend has been observed in distribution power systems, where the popularity of smart meters and intelligent electronic devices is reshaping the way distribution companies operate, with improved feeder management, resilient operations, and enhanced customer engagement. Because more and more critical business decisions are based on measurement data, the communication and information infrastructure must meet high reliability requirements. Unfortunately, data quality is a concern for many power companies due to communication network inefficiency, malfunctioning sensors, and inappropriate data ingestion. Research dedicated to improving data quality often focuses on recovering missing data [5], mitigating the impact of network latency [6], or capturing the noise level [7]. Common practice is to 1) exclude obviously unrealistic data points based on engineering judgment or rough estimates from physical models, and then 2) develop models to capture the impact of noise and uncertainties. This process works well for many applications, but it requires a lot of domain expert input and is difficult to generalize to data of a different nature. For example, a rule of thumb that works well on voltage measurements cannot be easily adapted to current measurements. When the volume, velocity, and dimensionality of the data stream are sufficiently high, the process becomes very challenging and cannot be handled well by existing techniques.

Anshuman Sahu is with Hitachi America Big Data Lab, Santa Clara, CA (anshuman.sahu@hal.hitachi.com). Bo Yang is with Smartwires (Bo.Yang@smartwires.com).
This paper proposes a framework for data quality control that is purely data driven and requires minimal domain expert input. The proposed framework identifies poor quality data (outliers, adverse operational data) based on feature extraction and machine learning. The extracted data signatures are clustered and fed into a learner to glean useful insights. The framework can be used to detect abnormalities in very high dimensional settings and is robust to changes in operating conditions. Section II describes the architecture and methodology behind our framework. Section III describes an instantiation of the components of the framework in detail on a dataset of paired wind cup anemometer measurements. Section IV further demonstrates the framework on high-resolution PMU time series data. We conclude with future work in Section V.

II. ARCHITECTURE AND METHODOLOGY

It is assumed that the outputs of physical systems are interrelated and can be described mathematically, and that their measurements reflect these principles. Good or acceptable measurements will fit the mathematical models well, while unacceptable data points, i.e. bad or missing data, will fall outside the solution space. The strategy for data quality control is to identify this acceptable solution space from the observed data and to train a machine model that describes the principles, so that when a new data stream comes in, bad data can easily be screened out. The architecture, shown in Figure 1, consists of four major components.

Feature extraction (FE) is often deployed to preprocess the data matrix for the subsequent learning and generalization steps. When the input data are very large and many of the features are correlated, FE can reduce the dimensionality of the input data while retaining useful features. Dimensionality reduction methods include principal component analysis (PCA) and other subspace learning methods.

Clustering is a data mining process that discovers groups of similar data points. Normal and abnormal data points are expected to have different characteristics and therefore to belong to different groups. Different cluster analysis techniques can be employed. Commonly used methods consider the neighborhood distribution, density, and connectivity between data points.

Figure 1 Work flow for data quality control framework.

Machine Model Training: Once the data points are clustered, a label can be assigned to each of them based on cluster membership. Machine learning models in the form of supervised classifiers are then learned to characterize each data group. One of the groups will closely represent the normal operating data, while the other groups will correspond to contaminated data points that cannot be used to make predictions or decisions. In many cases, the physical systems under measurement are partially known. Domain expertise can then be leveraged to improve model accuracy, which otherwise might be limited by a lack of data coverage. The domain expert also has the opportunity to validate or further improve the machine model. The refined model can then be applied for data classification, after which clean data are identified for further analysis.

III. CASE STUDY: DEMONSTRATION ON THE PAIRED WIND CUP ANEMOMETER DATA

We demonstrate the efficacy of our framework on sensor data obtained from a pair of cup anemometers.
The dataset was released during the Prognostics and Health Management (PHM) Society challenge. Different attributes (features) such as wind speed, wind direction, and temperature were recorded and summarized in intervals of 10 minutes for both anemometers. For more details about the origin of the dataset, we refer the interested reader to the PHM Society challenge website. Note that, unlike the goal of the contest, our contribution here is to show how the framework can utilize real-life data to tease out valuable insights in an almost automated fashion. We describe our procedure step by step on one of the datasets. The first eight attributes correspond to summary statistics for wind speed; the next three attributes correspond to summary statistics for wind direction (all entries in the column corresponding to the minimum value of wind direction were found to be zero, so that column was removed); and the final four attributes correspond to summary statistics for temperature. We instantiate our framework with a specific algorithm for each step. These can be substituted with other appropriate methods to deal with different situations.

A. Feature Extraction Using Principal Component Analysis (PCA)

The goal of feature extraction is to understand the dependencies between the attributes and identify the inherent subspace in which the data reside. A survey of such techniques can be found in [8]. PCA is a simple yet highly effective approach in practice. We performed PCA on the scaled data and projected the data onto the top 3 principal components (PCs), as shown in Figure 3. As shown in Figure 2, these 3 PCs explain slightly more than 80% of the total variance of the data. Upon inspecting the loadings, we found that the first PC focuses on wind speed; the second PC focuses on temperature and wind direction; and the third PC focuses on the standard deviation of the attributes.
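The PCA step described above can be sketched in a few lines. The following is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: it standardizes the columns, builds the sample covariance matrix, extracts eigenpairs by power iteration with deflation (chosen here only to avoid external libraries), and reports how many leading components are needed to reach a target share of the variance. The toy data and all function names are hypothetical.

```python
import math
import random


def standardize(rows):
    """Scale every column to zero mean and unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpairs(matrix, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(matrix)
    m = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # the remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def components_for_variance(eigenvalues, total_variance, target=0.8):
    """Smallest number of leading components whose cumulative share >= target."""
    cumulative = 0.0
    for count, lam in enumerate(eigenvalues, start=1):
        cumulative += lam / total_variance
        if cumulative >= target - 1e-9:
            return count
    return len(eigenvalues)


def project(centered, vectors):
    """Coordinates of each row along the given unit vectors."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vectors]
            for row in centered]


# Hypothetical toy data: column 2 is a linear function of column 1,
# column 3 is uncorrelated with both.
x = [-1.0, 1.0, -1.0, 1.0]
y = [-1.0, -1.0, 1.0, 1.0]
data = [[a, 2 * a + 3, b] for a, b in zip(x, y)]

scaled = standardize(data)
cov = covariance(scaled)
pairs = top_eigenpairs(cov, k=3)
eigenvalues = [lam for lam, _ in pairs]
total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
n_keep = components_for_variance(eigenvalues, total, target=0.8)
scores = project(scaled, [v for _, v in pairs[:n_keep]])
```

In this toy case the two perfectly correlated columns collapse into one component, so two components already cover the 80% target. On real data of the size used in the paper, a library routine based on the singular value decomposition would normally replace the power iteration.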
Figure 2 Cumulative variance of principal components: the top 3 principal components are selected, covering 80% of the variance.

B. Sub-Group Identification via Clustering

Once we extract the relevant features, we need to discover sub-groups within the data. Model-based clustering [9] utilizing the Bayesian Information Criterion (BIC) was employed for this purpose. Our premise is that the data set consists of outliers as well as points corresponding to adverse weather conditions (for example, icing) in addition to normal points. Generally, the number of clusters identified depends on the application of interest. We show the results of clustering in Figure 4, where we identify three clusters.
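As a rough illustration of how BIC can select the number of clusters in model-based clustering, the sketch below fits one-dimensional Gaussian mixtures with EM and keeps the component count with the lowest BIC. It is a simplification under stated assumptions: the paper clusters in the 3-dimensional PCA space, whereas this example uses a single dimension, a deterministic quantile-based initialization, and the convention BIC = -2 log L + p log n (lower is better). All names and the toy data are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_density(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def fit_gmm_1d(data, k, iters=300, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log-likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles; the shared
    # starting variance shrinks with k (a heuristic, not part of the paper).
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var / (k * k), var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(spread / nj, var_floor)
    loglik = sum(math.log(max(mixture_density(x, weights, means, variances),
                              1e-300)) for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)


def select_components(data, max_k):
    """Return (best k, {k: BIC}) over k = 1..max_k."""
    scores = {k: bic(fit_gmm_1d(data, k)[3], k, len(data))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


# Hypothetical toy data: three well-separated groups around 0, 10 and 20.
offsets = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
data = [c + d for c in (0.0, 10.0, 20.0) for d in offsets]

best_k, scores = select_components(data, max_k=3)
_, means, _, _ = fit_gmm_1d(data, best_k)
```

The penalty term grows with the number of parameters, so an extra component is accepted only when it improves the likelihood enough to pay for itself. Note that some packages report BIC with the opposite sign and maximize it.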

C. Machine Model by Decision Tree Method

After discovering the sub-groups, we are naturally interested in understanding them. Supervised machine learning algorithms can be employed for this purpose, and decision trees [10] are one such technique. They are robust, can handle high dimensionality, provide importance scores for the features, and produce interpretable rules. The cluster memberships learned in the previous step are assigned as class labels to the data in the original feature space. The tree model learned from the data is visualized in Figure 5. The first row in each node of the tree shows the dominant class in that node (1 represents the class for outlier data; 2 represents the class for adverse weather data; and 3 represents the class for normal data); the second row shows the estimated probability for each class (in the order of classes 1, 2, 3); and the final row shows the percentage of the total data that falls in the node as a result of applying the rule denoted below each node. If a data point satisfies the rule, it goes to the left child node; otherwise it goes to the right child node.

Figure 3 Clustered data in dominant PCA space: normal data are shown in green, data from adverse weather conditions are shown in red, and outliers in the data are shown in blue. Clustering in PCA space tends to separate out abnormal points.

Figure 4 Highly coupled clusters and their covariance structure. Color labels for each class are the same as in Figure 3.

Figure 5 Classification tree based on rules learned from data.

Figure 6 Boxplot showing distribution of key features for each class (color coded previously) identified through decision tree.

In addition to estimating the class probabilities, we also look at the important features identified by the tree model, and show the box plots of those features for the three classes in Figure 6. It is interesting to note that the features and rules identified above have physical meaning in discriminating between the classes. A previous study [11] had shown some rules for such cases. Those rules, however, were based on human judgment with no empirical evidence. Furthermore, the rules were derived by looking at one feature at a time, so they can miss potential interactions between features which can be useful descriptors. Our method provides a systematic approach to discover such rules, and decision trees can handle interactions between features. Moreover, with the availability of additional ground truth data as well as feedback from the operator, we can improve the accuracy of our method. Table 1 compares the classification results from our approach with those of the previous study [11], which is based on engineering judgment and rough estimates of weather effects. It can be seen that substantially more data points are retained in our approach.

Table 1 Comparison of data clustering with physical rule-based method

                      Bad data (outlier+icing)    Good data
Rule-based method
Our method
Difference            55%                         -38%

D. Data Classification

The fine-tuned machine model is then applied to a new data file for prediction. The test file has 720 data rows, where each row has the same data format as the given training data set. Each row is tested with the developed decision tree and labeled with one class. In this case, normal data points (530 rows) dominate the data file, while outliers and icing points combined account for 190 rows. Class 3 is retained for further analysis. The classification results can then be used for equipment health analysis, where the health of the equipment can be judged intuitively from whether a substantial number of outliers or icing points have been detected.

IV. DEMONSTRATION ON HIGH RESOLUTION TIME SERIES DATA

The proposed framework is further demonstrated on time series data that are raw field measurements, more dynamic, and contaminated with noise. Data from phasor measurement units are adopted to benchmark the efficiency of the proposed approaches on outlier identification and nuisance data detection. The adopted PMU data are available through the Texas Synchrophasor Network. The measurements from 22:00 on January 12, 2012 were randomly selected for demonstration. The original data file contains one hour of voltage, phasor, and frequency measurements for six PMUs scattered around Texas. The experiments were run on the first five minutes for the purpose of demonstrating the efficacy of the methodology in detecting data points of interest. The goal is to test the proposed data quality control framework on the selected dataset without input from domain experts. Both phasor and frequency signals are selected as ground truth for comparison, although they may contain noise and outliers by nature. In the experiments, white noise of different levels and distributions is additionally applied to the raw data for performance evaluation. The proposed approaches are expected to differentiate the added noise and outliers relying only on correlations between angles and frequencies from different PMUs.

Figure 7 Frequency and angle measurements for 6 PMUs.

A. Contaminated Data

Since there is no information about the noise level and quality of the raw data, artificial white noise with different magnitudes is added to test the performance of the proposed methods. There is no missing data in the selected data file. The following noise scenarios are considered.
The simulated white noise has a Gaussian distribution with a noise level of 1-3% of the measurement signals. The occurrence of the noise is also randomly distributed, with the total occurrence set to 20-60% of the overall duration. Because the magnitude and occurrence of the contaminated measurements are random, this setup tests the flexibility of the proposed methodology and its ability to address unpredictable issues.

Figure 8 Phase angle and frequency with artificial white noise. (Upper left) 5-minute overview; (upper right) noise at various levels; (lower left) comparison of magnitude; (lower right) similarly noised frequency measurements. Legend: raw data; 1% Mag, 20% ToD; 3% Mag, 60% ToD; 3% Mag, 40% ToD; 1% Mag, 40% ToD.

Figure 8 shows the angle and frequency measurements when various levels of noise have been added. The two panels on the top show the absolute voltage angle with 1-3% white noise, where the upper right panel is a closer look at the noised signals. The lower left panel compares the noise levels, which in general become more obvious as the measured value increases with time. The lower right panel depicts the noised frequency signals. Since a 1% change corresponds to roughly 0.6 Hz, the noisy measurement points can be easily differentiated from the clean data points.

B. Outlier Detection

As shown in Figure 8, the contaminated frequency data points, i.e. outliers, are relatively easy to identify. However, the noise applied to the angle measurements is closely coupled with the clean data points and follows roughly the same trend as time progresses. It is challenging to use a rule-based approach, i.e. angle thresholds, to separate the contaminated angle measurements. The proposed principal component analysis method, however, successfully captures the correlations between angle and frequency and applies them for outlier identification. Figure 9 depicts the outliers in red and the clean data in blue. It is confirmed that all contaminated data points are detected.
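The noise scenario and the remark that frequency outliers are easy to separate can be illustrated with a small sketch. It is not the PCA-based detector used in this paper: it swaps in a plain median/MAD threshold on a single frequency channel. The assumptions are that the noise standard deviation equals the noise level times the signal value (so 1% of 60 Hz is 0.6 Hz), that a fixed fraction of samples is contaminated, and that the clean trace is a hypothetical 60 Hz signal with a small slow oscillation. All names are illustrative.

```python
import math
import random
import statistics


def contaminate(signal, level, fraction, rng):
    """Add zero-mean Gaussian noise to a random subset of samples.

    Assumption: the noise standard deviation is level * |value| and exactly
    round(fraction * n) samples are affected.
    Returns (noisy signal, sorted indices of contaminated samples).
    """
    n = len(signal)
    hit = sorted(rng.sample(range(n), round(fraction * n)))
    noisy = list(signal)
    for i in hit:
        noisy[i] += rng.gauss(0.0, level * abs(signal[i]))
    return noisy, hit


def flag_outliers(values, k=3.0):
    """Indices whose distance from the median exceeds k robust std deviations."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    threshold = k * 1.4826 * mad  # 1.4826 rescales MAD to a Gaussian std
    return [i for i, v in enumerate(values) if abs(v - med) > threshold]


rng = random.Random(42)

# Hypothetical clean frequency trace: 60 Hz nominal, +/- 0.01 Hz oscillation.
clean = [60.0 + 0.01 * math.sin(2 * math.pi * t / 50) for t in range(300)]

# Scenario "1% Mag, 40% ToD": 1% noise level on 40% of the samples.
noisy, hit = contaminate(clean, level=0.01, fraction=0.4, rng=rng)
flagged = flag_outliers(noisy)

print(f"contaminated: {len(hit)}, flagged: {len(flagged)}")
```

A contaminated sample whose random noise happens to be very small can stay below the threshold, so the flagged set may be smaller than the contaminated set. For angle measurements, where the noise follows the trend of the clean signal, a single-channel threshold like this is not expected to work, which is the motivation for exploiting the correlation between channels.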
Figure 9 Angle and frequency for 1 PMU with 1% noise level and 40% total duration.

Figure 10 shows that the tri-modal feature of frequency is captured and adopted as the clustering criterion. The root of the tree divides the data into two groups, where the group with frequencies equal to or above 61 Hz is seen as one cluster, shown in orange.

This group contains all the outliers that are higher than the normal values. The rest of the data is further clustered into two groups, where the group with frequencies lower than 59 Hz contains the outliers that are lower than the normal values. The group in the green box is the collection of all normal measurements.

Figure 10 Classification tree used to differentiate contaminated data points (with 3% noise level and 40% total duration).

C. Nuisance Data Detection

Sensors can sometimes malfunction or be improperly calibrated, which induces nuisance data. This type of data quality problem is difficult to detect unless the measurements are far outside the expected range. In this paper, we propose to use the correlation among measurements to identify problematic measurements due to device malfunction. It is assumed that measurements from different locations are bound by the underlying physical rules, so the correlation of the data should be relatively consistent. For example, differentiating the angle with respect to time should yield the frequency. Thus the malfunction of one device will change the data correlation, and the new pattern cannot revert automatically unless the data channel is removed. To test this approach, a vertical shift of 3% is applied to one PMU angle measurement after 85 seconds (out of 1 hour). The assumption is that this PMU was calibrated incorrectly and henceforth yields wrong angle measurements. Figure 11 shows that the change was captured immediately, as shown in red. However, both frequencies and angles are marked as problematic by the PCA method, since it only detects a change of correlation among data channels and cannot identify the root cause. The decision tree method [10] is then applied to explore the cause of the change, which discloses the angle measured by PMU 1 to be the differentiator (Figure 12).

Figure 11 Angle and frequency when 1 PMU angle measurement has vertical shift.

Figure 12 Classification tree used to differentiate contaminated data points.

V. CONCLUSION AND FUTURE WORK

A knowledge framework for sensor data quality control is proposed to identify abnormal data points through feature extraction and clustering techniques. Supervised learner models are then used to describe the data signature for normal operating conditions, which can be incorporated into future decision making. The prototype of the framework has been described in detail on the paired wind cup anemometer data, which demonstrated its feasibility. It is then further demonstrated on dynamic time series data, i.e. PMU measurements. The whole process is automatic and requires minimal domain expert input, leveraging the data signature and the correlation among measurements. The experimental results show that the proposed framework functions as expected on both types of datasets. Future research will take into account location and topology (spatial) information as well as the associated temporal information for refining the predictions.

VI. REFERENCES

[1] M. Gol and A. Abur, A Fast Decoupled State Estimator for Systems Measured by PMUs, IEEE Transactions on Power Systems, vol. PP, no. 99, pp. 1-6, 2014
[2] M. Ariff, B. Pal and A. Singh, Estimating Dynamic Model Parameters for Adaptive Protection and Control in Power System, IEEE Transactions on Power Systems, vol. PP, no. 99, pp. 1-11, 2014
[3] V. Vijay, Application of phasor measurements for dynamic security assessment using decision trees, IEEE PES general meeting, San Diego, July 2012
[4] N. Zhou, J. Dagle, Initial Results in Using a Self-Coherence Method for Detecting Sustained Oscillations, IEEE Transactions on Power Systems, vol. 30, no. 1, pp , 2014
[5] M. Wang, J. H. Chow, et al., A Low-Rank Matrix Approach for the Analysis of Large Amounts of Power System Synchrophasor 48th Hawaii International Conference on System Sciences, 2015
[6] O. Al-Khatib, W. Hardjawana, B. Vucetic, Traffic Modeling and Optimization in Public and Private Wireless Access Networks for Smart Grids, IEEE Transactions on Smart Grid, vol. 5, issue 4, pp
[7] J.S. Erkelens, R. Heusdens, Tracking of Nonstationary Noise Based on Data-Driven Recursive Noise Power Estimation, IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, issue 6, pp
[8] S. Ding, H. Zhu, W. Jia, and C. Su, "A survey on feature extraction for pattern recognition." Artificial Intelligence Review, vol. 37, no. 3, pp ,
[9] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation." Journal of the American Statistical Association, vol. 97, no. 458, pp ,
[10] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen, Classification and regression trees. CRC press,
[11] D. Siegel and J. Lee, "An auto-associative residual processing and K-means clustering approach for anemometer health assessment." International Journal of Prognostics and Health Management, vol. 2, pp , 2011.

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Real-Time Model-Free Detection of Low-Quality Synchrophasor Data

Real-Time Model-Free Detection of Low-Quality Synchrophasor Data Real-Time Model-Free Detection of Low-Quality Synchrophasor Data Meng Wu and Le Xie Department of Electrical and Computer Engineering Texas A&M University College Station, TX NASPI Work Group meeting March

More information

Application of Clustering Techniques to Energy Data to Enhance Analysts Productivity

Application of Clustering Techniques to Energy Data to Enhance Analysts Productivity Application of Clustering Techniques to Energy Data to Enhance Analysts Productivity Wendy Foslien, Honeywell Labs Valerie Guralnik, Honeywell Labs Steve Harp, Honeywell Labs William Koran, Honeywell Atrium

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

2. Data Preprocessing

2. Data Preprocessing 2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model Quality Assessment of Power Dispatching Based on Improved Cloud Model Zhaoyang Qu, Shaohua Zhou *. School of Information Engineering, Northeast Electric Power University, Jilin, China Abstract. This paper

More information

Detection of Anomalies using Online Oversampling PCA

Detection of Anomalies using Online Oversampling PCA Detection of Anomalies using Online Oversampling PCA Miss Supriya A. Bagane, Prof. Sonali Patil Abstract Anomaly detection is the process of identifying unexpected behavior and it is an important research

More information

3. Data Preprocessing. 3.1 Introduction

3. Data Preprocessing. 3.1 Introduction 3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation

More information

Online Bad Data Detection for Synchrophasor Systems via Spatio-temporal Correlations

Online Bad Data Detection for Synchrophasor Systems via Spatio-temporal Correlations LOGO Online Bad Data Detection for Synchrophasor Systems via Spatio-temporal s Le Xie Texas A&M University NASPI International Synchrophasor Symposium March 24, 2016 Content 1 Introduction 2 Technical

More information

Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham

Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham Final Report for cs229: Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham Abstract. The goal of this work is to use machine learning to understand

More information

Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering

Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering Using Statistical Techniques to Improve the QC Process of Swell Noise Filtering A. Spanos* (Petroleum Geo-Services) & M. Bekara (PGS - Petroleum Geo- Services) SUMMARY The current approach for the quality

More information

Discovery of the Source of Contaminant Release

Discovery of the Source of Contaminant Release Discovery of the Source of Contaminant Release Devina Sanjaya 1 Henry Qin Introduction Computer ability to model contaminant release events and predict the source of release in real time is crucial in

More information

Fall 2017 ECEN Special Topics in Data Mining and Analysis

Fall 2017 ECEN Special Topics in Data Mining and Analysis Fall 2017 ECEN 689-600 Special Topics in Data Mining and Analysis Nick Duffield Department of Electrical & Computer Engineering Teas A&M University Organization Organization Instructor: Nick Duffield,

More information

Efficient PMU Data Analysis through High Performance Data Management Platform

Efficient PMU Data Analysis through High Performance Data Management Platform NASPI WG meeting Data & Network Management Task Team Efficient PMU Data Analysis through High Performance Data Management Platform 10/14/2015 Bo Lucy Yang, Jun Yamazaki, Norifumi Nishikawa, Hsiu-Khuern

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

Chapter 5: Outlier Detection

Chapter 5: Outlier Detection Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 5: Outlier Detection Lecture: Prof. Dr.

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

Response to API 1163 and Its Impact on Pipeline Integrity Management

Response to API 1163 and Its Impact on Pipeline Integrity Management ECNDT 2 - Tu.2.7.1 Response to API 3 and Its Impact on Pipeline Integrity Management Munendra S TOMAR, Martin FINGERHUT; RTD Quality Services, USA Abstract. Knowing the accuracy and reliability of ILI

More information

Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process

Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process KITTISAK KERDPRASOP and NITTAYA KERDPRASOP Data Engineering Research Unit, School of Computer Engineering, Suranaree

More information

Cyber attack detection using decision tree approach

Cyber attack detection using decision tree approach Cyber attack detection using decision tree approach Amit Shinde Department of Industrial Engineering, Arizona State University,Tempe, AZ, USA {amit.shinde@asu.edu} In this information age, information

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

EXPERIMENTAL WORK PART I CHAPTER 6 DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES The evaluation of models built using statistical in conjunction with various feature subset

COLOR FIDELITY OF CHROMATIC DISTRIBUTIONS BY TRIAD ILLUMINANT COMPARISON. Marcel P. Lucassen, Theo Gevers, Arjan Gijsenij

Intelligent Systems Lab Amsterdam, University of Amsterdam ABSTRACT Performance

Introduction to Data Mining

JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

Anomaly Detection on Data Streams with High Dimensional Data Environment

Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Preprocessing stage Preprocessing: includes all the operations that have to be performed before

A Data Classification Algorithm of Internet of Things Based on Neural Network

https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To

Understanding Clustering Supervising the unsupervised

Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data

CS 229 Final Project Report Learning to Decode Cognitive States of Rat using Functional Magnetic Resonance Imaging Time Series

Jingyuan Chen //Department of Electrical Engineering, cjy2010@stanford.edu//

Sensor Based Time Series Classification of Body Movement

Swapna Philip, Yu Cao*, and Ming Li Department of Computer Science California State University, Fresno Fresno, CA, U.S.A swapna.philip@gmail.com,

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

Accelerometer Gesture Recognition

Michael Xie xie@cs.stanford.edu David Pan napdivad@stanford.edu December 12, 2014 Abstract Our goal is to make gesture-based input for smartphones and smartwatches accurate

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis

Calibrating HART Transmitters. HCF_LIT-054, Revision 1.1

Release Date: November 19, 2008 Date of Publication: November 19, 2008 Document Distribution / Maintenance Control / Document Approval To obtain

Optimal Clustering and Statistical Identification of Defective ICs using I DDQ Testing

A. Rao +, A.P. Jayasumana * and Y.K. Malaiya* *Colorado State University, Fort Collins, CO 8523 + PalmChip Corporation,

Network Traffic Measurements and Analysis

DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x_1, x_2, ..., x_n, but no associated response y. Therefore we are not interested in prediction nor classification,

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

Slides for Data Mining by I. H. Witten and E. Frank

7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools

Abstract In this project, we study 374 public high schools in New York City. The project seeks to use regression techniques

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

Ali Abur Northeastern University Department of Electrical and Computer Engineering Boston, MA 02115

Enhanced State Estimation Ali Abur Northeastern University Department of Electrical and Computer Engineering Boston, MA 02115 GCEP Workshop: Advanced Electricity Infrastructure Frances Arriallaga Alumni

A Survey Of Issues And Challenges Associated With Clustering Algorithms

International Journal for Science and Emerging ISSN No. (Online):2250-3641 Technologies with Latest Trends 10(1): 7-11 (2013) ISSN No. (Print): 2277-8136 A Survey Of Issues And Challenges Associated With

Figure 1: Workflow of object-based classification

Technical Specifications Object Analyst Object Analyst is an add-on package for Geomatica that provides tools for segmentation, classification, and feature extraction. Object Analyst includes an all-in-one

Louis Fourrier Fabien Gaie Thomas Rolf

CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

Estimating Noise and Dimensionality in BCI Data Sets: Towards Illiteracy Comprehension

Claudia Sannelli, Mikio Braun, Michael Tangermann, Klaus-Robert Müller, Machine Learning Laboratory, Dept. Computer

Defining a Better Vehicle Trajectory With GMM

Santa Clara University Department of Computer Engineering COEN 281 Data Mining Professor Ming- Hwa Wang, Ph.D Winter 2016 Defining a Better Vehicle Trajectory With GMM Christiane Gregory Abe Millan Contents

ECE 285 Class Project Report

Based on Source localization in an ocean waveguide using supervised machine learning Yiwen Gong ( yig122@eng.ucsd.edu), Yu Chai( yuc385@eng.ucsd.edu ), Yifeng Bu( ybu@eng.ucsd.edu

AN ENHANCED ATTRIBUTE RERANKING DESIGN FOR WEB IMAGE SEARCH

Sai Tejaswi Dasari #1 and G K Kishore Babu *2 # Student,Cse, CIET, Lam,Guntur, India * Assistant Professort,Cse, CIET, Lam,Guntur, India Abstract-

APPLICATION NOTE. XCellAir s Wi-Fi Radio Resource Optimization Solution. Features, Test Results & Methodology

Introduction Multi Service Operators (MSOs) and Internet service providers have been aggressively

Grid Operations - Program 39

Program Description Program Overview In many ways, today's power system must be operated to meet objectives for which it was not explicitly designed. Today's transmission system

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning

Data Preprocessing. Slides by: Shree Jaswal

Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

MSA220 - Statistical Learning for Big Data

Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

Remote Sensing & Photogrammetry W4. Beata Hejmanowska Building C4, room 212, phone:

Remote Sensing & Photogrammetry W4 Beata Hejmanowska Building C4, room 212, phone: +4812 617 22 72 605 061 510 galia@agh.edu.pl 1 General procedures in image classification Conventional multispectral classification

Face Recognition using Eigenfaces SMAI Course Project

Satarupa Guha IIIT Hyderabad 201307566 satarupa.guha@research.iiit.ac.in Ayushi Dalmia IIIT Hyderabad 201307565 ayushi.dalmia@research.iiit.ac.in Abstract

A Data-Mining Approach for Wind Turbine Power Generation Performance Monitoring Based on Power Curve

pp.456-46 http://dx.doi.org/1.1457/astl.16. A Data-Mining Approach for Wind Turbine Power Generation Performance Monitoring Based on Power Curve Jianlou Lou 1,1, Heng Lu 1, Jia Xu and Zhaoyang Qu 1,

Modulation-Aware Energy Balancing in Hierarchical Wireless Sensor Networks 1

Maryam Soltan, Inkwon Hwang, Massoud Pedram Dept. of Electrical Engineering University of Southern California Los Angeles, CA

Lab 9. Julia Janicki. Introduction

My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support

Texture Image Segmentation using FCM

Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M

A Neural Network for Real-Time Signal Processing

Donald B. Malkoff General Electric / Advanced Technology Laboratories Moorestown Corporate Center Building 145-2, Route 38 Moorestown, NJ 08057

CS 521 Data Mining Techniques Instructor: Abdullah Mueen

LECTURE 2: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks

CPSC 340: Machine Learning and Data Mining. Outlier Detection Fall 2018

Admin Assignment 2 is due Friday. Assignment 1 grades available? Midterm rooms are now booked. October 18 th at 6:30pm (BUCH A102

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

Lu Chen and Yuan Hang PERFORMANCE DEGRADATION ASSESSMENT AND FAULT DIAGNOSIS OF BEARING BASED ON EMD AND PCA-SOM.

Basic Data Mining Technique

What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

An Automated System for Data Attribute Anomaly Detection

Proceedings of Machine Learning Research 77:95 101, 2017 KDD 2017: Workshop on Anomaly Detection in Finance An Automated System for Data Attribute Anomaly Detection David Love Nalin Aggarwal Alexander

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

Research on outlier intrusion detection technologybased on data mining

Acta Technica 62 (2017), No. 4A, 635640 c 2017 Institute of Thermomechanics CAS, v.v.i. Research on outlier intrusion detection technologybased on data mining Liang zhu 1, 2 Abstract. With the rapid development

COMPARISON OF DENSITY-BASED CLUSTERING ALGORITHMS

Mariam Rehman Lahore College for Women University Lahore, Pakistan mariam.rehman321@gmail.com Syed Atif Mehdi University of Management and Technology Lahore,

University of Florida CISE department Gator Engineering. Clustering Part 5

Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean

Data: a collection of numbers or facts that require further processing before they are meaningful

Digital Image Classification Data vs. Information Data: a collection of numbers or facts that require further processing before they are meaningful Information: Derived knowledge from raw data. Something

Model-based segmentation and recognition from range data

Jan Boehm Institute for Photogrammetry Universität Stuttgart Germany Keywords: range image, segmentation, object recognition, CAD ABSTRACT This

Implementing Operational Analytics Using Big Data Technologies to Detect and Predict Sensor Anomalies

Joseph Coughlin, Rohit Mital, Shashi Nittur, Benjamin SanNicolas, Christian Wolf, Rinor Jusufi Stinger

Using Machine Learning to Optimize Storage Systems

Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

APPLICATION OF MULTIPLE RANDOM CENTROID (MRC) BASED K-MEANS CLUSTERING ALGORITHM IN INSURANCE A REVIEW ARTICLE

Sundari NallamReddy, Samarandra Behera, Sanjeev Karadagi, Dr. Anantha Desik ABSTRACT: Tata

Chapter 10. Conclusion Discussion

Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn University,

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3. Chapter 3: Data Preprocessing. Major Tasks in Data Preprocessing

Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 1 Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major Tasks in Data Preprocessing Data Cleaning Data Integration Data

DUPLICATE DETECTION AND AUDIO THUMBNAILS WITH AUDIO FINGERPRINTING

Christopher Burges, Daniel Plastina, John Platt, Erin Renshaw, and Henrique Malvar March 24 Technical Report MSR-TR-24-19 Audio fingerprinting

Perry. Lakeshore. Avon. Eastlake

Perry Lorain Avon Lakeshore Eastlake Ashtabula Mansfield Sammis Beaver Valley Conesville Tidd Burger & Kammer Muskingum

Customer Clustering using RFM analysis

VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

Machine Learning (CSMML16) (Autumn term, ) Xia Hong

Machine Learning (CSMML16) (Autumn term, 28-29) Xia Hong 1 Useful books: 1. C. M. Bishop: Pattern Recognition and Machine Learning (2007) Springer. 2. S. Haykin: Neural Networks (1999) Prentice Hall. 3.

Automatic Shadow Removal by Illuminance in HSV Color Space

Computer Science and Information Technology 3(3): 70-75, 2015 DOI: 10.13189/csit.2015.030303 http://www.hrpub.org Automatic Shadow Removal by Illuminance in HSV Color Space Wenbo Huang 1, KyoungYeon Kim

Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.

Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

Exploratory Machine Learning studies for disruption prediction on DIII-D by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA Presented at the 2 nd IAEA Technical Meeting on

CS 231A Computer Vision (Fall 2012) Problem Set 3

Due: Nov. 13 th, 2012 (2:15pm) 1 Probabilistic Recursion for Tracking (20 points) In this problem you will derive a method for tracking a point of interest

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

Unified PMU Placement Algorithm for Power Systems

Kunal Amare, and Virgilio A. Centeno Bradley Department of Electrical and Computer Engineering, Virginia Tech Blacksburg, VA-24061, USA. Anamitra Pal Network

Fuzzy Partitioning with FID3.1

Cezary Z. Janikow Dept. of Mathematics and Computer Science University of Missouri St. Louis St. Louis, Missouri 63121 janikow@umsl.edu Maciej Fajfer Institute of Computing