Model Based Symbolic Description for Big Data Analysis
*Carlo Drago, **Carlo Lauro and **Germana Scepi
*University of Rome Niccolo Cusano, **University of Naples Federico II
COMPSTAT, International Conference on Computational Statistics
Outline
- The Statistical Problem
- Beanplot Time Series Definition
- Kernel and Bandwidth Choice
- Beanplot Characteristics and Robustness
- Parameterization
- Beanplot Modelling
- Multiple Beanplot Time Series
- Beanplot Multiple Factor Analysis
- Beanplot Clustering (using the Beanplot Model Distance)
- Beanplot Constrained Clustering (using the Beanplot Model Distance)
- Beanplot Forecasting
Big Data

Recent technological advances have brought many innovations in data. In particular, there has been an explosion of available large data sets. Big data is the term frequently used today for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big data are characterized by:
- high volume
- high velocity
- high variety
This type of data usually also has a temporal dimension.
Financial Big Data

This is especially promising and differentiating for financial services companies. Financial businesses cope with hundreds of millions of transactions daily and use big data to transform their processes and organizations and to obtain competitive advantages in financial markets. Financial firms must be able to collect, store, and analyze this rapidly changing type of data in order to maximize profits, reduce risk, and meet increasingly stringent regulatory requirements. The extraction of insights from such complex, frequently unstructured data is a very important step in this process, and the statistical approach can give a fundamental contribution in this sense.
Financial Big Data

We consider as big data observations on financial variables taken daily or at a finer time scale, often irregularly spaced over time, and usually exhibiting periodic (intra-day and intra-week) patterns in financial markets. High-frequency data possess these peculiar features and can be considered an example of big data in financial markets: records of transactions and quotes for stocks or bonds, currencies and so on. These peculiar time series present many difficulties in visualization and, if analyzed by means of an aggregated index, lead to an evident information loss.
The Frequency Domain

A time series of distributions offers a more informative representation than other forms of aggregated time series. In order to analyze these data we consider them not in the temporal domain of the time series, but in the frequency domain (considering, for example, the day). In this sense we consider the number of occurrences over time of a specific value. Doing so has several advantages:
- We can easily detect patterns in the data, such as the most recurrent observations in the temporal interval
- We can detect the inter-temporal seasonalities which can occur in the temporal interval
- We can observe the similarities between different series.
From Financial Big Data to Symbolic Data

From the initial financial big data we obtain the symbolic data table, in which each datum can be represented as a distribution. At this point we can:
- Represent the distribution as a beanplot datum
- Choose the adequate data model
- Parameterize the data model and obtain the relevant parameters
The final parameters are the relevant big data representation and can be used in clustering and forecasting.
From Financial Big Data to Symbolic Data

Figure: From Financial Big Data to Symbolic Data (the first graph is from Martinaitis (2012))
Methods

Figure: Methods
Beanplot Time Series (BTS)

A beanplot time series can be defined as an ordered sequence of beanplot data (Kampstra 2008) over time. The advantage of using the beanplot is its capacity to represent the intra-period data structure at time t. In the beanplot time series a density datum at time t, with t = 1...T, is defined as:

\hat{b}_{K,h,t}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) = \frac{1}{nh}\left(K\left(\frac{x - x_1}{h}\right) + K\left(\frac{x - x_2}{h}\right) + \dots + K\left(\frac{x - x_n}{h}\right)\right) \quad (1)

where K is a kernel function, h is a smoothing parameter called the bandwidth, and n is the number of intra-period observations x_i.
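As a minimal sketch of formula (1), assuming a Gaussian kernel (the helper names `gaussian_kernel` and `kde` are illustrative, not the authors' code):

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate at x: (1/(n*h)) * sum_i K((x - x_i)/h)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)
```

Evaluating `kde` over a grid of x values traces the density profile of one beanplot observation.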
Beanplot Taxonomies

We can detect some typical taxonomies in the beanplots:
A) Unimodality: data tend to gather around one mode in a regular way.
B) Multimodality: data tend to gather around two modes.
C) Break: data tend to gather around two modes but there is at least one break between the observations.
Figure: Beanplot Taxonomy
Identifying Intra-Period Breaks

A beanplot can be characterised by groups of internal outlier observations (more than one). The final result is a break in the data structure. In order to detect the intra-period breaks:
- We sort the observations from the highest to the lowest
- We compute the first differences \Delta_i, with i = 1...n-1, and their mean \bar{\Delta} = \sum_i \Delta_i / (n-1)
- Values over a specified threshold, for example \Delta_i > 3\bar{\Delta}, are considered relevant
In particular these values need to break the internal patterns considered. It is relevant to take into account that we can weight the internal outliers detected; in this way the beanplot is represented by a suitable weighting system.
Figure: Intra-period breaks
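The three detection steps above can be sketched as follows (the function name and the default threshold factor of 3 follow the slide's example and are illustrative):

```python
def intra_period_breaks(obs, factor=3.0):
    """Sort observations descending, compute first differences, and flag
    gaps larger than `factor` times the mean gap.  Returns the positions
    (in the sorted sequence) after which a break occurs."""
    s = sorted(obs, reverse=True)
    diffs = [s[i] - s[i + 1] for i in range(len(s) - 1)]
    mean_diff = sum(diffs) / len(diffs)
    return [i for i, d in enumerate(diffs) if d > factor * mean_diff]
```

For a sample with two well-separated clusters the single large gap between them is flagged as the break.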
Kernels

Various kernels K can generally be chosen: Gaussian, uniform, Epanechnikov, triweight, exponential, cosine, among others. The kernel is chosen in order to represent the density function adequately. K needs to satisfy:

\int_{-\infty}^{+\infty} K(u)\,du = 1 \quad (2)

Uniform: K(u) = \frac{1}{2}\,\mathbf{1}(|u| \le 1) \quad (3)

Epanechnikov: K(u) = \frac{3}{4}(1 - u^2)\,\mathbf{1}(|u| \le 1) \quad (4)

Triweight: K(u) = \frac{35}{32}(1 - u^2)^3\,\mathbf{1}(|u| \le 1) \quad (5)

Gaussian: K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2} \quad (6)
Kernel Properties

A kernel function K(u) is nonnegative and needs to fulfill (Racine 2008):

\int K(u)\,du = 1 \quad (8)

K(u) = K(-u) \quad (9)

\int u^2 K(u)\,du = \kappa_2 > 0 \quad (10)
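These properties can be checked numerically. The sketch below verifies them for the Epanechnikov kernel of equation (4) with a simple trapezoidal rule (the helper names are illustrative):

```python
def epanechnikov(u):
    """Epanechnikov kernel: K(u) = 3/4 (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def check_kernel(K, lo=-1.0, hi=1.0, steps=20000):
    """Trapezoidal checks of the kernel properties (8)-(10):
    integral equal to 1, symmetry, positive second moment."""
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    integral = sum(K(x) for x in xs) * h - 0.5 * h * (K(lo) + K(hi))
    second_moment = sum(x * x * K(x) for x in xs) * h
    symmetric = all(abs(K(x) - K(-x)) < 1e-12 for x in xs)
    return integral, second_moment, symmetric
```

The second moment of the Epanechnikov kernel is 1/5, which the numeric check recovers.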
Kernel Selection

"It turns out that a range of kernel functions result in estimators having similar relative efficiencies, so one could choose the kernel based on computational considerations, the Gaussian kernel being a popular choice..." (Racine)

In order to approximate our data we choose the Gaussian kernel:

K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2} \quad (11)

Considering big data, the Gaussian kernel is the simplest to interpret. "...unlike choosing a kernel function, however, choosing an appropriate bandwidth is a crucial aspect of sound nonparametric analysis" (Racine)
Kernel Selection

Figure: Kernel Choice and Kernel Density Estimation
The figure shows the kernel density estimation computed using a Gaussian kernel and a bandwidth of h = 0.3 (R code by François 2012)
BTS: Bandwidth Selection

We show the impact of different selected bandwidths (using three choices: low, high and Sheather-Jones) on the beanplot time series. In the example we consider a yearly interval for the beanplot observation related to the Dow Jones Index. This interval can be validated by considering the temporal horizons over which events in these data (stocks) can occur: in risk management applications the relevant interval is the year (to take into account the risks of financial crisis). Considering the bandwidth we can observe:
- Low bandwidth: tends to show many bumps, or to maximize the number of bumps per beanplot.
- High bandwidth: we tend to have a more regular shape of the density traces; however, the risk here is to lose some information.
- Sheather-Jones method: the bandwidth changes beanplot by beanplot, so the bandwidth itself becomes an indicator of variability.
Usually the impact of both bandwidth selection and kernel selection is assessed by simulation.
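The Sheather-Jones computation is involved; as a simpler stand-in of the same data-driven flavour, Silverman's rule of thumb can be sketched as below. This is an assumption for illustration only, not the method used for the slides' figures:

```python
import statistics

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5).
    A simple data-driven bandwidth; the slides instead use Sheather-Jones."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    return 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)
```

Because the bandwidth depends on the sample's spread, it changes beanplot by beanplot, matching the remark that the bandwidth itself becomes an indicator of variability.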
BTS: Bandwidth Selection

Figure: Yearly beanplot time series on Dow Jones daily data. Different bandwidth choices on the beanplot time series: low bandwidth h = 8, high bandwidth h = 102, and the Sheather-Jones method (which uses pilot estimates of derivatives to choose the bandwidth). Kernel selected: Gaussian.
The Impact of the Kernel and Bandwidth Selection

It is possible to explore the beanplot data characteristics using different kernels and bandwidths. We choose the Gaussian kernel (for its flexibility) and the bandwidth obtained by the Sheather-Jones method (to explore the data structure).
Beanplot Time Series: Characteristics

- Beanline: the mean or the median.
- Beanplot lower and upper bound: [X]_t = [X_{t,L}, X_{t,U}] with -\infty < X_{t,L} \le X_{t,U} < +\infty
- Beanplot center and radius: [X]_t = (X_{t,C}, X_{t,R}) where X_{t,C} = (X_{t,L} + X_{t,U})/2 and X_{t,R} = (X_{t,U} - X_{t,L})/2
- Quantiles

Main characteristics:
- Location: the beanline mean, the beanplot center.
- Size: the beanplot radius, lower and upper bounds.
- Shape: the h parameter regulates the density trace, so the lower the bandwidth the wigglier the density function. The h parameter can be obtained using the Sheather-Jones method (see Kampstra (2008)). There are relevant effects also on the kurtosis.
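The bound, center and radius definitions above translate directly into code (a minimal sketch; the function name is illustrative):

```python
def beanplot_bounds(obs):
    """Lower/upper bound, center and radius of a beanplot observation:
    X_C = (X_L + X_U)/2 and X_R = (X_U - X_L)/2."""
    lo, hi = min(obs), max(obs)
    return {"lower": lo, "upper": hi,
            "center": (lo + hi) / 2, "radius": (hi - lo) / 2}
```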
Beanplot Time Series: Characteristics

Intra-period and inter-period variability: the yearly beanplot time series on Dow Jones daily data allows the identification of structural changes and intra-period variability patterns. The kernel chosen is the Gaussian; the bandwidth is obtained by means of the Sheather-Jones method.
Beanplot Modeling: Choosing the Class of the Model

- We consider the symbolic aggregation approach, taking the day as the temporal interval
- We take a frequency-domain approach in order to extract the relevant daily patterns
- At this point we choose the class of the model: in particular the number of mixture components to use, the distributions considered, and so on. In our case we choose two mixture components because the gof indexes show a good approximation of the data; at the same time, the Gaussian distribution maximizes the gof index in the experiments on data we have performed.
- From the relevant daily data we extract the relevant parameters by the parameterization procedure. In particular we consider a finite mixture model for each density function.
Beanplot Parameterization

In order to compare and to analyse the beanplot time series we need to parameterize the different beanplots. The aims of the parameterization are:
- Synthesizing the beanplot observations
- Comparing, analysing and interpreting the beanplot observations
- Storing big data
In this sense:
- We consider a kernel density estimation of the density function (a bandwidth h and a kernel K). We obtain: B^K_t
- We fit a finite mixture model of the density function. We obtain: B^M_t
- Model diagnostics and model fit
Beanplot by Mixture Models

Parameterization is important because the stored relevant information of the beanplots can be used in clustering and in forecasting. With the aim of parameterization we estimate the model parameters as a finite mixture density function. So we have:

B^M_t = \sum_{j=1}^{J} \pi_j f(x \mid \theta_j) \quad (12)

where \pi_1, ..., \pi_J are scalars and \theta_1, ..., \theta_J are vectors of parameters. Here 0 \le \pi_j \le 1 and \pi_1 + \pi_2 + \dots + \pi_J = 1. Therefore we obtain A^\mu_t (means), A^\sigma_t (standard deviations), A^p_t (weights). We use Gaussian distributions for their flexibility, and Maximum Likelihood Estimation for the estimation of the parameters.
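A minimal sketch of fitting equation (12) with two Gaussian components via the EM algorithm (a generic textbook EM with a crude initialisation, not the authors' estimation code; function names are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_mixture2(data, iters=200):
    """EM for B_t^M = pi1*N(mu1, s1) + pi2*N(mu2, s2).
    Returns (weights, means, standard deviations)."""
    data = sorted(data)
    n = len(data)
    half = n // 2  # crude initialisation: split the sorted sample in half
    mu = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    s = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            w = [pi[j] * normal_pdf(x, mu[j], s[j]) for j in range(2)]
            tot = sum(w)
            resp.append([wj / tot for wj in w])
        # M-step: update weights, means and standard deviations
        for j in range(2):
            nj = sum(r[j] for r in resp)
            pi[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            s[j] = max(math.sqrt(var), 1e-6)
    return pi, mu, s
```

The fitted triple (pi, mu, s) is exactly the parameter set (A^p_t, A^mu_t, A^sigma_t) stored for each beanplot.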
B^M_t Parameters Interpretation

The parameters can be interpreted in this way:
- \mu_j represent the main intra-period characteristics: for example, in the financial context, the values around which the price of a stock has gathered over time. Changes in \mu_j can occur in the presence of structural changes.
- \sigma_j represent the intra-period variability, which in financial terms corresponds to volatility. Changes in \sigma_j can occur in the presence of financial news (higher or lower intra-period volatility).
- \pi_j represent the relative weight of each distinct group of observations. Changes in \pi_j are related to intra-period changes.
Number of B^M_t Parameters

The number of parameters to estimate depends on the number of components (C) in the mixture. A feasible solution needs to be a compromise between comparability, simplicity and usability. After the estimation of the model it is necessary to consider the quality of the fit.
Figure: Beanplot Model with C = 2
Weighting

For every finite mixture model we measure the fit of the model by using a goodness-of-fit index. The index measures how well the model fits the initial data: 1 represents the highest level of fit, and 0 the minimum. This index is used to weight the observations in all the subsequent models of models, in order to give less weight to the observations whose models do not represent the data adequately. At the same time, observations with higher goodness of fit are weighted more.
Multiple Beanplot Time Series (MBTS)

Here, with the aim of creating a representative market index, we consider a beanplot in order to take into account the intra-period variation. In particular we construct a beanplot market index in order to represent the entire market risk. A beanplot market index has relevant applications in risk management, to anticipate risk over time. At the same time a beanplot market index can reflect the state of an economy and the sentiment of investors, and help investment decisions. So in this sense we extend our previous approach for single beanplot analysis to the case of multiple beanplot time series.
Multiple Beanplot Time Series

A multiple beanplot time series can be defined as the simultaneous observation of more than one beanplot time series. For example, we can observe the beanplot time series related to more than one financial market. By considering the multiple beanplot time series related to a market, the resulting synthesis will be a beanplot representing the entire market (an index of the entire market, for example FTSE MIB in the Italian case). Possible real applications:
- Exploratory time series analysis
- Constructing composite indicators based on multiple beanplot time series
- Portfolio selection
- Change point detection
- Forecasting
Multiple Beanplot Time Series Analysis

We consider four different methods with different aims:
- Multiple Factor Analysis, with the aim of seeking the common structure of the blocks describing the multiple beanplot time series.
- Clustering, with the aim of detecting relevant subgroups over time and finding similar beanplot observations. These observations can be related to different stocks; the results can be used in portfolio selection strategies.
- Constrained clustering, with the aim of detecting relevant subperiods in a beanplot time series. These relevant subperiods, represented by groups of beanplots over time, can be used in order to detect market change points.
- Forecasting, with the aim of predicting the observations over time. The models can be used in trading.
Beanplot Multiple Factor Analysis (BMFA)

The aim of the method is to synthesize the different beanplot multiple time series in order to obtain indexes of the market or the portfolio over time. The indexes can be used in order to take decisions. We consider the gof, as the capacity of the models to approximate the original data, one of the most important elements in building the index. We parameterize the different beanplot time series; in this case we obtain the parameters related to the weights, the means and the variances for each datum. In this example we visualize the first parameter, the weight related to the first mixture component (matrix with columns m1.p1 ... m7.p1; values not reproduced in this transcription).
Beanplot Multiple Factor Analysis

Analogous matrices are visualized for the weight related to the second mixture component (columns m1.p2 ... m7.p2), for the means of the first and second mixture components (columns m1.m1 ... m7.m1 and m1.m2 ... m7.m2), and for the variance parameters of the first and second mixture components (columns m1.s1 ... m7.s1 and m1.s2 ... m7.s2); values not reproduced in this transcription.
Beanplot Multiple Factor Analysis

We obtain as well the gof index for each mixture model (columns m1.gof ... m7.gof; values not reproduced in this transcription). Each model is represented by its parameters and by the gof index. The gof index is necessary in order to give a different weight to the observations that have a lower gof in the different models.
Beanplot Multiple Factor Analysis

We can obtain the index as beanplots from the block PCA, weighted by the gof index. At the end of the procedure we obtain the beanplot prototype time series. The global PCA is performed on a matrix with the merged initial datasets (Abdi and Valentin 2007).
Figure: MFA Beanplot Prototype Time Series
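A sketch of the block-PCA idea behind MFA: each parameter block is centred and scaled by its first singular value (the standard MFA weighting), the slides' gof index is applied as an extra block weight (this is an assumption about how the weighting enters), and a global PCA is run on the merged matrix. The function name is illustrative:

```python
import numpy as np

def mfa_scores(blocks, gof, n_components=2):
    """MFA sketch: `blocks` is a list of (T x p_k) arrays, one per model;
    `gof` is a list of fit indexes in [0, 1] used as extra block weights.
    Returns the first `n_components` global PCA scores for the T times."""
    weighted = []
    for X, w in zip(blocks, gof):
        Xc = X - X.mean(axis=0)                      # centre the block
        s1 = np.linalg.svd(Xc, compute_uv=False)[0]  # first singular value
        weighted.append(w * Xc / s1)                 # MFA + gof weighting
    Z = np.hstack(weighted)
    # global PCA via SVD of the merged, weighted matrix
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

The rows of the returned score matrix play the role of the prototype coordinates over time.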
Beanplot Multiple Factor Analysis

By considering the correlation circle we can observe the variables of high-performing stocks (represented by higher means) versus the characteristics of low means (x-axis). At the same time we are able to see characterizations of higher volatility on the y-axis.
Figure: Correlation Circle
Beanplot Multiple Factor Analysis

We also obtain the individual factor maps and the groups representation. These results can be interpreted financially:
- Individual factor map (1) shows the characteristics of the different temporal observations. We can observe here the dynamics over time of the market as a whole.
- Individual factor map (2) shows the way the different stocks (represented by the different models) perform over time. It is possible to read that some stocks tend to grow more than others, so they seem to be good opportunities (model 2 and model 5).
- The groups representation shows the portfolio selection by considering the different performances of the stocks (or models). In this context a strategy of picking first of all stocks 5 and 7, then 1 and 2, seems reasonable: overall these stocks seem convenient considering their performances over time. The plot is useful in order to discriminate good stocks from others.
We use the gof index in order to weight the observations accordingly.
Figure: Individual Factor Map (1)
Figure: Individual Factor Map (2)
Figure: Groups representation
Beanplot Clustering

The aim of the clustering procedure is to find groups of different beanplot models, or stocks, which are most similar on a given day. The procedure can be very useful in stock-picking processes. In this context a relevant distance is the model distance by Lauro, Romano and Giordano (2006). By using the appropriate distance we are able to discover that stocks 2 and 3 perform very peculiarly within the group of stocks considered. Stocks 1 and 7 together show a very low gof. Finally, we are able to discriminate the different stock typologies.
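A minimal sketch of clustering models through their parameter vectors. A plain Euclidean distance is used here as a stand-in for the Lauro-Romano-Giordano model distance, and the helper names are illustrative:

```python
import math

def model_distance(a, b):
    """Euclidean distance between two models described by their mixture
    parameters (weights, means, standard deviations): a simple stand-in
    for the model distance of Lauro, Romano and Giordano (2006)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_pair(models):
    """Return the names of the two closest models in a dict mapping
    model name -> parameter vector (the first agglomeration step)."""
    keys = list(models)
    best = None
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            d = model_distance(models[keys[i]], models[keys[j]])
            if best is None or d < best[0]:
                best = (d, keys[i], keys[j])
    return best[1], best[2]
```

Repeating `nearest_pair` on merged groups gives a basic agglomerative clustering of the stocks.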
Beanplot Clustering

Table of the model parameters for each stock (columns: model, t, p1, p2, m1, m2, s1, s2, gof; values not reproduced in this transcription).
Figure: Clustering
Beanplot Constrained Clustering

The aim of the constrained clustering procedure is to find groups of beanplots (or models) which are similar over time. The final results can be used to detect relevant change points over time. Also in this case the relevant distance is the model distance by Lauro, Romano and Giordano (2006). The results show a very unstable situation for the first three observations: in this context we can detect three change points in the first three observations. Then the period 4-5 and the period 6-8 show relevant similarities. Overall, periods 1, 2 and 3 are very risky, because the gof level is comparatively not so high.
Beanplot Constrained Clustering

Table of the model parameters over time (columns: t, p1, p2, m1, m2, s1, s2, gof; values not reproduced in this transcription).
Figure: Constrained Clustering
Beanplot Forecasting

In order to adequately predict the observations related to the beanplot models over time we can use a forecasting procedure based on the VAR. The aim of the procedure is to predict each observation over time by choosing an adequate VAR model. The models take into account the weights based on the gof. The predicted parameters allow us to obtain the predicted models.
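A minimal VAR(1) sketch fitted by least squares on the parameter series: each row of Y holds the model parameters at one time, and the forecast of the next row yields the predicted model. This is a generic VAR estimator, without the gof weighting, and the function names are illustrative:

```python
import numpy as np

def fit_var1(Y):
    """Least-squares VAR(1): y_t = c + A y_{t-1} + e_t.
    Y is a (T x k) matrix of parameter series; returns (c, A)."""
    X = np.hstack([np.ones((len(Y) - 1, 1)), Y[:-1]])  # intercept + lag
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return B[0], B[1:].T

def forecast_var1(Y, c, A):
    """One-step-ahead forecast from the last observation."""
    return c + A @ Y[-1]
```

Forecasting the parameter vector and plugging it back into the mixture density reconstructs the predicted beanplot.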
Beanplot Forecasting

Table of the predicted parameters (columns V1 ... V8, with the prediction row; values not reproduced in this transcription).
Figure: Forecasting: the real beanplot to predict (left) and the forecast (right)
Conclusions

- The application of beanplots as symbolic data seems to be very fruitful for financial big data.
- The use of models based on the beanplots allows us to retain the relevant information through the parameters of the models as well.
- A fundamental point is to use the error in weighting the different models and observations. In this context we have shown that the use of the error improves the results.
- The different models allow us to detect relevant patterns in the data which can be exploited in various financial operations such as trading, risk management and so on.
- As a future development we will consider these methodologies in other contexts, for example control charts, in order to evaluate the stability of the markets and build relevant alert systems.
More informationSalford Systems Predictive Modeler Unsupervised Learning. Salford Systems
Salford Systems Predictive Modeler Unsupervised Learning Salford Systems http://www.salford-systems.com Unsupervised Learning In mainstream statistics this is typically known as cluster analysis The term
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationMachine Learning Lecture 3
Machine Learning Lecture 3 Probability Density Estimation II 19.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Exam dates We re in the process
More informationPoints Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked
Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations
More informationIntroduction to Nonparametric/Semiparametric Econometric Analysis: Implementation
to Nonparametric/Semiparametric Econometric Analysis: Implementation Yoichi Arai National Graduate Institute for Policy Studies 2014 JEA Spring Meeting (14 June) 1 / 30 Motivation MSE (MISE): Measures
More informationMachine Learning Lecture 3
Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 2
Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationComputer Vision 6 Segmentation by Fitting
Computer Vision 6 Segmentation by Fitting MAP-I Doctoral Programme Miguel Tavares Coimbra Outline The Hough Transform Fitting Lines Fitting Curves Fitting as a Probabilistic Inference Problem Acknowledgements:
More informationUnit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys
Unit 7 Statistics AFM Mrs. Valentine 7.1 Samples and Surveys v Obj.: I will understand the different methods of sampling and studying data. I will be able to determine the type used in an example, and
More informationAnalysis of Functional MRI Timeseries Data Using Signal Processing Techniques
Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October
More informationOutline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model
Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses
More informationChristoHouston Energy Inc. (CHE INC.) Pipeline Anomaly Analysis By Liquid Green Technologies Corporation
ChristoHouston Energy Inc. () Pipeline Anomaly Analysis By Liquid Green Technologies Corporation CHE INC. Overview: Review of Scope of Work Wall thickness analysis - Pipeline and sectional statistics Feature
More informationBBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler
BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for
More informationThe Ohio State University Columbus, Ohio, USA Universidad Autónoma de Nuevo León San Nicolás de los Garza, Nuevo León, México, 66450
Optimization and Analysis of Variability in High Precision Injection Molding Carlos E. Castro 1, Blaine Lilly 1, José M. Castro 1, and Mauricio Cabrera Ríos 2 1 Department of Industrial, Welding & Systems
More informationCOMPUTATIONAL STATISTICS UNSUPERVISED LEARNING
COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING Luca Bortolussi Department of Mathematics and Geosciences University of Trieste Office 238, third floor, H2bis luca@dmi.units.it Trieste, Winter Semester
More informationAaron Daniel Chia Huang Licai Huang Medhavi Sikaria Signal Processing: Forecasting and Modeling
Aaron Daniel Chia Huang Licai Huang Medhavi Sikaria Signal Processing: Forecasting and Modeling Abstract Forecasting future events and statistics is problematic because the data set is a stochastic, rather
More informationWhat s New in Spotfire DXP 1.1. Spotfire Product Management January 2007
What s New in Spotfire DXP 1.1 Spotfire Product Management January 2007 Spotfire DXP Version 1.1 This document highlights the new capabilities planned for release in version 1.1 of Spotfire DXP. In this
More information8/3/2017. Contour Assessment for Quality Assurance and Data Mining. Objective. Outline. Tom Purdie, PhD, MCCPM
Contour Assessment for Quality Assurance and Data Mining Tom Purdie, PhD, MCCPM Objective Understand the state-of-the-art in contour assessment for quality assurance including data mining-based techniques
More informationComputer Experiments: Space Filling Design and Gaussian Process Modeling
Computer Experiments: Space Filling Design and Gaussian Process Modeling Best Practice Authored by: Cory Natoli Sarah Burke, Ph.D. 30 March 2018 The goal of the STAT COE is to assist in developing rigorous,
More informationLecture 9: Hough Transform and Thresholding base Segmentation
#1 Lecture 9: Hough Transform and Thresholding base Segmentation Saad Bedros sbedros@umn.edu Hough Transform Robust method to find a shape in an image Shape can be described in parametric form A voting
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationAnalysing Search Trends
Data Mining in Business Intelligence 7 March 2013, Ben-Gurion University Analysing Search Trends Yair Shimshoni, Google R&D center, Tel-Aviv. shimsh@google.com Outline What are search trends? The Google
More informationData transformation in multivariate quality control
Motto: Is it normal to have normal data? Data transformation in multivariate quality control J. Militký and M. Meloun The Technical University of Liberec Liberec, Czech Republic University of Pardubice
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning
More informationIntroduction to Trajectory Clustering. By YONGLI ZHANG
Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem
More informationVocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.
5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table
More informationSolution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013
Your Name: Your student id: Solution Sketches Midterm Exam COSC 6342 Machine Learning March 20, 2013 Problem 1 [5+?]: Hypothesis Classes Problem 2 [8]: Losses and Risks Problem 3 [11]: Model Generation
More informationSome questions of consensus building using co-association
Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper
More informationActive Appearance Models
Active Appearance Models Edwards, Taylor, and Cootes Presented by Bryan Russell Overview Overview of Appearance Models Combined Appearance Models Active Appearance Model Search Results Constrained Active
More informationMachine Learning (BSMC-GA 4439) Wenke Liu
Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised
More informationUser Behaviour and Platform Performance. in Mobile Multiplayer Environments
User Behaviour and Platform Performance in Mobile Multiplayer Environments HELSINKI UNIVERSITY OF TECHNOLOGY Systems Analysis Laboratory Ilkka Hirvonen 51555K 1 Introduction As mobile technologies advance
More information2014 Stat-Ease, Inc. All Rights Reserved.
What s New in Design-Expert version 9 Factorial split plots (Two-Level, Multilevel, Optimal) Definitive Screening and Single Factor designs Journal Feature Design layout Graph Columns Design Evaluation
More information3 Graphical Displays of Data
3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More informationMachine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham
Final Report for cs229: Machine Learning for Pre-emptive Identification of Performance Problems in UNIX Servers Helen Cunningham Abstract. The goal of this work is to use machine learning to understand
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationReview of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga.
Americo Pereira, Jan Otto Review of feature selection techniques in bioinformatics by Yvan Saeys, Iñaki Inza and Pedro Larrañaga. ABSTRACT In this paper we want to explain what feature selection is and
More informationChapter 10. Conclusion Discussion
Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with
More informationELEC Dr Reji Mathew Electrical Engineering UNSW
ELEC 4622 Dr Reji Mathew Electrical Engineering UNSW Review of Motion Modelling and Estimation Introduction to Motion Modelling & Estimation Forward Motion Backward Motion Block Motion Estimation Motion
More informationData Mining and Analytics. Introduction
Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data
More informationGraph Structure Over Time
Graph Structure Over Time Observing how time alters the structure of the IEEE data set Priti Kumar Computer Science Rensselaer Polytechnic Institute Troy, NY Kumarp3@rpi.edu Abstract This paper examines
More information* Hyun Suk Park. Korea Institute of Civil Engineering and Building, 283 Goyangdae-Ro Goyang-Si, Korea. Corresponding Author: Hyun Suk Park
International Journal Of Engineering Research And Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 13, Issue 11 (November 2017), PP.47-59 Determination of The optimal Aggregation
More informationCHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and
CHAPTER-13 Mining Class Comparisons: Discrimination between DifferentClasses: 13.1 Introduction 13.2 Class Comparison Methods and Implementation 13.3 Presentation of Class Comparison Descriptions 13.4
More informationThis tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.
About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts
More informationLecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)
School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression
More information3 Feature Selection & Feature Extraction
3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy
More informationData Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets. Fernando Chirigati Harish Doraiswamy Theodoros Damoulas
Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets Fernando Chirigati Harish Doraiswamy Theodoros Damoulas Juliana Freire New York University New York University University
More informationThe Perils of Unfettered In-Sample Backtesting
The Perils of Unfettered In-Sample Backtesting Tyler Yeats June 8, 2015 Abstract When testing a financial investment strategy, it is common to use what is known as a backtest, or a simulation of how well
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationApplication of Clustering Techniques to Energy Data to Enhance Analysts Productivity
Application of Clustering Techniques to Energy Data to Enhance Analysts Productivity Wendy Foslien, Honeywell Labs Valerie Guralnik, Honeywell Labs Steve Harp, Honeywell Labs William Koran, Honeywell Atrium
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationWelcome to Analytics. Welcome to Applause! Table of Contents:
Welcome to Applause! Your success is our priority and we want to make sure Applause Analytics (ALX) provides you with actionable insight into what your users are thinking and saying about their experiences
More informationApplied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University
Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University NIPS 2008: E. Sudderth & M. Jordan, Shared Segmentation of Natural
More informationAutomate Transform Analyze
Competitive Intelligence 2.0 Turning the Web s Big Data into Big Insights Automate Transform Analyze Introduction Today, the web continues to grow at a dizzying pace. There are more than 1 billion websites
More informationMATH3016: OPTIMIZATION
MATH3016: OPTIMIZATION Lecturer: Dr Huifu Xu School of Mathematics University of Southampton Highfield SO17 1BJ Southampton Email: h.xu@soton.ac.uk 1 Introduction What is optimization? Optimization is
More informationLocating Salient Object Features
Locating Salient Object Features K.N.Walker, T.F.Cootes and C.J.Taylor Dept. Medical Biophysics, Manchester University, UK knw@sv1.smb.man.ac.uk Abstract We present a method for locating salient object
More informationPredict Outcomes and Reveal Relationships in Categorical Data
PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,
More informationHigh-Dimensional Incremental Divisive Clustering under Population Drift
High-Dimensional Incremental Divisive Clustering under Population Drift Nicos Pavlidis Inference for Change-Point and Related Processes joint work with David Hofmeyr and Idris Eckley Clustering Clustering:
More informationCS Introduction to Data Mining Instructor: Abdullah Mueen
CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts
More informationConditional Volatility Estimation by. Conditional Quantile Autoregression
International Journal of Mathematical Analysis Vol. 8, 2014, no. 41, 2033-2046 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijma.2014.47210 Conditional Volatility Estimation by Conditional Quantile
More informationNetwork Heartbeat Traffic Characterization. Mackenzie Haffey Martin Arlitt Carey Williamson Department of Computer Science University of Calgary
Network Heartbeat Traffic Characterization Mackenzie Haffey Martin Arlitt Carey Williamson Department of Computer Science University of Calgary What is a Network Heartbeat? An event that occurs repeatedly
More informationINDEX UNIT 4 PPT SLIDES
INDEX UNIT 4 PPT SLIDES S.NO. TOPIC 1. 2. Screen designing Screen planning and purpose arganizing screen elements 3. 4. screen navigation and flow Visually pleasing composition 5. 6. 7. 8. focus and emphasis
More informationData Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining
Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Data Preprocessing Aggregation Sampling Dimensionality Reduction Feature subset selection Feature creation
More information