Anonymisation of Public Use Data Sets

Size: px
Start display at page:

Download "Anonymisation of Public Use Data Sets"

Transcription

1 Anonymsaton of Publc Use Data Sets Methods for Reducng Dsclosure Rsk and the Analyss of Perturbed Data Harvey Goldsten Unversty of Brstol and Unversty College London and Natale Shlomo Unversty of Manchester 1

2 The problem and some solutons Release of large (pseudonymsed datasets for analyss potentally allows statstcal attack va searchng for records satsfyng certan constrants (e.g. age, locaton, medcaton.. Standard soluton s to degrade data values to make t unlkely that an attacker could correctly dentfy ndvduals. Typcally judge usng k-anonymty Two types of dsclosure control methods under safe data approach 1. Non-perturbatve methods reduce nformaton content 2. Perturbatve methods alters the data to ncrease uncertanty of dentfcaton 2

3 The problem and some solutons Non-perturbatve methods: 1. Remove cells wth small counts f data n tabular form, preservng margns 2. Delete senstve varables 3. Group categores or categorse contnuous varables of dsclosve varables such as postcode, age. 4. Sub-sample Perturbatve methods: 1. Add random nose to ncrease uncertanty around correct dentfcaton (ths ncludes random msclassfcaton for categorcal varables 2. mcro-aggregaton of smlar cases (effectvely reduces varaton 3. Create synthetc data values whle preservng data structure 3

4 Effects on statstcal analyss a key concern 1 Cell removal: may over - coarsen data and n partcular remove nterestng nteracton effects 2 Groupng: lke (1 may smooth over comple relatonshps 3 Addton of random nose wll lead to ncorrect standard errors and also based coeffcents n generalsed lnear models unless properly adjusted for 4 Synthetc data may lead to severely based coeffcents f analyss models do not nclude varables used n the synthess 4

5 Synthetc data Synthetc data: reles on assumed or modelled data relatonshps to smulate (mpute new data that appromates real data. Ths can be done for all data or a subset. Producng multply mputed datasets allows correctons to be made for mputaton varance or appromatons avalable. Few would advocate that such data should be used for a fnal analyss: rather they can provde an ndcaton for a small set of fnal models that can then smply be ftted (n a secure envronment to produce requred model estmates. 5

6 Synthetc data Ths poses partcular problems: 1. There s a strong relance on producng the rght structure, typcally va a seres of condtonal models. 2. Even usng synthetc data n eploratory mode can lead users astray, where ther models based upon an appromaton to the true structure become based, and lead to the selecton of napproprate fnal models to be estmated usng the real data. 6

7 Addng random nose n general Addng random nose s less etreme than synthetc data. We suppose that the attacker has avalable a set of q values for y (the varables to be used, say yy that she ntends to match aganst records n the data set. We propose to construct a new set of varables, z, whch s what the attacker wll see zz yy + mm where m has a predefned (normal dstrbuton (other dstrbutons are possble e.g. dfferental prvacy technques often use a double eponental dstrbuton For smplcty, assume ndependence across varables to be dsturbed or we mght consder the case of correlated nose (correlated wth true values to preserve the correlaton structure and suffcent statstcs. Note that y can be contnuous or dscrete (categores numbered 1,,p 7

8 Addng random nose n general The value of the varance (σσ 2 mm wll determne the strength of the resstance to attack and can be a functon of the true varablty of each varable. We now form a measure of the dstance between the yy and each z and then rank these dstances. A general dstance measure can be wrtten n the form DD zz yy TT WW(zz yy, where, for eample, WW 1 Ω kk But, more smply we can choose the Eucldean dstance for each comparson record DD qq jj1 (zz yy jj 2, qq DD jj1 (zz yy jj 2, 1,., nn 8

9 Rankng the dstances A ratonal attacker chooses closest record(s to ther own as the correct one(s. Form RR RRRRRRRR(DD, RR RRRRRRRR DD, Defne value of for RR 1. Defne h RR 1, For eample: Thus f h0 we have the correct match. RR RR , so h f attacker chooses closest record. h measures dfference between chosen and correct method So choose nose added large enough so that, say, Pr h < pp < εε (say, pp 3, εε 0.1 9

10 A smulaton Generate 10 3 records wth 5 normal varables and σσ 2 mm 0.1 All varances 1 and covarances For each true value record (attacker s y generate DD, DD The followng table gves some estmates of dsclosveness n terms of h for a range of ndvduals at dfferent dstances from the medan. 10

11 Dstrbuton for h hh Cumulatve percentle of D dstrbuton

12 More results Lowest decle Pr(h>5. For combnatons of Ω aaaaaa σσ mm 2 where Ω always has unt dagonal elements and equal off-dagonal elements (gven by columns are shown. Sample sze σσ mm We see that the procedure s readly tuned smply by changng the varance of the nose. We are also studyng the possblty of a more sophstcated attack that uses vales of y predcted from the perturbed dataset rather than the z themselves. 12

13 The h-nde and k-anonymsaton If we have, say, 2-anonymty ths mples that an attacker s able to dentfy two ndvdual records matchng her own nformaton, so choosng ether of them at random means that there s a probablty of 0.5 that t s the correct one. The h-nde, however, only yelds a sngle ndvdual as the closest, for eample wth a probablty about 0.5 and thus provdes less nformaton to the attacker than n the case of 2-anonymty. 13

14 The h-nde and k-anonymzaton II For k-anonymty an attacker may be qute content that they can access 2 or perhaps even 5 records contanng the one that s sought. By contrast, wth the h-nde procedure, n our most favourable case, the probablty of the sought-for ndvdual beng one of the two nearest s just over 60% and one of the fve nearest just under 80% Thus t could be argued that ths s suffcent to deter an attacker and hence sutable n terms of dsclosveness. In practce careful attenton needs to be pad to the amount of nose requred to satsfy dsclosure concerns. 14

15 How to remove the nose Assume nose η ~ d(0, 2 σ η add to contnuous varable We get unbased totals and means but larger varance and bases where predctors ncorporate nose How to make correct nferences n a general modellng framework? y η 2 σ η Assume a smple regresson model wth a dependent varable that has been subjected to Gaussan addtve nose wth a mean of 0 and a postve varance The predctor varable s error free we assume

16 16 16 The model s: where denotes the true but unobserved value of the dependent varable If we regress on then snce y y y n y η ε β α 1,...,, y y (, ( (, (, ( (, ( (, ( Var y Cov Var Cov y Cov Var y Cov Var y Cov + + η η β 0, ( Cov η How to remove the nose

17 How to remove the nose Addtve nose on the dependent varable thus does not bas slope coeffcent but ncreases standard errors due to the ncrease n varance Var ( y Var( y + Var( η Now add nose η to predctor varable The model s now: y α + + β η + ε, 1,..., n where denotes true but unobserved value of 17

18 How to remove the nose If we regress y on then for the least squares scope coeffcent: β Cov( y, Var( Cov( y, + η Var( + Var( η Cov( y, Var( + + Cov( y, Var( η η Cov( y, Var( + Var( η snce Cov( y, η 0 Addtve nose on predctor varable bases slope coeffcent downwards (attenuaton Thus we need sutable methodology to deal wth these measurement errors 18

19 How to remove the nose For the lease squares slope coeffcent n a smple lnear regresson: ˆ β p Cov( y, Var( + Var( We defne λ 1 / σ as the relablty rato η ( + σ η σ βσ η β (1 + σ 2 η / σ 2 σ 1 A consstent estmate of the slope coeffcent s obtaned by dvdng least squares estmate by λ 2 λ σ η To calculate we assumes that s released and known to the researcher

20 How to remove the nose n general Nose s random wth known propertes so a measurement error model s requred Ths requres that the parameters used to generate the nose are known to the researchers. Current work (usng a CLOSER grant at Brstol s underway to develop software to show how the nose should be generated n such a way that the parameters can be released under a predetermned h-nde to protect aganst attrbute dsclosure whlst preservng utlty 20

21 How to remove the nose In smple lnear regresson, correlated nose can be added whch produce unbased estmates of slope coeffcents by usng standard regresson technques. Current work at Brstol s developng algorthms ncorporatng measurement error models that wll handle generalzed lnear models and multlevel data of dfferent types. Specalsaton to anonymsaton wth user software currently funded through ESRC (va Closer at Brstol (Boyd, Goldsten and Burton Can be combned wth handlng mssng data values. Some loss of statstcal effcency but enables underlyng sgnal to be etracted and thus provdes unbased parameter estmates. 21

22 Further thoughts Often, a data attacker wll have no pre-estng ndvdual data and may trawl the dataset to dscover an nterestng record, for eample an ndvdual wth an unusual combnaton of values. They may then attempt to dentfy the real person usng other varables n the data record. Our procedure s also relevant to such an attack so long as the nose has been appled to the varables n queston. How to tune the nose and dfferental nose related to dentfablty of varables s an area for further research. For eample we mght wsh to add relatvely more nose to a varable such as heght than har colour. Now, t may well be the case that, condtonal on the data avalable to the attacker, a varable such as ncome can be predcted wth suffcent accuracy wthn ths dataset, and f the data structure s well appromated ether by removng nose or va synthess then ncome could be farly accurately predcted and ths may be suffcent for an attacker s purpose. Needs further consderaton. Provson of sutable analyss tools and tranng for data analysts s mportant dscussons are underway wth Government departments and agences through ADRN. 22

23 Thank you for your attenton 23

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics

NAG Fortran Library Chapter Introduction. G10 Smoothing in Statistics Introducton G10 NAG Fortran Lbrary Chapter Introducton G10 Smoothng n Statstcs Contents 1 Scope of the Chapter... 2 2 Background to the Problems... 2 2.1 Smoothng Methods... 2 2.2 Smoothng Splnes and Regresson

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Adjustment methods for differential measurement errors in multimode surveys

Adjustment methods for differential measurement errors in multimode surveys Adjustment methods for dfferental measurement errors n multmode surveys Salah Merad UK Offce for Natonal Statstcs ESSnet MM DCSS, Fnal Meetng Wesbaden, Germany, 4-5 September 2014 Outlne Introducton Stablsng

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

Parameter estimation for incomplete bivariate longitudinal data in clinical trials

Parameter estimation for incomplete bivariate longitudinal data in clinical trials Parameter estmaton for ncomplete bvarate longtudnal data n clncal trals Naum M. Khutoryansky Novo Nordsk Pharmaceutcals, Inc., Prnceton, NJ ABSTRACT Bvarate models are useful when analyzng longtudnal data

More information

Econometrics 2. Panel Data Methods. Advanced Panel Data Methods I

Econometrics 2. Panel Data Methods. Advanced Panel Data Methods I Panel Data Methods Econometrcs 2 Advanced Panel Data Methods I Last tme: Panel data concepts and the two-perod case (13.3-4) Unobserved effects model: Tme-nvarant and dosyncratc effects Omted varables

More information

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning

A Post Randomization Framework for Privacy-Preserving Bayesian. Network Parameter Learning A Post Randomzaton Framework for Prvacy-Preservng Bayesan Network Parameter Learnng JIANJIE MA K.SIVAKUMAR School Electrcal Engneerng and Computer Scence, Washngton State Unversty Pullman, WA. 9964-75

More information

Modeling Local Uncertainty accounting for Uncertainty in the Data

Modeling Local Uncertainty accounting for Uncertainty in the Data Modelng Local Uncertanty accontng for Uncertanty n the Data Olena Babak and Clayton V Detsch Consder the problem of estmaton at an nsampled locaton sng srrondng samples The standard approach to ths problem

More information

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010

Simulation: Solving Dynamic Models ABE 5646 Week 11 Chapter 2, Spring 2010 Smulaton: Solvng Dynamc Models ABE 5646 Week Chapter 2, Sprng 200 Week Descrpton Readng Materal Mar 5- Mar 9 Evaluatng [Crop] Models Comparng a model wth data - Graphcal, errors - Measures of agreement

More information

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007

Synthesizer 1.0. User s Guide. A Varying Coefficient Meta. nalytic Tool. Z. Krizan Employing Microsoft Excel 2007 Syntheszer 1.0 A Varyng Coeffcent Meta Meta-Analytc nalytc Tool Employng Mcrosoft Excel 007.38.17.5 User s Gude Z. Krzan 009 Table of Contents 1. Introducton and Acknowledgments 3. Operatonal Functons

More information

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2

A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN. Department of Statistics, Islamia College, Peshawar, Pakistan 2 Pa. J. Statst. 5 Vol. 3(4), 353-36 A CLASS OF TRANSFORMED EFFICIENT RATIO ESTIMATORS OF FINITE POPULATION MEAN Sajjad Ahmad Khan, Hameed Al, Sadaf Manzoor and Alamgr Department of Statstcs, Islama College,

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005

Exercises (Part 4) Introduction to R UCLA/CCPR. John Fox, February 2005 Exercses (Part 4) Introducton to R UCLA/CCPR John Fox, February 2005 1. A challengng problem: Iterated weghted least squares (IWLS) s a standard method of fttng generalzed lnear models to data. As descrbed

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

Announcements. Supervised Learning

Announcements. Supervised Learning Announcements See Chapter 5 of Duda, Hart, and Stork. Tutoral by Burge lnked to on web page. Supervsed Learnng Classfcaton wth labeled eamples. Images vectors n hgh-d space. Supervsed Learnng Labeled eamples

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

A Statistical Model Selection Strategy Applied to Neural Networks

A Statistical Model Selection Strategy Applied to Neural Networks A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos

More information

Lecture 5: Probability Distributions. Random Variables

Lecture 5: Probability Distributions. Random Variables Lecture 5: Probablty Dstrbutons Random Varables Probablty Dstrbutons Dscrete Random Varables Contnuous Random Varables and ther Dstrbutons Dscrete Jont Dstrbutons Contnuous Jont Dstrbutons Independent

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Random Varables and Probablty Dstrbutons Some Prelmnary Informaton Scales on Measurement IE231 - Lecture Notes 5 Mar 14, 2017 Nomnal scale: These are categorcal values that has no relatonshp of order or

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Review of approximation techniques

Review of approximation techniques CHAPTER 2 Revew of appromaton technques 2. Introducton Optmzaton problems n engneerng desgn are characterzed by the followng assocated features: the objectve functon and constrants are mplct functons evaluated

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL GROUP ANALYSIS Martn M. Mont UCLA Psychology NITP AGGREGATING MULTIPLE SUBJECTS When we conduct mult-subject analyss we are tryng to understand whether an effect s sgnfcant across a group of people. Whether

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour

6.854 Advanced Algorithms Petar Maymounkov Problem Set 11 (November 23, 2005) With: Benjamin Rossman, Oren Weimann, and Pouya Kheradpour 6.854 Advanced Algorthms Petar Maymounkov Problem Set 11 (November 23, 2005) Wth: Benjamn Rossman, Oren Wemann, and Pouya Kheradpour Problem 1. We reduce vertex cover to MAX-SAT wth weghts, such that the

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Why visualisation? IRDS: Visualization. Univariate data. Visualisations that we won t be interested in. Graphics provide little additional information

Why visualisation? IRDS: Visualization. Univariate data. Visualisations that we won t be interested in. Graphics provide little additional information Why vsualsaton? IRDS: Vsualzaton Charles Sutton Unversty of Ednburgh Goal : Have a data set that I want to understand. Ths s called exploratory data analyss. Today s lecture. Goal II: Want to dsplay data

More information

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches

Fuzzy Filtering Algorithms for Image Processing: Performance Evaluation of Various Approaches Proceedngs of the Internatonal Conference on Cognton and Recognton Fuzzy Flterng Algorthms for Image Processng: Performance Evaluaton of Varous Approaches Rajoo Pandey and Umesh Ghanekar Department of

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Mixed Linear System Estimation and Identification

Mixed Linear System Estimation and Identification 48th IEEE Conference on Decson and Control, Shangha, Chna, December 2009 Mxed Lnear System Estmaton and Identfcaton A. Zymns S. Boyd D. Gornevsky Abstract We consder a mxed lnear system model, wth both

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

A Semi-parametric Regression Model to Estimate Variability of NO 2

A Semi-parametric Regression Model to Estimate Variability of NO 2 Envronment and Polluton; Vol. 2, No. 1; 2013 ISSN 1927-0909 E-ISSN 1927-0917 Publshed by Canadan Center of Scence and Educaton A Sem-parametrc Regresson Model to Estmate Varablty of NO 2 Meczysław Szyszkowcz

More information

Comparing High-Order Boolean Features

Comparing High-Order Boolean Features Brgham Young Unversty BYU cholarsarchve All Faculty Publcatons 2005-07-0 Comparng Hgh-Order Boolean Features Adam Drake adam_drake@yahoo.com Dan A. Ventura ventura@cs.byu.edu Follow ths and addtonal works

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap

Empirical Distributions of Parameter Estimates. in Binary Logistic Regression Using Bootstrap Int. Journal of Math. Analyss, Vol. 8, 4, no. 5, 7-7 HIKARI Ltd, www.m-hkar.com http://dx.do.org/.988/jma.4.494 Emprcal Dstrbutons of Parameter Estmates n Bnary Logstc Regresson Usng Bootstrap Anwar Ftranto*

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Recognizing Faces. Outline

Recognizing Faces. Outline Recognzng Faces Drk Colbry Outlne Introducton and Motvaton Defnng a feature vector Prncpal Component Analyss Lnear Dscrmnate Analyss !"" #$""% http://www.nfotech.oulu.f/annual/2004 + &'()*) '+)* 2 ! &

More information

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters

Proper Choice of Data Used for the Estimation of Datum Transformation Parameters Proper Choce of Data Used for the Estmaton of Datum Transformaton Parameters Hakan S. KUTOGLU, Turkey Key words: Coordnate systems; transformaton; estmaton, relablty. SUMMARY Advances n technologes and

More information

Electrical analysis of light-weight, triangular weave reflector antennas

Electrical analysis of light-weight, triangular weave reflector antennas Electrcal analyss of lght-weght, trangular weave reflector antennas Knud Pontoppdan TICRA Laederstraede 34 DK-121 Copenhagen K Denmark Emal: kp@tcra.com INTRODUCTION The new lght-weght reflector antenna

More information

Analysis of Continuous Beams in General

Analysis of Continuous Beams in General Analyss of Contnuous Beams n General Contnuous beams consdered here are prsmatc, rgdly connected to each beam segment and supported at varous ponts along the beam. onts are selected at ponts of support,

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics Ths module s part of the Memobust Handbook on Methodology of Modern Busness Statstcs 26 March 2014 Theme: Donor Imputaton Contents General secton... 3 1. Summary... 3 2. General descrpton... 3 2.1 Introducton

More information

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty

More information

Lecture 4: Principal components

Lecture 4: Principal components /3/6 Lecture 4: Prncpal components 3..6 Multvarate lnear regresson MLR s optmal for the estmaton data...but poor for handlng collnear data Covarance matrx s not nvertble (large condton number) Robustness

More information

Three supervised learning methods on pen digits character recognition dataset

Three supervised learning methods on pen digits character recognition dataset Three supervsed learnng methods on pen dgts character recognton dataset Chrs Flezach Department of Computer Scence and Engneerng Unversty of Calforna, San Dego San Dego, CA 92093 cflezac@cs.ucsd.edu Satoru

More information

Intra-Parametric Analysis of a Fuzzy MOLP

Intra-Parametric Analysis of a Fuzzy MOLP Intra-Parametrc Analyss of a Fuzzy MOLP a MIAO-LING WANG a Department of Industral Engneerng and Management a Mnghsn Insttute of Technology and Hsnchu Tawan, ROC b HSIAO-FAN WANG b Insttute of Industral

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervsed Learnng and Clusterng Why consder unlabeled samples?. Collectng and labelng large set of samples s costly Gettng recorded speech s free, labelng s tme consumng 2. Classfer could be desgned

More information

C2 Training: June 8 9, Combining effect sizes across studies. Create a set of independent effect sizes. Introduction to meta-analysis

C2 Training: June 8 9, Combining effect sizes across studies. Create a set of independent effect sizes. Introduction to meta-analysis C2 Tranng: June 8 9, 2010 Introducton to meta-analyss The Campbell Collaboraton www.campbellcollaboraton.org Combnng effect szes across studes Compute effect szes wthn each study Create a set of ndependent

More information

Fusion Performance Model for Distributed Tracking and Classification

Fusion Performance Model for Distributed Tracking and Classification Fuson Performance Model for Dstrbuted rackng and Classfcaton K.C. Chang and Yng Song Dept. of SEOR, School of I&E George Mason Unversty FAIRFAX, VA kchang@gmu.edu Martn Lggns Verdan Systems Dvson, Inc.

More information

THE THEORY OF REGIONALIZED VARIABLES

THE THEORY OF REGIONALIZED VARIABLES CHAPTER 4 THE THEORY OF REGIONALIZED VARIABLES 4.1 Introducton It s ponted out by Armstrong (1998 : 16) that Matheron (1963b), realzng the sgnfcance of the spatal aspect of geostatstcal data, coned the

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

The Man-hour Estimation Models & Its Comparison of Interim Products Assembly for Shipbuilding

The Man-hour Estimation Models & Its Comparison of Interim Products Assembly for Shipbuilding Internatonal Journal of Operatons Research Internatonal Journal of Operatons Research Vol., No., 9 4 (005) The Man-hour Estmaton Models & Its Comparson of Interm Products Assembly for Shpbuldng Bn Lu and

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Control strategies for network efficiency and resilience with route choice

Control strategies for network efficiency and resilience with route choice Control strateges for networ effcency and reslence wth route choce Andy Chow Ru Sha Centre for Transport Studes Unversty College London, UK Centralsed strateges UK 1 Centralsed strateges Some effectve

More information

We Two Seismic Interference Attenuation Methods Based on Automatic Detection of Seismic Interference Moveout

We Two Seismic Interference Attenuation Methods Based on Automatic Detection of Seismic Interference Moveout We 14 15 Two Sesmc Interference Attenuaton Methods Based on Automatc Detecton of Sesmc Interference Moveout S. Jansen* (Unversty of Oslo), T. Elboth (CGG) & C. Sanchs (CGG) SUMMARY The need for effcent

More information

Available online at ScienceDirect. Procedia Environmental Sciences 26 (2015 )

Available online at   ScienceDirect. Procedia Environmental Sciences 26 (2015 ) Avalable onlne at www.scencedrect.com ScenceDrect Proceda Envronmental Scences 26 (2015 ) 109 114 Spatal Statstcs 2015: Emergng Patterns Calbratng a Geographcally Weghted Regresson Model wth Parameter-Specfc

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

5.0 Quality Assurance

5.0 Quality Assurance 5.0 Dr. Fred Omega Garces Analytcal Chemstry 25 Natural Scence, Mramar College Bascs of s what we do to get the rght answer for our purpose QA s planned and refers to planned and systematc producton processes

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES

REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES REFRACTIVE INDEX SELECTION FOR POWDER MIXTURES Laser dffracton s one of the most wdely used methods for partcle sze analyss of mcron and submcron sze powders and dspersons. It s quck and easy and provdes

More information

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems

A Similarity-Based Prognostics Approach for Remaining Useful Life Estimation of Engineered Systems 2008 INTERNATIONAL CONFERENCE ON PROGNOSTICS AND HEALTH MANAGEMENT A Smlarty-Based Prognostcs Approach for Remanng Useful Lfe Estmaton of Engneered Systems Tany Wang, Janbo Yu, Davd Segel, and Jay Lee

More information

LOOP ANALYSIS. The second systematic technique to determine all currents and voltages in a circuit

LOOP ANALYSIS. The second systematic technique to determine all currents and voltages in a circuit LOOP ANALYSS The second systematic technique to determine all currents and voltages in a circuit T S DUAL TO NODE ANALYSS - T FRST DETERMNES ALL CURRENTS N A CRCUT AND THEN T USES OHM S LAW TO COMPUTE

More information

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation

Solutions to Programming Assignment Five Interpolation and Numerical Differentiation College of Engneerng and Coputer Scence Mechancal Engneerng Departent Mechancal Engneerng 309 Nuercal Analyss of Engneerng Systes Sprng 04 Nuber: 537 Instructor: Larry Caretto Solutons to Prograng Assgnent

More information

Using Auxiliary Data for Adjustment In Longitudinal Research. Dirk Sikkel Joop Hox Edith de Leeuw

Using Auxiliary Data for Adjustment In Longitudinal Research. Dirk Sikkel Joop Hox Edith de Leeuw Usng Auxlary Data for Adjustment In Longtudnal Research Drk Skkel Joop Hox Edth de Leeuw 1. Longtudnal Research The use of longtudnal research desgns, such as cohort or panel studes, has become more and

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Graph-based Clustering

Graph-based Clustering Graphbased Clusterng Transform the data nto a graph representaton ertces are the data ponts to be clustered Edges are eghted based on smlarty beteen data ponts Graph parttonng Þ Each connected component

More information

Principal Component Inversion

Principal Component Inversion Prncpal Component Inverson Dr. A. Neumann, H. Krawczyk German Aerospace Centre DLR Remote Sensng Technology Insttute Marne Remote Sensng Prncpal Components - Propertes The Lnear Inverson Algorthm Optmsaton

More information

Support Vector Machines for Business Applications

Support Vector Machines for Business Applications Support Vector Machnes for Busness Applcatons Bran C. Lovell and Chrstan J Walder The Unversty of Queensland and Max Planck Insttute, Tübngen {lovell, walder}@tee.uq.edu.au Introducton Recent years have

More information

A Clustering Algorithm for Chinese Adjectives and Nouns 1

A Clustering Algorithm for Chinese Adjectives and Nouns 1 Clusterng lgorthm for Chnese dectves and ouns Yang Wen, Chunfa Yuan, Changnng Huang 2 State Key aboratory of Intellgent Technology and System Deptartment of Computer Scence & Technology, Tsnghua Unversty,

More information

Finite Population Small Area Interval Estimation

Finite Population Small Area Interval Estimation Journal of Offcal Statstcs, Vol. 23, No. 2, 2007, pp. 223 237 Fnte Populaton Small Area Interval Estmaton L-Chun Zhang 1 Small area nterval estmaton s consdered for a fnte populaton, where the small area

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Outlier Detection based on Robust Parameter Estimates

Outlier Detection based on Robust Parameter Estimates Outler Detecton based on Robust Parameter Estmates Nor Azlda Aleng 1, Ny Ny Nang, Norzan Mohamed 3 and Kasyp Mokhtar 4 1,3 School of Informatcs and Appled Mathematcs, Unverst Malaysa Terengganu, 1030 Kuala

More information