Application Presentation - SASECA. Sistema Asociativo para la Selección de Características

1 Instituto Politécnico Nacional - Centro de Investigación en Computación. Application Presentation: SASECA, Sistema Asociativo para la Selección de Características (Associative System for Feature Selection). Mario Aldape Pérez (CIC-IPN), Manuel Alejandro Soto-Ramos (CIC-IPN). Joint CHAIN/GISELA/EPIKH School for Application Porting to Science Gateways, June 2012, Mexico City (Mexico).

2 Project Objectives 2

3 A new approach to feature selection using associative memories. A method for estimating the quality of learning in an associative memory. A method that reduces space complexity by eliminating redundant information from the patterns that form the fundamental set. A method for reducing the size of the patterns that form the fundamental set. 3

5 Associative Memories 5

6 An associative memory can be seen as a system that relates input patterns with output patterns. [Diagram: input pattern → M → output pattern] 6

7 [Figures: a heteroassociative memory M and an autoassociative memory M, illustrated with the elaine.512 test image] 7

8 [Figures: a heteroassociative memory M and an autoassociative memory M, illustrated with the elaine.512 test image] 8

9 Let $A = \{0,1\}$ and let $x \in A^n$, $y \in A^u$ be two column vectors, with
$x = (x_1, x_2, \ldots, x_n)^T \in A^n$ and $y = (y_1, y_2, \ldots, y_u)^T \in A^u$.
The fundamental set of associations is denoted by $\{(x^\mu, y^\mu) \mid \mu = 1, 2, \ldots, p\}$. 9
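
To make the notation concrete, a fundamental set of $p$ binary associations can be stored as two NumPy arrays, one row per association (a toy illustration, not taken from the slides):

```python
import numpy as np

# Hypothetical fundamental set: p = 3 associations, n = 4 input components, u = 2 output components
X = np.array([[1, 0, 1, 0],   # x^1
              [1, 1, 1, 0],   # x^2
              [0, 0, 1, 1]])  # x^3
Y = np.array([[1, 0],         # y^1
              [1, 0],         # y^2
              [0, 1]])        # y^3
```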

11 Lernmatrix. The memory is a $u \times n$ array whose rows correspond to the output components $y_1, \ldots, y_i, \ldots, y_u$ and whose columns correspond to the input components $x_1, \ldots, x_j, \ldots, x_n$:
$$M = \begin{pmatrix} m_{11} & \cdots & m_{1j} & \cdots & m_{1n} \\ \vdots & & \vdots & & \vdots \\ m_{i1} & \cdots & m_{ij} & \cdots & m_{in} \\ \vdots & & \vdots & & \vdots \\ m_{u1} & \cdots & m_{uj} & \cdots & m_{un} \end{pmatrix}$$
[Steinbuch, K. (1961). Die Lernmatrix. Biological Cybernetics, 1.]

12 Learning Phase. Each entry of $M$ is updated as $m_{ij} \leftarrow m_{ij} + \Delta m_{ij}$, where
$$\Delta m_{ij} = \begin{cases} +\varepsilon & \text{if } y_i = 1 \text{ and } x_j = 1 \\ -\varepsilon & \text{if } y_i = 1 \text{ and } x_j = 0 \\ 0 & \text{otherwise} \end{cases}$$
with $\varepsilon$ a positive constant. 12
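
A minimal NumPy sketch of this update rule (function name and array layout are illustrative assumptions, not the authors' code): binary inputs are stored row-wise in X, one-hot outputs in Y, and epsilon plays the role of the positive constant $\varepsilon$.

```python
import numpy as np

def lernmatrix_learn(X, Y, epsilon=1.0):
    """Build a u x n Lernmatrix from binary inputs X (p x n) and one-hot outputs Y (p x u)."""
    u, n = Y.shape[1], X.shape[1]
    M = np.zeros((u, n))
    for x, y in zip(X, Y):
        # Delta m_ij = +eps if y_i = 1 and x_j = 1; -eps if y_i = 1 and x_j = 0; 0 otherwise
        M += epsilon * np.outer(y, 2 * x - 1)
    return M
```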

13 Classification Phase. Given an unknown input pattern $x \in A^n$, find the class $y \in A^u$ to which it belongs, according to the following rule:
$$y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} m_{ij} x_j = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} m_{hj} x_j \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator. 13
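
The maximum-operator rule can be sketched in the same style; the hard-coded memory in the toy usage is what the learning sketch above would produce on the toy fundamental set shown earlier.

```python
import numpy as np

def lernmatrix_classify(M, x):
    """y_i = 1 iff row i of M attains the maximum response to x (the maximum operator)."""
    responses = M @ x                                      # sum_j m_ij * x_j for each class i
    return (responses == responses.max()).astype(int)

# Toy usage: a 2-class, 4-feature memory
M = np.array([[ 2.0,  0.0, 2.0, -2.0],
              [-1.0, -1.0, 1.0,  1.0]])
print(lernmatrix_classify(M, np.array([1, 0, 1, 0])))      # -> [1 0]
```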

15 Linear Associator - Learning Phase. For each association $(x^\mu, y^\mu)$, with $y^\mu = (y_1^\mu, y_2^\mu, \ldots, y_u^\mu)^T$ and $x^\mu = (x_1^\mu, x_2^\mu, \ldots, x_n^\mu)^T$, the outer products are accumulated:
$$M = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T, \qquad m_{ij} = \sum_{\mu=1}^{p} y_i^\mu x_j^\mu \qquad (\text{a } u \times n \text{ matrix})$$
[Yáñez Márquez, C. & Díaz de León Santiago, J.L. (2001). Linear Associator de Anderson-Kohonen, IT-50, Serie Verde, CIC-IPN, México.] 15

16 Classification Phase. Given an unknown input pattern $x^\omega \in A^n$, find the class $y \in A^u$ to which it belongs by operating $M$ on it:
$$y = M x^\omega = \left[ \sum_{\mu=1}^{p} y^\mu (x^\mu)^T \right] x^\omega$$ 16
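
Both phases of the Linear Associator fit in a few lines of NumPy (illustrative function names, not from the cited report):

```python
import numpy as np

def linear_associator_learn(X, Y):
    """Accumulate the outer products y^mu (x^mu)^T over the fundamental set; result has shape (u, n)."""
    return sum(np.outer(y, x) for x, y in zip(X, Y))

def linear_associator_recall(M, x):
    """Classification phase: y = M x (the predicted class is usually read off the largest component)."""
    return M @ x
```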

17 CHAT. Learning Phase (as in the Linear Associator):
$$M = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$$
Classification Phase (as in the Lernmatrix):
$$y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} m_{ij} x_j = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} m_{hj} x_j \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator.
[Santiago Montero, R. (2003). Clasificador Híbrido de Patrones basado en la Lernmatrix de Steinbuch y el Linear Associator de Anderson-Kohonen. Tesis de Maestría en Ciencias de la Computación, Centro de Investigación en Computación, México.] 17
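
Since CHAT pairs the Linear Associator's learning rule with the Lernmatrix's winner-take-all recall, it can be sketched by composing the two ideas directly (a self-contained toy, not the author's implementation):

```python
import numpy as np

def chat(X, Y, x_unknown):
    """CHAT sketch: learn as a Linear Associator, classify as a Lernmatrix (maximum operator)."""
    M = sum(np.outer(y, x) for x, y in zip(X, Y))   # learning phase: sum of outer products
    responses = M @ x_unknown                       # classification phase: row responses
    return (responses == responses.max()).astype(int)

# Toy usage on the fundamental set from the earlier sketch
X = np.array([[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1]])
Y = np.array([[1, 0], [1, 0], [0, 1]])
print(chat(X, Y, np.array([0, 0, 0, 1])))           # -> [0 1]
```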

18 Disadvantages: It is not possible to identify redundant or irrelevant information. It is not possible to reduce the dimension of the training patterns. 18

20 Feature Selection 20

21 Feature Selection consists of choosing those features that preserve or maximize the separation between classes. Two main approaches exist: Filter and Wrapper. 21

22 Filter approach. [Diagram: Data acquisition → Original data → Preprocessing → Preprocessed data → Dimensionality reduction → Reduced data → Supervised learning / Classification → Prediction] 22

23 Advantages: Low computational cost. Disadvantages: No feedback from the predictor that will use the lower-dimensional feature set. 23

24 Wrapper approach. [Diagram: Data acquisition → Original data → Preprocessing → Preprocessed data → Dimensionality reduction ⇄ Supervised learning / Classification → Reduced data → Prediction; the selection loop takes up to $2^n$ iterations] 24

25 Advantages: Makes it possible to obtain the optimal subset of features. Disadvantages: Implies high computational costs; not applicable to high-dimensional data sets. 25
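
To see where that cost comes from, a bare-bones exhaustive wrapper can be sketched as follows; `train_and_score` is a hypothetical stand-in for whatever predictor is being wrapped, and the loop visits every one of the $2^n - 1$ non-empty feature subsets.

```python
from itertools import combinations

def wrapper_search(features, train_and_score):
    """Exhaustive wrapper: evaluate the predictor on every non-empty subset of features."""
    best_subset, best_score = None, float("-inf")
    n = len(features)
    for k in range(1, n + 1):
        for subset in combinations(features, k):   # 2^n - 1 subsets in total
            score = train_and_score(subset)        # feedback comes from the predictor itself
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```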

26 HCM. Learning Phase:
$$C = \sum_{\mu=1}^{p} y^\mu (x^\mu)^T$$
Classification Phase, for each constraint (masking) vector $e^r$, $r = 1, 2, \ldots, (2^f - 1)$:
$$y_i^r = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} c_{ij}\, x_j\, e_j^r = \bigvee_{h=1}^{u} \left[ \sum_{j=1}^{n} c_{hj}\, x_j\, e_j^r \right] \\ 0 & \text{otherwise} \end{cases}$$
with $\bigvee$ the maximum operator.
[Aldape-Pérez, M., Yáñez-Márquez, C., & Argüelles-Cruz, A. J. (2007). Optimized Associative Memories for Feature Selection. Lecture Notes in Computer Science, 4477.]
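
A minimal NumPy sketch of the masked classification rule (names are illustrative, not the authors' code): C is the learned memory and e is one binary constraint vector out of the $2^f - 1$ candidates.

```python
import numpy as np

def masked_classify(C, x, e):
    """Winner-take-all classification of x using only the features enabled by the binary mask e."""
    responses = C @ (x * e)                        # sum_j c_ij * x_j * e_j for every class i
    return (responses == responses.max()).astype(int)

# Example: a memory over 4 features, with features 1 and 3 masked out
C = np.array([[2.0, 1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
e = np.array([1, 0, 1, 0])                         # one candidate constraint vector
print(masked_classify(C, np.array([1, 0, 1, 1]), e))   # -> [1 0]
```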

27 Advantages: Reduces the dimensionality of the problem. Makes it possible to obtain the optimal subset of features. Increases the predictive accuracy in pattern classification problems. 27

28 Disadvantages: Requires an exploration of the entire solution space. Implies high computational costs. 28

29 Proposed Model 29

30 [Diagram of the proposed model, combining the blocks Data acquisition, Preprocessing, Supervised learning (Associative Memory), Feature Selection (Constraint vector), and Prediction, with Original data → Preprocessed data → Masked data → Reduced data flowing between them] 30

31 Previous Results 31

32 Experiments. During the experimental phase, four databases taken from the UCI Machine Learning Repository were used: Breast, Heart, Credit, and Hepatitis. [Table: number of features, number of classes, and number of patterns for each database] [*] 32

33 Breast Cancer [*]. Problem: Classification. Type of attributes: Integer. Number of patterns: 699. Number of attributes: …, class label. [*] 33

37 Heart Disease Database [*]. Problem: Classification. Type of attributes: Real. Number of patterns: 270. Number of attributes: …, class label. [*] 37

41 Australian Credit Approval Database [*]. Problem: Classification. Type of attributes: Real. Number of patterns: 690. Number of attributes: …, class label. [*] 41

44 Hepatitis Database [*]. Problem: Classification. Type of attributes: Integer. Number of patterns: 155. Number of attributes: …, class label. [*] 44

49 [Table: Number of features vs. time; the reported times grow from seconds to minutes, and then to a day, a month, and a year as the number of features increases] 49
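
The growth behind this table is simply the number of candidate feature subsets, $2^f - 1$ for $f$ features; a two-line check of how fast that explodes:

```python
for f in (10, 15, 20, 25, 30):
    print(f, 2 ** f - 1)   # non-empty subsets an exhaustive search over f features must visit
```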

51 Our Proposal 51

52 [Diagram - same model overview as slide 30: Data acquisition, Preprocessing, Supervised learning (Associative Memory), Feature Selection (Constraint vector), and Prediction, with Original data → Preprocessed data → Masked data → Reduced data flowing between them] 52

53 [Diagram: the Associative Memory and the Feature Selection block exchange the Constraint vector, the Masked data, and the Reduced data] 53

54 QUESTIONS 54
