Towards a Logical Model for Patterns

Size: px
Start display at page:

Download "Towards a Logical Model for Patterns"

Transcription

1 Towards a Logical Model for Patterns Stefano Rizzi 1, Elisa Bertino 2, Barbara Catania 3, Matteo Golfarelli 1, Maria Halkidi 4, Manolis Terrovitis 5, Panos Vassiliadis 6, Michalis Vazirgiannis 4, and Euripides Vrachnos 4 1 DEIS, Univ. of Bologna, Italy 2 DICO, Univ. of Milan, Italy 3 DISI, Univ. of Genoa, Italy 4 Athens Univ. of Economics & Business, Greece 5 Dept. of Electrical and Computer Engineering, Nat. Tech. Univ. of Athens, Greece 6 Dept. of Computer Science, University of Ioannina, Greece Abstract. Nowadays, the vast volume of collected digital data obliges us to employ processing methods like pattern recognition and data mining in order to reduce the complexity of data management. In this paper, we present the architecture and the logical foundations for the management of the produced knowledge artifacts, which we call patterns. To this end, we first introduce the concept of Pattern-Base Management System; then, we provide the logical foundations of a general framework based on the notions of pattern types and pattern classes, which stand for the intensional and extensional description of pattern instances, respectively. The framework is general and extensible enough to cover a broad range of real-world patterns, each of which is characterized by its structure, the related underlying data, an expression that carries the semantics of the pattern, and measurements of how successful the representation of raw data is. Finally, some remarkable types of relationships between patterns are discussed. 1 Introduction and Motivation The increasing opportunity of quickly collecting and cheaply storing large volumes of data, and the need for extracting concise information to be efficiently manipulated and intuitively analysed, are posing new requirements for DBMSs in both industrial and scientific applications. In this direction, during the last decade we witnessed the progressive spreading and success of data warehousing systems, built on top of operational databases in order to provide managers and knowledge workers with ready-at-hand summary data to be used for decision support. In these systems, the transactional view of data is replaced by a multidimensional view, relying on an ad-hoc logical model (the multidimensional This work was partially funded by the Information Society Technologies programme of the European Commission, Future and Emerging Technologies under the IST PANDA project I.-Y. Song et al. (Eds.): ER 2003, LNCS 2813, pp , c Springer-Verlag Berlin Heidelberg 2003

2 78 S. Rizzi et al. Table 1. Some examples of patterns. Application Raw data Type of pattern market-basket analysis sales transactions item association rules signal processing complex signals recurrent waveforms mobile objects monitoring measured trajectories equations information retrieval documents keyword frequencies image recognition image database image features market segmentation user profiles user clusters music retrieval music scores, audio files rhythm, melody, harmony system monitoring system output stream failure patterns financial brokerage trading records stock trends click-stream analysis web-server logs sequences of clicks epidemiology clinical records symptom-diagnosis correlations risk evaluation customer records decision trees model [17]) whose key concepts are made first-class citizens, meaning that they can be directly stored, queried, and manipulated. On the other hand, the limited analysis power provided by OLAP interfaces proved to be insufficient for advanced applications, in which the huge quantity of data stored necessarily requires semi-automated processing techniques, and the peculiarity of the user requirements calls for non-standard analysis techniques. Thus, sophisticated data processing tools (based for instance on data mining, pattern recognition, and knowledge extraction techniques) were devised in order to reduce, as far as possible, the user intervention in the process of extracting interesting knowledge artifacts (e.g., clusters, association rules, time series) from raw data [8,11,13]. We claim that the term pattern is a good candidate to generally denote these novel information types, characterized by a high degree of diversity and complexity. Some examples of patterns in different application domains are reported in Table 1. Differently from the case of data warehousing systems, the problem of directly storing and querying pattern-bases has received very limited attention so far in the commercial world, and probably no attention at all from the database community. On the other hand, we claim that end-users from both industrial and scientific domains would greatly benefit from adopting a Pattern-Base Management System (PBMS) capable of modeling and storing patterns, for the following main reasons: Abstraction. Within a PBMS, patterns would be made first-class citizens thus providing the user with a meaningful abstraction of raw data to be directly analyzed and manipulated. Efficiency. Introducing an architectural separation between the PBMS and the DBMS would improve the efficiency of both traditional transactions on the DBMS and advanced processing on patterns. Querying. The PBMS would provide an expressive language for querying the pattern-base in order to retrieve and compare patterns.

3 Towards a Logical Model for Patterns 79 In this context, the purposes of the PANDA (PAtterns for Next-generation DAtabase systems [4]) project of the European Community are: (1) to lay the foundations for pattern modeling; (2) to investigate the main issues involved in managing and querying a pattern-base; and (3) to outline the requirements for building a PBMS. In this paper we propose a logical framework for modeling patterns, aimed at satisfying three basic requirements: Generality. The model must be general enough to meet the specific requirements posed in different application domains for different kinds of patterns. Extensibility. The model must be extensible to accomodate new kinds of patterns introduced by novel and challenging applications. Reusability. The model must include constructs encouraging the reuse of what has already been defined. Informally, a pattern can be thought of as a compact and rich in semantics representation of raw data. In our approach, a pattern is modeled by its structure, measure, source, and expression. The structure component qualifies the pattern by locating it within a pattern space. The measure component quantifies the pattern by measuring the quality of the raw data representation achieved by the pattern itself. The source component describes the raw data which the pattern relates to. The expression component describes the (approximate) mapping between the raw data space and the pattern space. The paper is structured as follows. In Section 2 we deliver an informal definition of patterns and outline a reference architecture in which a PBMS could be framed. In Section 3 we formally describe our proposal of a logical framework for modeling patterns, while Section 4 discusses some remarkable types of relationships between patterns. In Section 5 we survey some related approaches. Finally, in Section 6 the conclusions are drawn and the future work is outlined. 2 From DBMSs to PBMSs 2.1 Raw Data vs. Patterns Raw data are recorded from various sources in the real world, often by collecting measurements from various instruments or devices (e.g., cellular phones, environment measurements, monitoring of computer systems, etc.). The determining property of raw data is the vastness of their volume; moreover, a significant degree of heterogeneity may be present. Clearly, data in such huge volumes do not constitute knowledge per se, i.e. little useful information can be deduced simply by their observation, so they hardly can be directly exploited by human beings. Thus, more elaborate techniques are required in order to extract the hidden knowledge and make these data valuable for end-users. The common characteristic of all these techniques is that large portions of the available data are abstracted and effectively represented by a small number of knowledge-carrying representatives, which we call

4 80 S. Rizzi et al. Fig. 1. Patterns and raw data in the supermarket example. patterns. Thus, in general, one pattern is related to many data items; on the other hand, several patterns (possibly of different types) can be associated to the same data item (e.g., due to the application of different algorithms). Example 1. Consider a supermarket database which records the items purchased by each customer within each sales transaction. While this large volume of data is not providing the supermarket management with any clear indication about the buying habits of customers, some knowledge discovery algorithm can be applied to come up with relevant knowledge. In Figure 1 both a clustering technique [15] and an algorithm for extracting association rules [7] have been applied. In the first case, patterns are clusters of customers which share some categories of products. In the second, patterns come in the form of association rules which relate sets of items frequently bought together by customers; the relevance of each rule is typically expressed by statistical measures which quantify its support and confidence. Note that each pattern, besides providing the end-user with some hidden knowledge over the underlying data, can be mapped to the subset of data it is related to. Patterns, thus, can be regarded as artifacts which effectively describe subsets of raw data (thus, they are compact) by isolating and emphasizing some interesting properties (thus, they are rich in semantics). While in most cases a pattern is interesting to the end-users because it describes a recurrent behaviour (e.g., in market segmentation, stock exchange analysis, etc.), sometimes it is relevant just because it is related to some singular, unexpected event (e.g., in failure monitoring). Note that all the kinds of patterns, besides being somehow related to raw data, also imply some processing in order to either generate them through some learning algorithm or to check/map them against raw data. We are now ready to give a preliminary, informal, definition for patterns; a formal definition will be given in Section 3.1. Definition 1 (Pattern). A pattern is a compact and rich in semantics representation of raw data. 2.2 Architecture Patterns can be managed by using a Pattern-Base Management System exactly as database records are managed by a database management system. A Pattern- Base Management System is thus defined as follows.

5 Towards a Logical Model for Patterns 81 Fig. 2. The PBMS architecture. Definition 2 (PBMS). A Pattern-Base Management System (PBMS) is a system for handling (storing/processing/retrieving) patterns defined over raw data in order to efficiently support pattern matching and to exploit pattern-related operations generating intensional information. The set of patterns managed by a PBMS is called pattern-base. The reference architecture for a PBMS is depicted in Figure 2. On the bottom layer, a set of devices produce data, which are then organized and stored within databases or files to be typically, but not necessarily, managed by a DBMS. Knowledge discovery algorithms are applied over these data and generate patterns to be fed into the PBMS; note that, in our approach, these algorithms are loosely coupled with the PBMS. Within the PBMS, it is worth to distinguish three different layers:

6 82 S. Rizzi et al. 1. The pattern layer is populated with patterns. 2. The type layer holds built-in and user-defined types for patterns. Patterns of the same type share similar structural characteristics. 3. The class layer holds definitions of pattern classes, i.e., collections of semantically related patterns. Classes play the role of collections in the objectoriented context and are the key concept in the definition of a pattern query language. Besides using the DBMS, end-users may directly interact with the PBMS: to this end, the PBMS adopts ad-hoc techniques not only for representing and storing patterns, but also for posing and processing queries and for efficiently retrieving patterns. 3 The Logical Modeling Framework In this section we formalize our proposal of a logical framework for modeling patterns by characterizing pattern types, their instances, and the classes which collect them. 3.1 Pattern Types Though our approach is parametric on the typing system adopted, the examples provided in this paper will be based on a specific, very common typing system. Assuming there is a set of base types (including the root type ) and a set of type constructors, the set T of types includes all the base types together with all the types recursively defined by applying a type constructor to one or more other types. Types are applied to attributes. Let base types include integers, reals, Booleans, strings, and timestamps; let type constructors include list, set, bag, array, and tuple. Using an obvious syntax, some examples of type declarations are (we use uppercase for base types and type constructors, lowercase for attributes): salary: REAL SET(INTEGER) TUPLE(x: INTEGER, y: INTEGER) personnel: LIST(TUPLE(age: INTEGER, salary: INTEGER)) A pattern type represents the intensional form of patterns, giving a formal description of their structure and relationship with source data. Thus, pattern types play the same role of abstract data types in the object-oriented model. Definition 3 (Pattern type). A pattern type pt is a quintuple pt =(n, ss, ds, ms, f) where n is the name of the pattern type; ss, ds, and ms (called respectively structure schema, source schema, and measure schema) are types in T ; f is a formula, written in a given language, which refers to attributes appearing in the source and in the structure schemas.

7 Towards a Logical Model for Patterns 83 The first component of a pattern type has an obvious meaning; the remaining four have the following roles: The structure schema ss defines the pattern space by describing the structure of the patterns instances of the pattern type. The achievable complexity of the pattern space (hence, the flexibility of pattern representation) depends on the expressivity of the typing system. The source schema ds defines the related source space by describing the dataset from which patterns, instances of the pattern type being defined, are constructed. Characterizing the source schema is fundamental for every operation which involves both the pattern space and the source space (e.g., when applying some technique to extract patterns from raw data or when checking for the validity of a pattern on a dataset). The measure schema ms describes the measures which quantify the quality of the source data representation achieved by the pattern. The role of this component is to enable the user to evaluate how accurate and significant for a given application each pattern is. Besides, the different semantics of the measure component with reference to the structure can be exploited in order to define more effective functions for evaluating the distance between two patterns [12]. The formula f describes the relationship between the source space and the pattern space, thus carrying the semantics of the pattern. Inside f, attributes are interpreted as free variables ranging over the components of either the source or the pattern space. Note that, though in some particular domains f may exactly express the inter-space relationship (at most, by allowing all raw data related to the pattern to be enumerated), in most cases it will describe it only approximatively. Though our approach to pattern modeling is parametric on the language adopted for formulas, the achievable semantics for patterns strongly depends on its expressivity. For the examples reported in this paper, we try a constraint calculus based on polynomial constraints which seems suitable for several types of patterns [16]; still, a full exploration of the most suitable language is outside the scope of the paper. Example 2. Given a domain D of values and a set of transactions, each including a subset of D, anassociation rule takes the form A B where A D, B D, A B =. A is often called the head of the rule, while B is its body [13]. A possible pattern type for modeling association rules over strings representing products is the following: n : AssociationRule ss : TUPLE(head: SET(STRING), body: SET(STRING)) ds : BAG(transaction: SET(STRING)) ms : TUPLE(confidence: REAL, support: REAL) f : x(x head x body x transaction)

8 84 S. Rizzi et al. The structure schema is a tuple modeling the head and the body. The source schema specifies that association rules are constructed from a bag of transactions, each defined as a set of products. The measure schema includes two common measures used to assess the relevance of a rule: its confidence (what percentage of the transactions including the head also include the body) and its support (what percentage of the whole set of transactions include both the head and the body). Finally, the formula of the constraint calculus represents (exactly, in this case) the pattern/dataset relationship by associating each rule with the set of transactions which support it. Example 3. An example of a mathematical pattern is a straight line which interpolates a set of samples. In this case, the source schema models the samples, the structure schema includes the two coefficients necessary to determine a line, while the measure schema includes, for instance, a fitting quantifier. The formula which establishes the approximate correspondence between the pattern and the source data is the equation of the line. n : InterpolatingLine ss : TUPLE(a: REAL, b:real) ds : SET(sample: TUPLE(x: REAL, y: REAL)) ms : fitting: REAL f : y=a x+b 3.2 Patterns Let raw data be stored in a number of databases and/or files. A dataset is any subset of these data, which we assume to be wrapped under a type of our typing system (dataset type). Definition 4 (Pattern). Let pt =(n, ss, ds, ms, f) be a pattern type. A pattern p instance of pt is a quintuple p =(pid, s, d, m, e) where pid (pattern identifier) is a unique identifier for p; s (structure) is a value for type ss; d (source) isa dataset whose type conforms to type ds; m (measure) is a value for type ms; e is an expression denoting the region of the source space that is related to p. According to this definition, a pattern is characterized by (1) a pattern identifier (which plays the same role of OIDs in the object model), (2) a structure that positions the pattern within the pattern space defined by its pattern type, (3) a source that identifies the specific dataset the pattern relates to, (4) a measure that estimates the quality of the raw data representation achieved by the pattern, (5) an expression which relates the pattern to the source data. In particular, the expression is obtained by the formula f in the pattern type by (1) instantiating each attribute appearing in ss with the corresponding value specified in s, and (2) letting the attributes appearing in ds range over the source space. Note that further information could be associated to each pattern, specifying for instance the mining session which produced it, which algorithm was used, which parameter values rule the algorithm, etc.

9 Towards a Logical Model for Patterns 85 Example 4. Consider again pattern type AssociationRule defined in Example 2, and suppose that raw data include a relational database containing a table sales which stores data related to the sales transactions in a sport shop: sales (transactionid, article, quantity). Using an extended SQL syntax to denote the dataset, an example of an instance of AssociationRule is: pid : 512 s :(head = { Boots }, body = { Socks, Hat }) d : SELECT SETOF(article) AS transaction FROM sales GROUP BY transactionid m :(confidence =0.75, support =0.55) e : {transaction : x(x { Boots, Socks, Hat } x transaction)} In the expression, transaction ranges over the source space; the values given to head and body within the structure are used to bind variables head and body in the formula of pattern type AssociationRule. Example 5. Let raw data be an array of real values corresponding to samples periodically taken from a signal, and let each pattern represent a recurrent waveshape together with the position where it appears within the dataset and its amplitude shift and gain: n : TimeSeries ss : TUPLE(curve: ARRAY[1..5](REAL), position: INTEGER, shift: REAL, gain: REAL) ds : samples: ARRAY[1..100](REAL) ms : similarity: REAL f : samples[position + i 1] = shift + gain curve[i], i :1 i 5 Measure similarity expresses how well waveshape curve approximates the source signal in that position. The formula approximatively maps curve onto the data space in position. A possible pattern, extracted from a dataset which records the hourly-detected levels of the Colorado river, is as follows: pid : 456 s :(curve =(y =0, y =0.8, y =1, y =0.8, y =0), position =12, shift =2.0, gain =1.5) d : colorado.txt m : similarity =0.83 e : {samples[12] = 2.0, samples[13] = 3.2, samples[14] = 3.5, samples[15] = 3.2, samples[16] = 2.0}

10 86 S. Rizzi et al. 3.3 Classes A class is a set of semantically related patterns and constitutes the key concept in defining a pattern query language. A class is defined for a given pattern type and contains only patterns of that type. Moreover, each pattern must belong to at least one class. Formally, a class is defined as follows. Definition 5 (Class). A class c is a triple c =(cid, pt, pc) where cid (class identifier) is a unique identifier for c, pt is a pattern type, and pc is a collection of patterns of type pt. Example 6. The Apriori algorithm described in [7] could be used to generate relevant association rules from the dataset presented in Example 4. All the generated patterns could be inserted in a class called SaleRules for pattern type AssociationRule defined in Example 2. The collection of patterns associated with the class can be later extended to include rules generated from a different dataset, representing for instance the sales transaction recorded in a different store. 4 Relationships between Patterns In this section we introduce some interesting relationships between patterns aimed at increasing the modeling expressivity of our logical framework, but which also improve reusability and extensibility and impact the querying flexibility. For space reasons we will propose here an informal presentation; see [9] for formal details. 4.1 Specialization Abstraction by specialization (the so-called IS-A relationship) is widely used in most modeling approaches, and the associated inheritance mechanism significatively addresses the extensibility and reusability issues by allowing new entities to be cheaply derived from existing ones. Specialization between pattern types can be defined by first introducing a standard notion of subtyping between base types (e.g., integer is a subtype of real). Subtyping can then be inductively extended to deal with types containing type constructors: t 1 specializes t 2 if the outermost constructors in t 1 and t 2 coincide and each component in t 2 is specialized by one component in t 1. Finally, pattern type pt 1 specializes pattern type pt 2 if the structure schema, the source schema, and the measure schema of pt 1 specialize the structure schema, the source schema, and the measure schema of pt 2. Note that, if pt 1 specializes pt 2 and class c is defined for pt 2, also the instances of pt 1 can be part of c. Example 7. Given a set S of points, a clustering is a set of clusters, each being a subset of S, such that the points in a cluster are more similar to each other

11 Towards a Logical Model for Patterns 87 than points in different clusters [13]. The components for pattern type Cluster, representing circular clusters defined on a 2-dimensional space, are: n : Cluster ss : TUPLE(radius:, center: TUPLE(cx:, cy: )) ds : SET(x:, y: ) ms : f :(x cx) 2 +(y cy) 2 radius 2 where f gives an approximate evaluation of the region of the source space represented by each cluster. While in Cluster the 2-dimensional source schema is generically defined, clusters on any specific source space will be easily defined by specialization. Thus, Cluster could be for instance specialized into a new pattern type ClusterOfIntegers where cx, cy, x, and y are specialized to integers, radius is specialized to reals, and a new measure avgintraclusterdistance of type real is added. 4.2 Composition and Refinement A nice feature of the object model is the possibility of creating complex objects, i.e. objects which consist of other objects. In our pattern framework, this can be achieved by extending the set of base types with pattern types, thus giving the user the possibility of declaring complex types. This technique may have two different impacts on modeling. Firstly, it is possible to declare the structure schema as a complex type, in order to create patterns recursively containing other patterns thus defining, from the conceptual point of view, a part-of hierarchy. We will call composition this relationship. Secondly, a complex type may appear within the source schema: this allows for supporting the modeling of patterns obtained by mining other existing patterns. Since in general a pattern is a compact representation of its source data, we may call refinement this relationship in order to emphasize that moving from a pattern type to the pattern type(s) that appear in its source entails increasing the level of detail in representing knowledge. Example 8. Let pattern type ClusterOfRules describe a mono-dimensional cluster of association rules: the source schema here represents the space of association rules, and the structure models one cluster-representative rule. Assuming that each cluster trivially includes all the rules sharing the same head, it is: n : ClusterOfRules ss : representative: AssociationRule ds : SET(rule: AssociationRule) ms : TUPLE(deviationOnConfidence: REAL, deviationonsupport: REAL) f : rule.ss.head = representative.ss.head

12 88 S. Rizzi et al. where a standard dot notation is adopted to address the components of pattern types. Thus, there is a refinement relationship between ClusterOfRules and AssociationRule. Consider now that a clustering is a set of clusters: intuitively, also clustering is a pattern, whose structure is modeled by a complex type which aggregates a set of clusters. Thus, there would be a composition relationship between pattern types Clustering and ClusterOfRules. 5 Related Approaches The most popular efforts for modeling patterns is the Predictive Model Markup Language [6], that uses XML to represent data mining models. Though PMML enables the exchange of patterns between heterogeneous pattern-bases, it does not provide any general model for the representation of different pattern types; besides, the problem of mapping patterns against raw data is not considered. Among the other approaches, we mention the SQL/MM standard [2]; here, the supported mining models are represented as SQL types and made accessible through the SQL:1999 base syntax. A framework for metadata representation is proposed by the Common Warehouse Model [1], whose main purpose is to enable easy interchange of warehouse and business intelligence metadata between various heterogeneous repositories, and not the effective manipulation of these metadata. The Java Data Mining API [5] addresses the need for procedural support of all the existing and evolving data mining standards; in particular, it supports the building of data mining models as well as the creation, storage, access, and maintenance of data and metadata that represent data mining results. Finally, the Pattern Query Language is an SQL-like query language for patterns [3], assumed to be stored like traditional data in relational tables. Overall, the listed approaches seem inadequate to represent and handle different classes of patterns in a flexible, effective, and coherent way: in fact, a given list of predefined pattern types is considered and no general approach to pattern modeling is proposed. In contrast, in our framework, the definition of a general model and the possibility of constructing new pattern types by inheritance from a root type allow uniform manipulation of all patterns. A specific mention is deserved by inductive databases, where data and patterns, in the form of rules inducted by data, are represented together to be uniformly retrieved and manipulated [14]. Our approach differs from the inductive database one in different ways: While only association rules and string patterns are usually considered there and no attempt is made towards a general pattern model, in our approach no predefined pattern types are considered and the main focus lies in devising a general and extensible model for patterns. Patterns are far more complex than the raw data they represent and, we argue, cannot be effectively stored in a relational manner. The difference in semantics between patterns and raw data discourage from adopting the same query language for both, rather call for defining a pattern

13 Towards a Logical Model for Patterns 89 query language capable of capitalizing on the peculiar semantics of each pattern component. Differently from [14], we claim that the peculiarities of patterns in terms of structure and behaviour, together with the characteristic of the expected workload on them, call for a logical separation between the database and the pattern-base in order to ensure efficient handling of both raw data and patterns through dedicated management systems. We close this section by observing that, though our approach shares some similarities with the object-oriented model, there are several specific requirements calling for an ad-hoc model and for a dedicated management system: In terms of the logical framework, the discriminating feature is the requirement for a semantically rich representation of patterns, achieved by separating structure and measure on the one hand, by introducing the expression component on the other. From a conceptual point of view, the modeling framework adopted entails the refinement relationship between patterns, not directly supported by objectoriented models. From the functional point of view, instead of pointer-chasing object-oriented queries, novel querying requirements will presumably arise, including (a) ad hoc operations over the source and pattern spaces and their mapping; (b) pattern matching tests, strengthened by the separation of structure and measure; (c) reasoning facilities based on the expression component of patterns. In terms of system architecture, the relevance of queries aimed at evaluating similarity between patterns and the request for efficiency call for alternative storage and query optimization techniques. 6 Conclusions and Future Work In this paper, we have dealt with the introduction of the architecture and the logical foundations for pattern management. First, we introduced Pattern-Base Management Systems and their architecture. Then, we provided the logical foundations of a general framework, as the basis for PBMS management. Our logical framework is based on the principles of generality, extensibility and reusability. To address the generality goal we introduced a simple yet powerful modeling framework, able to cover a broad range of real-world patterns. Thus, the most general definition of a pattern specifies its structure, the underlying data that correspond to it, an expression which is rich in semantics so as to characterize what the pattern stands for, and measurements of how successful the raw data abstraction is. To address the extensibility and usability goals, we introduced type hierarchies, that provide the PBMS with the flexibility of smoothly incorporating novel pattern types, as well as mechanisms for constructing composite patterns and for refining patterns. Though the fundamentals of pattern modeling have been addressed, several important issues still need to be investigated. Future research includes both theoretical aspects as well as implementation-specific issues. Implementation issues

14 90 S. Rizzi et al. involve primarily the construction of ad-hoc storage management and query processing modules for the efficient management of patterns. The theoretical aspects include the evaluation and comparison of the expressivity of different languages to express formulas, and the study of a flexible query language for retrieving and comparing complex patterns. In particular, as to query languages, a basic operation is that of comparison: two patterns of the same type can be compared to compute a score assessing their mutual similarity as a function of the similarity between both the structure and the measure components. Particularly challenging is the comparison between complex patterns: in this case, the similarity score is computed starting from the similarity between component patterns, then the obtained scores are aggregated, using an aggregation logic, to determine the overall similarity [10]. References 1. Common Warehouse Metamodel (CWM) ISO SQL/MM Part 6. Documents/ FCD/fcd-datamining pdf, Information Discovery Data Mining Suite The PANDA Project Java Data Mining API Predictive Model Markup Language (PMML). v2/pmml v2 0.html, R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th VLDB, M. Berry and G. Linoff. Data mining techniques: for marketing, sales, and customer support. John Wiley, E. Bertino et al. A preliminary proposal for the PANDA logical model. Technical Report TR , PANDA, I. Bartolini et al. PAtterns for Next-generation DAtabase systems: preliminary results of the PANDA project. In Proc. 11th SEBD, Cetraro, Italy, U. Fayyad, G. Piatesky-Shapiro, and P. Smyth. From data mining to knowledge discovery: an overview. In Advances in Knowledge Discovery and Data Mining, pages AAAI Press and the MIT Press, V. Ganti, R. Ramakrishnan, J. Gehrke, and W.-Y. Loh. A framework for measuring distances in data characteristics. PODS, J. Han and M. Kamber. Data mining: concepts and techniques. Academic Press, T. Imielinski and H. Mannila. A Database Perspective on Knowledge Discovery. Communications of the ACM, 39(11):58 64, A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a survey. ACM Computing Surveys, 31: , P. Kanellakis, G. Kuper, and P. Revesz. Constraint Query Languages. Journal of Computer and System Sciences, 51(1):25 52, P. Vassiliadis and T. Sellis. A survey of logical models for OLAP databases. SIG- MOD Record, 28(4):64 69, 1999.

UML-Based Conceptual Modeling of Pattern-Bases

UML-Based Conceptual Modeling of Pattern-Bases UML-Based Conceptual Modeling of Pattern-Bases Stefano Rizzi DEIS - University of Bologna Viale Risorgimento, 2 40136 Bologna - Italy srizzi@deis.unibo.it Abstract. The concept of pattern, meant as an

More information

Panel: Pattern management challenges

Panel: Pattern management challenges Panel: Pattern management challenges Panos Vassiliadis Univ. of Ioannina, Dept. of Computer Science, 45110, Ioannina, Hellas E-mail: pvassil@cs.uoi.gr 1 Introduction The increasing opportunity of quickly

More information

Flexible Pattern Management within PSYCHO

Flexible Pattern Management within PSYCHO Flexible Pattern Management within PSYCHO Barbara Catania and Anna Maddalena Dipartimento di Informatica e Scienze dell Informazione Università degli Studi di Genova (Italy) {catania,maddalena}@disi.unige.it

More information

Towards a Language for Pattern Manipulation and Querying

Towards a Language for Pattern Manipulation and Querying Towards a Language for Pattern Manipulation and Querying Elisa Bertino 1, Barbara Catania 2, and Anna Maddalena 2 1 Dipartimento di Scienze dell Informazione Università degli Studi di Milano (Italy) 2

More information

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces

Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Measuring and Evaluating Dissimilarity in Data and Pattern Spaces Irene Ntoutsi, Yannis Theodoridis Database Group, Information Systems Laboratory Department of Informatics, University of Piraeus, Greece

More information

Pattern Management. Irene Ntoutsi, Yannis Theodoridis

Pattern Management. Irene Ntoutsi, Yannis Theodoridis Pattern Management Irene Ntoutsi, Yannis Theodoridis Information Systems Lab Department of Informatics University of Piraeus Hellas Technical Report Series UNIPI-ISL-TR-2007-01 March 2007 Pattern Management

More information

A Framework for Data Mining Pattern Management

A Framework for Data Mining Pattern Management A Framework for Data Mining Pattern Management Barbara Catania 1, Anna Maddalena 1, Maurizio Mazza 1, Elisa Bertino 2, and Stefano Rizzi 3 1 University of Genova (Italy) {catania,maddalena,mazza}@disi.unige.it

More information

Modeling and Language Support for the Management of Pattern-Bases

Modeling and Language Support for the Management of Pattern-Bases Modeling and Language Support for the Management of Pattern-Bases Manolis Terrovitis a,, Panos Vassiliadis b, Spiros Skiadopoulos c, Elisa Bertino d, Barbara Catania e, Anna Maddalena e, and Stefano Rizzi

More information

Modeling and Language Support for the Management of Pattern-Bases

Modeling and Language Support for the Management of Pattern-Bases Modeling and Language Support for the Management of Pattern-Bases Manolis Terrovitis Panos Vassiliadis Spiros Skiadopoulos Elisa Bertino Barbara Catania Anna Maddalena Abstract In our days knowledge extraction

More information

PatMan: A Visual Database System to Manipulate Path Patterns and Data in Hierarchical Catalogs

PatMan: A Visual Database System to Manipulate Path Patterns and Data in Hierarchical Catalogs PatMan: A Visual Database System to Manipulate Path Patterns and Data in Hierarchical Catalogs Aggeliki Koukoutsaki, Theodore Dalamagas, Timos Sellis, Panagiotis Bouros Knowledge and Database Systems Lab

More information

1. Inroduction to Data Mininig

1. Inroduction to Data Mininig 1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

3.4 Data-Centric workflow

3.4 Data-Centric workflow 3.4 Data-Centric workflow One of the most important activities in a S-DWH environment is represented by data integration of different and heterogeneous sources. The process of extract, transform, and load

More information

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR International Journal of Emerging Technology and Innovative Engineering QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR V.Megha Dept of Computer science and Engineering College Of Engineering

More information

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India

More information

Modeling and Language Support for the Management of Pattern-Bases

Modeling and Language Support for the Management of Pattern-Bases 1 Modeling and Language Support for the Management of Pattern-Bases Manolis Terrovitis 1 Panos Vassiliadis 2 Spiros Skiadopoulos 1 Elisa Bertino 3 Barbara Catania 4 Anna Maddalena 4 Abstract Information

More information

An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches

An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches An Approach to Intensional Query Answering at Multiple Abstraction Levels Using Data Mining Approaches Suk-Chung Yoon E. K. Park Dept. of Computer Science Dept. of Software Architecture Widener University

More information

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems Practical Database Design Methodology and Use of UML Diagrams 406.426 Design & Analysis of Database Systems Jonghun Park jonghun@snu.ac.kr Dept. of Industrial Engineering Seoul National University chapter

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

METADATA INTERCHANGE IN SERVICE BASED ARCHITECTURE

METADATA INTERCHANGE IN SERVICE BASED ARCHITECTURE UDC:681.324 Review paper METADATA INTERCHANGE IN SERVICE BASED ARCHITECTURE Alma Butkovi Tomac Nagravision Kudelski group, Cheseaux / Lausanne alma.butkovictomac@nagra.com Dražen Tomac Cambridge Technology

More information

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD

Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD Matteo Golfarelli Stefano Rizzi Elisa Turricchia University of Bologna - Italy 13th International Conference on Data Warehousing

More information

Contextual snowflake modelling for pattern warehouse logical design

Contextual snowflake modelling for pattern warehouse logical design Sādhanā Vol. 40, Part 1, February 2015, pp. 15 33. c Indian Academy of Sciences Contextual snowflake modelling for pattern warehouse logical design 1. Introduction VIVEK TIWARI and RAMJEEVAN SINGH THAKUR

More information

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos

KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW. Ana Azevedo and M.F. Santos KDD, SEMMA AND CRISP-DM: A PARALLEL OVERVIEW Ana Azevedo and M.F. Santos ABSTRACT In the last years there has been a huge growth and consolidation of the Data Mining field. Some efforts are being done

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA) International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 12 No. 1 Nov. 2014, pp. 217-222 2014 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/

More information

Unit 10 Databases. Computer Concepts Unit Contents. 10 Operational and Analytical Databases. 10 Section A: Database Basics

Unit 10 Databases. Computer Concepts Unit Contents. 10 Operational and Analytical Databases. 10 Section A: Database Basics Unit 10 Databases Computer Concepts 2016 ENHANCED EDITION 10 Unit Contents Section A: Database Basics Section B: Database Tools Section C: Database Design Section D: SQL Section E: Big Data Unit 10: Databases

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

Data Warehousing and OLAP Technologies for Decision-Making Process

Data Warehousing and OLAP Technologies for Decision-Making Process Data Warehousing and OLAP Technologies for Decision-Making Process Hiren H Darji Asst. Prof in Anand Institute of Information Science,Anand Abstract Data warehousing and on-line analytical processing (OLAP)

More information

Generating Cross level Rules: An automated approach

Generating Cross level Rules: An automated approach Generating Cross level Rules: An automated approach Ashok 1, Sonika Dhingra 1 1HOD, Dept of Software Engg.,Bhiwani Institute of Technology, Bhiwani, India 1M.Tech Student, Dept of Software Engg.,Bhiwani

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

Guidelines for the application of Data Envelopment Analysis to assess evolving software

Guidelines for the application of Data Envelopment Analysis to assess evolving software Short Paper Guidelines for the application of Data Envelopment Analysis to assess evolving software Alexander Chatzigeorgiou Department of Applied Informatics, University of Macedonia 546 Thessaloniki,

More information

Fausto Giunchiglia and Mattia Fumagalli

Fausto Giunchiglia and Mattia Fumagalli DISI - Via Sommarive 5-38123 Povo - Trento (Italy) http://disi.unitn.it FROM ER MODELS TO THE ENTITY MODEL Fausto Giunchiglia and Mattia Fumagalli Date (2014-October) Technical Report # DISI-14-014 From

More information

Information mining and information retrieval : methods and applications

Information mining and information retrieval : methods and applications Information mining and information retrieval : methods and applications J. Mothe, C. Chrisment Institut de Recherche en Informatique de Toulouse Université Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse

More information

Generalized Document Data Model for Integrating Autonomous Applications

Generalized Document Data Model for Integrating Autonomous Applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Generalized Document Data Model for Integrating Autonomous Applications Zsolt Hernáth, Zoltán Vincellér Abstract

More information

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace [Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction

More information

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering

More information

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz

Database Management Systems MIT Lesson 01 - Introduction By S. Sabraz Nawaz Database Management Systems MIT 22033 Lesson 01 - Introduction By S. Sabraz Nawaz Introduction A database management system (DBMS) is a software package designed to create and maintain databases (examples?)

More information

Digital Archives: Extending the 5S model through NESTOR

Digital Archives: Extending the 5S model through NESTOR Digital Archives: Extending the 5S model through NESTOR Nicola Ferro and Gianmaria Silvello Department of Information Engineering, University of Padua, Italy {ferro, silvello}@dei.unipd.it Abstract. Archives

More information

. The problem: ynamic ata Warehouse esign Ws are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered

. The problem: ynamic ata Warehouse esign Ws are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered ynamic ata Warehouse esign? imitri Theodoratos Timos Sellis epartment of Electrical and Computer Engineering Computer Science ivision National Technical University of Athens Zographou 57 73, Athens, Greece

More information

Variety-Aware OLAP of Document-Oriented Databases

Variety-Aware OLAP of Document-Oriented Databases Variety-Aware OLAP of Document-Oriented Databases Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi DISI University of Bologna, Italy Introduction In recent years, schemalessdbmss have progressively

More information

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

More information

Bitmap index-based decision trees

Bitmap index-based decision trees Bitmap index-based decision trees Cécile Favre and Fadila Bentayeb ERIC - Université Lumière Lyon 2, Bâtiment L, 5 avenue Pierre Mendès-France 69676 BRON Cedex FRANCE {cfavre, bentayeb}@eric.univ-lyon2.fr

More information

Chapter 4 Data Mining A Short Introduction

Chapter 4 Data Mining A Short Introduction Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview

More information

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

5-1McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved. 5-1McGraw-Hill/Irwin Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved. 5 hapter Data Resource Management Data Concepts Database Management Types of Databases McGraw-Hill/Irwin Copyright

More information

Data Access Paths for Frequent Itemsets Discovery

Data Access Paths for Frequent Itemsets Discovery Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number

More information

Materialized Data Mining Views *

Materialized Data Mining Views * Materialized Data Mining Views * Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland tel. +48 61

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

MIT Database Management Systems Lesson 01: Introduction

MIT Database Management Systems Lesson 01: Introduction MIT 22033 Database Management Systems Lesson 01: Introduction By S. Sabraz Nawaz Senior Lecturer in MIT, FMC, SEUSL Learning Outcomes At the end of the module the student will be able to: Describe the

More information

A Methodology for Integrating XML Data into Data Warehouses

A Methodology for Integrating XML Data into Data Warehouses A Methodology for Integrating XML Data into Data Warehouses Boris Vrdoljak, Marko Banek, Zoran Skočir University of Zagreb Faculty of Electrical Engineering and Computing Address: Unska 3, HR-10000 Zagreb,

More information

Semantic Web in a Constrained Environment

Semantic Web in a Constrained Environment Semantic Web in a Constrained Environment Laurens Rietveld and Stefan Schlobach Department of Computer Science, VU University Amsterdam, The Netherlands {laurens.rietveld,k.s.schlobach}@vu.nl Abstract.

More information

Value Added Association Rules

Value Added Association Rules Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency

More information

This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used

This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used Literature Review This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used the technology of Data Mining and Knowledge Discovery in Databases to build Examination Data Warehouse

More information

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study Mirzaei.Afshin 1, Sheikh.Reza 2 1 Department of Industrial Engineering and

More information

CHAPTER-23 MINING COMPLEX TYPES OF DATA

CHAPTER-23 MINING COMPLEX TYPES OF DATA CHAPTER-23 MINING COMPLEX TYPES OF DATA 23.1 Introduction 23.2 Multidimensional Analysis and Descriptive Mining of Complex Data Objects 23.3 Generalization of Structured Data 23.4 Aggregation and Approximation

More information

Efficient integration of data mining techniques in DBMSs

Efficient integration of data mining techniques in DBMSs Efficient integration of data mining techniques in DBMSs Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex, FRANCE {bentayeb jdarmont

More information

ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES

ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES ANALYTICS DRIVEN DATA MODEL IN DIGITAL SERVICES Ng Wai Keat 1 1 Axiata Analytics Centre, Axiata Group, Malaysia *Corresponding E-mail : waikeat.ng@axiata.com Abstract Data models are generally applied

More information

Annotation for the Semantic Web During Website Development

Annotation for the Semantic Web During Website Development Annotation for the Semantic Web During Website Development Peter Plessers and Olga De Troyer Vrije Universiteit Brussel, Department of Computer Science, WISE, Pleinlaan 2, 1050 Brussel, Belgium {Peter.Plessers,

More information

A Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997

A Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997 1 of 8 5/24/02 4:43 PM A Systems Approach to Dimensional Modeling in Data Marts By Joseph M. Firestone, Ph.D. White Paper No. One March 12, 1997 OLAP s Purposes And Dimensional Data Modeling Dimensional

More information

XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses

XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses XML-OLAP: A Multidimensional Analysis Framework for XML Warehouses Byung-Kwon Park 1,HyoilHan 2,andIl-YeolSong 2 1 Dong-A University, Busan, Korea bpark@dau.ac.kr 2 Drexel University, Philadelphia, PA

More information

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts.

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. BY SCOTT A. BARNES, CPA, CFF, CGMA The adversarial nature of the American legal system creates a natural conflict between

More information

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 1. true / false By a compiler we mean a program that translates to code that will run natively on some machine. 2. true / false ML can be compiled. 3. true / false FORTRAN can reasonably be considered

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

Category Theory in Ontology Research: Concrete Gain from an Abstract Approach

Category Theory in Ontology Research: Concrete Gain from an Abstract Approach Category Theory in Ontology Research: Concrete Gain from an Abstract Approach Markus Krötzsch Pascal Hitzler Marc Ehrig York Sure Institute AIFB, University of Karlsruhe, Germany; {mak,hitzler,ehrig,sure}@aifb.uni-karlsruhe.de

More information

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Association Rule Mining in XML databases: Performance Evaluation and Analysis Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Department of Computer Science & Engineering, UIET Panjab University Chandigarh. E-mail

More information

KNOWLEDGE DISCOVERY AND DATA MINING

KNOWLEDGE DISCOVERY AND DATA MINING KNOWLEDGE DISCOVERY AND DATA MINING Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION MANAGEMENT TECHNOLOGIES DATA WAREHOUSE DECISION SUPPORT SYSTEMS

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

Minsoo Ryu. College of Information and Communications Hanyang University.

Minsoo Ryu. College of Information and Communications Hanyang University. Software Reuse and Component-Based Software Engineering Minsoo Ryu College of Information and Communications Hanyang University msryu@hanyang.ac.kr Software Reuse Contents Components CBSE (Component-Based

More information

CHAPTER 3 Implementation of Data warehouse in Data Mining

CHAPTER 3 Implementation of Data warehouse in Data Mining CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected

More information

Reference Model Concept for Structuring and Representing Performance Indicators in Manufacturing

Reference Model Concept for Structuring and Representing Performance Indicators in Manufacturing Reference Model Concept for Structuring and Representing Performance Indicators in Manufacturing Stefan Hesse 1, Bernhard Wolf 1, Martin Rosjat 1, Dražen Nadoveza 2, and George Pintzos 3 1 SAP AG, SAP

More information

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.

More information

Comparison of FP tree and Apriori Algorithm

Comparison of FP tree and Apriori Algorithm International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti

More information

Induction of Association Rules: Apriori Implementation

Induction of Association Rules: Apriori Implementation 1 Induction of Association Rules: Apriori Implementation Christian Borgelt and Rudolf Kruse Department of Knowledge Processing and Language Engineering School of Computer Science Otto-von-Guericke-University

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

5. MULTIPLE LEVELS AND CROSS LEVELS ASSOCIATION RULES UNDER CONSTRAINTS

5. MULTIPLE LEVELS AND CROSS LEVELS ASSOCIATION RULES UNDER CONSTRAINTS 5. MULTIPLE LEVELS AND CROSS LEVELS ASSOCIATION RULES UNDER CONSTRAINTS Association rules generated from mining data at multiple levels of abstraction are called multiple level or multi level association

More information

Stylus Studio Case Study: FIXML Working with Complex Message Sets Defined Using XML Schema

Stylus Studio Case Study: FIXML Working with Complex Message Sets Defined Using XML Schema Stylus Studio Case Study: FIXML Working with Complex Message Sets Defined Using XML Schema Introduction The advanced XML Schema handling and presentation capabilities of Stylus Studio have valuable implications

More information

challenges in domain-specific modeling raphaël mannadiar august 27, 2009

challenges in domain-specific modeling raphaël mannadiar august 27, 2009 challenges in domain-specific modeling raphaël mannadiar august 27, 2009 raphaël mannadiar challenges in domain-specific modeling 1/59 outline 1 introduction 2 approaches 3 debugging and simulation 4 differencing

More information

Usability Evaluation of Tools for Nomadic Application Development

Usability Evaluation of Tools for Nomadic Application Development Usability Evaluation of Tools for Nomadic Application Development Cristina Chesta (1), Carmen Santoro (2), Fabio Paternò (2) (1) Motorola Electronics S.p.a. GSG Italy Via Cardinal Massaia 83, 10147 Torino

More information

PPKM: Preserving Privacy in Knowledge Management

PPKM: Preserving Privacy in Knowledge Management PPKM: Preserving Privacy in Knowledge Management N. Maheswari (Corresponding Author) P.G. Department of Computer Science Kongu Arts and Science College, Erode-638-107, Tamil Nadu, India E-mail: mahii_14@yahoo.com

More information

DATA WAREHOUSING AND MINING UNIT-V TWO MARK QUESTIONS WITH ANSWERS

DATA WAREHOUSING AND MINING UNIT-V TWO MARK QUESTIONS WITH ANSWERS DATA WAREHOUSING AND MINING UNIT-V TWO MARK QUESTIONS WITH ANSWERS 1. NAME SOME SPECIFIC APPLICATION ORIENTED DATABASES. Spatial databases, Time-series databases, Text databases and multimedia databases.

More information

Interestingness Measurements

Interestingness Measurements Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Software Reuse and Component-Based Software Engineering

Software Reuse and Component-Based Software Engineering Software Reuse and Component-Based Software Engineering Minsoo Ryu Hanyang University msryu@hanyang.ac.kr Contents Software Reuse Components CBSE (Component-Based Software Engineering) Domain Engineering

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

A Study on Website Quality Models

A Study on Website Quality Models International Journal of Scientific and Research Publications, Volume 4, Issue 12, December 2014 1 A Study on Website Quality Models R.Anusha Department of Information Systems Management, M.O.P Vaishnav

More information

Extracting knowledge from Ontology using Jena for Semantic Web

Extracting knowledge from Ontology using Jena for Semantic Web Extracting knowledge from Ontology using Jena for Semantic Web Ayesha Ameen I.T Department Deccan College of Engineering and Technology Hyderabad A.P, India ameenayesha@gmail.com Khaleel Ur Rahman Khan

More information

Reasoning on Business Processes and Ontologies in a Logic Programming Environment

Reasoning on Business Processes and Ontologies in a Logic Programming Environment Reasoning on Business Processes and Ontologies in a Logic Programming Environment Michele Missikoff 1, Maurizio Proietti 1, Fabrizio Smith 1,2 1 IASI-CNR, Viale Manzoni 30, 00185, Rome, Italy 2 DIEI, Università

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Tadeusz Morzy, Maciej Zakrzewicz

Tadeusz Morzy, Maciej Zakrzewicz From: KDD-98 Proceedings. Copyright 998, AAAI (www.aaai.org). All rights reserved. Group Bitmap Index: A Structure for Association Rules Retrieval Tadeusz Morzy, Maciej Zakrzewicz Institute of Computing

More information

ISO. International Organization for Standardization. ISO/IEC JTC 1/SC 32 Data Management and Interchange WG4 SQL/MM. Secretariat: USA (ANSI)

ISO. International Organization for Standardization. ISO/IEC JTC 1/SC 32 Data Management and Interchange WG4 SQL/MM. Secretariat: USA (ANSI) ISO/IEC JTC 1/SC 32 N 0736 ISO/IEC JTC 1/SC 32/WG 4 SQL/MM:VIE-006 January, 2002 ISO International Organization for Standardization ISO/IEC JTC 1/SC 32 Data Management and Interchange WG4 SQL/MM Secretariat:

More information

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine

More information

A Generic Approach for Compliance Assessment of Interoperability Artifacts

A Generic Approach for Compliance Assessment of Interoperability Artifacts A Generic Approach for Compliance Assessment of Interoperability Artifacts Stipe Fustar Power Grid 360 11060 Parkwood Drive #2, Cupertino, CA 95014 sfustar@powergrid360.com Keywords: Semantic Model, IEC

More information

Inference in Hierarchical Multidimensional Space

Inference in Hierarchical Multidimensional Space Proc. International Conference on Data Technologies and Applications (DATA 2012), Rome, Italy, 25-27 July 2012, 70-76 Related papers: http://conceptoriented.org/ Inference in Hierarchical Multidimensional

More information

arxiv: v1 [cs.db] 10 May 2007

arxiv: v1 [cs.db] 10 May 2007 Decision tree modeling with relational views Fadila Bentayeb and Jérôme Darmont arxiv:0705.1455v1 [cs.db] 10 May 2007 ERIC Université Lumière Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information