PROJECT AND DEGREE THESIS PROPOSALS
Technologies for Information Systems

Context Integration for Mobile Data Design
- Disparate, heterogeneous, independent data sources
- Semantic schema integration
- Context-aware information filtering: data tailoring
- Common, integrated, semantic access to data
- Issues: mobility, data transiency
- Multiple scenarios: system adaptability
Context Model: Dimension Tree
The Dimension Tree:
- is a Context-User Model, represented as a constrained ontology
- uses dimensions to classify all the possible user-context pairs
- is an extension of the Very Small DataBase (VSDB) Dimension Array
Domain Ontology
The Domain Ontology:
- Represents the main concepts, relations and attributes of the domain: it builds a shared vocabulary
- Copes with the absence of an equivalent of a DB global schema
- Will be, in the medium/long term, shared and commonly agreed upon
- Must be decidable and computable (typically within OWL-DL)

Data Source: Semantic Extraction
The Data Source Ontology:
- Semantic extraction: abstract data model + storage model
- Supports query processing
- Model isolation (different models can be used separately)
Chunks
A chunk:
- is the set of relevant data for a given user in a given context
- can be derived from several data sources
- is highly context-aware
- can be materialized on the user device

Possible project areas
- Modules for ontology mapping (similarity detection techniques)
- Semantic extractors for the various information sources (XML, Web pages, OODB, wireless sensors)
- Query processing: the topic best suited to a combined project + thesis; it requires preliminary analysis work
- Chunk generation in the various phases of the system life cycle (design time, run time, query time)
- Toolbox for configuring the architecture
- CASE tool
XML
- XML (eXtensible Markup Language) derives, like HTML, from the SGML (Standard Generalized Markup Language) specification and was introduced by the W3C
- XML can be seen as a modern lingua franca for information modelling; unlike databases, it can also represent semi-structured data, which have an implicit and incomplete structure
- XML is neither a replacement for HTML nor a stand-alone programming language

Data Mining
- Data mining: the research area that studies techniques for extracting implicit, previously unknown but useful information from very large databases
- Association rule: an implication that holds with a certain frequency. For example, with a certain frequency f, viewers who follow game shows also follow television dramas.
Our goal
- Given:
  - an XML dataset D
  - a summarized representation of D by means of association rules AR
  - a query Q
- Provide an intensional answer to Q by querying AR instead of D
- Substitute the actual data answering the query with a set of properties characterizing them [Motro89]

Our goal
D <xml> --- Data Mining ---> AR <xml>; the query Q <XQuery> yields the EXTENSIONAL answer when run on D and the INTENSIONAL answer when run on AR.

Fragment of D:

    <article year="2001">
      <volume>30</volume>
      <number>2</number>
      <month>June</month>
      <conference>ACM International </conference>
      <date>May 21-24, 2001</date>
      <location>Santa Barbara, California, USA</location>
      <title articlecode="302001">Securing...</title>
      <authors>
        <author authorposition="01">E. Brown</author>
        <author authorposition="02">L. Baines</author>
      </authors>
      ...
    </article>

Q on D (EXTENSIONAL answer):

    <result> {
      for $article in doc("document.xml")//article
      where $article/authors/author/text() = "E. Brown"
      return $article
    } </result>

Q on AR (INTENSIONAL answer):

    <result> {
      for $article in doc("ruleset.xml")//AssociationRule
      where $article[RuleBody[item[ItemName="author" and ItemValue="E. Brown"]]]
      return $article
    } </result>

Fragment of AR:

    <AssociationRule support="0.2" confidence="0.8">
      <RuleBody>
        <item>
          <ItemName>author</ItemName>
          <ItemValue>E. Brown</ItemValue>
        </item>
      </RuleBody>
      <RuleHead>
        <item>
          <ItemName>conference</ItemName>
          <ItemValue>ACM Intern </ItemValue>
        </item>
      </RuleHead>
    </AssociationRule>
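As an illustration only, the intensional query can be mimicked outside XQuery. The sketch below uses Python's standard xml.etree.ElementTree and assumes the rule set is wrapped in a root element named RuleSet; that root name, the second rule and the helper names are hypothetical and not taken from the slides. Only the element names AssociationRule, RuleBody, RuleHead, item, ItemName and ItemValue come from the fragment above.

```python
import xml.etree.ElementTree as ET

# Toy rule set. The <RuleSet> wrapper and the second rule are invented
# for this sketch; the first rule mirrors the fragment on the slide.
RULESET_XML = """
<RuleSet>
  <AssociationRule support="0.2" confidence="0.8">
    <RuleBody>
      <item><ItemName>author</ItemName><ItemValue>E. Brown</ItemValue></item>
    </RuleBody>
    <RuleHead>
      <item><ItemName>conference</ItemName><ItemValue>ACM Intern</ItemValue></item>
    </RuleHead>
  </AssociationRule>
  <AssociationRule support="0.1" confidence="0.5">
    <RuleBody>
      <item><ItemName>author</ItemName><ItemValue>L. Baines</ItemValue></item>
    </RuleBody>
    <RuleHead>
      <item><ItemName>year</ItemName><ItemValue>2001</ItemValue></item>
    </RuleHead>
  </AssociationRule>
</RuleSet>
"""


def items(rule, part):
    """Return the (ItemName, ItemValue) pairs in the RuleBody or RuleHead of a rule."""
    return [
        (item.findtext("ItemName"), item.findtext("ItemValue"))
        for item in rule.findall(f"{part}/item")
    ]


def intensional_answer(ruleset_xml, name, value):
    """Select the rules whose body contains the item (name, value)."""
    root = ET.fromstring(ruleset_xml)
    answer = []
    for rule in root.iter("AssociationRule"):
        if (name, value) in items(rule, "RuleBody"):
            answer.append({
                "head": items(rule, "RuleHead"),
                "support": float(rule.get("support")),
                "confidence": float(rule.get("confidence")),
            })
    return answer


result = intensional_answer(RULESET_XML, "author", "E. Brown")
for rule in result:
    print(rule)
```

The answer is a list of properties (rule heads with their support and confidence) rather than the articles themselves, which is the sense in which it is intensional.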
Motivation
- XML is a verbose representation of data
  - Huge storage space
  - Long query processing time
- ARs provide a succinct representation
  - Provide fast, approximate, succinct answers to queries (e.g. for decision support purposes)
  - Can substitute the actual data set if it is currently unreachable

Patterns for XML Documents (1)
- Patterns = abstract representation of a generalization of constraints [BGQT04]
  - a summarized representation of the data
  - based on association rules extracted from the dataset
- Association rule: given two sets of data items X and Y, the rule is written $X \Rightarrow Y$
  - support: $\mathrm{sup}(X \Rightarrow Y) = \mathrm{freq}(X \cup Y)$
  - confidence: $\mathrm{conf}(X \Rightarrow Y) = \mathrm{freq}(X \cup Y) / \mathrm{freq}(X)$
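The two measures can be checked on a toy dataset. This is a minimal sketch, assuming that freq is the fraction of records containing every item of a set and that each record is a set of (name, value) items; the five records and the function names are made up for illustration.

```python
def freq(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def support(x, y, transactions):
    """sup(X => Y) = freq(X u Y)."""
    return freq(set(x) | set(y), transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = freq(X u Y) / freq(X)."""
    fx = freq(x, transactions)
    if fx == 0:
        raise ValueError("X never occurs, so the confidence is undefined")
    return support(x, y, transactions) / fx


# Hypothetical records: each one is the set of items of one article.
transactions = [
    frozenset({("author", "E. Brown"), ("conference", "ACM")}),
    frozenset({("author", "E. Brown"), ("conference", "ACM")}),
    frozenset({("author", "E. Brown"), ("conference", "VLDB")}),
    frozenset({("author", "L. Baines"), ("conference", "ACM")}),
    frozenset({("author", "L. Baines"), ("conference", "VLDB")}),
]

X = {("author", "E. Brown")}
Y = {("conference", "ACM")}

sup_xy = support(X, Y, transactions)      # 2 of 5 records contain X and Y
conf_xy = confidence(X, Y, transactions)  # 2 of the 3 records containing X
print(sup_xy, conf_xy)
```

Here X and Y occur together in 2 of the 5 records and X occurs in 3, so the support is 2/5 and the confidence is 2/3.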
Patterns for XML Documents (2)
Two orthogonal ways to classify patterns:
- Exact (e.g. functional dependencies) or probabilistic (weak constraints)
- Instance (dataset instances) or schema (dataset structure)

Patterns for XML Documents (3)
- Instance patterns = patterns expressed on the instances of the dataset
- GSL language for pattern formalization [BGQT04]
[Figure: Instance Pattern, Query]
Examples of framework (1/4)
- Classes of queries formalized as XQuery expressions to query either the XML dataset or the rule set
- A tool with a query prototype for each class of query

Examples of framework (2/4)
- Graphical query language to express queries: XQBE (XQuery By Example) [Braga03]
  - User friendly
  - Output: an XQuery expression that is easy to modify automatically so that it also queries the rule set
Examples of framework (3/4)
Examples of framework (4/4)
Wireless embedded sensor networks
- Thousands of tiny low-power devices spread over large physical areas monitor the environment, possibly predicting potential faults in buildings, bridges, roads, railways, etc.
- The devices must be small, unobtrusive and cheap
- The network must be inexpensive to develop, deploy, program, use and maintain

A sensor network
- Comprises a number of sensor nodes and a base station
- Applications:
  - Monitoring contaminated land areas or waters
  - Monitoring animal behaviour
  - Fire and earthquake emergencies
  - Vehicle tracking, traffic control
  - Surveillance of city districts, defense-related networks, alerts to terrorist threats
Motes: the Mica2 platform
- Mica2Dot: basically the same features, smaller size, fewer sensor options
- Different sensor boards for Mica2 and Mica2Dot

DB view of sensor networks
- Traditional: procedural addressing of individual sensor nodes. The user specifies how the task is executed, and data is processed centrally.
- DB-style approach: declarative querying. The user is not concerned with how the network works; processing is distributed in-network.
TinyDB
- TinyDB is a query processing system for extracting information from a network of TinyOS sensors
- Reduced SQL interface (with some additional constructs)
- Queries are issued from a PC
- Collects data from motes in the environment, filters it, aggregates it, and routes it out to a PC
- Exploits power-efficient in-network processing algorithms
- Multiple persistent queries with different sample times
- But further useful database functionalities are still lacking

- At least one VSDB should reside on every generic sensing device (e.g. Mica2), so as to compose a distributed/federated database
- Each VSDB should be context-aware
- Each VSDB should be able to appropriately redirect queries to neighbours (P2P):
  - because of an internal fault or a generic unavailability
  - because it does not possess the information
  - because the other node knows something more, which is needed to complete the information
  - because the other node has a less power-consuming sensor on board
- Each VSDB should be able to design appropriate, optimized query processing plans (e.g. redirect subquery, cache subquery result, etc.)
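The redirection criteria listed above can be sketched as a simple one-hop decision rule. This is not TinyDB code and not a design taken from the slides: the Node class, the per-attribute energy cost and the route function are all hypothetical, and the sketch covers only three of the four reasons (unavailability, missing information, cheaper sensor on a neighbour).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical sensing device hosting a VSDB."""
    name: str
    available: bool = True
    # attribute name -> assumed energy cost of taking one sample
    sensor_cost: Dict[str, float] = field(default_factory=dict)
    neighbours: List["Node"] = field(default_factory=list)


def route(node: Node, attribute: str) -> Optional[Node]:
    """Pick the node that should answer a query on `attribute`.

    Candidates are the node itself and its direct neighbours. A candidate
    is discarded if it is unavailable or has no sensor for the attribute;
    among the remaining ones the cheapest sensor wins, and the local node
    is preferred when costs are equal because it is listed first.
    """
    candidates = [
        n for n in [node, *node.neighbours]
        if n.available and attribute in n.sensor_cost
    ]
    if not candidates:
        return None  # nobody within one hop can answer
    return min(candidates, key=lambda n: n.sensor_cost[attribute])


a = Node("a", sensor_cost={"temp": 5.0})
b = Node("b", sensor_cost={"temp": 2.0, "light": 1.0})
c = Node("c", available=False, sensor_cost={"humidity": 1.0})
a.neighbours = [b, c]

print(route(a, "temp").name)   # b has the cheaper temperature sensor
print(route(a, "light").name)  # a has no light sensor, so b answers
print(route(a, "humidity"))    # only c could answer, but it is unavailable
```

A fuller plan would also have to handle multi-hop forwarding, partial answers completed by other nodes, and caching of subquery results, none of which is modelled here.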
Data extraction from web sources and data warehouse construction
A topic of interest: medical conferences around the world
- Definition of domain ontologies
- Information extractors
- Design and implementation of the database and of the data warehouse