By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz



2 Example: Merging Real-Estate Agencies Two real-estate agencies, S and T, decide to merge. Schema T has one table: Listings. Schema S has two tables: Houses and Agents. Merging schema S into schema T.

3 Example: Making Tuples Using SQL
area = SELECT location FROM HOUSES
agent-name = SELECT name FROM AGENTS
agent-address = SELECT concat(city, state) FROM AGENTS
list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
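The four mappings above can be tried out on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the underscore column names (SQLite identifiers cannot contain unquoted hyphens), the ', ' separator in the concatenation, and the single joined query are all assumptions made for illustration; the slides list the mappings as separate queries and give no data.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical instance of schema S.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE houses (location TEXT, price REAL, agent_id INTEGER);
CREATE TABLE agents (id INTEGER, name TEXT, city TEXT, state TEXT, fee_rate REAL);
INSERT INTO houses VALUES ('Atlanta, GA', 360000, 32);
INSERT INTO agents VALUES (32, 'Mike Brown', 'Athens', 'GA', 0.03);
""")

# Build one target tuple (area, agent-name, agent-address, list-price).
# The first two columns are 1-1 matches, the last two are complex matches.
row = conn.execute("""
SELECT h.location                 AS area,
       a.name                     AS agent_name,
       a.city || ', ' || a.state  AS agent_address,
       h.price * (1 + a.fee_rate) AS list_price
FROM houses AS h
JOIN agents AS a ON h.agent_id = a.id
""").fetchone()

print(row)
```

The join condition mirrors the slide's WHERE agent-id = id clause; only list-price strictly needs it, but a single query keeps the example short.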

4 Motivation Creating matches between data sources is important. Manually creating matches is hard. Past attempts deal only with one-to-one (1-1) matches, for example Address = Location or Room_Price = Room_Rate. Complex matches are not considered, for example Address = concat(city, state) or Room_Price = Room_Rate * (1 + Tax_Rate).

5 Introducing the iMAP System iMAP semi-automatically discovers both 1-1 matches and complex matches. Semi-automatically constructing complex matches is very important, since complex matches make up as much as half of all matches.

6 Complex Matches Creating complex matches is harder than creating 1-1 matches: the match space can be very large or even infinite. The number of 1-1 matches is bounded; the number of complex matches is not, because there is an unbounded number of functions for combining the attributes in a schema.

7 Example: 1-1 Matches
area = SELECT location FROM HOUSES
agent-name = SELECT name FROM AGENTS
agent-address = SELECT concat(city, state) FROM AGENTS
list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
The 1-1 matches here are area and agent-name: each maps to a single source attribute.

8 Example: Complex Matches
area = SELECT location FROM HOUSES
agent-name = SELECT name FROM AGENTS
agent-address = SELECT concat(city, state) FROM AGENTS
list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
The complex matches here are agent-address and list-price: each combines several source attributes.


10 Overview

11 Match Generator (1) Input: target schema (T) and source schema (S). Output: match candidates.

12 Match Generator (2) What the Match Generator does: it takes two schemas, S and T, as input. For each attribute t of T, it generates a set of match candidates, both 1-1 and complex. The generation is guided by a set of search modules.

13 Match Generator (3) For example, for t = area in T the candidates are: location in HOUSES; name in AGENTS; state in AGENTS; concat(city, state) in AGENTS; concat(name, city) in AGENTS.

14 Match Generator (4) PROBLEM: The number of match candidates is unbounded. SOLUTION: Search the space of possible matches. HOW: Use search modules, called searchers, each in charge of a specific type of attribute.

15 Match Generator (5) Searchers implemented in iMAP: the searchers cover many complex match types (text, numeric, category, etc.). The searchers evaluate match candidates and exploit domain knowledge, such as domain constraints and overlap data.

16 Match Generator (6) Applying search to candidate generation requires addressing three issues: (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition

17 Match Generator (7) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. The search space can be very large or even unbounded, so we need to search it efficiently. iMAP addresses this problem using beam search.

18 Match Generator (8) Beam Search Example (K = 3): [Figure: search tree over nodes A to H with a score on each node. The beam S evolves as {A}, {B, C, D}, {B, C, E}, {C, E, H}.]

19 Match Generator (9) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. We use beam search to search for candidate matches. Beam search uses a scoring function to evaluate each match candidate. At each level of the search tree, it keeps only the k highest-scoring matches. This lets a searcher conduct an efficient search in any type of search space.
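The beam search described above can be sketched in a few lines of Python. This is an illustrative sketch, not iMAP's implementation: the candidate representation (a tuple of column names to concatenate), the token-overlap scoring function, the sample rows, and the choice to keep the current beam in the pool alongside its expansions (as the beam in the slide 18 example appears to do) are all assumptions.

```python
from typing import Callable, Iterable, List, Tuple

Candidate = Tuple[str, ...]  # ordered column names to concatenate


def beam_search(start: Iterable[Candidate],
                expand: Callable[[Candidate], Iterable[Candidate]],
                score: Callable[[Candidate], float],
                k: int,
                max_levels: int = 10) -> List[Candidate]:
    """Keep only the k highest-scoring candidates at each level."""
    rank = lambda c: (-score(c), c)  # best score first, ties broken by name
    beam = sorted(set(start), key=rank)[:k]
    for _ in range(max_levels):
        pool = set(beam)             # current candidates stay in the pool
        for cand in beam:
            pool.update(expand(cand))
        new_beam = sorted(pool, key=rank)[:k]
        if new_beam == beam:         # nothing better was found
            break
        beam = new_beam
    return beam


# Hypothetical source rows and target values for agent-address.
ROWS = [
    {"name": "Mike Brown", "city": "Athens", "state": "GA"},
    {"name": "Jean Laup", "city": "Raleigh", "state": "NC"},
]
TARGET_VALUES = ["Athens GA", "Raleigh NC"]


def tokens(values: Iterable[str]) -> set:
    return {tok for value in values for tok in value.split()}


def score(cand: Candidate) -> float:
    """Jaccard overlap between candidate tokens and target tokens."""
    produced = tokens(" ".join(row[c] for c in cand) for row in ROWS)
    wanted = tokens(TARGET_VALUES)
    return len(produced & wanted) / len(produced | wanted)


def expand(cand: Candidate) -> List[Candidate]:
    """Append one more column that the candidate does not use yet."""
    return [cand + (col,) for col in ROWS[0] if col not in cand]


best = beam_search([(c,) for c in ROWS[0]], expand, score, k=3)
print(best[0])
```

On this toy data the concatenation of city and state ends up at the head of the beam, because its tokens coincide with those of the target values.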

20 Match Generator (10) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. To conduct beam search, we assign each match candidate a score based on the distance between it and the target attribute. For example, given the match candidate concat(city, state), we approximate the distance between it and the target attribute agent-address.

21 Match Generator (11) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. iMAP uses several techniques to compute the candidate scores: machine learning, statistics, and heuristics.

22 Match Generator (12) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. The search space can be unbounded, so we need to decide when to stop the search. We terminate when we start seeing diminishing returns from the search.

23 Match Generator (13) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition. In the i-th iteration we keep track of Max_i, the highest score among the candidate matches. If (Max_{i+1} - Max_i) < threshold, we stop the search and return the k highest-scoring match candidates.
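The stopping rule above amounts to comparing consecutive best scores. A minimal sketch follows; the function name, the score sequence, and the threshold value are hypothetical and chosen only to illustrate the rule.

```python
from typing import Sequence


def stopping_iteration(max_scores: Sequence[float], threshold: float) -> int:
    """Return the first iteration i + 1 where Max_{i+1} - Max_i < threshold.

    max_scores[i] is the highest candidate score seen in iteration i.
    If the improvement never drops below the threshold, the last
    iteration is returned.
    """
    if not max_scores:
        raise ValueError("need at least one iteration score")
    for i in range(len(max_scores) - 1):
        if max_scores[i + 1] - max_scores[i] < threshold:
            return i + 1
    return len(max_scores) - 1


# Hypothetical best scores per iteration: gains shrink as the search goes on.
stop = stopping_iteration([0.40, 0.62, 0.71, 0.72, 0.72], threshold=0.05)
print(stop)
```

In a real searcher the check would run inside the search loop rather than over a precomputed list; the list form just makes the rule easy to inspect.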

24 Overview

25 Similarity Estimator (1) Input: Match candidates Output: Similarity matrix

26 Similarity Estimator (2) What the Similarity Estimator does: it computes, for each candidate, a score of its similarity to attribute t of T. The output of this module is a matrix that stores the similarity score of each pair <target attribute, match candidate>, indexed by target attributes (t1, t2, t3, ...) on one axis and match candidates on the other.

27 Similarity Estimator (3) [Figure: similarity matrix. Match candidates: name, concat(city, state), price, price * (1 + fee-rate). Target attributes: area, list-price, agent-address, agent-name.]

28 Similarity Estimator (4) PROBLEM: Deciding which candidate is better than another. SOLUTION: For each target attribute t of T, the searchers suggest a set of match candidates. HOW: The score a searcher assigns to each candidate match is based on only a single type of information. For example, the text searcher considers only word frequencies.

29 Similarity Estimator (5) The Similarity Estimator evaluates these candidates and assigns each of them a final score. It exploits additional types of information to arrive at a more accurate score. It employs evaluator modules, each of which exploits one type of information, and then combines the suggested scores into one final score.
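One way to picture the combination step is a set of evaluator functions whose scores are merged per <target attribute, match candidate> pair. The sketch below is an assumption-laden illustration: the slides do not say how scores are combined, so a plain average is used here, and both evaluators (a name-similarity evaluator built on difflib and a lookup table standing in for a data-based evaluator) are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Callable, Dict, List, Tuple

Evaluator = Callable[[str, str], float]


def name_evaluator(target: str, candidate: str) -> float:
    """Hypothetical evaluator: string similarity of the two names."""
    return SequenceMatcher(None, target, candidate).ratio()


def table_evaluator(table: Dict[Tuple[str, str], float]) -> Evaluator:
    """Hypothetical evaluator backed by precomputed data-based scores."""
    return lambda target, candidate: table.get((target, candidate), 0.0)


def similarity_matrix(candidates_by_target: Dict[str, List[str]],
                      evaluators: List[Evaluator]) -> Dict[Tuple[str, str], float]:
    """Average the evaluator scores for every (target, candidate) pair."""
    return {
        (target, cand): mean(ev(target, cand) for ev in evaluators)
        for target, cands in candidates_by_target.items()
        for cand in cands
    }


# Made-up data-based scores for two candidates of agent-address.
data_scores = {
    ("agent-address", "concat(city, state)"): 0.9,
    ("agent-address", "name"): 0.1,
}
matrix = similarity_matrix(
    {"agent-address": ["concat(city, state)", "name"]},
    [name_evaluator, table_evaluator(data_scores)],
)
for pair, value in matrix.items():
    print(pair, round(value, 3))
```

A weighted combination or a learned combiner would slot in by replacing the call to mean.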

30 Overview

31 Match Selector (1) Input: similarity matrix. Output: 1-1 matches and complex matches.

32 Match Selector (2) What the Match Selector does: it examines the similarity matrix and outputs the best matches for the attributes of T, under certain constraints. [Figure: similarity matrix. Match candidates: name, concat(city, state), price, price * (1 + fee-rate). Target attributes: area, list-price, agent-address, agent-name.]

33 Match Selector (3) PROBLEM: How to assign the best candidate to each attribute. SOLUTION: The best global assignment could be the one in which each target attribute receives its highest-scoring match. BUT: This assignment may not be acceptable, because it may violate domain constraints.

34 The iMAP Architecture: Match Selector (4) For example: given the domain constraint that name and city in AGENTS are unrelated, the match area = SELECT concat(name, city) FROM AGENTS is not selected.
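The selection step can be sketched as picking, for each target attribute, the highest-scoring candidate that does not combine attributes declared unrelated. This is a simplified per-attribute sketch under assumed data structures (candidate dictionaries with expr, attrs, and score fields) and made-up scores; a full selector would also have to handle constraints that span several target attributes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def violates(attrs: Set[str], unrelated: Set[FrozenSet[str]]) -> bool:
    """True if the candidate uses two attributes declared unrelated."""
    return any(frozenset(pair) in unrelated
               for pair in combinations(sorted(attrs), 2))


def select_matches(matrix: Dict[str, List[dict]],
                   unrelated: Set[FrozenSet[str]]) -> Dict[str, str]:
    """For each target, keep the best candidate that satisfies the constraints."""
    selected = {}
    for target, candidates in matrix.items():
        allowed = [c for c in candidates if not violates(c["attrs"], unrelated)]
        if allowed:
            selected[target] = max(allowed, key=lambda c: c["score"])["expr"]
    return selected


# Hypothetical similarity scores for the target attribute area.
matrix = {
    "area": [
        {"expr": "concat(name, city)", "attrs": {"name", "city"}, "score": 0.8},
        {"expr": "location", "attrs": {"location"}, "score": 0.7},
    ],
}
# Domain constraint from the slide: name and city in AGENTS are unrelated.
unrelated = {frozenset({"name", "city"})}

selected = select_matches(matrix, unrelated)
print(selected)
```

Even though concat(name, city) has the higher (made-up) score, the constraint removes it and location is chosen for area.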

35 Overview

36 Exploiting Domain Knowledge (1) Exploiting domain knowledge is beneficial for 1-1 matching. For complex matching it is even more crucial, because it allows early detection of unlikely matches.

37 Exploiting Domain Knowledge (2) For example: iMAP learns that the number of real-estate agents in a specific area is bounded by 50. Given the match agent-name = concat(first-name, last-name), where first-name and last-name belong to the home owner, iMAP will realize that concat(first-name, last-name) yields hundreds of distinct names. Conclusion: concat(first-name, last-name) is unlikely to match agent-name.
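The reasoning in this example reduces to counting distinct values. A minimal sketch, assuming the bound of 50 from the slide and entirely synthetic owner and agent names:

```python
from typing import Iterable


def violates_distinct_bound(values: Iterable[str], max_distinct: int) -> bool:
    """True if a candidate produces more distinct values than the bound allows."""
    return len(set(values)) > max_distinct


MAX_AGENTS = 50  # bound on the number of distinct agent names, as in the slide

# Synthetic data: 200 home owners, each with a different full name.
owner_names = [f"First{i} Last{i}" for i in range(200)]
# Synthetic data: 200 listings handled by only 10 different agents.
agent_names = [f"Agent{i % 10}" for i in range(200)]

print(violates_distinct_bound(owner_names, MAX_AGENTS))
print(violates_distinct_bound(agent_names, MAX_AGENTS))
```

A candidate that violates the bound can be discarded before any more expensive evaluation is run, which is the early detection the slides refer to.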

38 Exploiting Domain Knowledge (3) iMAP exploits four kinds of domain knowledge: domain constraints, past complex matches, overlap data, and external data.

39 Exploiting Domain Knowledge (4) iMAP exploits domain knowledge: domain constraints, past complex matches, overlap data, and external data. Topic of the following slides: domain constraints.

40 Exploiting Domain Knowledge (5) Constraints are either: present in the schema (for example, agent-name is text and amount is numeric only); provided by an expert (for example, the tax on a room cannot be less than 7%); or provided by the user (for example, we do not sell houses that cost less than $200,000).

41 Exploiting Domain Knowledge (6) iMAP considers 3 kinds of constraints: (1) two attributes are unrelated (for example, name and beds are unrelated, meaning that they cannot appear in the same match formula); (2) a constraint on a single attribute (for example, the average value of numrooms does not exceed 10); (3) multiple schema attributes are unrelated (for example, area and agent-name are unrelated).

42 Exploiting Domain Knowledge (7) iMAP exploits domain knowledge: domain constraints, past complex matches, overlap data, and external data. Topic of the following slide: past complex matches.

43 Exploiting Domain Knowledge (8) Sometimes the same or similar schemas are mapped repeatedly. iMAP extracts the expression templates of these past matches and uses them to guide the search process. For example: given the past match price = pr * ( ), iMAP will extract the template VARIABLE * (1 + CONSTANT) and ask the numeric searcher to look for matches that fit that template.
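Template extraction of this kind can be approximated with a regular expression that replaces attribute names by VARIABLE and decimal literals by CONSTANT. The sketch below is a rough illustration, not iMAP's method: the token rules are assumptions, and the past match pr * (1 + 0.06) uses an invented constant, since the constant in the slide's example did not survive transcription.

```python
import re

# One pass over the expression: decimal literals such as 0.06 become CONSTANT,
# identifiers such as pr or fee-rate become VARIABLE. Bare integers such as 1
# are left alone so that the shape (1 + CONSTANT) is preserved.
TOKEN = re.compile(r"(?P<num>\d+\.\d+)|(?P<ident>[A-Za-z_][A-Za-z0-9_-]*)")


def extract_template(expression: str) -> str:
    """Abstract a match expression into a reusable template string."""
    return TOKEN.sub(
        lambda m: "CONSTANT" if m.group("num") else "VARIABLE",
        expression,
    )


def fits_template(expression: str, template: str) -> bool:
    """True if the expression abstracts to the given template."""
    return extract_template(expression) == template


# Hypothetical past match; the constant 0.06 is made up for illustration.
template = extract_template("pr * (1 + 0.06)")
print(template)
print(fits_template("price * (1 + 0.03)", template))
print(fits_template("concat(city, state)", template))
```

A searcher could use fits_template to restrict its candidates to expressions of a shape that has worked before.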

44 Exploiting Domain Knowledge (9) iMAP exploits domain knowledge: domain constraints, past complex matches, overlap data, and external data. Topic of the following slides: overlap data.

45 Exploiting Domain Knowledge (10) There are many cases where the source (S) and the target (T) share some of the same data. For example: databases S and T share a house listing ( Atlanta, GA, ). In such overlap cases the shared data provides valuable information for the mapping process.

46 Exploiting Domain Knowledge (11) Searchers that exploit overlap data: the Overlap Text Searcher, the Overlap Numeric Searcher, and the Overlap Category & Schema Mismatch Searchers.

47 Exploiting Domain Knowledge (12) HOW THE SEARCHERS WORK: Step 1: Use the original searchers to produce an initial mapping. Step 2: Use the overlap data to re-evaluate the mappings and improve matching accuracy. For example: when re-evaluated, the mapping agent-address = location receives score 0, because it is not correct for the shared house listing ( Atlanta, GA, ), while agent-address = concat(city, state) receives score 1.
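Step 2 can be pictured as checking each candidate mapping against the tuples that both databases share. In the sketch below the score is the fraction of shared tuples for which the mapping reproduces the target value exactly; this scoring rule, the ', ' separator, and the sample listing are assumptions for illustration (the listing in the slide is truncated and is not reproduced here).

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Mapping = Callable[[Row], str]


def overlap_score(mapping: Mapping,
                  shared: List[Tuple[Row, str]]) -> float:
    """Fraction of shared tuples where the mapping yields the target value."""
    if not shared:
        raise ValueError("no overlap data available")
    hits = sum(1 for source_row, target_value in shared
               if mapping(source_row) == target_value)
    return hits / len(shared)


# Hypothetical shared listing: the source row and the agent-address value
# that the target database stores for the same listing.
shared = [
    ({"location": "Atlanta, GA", "city": "Athens", "state": "GA"}, "Athens, GA"),
]

by_location = lambda row: row["location"]
by_concat = lambda row: f"{row['city']}, {row['state']}"

print(overlap_score(by_location, shared))
print(overlap_score(by_concat, shared))
```

With more shared tuples the score becomes a graded measure rather than 0 or 1, which lets partially correct mappings be ranked.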

48 Exploiting Domain Knowledge (13) iMAP exploits domain knowledge: domain constraints, past complex matches, overlap data, and external data. Topic of the following slides: external data.

49 Exploiting Domain Knowledge (14) External data is used as additional constraints on the attributes of a schema. It is usually provided by experts and can be very useful in schema matching.

50 Exploiting Domain Knowledge (15) HOW EXPLOITING EXTERNAL DATA WORKS: Step 1: Use external data to learn about a feature of the target attribute. Step 2: Apply the learned information to evaluate matches for the target attribute. For example: for the target attribute agent-name, a feature that is potentially useful in schema matching is the number of distinct agent names.

51 Overview

52 Generating Explanations (1) Example: in matching real-estate schemas, for the attribute list-price, iMAP has produced two matches: list-price = price and list-price = price * (1 + monthly-fee-rate).

53 Generating Explanations (2) PROBLEM: The user is uncertain which of the two is the correct match. SOLUTION: iMAP must explain the ranking. For example: why did list-price = price get a higher rank than list-price = price * (1 + monthly-fee-rate)?

54 Generating Explanations (3) iMAP's goal is to provide an environment in which a human user can quickly generate a mapping between a pair of schemas. For the user to know which match to choose, iMAP must supply an explanation for each of the matches.

55 Generating Explanations (4) iMAP considers 3 questions: (1) Explain an existing match: why does the match exist? For example, why does the match month-posted = monthly-fee-rate exist? (2) Explain an absent match: why doesn't the match exist? (3) Explain a match ranking: why is one match better than another?

56 Generating Explanations (5) iMAP keeps track of the decision-making process in a dependency graph. Each node is one of the following: a schema attribute, an assumption, a candidate match, or a piece of domain knowledge. An edge between two nodes means that one node leads to the other.
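A dependency graph of this kind can be queried by walking backwards from a candidate match to everything that led to it. The sketch below is a toy illustration: the adjacency-dictionary representation, the node labels, and the specific assumption shown are hypothetical, not taken from iMAP.

```python
from collections import deque
from typing import Dict, List


def explain(node: str, edges: Dict[str, List[str]]) -> List[str]:
    """Return every node that directly or indirectly leads to the given node."""
    # Invert the "leads to" edges so we can walk from effect back to causes.
    parents: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen, causes, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                causes.append(parent)
                queue.append(parent)
    return causes


# Hypothetical graph: an edge A -> B means that A leads to B.
MATCH = "candidate: month-posted = monthly-fee-rate"
edges = {
    "attribute: month-posted": [MATCH],
    "attribute: monthly-fee-rate": [MATCH],
    "assumption: names sharing the word 'month' are related": [MATCH],
    "domain knowledge: past matches": [
        "assumption: names sharing the word 'month' are related"
    ],
}

for cause in explain(MATCH, edges):
    print(cause)
```

Answering "why does this match exist?" then amounts to presenting the returned nodes to the user, nearest causes first.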

57 Generating Explanations (6)


59 Conclusions (1) Matches are key to enabling a wide variety of data sharing and exchange scenarios. The majority of the research on schema matching has focused on 1-1 matches. iMAP offers a solution to the problem of finding complex matches. The key challenge with complex matches is that the space of possible match candidates may be unbounded, and evaluating each candidate is harder.

60 Conclusions (2) The architecture of iMAP is modular and extensible: new searchers and new evaluation modules can be added easily. Experimental results show that iMAP achieves 43-92% accuracy on several real-world domains, demonstrating the promise of the approach.



More information

Structured Data on the Web

Structured Data on the Web Structured Data on the Web Alon Halevy Google Australasian Computer Science Week January, 2010 Structured Data & The Web Andree Hudson, 4 th of July Hard to find structured data via search engines

More information

NALA Certifying Board Announces New Exam Specifications Effective with 2018 Administrations

NALA Certifying Board Announces New Exam Specifications Effective with 2018 Administrations NALA Certifying Board Announces New Exam Specifications Effective with 2018 Administrations Background The NALA Certifying Board provides oversight for the development and ongoing maintenance of the Certified

More information

Design of Parallel Algorithms. Models of Parallel Computation

Design of Parallel Algorithms. Models of Parallel Computation + Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes

More information

10 Tips for Real Estate Agents looking for an Internet Fax Service

10 Tips for Real Estate Agents looking for an Internet Fax Service 10 Tips for Real Estate Agents looking for an Internet Fax Service June 22, 2006 Wendy Lowe 1 Agenda 10 Tips for Real Estate agents looking to purchase Internet Fax Introduction to MyFax Q&A 2 Real Estate

More information

7. Solve the following compound inequality. Write the solution in interval notation.

7. Solve the following compound inequality. Write the solution in interval notation. 1. Write an inequality that describes the graph. 0 1. Use words to describe the line graph. 4. Create a line graph of the inequality. x > 4 4. Write an inequality that describes the graph. 0 5. Write an

More information

CMA CMA Create Save Comparative Market Analyses CMA Analysis Resume Comparable Pricing Estimated Seller Proceeds Comparison Adjustable

CMA CMA Create Save Comparative Market Analyses CMA Analysis Resume Comparable Pricing Estimated Seller Proceeds Comparison Adjustable CMA By clicking CMA from the Navica main menu, you are able to Create and Save a Comparative Market Analyses for your Clients using data that is already stored within the system. The CMA feature within

More information

Multiple Query Optimization for Density-Based Clustering Queries over Streaming Windows

Multiple Query Optimization for Density-Based Clustering Queries over Streaming Windows Worcester Polytechnic Institute DigitalCommons@WPI Computer Science Faculty Publications Department of Computer Science 4-1-2009 Multiple Query Optimization for Density-Based Clustering Queries over Streaming

More information

The Threshold Algorithm: from Middleware Systems to the Relational Engine

The Threshold Algorithm: from Middleware Systems to the Relational Engine IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.?, NO.?,?? 1 The Threshold Algorithm: from Middleware Systems to the Relational Engine Nicolas Bruno Microsoft Research nicolasb@microsoft.com Hui(Wendy)

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

Saving Time and Costs with Virtual Patching and Legacy Application Modernizing

Saving Time and Costs with Virtual Patching and Legacy Application Modernizing Case Study Virtual Patching/Legacy Applications May 2017 Saving Time and Costs with Virtual Patching and Legacy Application Modernizing Instant security and operations improvement without code changes

More information

System Setup. Accessing the Setup. Chapter 1

System Setup. Accessing the Setup. Chapter 1 System Setup Chapter 1 Chapter 1 System Setup When you create deals, certain pieces of standard information must be entered repeatedly. Continually entering the same information takes time and leaves you

More information

New Matrix Features Version 5.5. Count on the Fly. Contact Carts Navigation Bar Improvements Goggles Market Watch Widget Stats

New Matrix Features Version 5.5. Count on the Fly. Contact Carts Navigation Bar Improvements Goggles Market Watch Widget Stats New Matrix Features Version 5.5 Count on the Fly Contact Carts Navigation Bar Improvements Goggles Market Watch Widget Stats Count on the Fly When conducting a search, Count On the Fly displays the number

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Wire Fraud Begins to Hammer the Construction Industry

Wire Fraud Begins to Hammer the Construction Industry Wire Fraud Begins to Hammer the Construction Industry Cybercriminals are adding new housing construction to their fraud landscape and likely on a wide scale. Created and published by: Thomas W. Cronkright

More information

FUNCTIONAL DEPENDENCIES

FUNCTIONAL DEPENDENCIES FUNCTIONAL DEPENDENCIES CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Database Design Theory: Functional Dependencies Armstrong s rules The Closure Algorithm

More information

Digital Audience Analysis: Understanding Online Car Shopping Behavior & Sources of Traffic to Dealer Websites

Digital Audience Analysis: Understanding Online Car Shopping Behavior & Sources of Traffic to Dealer Websites October 2012 Digital Audience Analysis: Understanding Online Car Shopping Behavior & Sources of Traffic to Dealer Websites The Internet has rapidly equipped shoppers with more tools, resources, and overall

More information

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Jianyong Wang Department of Computer Science and Technology Tsinghua University Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity

More information

Additional reading for this lecture: Heuristic Evaluation by Jakob Nielsen. Read the first four bulleted articles, starting with How to conduct a

Additional reading for this lecture: Heuristic Evaluation by Jakob Nielsen. Read the first four bulleted articles, starting with How to conduct a Additional reading for this lecture: Heuristic Evaluation by Jakob Nielsen. Read the first four bulleted articles, starting with How to conduct a heuristic evaluation and ending with How to rate severity.

More information

COB Certified E-Commerce & E-Business Manager E-Learning Options

COB Certified E-Commerce & E-Business Manager E-Learning Options COB Certified E-Commerce & E-Business Manager E-Learning Options Course Information USD Edition The Certificate in Online Business www.cobcertified.com August 2017 Edition V.5 1 Table of Contents INTRODUCTION...

More information

Efficient Mining Algorithms for Large-scale Graphs

Efficient Mining Algorithms for Large-scale Graphs Efficient Mining Algorithms for Large-scale Graphs Yasunari Kishimoto, Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka Abstract This article describes efficient graph mining algorithms designed

More information

Mining Generalised Emerging Patterns

Mining Generalised Emerging Patterns Mining Generalised Emerging Patterns Xiaoyuan Qian, James Bailey, Christopher Leckie Department of Computer Science and Software Engineering University of Melbourne, Australia {jbailey, caleckie}@csse.unimelb.edu.au

More information

Collective Entity Resolution in Relational Data

Collective Entity Resolution in Relational Data Collective Entity Resolution in Relational Data I. Bhattacharya, L. Getoor University of Maryland Presented by: Srikar Pyda, Brett Walenz CS590.01 - Duke University Parts of this presentation from: http://www.norc.org/pdfs/may%202011%20personal%20validation%20and%20entity%20resolution%20conference/getoorcollectiveentityresolution

More information

The Quick Guide to Better Site Search

The Quick Guide to Better Site Search The Quick Guide to Better Site Search Start improving your site search today sli-systems.com sli-systems.com.au sli-systems.co.uk To accelerate your e-commerce, start with site search Turn Your Browsers

More information

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps

Example: CPU-bound process that would run for 100 quanta continuously 1, 2, 4, 8, 16, 32, 64 (only 37 required for last run) Needs only 7 swaps Interactive Scheduling Algorithms Continued o Priority Scheduling Introduction Round-robin assumes all processes are equal often not the case Assign a priority to each process, and always choose the process

More information

Price Performance Analysis of NxtGen Vs. Amazon EC2 and Rackspace Cloud.

Price Performance Analysis of NxtGen Vs. Amazon EC2 and Rackspace Cloud. Price Performance Analysis of Vs. EC2 and Cloud. Performance Report: ECS Performance Analysis of Virtual Machines on ECS and Competitive IaaS Offerings An Examination of Web Server and Database Workloads

More information

Reliability Measure of 2D-PAGE Spot Matching using Multiple Graphs

Reliability Measure of 2D-PAGE Spot Matching using Multiple Graphs Reliability Measure of 2D-PAGE Spot Matching using Multiple Graphs Dae-Seong Jeoune 1, Chan-Myeong Han 2, Yun-Kyoo Ryoo 3, Sung-Woo Han 4, Hwi-Won Kim 5, Wookhyun Kim 6, and Young-Woo Yoon 6 1 Department

More information

Kristina Lerman Anon Plangprasopchok Craig Knoblock. USC Information Sciences Institute

Kristina Lerman Anon Plangprasopchok Craig Knoblock. USC Information Sciences Institute Kristina Lerman Anon Plangprasopchok Craig Knoblock Check weather forecast Find hotels address Select hotel by price, features and reviews Find flights features Get distance to hotel Email agenda to attendees

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Online supplement for: A Study of Quality and Accuracy Tradeoffs in Process Mining, by Zan Huang and Akhil Kumar (Appendices A F) APPENDIX A

Online supplement for: A Study of Quality and Accuracy Tradeoffs in Process Mining, by Zan Huang and Akhil Kumar (Appendices A F) APPENDIX A Online supplement for: A Study of Quality and Accuracy Tradeoffs in Process Mining, by Zan Huang and Akhil Kumar (Appendices A F) APPENDIX A This example illustrates the calculation of the mismerge score

More information

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System

A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System A Vector Space Equalization Scheme for a Concept-based Collaborative Information Retrieval System Takashi Yukawa Nagaoka University of Technology 1603-1 Kamitomioka-cho, Nagaoka-shi Niigata, 940-2188 JAPAN

More information

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu Presented by: Dimitri Galmanovich Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu 1 When looking for Unstructured data 2 Millions of such queries every day

More information

Performance Analysis of Virtual Machines on NxtGen ECS and Competitive IaaS Offerings An Examination of Web Server and Database Workloads

Performance Analysis of Virtual Machines on NxtGen ECS and Competitive IaaS Offerings An Examination of Web Server and Database Workloads Performance Report: ECS Performance Analysis of Virtual Machines on ECS and Competitive IaaS Offerings An Examination of Web Server and Database Workloads April 215 EXECUTIVE SUMMARY commissioned this

More information

NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING

NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING NAME SIMILARITY MEASURES FOR XML SCHEMA MATCHING Ali El Desoukey Mansoura University, Mansoura, Egypt Amany Sarhan, Alsayed Algergawy Tanta University, Tanta, Egypt Seham Moawed Middle Delta Company for

More information

Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance

Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance NISTIR 8010 Challenges and Benefits of a Methodology for Scoring Web Content Accessibility Guidelines (WCAG) 2.0 Conformance Frederick Boland Elizabeth Fong http://dx.doi.org/10.6028/nist.ir.8010 NISTIR

More information

Contractors Guide to Search Engine Optimization

Contractors Guide to Search Engine Optimization Contractors Guide to Search Engine Optimization CONTENTS What is Search Engine Optimization (SEO)? Why Do Businesses Need SEO (If They Want To Generate Business Online)? Which Search Engines Should You

More information

3 SOLVING PROBLEMS BY SEARCHING

3 SOLVING PROBLEMS BY SEARCHING 48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not

More information

Programming Logic and Design Sixth Edition

Programming Logic and Design Sixth Edition Objectives Programming Logic and Design Sixth Edition Chapter 6 Arrays In this chapter, you will learn about: Arrays and how they occupy computer memory Manipulating an array to replace nested decisions

More information

Matching Schemas in Online Communities: A Web 2.0 Approach

Matching Schemas in Online Communities: A Web 2.0 Approach Matching Schemas in Online Communities: A Web.0 Approach Robert McCann 1, Warren Shen, AnHai Doan 1 Microsoft, University of Wisconsin-Madison robert.mccann@microsoft.com, {whshen,anhai}@cs.wisc.edu Abstract

More information

Parser: SQL parse tree

Parser: SQL parse tree Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient

More information

Final Exam Review (Revised 3/16) Math MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Final Exam Review (Revised 3/16) Math MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Final Exam Review (Revised 3/16) Math 0001 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Evaluate. 1) 1 14 1) A) 1 B) 114 C) 14 D) undefined

More information

Processing Structural Constraints

Processing Structural Constraints SYNONYMS None Processing Structural Constraints Andrew Trotman Department of Computer Science University of Otago Dunedin New Zealand DEFINITION When searching unstructured plain-text the user is limited

More information