By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz
Transcription
1 By Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy and Pedro Domingos. Presented by Yael Kazaz
2 Example: Merging Real-Estate Agencies Two real-estate agencies: S and T, decide to merge Schema T has one table: Listings Schema S has two tables: Houses and Agents Merging schema S into schema T
3 Example: Making Tuples Using SQL area = SELECT location from HOUSES agent-name = SELECT name from AGENTS agent-address = SELECT concat(city, state) FROM AGENTS list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
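The queries on this slide can be tried out with a small in-memory database. The following is a minimal sketch, not part of the original slides: the sample rows are invented, column names use underscores instead of hyphens so they are valid unquoted SQL identifiers, and the `', '` separator inside the concatenation is an assumption.

```python
import sqlite3

# Hypothetical sample data for schema S (tables HOUSES and AGENTS).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents (id INTEGER, name TEXT, city TEXT, state TEXT, fee_rate REAL);
    CREATE TABLE houses (location TEXT, price REAL, agent_id INTEGER);
    INSERT INTO agents VALUES (1, 'Mike Brown', 'Atlanta', 'GA', 0.5);
    INSERT INTO houses VALUES ('Atlanta, GA', 200000.0, 1);
""")

# 1-1 match: area = location
area = [row[0] for row in conn.execute("SELECT location FROM houses")]

# Complex match: agent-address = concat(city, state)
agent_address = [row[0] for row in conn.execute(
    "SELECT city || ', ' || state FROM agents")]

# Complex match spanning two tables: list-price = price * (1 + fee-rate)
list_price = [row[0] for row in conn.execute(
    "SELECT price * (1 + fee_rate) FROM houses JOIN agents ON agent_id = id")]
```

Each query produces the values of one attribute of the target table Listings from the source tables.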
4 Motivation Creating matches between data sources is important. Manually creating matches is hard. Past attempts deal only with one-to-one (1-1) matches. For example: Address = Location. For example: Room_Price = Room_Rate. Complex matches are not considered. For example: Address = concat(city, state). For example: Room_Price = Room_Rate * (1 + Tax_Rate).
5 Introducing the iMAP System Semi-automatically discovering: 1-1 matches and complex matches. Semi-automatically constructing complex matches is very important, since complex matches make up as much as half of all matches!
6 Complex Matches Creating complex matches is harder than creating 1-1 matches: the match space can be very large or even infinite! The number of 1-1 matches is bounded. The number of complex matches is not: there is an unbounded number of functions for combining attributes in a schema.
7 Example: 1-1 Matches area = SELECT location from HOUSES agent-name = SELECT name from AGENTS agent-address = SELECT concat(city, state) FROM AGENTS list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
8 Example: Complex Matches area = SELECT location from HOUSES agent-name = SELECT name from AGENTS agent-address = SELECT concat(city, state) FROM AGENTS list-price = SELECT price * (1 + fee-rate) FROM HOUSES, AGENTS WHERE agent-id = id
10 Overview
11 Match Generator (1) Input: Target schema (T) Source schema (S) Output: Match candidates
12 Match Generator (2) What the Match Generator does: The Match Generator takes as input two schemas, S and T. For each attribute t of T, it generates a set of match candidates, both 1-1 and complex. The generation is guided by a set of search modules.
13 Match Generator (3) For Example: for t = area in T the candidates are: location in HOUSES name in AGENTS state in AGENTS concat(city, state) in AGENTS concat(name, city) in AGENTS
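To see why the candidate space grows so quickly, a naive enumeration of candidates like the ones listed above might look as follows. This is an illustrative sketch only: the schema dictionary, the function name `generate_candidates`, and the restriction to concatenations within a single table are assumptions, not a description of how iMAP enumerates candidates.

```python
from itertools import permutations

# Hypothetical source schema from the running example.
source_schema = {
    "HOUSES": ["location", "price"],
    "AGENTS": ["name", "city", "state"],
}

def generate_candidates(schema, max_concat=2):
    """Enumerate 1-1 candidates and concatenations of up to max_concat columns per table."""
    candidates = []
    for table, columns in schema.items():
        for column in columns:
            candidates.append((table, column))  # 1-1 candidate
        for size in range(2, max_concat + 1):
            for combo in permutations(columns, size):
                candidates.append((table, "concat(" + ", ".join(combo) + ")"))
    return candidates

candidates = generate_candidates(source_schema)
```

Even this tiny schema yields 13 candidates when only pairs are concatenated, and the count grows factorially with `max_concat`, which is why exhaustive enumeration is not practical.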
14 Match Generator (4) PROBLEM: Unbounded number of match candidates SOLUTION: Search the space of possible matches HOW: Use search modules, called searchers, each in charge of a specific type of attribute
15 Match Generator (5) Implemented searchers in iMAP: The searchers cover many complex match types: text, numeric, category, etc. The searchers evaluate match candidates and exploit domain knowledge, such as domain constraints and overlap data.
16 Match Generator (6) Applying search to candidate generation requires addressing three issues: (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition
17 Match Generator (7) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition The search space can be very large or even unbounded. We need to search such spaces efficiently. iMAP addresses this problem using beam search.
18 Match Generator (8) Beam Search Example (k = 3): [Figure: search tree rooted at A with scored nodes B to H; the individual node scores are not recoverable from the transcription.] The beam evolves as S = {A}, S = {B, C, D}, S = {B, C, E}, S = {C, E, H}.
19 Match Generator (9) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition We use beam search to search the candidate matches. Beam search uses a scoring function to evaluate each match candidate. At each level of the search tree, it keeps only the k highest-scoring matches. The searcher can thus conduct an efficient search in any type of search space.
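The beam search described above can be sketched in a few lines. This is a generic illustration under stated assumptions, not iMAP's implementation: the sample rows, the `expand` function (which appends one column to a concatenation) and the token-overlap `score` are all hypothetical stand-ins.

```python
import heapq

def beam_search(start, expand, score, k=3, max_levels=5):
    """Return the k best candidates found by a level-by-level beam search."""
    beam = heapq.nlargest(k, sorted(start), key=score)
    for _ in range(max_levels):
        # Candidates for the next level: the current beam plus its expansions.
        pool = set(beam)
        for candidate in beam:
            pool.update(expand(candidate))
        # sorted() makes tie-breaking deterministic before taking the top k.
        next_beam = heapq.nlargest(k, sorted(pool), key=score)
        if next_beam == beam:
            break  # nothing better was found at this level
        beam = next_beam
    return beam

# Hypothetical sample rows and target values for agent-address.
rows = [
    {"name": "Mike", "city": "Atlanta", "state": "GA"},
    {"name": "Jean", "city": "Raleigh", "state": "NC"},
]
target_values = ["Atlanta GA", "Raleigh NC"]

def expand(candidate):
    """Append one more column to a concatenation candidate."""
    return [candidate + (col,) for col in rows[0] if col not in candidate]

def score(candidate):
    """Average token overlap (Jaccard) between produced values and target values."""
    total = 0.0
    for row, target in zip(rows, target_values):
        produced = {row[col] for col in candidate}
        wanted = set(target.split())
        total += len(produced & wanted) / len(produced | wanted)
    return total / len(rows)

best = beam_search([(col,) for col in rows[0]], expand, score, k=2)
```

Because this toy score ignores token order, the two orderings of city and state tie for first place; a real scoring function would need to distinguish them.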
20 Match Generator (10) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition To conduct beam search, we assign each match candidate a score that approximates the distance between it and the target attribute. For example: given the match candidate concat(city, state), we approximate the distance between it and the target attribute agent-address.
21 Match Generator (11) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition iMAP uses several techniques to compute the candidate scores: machine learning, statistics, heuristics.
22 Match Generator (12) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition The search space can be unbounded We need to decide when to stop the search We terminate when we start seeing diminishing returns from our search
23 Match Generator (13) (1) Search strategy (2) Evaluation of candidate matches (3) Termination condition In the $i$-th iteration we keep track of $Max_i$, the highest score among the candidate matches. If $Max_{i+1} - Max_i < \text{threshold}$, we stop the search and return the k highest-scoring match candidates.
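The stopping rule can be illustrated with a short loop. This is a sketch under assumptions: the function name, the `(score, candidate)` tuple layout, the threshold value, and the sample scores are all hypothetical.

```python
def run_until_diminishing_returns(score_batches, k=3, threshold=0.05):
    """Consume one batch of (score, candidate) pairs per iteration and stop
    once the best score of an iteration improves on the previous one by
    less than `threshold`."""
    best = []          # k highest-scoring pairs seen so far
    previous_max = None
    iterations = 0
    for batch in score_batches:
        iterations += 1
        best = sorted(best + batch, reverse=True)[:k]
        current_max = max(s for s, _ in batch)   # Max_i for this iteration
        if previous_max is not None and current_max - previous_max < threshold:
            break
        previous_max = current_max
    return best, iterations

# Hypothetical scores for candidates of the target attribute agent-address.
batches = [
    [(0.40, "location"), (0.10, "name")],
    [(0.70, "concat(city, state)"), (0.30, "concat(name, city)")],
    [(0.72, "concat(state, city)")],
    [(0.99, "never evaluated")],
]
best, iterations = run_until_diminishing_returns(batches)
```

Here the third iteration improves the best score by only 0.02, so the search stops before the fourth batch is ever examined.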
24 Overview
25 Similarity Estimator (1) Input: Match candidates Output: Similarity matrix
26 Similarity Estimator (2) What the Similarity Estimator does: Computes, for each candidate, a score of similarity to attribute t of T. The output of this module is a matrix that stores the similarity score of each pair <target attribute, match candidate>, indexed by the target attributes (t1, t2, t3, ...) and the match candidates.
27 Similarity Estimator (3) [Figure: example similarity matrix. Match candidates: name, concat(city, state), price, price * (1 + fee-rate). Target attributes: area, list-price, agent-address, agent-name.]
28 Similarity Estimator (4) PROBLEM: Deciding which candidate is better than another. SOLUTION: For each target attribute t of T, the searchers suggest a set of match candidates. HOW: The score assigned to each candidate match is based on only a single type of information. For example: the text searcher considers only word frequencies.
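As an illustration of scoring on word frequencies alone, the sketch below compares two columns by the cosine similarity of their word-count vectors. The slides do not say how the text searcher uses word frequencies, so cosine similarity is a swapped-in stand-in, and the sample values are invented.

```python
import math
from collections import Counter

def word_frequencies(values):
    """Count lowercase words across all values of a column."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().replace(",", " ").split())
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical column values.
agent_address = word_frequencies(["Atlanta, GA", "Raleigh, NC"])   # target attribute
concat_city_state = word_frequencies(["Atlanta GA", "Raleigh NC"]) # candidate
name = word_frequencies(["Mike Brown", "Jean Laup"])               # candidate

score_concat = cosine_similarity(agent_address, concat_city_state)
score_name = cosine_similarity(agent_address, name)
```

A score based on one signal like this can be misleading on its own, which is the motivation for combining several evaluators.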
29 Similarity Estimator (5) The Similarity Estimator evaluates these candidates and assigns each of them a final score. It exploits additional types of information to compute a more accurate score. It employs evaluator modules, each of which exploits one type of information, and then combines the suggested scores into one final score.
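The slides do not state how the evaluators' scores are combined. Purely as an assumption, the sketch below uses a weighted average; the evaluator names `text` and `overlap` and all scores are hypothetical.

```python
def combine_scores(evaluator_scores, weights=None):
    """Weighted average of the scores that different evaluators gave one candidate."""
    if weights is None:
        weights = {name: 1.0 for name in evaluator_scores}
    total_weight = sum(weights[name] for name in evaluator_scores)
    weighted_sum = sum(weights[name] * s for name, s in evaluator_scores.items())
    return weighted_sum / total_weight

def build_similarity_matrix(per_evaluator):
    """Map each (target attribute, match candidate) pair to one final score."""
    return {pair: combine_scores(scores) for pair, scores in per_evaluator.items()}

# Hypothetical evaluator outputs for two candidates of agent-address.
per_evaluator = {
    ("agent-address", "concat(city, state)"): {"text": 0.5, "overlap": 1.0},
    ("agent-address", "location"): {"text": 0.5, "overlap": 0.0},
}
matrix = build_similarity_matrix(per_evaluator)
```

The two candidates tie on the text score alone, and only the second evaluator separates them.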
30 Overview
31 Match Selector (1) Input: Similarity matrix Output: 1-1 matches Complex matches
32 Match Selector (2) What the Match Selector does: Examines the similarity matrix and outputs the best matches for the attributes of T, under certain constraints. [Figure: example similarity matrix. Match candidates: name, concat(city, state), price, price * (1 + fee-rate). Target attributes: area, list-price, agent-address, agent-name.]
33 Match Selector (3) PROBLEM: How to match the best candidate to the attribute SOLUTION: The best global match could be where each target attribute is assigned the match with the highest score BUT This match assignment may not be acceptable because it may violate domain constraints
34 The iMAP Architecture: Match Selector (4) For example: Domain constraint: name and city in AGENTS have no relation. Result: the tuple area = SELECT concat(name, city) FROM AGENTS is not selected.
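A simple way to picture constraint-aware selection is a greedy pass that takes, for each target attribute, the highest-scoring candidate that does not combine unrelated attributes. This is a sketch under assumptions: the greedy strategy, the regular expression used to pull attribute names out of an expression, and all scores are hypothetical.

```python
import re

FUNCTION_NAMES = {"concat"}

def attributes_in(expression):
    """Extract attribute names (lowercase words, hyphens allowed) from a match expression."""
    words = re.findall(r"[a-z][a-z-]*", expression)
    return {word for word in words if word not in FUNCTION_NAMES}

def select_matches(similarity, unrelated_pairs):
    """Pick, per target attribute, the best-scoring candidate that violates no constraint."""
    selected = {}
    for target, candidates in similarity.items():
        ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
        for expression, _score in ranked:
            attrs = attributes_in(expression)
            if not any(pair <= attrs for pair in unrelated_pairs):
                selected[target] = expression
                break
    return selected

# Hypothetical similarity scores; name and city are declared unrelated.
similarity = {
    "area": {"concat(name, city)": 0.9, "location": 0.8},
    "agent-address": {"concat(city, state)": 0.95, "location": 0.4},
}
unrelated_pairs = [frozenset({"name", "city"})]
selected = select_matches(similarity, unrelated_pairs)
```

Although concat(name, city) has the highest score for area, it is skipped because it combines two attributes that the constraint declares unrelated.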
35 Overview
36 Exploiting Domain Knowledge (1) Exploiting domain knowledge is beneficial for 1-1 matching. For complex matching it is even more crucial, because it allows early detection of unlikely matches.
37 Exploiting Domain Knowledge (2) For example: iMAP learns that the number of real estate agents in a specific area is bounded by 50. Given the match agent-name = concat(first-name, last-name), where first-name and last-name belong to the home owner, iMAP will realize that concat(first-name, last-name) results in hundreds of distinct names. Conclusion: concat(first-name, last-name) is unlikely to match agent-name.
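The distinct-value check in this example is easy to sketch. The data below is invented, and the function name `violates_distinct_bound` is hypothetical; only the bound of 50 comes from the slide.

```python
def violates_distinct_bound(values, max_distinct):
    """True if the column has more distinct values than the constraint allows."""
    return len(set(values)) > max_distinct

# Hypothetical data: 200 distinct home owners, 3 agents repeated across listings.
owners = [("Owner", f"Number{i}") for i in range(200)]
owner_names = [f"{first} {last}" for first, last in owners]
agent_names = ["Mike Brown", "Jean Laup", "Ann Lee"] * 40

MAX_AGENTS = 50  # bound on distinct agent names, as in the slide
owner_candidate_rejected = violates_distinct_bound(owner_names, MAX_AGENTS)
agent_candidate_rejected = violates_distinct_bound(agent_names, MAX_AGENTS)
```

A check like this is cheap, so it can prune an unlikely candidate before any more expensive scoring is done.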
38 Exploiting Domain Knowledge (3) iMAP exploits domain knowledge: Domain constraints Past complex matches Overlap data External data
39 Exploiting Domain Knowledge (4) iMAP exploits domain knowledge: Domain constraints Past complex matches Overlap data External data
40 Exploiting Domain Knowledge (5) Constraints are either: Present in the schema. For example: agent-name is text and amount is numeric only. Provided by an expert. For example: the tax on a room cannot be less than 7%. Provided by the user. For example: we do not sell houses that cost less than $200,000.
41 Exploiting Domain Knowledge (6) iMAP considers 3 kinds of constraints: Two attributes are unrelated. For example: name and beds are unrelated, meaning that they cannot appear in the same match formula. Constraint on a single attribute. For example: the average value of numrooms does not exceed 10. Multiple schema attributes are unrelated. For example: area and agent-name are unrelated.
42 Exploiting Domain Knowledge (7) iMAP exploits domain knowledge: Domain constraints Past complex matches Overlap data External data
43 Exploiting Domain Knowledge (8) Sometimes the same or similar schemas are mapped repeatedly. iMAP extracts the expression templates of these past matches and uses them to guide the search process. For example: given the past match price = pr * ( ), iMAP will extract the template VARIABLE * (1 + CONSTANT) and ask the numeric searcher to look for matches that fit that template.
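One way a numeric searcher could instantiate the template VARIABLE * (1 + CONSTANT) is to try each numeric source column as VARIABLE and estimate CONSTANT from the data. The sketch below does this with a ratio of column averages, which is an assumption made for illustration; the slides do not describe the estimation method, and the column values are invented.

```python
from statistics import mean

def instantiate_template(source_columns, target_values):
    """For the template VARIABLE * (1 + CONSTANT): try every numeric source column
    as VARIABLE and estimate CONSTANT from the ratio of the column averages."""
    candidates = []
    for name, values in source_columns.items():
        constant = mean(target_values) / mean(values) - 1
        candidates.append((name, round(constant, 4)))
    return candidates

# Hypothetical numeric columns of the source schema and values of list-price.
source_columns = {
    "price": [100000.0, 200000.0],
    "beds": [2.0, 4.0],
}
list_price_values = [110000.0, 220000.0]
candidates = instantiate_template(source_columns, list_price_values)
```

The instantiation for beds produces an implausibly large constant, so it would still have to be ruled out by scoring or by a domain constraint.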
44 Exploiting Domain Knowledge (9) iMAP exploits domain knowledge: Domain constraints Past complex matches Overlap data External data
45 Exploiting Domain Knowledge (10) There are many cases where the source (S) and target (T) share the same data For example: database S and T share a house listing ( Atlanta, GA, ) In such overlap cases the shared data provides valuable information for the mapping process
46 Exploiting Domain Knowledge (11) Searchers that exploit overlap data: Overlap Text Searcher Overlap Numeric Searcher Overlap Category & Schema Mismatch Searchers
47 Exploiting Domain Knowledge (12) HOW THE SEARCHERS WORK: Step 1: Use the original searchers for an initial mapping Step 2: Use the overlap data to re-evaluate the mappings for improved matching accuracy For example: when re-evaluated, mapping: agent-address = location, receives score 0 because it is not correct for the shared house listing: ( Atlanta, GA, ) and agent-address = concat(city, state) receives score 1
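Step 2 can be pictured as checking each candidate against the tuples that both databases share. This is a minimal sketch: the shared listing, the `', '` separator, and the representation of candidates as Python functions are assumptions.

```python
def overlap_score(candidate, shared_rows, target_attr):
    """Fraction of shared rows for which the candidate reproduces the target value."""
    hits = sum(1 for source, target in shared_rows
               if candidate(source) == target[target_attr])
    return hits / len(shared_rows)

# One hypothetical listing present in both S (joined source row) and T.
shared_rows = [
    ({"location": "Midtown", "city": "Atlanta", "state": "GA"},
     {"agent-address": "Atlanta, GA"}),
]

def location(row):
    """Candidate: agent-address = location."""
    return row["location"]

def concat_city_state(row):
    """Candidate: agent-address = concat(city, state)."""
    return row["city"] + ", " + row["state"]

score_location = overlap_score(location, shared_rows, "agent-address")
score_concat = overlap_score(concat_city_state, shared_rows, "agent-address")
```

As on the slide, the location candidate receives score 0 and the concatenation receives score 1 on the shared listing.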
48 Exploiting Domain Knowledge (13) iMAP exploits domain knowledge: Domain constraints Past complex matches Overlap data External data
49 Exploiting Domain Knowledge (14) External data is used as additional constraints on the attributes of a schema Usually provided by experts Can be very useful in schema matching
50 Exploiting Domain Knowledge (15) HOW EXPLOITING EXTERNAL DATA WORKS: Step 1: Use external data to learn about the feature Step 2: Apply the learned information to evaluate matches for the target attribute For example: Target attribute: agent-name A feature that can be potentially useful in schema matching: number of distinct agent names
51 Overview
52 Generating Explanations (1) Example: In matching real-estate schemas, for attribute list-price, iMAP has produced the matches: list-price = price and list-price = price * (1 + monthly-fee-rate)
53 Generating Explanations (2) PROBLEM: The user is uncertain which of the two is the correct match. SOLUTION: iMAP must explain the ranking. For example: why did list-price = price get a higher rank than list-price = price * (1 + monthly-fee-rate)?
54 Generating Explanations (3) iMAP's goal is to provide an environment where a human user can quickly generate a mapping between a pair of schemas. For a user to know which match to choose, iMAP must supply an explanation for each of the matches.
55 Generating Explanations (4) iMAP considers 3 questions: Explain existing match: Why does the match exist? For example: Why does the match month-posted = monthly-fee-rate exist? Explain absent match: Why doesn't the match exist? Explain match ranking: Why is one match better than another?
56 Generating Explanations (5) iMAP keeps track of the decision-making process in a dependency graph. Each node is one of the following: schema attribute, assumption, candidate match, domain knowledge. An edge between two nodes means that one node leads to the other.
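A dependency graph of this kind can be sketched as a small class that records which nodes led to which and walks backwards from a match to collect its explanation. The class and method names, the breadth-first traversal, and the example nodes are hypothetical; the slides only specify the node kinds and the meaning of an edge.

```python
from collections import deque

class DependencyGraph:
    """Nodes are schema attributes, assumptions, candidate matches, or domain knowledge."""

    def __init__(self):
        self.kind = {}      # node -> kind of node
        self.parents = {}   # node -> nodes that led to it

    def add_node(self, name, kind):
        self.kind[name] = kind
        self.parents.setdefault(name, [])

    def add_edge(self, source, destination):
        """Record that `source` leads to `destination`."""
        self.parents[destination].append(source)

    def explain(self, node):
        """List everything that contributed to `node`, nearest contributors first."""
        seen, order, queue = {node}, [], deque([node])
        while queue:
            for parent in self.parents[queue.popleft()]:
                if parent not in seen:
                    seen.add(parent)
                    order.append((self.kind[parent], parent))
                    queue.append(parent)
        return order

graph = DependencyGraph()
match = "list-price = price * (1 + monthly-fee-rate)"
template = "template VARIABLE * (1 + CONSTANT)"
graph.add_node("price", "schema attribute")
graph.add_node("monthly-fee-rate", "schema attribute")
graph.add_node(template, "domain knowledge")
graph.add_node(match, "candidate match")
for source in ("price", "monthly-fee-rate", template):
    graph.add_edge(source, match)

explanation = graph.explain(match)
```

Answering "why does this match exist?" then amounts to presenting the collected contributors to the user.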
57 Generating Explanations (6)
59 Conclusions (1) Matches are key for enabling a wide variety of data sharing and exchange scenarios. The majority of the research on schema matching has focused on 1-1 matches. iMAP offers a solution to the problem of finding complex matches. The key challenge with complex matches is that the space of possible match candidates may be unbounded, and evaluating each candidate is harder.
60 Conclusions (2) The architecture of iMAP is modular and extensible. New searchers and new evaluation modules can be added easily. Experimental results show that iMAP achieves 43-92% accuracy on several real-world domains, thus demonstrating the promise of the approach.
imap: Discovering Complex Semantic Matches between Database Schemas
imap: Discovering Complex Semantic Matches between Database Schemas Robin Dhamankar, Yoonkyong Lee, AnHai Doan Department of Computer Science University of Illinois, Urbana-Champaign, IL, USA {dhamanka,ylee11,anhai}@cs.uiuc.edu
More informationPartly based on slides by AnHai Doan
Partly based on slides by AnHai Doan New faculty member Find houses with 2 bedrooms priced under 200K realestate.com homeseekers.com homes.com 2 Find houses with 2 bedrooms priced under 200K mediated schema
More informationLearning mappings and queries
Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language
More informationDatabase Technologies. Madalina CROITORU IUT Montpellier
Database Technologies Madalina CROITORU croitoru@lirmm.fr IUT Montpellier Course practicalities 2 x 2h per week (14 weeks) Basics of database theory relational model, relational algebra, SQL and database
More informationCustomer Clustering using RFM analysis
Customer Clustering using RFM analysis VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras
More informationInteractive Machine Learning (IML) Markup of OCR Generated Text by Exploiting Domain Knowledge: A Biodiversity Case Study
Interactive Machine Learning (IML) Markup of OCR Generated by Exploiting Domain Knowledge: A Biodiversity Case Study Several digitization projects such as Google books are involved in scanning millions
More informationLearning to Match Ontologies on the Semantic Web
The VLDB Journal manuscript No. (will be inserted by the editor) Learning to Match Ontologies on the Semantic Web AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy Department of
More informationEscaping Local Optima: Genetic Algorithm
Artificial Intelligence Escaping Local Optima: Genetic Algorithm Dae-Won Kim School of Computer Science & Engineering Chung-Ang University We re trying to escape local optima To achieve this, we have learned
More informationLearning to Match the Schemas of Data Sources: A Multistrategy Approach
Machine Learning, 50, 279 301, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Learning to Match the Schemas of Data Sources: A Multistrategy Approach ANHAI DOAN anhai@cs.uiuc.edu
More informationSeMap: A Generic Schema Matching System
SeMap: A Generic Schema Matching System by Ting Wang B.Sc., Zhejiang University, 2004 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in The Faculty of
More informationNOTE 1: This is a closed book examination. For example, class text, copies of overhead slides and printed notes may not be used. There are 11 pages.
NOTE 1: This is a closed book examination. For example, class text, copies of overhead slides and printed notes may not be used. There are 11 pages. The last page, only, may be separated and used as an
More informationInformation Discovery, Extraction and Integration for the Hidden Web
Information Discovery, Extraction and Integration for the Hidden Web Jiying Wang Department of Computer Science University of Science and Technology Clear Water Bay, Kowloon Hong Kong cswangjy@cs.ust.hk
More informationImplementation Techniques
V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight
More informationSIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web *
SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity The P2P Meets the Semantic Web * Leyun Pan, Liang Zhang, and Fanyuan Ma Department of Computer Science and Engineering Shanghai
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationLearning to match ontologies on the Semantic Web
The VLDB Journal (2003) 12: 303 319 / Digital Object Identifier (DOI) 10.1007/s00778-003-0104-2 Learning to match ontologies on the Semantic Web AnHai Doan 1, Jayant Madhavan 2, Robin Dhamankar 1, Pedro
More informationA Generic Algorithm for Heterogeneous Schema Matching
You Li, Dongbo Liu, and Weiming Zhang A Generic Algorithm for Heterogeneous Schema Matching You Li1, Dongbo Liu,3, and Weiming Zhang1 1 Department of Management Science, National University of Defense
More informationBuilding a website. Should you build your own website?
Building a website As discussed in the previous module, your website is the online shop window for your business and you will only get one chance to make a good first impression. It is worthwhile investing
More informationON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS
ON HEURISTIC METHODS IN NEXT-GENERATION SEQUENCING DATA ANALYSIS Ivan Vogel Doctoral Degree Programme (1), FIT BUT E-mail: xvogel01@stud.fit.vutbr.cz Supervised by: Jaroslav Zendulka E-mail: zendulka@fit.vutbr.cz
More informationCreating a Mediated Schema Based on Initial Correspondences
Creating a Mediated Schema Based on Initial Correspondences Rachel A. Pottinger University of Washington Seattle, WA, 98195 rap@cs.washington.edu Philip A. Bernstein Microsoft Research Redmond, WA 98052-6399
More informationManual Wrapper Generation. Automatic Wrapper Generation. Grammar Induction Approach. Overview. Limitations. Website Structure-based Approach
Automatic Wrapper Generation Kristina Lerman University of Southern California Manual Wrapper Generation Manual wrapper generation requires user to Specify the schema of the information source Single tuple
More informationCertified Business Analysis Professional (CBAP )
Certified Business Analysis Professional (CBAP ) 3 Days Classroom Training PHILIPPINES :: MALAYSIA :: VIETNAM :: SINGAPORE :: INDIA Content Certified Business Analysis Professional - (CBAP ) Introduction
More informationMost, but not all, state associations link to the VU web site.
1 Most, but not all, state associations link to the VU web site. The graphic above was taken from the Arizona association which is one of the biggest promoters of the VU. If you Googled virtual university
More informationGoogle Domination SEO Copywriting Secrets For Business Owners
Page 1 of 5 06 FBI Consultancy PDF For 1000 Business Article This business support article has a minimum re-sale value of 50 or $70 It is provided FREE of charge to all consultants & business owners who
More informationLearning Path Queries on Graph Databases
Learning Path Queries on Graph Databases Radu Ciucanu joint work with Angela Bonifati and Aurélien Lemay University of Lille, France INRIA Lille Nord Europe Links Project EDBT 15 March 24, 2015 Radu Ciucanu
More information2. Data Preprocessing
2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459
More information3. Data Preprocessing. 3.1 Introduction
3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation
More informationProcedures to become a Public Service Training Instructor
Procedures to become a Public Service Training Instructor The procedures to be a Public Service Training Instructor are defined by Policy 5202 (Legislative Rule 126CSR136) and other policies of the West
More information1Z0-526
1Z0-526 Passing Score: 800 Time Limit: 4 min Exam A QUESTION 1 ABC's Database administrator has divided its region table into several tables so that the west region is in one table and all the other regions
More informationData Integration. Lecture 23. Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems. CompSci 516: Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 23 Data Integration Instructor: Sudeepa Roy Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 1 Announcements No class next week thanksgiving
More informationDIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY
DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY Reham I. Abdel Monem 1, Ali H. El-Bastawissy 2 and Mohamed M. Elwakil 3 1 Information Systems Department, Faculty of computers and information,
More informationTRIPWIRE VULNERABILITY RISK METRICS CONNECTING SECURITY TO THE BUSINESS
CONFIDENCE: SECURED WHITE PAPER IRFAHN KHIMJI, CISSP TRIPWIRE VULNERABILITY RISK METRICS CONNECTING SECURITY TO THE BUSINESS ADVANCED THREAT PROTECTION, SECURITY AND COMPLIANCE EXECUTIVE SUMMARY A vulnerability
More informationMulti column matching for database schema translation
Multi column matching for database schema translation Robert Warren, Frank Wm. Tompa School of Computer Science, University of Waterloo {rhwarren, fwtompa}@uwaterloo.ca Technical Report CS-2005-24 August
More informationWhat is. Search Engine Marketing
What is Search Engine Marketing About the presenter Tom Fernandez CRMLS Smart Solutions Specialist 909-859-2040 ext.2095 tom@crmls.org About this class 1. Good for all agents (with or without a website)
More informationIntroduction Implementation of the Coalescing Operator Performance of Coalescing Conclusion. Temporal Coalescing. Roger Schneider 17.4.
Temporal Coalescing Roger Schneider 17.4.2010 Content Introduction Example Definition of Coalescing Example Definition of Coalescing Motivating Example AHVNr PNo WCode valid time (VT) 1000 10 a 08:00,
More informationHow To Enter A New Customer Order - Self-Installing Dealer (DSI) Desk Reference
Summary This covers: Gathering Information before placing an Order Order > Add Customer Tab Serviceability Page Contacts Page Packages Page Options Page Payment Page Review Page Confirmation Page Additional
More informationGetting the most from your websites SEO. A seven point guide to understanding SEO and how to maximise results
Getting the most from your websites SEO A seven point guide to understanding SEO and how to maximise results About this document SEO: What is it? This document is aimed at giving a SEO: What is it? 2 SEO
More informationEECS-3421a: Test #1 Design
2016 October 12 EECS-3421a: Test #1 1 of 14 EECS-3421a: Test #1 Design Electrical Engineering & Computer Science Lassonde School of Engineering York University Family Name: Given Name: Student#: EECS Account:
More informationMining Frequent Itemsets in Time-Varying Data Streams
Mining Frequent Itemsets in Time-Varying Data Streams Abstract A transactional data stream is an unbounded sequence of transactions continuously generated, usually at a high rate. Mining frequent itemsets
More informationMIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns
MIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns This is a closed-book exam. You should have no material on your desk other than the exam itself and a pencil or pen.
More informationRanking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation Kaushik Chakrabarti Venkatesh Ganti Jiawei Han Dong Xin* Microsoft Research Microsoft Research University of Illinois University
More informationOnline Digital Transformation Courses COB Certified E-Commerce & E-Business Manager E-Learning Options
Online Digital Transformation Courses COB Certified E-Commerce & E-Business Manager E-Learning Options Course Information GBP Edition The Institute for Business Advancement www.iba.insitute August 2017
More informationHow To Enter A Sales Order Sales Only Dealer Desk Reference
Summary This Desk Reference covers: Gathering Information before placing an Order Order > Add Customer Tab Serviceability Page Contacts Page Packages Page Options Page Payment Page Review Page Schedule
More informationCSE-6490B Final Exam
February 2009 CSE-6490B Final Exam Fall 2008 p 1 CSE-6490B Final Exam In your submitted work for this final exam, please include and sign the following statement: I understand that this final take-home
More informationCHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM
20 CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 2.1 CLASSIFICATION OF CONVENTIONAL TECHNIQUES Classical optimization methods can be classified into two distinct groups:
More informationUNIT 6 MODELLING DECISION PROBLEMS (LP)
UNIT 6 MODELLING DECISION This unit: PROBLEMS (LP) Introduces the linear programming (LP) technique to solve decision problems 1 INTRODUCTION TO LINEAR PROGRAMMING A Linear Programming model seeks to maximize
More informationµbe: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration
µbe: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration Ashraf Aboulnaga Kareem El Gebaly University of Waterloo {ashraf, kelgebal}@cs.uwaterloo.ca Abstract The typical
More informationDEVELOPMENT AND EVALUATION OF A SYSTEM FOR CHECKING FOR IMPROPER SENDING OF PERSONAL INFORMATION IN ENCRYPTED
DEVELOPMENT AND EVALUATION OF A SYSTEM FOR CHECKING FOR IMPROPER SENDING OF PERSONAL INFORMATION IN ENCRYPTED E-MAIL Kenji Yasu 1, Yasuhiko Akahane 2, Masami Ozaki 1, Koji Semoto 1, Ryoichi Sasaki 1 1
More informationSTRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE
STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn
More informationLearning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search
1 / 33 Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search Bernd Wittefeld Supervisor Markus Löckelt 20. July 2012 2 / 33 Teaser - Google Web History http://www.google.com/history
More informationLeveraging Set Relations in Exact Set Similarity Join
Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,
More informationLecture Notes for Chapter 2: Getting Started
Instant download and all chapters Instructor's Manual Introduction To Algorithms 2nd Edition Thomas H. Cormen, Clara Lee, Erica Lin https://testbankdata.com/download/instructors-manual-introduction-algorithms-2ndedition-thomas-h-cormen-clara-lee-erica-lin/
More informationDynamic Time Warping & Search
Dynamic Time Warping & Search Dynamic time warping Search Graph search algorithms Dynamic programming algorithms 6.345 Automatic Speech Recognition Dynamic Time Warping & Search 1 Word-Based Template Matching
More information(a) Explain how physical data dependencies can increase the cost of maintaining an information
NOTE 1: This is a closed book examination. For example, class text, copies of overhead slides and printed notes may not be used. There are 11 pages. The last page, only, may be separated and used as an
More informationEvolving Variable-Ordering Heuristics for Constrained Optimisation
Griffith Research Online https://research-repository.griffith.edu.au Evolving Variable-Ordering Heuristics for Constrained Optimisation Author Bain, Stuart, Thornton, John, Sattar, Abdul Published 2005
More informationA Parallel Implementation of a Higher-order Self Consistent Mean Field. Effectively solving the protein repacking problem is a key step to successful
Karl Gutwin May 15, 2005 18.336 A Parallel Implementation of a Higher-order Self Consistent Mean Field Effectively solving the protein repacking problem is a key step to successful protein design. Put
More informationCOSC Dr. Ramon Lawrence. Emp Relation
COSC 304 Introduction to Database Systems Normalization Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca Normalization Normalization is a technique for producing relations
More informationUnsupervised Semantic Parsing
Unsupervised Semantic Parsing Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Pedro Domingos) 1 Outline Motivation Unsupervised semantic parsing Learning and inference
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationStructured Data on the Web