All groups final presentation/poster and write-up


1 Logistics
Non-NIST group project proposals
- Guidelines are posted.
- Write-up and slides are due this Friday.
- This coming Monday, each non-NIST group will give a 5-minute project pitch based on its slides.
- Everyone in class needs to log into Canvas and pick a favorite idea at the end of the class.
All groups: final presentation/poster and write-up
- Guidelines will be posted; a draft of the final presentation is due 1 day before the presentation to allow quick feedback.
- In-class presentation slots on 12/2, 12/4, 12/7, and 12/9 will be announced soon for group sign-up.

2 Review
- What are knowledge bases?
- What are some examples of knowledge bases?
- What are some KB applications?

3 Life of a Knowledge Base (KB)
A knowledge base system is a special kind of database management system for knowledge base management.
- KB extraction: knowledge extraction using statistical models from the NLP/ML literature
- KB expansion: knowledge inference using deductive rules and knowledge with uncertainty
- KB evolution: knowledge update given new evidence and new data sources
- KB integration: knowledge integration from multiple sources (e.g., humans, data sets, models)

4 Uncertainty Management
Where does uncertainty come from?
- Inherent in state-of-the-art NLP results
- Incorrect, conflicting data sources
- Derived facts and query results
Ad hoc ways to handle uncertainty
- NULL values
- MAP in extraction or majority voting for integration
A probabilistic relational model encodes the joint distribution over the uncertainty and correlation in an automatically constructed KB.
Probabilistic Knowledge Base System

5 A Probabilistic Knowledge Base
Statistical machine learning models and data processing and management systems are the backbones.
[Diagram labels: Knowledge Extraction; Knowledge Expansion; Knowledge Evolution; Knowledge Integration; Knowledge & uncertainty; Questions; Answers]

6 Challenges & Projects
Knowledge Extraction
- BayesStore: Probabilistic Database for Knowledge Extraction [VLDB08, VLDB10, ICDE10, SIGMOD11]
- VITA: Multimodal Information Extraction (Image & Text)
- SRL: Scalable Probabilistic First-order Rule Learning
Knowledge Expansion & Evolution
- ProbKB: Inference over Probabilistic KB [SIGMOD 2014]
- UDA-GIST: Unified Data- and Graph-Parallel Framework [VLDB 2015]
- Archer: Query-Driven Inference for IE, ER and QA
- MADLib: Scalable NLP over MPP frameworks [VLDB 2012]
Knowledge Integration
- SigmaKB: Multiple Knowledge Base Integration
- CAMeL: Crowd Assisted Machine Learning [HCOMP 2013]

7 Archimedes: A Master Probabilistic Knowledge Base System Architecture
Layers: Applications; Probabilistic KB System Core; Knowledge ETL
Applications: Q&A Systems; Ecology, Biomedical, Data Privacy; Analysts for Situational Awareness
Components:
- ProbKB: Probabilistic KB Data Model and Materialization for Inference
- Archer: Query-driven Inference for IE, ER and QA
- UDA-GIST: Unified Framework for Data Parallel and Graph Parallel Operations
- VITA: Multimodal Information Extraction
- SigmaKB: Multiple KB Integration
- CAMeL: Crowd Assisted Machine Learning for KB
- MADLib: In-DB Statistical Methods
- SRL: Scalable Probabilistic Rule Learning
Parallel Data Processing over Multi-core or Distributed Cluster

8 From Big Data to Big Wisdom
Data, Facts, Rules

10 Web Does Not Know All Facts and First-order Rules
For example, the Web has much information on:
- X Food contains Y Chemical
- Y Chemical prevents Z Disease
The Web has little information on:
- X Food prevents Z Disease
We can infer propositional facts, in addition to those extracted from the Web, using first-order rules.
First-order rules can be automatically learned from large KBs [Sherlock-Holmes, Archimedes-SRL].
Other approaches exist to expand a KB without rules: neural models, Path Ranking, Google Knowledge Vault.

11 Inferring Implicit Information
Knowledge Expansion over a ProbKB [SIGMOD 14]
- Kale is rich in Calcium.
- Calcium helps prevent Osteoporosis.
- Inferred: Kale helps prevent Osteoporosis.
Uncertainty exists in both rules and facts.
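
The kale example can be reproduced with a few lines of forward chaining. This is an illustrative sketch only: it ignores the probabilities attached to facts and rules, and the triple encoding and the `expand` helper are assumptions made for the example, not part of ProbKB.

```python
# Toy facts as (relation, subject, object) triples, taken from the slide.
facts = {
    ("contains", "Kale", "Calcium"),
    ("prevents", "Calcium", "Osteoporosis"),
}


def expand(facts):
    """Apply prevents(F, D) :- contains(F, C), prevents(C, D) until nothing new appears."""
    known = set(facts)
    while True:
        new = {
            ("prevents", food, disease)
            for (r1, food, chem) in known if r1 == "contains"
            for (r2, chem2, disease) in known if r2 == "prevents" and chem2 == chem
        } - known
        if not new:
            return known
        known |= new


# Facts that were not extracted but follow from the rule.
inferred = expand(facts) - facts
print(inferred)
```

The loop stops as soon as a round adds no new fact, which mirrors the round-by-round expansion used later in the deck.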

12 Probabilistic Inference using MLN
A Markov logic network (MLN) is a set of formulae with weights. Together with a finite set of constants, it defines a Markov network (i.e., a factor graph).
Rules:
  Contains(Food, Chemical) :- IsMadeFrom(Food, Ingredient), Contains(Ingredient, Chemical)
  Prevents(Food, Disease) :- Contains(Food, Chemical), Prevents(Chemical, Disease)
Constants: C = {F1, C1, D1, I1}
Ground atoms (factor graph nodes): Made(F1, I1), Contains(I1, C1), Contains(F1, C1), Prevents(C1, D1), Prevents(F1, D1)
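
As a minimal sketch of how weighted formulae turn into probabilities, the snippet below grounds the second rule with the constants F1, C1, D1 and applies the standard MLN distribution, in which the probability of a world is proportional to exp(weight × number of satisfied groundings). The weight 1.5 is hypothetical; the slide gives no weights for these rules.

```python
import math

W = 1.5  # hypothetical weight of the Prevents rule


def n_satisfied(world):
    """Count satisfied groundings (0 or 1) of the single ground rule
    Prevents(F1,D1) :- Contains(F1,C1), Prevents(C1,D1)."""
    body = world["Contains(F1,C1)"] and world["Prevents(C1,D1)"]
    return 1 if (not body or world["Prevents(F1,D1)"]) else 0


def unnormalized(world):
    return math.exp(W * n_satisfied(world))


# Evidence: both body atoms are observed to be true.
evidence = {"Contains(F1,C1)": True, "Prevents(C1,D1)": True}

# Enumerate the two worlds consistent with the evidence and normalize.
worlds = [{**evidence, "Prevents(F1,D1)": value} for value in (True, False)]
z = sum(unnormalized(w) for w in worlds)
p_prevents = unnormalized(worlds[0]) / z
print(round(p_prevents, 4))
```

With a single ground rule and full evidence on the body, the result reduces to the logistic function of the weight, so a larger weight pushes the inferred fact closer to certainty.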

13 Challenges and Solutions for Knowledge Expansion over ProbKB
Efficiency & Scalability
- We support a relational representation of the ProbKB data model, including the KB (entities + facts) and the MLN;
- We design a SQL-based algorithm to apply inference rules in batches of structurally equivalent rules;
- We use MPP databases to parallelize the inference process.
Quality Control
- We identify major error sources;
- We use semantic constraints to identify errors and ambiguities;
- We clean the rule set based on the rules' statistical properties.

14 Probabilistic KB Data Model
Example (ReVerb-Sherlock KB)
We define a probabilistic knowledge base to be a 5-tuple Γ = (E, C, R, Π, L):
Entities E:   Ruth Gruber, New York City, Brooklyn
Classes C:    W (Writer) = {Ruth Gruber}, C (City) = {New York City}, P (Place) = {Brooklyn}
Relations R:  born in(W, P), born in(W, C), live in(W, P), live in(W, C), locate in(P, C)
Facts Π:      0.93 born in(Ruth Gruber, Brooklyn)
              0.96 born in(Ruth Gruber, New York City)

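15 Probabilistic KB Data Model (Cont.)
Example (ReVerb-Sherlock KB Cont.)
Rules L:
1.40  $\forall x \in W\ \forall y \in P\ (\text{live in}(x, y) \leftarrow \text{born in}(x, y))$
1.53  $\forall x \in W\ \forall y \in C\ (\text{live in}(x, y) \leftarrow \text{born in}(x, y))$
2.68  $\forall x \in W\ \forall y \in P\ (\text{grow up in}(x, y) \leftarrow \text{born in}(x, y))$
0.74  $\forall x \in W\ \forall y \in C\ (\text{grow up in}(x, y) \leftarrow \text{born in}(x, y))$
0.32  $\forall x \in P\ \forall y \in C\ \forall z \in W\ (\text{locate in}(x, y) \leftarrow \text{live in}(z, x) \land \text{live in}(z, y))$
0.52  $\forall x \in P\ \forall y \in C\ \forall z \in W\ (\text{locate in}(x, y) \leftarrow \text{born in}(z, x) \land \text{born in}(z, y))$
$\infty$  $\forall x \in C\ \forall y \in C\ \forall z \in W\ (\text{born in}(z, x) \land \text{born in}(z, y) \rightarrow x = y)$
Table: Probabilistic KB from ReVerb-Sherlock.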
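
One way to picture the 5-tuple Γ = (E, C, R, Π, L) is as a small set of record types. The sketch below is an assumption-laden illustration, not the ProbKB implementation: the class names `Fact`, `Rule`, and `ProbKB`, the `is_well_typed` helper, and the reading of the first rule as "born in implies live in" (head first, as in the Datalog-style rules earlier in the deck) are all choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    weight: float
    relation: str
    subject: str
    obj: str


@dataclass(frozen=True)
class Rule:
    weight: float
    head: str           # relation inferred by the rule (assumed reading)
    body: tuple         # relations that must already hold
    var_classes: tuple  # class of each variable, e.g. ("W", "P")


@dataclass
class ProbKB:
    entities: set   # E
    classes: dict   # C: class name -> member entities
    relations: set  # R: typed signatures (relation, class1, class2)
    facts: list     # Π
    rules: list     # L

    def is_well_typed(self, fact):
        """A fact is well typed if some signature in R accepts its arguments."""
        return any(
            rel == fact.relation
            and fact.subject in self.classes[c1]
            and fact.obj in self.classes[c2]
            for rel, c1, c2 in self.relations
        )


kb = ProbKB(
    entities={"Ruth Gruber", "New York City", "Brooklyn"},
    classes={"W": {"Ruth Gruber"}, "C": {"New York City"}, "P": {"Brooklyn"}},
    relations={
        ("born_in", "W", "P"), ("born_in", "W", "C"),
        ("live_in", "W", "P"), ("live_in", "W", "C"),
        ("locate_in", "P", "C"),
    },
    facts=[
        Fact(0.93, "born_in", "Ruth Gruber", "Brooklyn"),
        Fact(0.96, "born_in", "Ruth Gruber", "New York City"),
    ],
    rules=[Rule(1.40, "live_in", ("born_in",), ("W", "P"))],
)
print(all(kb.is_well_typed(f) for f in kb.facts))
```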

16 Relational ProbKB
Definition: Two first-order clauses are defined to be structurally equivalent if they differ only in the entity, class, and relation symbols.
$\forall x \in W\ \forall y \in P\ (\text{live in}(x, y) \leftarrow \text{born in}(x, y))$      1.40
$\forall x \in W\ \forall y \in C\ (\text{live in}(x, y) \leftarrow \text{born in}(x, y))$      1.53
$\forall x \in W\ \forall y \in P\ (\text{grow up in}(x, y) \leftarrow \text{born in}(x, y))$   2.68
$\forall x \in W\ \forall y \in C\ (\text{grow up in}(x, y) \leftarrow \text{born in}(x, y))$   0.74
R1          R2       C1  C2  w
live in     born in  W   P   1.40
live in     born in  W   C   1.53
grow up in  born in  W   P   2.68
grow up in  born in  W   C   0.74
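
Because structurally equivalent rules share one table, a single join can apply all of them in one batch. The following is a rough sketch of that idea using Python's built-in `sqlite3` as a stand-in for an MPP database; the table names `facts` and `m1`, their column names, and the reading of each row as a rule with head R1 and body R2 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (rel TEXT, x TEXT, cx TEXT, y TEXT, cy TEXT);
CREATE TABLE m1 (r1 TEXT, r2 TEXT, c1 TEXT, c2 TEXT, w REAL);
""")

# Typed facts from the ReVerb-Sherlock example (cx, cy are the argument classes).
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", [
    ("born_in", "Ruth Gruber", "W", "Brooklyn", "P"),
    ("born_in", "Ruth Gruber", "W", "New York City", "C"),
])

# One row per rule of the shape R1(x, y) <- R2(x, y).
conn.executemany("INSERT INTO m1 VALUES (?, ?, ?, ?, ?)", [
    ("live_in", "born_in", "W", "P", 1.40),
    ("live_in", "born_in", "W", "C", 1.53),
    ("grow_up_in", "born_in", "W", "P", 2.68),
    ("grow_up_in", "born_in", "W", "C", 0.74),
])

# A single join applies every rule in the batch to every matching fact.
rows = conn.execute("""
SELECT m1.r1, facts.x, facts.y
FROM m1
JOIN facts
  ON facts.rel = m1.r2 AND facts.cx = m1.c1 AND facts.cy = m1.c2
""").fetchall()

inferred = set(rows)
for fact in sorted(inferred):
    print(fact)
```

The number of SQL statements depends on the number of rule shapes rather than the number of rules, which is the point of grouping rules by structural equivalence.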

17 State-of-the-art Grounding & Inference
Parallel MLN grounding [state of the art]
- Felix/Tuffy implementation over Greenplum
Parallel MCMC inference [state of the art]
- GraphLab parallelized Gibbs implementation
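
For readers unfamiliar with Gibbs sampling, here is a small sequential sketch on a three-atom factor graph with one ground clause. It is not the GraphLab implementation and is not parallel; the weight 1.5, the number of sweeps, and the helper names are hypothetical choices for the example.

```python
import math
import random

W = 1.5  # hypothetical clause weight


def clause_satisfied(a, b, c):
    """Ground clause: Contains(F1,C1) and Prevents(C1,D1) imply Prevents(F1,D1)."""
    return (not (a and b)) or c


def score(state):
    """Unnormalized probability of a full assignment."""
    return math.exp(W * clause_satisfied(*state))


def gibbs(n_sweeps, seed=0):
    """Estimate P(Prevents(F1,D1) = true) by resampling one atom at a time."""
    rng = random.Random(seed)
    state = [False, False, False]
    hits = 0
    for _ in range(n_sweeps):
        for i in range(3):
            state[i] = True
            p_true = score(state)
            state[i] = False
            p_false = score(state)
            # Sample atom i from its conditional given the other atoms.
            state[i] = rng.random() < p_true / (p_true + p_false)
        hits += state[2]
    return hits / n_sweeps


estimate = gibbs(20000)
# Exact marginal by enumeration: 7 of 8 worlds satisfy the clause, 4 of them with c true.
exact = 4 * math.exp(W) / (7 * math.exp(W) + 1)
print(round(exact, 3), round(estimate, 3))
```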

18 ProbKB In-Database Architecture
[Diagram labels: RDBMS; Factor Graph; SQL; UDF/UDA; Query Optimizer & Execution Engine; MLN; Entities; Facts]

19 Efficiency & Scalability
Sherlock-ReVerb KB Dataset
- ReVerb: 400K extracted facts from a web text corpus;
- Sherlock: 31K inference rules learned from ReVerb.
# relations   82,768
# rules       30,912
# entities    277,216
# facts       407,247
Table: Sherlock-ReVerb KB statistics

20 Efficiency & Scalability
Knowledge Expansion Runtime Results
- Tuffy-T: a modified version of Tuffy with typing;
- ProbKB: SQL-based ProbKB expansion in batches;
- ProbKB-p: parallel version of ProbKB expansion.
Systems     Load   Round 1  Round 2  Round 3  Round 4
ProbKB-p
ProbKB
Tuffy-T
# records   396K   420K     456K     580K     1.5M
Table: Sherlock-ReVerb KB runtime in minutes
Resulting in 592M factors after the 4th iteration => quality control!

21 Quality Control
Inference Errors
[Diagram: inference graph over the facts born in(Freud, Berlin), born in(Freud, Germany), capital of(Berlin, Germany), hub of(Berlin, Germany), born in(Mandel, Berlin), born in(Mandel, Baltimore), located in(Baltimore, Berlin), born in(Freud, Baltimore), born in(Rothman, Baltimore), capital of(Baltimore, Germany), live in(Rothman, Baltimore), live in(Rothman, Germany)]
Error sources:
- Incorrect extractions
- Incorrect rules
- Ambiguous entities
- Propagated errors

22 Knowledge Expansion Quality Control Methods
Semantic/functional constraints for ambiguity detection (SC)
Functional relation: born in
  Violating facts: born in(Mandel, Berlin); born in(Mandel, New York City); born in(Mandel, Chicago)
  Ambiguous entities: Leonard Mandel; Johnny Mandel; Tom Mandel (futurist)
Functional relation: grow up in
  Violating facts: grow up in(Miller, Placentia); grow up in(Miller, New York City); grow up in(Miller, New Orleans)
  Ambiguous entities: Dustin Miller; Alan Gifford Miller; Taylor Miller
Functional relation: located in
  Violating facts: located in(regional office, Glasgow); located in(regional office, Panama City); located in(regional office, South Bend)
  Ambiguous entities: McCarthy & Stone regional offices; OCHA regional offices; Indiana Landmarks regional offices
Functional relation: capital of
  Violating facts: capital of(Delhi, India); capital of(Calcutta, India)
  Ambiguous entities: (Incorrect extraction)
Statistical rule cleaning (RC): keep only rules whose statistical significance (e.g., conditional probability) exceeds a threshold θ
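
The functional-constraint check can be sketched as a group-by: for a functional relation, any key that maps to more than one value is flagged. The code below is a simplified illustration, not the ProbKB procedure; the `functional` mapping (which argument acts as the key for each relation) and the `violations` helper are assumptions, including the choice to treat the country as the key of `capital_of`.

```python
from collections import defaultdict

# Toy facts as (relation, subject, object), taken from the slide.
facts = [
    ("born_in", "mandel", "Berlin"),
    ("born_in", "mandel", "New York City"),
    ("born_in", "mandel", "Chicago"),
    ("born_in", "freud", "Berlin"),
    ("live_in", "rothman", "Baltimore"),
    ("live_in", "rothman", "Germany"),
    ("capital_of", "delhi", "India"),
    ("capital_of", "calcutta", "India"),
]

# Assumed declaration of which argument determines the other.
functional = {"born_in": "subject", "capital_of": "object"}


def violations(facts, functional):
    """Return {(relation, key): values} for keys that map to more than one value."""
    groups = defaultdict(set)
    for rel, subj, obj in facts:
        if rel not in functional:
            continue  # non-functional relations may legitimately have many values
        key, value = (subj, obj) if functional[rel] == "subject" else (obj, subj)
        groups[(rel, key)].add(value)
    return {k: v for k, v in groups.items() if len(v) > 1}


found = violations(facts, functional)
for (rel, key), values in sorted(found.items()):
    print(rel, key, sorted(values))
```

A flagged key signals either an ambiguous entity name (several people called Mandel) or an incorrect extraction (two capitals for India); telling the two apart needs further evidence.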

23 Knowledge Expansion Quality Control Results
[Figure (a): precision of inferred facts vs. estimated number of correct facts, for the settings No SC RC; RC top 20%; RC top 10%; SC only; SC RC top 50%; SC RC top 20%]
[Figure (b): ambiguities (detected) 34%; ambiguous join keys 24%; incorrect rules 33%; synonyms 1%; general types 2%; incorrect extractions 6%]
(a) 0.6 higher precision. (b) Error sources.
Current work: probabilistic rule learning for ProbKB

Efficient In-Database Analytics with Graphical Models

Efficient In-Database Analytics with Graphical Models Efficient In-Database Analytics with Graphical Models Daisy Zhe Wang, Yang Chen, Christan Grant and Kun Li {daisyw,yang,cgrant,kli}@cise.ufl.edu University of Florida, Department of Computer and Information

More information

Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases

Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases Yang Chen, Xiaofeng Zhou, Kun Li, Daisy Zhe Wang * Department of Computer and Information Science and Engineering, University of

More information

Extracting and Querying Probabilistic Information From Text in BayesStore-IE

Extracting and Querying Probabilistic Information From Text in BayesStore-IE Extracting and Querying Probabilistic Information From Text in BayesStore-IE Daisy Zhe Wang, Michael J. Franklin, Minos Garofalakis 2, Joseph M. Hellerstein University of California, Berkeley Technical

More information

Tuffy. Scaling up Statistical Inference in Markov Logic using an RDBMS

Tuffy. Scaling up Statistical Inference in Markov Logic using an RDBMS Tuffy Scaling up Statistical Inference in Markov Logic using an RDBMS Feng Niu, Chris Ré, AnHai Doan, and Jude Shavlik University of Wisconsin-Madison One Slide Summary Machine Reading is a DARPA program

More information

Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback

Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback Daisy Zhe Wang Yang Chen Sean Goldberg Christan Grant Department of Computer and Information

More information

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Jianyong Wang Department of Computer Science and Technology Tsinghua University Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity

More information

Constraint Propagation for Efficient Inference in Markov Logic

Constraint Propagation for Efficient Inference in Markov Logic Constraint Propagation for Efficient Inference in Tivadar Papai 1 Parag Singla 2 Henry Kautz 1 1 University of Rochester, Rochester NY 14627, USA 2 University of Texas, Austin TX 78701, USA September 13,

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Information Retrieval (Part 1)

Information Retrieval (Part 1) Information Retrieval (Part 1) Fabio Aiolli http://www.math.unipd.it/~aiolli Dipartimento di Matematica Università di Padova Anno Accademico 2008/2009 1 Bibliographic References Copies of slides Selected

More information

Computer-based Tracking Protocols: Improving Communication between Databases

Computer-based Tracking Protocols: Improving Communication between Databases Computer-based Tracking Protocols: Improving Communication between Databases Amol Deshpande Database Group Department of Computer Science University of Maryland Overview Food tracking and traceability

More information

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS Feng Niu Christopher Ré AnHai Doan Jude Shavlik University of Wisconsin-Madison {leonn,chrisre,anhai,shavlik}@cs.wisc.edu

More information

Enterprise Big Data Platforms

Enterprise Big Data Platforms Enterprise Big Data Platforms + Big Data research @ Roma Tre Antonio Maccioni maccioni@dia.uniroma3.it 19 April 2017 Outline Polystores QUEPA project Data Lakes KAYAK project No one size fits all Polyglot

More information

In-database batch and query-time inference over probabilistic graphical models using UDA GIST

In-database batch and query-time inference over probabilistic graphical models using UDA GIST The VLDB Journal (2017) 26:177 201 DOI 10.1007/s00778-016-0446-1 REGULAR PAPER In-database batch and query-time inference over probabilistic graphical models using UDA GIST Kun Li 1 Xiaofeng Zhou 1 Daisy

More information

Continuous Data Cleaning

Continuous Data Cleaning Continuous Data Cleaning M. Volkovs, F. Chiang, J. Szlichta and R. J. Miller ICDE 2014 Presenter: Nabiha Asghar Outline Introduction and motivation Main contributions of the paper Description of architecture

More information

Hybrid In-Database Inference for Declarative Information Extraction

Hybrid In-Database Inference for Declarative Information Extraction Hybrid In-Database Inference for Declarative Information Extraction Minos Garofalakis Technical University of Crete Daisy Zhe Wang University of California, Berkeley Joseph M. Hellerstein University of

More information

Sempala. Interactive SPARQL Query Processing on Hadoop

Sempala. Interactive SPARQL Query Processing on Hadoop Sempala Interactive SPARQL Query Processing on Hadoop Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen University of Freiburg, Germany ISWC 2014 - Riva del Garda, Italy Motivation

More information

Markov Logic: Representation

Markov Logic: Representation Markov Logic: Representation Overview Statistical relational learning Markov logic Basic inference Basic learning Statistical Relational Learning Goals: Combine (subsets of) logic and probability into

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

Unifying Big Data Workloads in Apache Spark

Unifying Big Data Workloads in Apache Spark Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache

More information

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL. Overview

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL. Overview Writing Queries Using Microsoft SQL Server 2008 Transact-SQL Overview The course has been extended by one day in response to delegate feedback. This extra day will allow for timely completion of all the

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions

Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions Advanced Computer Graphics CS 525M: Crowds replace Experts: Building Better Location-based Services using Mobile Social Network Interactions XIAOCHEN HUANG Computer Science Dept. Worcester Polytechnic

More information

CSE Database Management Systems. York University. Parke Godfrey. Winter CSE-4411M Database Management Systems Godfrey p.

CSE Database Management Systems. York University. Parke Godfrey. Winter CSE-4411M Database Management Systems Godfrey p. CSE-4411 Database Management Systems York University Parke Godfrey Winter 2014 CSE-4411M Database Management Systems Godfrey p. 1/16 CSE-3421 vs CSE-4411 CSE-4411 is a continuation of CSE-3421, right?

More information

Record Linkage using Probabilistic Methods and Data Mining Techniques

Record Linkage using Probabilistic Methods and Data Mining Techniques Doi:10.5901/mjss.2017.v8n3p203 Abstract Record Linkage using Probabilistic Methods and Data Mining Techniques Ogerta Elezaj Faculty of Economy, University of Tirana Gloria Tuxhari Faculty of Economy, University

More information

Tania Tudorache Stanford University. - Ontolog forum invited talk04. October 2007

Tania Tudorache Stanford University. - Ontolog forum invited talk04. October 2007 Collaborative Ontology Development in Protégé Tania Tudorache Stanford University - Ontolog forum invited talk04. October 2007 Outline Introduction and Background Tools for collaborative knowledge development

More information

Mining the Web 2.0 to improve Search

Mining the Web 2.0 to improve Search Mining the Web 2.0 to improve Search Ricardo Baeza-Yates VP, Yahoo! Research Agenda The Power of Data Examples Improving Image Search (Faceted Clusters) Searching the Wikipedia (Correlator) Understanding

More information

HoloClean: Holistic Data Repairs with Probabilistic Inference

HoloClean: Holistic Data Repairs with Probabilistic Inference HoloClean: Holistic Data Repairs with Probabilistic Inference Theodoros Rekatsinas *, Xu Chu, Ihab F. Ilyas, Christopher Ré * {thodrek, chrismre}@cs.stanford.edu, {x4chu, ilyas}@uwaterloo.ca * Stanford

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 26 Enhanced Data Models: Introduction to Active, Temporal, Spatial, Multimedia, and Deductive Databases 26.1 Active Database Concepts and Triggers Database systems implement rules that specify

More information

A Framework for Securing Databases from Intrusion Threats

A Framework for Securing Databases from Intrusion Threats A Framework for Securing Databases from Intrusion Threats R. Prince Jeyaseelan James Department of Computer Applications, Valliammai Engineering College Affiliated to Anna University, Chennai, India Email:

More information

FBR SYSTEM: USER DIRECTED FILTERING OF IMPRECISE QUERIES

FBR SYSTEM: USER DIRECTED FILTERING OF IMPRECISE QUERIES FBR SYSTEM: USER DIRECTED FILTERING OF IMPRECISE QUERIES Sarika Sarode 1, K. V. Metre 2 1 Department of Computer Engineering, MET s IOE, Maharashtra, India 2 Department of Computer Engineering, MET s IOE,

More information

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park

Link Mining & Entity Resolution. Lise Getoor University of Maryland, College Park Link Mining & Entity Resolution Lise Getoor University of Maryland, College Park Learning in Structured Domains Traditional machine learning and data mining approaches assume: A random sample of homogeneous

More information

Distributed Case-based Reasoning for Fault Management

Distributed Case-based Reasoning for Fault Management Distributed Case-based Reasoning for Fault Management Ha Manh Tran and Jürgen Schönwälder Computer Science, Jacobs University Bremen, Germany 1st EMANICS Workshop on Peer-to-Peer Management University

More information

Probabilistic/Uncertain Data Management

Probabilistic/Uncertain Data Management Probabilistic/Uncertain Data Management 1. Dalvi, Suciu. Efficient query evaluation on probabilistic databases, VLDB Jrnl, 2004 2. Das Sarma et al. Working models for uncertain data, ICDE 2006. Slides

More information

Asking the Right Questions in Crowd Data Sourcing

Asking the Right Questions in Crowd Data Sourcing MoDaS Mob Data Sourcing Asking the Right Questions in Crowd Data Sourcing Tova Milo Tel Aviv University Outline Introduction to crowd (data) sourcing Databases and crowds Declarative is good How to best

More information

Join Bayes Nets: A New Type of Bayes net for Relational Data

Join Bayes Nets: A New Type of Bayes net for Relational Data Join Bayes Nets: A New Type of Bayes net for Relational Data Oliver Schulte oschulte@cs.sfu.ca Hassan Khosravi hkhosrav@cs.sfu.ca Bahareh Bina bba18@cs.sfu.ca Flavia Moser fmoser@cs.sfu.ca Abstract Many

More information

Semantic Optimization of Preference Queries

Semantic Optimization of Preference Queries Semantic Optimization of Preference Queries Jan Chomicki University at Buffalo http://www.cse.buffalo.edu/ chomicki 1 Querying with Preferences Find the best answers to a query, instead of all the answers.

More information

DEC Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES

DEC Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES DEC. 1-5 Computer Technology LESSON 6: DATABASES AND WEB SEARCH ENGINES Monday Overview of Databases A web search engine is a large database containing information about Web pages that have been registered

More information

Provenance: Information for Shared Understanding

Provenance: Information for Shared Understanding Provenance: Information for Shared Understanding M. David Allen June 2012 Approved for Public Release: 3/7/2012 Case 12-0965 Government Mandates Net-Centric Data Strategy mandate: Is the source, accuracy

More information

NYU CSCI-GA Fall 2016

NYU CSCI-GA Fall 2016 1 / 45 Information Retrieval: Personalization Fernando Diaz Microsoft Research NYC November 7, 2016 2 / 45 Outline Introduction to Personalization Topic-Specific PageRank News Personalization Deciding

More information

Pliny and Fixr Meeting. September 15, 2014

Pliny and Fixr Meeting. September 15, 2014 Pliny and Fixr Meeting September 15, 2014 Fixr: Mining and Understanding Bug Fixes for App-Framework Protocol Defects (TA2) University of Colorado Boulder September 15, 2014 Fixr: Mining and Understanding

More information

Equipping Robot Control Programs with First-Order Probabilistic Reasoning Capabilities

Equipping Robot Control Programs with First-Order Probabilistic Reasoning Capabilities Equipping Robot s with First-Order Probabilistic Reasoning Capabilities Nicholas M. Stiffler Ashok Kumar April 2011 1 / 48 Outline An autonomous robot system that is to act in a real-world environment

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x

Autoencoder. Representation learning (related to dictionary learning) Both the input and the output are x Deep Learning 4 Autoencoder, Attention (spatial transformer), Multi-modal learning, Neural Turing Machine, Memory Networks, Generative Adversarial Net Jian Li IIIS, Tsinghua Autoencoder Autoencoder Unsupervised

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Learning mappings and queries

Learning mappings and queries Learning mappings and queries Marie Jacob University Of Pennsylvania DEIS 2010 1 Schema mappings Denote relationships between schemas Relates source schema S and target schema T Defined in a query language

More information

Informatica Enterprise Information Catalog

Informatica Enterprise Information Catalog Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University

CS6200 Information Retrieval. Jesse Anderton College of Computer and Information Science Northeastern University CS6200 Information Retrieval Jesse Anderton College of Computer and Information Science Northeastern University Major Contributors Gerard Salton! Vector Space Model Indexing Relevance Feedback SMART Karen

More information

Ontology Based Prediction of Difficult Keyword Queries

Ontology Based Prediction of Difficult Keyword Queries Ontology Based Prediction of Difficult Keyword Queries Lubna.C*, Kasim K Pursuing M.Tech (CSE)*, Associate Professor (CSE) MEA Engineering College, Perinthalmanna Kerala, India lubna9990@gmail.com, kasim_mlp@gmail.com

More information

Review -Chapter 4. Review -Chapter 5

Review -Chapter 4. Review -Chapter 5 Review -Chapter 4 Entity relationship (ER) model Steps for building a formal ERD Uses ER diagrams to represent conceptual database as viewed by the end user Three main components Entities Relationships

More information

Logic: TD as search, Datalog (variables)

Logic: TD as search, Datalog (variables) Logic: TD as search, Datalog (variables) Computer Science cpsc322, Lecture 23 (Textbook Chpt 5.2 & some basic concepts from Chpt 12) June, 8, 2017 CPSC 322, Lecture 23 Slide 1 Lecture Overview Recap Top

More information

Multi-dimensional Skyline to find shopping malls. Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren

Multi-dimensional Skyline to find shopping malls. Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren Multi-dimensional Skyline to find shopping malls Md Amir Amiruzzaman Suphanut Parn Jamonnak Zhengyong Ren Introduction In market research predicting customer movement is very important. While customers

More information

Silberschatz, Korth and Sudarshan See for conditions on re-use

Silberschatz, Korth and Sudarshan See   for conditions on re-use Chapter 3: SQL Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 3: SQL Data Definition Basic Query Structure Set Operations Aggregate Functions Null Values Nested

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory

More information

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania Combining the Logical and the Probabilistic in Program Analysis Xin Zhang Xujie Si Mayur Naik University of Pennsylvania What is Program Analysis? int f(int i) {... } Program Analysis x may be null!...

More information

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity

Relational model continued. Understanding how to use the relational model. Summary of board example: with Copies as weak entity COS 597A: Principles of Database and Information Systems Relational model continued Understanding how to use the relational model 1 with as weak entity folded into folded into branches: (br_, librarian,

More information

Case Study: Lufthansa Cargo Database

Case Study: Lufthansa Cargo Database Case Study: Lufthansa Cargo Database Carsten Schürmann 1 Today s lecture More on data modelling Introduction to Lufthansa Cargo Database Entity Relationship diagram Boyce-Codd normal form 2 From Lecture

More information

Similarity Joins of Text with Incomplete Information Formats

Similarity Joins of Text with Incomplete Information Formats Similarity Joins of Text with Incomplete Information Formats Shaoxu Song and Lei Chen Department of Computer Science Hong Kong University of Science and Technology {sshaoxu,leichen}@cs.ust.hk Abstract.

More information

Class-Level Bayes Nets for Relational Data

Class-Level Bayes Nets for Relational Data Class-Level Bayes Nets for Relational Data Oliver Schulte, Hassan Khosravi, Flavia Moser, Martin Ester {oschulte, hkhosrav, fmoser, ester}@cs.sfu.ca School of Computing Science Simon Fraser University

More information

CTL.SC4x Technology and Systems

CTL.SC4x Technology and Systems in Supply Chain Management CTL.SC4x Technology and Systems Key Concepts Document This document contains the Key Concepts for the SC4x course, Weeks 1 and 2. These are meant to complement, not replace,

More information

Chapter 7: Graphical models and belief propagation

7. Graphical models and belief propagation. Outline: graphical models, Bayesian networks, pair wise Markov random fields, factor

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Announcements (December 1): Homework #4 due today. Sample solution available Thursday. Course project demo period has begun! Check

AN INTERACTIVE FORM APPROACH FOR DATABASE QUERIES THROUGH F-MEASURE

Parashurama M. (PG Scholar) and Doddegowda B.J (Associate Professor), CSE Department, AMC Engineering College, Karnataka, (India).

Integrating auxiliary data in optimal spatial design for species distribution mapping

Brian Reich, Krishna Pacifici and Jon Stallings, North Carolina State University

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

Announcements: Shumo office hours change. See website for details. HW2 due next Thurs

Chapter 1, Introduction

CSI 4352, Introduction to Data Mining. Young-Rae Cho, Associate Professor, Department of Computer Science, Baylor University. What is Data Mining? Definition: Knowledge Discovery from

11/04/16. Data Profiling. Helena Galhardas DEI/IST. References

References: Slides, Data Profiling course, Felix Naumann, Trento, July 2015. Z. Abedjan, L. Golab, F. Naumann, Profiling Relational Data: A Survey, VLDBJ 2015. T. Papenbrock

Markov Random Fields and Gibbs Sampling for Image Denoising

Chang Yue, Electrical Engineering, Stanford University, changyue@stanfoed.edu. Abstract: This project applies Gibbs Sampling based on different Markov

Data Cleansing. LIU Jingyuan, Vislab WANG Yilei, Theoretical group

What is Data Cleansing: Data cleansing (data cleaning) is the process of detecting and correcting (or removing) errors or inconsistencies

THE RELATIONAL MODEL. University of Waterloo

List of Slides: The Relational Model; Relations and Databases; Example; Another Example; What does it mean?; Example Database; What can we do with it?; Variables and

HaLoop Efficient Iterative Data Processing on Large Clusters

Yingyi Bu, Bill Howe, Magdalena Balazinska, and Michael D. Ernst, University of Washington, Department of Computer Science & Engineering. Presented

Research challenges in data-intensive computing The Stratosphere Project Apache Flink

Seif Haridi, KTH/SICS, haridi@kth.se, e2e-clouds.org. Presented by: Seif Haridi, May 2014. Research Areas: Data-intensive

Target and source schemas may contain integrity constraints. source schema(s) assertions relating elements of the global schema to elements of the

Data integration. Data Integration System: target (integrated) schema; source schema (maybe more than one); assertions relating elements of the global schema to elements of the source schema(s). Target and

Structured Data on the Web

Alon Halevy, Google. Australasian Computer Science Week, January, 2010. Structured Data & The Web. Andree Hudson, 4th of July. Hard to find structured data via search engines

FACTORBASE : SQL for Learning A Multi-Relational Graphical Model

Oliver Schulte, Zhensong Qian, Simon Fraser University, Canada, {oschulte,zqian}@sfu.ca. arXiv:1508.02428v1 [cs.DB] 10 Aug 2015. Abstract

VISION & LANGUAGE From Captions to Visual Concepts and Back

Brady Fowler & Kerry Jones. Tuesday, February 28th 2017. CS 6501-004 VICENTE. Agenda: Problem Domain, Object Detection, Language Generation, Sentence

PRIS at TAC2012 KBP Track

Yan Li, Sijia Chen, Zhihua Zhou, Jie Yin, Hao Luo, Liyin Hong, Weiran Xu, Guang Chen, Jun Guo. School of Information and Communication Engineering, Beijing University of Posts and

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning

Topics: Bayes Nets: Inference (Finish). Variable Elimination. Graph-view of VE: Fill-edges, induced width

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Basic Concepts of IR: Outline. Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR. Terminologies and

itrails: Pay-as-you-go Information Integration in Dataspaces

Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi. ETH Zurich. VLDB 2007. Outline: Motivation, itrails, Experiments

Search Engines. Information Retrieval in Practice

All slides Addison Wesley, 2008. Beyond Bag of Words. Bag of Words: a document is considered to be an unordered collection of words with no relationships. Extending

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Kostas Stefanidis, kostas.stefanidis@uta.fi, http://www.uta.fi/sis/tie/dbir/index.html, http://people.uta.fi/~kostas.stefanidis/dbir16/dbir16-main.html

A synthetic query-aware database generator

Anonymous. Department of Computer Science, Golisano College of Computing and Information Sciences, Rochester, NY 14586. Abstract: In database applications and DBMS

Challenges for Data Driven Systems

Eiko Yoneki, University of Cambridge Computer Laboratory. Data Centric Systems and Networking. Emergence of Big Data. Shift of Communication Paradigm: From end-to-end to data

Drawing the Big Picture

Multi-Platform Data Architectures, Queries, and Analytics. Philip Russom, TDWI Research Director for Data Management. August 26, 2015. Sponsor. Speakers: Philip Russom, TDWI Research

EECS 647: Introduction to Database Systems

Instructor: Luke Huan, Spring 2009. Stating Points: A database. A database management system. A miniworld. A data model. Conceptual model. Relational model. 2/24/2009

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator

R.Saravanan (1), J.Sivapriya (2), M.Shahidha (3). (1) Assistant Professor, Department of IT, SMVEC, Puducherry, India. (2, 3) UG student, Department

EO Ground Segment Evolution Reflections by

EO Ground Segment Evolution: Reflections by Interoute. Jonathan Brown, Marketing Director. Workshop 2015, 24th September 2015, ESA/ESRIN Frascati. Interoute, from the ground to the cloud. 1. Interoute is the

(12) Patent Application Publication (10) Pub. No.: US 2010/ A1. Yu (43) Pub. Date: Aug. 26, 2010

(19) United States. (12) Patent Application Publication. (10) Pub. No.: US 2010/0217768 A1. Yu. (43) Pub. Date: Aug. 26, 2010. (54) QUERY SYSTEM FOR BIOMEDICAL LITERATURE USING

Open Data Integration. Renée J. Miller

Renée J. Miller, miller@northeastern.edu. Open Data Principles: Timely & Comprehensive. Accessible and Usable. Complete - All public data is made available. Public data is data that

Database Applications (15-415)

ER to Relational & Relational Algebra. Lecture 4, January 20, 2015. Mohammad Hammoud. Today: Last Session: The relational model. Today's Session: ER to relational, Relational algebra

Hibernate Search Googling your persistence domain model. Emmanuel Bernard Doer JBoss, a division of Red Hat

Search: left over of today's applications. Add search dimension to the domain model. Frankly, search

Reproducible Workflows Biomedical Research. P Berlin, Germany

P11 2018, Berlin, Germany. Contributors: Leslie McIntosh, Research Data Alliance, U.S., Executive Director. Oya Beyan, Aachen University, Germany. Anthony Juehne, RDA,

DATABASE TECHNOLOGY - 1MB025 (also 1DL029, 1DL300+1DL400)

Spring 2008. An introductory course on database systems. http://user.it.uu.se/~udbl/dbt-vt2008/ alt. http://www.it.uu.se/edu/course/homepage/dbastekn/vt08/

Question Answering over Knowledge Bases: Entity, Text, and System Perspectives. Wanyun Cui Fudan University

Backgrounds: Question Answering (QA) systems answer questions posed by humans in a natural language.

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

Agenda: Introduction to Dobler Consulting. Today's Data Challenges. Overview of SAP IQ 16, Edge Edition. SAP

Why do we need graph processing?

Community detection: suggest followers? Determine what products people will like. Count how many people are in different communities (polling?). Graphs are Everywhere. Group

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

Feng Niu, Christopher Ré, AnHai Doan, Jude Shavlik. University of Wisconsin-Madison. {leonn,chrisre,anhai,shavlik}@cs.wisc.edu