All groups final presentation/poster and write-up
- Byron Green
- 6 years ago
Transcription
1 Logistics
Non-NIST groups: project proposals
- Guidelines posted
- Write-up and slides due this Friday
- Coming Monday, each non-NIST group will give a project pitch (5 min) based on the slides
- Everyone in class needs to log in to Canvas and pick their favorite idea at the end of the class
All groups: final presentation/poster and write-up
- Guidelines will be posted; the draft final presentation submission is due 1 day before the presentation to allow quick feedback
- In-class presentation slots on 12/2, 12/4, 12/7, and 12/9 will be announced soon for group sign-up
2 Review
- What are knowledge bases?
- What are some examples of knowledge bases?
- What are some of the KB applications?
3 Life of a Knowledge Base (KB)
A knowledge base system is a special kind of database management system for knowledge base management.
- KB extraction: knowledge extraction using statistical models from the NLP/ML literature
- KB expansion: knowledge inference using deductive rules and knowledge with uncertainty
- KB evolution: knowledge update given new evidence and new data sources
- KB integration: knowledge integration from multiple sources (e.g., humans, data sets, models)
4 Uncertainty Management
Where does uncertainty come from?
- Inherent in state-of-the-art NLP results
- Incorrect, conflicting data sources
- Derived facts and query results
Ad hoc ways to handle uncertainty
- NULL values
- MAP in extraction or majority voting for integration
Probabilistic relational model to encode the joint distribution over the uncertainty and correlation in an automatically constructed KB
- Probabilistic Knowledge Base System
5 A Probabilistic Knowledge Base
Statistical machine learning models and data processing and management systems are the backbones.
Diagram labels: Questions; Answers; Knowledge & uncertainty; Knowledge Extraction; Knowledge Integration; Knowledge Expansion; Knowledge Evolution
6 Challenges & Projects
Knowledge Extraction
- BayesStore: Probabilistic Database for Knowledge Extraction [VLDB08, VLDB10, ICDE10, SIGMOD11]
- VITA: Multimodal Information Extraction (Image & Text)
- SRL: Scalable Probabilistic First-order Rule Learning
Knowledge Expansion & Evolution
- ProbKB: Inference over Probabilistic KB [SIGMOD 2014]
- UDA-GIST: Unified Data- and Graph-Parallel Framework [VLDB 2015]
- Archer: Query-Driven Inference for IE, ER and QA
- MADLib: Scalable NLP over MPP frameworks [VLDB 2012]
Knowledge Integration
- SigmaKB: Multiple Knowledge Base Integration
- CAMeL: Crowd Assisted Machine Learning [HCOMP 2013]
7 Archimedes: A Master Probabilistic Knowledge Base System Architecture
Diagram layers: Applications; Probabilistic KB System Core; Knowledge ETL; Parallel Data Processing over Multi-core or Distributed Cluster
Application areas:
- Q&A Systems
- Ecology, Biomedical, Data Privacy
- Analysts for Situational Awareness
Components:
- ProbKB: Probabilistic KB Data Model and Materialization for Inference
- Archer: Query-driven Inference for IE, ER and QA
- UDA-GIST: Unified Framework for Data Parallel and Graph Parallel Operations
- VITA: Multimodal Information Extraction
- SigmaKB: Multiple KB Integration
- CAMeL: Crowd Assisted Machine Learning for KB
- MADLib: In-DB Statistical Methods
- SRL: Scalable Probabilistic Rule Learning
8 From Big Data to Big Wisdom
Diagram labels: Data; Facts; Rules
10 Web Does Not Know All Facts and First-order Rules
For example, the Web has much information on:
- X Food contains Y Chemical
- Y Chemical prevents Z Disease
The Web has little information on:
- X Food prevents Z Disease
We can use first-order rules to infer propositional facts in addition to those extracted from the Web.
First-order rules can be automatically learned from large KBs [Sherlock-Holmes, Archimedes-SRL].
Other approaches exist to expand a KB without rules: neural models, Path Ranking, Google Knowledge Vault.
11 Inferring Implicit Information
Knowledge Expansion over a ProbKB [SIGMOD 14]
- Kale is rich in Calcium
- Calcium helps prevent Osteoporosis
- Inferred: Kale helps prevent Osteoporosis
Uncertainty exists in both rules and facts.
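The kale example can be sketched as a single rule application over extracted triples. This is only an illustration, not the ProbKB implementation: the triple layout, the relation names `contains` and `prevents` (standing in for "is rich in" and "helps prevent"), and the function `apply_prevents_rule` are assumptions made here, and uncertainty is ignored.

```python
# Hypothetical extracted facts as (relation, subject, object) triples.
# "contains" stands in for "is rich in"; "prevents" for "helps prevent".
facts = {
    ("contains", "Kale", "Calcium"),
    ("prevents", "Calcium", "Osteoporosis"),
}


def apply_prevents_rule(known):
    """Apply the rule: contains(X, Y) and prevents(Y, Z) => prevents(X, Z)."""
    inferred = set()
    for rel1, x, y in known:
        if rel1 != "contains":
            continue
        for rel2, y2, z in known:
            if rel2 == "prevents" and y2 == y:
                inferred.add(("prevents", x, z))
    # Return only facts that were not already in the KB.
    return inferred - known


new_facts = apply_prevents_rule(facts)
print(new_facts)
```

A real system would also attach a weight or probability to each inferred fact, since both the extracted facts and the rule are uncertain.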
12 Probabilistic Inference using MLN
A Markov logic network (MLN) is a set of formulae with weights. Together with a finite set of constants, it defines a Markov network (i.e., a factor graph).
Rules:
- Contains(Food, Chemical) :- IsMadeFrom(Food, Ingredient), Contains(Ingredient, Chemical)
- Prevents(Food, Disease) :- Contains(Food, Chemical), Prevents(Chemical, Disease)
Constants: C = {F1, C1, D1, I1}
Ground atoms in the resulting network: Made(F1,I1), Contains(I1,C1), Contains(F1,C1), Prevents(C1,D1), Prevents(F1,D1)
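To make the MLN semantics concrete, here is a minimal sketch that enumerates possible worlds for the five ground atoms on this slide and computes a marginal by brute force. The rule weights (`W_CONTAINS_RULE`, `W_PREVENTS_RULE`) and the choice of which atoms are observed are assumptions for illustration; the slide gives neither. Each world is scored as exp(sum of weights of satisfied ground rules), the standard MLN definition.

```python
import itertools
import math

# Hypothetical rule weights; the slide does not give numeric values.
W_CONTAINS_RULE = 1.5
W_PREVENTS_RULE = 2.0

# Assumed evidence: these three ground atoms are observed to be true.
EVIDENCE = {"Made(F1,I1)": True, "Contains(I1,C1)": True, "Prevents(C1,D1)": True}
# Remaining ground atoms whose truth values are unknown.
QUERY_ATOMS = ["Contains(F1,C1)", "Prevents(F1,D1)"]


def implies(body, head):
    """Truth value of the implication body => head."""
    return (not body) or head


def log_weight(world):
    """Sum of the weights of the ground rules satisfied in this world."""
    total = 0.0
    if implies(world["Made(F1,I1)"] and world["Contains(I1,C1)"],
               world["Contains(F1,C1)"]):
        total += W_CONTAINS_RULE
    if implies(world["Contains(F1,C1)"] and world["Prevents(C1,D1)"],
               world["Prevents(F1,D1)"]):
        total += W_PREVENTS_RULE
    return total


def marginal(atom):
    """P(atom = True | evidence), by enumerating all consistent worlds."""
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(QUERY_ATOMS)):
        world = {**EVIDENCE, **dict(zip(QUERY_ATOMS, values))}
        weight = math.exp(log_weight(world))
        den += weight
        if world[atom]:
            num += weight
    return num / den


p_prevents = marginal("Prevents(F1,D1)")
print(round(p_prevents, 4))
```

Enumeration is exponential in the number of unknown atoms, which is why practical systems rely on approximate inference such as MCMC over the grounded factor graph.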
13 Challenges and Solutions for Knowledge Expansion over ProbKB
Efficiency & Scalability
- We support a relational representation of the ProbKB data model, including the KB (entities + facts) and the MLN;
- We design a SQL-based algorithm to apply inference rules in batches of structurally equivalent rules;
- We use MPP databases to parallelize the inference process.
Quality Control
- We identify major error sources;
- We use semantic constraints to identify errors and ambiguities;
- We clean the rule set based on the rules' statistical properties.
14 Probabilistic KB Data Model
Example (ReVerb-Sherlock KB)
We define a probabilistic knowledge base to be a 5-tuple Γ = (E, C, R, Π, L):
- Entities E: Ruth Gruber, New York City, Brooklyn
- Classes C: W (Writer) = {Ruth Gruber}, C (City) = {New York City}, P (Place) = {Brooklyn}
- Relations R: born in(W, P), born in(W, C), live in(W, P), live in(W, C), locate in(P, C)
- Facts Π:
  0.93  born in(Ruth Gruber, Brooklyn)
  0.96  born in(Ruth Gruber, New York City)
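One way to picture the 5-tuple is as a small typed container. The sketch below is an assumption-laden illustration, not the ProbKB schema: the class names `Fact` and `ProbKB`, the field layout, and the `is_well_typed` helper are all hypothetical. It loads the Ruth Gruber example and checks that each fact matches a declared relation signature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    """A weighted fact relation(subject, obj); names are illustrative."""
    relation: str
    subject: str
    obj: str
    weight: float


@dataclass
class ProbKB:
    """Hypothetical container for the 5-tuple (E, C, R, Pi, L)."""
    entities: set
    classes: dict        # class symbol -> set of member entities
    relations: set       # (relation, subject class, object class)
    facts: list
    rules: list = field(default_factory=list)

    def is_well_typed(self, fact):
        """True if some declared signature admits the fact's arguments."""
        return any(
            rel == fact.relation
            and fact.subject in self.classes[c1]
            and fact.obj in self.classes[c2]
            for rel, c1, c2 in self.relations
        )


kb = ProbKB(
    entities={"Ruth Gruber", "New York City", "Brooklyn"},
    classes={"W": {"Ruth Gruber"}, "C": {"New York City"}, "P": {"Brooklyn"}},
    relations={("born in", "W", "P"), ("born in", "W", "C"),
               ("live in", "W", "P"), ("live in", "W", "C"),
               ("locate in", "P", "C")},
    facts=[Fact("born in", "Ruth Gruber", "Brooklyn", 0.93),
           Fact("born in", "Ruth Gruber", "New York City", 0.96)],
)

print([kb.is_well_typed(f) for f in kb.facts])
```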
15 Probabilistic KB Data Model (Cont.)
Example (ReVerb-Sherlock KB, cont.)
Rules L:
  1.40  ∀x ∈ W ∀y ∈ P (live in(x, y) ← born in(x, y))
  1.53  ∀x ∈ W ∀y ∈ C (live in(x, y) ← born in(x, y))
  2.68  ∀x ∈ W ∀y ∈ P (grow up in(x, y) ← born in(x, y))
  0.74  ∀x ∈ W ∀y ∈ C (grow up in(x, y) ← born in(x, y))
  0.32  ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← live in(z, x) ∧ live in(z, y))
  0.52  ∀x ∈ P ∀y ∈ C ∀z ∈ W (locate in(x, y) ← born in(z, x) ∧ born in(z, y))
  ∞     ∀x ∈ C ∀y ∈ C ∀z ∈ W (born in(z, x) ∧ born in(z, y) → x = y)
Table: Probabilistic KB from ReVerb-Sherlock.
16 Relational ProbKB
Definition: Two first-order clauses are defined to be structurally equivalent if they differ only in the entity, class, and relation symbols.
Structurally equivalent rules:
  ∀x ∈ W ∀y ∈ P (live in(x, y) ← born in(x, y))      1.40
  ∀x ∈ W ∀y ∈ C (live in(x, y) ← born in(x, y))      1.53
  ∀x ∈ W ∀y ∈ P (grow up in(x, y) ← born in(x, y))   2.68
  ∀x ∈ W ∀y ∈ C (grow up in(x, y) ← born in(x, y))   0.74
The same rules stored as one relational table:
  R1          R2       C1  C2  w
  live in     born in  W   P   1.40
  live in     born in  W   C   1.53
  grow up in  born in  W   P   2.68
  grow up in  born in  W   C   0.74
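Storing structurally equivalent rules as rows means a whole batch can be applied with one join instead of one query per rule. The sketch below illustrates that idea with Python's built-in `sqlite3`; it is not the ProbKB implementation. The table layouts are assumptions, each rule row (R1, R2, C1, C2, w) is assumed to encode R1(x, y) ← R2(x, y) for x in class C1 and y in class C2, and weights are carried in the rule table but not propagated to the inferred facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: typed facts and one table of length-1 rules.
CREATE TABLE facts (rel TEXT, x TEXT, c1 TEXT, y TEXT, c2 TEXT,
                    UNIQUE (rel, x, c1, y, c2));
CREATE TABLE rules (r1 TEXT, r2 TEXT, c1 TEXT, c2 TEXT, w REAL);
""")

conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", [
    ("born in", "Ruth Gruber", "W", "Brooklyn", "P"),
    ("born in", "Ruth Gruber", "W", "New York City", "C"),
])
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?, ?)", [
    ("live in", "born in", "W", "P", 1.40),
    ("live in", "born in", "W", "C", 1.53),
    ("grow up in", "born in", "W", "P", 2.68),
    ("grow up in", "born in", "W", "C", 0.74),
])

# One join applies every rule in the batch: match each fact against the
# body relation and argument classes, then emit the head relation.
conn.execute("""
INSERT OR IGNORE INTO facts
SELECT m.r1, t.x, t.c1, t.y, t.c2
FROM rules AS m
JOIN facts AS t
  ON t.rel = m.r2 AND t.c1 = m.c1 AND t.c2 = m.c2
""")

inferred = sorted(conn.execute(
    "SELECT rel, x, y FROM facts WHERE rel <> 'born in'"))
for row in inferred:
    print(row)
```

Because the number of queries depends on the number of structural classes rather than the number of rules, the same statement would cover thousands of rules of this shape.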
17 State-of-the-art Grounding & Inference
- Parallel MLN grounding [state-of-the-art]: Felix/Tuffy implementation over Greenplum
- Parallel MCMC inference [state-of-the-art]: GraphLab parallelized Gibbs implementation
18 ProbKB In-Database Architecture
Diagram labels: RDBMS; SQL; UDF/UDA; Query Optimizer & Execution Engine; MLN; Entities; Facts; Factor Graph
19 Efficiency & Scalability
Sherlock-ReVerb KB Dataset
- ReVerb: 400K extracted facts from a web text corpus;
- Sherlock: 31K inference rules learned from ReVerb.
  # relations   82,768
  # rules       30,912
  # entities    277,216
  # facts       407,247
Table: Sherlock-ReVerb KB statistics
20 Efficiency & Scalability
Knowledge Expansion Runtime Results
- Tuffy-T: a modified version of Tuffy with typing;
- ProbKB: SQL-based ProbKB expansion in batches;
- ProbKB-p: parallel version of ProbKB expansion.
Table: Sherlock-ReVerb KB runtime in minutes (rows: ProbKB-p, ProbKB, Tuffy-T; columns: Load, Round 1, Round 2, Round 3, Round 4)
  # records: 396K (Load), 420K (Round 1), 456K (Round 2), 580K (Round 3), 1.5M (Round 4)
Expansion results in 592M factors after the 4th iteration, which motivates quality control.
21 Quality Control: Inference Errors
Error sources: incorrect extractions, incorrect rules, ambiguous entities, propagated errors.
Facts shown in the example figure:
- born in(freud, Berlin), born in(freud, Germany), born in(freud, Baltimore)
- born in(mandel, Berlin), born in(mandel, Baltimore)
- born in(rothman, Baltimore), live in(rothman, Baltimore), live in(rothman, Germany)
- capital of(berlin, Germany), hub of(berlin, Germany)
- located in(baltimore, Berlin), capital of(baltimore, Germany)
22 Knowledge Expansion Quality Control Methods
Semantic/functional constraints for ambiguity detection (SC)
Functional relation: born in
- Violating facts: born in(mandel, Berlin), born in(mandel, New York City), born in(mandel, Chicago)
- Ambiguous entities: Leonard Mandel, Johnny Mandel, Tom Mandel (futurist)
Functional relation: grow up in
- Violating facts: grow up in(miller, Placentia), grow up in(miller, New York City), grow up in(miller, New Orleans)
- Ambiguous entities: Dustin Miller, Alan Gifford Miller, Taylor Miller
Functional relation: located in
- Violating facts: located in(regional office, Glasgow), located in(regional office, Panama City), located in(regional office, South Bend)
- Ambiguous entities: McCarthy & Stone regional offices, OCHA regional offices, Indiana Landmarks regional offices
Functional relation: capital of
- Violating facts: capital of(delhi, India), capital of(calcutta, India)
- Ambiguous entities: none (incorrect extraction)
Statistical Rule Cleaning (RC): keep rules whose statistical significance (e.g., conditional probability) exceeds a threshold θ
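A functional-constraint check of the kind described on this slide can be sketched as a group-by over facts. This is an illustrative sketch, not the ProbKB code: the `KEY_POSITION` mapping (which argument acts as the key of each functional relation) and the function name `find_violations` are assumptions made here, and the facts are a subset of those on the slide.

```python
from collections import defaultdict

# Assumed key argument per functional relation: 0 means the first argument
# determines the second (a person has one birthplace); 1 means the second
# determines the first (a country has one capital).
KEY_POSITION = {"born in": 0, "grow up in": 0, "located in": 0, "capital of": 1}

facts = [
    ("born in", "mandel", "Berlin"),
    ("born in", "mandel", "New York City"),
    ("born in", "mandel", "Chicago"),
    ("capital of", "delhi", "India"),
    ("capital of", "calcutta", "India"),
    ("located in", "regional office", "Glasgow"),
]


def find_violations(triples):
    """Group values by (relation, key); more than one value is a violation."""
    values = defaultdict(set)
    for rel, subj, obj in triples:
        if rel not in KEY_POSITION:
            continue
        key, val = (subj, obj) if KEY_POSITION[rel] == 0 else (obj, subj)
        values[(rel, key)].add(val)
    return {k: v for k, v in values.items() if len(v) > 1}


violations = find_violations(facts)
for (rel, key), vals in sorted(violations.items()):
    print(rel, key, sorted(vals))
```

A flagged key such as `mandel` may be an ambiguous entity name or the result of an incorrect extraction; the check itself cannot tell the two apart.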
23 Knowledge Expansion Quality Control Results
Figure (a): precision of inferred facts versus estimated number of correct facts, for the configurations No SC/RC, RC top 20%, RC top 10%, SC only, SC + RC top 50%, and SC + RC top 20%. Caption: 0.6 higher precision.
Figure (b): error sources.
- Ambiguities (detected): 34%
- Incorrect rules: 33%
- Ambiguous join keys: 24%
- Incorrect extractions: 6%
- General types: 2%
- Synonyms: 1%
Current work: probabilistic rule learning for ProbKB
A synthetic query-aware database generator Anonymous Department of Computer Science Golisano College of Computing and Information Sciences Rochester, NY 14586 Abstract In database applications and DBMS
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More informationDrawing the Big Picture
Drawing the Big Picture Multi-Platform Data Architectures, Queries, and Analytics Philip Russom TDWI Research Director for Data Management August 26, 2015 Sponsor 2 Speakers Philip Russom TDWI Research
More informationEECS 647: Introduction to Database Systems
EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 Stating Points A database A database management system A miniworld A data model Conceptual model Relational model 2/24/2009
More informationHorizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator
Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator R.Saravanan 1, J.Sivapriya 2, M.Shahidha 3 1 Assisstant Professor, Department of IT,SMVEC, Puducherry, India 2,3 UG student, Department
More informationEO Ground Segment Evolution Reflections by
EO Ground Segment Evolution Reflections by Interoute Jonathan Brown Marketing Director Workshop 2015, 24 th September 2015 ESA/ESRIN Frascati Interoute, from the ground to the cloud 1. Interoute is the
More information(12) Patent Application Publication (10) Pub. No.: US 2010/ A1. Yu (43) Pub. Date: Aug. 26, 2010
US 2010O217768A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2010/0217768 A1 Yu (43) Pub. Date: (54) QUERY SYSTEM FOR BIOMEDICAL Publication Classification LITERATURE USING
More informationOpen Data Integration. Renée J. Miller
Open Data Integration Renée J. Miller miller@northeastern.edu !2 Open Data Principles Timely & Comprehensive Accessible and Usable Complete - All public data is made available. Public data is data that
More informationDatabase Applications (15-415)
Database Applications (15-415) ER to Relational & Relational Algebra Lecture 4, January 20, 2015 Mohammad Hammoud Today Last Session: The relational model Today s Session: ER to relational Relational algebra
More informationHibernate Search Googling your persistence domain model. Emmanuel Bernard Doer JBoss, a division of Red Hat
Hibernate Search Googling your persistence domain model Emmanuel Bernard Doer JBoss, a division of Red Hat Search: left over of today s applications Add search dimension to the domain model Frankly, search
More informationReproducible Workflows Biomedical Research. P Berlin, Germany
Reproducible Workflows Biomedical Research P11 2018 Berlin, Germany Contributors Leslie McIntosh Research Data Alliance, U.S., Executive Director Oya Beyan Aachen University, Germany Anthony Juehne RDA,
More informationDATABASE TECHNOLOGY - 1MB025 (also 1DL029, 1DL300+1DL400)
1 DATABASE TECHNOLOGY - 1MB025 (also 1DL029, 1DL300+1DL400) Spring 2008 An introductury course on database systems http://user.it.uu.se/~udbl/dbt-vt2008/ alt. http://www.it.uu.se/edu/course/homepage/dbastekn/vt08/
More informationQuestion Answering over Knowledge Bases: Entity, Text, and System Perspectives. Wanyun Cui Fudan University
Question Answering over Knowledge Bases: Entity, Text, and System Perspectives Wanyun Cui Fudan University Backgrounds Question Answering (QA) systems answer questions posed by humans in a natural language.
More informationSAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine
SAP IQ Software16, Edge Edition The Affordable High Performance Analytical Database Engine Agenda Agenda Introduction to Dobler Consulting Today s Data Challenges Overview of SAP IQ 16, Edge Edition SAP
More informationWhy do we need graph processing?
Why do we need graph processing? Community detection: suggest followers? Determine what products people will like Count how many people are in different communities (polling?) Graphs are Everywhere Group
More informationTuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS
Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS Feng Niu Christopher Ré AnHai Doan Jude Shavlik University of Wisconsin-Madison {leonn,chrisre,anhai,shavlik}@cs.wisc.edu
More information