Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs
1 Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs
Authors: Andreas Wagner, Veli Bicer, Thanh Tran, and Rudi Studer
Presenter: Freddy Lecue, IBM Research Ireland
2014 International Business Machines Corporation
2 Outline
- Introduction
- Text-Rich Data-Graphs and Hybrid Queries
- Problem Definition
- Contributions
- TopGuess
- Data Synopsis
- Probabilistic Component
- Evaluation
- Conclusion
- References
3 Text-Rich Data-Graphs and Hybrid Queries
Increasing amount of semi-structured, text-rich data:
- Structured data with unstructured texts (e.g., [1]).
- Unstructured data annotated with structured information (e.g., [2]).
(Figure labels: Structure, Text)
[1] DBpedia - A Crystallization Point for the Web of Data.
[2]
4 Text-Rich Data-Graphs and Hybrid Queries (2)
Focus of our work: conjunctive, hybrid queries.
(Figure: query graph over the variables ?x and ?y. Labels: relation, attribute, keyword; structured query predicates; unstructured query predicates; string (query) predicates; Structure; Text.)
5 Problem Definition (1)
Problem: Efficiently and effectively estimate the result set size for a conjunctive, hybrid query Q.
Decompose the problem: sel(Q) = R(Q) * P(Q) [5].
- R(Q): upper-bound cardinality for the result set.
- P(Q): probability of Q having a non-empty result.
Correlations between query predicates (data elements) make the approximation of P(Q) hard.
Correlations make estimations that rely on independence assumptions error-prone.
(Figure: query graph over ?x and ?y with relation, attribute, and keyword predicates, annotated with "Correlations" between them.)
[5] Selectivity estimation using probabilistic models.
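The decomposition sel(Q) = R(Q) * P(Q) and the effect of an independence assumption can be illustrated with a minimal sketch. The toy entity list, the two-predicate query (type Person, name keyword "audrey"), and all function and variable names below are assumptions made for illustration; they are not taken from the slides or the paper.

```python
# Toy data (hypothetical): one (type, name keyword) pair per entity.
entities = [
    ("Person", "audrey"), ("Person", "audrey"),
    ("Person", "roman"), ("Person", "roman"),
    ("Movie", "holiday"), ("Movie", "holiday"), ("Movie", "holiday"),
    ("Movie", "roman"),
]


def selectivity(upper_bound, probability):
    """sel(Q) = R(Q) * P(Q)."""
    return upper_bound * probability


n = len(entities)  # R(Q): upper bound on the bindings of a single variable

# Marginal probabilities of each predicate on its own.
p_person = sum(1 for t, _ in entities if t == "Person") / n
p_audrey = sum(1 for _, w in entities if w == "audrey") / n

# Joint probability, which captures the correlation between type and keyword.
p_joint = sum(1 for t, w in entities if t == "Person" and w == "audrey") / n

sel_independent = selectivity(n, p_person * p_audrey)  # independence assumption
sel_joint = selectivity(n, p_joint)                    # correlation-aware

print(sel_independent, sel_joint)
```

In this toy data the keyword "audrey" occurs only on Person entities, so multiplying the marginals underestimates the true count of matching entities, while the joint probability reproduces it.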
6 Problem Definition (2)
Previous work focuses either on structured or on unstructured query constraints.
Structured query constraints:
- Graph synopses [3]
- Join samples [4]
- PRMs [5,6]
Unstructured query constraints:
- Fuzzy string matching [7,8]
- Extraction operators [9,10]
In our previous work [18], we introduced a uniform model (BN+) for hybrid queries.
Effectiveness issues:
- Difficulty of capturing all correlations between text and structure.
- Pruning text (i.e., vocabulary) using string synopses results in an "information loss".
Efficiency issues:
- Data synopsis: large query-independent BN constructed offline. Grows exponentially w.r.t. vocabulary size.
- Estimation: BN inferencing over a large synopsis, which is NP-hard.
(Figure: query graph over ?x and ?y with relation, attribute, and keyword predicates, annotated with "Correlations".)
[18] Wagner et al., EDBT 2013, Selectivity estimation for hybrid queries over text-rich data graphs.
7 Problem Definition (3): Motivating Example
There can be many entities of type Person (i.e., bindings for ?p a Person), while only a few entities have the name "Audrey". So, in order to estimate the number of bindings for ?p, a synopsis has to capture statistics for any word associated (via name) with Person entities.
(Figures: Data Graph; Hybrid Query)
8 Contributions
We propose a novel approach (TopGuess), which utilizes relational topic models as data synopsis:
- Summarizes textual data with linear space complexity w.r.t. vocabulary size.
- Captures statistics for the complete vocabulary of words by means of topics (no "information loss" due to coarse-grained string synopses).
- Captures correlations between the structure and the text via topics.
TopGuess constructs a small query-specific BN at query time for estimation:
- Time complexity is independent of the synopsis size.
- It does not directly use a large synopsis in memory at runtime; instead, it employs a small and compact synopsis for the current query.
Experiments on real-world data: effectiveness improved by up to 88%, without sacrificing runtime performance.
9 TOPGUESS
10 Data Synopsis
Uniform synopsis using relational topic models.
- Different topic models can be used [19] [20] [21] [22].
Synopsis parameters:
- Topics: textual data in a low-dimensional representation via a set of k topics.
- Class-Topic Parameter: correlations between a class (e.g., Movie, Person) and topics (represented as a vector for each class).
- Relation-Topic Parameter: correlations between a relation (e.g., starring) and topics (represented as a matrix for each relation).
Given topics, the TopGuess data synopsis has linear space complexity w.r.t. vocabulary (see Thm. 1 in the paper).
(Figure: synopsis of the example data graph using TRM [19])
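A rough way to see why such a synopsis grows linearly with the vocabulary is to count its parameters. The sketch below assumes that the topics are stored as a k x |V| topic-word table, that each class has a vector of length k, and that each relation has a k x k matrix; the storage layout, the function name, and the example numbers are assumptions for illustration, not details given in the slides.

```python
def synopsis_parameter_count(vocab_size, num_topics, num_classes, num_relations):
    """Count stored parameters under the assumed layout."""
    topic_word = num_topics * vocab_size                        # k x |V| (assumed)
    class_topic = num_classes * num_topics                      # one k-vector per class
    relation_topic = num_relations * num_topics * num_topics    # one k x k matrix per relation
    return topic_word + class_topic + relation_topic


# Hypothetical sizes: doubling the vocabulary while keeping k fixed.
small = synopsis_parameter_count(1_000_000, 50, 10, 20)
large = synopsis_parameter_count(2_000_000, 50, 10, 20)
growth = large - small

print(small, large, growth)
```

With the number of topics fixed, only the topic-word term depends on the vocabulary, so each additional word adds k parameters.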
11 Probabilistic Component (1)
- TopGuess constructs a small query-specific BN for each query at query time.
- Every predicate in the query (class, relation, and string predicates) is represented as an observed random variable in the BN.
- Each query variable v (e.g., m, p, l) is also represented as a topical random variable X_v in the BN (e.g., X_m, X_p, X_l).
- The topical random variables are modelled as multinomial distributions over the topics, so every query variable is perceived as a topic mixture.
- Initially, the distribution of X_v is unknown (hidden), so it is learned using gradient ascent.
- The query-specific BN is acyclic (see Thm. 2 in the paper).
(Figures: Hybrid Query; Query-specific BN)
12 Probabilistic Component (2)
Topical Independence Assumption (TIA): Given the topical random variables (X_v), all query predicate random variables in the query-specific BN are independent.
- TIA considers that query predicate probabilities depend on (and are governed by) the topics of their associated topical random variables.
- For instance, the random variable X_holiday depends only on X_m. In other words, given X_m, X_holiday is conditionally independent of all other variables, e.g., X_audrey.
- TIA allows us to easily estimate P(Q) via:
(Equation not preserved in the transcription.)
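The idea behind the TIA, that P(Q) factorizes into one term per query predicate once the topic mixtures of the query variables are given, can be sketched as follows. This is a hedged illustration only: the linear and bilinear forms of the per-predicate probabilities, the two-topic setup, the fixed topic mixtures (which TopGuess would learn via gradient ascent), and every name and number below are assumptions, not the parametrization used in the paper.

```python
from math import prod


def unary_prob(theta, weights):
    """Probability of a class or string predicate given one topic mixture (assumed linear form)."""
    return sum(t * w for t, w in zip(theta, weights))


def relation_prob(theta_a, matrix, theta_b):
    """Probability of a relation predicate given two topic mixtures (assumed bilinear form)."""
    return sum(
        theta_a[i] * matrix[i][j] * theta_b[j]
        for i in range(len(theta_a))
        for j in range(len(theta_b))
    )


# Hypothetical topic mixtures for the query variables m and p (two topics).
theta_m = [0.8, 0.2]
theta_p = [0.3, 0.7]

# Hypothetical synopsis parameters.
class_movie = [0.9, 0.1]
class_person = [0.2, 0.8]
word_holiday = [0.5, 0.0]
word_audrey = [0.0, 0.1]
rel_starring = [[0.1, 0.6], [0.0, 0.2]]

# Under the TIA, the predicate variables are independent given the topic
# mixtures, so the joint probability is a plain product of the factors.
factors = [
    unary_prob(theta_m, class_movie),
    unary_prob(theta_m, word_holiday),
    unary_prob(theta_p, class_person),
    unary_prob(theta_p, word_audrey),
    relation_prob(theta_m, rel_starring, theta_p),
]
p_q = prod(factors)

print(p_q)
```

The cost of this product depends only on the number of predicates in the query and the number of topics, which is in line with the claim that estimation time does not depend on the synopsis size.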
13 EVALUATION
14 Evaluation (1): Setting
- Data: IMDB [14] and DBLP [15]. IMDB featured more correlations than DBLP. Both datasets have large vocabularies: ~25 million (DBLP) and ~7 million (IMDB) words.
- Queries: recent keyword search benchmarks [13,14]. We employed 54 DBLP queries and 46 IMDB queries.
- Systems:
  - Baselines: we used n-gram-based string synopses [10]: random samples of 1-grams, top-k 1-grams, and stratified bloom filters on 1-grams. String predicates were integrated via (1) an independence (ind) or (2) a conditional independence (bn) assumption.
  - TopGuess
[13] Spark2: Top-k keyword query in relational databases.
[14] A framework for evaluating database keyword search strategies.
15 Evaluation (2): Setting (2)
- Synopsis size: We employ baselines with varying synopsis size by varying the number of words captured by the string synopsis. The overall synopsis size depends mainly on the string synopsis size. Synopsis sizes: {2, 4, 20, 40} MByte in memory. In contrast, TopGuess keeps a large topic model (281 MB for IMDB and 229 MB for DBLP) on disk and constructs a small, query-specific BN in memory at runtime (~100 KBytes).
- Metrics:
  - Efficiency: selectivity estimation time.
  - Effectiveness: multiplicative error [17].
[17] Independence is good: Dependency-based histogram synopses for high-dimensional data.
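The slide names the multiplicative error without defining it. A common definition, assumed here, is the ratio of the larger to the smaller of the true and the estimated selectivity, so that a perfect estimate scores 1 and over- and underestimation are penalized symmetrically. The function name and the handling of non-positive inputs are illustrative choices.

```python
def multiplicative_error(true_sel, est_sel):
    """Ratio max/min of true and estimated selectivity (assumed definition)."""
    if true_sel <= 0 or est_sel <= 0:
        # The ratio is undefined for zero or negative values; how the
        # evaluation treats empty results is not stated in the slides.
        raise ValueError("selectivities must be positive")
    return max(true_sel, est_sel) / min(true_sel, est_sel)


underestimate = multiplicative_error(2.0, 1.0)   # estimate is half the true value
overestimate = multiplicative_error(10.0, 40.0)  # estimate is four times too high
exact = multiplicative_error(5.0, 5.0)           # perfect estimate

print(underestimate, overestimate, exact)
```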
16 Evaluation (3): Results
17 Conclusion
- We proposed a holistic approach (TopGuess) for selectivity estimation of hybrid queries.
- TopGuess uses RTMs with linear space complexity w.r.t. vocabulary.
- A compact query-specific BN as probabilistic component enables estimation independent of the synopsis size.
- Empirical studies on real-world data showed strong effectiveness improvements without requiring additional runtime.
Future work:
- Extending TopGuess to a more generic selectivity estimation approach for RDF data and BGP queries.
- Replacing the topic models in our data synopsis with different application-specific synopses (e.g., streaming RDF data).
18 References
[1] C. Bizer et al. DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, Issue 7.
[2]
[3] S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD.
[4] J. Spiegel and N. Polyzotis. Graph-based synopses for relational selectivity estimation. In SIGMOD.
[5] L. Getoor, B. Taskar, and D. Koller. Selectivity estimation using probabilistic models. In SIGMOD.
[6] K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11).
[7] S. Chaudhuri, V. Ganti, and L. Gravano. Selectivity estimation for string predicates: Overcoming the underestimation problem. In ICDE.
[8] L. Jin and C. Li. Selectivity estimation for fuzzy string predicates in large data sets. In VLDB.
19 References (2)
[9] W. Shen, A. Doan, J. F. Naughton, and R. Ramakrishnan. Declarative information extraction using datalog with embedded extraction predicates. In VLDB.
[10] D. Z. Wang, L. Wei, Y. Li, F. Reiss, and S. Vaithyanathan. Selectivity estimation for extraction operators over text data. In ICDE.
[11] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 1968.
[12] M. Meila and M. Jordan. Learning with mixtures of trees. The Journal of Machine Learning Research, 1:1-48.
[13] Y. Luo, W. Wang, X. Lin, X. Zhou, J. Wang, and K. Li. Spark2: Top-k keyword query in relational databases. IEEE Transactions on Knowledge and Data Engineering, 23(12).
[14] J. Coffman and A. C. Weaver. A framework for evaluating database keyword search strategies. In CIKM.
[15]
[16] D. Koller and N. Friedman. Probabilistic graphical models. MIT Press.
[17] A. Deshpande, M. N. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In SIGMOD.
[18] A. Wagner, V. Bicer, and T. Tran. Selectivity estimation for hybrid queries over text-rich data graphs. In EDBT, 2013.
20 References (3)
[19] V. Bicer, T. Tran, Y. Ma, and R. Studer. TRM - Learning Dependencies between Text and Structure with Topical Relational Models. In ISWC.
[20] J. Chang and D. Blei. Relational Topic Models for Document Networks. In AIStats.
[21] Y. Liu, A. Niculescu-Mizil, and W. Gryc. Topic-link LDA: Joint Models of Topic and Author Community. In ICML.
[22] L. Zhang et al. Multirelational Topic Models. In ICDM.
More informationSummary: A Tutorial on Learning With Bayesian Networks
Summary: A Tutorial on Learning With Bayesian Networks Markus Kalisch May 5, 2006 We primarily summarize [4]. When we think that it is appropriate, we comment on additional facts and more recent developments.
More informationDynamic Bayesian network (DBN)
Readings: K&F: 18.1, 18.2, 18.3, 18.4 ynamic Bayesian Networks Beyond 10708 Graphical Models 10708 Carlos Guestrin Carnegie Mellon University ecember 1 st, 2006 1 ynamic Bayesian network (BN) HMM defined
More informationClustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V.
Clustering For Similarity Search And Privacyguaranteed Publishing Of Hi-Dimensional Data Ashwini.R #1, K.Praveen *2, R.V.Krishnaiah *3 #1 M.Tech, Computer Science Engineering, DRKIST, Hyderabad, Andhra
More informationKeyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan
Keyword search in relational databases By SO Tsz Yan Amanda & HON Ka Lam Ethan 1 Introduction Ubiquitous relational databases Need to know SQL and database structure Hard to define an object 2 Query representation
More informationEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management
More informationEvaluating Multidimensional Histograms in ProstgreSQL
Evaluating Multidimensional Histograms in ProstgreSQL Dougal Sutherland Swarthmore College 500 College Ave Swarthmore, PA dsuther1@swarthmore.edu Ryan Carlson Swarthmore College 500 College Ave Swarthmore,
More informationGraph Symmetry and Social Network Anonymization
Graph Symmetry and Social Network Anonymization Yanghua XIAO ( 肖仰华 ) School of computer science Fudan University For more information, please visit http://gdm.fudan.edu.cn Graph isomorphism determination
More informationInternational Journal of Advance Research in Engineering, Science & Technology
Impact Factor (SJIF): 4.542 International Journal of Advance Research in Engineering, Science & Technology e-issn: 2393-9877, p-issn: 2394-2444 Volume 4, Issue 4, April-2017 A Simple Effective Algorithm
More informationRanking models in Information Retrieval: A Survey
Ranking models in Information Retrieval: A Survey R.Suganya Devi Research Scholar Department of Computer Science and Engineering College of Engineering, Guindy, Chennai, Tamilnadu, India Dr D Manjula Professor
More informationPathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data
PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg
More informationImproved Structured Robustness (I-SR): A Novel Approach to Predict Hard Keyword Queries
Journal of Scientific & Industrial Research Vol. 76, January 2017, pp. 38-43 Improved Structured Robustness (I-SR): A Novel Approach to Predict Hard Keyword Queries M S Selvi, K Deepa, M S Sangari* and
More informationParallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce
Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over
More informationMaster Project. Various Aspects of Recommender Systems. Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue Ayala
Master Project Various Aspects of Recommender Systems May 2nd, 2017 Master project SS17 Albert-Ludwigs-Universität Freiburg Prof. Dr. Georg Lausen Dr. Michael Färber Anas Alzoghbi Victor Anthony Arrascue
More informationData Extraction and Alignment in Web Databases
Data Extraction and Alignment in Web Databases Mrs K.R.Karthika M.Phil Scholar Department of Computer Science Dr N.G.P arts and science college Coimbatore,India Mr K.Kumaravel Ph.D Scholar Department of
More informationApproximate Query Processing: What is New and Where to Go?
Data Science and Engineering (2018) 3:379 397 https://doi.org/10.1007/s41019-018-0074-4 Approximate Query Processing: What is New and Where to Go? A Survey on Approximate Query Processing Kaiyu Li 1 Guoliang
More informationKeyword Search over RDF Graphs. Elisa Menendez
Elisa Menendez emenendez@inf.puc-rio.br Summary Motivation Keyword Search over RDF Process Challenges Example QUIOW System Next Steps Motivation Motivation Keyword search is an easy way to retrieve information
More informationStructure Index for RDF Data
Structure Index for RDF Data Thanh Tran Institute AIFB Karlsruhe Institute of Technology (KIT) 7628 Karlsruhe, Germany ducthanh.tran@kit.edu Günter Ladwig Institute AIFB Karlsruhe Institute of Technology
More informationINDEX-BASED JOIN IN MAPREDUCE USING HADOOP MAPFILES
Al-Badarneh et al. Special Issue Volume 2 Issue 1, pp. 200-213 Date of Publication: 19 th December, 2016 DOI-https://dx.doi.org/10.20319/mijst.2016.s21.200213 INDEX-BASED JOIN IN MAPREDUCE USING HADOOP
More informationA Survey Paper on Keyword Search Mechanism for RDF Graph Model
ISSN 2395-1621 A Survey Paper on Keyword Search Mechanism for RDF Graph Model #1 Manisha Bhaik, #2 Shyam Gadekar, #3 Nikhil Gumaste, #4 Laxmikant Suryawanshi 1 manishabhaik1@gmail.com, 2 shyam.gadekar123@gmail.com,
More informationImgSeek: Capturing User s Intent For Internet Image Search
ImgSeek: Capturing User s Intent For Internet Image Search Abstract - Internet image search engines (e.g. Bing Image Search) frequently lean on adjacent text features. It is difficult for them to illustrate
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationarxiv: v1 [cs.db] 6 Jan 2019
Exact Selectivity Computation for Modern In-Memory Database Optimization Jun Hyung Shin University of California Merced jshin33@ucmerced.edu Florin Rusu University of California Merced frusu@ucmerced.edu
More information