Semantic Inversion in XML Keyword Search with General Conditional Random Fields


Shu-Han Wang and Zhi-Hong Deng

Key Laboratory of Machine Perception (Ministry of Education), School of Electronic Engineering and Computer Science, Peking University

Abstract. Keyword search is widely used in information retrieval systems such as search engines. However, input keywords are often so ambiguous that the retrieval intent can hardly be determined explicitly. Inverting keywords into their semantics is therefore meaningful. In this paper, we define the Semantic Inversion problem in XML keyword search and solve it with general Conditional Random Fields. Our algorithm considers different categories of relevance and provides alternative label sequences corresponding to the retrieval keywords. Experimental results show that our algorithm is effective, with precision 12% higher than the baseline.

1 Introduction

As a widely accepted tool, keyword search has been used extensively to search all kinds of databases, such as document corpora, relational databases, and semi-structured databases. However, the main weakness of keyword search is its ambiguity. Given a query that consists only of keywords, it is hard to know what the user really wants to find. For example, suppose a user wants to find a paper published at IJCAI that is written by Pineau and is about some point-based technique. For this information need, a proper keyword query may be "Point based Pineau IJCAI". However, almost all existing work computes results from statistical information about the keywords without understanding the semantics behind them. In fact, the system would understand the query much better if it knew that "Point based" is part of the paper's title (users sometimes hardly remember the whole title, so they type in only part of it), "Pineau" is the author, and "IJCAI" is the title of a proceedings.
As we see, if we can recognize the inner semantics of the user's input, the user's search intent becomes much more explicit.

Recent work has started to help users construct semantic queries. [1] extracts structural semantics from graph data and provides the top-k matching subgraphs for a query. [2] proposes an automatic keyword query reformulation approach that extracts information from the dataset offline and generates the semantics of a query online. [3] presents a system that guides users through a process of incrementally adding semantics to specify their query intention. Pandey and Punera [4] analyze users' search intent by extracting the template structure of search queries.

Corresponding author. X. Lin et al. (Eds.): WISE 2013, Part I, LNCS 8180, pp , © Springer-Verlag Berlin Heidelberg 2013

These works show that deeper semantics with respect to the keywords can greatly help retrieval. Unlike all the methods above, however, our algorithm concentrates on tagging the keywords with labels from the XML database and gives users the top-k label sequences that match the keyword sequence. Since XML labels carry clear semantics, users can confirm what they really need by selecting the proper labels. Our main contributions to keyword search on XML databases are as follows.

Semantic Inversion for Keyword Search. Given a keyword sequence, our algorithm maps it to a label sequence in order to understand the semantics of the keywords. We call this recognition Semantic Inversion. In our algorithm, alternative label sequences are provided after the keywords are typed in. Users select the labels that best match their keywords in order to clarify their retrieval intention. As semantics becomes ever more important in retrieval, Semantic Inversion is a promising way to improve keyword search.

Modeling Semantic Inversion with CRFs. If the Semantic Inversion problem were too hard to solve, it could hardly be useful for retrieval. Fortunately, Semantic Inversion is similar to Part-of-Speech (POS) tagging and other sequential learning problems, so existing models may be useful. Conditional Random Fields (CRFs) have proved effective in sequential learning, and the results of our experiments show that CRFs also solve the Semantic Inversion problem well.

Quantifying Relevance by Weighing Diverse Features. Existing algorithms that compute relations among keywords usually concentrate on one or a few factors (the LCA of keywords, co-occurrence between keywords, etc.). In our algorithm, different categories of relevance (keyword-keyword, label-label, and keyword-label) are weighed jointly to quantify the relevance between a keyword sequence and a label sequence.
As discussed later in this paper, our learning algorithm finds the parameters that best weight the various features. In this paper, we discuss keyword search only in the XML domain.

The rest of the paper is organized as follows. Section 2 presents the definition of Semantic Inversion for keyword search. Section 3 introduces the general CRFs that we apply to our problem. Details of the features and the algorithm are provided in Section 4. Sections 5 and 6 present the experiments and related work. Finally, we conclude in Section 7.

2 Semantic Inversion

In this section, we introduce the concept of Semantic Inversion in the XML domain and explain why Semantic Inversion can improve keyword search.

Definition 1 (Label): Given a set of XML files and a word, a tag is a label of the word if the content of the tag contains the word. A single word may have many probable labels.

For instance, the label of the word "Pineau" is author, and the label of the words "Torran Dubh" (appearing together) can be conflict or caption. In XML files, a label can be regarded as the semantics of its content, so we do not distinguish between semantics and label in the rest of the paper.

Definition 2 (Label Sequence): Given a search keyword sequence S consisting of a sequence of words, the corresponding label sequence is composed of the labels of each word in the keyword sequence. A keyword sequence can therefore be recognized as various probable label sequences, which express diverse semantics. For instance, the label sequence of the word sequence "Point, based, Pineau, IJCAI" can be "title, title, author, booktitle".

Definition 3 (Semantic Inversion): Given a set of XML files and a search keyword sequence $S = \{w_1, w_2, \ldots, w_k\}$, the problem of Semantic Inversion is to find the label sequence(s) $L = \{l_1, l_2, \ldots, l_k\}$ that maximize $Sim(S, L)$, where $Sim(S, L)$ is a function that evaluates the fitness or relevance of S and L. The answer sequences (label sequences) are given in descending order of $Sim(S, L)$. In our algorithm, the CRF model uses the conditional probability $\Pr(y \mid x)$ as the relevance function $Sim(S, L)$.

Semantic Inversion is useful in keyword search in three respects.

Semantic Inversion Can Help the Search Engine Improve Accuracy. A traditional search engine may also recognize the word "Pineau" in the query "Point, based, Pineau, IJCAI" as a person's name, but after Semantic Inversion, "Pineau" can be recognized as an author's name rather than that of a director, a politician, or someone else. As a result, misunderstanding by the search engine is greatly reduced, and search accuracy improves.

Semantic Inversion Can Help Users Prevent Ambiguity. If the words in a query have diverse semantics, Semantic Inversion can provide alternative label sequences.
Since tags in XML are semantic and easy to understand, users can clarify their needs simply by selecting the proper label sequence.

Semantic Inversion Can Reduce the Search Time. Once a label sequence is selected, the search range is greatly narrowed. The search engine only needs to search the pages relevant to the labels, so the search time is greatly reduced.

3 General CRFs

Conditional Random Fields (CRFs) have been widely used in sequential algorithms; in sequential tagging problems especially, CRFs outperform other models. This section briefly reviews CRFs and the general CRFs applied in our algorithm.

3.1 Conditional Random Fields

Assume $x = x_1, x_2, \ldots, x_n$ is the input keyword sequence and $y = y_1, y_2, \ldots, y_n$ is the label (semantic) sequence; x and y have the same length. A CRF (Conditional Random Field) [5] models the conditional probability $\Pr(y \mid x)$ using a Markov random field for the structured y, and finds the best y to maximize $\Pr(y \mid x)$.

For the keyword sequence x and the semantic sequence y, the global feature vector of the CRF is the sum of all the local feature functions:

\[ F(y, x) = \sum_{i=1}^{n} f(y, x, i) \tag{1} \]

The CRF computes the conditional probability with parameter vector w by

\[ \Pr(y \mid x, w) = \frac{e^{w \cdot F(y, x)}}{Z_w(x)} \tag{2} \]

where

\[ Z_w(x) = \sum_{y} e^{w \cdot F(y, x)} \tag{3} \]

For a given keyword sequence, the most probable label sequence maximizes the conditional probability. Since $Z_w(x)$ does not depend on y, we can also write:

\[ \hat{y} = \arg\max_{y} \, w \cdot F(y, x) \tag{4} \]

3.2 Learning Algorithm

Training a CRF means learning the parameter vector w that maximizes the log-likelihood of a given training set $T = \{(x_k, y_k)\}_{k=1}^{N}$. Meanwhile, we penalize the likelihood with a spherical Gaussian weight prior [6] to prevent overfitting. The gradient is therefore:

\[ \nabla L_w = \sum_{k} \left[ F(y_k, x_k) - E_{\Pr(y \mid x_k, w)} F(y, x_k) \right] - \frac{w}{\sigma^2} \tag{5} \]

The learning algorithm seeks the zero of the gradient, in other words $\nabla L_w = 0$.

3.3 Linear-Chain CRFs and General CRFs

Linear-chain CRFs perform well in sequential learning problems such as NP chunking [7], Part-of-Speech tagging [5], Opinion Expression Identification [8], and Named Entity Recognition [9]. For this kind of problem, the Markov field of y is a linear chain, and the transition features are defined only between adjacent $y_i$. In those applications, labels are assumed to be sequential, and linear-chain CRFs concentrate on the dependence between adjacent labels (or adjacent segments, as in Semi-Markov CRFs [10]).

The problems of retrieval, however, are quite different. Several labels appear together to express one subject jointly, so the labels are not sequential. We need to structure y to describe the relevance of all pairs of labels rather than only the adjacent ones. This is a general CRF. In general CRFs, the structure of the labels can be a complete graph. Because of this complexity, general CRFs are not as commonly used as linear-chain CRFs.
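To make Equations (2), (3) and (5) concrete, the following is a minimal sketch that evaluates $\Pr(y \mid x, w)$ by enumerating every candidate label sequence, which is feasible only for short queries. It is not the authors' implementation (which the paper says is written in C++). The names `LEXICON`, `toy_features`, `conditional_probs` and `gradient` are hypothetical, and the two-component feature vector is a stand-in for the features defined in Section 4.

```python
from itertools import product
from math import exp

# Hypothetical toy knowledge: (keyword, label) pairs observed in the data.
LEXICON = {("point", "title"), ("pineau", "author"), ("ijcai", "booktitle")}


def toy_features(x, y):
    """Assumed global feature vector F(y, x) with two components:
    keyword-label matches, and number of label pairs that are equal."""
    emit = sum(1.0 for xi, yi in zip(x, y) if (xi, yi) in LEXICON)
    same = sum(1.0 for i in range(len(y))
               for j in range(i + 1, len(y)) if y[i] == y[j])
    return [emit, same]


def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))


def conditional_probs(x, candidates, w, feature_fn):
    """Eq. (2)-(3): Pr(y | x, w) for every label sequence y.
    candidates[i] lists the probable labels of keyword x[i]."""
    seqs = list(product(*candidates))
    scores = [dot(w, feature_fn(x, y)) for y in seqs]
    top = max(scores)                      # shift scores for numerical stability
    exps = [exp(s - top) for s in scores]
    z = sum(exps)                          # Z_w(x), up to the same shift
    return {y: e / z for y, e in zip(seqs, exps)}


def gradient(data, w, feature_fn, sigma2):
    """Eq. (5): observed minus expected features, minus the Gaussian prior term.
    data is a list of (x, candidates, y_true) triples."""
    grad = [-wi / sigma2 for wi in w]
    for x, candidates, y_true in data:
        probs = conditional_probs(x, candidates, w, feature_fn)
        observed = feature_fn(x, y_true)
        expected = [0.0] * len(w)
        for y, p in probs.items():
            feats = feature_fn(x, y)
            for j in range(len(w)):
                expected[j] += p * feats[j]
        for j in range(len(w)):
            grad[j] += observed[j] - expected[j]
    return grad
```

The most probable sequence of Equation (4) is then `max(probs, key=probs.get)`. Exhaustive enumeration grows exponentially with the query length, which is one reason a complete-graph label structure is more expensive than a linear chain.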

4 The Approach

Semantic Inversion can be solved naively by a random select algorithm or a greedy algorithm. The random select algorithm returns an answer chosen at random from all candidate answers. The greedy algorithm labels each keyword $x_i$ with the label in which $x_i$ appears most frequently. However, both algorithms fail to consider the relevance between labels and the relevance between adjacent keywords. To take all categories of relevance into account, we employ general CRFs to model the relevance and propose an algorithm to solve Semantic Inversion. The algorithm weighs keyword-label relevance, label-label relevance, and adjacent-keyword relevance, and uses gradient descent to learn the best parameters for the model.

4.1 Features

In this section, we concentrate on the features used in the general CRF. We need to extract textual features beforehand to quantify the relevance of keywords and labels. In our algorithm, there are three categories of features for one keyword sequence-label sequence pair (x, y): f for keyword-label relevance, g for label-label relevance, and h and h' for the relevance between adjacent keywords.

Feature $f(x_i, y_i)$ expresses the dependence between the keyword and the label at position i. They appear at the same position, which indicates that we try to recognize the keyword $x_i$ as content of the label $y_i$. How frequently $x_i$ appears under the label $y_i$ should be our first consideration. We measure this kind of dependence as:

\[ f(x_i, y_i) = \frac{p(x_i, y_i)}{p(x_i)} \tag{6} \]

where $p(x_i, y_i)$ is the frequency with which keyword $x_i$ appears in the content of label $y_i$, and $p(x_i)$ denotes the frequency of keyword $x_i$. Since the frequencies of labels can differ from one another, we do not consider label frequency in feature f.

Feature $g(y_i, y_j)$ expresses the relevance of two labels. Existing methods (such as SLCA [11]) measure it mainly based on the tree-like structure of XML files.
We measure this relevance based on co-occurrence. The knowledge base can be seen as a set of instances (people, cities, films, etc.), and labels that often appear together to describe the same instances should be more strongly related:

\[ g(y_i, y_j) = \frac{p(y_i, y_j)}{p(y_i)\,p(y_j)} \tag{7} \]

where $p(y_i, y_j)$ is the frequency with which label $y_i$ and label $y_j$ appear together in one instance, and $p(y_i)$ and $p(y_j)$ denote the frequencies of label $y_i$ and label $y_j$, respectively. In general CRFs, this transition feature is calculated between each pair of labels, which differs from linear-chain CRFs.

Features $h(x_i, x_{i+1}, y_i, y_{i+1})$ and $h'(x_i, x_{i+1}, y_i, y_{i+1})$ measure the relevance of adjacent keywords. Here we also use the co-occurrence of keywords to measure it:

\[ h_0(x_i, x_{i+1}) = \frac{p(x_i, x_{i+1})}{p(x_i)\,p(x_{i+1})} \tag{8} \]

where $p(x_i, x_{i+1})$ denotes the frequency with which keyword $x_i$ and keyword $x_{i+1}$ appear together in the content of one label. Adjacent keywords are likely to express similar semantics. $h(x_i, x_{i+1}, y_i, y_{i+1})$ describes the contribution of recognizing adjacent keywords as the same label:

\[ h(x_i, x_{i+1}, y_i, y_{i+1}) = \begin{cases} h_0(x_i, x_{i+1}), & y_i = y_{i+1} \\ 0, & y_i \neq y_{i+1} \end{cases} \tag{9} \]

If an alternative label sequence inverts adjacent keywords into different labels, we also need to penalize it. $h'(x_i, x_{i+1}, y_i, y_{i+1})$ is the penalty according to the relevance of the keywords:

\[ h'(x_i, x_{i+1}, y_i, y_{i+1}) = \begin{cases} 0, & y_i = y_{i+1} \\ h_0(x_i, x_{i+1}), & y_i \neq y_{i+1} \end{cases} \tag{10} \]

All the features can be extracted quite easily. For Semantic Inversion, the joint information is contained in feature g, and the sequential information is contained in features h and h'. Feature f is the basic and most natural feature for our problem. We put these three sorts of features into the general CRF and then learn the parameters. The total weighted features are

\[ w \cdot F(y, x) = w_1 \sum_{i} f(x_i, y_i) + w_2 \sum_{i \neq j} g(y_i, y_j) + w_3 \sum_{i} h(x_i, x_{i+1}, y_i, y_{i+1}) + w_4 \sum_{i} h'(x_i, x_{i+1}, y_i, y_{i+1}) \tag{11} \]

where the parameter vector $w = [w_1, w_2, w_3, w_4]$ is what we need to learn from the general CRF. We can then use Equations (2) and (3) to calculate the probability of each alternative label sequence.

4.2 Parameter Learning

First, we find all the keywords and labels in the training data, and all probable labels of each keyword. Then we calculate the features and learn the parameters as listed in Algorithm 1.

5 Experiments

In our experiments, the test algorithm gives the 10 most probable label sequences for each keyword sequence.
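The feature definitions in Equations (6) to (11) can be sketched from simple co-occurrence counts. This is an illustrative sketch rather than the authors' code, and it rests on several assumptions that the paper does not spell out: an instance is a dict mapping each label to the list of words in its content, "frequency" is a relative frequency (over label contents for keywords, over instances for labels), and the joint frequency of a label with itself equals its own frequency. `FeatureTable` and `score` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def _joint(single, pair, a, b):
    # Assumption: an item co-occurs with itself as often as it occurs.
    return single[a] if a == b else pair[tuple(sorted((a, b)))]


class FeatureTable:
    def __init__(self, instances):
        self.n_inst = len(instances)   # number of instances
        self.n_fields = 0              # number of (label, content) fields
        self.word, self.word_label, self.word_pair = Counter(), Counter(), Counter()
        self.label, self.label_pair = Counter(), Counter()
        for inst in instances:
            labels = sorted(inst)
            self.label.update(labels)
            self.label_pair.update(combinations(labels, 2))
            for lab, words in inst.items():
                self.n_fields += 1
                uniq = sorted(set(words))
                self.word.update(uniq)
                self.word_label.update((w, lab) for w in uniq)
                self.word_pair.update(combinations(uniq, 2))

    def f(self, x, y):
        """Eq. (6): share of the occurrences of keyword x that fall under label y."""
        return self.word_label[(x, y)] / self.word[x] if self.word[x] else 0.0

    def g(self, a, b):
        """Eq. (7): label-label co-occurrence ratio over instances."""
        denom = self.label[a] * self.label[b]
        joint = _joint(self.label, self.label_pair, a, b)
        return joint * self.n_inst / denom if denom else 0.0

    def h0(self, a, b):
        """Eq. (8): keyword-keyword co-occurrence ratio within one label content."""
        denom = self.word[a] * self.word[b]
        joint = _joint(self.word, self.word_pair, a, b)
        return joint * self.n_fields / denom if denom else 0.0


def score(table, w, x, y):
    """Eq. (11): weighted sum w . F(y, x) with w = [w1, w2, w3, w4]."""
    n = len(x)
    f_sum = sum(table.f(x[i], y[i]) for i in range(n))
    g_sum = sum(table.g(y[i], y[j]) for i in range(n) for j in range(n) if i != j)
    # Eq. (9): h counts only adjacent keywords that receive the same label.
    h_sum = sum(table.h0(x[i], x[i + 1]) for i in range(n - 1) if y[i] == y[i + 1])
    # Eq. (10): h' counts only adjacent keywords that receive different labels.
    hp_sum = sum(table.h0(x[i], x[i + 1]) for i in range(n - 1) if y[i] != y[i + 1])
    return w[0] * f_sum + w[1] * g_sum + w[2] * h_sum + w[3] * hp_sum
```

Because h and h' are kept as separate terms with their own weights, the learner is free to choose how strongly to reward or penalize splitting adjacent keywords across labels.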

Algorithm 1. The Learning Algorithm

Learn(TrainingData = {<x, y>})
    Find all the keywords and the labels in TrainingData, and all probable labels for each keyword.
    Calculate the features f, g, h, h'
    Initialize CRF
    repeat
        Calculate ∇L by Equation (5)
        Modify w by ∇L: w = w + ∇L
    until ‖∇L‖ < threshold
    return CRF
End Learn

5.1 Data Source and Extraction

We use the Wikipedia dataset for our experiments. The Wikipedia dataset contains over 1,000,000 XML documents covering all fields of knowledge. We randomly select 50,000 of these documents, with no field selected in particular.

We extract the keyword sequence-label sequence pairs from the infobox of each XML file. The Wikipedia infobox is an ideal source for our experiments because of its neat attribute-content format. We randomly select attributes as the labels and extract part of the respective content as the keywords. For each label, no more than 4 keywords are selected. The total length of the label sequence is likewise bounded.

5.2 Evaluation

As discussed before, our algorithm learns from the training set $T = \{(x_1, y_1), (x_2, y_2), \ldots\}$, which consists of keyword sequence-label sequence pairs. The algorithm adjusts the four parameters ($w_1$, $w_2$, $w_3$ and $w_4$) and applies them to the test set. We evaluate how closely the answers (label sequences) given by our algorithm resemble the correct label sequence. Here the correct sequence is the sequence of labels whose content actually contains the keywords.

The test set contains only the keyword sequences. For each keyword sequence, we generate all probable label sequences. Label sequences whose feature sum $\sum_i f(x_i, y_i)$ is too low (in other words, the keywords appear too few times in these labels) are discarded. The algorithm grades all the probable label sequences by computing the conditional probability $\Pr(y \mid x)$.
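The test-time procedure described above (generate the probable label sequences, drop those with a low keyword-label feature sum, and grade the rest by conditional probability) can be sketched as follows. This is only an illustration under stated assumptions: `top_k_label_sequences`, `candidate_labels`, `min_f_sum` and the callables `f` and `score` are hypothetical names, the pruning threshold is not given in the paper, and the probabilities are normalized over the sequences that survive pruning.

```python
from itertools import product
from math import exp


def top_k_label_sequences(x, candidate_labels, f, score, k=10, min_f_sum=0.0):
    """Return up to k (label_sequence, probability) pairs, best first.

    x                : tuple of keywords
    candidate_labels : dict keyword -> list of probable labels
    f(xi, yi)        : keyword-label feature, as in Eq. (6)
    score(x, y)      : weighted feature sum w . F(y, x), as in Eq. (11)
    min_f_sum        : assumed pruning threshold on sum_i f(x_i, y_i)
    """
    # Generate every probable label sequence and prune the weak ones.
    seqs = [
        y for y in product(*(candidate_labels[word] for word in x))
        if sum(f(xi, yi) for xi, yi in zip(x, y)) >= min_f_sum
    ]
    if not seqs:
        return []
    # Softmax over the surviving sequences, as in Eq. (2)-(3).
    scores = [score(x, y) for y in seqs]
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    z = sum(exps)
    ranked = sorted(zip(seqs, (e / z for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

A caller would pass `k=10` to mirror the 10 sequences reported per query in the experiments; the ranking itself depends only on `score`, so the normalization matters only when probabilities are shown to the user.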
For each x in the test set S, we concentrate on the best label sequence $\hat{y}$ (the one with the highest conditional probability) and compare it to the correct sequence y. The accuracy is calculated by:

\[ Acc = \frac{\sum_{x \in S} Match(\hat{y}, y)}{\sum_{x \in S} Length(x)} \tag{12} \]

and

\[ Match(\hat{y}, y) = \sum_{i} eq(\hat{y}, y, i) \tag{13} \]

where $eq(\hat{y}, y, i)$ is 1 if the two sequences agree at position i and 0 otherwise.

Another point of interest is the accuracy of the best sequence within the top-N sequences (in our experiments, N = 4, 7, 10). In practice, the search engine should provide users with several alternative results on the first page so that users can choose the best one themselves. Since some input keyword sequences are inherently ambiguous, the Top-N accuracy can sometimes be more convincing. The algorithm sorts the label sequences by conditional probability, and the set TopN(x) contains the top-N sequences. The Top-N accuracy is calculated by:

\[ AccN = \frac{\sum_{x \in S} \max_{y' \in TopN(x)} Match(y', y)}{\sum_{x \in S} Length(x)} \tag{14} \]

On the other hand, we want to know how the algorithm ranks the correct sequence. If the algorithm is effective, the rank of the correct label sequence should be small. We therefore also evaluate the algorithm by how frequently the correct label sequence appears in the Top-N (N = 1, 4, 7, 10) answers (label sequences). This shows whether the correct sequence has been scored highly.

5.3 Results and Discussion

We use cross-validation to evaluate the experiments. All the keyword sequence-label sequence pairs are split into 10 parts. Each time, 9 parts are used for learning and one part for testing. The code is written in C++. The programs are run on a server with 4-core processors and 16 GB of memory.

Accuracy of the First Answer. Our algorithm (73.2%) outperforms the baseline algorithms (38.1% for the Random Select Algorithm and 61.2% for the Greedy Algorithm), which confirms our assumption that fully considering the various categories of relevance improves the quality of Semantic Inversion.

Accuracy of Top-N Answers. Figure (a) shows the accuracy of the Top-N answers, N = 1, 4, 7, 10. This accuracy is calculated by Equation (14). Users can select the best answer from the N answers our algorithm gives. The best accuracy of the Top-10 answers can be over 90%.
That is to say, users can lead the search engine to understand the precise semantics with no more than 10% extra work, which is an encouraging result.

The Rank of the Correct Answer. Figure (b) shows how our algorithm ranks the correct answer. Nearly half of the correct answers are ranked in first place. In over 80% of cases, the correct sequence is ranked within the top 10 and is shown on the first page of the results. In that case, the only thing users need to do is select it.
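The evaluation measures of Equations (12) to (14), together with the rank-based measure, can be sketched as below. This is an illustrative sketch, not the authors' evaluation code. The function names and the `results` layout (a list of pairs holding the ranked predicted label sequences and the correct sequence) are assumptions, and Length(x) is taken to be the length of the correct sequence, which equals the length of the keyword sequence.

```python
def match(pred, gold):
    """Eq. (13): number of positions where the two label sequences agree."""
    return sum(1 for p, g in zip(pred, gold) if p == g)


def top_n_accuracy(results, n=1):
    """Eq. (14); with n=1 this reduces to Eq. (12).

    results: list of (ranked_predictions, gold) where ranked_predictions
    is a list of label sequences sorted by conditional probability.
    """
    matched = 0
    total = 0
    for ranked, gold in results:
        # Best agreement achievable by picking one of the top-n sequences.
        matched += max((match(y, gold) for y in ranked[:n]), default=0)
        total += len(gold)
    return matched / total if total else 0.0


def correct_in_top_n(results, n):
    """Fraction of queries whose fully correct sequence is ranked within the top n."""
    if not results:
        return 0.0
    hits = sum(1 for ranked, gold in results if tuple(gold) in map(tuple, ranked[:n]))
    return hits / len(results)
```

The first measure gives partial credit per position, while the second is all-or-nothing per query, which is why the two figures in the paper report different numbers for the same N.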

Figure: (a) Accuracy of Top-N Answers; (b) The Rank of the Correct Answer

6 Related Work

In this section, we present related work on the use of Conditional Random Fields and on algorithms for keyword search over structured databases.

Semi-Markov Conditional Random Fields [10] split the sequence into several segments. Their goal is to find the segmentation that maximizes the conditional probability defined by the CRF model. Semi-CRFs perform very well in NER problems [12]. For our problem, a Semi-CRF can easily find the phrases and names contained in the input sequence, but it does not fully describe the joint semantics of the labels, because Semi-CRFs are mainly based on linear-chain CRFs.

On structured data, there is work focusing on Keyword Query Reformulation [2]. The reformulated queries provide alternative descriptions of the original input so as to better capture users' information needs and guide users to explore related items in the target structured data. The data are modeled as a heterogeneous graph, and a probabilistic generation model is used for query reformulation. Its aim is to help users state the semantics clearly.

In the field of RDF, recent work provides QUICK [3], a system that helps users construct semantic queries in a given domain. QUICK works with the schema graph and the query templates. Users can conveniently express their search intent by incrementally selecting the semantics and the structure provided by QUICK. Moreover, an online system with a user-friendly interface has been built on QUICK.

7 Conclusions and Future Work

In today's keyword search, good search systems should understand users' intent deeply. How to recognize and represent keyword semantics is becoming more and more important. In XML data, labels are naturally semantic, so recognizing the keywords as XML labels is what we need to concentrate on.
In this paper we define this process as Semantic Inversion and model it with general Conditional Random Fields. Our experiments show that our algorithm can effectively recognize the keywords as labels, and that the top-k label sequences also give users the chance to restate their real search intent. In the future, we want to construct a semantic knowledge base and establish a better XML retrieval system based on Semantic Inversion. We also hope to extend Semantic Inversion to other structured data, such as RDF. If semantics can be simply and accurately inverted into other explicit forms, keyword search will surely improve.

Acknowledgement. This work is partially supported by Project supported by National Natural Science Foundation of China and Project 2009AA01Z136 supported by the National High Technology Research and Development Program of China (863 Program).

References

1. Tran, T., Wang, H., Rudolph, S., Cimiano, P.: Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In: International Conference on Data Engineering - ICDE 2009, pp (2009)
2. Yao, J., Cui, B., Hua, L., Huang, Y.: Keyword Query Reformulation on Structured Data. In: International Conference on Data Engineering, ICDE (2012)
3. Zenz, G., Zhou, X., Minack, E., Siberski, W., Nejdl, W.: From keywords to semantic queries - Incremental query construction on the semantic web. Journal of Web Semantics 7(3), (2009)
4. Pandey, S., Punera, K.: Unsupervised Extraction of Template Structure in Web Search Queries. In: International World Wide Web Conference - WWW (2012)
5. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: International Conference on Machine Learning, ICML 2001, pp (2001)
6. Chen, S.F., Rosenfeld, R.: A Gaussian Prior for Smoothing Maximum Entropy Models. Technical Report CMU-CS , Carnegie Mellon University (1999)
7. Sha, F., Pereira, F.C.N.: Shallow parsing with conditional random fields. In: North American Chapter of the Association for Computational Linguistics, NAACL (2003)
8. Breck, E., Choi, Y., Cardie, C.: Identifying expressions of opinion in context. In: International Joint Conference on Artificial Intelligence, IJCAI, pp (2007)
9. McCallum, A., Li, W.: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. In: Proceedings of the Seventh Conference on Natural Language Learning - CoNLL (2003)
10. Sarawagi, S., Cohen, W.W.: Semi-Markov Conditional Random Fields for Information Extraction. In: Neural Information Processing Systems, NIPS (2004)
11. Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest LCAs in XML databases. In: International Conference on Management of Data - SIGMOD, pp (2005)
12. Okanohara, D., Miyao, Y., Tsuruoka, Y., Tsujii, J.: Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition. In: Meeting of the Association for Computational Linguistics, ACL (2006)


More information

Semi-Supervised Learning of Named Entity Substructure

Semi-Supervised Learning of Named Entity Substructure Semi-Supervised Learning of Named Entity Substructure Alden Timme aotimme@stanford.edu CS229 Final Project Advisor: Richard Socher richard@socher.org Abstract The goal of this project was two-fold: (1)

More information

Efficient Dependency-Guided Named Entity Recognition

Efficient Dependency-Guided Named Entity Recognition Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Efficient Dependency-Guided Named Entity Recognition Zhanming Jie, Aldrian Obaja Muis, Wei Lu Singapore University of

More information

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach

Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Outline Objective Approach Experiment Conclusion and Future work Objective Automatically establish linguistic indexing of pictures

More information

Falcon-AO: Aligning Ontologies with Falcon

Falcon-AO: Aligning Ontologies with Falcon Falcon-AO: Aligning Ontologies with Falcon Ningsheng Jian, Wei Hu, Gong Cheng, Yuzhong Qu Department of Computer Science and Engineering Southeast University Nanjing 210096, P. R. China {nsjian, whu, gcheng,

More information

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C, Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative

More information

Fast and Effective System for Name Entity Recognition on Big Data

Fast and Effective System for Name Entity Recognition on Big Data International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-3, Issue-2 E-ISSN: 2347-2693 Fast and Effective System for Name Entity Recognition on Big Data Jigyasa Nigam

More information

RiMOM Results for OAEI 2008

RiMOM Results for OAEI 2008 RiMOM Results for OAEI 2008 Xiao Zhang 1, Qian Zhong 1, Juanzi Li 1, Jie Tang 1, Guotong Xie 2 and Hanyu Li 2 1 Department of Computer Science and Technology, Tsinghua University, China {zhangxiao,zhongqian,ljz,tangjie}@keg.cs.tsinghua.edu.cn

More information

Linking Entities in Chinese Queries to Knowledge Graph

Linking Entities in Chinese Queries to Knowledge Graph Linking Entities in Chinese Queries to Knowledge Graph Jun Li 1, Jinxian Pan 2, Chen Ye 1, Yong Huang 1, Danlu Wen 1, and Zhichun Wang 1(B) 1 Beijing Normal University, Beijing, China zcwang@bnu.edu.cn

More information

Computationally Efficient M-Estimation of Log-Linear Structure Models

Computationally Efficient M-Estimation of Log-Linear Structure Models Computationally Efficient M-Estimation of Log-Linear Structure Models Noah Smith, Doug Vail, and John Lafferty School of Computer Science Carnegie Mellon University {nasmith,dvail2,lafferty}@cs.cmu.edu

More information

Using Maximum Entropy for Automatic Image Annotation

Using Maximum Entropy for Automatic Image Annotation Using Maximum Entropy for Automatic Image Annotation Jiwoon Jeon and R. Manmatha Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst Amherst, MA-01003.

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

An ICA-Based Multivariate Discretization Algorithm

An ICA-Based Multivariate Discretization Algorithm An ICA-Based Multivariate Discretization Algorithm Ye Kang 1,2, Shanshan Wang 1,2, Xiaoyan Liu 1, Hokyin Lai 1, Huaiqing Wang 1, and Baiqi Miao 2 1 Department of Information Systems, City University of

More information

Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System

Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System Toward Interlinking Asian Resources Effectively: Chinese to Korean Frequency-Based Machine Translation System Eun Ji Kim and Mun Yong Yi (&) Department of Knowledge Service Engineering, KAIST, Daejeon,

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Beyond Bag of Words Bag of Words a document is considered to be an unordered collection of words with no relationships Extending

More information

On maximum spanning DAG algorithms for semantic DAG parsing

On maximum spanning DAG algorithms for semantic DAG parsing On maximum spanning DAG algorithms for semantic DAG parsing Natalie Schluter Department of Computer Science School of Technology, Malmö University Malmö, Sweden natalie.schluter@mah.se Abstract Consideration

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Segment-based Hidden Markov Models for Information Extraction

Segment-based Hidden Markov Models for Information Extraction Segment-based Hidden Markov Models for Information Extraction Zhenmei Gu David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada N2l 3G1 z2gu@uwaterloo.ca Nick Cercone

More information

Topic Diversity Method for Image Re-Ranking

Topic Diversity Method for Image Re-Ranking Topic Diversity Method for Image Re-Ranking D.Ashwini 1, P.Jerlin Jeba 2, D.Vanitha 3 M.E, P.Veeralakshmi M.E., Ph.D 4 1,2 Student, 3 Assistant Professor, 4 Associate Professor 1,2,3,4 Department of Information

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

An Improved DFSA Anti-collision Algorithm Based on the RFID-based Internet of Vehicles

An Improved DFSA Anti-collision Algorithm Based on the RFID-based Internet of Vehicles 2016 2 nd International Conference on Energy, Materials and Manufacturing Engineering (EMME 2016) ISBN: 978-1-60595-441-7 An Improved DFSA Anti-collision Algorithm Based on the RFID-based Internet of Vehicles

More information

TSS: A Hybrid Web Searches

TSS: A Hybrid Web Searches 410 TSS: A Hybrid Web Searches Li-Xin Han 1,2,3, Gui-Hai Chen 3, and Li Xie 3 1 Department of Mathematics, Nanjing University, Nanjing 210093, P.R. China 2 Department of Computer Science and Engineering,

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

Hybrid Quasi-Monte Carlo Method for the Simulation of State Space Models

Hybrid Quasi-Monte Carlo Method for the Simulation of State Space Models The Tenth International Symposium on Operations Research and Its Applications (ISORA 211) Dunhuang, China, August 28 31, 211 Copyright 211 ORSC & APORC, pp. 83 88 Hybrid Quasi-Monte Carlo Method for the

More information

BUPT at TREC 2009: Entity Track

BUPT at TREC 2009: Entity Track BUPT at TREC 2009: Entity Track Zhanyi Wang, Dongxin Liu, Weiran Xu, Guang Chen, Jun Guo Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications, Beijing, China,

More information

Improving Data Access Performance by Reverse Indexing

Improving Data Access Performance by Reverse Indexing Improving Data Access Performance by Reverse Indexing Mary Posonia #1, V.L.Jyothi *2 # Department of Computer Science, Sathyabama University, Chennai, India # Department of Computer Science, Jeppiaar Engineering

More information

A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP

A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP Rini John and Sharvari S. Govilkar Department of Computer Engineering of PIIT Mumbai University, New Panvel, India ABSTRACT Webpages

More information

RSDC 09: Tag Recommendation Using Keywords and Association Rules

RSDC 09: Tag Recommendation Using Keywords and Association Rules RSDC 09: Tag Recommendation Using Keywords and Association Rules Jian Wang, Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University, Bethlehem, PA 18015 USA

More information

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor

More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

A Digital Library Framework for Reusing e-learning Video Documents

A Digital Library Framework for Reusing e-learning Video Documents A Digital Library Framework for Reusing e-learning Video Documents Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, and Fausto Rabitti ISTI-CNR, via G. Moruzzi 1, 56124 Pisa, Italy paolo.bolettieri,fabrizio.falchi,claudio.gennaro,

More information

Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 6, June 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Internet

More information

I Know Your Name: Named Entity Recognition and Structural Parsing

I Know Your Name: Named Entity Recognition and Structural Parsing I Know Your Name: Named Entity Recognition and Structural Parsing David Philipson and Nikil Viswanathan {pdavid2, nikil}@stanford.edu CS224N Fall 2011 Introduction In this project, we explore a Maximum

More information

Handwritten Word Recognition using Conditional Random Fields

Handwritten Word Recognition using Conditional Random Fields Handwritten Word Recognition using Conditional Random Fields Shravya Shetty Harish Srinivasan Sargur Srihari Center of Excellence for Document Analysis and Recognition (CEDAR) Department of Computer Science

More information

Indoor optimal path planning based on Dijkstra Algorithm Yicheng Xu 1,a, Zhigang Wen 1,2,b, Xiaoying Zhang 1

Indoor optimal path planning based on Dijkstra Algorithm Yicheng Xu 1,a, Zhigang Wen 1,2,b, Xiaoying Zhang 1 International Conference on Materials Engineering and Information Technology Applications (MEITA 2015) Indoor optimal path planning based on Dijkstra Algorithm Yicheng Xu 1,a, Zhigang Wen 1,2,b, Xiaoying

More information

A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

More information

Scalable Trigram Backoff Language Models

Scalable Trigram Backoff Language Models Scalable Trigram Backoff Language Models Kristie Seymore Ronald Rosenfeld May 1996 CMU-CS-96-139 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 This material is based upon work

More information

Label Distribution Learning. Wei Han

Label Distribution Learning. Wei Han Label Distribution Learning Wei Han, Big Data Research Center, UESTC Email:wei.hb.han@gmail.com Outline 1. Why label distribution learning? 2. What is label distribution learning? 2.1. Problem Formulation

More information

Boosting Simple Model Selection Cross Validation Regularization

Boosting Simple Model Selection Cross Validation Regularization Boosting: (Linked from class website) Schapire 01 Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 8 th,

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Exam Marco Kuhlmann. This exam consists of three parts:

Exam Marco Kuhlmann. This exam consists of three parts: TDDE09, 729A27 Natural Language Processing (2017) Exam 2017-03-13 Marco Kuhlmann This exam consists of three parts: 1. Part A consists of 5 items, each worth 3 points. These items test your understanding

More information

A probabilistic model to resolve diversity-accuracy challenge of recommendation systems

A probabilistic model to resolve diversity-accuracy challenge of recommendation systems A probabilistic model to resolve diversity-accuracy challenge of recommendation systems AMIN JAVARI MAHDI JALILI 1 Received: 17 Mar 2013 / Revised: 19 May 2014 / Accepted: 30 Jun 2014 Recommendation systems

More information

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian

Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian 4th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2016) Face Recognition Technology Based On Image Processing Chen Xin, Yajuan Li, Zhimin Tian Hebei Engineering and

More information

Webpage Understanding: an Integrated Approach

Webpage Understanding: an Integrated Approach Webpage Understanding: an Integrated Approach Jun Zhu Dept. of Comp. Sci. & Tech. Tsinghua University Beijing, 100084 China jjzhunet9@hotmail.com Bo Zhang Dept. of Comp. Sci. & Tech. Tsinghua University

More information

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia

Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Semantic Annotation of Web Resources Using IdentityRank and Wikipedia Norberto Fernández, José M.Blázquez, Luis Sánchez, and Vicente Luque Telematic Engineering Department. Carlos III University of Madrid

More information

University of Delaware at Diversity Task of Web Track 2010

University of Delaware at Diversity Task of Web Track 2010 University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments

More information

Automatic Domain Partitioning for Multi-Domain Learning

Automatic Domain Partitioning for Multi-Domain Learning Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels

More information

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data American Journal of Applied Sciences (): -, ISSN -99 Science Publications Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data Ibrahiem M.M. El Emary and Ja'far

More information

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base Yi Zeng, Dongsheng Wang, Tielin Zhang, Hao Wang, and Hongwei Hao Institute of Automation, Chinese Academy of Sciences, Beijing,

More information

MEMMs (Log-Linear Tagging Models)

MEMMs (Log-Linear Tagging Models) Chapter 8 MEMMs (Log-Linear Tagging Models) 8.1 Introduction In this chapter we return to the problem of tagging. We previously described hidden Markov models (HMMs) for tagging problems. This chapter

More information

Academic Paper Recommendation Based on Heterogeneous Graph

Academic Paper Recommendation Based on Heterogeneous Graph Academic Paper Recommendation Based on Heterogeneous Graph Linlin Pan, Xinyu Dai, Shujian Huang, and Jiajun Chen National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023,

More information

Feature-level Fusion for Effective Palmprint Authentication

Feature-level Fusion for Effective Palmprint Authentication Feature-level Fusion for Effective Palmprint Authentication Adams Wai-Kin Kong 1, 2 and David Zhang 1 1 Biometric Research Center, Department of Computing The Hong Kong Polytechnic University, Kowloon,

More information

Extracting Relation Descriptors with Conditional Random Fields

Extracting Relation Descriptors with Conditional Random Fields Extracting Relation Descriptors with Conditional Random Fields Yaliang Li, Jing Jiang, Hai Leong Chieu, Kian Ming A. Chai School of Information Systems, Singapore Management University, Singapore DSO National

More information

Diversification of Query Interpretations and Search Results

Diversification of Query Interpretations and Search Results Diversification of Query Interpretations and Search Results Advanced Methods of IR Elena Demidova Materials used in the slides: Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova,

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

A Finite State Mobile Agent Computation Model

A Finite State Mobile Agent Computation Model A Finite State Mobile Agent Computation Model Yong Liu, Congfu Xu, Zhaohui Wu, Weidong Chen, and Yunhe Pan College of Computer Science, Zhejiang University Hangzhou 310027, PR China Abstract In this paper,

More information

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function:

learning stage (Stage 1), CNNH learns approximate hash codes for training images by optimizing the following loss function: 1 Query-adaptive Image Retrieval by Deep Weighted Hashing Jian Zhang and Yuxin Peng arxiv:1612.2541v2 [cs.cv] 9 May 217 Abstract Hashing methods have attracted much attention for large scale image retrieval.

More information

Clinical Named Entity Recognition Method Based on CRF

Clinical Named Entity Recognition Method Based on CRF Clinical Named Entity Recognition Method Based on CRF Yanxu Chen 1, Gang Zhang 1, Haizhou Fang 1, Bin He, and Yi Guan Research Center of Language Technology Harbin Institute of Technology, Harbin, China

More information

Ranking Web Pages by Associating Keywords with Locations

Ranking Web Pages by Associating Keywords with Locations Ranking Web Pages by Associating Keywords with Locations Peiquan Jin, Xiaoxiang Zhang, Qingqing Zhang, Sheng Lin, and Lihua Yue University of Science and Technology of China, 230027, Hefei, China jpq@ustc.edu.cn

More information

CRFs for Image Classification

CRFs for Image Classification CRFs for Image Classification Devi Parikh and Dhruv Batra Carnegie Mellon University Pittsburgh, PA 15213 {dparikh,dbatra}@ece.cmu.edu Abstract We use Conditional Random Fields (CRFs) to classify regions

More information

A Taxonomy of Semi-Supervised Learning Algorithms

A Taxonomy of Semi-Supervised Learning Algorithms A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005 Outline 1 Introduction 2 Generative models 3 Low density separation 4 Graph

More information

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

VIDEO SEARCHING AND BROWSING USING VIEWFINDER VIDEO SEARCHING AND BROWSING USING VIEWFINDER By Dan E. Albertson Dr. Javed Mostafa John Fieber Ph. D. Student Associate Professor Ph. D. Candidate Information Science Information Science Information Science

More information

Multimodal Information Spaces for Content-based Image Retrieval

Multimodal Information Spaces for Content-based Image Retrieval Research Proposal Multimodal Information Spaces for Content-based Image Retrieval Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

More information

A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments

A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments A Partial Curve Matching Method for Automatic Reassembly of 2D Fragments Liangjia Zhu 1, Zongtan Zhou 1, Jingwei Zhang 2,andDewenHu 1 1 Department of Automatic Control, College of Mechatronics and Automation,

More information

Learning Alignments from Latent Space Structures

Learning Alignments from Latent Space Structures Learning Alignments from Latent Space Structures Ieva Kazlauskaite Department of Computer Science University of Bath, UK i.kazlauskaite@bath.ac.uk Carl Henrik Ek Faculty of Engineering University of Bristol,

More information

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings. Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review

More information

ImgSeek: Capturing User s Intent For Internet Image Search

ImgSeek: Capturing User s Intent For Internet Image Search ImgSeek: Capturing User s Intent For Internet Image Search Abstract - Internet image search engines (e.g. Bing Image Search) frequently lean on adjacent text features. It is difficult for them to illustrate

More information

Grounded Compositional Semantics for Finding and Describing Images with Sentences

Grounded Compositional Semantics for Finding and Describing Images with Sentences Grounded Compositional Semantics for Finding and Describing Images with Sentences R. Socher, A. Karpathy, V. Le,D. Manning, A Y. Ng - 2013 Ali Gharaee 1 Alireza Keshavarzi 2 1 Department of Computational

More information

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features Pat Jangyodsuk Department of Computer Science and Engineering The University

More information