Bring Semantic Web to Social Communities


Jie Tang
Dept. of Computer Science, Tsinghua University, China
April 19, 2010

Abstract

Recently, more and more researchers have recognized the importance of knowledge discovery from the data generated on social sites. Promising results have been presented in work along this line. The results are mostly useful for analysts, who use them to better describe the characteristics of social communities and to better understand user behaviours. The real impact of mining social data and of social semantic techniques, however, has not reached the social communities and social users themselves. The people who create the data are not the people who benefit from its analysis. How can we leverage the knowledge discovered from social data to enhance the experience of users in social communities? To diminish misbehaviours? To facilitate social learning and social influence? To improve the quality of user-generated data? Search engines have provided a good example in the context of Web 1.0, by analysing the data generated by a search engine (i.e., search logs) to enhance search quality. How can we generalize this success to Web 2.0, by mining user-generated data to influence the social behaviour of the same group of users? In this paper, we discuss some potentially promising directions for bridging the gap between data analysis and user-centered applications, and for bringing together researchers in the semantic web and social computing. We share our experience, successes, and lessons learned in developing semantic social systems.

1 Introduction

The semantic web aims to bridge computers and human beings by enabling computers to understand the meaning (semantics) of information and services on the web. In essence, the two core tasks in realizing the semantic web are defining/generating semantics and satisfying users' requests based on the semantic information. In the past years, much research has been conducted to this end. However, a prevalent method is to ask users to create the semantics on the web and then to have domain experts define reasoning rules that help locate the information satisfying users' requests. This method has recently been criticized as infeasible. First, people are not willing to create the semantic content. Although some tools have been developed to facilitate the semantic annotation task, the people who create the semantic data are not the people who benefit from it, so there is no motivation for people to create the semantics. Second, the complex reasoning rules defined by the domain experts are also hard to apply. The web is evolving rapidly: a rule may be correct at a specific time but quickly become out of date.

On the other hand, with the emergence and rapid proliferation of social applications and media, such as instant messaging (e.g., IRC, AIM, MSN, Jabber, Skype), sharing sites (e.g., Flickr, Picasa, YouTube, Plaxo), blogs (e.g., Blogger, WordPress, LiveJournal), wikis (e.g., Wikipedia, PBWiki), microblogs (e.g., Twitter, Jaiku), social networks (e.g., MySpace, Facebook, Ning), and collaboration networks (e.g., DBLP), to mention a few, there is little doubt that social networks are becoming a popular research topic, attracting tremendous interest from mathematics, biology, physics, computer science, and sociology. The social web provides an opportunity to obtain user-generated data; for example, users on Twitter now send out more than 50 million tweets per day. At the same time, it also poses several unique challenges. Most existing research has focused on finding the macro-level mechanisms of social influence, such as degree distributions, diameter, clustering coefficient, communities, and the small-world effect [1, 2, 4, 5]. However, these methods provide limited insight into the micro-level dynamics of the social network, such as how an individual user changes his or her behaviors (actions) and how a user's actions influence his or her friends.

In this paper, we discuss the challenges we face and introduce our experiences in developing an academic social network system.

2 Challenges

There are still many challenging issues for the semantic web in social communities.

1. Lack of semantic-based information. The semantic information obtained from users or extracted by heuristics is often incomplete or inconsistent. Users leave some information unfilled merely because they are not willing to provide it.
A challenging problem is how to extract the semantic information from the social web.

2. Integration of semantic data. To integrate semantic data from different sources, one needs not only to find the alignment between the heterogeneous schemas of these data, but also to solve the object resolution problem (an identical instance may have multiple representation forms, and the same representation may refer to multiple meanings). This is also a fundamental problem for the Linked Data vision.

3. Modeling and search of semantic data. The (semantic) web is rather heterogeneous, which gives rise to several challenging issues and makes searching it different from general web search. First, the information-seeking practice [3] is not only about documents, but also about other information sources; in the academic network, for example, these include authors, papers, conferences, and journals. In this spirit, a good search engine should provide support not only for documents, but also for all these information sources. Second, semantic search typically requires much higher retrieval accuracy. Given a query such as "semantic web", a user does not typically mean to find documents containing these two words. Her/his intention is to find documents on the semantic web topic. These two issues are often intertwined. For instance, we need to consider the search accuracy not only of documents, but of the other information sources as well. The problem, then, is how to find a principled way to model the heterogeneous semantic data and how to design a ranking method that searches the semantic data with high accuracy.

4. Social influence analysis. The social web is not only about data/information, but also about users. It is well known that users' actions in a social network are influenced by various factors such as personal interests, social influence, and global trends. However, few studies systematically examine how social actions evolve in a dynamic social network and to what extent the different factors affect user actions. More specifically, how can we quantify the strength of social influence between two users? How can we estimate the model on real, large networks? How can we model the social network structure, user attributes, and users' historical actions so as to predict users' behaviors?

3 Our Experiences in Arnetminer

As a case study, we have developed Arnetminer, a system aiming to provide comprehensive search and mining services for the academic community. As academic information is distributed across the web, we first need to extract the semantic information from it.

[Figure 1: The schema of academic network. Entities: Researcher, Publication, Publication Venue (Journal, Conference), Organization. Relationships: author, coauthor, cite, editor/reviewer, published_at, member_of, is_part_of, chair/pc_member, host_by. Properties shown include address, phone, fax, research_interest, affiliation, position, person_photo, homepage, title, start_page, end_page, date, download_url, publisher, location, description, and the degree fields bsuniv/bsdate/bsmajor, msuniv/msdate/msmajor, phduniv/phddate/phdmajor. Legend: relationship, property, sub_class.]
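As a rough illustration of how a schema like the one in Figure 1 might be held in memory, the following Python sketch stores typed nodes and typed edges. The entity and relationship names come from the figure; the class `AcademicNetwork`, its methods, the choice of which entity types each relationship connects, and the sample identifiers are all assumptions made for this example, not a description of how Arnetminer stores its data.

```python
from collections import defaultdict

# Relationship names follow Figure 1. The (source type, target type) pairs
# are assumptions for this sketch, not taken from the Arnetminer system.
RELATION_TYPES = {
    "author": ("Researcher", "Publication"),
    "coauthor": ("Researcher", "Researcher"),
    "cite": ("Publication", "Publication"),
    "published_at": ("Publication", "PublicationVenue"),
    "member_of": ("Researcher", "Organization"),
}


class AcademicNetwork:
    """A minimal heterogeneous graph: typed nodes joined by typed edges."""

    def __init__(self):
        self.node_type = {}            # node id -> entity type
        self.edges = defaultdict(set)  # (relation, source id) -> target ids

    def add_node(self, node_id, entity_type):
        self.node_type[node_id] = entity_type

    def add_edge(self, relation, source, target):
        # Reject edges whose endpoints do not match the declared schema.
        expected = RELATION_TYPES[relation]
        actual = (self.node_type[source], self.node_type[target])
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.edges[(relation, source)].add(target)

    def neighbors(self, relation, source):
        # Return the targets reachable from `source` over one relation type.
        return sorted(self.edges.get((relation, source), set()))


# Hypothetical sample data, for illustration only.
net = AcademicNetwork()
net.add_node("r1", "Researcher")
net.add_node("p1", "Publication")
net.add_node("p2", "Publication")
net.add_node("v1", "PublicationVenue")
net.add_edge("author", "r1", "p1")
net.add_edge("author", "r1", "p2")
net.add_edge("cite", "p2", "p1")
net.add_edge("published_at", "p1", "v1")

print(net.neighbors("author", "r1"))  # publications linked to r1 by "author"
```

Keeping the relation type in the edge key means a query such as "papers by this researcher" and "papers cited by this paper" use the same lookup, which is one simple way to treat documents and other information sources uniformly.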
We define the data model of the academic network (as shown in Figure 1). Some of the academic data can be extracted from structured data sources, such as the publication information from DBLP, while other data needs to be extracted from unstructured Web pages such as researchers' homepages. We propose a unified approach to extract researcher profiles from researchers' homepages. We integrate the publication data from online databases. We extract the organization information from Wikipedia using regular expressions. Our technical contributions include the unified approach for researcher profiling [10] and the approach for dealing with the disambiguation problem in the integration [14]. The unified approach for researcher profiling is based on a new Conditional Random Field model called Tree-structured Conditional Random Fields (TCRFs) [6] [9].

Researcher profiling. Specifically, the researcher profiling approach consists of three steps: relevant page identification, preprocessing, and tagging. In relevant page identification, given a researcher, we first get a list of web pages from a search engine (we use the Google API) and then identify the homepage or introducing page using a classifier. The classifier achieves an F1-measure of 92.39%. In preprocessing, we separate the text into tokens and assign possible tags to each token. The tokens form the basic units in the following tagging step, and the pages form the sequences of units. In tagging, given a sequence of units, we determine the most likely corresponding sequence of tags by using a trained tagging model. The types of the tags correspond to the profile properties (as shown in Figure 1). As the tagging model, we use Tree-structured Conditional Random Fields (TCRFs) [6]. TCRFs can model dependencies across hierarchically laid-out information. In researcher profile extraction, an identified homepage can be represented as a DOM tree. The root node corresponds to the Web page, a leaf node denotes a word token, and an inner node denotes a coarse information block (e.g., a block containing contact information).
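The tree representation described above can be sketched with Python's standard `html.parser` module. This is only an illustration of the data structure (root = page, inner nodes = blocks, leaves = word tokens); it does not implement TCRFs or the tagging model. The names `Node`, `TokenTreeBuilder`, `build_tree`, and `tokens`, the whitespace tokenization, and the sample HTML are assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so we must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, kind, label, parent=None):
        self.kind = kind      # "page" (root), "block" (inner), or "token" (leaf)
        self.label = label    # tag name for blocks, the word itself for tokens
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TokenTreeBuilder(HTMLParser):
    """Builds a tree whose leaves are word tokens and whose inner nodes are elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("page", "page")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node("block", tag, self.current)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.label != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        # Assumed tokenization: split on whitespace, one leaf per word.
        for word in data.split():
            Node("token", word, self.current)


def build_tree(html):
    builder = TokenTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tokens(node):
    """Collect the leaf tokens in document order."""
    if node.kind == "token":
        return [node.label]
    found = []
    for child in node.children:
        found.extend(tokens(child))
    return found


# Hypothetical homepage fragment, for illustration only.
sample = "<div><p>Phone: 555 0100</p><p>Email: a@example.org</p></div>"
tree = build_tree(sample)
print(tokens(tree))
```

In this sketch each `p` element plays the role of a coarse information block, and the tokens beneath it are the units a tagging model would label with profile properties.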
For parameter estimation, as the graphical structure in TCRFs can be a tree with cycles, exact inference would be expensive. We propose using the Tree-based Reparameterization (TRP) algorithm [13] to perform approximate inference. We evaluate the performance of the proposed approach on 2,000 randomly chosen researchers' homepages. Our approach reaches 86.70% (in terms of F1-measure) on average. We compare our method with several state-of-the-art methods, i.e., a rule-learning-based method (Amilcare), a classification-based method (SVM-based), and linear-chain CRFs. Our approach significantly outperforms the baseline methods for profile extraction (by +3.4% to +33.2%).

Integration. We collect the publication data from online databases including DBLP, the ACM Digital Library, Citeseer, and others. To integrate researcher profiles with the publication data, we use the author name as the identifier. Thus we need to deal with the ambiguity problem. The task of disambiguation is defined as follows: given a person name $a$, we denote all papers having the author name $a$ as $P = \{p_1, p_2, \ldots, p_n\}$. Suppose there exist $k$ actual researchers $\{y_1, y_2, \ldots, y_k\}$ having the name $a$; our task is to assign each of these $n$ papers to its real researcher $y_i$. We propose a probabilistic framework for disambiguation based on Hidden Markov Random Fields (HMRF) [14]. The method effectively improves the performance of disambiguation (+8%) compared with the baseline methods on two real-world data sets.

Heterogeneous academic network. The extracted/integrated data is stored in an academic network base. With the profiling and integration methods, we have already collected 548,504 researcher profiles, 2,858,504 publications, 5,042 conferences, 32,215,473 paper-paper citation relationships, 47,443,857 coauthor relationships, and 14,720,130 paper-published-at relationships. Based on the academic network, services such as expertise search [7], citation network analysis [12], influence analysis [8], topical graph search, and topic browser [11] have been provided. The system has been in operation on the internet for nearly two years and receives a large number of accesses from 180 countries. Feedback from users and system logs indicate that users find the system genuinely helpful for finding and sharing information in the academic community.

4 Conclusion

In this paper, we discuss the opportunities and the challenges of the semantic web in the social web era. We briefly introduce our work on Arnetminer, an academic social networking system, and share our experiences in developing this system. The general problem of the semantic web meeting social communities presents a new and interesting research direction in web science.

References

[1] R. Albert and A. L. Barabasi. Reviews of Modern Physics, 74(1).
[2] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM'99.
[3] M. Hertzum and A. M. Pejtersen. The information-seeking practices of engineers: Searching for documents as well as for people. Information Processing & Management, 36(5).
[4] M. E. J. Newman. The structure and function of complex networks. SIAM Reviews, 45.
[5] S. H. Strogatz. Exploring complex networks. Nature, 410.
[6] J. Tang, M. Hong, J. Li, and B. Liang. Tree-structured conditional random fields for semantic annotation. In ISWC'06.
[7] J. Tang, R. Jin, and J. Zhang. A topic modeling approach and its integration into the random walk framework for academic search. In ICDM'08.
[8] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In KDD'09.
[9] J. Tang, L. Yao, D. Zhang, and J. Zhang. A combination approach to web user profiling. ACM Transactions on Knowledge Discovery from Data, 2010 (to appear).
[10] J. Tang, D. Zhang, and L. Yao. Social network extraction of academic researchers. In ICDM'07.
[11] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD'08.
[12] J. Tang, J. Zhang, J. X. Yu, Z. Yang, K. Cai, R. Ma, L. Zhang, and Z. Su. Topic distributions over links on web. In ICDM'09.
[13] M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree-based reparameterization for approximate estimation on loopy graphs. In Proceedings of the 13th Neural Information Processing Systems (NIPS'01).
[14] D. Zhang, J. Tang, and J. Li. A constraint-based probabilistic framework for disambiguation. In CIKM'07.


More information

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW

Searching in All the Right Places. How Is Information Organized? Chapter 5: Searching for Truth: Locating Information on the WWW Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology Third Edition by Lawrence Snyder Searching in All the Right Places The Obvious and Familiar To find tax

More information

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT

More information

An improved PageRank algorithm for Social Network User s Influence research Peng Wang, Xue Bo*, Huamin Yang, Shuangzi Sun, Songjiang Li

An improved PageRank algorithm for Social Network User s Influence research Peng Wang, Xue Bo*, Huamin Yang, Shuangzi Sun, Songjiang Li 3rd International Conference on Mechatronics and Industrial Informatics (ICMII 2015) An improved PageRank algorithm for Social Network User s Influence research Peng Wang, Xue Bo*, Huamin Yang, Shuangzi

More information

Social Network Mining An Introduction

Social Network Mining An Introduction Social Network Mining An Introduction Jiawei Zhang Assistant Professor Florida State University Big Data A Questionnaire Please raise your hands, if you (1) use Facebook (2) use Instagram (3) use Snapchat

More information

Open Research Online The Open University s repository of research publications and other research outputs

Open Research Online The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs Social Web Communities Conference or Workshop Item How to cite: Alani, Harith; Staab, Steffen and

More information

User Guided Entity Similarity Search Using Meta-Path Selection in Heterogeneous Information Networks

User Guided Entity Similarity Search Using Meta-Path Selection in Heterogeneous Information Networks User Guided Entity Similarity Search Using Meta-Path Selection in Heterogeneous Information Networks Xiao Yu, Yizhou Sun, Brandon Norick, Tiancheng Mao, Jiawei Han Computer Science Department University

More information

Outsourcing Privacy-Preserving Social Networks to a Cloud

Outsourcing Privacy-Preserving Social Networks to a Cloud IEEE INFOCOM 2013, April 14-19, Turin, Italy Outsourcing Privacy-Preserving Social Networks to a Cloud Guojun Wang a, Qin Liu a, Feng Li c, Shuhui Yang d, and Jie Wu b a Central South University, China

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Topic-Level Random Walk through Probabilistic Model

Topic-Level Random Walk through Probabilistic Model Topic-Level Random Walk through Probabilistic Model Zi Yang, Jie Tang, Jing Zhang, Juanzi Li, and Bo Gao Department of Computer Science & Technology, Tsinghua University, China Abstract. In this paper,

More information

Petroleum User Group Meeting, April 2006 Houston, TX. Leveraging Semantic Technology for Improved Enterprise Search and Knowledge Discovery

Petroleum User Group Meeting, April 2006 Houston, TX. Leveraging Semantic Technology for Improved Enterprise Search and Knowledge Discovery Petroleum User Group Meeting, April 2006 Houston, TX Leveraging Semantic Technology for Improved Enterprise Search and Knowledge Discovery Petroleum User Group Meeting, April 2006 Houston, TX OR GIS as

More information

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Query-Sensitive Similarity Measure for Content-Based Image Retrieval Query-Sensitive Similarity Measure for Content-Based Image Retrieval Zhi-Hua Zhou Hong-Bin Dai National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China {zhouzh, daihb}@lamda.nju.edu.cn

More information

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies

CSE 3. How Is Information Organized? Searching in All the Right Places. Design of Hierarchies CSE 3 Comics Updates Shortcut(s)/Tip(s) of the Day Web Proxy Server PrimoPDF How Computers Work Ch 30 Chapter 5: Searching for Truth: Locating Information on the WWW Fluency with Information Technology

More information

Graph Classification in Heterogeneous

Graph Classification in Heterogeneous Title: Graph Classification in Heterogeneous Networks Name: Xiangnan Kong 1, Philip S. Yu 1 Affil./Addr.: Department of Computer Science University of Illinois at Chicago Chicago, IL, USA E-mail: {xkong4,

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR P.SHENBAGAVALLI M.E., Research Scholar, Assistant professor/cse MPNMJ Engineering college Sspshenba2@gmail.com J.SARAVANAKUMAR B.Tech(IT)., PG

More information

Learning to Infer Social Ties in Large Networks

Learning to Infer Social Ties in Large Networks Learning to Infer Social Ties in Large Networks Wenbin Tang, Honglei Zhuang, and Jie Tang Department of Computer Science and Technology, Tsinghua University {tangwb06,honglei.zhuang}@gmail.com, jietang@tsinghua.edu.cn

More information

An Approach To Web Content Mining

An Approach To Web Content Mining An Approach To Web Content Mining Nita Patil, Chhaya Das, Shreya Patanakar, Kshitija Pol Department of Computer Engg. Datta Meghe College of Engineering, Airoli, Navi Mumbai Abstract-With the research

More information

Remotely Sensed Image Processing Service Automatic Composition

Remotely Sensed Image Processing Service Automatic Composition Remotely Sensed Image Processing Service Automatic Composition Xiaoxia Yang Supervised by Qing Zhu State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

FSRM Feedback Algorithm based on Learning Theory

FSRM Feedback Algorithm based on Learning Theory Send Orders for Reprints to reprints@benthamscience.ae The Open Cybernetics & Systemics Journal, 2015, 9, 699-703 699 FSRM Feedback Algorithm based on Learning Theory Open Access Zhang Shui-Li *, Dong

More information

Query Modifications Patterns During Web Searching

Query Modifications Patterns During Web Searching Bernard J. Jansen The Pennsylvania State University jjansen@ist.psu.edu Query Modifications Patterns During Web Searching Amanda Spink Queensland University of Technology ah.spink@qut.edu.au Bhuva Narayan

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

Interaction Model to Predict Subjective Specificity of Search Results

Interaction Model to Predict Subjective Specificity of Search Results Interaction Model to Predict Subjective Specificity of Search Results Kumaripaba Athukorala, Antti Oulasvirta, Dorota Glowacka, Jilles Vreeken, Giulio Jacucci Helsinki Institute for Information Technology

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

Introduction April 27 th 2016

Introduction April 27 th 2016 Social Web Mining Summer Term 2016 1 Introduction April 27 th 2016 Dr. Darko Obradovic Insiders Technologies GmbH Kaiserslautern d.obradovic@insiders-technologies.de Outline for Today 1.1 1.2 1.3 1.4 1.5

More information

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data Xiaorong Yang 1,2, Wensheng Wang 1,2, Qingtian Zeng 3, and Nengfu Xie 1,2 1 Agriculture Information Institute,

More information

Complex networks: A mixture of power-law and Weibull distributions

Complex networks: A mixture of power-law and Weibull distributions Complex networks: A mixture of power-law and Weibull distributions Ke Xu, Liandong Liu, Xiao Liang State Key Laboratory of Software Development Environment Beihang University, Beijing 100191, China Abstract:

More information

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case

More information

Twitter data Analytics using Distributed Computing

Twitter data Analytics using Distributed Computing Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE

More information

Digital News and Social Content. How to revitalize your news content and make it relevant in the digital age

Digital News and Social Content. How to revitalize your news content and make it relevant in the digital age Digital News and Social Content How to revitalize your news content and make it relevant in the digital age Adapting to the Digital World A new format: Provide the facts Package it in sections Add visual

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Competitive Intelligence and Web Mining:

Competitive Intelligence and Web Mining: Competitive Intelligence and Web Mining: Domain Specific Web Spiders American University in Cairo (AUC) CSCE 590: Seminar1 Report Dr. Ahmed Rafea 2 P age Khalid Magdy Salama 3 P age Table of Contents Introduction

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22 Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Junjun Wang 2013/4/22 Outline Introduction Related Word System Overview Subtopic Candidate Mining Subtopic Ranking Results and Discussion

More information

Overview of the Stateof-the-Art. Networks. Evolution of social network studies

Overview of the Stateof-the-Art. Networks. Evolution of social network studies Overview of the Stateof-the-Art in Social Networks INF5370 spring 2014 Evolution of social network studies 1950-1970: mathematical studies of networks formed by the actual human interactions Pandemics,

More information

Web Database Integration

Web Database Integration In Proceedings of the Ph.D Workshop in conjunction with VLDB 06 (VLDB-PhD2006), Seoul, Korea, September 11, 2006 Web Database Integration Wei Liu School of Information Renmin University of China Beijing,

More information

MULTIMEDIA ANALYTICS: SYNERGY BETWEEN HUMAN AND MACHINE BY VISUALIZATION

MULTIMEDIA ANALYTICS: SYNERGY BETWEEN HUMAN AND MACHINE BY VISUALIZATION MULTIMEDIA ANALYTICS: SYNERGY BETWEEN HUMAN AND MACHINE BY VISUALIZATION Marcel Worring, Jan Zahálka, Stevan Rudinac Intelligent Systems Lab Amsterdam Amsterdam Data Science University of Amsterdam Amsterdam

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Comparative Analysis of Range Aggregate Queries In Big Data Environment

Comparative Analysis of Range Aggregate Queries In Big Data Environment Comparative Analysis of Range Aggregate Queries In Big Data Environment Ranjanee S PG Scholar, Dept. of Computer Science and Engineering, Institute of Road and Transport Technology, Erode, TamilNadu, India.

More information

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK 1 Mount Steffi Varish.C, 2 Guru Rama SenthilVel Abstract - Image Mining is a recent trended approach enveloped in

More information

Search Computing: Business Areas, Research and Socio-Economic Challenges

Search Computing: Business Areas, Research and Socio-Economic Challenges Search Computing: Business Areas, Research and Socio-Economic Challenges Yiannis Kompatsiaris, Spiros Nikolopoulos, CERTH--ITI NEM SUMMIT Torino-Italy, 28th September 2011 Media Search Cluster Search Computing

More information

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks

Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks Honglei Zhuang, Jing Zhang 2, George Brova, Jie Tang 2, Hasan Cam 3, Xifeng Yan 4, Jiawei Han University of Illinois at Urbana-Champaign

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK SURVEY ON BIG DATA USING DATA MINING AYUSHI V. RATHOD, PROF. S. S. ASOLE BNCOE,

More information