PANDA: A Platform for Academic Knowledge Discovery and Acquisition
1 PANDA: A Platform for Academic Knowledge Discovery and Acquisition. Zhaoan Dong (1); Jiaheng Lu (2, 1); Tok Wang Ling (3). Affiliations: (1) Renmin University of China; (2) University of Helsinki; (3) National University of Singapore.
2 CONTENT: 1. Motivation and background; 2. Definitions and problem statement; 3. Our hybrid framework; 4. Current system implementation; 5. Related work; 6. Conclusion and future work.
3 1. Motivation and background. Existing popular web-based academic search systems provide literature search and retrieval services through a user-friendly interface. A keyword search returns a long list of paper titles and other textual information. To find the papers they want, users often need to scan the long list and download papers to read one by one, which is time-consuming and costly.
4 1. Motivation and background. PandaSearch: A Fine-grained Academic Search Engine for Research Documents on Computer Science (ICDE 2015).
5 1. Motivation and background. PandaSearch: A Fine-grained Academic Search Engine for Research Documents on Computer Science (ICDE 2015).
6 1. Motivation and background. Knowledge Cells: meaningful information objects within academic documents (e.g., Figures, Tables, Definitions). Example 1: examples of different types of Knowledge Cells (Figure, Table, Definition, Algorithm).
7 1. Motivation and background. Relationships among Knowledge Cells are usually implied or hidden in the sentences of articles. Example 2: (1) K-Medoids is compared with PIL. (2) The K-Medoids algorithm depends on Equation 4. (3) PAM is a kind of K-Medoids algorithm. [Diagram labels: PAM, K-medoids, PIL, Equation 4, DEPD.]
8 1. Motivation and background. Relationships among Knowledge Cells are usually implied or hidden in the sentences of articles. Example 3: (4) HKMed is adapted from LKMed. (5) HKMed is adapted from PAM. From these sentences we can find the relationships among three algorithms: HKMed, LKMed, and PAM. [Diagram labels: LKMed, PAM, HKMed, Equation 4, K-medoids, PIL, VARNT, DEPD.]
9 1. Motivation and background. Example 4: a fragment of an Academic Knowledge Graph. [Diagram: Figure, Definition, Algorithm, Theorem, and Table nodes connected by REF, DEPD, and CMP edges.]
10 1. Motivation and background. The Academic Knowledge Graph can provide: (a) more accurate paper-level results, by improving the ranking of relevant papers for keyword queries; (b) fine-grained search, by looking inside documents to search research data within scientific articles and returning fine-grained information objects rather than only a flat list of paper-level information; (c) deep-level information exploration, such as academic knowledge discovery and academic information exploration for developers.
11 1. Motivation and background. In the future, we want to add an Advanced Search feature to PandaSearch for common users. We can also provide SQL-like APIs for external systems, as demonstrated in the following examples.
12 1. Motivation and background. Example 5: Find the Figures that contain "inverted list" in their captions, together with the papers these Figures come from.
    SELECT p.pid, p.title, k.name, k.content
    FROM papers p, cells k
    WHERE contains(k.name, "inverted list") AND k.type = "figure" AND p.pid = k.pid;
We use non-standard SQL statements to illustrate what the query language looks like. papers and cells can be either relational tables or non-relational data collections; contains can be a built-in function.
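One way to read Example 5 is as a filter followed by a join over two collections. The following is a minimal Python sketch under stated assumptions, not the PANDA implementation: the in-memory lists `papers` and `cells`, their sample rows, and the substring semantics of `contains` are all hypothetical stand-ins for the tables and built-in function named on the slide.

```python
# Hypothetical sample data; field names follow the query in Example 5.
papers = [
    {"pid": 1, "title": "Paper A"},
    {"pid": 2, "title": "Paper B"},
]
cells = [
    {"pid": 1, "type": "figure", "name": "Figure 3: An inverted list index", "content": "fig3.png"},
    {"pid": 1, "type": "table", "name": "Table 1: Inverted list sizes", "content": "tab1.csv"},
    {"pid": 2, "type": "figure", "name": "Figure 1: System overview", "content": "fig1.png"},
]


def contains(text, phrase):
    """Assumed semantics: case-insensitive substring match."""
    return phrase.lower() in text.lower()


def figures_mentioning(phrase):
    """Join cells with papers on pid, keeping figures whose name matches."""
    titles = {p["pid"]: p["title"] for p in papers}
    return [
        (k["pid"], titles[k["pid"]], k["name"], k["content"])
        for k in cells
        if k["type"] == "figure" and contains(k["name"], phrase) and k["pid"] in titles
    ]


example5_rows = figures_mentioning("inverted list")
```

Only the figure from the first paper is returned here; the table with a matching name is excluded by the type condition.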
13 1. Motivation and background. Example 6: Search for algorithms from different papers that are variants of, or have been compared with, an Algorithm whose name is related to the hash join algorithm.
    SELECT k1.pid, k1.name, k2.pid, k2.name
    FROM cells k1, cells k2
    WHERE relations(k1, k2) IN ("CMP", "VARNT") AND contains(k2.name, "hash join")
      AND k1.type = "Algorithm" AND k2.type = "Algorithm" AND k1.pid != k2.pid;
cells can be either a relational table or a non-relational data collection; relations can be a built-in function.
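The self-join in Example 6 can be sketched in Python as a scan over ordered pairs of cells. This is an illustration only: the sample cells, the `cid` identifier, the dictionary `relation_labels`, and the directed lookup performed by `relations` are assumptions introduced here, since the slides do not specify how relationships are stored.

```python
from itertools import permutations

# Hypothetical cells; "cid" is an illustrative cell identifier.
algo_cells = [
    {"cid": "a1", "pid": 1, "type": "Algorithm", "name": "Grace hash join"},
    {"cid": "a2", "pid": 2, "type": "Algorithm", "name": "Sort-merge join"},
    {"cid": "a3", "pid": 3, "type": "Algorithm", "name": "Hybrid hash join"},
    {"cid": "f1", "pid": 2, "type": "Figure", "name": "Hash join cost"},
]

# Hypothetical relationship store: (source cid, target cid) -> label.
relation_labels = {
    ("a2", "a1"): "CMP",    # a2 is compared with a1
    ("a3", "a1"): "VARNT",  # a3 is a variant of a1
    ("f1", "a1"): "REF",
}


def relations(k1, k2):
    """Return the label of the relationship from k1 to k2, or None."""
    return relation_labels.get((k1["cid"], k2["cid"]))


def contains(text, phrase):
    """Assumed semantics: case-insensitive substring match."""
    return phrase.lower() in text.lower()


def related_algorithms(phrase, labels=("CMP", "VARNT")):
    """Pairs (k1, k2) of Algorithm cells from different papers, as in Example 6."""
    return [
        (k1["pid"], k1["name"], k2["pid"], k2["name"])
        for k1, k2 in permutations(algo_cells, 2)
        if relations(k1, k2) in labels
        and contains(k2["name"], phrase)
        and k1["type"] == k2["type"] == "Algorithm"
        and k1["pid"] != k2["pid"]
    ]


example6_rows = related_algorithms("hash join")
```

The Figure cell is related to the hash join algorithm by REF, so it is filtered out by the label condition.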
14 1. Motivation and background. Objectives and challenges: (1) Correctly identify and extract the contents of each Knowledge Cell. PDFs lack sufficient structural information, and journals published in different years use diverse layouts. (2) Extract the attributes, key phrases, and contexts of the Knowledge Cells. The captions of Figures, the specifications of Algorithms, etc. are hard for computers to understand. (3) Identify and extract the various relationships between Knowledge Cells. These relationships are usually implied in the text, rare, or invisible; some even require expertise to be recognized.
15 1. Motivation and background. Example 8: The layouts of Knowledge Cells vary with the format of different documents, conferences, and journals.
16 1. Motivation and background. Example 9: Even within one document, the layouts differ. There are at least three different layouts among 11 logical objects, including one Table and ten Figures.
17 1. Motivation and background. Example 10: [Figure labels: Text, Number, Formula; missing Caption.] Missing information makes some attributes of the Knowledge Cells null. Information overload makes it hard to extract the attributes and relationships.
18 1. Motivation and background. To overcome these challenges, we propose a hybrid framework combining the accuracy of human workers with the speed of computer algorithms. Automatic computer algorithms: low cost and fast, but hard to extend to diverse journals and layouts as the number of scientific publications increases. Human workers in crowdsourcing: higher accuracy and performance, but crowdsourcing is expensive in time and money. The cooperation of humans and machines can help researchers solve large-scale complex problems more efficiently.
19 2. Definitions and problem statement. The definition of Knowledge Cell. Definition 1: A Knowledge Cell is a meaningful information object within an academic document. Each Knowledge Cell has attributes including an identifier, a paper identifier, a type, a name, content, key phrases, and so on. More generally, papers can also be regarded as a special kind of Knowledge Cell, with attributes such as paper identifier (e.g., pid), title, authors, pages, conference or journal, date, references, etc.
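Definition 1 maps naturally onto a small record type. The sketch below is one possible representation, not the schema used by PANDA: the field names and the choice of `None` for attributes that have not yet been extracted are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeCell:
    """Attributes listed in Definition 1; field names are illustrative."""
    cid: str                       # identifier of the cell
    pid: str                       # identifier of the containing paper
    type: str                      # e.g. "Figure", "Table", "Algorithm"
    name: Optional[str] = None     # e.g. the caption; None when missing
    content: Optional[str] = None  # e.g. a path to the cropped region
    key_phrases: list = field(default_factory=list)

    def missing_attributes(self):
        """Names of attributes still to be supplied by an algorithm or a worker."""
        return [n for n in ("name", "content") if getattr(self, n) is None]


# A figure whose caption could not be extracted.
cell = KnowledgeCell(cid="c1", pid="p1", type="Figure", content="fig1.png")
```

Here `cell.missing_attributes()` reports the caption as the only attribute still to be filled in.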
20 2. Definitions and problem statement. The definition of Academic Knowledge Graph. Definition 2: An Academic Knowledge Graph is a directed graph $AKG = (K, R)$, where $K$ is the set of Knowledge Cells extracted from a collection of academic documents and $R = \{ (k_1, k_2, r) \mid k_1, k_2 \in K;\ k_1 \neq k_2;\ r \text{ is the relationship between } k_1 \text{ and } k_2 \}$. Note that $k_1$ and $k_2$ are two Knowledge Cells from either one PDF file or two different files.
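The graph of Definition 2 can be held as a set of cells plus a set of labeled triples. The class below is a minimal sketch, assuming that the two cells of a relationship must be distinct and that cells are identified by plain strings. Reading "compared with" as CMP, "depends on" as DEPD, and "adapted from" as VARNT is an interpretation of the labels shown in Examples 2 and 3, not something the slides state explicitly.

```python
class AcademicKnowledgeGraph:
    """Directed graph AKG = (K, R) with labeled edges (k1, k2, r)."""

    def __init__(self):
        self.cells = set()       # K: identifiers of Knowledge Cells
        self.relations = set()   # R: triples (k1, k2, r)

    def add_cell(self, k):
        self.cells.add(k)

    def add_relation(self, k1, k2, r):
        # Assumed constraints: both ends belong to K and are distinct.
        if k1 not in self.cells or k2 not in self.cells:
            raise KeyError("both cells must belong to K")
        if k1 == k2:
            raise ValueError("a relationship needs two distinct cells")
        self.relations.add((k1, k2, r))

    def neighbors(self, k, label=None):
        """Cells reachable from k by one outgoing edge, optionally by label."""
        return sorted(
            k2 for (k1, k2, r) in self.relations
            if k1 == k and (label is None or r == label)
        )


akg = AcademicKnowledgeGraph()
for name in ("K-Medoids", "PIL", "PAM", "HKMed", "LKMed", "Equation 4"):
    akg.add_cell(name)
akg.add_relation("K-Medoids", "PIL", "CMP")
akg.add_relation("K-Medoids", "Equation 4", "DEPD")
akg.add_relation("HKMed", "LKMed", "VARNT")
akg.add_relation("HKMed", "PAM", "VARNT")
```

Storing relationships as triples keeps the structure close to the definition and makes label-based lookups, such as `akg.neighbors("HKMed", "VARNT")`, a simple filter.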
21 2. Definitions and problem statement. We obtain a more general Academic Knowledge Graph (GAKG), a hypergraph, if it also contains the relationships between: each paper and its citations; each paper and the Knowledge Cells within it; and the Knowledge Cells themselves. [Diagram: a fragment of a general Academic Knowledge Graph, with Figure, Definition, Algorithm, Theorem, and Table nodes and CITE edges.]
22 2. Definitions and problem statement. Problem statement: The problem of academic knowledge discovery and acquisition can be modeled as a crowd-sourced database problem, in which scholarly papers, Knowledge Cells, and their relationships are represented as rows or records with some missing attributes. Our objective is to identify and extract these attributes, using either automatic algorithms or anonymous human workers, so that they can be queried later. We focus on how to design hybrid workflows that combine automatic algorithms and crowdsourced tasks efficiently and effectively.
23 3. Our hybrid framework. A generic framework for knowledge discovery and acquisition from PDF documents. [Diagram: the hybrid workflows. PDF pages enter the automated extracting algorithm; high-confidence results become confirmed Knowledge Cells, while low-confidence results become HIT candidates, from which HITs are generated for crowdsourcing. The diagram also carries the label "Crowd training".]
24 3. Our hybrid framework. Our hybrid workflows can be regarded as a multi-stage process. (1) Preprocessing stage. Metadata of papers (title, authors, publication date, page number, etc.) can be harvested from public websites beforehand. Format conversion: PDF documents to text files; PDF pages to JPEG/PNG images. Page filtering by rule-based filters: PDF pages that obviously do not contain the target objects to be extracted should be filtered out.
25 3. Our hybrid framework. (2) Extracting academic knowledge using automatic algorithms. Heuristic methods and machine learning algorithms are employed to: locate the area of each Knowledge Cell; analyze the text and extract the attributes, contexts, and key phrases of each Knowledge Cell; provide a confidence estimate of how accurate and reliable an identified result is likely to be; and adjust the confidence filtering threshold dynamically, taking into account time cost, result quality, and the crowdsourcing budget.
26 3. Our hybrid framework. (3) Crowdsourcing task design. Results with a high confidence value are retained. Otherwise, the current page is passed to the crowdsourcing layer as a Human Intelligence Task Candidate (HITC). Human Intelligence Tasks (HITs) for extracting certain Knowledge Cells or information are then designed and generated. A web-based, task-oriented crowdsourcing system supports four kinds of tasks: identifying tasks, reviewing tasks, tutorial tasks, and test tasks.
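The routing rule described in stages (2) and (3) can be sketched as a threshold test on the confidence estimate. The function names, the `(page_id, confidence)` pair format, and the budget-only threshold rule below are assumptions for illustration; the slides say that the threshold also depends on time cost and result quality, which this sketch ignores.

```python
def route_results(results, threshold):
    """Split extraction results into confirmed pages and HIT candidates.

    `results` is a list of (page_id, confidence) pairs, confidence in [0, 1].
    """
    confirmed, hit_candidates = [], []
    for page_id, confidence in results:
        if confidence >= threshold:
            confirmed.append(page_id)       # retained as a confirmed result
        else:
            hit_candidates.append(page_id)  # passed to the crowdsourcing layer
    return confirmed, hit_candidates


def threshold_for_budget(results, max_hits):
    """Highest threshold that sends at most `max_hits` pages to the crowd."""
    scores = sorted(confidence for _, confidence in results)
    if max_hits >= len(scores):
        return float("inf")  # the budget covers every page
    return scores[max_hits]


extraction_results = [("p1", 0.95), ("p2", 0.40), ("p3", 0.80), ("p4", 0.55)]
confirmed, hit_candidates = route_results(extraction_results, threshold=0.6)
```

With a budget of one HIT, `threshold_for_budget(extraction_results, 1)` selects a threshold at which only the lowest-confidence page is sent to human workers.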
27 3. Our hybrid framework. (4) Crowdsourcing process management and cost model. Answer aggregation and quality control: majority vote, etc.; a tutorial module; a test module. A crowdsourcing cost model: how to achieve higher quality with a fixed budget; how to reduce the overall cost under quality constraints. User management module: registration; ranking and reputation.
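Majority vote, the aggregation method named above, can be sketched in a few lines. The `min_agreement` parameter and the choice to return `None` when no answer has a strict majority are assumptions added here; the slides do not describe how ties or weak agreement are handled.

```python
from collections import Counter


def majority_vote(answers, min_agreement=0.5):
    """Return the most common answer if its share exceeds min_agreement, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_agreement else None


# Three workers label whether a candidate region is a figure.
vote = majority_vote(["figure", "figure", "not a figure"])
```

A page whose vote comes back as `None` could, for instance, be reissued to more workers, although that policy is not specified in the source.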
28 4. Current system implementation. Platform for Academic Knowledge Discovery and Acquisition (PANDA). PANDA serves as a data provider for PandaSearch. [Diagram labels: Internet, PDFs, Crowds, PANDA, Academic Knowledge Base, PandaSearch, Query, Result, User.]
29 4. Current system implementation. The system architecture of PANDA.
30 4. Current system implementation. (1) Data Storage. 2.9 million PDF documents in computer science. We currently focus on the extraction of Figures. Statistics of current data stores (Data Type: Number): Papers: [count missing]; Figures: [count missing]; Definitions: 1939; Lemmas: 757; Theorems: 726; Algorithms: 671; Propositions: 52; Examples: 1038. So far, we have extracted Figures from 5000 papers, including nearly 4000 SIGMOD papers published from 1980 to [end year missing]. The number of Figures is therefore far smaller than the number of papers. This is why we want to develop PANDA: to process the remaining papers, whose number is still increasing.
31 4. Current system implementation. (2) Algorithmic Layer. We have built an algorithm that uses rule-based and machine learning methods to extract Figures automatically: 1. Split the PDF document into pages. 2. Convert the PDF file into a standard text file format. 3. Filter out the pages that obviously do not contain figures. 4. Locate the boundary of each figure's content area with a detector (PDFBox and libsvm are used). 5. Crop the figure's content with an Extractor or a Cropper according to the position information.
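Step 3 of the pipeline, rule-based page filtering, might look like the sketch below. The rule itself (keep a page only if some line of its extracted text starts with a caption-like "Figure N" or "Fig. N") is a hypothetical example chosen for illustration; the slides do not state which rules PANDA applies.

```python
import re

# Assumed rule: a figure page has a line that starts like a caption.
CAPTION_PATTERN = re.compile(r"^\s*(?:Figure|Fig\.)\s*\d+", re.IGNORECASE | re.MULTILINE)


def may_contain_figure(page_text):
    """True if the page text has a line beginning with a figure caption label."""
    return CAPTION_PATTERN.search(page_text) is not None


def filter_pages(pages):
    """Return the 1-based numbers of pages that pass the caption rule."""
    return [i for i, text in enumerate(pages, start=1) if may_contain_figure(text)]


sample_pages = [
    "Abstract\nWe study joins.",
    "Some text\nFigure 2: Query plan\nmore text",
    "as shown in Fig. 3 the cost grows",
]
kept_pages = filter_pages(sample_pages)
```

The third sample page only mentions a figure in running text, so this particular rule drops it. Noisy text from older PDFs could defeat such a rule, which is one reason low-confidence pages are handed to human workers.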
32 4. Current system implementation. We performed an initial experiment on extracting Figures from nearly 4,000 SIGMOD papers published from 1980 to [end year missing]. We use Completeness and Purity, in addition to Precision, Recall, and F-Measure, to evaluate the results of the boundary detector. Complete: the result region includes all parts of the Knowledge Cell's content. Pure: the result region does not contain anything that does not belong to the Knowledge Cell. A correctly identified component of a Knowledge Cell is therefore both complete and pure.
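Completeness and purity can be approximated with bounding boxes. The sketch below assumes that each region is an axis-aligned box `(x0, y0, x1, y1)`, that the ground-truth box contains exactly the content of the Knowledge Cell, and that a small pixel tolerance `tol` is acceptable. These are simplifications introduced here, not the evaluation procedure of the original experiment.

```python
def contains_box(outer, inner, tol=0.0):
    """True if box `inner` lies within box `outer`, up to a tolerance."""
    return (outer[0] - tol <= inner[0] and outer[1] - tol <= inner[1]
            and outer[2] + tol >= inner[2] and outer[3] + tol >= inner[3])


def is_complete(detected, truth, tol=0.0):
    """The detected region covers every part of the true region."""
    return contains_box(detected, truth, tol)


def is_pure(detected, truth, tol=0.0):
    """The detected region covers nothing outside the true region."""
    return contains_box(truth, detected, tol)


def is_correct(detected, truth, tol=0.0):
    return is_complete(detected, truth, tol) and is_pure(detected, truth, tol)


def precision_recall_f(num_correct, num_detected, num_truth):
    """Precision, recall, and F-measure from counts of correct detections."""
    precision = num_correct / num_detected if num_detected else 0.0
    recall = num_correct / num_truth if num_truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


truth_box = (10, 10, 100, 100)
cut_off = (30, 10, 100, 100)    # misses the left part: pure but not complete
too_large = (10, 10, 100, 150)  # covers extra text: complete but not pure
```

The two sample detections correspond to the two failure modes named above: a region that cuts off part of the figure, and a region that includes surrounding text.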
33 4. Current system implementation. Example 11: The identified results on the left page are not correct, since the first one discards the left part and the second one covers too much text.
34 4. Current system implementation. Preliminary experimental results of the current algorithms for Figure extraction. [Chart: Recall, Precision, and F-Measure.] The chart shows that performance on papers from 1980 to 1989 is lower than on those of later years.
35 4. Current system implementation. Example 12: PDF pages from earlier years. The lower performance arises because PDF files from earlier years usually have low quality or resolution. The extracted text usually contains various kinds of noise from the character recognition process, e.g., typos. This may affect the discovery and location of some Knowledge Cells.
36 4. Current system implementation. (3) Crowdsourcing Layer. An example of the web-based interfaces for extracting Figures.
37 4. Current system implementation. (4) Crowds/human workers. Who might contribute to the crowdsourced tasks? Common users, authors, student volunteers. Should tasks be published on Mechanical Turk or CrowdFlower? How can human workers be motivated and retained? Games? Award points? reCAPTCHA?
38 5. Related work. Increasing attention has been devoted to the extraction and management of research data within scientific literature. Digital Curation (DC) is the selection, preservation, maintenance, collection, and archiving of digital assets; it establishes, maintains, and adds value to repositories of digital data for present and future use. Deep Indexing (DI) indexes the research data within articles that are invisible to traditional bibliographic searches. Deep Indexing is now available in ProQuest, CiteSeerX, ScienceDirect, etc.
39 5. Related work. Figures and tables are also displayed when the paper they come from is returned as a search result. In CiteSeer, users can search tables by entering keywords.
40 5. Related work. However: The extraction and management of each kind of Knowledge Cell is independent. Querying and displaying them depends on querying academic papers, not on the attributes of the Knowledge Cells themselves. No published work focuses on the relationships among the various kinds of Knowledge Cells. No related work uses these relationships to build an Academic Knowledge Graph as we propose.
41 5. Related work. Automatic Information Extraction. A number of methods, techniques, and tools have been employed to analyze the structure of PDFs and identify different layout blocks within them. J. Hu and Y. Liu, "Analysis of Documents Born Digital," in Handbook of Document Image Processing and Recognition, Springer London, 2014. S. Klampfl et al., "Unsupervised document structure analysis of digital scientific articles," International Journal on Digital Libraries 14.3 (2014). J. Wu, K. Williams, H. Chen, M. Khabsa, C. Caragea, A. Ororbia, D. Jordan, and C. L. Giles, "CiteSeerX: AI in a digital library search engine," in AAAI, 2014. Most of them focus on the structure analysis of PDF documents to identify and extract the content of Figures and Tables. We want to extend them to the extraction of other kinds of Knowledge Cells and their attributes.
42 5. Related work. Task-Oriented Crowdsourcing. C. Lofi and K. E. Maarry, "Design patterns for hybrid algorithmic crowdsourcing workflows," in CBI, 2014. N. Luz, N. Silva, and P. Novais, "A survey of task-oriented crowdsourcing," Artificial Intelligence Review, pp. 1-27. N. Luz, N. Silva, and P. Novais, "Generating human-computer microtask workflows from domain ontologies," in Human-Computer Interaction. Theories, Methods, and Tools, Springer, 2014. E. Kamar, S. Hacker, and E. Horvitz, "Combining human and machine intelligence in large-scale crowdsourcing," in AAMAS, 2012. S. K. Kondreddi, P. Triantafillou, and G. Weikum, "Combining information extraction and human computing for crowdsourced knowledge acquisition," in ICDE, 2014. There is no related work on academic knowledge discovery and acquisition using crowdsourcing methods.
43 6. Conclusion and future work. The objective of this research is to identify and extract academic knowledge using a hybrid framework that integrates the accuracy of human workers with the speed of algorithms. Contributions of this paper: (1) We stated the problem of academic knowledge discovery and acquisition as a crowd-sourced database problem, based on the definitions of Knowledge Cells and the Academic Knowledge Graph. (2) We proposed a hybrid framework integrating the accuracy of human workers and the speed of automatic algorithms. (3) We designed a web-based crowdsourcing module for Figure extraction, with some preliminary results.
44 6. Conclusion and future work. Much work remains: Improving the feasibility of the crowdsourcing interfaces and optimizing the design of HITs. Making the algorithms confidence-aware and able to interact iteratively with the crowdsourcing modules: strategies for switching tasks; optimization of the algorithms using human contributions; trade-off considerations. Extending the framework to identify and extract various attributes and information of Knowledge Cells, since different Knowledge Cells have different features.
45 Thank you!
More informationLeveraging Set Relations in Exact Set Similarity Join
Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,
More informationis an electronic document that is both user friendly and library friendly
is an electronic document that is both user friendly and library friendly is easy to read and to navigate it has bookmarks and an interactive table-of-contents is practical to consult and arouses more
More informationEXTRACTION OF RELEVANT WEB PAGES USING DATA MINING
Chapter 3 EXTRACTION OF RELEVANT WEB PAGES USING DATA MINING 3.1 INTRODUCTION Generally web pages are retrieved with the help of search engines which deploy crawlers for downloading purpose. Given a query,
More informationTools and Infrastructure for Supporting Enterprise Knowledge Graphs
Tools and Infrastructure for Supporting Enterprise Knowledge Graphs Sumit Bhatia, Nidhi Rajshree, Anshu Jain, and Nitish Aggarwal IBM Research sumitbhatia@in.ibm.com, {nidhi.rajshree,anshu.n.jain}@us.ibm.com,nitish.aggarwal@ibm.com
More informationAlberto Messina, Maurizio Montagnuolo
A Generalised Cross-Modal Clustering Method Applied to Multimedia News Semantic Indexing and Retrieval Alberto Messina, Maurizio Montagnuolo RAI Centre for Research and Technological Innovation Madrid,
More informationIMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING
IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING 1 SONALI SONKUSARE, 2 JAYESH SURANA 1,2 Information Technology, R.G.P.V., Bhopal Shri Vaishnav Institute
More informationUsing JSTOR. September 2014
Using JSTOR September 2014 Presentation Agenda 1. What is JSTOR? 2. JSTOR demonstration Searching JSTOR Format of the journal content Using a MyJSTOR account to organize research Linking to content on
More informationResearch Article Apriori Association Rule Algorithms using VMware Environment
Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,
More informationRanking Algorithms For Digital Forensic String Search Hits
DIGITAL FORENSIC RESEARCH CONFERENCE Ranking Algorithms For Digital Forensic String Search Hits By Nicole Beebe and Lishu Liu Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver,
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationDesign and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch
619 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 51, 2016 Guest Editors: Tichun Wang, Hongyang Zhang, Lei Tian Copyright 2016, AIDIC Servizi S.r.l., ISBN 978-88-95608-43-3; ISSN 2283-9216 The
More informationSCOPUS. Scuola di Dottorato di Ricerca in Bioscienze e Biotecnologie. Polo bibliotecario di Scienze, Farmacologia e Scienze Farmaceutiche
SCOPUS COMPARISON OF JOURNALS INDEXED BY WEB OF SCIENCE AND SCOPUS (May 2012) from: JISC Academic Database Assessment Tool SCOPUS: A FEW NUMBERS (November 2012)
More informationPlagiarism Detection Using FP-Growth Algorithm
Northeastern University NLP Project Report Plagiarism Detection Using FP-Growth Algorithm Varun Nandu (nandu.v@husky.neu.edu) Suraj Nair (nair.sur@husky.neu.edu) Supervised by Dr. Lu Wang December 10,
More informationLearning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li
Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,
More informationOPTIMIZED METHOD FOR INDEXING THE HIDDEN WEB DATA
International Journal of Information Technology and Knowledge Management July-December 2011, Volume 4, No. 2, pp. 673-678 OPTIMIZED METHOD FOR INDEXING THE HIDDEN WEB DATA Priyanka Gupta 1, Komal Bhatia
More informationArchives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment
Archives in a Networked Information Society: The Problem of Sustainability in the Digital Information Environment Shigeo Sugimoto Research Center for Knowledge Communities Graduate School of Library, Information
More informationParticular experience in design and implementation of a Current Research Information System in Russia: national specificity
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 00 (2014) 000 000 www.elsevier.com/locate/procedia CRIS 2014 Particular experience in design and implementation of a Current
More informationData Hiding on Text Using Big-5 Code
Data Hiding on Text Using Big-5 Code Jun-Chou Chuang 1 and Yu-Chen Hu 2 1 Department of Computer Science and Communication Engineering Providence University 200 Chung-Chi Rd., Shalu, Taichung 43301, Republic
More informationContent Based Smart Crawler For Efficiently Harvesting Deep Web Interface
Content Based Smart Crawler For Efficiently Harvesting Deep Web Interface Prof. T.P.Aher(ME), Ms.Rupal R.Boob, Ms.Saburi V.Dhole, Ms.Dipika B.Avhad, Ms.Suvarna S.Burkul 1 Assistant Professor, Computer
More informationTISA Methodology Threat Intelligence Scoring and Analysis
TISA Methodology Threat Intelligence Scoring and Analysis Contents Introduction 2 Defining the Problem 2 The Use of Machine Learning for Intelligence Analysis 3 TISA Text Analysis and Feature Extraction
More informationData Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.
Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan
More informationRanking Web Pages by Associating Keywords with Locations
Ranking Web Pages by Associating Keywords with Locations Peiquan Jin, Xiaoxiang Zhang, Qingqing Zhang, Sheng Lin, and Lihua Yue University of Science and Technology of China, 230027, Hefei, China jpq@ustc.edu.cn
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationKeywords: Extraction, Training, Classification 1. INTRODUCTION 2. EXISTING SYSTEMS
ISSN XXXX XXXX 2017 IJESC Research Article Volume 7 Issue No.5 Forex Detection using Neural Networks in Image Processing Aditya Shettigar 1, Priyank Singal 2 BE Student 1, 2 Department of Computer Engineering
More informationResearch and Application of E-Commerce Recommendation System Based on Association Rules Algorithm
Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm Qingting Zhu 1*, Haifeng Lu 2 and Xinliang Xu 3 1 School of Computer Science and Software Engineering,
More informationTIPS FOR USING GOOGLE
TIPS FOR USING GOOGLE GOOGLE SEARCH TIPS There are numerous general Web search engines, such as Bing, Yahoo, and Ask. Probably the most well known engine is Google, in part because it is more sophisticated
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationEfficient Algorithm for Frequent Itemset Generation in Big Data
Efficient Algorithm for Frequent Itemset Generation in Big Data Anbumalar Smilin V, Siddique Ibrahim S.P, Dr.M.Sivabalakrishnan P.G. Student, Department of Computer Science and Engineering, Kumaraguru
More informationScientific databases
SCID 305 : Generic Skills in Science Research Scientific databases Suang Udomvaraphunt Academic IT Stang Monkolsuk library and Information Division Faculty of Science Stang Mongkolsuk Library http://stang.sc.mahidol.ac.th
More informationAn Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees
An Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees N. Murugesan 1 and R.Santhosh 2 1 PG Scholar, 2 Assistant Professor, Department of
More informationIntegrating Text Mining with Image Processing
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727 PP 01-05 www.iosrjournals.org Integrating Text Mining with Image Processing Anjali Sahu 1, Pradnya Chavan 2, Dr. Suhasini
More informationScienceDirect. Goes beyond search to research. Genevieve Musasa - Customer Consultant Africa
1 ScienceDirect Goes beyond search to research Genevieve Musasa - Customer Consultant Africa G.musasa@elsevier.com Stefan Blanché -Your Account Manager S.Blanche@elsevier.com April 2015 www.elsevierafrica.com
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationSimultaneous Perturbation Stochastic Approximation Algorithm Combined with Neural Network and Fuzzy Simulation
.--- Simultaneous Perturbation Stochastic Approximation Algorithm Combined with Neural Networ and Fuzzy Simulation Abstract - - - - Keywords: Many optimization problems contain fuzzy information. Possibility
More informationInformation Management (IM)
1 2 3 4 5 6 7 8 9 Information Management (IM) Information Management (IM) is primarily concerned with the capture, digitization, representation, organization, transformation, and presentation of information;
More informationAN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES
Journal of Defense Resources Management No. 1 (1) / 2010 AN OVERVIEW OF SEARCHING AND DISCOVERING Cezar VASILESCU Regional Department of Defense Resources Management Studies Abstract: The Internet becomes
More informationEfficiently Mining Positive Correlation Rules
Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 39S-44S Efficiently Mining Positive Correlation Rules Zhongmei Zhou Department of Computer Science & Engineering,
More informationPatternRank: A Software-Pattern Search System Based on Mutual Reference Importance
PatternRank: A Software-Pattern Search System Based on Mutual Reference Importance Atsuto Kubo, Hiroyuki Nakayama, Hironori Washizaki, Yoshiaki Fukazawa Waseda University Department of Computer Science
More informationRecovering Interaction Design Patterns in Web Applications
Recovering Interaction Design Patterns in Web Applications P. Tramontana A.R. Fasolino Dipartimento di Informatica e Sistemistica University of Naples Federico II, Italy G.A. Di Lucca RCOST Research Centre
More informationWeb of Science. Platform Release Nina Chang Product Release Date: December 10, 2017 EXTERNAL RELEASE DOCUMENTATION
Web of Science EXTERNAL RELEASE DOCUMENTATION Platform Release 5.27 Nina Chang Product Release Date: December 10, 2017 Document Version: 1.0 Date of issue: December 7, 2017 RELEASE OVERVIEW The following
More informationA Survey On Different Text Clustering Techniques For Patent Analysis
A Survey On Different Text Clustering Techniques For Patent Analysis Abhilash Sharma Assistant Professor, CSE Department RIMT IET, Mandi Gobindgarh, Punjab, INDIA ABSTRACT Patent analysis is a management
More information