Science 2.0 VU Processing Science 2.0 Data, Content Mining
1 WISSEN, TECHNIK, LEIDENSCHAFT. Science 2.0 VU: Processing Science 2.0 Data, Content Mining. Elisabeth Lex, KTI, TU Graz, WS 2015/16
2 Agenda
- Repetition from last time: Open Science
- Processing academic resources
- Mining in academic resources (content perspective)
- Example: ContentMine, extraction of scientific facts
3 Repetition: Open Science
- Open Science: ideas, concepts, benefits and pitfalls
  - E.g. enhancing collaboration and community building, increasing efficiency of research vs. no reward system yet
- Open Data: sharing your data influences how often you get cited (Piwowar et al., 2007 and Piwowar et al., 2013)
- Different models for Open Access: Green vs. Gold vs. Hybrid
4 Open Science: 5 schools of thought
5 Example: Open Government Data: Eurostat
- "I'd like to compare the unemployment rate in Austria with other European ones"
- Via Google Public Data Explorer
6 Open Access in Science: Open Access Journals
- Green (self-archiving): the author can self-archive at the time of submission of the publication, whether the publication is grey literature (usually internal, non-peer-reviewed), a peer-reviewed journal publication, a peer-reviewed conference proceedings paper or a monograph
- Gold (author pays): the author or the author's institution can pay a fee to the publisher at publication time; the publisher then makes the publication available 'free' at the point of access
- Further, little-used roads: hybrid forms, for example platinum open access (does not charge author fees) ...
- Green and gold are compatible and can co-exist
Source: Jeffery, K. Open Access: An Introduction.
7 Processing Academic Resources
8 Motivation
- Aggregate scientific results
- Exploratory search in digital collections
- Find experts in domains
- Make science discoverable
- Improve access to scientific publications
- Extract facts for research
- Discover relationships
- Check for errors => improve science
9 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Topic modeling
  - Clustering/classification
  - Linking publications
- Make data and source code available
10 KDD Process
11 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Topic modeling
  - Clustering/classification
  - Linking publications
- Make data and source code available
12 Datasets
- The European Library Open Dataset: digital collection and 200 million bibliographic records (opendata)
- Datahub.io, e.g. DBLP Computer Science Bibliography: metadata of over 1.8 million publications by 1 million authors
13 Repositories and Aggregators
- ISI Web of Science, Scopus, PubMed, The European Library, Library of Congress, arXiv, Figshare, Data Citation Index, Mendeley, Google Scholar, CiteSeerX, ...
14 APIs to Repositories
- APIs to access scientific publications and research data
- rOpenSci: arXiv, PLOS ONE, Figshare
- Mendeley: Developer API; Python package: pip install mendeley
15 Example: rOpenSci
16 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Topic modeling
  - Clustering/classification
  - Linking publications
- Make data and source code available
17 Information Extraction (IE)
- Goal: extract structured information out of unstructured content, e.g.
  - Method names, quantities, temporal expressions
  - Authors from scientific publications
  - Organizations in the acknowledgements section of papers
  - References
  - ...
18 IE Process
- Entity types: ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political entity)
- Applying word classes to words within a sentence
- Input: raw text of a document
- Output: list of (entity, relation, entity) tuples
19 IE Standard Approaches (1/2)
- Regular expressions / rule-based approaches
- E.g. dates, RT@user
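A minimal sketch of the rule-based approach for the two examples named on the slide (dates and retweet markers). The patterns, the function names and the sample string are assumptions made for illustration; real date extraction would need to cover many more formats.

```python
import re

# Assumed patterns: ISO-style dates (YYYY-MM-DD) and Twitter-style "RT @user" markers.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
RETWEET_RE = re.compile(r"\bRT\s*@(\w+)")


def extract_dates(text):
    """Return every YYYY-MM-DD date in text as a (year, month, day) tuple of ints."""
    return [tuple(int(g) for g in m.groups()) for m in DATE_RE.finditer(text)]


def extract_retweeted_users(text):
    """Return the user names that follow an 'RT @' or 'RT@' marker."""
    return RETWEET_RE.findall(text)


# Hypothetical input text, not taken from the lecture.
sample = "RT @tugraz: lecture on 2015-11-18, slides due 2015-11-25. RT@kti_graz thanks!"
print(extract_dates(sample))
print(extract_retweeted_users(sample))
```

Rules like these are cheap and precise for well-formed patterns, but they miss anything the pattern author did not anticipate, which is one motivation for the learning-based approaches.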
20 IE as Machine Learning Task
- Supervised: train a model with annotated training data, use the trained model to classify unknown text
  - Choose a class label for a given input
  - Identify features of language data to classify it
  - Construct language models out of them
  - Learn about text/language from these models
- Methods
  - Classifiers: Naive Bayes, Maxent models
  - Sequence models: Hidden Markov Models, CRFs
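To make the supervised setting concrete, here is a small sketch of a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The class name, the two labels and the toy training sentences are hypothetical; a real system would use a library such as NLTK and far more annotated data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training examples per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, examples):
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical annotated training data: (tokens, label).
training = [
    ("we thank the funding agency".split(), "acknowledgement"),
    ("supported by grant funding".split(), "acknowledgement"),
    ("we propose a new method".split(), "body"),
    ("the method uses clustering".split(), "body"),
]
clf = NaiveBayes()
clf.train(training)
print(clf.predict("funding grant".split()))
print(clf.predict("new clustering method".split()))
```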
21 Libraries: NLTK
22 Mining academic documents
- Extraction of structural elements: tables, figures, ...
- Extraction of facts from structural elements and the document
  - Named Entity Recognition (e.g. gene names, ...)
  - Relation extraction (e.g. system A impacts system B)
- Mostly PDF format: good for presentation, but problems with metadata quality and hard to analyse
- While PDF analysis tools exist, there is still room for improvement!
23 Approach: divide and conquer
- Extract blocks from the PDF based on structure and layout information
- Classify the extracted blocks, e.g. into title, body, references, abstract, ...
- Classify the content of extracted blocks, e.g. tables
- Extract relevant info from the content (named entities, nouns, dates, quantities, ...)
24 Approach: extracting blocks
- Features: layout-specific, such as position, font, font size, ...
- Apply machine learning approaches
  - Unsupervised (clustering)
  - Supervised (classification)
25 Unsupervised Approach
- Clustering: given a set of objects, find the groupings of objects so that the similarity within a group is maximized and the similarity between groups is minimized
- Cluster = block
- Successive merge and split mechanism
26 Supervised Approach
- Classification: given a set of labeled examples, create a model and use it to predict the label of unknown examples
- Classify blocks: Maximum Entropy models
  - Create training data by labeling blocks, i.e. assigning blocks to classes
  - Learn a model based on the training data and apply it to classify unknown blocks
- Features: layout, formatting, word frequencies, ...
27 Fact Extraction from Publications
- Extract entities from within the identified blocks
  - E.g. divide the author block further to extract all authors contained in the block
- Extract relations between entities
- Open Information Extraction
  - Learns a model without needing training data
  - Can extract binary relations from sentences
28 Example: Measuring quality of Wikipedia
- Results table: Accuracy, F-Measure, Precision and Recall, each reported as Value [%] under the column groups Unbalanced and Balanced
Source: Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein, and Michael Granitzer. Measuring the quality of web content using factual information. In Proceedings of WebQuality '12 at WWW '12.
29 Extract Topics from Publications (18/11/15)
- Topic models: algorithms that uncover thematic structure in document collections
- Facilitate searching, browsing, summarizing
- Latent Dirichlet Allocation (LDA): hierarchical probabilistic model
30 LDA
- Probabilistic model that helps find latent topics for documents
- Probabilistic model: treat data as observations that stem from a generative probabilistic process which involves hidden variables
- Documents: the thematic structure is the hidden variable
- Each topic is described by words in the documents
31 LDA
- The model relates three quantities: the probability of the ith word for doc d, the probability of t_i within topic z_i, and the probability of using a word from topic z_i in the doc
- Infer hidden structure using posterior inference: what are the topics that describe the documents?
- Classify unknown data using the topic model: how does unknown data fit into the estimated topic structure?
- The number of topics Z has to be chosen in advance; it defines the level of specification of topics
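The three labelled probabilities on this slide match the standard mixture form used for topic models. The following is a hedged reconstruction rather than the slide's own formula, assuming Z topics indexed by j (the index j is not in the original):

```latex
P(t_i \mid d) \;=\; \sum_{j=1}^{Z} P(t_i \mid z_i = j)\, P(z_i = j \mid d)
```

Here P(t_i | d) is the probability of the ith word for doc d, P(t_i | z_i = j) is the probability of t_i within topic j, and P(z_i = j | d) is the probability of using a word from topic j in the doc.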
32 Example: Model the evolution of topics over time in the journal Science
- Dataset: pages of Science from the JSTOR archive
33 Validation of extracted information
- Crowdsourcing as a way to evaluate mining quality
- Share the extracted information via e.g. a Web-based platform
- Enable users to give feedback: accept, reject, suggest new concepts/facts
34 HowTo: Text Mining using rOpenSci
- Library that facilitates text mining on publications
  - Search for articles
  - Fetch articles
  - Get links for full text articles (xml, pdf)
  - Extract text from articles / convert formats
  - Collect bits of articles that you actually need
  - Download supplementary materials from papers
Source: Chamberlain, Scott (2015). fulltext: Full Text of Scholarly Articles Across Many Data Sources. R package version
35 Example: Text Mining using rOpenSci
    # load the library
    library("fulltext")

    # ft_search(): get metadata on a search query
    (res1 <- ft_search(query = 'open science', from = 'arxiv'))
    (out <- ft_get(res1))
    res1$arxiv

    # ft_get(): get full or partial text of articles
    res <- ft_get('cs/ v1', from = 'arxiv')

    # extract the full text
    res2 <- ft_extract(res)
    res2$arxiv$data

    # extract interesting parts from the full text
    out %>% chunks("doi")
36 Example: Text Mining using rOpenSci
- fulltext can extract parts of a paper via chunks(): all, front, body, back, title, doi, categories, authors, keywords, abstract, executive_summary, refs, refs_dois, publisher, journal_meta, article_meta, acknowledgments, permissions, history
- Can do PDF extraction, e.g. via GhostScript:
    (res_gs <- ft_extract(pdf, "gs"))
37 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Topic modeling
  - Clustering/classification
  - Linking publications
- Make data and source code available
38 Clustering of Academic Resources
- Detect groupings of papers based on content similarity, e.g. along topics
- Transform content (e.g. the abstract of a paper) into a machine-readable representation
- Bag of Words approach: a document is treated as a bag of words/terms, represented as a vector
- Document-term matrix: term frequencies across all documents
39 Vector Space Model
- Documents are vectors in term-document space
- Elements of the vector are weights w_ij corresponding to doc i and term j
- Weights: frequencies of terms in docs, or TF-IDF
- Proximity of documents (similarity) is calculated by the cosine of the angle between document vectors
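A minimal sketch of the vector space model: TF-IDF weights w_ij for doc i and term j, followed by cosine similarity between document vectors. The function names and toy documents are assumptions, and idf = log(N / df) is only one of several common weighting variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one weight vector per document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # w_ij = term frequency * log(N / document frequency)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine of the angle between two equally long vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical mini-corpus of three "abstracts".
docs = [
    "topic models for scientific publications".split(),
    "topic models uncover themes in publications".split(),
    "open access journals and self archiving".split(),
]
vocab, vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # documents sharing several terms
sim_02 = cosine(vecs[0], vecs[2])  # documents sharing no terms
print(sim_01, sim_02)
```

Documents that share no terms get a similarity of zero, which is why purely statistical similarity can miss papers that discuss the same topic in different vocabulary.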
40 Example: Facilitate exploratory search
- By topic of interest (cluster = topic of interest)
- Setting: social bookmarking dataset, URLs described by tags
- Research questions:
  - What clusters (aka groups of interests) exist?
  - Are they somehow related?
  - How do they evolve over time?
41 Clustering Algorithms
- See the KDD lectures! Here, briefly, the k-means algorithm:
1. Select k points as initial centroids
2. Repeat
3.   Form k clusters by assigning all points to the closest centroid
4.   Recompute the centroid of each cluster
5. Until the centroids don't change
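The five steps above can be sketched in plain Python as follows. Two choices here are assumptions rather than part of the slide: the first k points serve as initial centroids (to keep the run deterministic), and a `max_iter` cap guards against long runs. The sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster a list of equally long numeric tuples into k groups."""
    # 1. Select k points as initial centroids (assumption: simply the first k).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):  # 2. Repeat
        # 3. Form k clusters by assigning every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # 4. Recompute the centroid of each cluster (keep the old one if a cluster is empty).
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])
        # 5. Until the centroids don't change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the initial centroids; practical implementations usually pick them at random and restart several times.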
42 Example
44 Classification of Scientific Publications
- Categorize into an established subject-based taxonomy, e.g.
  - Library of Congress
  - UNESCO thesaurus
  - DOAJ subject classification
  - Library of Congress Subject Headings
45 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Topic modeling
  - Clustering/classification
  - Linking publications
- Make data and source code available
46 Linking Scientific Publications
- Citations (explicitly defined)
- Similarity
  - Statistical similarity: cosine
  - Semantic similarity: more complex, e.g. via topics
- Usage
- Argument support
- Contradiction
- ...
47 Linking via Citations
48 How?
- Aggregate and manage data: repositories, aggregators, datasets, ...
- Mining in academic resources
  - Information extraction
  - Clustering/classification
  - Linking publications
- Search
- Make data and source code available
49 Sharing code
- GitHub, Bitbucket, IPython Notebooks, ...
50 Example: ContentMine
- Idea: facts cannot be copyrighted
- Billions of facts in copyright-protected research articles => make them publicly accessible!
51 Possible questions for ContentMine
- Find references to papers by a given author. This is metadata and therefore factual. It is usually trivial to extract references and authors; more difficult, of course, to disambiguate.
- Find who sponsors research. Extract acknowledgements and perform Named Entity Recognition to detect companies. Link the companies to the papers where they are listed in the acknowledgements.
52 Machine Extraction of scientific facts
1. Crawl scientific literature
2. Scrape each scientific article
3. Extract facts
4. Index
5. Republish (Wikidata)
53 Example: retrieve metadata for a specific article
54 Content Mining Problems
- Secondary publishers create walled gardens, e.g. the ResearchGate portal
- Publishers' contracts ban content mining
- Publishers may cut off universities who mine
- Publishers lobby governments to require licences for content mining
- UK => the right to read is the right to mine
55 Summary
- Aggregators/repos for scientific publications
- Mining content/data in publications
  - Information / fact extraction
  - Topic modeling
  - Clustering, e.g. exploratory analysis of large datasets: find groups of interest expressed by user-generated tags and their relations
- ContentMine as example
56 Questions? See you next week!
More informationScientific databases
SCID 305 : Generic Skills in Science Research Scientific databases Suang Udomvaraphunt Academic IT Stang Monkolsuk library and Information Division Faculty of Science Stang Mongkolsuk Library http://stang.sc.mahidol.ac.th
More informationRESEARCH ANALYTICS From Web of Science to InCites. September 20 th, 2010 Marta Plebani
RESEARCH ANALYTICS From Web of Science to InCites September 20 th, 2010 Marta Plebani marta.plebani@thomsonreuters.com Web Of Science: main purposes Find high-impact articles and conference proceedings.
More informationAN EFFICIENT PROCESSING OF WEBPAGE METADATA AND DOCUMENTS USING ANNOTATION Sabna N.S 1, Jayaleshmi S 2
AN EFFICIENT PROCESSING OF WEBPAGE METADATA AND DOCUMENTS USING ANNOTATION Sabna N.S 1, Jayaleshmi S 2 1 M.Tech Scholar, Dept of CSE, LBSITW, Poojappura, Thiruvananthapuram sabnans1988@gmail.com 2 Associate
More informationUSC Viterbi School of Engineering
Introduction to Computational Thinking and Data Science USC Viterbi School of Engineering http://www.datascience4all.org Term: Fall 2016 Time: Tues- Thur 10am- 11:50am Location: Allan Hancock Foundation
More informationHistorical Text Mining:
Historical Text Mining Historical Text Mining, and Historical Text Mining: Challenges and Opportunities Dr. Robert Sanderson Dept. of Computer Science University of Liverpool azaroth@liv.ac.uk http://www.csc.liv.ac.uk/~azaroth/
More informationLIBRARY RESOURCES & GUIDES APA STYLE YOUR LITERATURE REVIEW PRIMARY & SECONDARY SOURCES SEARCHING LIBRARY E-RESOURCES ( DATABASES ) FOR ARTICLES
2015 Feb LITERATURE REVIEW ASSIGNMENT Library Class Outline Centennial College Libraries homepage http://library.centennialcollege.ca/ LIBRARY RESOURCES & GUIDES APA STYLE YOUR LITERATURE REVIEW PRIMARY
More informationAccess Innovations, Inc.
2016. Access Innovations, Inc. All rights reserved. Welcome To DCMI Special Session: Applying Taxonomies in Publishing Leveraging Your Semantic Enrichment Investment 13 October 2016, 10:30 to 12:00 Access
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationSummarizing Public Opinion on a Topic
Summarizing Public Opinion on a Topic 1 Abstract We present SPOT (Summarizing Public Opinion on a Topic), a new blog browsing web application that combines clustering with summarization to present an organized,
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More informationIntroduction to Bibliometrics and Tools for Organizing References. Uta Grothkopf ESO Library
Introduction to Bibliometrics and Tools for Organizing References Uta Grothkopf ESO Library esolib@eso.org Astronomy evaluate? Possible measures Number of talks Invitations to conferences Students, graduations
More informationMendeley: A Reference Management Tools
Mendeley: A Reference Management Tools OSSLM-2016 WORKSHOP MANUAL Prepared by Dr. Samir Kumar Jalal, Deputy Librarian National Workshop on Open Source Software for Library Management (OSSLM 2016) June
More informationPerspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe
Perspectives on Open Data in Science Open Data in Science: Challenges & Opportunities for Europe Stephane Berghmans, DVM PhD 31 January 2018 9 When talking about data, we talk about All forms of research
More informationOpen Access to Publications in H2020
Workshop Open Science and European OA policies in H2020 Open Access to Publications in H2020 Pedro Principe, University of Minho 26 April 2016 AGENDA Open Access in Europe: from FP7 to H2020 OA in H2020:
More informationData Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.
Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan
More informationOpen Access & Open Data in H2020
Open Access & Open Data in H2020 Services & Support Hannelore Vanhaverbeke, PhD Research Coordination Office What are you in for? Mandatory Each beneficiary must ensure open access (free of charge, online
More informationLinda Strick Fraunhofer FOKUS. EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018
Linda Strick Fraunhofer FOKUS EOSC Summit - Rules of Participation Workshop, Brussels 11th June 2018 EOSC Business Models, Data Management Policies, Data Security & Legal Issues 16:30 17:16 Room 0B Panelists:
More informationHow to contribute information to AGRIS
How to contribute information to AGRIS Guidelines on how to complete your registration form The dashboard includes information about you, your institution and your collection. You are welcome to provide
More informationA Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews
A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews Necmiye Genc-Nayebi and Alain Abran Department of Software Engineering and Information Technology, Ecole
More informationLibrary resources in philology
Library resources in philology l 1 Search strategy Identify and define your information needs. Select relevant information sources. Identify suitable search terms. Establish relationships between these
More informationCitation extraction and modeling. Meen Chul Kim, Andrea Forte, Aaron Halfaker
Citation extraction and modeling Meen Chul Kim, Andrea Forte, Aaron Halfaker History 2005 - Rebuilt Mediawiki with references as first class objects in the system. - it had a summary page and discussion
More informationInfoSci -Databases Platform
InfoSci -Databases Platform User Guide 07 A Database of Information Science and Technology Research IGIGlobal www.igi-global.com InfoSci -Databases Platform User Guide 07 Getting Started: IGI Global is
More informationNSF gateway to Scientific literature
NSF gateway to Scientific literature Workshop on Proposal Writing National Science Foundation 19 June 2012 Sunethra Perera Outline NSF Literature Local Literature at the NSF Local Literature at Other institutions
More information60-538: Information Retrieval
60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are
More informationCORE: Improving access and enabling re-use of open access content using aggregations
CORE: Improving access and enabling re-use of open access content using aggregations Petr Knoth CORE (Connecting REpositories) Knowledge Media institute The Open University @petrknoth 1/39 Outline 1. The
More informationEvent: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect
Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of
More informationReview on Text Mining
Review on Text Mining Aarushi Rai #1, Aarush Gupta *2, Jabanjalin Hilda J. #3 #1 School of Computer Science and Engineering, VIT University, Tamil Nadu - India #2 School of Computer Science and Engineering,
More informationSHARING YOUR RESEARCH DATA VIA
SHARING YOUR RESEARCH DATA VIA SCHOLARBANK@NUS MEET OUR TEAM Gerrie Kow Head, Scholarly Communication NUS Libraries gerrie@nus.edu.sg Estella Ye Research Data Management Librarian NUS Libraries estella.ye@nus.edu.sg
More information