Semantic Scholar
  Towards a More Efficient Review of Research Literature
  ICSTI, 11 September 2018
Allen Institute for Artificial Intelligence (https://allenai.org/)
  Non-profit research institute in Seattle, founded by Microsoft co-founder Paul Allen
  AI for the common good
  AI2 launched: Jan. 2014
  Team of 50: June 2016
  Team of 75: June 2017
  Team of ~100: June 2018
Challenges for Researchers
  The number of scientific papers has doubled every nine years since World War II.*
  There are over 34,000 scholarly, peer-reviewed journals in existence today, collectively publishing some 2.5 million articles every year.
  It's estimated that a single researcher, depending on their discipline, will read about 270** of them in the same time frame.
  *Source: What We Cannot Know, by Prof. Marcus du Sautoy.
  **Source: The STM Report, March 2015.
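A back-of-the-envelope reading of these figures, assuming steady exponential growth (an assumption; the sources give only the doubling time) and taking the 270 and 2.5 million figures at face value:

```latex
N(t) = N_0 \, 2^{t/9}
\quad\Rightarrow\quad
r = 2^{1/9} - 1 \approx 0.080
\quad \text{(about 8\% more papers each year)}

\frac{270}{2.5 \times 10^{6}} \approx 1.1 \times 10^{-4}
\quad \text{(roughly 0.01\% of a year's articles read by one researcher)}
```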
Key Challenges for Researchers
  Core tasks have become more difficult:
    Staying current with research
    Placing research in context
    Evaluating importance of research
  (Semantic Scholar User Research, 2016)
A.I.-driven approach to research
  Make it easy to survey and consume the world's scholarly knowledge
Semantic Scholar
  User Challenge: Staying current with research
  Semantic Scholar Approach:
    Search and discovery for computer science and biomedicine
    Robust knowledge graph of papers, venues, topics, and authors
    Alerts on authors and papers; reading library
  Key Results:
    40MM+ papers indexed in computer science and biomedicine
    Partners: IEEE, Springer Nature, PubMed, Science, MIT Press, arXiv, and more
    Global reach
Growth in indexed papers, partnerships
  40MM+ papers indexed
  Began with computer science, launched biomedicine in Fall 2017
  Sources and partners:
    DBLP: The online reference for open bibliographic information on computer science journals and proceedings.
    CiteSeer: Scientific literature digital library and search engine.
    OdySci Academic: A website for searching and ranking technical publications.
    AMiner: Mining deep knowledge from scientific sources.
    Hyper Articles en Ligne (HAL): Open archive run by the Centre pour la communication scientifique directe, a French computing centre.
    Science: Peer-reviewed academic journal of the American Association for the Advancement of Science.
    MIT Press: A university press affiliated with the Massachusetts Institute of Technology.
    Springer Nature: Publisher of some of the world's most influential science and technology journals.
    PubMed: More than 27 million citations for biomedical papers.
    Frontiers: Peer-reviewed open access journals in science and technology.
    ACM: Publications and transactions from the Association for Computing Machinery.
    IEEE: Leading organization for technology publishing.
    ACL: Publications and transactions from the Association for Computational Linguistics.
    arXiv: Open access to e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.
Alerts and Library
Growth in user traffic
  1.7MM monthly active users
  300%+ year-over-year growth
  Globally distributed
Semantic Scholar
  User Challenge: Placing research in context
  Semantic Scholar Approach:
    Augment with relevant additional content: featured presentations, videos, code libraries, news, blogs
    Automatic generation of topic summary pages with key papers
    Search relevance incorporates citations and citation influence
  Key Results:
    Millions of matched conference presentations, videos, code libraries, news articles, and scientific blogs
    Relevant search results
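One way to picture how citations and citation influence could feed into search relevance is the minimal sketch below. The scoring formula, the weights `w_cite` and `w_infl`, and the `Paper` fields are all hypothetical illustrations; the deck does not describe the actual ranking model.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    text_match: float           # query/text similarity in [0, 1], assumed precomputed
    citations: int              # total citation count
    influential_citations: int  # citations judged influential


def relevance(paper: Paper, w_cite: float = 0.1, w_infl: float = 0.3) -> float:
    """Hypothetical score: text match plus log-damped citation signals."""
    return (
        paper.text_match
        + w_cite * math.log1p(paper.citations)
        + w_infl * math.log1p(paper.influential_citations)
    )


def rank(papers: list[Paper]) -> list[Paper]:
    """Return papers ordered from most to least relevant."""
    return sorted(papers, key=relevance, reverse=True)


if __name__ == "__main__":
    results = rank([
        Paper("A", text_match=0.8, citations=10, influential_citations=0),
        Paper("B", text_match=0.7, citations=10, influential_citations=5),
        Paper("C", text_match=0.9, citations=0, influential_citations=0),
    ])
    for p in results:
        print(p.title, round(relevance(p), 3))
```

The logarithm keeps a handful of very highly cited papers from swamping the text match, and weighting influential citations more heavily than raw counts mirrors the slide's distinction between the two signals.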
Placing Research in Context: Relevant external content
  Currently available are millions of highly relevant:
    Presentations
    Videos
    Academic blogs
    Code repositories
  Evaluating:
    News articles
    Clinical trials
    Twitter mentions
    Mendeley references
Placing Research in Context Search Relevance & Filtering
Automatically generated Topic Pages
Semantic Scholar
  User Challenge: Evaluating importance of research
  Semantic Scholar Approach:
    Extract citations and build citation graph
    Extract abstract, charts, tables, and other metadata
    Models trained to extract numerical results, topics, and other semantics
  Key Results:
    Parse 470k entities, 2.5MM relations, 335MM entity mentions
    Ongoing improvements to boost precision and recall
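The "extract citations and build citation graph" step can be sketched as follows. This is a minimal illustration using only the Python standard library; the input format (a dict from paper ID to the IDs it references) and the function names are assumptions, not the Semantic Scholar pipeline.

```python
from collections import defaultdict


def build_citation_graph(papers: dict[str, list[str]]):
    """Build forward and reverse citation edges.

    `papers` maps a paper ID to the list of paper IDs it references
    (a hypothetical input format). Duplicate references and
    self-citations are ignored.
    """
    cites = defaultdict(set)     # paper -> papers it cites
    cited_by = defaultdict(set)  # paper -> papers that cite it
    for paper_id, references in papers.items():
        cites[paper_id]  # make sure papers with no references still appear
        for ref in references:
            if ref == paper_id:
                continue
            cites[paper_id].add(ref)
            cited_by[ref].add(paper_id)
    return cites, cited_by


def citation_counts(papers: dict[str, list[str]]) -> dict[str, int]:
    """Number of distinct papers citing each paper in the graph."""
    _, cited_by = build_citation_graph(papers)
    nodes = set(papers) | set(cited_by)
    return {p: len(cited_by.get(p, ())) for p in nodes}


if __name__ == "__main__":
    example = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "C", "D"]}
    print(citation_counts(example))
```

Keeping the reverse index (`cited_by`) alongside the forward edges makes per-paper citation counts a simple lookup, which is the kind of signal the "influential citations" view builds on.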
Evaluating Research: Influential citations
Evaluating Research: Tables, Charts, Metadata
Science Parse: Extracting metadata at high precision
  Extracted paper metadata:
    Title: Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions
    Authors: Peter Clark, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Turney, Daniel Khashabi
  Extracted reference:
    Authors: Clark, P.; Balasubramanian, N.; Bhakthavatsalam, S.; Humphreys, K.; Kinkead, J.; Sabharwal, A.; and Tafjord, O.
    Title: Automatic construction of inference-supporting knowledge bases
    Year: 2014
Groundbreaking AI Research
  Extract meaningful structures
    Examples: Entity extraction. Entity linking. Relation extraction. Answering FAQs. Entity discovery. Semantic frames. Figure extraction.
  Synthesize knowledge
    Examples: Ontology matching. Literature graph. Table aggregation. Summarization. Citation classification. Topic page compilation. User models.
  Macro analysis
    Examples: Association between prepublishing and citations. Identifying demographic bias in clinical trials. Change of affiliations vs. research productivity. Peer reviews.
Research Impact
  Extract meaningful structures; Synthesize knowledge; Macro analysis
  Ammar et al., SemEval 17
  Peters et al., ACL 17
  Wang et al., BioNLP 18
  Bhagavatula et al., NAACL 18
  Kang et al., NAACL 18
  Feldman et al., arXiv 18
  Siegel et al., JCDL 18
  Ammar et al., NAACL 18
  Beltagy et al. (in submission)