Unstructured Text in Big Data The Elephant in the Room

Size: px
Start display at page:

Download "Unstructured Text in Big Data The Elephant in the Room"

Transcription

1 Unstructured Text in Big Data The Elephant in the Room David Milward ICIC, October 2013

2 Click Unstructured to to edit edit Master Master Big title Data style title style Big Data Volume, Variety, Velocity Estimated 80% of all data is unstructured Need to be able to make decisions based on this data Ever increasing amount of data makes it harder and harder to filter by hand Need more automation 80% 20% Linguamatics Ltd.

3 Click From to to Unstructured edit edit Master Master title style Data title style to Structured Identify concepts and relationships Structured, relational data Normalized concept identifiers Unstructured or Semi-Structured Content Sources Traditional text mining works in batch mode Requires each extraction to be programmed in, or learnt from a large amount of annotated material Linguamatics Confidential

4 Click I2E Agile to to edit edit Text Master Master Mining title style title style Querying Results Information Consumers Confidential

5 Click Increasing to to edit edit Master Range Master title Of style title Applications style Vocabulary Development Target Identification & Prioritization Safety/Tox Sentiment Analysis in Social Media Mining FDA Drug Conference In the Labels beginning... Abstract SAR Competitive Protein- Extraction Intelligence Protein Interactions Plant Metabolic Pathways Mining Key Opinion Leader Identification Biomarker Discovery Mining Electronic Medical Records Patent FTO Chemical Search In-licensing Opportunities Patent Analysis Extracting Numerical and Experimental Data Systems Biology Drug Repositioning Confidential

6 Click The to Database to edit edit Master Master Approach title style title style Can we predetermine what is required? One approach is to try to extract everything but in practice have to make assumptions, whether by hand or automatically Do we include negative statements? Do we include speculation? What context do we include? Need a good idea of how the data will be used to know what needs to be extracted There are cases where this works well e.g. have useful structured data and want to add to it from free text have new records structured, but want to bring in information from legacy free text Linguamatics Ltd.

7 Click Extracting to to edit edit Master for Master a title Relational style title style Database Extracting all relevant information from pathology reports Linguamatics Ltd.

8 Click Extracting to to edit edit Master for Master a title Semantic style title style Store Output as a set of triples Because all the parts have URIs, the parts are webaddressable Peptide Treat Express Associate Psoriasis Infectious Disease Web Linguamatics Ltd.

9 Click The to Ad to edit Hoc edit Master Approach Master title style title style No need to build a database Regard text mining queries as ways to create a database on the fly Keep all the unstructured free text, and query for what you need when you need it Even index structured data with text mining to link information together Used very successfully by knowledge professionals: Just like a search engine, there are thousands of questions people want to ask of the data You can t predict them all beforehand Linguamatics Ltd.

10 Click Ad Hoc to to edit Querying edit Master Master title style title style Find relationships that you want for any particular information request Treat the text mining as an agile database to answer thousands of different questions e.g. metabolites from fruit Linguamatics Ltd.

11 Click Do We to to edit Need edit Master Master More? title style title style I2E Agile Text Mining has opened up text mining to the end user, but it is still mainly used by knowledge professionals Given the importance of unstructured big data, can we bring the benefits of text mining to an even wider audience? Linguamatics Ltd.

12 Click Workflow to to edit edit Master Automation Master title style title style Successful queries converted into regular workflows using Pipeline Pilot or KNIME Real-time workflows for up-to-date dashboards, alerts Further analysis, visualization, integration with other data Pipeline Pilot 12

13 Click Clinical to to edit Trials edit Master Master Analysis title style title from style I2E Text Mining Results Colours indicate Phase Dates Using Pipeline Pilot visualization Linguamatics Ltd.

14 Click Embedding to to edit edit Master I2E Master title within style title Web style Apps 10/21/ _8.pptx

15 Click Form-Based to to edit edit Master Master Querying title style title style Create web interfaces connecting to I2E server Information professionals develop sophisticated queries Queries are parameterized to allow end users to customize Javascript allows portability to mobile devices such as ipads

16 Click Enhancing to to edit edit Master Enterprise Master title style title Search style Use text mining to annotate concepts to feed into a search engine to: provide concept search provide concepts for facets provide intelligent thumbnails for documents, pulling out key information Linguamatics Ltd.

17 Value Click Semantic to to edit edit Master annotation Master title style title of documents style Relationships Concepts Keywords NLP-based Text Mining Search Engine

18 Click Provide to to edit edit Concepts Master Master title style title Facets style for Enterprise Search Document Refine the search information and based on teaser terminologies Document preview and additional information Linguamatics Ltd.

19 Click Dictionary to to edit edit Master Matching Master title style title - Companies style

20 Click Pattern to to edit edit Matching Master Master title - style Institutions title style Not a fixed list: can find previously unknown institutions

21 Click Pattern to to edit edit Matching Master Master title - style Mutations title style

22 Click Pattern to to edit edit Matching Master Master title - style Chemicals title style Assign structure to novel chemicals Distinguish exemplified compounds Distinguish compounds given properties

23 Click Whatever to to edit edit Master the Master Content... title style title style... I2E can mine and extract with precision Scientific literature Twitter

24 Click. and to to edit Wherever edit Master Master title style title style Internal Content MEDLINE Patents Drug Labels Clinical Trials Publisher Content Provider I2E Enterprise I2E OnDemand I2E Platform External Content I2E Platform I2E 4.1 Linked Servers Users can query local information or information on the cloud Proprietary content can reside within Enterprise Standard sources e.g. patents can be hosted on the cloud, and shared by all users Data from content providers can reside on their sites 24

25 Click Bringing to to edit edit the Master Master Elephant title style title Down style to Size Data warehouses for data that you know you need Embedding text mining in other applications e.g. Enterprise Search to provide benefits of semantic search Building specialized new interfaces to provide self-service applications for end-users Continuing role for ad hoc text mining text mining can ask almost any question of the data you can never predict every question Linguamatics Ltd.

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

Precise Medication Extraction using Agile Text Mining

Precise Medication Extraction using Agile Text Mining Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,

More information

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value

Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value KNOWLEDGENT INSIGHTS volume 1 no. 5 October 7, 2011 Xcelerated Business Insights (xbi): Going beyond business intelligence to drive information value Today s growing commercial, operational and regulatory

More information

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

What is Text Mining? Sophia Ananiadou National Centre for Text Mining   University of Manchester National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text

More information

Semantic Knowledge Discovery OntoChem IT Solutions

Semantic Knowledge Discovery OntoChem IT Solutions Semantic Knowledge Discovery OntoChem IT Solutions OntoChem IT Solutions GmbH Blücherstr. 24 06120 Halle (Saale) Germany Tel. +49 345 4780472 Fax: +49 345 4780471 mail: info(at)ontochem.com Get the Gold!

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

Cost-Benefit Analysis of Retrospective vs. Prospective Data Standardization

Cost-Benefit Analysis of Retrospective vs. Prospective Data Standardization Cost-Benefit Analysis of Retrospective vs. Prospective Data Standardization Vicki Seyfert-Margolis, PhD Senior Advisor, Science Innovation and Policy Food and Drug Administration IOM Sharing Clinical Research

More information

SELF-SERVICE SEMANTIC DATA FEDERATION

SELF-SERVICE SEMANTIC DATA FEDERATION SELF-SERVICE SEMANTIC DATA FEDERATION WE LL MAKE YOU A DATA SCIENTIST Contact: IPSNP Computing Inc. Chris Baker, CEO Chris.Baker@ipsnp.com (506) 721 8241 BIG VISION: SELF-SERVICE DATA FEDERATION Biomedical

More information

Data Mining and Warehousing

Data Mining and Warehousing Data Mining and Warehousing Sangeetha K V I st MCA Adhiyamaan College of Engineering, Hosur-635109. E-mail:veerasangee1989@gmail.com Rajeshwari P I st MCA Adhiyamaan College of Engineering, Hosur-635109.

More information

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics

An Oracle White Paper October Oracle Social Cloud Platform Text Analytics An Oracle White Paper October 2012 Oracle Social Cloud Platform Text Analytics Executive Overview Oracle s social cloud text analytics platform is able to process unstructured text-based conversations

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

Big Data in Translational Science

Big Data in Translational Science Big Data in Translational Science Albert Wang Associate Director, Translational R&D IT Bristol-Myers Squibb 2015 AAPS Annual Meeting Agenda Perspectives on Big Data Big Data in Translational R&D Selected

More information

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX 1 Successful companies know that analytics are key to winning customer loyalty, optimizing business processes and beating their

More information

D360: Unlock the value of your scientific data Solving Informatics Problems for Translational Research

D360: Unlock the value of your scientific data Solving Informatics Problems for Translational Research D360: Unlock the value of your scientific data Solving Informatics Problems for Translational Research Dr. Fabian Bös, Senior Application Scientist Certara Spain SL Martin-Kollar-Str. 17, 81829 Munich

More information

Text mining tools for semantically enriching the scientific literature

Text mining tools for semantically enriching the scientific literature Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Use of Semantic Technologies at Eli Lilly and Company. J Phil Brooks Information Consultant, SE Data Team Discover IT Eli Lilly and Company

Use of Semantic Technologies at Eli Lilly and Company. J Phil Brooks Information Consultant, SE Data Team Discover IT Eli Lilly and Company Use of Semantic Technologies at Eli Lilly and Company J Phil Brooks Information Consultant, SE Data Team Discover IT Eli Lilly and Company Notable Semantic Projects at Lilly Discovery Metadata Integration

More information

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London

Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services. Patrick Wendel Imperial College, London Discovery Net : A UK e-science Pilot Project for Grid-based Knowledge Discovery Services Patrick Wendel Imperial College, London Data Mining and Exploration Middleware for Distributed and Grid Computing,

More information

Bioqueries: A Social Community Sharing Experiences while Querying Biological Linked Data (

Bioqueries: A Social Community Sharing Experiences while Querying Biological Linked Data ( Bioqueries: A Social Community Sharing Experiences while Querying Biological Linked Data (http://bioqueries.uma.es) María Jesús García-Godoy, Ismael Navas-Delgado, José Francisco Aldana Montes Computing

More information

Visualization and text mining of patent and non-patent data

Visualization and text mining of patent and non-patent data of patent and non-patent data Anton Heijs Information Solutions Delft, The Netherlands http://www.treparel.com/ ICIC conference, Nice, France, 2008 Outline Introduction Applications on patent and non-patent

More information

Semantic Annotation, Search and Analysis

Semantic Annotation, Search and Analysis Semantic Annotation, Search and Analysis Borislav Popov, Ontotext Ontology A machine readable conceptual model a common vocabulary for sharing information machine-interpretable definitions of concepts in

More information

Life Science Research Center (LSRC) Rachel Henning, Dr. Eliot Randle

Life Science Research Center (LSRC) Rachel Henning, Dr. Eliot Randle Life Science Research Center (LSRC) Rachel Henning, Dr. Eliot Randle Infotrieve The leading integrated solution provider of information management and services Over 3,000 organizations and over 50,000

More information

NPP & Blockchain Have you thought about the data? Ken Krupa, CTO, MarkLogic

NPP & Blockchain Have you thought about the data? Ken Krupa, CTO, MarkLogic NPP & Blockchain Have you thought about the data? Ken Krupa, CTO, MarkLogic Hello SLIDE: 2 14 COPYRIGHT November 2017 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. A QUICK LOOK New Payments Platform Open

More information

SmartData Fabric distributed virtual data, graph data and master data management, analytics and security. Solutions and Key Features Revision 2.

SmartData Fabric distributed virtual data, graph data and master data management, analytics and security. Solutions and Key Features Revision 2. s and Key Features Revision 2.5 Page 1 of 7 www.whamtech.com (972) 991-5700 info@whamtech.com March 2018 ID SOL1 Automated Data Discovery and Classification (ADDC) Key Feature ID KF01 KF02 KF03 Key Feature

More information

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009 Maximizing the Value of STM Content through Semantic Enrichment Frank Stumpf December 1, 2009 What is Semantics and Semantic Processing? Content Knowledge Framework Technology Framework Search Text Images

More information

Natural Language Processing with PoolParty

Natural Language Processing with PoolParty Natural Language Processing with PoolParty Table of Content Introduction to PoolParty 2 Resolving Language Problems 4 Key Features 5 Entity Extraction and Term Extraction 5 Shadow Concepts 6 Word Sense

More information

Essential for Employee Engagement. Frequently Asked Questions

Essential for Employee Engagement. Frequently Asked Questions Essential for Employee Engagement The Essential Communications Intranet is a solution that provides clients with a ready to use framework that takes advantage of core SharePoint elements and our years

More information

Overview of Data Services and Streaming Data Solution with Azure

Overview of Data Services and Streaming Data Solution with Azure Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server

More information

The Emerging Data Lake IT Strategy

The Emerging Data Lake IT Strategy The Emerging Data Lake IT Strategy An Evolving Approach for Dealing with Big Data & Changing Environments bit.ly/datalake SPEAKERS: Thomas Kelly, Practice Director Cognizant Technology Solutions Sean Martin,

More information

Transforming the Economics of

Transforming the Economics of Transforming the Economics of Real-World Evidence Analytics Integrated environment expedites data availability, reduces costs, and extends business intelligence access throughout the organization 1 Executive

More information

e-scider: A tool to retrieve, prioritize and analyze the articles from PubMed database Sujit R. Tangadpalliwar 1, Rakesh Nimbalkar 2, Prabha Garg* 3

e-scider: A tool to retrieve, prioritize and analyze the articles from PubMed database Sujit R. Tangadpalliwar 1, Rakesh Nimbalkar 2, Prabha Garg* 3 e-scider: A tool to retrieve, prioritize and analyze the articles from PubMed database Sujit R. Tangadpalliwar 1, Rakesh Nimbalkar 2, Prabha Garg* 3 1 National Institute of Pharmaceutical Education and

More information

Linked Data: Fast, low cost semantic interoperability for health care?

Linked Data: Fast, low cost semantic interoperability for health care? Linked Data: Fast, low cost semantic interoperability for health care? About the presentation Part I: Motivation Why we need semantic operability in health care Why enhancing existing systems to increase

More information

CRIS2016 conference Valerie Brasse and Laurent Remy 11 th of June 2016, St Andrews

CRIS2016 conference Valerie Brasse and Laurent Remy 11 th of June 2016, St Andrews CRIS2016 conference Valerie Brasse and Laurent Remy 11 th of June 2016, St Andrews 2016/06/11 Strasbourg IHU Knowledge Base: a CERIF implementation 1 Stakeholders Institut Hospitalo-Universitaire = teaching

More information

SciVerse Scopus. 1. Scopus introduction and content coverage. 2. Scopus in comparison with Web of Science. 3. Basic functionalities of Scopus

SciVerse Scopus. 1. Scopus introduction and content coverage. 2. Scopus in comparison with Web of Science. 3. Basic functionalities of Scopus Prepared by: Jawad Sayadi Account Manager, United Kingdom Elsevier BV Radarweg 29 1043 NX Amsterdam The Netherlands J.Sayadi@elsevier.com SciVerse Scopus SciVerse Scopus 1. Scopus introduction and content

More information

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered. Content Enrichment An essential strategic capability for every publisher Enriched content. Delivered. An essential strategic capability for every publisher Overview Content is at the centre of everything

More information

Customisable Curation Workflows in Argo

Customisable Curation Workflows in Argo Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:

More information

Markush Structure Usability in Patent and Combinatorial Chemistry: New Approaches and Software Tools. Wei Deng (David)

Markush Structure Usability in Patent and Combinatorial Chemistry: New Approaches and Software Tools. Wei Deng (David) Markush Structure Usability in Patent and Combinatorial Chemistry: New Approaches and Software Tools Wei Deng (David) Chemical Space in Patents Exemplified structures Chemical space: 10 0-10 3 Indexed

More information

Challenges and Opportunities with Big Data. By: Rohit Ranjan

Challenges and Opportunities with Big Data. By: Rohit Ranjan Challenges and Opportunities with Big Data By: Rohit Ranjan Introduction What is Big Data? Big data is data sets that are so voluminous and complex that traditional data processing application software

More information

Network analysis. Martina Kutmon Department of Bioinformatics Maastricht University

Network analysis. Martina Kutmon Department of Bioinformatics Maastricht University Network analysis Martina Kutmon Department of Bioinformatics Maastricht University What's gonna happen today? Network Analysis Introduction Quiz Hands-on session ConsensusPathDB interaction database Outline

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data

BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data BioNav: An Ontology-Based Framework to Discover Semantic Links in the Cloud of Linked Data María-Esther Vidal 1, Louiqa Raschid 2, Natalia Márquez 1, Jean Carlo Rivera 1, and Edna Ruckhaus 1 1 Universidad

More information

Transforming IT: From Silos To Services

Transforming IT: From Silos To Services Transforming IT: From Silos To Services Chuck Hollis Global Marketing CTO EMC Corporation http://chucksblog.emc.com @chuckhollis IT is being transformed. Our world is changing fast New Technologies New

More information

Data Lakes and Their Implication on the Global Health Sector v2.0

Data Lakes and Their Implication on the Global Health Sector v2.0 Data Lakes and Their Implication on the Global Health Sector v2.0 Research Blogs Expert Opinions Pekka Neittaanmäki Dean of the Faculty of Information Technology Professor, Department of Mathematical Information

More information

Informatica Enterprise Information Catalog

Informatica Enterprise Information Catalog Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with

More information

Embase for biomedical searching An introduction. Presented by Sherry Winter January 27, 2015

Embase for biomedical searching An introduction. Presented by Sherry Winter January 27, 2015 1 Embase for biomedical searching An introduction Presented by Sherry Winter January 27, 2015 2 2 Need to know Webinar control panel: Ask a question for questions and comments Option for full screen view

More information

REDEFINING THE ENTERPRISE

REDEFINING THE ENTERPRISE REDEFINING THE ENTERPRISE ENABLING IT AND BUSINESS TRANSFORMATION WITH INDUSTRY BENCHMARKS 1 TODAY S BUSINESS CHALLENGES REACT FASTER TO FIND NEW GROWTH CUT OPERATIONAL COSTS & LEGACY MORE THAN EVER 2

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

Building a Data Strategy for a Digital World

Building a Data Strategy for a Digital World Building a Data Strategy for a Digital World Jason Hunter, CTO, APAC Data Challenge: Pushing the Limits of What's Possible The Art of the Possible Multiple Government Agencies Data Hub 100 s of Service

More information

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions

More information

HTML 5 and CSS 3, Illustrated Complete. Unit M: Integrating Social Media Tools

HTML 5 and CSS 3, Illustrated Complete. Unit M: Integrating Social Media Tools HTML 5 and CSS 3, Illustrated Complete Unit M: Integrating Social Media Tools Objectives Understand social networking Integrate a Facebook account with a Web site Integrate a Twitter account feed Add a

More information

Election Analysis and Prediction Using Big Data Analytics

Election Analysis and Prediction Using Big Data Analytics Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India

More information

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC

SAP Agile Data Preparation Simplify the Way You Shape Data PUBLIC SAP Agile Data Preparation Simplify the Way You Shape Data Introduction SAP Agile Data Preparation Overview Video SAP Agile Data Preparation is a self-service data preparation application providing data

More information

PERSPECTIVE. Data Virtualization A Potential Antidote for Big Data Growing Pains. Abstract

PERSPECTIVE. Data Virtualization A Potential Antidote for Big Data Growing Pains. Abstract PERSPECTIVE Data Virtualization A Potential Antidote for Big Data Growing Pains Abstract Enterprises are already facing challenges around data consolidation, heterogeneity, quality, and value. Now they

More information

TEXT MINING: THE NEXT DATA FRONTIER

TEXT MINING: THE NEXT DATA FRONTIER TEXT MINING: THE NEXT DATA FRONTIER An Infrastructural Approach Dr. Petr Knoth CORE (core.ac.uk) Knowledge Media institute, The Open University United Kingdom 2 OpenMinTeD Establish an open and sustainable

More information

Alternative Tools for Mining The Biomedical Literature

Alternative Tools for Mining The Biomedical Literature Yale University From the SelectedWorks of Rolando Garcia-Milian May 14, 2014 Alternative Tools for Mining The Biomedical Literature Rolando Garcia-Milian, Yale University Available at: https://works.bepress.com/rolando_garciamilian/1/

More information

Pfizerpedia Patents The Who, what when and why of patents. David Walsh, Andrew Berridge, RDMi, Pfizer, Sandwich

Pfizerpedia Patents The Who, what when and why of patents. David Walsh, Andrew Berridge, RDMi, Pfizer, Sandwich Pfizerpedia Patents The Who, what when and why of patents David Walsh, Andrew Berridge, RDMi, Pfizer, Sandwich We are Pfizer - We like Patents! Lipitor patent The best selling US prescription medicine

More information

When, Where & Why to Use NoSQL?

When, Where & Why to Use NoSQL? When, Where & Why to Use NoSQL? 1 Big data is becoming a big challenge for enterprises. Many organizations have built environments for transactional data with Relational Database Management Systems (RDBMS),

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

Third generation of Data Virtualization

Third generation of Data Virtualization White Paper Third generation of Data Virtualization Write back to the sources An Enterprise Enabler white paper from Stone Bond Technologies Copyright 2014 Stone Bond Technologies, L.P. All rights reserved.

More information

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems

Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems 1 Taming Structured And Unstructured Data With SAP HANA Running On VCE Vblock Systems The Defacto Choice For Convergence 2 ABSTRACT & SPEAKER BIO Dealing with enormous data growth is a key challenge for

More information

Big Linked Data ETL Benchmark on Cloud Commodity Hardware

Big Linked Data ETL Benchmark on Cloud Commodity Hardware Big Linked Data ETL Benchmark on Cloud Commodity Hardware iminds Ghent University Dieter De Witte, Laurens De Vocht, Ruben Verborgh, Erik Mannens, Rik Van de Walle Ontoforce Kenny Knecht, Filip Pattyn,

More information

Přehled novinek v SQL Server 2016

Přehled novinek v SQL Server 2016 Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing

More information

SureChem and ChEMBL. ACS CINF webinar. John P. Overington & Nicko Goncharoff

SureChem and ChEMBL. ACS CINF webinar. John P. Overington & Nicko Goncharoff SureChem and ChEMBL ACS CINF webinar John P. Overington & Nicko Goncharoff 8 th April 2014 Assay/Target ChEMBL Data for Drug Discovery 1. Scientific facts 3. Insight, tools and resources for translational

More information

Transitioning to Symyx

Transitioning to Symyx Whitepaper Transitioning to Symyx Notebook by Accelrys from Third-Party Electronic Lab Notebooks Ordinarily in a market with strong growth, vendors do not focus on competitive displacement of competitor

More information

Big Data Integration BIG DATA 9/15/2017. Business Performance

Big Data Integration BIG DATA 9/15/2017. Business Performance BIG DATA Business Performance Big Data Integration Big data is often about doing things that weren t widely possible because the technology was not advanced enough or the cost of doing so was prohibitive.

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Life Sciences Oracle Based Solutions. June 2004

Life Sciences Oracle Based Solutions. June 2004 Life Sciences Oracle Based Solutions June 2004 Overview of Accelrys Leading supplier of computation tools to the life science and informatics research community: Bioinformatics Cheminformatics Modeling/Simulation

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information

From Data to Knowledge: Semantics and Implementations

From Data to Knowledge: Semantics and Implementations PhUSE 2015 Paper TT04 From Data to Knowledge: Semantics and Implementations Judith Goud, Akana, Bennekom, The Netherlands ABSTRACT Biopharma organizations are collecting increasingly more data for regulatory

More information

Interoperability. Doug Fridsma, MD PhD President & CEO, AMIA

Interoperability. Doug Fridsma, MD PhD President & CEO, AMIA Interoperability Doug Fridsma, MD PhD President & CEO, AMIA Getting to Interoperability To get to interoperability (or to avoid information blocking) we need a common understanding of the problem Can t

More information

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si

Information Retrieval, Information Extraction, and Text Mining Applications for Biology. Slides by Suleyman Cetintas & Luo Si Information Retrieval, Information Extraction, and Text Mining Applications for Biology Slides by Suleyman Cetintas & Luo Si 1 Outline Introduction Overview of Literature Data Sources PubMed, HighWire

More information

Some useful resources. Data-mining

Some useful resources. Data-mining Some useful resources Data-mining Data Mining? Yeah, I could use a nap What We ll Discuss Why search Who should do it Sources National Library of Medicine Includes the NLM Gateway, PubMed, ClinicalTrials.gov,

More information

Query-Time JOIN for Active Intelligence Engine (AIE)

Query-Time JOIN for Active Intelligence Engine (AIE) Query-Time JOIN for Active Intelligence Engine (AIE) Ad hoc JOINing of Structured Data and Unstructured Content: An Attivio-Patented Breakthrough in Information- Centered Business Agility An Attivio Technology

More information

Qwam E-content Server (QES) : Software solutions for searching, monitoring, managing and sharing of professional web content

Qwam E-content Server (QES) : Software solutions for searching, monitoring, managing and sharing of professional web content Qwam E-content Server (QES) : Software solutions for searching, monitoring, managing and sharing of professional web content - Competitive, R&D and market intelligence solutions (Qwam Watch/KM Suite) -

More information

Digital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU

Digital Enterprise Platform for Live Business. Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU Digital Enterprise Platform for Live Business Kevin Liu SAP Greater China, Vice President General Manager of Big Data and Platform BU Rethinking the Future Competing in today s marketplace means leveraging

More information

DICOM Research Applications - life at the fringe of reality

DICOM Research Applications - life at the fringe of reality SPIE Medical Imaging 2009 DICOM Research Applications - life at the fringe of reality David Clunie RadPharm, Inc. Overview Range of research applications Clinical versus research context Commonalities

More information

SNOMED Clinical Terms

SNOMED Clinical Terms Representing clinical information using SNOMED Clinical Terms with different structural information models KR-MED 2008 - Phoenix David Markwell Laura Sato The Clinical Information Consultancy Ltd NHS Connecting

More information

An overview of Graph Categories and Graph Primitives

An overview of Graph Categories and Graph Primitives An overview of Graph Categories and Graph Primitives Dino Ienco (dino.ienco@irstea.fr) https://sites.google.com/site/dinoienco/ Topics I m interested in: Graph Database and Graph Data Mining Social Network

More information

ETL is No Longer King, Long Live SDD

ETL is No Longer King, Long Live SDD ETL is No Longer King, Long Live SDD How to Close the Loop from Discovery to Information () to Insights (Analytics) to Outcomes (Business Processes) A presentation by Brian McCalley of DXC Technology,

More information

Using Linked Data and taxonomies to create a quick-start smart thesaurus

Using Linked Data and taxonomies to create a quick-start smart thesaurus 7) MARJORIE HLAVA Using Linked Data and taxonomies to create a quick-start smart thesaurus 1. About the Case Organization The two current applications of this approach are a large scientific publisher

More information

Convergence of classical search and topic maps evidences from a practical case in the chemical industry

Convergence of classical search and topic maps evidences from a practical case in the chemical industry C H A I R S O F Information Systems Convergence of classical search and topic maps evidences from a practical case in the chemical industry Dr. Stefan Smolnik Information Systems 2, European Business School

More information

A chemistry friendly system integrating drug design tools and a consistent visual interface Xin Zhang, Christian Baber

A chemistry friendly system integrating drug design tools and a consistent visual interface Xin Zhang, Christian Baber Cubist Pharmaceuticals The Shape of Cures to Come A chemistry friendly system integrating drug design tools and a consistent visual interface Xin Zhang, Christian Baber Outline Introduction to drug metabolism

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Data Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens

Data Science and Open Source Software. Iraklis Varlamis Assistant Professor Harokopio University of Athens Data Science and Open Source Software Iraklis Varlamis Assistant Professor Harokopio University of Athens varlamis@hua.gr What is data science? 2 Why data science is important? More data (volume, variety,...)

More information

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Integrating Complex Financial Workflows in Oracle Database Xavier Lopez Seamus Hayes Oracle PolarLake, LTD 2 Copyright 2011, Oracle

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

No boundaries to news production.

No boundaries to news production. No boundaries to news production. NEWS KNOWS NO BOUNDARIES. NEITHER SHOULD YOUR NEWS PRODUCTION SYSTEM. AP ENPS is the one system for your entire news organization whether they are working in the field

More information

Applying Auto-Data Classification Techniques for Large Data Sets

Applying Auto-Data Classification Techniques for Large Data Sets SESSION ID: PDAC-W02 Applying Auto-Data Classification Techniques for Large Data Sets Anchit Arora Program Manager InfoSec, Cisco The proliferation of data and increase in complexity 1995 2006 2014 2020

More information

Sentiment Web Mining Architecture - Shahriar Movafaghi

Sentiment Web Mining Architecture - Shahriar Movafaghi Available online at www.sciencedirect.com Procedia - Social and Behavioral Sciences 26 (2011) 191 197 COINs 2010 Sentiment Web Mining Architecture - Shahriar Movafaghi Shahria Movafaghi a, Jack Bullock

More information

Contractual Approaches to Data Protection in Clinical Research Projects

Contractual Approaches to Data Protection in Clinical Research Projects Contractual Approaches to Data Protection in Clinical Research Projects EICAR, 24th Annual Conference Nürnberg, October 2016 Dr. jur. Marc Stauch Institute for Legal Informatics Leibniz Universität Hannover

More information

Progress DataDirect For Business Intelligence And Analytics Vendors

Progress DataDirect For Business Intelligence And Analytics Vendors Progress DataDirect For Business Intelligence And Analytics Vendors DATA SHEET FEATURES: Direction connection to a variety of SaaS and on-premises data sources via Progress DataDirect Hybrid Data Pipeline

More information

Next-Generation Standards Management with IHS Engineering Workbench

Next-Generation Standards Management with IHS Engineering Workbench ENGINEERING & PRODUCT DESIGN Next-Generation Standards Management with IHS Engineering Workbench The addition of standards management capabilities in IHS Engineering Workbench provides IHS Standards Expert

More information

Deploying, Managing and Reusing R Models in an Enterprise Environment

Deploying, Managing and Reusing R Models in an Enterprise Environment Deploying, Managing and Reusing R Models in an Enterprise Environment Making Data Science Accessible to a Wider Audience Lou Bajuk-Yorgan, Sr. Director, Product Management Streaming and Advanced Analytics

More information

Showing it all a new interface for finding all Norwegian research output

Showing it all a new interface for finding all Norwegian research output Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 00 (2014) 000 000 www.elsevier.com/locate/procedia CRIS 2014 Showing it all a new interface for finding all Norwegian research

More information

Big Data The end of Data Warehousing?

Big Data The end of Data Warehousing? Big Data The end of Data Warehousing? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Big data, data warehousing, advanced analytics, Hadoop, unstructured data Introduction If there was an Unwort

More information

TURNING TEXT INTO INSIGHT: TEXT MINING IN THE LIFE SCIENCES

TURNING TEXT INTO INSIGHT: TEXT MINING IN THE LIFE SCIENCES TURNING TEXT INTO INSIGHT: TEXT MINING IN THE LIFE SCIENCES According to The STM Report (2015), 2.5 million peer-reviewed articles are published in scholarly journals each year. 1 PubMed contains more

More information

AI Application and Development in ehealth Field. MIN Dong

AI Application and Development in ehealth Field. MIN Dong AI Application and Development in ehealth Field MIN Dong What s e-health? Defined by WHO ehealth is the cost-effective and secure use of information and communications technologies(icts) in support of

More information

Chapter 6: ISAR Systems: Functions and Design

Chapter 6: ISAR Systems: Functions and Design Chapter 6: ISAR Systems: Functions and Design Information Search And Retrieval is a system which allow end users to communicate with the system. Every one will use the ISAR system in a different way. Each

More information

Information Workbench

Information Workbench Information Workbench The Optique Technical Solution Christoph Pinkel, fluid Operations AG Optique: What is it, really? 3 Optique: End-user Access to Big Data 4 Optique: Scalable Access to Big Data 5 The

More information