Visualization and text mining of patent and non-patent data

Similar documents
Part I: Data Mining Foundations

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

ECLT 5810 Clustering

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

ECLT 5810 Clustering

Knowledge Discovery and Data Mining 1 (VO) ( )

Table Of Contents: xix Foreword to Second Edition

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities

PARALLEL CLASSIFICATION ALGORITHMS

Gene Clustering & Classification

Robot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning

Bruno Martins. 1 st Semester 2012/2013

Multiple-Choice Questionnaire Group C

Data Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

Keyword Extraction by KNN considering Similarity among Features

Python With Data Science

TABLE OF CONTENTS CHAPTER NO. TITLE PAGENO. LIST OF TABLES LIST OF FIGURES LIST OF ABRIVATION

Clustering Analysis Basics

KMX Patent Analytics

Introduction to Web Clustering

Data Mining Concepts. Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech

Clustering & Classification (chapter 15)

Data Mining: STATISTICA

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Text Mining. Representation of Text Documents

MATRIX BASED SEQUENTIAL INDEXING TECHNIQUE FOR VIDEO DATA MINING

Contents. Preface to the Second Edition

Natural Language Processing

Intro to Artificial Intelligence

Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

Machine Learning Part 1

Chapter 27 Introduction to Information Retrieval and Web Search

SCHEME OF COURSE WORK. Data Warehousing and Data mining

Interoperability First Published On: Last Updated On:

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Ajloun National University

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

Machine Learning using MapReduce

Customer Clustering using RFM analysis

Specialist ICT Learning

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

Oracle9i Data Mining. Data Sheet August 2002

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

DATA WAREHOUING UNIT I

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

String Vector based KNN for Text Categorization

Clustering and Visualisation of Data

Chapter 1, Introduction

Gábor Imre MADFAST SIMILARITY SEARCH

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

NUS-I2R: Learning a Combined System for Entity Linking

Supervised Reranking for Web Image Search

ADaM version 4.0 (Eagle) Tutorial Information Technology and Systems Center University of Alabama in Huntsville

Dynamic Visualization of Hubs and Authorities during Web Search

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

Table of Contents 1 Introduction A Declarative Approach to Entity Resolution... 17

Mining Web Data. Lijun Zhang

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Contextual Search using Cognitive Discovery Capabilities

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

Correlation Based Feature Selection with Irrelevant Feature Removal

PATTERN CLASSIFICATION AND SCENE ANALYSIS

Ubiquitous Computing and Communication Journal (ISSN )

Unsupervised Data Mining: Clustering. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Query Phrase Expansion using Wikipedia for Patent Class Search

DATA MINING TEST 2 INSTRUCTIONS: this test consists of 4 questions you may attempt all questions. maximum marks = 100 bonus marks available = 10

MACHINE LEARNING FOR SOFTWARE MAINTAINABILITY

Clustering Analysis based on Data Mining Applications Xuedong Fan

Analyzer of Bio-resource Citations. World Data Center of Microorganisms(WDCM)

Best Practices to Ensure Comprehensive Prior Art Searches

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Big Data Analytics! Special Topics for Computer Science CSE CSE Feb 9

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Answer All Questions. All Questions Carry Equal Marks. Time: 20 Min. Marks: 10.

QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data.

Stat 602X Exam 2 Spring 2011

1.2 What Spotlight and Strata users can expect

Creating a Classifier for a Focused Web Crawler

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

Data Mining Concepts & Tasks

DIGIT.B4 Big Data PoC

TABLE OF CONTENTS PAGE TITLE NO.

Introduction to Data Mining and Data Analytics

KMX Analytics Documentation

D B M G Data Base and Data Mining Group of Politecnico di Torino

MIT Samberg Center Cambridge, MA, USA. May 30 th June 2 nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

2. Basic Task of Pattern Classification

Social Behavior Prediction Through Reality Mining

Transcription:

of patent and non-patent data Anton Heijs Information Solutions Delft, The Netherlands http://www.treparel.com/ ICIC conference, Nice, France, 2008

Outline Introduction Applications on patent and non-patent literature Patent mining and visualization Mining and visualization Medline Conclusions and future development

Introduction Mining + visualization = visual analytics More data - less insight Overview needed Text mining approaches controlling over/under-fit Rule based : difficult to control under fit Natural language processing : difficult to control over fit Machine learning : controls over-fit and under-fit

Document clustering : finding patterns unsupervised Unsupervised text mining : clustering Input is a vector representation of all documents Separate the data in its natural groups based on a similarity metric Output is a similarity matrix

Document classification : finding patterns supervised Supervised text mining : classification Input is a vector representation of all documents Determine a hyper plane which separates all the data Binary classification : classify to 1 class Multi class classification : classify to multiple classes Compound classification : combine multiple binary classifiers Output is a matrix with classification scores

Text analytics Text analytics is text mining and visualization Combined use of mining and visualization for analysis Uses text mining to determine trends in the data Use visualization to make complex data understandable

Applications on patent and non-patent literature Text mining and visualization steps management and preprocessing data text mining (filtering the data in the pipeline) mapping filtered data to geometry rendering and interaction

Patent analytics applications Patent search using classification and clustering algorithms Patent landscaping, spotting new opportunities Patent ranking, utilization optimization and valuation

Search by querie by class

Tree map visualization of the patent classification

Clustering of patents of 7 classes in Optical Recording

Clustering of large patent sets

Building a classifier

Testing a classifier

Visualization of the classifier performance

Patent analytics application combining mining and visualization

Correlation between patents

Patents over time

Patents visualization hierarchically, supervised, unsupervised and over time

Text mining and visualizations applications on Medline Search for facts and relationships (using classification and clustering) Disambiguation (using classification) Named entity recognition (using classification) Concept and relationships discovery Knowledge discovery by integrating text mining and visualization with Data mining : mining table data from experiments Image mining : micro array analysis Graph mining : path way analysis

Mining and visualization Medline

Search Medline by querie

Classification of Medline documents to multiple topics

Visualization of trends in Medline documents over time

Conclusions Combination of text mining and visualization algorithms enables more application solutions Multiple application solutions are a combination of a small set of text mining algorithms Multiple application solutions require multiple coupled view visualizations to provide complete

Future development Patent analytics will include patent ranking and valuation Text mining will integrate with data mining, for instance for patent portfolio management Stand alone applications and applications with work flow support will advance

Trends Patterns Relationships http://www.treparel.com/ Enabling you to learn more and see more