Hebei University of Technology A Text-Mining-based Patent Analysis in Product Innovative Process

Similar documents
Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels

Automatic classification of patents oriented to TRIZ: a case study on large aperture optical elements

INTRODUCTION TO DATA MINING

WEKA homepage.

Text Mining. Representation of Text Documents

String Vector based KNN for Text Categorization

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Document Clustering: Comparison of Similarity Measures

Mining Web Data. Lijun Zhang

Prior Art Retrieval Using Various Patent Document Fields Contents

Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering

[Gidhane* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

AN EFFECTIVE INFORMATION RETRIEVAL FOR AMBIGUOUS QUERY

Letter Pair Similarity Classification and URL Ranking Based on Feedback Approach

Patent Classification Codes Made Easy

KEYWORD EXTRACTION FROM DESKTOP USING TEXT MINING TECHNIQUES

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Overview of Web Mining Techniques and its Application towards Web

Patent Image Retrieval

Integrating Text Mining with Image Processing

Data Mining of Web Access Logs Using Classification Techniques

Statistical Learning and Data Mining CS 363D/ SSC 358

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

Encoding Words into String Vectors for Word Categorization

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization

Effective Pattern Identification Approach for Text Mining

Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web

Mining Social Media Users Interest

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Ontology Based Search Engine

Non-trivial extraction of implicit, previously unknown and potentially useful information from data

Chapter 6: Information Retrieval and Web Search. An introduction

Derwent Innovations Index

CHAPTER 3 ASSOCIATON RULE BASED CLUSTERING

Clustering Analysis based on Data Mining Applications Xuedong Fan

June 15, Abstract. 2. Methodology and Considerations. 1. Introduction

Web Information Retrieval using WordNet

TIRA: Text based Information Retrieval Architecture

Image Classification Using Text Mining and Feature Clustering (Text Document and Image Categorization Using Fuzzy Similarity Based Feature Clustering)

Design and Realization of Data Mining System based on Web HE Defu1, a

A Comparative Study of Selected Classification Algorithms of Data Mining

Datasets Size: Effect on Clustering Results

ktmine Patent Search User Guide Table of Contents

Chapter 27 Introduction to Information Retrieval and Web Search

R07. FirstRanker. 7. a) What is text mining? Describe about basic measures for text retrieval. b) Briefly describe document cluster analysis.

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

Best Practices to Ensure Comprehensive Prior Art Searches

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

ISSN: [Sugumar * et al., 7(4): April, 2018] Impact Factor: 5.164

International Journal of Computer Engineering and Applications, Volume VIII, Issue III, Part I, December 14

PROBLEM FORMULATION AND RESEARCH METHODOLOGY

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

International ejournals

A Comparison of Three Document Clustering Algorithms: TreeCluster, Word Intersection GQF, and Word Intersection Hierarchical Agglomerative Clustering

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

An Increasing Efficiency of Pre-processing using APOST Stemmer Algorithm for Information Retrieval

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Designing and Building an Automatic Information Retrieval System for Handling the Arabic Data

MapReduce: Simplified Data Processing on Large Clusters. By Stephen Cardina

Derwent Innovations Index

Most keyword search tools would place a capacitive touch panel further down the results list.

Study of Data Mining Algorithm in Social Network Analysis

Adaptive and Personalized System for Semantic Web Mining

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Chapter 3 - Text. Management and Retrieval

USPTO INVENTOR DISAMBIGUATION

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

Knowledge Discovery and Data Mining 1 (VO) ( )

Seminars of Software and Services for the Information Society

Introduction to Programming Languages and Compilers. CS164 11:00-12:30 TT 10 Evans. UPRM ICOM 4029 (Adapted from: Prof. Necula UCB CS 164)

Creating Time-Varying Fuzzy Control Rules Based on Data Mining

D B M G Data Base and Data Mining Group of Politecnico di Torino

Information Retrieval

Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm

Review on Text Mining

Question Answering Systems

Mining Association Rules in Temporal Document Collections

SAACO: Semantic Analysis based Ant Colony Optimization Algorithm for Efficient Text Document Clustering

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

Tagonto. Tagonto Project is an attempt of nearing two far worlds Tag based systems. Almost completely unstructured and semantically empty

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

Web search results clustering in Polish: experimental evaluation of Carrot

Enriching knowledge graphs with text processing techniques

Inventions on auto-configurable GUI-A TRIZ based analysis

On-line glossary compilation

Information Extraction Techniques in Terrorism Surveillance

A Survey On Different Text Clustering Techniques For Patent Analysis

Prior Art Research Overview

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

1DL321: Kompilatorteknik I (Compiler Design 1)

University of Florida CISE department Gator Engineering. Data Preprocessing. Dr. Sanjay Ranka

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Text Analytics (Text Mining)

Mining Relations from Git Repositories

Transcription:

A Text-Mining-based Patent Analysis in Product Innovative Process Liang Yanhong, Tan Runhua

Abstract Hebei University of Technology Patent documents contain important technical knowledge and research results. They have high quality information to inspire designers in product development. However, they are lengthy and have much noisy results such that it takes a lot of human efforts for analysis. And due to the fact that hidden and unanticipated information plays a dominant role for TRIZ user, it is difficult to discern manually, thus, patent analysis has long been considered useful in product innovative process. Automatic tools for assisting innovators and patent engineers in obtaining useful information from patent documents are in great demand. In TRIZ theory, a product design problem can be considered as one or several Contradictions and Inventive Principles. Text mining could be used to analyze these textual documents and extract useful information from large amount documents quickly and automatically. In this paper, a computer-aided approach for extracting useful information from patent documents according to TRIZ Inventive Principles is proposed. 2

outline Introduction Theoretical background Patent classification and characteristics of patent document Text mining A text-mining-based methodology to patent analysis Text preprocessing Text transformation and feature selection Pattern discovery Conclusion and future work 3

Introduction According to TRIZ, a significant operation of product innovation is to solve design contradictions. When contradictions and Inventive Principles are defined, product is developed by referring to the analogous inventions not only in related fields but also dissimilar problems in other fields that have previously solved the same contradiction. Patent documents contain important research results that are valuable for product innovation. To obtain useful information, the method by scanning or reading the patent documents manually is a rather trivial and timeconsuming task. Text mining, which specialized for full-text patent analysis, can be applied to derive information from patent documents. Thus, automatic tools of text mining in patent analysis for assisting innovators or patent engineers are in great demand. 4

Theoretical background Patent classification Patent documents are divided into different areas according to technology fields is helpful to search the prior art for traditional inventors. However, it is inadequate and noisy for TRIZ users since TRIZ users are interested in previous patents that have solved the same Contradiction and used the same Inventive Principles, which may come from different fields. If patent documents can be classified according to Inventive Principles and Contradiction, they will provide a broader view for TRIZ users and TRIZ software developers, by helping them find possible inspiration from different fields. 5

Characteristics of patent document A patent document contains dozens of items for analysis. Some are structured -- patent number, filing date, assignees. Some are unstructured -- title, claims, abstract, descriptions. Usually the description of the invention can be further segmented into field of the invention, background, summary, and detailed description. The mass of patent document is made of unstructured text, text mining techniques can be applied for patent analysis. 6

Text mining Text mining is a rather new technique that has been proposed to perform knowledge discovery from collections of unstructured text. Like data mining or knowledge discovery, text mining is often regarded as a process to find implicit, previously unknown, and potentially useful regularities in large textual datasets. Since it is suitable for drawing valuable information from large volumes of unstructured text, text mining has been widely adopted to explore the complex relationship among patent documents. With the application of text mining an effective means can be provided for content searches in the textual fields of patent documents. 7

A text-mining-based methodology to patent analysis The overall process of conducting text-mining-based patent analysis goes through several steps. First of all, text collection and text preprocessing are the preliminary step. Second, raw patent documents are transformed into structured data. Text mining that extracts keywords and clue words from patent document is used to this end. In relation to patent analysis, text mining is used as a data processing and information-extracting tool. 8

Text mining process of patent analysis 9

Text preprocessing Usually the abstracts and descriptions provided enough semantic information to determine TRIZ Inventive Principles that the patents used. Therefore the abstracts and descriptions are extracted from the full text and parsed into independent sentence. It is implemented using regular expressions in Perl. 10

Text transformation and feature selection The document is first split into a series of words. Adjectives, adverbs, nouns and multi-word are extracted from the document. word frequency and inverse document frequency are two parameters used in filtering terms. Low TF and DF terms are often removed from the indexing of patent documents. To better match concepts among terms, words are stemmed based on Porter s algorithm. It contains keywords, title words, and clue words. By repeatedly merging back nearby words based on three simple merging, dropping, and accepting rules, Maximally repeated strings in the text are extracted as keyword candidates. The title words are those non-stop words that occur in the title of a patent document. The clue words are a list of words that reveal the intent, functions, purposes, or improvements of the patent. 11

the keyword extraction algorithm 12

Pattern discovery A keyword search is helpful to the innovator but sometimes the useful patents maybe be neglected since the same Inventive Principles to solve the problem come from different fields. Therefore there is a need to accurately return records that might exhibit similar problems and causes to innovators. Under these circumstances, clustering might be a possible tool that could be employed. 13

Conclusion and future work An automated text mining system will be developed using Perl and WEKA software. In addition to TRIZ Incentive Principles, Contradictions would also be taken into consideration. 14

Thank you! Institute of Design for Innovation 2007-10-06