Innovation Typology. Collaborative Authoritativeness. Focused Web Mining. Text and Data Mining In Innovation. Generational Models

Transcription:

Text and Data Mining in Innovation
Joseph Engler

Innovation Typology: Generational Models
1. Linear or Push (Baroque)
2. Pull (Romantic)
3. Cyclic (Classical)
4. Strategic (New Age)
5. Collaborative (Polyphonic)

Collaborative Authoritativeness
- Focused Web Mining for Papers/Articles
- Creation of Authoritativeness Matrix
- Apply Authoritativeness Metric
- Document Clustering

Focused Web Mining
- Standard Web Crawling Methodologies
- Download of Query Specific Files
- Forms the repository from which Text Mining takes place

Authoritativeness Matrix
- Authors' names are parsed from the documents
- Cited references are parsed from the documents
- The publication date is parsed from the document

Authoritativeness Matrix

Author       Howt, P.  Benkler, Y.  Stokic, D.  Von Hippel, E.  Nolan, R.L.  Koza, J.R.  Document Year
Kusiak, A    1         1            0           1               1            1           2006
Lin, G.      0         1            0           0               1            1           2004
Stokic, D.   1         0            1           0               0            1           1999

Parsing of Names
- Heuristics: name should be the first non-empty line after the title
- Regular Expressions
    ^\w+\s\w[.]\s*\w+\s
    \w+\s\w*[.]*\s*\w+\s[a][n][d]
- Names Database with Dice Coefficient

Dice Coefficient
- Create bigrams of the two words being compared
    Night = {ni, ig, gh, ht} = X
    Nacht = {na, ac, ch, ht} = Y
- Calculate similarity
    Dice_{Coef} = \frac{2\,|X \cap Y|}{|X| + |Y|}
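The Dice coefficient on character bigrams can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names `bigrams` and `dice_coefficient` are made up for the example, and bigrams are treated as a set (duplicates collapse), which is an assumption the slide does not state either way.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in a word (lowercased)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice_coefficient(a, b):
    """Dice similarity 2|X n Y| / (|X| + |Y|) over character bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Two words too short to have bigrams: treat as no evidence of similarity.
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Night -> {ni, ig, gh, ht}, Nacht -> {na, ac, ch, ht}; only "ht" is shared.
print(dice_coefficient("Night", "Nacht"))  # 2 * 1 / (4 + 4) = 0.25
```

In a names database, a parsed name could be matched to the stored name with the highest coefficient, with some cutoff below which it is treated as a new author; the cutoff value would have to be chosen empirically.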

Authoritativeness Metric
- Scan Authoritativeness Matrix
- Create hash of authors (row)
- Create hash of referenced authors (column)
- Create hash of average age of document for each author
- Authors hash (rows) measures out-links
- Reference hash (columns) measures in-links

Authoritativeness Metric Cont.
- Calculate the initial authoritativeness for each author and referenced author:
    A = \ln\left(\lambda^{t}\,out + in\right)
- out is the number of out-links
- in is the number of in-links
- \lambda is a user-defined weight parameter of document age in [0,1]
- t is the average age of the document for the author

Authoritativeness Boosting
- Similar to the PageRank algorithm
- Iterative approach
- If an authoritative author references a paper, the in-link to that reference is increased
- In-links of less authoritative authors pose no detriment

In-Link Boosting
- Calculate the mean of the in-links:
    \overline{in} = \frac{1}{N}\sum_{j=1}^{N} e^{A_j}
- Update the Authoritativeness Metric:
    in' = \sum_{j=1}^{N} \begin{cases} e^{A_j} & \text{if } e^{A_j} > \overline{in} \\ 1 & \text{if } e^{A_j} \le \overline{in} \end{cases}
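The two link hashes described above can be sketched directly from the example matrix: row sums give each author's out-links and column sums give each cited author's in-links. This is only an illustration under assumptions; the dictionary layout and variable names are hypothetical, and the age hash and the boosting step are left out.

```python
# Example matrix from the slides: rows are document authors,
# columns are cited authors (1 = the document cites that author).
cited = ["Howt, P.", "Benkler, Y.", "Stokic, D.",
         "Von Hippel, E.", "Nolan, R.L.", "Koza, J.R."]
matrix = {
    "Kusiak, A":  [1, 1, 0, 1, 1, 1],
    "Lin, G.":    [0, 1, 0, 0, 1, 1],
    "Stokic, D.": [1, 0, 1, 0, 0, 1],
}

# Authors hash (rows): number of out-links per document author.
out_links = {author: sum(row) for author, row in matrix.items()}

# Reference hash (columns): number of in-links per cited author.
in_links = {name: sum(row[j] for row in matrix.values())
            for j, name in enumerate(cited)}

print(out_links)
print(in_links)
```

Plain dictionaries are used here because the slides speak of hashes keyed by author; for a corpus of several hundred articles a sparse matrix would serve equally well.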

Determining Authoritativeness
- Order the authors by authoritativeness
- Select top K authors
- Cluster the documents
  - Find cluster closest to current issue
  - Find most authoritative k authors for that issue
  - Authors that are authoritative overall may not be authoritative on a specific topic
- Possible application of the Apriori Principle

Experimental Results
- 945 articles on Genetic Algorithms
- Unclustered, to determine overall authority
- 30 iterations of In-Link Boosting
- \lambda set to 1 to not discount older authoritativeness

Experimental Results Cont.

Author          Original Authoritativeness   Boosted Authoritativeness
J. H. Holland   5.278                        6.2122
J. R. Koza      4.8828                       5.8105
F. H. Bennett   4.5849                       5.1350
L. Altenberg    4.3438                       4.4306
D. Andre        3.9512                       4.0105

Note the minimal boost of the last two authors.

Cyclic Innovation & Data Mining
- Mining of Requirements
- Creation of Requirements Database
- Construction of Requirements Tree

Web Mining for Requirements
- Source of Requirements
  - Blogs
  - User Reviews
  - Expert Reviews
  - Patent Databases
  - Trade Journals
  - Stock Market Analysis (tricky at best)

Filtering Requirements
- Moaners and Praisers
  - "I hate this MP3 player and would never buy a product from this company again."
  - "I just love Microsoft and everything they produce. It is all bug free."
- Attempt to assign a measure of success to the requirement
- Identify historic issues versus new requirements

Requirements Database
- Transactional Database of Sorts

Product            Smaller     Increased   Interface     Increased   Non-interfering
                   Footprint   Bandwidth   with Stylus   RPMs        Legs
Workstation Desk   1           0           0             0           1
Turret Lathe       1           0           0             1           1
Smartphone         1           1           1             0           0

Abstraction of Database
- Utilize Multidimensional Cubes
- Ability to roll up or drill down, similar to OLAP
- Increased choices in levels of abstraction

Mining Frequent Requirements
- Select the product/service type to mine the requirements for
  - iPod (includes all MP3 players)
  - On-line Tax Service (includes all on-line)
- Utilize a Market Basket type of analysis
  - Apriori Algorithm
  - FP-Growth

Mining Frequent Requirements
- Discover Frequent Itemsets
  - Itemset can be considered as a conjunction of items: A \wedge B
  - Itemset can be considered as a predicate: A \Rightarrow B

Frequent Itemset Metrics
- Support
    sup(A \Rightarrow B) = \frac{\text{number of tuples containing both } A \text{ and } B}{\text{total number of tuples}}
- Confidence
    conf(A \Rightarrow B) = \frac{\text{number of tuples containing both } A \text{ and } B}{\text{number of tuples containing } A}

Frequent Itemset Generation
1. Scan the database for frequent 1-item itemsets.
2. Remove those items that have a support value less than a given threshold.
3. Join the remaining frequent items to form 2-item itemsets.
4. Repeat steps 2 and 3, incrementing the itemset size each time, until there are no itemsets left to join.
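The support and confidence metrics above can be sketched as follows, using the three rows of the requirements database as transactions. The function names and the set-per-row encoding are assumptions made for this illustration, not part of the original slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


# One set of requirements per product row (Desk, Lathe, Smartphone).
transactions = [
    {"Smaller Footprint", "Non-interfering Legs"},
    {"Smaller Footprint", "Increased RPMs", "Non-interfering Legs"},
    {"Smaller Footprint", "Increased Bandwidth", "Interface with Stylus"},
]

print(support(transactions, {"Smaller Footprint", "Non-interfering Legs"}))
print(confidence(transactions, {"Non-interfering Legs"}, {"Smaller Footprint"}))
```

Support is symmetric in A and B while confidence is not, which is why the rule direction matters when frequent itemsets are later read as predicates.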

Frequent Itemset Example
- Workbench
  - Close to lathe and desk
  - Items with legs and a top with stability
- Mine the frequent requirements from our Requirements Database previously shown

Frequent Itemset Example

Product            Smaller     Increased   Interface     Increased   Non-interfering
                   Footprint   Bandwidth   with Stylus   RPMs        Legs
Workstation Desk   1           0           0             0           1
Turret Lathe       1           0           0             1           1
Smartphone         1           1           1             0           0

Choose items of similar abstraction to mine from.

Frequent Itemset Example

Itemset                 Support
Smaller Footprint       2
Increased Bandwidth     0
Interface with Stylus   0
Increased RPMs          1
Non-interfering Legs    2

Set support = 2.

Frequent Itemset Example

Itemset                                      Support
Smaller Footprint and Non-interfering Legs   2

There remain no further itemsets to join.
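The worked example can be reproduced with a small level-wise (Apriori-style) sketch. It assumes, as the support counts in the example suggest, that only the Workstation Desk and Turret Lathe rows are mined and that support is an absolute count with a threshold of 2. The function name `frequent_itemsets` is hypothetical, and the join step is simplified compared with a full Apriori implementation (no subset pruning).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Level-wise search: count candidates, drop those under the threshold,
    join survivors into candidates one item larger, repeat until none remain."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    result = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        size += 1
        # Join step: unions of surviving itemsets that have exactly `size` items.
        candidates = list({a | b for a, b in combinations(frequent, 2)
                           if len(a | b) == size})
    return result


# Rows of similar abstraction chosen for the workbench example.
desk = {"Smaller Footprint", "Non-interfering Legs"}
lathe = {"Smaller Footprint", "Increased RPMs", "Non-interfering Legs"}

for itemset, count in frequent_itemsets([desk, lathe], min_count=2).items():
    print(sorted(itemset), count)
```

Requirements that never occur in the chosen rows (Increased Bandwidth, Interface with Stylus) simply do not appear as candidates, which matches their support of 0 in the table.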

Build Requirements Tree
- Built from frequent itemsets