Visual Analysis of Classification Scheme Veslava Osińska Institute of Information Science and Book Studies Nicolaus Copernicus University Toruń, Poland 1/26
Information Visualization (INFOVIS) Visualization is a process in which data, information and knowledge are transformed into a visual form exploiting people. Visualization is more than a method of computing. It is a process of transforming information into a visual form enabling the viewer to observe, browse, make sense, and understand the information. Information visualization concentrates on the use of computer-supported tools to explore large amount of abstract data. INSIGHT 2/26
Three objectives of InfoVis Science Domains structure s discovering Information Retrieval aid Data clusterization or classification 3/26
Infoviz - examples words clouds http://manyeyes.alphaworks.ibm.com/ Title words only from all abstracts accepted for the 2008 ONS Congress and 2008 Advanced Practice Nursing Conference 4/26
5/26 Tree maps
Visualization of English Wikipedia by Todd Holloway, Miran Bozicevic, Katy Börner datamining.typepad.com 6/26
http://mapofscience.com/ Mathematics, Physics Biotechnology Brain Research Chemistry Earth Sciences Health Professionals Computer Science Biology Social Sciences Engineering Medical Sciences Humanities Science mapping based on a combination of the bibliographic coupling of references and keyword vectors 7.2 million papers and over 16,000 separate journals, proceedings from a five year period: 2001-2005 7/26
Computing Classification System Association for Computing Machinery (ACM) 1964, 1968, 1998 A. General Literature B. Hardware C. Computer Systems Organization D. Software E. Data F. Theory of Computation G. Mathematics of Computing H. Information Systems I. Computing Methodologies J. Computer Applications K. Computing Milieux 8/26
Linear order A D C H?? D.2 D.3 C.1 C.3 C.4 H.5 H.6 K Visualization principle: similar objects are located closer one another, distinct far. How to define the similarity of tree s objects: (sub)classes? 9/26
Nonlinear Similarity Metrics Metadata of CCS documents: title author year primary class symbol aditional classes symbols keywords general terms subject descriptors analysis units As closer thematically two subclasses the more common articles they include. Topic similarity between classes is proportional to the number of common documents 10/26
Mapping Space continuos symmetric curved surface has more topological possibilities sphere is easy for navigation and retrieval processes 11/26
Similarity Matrix Documents metadata 2007 (N = 37 543 ) Dimension: 353 classes and subclassess number Normalization Symetrization N= 353 353 i= 1 j= 1 s ij S ij = 1 (i = j) and S ij <1 (i j) 12/26
Similarity Matrix cont. 13/26
MDS plot Phi = 0.25 r 2 =0.47 r ijk2 = x i2 +y j2 + z k 2 Morse potencial E s [( ) ] 1 2 1 ( ) ( ) r = D e b r R e 14/26
15/26 A.0 A.1 A.2 A.m B.0 B.1 B.1.0 B.1.1 B.1.2 B.1.3 B.1.4 B.1.5 B.2 B.2.0 B.2.1 B.2.2 B.2.4 B.3 B.3.0 B.3.1 B.3.2 B.3.3 B.3.m B.4 B.4.0 B.4.1 B.4.2 B.4.3 B.4.4 B.4.m B.5.1 B.5.2 B.5.3 B.6 B.6.0 B.6.1 B.6.3 B.6.m B.7 B.7.0 B.7.1 B.7.2 B.7.3 B.7.m B.8 B.8.0 B.8.1 B.8.2 B.8.m B.m C.0 C.1 C.1.0 C.1.1 C.1.2 C.1.3 C.1.4 C.1.m C.2 C.2.0 C.2.1 C.2.2 C.2.3 C.2.4 C.2.5 C.2.6 C.2.m C.3 C.4 C.5 C.5.0 C.5.1 C.5.2 C.5.3 C.5.4 C.5.5 C.5.m C.m D.0 D.1 D.1.0 D.1.1 D.1.2 D.1.3 D.1.4 D.1.5 D.1.6 D.1.7 D.1.m D.2 D.2.0 D.2.1 D.2.10 D.2.11 D.2.12 D.2.13 D.2.2 D.2.2 D.2.3 D.2.4 D.2.5 D.2.6 D.2.6 D.2.7 D.2.8 D.2.9 D.2.m D.3 D.3.0 D.3.1 D.3.2 D.3.3 D.3.4 D.3.m D.4 D.4.0 D.4.1 D.4.2 D.4.3 D.4.4 D.4.5 D.4.6 D.4.7 D.4.8 D.4.9 D.4.m D.m E.0 E.1 E.2 E.3 E.4 E.5 E.m F.0 F.1.0 F.1.1 F.1.2 F.1.3 F.1.m F.2 F.2.0 F.2.1 F.2.2 F.2.3 F.2.m F.3 F.3.0 F.3.1 F.3.2 F.3.3 F.4 F.4.0 F.4.1 F.4.2 F.4.3 F.4.m F.m G.0 G.1 G.1.0 G.1.1 G.1.10 G.1.2 G.1.3 G.1.4 G.1.5 G.1.6 G.1.7 G.1.8 G.1.9 G.1.m G.2 G.2.0 G.2.1 G.2.2 G.2.3 G.2.m G.3 G.4 G.m H. H.0 H.1 H.1.0 H.1.1 H.1.2 H.1.m H.2 H.2.0 H.2.1 H.2.2 H.2.3 H.2.4 H.2.5 H.2.6 H.2.7 H.2.8 H.2.m H.3 H.3.0 H.3.1 H.3.2 H.3.3 H.3.4 H.3.5 H.3.6 H.3.7 H.3.m H.4 H.4.0 H.4.1 H.4.2 H.4.3 H.4.m H.5 H.5.0 H.5.1 H.5.2 H.5.3 H.5.3 H.5.4 H.5.5 H.5.m H.m I.0 I.1 I.1.0 I.1.1 I.1.2 I.1.3 I.1.4 I.2 I.2.0 I.2.1 I.2.10 I.2.11 I.2.2 I.2.3 I.2.4 I.2.5 I.2.6 I.2.7 I.2.8 I.2.9 I.2.m I.3 I.3.0 I.3.1 I.3.2 I.3.3 I.3.4 I.3.5 I.3.6 I.3.7 I.3.8 I.3.m I.4 I.4.0 I.4.1 I.4.10 I.4.2 I.4.3 I.4.4 I.4.5 I.4.6 I.4.7 I.4.8 I.4.9 I.4.m I.5 I.5.0 I.5.1 I.5.2 I.5.3 I.5.4 I.5.5 I.5.m I.6 I.6.0 I.6.1 I.6.2 I.6.3 I.6.4 I.6.5 I.6.6 I.6.7 I.6.8 I.6.m I.7 I.7.1 I.7.2 I.7.4 I.7.5 I.7.m I.m J.0 J.1 J.2 J.3 J.4 J.5 J.6 J.7 J.m K.0 K.1 K.2 K.3 K.3.0 K.3.1 K.3.2 K.3.m K.4 K.4.0 K.4.1 K.4.2 K.4.3 K.4.4 K.4.m K.5 K.5.0 K.5.1 K.5.2 K.5.m K.6 K.6.0 K.6.1 K.6.2 K.6.3 K.6.4 K.6.5 K.6.m K.7 K.7.0 K.7.1 K.7.2 K.7.3 K.7.4 K.7.m K.8 K.8.0 K.8.1 K.8.2 K.8.3 K.8.m K.m 0 0,1 0,2 0,3 0,4 0,5 0,6 r 2 value The distribution r 2 r 2 =0.47
Classes visualization Atributes: 1. main class (color) 11 2. level 1,2,3 (luminosity) 3. population (size) 16/26
Documents Visualization principle Classifications weigths Main: Additional 0.6:0.4 0.7:0.3 Pr 0.5:0.5 Ad1 Ad2 17/26
Visualization sphere 18/26
Cartographic projection of sphere surface 19/26
20/26
21/26 Keywords Map Semantic map of Universum
Evolution of CCS Scheme 40000 30000 20000 10000 0 1968 1978 1988 1998 2007 22/26
Evolution of CCS Scheme (2) Longitudinal mapping 40000 30000 20000 10000 1968 1978 0 1968 1978 1988 1998 2007 1988 1998 23/26 2007
Evolution of CCS Scheme - conclusion In 80th - systematic of 4 basic categories: software, hardware, Computer Systems Organization and Information systems. In 90th - clusterization Classification evolves towards stronger adaptation CCS scheme in Digital Library. 24/26
Summary Metadata gathering Data extraction Data visualization Data analysis on the basis of: cartographic projections; fractal dimentsion; Implementations: in keywords mapping; in CCS Scheme evaluation; in CCS Scheme and CS domain evolution research 25/26
Conslusion Innovation - First visualization of: classification and universe, thematic categories of single domain, cross-historical of CS, use of own methods: sphere surface, cartographic projections, graphic filters, fractals and feedback mapping (one map verifies another). Potencial of method: use for automatic generation and evaluation of classification schemes analysis of inter- and multidisciplinarity Forecasting Large Trends in Science 26/26
Classifications mapping Instead textual mapping -> topology transformation? 27/26
Thank you for attention 28/26