Analysis of Documents Clustering Using Sampled Agglomerative Technique
Omar H. Karam, Ahmed M. Hamad, and Sherin M. Moussa

Abstract: In this paper, a clustering algorithm for documents is proposed that adapts a sampling-based pruning strategy to simplify hierarchical clustering. The algorithm can be applied to any text document data set whose entries can be embedded in a high-dimensional Euclidean space in which every document is a vector of real numbers. This paper presents the results of an experimental study of the proposed document clustering technique. The performance of the method is illustrated in terms of the quality of the clusters.

Index Terms: Agglomerative clustering, document clustering, and k-means clustering.

Omar H. Karam, Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 566, Egypt (e-mail: ohkaram@hotmail.com).
Ahmed M. Hamad, Professor of Computer Systems, Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 566, Egypt (e-mail: amhamad@yahoo.com).
Sherin M. Moussa, Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, 566, Egypt (e-mail: shero@usa.com).

I. INTRODUCTION

Document clustering has been extensively investigated as a methodology for improving document search and retrieval. The general assumption is that mutually similar documents will tend to be relevant to the same queries and, hence, that automatic determination of groups of such documents can improve recall and precision by effectively broadening a search request []. Many techniques are used for document clustering [],[5],[6],[7]. The two main approaches are agglomerative hierarchical clustering and k-means clustering.

k-means is a partitional clustering technique based on the idea that a center point (centroid), the average of the data points grouped in one cluster, can represent that cluster. Given the number of required clusters (k) and k initial centroids, documents are clustered by assigning each document to the cluster with the nearest centroid, where nearness is measured by a distance function between the document and each of the k centroids. A new centroid is then recomputed for each newly created cluster, and the whole process is repeated until there is no change in the k centroids or the detected change is less than a predetermined threshold.

On the other hand, the agglomerative clustering algorithm is a hierarchical technique that produces a nested sequence of partitions, with a single, all-inclusive cluster at the top and singleton clusters of individual points at the bottom [],[9]. It starts with each document as an individual cluster and, at each iteration, merges the most similar (closest) pair of clusters until one cluster is obtained. Therefore, clustering should stop before the single cluster is reached, depending on a specified condition such as the number of clusters required or the quality of the clusters to obtain. Agglomerative hierarchical clustering is often portrayed as better than k-means, although slower, and as particularly well suited to building document hierarchies [],[],[],[8].

The traditional agglomerative hierarchical clustering procedure is summarized as follows []:
1. Compute the similarity between all pairs of clusters, i.e., calculate a similarity matrix whose ij-th entry gives the similarity between the i-th and j-th clusters.
2. Merge the most similar (closest) two clusters.
3. Update the similarity matrix to reflect the pairwise similarity between the new cluster and the original clusters.
4. Repeat steps 2 and 3 until only a single cluster remains.

There are different hierarchical clustering algorithms [],[],[8]. The only real difference between these schemes is how they choose which clusters to merge, i.e., how they define cluster similarity. Examples are the Intra-Cluster Similarity Technique (IST), the Centroid Similarity Technique (CST), and the Unweighted Pair Group Method with Arithmetic mean (UPGMA).

Among the several obstacles to be faced in text mining are the amount of preprocessing that has to be applied to a document and the mathematical representation of the documents [4]. In this paper, an agglomerative technique is applied to document clustering that adapts a sampling-based pruning strategy to simplify hierarchical clustering. The algorithm can be applied to any text document data set whose entries can be embedded in a high-dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The proposed algorithm is followed by an experimental study in which its parameters are varied and their effect on the resulting cluster quality is observed.
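The traditional four-step agglomerative procedure listed earlier in this section can be illustrated with a minimal sketch. It assumes centroid similarity with Euclidean distance, and for brevity it recomputes all centroid distances in every round instead of incrementally updating a similarity matrix. The function names and the `target_clusters` stopping parameter are hypothetical and do not come from the paper.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(points):
    """Average element of a cluster: the component-wise mean of its vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def agglomerate(vectors, target_clusters=1):
    """Merge the closest pair of clusters until `target_clusters` remain."""
    # Start with every document as its own singleton cluster.
    clusters = [[list(v)] for v in vectors]
    while len(clusters) > max(target_clusters, 1):
        centroids = [centroid(c) for c in clusters]
        # Steps 1 and 3: (re)compute the distance between every pair of centroids.
        best_pair, best_distance = None, float("inf")
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centroids[a], centroids[b])
                if d < best_distance:
                    best_pair, best_distance = (a, b), d
        # Step 2: merge the closest pair into one cluster.
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Illustrative data only: two well-separated groups of two points each.
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
result = agglomerate(docs, target_clusters=2)
```

Stopping at `target_clusters` reflects the remark above that clustering should stop before the single all-inclusive cluster is reached; a quality-based stopping condition would replace the loop test.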
This paper is organized as follows. Section 2 discusses the preparation of text documents for clustering. Section 3 explains the method used in our system for document clustering, and Section 4 presents the results of experiments evaluating the performance of the system. Finally, Section 5 concludes the paper.

II. DOCUMENTS PREPARATION

In order to apply any clustering algorithm to text documents, some steps should be followed [],[],[4]. These are:

a) Preprocessing
Initially, very common words (i.e., prepositions and non-content-bearing words, often known as stop words) are removed completely from the documents. In addition, stemming is applied, whereby different forms of a word are reduced to a single stem form. In our work, Porter's suffix stripping algorithm is used []. The documents are then ready to be represented mathematically.

b) Mathematical Representation
In our system, the vector space model is used, where each document is represented by the term-frequency (TF) vector, d, in the multidimensional space of document words. The term-frequency vector is:

$$d_{tf} = (tf_1, tf_2, \ldots, tf_n) \qquad (1)$$

where tf_i is the frequency of the i-th term in the document. Given a set S of documents and their corresponding vector representations, the centroid vector c is defined to be:

$$c = \frac{1}{|S|} \sum_{d \in S} d \qquad (2)$$

which is obtained by averaging the weights of the various terms present in the documents of S, where d is a document vector from the set S. Thus, the vector space model of a text data set is a word-by-document matrix whose rows are words and whose columns are document vectors.

c) Dimension Space Construction
Since the document vectors are of high dimensionality, pruning can be used to reduce it. A threshold P is given, and words having a frequency less than this threshold are removed completely from the document vectors. At this stage, the lengths of the document vectors are not unified. Therefore, a superset vector is created that contains all words in all the documents after pruning. This superset vector is then compared to each document vector, and each document vector is mapped from its own space to the superset vector space: if a word found in the superset vector is missing from the document vector, it is added to the document vector with term frequency zero. Thus a dimension space is constructed in which each word in a document vector is a dimension.

d) Similarity Measure
Document vectors are now ready for clustering, where similar document vectors are grouped together in the same cluster. The similarity between any two vectors is computed using the Minkowski distance function,

$$d(i, j) = \left( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q \right)^{1/q} \qquad (3)$$

where i denotes the i-th document vector and j the j-th document vector, x_{ip} is word number p in the i-th document vector, x_{jp} is word number p in the j-th document vector, and q is the power of the distance function. Between clusters, similarity can be measured using one of the following approaches: a) single linkage, b) complete linkage, and c) centroid similarity [8]. In this work, the third approach is used, where the similarity between two clusters is measured between their centroids. The centroid of a cluster is its average element.

III. THE PROPOSED CLUSTERING ALGORITHM

The proposed clustering algorithm reads the whole collection of documents to create the corresponding document vectors, which are pruned by the given threshold, P. The dimension space is then constructed as explained in Section 2, and a random sample, R, is chosen from the data set. Agglomerative clustering is then applied to this sample using the Centroid Similarity Technique (CST): at each successive iteration, pairs of clusters whose distance is below an incrementing step value are merged into one cluster. At the next iteration, the step value is increased, and the procedure is repeated until the given agglomerative distance threshold, A, is exceeded.
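The preparation steps of Section II that feed this algorithm (pruning by P, building the superset vector, zero-filling missing words, and the Minkowski distance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be already stop-word-filtered and stemmed token lists, the pruning threshold is assumed to apply to the term frequency within each document, and all function names are hypothetical.

```python
from collections import Counter


def build_dimension_space(documents, pruning_threshold):
    """Prune low-frequency words, then map every document onto the superset vector."""
    # Assumption: a word is pruned when its frequency inside a document is below P.
    pruned = [
        {word: tf for word, tf in Counter(tokens).items() if tf >= pruning_threshold}
        for tokens in documents
    ]
    # Superset vector: every word that survives pruning in at least one document.
    superset = sorted(set().union(*pruned))
    # A word missing from a document is added with term frequency zero.
    vectors = [[doc.get(word, 0) for word in superset] for doc in pruned]
    return superset, vectors


def minkowski(x, y, q=2):
    """Minkowski distance of order q; q = 2 gives the Euclidean distance."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


# Illustrative tokens only (already stemmed, stop words removed).
docs = [
    ["cluster", "cluster", "data", "text"],
    ["cluster", "data", "data", "data"],
]
superset, vectors = build_dimension_space(docs, pruning_threshold=2)
manhattan = minkowski(vectors[0], vectors[1], q=1)
euclidean = minkowski(vectors[0], vectors[1], q=2)
```

With a threshold of 2, only "cluster" survives in the first document and only "data" in the second, so both documents end up as two-dimensional vectors over the same superset.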
The agglomerative distance threshold at which the agglomeration stops, that is, the distance between clusters, is taken as the quality measure for the agglomerative clusters. When agglomeration stops, the number of created clusters is taken as the number of clusters that will contain all the document vectors of the whole data set, and the computed centroids of the created clusters are the initial centroids for the document vectors of the whole data set. Individual documents that have not yet been merged with other clusters are considered individual clusters, with the individual document vector as the centroid vector of the cluster. Hence, the first run of the k-means algorithm is applied, where each document vector of the whole data set is assigned to the cluster having the nearest, i.e., the most similar, centroid vector.

The quality of an obtained cluster for this assignment is taken as the intra-cluster similarity measure, that is, the sum of the distances between each document vector and the centroid vector of the cluster to which the document is attached. This value is then normalized by dividing it by the number of documents assigned to the considered cluster. The total quality value is computed by summing the quality values of all the created clusters and normalizing by further dividing this total by the number of created clusters. That is, cluster quality equals:

$$\text{Cluster Quality} = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{n_j} \sum_{i=1}^{n_j} \lVert d_{ij} - c_j \rVert \qquad (4)$$

where d_{ij} is the i-th document vector belonging to the j-th cluster, c_j is the centroid vector of the j-th cluster, n_j is the number of documents assigned to the j-th cluster, k is the number of required clusters, and the n_j sum to n, the number of documents in the whole data set.

The complexity of the proposed clustering algorithm is O(m^2 + kn), where m is the number of chosen documents in the random sample, k is the number of created clusters, and n is the number of documents in the whole data set.

IV. EXPERIMENTS AND RESULTS

The proposed clustering algorithm is studied by examining it on the NSF (National Science Foundation) data set, obtained by downloading 75 abstracts of the grants awarded by the NSF. These abstracts included subjects such as astronomy, population studies, undergraduate education, materials, mathematics, biology, health, oceanography, computer science, and chemistry. Different pruning thresholds are applied to the data set, with P = , , 4, 5, 6, 7, where most of the document vectors contained no words at P = 7. After creating the corresponding superset vector and constructing the dimension space, samples of R = %, % were chosen to run the agglomerative clustering approach on these sample document vectors. At this stage, the Euclidean distance function (q = 2) is used to compute distances between the vectors. Different agglomerative distance thresholds are also applied, A = 5, , 5, and , where the number of created clusters at A = was only two. The effect of varying these parameters on the number of created agglomerative clusters and on the quality of the final resultant clusters is monitored.

Figure (1) represents a graph showing the sizes of the matrices of the dimension spaces constructed for the chosen samples at R = % and % and for the whole data set. The rows represent the number of contained words (obtained from the superset vector) and the columns represent the number of considered document vectors. It is clear that as the pruning threshold value (P) increases, the matrix size decreases. At higher pruning thresholds, the matrix sizes are almost the same for both sample sizes, R = % and %.

Fig. 1. Matrices Sizes for Constructed Dimension Spaces (x-axis: Pruning; y-axis: Matrices Sizes; series: R = %, R = %, k-means).

TABLE 1. K VALUE AT DIFFERENT AGGLOMERATIVE THRESHOLDS WITH SAMPLE SIZE = % (columns: Pruning, A = 5, A = , A = 5, A = )

TABLE 2. K VALUE AT DIFFERENT AGGLOMERATIVE THRESHOLDS WITH SAMPLE SIZE = % (columns: Pruning, A = 5, A = , A = )

TABLE 3. K VALUE AT DIFFERENT PRUNING THRESHOLDS WITH SAMPLE SIZE = % (columns: Agglomerative, P = , P = , P = 4, P = 5, P = 6, P = )

TABLE 4. K VALUE AT DIFFERENT PRUNING THRESHOLDS WITH SAMPLE SIZE = %
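As one concrete reading of the procedure of Section III with the parameters used in these experiments (a random sample, an agglomerative distance threshold A, and Euclidean distance), the sketch below clusters a sample, performs the first k-means assignment pass over the whole data set, and computes the cluster quality. Several details are assumptions rather than statements from the paper: the step increment (`step`), merging repeatedly at each step level until no centroid pair is closer than the current level, and treating a cluster that attracts no document as contributing zero. All function names are hypothetical.

```python
import random
from math import dist  # Euclidean distance, i.e. Minkowski with q = 2


def centroid(points):
    """Component-wise mean of the vectors in one cluster."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def sampled_agglomeration(vectors, sample_fraction, max_distance, step, seed=0):
    """Cluster a random sample by merging centroid pairs closer than a growing level."""
    rng = random.Random(seed)
    sample_size = max(1, round(sample_fraction * len(vectors)))
    clusters = [[v] for v in rng.sample(vectors, sample_size)]
    level = step
    while level <= max_distance:  # stop once the threshold A is exceeded
        merged = True
        while merged:  # assumption: merge until no pair is closer than `level`
            merged = False
            centroids = [centroid(c) for c in clusters]
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    if dist(centroids[a], centroids[b]) < level:
                        clusters[a] = clusters[a] + clusters[b]
                        del clusters[b]
                        merged = True
                        break
                if merged:
                    break
        level += step
    # Unmerged documents remain singleton clusters with themselves as centroid.
    return [centroid(c) for c in clusters]


def assign(vectors, centroids):
    """First k-means pass: index of the nearest centroid for every document."""
    return [min(range(len(centroids)), key=lambda j: dist(v, centroids[j]))
            for v in vectors]


def cluster_quality(vectors, centroids, labels):
    """Average over clusters of the mean document-to-centroid distance."""
    total = 0.0
    for j, c in enumerate(centroids):
        members = [v for v, label in zip(vectors, labels) if label == j]
        if members:  # assumption: an empty cluster contributes zero
            total += sum(dist(v, c) for v in members) / len(members)
    return total / len(centroids)


# Illustrative data only; the whole set is sampled so the outcome is deterministic.
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
initial_centroids = sampled_agglomeration(data, sample_fraction=1.0,
                                          max_distance=5.0, step=1.0)
labels = assign(data, initial_centroids)
quality = cluster_quality(data, initial_centroids, labels)
```

In this reading, the number of centroids returned by the sample step fixes k for the assignment pass, so k is an outcome of P, R, and A rather than an input.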
Tables (1) and (2) show the k value at different agglomerative thresholds A, for R = % and % respectively. The more the pruning threshold P increases, in other words, the more the document vectors are reduced, the fewer agglomerative clusters are created (the smaller the k value). On the other hand, Tables (3) and (4) show the k value at different pruning thresholds P for R = % and % respectively. It was found that the more the agglomerative distance threshold A increases, the fewer agglomerative clusters are created (the smaller the k value).

By studying the effect of the pruning threshold on the cluster quality of the resultant clusters from our proposed algorithm, as shown in Figures (2) and (3) for R = % and % respectively and for different values of A, it was found that as the pruning threshold P increases, the resultant cluster quality increases, except at the highest pruning threshold, P = 7, where the resultant cluster quality decreased again. This is due to the removal of many words from the document vectors that distinguish documents from one another.

Fig. 2. Pruning Effect on Cluster Quality at Different Agglomerative Thresholds with Sample Size = % (x-axis: Pruning; series: Agglo. = 5, Agglo. = , Agglo. = 5, Agglo. = ).

Fig. 3. Pruning Effect on Cluster Quality at Different Agglomerative Thresholds with Sample Size = % (x-axis: Pruning; series: Agglo. = 5, Agglo. = , Agglo. = 5).

Fig. 4. Agglomerative Effect on Cluster Quality at Different Pruning Thresholds with Sample Size = % (x-axis: Agglomerative; series: Prune = 4, Prune = 5, Prune = 6, Prune = 7).

Fig. 5. Agglomerative Effect on Cluster Quality at Different Pruning Thresholds with Sample Size = % (x-axis: Agglomerative; series: Prune = 4, Prune = 5, Prune = 6, Prune = 7).

As for the effect of the agglomerative distance threshold, A, on the resultant cluster quality for R = % and % respectively and for different P values, as shown in Figures (4) and (5), it was found that the more the agglomerative distance threshold A increases, the higher the cluster quality of the resultant clusters.
After studying the effect of both the pruning and the agglomerative distance thresholds on the resultant cluster quality, it can be summarized that, at the same value of the agglomerative threshold A, the variation in the resultant cluster quality across the different pruning thresholds P is high, whereas at the same value of the pruning threshold P, the variation across the different agglomerative thresholds A is low. This holds for both sample sizes, R = %, %.

V. CONCLUSION

In this paper, we applied an agglomerative technique to document clustering that adapts a sampling-based pruning strategy to simplify hierarchical clustering depending on the required quality of the obtained clusters. The proposed algorithm combines the agglomerative hierarchical approach with the first run of the k-means approach to provide k disjoint clusters. An experimental study was conducted to evaluate the parameters affecting the resultant cluster quality, specifically the pruning threshold and the agglomerative threshold, for sample sizes of % and %. It was shown that the variation in the resultant cluster quality across the different pruning thresholds is high compared with the variation across the different agglomerative thresholds; thus the effect of the pruning threshold is dominant.

REFERENCES

[1] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers. Pages 5 88 and 48 45.
[2] I. S. Dhillon and D. S. Modha. Concept Decompositions for Large Sparse Text Data using Clustering. Technical Report RJ 47, IBM Almaden Research Center.
[3] M. Steinbach, G. Karypis, and V. Kumar. A Comparison of Document Clustering Techniques. In KDD Workshop on Text Mining.
[4] J. L. Neto, A. D. Santos, C. A. A. Kaestner, and A. A. Freitas. Document Clustering and Text Summarization. In Proceedings, 4th International Conference on Practical Applications of Knowledge Discovery and Data Mining (PADD-), London: The Practical Application Company.
[5] N. Turenne and F. Rousselot. Evaluation of Four Clustering Methods used in Text Mining, 1999.
[6] M. Meila and D. Heckerman. An Experimental Comparison of Several Clustering and Initialization Methods (Technical Report 98-6). Microsoft Research, Redmond, WA, 1998.
[7] C. Wei, Y. Lee, and C. Hsu. Empirical Comparison of Fast Clustering Algorithms for Large Data Sets. Proceedings of the rd Hawaii International Conference on System Sciences, 1998.
[8] M. J. A. Berry and G. Linoff. Data Mining Techniques for Marketing, Sales, and Customer Support. John Wiley & Sons. Pages 87 5, 1997.
[9] D. Fisher. Iterative optimization and simplification of hierarchical clusterings. J. Art. Intell. Res., 4:47--79, 1996.
[10] D. Cutting, D. Karger, J. Pedersen, and J. Tukey. Scatter/gather: a cluster-based approach to browsing large document collections. In 5th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'9), pages 8--9, 99.
[11] M. F. Porter. An Algorithm for Suffix Stripping. Program 4 (), pages -7, July 98.
More informationBasic allocator mechanisms The course that gives CMU its Zip! Memory Management II: Dynamic Storage Allocation Mar 6, 2000.
5-23 The course that gives CM its Zip Memory Maagemet II: Dyamic Storage Allocatio Mar 6, 2000 Topics Segregated lists Buddy system Garbage collectio Mark ad Sweep Copyig eferece coutig Basic allocator
More informationImproving Information Retrieval System Security via an Optimal Maximal Coding Scheme
Improvig Iformatio Retrieval System Security via a Optimal Maximal Codig Scheme Dogyag Log Departmet of Computer Sciece, City Uiversity of Hog Kog, 8 Tat Chee Aveue Kowloo, Hog Kog SAR, PRC dylog@cs.cityu.edu.hk
More informationLecture 6. Lecturer: Ronitt Rubinfeld Scribes: Chen Ziv, Eliav Buchnik, Ophir Arie, Jonathan Gradstein
068.670 Subliear Time Algorithms November, 0 Lecture 6 Lecturer: Roitt Rubifeld Scribes: Che Ziv, Eliav Buchik, Ophir Arie, Joatha Gradstei Lesso overview. Usig the oracle reductio framework for approximatig
More informationAssignment Problems with fuzzy costs using Ones Assignment Method
IOSR Joural of Mathematics (IOSR-JM) e-issn: 8-8, p-issn: 9-6. Volume, Issue Ver. V (Sep. - Oct.06), PP 8-89 www.iosrjourals.org Assigmet Problems with fuzzy costs usig Oes Assigmet Method S.Vimala, S.Krisha
More informationSAMPLE VERSUS POPULATION. Population - consists of all possible measurements that can be made on a particular item or procedure.
SAMPLE VERSUS POPULATION Populatio - cosists of all possible measuremets that ca be made o a particular item or procedure. Ofte a populatio has a ifiite umber of data elemets Geerally expese to determie
More informationHeuristic Approaches for Solving the Multidimensional Knapsack Problem (MKP)
Heuristic Approaches for Solvig the Multidimesioal Kapsack Problem (MKP) R. PARRA-HERNANDEZ N. DIMOPOULOS Departmet of Electrical ad Computer Eg. Uiversity of Victoria Victoria, B.C. CANADA Abstract: -
More informationA Parallel DFA Minimization Algorithm
A Parallel DFA Miimizatio Algorithm Ambuj Tewari, Utkarsh Srivastava, ad P. Gupta Departmet of Computer Sciece & Egieerig Idia Istitute of Techology Kapur Kapur 208 016,INDIA pg@iitk.ac.i Abstract. I this
More informationAlgorithms for Disk Covering Problems with the Most Points
Algorithms for Disk Coverig Problems with the Most Poits Bi Xiao Departmet of Computig Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog csbxiao@comp.polyu.edu.hk Qigfeg Zhuge, Yi He, Zili Shao, Edwi
More informationAn Efficient Algorithm for Graph Bisection of Triangularizations
A Efficiet Algorithm for Graph Bisectio of Triagularizatios Gerold Jäger Departmet of Computer Sciece Washigto Uiversity Campus Box 1045 Oe Brookigs Drive St. Louis, Missouri 63130-4899, USA jaegerg@cse.wustl.edu
More informationResearch on Interest Model of User Behavior
2011 Iteratioal Coferece o Computer Sciece ad Iformatio Techology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Sigapore DOI: 10.7763/IPCSIT.2012.V51.94 Research o Iterest Model of User Behavior
More informationn n B. How many subsets of C are there of cardinality n. We are selecting elements for such a
4. [10] Usig a combiatorial argumet, prove that for 1: = 0 = Let A ad B be disjoit sets of cardiality each ad C = A B. How may subsets of C are there of cardiality. We are selectig elemets for such a subset
More informationSpeeding-up dynamic programming in sequence alignment
Departmet of Computer Sciece Aarhus Uiversity Demark Speedig-up dyamic programmig i sequece aligmet Master s Thesis Dug My Hoa - 443 December, Supervisor: Christia Nørgaard Storm Pederse Implemetatio code
More informationFast Fourier Transform (FFT) Algorithms
Fast Fourier Trasform FFT Algorithms Relatio to the z-trasform elsewhere, ozero, z x z X x [ ] 2 ~ elsewhere,, ~ e j x X x x π j e z z X X π 2 ~ The DFS X represets evely spaced samples of the z- trasform
More informationHigher-order iterative methods free from second derivative for solving nonlinear equations
Iteratioal Joural of the Phsical Scieces Vol 6(8, pp 887-89, 8 April, Available olie at http://wwwacademicjouralsorg/ijps DOI: 5897/IJPS45 ISSN 99-95 Academic Jourals Full Legth Research Paper Higher-order
More informationEFFECT OF QUERY FORMATION ON WEB SEARCH ENGINE RESULTS
Iteratioal Joural o Natural Laguage Computig (IJNLC) Vol. 2, No., February 203 EFFECT OF QUERY FORMATION ON WEB SEARCH ENGINE RESULTS Raj Kishor Bisht ad Ila Pat Bisht 2 Departmet of Computer Sciece &
More informationEE123 Digital Signal Processing
Last Time EE Digital Sigal Processig Lecture 7 Block Covolutio, Overlap ad Add, FFT Discrete Fourier Trasform Properties of the Liear covolutio through circular Today Liear covolutio with Overlap ad add
More informationPython Programming: An Introduction to Computer Science
Pytho Programmig: A Itroductio to Computer Sciece Chapter 1 Computers ad Programs 1 Objectives To uderstad the respective roles of hardware ad software i a computig system. To lear what computer scietists
More informationA Novel Feature Extraction Algorithm for Haar Local Binary Pattern Texture Based on Human Vision System
A Novel Feature Extractio Algorithm for Haar Local Biary Patter Texture Based o Huma Visio System Liu Tao 1,* 1 Departmet of Electroic Egieerig Shaaxi Eergy Istitute Xiayag, Shaaxi, Chia Abstract The locality
More informationComparison of classification algorithms in the task of object recognition on radar images of the MSTAR base
Compariso of classificatio algorithms i the task of object recogitio o radar images of the MSTAR base A.A. Borodiov 1, V.V. Myasikov 1,2 1 Samara Natioal Research Uiversity, 34 Moskovskoe Shosse, 443086,
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe CHAPTER 19 Query Optimizatio Copyright 2016 Ramez Elmasri ad Shamkat B. Navathe Itroductio Query optimizatio Coducted by a query optimizer i a DBMS Goal:
More informationCOMP9318: Data Warehousing and Data Mining
COMP9318: Data Warehousig ad Data Miig L8: Clusterig COMP9318: Data Warehousig ad Data Miig 1 What is Cluster Aalysis? COMP9318: Data Warehousig ad Data Miig 2 What is Cluster Aalysis? Cluster: a collectio
More informationOne advantage that SONAR has over any other music-sequencing product I ve worked
*gajedra* D:/Thomso_Learig_Projects/Garrigus_163132/z_productio/z_3B2_3D_files/Garrigus_163132_ch17.3d, 14/11/08/16:26:39, 16:26, page: 647 17 CAL 101 Oe advatage that SONAR has over ay other music-sequecig
More informationEmpirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic.
Empirical Validate C&K Suite for Predict Fault-Proeess of Object-Orieted Classes Developed Usig Fuzzy Logic. Mohammad Amro 1, Moataz Ahmed 1, Kaaa Faisal 2 1 Iformatio ad Computer Sciece Departmet, Kig
More informationGPUMP: a Multiple-Precision Integer Library for GPUs
GPUMP: a Multiple-Precisio Iteger Library for GPUs Kaiyog Zhao ad Xiaowe Chu Departmet of Computer Sciece, Hog Kog Baptist Uiversity Hog Kog, P. R. Chia Email: {kyzhao, chxw}@comp.hkbu.edu.hk Abstract
More informationCreating Exact Bezier Representations of CST Shapes. David D. Marshall. California Polytechnic State University, San Luis Obispo, CA , USA
Creatig Exact Bezier Represetatios of CST Shapes David D. Marshall Califoria Polytechic State Uiversity, Sa Luis Obispo, CA 93407-035, USA The paper presets a method of expressig CST shapes pioeered by
More informationOptimum Solution of Quadratic Programming Problem: By Wolfe s Modified Simplex Method
Volume VI, Issue III, March 7 ISSN 78-5 Optimum Solutio of Quadratic Programmig Problem: By Wolfe s Modified Simple Method Kalpaa Lokhade, P. G. Khot & N. W. Khobragade, Departmet of Mathematics, MJP Educatioal
More informationAlpha Individual Solutions MAΘ National Convention 2013
Alpha Idividual Solutios MAΘ Natioal Covetio 0 Aswers:. D. A. C 4. D 5. C 6. B 7. A 8. C 9. D 0. B. B. A. D 4. C 5. A 6. C 7. B 8. A 9. A 0. C. E. B. D 4. C 5. A 6. D 7. B 8. C 9. D 0. B TB. 570 TB. 5
More informationEvaluating Top-k Selection Queries
Evaluatig Top-k Selectio Queries Surajit Chaudhuri Microsoft Research surajitc@microsoft.com Luis Gravao Columbia Uiversity gravao@cs.columbia.edu Abstract I may applicatios, users specify target values
More informationDynamic Programming and Curve Fitting Based Road Boundary Detection
Dyamic Programmig ad Curve Fittig Based Road Boudary Detectio SHYAM PRASAD ADHIKARI, HYONGSUK KIM, Divisio of Electroics ad Iformatio Egieerig Chobuk Natioal Uiversity 664-4 Ga Deokji-Dog Jeoju-City Jeobuk
More informationThe Counterchanged Crossed Cube Interconnection Network and Its Topology Properties
WSEAS TRANSACTIONS o COMMUNICATIONS Wag Xiyag The Couterchaged Crossed Cube Itercoectio Network ad Its Topology Properties WANG XINYANG School of Computer Sciece ad Egieerig South Chia Uiversity of Techology
More informationReading. Subdivision curves and surfaces. Subdivision curves. Chaikin s algorithm. Recommended:
Readig Recommeded: Stollitz, DeRose, ad Salesi. Wavelets for Computer Graphics: Theory ad Applicatios, 996, sectio 6.-6.3, 0., A.5. Subdivisio curves ad surfaces Note: there is a error i Stollitz, et al.,
More informationSum-connectivity indices of trees and unicyclic graphs of fixed maximum degree
1 Sum-coectivity idices of trees ad uicyclic graphs of fixed maximum degree Zhibi Du a, Bo Zhou a *, Nead Triajstić b a Departmet of Mathematics, South Chia Normal Uiversity, uagzhou 510631, Chia email:
More informationSD vs. SD + One of the most important uses of sample statistics is to estimate the corresponding population parameters.
SD vs. SD + Oe of the most importat uses of sample statistics is to estimate the correspodig populatio parameters. The mea of a represetative sample is a good estimate of the mea of the populatio that
More informationFREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS
FREQUENCY ESTIMATION OF INTERNET PACKET STREAMS WITH LIMITED SPACE: UPPER AND LOWER BOUNDS Prosejit Bose Evagelos Kraakis Pat Mori Yihui Tag School of Computer Sciece, Carleto Uiversity {jit,kraakis,mori,y
More informationBoost K-Means. Wan-Lei Zhao, Cheng-Hao Deng, and Chong-Wah Ngo, Senior Memeber, IEEE
1 Boost K-Meas Wa-Lei Zhao, Cheg-Hao Deg, ad Chog-Wah Ngo, Seior Memeber, IEEE Abstract Due to its simplicity ad versatility, remais popular sice it was proposed three decades ago. The performace of has
More informationare two specific neighboring points, F( x, y)
$33/,&$7,212)7+(6(/)$92,',1* 5$1'20:$/.12,6(5('8&7,21$/*25,7+0,17+(&2/285,0$*(6(*0(17$7,21 %RJGDQ602/.$+HQU\N3$/86'DPLDQ%(5(6.$ 6LOHVLDQ7HFKQLFDO8QLYHUVLW\'HSDUWPHQWRI&RPSXWHU6FLHQFH $NDGHPLFND*OLZLFH32/$1'
More informationBezier curves. Figure 2 shows cubic Bezier curves for various control points. In a Bezier curve, only
Edited: Yeh-Liag Hsu (998--; recommeded: Yeh-Liag Hsu (--9; last updated: Yeh-Liag Hsu (9--7. Note: This is the course material for ME55 Geometric modelig ad computer graphics, Yua Ze Uiversity. art of
More informationArithmetic Sequences
. Arithmetic Sequeces COMMON CORE Learig Stadards HSF-IF.A. HSF-BF.A.1a HSF-BF.A. HSF-LE.A. Essetial Questio How ca you use a arithmetic sequece to describe a patter? A arithmetic sequece is a ordered
More informationBOOLEAN MATHEMATICS: GENERAL THEORY
CHAPTER 3 BOOLEAN MATHEMATICS: GENERAL THEORY 3.1 ISOMORPHIC PROPERTIES The ame Boolea Arithmetic was chose because it was discovered that literal Boolea Algebra could have a isomorphic umerical aspect.
More informationFuzzy Rule Selection by Data Mining Criteria and Genetic Algorithms
Fuzzy Rule Selectio by Data Miig Criteria ad Geetic Algorithms Hisao Ishibuchi Dept. of Idustrial Egieerig Osaka Prefecture Uiversity 1-1 Gakue-cho, Sakai, Osaka 599-8531, JAPAN E-mail: hisaoi@ie.osakafu-u.ac.jp
More informationUnsupervised Discretization Using Kernel Density Estimation
Usupervised Discretizatio Usig Kerel Desity Estimatio Maregle Biba, Floriaa Esposito, Stefao Ferilli, Nicola Di Mauro, Teresa M.A Basile Departmet of Computer Sciece, Uiversity of Bari Via Oraboa 4, 7025
More information