Clustering documents with vector space model using n-grams
|
|
- Clement Shields
- 5 years ago
- Views:
Transcription
1 Clusterg documets wth vector space model usg -grams Klas Skogmar, Joha Olsso, Lud Isttute of Techology Supervsed by: Perre Nugues, Abstract Ths paper descrbe a method to cluster documets usg the lear space algorthm together wth ugrams, bgrams, trgrams ad -grams a attempt to ehace the clusterg performace compared to ugrams oly. Our tests dd ot reveal a mprovemet, but shows that the techque has a potetal, especally certa specfc areas lke fdg ctatos or versos of the same documet. The extra effort requred for mplemetato ad the speed loss could make t less terestg however.
2 Itroducto Ths paper s the result of a project doe for a course laguage processg at Lud Isttute of Techology. The project was madatory, but the topc of the project was optoal. We chose to use our ewly acqured kowledge o the beefts of -grams versus ugrams, but o a dfferet area, amely: documet clusterg. We wated to use basc stadard algorthms for everythg, because we wated to focus o the dffereces that -grams versus ugrams could make for the accuracy of clusterg. Therefore we chose the vector space model for determg smlartes betwee documets, ad k-meas for clusterg. We expected that the beefts of our suggested soluto would be that t ehaced the accuracy of clusterg, that t worked wth all avalable solutos (wth some mor modfcatos), ad that t gave better retrevg of keywords (whch we wll ot cosder ths paper). The accuracy should crease documets, where smlar words occur, but whe the orderg s dfferet. A example s Avod a wrtte door code ad code wrtte to avod the Troja back door, where they would be cosdered qute smlar usg ugrams, but ot as smlar usg bgrams or trgrams. I the example above Troja back door s a example of the beefts of trgrams over bgrams. The proposed algorthm wll ot gve as may beefts laguages where words are compoud words, as Swedsh ( cotrast to Eglsh). As far as we kow, most exstg algorthms oly use ugrams, although the advatages of -grams have prove most successful may laguage processg areas. The problem Clusterg s the ablty to automatcally group smlar documets, oly usg the text the documet tself. To do ths oe eeds to determe the smlarty betwee documets. There s also a eed to create referece documets whe combg two or more documets. Clusterg s doe may areas of computer scece, ad there are a lot of dfferet techques, lke algorthms, eural etworks, geetc algorthms, et cetera. The dfferet techques used deped o the applcato area. There are may applcatos for clusterg documets. Some examples are: Iteret search eges, kowledge maagemet systems or documet databases. Smlartes betwee documets could be used for searchg the local computer for documets or automatcally acqurg meta-data from documets for use verso maagemet systems. Usually clusterg of documets ca be combed wth several other techques to ehace the clusterg accuracy. These techques ca be: determg grammar, omttg commo words or puttg weghts to the words. Our soluto We chose to use the k-meas clusterg algorthm. Ths algorthm requres a dstace value betwee documets. For ths we used oe of the most wdely
3 used techques for determg smlartes: the Vector space model. The reaso for the choce of these algorthms s that they are very commo ad therefore kow by most computer lgusts. The fact that they are well documeted make them easy to mplemet ad easy to read for others. Also, we wated to focus o the beefts of -grams compared to ugrams. We chose to mplemet the project Java. I addto to the fact that t s object oreted ad that we are famlar wth the laguage, we could also show our progress usg a Applet o the Iteret. Sce ths s a commoly used laguage, others wll be able to copy our source code ad modfy t for ther ow purposes. Sce we had used regular expressos the course, we thought t would be good to clude that fuctoalty ths project as well. Ths makes the parsg of the documets very flexble. We used Java s stadard mplemetato of regular expressos (cluded sce Java.4). Vector space model The vector space model s wdely used for determg smlarty betwee documets [5]. Ths method sees each documet as a vector a word-space. The dmeso (umber of axs) of ths word-space s the umber of dfferet words the two documets. The umber of occurreces of each word determes the legth of that axs for that documet. The the vector space model determes the agle (cose coeffcet) or some other value (for example Jaccard or Dce coeffcets) [4] betwee the documets. Ths represets the smlarty betwee the documets. Whe usg -grams stead of ugrams, each axs cossts of more tha oe word. Sce we used Java as the programmg laguage, we dd some smple hertace to solve ths. We chose to use the cose coeffcet, whch determes the agle betwee the documets. [] preseted the algorthm for VSM as follows: r r cos( q, d) = = = q q d = where q s the query, d s the documet ad the umber of dfferet words. Sce we used oly the umber of words as the coeffcet, oly those words, whch have a word cout hgher tha zero both documets, wll produce a value. Therefore we smplfed ths algorthm, so that we would oly eed to multply the commo words the omator. Ths way the algorthm we use does ot requre ay computatos o the words that are ot part of both the documets. r r cos( d, d ) = q = qd = d d d d d =, d where d s documet, d s documet, ad qd the umber of commo words. We extract the words from the documets usg regular expressos, whch make t flexble to redefe the smallest parts that are aalyzed. The we sort usg Java s Mergesort for O(log) performace, stead of puttg the words a sorted lst drectly, whch would gve O( ) performace.,
4 Clusterg Accordg to [], the k-meas algorthm s the coceptually smplest method ad should probably be used frst o a ew data set because ts results are ofte suffcet. Ths summarzes the reasos why we used ths algorthm. The k-meas method s really smple; frst some cluster ceters (ceter of mass) are radomly chose (we pcked radomly chose documets). Each documet s assged to oe of these clusters (defed by the closest cetre). The a ew cluster cetre s calculated for each cluster. I the ext terato each documet s assged to oe of the ew cluster ceters that were prevously calculated. Curretly we terate a gve umber of tmes, stead of havg a stop crtero. The calculato of ew cluster ceters was doe usg oly the words of all the documets that cluster. A alteratve would have bee to clude all words all documets stead, but that would have requred more computatoal power. Results We have oly tested our algorthm o a lmted test, due to tme restrctos. We have also costructed examples where our algorthm outperforms tradtoal (ugram) methods. For example our method ca see dffereces texts wth words that are the same, but that come a dfferet order. I our test texts there are ot may cases where ths arses. I some areas there are two words that are belogg to each other, though, whch make our algorthm work better. Whe calculatg a Vector space model value betwee two documets, a choce has to be made betwee ugrams, bgrams, trgrams, et cetera. Sce we wated to aalyze the beefts of -grams, we dd both separate tests usg them dvdually ad a weghed sum of the ugrams, bgrams ad trgrams. We foud that the Vector space model betwee arbtrary documets s oly applcable usg up to 4-grams or 5- grams, uless you wat to spot ctatos or versos of exactly the same documets. We have chose to oly use the frst three values (up to trgrams) our clusterg tests. Whe we tested to cluster 6 texts from the New Scetst web page, 3 texts about eutros ad 3 texts about trasstors, they were clustered correctly usg both ugrams ad our exteded verso, whch cluded bgrams ad trgrams. Whe aalyzg the output of the Vector space model (the smlartes), we foud that our approach usg oly bgrams chaged the outcome of the clusterg somewhat the wrog drecto to what we had expected. Clusterg wth oly trgrams made t cluster properly aga, as dd the combato of the three. Oe reaso to the falure of bgrams could be the small quatty of texts, or the fact that we dd ot choose texts that were dfferet eough. Cocluso Our lmted testg shows that lookg at more tha oe word at a tme wo t ecessarly gve accuracy, but ca potetally gve other beefts whe comparg documets. I the few texts we have used, there s o obvous advatage of usg our suggested algorthm, although t could be our hadpcked documets that are ot adequately belogg to other areas. The extra tme of mplemetg the
5 algorthm probably makes t eve less terestg. We hope that people ca reuse, ad make use of, our source code, though. Besdes clusterg, -grams could be used to spot ctatos from oe documet to aother, or versos of the same documet, wth some small chages the source code. The method could also be used to extract key -grams, stead of extractg keywords from texts. These would probably descrbe the documet eve better tha sgle words. We thk there are a lot of potetally terestg areas where -grams ca be used, ad clusterg s oe of them. We have show that t has potetal, but the curret beefts are too small. Maybe a combato of -grams ad other techques wll prove to work best? Further testg of our proposed algorthm eeds to be doe. Applet, Java source, ad test texts To make t easer to aalyze our results, ad to preset the code used, a web page was created for the project. It s located at: e. There we have the source code for the project, together wth javadoc, a example Applet, the test text samples we used ad ths report.
6 Refereces [] Chrstopher D. Mag ad Hrch Schütze, 999, Foudatos of statstcal atural laguage processg, MIT Press. [] Adre Z. Broder, Steve C. Glassma, Mark S. Maasse, Geoffrey Zweg, 997, Sytactc Clusterg of the Web, Dgtal SRC Techcal Note , SRC/techcal-otes/abstracts/src-t html [3] Perre Nugues, 00, Itroducto to Laguage Processg ad Computatoal Lgustcs, Lecture Notes, Lud Isttute of Techology. [4] Damarks Tekske Uverstet, 00, Itroducto: Vector Space Model, Techcal report, ultmeda/textmg/ode5.html [5] Gerard Salto, A. Wog, C.S Yag, 975, A Vector Space Model for Automatc Idexg, Commucatos of the ACM, 8(), November. [6] Joh Zukowsk, 00, Java Tech Tps o usg regular expressos Java, Su JDC Techcal otes, per/jdctechtps/00/tt043.html
Face Recognition using Supervised & Unsupervised Techniques
Natoal Uversty of Sgapore EE5907-Patter recogto-2 NAIONAL UNIVERSIY OF SINGAPORE EE5907 Patter Recogto Project Part-2 Face Recogto usg Supervsed & Usupervsed echques SUBMIED BY: SUDEN NAME: harapa Reddy
More information1-D matrix method. U 4 transmitted. incident U 2. reflected U 1 U 5 U 3 L 2 L 3 L 4. EE 439 matrix method 1
-D matrx method We ca expad the smple plae-wave scatterg for -D examples that we ve see to a more versatle matrx approach that ca be used to hadle may terestg -D problems. The basc dea s that we ca break
More informationCS 2710 Foundations of AI Lecture 22. Machine learning. Machine Learning
CS 7 Foudatos of AI Lecture Mache learg Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square Mache Learg The feld of mache learg studes the desg of computer programs (agets) capable of learg from past eperece
More informationITEM ToolKit Technical Support Notes
ITEM ToolKt Notes Fault Tree Mathematcs Revew, Ic. 2875 Mchelle Drve Sute 300 Irve, CA 92606 Phoe: +1.240.297.4442 Fax: +1.240.297.4429 http://www.itemsoft.com Page 1 of 15 6/1/2016 Copyrght, Ic., All
More informationFor all questions, answer choice E) NOTA" means none of the above answers is correct. A) 50,500 B) 500,000 C) 500,500 D) 1,001,000 E) NOTA
For all questos, aswer choce " meas oe of the above aswers s correct.. What s the sum of the frst 000 postve tegers? A) 50,500 B) 500,000 C) 500,500 D),00,000. What s the sum of the tegers betwee 00 ad
More informationJournal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article
Avalable ole www.ocpr.com Joural of Chemcal ad Pharmaceutcal Research, 2015, 73):476-481 Research Artcle ISSN : 0975-7384 CODENUSA) : JCPRC5 Research o cocept smlarty calculato method based o sematc grd
More informationBlind Steganalysis for Digital Images using Support Vector Machine Method
06 Iteratoal Symposum o Electrocs ad Smart Devces (ISESD) November 9-30, 06 Bld Stegaalyss for Dgtal Images usg Support Vector Mache Method Marcelus Hery Meor School of Electrcal Egeerg ad Iformatcs Badug
More informationMachine Learning: Algorithms and Applications
/03/ Mache Learg: Algorthms ad Applcatos Florao Z Free Uversty of Boze-Bolzao Faculty of Computer Scece Academc Year 0-0 Lecture 3: th March 0 Naïve Bayes classfer ( Problem defto A trag set X, where each
More informationAPPLICATION OF CLUSTERING METHODS IN BANK S PROPENSITY MODEL
APPLICATION OF CLUSTERING METHODS IN BANK S PROPENSITY MODEL Sergej Srota Haa Řezaková Abstract Bak s propesty models are beg developed for busess support. They should help to choose clets wth a hgher
More informationSoftware Clustering Techniques and the Use of Combined Algorithm
Software Clusterg Techques ad the Use of Combed Algorthm M. Saeed, O. Maqbool, H.A. Babr, S.Z. Hassa, S.M. Sarwar Computer Scece Departmet Lahore Uversty of Maagemet Sceces DHA Lahore, Paksta oaza@lums.edu.pk
More informationDifferentiated Service of Streaming Media Playback Technology
Iteratoal Coferece o Advaced Iformato ad Commucato Techology for Educato (ICAICTE 2013) Dfferetated Servce of Streamg Meda Playback Techology CHENG Z-ao 1 MENG Bo 1 WANG Da-hua 1 ZHAO Yue 1 1 Iformatzato
More informationCOMSC 2613 Summer 2000
Programmg II Fal Exam COMSC 63 Summer Istructos: Name:. Prt your ame the space provded Studet Id:. Prt your studet detfer the space Secto: provded. Date: 3. Prt the secto umber of the secto whch you are
More informationOptimal Allocation of Complex Equipment System Maintainability
Optmal Allocato of Complex Equpmet System ataablty X Re Graduate School, Natoal Defese Uversty, Bejg, 100091, Cha edcal Protecto Laboratory, Naval edcal Research Isttute, Shagha, 200433, Cha Emal:rexs841013@163.com
More informationPerformance Impact of Load Balancers on Server Farms
erformace Impact of Load Balacers o Server Farms Ypg Dg BMC Software Server Farms have gaed popularty for provdg scalable ad relable computg / Web servces. A load balacer plays a key role ths archtecture,
More informationMATHEMATICAL PROGRAMMING MODEL OF THE CRITICAL CHAIN METHOD
MATHEMATICAL PROGRAMMING MODEL OF THE CRITICAL CHAIN METHOD TOMÁŠ ŠUBRT, PAVLÍNA LANGROVÁ CUA, SLOVAKIA Abstract Curretly there s creasgly dcated that most of classcal project maagemet methods s ot sutable
More informationNEURO FUZZY MODELING OF CONTROL SYSTEMS
NEURO FUZZY MODELING OF CONTROL SYSTEMS Efré Gorrosteta, Carlos Pedraza Cetro de Igeería y Desarrollo Idustral CIDESI, Av Pe de La Cuesta 70. Des. Sa Pablo. Querétaro, Qro, Méxco gorrosteta@teso.mx pedraza@cdes.mx
More informationEight Solved and Eight Open Problems in Elementary Geometry
Eght Solved ad Eght Ope Problems Elemetary Geometry Floret Smaradache Math & Scece Departmet Uversty of New Mexco, Gallup, US I ths paper we revew eght prevous proposed ad solved problems of elemetary
More informationArea and Power Efficient Modulo 2^n+1 Multiplier
Iteratoal Joural of Moder Egeerg Research (IJMER) www.jmer.com Vol.3, Issue.3, May-Jue. 013 pp-137-1376 ISSN: 49-6645 Area ad Power Effcet Modulo ^+1 Multpler K. Ptambar Patra, 1 Saket Shrvastava, Sehlata
More informationInternational Mathematical Forum, 1, 2006, no. 31, ON JONES POLYNOMIALS OF GRAPHS OF TORUS KNOTS K (2, q ) Tamer UGUR, Abdullah KOPUZLU
Iteratoal Mathematcal Forum,, 6, o., 57-54 ON JONES POLYNOMIALS OF RAPHS OF TORUS KNOTS K (, q ) Tamer UUR, Abdullah KOPUZLU Atatürk Uverst Scece Facult Dept. of. Math. 54 Erzurum, Turkey tugur@atau.edu.tr
More informationA Web Mining Based Network Personalized Learning System Hua PANG1, a, Jian YU1, Long WANG2, b
3rd Iteratoal Coferece o Machery, Materals ad Iformato Techology Applcatos (ICMMITA 05) A Web Mg Based Network Persoalzed Learg System Hua PANG, a, Ja YU, Log WANG, b College of Educato Techology, Sheyag
More informationWeb Page Clustering by Combining Dense Units
Web Page Clusterg by Combg Dese Uts Morteza Haghr Chehregha, Hassa Abolhassa ad Mostafa Haghr Chehregha Departmet of CE, Sharf Uversty of Techology, Tehra, IRA {haghr, abolhassa}@ce.sharf.edu Departmet
More informationPERSPECTIVES OF THE USE OF GENETIC ALGORITHMS IN CRYPTANALYSIS
PERSPECTIVES OF THE USE OF GENETIC ALGORITHMS IN CRYPTANALYSIS Lal Besela Sokhum State Uversty, Poltkovskaa str., Tbls, Georga Abstract Moder cryptosystems aalyss s a complex task, the soluto of whch s
More informationDescriptive Statistics: Measures of Center
Secto 2.3 Descrptve Statstcs: Measures of Ceter Frequec dstrbutos are helpful provdg formato about categorcal data, but wth umercal data we ma wat more formato. Statstc: s a umercal measure calculated
More informationText Categorization Based on a Similarity Approach
Text Categorzato Based o a Smlarty Approach Cha Yag Ju We School of Computer Scece & Egeerg, Uversty of Electroc Scece ad Techology of Cha, Chegdu 60054, P.R. Cha Abstract Text classfcato ca effcetly ehace
More informationAn Improved Fuzzy C-Means Clustering Algorithm Based on Potential Field
07 d Iteratoal Coferece o Advaces Maagemet Egeerg ad Iformato Techology (AMEIT 07) ISBN: 978--60595-457-8 A Improved Fuzzy C-Meas Clusterg Algorthm Based o Potetal Feld Yua-hag HAO, Zhu-chao YU *, X GAO
More informationUsing Linear-threshold Algorithms to Combine Multi-class Sub-experts
Usg Lear-threshold Algorthms to Combe Mult-class Sub-experts Chrs Mesterharm MESTERHA@CS.RUTGERS.EDU Rutgers Computer Scece Departmet 110 Frelghuyse Road Pscataway, NJ 08854 USA Abstract We preset a ew
More informationBezier curves. 1. Defining a Bezier curve. A closed Bezier curve can simply be generated by closing its characteristic polygon
Curve represetato Copyrght@, YZU Optmal Desg Laboratory. All rghts reserved. Last updated: Yeh-Lag Hsu (--). Note: Ths s the course materal for ME55 Geometrc modelg ad computer graphcs, Yua Ze Uversty.
More informationA Genetic K-means Clustering Algorithm Applied to Gene Expression Data
A Geetc K-meas Clusterg Algorthm Appled to Gee Expresso Data Fag-Xag Wu, W. J. Zhag, ad Athoy J. Kusal Dvso of Bomedcal Egeerg, Uversty of Sasatchewa, Sasatoo, S S7N 5A9, CANADA faw34@mal.usas.ca, zhagc@egr.usas.ca
More informationEight Solved and Eight Open Problems in Elementary Geometry
Eght Solved ad Eght Ope Problems Elemetary Geometry Floret Smaradache Math & Scece Departmet Uversty of New Mexco, Gallup, US I ths paper we revew eght prevous proposed ad solved problems of elemetary
More informationMINIMIZATION OF THE VALUE OF DAVIES-BOULDIN INDEX
MIIMIZATIO OF THE VALUE OF DAVIES-BOULDI IDEX ISMO ÄRÄIE ad PASI FRÄTI Departmet of Computer Scece, Uversty of Joesuu Box, FI-800 Joesuu, FILAD ABSTRACT We study the clusterg problem whe usg Daves-Bould
More informationAPR 1965 Aggregation Methodology
Sa Joaqu Valley Ar Polluto Cotrol Dstrct APR 1965 Aggregato Methodology Approved By: Sged Date: March 3, 2016 Araud Marjollet, Drector of Permt Servces Backgroud Health rsk modelg ad the collecto of emssos
More informationLP: example of formulations
LP: eample of formulatos Three classcal decso problems OR: Trasportato problem Product-m problem Producto plag problem Operatos Research Massmo Paolucc DIBRIS Uversty of Geova Trasportato problem The decso
More informationConstructive Semi-Supervised Classification Algorithm and Its Implement in Data Mining
Costructve Sem-Supervsed Classfcato Algorthm ad Its Implemet Data Mg Arvd Sgh Chadel, Arua Twar, ad Naredra S. Chaudhar Departmet of Computer Egg. Shr GS Ist of Tech.& Sc. SGSITS, 3, Par Road, Idore (M.P.)
More informationFitting. We ve learned how to detect edges, corners, blobs. Now what? We would like to form a. compact representation of
Fttg Fttg We ve leared how to detect edges, corers, blobs. Now what? We would lke to form a hgher-level, h l more compact represetato of the features the mage b groupg multple features accordg to a smple
More informationMorphological Ending based Strategies of Unknown Word Estimation for Statistical POS Urdu Tagger
IJCSES Iteratoal Joural of Computer Sceces ad Egeerg Systems, Vol., No.3, July 2007 CSES Iteratoal c2007 ISSN 0973-4406 67 Morphologcal Edg based Strateges of Ukow Word Estmato for Statstcal POS Urdu Tagger
More informationShort Vector SIMD Code Generation for DSP Algorithms
Short Vector SMD Code Geerato for DSP Algorthms Fraz Frachett Chrstoph Ueberhuber Appled ad Numercal Mathematcs Techcal Uversty of Vea Austra Markus Püschel José Moura Electrcal ad Computer Egeerg Carege
More informationA genetic procedure used to train RFB neural networks
A geetc procedure used to tra RFB eural etworks Costat-Iula VIZITIU Commucatos ad Electroc Systems Departmet Mltary Techcal Academy George Cosbuc Aveue 8-83 5 th Dstrct Bucharest ROMANIA vc@mta.ro http://www.mta.ro
More informationDenoising Algorithm Using Adaptive Block Based Singular Value Decomposition Filtering
Advaces Computer Scece Deosg Algorthm Usg Adaptve Block Based Sgular Value Decomposto Flterg SOMKAIT UDOMHUNSAKUL Departmet of Egeerg ad Archtecture Rajamagala Uversty of Techology Suvarabhum 7/ Suaya,
More informationAssociated Pagerank: A Content Relevance Weighted Pagerank Algorithm
Assocated Pagerak: A Cotet Relevace Weghted Pagerak Algorthm Jh-Shh Hsu,Cha-Che Ye Assocated Pagerak: A Cotet Relevace Weghted Pagerak Algorthm 1 Jh-Shh Hsu, 2 Cha-Che Ye 1 Natoal Yul Uversty of Scece
More informationAdministrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today
Admiistrative Fial project No office hours today UNSUPERVISED LEARNING David Kauchak CS 451 Fall 2013 Supervised learig Usupervised learig label label 1 label 3 model/ predictor label 4 label 5 Supervised
More informationUsing The ACO Algorithm in Image Segmentation for Optimal Thresholding 陳香伶財務金融系
教專研 95P- Usg The ACO Algorthm Image Segmetato for Optmal Thresholdg Abstract Usg The ACO Algorthm Image Segmetato for Optmal Thresholdg 陳香伶財務金融系 Despte the fact that the problem of thresholdg has bee qute
More informationSoftware reliability is defined as the probability of failure
Evolutoary Regresso Predcto for Software Cumulatve Falure Modelg: a comparatve study M. Beaddy, M. Wakrm & S. Aljahdal 2 : Dept. of Math. & Ifo. Equpe MMS, Ib Zohr Uversty Morocco. beaddym@yahoo.fr 2:
More informationEvaluation of Node and Link Importance Based on Network Topology and Traffic Information DU Xun-Wei, LIU Jun, GUO Wei
Advaced Materals Research Submtted: 2014-08-29 ISSN: 1662-8985, Vols. 1049-1050, pp 1765-1770 Accepted: 2014-09-01 do:10.4028/www.scetfc.et/amr.1049-1050.1765 Ole: 2014-10-10 2014 Tras Tech Publcatos,
More informationA Comparison of Univariate Smoothing Models: Application to Heart Rate Data Marcus Beal, Member, IEEE
A Comparso of Uvarate Smoothg Models: Applcato to Heart Rate Data Marcus Beal, Member, IEEE E-mal: bealm@pdx.edu Abstract There are a umber of uvarate smoothg models that ca be appled to a varety of olear
More informationAnalysis of Energy Consumption and Lifetime of Heterogeneous Wireless Sensor Networks
Aalyss of Eergy Cosumpto ad Lfetme of Heterogeeous Wreless Sesor Networks Erque J. Duarte-Melo, Mgya Lu EECS, Uversty of Mchga, A Arbor ejd, mgya @eecs.umch.edu Abstract The paper exames the performace
More informationA MapReduce-Based Multiple Flow Direction Runoff Simulation
A MapReduce-Based Multple Flow Drecto Ruoff Smulato Ahmed Sdahmed ad Gyozo Gdofalv GeoIformatcs, Urba lag ad Evromet, KTH Drottg Krstas väg 30 100 44 Stockholm Telephoe: +46-8-790 8709 Emal:{sdahmed, gyozo}@
More informationChEn 475 Statistical Analysis of Regression Lesson 1. The Need for Statistical Analysis of Regression
Statstcal-Regresso_hadout.xmcd Statstcal Aalss of Regresso ChE 475 Statstcal Aalss of Regresso Lesso. The Need for Statstcal Aalss of Regresso What do ou do wth dvdual expermetal data pots? How are the
More informationSALAM A. ISMAEEL Computer Man College for Computer Studies, Khartoum / Sudan
AAPTIVE HYBRI-WAVELET ETHO FOR GPS/ SYSTE INTEGRATION SALA A. ISAEEL Computer a College for Computer Studes, Khartoum / Suda salam.smaeel@gmal.com ABSTRACT I ths paper, a techque for estmato a global postog
More informationTransistor/Gate Sizing Optimization
Trasstor/Gate Szg Optmzato Gve: Logc etwork wth or wthout cell lbrary Fd: Optmal sze for each trasstor/gate to mmze area or power, both uder delay costrat Statc szg: based o tmg aalyss ad cosder all paths
More informationMode Changes in Priority Pre-emptively Scheduled Systems. K. W. Tindell, A. Burns, A. J. Wellings
ode hages rorty re-emptvely Scheduled Systems. W. dell, A. Burs, A.. Wellgs Departmet of omputer Scece, Uversty of York, Eglad Abstract may hard real tme systems the set of fuctos that a system s requred
More informationApplication of Genetic Algorithm for Computing a Global 3D Scene Exploration
Joural of Software Egeerg ad Applcatos, 2011, 4, 253-258 do:10.4236/jsea.2011.44028 Publshed Ole Aprl 2011 (http://www.scrp.org/joural/jsea) 253 Applcato of Geetc Algorthm for Computg a Global 3D Scee
More informationNine Solved and Nine Open Problems in Elementary Geometry
Ne Solved ad Ne Ope Problems Elemetary Geometry Floret Smaradache Math & Scece Departmet Uversty of New Mexco, Gallup, US I ths paper we revew e prevous proposed ad solved problems of elemetary D geometry
More informationWeighting Cache Replace Algorithm for Storage System
Weghtg Cache Replace Algorthm for Storage System Yhu Luo 2 Chagsheg Xe 2 Chegfeg Zhag 2 School of mathematcs ad Computer Scece, Hube Uversty, Wuha 430062, P.R. Cha 2 Natoal Storage System Laboratory, School
More informationMode-based temporal filtering for in-band wavelet video coding with spatial scalability
Mode-based temporal flterg for -bad wavelet vdeo codg wth spatal scalablty ogdog Zhag a*, Jzheg Xu b, Feg Wu b, Weju Zhag a, ogka Xog a a Image Commucato Isttute, Shagha Jao Tog Uversty, Shagha b Mcrosoft
More informationTerm Weighting Schemes Experiment Based on SVD for Malay Text Retrieval
IJCSS Iteratoal Joural o Computer Scece ad etwork Securty, VOL.8 o.0, October 2008 357 Term Weghtg Schemes Expermet Based o SVD or Malay Text Retreval ordaah Ab Samat, Masrah Azrah Azm Murad, Muhamad Tauk
More informationEnumerating XML Data for Dynamic Updating
Eumeratg XML Data for Dyamc Updatg Lau Ho Kt ad Vcet Ng Departmet of Computg, The Hog Kog Polytechc Uversty, Hug Hom, Kowloo, Hog Kog cstyg@comp.polyu.edu.h Abstract I ths paper, a ew mappg model, called
More informationReconstruction of Orthogonal Polygonal Lines
Recostructo of Orthogoal Polygoal Les Alexader Grbov ad Eugee Bodasky Evrometal System Research Isttute (ESRI) 380 New ork St. Redlads CA 9373-800 USA {agrbov ebodasky}@esr.com Abstract. A orthogoal polygoal
More informationOn a Sufficient and Necessary Condition for Graph Coloring
Ope Joural of Dscrete Matheatcs, 04, 4, -5 Publshed Ole Jauary 04 (http://wwwscrporg/joural/ojd) http://dxdoorg/0436/ojd04400 O a Suffcet ad Necessary Codto for raph Colorg Maodog Ye Departet of Matheatcs,
More informationBeijing University of Technology, Beijing , China; Beijing University of Technology, Beijing , China;
d Iteratoal Coferece o Machery, Materals Egeerg, Chemcal Egeerg ad Botechology (MMECEB 5) Research of error detecto ad compesato of CNC mache tools based o laser terferometer Yuemg Zhag, a, Xuxu Chu, b
More informationParallel Iterative Poisson Solver for a Distributed Memory Architecture
Parallel Iteratve Posso Solver for a Dstrbted Memory Archtectre Erc Dow Aerospace Comptatoal Desg Lab Departmet of Aeroatcs ad Astroatcs 2 Motvato Solvg Posso s Eqato s a commo sbproblem may mercal schemes
More informationDelay based Duplicate Transmission Avoid (DDA) Coordination Scheme for Opportunistic routing
Delay based Duplcate Trasmsso Avod (DDA) Coordato Scheme for Opportustc routg Ng L, Studet Member IEEE, Jose-Fera Martez-Ortega, Vcete Heradez Daz Abstract-Sce the packet s trasmtted to a set of relayg
More informationProcess Quality Evaluation based on Maximum Entropy Principle. Yuhong Wang, Chuanliang Zhang, Wei Dai a and Yu Zhao
Appled Mechacs ad Materals Submtted: 204-08-26 ISSN: 662-7482, Vols. 668-669, pp 625-628 Accepted: 204-09-02 do:0.4028/www.scetfc.et/amm.668-669.625 Ole: 204-0-08 204 Tras Tech Publcatos, Swtzerlad Process
More informationIntegrating Language Model in Handwritten Chinese Text Recognition
2009 10th Iteratoal Coferece o Documet Aalyss ad Recogto Itegratg Laguage Model Hadwrtte Chese Text Recogto Qu-Feg Wag, Fe Y, Cheg-L Lu Natoal Laboratory of Patter Recogto (NLPR), Isttute of Automato,
More informationClassification Web Pages By Using User Web Navigation Matrix By Mementic Algorithm
Classfcato Web Pages By Usg User Web Navgato Matrx By Memetc Algorthm 1 Parvaeh roustae 2 Mehd sadegh zadeh 1 Studet of Computer Egeerg Software EgeergDepartmet of ComputerEgeerg, Bushehr brach,
More informationA Type of Variation of Hamilton Path Problem with Applications
Edth Cowa Uersty Research Ole ECU Publcatos Pre. 20 2008 A Type of Varato of Hamlto Path Problem wth Applcatos Jta Xao Edth Cowa Uersty Ju Wag Wezhou Uersty, Zhejag, Cha 0.09/ICYCS.2008.470 Ths artcle
More informationQUADRATURE POINTS ON POLYHEDRAL ELEMENTS
QUADRATURE POINTS ON POLYHEDRAL ELEMENTS Tobas Pck, Peter Mlbradt 2 ABSTRACT The method of the fte elemets s a flexble umerc procedure both for terpolato ad approxmato of the solutos of partal dfferetal
More informationOffice Hours. COS 341 Discrete Math. Office Hours. Homework 8. Currently, my office hours are on Friday, from 2:30 to 3:30.
Oce Hours Curretly, my oce hours are o Frday, rom :30 to 3:30. COS 31 Dscrete Math 1 Oce Hours Curretly, my oce hours are o Frday, rom :30 to 3:30. Nobody seems to care. Chage oce hours? Tuesday, 8 PM
More informationAN IMPROVED TEXT CLASSIFICATION METHOD BASED ON GINI INDEX
Joural of Theoretcal ad Appled Iformato Techology 30 th September 0. Vol. 43 No. 005-0 JATIT & LLS. All rghts reserved. ISSN: 99-8645 www.jatt.org E-ISSN: 87-395 AN IMPROVED TEXT CLASSIFICATION METHOD
More informationAdaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream *
ISSN 673-948 CODEN JKYTA8 E-mal: fcst@vp.63.com Joural of Froters of Computer Scece ad Techology http://www.ceaj.org 673-948/200/04(09)-0859-06 Tel: +86-0-566056 DOI: 0.3778/j.ss.673-948.200.09.009 *,2,
More informationSVM Classification Method Based Marginal Points of Representative Sample Sets
P P College P P College P Iteratoal Joural of Iformato Techology Vol. No. 9 005 SVM Classfcato Method Based Margal Pots of Represetatve Sample Sets Wecag ZhaoP P, Guagrog JP P, Ru NaP P, ad Che FegP of
More informationA modified Logic Scoring Preference method for dynamic Web services evaluation and selection
A modfed Logc Scorg Preferece method for dyamc Web servces evaluato ad selecto Hog Qg Yu ad Herá Mola 2 Departmet of Computer Scece, Uversty of Lecester, UK hqy@mcs.le.ac.uk 2 Departmet of Iformatcs, School
More informationFace Authentication for Multiple Subjects Using Eigenflow
Face Authetcato for Multple Subjects Usg Egeflow Xaomg Lu Tsuha Che ad B.V.K. Vjaya Kumar Advaced Multmeda Processg Lab Techcal Report AMP -5 Aprl 2 Electrcal ad Computer Egeerg Carege Mello Uversty Pttsburgh,
More informationPreventing Information Leakage in C Applications Using RBAC-Based Model
Proceedgs of the 5th WSEAS It. Cof. o Software Egeerg Parallel ad Dstrbuted Systems Madrd Spa February 5-7 2006 (pp69-73) Prevetg Iformato Leakage C Applcatos Usg RBAC-Based Model SHIH-CHIEN CHOU Departmet
More informationA New Hybrid Audio Classification Algorithm Based on SVM Weight Factor and Euclidean Distance
Proceedgs of the 2007 WSEAS Iteratoal Coferece o Computer Egeerg ad Applcatos, Gold Coast, Australa, Jauary 7-9, 2007 52 A New Hybrd Audo Classfcato Algorthm Based o SVM Weght Factor ad Eucldea Dstace
More informationSpeeding- Up Fractal Image Compression Using Entropy Technique
Avalable Ole at www.jcsmc.com Iteratoal Joural of Computer Scece ad Moble Computg A Mothly Joural of Computer Scece ad Iformato Techology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC, Vol. 5, Issue. 4, Aprl
More informationA Novel Optimization Algorithm for Adaptive Simplex Method with Application to High Dimensional Functions
A Novel Optmzato Algorthm for Adaptve Smplex Method th Applcato to Hgh Dmesoal Fuctos ZuoYg LIU, XaWe LUO 2 Southest Uversty Chogqg 402460, Cha, zuyglu@alyu.com 2 Southest Uversty Chogqg 402460, Cha, xael@su.edu.c.
More informationReview Statistics review 1: Presenting and summarising data Elise Whitley* and Jonathan Ball
Crtcal Care February Vol 6 No Whtley ad Ball Revew Statstcs revew : Presetg ad summarsg data Else Whtley* ad Joatha Ball *Lecturer Medcal Statstcs, Uversty of Brstol, Brstol, UK Lecturer Itesve Care Medce,
More informationA Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases
A Smple Dmesoalty Reducto Techque for Fast Smlarty Search Large Tme Seres Databases Eamo J. Keogh ad Mchael J. Pazza Departmet of Iformato ad Computer Scece Uversty of Calfora, Irve, Calfora 92697 USA
More informationMachine Learning. CS 2750 Machine Learning. Administration. Lecture 1. Milos Hauskrecht 5329 Sennott Square, x4-8845
CS 75 Mache Learg Lecture Mache Learg Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square, 5 people.cs.ptt.edu/~mlos/courses/cs75/ Admstrato Istructor: Prof. Mlos Hauskrecht mlos@cs.ptt.edu 539 Seott Square,
More informationA PROCEDURE FOR SOLVING INTEGER BILEVEL LINEAR PROGRAMMING PROBLEMS
ISSN: 39-8753 Iteratoal Joural of Iovatve Research Scece, Egeerg ad Techology A ISO 397: 7 Certfed Orgazato) Vol. 3, Issue, Jauary 4 A PROCEDURE FOR SOLVING INTEGER BILEVEL LINEAR PROGRAMMING PROBLEMS
More informationSupplementary Information
Supplemetary Iformato A Self-Trag Subspace Clusterg Algorthm uder Low-Rak Represetato for Cacer Classfcato o Gee Expresso Data Chu-Qu Xa 1, Ke Ha 1, Yog Q 1, Yag Zhag 2, ad Dog-Ju Yu 1,2, 1 School of Computer
More informationBODY MEASUREMENT USING 3D HANDHELD SCANNER
Movemet, Health & Exercse, 7(1), 179-187, 2018 BODY MEASUREMENT USING 3D HANDHELD SCANNER Mohamed Najb b Salleh *, Halm b Mad Lazm, ad Hedrk b Lamsal Techology ad Supply Cha Isttute, School of Techology
More informationFEATURE SELECTION ON COMBINATIONS FOR EFFICIENT LEARNING FROM IMAGES. Rong Xiao, Lei Zhang, and Hong-Jiang Zhang
FEATURE SELECTION ON COMBINATIONS FOR EFFICIENT LEARNING FROM IMAGES Rog Xao, Le Zhag, ad Hog-Jag Zhag Mcrosoft Research Asa, Bejg 100080, P.R. Cha {t-rxao, lezhag, hjzhag}@mcrosoft.com ABSTRACT Due to
More informationMining Web Content Outliers using Structure Oriented Weighting Techniques and N-Grams
2005 ACM Symposum o Appled Computg Mg Web Cotet Outlers usg Structure Oreted Weghtg Techques ad N-Grams Mal Agyemag Ke Barer Rada S. Alha Departmet of Computer Scece Uversty of Calgary 2500 Uversty Drve
More informationUnsupervised Discretization Using Kernel Density Estimation
Usupervsed Dscretzato Usg Kerel Desty Estmato Maregle Bba, Floraa Esposto, Stefao Ferll, Ncola D Mauro, Teresa M.A Basle Departmet of Computer Scece, Uversty of Bar Va Oraboa 4, 7025 Bar, Italy {bba,esposto,ferll,dm,basle}@d.uba.t
More informationSummary of Curve Smoothing Technology Development
RESEARCH ARTICLE Summary of Curve Smoothg Techology Developmet Wu Yze,ZhagXu,Jag Mgyag (College of Mechacal Egeerg, Shagha Uversty of Egeerg Scece, Shagha, Cha) Abstract: Wth the cotuous developmet of
More informationA Comparison of Heuristics for Scheduling Spatial Clusters to Reduce I/O Cost in Spatial Join Processing
Edth Cowa Uversty Research Ole ECU Publcatos Pre. 20 2006 A Comparso of Heurstcs for Schedulg Spatal Clusters to Reduce I/O Cost Spatal Jo Processg Jta Xao Edth Cowa Uversty 0.09/ICMLC.2006.258779 Ths
More informationA hybrid method using FAHP and TOPSIS for project selection Xuan Lia, Jiang Jiangb and Su Deng c
5th Iteratoal Coferece o Computer Sceces ad Automato Egeerg (ICCSAE 205) A hybrd method usg FAHP ad TOPSIS for project selecto Xua La, Jag Jagb ad Su Deg c College of Iformato System ad Maagemet, Natoal
More informationABSTRACT Keywords
A Preprocessg Scheme for Hgh-Cardalty Categorcal Attrbutes Classfcato ad Predcto Problems Daele Mcc-Barreca ClearCommerce Corporato 1100 Metrc Blvd. Aust, TX 78732 ABSTRACT Categorcal data felds characterzed
More informationANALYSIS OF VARIANCE WITH PARETO DATA
Proceedgs of the th Aual Coferece of Asa Pacfc Decso Sceces Isttute Hog Kog, Jue -8, 006, pp. 599-609. ANALYSIS OF VARIANCE WITH PARETO DATA Lakhaa Watthaacheewakul Departmet of Mathematcs ad Statstcs,
More informationA New Newton s Method with Diagonal Jacobian Approximation for Systems of Nonlinear Equations
Joural of Mathematcs ad Statstcs 6 (3): 46-5, ISSN 549-3644 Scece Publcatos A New Newto s Method wth Dagoal Jacoba Appromato for Systems of Nolear Equatos M.Y. Wazr, W.J. Leog, M.A. Hassa ad M. Mos Departmet
More informationClustering Algorithm for High Dimensional Data Stream over Sliding Windows
2011 Iteratoal Jot Coferece of IEEE TrustCom-11/IEEE ICESS-11/FCST-11 Clusterg Algorthm for Hgh Dmesoal Data Stream over Sldg Wdows Weguo Lu, Ja OuYag School of Iformato Scece ad Egeerg, Cetral South Uversty,
More informationFrom Math to Efficient Hardware. James C. Hoe Department of ECE Carnegie Mellon University
FFT Compler: From Math to Effcet Hardware James C. Hoe Departmet of ECE Carege Mello Uversty jot wor wth Peter A. Mlder, Fraz Frachett, ad Marus Pueschel the SPIRAL project wth support from NSF ACR-3493,
More informationSpatial Interpolation Using Neural Fuzzy Technique
Wog, K.W., Gedeo, T., Fug, C.C. ad Wog, P.M. (00) Spatal terpolato usg eural fuzzy techque. I: Proceedgs of the 8th Iteratoal Coferece o Neural Iformato Processg (ICONIP), Shagha, Cha Spatal Iterpolato
More information2 General Regression Neural Network (GRNN)
4 Geeral Regresso Neural Network (GRNN) GRNN, as proposed b oald F. Specht [Specht 9] falls to the categor of probablstc eural etworks as dscussed Chapter oe. Ths eural etwork lke other probablstc eural
More informationProf. Feng Liu. Winter /24/2019
Prof. Feg Lu Wter 209 http://www.cs.pd.edu/~flu/courses/cs40/ 0/24/209 Last Tme Feature detecto 2 Toda Feature matchg Fttg The followg sldes are largel from Prof. S. Lazebk 3 Wh etract features? Motvato:
More informationStatistical Techniques Employed in Atmospheric Sampling
Appedx A Statstcal Techques Employed Atmospherc Samplg A.1 Itroducto Proper use of statstcs ad statstcal techques s ecessary for assessg the qualty of ambet ar samplg data. For a comprehesve dscusso of
More informationSignal Classification Method Based on Support Vector Machine and High-Order Cumulants
Wreless Sesor Network,,, 48-5 do:.46/ws..7 Publshed Ole Jauary (http://www.scrp.org/joural/ws/). Sgal Classfcato Method Based o Support Vector Mache ad Hgh-Order Cumulats Abstract X ZHOU, Yg WU, B YANG
More informationA Novel Clustering Algorithm Based on Graph Matching
JOURNAL OF SOFTWARE, VOL. 8, NO. 4, APRIL 203 035 A Novel Clusterg Algorthm Based o raph Matchg uoyua L School of Computer Scece ad Techology, Cha Uversty of Mg ad Techology, Xuzhou, Cha State Key Laboratory
More informationTDT-2004: ADAPTIVE TOPIC TRACKING AT MARYLAND
TDT-2004: ADAPTIVE TOPIC TRACKING AT MARYLAND Tamer Elsayed, Douglas W. Oard, Davd Doerma Isttute for Advaced r Studes Uversty of Marylad, College Park, MD 20742 Cotact author: telsayed@cs.umd.edu Gary
More information