Optimizing Document Scoring for Query Retrieval
Brent Ellwein

Abstract

The goal of this project was to automate the process of tuning a document query engine. Specifically, I used machine learning to find an optimal scoring function given term frequency, document frequency, and query matching. The accuracy of the results was measured as the greatest average precision of the top ten scored documents.

Background

This project considered three statistics when scoring a document's relevance to a given query. These statistics are common throughout Information Retrieval:

- Term Frequency (tf): a measure of the number of times that a given token appears in a document
- Inverse Document Frequency (idf): a measure of the inverse of the number of documents that the token appears in
- Overlap (coord): a measure of the number of query terms present in the document

For a typical information retrieval system, one would not use the raw number, but would instead use a scaling function of the statistic. One might use square roots or logs to perform the scaling. The total score is reported as a combination of the individual terms. However, it is not generally known what the best scoring function or combination of functions is. In fact, the best scoring function may be different depending on the domain of the corpus and any artifacts of the content.[1]

Experimental Conditions

For training, testing, and the collection of results, I used the Cranfield collection. This corpus consists of approximately 1,400 abstracts and 226 queries, all related to aeronautical engineering. Each document is scored relevant or not relevant for each of the queries.

[1] For example, when trying to avoid spam documents on the internet, one might want to severely dampen, or even limit, the tf score.

Scoring is calculated as the scaled product of tf, idf, and coord. After computing the score for each document, the top 10 most relevant documents are compared against the known relevancy values. The precision is the ratio of relevant documents to the total number of documents returned (10).
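The evaluation loop described above is easy to sketch. The following Python fragment is my own illustration (function names are hypothetical, not code from the project): it takes a query's scored documents, keeps the top ten, and reports the fraction that are known to be relevant, then averages that measure over all queries.

```python
def precision_at_10(scores, relevant):
    """scores: {doc_id: score}; relevant: set of relevant doc_ids.
    Returns the fraction of the ten highest-scored documents that are relevant."""
    top10 = sorted(scores, key=scores.get, reverse=True)[:10]
    return sum(1 for d in top10 if d in relevant) / 10

def average_precision_at_10(per_query_scores, per_query_relevant):
    """Average the per-query precision@10 over all queries; this is the
    quantity the learning algorithm tries to maximize."""
    vals = [precision_at_10(per_query_scores[q], per_query_relevant[q])
            for q in per_query_scores]
    return sum(vals) / len(vals)
```

With twelve scored documents of which three relevant ones land in the top ten, the measure is 3/10.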
These precision scores are then averaged, and the result is the overall precision of the engine on the queries. The objective is to maximize this overall precision.

Learning Environment

In order to find the optimal combination of scaling functions, I used the following parameterized function weights. Each of the parameters θ is optimized by the learning algorithm.

tf (as a function of frequency f):

    tf(f) = θ0 + θ1·f + θ2·log(f + 1) + θ3·√f + θ4·e^f

idf (as a function of document frequency df and number of documents n):

    idf(df, n) = θ5 + θ6·1/(df + 1) + θ7·log(n/(df + 1)) + θ8·n/(df + 1) + θ9·√(n/(df + 1))

coord (as a function of term matches m and number of query terms q):

    coord(m, q) = θ10 + θ11·m/q + θ12·log(m/q) + θ13·√(m/q) + θ14·m

total score (product of the parameterized groups):

    score = (θ15·tf) · (θ16·idf) · (θ17·coord)

Each parameter was optimized according to K-fold cross-validation for various values of K. It seems obvious that the parameters θ15, θ16, and θ17 are redundant (that is to say, their values could in effect be combined with the parameters from the groups they correspond to), but early experiments showed that they improved the convergence rate. Later, as normalized Coordinate Ascent was implemented, their purpose was decoupled from the parameters inside the groups and they became scaling factors for each group as a whole.

The default parameters for the query engine, which are typical default parameters used throughout the field of Information Retrieval, are:

    √tf · (log(n/(df + 1)) + 1) · m/q

In other words, θ3 = θ5 = θ7 = θ11 = θ15 = θ16 = θ17 = 1. All others would be set to zero.
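To make the parameterization concrete, here is a Python sketch of the scoring function as reconstructed above (my own code with my own names, not the project's implementation; the +1 guard inside the coord logarithm is my addition so that m = 0 stays defined):

```python
import math

def score(theta, f, df, n, m, q):
    """theta: list of 18 weights; f: term frequency; df: document frequency;
    n: corpus size; m: matching query terms; q: number of query terms."""
    tf_part = (theta[0] + theta[1] * f + theta[2] * math.log(f + 1)
               + theta[3] * math.sqrt(f) + theta[4] * math.exp(f))
    r = n / (df + 1)  # the idf ratio
    idf_part = (theta[5] + theta[6] / (df + 1) + theta[7] * math.log(r)
                + theta[8] * r + theta[9] * math.sqrt(r))
    coord_part = (theta[10] + theta[11] * m / q
                  + theta[12] * math.log(m / q + 1)  # +1 guard is my addition
                  + theta[13] * math.sqrt(m / q) + theta[14] * m)
    # Total score: the product of the three weighted group scores.
    return (theta[15] * tf_part) * (theta[16] * idf_part) * (theta[17] * coord_part)

# The default configuration: sqrt(tf) * (log(n/(df+1)) + 1) * m/q.
DEFAULT = [0.0] * 18
for i in (3, 5, 7, 11, 15, 16, 17):
    DEFAULT[i] = 1.0
```

With the default weights, a term occurring 4 times in a document, appearing in 9 of 100 documents, and matching 2 of 4 query terms scores √4 · (log(10) + 1) · 0.5.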
Bounded Coordinate Ascent

The first algorithm I attempted to optimize the parameters with was a bounded coordinate ascent model. It is important to note that each training iteration started with random values for each of the parameters. I discovered that if I initialized the values to the default values, the algorithm was unable to escape the local optimum that those values represent.

In this model, each parameter is optimized individually according to the Newton-Raphson method. At any given optimization step, the values each theta is allowed to take are strictly bounded by -1 and 1. The model was initialized with random values, and optimization was repeated until convergence.

The results of this optimization were highly unsatisfactory compared against the default configuration. The following charts summarize the precision results for a couple of trials of K-fold validation. For these charts, "training" refers to the queries used as the training set and "test" refers to the queries held out of training and used for validation. "Default" refers to the default configuration mentioned above; "learned" refers to the optimized parameters.

[Charts: Average Precision (K = 10) and Average Precision (K = 20)]

It is easy to see from these graphs that the optimized classifier generally performs worse than the default classifier for both training and test queries. The optimized classifier more closely matches, and even outperforms, the default classifier when K is large, which suggests that a large number of training examples is necessary in order to properly optimize the parameters. Further experiments show that even when using the entire dataset, the optimized classifier will typically underperform the default classifier.

An explanation for the negative results is that training yielded a high level of variation in the parameters across training datasets. This indicates the presence of many local optima that the algorithm was unable to recover from.
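As a rough illustration of the procedure (not the project's implementation), the sketch below runs bounded coordinate ascent from a random start. A simple grid line search stands in for the author's Newton-Raphson step, and a caller-supplied objective stands in for average precision on the training queries:

```python
import random

def coordinate_ascent(objective, n_params=18, lo=-1.0, hi=1.0,
                      sweeps=20, grid=41, seed=0):
    """Maximize objective(theta) one coordinate at a time, each theta
    clamped to [lo, hi]. Grid line search replaces Newton-Raphson here."""
    rng = random.Random(seed)
    theta = [rng.uniform(lo, hi) for _ in range(n_params)]  # random start
    best = objective(theta)
    for _ in range(sweeps):
        improved = False
        for i in range(n_params):
            # Try each candidate value for theta[i]; keep only strict improvements.
            for step in range(grid):
                cand = lo + (hi - lo) * step / (grid - 1)
                old = theta[i]
                theta[i] = cand
                val = objective(theta)
                if val > best:
                    best = val
                    improved = True
                else:
                    theta[i] = old
        if not improved:
            break  # converged: a full sweep changed nothing
    return theta, best
```

On a smooth toy objective whose optimum lies on the grid, the routine recovers it exactly; on the real, non-smooth precision objective it can stall in the local optima described above.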
Below is a table which summarizes the fluctuations in the values for each theta.

[Table: Standard Deviation for each theta, for K = 5, K = 10, and K = 20]

As can clearly be seen in the table, nearly every value of theta fluctuates violently across the K trials of K-fold cross-validation. The high degree of variation led me to believe that the parameters were not converging to optima close to the global maximum, but were too highly influenced by their initial random values.

Normalized Coordinate Ascent

The next logical step in order to increase the precision of the optimized query engine was to add restrictions on the values each parameter could hold. As before, the values had to be initialized at random. First, I tried restricting the values to be bounded by 0 and 1, but that did not provide noticeable improvement. Then, I added the restriction that, after each optimization, all the values be re-normalized across categories so that they would sum to 1. Naturally, all the values would then be restricted to the range 0-1 after optimization, but the Newton-Raphson steps would still be allowed to explore beyond that boundary. While this yielded slightly better results, the results were still on average only equal to the default classifier.

Going another step further, I restricted the parameters to sum to one only within specific groupings. The groupings are enumerated as follows:

    tf group:          θ0 + θ1 + θ2 + θ3 + θ4 = 1
    idf group:         θ5 + θ6 + θ7 + θ8 + θ9 = 1
    coord group:       θ10 + θ11 + θ12 + θ13 + θ14 = 1
    total score group: θ15 + θ16 + θ17 = 1

I ran the training and testing examples again and recorded that the fluctuations in the parameters theta had noticeably reduced. While it is true that the forced scaling accounts for some of the reduction, another thing to notice is that several parameters almost never change. The following table, as before, summarizes the results.

[Table: Standard Deviation for each theta (normalized), for K = 5, K = 10, and K = 20]

While there is some variation in the data, this can be explained by looking at the raw values for the parameters (which are too numerous to include in this report). I will provide an in-depth discussion of each parameter which will provide some reasoning for the behavior of its optimized value.

Having succeeded in reducing the variance of the optimization, we can now look more closely at the trial results for K-fold cross-validation. As before, the following precision charts summarize the behavior for each of K = 5, 10, 20.

[Charts: Average Precision (normalized), for K = 5, K = 10, and K = 20]

Notice that the precision is consistently better for the learned parameters than for the default settings. This generally applies both to the training data (as you would expect) and to the testing data as well! One interesting item to notice is that, comparing K = 10 to K = 20, we find a higher percentage of the cross-validation precisions to be less than the default for the case of K = 20. This suggests that the learning algorithm is overtraining on the training data, and that the algorithm has too much variance and not enough bias. Looking for trends in the data, I will suggest why this might be so.

Parameter optimization and trends

The learning algorithm partitions the parameters into four groups by their normalization. In this discussion, it is convenient to think of the parameters in these natural groupings.

Total Score Group

Parameters θ15-17 behave in one of two distinct ways. Either θ17 is dominant (i.e. has a value between 0.9 and 1.0), or the weight is roughly evenly distributed among the three parameters.
For the case where θ17 is dominant, the difference in test precision between the default and optimized parameters tends to be larger than when the weight is more evenly distributed. Even though the parameters are multiplied, and thus can be treated as a single entity, arranging them in this fashion allowed the learning algorithm to retrain this parameter multiple times. Experimental tests show that the learning algorithm performs less effectively if the parameters are collapsed into a single entity.

TF Group

For each trial, the optimized values of both θ0 and θ4 were essentially 0. Effectively, the algorithm has provided experimental evidence to suggest that one would not want to use an exponential or constant term frequency. The other parameters in this group, θ1-3, were given approximately equal weights and grouped with apparent randomness. That is to say, for one trial, θ1 and θ2 might each have 1/2 while θ3 is 0, while on another trial θ1, θ2, and θ3 might each have 1/3. On other trials, either θ1, θ2, or θ3 might be 1 while the other two are zero. Because there is not a strong correlation between any combination and a more precise result, the data suggests that any of the scaling functions (linear, logarithmic, or root) will have an equal benefit on the overall query precision.

IDF Group

The parameter θ7 dominated this group. As is evidenced by the low deviations for each element in this group (ignoring the noisy case of K = 5, which does not provide enough training samples), the values for each theta were essentially the same for each set of training data. While the other values, θ5, θ6, θ8, and θ9, were each zero or very near zero, θ7 was consistently at or very near 1. So, the experimental data suggests that a good classifier will use log(n/(df + 1)) as a scoring function for idf.[2]

COORD Group

As evidenced by the low deviations for these parameters, they nearly always optimized to the same values. In the case of θ11, θ12, and θ13, that value was always zero. The values of θ10 and θ14 were coupled: either θ14 was one and θ10 zero, or they both were a half. There is a high correlation (about 66%) between the case where θ10 = θ14 = 0.5 and the test results being lower for the optimized parameters than for the default parameters. So, it would seem that it is more advantageous (higher bias) to have θ14 equal 1 and the others set to zero.
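Taken together, the trends above imply a much simpler scorer. The fragment below is my own distillation of those findings, not code from the report: a single tf scaling (root is used here; the data suggests linear or log would do as well), log(n/(df + 1)) for idf, and the raw match count m for coord:

```python
import math

def learned_score(f, df, n, m):
    """Simplified scorer implied by the learned parameter trends:
    sqrt(tf) * log(n/(df+1)) * m. (My distillation, not the report's code.)"""
    return math.sqrt(f) * math.log(n / (df + 1)) * m
```

Note that, unlike the default, this drops the +1 constant in the idf term and ignores the query length in the coord term.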
Functionally, this corresponds to taking into consideration only the number of matches between a query and a document, and ignoring the size of the query altogether.

[2] Recall that n is the number of documents in the corpus, while df is the number of documents which contain the term of interest.

Benchmarking

During the discussion of the tests, I have shown that the optimization process can produce gains in average precision over the top ten results. Now, I will widen the discussion to include some other tests to see if the optimized query parameters are really better, or if the gains in precision came at too great a cost.

Precision-Recall

One metric common in IR is to plot a precision-recall curve. Comparing both the default and optimized parameters, the following precision-recall curves are produced:

[Chart: Precision vs. Recall, default and learned classifiers]

It is easy to tell from the graph that there is not a noticeable difference between the two classifiers when averaged over all the queries. Generally, the learned classifier has a slightly higher precision rate, as evidenced by the plot below. For the plot, positive values indicate that the learned classifier performed better, i.e. the differences are calculated as learned - default.

[Chart: Difference in Precision vs. Recall]

Clearly, by measuring only the rate of precision at a given recall level, there is not a significant difference between the learned and default classifiers. While this result may not be terribly impressive, it does show that the learned query parameters have a similar bias and variance as the default query parameters. Now, having shown that the learned parameters are on average no worse than the default parameters, let's take a closer look at precision in the top 10.
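For reference, the per-query precision-recall points used in such a curve can be computed as in this sketch (my own helper, with hypothetical names): walk down the ranked list and record a (recall, precision) point each time a relevant document is encountered.

```python
def precision_recall_points(ranked, relevant):
    """ranked: doc ids in descending score order; relevant: set of ids.
    Returns (recall, precision) pairs at each relevant document's rank."""
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points
```

Averaging these curves over all 226 queries, after interpolation onto common recall levels, yields plots like the ones described above.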
Since we were trying to optimize the precision in the top ten documents, we would expect to see some performance increases for the top ten results. The queries can be grouped into three groupings: those where the precision was the same, those where the learned parameters performed better, and those where the default performed better.

Default was better (18 queries)

Typically, the difference between default and learned was only one; in fact, of the 18 queries in this group, only one query had a difference of less than -1. Many of the losses were from documents moving from the 9th or 10th rank to a slightly lower rank. Only two queries which had relevant results in the top ten for the default had none for the learned parameters.

Learned was better (42 queries)

As was the case when the default was better, the difference in this group was typically only 1. However, there were two cases where the precision was improved by 2, and one case where the precision was improved by 4. Moreover, there were two cases where the number of relevant documents increased from 0 to 1.

Precision was the same (166 queries)

For most of the queries, the precision in the top ten was the same. The only differences in this group were that the ranking of individual documents was somewhat re-ordered. In rare cases, a relevant document was replaced with one whose original rank was greater than 10.

Learning Rate

As mentioned previously, the default values represent a stubborn maximum such that it is hard to explore beyond it. To get around this feature, I was forced to initialize the parameters for each theta randomly for each training trial. Fortunately, the learning algorithm typically converged after only 5 or 6 passes through the data. Below is a chart which summarizes a learning curve from one of the examples.

[Chart: Learning Curve (classifier precision vs. iterations, learned and default)]

Conclusions

Using the optimizations found by the learning algorithm, one can expect to see more precise results in the top ten about 20% of the time, while experiencing less precise results about 8% of the time.
This net gain indicates that the learned parameters provide some improvement for the top ten results, but as evidenced by the precision-recall figure, these gains do not carry over to the full precision-recall curve. Beyond simply the gain in precision, there are several other results which are worth noting. First, the algorithm was successful in demonstrating that some of the proposed features are bad candidates and should not be used in an information retrieval system. One surprising result was that the algorithm was unable to consistently select parameters for the TF group. The data suggests that using linear, logarithmic, or root scaling of the term frequency will have basically the same effect; I had expected that the linear parameter would be optimized to a very low value. Another surprising experimental result was that the default values represented a stubborn local maximum: if I initialized using these values, the learning algorithm could not recover from this maximum.

Acknowledgements

Dr. Christopher Manning (Stanford) was helpful by providing the Cranfield data files as well as suggesting candidate scaling functions to parameterize. Lucene was the backbone of the project; its open-source libraries allowed me to build custom indices and run queries against them while training.
More informationA Fast Content-Based Multimedia Retrieval Technique Using Compressed Data
A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,
More informationCluster Analysis of Electrical Behavior
Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationCombining Multiple Resources, Evidence and Criteria for Genomic Information Retrieval
Combnng Multple Resources, Evdence and Crtera for Genomc Informaton Retreval Luo S 1, Je Lu 2 and Jame Callan 2 1 Department of Computer Scence, Purdue Unversty, West Lafayette, IN 47907, USA ls@cs.purdue.edu
More informationAdditive Groves of Regression Trees
Addtve Groves of Regresson Trees Dara Sorokna, Rch Caruana, and Mrek Redewald Department of Computer Scence, Cornell Unversty, Ithaca, NY, USA {dara,caruana,mrek}@cs.cornell.edu Abstract. We present a
More informationEECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introducton to Bonformatcs Sequence Algnment Luke Huan Electrcal Engneerng and Computer Scence http://people.eecs.ku.edu/~huan/ HMM Π s a set of states Transton Probabltes a kl Pr( l 1 k Probablty
More informationFrom Comparing Clusterings to Combining Clusterings
Proceedngs of the Twenty-Thrd AAAI Conference on Artfcal Intellgence (008 From Comparng Clusterngs to Combnng Clusterngs Zhwu Lu and Yuxn Peng and Janguo Xao Insttute of Computer Scence and Technology,
More informationBackpropagation: In Search of Performance Parameters
Bacpropagaton: In Search of Performance Parameters ANIL KUMAR ENUMULAPALLY, LINGGUO BU, and KHOSROW KAIKHAH, Ph.D. Computer Scence Department Texas State Unversty-San Marcos San Marcos, TX-78666 USA ae049@txstate.edu,
More informationCHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION
48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue
More informationTN348: Openlab Module - Colocalization
TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages
More informationMachine Learning. Topic 6: Clustering
Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess
More informationGeneralized Team Draft Interleaving
Generalzed Team Draft Interleavng Eugene Khartonov,2, Crag Macdonald 2, Pavel Serdyukov, Iadh Ouns 2 Yandex, Russa 2 Unversty of Glasgow, UK {khartonov, pavser}@yandex-team.ru 2 {crag.macdonald, adh.ouns}@glasgow.ac.uk
More informationThe Research of Support Vector Machine in Agricultural Data Classification
The Research of Support Vector Machne n Agrcultural Data Classfcaton Le Sh, Qguo Duan, Xnmng Ma, Me Weng College of Informaton and Management Scence, HeNan Agrcultural Unversty, Zhengzhou 45000 Chna Zhengzhou
More informationThe Effect of Similarity Measures on The Quality of Query Clusters
The effect of smlarty measures on the qualty of query clusters. Fu. L., Goh, D.H., Foo, S., & Na, J.C. (2004). Journal of Informaton Scence, 30(5) 396-407 The Effect of Smlarty Measures on The Qualty of
More informationSupport Vector Machines. CS534 - Machine Learning
Support Vector Machnes CS534 - Machne Learnng Perceptron Revsted: Lnear Separators Bnar classfcaton can be veed as the task of separatng classes n feature space: b > 0 b 0 b < 0 f() sgn( b) Lnear Separators
More informationUsing Ambiguity Measure Feature Selection Algorithm for Support Vector Machine Classifier
Usng Ambguty Measure Feature Selecton Algorthm for Support Vector Machne Classfer Saet S.R. Mengle Informaton Retreval Lab Computer Scence Department Illnos Insttute of Technology Chcago, Illnos, U.S.A
More informationLearning to Project in Multi-Objective Binary Linear Programming
Learnng to Project n Mult-Objectve Bnary Lnear Programmng Alvaro Serra-Altamranda Department of Industral and Management System Engneerng, Unversty of South Florda, Tampa, FL, 33620 USA, amserra@mal.usf.edu,
More informationAn efficient method to build panoramic image mosaics
An effcent method to buld panoramc mage mosacs Pattern Recognton Letters vol. 4 003 Dae-Hyun Km Yong-In Yoon Jong-Soo Cho School of Electrcal Engneerng and Computer Scence Kyungpook Natonal Unv. Abstract
More informationKIDS Lab at ImageCLEF 2012 Personal Photo Retrieval
KD Lab at mageclef 2012 Personal Photo Retreval Cha-We Ku, Been-Chan Chen, Guan-Bn Chen, L-J Gaou, Rong-ng Huang, and ao-en Wang Knowledge, nformaton, and Database ystem Laboratory Department of Computer
More informationHierarchical Bayesian Domain Adaptation
Herarchcal Bayesan Doman Adaptaton Jenny Rose Fnkel and Chrstopher D. Mannng Computer Scence Department Stanford Unversty Stanford, CA 94305 {jrfnkel mannng}@cs.stanford.edu Abstract Mult-task learnng
More informationSelecting Query Term Alterations for Web Search by Exploiting Query Contexts
Selectng Query Term Alteratons for Web Search by Explotng Query Contexts Guhong Cao Stephen Robertson Jan-Yun Ne Dept. of Computer Scence and Operatons Research Mcrosoft Research at Cambrdge Dept. of Computer
More informationExperiments in Text Categorization Using Term Selection by Distance to Transition Point
Experments n Text Categorzaton Usng Term Selecton by Dstance to Transton Pont Edgar Moyotl-Hernández, Héctor Jménez-Salazar Facultad de Cencas de la Computacón, B. Unversdad Autónoma de Puebla, 14 Sur
More information11. HARMS How To: CSV Import
and Rsk System 11. How To: CSV Import Preparng the spreadsheet for CSV Import Refer to the spreadsheet template to ad algnng spreadsheet columns wth Data Felds. The spreadsheet s shown n the Appendx, an
More informationWhy visualisation? IRDS: Visualization. Univariate data. Visualisations that we won t be interested in. Graphics provide little additional information
Why vsualsaton? IRDS: Vsualzaton Charles Sutton Unversty of Ednburgh Goal : Have a data set that I want to understand. Ths s called exploratory data analyss. Today s lecture. Goal II: Want to dsplay data
More informationNetwork Intrusion Detection Based on PSO-SVM
TELKOMNIKA Indonesan Journal of Electrcal Engneerng Vol.1, No., February 014, pp. 150 ~ 1508 DOI: http://dx.do.org/10.11591/telkomnka.v1.386 150 Network Intruson Detecton Based on PSO-SVM Changsheng Xang*
More informationComparing High-Order Boolean Features
Brgham Young Unversty BYU cholarsarchve All Faculty Publcatons 2005-07-0 Comparng Hgh-Order Boolean Features Adam Drake adam_drake@yahoo.com Dan A. Ventura ventura@cs.byu.edu Follow ths and addtonal works
More informationExtraction of User Preferences from a Few Positive Documents
Extracton of User Preferences from a Few Postve Documents Byeong Man Km, Qng L Dept. of Computer Scences Kumoh Natonal Insttute of Technology Kum, kyungpook, 730-70,South Korea (Bmkm, lqng)@se.kumoh.ac.kr
More informationCS47300: Web Information Search and Management
CS47300: Web Informaton Search and Management Prof. Chrs Clfton 15 September 2017 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group Retreval Models Informaton Need Representaton
More informationA Hybrid Genetic Algorithm for Routing Optimization in IP Networks Utilizing Bandwidth and Delay Metrics
A Hybrd Genetc Algorthm for Routng Optmzaton n IP Networks Utlzng Bandwdth and Delay Metrcs Anton Redl Insttute of Communcaton Networks, Munch Unversty of Technology, Arcsstr. 21, 80290 Munch, Germany
More informationA Statistical Model Selection Strategy Applied to Neural Networks
A Statstcal Model Selecton Strategy Appled to Neural Networks Joaquín Pzarro Elsa Guerrero Pedro L. Galndo joaqun.pzarro@uca.es elsa.guerrero@uca.es pedro.galndo@uca.es Dpto Lenguajes y Sstemas Informátcos
More informationOutline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1
4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:
More informationFederated Search of Text-Based Digital Libraries in Hierarchical Peer-to-Peer Networks
Federated Search of Text-Based Dgtal Lbrares n Herarchcal Peer-to-Peer Networks Je Lu School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 jelu@cs.cmu.edu Jame Callan School of Computer
More informationLecture #15 Lecture Notes
Lecture #15 Lecture Notes The ocean water column s very much a 3-D spatal entt and we need to represent that structure n an economcal way to deal wth t n calculatons. We wll dscuss one way to do so, emprcal
More informationNews. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example
Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt
More informationIncremental Learning with Support Vector Machines and Fuzzy Set Theory
The 25th Workshop on Combnatoral Mathematcs and Computaton Theory Incremental Learnng wth Support Vector Machnes and Fuzzy Set Theory Yu-Mng Chuang 1 and Cha-Hwa Ln 2* 1 Department of Computer Scence and
More information