Behavioral Model Extraction of Search Engines Used in an Intelligent Meta Search Engine

KAVEH KAVOUSI
Computer Department, Azad University, Garmsar Branch

BEHZAD MOSHIRI
Electrical and Computer Department, Faculty of Engineering, University of Tehran
IRAN

Abstract: Information fusion, placed above the data fusion level, lays the groundwork for obtaining more complete and clearer results from uncertain information collected on one subject from different aspects. Nowadays there is a tangible need for intelligent systems that act as a personal meta search engine, capable of supplying the user with the needed information from a great mass of information resources. Moreover, the measures taken so far in this area have many deficiencies. In a meta search engine, the user's interests are received and appropriate queries based on them are transmitted to the search engines. The results returned by the search engines are then filtered and made available to the user in order of priority. It is obvious, however, that different search engines behave differently on different subjects. In this study we therefore examine one part of a customized intelligent agent that gradually extracts a behavioral model of the search engines in different subjective clusters, according to the feedback it gets from the user.

Key words: Data/Information fusion, list fusion, meta search engine, intelligent agent, subjective clusters.

1. Introduction

Undoubtedly, suitable retrieval of information from the Internet and other large-scale and very-large-scale data sources is one of the most important problems in the efficient use of information sources. Nowadays the web is the largest source of documents and other forms of information, and a suitable ground for evaluating different information retrieval techniques. The more the web expands, the more evident the need for powerful search tools becomes. At present there are many web search services, but none of them is as helpful as expected, and in most cases the results are unsatisfactory.
One of the most important reasons for this is the users' inaccurate knowledge of the abilities of the available search engines, in other words, of their behavioral model. Research on Internet meta search engines shows that list fusion, which has its own independent literature [], and the behavioral model of Internet search engines have not been studied from the angle considered in this paper.

2. The Characteristics of an Intelligent Meta Search Engine Based on Information Fusion

This paper discusses an intelligent meta search engine that uses information fusion techniques to act as a customized meta search engine for the user. The agent receives the words or phrases of interest to the user, who wants to find and study subjects related to them, and then asks the user to weight each word according to how important its presence in, or absence from, the text is. Weight is a linguistic concept here: the user rates the importance of a word's presence or absence as None, Low, Medium, High, or Very High. The agent then prepares the queries in a unit named "Query Generator", according to the number of information servers (e.g. Internet search engines and/or data sources) [,]. After the queries are sent, each server returns a list of documents ranked by their proximity to the subject, according to the algorithm that server works with. The meta search engine then reviews these lists, eliminates the repeated items, and fuses the lists with list fusion algorithms so that a single ranked list of documents is prepared. In this list a score is given to each document based on its position in the list []. The documents in this list are then processed one by one: for each, the presence or absence of the key words or phrases selected by the user is determined, and two scores are given to the document according to the quantity and distribution of their occurrences. The Ordered Weighted Averaging (OWA) operator [,8,9,10,] plays the basic role in this mechanism. These aspects of the meta search engine's operation are discussed thoroughly in [,,].

This paper focuses on the method of fusing the lists obtained from the search engines and on modeling the search engines used by the meta search engine. Each time the user decides to use the meta search engine, he or she specifies the subjective cluster to which the subject of interest belongs [6,7]. By subjective cluster we mean "a logical classification of interesting subjects". Each time the user starts a new search, he or she can select one of the available clusters or create a new one. Some of the clusters available at present are shown in Table 1. The agent has allocated a score to each server based on its historical performance in each subjective cluster, and it assigns to each document the score of the server that retrieved it. These scores are updated after each use of the meta search engine by an algorithm discussed later, so the behavioral model of each search engine and its efficiency on a particular subject is gradually formed in the mind of the intelligent meta search engine.

As a result, several different scores are obtained for each document, and at the end the agent must calculate the final score of each document from them and present it to the user. This is done by fusing the scores with information fusion methods; each score is given a weight that shows the importance of that criterion in the final decision. Finally, the score of each server in the particular subjective cluster is updated, with or without feedback received from the user.
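The OWA operator referred to above is not spelled out in this paper. As a minimal sketch of its standard definition (sort the inputs in descending order, then take a weighted sum with position weights that sum to 1), the following Python function may help. The criterion scores and the weights in the usage lines are illustrative assumptions, not values from the paper.

```python
def owa(scores, weights):
    """Ordered Weighted Averaging: weights attach to sorted positions, not to criteria."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)  # largest score first
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical example: four criterion scores of one document, all in [0, 1].
document_scores = [0.9, 0.4, 0.7, 0.6]
final_score = owa(document_scores, [0.4, 0.3, 0.2, 0.1])
```

Putting all the weight on the first position turns the operator into a maximum, putting it on the last position gives a minimum, and equal weights give the arithmetic mean, which is why the choice of weights controls how optimistic the fused score is.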
3. The Problem of List Fusion

Different data sources on the web often complement each other. Thus, to cover all the information resources and obtain cleaner results, a logical strategy is to use different search tools and then refine and fuse their results. The question is: what is the best method for fusing these lists? The question matters because the lists returned by the search tools are usually ranked. We want to fuse all of them into a single list whose items are selected from the items presented in all the lists, and how to produce that final list is the subject of this section.

Suppose we have a group of information servers $S_i$, $i = 1, \dots, M$, where $M$ is the number of servers. We also suppose that each server has a unique collection of documents, that is, each document is in just one information server. Of course we cannot assume this for Internet search engines, but we can create the same condition by eliminating the repeated items from the other lists. We further suppose that each information server has its own search mechanism. For a query $Q$, each server gives a score to each document and returns a ranked list of the documents related to the query as its answer.

The problem to solve is to choose the $N$ documents most related to the query and put them into the final list. Each of these $N$ documents may come from any of the servers. What makes the job harder is that the servers differ in their methods of allocating scores, and these methods may not be comparable at all. For this reason we cannot simply pick $N$ documents in order from the highest server-assigned score to the lowest.

3.1 Representing the Mathematical Formulation of the List Fusion Mechanism

Let $M$ denote the number of lists that we want to fuse, and denote the lists themselves by $L_1, \dots, L_M$. For each $i$ from 1 to $M$, let $N_i$ denote the number of items in the $i$-th list.
A natural way of fusing these $M$ lists into a single sorted list is to assign a value $v$ to each item of the lists and then sort all $N_1 + N_2 + \dots + N_M$ elements in increasing order of this value. So the question is: how can we determine the value $v$ for each item? We need a function of two variables, because an item is uniquely characterized by two parameters: the number $i$ of the list $L_i$ from which the item comes, and the order $j$ of the item in the corresponding list $L_i$ ($j = 1, 2, \dots, N_i$). The value $v$ must be uniquely determined by these two parameters. The dependence of $v$ on $i$ can only appear through the dependence on $N_i$, so $v$ is a function of $j$ and $N_i$.

$$v = v(j, N_i) \tag{1}$$

If two lists $L_i$ and $L_k$ have the same length, there is no reason to give either of them higher priority; that is, $v(j, N_i) = v(j, N_k)$. The way of calculating $v(j, N)$ is described thoroughly in []. Here we only restate the optimal form of this function:

$$v(j, N) = N^{\alpha}\,(j + cN)$$

In the same reference it is shown that this formula, also for $\alpha = 0$, can model the behavior of a human expert. After applying the list fusion method we have a ranked list in which each document has a score based on its rank. A simple and adequate method is presented later, but first we give the following definitions:

Total number of documents = $N$
Absolute score of document $j$ in the fused list = $\mu_j$ (this score was described above)
Normalized score of document $j$ = $V_j$

The normalized score is one of the four criteria in the final scoring of documents and is calculated for each document as:

$$V_j = \frac{\mu_j}{\max_{1 \le l \le N} \mu_l}$$

4. Score Allocation to Information Servers

As explained above, we now have a ranked list of documents extracted from different search engines. The behavioral model of the search engines, however, plays no role in this ranking. Yet, because the design parameters, the designers, and the aims of the search engines differ, each search engine performs strongly in one field and moderately or weakly in another. Neglecting this fact therefore decreases the accuracy of the meta search engine. This part is devoted to extracting the behavioral model of each search engine so that it can be used in ranking the documents. Initially, a score of 0.5 is given to all the information servers, including sources that will be added to the system gradually in the future. Note that the ideal score of a source is 1. Each time the user sends a request to the system and gets results from it, a list of ranked documents is prepared. We devised a method for learning the importance weight of each information server. This parameter plays an important role in building the behavioral model of the information servers.
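Returning briefly to the list fusion step of Section 3.1, the sort-by-value idea and the score normalization can be sketched in Python. The sketch assumes the value function has the form $v(j, N) = N^{\alpha}(j + cN)$ and that normalization divides each absolute score by the largest one. The constants `alpha` and `c`, the document identifiers, and the absolute scores are illustrative assumptions, and keeping the occurrence with the smallest value is just one possible way to eliminate repeated items.

```python
def fusion_value(j, n, alpha=0.0, c=0.1):
    """Assumed value function v(j, N) = N**alpha * (j + c*N); smaller means earlier."""
    return (n ** alpha) * (j + c * n)


def fuse_lists(ranked_lists, alpha=0.0, c=0.1):
    """Merge ranked lists into one list sorted by increasing value v."""
    best = {}
    for docs in ranked_lists:
        n = len(docs)
        for j, doc in enumerate(docs, start=1):
            v = fusion_value(j, n, alpha, c)
            # A repeated document keeps only its best (smallest) value.
            if doc not in best or v < best[doc]:
                best[doc] = v
    return sorted(best, key=lambda doc: (best[doc], doc))


def normalize(abs_scores):
    """Divide every absolute score by the largest one, giving values in (0, 1]."""
    top = max(abs_scores.values())
    if top <= 0:
        raise ValueError("at least one absolute score must be positive")
    return {doc: mu / top for doc, mu in abs_scores.items()}


# Hypothetical ranked lists from two servers; "b" is returned by both.
fused = fuse_lists([["a", "b", "c", "d"], ["e", "b"]])
normalized = normalize({"e": 5.0, "a": 4.0, "b": 2.0})
```

With `alpha = 0` and a positive `c`, an item from a shorter list is placed ahead of an item with the same rank from a longer list, which is one way of reading the priority rule discussed in Section 3.1.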
A score is allocated to each document in the final list based on the information server that retrieved it. Thus, documents retrieved from servers with a stronger background in that particular subject have a better chance. This point is important because some information servers may be very strong in a certain field (subjective cluster), or may be designed and tuned for retrieving documents related to a certain subject. To model the allocation of scores to the information servers, we give the following definitions:

Number of information servers that have at least one document in the final list = $M$
Number of documents from server $i$ in the final list = $d_i$
Set of the ranks of the documents from server $i$, arranged in increasing order = $R_i$, where $R_i = \{\, r_{ij} \mid j = 1, \dots, d_i;\ r_{i1} < r_{i2} < \dots < r_{id_i} \,\}$
Number of the most important documents in the final list = $k$ (the user will check only $k$ documents in the final list)
Score of server $i$ from the beginning until now for the cluster = $s_i^{(t)}$ ($0 \le s_i^{(t)} \le 1$)
Score of server $i$ in the next step for the cluster = $s_i^{(t+1)}$ ($0 \le s_i^{(t+1)} \le 1$)
Absolute score of server $i$ in the current step for the cluster = $\varphi_i$
Relative score of server $i$ in the current step for the cluster (obtained by normalizing $\varphi_i$) = $\psi_i$ ($0 \le \psi_i \le 1$)

We now explain how $\varphi_i$ is calculated. At present, the score of each information server for the subjective cluster is $s_i^{(t)}$, which lies between 0 and 1; this value is attributed to all the documents retrieved by that server and, along with the other scores, contributes to the final score (through the OWA operator). A variety of methods can be adopted to calculate $\varphi_i$, but we should find a moderate one. The following two methods seem adequate:

$$\varphi_i = \sum_{r \in R_i} \frac{1}{r^{\beta(k)}} = \sum_{j=1}^{d_i} \frac{1}{r_{ij}^{\beta(k)}} \tag{5}$$

$$\varphi_i = \sum_{r \in R_i} e^{-r/\beta(k)} = \sum_{j=1}^{d_i} e^{-r_{ij}/\beta(k)} \tag{6}$$

In both formulas, the farther a document is from the top of the list, the smaller the score allocated to it, and the score of each server is the sum of the scores of the documents retrieved by that server. $\beta(k)$ is a continuous function of $k$, where $k$ is the number of documents the user considers most in the final list. If, for example, the user ordinarily finds only the first $k$ documents useful, then the information servers from which those first $k$ documents were retrieved must receive the largest increase in score. The value of $k$ is determined by the user. $\beta(k)$ specifies how strongly documents affect the score of their information servers. For example, in (5) a larger $\beta(k)$ makes the higher-ranked documents more effective in increasing the score of the related information server. In (5), if $\beta(k) > 1$ the results become unreasonable, because there is a big difference between the document in the first position of the list and the one in the second. For (6) the opposite holds: the larger $\beta(k)$ is above 1, the more moderate the behavior. An example makes the matter clearer.

Example: suppose we have 5 information servers ($M = 5$), denoted $S_1$ to $S_5$, and that the fused list for the subjective cluster is as shown in Table 2. From Table 2, the values of $r_{ij}$, $d_i$ and $s_i^{(t)}$ are extracted as in Table 3 ($s_i^{(t)}$ is the score of server $i$ from the beginning until now for the cluster). Then, using (5), we calculate $\varphi_i$ for the two values of $\beta(k)$ listed in Table 4.

As can be seen, the values of $\varphi_i$ for $\beta > 1$ are exactly the opposite of an expert's view: looking at the ranked list of documents, an expert would rate the scores of two particular servers as close to each other, but with (5) and $\beta > 1$ this is not the case. Also, for $\beta$ between 0 and 1, determining a suitable value of $\beta$ is not simple. It can be shown that (5) is not suitable for our purpose, whereas tuning $\tau$ in (6) can produce proper results, because of the nature of the function $f(t) = e^{-t/\tau}$.
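To see the contrast between (5) and (6) numerically, the following Python sketch evaluates both on small rank sets, assuming (5) has the form $\sum 1/r^{\beta}$ and (6) the form $\sum e^{-r/\beta}$. The rank sets and the $\beta$ values are hypothetical choices made for illustration; they are not the data of Tables 2 to 4.

```python
import math


def phi_power(ranks, beta):
    """Assumed form of (5): sum of 1 / r**beta over a server's document ranks."""
    return sum(1.0 / (r ** beta) for r in ranks)


def phi_exp(ranks, beta):
    """Assumed form of (6): sum of exp(-r / beta) over a server's document ranks."""
    return sum(math.exp(-r / beta) for r in ranks)


# Hypothetical rank sets: server A holds only the top document,
# server B holds the documents ranked 2, 3 and 4.
ranks_a, ranks_b = [1], [2, 3, 4]

for beta in (0.5, 2.0):
    print(f"(5) beta={beta}: A={phi_power(ranks_a, beta):.3f} "
          f"B={phi_power(ranks_b, beta):.3f}")
for beta in (1.0, 5.0):
    print(f"(6) beta={beta}: A={phi_exp(ranks_a, beta):.3f} "
          f"B={phi_exp(ranks_b, beta):.3f}")
```

With these inputs, raising $\beta$ in (5) lets the single top document outweigh the three documents that follow it, while raising $\beta$ in (6) flattens the decay so that server B's three documents count for more than server A's one.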
Thus it becomes possible for only the first $k$ documents in the final list to increase the score of their related information server. For example, if an information server has retrieved even just the last document of interest to the user (interesting document $k$), it still gets a positive score; beyond that point, however, the allocated scores fall off rapidly. $\beta(k) = k/\alpha$ can be the simplest form for this purpose. Considering the above points, the formula used by the intelligent agent to calculate the values of $\varphi_i$ is:

$$\varphi_i = \sum_{r \in R_i} e^{-\alpha r / k} = \sum_{j=1}^{d_i} e^{-\alpha r_{ij} / k} \tag{7}$$

where $\alpha$ is a suitably chosen constant. As in the other parts, we normalize the values of $\varphi_i$ and calculate the relative score of each server as:

$$\psi_i = \frac{\varphi_i}{\max_{1 \le l \le M} \varphi_l} \tag{8}$$

We now describe how the agent updates the server's score. $s_i^{(t)}$ is the server's score at the current step, and we look for a function that gives $s_i^{(t+1)}$, the server's score at the next step:

$$s_i^{(t+1)} = f\left(s_i^{(t)}, \psi_i\right) \tag{9}$$

The score of each server can be updated once the function $f$ is determined. It must be kept in mind that the time parameter also affects the function indirectly. The importance of this function lies in its ability to reconstruct the behavioral model of each server. The simplest approach is to calculate the average score of each server in each cluster, for which the following formula is suitable:

$$s_i^{(t+1)} = \frac{t}{t+1}\, s_i^{(t)} + \frac{1}{t+1}\, \psi_i \tag{10}$$

where $t$ is the number of queries about the cluster that the agent has sent to the server. The results obtained with this mechanism effectively improve the quality of the meta search engine's results. The complete results of using this mechanism are presented in [].

5. Conclusion

Since each Internet search engine is the product of its own designer's thought, vision, and reasoning, it is obvious that search engines behave differently in fulfilling users' demands when searching different subjects. Designing an intelligent meta search engine without considering the behavior of each search engine on different subjects therefore seems inaccurate. The results obtained in this paper show that taking this parameter into account in the design of meta search engines improves the quality of the output of the designed meta search engine. Accurate modeling of a search engine may depend on other parameters besides those stated in this paper. For example, many search engines take into account, in the final ranking, the amount of money received from the document owner, which we did not discuss. This parameter and the others affecting the modeling process can be studied more completely in future work.

References:

[1] Improving the intelligent methods of Information Fusion Software agent on internet, Kaveh Kavousi, M.Sc. thesis for Artificial Intelligence and Robotics, Dept. of Electrical and Computer Engineering, Faculty of Engineering, University of Tehran.
[2] Improving the function of intelligent agent of information fusing, Kaveh Kavousi, Behzad Moshiri, Technology College of Tehran University's publishing, summer 00 issue.
[3] Architectural designing of an intelligent agent based on data fusion for extracting information from searching fields, Behzad Moshiri, Kaveh Kavousi, The th electricity engineering Conference in Shiraz.
[4] A Broad Class of Standard DFSes, I. Glockner, Bielefeld University Report TR-000, 00.
[5] Using An Intelligent Agent to Enhance Search Engine Performance, J. Jansen, Peer Reviewed Journal on the Internet, http://www.eecs.usma.edu/usma/academic/eecs/instruct/jansen/, 1997.
[6] Inductive learning from considerably erroneous examples with a specificity based stopping rule, J. Kacprzyk, Proceedings of the International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 89, 99.
[7] Text-Learning and Related Intelligent Agents: A Survey, D. Mladenic, IEEE Intelligent Systems Journal, pp. -5, July/August, 1999.
[8] Piecewise Linear Aggregation Functions, S. Ovchinnikov, International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, Vol. No. (000), pp. -, 00.
[9] Decision making under Dempster-Shafer uncertainties, Ronald R. Yager, International Journal of General Systems, Vol. 0, pp. -5, 99.
[10] Fuzzy logic controllers with flexible structures, Ronald R. Yager and D. P. Filev, Proceedings of Second International Conference on Fuzzy Sets and Neural Networks, Iizuka, Japan, pp. 7-0, 99.
[11] On the Fusion of Documents from Multiple Collection Information Retrieval Systems, R. R. Yager, A. Rybalov, Journal of the American Society for Information Science, 1997.
[12] Fuzzy quotient operators for fuzzy relational data bases, Ronald R. Yager, Proc. Int. Fuzzy Engineering Symposium, pp. 8-96, Yokohama, Japan, 99.

Table 1: List of information clusters for which the agent keeps a background in order to identify the behavioral model of the information servers.

Cluster No.   Name of cluster
1             Data / Information Fusion
2             Context Sensitive Web Searching
3             Dempster Shafer Theory
4             Computer Science, Hardware
5             Computer Science, Software
6             Case Based Reasoning
7             Fuzzy Controllers
8             Mobile Robot Navigation
9             Intumescent Coatings
10            Paint & Resin Technology
11            Robotics
12            TBM model
13            Neural Networks
14            Neuro Fuzzy Systems

Table 2: Fused list obtained from the ranked lists retrieved by the information servers. Columns: Score; the server that retrieved the document. [Table entries not recoverable.]

Table 3: Extracting the rank of each document in the fused list. Columns: Server; $R_i = \{\, r_{ij} \mid j = 1, \dots, d_i;\ r_{i1} < \dots < r_{id_i} \,\}$; $d_i$; $s_i^{(t)}$. [Table entries not recoverable.]

Table 4: Extracting the absolute score of the servers according to different values of $\beta(k)$. Columns: Server; $R_i$; $d_i$; $\varphi_i$ for each of the two values of $\beta(k)$. [Table entries not recoverable.]
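As a companion to the tables, the scoring and update steps of Section 4 can be sketched in Python. The sketch assumes that (7) has the form $\varphi_i = \sum e^{-\alpha r/k}$, that (8) divides each $\varphi_i$ by the largest one, and that (10) is the running average $s^{(t+1)} = (t\,s^{(t)} + \psi)/(t+1)$. The rank sets, `alpha`, `k`, the current scores, and the query counts are hypothetical; they are not the entries of Tables 2 to 4.

```python
import math


def absolute_score(ranks, k, alpha):
    """Assumed form of (7): sum of exp(-alpha * r / k) over a server's ranks."""
    return sum(math.exp(-alpha * r / k) for r in ranks)


def relative_scores(phi):
    """Assumed form of (8): divide every absolute score by the largest one."""
    top = max(phi.values())
    return {server: value / top for server, value in phi.items()}


def update_score(s_t, psi, t):
    """Assumed form of (10): running average after t earlier queries."""
    return (t * s_t + psi) / (t + 1)


# Hypothetical ranks of each server's documents in a fused list of 5 items.
rank_sets = {"S1": [1, 4], "S2": [2, 3], "S3": [5]}
phi = {s: absolute_score(r, k=5, alpha=5.0) for s, r in rank_sets.items()}
psi = relative_scores(phi)

current = {"S1": 0.5, "S2": 0.5, "S3": 0.5}   # hypothetical scores so far
queries_sent = {"S1": 3, "S2": 3, "S3": 3}    # hypothetical query counts
updated = {
    s: update_score(current[s], psi[s], queries_sent[s]) for s in rank_sets
}
```

Because the update is a plain average, a server's score moves less and less per query as `t` grows, so a long history in a cluster is not overturned by a single good or bad result list.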