A Method of Hot Topic Detection in Blogs Using N-gram Model

JOURNAL OF SOFTWARE, VOL. 8, NO. 1, JANUARY 2013

Xiaodong Wang
College of Computer and Information Technology, Henan Normal University, Xinxiang, China
Email: wangxaodong.wang@yahho.com.cn

Juan Wang
College of Computer and Information Technology, Henan Normal University, Xinxiang, China
Email: juaner.50@63.com

Abstract: Over the last few years, blogs (web logs) have gained massive popularity and have become one of the most important web social media, through which people can get and release information. Hot topic detection in blogs is most commonly used in analyzing network public opinion. This paper proposes a method of hot topic detection that uses the n-gram model together with an evaluation of topic hotness. Our approach consists of three steps. First, keywords during a given time period are obtained by calculating each word's weight, and hot keywords are collected by combining keywords. Second, based on the hot keywords, hot keyword groups are extracted using the n-gram model. Third, hot topics are detected from the hot keyword groups. The hotness of a hot topic is evaluated by the value of the keywords' weight, which is obtained in the second step. Evaluations on a Chinese corpus show that the proposed method is most effective when n for the n-gram is five.

Index Terms: n-gram model; blog; hot keyword; hot keyword group; hot topic

I. INTRODUCTION

With the development of Web 2.0, blogs have gained massive popularity and have become one of the most influential web social media of our time. Anyone with an internet connection can conveniently publish topics. According to one study [1], over 75,000 new blogs on a great variety of subjects are created per day by people all over the world. The huge growth of blogs provides a wealth of information waiting to be extracted. Blogs are becoming an extremely relevant resource for different kinds of studies focused on many useful applications.
Accordingly, blogs offer a rich opportunity for detecting hot topics that may not be covered in traditional newswire text. Unlike news reports, blog articles express a wide range of topics, opinions, vocabulary and writing styles. The change in editorial requirements allows blog authors to comment freely on local, national and international issues, while still expressing their personal sentiment. These forms of self-published media might also allow topic detection systems to identify developing topics before official news reports can be written.

Most previous approaches to hot topic detection are based on clustering technologies. Li and Wu [2] used an improved algorithm based on the K-means clustering method and support vector machines to group forums into various clusters. Hao and Hu [3] proposed a single-pass clustering method to detect topics oriented to BBS (Bulletin Board Systems). Dai [4] and Wang [5] proposed hierarchical clustering methods. Zheng and Fang [6] used a clustering method and aging theory to detect hot topics on BBS.

Another trend of research is to use Natural Language Processing technology. Two representative methods are Chinese Word Segmentation Technology (CWST) and Named Entity Recognition [7]. Zhu and Wu [8] used CWST to dig out hot topics based on the combination of multiple keywords. Chen [9] presented a noise-filtered model to extract outburst topics from web forums using terms and the participation of users. K. Chen and L. Luesukprasert [10] extracted hot terms by mapping their distribution over time, identified key sentences through the hot terms, and then used multidimensional sentence vectors to group the key sentences into clusters that represented hot topics. M. Platakis, D. Kotsakos and D. Gunopulos proposed a method, based on bursty discovery, for detecting hot topics during a time interval. Yadong Zhou, Xiaohong Guan et al. [11] utilized the statistics and correlation of popular words in network traffic content to extract popular topics on the Internet. Two or three keywords can generally express a topic [12].
Merging multiple keywords [13] can reflect the hot news events in a certain period. Zhou et al. [14] constructed a keyword network based on word frequency and co-occurrences to detect hot topics on professional blogs. Unfortunately, the frequencies of terms describing the same hot topic information are different. If the threshold value is not set appropriately, some important terms will be filtered out. As a result, the detected keywords will not effectively represent the hot topic information.

How to evaluate the hotness of a topic is one of the important problems in hot topic detection. Jianjiang Li, Xuechun Zhang et al. [15] evaluated blog hotness based on text opinion analysis. They took into account the number of reviews, the comments and the publication time of the blog topic, as well as the comment opinion. Lan You, Yongping Du et al. [16] evaluated the hotness of a topic through its popularity, quality and message distribution, using a classification algorithm based on a Back-Propagation Neural Network. Tingting He, Guozhong Qu et al. [17] presented a semi-automatic hot event detection approach. They first evaluated the activity of events, then filtered and sorted the events according to their activity, and finally obtained the hot events.

doi:10.4304/jsw

The methods above can detect hot topics and evaluate the hotness of a topic effectively. However, blogs have some particular structural features of their own, so these methods may not be completely suitable for hot topic detection in blogs. In this paper, employing CWST and taking into account the several words located to the left and right of the keywords in a sentence, we propose a method to detect hot topics in blogs based on the n-gram model. First, the hot keywords are extracted based on the co-occurrence information in n-gram items, which provide more information about the hot topic. Then, hot keyword groups are collected by calculating the similarity of the n-gram items containing keywords. Finally, the hot topics are detected by computing the similarity between keywords and n-gram items to decide whether a keyword group represents a hot topic. The hotness of the topic is evaluated by the value of the keywords' weight.

This paper is organized as follows. Section 2 introduces the n-gram model and gives some concepts. Section 3 presents our approach for hot topic detection and topic hotness evaluation in blogs. Section 4 presents the experimental results on a Chinese corpus. Finally, we summarize the future work.

II. N-GRAM MODEL AND RELATED CONCEPTS

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n words from a given sequence of text or speech. N-gram models are widely used in statistical natural language processing and approximate matching.
Statistical n-gram models, which capture patterns of local co-occurrence of contiguous words in sentences, have been used in various hybrid implementations of Natural Language Processing and Machine Translation systems [18-22]. In this paper, one contiguous sequence of n words is referred to as an n-gram item, and one n-gram item has n entries. We use the n-gram model to represent each sentence that contains a keyword. Words related to the keywords can be found in the n-gram items, so the detailed information of the topic that the keywords depict can be detected. The keywords depicting one topic can be detected by calculating the similarity of the n-gram items that contain them.

A blog article consists of three parts: the title, the content and the reply. For a sentence in a blog article, we first define a stop-word list to remove the words that are irrelevant to the theme of the blog article. Then we do word segmentation and Part-of-Speech tagging. Finally, the sentence is represented by the n-gram model. An example of the five-gram model is given in Fig. 1. Every five-gram item has five entries, all of which share some co-occurrence information. Every sentence has several five-gram items. In the process of combining five-gram items, if the number of entries in the last five-gram item is smaller than five, we use stop words to fill it. The n-grams can effectively capture the relationship between words, which carry the co-occurrence information of keywords.

III. HOT TOPIC DETECTION METHOD IN BLOGS

Two or three keywords can generally express a topic, but they cannot provide the detailed topic information, such as time, place and related people. A topic in blogs is a cluster composed of a number of blog articles sharing a theme. Each blog article has one or more keywords to depict the theme. For a blog article, in the sentence containing keywords, words that are not far away from the keywords generally provide the topic information.
So, for the sake of effectiveness, we use the n-gram model to represent sentences, set a distance window of n entries, and take into account the n-gram items that contain keywords.

The hot topic detection method in blogs is composed of three parts: the hot keyword extraction method, the hot keyword group extraction method and the hot topic detection algorithm. The general process of the method is given in Fig. 2. The first step is to get blog articles, and then do preprocessing and word segmentation. Next, we extract keywords by calculating each word's weight, represent the sentences containing keywords with the n-gram model, and discover the frequency of every k-gram, where k ≤ n, by scanning the training dataset and estimating each precision and recall. In the third step, we build the hot keyword extraction algorithm, the hot keyword group extraction algorithm and the hot topic detection algorithm using n-grams.

Sentence: The fourth anniversary of the Wen Chuan earthquake
Word segmentation: The/ fourth/ anniversary/ of/ the/ Wen/ Chuan/ earthquake
Five-gram items: The-fourth-anniversary-of-the; fourth-anniversary-of-the-Wen; anniversary-of-the-Wen-Chuan; of-the-Wen-Chuan-earthquake

Figure 1. An example of five-gram model
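The construction shown in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `ngram_items` and `items_with_keyword` and the filler token `<stop>` are assumptions introduced here, and the sentence is assumed to be already segmented into words.

```python
def ngram_items(tokens, n=5, pad="<stop>"):
    """Slide a window of n entries over the segmented sentence.

    Sentences shorter than n are padded with a filler token, mirroring
    the paper's remark that short items are filled with stop words.
    The name of the filler token is an assumption of this sketch.
    """
    if len(tokens) < n:
        tokens = tokens + [pad] * (n - len(tokens))
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def items_with_keyword(tokens, keyword, n=5):
    """Keep only the n-gram items that contain the given keyword."""
    return [item for item in ngram_items(tokens, n) if keyword in item]


sentence = "The fourth anniversary of the Wen Chuan earthquake".split()
for item in ngram_items(sentence):
    print("-".join(item))
```

For the eight-word sentence of Figure 1 this produces the four five-gram items listed there, and only the last of them contains the word "earthquake".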

Figure 2. The framework of our approach (boxes: Blog Article Set; Preprocessing; Word Segmentation; Keywords Extraction; Possible N-gram; Feature Selection; Selected N-gram; Hot Keywords Extraction; Hot Keyword Group Extraction; Hot Topic Detection)

A. Hot Keyword Extraction Method

Because words in the title, text and reply have different influence on the theme of a blog article, we set a weight for each keyword, which reflects the importance and hotness of the keyword. The weight is calculated as

$$weight = \alpha \cdot tf\_tit + \beta \cdot tf\_txt + \gamma \cdot tf\_rly \qquad (1)$$

where $\alpha + \beta + \gamma = 1$, $tf\_tit$ is the word frequency in the title, $tf\_txt$ is the word frequency in the text, and $tf\_rly$ is the word frequency in the reply. In this paper, we set $\alpha = 0.5$, $\beta = 0.3$, $\gamma = 0.2$. We consider a word whose weight exceeds the predefined threshold_hot to be a keyword, and put it into a keyword list. We create a keyword list for each blog article. The data structure of a node in the keyword list is defined as follows:

    typedef struct {
        String keyword;
        Float  weight;
        String pos[];
    }

Every node in the keyword list has three elements: keyword, weight (the weight of the keyword), and pos[], whose members are the n-gram items containing the keyword.

We use the n-gram model to represent the sentences that contain keywords. For simplicity, we do not consider all the n-gram items of a sentence; only the n-gram items that contain keywords are considered. The keyword, the keyword's weight and the n-gram items containing the keyword are stored in the keyword list.

After calculating the keyword list for each blog article, we get the hot keywords by merging the keyword lists. The algorithm to merge keyword lists is described as follows.

Algorithm 1: Keyword Combination Algorithm
Input: the keyword list of blog article a, the keyword list of blog article b.
Output: merged keyword list.
1) For (i = 1; i ≤ La.Length; i++)
2)     For (j = 1; j ≤ Lb.Length; j++)
3)         If (La[i].keyword == Lb[j].keyword)
4)             Merge(La[i], Lb[j]);
5)     End For
6) End For
7) If (Lb.Length > 0)
8)     Put all the nodes in Lb into La, delete Lb.

In Algorithm 1, La.Length is the length of the keyword list of article a, and Lb.Length is the length of the keyword list of article b. Merge(La[i], Lb[j]) is computed in the following steps:

Step 1: Update the keyword's weight: set La[i].weight = La[i].weight + Lb[j].weight;
Step 2: Put all the n-gram items in Lb[j].pos[] into La[i].pos[];
Step 3: Remove the repeated n-gram items in La[i].pos[];
Step 4: Delete Lb[j] from Lb.

Suppose the number of collected blog articles is n; then there are n keyword lists. We use Algorithm 1 to merge all the keyword lists and obtain the merged keyword list of all n blog articles. If a new blog article arrives, its keyword list is combined with the merged keyword list.
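Eq. (1) and the keyword list merge of Algorithm 1 can be sketched as follows. This is a hedged illustration under stated assumptions: the class `KeywordNode` and the functions `keyword_weight` and `merge_keyword_lists` are names invented here, each keyword is assumed to occur at most once per list, and a dictionary lookup replaces the paper's nested loops (the merged result is the same, but the traversal is not the authors').

```python
from dataclasses import dataclass, field

# Coefficients reported in the paper: alpha + beta + gamma = 1.
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2


@dataclass
class KeywordNode:
    keyword: str
    weight: float
    pos: list = field(default_factory=list)  # n-gram items containing the keyword


def keyword_weight(tf_title, tf_text, tf_reply):
    """Eq. (1): weighted sum of the word frequencies in title, text and reply."""
    return ALPHA * tf_title + BETA * tf_text + GAMMA * tf_reply


def merge_keyword_lists(la, lb):
    """Merge list lb into list la (Algorithm 1), returning a new list.

    Shared keywords have their weights added and their n-gram items
    united without repeats; keywords found only in lb are appended.
    """
    merged = {n.keyword: KeywordNode(n.keyword, n.weight, list(n.pos)) for n in la}
    for node in lb:
        if node.keyword in merged:
            target = merged[node.keyword]
            target.weight += node.weight
            for item in node.pos:
                if item not in target.pos:  # drop repeated n-gram items
                    target.pos.append(item)
        else:
            merged[node.keyword] = KeywordNode(node.keyword, node.weight, list(node.pos))
    return list(merged.values())
```

Folding `merge_keyword_lists` over the keyword lists of all collected articles gives the merged keyword list described above; a new article is handled by one more call.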

For convenience, the value of each keyword's weight is scaled to lie between 0 and 1. The normalized weight is

$$weight(j) = \frac{weight(j)}{\max_{1 \le i \le l} weight(i)} \qquad (2)$$

where $weight(j)$ is the weight of the keyword in the jth node of the merged keyword list, and $l$ is the length of the merged keyword list.

We traverse the merged keyword list and check the weight of each keyword. If a keyword's weight is greater than the predefined threshold_hot, the keyword is considered a hot keyword. All the hot keywords are stored in a hot keyword list, which has the same data structure as the keyword list. For each node in the hot keyword list, we use its weight to measure the hotness of its keyword; that is, the hotness of a hot keyword is represented by the value of its weight. We rank all the keywords according to their weights, so the hottest keyword is placed in the first node of the hot keyword list.

B. Hot Keyword Group Extraction Method

One hot keyword cannot depict a hot topic comprehensively, so a hot topic always has several hot keywords, all of which are contained in the hot keyword list. In this paper, we combine the hot keywords that describe the same hot topic to form a hot keyword group. The algorithm to combine hot keywords describing a hot topic is as follows.

Algorithm 2: Hot Keyword Combination Algorithm
Input: Hot keyword list Lh.
Output: Combined hot keyword list.
1) For (i = 1; i < Lh.Length; i++)
2)     For (j = i+1; j ≤ Lh.Length; j++)
3)         sim(Lh[i].pos[], Lh[j].pos[]);
4)         If (sim(Lh[i].pos[], Lh[j].pos[]) > threshold_sim)
5)             Merge(Lh[i], Lh[j]);
6)     End For
7) End For

Here sim(Lh[i].pos[], Lh[j].pos[]) represents the similarity of Lh[i].pos[] and Lh[j].pos[]. The higher the similarity, the more information Lh[i].pos[] and Lh[j].pos[] share, and the greater the probability that Lh[i].keyword and Lh[j].keyword depict the same hot topic. Let the number of n-gram items in Lh[i].pos[] be m, and the number of n-gram items in Lh[j].pos[] be k. The steps to compute sim(Lh[i].pos[], Lh[j].pos[]) are as follows:

1) Let th = 0;
2) For (i = 1; i ≤ m; i++)
3)     For (j = 1; j ≤ k; j++)
4)         sim(pos[i], pos[j]);
5)         th = th + sim(pos[i], pos[j]);
6)     End For
7) End For

where th accumulates the values of sim(pos[i], pos[j]) and is set to zero at the beginning. Finally, we use the normalization formula (3) to reset the value of th:

$$th = \frac{th}{\cdots} \qquad (3)$$

sim(pos[i], pos[j]) represents the similarity between the ith n-gram item in Lh[i].pos[] and the jth n-gram item in Lh[j].pos[]. It is calculated in the following steps.

Step 1: Compare the n contiguous entries in pos[j] with those in pos[i]. If they match, $sim_n(pos[i], pos[j]) = 1$; otherwise $sim_n(pos[i], pos[j]) = 0$.
Step 2: Cut the last entry in pos[j], and compare the remaining n-1 contiguous entries with the n contiguous entries in pos[i]. If the (n-1)-gram in pos[j] is part of the n-gram in pos[i], $sim_{n-1}(pos[i], pos[j]) = (n-1)/n$; otherwise $sim_{n-1}(pos[i], pos[j]) = 0$.
Step 3: Repeat Step 2 until all the remaining contiguous entries in pos[j] have been compared. In each step, if they match, $sim_p(pos[i], pos[j]) = p/n$; otherwise $sim_p(pos[i], pos[j]) = 0$, where p changes from n-2 to 1.
Step 4: sim(pos[i], pos[j]) is defined as

$$sim(pos[i], pos[j]) = \sum_{p=1}^{n} sim_p(pos[i], pos[j]) \qquad (4)$$

In Algorithm 2, Merge(Lh[i], Lh[j]) is computed as follows:

(1) Put the hot keywords in Lh[j] into Lh[i];
(2) Let Lh[i].weight = Lh[i].weight + Lh[j].weight;
(3) Put the n-gram items in Lh[j].pos[] into Lh[i].pos[], and remove the repeated n-gram items in Lh[i].pos[];
(4) Delete Lh[j] from Lh.

A hot keyword group in the combined hot keyword list is the combination of several hot keywords that depict the same topic. Let L[i] be a node in the combined hot keyword list L. Then L[i].keyword contains the keywords that share information in their n-gram items, L[i].weight is the sum of the weights of all keywords in L[i].keyword, and L[i].pos[] contains the n-gram items containing those keywords. The n-gram items in L[i].pos[] carry the context of the keywords in the sentences that include them, so the n-gram items describe the time and place information of the hot topic in detail.
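The normalization of Eq. (2), the item similarity of Eq. (4) and the grouping loop of Algorithm 2 can be sketched as below. This is an illustration under explicit assumptions, not the authors' code: all function names are invented here, n-gram items are tuples of equal length n, a hot keyword node is a `(keyword, weight, pos)` triple, and, because the denominator of Eq. (3) is not recoverable from the text, `list_similarity` averages over the m × k item pairs as a stand-in.

```python
def normalize(weights):
    """Eq. (2): divide every weight by the largest weight in the list."""
    top = max(weights)
    return [w / top for w in weights]


def contains_run(item, run):
    """True if `run` occurs as a contiguous slice of `item`."""
    return any(item[s:s + len(run)] == run
               for s in range(len(item) - len(run) + 1))


def item_similarity(item_i, item_j):
    """Eq. (4): add p/n for every leading p-gram of item_j found in item_i."""
    n = len(item_i)
    return sum(p / n for p in range(n, 0, -1)
               if contains_run(item_i, item_j[:p]))


def list_similarity(pos_i, pos_j):
    """Accumulate th over all item pairs, then normalize it.

    ASSUMPTION: dividing by m * k replaces Eq. (3), whose exact
    form is not given in the surviving text.
    """
    if not pos_i or not pos_j:
        return 0.0
    th = sum(item_similarity(a, b) for a in pos_i for b in pos_j)
    return th / (len(pos_i) * len(pos_j))


def group_hot_keywords(nodes, threshold_sim):
    """Algorithm 2: merge hot keywords whose n-gram items are similar."""
    groups = [{"keywords": [kw], "weight": w, "pos": list(pos)}
              for kw, w, pos in nodes]
    i = 0
    while i < len(groups):
        j = i + 1
        while j < len(groups):
            if list_similarity(groups[i]["pos"], groups[j]["pos"]) > threshold_sim:
                other = groups.pop(j)  # Merge(Lh[i], Lh[j]) and delete Lh[j]
                groups[i]["keywords"] += other["keywords"]
                groups[i]["weight"] += other["weight"]
                groups[i]["pos"] += [it for it in other["pos"]
                                     if it not in groups[i]["pos"]]
            else:
                j += 1
        i += 1
    return groups
```

Note that two identical five-gram items score (5+4+3+2+1)/5 = 3 under Eq. (4), so a suitable value of threshold_sim depends on how Eq. (3) rescales th; the averaging used here is only one possible choice.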

Due to the complexity of Chinese sentences, a detected keyword may not reflect the theme of the blog article, and the keywords in L[i].keyword may not describe the hot topic. If a keyword in L[i].keyword reflects a topic, then the keyword will appear in more than one n-gram item in L[i].pos[]. A keyword group therefore represents only a hot topic candidate, and we need to calculate the similarity between the keyword group and its n-gram items to decide whether the keyword group represents a hot topic.

C. Hot Topic Detection Algorithm

Let $K = \{k_1, k_2, \ldots, k_n\}$ represent the combined hot keyword list. Suppose the number of keywords in node $k_i$ is m, the number of n-gram items in $k_i.pos[]$ is p, and the set of keywords is represented by $S = \{s_1, s_2, s_3, \ldots, s_m\}$. The method to detect hot topics is described as follows:

Algorithm 3: Hot Topic Detection Algorithm
Input: Combined hot keyword list
Output: Hot topics

Step 1: Calculate the similarity between $s_i$ and $k_i.pos[]$ using formula (5):

$$sim\_s(s_i, k_i.pos[]) = \frac{p_i}{p} \qquad (5)$$

where $p_i$ is the number of n-gram items in $k_i.pos[]$ that contain $s_i$.
Step 2: Repeat Step 1 until the similarity with $k_i.pos[]$ has been calculated for all the keywords in $\{s_1, s_2, s_3, \ldots, s_m\}$.
Step 3: The similarity between $k_i.keyword$ and $k_i.pos[]$ is defined as

$$sim(k_i.keyword, k_i.pos[]) = \sum_{i=1}^{m} sim\_s(s_i, k_i.pos[]) \qquad (6)$$

Step 4: If $sim(k_i.keyword, k_i.pos[])$ is greater than the predefined threshold_tpc, then $k_i$ represents a hot topic.

A detected hot topic is represented by the keyword group in the combined keyword list, and the hotness of the topic is measured by the value of the keywords' weight in the keyword list. We rank the detected hot topics according to their hotness. Suppose the number of keywords in a keyword group is m; the hotness of the topic that the keyword group represents is defined as

$$hotness = \sum_{i=1}^{m} weight(keyword_i) \qquad (7)$$

where $weight(keyword_i)$ is the weight of the ith keyword in the keyword group.

IV. EXPERIMENT ANALYSIS AND RESULT

At present there is no public blog corpus, so we designed a crawler to get blog articles from Sina Blog and Tencent Blog, which are two of the most popular blog services in China. All the data were published from May 1, 2012 to May 14, 2012, and there are 7980 articles in total. All articles have already been preprocessed, including word segmentation, Part-of-Speech tagging and unknown word recognition. The corpus was divided into two data sets:

Training Set: published from May 1 to May 7, containing 4480 articles.
Testing Set: published from May 8 to May 14, containing 3500 articles.

The experiment contains two steps. In the first step, we use the 4480 articles for training. The size of n in the n-gram and the values of threshold_hot, threshold_top, threshold_sim and threshold_tpc are set in this step. In the second step, we use the remaining 3500 articles to test and evaluate the parameters.

We designed sixteen sets of experiments to finalize the parameters in the algorithms above. All the cases are shown in TABLE Ⅰ. For all the cases, we implemented the method described in Section 3 to find the best values for the parameters. We use precision and recall to evaluate the performance of the cases. The results are shown in TABLE Ⅱ.

TABLE Ⅰ. THE VALUES OF PARAMETERS
Cases    N    THD_hot    THD_top    THD_sim    THD_tpc
Remark: THD represents threshold.
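Returning to Section III.C, the decision rule of Algorithm 3 (Eqs. 5 and 6) and the hotness of Eq. (7) can be sketched as follows. The function names and the dictionary layout of a keyword group (`"weights"` mapping each keyword to its weight, `"pos"` holding the n-gram items as tuples) are assumptions made for this illustration only.

```python
def topic_score(keywords, items):
    """Eqs. (5)-(6): for each keyword s_i, the share p_i / p of n-gram
    items that contain it, summed over all keywords of the group."""
    p = len(items)
    if p == 0:
        return 0.0
    return sum(sum(1 for item in items if s in item) / p for s in keywords)


def hotness(weights):
    """Eq. (7): sum of the weights of the keywords in the group."""
    return sum(weights.values())


def detect_hot_topics(groups, threshold_tpc):
    """Keep the groups whose score exceeds threshold_tpc, hottest first."""
    hot = [g for g in groups
           if topic_score(g["weights"].keys(), g["pos"]) > threshold_tpc]
    return sorted(hot, key=lambda g: hotness(g["weights"]), reverse=True)
```

A group whose single keyword appears in only one of its two n-gram items scores 0.5 and would be rejected at a threshold of 0.60, whereas a group whose two keywords each appear in every item scores 2.0 and is kept.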

TABLE Ⅱ. EVALUATION OF PARAMETERS
Approaches    Precision    Recall
Case 1        63.4%        68.36%
Case 2        …            67.64%
Case 3        …            66.99%
Case 4        …            63.26%
Case 5        62.8%        70.42%
Case 6        …            69.68%
Case 7        7.8%         68.09%
Case 8        …            65.38%
Case 9        73.4%        78.36%
Case 10       …            77.64%
Case 11       79.89%       76.99%
Case 12       …            73.26%
Case 13       …            73.42%
Case 14       …            74.7%
Case 15       …            76.98%
Case 16       …            76.9%

As shown in TABLE Ⅱ, Case 11 gives the best result, that is, N = 5, threshold_hot = 0.35, threshold_top = 0.40, threshold_sim = 0.45 and threshold_tpc = 0.60.

In the second step, we use the remaining 3500 articles for testing. In addition, we implemented other methods as comparison experiments. The methods used for comparison are the Multiple Keywords Combination (MKC) method and the single-pass clustering (SPC) method.

TABLE Ⅲ. EVALUATION OF COMPARED METHODS
Approaches      Num    Precision    Recall
SPC             …      …            75.93%
HKC             …      …            79.54%
Our approach    …      …            79.98%
Remark: Num represents the number of detected hot topics.

As TABLE Ⅲ shows, under the same experimental conditions the HKC method improves effectiveness compared with SPC, and our approach achieves the best performance.

Fig. 3 compares our approach with the other two approaches, with feature selection, on the testing data for the tri-gram, four-gram, five-gram and six-gram models. In Fig. 3, we can see that our approach outperforms the SPC and HKC approaches in most cases, and that we get the best results when we use the five-gram model. That is to say, five-grams reflect the relationship of keywords well, so the keywords depicting a topic can be effectively clustered together.

(a) Tri-grams  (b) Four-grams  (c) Five-grams

(d) Six-grams
Figure 3. Select n-grams

The top four blog topics detected by the proposed method with the five-gram model are shown in TABLE Ⅳ. The list is sorted in descending order by weight, which represents the hotness of the topic.

TABLE Ⅳ. TOP FOUR HOT TOPICS IN BLOGS DURING 1 MAY TO 14 MAY
Topic    Experimental Result
1        The fourth anniversary of the Wen Chuan earthquake
2        What will you do on Mother's Day
3        Why oppose the war in the South China Sea between China and Philippines
4        Refined oil price falls for the first time

V. CONCLUSION AND FUTURE WORK

This paper presents a method for hot topic detection and topic hotness evaluation using the n-gram model, which can improve hot topic retrieval in blogs. The main contributions of the paper are as follows. First, we analyze the n-gram model. Second, we propose an effective way to assess the importance of words in a blog article according to the features of blog articles. Third, we apply the n-gram model to design the algorithm for detecting hot topics. Experimental results on a Chinese corpus show that the proposed method is promising.

However, there are still some shortcomings. First, the experimental data tend to be incomplete for real-life applications. Second, the parameters are optimized only through repeated experiments. Third, the degree of user participation and the degree of opinion communication of a blog article should be considered. Improving the coverage of the experimental data, finding an optimization algorithm to adjust the thresholds, and taking more features of blog articles into account will be addressed in future work.

ACKNOWLEDGMENT

This work was supported by the Scientific and Technology project of Henan Province of China (No. ) and (No. ).

REFERENCES

[1] M. Platakis, D. Kotsakos, and D. Gunopulos. Discovering Hot Topics in the Blogosphere. In: Proceedings of the Second Panhellenic Scientific Student Conference on Informatics, Related Technologies and Applications EUREKA (2008)
[2] N. Li and D. D. Wu. Using text mining and sentiment analysis for online forums hotspot detection and forecast [J]. Decision Support Systems, 48(2), 2010
[3] X. Hao and Y. Hu. Topic detection and tracking oriented to BBS. In: Proceedings of the 2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE), 4(2010)
[4] X. Y. Dai, Q. C. Chen, X. L. Wang, and J. Xu. Online topic detection and tracking of financial news based on hierarchical clustering. In: Proceedings of the Ninth International Conference on Machine Learning and Cybernetics (ICMLC), 6(2010)
[5] C. H. Wang, M. Zhang, S. P. Ma, and L. Y. Ru. Automatic hot event detection using both media and user attention. Journal of Computational Information Systems, 4(3), 2008
[6] Y. Yang, J. Zhang, J. Carbonell, and C. Jin. Topic-conditioned Novelty Detection. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining '02, 2002
[7] D. H. Zheng and F. Li. Hot Topic Detection on BBS Using Aging Theory. Web Information Systems and Mining, Lecture Notes in Computer Science, 5854(2009)
[8] S. D. Zhu, X. H. Wu, and J. P. Fan. Analysis of Bulletin Board System Hot Topic Based on Multiple Keywords Combination. In: Management and Service Science (MASS), 2011 International Conference, 8(2011), 1-4
[9] Y. Chen, X. Q. Cheng, and S. Yang. Outburst Topic Detection for Web Forums. Journal of Chinese Information Processing, 24(3), 2010
[10] K. Chen and L. Luesukprasert. Hot topic extraction based on timeline analysis and multidimensional sentence modeling. IEEE Transactions on Knowledge and Data Engineering, vol. 19, Aug. 2007
[11] Y. D. Zhou, X. H. Guan, et al. Approach to extracting hot topics based on network traffic content. Frontiers of Electrical and Electronic Engineering, vol. 4, pp. 20-23, 2009
[12] H. X. Li, H. P. Zhang, et al. Keywords based hot topic detection on Internet [C]. In: Proceedings of the 5th CCIR, 2009, 34-43 (in Chinese)

[13] K. Zheng, X. M. Shu, and H. Y. Yuan. Hot Spot Information Auto-detection Method of Network Public Opinion. Computer Engineering, 36(3), 2010, 4-6
[14] J. J. Li, X. H. Zhang, et al. Blog Hotness Evaluation Model Based on Text Opinion Analysis. Proceedings of the 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, December 2-4, 2009
[15] E. Z. Zhou, N. Zhong, and Y. F. Li. Hot Topic Detection in Professional Blogs. Lecture Notes in Computer Science, 6890(2011), 4-52
[16] L. You, Y. P. Du, et al. BBS Based Hot Topic Retrieval Using Back-Propagation Neural Network. Proceedings of the 1st International Joint Conference on Natural Language Processing, China, Springer-Verlag, 2005
[17] T. T. He, G. Z. Qu, et al. Semi-automatic Hot Event Detection. In: Proceedings of the 2nd International Conference on Advanced Data Mining and Applications, 2006, LNAI 4093
[18] K. Knight and V. Hatzivassiloglou. Two-Level, Many-Paths Generation. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95), Cambridge, MA (1995)
[19] R. Brown and R. Frederking. Applying Statistical English Language Modeling to Symbolic Machine Translation. In: Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, Leuven, Belgium (1995)
[20] I. Langkilde and K. Knight. Generating Word Lattices from Abstract Meaning Representation. Technical report, Information Science Institute, University of Southern California (1998)
[21] S. Bangalore and O. Rambow. Corpus Based Lexical Choice in Natural Language Generation. In: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, China (2000)
[22] N. Habash, B. Dorr, and D. Traum. Hybrid Natural Language Generation from Lexical Conceptual Structures. Machine Translation 17 (2003)

Xiaodong Wang received his M.E. degree in Computer Science from Tsinghua University, China in 1993 and received his Ph.D. degree in Information Technology in Education from East China Normal University, China. He is an associate professor in the College of Computer Science, Henan Normal University, China. His current areas of interest include Ontology and Knowledge Engineering. Email: wangxaodong.wang@yahho.com.cn

Juan Wang majors in Computer Application Technology. She is an undergraduate student at Henan Normal University, China. Her research interests include Natural Language Processing, Ontology and Knowledge Engineering. Email: juaner.50@63.com

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance

Tsinghua University at TAC 2009: Summarizing Multi-documents by Information Distance Tsnghua Unversty at TAC 2009: Summarzng Mult-documents by Informaton Dstance Chong Long, Mnle Huang, Xaoyan Zhu State Key Laboratory of Intellgent Technology and Systems, Tsnghua Natonal Laboratory for

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task

Term Weighting Classification System Using the Chi-square Statistic for the Classification Subtask at NTCIR-6 Patent Retrieval Task Proceedngs of NTCIR-6 Workshop Meetng, May 15-18, 2007, Tokyo, Japan Term Weghtng Classfcaton System Usng the Ch-square Statstc for the Classfcaton Subtask at NTCIR-6 Patent Retreval Task Kotaro Hashmoto

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION

A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION A CALCULATION METHOD OF DEEP WEB ENTITIES RECOGNITION 1 FENG YONG, DANG XIAO-WAN, 3 XU HONG-YAN School of Informaton, Laonng Unversty, Shenyang Laonng E-mal: 1 fyxuhy@163.com, dangxaowan@163.com, 3 xuhongyan_lndx@163.com

More information

Available online at Available online at Advanced in Control Engineering and Information Science

Available online at   Available online at   Advanced in Control Engineering and Information Science Avalable onlne at wwwscencedrectcom Avalable onlne at wwwscencedrectcom Proceda Proceda Engneerng Engneerng 00 (2011) 15000 000 (2011) 1642 1646 Proceda Engneerng wwwelsevercom/locate/proceda Advanced

More information
