CERIAS Tech Report Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering by Yu-Sung Wu, Saurabh Bagchi, Navjot Singh,

Size: px
Start display at page:

Download "CERIAS Tech Report Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering by Yu-Sung Wu, Saurabh Bagchi, Navjot Singh,"

Transcription

1 CERIAS Tec Report 9-3 Spam Detecton n Voce-over-IP Calls troug Sem-Supervsed Clusterng by Yu-Sung Wu, Saurab Bagc, Navjot Sng, Ratsameetp Wta Center for Educaton and Researc Informaton Assurance and Securty Purdue Unversty, West Lafayette, IN

2 Spam Detecton n Voce-over-IP Calls troug Sem- Supervsed Clusterng Yu-Sung Wu, Navjot Sng Ratsameetp Wta 3 Saurab Bagc Scool of Electrcal & Computer Eng., Purdue Unversty, West Lafayette, IN 4797 Avaya Labs, 33 Mt. Ary Rd., Baskng Rdge, NJ 79 3 Culalongkorn Unversty, Taland {yswu,sbagc@purdue.edu, sng@avaya.com, Ratsameetp.W@Student.cula.ac.t ABSTRACT In ts paper, we present an approac for detecton of spam calls over IP telepony called SPIT n Voce-over-IP (VoIP) systems. SPIT detecton s dfferent from spam detecton n emal n tat te process as to be soft real-tme, fewer features are avalable for examnaton due to te dffculty of mnng voce traffc at runtme, and smlarty n sgnalng traffc between legtmate and malcous callers. Our approac dffers from exstng work n ts adaptablty to new envronments wtout te need for laborous and error-prone manual parameter confguraton. We use clusterng based on te call parameters leveragng optonal user feedback for some calls, wc tey mark as SPIT or non-spit. We mprove on a popular algortm for sem-supervsed learnng, called MPC-Means, to make t scalable to a large number of calls. Our evaluaton on captured call traces sows a ffteen fold reducton n computaton tme, wt mprovement n detecton accuracy. eywords Voce-over-IP systems, spam detecton, spt detecton, semsupervsed learnng, clusterng. INTRODUCTION As te popularty of VoIP systems ncreases, tey are beng subjected to dfferent knds of securty treats []. A large class of te treats suc as call reroutng, toll fraud, and conversaton jackng ncur devatons n te protocol state macnes and can be detected troug montorng te protocol state transtons [],[3]. Addtonally, cryptograpcally secure versons of te common VoIP protocols, suc as Secure SIP and Secure RTP, address many of te attacks presented n te lterature. However, spam calls n VoIP [4], commonly called SPIT, are becomng an ncreasng nusance. Te ease wt wc automated SPIT calls can be launced can deral te adopton of VoIP as a crtcal nfrastructure element. Exstng montorng and cryptograpc solutons are not mmedately applcable to SPIT detecton. In ts paper, we address te problem of detecton of SPIT calls. Detecton of spam emals s a mature feld and tere are some smlartes to our problem. In bot domans, users can provde feedback about ndvdual emal or call, often troug a bult-n button n commercally avalable VoIP pones. However, tere exst sgnfcant dfferences VoIP traffc s real-tme and te detecton sould deally be real-tme as well, some features are expensve to extract n real-tme, specfcally tose n te voce meda traffc, te sgnalng patterns are lkely smlar n legtmate and malcous calls renderng content-based flterng on sgnalng traffc neffectve, and features from multple protocols used n VoIP may be relevant. In ts paper, we present te desgn of a system tat uses semsupervsed macne learnng for detecton of SPIT calls. It bulds on te noton of clusterng wereby calls wt smlar features are placed n a cluster for SPIT or legtmate calls. Call features nclude tose extracted drectly from sgnalng traffc suc as te source and destnaton addresses, extracted from meda traffc, suc as proporton of slence, and derved from calls, suc as duraton and frequency of calls. Approaces tat use tresolds [5] on te call features are dffcult to use n practce snce te nature of SPIT calls vares wdely. Te popular sem-supervsed clusterng algortm called MPC-Means [6] scales as O(N 3 ) were N s te number of calls. Ts would generally be too expensve for real-tme operaton. We modfy ts to create our algortm called empc-means, usng VoIP specfc features to reduce t to O(N). Suc specalzaton ncludes te early use of user feedback and pror knowledge of te number of clusters. Addtonally, we create an ncremental protocol called pmpc-means, tat can perform te detecton as soon as te call s establsed provdng te opton of automatcally angng up a suspect call. We evaluate te protocols usng four call traces wt dfferent caracterstcs of SPIT and non-spit calls, over dfferent proportons of user feedback and accuracy of te user feedback. Wt a batc of 4 calls, empc-means s 5 tmes faster tan MPC-Means, wle acevng better detecton coverage n terms of true and false postves. Snce pmpc-means can examne a lmted set of call features, t works well only wt a large fracton of calls wt accurate user feedback.. RELATED WOR Rosenberg [4] detals te problem of VoIP SPIT and gves varous g-level conceptual solutons. Te solutons can be placed n tree categores [7]: () Non-ntrusve metods based on te excange and analyss of sgnalng messages; () Interacton metods tat create nconvenences for te caller by requestng er to pass a ceckng procedure before te call s establsed; (3) Callee nteracton metods tat excange nformaton wt te callee on eac call. An example work n category s [8] were te autors look at te SIP sgnalng traffc pattern to detect SPIT. However, tey do not provde quanttatve data on te detecton accuracy. Our expermental results ndcate solely relyng on SIP message patterns wll gve low detecton coverage. Te work by Quttek [7] generates a greetng sound or faked rng tone to te caller rgt after te call s establsed and montors te response voce patterns from te caller to dfferentate between uman

3 caller and a SPIT generator. Ts falls n category. In comparson, our work encompasses categores and 3. olan [9] presents an approac based on trust, reputaton, and socal networkng. Tey mantan te trust nformaton for eac caller, wc can be automatcally bult up troug user feedback, or troug a propagaton of reputaton va socal networks. Ter approac can be used wt our work n wc we can embed te caller s trust as one of te dmensons n our call data ponts. A dstncton between ter work and ours s tat ter approac s ted wt eac caller, wle our approac looks at eac pone call drectly. Te sze of te reputaton database may grow large and a reputaton system can be gamed by false prase or false negatve ratngs. For spam detecton n emal, te famly of Nave Bayes (NB) classfers [],[] s probably one of te most commonly mplemented. Tey extract keywords and oter ndcators from emal messages and determne weter te messages are spam usng some statstcal or eurstc sceme. However, spammers nowadays are usng ncreasngly sopstcated tecnques to trck content-based flters by clever manpulaton of te spam content, leadng to a vrtual arms race. Recent work [],[3] as used features for te senders extracted from socal networks n supervsed learnng or n creatng wte/blacklsts. Supervsed learnng reles on a tranng pase wt accurate tranng data, wle our approac does not. Clusterng s a way to learn a classfcaton from te data [4], especally wt unlabelled data. Clusterng tecnques ave been used on detectng e-mal spam by [5] and [6]. On te oter and, classfcaton tecnques suc as SVM[7] s a popular approac for dong data classfcaton. However, tey typcally requre labeled data and doesn t take unlabelled data nto consderaton. Recent development on sem-supervsed classfcaton tecnques [8] suc as sem-supervsed SVM[9] addresses ts. 3. DESIGN 3. Structure of VoIP calls A. Call Establsment B. Meda Stream (RTP/RTCP) / Call Mantenance C. Call Tear Down Fgure. Tree pases n a VoIP call Fgure sows te typcal tree pases nvolved n a VoIP pone call []. Te frst pase s call establsment troug a tree-way andsake, wc nvolves te caller sendng SIP INVITE messages to te proxy server, te server forwardng te INVITE message to te callee, te callee replyng wt SIP O message, and eventually SIP AC message to complete te call establsment pase. Te second pase s te conversaton, wc contans te meda stream (voce) transmtted between te caller and te callee typcally usng RTP/RTCP []. Te last pase s te call tear down pase, wc can be ntated by eter te caller or te callee sendng SIP BYE messages followed by SIP O and SIP AC messages. 3. Caracterstcs of VoIP SPIT calls In ts secton, we elaborate on te dfferent patterns one can perceve between SPIT and non-spit calls. 3.. Look at eac ndvdual call Wen lookng at eac ndvdual call, bot SPIT and non-spit calls follow te same tree pases descrbed n Sec. 3.. From te call establsment pase, te only possble dfferences one can really tell s te source IP and te From URI n te SIP INVITE (te pone number of te caller/sptter). Tere may be dfferences n te User-Agent feld and te SDP part n te SIP INVITE f te pones used by te sptter are of a dfferent model tan te ones used by legtmate users. However, ts s a dubous caracterstc to base te determnaton on. In general, one can only resort to a blacklst-based approac to dstngus between SPIT calls and non-spit calls wen lookng at te call establsment pase. In te meda stream pase, a typcal pattern one can magne for SPIT calls s tat te caller tends to speak more tan te callee. Anoter pattern s tat te lengt of te meda stream pase,.e., te call duraton, could be sorter n te case of calls answered by a lve person snce SPIT calls are generally undesrable. Te call tear down pase nvolves te same SIP BYE/O/AC messages for SPIT and non-spit calls. A possble dfference toug s wc of te two partes angs up te call frst, wc translates to wc sde sends te SIP BYE message. Assumng SPIT calls are undesrable, one can assume tat t s more lkely tat a call wll be ung up by te callee. Ts mode of lookng at ndvdual calls would be useful to an ndvdual consumer wo would be elped by a SPIT call beng dropped wtout er avng to pck up te call. Te callenge ere s wat would be te approprate tresold values on te dfferent call features to determne f a call s SPIT or not. We beleve tat due to te dversty of SPIT calls, t s dangerous to use statc tresolds for any of te call features. Ts prompted us to explore more dynamc scemes tat automatcally adapt to cangng call patterns. 3.. Look at te wole batc Snce SPIT calls are usually large volume calls made by some sptter wtn a perod of tme, we found tat t s also useful to look for te correspondng pattern n a batc of calls. Certan features are avalable wen lookng at te collectve set of calls. Tese nclude te nter-arrval tme between calls, te densty of calls, etc. Ts mode would be useful say to a servce provder tat s nterested n dentfyng and blacklstng sources of SPIT calls n ts networks. 3.3 Te detecton sceme A VoIP envronment typcally conssts of multple domans wt eac doman composed of a few proxy servers and pones belongng to end users. Te pones can be soft pones (a general purpose computer wt approprate software) or ard pones (ardware for makng and recevng VoIP calls). Te two domnant sgnalng protocols used n VoIP are H.33 and SIP, wt te latter becomng ncreasngly domnant. Fgure sows an example VoIP envronment consstng of two domans. In a VoIP envronment, a proxy server s man functon s to route te sgnalng messages. For te specfc example we sow, ere Proxy # s used to route te sgnalng messages among pones {A,B,C. And smlarly, Proxy # s used to route te sgnalng messages among pones {E,F. Cross doman pone calls

4 {A,B,C {E,F are collaboratvely andled by Proxy # and Proxy #. Once a pone call s establsed, subsequent sgnalng messages can possbly travel drectly between pones not va te proxes. However, an ISP can mandate all sgnalng traffc troug te proxes, wc s usually te case for bllng/securty purposes. Legend S Te meda streams followng call establsment are usually transmtted drectly between te pones suc tat te workload on te server can be reduced. However, t s also possble to let te servers proxy te meda streams as well. Ts s used wen transcodng of te meda stream s requred (e.g. te two pones do not use te same codec.) and for securty and accountng purposes. Our approac n detectng SPIT calls nvolves placng local detectors at te SIP Proxy and te pones n te managed doman. Essentally, te detectors observe te sgnalng and meda streams wtn te doman no matter tey are proxed or sent drectly from end to end. Te domans tat ave our detecton mecansm are called managed domans and oters are called unmanaged domans. We assume a sptter can exst as any pone n a VoIP envronment, weter wtn a managed (pone B) or an unmanaged doman (pone E). New pone call detected A Early Detecton (before meda stream s establsed) A3 : normal user : sptter SIP based VoIP Proxy Server # Clent -sde Detector B Annotate pone call B wt optonal Collecton of B3 user feedback nformaton all detected pone calls A Server -sde Detector Clent -sde Detector S Spt Detector Clent -sde Detector A B C Clusters of spt calls. SIP based VoIP Proxy Server # S Fgure 3. Detectng Spt Calls n a VoIP Envronment B4 E F Semsupervsed clusterng Te detectors embedded at te server and te pones collect te nformaton of te pone calls and send tem to te SPITDetector, wc s te place were te logc on dfferentatng SPIT calls from non-spit calls executes. Te nformaton about a pone call may nclude From URI (caller dentty), To URI (callee dentty), slence duraton n te RTP meda streams, etc. Snce te decodng of tese nformaton are andled by te respectve server-sde/clent-sde detectors and only a dgest of te necessary nformaton s forwarded up to te SPITDetector, network traffc s mnmzed. If scalablty becomes an ssue, we can dvde a doman nto sub-domans and ave separate SPITDetectors perform detecton for pone calls wtn eac sub-doman concurrently. User feedback about calls can be sared between te sub-domans wtn one doman. Our approac accommodates ncremental deployment n tat clents n managed domans wll ave detecton of SPIT calls, weter tey orgnate from managed or unmanaged domans. Essentally, our problem s a data classfcaton problem, were we want to classfy pone calls nto SPIT calls and non- SPIT calls. In te SPITDetector, we employ a sem-supervsed clusterng tecnque, wc s one of te recent data classfcaton approaces []. Fgure sows te process flow nvolved n te SPITDetector, wc supports two modes of detecton: Mode A: Look at eac pone call wt early detecton In ts mode, te SPITDetector as to determne weter a call s a SPIT or not before te meda stream of te call s establsed. Ts means tat te detecton as to be completed before te callee pcks up te pone. Ts mode s useful from an end-user s pont of vew snce SPIT calls can be potentally blocked wtout furter annoyance. In Fgure, pat A represents te extracton of features (call caracterstcs) of a call durng te call establsment pase (Sec. 3.). Te features are terefore lmted to source IP, From URI, To URI, and start tme of te call (Sec. 3..). Te detecton s carred out troug calculatng te dstance of te call n queston wt te current SPIT/non-SPIT clusters (Sec. 3.5). If te call s closer to te SPIT cluster n terms of te dstance metrc, t wll be regarded as a SPIT call va pat A3, wc leads to te termnaton of te pone call. Mode B: Look at te wole batc of pone calls Wt Mode B, we assume receved calls are kept n a collecton troug Pat B and B. Te calls n te collecton are ten presented n a batc to te sem-supervsed clusterng algortm. Ts detecton mode provdes ger detecton accuracy tan Mode A due to te avalablty of complete call feature nformaton. Due to te nteractve nature of VoIP calls, Mode B s probably not as attractve to an end-user as te detecton s carred out after a pone call s completed. However, we see possble applcaton from a servce provder s perspectve, wo can resort to actons aganst tose accounts tat ave been nvolved n placng SPIT calls. Ts can be used togeter wt legslatve measures n place for puntve actons aganst tose wo volate te do-not-call lst or target SPIT calls to cell pones. Also, an accurate dentfcaton of SPIT actvtes can be an ntegral part of system performance montorng. Termnate te pone call f detected as a spt call Clusters of non-spt calls. B5 Servce provder nterventon Fgure. Process flow n SPITDetector 3

5 3.4 SPIT Detecton usng Sem-Supervsed Clusterng 3.4. Clusterng & Sem-Supervsed Clusterng: Background Clusterng s a popular approac for data classfcaton wc groups data ponts nto clusters suc tat ponts n te same cluster are more smlar to eac oter tan to ponts n te oter clusters accordng to some defned crtera [3]. For example, te classcal -Means algortm [4] clusters N data ponts {x,x,,x N nto clusters {X, X,.., X wt centrods {μ, μ,, μ. Te correspondng objectve functon s tat: j= x X j x μ s mnmzed () In our problem context, eac VoIP call s regarded as one data pont. We are nterested n clusterng call data ponts nto two clusters, one contanng te SPIT call data ponts, and te oter contanng te non-spit call data ponts. In general, tere may be multple sub-clusters wtn eac cluster correspondng to radcally dfferent knds of SPIT or non-spit calls. We explore ts approac of multple sub-clusters furter n Sec Tere s abundant exstng work, suc as [5] and [6], on applyng clusterng tecnques to dentfy e-mal spam, wc furter motvates our attempt on applyng clusterng tecnque for solvng te VoIP SPIT detecton problem. Classcally clusterng s regarded as an unsupervsed data classfcaton tecnque. Ts means tat except for te crtera (lke Eq. () for -Means), tere s no need to provde some labeled data n advance as te tranng set. In classcal -Means, eac dmenson of te data pont vector x s regarded as equally mportant n te Eucldean dstance calculaton. Ts mples tat only te most relevant features sould be converted to a dmenson n a data pont vector snce te oter less-relevant features wll act as nose n te clusterng process and wll result n poorer clusterng qualty. Sem-supervsed clusterng [], [4], [6] s a recent development n te data clusterng researc communty tat ams to address te aforementoned ssue on selectng te proper crtera for clusterng. Sem-supervsed clusterng allows te use of optonal labeled data for a subset of te observatons seen at runtme to progressvely modfy te clusterng crtera. Ts means tat one does not need to worry about wc features sould go nto te data pont and wc sould not. Te clusterng crtera wll be traned nto generatng clusters tat obey te userlabeled data as fatfully as possble [6]. Te mplct assumpton s tat user feedback s perfectly accurate. In our work ere, we evaluate te mpact of nose n te user feedback Features of a VoIP call for dentfyng patterns of SPIT calls We construct a data pont for eac VoIP call based on 7 features :.From URI,.To URI, 3.Start tme, 4. Duraton, 5.# of SIP INVITE messages, 6.# of AC messages, 7.# of BYE messages from caller, 8.# of BYE messages from callee, 9. Tme snce te last call from te orgnator of te current call, -5.# of xx, xx, 3xx, 4xx, 5xx, and 6xx SIP Response messages, 6.Call frequency of te orgnator of te current call, 7.Rato of te non-slence duraton of te callee meda stream over te caller meda stream. For Mode A early detecton (Sec. 3.3), only features,, 3, and 9 are avalable. Feature 7 s derved from te RTP meda stream by clent-sde detectors f te meda streams are confgured to flow drectly between clents [5] or t can be provded by te server-sde detector f te meda streams are confgured to flow troug te SIP Proxy. In our desgn, expert knowledge on te representatveness of eac feature for dentfyng a SPIT call s not a known pror. Te desgn reles on sem-supervsed clusterng to automatcally rewegtng te mportance of eac feature n te runtme. We select features wc rougly cover dfferent facets of a VoIP call and attempt to keep a small number of features to lmt te nvolved tme/space complextes n te clusterng process User feedback as labeled data for semsupervsed clusterng As mentoned earler, sem-supervsed clusterng allows user labeled data to be suppled to tran te clusterng crtera. Here, we assume pone calls receved n te managed doman (A and C n our example Fgure ) can ave optonal user feedback nformaton ndcatng weter a call s a SPIT call or a non-spit call. Te correspondng data pont wll be labeled wt a SPIT or a non-spit tag and fed nto te sem-supervsed clusterng process. In te deal case, te effect s tat te clusterng process wll ntellgently group all SPIT calls nto one cluster and group non-spit calls nto te oter cluster and also dentfy wc call features are useful n makng ts dstncton. Te use of user feedback and sem-supervsed clusterng enables our detecton sceme to adapt to dfferent envronments. For example, n a corporate nternal VoIP system, g frequency calls could be a common case for non-spit calls, wle n a ouseold communty VoIP system, g frequency calls could be an ndcator for SPIT calls. Te user feedback can be mplemented as a Yes/No button on te pone for te user to press to ndcate a receved pone call as SPIT/non-SPIT [6]. In general, we argue aganst te use tresold-based or statc rulebased scemes for detecton of SPIT calls n VoIP envronments Extended -Means for sem-supervsed clusterng: MPC-Means For ts work, we select te sem-supervsed clusterng algortm called MPC-Means [6]. τ mpckm = x x χ ( x,x ) j x M ( x,x ) j x C ( ( )) μl log det A l Al + wj fm x,x j l lj + wj fc x,x j l = lj T () x μ = x A x A μ l l μ (3) f M x,x j = x xj + x x A j (4) l A lj C x,x j x l x x x l j A A f = (5) l l j 4

6 T A = X ( x μ)( x μ) x X T + wj ( x xj)( x xj) l lj x, xj M T + wj ( x x )( x x ) x, xj C T ( x xj)( x xj) l = lj Eq. () s te objectve functon n MPC-Means. l s te cluster tat pont x s assocated wt. Te man dea s te same as classc -Means were ntra-cluster dstance s beng mnmzed. However te Eucldean dstance metrc n MPC- Means s wegted by a cluster-specfc matrx A l (one can also use te same A matrx across all clusters. [6]). A l s modfed based on user feedback and ponts n cluster l followng Eq. (6). Te user labeled data n MPC-Means s suppled n te form of clusterng constrants M (must lnk sets) and C (cannot lnk set). Here M set specfes pars of data ponts tat sould be put n te same cluster wle te C set specfes tose pars of data ponts tat sould not be put n te same cluster. In (), te last two terms are used to add penalty to te objectve functon from te volaton of te constrants. Te functon f M returns a value proportonal to te dstance between te two ponts tat are n dfferent clusters. Te functon f C returns a value tat s nversely proportonal to te dstance between two ponts tat are n te same cluster. Te ponts x l and x l represent te two fartest data ponts n X l wt respect to ter dstance computed usng A l. Te pseudo code for MPC-Means s lsted as Algortm. Algortm: MPC-Means Input: Set of data ponts { M = ( x, xj), Set of cannot-lnk constrants C ( x, xj) { N (6) X = x, Set of must-lnk constrants = { =, Number of clusters, sets of constrants costs W and W, t. Output: Dsjont -parttonng { functon τ mpckm s locally mnmzed. Metod:. Intalze clusters:.. Create te λ negboroods { N P λ P= λ f () Intalze { μ = startng from te largest N P. Else () ntalze { μ λ = X = of X suc tat objectve from M and C. usng wegtest fartest-frst traversal N λ = wt centrods of { P P ntalze remanng clusters at random. Repeat untl convergence.. For eac data pont x X ( * = argmn x μ log det ( A ) A (, ) (, ) ) + w f x x l + w f x x = l ( x, xj) M j M j j ( x, xj) C j C j j t+ Assgn x to X *.. For eac cluster X, { x μ ( t+ ) t+ t X x X +.3. Update_metrcs A for all clusters { X = (Eq. (6)).4. t t + Algortm. MPC-Means (Adapted from [6]) Mappng user feedback to par-wse constrants n MPC-Means Te user feedback model s tat a user (te callee) n te managed doman can optonally ndcate weter a receved pone call s a SPIT or not. For te correspondng data pont x, te user ndcates x F S for a SPIT call and ndcates x F N for a non-spit call. Te system keeps two sets: F S (data ponts of SPIT calls from feedback) and F N (data ponts of non-spit calls from feedback). Wt respect to te MPC-Means algortm, must-lnk constrants M are derved onlne from pars of ponts (x, x j ) F S or (x, x j ) F N. Smlarly, cannot-lnk constrants C are created onlne from (x, x j ), were x F S and x j F N. For ease of exposton, we ntally dscuss te case wt clusters one for SPIT and te oter for non-spit calls. We dscuss te extenson to consder multple clusters for SPIT / non- SPIT calls n Sec Buldng detecton predcate based on cluster assocatons Gven a cluster X from te clusterng algortm, we use te majorty of te data ponts wt user feedback n te cluster to determne te assocaton of te cluster. If X FS > X FN, te calls n X wll be consdered SPIT calls; oterwse, tey wll be consdered non-spit calls. 3.5 empc-means (Effcent batc-mode MPC-Means) In te cluster assgnment step of MPC-Means (Step.) te tme complexty on teratng troug te must-lnk/cannot-lnk peers of pont x s a O(N) operaton. X s te wole set of data ponts suppled to te clusterng algortm. N= X s te number of data ponts. Te determnaton of te maxmally separated ponts ' x and '' x used n f c (.) (Step. of Algortm ) and update_metrcs (Step.3) as tme complexty O(N ). Ts mples MPC-Means s O(N 3 ) snce te operaton as to be done for eac data pont. Tus, MPC-Means does not scale well wt large data sets. For our applcaton, were N can be undreds for a small-szed doman or tousands for a md-szed doman, t turns out to be probtve tme-wse to apply te orgnal MPC- Means drectly. Terefore, we adapt MPC-Means nto te empc-means (effcent MPC-Means) algortm, wt te maxmally separated ponts estmated troug an O() approxmaton algortm. We use an O(N) mplementaton for te negborood creaton process n te cluster ntalzaton step of MPC-Means. Addtonally, te general practcal experence wt a -Means based algortm s tat t converges wtn a small number of teratons for te man loop Step n MPC-Means. Combned tese make empc-means lnear tme complexty wt respect to 5

7 te number of data ponts N and te constant s small for a range of VoIP call traces empc-means : Intalze clusters Te empc-means algortm creates te ntal negboroods drectly from te user feedback F S and F N sets. Specfcally, t creates w negboroods {F S, F N, x n3, x n4,, x nw, were {x n3, x n4,, x nw = X-F S -F N s te set of data ponts not covered by te user feedback. Te complexty of ts step s O(N). We use te same wegted-fartest-frst traversal as n MPC- Means, wc s O(N) wen te number of clusters s a constant [7]. Overall, te ntalze clusters n empc-means as O(N) complexty empc-means : effcent estmaton of maxmally separated ponts ( x, x ) In MPC-Means, to fnd te exact maxmally separated ponts x, x used n Eq. (5) and Eq. (6), t requres evaluatng te ( ) dstance x x j A for every par of ponts (x, x j ) X, wc s an O(N ) operaton. Snce te matrx A s updated n eac teraton of te loop of step n Algortm, ts evaluaton as to be repeated as well. In empc-means, we estmate te maxmally separated ponts by frst puttng data ponts from X nto an array R[..N] n a random orderng. We ten terate troug consecutve elements R[] and R[+] n te array. We set (, ) x x to (R[ ], R[ +]) tat gves te maxmal value of R[ '] R[ ' + ]. Ts A operaton (Step, Algortm ) s performed once rgt after te cluster ntalzaton step. Te tme complexty of ts step s O(N). However, snce te A matrx s updated n eac teraton of MPC-Means (Step.3, Algortm ), te estmate (, ) x x as to be updated accordngly as well. We embed te updatng process nto te calculaton of te parameterzed Eucldean dstance x xj (Eq. (3)). Te parameterzed Eucldean A dstance s calculated n Eq. (4) and Eq. (5) as well. Te dea ere s tat wen a par of ponts (x, x j ) s found to ave a greater dstance tan te current estmate ( x, x) at te tme of evaluatng te parameterzed Eucldean dstance, we wll set te maxmally separately ponts estmate to (x, x j ). Te advantage of ts approac s tat t s an O() operaton and does not ncrease te order of complexty of te empc-means algortm. However, ts s an approxmaton because suppose, n te loop to terate troug all te ponts, we are at pont x A and are calculatng x A - x B. Te pont x C s to be consdered n a later teraton and (x A, x C ) appens to be te fartest par of ponts. Ten, te computaton for pont x A wll not ave te accurate dstance for te fartest par of ponts. Hereafter, wen we refer to Eucldean dstance computaton, we mean tat t as maxmally separated pont estmaton embedded wtn t. To nsure tat f C (.) functon (Eq. (5)) does not evaluate to negatve values wt our approxmated estmaton of ( x, x ), we enforce tat te second term s always evaluated before te frst x, x. term so tat tere s an opportunty to update ( ) Use only a fxed number of constrants n cluster assgnment step In te cluster assgnment step of MPC-Means (Step., Algortm ), rater tan teratng troug te complete mustlnk/cannot-lnk peers of x, wc makes Step. O(N ), we coose a fxed szed subset of tem. Ts corresponds to Step 3. n empc-means (Algortm ). Ts optmzaton s nted at by te fact tat te must-lnk/cannot-lnk nformaton n our doman as sgnfcant redundancy. A set of k and k calls placed, troug user feedback, n te SPIT and non-spit categores generates k +k must-lnk and k k cannot-lnk constrants. On te oter and, we see from experment results by [6] tat MPC- Means can work reasonably well even wt lmted numbers of constrants. Togeter wt te O() complexty maxmally separated ponts estmaton, te overall tme complexty on te cluster assgnment step s brougt down to O(N). In general, ts can negatvely affect te clusterng qualty. However, we beleve t s a trade-off tat s necessary n an effort to make te detecton sceme scalable. In te followng, we wll dscuss about a doman specfc optmzaton wc can mprove te detecton qualty over MPC-Means for te cases wt lmted numbers of constrants Pre metrcs update on te startng cluster(s) In MPC-Means, te frst update metrcs step (Step.3) occurs only after te frst teraton of te cluster assgnment step (Step.). In te frst teraton of te cluster assgnment, a default dentty matrx s assgned to A, wc drectly affects te qualty of te generated clusters from te frst teraton and as a longterm effect on te qualty of te eventual clusters as we see emprcally. Terefore, n empc-means we conduct a metrcs update (Step., empc-means, Algortm ) early on, rgt after te ntal clusters are generated from te cluster ntalzaton step (Step., Algortm ). Intutvely, te user feedback s avalable at te outset and ts optmzaton allows te A matrx to mmedately adapt to te user feedback at te begnnng, wc results n more precse clusterng. Based on our experments, ts mproves te clusterng qualty and te detecton accuracy of empc-means over te orgnal MPC-Means. Addtonally, t mproves te convergence speed of te algortm as we see later (Table ). Algortm: empc-means Input: Set of data ponts { M N X = x, Set of must-lnk constrants = = {( x, xj), Set of cannot-lnk constrants C {( x, xj) =, Number of clusters, sets of constrants costs W and W, () Optonal ntal cluster centrods { μ Output: Dsjont -parttonng { functon τ mpckm s locally mnmzed. Metod: (). If ntal cluster centrods { μ, t = X = of X suc tat objectve = N λ =.. Create te λ negboroods { P P f s not gven n te nput wt steps from Sec λ Use wegtest fartest-frst traversal to select negboroods { N P. = 6

8 Else () Assgn te data ponts { X NP () Intalze { μ () { = X N λ = Intalze remanng clusters at random () Intalze { μ =.. Update metrcs A for all clusters { X = = (Eq. (6)). x x wt respect to eac A. 3. Repeat untl convergence 3.. For eac x X M { ( x, xj ) M, M = ctssze Randomly select. C ( x, xj) C, C = ctssze. Intalzaton of maxmally separated ponts (, ) ( { * = argmn x μ log det ( A ) A w (, ) j fm x, xj lj w, x (, ) j fc x xj lj ) xj M x xj C + + = t+ Assgn x to X * 3.. For eac cluster X, { x μ ( t+ ) t+ t X x X Update_metrcs A for all clusters { X = 3.4. t t+ (Eq. (6)) Algortm. empc-means Algortm sows te proposed empc-means wt te above modfcatons to MPC-Means. Step decdes te startng centrods (means) for te clusters troug te use of ntal user feedback. For te specfc case of te user flaggng calls as SPIT or non-spit, =. Step ntalzes te maxmally separated ponts estmaton. Step 3. performs te cluster assgnment. Step 3. updates te mean. Note tat te mean can be updated n constant tme by keepng te sum of te data ponts and performng an addton/subtracton wen a data pont s assocated wt/unassocated from a cluster. Step 3.3 updates te matrx A for eac cluster based on calculatng te covarance matrx on te latest clusterng results as used n Maalanobs dstance [8] and also factors n te tranng data gven n te must-lnk set M and cannot-lnk set C (Eq. (6)). Te goal of ts process s to pck te A suc tat te objectve functon (Eq. ()) s mnmzed for te cluster assgnment done n te current teraton of Step 3. From a g-level pont of vew, ts process wll result n A tat puts ger wegts on tose features wc are consstent among data ponts n te same cluster and lower wegts on tose tat are less consstent. Overall, empc-means s sown to be of O(N) complexty wtn eac teraton leadng to convergence and emprcally, te algortm needs a small number of teratons. Along wt te tme advantage, te clusterng qualty and detecton accuracy are also mproved beyond te orgnal MPC-Means algortm. 3.6 pmpc-means (Progressve MPC- Means) Bot MPC-Means and empc-means assume te data ponts are avalable n a batc, and tey are drectly suted for te Mode B (batc mode) detecton (Sec. 3.3). To support te Mode A per-call or early detecton, we create a varant called pmpc- Means. Te pseudo code s gven as Algortm 3. Te dea ere s tat wen a new call comes n, pmpc-means performs only te cluster assgnment step and only for te new data pont. Some features n te data pont lke duraton of dalog, # of ACs, # of BYEs, and # of xxx response messages wll not be avalable tll te end of te call, wle te features lke From URI, To URI, and Tme from te last call by te same caller are avalable at te begnnng of te pone call and are used n pmpc-means. For te features tat are not avalable, pmpc- Means flls te data pont x wt te mean values from te cluster to wc ts pont s dstance s beng computed. Ts s mplctly carred out n Step 4 of Algortm 3. In pmpc-means, te update metrcs operaton only occurs occasonally wen te cluster means ave canged sgnfcantly (exceedng a gven tresold d tresold ). Estmatng te mean s a O() operaton for te addton of eac data pont. Ts amortzes over many calls te cost of A computaton and te cost of reclusterng all exstng data ponts. However, a cost as to be pad n advance, wc s tat we requre reasonably szed cluster(s) to be grown on te ntal data ponts troug empc-means. Te reason s tat we want te ntal A matrx to be as accurate as possble and te tresold for te sze s denoted as t tresold n Algortm 3. Algortm: pmpc-means X = ( t ) Input: A new data pont x t., Dsjont -parttonng { {,,.., ( t ) X = x x x t. Output: te cluster assocaton l t for te pont x t. Dsjont -parttonng { X Internal Varables: { μ = Metod:. If t < t tresold ( t ) X X x t { X Return. If { X = = ( t ) { = (all clusters are empty) { X X x t. Call empc-means to generate { X of X { x,,..,, = x xt xt = =. from { μ μ = Return M { ( x, xj) M, M = ctssze 3. Randomly select. C ( x, xj ) C, C = ctssze 4. ( { * = argmn x μ log det ( A ) A X. (, ) (, ) ) + w f x x l + w f x x = l ( x, xj) M j M j j ( x, xj) C j C j j of 7

9 ( t ) 5. { X X = 6. X * X * { x t 7. If μ * μ * / x * x * > d tresold A A * * Call empc-means wt ntal centrods { μ { X = on ( t ) X. { μ μ. = Algortm 3. pmpc-means = to generate 3.7 Mult-Class empc clusterng We create a varant of empc n wc te ntal clusters are splt nto sub-clusters based on te call types calls gong to voce mal, calls termnated mmedately after te call s establsed, and te remanng calls. Tese tree types exbt dfferent patterns n te non-slence call duraton rato (feature 7, Sec. 3.4.). Te sub-clusters are formed for bot SPIT and non-spit calls. Ts s an attempt to gude te clusterng process troug expert knowledge. Te user feedback owever s only able to dfferentate between SPIT and non-spit calls, and not place a call nto a sub-cluster. 4. Experments and Results 4. Testbed We set up a two-doman testbed wt a topology smlar to Fgure, one of te domans beng protected by our detecton tecnque. We use Astersk [9] as te VoIP proxy servers and use MjSp [3] for te pone clents. Eac doman as 9 pones actng as non-sptters and 6 pones actng as sptters. We use te Posson [8] dstrbuton to model call arrval tmes and te Exponental dstrbuton to model call duratons. Te generaton of call traces was done by only one of te co-autors wtout provdng any nformaton about te mx of Non-SPIT and SPIT calls to te rest of te team. Ts was done on purpose so tat te team workng on te detecton system does not ave any pror knowledge of te call mx. Ideally we would ave lked to perform te evaluaton on trd-party call traces. However, at te tme of wrtng, no suc call trace s publcly avalable. Settng up a VoIP system wtn our unversty s also not a soluton, snce te correspondng call trace s unlkely to ave any sgnfcant number of SPIT calls. 4. Summary of call trace dataset We collected four call traces from our testbed wt varyng call caracterstcs as follows (call trace name, Non-SPIT Call lengt average, Non-SPIT Call nter-arrval tme average, SPIT Call lengt average, SPIT call nter-arrval tme average, Number of SPIT calls n trace, Number of non-spit calls n trace): (v4, 5, 3,,,, 7), (v5, 5,,,, 45, 338), (v6, 5, 3,,, 94, 89), (v7, 5, 3, 5,, 8, 3). Te tme unt s mnute. In terms of smlarty between SPIT and non-spit calls, n decreasng order, te call traces are v5, v7, v6, and v4. Tere are oter caracterstcs wc are sared by te four call traces. Examples nclude a 6% cance of a call beng ung up by te caller for a non-spit call and a % cance of beng ung up by te caller (sptter) for a SPIT call. Te meda streams for a SPIT call are domnated by te sptter wle for a non-spit call, te non-slence duraton on te caller and te callee meda streams are about te same on average. Oter expermental parameter settngs are: at most 5 mustlnk and 5 cannot-lnk constrants are used n our algortms. Te pmpc-means algortm queues data ponts ntally and runs empc-means before commencng ncremental operaton for subsequent ponts. Te convergence crteron s tat te clusterng does not cange from te prevous teraton or a prevously seen clusterng s revsted. Eac data pont n te experment s based on te average from 5 runs wt te same parameter settngs. 4.3 Effect of proporton of user feedback In ts experment, True P os tve v4 v5 v6 v7.5 ra to of ca lls wt fe e dba ck Fgure 4. Compare empc True Postve Rate across call traces 4,5,6, and 7 we evaluate te effect of te proporton of calls tat come wt user feedback. We assume te same rato for bot SPIT and non-spit calls. We assume te feedback s perfectly accurate. Fgure 5 sows te clusterng qualty wt respect to four dfferent algortms proposed on call trace 4 n terms of te F- Measure [6]. Te F-Measure s defned as follows: Precson = (# pars correctly predcted n same cluster)/(total # pars predcted n same cluster) Recall = (# pars correctly predcted n same cluster)/(total # pars n same cluster) F-Measure = ( Precson Recall)/(Precson + Recall) (7) A larger F-Measure value means better qualty clusterng. From Sec. 4., we know tat call trace 4 exbts a very clear dstncton between SPIT and non-spit calls n terms of call duraton and call nter-arrval tme. Ts makes empc perform well wt user feedback rato as low as.. Te orgnal MPC- Means aceves te same level but wt a ger user feedback rato of.. Te mproved result of empc s due to te premetrcs update (Sec ), wc creates a more accurate wegt matrx A based on user feedback, pror to teratng over te data ponts. Te F-Measure from empc Mult Class drops wt ncreasng user feedback rato because we break te cluster nto sub-clusters based on te call types. As a result, empc Mult Class wll put dfferent types of SPIT and non-spit calls nto dfferent sub-clusters. Bot wll urt te F-Measure snce by defnton of F-Measure, tese calls sould be clustered nto te same cluster. Ts negatve effect grows stronger as te user feedback rato ncreases. Fgure 6 and Fgure 7 sow te true postve and false postve rates of SPIT detecton on call trace v4. Wat we can see ere s tat empc Mult Class actually performs well despte te poor F-Measure. empc Mult Class performs worse tan empc at low user feedback rato because breakng te ntal cluster nto sub-clusters reduces te number of call data ponts wt feedback n eac sub-cluster. Ts results n poor clusterng and ence low 8

10 detecton accuracy. Compared to empc, MPC s detecton accuracy lags bend due to te lack of pre-metrcs updatng. pmpc performs rater poorly even wt call trace v4. However, t s stll n te usable range (e.g..63 True Postve wt a user feedback rato of.). pmpc s poor performance s due to te lmted features avalable before te meda stream s establsed. Due to space constrants, we sow only te True Postve curves for call traces v5, v6, and v7 n Fgure 8, Fgure 9, and Fgure respectvely. All te algortms perform worse wt call trace v5 due to same nter-arrval tme of SPIT and non-spit calls. Ts makes te tme snce last call from te same caller and call frequency (features 9 and 6 n Sec. 3.4.) muc less useful. Anoter factor s te number of SPIT calls n te call trace s decreased to 45 (compared to n v4) wc furter ampers te clusterng qualty and detecton accuracy. Fgure 4 summarzes te True Postve rates from empc across te four call traces. Ts bascally corresponds to ow salent te dfferences between SPIT calls and non-spit calls n te call traces are. In order, te easest one s v4, followed closely by v6, and ten v7. Te ardest s v5. One tng to note s tat n v5, SPIT calls are almost non-dstngusable from sort-duraton non-spit calls. We sow error-bar ( ± s.t.d.) for empc n Fgure 5. Tey are omtted n te rest of te fgures for presentaton clarty. Te general trend s tat te errors dmns wt ncreasng rato of user feedback. We observe less tan ± 5% error across te experments on call traces 4,6,and 7 wen user rato s set beyond.. For call trace 5, te error s ger (up to ± 5% at. rato). Ts s due to te fact tat trace 5 s te ardest for detecton as mentoned n prevous paragrap. F-Measure rato of calls wt feedback Fgure 5. Call trace v4 / F-Measure wt all algortms True Postve Rate rato of calls wt feedback Fgure 6. Call trace v4 / True Postve rate. False Postve Rate rato of calls wt feedback Fgure 7. Call trace 4 / False Postve rate True Postve Rate True Postve Rate True Postve Rate rato of calls wt feedback rato of calls wt feedback rato of calls wt feedback Fgure 8. Call trace 5 / True Postve Rate Fgure 9. Call trace 6 / True Postve Rate Fgure. Call trace 7 / True Postve Rate Tme (ms) Num of data ponts Fgure. MPC Runnng tme Tme (ms) Num of data ponts Fgure. empc Runnng tme MPC empc v v v v Average Table. Number of teratons to convergence 9

11 4.4 Scalablty of executon tme In ts experment we compare te runnng tmes of MPC and empc by varyng te number of call data ponts. Call trace v7 s used for ts experment. For MPC, we apply exact optmzatons wc do not cause loss of accuracy. For example, te maxmally separated ponts evaluaton s re-executed only wen te A matrx gets canged. Te results are based on code compled wt MS VC++ 8. wt default optmzaton level runnng on Wndows XP, Intel E64.3 GHz CPU. As Fgure and Fgure sows, MPC exbts non-lnear growt n te runnng tme as te number of call data ponts ncreases and s terefore non-scalable. empc, on te oter and, exbts a lnear growt n te runnng tme. Te error bars are based on ± s.t.d.. Also, MPC takes sgnfcantly longer tme to run compared wt empc wt te same number of nput data ponts. Lookng at te number of teratons tat eac algortm takes to converge (Table ), empc fares better. Te runnng tme advantage of empc comes from te lower number of teratons as well as te lower runnng tme of eac teraton. Te lower number of teratons s explaned by empc s ntal update of A s on te startng clusters. For call trace v5, te smlarty n SPIT and non-spit calls renders te A ntalzaton neffectve and te number of teratons s rougly equal for bot algortms. F-Measure Nose level True Postve Rate Nose level False Postve Rate Nose level Fgure 3. empc / F-Measure vs. Nose 4.5 Effect of nose n user feedback In ts experment, we perform an evaluaton of te dfferent algortms wt varous nose levels n te user feedback. Wen we say te nose level s c, t means tat a fracton c of te user feedback s false,.e., a SPIT call s reported as non-spit and vce-versa. We sow te result wt call trace 6 for ts experment. Te user feedback rato s fxed at.3. Fgure 3 sows te effect on te F-Measure, were te curves are symmetrc around.5 nose level. Ts s understandable snce F- Measure cares only about te clusterng qualty and not te absolute SPIT/non-SPIT cluster assocaton. Fgure 4 sows te true postve rate decreases as te nose level ncreases. Observng volume = Fgure 4. empc / True Postve vs. Nose volume = Fgure 5. empc /False Postve vs. Nose te false postve rates n Fgure 5, we conclude tat pmpc s completely unusable troug te wole nose level range wle te oter algortms are usable at low nose levels. We conclude tat pmpc s usable only for a g proporton of accurate user feedback. Beyond nose level.5 empc performance drops below tat of MPC due to our desgn of te detecton predcate (Sec ), namely, consderng te cluster tat contans more calls marked by te user as SPIT tan non-spit, to be te SPIT cluster. Wt nose level above.5, te user feedback s wrong more often tan rgt and te negatve effect s more pronounced n empc tan MPC, snce t dd a better job of clusterng on te user feedback tan MPC. As an example of a usable operatng pont, consder tat at nose levels. or below, empc as bot true postve and true negatve above.8. volume = TP - FP nose level.5 rato of feedback Fgure 6. MPC (True Postve False Postve) / call trace v6 TP - FP nose level.5 rato of feedback Fgure 7. empc (True Postve False Postve) / call trace 6 TP - FP nose level.5 rato of feedback Fgure 8. pmpc (True Postve False Postve) / call trace 6

12 4.6 Evaluaton wt nose and feedback rato Volume = ( TP FP) df dn (8) n= f =. n: nose level, f:feedback rato TP-FP Volume v4 v5 v6 v7 avg. MPC empc (Mult Class) empc pmpc Table. Summary of TP-FP volume comparsons Here we perform an evaluaton of all four proposed algortms wt respect to te four call traces. Our evaluaton metodology consders te combned effect of proporton of user feedback and te nose level and te results are sown n Fgure 6, Fgure 7, and Fgure 8. In te 3D plot, te Z-axs corresponds to (TP-FP), te dfference between True Postve rate and False Postve rate, wt respect to eac par of feedback rato and nose level. Intutvely, f (TP-FP) s greater tan zero, t means te detecton gves more correct results tan ncorrect results and can be regarded as a vald operatng pont were te detecton s useful. Due to page lengt lmtaton, we sow te 3D plots only for call trace 6. A general trend we can see n te 3D plots s tat wen fxng te nose level, te (TP-FP) value clmbs to a peak and ten goes down wen varyng te feedback rato from to. Tere s no sarp breakdown of performance for any of te algortms. If te user feedback s accurate, ten even wt low rato of user feedback, te performance s good for MPC and empc. Te performance of pmpc on te oter and s acceptable only close to te extreme regon of almost perfect user feedback for almost all calls. Expectedly, at te extreme nose level of, te peaks are at te locaton wt feedback rato ; at te extreme case of nose level, te peaks are at feedback rato. To gve an overall quantfcaton of te detecton qualty, we defne te volume metrc based on te ntegral (Eq. (8)). In te deal case were (TP-FP) s mantaned at troug te entre range of nose levels and feedback rato values, te volume wll be.9. Table sows te volume for eac combnaton of algortm and call trace. Call trace v5 gves te lowest volume correspondng to te worst performance for all algortms. Averaged over te entre range, we see tat empc performs best followed by empc (Mult Class), MPC, and pmpc. 5. CONCLUSION In ts paper, we proposed a new approac to detect SPIT calls n a VoIP envronment. We map eac pone call nto a data pont based on an extendable set of call features, derved from te sgnalng as well as te meda protocols. Ts converts te problem of SPIT detecton nto a data classfcaton problem, were a classc soluton s te use of clusterng. We apply semsupervsed clusterng, wc allows for te optonal use of user feedback for more accurate classfcaton. Ts corresponds to users flaggng some calls as SPIT and oters as legtmate. A callenge we encounter s tat te MPC-Means semsupervsed clusterng algortm s not scalable wt te number of data ponts. Furtermore, for our specfc problem on classfyng VoIP calls, t requres at least 3% of calls to ave user feedback to aceve g (> 9%) detecton true postve rates. Ts s lkely too onerous to te end users. Terefore, we create a new algortm called empc-means wc provdes emprcally lnear tme performance speedng up te performance of MPC-Means ffteen fold. It aceves ts by replacng te tmeconsumng parts n te MPC-Means algortm wt domanspecfc approxmatons. We ntroduce a pre-metrcs-update step, wc contrbutes to g (> 9%) detecton true postve rates wt less tan % user feedback data ponts for tree of te four call traces used ere. We found tat t s dffcult to attan g detecton accuracy based only on features avalable n te call establsment pase, wc would enable a SPIT call to be dropped wtout te user needng to answer te call. Ts algortm pmpc performs well only wt accurate user feedback for a majorty of calls. One possble mprovement s to nclude te trust/reputaton of callers as a feature n te clusterng. However, as mentoned earler n Sec., we would ave to address te drawbacks of classcal reputaton systems. We are also nterested n furter features tat can be extracted from meda traffc tat can ad n te classfcaton. Ts brngs n te queston of feasblty, consderng te load tat suc processng wll place on te clentsde detectors, especally consderng tat some ard pones may be qute lmted n ter processng capabltes. Implctly we ave assumed tat nose n user feedback s due to onest mstakes, not malcous users tryng to steer te clusterng n wrong drectons. How sould te system protect tself wen te assumpton s no longer true? Could t be possble to dentfy suc users and downgrade ter feedback for te clusterng process? 6. REFERENCES [] VOIPSA, "VoIP Treat Taxonomy," 8. [] Y. S. Wu, S. Bagc, S. Garg, and N. Sng, "SCIDIVE: a stateful and cross protocol ntruson detecton arctecture for voce-over-ip envronments," n DSN, 4, pp [3] H. Sengar, D. Wjesekera, H. Wang, and S. Jajoda, "VoIP Intruson Detecton Troug Interactng Protocol State Macnes," n DSN, 6, pp [4] C. J. J. Rosenberg, "RFC 539 : Te Sesson Intaton Protocol (SIP) and Spam," 8. [5] D. Sn, J. An, and C. Sm, "Progressve Mult Gray- Levelng: A Voce Spam Protecton Algortm," IEEE Network, vol., pp. 8-4, 6. [6] M. Blenko, S. Basu, and R. J. Mooney, "Integratng constrants and metrc learnng n sem-supervsed clusterng," n ICML, 4, pp [7] J. Quttek, S. Nccoln, S. Tartarell, M. Stemerlng, M. Brunner, and T. Ewald, "Detectng SPIT Calls by Ceckng Human Communcaton Patterns," n ICC, 7, pp [8] R. MacIntos and D. Vnokurov, "Detecton and mtgaton of spam n IP telepony networks usng sgnalng protocol analyss," n IEEE/Sarnoff Symposum on Advances n Wred and Wreless Communcaton, 5, pp [9] P. olan and R. Dantu, "Soco-tecncal defense aganst voce spammng," ACM Transactons on Autonomous and Adaptve Systems (TAAS), vol., 7.

Machine Learning. K-means Algorithm

Machine Learning. K-means Algorithm Macne Learnng CS 6375 --- Sprng 2015 Gaussan Mture Model GMM pectaton Mamzaton M Acknowledgement: some sldes adopted from Crstoper Bsop Vncent Ng. 1 K-means Algortm Specal case of M Goal: represent a data

More information

Priority queues and heaps Professors Clark F. Olson and Carol Zander

Priority queues and heaps Professors Clark F. Olson and Carol Zander Prorty queues and eaps Professors Clark F. Olson and Carol Zander Prorty queues A common abstract data type (ADT) n computer scence s te prorty queue. As you mgt expect from te name, eac tem n te prorty

More information

User Authentication Based On Behavioral Mouse Dynamics Biometrics

User Authentication Based On Behavioral Mouse Dynamics Biometrics User Authentcaton Based On Behavoral Mouse Dynamcs Bometrcs Chee-Hyung Yoon Danel Donghyun Km Department of Computer Scence Department of Computer Scence Stanford Unversty Stanford Unversty Stanford, CA

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

An Analytical Tool to Assess Readiness of Existing Networks for Deploying IP Telephony

An Analytical Tool to Assess Readiness of Existing Networks for Deploying IP Telephony An Analytcal Tool to Assess Readness of Exstng Networks for Deployng IP Telepony K. Sala M. Almasar Department of Informaton and Computer Scence Kng Fad Unversty of Petroleum and Mnerals Daran 31261, Saud

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Investigations of Topology and Shape of Multi-material Optimum Design of Structures

Investigations of Topology and Shape of Multi-material Optimum Design of Structures Advanced Scence and Tecnology Letters Vol.141 (GST 2016), pp.241-245 ttp://dx.do.org/10.14257/astl.2016.141.52 Investgatons of Topology and Sape of Mult-materal Optmum Desgn of Structures Quoc Hoan Doan

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following.

Complex Numbers. Now we also saw that if a and b were both positive then ab = a b. For a second let s forget that restriction and do the following. Complex Numbers The last topc n ths secton s not really related to most of what we ve done n ths chapter, although t s somewhat related to the radcals secton as we wll see. We also won t need the materal

More information

Learning the Kernel Parameters in Kernel Minimum Distance Classifier

Learning the Kernel Parameters in Kernel Minimum Distance Classifier Learnng the Kernel Parameters n Kernel Mnmum Dstance Classfer Daoqang Zhang 1,, Songcan Chen and Zh-Hua Zhou 1* 1 Natonal Laboratory for Novel Software Technology Nanjng Unversty, Nanjng 193, Chna Department

More information

Wishing you all a Total Quality New Year!

Wishing you all a Total Quality New Year! Total Qualty Management and Sx Sgma Post Graduate Program 214-15 Sesson 4 Vnay Kumar Kalakband Assstant Professor Operatons & Systems Area 1 Wshng you all a Total Qualty New Year! Hope you acheve Sx sgma

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices

Steps for Computing the Dissimilarity, Entropy, Herfindahl-Hirschman and. Accessibility (Gravity with Competition) Indices Steps for Computng the Dssmlarty, Entropy, Herfndahl-Hrschman and Accessblty (Gravty wth Competton) Indces I. Dssmlarty Index Measurement: The followng formula can be used to measure the evenness between

More information

Exact solution, the Direct Linear Transfo. ct solution, the Direct Linear Transform

Exact solution, the Direct Linear Transfo. ct solution, the Direct Linear Transform Estmaton Basc questons We are gong to be nterested of solvng e.g. te followng estmaton problems: D omograpy. Gven a pont set n P and crespondng ponts n P, fnd te omograpy suc tat ( ) =. Camera projecton.

More information

Reducing Frame Rate for Object Tracking

Reducing Frame Rate for Object Tracking Reducng Frame Rate for Object Trackng Pavel Korshunov 1 and We Tsang Oo 2 1 Natonal Unversty of Sngapore, Sngapore 11977, pavelkor@comp.nus.edu.sg 2 Natonal Unversty of Sngapore, Sngapore 11977, oowt@comp.nus.edu.sg

More information

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z.

TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS. Muradaliyev A.Z. TECHNIQUE OF FORMATION HOMOGENEOUS SAMPLE SAME OBJECTS Muradalyev AZ Azerbajan Scentfc-Research and Desgn-Prospectng Insttute of Energetc AZ1012, Ave HZardab-94 E-mal:aydn_murad@yahoocom Importance of

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Data Mining: Model Evaluation

Data Mining: Model Evaluation Data Mnng: Model Evaluaton Aprl 16, 2013 1 Issues: Evaluatng Classfcaton Methods Accurac classfer accurac: predctng class label predctor accurac: guessng value of predcted attrbutes Speed tme to construct

More information

Feature Reduction and Selection

Feature Reduction and Selection Feature Reducton and Selecton Dr. Shuang LIANG School of Software Engneerng TongJ Unversty Fall, 2012 Today s Topcs Introducton Problems of Dmensonalty Feature Reducton Statstc methods Prncpal Components

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information

Intelligent Information Acquisition for Improved Clustering

Intelligent Information Acquisition for Improved Clustering Intellgent Informaton Acquston for Improved Clusterng Duy Vu Unversty of Texas at Austn duyvu@cs.utexas.edu Mkhal Blenko Mcrosoft Research mblenko@mcrosoft.com Prem Melvlle IBM T.J. Watson Research Center

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Performance Evaluation of Information Retrieval Systems

Performance Evaluation of Information Retrieval Systems Why System Evaluaton? Performance Evaluaton of Informaton Retreval Systems Many sldes n ths secton are adapted from Prof. Joydeep Ghosh (UT ECE) who n turn adapted them from Prof. Dk Lee (Unv. of Scence

More information

An Optimal Algorithm for Prufer Codes *

An Optimal Algorithm for Prufer Codes * J. Software Engneerng & Applcatons, 2009, 2: 111-115 do:10.4236/jsea.2009.22016 Publshed Onlne July 2009 (www.scrp.org/journal/jsea) An Optmal Algorthm for Prufer Codes * Xaodong Wang 1, 2, Le Wang 3,

More information

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr)

Helsinki University Of Technology, Systems Analysis Laboratory Mat Independent research projects in applied mathematics (3 cr) Helsnk Unversty Of Technology, Systems Analyss Laboratory Mat-2.08 Independent research projects n appled mathematcs (3 cr) "! #$&% Antt Laukkanen 506 R ajlaukka@cc.hut.f 2 Introducton...3 2 Multattrbute

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mnng Massve Datasets Jure Leskovec, Stanford Unversty http://cs46.stanford.edu /19/013 Jure Leskovec, Stanford CS46: Mnng Massve Datasets, http://cs46.stanford.edu Perceptron: y = sgn( x Ho to fnd

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Cooperative Network Coding-Aware Routing for Multi-Rate Wireless Networks

Cooperative Network Coding-Aware Routing for Multi-Rate Wireless Networks Ts full text paper was peer revewed at te drecton of IEEE Communcatons Socety subject matter experts for publcaton n te IEEE INFOCOM 009 proceedngs. Cooperatve Network Codng-Aware Routng for Mult-Rate

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization

Problem Definitions and Evaluation Criteria for Computational Expensive Optimization Problem efntons and Evaluaton Crtera for Computatonal Expensve Optmzaton B. Lu 1, Q. Chen and Q. Zhang 3, J. J. Lang 4, P. N. Suganthan, B. Y. Qu 6 1 epartment of Computng, Glyndwr Unversty, UK Faclty

More information

Classifying Acoustic Transient Signals Using Artificial Intelligence

Classifying Acoustic Transient Signals Using Artificial Intelligence Classfyng Acoustc Transent Sgnals Usng Artfcal Intellgence Steve Sutton, Unversty of North Carolna At Wlmngton (suttons@charter.net) Greg Huff, Unversty of North Carolna At Wlmngton (jgh7476@uncwl.edu)

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS

NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS ARPN Journal of Engneerng and Appled Scences 006-017 Asan Research Publshng Network (ARPN). All rghts reserved. NUMERICAL SOLVING OPTIMAL CONTROL PROBLEMS BY THE METHOD OF VARIATIONS Igor Grgoryev, Svetlana

More information

TN348: Openlab Module - Colocalization

TN348: Openlab Module - Colocalization TN348: Openlab Module - Colocalzaton Topc The Colocalzaton module provdes the faclty to vsualze and quantfy colocalzaton between pars of mages. The Colocalzaton wndow contans a prevew of the two mages

More information

Video Proxy System for a Large-scale VOD System (DINA)

Video Proxy System for a Large-scale VOD System (DINA) Vdeo Proxy System for a Large-scale VOD System (DINA) KWUN-CHUNG CHAN #, KWOK-WAI CHEUNG *# #Department of Informaton Engneerng *Centre of Innovaton and Technology The Chnese Unversty of Hong Kong SHATIN,

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Mode-seeking by Medoidshifts

Mode-seeking by Medoidshifts Mode-seekng by Medodsfts Yaser Ajmal Sek Robotcs Insttute Carnege Mellon Unversty yaser@cs.cmu.edu Erum Arf Kan Department of Computer Scence Unversty of Central Florda ekan@cs.ucf.edu Takeo Kanade Robotcs

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like:

Outline. Self-Organizing Maps (SOM) US Hebbian Learning, Cntd. The learning rule is Hebbian like: Self-Organzng Maps (SOM) Turgay İBRİKÇİ, PhD. Outlne Introducton Structures of SOM SOM Archtecture Neghborhoods SOM Algorthm Examples Summary 1 2 Unsupervsed Hebban Learnng US Hebban Learnng, Cntd 3 A

More information

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION

CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 48 CHAPTER 3 SEQUENTIAL MINIMAL OPTIMIZATION TRAINED SUPPORT VECTOR CLASSIFIER FOR CANCER PREDICTION 3.1 INTRODUCTION The raw mcroarray data s bascally an mage wth dfferent colors ndcatng hybrdzaton (Xue

More information

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems

A Unified Framework for Semantics and Feature Based Relevance Feedback in Image Retrieval Systems A Unfed Framework for Semantcs and Feature Based Relevance Feedback n Image Retreval Systems Ye Lu *, Chunhu Hu 2, Xngquan Zhu 3*, HongJang Zhang 2, Qang Yang * School of Computng Scence Smon Fraser Unversty

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

UB at GeoCLEF Department of Geography Abstract

UB at GeoCLEF Department of Geography   Abstract UB at GeoCLEF 2006 Mguel E. Ruz (1), Stuart Shapro (2), June Abbas (1), Slva B. Southwck (1) and Davd Mark (3) State Unversty of New York at Buffalo (1) Department of Lbrary and Informaton Studes (2) Department

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

S1 Note. Basis functions.

S1 Note. Basis functions. S1 Note. Bass functons. Contents Types of bass functons...1 The Fourer bass...2 B-splne bass...3 Power and type I error rates wth dfferent numbers of bass functons...4 Table S1. Smulaton results of type

More information

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines

A Modified Median Filter for the Removal of Impulse Noise Based on the Support Vector Machines A Modfed Medan Flter for the Removal of Impulse Nose Based on the Support Vector Machnes H. GOMEZ-MORENO, S. MALDONADO-BASCON, F. LOPEZ-FERRERAS, M. UTRILLA- MANSO AND P. GIL-JIMENEZ Departamento de Teoría

More information

Optimizing Document Scoring for Query Retrieval

Optimizing Document Scoring for Query Retrieval Optmzng Document Scorng for Query Retreval Brent Ellwen baellwe@cs.stanford.edu Abstract The goal of ths project was to automate the process of tunng a document query engne. Specfcally, I used machne learnng

More information

Machine Learning 9. week

Machine Learning 9. week Machne Learnng 9. week Mappng Concept Radal Bass Functons (RBF) RBF Networks 1 Mappng It s probably the best scenaro for the classfcaton of two dataset s to separate them lnearly. As you see n the below

More information

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp

Life Tables (Times) Summary. Sample StatFolio: lifetable times.sgp Lfe Tables (Tmes) Summary... 1 Data Input... 2 Analyss Summary... 3 Survval Functon... 5 Log Survval Functon... 6 Cumulatve Hazard Functon... 7 Percentles... 7 Group Comparsons... 8 Summary The Lfe Tables

More information

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration

Improvement of Spatial Resolution Using BlockMatching Based Motion Estimation and Frame. Integration Improvement of Spatal Resoluton Usng BlockMatchng Based Moton Estmaton and Frame Integraton Danya Suga and Takayuk Hamamoto Graduate School of Engneerng, Tokyo Unversty of Scence, 6-3-1, Nuku, Katsuska-ku,

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

kccvoip.com basic voip training NAT/PAT extract 2008

kccvoip.com basic voip training NAT/PAT extract 2008 kccvop.com basc vop tranng NAT/PAT extract 28 As we have seen n the prevous sldes, SIP and H2 both use addressng nsde ther packets to rely nformaton. Thnk of an envelope where we place the addresses of

More information

USING GRAPHING SKILLS

USING GRAPHING SKILLS Name: BOLOGY: Date: _ Class: USNG GRAPHNG SKLLS NTRODUCTON: Recorded data can be plotted on a graph. A graph s a pctoral representaton of nformaton recorded n a data table. t s used to show a relatonshp

More information

A Distributed First and Last Consistent Global Checkpoint Algorithm

A Distributed First and Last Consistent Global Checkpoint Algorithm A Dstrbuted Frst and Last Consstent Global Cecpont Algortm Yosfum Manabe NTT Basc Researc Laboratores 3-1 Mornosato-Waamya, Atsug-s, Kanagawa 43-01 Japan manabe@teory.brl.ntt.co.p Abstract Dstrbuted coordnated

More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

The AVL Balance Condition. CSE 326: Data Structures. AVL Trees. The AVL Tree Data Structure. Is this an AVL Tree? Height of an AVL Tree

The AVL Balance Condition. CSE 326: Data Structures. AVL Trees. The AVL Tree Data Structure. Is this an AVL Tree? Height of an AVL Tree CSE : Data Structures AL Trees Neva Cernavsy Summer Te AL Balance Condton AL balance property: Left and rgt subtrees of every node ave egts dfferng by at most Ensures small dept ll prove ts by sowng tat

More information

Collaboratively Regularized Nearest Points for Set Based Recognition

Collaboratively Regularized Nearest Points for Set Based Recognition Academc Center for Computng and Meda Studes, Kyoto Unversty Collaboratvely Regularzed Nearest Ponts for Set Based Recognton Yang Wu, Mchhko Mnoh, Masayuk Mukunok Kyoto Unversty 9/1/013 BMVC 013 @ Brstol,

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated.

Some Advanced SPC Tools 1. Cumulative Sum Control (Cusum) Chart For the data shown in Table 9-1, the x chart can be generated. Some Advanced SP Tools 1. umulatve Sum ontrol (usum) hart For the data shown n Table 9-1, the x chart can be generated. However, the shft taken place at sample #21 s not apparent. 92 For ths set samples,

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK

FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK FINDING IMPORTANT NODES IN SOCIAL NETWORKS BASED ON MODIFIED PAGERANK L-qng Qu, Yong-quan Lang 2, Jng-Chen 3, 2 College of Informaton Scence and Technology, Shandong Unversty of Scence and Technology,

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks

Decision Strategies for Rating Objects in Knowledge-Shared Research Networks Decson Strateges for Ratng Objects n Knowledge-Shared Research etwors ALEXADRA GRACHAROVA *, HAS-JOACHM ER **, HASSA OUR ELD ** OM SUUROE ***, HARR ARAKSE *** * nsttute of Control and System Research,

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

On Some Entertaining Applications of the Concept of Set in Computer Science Course

On Some Entertaining Applications of the Concept of Set in Computer Science Course On Some Entertanng Applcatons of the Concept of Set n Computer Scence Course Krasmr Yordzhev *, Hrstna Kostadnova ** * Assocate Professor Krasmr Yordzhev, Ph.D., Faculty of Mathematcs and Natural Scences,

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem

An Efficient Genetic Algorithm with Fuzzy c-means Clustering for Traveling Salesman Problem An Effcent Genetc Algorthm wth Fuzzy c-means Clusterng for Travelng Salesman Problem Jong-Won Yoon and Sung-Bae Cho Dept. of Computer Scence Yonse Unversty Seoul, Korea jwyoon@sclab.yonse.ac.r, sbcho@cs.yonse.ac.r

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Determining the Optimal Bandwidth Based on Multi-criterion Fusion Proceedngs of 01 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 5 (01) (01) IACSIT Press, Sngapore Determnng the Optmal Bandwdth Based on Mult-crteron Fuson Ha-L Lang 1+, Xan-Mn

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

Prof. Feng Liu. Spring /24/2017

Prof. Feng Liu. Spring /24/2017 Prof. Feng Lu Sprng 2017 ttp://www.cs.pd.edu/~flu/courses/cs510/ 05/24/2017 Last me Compostng and Mattng 2 oday Vdeo Stablzaton Vdeo stablzaton ppelne 3 Orson Welles, ouc of Evl, 1958 4 Images courtesy

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

X- Chart Using ANOM Approach

X- Chart Using ANOM Approach ISSN 1684-8403 Journal of Statstcs Volume 17, 010, pp. 3-3 Abstract X- Chart Usng ANOM Approach Gullapall Chakravarth 1 and Chaluvad Venkateswara Rao Control lmts for ndvdual measurements (X) chart are

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

The ray density estimation of a CT system by a supervised learning algorithm

The ray density estimation of a CT system by a supervised learning algorithm Te ray densty estaton of a CT syste by a suervsed learnng algort Nae : Jongduk Baek Student ID : 5459 Toc y toc s to fnd te ray densty of a new CT syste by usng te learnng algort Background Snce te develoent

More information

Wightman. Mobility. Quick Reference Guide THIS SPACE INTENTIONALLY LEFT BLANK

Wightman. Mobility. Quick Reference Guide THIS SPACE INTENTIONALLY LEFT BLANK Wghtman Moblty Quck Reference Gude THIS SPACE INTENTIONALLY LEFT BLANK WIGHTMAN MOBILITY BASICS How to Set Up Your Vocemal 1. On your phone s dal screen, press and hold 1 to access your vocemal. If your

More information

SVM-based Learning for Multiple Model Estimation

SVM-based Learning for Multiple Model Estimation SVM-based Learnng for Multple Model Estmaton Vladmr Cherkassky and Yunqan Ma Department of Electrcal and Computer Engneerng Unversty of Mnnesota Mnneapols, MN 55455 {cherkass,myq}@ece.umn.edu Abstract:

More information

A New Approach For the Ranking of Fuzzy Sets With Different Heights

A New Approach For the Ranking of Fuzzy Sets With Different Heights New pproach For the ankng of Fuzzy Sets Wth Dfferent Heghts Pushpnder Sngh School of Mathematcs Computer pplcatons Thapar Unversty, Patala-7 00 Inda pushpndersnl@gmalcom STCT ankng of fuzzy sets plays

More information

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation

An Iterative Solution Approach to Process Plant Layout using Mixed Integer Optimisation 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 An Iteratve Soluton Approach to Process Plant Layout usng Mxed

More information

Path Planning for Formation Control of Autonomous

Path Planning for Formation Control of Autonomous Pat Plannng for Formaton Control of Autonomous Vecles 1 E.K. Xdas, 2 C. Palotta, 3 N.A. Aspragatos and 2 K.Y. Pettersen 1 Department of Product and Systems Desgn engneerng, Unversty of te Aegean, 84100

More information

Help for Time-Resolved Analysis TRI2 version 2.4 P Barber,

Help for Time-Resolved Analysis TRI2 version 2.4 P Barber, Help for Tme-Resolved Analyss TRI2 verson 2.4 P Barber, 22.01.10 Introducton Tme-resolved Analyss (TRA) becomes avalable under the processng menu once you have loaded and selected an mage that contans

More information

y and the total sum of

y and the total sum of Lnear regresson Testng for non-lnearty In analytcal chemstry, lnear regresson s commonly used n the constructon of calbraton functons requred for analytcal technques such as gas chromatography, atomc absorpton

More information