HVLearn: Automated Black-box Analysis of Hostname Verification in SSL/TLS Implementations


2017 IEEE Symposium on Security and Privacy

Suphannee Sivakorn, George Argyros, Kexin Pei, Angelos D. Keromytis, and Suman Jana
Department of Computer Science, Columbia University, New York, USA
{suphannee, argyros, kpei, angelos,

2017, Suphannee Sivakorn. Under license to IEEE. DOI /SP

Abstract

SSL/TLS is the most commonly deployed family of protocols for securing network communications. The security guarantees of SSL/TLS are critically dependent on the correct validation of the X.509 server certificates presented during the handshake stage of the SSL/TLS protocol. Hostname verification is a critical component of the certificate validation process that verifies the remote server's identity by checking if the hostname of the server matches any of the names present in the X.509 certificate. Hostname verification is a highly complex process due to the presence of numerous features and corner cases such as wildcards, IP addresses, international domain names, and so forth. Therefore, testing hostname verification implementations presents a challenging task.

In this paper, we present HVLearn, a novel black-box testing framework for analyzing SSL/TLS hostname verification implementations, which is based on automata learning algorithms. HVLearn utilizes a number of certificate templates, i.e., certificates with a common name (CN) set to a specific pattern, in order to test different rules from the corresponding specification. For each certificate template, HVLearn uses automata learning algorithms to infer a Deterministic Finite Automaton (DFA) that describes the set of all hostnames that match the CN of a given certificate. Once a model is inferred for a certificate template, HVLearn checks the model for bugs by finding discrepancies with the inferred models from other implementations or by checking against regular-expression-based rules derived from the specification. The key insight behind our approach is that the acceptable hostnames for a given certificate template form a regular language. Therefore, we can leverage automata learning techniques to efficiently infer DFA models that accept the corresponding regular language. We use HVLearn to analyze the hostname verification implementations in a number of popular SSL/TLS libraries and applications written in a diverse set of languages like C, Python, and Java. We demonstrate that HVLearn can achieve on average 11.21% higher code coverage than existing black/gray-box fuzzing techniques. By comparing the DFA models inferred by HVLearn, we found 8 unique violations of the RFC specifications in the tested hostname verification implementations. Several of these violations are critical and can render the affected implementations vulnerable to active man-in-the-middle attacks.

I. INTRODUCTION

The SSL/TLS family of protocols is the most commonly used mechanism for protecting the security and privacy of network communications from man-in-the-middle attacks. The security guarantees of SSL/TLS protocols are critically dependent on correct validation of X.509 digital certificates presented by the servers during the SSL/TLS handshake phase. The certificate validation, in turn, depends on hostname verification for verifying that the hostname (i.e., fully qualified domain name, IP address, and so forth) of the server matches one of the identifiers in the SubjectAltName extension or the Common Name (CN) attribute of the presented leaf certificate. Therefore, any mistake in the implementation of hostname verification could completely undermine the security and privacy guarantees of SSL/TLS.

Hostname verification is a complex process due to the presence of numerous special cases (e.g., wildcards, IP addresses, international domain names, etc.). For example, a wildcard character ('*') is only allowed in the left-most part (separated by '.') of a hostname. To get a sense of the complexities involved in the hostname verification process, consider the fact that different parts of its specifications are described in five different RFCs [18], [20], [21], [24], [25].

Given the complexity and security-critical nature of the hostname verification process, it is crucial to perform automated analysis of the implementations for finding any deviation from the specification. However, despite the critical nature of the hostname verification process, none of the prior research projects dealing with adversarial testing of SSL/TLS certificate validation [36], [38], [45], [50] support detailed automated testing of hostname verification implementations. The prior projects either completely ignore testing of the hostname verification process or simply check whether the hostname verification process is enabled or not. Therefore, they cannot detect any subtle bugs where the hostname verification implementations are enabled but deviate subtly from the specifications.

The key problem behind automated adversarial testing of hostname verification implementations is that the inputs (i.e., hostnames and certificate identifiers like common names) are highly structured, sparse strings, which makes it very hard for existing black/gray-box fuzz testing techniques to achieve high test coverage or generate inputs triggering the corner cases. Heavily language/platform-dependent white-box testing techniques are also hard to apply for testing hostname verification implementations due to the language/platform diversity of SSL/TLS implementations.

In this paper, we design, implement, and evaluate HVLearn, a black-box differential testing framework based on automata learning, which can automatically infer Deterministic Finite Automata (DFA) models of the hostname verification implementations. The key insight behind HVLearn is that hostname verification, even though very complex, conceptually closely resembles the regular expression matching process in many ways (e.g., wildcards). This insight on the structure of the certificate identifier format suggests that the acceptable hostnames for a given certificate identifier, as suggested by the specifications, form a regular language. Therefore, we can use black-box automata learning techniques to efficiently infer Deterministic Finite Automata (DFA) models that accept the regular language corresponding to a given hostname verification implementation. Prior results by Angluin et al. have shown that DFAs can be learned efficiently through black-box queries in polynomial time over the number of states [31]. The DFA models inferred by HVLearn can be used to efficiently perform two main tasks that existing testing techniques cannot do well: (i) finding and enumerating unique differences between multiple different implementations; and (ii) extracting a formal, backward-compatible reference specification for the hostname verification process by computing the intersection DFA of the inferred DFA models from different implementations.

We apply HVLearn to analyze a number of popular SSL/TLS libraries such as OpenSSL, GnuTLS, MbedTLS, MatrixSSL, CPython SSL and applications such as Java HttpClient and cURL written in diverse languages like C, Python, and Java. We found 8 distinct specification violations like the incorrect handling of wildcards in internationalized domain names, confusing domain names with IP addresses, incorrect handling of NULL characters, and so forth. Several of these violations allow network attackers to completely break the security guarantees of the SSL/TLS protocol by allowing the attackers to read/modify any data transmitted over the SSL/TLS connections set up using the affected implementations. HVLearn also found 121 unique differences, on average, between any pair of tested applications/libraries.

The major contributions of this paper are as follows.

To the best of our knowledge, HVLearn is the first testing tool that can learn DFA models for implementations of hostname verification, a critical part of SSL/TLS implementations.
The inferred DFA models can be used for efficient differential testing or extracting a formal reference specification compatible with multiple existing implementations.

We design and implement several domain-specific optimizations like equivalence query design, alphabet selection, etc. in HVLearn for efficiently learning DFA models from hostname verification implementations.

We evaluate HVLearn on 6 popular libraries and 2 applications. HVLearn achieved significantly higher (11.21% more on average) code coverage than existing black/gray-box fuzzing techniques and found 8 unique, previously unknown RFC violations as shown in Table II, several of which render the affected SSL/TLS implementations completely insecure against man-in-the-middle attacks.

The remainder of this paper is organized as follows: Section II presents the descriptions of the SSL/TLS hostname verification process. We discuss the challenges in testing hostname verification and our testing methodology in Section III. Section IV describes the design and implementation details of HVLearn. We present the evaluation results for using HVLearn to test SSL/TLS implementations in Section V. Section VI presents a detailed case study of several security-critical bugs that HVLearn found. Section VII discusses the related work and Section VIII concludes the paper. For the detailed developer responses on the bugs found by HVLearn, we refer interested readers to Appendix X-B.

II. OVERVIEW OF HOSTNAME VERIFICATION

As part of the hostname verification process, the SSL/TLS client must check that the hostname of the server matches either the common name attribute in the certificate or one of the names in the subjectAltName extension in the certificate [21]. Note that even though the process is called hostname verification, it also supports verification of IP addresses or email addresses.

In this section, we first provide a brief summary of the hostname format and the specifications that describe the format of the common name attribute and the subjectAltName extension in an X.509 certificate. Figure 1 provides a high-level summary of the relevant parts of an X.509 certificate. Next, we describe different parts of the hostname verification process (e.g., domain name restrictions, wildcard characters, and so forth) in detail.

[Figure 1 contents: Subject CN (type X520CommonName, arbitrary format); X509v3 Subject Alternative Name entries DNS (dNSName), IP Address (iPAddress), and email (rfc822Name), each listed with type IA5String.]
Fig. 1. Fields in an X.509 certificate that are used for hostname verification.

A. Hostname verification inputs

Hostname format. Hostnames are usually either a fully qualified domain name or a single string without any '.' characters. Several SSL/TLS implementations (e.g., OpenSSL) also support IP addresses and email addresses to be passed as the hostname to the corresponding hostname verification implementation.

A domain name consists of multiple labels, each separated by a '.' character. The domain name labels can only contain letters a-z or A-Z (in a case-insensitive manner), digits 0-9 and the hyphen character '-' [16]. Each label can be up to 63 characters long. The total length of a domain name can be up to 255 characters. Earlier specifications required that the labels must begin with letters [21]. However, subsequent revisions have allowed labels that begin with digits [17].

Common names in X.509 certificates. The Common Name (CN) is an attribute of the subject distinguished name field in an X.509 certificate. The common name in a server certificate is used for validating the hostname of the server as part of the certificate verification process. A common name usually contains a fully qualified domain name, but it can also contain a string with arbitrary ASCII and UTF-8 characters describing a service (e.g., CN=Sample Service). The only restriction on the common name string is that it should follow the X520CommonName standard (e.g., should not repeat the substring CN=) [21]. Note that this is different from the hostname specifications, which are very strictly defined and only allow certain characters and digits as described above.

SubjectAltName in X.509 certificates. Subject alternative name (subjectAltName) is an X.509 extension that can be used to store different types of identity information like fully qualified domain names, IP addresses, URI strings, email addresses, and so forth. Each of these types has different restrictions on allowed formats. For example, dNSName (DNS) and uniformResourceIdentifier (URI) must be valid IA5String strings, a subset of ASCII strings [21]. We refer interested readers to the relevant section of RFC 5280 for further reading.

B. Hostname verification rules

Matching order. RFC 6125 recommends that SSL/TLS implementations use subjectAltName extensions, if present in a certificate, over common names, as the common name is not strongly tied to an identity and can be an arbitrary string as mentioned earlier [24]. If multiple identifiers are present in subjectAltName, the SSL/TLS implementations should try to match DNS, SRV, URI, or any other identifier type supported by the implementation and must not match the hostname against the common name of the certificate [24]. The Certificate Authorities (CAs) are also supposed to use the dNSName instead of the common name for storing the identity information while issuing certificates [18].

Wildcard in common name/subjectAltName. If a server certificate contains a wildcard character '*', an SSL/TLS implementation should match the hostname against it using the rules described in RFC 6125 [24].
We provide a summary of the rules below.

A wildcard character is only allowed in the left-most label. If the presented identifier contains a wildcard character in any label other than the left-most label, the SSL/TLS implementations should reject the certificate.

A wildcard character is allowed to be present anywhere in the left-most label, i.e., a wildcard does not have to be the only character in the left-most label. For example, identifiers like bar*.example.com, *bar.example.com, or f*bar.example.com are valid.

While matching hostnames against the identifiers present in a certificate, a wildcard character in an identifier should only apply to one sub-domain, and an SSL/TLS implementation should not compare against anything but the left-most label of the hostname (e.g., *.example.com should match foo.example.com but not bar.foo.example.com or example.com).

Several special cases involving the wildcards are allowed in RFC 6125 only for backward compatibility of existing SSL/TLS implementations, as they tend to differ from the specifications in these cases. RFC 6125 clearly notes that these cases often lead to overly complex hostname verification code and might lead to potentially exploitable vulnerabilities. Therefore, new SSL/TLS implementations are discouraged from supporting such cases. We summarize some of them: (i) a wildcard is all or part of a label that identifies a public suffix (e.g., *.com and *.info), (ii) multiple wildcards are present in a label (e.g., f*b*r.example.com), and (iii) wildcards are included as all or part of multiple labels (e.g., *.*.example.com).

International domain name (IDN). IDNs can contain characters from a language-specific alphabet like Arabic or Chinese. An IDN is encoded as a string of Unicode characters. A domain name label is categorized as a U-label if it contains at least one non-ASCII character (e.g., UTF-8). RFC 6125 specifies that any U-labels in IDNs must be converted to A-labels before performing hostname verification [24].
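The U-label to A-label conversion can be illustrated with a minimal Python sketch. It relies on Python's built-in `punycode` codec and deliberately skips the normalization and validity checks that a complete IDNA implementation performs; the function names `to_a_label` and `to_ascii_hostname` are hypothetical and do not come from any of the tested libraries.

```python
def to_a_label(label):
    """Return the A-label form of one domain label (illustrative only)."""
    if label.isascii():
        # Plain ASCII labels need no conversion.
        return label
    # Prefix "xn--" plus the Punycode transformation of the U-label.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_hostname(hostname):
    """Convert every label of a dotted hostname to its A-label form."""
    return ".".join(to_a_label(label) for label in hostname.split("."))


if __name__ == "__main__":
    # "\u4f60\u597d" is a two-character Chinese U-label.
    print(to_ascii_hostname("\u4f60\u597d.example.com"))
```

A verifier would apply such a conversion to the hostname before comparing it, label by label, with the identifiers in the certificate.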
U-label strings are converted to A-labels, an ASCII-compatible encoding, by adding the prefix xn-- and appending the output of a Punycode transformation applied to the corresponding U-label string as described in RFC 3492 [19]. Both U-labels and A-labels still must satisfy the standard length bound on domain names (i.e., up to 255 bytes).

IDN in subjectAltName. As indicated in RFC 5280, any IDN in an X.509 subjectAltName extension must be defined as type IA5String, which is limited to a subset of ASCII characters [21]. Any U-label in an IDN must be converted to an A-label before adding it to the subjectAltName. Email addresses involving IDNs must also be converted to A-labels beforehand.

IDNs in common name. Unlike IDNs in subjectAltName, IDNs in common names are allowed to contain PrintableString (A-Z, a-z, 0-9, special characters = ( ) + , - . / : ? and space) as well as UTF-8 characters [21].

Wildcard and IDN. There is no specification defining how a wildcard character may be embedded within A-labels or U-labels of an IDN [23]. As a result, RFC 6125 [24] recommends that SSL/TLS implementations should not match a presented identifier in a certificate where the wildcard is embedded within an A-label or U-label of an IDN (e.g., xn--kcry6tjko*.example.com). However, SSL/TLS implementations should match a wildcard character in an IDN as long as the wildcard character occupies the entire left-most label of the IDN (e.g., *.xn--kcry6tjko.example.com).

IP address. IP addresses can be part of either the common name attribute or the subjectAltName extension (with an IP: prefix) in a certificate. RFC 6125 specifies that an IP address must be converted to a network byte order octet string before performing certificate verification [24]. SSL/TLS implementations should compare this octet string with the common name or subjectAltName identifiers. The length of the octet string must be 4 bytes for IPv4 and 16 bytes for IPv6. The hostname verification should succeed only if both octet strings are identical. Therefore, wildcard characters are not allowed in IP address identifiers, and the SSL/TLS implementations should not attempt to match wildcards.

Email. Email can be embedded in the common name as the emailAddress attribute in legacy SSL/TLS implementations. The attribute is not case sensitive. However, new implementations must add email addresses in rfc822Name format to the subject alternative name extension instead of the common name attribute [21].

Internationalized email. Similar to IDNs in subjectAltName extensions, an internationalized email must be converted into the ASCII representation before verification. RFC 5321 also specifies that network administrators must not define mailboxes (local-part@domain/address-literal) with non-ASCII characters and ASCII control characters. Email addresses are considered to match if the local-part and host-part are exact matches using case-sensitive and case-insensitive ASCII comparison respectively (e.g., MYEMAIL@example.com does not match myemail@example.com but matches MYEMAIL@EXAMPLE.COM) [21]. Note that this specification contradicts that of the email addresses embedded in the common name, which are supposed to be completely case-insensitive.

Email with IP address in the host part. RFCs 5280 and 6125 do not specify any special treatment for an IP address in the host part of an email and only allow email in rfc822Name format. The rfc822Name format supports both IPv4 and IPv6 addresses in the host part. Therefore, an email with an IP address in the host part is allowed to be present in a certificate [22].

Wildcard in email. No specification states that wildcards should be interpreted and matched when they are part of an email address in a certificate.

Other identifiers in subjectAltName. There are other identifiers that can be used to perform identity checks, e.g., uniformResourceIdentifier (URI), SRVName, and otherName. However, most popular SSL/TLS libraries do not support checking these identifiers and leave it up to the applications.

III. METHODOLOGY

In this section, we describe the challenges behind automated testing of hostname verification implementations. Albeit small in size, the diversity of these implementations and the subtleties in the hostname verification process make these implementations difficult to test. We then proceed to describe an overview of our methodology for testing hostname verification implementations using automata learning algorithms. We also provide a brief summary of the basic setting under which automata learning algorithms operate.

A. Challenges in hostname verification analysis

We believe that any methodology for automatically analyzing hostname verification functionality should address the following challenges:

1. Ill-defined informal specifications. As discussed in Section II, although the relevant RFCs provide some examples/rules defining the hostname verification process, many corner cases are left unspecified. Therefore, it is necessary for any hostname verification implementation analysis to take into account the behaviors of other popular implementations to discover discrepancies that could lead to security/compatibility flaws.

2. Complexity of name checking functionality. Hostname verification is significantly more complex than a simple string comparison due to the presence of numerous corner cases and special characters. Therefore, any automated analysis must be able to explore these corner cases. We observe that the format of the certificate identifier as well as the matching rules closely resemble a regular expression matching problem. In fact, we find that the set of accepted hostnames for each given certificate identifier forms a regular language.

3. Diversity of implementations. The importance and popularity of the SSL/TLS protocol resulted in a large number of different SSL/TLS implementations. Therefore, hostname verification logic is often implemented in a number of different programming languages such as C/C++, Java, Python, and so forth. Furthermore, some of these implementations might be only accessible remotely without any access to their source code. Therefore, we argue that a black-box analysis algorithm is the most suitable technique for testing a large variety of different hostname verification implementations.

B. HVLearn's approach to hostname verification analysis

Motivated by the challenges described above, we now present our methodology for analyzing hostname verification routines in SSL/TLS libraries and applications. The main idea behind our HVLearn system is the following: for different rules in the RFCs, as well as for ambiguous rules which are not well defined in the RFCs, we generate template certificates with common names which are specifically designed to check a specific rule. Afterward, we use automata learning algorithms to extract a DFA which describes the set of all hostname strings that match the common name in our template certificate. For example, the DFA inferred from an implementation for an identifier template that places a wildcard in a label other than the left-most one can be used to test conformance with the rule in RFC 6125 prohibiting wildcard characters from appearing in any label other than the left-most label of the common name.

Once a DFA model is generated by the learning algorithm, we check the model for violations of any RFC rules or for other suspicious behavior. HVLearn offers two methods to check an inferred DFA model:

Regular-expression-based rules. The first option allows the user to provide a regular expression that specifies a set of invalid strings. HVLearn can ensure that the inferred DFAs do not accept any of those strings. For example, RFC 1035 states that only characters in the set [A-Za-z0-9] and the characters '-' and '.' should be used in hostname identifiers. Users therefore can construct a simple regular expression that can be used by HVLearn to check whether any of the tested implementations accept a hostname with a character outside the given set.
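A regular-expression-based rule of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it enumerates short strings and tests them against a model predicate, rather than checking a DFA against the expression as a full implementation could, and the names `INVALID_HOSTNAME`, `find_rule_violations`, and `lax_model` are hypothetical.

```python
import re
from itertools import product

# Rule from the text: any character outside [A-Za-z0-9], '-' and '.'
# makes a hostname invalid.
INVALID_HOSTNAME = re.compile(r"[^A-Za-z0-9.\-]")


def find_rule_violations(accepts, alphabet, max_len, invalid=INVALID_HOSTNAME):
    """List strings up to max_len that the model accepts although the rule forbids them."""
    hits = []
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            name = "".join(chars)
            if invalid.search(name) and accepts(name):
                hits.append(name)
    return hits


def lax_model(name):
    """Hypothetical inferred model that accepts anything ending in '.a'."""
    return len(name) > 2 and name.endswith(".a")


if __name__ == "__main__":
    # Reports the accepted hostnames that contain the forbidden '=' character.
    print(find_rule_violations(lax_model, "a.=", 3))
```

Any string returned by such a check is a concrete hostname that can be replayed against the real implementation to confirm the violation.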

[Figure 2 contents: a learning algorithm sends membership queries to the target system and submits a model M to an equivalence oracle, which answers the question "Is model M correct?" with yes, or with no plus a counterexample.]
Fig. 2. Exact learning from queries: the active learning model under which our automata learning algorithms operate.

Differential testing. The second option offered by HVLearn is to perform differential testing between the inferred model and models inferred from other implementations for the same certificate template. Given two inferred DFA models, HVLearn generates a set of unique differences between the two models using an algorithm which we discuss in Section IV-E. This option is especially useful for finding bugs in corner cases which are not well defined in the RFCs.

We summarize the advantages of our approach below:

Adopting a black-box learning approach ensures that our analysis method is language independent and we can easily test a variety of different implementations. Our only requirement is the ability to query the target library/application with a certificate and hostname of our choice and find whether the hostname matches the given identifier in the certificate.

As pointed out in the previous section, hostname verification is similar to regular expression matching. Given that regular expressions can be represented as DFAs, adopting an automata-based learning algorithm for representing the inferred models for each certificate template is a natural and effective choice.

Finally, an additional advantage of having DFA models is that we can efficiently compare two inferred models and enumerate all differences between them. This property is very important for differential testing as it helps us in analyzing the ambiguous rules in the specifications.

Limitations. A natural trade-off of choosing to implement our system as a black-box analysis method is that we cannot guarantee completeness or soundness of our models. However, each difference inferred by HVLearn can be easily verified by querying the corresponding implementations. Moreover, since our system finds differences among implementations, it will not report a bug that is common among all implementations unless a rule is explicitly specified for it, as described above. Finally, we point out that not all discrepancies among systems are necessarily security vulnerabilities; they may represent equally acceptable design choices for ambiguous parts of the RFCs.

C. Automata Learning Algorithms

We will now describe the automata learning algorithms that allow us to realize our automata-based analysis framework.

Learning model. We utilize learning algorithms that work in an active learning model which is called exact learning from queries. Traditional supervised learning algorithms, such as those used to train deep neural networks, work on a given set of labeled examples. In contrast, active learning algorithms in our model work by adaptively selecting inputs that they use to query a target system and obtain the correct label.

Figure 2 presents an overview of our learning model. A learning algorithm attempts to learn a model of a target system by querying the target system with inputs of its choice. Eventually, by querying the target system multiple times, the learning algorithm infers a model of the target system. This model is then checked for correctness through an equivalence oracle, an oracle that checks whether the inferred model correctly summarizes the behavior of the target system. If the model is correct, i.e., it agrees with the target system on all inputs, then the learning algorithm will output the generated model and terminate. On the other hand, if the model is incorrect, the equivalence oracle will produce a counterexample, i.e., an input under which the target system and the model produce different outputs. The learning algorithm then uses the counterexample to refine the inferred model. This process iterates until the learning algorithm produces a correct model.
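The query-and-refine loop described above can be made concrete with a small, self-contained Python sketch. It is not the algorithm used in HVLearn: it implements an Angluin-style observation table in which every suffix of a counterexample becomes a new experiment, and it approximates the equivalence oracle by exhaustively comparing all strings up to a fixed length. The target `toy_target`, a stand-in for a verifier that matches hostnames against a pattern like `*.a` over the alphabet `{a, b, .}`, and all function names are assumptions made for illustration.

```python
import re
from itertools import product


def bounded_equivalence(alphabet, member, max_len):
    """Approximate equivalence oracle: compare on every string up to max_len."""
    def find_counterexample(hypothesis):
        for n in range(max_len + 1):
            for chars in product(alphabet, repeat=n):
                w = "".join(chars)
                if hypothesis(w) != member(w):
                    return w
        return None
    return find_counterexample


def learn_dfa(alphabet, member, find_counterexample):
    prefixes = [""]   # one access string per discovered state
    suffixes = [""]   # distinguishing experiments; suffixes[0] stays ""
    cache = {}

    def query(w):     # cached membership query
        if w not in cache:
            cache[w] = member(w)
        return cache[w]

    def row(p):
        return tuple(query(p + s) for s in suffixes)

    while True:
        # Close the table: each one-symbol extension must look like a known state.
        changed = True
        while changed:
            changed = False
            known = {row(p) for p in prefixes}
            for p in list(prefixes):
                for a in alphabet:
                    r = row(p + a)
                    if r not in known:
                        prefixes.append(p + a)
                        known.add(r)
                        changed = True
        # Build the hypothesis DFA; its states are the distinct rows.
        start = row("")
        accepting = {row(p) for p in prefixes if row(p)[0]}
        delta = {(row(p), a): row(p + a) for p in prefixes for a in alphabet}

        def hypothesis(w, start=start, accepting=accepting, delta=delta):
            state = start
            for ch in w:
                state = delta[(state, ch)]
            return state in accepting

        cex = find_counterexample(hypothesis)   # equivalence query
        if cex is None:
            return hypothesis, len(prefixes)
        # Refine: every suffix of the counterexample becomes an experiment.
        for i in range(len(cex)):
            if cex[i:] not in suffixes:
                suffixes.append(cex[i:])


def toy_target(host):
    """Hypothetical verifier: one non-empty label of a/b, then '.a'."""
    return re.fullmatch(r"[ab]+\.a", host) is not None


if __name__ == "__main__":
    model, states = learn_dfa("ab.", toy_target,
                              bounded_equivalence("ab.", toy_target, 5))
    print(states, model("ab.a"), model("a.b.a"))
```

In this toy run the first hypothesis rejects every string, the oracle returns the accepted hostname `a.a` as a counterexample, and the refined table yields a five-state model that agrees with the target.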
To summarize, a learning algorithm in the exact learning model is able to interact with the target system using two types of queries:

Membership queries: The input to this type of query is a string s and the output is Accept or Reject depending on whether the string s is accepted by the target system or not.

Equivalence queries: The input to an equivalence query is a model M and the output of the query is either True, if the model M is equivalent to the target system on all inputs, or a counterexample input under which the model and the target system produce different outputs.

Automata learning in practice. The first algorithm for inferring DFA models in the exact learning from queries model was developed by Angluin [31] and was followed by a large number of optimizations and variations in the following years. In our system, we use the Kearns-Vazirani (KV) algorithm [54]. The KV algorithm utilizes a data structure called the discrimination tree and is in practice more efficient in terms of the number of queries it requires to infer a DFA model.

The most significant challenge that one should address in order to use the KV algorithm and other automata learning algorithms in practice is how to implement an efficient and accurate equivalence oracle in order to simulate the equivalence queries performed by the learning algorithm. Since we only have black-box access to the target system, any method for implementing equivalence queries is necessarily incomplete.

In HVLearn, we use the Wp-method [49] for implementing equivalence queries. The Wp-method checks the equivalence between an inferred DFA and a target system using only black-box queries to the target system. Essentially, the Wp-method approximates an equivalence oracle by using multiple membership queries. The algorithm is given as input the DFA to be checked and an upper bound on the number of states in the target system when modeled as a DFA, a parameter which we call depth. Then, the algorithm creates a set of test inputs S, which are then submitted to the target system. If the target system agrees with the DFA model on all inputs in the test set S, then the DFA and the target system are proved equivalent under the assumption that the upper bound on the number of states of the target system is correct.

In theory, one can set the depth parameter of the Wp-method to a very large value in order to design an equivalence oracle which is, in practice, complete. However, the size of the set of test inputs produced by the Wp-method is on the order of $O(n^2 |\Sigma|^{m-n+1})$, where $\Sigma$ is the input alphabet for the DFA, $m$ is the upper bound on the number of states of the target system and $n$ is the number of states in the input DFA. Therefore, using the Wp-method with a large depth (i.e., upper bound on the number of states of the target system) is impractical. Note that the bound on the number of test inputs produced by the Wp-method is not merely a worst-case bound; on the contrary, the number of test inputs produced is usually of that order. Consequently, it is essential for the efficiency of our system to maintain a small alphabet for our DFAs and also set a small upper bound (depth) on the number of states of the target system while using the Wp-method. We address both of these issues in the next section.

[Figure 3 contents: HVLearn passes each certificate template to LearnLib, where the KV algorithm produces a DFA model and the optimized Wp-method answers equivalence queries with counterexamples. Both issue hostnames as membership queries to the SSL/TLS hostname verification implementation, which answers accept/reject to match(hostname, test cert). The output is the final model for the test certificate template.]
Fig. 3. Overview of learning a hostname verification implementation using HVLearn.

IV. ARCHITECTURE OF HVLEARN

In this section, we describe the design and implementation of our system, HVLearn, based on automata learning techniques. Specifically, we describe the technical challenges that arise when we attempt to use automata learning algorithms in practice. We also summarize the optimizations that HVLearn implements to address these challenges and efficiently learn DFA models of hostname verification implementations.

A. System overview

Figure 3 presents an overview of how HVLearn is used to analyze the hostname verification functionality of an SSL/TLS library. To use HVLearn, the user provides HVLearn access to the hostname verification function that takes an X.509 certificate and a hostname as input and returns accept/reject depending on whether the provided hostname matches the identifier in the certificate. We describe how we implement this interface in Section IV-C.

Our system includes a number of certificate templates, which are certificates designed to test the SSL/TLS implementation on a number of different rules as described in Section IV-B. For each such template, HVLearn will learn a DFA model describing the set of hostnames accepted by a given implementation for the given certificate template. To produce a DFA model, HVLearn utilizes the LearnLib [59] library, which contains implementations of both the KV algorithm and the Wp-method. To avoid setting the maximum depth of the Wp-method to impractically high values, we optimize the equivalence oracle as described in Section IV-D.

Once a model is generated, our system proceeds to analyze the model as described in Section IV-E. The results of our analysis, both the inferred models and the differences between models, are then saved for reuse. Optionally, HVLearn can also utilize the inferred models for a certificate template to extract a formal specification for the corresponding certificate template as described in Section V-F.

B. Generating certificate templates

To cover all different rules and ambiguous practices in hostname verification, we created a set of 23 certificates with different identifier templates, where each certificate is designed to test a specific rule from the specification.
These certificates are selected to cover all the rules we described in Section II. For example, a certificate with the common name xn--*. tests whether the implementation allows wildcards as part of an A-label in an IDN, something that the RFC explicitly forbids. Our template certificates are self-signed X.509 v3 certificates generated using the GnuTLS library. We chose GnuTLS for certificate generation because it allows identifiers with embedded NULL characters in both the subject common name and the SAN. The template identifier to be tested is placed in the subject CN and/or the SAN (as dNSName, iPAddress, or email).

C. Performing membership queries

In order to utilize the learning algorithms in LearnLib (including the Wp-method), we implement a membership query function that performs all queries to the target system. This function accepts a string as input and returns a binary value. In our system, this function invokes the hostname verification function of the target SSL/TLS implementation. We note that, since LearnLib is written in Java while many of our tested SSL/TLS implementations are written in C/C++/Python, we use the Java Native Interface (JNI) [10] to efficiently perform membership queries to the target in such cases.

D. Automata learning parameters and optimizations

In this section, we describe the architectural decisions and optimizations that we implemented to efficiently scale the KV algorithm for testing complex real-world SSL/TLS hostname verification implementations.

Alphabet size. The first important decision we have to make to utilize the KV algorithm is to select an alphabet that will be used by the algorithm. The alphabet refers to the set of symbols that the learning algorithm will test. A straightforward approach is to use a very general set of characters, such as the set of ASCII characters. However, this would impose an unnecessary overhead on our system's performance, since the performance of both the KV algorithm and the Wp-method depends heavily on the underlying alphabet size. Our main insight is that we can reduce the alphabet to a small set of representative characters that will thoroughly test all different aspects of hostname verification. In particular, we select the set Σ = {a, 1, ., \s, A, =, *, x, n, -, \u4f60, NULL} as the input alphabet in our experiments. In this alphabet, . denotes the dot character, \s denotes the space character (ASCII value 32), NULL denotes the zero-byte character, and \u4f60 denotes the Unicode character with hexadecimal value 4F60. Note that this set of symbols is adequate for analyzing hostname verification implementations since it includes characters from all different categories, such as lowercase, uppercase, digits, Unicode, etc., as well as special characters like the NULL character. The lowercase characters x and n, in conjunction with the - character, are necessary in order to encode IDN hostnames. Finally, the inclusion of some non-alphanumeric characters such as the = character allows us to detect violations where an implementation accepts invalid hostnames. Note that, even though the hostnames generated using this alphabet will often not resolve to a real IP address when processed as DNS names, this does not affect the accuracy of our analysis in any way. This is a side effect of the fact that hostname verification routines are not responsible for resolving the provided DNS name to an IP address; they simply check whether the given hostname matches the identifier in the provided certificate.

Caching membership queries.
To avoid the communication cost of repeatedly querying the SSL/TLS implementations with the same inputs, we utilize LearnLib's DFALearningCache class to cache the results of the membership queries. The cache is checked on each new query, and a cached result is used whenever found. This optimization is particularly useful for cutting down the overhead of the repeated queries generated by the Wp-method across multiple equivalence queries.

Optimizing equivalence queries. In practice, the first model generated by the learning algorithm is usually just a single-state DFA that rejects all hostnames. The reason is that the learning algorithm is not able to generate any accepting hostname and thus cannot distinguish between the initial state and any other state in the target system. Sometimes, forcing the KV algorithm to produce an accepting hostname using the Wp-method requires a very large depth, which may cause efficiency issues in the system. However, if we supply the model with an accepting hostname, then trivial models are improved quickly without having to use excessive depth parameters in the Wp-method. Recall that the exponential term in the Wp-method depends on the difference between the number of states in the model and the provided depth. Therefore, once we discover an accepting state in the target system, the Wp-method with a much smaller depth will still be able to explore many different aspects of the hostname verification implementation. In order to generate an accepting hostname, we perform the following test during an equivalence query and before calling the Wp-method. First, we search for any wildcard characters (*) in the provided common name and replace them with random characters from our alphabet to obtain a concrete hostname. Next, we check that the generated model and the target hostname verification implementation agree on a set of hostnames generated using this method. If not, we return the hostname on which they differ as a counterexample.
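The caching and wildcard-instantiation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HVLearn's code and not the LearnLib API: `CachedOracle`, `instantiate`, `quick_counterexample`, and the stand-in target `toy_verify` are hypothetical names, the alphabet is a reduced subset, and the hypothesis model is represented as a plain callable.

```python
import random

# Reduced, illustrative alphabet (an assumption for this sketch).
ALPHABET = ["a", "1", ".", "A", "=", "x", "n", "-"]


class CachedOracle:
    """Wraps a membership query function and memoizes its answers."""

    def __init__(self, verify):
        self.verify = verify      # callable: hostname -> bool
        self.cache = {}
        self.target_calls = 0     # queries that actually reached the target

    def query(self, hostname):
        if hostname not in self.cache:
            self.target_calls += 1
            self.cache[hostname] = self.verify(hostname)
        return self.cache[hostname]


def instantiate(template, rng, max_len=3):
    """Replace every '*' in the template with random alphabet symbols."""
    out = []
    for ch in template:
        if ch == "*":
            n = rng.randint(1, max_len)
            out.append("".join(rng.choice(ALPHABET) for _ in range(n)))
        else:
            out.append(ch)
    return "".join(out)


def quick_counterexample(template, hypothesis, oracle, rng, samples=20):
    """Return a hostname on which hypothesis and target disagree, or None."""
    for _ in range(samples):
        host = instantiate(template, rng)
        if hypothesis(host) != oracle.query(host):
            return host
    return None


def toy_verify(hostname, pattern="*.aa"):
    """Hypothetical stand-in target: '*' matches one non-empty, dot-free label."""
    suffix = pattern[1:]
    if not hostname.endswith(suffix):
        return False
    label = hostname[: len(hostname) - len(suffix)]
    return label != "" and "." not in label
```

With a hypothesis that rejects everything (the typical first model), `quick_counterexample("*.aa", lambda h: False, CachedOracle(toy_verify), random.Random(0))` searches for a hostname that the stand-in target accepts, which is the kind of counterexample the learner needs to leave the trivial one-state model.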
The main advantage of this heuristic is that it allows us to quickly produce accepting hostnames that uncover new states in the target system without invoking the Wp-method with very large depth values. Once these states are uncovered and the quality of the inferred models improves, the Wp-method, with a small depth parameter, is used to discover additional states in the target system.

E. Analysis and comparison of inferred DFA models

After HVLearn outputs a model, the next task for our system is to analyze the produced model for RFC violations or, where the rules in the RFC are confusing or ambiguous, to compare different inferred models and analyze any discrepancies found between different implementations.

Analyzing a single DFA model. In the case of a single model, we would like to determine whether the model accepts invalid hostnames prohibited by the RFC specification. If the specification is unclear, our analysis can still be used to manually inspect the behavior of the implementation on the specific certificate template, in addition to the differential analysis described below. Our system offers two options for analyzing a single model. First, our system generates inputs that exercise all simple paths (i.e., paths without loops) that lead to accepting states in the inferred model. Intuitively, these inputs are a small set that describes all different flavors of hostnames that will be accepted for the given certificate template. By inspecting these hostnames, we can determine if the implementation is accepting invalid hostnames. Second, HVLearn allows the user to specify a regular expression rule to be checked against the inferred model. In this case, the user specifies a regular expression and HVLearn verifies that the regular expression and the inferred model do not share any common strings. This option makes it easy to check for certain RFC violations using simple regular expression rules. For example, consider the rule specifying that no non-alphanumeric characters should be part of a matching hostname. By specifying the regular expression rule (.)*=(.)* we can check whether the inferred model accepts any hostname that contains the = character.

Comparing unique differences between DFA models. For analyzing certain corner cases that are not specified in the RFC, testing a single model may not be enough. Instead, we compare the inferred models for different SSL/TLS implementations and find inputs under which the implementations behave differently. To perform this analysis, we utilize the difference enumeration algorithm from [33]. In a nutshell, this algorithm computes the product DFA between two or more given models and then finds all simple paths to states in which the DFAs produce different output.

F. Specification Extraction

As we discussed already, the RFC specifications leave certain aspects of hostname verification up to the implementations by not specifying the correct behavior in all cases. In these cases, imposing specific restrictions on the implementations is challenging, since we have to be careful to avoid breaking compatibility with existing implementations and valid certificates. In this section, we describe how the inferred DFA models for the different certificate templates can be used to infer a formal specification, compatible with existing implementations, for the cases where the RFC specifications are vague. Our main insight is the following: for each certificate template, we can use the DFA accepting the set of hostnames accepted by all SSL/TLS implementations as a formal specification of the corresponding rule template. The intuition behind this choice is that this specification avoids the small idiosyncrasies of each library and is thus very compact. On the other hand, if a vulnerability exists in this specification, then this vulnerability must also exist in all tested implementations. Since each implementation is audited independently, our choice gives us confidence that our specification is secure from simple vulnerabilities while maintaining backward compatibility with the tested implementations.

Computing the specification.
In order to compute the corresponding specification for each certificate template, we proceed as follows. First, we obtain DFA models for all hostname verification implementations under test using HVLearn. Next, we compute the product DFA of all the inferred models, which accepts the intersection of the regular languages of the individual DFAs. We compute the product DFA using standard automata algorithms [60]. The inferred formal specification for our set of implementations is represented by this product DFA, which can then be converted back to a regular expression to improve readability. Finally, we would like to point out that computing the intersection of k DFAs has a worst-case time complexity of O(n^k), where n is the number of states in each DFA [55]. However, in our case, the inferred DFAs are mostly similar, and thus the product construction is very efficient because intersecting two DFAs does not add a significant number of states to the resulting product DFA. We provide more evidence supporting this hypothesis in Section V.

V. EVALUATION

The main goals of our evaluation of HVLearn are to answer the following questions: (i) How effective is HVLearn at finding RFC violations in real-world hostname verification implementations? (ii) How much do our optimizations help in improving the performance of HVLearn? (iii) How does HVLearn perform compared to existing black-box or coverage-guided gray-box techniques? (iv) Can HVLearn infer backward-compatible specifications from the inferred DFAs of real-world hostname verification implementations?

A. Hostname verification test subjects

We use HVLearn to test hostname verification in six popular open-source SSL/TLS implementations, namely OpenSSL, GnuTLS, MbedTLS (PolarSSL), MatrixSSL, JSSE, and CPython SSL, as well as in two popular SSL/TLS applications: curl and HttpClient. Note that several libraries, such as older versions of OpenSSL, do not provide support for hostname verification and leave it up to the application developer to implement it.
Therefore, applications like curl/HttpClient that support different libraries are often forced to write their own implementations of hostname verification. Among the libraries that support hostname verification, some, like OpenSSL, provide separate API functions for matching each type of identifier (i.e., domain name, IP address, email, etc.) and leave it up to the application to select the appropriate one depending on the setting. In contrast, others, like MatrixSSL, combine all supported types of identifiers in one function and determine the appropriate one by inspecting the input string. Table I shows the hostname verification function/class names for all implementations that we tested and the types of identifier(s) that each of them supports. The last column shows the physical source lines of code (SLOC) for each host matching function/class as reported by the SLOCCount [14] tool. Note that the SLOC shown count only the parts of the code that perform hostname matching.

B. Finding RFC violations with HVLearn

We use HVLearn to produce DFA models for each distinct certificate template corresponding to different patterns from the RFCs. Afterward, we detect potentially buggy behavior both by performing differential testing of the output DFAs and by checking individual DFAs for violations of the regular-expression-based rules that we created manually, as described in Section IV-E. Table II presents the results of our experiments. We evaluated a diverse set of rules from four different RFCs [16], [17], [21], [24]. We found that every rule that we tested is violated by at least one implementation, while on average each implementation violates three RFC rules. Several of these violations have severe security implications (e.g., mishandling wildcard characters in international domain names, confusing IP addresses with domain names, etc.). We describe these cases along with their security implications in detail in Section VI.
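As an illustration of the regular-expression rule check from Section IV-E, the following Python sketch tests whether an inferred model shares any string with the rule (.)*=(.)*. It is a simplified sketch under assumptions: the `DFA` class, `common_string`, and the two toy models are hypothetical, the rule is hand-compiled into a two-state DFA rather than parsed from a regular expression, and the alphabet is cut down to three symbols.

```python
from collections import deque


class DFA:
    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta          # dict: (state, symbol) -> state

    def step(self, state, symbol):
        # Missing transitions lead to an implicit rejecting sink (None).
        if state is None:
            return None
        return self.delta.get((state, symbol))

    def accepts(self, word):
        state = self.start
        for sym in word:
            state = self.step(state, sym)
        return state in self.accepting


def contains_symbol_rule(symbol, alphabet):
    """DFA for (.)*<symbol>(.)*: state 1 means the symbol has been seen."""
    delta = {}
    for s in alphabet:
        delta[(0, s)] = 1 if s == symbol else 0
        delta[(1, s)] = 1
    return DFA(0, {1}, delta)


def common_string(model, rule, alphabet):
    """Shortest string accepted by both DFAs, or None if there is none."""
    start = (model.start, rule.start)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (m, r), word = queue.popleft()
        if m in model.accepting and r in rule.accepting:
            return word
        for sym in alphabet:
            nxt = (model.step(m, sym), rule.step(r, sym))
            if None in nxt or nxt in seen:
                continue            # dead in one DFA, or already explored
            seen.add(nxt)
            queue.append((nxt, word + sym))
    return None


ALPHABET = ["a", "=", "."]
RULE = contains_symbol_rule("=", ALPHABET)

# Toy "inferred models": <label>.a, where the buggy one allows '=' in the label.
BUGGY = DFA(0, {3}, {(0, "a"): 1, (0, "="): 1, (1, "a"): 1, (1, "="): 1,
                     (1, "."): 2, (2, "a"): 3})
STRICT = DFA(0, {3}, {(0, "a"): 1, (1, "a"): 1, (1, "."): 2, (2, "a"): 3})
```

A non-empty result is a concrete witness of the violation: `common_string(BUGGY, RULE, ALPHABET)` yields a hostname containing = that the toy model accepts, while the strict model yields `None`.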

TABLE I
HOSTNAME VERIFICATION FUNCTIONS (ALONG WITH THE TYPES OF SUPPORTED IDENTIFIERS) IN SSL/TLS LIBRARIES AND APPLICATIONS

SSL/TLS Libs/Apps | Version | Supported Identifier(s) | Hostname Matching Function/Class Name | Approx. SLOC
OpenSSL | | CN/DNS | X509_check_host | 314
OpenSSL | | IP | X509_check_ip | 308
OpenSSL | | IP | X509_check_ip_asc | 417
OpenSSL | | Email | X509_check_email | 314
GnuTLS | | CN/DNS/IP | gnutls_x509_crt_check_hostname, gnutls_x509_crt_check_hostname2 | 195
GnuTLS | | Email | gnutls_x509_crt_check_email | 149
MbedTLS | | CN/DNS | mbedtls_x509_crt_verify, mbedtls_x509_crt_verify_with_profile | 193
MatrixSSL | | CN/DNS/IP/Email | matrixValidateCerts | 130
JSSE | 1.8 | CN/DNS/IP | HostnameChecker | 202
CPython SSL | | CN/DNS/IP | match_hostname | 59
HttpClient | | CN/DNS/IP | DefaultHostnameVerifier | 257
curl | | CN/DNS/IP | verifyhost, Curl_verifyhost | 300

Note that the library with the most violations is JSSE (four violations), while HttpClient is the application with the most violations (five violations). OpenSSL, MbedTLS, and CPython SSL have only two violations each, and they share the violation of matching invalid hostnames. The interested reader can find an extended description of our results in the Appendix (Table VIII).

C. Comparing unique differences between DFA models

In order to evaluate the discrepancies between all different hostname verification implementations, we computed the number of differences for each pair of hostname verification implementations in our test set. Recall that, for two given DFA models, we define the number of differences as the number of simple paths in the product DFA that lead to different output being produced by the two models [33]. Table III presents the results of our experiment. For example, OpenSSL and GnuTLS have 95 discrepancies in total. This is obtained by summing up the number of unique paths that are different between the inferred DFAs for each common name in Table VIII. Note that all pairs of implementations contain a large number of unique cases under which they produce different output. As seen in Table III, each pair of tested implementations has 127 unique differences on average.
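The difference count defined above can be illustrated with a small Python sketch that walks the product of two DFAs and collects the words labelling simple paths to states where exactly one model accepts. This is a simplified reading of the definition, not the algorithm of [33]; `differing_paths` and the two toy models are hypothetical, and each DFA is represented as a (start, accepting set, transition dict) tuple.

```python
def step(delta, state, symbol):
    """Total transition function: unknown moves fall into the sink state None."""
    return delta.get((state, symbol)) if state is not None else None


def differing_paths(dfa_a, dfa_b, alphabet):
    """Words labelling simple paths in the product DFA that end in a state
    where exactly one of the two models accepts."""
    start_a, acc_a, delta_a = dfa_a
    start_b, acc_b, delta_b = dfa_b
    found = []

    def dfs(state, word, on_path):
        a, b = state
        if (a in acc_a) != (b in acc_b):
            found.append(word)
        for sym in alphabet:
            nxt = (step(delta_a, a, sym), step(delta_b, b, sym))
            if nxt == (None, None) or nxt in on_path:
                continue   # both models are dead, or the path would loop
            dfs(nxt, word + sym, on_path | {nxt})

    start = (start_a, start_b)
    dfs(start, "", {start})
    return found


# Toy models over {a, =}: A accepts only "a"; B also accepts one or more '='.
MODEL_A = (0, {1}, {(0, "a"): 1})
MODEL_B = (0, {1, 2}, {(0, "a"): 1, (0, "="): 2, (2, "="): 2})
```

Because only simple paths are followed, the self-loop on = in the second toy model contributes a single difference rather than infinitely many, so `len(differing_paths(MODEL_A, MODEL_B, ["a", "="]))` is a finite count of distinct ways in which the two models disagree.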
We note that some differences only reflect ambiguous RFC rules, while others reveal potentially invalid hostnames or RFC violation bugs. The interested reader can find a more detailed list of the unique strings that each implementation accepts in Table VIII in the Appendix. In any case, we find the fact that all implementations of such a security-critical component of the SSL/TLS protocol exhibit such a large number of discrepancies to be an alarming issue, since it signifies either a poor implementation of the specification or vagueness in the specification itself. Our analysis suggests that both cases are present in practice.

D. Comparing code coverage of HVLearn and black/gray-box fuzzing

In order to compare HVLearn's effectiveness in finding bugs with that of black/gray-box fuzzing, we investigate the following research question:

RQ.1: How does HVLearn's code coverage differ from that of black/gray-box fuzzing techniques?

We compare the code coverage of the tested hostname verification implementations achieved by HVLearn and two other techniques: black-box fuzzing and coverage-guided gray-box fuzzing. We describe our testing setup briefly below.

HVLearn: HVLearn leverages automata learning, which invokes the hostname verification matching routine with a predefined certificate template and alphabet set. HVLearn adaptively refines a DFA corresponding to the tested hostname verification implementation by querying the implementation with new hostname strings. We measure the code coverage achieved during the learning process until it finishes. We also monitor the total number of queries NQ, which comes from both the membership and the equivalence queries.

Black-box fuzzing: With the same alphabet and certificate template used by HVLearn, we randomly generate NQ strings and query the target SSL/TLS hostname verification function with the same certificate template. Note that the black-box fuzzer generates independent random strings without any sort of guidance.
TABLE II
A SUMMARY OF RFC VIOLATIONS AND DISCREPANT BEHAVIORS FOUND BY HVLEARN IN THE TESTED SSL/TLS LIBRARIES AND APPLICATIONS

Category | Rule | RFC
RFC Violations | |
Invalid hostname character | Only alphanumeric and - match in hostname | 1035
Case-insensitive hostname | Match CN in a case-insensitive manner | 5280, 6125
Wildcard | No attempt to match a wildcard not in the left-most label (CN/DNS: .*.) | 6125
IDN and wildcard | No attempt to match a wildcard fragment in an IDN (xn--*.) | 6125
Common name and subjectAltName | No CN checked when DNS is present | 6125
Common name and subjectAltName | No CN checked when any SAN ID is present | 6125
Email-based certificate | Case-sensitive on local-part of email attribute in SAN | 5280
IP address-based certificate | No attempt to match IP address with DNS (DNS: ) | 1123
Discrepancies | |
Wildcard | Attempt to match wildcard with empty label (hostname: .. with CN/DNS: *..) |
Wildcard | Attempt to match wildcard in public suffix (CN/DNS: *.co.uk) | 6125
Embedded NULL character | Allowed NULL character in CN |
Embedded NULL character | Allowed NULL character in SAN |
Embedded NULL character | Match NULL character (hostname: b.b\0.., CN/DNS: b.b\0..) |
Other invalid hostname | Partially match suffix (hostname: . with CN/DNS: ., ..) | 1035
Other invalid hostname | Match trailing (hostname: . with CN/DNS: .) |

Implementation columns: OpenSSL, GnuTLS, MbedTLS, MatrixSSL, JSSE, CPython SSL, curl, HttpClient, HttpClient* (HttpClient*: HttpClient with PublicSuffixMatcher). For RFC violations, each cell marks OK, RFC violation, or not supported by the library/application; for discrepancies, each cell marks accept or reject. The per-implementation marks are not preserved in this transcription.

TABLE III
NUMBER OF UNIQUE DIFFERENCES BETWEEN AUTOMATA INFERRED FROM DIFFERENT SSL/TLS IMPLEMENTATIONS

Rows and columns: OpenSSL, GnuTLS, MbedTLS, MatrixSSL, JSSE, CPython, HttpClient, Curl. Of the cell values, only 414 (adjacent to the HttpClient and Curl labels) is preserved in this transcription.

[Fig. 4. Comparison of code coverage achieved by HVLearn, gray-box fuzzing, and black-box fuzzing for OpenSSL hostname verification. Axes: number of queries vs. % of line coverage. Series: HVLearn, coverage-guided gray-box fuzzing, black-box fuzzing.]

Coverage-guided gray-box fuzzing: Unlike black-box fuzzing, coverage-guided gray-box fuzzing tries to generate more interesting inputs by applying evolutionary techniques to the input generation process. In each generation, a new batch of inputs is generated from the previous generation through mutation/cross-over, and only the inputs that increase code coverage are kept for further changes. Coverage-guided gray-box fuzzing is a popular technique for finding bugs in large real-world programs [6], [11]. To make a fair comparison with HVLearn, we implemented our own coverage-guided gray-box fuzzer, as existing tools like AFL do not provide an easy way of restricting the mutation outputs to a given alphabet. With the same alphabet set, we initialize the fuzzer with a set of strings of varying lengths as the seeds, maintained in a queue Q. The seeds are then used by the fuzzer to query the target hostname verification implementation. After it finishes querying with the seeds, the fuzzer gets the string S = dequeue(Q). It randomly mutates one character within S and obtains S'. Then it uses the mutated S' to query the target. If the mutated string S' increases code coverage, we store it in the queue for further mutation, i.e., enqueue(S', Q). Otherwise, we throw it away. The fuzzer is thus guided to always mutate the strings that have better code coverage. The fuzzer iteratively performs these enqueue/dequeue operations for NQ rounds, and we obtain the final code coverage COV_randmu of the functions in each SSL/TLS implementation. Note that we keep the test certificate template fixed during the entire test. We use the percentage of lines executed, as extracted by Gcov [51], as the indicator of code coverage.
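The queue-based gray-box fuzzing loop described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' fuzzer: `toy_target` is a hypothetical instrumented function that returns the set of "lines" it executed (standing in for Gcov data), the alphabet is a reduced subset, and the sketch re-enqueues S after each mutation so that the queue never runs empty, a detail the description leaves open.

```python
import random
from collections import deque

# Reduced, illustrative alphabet (an assumption for this sketch).
ALPHABET = ["a", "1", ".", "A", "=", "*", "x", "n", "-"]


def toy_target(hostname):
    """Hypothetical instrumented target: returns the set of 'lines' executed."""
    covered = {"entry"}
    if "." in hostname:
        covered.add("has_dot")
        if hostname.startswith("xn--"):
            covered.add("idn_label")
    if "=" in hostname:
        covered.add("invalid_char")
    return covered


def mutate(s, rng):
    """Replace one randomly chosen character with a random alphabet symbol."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def graybox_fuzz(seeds, rounds, rng, target=toy_target):
    """Run the mutation loop for `rounds` queries; seeds must be non-empty strings."""
    queue = deque(seeds)
    total = set()
    for s in seeds:                      # query the target with every seed first
        total |= target(s)
    history = [len(total)]               # coverage after seeds, then per round
    for _ in range(rounds):
        s = queue.popleft()              # S = dequeue(Q)
        s_mut = mutate(s, rng)           # S' differs from S in at most one position
        new = target(s_mut) - total
        if new:                          # keep S' only if it increased coverage
            total |= new
            queue.append(s_mut)
        queue.append(s)                  # assumption: S stays available for later rounds
        history.append(len(total))
    return total, history
```

For example, `graybox_fuzz(["aa.a", "x1"], 200, random.Random(1))` returns the covered set and a non-decreasing coverage history with one entry per query; a black-box baseline would replace `mutate` with independently sampled random strings.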
Considering that hostname verification is a small part of an SSL/TLS implementation, we do not compute the percentage of lines covered with respect to the total number of lines. Instead, we calculate the percentage of line coverage within each function and only take into account the functions that are related to hostname verification.

Result 1: HVLearn achieves an 11.21% increase in code coverage on average compared to the black/gray-box fuzzing techniques.

Therefore, let LE(f) be the number of executed lines of function f in the SI and L(f) be the total number of lines of f. The code coverage is then defined in terms of LE(f) and L(f).


More information

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis

CS143 Handout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexical Analysis CS143 Hndout 07 Summer 2011 June 24 th, 2011 Written Set 1: Lexicl Anlysis In this first written ssignment, you'll get the chnce to ply round with the vrious constructions tht come up when doing lexicl

More information

Assignment 4. Due 09/18/17

Assignment 4. Due 09/18/17 Assignment 4. ue 09/18/17 1. ). Write regulr expressions tht define the strings recognized by the following finite utomt: b d b b b c c b) Write FA tht recognizes the tokens defined by the following regulr

More information

CS201 Discussion 10 DRAWTREE + TRIES

CS201 Discussion 10 DRAWTREE + TRIES CS201 Discussion 10 DRAWTREE + TRIES DrwTree First instinct: recursion As very generic structure, we could tckle this problem s follows: drw(): Find the root drw(root) drw(root): Write the line for the

More information

Preserving Constraints for Aggregation Relationship Type Update in XML Document

Preserving Constraints for Aggregation Relationship Type Update in XML Document Preserving Constrints for Aggregtion Reltionship Type Updte in XML Document Eric Prdede 1, J. Wenny Rhyu 1, nd Dvid Tnir 2 1 Deprtment of Computer Science nd Computer Engineering, L Trobe University, Bundoor

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Epson Projector Content Manager Operation Guide

Epson Projector Content Manager Operation Guide Epson Projector Content Mnger Opertion Guide Contents 2 Introduction to the Epson Projector Content Mnger Softwre 3 Epson Projector Content Mnger Fetures... 4 Setting Up the Softwre for the First Time

More information

Lecture T4: Pattern Matching

Lecture T4: Pattern Matching Introduction to Theoreticl CS Lecture T4: Pttern Mtching Two fundmentl questions. Wht cn computer do? How fst cn it do it? Generl pproch. Don t tlk bout specific mchines or problems. Consider miniml bstrct

More information

ECE 468/573 Midterm 1 September 28, 2012

ECE 468/573 Midterm 1 September 28, 2012 ECE 468/573 Midterm 1 September 28, 2012 Nme:! Purdue emil:! Plese sign the following: I ffirm tht the nswers given on this test re mine nd mine lone. I did not receive help from ny person or mteril (other

More information

Digital Design. Chapter 6: Optimizations and Tradeoffs

Digital Design. Chapter 6: Optimizations and Tradeoffs Digitl Design Chpter 6: Optimiztions nd Trdeoffs Slides to ccompny the tetbook Digitl Design, with RTL Design, VHDL, nd Verilog, 2nd Edition, by Frnk Vhid, John Wiley nd Sons Publishers, 2. http://www.ddvhid.com

More information

Transparent neutral-element elimination in MPI reduction operations

Transparent neutral-element elimination in MPI reduction operations Trnsprent neutrl-element elimintion in MPI reduction opertions Jesper Lrsson Träff Deprtment of Scientific Computing University of Vienn Disclimer Exploiting repetition nd sprsity in input for reducing

More information

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course

L. Yaroslavsky. Fundamentals of Digital Image Processing. Course L. Yroslvsky. Fundmentls of Digitl Imge Processing. Course 0555.330 Lecture. Imge enhncement.. Imge enhncement s n imge processing tsk. Clssifiction of imge enhncement methods Imge enhncement is processing

More information

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona CSc 453 Compilers nd Systems Softwre 4 : Lexicl Anlysis II Deprtment of Computer Science University of Arizon collerg@gmil.com Copyright c 2009 Christin Collerg Implementing Automt NFAs nd DFAs cn e hrd-coded

More information

COMBINATORIAL PATTERN MATCHING

COMBINATORIAL PATTERN MATCHING COMBINATORIAL PATTERN MATCHING Genomic Repets Exmple of repets: ATGGTCTAGGTCCTAGTGGTC Motivtion to find them: Genomic rerrngements re often ssocited with repets Trce evolutionry secrets Mny tumors re chrcterized

More information

Scanner Termination. Multi Character Lookahead

Scanner Termination. Multi Character Lookahead If d.doublevlue() represents vlid integer, (int) d.doublevlue() will crete the pproprite integer vlue. If string representtion of n integer begins with ~ we cn strip the ~, convert to double nd then negte

More information

12-B FRACTIONS AND DECIMALS

12-B FRACTIONS AND DECIMALS -B Frctions nd Decimls. () If ll four integers were negtive, their product would be positive, nd so could not equl one of them. If ll four integers were positive, their product would be much greter thn

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Cross-Supervised Synthesis of Web-Crawlers

Cross-Supervised Synthesis of Web-Crawlers Cross-Supervised Synthesis of Web-Crwlers Adi Omri Technion omri@cs.technion.c.il Shron Shohm Acdemic College of Tel Aviv Yffo shron.shohm@gmil.com Ern Yhv Technion yhve@cs.technion.c.il ABSTRACT A web-crwler

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd business. Introducing technology

More information

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1):

Before We Begin. Introduction to Spatial Domain Filtering. Introduction to Digital Image Processing. Overview (1): Administrative Details (1): Overview (): Before We Begin Administrtive detils Review some questions to consider Winter 2006 Imge Enhncement in the Sptil Domin: Bsics of Sptil Filtering, Smoothing Sptil Filters, Order Sttistics Filters

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Midterm I Solutions CS164, Spring 2006

Midterm I Solutions CS164, Spring 2006 Midterm I Solutions CS164, Spring 2006 Februry 23, 2006 Plese red ll instructions (including these) crefully. Write your nme, login, SID, nd circle the section time. There re 8 pges in this exm nd 4 questions,

More information

What are suffix trees?

What are suffix trees? Suffix Trees 1 Wht re suffix trees? Allow lgorithm designers to store very lrge mount of informtion out strings while still keeping within liner spce Allow users to serch for new strings in the originl

More information

Pointwise convergence need not behave well with respect to standard properties such as continuity.

Pointwise convergence need not behave well with respect to standard properties such as continuity. Chpter 3 Uniform Convergence Lecture 9 Sequences of functions re of gret importnce in mny res of pure nd pplied mthemtics, nd their properties cn often be studied in the context of metric spces, s in Exmples

More information

ISG: Itemset based Subgraph Mining

ISG: Itemset based Subgraph Mining ISG: Itemset bsed Subgrph Mining by Lini Thoms, Stynryn R Vlluri, Kmlkr Krlplem Report No: IIIT/TR/2009/179 Centre for Dt Engineering Interntionl Institute of Informtion Technology Hyderbd - 500 032, INDIA

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

Automata Processor. Tobias Markus Computer Architecture Group, University of Heidelberg

Automata Processor. Tobias Markus Computer Architecture Group, University of Heidelberg 1 Automt Processor Tobis Mrkus Computer Architecture Group, University of Heidelberg Abstrct This pper gives brief overview over nondeterministic utomt nd the Automt Processor n rchitecture implemented

More information

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation

Representation of Numbers. Number Representation. Representation of Numbers. 32-bit Unsigned Integers 3/24/2014. Fixed point Integer Representation Representtion of Numbers Number Representtion Computer represent ll numbers, other thn integers nd some frctions with imprecision. Numbers re stored in some pproximtion which cn be represented by fixed

More information

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li

Complete Coverage Path Planning of Mobile Robot Based on Dynamic Programming Algorithm Peng Zhou, Zhong-min Wang, Zhen-nan Li, Yang Li 2nd Interntionl Conference on Electronic & Mechnicl Engineering nd Informtion Technology (EMEIT-212) Complete Coverge Pth Plnning of Mobile Robot Bsed on Dynmic Progrmming Algorithm Peng Zhou, Zhong-min

More information

Topic: Software Model Checking via Counter-Example Guided Abstraction Refinement. Having a BLAST with SLAM. Combining Strengths. SLAM Overview SLAM

Topic: Software Model Checking via Counter-Example Guided Abstraction Refinement. Having a BLAST with SLAM. Combining Strengths. SLAM Overview SLAM Hving BLAST with SLAM Topic: Softwre Model Checking vi Counter-Exmple Guided Abstrction Refinement There re esily two dozen SLAM/BLAST/MAGIC ppers; I will skim. # # Theorem Proving Combining Strengths

More information

CS 430 Spring Mike Lam, Professor. Parsing

CS 430 Spring Mike Lam, Professor. Parsing CS 430 Spring 2015 Mike Lm, Professor Prsing Syntx Anlysis We cn now formlly descrie lnguge's syntx Using regulr expressions nd BNF grmmrs How does tht help us? Syntx Anlysis We cn now formlly descrie

More information

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup

Regular Expression Matching with Multi-Strings and Intervals. Philip Bille Mikkel Thorup Regulr Expression Mtching with Multi-Strings nd Intervls Philip Bille Mikkel Thorup Outline Definition Applictions Previous work Two new problems: Multi-strings nd chrcter clss intervls Algorithms Thompson

More information

Chapter 2 Sensitivity Analysis: Differential Calculus of Models

Chapter 2 Sensitivity Analysis: Differential Calculus of Models Chpter 2 Sensitivity Anlysis: Differentil Clculus of Models Abstrct Models in remote sensing nd in science nd engineering, in generl re, essentilly, functions of discrete model input prmeters, nd/or functionls

More information

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization

An Efficient Divide and Conquer Algorithm for Exact Hazard Free Logic Minimization An Efficient Divide nd Conquer Algorithm for Exct Hzrd Free Logic Minimiztion J.W.J.M. Rutten, M.R.C.M. Berkelr, C.A.J. vn Eijk, M.A.J. Kolsteren Eindhoven University of Technology Informtion nd Communiction

More information

Algorithm Design (5) Text Search

Algorithm Design (5) Text Search Algorithm Design (5) Text Serch Tkshi Chikym School of Engineering The University of Tokyo Text Serch Find sustring tht mtches the given key string in text dt of lrge mount Key string: chr x[m] Text Dt:

More information

Text mining: bag of words representation and beyond it

Text mining: bag of words representation and beyond it Text mining: bg of words representtion nd beyond it Jsmink Dobš Fculty of Orgniztion nd Informtics University of Zgreb 1 Outline Definition of text mining Vector spce model or Bg of words representtion

More information

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών ΕΠΛ323 - Θωρία και Πρακτική Μταγλωττιστών Lecture 3 Lexicl Anlysis Elis Athnsopoulos elisthn@cs.ucy.c.cy Recognition of Tokens if expressions nd reltionl opertors if è if then è then else è else relop

More information

Spring 2018 Midterm Exam 1 March 1, You may not use any books, notes, or electronic devices during this exam.

Spring 2018 Midterm Exam 1 March 1, You may not use any books, notes, or electronic devices during this exam. 15-112 Spring 2018 Midterm Exm 1 Mrch 1, 2018 Nme: Andrew ID: Recittion Section: You my not use ny books, notes, or electronic devices during this exm. You my not sk questions bout the exm except for lnguge

More information

Section 3.1: Sequences and Series

Section 3.1: Sequences and Series Section.: Sequences d Series Sequences Let s strt out with the definition of sequence: sequence: ordered list of numbers, often with definite pttern Recll tht in set, order doesn t mtter so this is one

More information

Lexical Analysis: Constructing a Scanner from Regular Expressions

Lexical Analysis: Constructing a Scanner from Regular Expressions Lexicl Anlysis: Constructing Scnner from Regulr Expressions Gol Show how to construct FA to recognize ny RE This Lecture Convert RE to n nondeterministic finite utomton (NFA) Use Thompson s construction

More information

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have

P(r)dr = probability of generating a random number in the interval dr near r. For this probability idea to make sense we must have Rndom Numers nd Monte Crlo Methods Rndom Numer Methods The integrtion methods discussed so fr ll re sed upon mking polynomil pproximtions to the integrnd. Another clss of numericl methods relies upon using

More information

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries

Tries. Yufei Tao KAIST. April 9, Y. Tao, April 9, 2013 Tries Tries Yufei To KAIST April 9, 2013 Y. To, April 9, 2013 Tries In this lecture, we will discuss the following exct mtching prolem on strings. Prolem Let S e set of strings, ech of which hs unique integer

More information

Computer-Aided Multiscale Modelling for Chemical Process Engineering

Computer-Aided Multiscale Modelling for Chemical Process Engineering 17 th Europen Symposium on Computer Aided Process Engineesing ESCAPE17 V. Plesu nd P.S. Agchi (Editors) 2007 Elsevier B.V. All rights reserved. 1 Computer-Aided Multiscle Modelling for Chemicl Process

More information

Introduction to Computer Engineering EECS 203 dickrp/eecs203/ CMOS transmission gate (TG) TG example

Introduction to Computer Engineering EECS 203  dickrp/eecs203/ CMOS transmission gate (TG) TG example Introduction to Computer Engineering EECS 23 http://ziyng.eecs.northwestern.edu/ dickrp/eecs23/ CMOS trnsmission gte TG Instructor: Robert Dick Office: L477 Tech Emil: dickrp@northwestern.edu Phone: 847

More information

9 Graph Cutting Procedures

9 Graph Cutting Procedures 9 Grph Cutting Procedures Lst clss we begn looking t how to embed rbitrry metrics into distributions of trees, nd proved the following theorem due to Brtl (1996): Theorem 9.1 (Brtl (1996)) Given metric

More information

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork

MA1008. Calculus and Linear Algebra for Engineers. Course Notes for Section B. Stephen Wills. Department of Mathematics. University College Cork MA1008 Clculus nd Liner Algebr for Engineers Course Notes for Section B Stephen Wills Deprtment of Mthemtics University College Cork s.wills@ucc.ie http://euclid.ucc.ie/pges/stff/wills/teching/m1008/ma1008.html

More information

On Computation and Resource Management in Networked Embedded Systems

On Computation and Resource Management in Networked Embedded Systems On Computtion nd Resource Mngement in Networed Embedded Systems Soheil Ghisi Krlene Nguyen Elheh Bozorgzdeh Mjid Srrfzdeh Computer Science Deprtment University of Cliforni, Los Angeles, CA 90095 soheil,

More information

Character-Stroke Detection for Text-Localization and Extraction

Character-Stroke Detection for Text-Localization and Extraction Chrcter-Stroke Detection for Text-Locliztion nd Extrction Krishn Subrmnin ksubrm@bbn.com Prem Ntrjn pntrj@bbn.com Michel Decerbo mdecerbo@bbn.com Dvid Cstñòn Boston University dc@bu.edu Abstrct In this

More information

EasyMP Multi PC Projection Operation Guide

EasyMP Multi PC Projection Operation Guide EsyMP Multi PC Projection Opertion Guide Contents 2 Introduction to EsyMP Multi PC Projection 5 EsyMP Multi PC Projection Fetures... 6 Connection to Vrious Devices... 6 Four-Pnel Disply... 6 Chnge Presenters

More information

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lexical Analysis. Amitabha Sanyal. (www.cse.iitb.ac.in/ as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay Lexicl Anlysis Amith Snyl (www.cse.iit.c.in/ s) Deprtment of Computer Science nd Engineering, Indin Institute of Technology, Bomy Septemer 27 College of Engineering, Pune Lexicl Anlysis: 2/6 Recp The input

More information

INTRODUCTION TO SIMPLICIAL COMPLEXES

INTRODUCTION TO SIMPLICIAL COMPLEXES INTRODUCTION TO SIMPLICIAL COMPLEXES CASEY KELLEHER AND ALESSANDRA PANTANO 0.1. Introduction. In this ctivity set we re going to introduce notion from Algebric Topology clled simplicil homology. The min

More information

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona

Implementing Automata. CSc 453. Compilers and Systems Software. 4 : Lexical Analysis II. Department of Computer Science University of Arizona Implementing utomt Sc 5 ompilers nd Systems Softwre : Lexicl nlysis II Deprtment of omputer Science University of rizon collerg@gmil.com opyright c 009 hristin ollerg NFs nd DFs cn e hrd-coded using this

More information

Small Business Networking

Small Business Networking Why network is n essentil productivity tool for ny smll business Effective technology is essentil for smll businesses looking to increse the productivity of their people nd processes. Introducing technology

More information

COS 333: Advanced Programming Techniques

COS 333: Advanced Programming Techniques COS 333: Advnced Progrmming Techniques Brin Kernighn wk@cs, www.cs.princeton.edu/~wk 311 CS Building 609-258-2089 (ut emil is lwys etter) TA's: Junwen Li, li@cs, CS 217,258-0451 Yong Wng,yongwng@cs, CS

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 3: Lexer genertors Viktor Leijon Slides lrgely y John Nordlnder with mteril generously provided y Mrk P. Jones. 1 Recp: Hndwritten Lexers: Don t require sophisticted

More information

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis

On the Detection of Step Edges in Algorithms Based on Gradient Vector Analysis On the Detection of Step Edges in Algorithms Bsed on Grdient Vector Anlysis A. Lrr6, E. Montseny Computer Engineering Dept. Universitt Rovir i Virgili Crreter de Slou sin 43006 Trrgon, Spin Emil: lrre@etse.urv.es

More information

Digital Design. Chapter 1: Introduction. Digital Design. Copyright 2006 Frank Vahid

Digital Design. Chapter 1: Introduction. Digital Design. Copyright 2006 Frank Vahid Chpter : Introduction Copyright 6 Why Study?. Look under the hood of computers Solid understnding --> confidence, insight, even better progrmmer when wre of hrdwre resource issues Electronic devices becoming

More information