HVLearn: Automated Black-box Analysis of Hostname Verification in SSL/TLS Implementations



2017 IEEE Symposium on Security and Privacy

Suphannee Sivakorn, George Argyros, Kexin Pei, Angelos D. Keromytis, and Suman Jana
Department of Computer Science, Columbia University, New York, USA
{suphannee, argyros, kpei, angelos,

Abstract

SSL/TLS is the most commonly deployed family of protocols for securing network communications. The security guarantees of SSL/TLS are critically dependent on the correct validation of the X.509 server certificates presented during the handshake stage of the SSL/TLS protocol. Hostname verification is a critical component of the certificate validation process that verifies the remote server's identity by checking if the hostname of the server matches any of the names present in the X.509 certificate. Hostname verification is a highly complex process due to the presence of numerous features and corner cases such as wildcards, IP addresses, international domain names, and so forth. Therefore, testing hostname verification implementations presents a challenging task. In this paper, we present HVLearn, a novel black-box testing framework for analyzing SSL/TLS hostname verification implementations, which is based on automata learning algorithms. HVLearn utilizes a number of certificate templates, i.e., certificates with a common name (CN) set to a specific pattern, in order to test different rules from the corresponding specification. For each certificate template, HVLearn uses automata learning algorithms to infer a Deterministic Finite Automaton (DFA) that describes the set of all hostnames that match the CN of a given certificate. Once a model is inferred for a certificate template, HVLearn checks the model for bugs by finding discrepancies with the inferred models from other implementations or by checking against regular-expression-based rules derived from the specification. The key insight behind our approach is that the acceptable hostnames for a given certificate template form a regular language.
Therefore, we can leverage automata learning techniques to efficiently infer DFA models that accept the corresponding regular language. We use HVLearn to analyze the hostname verification implementations in a number of popular SSL/TLS libraries and applications written in a diverse set of languages like C, Python, and Java. We demonstrate that HVLearn can achieve on average 11.21% higher code coverage than existing black/gray-box fuzzing techniques. By comparing the DFA models inferred by HVLearn, we found 8 unique violations of the RFC specifications in the tested hostname verification implementations. Several of these violations are critical and can render the affected implementations vulnerable to active man-in-the-middle attacks.

I. INTRODUCTION

The SSL/TLS family of protocols is the most commonly used mechanism for protecting the security and privacy of network communications from man-in-the-middle attacks. The security guarantees of SSL/TLS protocols are critically dependent on correct validation of X.509 digital certificates presented by the servers during the SSL/TLS handshake phase. The certificate validation, in turn, depends on hostname verification for verifying that the hostname (i.e., fully qualified domain name, IP address, and so forth) of the server matches one of the identifiers in the SubjectAltName extension or the Common Name (CN) attribute of the presented leaf certificate. Therefore, any mistake in the implementation of hostname verification could completely undermine the security and privacy guarantees of SSL/TLS.

Hostname verification is a complex process due to the presence of numerous special cases (e.g., wildcards, IP addresses, international domain names, etc.). For example, a wildcard character ('*') is only allowed in the left-most part (separated by '.') of a hostname. To get a sense of the complexities involved in the hostname verification process, consider the fact that different parts of its specifications are described in five different RFCs [18], [20], [21], [24], [25].
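As a rough illustration of the wildcard rule mentioned above, the following Python sketch accepts a hostname only when a single '*' appears in the left-most label of the identifier and stands for part or all of exactly one label. The function name `leftmost_wildcard_match` is hypothetical, and the choices to reject multiple wildcards and to compare case-insensitively are assumptions made for this sketch; it is not the logic of any tested implementation.

```python
def leftmost_wildcard_match(identifier: str, hostname: str) -> bool:
    """Illustrative matcher: one '*' allowed, and only in the left-most label."""
    id_labels = identifier.lower().split(".")
    host_labels = hostname.lower().split(".")

    # A wildcard in any label other than the left-most one is rejected outright.
    if any("*" in label for label in id_labels[1:]):
        return False
    # The wildcard stands for (part of) one label, so label counts must agree.
    if len(id_labels) != len(host_labels):
        return False
    # Every label to the right of the left-most one must match exactly.
    if id_labels[1:] != host_labels[1:]:
        return False

    pattern, label = id_labels[0], host_labels[0]
    if not label:
        return False  # an empty left-most label never matches
    if "*" not in pattern:
        return pattern == label
    if pattern.count("*") > 1:
        return False  # assumption: multiple wildcards are not supported here

    prefix, suffix = pattern.split("*")
    return (
        len(label) >= len(prefix) + len(suffix)
        and label.startswith(prefix)
        and label.endswith(suffix)
    )
```

Under these assumptions, `leftmost_wildcard_match("*.example.com", "foo.example.com")` holds, while the same identifier does not match `bar.foo.example.com` or `example.com`.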
Given the complexity and security-critical nature of the hostname verification process, it is crucial to perform automated analysis of the implementations for finding any deviation from the specification. However, despite the critical nature of the hostname verification process, none of the prior research projects dealing with adversarial testing of SSL/TLS certificate validation [36], [38], [45], [50] support detailed automated testing of hostname verification implementations. The prior projects either completely ignore testing of the hostname verification process or simply check whether the hostname verification process is enabled or not. Therefore, they cannot detect any subtle bugs where the hostname verification implementations are enabled but deviate subtly from the specifications.

The key problem behind automated adversarial testing of hostname verification implementations is that the inputs (i.e., hostnames and certificate identifiers like common names) are highly structured, sparse strings, which makes it very hard for existing black/gray-box fuzz testing techniques to achieve high test coverage or generate inputs triggering the corner cases. Heavily language/platform-dependent white-box testing techniques are also hard to apply for testing hostname verification implementations due to the language/platform diversity of SSL/TLS implementations.

In this paper, we design, implement, and evaluate HVLearn, a black-box differential testing framework based on automata learning, which can automatically infer Deterministic Finite Automata (DFA) models of the hostname verification implementations. The key insight behind HVLearn is that hostname verification, even though very complex, conceptually closely resembles the regular expression matching process in many ways (e.g., wildcards). This insight on the structure of the certificate identifier format suggests that the acceptable hostnames for a given certificate identifier, as suggested by the specifications, form a regular language. Therefore, we can use black-box automata learning techniques to efficiently infer Deterministic Finite Automata (DFA) models that accept the regular language corresponding to a given hostname verification implementation. Prior results by Angluin et al. have shown that DFAs can be learned efficiently through black-box queries in polynomial time over the number of states [31]. The DFA models inferred by HVLearn can be used to efficiently perform two main tasks that existing testing techniques cannot do well: (i) finding and enumerating unique differences between multiple different implementations; and (ii) extracting a formal, backward-compatible reference specification for the hostname verification process by computing the intersection DFA of the inferred DFA models from different implementations.

2017, Suphannee Sivakorn. Under license to IEEE. DOI /SP

We apply HVLearn to analyze a number of popular SSL/TLS libraries such as OpenSSL, GnuTLS, MbedTLS, MatrixSSL, CPython SSL and applications such as Java HttpClient and curl written in diverse languages like C, Python, and Java. We found 8 distinct specification violations like the incorrect handling of wildcards in internationalized domain names, confusing domain names with IP addresses, incorrect handling of NULL characters, and so forth. Several of these violations allow network attackers to completely break the security guarantees of the SSL/TLS protocol by allowing the attackers to read/modify any data transmitted over the SSL/TLS connections set up using the affected implementations. HVLearn also found 121 unique differences, on average, between any pair of tested applications/libraries.

The major contributions of this paper are as follows.

To the best of our knowledge, HVLearn is the first testing tool that can learn DFA models for implementations of hostname verification, a critical part of SSL/TLS implementations.
The inferred DFA models can be used for efficient differential testing or extracting a formal reference specification compatible with multiple existing implementations.

We design and implement several domain-specific optimizations like equivalence query design, alphabet selection, etc. in HVLearn for efficiently learning DFA models from hostname verification implementations.

We evaluate HVLearn on 6 popular libraries and 2 applications. HVLearn achieved significantly higher (11.21% more on average) code coverage than existing black/gray-box fuzzing techniques and found 8 unique previously unknown RFC violations as shown in Table II, several of which render the affected SSL/TLS implementations completely insecure to man-in-the-middle attacks.

The remainder of this paper is organized as follows: Section II presents the descriptions of the SSL/TLS hostname verification process. We discuss the challenges in testing hostname verification and our testing methodology in Section III. Section IV describes the design and implementation details of HVLearn. We present the evaluation results for using HVLearn to test SSL/TLS implementations in Section V. Section VI presents a detailed case study of several security-critical bugs that HVLearn found. Section VII discusses the related work and Section VIII concludes the paper. For the detailed developer responses on the bugs found by HVLearn, we refer interested readers to Appendix X-B.

II. OVERVIEW OF HOSTNAME VERIFICATION

As part of the hostname verification process, the SSL/TLS client must check that the host name of the server matches either the common name attribute in the certificate or one of the names in the subjectAltName extension in the certificate [21]. Note that even though the process is called hostname verification, it also supports verification of IP addresses or email addresses. In this section, we first provide a brief summary of the hostname format and the specifications that describe the format of the common name attribute and the subjectAltName extension in an X.509 certificate.
Figure 1 provides a high-level summary of the relevant parts of an X.509 certificate. Next, we describe different parts of the hostname verification process (e.g., domain name restrictions, wildcard characters, and so forth) in detail.

Fig. 1. Fields in an X.509 certificate that are used for hostname verification. (Figure content: the Subject field CN=, of type X520CommonName with arbitrary format; and the X509v3 Subject Alternative Name extension with the entries DNS (type IA5String, format dNSName), IP Address (format iPAddress), and email (type IA5String, format rfc822Name).)

A. Hostname verification inputs

Hostname format. Hostnames are usually either a fully qualified domain name or a single string without any '.' characters. Several SSL/TLS implementations (e.g., OpenSSL) also support IP addresses and email addresses to be passed as the hostname to the corresponding hostname verification implementation. A domain name consists of multiple labels, each separated by a '.' character. The domain name labels can only contain letters a-z or A-Z (in a case-insensitive manner), digits 0-9 and the hyphen character '-' [16]. Each label can be up to 63 characters long. The total length of a domain name can be up to 255 characters. Earlier specifications required that the labels must begin with letters [21]. However, subsequent revisions have allowed labels that begin with digits [17].

Common names in X.509 certificates. The Common Name (CN) is an attribute of the subject distinguished name field in an X.509 certificate. The common name in a server certificate is used for validating the hostname of the server as part of the certificate verification process. A common name usually contains a fully qualified domain name, but it can also contain a string with arbitrary ASCII and UTF-8 characters describing a service (e.g., CN=Sample Service). The only restriction on the common name string is that it should follow the X520CommonName standard (e.g., should not repeat the substring CN=) [21]. Note that this is different from the hostname specifications that are very strictly defined and only allow certain characters and digits as described above.

SubjectAltName in X.509 certificates. Subject alternative name (subjectAltName) is an X.509 extension that can be used to store different types of identity information like fully qualified domain names, IP addresses, URI strings, email addresses, and so forth. Each of these types has different restrictions on allowed formats. For example, dNSName (DNS) and uniformResourceIdentifier (URI) must be valid IA5String strings, a subset of ASCII strings [21]. We refer interested readers to the relevant section of RFC 5280 for further reading.

B. Hostname verification rules

Matching order. RFC 6125 recommends that SSL/TLS implementations use subjectAltName extensions, if present in a certificate, over common names, as the common name is not strongly tied to an identity and can be an arbitrary string as mentioned earlier [24]. If multiple identifiers are present in a subjectAltName, the SSL/TLS implementations should try to match DNS, SRV, URI, or any other identifier type supported by the implementation and must not match the hostname against the common name of the certificate [24]. The Certificate Authorities (CAs) are also supposed to use the dNSName instead of the common name for storing the identity information while issuing certificates [18].

Wildcard in common name/subjectAltName. If a server certificate contains a wildcard character '*', an SSL/TLS implementation should match the hostname against it using the rules described in RFC 6125 [24].
We provide a summary of the rules below.

A wildcard character is only allowed in the left-most label. If the presented identifier contains a wildcard character in any label other than the left-most label, the SSL/TLS implementations should reject the certificate.

A wildcard character is allowed to be present anywhere in the left-most label, i.e., a wildcard does not have to be the only character in the left-most label. For example, identifiers like bar*.example.com, *bar.example.com, or f*bar.example.com are valid.

While matching hostnames against the identifiers present in a certificate, a wildcard character in an identifier should only apply to one sub-domain, and an SSL/TLS implementation should not compare against anything but the left-most label of the hostname (e.g., *.example.com should match foo.example.com but not bar.foo.example.com or example.com).

Several special cases involving wildcards are allowed in RFC 6125 only for backward compatibility of existing SSL/TLS implementations, as they tend to differ from the specifications in these cases. RFC 6125 clearly notes that these cases often lead to overly complex hostname verification code and might lead to potentially exploitable vulnerabilities. Therefore, new SSL/TLS implementations are discouraged from supporting such cases. We summarize some of them: (i) a wildcard is all or part of a label that identifies a public suffix (e.g., *.com and *.info), (ii) multiple wildcards are present in a label (e.g., f*b*r.example.com), and (iii) wildcards are included as all or part of multiple labels (e.g., *.*.example.com).

International domain name (IDN). IDNs can contain characters from a language-specific alphabet like Arabic or Chinese. An IDN is encoded as a string of Unicode characters. A domain name label is categorized as a U-label if it contains at least one non-ASCII character (e.g., UTF-8). RFC 6125 specifies that any U-labels in IDNs must be converted to A-labels before performing hostname verification [24].
U-label strings are converted to A-labels, an ASCII-compatible encoding, by adding the prefix xn-- and appending the output of a Punycode transformation applied to the corresponding U-label string as described in RFC 3492 [19]. Both U-labels and A-labels still must satisfy the standard length bound on domain names (i.e., up to 255 bytes).

IDN in subjectAltName. As indicated in RFC 5280, any IDN in an X.509 subjectAltName extension must be defined as type IA5String, which is limited to a subset of ASCII characters [21]. Any U-label in an IDN must be converted to an A-label before adding it to the subjectAltName. Email addresses involving IDNs must also be converted to A-labels beforehand.

IDNs in common name. Unlike IDNs in subjectAltName, IDNs in common names are allowed to contain PrintableString (A-Z, a-z, 0-9, special characters = ( ) + , - . / : ?, and space) as well as UTF-8 characters [21].

Wildcard and IDN. There is no specification defining how a wildcard character may be embedded within A-labels or U-labels of an IDN [23]. As a result, RFC 6125 [24] recommends that SSL/TLS implementations should not match a presented identifier in a certificate where the wildcard is embedded within an A-label or U-label of an IDN (e.g., xn--kcry6tjko*.example.com). However, SSL/TLS implementations should match a wildcard character in an IDN as long as the wildcard character occupies the entire left-most label of the IDN (e.g., *.xn--kcry6tjko.example.com).

IP address. IP addresses can be part of either the common name attribute or the subjectAltName extension (with an IP: prefix) in a certificate. RFC 6125 specifies that an IP address must be converted to a network byte order octet string before performing certificate verification [24]. SSL/TLS implementations should compare this octet string with the common name or subjectAltName identifiers. The length of the octet string must be 4 bytes for IPv4 and 16 bytes for IPv6. The hostname verification should succeed only if both octet strings are identical. Therefore, wildcard characters are not allowed in IP address identifiers, and the SSL/TLS implementations should not attempt to match wildcards.

Email. Email can be embedded in the common name as the emailAddress attribute in legacy SSL/TLS implementations. The attribute is not case sensitive. However, new implementations must add email addresses in rfc822Name format to the subject alternative name extension instead of the common name attribute [21].

Internationalized email. Similar to IDNs in subjectAltName extensions, an internationalized email must be converted into the ASCII representation before verification. RFC 5321 also specifies that network administrators must not define mailboxes (local-part@domain/address-literal) with non-ASCII characters and ASCII control characters. Email addresses are considered to match if the local-part and host-part are exact matches using case-sensitive and case-insensitive ASCII comparison respectively (e.g., MYEMAIL@example.com does not match myemail@example.com but matches MYEMAIL@EXAMPLE.COM) [21]. Note that this specification contradicts that of email addresses embedded in the common name, which are supposed to be completely case-insensitive.

Email with IP address in the host part. RFCs 5280 and 6125 do not specify any special treatment for an IP address in the host part of an email and only allow email in rfc822Name format. The rfc822Name format supports both IPv4 and IPv6 addresses in the host part. Therefore, an email with an IP address in the host part is allowed to be present in a certificate [22].

Wildcard in email. No specification states that wildcards should be interpreted and matched when they are part of an email address in a certificate.

Other identifiers in subjectAltName. There are other identifiers that can be used to perform identity checks, e.g., uniformResourceIdentifier (URI), SRVName, and otherName. However, most popular SSL/TLS libraries do not support checking these identifiers and leave it up to the applications.

III. METHODOLOGY

In this section, we describe the challenges behind automated testing of hostname verification implementations. Albeit small in size, the diversity of these implementations and the subtleties in the hostname verification process make these implementations difficult to test. We then proceed to describe an overview of our methodology for testing hostname verification implementations using automata learning algorithms. We also provide a brief summary of the basic setting under which automata learning algorithms operate.

A. Challenges in hostname verification analysis

We believe that any methodology for automatically analyzing hostname verification functionality should address the following challenges:

1. Ill-defined informal specifications. As discussed in Section II, although the relevant RFCs provide some examples/rules defining the hostname verification process, many corner cases are left unspecified. Therefore, it is necessary for any hostname verification implementation analysis to take into account the behaviors of other popular implementations to discover discrepancies that could lead to security/compatibility flaws.

2. Complexity of name checking functionality. Hostname verification is significantly more complex than a simple string comparison due to the presence of numerous corner cases and special characters. Therefore, any automated analysis must be able to explore these corner cases. We observe that the format of the certificate identifier as well as the matching rules closely resemble a regular expression matching problem. In fact, we find that the set of accepted hostnames for each given certificate identifier forms a regular language.

3. Diversity of implementations. The importance and popularity of the SSL/TLS protocol resulted in a large number of different SSL/TLS implementations. Therefore, hostname verification logic is often implemented in a number of different programming languages such as C/C++, Java, Python, and so forth. Furthermore, some of these implementations might be only accessible remotely without any access to their source code.
Therefore, we argue that a black-box analysis algorithm is the most suitable technique for testing a large variety of different hostname verification implementations.

B. HVLearn's approach to hostname verification analysis

Motivated by the challenges described above, we now present our methodology for analyzing hostname verification routines in SSL/TLS libraries and applications. The main idea behind our HVLearn system is the following: for different rules in the RFCs, as well as for ambiguous rules which are not well defined in the RFCs, we generate template certificates with common names which are specifically designed to check a specific rule. Afterward, we use automata learning algorithms to extract a DFA which describes the set of all hostname strings that match the common name in our template certificate. For example, the DFA inferred from an implementation for an identifier template that places a wildcard outside the left-most label can be used to test conformance with the rule in RFC 6125 prohibiting wildcard characters from appearing in any label other than the left-most label of the common name.

Once a DFA model is generated by the learning algorithm, we check the model for violations of any RFC rules or for other suspicious behavior. HVLearn offers two methods to check an inferred DFA model:

Regular-expression-based rules. The first option allows the user to provide a regular expression that specifies a set of invalid strings. HVLearn can ensure that the inferred DFAs do not accept any of those strings. For example, RFC 1035 states that only characters in the set [A-Za-z0-9] and the characters '-' and '.' should be used in hostname identifiers. Users therefore can construct a simple regular expression that can be used by HVLearn to check whether any of the tested implementations accept a hostname with a character outside the given set.
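The regular-expression-based check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HVLearn's actual mechanism: the dictionary-based DFA encoding, the names `accepts`, `rule_violations`, and `TOY_MODEL`, and the bounded enumeration of candidate strings are all hypothetical simplifications chosen to keep the example short.

```python
import re
from itertools import product

# Rule from the text: hostnames may only use letters, digits, '-' and '.'.
# The expression matches any string containing a character outside that set.
INVALID_HOSTNAME = re.compile(r"[^A-Za-z0-9.\-]")


def accepts(dfa, word):
    """Run a DFA given as {'start', 'accepting', 'delta'}; missing edges reject."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"].get((state, symbol))
        if state is None:
            return False
    return state in dfa["accepting"]


def rule_violations(dfa, alphabet, max_len):
    """List strings up to max_len that the model accepts but the rule forbids."""
    found = []
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            word = "".join(symbols)
            if accepts(dfa, word) and INVALID_HOSTNAME.search(word):
                found.append(word)
    return found


# Hypothetical inferred model that accepts "a.a" and, incorrectly, "a=a".
TOY_MODEL = {
    "start": 0,
    "accepting": {3},
    "delta": {(0, "a"): 1, (1, "."): 2, (1, "="): 2, (2, "a"): 3},
}

violations = rule_violations(TOY_MODEL, ["a", ".", "="], 3)
```

With this toy model, `violations` contains only the string `a=a`. A full implementation would more likely intersect the inferred DFA with an automaton for the rule instead of enumerating strings; the enumeration here is only for readability.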


Fig. 2. Exact learning from queries: the active learning model under which our automata learning algorithms operate. (Diagram: the learning algorithm sends membership queries to the target system and submits a model M to the equivalence oracle, which answers whether M is correct: yes, or no with a counterexample.)

Differential testing. The second option offered by HVLearn is to perform differential testing between the inferred model and models inferred from other implementations for the same certificate template. Given two inferred DFA models, HVLearn generates a set of unique differences between the two models using an algorithm which we discuss in Section IV-E. This option is especially useful for finding bugs in corner cases which are not well defined in the RFCs.

We summarize the advantages of our approach below:

Adopting a black-box learning approach ensures that our analysis method is language independent and that we can easily test a variety of different implementations. Our only requirement is the ability to query the target library/application with a certificate and hostname of our choice and find whether the hostname matches the given identifier in the certificate.

As pointed out in the previous section, hostname verification is similar to regular expression matching. Given that regular expressions can be represented as DFAs, adopting an automata-based learning algorithm for representing the inferred models for each certificate template is a natural and effective choice.

Finally, an additional advantage of having DFA models is that we can efficiently compare two inferred models and enumerate all differences between them. This property is very important for differential testing as it helps us in analyzing the ambiguous rules in the specifications.

Limitations. A natural trade-off of choosing to implement our system as a black-box analysis method is that we cannot guarantee completeness or soundness of our models. However, each difference inferred by HVLearn can be easily verified by querying the corresponding implementations.
Moreover, since our system will find all differences among implementations, it will not report a bug that is common among all implementations unless a rule is explicitly specified for it, as described above. Finally, we point out that not all discrepancies among systems are necessarily security vulnerabilities; they may represent equally acceptable design choices for ambiguous parts of the RFCs.

C. Automata Learning Algorithms

We will now describe the automata learning algorithms that allow us to realize our automata-based analysis framework.

Learning model. We utilize learning algorithms that work in an active learning model which is called exact learning from queries. Traditional supervised learning algorithms, such as those used to train deep neural networks, work on a given set of labeled examples. In contrast, active learning algorithms in our model work by adaptively selecting inputs that they use to query a target system and obtain the correct label.

Figure 2 presents an overview of our learning model. A learning algorithm attempts to learn a model of a target system by querying the target system with inputs of its choice. Eventually, by querying the target system multiple times, the learning algorithm infers a model of the target system. This model is then checked for correctness through an equivalence oracle, an oracle that checks whether the inferred model correctly summarizes the behavior of the target system. If the model is correct, i.e., it agrees with the target system on all inputs, then the learning algorithm will output the generated model and terminate. On the other hand, if the model is incorrect, the equivalence oracle will produce a counterexample, i.e., an input under which the target system and the model produce different outputs. The learning algorithm then uses the counterexample to refine the inferred model. This process iterates until the learning algorithm produces a correct model.
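The query-and-refine loop described above can be sketched in a few lines of Python. This is a minimal illustration under explicit assumptions: it uses an observation-table learner in the style of Angluin's algorithm [31] (adding all suffixes of each counterexample as distinguishing strings), and it stands in for the equivalence oracle with an exhaustive comparison on all strings up to a fixed length. The names `learn_dfa`, `bounded_equivalence`, and `target` are hypothetical, and `target` is a toy stand-in for a hostname verification routine checking hostnames against the identifier `*.a`.

```python
from itertools import product


def learn_dfa(alphabet, member, find_counterexample):
    """Observation-table learner; returns (hypothesis, access_strings)."""
    S, E = [""], [""]  # S: one access string per state, E: distinguishing suffixes
    cache = {}

    def mq(word):  # cached membership query against the target
        if word not in cache:
            cache[word] = member(word)
        return cache[word]

    def row(prefix):
        return tuple(mq(prefix + suffix) for suffix in E)

    while True:
        # Close the table: every one-symbol extension must behave like a state in S.
        closed = False
        while not closed:
            closed = True
            known = {row(s) for s in S}
            for s, a in product(list(S), alphabet):
                if row(s + a) not in known:
                    S.append(s + a)
                    closed = False
                    break

        state_of = {row(s): s for s in S}

        def hypothesis(word, state_of=state_of):
            current = ""  # the empty string is the access string of the start state
            for symbol in word:
                current = state_of[row(current + symbol)]
            return mq(current)

        counterexample = find_counterexample(hypothesis)
        if counterexample is None:
            return hypothesis, S
        # Refine: every suffix of the counterexample becomes a distinguishing suffix.
        for i in range(len(counterexample)):
            if counterexample[i:] not in E:
                E.append(counterexample[i:])


def bounded_equivalence(alphabet, member, max_len):
    """Approximate equivalence oracle: compare on all strings up to max_len."""
    def find(hypothesis):
        for length in range(max_len + 1):
            for symbols in product(alphabet, repeat=length):
                word = "".join(symbols)
                if hypothesis(word) != member(word):
                    return word
        return None
    return find


def target(hostname):
    """Toy target: accepts '<non-empty label>.a', i.e. matches identifier '*.a'."""
    labels = hostname.split(".")
    return len(labels) == 2 and labels[0] != "" and labels[1] == "a"


ALPHABET = ["a", "."]
model, states = learn_dfa(ALPHABET, target, bounded_equivalence(ALPHABET, target, 6))
```

On this toy target the learner ends with five access strings, one per state of the minimal DFA, and the learned `model` generalizes to hostnames longer than any string the bounded check examined.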
To summarize, a learning algorithm in the exact learning model is able to interact with the target system using two types of queries:

Membership queries: The input to this type of query is a string s and the output is Accept or Reject depending on whether the string s is accepted by the target system or not.

Equivalence queries: The input to an equivalence query is a model M and the output of the query is either True, if the model M is equivalent to the target system on all inputs, or a counterexample input under which the model and the target system produce different outputs.

Automata learning in practice. The first algorithm for inferring DFA models in the exact learning from queries model was developed by Angluin [31] and was followed by a large number of optimizations and variations in the following years. In our system, we use the Kearns-Vazirani (KV) algorithm [54]. The KV algorithm utilizes a data structure called the discrimination tree and is in practice more efficient in terms of the number of queries it requires to infer a DFA model.

The most significant challenge that one should address in order to use the KV algorithm and other automata learning algorithms in practice is how to implement an efficient and accurate equivalence oracle to simulate the equivalence queries performed by the learning algorithm. Since we only have black-box access to the target system, any method for implementing equivalence queries is necessarily incomplete.

In HVLearn, we use the Wp-method [49] for implementing equivalence queries. The Wp-method checks the equivalence between an inferred DFA and a target system using only black-box queries to the target system. Essentially, the Wp-method approximates an equivalence oracle by using multiple membership queries. The algorithm is given as input the DFA to be checked and an upper bound on the number of states in the target system when modeled as a DFA, a parameter which we call depth. Then, the algorithm creates a set of test inputs S, which are then submitted to the target system. If the target system agrees with the DFA model on all inputs in the test set S, then the DFA and the target system are proved equivalent under the assumption that the upper bound on the number of states of the target system is correct.

In theory, one can set the depth parameter of the Wp-method to a very large value in order to design an equivalence oracle which is, in practice, complete. However, the size of the set of test inputs produced by the Wp-method is on the order of $O(n^2 |\Sigma|^{m-n+1})$, where $\Sigma$ is the input alphabet for the DFA, $m$ is the upper bound on the number of states of the target system and $n$ is the number of states in the input DFA. Therefore, using the Wp-method with a large depth (i.e., upper bound on the number of states of the target system) is impractical. Note that the bound on the number of test inputs produced by the Wp-method is not a worst-case bound; on the contrary, the number of test inputs produced is usually of that order. Consequently, it is essential for the efficiency of our system to maintain a small alphabet for our DFAs and also to set a small upper bound (depth) on the number of states of the target system while using the Wp-method. We address both of these issues in the next section.

Fig. 3. Overview of learning a hostname verification implementation using HVLearn. (Diagram: inside HVLearn, LearnLib's KV algorithm and the optimized Wp-method exchange equivalence queries, DFA models, and counterexamples for each test certificate template; hostnames, including the Wp-method's test hostnames, are sent as membership queries to the SSL/TLS hostname verification implementation, which evaluates match(hostname, test cert) and answers accept/reject; the final model for the test certificate template is output.)

IV. ARCHITECTURE OF HVLEARN

In this section, we describe the design and implementation of our system, HVLearn, based on automata learning techniques.
Specifically, we describe the technical challenges that arise when we attempt to use automata learning algorithms in practice. We also summarize the optimizations that HVLearn implements to address these challenges and efficiently learn DFA models of hostname verification implementations.

A. System overview

Figure 3 presents an overview of how HVLearn is used to analyze the hostname verification functionality of an SSL/TLS library. To use HVLearn, the user provides HVLearn access to the hostname verification function that takes an X.509 certificate and a hostname as input and returns accept/reject depending on whether the provided hostname matches the identifier in the certificate. We describe how we implement this interface in Section IV-C.

Our system includes a number of certificate templates, which are certificates designed to test the SSL/TLS implementation on a number of different rules as described in Section IV-B. For each such template, HVLearn will learn a DFA model describing the set of hostnames accepted by a given implementation for the given certificate template. To produce a DFA model, HVLearn utilizes the LearnLib [59] library, which contains implementations of both the KV algorithm and the Wp-method. To avoid setting the maximum depth of the Wp-method to impractically high values, we optimize the equivalence oracle as described in Section IV-D.

Once a model is generated, our system proceeds to analyze the model as described in Section IV-E. The results of our analysis, both the inferred models and the differences between models, are then saved for reuse. Optionally, HVLearn can also utilize the inferred models for a certificate template to extract a formal specification for the corresponding certificate template as described in Section V-F.

B. Generating certificate templates

To cover all different rules and ambiguous practices in hostname verification, we created a set of 23 certificates with different identifier templates, where each certificate is designed to test a specific rule from the specification.
These certificates are selected to cover all the rules we described in Section II. For example, a certificate with common name xn--*. will test if the implementation allows wildcards as part of an A-label in an IDN, something which is explicitly forbidden by RFC 6125. Our template certificates are self-signed X.509 v3 certificates generated using the GnuTLS library. We choose to use GnuTLS for certificate generation because it allows identifiers with embedded NULL characters in both the subject common name and the SAN. The template identifier to be tested is placed in either the Subject CN and/or the SAN (as dnsName, ipAddress, or email).

C. Performing membership queries

In order to utilize the learning algorithms in LearnLib (including the Wp-method), we implement a membership query function that performs all queries to the target system. This function accepts input as a string and returns a binary value. In our system, we use the hostname verification function from the target SSL/TLS implementation. We note here that, since LearnLib is written in Java while many of our tested SSL/TLS implementations are written in C/C++/Python, we utilized the Java Native Interface (JNI) [10] to efficiently perform membership queries to the target in such cases.

D. Automata learning parameters and optimizations

In this section, we describe the architectural decisions and optimizations that we implemented to efficiently scale the KV

algorithm for testing complex real-world SSL/TLS hostname verification implementations.

Alphabet size. The first important decision we have to make to utilize the KV algorithm is to select an alphabet that will be used by the algorithm. The alphabet refers to the set of symbols that the learning algorithm will test. A straightforward approach is to use a very general set of characters such as the set of ASCII characters. However, this will impose an unnecessary overhead on our system's performance since the performance of both the KV algorithm and the Wp-method relies heavily on the underlying alphabet size. Our main insight is that we can reduce the alphabet to a small set of representative characters that will thoroughly test all different aspects of hostname verification. In particular, we select the set Σ = {a, 1, dot, \s, A, =, *, x, n, -, \u4f60, NULL} as an input alphabet in our experiments. In the presented alphabet, dot denotes the . character, \s denotes the space character (ASCII value 32), NULL denotes the zero byte character, and \u4f60 denotes the Unicode character with hexadecimal value 4F60. Note that this set of symbols is adequate for analyzing hostname verification implementations since it includes characters from all different categories such as lowercase, uppercase, digits, Unicode, etc., as well as special characters like the NULL character. The lowercase characters x, n in conjunction with the - character are necessary in order to encode IDN hostnames. Finally, the inclusion of some non-alphanumeric characters such as the = character allows us to detect violations where an implementation accepts invalid hostnames. Note that, even though the hostnames generated using this alphabet set will often not resolve to a real IP address when processed as DNS names, this does not affect the accuracy of our analysis in any way. This is a side effect of the fact that the hostname verification routines are not responsible for resolving the provided DNS name to an IP address. They simply check whether the given hostname matches the identifier in the provided certificate.

Caching membership queries.
To avoid the communication cost of repeated querying of the SSL/TLS implementations with the same inputs, we utilize LearnLib's DFALearningCache class to cache the results of the membership queries. The cache is checked on each new query, and a cached result is used whenever found. This optimization is particularly useful for cutting down the overhead of the repeated queries generated by the Wp-method across multiple equivalence queries.

Optimizing equivalence queries. In practice, the first model generated by the learning algorithm is usually just a single-state DFA which rejects all hostnames. The reason is that the learning algorithm is not able to generate any accepting hostname and thus cannot distinguish between the initial state and any other state in the target system. Sometimes, to force the KV algorithm to produce an accepting hostname using the Wp-method, a very large depth is required. This may cause efficiency issues in the system. However, if we supply the model with an accepting hostname, then trivial models will be improved quickly without having to utilize excessive depth parameters in the Wp-method. Recall here that the exponential term in the Wp-method is dependent on the difference between the number of states in the model and the provided depth. Therefore, once we discover an accepting state in the target system, the Wp-method with a much smaller depth will still be able to explore many different aspects of the hostname verification implementation.

In order to generate an accepting hostname, we perform the following test during an equivalence query and before calling the Wp-method. First, we search for any wildcard characters (*) in the provided common name and replace them with random characters from our alphabet to obtain a concrete hostname. Next, we check that the generated model and the target hostname verification implementation agree on a set of hostnames generated using this method. If not, we return the hostname for which they differ as a counterexample.
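The wildcard-substitution test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HVLearn's actual code: the names `concretize`, `find_counterexample`, and `toy_target`, the reduced alphabet, and the number of tries are all hypothetical, and `toy_target` merely stands in for a real hostname verification function.

```python
import random
import re

# Hypothetical subset of the paper's alphabet (only label-safe characters).
ALPHABET = ["a", "1", "A", "x", "n", "-"]

def concretize(template, rng, alphabet=ALPHABET, max_len=3):
    """Replace every '*' in the template with a short random string."""
    parts = []
    for ch in template:
        if ch == "*":
            k = rng.randint(1, max_len)
            parts.append("".join(rng.choice(alphabet) for _ in range(k)))
        else:
            parts.append(ch)
    return "".join(parts)

def find_counterexample(template, model_accepts, target_accepts, tries=20, seed=0):
    """Return a hostname on which model and target disagree, or None."""
    rng = random.Random(seed)
    for _ in range(tries):
        host = concretize(template, rng)
        if model_accepts(host) != target_accepts(host):
            return host
    return None

def toy_target(host, pattern="*.a.a"):
    """Hypothetical verifier: '*' matches one non-empty label."""
    regex = re.escape(pattern).replace(r"\*", "[A-Za-z0-9-]+")
    return re.fullmatch(regex, host) is not None

# The initial one-state model rejects everything, so any accepted
# concrete hostname is a counterexample for the learner.
counterexample = find_counterexample("*.a.a", lambda h: False, toy_target)
```

In this sketch the counterexample would be handed back to the learning algorithm in place of a Wp-method result; the Wp-method would only be invoked when no disagreement is found.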
The main advantage of this heuristic is that it allows us to quickly produce accepting hostnames that uncover new states in the target system without invoking the Wp-method with very large depth values. Once these states are uncovered, and the quality of the inferred models improves, the Wp-method, with a small depth parameter, is utilized to discover additional states in the target system.

E. Analysis and comparison of inferred DFA models

After HVLearn outputs a model, the next task for our system is to analyze the produced model for RFC violations or, for confusing/ambiguous rules in the RFC, to compare different inferred models and analyze any discrepancies found between different implementations.

Analyzing a single DFA model. In the case of a single model, we would like to determine whether the model is accepting invalid hostnames prohibited by the RFC specification. If the specification is unclear, our analysis can still be used in order to manually inspect the behavior of the implementation on the specific certificate template besides the differential analysis described below. Our system offers two options for performing analysis of a single model. First, our system generates inputs that will exercise all simple paths (i.e., paths without loops) that lead to accepting states in the inferred model. Intuitively, these inputs are a small set of inputs that describe all different flavors of hostnames that will be accepted for the given certificate template. By inspecting these inputs, we can determine if the implementation is accepting invalid hostnames. Second, HVLearn allows the user to specify a regular expression rule to be checked against the inferred model. In this case, the user specifies a regular expression and HVLearn verifies that the regular expression and the inferred model do not share any common strings. This option allows us to easily check certain RFC violations by utilizing simple regular expression rules. For example, consider the rule specifying that no non-alphanumeric characters should be part of a matching hostname. By specifying the regular expression rule (.)*=(.)*

we can check whether there exists any matching hostname that contains the = character in the inferred model.

Comparing unique differences between DFA models. For analyzing certain corner cases which are not specified in the RFC, testing a single model may not be enough. Instead, we compare the inferred models for different SSL/TLS implementations and find inputs under which the implementations behave differently. To perform this analysis, we utilize the difference enumeration algorithm from [33]. In a nutshell, this algorithm computes the product DFA between two, or more, given models and then finds all simple paths to states in which the DFAs are producing different output.

F. Specification Extraction

As we discussed already, the RFC specifications leave certain aspects of hostname verification up to the implementations by not specifying the correct behavior in all cases. In these cases, imposing specific restrictions on the implementations is challenging since we have to be careful to avoid breaking compatibility with existing implementations and valid certificates. In this section, we describe how the inferred DFA models for the different certificate templates can be used to infer a formal specification, which is compatible with existing implementations, for the cases where RFC specifications are vague. Our main insight is the following: for each certificate template, we can use the DFA accepting the set of hostnames accepted by all SSL/TLS implementations as a formal specification of the corresponding rule template. The intuition behind this choice is that this specification avoids the small idiosyncrasies of each library and is thus very compact. On the other hand, if a vulnerability exists in this specification, then this vulnerability must also exist in all tested implementations. Since each implementation is audited independently, our choice gives us confidence that our specification is secure from simple vulnerabilities while maintaining backward compatibility with the tested implementations.

Computing the specification.
In order to compute the corresponding specification for each certificate template, we proceed as follows: First, we obtain DFA models for all hostname verification implementations under test using HVLearn. Next, we compute the product DFA for all the inferred models. The product DFA accepts the intersection of the regular languages of each DFA. We compute the product DFA using standard automata algorithms [60]. The inferred formal specification for our set of implementations is represented by the product DFA of all the DFA models. This product DFA can then be converted back to a regular expression to improve readability. Finally, we would like to point out that computing the intersection of k DFAs has a worst-case time complexity of $O(n^k)$, where n is the number of states in each DFA [55]. However, in our case, the inferred DFAs are mostly similar and thus the product construction is very efficient, because intersecting two DFAs does not add a significant number of states to the resulting product DFA. We provide more evidence supporting this hypothesis in Section V.

V. EVALUATION

The main goals of our evaluation of HVLearn are to answer the following questions: (i) How effective is HVLearn in finding RFC violations in real-world hostname verification implementations? (ii) How much do our optimizations help in improving the performance of HVLearn? (iii) How does HVLearn perform compared to existing black-box or coverage-guided gray-box techniques? (iv) Can HVLearn infer backward-compatible specifications from the inferred DFAs of real-world hostname verification implementations?

A. Hostname verification test subjects

We use HVLearn to test hostname verification implementations in six popular open-source SSL/TLS implementations, namely OpenSSL, GnuTLS, MbedTLS (PolarSSL), MatrixSSL, JSSE, and CPython SSL, as well as in two popular SSL/TLS applications: curl and HttpClient. Note that several libraries, such as older OpenSSL versions, do not provide support for hostname verification and leave it up to the application developer to implement it.
Therefore, applications like curl/HttpClient that support different libraries are often forced to write their own implementations of hostname verification. Among the libraries that support hostname verification, some like OpenSSL provide separate API functions for matching each type of identifier (i.e., domain name, IP addresses, email, etc.) and leave it up to the application to select the appropriate one depending on the setting. In contrast, others like MatrixSSL combine all supported types of identifiers in one function and figure out the appropriate one by inspecting the input string. Table I shows the hostname verification function/class names for all implementations that we tested and the types of identifier(s) that each of them supports. The last column shows physical source lines of code (SLOC) for each host matching function/class as reported by the SLOCCount [14] tool. Note that the shown SLOC only count the parts of the code that perform hostname matching.

B. Finding RFC violations with HVLearn

We use HVLearn to produce DFA models for each distinct certificate template corresponding to different patterns from the RFCs. Afterward, we detect potentially buggy behavior both by performing differential testing of output DFAs and by checking individual DFAs for violations of regular-expression-based rules that we created manually as described in Section IV-E. Table II presents the results of our experiments. We evaluated a diverse set of rules from four different RFCs [16], [17], [21], [24]. We found that every rule that we tested is violated by at least one implementation, while on average each implementation is violating three RFC rules. Several of these violations have severe security implications (e.g., mishandling wildcard characters in international domain names, confusing IP addresses as domain names, etc.). We describe these cases along with their security implications in detail in Section VI.
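To make the regular-expression-based rule check concrete, the following Python sketch tests whether an inferred model shares any string with the rule (.)*=(.)* by searching the product of two DFAs. It is only an illustration under assumptions: the `DFA` class, the three-symbol alphabet, and the two toy models are hypothetical, and the rule DFA is built by hand rather than compiled from a regular expression.

```python
from collections import deque

class DFA:
    """Minimal complete DFA: delta maps (state, symbol) -> state."""
    def __init__(self, alphabet, start, accepting, delta):
        self.alphabet = alphabet
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta

def contains_symbol_dfa(alphabet, sym):
    """Hand-built DFA for (.)*sym(.)*: state 1 means 'sym' was seen."""
    delta = {}
    for s in alphabet:
        delta[(0, s)] = 1 if s == sym else 0
        delta[(1, s)] = 1
    return DFA(alphabet, 0, {1}, delta)

def toy_model(alphabet, allow_equals):
    """Hypothetical inferred model accepting <label>.a, where the label is
    built from 'a' (and '=' if allow_equals). State 4 is the reject sink."""
    delta = {}
    for s in alphabet:
        label_char = s == "a" or (allow_equals and s == "=")
        delta[(0, s)] = 1 if label_char else 4
        delta[(1, s)] = 1 if label_char else (2 if s == "." else 4)
        delta[(2, s)] = 3 if s == "a" else 4
        delta[(3, s)] = 4
        delta[(4, s)] = 4
    return DFA(alphabet, 0, {3}, delta)

def common_string(m1, m2):
    """Breadth-first search over the product DFA; returns a shortest
    string accepted by both machines, or None if there is none."""
    start = (m1.start, m2.start)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in m1.accepting and q in m2.accepting:
            return word
        for s in m1.alphabet:
            nxt = (m1.delta[(p, s)], m2.delta[(q, s)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + s))
    return None

alphabet = ["a", ".", "="]
rule = contains_symbol_dfa(alphabet, "=")
violation = common_string(toy_model(alphabet, allow_equals=True), rule)
print(violation)  # "=.a": a hostname containing '=' that the lax model accepts
```

A non-empty result is a witness hostname for the violation; for the stricter toy model (`allow_equals=False`) the search finds no common string and returns `None`.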

TABLE I. HOSTNAME VERIFICATION FUNCTIONS (ALONG WITH THE TYPES OF SUPPORTED IDENTIFIERS) IN SSL/TLS LIBRARIES AND APPLICATIONS

SSL/TLS Libs/Apps (Version) | Supported Identifier(s) | Hostname Matching Function/Class Name | Approx. SLOC
OpenSSL | CN/DNS | X509_check_host | 314
OpenSSL | IP | X509_check_ip | 308
OpenSSL | IP | X509_check_ip_asc | 417
OpenSSL | Email | X509_check_email | 314
GnuTLS | CN/DNS/IP | gnutls_x509_crt_check_hostname, gnutls_x509_crt_check_hostname2 | 195
GnuTLS | Email | gnutls_x509_crt_check_email | 149
MbedTLS | CN/DNS | mbedtls_x509_crt_verify, mbedtls_x509_crt_verify_with_profile | 193
MatrixSSL | CN/DNS/IP/Email | matrixValidateCerts | 130
JSSE (1.8) | CN/DNS/IP | HostnameChecker | 202
CPython SSL | CN/DNS/IP | match_hostname | 59
HttpClient | CN/DNS/IP | DefaultHostnameVerifier | 257
curl | CN/DNS/IP | verifyhost, Curl_verifyhost | 300

Note that the library with the most violations is JSSE (four violations), while HttpClient is the application with the most violations (five violations). OpenSSL, MbedTLS, and CPython SSL only have two violations each, having in common the violation of matching invalid hostnames. The interested reader can find an extended description of our results in the Appendix (Table VIII).

C. Comparing unique differences between DFA models

In order to evaluate the discrepancies between all different hostname verification implementations, we computed the number of differences for each pair of hostname verification implementations in our test set. Recall that for two given DFA models we define the number of differences as the number of simple paths in the product DFA which lead to different output being produced by the two models [33]. Table III presents the results of our experiment. For example, OpenSSL and GnuTLS have 95 discrepancies in total. This is obtained by summing up the number of unique paths that are different between the inferred DFAs for each common name in Table VIII. Note that all pairs of implementations contain a large number of unique cases under which they produce different output. As seen in Table III, each pair of tested implementations has 127 unique differences on average between them.
We note that some differences only imply ambiguous RFC rules while some reveal potential invalid hostnames or RFC violation bugs. The interested reader can find a more detailed list of the unique strings that each implementation is accepting in Table VIII in the Appendix. In any case, we find the fact that all implementations of such a security-critical component of the SSL/TLS protocol present such a large number of discrepancies to be an alarming issue, since it signifies either a poor implementation of the specification or vagueness in the specification itself. Our analysis suggests that both cases are present in practice.

D. Comparing code coverage of HVLearn and black/gray-box fuzzing

In order to compare HVLearn's effectiveness in finding bugs with that of black/gray-box fuzzing, we investigate the following research question:

RQ.1: How does HVLearn's code coverage differ from black/gray-box fuzzing techniques?

We compare the code coverage of the tested hostname verification implementations achieved by HVLearn and two other techniques, black-box fuzzing and coverage-guided gray-box fuzzing. We describe our testing setup briefly below.

HVLearn: HVLearn leverages automata learning that invokes the hostname verification matching routine with a predefined certificate template and alphabet set. HVLearn adaptively refines a DFA corresponding to the test hostname verification implementation by querying the implementation with new hostname strings. We measure the code coverage achieved during the learning process until it finishes. We also monitor the total number of queries NQ, which comes from both the membership and the equivalence queries.

Black-box fuzzing: With the same alphabet and certificate template used by HVLearn, we randomly generate NQ strings and query the target SSL/TLS hostname verification function with the same certificate template. Note that the black-box fuzzer generates independent random strings without any sort of guidance.
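The black-box baseline described above amounts to drawing NQ independent random strings over the alphabet. A minimal Python sketch follows; the function name `blackbox_fuzz`, the maximum string length, and the fixed random seed are assumptions made for illustration, and `target_accepts` stands for whatever hostname verification function is under test with a fixed certificate template.

```python
import random

def blackbox_fuzz(target_accepts, alphabet, num_queries, max_len=8, seed=0):
    """Query the target with num_queries independent random strings and
    return the ones it accepted. No feedback from the target is used."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_queries):
        length = rng.randint(0, max_len)
        host = "".join(rng.choice(alphabet) for _ in range(length))
        if target_accepts(host):
            accepted.append(host)
    return accepted

# Hypothetical target that accepts only the exact name "a.a".
hits = blackbox_fuzz(lambda h: h == "a.a", ["a", ".", "1"], num_queries=1000)
```

Because each string is generated independently, the chance of hitting a structured hostname falls off quickly with its length, which is the weakness the comparison is meant to expose.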
TABLE II. A SUMMARY OF RFC VIOLATIONS AND DISCREPANT BEHAVIORS FOUND BY HVLEARN IN THE TESTED SSL/TLS LIBRARIES AND APPLICATIONS

Columns: OpenSSL, GnuTLS, MbedTLS, MatrixSSL, JSSE, CPython SSL, curl, HttpClient, HttpClient* (HttpClient*: HttpClient with PublicSuffixMatcher).
Cell values for RFC violations: OK, RFC violation, or not supported by the library/application. Cell values for discrepancies: accept or reject.

RFC violations (category: rule, RFC):
Invalid hostname character: Only alphanumeric and - match in hostname (RFC 1035)
Case-insensitive hostname: Match CN in case-insensitive manner (RFC 5280, 6125)
Wildcard: Not attempt to match wildcard not in left-most label (CN/DNS:.*.) (RFC 6125)
IDN and wildcard: Not attempt to match wildcard fragment in IDN (xn--*.) (RFC 6125)
Common name and subjectAltName: No CN checked when DNS presents (RFC 6125)
Common name and subjectAltName: No CN checked when any SAN ID presents (RFC 6125)
Email-based certificate: Case-sensitive on local-part of email attribute in SAN (RFC 5280)
IP address-based certificate: Not attempt to match IP address with DNS (DNS: ) (RFC 1123)

Discrepancies (category: behavior, RFC):
Wildcard: Attempt to match wildcard with empty label (hostname:.. with CN/DNS: *..)
Wildcard: Attempt to match wildcard in public suffix (CN/DNS: *.co.uk) (RFC 6125)
Embedded NULL character: Allowed NULL character in CN
Embedded NULL character: Allowed NULL character in SAN
Embedded NULL character: Match NULL character (hostname: b.b\0.., CN/DNS: b.b\0..)
Other invalid hostname: Partially match suffix (hostname:. with CN/DNS:.,..) (RFC 1035)
Other invalid hostname: Match trailing (hostname:. with CN/DNS:.)

TABLE III. NUMBER OF UNIQUE DIFFERENCES BETWEEN AUTOMATA INFERRED FROM DIFFERENT SSL/TLS IMPLEMENTATIONS

Rows and columns: OpenSSL, GnuTLS, MbedTLS, MatrixSSL, JSSE, CPython, HttpClient, Curl. HttpClient vs. Curl: 414.

Fig. 4. Comparison of code coverage achieved by HVLearn, gray-box fuzzing, and black-box fuzzing for OpenSSL hostname verification functions. (x-axis: number of queries; y-axis: % of line coverage; series: HVLearn, coverage-guided gray-box fuzzing, black-box fuzzing.)

Coverage-guided gray-box fuzzing: Unlike black-box fuzzing, coverage-guided gray-box fuzzing tries to generate more interesting inputs by applying evolutionary techniques to the input generation process. In each generation, a new batch of inputs is generated from the previous generation through mutation/cross-over and only the inputs that increase code coverage are kept for further changes. Coverage-guided gray-box fuzzing is a popular technique for finding bugs in large real-world programs [6], [11]. To make it a fair comparison with HVLearn, we implemented our own coverage-guided gray-box fuzzer, as existing tools like AFL do not provide an easy way of restricting the mutation outputs within a given alphabet. With the same alphabet set, we initialize the fuzzer with a set of strings of varying lengths as the seeds maintained in a queue Q. The seeds are then used by the fuzzer to query the target hostname verification implementation. After finishing querying using the seeds, the fuzzer gets the string S = dequeue(Q). It randomly mutates one character within S and obtains S'. Then it uses the mutated S' to query the target. If the mutated string S' increased code coverage, we store it in the queue for further mutation, i.e., enqueue(S', Q). Otherwise, we throw it away. The fuzzer is thus guided to always mutate the strings that have better code coverage. The fuzzer iteratively performs these enqueue/dequeue operations for NQ rounds, and we obtain the final code coverage COV_randmu of each SSL/TLS implementation. Note that we keep the test certificate template fixed during the entire test. We use the percentage of lines executed, which is extracted by Gcov [51], as the indicator for the code coverage.
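The coverage-guided loop described above can be sketched as follows. This is an illustrative Python approximation, not the fuzzer used in the experiments: `run_target` is a hypothetical callback returning the set of line identifiers a query covered (standing in for Gcov data), seeds are assumed to be non-empty strings, and the parent string is put back in the queue after each round so the queue never runs dry, which is an assumption the description does not spell out.

```python
import random
from collections import deque

def mutate_one_char(s, alphabet, rng):
    """Replace one randomly chosen character of s with a random symbol."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(alphabet) + s[i + 1:]

def graybox_fuzz(run_target, alphabet, seeds, num_rounds, seed=0):
    """Coverage-guided mutation loop; returns the set of covered lines."""
    rng = random.Random(seed)
    covered = set()
    queue = deque()
    for s in seeds:                      # first query the target with the seeds
        covered |= run_target(s)
        queue.append(s)
    for _ in range(num_rounds):
        parent = queue.popleft()         # S = dequeue(Q)
        child = mutate_one_char(parent, alphabet, rng)
        new_lines = run_target(child) - covered
        if new_lines:                    # keep S' only if coverage grew
            covered |= new_lines
            queue.append(child)
        queue.append(parent)             # assumption: parent stays available
    return covered

def toy_target(host):
    """Hypothetical instrumented verifier reporting covered 'lines'."""
    lines = {"entry"}
    if "." in host:
        lines.add("has_dot")
        if host.endswith(".a"):
            lines.add("suffix_match")
    return lines

coverage = graybox_fuzz(toy_target, ["a", ".", "1"], ["aaa"], num_rounds=200)
```

Dividing the size of the covered set by the total number of instrumented lines gives the line-coverage percentage plotted against the number of queries.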
Considering that hostname verification is a small part of an SSL/TLS implementation, we do not compute the percentage of lines covered with respect to the total number of lines. Instead, we calculate the percentage of line coverage within each function and only take into account the functions that are related to hostname verification.

Result 1: HVLearn achieves an 11.21% increase in code coverage on average when compared to the black/gray-box fuzzing techniques.

Therefore, let LE(f) be the number of lines executed of function f in the SI and L(f) be the total number of lines of f; the code coverage can then be defined in terms of these two quantities.
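One plausible form of such a definition, offered here as an assumption rather than as the paper's own equation, aggregates executed and total lines over the hostname-verification functions $f_1, \dots, f_k$ of the implementation (the index $k$ and the name $\mathrm{Coverage}$ are introduced only for this illustration):

```latex
\mathrm{Coverage} \;=\; \frac{\sum_{i=1}^{k} LE(f_i)}{\sum_{i=1}^{k} L(f_i)} \times 100\%
```

Under this reading, functions unrelated to hostname verification contribute to neither sum, so a large SSL/TLS code base does not dilute the reported percentage.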
