Multi Path Considerations for Anonymized Routing: Challenges and Opportunities

Hasan T Karaoglu, Mehmet Burak Akgun, Mehmet Hadi Gunes and Murat Yuksel
Department of Computer Science and Engineering, University of Nevada, Reno
Email: {karaoglu, makgun, mgunes, yuksem}@cse.unr.edu

Abstract

Interest in anonymizer technologies has surged recently as a result of increasing privacy concerns and rising censorship barriers around the world. The Tor network has emerged as the most popular option available thanks to its large volunteer base. However, the popularity of the Tor network is threatened by growing user frustration over low throughput and long latencies. Meanwhile, multi-path routing proposals have become popular for tackling similar performance issues in contexts ranging from data center networks to inter-domain routing. In this work, we apply multi-path techniques to overcome limitations in Tor's path construction and rigid congestion avoidance mechanisms, which are major factors behind the unsatisfactory user experience. Multi-path mechanisms turn out to complement the underlying onion-routing mechanisms of Tor by exploiting the diversity of the Tor network. Our preliminary evaluation on the live Tor network revealed significant potential for performance improvements through multi-path techniques, especially in throughput, with up to a 4-fold speed-up in data transmission durations. Multi-path mechanisms may also increase the reliability of the overall system against congestion or service-interruption based attacks in return for acceptable anonymity trade-offs. Besides these promising features, applying multi-path techniques to anonymized routing raises many open research questions on data splitting, path construction, and latency estimation techniques, whose findings can benefit many research areas.

I. INTRODUCTION

As the Internet reshapes the world, recently through social media and for a couple of decades in many other ways, privacy, privacy of identity (i.e., anonymity), and free access to knowledge on the Internet have become crucial components of society. Although these rights are not granted by governments or legislation in every country, there has been significant effort to provide them by means of technology [1]. Currently, the Tor Project [2] is the largest network of volunteers running a low-latency traffic-mixing network, which makes anonymous access to the Internet possible for many around the world.

The Tor network is built on an onion-routing scheme, which prevents the matching of flows entering and leaving an overlay network, and thereby ensures a certain level of anonymity against third parties, via multi-layered encryption and traffic mixing [3]. Within that scheme, the Onion Proxy (OP), which is the Tor software running on the end-user computer, connects to a node, either a well-known, reliable, high-bandwidth entry guard or a well-hidden bridge, to get information about other nodes in the network. The OP then builds paths called circuits, which connect the OP to the Internet and redirect its traffic over other relay nodes in an overlay manner, as depicted in Figure 1. A typical circuit consists of three relay routers: an entry node, a middle Onion Router (OR), and an exit node. These circuits are periodically built and torn down after a time period to ensure the anonymity of the users.

Fig. 1. Tor Circuit Establishment

In a typical web browsing scenario, as depicted in Figure 1, the browser running on the user's host creates a TCP connection to a website through the Tor proxy running on her computer. The Tor proxy creates a stream whose cells, i.e., the atomic units of transmission in the Tor network (512 bytes each), are forwarded along the circuit to the circuit's exit node.
The exit node in turn reconstructs TCP segments from these fixed-size cells and connects to the website as if the exit node itself were the original source of the web request.

Fig. 2. Onion Routing Encryption Layers

While building a circuit, the OP agrees on pairwise symmetric keys with each relay node on the circuit [3]. The payload of a cell is therefore wrapped in three layers of encryption when it leaves the OP. Each relay node along the path removes one layer, until the exit node forwards the data to the Internet in its original form (as it was received by the Tor proxy software running on the OP host), as seen in Figure 2.

In its current design, the Tor network favors and prioritizes nodes that advertise high bandwidth capacity in the Tor network directories, so that these high-capacity nodes are the most likely to be selected by OPs querying the directories [4], [5], [6]. OPs build circuits in a source-routed manner without much coordination from the Tor network, based mostly on self-advertised bandwidth capacity and circuit establishment latencies, and nearly oblivious to congestion on relay nodes. As a result, load balancing in the Tor network can leave particular relays highly congested for periods of time while other parts of the network have low traffic utilization [5], [6].

In addition to its path construction method, the Tor network is increasingly criticized for its static congestion window management [6], [7]. Under its token-bucket-like algorithm, at most 1000 cells (500 KiB) can be in transit concurrently on a single circuit. For stream-level control, this limit is 500 cells. When no tokens are left in the bucket, the OP stops sending cells until it receives a SENDME message from the other end of the circuit, which adds tokens worth 50 KiB at the circuit level (25 KiB at the stream level) to the bucket. Moreover, the introduction of low-bandwidth bridge relays, which provide the only means of access to the Tor network for users living in countries with active Internet censorship policies, exacerbates the problem of low throughput and long latencies experienced by users.

As pointed out by the authors of a recent work [7], the slowing growth rate of the Tor user base could be the result of user frustration over Tor performance. This poses a major threat to the Tor network, whose mixing and anonymization performance relies directly on the size and diversity of its user base. In this paper, we investigate the possibilities of multi-path routing on the Tor network to tackle performance issues related to congestion and low throughput.
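As an illustration of the fixed window accounting described above, the following Python sketch models a single circuit-level window. The class and function names are hypothetical and are not taken from the Tor source; the constants come from the figures quoted in the text (512-byte cells, a 1000-cell window, and 50 KiB, i.e. 100 cells, per circuit-level SENDME). The helper also spells out the throughput implication of any fixed window: a window-limited sender can have at most one full window in flight per round trip.

```python
CELL_BYTES = 512                              # fixed cell size quoted in the text
CIRCUIT_WINDOW = 1000                         # max cells in flight per circuit (text)
SENDME_CELLS = (50 * 1024) // CELL_BYTES      # 50 KiB per SENDME = 100 cells


class CircuitWindow:
    """Hypothetical model of the static circuit-level window."""

    def __init__(self, window=CIRCUIT_WINDOW, increment=SENDME_CELLS):
        self.capacity = window
        self.available = window
        self.increment = increment

    def can_send(self):
        return self.available > 0

    def on_cell_sent(self):
        if self.available == 0:
            raise RuntimeError("window exhausted; wait for a SENDME")
        self.available -= 1

    def on_sendme(self):
        # A SENDME restores a fixed number of cells, never beyond the cap.
        self.available = min(self.capacity, self.available + self.increment)


def max_throughput_bytes_per_s(rtt_s, window=CIRCUIT_WINDOW):
    """Upper bound for a window-limited sender: one window per round trip."""
    return window * CELL_BYTES / rtt_s
```

Under these assumptions, a circuit with a 1-second round-trip time cannot exceed 1000 x 512 = 512,000 bytes per second, regardless of how much capacity its relays have.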
Multi-path techniques have previously been proposed in a wide range of areas, especially for enhanced reliability [8]. Interest in multi-path routing has recently been revived by the urgent need for high-performance transmission in data-center networks [9] and by the proliferation of computers equipped with multiple network interfaces, e.g., wireless and wired, which can enjoy significant throughput improvements through concurrent data transmission over multiple media [10], [11], [12].

In its simplest form, multi-path routing in the Tor network can increase overall throughput by exploiting concurrent data transmission over multiple low-bandwidth circuits. Since multi-path routing dynamically distributes traffic among multiple circuits according to the performance of the underlying overlay paths, temporary congestion on relay nodes or artificial delays due to Tor congestion management (at the circuit and stream level) can be circumvented quite effectively. Moreover, the overall load balancing of the Tor network can be significantly improved. Better traffic mixing can be achieved by exploiting the path diversity of the Tor network through intelligent path construction mechanisms [4], [13], [6] coupled with multi-path routing. Indeed, our preliminary analysis on the live Tor network revealed that multi-circuit transmission achieves significant throughput improvements, up to 400 percent.

In this work, we analyze how the throughput speed-up varies with the number of circuits, and how the improvement differs between short and bulk data transfers, which turned out to diverge considerably. We also investigate how node diversity and locality affect the improvements achieved by multi-path routing, based on our experiments with Amazon EC2 [14] nodes in different regions of the world. Finally, we investigate how multi-path mechanisms may affect the size of buffers on Tor nodes, especially Onion Proxies and exit nodes.
This analysis is particularly important because multi-path schemes are widely known to cause high overall memory usage, e.g., in the receiver buffer of MultiPath TCP [12]. In fact, our findings show that the resulting increase in receiver buffer space can be quite large (up to several hundred KiB) unless intelligent circuit establishment and management mechanisms are applied effectively.

The rest of the paper is organized as follows. In Section I-A, we summarize some recent developments in improving Tor network performance. In Section II, we describe the principal ideas and mechanisms behind the design of Multi-path Onion Routing. In Section III, we first explain our experimental setup in detail and then share evaluation results for the performance improvements in various setups, along with a cost analysis of the required growth in buffer size. In Section IV, we discuss challenges in designing a better multi-path prototype and share the future directions we want to pursue. Finally, in Section V, we conclude the paper.

A. Related Work

Several earlier proposals overhaul Tor's node-selection algorithms to enhance anonymity by increasing the AS-level and subnet-level diversity of selected nodes [15], [16]. More recently, similar approaches have been proposed that mostly target performance improvements [4], [6]. Snader et al. propose several bandwidth estimation techniques, including passive observation of achieved bandwidth and active, periodic measurement of the temporal bandwidth capacity of nodes, to increase the chance of establishing high-performing circuits instead of relying on the capacity values self-advertised by relay nodes [4]. Snader et al. also propose a tunable mechanism that lets users manage the trade-off between performance and anonymity by adjusting the probability of selecting nodes from a small set of high-capacity nodes versus uniformly from a large set of nodes with various bandwidth capacities [4]. Their findings showed that significant improvements can be achieved with little compromise on anonymity [4].

In a similar study, Wang et al. proposed considering node congestion while establishing circuits [6]. They explained how latency measurement techniques, using active probing and opportunistic use of control messages, can provide effective node-based congestion estimation [6]. They also introduced mechanisms to avoid temporary and long-term congestion on nodes through fast circuit switching and by keeping track of the long-term congestion performance of nodes in the Tor network [6].

AlSabah et al. addressed the Tor performance issues by improving the static congestion window mechanisms currently deployed by Tor [7]. They experimented with adaptive window sizing mechanisms. In the same work, the explicit congestion signaling of backpressure mechanisms was compared against dynamic window sizing. Their preliminary findings showed that backpressure congestion signaling and dynamic queue management perform better than dynamic window sizing [7].

In a recent technical report, Mashael et al. described a scenario where multi-path routing is applied on the Tor network [13]. They reported considerable improvements in two-circuit scenarios for several performance metrics, e.g., time to receive the first byte and overall download time. They experimented with modified Tor proxy and exit nodes. In their design, they exploited SENDME control messages to estimate round-trip times in an aggregate manner for every 100 cells. They also removed the stream-level Tor congestion windows. Moreover, Mashael et al. provided a detailed security analysis of the implications of multi-path routing when some of the entry and exit nodes of the Tor network are compromised. According to their analysis, multi-path routing with three circuits increases the risk of compromising anonymity by only 7 percent [13].

Our approach differs from the work of Mashael et al. [13] in several ways. First, we take a black-box approach and work with the original Tor software instead of modified Tor proxies. Our aim is to be able to work with a diverse set of exit nodes instead of a small subset. Our experiments mostly focus on discovering the limits of multi-path routing. For example, when do the benefits of having an additional circuit level off? What are the negative impacts of multi-path routing on the buffer sizes of Tor exit nodes and proxies?
What are the regional differences in the achieved performance improvements? Can lessons learned from recent MultiPath TCP studies [9], [8], [10], [11], [12] be applied to the Tor network? We leveraged RTT estimation techniques and combined congestion window management methods from recent work on MultiPath TCP [10], [11], [12]. Generally speaking, their study focuses mostly on the efficient implementation of multi-path routing on top of existing Tor mechanisms, whereas our study aims to explore the limits and boundaries of multi-path routing performance via a black-box approach. In that regard, we prefer fine-granularity RTT estimation and dynamic sizing of the traffic distribution, with statistics updated upon each cell delivery, whereas the scheme of Mashael et al. [13] works mostly at an aggregate level, with updates upon delivery of every 100 cells.

II. MULTI-PATH ONION ROUTING

A. In-band Signaling

A multi-path routing scheme requires many changes to the original Tor software. First of all, in-band signaling of multi-circuit transmission should be implemented, so that the OP and the exit node can combine multiple streams over separate circuits into one logical stream. In our design, we envision several circuits that share a single exit node. The common exit node combines all the data received from the related streams over the multiple circuits and forwards it to the Internet as a single stream of traffic, as seen in Figure 3. Entry nodes can also be shared; however, with a congested or low-capacity entry node, this is less likely to perform as well as entry-node-disjoint multi-circuit transmission.

Fig. 3. Tor with Multi-path Approach

Similar to MultiPath TCP [12], if both the OP and the exit node are multi-path capable, the OP informs the exit node of its intention to initiate multi-path transmission. Then, both ends of the communication should exchange security tokens (nonces) and related circuit information over the master stream [12], [13].
Both ends should then verify the exchanged tokens over each sub-stream flowing over each circuit. Finally, the exit node and the OP join these streams together as if they were a single logical stream.

B. Traffic Distribution over Sub-streams

Secondly, dynamic distribution of traffic among multiple circuits requires relatively fine-granularity round-trip-time (RTT) estimation. With traffic distribution decisions made at the level of individual cell transmissions, a high-performing (high-throughput) circuit is then preferred more often than a low-performing circuit in the long run. In the short run, a temporarily congested high-capacity circuit can be avoided by forwarding most of the cells over non-congested circuits.

C. Sequencing and Receiver Windows

Since dynamic traffic distribution is done on a per-cell basis and the characteristics of the transmitting circuits can differ considerably in latency and bandwidth, reordering of data should be handled by both ends of the communication via sequence numbers and packet buffering. Again in analogy with TCP, a receiver buffer should keep packets ordered by their sequence numbers. If an incoming cell has the next expected sequence number, it can be delivered directly to higher layers (the user program at the OP or the TCP socket at the exit node) without being kept in the buffer. Otherwise, it should be kept in the receiver window and delivered to higher layers as part of a contiguous block of cells once the cell with the expected sequence number is received.

The size of the receiver window is directly affected by the performance of the concurrently transmitting streams. In fact, if the receiver window is full, high-performing streams must wait for the delivery of the next expected cell by a low-performing stream. Although the original Tor congestion window management may help prevent unbounded growth of the receiver window, the receiver window should still be sized generously to avoid performance degradation due to wide performance differences between concurrently transmitting circuits.

D. Black-box Experiment Design for Performance Evaluation

In this work, we prefer a black-box analysis of multi-path routing performance on the Tor network, so that we can leverage the node diversity of the live Tor network without the limitations of working with a small number of modified exit nodes. Similarly, we do not apply any particular heuristics for node selection when constructing circuits; instead, we rely on the bandwidth-weighted node selection algorithm of the default Tor implementation. Consequently, we are able to evaluate the performance of our multi-path scheme on the original Tor network without installing modified nodes, treating the Tor network as a black box. Our findings can thus be considered a lower-bound benchmark, whose performance can be significantly improved as we discuss in Section IV.

Fig. 4. Our Experimental Setup for Evaluating Multi-path Tor

As depicted in Figure 4, our experimental setup involves a multi-path routing scenario where traffic between a Tor proxy and a web server is transferred over multiple circuits. Our web server, running on the Utah Emulab testbed [17], listens on multiple ports and combines the data received from these ports into a single stream. Our client-side software implements a Tor controller [18], which manages circuit establishment and the attachment of streams to preferred circuits. Our Tor proxy software also implements packet buffering, reordering, and dynamic traffic distribution over multiple circuits according to the RTT estimates made on each circuit. To achieve fine-granularity RTT estimation, we introduce acknowledgment packets.
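The receiver-side reordering described in Section II-C can be sketched as follows. This is a minimal illustration under assumed conventions (sequence numbers start at 0, one sequence number per cell, and the buffer is unbounded), not the implementation used in the experiments; the class name ReorderBuffer and its methods are hypothetical.

```python
class ReorderBuffer:
    """Hypothetical receiver window that releases cells in sequence order."""

    def __init__(self):
        self.next_seq = 0        # next sequence number owed to the upper layer
        self.pending = {}        # out-of-order cells, keyed by sequence number
        self.max_pending = 0     # high-water mark of buffered cells

    def receive(self, seq, cell):
        """Accept one cell; return the list of cells now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []            # duplicate, ignore
        if seq != self.next_seq:
            # Arrived early (e.g. over a faster circuit): hold it back.
            self.pending[seq] = cell
            self.max_pending = max(self.max_pending, len(self.pending))
            return []
        # Expected cell: deliver it plus any contiguous run already buffered.
        delivered = [cell]
        self.next_seq += 1
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

In this sketch, cells 1 and 2 arriving before cell 0 are held back, and all three are released together once cell 0 arrives. The high-water mark max_pending is the quantity that grows when circuits differ widely in latency.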
Such acknowledgment packets could be overkill in a realistic Tor implementation, as several techniques can achieve a similar task in a coarser-grained but more efficient manner, using estimates based on the periodic circuit-level and stream-level SENDME messages [13] or more costly active probing techniques [6].

We implement the original Jacobson RTT estimation and dynamic retransmission timer interval calculation as described in [19]. Our intention in using retransmission timers is not to retransmit packets, which would be pointless given the reliable underlying TCP pipe. Instead, we use retransmission timers to sense temporary congestion and significant deviations from regular round-trip times, so that we can adaptively change the traffic distribution among multiple circuits. If a node along a particular circuit suffers from congestion to the degree that RTT values exceed the calculated RTO limits, we temporarily reduce the rate of traffic transferred over that circuit.

In analogy with TCP and its congestion window management, we implemented a dynamic traffic distribution algorithm similar to the coupled congestion control mechanism for MultiPath TCP described in a previous study [12]. In that algorithm, multiple sub-flows share a common congestion window whose size is denoted $cwnd_{tot}$. Each sub-flow $i$ rate-limits its transmission to its share, $cwnd_i$, of this combined congestion window. Using the RTT estimates made for each of the $n$ sub-flows, the parameter $\alpha$ is calculated according to Equation 1 (with $mss_i$ the maximum segment size of sub-flow $i$).

$$\alpha = cwnd_{tot} \cdot \frac{\max_i \left( \dfrac{cwnd_i \cdot mss_i^2}{RTT_i^2} \right)}{\left( \sum_{i=1}^{n} \dfrac{cwnd_i \cdot mss_i}{RTT_i} \right)^2} \qquad (1)$$

Upon receipt of an acknowledgment packet, the congestion window of an individual sub-flow grows by the value given in Equation 2.

$$\min\left( \frac{\alpha}{cwnd_{tot}}, \; \frac{1}{cwnd_i} \right) \qquad (2)$$

If an observed RTT value exceeds the RTO estimate, this signals the sender about congestion experienced on the path of that particular sub-stream.
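The two building blocks described above, RTO tracking and the coupled window update of Equations 1 and 2, can be sketched in Python as follows. This is an illustrative sketch rather than the code used in the experiments. All names are hypothetical. The estimator uses the commonly cited gains of 1/8 and 1/4 with RTO = SRTT + 4 x RTTVAR, which is an assumption about the exact constants, and the initial variance and the one-cell floor on a halved window are likewise assumptions.

```python
class RttEstimator:
    """Jacobson-style smoothed RTT and RTO tracking (constants assumed)."""

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def update(self, sample):
        """Fold in one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2.0
        else:
            err = sample - self.srtt
            self.srtt += err / 8.0
            self.rttvar += (abs(err) - self.rttvar) / 4.0
        return self.rto()

    def rto(self):
        return self.srtt + 4.0 * self.rttvar


def coupled_alpha(cwnd, mss, rtt):
    """Equation 1: aggressiveness factor shared by all sub-flows."""
    cwnd_tot = sum(cwnd)
    best = max(c * m * m / (r * r) for c, m, r in zip(cwnd, mss, rtt))
    total = sum(c * m / r for c, m, r in zip(cwnd, mss, rtt))
    return cwnd_tot * best / (total * total)


def on_ack(cwnd, i, alpha):
    """Equation 2: grow sub-flow i's window on each acknowledgment."""
    cwnd[i] += min(alpha / sum(cwnd), 1.0 / cwnd[i])


def on_rtt_exceeds_rto(cwnd, i, floor=1.0):
    """Halve sub-flow i's window when an observed RTT exceeds its RTO."""
    cwnd[i] = max(floor, cwnd[i] / 2.0)
```

For two identical sub-flows (equal windows, segment sizes, and RTTs), coupled_alpha evaluates to 0.5, so each acknowledgment grows a sub-flow by half of what an uncoupled single flow with the combined window would gain. The sub-flows together are then no more aggressive than one flow.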
When the RTO is exceeded in this way, we halve the size of the sub-stream congestion window, $cwnd_i$, as described in [12]. Although in the TCP context a retransmission timeout means packet loss due to congestion along the path, in our case it serves an entirely different purpose. By calculating dynamic RTO values from the ongoing sampling of RTT values as described in [19], we essentially keep track of the regular latencies observed on a circuit. If a significant deviation from these latencies is observed, we reduce the volume of traffic taking this circuit according to our dynamic traffic distribution algorithm by shrinking the congestion window of the sub-stream. The contribution of a sub-stream to the overall size of the receiver window is thus adjusted according to its throughput. This approach aims to prevent cases where the receiver window is completely filled and high-throughput sub-streams stop and wait for a slow sub-stream.

III. EVALUATION

A. Experimental Setup

In our experiments, we use a unidirectional scenario where client-side software uploads a file to a web server over the Tor network. The client-side software, which is a Tor controller, establishes multiple socket connections to the web server and attaches each resulting stream to a separate circuit. It also receives acknowledgment messages over the multiple circuits, calculates RTT values on each circuit, and dynamically distributes traffic over the streams as described in the previous section.

We experimented with file sizes of 1.5 MiB, representing bulk file transfers, and 300 KiB, representing a typical HTTP web transaction. To investigate how multi-path performance gains vary with regional differences in node diversity, we installed our client software on our campus in Reno, USA, and at several Amazon EC2 sites in Singapore, Ireland, and Northern Virginia. Our web server, fixed at the Emulab Utah facility, listens on multiple ports and combines the packets received from these ports into a single logical connection.

Fig. 5. Time to Download a 1.5 MiB file with multiple circuits

Fig. 6. Time to Download a 300 KiB file with multiple circuits

B. Test Procedure

We conducted 33 tests on December 18th and December 29th, 2011, and on January 1st, 2012. For each test, the client Tor proxy establishes 10 circuits, selects 4 of them, and then starts uploading the file to our web server in Utah over these circuits. Once the upload is complete, the total download time, throughput, and RTT statistics are recorded. Then, the client proxy tears down all 4 circuits, waits for 2 minutes, and re-establishes 3 of these circuits using the same relay nodes, whose information was kept from the previous run. The procedure is repeated with 2 circuits and with a single circuit as well. Finally, one of the four tests is repeated. If the performance of the repeated test, e.g., the test with 3 circuits, matches the earlier performance within a reasonable margin, we consider this particular set of tests reliable. Otherwise, we discard all results for this set of tests because of possible inconsistency within the set, where the transmission with n circuits may have experienced significantly different congestion conditions from those experienced during the test with n-1 circuits.

C. Overall Data Transfer Performance

In Figure 5, we plot the empirical cumulative distribution function (CDF) of bulk data transfer durations for files of size 1.5 MiB between our campus and the Emulab Utah facility over the Tor network.
As shown in Figure 5, dynamic traffic distribution over multiple circuits significantly reduces the overall download time compared to the single-circuit case. Download times improve significantly in all multi-circuit cases, and the improvements with 3 and 4 circuits are significantly better than with 2 circuits. However, the improvements with 3 and 4 circuits are relatively close, which may hint that using 4 circuits could be overkill considering the cost of establishing additional circuits.

With smaller files, the performance improvements are even more pronounced than for bulk file transfers. Again, the improvements achieved with 3 and 4 circuits behave similarly, and download times with 3 and 4 circuits differ significantly from those with 2 circuits.

We note that the overall download times in our test cases are significantly larger than the median values shared on the Tor Metrics Portal [20], although they stay below timeout and failure cases. Since we compare our results against the single-circuit case, which does not include any RTT estimation or traffic distribution algorithm, we believe that our results reflect the latencies usually experienced by Tor users [21].

TABLE I
SPEEDUP FACTORS FOR DOWNLOAD TIMES (MEAN/MOE WITH 90% CI)

File Size   4 Circuits   3 Circuits   2 Circuits
300 KiB     3.92/1.45    3.01/0.88    1.85/0.34
1.5 MiB     2.49/0.41    2.28/0.54    1.73/0.25

In Table I, we share the average throughput increases for varying numbers of circuits, along with the margin of error (MoE) for a 90% confidence interval. Throughput improvements for small files are significantly better than for large file uploads, although they are more prone to performance fluctuations.

D. Receiver Window Size

Next, we investigate the trade-off between increasing throughput via multi-path routing and the consequent increase in receiver window size.
In our tests, we do not limit the receiver window size at the web server, so that we achieve the maximum throughput improvement without being affected by the slow-down effects of low-performing circuits. However, we calculate the average receiver window size required to achieve this improvement by applying Equation 3 to the RTT statistics gathered from our experiments.

\[ rbuf = 2 \left( \sum_{i=1}^{n} BW_i \cdot RTT_{max} \right) \tag{3} \]

As noted by the authors of [12], these values apply to cases where fast-retransmit timers are in use, so that lagging packets are retransmitted quickly without waiting for a retransmission timeout. This condition corresponds to a case where a cell in flight on a low-performing circuit is fast-retransmitted over a high-performing circuit after a time interval. This mechanism would require implementing an actual congestion window for Tor proxies and exit nodes, which would be overkill since Tor relies on upper-level reliability and congestion control mechanisms for such tasks. In that sense, the given values can be considered an optimistic lower-bound benchmark that does not completely reflect the growth of receiver window sizes.

TABLE II
RECEIVER WINDOW SIZE (KIB) (MEAN/MOE WITH 90% CI)

File Size    4 circuits      3 circuits      2 circuits
300 KiB      161.11/35.25    127.51/36.63    92.01/21.38
1.5 MiB      283.85/88.97    254.20/69.04    251.07/67.04

As seen in Table II, the increase in receiver window size can exceed 250 KiB for bulk data transfers and 160 KiB for short interactive transfers. These values correspond to significant increases in overall memory usage at exit nodes for traffic leaving the Tor network. Since general Web traffic is mostly asymmetric for HTTP and relatively balanced for peer-to-peer (P2P) traffic, we expect that the increase in receiver buffer sizes will most likely impact the overall memory usage of Onion Proxies running at the client side rather than that of Tor exit nodes. The receiver window size grows with the number of circuits used in concurrent transmission.

These findings underline the importance of intelligent circuit establishment mechanisms and dynamic circuit management schemes, as previously proposed by several researchers [4], [7], [6]. With dynamic circuit management mechanisms, low-performing circuits can be torn down to prevent unacceptable growth of the receiver window size (or adverse effects of slow circuits on other circuits when the receiver buffer size is limited).

E. Regional Differences

Finally, we analyze how performance improvements vary with the location of Onion Proxies.
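Returning briefly to Equation 3 from Section D, the following minimal sketch evaluates it for one set of circuits. It assumes that BW_i is the bandwidth of circuit i in bytes per second and that RTT_max is the largest round-trip time among the circuits in seconds; the function name and the sample figures are hypothetical and are not taken from our measurements.

```python
def receiver_buffer_bytes(bandwidths, rtts):
    """Evaluate rbuf = 2 * sum(BW_i) * RTT_max for one set of circuits.

    bandwidths: per-circuit bandwidth in bytes per second (assumed unit).
    rtts: per-circuit round-trip time in seconds (assumed unit).
    """
    if not bandwidths or len(bandwidths) != len(rtts):
        raise ValueError("need one bandwidth and one RTT per circuit")
    return 2 * sum(bandwidths) * max(rtts)


# Hypothetical three-circuit example (illustrative values only).
bandwidths = [50_000, 30_000, 20_000]  # bytes per second
rtts = [0.8, 1.2, 0.5]                 # seconds
rbuf_kib = receiver_buffer_bytes(bandwidths, rtts) / 1024
```

Because the slowest circuit sets RTT_max for the whole sum, dropping a single high-latency circuit can shrink the required buffer more than dropping a low-bandwidth one, which is consistent with the case for dynamic circuit management made in Section D.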
We repeated the test cases 7 times for each location where our Tor controller client code is deployed: our campus in Nevada, USA, and Amazon EC2 [14] sites in Ireland and Singapore. Our web server is fixed at the Emulab facility in Utah, USA. Since most Tor relay nodes are deployed in the USA and Europe, we expect multi-path Tor to perform better in these regions by exploiting the underlying diversity. In line with our expectations, tests with 300 KiB files resulted in better throughput speedups for these regions with high node diversity, as seen in Table III.

TABLE III
REGIONAL DIFFERENCES IN SPEEDUP FACTOR (300 KIB) (MEAN/MOE WITH 90% CI)

EC2 Region    4 circuits    3 circuits    2 circuits
USA           4.32/2.52     3.66/2.59     2.31/0.99
Ireland       4.54/1.45     3.02/1.82     1.78/0.51
Singapore     2.46/0.75     2.06/0.51     1.32/0.32

IV. FUTURE DIRECTIONS

Our experiment setup has several limitations. First, our procedure of repeating a single test case with a varying number of circuits took a significant amount of time due to Tor network performance and large performance fluctuations. As a result, we were able to conduct only 33 sets of experiments. We would like to conduct more experiments on the live Tor network as well as on an isolated Tor experiment testbed [21] to further verify our findings.

In our multi-circuit scenarios, circuits do not share a common exit node due to our choice of a black-box testing approach. This would presumably have an adverse effect on throughput improvements, since exit nodes may be geographically separated by large distances. However, the skewed geographic distribution of Tor relay nodes limits the impact of this experimental shortcoming. In fact, our analysis showed that 76 percent of the time the selected exit nodes happened to be within the same country. Nevertheless, our findings still show significant throughput improvements over single-circuit transmission.
In that regard, these improvements should be considered lower bounds on the full potential of achievable performance gains. A similar criticism can be made of our memory usage analysis of the multi-path scheme. However, these possible side effects on the remaining 24 percent of test cases are most certainly offset by our assumption of fast retransmission of delayed cells over fast circuits.

Another direction for future work is the implementation of congestion-aware circuit establishment, as proposed by Wang et al. [6]. In that scheme, the Tor proxy chooses the best among pre-built circuits and switches between circuits if necessary. Similarly, multi-path Tor can benefit significantly from dynamic management and maintenance of circuits. Such management is necessary for selecting compatible circuits, so that the growth in receiver window size stays within an acceptable range. Otherwise, the overall increase in memory usage may expose Tor proxies and exit nodes to additional risks of attacks targeting service interruption and performance degradation.

We also think that our method of combined congestion window management for multiple streams, especially the increment function given in Equation 2, could be improved by following the recently popularized CUBIC TCP alternative, which quickly restores window sizes to their previous higher levels after a congestion event [22].

We would also like to analyze the robustness of multi-path Tor against congestion attacks as described in previous studies [23]. Since multi-path Tor improves overall throughput and download times, these improvements can provide more leeway for introducing artificial latencies and dummy packets to defend against timing attacks [24] and traffic fingerprinting analysis. Although it would complicate relay node traffic forwarding, very interesting mechanisms against timing attacks and traffic shape analysis can be developed on top of leaky-pipe techniques [3] and multi-path routing.

Recently, many researchers have proposed cloud-based anonymous communication mechanisms [25], [26]. Cloud service facilities are usually connected to the rest of the world via tens of Internet Service Providers [27], [14]. Multi-path Tor mechanisms are capable of exploiting this diversity to achieve better load balancing as well as AS-aware and congestion-aware path selection [16], [6], [4]. We would like to extend our study to investigate how well multi-path routing can exploit the diversity promised by cloud infrastructure.

V. CONCLUSIONS

In this work, inspired by recent Multipath TCP studies [9], [8], [10], [11], [12], we applied dynamic traffic distribution mechanisms to the Tor anonymizer software. Our experiments with bulk data files and short interactive data transactions showed that multi-path routing has great potential for improving overall throughput and decreasing the latencies experienced by users on the Tor network. Our findings also revealed that multi-path routing is able to exploit the underlying regional node diversity effectively. As pointed out by a recent study [13], the anonymity implications of multi-path routing are relatively small and do not expose the Tor network to additional risk of circuit compromise. Our work showed that, despite its great potential for improving Tor network performance, multi-path routing may introduce significant buffering costs on Tor proxies and exit nodes unless intelligent circuit management and path selection algorithms are deployed.

Current research on multi-path TCP mostly focuses on improving end-host performance by exploiting multiple interfaces on hosts, without giving much consideration to the path diversity provided by the underlying network. This limitation can mostly be attributed to the limited capabilities of the underlying routing protocols in providing path diversity.
(We would like to exclude local data-center traffic and high-performance routing areas from this discussion.) The source-based routing mechanisms of the Tor network offer a flexible working environment for researchers working on multi-path routing. In that sense, we believe that the Tor network can be a great incubator for research addressing the open questions in the areas of multi-path routing and anonymous communication.

REFERENCES

[1] I. Goldberg, D. Wagner, and E. Brewer, "Privacy-enhancing technologies for the internet," in Compcon '97. Proceedings, IEEE, Feb. 1997, pp. 103-109.
[2] "Tor project: Anonymity online," https://www.torproject.org/.
[3] R. Dingledine, N. Mathewson, and P. Syverson, "Tor: the second-generation onion router," in Proceedings of the 13th conference on USENIX Security Symposium - Volume 13, ser. SSYM '04. Berkeley, CA, USA: USENIX Association, 2004, pp. 21-21.
[4] R. Snader and N. Borisov, "A tune-up for tor: Improving security and performance in the tor network," in Proceedings of the Network and Distributed Security Symposium - NDSS '08, February 2008.
[5] R. Dingledine and S. J. Murdoch, "Performance improvements on tor or, why tor is slow and what we're going to do about it," March 2009, www.torproject.org/press/presskit/2009-03-11-performance.pdf.
[6] T. Wang, K. Bauer, C. Forero, and I. Goldberg, "Congestion-aware path selection for tor," in Proceedings of 16th International Conference on Financial Cryptography and Data Security, February 2012.
[7] M. AlSabah, K. S. Bauer, I. Goldberg, D. Grunwald, D. McCoy, S. Savage, and G. M. Voelker, "Defenestrator: Throwing out windows in tor," in PETS, ser. Lecture Notes in Computer Science, vol. 6794. Springer, 2011, pp. 134-154.
[8] K. Rojviboonchai, T. Osuga, and H. Aida, "R-m/tcp: Protocol for reliable multi-path transport over the internet," in Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 1, ser. AINA '05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 801-806.
[9] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, "Improving datacenter performance and robustness with multipath tcp," SIGCOMM Comput. Commun. Rev., vol. 41, pp. 266-277, Aug. 2011.
[10] D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley, "Design, implementation and evaluation of congestion control for multipath tcp," in Proceedings of the 8th USENIX conference on Networked systems design and implementation, ser. NSDI '11. Berkeley, CA, USA: USENIX Association, 2011, pp. 8-8.
[11] A. Ford, C. Raiciu, and M. Handley, "Tcp extensions for multipath operation with multiple addresses," draft-ford-mptcp-multiaddressed-03, September 2010.
[12] S. Barre, C. Paasch, and O. Bonaventure, "Multipath tcp: from theory to practice," in Proceedings of the 10th international IFIP TC 6 conference on Networking - Volume Part I. Springer-Verlag, 2011, pp. 444-457.
[13] M. AlSabah, K. Bauer, T. Elahi, and I. Goldberg, "Tempura: Improved tor performance with multipath routing," Centre for Cryptographic Research, University of Waterloo, Tech. Rep., August 2011.
[14] "Amazon elastic computing cloud - Amazon EC2," http://aws.amazon.com/ec2/.
[15] N. Feamster and R. Dingledine, "Location diversity in anonymity networks," in Proceedings of the 2004 ACM workshop on Privacy in the electronic society, ser. WPES '04. New York, NY, USA: ACM, 2004, pp. 66-76.
[16] M. Edman and P. Syverson, "As-awareness in tor path selection," in Proceedings of the 16th ACM conference on Computer and communications security, ser. CCS '09. New York, NY, USA: ACM, 2009, pp. 380-389.
[17] M. Hibler, R. Ricci, L. Stoller, J. Duerig, S. Guruprasad, T. Stack, K. Webb, and J. Lepreau, "Large-scale virtualization in the emulab network testbed," in USENIX 2008 Annual Technical Conference on Annual Technical Conference. Berkeley, CA, USA: USENIX Association, 2008, pp. 113-128.
[18] N. Mathewson, "Tc: A tor control protocol (version 1)," https://gitweb.torproject.org/torspec.git/blob/head:/control-spec.txt.
[19] V. Paxson and M. Allman, "Computing tcp's retransmission timer," RFC 2988, November 2000.
[20] "Tor metrics portal," http://metrics.torproject.org/.
[21] K. Bauer, M. Sherr, D. McCoy, and D. Grunwald, "Experimentor: a testbed for safe and realistic tor experimentation," in Proceedings of the 4th conference on Cyber security experimentation and test, ser. CSET '11. Berkeley, CA, USA: USENIX Association, 2011, pp. 7-7.
[22] S. Ha, I. Rhee, and L. Xu, "Cubic: a new tcp-friendly high-speed tcp variant," SIGOPS Oper. Syst. Rev., vol. 42, pp. 64-74, July 2008.
[23] N. S. Evans, R. Dingledine, and C. Grothoff, "A practical congestion attack on tor using long paths," in Proceedings of the 18th conference on USENIX security symposium, ser. SSYM '09. Berkeley, CA, USA: USENIX Association, 2009, pp. 33-50.
[24] V. Shmatikov and M.-H. Wang, "Timing analysis in low-latency mix networks: attacks and defenses," in Proceedings of ESORICS, 2006, pp. 18-33.
[25] R. Mortier, A. Madhavapeddy, T. Hong, D. Murray, and M. Schwarzkopf, "Using Dust Clouds to Enhance Anonymous Communication," 2010. [Online]. Available: http://anil.recoil.org/papers/2010-iswp-dustclouds.pdf
[26] N. Jones, M. Arye, J. Cesareo, and M. J. Freedman, "Hiding amongst the clouds: A proposal for cloud-based onion routing," in USENIX Workshop on Free and Open Communications on the Internet - FOCI, August 2011.
[27] "Windows azure," http://www.windowsazure.com.