Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution
|
|
- Cornelius Baker
- 5 years ago
- Views:
Transcription
1 Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution Jun (Jim) Xu Networking and Telecommunications Group College of Computing Georgia Institute of Technology Joint work with: Abhishek Kumar, Jun Li, Minho Sung, Ellen Zegura, Georgia Tech. Olivier Spatscheck, Jia Wang, AT&T Labs - Research Li Li, Bell Labs Networking area Qualifying Exam: Oral Presentation
2 Outline Motivation and introduction Our data streaming works Space-Code Bloom Filter per flow traffic measurement without per flow state Estimating flow size distribution without per flow state Data streaming techniques for IP traceback Data streaming techniques for unstructured P2P routing Theory and philosophy of network data streaming
3 Motivation for network data streaming Problem: to monitor network links for events such as Elephant flows, Number of distinct flows, average flow size Flow size distribution Per-flow traffic volume Example applications: Traffic engineering Network pricing and billing, Anomaly detection, Network queue management
4 Motivation for network data streaming Monitoring at high speed is challenging packets arrive every 25ns on a 40 Gbps (OC-768) link has to use SRAM for per-packet processing per-flow state too large to fit into SRAM Traditional solution using sampling: Sample a small percentage of packets Process these packets using per-flow state stored in slow memory (DRAM) Using scaling to recover the original statistics, hence high inaccuracy with low sampling rate Fighting a losing cause: higher link speed requires lower sampling rate Conclusion: naive sampling based approach is doomed
5 Network data streaming a smarter solution Computational model: process a long stream of data (packets) in one pass using a small (yet fast) memory Problem to solve: need to answer some queries about the stream at the end or continuously Trick: try to remember the most important information about the stream pertinent to the queries Application: provide a framework for solving many network monitoring problems Difference with data streaming in DB/theory: need to run orders of magnitude faster (i.e., wisdom under fire) Status: extremely young and promising area no book, few papers, and a lot of confusion
6 Data Streaming Algorithm for Estimating Flow Distribution
7 Data Streaming Algorithm for Estimating Flow Distribution Problem: To estimate the probability distribution of flow sizes. In other words, for each positive integer i, estimate n i, the number of flows of size i. Applications: Traffic characterization, network billing/accounting, anomaly detection, etc. Importance: The mother of many other flow statistics such as average flow size (first moment) Definition of a flow: All packets with the same flow-label. The flow-label can be defined as any combination of fields from the IP header, e.g., <Source IP, source Port, Dest. IP, Dest. Port, Protocol>.
8 Related work: Inverting sampled packet traces. Current measurement boxes only collect sampled traces. The approach is to invert the sampled distribution to obtain the actual distribution [Duffield, Lund, & Thorup 2003]. High estimation errors due to low sampling rates. Theoretical limitations to inverting sampled traffic [Hohn & Veitch, IMC 2003].
9 Our approach: network data streaming Main idea: process each and every packet using a fast SRAM to skim most important information Design philosophy: Lossy data structure + Bayesian statistics = Accurate streaming Information loss is unavoidable: (1) memory very small compare to the data stream (2) too little time to put data into the right place Control the loss so that Bayesian statistical techniques such as Maximum Likelihood Estimation can still recover a decent amount of information.
10 Architecture of our Solution System Model Packet stream Header Header Header 1. Update Online Streaming Module 2. Raw streaming result Offline Processing Module 3. Flow distribution
11 Architecture of our Solution Lossy data structure Measurement proceeds in epochs (e.g. 100 seconds). Maintain an array of counters in fast memory (SRAM). For each packet, a counter is chosen via hashing, and incremented. No attempt to detect or resolve collisions. Data collection is lossy (erroneous), but very fast. Each 32-bit counter only uses 9-bit of SRAM (due to [Ramabhadran & Varghese 2003]) At the end of the epoch, the counter array is paged to disk.
12 The shape of the Counter Value Distribution 1e Actual flow distribution m=1024k m=512k m=256k m=128k frequency flow size The distribution of flow sizes and raw counter values (both x and y axes are in log-scale). m = number of counters.
13 Architecture of our solution Bayesian statistics The counter array is processed to obtain the Counter Value Distribution. {m o =# counters with value 0, y 1 =# counters with value 1,, y z =# counters with value z.} Use Bayesian statistics to derive the following quantities from this data: The total no. of flows, n. The total no. of flows with exactly one packet, n 1. The flow distribution φ.
14 Estimating n and n 1 Let total number of counters be m. Let the number of value-0 counters be m 0 Then ˆn = m ln(m/m 0 ) Let the number of value-1 counters be y 1 Then ˆn 1 = y 1 eˆn/m Generalizing this process to estimate n 2, n 3, and the whole flow size distribution will not work Solution: joint estimation using Expectation Maximization
15 Expectation Maximization in a nutshell Let Φ be the parameters we would like to estimate Let Y be the observation Goal: find Φ that maximizes the likelihood function P (Φ Y), i.e., the Maximum Likelihood Estimator. Problem: P (Φ Y), viewed as f(φ, Y), can be hard to evaluate and maximize Solution: P (Φ Y), viewed as g(φ, Y, Γ), is often easy to compute and maximize as a function of Φ. Also E[Γ Φ old, Y] is often easy to evaluate as a function of Γ Obtain Γ = E[Γ Φ old, Y] Maximize g(φ, Y, Γ) to obtain Φ new Repeat the above two steps until the algorithm converges.
16 Estimating the entire distribution, φ, using EM Begin with a guess of the flow distribution, φ ini. Based on this φ ini, compute the various possible ways of splitting a particular counter value and the respective probabilities of such events. This allows us to compute a refined estimate of the flow distribution φ new. Repeating this multiple times allows the estimate to converge to a local maximum. This is an instance of Expectation maximization.
17 Estimating the entire flow distribution an example For example, a counter value of 3 could be caused by three events: 3 = 3 (no hash collision); 3 = (a flow of size 1 colliding with a flow of size 2); 3 = (three flows of size 1 hashed to the same location) Suppose the respective probabilities of these three events are 0.5, 0.3, and 0.2 respectively, and there are 1000 counters with value 3. Then we estimate that 500, 300, and 200 counters split in the three above ways, respectively. So we credit 300 * * 3 = 900 to n 1, the count of size 1 flows, and credit 300 and 500 to n 2 and n 3, respectively.
18 Evaluation Before and after running the Estimation algorithm 1e Actual flow distribution Raw counter values Estimation using our algorithm 1000 frequency flow size
19 Sampling vs. array of counters Web traffic Actual flow distribution Inferred from sampling,n=10 Inferred from sampling,n=100 Estimation using our algorithm frequency flow size
20 Sampling vs. array of counters DNS traffic Actual flow distribution Inferred from sampling,n=10 Inferred from sampling,n=100 Estimation using our algorithm frequency flow size
21 Space Code Bloom Filter for per-flow measurement
22 Space Code Bloom Filter for per-flow measurement Problem: To keep track of the total number of packets belonging to each flow at a high speed link. Network billing/accounting and anomaly detec- Applications: tion Challenges: 1. Traditional techniques such as sampling will not work at high link speed. 2. Existing streaming algorithms only track the identity of the elephants [Estan and Varghese 2002]
23 Architecture and system model of SCBF Measurement proceeds in epochs (e.g. 10 second). Maintain an aggregate synopsis data-structure. Update the data-structure on every packet arrival. Write-only data structure fast updates, low hardware complexity. Copies of the synopsis are paged to disk periodically. 0. New packet arrival Header 1. Process header CPU 2. Write to SCBF SCBF Module SRAM Module 1 SRAM Module 2 3. Paging to disk once "full" Persistent Storage 4. Query 5.Answer
24 Architecture and system model of SCBF Queries provide a flow-label and ask for its size. Obtain raw data from the data-structure and infer the approximate flow size from the raw data (a table lookup process) This provides approximate estimates that have low relative error with high probability.
25 Balls in Bins
26 Balls in Bins Packet Processor
27 Balls in Bins Packet Processor Pick one bin (uniformaly at random)
28 Balls in Bins Packet Processor Mark Bin as occupied
29 Balls in Bins Packet Processor Mark Bin as occupied (by setting the corresponding bit to 1)
30 Balls in Bins - Estimation # Occupied bins =
31 Balls in Bins - Estimation #packets=2 #packets=3 #packets=4 pdf pdf pdf # bins # bins # bins # Occupied bins =
32 Balls in Bins - Estimation #packets=2 #packets=3 #packets=4 pdf pdf pdf # bins # bins Maximum Likelihood estimate # Occupied bins = # bins
33 Using Bloom Filters to Implement Bins Insert (label) h 1, h 2,..., h K h 1, h 2,..., h K Yes Query (label) All 1? No
34 Use of multiple Bloom Filters to implement Multiple Bins (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K Bloom Filter 1 Bloom Filter 2 Bloom Filter L Insert each ball (packet) in one of l bins (Bloom filters), chosen uniformly at random. While querying, count number of bins (Bloom filters) in which one or more balls (packets) were inserted and estimate the size of the flow from this count.
35 Space Code Bloom Filter - Multiple bins in the same Bloom Filter (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K Bloom Filter 1 Bloom Filter 2 Bloom Filter L
36 Querying an SCBF Query (label) Yes No Yes Count = (# Yes) (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K Bloom Filter 1 Bloom Filter 2 Bloom Filter L
37 Extending the range of SCBF An SCBF of size l fills up after about (l ln l) insertions (result from the classic coupon collector s problem). So we can t use it for estimating flow sizes much larger than (l ln l). Objective To provide a constant relative error, irrespective of flow-size. Solution Use multiple SCBFs operating at different resolutions. Small flows will be captured by SCBFs of finer resolution. Large flows will be captured by SCBFs of coarse resolutions.
38 Multi-Resolution SCBF (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K Bloom Filter 1 Bloom Filter 2 Bloom Filter L SCBF 1 Sampling rate 1 (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K Bloom Filter 1 Bloom Filter 2 Bloom Filter L SCBF 2 Sampling rate 1/4 (1) (1) (2) (2) h 1, h 2,..., h (2) 1, h2,..., h (1) (l) (l) (l) K h K h 1, h2,..., h K SCBF r Sampling rate 1/4 r 1 Bloom Filter 1 Bloom Filter 2 Bloom Filter L
39 Original vs. estimated flow size. Note that both axes are on logscale. Accuracy of SCBF estimated original Estimated flow length (packets) Original flow length (packets)
40 Accuracy of SCBF using Maximum Likelihood Estimation flows >=10 --> <-- flows >=5 P[relative err < e] (CDF) all flows flows >=2 flows >=3 flows >=5 flows >=10 flows >=100 flows >=3 ---> flows >=100 --> <-- all flows e CDF of relative error for flows of various size
41 Performance of SCBF - complexities and accuracy Computational complexity compute 5 hash-functions and write 5 bits per packet. Space complexity 4 bits of storage required for each packet. Can operate at OC-768 (40 Gbps) with 5 ns SRAM. More than 80% responses are within ±25% of the actual value.
42 Data streaming techniques for IP traceback Efficient and Scalable Query Routing for Unstructured Peer-to-Peer Networks Abhishek Kumar Jun (Jim) Xu Ellen W. Zegura Georgia Institute of Technology
43 Search in Unstructured Peer-to-Peer Networks Flooding is the naive solution. Random walk reduces duplicate queries. Replication of content indices improves both solutions. However, such replication is expensive and not efficient. How about replicating probabilistic information?
44 Search using Probabilistic information Noise Low information Medium information Strong information Noise Content Host (Origin of information) Noise
45 Exponential Decay Bloom Filter EDBF A data structure for succinct representation of probabilistic information. Number if bits used for representing a probability value is proportional to the value. Decay of information with network distance. Easy set union operation.
46 Routing using EDBF Nodes advertise their own content using all bits (strng infromation). Information from neighboring nodes is combined into one EDBF and decayed by a constant factor. The union of this decayed information with the local advertisement is then propagated to neighbors. Efficient propagation of probabilistic information. Delta compression of updates is easy and results in huge savings.
47 Simulation Results % of queries answered SQR OHR Random walk flooding % of queries answered SQR OHR Random walk flooding Hop Limit (a)flat topologies e+06 Hop Limit (b) Flat topologies, log scale on x-axis.
48 Simulation Results % of queries answered SQR+GIA GIA Ultrapeer Hop Limit (c) Hierarchical topologies, log scale on x- axis.
49 Distributed coordinated data streaming a new paradigm A network of streaming nodes Every node is both a producer and a consumer of data streams Every node exchanges data with neighbors, streams the data received, and passes it on further From EE point of view: distributed source coding + distributed channel coding
50 Philosophy of network data streaming View it as special-purpose data compression Parametric or nonparametric statistics? View it as a channel design problem Future work: develop the filter theory for network data streaming
51 Contributions Data streaming algorithms for estimating flow size distribution and per-flow traffic volume, for performing better IP traceback, and for searching more efficiently in P2P networks A new methodology for network data streaming: Lossy data structure + Bayesian statistics = Accurate Streaming A new data streaming paradigm: distributed coordinated streaming Help lay the foundation of this new field.
52 Related publications Kumar, A., Xu, J., Zegura, E. Efficient and Scalable Query Routing for Unstructured Peer-to-Peer Networks, to appear in Proc. of IEEE Infocom Kumar, A., Sung, M., Xu, J., Wang, J. Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Distribution", in Proc. of ACM Sigmetrics 2004/IFIP WG 7.3 Performance 2004, Best Student Paper Award. Li, J., Sung, M., Xu, J., Li, L. Large-Scale IP Traceback in High-Speed Internet: Practical Techniques and Theoretical Foundation", in 2004 IEEE Symp. on Security & Privacy. Kumar, A., Xu, J., Wang, J., Spatschek, O., Li, L. Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement, in Proc. of IEEE Infocom 2004.
Network Data Streaming A Computer Scientist s Journey in Signal Processing
Network Data Streaming A Computer Scientist s Journey in Signal Processing Jun (Jim) Xu Networking and Telecommunications Group College of Computing Georgia Institute of Technology Joint work with: Abhishek
More informationLecture 2: Streaming Algorithms for Counting Distinct Elements
Lecture 2: Streaming Algorithms for Counting Distinct Elements 20th August, 2008 Streaming Algorithms Streaming Algorithms Streaming algorithms have the following properties: 1 items in the stream are
More informationOverlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma
Overlay and P2P Networks Unstructured networks Prof. Sasu Tarkoma 20.1.2014 Contents P2P index revisited Unstructured networks Gnutella Bloom filters BitTorrent Freenet Summary of unstructured networks
More informationScalable Hash-based IP Traceback using Rate-limited Probabilistic Packet Marking
TECHNICAL REPORT, COLLEGE OF COMPUTING, GEORGIA INSTITUTE OF TECHNOLOGY Scalable Hash-based IP Traceback using Rate-limited Probabilistic Packet Marking Minho Sung, Jason Chiang, and Jun (Jim) Xu Abstract
More informationOrigin-Destination Flow Measurement in High-Speed Networks
Origin-Destination Flow Measurement in High-Speed Networks Tao Li Shigang Chen Yan Qiao Department of Computer & Information Science & Engineering University of Florida, Gainesville, FL, USA Abstract An
More informationKNOM Tutorial Internet Traffic Matrix Measurement and Analysis. Sue Bok Moon Dept. of Computer Science
KNOM Tutorial 2003 Internet Traffic Matrix Measurement and Analysis Sue Bok Moon Dept. of Computer Science Overview Definition of Traffic Matrix 4Traffic demand, delay, loss Applications of Traffic Matrix
More informationMaking Gnutella-like P2P Systems Scalable
Making Gnutella-like P2P Systems Scalable Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker Presented by: Herman Li Mar 2, 2005 Outline What are peer-to-peer (P2P) systems? Early P2P systems
More informationOverlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma
Overlay and P2P Networks Unstructured networks Prof. Sasu Tarkoma 19.1.2015 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find
More informationPacket Doppler: Network Monitoring using Packet Shift Detection
Packet Doppler: Network Monitoring using Packet Shift Detection Tongqing Qiu, Nan Hua, Jun (Jim) Xu Georgia Tech Jian Ni, Hao Wang, Richard (Yang) Yang Yale University Presenter: Richard Ma December 10th,
More informationDesign and Performance Analysis of a DRAM-based Statistics Counter Array Architecture
Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture Chuck Zhao 1 Hao Wang 2 Bill Lin 2 Jim Xu 1 1 Georgia Institute of Technology 2 University of California, San Diego
More informationScalable overlay Networks
overlay Networks Dr. Samu Varjonen 1 Lectures MO 15.01. C122 Introduction. Exercises. Motivation. TH 18.01. DK117 Unstructured networks I MO 22.01. C122 Unstructured networks II TH 25.01. DK117 Bittorrent
More informationBig Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams
More informationA Framework for Efficient Class-based Sampling
A Framework for Efficient Class-based Sampling Mohit Saxena and Ramana Rao Kompella Department of Computer Science Purdue University West Lafayette, IN, 47907 Email: {msaxena,kompella}@cs.purdue.edu Abstract
More informationOptimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic
Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic Chandan Rama Reddy Microsoft Joint work with Derek Leonard and Dmitri Loguinov Internet Research Lab Department of Computer
More informationLesson n.11 Data Structures for P2P Systems: Bloom Filters, Merkle Trees
Lesson n.11 : Bloom Filters, Merkle Trees Didactic Material Tutorial on Moodle 15/11/2013 1 SET MEMBERSHIP PROBLEM Let us consider the set S={s 1,s 2,...,s n } of n elements chosen from a very large universe
More informationError Detection and Correction by using Bloom Filters R. Prem Kumar, Smt. V. Annapurna
Error Detection and Correction by using Bloom Filters R. Prem Kumar, Smt. V. Annapurna Abstract---Bloom filters (BFs) provide a fast and efficient way to check whether a given element belongs to a set.
More informationJoint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations
Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations Qi (George) Zhao Abhishek Kumar Jun (Jim) Xu College of Computing, Georgia Institute of Technology Abstract
More informationOverlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen
Overlay and P2P Networks Unstructured networks PhD. Samu Varjonen 25.1.2016 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find
More informationNew Directions in Traffic Measurement and Accounting. Need for traffic measurement. Relation to stream databases. Internet backbone monitoring
New Directions in Traffic Measurement and Accounting C. Estan and G. Varghese Presented by Aaditeshwar Seth 1 Need for traffic measurement Internet backbone monitoring Short term Detect DoS attacks Long
More informationECSE-6600: Internet Protocols Spring 2007, Exam 1 SOLUTIONS
ECSE-6600: Internet Protocols Spring 2007, Exam 1 SOLUTIONS Time: 75 min (strictly enforced) Points: 50 YOUR NAME (1 pt): Be brief, but DO NOT omit necessary detail {Note: Simply copying text directly
More informationDRAM is Plenty Fast for Wirespeed Statistics Counting
DRAM is Plenty Fast for Wirespeed Statistics Counting Bill Lin Electrical and Computing Engineering University of California, San Diego billlin@ece.ucsd.edu Jun (Jim) Xu College of Computing Georgia Institute
More informationCounter Braids: A novel counter architecture
Counter Braids: A novel counter architecture Balaji Prabhakar Balaji Prabhakar Stanford University Joint work with: Yi Lu, Andrea Montanari, Sarang Dharmapurikar and Abdul Kabbani Overview Counter Braids
More informationDesign and Performance Analysis of a DRAM-based Statistics Counter Array Architecture
Design and Performance Analysis of a DRAM-based Statistics Counter Array Architecture Chuck Zhao 1 Hao Wang 2 Bill Lin 2 1 1 Georgia Tech 2 UCSD October 2nd, 2008 Broader High-Level Question What are the
More informationMAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation. Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda
MAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda 1 Background Traffic monitoring is important to detect
More informationA Scalable Content- Addressable Network
A Scalable Content- Addressable Network In Proceedings of ACM SIGCOMM 2001 S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker Presented by L.G. Alex Sung 9th March 2005 for CS856 1 Outline CAN basics
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationHighly Compact Virtual Maximum Likelihood Sketches for Counting Big Network Data
Highly Compact Virtual Maximum Likelihood Sketches for Counting Big Network Data Zhen Mo Yan Qiao Shigang Chen Department of Computer & Information Science & Engineering University of Florida Gainesville,
More informationRouting, Routers, Switching Fabrics
Routing, Routers, Switching Fabrics Outline Link state routing Link weights Router Design / Switching Fabrics CS 640 1 Link State Routing Summary One of the oldest algorithm for routing Finds SP by developing
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #6: Mining Data Streams Seoul National University 1 Outline Overview Sampling From Data Stream Queries Over Sliding Window 2 Data Streams In many data mining situations,
More informationOne-Pass Streaming Algorithms
One-Pass Streaming Algorithms Theory and Practice Complaints and Grievances about theory in practice Disclaimer Experiences with Gigascope. A practitioner s perspective. Will be using my own implementations,
More informationScalable Enterprise Networks with Inexpensive Switches
Scalable Enterprise Networks with Inexpensive Switches Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises
More informationDRAM is Plenty Fast for Wirespeed Statistics Counting
ACM SIGMETRICS PERFORMANCE EVALUATION REVIEW, VOLUME 36, ISSUE 2, SEPTEMBER 2008 DRAM is Plenty Fast for Wirespeed Statistics Counting Bill Lin Electrical and Computing Engineering University of California,
More informationHighly Compact Virtual Counters for Per-Flow Traffic Measurement through Register Sharing
Highly Compact Virtual Counters for Per-Flow Traffic Measurement through Register Sharing You Zhou Yian Zhou Min Chen Qingjun Xiao Shigang Chen Department of Computer & Information Science & Engineering,
More informationSplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks
SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks Pericles Lopes Ronaldo A. Ferreira pericles@facom.ufms.br raf@facom.ufms.br College of Computing, Federal University of Mato Grosso
More informationData Streams. Everything Data CompSci 216 Spring 2018
Data Streams Everything Data CompSci 216 Spring 2018 How much data is generated every 2 minute in the world? haps://fossbytes.com/how-much-data-is-generated-every-minute-in-the-world/ 3 Data stream A potentially
More informationOne Memory Access Bloom Filters and Their Generalization
This paper was presented as part of the main technical program at IEEE INFOCOM 211 One Memory Access Bloom Filters and Their Generalization Yan Qiao Tao Li Shigang Chen Department of Computer & Information
More informationExtending the LAN. Context. Info 341 Networking and Distributed Applications. Building up the network. How to hook things together. Media NIC 10/18/10
Extending the LAN Info 341 Networking and Distributed Applications Context Building up the network Media NIC Application How to hook things together Transport Internetwork Network Access Physical Internet
More informationHint-based Routing in WSNs using Scope Decay Bloom Filters
Hint-based Routing in WSNs using Scope Decay Bloom Filters Xiuqi Li and Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 {xli,jie}@cse.fau.edu Jun
More informationCounter Braids: A novel counter architecture
Counter Braids: A novel counter architecture Balaji Prabhakar Balaji Prabhakar Stanford University Joint work with: Yi Lu, Andrea Montanari, Sarang Dharmapurikar and Abdul Kabbani Overview Counter Braids
More informationCACHE-OBLIVIOUS MAPS. Edward Kmett McGraw Hill Financial. Saturday, October 26, 13
CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Indexing and Machine Models Cache-Oblivious Lookahead Arrays Amortization
More informationOssification of the Internet
Ossification of the Internet The Internet evolved as an experimental packet-switched network Today, many aspects appear to be set in stone - Witness difficulty in getting IP multicast deployed - Major
More informationLow Latency via Redundancy
Low Latency via Redundancy Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, Scott Shenker Presenter: Meng Wang 2 Low Latency Is Important Injecting just 400 milliseconds
More informationCodes, Bloom Filters, and Overlay Networks. Michael Mitzenmacher
Codes, Bloom Filters, and Overlay Networks Michael Mitzenmacher 1 Today... Erasure codes Digital Fountain Bloom Filters Summary Cache, Compressed Bloom Filters Informed Content Delivery Combining the two
More informationData Centers. Tom Anderson
Data Centers Tom Anderson Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,
More informationSimply Top Talkers Jeroen Massar, Andreas Kind and Marc Ph. Stoecklin
IBM Research - Zurich Simply Top Talkers Jeroen Massar, Andreas Kind and Marc Ph. Stoecklin 2009 IBM Corporation Motivation and Outline Need to understand and correctly handle dominant aspects within the
More informationAlgorithms and Applications for the Estimation of Stream Statistics in Networks
Algorithms and Applications for the Estimation of Stream Statistics in Networks Aviv Yehezkel Ph.D. Research Proposal Supervisor: Prof. Reuven Cohen Motivation Introduction Cardinality Estimation Problem
More informationMASTER DEGREE COMPUTER SCIENCE COMPUTER SCIENCE AND NETWORKING. Peer to Peer Systems LAURA RICCI 2/5/2011
MASTER DEGREE COMPUTER SCIENCE COMPUTER SCIENCE AND NETWORKING Peer to Peer Systems LAURA RICCI 2/5/2011 1 WHY A P2P SYSTEMS COURSE? P2P applications have become extremely popular and currently contribute
More informationBuilding Efficient and Reliable Software-Defined Networks. Naga Katta
FPO Talk Building Efficient and Reliable Software-Defined Networks Naga Katta Jennifer Rexford (Advisor) Readers: Mike Freedman, David Walker Examiners: Nick Feamster, Aarti Gupta 1 Traditional Networking
More informationSCAN: Scalable Content routing for
ICC 2011@kyoto SCAN: Scalable routing for content-aware Networking Munyoung Lee, Kideok Cho, Kunwoo Park, Td T Ted Taekyoung Kwon, Yanghee Choi (mylee@mmlab.snu.ac.kr) Seoul National University 2011. 6.
More informationAdaptation of Real-time Temporal Resolution for Bitrate Estimates in IPFIX Systems
Adaptation of Real-time Temporal Resolution for Bitrate Estimates in IPFIX Systems Rosa Vilardi, Luigi Alfredo Grieco, Gennaro Boggia DEE - Politecnico di Bari - Italy Email: {r.vilardi, a.grieco, g.boggia}@poliba.it
More informationEarly Measurements of a Cluster-based Architecture for P2P Systems
Early Measurements of a Cluster-based Architecture for P2P Systems Balachander Krishnamurthy, Jia Wang, Yinglian Xie I. INTRODUCTION Peer-to-peer applications such as Napster [4], Freenet [1], and Gnutella
More informationC13b: Routing Problem and Algorithms
CISC 7332X T6 C13b: Routing Problem and Algorithms Hui Chen Department of Computer & Information Science CUNY Brooklyn College 11/20/2018 CUNY Brooklyn College 1 Acknowledgements Some pictures used in
More informationCollege of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA , USA
Int. J. High Performance Computing and Networking, Vol. 6, Nos. 3/4, 2 8 HR-SDBF: an approach to data-centric routing in WSNs Xiuqi Li* Department of Mathematics and Computer Science, University of North
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/6/2012 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 In many data mining
More informationChapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,
Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations
More informationComparing Alternative Approaches for Networking of Named Objects in the Future Internet
Comparing Alternative Approaches for Networking of Named Objects in the Future Internet Akash Baid, Tam Vu, Dipankar Raychaudhuri, Rutgers University, NJ, USA Motivation Increasing consensus on: Rethinking
More informationData Stream Mining. Tore Risch Dept. of information technology Uppsala University Sweden
Data Stream Mining Tore Risch Dept. of information technology Uppsala University Sweden 2016-02-25 Enormous data growth Read landmark article in Economist 2010-02-27: http://www.economist.com/node/15557443/
More informationTowards High-performance Flow-level level Packet Processing on Multi-core Network Processors
Towards High-performance Flow-level level Packet Processing on Multi-core Network Processors Yaxuan Qi (presenter), Bo Xu, Fei He, Baohua Yang, Jianming Yu and Jun Li ANCS 2007, Orlando, USA Outline Introduction
More informationDifferential Congestion Notification: Taming the Elephants
Differential Congestion Notification: Taming the Elephants Long Le, Jay Kikat, Kevin Jeffay, and Don Smith Department of Computer science University of North Carolina at Chapel Hill http://www.cs.unc.edu/research/dirt
More informationDoS Attacks. Network Traceback. The Ultimate Goal. The Ultimate Goal. Overview of Traceback Ideas. Easy to launch. Hard to trace.
DoS Attacks Network Traceback Eric Stone Easy to launch Hard to trace Zombie machines Fake header info The Ultimate Goal Stopping attacks at the source To stop an attack at its source, you need to know
More informationDistributed Queue Dual Bus
Distributed Queue Dual Bus IEEE 802.3 to 802.5 protocols are only suited for small LANs. They cannot be used for very large but non-wide area networks. IEEE 802.6 DQDB is designed for MANs It can cover
More informationPeer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today
Network Science: Peer-to-Peer Systems Ozalp Babaoglu Dipartimento di Informatica Scienza e Ingegneria Università di Bologna www.cs.unibo.it/babaoglu/ Introduction Peer-to-peer (PP) systems have become
More informationNetwork Routing - II Failures, Recovery, and Change
MIT 6.02 DRAFT Lecture Notes Spring 2009 (Last update: April 27, 2009) Comments, questions or bug reports? Please contact Hari Balakrishnan (hari@mit.edu) or 6.02-staff@mit.edu LECTURE 21 Network Routing
More informationPeer-to-Peer Internet Applications: A Review
Peer-to-Peer Internet Applications: A Review Davide Quaglia 01/14/10 Introduction Key points Lookup task Outline Centralized (Napster) Query flooding (Gnutella) Distributed Hash Table (Chord) Simulation
More informationImproving Sketch Reconstruction Accuracy Using Linear Least Squares Method
Improving Sketch Reconstruction Accuracy Using Linear Least Squares Method Gene Moo Lee, Huiya Liu, Young Yoon and Yin Zhang Department of Computer Sciences University of Texas at Austin Austin, TX 7872,
More informationClustering Lecture 5: Mixture Model
Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics
More informationIntroduction to Peer-to-Peer Systems
Introduction Introduction to Peer-to-Peer Systems Peer-to-peer (PP) systems have become extremely popular and contribute to vast amounts of Internet traffic PP basic definition: A PP system is a distributed
More informationImpact of Sampling on Anomaly Detection
Impact of Sampling on Anomaly Detection DIMACS/DyDan Workshop on Internet Tomography Chen-Nee Chuah Robust & Ubiquitous Networking (RUBINET) Lab http://www.ece.ucdavis.edu/rubinet Electrical & Computer
More informationReinforcement learning algorithms for non-stationary environments. Devika Subramanian Rice University
Reinforcement learning algorithms for non-stationary environments Devika Subramanian Rice University Joint work with Peter Druschel and Johnny Chen of Rice University. Supported by a grant from Southwestern
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/25/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 3 In many data mining
More informationDEPARTMENT OF EECS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz III. December 18, 2012
6.02 Fall 2012, Quiz 3 Page1of13 Name: DEPARTMENT OF EECS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.02 Fall 2012 Quiz III December 18, 2012 your section Section Time Recitation Instructor TA D 1 10-11 Victor
More informationDistributed Data Aggregation Scheduling in Wireless Sensor Networks
Distributed Data Aggregation Scheduling in Wireless Sensor Networks Bo Yu, Jianzhong Li, School of Computer Science and Technology, Harbin Institute of Technology, China Email: bo yu@hit.edu.cn, lijzh@hit.edu.cn
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/24/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 High dim. data
More informationAdvanced Algorithmics (6EAP) MTAT Hashing. Jaak Vilo 2016 Fall
Advanced Algorithmics (6EAP) MTAT.03.238 Hashing Jaak Vilo 2016 Fall Jaak Vilo 1 ADT asscociative array INSERT, SEARCH, DELETE An associative array (also associative container, map, mapping, dictionary,
More informationTowards Effective Packet Classification. J. Li, Y. Qi, and B. Xu Network Security Lab RIIT, Tsinghua University Dec, 2005
Towards Effective Packet Classification J. Li, Y. Qi, and B. Xu Network Security Lab RIIT, Tsinghua University Dec, 2005 Outline Algorithm Study Understanding Packet Classification Worst-case Complexity
More informationInter-AS routing. Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley
Inter-AS routing Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley Some materials copyright 1996-2012 J.F Kurose and K.W. Ross, All Rights Reserved Chapter 4:
More informationYouki Kadobayashi NAIST
Information Network 1 Routing (1) Image: Part of the entire Internet topology based on CAIDA dataset, using NAIST Internet viewer Youki Kadobayashi NAIST 1 The Routing Problem! How do I get from source
More informationTree-Based Minimization of TCAM Entries for Packet Classification
Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.
More informationDynamic Search Algorithm in P2P Networks
Dynamic Search Algorithm in P2P Networks Prabhudev.S.Irabashetti M.tech Student,UBDTCE, Davangere, Abstract- Designing efficient search algorithms is a key challenge in unstructured peer-to-peer networks.
More informationCS454/654 Midterm Exam Fall 2004
CS454/654 Midterm Exam Fall 2004 (3 November 2004) Question 1: Distributed System Models (18 pts) (a) [4 pts] Explain two benefits of middleware to distributed system programmers, providing an example
More information2. Modelling of telecommunication systems (part 1)
2. Modelling of telecommunication systems (part ) lect02.ppt S-38.45 - Introduction to Teletraffic Theory - Fall 999 2. Modelling of telecommunication systems (part ) Contents Telecommunication networks
More informationScalable P2P architectures
Scalable P2P architectures Oscar Boykin Electrical Engineering, UCLA Joint work with: Jesse Bridgewater, Joseph Kong, Kamen Lozev, Behnam Rezaei, Vwani Roychowdhury, Nima Sarshar Outline Introduction to
More informationCounter Braids: A novel counter architecture
Counter Braids: A novel counter architecture Balaji Prabhakar Balaji Prabhakar Stanford University Joint work with: Yi Lu, Andrea Montanari, Sarang Dharmapurikar and Abdul Kabbani Overview Counter Braids
More informationSCAN: Scalable Content Routing for Content-Aware Networking
: Scalable Routing for -Aware Networking Munyoung Lee, Kideok Cho, Kunwoo Park, Ted Taekyoung Kwon, and Yanghee Choi School of Computer Science and Engineering Seoul National University, Seoul, Korea Email:
More informationIntroduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far
Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing
More informationResearch Students Lecture Series 2015
Research Students Lecture Series 215 Analyse your big data with this one weird probabilistic approach! Or: applied probabilistic algorithms in 5 easy pieces Advait Sarkar advait.sarkar@cl.cam.ac.uk Research
More informationEnergy-Efficient Communication Protocol for Wireless Micro-sensor Networks
Energy-Efficient Communication Protocol for Wireless Micro-sensor Networks Paper by: Wendi Rabiner Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan Outline Brief Introduction on Wireless Sensor
More informationLecture 13: Link-state Routing. CSE 123: Computer Networks Alex C. Snoeren
Lecture 3: Link-state Routing CSE 23: Computer Networks Alex C. Snoeren Lecture 3 Overview Routing overview Intra vs. Inter-domain routing Link-state routing protocols 2 Router Tasks Forwarding Move packet
More informationCS155b: E-Commerce. Lecture 3: Jan 16, How Does the Internet Work? Acknowledgements: S. Bradner and R. Wang
CS155b: E-Commerce Lecture 3: Jan 16, 2001 How Does the Internet Work? Acknowledgements: S. Bradner and R. Wang Internet Protocols Design Philosophy ordered set of goals 1. multiplexed utilization of existing
More informationSIGCOMM 17 Preview Session: Network Monitoring
SIGCOMM 17 Preview Session: Network Monitoring Ying Zhang Software Engineer, Facebook Network monitoring is important! Performance Diagnose long delay/loss problems Utilization Traffic engineering Availability
More informationEARM: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems.
: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems. 1 K.V.K.Chaitanya, 2 Smt. S.Vasundra, M,Tech., (Ph.D), 1 M.Tech (Computer Science), 2 Associate Professor, Department
More informationQCN: Quantized Congestion Notification. Rong Pan, Balaji Prabhakar, Ashvin Laxmikantha
QCN: Quantized Congestion Notification Rong Pan, Balaji Prabhakar, Ashvin Laxmikantha Overview Description the QCN scheme Pseudocode available with Rong Pan: ropan@cisco.com Basic simulations Infinitely
More informationPeer-to-Peer Systems. Chapter General Characteristics
Chapter 2 Peer-to-Peer Systems Abstract In this chapter, a basic overview is given of P2P systems, architectures, and search strategies in P2P systems. More specific concepts that are outlined include
More informationCSE 461 Routing. Routing. Focus: Distance-vector and link-state Shortest path routing Key properties of schemes
CSE 46 Routing Routing Focus: How to find and set up paths through a network Distance-vector and link-state Shortest path routing Key properties of schemes Application Transport Network Link Physical Forwarding
More informationLecture 11: Packet forwarding
Lecture 11: Packet forwarding Anirudh Sivaraman 2017/10/23 This week we ll talk about the data plane. Recall that the routing layer broadly consists of two parts: (1) the control plane that computes routes
More informationNetworking Sensors, II
Networking Sensors, II Sensing Networking Leonidas Guibas Stanford University Computation CS321 ZG Book, Ch. 3 1 Class Administration Paper presentation preferences due today, by class time Project info
More informationSummarizing and mining inverse distributions on data streams via dynamic inverse sampling
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling Presented by Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Irina Rozenbaum rozenbau@paul.rutgers.edu
More informationHeckaton. SQL Server's Memory Optimized OLTP Engine
Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability
More informationNaming in Distributed Systems
Naming in Distributed Systems Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Overview: Names, Identifiers,
More informationBloom Filter for Network Security Alex X. Liu & Haipeng Dai
Bloom Filter for Network Security Alex X. Liu & Haipeng Dai haipengdai@nju.edu.cn 313 CS Building Department of Computer Science and Technology Nanjing University Bloom Filters Given a set S = {x 1,x 2,x
More information