Spotify Behind the Scenes

Background: Spotify
gkreitz@spotify.com, LiTH, November 22, 2010

What is Spotify?
- Lightweight on-demand streaming
- Large catalogue: over 10 million tracks
- Available in 7 European countries; over 10 million users
- Fast (median playback latency of 265 ms)
- Legal
(Image based on Western Europe Map, Wikimedia Commons, CC BY-SA 3.0)

Background: Business Idea
- More convenient than piracy
- Spotify Open (ads, 20 h/month)
- Spotify Free (ads, invite needed)
- Spotify Unlimited (no ads, on computer)
- Spotify Premium (no ads, mobile, offline, API)

Background: Spotify Tech Team
- Almost all developers in Stockholm
- Very talented people
- Proud of the product
- Team size: > 50
- We're growing fast and hiring!

Development Environment
- Scrum methodology with three-week sprints
- Some cross-functional teams, some specialized project teams
- Kanban for some teams
- Scrum teams consist of programmers, testers, and designers
- Hack days

Technical Design Goals
- Available
- Fast
- Scalable
- Secure

The Importance of Being Fast
- How important is speed?
- Increasing latency of Google searches by 100 to 400 ms decreased usage by 0.2% to 0.6% [Brutlag09]
- The decreased usage persists
- Median playback latency in Spotify is 265 ms (feels immediate)

Background: The Forbidden Word
(Image by http://www.flickr.com/photos/marxalot/, CC BY-SA 2.0)

Background: Client Software
- Desktop clients on Linux (preview), OS X, and Windows
- Windows version works well under Wine
- Smartphone clients on Android, iOS, Palm, Symbian, Windows Phone
- libspotify on Linux, OS X, and Windows
- Sonos and Telia hardware players
- Mostly in C++, some Objective-C++ and Java

Client Software vs. Web-based
- Web-based applications are easier to update and maintain
- Web-based applications don't need to be installed
- Client software still gives a better user experience: volume control, a separate application, faster
- Auto-upgrades ease part of the installation pain

Background: Everything Is a Link
- spotify: URI scheme
- spotify:track:6JEK0CvvjDjjMUBFoXShNZ#0:44
- spotify:user:gkreitz:playlist:4W5L19AvhsGC3U9xm6lQ9Q
- spotify:search:never+gonna+give+you+up
- New URI schemes are not universally supported, hence also http://open.spotify.com/track/6JEK0CvvjDjjMUBFoXShNZ#0:44
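
These URIs are easy to take apart mechanically. Below is a minimal sketch of splitting a spotify: URI into its resource type, identifier path, and optional time offset; the function and field names are my own illustration, not part of any official Spotify parser.

```python
def parse_spotify_uri(uri: str) -> dict:
    """Split e.g. spotify:track:6JEK0CvvjDjjMUBFoXShNZ#0:44 into parts."""
    uri, _, fragment = uri.partition("#")   # optional #min:sec offset
    scheme, _, rest = uri.partition(":")
    assert scheme == "spotify"
    parts = rest.split(":")                 # e.g. ["track", "<id>"]
    offset = None
    if fragment:
        minutes, seconds = fragment.split(":")
        offset = int(minutes) * 60 + int(seconds)
    return {"kind": parts[0], "path": parts[1:], "offset_seconds": offset}
```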

Background: Links Contain Opaque IDs
(Image from XKCD, http://www.xkcd.com/351)

Background: Metadata API
- Simple, HTTP-based API
- Search and lookup
- http://ws.spotify.com/lookup/1/?uri=spotify:track:6JEK0CvvjDjjMUBFoXShNZ
- http://ws.spotify.com/search/1/artist?q=foo
- Developer resources: http://developer.spotify.com/
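
As a concrete illustration, the lookup endpoint can be called with nothing but the standard library. This uses the exact URL from the slide; note that this public metadata API has since been retired, so treat the sketch as historical.

```python
import urllib.parse
import urllib.request

def lookup(spotify_uri: str) -> str:
    # Endpoint as given on the slide; the service responded with XML.
    url = "http://ws.spotify.com/lookup/1/?" + urllib.parse.urlencode(
        {"uri": spotify_uri}
    )
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Example: lookup("spotify:track:6JEK0CvvjDjjMUBFoXShNZ")
```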

Background: Overview of Spotify Protocol
- Proprietary protocol
- Designed for on-demand streaming
- Only Spotify can add tracks
- 96-320 kbps audio streams (most are Ogg Vorbis q5, 160 kbps)
- Peer-assisted streaming
(Photo by opethpainter, http://www.flickr.com/photos/opethpainter/3452027651, CC BY 2.0)

Spotify Protocol
- (Almost) everything over TCP
- (Almost) everything encrypted
- Messages multiplexed over a single TCP connection
- Persistent TCP connection to a server while logged in
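
A sketch of what multiplexing messages over one connection can look like: each message gets a small header carrying a type and a length, so unrelated requests and responses can interleave on the same socket. The frame layout here is my own illustration; the real wire format is proprietary and encrypted.

```python
import struct

def encode_frame(msg_type: int, payload: bytes) -> bytes:
    # 1-byte message type, 2-byte big-endian payload length, then payload.
    return struct.pack(">BH", msg_type, len(payload)) + payload

def decode_frames(buf: bytes):
    """Yield (msg_type, payload) for every complete frame in buf."""
    while len(buf) >= 3:
        msg_type, length = struct.unpack(">BH", buf[:3])
        if len(buf) < 3 + length:
            break  # incomplete frame; wait for more bytes from the socket
        yield msg_type, buf[3:3 + length]
        buf = buf[3 + length:]
```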

Caches
- Player caches tracks it has played
- Default policy is to use 10% of free space (capped at 10 GB)
- Caches are large (56% are over 5 GB)
- Least Recently Used policy for cache eviction
- Over 50% of data comes from the local cache
- Cached files are served in the P2P overlay
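
The sizing rule translates directly into code. A minimal sketch of the default policy, assuming only the 10%-of-free-space and 10 GB figures from the slide:

```python
import shutil

def default_cache_size(path: str = "/", cap_bytes: int = 10 * 10**9) -> int:
    """10% of current free disk space, capped at 10 GB."""
    free = shutil.disk_usage(path).free
    return min(free // 10, cap_bytes)
```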

Streaming a Track
- Request the first piece from Spotify servers
- Meanwhile, search peer-to-peer (P2P) for the remainder
- Switch back and forth between Spotify servers and peers as needed
- Towards the end of a track, start prefetching the next one
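
A minimal sketch of that source-selection logic: the first piece always comes from the server for latency, P2P carries the bulk, and a low buffer pushes the client back to the server. The threshold and function names are illustrative assumptions, not the client's real internals.

```python
def choose_source(piece_index: int, buffered_seconds: float,
                  p2p_has_piece: bool) -> str:
    if piece_index == 0:
        return "server"  # first piece is latency-critical
    if buffered_seconds < 3.0 and not p2p_has_piece:
        return "server"  # P2P cannot keep up right now
    return "p2p" if p2p_has_piece else "server"
```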

Background: TCP Congestion Window
- TCP maintains several windows, among them cwnd
- cwnd is used to avoid network congestion
- A TCP sender can never have more than cwnd unacknowledged bytes outstanding
- Additive increase, multiplicative decrease
- What to do with cwnd when a connection sits idle?
- RFC 5681 (TCP Congestion Control) says: "Therefore, a TCP SHOULD set cwnd to no more than RW before beginning transmission if the TCP has not sent data in an interval exceeding the retransmission timeout."

TCP Congestion Window and Spotify
- Spotify traffic is bursty
- Initial burst is very latency-critical
- Want to avoid needless reduction of the congestion window
- Configure kernels to not follow the RFC 5681 SHOULD
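
On Linux, the RFC 5681 idle-restart behavior is exposed as a sysctl, so "configure kernels" plausibly amounts to a one-line change; whether Spotify used exactly this knob is my assumption.

```python
# Disable collapsing cwnd after an idle period (requires root).
# Equivalent to: sysctl -w net.ipv4.tcp_slow_start_after_idle=0
with open("/proc/sys/net/ipv4/tcp_slow_start_after_idle", "w") as f:
    f.write("0")
```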

Background: When to Start Playing?
- Minimize latency while avoiding stutter
- TCP throughput varies
- Sensitive to packet loss
- Bandwidth over wireless media varies
- Model throughput as a Markov chain and simulate
- Heuristics
(Image by ronin691, http://www.flickr.com/photos/ronin691/3482770627, CC BY-SA 2.0)
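
A minimal sketch of that idea: model throughput as a small Markov chain and run Monte Carlo rollouts to estimate the stutter probability for a given amount of buffered data. The states, transition matrix, and thresholds are made up for illustration; the 20,000 bytes/s drain rate matches the 160 kbps Ogg Vorbis q5 stream mentioned earlier.

```python
import random

# (label, throughput in bytes/s) -- illustrative states only
STATES = [("good", 60_000), ("ok", 25_000), ("bad", 8_000)]
TRANSITIONS = [
    [0.80, 0.15, 0.05],  # from "good"
    [0.20, 0.60, 0.20],  # from "ok"
    [0.10, 0.30, 0.60],  # from "bad"
]

def stutter_probability(buffered_bytes: int, drain_rate: int = 20_000,
                        horizon_s: int = 30, runs: int = 2_000) -> float:
    stutters = 0
    for _ in range(runs):
        state, buf = 0, buffered_bytes
        for _ in range(horizon_s):
            buf += STATES[state][1] - drain_rate  # net fill per second
            if buf < 0:
                stutters += 1
                break
            state = random.choices(range(3), TRANSITIONS[state])[0]
    return stutters / runs

# Start playback once stutter_probability(buffered) drops below a target.
```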

Background: Security Through Obscurity
- Client must be able to access music data
- Reverse engineers should not
- So, we can't tell you exactly how our client works
- Plus, we need to apply software obfuscation
(Image by XKCD, http://xkcd.com/730/, CC BY-NC 2.5)

Background: Security Through Obscurity
(Image from XKCD, http://www.xkcd.com/257)

P2P Goals
- Easier to scale
- Fewer servers
- Less bandwidth
- Better uptime
- Fun!

Comparison with Video-on-Demand
- Lower bitrate
- Shorter objects
- More objects
- Different access pattern
- More active users
- Users play their favorite tracks often

P2P Structure
- Unstructured network (not a Distributed Hash Table)
- Nodes have a fixed maximum degree (60)
- Neighbor eviction by heuristic evaluation of utility
- No overlay routing
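
A minimal sketch of degree-capped neighbor management: when the table is full, score each neighbor with a utility heuristic and drop the worst. The scoring terms below are invented for illustration; the talk does not specify the actual heuristic.

```python
MAX_DEGREE = 60

def maybe_evict(neighbors: dict) -> str | None:
    """neighbors maps peer_id -> recent transfer stats (bytes)."""
    if len(neighbors) <= MAX_DEGREE:
        return None

    def utility(stats: dict) -> float:
        # Favor peers that recently sent us data we needed.
        return stats["bytes_received"] + 0.5 * stats["bytes_sent"]

    worst = min(neighbors, key=lambda p: utility(neighbors[p]))
    del neighbors[worst]
    return worst
```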

P2P Structure
- All peers are equals (no supernodes)
- A user only downloads data she needs
- P2P network becomes (weakly) clustered by interest
- Oblivious to network architecture

Brief Comparison to BitTorrent
- One (well, two) P2P overlay for all tracks (not per-torrent)
- Does not inform peers about downloaded blocks
- Downloads blocks in order
- Does not enforce fairness (such as tit-for-tat)
- Informs peers about the urgency of a request

Finding Peers
- Server-side tracker (BitTorrent style): only remembers 20 peers per track, returns 10 (online) peers to a client on query
- Broadcast query in a small (2-hop) neighborhood in the overlay (Gnutella style)
- LAN peer discovery (cherry on top)
- Client uses all mechanisms for every track, as sketched below
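
A minimal sketch of combining the three mechanisms; the tracker, overlay, and lan objects stand in for the tracker RPC, the 2-hop overlay broadcast, and LAN discovery, and all of these helpers are hypothetical.

```python
def find_peers(track_id: str, tracker, overlay, lan) -> set:
    peers = set()
    peers.update(tracker.query(track_id))             # up to 10 peers returned
    peers.update(overlay.broadcast(track_id, hops=2))  # Gnutella-style query
    peers.update(lan.discover(track_id))               # same-LAN peers
    return peers
```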

Downloading in P2P
- Ask for the most urgent pieces first
- If a peer is slow, re-request from new peers
- When buffers are low, download from the central server as well
- When doing so, estimate from what point P2P will catch up
- If buffers are very low, stop uploading
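
A minimal sketch of urgency-first piece selection: sort missing pieces by distance from the playback point and hand each request to the peer expected to answer fastest. All names and thresholds are illustrative assumptions.

```python
def schedule_requests(missing_pieces: list, playhead: int, peers: list) -> None:
    # Most urgent = closest ahead of the current playback position.
    for piece in sorted(missing_pieces, key=lambda p: p - playhead):
        peer = min(peers, key=lambda pr: pr.estimated_delay)
        peer.request(piece, urgent=(piece - playhead < 4))
```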

Limit Resource Usage
- Cap number of neighbors
- Cap number of simultaneous uploads
- TCP congestion control gives fairness between connections
- Cap cache size
- Mobile clients don't participate in P2P

P2P NAT Traversal
- Ask to open ports via UPnP
- Attempt connections in both directions
- High connection failure rate (65%)
- Room for improvement

Security in Our P2P Network
- Control access to participate
- Verify integrity of downloaded files
- Data transferred in the P2P network is encrypted
- Usernames are not exposed in the P2P network; all peers are assigned pseudonyms

Avoiding Hijacking
- A peer cannot ask other peers to connect to an arbitrary IP address/port
- Avoiding DDoS issues
- Misbehaving peers are reported

Background: Back End Software
- Comprised of many small services
- Do one task and do it well
- Python, C++, Java, Scala
- No common framework (yet)

High-level Overview
- Client connects to an Access Point (AP)
- AP handles authentication and encryption
- AP demultiplexes requests and forwards them to backend servers
- Gives redundancy and fault-tolerance

High-level Overview (cont'd)
[Architecture diagram: clients speak the proprietary protocol to an AP, which talks HTTP to backend services such as Playlist, Search, Storage, and User]

Background: Locating an Access Point
- DNS SRV lookup of _spotify-client._tcp.spotify.com
- GeoDNS to return an access point close to you
- Fallback to an A record for ap.spotify.com
- Seeing problems with large responses (TCP DNS in home routers)
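
A minimal sketch of that client-side lookup using the third-party dnspython package (an assumption for illustration; the real client implements its own resolution). The fallback port number is illustrative, not from the talk.

```python
import dns.resolver  # pip install dnspython

def locate_access_point() -> tuple[str, int]:
    try:
        answers = dns.resolver.resolve("_spotify-client._tcp.spotify.com", "SRV")
        best = min(answers, key=lambda r: r.priority)  # lowest priority wins
        return str(best.target).rstrip("."), best.port
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return "ap.spotify.com", 4070  # A-record fallback; port illustrative
```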

Communicating with Backend Servers
- Most common backend protocol is HTTP
- Some services need to push information, e.g. playlist
- Currently, each such service has its own protocol
- Moving towards a more unified backend protocol

Lookup, Version 1
- Put content on random servers
- Multicast UDP to find a server
- Each server has a small daemon with an index, responding to lookup queries
- Scaling issues

Lookup, Version 2
- DNS-based (using TXT records)
- Distributed Hash Table
- Each server handles part of the keyspace
- Hash the key to find the master server
- Repeated hashing to find slaves
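
A minimal sketch of "hash the key to find the master, rehash to find slaves" over a ring-shaped keyspace. The ring layout, hash choice, and replica count are my assumptions, not the actual service's design.

```python
import hashlib
from bisect import bisect

def lookup_servers(key: str, ring: list, slaves: int = 2) -> list:
    """ring: sorted list of (position, server) pairs over a 64-bit keyspace."""
    positions = [pos for pos, _ in ring]
    digest = hashlib.sha1(key.encode()).digest()
    servers = []
    for _ in range(1 + slaves):                     # master, then slaves
        point = int.from_bytes(digest[:8], "big")
        idx = bisect(positions, point) % len(ring)  # next server on the ring
        servers.append(ring[idx][1])
        digest = hashlib.sha1(digest).digest()      # repeated hashing
    return servers                                  # servers[0] is the master
```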

Background: Storage

Playlist
- Our most complex service (!)
- Simultaneous writes with automatic conflict resolution
- Publish-subscribe system to clients
- Changes are automatically versioned; transmits deltas
- Terabytes of data
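
A minimal sketch of version-tracked deltas: every change names the version it was made against, a stale write forces the client to merge against the newer state, and an accepted delta bumps the version and can be pushed to subscribers. This illustrates the idea only, not the real service's data model.

```python
class Conflict(Exception):
    pass

def apply_delta(playlist: dict, delta: dict) -> None:
    """playlist: {'version': int, 'tracks': list}; delta: {'base_version', 'ops'}."""
    if delta["base_version"] != playlist["version"]:
        raise Conflict("stale write: merge against the current version first")
    for op in delta["ops"]:
        if op["kind"] == "insert":
            playlist["tracks"].insert(op["index"], op["track"])
        elif op["kind"] == "remove":
            del playlist["tracks"][op["index"]]
    playlist["version"] += 1  # new version; broadcast the delta to subscribers
```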

Evaluation
- So, how well does it work?
- Collected measurements 23-29 March 2010
- (Before Facebook integration, local files, ...)

Data Sources
[Graph: "Data source - ratio - by week" (RRDtool, Tobi Oetiker), Monday through Monday, with weekday/weekend and morning/night variation marked]

Source   Current   Min      Avg
Server   10.86%    6.76%    9.62%
P2P      33.90%    23.78%   33.86%
Cache    55.24%    48.47%   56.53%

Data Sources
- Mostly minor variations over time
- Better P2P performance on weekends
- P2P most effective at peak hours
- 8.8% from servers, 35.8% from P2P, 55.4% from caches

Latency and Stutter
- Median latency: 265 ms
- 75th percentile: 515 ms
- 90th percentile: 1047 ms
- Below 1% of playbacks had stutter occurrences

Finding Peers
Table: Sources of peers

Sources for peers    Fraction of searches
Tracker and P2P      75.1%
Only Tracker         9.0%
Only P2P             7.0%
No peers found       8.9%

Each mechanism by itself is fairly effective.

Protocol Overhead
Table: Distribution of application-layer traffic in the overlay network

Type                 Fraction
Music Data, Used     94.80%
Music Data, Unused   2.38%
Search Overhead      2.33%
Other Overhead       0.48%

Measured at the socket layer. "Unused" data means it was cancelled or duplicate.

Thank you! Questions? gkreitz@spotify.com