Data Analytics for IoT: Applications to Security and Privacy
Nick Feamster
Princeton University
Growing Market for IoT Analytics
  More than 25 billion devices by 2020.
  Each of these devices generates data. (Cows generating 200 MB of data per year!)
  Devices deployed in homes, industrial control systems, businesses, etc.
Established Companies Collecting IoT Data
  Google: Incorporation of data from phones, thermostats, streaming devices, etc.
  Amazon: Echo, Fire Phone, Fire TV, Fire TV Stick
  Samsung: Smart TVs, SmartThings Hub
  Apple: Apple Watch, CarPlay
Many Companies Developing Platforms for IoT Analytics
  Intel: Cloud Analytics
  Guavus, Pentaho: IoT Analytics, Smart Metering
  IoTivity: Platform for developing IoT devices
  Iobeam: Analytics for device operations (monitoring failures, etc.)
  Samsara: Easy sensor deployment for fleet monitoring, IT management, etc.
Applications of IoT Analytics
  Fault management: Identifying faults, root cause analysis of system faults
  Smart metering: What-if scenarios for energy management, anomaly detection
  Predictive maintenance: Predicting outages, real-time alerts, etc.
  Resource optimization: Identifying underutilized assets, abnormal consumption patterns
  Asset tracking: Monitor assets in real time, trace asset use
  Security and breach detection: Alert operators to intrusion events and risks
Do you think concerns over data accuracy in IoT are slowing organizations' ability to create solid strategies around analyzing the data?
This Talk: IoT Analytics for Security and Privacy
  Problem: Devices are deployed with insecure software that will never be patched.
  Simply identifying what is connected to the network is difficult.
  Identifying anomalies, data leaks, etc. is even more difficult: each device is different, and anomalous activities may not generate a lot of traffic.
Example: DoS Attacks on a Smart House!
Empirical Study: Connect and Monitor
  [Figure: testbed diagram. Devices: SmartSense Multi-sensor, PixStar Digital Photoframe, Sharx Security IP Camera, SmartThings Hub, Belkin WeMo Switch, Nest Thermostat, Ubi Smart Speaker. Links: WiFi, Z-Wave. Monitor: Laptop Gateway (Passive Monitor).]
A Growing Security Problem
  Increasing number of Internet-connected IoT devices in consumer homes.
  Devices ship with security and privacy flaws.
  Cannot rely on manufacturers alone to secure software or devices.
Current State of Consumer Smart Devices
  Many different manufacturers, small startups, novice programmers
  Low-capability hardware, insufficient for security protocols
  Most data goes to an online server in the cloud
  Even devices in the same home communicate via the cloud
  (forgerock.com)
Security Risks of IoT Devices
  Devices may be difficult (or impossible!) to patch
  Not isolated from one another (can attack one another)
  Not isolated from the Internet (can attack other devices on the Internet)
Approach
  IoT lab to collect and analyze IoT traffic
  Machine learning algorithms to address:
    Device identification: Which devices are connected to the network?
    Anomaly detection: Are devices behaving abnormally due to compromise?
Privacy Risks of IoT Data
  Leaks of private user information
  Leaks of what devices are being used
  Leaks about user activity and behavior
  Devices often do not use encryption by default
  Examples of fields seen in the clear: email:xxx@y.com, URI: smart-light, json:{ activity : switch_on }
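As a rough illustration of how such leaks can be spotted by a passive monitor, the following minimal sketch scans one unencrypted HTTP payload for email addresses and GET query parameters. It is not the tooling used in the study: the function name `find_cleartext_leaks`, the sample request, and the host `device-portal.example` are all hypothetical, and the email pattern is deliberately simple.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Simple (not RFC-complete) email pattern; an assumption for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_cleartext_leaks(payload: bytes) -> dict:
    """Return email addresses and GET query parameters visible in a payload."""
    text = payload.decode("utf-8", errors="replace")
    leaks = {"emails": EMAIL_RE.findall(text), "query_params": {}}
    # An HTTP request line has the form: METHOD SP URI SP VERSION
    request_line = text.split("\r\n", 1)[0].split(" ")
    if len(request_line) == 3 and request_line[0] == "GET":
        leaks["query_params"] = parse_qs(urlsplit(request_line[1]).query)
    return leaks

# Hypothetical captured request, modeled on the fields shown on the slide.
sample = (b"GET /sync?email=xxx@y.com&activity=switch_on HTTP/1.1\r\n"
          b"Host: device-portal.example\r\n\r\n")
leaks = find_cleartext_leaks(sample)
print(leaks)
```

A check like this only works on unencrypted traffic; once a device uses TLS, the payload is opaque to the gateway.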
Technical Challenges
  Richness of patterns: Heterogeneous devices; diverse traffic patterns
  Feature design and selection: Succinct, effective feature representations
  Real-time anomaly detection: Balancing time efficiency with detection accuracy
  Shifts in normal behavior over time
  Recalibration in deployment: Deployment settings differ from testbeds
Data Analytics of Existing IoT Devices
  [Figure: testbed diagram. Devices: SmartSense Multi-sensor, PixStar Digital Photoframe, Sharx Security IP Camera, SmartThings Hub, Belkin WeMo Switch, Nest Thermostat, Ubi Smart Speaker. Links: WiFi, Z-Wave. Monitor: Laptop Gateway (Passive Monitor).]
Subproblems for Security and Privacy Analytics
  What devices are connected to the network?
    Approach: Statistical approaches to analyze network traffic patterns.
  What data is being leaked to the cloud?
    Approach: Scalable traffic monitoring. (Note: When data is encrypted, this may become more challenging!)
  What are devices doing? Are devices infected?
    Approach: Statistical network anomaly detection. (Note: Anomalies may be low-volume events.)
Device Fingerprinting: What is Connected to the Network?
  What devices are connected to the network? (What devices, what manufacturers?)
  What is the device doing? (Activity recognition.)
  Approach: Simple traffic analysis can reveal manufacturers (e.g., DNS lookups). Spectral clustering can reveal activity patterns.
  For example: Spectral clustering applied to simple traffic volume can identify different activities on a thermostat.
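The DNS-lookup idea can be sketched as a suffix match on observed query names. This is only an illustrative sketch, not the study's classifier: the table below uses domain names reported in the case-study slides of this talk, while the function name `identify_device` and the table layout are assumptions.

```python
# Domain suffixes taken from the DNS queries listed in this talk's case studies.
DOMAIN_SUFFIX_TO_DEVICE = {
    "nest.com": "Nest Thermostat",
    "nestlabs.com": "Nest Thermostat",
    "nest.net": "Nest Thermostat",
    "pix-star.com": "PixStar Digital Photoframe",
    "sharxsecurity.com": "Sharx Security IP Camera",
    "theubi.com": "Ubi Smart Speaker",
}

def identify_device(qname):
    """Map a DNS query name to a device label, or None if no suffix matches."""
    labels = qname.rstrip(".").lower().split(".")
    # Try progressively shorter suffixes: a.b.c.com, b.c.com, c.com, com
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in DOMAIN_SUFFIX_TO_DEVICE:
            return DOMAIN_SUFFIX_TO_DEVICE[suffix]
    return None

for name in ["frontdoor.nest.com", "api.pix-star.com", "www.google.com"]:
    print(name, "->", identify_device(name))
```

Generic domains such as www.google.com are intentionally left unmapped, since many unrelated devices query them.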
Device Fingerprinting: Network Traffic Analysis
  Statistical analysis of network traffic features can uniquely identify devices and their characteristic behavior.
  Challenge: Often the features that identify a device type are not high energy (needle in a haystack!)
  [Figure panels: Sensor, Switch]
Device Fingerprinting: Powerline Analysis
  Signatures in the frequency spectrum on the powerline can identify devices and activities.
  Switched-mode power supplies and low-power devices are more challenging to discern.
Example: Nest Thermostat: Traffic Analysis
  All traffic to Nest is HTTPS on ports 443 and 9543
  Uses TLSv1.2 and TLSv1.0 for all traffic
  We found some incoming weather updates containing location information of the home and weather station in the clear. Nest fixed this bug after our report.
  DNS queries: time.nestlabs.com, frontdoor.nest.com, log-rts01-iad01.devices.nest.net, transport01-rts04-iad01.transport.home.nest.com
Nest: Privacy Issues
  Fairly secure device: all outgoing personal traffic, including configuration settings and updates to the server, uses HTTPS
  The DNS queries, as well as the use of the unique port 9543, clearly identify a Nest device.
  * User zip code bug has been fixed
  [Figure annotation: user zip code]
Digital Photoframe: Traffic Analysis
  All traffic and feeds (RSS) sent in cleartext over HTTP port 80
  All actions sent to the server in HTTP GET packets
  Downloads radio streams in cleartext over different ports
  DNS queries: api.pix-star.com, iptime.pix-star.com
Photoframe: Privacy Issues
  User email ID is in cleartext when syncing the account
  Current user activity is in cleartext in HTTP GET requests
  DNS queries and HTTP traffic identify a Pix-Star photoframe
  [Figure annotations: current activity, email]
IP Camera: Traffic Analysis
  All traffic is sent over cleartext HTTP on port 80, even though viewing the stream requires a login password
  Actions are sent as HTTP GET URI strings
  Videos are sent as image/jpeg and image/gif in the clear
  FTP requests are also sent in the clear over port 21, and FTP data is sent in cleartext over many ports above 30,000
  DNS query: www.sharxsecurity.com
IP Camera: Privacy Issues
  Video can be recovered from FTP data traffic by a network eavesdropper
  DNS query, HTTP headers, and ports identify a Sharx security camera
  [Figure annotation: private user data]
Ubi: Traffic Analysis
  All voice-to-text traffic sent in the clear over port 80
  Activities sent in the clear, and radio streamed over port 80
  Sensor readings are synced with the server in the background over port 80
  Only communication with Google APIs uses HTTPS, on port 443 and port 5228 (Google Talk)
  DNS queries: portal.theubi.com, www.google.com, mtalk.google.com, api.grooveshark.com
Ubi: Privacy Issues
  Although HTTPS is clearly available, Ubi still uses HTTP to communicate with its portal.
  An eavesdropper can intercept all voice chats and sensor readings sent to Ubi's main portal
  Sensor values such as sound, temperature, light, and humidity can reveal whether the user is home and currently active
  Email in the clear can identify the user
  DNS queries and HTTP headers (UA, Host) clearly identify a Ubi device
  [Figure annotations: current state, current activity]
Are companies that are investing in IoT doing enough to ensure end-to-end encryption of data, or do you feel like they have a long way to go?
Network Traffic from Home Routers
  Network traffic patterns reveal usage and sometimes reveal power cycling.
  In some cases, they could reveal anomalous activity or human behavior.
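One simple way to flag unusual per-device traffic at the gateway is a rolling robust z-score over per-interval byte counts. The sketch below is an assumption-laden illustration, not the method used in the talk: the class name `RollingAnomalyDetector`, the window size, the threshold of 3.5, and the MAD floor are all hypothetical choices. It uses the median and the median absolute deviation (MAD) so that a small deviation from a quiet baseline can still stand out, and a bounded window so that the baseline follows gradual shifts in normal behavior.

```python
from collections import deque
from statistics import median

class RollingAnomalyDetector:
    """Flag values that deviate strongly from a rolling median baseline."""

    def __init__(self, window=60, threshold=3.5, min_history=10, mad_floor=1.0):
        self.history = deque(maxlen=window)  # recent per-interval byte counts
        self.threshold = threshold
        self.min_history = min_history
        self.mad_floor = mad_floor  # avoids division by zero on flat baselines

    def observe(self, value):
        """Return True if value looks anomalous relative to recent history."""
        anomalous = False
        if len(self.history) >= self.min_history:
            med = median(self.history)
            mad = median(abs(x - med) for x in self.history)
            # 0.6745 rescales MAD to be comparable to a standard deviation
            # under a normality assumption.
            score = 0.6745 * abs(value - med) / max(mad, self.mad_floor)
            anomalous = score > self.threshold
        # Always record the value so the baseline can track gradual shifts.
        self.history.append(value)
        return anomalous

detector = RollingAnomalyDetector(window=30)
baseline = [100, 102, 98, 101, 99] * 4  # hypothetical bytes per interval
baseline_flags = [detector.observe(v) for v in baseline]
spike_flag = detector.observe(160)  # small in absolute terms, large vs. baseline
print(any(baseline_flags), spike_flag)
```

Because every value is appended, a sustained attack would eventually become the new baseline; a deployment would need a policy for that, which this sketch leaves out.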
Gathering IoT Data from Powerline
  Capture voltage samples (first 200 kHz) from the powerline
  Sample rate of 400 kHz
  Extract time-domain features: min, max, mean, variance, kurtosis, skewness, IQR, etc. over a 5-second window
  Run machine learning algorithms with the above tuple as features: Decision Tree, Random Forest
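The feature-extraction step can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the pipeline used in the talk: the function names are hypothetical, variance is computed as population variance, kurtosis is the non-excess form (fourth central moment divided by squared variance), and the quartile method is the default of `statistics.quantiles`. The resulting dictionaries would then be passed to a classifier such as a decision tree or random forest, which is not shown here.

```python
from statistics import fmean, quantiles

def time_domain_features(window):
    """Compute summary statistics for one window of voltage samples."""
    n = len(window)
    mu = fmean(window)
    # Central moments of order 2, 3, and 4 (population form).
    m2 = sum((x - mu) ** 2 for x in window) / n
    m3 = sum((x - mu) ** 3 for x in window) / n
    m4 = sum((x - mu) ** 4 for x in window) / n
    q1, _, q3 = quantiles(window, n=4)
    return {
        "min": min(window),
        "max": max(window),
        "mean": mu,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "iqr": q3 - q1,
    }

def split_windows(samples, sample_rate_hz, window_sec=5):
    """Yield consecutive non-overlapping windows; a trailing partial one is dropped."""
    size = sample_rate_hz * window_sec
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]

features = time_domain_features([1, 2, 3, 4, 5])
print(features)
```

At the 400 kHz sample rate named on the slide, a 5-second window holds 2,000,000 samples, so a real pipeline would likely use an array library rather than Python lists.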
Activity Data from Powerline
  Activity periods are clearly visible from differences in powerline frequency.
  IoT data analytics can determine whether devices are active in normal or unusual ways.
  Detection of activities, infections, etc. is likely feasible.
Conclusion: Much Left to Do!
  Large and growing market for IoT analytics.
  Security and privacy will be huge markets for IoT analytics.
  Devices are difficult to secure, patch, and maintain. Insecure devices will always be connected to the network.
  IoT devices will continue to send data to the cloud, third parties, etc. Identifying data leaks will be important.
  Collection of IoT data at the network gateway is a promising approach.
  Both network traffic and powerline information may be revealing.
  Plenty of opportunities for new businesses and technologies in this space.
  Get in Touch! Nick Feamster: feamster@cs.princeton.edu