OSSPolice - Identifying Open-Source License Violation and 1-day Security Risk at Large Scale

Similar documents
VirtualSwindle: An Automated Attack Against In-App Billing on Android

Android Kernel Security

The Android security jungle: pitfalls, threats and survival tips. Scott

OPEN SOURCE SECURITY ANALYSIS The State of Open Source Security in Commercial Applications

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon. School of Electrical Engineering and Computer Science Seoul National University, Korea

Advancing Clangd. Bringing persisted indexing to Clang tooling. Marc-André Laperle, Ericsson

Black Hat Webcast Series. C/C++ AppSec in 2014

When providing a native mobile app ruins the security of your existing web solution. CyberSec Conference /11/2015 Jérémy MATOS

BUILDING APPLICATION SECURITY INTO PRODUCTION CONTAINER ENVIRONMENTS Informed by the National Institute of Standards and Technology

Sonatype CLM Enforcement Points - Nexus. Sonatype CLM Enforcement Points - Nexus

ConnectWise Automate. What is ConnectWise Automate?

THE THREE WAYS OF SECURITY. Jeff Williams Co-founder and CTO Contrast Security

Container Deployment and Security Best Practices

Oracle Database Security Assessment Tool

Lecture 1 Introduction to Android. App Development for Mobile Devices. App Development for Mobile Devices. Announcement.

This slide is relevant to providing either a single three hour training session or explaining how a series of shorter sessions focused on per chapter

Mobile and Ubiquitous Computing: Android Programming (part 1)

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz I

Things You May Not Know About Android (Un)Packers: A Systematic Study based on Whole- System Emulation

Tale of a mobile application ruining the security of global solution because of a broken API design. SIGS Geneva 21/09/2016 Jérémy MATOS

Data Mining Practical Machine Learning Tools And Techniques With Java Implementations The Morgan Kaufmann Series In Data Management Systems

The Impact of Third-party Code on Android App Security. Erik Derr

The checklist is dynamic, not exhaustive, and will be updated regularly. If you have any suggestions or comments, we would like to hear from you.

Honours/Master/PhD Thesis Projects Supervised by Dr. Yulei Sui

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8

THOMAS LATOZA SWE 621 FALL 2018 DESIGN ECOSYSTEMS

Jenkins: A complete solution. From Continuous Integration to Continuous Delivery For HSBC

Android Application Development A Beginners Tutorial

Detecting Advanced Android Malware by Data Flow Analysis Engine. Xu Hao & pll

Reconstructing DALVIK. Applications. Marc Schönefeld CANSECWEST 2009, MAR18

Chapter 12 Databases and Database Management Systems

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

A Framework for Evaluating Mobile App Repackaging Detection Algorithms

HTML5 Evolution and Development. Matt Spencer UI & Browser Marketing Manager

Azure DevOps. Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region

Micro Focus Fortify Application Security

Understanding and Detecting Wake Lock Misuses for Android Applications

Modern and Fast: A New Wave of Database and Java in the Cloud. Joost Pronk Van Hoogeveen Lead Product Manager, Oracle

A Method-Based Ahead-of-Time Compiler For Android Applications

IJRDTM Kailash ISBN No Vol.17 Issue

Understanding and Detecting Wake Lock Misuses for Android Applications

Black Hat Python Python Programming For Hackers And Pentesters

FreeBSD and Git. Ed Maste - FreeBSD Vendor Summit 2018

HexType: Efficient Detection of Type Confusion Errors for C++ Yuseok Jeon Priyam Biswas Scott A. Carr Byoungyoung Lee Mathias Payer

Tracelet- Based Code Search in Executables. Yaniv David & Eran Yahav Technion,Israel

MOBILE DEFEND. Powering Robust Mobile Security Solutions

KLEE Workshop Feeding the Fuzzers. with KLEE. Marek Zmysłowski MOBILE SECURITY TEAM R&D INSTITUTE POLAND

Investigative Response Case Metrics Initiative Preliminary findings from 700+ data compromise investigations

Security and Privacy in a Big Data World

FROM VSTS TO AZURE DEVOPS

Protecting MySQL network traffic. Daniël van Eeden 25 April 2017

Dalvik And Art Android Internals Newandroidbook

Supervised classification of law area in the legal domain

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

CS370 Operating Systems

Introduction. CS 2210 Compiler Design Wonsun Ahn

GFI product comparison: GFI LanGuard 12 vs Microsoft Windows Intune (February 2015 Release)

Automatic Intra-Application Load Balancing for Heterogeneous Systems

Inference of Memory Bounds

User Interface. An Introductory Guide

Introduction to Java

IntFlow: Integer Error Handling With Information Flow Tracking

Security in a Mainframe Emulator. Chaining Security Vulnerabilities Until Disaster Strikes (twice) Author Tim Thurlings & Meiyer Goren

Comprehensive Mitigation

Adaptive Android Kernel Live Patching

Copyright 2015 MathEmbedded Ltd.r. Finding security vulnerabilities by fuzzing and dynamic code analysis

The State of Open Source Security

CMake & Ninja. by István Papp

HIPAA COMPLIANCE WHAT YOU NEED TO DO TO ENSURE YOU HAVE CYBERSECURITY COVERED

How to construct a sustainable vulnerability management program

IPLocks Vulnerability Assessment: A Database Assessment Solution

Securing Your Campus

Mobile-as-a-Medical-Device (Security) David Kleidermacher Chief Security Officer, BlackBerry

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz I

Synology Security Whitepaper

SoK: Eternal War in Memory Laszlo Szekeres, Mathias Payer, Tao Wei, and Dawn Song In: Oakland 14

MobileFindr: Function Similarity Identification for Reversing Mobile Binaries. Yibin Liao, Ruoyan Cai, Guodong Zhu, Yue Yin, Kang Li

CS 110 Computer Architecture. Lecture 2: Introduction to C, Part I. Instructor: Sören Schwertfeger.

THE CYBERSECURITY LITERACY CONFIDENCE GAP

How to set up FMOD*, Cocos2D-x*, and OpenAL* Libraries for Android* on Intel Architecture

Summation & ediscovery Patches Release Notes

Vulnerability disclosure

An Evil Copy: How the Loader Betrays You

Summation & ediscovery Patches Release Notes

Java 9: Tips on MigraDon and Upgradability

Unpacking the Packed Unpacker

CIP-014. JEA Compliance Approach. FRCC Fall Compliance Workshop Presenter Daniel Mishra

Detecting Third-Party Libraries in Android Applications with High Precision and Recall

Another difference is that the kernel includes only the suspend to memory mechanism, and not the suspend to hard disk, which is used on PCs.

Security through Multi-Layer Diversity

Repairing crashes in Android Apps. Shin Hwei Tan Zhen Dong Xiang Gao Abhik Roychoudhury National University of Singapore

McAfee epolicy Orchestrator

CSC Introduction to Computers and Their Applications

File Synchronization using API Google Drive on Android Operating System

in memory: an evolution of attacks Mathias Payer Purdue University

THE ART OF SECURING 100 PRODUCTS. Nir

ART JIT in Android N. Xueliang ZHONG Linaro ART Team

How to Leverage Containers to Bolster Security and Performance While Moving to Google Cloud

Language Oriented Modularity: From Theory to Practice

IP Protection in Java Based Software

Transcription:

OSSPolice - Identifying Open-Source License Violation and 1-day Security Risk at Large Scale Ruian Duan, Ashish Bijlani, Meng Xu Taesoo Kim, Wenke Lee ACM CCS 2017 1

Background Open Source Software (OSS) is gaining popularity, e.g. GitHub reported 20M users and 57M repos Mobile app market grows fast with over 2M apps on Play Store Developers reuse OSS as is for lots of benefits Legal risks and security risks arise 2

Risks in OSS use OSS licenses have constraints (e.g. GNU GPL requires derivative works to open source) 1-day vulnerabilities in stale OSS versions are exploited by hackers For now, GNU GPL is an enforceable contract, says US federal judge! Artifex Slaps Palm with PDF Reader Copyright Suit Equifax blames open-source software for its record-breaking security breach Community Health Systems Breach Possible due to Heartbleed Vulnerability 3

Goal Design a tool, OSSPolice, to analyze Android apps for open-source license violation and 1-day security risk by detecting reuse of OSS and their versions at large scale Requirements Accurate detection for hundreds of thousands of OSS Accurate version pinpointing Efficient resource usage Fast search to support vetting a large number of Android apps 4

Overview and challenges Feature selection Source vs binary: automatically building source code is hard, due to dependencies, various build configs etc. Compare App against OSS Fused app binaries: multiple OSS can be linked or compiled into a single file Partial builds and internal code clones: not all OSS features are built into libraries and OSS reuses other OSS Identify OSS versions Cross-match of unique version features: fused app binaries and internal code clones can confuse the provenance of unique features 5

Source vs binary C/C++ OSS are built into stripped native shared libraries (so files) Source Code Shared Library Stripped Shared.text.rodata Library.dynsym.symtab.text.rodata.debug_info.dynsym Foo.c void foo() { w= hello } Bar.c static bar() { w= world } Java OSS are built into obfuscated dalvik executables (dex files) Source code Dalvik Bytecode Obfuscated Dalvik Bytecode package edu.gatech; class Foo { bar(){println( hello world )}; }.class edu/gatech/foo.method bar const-string v1,"hello World invoke-virtual {v0,v1},println.class a.method a const-string v1,"hello World invoke-virtual {v0,v1},println 6

Feature selection C/C++ OSS vs so files String literal Clang-based lexer for OSS and.rodata for libraries Exported function Clang-based parser for OSS and.dynsym for libraries Java OSS vs dex files String constant Normalized class Captures interaction with framework Function centroid Captures intra-procedural control flow 7

Fused app binaries An app uses multiple OSS!"# %&&!"# %&&!"# %&& edu.gatech.example OpenSSL MuPDF OpenCV OkHttp MoPub Log4j Iterate N OSS has O(N) time complexity Flag all OSS being used at the same time Index OSS and their versions! 8

Flat indexing and matching Indexing: Maps features to OSS Matching: Lookup feature -> OSS mapping to identify OSS reuse edu.gatech.example feature 1 feature 2 feature 3 MuPDF OpenCV Flat indexing blow up table to 90G after indexing 7K OSS Indexing multiple versions of OSS further adds to the problem Given N OSS with F features and V versions, O(NFV) space complexity 9

Partial builds and internal code clones repo dir file Internal code clones confuses MuPDF OpenCV third-party with core and requires high match ratio to filter source thirdparty 3rdparty modules/core pdf fitz LibJPEG LibPNG src test jpeglib.h pngtest.c pdf-lex.c test-dev.cpp png.c opengl.cpp test-io.cpp Partial builds (e.g. examples, tests) causes the match ratio to be low 10

Hierarchical indexing and matching Hierarchical Indexing Records source hierarchy to track internal clones Uses Simhash algorithm to generate ids for non-leaf nodes for deduplication Record unique features across versions via separate lists edu.gatech.example feature 1 feature 2 feature 3 Hierarchical Matching NormScore (TF-IDF based) to promote unique parts when computing matching ratio of a node Allow partial builds by skipping nodes with low ratio Drop internal code clones by skipping nodes likely to be third-party file 1 file 2 file 3 dir 1 dir 2 dir 3 dir 4 dir 5 MuPDF OpenCV LibPNG 11

Cross-match of unique version features edu.gatech.example MuPDF V1.6 LibPNG V1.2.46 1.5.0 1.6.0 1.2.46 foo_string int bar_func() MuPDF V 1.5 V 1.6 LibPNG V 1.2.46 V 1.6.0 12

Context-based filtering Leverage context information in hierarchical indexing table Use NormScore to assign different weights to features MuPDF V1.6 edu.gatech.example MuPDF V1.6 LibPNG V1.2.46 pdf.c 1.6.0 int pdf_read() LibPNG V1.6.0 png.c 1.6.0 int png_read() 13

Evaluation FDroid Apps 4,469 apps, 579 with native libraries 295 C/C++ OSS uses, 7,055 Java OSS uses BAT: internal code clones LibScout: partial builds (code removal) C/C++ OSS Evaluation Results Java OSS Evaluation Results 100 80 60 40 20 0 55 matches 100 80 60 40 20 0 478 matches 295 matches Precision (%) Recall (%) Version Precision (%) Precision (%) Recall (%) Version Precision (%) OSSPolice BAT OSSPolice LibScout 14

Dataset C/C++ OSS from GitHub 3,119 popular repos and 60,450 OSS versions 29% repos are GPL/AGPL 11% repos are vulnerable with 5,611 severe CVEs (CVSS 4.0) Java OSS from Maven and JCenter 4,777 popular artifacts, 77,308 artifact versions 2.3% artifacts are GPL/AGPL 1.7% artifacts are vulnerable with 452 severe CVE ids Android Apps from Google Play 1.6M apps, 515,812 with native libraries 15

Performance and Scalability Indexing 60,450 C/C++ repos and 77,308 Java repos Time cost is 1000s vs. 40s on average Memory grows sublinearly to 30GB and 9GB Memory usage (GB) 37.25 32.60 27.94 23.28 18.63 13.97 9.31 4.66 C/C++ Memory Usage Java Memory Usage 0.00 0 10 20 30 40 50 60 70 80 Number of indexed repos(thousands) Matching Sampled 10,000 Google Play apps 80% of dex and so files finish within 100s and 200s 16

Popular libraries Long-tailed distribution of OSS uses 150000 100000 50000 0 Top 10 detected Java OSS excluding Android and Google OSS # Uses aggregated by types Utils Network Social Image Codec 100,000 80,000 60,000 40,000 20,000 0 Top 10 detected C/C++ OSS # Uses aggregated by types Codec Game Font Network Audio Viewer 17

Legal Risks More than 40K potential GPL violators More violators using C/C++ than Java and encoding libraries dominate 1500 1000 500 Top 5 offended Java OSS # Uses aggregated by types 0 itextpdf Java Connector GreenDAO Generator Top 5 offended C/C++ OSS Proguard Weka-Dev Codec Utils Compiler # Uses aggregated by types 40000 30000 20000 10000 0 MuPDF FFmpeg PJSIP VLC and X264 BZRTP Codec Communication 18

Legal Risks Why violating GPL/AGPL? MuPDF and itextpdf are used due to lack of free alternatives OSS developers responses MuPDF got new customers J FFmpeg and VideoLAN have interest, but FFmpeg cannot enforce J PJSIP not interested due to NDA, itext did not reply L Awareness of OSS licensing terms None of the app developers provided source code yet L 19

Security Risks More than 100K apps using vulnerable OSS versions More apps using vulnerable C/C++ OSS than Java 45000 40000 35000 30000 25000 20000 15000 10000 5000 0 1,244 LibPNG and 4,919 OpenSSL uses are not detected by App Security Improvement Program Top 6 C/C++ (ASIP) and 4 Java vulnerable OSS C/C++ Java 20

Security Risks Which versions of OSS do app developers choose? Both vulnerable and patched OSS are being used What causes the update of OSS versions? ASIP mitigates vulnerable OSS usage, but still remains a problem MoPub OpenSSL OkHttp FFmpeg # Vuln. Usage # Patched Usage 750 500 250 0 800 600 400 200 0 2400 1600 800 0 240 160 80 0 2013-05-12 2013-11-28 2014-06-16 2015-01-02 2015-07-21 2016-02-06 2016-08-24 Date ASIP Deadline ASIP Notification Timeline of OSS usage for the top 10K apps, 300K app versions 21

Discussion Checking license compliance requires manual efforts Obfuscation and optimization String encryption in dex files Function hiding in so files Version pinpointing Not all versions can be uniquely identified More programming languages (i.e. JS, Python) and platforms (i.e. ios) 22

Conclusion OSSPolice: an accurate and scalable tool to identify license violations and 1-day security risks Hierarchical indexing and matching scheme Context-based unique feature filtering A large scale measurement 1.6M free Google Play Store apps 40K cases of potential GPL/AGPL violations and 100K apps using vulnerable OSS Interesting insights Developers violate GPL/AGPL due to lack of free alternatives App developers use vulnerable OSS versions despite efforts from Google 23