Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours.

Similar documents
Redis - a Flexible Key/Value Datastore An Introduction

Using Redis for data processing in a incident response environment.

REDIS: NOSQL DATA STORAGE

LECTURE 27. Python and Redis

NoSQL: Redis and MongoDB A.A. 2016/17

Jason Brelloch and William Gimson

INFO-H-415 Advanced Databases Key-value stores and Redis. Fatemeh Shafiee Raisa Uku

Dr. Chuck Cartledge. 19 Nov. 2015

pyredis Documentation

Part IV. More on Python. Tobias Neckel: Scripting with Bash and Python Compact Max-Planck, February 16-26,

Redis Ma as, greitas, galingas. Specialiai VilniusPHP

CSC326 Persistent Programming i. CSC326 Persistent Programming

COSC Redis. Paul Moore, Stephen Smithbower, William Lee. March 11, 2013

Harnessing the Full power of Redis. Daniel Magliola

Big Data Management and NoSQL Databases

Redis Functions and Data Structures at Percona Live. Dave Nielsen, Developer Redis

Database Solution in Cloud Computing

Redisco Documentation

Scrapy-Redis Documentation

Buffering to Redis for Efficient Real-Time Processing. Percona Live, April 24, 2018

Script language: Python Data and files

THE FLEXIBLE DATA-STRUCTURE SERVER THAT COULD.

semidbm Documentation

Caching Memcached vs. Redis

Amritansh Sharma

Redis as a Time Series DB. Josiah Carlson

Harness the power of Python magic methods and lazy objects.

Redis Func+ons and Data Structures

NoSQL Databases Analysis

Python Basics. Lecture and Lab 5 Day Course. Python Basics

Intro to Redis. A Support Overview

CMSC 461 Final Exam Study Guide

Give Your Site A Boost With Memcache. Ben Ramsey September 25, 2009

Bazaar Architecture Overview Release 2.8.0dev1

Microsoft Developing SQL Databases. Download Full version :

LECTURE 4 Python Basics Part 3

Pillaging DVCS Repos Adam Baldwin

Prototyping Data Intensive Apps: TrendingTopics.org

ReJSON = { "activity": "new trick" } Itamar

Introduction to Supercomputing

Distributed Cache Service. User Guide. Date

AIL Framework for Analysis of Information Leaks

How you can benefit from using. javier

redis-py Documentation

Platform Migrator Technical Report TR

1

Practical Performance Tuning using Digested SQL Logs. Bob Burgess Salesforce Marketing Cloud

Distributed Systems. 29. Distributed Caching Paul Krzyzanowski. Rutgers University. Fall 2014

Perl. Perl. Perl. Which Perl

Avi Networks Technical Reference (17.2)

REdis: Implementing Redis in Erlang. A step-by-step walkthrough

Text Analytics. Index-Structures for Information Retrieval. Ulf Leser

Introduction Variables Helper commands Control Flow Constructs Basic Plumbing. Bash Scripting. Alessandro Barenghi

Agenda. About Me JDriven The case : Westy Tracking Background Architecture Implementation Demo

sottotitolo A.A. 2016/17 Federico Reghenzani, Alessandro Barenghi


Bigtable. Presenter: Yijun Hou, Yixiao Peng

STATS Data Analysis using Python. Lecture 15: Advanced Command Line

Automatic MySQL Schema Management with Skeema. Evan Elias Percona Live, April 2017

Programming for Data Science Syllabus


Python Tutorial. Day 1

Appendix A GLOSSARY. SYS-ED/ Computer Education Techniques, Inc.

70-532: Developing Microsoft Azure Solutions

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

CNIT 50: Network Security Monitoring. 6 Command Line Packet Analysis Tools

How tshark saved my SDN Forensics

Introduction to the UNIX command line

NETCONF Protocol. Restrictions for the NETCONF Protocol. Information About the NETCONF Protocol

Perl. Many of these conflict with design principles of languages for teaching.

APIs and API Design with Python

NoSQL Databases. an overview

CS Programming Languages: Python

Cisco XML API Overview

CIEL Tutorial. Connecting to your personal cluster

GIS 4653/5653: Spatial Programming and GIS. More Python: Statements, Types, Functions, Modules, Classes

CS November 2017

CS November 2018

Source Code Management wih git

Ruby on Redis. Pascal Weemaels Koen Handekyn Oct 2013

pymemcache Documentation

13 File Structures. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

Scalable Time Series in PCP. Lukas Berk

Oracle Database 10g: New Features for Administrators Release 2

Chapter 1 - Introduction. September 8, 2016

9 Reasons To Use a Binary Repository for Front-End Development with Bower

Introduction to the shell Part II

django-redis-cache Documentation

ABSTRACT 2. BACKGROUND

Useful Unix Commands Cheat Sheet

asyncio_redis Documentation

Agenda. Introduction You Me JDriven The case : Westy Tracking Details Implementation Deployment

python-anyvcs Documentation

Performance Optimization for Informatica Data Services ( Hotfix 3)

Table of contents. Our goal. Notes. Notes. Notes. Summer June 29, Our goal is to see how we can use Unix as a tool for developing programs

Infoblox Kubernetes1.0.0 IPAM Plugin

The Little Redis Book is licensed under the Attribution-NonCommercial 3.0 Unported license. You should not have paid for this book.

SMB Analysis OPSEC 2016

Chapter 8: Working With Databases & Tables

Functional Programming Invades Architecture. George Fairbanks SATURN May 2017

Transcription:

Classifying malware using network traffic analysis. Or how to learn Redis, git, tshark and Python in 4 hours. Alexandre Dulaunoy January 9, 2015

Problem Statement We have more 5000 pcap files generated per day for each malware execution in a sandbox We need to classify 1 the malware into various sets The project needs to be done in less than a day and the code shared to another team via GitHub 1 Classification parameters are defined by the analyst 2 of 1

File Format and Filename... 0580c82f6f90b75fcf81fd3ac779ae84.pcap 05a0f4f7a72f04bda62e3a6c92970f6e.pcap 05b4a945e5f1f7675c19b74748fd30d1.pcap 05b57374486ce8a5ce33d3b7d6c9ba48.pcap 05bbddc8edac3615754f93139cf11674.pcap 05bf1ff78685b5de06b0417da01443a9.pcap 05c3bccc1abab5c698efa0dfec2fd3a4.pcap... <MD5 hash of the malware).pcap MD5 values 2 of malware samples are used by A/V vendors, security researchers and analyst. 2 https://www.virustotal.com/ as an example 3 of 1

MapReduce and Network Forensic MapReduce is an old concept in computer science The map stage to perform isolated computation on independent problems The reduce stage to combine the computation results Network forensic computations can easily be expressed in map and reduce steps: parsing, filtering, counting, sorting, aggregating, anonymizing, shuffling... 4 of 1

Processing and Reading pcap files ls -1 parallel --gnu tcpdump -s0 -A -n -r {1} Nice for processing the files but... How do we combine the results? How do we extract the classification parameters? (e.g. sed, awk, regexp?) 5 of 1

tshark tshark -G fields Wireshark is supporting a wide range of dissectors tshark allows to use the dissectors from the command line tshark -E separator=, -Tfields -e ip.dst -r mycap.cap 6 of 1

Concurrent Network Forensic Processing To allow concurrent processing, a non-blocking data store is required To allow flexibility, a schema-free data store is required To allow fast processing, you need to scale horizontally and to know the cost of querying the data store To allow streaming processing, write/cost versus read/cost should be equivalent 7 of 1

Redis: a key-value/tuple store Redis is key store written in C with an extended set of data types like lists, sets, ranked sets, hashes, queues Redis is usually in memory with persistence achieved by regularly saving on disk Redis API is simple (telnet-like) and supported by a multitude of programming languages http://www.redis.io/ 8 of 1

Redis: installation Download Redis 2.8.19 (stable version) tar xvfz redis-2.8.19.tar.gz cd redis-2.8.19 make 9 of 1

Keys Keys are free text values (up to 2 31 bytes) - newline not allowed Short keys are usually better (to save memory) Naming convention are used like keys separated by colon 10 of 1

Value and data types binary-safe strings lists of binary-safe strings sets of binary-safe strings hashes (dictionary-like) pubsub channels 11 of 1

Running redis and talking to redis... screen cd./src/ &&./redis-server new screen session (crtl-a c) redis-cli DBSIZE 12 of 1

Commands available on all keys Those commands are available on all keys regardless of their type TYPE [key] gives you the type of key (from string to hash) EXISTS [key] does the key exist in the current database RENAME [old new] RENAMENX [old new] DEL [key] RANDOMKEY returns a random key TTL [key] returns the number of sec before expiration EXPIRE [key ttl] or EXPIRE [key ts] KEYS [pattern] returns all keys matching a pattern (!to use with care) 13 of 1

Commands available for strings type SET [key] [value] GET [key] MGET [key1] [key2] [key3] MSET [key1] [valueofkey1]... INCR [key] INCRBY [key] [value]! string interpreted as integer DECR [key] INCRBY [key] [value]! string interpreted as integer APPEND [key] [value] 14 of 1

Commands available for sets type SADD [key] [member] adds a member to a set named key SMEMBERS [key] return the member of a set SREM [key] [member] removes a member to a set named key SCARD [key] returns the cardinality of a set SUNION [key...] returns the union of all the sets SINTER [key...] returns the intersection of all the sets SDIFF [key...] returns the difference of all the sets S...STORE [destkey key...] same as before but stores the result 15 of 1

Commands available for list type RPUSH - LPUSH [key] [value] LLEN [key] LRANGE [key] [start] [end] LTRIM [key] [start] [end] LSET [key] [index] [value] LREM [key] [count] [value] 16 of 1

Sorting SORT [key] SORT [key] LIMIT 0 4 17 of 1

Commands availabled for sorted set type ZADD [key] [score] [member] ZCARD [key] ZSCORE [key] [member] ZRANK [key] [member] get the rank of a member from bottom ZREVRANK [key] [member] get the rank of a member from top 18 of 1

Atomic commands GETSET [key] [newvalue] sets newvalue and return previous value (M)SETNX [key] [newvalue] sets newvalue except if key exists (useful for locking) MSETNX is very useful to update a large set of objects without race condition. 19 of 1

Database commands SELECT [0-15] selects a database (default is 0) MOVE [key] [db] move key to another database FLUSHDB delete all the keys in the current database FLUSHALL delete all the keys in all the databases SAVE - BGSAVE save database on disk (directly or in background) DBSIZE MONITOR what s going on against your redis datastore (check also redis-stat) 20 of 1

Redis from shell? ret=$(redis-cli SADD dns:${md5} ${rdata}) num=$(redis-cli SCARD dns:${md5}) Why not Python? import redis r = redis.strictredis(host= localhost, port=6379, db=0) r.set( foo, bar ) r.get( foo ) 21 of 1

How do you integrate it? ls -1./pcap/*.pcap parallel --gnu "cat {1} tshark -E separator=, -Tfields -e http.server -r {1} python import.py -f {1} " Code need to be shared? 22 of 1

Structure and Sharing As you start to work on a project and you are willing to share it One approach is to start with a common directory structure for the software to be shared /PcapClassifier/bin/ -> where to store the code /etc/ -> where to store configuration /data/ README.md 23 of 1

import.py (first iteration) # Need to catch the MD5 filename of the malware ( from the pcap filename ) import argparse import sys argparser = argparse. ArgumentParser ( description = Pcap classifier ) argparser. add_argument ( -f, action = append, help = Filename ) args = argparser. parse_args () if args.f is not None : filename = args.f for line in sys. stdin : print ( line. split ("," )[0]) else : argparser. print_help () 24 of 1

Storing the code and its revision Initialize a git repository (one time) cd PcapClassifier git init Add a file to the index (one time) git add./ bin / import.py Commit a file to the index (when you do changes) git commit./ bin / import.py 25 of 1

import.py (second iteration) # Need to know and store the MD5 filename processed import argparse import sys import redis argparser = argparse. ArgumentParser ( description = Pcap classifier ) argparser. add_argument ( -f, action = append, help = Filename ) args = argparser. parse_args () r = redis. StrictRedis (host = localhost, port =6379, db =0) if args.f is not None : md5 = args.f [0]. split ("." )[0] r. sadd ( processed, md5 ) for line in sys. stdin : print ( line. split ("," )[0]) else : argparser. print_help () 26 of 1

Redis datastructure Now we have a set of the file processed (including the MD5 hash of the malware): {processed}={abcdef..., deadbeef..., 0101aba...} A type set to store the field type decoded from the pcap (e.g. http.user agent) {type}={ http.user agent, http.server } A unique set to store all the values seen per type {e : http.user agent}={ Mozilla..., Curl... } 27 of 1

import.py (third iteration) import argparse import sys import redis argparser = argparse. ArgumentParser ( description = Pcap classifier ) argparser. add_argument ( -f, action = append, help = Filename ) args = argparser. parse_args () r = redis. StrictRedis ( host = localhost, port =6379, db =0) if args.f is not None : md5 = args.f [0]. split ("." )[0] r. sadd ( processed, md5 ) lnumber =0 fields = None for line in sys. stdin : if lnumber == 0: fields = line. rstrip (). split (",") for field in fields : r. sadd ( type, field ) else : elements = line. rstrip (). split (",") i=0 for element in elements : try : r. sadd ( e: + fields [i], element ) except IndexError : print (" Empty fields ") i=i+1 else : lnumber = lnumber + 1 argparser. print_help () 28 of 1

How to link a type to a MD5 malware hash? Now, we have set of value per type {e : http.user agent}={ Mozilla..., Curl... } Using MD5 to hash type value and create a set of type value for each malware {v : MD5hex( Mozilla... )}={ MD5 of malware,... } 29 of 1

import.py (fourth iteration) import argparse import sys import redis import hashlib argparser = argparse. ArgumentParser ( description = Pcap classifier ) argparser. add_argument ( -f, action = append, help = Filename ) args = argparser. parse_args () r = redis. StrictRedis ( host = localhost, port =6379, db =0) if args.f is not None : md5 = args.f [0]. split ("." )[0] r. sadd ( processed, md5 ) lnumber =0 fields = None for line in sys. stdin : if lnumber == 0: # print ( line. split (",")[0]) fields = line. rstrip (). split (",") for field in fields : r. sadd ( type, field ) else : elements = line. rstrip (). split (",") i=0 for element in elements : try : r. sadd ( e: + fields [i], element ) ehash = hashlib. md5 () ehash. update ( element. encode ( utf -8 )) ehhex = ehash. hexdigest () if element is not "": r. sadd ( v: + ehhex, md5 ) except IndexError : print (" Empty fields ") i=i+1 lnumber = lnumber + 1 else : argparser. print_help () 30 of 1

graph.py - exporting graph from Redis import redis import networkx as nx import hashlib r = redis. StrictRedis (host = localhost, port =6379, db =0) g = nx. Graph () for malware in r. smembers ( processed ): g. add_node ( malware ) for fieldtype in r. smembers ( type ): g. add_node ( fieldtype ) for v in r. smembers ( e: +fieldtype. decode ( utf -8 )): g. add_node (v) ehash = hashlib. md5 () ehash. update (v) ehhex = ehash. hexdigest () for m in r. smembers ( v: + ehhex ): print (m) g. add_edge (v,m) nx. write_gexf (g,"/tmp /out.gexf ") 31 of 1

graph.py - exporting graph from Redis 32 of 1