Naming in Distributed Systems Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University
Outline for Today's Talk Overview: Names, Identifiers, Addresses, Routes, Name Space, Name Resolution, ... Flat Naming Structured Naming Attribute-based Naming
Readings for Today's Lecture Chapter 5 of Distributed Systems: Principles and Paradigms NDSS 2011 paper on monitoring DNS queries and responses
Names, Identifiers, and Addresses An entity in a distributed system can be pretty much anything. A name is a string of bits or characters used to refer to an entity. We operate on an entity through its access point, and the address is the name of that access point. Example: a telephone is an access point to a person, so the telephone number is that person's address. Transport-level addresses: an IP address plus a port number. Properties of entities: an entity can have several access points and therefore several addresses (a person can have several telephone numbers). Entities may change access points over time (telephone numbers, e-mail addresses, IP addresses in mobile systems, ...).
Flat Naming CprE 450-550
Overview of Flat Naming In many cases, identifiers are simply random bit strings (i.e., unstructured or flat names). A flat name contains no information on how to locate the access point of its associated entity. Issue: how to locate an entity given only its identifier? Simple solutions: broadcast and multicast; forwarding pointers.
Simple Solution: Broadcast and Multicast Basic idea: broadcast a message containing the identifier of the entity. Each machine checks whether it has that entity. Only the machine that offers an access point for the entity sends a reply message containing the address of that access point. Works well in LANs. Example: ARP (Address Resolution Protocol). Broadcasting becomes inefficient as the network grows, because every machine must process every request; multicasting restricts the request to a group of hosts.
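The broadcast idea can be simulated in a few lines. This is a minimal sketch, assuming hosts are in-memory objects; the `Host` class, `broadcast_lookup`, `multicast_lookup`, and the addresses are hypothetical, and nothing here reproduces the ARP packet format. The returned count shows why broadcasting gets expensive: every host that receives the request has to process it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """A machine on a simulated LAN (hypothetical model)."""
    address: str
    # Entity identifier -> address of the access point this host offers.
    entities: dict = field(default_factory=dict)

    def handle_request(self, entity_id: str) -> Optional[str]:
        # Each host checks whether it has the entity; only that host replies.
        return self.entities.get(entity_id)


def broadcast_lookup(hosts, entity_id):
    """Deliver the request to every host; return (reply, hosts that processed it)."""
    replies = [h.handle_request(entity_id) for h in hosts]
    found = [r for r in replies if r is not None]
    return (found[0] if found else None), len(hosts)


def multicast_lookup(hosts, group, entity_id):
    """Same protocol, but only members of the multicast group receive the request."""
    return broadcast_lookup([h for h in hosts if h.address in group], entity_id)


lan = [Host("10.0.0.1"), Host("10.0.0.2", {"E": "10.0.0.2:8080"}), Host("10.0.0.3")]
print(broadcast_lookup(lan, "E"))                # ('10.0.0.2:8080', 3)
print(multicast_lookup(lan, {"10.0.0.2"}, "E"))  # ('10.0.0.2:8080', 1)
```

Multicast only helps if the hosts that may hold the entity are known to be in the group; choosing that group is left open here.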
Simple Solutions: Forwarding Pointers Basic idea: when an entity moves from A to B, it leaves behind in A a reference to its new location at B. Advantages: simplicity; works well in LANs. Drawbacks: the chain for a highly mobile entity can become very long, making every lookup expensive. All intermediate locations have to maintain their part of the chain of forwarding pointers as long as needed. The chain is vulnerable to broken links: if any pointer is lost, the entity can no longer be reached. Issue: how to keep chains relatively short and robust?
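The sketch below builds a chain of forwarding pointers and shows one way of keeping it short: after a successful lookup, every location on the path is redirected straight to the entity's current location. It is a minimal illustration with hypothetical names (`Location`, `move`, `locate`). The short-cutting policy is one common remedy for long chains, not necessarily the one intended on this slide. A missing pointer raises an error, which reflects the broken-link weakness.

```python
class Location:
    """An address space that either hosts the entity or holds a forwarding pointer."""

    def __init__(self, name):
        self.name = name
        self.hosts_entity = False
        self.forward_to = None  # the next Location in the chain, if any


def move(old, new):
    """Move the entity from `old` to `new`, leaving a forwarding pointer behind."""
    old.hosts_entity = False
    old.forward_to = new
    new.hosts_entity = True
    new.forward_to = None
    return new


def locate(start, shortcut=True):
    """Follow the chain from `start`; return (current location, hops taken)."""
    path, node = [], start
    while not node.hosts_entity:
        if node.forward_to is None:
            raise LookupError(f"broken chain at {node.name}")
        path.append(node)
        node = node.forward_to
    if shortcut:
        # Short-cut the chain: every visited location now points directly at the entity.
        for visited in path:
            visited.forward_to = node
    return node, len(path)


a, b, c, d = (Location(n) for n in "ABCD")
a.hosts_entity = True
current = a
for nxt in (b, c, d):
    current = move(current, nxt)
print(locate(a)[1])  # 3 hops along A -> B -> C -> D
print(locate(a)[1])  # 1 hop after short-cutting
```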
Home-based Approaches Broadcasting and forwarding pointers scale poorly. One solution is to use a home location that keeps track of the current location of an entity. Examples: a fall-back mechanism for location services based on forwarding pointers; Mobile IP. Drawbacks: increased communication latency, because every lookup first contacts the home, which may be far away; a fixed home location, so the home must always exist or contacting the entity becomes impossible. This is a problem when a long-lived entity moves permanently to a different location. Solution: register the home at a naming service and let a client first look up the location of the home.
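A home location can be sketched as a table, kept at a fixed place, that maps an entity's home address to its current address. This is a minimal sketch inspired by the home agent in Mobile IP, not the Mobile IP protocol itself. The class name and the addresses, which come from documentation ranges, are hypothetical.

```python
class HomeAgent:
    """Fixed home location that tracks where each entity currently is."""

    def __init__(self):
        self.current = {}  # home address -> current (care-of) address

    def register(self, home_address, current_address):
        # The mobile entity reports every move to its home.
        self.current[home_address] = current_address

    def deregister(self, home_address):
        # The entity is back home and reachable at its home address again.
        self.current.pop(home_address, None)

    def resolve(self, home_address):
        # Every client contacts the home first. This extra round trip is the
        # latency drawback, and if the home disappears no lookup can succeed.
        return self.current.get(home_address, home_address)


home = HomeAgent()
print(home.resolve("198.51.100.7"))  # 198.51.100.7 (entity is at home)
home.register("198.51.100.7", "203.0.113.42")
print(home.resolve("198.51.100.7"))  # 203.0.113.42
```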
Distributed Hash Tables Various DHT-based systems exist. General mechanism (Chord): Chord uses an m-bit identifier space. Nodes are assigned randomly chosen identifiers, and keys (entity identifiers) are mapped into the same space. m can be 128 or 160. An entity with key k falls under the jurisdiction of the node with the smallest identifier id >= k, wrapping around modulo 2^m. This node is called the successor of k, succ(k). Issue: how to efficiently resolve a key k to the address of succ(k)?
Distributed Hash Tables (2) Example: Resolving key 26 from node 1 and key 12 from node 28 in a Chord system.
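The successor rule and finger-table routing can be checked on a small ring. This sketch assumes m = 5 and the node identifiers 1, 4, 9, 11, 14, 18, 20, 21, 28. The node set is an assumption, since the figure itself is not reproduced here. Finger i of node p is taken to be succ(p + 2^(i-1)), and a node forwards a request to the farthest finger that does not pass k, as in Chord. For brevity the code computes succ with global knowledge, whereas in a real deployment each node only knows its own finger table.

```python
M = 5                                              # bits in the identifier space (assumed)
RING = 2 ** M
NODES = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])  # hypothetical node identifiers


def succ(k):
    """Smallest node identifier >= k, wrapping around the ring."""
    k %= RING
    for node in NODES:
        if node >= k:
            return node
    return NODES[0]


def finger_table(p):
    """Finger i (1-based) of node p is succ(p + 2^(i-1)); stored 0-based here."""
    return [succ(p + 2 ** i) for i in range(M)]


def in_half_open(x, a, b):
    """x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def in_open(x, a, b):
    """x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def lookup(start, k):
    """Route a request for key k starting at node `start`; return the nodes visited."""
    path, p = [start], start
    while True:
        fingers = finger_table(p)
        if in_half_open(k, p, fingers[0]):
            # k lies between p and its immediate successor, which is responsible for k.
            path.append(fingers[0])
            return path
        # Otherwise jump to the farthest finger that still precedes k.
        nxt = fingers[0]
        for f in fingers:
            if in_open(f, p, k):
                nxt = f
        path.append(nxt)
        p = nxt


print(lookup(1, 26))   # [1, 18, 20, 21, 28]
print(lookup(28, 12))  # [28, 4, 9, 11, 14]
```

Each hop at least halves the remaining distance to k on the ring, which is why a lookup needs O(log N) hops.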
Distributed Hash Tables (3) Exploiting network proximity: Topology-based assignment of node identifiers Proximity routing Proximity neighbor selection Iterative vs. recursive lookup
Hierarchical Approaches Hierarchical organization of a location service into domains, each having an associated directory node (figure labels: root (directory) node, domains, leaf domains).
Hierarchical Approaches (2) An example of storing information of an entity having two addresses in different leaf domains.
Hierarchical Approaches (3) Looking up a location in a hierarchically organized location service.
Hierarchical Approaches (4) (a) An insert request is forwarded to the first node that knows about entity E. (b) A chain of forwarding pointers to the leaf node is created
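Slides (2) to (4) can be tied together in one sketch. In a tree of directory nodes, a leaf stores an entity's addresses and every higher node stores pointers to the children whose domain contains the entity. The class and function names are hypothetical. Insertion creates the pointers while travelling upward and stops at the first node that already knows the entity. This gives the same end state as creating the chain on the way back down. A lookup climbs until some directory node knows the entity and then follows the pointers down, so it returns addresses from the smallest domain that contains both the requester and the entity.

```python
class DirNode:
    """Directory node of one domain in a hierarchical location service."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Leaf: entity -> set of addresses. Internal node: entity -> set of child DirNodes.
        self.records = {}
        if parent is not None:
            parent.children.append(self)


def insert(leaf, entity, address):
    """Store an address at a leaf and build the chain of pointers above it."""
    leaf.records.setdefault(entity, set()).add(address)
    child, node = leaf, leaf.parent
    while node is not None:
        already_known = entity in node.records
        node.records.setdefault(entity, set()).add(child)  # pointer down to the child
        if already_known:
            break  # first node that already knew about the entity: stop here
        child, node = node, node.parent


def lookup(start, entity):
    """Climb until a node knows the entity, then follow pointers down to the leaves."""
    node = start
    while entity not in node.records:
        if node.parent is None:
            return set()  # not even the root knows the entity
        node = node.parent
    found, frontier = set(), [node]
    while frontier:
        n = frontier.pop()
        if n.children:
            frontier.extend(n.records[entity])
        else:
            found |= n.records[entity]
    return found


root = DirNode("root")
d1, d2 = DirNode("D1", root), DirNode("D2", root)
l1, l2, l3 = DirNode("L1", d1), DirNode("L2", d1), DirNode("L3", d2)
insert(l1, "E", "addr-1")
insert(l3, "E", "addr-2")         # stops at the root, which already knows E
print(lookup(l2, "E"))            # {'addr-1'}: found within domain D1
print(sorted(lookup(root, "E")))  # ['addr-1', 'addr-2']
```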
Structured Naming
Overview of Structured Naming Flat names are good for machines but not convenient for humans to use. Structured names are composed of simple, human-readable names. Name space: leaf nodes, a root node, and directory nodes, where each directory node stores a directory table. Path names: absolute and relative path names; global and local names.
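Resolving a path name amounts to following edge labels through directory tables. This is a minimal sketch, assuming a UNIX-like syntax in which labels are separated by '/' and an absolute name starts with '/'. The classes, `resolve_name`, and the example labels are hypothetical.

```python
class DirectoryNode:
    """Directory node: its directory table maps edge labels to child nodes."""

    def __init__(self, **table):
        self.table = dict(table)


class LeafNode:
    """Leaf node: holds information on the entity it represents."""

    def __init__(self, content):
        self.content = content


def resolve(start, labels):
    """Follow a sequence of edge labels from `start` through directory tables."""
    node = start
    for label in labels:
        if not isinstance(node, DirectoryNode) or label not in node.table:
            raise LookupError(f"cannot resolve {label!r}")
        node = node.table[label]
    return node


def resolve_name(root, cwd, name):
    """Absolute names start at the root node; relative names start at cwd."""
    start = root if name.startswith("/") else cwd
    return resolve(start, [label for label in name.split("/") if label])


mbox = LeafNode("mailbox of alice")
home = DirectoryNode(alice=DirectoryNode(mbox=mbox))
root = DirectoryNode(home=home)
print(resolve_name(root, home, "/home/alice/mbox").content)  # absolute path name
print(resolve_name(root, home, "alice/mbox").content)        # relative to cwd = /home
```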
Structured Naming: Another example The general organization of the UNIX file system implementation on a logical disk of contiguous disk blocks.
Name Resolution Closure mechanism: knowing how and where to start name resolution. Linking and mounting: aliases, implemented as hard links (several path names referring to the same node) or symbolic links (a leaf node that stores an absolute path name); mount points. Information required to mount a foreign name space in a distributed system: the name of an access protocol; the name of the server; the name of the mounting point in the foreign name space.
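Symbolic links and mount points both interrupt resolution and restart it elsewhere. In this sketch, where all names are hypothetical, a symbolic link stores an absolute path name that is resolved from the root. A mount point stores the three items listed above: the access protocol, the server, and the mounting point in the foreign name space. A dictionary stands in for actually contacting the server, and "nfs" is only a label, not an NFS client.

```python
class Dir:
    def __init__(self, **entries):
        self.table = dict(entries)


class File:
    def __init__(self, data):
        self.data = data


class SymLink:
    """Leaf node storing an absolute path name (an alias)."""
    def __init__(self, target):
        self.target = target


class MountPoint:
    """Directory node referring to a foreign name space."""
    def __init__(self, protocol, server, path):
        self.protocol, self.server, self.path = protocol, server, path


# Stand-in for "contact `server` using `protocol`": maps (protocol, server) to a root.
FOREIGN_ROOTS = {}


def resolve(root, path, hops=0):
    if hops > 8:
        raise LookupError("too many links or mount points")
    labels = [label for label in path.split("/") if label]
    node = root
    for i, label in enumerate(labels):
        if not isinstance(node, Dir) or label not in node.table:
            raise LookupError(f"cannot resolve {label!r}")
        node = node.table[label]
        rest = "/".join(labels[i + 1:])
        if isinstance(node, SymLink):
            # Restart at the root of the current name space with the stored name.
            return resolve(root, node.target + "/" + rest, hops + 1)
        if isinstance(node, MountPoint):
            # Continue in the foreign name space at the mounting point.
            foreign_root = FOREIGN_ROOTS[(node.protocol, node.server)]
            return resolve(foreign_root, node.path + "/" + rest, hops + 1)
    return node


FOREIGN_ROOTS[("nfs", "server.example.org")] = Dir(home=Dir(alice=Dir(notes=File("remote notes"))))
root = Dir(
    bin=Dir(ls=File("ls program")),
    ls=SymLink("/bin/ls"),
    remote=MountPoint("nfs", "server.example.org", "/home"),
)
print(resolve(root, "/ls").data)                  # ls program
print(resolve(root, "/remote/alice/notes").data)  # remote notes
```

The hop limit guards against symbolic links that point back to themselves.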
Linking and Mounting Symbolic Link Remote File System Mounting
Name Space Distribution (1) An example partitioning of the DNS name space, including Internet-accessible files, into three layers.
Name Space Distribution (2) A comparison between name servers for implementing nodes from a large-scale name space partitioned into a global layer, an administrational layer, and a managerial layer.
Implementation of Name Resolution Where to start name resolution? (closure) Simplified picture: no replication of name servers; no client-side caching; each client has access to a local name resolver. Example: resolve root:<edu,iastate,ee,ftp,pub,netex,index.txt> Iterative resolution vs. recursive resolution.
Implementation of Name Resolution (2) The principle of iterative name resolution.
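Iterative resolution of the example name can be sketched as follows. The server names, their tables, and the address 192.0.2.21 are hypothetical. Each server resolves one label, and the client's resolver contacts the next server itself. The resolver stops at the first label that maps to an address (the FTP server here) and hands the remaining labels to that server. Messages are counted as one request plus one reply per server.

```python
# Hypothetical name servers: label -> ("server", next server) or ("address", value).
SERVERS = {
    "root": {"edu": ("server", "edu-ns")},
    "edu-ns": {"iastate": ("server", "iastate-ns")},
    "iastate-ns": {"ee": ("server", "ee-ns")},
    "ee-ns": {"ftp": ("address", "192.0.2.21")},
}


def query(server, label):
    """One request/reply exchange with a single name server."""
    return SERVERS[server][label]


def resolve_iteratively(labels, start="root"):
    """The client's resolver contacts each name server in turn."""
    server, messages = start, 0
    for i, label in enumerate(labels):
        kind, value = query(server, label)
        messages += 2  # request from the client plus the server's reply
        if kind == "address":
            return value, labels[i + 1:], messages
        server = value  # the reply names the next server to contact
    raise LookupError("resolution ended at a directory node")


name = ["edu", "iastate", "ee", "ftp", "pub", "netex", "index.txt"]
print(resolve_iteratively(name))
# ('192.0.2.21', ['pub', 'netex', 'index.txt'], 8)
```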
Implementation of Name Resolution (3) The principle of recursive name resolution.
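Recursive resolution can be sketched with hypothetical name servers that each resolve one label. Each server passes the rest of the name to the next server and caches the answer before replying, which is the state that recursive servers keep. Here the client passes only the labels up to the host name. Caching whole label sequences is a simplification chosen for brevity. The message counts show the effect of caching: a repeated query is answered by the root from its cache.

```python
# Hypothetical name servers: label -> ("server", next server) or ("address", value).
SERVERS = {
    "root": {"edu": ("server", "edu-ns")},
    "edu-ns": {"iastate": ("server", "iastate-ns")},
    "iastate-ns": {"ee": ("server", "ee-ns")},
    "ee-ns": {"ftp": ("address", "192.0.2.21")},
}


class NameServer:
    def __init__(self, name, table):
        self.name = name
        self.table = table
        self.cache = {}  # tuple of labels -> final address

    def resolve(self, labels, registry):
        """Resolve labels recursively; return (address, messages between servers)."""
        key = tuple(labels)
        if key in self.cache:
            return self.cache[key], 0
        kind, value = self.table[labels[0]]
        if kind == "address":
            result, messages = value, 0
        else:
            # Ask the next server on the client's behalf (request + reply).
            result, messages = registry[value].resolve(labels[1:], registry)
            messages += 2
        self.cache[key] = result
        return result, messages


registry = {name: NameServer(name, table) for name, table in SERVERS.items()}


def resolve_recursively(labels):
    """The client sends one request to the root and waits for the final answer."""
    result, messages = registry["root"].resolve(labels, registry)
    return result, messages + 2  # plus the client's own request and reply


host = ["edu", "iastate", "ee", "ftp"]
print(resolve_recursively(host))  # ('192.0.2.21', 8): cold caches
print(resolve_recursively(host))  # ('192.0.2.21', 2): answered from the root's cache
```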
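Iterative vs. Recursive Iterative resolution: name servers remain stateless with respect to a resolution. Recursive resolution: higher-level servers need to maintain state about ongoing resolutions, but caching is more effective and communication costs can be reduced. Example: the Domain Name System.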
The DNS Name Space
Type of record  Associated entity  Description
SOA             Zone               Holds information on the represented zone
A               Host               Contains an IP address of the host this node represents
MX              Domain             Refers to a mail server to handle mail addressed to this node
SRV             Domain             Refers to a server handling a specific service
NS              Zone               Refers to a name server that implements the represented zone
CNAME           Node               Symbolic link with the primary name of the represented node
PTR             Host               Contains the canonical name of a host
HINFO           Host               Holds information on the host this node represents
TXT             Any kind           Contains any entity-specific information considered useful
The most important types of resource records forming the contents of nodes in the DNS name space.
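The record types in the table can be illustrated with a toy, in-memory zone. This is not a DNS implementation. The zone contents are hypothetical (example.org is a reserved example domain), and real resolvers obtain these records over the DNS protocol. The sketch shows two behaviors implied by the table. A CNAME is followed to the primary name. MX records point to mail servers and are ordered by their preference value, where a lower value is preferred.

```python
# Hypothetical zone contents: (name, record type) -> list of values.
ZONE = {
    ("example.org", "NS"): ["ns1.example.org"],
    ("example.org", "MX"): [(20, "backup-mail.example.org"), (10, "mail.example.org")],
    ("mail.example.org", "A"): ["192.0.2.25"],
    ("www.example.org", "CNAME"): ["web1.example.org"],
    ("web1.example.org", "A"): ["192.0.2.80"],
}


def lookup(name, rtype, max_aliases=8):
    """Return the records of type `rtype` for `name`, following CNAME aliases."""
    for _ in range(max_aliases):
        if (name, rtype) in ZONE:
            return ZONE[(name, rtype)]
        alias = ZONE.get((name, "CNAME"))
        if not alias:
            return []
        name = alias[0]  # continue with the primary (canonical) name
    raise LookupError("too many CNAME aliases")


def mail_servers(domain):
    """Mail servers for a domain, most preferred (lowest MX preference) first."""
    return [host for _, host in sorted(lookup(domain, "MX"))]


print(lookup("www.example.org", "A"))  # ['192.0.2.80']
print(mail_servers("example.org"))     # ['mail.example.org', 'backup-mail.example.org']
```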
DNS Implementation An excerpt from the DNS database for the zone cs.vu.nl.
Attribute-based Naming
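Attribute-based Naming Entities are described by (attribute, value) pairs. Directory services: X.500, with its Directory Information Tree (DIT), Directory System Agents (DSA), and Directory User Agents (DUA). Hierarchical implementation: LDAP, which combines structured naming with attribute-based naming. A simple example of an LDAP directory entry using LDAP naming conventions.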
Hierarchical Implementations: LDAP
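Attribute-based lookup in a directory can be sketched as matching (attribute, value) pairs against entries. The entries below and the order of naming attributes are hypothetical. The distinguished name is written root-first with '/' separators, in the style of X.500 examples, whereas LDAP's string representation (RFC 4514) lists the components leaf-first, separated by commas. This is not an LDAP client, and '*' serves as a simple wildcard.

```python
# Hypothetical directory entries, each a set of (attribute, value) pairs.
ENTRIES = [
    {"C": "US", "L": "Springfield", "O": "Example University", "OU": "ECE", "CN": "Main server"},
    {"C": "US", "L": "Springfield", "O": "Example University", "OU": "ECE", "CN": "Mail server"},
    {"C": "US", "L": "Springfield", "O": "Example University", "OU": "Library", "CN": "Main server"},
]

# Naming attributes from the top of the directory information tree down (assumed).
NAMING_ORDER = ["C", "L", "O", "OU", "CN"]


def distinguished_name(entry):
    """Globally unique name built from the naming attributes, root first."""
    return "/" + "/".join(f"{attr}={entry[attr]}" for attr in NAMING_ORDER)


def search(**query):
    """Names of entries matching every attribute in the query ('*' matches any value)."""
    return [
        distinguished_name(e)
        for e in ENTRIES
        if all(attr in e and (value == "*" or e[attr] == value) for attr, value in query.items())
    ]


print(search(OU="ECE", CN="Main server"))
# ['/C=US/L=Springfield/O=Example University/OU=ECE/CN=Main server']
```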
Decentralized Implementation Mapping to Distributed Hash Tables Attribute-value tree (AVTree) (a) A general description of a resource. (b) Its representation as an AVTree.
Mapping to Distributed Hash Tables (a) The resource description of a query. (b) Its representation as an AVTree.
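An AVTree can be mapped onto a DHT by hashing every root-to-node path of a resource's tree and storing the resource under each resulting key. A query hashes its own fully specified paths and intersects the results, while a wildcard value contributes only the path of its attribute. Several details are assumptions made for illustration: the nested-dictionary encoding of an AVTree, the '-' separator, the use of SHA-1, and the plain dictionary standing in for the DHT.

```python
import hashlib

DHT = {}  # stand-in for a distributed hash table: key -> set of resource names


def paths(tree, prefix=()):
    """Yield every root-to-node path of an AVTree encoded as nested dicts."""
    for label, subtree in tree.items():
        path = prefix + (label,)
        yield path
        yield from paths(subtree, path)


def key(path):
    """Hash a path such as type-book-author into the DHT's key space."""
    return hashlib.sha1("-".join(path).encode()).hexdigest()


def publish(resource, description):
    """Store the resource under the key of every path in its description."""
    for path in paths(description):
        DHT.setdefault(key(path), set()).add(resource)


def query(description):
    """Resources matching all fully specified paths; '*' is a wildcard value."""
    result = None
    for path in paths(description):
        if "*" in path:
            continue  # a wildcard cannot be hashed; its attribute's path still counts
        matches = DHT.get(key(path), set())
        result = matches if result is None else result & matches
    return set(result) if result else set()


def book(author, title):
    return {"type": {"book": {"author": {author: {}}, "title": {title: {}}}}}


publish("lotr", book("Tolkien", "LOTR"))
publish("hobbit", book("Tolkien", "The Hobbit"))
publish("dune", book("Herbert", "Dune"))
print(sorted(query(book("Tolkien", "*"))))  # ['hobbit', 'lotr']
```

A query must touch one DHT key per path, so deep descriptions cost several lookups. That cost is the price of supporting partial matches with a plain exact-match DHT.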
Tor Hidden Service
Tor Hidden Service (cont.)
Monitoring DNS Queries and Responses NDSS '11 paper: EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis
Botnets and other malware The Domain Name System (DNS) provides a two-way mapping between domain names and their IP addresses. Many malicious services also depend on DNS. Fast-flux (FF) DNS techniques frequently change the mapping of a domain name to different IP addresses, so that botnets work like a global Content Delivery Network (CDN). Identifying malicious domains can help defend against Internet threats such as botnets and phishing.
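EXPOSURE System overview: the Data Collector records DNS queries, which form the unlabeled data. The Malicious/Benign Domains Collector supplies labeled data: malicious domains come from blacklists and DGA (domain generation algorithm) domains, and benign domains are the Alexa top 1000 domains and domains older than one year. Feature Attribution computes the features of each domain, the Learning Module trains on the labeled data, and the Classifier labels the remaining domains.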
FEATURES EXPOSURE uses four groups of features: time-based features, DNS answer-based features, TTL value-based features, and domain name-based features.
TIME-BASED FEATURES Short life: a sudden increase in requests followed by a sudden decrease. Daily similarity: an increase or decrease of the request count at the same intervals every day. Repeating patterns; change point detection. Access ratio: whether the domain is mostly idle or continuously accessed.
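Some of these time-based properties can be turned into numbers computed from a domain's hourly request counts. The definitions below are hypothetical and serve only as an illustration. Cosine similarity between consecutive days is swapped in as the daily-similarity measure, the 24-hour short-life threshold is arbitrary, and the paper's change point detection is not reproduced.

```python
import math


def daily_profiles(hourly_counts):
    """Split hourly request counts into complete days of 24 values each."""
    return [hourly_counts[d:d + 24] for d in range(0, len(hourly_counts) - 23, 24)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def daily_similarity(hourly_counts):
    """Mean similarity of consecutive days; near 1.0 when the same hours are busy every day."""
    days = daily_profiles(hourly_counts)
    sims = [cosine(a, b) for a, b in zip(days, days[1:])]
    return sum(sims) / len(sims) if sims else 0.0


def access_ratio(hourly_counts):
    """Fraction of hours with at least one request: idle versus continuously accessed."""
    return sum(1 for c in hourly_counts if c) / len(hourly_counts)


def short_lived(hourly_counts, max_span_hours=24):
    """True if all requests fall into one short burst (sudden rise, sudden fall)."""
    active = [i for i, c in enumerate(hourly_counts) if c]
    return bool(active) and active[-1] - active[0] < max_span_hours


office_hours = [5 if 9 <= h % 24 < 17 else 0 for h in range(72)]  # three similar days
burst = [0] * 30 + [100, 400, 50] + [0] * 39                      # one short spike
print(round(daily_similarity(office_hours), 3), short_lived(office_hours))  # 1.0 False
print(round(daily_similarity(burst), 3), short_lived(burst))                # 0.0 True
```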
DNS ANSWER-BASED FEATURES Malicious domains tend to show large values for: the number of distinct IP addresses that are resolved for a given domain; the number of distinct countries that these IP addresses are located in; the number of distinct domains that share the returned IP address; the number of distinct domains that share the IP addresses that resolve to the given domain.
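Given passive DNS observations, the first two features and a domain-sharing count can be computed as below. The record format, the IP-to-country table, and the addresses (taken from documentation ranges) are hypothetical. A real system would use a geolocation database and far more data.

```python
from collections import defaultdict


def answer_features(records, ip_country, domain):
    """DNS answer-based features for one domain.

    records: iterable of (domain, resolved IP) pairs seen in DNS answers.
    ip_country: mapping from IP address to country code; unknown IPs count as "??".
    """
    records = list(records)
    ips = {ip for d, ip in records if d == domain}
    countries = {ip_country.get(ip, "??") for ip in ips}
    domains_per_ip = defaultdict(set)
    for d, ip in records:
        domains_per_ip[ip].add(d)
    # Other domains that resolve to at least one of this domain's IP addresses.
    sharing = set().union(*(domains_per_ip[ip] for ip in ips)) - {domain}
    return {
        "distinct_ips": len(ips),
        "distinct_countries": len(countries),
        "domains_sharing_ips": len(sharing),
    }


records = [
    ("flux.example", "198.51.100.1"), ("flux.example", "198.51.100.2"),
    ("flux.example", "203.0.113.7"), ("other.example", "203.0.113.7"),
    ("benign.example", "192.0.2.10"),
]
ip_country = {"198.51.100.1": "US", "198.51.100.2": "DE", "203.0.113.7": "BR", "192.0.2.10": "US"}
print(answer_features(records, ip_country, "flux.example"))
# {'distinct_ips': 3, 'distinct_countries': 3, 'domains_sharing_ips': 1}
```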
TTL VALUE-BASED FEATURES Malicious domains tend to use small TTLs, with many distinct values and frequent changes. Features: average TTL; standard deviation of TTL; number of distinct TTL values; number of TTL changes; percentage usage of specific TTL ranges. The range [0, 100) exhibits a significant peak for malicious domains.
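The TTL statistics listed above can be computed directly from the sequence of TTLs observed for a domain. This sketch makes three assumptions. TTLs are in seconds. "TTL changes" counts the differences between consecutive observations. Only the [0, 100) range from the slide is computed, not a full set of ranges.

```python
import statistics


def ttl_features(ttls):
    """TTL value-based features for one domain (TTLs in seconds, in observation order)."""
    return {
        "avg_ttl": statistics.mean(ttls),
        "std_ttl": statistics.pstdev(ttls),
        "distinct_ttls": len(set(ttls)),
        "ttl_changes": sum(1 for a, b in zip(ttls, ttls[1:]) if a != b),
        "pct_in_0_100": 100 * sum(1 for t in ttls if 0 <= t < 100) / len(ttls),
    }


flux = ttl_features([30, 30, 60, 300])
stable = ttl_features([86400] * 5)
print(flux["distinct_ttls"], flux["ttl_changes"], flux["pct_in_0_100"])        # 3 2 75.0
print(stable["distinct_ttls"], stable["ttl_changes"], stable["pct_in_0_100"])  # 1 0 0.0
```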
DOMAIN NAME-BASED FEATURES Ratio of numerical characters to the length of the domain name. Ratio of the length of the longest meaningful substring to the length of the domain name. Rationale: benign domain names are chosen so that they can be easily remembered, but attackers do not care about memorability.
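Both domain-name ratios can be computed directly. This is a minimal sketch. The tiny word list is a hypothetical stand-in for a real dictionary, substrings are searched by brute force, and the ratios are taken over the whole domain string, including dots and the top-level domain.

```python
# Hypothetical word list standing in for a real dictionary.
WORDS = {"pay", "paypal", "secure", "login", "bank", "mail"}


def numeric_ratio(name):
    """Fraction of characters in the domain name that are digits."""
    return sum(ch.isdigit() for ch in name) / len(name)


def longest_meaningful_ratio(name, words=WORDS):
    """Length of the longest dictionary word inside the name, relative to its length."""
    longest = 0
    for i in range(len(name)):
        for j in range(i + 1, len(name) + 1):
            if name[i:j] in words:
                longest = max(longest, j - i)
    return longest / len(name)


print(numeric_ratio("paypal.com"), longest_meaningful_ratio("paypal.com"))  # 0.0 0.6
print(round(numeric_ratio("x7k2q9z1.info"), 3), longest_meaningful_ratio("x7k2q9z1.info"))  # 0.308 0.0
```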
EVALUATION The authors report that their method detects a large number of previously unknown malicious domains in DNS traffic, with a significant performance improvement over previous work.
LIMITATIONS Attackers can evade EXPOSURE by avoiding the specific features and behavior it looks for in DNS traffic, but doing so costs them reliability in their malicious infrastructures. The detection ratio depends on the training set: EXPOSURE cannot detect malicious domains whose behavior has never been encountered before.
Questions? Thanks, and see you next time.