External data representation

Similar documents
CSCI-1680 RPC and Data Representation John Jannotti

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca

CSCI-1680 RPC and Data Representation. Rodrigo Fonseca

Protocol Buffers, grpc

CS 417 9/18/17. Paul Krzyzanowski 1. Socket-based communication. Distributed Systems 03. Remote Procedure Calls. Sample SMTP Interaction

Distributed Systems. 03. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Fall 2017

NETWORK AND SYSTEM PROGRAMMING

Distributed Systems 8. Remote Procedure Calls

Operating Systems. 18. Remote Procedure Calls. Paul Krzyzanowski. Rutgers University. Spring /20/ Paul Krzyzanowski

Socket Programming for TCP and UDP

markup language carry data define your own tags self-descriptive W3C Recommendation

CSC Web Technologies, Spring Web Data Exchange Formats

Introduction to JSON. Roger Lacroix MQ Technical Conference v

Semesterprojekt Implementierung eines Brettspiels (inklusive computergesteuerter Spieler) Remote Procedure Calls

Assigns a number to 110,000 letters/glyphs U+0041 is an A U+0062 is an a. U+00A9 is a copyright symbol U+0F03 is an

INTERNET PROGRAMMING XML

15-213/ Final Exam Notes Sheet Spring 2013!

Googles Approach for Distributed Systems. Slides partially based upon Majd F. Sakr, Vinay Kolar, Mohammad Hammoud and Google PrototBuf Tutorial

B4M36DS2, BE4M36DS2: Database Systems 2

Piqi-RPC. Exposing Erlang services via JSON, XML and Google Protocol Buffers over HTTP. Friday, March 25, 2011

Avro Specification

JSON Support for Junos OS

Programming Ethernet with Socket API. Hui Chen, Ph.D. Dept. of Engineering & Computer Science Virginia State University Petersburg, VA 23806

Distributed Systems. How do regular procedure calls work in programming languages? Problems with sockets RPC. Regular procedure calls

Problem 1: Concepts (Please be concise)

Oral. Total. Dated Sign (2) (5) (3) (2)

User Interaction: XML and JSON

Grading for Assignment #1

Avro Specification

Programming in C. Lecture 9: Tooling. Dr Neel Krishnaswami. Michaelmas Term

Apache Thrift Introduction & Tutorial

Section 3: File I/O, JSON, Generics. Meghan Cowan

So, if you receive data from a server, in JSON format, you can use it like any other JavaScript object.

Programming Ethernet with Socket API. Hui Chen, Ph.D. Dept. of Engineering & Computer Science Virginia State University Petersburg, VA 23806

Flat triples approach to RDF graphs in JSON

AVR Helper Library 1.3. Dean Ferreyra

Google Protocol Buffers for Embedded IoT

Jansson Documentation

JSON is a light-weight alternative to XML for data-interchange JSON = JavaScript Object Notation

Introduction to XML. M2 MIA, Grenoble Université. François Faure

CS 378 Big Data Programming

Binary Encodings for JavaScript Object Notation: JSON-B, JSON-C, JSON-D

Distributed Systems Theory 4. Remote Procedure Call. October 17, 2008

Distributed Programming and Remote Procedure Calls (RPC): Apache Thrift. George Porter CSE 124 February 19, 2015

Thrift specification - Remote Procedure Call

Core Components of Policy-based Telemetry Streaming

Chapter 4 Communication

Socket Programming. CSIS0234A Computer and Communication Networks. Socket Programming in C

JSON as an XML Alternative. JSON is a light-weight alternative to XML for datainterchange

Binghamton University. CS-211 Fall Syntax. What the Compiler needs to understand your program

Overview. Communication types and role of Middleware Remote Procedure Call (RPC) Message Oriented Communication Multicasting 2/36

pybdg Documentation Release 1.0.dev2 Outernet Inc

s Protocol Buffer Knight76 at gmail.com

CSE 544 Principles of Database Management Systems. Lecture 4: Data Models a Never-Ending Story

ReJSON = { "activity": "new trick" } Itamar

Introduction to Azure DocumentDB. Jeff Renz, BI Architect RevGen Partners

CLIENT-SIDE PROGRAMMING

CSC 337. JavaScript Object Notation (JSON) Rick Mercer

C 1. Recap: Finger Table. CSE 486/586 Distributed Systems Remote Procedure Call. Chord: Node Joins and Leaves. Recall? Socket API

Packaging Data for the Web

Projects and Apache Thrift. August 30 th 2018 Suresh Marru, Marlon Pierce

This tutorial will help you understand JSON and its use within various programming languages such as PHP, PERL, Python, Ruby, Java, etc.

Interprocess Communication

CPSC 3740 Programming Languages University of Lethbridge. Data Types

What is XML? XML is designed to transport and store data.

REMOTE PROCEDURE CALLS EE324

Communication in Distributed Systems

Linux Programming

Tutorial on Socket Programming

CS Programming In C

CS 261 Fall Mike Lam, Professor Integer Encodings

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

Elliotte Rusty Harold August From XML to Flat Buffers: Markup in the Twenty-teens

External Data Representation (XDR)

Networked Applications: Sockets. End System: Computer on the Net

Socket Programming. Joseph Cordina

// socket for establishing connections

Teach your (micro)services speak Protocol Buffers with grpc.

End-to-End Data. Presentation Formatting. Difficulties. Outline Formatting Compression

Socket Programming. #In the name of Allah. Computer Engineering Department Sharif University of Technology CE443- Computer Networks

Programming in C++ 6. Floating point data types

Data Types. (with Examples In Haskell) COMP 524: Programming Languages Srinivas Krishnan March 22, 2011

Notes. Submit homework on Blackboard The first homework deadline is the end of Sunday, Feb 11 th. Final slides have 'Spring 2018' in chapter title

Data Types Literals, Variables & Constants

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

Objectives. Chapter 2: Basic Elements of C++ Introduction. Objectives (cont d.) A C++ Program (cont d.) A C++ Program

Chapter 2: Basic Elements of C++

Simple Object Access Protocol (SOAP) Reference: 1. Web Services, Gustavo Alonso et. al., Springer

Distributed Systems Lecture 2 1. External Data Representation and Marshalling (Sec. 4.3) Request reply protocol (failure modes) (Sec. 4.

Chapter 2: Basic Elements of C++ Objectives. Objectives (cont d.) A C++ Program. Introduction

C OVERVIEW. C Overview. Goals speed portability allow access to features of the architecture speed


Short Notes of CS201

Serialization and deserialization of complex data structures, and applications in high performance computing

Outline. EEC-681/781 Distributed Computing Systems. The OSI Network Architecture. Inter-Process Communications (IPC) Lecture 4

CS201 - Introduction to Programming Glossary By

Comparative Survey of Object Serialization Techniques and the Programming Supports

Java language. Part 1. Java fundamentals. Yevhen Berkunskyi, NUoS

XML. Marie Dubremetz Uppsala, April 2014

Transcription:

External data representation https://developers.google.com/protocol-buffers/ https://github.com/protobuf-c/protobuf-c http://www.drdobbs.com/webdevelopment/after-xml-json-thenwhat/240151851 http://www.digip.org/jansson/ 1

System data Internal data is represented as data structures C structures/arrays Java objects Transferred data is represented as byte sequences Data must be flatten to be transmitted Same data type can have multiple representations e.g. floats/integers/characters 2

Data transmission Format of the transmitted data should be agreed conversion to a common format transmitter converts receiver converts transmitted in the sender format receiver converts 3

Data transfer Sends request receives results Deserialization Serialization Waits for request (well know address) Deserialization Serialization Communication potocol Network data format What kind of protocol to use, and what data to transmit? Efficient mechanism for storing and exchanging data Requirements Correction Efficiency Interoperability (language/os) Ease to use 4

Data representation CORBA Overdesigned and heavyweight Java object serialization taylored to one environment: Java DCOM, COM+ taylored to one environment: Windows JSON, Plain Text, XML Lack protocol description. Programmer has to maintain both client and server code. XML has high parsing overhead. Relatively expensive to process; large due to repeated tags Binary 5

Binary - byteorder Byte oder on >16bits numbers depend on the processor architecture Can be done in two ways: Big-endian: lower addresses with higher order bits (ex: ARM *). Little-endian: lower addresses with lower order bits (ex: Intel x86). Integer 1000465 (0x000F4411), Big-endian Little-endian 0x10003 11 0x10003 00 0x10002 44 0x10002 0F 0x10001 0F 0x10001 44 0x10000 00 0x10000 11

Big-endian advantages: htons htonl ntohs ntohl Integers are stored in the same order as strings (from left to right). Number signal is on the first byte (base address). Little-endian advantages: Eases conversion between different length integers (ex: 12 is represented by 0x0C eor 0x000C). In the Internet, Addresses are alwayes big-endian. The first ARPANET routers (named Interface Message Processor) were 16 bits Honeywell DDP-516 computers bigendian representation

uint32_t htonl(uint32_t hostlong); htons htonl ntohs ntohl The htonl() function converts the unsigned integer hostlong from host byte order tonetwork byte order. uint16_t htons(uint16_t hostshort); The htons() function converts the unsigned short integer hostshort from host byte order to network byte order. uint32_t ntohl(uint32_t netlong); The ntohl() function converts the unsigned integer netlong from network byte order to host byte order. uint16_t ntohs(uint16_t netshort); The ntohs() function converts the unsigned short integer netshort from network byte order to host byte order. On the i386 the host byte order is Least Significant Byte first, whereas the network byte order, as used on the Internet, is Most Significant Byte first.

How to represent data? How to represent data? Which data types do you want to support? Base types, Flat types, Complex types How to encode data into the wire How to decode the data? Self-describing (tags) Implicit description (the ends know) Several answers: Many frameworks do these things automatically 9

JSON JSON is built on two structures: A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array. An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence. These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures. 10

JSON An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by, (comma). 11

JSON An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by, (comma). 12

JSON A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested. 13

http://www.digip.org/jansson/ Parse text: root = json_loads(text, 0, &error); if(!root){ return 1; } Verify if is array if(!json_is_array(root)){ fprintf(stderr, "error: root is not an array\n"); } return 1; Verify if is object if(!json_is_object(data)){ fprintf(stderr, "error: commit data is not an object\n"); } return 1; 14

http://www.digip.org/jansson/ Get data from array for(i = 0; i < json_array_size(root); i++){ data = json_array_get(root, i);. Get data from object sha = json_object_get(data, "sha"); if(!json_is_string(sha)){ fprintf(stderr, "error: sha is not a string\n"); return 1; } message_text = json_string_value(sha); 15

http://www.digip.org/jansson/ json_t *json_array(void) int json_array_append(json_t *array, json_t *value) json_t *json_object(void) int json_object_set(json_t *object, const char *key, json_t *value json_t *json_integer(json_int_t value) char *json_dumps(const json_t *json, size_t flags) 16

xml XML stands for extensible Markup Language XML is a markup language much like HTML XML was designed to store and transport data XML was designed to be self-descriptive XML is a W3C Recommendation <note> <to>tove</to> <from>jani</from> <heading>reminder</heading> <body>don't forget me this weekend!</body> </note> 17

xml XML Documents Must Have a Root Element All XML Elements Must Have a Closing Tag XML Tags are Case Sensitive XML Elements Must be Properly Nested XML Attribute Values Must be Quoted White-space is Preserved in XML XML Stores New Line as LF Well Formed XML 18

JSON vs XML Similarities: Both are human readable Both have very simple syntax Both are hierarchical Both are language independent Differences: Syntax is different JSON is less verbose JSON includes arrays Names in JSON must not be JavaScript reserved words XML can be validated JavaScript is not typically used on the server side Still require explicit parsing and processing by the programmer 19

Data Schema How to parse the encoded data? Two Extremes: Self-describing data: tags Additional information added to message to help in decoding Examples: feld name, type, length Implicit: the code at both ends knows how to decode the message Interoperability depends on well defined protocol specification! Very difficult to change 20

Frameworks 21

Stub Generation Many systems generate stub code from independent specification: IDL IDL Interface Description Language describes an interface in a language neutral way Separates logical description of data from Dispatching code Marshalling/unmarshalling code Data wire format 22

Google Protocol Buffers Defined by Google, released to the public Widely used internally and externally Supports common types, service definitions Natively generates C++/Java/Python code Over 20 other supported by third parties Not a full RPC system, only does marshalling Many third party RPC implementations Effcient binary encoding, readable text encoding Performance 3 to 10 times smaller than XML 20 to 100 times faster to process 23

Google Protocol Buffers Properties Efficient, binary serialization Support protocol evolution Can add new parameters Order in which I specify parameters is not important Skip non-essential parameters Supports types which give you compile-time errors! Supports somewhat complex structures Usage Pattern: for each RPC call, define a new message type for its input and one for its output in a.proto file Protocol buffers are used for other things, e.g., serializing data to non-relational databases their backward compatible features make for nice long-term storage formats Google uses them everywhere (50k proto buf definitions) 24

Google Protocol Buffers 25

Google Protocol Buffers Support service definitions and stub generation, but don't come with transport for RPC There are third-party libraries for that 26

Goal of Protocol Buffer The goal of Protocol Buffer is to provide a language- and platform-neutral way to specify and serialize data such that: Serialization process is efficient, extensible and simple to use Serialized data can be stored or transmitted over the network In Protocol buffers, Google has designed a language to specify messages 27

Protocol Buffer Language Message contains uniquely numbered fields Field is represented by field-type, Data-type Field-name encoding-value default value] Available data-types Primitive data-type int, float, bool, string, raw-bytes Enumerated data-type Nested Message message Person { required string name = 1; required int32 id = 2; optional string email = 3; enum PhoneType { MOBILE = 0; HOME = 1; WORK = 2; } message PhoneNumber { required string number = 1; optional PhoneType type = 2 [default = HOME]; } Allows structuring data into an hierarchy } repeated PhoneNumber phone = 4; 28

Protocol Buffer Language Field-types can be: Required fields Optional fields Repeated fields Dynamically sized array Encoding-value A unique number =1 =2 represents a tag that a particular field has in the binary encoding of the message message Person { } required string name = 1; required int32 id = 2; optional string email = 3; enum PhoneType { } MOBILE = 0; HOME = 1; WORK = 2; message PhoneNumber { } required string number = 1; optional PhoneType type = 2 [default = HOME]; repeated PhoneNumber phone = 4; 29