SAS7BDAT Database Binary Format

Size: px
Start display at page:

Download "SAS7BDAT Database Binary Format"

Transcription

1 SAS7BDAT Database Binary Format Matthew S. Shotwell Contents ˆ Introduction ˆ SAS7BDAT Header ˆ SAS7BDAT Pages ˆ SAS7BDAT Subheaders ˆ SAS7BDAT Packed Binary Data ˆ Platform Differences ˆ Compression Data ˆ Software Prototype ˆ ToDo Introduction The SAS7BDAT file is a binary database storage file. At the time of this writing, no description of the SAS7BDAT file format is publicly available. Hence, users who wish to read and manipulate these files must obtain a license for the SAS software, or third party software with support for SAS7BDAT files. The purpose of this document is to promote interoperability between SAS and other popular statistical software packages, especially R ( The information below was deduced by examining the contents of many SAS7BDAT databases downloaded freely from internet resources (see data/sources.csv). No guarantee is made regarding its accuracy. No SAS software, nor any other software requiring the purchase of a license was used. SAS7BDAT files consist of binary encoded data. Data files encoded in this format often have the extension.sas7bdat. The name SAS7BDAT is not official, but is used throughout this document to refer to SAS database files formatted according to the descriptions below. There appear to be significant differences in the SAS7BDAT format across operating systems (see platform differences). The format described below applies to the majority of the collection of test files referenced in data/sources.csv directory (i.e. files associated with Microsoft Windows). The figure below illustrates the overall structure of the SAS7BDAT database. Each file consists of a 1024 byte header, followed by PC pages, each of length PS bytes (PC and PS are shorthand for page count and page size respectively, and are used to denote these quantities throughout this document).: 1024 header PS page 1 PS page

2 PS page PC SAS7BDAT Header The SAS7BDAT file header contains a binary file identifier (i.e. a magic number), the dataset name, timestamp, the number pages (PC), their size (PS) and a variety of other values that pertain to the database as a whole. The purpose of many header fields remain unknown, but are likely to include specifications for data compression and encryption, password protection, and dates/times of creation and/or modification. All files encountered encode multi-byte values little-endian (least significant byte first). However, it is typical to specify endianness of multi-byte values in a file header. The offset table below describes the SAS7BDAT file header as a sequence of bytes. Information stored in the table is indexed by its byte offset (first column) in the header and its length (second column) in bytes. Byte lengths having the form %n should read: the number of bytes remaining until byte n. The fourth column gives a short description of the data contained at this address. For example, LE uint, page size := PS indicates that the data stored at the corresponding location is a little-endian unsigned integer representing the page size, which we denote PS. The description???????????? indicates that the meaning of data stored at the corresponding location is unknown. The third column represents the author s confidence (low, medium, high) in the corresponding offset, length, and description. Each offset table in this document is formatted in a similar fashion. Variables defined in an offset table are sometimes used in subsequent tables. Header Offset Table 0 32 high binary, magic number 32 3 low???????????? 35 3 low bitmasks (SAS host, SAS release) 38 1 low???????????? 39 1 low ascii, file format version (1-UNIX or 2-WIN) low???????????? high ascii, dataset name medium ascii, file type high 2x LE double, timestamp, secs since 1/1/ low???????????? low???????????? high LE uint, page size := PS high LE uint, page count := PC low???????????? high ascii, release high ascii, host low???????????? low string with timestamps, license? 336 %1024 medium filler/zeros The bitmasks at offsets 35, 36, and 37 appear to hold information regarding the offset of the release and host information. The following table describes the possible polymorphisms, where the first column contains the hex values for bytes 35-37, the second column shows bytes (. represents a non- 2

3 ASCII character or 0, a represents an ASCII character), and the third column gives the type of platform data observed there ( WIN * represents various Microsoft Windows types, such as WIN NT and WIN PRO ). Additional data files are needed to investigate this aspect further. bytes host + release data platform aaaaaaaaaaaaaaaa... WIN * and Linux aaaaaaaaaaaa... WIN aaaaaaaaaaaaaaaa SunOS The byte at offset 39 appears to distinguish the file format type, where 1 indicates that the file was generated on a UNIX-like system, such as Linux or SunOS, and 2 indicates the file was generated on a Microsoft Windows platform. Magic Number The SAS7BDAT magic number is the following 32 byte (hex) sequence.: c2 ea b cf bd c7 31 8c 18 1f SAS7BDAT Pages Following the SAS7BDAT header are pages of data. Each page can be one of (at least) four types. The first three are those that contain meta-information (e.g. field/column attributes), packed binary data, or a combination of both. These types are denoted meta, data, and mix respectively. Meta-information is required to correctly interpret the packed binary information. Hence, this information must be parsed first. In test files (see data/sources.csv), pages containing meta-information always precede pages consisting entirely of packed binary data. In some test data files (from a single source), there is a fourth page type (04) which appears to encode additional meta information. This page usually occurs last, and appears to contain amended meta information. It s purpose is unclear. The page offset table below describes each page type. Byte offsets appended with one of (meta/mix), (mix), or (data) indicate that the corresponding length and description apply only to pages of the listed type. Page Offset Table 0 4 low???????????? (sometimes repeated) 4 8 low???????????? (not critical) 12 4 low???????????? row/col related (not critical) 16 1 low???????????? 17 1 low LE uint, page type meta/data/mix/? (0/1/2/4) 18 (meta/mix) 2 low???????????? 20 (meta/mix) 4 medium LE uint, number of subheader pointers := L 24 (meta/mix) L*12 medium L subheader pointers, 24+L*12 := M M (meta) %PS medium subheader data M+M%8 (mix) %PS medium SAS7BDAT packed binary data... continued on next page 3

4 18 (data) 4 medium LE uint, page row count 24 (data) %PS medium SAS7BDAT packed binary data If a page is of type meta or mix, data beginning at offset byte 24 are a sequence of L 12-byte subheader pointers, which point to an offset farther down the page. SAS7BDAT Subheaders stored at these offsets hold meta information about the database, including the column names, labels, and types. If a page is of type mix, then packed binary data begin at the next 8 byte boundary following the last subheader pointer. In this case, the data begin at offset 24+L*12 + (24+L*12) % 8, where % is the modulo operator. If a page is of type data, then packed binary data begin at offset 24. Subheader Pointers The subheader pointers encode information about the offset and length of subheaders relative to the beginning of the page where the subheader pointer is located. The purpose of the last four bytes of the subheader pointer are uncertain, but may indicate that additional subheader pointers are to be found on the next page, or that the corresponding subheader is not crucial. 0 4 high LE uint, offset from page start to subheader 4 4 high LE uint, length of subheader := H 8 1 low LE uint, optional (0/1)? 9 1 low LE uint, continue next page (0/1)? 10 2 low???????????? SAS7BDAT Subheaders Subheaders contain meta information regarding the SAS7BDAT database, including row and column counts, column names, labels, and types. Each subheader is associated with a four-byte signature that identifies the subheader type, and hence, how it should be parsed. Row Size Subheader The row size subheader holds information about row length (in bytes), their total count, and their count on a page of type mix. 0 4 medium binary, signature F7F7F7F low???????????? 20 4 medium LE uint, row length (in bytes) medium LE uint, row count := r (12 bytes?) medium LE uint, column count (12 bytes?) 48 4 low???????????? 52 4 low LE uint, page size? 56 4 low???????????? 60 4 medium LE uint, max row count on mix page 64 8 medium sequence of 8 FF, end of header... continued on next page 4

5 72 %H low filler Column Size Subheader The column size subheader holds the column count. 0 4 medium binary, signature F6F6F6F6 4 8 medium LE uint, column count := CC Signature 00FCFFFF Subheader The purpose of the subheader with signature 00FCFFFF is unknown. This subheader might contain pointers to column formatting information relative to the column text subheader. 0 4 medium binary, signature 00FCFFFF 4 %H low???????????? Column Text Subheader The column text subheader contains all text associated with columns, including the column name, label, and formatting. However, this subheader is not sufficient to parse these information. Other subheaders (e.g. the column name subheader), which point to specific elements relative to this subheader are also needed. 0 4 medium binary, signature FDFFFFFF 4 12 medium LE uint, length of remaining subheader medium ascii, proc name that generated data? 76 %H high ascii, combined column names, labels, formats Column Name Subheader The column name subheader contains a sequence of column name pointers to the offset of each column name relative to the column text subheader. 0 4 medium binary, signature FFFFFFFF 4 8 medium LE uint, length of remaining subheader 12 8*CC medium column name pointers (see below) 12+8*CC 8 medium filler Column Name Pointers 5

6 0 1 low LE uint, offset relative to page 04 subheader 0 1 low????????????? 2 2 medium LE uint, column name offset w.r.t. FDFFFFFF 4 2 medium LE uint, column name length 6 2 low binary, zeros If the first byte in the column name pointer is 01 (it is usually 00), this indicates that the column name offset is relative to an amendment subheader (i.e. a subheader with the same signature, but found on an amendment page (page type 04). Column Attributes Subheader The column attribute subheader holds information regarding the column offsets within a row, the column widths, and the column types (either numeric or character). The column attribute subheader sometimes occurs more than once (in test data). In these cases, column attributes are applied in the order they are parsed. 0 4 medium binary, signature FCFFFFFF 4 8 medium LE uint, length of remaining subheader 12 12*CC medium column attributes (see below) 12+12*CC 8 medium filler Column Attributes 0 4 medium LE uint, column offset in w.r.t. row 4 4 medium LE uint, column width 8 2 low???????????? 10 2 medium LE uint, column type (01-num, 02-chr) Column Label Subheader The column label subheader contains a column label pointer to the offset of a column label relative to the column text subheader. Since the column label subheader only contains information regarding a single column, there are typically as many column label subheaders as columns. 0 4 medium binary, signature FEFBFFFF 4 38 low???????????? 42 2 medium LE uint, column label offset wrt FDFFFFFF 44 2 medium LE uint, column label length 46 6 low???????????? 6

7 SAS7BDAT Packed Binary Data SAS7BDAT packed binary data are stored by rows, where the size of a row (in bytes) is defined by the row size subheader. When multiple rows occur on a single page, they are immediately adjacent. When a database contains many rows, it is typical that the collection of rows (i.e. their data) is evenly distributed to a number of data pages. However, in test files, no single row s data is broken across two or more pages. A single data row is parsed by interpreting the binary data according to the collection of column attributes contained in the column attributes subheader. Binary data can be interpreted in two ways, as ASCII characters, or as floating point numbers. The column width attribute specifies the number of bytes associated with a column. For character data, this interpretation is straight-forward. For numeric data, interpretation of the column width is more complex. The common binary representation of floating point numbers has three parts; the sign (s), exponent (e), and mantissa (m). The corresponding floating point number is s * m ^ e. Under the IEEE 754 floating point standard, the sign requires 1 bit, the exponent requires 11, and the mantissa requires 52 bits, for a total of 8 bytes. In SAS7BDAT file, numeric quantities can be 3, 4, 5, 6, 7, or 8 bytes in length. For numeric quantities using less than 8 bytes, some number of bytes are absent from the most significant part of the mantissa. The smaller width mantissa means that the range of possible values is restricted. The table of numeric binary formats below describes how bits are distributed among the six possible column widths in SAS7BDAT files. Numeric Binary Formats size 24bit 32bit 40bit 48bit 56bit 64bit 1 bytes sign exponent mantissa Platform Differences The test files referenced in data/sources.csv were examined over a period of time. Files with non- Microsoft Windows markings were only observed late into the writing of this document. Consequently (but not intentionally), the SAS7BDAT description above is specific to SAS datasets generated on the most commonly observed platform: Microsoft Windows. The format of SAS7BDAT files generated on other platforms are formatted differently. In particular, the files natlerr1944.sas7bdat, natlerr2006.sas7bdat appear to be generated on the SunOS platform. The header in these files appear to be 8196 bytes, rather than the 1024 seen on Microsoft Windows platforms. The files cfrance2.sas7bdat, cfrance.sas7bdat, coutline.sas7bdat, gfrance2.sas7bdat, gfrance.sas7bdat, goutline.sas7bdat, xfrance2.sas7bdat, xfrance.sas7bdat, xoutline.sas7bdat appear to be generated on a Linux system. Compression Data The table below presents the results of compression tests on a collection of 142 SAS7BDAT data files (sources in data/). The type field represents the type of compression, ctime is the compression time (in seconds), dtime is the decompression time, and the compression ratio field holds the cumulative disk usage (in megabytes) before and after compression. Although the xz algorithm requires significantly more time to compress these data, the decompression time is on par with gzip. 1 Only 64bit is IEEE 754 compliant! 7

8 type ctime dtime compression ratio gzip s 2.6s 541M / 30.3M = 17.9 bzip s 11.2s 541M / 19.0M = 28.5 xz s 2.7s 541M / 12.8M = 42.3 Software Prototype The prototype program for reading SAS7BDAT formatted files is implemented entirely in R (see file src/sas7bdat.r). Files not recognized as having been generated under a Microsoft Windows platform are rejected (for now). Implementation of the read.sas7bdat function should be considered a reference implementation, and not one designed with performance in mind. There are certain advantages and disadvantages to developing a prototype of this nature in R. Advantages: 1. R is an interpreted language with built-in debugger. Hence, experimental routines may be implemented and debugged quickly and interactively, without the need of external compiler or debugger tools (e.g. gcc, gdb). 2. R programs are portable across a variety of computing platforms. This is especially important in the present context, because manipulating files stored on disk is a platform-specific task. Platform-specific operations are abstracted from the R user. Disadvantages: 1. Manipulating binary (raw) data in R is a relatively new capability. The best tools and practices for binary data operations are not as developed as those for other data types. 2. Interpreted code is often much less efficient than compiled code. This is not major disadvantage for prototype implementations because human code development is far less efficient than the R interpreter. Gains made in efficient code development using an interpreted language far outweigh benefit of compiled languages. ToDo ˆ experiment further with amendment page concept ˆ consider header bytes -by- SAS host ˆ check that only one page of type mix is observed. If so insert In all test cases (data/sources.csv), there are exactly zero or one pages of type mix. under the Page Offset Table header. ˆ identify all missing value representations: missing numeric values appear to be represented as D1FFFF (nan) for numeric double quantities. ˆ identify purpose of subheader 00FCFFFF ˆ identify purpose of unknown header quantities ˆ determine other bytes in subheader with signature FEFBFFFF ˆ can SAS7BDAT files use non-ascii encoding? ˆ identify SAS7BDAT compression and encryption methods (this is not the same as cracking, or breaking encryption): data files may be compressed using the RLE (CHAR) and RDC (BINARY) algorithms. 8

Data Storage and Query Answering. Data Storage and Disk Structure (4)

Data Storage and Query Answering. Data Storage and Disk Structure (4) Data Storage and Query Answering Data Storage and Disk Structure (4) Introduction We have introduced secondary storage devices, in particular disks. Disks use blocks as basic units of transfer and storage.

More information

13 File Structures. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

13 File Structures. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to: 13 File Structures 13.1 Source: Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define two categories of access methods: sequential

More information

Basic data types. Building blocks of computation

Basic data types. Building blocks of computation Basic data types Building blocks of computation Goals By the end of this lesson you will be able to: Understand the commonly used basic data types of C++ including Characters Integers Floating-point values

More information

Number Systems. Binary Numbers. Appendix. Decimal notation represents numbers as powers of 10, for example

Number Systems. Binary Numbers. Appendix. Decimal notation represents numbers as powers of 10, for example Appendix F Number Systems Binary Numbers Decimal notation represents numbers as powers of 10, for example 1729 1 103 7 102 2 101 9 100 decimal = + + + There is no particular reason for the choice of 10,

More information

Portable Executable format, TitaniumCore report and packers. Katja Pericin

Portable Executable format, TitaniumCore report and packers. Katja Pericin Portable Executable format, TitaniumCore report and packers Katja Pericin Portable Executable format 3/21/2018 2 Introduction file? operating system abstraction for a data container segment(s) of physical

More information

(C) Copyright. Andrew Dainis =============================================================================

(C) Copyright. Andrew Dainis ============================================================================= (C) Copyright. Andrew Dainis ============================================================================= The C3D data file format was developed by Andrew Dainis in 1987 as a convenient and efficient

More information

Database Systems II. Record Organization

Database Systems II. Record Organization Database Systems II Record Organization CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Introduction We have introduced secondary storage devices, in particular disks. Disks use blocks as

More information

CAAM 420 FALL 2012 Lecture 9. Mishael Owoyemi

CAAM 420 FALL 2012 Lecture 9. Mishael Owoyemi CAAM 420 FALL 2012 Lecture 9 Mishael Owoyemi September 21, 2012 Table of Contents 1 Variables 3 1.1 Recap and Introduction.................................. 3 1.1.1 Recap........................................

More information

IECD Institute for Entrepreneurship and Career Development Bharathidasan University, Tiruchirappalli 23.

IECD Institute for Entrepreneurship and Career Development Bharathidasan University, Tiruchirappalli 23. Subject code - CCP01 Chapt Chapter 1 INTRODUCTION TO C 1. A group of software developed for certain purpose are referred as ---- a. Program b. Variable c. Software d. Data 2. Software is classified into

More information

1. Introduction to the Common Language Infrastructure

1. Introduction to the Common Language Infrastructure Miller-CHP1.fm Page 1 Wednesday, September 24, 2003 1:50 PM to the Common Language Infrastructure The Common Language Infrastructure (CLI) is an International Standard that is the basis for creating execution

More information

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19 Data Storage Geoffrey Brown Bryce Himebaugh Indiana University August 9, 2016 Geoffrey Brown, Bryce Himebaugh 2015 August 9, 2016 1 / 19 Outline Bits, Bytes, Words Word Size Byte Addressable Memory Byte

More information

File Format Specification MMPLD Version: 1.2 Release Author: Sebastian Grottel Date:

File Format Specification MMPLD Version: 1.2 Release Author: Sebastian Grottel Date: File Format Specification MMPLD Version: 1.2 Release Author: Sebastian Grottel Date: 17.05.2016 Preface The file formats MMPLD and MMDPLD basically are binary memory dumps of MegaMol s internal data structures,

More information

These notes are designed to provide an introductory-level knowledge appropriate to understanding the basics of digital data formats.

These notes are designed to provide an introductory-level knowledge appropriate to understanding the basics of digital data formats. A brief guide to binary data Mike Sandiford, March 2001 These notes are designed to provide an introductory-level knowledge appropriate to understanding the basics of digital data formats. The problem

More information

ASPRS LiDAR SPRS Data Exchan LiDAR Data Exchange Format Standard LAS ge Format Standard LAS IIT Kanp IIT Kan ur

ASPRS LiDAR SPRS Data Exchan LiDAR Data Exchange Format Standard LAS ge Format Standard LAS IIT Kanp IIT Kan ur ASPRS LiDAR Data Exchange Format Standard LAS IIT Kanpur 1 Definition: Files conforming to the ASPRS LIDAR data exchange format standard are named with a LAS extension. The LAS file is intended to contain

More information

Essential Skills for Bioinformatics: Unix/Linux

Essential Skills for Bioinformatics: Unix/Linux Essential Skills for Bioinformatics: Unix/Linux WORKING WITH COMPRESSED DATA Overview Data compression, the process of condensing data so that it takes up less space (on disk drives, in memory, or across

More information

CS Programming In C

CS Programming In C CS 24000 - Programming In C Week Two: Basic C Program Organization and Data Types Zhiyuan Li Department of Computer Science Purdue University, USA 2 int main() { } return 0; The Simplest C Program C programs

More information

BMP file format - Wikipedia

BMP file format - Wikipedia Page 1 of 3 Bitmap file header This block of bytes is at the start of the file and is used to identify the file. A typical application reads this block first to ensure that the file is actually a BMP file

More information

UNIT 7A Data Representation: Numbers and Text. Digital Data

UNIT 7A Data Representation: Numbers and Text. Digital Data UNIT 7A Data Representation: Numbers and Text 1 Digital Data 10010101011110101010110101001110 What does this binary sequence represent? It could be: an integer a floating point number text encoded with

More information

Unix and C Program Development SEEM

Unix and C Program Development SEEM Unix and C Program Development SEEM 3460 1 Operating Systems A computer system cannot function without an operating system (OS). There are many different operating systems available for PCs, minicomputers,

More information

Avro Specification

Avro Specification Table of contents 1 Introduction...2 2 Schema Declaration... 2 2.1 Primitive Types... 2 2.2 Complex Types...2 2.3 Names... 5 3 Data Serialization...6 3.1 Encodings... 6 3.2 Binary Encoding...6 3.3 JSON

More information

Avro Specification

Avro Specification Table of contents 1 Introduction...2 2 Schema Declaration... 2 2.1 Primitive Types... 2 2.2 Complex Types...2 2.3 Names... 5 2.4 Aliases... 6 3 Data Serialization...6 3.1 Encodings... 7 3.2 Binary Encoding...7

More information

Work relative to other classes

Work relative to other classes Work relative to other classes 1 Hours/week on projects 2 C BOOTCAMP DAY 1 CS3600, Northeastern University Slides adapted from Anandha Gopalan s CS132 course at Univ. of Pittsburgh Overview C: A language

More information

COMP2611: Computer Organization. Data Representation

COMP2611: Computer Organization. Data Representation COMP2611: Computer Organization Comp2611 Fall 2015 2 1. Binary numbers and 2 s Complement Numbers 3 Bits: are the basis for binary number representation in digital computers What you will learn here: How

More information

CS 525: Advanced Database Organization 03: Disk Organization

CS 525: Advanced Database Organization 03: Disk Organization CS 525: Advanced Database Organization 03: Disk Organization Boris Glavic Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab CS 525 Notes 3 1 Topics for today How to lay out

More information

BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS. What is SAS History of SAS Modules available SAS

BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS. What is SAS History of SAS Modules available SAS SAS COURSE CONTENT Course Duration - 40hrs BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS What is SAS History of SAS Modules available SAS GETTING STARTED

More information

Representing Data Elements

Representing Data Elements Representing Data Elements Week 10 and 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 18.3.2002 by Hector Garcia-Molina, Vera Goebel INF3100/INF4100 Database Systems Page

More information

Programming in C++ 6. Floating point data types

Programming in C++ 6. Floating point data types Programming in C++ 6. Floating point data types! Introduction! Type double! Type float! Changing types! Type promotion & conversion! Casts! Initialization! Assignment operators! Summary 1 Introduction

More information

A Proposal for Endian-Portable Record Representation Clauses

A Proposal for Endian-Portable Record Representation Clauses A Proposal for Endian-Portable Record Representation Clauses Norman H. Cohen What problem are we solving? There are at least two endian problems. One is the run-time problem of transferring data between

More information

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Lecture 3. Essential skills for bioinformatics: Unix/Linux Lecture 3 Essential skills for bioinformatics: Unix/Linux RETRIEVING DATA Overview Whether downloading large sequencing datasets or accessing a web application hundreds of times to download specific files,

More information

[MS-FSSHTTPD]: Binary Data Format for File Synchronization via SOAP. Intellectual Property Rights Notice for Open Specifications Documentation

[MS-FSSHTTPD]: Binary Data Format for File Synchronization via SOAP. Intellectual Property Rights Notice for Open Specifications Documentation [MS-FSSHTTPD]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation ( this documentation ) for protocols,

More information

CIS 2107 Computer Systems and Low-Level Programming Fall 2011 Midterm

CIS 2107 Computer Systems and Low-Level Programming Fall 2011 Midterm Fall 2011 Name: Page Points Score 1 5 2 10 3 10 4 7 5 8 6 15 7 4 8 7 9 16 10 18 Total: 100 Instructions The exam is closed book, closed notes. You may not use a calculator, cell phone, etc. For each of

More information

Here is a C function that will print a selected block of bytes from such a memory block, using an array-based view of the necessary logic:

Here is a C function that will print a selected block of bytes from such a memory block, using an array-based view of the necessary logic: Pointer Manipulations Pointer Casts and Data Accesses Viewing Memory The contents of a block of memory may be viewed as a collection of hex nybbles indicating the contents of the byte in the memory region;

More information

,879 B FAT #1 FAT #2 root directory data. Figure 1: Disk layout for a 1.44 Mb DOS diskette. B is the boot sector.

,879 B FAT #1 FAT #2 root directory data. Figure 1: Disk layout for a 1.44 Mb DOS diskette. B is the boot sector. Homework 11 Spring 2012 File Systems: Part 2 MAT 4970 April 18, 2012 Background To complete this assignment, you need to know how directories and files are stored on a 1.44 Mb diskette, formatted for DOS/Windows.

More information

CP SC 4040/6040 Computer Graphics Images. Joshua Levine

CP SC 4040/6040 Computer Graphics Images. Joshua Levine CP SC 4040/6040 Computer Graphics Images Joshua Levine levinej@clemson.edu Lecture 03 File Formats Aug. 27, 2015 Agenda pa01 - Due Tues. 9/8 at 11:59pm More info: http://people.cs.clemson.edu/ ~levinej/courses/6040

More information

US Options Complex Multicast TOP Specification

US Options Complex Multicast TOP Specification US Options Complex Multicast TOP Specification Version 1.0.12 March 23, 2018 Contents 1 Introduction... 5 1.1 Overview... 5 1.2 Feed Connectivity Requirements... 5 1.3 Symbol Ranges, Units, and Sequence

More information

The emulator represents a single 1103A machine word as the 36 least significant bits of a 64 bit unsigned integer.

The emulator represents a single 1103A machine word as the 36 least significant bits of a 64 bit unsigned integer. NAME atlas an 1103A emulator for UNIX systems SYNOPSIS atlas DESCRIPTION Atlas is an emulator for the Univac Scientific 1103A. In addition to implementing all 41 basic instructions of the CPU, including

More information

UNIT- 3 Introduction to C++

UNIT- 3 Introduction to C++ UNIT- 3 Introduction to C++ C++ Character Sets: Letters A-Z, a-z Digits 0-9 Special Symbols Space + - * / ^ \ ( ) [ ] =!= . $, ; : %! &? _ # = @ White Spaces Blank spaces, horizontal tab, carriage

More information

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation.

No Trade Secrets. Microsoft does not claim any trade secret rights in this documentation. [MS-FSSHTTPD]: Intellectual Property Rights Notice for Open Specifications Documentation Technical Documentation. Microsoft publishes Open Specifications documentation for protocols, file formats, languages,

More information

INTRODUCTION 1 AND REVIEW

INTRODUCTION 1 AND REVIEW INTRODUTION 1 AND REVIEW hapter SYS-ED/ OMPUTER EDUATION TEHNIQUES, IN. Programming: Advanced Objectives You will learn: Program structure. Program statements. Datatypes. Pointers. Arrays. Structures.

More information

/* defines are mostly used to declare constants */ #define MAX_ITER 10 // max number of iterations

/* defines are mostly used to declare constants */ #define MAX_ITER 10 // max number of iterations General Framework of a C program for MSP430 /* Compiler directives (includes & defines) come first */ // Code you write in this class will need this include #include msp430.h // CCS header for MSP430 family

More information

CHAPTER 3 Expressions, Functions, Output

CHAPTER 3 Expressions, Functions, Output CHAPTER 3 Expressions, Functions, Output More Data Types: Integral Number Types short, long, int (all represent integer values with no fractional part). Computer Representation of integer numbers - Number

More information

FYSOS and the Simple File System This document pertains to and is written for the purpose of adding this file system to FYSOS found at:

FYSOS and the Simple File System This document pertains to and is written for the purpose of adding this file system to FYSOS found at: The Simple File System 18 September 2017 Original Design by Brendan Trotter This documentation and minor additions by Benjamin David Lunt Copyright (c) Forever Young Software 1984-2017 Version 1.10.rc02

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Basic Compression Library

Basic Compression Library Basic Compression Library Manual API version 1.2 July 22, 2006 c 2003-2006 Marcus Geelnard Summary This document describes the algorithms used in the Basic Compression Library, and how to use the library

More information

Page 1. Where Have We Been? Chapter 2 Representing and Manipulating Information. Why Don t Computers Use Base 10?

Page 1. Where Have We Been? Chapter 2 Representing and Manipulating Information. Why Don t Computers Use Base 10? Where Have We Been? Class Introduction Great Realities of Computing Int s are not Integers, Float s are not Reals You must know assembly Memory Matters Performance! Asymptotic Complexity It s more than

More information

Memory and C/C++ modules

Memory and C/C++ modules Memory and C/C++ modules From Reading #5 and mostly #6 More OOP topics (templates; libraries) as time permits later Program building l Have: source code human readable instructions l Need: machine language

More information

Lab 03 - x86-64: atoi

Lab 03 - x86-64: atoi CSCI0330 Intro Computer Systems Doeppner Lab 03 - x86-64: atoi Due: October 1, 2017 at 4pm 1 Introduction 1 2 Assignment 1 2.1 Algorithm 2 3 Assembling and Testing 3 3.1 A Text Editor, Makefile, and gdb

More information

Time: 8:30-10:00 pm (Arrive at 8:15 pm) Location What to bring:

Time: 8:30-10:00 pm (Arrive at 8:15 pm) Location What to bring: ECE 120 Midterm 1 HKN Review Session Time: 8:30-10:00 pm (Arrive at 8:15 pm) Location: Your Room on Compass What to bring: icard, pens/pencils, Cheat sheet (Handwritten) Overview of Review Binary IEEE

More information

SAINT2. System Analysis Interface Tool 2. Emulation User Guide. Version 2.5. May 27, Copyright Delphi Automotive Systems Corporation 2009, 2010

SAINT2. System Analysis Interface Tool 2. Emulation User Guide. Version 2.5. May 27, Copyright Delphi Automotive Systems Corporation 2009, 2010 SAINT2 System Analysis Interface Tool 2 Emulation User Guide Version 2.5 May 27, 2010 Copyright Delphi Automotive Systems Corporation 2009, 2010 Maintained by: SAINT2 Team Delphi www.saint2support.com

More information

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number

More information

Lecture 3: C Programm

Lecture 3: C Programm 0 3 E CS 1 Lecture 3: C Programm ing Reading Quiz Note the intimidating red border! 2 A variable is: A. an area in memory that is reserved at run time to hold a value of particular type B. an area in memory

More information

LZ UTF8. LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties:

LZ UTF8. LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties: LZ UTF8 LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties: 1. Compress UTF 8 and 7 bit ASCII strings only. No support for arbitrary

More information

Course Schedule. CS 221 Computer Architecture. Week 3: Plan. I. Hexadecimals and Character Representations. Hexadecimal Representation

Course Schedule. CS 221 Computer Architecture. Week 3: Plan. I. Hexadecimals and Character Representations. Hexadecimal Representation Course Schedule CS 221 Computer Architecture Week 3: Information Representation (2) Fall 2001 W1 Sep 11- Sep 14 Introduction W2 Sep 18- Sep 21 Information Representation (1) (Chapter 3) W3 Sep 25- Sep

More information

Name: CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1. Question Points I. /34 II. /30 III.

Name: CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1. Question Points I. /34 II. /30 III. CMSC 313 Fall 2001 Computer Organization & Assembly Language Programming Exam 1 Name: Question Points I. /34 II. /30 III. /36 TOTAL: /100 Instructions: 1. This is a closed-book, closed-notes exam. 2. You

More information

MIB BROADCAST STREAM SPECIFICATION

MIB BROADCAST STREAM SPECIFICATION MIB BROADCAST STREAM SPECIFICATION November 5, 2002, Version 1.0 This document contains a specification for the MIB broadcast stream. It will be specified in a language independent manner. It is intended

More information

Introduction to Computers and Programming. Numeric Values

Introduction to Computers and Programming. Numeric Values Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 5 Reading: B pp. 47-71 Sept 1 003 Numeric Values Storing the value of 5 10 using ASCII: 00110010 00110101 Binary notation: 00000000

More information

Administrivia. CMSC 216 Introduction to Computer Systems Lecture 24 Data Representation and Libraries. Representing characters DATA REPRESENTATION

Administrivia. CMSC 216 Introduction to Computer Systems Lecture 24 Data Representation and Libraries. Representing characters DATA REPRESENTATION Administrivia CMSC 216 Introduction to Computer Systems Lecture 24 Data Representation and Libraries Jan Plane & Alan Sussman {jplane, als}@cs.umd.edu Project 6 due next Friday, 12/10 public tests posted

More information

EE168 Lab/Homework #1 Introduction to Digital Image Processing Handout #3

EE168 Lab/Homework #1 Introduction to Digital Image Processing Handout #3 EE168 Lab/Homework #1 Introduction to Digital Image Processing Handout #3 We will be combining laboratory exercises with homework problems in the lab sessions for this course. In the scheduled lab times,

More information

CITS2401 Computer Analysis & Visualisation

CITS2401 Computer Analysis & Visualisation FACULTY OF ENGINEERING, COMPUTING AND MATHEMATICS CITS2401 Computer Analysis & Visualisation SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING Topic 3 Introduction to Matlab Material from MATLAB for

More information

Lecture 2: C Programm

Lecture 2: C Programm 0 3 E CS 1 Lecture 2: C Programm ing C Programming Procedural thought process No built in object abstractions data separate from methods/functions Low memory overhead compared to Java No overhead of classes

More information

MACHINE LEVEL REPRESENTATION OF DATA

MACHINE LEVEL REPRESENTATION OF DATA MACHINE LEVEL REPRESENTATION OF DATA CHAPTER 2 1 Objectives Understand how integers and fractional numbers are represented in binary Explore the relationship between decimal number system and number systems

More information

Description Hex M E V smallest value > largest denormalized negative infinity number with hex representation 3BB0 ---

Description Hex M E V smallest value > largest denormalized negative infinity number with hex representation 3BB0 --- CSE2421 HOMEWORK #2 DUE DATE: MONDAY 11/5 11:59pm PROBLEM 2.84 Given a floating-point format with a k-bit exponent and an n-bit fraction, write formulas for the exponent E, significand M, the fraction

More information

Functions. Using Bloodshed Dev-C++ Heejin Park. Hanyang University

Functions. Using Bloodshed Dev-C++ Heejin Park. Hanyang University Functions Using Bloodshed Dev-C++ Heejin Park Hanyang University 2 Introduction Reviewing Functions ANSI C Function Prototyping Recursion Compiling Programs with Two or More Source Code Files Finding Addresses:

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:

More information

Reminder: compiling & linking

Reminder: compiling & linking Reminder: compiling & linking source file 1 object file 1 source file 2 compilation object file 2 library object file 1 linking (relocation + linking) load file source file N object file N library object

More information

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS

CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS CPE 323 REVIEW DATA TYPES AND NUMBER REPRESENTATIONS IN MODERN COMPUTERS Aleksandar Milenković The LaCASA Laboratory, ECE Department, The University of Alabama in Huntsville Email: milenka@uah.edu Web:

More information

GNSS High Rate Binary Format (v2.1) 1 Overview

GNSS High Rate Binary Format (v2.1) 1 Overview GNSS High Rate Binary Format (v2.1) 1 Overview This document describes a binary GNSS data format intended for storing both low and high rate (>1Hz) tracking data. To accommodate all modern GNSS measurements

More information

Motivation was to facilitate development of systems software, especially OS development.

Motivation was to facilitate development of systems software, especially OS development. A History Lesson C Basics 1 Development of language by Dennis Ritchie at Bell Labs culminated in the C language in 1972. Motivation was to facilitate development of systems software, especially OS development.

More information

A Forensic Log File Extraction Tool for ICQ Instant Messaging Clients

A Forensic Log File Extraction Tool for ICQ Instant Messaging Clients Edith Cowan University Research Online ECU Publications Pre. 2011 2006 A Forensic Log File Extraction Tool for ICQ Instant Messaging Clients Kim Morfitt Edith Cowan University Craig Valli Edith Cowan University

More information

CS 245: Database System Principles

CS 245: Database System Principles CS 245: Database System Principles Notes 03: Disk Organization Peter Bailis CS 245 Notes 3 1 Topics for today How to lay out data on disk How to move it to memory CS 245 Notes 3 2 What are the data items

More information

US Options Complex Multicast TOP Specification

US Options Complex Multicast TOP Specification US Options Complex Multicast TOP Specification Version 1.0.4 September 1, 2017 Contents 1 Introduction... 5 1.1 Overview... 5 1.2 Feed Connectivity Requirements... 5 1.3 Symbol Ranges, Units, and Sequence

More information

The Design and Implementation of a Rigorous. A Rigorous High Precision Floating Point Arithmetic. for Taylor Models

The Design and Implementation of a Rigorous. A Rigorous High Precision Floating Point Arithmetic. for Taylor Models The and of a Rigorous High Precision Floating Point Arithmetic for Taylor Models Department of Physics, Michigan State University East Lansing, MI, 48824 4th International Workshop on Taylor Methods Boca

More information

CPS 104 Computer Organization and Programming Lecture-2 : Data representations,

CPS 104 Computer Organization and Programming Lecture-2 : Data representations, CPS 104 Computer Organization and Programming Lecture-2 : Data representations, Sep. 1, 1999 Dietolf Ramm http://www.cs.duke.edu/~dr/cps104.html CPS104 Lec2.1 GK&DR Fall 1999 Data Representation Computers

More information

Project 3: Base64 Content-Transfer-Encoding

Project 3: Base64 Content-Transfer-Encoding CMSC 313, Computer Organization & Assembly Language Programming Section 0101 Fall 2001 Project 3: Base64 Content-Transfer-Encoding Due: Tuesday November 13, 2001 Objective The objectives of this programming

More information

Caching and Buffering in HDF5

Caching and Buffering in HDF5 Caching and Buffering in HDF5 September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1 Software stack Life cycle: What happens to data when it is transferred from application buffer to HDF5 file and from HDF5

More information

Lecture (01) Digital Systems and Binary Numbers By: Dr. Ahmed ElShafee

Lecture (01) Digital Systems and Binary Numbers By: Dr. Ahmed ElShafee ١ Lecture (01) Digital Systems and Binary Numbers By: Dr. Ahmed ElShafee Digital systems Digital systems are used in communication, business transactions, traffic control, spacecraft guidance, medical

More information

Question 4: a. We want to store a binary encoding of the 150 original Pokemon. How many bits do we need to use?

Question 4: a. We want to store a binary encoding of the 150 original Pokemon. How many bits do we need to use? Question 4: a. We want to store a binary encoding of the 150 original Pokemon. How many bits do we need to use? b. What is the encoding for Pikachu (#25)? Question 2: Flippin Fo Fun (10 points, 14 minutes)

More information

Image Compression. cs2: Computational Thinking for Scientists.

Image Compression. cs2: Computational Thinking for Scientists. Image Compression cs2: Computational Thinking for Scientists Çetin Kaya Koç http://cs.ucsb.edu/~koc/cs2 koc@cs.ucsb.edu The course was developed with input from: Ömer Eǧecioǧlu (Computer Science), Maribel

More information

Time (self-scheduled): Location Schedule Your Exam: What to bring:

Time (self-scheduled): Location Schedule Your Exam: What to bring: ECE 120 Midterm 1B HKN Review Session Time (self-scheduled): Between Wednesday, September 27 and Friday, September 29, 2017 Location: 57 Grainger Engineering Library (in the basement on the east side)

More information

Building a Test Suite

Building a Test Suite Program #3 Is on the web Exam #1 Announcements Today, 6:00 7:30 in Armory 0126 Makeup Exam Friday March 9, 2:00 PM room TBA Reading Notes (Today) Chapter 16 (Tuesday) 1 API: Building a Test Suite Int createemployee(char

More information

User Commands GZIP ( 1 )

User Commands GZIP ( 1 ) NAME gzip, gunzip, gzcat compress or expand files SYNOPSIS gzip [ acdfhllnnrtvv19 ] [ S suffix] [ name... ] gunzip [ acfhllnnrtvv ] [ S suffix] [ name... ] gzcat [ fhlv ] [ name... ] DESCRIPTION Gzip reduces

More information

Appendix. Numbering Systems. In This Appendix...

Appendix. Numbering Systems. In This Appendix... Numbering Systems ppendix In This ppendix... Introduction... inary Numbering System... exadecimal Numbering System... Octal Numbering System... inary oded ecimal () Numbering System... 5 Real (Floating

More information

Raima Database Manager Version 14.1 In-memory Database Engine

Raima Database Manager Version 14.1 In-memory Database Engine + Raima Database Manager Version 14.1 In-memory Database Engine By Jeffrey R. Parsons, Chief Engineer November 2017 Abstract Raima Database Manager (RDM) v14.1 contains an all new data storage engine optimized

More information

Bits and Bytes. Why bits? Representing information as bits Binary/Hexadecimal Byte representations» numbers» characters and strings» Instructions

Bits and Bytes. Why bits? Representing information as bits Binary/Hexadecimal Byte representations» numbers» characters and strings» Instructions Bits and Bytes Topics Why bits? Representing information as bits Binary/Hexadecimal Byte representations» numbers» characters and strings» Instructions Bit-level manipulations Boolean algebra Expressing

More information

File Management. Ezio Bartocci.

File Management. Ezio Bartocci. File Management Ezio Bartocci ezio.bartocci@tuwien.ac.at Cyber-Physical Systems Group Institute for Computer Engineering Faculty of Informatics, TU Wien Motivation A process can only contain a limited

More information

mmap: Memory Mapped Files in R

mmap: Memory Mapped Files in R mmap: Memory Mapped Files in R Jeffrey A. Ryan August 1, 2011 Abstract The mmap package offers a cross-platform interface for R to information that resides on disk. As dataset grow, the finite limits of

More information

Rationale for TR Extension to the programming language C. Decimal Floating-Point Arithmetic

Rationale for TR Extension to the programming language C. Decimal Floating-Point Arithmetic WG14 N1161 Rationale for TR 24732 Extension to the programming language C Decimal Floating-Point Arithmetic Contents 1 Introduction... 1 1.1 Background... 1 1.2 The Arithmetic Model... 3 1.3 The Encodings...

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS December 2009

MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS December 2009 MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY MARK 5 MEMO #081 WESTFORD, MASSACHUSETTS 01886 10 December 2009 Telephone: 978-692-4764 Fax: 781-981-0590 TO: FROM: SUBJECT: General Mark 5 group

More information

l l l l l l l Base 2; each digit is 0 or 1 l Each bit in place i has value 2 i l Binary representation is used in computers

l l l l l l l Base 2; each digit is 0 or 1 l Each bit in place i has value 2 i l Binary representation is used in computers 198:211 Computer Architecture Topics: Lecture 8 (W5) Fall 2012 Data representation 2.1 and 2.2 of the book Floating point 2.4 of the book Computer Architecture What do computers do? Manipulate stored information

More information

Accessing Arbitrary Hierarchical Data

Accessing Arbitrary Hierarchical Data D.G.Muir February 2010 Accessing Arbitrary Hierarchical Data Accessing experimental data is relatively straightforward when data are regular and can be modelled using fixed size arrays of an atomic data

More information

Chapter 4: Data Representations

Chapter 4: Data Representations Chapter 4: Data Representations Integer Representations o unsigned o sign-magnitude o one's complement o two's complement o bias o comparison o sign extension o overflow Character Representations Floating

More information

Pretty-printing of kernel data structures

Pretty-printing of kernel data structures Pretty-printing of kernel data structures Daniel Lovasko Charles University in Prague lovasko@freebsd.org Abstract One of the key features of a debugger is the ability to examine memory and the associated

More information

CSC 453 Operating Systems

CSC 453 Operating Systems CSC 453 Operating Systems Lecture 10 : File-System Interface The Concept of A File A file is a collection of data stored on external device or A file is a collection of data entering or exiting the computer.

More information

Lecture 03 Bits, Bytes and Data Types

Lecture 03 Bits, Bytes and Data Types Lecture 03 Bits, Bytes and Data Types Computer Languages A computer language is a language that is used to communicate with a machine. Like all languages, computer languages have syntax (form) and semantics

More information

Bits, Words, and Integers

Bits, Words, and Integers Computer Science 52 Bits, Words, and Integers Spring Semester, 2017 In this document, we look at how bits are organized into meaningful data. In particular, we will see the details of how integers are

More information

Room 3P16 Telephone: extension ~irjohnson/uqc146s1.html

Room 3P16 Telephone: extension ~irjohnson/uqc146s1.html UQC146S1 Introductory Image Processing in C Ian Johnson Room 3P16 Telephone: extension 3167 Email: Ian.Johnson@uwe.ac.uk http://www.csm.uwe.ac.uk/ ~irjohnson/uqc146s1.html Ian Johnson 1 UQC146S1 What is

More information

Appendix. Numbering Systems. In this Appendix

Appendix. Numbering Systems. In this Appendix Numbering Systems ppendix n this ppendix ntroduction... inary Numbering System... exadecimal Numbering System... Octal Numbering System... inary oded ecimal () Numbering System... 5 Real (Floating Point)

More information

Introduction Presentation A

Introduction Presentation A CSE 2421/5042: Systems I Low-Level Programming and Computer Organization Introduction Presentation A Read carefully: Bryant Chapter 1 Study: Reek Chapter 2 Skim: Reek Chapter 1 08/22/2018 Gojko Babić Some

More information