AVL 4 4 PDV DECLARE 7 _NEW_

Similar documents
CS201 - Introduction to Programming Glossary By

UNIT - 5 EDITORS AND DEBUGGING SYSTEMS

Short Notes of CS201

Contents. About This Book...1

Part 1. Introduction to File Organization


Introduction to File Structures

Presenters. Paul Dorfman, Independent Consultant Don Henderson, Henderson Consulting Services, LLC

Index. object lifetimes, and ownership, use after change by an alias errors, use after drop errors, BTreeMap, 309

Introduction p. 1 Pseudocode p. 2 Algorithm Header p. 2 Purpose, Conditions, and Return p. 3 Statement Numbers p. 4 Variables p. 4 Algorithm Analysis

The Language for Specifying Lexical Analyzer

A Side of Hash for You To Dig Into

Search Engine Report May

Updating Data Using the MODIFY Statement and the KEY= Option

Introduction. Choice orthogonal to indexing technique used to locate entries K.

C Language Programming

Pace University. Fundamental Concepts of CS121 1

External Data Representation (XDR)

Bean's Automatic Tape Manipulator A Description, and Operating Instructions. Jeffrey Bean

Base and Advance SAS

PL/SQL Block structure

Accelerated Library Framework for Hybrid-x86


Stream Computing using Brook+

CERTIFICATE IN WEB PROGRAMMING

LESSON 6 FLOW OF CONTROL

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Contents. Preface. Introduction. Introduction to C Programming

Contents I Introduction 1 Introduction to PL/SQL iii

5. Single-row function

egrapher Language Reference Manual

false, import, new 1 class Lecture2 { 2 3 "Data types, Variables, and Operators" 4

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

Programming Languages Third Edition. Chapter 9 Control I Expressions and Statements


Language Extensions for Vector loop level parallelism

The Compositional C++ Language. Denition. Abstract. This document gives a concise denition of the syntax and semantics

Computers Programming Course 5. Iulian Năstac

Absolute C++ Walter Savitch

COMP171. Hashing.

Weiss Chapter 1 terminology (parenthesized numbers are page numbers)

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

Introduction to Computer Science and Business

Indexing Methods. Lecture 9. Storage Requirements of Databases

ORACLE: PL/SQL Programming

Request for Comments: 851 Obsoletes RFC: 802. The ARPANET 1822L Host Access Protocol RFC 851. Andrew G. Malis ARPANET Mail:

Introduction to Visual Basic and Visual C++ Arithmetic Expression. Arithmetic Expression. Using Arithmetic Expression. Lesson 4.

Introduction to Programming Using Java (98-388)

Qualifying Exam in Programming Languages and Compilers

ATOMS. 3.1 Interface. The Atom interface is simple: atom.h #ifndef ATOM_INCLUDED #define ATOM_INCLUDED

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

Algorithms Interview Questions

Chapter 1 Disk Storage, Basic File Structures, and Hashing.

CSCI 171 Chapter Outlines

The Procedure Abstraction Part I: Basics

File Structures and Indexing

Chapter 1 INTRODUCTION. SYS-ED/ Computer Education Techniques, Inc.

UNIT 3

UNIVERSITY OF CALIFORNIA, SANTA CRUZ BOARD OF STUDIES IN COMPUTER ENGINEERING

Symbol Table. Symbol table is used widely in many applications. dictionary is a kind of symbol table data dictionary is database management

A Proposal for Endian-Portable Record Representation Clauses

Definition: A data structure is a way of organizing data in a computer so that it can be used efficiently.

Run-Time Data Structures

Algorithms & Data Structures

UNIT V SYSTEM SOFTWARE TOOLS

Extending SystemVerilog Data Types to Nets

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING B.E SECOND SEMESTER CS 6202 PROGRAMMING AND DATA STRUCTURES I TWO MARKS UNIT I- 2 MARKS

ADVANTAGES. Via PL/SQL, all sorts of calculations can be done quickly and efficiently without use of Oracle engine.

Model Viva Questions for Programming in C lab

Programming 2. Topic 8: Linked Lists, Basic Searching and Sorting

What are the characteristics of Object Oriented programming language?

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.

false, import, new 1 class Lecture2 { 2 3 "Data types, Variables, and Operators" 4

C & Data Structures syllabus

Glossary. abort. application schema

false, import, new 1 class Lecture2 { 2 3 "Data types, Variables, and Operators" 4

Contents of SAS Programming Techniques

Introduction hashing: a technique used for storing and retrieving information as quickly as possible.

CS Programming In C

XQ: An XML Query Language Language Reference Manual

1 P a g e A r y a n C o l l e g e \ B S c _ I T \ C \

Oracle Developer Track Course Contents. Mr. Sandeep M Shinde. Oracle Application Techno-Functional Consultant

Programming Languages Third Edition

Oracle Database 11g: Program with PL/SQL Release 2

Type Conversion. and. Statements

Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA

Contents. Chapter 1 Overview of the JavaScript C Engine...1. Chapter 2 JavaScript API Reference...23

Multiple Choice: 2 pts each CS-3020 Exam #3 (CH16-CH21) FORM: EX03:P

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

key h(key) Hash Indexing Friday, April 09, 2004 Disadvantages of Sequential File Organization Must use an index and/or binary search to locate data

IEEE LANGUAGE REFERENCE MANUAL Std P1076a /D3

CSE 544 Principles of Database Management Systems

Concept as a Generalization of Class and Principles of the Concept-Oriented Programming

The Procedure Abstraction

Chapter 2. DB2 concepts

For searching and sorting algorithms, this is particularly dependent on the number of data elements.

SAS Online Training: Course contents: Agenda:

IEEE Floating-Point Representation 1

Transcription:

Glossary Program Control... 2 SAS Variable... 2 Program Data Vector (PDV)... 2 SAS Expression... 2 Data Type... 3 Scalar... 3 Non-Scalar... 3 Big O Notation... 3 Hash Table... 3 Hash Algorithm... 4 Hash Function... 4 AVL Tree... 4 Hash Variable... 4 PDV Host Variables... 4 Parameter Type Matching... 4 Hash Table Item... 5 Hash Table Entry... 5 Key Portion... 5 Hash Table Key... 5 Key-Value... 5 Same-Key Item Group... 5 Data Portion... 6 Data-Value... 6 Hash Object... 6 Hash Object Name... 6 Hash Object Reference Variable... 6 Hash Object Tools... 6 Hash Object Instance... 7 Active Hash Object Instance... 7 Initialized Hash Object Instance... 7 DECLARE Statement... 7 _NEW_ Operator... 7 Compound DECLARE Statement... 7 Hash Attributes... 7 Hash Methods... 7

2 Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study Hash Argument Tags... 8 Hash Arguments... 8 Hash Iterator Object... 8 Hash Iterator Name... 8 Hash Iterator Reference Variable... 8 Hash Iterator Instance... 8 Hash Iterator Active Instance... 8 Hash Iterator Pointer... 8 Hash Iterator Locking... 8 Explicit Method Call... 9 Implicit Method Call... 9 Assigned Method Call... 9 Unassigned Method Call... 9 LINK Statement... 9 DoW Loop... 9 Program Control An imaginary pointer to the run time DATA step statement being currently executed. The active statement highlighted by the DATA step debugger. When a statement is currently executed, it is said that it has program control. SAS Variable A data container in memory whose value can be used and changed by a program. A column (field) in a SAS data file, i.e. a SAS data set or data view. Any SAS variable has a number of attributes, such as the name, data type, length, etc. Program Data Vector (PDV) A logical image of the SAS variables in the DATA step memory, both created by the program and automatic, such as _N_, _IORC_, and _ERROR_. The PDV is commonly represented graphically as a sequence of containers named after the corresponding variables, holding their current values, and listed in the compilation order. SAS Expression An elemental program component with a specific action that resolves to a value. The elemental components include literal constants, variables, functions, formats, and informats. A group of elemental components combined via SAS operators that resolves to a value.

Glossary 3 Data Type A data attribute indicating the kind of operation that can be performed on the data. An attribute of an elemental program component indicating the kind of data it can operate on. Such components include literal constants, variables, functions, operators, formats, and informats. An attribute of a SAS expression indicating the data type of the value to which it resolves. Scalar A singular unit of actual data, such as a numeric or character value. A data type designed to process a specific kind of scalar data. Base SAS has two scalar data types: Numeric and character. A SAS variable of a scalar data type, i.e. a numeric or character variable. Any other numeric or character SAS expression. Non-Scalar A data structure with multiple data points, such as a list, an array, a table, an object, etc. A data type designed to process a specific kind of non-scalar data. Specific to this book, are two non-scalar data types: Type hash and type hash iterator (hiter). A PDV variable of non-scalar data type, whose values identify an occurrence of nonscalar data. Big O Notation A convention used to indicate how a computer operation's run time scales as the data grows. For example, if for a data table with N rows with N distinct keys it takes time T to read 1 row, the mean time ST needed to find a key via the linear search is proportional to T*N. Using the big O notation, this fact is denoted as O(N) or by saying that the search "runs in O(N) time". If the table is ordered by the key and the binary search is used, ST is proportional to T*log2(N). Respectively, in the big O notation it is denoted as O(log2(N)). The table can be organized to make its search time ST proportional to T*1=T, i.e. the same regardless of N. In this case, it is said that the search runs in O(1) or constant time. Hash Table A memory-resident table, i.e. a data structure with rows and columns. The rows are called hash items, and the columns are called hash variables. One or more columns are collectively designated as a key. The table is specifically organized to be searched by the key via the hash algorithm.

4 Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study Hash Algorithm An algorithm designed to search a hash table by its key in O(1), i.e. constant, time. In the SAS hash object, the algorithm is based on coupling a hash function with AVL trees. It is executed by first locating a key-value in one of the AVL trees and then searching it. Hash Function A function whose response H splits N distinct arbitrary keys into H nearly equal groups (buckets) with approximately N/H key values in each group. The SAS hash object hash function splits distinct input key values evenly between H AVL trees. H is defined by the values of the HASHEXP:P argument tag as H=2**P. By default, P=8. AVL Tree A self-balancing binary search tree (named by its inventors Adelson-Velsky and Landis). Self-balancing is a specific mechanism of inserting input keys into the tree. It ensures that the search time of a tree with N distinct keys stays O(log2(N)) regardless of the input key values. The AVL trees implemented in SAS are fast and efficient. They underlie not only the hash object, but also SAS procedures like MEANS, FREQ, TABULATE, etc. Hash Variable Hash table column, whose names adheres to the SAS variable naming conventions. Hash variables are structurally analogous to SAS data set variables. They are divided in two functionally distinct groups: Key portion and data portion. PDV Host Variables PDV variables whose names match the names of the defined hash variables. For each hash variable (defined at run time), there must have existed a host variable with the same name (defined at compile time). Hash variables inherit their attributes (such as length) from their respective host variables. The host variables are the PDV variables updated when data is retrieved from the hash table. The current PDV host variable values are the values assumed by implicit method calls adding, removing, or updating hash table items. Parameter Type Matching Any technique of placing the host variables in the PDV at compile time.

Glossary 5 Parameter type matching ensures that by the time the hash variables are being defined at run time, the host variables with the requisite names already exist in the PDV. Hash Table Item A hash table row populated with values of the key portion and data portion variables. A hash item in a hash table is structurally analogous to an observation in a SAS data set. Hash Table Entry A list of all hash variables, both in the key and data portions, and their attributes (such as length). The order of the variables in the hash table entry follows the order they are defined by the DEFINEKEY and DEFINEDATA methods. The hash table entry is analogous to the descriptor (aka the header) of a SAS data set. Key Portion The hash variables defined in the program to serve collectively as the hash table key. The values of key portion variables within an existing item cannot be updated. Conversely, their values cannot be used to update PDV variables. The data type of the key portion variables can be only scalar, i.e. either numeric or character. The key portion is mandatory, i.e. it must be defined with at least one hash variable. Hash Table Key The key portion variables whose combined value identifies one or more hash items. The hash variables comprising the key are called key variables or key components. If the key consists of one key variable, it is simple; otherwise, it is composite. The key variables are defined using the DEFINEKEY method. Key-Value The combination of the values of all key variables in the order they are defined. A combination of values of PDV variables (or other SAS expressions) the table is searched for to identify one or more hash items with the matching values of the key variables. Same-Key Item Group A group of hash items with the same key-value. Items with the same key-value can exist in a hash table if the MULTIDATA:"Y" argument tag is specified at the time when its hash object instance is created.

6 Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study Data Portion The hash variables whose values can be both updated by the program and used to update the values of PDV variables. The data portion hash variables are called hash data variables. The hash data variables can be both scalar (i.e. numeric or character) and non-scalar. The values of non-scalar data variables can be used to identify non-scalar data, such as objects. The data portion, like the key portion, is mandatory. It must contain at least one hash variable. The hash data variables are defined using the DEFINEDATA method. Data-Value The value of an individual data portion hash variable. The value of a PDV variable (or other SAS expression) used to update a data portion variable. Hash Object A memory-resident data structure associated with an object reference variable. It comprises a hash table and a set of hash object tools invoked in the DATA step to perform hash operations on the table and its data. Hash Object Name A valid SAS name given to the hash object in the DECLARE statement. Hash Object Reference Variable A non-scalar DATA step PDV variable of type hash defined in the DECLARE statement at compile time with the same name as the hash object name. Hash Object Tools Functions or routines invoked via the DATA Step Component Interface (DSCI) to perform hash object operations. They include: DECLARE statement _NEW_ operator Hash attributes Hash methods Hash argument tags Hash arguments

Glossary 7 Hash Object Instance A run time physical occurrence of a hash object uniquely identified by a value of type hash. The actual container of a hash table that hash tools can manipulate. Active Hash Object Instance The hash object instance whose identifier value (also referred to as a hash instance pointer) is currently stored in the hash object reference PDV variable.the instance against which any hash tool called by the hash object name currently operates. Initialized Hash Object Instance An operable hash object instance whose hash table variables are defined and validated. Only the hash table data related to an initialized instance can be manipulated. DECLARE Statement A compile time statement to declare a hash object and define its reference PDV variable. _NEW_ Operator A run time tool to create a hash object instance and generate its unique identifier value. Compound DECLARE Statement A statement combining the actions of the DECLARE statement at compile time and the actions of the _NEW_ operator at run time. Hash Attributes Hash object tools that return descriptive information about the active hash object instance: NUM_ITEMS. Returns the current number of hash items in a hash table. ITEM_SIZE. Returns the actual length of hash table entry in bytes. Hash Methods Hash object functions called to manipulate the active hash object instance and its hash table data. They belong to two groups: Table-level: Acting upon the object instance and/or its hash table as a whole. Item-level: Acting upon individual hash table items.

8 Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study Hash Argument Tags Keyword parameters through which most methods accept their arguments. Hash Arguments Expressions passed as values to argument tags (i.e. most methods). Expressions passed as values to the methods and/or statements with no argument tags (e.g., the DEFINEKEY and DEFINEDATA methods and the statements creating a hash iterator instance). Hash Iterator Object A data structure designed to process hash table items sequentially. Hash Iterator Name A valid SAS name given to the hash iterator object in the DECLARE statement. Hash Iterator Reference Variable A non-scalar DATA step PDV variable of type hash iterator defined in the DECLARE statement at compile time with the same name as the hash iterator object name. Hash Iterator Instance A run time physical occurence of the hash iterator object uniquely identified by a value of type hash iterator. It is uniquely linked to and always works with a specific hash object instance. Hash Iterator Active Instance The hash iterator instance whose identifier value is currently stored in its hash iterator reference PDV variable. The iterator instance the program uses whenever it is called by its iterator reference name. Hash Iterator Pointer An imaginary cursor pointing to the hash item the iterator is currently processing. If the iterator is not latched onto any item, the iterator pointer is thought to dwell outside the hash table. Hash Iterator Locking A condition when an item cannot be removed from a hash table because it is locked by a hash iterator. It occurs when: o The iterator pointer dwells on any item and an attempt is made to remove all items from the table.

Glossary 9 o The iterator pointer dwells on one item in a same-key item group, and an attempt is made to remove any item from that same-key group. Explicit Method Call A method call in which its argument tags are valued with expressions of the required data type. The method accepts the values to which the expessions resolve. Implicit Method Call A method call in which its argument tags are left out. The method accepts the current values from the corresponding PDV host variables. Assigned Method Call A method call whose return code is assigned to either: A separate receiving variable in an assignment statement. An internal buffer when the method call is used as part of a statement or an expression. Unassigned Method Call A standalone method call whose return code is not assigned to either a separate variable or the internal buffer. LINK Statement A labeled block of SAS DATA step statements called by it label name and coded at the bottom of the step after the RETURN or STOP statement. Used to avoid repetitive coding when the same block of statements needs to be executed more than once. An alternative to coding a similarly functional macro. DoW Loop A programming construct representing a DO loop wrapped around a file-reading statement, such as SET or MERGE, and terminated when a certain break-event condition is met. Typically, the break-event represents either having read the last record in a BY group or the last record in the entire file. The DoW loop takes over control of reading the file from the implied DATA step loop. As such, it has a number of significant advantages compared to relying exclusively on the implied loop in terms of simplifying programming logic, eliminating the need to retain and reinitialize aggregation variables, eschewing unnecessary conditional statements, and fostering better performance.

10 Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study It can be used, in a single execution of the DATA step, to read each BY group or the entire file repeatedly or read different files one at a time, allowing for pre/post-processing before/after each separate read. "DoW loop" is an informal term for this construct by the SAS community since about 2000. See, for example: https://analytics.ncsu.edu/sesug/2011/ss01.dorfman.pdf