Small Formulas for Large Programs: On-line Constraint Simplification In Scalable Static Analysis

Similar documents
Finite Model Generation for Isabelle/HOL Using a SAT Solver

Local Two-Level And-Inverter Graph Minimization without Blowup

On Resolution Proofs for Combinational Equivalence Checking

Boolean Representations and Combinatorial Equivalence

Minimum Satisfying Assignments for SMT. Işıl Dillig, Tom Dillig Ken McMillan Alex Aiken College of William & Mary Microsoft Research Stanford U.

ABC basics (compilation from different articles)

Andrew Reynolds Liana Hadarean

Scalable Program Analysis Using Boolean Satisfiability: The Saturn Project

On Resolution Proofs for Combinational Equivalence

Motivation. CS389L: Automated Logical Reasoning. Lecture 5: Binary Decision Diagrams. Historical Context. Binary Decision Trees

TypeChef: Towards Correct Variability Analysis of Unpreprocessed C Code for Software Product Lines

A Satisfiability Procedure for Quantified Boolean Formulae

Efficient Circuit to CNF Conversion

Lecture Notes on Binary Decision Diagrams

SAT-Based Area Recovery in Technology Mapping

CS 267: Automated Verification. Lecture 13: Bounded Model Checking. Instructor: Tevfik Bultan

Decision Procedures. An Algorithmic Point of View. Bit-Vectors. D. Kroening O. Strichman. Version 1.0, ETH/Technion

Implementation of a Sudoku Solver Using Reduction to SAT

Complete Instantiation of Quantified Formulas in Satisfiability Modulo Theories. ACSys Seminar

A proof-producing CSP solver: A proof supplement

SAT-Based Area Recovery in Structural Technology Mapping

Binary Decision Diagrams

Verifying the Safety of Security-Critical Applications

VS 3 : SMT Solvers for Program Verification

Linear Time Unit Propagation, Horn-SAT and 2-SAT

Lost in translation. Leonardo de Moura Microsoft Research. how easy problems become hard due to bad encodings. Vampire Workshop 2015

Symbolic and Concolic Execution of Programs

Decision Procedures in First Order Logic

A Toolbox for Counter-Example Analysis and Optimization

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning

Handout 9: Imperative Programs and State

The Satisfiability Problem [HMU06,Chp.10b] Satisfiability (SAT) Problem Cook s Theorem: An NP-Complete Problem Restricted SAT: CSAT, k-sat, 3SAT

1.4 Normal Forms. We define conjunctions of formulas as follows: and analogously disjunctions: Literals and Clauses

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Formally-Proven Kosaraju s algorithm

[Draft. Please do not distribute] Online Proof-Producing Decision Procedure for Mixed-Integer Linear Arithmetic

Static Program Checking

Software Model Checking. From Programs to Kripke Structures

EECS 219C: Formal Methods Binary Decision Diagrams (BDDs) Sanjit A. Seshia EECS, UC Berkeley

Tree Interpolation in Vampire

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.1

Decision Procedures in the Theory of Bit-Vectors

Deductive Methods, Bounded Model Checking

PROPOSITIONAL LOGIC (2)

Lecture 6: Arithmetic and Threshold Circuits

Boolean Functions (Formulas) and Propositional Logic

Model Checking Parallel Programs with Inputs

Template-based Program Verification and Program Synthesis

Validation of models and tests for constrained combinatorial interaction testing

Verifying Temporal Properties via Dynamic Program Execution. Zhenhua Duan Xidian University, China

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Model Checking I Binary Decision Diagrams

Array Dependence Analysis as Integer Constraints. Array Dependence Analysis Example. Array Dependence Analysis as Integer Constraints, cont

Cuts from Proofs: A Complete and Practical Technique for Solving Linear Inequalities over Integers

Alloy: A Lightweight Object Modelling Notation

1 Definition of Reduction

Decision Procedures. An Algorithmic Point of View. Decision Procedures for Propositional Logic. D. Kroening O. Strichman.

Software Model Checking. Xiangyu Zhang

EECS 219C: Computer-Aided Verification Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley

Maximal Specification Synthesis

Reasoning About Set Comprehensions

Integrating an AIG Package, Simulator, and SAT Solver

Finding and Fixing Bugs in Liquid Haskell. Anish Tondwalkar

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Research Collection. Formal background and algorithms. Other Conference Item. ETH Library. Author(s): Biere, Armin. Publication Date: 2001

SAT-CNF Is N P-complete

Term Algebras with Length Function and Bounded Quantifier Elimination

Tighter Integration of BDDs and SMT for Predicate Abstraction

CSC 501 Semantics of Programming Languages

Decision Procedures for Equality Logic. Daniel Kroening and Ofer Strichman 1

On the implementation of a multiple output algorithm for defeasible argumentation

Constraint Satisfaction Problems

Verification Overview Testing Theory and Principles Testing in Practice. Verification. Miaoqing Huang University of Arkansas 1 / 80

UNIT 1 BASICS OF AN ALGORITHM

Leonardo de Moura and Nikolaj Bjorner Microsoft Research

Lecture 14: Lower Bounds for Tree Resolution

Chapter 2 PRELIMINARIES

Z Notation. June 21, 2018

Inductive Invariant Generation via Abductive Inference

Nenofar: A Negation Normal Form SMT Solver

The CIFF Proof Procedure for Abductive Logic Programming with Constraints

Lecture 4 Searching Arrays

Lecture Notes on Linear Search

On Reasoning about Finite Sets in Software Checking

8.1 Polynomial-Time Reductions

Constraint Satisfaction Problems

Testing & Symbolic Execution

NO WARRANTY. Use of any trademarks in this presentation is not intended in any way to infringe on the rights of the trademark holder.

A Pearl on SAT Solving in Prolog (extended abstract)

Solving Linear Constraints over Real. German Vitaliy Jr. Researcher Glushkov Institute of Cybernetic NAS Ukraine

Symbolic Trajectory Evaluation - A Survey

Simplifying Loop Invariant Generation Using Splitter Predicates. Rahul Sharma Işil Dillig, Thomas Dillig, and Alex Aiken Stanford University

A Gentle Introduction to Program Analysis

More on Verification and Model Checking

Knowledge Compilation Properties of Tree-of-BDDs

Motivation. CS389L: Automated Logical Reasoning. Lecture 17: SMT Solvers and the DPPL(T ) Framework. SMT solvers. The Basic Idea.

Automated Error Diagnosis Using Abductive Inference

Lecture 2: Symbolic Model Checking With SAT

Lee Pike. June 3, 2005

EECS 219C: Formal Methods Boolean Satisfiability Solving. Sanjit A. Seshia EECS, UC Berkeley

Transcription:

Small Formulas for Large Programs: On-line Constraint Simplification In Scalable Static Analysis Isil Dillig, Thomas Dillig, Alex Aiken Stanford University

Scalability and Formula Size Many program analysis techniques represent program states as SAT or SMT formulas. Queries about program => Satisfiability and validity queries to the constraint solver Scalability of these techniques is often very sensitive to formula size.

Techniques to Limit Formula Size Many different techniques to control formula size: Basic Predicate abstraction Predicate abstraction with CEGAR SLAM, BLAST Iteratively discover relevant predicates. Property simulation Formulas are over a finite, fixed set of predicates. ESP Track only those path conditions where property differs along arms of the branch. and many others...

Our Approach Afore mentioned approaches control formula size by restricting the set of facts that are tracked by the analysis. We attack the problem from a different angle: Instead of aggressively restricting which facts to track a priori, our focus is to guarantee non redundancy of formulas via constraint simplification.

Goal #1: Non-redundancy Given formula F, we want to find formula F' such that: F' is equivalent to F F' has no redundant subparts F' is no larger than F Such a formula is in simplified form If F is a formula characterizing program property P, then predicates irrelevant to P are not mentioned in F'. No need to guess in advance which facts/predicates may be needed later to prove P.

Goal #2: On-line Simplification should be on line: Formulas are continuously simplified and reused throughout the analysis. Important because program analyses construct new formulas from existing formulas. Simplification prevents incremental build up of massive, redundant formulas. In our system, formulas are simplified at every satisfiability or validity query.

An Example enum op_type {ADD=0, SUBTRACT=1, MULTIPLY=2, DIV=3}; int perform_op(op_type op, int x, int y) { int res; Performs op if(op == ADD) res = x+y; on x and y else if(op == SUBTRACT) res = x-y; else if(op == MULTIPLY) res = x*y; else if(op == DIV) { assert(y!=0); res = x/y; } else res = UNDEFINED; return res; } Suppose we are interested in the condition under which perform_op successfully returns, i.e., does not abort.

An Example enum op_type {ADD=0, SUBTRACT=1, MULTIPLY=2, DIV=3}; int perform_op(op_type op, int x, int y) { int res; if(op == ADD) res = x+y; else if(op == SUBTRACT) res = x-y; else if(op == MULTIPLY) res = x*y; else if(op == DIV) { assert(y!=0); res = x/y; } else res = UNDEFINED; return res; Program analysis tool } examines every branch Branch Success Condition and computes op = 0 condition under which true each branch succeeds. op 6= 0 ^ op = 1 true op 6= 0 ^ op 6= 1 ^ op = 2 true y 6= 0 op 6= ^op 6= 1 ^ op 6= 2 ^ op 6= 3 true

An Example enum op_type {ADD=0, SUBTRACT=1, MULTIPLY=2, DIV=3}; int perform_op(op_type op, int x, int y) { int res; if(op == ADD) res = x+y; else if(op == SUBTRACT) res = x-y; else if(op == MULTIPLY) res = x*y; else if(op == DIV) { assert(y!=0); res = x/y; } else res = UNDEFINED; return res; } op = 0 _ (op 6= 0 ^ op = 1) _ (op 6= 0 ^ op 6= 1 ^ op = 2)_ (op 6= 0 ^ op 6= 1 ^ op 6= 2 ^ op = 3 ^ y 6= 0)_ (op 6= 0 ^ op 6= 1 ^ op 6= 2 ^ op 6= 3)

An Example enum op_type {ADD=0, SUBTRACT=1, MULTIPLY=2, DIV=3}; int perform_op(op_type op, int x, int y) { int res; if(op == ADD) res = x+y; else if(op == SUBTRACT) res = x-y; else if(op == MULTIPLY) res = x*y; else if(op == DIV) { assert(y!=0); res = x/y; } else res = UNDEFINED; return res; } No irrelevant predicates, much more concise In simplified form: op 6= 3 _ y 6= 0

Now that this example has convinced you simplification is a good idea, how do we actually do it?

Leaves of a Formula We consider quantifier free formulas using the boolean connectives AND, OR, and NOT over any decidable theory. We assume formulas are in NNF. A formula that does not contain conjunction or disjunction is an atomic formula. Each syntactic occurrence of an atomic formula is a leaf. Example: :f (x) = 1 _ (:f (x) = 1 ^ x + y 1) 3 distinct leaves

Redundant Leaves A leaf L is non constraining in formula F if replacing L with true in F yields an equivalent formula. L is non relaxing in F if replacing L with false is equivalent to F. L is redundant if it is non constraining or non relaxing. x = y ^ (f (x) = 1 _ (f (y) = 1 ^ x + y 1)) {z } {z } {z } {z } L0 L1 Non relaxing because formula is equivalent when it is replaced by false. L2 L3 Both non constraining and non relaxing.

Simplified Form A formula F is in simplified form if no leaf in F is redundant. Important Fact: If a formula is in simplified form, we cannot obtain a smaller, equivalent formula by replacing any subset of the leaves by true or false. This means that we only need to check one leaf at a time for redundancy, not subsets of leaves.

Properties of Simplified Forms A formula in simplified form is satisfiable if and only if it is not syntactically false, and it is valid iff it is syntactically true. Simplified forms are preserved under negation. Simplified forms are not unique. Consider formula linear integer arithmetic. Both in and are simplified forms. Equivalence of simplified forms cannot be determined syntactically.

Algorithm Definition of simplified form suggests trivial algorithm: Pick any leaf, replace it by true/false. Check if formula is equivalent. Repeat until no leaf can be replaced. Requires repeatedly checking satisfiability of formulas twice as large as the original formula. But we can do better than this naïve algorithm!

Critical Constraint Idea: Compute a constraint C, called critical constraint, for each leaf L such that: (i) L is non constraining iff C ) L (ii) L is non relaxing iff C ) :L Intuitively, C describes the condition under which L determines whether an assignment satisfies the formula. C is no larger than original formula F, so redundancy is checked using formulas at most as large as F.

Constructing Critical Constraint Assume we represent formula as a tree. The critical constraint for root is true. Let N be any non root node with parent P and i'th sibling S(i). If P is an AND connective: If P is an OR connective:

Example Consider again the formula: x = y ^ (f (x) = 1 _ (f (y) = 1 ^ x + y 1)) true x=y f (x) = 1 _ (f (y) = 1 ^ x + y 1) x=y ^ (f (y) 6= 1 _ x + y > 1) x = y ^ f (x) 6= 1 ^ x+y 1 x = y ^ f (x) 6= 1 f alse

Example Consider again the formula: x = y ^ (f (x) = 1 _ (f (y) = 1 ^ x + y 1)) true x=y f (x) = 1 _ (f (y) = 1 ^ x + y 1) x=y ^ (f (y) 6= 1 _ x + y > 1) x = y ^ f (x) 6= 1 ^ x+y 1 Non relaxing because C(L2 ) ) :(f (y) = 1) x = y ^ f (x) 6= 1 f alse

Example Consider again the formula: x = y ^ (f (x) = 1 _ (f (y) = 1 ^ x + y 1)) true x=y f (x) = 1 _ (f (y) = 1 ^ x + y 1) x=y ^ (f (y) 6= 1 _ x + y > 1) x = y ^ f (x) 6= 1 ^ x+y 1 Both non constraining and non relaxing because x = y ^ f (x) 6= false 1 implies leaf and its negation. f alse

The Full Algorithm /* * Recursive algorithm to compute simplified form. * N: current subformula, C: critical constraint of N. */ simplify(n, C) { If N is a leaf: If C => N return true /* Non constraining */ If C=> N return false /* Non relaxing */ Otherwise, return N /* Neither */ If N is a connective, for each child X of N: Critical constraint is recomputed because siblings may change. Compute critical constraint C(X) X = simplify(x, C(X)) Repeat until no child of N can be further simplified. }

Making it Practical Worst case: Requires 2n2 validity checks. (n = # leaves) Important Optimization: Insight: The leaves of the formulas whose validity is checked are always the same. For simplifying SMT formulas, we can gainfully reuse the same conflict clauses throughout simplification Empirical Result: Overhead of simplification over solving sub linear (logarithmic) in practice for constraints generated by our program analysis system.

Impact on Analysis Scalability To evaluate impact of on line simplification on analysis scalability, we ran our program analysis system, Compass, on 811 benchmarks. 173,000 LOC Programs ranging from 20 to 30,000 lines Checked for assertions and various memory safety properties. Compared running time of runs that use on line simplification with runs that do not.

Impact on Analysis Scalability Analysis time (seconds) Times out at 3600s Programs >100 lines are analyzed faster with simplification. 2 orders of magnitude improvement # lines of code

Why Such a Difference? Because program analysis systems typically generate highly redundant constraints! COMPASS Size of simplified formula consistently under 20 while non simplified formula have several hundred leaves

It's not just Compass Measured redundancy of constraints in a different analysis system, SATURN. SATURN Similar pattern as in Compass despite attempts to heuristically control formula size.

Related Work Contextual Rewriting Lucas, S. Fundamentals of Contex Sensitive Rewriting. LNCS 1995 Armando, A., Ranise, S. Constraint contextual rewriting. Journal of Symbolic Computation 2003 Logic Synthesis and ATPG Mishchenko, A., Chatterjee, S., Brayton, R. DAG aware AIG rewriting: A fresh look at combinational logic synthesis. DAC 2006 Mishchenko, A., Brayton, R., Jiang, J., Jang, S. SAT based logic optimization and resynthesis IWLS 2007 And many others: BDDs and BMDs, vacuity detection in CTL, term rewrite systems, optimizing CLP compilers...

? s n o i t s Any que