Π-Ware: An Embedded Hardware Description Language using Dependent Types


João Paulo Pizani Flor
M.Sc. thesis ICA-3860418
Supervisors: Wouter Swierstra and Johan Jeuring
Friday 29th August, 2014
Department of Computing Science

Abstract

Recently, the incentives for hardware acceleration of algorithms have been growing: Moore's law continues to hold, but optimizations in traditional processors show diminishing returns. This growing need for hardware implementation pushes programmers towards hardware design, an activity with stricter correctness and performance requirements than software development. A long-standing line of research is concerned with the application of functional programming techniques to hardware design, and the relatively recent dependently-typed programming paradigm has been claimed by many researchers to be the "successor" of functional programming. This thesis aims, therefore, to investigate what improvements Dependently-Typed Programming (DTP) can bring to hardware design. We developed a domain-specific language for hardware, called Π-Ware, embedded as a library in the Agda programming language. Π-Ware allows the specification, simulation, synthesis and verification of circuits. Compared to similar approaches in the literature, Π-Ware provides strong type safety and robust proofs of correctness for whole families of circuits; however, it still needs significant improvements in terms of ease of use.

Contents

1 Introduction
2 Background
  2.1 Hardware Description
  2.2 Functional Hardware Description
    2.2.1 Embedded Functional Hardware Description
  2.3 Dependently-Typed Programming
    2.3.1 Type systems
    2.3.2 Dependent types
  2.4 Hardware and dependent types
    2.4.1 Coquet
3 The Π-Ware library
  3.1 Circuit Syntax
    3.1.1 Design discipline enforced by circuit constructors
  3.2 Abstraction Mechanisms
    3.2.1 Atoms
    3.2.2 Gate library
    3.2.3 High-level circuit datatype
    3.2.4 The Synthesizable type class
  3.3 Circuit Semantics
    3.3.1 Synthesis semantics
    3.3.2 Combinational simulation
    3.3.3 Sequential simulation
  3.4 Proving circuit properties
    3.4.1 Functional correctness
    3.4.2 Sequential correctness
4 Discussion and related work
  4.1 Functional hardware Embedded Domain-Specific Languages (EDSLs)
  4.2 Dependently-Typed Hardware EDSLs
5 Conclusions
  5.1 Future work
    5.1.1 Current limitations and trade-offs in Π-Ware
    5.1.2 Directions for future research

Chapter 1: Introduction

Several factors have been increasing the benefits of hardware acceleration of algorithms. On the one hand, Moore's law still holds for the near future [13], which means bigger circuits in the same wafer area. On the other hand, advances in Instruction-Level Parallelism (ILP) and in the microarchitecture of processors are showing diminishing returns [11]. Also, there is pressure to reduce the duration and cost of the circuit design process. These trends are at odds with the techniques and tooling used in the hardware design process, which have not experienced the same evolution as those for software development.

The design of an Application-Specific Integrated Circuit (ASIC) imposes strong requirements, especially with regard to correctness, functional and otherwise (power consumption, real-time performance, etc.). Design mistakes which make their way into the manufacturing process can be very expensive, and errors that reach the consumer cost even more, as there is no such thing as "updating" a chip. One infamous example of such a bug reaching the consumer was the FDIV bug in Intel's Pentium chip, which cost the company over 400 million dollars [26]. A language for hardware design should, therefore, try to prevent as many design mistakes as possible, and as early as possible in the implementation chain. It should also facilitate the task of verification, providing techniques more efficient than exhaustive testing and model checking of fixed-size circuits.

In the software industry, especially in application domains requiring high correctness assurance (such as information security, machine control, etc.), functional programming techniques have long been used to improve productivity and reduce the need for extensive testing and debugging. These claims have been confirmed both in industrial settings [33] and in experiments [15].
Another advantage usually attributed to functional programming is an increased capacity to reason about programs, especially to perform what is called equational reasoning. Equational reasoning can even be used as an optimization technique. For example, some libraries for array programming in Haskell take advantage of the following law involving map and function composition:

map f . map g == map (f . g)

The proof proceeds by structural induction on the array or list, using equational reasoning with the definitions of the functions involved.

Besides the growing usage of functional programming languages in several domain

areas, functional techniques and constructs keep "penetrating" imperative languages with each new release. Some examples are:

- Apple's recently released Swift language, which features immutable data structures, first-class functions and type inference.
- Java's adoption of generics and, more recently, of lambda expressions [1].
- The concept of nullable types [2] in C#, equivalent to Haskell's Maybe.
- Python's generator expressions, inspired by list comprehensions and lazy evaluation.

In a certain way, we can compare the power of the tools and techniques used nowadays in hardware design to those of the earlier days of software development. Of course there are inherent and fundamental differences between the two activities, but this comparison leads us to ask whether recent ideas from programming language research, especially those related to functional programming, can be used to improve hardware design.

Research trying to answer this broad question started already in the 1980s, with the work of Mary Sheeran and others [29] on functional hardware description languages, such as μfp [28]. Later, a trend emerged of developing EDSLs for hardware description, hosted in purely functional languages such as Haskell [21]. Prominent examples of this trend are the Lava [4] family, as well as ForSyDe [27]. A circuit description written in Lava (an EDSL hosted in Haskell) can look like the following:

toggle :: Signal Bool
toggle = let output = inv (latch output) in output

Even though pure functional languages (especially Haskell) are well-suited for implementing EDSLs, there is still room for improvement in the type safety of the embedded languages. This can be done by hosting the EDSLs in a language supporting dependent types, as exemplified in the paper "The Power of Pi" [24]. A type system with dependent types can express strong properties about the programs written in it.
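The feedback loop in toggle can be mimicked outside of any hardware library with ordinary lazy lists, which is roughly how shallow stream-based simulation works. The following is only a sketch with our own stand-ins: inv, latch and toggle here are plain list functions, not the real Lava definitions, and we assume the latch delays its input by one cycle with an initial low value.

```haskell
-- Stream-based stand-ins (our own names, not the Lava library):
inv :: [Bool] -> [Bool]
inv = map not

latch :: [Bool] -> [Bool]
latch s = False : s  -- one-cycle delay, initialised to low

-- Same knot-tying definition as the Lava description above:
toggle :: [Bool]
toggle = let output = inv (latch output) in output
```

Thanks to laziness, the recursive definition is productive: take 4 toggle yields [True,False,True,False], the alternating output one expects from an inverter fed back through a latch.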
Systems with dependent types can be seen as regular programming languages, but because of the expressive power of dependent types, we can also see them as interactive theorem provers. Using dependent types, one can more easily bring together specification and implementation: the type signature of a function can give many more guarantees about its behaviour. A classical example when introducing Dependently-Typed Programming (DTP) is the type of statically-sized vectors, along with the safe head function, depicted in Listing 1. In this example, the parameter to head cannot be empty: its size will be at least 1 (suc zero). This is expressed in the parameter's type: Vect α (suc n). When checking the totality of head (whether all cases are covered by pattern matching), the type checker will notice that the only constructor of Vect able to produce an element of type Vect α (suc n) is _∷_.

In the safe head example, we made the specification more precise by constraining the type of an argument. We can also use dependent types to make the return type of a function more precise. The group function for sized vectors in Agda's standard library has the following type:

[1] http://docs.oracle.com/javase/tutorial/java/javaoo/lambdaexpressions.html
[2] http://msdn.microsoft.com/en-us/library/1t3y8s4s.aspx

data Vect (α : Set) : ℕ → Set where
  ε   : Vect α zero
  _∷_ : ∀ {n} → α → Vect α n → Vect α (suc n)

head : ∀ {α n} → Vect α (suc n) → α
head (x ∷ xs) = x

Listing 1: Type of sized vectors and the safe head function.

group : ∀ {α : Set} (n k : ℕ) (xs : Vec α (n * k)) →
        ∃ λ (xss : Vec (Vec α k) n) → xs ≡ concat xss

It receives as parameters, besides the vector to be "sliced", the number of slices desired (n) and the size of each slice (k). Notice that the size of the passed vector needs to match n * k. The return type of this function may be read as: a vector xss of size n (whose elements are vectors of size k), such that xs ≡ concat xss. Therefore, it returns a collection of groups "sliced" from the original vector, each with the requested size. Additionally, it returns a proof that concatenating all groups results in the original vector. This type serves as a complete specification of correctness for the function: any function with this type is, by definition, a correct grouping function.

In a deep-embedded DSL for hardware there is usually a type representing circuits. One can imagine that it would be useful to design this type in such a way that as few of its elements as possible are malformed. Therefore, we want to make the typing of the circuits as strong as possible, to eliminate as many classes of design mistakes as possible. Dependent type systems allow for easy expression of these well-formedness rules for circuits. One simple criterion of circuit well-formedness is, for example, that it contains no floating signals; that is, all of a circuit's internal components must have all their ports connected. Figure 1.1 shows an example of a circuit violating this rule.

[Figure 1.1: Malformed circuit with a floating signal.]

In this M.Sc. thesis we developed a hardware EDSL, Π-Ware, which enforces this and other well-formedness rules using dependent types. Π-Ware is a deep-embedded Domain-Specific Language (DSL) hosted in the Agda programming language, supporting simulation of combinational and synchronous sequential circuits.
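For readers more familiar with Haskell: the essence of Listing 1 can be approximated with the GADTs and DataKinds extensions. This is only a sketch under our own names (Vect, hd); full dependent types are not needed for this particular guarantee, which is precisely why length-indexed vectors are such a popular first example.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level naturals used as length indices:
data Nat = Zero | Suc Nat

-- A vector indexed by its length, mirroring Listing 1:
data Vect (n :: Nat) a where
  Nil  :: Vect 'Zero a
  Cons :: a -> Vect n a -> Vect ('Suc n) a

-- Safe head: the Nil case is ruled out by the index 'Suc n,
-- so this single clause is total.
hd :: Vect ('Suc n) a -> a
hd (Cons x _) = x
```

Attempting hd Nil is a compile-time type error, just as head ε is rejected by Agda's type checker.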
Furthermore, the user of Π-Ware can prove expressive properties about circuits in the logic of Agda itself.

In Chapter 2 we establish the research questions, concepts, languages and tools from which we took inspiration to create Π-Ware. We discuss the process of hardware design, functional languages for hardware, and how dependent types can enter the picture. In Chapter 3 we dive into a detailed study of the Π-Ware EDSL itself: a detailed account is given of how Π-Ware currently works, what design decisions were involved in its development, and how to use it. We give examples of circuits modeled in Π-Ware and proofs of properties involving these circuits. In Chapter 4 we compare Π-Ware with other functional and dependently-typed EDSLs, highlighting similarities and differences; this is done by commenting on sample circuits and proofs. In Chapter 5, we conclude by pointing out Π-Ware's most important current limitations and their causes, and by speculating on possible solutions. We also comment on other related research paths that could be pursued involving hardware and dependent types, based on the insights gained with Π-Ware.

Finally, all source code and documentation for the Π-Ware library is hosted in a public GitHub repository [3]. The bibliographic database file containing all the citations in this thesis is also available [4], for the convenience of readers interested in consulting the related literature.

[3] https://github.com/joaopizani/piware
[4] https://github.com/joaopizani/piware/raw/report-final/thesis/references.bib

Chapter 2: Background

2.1 Hardware Description

The hardware development process can be well understood by analyzing its similarities to and differences from software development. Both hardware and software development usually "begin" with a high-level specification of the algorithm to be implemented, and both proceed by a series of translations, each adding more detail to the description. However, the final targets differ: while in software the final artifact is machine code (a sequence of instructions) for some architecture, in hardware the target is usually a floorplan, a spatially placed graph of logic gates and wires. Also, the transformation steps in the software and hardware chains are different, as depicted in Figure 2.1. In this figure, the rectangular boxes represent transformation steps towards a more detailed description, while the ellipsoid shapes represent artifacts that are consumed and produced by each step.

The first (highest) two levels in the hardware implementation flow are usually described using so-called Hardware Description Languages (HDLs), usually Verilog or VHDL. Nowadays they are used by hardware engineers to write both behavioural specifications of circuits (the topmost ellipse in Figure 2.1) as well as Register-Transfer Level (RTL) descriptions. However, these languages were originally designed for simulation purposes, and several problems arise when using them to model hardware architecture and behaviour. First of all, only a subset of these languages can be used for synthesis (actually deriving a netlist and floorplan). Although there is a standard [17] defining a synthesizable subset of VHDL, tools differ greatly in their level of support. To further complicate the matter, this synthesizable subset is not syntactically segregated. One example of a complex requirement for synthesizability of VHDL is: "in a process, every output must be assigned a value for every possible combination of input values".
In Listing 2 we have an example of a process violating this requirement: the if statement lacks an else branch, so the y output has no assigned value when sel is different from '0'.

Another significant difference between hardware and software development is the level of automation in the chains (Figure 2.1). Usually, all transformation steps in the software chain are automatic. In the hardware chain, the crucial "Architectural design"

[Figure 2.1: Software and Hardware refinement chains. The software chain takes a high-level algorithm through programming, compilation to a VM or target architecture, assembling and linking, producing source code, intermediate code, assembly code, machine code and finally a program image. The hardware chain goes through architectural design, RTL synthesis, technology mapping, place & route, and FPGA bitgen / ASIC fabrication, producing an RTL description, netlist, mapped netlist, floorplan and finally an FPGA bitfile / ASIC mask.]

process(sel, a, b)
begin
  if (sel = '0') then
    y <= a;
  end if;
end process;

Listing 2: Unsynthesizable VHDL process.

step is mostly manual. The complex task of making the space-time trade-offs explicit lies in the hands of the hardware designer: they must decide how much parallelism to use and how to pipeline the processing steps of the algorithm, taking care not to introduce data hazards, etc. Recently, tools have been developed for this first step, called High-Level Synthesis (HLS). They usually take behavioural VHDL, C or even C++ as input, and produce RTL code. However, current HLS-based hardware implementation chains still face two main problems:

- The input languages (VHDL, C) are not expressive enough, as they were designed for other purposes: VHDL for simulation and C/C++ for software development.
- Specification, verification and synthesis are all done with different tools and different languages. Even behavioural VHDL can be considered practically a different language than Register-Transfer Level VHDL.

Functional programming languages have been touted as a solution to both of these problems. A functional program is a more abstract and expressive specification of an algorithm than a C function or a VHDL entity. Also, functional languages could be used

to both write the specification of a circuit's behaviour and its Register-Transfer Level description.

2.2 Functional Hardware Description

In the beginning of the 1980s, many researchers were trying to use functional programming languages to design and reason about hardware circuits. These developments were happening at the same time as the VHSIC Hardware Description Language (VHDL) was being designed and standardized. Even though VHDL and Verilog ended up "winning" and becoming the de facto industry standards, it is still useful to take a look at the ideas behind these early functional HDLs, as some of them inspired current approaches. The idea at the time was to come up with new functional languages, specialized for hardware description. One of the prominent early examples is μfp [28], which was in turn inspired by John Backus's FP [3]. In contrast to VHDL, which was later adapted to synthesis in an ad-hoc way, μfp was designed from the beginning to have two interpretations:

Behavioural: Each "primitive" circuit, as well as each combining form (higher-order function), has an attached functional semantics, used in simulation.

Geometric: Each combining form has a typical geometric interpretation. For example, sequential composition of two circuits c₁ and c₂ will result in a floorplan in which c₂ is placed adjacent to c₁ and connected to it by the required wires.

The μfp language is an extension of Backus's FP, and contains only one extra combining form: μ. The μ combining form is responsible for the creation of functions (circuits) with internal state. According to the original μfp paper [28]: the meaning of μf is defined in terms of the meaning of f; the functional out hides the state, so that while M(f) maps a sequence of input-state pairs to a sequence of output-state pairs, out(M(f)) just maps a sequence of inputs to a sequence of outputs. For a given cycle, the next output and the next state depend on the current input and the current state.
One very simple example of a circuit with internal state is a shift register. In Figure 2.2, we can see the structure of this circuit. Each of the dotted boxes represents a μfp expression. The smaller box denotes a combinational circuit (with 2 inputs and 2 outputs) that swaps its inputs. The bigger box corresponds to the application of the μ combinator to the smaller one: it adds the indicated latch and a feedback loop, creating a stateful circuit.

Being a conservative extension of Backus's FP, almost all algebraic laws of FP also hold for μfp. Furthermore, the μ combining form has useful algebraic laws of its own. The most notable of these states that the composition of two "μ-wrapped" functions can be converted to a single "μ-wrapper". In other words, a circuit with several localized memory elements can be converted into a circuit with a single centralized memory bank and a combinational block. If we apply this rewrite rule from right to left, we can see it as a form of "optimization", in which we start with a single memory bank and refine the design towards one in which memory elements sit closer to the sub-circuits using them.

Two of the core ideas of μfp served as inspiration for Π-Ware:

[Figure 2.2: Shift register: a simple example of a sequential circuit in μfp.]

Double interpretation: As in μfp, each circuit and circuit combinator in Π-Ware has two distinct semantics: circuits can be simulated (functional semantics) or synthesized to a netlist (geometric semantics).

Single sequential constructor: As in μfp, there is only one way in Π-Ware to construct a sequential circuit. Also, the same law regarding the state-introducing constructor holds in Π-Ware, but because we use Agda as the host language, this meta-theoretical property can be proven in the same language in which circuits are described.

2.2.1 Embedded Functional Hardware Description

With the growing popularity and advocacy [14] of embedded DSLs, a trend emerged of also having embedded HDLs. Functional programming languages were a natural fit for hosting DSLs. In particular, the Haskell programming language proved to be a popular choice of host, due to features such as lazy evaluation and a highly customizable syntax. Some examples of highly successful DSLs implemented in Haskell are:

- Attoparsec [1] (monadic parsing)
- Accelerate [2] (data-parallel computing)
- Esqueleto [3] (SQL queries)
- Diagrams [4] (2D and 3D vector graphics)

Also, some of the most popular and powerful Embedded Hardware Description Languages (EHDLs) are hosted by Haskell. Let us now present some of these EHDLs and discuss their limitations, which will lead to the question of how to solve them using dependent types.

[1] https://github.com/bos/attoparsec
[2] https://github.com/acceleratehs/accelerate/wiki
[3] https://github.com/prowdsponsor/esqueleto
[4] http://projects.haskell.org/diagrams/
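The EHDLs presented below all model synchronous circuits as transformers of streams, which is essentially the behaviour of μfp's μ. That state-hiding behaviour can be sketched in Haskell as a Mealy-style interpreter over lists; mu and shiftReg below are our own names for a behavioural sketch, not μfp syntax.

```haskell
-- mu f s0: run the combinational core f over the input stream,
-- threading (and hiding) the internal state.
mu :: ((i, s) -> (o, s)) -> s -> [i] -> [o]
mu _ _  []       = []
mu f s0 (i : is) = let (o, s1) = f (i, s0) in o : mu f s1 is

-- The shift register of Figure 2.2: the combinational core swaps
-- input and state, so each output is the previous cycle's input.
shiftReg :: [Bool] -> [Bool]
shiftReg = mu (\(i, s) -> (s, i)) False
```

For example, shiftReg [True, False, True] evaluates to [False, True, False]: the input delayed by one cycle, with an initial low output.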

Lava

Perhaps the most popular family of hardware DSLs embedded in Haskell is the Lava family. Lava's first incarnation [4] was developed at Chalmers University of Technology and Xilinx, and used a shallow embedding in Haskell, along with a monadic description style to handle naming and sharing. These circuit monads were parameterized, and by using different instances, different interpretations could be given to a single circuit description, such as simulation, synthesis, or model checking. The design of Lava changed significantly later on in order to improve ease of use, abandoning the monadic interface and adopting an observable-sharing solution based on reference equality [8]. Currently, Lava has several dialects, such as Chalmers-Lava, Xilinx-Lava, York-Lava and Kansas-Lava. We base our examples on Chalmers-Lava [5], which can be said to be the "canonical" dialect, and is also the most actively developed.

In Lava, circuits are written as normal Haskell functions, using pattern matching, function application and local naming. One restriction is that the types of the arguments are constructed by the Signal type constructor. For example, a negation gate in Lava would have the following type signature:

inv :: Signal Bool -> Signal Bool

Another restriction is that all circuit inputs must be uncurried, i.e., even though a circuit has several inputs, the function modeling that circuit must have only one argument, with all inputs in a tuple. A NAND gate in Lava would look like this:

nand2 :: (Signal Bool, Signal Bool) -> Signal Bool
nand2 (a, b) = inv (and2 (a, b))

Finally, due to the instances provided for the Signal type, the only way to aggregate bits in Lava is by using tuples and lists. Using only these structures, we forego some type safety that Haskell could provide.
For example, an n-bit binary (ripple-carry) adder in Lava has the following description:

adder :: (Signal Bool, ([Signal Bool], [Signal Bool]))
      -> ([Signal Bool], Signal Bool)
adder (carryin, ([], []))     = ([], carryin)
adder (carryin, (a:as, b:bs)) = (sum:sums, carryout)
  where (sum, carry)     = fulladd (carryin, (a, b))
        (sums, carryout) = adder (carry, (as, bs))

In this circuit description, there is an expectation that both inputs have the same size. When this expectation is not met, a run-time error will occur during simulation. This happens because the given definition is partial: the cases for (carryin, ([], b:bs)) and (carryin, (a:as, [])) are left undefined. This limitation of Lava can be solved by dependent types, namely by using statically-sized vectors.

Another limitation of Lava is related to the way in which it solves the observable-sharing question: in order to detect sharing and cycles, equality over the Signal type is defined as equality over the references used. Therefore, comparisons of signals "created" in different sites will fail, even when their values are the same. For example, the following expressions will evaluate to False:

[5] https://hackage.haskell.org/package/chalmers-lava2000

test1 :: Bool
test1 = low == low

test2 :: Bool
test2 = simulate adder (low, ([low], [low])) == low

This limitation is not present in Π-Ware: because we describe circuits in a structural fashion, we completely avoid the need to deal with observable sharing.

ForSyDe

Another example of an EHDL in Haskell is ForSyDe [27]. ForSyDe and Lava differ substantially in description style and internal workings; a more complete comparison of the two can be found in the final report [25] of the experimentation project conducted in preparation for this thesis. In ForSyDe, the central concepts are those of process and signal. Whereas in Lava only synchronous sequential circuits can be described (which is also the case in Π-Ware), in ForSyDe processes can belong to synchronous, asynchronous and continuous models of computation. In the synchronous model of computation (which we studied more deeply), process constructors take a combinational function (called a process function in ForSyDe jargon) and turn it into a synchronous sequential circuit (for example, a state machine). Instead of just using (smart) constructors of a certain circuit datatype to build process functions, ForSyDe relies on Template Haskell to quote regular Haskell syntax into an Abstract Syntax Tree (AST), which is then further processed by ForSyDe. For example, the description of an adder in ForSyDe would look like the following:

adderFun :: ProcFun (Int8 -> Int8 -> Int8)
adderFun = $(newProcFun
  [d| adderFun' :: Int8 -> Int8 -> Int8
      adderFun' x y = x + y |])

adderProc :: Signal Int8 -> Signal Int8 -> Signal Int8
adderProc = zipWithSY "adderProcess" adderFun

Notice how the adderFun process function is built from a regular Haskell function (adderFun'), which is quoted (by the [d| |] quasi-quoter), processed by newProcFun and finally spliced back into place. The (combinational) process function is then lifted into the synchronous sequential setting by the zipWithSY process constructor, which simply "zips" the input signals (streams) by applying the given process function pointwise.

Not every Haskell function can be quoted, processed by newProcFun and turned into a ForSyDe process function: the argument and return types of a ProcFun must belong to the ProcType type class. Instances of this class are provided only for:

- Primitive types: Int, Int8, Int16, Int32, Bool, Bit
- Enumerated types: user-defined enumerations, with instances for Data and Lift
- Containers: tuples and fixed-length vectors (Data.Param.FSVec), holding a type of the above two categories, nested without restriction.
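The size-mismatch problem of the earlier Lava adder disappears once both inputs share a length index, which is what ForSyDe's fixed-length vectors emulate at the type level. The following is a self-contained sketch with GADTs (our own minimal Vec and adder, not actual Lava or ForSyDe code); the definition is total, because the mismatched-length cases cannot even be written down.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Suc Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Zero a
  VCons :: a -> Vec n a -> Vec ('Suc n) a

toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs

-- One-bit full adder: returns (sum, carry-out).
fullAdd :: Bool -> (Bool, Bool) -> (Bool, Bool)
fullAdd cin (a, b) = (cin /= (a /= b), (a && b) || (cin && (a /= b)))

-- Ripple-carry adder, least significant bit first.  Both inputs have
-- the same index n, so the "missing" cases of the Lava version are
-- rejected by the type checker instead of failing at simulation time.
adder :: Bool -> Vec n Bool -> Vec n Bool -> (Vec n Bool, Bool)
adder cin VNil         VNil         = (VNil, cin)
adder cin (VCons a as) (VCons b bs) =
  let (s, c)       = fullAdd cin (a, b)
      (sums, cout) = adder c as bs
  in (VCons s sums, cout)
```

Applying adder to vectors of different lengths is a compile-time error, so the run-time partiality of the list-based description simply cannot arise.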
The (combinational) process function is then lifted into the synchronous sequential setting by the zipwithsy process constructor, which just "zips" the input signals (streams) by applying the given process function pointwise. Not every Haskell function can be quoted, processed by newprocfun and turned into a ForSyDe process function: the argument and return types of a ProcFun must belong to the ProcType type class. Instances of this class are provided only for:

Primitive types: Int, Int8, Int16, Int32, Bool, Bit.
Enumerated types: user-defined enumerations, with instances for Data and Lift.
Containers: tuples and fixed-length vectors (Data.Param.FSVec), holding a type of the above two categories, nested without restriction.
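The pointwise lifting performed by a process constructor like zipwithsy can be sketched in plain Haskell. This is a simplification, not ForSyDe's actual implementation: the names Sig and liftSY2 are invented here, and real ForSyDe signals also carry process identifiers and use Template Haskell machinery.

```haskell
-- A signal modelled as a stream of samples (a finite list, for illustration).
newtype Sig a = Sig [a] deriving (Eq, Show)

-- Lift a combinational function into a synchronous process:
-- output sample n depends only on the input samples at instant n.
liftSY2 :: (a -> b -> c) -> Sig a -> Sig b -> Sig c
liftSY2 f (Sig xs) (Sig ys) = Sig (zipWith f xs ys)

-- An "adder process" in the style of adderproc above.
adderProc :: Sig Int -> Sig Int -> Sig Int
adderProc = liftSY2 (+)
```

Applied to the signals [1,2,3] and [10,20,30], adderProc produces the signal [11,22,33]: the process function is applied sample by sample, with no state carried between instants.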

Under certain conditions, ForSyDe is able to generate VHDL netlists from the system description. In order for a ForSyDe system description to be synthesizable, all process functions need to comply with some extra requirements:

Pointed notation: declarations in point-free notation are not accepted.
Single clause: the body of a process function cannot have multiple clauses, nor let or where blocks. Recursion is also forbidden inside process functions, and pattern matching is only possible using the case construct.

Despite all these limitations, ForSyDe still provides a more typed approach to hardware description than Lava, with perhaps its most distinctive feature being the usage of fixed-length vectors. On the parameterized-data page on Hackage 6, the authors state that the library's goal is to provide "type-level computations and emulate dependent types". Therefore it is not a stretch to assume that using actual dependent types could improve on the ideas proposed by ForSyDe.

Hawk and Cλash

Finally, in this short review of functional hardware EDSLs, two more alternatives deserve mention: Hawk and Cλash. Hawk [20] is a Haskell-hosted EDSL with a different target than the ones presented so far: instead of modelling circuits, it intends to model microarchitectures. Therefore, it sits at a higher level of abstraction than Lava or even ForSyDe. Models in Hawk are executable and shallow-embedded. However, Hawk uses a technique similar to the original Lava version [4] in order to also allow for symbolic evaluation: descriptions are "parameterized" by type classes (in Hawk's case Instruction, Boolean, etc.), and by providing different instances, different interpretations are achieved.
The authors of Hawk mention some shortcomings of Haskell which they encountered, among which:

- Haskell's list datatype does not quite match the intended semantics for Hawk's Signals: the preferred semantics for Signal would be a truly infinite, coinductive stream. The authors mention a parallel effort to embed Hawk in the Isabelle theorem prover, which achieved the desired semantics. In Π-Ware, we make use of coinductive streams [19] to model the inputs and outputs of synchronous sequential circuits, exactly as the authors of Hawk desired.

- The type class system of Haskell is limited: the authors mention the desire to explicitly provide specific instances at specific sites, and also the desire to use views on datatypes. Both features are available in Agda.

Finally, we would like to mention Cλash [2]. Even though it is not (strictly speaking) an embedded DSL, it has very powerful features. Cλash was developed at the University of Twente as an independent DSL, i.e., with a compiler of its own. However, its source language is strongly based on Haskell, and Cλash's compiler reuses several pieces of machinery from Haskell's toolchain (specifically, GHC) in order to perform its term rewriting and supercompilation.

6 http://hackage.haskell.org/package/parameterized-data-0.1.5

The circuit models in Cλash are higher-order, polymorphic functional programs, and they can be synthesized to VHDL netlists. In this sense, Cλash serves as a good target with which to compare hardware EDSL development efforts: it is not itself embedded, but has the same goals.

2.3 Dependently-Typed Programming

2.3.1 Type systems

A type system, in the context of programming languages, serves the purpose of grouping values, so that meaningless and potentially undesirable operations are avoided. For example, in some dynamic languages, a string and an integer can be "added", because the language runtime first implicitly casts the integer to a string, effectively performing string concatenation. This implicit behaviour can cause all sorts of programmer mistakes to go undetected. It is also hard to reason about addition in general when all sorts of coercions can happen at runtime. To define addition sensibly, therefore, a type system can help by banning all programs trying to add incompatible values. Type systems can have very different properties and be implemented in very different ways. Some of the dimensions along which type systems can be categorized are [7]:

- Weak vs. strong
- Static vs. dynamic
- Polymorphism (parametric, ad-hoc, subtyping)

The most basic form of abstraction, values depending on values (the concept of functions), is supported in practically all type systems. In some systems, the result type of a function can also depend on the types of its arguments; this property is called polymorphism. There are two kinds of polymorphism relevant to our discussion:

- In parametric polymorphism, functions are written without mentioning any specific type, and can therefore be applied at any instantiation of the type variable. A typical example of a parametrically polymorphic function is obtaining the length of a list, where the same definition works for any type of element in the list.

- In ad-hoc polymorphism, the behaviour of a function varies with the type of the inputs.
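The two kinds can be illustrated with small Haskell definitions. The names len and Describe below are invented for illustration; len is just the standard length function restated.

```haskell
-- Parametric polymorphism: one definition, valid for every element type;
-- the type variable a is never inspected.
len :: [a] -> Int
len []     = 0
len (_:xs) = 1 + len xs

-- Ad-hoc polymorphism: the behaviour is chosen according to the type,
-- via a type class with one instance per type.
class Describe a where
  describe :: a -> String

instance Describe Bool where
  describe b = if b then "yes" else "no"

instance Describe Int where
  describe n = "the number " ++ show n
```

Here len computes the same way for strings, lists of Booleans, and so on, while describe dispatches to a different definition depending on whether its argument is a Bool or an Int.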
In Haskell, ad-hoc polymorphism is implemented in the type class system. A typical example of an ad-hoc polymorphic function is a comparison-based sorting algorithm, in which different comparison operators are used depending on the type of the elements in the collection. Polymorphism adds expressivity and makes a type system stronger, in the sense that it allows for more precise specifications. For example, if we want to implement a swap function for pairs in Haskell, we could start by specifying the type of the function in a non-polymorphic way:

swap' :: (Int, Int) -> (Int, Int)

There are several definitions satisfying the above type which do not swap the elements. For example, one "wrong" implementation would output a constant pair:

swap' _ = (1, 1)

Notice how the argument is ignored (we use the "don't care" pattern). To rule out this class of wrong implementations, we could make the type polymorphic:

swap'' :: (a, a) -> (a, a)

Now "constant" definitions will no longer have the specified type. This is because the only way to get an element of the polymorphic type a is to use the parameter passed to the function. However, we could still write a wrong function with this type, for example the identity:

swap'' (x, y) = (x, y)

This is possible because our type does not yet reflect a precise enough specification of swap. In the type we have now, the types of both elements in the pair are the same. If we lift this artificial restriction, we get the type signature which fully specifies swap:

swap :: (a, b) -> (b, a)

Now, any total definition with this type is a correct definition of swap. Because of the way in which the type variables (a and b) are used in the signature, there is only one possible implementation of swap:

swap (x, y) = (y, x)

Going one step further, some type systems support types that depend on other types; these are called type operators or type-level functions. In Haskell, this form of abstraction is implemented as regular parameterized type constructors (List, Maybe, etc.) and as indexed type families. The last step in our "ladder" of type system expressiveness is then dependent types.

2.3.2 Dependent types

A dependent type is a type that depends on a value. A typical example of a dependent type is the type of dependent pairs, in which the type of the second element depends on the value of the first:

record Σ (α : Set) (β : α → Set) : Set where
  constructor _,_
  field
    fst : α
    snd : β fst

In a dependent type system, we can also have functions in which the return type depends on the value of a parameter. These functions belong to the so-called dependent function space. For example, we can imagine having a take function for vectors which makes use of this possibility to have a more precise specification.
First of all, when indexing into or taking a prefix of a vector, we need to ensure that the index (or the number of elements to be taken) is within bounds. That is, we cannot take more elements than the vector contains. The type signature of take is shown in Listing 3.

take : {α : Set} {n : ℕ} (k : ℕ) → Vec α (k + n) → Vec α k

Listing 3: A "size-safe" prefix-taking function for sized vectors.

In the signature of take, the dependent function space is used both to restrict the domain and to express a desired property of the codomain. Notice how the vector parameter has a type (Vec α (k + n)) which depends on the value of the first (explicit) parameter: k. Also, the result type is restricted to Vec α k. Finally, we observe that the type of take is a necessary but not sufficient condition for what we would intuitively conceive as a "correct" prefix-taking function. There are "wrong" implementations which still have this type, for example one returning a constant vector of size k. This type, however, provides many more static guarantees: all implementations violating bounds or producing incorrectly-sized results are ruled out by the type checker at compile-time. Until now, we have approached languages with dependent types as a way to help with programming. Specifically, we gave examples of how using dependent types in a function's type signature can rule out incorrect implementations by design. Dependently-typed languages can also be seen from a logical point of view. This remark that a programming language (with its type system) can also be seen as a logic system goes back to the early days of computing, and is known by several names, among which "Curry-Howard isomorphism" and "propositions as types" [32]. It was initially "discovered" as a connection between the simply-typed lambda calculus and intuitionistic propositional logic. As decades went by, this correspondence was found to be much more general, and several connections were drawn between diverse typed lambda calculi and logics. The system of dependent types which we use in this thesis (the one used by Agda) corresponds to intuitionistic predicate logic.
Even though there are logics connected to other, less powerful typed lambda calculi, the one used in Agda is expressive enough to radically broaden its area of use. In a language with dependent types, we can write not only programs, but also proofs. For example, in Agda, we can model the less-than-or-equal order relation using an indexed data family (Listing 4), where the indices are the usual Peano naturals.

data _≤_ : ℕ → ℕ → Set where
  z≤n : {n : ℕ} → zero ≤ n
  s≤s : {m n : ℕ} → m ≤ n → suc m ≤ suc n

Listing 4: Order relation (≤) over naturals, as an Agda indexed data family.

Having defined this relation, we can then prove facts involving it. For example, one trivial fact is that two is smaller than or equal to four, which has the trivial proof shown in Listing 5.

twoleqfour : 2 ≤ 4
twoleqfour = s≤s (s≤s z≤n)

Listing 5: A very simple proof involving the ≤ relation over natural numbers.

Following the "propositions as types" isomorphism, the type of twoleqfour can be seen as a logical statement, and the expression on the right-hand side of the equals sign (the body of the function) as a proof of this statement. The proof is built with z≤n as its "basis" (the fact that zero is less than or equal to any number), producing 0 ≤ 2. Then s≤s is applied twice, producing, respectively, 1 ≤ 3 and finally 2 ≤ 4. Of course, more "interesting" and general statements can be proved. For example, the fact that the ≤ relation is transitive is stated and proved in Listing 6.

≤-trans : {a b c : ℕ} → a ≤ b → b ≤ c → a ≤ c
≤-trans z≤n       _         = z≤n
≤-trans (s≤s a≤b) (s≤s b≤c) = s≤s (≤-trans a≤b b≤c)

Listing 6: Proof that the ≤ relation is transitive.

As proofs are first-class citizens, they can be passed as arguments to functions, and returned as well. This provides yet another powerful way of expressing requirements on function arguments, and of showing that the returned result satisfies some desired property. For example, we have already shown (in Listing 3) how to obtain a safe version of a prefix-taking function for statically-sized arrays: by using a specially-crafted input type. Now we have another way to express the same requirement: we can explicitly require that a proof argument be passed, guaranteeing that the amount to be "taken" is less than or equal to the array size.

take : {α : Set} {n : ℕ} (m : ℕ) {p : m ≤ n} → Vec α n → Vec α m

This capability for precisely defining and enforcing requirements and invariants makes dependently-typed programming languages very good candidates for hosting EDSLs. In the paper "The power of Pi" [24], three examples show how to embed DSLs in Agda and get corresponding Domain-Specific Type Systems (equipped with desirable properties) "for free".
The examples in "The power of Pi", especially the embedding of Cryptol, a low-level cryptographic DSL, served as inspiration, and led us to investigate how dependently-typed programming can benefit hardware design, the main question targeted by this project.

2.4 Hardware and dependent types

We are aware of three attempts in the literature to bring dependent types to bear on hardware design. Each of them has provided us with specific insights and ideas, some of which Π-Ware incorporates. Let us now briefly review these works, and highlight the most important contributions of each. The paper "Constructing Correct Circuits" [5], from 2007, gives a clear example of how dependent types can tie together specification and implementation. In this paper, the authors give a mapping between the usual, Peano-based naturals and binary numbers, which they then use to build a (ripple-carry) binary adder which is correct by construction. This approach is significantly different from the one taken in Π-Ware. In the paper by Brady, McKinna & Hammond [5], the functional specification of a circuit is carried in the type of the circuit. This, according to the authors, means that "a type-correct program is a constructive proof that the program meets its specification".

In Π-Ware, on the other hand, a circuit's type only expresses its input and output types, and gives some structural well-formedness guarantees (correct sizing). The development of Π-Ware was already underway when this paper first came to my knowledge. Even so, the decision to keep circuits and proofs apart (though in the same language) still seems to have been the right one, mainly because of the requirement that circuits be synthesizable to VHDL.

2.4.1 Coquet

The most significant source of inspiration for the design of Π-Ware was Coquet [6], a hardware DSL embedded in the Coq language 7. Coquet and Π-Ware share some very important characteristics and differ on several others, with these differences caused not only by the different host languages used (Coq and Agda), but also by the different implementations chosen for some core concepts. First of all, the similarities. Coquet's circuit datatype utilizes the same concept of structural description as Π-Ware, with a "fundamental" gate constructor and constructors for structural combination. Also, the idea of dedicating one specific constructor to building "rewiring" circuits was inspired by Coquet's Plug. The Circuit datatype of Coquet is shown in Listing 7.

Context {tech : Techno}.

Inductive Circuit : Type -> Type -> Type :=
  Atom : forall {n m : Type} {Hfn : Fin n} {Hfm : Fin m},
         techno n m -> Circuit n m
  Plug : forall {n m : Type} {Hfn : Fin n} {Hfm : Fin m}
         (f : m -> n), Circuit n m
  Ser  : forall {n m p : Type},
         Circuit n m -> Circuit m p -> Circuit n p
  Par  : forall {n m p q : Type},
         Circuit n p -> Circuit m q -> Circuit (n + m) (p + q)
  Loop : forall {n m p : Type},
         Circuit (n + p) (m + p) -> Circuit n m.

Listing 7: Coquet's Circuit datatype.
Some differences between Coquet's Circuit type and that of Π-Ware (detailed in Section 3.1) are, however, immediately noticeable:

Indices are Types, instead of just sizes: in Π-Ware, the core circuit datatype is indexed by two natural numbers, representing the sizes of the circuit's input and output. In Coquet, the indices are arbitrary types, constrained by the Fin type class. The definition of Fin makes the synthesis of Coquet circuit interfaces to VHDL harder. Before deciding to use natural number indices in Π-Ware, another alternative for generalization was considered, detailed in Section 5.1.1.

7 Referring to Coq as a language is a misnomer, as it is an interactive theorem prover that uses different languages for defining terms, interactive commands, and user-defined proof tactics.

Lack of a "sum" constructor: both Coquet and Π-Ware have a "parallel composition" constructor which, given two circuits as arguments (with inputs respectively α and β), builds a circuit with the product α × β as its input. Π-Ware, however, has an extra "sum" constructor, building a circuit which "decides" which one of the two subcircuits to apply to an input, depending on a tag. This is useful for implementing, for example, generic multiplexers.

Single fundamental gate vs. gate library: Coquet's fundamental constructor (Atom) is parameterized by the techno instance, which gives the description of one technology-dependent gate (NAND, NOR, etc.). Π-Ware circuit descriptions, however, are parameterized by a gate library, with a specification function attached to each of its elements, as detailed in Section 3.2.2.

Another similarity between Coquet and Π-Ware is the usage of data abstraction, embodied in Coquet by the Iso type class and in Π-Ware by Synthesizable, described in more detail in Section 3.2.4. The goal of data abstraction is to be able to express the specification of circuits in more user-friendly (high-level) types, while the circuits themselves are described just in terms of sizes and wires. Finally, some of the more fundamental differences between Π-Ware and Coquet are:

Relational vs. functional semantics: Coquet defines a relational semantics for circuits, that is, the semantics of a circuit is a datatype defined by recursion on circuit structure. Π-Ware, on the other hand, has an executable functional semantics 8, in which each circuit is mapped to an Agda function over sequences of bits (or other types).

Semantics of sequential circuits: Π-Ware, in contrast to Coquet, makes a clear distinction between purely combinational and possibly sequential circuits. We maintain the invariant that purely combinational circuits never contain loops.
Also, in Π-Ware, the only constructor which builds loops also adds a delay element 9, eliminating the need to consider an extra "sequential fundamental gate", as in Coquet. The functional semantics of sequential circuits in Π-Ware is defined as a causal stream function (detailed in Section 3.3.3), while Coquet uses regular functions over streams.

8 Another term commonly used in the literature is denotational semantics.
9 In hardware terms, a clocked latch.
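The essence of this delay-and-loop semantics can be sketched in Haskell. This is a simplification, not Π-Ware's actual Agda definition: finite lists stand in for coinductive streams, and the names Step, runDelayLoop and delay are invented here. A combinational step function maps one input together with the looped-back state to one output and the next state; threading the state through the memory element (initialized with s0) makes every output depend only on present and past inputs, i.e. the resulting stream function is causal.

```haskell
-- A combinational step: one input plus looped-back state,
-- yielding one output plus the next state.
type Step i s o = (i, s) -> (o, s)

-- Run a sequential circuit over an input stream, threading the state
-- through the "delay" element (initialized to s0).
runDelayLoop :: Step i s o -> s -> [i] -> [o]
runDelayLoop _ _ []     = []
runDelayLoop f s (x:xs) = let (y, s') = f (x, s)
                          in  y : runDelayLoop f s' xs

-- A unit delay: outputs the previous input, starting from an initial value.
delay :: a -> [a] -> [a]
delay = runDelayLoop (\(x, s) -> (s, x))

-- A running sum, with the sum so far as the state.
accSum :: [Int] -> [Int]
accSum = runDelayLoop (\(x, s) -> (x + s, x + s)) 0
```

For instance, accSum [1,2,3,4] yields [1,3,6,10], and delay 0 [1,2,3] yields [0,1,2]: each output at instant n is computed from the input at instant n and the state accumulated from instants before it.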

Chapter 3

The Π-Ware library

Π-Ware is an embedded hardware description language that allows for circuit description (modelling), simulation, verification and synthesis. In this chapter we describe in detail how the Π-Ware library is organized, which principles are behind some of the most important design decisions taken in its development, and how to use Π-Ware to model, simulate and reason about circuit behaviour. The reader is assumed to be familiar with the Agda programming language and with dependently-typed programming in general. For readers familiar with functional programming, but not with Agda nor DTP, a good introduction can be found in the official Agda tutorial [23].

3.1 Circuit Syntax

The most basic activity allowed by Π-Ware is the description of circuits: as already mentioned briefly in the introduction, Π-Ware is deeply embedded, which means there is a datatype whose values are circuits. A deep embedding was chosen in order to allow for semantics other than execution (simulation). Particularly, the possibility of synthesizing circuit models was a requirement kept in mind throughout the whole development. The circuit datatype (ℂ) is the most fundamental of the whole library. It is defined as a dependent inductive family, indexed by two natural numbers, as shown in Listing 8.

data ℂ : ℕ → ℕ → Set where
  Nil       : ℂ zero zero
  Gate      : (g# : Gates#) → ℂ (∣in∣ g#) (∣out∣ g#)
  Plug      : ∀ {i o} (f : Fin o → Fin i) → ℂ i o
  DelayLoop : ∀ {i o l} (c : ℂ (i + l) (o + l)) {p : comb c} → ℂ i o
  _⟫_       : ∀ {i m o} → ℂ i m → ℂ m o → ℂ i o
  _∥_       : ∀ {i₁ o₁ i₂ o₂} → ℂ i₁ o₁ → ℂ i₂ o₂ → ℂ (i₁ + i₂) (o₁ + o₂)
  _∣+_      : ∀ {i₁ i₂ o} → ℂ i₁ o → ℂ i₂ o → ℂ (suc (i₁ ⊔ i₂)) o

Listing 8: The core circuit type (ℂ) of Π-Ware.

A structural representation of a circuit is achieved by the constructors of ℂ. This is in stark contrast with the description style used in the Lava family, for example. In Lava, the constructors of the circuit datatype represent solely logic (or arithmetic) gates, and metalanguage (Haskell) constructs such as application, tupling and local naming are used to represent sequencing, parallel composition, loops and sharing. In Π-Ware, on the other hand, explicit constructors represent these combinations, avoiding the need to implement some form of observable sharing [12] in the host language. The indices of ℂ represent, respectively, the size of the circuit's input and output. These can be thought of as the number of "wires" entering (resp. leaving) the circuit. Notice that the representation of inputs and outputs used in ℂ is untyped and unstructured. However, there is another, higher-level circuit datatype (C), which adds a layer of typing on top of ℂ, and constitutes the actual intended interface between the user (hardware designer) and Π-Ware. This abstraction layer will be discussed in more detail in Section 3.2.3. To better understand the reasoning behind the design of the low-level ℂ datatype, its constructors can be considered to belong to one of three categories:

Fundamental: these construct predefined or "atomic" circuits, with no sub-components.

  Nil: the empty circuit. Performs no computation and has neither inputs nor outputs. It is mainly useful as a "base case" when building large circuits through recursive definitions.

  Gate: constructs a chosen circuit among those provided by a gate library passed as parameter to the Circuit module. Such libraries can consist of, for example, logic primitives ({NOT, AND, OR}) or even arithmetic gates (adders, etc.).

  Plug: constructs a "rewiring" circuit. Performs no computation, but can be used to apply permutations and to duplicate or discard wires. The type of the function passed as argument to Plug ensures that no short circuit can occur.

Structural: these represent ways in which smaller circuits can be interconnected to build a bigger one.

  c₁ ⟫ c₂: sequential composition. Connects the output of c₁ to the input of c₂.

  c₁ ∥ c₂: parallel composition. Splits the input into two parts, connected to c₁ and c₂ respectively, and rejoins the outputs of c₁ and c₂ into a single output.

  c₁ ∣+ c₂: tagged branching. Based on the value of a tag given in one of the input wires, feeds the remaining input wires into either c₁ or c₂.

Delay: the DelayLoop constructor belongs to a category of its own. Its purpose is to construct a state-holding circuit, given a purely combinational circuit as argument. Part of the output of the circuit given as argument to DelayLoop is passed through a memory element and looped back to that circuit's input. State machines and other sequential circuits can be created with this constructor.

These descriptions are just a rough summary of the synthesis semantics of Π-Ware, that is, how each circuit constructor creates a netlist, given netlists as arguments. The precise definition of the synthesis semantics is given in Section 3.3. The same section contains a detailed definition and explanation of a simulation semantics for Π-Ware circuits, both purely combinational and sequential.
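To give a feel for how a size-indexed circuit type with a functional simulation semantics fits together, the following Haskell sketch loosely mirrors a combinational fragment of such a design. This is an approximation only, not Π-Ware's definition: GHC's type-level naturals stand in for Agda's ℕ, the gate library is fixed to a single Nand gate, the Plug, DelayLoop and sum constructors are omitted, and the names widths and eval are invented here.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import GHC.TypeLits

-- A circuit indexed by its input and output sizes (numbers of wires).
data C (i :: Nat) (o :: Nat) where
  Nil  :: C 0 0                                        -- empty circuit
  Nand :: C 2 1                                        -- one fundamental gate
  Seq  :: C i m -> C m o -> C i o                      -- sequential composition
  Par  :: C i1 o1 -> C i2 o2 -> C (i1 + i2) (o1 + o2)  -- parallel composition

-- Input/output widths, recovered from the structure.
widths :: C i o -> (Int, Int)
widths Nil         = (0, 0)
widths Nand        = (2, 1)
widths (Seq c1 c2) = (fst (widths c1), snd (widths c2))
widths (Par c1 c2) = (fst w1 + fst w2, snd w1 + snd w2)
  where (w1, w2) = (widths c1, widths c2)

-- A toy simulation semantics over lists of bits. Well-sized inputs are
-- enforced by the indices at the typed surface; the list interface here
-- is unchecked, hence the error case.
eval :: C i o -> [Bool] -> [Bool]
eval Nil         _      = []
eval Nand        [a, b] = [not (a && b)]
eval Nand        _      = error "ill-sized input"
eval (Seq c1 c2) xs     = eval c2 (eval c1 xs)
eval (Par c1 c2) xs     = eval c1 ys ++ eval c2 zs
  where (ys, zs) = splitAt (fst (widths c1)) xs
```

For example, eval (Par Nand Nand) [True, True, False, True] yields [False, True], and ill-sized compositions such as sequencing a 1-output circuit into a 2-input one are rejected at compile-time by the indices, which is the guarantee the ℕ indices of ℂ provide in Agda.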