Fractional Rate Dataflow Model for Efficient Code Synthesis

Size: px
Start display at page:

Download "Fractional Rate Dataflow Model for Efficient Code Synthesis"

Transcription

1 Fractional Rate Dataflow Model for Efficient Code Synthesis Hyunok Oh Soonhoi Ha The School of Electrical Engineering and Computer Science Seoul National University Seoul , KOREA TEL : {oho,sha}@comp.snu.ac.kr ABSTRACT Automatic code synthesis from dataflow program graphs is a promising high-level design methodology for rapid prototyping of multimedia embedded systems. Memory efficient code synthesis from dataflow models has been an active research subject to reduce the gap in terms of memory requirements between the synthesized code and the hand-optimized code. However, existent dataflow models have inherent difficulty of efficiently handling data structures. In this paper, we propose a new dataflow extension called fractional rate dataflow (FRDF) in which fractional number of samples can be produced and consumed. In the proposed FRDF model, a constituent data type is considered as a fraction of the composite data type. Existent integer rate dataflow models can be easily extended to incorporate the fractional rates without loosing analytical properties. In this paper, the SDF model is extended to include FRDF, which can reduce the buffer memory requirements significantly, up to 70%, for some multimedia applications. Extended SDF model with fractional rate has been implemented in our system design environment called PeaCE(Ptolemy extension as Codesign Environment). 1. INTRODUCTION As system complexity increases and fast design turn-around time becomes important, automatic code synthesis from dataflow program graphs is a promising high level design methodology for rapid prototyping of multimedia embedded systems. COSSAP [1], GRAPE [2], and Ptolemy [3] are well-known design environments, especially for digital signal processing applications, with automatic code synthesis facility from graphical dataflow programs. The main obstacle of using this methodology for practical design problems is the code efficiency in terms of cycle counts and memory requirements of the synthesized code compared with the hand-optimized code. In a hierarchical dataflow program graph, a node, or a block, represents a function that transforms input data streams into output streams. The functionality of an atomic node is described in a high-level language such as C or VHDL. An arc represents a channel that carries streams of data samples from the source node to the destination node. The number of samples 1

2 produced (or consumed) per node firing is called the output (or the input) sample rate of the output (or the input) port. The simplest dataflow model is a single-rate dataflow (SRDF) [4] in which all sample rates are unity. When a port may consume or produce multiple samples, we call it a multirate dataflow (MRDF), which is assumed in this paper as well as in other design environments. We also assume static dataflow model in which the sample rates of all nodes are known at compile time. Static dataflow models permit compile time analysis that determines the node execution schedule and the memory requirements in the synthesized code from the graph. Hence most design environments use static dataflow models such as synchronous dataflow (SDF) [5] and cyclo-static dataflow (CSDF) [6]. In the SDF model, all sample rates are fixed while they can be changed periodically in the CSDF model. Memory efficient code synthesis from static dataflow models has been an active research subject to reduce the gap in terms of memory requirements between the automatically synthesized code and the manually optimized code [7][8][9]. The function body of a node is presumably highly optimized a priori and predefined in a block library. In the synthesized code, the function codes of atomic nodes are stitched together according to the execution order determined at compile time. Then, the cycle count of the synthesized code is not much inferior because the computation intensive kernel is encapsulated within a node. On the other hand, each arc of a dataflow graph is assigned a buffer memory whose size is at least the maximum amount of samples accumulated on the arc at run time. To minimize the buffer requirements in the synthesized code, many works have been performed to minimize the buffer requirements by optimal scheduling [7][8] and by buffer sharing [10][11]. While most previous works do not differentiate the composite type from the primitive type data, we take special account for the composite data type and achieve significant reduction on buffer memory requirement. Composite data types, such as video frame or network packet, are used extensively in recent networked multimedia applications, and become the major consumer of scarce memory resource. Existent dataflow models have inherent difficulty of efficiently expressing the mixture of a composite data type and its constituents: for example, a video frame and macro blocks. A video frame is regarded as a unit of data sample in integer rate dataflow graphs, and should be broken down into multiple macro blocks explicitly consuming extra memory space. To overcome this difficulty, we propose a new dataflow extension called fractional rate dataflow (FRDF) in which fractional number of samples can be produced or consumed In the proposed FRDF model, a constituent data type is considered as a fraction of the composite data type. An FRDF graph can be synthesized into efficient codes with shorter latency and less buffer memory than integer rate dataflow graphs. Moreover, the FRDF model can be adopted in the existent dataflow models. In the next section, we motivate the fractional rate dataflow model with an H.263 encoder example. In section 3, we show how the analytical properties of the static dataflow models can be extended to the FRDF model. Section 4 discusses the experimental results with an H.263 encoder example. In section 5, we extend the proposed fractional rate concept to the primitive type data samples. Some real experiments demonstrate in section 6 that fraction rate extension to the primitive type data also reduces the memory requirements. Code generation from the proposed FRDF model is discussed in section 7. Concluding remarks follow in section 8. 2

3 2. MOTIVATION Figure 1(a) shows an SDF subgraph of an H.263 encoder example. The ME node receives the current and the previous frames to compute the motion vectors and pixel differences, and the D node divides the input frame into 99 macro blocks to be encoded by the next EN node. Since the ME node regards a 176x144 video frame as the unit data sample while the EN node a 16x16 macro block, the D node is inserted to convert data type explicitly. The schedule of node execution becomes 1(ME,D)99(EN) in this subgraph, which executes ME and D once and EN 99 times. Hence two buffers of frame size are required for arc α and β both. The cyclo-static dataflow (CSDF) model gives a better representation, where the D node can produce a macro block per execution. The annotated notation, 1,98x0, on arc α in Figure 1(b) means that the D node consumes 1 data sample of frame size at the first execution and no sample at the subsequent 98 executions. The sample rates vary one firing to the next in a cyclic pattern. Because the pattern is periodic and predictable, it is possible to statically construct a periodic schedule: 1(ME)99(D,EN). As a result, the buffer size on arc β is reduced to the macro block size from the frame size of SDF model. The memory for arc α still has the frame size since the ME node generates a frame output. Compared with the hand-written code for the program graph, we observe that the motion estimation node need not produce the frame-size output at once. Since the ME node is the most time consuming and runs for each macro block, it had better generate output samples at the unit of macro block for shorter latency. The proposed FRDF allows implicit type conversion from the video frame type to the macro block type at the input arc of the ME node (Figure 1(c)). The fraction rate 1/99 indicates that the macro block is 1/99 of the video frame in its size. How to divide the fraction depends the frame type and should be specified by the programmer before run time. For each execution of the ME node, it consumes a macro block from the input frame, and produces a macro block output. Thus, we do not need the D node any more since type conversion is already made inside the ME node. A possible schedule becomes 99(ME,EN). Now, the buffer size becomes the macro-block size. Šœ Œ Œ œš tl `` k lu α β OˆP Šœ Œ Œ œš tl S`_ŸW ``Ÿ k lu α β O P Šœ Œ Œ œš V`` Šœ Œ S`_ŸW V`` tl lu S`_ŸW tl ``Ÿ lu Œ œš OŠP O P Figure 1. A subgraph of an H.263 encoder example: (a) SDF representation, (b) CSDF representation, (c) FRDF representation, and (d) revised CSDF representation. Even though the FRDF representation permits more intuitive representation, Figure 1(d) shows a CSDF graph with the same buffer requirements. The ME node consumes a whole frame at the first execution followed by 98 executions without input sample but produces one macro block at each execution. In case the input frame of the previous node is prepared at once, this modified CSDF and the FRDF solutions are the same in terms of memory requirements. However, there is a difference. The 3

4 FRDF solution can execute the ME node before the entire current frame arrives at the input arc while the CSDF solution can execute the node only after the entire current frame is available. 3. STATIC ANALYSIS Existent dataflow models can be extended to support fractional rates without losing any analytical property. In this section, we extend the SDF model to support fractional rates and discuss its static analyzability. The similar discussion can be made to other dataflow models like CSDF. We start our discussion stating the following definition. Definition 1: Equivalent node For an FRDF node A, an equivalent SDF node (A ) is defined such that Q executions of the FRDF node A is equivalent to a single execution of A, where Q is the minimum number of executions to make all sample rates integer. The sample rates of the equivalent node are Q times those of the FRDF node. Q is called an equivalence factor. We denote this equivalent relation as A ~ A. Q p i qi SDF node exists. And the function body of the SDF node performs Q executions of the FRDF node at each execution. For sample rates { p i q i } of a FRDF node, the equivalent SDF node has sample rates. Since q i divides Q, such an We also define the equivalence relation between an FRDF extension of an SDF graph and a pure SDF graph as follows. Definition 2: Equivalent graph For a given FRDF extension of an SDF graph, the equivalent graph is defined as the SDF graph obtained after all FRDF nodes are replaced with their equivalent SDF nodes as stated in definition 1. The key analytical property of the SDF model is that the node execution schedule can be constructed at compile time if the graph is consistent [5] meaning the repeated execution of the schedule does not overflow buffers. The number of executions of node A within a schedule is called the repetition number x(a) of the node. For a given arc e, we denote src(e) as the source node and snk(e) as the destination node. The output sample rate of src(e) onto the arc is denoted as prod(e) and the input sample rate of snk(e) as cons(e). Then, the following equation, called a balance equation, should be held for a consistent SDF graph. x ( src( e)) prod ( e) = x( snk ( e)) cons( e) for each e. (1) Now, we arrive at the following theorem. 4

5 Theorem 1. An FRDF extension of an SDF graph is consistent if the equivalent SDF graph is consistent. <proof> Suppose that an SDF node is substituted for an equivalent FRDF node with Q equivalence factor and equation (1) holds for the equivalent SDF graph. Then, the sample rate p i of each port becomes p i Q in the equivalent FRDF node. Let the repetition number of the FRDF node be Q times the repetition number of the SDF node. Then equation (1) becomes prod( e) (2) Q { Q x( src( e)) } = x( snk( e)) cons( e) for an output arc e of the FRDF node. Similar argument holds for any input arc. Thus if the equivalent SDF graph is consistent, the FRDF graph is consistent. Q.E.D. The theorem tells how to compute the repetition numbers of FRDF nodes in the extended SDF graph. The repetition number of an FRDF node is the product of the equivalence factor of the node and the repetition number of the equivalent SDF node in the equivalent SDF graph. Consider an FRDF example of Figure 2(a). The equivalent SDF graph is drawn in Figure 2(b) where equivalence factor of nodes B and C are 4 and 2 respectively. The key difference between these two graphs lies in the execution schedule of nodes. The schedule of the SDF graph becomes 4(A)B C meaning that node A is executed 4 times before node B and C are executed in sequence. If we replace B and C with the original FRDF nodes B and C, the schedule becomes as displayed in Figure 2(d). If we allow fractional rate, however, we may have more buffer-efficient schedule. After executing node B twice, half of the data is filled out, which can be consumed at node C. Therefore, we can obtain the following schedule 2(2(AB)C) as shown in Figure 2(c). We save the half of the buffer size on arc BC from this schedule with careful buffer sharing. A 1 1 B 1/4 1/2 C (a) Schedule: (2(2(AB)C) (c) (B ~ 4B) (C ~ 2C) A 1 4 B 1 1 C (b) Schedule: 4A(B =4B)(C =2C) (d) Figure 2. (a) An FRDF graph, (b) the equivalent SDF graph, (c) a schedule result with fractional sample rate, and (d) another schedule result with integer sample rate 5

6 4. H.263 ENCODER EAMPLE In this section, we compare the hand-coded reference code and the synthesized codes from three dataflow models with H.263 encoder example handling composite-type video frame data: SDF, CSDF, and FRDF. We use tmn (ver.2.0) H.263 encoder as the reference code, which has been developed by Telenor Research and Development [12]. For fair comparison, we use the same function body for each block as the associated function definition in the reference code. Since the dataflow models implement only the basic mode defined in the standard, we manually remove the extra code sections for the advanced coding mode from the reference code. As a result, the code size and the execution cycles are not much different from each other. Simplified graph specifications with SDF and CSDF model are shown in Figure 3 and Figure 4 respectively. We already discussed this example in section 2 to motivate the FRDF specification. Remind that the key difference lies in the type conversion between the video frame type and the macro-block type. In the SDF and CSDF representations, the Distributor node performs this type conversion explicitly, thus requires the frame-size buffer space on the input arc. yœˆ m kœ ŠŒ t `` tˆš k š œ i Š lš ˆ l Š Ž `` }ˆ ˆ Œ sœ Ž j Ž t `` j Œ šˆ tˆš i Š kœš Ž ~ Œ { kœ ŠŒ Figure 3. H.263 encoder in SDF yœˆ m kœ ŠŒ t lš ˆ S`_ŸW tˆš k š œ i Š l Š Ž ``Ÿ }ˆ ˆ Œ sœ Ž j Ž `_ŸWS `_ŸWS t j Œ šˆ ``Ÿ tˆš i Š kœš Ž ~ Œ { kœ ŠŒ Figure 4. H.263 encoder in CSDF A fractional rate dataflow specification is shown in Figure 5, where the type conversion occurs at the input port of the Motion Estimation node by the notion of fractional rate 1/99. Therefore, we do without the Distributor node in the FRDF specification, saving the buffer space of the frame size on the output arc of the Motion Estimation node. yœˆ m kœ ŠŒ V`` V`` t lš ˆ tˆš i Š l Š Ž }ˆ ˆ Œ sœ Ž j Ž V`` V`` t j Œ šˆ tˆš i Š kœš Ž ~ Œ { kœ ŠŒ V`` Figure 5. H.263 encoder in SDF with fractional rate 6

7 Table 1 compares the buffer sizes between the reference code and the synthesized codes from three dataflow specifications: SDF(Figure 3), CSDF(Figure 4), and FRDF(Figure 5). The buffer requirements are largely proportional to the number of frame instances since frame data structure is significantly larger than other data types. Note that the reference code is not fully optimized, so it maintains five frame structures throughout the encoding procedure. On the other hand, in the case of FRDF specification, the minimum number of required frame instances is statically analyzed at compile-time. H.263 in extended SDF with fractional rate reduces data memory requirements by 67% and 23% than in SDF and CSDF respectively. Table 1. Buffer size comparison between the reference code and the synthesized codes from SDF, CSDF, FRDF graphs Reference Code SDF CSDF FRDF Buffer size 361KB 686KB 291KB 225KB No. of frame-type data We can reduce the buffer size further if the output sampling rate of the Read From Device node becomes fractional. Assume that the output sampling rate is reduced to 1/2. Then, the buffer size on the associated arc can be reduced to half of the frame size. In other words, we do not need a full frame size memory even though the data unit has the frame size. Thus, fractional rate has advantages of less latency and less buffer memory. 5. ETENDED INTERPRETATION OF FRACTIONAL RATE The notion of fractional rate is understandable when a composite data type is composed of multiple constituent data types. Then, what is the meaning of the factional rate for an atomic type such as integer or float? The notion of fractional rate could be found in the Boolean Dataflow (BDF) model of computation introduced by Buck et al. [13]. The Switch node in a BDF graph has two outputs whose sample rates are dynamically determined by the value of the control data sample (Figure 6). They denote the sample rate of the TRUE arc as p which is a fractional value unknown at compile time. Their interpretation of the fractional rate is the ensemble average of the output sample rates. Our interpretation is similar to theirs for atomic data types. Unlike their approach, however, the fraction value is known and fixed at compile-time so that the fully static analysis of the graph is still possible. O{y l ˆ ŠP O ˆ ˆP zž Š T Omhszl ˆ ŠP OŠ P Figure 6. Switch node in the Boolean Dataflow model. The value p indicates the fractional sample rate. Such statistical interpretation opens another possibility of buffer memory minimization. Figure 7(a) shows an SDF-style multiplexer node where it consumes two inputs at once and routes one sample to each output at each execution. Since the input port needs two samples per execution, the buffer memory of the input arc should be no less than two. An equivalent 7

8 node of the CSDF model saves one buffer space by varying the output sample rate periodically as displayed in Figure 7(b): the upper output port produces one sample at the first invocation and no sample at the next while the lower output port produces no sample at the first and one sample at the next invocation. Figure 7(c) shows an FRDF representation of the multiplexer node, where the average sample rate of each output port becomes 0.5. Since a sample is guaranteed to be produced after two firings of the node, this node behaves like the SDF-style multiplexer. However, we need only one buffer at the input arc of the multiplexer. Y t YŸ t SW t VY WS VY OˆP O P OŠP Figure 7. A multiplexer node: (a) an SDF representation, (b) a CSDF representation, and (c) an FRDF representation A subtle difference can be found between the FRDF-style multiplexer and the CSDF-style multiplexer. In the FRDF-style multiplexer, the order of sample production is not specified. In other words, the output sample rate of the upper arc can be 0, 1, 1, 0, 0, 1, 0, 1, as long as 2k executions of the node generate k output samples for all integer k. Hence the output sample rates may be aperiodic in the FRDF. The same is true for the SDF-style multiplexer. This implies how to determine the fractional sample rate of each port in a block. The body of an FRDF block will have a periodic behavior in terms of how many samples will be consumed or produced. Unlike the CSDF, however, the pattern of the sample rate variation for each port need not be fixed nor periodic. Consider a down-sampling FRDF block whose input sample rate is 1 and the output sample rate is 1/3. The fractional rate only indicates that the block generates one sample per every 3 executions. For example, the output sample rate may become {1, 2x0}, {0,1,0}, or {2x0,1} in the CSDF context. The inclusion relationship between SDF, CSDF, and FRDF extension of SDF is shown in figure 8. jzkm zkm mykm ŒŸ Œ š Figure 8. Inclusion relationship between dataflow models It should also be noted that sample rates 1/2 and 2/4 are different. The former generates one sample per every 2 executions while the latter may generate no sample at the first two executions if it generates two samples during the last two executions. At the first look, it is counter-intuitive. But it is not actually because the sample rate is determined by the periodic behavior of the function. 8

9 For a fractional sample rate p i /q i for port i, q i indicates the period of the repeated node behavior and p i indicates the number of samples consumed or produced during period q i. We define q i as the repetition period of the port. In case an FRDF node has multiple ports with different repetition periods, the period of its repeated behavior is defined as the I/O period of the node. Definition 3: I/O period For an FRDF node, the I/O period is the period during which the node consumes or produces the same number of samples over iterations. It is the least common multiple of the repetition periods of all ports: I/O period = LCM (repetition periods of all ports). The I/O period of the node is in fact equivalent to the equivalence factor Q defined in Definition 1 if all fractions are irreducible. Consider a node with a fraction rate 2/4. The I/O period of the node becomes 4 while the equivalence factor can be 2. Therefore, the equivalence factor of Definition 1 should be replaced with the I/O period in the extended interpretation of fractional rate for atomic type. When we construct a schedule, we have to examine the firing condition of each node depending on the data types it handles: composite and atomic. Revisit the example of Figure 2(a). Now, we assume that the data type on arc BC is atomic. Then, the fraction rate 1/4 only means that one output sample will be produced after 4 executions of node B. So, node B should be scheduled 4 times before scheduling node C. Schedule 2(d) satisfies this condition so that it is a valid schedule. However, we have a better schedule 4(AB)2(C). The buffer size between nodes A and B is reduced to 1 from 4. Therefore, using the fractional rate is still beneficial for the atomic type data samples. We summarize the firing condition of FRDF node using the extended interpretation with the following lemma. Lemma 1. Firing condition of an FRDF node An FRDF node is fireable, or executable, if all input arcs satisfy the following condition depending on the data type: (condition 1) If the data type is atomic, there must be at least as many samples stored as the numerator value of the fractional sample rate. An integer sample rate is a special fractional sample rate with hidden denominator 1. (condition 2) If the data type is composite, there must be at least as large fraction of samples stored as the fractional input sample rate. In summary, fractional rate has different meanings for atomic data types and composite data types. For atomic data types, a fractional rate means the ensemble average during the I/O period of the associated node. For composite data types, it means the portion of the constituent data samples that compose the composite data type. The most popular data type of this sort is an array-type data: vector or matrix. If the data type does not have well-defined meaning of fractional rate, we regard the type as atomic and use the statistical interpretation of fractional rate. 9

10 Such interpretation should be consistent between the consumer and the producer of the data samples. Suppose that there is a two-dimensional array and the producer regards it as an array of row vectors while the consumer regards it as an array of column vectors. In this case, the two-dimensional array may not be regarded as a composite type data. Figure 9 illustrates this fact pictorially. œšœ VZ VY Š šœ Œ Figure 9. The consumer and the producer should have the same interpretation on the fraction of the composite data type. If they do not agree, the data type should be considered as atomic. 6. EPERIMENTS WITH ATOMIC DATA SAMPLES In this section, we compare three dataflow models with a compact disk to digital audiotape converter (CD2DAT) algorithm and a CELP algorithm handling atomic data samples. Figure 10 illustrates a CD2DAT specified in an SDF that converts 44.1kHz sampling data to 48KHz data. In SDF, FIR3 produces 5 samples at once when 7 samples are available on the input port. A looped schedule with minimum buffer requirements is 7(7(3(FIR1)2(FIR2))8(FIR3))40(FIR4). Then the required buffer size on each arc becomes 6, 56 and 280 summing up to 342. Y Z [ ^ \ ^ [ mpy mpyy mpyz mpy[ Figure 10. A CD2DAT algorithm in SDF Figure 11 illustrates the CD2DAT in SDF with fractional rate. In fractional rate dataflow, FIR2 consumes a sample every execution and produces four samples at once every three executions. Similarly, FIR3 generates 5 samples every 7 executions while FIR4 produces 4 samples. The transformation from FIR3 to 7(FIR3 ) as well as FIR4 to 7(FIR4 ) enables a better schedule to be 7(7(3(FIR1) 2(3(FIR2 )4(FIR3 ))) 40(FIR4 )), which requires 6, 4 and 40 buffer memory summing up to 50. The reduction rate is 85% compared with the SDF case. Y [VZ \V^ [V^ mpy mpyy mpyz mpy[ Figure 11. CD2DAT in SDF with fractional rate Figure 12 represents the CD2DAT implementation in CSDF. Even though the computation of the accurate sample rate is complicated, as many buffers are required as specified in FRDF if a loop schedule is applied to the application. 10

11 SW YŸ ZŸSW [Ÿ ^Ÿ WSSSWSSS mpy mpyy mpyz WSSWSSWSS mpy[ ^Ÿ Figure 12. CD2DAT in CSDF Figure 13 illustrates an SDF sub-system for evaluating codebooks in a code excited linear prediction (CELP) algorithm[6]. In SDF, LPC Anal is fired only after 80 samples are available and 80 samples are produced at once. A looped schedule with minimum buffer requirements is 80(fork)(LPC Coeff)(LPC Anal)2((Code Search)10(Channel)(Code Lookup))(LPC Synth). Then, the required buffer sizes on arcs {a,b,c,d,e} become {80, 80, 10, 10, 80} summing up to 260. _W swj j Œ ˆ ŒŠŒ Œ š Œ šœ Œ š Œ _W swj _W [W j Œ W W j Œ [W _W j ˆ Œ h ˆ zœˆ Š s œ Š Œ swj z U _W Figure 13. A subgraph of a CELP algorithm in SDF The CSDF specification of the CELP algorithm gives a more efficient implementation in term of buffer requirements as shown in figure 14. LPC Anal is fired for each data sample and produces one output sample at each execution. When 40 samples are produced, the next Code Search block can be executed. Therefore CSDF has benefits of short latency and less buffer memory over SDF. A possible schedule is 80(fork)(LPC Coeff)2(40(LPC Anal)(Code Search)10(Channel)(Code Lookup)(LPC Synth)). The required buffer sizes on arcs become {80, 40, 10, 10, 40} summing up to 180. Note that the buffer size on the output arc of the LPC Anal block is reduced to 40. ŒŠŒ Œ š Œ _W swj j Œ ˆ S^`ŸW šœ Œ š Œ S^`ŸW _WŸ swj _WŸ [W j Œ W W j Œ [W _WŸ j ˆ Œ h ˆ zœˆ Š s œ Š Œ swj z U _WŸ Figure 14. CELP specification of figure 13 in CSDF Figure 15 represents an FRDF extension of the SDF graph. It has the same advantage over the pure SDF representation as the CSDF specification has in terms of buffer requirements. However, we can use the simpler analysis technique of SDF model to obtain the memory-optimized looped schedule than CSDF. 11

12 _W swj j Œ ˆ V_W ŒŠŒ Œ š Œ šœ Œ š Œ V_W swj [W j Œ W W j Œ [W j ˆ Œ h ˆ zœˆ Š s œ Š Œ swj z U Figure 15. CELP algorithm in SDF with fractional rate 7. CODE GENERATION To use fractional rate, the behavior of a block has to be changed accordingly. In this section, we show that fractional rate extension requires at most a negligible amount of code overhead to maintain the invocation index in the I/O period of the block. First we discuss block definitions with the composite data structure. Figure 16 (a) and (b) represent an SDF-style motion estimation block and its internal code definition. In the code definition, variable i is used as the index number of the output macroblock to compute the x and y coordinates. Figure 16(c) shows a part of the generated code from figure 3. Note that the buffer sizes of input and output ports are both frame-size since the output port consists of 99 macroblocks of which the total buffer size is equal to the frame size ME input output (a) for(i=0; i<99; i++) { x = (16* i)%144; y = 16*((16* i)/144); motionestimation(&input,x,y, &output[i]); } (b) main() { struct { unsigned char s[176*144]; } input; struct { char s[16*16]; } output[99]; { //ReadFromDevice } { // MotionEstimation for(i=0; i<99; i++) { x = (16* i)%144; y = 16*((16* i)/144); motionestimation(&input,x,y, &output[i]); }} } (c) Figure 16. (a) SDF-style motion estimation node, (b) block definition, and (c) the generated code from figure 3 The behavior of the equivalent FRDF block is depicted in figure 17(b). Two modifications from the SDF definition are distinguished. First, the FRDF definition describes the loop body without loop structure. The loop structure will be automatically generated from the execution schedule as shown in figure 17(c). Second, the loop index of the SDF definition is replaced with a phase variable. In FRDF model, a phase variable is used to represent how many fractional samples are produced or consumed. The phase variable is incremented at every execution and reset at every I/O period. For instance, in the ME node the 1/99 sample rate of input port leads to reset the phase variable to 0 every 99 executions. If the sample rate is not a fractional number we do not use phase variables since the phase variables will be reset every execution. 12

13 Figure 17(c) illustrates the generated code that is similar to figure 16(c) except the overhead of phase variable management. To remove the overhead further, we can replace the variable ME_phase_input with the loop index (figure 17(d)). The buffer size of output port reduces to macroblock size in FRDF because the subsequent block to process each macroblock is executed inside the same loop. 1/99 1 ME input output (a) x = (16*$phase(input))%144; y = 16*((16*$phase(input))/144); motionestimation(&input,x,y, &output); (b) main() { struct { unsigned char s[176*144]; } input; struct { char s[16*16]; } output; { // ReadFromDevice } for(int loop=0; loop<99; loop++) { { // MotionEstimation x = (16*ME_phase_input)%144; y = 16*((16*ME_phase_input)/144); motionestimation(&input,x,y, &output); ME_phase_input = (ME_phase_input+1)%99; } } } (c) main() { struct { unsigned char s[176*144]; } input; struct { char s[16*16]; } output; { // ReadFromDevice } for(int loop=0; loop<99; loop++) { { // MotionEstimation x = (16*loop)%144; y = 16*((16*loop)/144); motionestimation(&input,x,y, &output); } } } (d) Figure 17. FRDF-style motion estimation: (a) node specification, (b) code definition, (c) the generated code without and (d) with phase variable sharing The implicit specification of the phase variables in FRDF model requires additional code overhead in the block definition especially for primitive data. Consider the multimplexer example that is already mentioned in Figure 7. Figure 18 (b) and (d) represent the block definition of the SDF-style multiplexer node and FRDF-style respectively. The SDF-style code is simpler than the FRDF-style code since the buffer size of the input port is two. The input buffer stores all input samples and distributes them to output ports at once, which provides simple code definition. On the other hand, in the code definition of the FRDF-style multiplexer node, we need a phase variable to specify the invocation index within an I/O period. In this example, an input sample is mapped to the output1 port at the first execution and another sample is to the output2 port at second execution. The output ports in FRDF-style multiplexer node can produce a sample every other execution. input 2 Mux (a) 1 1 output1 output2 output1 = input[0]; output2 = input[1]; (b) 1 input Mux (c) 1/2 output1 1/2 output2 if($phase(output1)==0) output1 = input; else output2 = input; (d) Figure 18. (a) SDF multiplexer node, (b) block definition of the SDF node, (c) FRDF multiplexer node, and (d) the block definition of the FRDF node Figure 19(a) shows an example using the FRDF-style multiplexer node and Figure 19(b) represents the generated code. In this code, we use only one buffer, A_output, for input port of the multiplexer node while we need two buffers if we use the SDFstyle multiplexer node. 13

14 A 1 1/2 1 1 Mux 1/2 1 2(A,Mux) (B) B main(){ int A_output, Mux_output1, Mux_output2; while(1) { for(i=0;i<2;i++) { {A} {if(i==0) Mux_output1 = A_output; else Mux_output2 = A_output; }} {B}}} (b) (a) Figure 19. Code generation from FRDF: (a) an FRDF graph, and (b) its generated code 8. CONCLUSION In this paper, we propose a new dataflow extension called fractional rate dataflow in which fractional number of samples can be produced or consumed. The notion of fractional sampling rate matches well with multimedia applications where type conversion between composite data types and constituent data types are needed in the dataflow representation. Existent integer rate dataflow models can be easily extended to incorporate the fractional rates without loosing analytical properties. In this paper, the SDF model is extended to include FRDF, which can reduce the buffer memory requirements significantly (up to 70%) for multimedia applications. Even for the atomic type data, use of fractional sampling rate may reduce the buffer requirement by large amount, which is demonstrated with a CD2DAT rate conversion and a CELP example. The SDF dataflow model extended with fractional rates has been implemented in our high-level system synthesis environment called PeaCE(Ptolemy extension as Codesign Environment) [14]. The buffer size of an H.263 encoder specified in our environment is reduced down to what the hand-optimized code would require. This result will bring the proposed high level code synthesis methodology from dataflow specification closer to the practical use, we believe. While the FRDF representation could reduce the buffer requirements, finding out the buffer-optimal schedule is another research subject we are currently pursuing. 9. REFERENCES [1] COSSAP: A stream driven simulator, in Proc. IEEE Int. Workshop Microelectronics Commun., Interlaken, Switzerland, Mar [2] R. Lauwereins, M. Engels, J. A. Peperstraete, E. Steegmans, and J. Van Ginderdeuren, GRAPE: A CASE Tool for Digital Signal Parallel Processing, IEEE ASSP Magazine, vol. 7, (no.2):32-43, April, 1990 [3] J. T. Buck, S. Ha, E. A. Lee, and D. G. Messerschimitt, Ptolemy: A Framework for Simulating and Prototyping Heterogeneous Systems, Int. Journal of Computer Simulation, special issue on Simulation Software Development, vol. 4, pp , April, [4] J. A. Sharp, Data Flow Computing. West Sussex, U.K.: Ellis Horwood, [5] E. A. Lee and D. G. Messerschmitt, Static scheduling of synchronous dataflow programs for digital signal processing, IEEE Trans. Comput., vol. C-36, no. 1, pp , Jan

15 [6] G. Bilsen, M. Engles, R. Lauwereins, and J. A. Peperstraete, Cyclo-Static Dataflow, IEEE Trans. Signal Processing, vol. 44, no. 2, Feb [7] S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations, DAES, vol. 2, no. 1, pp , January, [8] W. Sung and S. Ha, "Memory Efficient Software Synthesis using Mixed Coding Style from Dataflow Graph", IEEE Transactions on VLSI Systems, Vo.8, pp , Oct, [9] E. De Greef, F. Cattohoor, and H. DE Man, Array placement for storage size reduction in embedded multimedia systems, in Proc. Int. Conf. Applications Specific Systems, Architectures, Processors, Zurich, Switzerland, July 1997, pp [10] S. Ritz, M. Willems, and H. Meyr, Scheduling for optimum data memory compaction in block diagram oriented software synthesis, in Proc. ICASSP, Detroit, MI, May 1995, pp [11] P. K. Murthy, and S. S. Bhattacharyya, Shared Buffer Implementations of Signal Processing Systems Using Lifetime Analysis Techniques, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 2, Feb., [12] Telenor Research, TMN (H.263) Encoder/Decoder, Version 2.0, Download via anonymous ftp to bonde.nta.no, June [13] J. T. Buck, Scheduling dynamic dataflow graphs with bounded memory using the token flow model, Ph. D. Dissertation, Dept. Elect. Eng. Comput. Sci., Univ. of California, Berkeley, Sept [14] PeaCE : Codesign Environment. 15

Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications

Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications Hyunok Oh The School of Electrical Engineering and Computer Science Seoul National University Seoul 5-742, KOREA TEL :

More information

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS Sukumar Reddy Anapalli Krishna Chaithanya Chakilam Timothy W. O Neil Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science The

More information

Static Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams

Static Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams Presented at 28th Asilomar Conference on Signals, Systems, and Computers November, 994 Static Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams Joseph T.

More information

Renesting Single Appearance Schedules to Minimize Buffer Memory ABSTRACT

Renesting Single Appearance Schedules to Minimize Buffer Memory ABSTRACT UCB/ERL Technical Report, Memorandum No. UCB/ERL M95/43, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, CA 94720 Renesting Single Appearance Schedules to Minimize

More information

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications

Software Synthesis Trade-offs in Dataflow Representations of DSP Applications in Dataflow Representations of DSP Applications Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park

More information

Software Synthesis from Dataflow Models for G and LabVIEW

Software Synthesis from Dataflow Models for G and LabVIEW Software Synthesis from Dataflow Models for G and LabVIEW Hugo A. Andrade Scott Kovner Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 andrade@mail.utexas.edu

More information

Code Generation for TMS320C6x in Ptolemy

Code Generation for TMS320C6x in Ptolemy Code Generation for TMS320C6x in Ptolemy Sresth Kumar, Vikram Sardesai and Hamid Rahim Sheikh EE382C-9 Embedded Software Systems Spring 2000 Abstract Most Electronic Design Automation (EDA) tool vendors

More information

MODELING OF BLOCK-BASED DSP SYSTEMS

MODELING OF BLOCK-BASED DSP SYSTEMS MODELING OF BLOCK-BASED DSP SYSTEMS Dong-Ik Ko and Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College

More information

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University Dataflow Languages Languages for Embedded Systems Prof. Stephen A. Edwards Columbia University March 2009 Philosophy of Dataflow Languages Drastically different way of looking at computation Von Neumann

More information

Parameterized Modeling and Scheduling for Dataflow Graphs 1

Parameterized Modeling and Scheduling for Dataflow Graphs 1 Technical Report #UMIACS-TR-99-73, Institute for Advanced Computer Studies, University of Maryland at College Park, December 2, 999 Parameterized Modeling and Scheduling for Dataflow Graphs Bishnupriya

More information

Optimal Architectures for Massively Parallel Implementation of Hard. Real-time Beamformers

Optimal Architectures for Massively Parallel Implementation of Hard. Real-time Beamformers Optimal Architectures for Massively Parallel Implementation of Hard Real-time Beamformers Final Report Thomas Holme and Karen P. Watkins 8 May 1998 EE 382C Embedded Software Systems Prof. Brian Evans 1

More information

Parameterized Dataflow Modeling for DSP Systems

Parameterized Dataflow Modeling for DSP Systems 2408 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 10, OCTOBER 2001 Parameterized Dataflow Modeling for DSP Systems Bishnupriya Bhattacharya and Shuvra S. Bhattacharyya, Senior Member, IEEE Abstract

More information

Synthesis of Embedded Software from Synchronous Dataflow Specifications

Synthesis of Embedded Software from Synchronous Dataflow Specifications Journal of VLSI Signal Processing 21, 151 166 (1999) c 1999 Kluwer Academic Publishers. Manufactured in The Netherlands. Synthesis of Embedded Software from Synchronous Dataflow Specifications SHUVRA S.

More information

Efficient Simulation of Critical Synchronous Dataflow Graphs

Efficient Simulation of Critical Synchronous Dataflow Graphs 52.1 Efficient Simulation of Critical Synchronous Dataflow Graphs Chia-Jui Hsu 1, Suren Ramasubbu 2, Ming-Yung Ko 1, José Luis Pino 2, Shuvra S. Bhattacharyya 1 1 Department of Electrical and Computer

More information

Node Prefetch Prediction in Dataflow Graphs

Node Prefetch Prediction in Dataflow Graphs Node Prefetch Prediction in Dataflow Graphs Newton G. Petersen Martin R. Wojcik The Department of Electrical and Computer Engineering The University of Texas at Austin newton.petersen@ni.com mrw325@yahoo.com

More information

Invited submission to the Journal of VLSI Signal Processing SYNTHESIS OF EMBEDDED SOFTWARE FROM SYNCHRONOUS DATAFLOW SPECIFICATIONS

Invited submission to the Journal of VLSI Signal Processing SYNTHESIS OF EMBEDDED SOFTWARE FROM SYNCHRONOUS DATAFLOW SPECIFICATIONS Invited submission to the Journal of VLSI Signal Processing SYNTHESIS OF EMBEDDED SOFTWARE FROM SYNCHRONOUS DATAFLOW SPECIFICATIONS Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee February

More information

Hierarchical FSMs with Multiple CMs

Hierarchical FSMs with Multiple CMs Hierarchical FSMs with Multiple CMs Manaloor Govindarajan Balasubramanian Manikantan Bharathwaj Muthuswamy (aka Bharath) Reference: Hierarchical FSMs with Multiple Concurrency Models. Alain Girault, Bilung

More information

Memory Optimal Single Appearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs

Memory Optimal Single Appearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs Memory Optimal Single ppearance Schedule with Dynamic Loop ount for Synchronous Dataflow Graphs Hyunok Oh, Nikil Dutt enter for Embedded omputer Systems University of alifornia, Irvine, {hoh,dutt@ics.uci.edu

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

In Journal of VLSI Signal Processing Systems, Vol 21, No. 2, pages , June Kluwer Academic Publishers.

In Journal of VLSI Signal Processing Systems, Vol 21, No. 2, pages , June Kluwer Academic Publishers. In Journal of VLSI Signal Processing Systems, Vol 21, No. 2, pages 151-166, June 1999. 1999 Kluwer Academic Publishers. SYNTHESIS OF EMBEDDED SOFTWARE FROM SYNCHRONOUS DATAFLOW SPECIFICATIONS Shuvra S.

More information

Memory-constrained Block Processing Optimization for Synthesis of DSP Software

Memory-constrained Block Processing Optimization for Synthesis of DSP Software Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, July 2006. Memory-constrained Block Processing Optimization for Synthesis

More information

Fixed Point Representation And Fractional Math. By Erick L. Oberstar Oberstar Consulting

Fixed Point Representation And Fractional Math. By Erick L. Oberstar Oberstar Consulting Fixed Point Representation And Fractional Math By Erick L. Oberstar 2004-2005 Oberstar Consulting Table of Contents Table of Contents... 1 Summary... 2 1. Fixed-Point Representation... 2 1.1. Fixed Point

More information

Cycle-Breaking Techniques for Scheduling Synchronous Dataflow Graphs

Cycle-Breaking Techniques for Scheduling Synchronous Dataflow Graphs Technical Report #UMIACS-TR-2007-12, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, February 2007 Cycle-Breaking Techniques for Scheduling Synchronous Dataflow

More information

The Ptolemy Kernel Supporting Heterogeneous Design

The Ptolemy Kernel Supporting Heterogeneous Design February 16, 1995 The Ptolemy Kernel Supporting Heterogeneous Design U N T H E I V E R S I T Y A O F LET THERE BE 1868 LIG HT C A L I A I F O R N by The Ptolemy Team 1 Proposed article for the RASSP Digest

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

Rapid Prototyping of flexible Embedded Systems on multi-dsp Architectures

Rapid Prototyping of flexible Embedded Systems on multi-dsp Architectures Rapid Prototyping of flexible Embedded Systems on multi-dsp Architectures Bernhard Rinner, Martin Schmid and Reinhold Weiss Institut für Technische Informatik Technische Universität Graz Inffeldgasse 16/1,

More information

LOOPED SCHEDULES FOR DATAFLOW DESCRIPTIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS 1 ABSTRACT

LOOPED SCHEDULES FOR DATAFLOW DESCRIPTIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS 1 ABSTRACT LOOPED SCHEDULES FOR DATAFLOW DESCRIPTIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS Shuvra S. Bhattacharyya Edward A. Lee Department of Electrical Engineering and Computer Sciences University of California

More information

LabVIEW Based Embedded Design [First Report]

LabVIEW Based Embedded Design [First Report] LabVIEW Based Embedded Design [First Report] Sadia Malik Ram Rajagopal Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 malik@ece.utexas.edu ram.rajagopal@ni.com

More information

EE213A - EE298-2 Lecture 8

EE213A - EE298-2 Lecture 8 EE3A - EE98- Lecture 8 Synchronous ata Flow Ingrid Verbauwhede epartment of Electrical Engineering University of California Los Angeles ingrid@ee.ucla.edu EE3A, Spring 000, Ingrid Verbauwhede, UCLA - Lecture

More information

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain Lecture 4: Synchronous ata Flow Graphs - I. Verbauwhede, 05-06 K.U.Leuven HJ94 goal: Skiing down a mountain SPW, Matlab, C pipelining, unrolling Specification Algorithm Transformations loop merging, compaction

More information

Efficient Simulation of Critical Synchronous Dataflow Graphs

Efficient Simulation of Critical Synchronous Dataflow Graphs Efficient Simulation of Critical Synchronous Dataflow Graphs CHIA-JUI HSU, MING-YUNG KO, and SHUVRA S. BHATTACHARYYA University of Maryland and SUREN RAMASUBBU and JOSÉ LUIS PINO Agilent Technologies System-level

More information

Heterogeneous Design in Functional DIF

Heterogeneous Design in Functional DIF Heterogeneous Design in Functional DIF William Plishker, Nimish Sane, Mary Kiemb, and Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies,

More information

Interface Design of VHDL Simulation for Hardware-Software Cosimulation

Interface Design of VHDL Simulation for Hardware-Software Cosimulation Interface Design of Simulation for Hardware-Software Cosimulation Wonyong Sung, Moonwook Oh, Soonhoi Ha Seoul National University Codesign and Parallel Processing Laboratory Seoul, Korea TEL : 2-880-7292,

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

Fusing Dataflow with Finite State Machines

Fusing Dataflow with Finite State Machines May 3, 1996 U N T H E I V E R S I T Y A O F LE T TH E R E B E 1 8 6 8 LIG H T C A L I A I F O R N Fusing Dataflow with Finite State Machines Department of Electrical Engineering and Computer Science Bilung

More information

Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs

Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Anan Bouakaz Pascal Fradet Alain Girault Real-Time and Embedded Technology and Applications Symposium, Vienna April 14th, 2016

More information

Hardware/Software Co-synthesis of DSP Systems

Hardware/Software Co-synthesis of DSP Systems Hardware/Software Co-synthesis of DSP Systems Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park This chapter focuses on

More information

Fundamental Algorithms for System Modeling, Analysis, and Optimization

Fundamental Algorithms for System Modeling, Analysis, and Optimization Fundamental Algorithms for System Modeling, Analysis, and Optimization Stavros Tripakis, Edward A. Lee UC Berkeley EECS 144/244 Fall 2014 Copyright 2014, E. A. Lee, J. Roydhowdhury, S. A. Seshia, S. Tripakis

More information

Ptolemy Seamlessly Supports Heterogeneous Design 5 of 5

Ptolemy Seamlessly Supports Heterogeneous Design 5 of 5 In summary, the key idea in the Ptolemy project is to mix models of computation, rather than trying to develop one, all-encompassing model. The rationale is that specialized models of computation are (1)

More information

Extensions of Daedalus Todor Stefanov

Extensions of Daedalus Todor Stefanov Extensions of Daedalus Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Overview of Extensions in Daedalus DSE limited to

More information

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization Dataflow Lecture: SDF, Kahn Process Networks Stavros Tripakis University of California, Berkeley Stavros Tripakis: EECS

More information

GENERATING COMPACT CODE FROM DATAFLOW SPECIFICATIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS ABSTRACT

GENERATING COMPACT CODE FROM DATAFLOW SPECIFICATIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS ABSTRACT in IEEE Tr. on Circuits and Systems I: Fundamental Theory and Applications, Vol. 4, No. 3, March 995 995 ΙΕΕΕ GENERATING COMPACT CODE FROM DATAFLOW SPECIFICATIONS OF MULTIRATE SIGNAL PROCESSING ALGORITHMS

More information

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING 1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation

More information

Software Synthesis and Code Generation for Signal Processing Systems

Software Synthesis and Code Generation for Signal Processing Systems Software Synthesis and Code Generation for Signal Processing Systems Shuvra S. Bhattacharyya University of Maryland Department of Electrical and Computer Engineering College Park, MD 20742, USA Rainer

More information

Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers

Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers A machine model for dataflow actors and its applications Janneck, Jörn Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers DOI: 10.1109/ACSSC.2011.6190107

More information

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders 770 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 48, NO. 8, AUGUST 2001 FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders Hyeong-Ju

More information

Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs

Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs Stavros Tripakis Dai Bui Marc Geilen Bert Rodiers Edward A. Lee Electrical Engineering and Computer Sciences

More information

Dynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks

Dynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks Dynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks ENSC 833 Project Final Report Zhenhua Xiao (Max) zxiao@sfu.ca April 22, 2001 Department of Engineering Science, Simon Fraser

More information

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

[6] E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing, IEEE Trans.

[6] E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing, IEEE Trans. [6] E A Lee and G Messerschmitt, Static Scheduling of Synchronous ataflow Programs for igital Signal Processing, IEEE Trans on Computers, February, 1987 [7] E A Lee, and S Ha, Scheduling Strategies for

More information

Embedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines

Embedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines Embedded Systems 8 - - Dataflow modeling Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines processes (activities that transform data from one

More information

Software Synthesis and Code Generation for Signal Processing Systems

Software Synthesis and Code Generation for Signal Processing Systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 47, NO. 9, SEPTEMBER 2000 849 Software Synthesis and Code Generation for Signal Processing Systems Shuvra S. Bhattacharyya,

More information

Memory Optimal Single Appearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs

Memory Optimal Single Appearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs Memory Optimal Single ppearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs Hyunok Oh, RM Inc., Nikil Dutt, UC Irvine, Soonhoi Ha, Seoul National Univ. 1 Outline Code synthesis from

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Twiddle Factor Transformation for Pipelined FFT Processing

Twiddle Factor Transformation for Pipelined FFT Processing Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,

More information

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN BANDWIDTH REDUCTION SCHEMES FOR MPEG- TO H. TRANSCODER DESIGN Xianghui Wei, Wenqi You, Guifen Tian, Yan Zhuang, Takeshi Ikenaga, Satoshi Goto Graduate School of Information, Production and Systems, Waseda

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Shared Memory Implementations of Synchronous Dataflow Specifications

Shared Memory Implementations of Synchronous Dataflow Specifications Shared Memory Implementations of Synchronous Dataflow Specifications Praveen K. Murthy ngeles Design Systems, San Jose Ca, US Shuvra S. Bhattacharyya University of Maryland, College Park, MD, US bstract

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

Buffer Allocation in Regular Dataflow Networks: An Approach Based on Coloring Circular- Arc Graphs

Buffer Allocation in Regular Dataflow Networks: An Approach Based on Coloring Circular- Arc Graphs Buffer Allocation in Regular Dataflow Networks: An Approach Based on Coloring Circular- Arc Graphs R. Govindarajan S. Rengai-ajan Supercomputer Education and Research Center BFL Softwan: Limited Dept.

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3

Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,

More information

Reliable Embedded Multimedia Systems?

Reliable Embedded Multimedia Systems? 2 Overview Reliable Embedded Multimedia Systems? Twan Basten Joint work with Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, and others Embedded Multi-media Analysis of

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

HYBRID PETRI NET MODEL BASED DECISION SUPPORT SYSTEM. Janetta Culita, Simona Caramihai, Calin Munteanu

HYBRID PETRI NET MODEL BASED DECISION SUPPORT SYSTEM. Janetta Culita, Simona Caramihai, Calin Munteanu HYBRID PETRI NET MODEL BASED DECISION SUPPORT SYSTEM Janetta Culita, Simona Caramihai, Calin Munteanu Politehnica University of Bucharest Dept. of Automatic Control and Computer Science E-mail: jculita@yahoo.com,

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/32963 holds various files of this Leiden University dissertation Author: Zhai, Jiali Teddy Title: Adaptive streaming applications : analysis and implementation

More information

Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs

Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs Meta-Data-Enabled Reuse of Dataflow Intellectual Property for FPGAs Adam Arnesen NSF Center for High-Performance Reconfigurable Computing (CHREC) Dept. of Electrical and Computer Engineering Brigham Young

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

On Algebraic Expressions of Generalized Fibonacci Graphs

On Algebraic Expressions of Generalized Fibonacci Graphs On Algebraic Expressions of Generalized Fibonacci Graphs MARK KORENBLIT and VADIM E LEVIT Department of Computer Science Holon Academic Institute of Technology 5 Golomb Str, PO Box 305, Holon 580 ISRAEL

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Functional Test Generation for Delay Faults in Combinational Circuits

Functional Test Generation for Delay Faults in Combinational Circuits Functional Test Generation for Delay Faults in Combinational Circuits Irith Pomeranz and Sudhakar M. Reddy + Electrical and Computer Engineering Department University of Iowa Iowa City, IA 52242 Abstract

More information

A New Configuration of Adaptive Arithmetic Model for Video Coding with 3D SPIHT

A New Configuration of Adaptive Arithmetic Model for Video Coding with 3D SPIHT A New Configuration of Adaptive Arithmetic Model for Video Coding with 3D SPIHT Wai Chong Chia, Li-Minn Ang, and Kah Phooi Seng Abstract The 3D Set Partitioning In Hierarchical Trees (SPIHT) is a video

More information

A New Optimal State Assignment Technique for Partial Scan Designs

A New Optimal State Assignment Technique for Partial Scan Designs A New Optimal State Assignment Technique for Partial Scan Designs Sungju Park, Saeyang Yang and Sangwook Cho The state assignment for a finite state machine greatly affects the delay, area, and testabilities

More information

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press, ISSN

Transactions on Information and Communications Technologies vol 3, 1993 WIT Press,   ISSN Toward an automatic mapping of DSP algorithms onto parallel processors M. Razaz, K.A. Marlow University of East Anglia, School of Information Systems, Norwich, UK ABSTRACT With ever increasing computational

More information

II. MOTIVATION AND IMPLEMENTATION

II. MOTIVATION AND IMPLEMENTATION An Efficient Design of Modified Booth Recoder for Fused Add-Multiply operator Dhanalakshmi.G Applied Electronics PSN College of Engineering and Technology Tirunelveli dhanamgovind20@gmail.com Prof.V.Gopi

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Published in: Proceedings of the 2010 International Conference on Field-programmable

More information

A Process Model suitable for defining and programming MpSoCs

A Process Model suitable for defining and programming MpSoCs A Process Model suitable for defining and programming MpSoCs MpSoC-Workshop at Rheinfels, 29-30.6.2010 F. Mayer-Lindenberg, TU Hamburg-Harburg 1. Motivation 2. The Process Model 3. Mapping to MpSoC 4.

More information

VLSI System Design Part II : Logic Synthesis (1) Oct Feb.2007

VLSI System Design Part II : Logic Synthesis (1) Oct Feb.2007 VLSI System Design Part II : Logic Synthesis (1) Oct.2006 - Feb.2007 Lecturer : Tsuyoshi Isshiki Dept. Communications and Integrated Systems, Tokyo Institute of Technology isshiki@vlsi.ss.titech.ac.jp

More information

Modeling Stream-Based Applications using the SBF model of computation

Modeling Stream-Based Applications using the SBF model of computation Modeling Stream-Based Applications using the SBF model of computation Bart Kienhuis and Ed F. Deprettere Leiden University, LIACS, Niels Bohrweg 1, 2333 CA, Leiden The Netherlands kienhuis,edd @liacs.nl

More information

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy Patrick Brown EE382C Embedded Software Systems May 10, 2000 $EVWUDFW MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio.

More information

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information

Mapping Vector Codes to a Stream Processor (Imagine)

Mapping Vector Codes to a Stream Processor (Imagine) Mapping Vector Codes to a Stream Processor (Imagine) Mehdi Baradaran Tahoori and Paul Wang Lee {mtahoori,paulwlee}@stanford.edu Abstract: We examined some basic problems in mapping vector codes to stream

More information

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework

System Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework System Modeling and Implementation of MPEG-4 Encoder under Fine-Granular-Scalability Framework Final Report Embedded Software Systems Prof. B. L. Evans by Wei Li and Zhenxun Xiao May 8, 2002 Abstract Stream

More information

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS Noureddine Chabini 1 and Wayne Wolf 2 1 Department of Electrical and Computer Engineering, Royal Military College

More information

Functional modeling style for efficient SW code generation of video codec applications

Functional modeling style for efficient SW code generation of video codec applications Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,

More information

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose QUANTIZER DESIGN FOR EXPLOITING COMMON INFORMATION IN LAYERED CODING Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose Department of Electrical and Computer Engineering University of California,

More information

An Overview of the Ptolemy Project. Organizational

An Overview of the Ptolemy Project. Organizational An Overview of the Ptolemy Project Edward A. Lee Professor and Principal Investigator UC Berkeley Dept. of EECS Copyright 1997, The Regents of the University of California All rights reserved. Organizational

More information

Contents. Chapter 3 Combinational Circuits Page 1 of 34

Contents. Chapter 3 Combinational Circuits Page 1 of 34 Chapter 3 Combinational Circuits Page of 34 Contents Contents... 3 Combinational Circuits... 2 3. Analysis of Combinational Circuits... 2 3.. Using a Truth Table... 2 3..2 Using a Boolean unction... 4

More information

An Optimized Message Passing Framework for Parallel Implementation of Signal Processing Applications

An Optimized Message Passing Framework for Parallel Implementation of Signal Processing Applications In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, Munich, Germany, March 2008. To appear. An Optimized Message Passing Framework for Parallel Implementation of Signal

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori The Computer Journal, 46(6, c British Computer Society 2003; all rights reserved Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes Tori KEQIN LI Department of Computer Science,

More information

Shared Memory Implementations of Synchronous Dataflow Specifications Using Lifetime Analysis Techniques

Shared Memory Implementations of Synchronous Dataflow Specifications Using Lifetime Analysis Techniques June 3, 999 Shared Memory Implementations of Synchronous Dataflow Specifications Using Lifetime Analysis Techniques Praveen K. Murthy Angeles Design Systems pmurthy@angeles.com Shuvra S. Bhattacharyya

More information

SDL. Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 年 10 月 18 日. technische universität dortmund

SDL. Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 年 10 月 18 日. technische universität dortmund 12 SDL Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund, Informatik 12 2017 年 10 月 18 日 Springer, 2010 These slides use Microsoft clip arts. Microsoft copyright restrictions apply. Models

More information

Multiframe Blocking-Artifact Reduction for Transform-Coded Video

Multiframe Blocking-Artifact Reduction for Transform-Coded Video 276 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 4, APRIL 2002 Multiframe Blocking-Artifact Reduction for Transform-Coded Video Bahadir K. Gunturk, Yucel Altunbasak, and

More information

simply by implementing large parts of the system functionality in software running on application-speciæc instruction set processor èasipè cores. To s

simply by implementing large parts of the system functionality in software running on application-speciæc instruction set processor èasipè cores. To s SYSTEM MODELING AND IMPLEMENTATION OF A GENERIC VIDEO CODEC Jong-il Kim and Brian L. Evans æ Department of Electrical and Computer Engineering, The University of Texas at Austin Austin, TX 78712-1084 fjikim,bevansg@ece.utexas.edu

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information