GEZEL Controller Design

From Gezel2

Jump to: navigation, search

This section covers the link between a datapath with multiple signal-flowgraphs (instructions), and a controller. Information on how to model datapaths and signal flowgraphs can be found in GEZEL Datapath Design. The generic model of control is FSMD. This section covers this model by itself as well as the representation of this model in GEZEL.

Contents

FSMD models

The control/datapath model of GEZEL is based on a more generic form of register-transfer level modeling called Finite State Machine and Datapath, or FSMD for short. An FSMD model expresses both datapath operations as well as control operations. It makes a clear distinction however between what is control and what is data processing. Recall from GEZEL Overview] that an FSMD consists of two cross-coupled state machines. One plays the role of the controller, the other plays the role of the datapath. Information exchange between the two includes conditions (going from the datapath to the controller) and instructions (going from the controller to the datapath).

An FSMD provides separate modeling for data processing and for control processing. That is for a good reason, in practice there are many differences between the controller and the datapath. First, the modeling style for the two is different. Datapaths are modeled with expressions on signals and registers. Controllers are modeled with state transition graphs. Secondly, the logic implementation style of the two also shows differences. A datapath with operators typically exhibits a regular logic style. Think for example of a ripple carry-chain adder. A controller on the other hand exhibits an irregular logic style.

The FSMD concepts map as follows to the GEZEL model.

  • Instructions are created by selecting one or more sfg out of a datapath. A single sfg can be directly referred to by its name. A set of sfg is enumerated as a comma-separated list in between brackets. For example, assume a datapath is defined as follows.
 dp adp(out a : ns(3)) {
   sig k : ns(2);
   sfg f1 { a = 3; }
   sfg f2 { k = 2;
            a = 2;}
   sfg f3 { k = 1; }
 }

Then, the following are valid instructions:

 f1
 f2
 (f1, f3)

Examples of invalid instruction are:

 f3
 (f1, f2)

These are invalid because the violate the semantic requirements for datapath models (See GEZEL Datapath Design).

When an instruction is executed during a particular control step of a controller, then that will imply execution of the sfg included in the instruction as well.

  • Conditions are created out of logical expressions on registers in the datapath. When conditions are extracted out of datapath inputs or signals, the GEZEL parser will issue a warning. The reason is that GEZEL selects the instruction to execute right at the start of a clock cycle. Before this can be done, any required conditions need to be defined. At the start of a clock cycle however, the only stable values are constants and register outputs. A user can still continue the simulation despite the presence of this warning. However, one must realize at that moment there is a potential risk for anticausal simulation effects (e.g. using the value of a signal before it is available). Therefore, when this warning occurs one must consider if the code can be written such that no warnings appear.
  • The connection between a datapath and a controller is established by refering the name of the datapath while creating the controller. Some earlier examples of this could be seen with the hardwired controller:
 dp adp(out a : ns(3)) {
   ..
 }
 hardwired h_adp(adp) { f1; }

In this example, a controller called h_adp is created and attached to datapath adp.

Sequencer Datapath Controllers

Besides the trivial hardwired controller (See Section 2.5 on page 14), the simplest controller is the sequencer. As the name indicates, a sequencer will execute a set of instructions sequentially, without taking any conditions into account.

A typical case where sequencers are useful is for static, fixed schedules. Consider for example a 4-tap decimating averaging filter. Such a filter reads four subsequent samples, integrates and dumps the sum of the samples at every fourth sample.

 dp avg(in i : ns(8); out o : ns(8)) {
   reg acc : ns(9);
   sfg phase0  { acc = i; o = 0; }
   sfg phase12 { acc = acc + i; o = 0;}
   sfg phase3  { o   = (acc + i) >> 2;}
 }
 sequencer h_avg(avg) { phase0;
                        phase12;
                        phase12;
                       phase3;}
 
 dp tst(in i : ns(8); out o : ns(8)) {
    reg a : ns(8);
    always {
      o = a;
      a = a + 2;
      $display(“C �?, $cycle, “: i=�?, o, “ o=�?, i);
    }
 }
 
 dp sysavg {
   sig i, o : ns(8);
   use avg(i, o);
   use tst(o, i);
 }
 
 system S {
   sysavg;
 }

An averaging filter has four phases. As the datapath in lines 1—6 illustrates, there is an initialization instruction (phase0), an accumulation instruction (phase12) and a termination instruction (phase3). The controller for this datapath is a sequencer with four steps, as shown in lines 7—10. Lines 12—19 show a simple testbench that will feed a string of even numbers to this four-phase averager. Finally, lines 21—29 show the system interconnect function.

This description can be simulated for 10 clock cycles to yield the following output. One can verify that indeed (0+2+4+6)/4 is 3. fdlsim listing5.fdl 10

 C1: i=0 o=0
 C2: i=2 o=0
 C3: i=4 o=0
 C4: i=6 o=3
 C5: i=8 o=0
 C6: i=a o=0
 C7: i=c o=0
 C8: i=e o=b
 C9: i=10 o=0
 C10: i=12 o=0

An important motivation for developing FSMD models, instead of plain hardwired datapaths, is that an FSMD allows to express operation sharing in an elegant way. Consider the descriptions in phase0, phase12 and phase3. They specify two assignments on an accumulator register and three assignments to an output port without the use of a multiplexer. When the same behavior would be represented in a single sfg, it would look like this:

 reg phase : ns(2);
 sfg singlephase {
   acc = (phase == 0) ? i : acc + i;
   o   = (phase == 3) ? (acc + i) >> 2 : 0;
   phase = phase + 1;
 }

This description style gives you precise control over how the implementation will look like, but requires more modeling as the control operations have to be written down explicitly as expression. We will discuss the tradeoff between single-sfg/ multiple-sfg description styles below.


Finite-State Machines

A Finite State Machine implements conditional control sequencing on a datapath. The control model is captured by a state transition graph. A Finite State Machine can be in a well-defined number of states. One of these states is the initial state, it is the state the FSM is in when it first initializes.

A Finite State Machine will take one state transition per clock cycle. During this state transition, a datapath instruction (one or more sfg) can be executed. A state transition can be conditional. In that case, the condition is based on the values of registers in the datapath (or on logical expressions directly derived from it). When state transitions are conditional, then the set of conditions must be complete. This means that, for every if (true-branch), there must be a complimentary else (false-branch).

Consider the following simple example of FSM modeling. The sequencer of Listing 5 can also be written as an FSM as follows.

 fsm h_avg(avg) {
   initial s0;
   state s1, s2, s3;
   @s0 phase0  -> s1;
   @s1 phase12 -> s2;
   @s2 phase12 -> s3;
   @s3 phase3  -> s0;
 }

This description creates four states, called s0, s1, s2 and s3. s0 is the initial state, the others are normal states. A state transition indicates the start state with the @ symbol, and the target state with an arrow (->). In between, a datapath instruction is indicated. A single sfg can be written as such, a group of sfg is specified as a comma-separated list in between round brackets.

Next is an example with slightly more complicated FSM control. The example is a raster line drawing routine, known as the Bresenham Algorithm. The strong point of this algorithm is that it can draw lines of arbitrary slope on a discrete (X,Y) grid, and without the use of floating point arithmetic. The complete GEZEL listing illustrates how a slightly more complicated design looks like.

 // Bresenham line plotter for points in an arbitrary octant
 dp bresen(in x1_in, y1_in, x2_in, y2_in : tc(12)) {
    reg x, y             : tc(12);   // current plot position
    reg e                : tc(12);   // accumulated error
    reg eol              : tc(1);    // end-of-loop flag
    reg einc1, einc2     : tc(12);   // increments
    reg xinc1, xinc2     : tc(12);
    reg yinc1, yinc2     : tc(12);
    sig se, sdx, sdy     : tc(12);
    sig asdx, asdy       : tc(12);
    sig stepx, stepy     : tc(12);
 
   sfg init {
     // evaluate range of pixels and their absolute value
     sdx   = x2_in - x1_in;  asdx = (sdx < 0) ? -sdx : sdx;
     sdy   = y2_in - y1_in;  asdy = (sdy < 0) ? -sdy : sdy;
     // determine direction of x and y increments
     stepx = (sdx < 0) ? -1 : 1;
     stepy = (sdy < 0) ? -1 : 1;
     // initial error
     se    = (asdx > asdy) ? 2 * asdy - asdx : 2 * asdx - asdy;
     // error increment for straight (einc1) and diagonal (einc2) step
     einc1 = (asdx > asdy) ? (asdy - asdx) : (asdx - asdy);
     einc2 = (asdx > asdy) ?  asdy         :  asdx;
     // increment in x direction for straight and diagonal steps
     xinc1 = (asdx > asdy) ? stepx : stepx;
     xinc2 = (asdx > asdy) ? stepx : 0;
     // increment in y direction for straight and diagonal step
     yinc1 = (asdx > asdy) ? stepy : stepy;
     yinc2 = (asdx > asdy) ? 0     : stepy;
     // initialize registers
     x    = x1_in;  y    = y1_in; 
     e    = se;
   }
 
   // end-of-loop test - check if we reach target
   sfg looptest {
     eol   = ((x == x2_in) & (y == y2_in));
   }
 
   // loop body: adjust x, y and error accumulator
   // use error value to decide straight or diagonal step
   sfg loop {
     x     = (e >= 0) ? x + xinc1 : x + xinc2;
     y     = (e >= 0) ? y + yinc1 : y + yinc2;
     e     = (e >= 0) ? e + einc1 : e + einc2;
     $display($hex,"Cycle: ",$cycle," Plot point (", x, ",", y, ") ");
   }
 }
 // controller for bresenham algorithm
 // initializes, draws one line and then waits in state s3
 fsm f_bresen(bresen) {
   initial s0;
   state s1, s2, s3;
   @s0 (init)                -> s1;
   @s1 (loop, looptest)      -> s2;
   @s2 if (eol) then (init)  -> s3;
       else (loop, looptest) -> s2;
   @s3 (init)                -> s3;
 }
 
 // testbench
 dp test_bresen(out x1, y1, x2, y2 : tc(12)) {
   sig sx : tc(12);
   sfg run {
     x1 = 5; x2 = 18; y1 = 2; y2 = 8;
   }    
 }
 hardwired h_test_bresen(test_bresen) {run;}
 
 dp sysbresen {
   sig x1, y1, x2, y2 : tc(12);
   use bresen(x1, y1, x2, y2);
   use test_bresen(x1, y1, x2, y2);
 }
 
 system S {
   sysbresen;
 }

The Bresenham datapath accepts two coordinate tuples, indicating the starting resp. ending points of the vector to be drawn. The bulk of the calculation of the algorithm takes place in an initialization phase, for which a single sfg is created (lines 13—34). Basically, the Bresenham algorithm works with three accumulators: one for the x coordinate (register x), one for the y coordinate (register y), and one error accumulator (register e). At runtime, the error accumulator is evaluated to decide on the required increments in the x and y accumulators.

Not all vectors have the same length, and the Bresenham algorithm only takes a single step (horizontal, vertical or diagonal) per iteration. Because each clock only a single iteration of the Bresenham algortihm is executed, a complete line takes a variable number of clock cycles to generate a vector. Lines 37—39 contain a loop test that decide when to terminate a loop. The actual loop body, which contains the error accumulations, is shown in lines 43—48.

The FSM controller of the Bresenham algorithm is shown in lines 52—60. After initialization, the algorithm takes a first iteration of the loop and evaluates the end-of-loop flag on line 56. From then on, the FSM takes conditional state transitions, which will take it back each time from state s2 to state s2 (line 58), or else terminate the loop into state s3 (line 57). The test (eol) checks when the end-of-loop flag becomes true. This test is taken on the value in a register, so it actually checks the end-of-loop condition of the previous iteration. For this reason, the instruction of the transition into s3 is an initialization instruction (line 57). When the output of eol is high, the x and y accumulators are already at there target position, and no more increments should be done.

Finally, lines 63—79 show a simple testbench for the vector generator. The test will evaluate pixels from the vector running from (5,2) to (18,8) (line 66). The output of this simulation with fdlsim is shown next. Register values are printed out as tuples. These correspond to output/input of a register.

 >fdlsim bresen.fdl 20
 Cycle: 2 Plot point (5/6,2/2)
 Cycle: 3 Plot point (6/7,2/3)
 Cycle: 4 Plot point (7/8,3/3)
 Cycle: 5 Plot point (8/9,3/4)
 Cycle: 6 Plot point (9/a,4/4)
 Cycle: 7 Plot point (a/b,4/5)
 Cycle: 8 Plot point (b/c,5/5)
 Cycle: 9 Plot point (c/d,5/6)
 Cycle: 10 Plot point (d/e,6/6)
 Cycle: 11 Plot point (e/f,6/7)
 Cycle: 12 Plot point (f/10,7/7)
 Cycle: 13 Plot point (10/11,7/8)
 Cycle: 14 Plot point (11/12,8/8)
 Cycle: 15 Plot point (12/13,8/8)

The algorithm needs 14 cycles to complete the drawing. This corresponds to the largest dimension of the vector, in this case along the X axis. State transition conditions can also be nested hierarchically. It is possible to write

 @s0 if (c1) then
       if (c2) then (sfg1) -> s0;
               else (sfg2) -> s0;
     else
       if (c3) then (sfg3) -> s0;
              else (sfg4) -> s0;

or, equivalently as a chained else-if condition like

 @s0 if      ( c1 &  c2) then (sfg1) -> s0;
     else if ( c1 & ~c2) then (sfg2)  -> s0;
     else if (~c1 &  c3) then (sfg3) -> s0;
     else if (~c1 & ~c3) then (sfg4) -> s0;

Choosing a controller style

An FSMD consist of two coupled state machines, one playing the role of datapath, and one playing the role of controller. The FSMD model introduces control steps in a description, and allows the GEZEL description to move from a structural description to a beha-vioral description. A GEZEL description is called structural if it uses only a single signal flow graph for a datapath that is executed at each clock cycle — cfr. the alway signal flow graph A behavioral description is one in which there are multiple sfg in a datapath, which are executed over multiple clock cycles.

A structural description will always have only a single assignment per state variable (a register), while a behavioral description can have more. Each control step of a behavioral description, a different assignment can be done. A behavioral description avoids writing multiplexers when multiple assignments are done to the same state variable in multiple sfg. When the same functionality needs to be migrated from a behavioral to a structural description, these multiplexers need to be introduced by hand (using the ternary operator ‘a ? b : c’).

The absence or presence of the control-step concept also has an important implication on the operation-to-resource binding. Indeed, in a structural description, each operation is executed at each clock cycle. Therefore, each operation will require an individual operator. The word operator indicates the resource that implements an operation. In a behavioral description, several operations can share the same operator provided that these operations are executed in different control steps. The GEZEL code generator creates VHDL code in such a way that this sharing is possible.

Still there are design cases in which structural descriptions are preferable over behavioral ones. In particular, when creating highly constrained implementations such as very fast or very area-sensitive hardware, it can be necessary to control all aspects of the implementation.

Thus, any design can be created in either design style: structural and behavioral. Which of the two description styles is the better one ? The answer to this question depends on the actual design case, and on the designer. Both have their strengths and weaknesses, and ultimately it is the designer who must select the better option. Here a number of statements that illustrate a few design considerations to select a description style.

Structural Behavioral
The expressions in a datapath description .. .. include both scheduling as well as data processing. .. contain only data processing.
The expressions in a datapath description ... .. are harder to reuse with different schedules. .. are easier to reuse with different schedules.
The state assignment of the controller ... .. is chosen by the designer. .. is chosen by the logic synthesis tool.
This writing style is useful for ... .. high-throughput or area-sensitive designs that require full designer control. .. cycle-true descriptions that put as much work as possible on the logic synthesis tools.


A Galois Field multiplier

The look and feel of a structural vs behavioral description style is illustrated by implementing a 4-bit, bit-serial Galois-Field Multiplier in each of the description styles.

A Galois Field Multiplier multiplies elements of the field GF(24). This finite field consists of 16 elements and is created out of the 2-element field GF(2). The representation of the elements is done using four bits, in terms of a field basis. The field basis that will be used is the polynomial basis, in which the individual bits represent coefficients of a polynomial. In this case, the four bits a0a1a2a3 are assumed to be coefficients of a polynomial g(t):

 g(t) = a3t3 + a2t2 + a1t + a0

The multiplication of two elements out of the field GF(24) is defined by the multiplication of two polynomials a(t) and b(t), modulo the irreducible field polynomial d(t). This is a polynomial of degree 4. The simplest irreducible field polynomial for GF(24) is

 d(t) = t4 + t + 1

As an example, consider the multiplication of a = (1001) with b = (0110). In polynomial format this becomes

 c(t) = [a(t).b(t)] mod d(t)
 c(t) = [(t3 + 1).(t2 + 1)] mod (t4 + t + 1)
 c(t) = [t5 + t4 + t2 + 1] mod (t4 + t + 1)

The coefficients of this multiplication are elements of the field GF(2), and they are evaluated with modulo-2 arithmetic. Thus, the multiplication result can be simplified to

 c(t) = [(t + 1)( t4 + t + 1) + (t + 1)] mod (t4 + t + 1) = (t + 1)

The multiplication result corresponds to the bitstring c = (0011). The next two listings implement this algorithm in a bit-serial fashion. That is, the multiplications of the b operand execute bit-by-bit, and accumulate into the a operand. When the partial results exceeds 4 bits, the resulting polynomial is reduced modulo (t4 + t + 1). This is done by modulo-2 addition of this polynomial to the partial result.

Galois Field Multiplier in behavioral-style description

 dp D( in fp, i1, i2 : ns(4); out mul: ns(4); 
       in mul_st: ns(1); 
       out mul_done : ns(1)) {
   reg acc, sr2, fpr, r1 : ns(4);
   reg mul_st_cmd : ns(1);
   sfg ini { // initialization
      fpr        = fp;
      r1         = i1;
      sr2        = i2;
      acc        = 0;
      mul_st_cmd = mul_st;
   }
   sfg calc { // calculation
     sr2 = (sr2 << 1);
     acc = (acc << 1) ^ (r1 & (tc(1))  sr2[3])  // add a if b=’1’
            ^ (fpr & (tc(1)) acc[3]); // reduction if carry
   }
   sfg omul { // output inactive
     mul      = acc;
     mul_done = 1;
     $display("done. mul=", mul);
   }
   sfg noout { // output active
     mul      = 0;
     mul_done = 0;
   }
 }
 fsm F(D) {
   state s1, s2, s3, s4, s5;
   initial   s0;
   @s0 (ini,  noout) -> s1;
   @s1 if (mul_st_cmd) then (calc, noout) -> s2; 
                       else (ini, noout)  -> s1;
   @s2 (calc, noout) -> s3;
   @s3 (calc, noout) -> s4;
   @s4 (calc, noout) -> s5;
   @s5 (ini,  omul ) -> s1;
 }

Galois Field Multiplier in structural-style description

 dp D( in fp, i1, i2 : ns(4); out mul: ns(4); 
       in mul_st: ns(1); 
       out mul_done : ns(1)) {
   reg ctl : ns(5);
   reg acc, sr2, fpr, r1 : ns(4);
 
   always {
      ctl = mul_st ? 1 : (ctl << 1); // one-hot control
      fpr = fp;
      r1  = i1;
      sr2 = ((ctl == 0) ? i2 : (sr2 << 1));
      acc = (ctl == 0)  ?  0 : (acc << 1) 
                             ^ (r1 & (tc(1))  sr2[3])
                             ^ (fpr & (tc(1)) acc[3]);
      mul = acc;
      mul_done = ctl[4];
      $display("mul ", mul, " mul_done ", mul_done);
   }
 }

Both descriptions behave exactly the same, yet they are modeled differently. The first is a behavioral description and uses an fsm to model control of the datapath. The second listing shows a hardwired datapath. It introduces an extra variable, namely the register ctl. This register implements a one-hot controller. At the start of a control cycle, a ‘1’ is injected in this shift register. When it reaches the end, the algorithm is completed. The operations on registers (like acc and sr2) use the value of the ctl register to multiplex two expressions in one assignment. One can verify that in the first listing, these assignments are located in different sfg (ini and calc). They are integrated by the control steps executed in the control FSM description.