GEZEL Instruction-set Cosimulation

GEZEL designs can be cosimulated with instruction-set simulators. Such designs can include coprocessors that implement graphics, networking and/or cryptographic functions. The GEZEL cosimulation engine is called gplatform. It supports cosimulations with one or more ARM cores, 8051 microcontrollers, or PicoBlaze microcontrollers.

The general characteristics of the cosimulation design flow are outlined first, followed by a discussion of each of the three cosimulation environments in more detail.

Cosimulation Interfaces and Interface Protocols

The cosimulation of GEZEL with an instruction-set simulator requires, in addition to a GEZEL program, an executable program that can run on the instruction-set simulator. These executables are created with a compiler. When the compiler runs on a host machine (e.g. a Linux PC) that differs from the target execution environment (e.g. an ARM instruction-set simulator), a cross-compiler is required. In this discussion, the C programming language and a C cross-compiler will be used to create the executables.

The interactions between the GEZEL program and the executables running on the instruction-set simulators are captured in a cosimulation interface, which is an abstracted version of the real hardware/software interface. The cosimulation interfaces of GEZEL are cycle-true models of the real implementations.

There are various forms of cosimulation interfaces, depending on the I/O mechanisms provided by the core (instruction-set simulator). A commonly used type of interface is a memory-mapped interface, in which a set of addresses in the address space of the core is shared between the hardware and the software running on the core. There can also be specialized coprocessor or I/O interfaces, which are supported by dedicated instructions on the core. The main advantage of a memory-mapped interface is that it is almost core-independent. Therefore, C code and GEZEL code written for one type of processor can be ported to another processor with only minimal changes. The main advantage of a specialized interface, on the other hand, is that it provides a dedicated, non-shared and usually high-bandwidth data channel between the core and the hardware.

A cycle-true cosimulation interface by itself provides only a mechanism to transfer data between a C program and GEZEL. This data transfer proceeds between two concurrent entities (a core and a hardware block). To prevent data values from getting lost when one party is unaware of the other's activities, synchronization is required. Such synchronization is provided by a synchronization protocol. A synchronization protocol defines a signalling sequence on one or more control signals, in addition to the data transfer channel between C and GEZEL. This signalling sequence ensures that the communicating parties achieve synchronization. Both the control signals and the data transfer channel can be implemented using the same type of cosimulation interface. For example, you can use a memory-mapped interface for both of them.

The gplatform tool

The gplatform tool can do everything that armcosim, armzilla, and gezel51 can do. It was introduced in GEZEL 1.7 to start consolidating the number of cosimulation tools supported in the GEZEL release without reducing flexibility or capabilities.

Eventually, gplatform will support a wide range of system architectures including single-processor systems, loosely coupled as well as tightly coupled multiprocessor architectures, and homogeneous as well as heterogeneous systems. In a loosely coupled system, each core has a private memory program space. In a tightly coupled system, multiple cores will share a single program space. The current version of gplatform supports loosely-coupled multiprocessor systems including an arbitrary configuration of ARM and i8051 cores.

The command line of gplatform is as follows:

 gplatform [-d] -c max_cycles gezel_file

The system configuration is fully contained within gezel_file. The -c flag sets an upper bound on the number of cycles to simulate. By default, the cosimulation will run until all instruction-set simulators have completed execution of their application program (these stopping conditions may vary from core to core).

Cosimulation interfaces for StrongARM

There are three categories of interfaces for the StrongARM ISS.

  1. Memory-mapped interfaces define a memory-mapped address decoder and intercept memory reads or writes from the ARM software. This is by far the most common and popular type of hardware-software cosimulation/codesign interface, due to its ease of use and its flexibility. The disadvantage of this interface is the limited communication bandwidth between ARM software and GEZEL hardware. This type of interface is implemented using armsystemsource, armsystemsink, and armbuffer.
  2. Special-function unit (SFU) interfaces define an IO port into the pipeline of the processor, and thus can be used to experiment with ASIP (application-specific instruction-set processor) concepts. The SFU interface was designed by the author of Simit-ARM, [Wei Qin]. The special-function unit interfaces are triggered by special, reserved instructions. SFU interfaces have a much larger bandwidth into the StrongARM processor than memory-mapped interfaces. This type of interface is implemented using armsfu2x2, armsfu2x1, and armsfu3x1.
  3. Fast-Simplex-Link (FSL) interfaces define a dedicated coprocessor port on the ARM, which is emulated using memory reads and memory writes in the ARM software. The FSL interface is defined by the MicroBlaze processor by Xilinx, which also provides detailed documentation for this interface. The gplatform simulator implements an FSL-like interface that enables users to experiment without moving to VHDL or FPGA synthesis. Like the SFU interface, the bandwidth of an FSL interface is higher than that of a memory-mapped interface. This type of interface is implemented using armfslslave and armfslmaster.

Memory-mapped interfaces

Example 1 - Cosimulation with a single ARM

Here is a small example of a hardware-software cosimulation, consisting of a synchronized data transfer. Below is the hardware description in GEZEL.

A GEZEL description of hardware-side of hardware/software handshake

 // ARM core running program ‘listing13’
 ipblock myarm {
   iptype "armsystem";
   ipparm "exec=listing13";
 }
 
 // Cosimulation interfaces
 ipblock b1(in data : ns(8)) {
   iptype "armsystemsink";
   ipparm "core=myarm";
   ipparm "address=0x80000000";
 }
 ipblock b2(out data : ns(8)) {
   iptype "armsystemsource";
   ipparm "core=myarm";
   ipparm "address=0x80000004";
 }
 ipblock b3(out data : ns(32)) {
   iptype "armsystemsource";
   ipparm "core=myarm";
   ipparm "address=0x80000008";
 }
 
 // hardware receiver
 dp D2(in req : ns(8); out ack : ns(8); in data : ns(32)) {
   reg reqreg  : ns(8);
   reg datareg : ns(32);
   sfg sendack {
    ack = 1;
   }
   sfg sendidle {
    ack = 0;
   }
   sfg read {
    reqreg  = req;
    datareg = data;
   }
   sfg rcv {
    $display("data received ", data, " cycle ", $cycle);    
   }
 }
 fsm F2(D2) {
   initial s0;
   state   s1, s2;
   @s0 (read, sendack) -> s1;
   @s1 if (reqreg) then (read, rcv, sendidle) -> s2;
                   else (read, sendack)       -> s1;
   @s2 if (reqreg) then (read, sendidle)      -> s2;
                   else (read, sendack)       -> s1;
 }
 
 dp sysD2 {
   sig r, a : ns(8);
   sig d    : ns(32);
   use myarm;
   use D2(r,a,d);
   use b1(a);
   use b2(r);
   use b3(d);
 }
 
 // connect hardware to cosimulation interfaces
 system S {
   sysD2;
 }

A GEZEL file for cosimulation will in general include the following elements.

  • One or more cores which will be simulated using an instruction-set simulator
  • One or more cosimulation interfaces that provide communication channels from GEZEL to the application programs on the core
  • For the hardware part of the cosimulation, a hardware description using FSMD semantics.

Image:hsk.png

The first two bullets (cores and cosimulation interfaces) are expressed with the GEZEL library block mechanism (ipblock). An ipblock is a library block with semantics similar to those of a datapath dp. Library blocks are discussed in detail in GEZEL Library Blocks. For the purpose of this discussion, it suffices to say that a library block has a type and one or more parameters. A library block is declared using the ipblock statement; its type is expressed using the iptype statement, while its parameters are expressed using the ipparm statement.

Lines 1-5 in Listing 12 include an ARM core in the simulation. It has type 'armsystem', which means it is a complete instruction-set simulator including its program memory. The application program that must be loaded into the program memory is given as a parameter to this library block, on Line 4. In this case, we specify that the application program is stored in the executable listing13.

Lines 6-22 define three cosimulation interfaces between GEZEL and the ARM. These interfaces are unidirectional, memory-mapped interfaces. There are two types of memory-mapped interfaces:

  • armsystemsink blocks, such as in lines 8-12. These are channels from GEZEL to the ARM; they are a data sink for GEZEL. These blocks define an input port on the library block where data to be sent to the ARM is provided.
  • armsystemsource blocks, such as in lines 18-22. These are channels from the ARM to GEZEL; they are a data source for GEZEL. These blocks define an output port on the library block where data that is received from the ARM can be retrieved.

Both armsystemsink and armsystemsource define two parameters using the ipparm field. The first parameter is the name of the ARM core they belong to. The second parameter is the address value of the ARM memory location that is shared between the GEZEL hardware block and the ARM core.

Lines 24-50 define an example hardware module that can accept values from the software running on the ARM. The module executes a two-phase full-handshake protocol, which uses two control lines (an input req and an output ack). The operations of the protocol are illustrated in Figure 5.6. At the start of the two-phase full-handshake protocol, the hardware module is waiting for the req control signal to become high (lines 46-47). Before driving this signal high, the software will first set the data value to a stable value.

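Once the hardware module detects req high, it captures the data and signals this on ack. In this listing ack idles at 1 (sendack) and is driven to 0 (sendidle) when the data has been received, so the software waits until it reads ack as zero. At that moment the second phase of the handshake protocol is entered, and an inverse but symmetric handshake sequence is executed. First the software will drive req to zero, after which the GEZEL hardware model will respond by returning ack to 1 (lines 48-49).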
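
The signalling sequence can also be traced on a host machine, without any simulator. The following is a minimal sketch in C that mirrors the F2 state machine of the GEZEL listing above and steps it against the software side of the handshake. The names hw_model_t, hw_init, hw_clock and sw_send are hypothetical and are not part of GEZEL; the model only assumes that registers take their new value at the end of a clock cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the D2/F2 receiver; not part of GEZEL. */
typedef enum { S0, S1, S2 } hw_state_t;

typedef struct {
    hw_state_t state;
    uint8_t    reqreg;        /* registered copy of req  (reg reqreg)  */
    uint32_t   datareg;       /* registered copy of data (reg datareg) */
    uint8_t    ack;           /* value driven on ack in the last cycle */
    uint32_t   received[16];  /* values seen by the rcv action         */
    int        nreceived;
} hw_model_t;

void hw_init(hw_model_t *hw) {
    hw->state = S0;
    hw->reqreg = 0;
    hw->datareg = 0;
    hw->ack = 1;              /* F2 starts with sendack, so ack idles at 1 */
    hw->nreceived = 0;
}

/* Advance the model by one clock cycle with the given req and data inputs. */
void hw_clock(hw_model_t *hw, uint8_t req, uint32_t data) {
    uint8_t reqreg_now = hw->reqreg;  /* conditions test the old register value */
    switch (hw->state) {
    case S0:                          /* (read, sendack) -> s1 */
        hw->ack = 1;
        hw->state = S1;
        break;
    case S1:
        if (reqreg_now) {             /* (read, rcv, sendidle) -> s2 */
            if (hw->nreceived < 16)
                hw->received[hw->nreceived++] = data;
            hw->ack = 0;
            hw->state = S2;
        } else {                      /* (read, sendack) -> s1 */
            hw->ack = 1;
        }
        break;
    case S2:
        if (reqreg_now) {             /* (read, sendidle) -> s2 */
            hw->ack = 0;
        } else {                      /* (read, sendack) -> s1 */
            hw->ack = 1;
            hw->state = S1;
        }
        break;
    }
    hw->reqreg = req;                 /* sfg read runs in every state */
    hw->datareg = data;
}

/* Software side of one transfer; returns the number of model cycles used. */
int sw_send(hw_model_t *hw, uint32_t value) {
    int cycles = 0;
    /* Phase 1: data is stable, req is high, wait until ack reads 0. */
    do { hw_clock(hw, 1, value); cycles++; assert(cycles < 100); } while (hw->ack);
    /* Phase 2: req is low, wait until ack has returned to 1. */
    do { hw_clock(hw, 0, value); cycles++; assert(cycles < 100); } while (!hw->ack);
    return cycles;
}
```

Calling hw_init once and then sw_send for each value leaves the transferred values in received[]. The model advances one hardware cycle per software step, so its cycle counts say nothing about the timing of a real ARM cosimulation, where each memory access takes several cycles.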

A software program that executes this handshake sequence on the ARM is shown next.

A C description of software side of hardware/software handshake

 int main() {
   volatile unsigned char *reqp, *ackp;
   volatile unsigned int  *datap;
   int data = 0;
   int i;
 
   reqp  = (volatile unsigned char *) 0x80000004;
   ackp  = (volatile unsigned char *) 0x80000000;
   datap = (volatile unsigned int  *) 0x80000008;
 
   for (i=0; i<10; i++) {
     *datap = data;
     data++;
 
     *reqp = 1; 
     while (*ackp) { }
 
     *reqp  = 0;
     while (! *ackp) { }
   }
   return 0;
 }

The memory-mapped hardware/software interfaces are included in lines 2-3 as pointers to volatile data. A compiler optimizer treats accesses through such pointers with caution. In particular, it makes no assumption about the persistence of the value stored at the memory location being pointed at, so every read and write in the source code results in an actual memory access. The pointers are initialized in lines 7-9 with values corresponding to the memory addresses used in the GEZEL description.

In lines 11-20, a simple loop is shown that executes the software side of the two-phase full-handshake protocol. Lines 15 and 18 illustrate why the volatile declaration is important. Without it, an optimizing C compiler could conclude that the value written to *reqp in line 15 is simply overwritten in line 18, so that the first write can be dropped. In addition, the remaining value is loop-invariant and can be hoisted outside of the loop body. The resulting optimized code would write the value 0 once to *reqp and never change it afterwards. By declaring reqp to be a pointer to volatile data, the compiler will refrain from such optimizations.
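
Because a memory-mapped interface is almost core-independent, it can help to keep the address assignments in a single place. The following sketch groups the three volatile pointers of the listing above in one structure that is derived from a base address. The names hsk_regs_t, hsk_map and hsk_post are hypothetical; the offsets 0x0, 0x4 and 0x8 are taken from the ipparm address values of b1, b2 and b3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map for the handshake interface of this example. */
typedef struct {
    volatile uint8_t  *ack;   /* armsystemsink   b1, offset 0x0 */
    volatile uint8_t  *req;   /* armsystemsource b2, offset 0x4 */
    volatile uint32_t *data;  /* armsystemsource b3, offset 0x8 */
} hsk_regs_t;

/* Build the register map from a word-aligned base address. */
hsk_regs_t hsk_map(uintptr_t base) {
    hsk_regs_t r;
    assert(base % 4 == 0);
    r.ack  = (volatile uint8_t  *)(base + 0x0);
    r.req  = (volatile uint8_t  *)(base + 0x4);
    r.data = (volatile uint32_t *)(base + 0x8);
    return r;
}

/* First half of a transfer: the data word is written before req is raised.
 * Both accesses are volatile, so the compiler keeps them and keeps their order. */
void hsk_post(const hsk_regs_t *r, uint32_t value) {
    *r->data = value;
    *r->req  = 1;
}
```

On the ARM target the map would be created with hsk_map(0x80000000). On a host machine the same code can be exercised by passing the address of an ordinary array of three 32-bit words, which is useful for testing driver logic without running the cosimulation.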

Everything is now ready to run the cosimulation. Start by compiling the ARM program using a cross-compiler. The -static flag creates a statically linked executable, a requirement for the ARM ISS.

 > /usr/local/arm/bin/arm-linux-gcc -static \
            listing13.c -o listing13

Next run the cosimulation with gplatform:

 > gplatform listing12.fdl
 armsystem: loading executable [listing13]
 armsystemsink: set address 2147483648
 data received 0 cycle 29365
 data received 1 cycle 29527
 data received 2 cycle 29563
 data received 3 cycle 29599
 data received 4 cycle 29635
 data received 5 cycle 29671
 data received 6 cycle 29707
 data received 7 cycle 29743
 data received 8 cycle 29779
 data received 9 cycle 29815
 Total Cycles: 32450

The simulation initializes and then prints a series of 'data received' messages, which are generated by the GEZEL program. The round-trip execution time of the protocol is 36 clock cycles, a rather high value because we are working with an unoptimized C program and an unoptimized handshake protocol.

Example 2 - Dual ARM Cosimulation

An example of a system with two ARM processors follows next, where data is shipped from one ARM to the next using a dedicated communication bus and an optimized two-phase single-sided handshake protocol. The example is comparable in functionality to the previous example, but uses two ARM processors instead of an ARM processor and a hardware module. The system is configured according to the following GEZEL description.

GEZEL interconnect description for a two-ARM system

 ipblock myarm1 {
   iptype "armsystem";
   ipparm "exec=listing15";
 }
 
 ipblock myarm2 {
   iptype "armsystem";
   ipparm "exec=listing16";
   ipparm "period = 2"; // ARM clock = 1/2 system clock 
 }
 
 ipblock channelsrc1(out data : ns(32)) {
   iptype "armsystemsource";
   ipparm "core=myarm1";
   ipparm "address=0x80000008";
 }
 
 ipblock channelsnk2(in data : ns(32)) {
   iptype "armsystemsink";
   ipparm "core=myarm2";
   ipparm "address=0x80000008";
 }
 
 dp sys {
   sig src1, snk2 : ns(32);
 
   use myarm1;
   use myarm2;
   use channelsrc1(src1);
   use channelsnk2(snk2);
   
   always {
     snk2 = src1;
   }
 }
 system S {
   sys;
 }

The two cores are called myarm1 and myarm2 respectively. Two memory-mapped interfaces enable a direct connection between these two processors. The cores and interfaces are described using the library block mechanism (Lines 1-22), in a similar fashion as in the previous example. The interconnection network is described using GEZEL semantics, and consists of a very simple point-to-point connection (Lines 24-38).

The software running on each of the cores is shown in Listing 15 and Listing 16. The handshaking protocol illustrated here is an optimized version of the two-phase full-handshake protocol described earlier. The optimizations include the following.

  • Replace the two-way request-acknowledge handshake with a one-way request-only handshake, going from the sender to the receiver. This optimization is only safe when the receiver is faster than the sender, because the sender cannot verify the status of the receiver and thus must assume it is always safe to send data.
  • Trigger the handshaking on signal level changes rather than signal levels. This effectively doubles the communication bandwidth with respect to the level-triggered case.
  • Merge the request and data signals into a single shared memory address. This loses one bit of the useful data bandwidth, but at the same time reduces the number of memory accesses by the ARM. In a RISC processor, memory bus bandwidth is a very scarce resource. In the examples below, the most-significant bit of the data word is used as the request bit for the single-sided handshake protocol.
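
The second and third optimization can be sketched in a few lines of C. The helper names pack_word, rx_state_t and rx_poll are hypothetical and do not appear in the listings; they only illustrate how a request bit in the most-significant position can be merged with a 31-bit payload, and how a receiver can react to a change of that bit rather than to its level.

```c
#include <assert.h>
#include <stdint.h>

#define REQ_BIT   0x80000000u   /* most-significant bit carries the request */
#define DATA_MASK 0x7FFFFFFFu   /* remaining 31 bits carry the payload      */

/* Merge a payload and the current request level into one channel word. */
uint32_t pack_word(uint32_t payload, int req) {
    assert(req == 0 || req == 1);
    return (payload & DATA_MASK) | (req ? REQ_BIT : 0u);
}

/* Receiver state: the request level seen at the previous accepted word. */
typedef struct {
    int last_req;
} rx_state_t;

/* Returns 1 and stores the payload when the request bit has changed since
 * the last accepted word; returns 0 when the word is not new. */
int rx_poll(rx_state_t *rx, uint32_t word, uint32_t *payload) {
    int req = (word & REQ_BIT) != 0;
    if (req == rx->last_req)
        return 0;
    rx->last_req = req;
    *payload = word & DATA_MASK;
    return 1;
}
```

With this scheme every transition of the request bit, rising or falling, marks a new data word, so the sender needs a single memory write per transfer. The sketch assumes that the channel word reads as zero before the first transfer, so last_req starts at 0.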

A Sender C program of the two-ARM multiprocessor

 #include <stdio.h>
 
 int main() {
   volatile unsigned int  *datap;
   int data = 0;
   int i;
   datap = (volatile unsigned int  *) 0x80000008;
 
   for (i=0; i<5; i++) {
     *datap = data | 0x80000000;
     printf("Sender sends %d\n", data);
     data++;
 
     data &= 0x7FFFFFFF;
     *datap = data;
     printf("Sender sends %d\n", data);
     data++;
   }
   return 0;
 }

A Receiver C program of the two-ARM multiprocessor

 #include <stdio.h>
 
 int main() {
   volatile unsigned int  *datap;
   int data = 0;
   int i;
   datap = (volatile unsigned int  *) 0x80000008;
 
   for (i=0; i<5; i++) {
     do 
       data = *datap;
     while (!(data & 0x80000000));
 
     do 
       data = *datap;
     while ( (data & 0x80000000));
   }
   printf("Receiver complete - last data = %d\n", data);
   return 0;
 }

The simulation of this multiprocessor proceeds as follows. First, compile each of the sender and receiver programs into statically linked ARM-ELF executables.

 > /usr/local/arm/bin/arm-linux-gcc -static  \
                                  listing15.c -o listing15
 > /usr/local/arm/bin/arm-linux-gcc -static \
                                listing16.c -o listing16

Next, run gplatform with the GEZEL file as command line argument. gplatform will then load the ARM executables and the GEZEL description, and start the simulation. In the output, the startup messages of gplatform are followed by the messages printed by the sender and the receiver programs.

 > gplatform listing14.fdl
 armsystem: loading executable [listing15]
 armsystem: loading executable [listing16]
 armsystemsink: set address 2147483656
 Sender sends 0
 Sender sends 1
 Sender sends 2
 Sender sends 3
 Sender sends 4
 Sender sends 5
 Sender sends 6
 Sender sends 7
 Sender sends 8
 Sender sends 9
 Receiver complete - last data = 9
 Total Cycles: 64971


Special-Function Units

Here is a brief example that shows how a custom instruction for StrongARM can be created. Custom instructions can be called from C through macros such as the following. In this case, we map the op2x2 instruction (which the StrongARM does not have, of course) to the 'smullnv' instruction. This is an instruction that the StrongARM does not implement, but that is accepted by the StrongARM compiler.

 #define OP2x2_1(D1,D2,S1,S2) \
       asm volatile ("smullnv %0, %1, %2, %3": \
               "=&r"(D1),"=&r"(D2): \
               "r"(S1),"r"(S2));

We will describe the use of OP2x2_1 with a small example. Consider the following C program. It contains two calls to OP2x2_1, which will map to a custom instruction of the op2x2 type. This program simply defines the input arguments for op2x2, calls it, and prints the result.

 #include <stdio.h>
 
 /* OP2x2_1 is the macro defined above */
 int main() { 
   int a,b,c,d; 
   
   a = 10; 
   b = 20; 
   OP2x2_1(c, d, a, b);
   printf("%d %d %d %d\n", a, b, c, d); 
 
   a = 50;
   b = 20;
   OP2x2_1(c, d, a, b);
   printf("%d %d %d %d\n", a, b, c, d); 
   
   return 0; 
 }

Here is a corresponding GEZEL program that implements the special-function unit.

 ipblock myarm {
   iptype "armsystem";
   ipparm "exec = sfudriver";
 }
 ipblock armsfu1(out d1, d2 : ns(32);
                 in  q1, q2 : ns(32)) {
   iptype "armsfu2x2";
   ipparm "core = myarm";
   ipparm "device = 0";
 }
   
 dp addsub {
 use myarm;
 sig d1, d2, q1, q2 : ns(32);
 use armsfu1(d1, d2, q1, q2);
 always {
   q1 = (d1 + d2);
   q2 = (d1 - d2);
   $display("SFU 2x2 runs at ", $cycle, ": " , q1, " ", q2);
   }
 }
  
 system S {
   addsub;
 }

The armsfu1 ipblock in the program defines the interface between StrongARM and GEZEL. This interface provides two outputs (d1 and d2) and two inputs (q1 and q2), as expected for an op2x2 block. The addsub datapath connects to this interface, and performs operations on the values provided by the armsfu1 interface.

The synchronization between StrongARM and the custom datapath in GEZEL is implicit; whenever the StrongARM executes an op2x2 instruction, it provides the input values to the armsfu1 interface, and gives GEZEL one clock cycle to process them. At the end of that clock cycle, the StrongARM takes whatever value is available at the interface back into the program. In this case, the custom datapath is nothing more than a simple add/subtract function.

To compile and simulate this program, run make followed by make sim.

 > make
   /usr/local/arm/bin/arm-linux-gcc -static  sfudriver.c -o sfudriver
 > make sim
   /opt/gezel-2.1/bin/gplatform armsfu.fdl
   core myarm
   armsystem: loading executable [sfudriver]
   SFU 2x2 runs at 0: 0 0
   SFU 2x2 runs at 30791: 1e fffffff6
   10 20 30 -10
   SFU 2x2 runs at 47074: 46 1e
   50 20 70 30
   Total Cycles: 54062

See 'Library Blocks' for the definition of other SFU interfaces.
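
The values in the simulation output above can be checked against a plain C reference of the addsub datapath. The function op2x2_ref below is a hypothetical helper, not part of GEZEL or of the driver program; it assumes that the ns(32) signals behave as 32-bit unsigned values that wrap around, which is also how uint32_t arithmetic behaves in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference for the addsub datapath:
 * q1 = d1 + d2 and q2 = d1 - d2, both truncated to 32 bits. */
void op2x2_ref(uint32_t d1, uint32_t d2, uint32_t *q1, uint32_t *q2) {
    assert(q1 != 0 && q2 != 0);
    *q1 = d1 + d2;
    *q2 = d1 - d2;
}
```

For the inputs 10 and 20 this yields 0x1e and 0xfffffff6, the hexadecimal values shown on the first 'SFU 2x2 runs at' line that follows the initial one. The C program prints the second value as -10 because it interprets the same bit pattern as a signed int.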

Fast Simplex Link

A ‘Fast Simplex Link’ implements a point-to-point connection between MicroBlaze and a coprocessor. Several design features ensure high throughput between the MicroBlaze and the coprocessor:

  • An FSL is a dedicated, non-shared link, driven by a simple handshake protocol rather than a memory-bus read/write cycle.
  • The MicroBlaze processor has dedicated instructions to access the FSL.
  • An FSL can be buffered with a dedicated queue, which enables execution overlap of the MicroBlaze operation and the coprocessor.

We build a high-level simulation model of a copy coprocessor in GEZEL. The copy coprocessor transfers data from an FSL slave to an FSL master. The terminology used by Xilinx (‘slave’ and ‘master’) is slightly confusing since these interfaces are not slave or master, but rather ‘read’ and ‘write’. The concept of ‘slave’ or ‘master’ implies a predetermined control sequence in the control handshake lines, which is in fact independent of the direction in which the data travels. In any case we will stick to the ‘slave’ and ‘master’ terminology. The listing below shows the copy coprocessor modeled in GEZEL.

    ipblock arm1 {
      iptype "armsystem";
      ipparm "exec = fsldrive";
    }
    
    ipblock fsl1(out data   : ns(32);
                 out exists : ns(1);
                 in  read   : ns(1)) {
      iptype "armfslslave";
      ipparm "core=arm1";
      ipparm "write=0x80000000";
    }
    
    ipblock fsl2(in  data   : ns(32);
                 out full   : ns(1);
                 in  write  : ns(1)) {
      iptype "armfslmaster";
      ipparm "core=arm1";
      ipparm "read=0x80000004";
      ipparm "status=0x80000008";
    }
    
    dp gezelfslcopy(in  rdata   : ns(32);
                    in  exists  : ns(1);
                    out read    : ns(1);
                    out wdata   : ns(32);
                    in  full    : ns(1);
                    out write   : ns(1)) {
      reg rexists, rfull : ns(1);
      reg rcopy : ns(32);
      always {
       rexists = exists;
       rfull   = full;
       wdata   = rcopy;
      }
      sfg dowrite   { write = 1; }
      sfg dontwrite { write = 0; }
      sfg doread    { read = 1; }
      sfg dontread  { read = 0; }
      sfg capture   { rcopy = rdata; 
                      $display("captures data: ", rdata);
                    }
    }
    fsm fsm_gezelfslcopy(gezelfslcopy) {
      initial s0;
      state s1, s2, s3;
      @s0 if (rexists) then (capture , doread, dontwrite) -> s1;
                       else (dontread, dontwrite)         -> s0;
      @s1 if (rfull)   then (dontread, dontwrite)         -> s1;
                       else (dowrite , dontread )         -> s0;
    }
    
    dp top {
      sig rdata, wdata : ns(32);
      sig write, read  : ns(1);
      sig exists, full : ns(1);
      use arm1;
      use fsl1(rdata, exists, read);
      use fsl2(wdata, full, write);
      use gezelfslcopy(rdata, exists, read, wdata, full, write);
    }
    
    system S {
      top;
    }

We make use of an ARM instruction-set simulator since a cycle-accurate MicroBlaze ISS is currently not available in GEZEL. The FSLs are modeled by means of ipblock constructs (lines 6-22). An ARM does not have an FSL, and therefore these are emulated through a memory-mapped protocol. The FSL side of these ipblocks, however, implements the exact FSL protocol. In other words, any coprocessor that can be functionally verified using cosimulation with this setup will also work when attached to a MicroBlaze FSL. The memory-mapped protocol through which the ARM drives the FSL works as follows.

  • The FSL slave defines a write address. When the ARM writes to this address, that data will be transferred to the FSL slave interface.
  • The FSL master defines a read address. When the ARM reads from this address, the last data token provided from the FSL master will be returned. The FSL master also defines a status address. When the ARM reads from this address, the presence of a new token will be indicated. In other words, before accessing the read address, the ARM should test the value of the status address to ensure new data was written into the FSL master by the coprocessor.

The copy coprocessor, lines 23-51, is a simple FSMD that alternately drives the input FSL handshake and the output FSL handshake. The minimum latency through the coprocessor is two clock cycles: in the first clock cycle, data is copied from the FSL slave to the internal rcopy register. In the second clock cycle, data is transferred from the internal rcopy register to the FSL master.

A corresponding software driver routine that can run on the StrongARM and drive this coprocessor is shown in the following listing. We use initialized pointers to provide a convenient abstraction of memory-mapped interfaces. The ARM memory read/write instructions will cause the corresponding FSL protocol to be executed in the GEZEL model. When we transfer this program to the actual MicroBlaze processor, we will need to replace these memory reads/writes with actual MicroBlaze FSL instructions.

  #include <stdio.h>
  
  int main() {
    volatile unsigned int *wchannel = (volatile unsigned int *) 0x80000000;
    volatile unsigned int *rchannel_data = (volatile unsigned int *) 0x80000004;
    volatile unsigned int *rchannel_status = (volatile unsigned int *) 0x80000008;
    int i;
  
    for (i=0; i<5; i++) {
      *wchannel = i;
      while (*rchannel_status != 1) ;
      printf("Received data %u\n", *rchannel_data);
    }
    return 0;
  }
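
Since the memory reads and writes have to be replaced when the program moves to a real MicroBlaze, one option is to hide them behind two small access functions, so that only these functions need to change. The sketch below is an assumption about how such a wrapper could look; fsl_port_t, fsl_put and fsl_try_get are hypothetical names, and the status test follows the driver listing above, which treats a status value of 1 as 'new token available'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper around the memory-mapped FSL emulation. */
typedef struct {
    volatile uint32_t *write;   /* "write"  address of the armfslslave  */
    volatile uint32_t *read;    /* "read"   address of the armfslmaster */
    volatile uint32_t *status;  /* "status" address of the armfslmaster */
} fsl_port_t;

/* Send one token to the coprocessor through the FSL slave. */
void fsl_put(const fsl_port_t *p, uint32_t value) {
    assert(p != 0);
    *p->write = value;
}

/* Non-blocking receive: returns 1 and stores the token when the status
 * address reads 1, and returns 0 without touching the read address otherwise. */
int fsl_try_get(const fsl_port_t *p, uint32_t *value) {
    assert(p != 0 && value != 0);
    if (*p->status != 1)
        return 0;
    *value = *p->read;
    return 1;
}
```

On the ARM the three pointers would be set to 0x80000000, 0x80000004 and 0x80000008, matching the ipparm values in the GEZEL listing, and the loop body becomes a call to fsl_put followed by polling fsl_try_get until it returns 1. On a host machine the pointers can refer to ordinary variables to test the driver logic.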

The C file and the GEZEL file can be cosimulated with the GEZEL-based cosimulator, gplatform. Sample simulation output is as follows.

 > make
 /opt/arm-linux-3.2/bin/arm-linux-gcc -static -O3 fsldrive.c -o fsldrive
 > make sim
 gplatform fslcopy.fdl
 core arm1
 armsystem: loading executable [fsldrive]
 Coprocessor instruction ignored 0xee303110!
 Coprocessor instruction ignored 0xee203110!
 captures data: 0
 Received data 0
 captures data: 1
 Received data 1
 captures data: 2
 Received data 2
 captures data: 3
 Received data 3
 captures data: 4
 Received data 4
 Total Cycles: 71446

Cosimulation interfaces for Microblaze and OPB

GEZEL 2.3 introduces several cosimulation interfaces for Xilinx Microblaze. There is no Microblaze ISS included in the release. Instead, the StrongARM ISS is used as a place-holder for the 32-bit Microblaze RISC. While this approach does not give exact performance estimation, it still allows one to use GEZEL for coprocessor development in a Microblaze-based codesign environment.

The cosimulation interfaces support the following cases:

  • The OPB/IPIF interface in Xilinx FPGAs is supported with the following interfaces:
    • Memory-mapped registers
    • Read- and Write-FIFOs
    • Memory-bus
  • Fast-Simplex Link Interface available on Microblaze Processors.

Cosimulation interfaces for 8051

There are two categories of interfaces for the i8051 ISS.

  • Port-mapped interfaces attach to port P0, P1, P2, or P3 of the 8051 processor. This type of interface is implemented using i8051systemsource and i8051systemsink.
  • Shared-memory interfaces define a shared-memory block attached on the xbus of the 8051 processor. Both GEZEL and the i8051 can read from/write to this memory. This type of interface is implemented using i8051buffer.

Let’s consider a small example of an 8051 cosimulation, a program that will simply transfer data values from the 8051 microcontroller to the GEZEL simulation.

A GEZEL description of the 8051 ‘hello’ coprocessor

 dp hello_decoder(in   ins : ns(8);
                  in   din : ns(8)) {
   reg insreg : ns(8);
   reg dinreg : ns(8);
   sfg decode   { insreg = ins; 
                  dinreg = din; }
   sfg hello    { $display($cycle, " Hello! You gave me ", dinreg); } 
 }
 
 fsm fhello_decoder(hello_decoder) {
   initial s0;
   state s1, s2;
   @s0 (decode) -> s1;
   @s1 if (insreg == 1) then (hello, decode) -> s2;
                        else (decode)        -> s1;
   @s2 if (insreg == 0) then (decode)        -> s1;
                        else (decode)        -> s2;
 }
 
 ipblock my8051 {
   iptype "i8051system";
   ipparm "exec=listing18.ihx";
   ipparm "verbose=1";
 }
 
 ipblock my8051_ins(out data : ns(8)) {
   iptype "i8051systemsource";
   ipparm "core=my8051";
   ipparm "port=P0";
 }
 
 ipblock my8051_datain(out data : ns(8)) {
   iptype "i8051systemsource";
   ipparm "core=my8051";
   ipparm "port=P1";
 }
 
 dp sys {
   sig ins, din : ns(8);
 
   use my8051;
   use my8051_ins(ins);
   use my8051_datain(din);
   use hello_decoder(ins, din);
 }
 
 system S {
   sys;
 }

The first part of the program, lines 1-17, is a one-way handshake that accepts data values and prints them. Of particular interest for this example are the hardware/software interfaces in lines 26-36. The cosimulation interfaces with an 8051 are not memory-mapped but rather port-mapped. The 8051 has four ports, labeled P0 to P3, which are mapped to its internal memory space but which are available as IO ports on the core. These ports are intended to attach peripherals, and in this case are used to attach a GEZEL coprocessor. To learn more about the 8051, refer to the UCR Dalton project (http://www.cs.ucr.edu/~dalton/i8051/) or the numerous other sources of 8051 information on the web. Here is a driver program in C for this coprocessor.

8051 Driver program for the Hello coprocessor

 #include <8051.h>
 enum {ins_idle, ins_hello};
 void sayhello(char d) {
   P1 = d;
   P0 = ins_hello;
   P0 = ins_idle;
 }
 void terminate() {
   // special command to stop simulator
   P3 = 0x55;
 }
 void main() {
   sayhello(3);
   sayhello(2);
   sayhello(1);
   terminate();
 }

The program transfers values to the GEZEL coprocessor using sayhello in lines 3-7. The include file on line 1 is specific to this 8051 processor. Unlike a standard C program, a C program on the 8051 never terminates, and there is no concept of a standard C library. Consequently, there are no printf functions and so on; these would be of little use within a microcontroller. The include file 8051.h contains several definitions, including those of ports P0 to P3. The special function terminate in lines 8 to 11 is used to stop the cosimulation. It writes the hex value '55' to port P3 (this is a specific convention for this simulator).

The simulation proceeds as follows. First compile the 8051 program, using the Small Devices C Compiler (sdcc).

 > sdcc listing18.c

The compiler creates several intermediate files, as well as a hex-dump format of the compiled code in Intel Hex format, listing18.ihx. Next, run the gplatform simulator to execute the cosimulation.

 > gplatform listing17.fdl
 i8051system: loading executable [listing18.ihx]
 0xFF    0x03 0xFF 0xFF
 0x01    0x03 0xFF 0xFF
 9612 Hello! You gave me 3/3
 0x00    0x03 0xFF 0xFF
 0x00    0x02 0xFF 0xFF
 0x01    0x02 0xFF 0xFF
 9753 Hello! You gave me 2/2
 0x00    0x02 0xFF 0xFF
 0x00    0x01 0xFF 0xFF
 0x01    0x01 0xFF 0xFF
 9894 Hello! You gave me 1/1
 0x00    0x01 0xFF 0xFF
 0x00    0x01 0xFF 0x55
 Total Cycles: 9987

The output of the simulation shows the $display output from GEZEL, in addition to a value-change trace of the 8051’s ports (P0 to P3). The 8051 uses many clock cycles; there is one ‘machine cycle’ for each 12 clock cycles. Typically, a single instruction can execute in one machine cycle.
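
The driver pulses P0 with two back-to-back assignments, so it is worth checking that the decoder reacts exactly once per pulse even though the port value stays constant for many clock cycles. The following host-side C sketch mirrors the fhello_decoder state machine. The names hello_model_t, hello_init, hello_clock and hello_hold are hypothetical, and holding each port value for 12 clock cycles is an assumption based on the one-machine-cycle-per-instruction figure above, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of hello_decoder / fhello_decoder. */
typedef enum { H_S0, H_S1, H_S2 } hello_state_t;

typedef struct {
    hello_state_t state;
    uint8_t insreg;     /* registered copy of port P0 */
    uint8_t dinreg;     /* registered copy of port P1 */
    uint8_t seen[8];    /* values reported by the hello action */
    int     nseen;
} hello_model_t;

void hello_init(hello_model_t *m) {
    m->state = H_S0;
    m->insreg = 0;
    m->dinreg = 0;
    m->nseen = 0;
}

/* One clock cycle; conditions use the register values before the update. */
void hello_clock(hello_model_t *m, uint8_t ins, uint8_t din) {
    switch (m->state) {
    case H_S0:
        m->state = H_S1;
        break;
    case H_S1:
        if (m->insreg == 1) {              /* (hello, decode) -> s2 */
            if (m->nseen < 8)
                m->seen[m->nseen++] = m->dinreg;
            m->state = H_S2;
        }
        break;
    case H_S2:
        if (m->insreg == 0)                /* wait for the idle command */
            m->state = H_S1;
        break;
    }
    m->insreg = ins;                       /* sfg decode runs in every state */
    m->dinreg = din;
}

/* Hold the port values constant for a number of clock cycles. */
void hello_hold(hello_model_t *m, uint8_t ins, uint8_t din, int cycles) {
    int i;
    assert(cycles >= 0);
    for (i = 0; i < cycles; i++)
        hello_clock(m, ins, din);
}
```

State s2 is what makes the handshake one-shot: after reporting a value, the decoder ignores the port until it has seen the idle command again. The model does not reproduce the cycle numbers of the simulation output, because it does not model the 8051 instruction timing.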

Cosimulation interfaces for PicoBlaze

The picoblaze is instantiated as a single block in the simulation. For a description of the picoblaze microcontroller, please refer to the Xilinx documentation. The model implemented in GEZEL is a cycle-true implementation based on the instruction-set simulator kpicosim by Mark Six. The encapsulation into a cycle-true interface was designed by Eric Simpson.

Here is a small example of a GEZEL design that uses a picoblaze processor.

 ipblock mypico (out port_id : ns(8);
                 out write_strobe : ns(1);
                 out read_strobe : ns(1);
                 out out_port : ns(8);
                 in  in_port : ns(8);
                 in  interrupt : ns(1);
                 out interrupt_ack : ns(1);
                 in  reset : ns(1);
                 in  clk : ns(1)) {
     iptype "picoblaze";
     ipparm "exec=SMALL.DEC";
     ipparm "verbose=0";
 }
 dp shw(in a    : ns(8); 
        in addr : ns(8); 
        in ws   : ns(1)) {
   reg k : ns(8);
   always {
     k = a;
     $display("* ", $cycle, " P->G: V = ", a, " addr = ", addr, " ws = ", ws);
   }
 }
 dp cnt(out a   : ns(8); 
        in addr : ns(8); 
        in rs   : ns(1)) {
   reg c : ns(8);
   always {
    c = c + 1;
    a = c;
   }
 }
 dp top {
   sig port_id, out_port, in_port : ns(8);
   sig write_strobe, read_strobe, interrupt, interrupt_ack : ns(1);
   sig reset, clk : ns(1);
   use mypico(port_id,      
              write_strobe, 
              read_strobe,  
              out_port,     
              in_port,      
              interrupt,    
              interrupt_ack,
              reset,
              clk);
   use cnt(in_port,  port_id, read_strobe);
   use shw(out_port, port_id, write_strobe);
   always {
     interrupt = 0;
     reset = 0;
     clk   = 0;
   } 
 }  
 system S {
   top;
 }

The design attaches a free-running counter to the input data port of a picoblaze, and prints whatever is generated on the output data port of the picoblaze. Note that the reset and clk ports have no meaning for the GEZEL simulation - they are only there to create an exact pin-compatible copy of the picoblaze processor in the top-level netlist.

The program running on the picoblaze is written in picoblaze assembly. Again, refer to the documentation by Xilinx for a description of available picoblaze instructions. Here is a small program that copies the input to the output, while adding 1.

         ENABLE INTERRUPT
 LOOP:   INPUT SA,25
         ADD SA,01
         OUTPUT SA,10
         JUMP LOOP

We can convert that program into a binary (hex) file with the picoblaze assembler.

 > make
 wine ~/picoblaze/KCPSM3/Assembler/KCPSM3.EXE small.psm >& /dev/null

The resulting program and architecture can next be simulated with gplatform:

 > make sim
 ../../../build/bin/gplatform -c 5000 pb.fdl
 picoblaze: executable [exec=SMALL.DEC]
 (RESET EVENT)
 *  0 P->G: V = 20 addr = b8 ws = 0
 *  1 P->G: V = 20 addr = b8 ws = 0
 *  2 P->G: V = 20 addr = 25 ws = 0
 *  3 P->G: V = 20 addr = 25 ws = 0
 *  4 P->G: V = 20 addr = 25 ws = 0
 *  5 P->G: V = 20 addr = 25 ws = 0
 *  6 P->G: V =  4 addr = 10 ws = 0
 *  7 P->G: V =  4 addr = 10 ws = 1
 *  8 P->G: V =  4 addr = 10 ws = 0
 *  9 P->G: V =  4 addr = 10 ws = 0
 * 10 P->G: V =  4 addr = 25 ws = 0
 * 11 P->G: V =  4 addr = 25 ws = 0
 * 12 P->G: V =  4 addr = 25 ws = 0
 * 13 P->G: V =  4 addr = 25 ws = 0
 * 14 P->G: V =  c addr = 10 ws = 0
 * 15 P->G: V =  c addr = 10 ws = 1
 * 16 P->G: V =  c addr = 10 ws = 0
 * 17 P->G: V =  c addr = 10 ws = 0

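As can be seen, the first output instruction starts at cycle 6 (the write strobe goes high in the second cycle of the output instruction, showing ws = 1 at cycle 7). The data written out at that point is 4. Each picoblaze instruction takes two clock cycles, so the preceding ADD occupies cycles 4 and 5 and the INPUT occupies cycles 2 and 3. The free-running counter shows the value 3 in cycle 3 and the program adds 1, so we conclude that this data was captured in cycle 3. The second pass through the loop is consistent with this: the value c (hexadecimal) written in cycle 15 corresponds to the counter value b captured in cycle 11.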
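The timing in this trace can be reproduced with a little arithmetic. The sketch below is a host-side model, not part of GEZEL. It assumes two clock cycles per instruction, a four-instruction loop (INPUT, ADD, OUTPUT, JUMP) that starts at cycle 2 after ENABLE INTERRUPT, and a counter whose value equals the cycle number, as the trace suggests. All names are hypothetical.

```c
#include <stdint.h>

enum {
    CYCLES_PER_INSTR = 2,  /* assumed: two clock cycles per instruction */
    LOOP_INSTRS      = 4,  /* INPUT, ADD, OUTPUT, JUMP */
    FIRST_INPUT_CYC  = 2   /* INPUT starts after ENABLE INTERRUPT (cycles 0-1) */
};

/* Cycle in which loop iteration k (k = 0, 1, ...) samples in_port:
   the second cycle of the INPUT instruction. */
unsigned capture_cycle(unsigned k) {
    return FIRST_INPUT_CYC + k * LOOP_INSTRS * CYCLES_PER_INSTR + 1;
}

/* Cycle in which iteration k raises write_strobe: the second cycle of
   OUTPUT, which is the third instruction of the loop. */
unsigned strobe_cycle(unsigned k) {
    return FIRST_INPUT_CYC + k * LOOP_INSTRS * CYCLES_PER_INSTR
           + 2 * CYCLES_PER_INSTR + 1;
}

/* Value written to out_port in iteration k: the counter value at capture
   time (assumed equal to the cycle number, modulo 256) plus 1. */
uint8_t output_value(unsigned k) {
    return (uint8_t)(capture_cycle(k) + 1);
}
```

With these assumptions, iteration 0 captures in cycle 3 and writes 4 with the strobe in cycle 7, and iteration 1 captures in cycle 11 and writes hexadecimal c with the strobe in cycle 15, in agreement with the trace.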

A picoblaze is particularly effective in coping with complex control situations. If you find yourself developing FSM after FSM, with no improvement in sight, it may be useful to reconsider your approach to control design, and try to use a picoblaze controller.

Things to keep in mind with cosimulation

In general, the speed of a good instruction-set simulator is far higher than that of the GEZEL kernel. This is because an ISS is developed with the architecture of the processor it must model in mind, which is not possible for the GEZEL kernel. Also, the GEZEL kernel uses scripted simulation, rather than compiled simulation.

As an optimization, the GEZEL simulator uses a strategy of sleep/awake modes, as discussed under The simulation algorithm in GEZEL Standalone Simulation. This mode switching is also important for cosimulation. Where possible, the user should develop the GEZEL hardware model in such a way that periods of idle or inactive operation also imply no datapath register changes and no state changes in the GEZEL controllers. This will result not only in a more energy-efficient implementation (fewer uselessly toggling nets), but also in improved simulation speed.
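The effect of this guideline can be estimated with a simple count of register updates. The sketch below is a host-side illustration only and does not model how the GEZEL kernel decides to sleep. It counts the cycles in which an 8-bit register changes when it increments only while an enable input is set; the function name active_cycles is hypothetical.

```c
#include <stdint.h>

/* Counts the cycles in which an 8-bit register changes value when it
   increments only in cycles where enable[i] is nonzero. */
unsigned active_cycles(const uint8_t *enable, unsigned ncycles) {
    uint8_t reg = 0;
    unsigned changes = 0;
    unsigned i;
    for (i = 0; i < ncycles; i++) {
        uint8_t next = enable[i] ? (uint8_t)(reg + 1) : reg;
        if (next != reg) changes++;   /* a register update in this cycle */
        reg = next;
    }
    return changes;
}
```

A free-running counter corresponds to an enable that is always set, so every cycle contains a register change. A counter that advances only when it is actually read changes in just those cycles and leaves the remaining cycles free of register activity.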