Skip to content

This repository contains the work done as part of the 5 Workshop on FPGA - Fabric, Design and Architecture sponsored by the OSFPGA Foundation

Notifications You must be signed in to change notification settings

NAvi349/osfpga-fda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

88 Commits
 
 
 
 
 
 
 
 

Repository files navigation

image

Table of Contents

image

Day 1

FPGA Introduction

Flow

We shall be looking at a 4-bit counter example and the RISC-V RVMyth Processor through two flows:

  • Vivado
  • Skywater OpenSource FPGA (SOFA)

FPGA and FPGA Architecture

Programmable Logic Devices.

  • The hardware can be customized after the manufacturing by programming the device
  • PLA - Produces Minterms, Variable AND Gates, Input to OR Gates is fixed. But sequential elements cannot be programmed using this
  • CPLD - Complex Programmable Logic Devices
  • FPGA - Field Programmable Gate Arrays

FPGA consists of Lookup Tables, Flip Flops and CLBs (Configurable Logic Blocks)

ASIC Vs FPGAs

ASIC FPGA
Not programmable Reprogrammable
Final Stage implementation Useful in prototyping a design
Huge time required to design Less Time Comparatively
RTL to Layout RTL to Bitstream generation

Applications

  • FPGAs are good for implementing algorithms which are able to run parallely.
  • Hardware acceleration implementation of Neural Networks.
  • Embedded Systems
  • Machine Learning

FPGA Architecture

  • Can be used to implement any combinatorial and sequential design.

image

(Diagram taken from Course Slide for reference)

An FPGA Architecture consists of

  • Configurable Logic Blocks

    • It contains an N-input Lookup Tables, Carry chain, Multiplexer and a Flip Flop.
  • Programmable Interconnects

    • The wires which connect the CLBs.
  • Programmable I/O

Bitstream

  • User generates a bitstream for their design specification, and uploads it into the board.
  • Bitstream describes what CLBs that have to be connected and their function.
  • It also defines the connection between the CLBs and I/Os.

Lookup table

  • The Lookup table consists of all the possible minterms of a N-input function.
  • Basically a function can be expressed as Sum of Products form or Sum of Minterms form.
  • If we want to implement a logic function, the corresponding minterms of the function are selected by using the Mux.
  • The output of the Mux feeds into the Flip Flop.
  • A $2^N$ - input Mux can be used to implement any N - input Function or some N + 1 input function with an additional NOT gate.

FPGA Programming

  • HDL - Verilog
  • HLS - C/C++/Python

FPGA Design Methodology

graph TD;
  A[Architectural Description] --> B[RTL Design and testbench]
  B --> C[Behavioral Simulation]
  C --> D["Synthesis (Timing Analysis)"]
  D --> E["Implementation Place/Route (Timing Analysis)"]
  E --> F[Bitstream generation]
  F --> G[FPGA Programming]

  H[Timing constraints] --> D
  I[Pin Assignments] --> D
Loading

Synthesizable and Non - Synthesizable

The following are non-synthesizable:

  • User defined delays
  • Inital block
  • nmos, pmos primitives
  • Dynamic Memory allocation
  • Infinite loops

Basys 3 FPGA Board

image (Image taken from slide for reference)

Tools used

Different ways of programming

  • Local Programming using a board
  • Remote Programming - A FPGA connected to a remote server: I/O through Virtual I/O.

Counter Example in Vivado

  • We use a frequency synthesizer to generate a slow clock to observe the outputs
Code
module counter(clk,reset,count);
input clk,reset;
output reg [3:0] count = 4'b0000;
reg [25:0] count_reg;
reg clk_div = 1'b0;

always @ (posedge clk)
begin
if (reset)
    begin
        clk_div <= 1'b0;
        count_reg <= 26'd0;
        ///count <= 4'b0;
     end
else
    begin
        count_reg <= count_reg + 1;
    /// if (count_reg == 26'h3ffffff) // for synthesis
        if (count_reg == 26'd12) // for simulation
        begin
            clk_div <= ~ clk_div;
            count_reg <= 26'd0;
        end
    end
 end

  always @ (posedge clk_div) begin
   if (reset)
   begin
      count <= 4'b0000;
   end
   else
   begin
      count <= count + 1'b1;
   end
  end
endmodule

Vivado-counter example

i. Type vivado in the terminal.

ii. Create a new project.

iii. Do the following settings.

image

iv. Click finish.

v. Add source files

  • Set the testbench as top module while during Behavioural Simulation

vi. Run Behavioural Simulation

image

  • The output counter gets incremented at every posedge of the slow clock

vii. Now set the design as top unit (not the testbench)

vii. Run Elaboration

image

viii. Do the I/O pin assignment as follows and save as constraints.xdc file.

image

Static Timing Analysis

Setup Time (Max. Delay) Constraints

image

  • The data should be available Tsetup before the capturing clock edge comes.
  • We need fast cells to satisfy Max Delay constraints.
  • Setup time constraints are always calculated with respect to the next clock edge.
Hold Time (Min. Delay) Constraints

image

  • Data should be stable for T-hold after the clock edge to be captured properly.
  • If the next data comes earlier than T-hold, the current data will not be captured properly.
  • We need slow cells to satisfy hold time constraints
Setup Slack
  • Setup slack = Required Arrival Time(Clock setup time) - Arrival Time (Data)
  • Delay in clock helps us
  • Delay in data is not feasible here
Hold Slack
  • Hold Slack = Arrival Time(Data) - Required Arrival Time(hold time of FF)
  • Here delay in data is helpful
  • But delay in clock is not desirable.

Synthesis

i. Now run synthesis.

image

ii. Primary clock is selected. image

iii. No generated clock. image

iv Skip to finish.

image

v. Synthesized netlist image

Implementation

image

Resource Utilization

image

Bit Stream

image

Programming

  • Click program device to send the bitstream to the FPGA board.

Timing Analysis

This can be done after synthesis or after implementation.

i. Click on Report Timing Summary

image

ii. We can also view the detailed path report for each path. It specifies the starting and endpoint.

image

  • As the slack is positive, all the timing constraints are met.

Power

image

Area

  • Click on report Resource Utilization

image

image

  • Summary

image

Virtual Input/Output Counter

  • Using this method we can program a remote FPGA connected to a cloud.
  • The VIO provides the inputs to the FPGA and probes the output.
  • VIO output is the input to the FPGA
  • VIO input is the output of the FPGA

image

VIO Code

  • Make these changes. The reset and clock are come from VIO. So they are no longer input ports. The counter output will be probed by the VIO, so it is no longer an output port.

VIO Inputs: Slow Clock, Counter Output VIO Outputs: Reset

image

i. Click on IP Catalog in the Project Manager.

ii. Search VIO

image

iii. Configure as shown.

image

image

image

iv. Click generate

v. Go to IP Sources and click on Instantion Template and copy these lines from the .veo file.

image

vi. Use this to instantiate the VIO in the .v file

image

Elaboration and Pin Assignment

image

Bitstream Generation

image

Day 2

Intro to OpenFPGA

  • It is an open source framework to quickly generated a custom FPGA fabric specific to our design
  • It is automated
  • Reduces development time
  • Generally it takes 24 hours to produce a production ready layouts

Custom FPGAs

  • For certain applications custom-made FPGAs provide huge performance gain.
  • Custom FPGA architectures are expensive to produce
  • OpenFPGA allows us to customize our own FPGA fabrix using a set of templates
  • Generate verilog netlists based on an XML file - VPR ( Versatile Place and Route )
  • Automatically generates Verilog testbenches'
  • Bitstream generation based on XML format

Visit this repo to install OpenFPGA: https://github.com/lnis-uofu/OpenFPGA

Verilog to Routing (VTR)

graph TD;
  A[Verilog Digital Circuit] --> E["Elaboration & Synthesis (Odin II)"]
  B[XML FPGA Architecture] --> E
  E --> L
  L["Logic Optimization & Technology Mapping (ABC)"] --> P["Pack the netlist, Placement, Routing and Timing Analysis (VPR)"]

  P --> O[Output Statistics]
  P --> PI[Post-Implementation netlist]
Loading

VTR Flow

i. We shall first run VPR on a Pre-synthesized circuit. ii. Then, we shall run entire VPR Flow from RTL

VPR Flow on a Pre-Synthesized Design

i. Packing - combines the technology mapped netlist into (CLBs) Complex Logic Blocks. ii. Placement - postition of the CLBs - it produces a .place file. iii. Routing - interconnection between blocks iv. Analysis - Analyzes the implementation

Input: Blif file, Earch

This is the general structure of a Earch.xml file. It describes the FPGA Architecture.

<!--
-->
<architecture>
  <models>
    <model name = "">
    </model>
  </models>
  <tiles>
    <tile name = "">
    </tile>
  </tiles>
  <layout> <!-- Grid Layout, aspect ratio -->
  </layout>
  <device> <!-- Transistor definitions -->
  </device>

  <switchlist>
  </switchlist>

  <segmentlist>
  </segmentlist>

  <directlist>
  </directlist>

</architecture>

.blif file

.names - LUTS

Lab on VPR on a Pre-Synthesized Design

i. Create a working directory.

mkdir -p vtr_work/quickstart/vpr_tseng
cd ./vtr_work/quickstart/vpr_tseng

ii. Now pass the FPGA architecture and the technology mapped netlist of the design.

$VTR_ROOT/vpr/vpr \
$VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
$VTR_ROOT/vtr_flow/benchmarks/blif/tseng.blif \
--route_chan_width 100
--disp on

image

image

image

image

iii. Additional parameter can be applied here

$VTR_ROOT/vpr/vpr \
$VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
$VTR_ROOT/vtr_flow/benchmarks/blif/tseng.blif \
--route_chan_width 100
--analysis
--disp on

Output of the VPR Step

image

  • .net file: post packed circuit, circuit in terms of CLBs.
  • .place file: how the cells are placed
  • .route file: interconnects
  • .log file:

Timing Reports

VPR Setup Slack

image

  • Here, the setup slack is violated as no clock constraints are specified.
VPR Hold Slack

image

Constraints

i. Create a tseng.sdc file

create_clock -period 10 -name pclk
set_input_delay -clock pclk -max 0 [get_ports {*}]
set_output_delay -clock pclk -max 0 [get_ports {*}]

ii. Run include sdc file as part of the command

$VTR_ROOT/vtr_flow/arch/timing/EArch.xml $VTR_ROOT/vtr_flow/benchmarks/blif/tseng.blif --route_chan_width 100 --disp on --sdc_file tseng.sdc
  • Now we can see that the setup slack is met.

image

Counter VTR Flow

VTR Synthesis and Simulation

i. Run the automated VTR Flow command.

$VTR_ROOT/vtr_flow/scripts/run_vtr_flow.py \
      /home/knavin2002/Desktop/openfpga/Day\ 2/vtr_work/counter.v \
      $VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
      -temp_dir . --route_chan_width 100

image

ii. We can run vpr (flow analysis) on the pre-vpr file.blif file.

$VTR_ROOT/vpr/vpr $VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
                  counter --circuit_file counter.pre-vpr.blif \
                  --route_chan_width 100 \
                  --analysis --disp on

image

iii. Now run the entire VPR Flow

$VTR_ROOT/vpr/vpr $VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
                  counter --circuit_file counter.pre-vpr.blif \
                  --route_chan_width 100 --disp on

image

iv. Generation of Post implementation netlist

$VTR_ROOT/vpr/vpr $VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
                  counter.pre-vpr.blif \
                  --gen_post_synthesis_netlist on

image image

  • To run the post implementation netlist, we need primitves file

v. Simulation in Xilinx Vivaldo. Include post_synthesis file, testbench and primitives file

image

  • Include the path to the .sdf file correctly in the testbench.

image

image

VTR Timing Analysis

  • We have to pass the .blif to the VPR Flow otherwise the tool will have its own constraints.

i. Create a constraints.sdc file. Create constraints as follows. Also change the name of the clock in the pre-vpr.blif as up_counter_clk.

create_clock -period 10 up_counter_clk
set_input_delay -clock up_counter_clk -max 0 [get_ports {*}]
set_output_delay -clock up_counter_clk -max 0 [get_ports {*}]

ii. Now run VPR flow with te timing constraints.

  $VTR_ROOT/vpr/vpr \
  $VTR_ROOT/vtr_flow/arch/timing/EArch.xml \
  counter.pre-vpr.blif --route_chan_width 100 \
  --sdc_file /home/knavin2002/Desktop/openfpga/Day\ 2/tseng.sdc

image

iii. We can check whether the setup and hold time constraints are met.

image

iv. Now change the clock period to 5ns in .sdc file.

image

v. Check the setup and hold constraints.

image

image

  • They are still met.

Area Analysis

i. Open vpr_stdout.log file.

image

image

image

Power Analysis

i. Include the -power and the cmos_tech xml file the vtr flow command.

  $VTR_ROOT/vtr_flow/scripts/run_vtr_flow.py \
  /home/knavin2002/Desktop/openfpga/Day\ 2/vtr_work/counter.v \
  $VTR_ROOT/vtr_flow/arch/timing/EArch.xml -power -cmos_tech \
  $VTR_ROOT/vtr_flow/tech/PTM_45nm/45nm.xml \
  -temp_dir . --route_chan_width 100

image

image

Comparison of Basys3 and VTR Flow

  • Now we shall consolidating all the results from the Vivado flow and the VTR Flow of the counter design.

Post Implementation Timing

  • .sdc file
  • Clock period 10ns or 100 MHz
Parameter Basys3 VTR Earch
Technology 28 nm 40 nm
Worst Negative Slack - Setup 5.77 ns 8.54 ns
Worst Negative Slack - Hold 0.23 ns 0.293 ns
  • Minimum Slack
Parameter Basys3 (ns) VTR Earch (ns)
Minimum Time Period 3.5 1.8
Worst Negative Slack - Setup 0.41 0.34
Worst Negative Slack - Hold 0.33 0.293

Area Comparison

Parameter Basys3 VTR Earch (ns)
LUTs 23 18
I/O 6 7
Flip Flops 31 4

Power Comparison

Parameter Basys3 VTR Earch (ns)
Clocks 1e-3 1e-4
Signals 1e-3 1.4e-5
Logic 1e-3 8.2e-5
I/O 1.1e-2 0
Dynamic 1.2e-2

Day 3 RISC - V core Programming on Vivado

RTL-to-Synthesis

Behavioural Simulation

i. Create a new project and add the core as a source and add a testbench as simulation source.

ii. Run Behavioural Simulation.

image

  • We can see the output is 45 as expected.

Elaboration

i. Assign pin number as shown.

image

ii. We shall use ILA to observer the outputs. So remove the output ports and declare it as reg.

image

iii. Run Elaboration again

image

iv. From IP Catalog instantiate ILA and connect to the RISC - V core

image

v. Run Synthesis and then add constraints image

vi. Run synthesis

image

Synthesis-to-bitstream

Core Resource Utilization

image

Schematic

image

RISC - V Implementation

i. Run implementation.

image

ii. Report timing summary

image

  • All the user constraints are met.

Bitstream-Generation for RISC - V core

i. Run Bitstream generation.

image

image

Day 4 - SOFA FPGA Fabric IP

Intro to SOFA

i. Clone the repository

git clone https://github.com/lnis-uofu/SOFA.git

ii. Make FPGA1212_QLSOFA_HD_PNR/ as the current directory.

iii. Copy the counter.v file to the benchmark folder.

image

iii. Open task_simulation.conf

  gedit FPGA1212_QLSOFA_HD_task/config/task_simulation.conf

image

iv. Add the following lines under respective sections

[BENCHMARKS]
bench0=${PATH:TASK_DIR}/BENCHMARK/counter_new/counter.v

[SYNTHESIS_PARAM]
bench0_top = up_counter

image

  • generate_testbench.openfpga will call the vpr flow

image

  • architecture is available under arch directory.

image

v. Run the makefile

make runOpenFPGA

image

vi. View the generated files

 cd FPGA1212_QLSOFA_HD_task/latest/vpr_arch/up_counter/MIN_ROUTE_CHAN_WIDTH/

image

SOFA counter Statistics

image

image

SOFA counter Timing Analysis

i. Create counter.sdc file in the counter_new folder in BENCHMARK.

create_clock -period 20 clk
set_input_delay -clock clk -max 0 [get_ports {*}]
set_output_delay -clock clk -max 0 [get_ports {*}]

image

ii. Open generate_testbench.openfpga and the counter.sdc file in the vpr arguments.

image

iii. Run the makefile command

make runOpenFPGA

image

SOFA counter Timing Report

  • Setup slack

image

  • Hold Slack

image

SOFA counter Post Implementation

i. Add this argument in the vpr flow in the generate_testbench.openfpga file.

--gen_post_synthesis_netlist on

image

ii. Run makefile again

image

  • post implementation file

image

iii. Run simulation through vivado.

image

SOFA Power Analysis

i. Change this two lines in the task_simulation.conf file.

image

ii. Add this new line the generate_testbench.openfpga file

vpr ${VPR_ARCH_FILE} ${VPR_TESTBENCH_BLIF} --clock_modeling ideal \
  --device ${OPENFPGA_VPR_DEVICE_LAYOUT} --route_chan_width ${OPENFPGA_VPR_ROUTE_CHAN_WIDTH} \
  --absorb_buffer_luts off --power \
  --activity_file /home/knavin2002/Desktop/openfpga/Day-4/SOFA/FPGA1212_QLSOFA_HD_PNR/FPGA1212_QLSOFA_HD_task/latest/vpr_arch/up_counter/MIN_ROUTE_CHAN_WIDTH/up_counter_ace_out.act \
  --tech_properties /home/kunalg123/Desktop/vtr-verilog-to-routing/vtr_flow/tech/PTM_45nm/45nm.xml

image

iii. Add the options for power analysis in vpr_arch.xml file.

<!-- added for power analysis  -->
  <power>
      <local_interconnect C_wire="2.5e-10"/>
      <mux_transistor_size mux_transistor_size="3"/>
      <FF_size FF_size="4"/>
      <LUT_transistor_size LUT_transistor_size="4"/>
  </power>
  <clocks>
    <clock buffer_size="auto" C_wire="2.5e-10"/>
  </clocks>

image

iv. Run the makefile command again.

image

v. See the output in counter.power.

image

Day 5 SOFA RISC - V Core

  • We shall fall the same steps for the RISC - V RVMyth Core

i. Git clone the SOFA repo as before.

ii. Make these changes in the corresponding files. Also refer to the files in the Day 5 folder. Comment lines 157 and 158, line 259 in the vpr.xml.

image

image

image

iii. Run makefile command

image

iv. Open openfpgashell.log.

image

RISC - V Resource Utilization

  • Open vpr_stdout.log for the output statistics.

image

  • Logic Elements

image

SOFA RISC - V Timing Reports

i. Create .sdc file

create_clock -period 200 clk
set_input_delay -clock clk -max 0 [get_ports {*}]
set_output_delay -clock clk -max 0 [get_ports {*}]

ii. Pass the .sdc file as argument in generate_testbench.openfpga

image

image

iii. Run the makefile command.

  • Setup slack

image

  • Hold slack

image

SOFA RISC - V Post Implementation

i. Add the required argument.

image

ii. Run makefile

image

iii. View the core_post_synthesis.v

image

  • Check the waveform using Vivado simulation. It is correct (45 = Sum of 1 to 9).

image

Acknowledgements

  • Kunal Ghosh, Founder, VSD
  • Nandita Rao, Course Instructor
  • Dr. Xifan Tang, OpenFPGA and Chief Engineer RapidSilicon

About

This repository contains the work done as part of the 5 Workshop on FPGA - Fabric, Design and Architecture sponsored by the OSFPGA Foundation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published