Team AND
An introduction to FPGAs & VHDL
Stéphane Domas
Introduction
FPGAs
• architecture principles,
• an example : the Spartan 3,
• advantages and drawbacks for scientific computing.
VHDL
• syntax and semantics,
• programming and simulation tools.
FPGAs
General architecture
• based on SRAM cells (Static Random Access Memory),
• composed of configurable logic blocks (CLBs),
• mesh to interconnect the blocks,
• I/O cells,
• sometimes specialized blocks : RAM, clock generation, multipliers, ...
• the whole is configurable at will.
FPGAs
Logical blocks
• Architecture :
• one or several Look-Up Tables (LUTs),
• multiplexers,
• one or several output flip-flops (or latches).
• Functionalities :
• LUTs ⇒ simple logical functions on a few bits,
• multiplexers ⇒ combine the outputs of the LUTs,
• flip-flops or latches ⇒ 1-bit memories.
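The logic block described above can be sketched in VHDL. This is only a behavioral model under stated assumptions, not the actual structure of any vendor's CLB: the entity name `lut4_ff`, the generic `INIT` (the truth table of a 4-input LUT) and the port `sel` (the output multiplexer control) are hypothetical names chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of one logic cell: a 4-input LUT, a D flip-flop,
-- and a multiplexer choosing between the direct and registered output.
entity lut4_ff is
  generic ( INIT : std_logic_vector(15 downto 0) := x"8000" ); -- truth table: 4-input AND
  port ( clk : in  std_logic;
         a   : in  std_logic_vector(3 downto 0);  -- LUT inputs
         sel : in  std_logic;                     -- '1' selects the registered output
         y   : out std_logic );
end entity lut4_ff;

architecture rtl of lut4_ff is
  signal lut_out : std_logic;
  signal ff_out  : std_logic;
begin
  -- LUT: the inputs form an address into the 16-entry truth table.
  lut_out <= INIT(to_integer(unsigned(a)));

  -- Output flip-flop: a 1-bit memory updated on each rising clock edge.
  ff : process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer.
  y <= ff_out when sel = '1' else lut_out;
end architecture rtl;
```

Changing `INIT` changes the logical function without changing the structure, which is the essence of configuring a LUT.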
FPGAs
Mesh
• Manhattan type (as a grid or a torus),
• routing matrix at each crossing,
• several channels :
• between direct neighbors,
• between close neighbors,
• sometimes between n-neighbors,
• some dedicated to distribute clock signals.
• . . .
FPGAs
Inputs/Outputs
• several types :
• power supply,
• external clock,
• FPGA initialization and programming,
• general purpose.
• configurable I/O cells ⇒ adaptation to the outside environment (e.g. different logic levels)
FPGAs
Abstraction
• state machine :
• CLB = vertex with n inputs and m outputs of 1 bit,
• interconnection mesh = edges,
• particularities :
• the physical time to propagate a signal over an edge must be considered,
• vertices evolve synchronously or asynchronously
⇒ clock signal distributed to all/some CLBs,
• possible to delay an output of a CLB
⇒ store the state of several 1-bit signals in a "register" thanks to D-type flip-flops.
Spartan 3
• the base : CLBs, IO cells,
• bonus : RAM and multipliers (18 bits),
• DCM = Digital Clock Manager (clock generation, phase shift, ...)
Spartan 3
• 4 slices per CLB,
• LUTs can be used as RAM or shift-register.
Spartan 3
version 3A
version 3A-DSP
For scientific computing
Advantages
• natural parallelism,
• natural task pipelining,
• dynamic and fast reconfiguration,
• easy to integrate within an embedded board.
Drawbacks
• extremely low level programming,
• non-imperative way of designing algorithms,
• binary logic ⇒ computation with integer values (no native floating-point operations),
• low clock frequency (compared to classic processors),
• "big" FPGAs costly,
• development/production environments not free.
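The integer-only constraint is commonly worked around with fixed-point arithmetic. The following is a minimal sketch, assuming an 8-bit Q4.4 format (4 integer bits, 4 fractional bits); the entity name `fxmul`, the port names and the chosen format are illustrative assumptions and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical Q4.4 fixed-point multiplier: a real value v is stored
-- as the integer round(v * 16).
entity fxmul is
  port ( x : in  signed(7 downto 0);    -- Q4.4 operand
         y : in  signed(7 downto 0);    -- Q4.4 operand
         p : out signed(7 downto 0) );  -- Q4.4 product, truncated
end entity fxmul;

architecture rtl of fxmul is
  signal full : signed(15 downto 0);    -- full-precision product, Q8.8
begin
  full <= x * y;
  -- Keep the middle bits: drop the 4 lowest fractional bits and the
  -- 4 highest integer bits. There is no rounding and no saturation,
  -- so results outside the Q4.4 range wrap around.
  p <= full(11 downto 4);
end architecture rtl;
```

For instance, 1.5 is stored as 24 and 2.25 as 36; their integer product is 864, and bits 11 downto 4 of it hold 54, which is 3.375 in Q4.4.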
VHDL : principles
• VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed Integrated Circuit),
• provides an abstraction of the hardware,
• based on a functional description of blocks with n inputs and m outputs (called ports),
• hierarchical aggregation of blocks,
• each block contains "instructions" to manipulate values called signals,
• signal = aggregation of n physical signals of 1 bit.
VHDL : syntax
to create a block
1. description of its I/O ports and generic parameters (entity),
2. description of its structure and its code (architecture).
entity clockgen is
generic ( Tps : Time );
port ( phase : out std_logic );
end entity clockgen;
architecture rtl of clockgen is ...
end architecture rtl;
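The body of `clockgen` is elided above. One possible body, given purely as an illustration, is sketched below: it assumes that `Tps` is the half-period of the generated clock, and the internal signal `clk_i` is a hypothetical name. Because it relies on an `after` clause, this version is meant for simulation only and is not synthesizable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clockgen is
  generic ( Tps : Time );            -- assumed here to be the half-period
  port ( phase : out std_logic );
end entity clockgen;

architecture rtl of clockgen is
  -- Internal copy of the clock: an out port cannot be read back
  -- in VHDL-93, so the toggling is done on a local signal.
  signal clk_i : std_logic := '0';
begin
  clk_i <= not clk_i after Tps;      -- toggle every Tps (simulation only)
  phase <= clk_i;
end architecture rtl;
```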
VHDL : syntax
content of architecture
1. sub-blocks declaration,
2. signal declaration,
3. sub-blocks instantiation,
4. code.
VHDL : syntax
Example :
architecture rtl of div_tb is
component clockgen
generic ( Tps : Time );
port ( phase : out std_logic );
end component;
...
signal ext_clk : std_logic;
...
begin
clkgen0 : clockgen
generic map ( Tps => 5 ns )
port map ( phase => ext_clk );
...
end architecture rtl;
VHDL : syntax
the "code"
• assignment :
signal <= expression
• expression =
• composition of signals using logic, relational or arithmetic operators,
• cast of types (e.g. std_logic_vector → signed),
• conditional assignment.
valsign <= val(3);
valabs <= "0000" & unsigned(val) when (valsign = '0')
else "0000" & unsigned(-val);
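A self-contained version of this absolute-value example is sketched below to show the casts explicitly. It assumes that `val` arrives as a 4-bit `std_logic_vector` and is cast to `signed` before negation (unary minus is defined for `signed`, not for `std_logic_vector`, in `ieee.numeric_std`). The entity name `absval`, the intermediate signal `sval` and the port widths are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity absval is
  port ( val    : in  std_logic_vector(3 downto 0);  -- two's complement value
         valabs : out unsigned(7 downto 0) );        -- absolute value, zero-extended
end entity absval;

architecture rtl of absval is
  signal valsign : std_logic;
  signal sval    : signed(3 downto 0);
begin
  sval    <= signed(val);  -- cast: same bits, now interpreted as signed
  valsign <= val(3);       -- the sign bit is the most significant bit

  -- Conditional assignment: keep the value if positive, negate it otherwise.
  valabs <= "0000" & unsigned(sval) when valsign = '0'
            else "0000" & unsigned(-sval);
end architecture rtl;
```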
VHDL : syntax
the "code" : two execution contexts
• outside a process : concurrent execution
⇒ continuous assignments (i.e. they follow physical changes).
• within a process : sequential execution
⇒ signal assignments only take effect at the end of the process.
VHDL : syntax
example outside process
entity setreset is
port(s,r : in bit; q,qb : out bit);
end setreset;
architecture rtl of setreset is
signal sq,sqb : bit;
begin
q <= sq; -- assignment 1
qb <= sqb; -- assignment 2
sq <= sqb NOR r; -- assignment 3
sqb <= sq NOR s; -- assignment 4
end architecture rtl;
VHDL : syntax
process principles :
• sequential execution within the process,
• sensitivity list = parameters of the process,
• any change in the state of a parameter ⇒ the process is started,
• control instructions allowed (if, case, ...),
• inner variables with immediate assignment allowed,
• a signal should be assigned in a single process only,
• a signal can be read by several processes,
• special case : a process with the clock as a parameter
⇒ the signals assigned on a clock edge are stored in registers, and thus delayed by one clock cycle.
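The difference between a variable (immediate assignment) and a signal (assignment effective at the process end) can be made concrete with a small clocked process. This is a sketch only; the entity `varsig` and all of its port and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity varsig is
  port ( clk     : in  std_logic;
         d       : in  unsigned(3 downto 0);
         via_var : out unsigned(3 downto 0);
         via_sig : out unsigned(3 downto 0) );
end entity varsig;

architecture rtl of varsig is
  signal sig : unsigned(3 downto 0) := (others => '0');
begin
  cmp : process (clk)
    variable v : unsigned(3 downto 0);
  begin
    if rising_edge(clk) then
      v := d + 1;       -- variable: updated immediately
      via_var <= v;     -- therefore receives d + 1 on this edge
      sig <= d + 1;     -- signal: the new value is only scheduled
      via_sig <= sig;   -- therefore receives the value sig held before this edge
    end if;
  end process;
end architecture rtl;
```

Under these assumptions `via_var` follows `d + 1` with one cycle of delay, while `via_sig` needs two cycles, because `sig` itself behaves as a register.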
VHDL : syntax
example of an asynchronous process : subtraction/comparison
signal a : unsigned(3 downto 0);
signal b : unsigned(3 downto 0);
signal sum : unsigned(3 downto 0);
signal cmp : std_logic;
...
diffcmp : process (a,b)
  variable s : signed(3 downto 0);
begin
  s := signed(a - b);   -- a - b is unsigned : explicit cast needed
  sum <= unsigned(s);   -- totally useless : overridden by the last assignment to sum
  cmp <= '1';
  if a < b then
    cmp <= '0';
    s := -s;
  end if;
  sum <= unsigned(s);   -- despite its name, sum holds |a - b|
end process;
VHDL : syntax
example of a synchronous process : two cycles delay
signal s : std_logic_vector(2 downto 0);
signal sdly : std_logic_vector(2 downto 0);
signal r1 : std_logic_vector(2 downto 0);
signal r2 : std_logic_vector(2 downto 0);
...
dly2cy : process (clk)
begin
  if rising_edge(clk) then
    r1 <= s;    -- first register stage
    r2 <= r1;   -- second stage : receives the previous value of r1
  end if;
end process;
sdly <= r2;
VHDL : syntax
example of mixed process : delay + asynchronous reset
signal s : std_logic_vector(2 downto 0);
signal sdly : std_logic_vector(2 downto 0);
signal r1 : std_logic_vector(2 downto 0);
signal r2 : std_logic_vector(2 downto 0);
...
dly2cyraz : process (clk,reset)
begin
  if reset = '1' then   -- asynchronous : does not wait for a clock edge
    r1 <= "000";
    r2 <= "000";
  elsif rising_edge(clk) then
    r1 <= s;
    r2 <= r1;
  end if;
end process;
sdly <= r2;
VHDL : coding
Solutions
• "by hand" in text files :
• long and extremely tedious,
• a lot of syntax mistakes.
• automatic generation (Simulink, HDL code, coregen, ...)
• design made by drawing boxes,
• each box has a fixed functionality or can contain MATLAB instructions,
• costly tools,
• generated code needs modification to be usable on real FPGAs.
VHDL : coding
Simulink example
VHDL : simulation
Solutions
• GPL solution : ghdl and gtkwave
• need to write VHDL code (called a testbench) that :
• instantiates the design + a clock generator if needed,
• defines a process to generate input signals for the design.
• VHDL compilation (via ghdl)⇒syntax checking,
• creation of an executable to simulate the behavior of the design,
• this executable produces traces of the signals,
• display and analysis of these traces via gtkwave.
• commercial solutions (MATLAB + ModelSim)
• automatic generation of the testbench,
• co-simulation,
• integrated trace visualization.
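As an illustration of the ghdl flow, here is a minimal testbench sketch for the two-cycle delay with asynchronous reset shown earlier. Wrapping the process in an entity named `dly2`, the testbench name `dly2_tb`, the 10 ns clock period and the file name used in the comments are all assumptions made for this example.

```vhdl
-- Possible ghdl/gtkwave session (file name is hypothetical):
--   ghdl -a dly2.vhd
--   ghdl -e dly2_tb
--   ghdl -r dly2_tb --vcd=dly2.vcd
--   gtkwave dly2.vcd
library ieee;
use ieee.std_logic_1164.all;

-- Design under test: two-cycle delay with asynchronous reset.
entity dly2 is
  port ( clk, reset : in  std_logic;
         s          : in  std_logic_vector(2 downto 0);
         sdly       : out std_logic_vector(2 downto 0) );
end entity dly2;

architecture rtl of dly2 is
  signal r1, r2 : std_logic_vector(2 downto 0);
begin
  dly2cyraz : process (clk, reset)
  begin
    if reset = '1' then
      r1 <= "000";
      r2 <= "000";
    elsif rising_edge(clk) then
      r1 <= s;
      r2 <= r1;
    end if;
  end process;
  sdly <= r2;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity dly2_tb is
end entity dly2_tb;

architecture sim of dly2_tb is
  signal clk   : std_logic := '0';
  signal reset : std_logic := '1';
  signal s     : std_logic_vector(2 downto 0) := "000";
  signal sdly  : std_logic_vector(2 downto 0);
  signal done  : boolean := false;
begin
  dut : entity work.dly2
    port map ( clk => clk, reset => reset, s => s, sdly => sdly );

  -- Clock generator: rising edges at 5, 15, 25 ns; stops once done is true.
  clkgen : process
  begin
    while not done loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process;

  -- Stimulus: release the reset, apply a value, check it after two edges.
  stimulus : process
  begin
    wait for 12 ns;                 -- reset held across the first edge
    reset <= '0';
    s <= "101";
    wait until rising_edge(clk);    -- 15 ns: r1 captures "101"
    wait for 1 ns;
    assert sdly = "000" report "sdly changed too early" severity error;
    wait until rising_edge(clk);    -- 25 ns: r2 captures "101"
    wait for 1 ns;
    assert sdly = "101" report "sdly should equal s after two cycles" severity error;
    done <= true;
    wait;
  end process;
end architecture sim;
```

The `done` signal lets the simulation end on its own once no more events are scheduled, so no stop time needs to be passed to the executable.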
VHDL : simulation
Visualization example : a divider
VHDL : execution
Principles
• generation of the configuration data, called the bitstream, via a synthesis and placement/routing tool (ISE),
• uploading the bitstream (via USB, PCI, ...),
• "execution" starts at the end of the upload.
• reading/writing values through I/O pins :
• with external peripherals,
• with another FPGA or processor,
• with USB/PCI/Ethernet/... controllers.
• on some FPGA boards :
• a part of the bitstream can be uploaded on its own (partial reconfiguration),
• bitstream stored in a flash memory and automatically loaded when powering the board