FPGA Basics: LUTs, CLBs, Slices, and Logic Cells

03/03/2023

FPGAs, or Field Programmable Gate Arrays, are highly versatile integrated circuits that can be reprogrammed by the user after manufacturing, providing flexibility in hardware design. Using hardware description languages (HDLs) such as Verilog or VHDL, users describe the circuit they want, and the FPGA’s configurable hardware is set up to implement it. This allows a single chip to be reconfigured into just about any digital circuit. At the heart of an FPGA lies an array of configurable logic blocks, which can be connected together to create complex sequential and combinational logic circuits. This makes FPGAs a popular choice for applications such as digital signal processing, image processing, and network processing.

Architecture of a Typical FPGA

The bulk of the work done by an FPGA is performed by Configurable Logic Blocks (CLBs), the components that are configured to implement a given logic circuit. Each CLB is composed of groups of elements known as slices, with each slice containing logic elements known as logic cells. The number of slices per CLB (and logic cells per slice) varies between manufacturers. For the purpose of demonstration, we will assume each CLB contains 4 slices and each slice contains 2 logic cells.

Basic Architecture of an FPGA

Programmable Logic Arrays (PLA)

In order to gain a comprehensive understanding of field-programmable gate arrays (FPGAs), it is essential to first grasp the fundamental concept of programmable logic arrays (PLAs). A PLA is composed of an array of AND gates whose outputs are combined by an OR gate (one OR gate per output in a multi-output PLA). These arrays are manufactured as blank slates, meaning that the AND gates are not yet connected to the inputs. The designer of a logic circuit begins by deriving a sum-of-products equation from a truth table. This means identifying the rows whose output is 1, writing a product term for each of those rows, and then ORing these terms together.

Generation of a sum-of-products equation from a truth table
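The derivation step can be sketched in Python. This is a minimal illustration, not part of any FPGA tool flow. The function names `minterm` and `sum_of_products` are hypothetical. The truth table is reconstructed from the equation A’B’C’+AB’C+ABC used throughout this post, which has a 1 on rows 000, 101 and 111.

```python
from itertools import product

VARS = ("A", "B", "C")


def minterm(row):
    """Product term for one truth-table row, e.g. (1, 0, 1) -> "AB'C"."""
    return "".join(v if bit else v + "'" for v, bit in zip(VARS, row))


def sum_of_products(truth_table):
    """OR together the product terms of every row whose output is 1."""
    rows = product((0, 1), repeat=len(VARS))   # 000, 001, ..., 111
    return "+".join(minterm(row) for row in rows if truth_table[row])


# Truth table for the running example: output is 1 on rows 000, 101 and 111.
table = {row: int(row in {(0, 0, 0), (1, 0, 1), (1, 1, 1)})
         for row in product((0, 1), repeat=3)}
print(sum_of_products(table))  # A'B'C'+AB'C+ABC
```

The result is the canonical (unminimized) form, with one product term per 1-row. This is exactly what the PLA wiring below needs.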

Next, the designer connects each input (or its complement) to the appropriate AND gates according to the equation, using one AND gate per product term. Once this is complete, the outputs of the AND gates are combined by the OR gate to create the final output, the “sum” in sum-of-products.

Example of connections formed on a PLA
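A PLA like this can be modelled, as a rough sketch, by recording which literals are wired into each AND gate. The encoding used here is a list of `(input index, polarity)` pairs per AND gate. That encoding and the names `AND_PLANE`, `and_gate` and `pla_output` are assumptions made for illustration. In a real PLA these connections are physical programmable links.

```python
from itertools import product

# Each AND gate lists the literals wired into it as (input index, polarity):
# polarity 1 means the input itself is connected, 0 means its complement is.
# This wiring encodes A'B'C' + AB'C + ABC (inputs indexed A=0, B=1, C=2).
AND_PLANE = [
    [(0, 0), (1, 0), (2, 0)],  # A'B'C'
    [(0, 1), (1, 0), (2, 1)],  # AB'C
    [(0, 1), (1, 1), (2, 1)],  # ABC
]


def and_gate(inputs, connections):
    """Output 1 only if every connected literal evaluates to 1."""
    return int(all(inputs[i] == polarity for i, polarity in connections))


def pla_output(inputs, and_plane=AND_PLANE):
    """The OR gate combines (sums) the outputs of all AND gates."""
    return int(any(and_gate(inputs, conns) for conns in and_plane))


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, pla_output((a, b, c)))
```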

Look Up Tables

Look-up tables (LUTs) carry most of the functionality of the FPGA, as they are the elements that actually implement the logic circuits (similarly to PLAs). LUTs are configurable by the user and can be connected together to create complicated systems.

Look Up Tables as a Black Box

To me, the easiest way to understand a complex concept is to begin at the fundamental level and gradually add more intricate details until a comprehensive understanding is achieved. For logic elements, this typically means starting with a definition of the element as a black box. The black box represents the element’s functionality (the “what”) without delving into its hardware implementation (the “how”), giving a holistic view of the element’s behavior:

Black Box Representation of LUT

The black box representation of the LUT shows 3 input lines and 1 output line. Functionally, the LUT must be able to implement any combinational logic function of its three inputs (such as the sum-of-products equation derived above), translating each input combination into the proper logical output. It is important to note that at the black box level the LUT and the PLA are identical.

Functionality of the LUT

The black box definition of the LUT can be expanded by including the two most important elements that make up a LUT: an SRAM array and a multiplexer.

An SRAM cell is a single memory bit that holds its value as long as power is applied. Unlike DRAM, it does not need to be refreshed, because the bit is stored in a pair of cross-coupled inverters, similar to the latches we utilized in the creation of registers for my discrete computer project. Multiplexers, on the other hand, are logic elements that have n select lines and 2^n data input lines. The value on the select lines chooses one of the 2^n inputs, which is connected directly to the output so that its value appears there.
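As a small sketch of the multiplexer’s behavior, the select lines are read as a binary number that indexes the data inputs. The name `mux` and the most-significant-first ordering of the select bits are assumptions for this example.

```python
def mux(data_inputs, select_bits):
    """Pass the data input chosen by the select lines through to the output.

    With n select lines there must be 2**n data inputs. The select bits are
    read most-significant first, so (1, 1, 0) selects input 6.
    """
    n = len(select_bits)
    if len(data_inputs) != 2 ** n:
        raise ValueError(f"expected {2 ** n} data inputs for {n} select lines")
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit   # build the binary number from the select bits
    return data_inputs[index]


print(mux([0, 0, 0, 0, 0, 0, 1, 0], (1, 1, 0)))  # 1
```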

The LUT functions by storing the expected output values of the truth table in the SRAM array. When the multiplexer selects a particular SRAM cell, the value stored there (the expected value from the truth table) is passed to the output. Another way to think of it is to treat each combination of inputs as an address that selects the specific memory cell holding the corresponding output value from the truth table. For example, a multiplexer with 3 select lines chooses from up to 8 (2^3) input lines, so a 3-input LUT needs 8 SRAM cells, one per row of the truth table.

With this concept in mind, the black box model of the LUT can be expanded to the following diagram:

Updated model of the LUT
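Putting the two pieces together, a 3-input LUT can be sketched as 8 stored bits read through an 8:1 multiplexer. The class name `LUT3` and the choice of A as the most significant address bit are assumptions for illustration. Because the same hardware implements whatever the 8 bits encode, loading a different pattern changes the function. The usage below configures the LUT as a 3-input XOR (odd parity), a function chosen only as an example.

```python
class LUT3:
    """3-input LUT: 8 SRAM bits read through an 8:1 multiplexer."""

    def __init__(self, sram_bits):
        if len(sram_bits) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 SRAM bits")
        self.sram = list(sram_bits)

    def __call__(self, a, b, c):
        address = (a << 2) | (b << 1) | c   # inputs act as the mux select lines
        return self.sram[address]


xor3 = LUT3([0, 1, 1, 0, 1, 0, 0, 1])  # 1 when an odd number of inputs are 1
print([xor3(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
# [0, 1, 1, 0, 1, 0, 0, 1]
```

Nothing about the LUT structure changes between functions; only the SRAM contents do, which is what makes the LUT programmable.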

Now let’s say we configure the SRAM cells according to the equation A’B’C’+AB’C+ABC derived from the truth table above. The SRAM cells associated with A’B’C’ (000), AB’C (101), and ABC (111) are set HIGH. This means that whenever one of these cells is selected, the output will be HIGH, while every other input combination will read a LOW value from the SRAM.

LUT Configured to the equation A’B’C’+AB’C+ABC
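The SRAM contents for this configuration can be computed by evaluating the equation once per address, as in the sketch below. It again assumes A is the most significant address bit; `target` and `lut_read` are hypothetical names.

```python
from itertools import product


def target(a, b, c):
    """The running example: A'B'C' + AB'C + ABC."""
    return int((not a and not b and not c) or (a and not b and c) or (a and b and c))


# One SRAM bit per address 000, 001, ..., 111 (A is the most significant bit).
sram = [target(a, b, c) for a, b, c in product((0, 1), repeat=3)]
print(sram)  # [1, 0, 0, 0, 0, 1, 0, 1]


def lut_read(a, b, c):
    """The multiplexer uses the inputs as an address into the SRAM bits."""
    return sram[(a << 2) | (b << 1) | c]
```

Cells 0 (000), 5 (101) and 7 (111) hold a 1, matching the three HIGH cells described above.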

Sometimes it is easier to understand this kind of thing by looking at a very low-level representation of the circuit to see the exact path a signal takes at the transistor level. The following diagram shows the 3-input LUT we have been modelling throughout this post at the transistor level using MOSFETs:

CMOS level LUT

The MOSFETs with a dot at the base are P-channel MOSFETs (conducting when their gate is driven low), while those without are N-channel MOSFETs (conducting when their gate is driven high). Using the configuration equation A’B’C’+AB’C+ABC, two input cases of the LUT are shown below:

LUT functionality for the configuration equation A’B’C’+AB’C+ABC; left side is input case AB’C and right side is input case AB’C’

In these examples, red traces the path taken when a low signal is passed, and the red X’s mark the MOSFETs that are turned off (blocking current flow). The two cases show that when the input matches a row with a value of 1 in the truth table (AB’C), a 1 is passed to the output. Otherwise (AB’C’), a 0 is passed.
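At the behavioral level, the transistor network can be approximated by a tree of 2:1 selection stages, a common way to build a LUT’s multiplexer from pass transistors. This sketch assumes that each input controls one level of the tree, with C nearest the SRAM cells and A at the output stage. The exact arrangement in the figure may differ, and `mux_tree` is a hypothetical name.

```python
def mux_tree(sram_bits, select_bits):
    """Behavioral model of a LUT multiplexer built from 2:1 stages.

    select_bits is (A, B, C). C drives the stage nearest the SRAM cells and
    A drives the final stage (an assumed ordering). Each stage reduces every
    pair of signals to one, so 8 -> 4 -> 2 -> 1.
    """
    level = list(sram_bits)
    for bit in reversed(select_bits):
        # In each pair, select bit 0 passes the first signal, 1 passes the second.
        level = [level[i + bit] for i in range(0, len(level), 2)]
    return level[0]


sram = [1, 0, 0, 0, 0, 1, 0, 1]   # configured for A'B'C' + AB'C + ABC
print(mux_tree(sram, (1, 0, 1)))  # AB'C  -> 1
print(mux_tree(sram, (1, 0, 0)))  # AB'C' -> 0
```

Only one path through the tree conducts for any input combination, which is why exactly one SRAM value reaches the output.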

Adding Configuration Functionality

The main feature of the FPGA is that it is programmable, so the user must have a method for configuring each LUT. This is done by loading a bit pattern, or mask, into the SRAM array that matches the expected values of the truth table. Each SRAM cell corresponding to a 1 holds a high signal, and each cell corresponding to a 0 holds a low signal. The two main methods for transferring this mask to the SRAM array are a bit-addressable decoder and a data shift scheme.

The first method configures the cells with a decoder. A decoder takes an n-bit input (3 in our example) and activates the one corresponding line out of 2^n lines (8 in our case). This can be used to program the SRAM array by activating each SRAM cell individually and then loading the appropriate bit value into it. Writing one cell per cycle, the decoder can program all 8 cells of the array in 8 cycles.

LUT with decoder configuration method
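The decoder scheme can be sketched as follows. The names `decoder`, `configure_with_decoder` and `write_cell` are hypothetical, and one loop iteration stands in for one clock cycle. The `write_cell` helper illustrates the main advantage of this approach: a single cell can be rewritten without touching the others.

```python
def decoder(address, n_bits=3):
    """n-bit decoder: drives exactly one of 2**n_bits lines high (one-hot)."""
    return [int(line == address) for line in range(2 ** n_bits)]


def configure_with_decoder(mask, n_bits=3):
    """Program every SRAM cell, one cell per clock cycle."""
    sram = [0] * (2 ** n_bits)
    for cycle, bit in enumerate(mask):       # the cycle count doubles as the address
        for cell, enabled in enumerate(decoder(cycle, n_bits)):
            if enabled:                      # only the selected cell accepts data
                sram[cell] = bit
    return sram


def write_cell(sram, address, bit, n_bits=3):
    """Rewrite a single cell in place, leaving the rest of the array untouched."""
    for cell, enabled in enumerate(decoder(address, n_bits)):
        if enabled:
            sram[cell] = bit
    return sram


sram = configure_with_decoder([1, 0, 0, 0, 0, 1, 0, 1])
print(sram)                    # [1, 0, 0, 0, 0, 1, 0, 1]
print(write_cell(sram, 6, 1))  # [1, 0, 0, 0, 0, 1, 1, 1]
```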

The second method sets up the array as a shift register, with the serial data entering only at the first cell. On each clock cycle a new bit is passed into the first SRAM cell, while the data in every other cell shifts down by one position. After 8 clock cycles in our example, each SRAM cell is loaded with the correct value.

LUT with shift configuration method
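Here is a sketch of the shift scheme, assuming cell 0 is the first cell in the chain (`configure_with_shift` is a hypothetical name). Because each new bit pushes the earlier ones further down, the bit destined for the last cell has to be sent first, so the mask is shifted in reverse order.

```python
def configure_with_shift(serial_bits, n_cells=8):
    """Clock serial data into a chain of SRAM cells, one bit per cycle."""
    cells = [0] * n_cells
    for bit in serial_bits:               # one iteration = one clock cycle
        cells = [bit] + cells[:-1]        # new bit enters cell 0; the rest shift down
    return cells


mask = [1, 0, 0, 0, 0, 1, 0, 1]           # desired contents of cells 0..7
loaded = configure_with_shift(reversed(mask))
print(loaded)                             # [1, 0, 0, 0, 0, 1, 0, 1]
```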

Each method has its advantages and disadvantages, and the right choice depends on the needs of the system. Shift configuration reduces the total number of pins required, since the data arrives serially on a single line. The decoder method allows more control, because individual cells within the array can be selected directly. Regardless of which method is used, our running model of the LUT can be updated to include a general configuration block, as every LUT needs one:

Updated LUT model with configuration added

The Logic Cell

Logic cells are the main units that make up the CLBs. Each consists of a LUT along with a few other logic elements that increase the functionality of the FPGA, such as shift registers and carry chains.

The most basic diagram of a logic cell includes three parts: a configurable LUT (as described above), a D flip-flop to optionally latch (store) the LUT’s output, and a multiplexer to select between the stored value and the new value coming directly from the LUT:

Basic Logic Cell
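The logic cell can be sketched behaviorally as below. The class `LogicCell`, the configuration flag `use_register`, and the flip-flop starting at 0 are all assumptions made for illustration.

```python
class LogicCell:
    """Simplified logic cell: 3-input LUT + D flip-flop + output multiplexer."""

    def __init__(self, sram_bits, use_register):
        self.sram = list(sram_bits)       # LUT configuration (8 bits)
        self.use_register = use_register  # hypothetical config bit driving the output mux
        self.q = 0                        # flip-flop state, assumed to start at 0

    def lut(self, a, b, c):
        """Combinational LUT output: the inputs address one SRAM bit."""
        return self.sram[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c):
        """Rising clock edge: the D flip-flop captures the current LUT output."""
        self.q = self.lut(a, b, c)

    def output(self, a, b, c):
        """Output multiplexer: stored value or fresh LUT value."""
        return self.q if self.use_register else self.lut(a, b, c)


config = [1, 0, 0, 0, 0, 1, 0, 1]  # A'B'C' + AB'C + ABC

comb = LogicCell(config, use_register=False)
print(comb.output(1, 0, 1))  # 1: follows the LUT immediately

reg = LogicCell(config, use_register=True)
print(reg.output(1, 0, 1))   # 0: nothing has been clocked in yet
reg.clock(1, 0, 1)
print(reg.output(0, 1, 0))   # 1: holds the captured value though the LUT now gives 0
```

The registered mode is what lets a logic cell take part in sequential circuits, while the bypass path keeps purely combinational logic fast.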

It is important to note at this point that this model of a logic cell is extremely simplified, and commercial FPGAs include more elements as well as more control signals. Additionally, the CLB alone does not give a complete picture of the FPGA. FPGAs have many other components that will be explored in future posts, such as configurable paths between CLBs and I/O ports.