Case Study on Unit-Level Verification


The unit-level approach to hardware verification has been applied to several units of MIPS64-compatible microprocessor including translation lookaside buffer (TLB), floating-point unit (FPU), arithmetic-logic unit (ALU), and cache L2 unit. This page contains two case studies:

Verification of TLB

The approach was applied to a Verilog model of translation lookaside buffer (TLB) of MIPS64-compatible microprocessor. TLB is a buffer that is used to translate virtual addresses into physical ones.

The memory of the TLB under test comprises three buffers: a 4-entries instruction micro TLB (ITLB), a 4-entries data micro TLB (DTLB), and a large 64-entries joint TLB (JTLB). The purpose of the micro TLBs is to allow two address translations being performed simultaneously — one for an instruction fetch address (via the ITLB) and one for a data load/store address (via the DTLB).

Besides the address translation operations, TLB implements operations for reading entry from the buffer, writing entry to the buffer, and probing if the entry exists in the buffer.

Part of the requirements on the TLB under test were formulated by the developers, while the others were derived from the technical documentation. A total number of the requirements is about a hundred. TLB requirements partitioning is shown in the table below.

Table 1: TLB sample - Requirements partitioning
Operation Number of requirements
Read 7
Write 9
Probe 8
Data Address Translation 38
Instruction Address Translation 37
Total 99

Requirements on each operation was represented in the form of preconditions and postconditions of microoperations. It should be emphasized that all requirements were cheaply formalized. The volume of specifications is about 2.5 KLOC in SeC language (specification extension of C).

The total labor costs of the testbench development are 2.5 man-months.

We have found 9 errors in the TLB implementation including critical ones.

Verification of L2 Cache

The approach was applied to Verilog model of L2 cache of MIPS64-compatible microprocessor. The L2 cache under verification is a direct-mapped 256 KB cache that consists of 8192 rows. Each row contains data (four 64-bit double words), tag (18 upper bits of physical address), and control bits (V – Valid and W – Writeback).

The L2 cache serves both data and instructions. It implements operations for loading and storing data, loading instructions, modifying control bits and tags in cache rows. It also supports special snoop reading, writing, and writeline operations. Some of operations can be started in parallel way with a maximum parallelism level of three. E.g. it is possible to start a loading of instruction with a loading of data and with any snoop operation in parallel.

A total number of the requirements is more than one hundred. L2 cache requirements partitioning is shown in the table below.

Table 2: L2 cache sample - Requirements partitioning
Operation Number of requirements
Load Data 21
Store Data 6
Load Instruction 35
Cache 33
Snoop Read 9
Snoop Write 11
Snoop Writeline 5
Total 120

The volume of specifications is about 3 KLOC in SeC language (specification extension of C).

The total labor costs of the testbench development are 6 man-months.

We have found 12 errors in the L2 cache implementation including critical ones.

Verification of MAU-Hub

The approach was used for verification of a relatively simple module of the memory access unit hub (or MAU-hub) of Elbrus-compatible microprocessor. This hub is used to connect four memory access units to each others in multi-core microprocessor.

The unit included about ten inputs and four outputs but relatively easy logic inside: only a couple of arbiters without history. All the operations transferred data from one channel to another. The requirements weren't marked up in the specifications of the module, but their number can be approximately estimated as about fifty. The volume of code was about 1 KLOC. The code of the test system included about 2.5 KLOC of SeC (specification extension of C) code and 1.5 KLOC of SystemVerilog code. Approximately half of this amount was generated automatically based on original Verilog-sources.

The total labor costs of the testbench development were about 1 man-month plus approximately 1 man-month for experiments with SystemVerilog and running of tests.

We have found 5 errors in the implementation.

Verification of North bridge data hub

The approach has been twice applied to the North bridge data hub of Elbrus-compatible microprocessor. This hub is used to connect data flows between cores, IO-channels and memory together.

The main difference between two models was that the first release under test supported only one MAU (memory access unit) and in the second one the total amount of them increased to four. In other aspects they are very similar with only one exception of IO-packets processing.

The first test system was developed by means of pure CTESK and its SeC language. While developing the second version of it, new approach was utilized. That time new approach was used twice: first, test system was written by means of SeC for scenarios and C++ for reference model and mediator, then, to avoid usage of SeC scenarios they were also translated into C++. The resulted test system uses only CPPTESK-toolkit not CTESK.

The volume of DUT's code was about 3 KLOC first and 4 KLOC in the latest version. The first version consisted of 6 KLOC of SeC code and 1 KLOC SeC code automatically generated. The second version made by means of SeC plus C++ included 2 KLOC SeC code, 5 KLOC C++ code and 1.5 KLOC automatically generated C++ code. The second version made by means C++ only included approx. 6.5 KLOC C++ code and 1.5 KLOC of automatically generated C++ code.

The first version took about 3 man-months and it was cycle-accurate. It should be noticed that the first DUT was carefully checked before our verification but nevertheless we managed to found some problems affecting the performance of the unit and also one real bug. The second test system took about gross 4 man-months or 3 net man-months. Extra time was spent to correct the DUT after bugs had been found. That time the DUT had not been checked at all before verification so that process of bugs finding took more efforts and time than it had expected. We were not much surprised when the number of found bugs had overcome 15.