Category Archives: Performance

Detecting infinite loops in Verilog processes

Because Verilog code has communicating concurrent processes, it’s much easier to accidentally write code that results in an infinite loop, and it’s harder to identify the cause of the infinite loop.

We’ve added an option to VeriLogger to automatically interrupt the simulation after a large number of delta cycles have taken place without advancing simulation time. By default, the delta cycle limit is set to 1000, but it can be changed using the simxloader option, –scd_max_delta=<limit>.

Here’s an example of an infinite loop between two zero-delayed assigns that would be interrupted automatically:

wire bw = cw === 1'bx ? b : cw+b;
assign cw = bw;
initial  #1 b = 1;

Delta cycles occur whenever a process is scheduled. Since the delta limit cycle is finite, it’s possible that some code could hit the limit even when there’s no infinite loop. For example, the code below would trigger the limit because the #0 time delay would cause the process to be rescheduled each iteration through the loop.

for (i=0; i < 4095; i = i + 1)
#0 ram[i] <= 0;

Of course, this code is not good code for many reasons (including efficiency, since it forces unnecessary reschedulings of the process) and should be written as below (which won’t trigger the limit):

for (i=0; i < 4095; i = i + 1)
ram[i] <= 0;

When the delta limit is hit, the simulator will interrupt the simulation and place the user at the simulator’s interactive command prompt so that the source of the infinite loop can be analyzed and debugged. Simulation can be resumed from the command prompt using any of the normal run commands.

 

64-bit Verilog Simulator on Windows

As expected, we didn’t see nearly the performance improvement on Windows that we did on Unix. Average speed improvement was probably 3% or less on our benchmark suite. However, the 64bit simulator is still useful when you’re working on a very large design that won’t compile or run within the memory limits of 32-bit Windows (either 2GB, generally).

64-bit Verilog Simulation on Linux and Windows

We recently released a 64-bit Linux version of the simulation engine. The main reason for this port was to allow the simulator to handle extremely large designs (32-bit applications are limited to 3GB on Linux and while this is enough for most designs, it’s a definite limitation for some of today’s large designs).

However, we were surprised and pleased to find that the 64bit simulator runtime executes about 30% faster than the 32bit version running on equivalent hardware. We also see a decent speedup during the simulation generation process (e.g. compiling the Verilog code into a simulation). 64bit mode does use a little more memory since pointers are 64bits rather than 32bits in size, and this had been one area of concern to us when starting the port, but in practice we found this had little impact on memory usage by the simulator.

We’ve not had a chance yet to fully explore all the reasons for the speedup, but initial research has turned up several reasons for this. First, in 64bit mode, about twice as many registers are available to the compiler, allowing commonly used varaiables to be kept inside the CPU longer. In addition, since 64bit Linux is less register constrained, it uses a different function calling convention (known as the __fastcall convention)where most function parameters can be passed in registers rather than on the stack, reducing the overhead of function calls. And, finally, 64bit integer operations (such as time value calculations in Verilog) can be done in a single operation, of course.

If you’re like me, you’d like to know more about exactly why we see such a dramatic speedup. Well, it’s on the list for further investigation, but that work is pending right now till we finish our 64bit Windows port. We’ll revisit this subject then, and in the meantime I hope to be able to report on what kind of improvement we see for 64bit Windows. We’re expecting less improvement here, since the 64bit Windows version of the fast-call covention only passes the first four parameters of a function via registers.