← back to syllabus ← back to notes
These topics are covered in our CODmips textbook’s Chapter 4 presentation deck
Refer to Notes on Designing a von Neuman computer
The first steps in designing a processor and its instruction set architecture (ISA) involve a comprehensive understanding of the requirements and goals for the new computer. This includes considering the following aspects:
Architects often need to define what the computer will do. These requirements can be inspired by market demands or driven by the needs of application software. If a large amount of software exists for a particular ISA, the architect might choose to implement an existing instruction set. Understanding the intended applications is crucial as it influences the necessary features and capabilities of the processor.
The ISA is the foundation of the processor design, acting as the interface between hardware and software. It defines what is visible to the programmer or compiler writer, and includes:
Instruction Fetch:
Instruction Decode:
Instruction Issue:
Operand Fetch/Read Operands:
Execute:
Write Result:
Instruction Commitment (Retire):
Control Unit’s Role:
Alongside functional requirements, architects must consider constraints such as price, power consumption, performance targets, and availability goals. These constraints will influence design decisions at all levels, from the ISA to the physical implementation.
As most programs are written in high-level languages and compiled, the ISA must be a suitable target for compilers. Therefore, understanding compiler technology is critical for efficient ISA design and implementation. Architects may even collaborate with compiler writers to determine desired features.
Early design steps involve deciding on basic architectural principles, such as whether to aim for a Complex Instruction Set Computer (CISC) or a Reduced Instruction Set Computer (RISC) architecture. The choice affects the complexity of the instruction set and the processor implementation. Other fundamental choices include the number of addresses in instructions, data types supported, and the overall philosophy of the architecture (e.g., load-store vs. register-memory).
In essence, the initial stages of processor and ISA design are about defining what the processor needs to do and how programmers and compilers will interact with it, all while respecting the practical limitations of cost, power, and performance. The ISA forms the blueprint that guides the subsequent steps of processor organization (microarchitecture) and hardware implementation
A basic testbench for a Verilog implementation of a simple MIPS 32-bit processor serves to verify the functionality of the designed processor by providing inputs and checking the outputs. Here’s a breakdown of the essential components and concepts for such a testbench, drawing from the sources:
A testbench is itself a Verilog module that typically has no inputs or outputs. It instantiates the device under test (DUT), which in this case is the MIPS processor module. A MIPS processor module might have these kinds of ports to interact with memory and control signals.
module tb_computer();
  // Signals to interact with the DUT
  logic clk;
  logic reset;
  logic [31:0] pc;        // Program Counter (output from DUT)
  logic [31:0] instr;     // Instruction (input to DUT - from instruction memory)
  logic memwrite;       // Memory write signal (output from DUT)
  logic [31:0] aluout;    // ALU output (output from DUT)
  logic [31:0] writedata; // Data to be written to memory (output from DUT)
  logic [31:0] readdata; // Data read from memory (input to DUT)
  // Instantiate the MIPS processor (DUT)
  mips dut (
    .clk(clk),
    .reset(reset),
    .pc(pc),
    .instr(instr),
    .memwrite(memwrite),
    .aluout(aluout),
    .writedata(writedata),
    .readdata(readdata)
  );
  // ... (Clock generation, reset sequence, stimulus generation, and result checking)
endmodule
Most digital systems, including processors, are synchronous and rely on a clock signal. The testbench needs to provide this clock signal to the DUT. A common way to generate a clock in a testbench is using an always block with a delay.
  // Generate clock
  always begin
    clk = 1;
    #5; // Wait for 5 time units (adjust as needed)
    clk = 0;
    #5;
  end
A reset signal is crucial to initialize the processor to a known state. The testbench should assert the reset signal for a certain duration at the beginning of the simulation.
  // Generate reset
  initial begin
    reset = 1;
    #22; // Assert reset for 22 time units (adjust as needed)
    reset = 0;
  end
To test the processor, the testbench needs to provide a sequence of instructions and any necessary data to the processor’s memory interface. This often involves simulating instruction and data memories within the testbench or connecting to external memory models.
A crucial part of the testbench is to verify if the processor is behaving correctly. This involves monitoring the output signals from the DUT (e.g., register values, memory writes) and comparing them against expected values.
always @(posedge clk) or always @(negedge clk) blocks to observe the processor’s state at specific clock edges$display system task can be used to print messages indicating success or failure.For example, a test program can write the value 7 to memory address 84, and the test bench would check with an assert if the program wrote the correct, expected result at the memory address.
The testbench needs to control the simulation duration. You can use the ``$finish system task to end the simulation once the tests are complete or if an error is detected.
Key Considerations for a Basic Testbench:
A basic testbench for a simple MIPS 32-bit processor will instantiate the processor, provide a clock and reset signal, supply instructions (possibly through a simplified memory model), and include some mechanism to check if the processor executes these instructions correctly. As the complexity of the processor implementation grows, the testbench will also need to become more sophisticated to cover all aspects of its behavior.
Section 1.6 of CODmips textbook, titled “Performance”, discusses the crucial aspects of evaluating and comparing the speed and efficiency of computers. It highlights the challenges in performance assessment due to the complexity of modern software and hardware.
The section begins by emphasizing that when we say one computer has better performance than another, the definition can be subtle. Using the analogy of passenger airplanes, it illustrates that “performance” can have different meanings depending on the criteria (e.g., speed for a single passenger vs. capacity for many). Similarly, computer performance can be defined in different ways based on the user’s or manager’s perspective.
The section distinguishes between two primary measures of performance:
The section explains how to quantitatively compare the performance of two computers. If computer X is n times faster than computer Y, it means that the execution time on Y is n times longer than on X. The performance ratio is calculated as (Performance of X) / (Performance of Y) = (Execution time of Y) / (Execution time of X).
Time is presented as the ultimate and most reliable measure of computer performance; the computer that completes the same work in less time is faster.
Different ways of measuring time are discussed:
Computer designers often think about performance in terms of clock cycles, which are discrete time intervals determined by the computer’s clock. The length of a clock cycle is the clock cycle time (e.g., in picoseconds), and its inverse is the clock rate (e.g., in gigahertz).
The section introduces the fundamental equation that relates CPU execution time to key hardware characteristics:
This equation shows that performance is affected by the number of instructions in a program, the average number of clock cycles required for each instruction (CPI), and the duration of each clock cycle. Figure 1.15 summarizes these basic components of performance.
 
The section elaborates on the factors that influence each component of the performance equation:
The dynamic frequency of different instruction types in a program (instruction mix) also plays a crucial role in overall performance as different instructions may have different CPI values.
The section cautions against using only a subset of the performance equation (like clock rate) to compare computers, as this can be misleading. The only complete and reliable measure of computer performance is time.
The performance of a program depends on the algorithm, the language, the compiler, the architecture, and the actual hardware. The following table summarizes how these components affect the factors in the CPU performance equation.
| Hardware or software component | Affects what? | How? | 
|---|---|---|
| Algorithm | Instruction count, possibly CPI | The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed. The algorithm may also affect the CPI, by favoring slower or faster instructions. For example, if the algorithm uses more divides, it will tend to have a higher CPI. | 
| Programming language | Instruction count, CPI | The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine instruction count. The language may also affect the CPI because of its features; for example, a language with heavy support for data abstraction (e.g., Java) will require indirect calls, which will use higher CPI instructions. | 
| Compiler | Instruction count, CPI | The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler’s role can be very complex and affect the CPI in complicated ways. | 
| Instruction set architecture | Instruction count, clock rate, CPI | The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and the overall clock rate of the processor. | 
Although you might expect that the minimum CPI is 1.0, as we’ll see in Chapter 4, some processors fetch and execute multiple instructions per clock cycle. To reflect that approach, some designers invert CPI to talk about IPC, or instructions per clock cycle. If a processor executes on average 2 instructions per clock cycle, then it has an IPC of 2 and hence a CPI of 0.5.
Although clock cycle time has traditionally been fixed, to save energy or temporarily boost performance, today’s processors can vary their clock rates, so we would need to use the average clock rate for a program. For example, the Intel Core i7 will temporarily increase clock rate by about 10% until the chip gets too warm. Intel calls this Turbo mode.
In summary, Section 1.6 lays the foundation for understanding computer performance by defining it from different perspectives, introducing key metrics like response time and throughput, explaining how to compare performance quantitatively, and presenting the fundamental performance equation that links execution time to instruction count, CPI, and clock cycle time. It emphasizes that achieving high performance requires considering all these factors and that time is the ultimate measure.
Section 4.5 covers multicycle processor implementation. We moved from basic concepts to a simple single-cycle implementation in Section 4.4