The arithmetic instructions of the MIPS32 architecture are a cornerstone of understanding how CPUs actually do math. Remember, we’re building up from the ground floor, so understanding these fundamentals is crucial for your future work in computer architecture and embedded systems.
Here’s the high-level overview of MIPS32 arithmetic instructions, focusing on the key concepts relevant to you as budding engineers and computer scientists:
Three-operand format: Most arithmetic instructions use the form operation $rd, $rs, $rt, where $rd is the destination, $rs is the first source, and $rt is the second source. For example, add $t0, $t1, $t2 means $t0 = $t1 + $t2. This consistent format simplifies instruction decoding and execution.
Addition: add, addu (unsigned), addi (immediate), addiu (unsigned immediate). Despite the names, the “unsigned” variants compute the same 32-bit sum; the difference is that add and addi trap on signed overflow, while addu and addiu do not.
Subtraction: sub, subu (unsigned). As with addition, sub traps on signed overflow and subu does not.
Multiplication: mul, mult, multu. mult and multu produce a 64-bit result, stored in the special HI and LO registers. You then use mfhi and mflo to move these parts into general-purpose registers. mul writes only the lower 32 bits of the product, directly into a general-purpose register.
Division: div, divu. Similar to multiplication, div and divu produce a quotient and a remainder. The quotient is stored in LO, and the remainder in HI.
Logical: and, or, xor, nor. These perform bitwise logical operations and are crucial for manipulating data at the bit level. We’ll delve into their uses later when we discuss control flow and data manipulation.
Shifts: sll (shift left logical), srl (shift right logical), sra (shift right arithmetic). These are essential for bit manipulation and are often used in implementing multiplication and division by powers of 2.
Immediate operands: Some instructions take a constant as their second source operand (e.g., addi). This allows you to use a constant value directly in the instruction, without needing to load it into a register first. This is a significant optimization for frequently used constants. The assembler encodes the constant into the instruction’s 16-bit immediate field, so it becomes part of the machine code of the program.
Let’s illustrate these MIPS32 arithmetic concepts with some SystemVerilog examples. These examples demonstrate how you might represent these operations in a hardware description language, keeping in mind that this is a simplified representation and a real MIPS32 implementation would be significantly more complex.
mips_alu.sv
module mips_alu (
    input  logic        clk,
    input  logic        rst,
    input  logic [4:0]  opcode,   // Selects the arithmetic operation (simplified encoding)
    input  logic [31:0] rs,
    input  logic [31:0] rt,
    output logic [31:0] rd,
    output logic        overflow
);
    logic [31:0] result;       // Combinational result, registered into rd
    logic        ovf;          // Combinational overflow / divide-by-zero flag
    logic [63:0] mult_result;  // Full 64-bit product for multiplication

    // Compute the result combinationally so that the overflow check sees the
    // current sum rather than the value rd held in the previous cycle.
    always_comb begin
        result      = '0;
        ovf         = 1'b0;
        mult_result = '0;
        case (opcode)
            5'b00000: begin // add
                result = rs + rt;
                ovf    = (rs[31] == rt[31]) && (result[31] != rs[31]); // Signed overflow check
            end
            5'b00001: result = rs + rt; // addu: no overflow check
            5'b00010: begin // sub
                result = rs - rt;
                ovf    = (rs[31] != rt[31]) && (result[31] != rs[31]); // Signed overflow check
            end
            5'b00011: result = rs - rt; // subu: no overflow check
            5'b00100: begin // mul
                mult_result = rs * rt;
                result      = mult_result[31:0]; // Lower 32 bits
            end
            5'b00101: begin // mult (signed)
                mult_result = $signed(rs) * $signed(rt);
                result      = mult_result[31:0]; // LO half; HI would be mult_result[63:32]
            end
            5'b00110: begin // multu (unsigned)
                mult_result = rs * rt;
                result      = mult_result[31:0];
            end
            5'b00111: begin // div (signed)
                if (rt == '0) begin
                    result = 'x;   // Undefined result on divide by zero
                    ovf    = 1'b1; // Flag reused here to indicate divide by zero
                end else begin
                    result = $signed(rs) / $signed(rt);
                end
            end
            5'b01000: begin // divu (unsigned)
                if (rt == '0) begin
                    result = 'x;   // Undefined result
                    ovf    = 1'b1; // Indicate divide by zero
                end else begin
                    result = rs / rt;
                end
            end
            5'b01001: result = rs & rt;                 // and
            5'b01010: result = rs | rt;                 // or
            5'b01011: result = rs ^ rt;                 // xor
            5'b01100: result = ~(rs | rt);              // nor
            5'b01101: result = rs << rt[4:0];           // sll: only the lower 5 bits of rt are used
            5'b01110: result = rs >> rt[4:0];           // srl: logical shift
            5'b01111: result = $signed(rs) >>> rt[4:0]; // sra: arithmetic shift
            default:  result = 'x;                      // Invalid opcode
        endcase
    end

    // Register the outputs on the clock edge.
    always_ff @(posedge clk) begin
        if (rst) begin
            rd       <= '0;
            overflow <= 1'b0;
        end else begin
            rd       <= result;
            overflow <= ovf;
        end
    end
endmodule
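As a software cross-check on the arithmetic behaviour described in this section (signed overflow on add, the HI/LO split for mult and div, and shifts as multiplication or division by powers of 2), here is a minimal Python sketch. The function names are illustrative and not part of any MIPS toolchain. It assumes register values are passed as unsigned integers in the range 0 to 2^32 - 1, and that signed division truncates toward zero; divide-by-zero is not handled.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add_with_overflow(rs, rt):
    """Return (32-bit sum, signed-overflow flag), as add would detect it."""
    total = (rs + rt) & MASK32
    same_sign_inputs = (rs >> 31) == (rt >> 31)
    overflow = same_sign_inputs and (total >> 31) != (rs >> 31)
    return total, overflow

def mult(rs, rt):
    """Signed 32x32 multiply; returns the (HI, LO) halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32

def div(rs, rt):
    """Signed divide; returns (HI, LO) = (remainder, quotient)."""
    a, b = to_signed32(rs), to_signed32(rt)
    quotient = abs(a) // abs(b)          # magnitude first, so we truncate toward zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32

def sll(value, shamt):
    """Shift left logical: multiplies by 2**shamt, discarding bits above bit 31."""
    return (value << (shamt & 31)) & MASK32

def sra(value, shamt):
    """Shift right arithmetic: the sign bit is copied into the vacated positions."""
    return (to_signed32(value) >> (shamt & 31)) & MASK32
```

For example, add_with_overflow(0x7FFFFFFF, 1) wraps to 0x80000000 and reports overflow, which is the case where add would trap and addu would not.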
Key points about the mips_alu SystemVerilog example:
rs and rt represent the source registers, and rd is the destination register.
The mult and multu instructions produce a 64-bit result. This example uses a wider mult_result signal to hold this intermediate value. In a real implementation, the upper and lower halves of this result would be written to the HI and LO registers and then moved into general-purpose registers with mfhi and mflo.
The shift operations in this example use only the lower 5 bits of rt as the shift amount. In the actual MIPS32 ISA, sll, srl, and sra take the shift amount from the instruction’s 5-bit shamt field, while the variable forms sllv, srlv, and srav use the lower 5 bits of $rs.
Think of your computer’s memory as a vast warehouse, but instead of physical goods, it stores data and instructions. A memory map is essentially the blueprint of this warehouse, defining how different sections are organized and used. In a general-purpose computer, this map typically includes several key regions:
Text (Code) Segment: This is where the program’s instructions reside. It’s often read-only, as controlled by the kernel/OS, to prevent accidental modification by users or malware, which could lead to crashes. Think of it as the instruction manual for the CPU.
Data Segment: This segment holds global variables and static data, which are allocated before the program starts execution and exist throughout its runtime. It’s like the storage area for items that need to be readily available.
Heap: The heap is a region of memory used for dynamic memory allocation. When your program needs to create objects or data structures during execution (using functions like malloc in C or new in C++), it requests space from the heap. This is a more flexible storage area, but it requires careful management to avoid memory leaks.
Stack: The stack is used for function (or procedure) calls and local variables. When a function is called, its parameters, local variables, and return address are pushed onto the stack. When the function returns, meaning it’s finished doing its work, these items are popped off the stack, that is, freed from stack memory and made available to the calling function. The stack operates on a Last-In, First-Out (LIFO) principle, like a stack of plates.
Reserved Memory: Certain memory locations are reserved for specific purposes, often by the operating system or the hardware itself. These areas are typically off-limits to user programs to maintain system stability. This is like the “staff only” area of our warehouse.
Visual Representation:
A simplified memory map might look like this:
+-----------------+ High Address
| Reserved Memory |
+-----------------+
| Stack | Growing downwards
+-----------------+
| Heap | Growing upwards
+-----------------+
| Data |
+-----------------+
| Text |
+-----------------+ Low Address
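To make the “growing upwards” and “growing downwards” arrows concrete, here is a toy Python model of the gap between the heap and the stack. The class name, its methods, and the addresses used in the example are hypothetical; this is only a sketch of the idea that the two regions grow toward each other and must not collide.

```python
class AddressSpace:
    """Toy layout: the heap grows up from heap_base, the stack grows down from stack_base."""

    def __init__(self, heap_base, stack_base):
        self.heap_top = heap_base     # first free byte above the heap
        self.stack_top = stack_base   # lowest address currently used by the stack

    def free_bytes(self):
        """Bytes left in the gap between the heap and the stack."""
        return self.stack_top - self.heap_top

    def grow_heap(self, nbytes):
        """Allocate nbytes on the heap; returns the start address of the block."""
        if nbytes > self.free_bytes():
            raise MemoryError("heap would collide with the stack")
        start = self.heap_top
        self.heap_top += nbytes       # heap grows toward higher addresses
        return start

    def grow_stack(self, nbytes):
        """Reserve nbytes on the stack; returns the new top-of-stack address."""
        if nbytes > self.free_bytes():
            raise MemoryError("stack would collide with the heap")
        self.stack_top -= nbytes      # stack grows toward lower addresses
        return self.stack_top


space = AddressSpace(heap_base=0x1000, stack_base=0x2000)  # made-up addresses
block = space.grow_heap(0x100)    # heap now ends at 0x1100
frame = space.grow_stack(0x200)   # stack now starts at 0x1E00
```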
Now, let’s look at how MIPS32, a popular RISC architecture often used in embedded systems and for teaching computer architecture, defines its memory map. MIPS32 has a well-defined memory map that simplifies memory management and provides a consistent environment for software development.
MIPS32’s memory map is divided into several segments, but a few key ones are worth highlighting:
Kernel Segment (0x80000000 - 0xFFFFFFFF): This segment is reserved for the operating system kernel. User programs cannot directly access this area, ensuring system protection.
User Segment, kuseg (0x00000000 - 0x7FFFFFFF): This is where user programs reside and execute. This segment is further subdivided; the key divisions are the text, data, heap, and stack regions described earlier.
kseg0, kseg1: These are kernel-only regions inside the kernel segment that map directly onto physical memory. kseg0 (0x80000000 - 0x9FFFFFFF) is cached and kseg1 (0xA0000000 - 0xBFFFFFFF) is uncached.
A 32-bit MIPS processor, being byte-addressable by design, supports $2^{32}$ memory addresses, equating to 4,294,967,296 unique memory locations, each holding a single byte of data. Since each address holds one byte, and there are $2^{32}$ addresses, the total memory is $2^{32}$ bytes. To convert this to gigabytes (GB), we know that 1 GB is equal to $2^{30}$ bytes.
Therefore, the total memory in GB is: $\Large\dfrac{2^{32}\ \text{bytes}}{2^{30}\ \frac{\text{bytes}}{\text{GB}}} = 2^{2}\ \text{GB} = 4\ \text{GB}$
So, a 32-bit MIPS processor with byte addressing supports 4 GB of memory.
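The address ranges and the 4 GB figure can be checked with a few lines of Python. This is a sketch rather than a model of any particular MIPS system: the names SEGMENTS, segment_of, and size_in_gb are made up for illustration, and the kseg2 entry comes from the standard MIPS32 virtual address map rather than from the list above (some references split it further).

```python
ADDRESS_BITS = 32
TOTAL_BYTES = 2 ** ADDRESS_BITS   # one byte per address

# (name, first address, last address)
SEGMENTS = [
    ("kuseg", 0x00000000, 0x7FFFFFFF),  # user programs
    ("kseg0", 0x80000000, 0x9FFFFFFF),  # kernel, unmapped, cached
    ("kseg1", 0xA0000000, 0xBFFFFFFF),  # kernel, unmapped, uncached
    ("kseg2", 0xC0000000, 0xFFFFFFFF),  # kernel, mapped (assumption: not split further)
]

def segment_of(address):
    """Return the name of the segment that contains a 32-bit address."""
    if not 0 <= address < TOTAL_BYTES:
        raise ValueError("not a 32-bit address")
    for name, first, last in SEGMENTS:
        if first <= address <= last:
            return name

def size_in_gb(nbytes):
    """Convert a byte count to GB, using 1 GB = 2**30 bytes as in the text."""
    return nbytes / 2 ** 30
```

Here segment_of(0x00400000) falls in kuseg, while segment_of(0x80001000) falls in kseg0, and size_in_gb(TOTAL_BYTES) reproduces the 4 GB result.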
Let’s talk about pointers in the MIPS32 memory map, focusing on the crucial frame pointer and stack pointer. Pointers, in essence, are memory addresses. They “point” to a specific location in memory, allowing you to access and manipulate data stored there. In MIPS32, like most architectures, pointers are typically 32-bit values, capable of addressing any location within the 4GB address space.
Stack Pointer ($sp):
The stack pointer ($sp) is one of the 32 general-purpose registers (R29) and holds the address of the top of the stack. Remember, the stack grows downwards in memory. So, as you push data onto the stack, the stack pointer decrements. Conversely, when you pop data off the stack, the stack pointer increments.
The $sp is essential for managing function calls and local variables. When a function is called, space for its stack frame is reserved by decrementing $sp, and that space is released by incrementing $sp again before the function returns.
Conceptual example ($sp):
// Assuming 'stack_memory' is an array representing the stack, indexed by byte address,
// and 'sp' is a variable holding the stack pointer value
// Push a value onto the stack
sp = sp - 4; // Decrement stack pointer (4 bytes for a word)
stack_memory[sp] = data_to_push;
// Pop a value from the stack
data_received = stack_memory[sp];
sp = sp + 4; // Increment stack pointer
Why 4? Remember, MIPS32 is byte-addressable and one word is 4 bytes, so moving the stack pointer by one word means changing the address by 4.
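The push and pop steps can be turned into a small runnable model. The sketch below is in Python and uses a dictionary keyed by byte address in place of real memory. The class name Stack and the initial stack pointer value are assumptions made for illustration only.

```python
class Stack:
    """Word-granular stack that grows toward lower byte addresses."""

    WORD = 4  # bytes per 32-bit word

    def __init__(self, initial_sp):
        self.sp = initial_sp
        self.memory = {}              # word-aligned byte address -> 32-bit value

    def push(self, value):
        self.sp -= self.WORD          # decrement first, then store
        self.memory[self.sp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.sp]  # load first, then increment
        self.sp += self.WORD
        return value


stack = Stack(initial_sp=0x7FFFFFFC)  # hypothetical starting address
stack.push(10)
stack.push(20)
sp_after_pushes = stack.sp            # two words below the starting address
first_out = stack.pop()               # last in, first out
second_out = stack.pop()
```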
Frame Pointer ($fp):
The frame pointer ($fp) is another important register that points to the base of the current function’s stack frame. A stack frame is the region of the stack dedicated to a particular function call, containing its local variables, parameters, and return address.
The $fp provides a stable reference point for accessing local variables and function arguments within a function, even if the stack pointer changes during the function’s execution (e.g., due to pushing or popping other values). This is especially useful for debugging and for languages that support variable-length argument lists.
Relationship with $sp: At the beginning of a function’s execution, the $fp is typically set to a known offset from the current $sp. The $sp might change during the function’s operation, but the $fp remains constant, providing a consistent way to access the function’s data.
Conceptual example ($fp):
// At function entry:
fp = sp + offset_to_frame_base; // Set frame pointer
// Accessing a local variable (at a fixed offset from fp)
local_variable = stack_memory[fp + offset_to_local_variable];
Key differences:
$sp: Dynamically changes as data is pushed and popped from the stack. It always points to the top of the stack.
$fp: Generally remains constant during a function’s execution. It points to a fixed location within the function’s stack frame, providing a stable base for accessing local variables and parameters.
While not strictly required (some compilers optimize it away), the frame pointer simplifies function call management and makes debugging easier. It allows you to trace back the call stack and inspect the values of local variables at different points in the program’s execution.
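The difference between the two pointers shows up as soon as something is pushed inside a function. The following Python sketch models one possible prologue and epilogue. The helper names, the frame size, the starting register values, and the choice to place the frame base at the value of $sp on entry are all assumptions for illustration; real calling conventions differ in the details.

```python
WORD = 4  # bytes per 32-bit word

def enter_function(regs, memory, frame_bytes):
    """Prologue sketch: reserve a frame, save the caller's $fp, set the new $fp."""
    regs["sp"] -= frame_bytes
    memory[regs["sp"] + frame_bytes - WORD] = regs["fp"]  # caller's $fp in the highest slot
    regs["fp"] = regs["sp"]                                # frame base (offset 0 from $sp here)

def leave_function(regs, memory, frame_bytes):
    """Epilogue sketch: drop temporaries, restore the caller's $fp, release the frame."""
    regs["sp"] = regs["fp"]
    regs["fp"] = memory[regs["sp"] + frame_bytes - WORD]
    regs["sp"] += frame_bytes


regs = {"sp": 0x7FFFF000, "fp": 0}   # hypothetical starting values
memory = {}

enter_function(regs, memory, frame_bytes=16)
memory[regs["fp"] + 0] = 42          # local variable at a fixed offset from $fp

regs["sp"] -= WORD                   # push a temporary: $sp moves, $fp does not
memory[regs["sp"]] = 99

local_via_fp = memory[regs["fp"] + 0]     # same offset as before the push
local_via_sp = memory[regs["sp"] + WORD]  # offset from $sp had to change from 0 to 4

leave_function(regs, memory, frame_bytes=16)
```

After the push, the local variable is still at offset 0 from $fp, but its offset from $sp has shifted by one word, which is the bookkeeping a frame pointer spares you.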
When designing a MIPS32-based system in SystemVerilog, you’ll need to model this memory map. You’ll define address ranges for each segment and ensure that your memory controller correctly handles memory accesses based on these ranges. For instance, you could use parameterized address ranges in your SystemVerilog code to represent each segment. You’ll also need to model the behavior of the stack and heap, perhaps using arrays and pointers within your SystemVerilog testbench to simulate memory allocation and deallocation.
Understanding the MIPS32 memory map is crucial for writing correct and efficient code for MIPS-based systems. It allows you to manage memory effectively, avoid memory leaks, and take advantage of the architecture’s features.
When you’re modeling a MIPS32 processor in SystemVerilog, you’ll need to implement the behavior of both the $sp and $fp. You’ll typically use variables or arrays to represent the stack memory and registers, and then implement the logic for pushing, popping, and accessing data using these pointers. You’ll also need to consider how these pointers are initialized and updated during function calls and returns.
Remember, the specific usage of $fp can vary slightly depending on the compiler and calling conventions used. But the fundamental principles of stack management and the role of these pointers remain consistent.
Let’s shift gears and talk about decision-making in MIPS32. Arithmetic is great, but a CPU also needs to make choices – to branch, loop, and execute code conditionally. That’s where decision-making instructions come in. Here’s a high-level summary for you:
Conditional branches: The primary conditional branch instructions are beq (branch if equal) and bne (branch if not equal). They take three operands: two registers to compare and a branch target (an address, encoded in the instruction as a 16-bit offset relative to the PC). If the comparison is true, the program counter (PC) is updated to the branch target, and execution continues from there. Otherwise, execution continues sequentially. Example: beq $t0, $t1, label.
Set on less than: Other comparisons are made with the slt (set less than) and sltu (set less than unsigned) instructions. These are not branch instructions. They perform a comparison and store the result (1 if true, 0 if false) in a register. Example: slt $t2, $t3, $t4. This sets $t2 to 1 if $t3 < $t4, and 0 otherwise. You then use beq or bne with $t2 to make a branch decision.
Immediate comparisons: slti (set less than immediate) and sltiu (set less than immediate unsigned). These allow you to compare a register with a constant value directly.
Jumps: j (jump) unconditionally jumps to a target address. jr (jump register) jumps to the address stored in a register. These are used for implementing function calls, returns, and other control flow structures.
No condition codes: MIPS32 has no condition-code (flags) register of the kind some other architectures set as a side effect of arithmetic. Instead, you combine slt, slti, beq, and bne to achieve the same result.
Delayed branching: The instruction immediately after a branch sits in the branch delay slot and is executed whether or not the branch is taken. You either fill the slot with a nop (no operation) or arrange your code so that the instruction in the delay slot doesn’t depend on the branch result.
In essence, decision-making in MIPS32 boils down to combining comparisons (using slt, slti) with conditional branching (beq, bne). The lack of condition codes and the presence of delayed branching are key characteristics you need to understand to write correct and efficient MIPS32 code. Now, let’s look at some SystemVerilog examples of how you might represent these instructions in hardware.
Here is a sneak peek at some SystemVerilog code implementing a few of these instructions. Don’t worry, we will take a deep dive in the coming weeks.
mips_control_unit.sv
module mips_control_unit (
    input  logic        clk,
    input  logic        rst,
    input  logic [31:0] instruction,     // The MIPS instruction
    input  logic [31:0] rs,
    input  logic [31:0] rt,
    input  logic [31:0] pc,              // Current Program Counter
    output logic [31:0] next_pc,         // Next Program Counter
    output logic        branch_taken,    // Indicates if a branch was taken
    output logic [31:0] rd_data_sel,     // Data to write to rd
    output logic        rd_write_enable  // Enable write to rd
);
    logic [5:0]  opcode;
    logic [4:0]  rs_addr, rt_addr, rd_addr;
    logic [15:0] immediate;
    logic [25:0] jump_target;
    logic [31:0] imm_sext, imm_zext, pc_plus4;
    logic        slt_result;

    assign opcode      = instruction[31:26];
    assign rs_addr     = instruction[25:21];
    assign rt_addr     = instruction[20:16];
    assign rd_addr     = instruction[15:11];
    assign immediate   = instruction[15:0];
    assign jump_target = instruction[25:0];
    assign imm_sext    = {{16{immediate[15]}}, immediate}; // Sign-extended immediate
    assign imm_zext    = {16'b0, immediate};               // Zero-extended immediate
    assign pc_plus4    = pc + 32'd4;

    // Comparison result for slti/sltiu, computed combinationally so that it is
    // valid in the same cycle it is written back. Both forms sign-extend the
    // immediate; sltiu then compares the operands as unsigned values.
    assign slt_result = (opcode == 6'b001011) ? (rs < imm_sext)
                                              : ($signed(rs) < $signed(imm_sext));

    always_ff @(posedge clk) begin
        if (rst) begin
            next_pc         <= '0;
            branch_taken    <= 1'b0;
            rd_data_sel     <= '0;
            rd_write_enable <= 1'b0;
        end else begin
            next_pc         <= pc_plus4; // Default: sequential execution (PC + 4)
            branch_taken    <= 1'b0;
            rd_write_enable <= 1'b0;
            case (opcode)
                6'b000100: begin // beq
                    if (rs == rt) begin
                        next_pc      <= pc_plus4 + (imm_sext << 2); // (PC + 4) + sign-extended offset * 4
                        branch_taken <= 1'b1;
                    end
                end
                6'b000101: begin // bne
                    if (rs != rt) begin
                        next_pc      <= pc_plus4 + (imm_sext << 2); // (PC + 4) + sign-extended offset * 4
                        branch_taken <= 1'b1;
                    end
                end
                6'b001010: begin // slti
                    rd_data_sel     <= {31'b0, slt_result};
                    rd_write_enable <= 1'b1;
                end
                6'b001011: begin // sltiu
                    rd_data_sel     <= {31'b0, slt_result};
                    rd_write_enable <= 1'b1;
                end
                6'b000010: begin // j
                    next_pc      <= {pc_plus4[31:28], jump_target, 2'b00}; // Upper 4 bits come from PC + 4
                    branch_taken <= 1'b1; // Treat as a branch
                end
                6'b000011: begin // jal (the write of the return address to $ra is not modelled here)
                    next_pc      <= {pc_plus4[31:28], jump_target, 2'b00};
                    branch_taken <= 1'b1; // Treat as a branch
                end
                6'b001100: begin // andi
                    rd_data_sel     <= rs & imm_zext;
                    rd_write_enable <= 1'b1;
                end
                6'b001101: begin // ori
                    rd_data_sel     <= rs | imm_zext;
                    rd_write_enable <= 1'b1;
                end
                6'b001110: begin // xori
                    rd_data_sel     <= rs ^ imm_zext;
                    rd_write_enable <= 1'b1;
                end
                // ... other instructions
                default: begin
                    // Handle invalid opcodes or other instructions
                end
            endcase
        end
    end
endmodule
Key points about this control unit:
slti and sltiu: The slti and sltiu instructions are implemented, setting slt_result, which is then used to update the destination register. In the MIPS32 ISA, both instructions sign-extend the 16-bit immediate; sltiu then compares the two values as unsigned numbers.
Jumps: The j and jal (jump and link) instructions are included. Note how the jump target is constructed: the 26-bit field is shifted left by two bits, and in the MIPS32 ISA the upper four bits are taken from the address of the instruction that follows the jump.
andi, ori, xori: Added for completeness, these are often used for masking and bit manipulation in conjunction with branching.
branch_taken output: This output signals whether a branch was taken. This is useful for pipeline control and performance analysis.
rd_data_sel and rd_write_enable: These outputs control the write-back stage of the pipeline, selecting the data to be written to the destination register (rd) and enabling the write operation. This is a more realistic representation of how data is written back in a pipelined processor (next major chapter).
This SystemVerilog module is a more complete (though still simplified) representation of the control logic for handling decision-making instructions in MIPS32. It shows how the instruction is decoded, how branch targets are calculated, and how the results of comparisons are used to control program flow. Remember, this is still a building block. A real MIPS CPU would have a much more complex control unit, handling interrupts, exceptions, and other features. We can expand on this as we delve into pipelined processors and more advanced topics later in chapter 3.
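The compare-then-branch pattern and the target calculations discussed in this section can also be sketched in software. The Python below is a minimal model under stated assumptions: the function names are illustrative, register values are treated as 32-bit quantities, and branch offsets are taken relative to the instruction after the branch, as in the MIPS32 ISA. The blt_taken helper is hypothetical and simply shows how a “branch if less than” is built from slt followed by bne against $zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate field to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def slt(rs, rt):
    """Set on less than (signed): 1 if rs < rt, else 0."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0

def sltu(rs, rt):
    """Set on less than (unsigned): 1 if rs < rt, else 0."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0

def branch_target(pc, imm16):
    """Target of a taken beq/bne: (PC + 4) plus the sign-extended offset times 4."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32

def jump_target(pc, index26):
    """Target of j/jal: upper 4 bits of (PC + 4), then the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((index26 & 0x03FFFFFF) << 2)

def blt_taken(rs, rt):
    """'Branch if less than' as slt into a temporary, then bne against $zero."""
    flag = slt(rs, rt)
    return flag != 0
```

Note how the same bit pattern 0xFFFFFFFF compares as less than 1 under slt (it is -1 when signed) but not under sltu, where it is the largest unsigned value.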
TBA
TBA