← back to syllabus ← back to notes
Since SPIM is a simulator and not an operating system, it provides basic OS-like functionality to assembly programmers via the syscall instruction. This allows your MIPS code to practically interact with the console for input and output.
To use a syscall, you follow a strict convention:
$v0.$a0 (and a1) data registers (or $f12 for FP).syscall instruction physically.Here is the essential reference table of SPIM library calls you will need to design and test your assembly programs pragmatically:
| Service | Code in $v0 |
Arguments Required | Action / Result |
|---|---|---|---|
| Print Integer | 1 | $a0 = integer value to print |
Prints integer to the console |
| Print Float | 2 | $f12 = float value to print |
Prints float to the console |
| Print Double | 3 | $f12 = double value to print |
Prints double to the console |
| Print String | 4 | $a0 = address of NULL-terminated string (.asciiz) |
Prints string to the console |
| Read Integer | 5 | - | User inputs integer; Returned in $v0 |
| Read Float | 6 | - | User inputs float; Returned in $f0 |
| Read Double | 7 | - | User inputs double; Returned in $f0 |
| Read String | 8 | $a0 = address of input buffer; $a1 = length |
User inputs string into memory at $a0 |
Allocate Heap (sbrk) |
9 | $a0 = number of bytes to allocate |
Address of allocated memory returned in $v0 |
| Exit | 10 | - | Terminates program execution cleanly |
| Print Char | 11 | $a0 = character to print |
Prints a single ASCII character |
| Read Char | 12 | - | User inputs character; Returned in $v0 |
| Open File | 13 | $a0 = filename, $a1 = flags, $a2 = mode |
Opens file; File Descriptor returned in $v0 |
| Read File | 14 | $a0 = file desc., $a1 = buffer, $a2 = max length |
Reads from file into memory buffer |
| Write File | 15 | $a0 = file desc., $a1 = buffer, $a2 = output length |
Writes memory buffer out to file |
| Close File | 16 | $a0 = file descriptor |
Closes the open file |
| Exit (with value) | 17 | $a0 = termination result |
Terminates program, returns value to OS |
Note: The Print String syscall (4) structurally relies on the target string being explicitly terminated by a NULL byte. This is handled automatically by the compiler’s .asciiz directive, but must be mathematically managed if you use the raw .ascii declaration.
Here is a complete, runnable example demonstrating how to use syscalls to print string prompts, wait for the user to type on their keyboard, and capture their input into memory:
Source Code Reference: interactive_io.asm
.data
prompt_name: .asciiz "Please enter your name: "
prompt_age: .asciiz "Please enter your age: "
result_msg: .asciiz "Hello! You entered age: "
newline: .asciiz "\n"
name_buffer: .space 64 # Reserve 64 bytes in memory for the string input
.text
.globl main
main:
# 1. Ask for Name FIRST (to avoid the leftover newline from syscall 5)
li $v0, 4 # syscall 4: print string
la $a0, prompt_name
syscall
# 2. Read String from Keyboard
li $v0, 8 # syscall 8: read string
la $a0, name_buffer # $a0 = address of where to save the string in memory
li $a1, 64 # $a1 = maximum length to read (prevents buffer overflow)
syscall # Program pauses waiting for user input
# 3. Ask for Age
li $v0, 4 # syscall 4: print string
la $a0, prompt_age
syscall
# 4. Read Integer from Keyboard
li $v0, 5 # syscall 5: read integer
syscall # Program pauses waiting for user to type and press Enter
add $s0, $0, $v0 # IMMEDIATELY save the returned integer into a safe register ($s0)
# 5. Echo the Results Back
li $v0, 4 # syscall 4: print string
la $a0, result_msg
syscall
li $v0, 1 # syscall 1: print integer
add $a0, $0, $s0 # load our saved age from $s0
syscall
# Print trailing newline
li $v0, 4 # syscall 4: print string
la $a0, newline
syscall
# Exit cleanly
li $v0, 10
syscall
A powerful capability of SPIM is its native support for opening, reading, and closing literal files on your hard drive (Syscalls 13, 14, and 16).
Suppose you have a text file named data.txt in the exact same directory as your .asm file.
The literal contents of data.txt are 5 integers separated by a newline (\n on Mac/Linux, or \r\n on Windows).
Target File Reference: data.txt
10
20
30
40
50
The following code demonstrates how to open that file, read the dense string of ASCII characters into a buffer space reserved dynamically in the .data Memory Segment (0x10010000), convert those characters into mathematical integers, and then use a jal procedure to calculate their average.
Source Code Reference: file_io.asm
.data
filename: .asciiz "data.txt"
file_buffer: .space 1024 # Reserve 1024 bytes in the Data Segment to hold the raw text file
result_msg: .asciiz "The average of the 5 integers is: "
newline: .asciiz "\n"
myArray: .word 0, 0, 0, 0, 0 # Reserve 5 integer slots (20 bytes) to store the converted math values
.text
.globl main
main:
# 1. Open the File
li $v0, 13 # syscall 13: open file
la $a0, filename # $a0 = address of null-terminated filename
li $a1, 0 # $a1 = flags (0 = read-only)
li $a2, 0 # $a2 = mode (ignored)
syscall
add $s0, $0, $v0 # IMMEDIATELY save the "File Descriptor" returned in $v0 to $s0 safely
# 2. Read the File into Memory
li $v0, 14 # syscall 14: read from file
add $a0, $0, $s0 # $a0 = File descriptor we just saved
la $a1, file_buffer # $a1 = address of our 1024 byte buffer in the `.data` segment
li $a2, 1024 # $a2 = maximum number of bytes to read
syscall
# 3. Close the File
li $v0, 16 # syscall 16: close file
add $a0, $0, $s0 # $a0 = File Descriptor
syscall
# 4. Convert the Raw ASCII Text Buffer into Physical Integers
la $a0, file_buffer # Argument 0: Base address of the text buffer we just populated
la $a1, myArray # Argument 1: Base address of the destination integer array
li $a2, 5 # Argument 2: How many integers we expect to parse
jal convert_ascii_to_int # Jump and Link to our custom parsing procedure!
# 5. Call the 'average' Procedure
la $a0, myArray # Pass the base address of the array as Argument 0
li $a1, 5 # Pass the length of the array as Argument 1
jal average # Jump and Link to the procedure (saves Return Address to $ra)
add $t0, $0, $v0 # Save the calculated average returned in $v0
# 6. Print the Results
li $v0, 4 # syscall 4: print string
la $a0, result_msg
syscall
li $v0, 1 # syscall 1: print integer
add $a0, $0, $t0 # load our average
syscall
# Print trailing newline
li $v0, 4 # syscall 4: print string
la $a0, newline
syscall
# Exit cleanly
li $v0, 10
syscall
# ---------------------------------------------------- #
# Procedure: average
# Arguments: $a0 = array base address, $a1 = size
# Returns: $v0 = integer average
# ---------------------------------------------------- #
average:
li $t0, 0 # Loop counter
li $t1, 0 # Running sum
add $t2, $0, $a0 # Copy the base address to $t2 so we can shift it
Avg_Loop:
beq $t0, $a1, Avg_Done # If counter == size, we reached the end
lw $t3, 0($t2) # Load physical word from memory array
add $t1, $t1, $t3 # Add to sum
addi $t2, $t2, 4 # Shift memory pointer by 4 bytes (1 word)
addi $t0, $t0, 1 # Increment counter by 1
j Avg_Loop # Jump back up
Avg_Done:
div $t1, $a1 # Divide Sum ($t1) by Count ($a1)
mflo $v0 # MIPS stores the quotient in the LO register. Move it to $v0 as our return value!
jr $ra # Return structurally to the caller (main) using $ra
# ---------------------------------------------------- #
# Procedure: convert_ascii_to_int
# Arguments: $a0 = address of file_buffer (source ASCII)
# $a1 = address of myArray (destination ints)
# $a2 = number of expected integers to find
# Returns: None (modifies memory directly at myArray)
# ---------------------------------------------------- #
convert_ascii_to_int:
add $t0, $0, $a0 # $t0 = Parse pointer (sliding along file_buffer)
add $t1, $0, $a1 # $t1 = Store pointer (sliding along myArray)
li $t2, 0 # $t2 = Current integer accumulator
li $t3, 0 # $t3 = Successful integers parsed counter
li $t4, 10 # $t4 = Constant math multiplier (10)
Parse_Loop:
beq $t3, $a2, Parse_Done # If we found all expected integers, exit!
lb $t5, 0($t0) # Load exactly 1 byte of ASCII text
addi $t0, $t0, 1 # Shift parse pointer to the right by 1 byte
# Check for empty byte / Null terminator (ASCII 0)
beq $t5, 0, Parse_Done
# Check for newline '\n' (ASCII 10)
beq $t5, 10, Save_Int
# Check for Windows carriage return '\r' (ASCII 13)
beq $t5, 13, Parse_Loop # Safely ignore \r and read the next physical byte
# Check for Space (ASCII 32)
beq $t5, 32, Save_Int
# If we are here, we strongly assume it is a valid digit '0' to '9'.
# Subtract 48 (ASCII code for '0') to reveal the true mathematical value
addi $t5, $t5, -48
# Accumulate: Current_Value = (Current_Value * 10) + new_digit
mul $t2, $t2, $t4 # Shift current value left natively in base-10
add $t2, $t2, $t5 # Add the new digit
j Parse_Loop # Unconditionally jump back up to grab the next byte!
Save_Int:
sw $t2, 0($t1) # Save the built up integer physically into the myArray slot
addi $t1, $t1, 4 # Shift the destination pointer down by 1 word (4 bytes)
addi $t3, $t3, 1 # Increment our successful integers counter by 1
li $t2, 0 # **CRITICAL:** Reset the accumulator back to 0 for the next number!
j Parse_Loop # Go back to finding the next number
Parse_Done:
jr $ra # Return structurally to the caller (main) using $ra
exceptions.s)Whenever you run a file in SPIM, you will notice the very first output line on your terminal is:
Loaded: /opt/homebrew/Cellar/spim/9.1.24/share/exceptions.s
What are Exceptions in Computer Architecture? In hardware architecture, an Exception (or Interrupt) is an unscheduled event that forcibly disrupts the normal flow of program execution. When a CPU encounters an impossible physical situation, such as dividing a binary number by zero, attempting to execute an invalid opcode sequence, or trying to access memory that doesn’t physically exist, it cannot simply “keep going.” The hardware inherently triggers an Exception, immediately halting the user’s program and aggressively transferring control privileges to the Operating System (or a dedicated Kernel handler) to deal with the crisis.
What is the exceptions.s file?
Because SPIM is a simulator acting as a bare-bones operating system, it must manually provide an OS Kernel Exception Handler. The exceptions.s file is exactly that: a pre-written MIPS assembly file that SPIM permanently loads into the protected Kernel Text Segment (.ktext at 0x80000180) before it ever looks at your user code.
When your code violently crashes, the simulated MIPS CPU hardware jumps directly to 0x80000180. The exceptions.s routine reads the hardware Cause Register (Coprocessor 0, Register 13) to aggressively identify why the CPU halted. It then prints a structured error message based on an internal array of strings (__excp), and conventionally terminates your process to prevent catastrophic cascading failures.
What Exceptions Does It Catch?
If you look closely at the exceptions.s source code, it officially catches and translates over a dozen hardware faults for you, including:
[Address error in inst/data fetch] (Unaligned memory access)[Bad instruction address][Error in syscall] (Passing an invalid $v0 code)[Reserved instruction] (Invalid machine code)[Arithmetic overflow] (Standard add exceeding 32-bit limits)[Floating point] (FPU division by zero)Example: Triggering an Address Exception
The MIPS architecture rigidly enforces that all 32-bit Words must align on physical Memory Boundaries that are multiples of 4 (e.g., 0x10000000, 0x10000004, 0x10000008). If you attempt to load a Word (lw) from an obscure address like 0x10010001, the silicon gating will critically fail.
Running the following “simplest exception” code:
Source Code Reference: exception_demo.asm
.text
.globl main
main:
# Trigger an Address Error Exception by loading a word from an unaligned address
li $t0, 0x10010001 # Unaligned memory address (not a multiple of 4)
lw $t1, 0($t0) # Attempt to load a 32-bit word
# Exit cleanly (will never be reached)
li $v0, 10
syscall
Will trigger SPIM’s exceptions.s handler to catch the hardware failure and cleanly abort the run:
Loaded: /opt/homebrew/Cellar/spim/9.1.24/share/exceptions.s
Exception occurred at PC=0x0040002c
Unaligned address in inst/data fetch: 0x10010001
Exception 4 [Address error in inst/data fetch] occurred and ignored
We’ve mastered the mechanical foundations of MIPS: the load-store architecture, memory layouts, control flow (beq/j), and basic procedure calling. Today, we bridge these isolated mechanics to write complex, algorithmic software natively in hardware.
move, li, la)Before we dive into advanced patterns, let’s address a structural quirk of the SPIM simulator and the MIPS assembler: Pseudo-instructions.
Pseudo-instructions are “fake” assembly commands that do not physically exist in the MIPS hardware silicon. They are provided purely as a convenience by the software Assembler (like SPIM), which quietly translates them into one or two actual native hardware instructions right before execution.
1. move (Copying Registers)
You will often see tutorials broadly use move $t0, $t1. However, the MIPS hardware architecture has no actual move command in silicon! To physically copy a register, you must systematically exploit the hardwired $0 (Zero) register using the ALU. Adding zero to a number leaves it unchanged:
move $t0, $t1add $t0, $0, $t1We strictly enforce this native format to prove exactly how the ALU routes data physically!
2. li (Load Immediate)
MIPS instructions are rigidly exactly 32-bits long. The I-Type (Immediate) machine code format only grants you 16 literal bits to store a constant number. If you try to load a massive number like 100,000, it physically cannot fit inside the instruction!
To bypass this limitation seamlessly, the software Assembler provides li $t0, 100000. Behind your back, it secretly splits the operation into two native hardware commands:
lui (Load Upper Immediate) to securely fill the top 16 bits.ori (Or Immediate) to mathematically inject the bottom 16 bits.Why do we allow it? Forcing you to manually translate every large decimal into binary, slice it seamlessly in half, and write two native instructions for every constant is tedious and distracts heavily from the core algorithmic logic of your program (e.g., loops and recursion).
3. la (Load Address)
When you define a label in the .data segment (e.g., myArray:), the compiler/linker eventually assigns it a physical 32-bit SRAM/DRAM address (commonly 0x10010000 via SPIM). Because you, the programmer, do not mathematically know exactly what that 32-bit mapped address will be until the code physically compiles, you simply use la $t0, myArray. Similarly to li, the assembler secretly replaces this abstraction with native lui and ori instructions once it structurally finalizes the 32-bit memory map!
Why do we allow it? Attempting to manually hardcode 32-bit RAM addresses independently without the linker’s mathematical assistance is nearly impossible for dynamic software!
When we talk about “Arrays” in assembly, we are merely talking about a contiguous block of data sitting in Main Memory (usually the .data or Heap segments). Because MIPS memory is byte-addressed, and our standard integers (“words”) are 32 bits (4 bytes) long, we cannot simply increment an array index by 1 to move to the next integer. We must increment our memory pointer by 4.
Example: Summing an Array of 5 Integers
Source Code Reference: sum_array.asm
.data
myArray: .word 10, 20, 30, 40, 50 # 5 integers (20 bytes total)
length: .word 5
newline: .asciiz "\n"
.text
.globl main
main:
la $t0, myArray # $t0 = Base address pointer of the array
lw $t1, length # $t1 = Total elements (5)
li $t2, 0 # $t2 = Current index counter (starts at 0)
li $t3, 0 # $t3 = Running sum accumulator (starts at 0)
Loop_Start:
beq $t2, $t1, Loop_End # If counter == length, we're done! Exit loop.
lw $t4, 0($t0) # Load the physical word at the current pointer address
add $t3, $t3, $t4 # Add it to our running sum
addi $t0, $t0, 4 # CRITICAL: Shift the memory pointer up by 4 bytes!
addi $t2, $t2, 1 # Increment our logical loop counter by 1
j Loop_Start # Unconditionally jump back up to evaluate the next loop
Loop_End:
# Print the final sum ($t3)
li $v0, 1 # syscall 1 = print integer
add $a0, $0, $t3 # move our accumulated sum into the argument register
syscall # execute print
# Print trailing newline
li $v0, 4 # syscall 4 = print string
la $a0, newline
syscall
# Exit cleanly
li $v0, 10 # syscall 10 = exit
syscall
Example: Summing an Array of 5 Decimals (Floats)
A look ahead! While integers (.word) rely on standard registers ($t0-$t9), dealing with fractional decimals (.float) requires delegating execution natively to the hardware’s Floating Point Unit (FPU). Notice how we introduce Coprocessor 1 specific commands like l.s (Load Single-Precision Float) and add.s (Floating-Point Add), utilizing dedicated FPU registers ($f0-$f31).
Source Code Reference: sum_array_floats.asm
.data
myArray: .float 10.5, 20.25, 30.125, 40.0, 50.5 # 5 floats (20 bytes total in memory)
length: .word 5
init_sum: .float 0.0
result_msg: .asciiz "The sum of the decimals is: "
newline: .asciiz "\n"
.text
.globl main
main:
la $t0, myArray # $t0 = Base address pointer of the array
lw $t1, length # $t1 = Total elements (5)
li $t2, 0 # $t2 = Current index counter (starts at 0)
# Initialize floating-point sum to 0.0 using Coprocessor 1 (FPU)
l.s $f12, init_sum # $f12 = Running sum accumulator (Starts at 0.0)
Loop_Start:
beq $t2, $t1, Loop_End # If counter == length, we're done! Exit loop.
# Load physical float from the memory pointer directly into the Floating Point Unit
l.s $f4, 0($t0)
# Add the extracted decimal to our running sum natively inside the FPU
add.s $f12, $f12, $f4
addi $t0, $t0, 4 # CRITICAL: Shift the memory pointer up by 4 bytes (Floats are 32-bit Words too)!
addi $t2, $t2, 1 # Increment our logical loop counter by 1
j Loop_Start # Unconditionally jump back up to evaluate the next loop
Loop_End:
# Print the descriptive text
li $v0, 4 # syscall 4 = print string
la $a0, result_msg
syscall
# Print the final decimal sum ($f12)
# Note: SPIM syscall 2 inherently accesses register $f12 to find the float to print!
li $v0, 2 # syscall 2 = print float
syscall # execute print
# Print trailing newline
li $v0, 4 # syscall 4 = print string
la $a0, newline
syscall
# Exit cleanly
li $v0, 10 # syscall 10 = exit
syscall
Strings (.asciiz) are handled differently than arrays of integers. Every ASCII character occupies exactly 1 byte (8 bits).
Therefore, to iterate through a string, you do not shift your memory pointer by 4 (which would skip 3 characters). You shift your pointer by 1. You also use lb (Load Byte) instead of lw (Load Word).
Note: .asciiz automatically appends a “Null Terminator” (a literal binary 0x00) to the end of your string. Your loop condition simply searches for this zero byte to know when the string is entirely finished!
Recursion is when a function calls itself back-to-back until it hits a base condition. It is the ultimate test of your Stack Management ($sp), because every single recursive layer aggressively overwrites the $ra (Return Address) tracker and the $a0 arguments!
The Fibonacci Example (fib(n))
Let’s trace how the CPU physically builds the stack frame when calculating fib(3).
main calls fib(3). $a0 = 3. The CPU saves main’s return address in $ra.fib: The function pushes a “plate” onto the Stack (addi $sp, $sp, -12). It rigidly saves its current $ra, $a0, and $s0 values into memory.$a0 ($a0 = 2) and calls jal fib.fib(2) inside fib(3) inside main.When the base cases fib(1) or fib(0) are finally breached, the recursion violently collapses upwards.
The CPU systematically calculates the math, issues lw to pop the saved $ra and $a0 values back off their respective stack plates, destroys the physical stack slot (addi $sp, $sp, 8), and utilizes jr $ra to seamlessly return up the chain of functions. If a single stack push/pop operation triggers out of order, the entire program fatally crashes into the void of memory.
The Fast Inverse Square Root algorithm was famously used in the source code of the 1999 video game Quake III Arena. In 3D graphics engines, calculating how light reflects off a polygonal surface requires calculating the “surface normal” (a perpendicular vector). Normalizing this lighting vector mathematically requires computing $\frac{1}{\sqrt{x}}$, where $x$ is the magnitude.
At the time, performing a standard CPU square root followed by a division instruction was devastatingly slow. To maintain a smooth, performant 3D experience, the Quake developers implemented a brilliant algorithmic hack that drastically sped up this calculation using bit manipulation and Newton-Raphson approximation.
Here is a simplified MIPS implementation of that exact logic. Notice how it seamlessly moves raw bits between the floating-point ($f12) and integer ($t0) registers using mfc1 and mtc1 to perform rapid integer math on a floating-point structure!
Simplified Lighting Reflection Example:
Source Code Reference: quake_inv_sqrt.asm
.data
magic: .word 0x5f3759df # The famous Quake "magic number" constant
threehalfs: .float 1.5
half: .float 0.5
magnitude: .float 16.0 # Simulated light vector magnitude (x)
result_msg: .asciiz "1 / sqrt(16.0) is approx: "
newline: .asciiz "\n"
.text
.globl main
main:
# Load our lighting magnitude into a floating-point register
l.s $f12, magnitude
# --- The Fast Inverse Square Root Algorithm ---
# 1. Calculate x2 = x * 0.5
l.s $f4, half
mul.s $f2, $f12, $f4 # $f2 (x2) = x * 0.5
# 2. Evil Bit-Level Hacking (Move float bits natively to an integer register)
mfc1 $t0, $f12 # Copy raw IEEE 754 bits from FPU ($f12) to CPU ($t0)
# 3. Initial Approximation via Magic Number Subtraction
srl $t1, $t0, 1 # Logic Shift Right by 1 (i >> 1)
lw $t2, magic # Load the 0x5f3759df magic constant
sub $t0, $t2, $t1 # i = magic - (i >> 1)
# 4. Move the manipulated integer bits back to the FPU
mtc1 $t0, $f0 # The float structure in $f0 is now incredibly close to 1/sqrt(x)!
# 5. First iteration of Newton-Raphson to clean up the approximation
l.s $f6, threehalfs # Load 1.5
mul.s $f8, $f0, $f0 # y * y
mul.s $f8, $f2, $f8 # x2 * (y * y)
sub.s $f8, $f6, $f8 # threehalfs - (x2 * y * y)
mul.s $f0, $f0, $f8 # Final $f0 Result: y = y * [1.5 - (x2 * y * y)]
# 6. Second iteration of Newton-Raphson to further refine the approximation
mul.s $f8, $f0, $f0 # y * y
mul.s $f8, $f2, $f8 # x2 * (y * y)
sub.s $f8, $f6, $f8 # threehalfs - (x2 * y * y)
mul.s $f0, $f0, $f8 # Final $f0 Result: y = y * [1.5 - (x2 * y * y)]
# ----------------------------------------------
# (The value in $f0 can now be used to rapidly normalize the light reflection vector!)
# Print the descriptive text
li $v0, 4 # syscall 4: print string
la $a0, result_msg
syscall
# Print the calculated float result in $f0
li $v0, 2 # syscall 2: print float
mov.s $f12, $f0 # Move result to argument register for printing
syscall
# Print trailing newline
li $v0, 4
la $a0, newline
syscall
# Exit cleanly
li $v0, 10
syscall
When you run this code, the output will yield:
1 / sqrt(16.0) is approx: 0.24999891
Why this exact number? Mathematically, computing $\frac{1}{\sqrt{16.0}}$ yields exactly $\frac{1}{4.0}$, or 0.25000000. By exploiting integer bitwise architecture to yield an initial guess, and mathematically refining it with two iterations of the Newton-Raphson method, Quake approximates 0.24999891. This is stunningly precise (an error margin of just 0.000436%) while entirely avoiding the use of a strict floating-point division instruction.
Wait, * decimals? Up until this exact moment in Chapter 1 and Chapter 2 of our textbook (Computer Organization and Design), we have rigidly introduced you *only to the integer data type (both unsigned integers and signed integers using Two’s Complement).
However, just like desktop calculators, computer CPUs must structurally handle fractional decimal numbers too (like 0.24999891). Our native 32-bit registers ($t0, $s0) are perfectly wired for whole integers. But how do we physically wire a . (decimal point) into microscopic silicon gates?
A rudimentary “Fixed-Point” decimal system is the first intuitive option, but structurally, it does not provide sufficient dynamic range to handle both macroscopically huge numbers and microscopically tiny numbers inside the exact same 32-bit operand limit. A universally robust protocol is required.
Next week, we transition into Chapter 3. We will formally introduce the Floating-Point Data Type powered by the IEEE 754 protocol. We will explore how MIPS handles these specialized decimals by delegating them to an entirely separate piece of hardware acting as a Coprocessor: the Floating Point Unit (FPU).
(Check out the bottom-right corner of your MIPS Green Sheet to preview the dedicated add.s, sub.s, and div.s instructions that exclusively run on this FPU coprocessor!)
In an engineering context, let’s compare the Quake approach against the standard IEEE-754 approach to demonstrate the marked performance improvement.
Standard IEEE Approach: 2 Instructions
sqrt.s $f0, $f12 # Calculate square root
div.s $f0, $f4, $f0 # 1.0 / sqrt(x)
Quake Approach: 14 Instructions
As seen in the code block above, the Quake method utilizes roughly 14 individual instructions (mul.s, sub, srl, mfc1, etc.) to achieve the same result.
The Emulator Timing Abstraction: If you write a benchmark script to loop both algorithms 100,000 times in the SPIM emulator, the Standard approach will finish roughly 3x faster than the Quake approach. Why? Because software emulators like SPIM execute instructions sequentially on your modern host Mac/PC CPU. SPIM treats every instruction as taking exactly 1 software “step”, abstracting away the realities of the physical silicon. This creates a severe pedagogical blind spot where 2 software steps empirically beats 14 software steps, regardless of the underlying structural pipeline bottlenecks those steps might trigger on physical hardware.
To explicitly prove this mathematical abstraction, here is the raw output from measuring SPIM executing both programs 100,000 times natively on my terminal (time spim -file ...):
Benchmark References:
$ time spim -file bench_standard.asm
spim -file bench_standard.asm 0.04s user 0.03s system 96% cpu 0.074 total
$ time spim -file bench_quake.asm
spim -file bench_quake.asm 0.13s user 0.08s system 98% cpu 0.214 total
As you can see, the 2-instruction loop finishes in 0.074s, while the 14-instruction loop finishes in 0.214s (almost exactly 3x slower natively).
The Physical Hardware Reality: On physical silicon in 1999 (e.g., Pentium III or MIPS R4000 CPUs), all instructions did not take equal time. We can mathematically prove exactly why the Quake method was superior by revisiting the fundamental CPU Performance Equation from Chapter 1 of our textbook (Computer Organization and Design):
\[\text{CPU Execution Time} = \text{Instruction Count (IC)} \times \text{Cycles Per Instruction (CPI)} \times \text{Clock Cycle Time}\]div.s) was incredibly complex and could stall the processor pipeline for up to 54 clock cycles waiting for completion. Assuming a basic sqrt.s execution alongside it, the average CPI is roughly $27.5$. The total execution requirement is $2 \text{ instructions} \times 27.5 \text{ CPI} \approx 55 \text{ physical clock cycles}$.sub, srl) and bit-moves (mfc1) natively in exactly 1 clock cycle. Therefore, the constant CPI is strictly $1.0$. The total execution requirement is $14 \text{ instructions} \times 1.0 \text{ CPI} = \mathbf{14 \text{ physical clock cycles}}$.Even though the Instruction Count was radically higher, the Quake developers minimized the Cycles Per Instruction so drastically that their total CPU Execution Time finished nearly 4x faster on actual silicon!
This perfectly illustrates why relying blindly on a software emulator (which mathematically enforces an abstracted, flat CPI of $1.0$ across every command) creates a severe pedagogical blind spot for computer architects.
Today commemorates the official assignment of your Final Group Project (due May 15, 2026 no later than 5:00 PM ET)! This project encompasses the primary educational objective of ECE 251: you will organically design and implement a von Neumann computer from scratch.
This project carries a total value of 100 raw points (accounting for 25% of your final semester grade). There is also an allocation for a possible 30 points of extra-credit.
You are to implement the design using SystemVerilog and to include a test bench to demonstrate the full functionality of your processor and memory (especially loading and running programs, as well as taking input and showing output to the user).
The following requirements hold for full assessment:
See below for the formal grading parameters of the final project. The rubric evaluates your complete architectural design across 200 raw points (which scales directly to the 100 base points defined in the syllabus).
1. ISA Design (34 Raw Points)
shamt Size, Instruction Size, PC Increment, Immediate Size.2. Memory Design & Implementation (16 Raw Points)
3. Processor Design & Implementation (120 Raw Points)
4. Project & Design Documentation (30 Raw Points)
If your base von Neumann machine boots successfully, you may attempt the following advanced architectural features for extra credit:
Note: The Professor will be hosting optional weekend lab sessions leading up to the deadline to assist with simulation debugging or architectural design consultation.
Choose a partner. Brainstorm your final project for which you will design, program, and simulate a computer with your own CPU ISA using SystemVerilog. The theme of the project is to create a computer of your choice with a minimum of 4 bits, given the complexity expected from this course.
Some fundamental ideas would include the Intel 4004, Intel 8086, Intel 8088, Motorola 6800, Motorola 68000, the MOS 6502, Zilog Z80, etc. Check out the Computer History Museum Timeline for some inspiration, or be creative! Here is a design guide to use as an official resource: Professor Marano’s Notes on Designing a von Neumann Computer.
You will design the entire ISA, the data path, the control path, the ALU, and the Controller units, along with the explicit support for integer processing. (Challenge: How would you mathematically handle floating point?)
If you like, you can choose an alternative path and write a functional MIPS simulator that shows the state of each register and how the memory works, along with writing the control path, data path, ALU, and Controller. I will share the basic architecture in SystemVerilog. Check out the CPU HDL Catalog for reference.
Submission Guidelines:
Next week (Session 8: March 12th) represents the Midterm Exam. We have covered an immense amount of foundational computing architecture since January.
The Midterm will heavily assess the following comprehensive topics:
assign dataflows, combinatorial block rules versus sequential always_ff edge-triggered operations, and the explicit differences between Blocking (=) and Non-Blocking (<=) assignments physically altering silicon synthesis.bne/beq), array manipulations, translating standard C structures (if/else/while), and procedure tracking (jal, $ra, $sp execution chains).Begin aggressively studying the posted Midterm Study Guide. Office hours will dynamically switch to structured, review-oriented topic overviews!