Courses & Projects by Rob Marano

Notes for Week 14


Topics

  1. CPU interrupts and exceptions
  2. Basics of memory caching for CPUs

Introduction to Exceptions and Interrupts

Causes and Types of Exceptions and Interrupts in MIPS

Common events that trigger exceptions or interrupts in MIPS include:

  • a request from an I/O device (an interrupt, external to the processor);
  • a user program invoking the operating system with syscall (an exception, internal);
  • arithmetic overflow (an exception, internal);
  • use of an undefined or unimplemented instruction (an exception, internal);
  • a hardware malfunction (either an exception or an interrupt).

MIPS Exception and Interrupt Handling Process

When an exception or interrupt occurs and is recognized by the processor, a sequence of events takes place:

  1. Detection: The processor detects the exceptional condition or receives an interrupt signal. Processors often check for interrupts after completing the execution of the current instruction.
  2. Finish Current Instruction (usually): For interrupts, the processor typically finishes the execution of the current instruction before responding. Pipelined processors may need to handle this differently, potentially involving flushing instructions in the pipeline.
  3. Save Processor State: The processor needs to save enough information to resume the interrupted program later.
    • The address of the offending or interrupted instruction is saved in a special register called the Exception Program Counter (EPC). This is analogous to using the $ra register for saving the return address during a jump-and-link (jal) instruction.
    • The cause of the exception is recorded in another special register called the Cause register. The exception handler routine reads this register to determine how to proceed. MIPS uses different codes in the Cause register for different events.
    • Other status information, such as the Program Status Word (PSW) or processor status register (PS), may also be saved, sometimes automatically in a dedicated register (like IPS in some architectures) or on the system stack.
    • In MIPS, the EPC and Cause registers, along with others used for system functions, are typically part of Coprocessor 0 (CP0).
  4. Transfer Control to Handler: The processor transfers execution control to a predefined address where the exception handler or interrupt service routine (ISR) resides.
    • The starting address of the routine is loaded into the Program Counter (PC).
    • Many MIPS implementations use a single fixed entry point address for all exceptions (e.g., 8000 0180hex or 8000 0000hex for TLB misses). The handler code at this address then uses the Cause register to determine the specific event and branch to the appropriate specific handler code. Alternatively, some architectures (or MIPS implementations) might use vectored exceptions, where each exception type jumps directly to a different handler address.
    • The processor may switch from user mode to supervisor mode (or kernel mode). This is necessary for the OS handler to access privileged resources and instructions.
  5. Execute the Handler Routine: The handler routine performs the actions necessary to deal with the exception or interrupt.
    • Software can access the EPC and Cause registers using the mfc0 (move from coprocessor 0) instruction, copying their contents into a general-purpose register. Similarly, mtc0 (move to coprocessor 0) can be used to write to these registers.
    • The handler should save any general-purpose registers it might modify if they were not automatically saved, often on the system stack.
    • If exceptions were automatically disabled upon entering the handler (to prevent nested exceptions), they must be re-enabled at an appropriate point in the handler, usually after saving necessary state. Disabling/enabling is often controlled by a bit in the processor status register and can be managed using instructions like ei (enable interrupts) and di (disable interrupts).
  6. Return to Interrupted Program: After handling the event, the processor needs to return to the point where the original program was interrupted.
    • In MIPS, this is done using the ERET (Return from Exception) instruction. This instruction also typically resets the processor back to user mode.
    • The ERET instruction transfers control to the address stored in the EPC.
    • For some exceptions like TLB misses or page faults, the instruction that caused the exception is re-executed after the handler has resolved the issue. This requires that the instruction be restartable, meaning its execution can be resumed after an exception without affecting the result.
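
Putting steps 3 through 6 together, here is a minimal sketch of a handler's entry and exit, in the style accepted by the MARS/SPIM simulators. The .ktext directive is a simulator convention; the CP0 register numbers ($13 for Cause, $14 for EPC) are architectural; the dispatch and state-saving logic is elided:

        .ktext 0x80000180      # common MIPS exception entry point
        mfc0 $k0, $13          # read Cause (CP0 register 13); $k0/$k1 are reserved for the OS
        mfc0 $k1, $14          # read EPC (CP0 register 14), the address used to return
        andi $k0, $k0, 0x7c    # isolate the ExcCode field (Cause bits 6..2)
        # ... branch on $k0 to the specific handler; save/restore any registers used ...
        eret                   # return to the address in EPC, switching back to user mode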

Integration with Instruction Cycle and Pipelining

Role of the Operating System

MIPS Exception Worked Example

Here is a conceptual worked example illustrating how MIPS handles an arithmetic overflow exception in a pipelined processor.

Handling an Arithmetic Overflow Exception in MIPS

This example focuses on the events that occur when an add instruction triggers an arithmetic overflow in a pipelined MIPS processor.

Scenario: Imagine the following sequence of MIPS instructions being executed in a pipelined processor:

        ...         # Instructions before the potential overflow
0x40:  sub $11, $2, $4
0x44:  and $12, $2, $5
0x48:  or $13, $2, $6
0x4C:  add $1, $2, $1   # Assume the sum of $2 and $1 exceeds the maximum
                        # representable signed 32-bit integer, causing overflow.
0x50:  slt $15, $6, $7
0x54:  lw  $16, 50($7)
        ...         # Subsequent instructions

Assume the processor is executing these instructions concurrently in its pipeline stages, i.e., Instruction Fetch (IF), Instruction Decode (ID), Instruction Execute (EX), Memory Access (MEM), register Write-back (WB).

Process When Overflow Occurs:

  1. Instruction Execution & Detection (EX Stage): The add $1, $2, $1 instruction at address 0x4C reaches the Execute (EX) stage of the pipeline. During its operation in the ALU, the arithmetic overflow condition is detected. Note that the MIPS instructions add, addi, and sub raise an exception on signed overflow, while their unsigned counterparts (addu, addiu, subu) do not. (See the MIPS ISA Green Sheet notes for each instruction, and the short fragment after this example.)
  2. Initiating Exception Handling (Hardware Actions): Upon detecting the overflow, the hardware initiates the exception handling process. This is treated similarly to a control hazard in the pipeline.
    • Save Program Counter: The address at which to resume is saved in a special register called the Exception Program Counter (EPC). This pipelined MIPS implementation saves the address of the following instruction (PC + 4), so in this case 0x50 (0x4C + 4) is saved in the EPC; the handler subtracts 4 when it needs the address of the faulting instruction itself.
    • Record Exception Cause: The hardware records the reason for the exception (arithmetic overflow) in the Cause register. See Table 6.7, which indicates the code for arithmetic overflow is 0x00000030. The Cause register can potentially record multiple pending exceptions that occur in the same clock cycle, allowing the exception software to prioritize.
    • Flush Pipeline: Instructions that entered the pipeline after the faulting add instruction are flushed (discarded). This might include the slt (0x50) and lw (0x54) instructions, and any others in the IF or ID stages. Control signals like EX.Flush and ID.Flush can be used for this. Importantly, writes to the register file or memory by the faulting instruction itself (the add at 0x4C) and subsequent instructions are prevented. This ensures the incorrect overflow result doesn’t modify register $1.
    • Change Program Counter: The processor forces the PC to a predefined entry point address for the exception handler routine. A common single entry point address for exceptions in MIPS is 0x8000_0180. The PC is updated with this address.
    • Switch Mode: The processor typically transitions from user mode to a privileged mode (like supervisor or kernel mode of the operating system). This allows the exception handler, which is part of the operating system, to access privileged resources and instructions necessary for handling the exception. Exceptions might also be automatically disabled upon entering the handler to prevent nested exceptions initially.
  3. Exception Handler Execution (Software Actions): Execution begins at the fixed handler address (e.g., 0x8000_0180).
    • The handler is OS code. It first needs to determine the cause and location of the exception.
    • It uses the mfc0 (move from coprocessor 0) instruction to read the contents of the Cause and EPC registers, which live in CP0 (the system control coprocessor), into general-purpose registers. For example:
      mfc0 $t0, Cause   # Read the exception cause into $t0; recall Table 6.7.
      mfc0 $t1, EPC     # Read the saved PC+4 address into $t1
      
    • The handler code examines the value in $t0 (which should be 0x00000030 for overflow) to identify the exception type.
    • It saves the general-purpose registers of the interrupted program onto the stack to preserve its state. Note that registers $k0 and $k1 are often reserved for the OS and not saved/restored by compilers.
    • The handler uses the address saved in $t1 (EPC) and potentially subtracts 4 to get the address of the faulting instruction (0x4C).
    • The handler performs actions appropriate for an arithmetic overflow. This could range from logging the error and terminating the program to potentially attempting corrective action (though the nature of this correction for overflow is not detailed in the sources).
    • At an appropriate point, the handler might re-enable exceptions.
    • Before returning, the handler restores the saved general-purpose registers from the stack.
  4. Returning to the Program:
    • To return to the interrupted program, the handler uses the ERET (Return from Exception) instruction. (Older MIPS-I used rfe and jr).
    • ERET transfers control back to the address stored in the EPC (which, after adjustment by the handler, is 0x4C). It also typically switches the processor back to user mode.
    • For many exceptions, including overflow and page faults, the instruction that caused the exception (add $1, $2, $1 at 0x4C) is re-executed from scratch after the handler returns. MIPS instructions are generally restartable because they complete their data writes at the end of the instruction cycle and only write one result.

This sequence allows the processor to handle unexpected events, transition to a privileged handler routine to manage the event, and potentially resume the original program execution.
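
As a concrete illustration of the trigger in step 1, here is a minimal fragment (runnable in MARS/SPIM) that produces the overflow: add raises the exception on signed overflow, while addu performs the same bit-level addition and simply wraps:

        .text
        main:
            li   $t0, 0x7fffffff   # largest positive 32-bit signed integer
            addu $t1, $t0, $t0     # wraps to 0xfffffffe silently: no exception
            add  $t2, $t0, $t0     # signed overflow: raises the arithmetic overflow exception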

Basics of Memory Caching for CPUs

1. Why Memory Hierarchies and Caches?

2. What is a Cache?

3. The Principle of Locality

4. Cache Organization: Blocks, Tags, and Mapping

To review:

  1. Direct-Mapped Cache:
    • How it works: Each block from main memory has only one specific location (cache line) where it can be placed in the cache. This is typically determined by taking the block number (the memory address divided by the block size) modulo (%) the number of cache lines; a short fragment after this list shows the computation.
    • Analogy: Imagine a bookshelf where each book (memory block) has a specific, pre-assigned slot based on its title’s first letter. If another book with the same starting letter comes along and that slot is full, the old book is removed.
    • Pros: It’s simple to implement and fast because the location of a memory block in the cache is fixed.
    • Cons: It can lead to higher miss rates if multiple active memory blocks map to the same cache line, even if other cache lines are empty; these are known as “conflict misses.”
  2. Fully Associative Cache:
    • How it works: A block from main memory can be placed in any available cache line. There are no restrictions on where a block can go.
    • Analogy: Think of a bookshelf where any book can be placed in any empty slot. To find a book, you’d have to look at all the slots.
    • Pros: It’s the most flexible and can achieve the lowest miss rates because it avoids conflict misses. A block is only replaced when the cache is actually full.
    • Cons: It’s the most complex and expensive to implement. To find a block, the cache controller must simultaneously compare the requested memory block’s tag with all the tags in the cache (using Content Addressable Memory - CAM), which requires a lot of comparison hardware. This also makes it slower for larger caches.
  3. Set-Associative Cache (or N-way Set-Associative Cache):
    • How it works: This is a compromise between direct-mapped and fully associative cache. The cache is divided into a number of sets. Each block from main memory maps to a specific set, but it can be placed in any of the cache lines within that set. If there are ‘N’ cache lines in a set, it’s called an ‘N-way’ set-associative cache.
      • A direct-mapped cache can be thought of as a 1-way set-associative cache.
      • A fully associative cache with ‘M’ lines can be thought of as an M-way set-associative cache where there’s only one set.
    • Analogy: Imagine a bookshelf divided into sections (sets) based on, say, genre. A book (memory block) must go into its designated genre section, but within that section, it can be placed in any available slot.
    • Pros: It offers a good balance between the performance of fully associative (by reducing conflict misses compared to direct-mapped) and the hardware cost/complexity of direct-mapped (by limiting the number of comparisons needed).
    • Cons: It’s more complex than direct-mapped but less complex than fully associative. The miss rate is generally lower than direct-mapped but can be slightly higher than fully associative. The choice of ‘N’ (the associativity) is a design trade-off.

In summary, the three main types are: direct-mapped (each block has exactly one possible line), fully associative (a block may go in any line), and N-way set-associative (a block maps to one set but may occupy any of that set’s N lines).

An obvious question is which of these three cache types has the highest (average) hit rate. It is important to understand, however, that there is no fixed “average hit rate” for each type of cache mapping that applies universally. The hit rate a cache actually achieves depends heavily on a variety of factors, including:

  1. Cache Size: Larger caches generally lead to higher hit rates because more data can be stored, reducing capacity misses.
  2. Block Size (or Line Size): The amount of data fetched from memory into a cache line. If the block size is too small, the cache might not exploit spatial locality well. If it’s too large, it might bring in unused data, wasting cache space and bandwidth, potentially leading to pollution.
  3. The Specific Program/Workload: This is a huge factor! Programs with good locality of reference (both temporal – recently accessed data is likely to be accessed again soon, and spatial – data near recently accessed data is likely to be accessed soon) will experience much higher hit rates.
  4. Associativity: For set-associative caches, the number of ways (N) influences the hit rate.
  5. Replacement Policy: For set-associative and fully associative caches, the algorithm used to decide which block to evict when a new block needs to be brought in (e.g., LRU - Least Recently Used, FIFO - First-In, First-Out, Random) can impact hit rates.
  6. The specific MIPS processor implementation and its overall memory system design.

Relative Performance and General Trends:

Instead of specific percentages, we can talk about the relative performance you might expect and the general trends:

  1. For a given cache size, a direct-mapped cache tends to have the highest miss rate because of conflict misses.
  2. Moving from direct-mapped (1-way) to 2-way set-associative typically reduces the miss rate noticeably; further increases in associativity yield diminishing returns.
  3. A fully associative cache of the same size tends to have the lowest miss rate, but at the highest cost in comparison hardware and, potentially, hit time.
  4. Increasing the total cache size reduces capacity misses for all three mapping types.

How Hit Rates Are Determined in Practice:

Cache hit rates are usually determined through simulation using benchmark programs that represent typical workloads for the target system. Designers run these benchmarks on models of different cache configurations (varying size, associativity, block size, etc.) to see which designs provide the best performance for the cost.

In summary: for a fixed cache size, the expected hit-rate ordering is fully associative ≥ set-associative ≥ direct-mapped, while implementation cost and lookup complexity run in the opposite direction.

But remember, these are general tendencies. A large, well-designed direct-mapped cache might outperform a very small fully associative cache for certain workloads. The specific workload (program behavior) is often the most dominant factor.

5. Cache Write Policies

6. Caches and the MIPS Processor

Understanding these basic cache concepts is fundamental to understanding how MIPS processors efficiently access memory and achieve performance.

Example of a direct-mapped cache:

Cache Configuration:

  • Total cache size: 32 bytes
  • Block size: 4 bytes per block
  • Memory address width: 8 bits (a 256-byte address space)
  • Mapping: direct-mapped

Derived Cache Parameters:

  1. Number of Cache Lines:
    • Formula: Total Cache Size / Block Size
    • Calculation: 32 Bytes / 4 Bytes/block = 8 lines
    • This means our cache has 8 slots, indexed 0 through 7.
  2. Address Structure:
    • Offset Bits: To address a byte within a block.
      • Formula: $\log_2(\text{Block Size})$
      • Calculation: $\log_2(4) = 2$ bits. These are the least significant bits of the address.
    • Index Bits: To determine which cache line the memory block maps to.
      • Formula: $\log_2(\text{Number of Cache Lines})$
      • Calculation: $\log_2(8) = 3$ bits. These bits come after the offset bits.
    • Tag Bits: The remaining most significant bits, used to verify if the correct block is in the cache line.
      • Formula: Total Address Bits - Index Bits - Offset Bits
      • Calculation: 8 - 3 - 2 = 3 bits.

So, an 8-bit memory address AAAAAAAA will be interpreted as:

TTT III OO

Where:

  • TTT: 3 Tag bits
  • III: 3 Index bits
  • OO: 2 Offset bits
Initial Cache State:

We’ll assume the cache is initially empty. Each line has a “Valid” bit (0 for invalid/empty, 1 for valid/contains data) and a “Tag” field.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 0         |              | Empty
001            | 1               | 0         |              | Empty
010            | 2               | 0         |              | Empty
011            | 3               | 0         |              | Empty
100            | 4               | 0         |              | Empty
101            | 5               | 0         |              | Empty
110            | 6               | 0         |              | Empty
111            | 7               | 0         |              | Empty

Sequence of Memory Accesses (Byte Addresses):

Let’s process a sequence of memory accesses and see what happens in the cache.


1. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000, Index = 000 (line 0), Offset = 00. Line 0 is invalid → Miss (compulsory). Block 0 (bytes 0-3) is loaded into line 0 with tag 000.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 1         | 000          | Block 0 (Bytes 0-3)
001            | 1               | 0         |              | Empty

2. Access: Read Byte at Address 4 (Decimal) = 0000 0100 (Binary)

  • Tag = 000, Index = 001 (line 1), Offset = 00. Line 1 is invalid → Miss (compulsory). Block 1 (bytes 4-7) is loaded into line 1 with tag 000.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 1         | 000          | Block 0 (Bytes 0-3)
001            | 1               | 1         | 000          | Block 1 (Bytes 4-7)

3. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000, Index = 000. Line 0 is valid and its tag (000) matches → Hit! The cache contents do not change, so no table is shown.


4. Access: Read Byte at Address 32 (Decimal) = 0010 0000 (Binary)

  • Tag = 001, Index = 000 (line 0), Offset = 00. Line 0 is valid but its tag (000) does not match → Miss (conflict). Block 0 is evicted and the block for addresses 32-35 is loaded into line 0 with tag 001.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 1         | 001          | Block for Addr 32
001            | 1               | 1         | 000          | Block 1 (Bytes 4-7)

5. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000, Index = 000. Line 0 now holds tag 001 → Miss (conflict). The block for address 32 is evicted and block 0 (bytes 0-3) is reloaded into line 0.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 1         | 000          | Block 0 (Bytes 0-3)
001            | 1               | 1         | 000          | Block 1 (Bytes 4-7)

6. Access: Read Byte at Address 60 (Decimal) = 0011 1100 (Binary)

  • Tag = 001, Index = 111 (line 7), Offset = 00. Line 7 is invalid → Miss (compulsory). The block for addresses 60-63 is loaded into line 7 with tag 001.

Index (Binary) | Index (Decimal) | Valid Bit | Tag (Binary) | Data (Conceptual)
000            | 0               | 1         | 000          | Block 0 (Bytes 0-3)
001            | 1               | 1         | 000          | Block 1 (Bytes 4-7)
111            | 7               | 1         | 001          | Block for Addr 60

Summary of this direct-mapped cache example: of the six accesses, only access 3 was a hit. Accesses 1, 2, and 6 were compulsory (cold) misses, while accesses 4 and 5 were conflict misses: addresses 0 and 32 share index 000 and evicted each other even though lines 2 through 6 remained empty.

This example should give you a good idea of how direct-mapped caches operate, including how addresses are divided and how hits and misses (especially conflict misses) occur.

Example of a fully associative cache:

In a fully associative cache, any block from main memory can be placed in any available cache line. This requires comparing the incoming address’s tag with the tags of all currently valid lines in the cache.

Cache Configuration:

  • Total cache size: 16 bytes
  • Block size: 4 bytes per block
  • Memory address width: 8 bits
  • Mapping: fully associative, with LRU replacement

Derived Cache Parameters:

  1. Number of Cache Lines:
    • Formula: Total Cache Size / Block Size
    • Calculation: 16 Bytes / 4 Bytes/block = 4 lines
    • Our cache has 4 slots (let’s call them Line 0, Line 1, Line 2, Line 3).
  2. Address Structure:
    • Offset Bits: To address a byte within a block.
      • Formula: $\log_2(\text{Block Size})$
      • Calculation: $\log_2(4) = 2$ bits.
    • Tag Bits: The remaining bits identify the memory block. There are no index bits used for lookup in a fully associative cache.
      • Formula: Total Address Bits - Offset Bits
      • Calculation: 8 - 2 = 6 bits.

    So, an 8-bit memory address AAAAAAAA will be interpreted as:

    TTTTTT OO

    Where:

    • TTTTTT: 6 Tag bits
    • OO: 2 Offset bits
  3. Replacement Policy:
    • Since any block can go anywhere, when the cache is full and a new block needs to be brought in (a miss occurs), we must decide which existing block to evict. We’ll use the LRU (Least Recently Used) policy. This means we’ll evict the block that hasn’t been accessed for the longest time.
    • To manage LRU, we’ll keep track of the order of access for the cache lines.

Initial Cache State:

Each line has a “Valid” bit, a “Tag” field, and we’ll also track its “LRU Status” (conceptually, 1 could be Most Recently Used (MRU), and 4 could be Least Recently Used (LRU) when the cache is full). For simplicity in this example, we’ll maintain an ordered list of lines from MRU to LRU.

Line # | Valid Bit | Tag (Binary) | Data (Conceptual Block Addr)
0      | 0         |              |
1      | 0         |              |
2      | 0         |              |
3      | 0         |              |

LRU Order (MRU -> LRU): [] (empty initially)


Sequence of Memory Accesses (Byte Addresses):

1. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000000, Offset = 00. No valid line holds this tag → Miss. The block for bytes 0-3 is placed in Line 0. LRU order (MRU → LRU): [0].


2. Access: Read Byte at Address 8 (Decimal) = 0000 1000 (Binary)

  • Tag = 000010, Offset = 00. No match → Miss. The block for bytes 8-11 is placed in Line 1. LRU order: [1, 0].


3. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000000 matches Line 0 → Hit! LRU order: [0, 1].


4. Access: Read Byte at Address 16 (Decimal) = 0001 0000 (Binary)

  • Tag = 000100. No match → Miss. The block for bytes 16-19 is placed in Line 2. LRU order: [2, 0, 1].


5. Access: Read Byte at Address 24 (Decimal) = 0001 1000 (Binary)

  • Tag = 000110. No match → Miss. The block for bytes 24-27 is placed in Line 3; the cache is now full. LRU order: [3, 2, 0, 1].


6. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 000000 matches Line 0 → Hit! LRU order: [0, 3, 2, 1].


7. Access: Read Byte at Address 32 (Decimal) = 0010 0000 (Binary)

  • Tag = 001000. No match and the cache is full → Miss with replacement. The LRU line is Line 1 (block for address 8), so it is evicted and the block for bytes 32-35 takes its place. LRU order: [1, 0, 3, 2].


Summary of Fully Associative Example: of the seven accesses, accesses 3 and 6 were hits and the other five were compulsory misses. Because any block may occupy any line, there were no conflict misses; the only eviction (access 7) happened when the cache was genuinely full, and LRU selected the block for address 8 as the victim.

This example demonstrates how a new block can occupy any line, how hits occur when a tag matches any line, and how LRU policy works when the cache is full and a replacement is needed.
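
As a software analogy of that lookup, the following MARS/SPIM fragment compares a requested tag against every valid line in turn and prints the line number that hits (or -1 on a miss). Real hardware performs all the comparisons in parallel with a CAM; the loop, data layout, and initial contents here are assumptions made for the sketch:

            .data
    valid:  .word 1, 1, 0, 0         # valid bits for lines 0..3 (assumed state)
    tags:   .word 0x00, 0x02, 0, 0   # stored 6-bit tags for lines 0..3 (assumed)
            .text
    main:   li   $t0, 0x02           # tag of the requested address (address 8 >> 2)
            li   $t1, 0              # current line number
    loop:   sll  $t5, $t1, 2         # byte offset of this line's word-sized entries
            lw   $t2, valid($t5)
            beq  $t2, $zero, skip    # an invalid line cannot match
            lw   $t3, tags($t5)
            beq  $t3, $t0, hit       # valid line with matching tag -> hit
    skip:   addi $t1, $t1, 1
            bne  $t1, 4, loop        # branch with immediate: a MARS/SPIM pseudo-op
            li   $a0, -1             # searched every line without a match: miss
            j    report
    hit:    move $a0, $t1            # report the line number that hit
    report: li   $v0, 1              # print_int syscall (MARS/SPIM convention)
            syscall
            li   $v0, 10             # exit
            syscall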

Example of an N-way set-associative cache:

Let’s walk through a worked example of a 2-way set-associative cache. This type of cache is a compromise between the simplicity of direct-mapped and the flexibility of fully associative.

Cache Configuration:

Derived Cache Parameters:

  1. Total Number of Cache Lines:
    • Formula: Total Cache Size / Block Size
    • Calculation: 32 Bytes / 4 Bytes/block = 8 lines
  2. Number of Sets:
    • Formula: Total Number of Cache Lines / Associativity
    • Calculation: 8 lines / 2 ways = 4 sets
    • Our cache has 4 sets, indexed 00, 01, 10, 11 (binary) or 0, 1, 2, 3 (decimal).
  3. Address Structure:
    • Offset Bits: To address a byte within a block.
      • Formula: $\log_2(\text{Block Size})$
      • Calculation: $\log_2(4) = 2$ bits.
    • Set Index Bits: To determine which set the memory block maps to.
      • Formula: $\log_2(\text{Number of Sets})$
      • Calculation: $\log_2(4) = 2$ bits.
    • Tag Bits: The remaining most significant bits.
      • Formula: Total Address Bits - Set Index Bits - Offset Bits
      • Calculation: 8 - 2 - 2 = 4 bits.

    So, an 8-bit memory address AAAAAAAA will be interpreted as:

    TTTT SS OO

    Where:

    • TTTT: 4 Tag bits
    • SS: 2 Set Index bits
    • OO: 2 Offset bits

    (A short fragment after this list decodes a sample address into these fields.)
  4. Replacement Policy within each Set:
    • We’ll use LRU (Least Recently Used) for each set. Since each set has 2 ways (Way 0, Way 1), the LRU logic is simple: if one way is accessed, the other becomes the LRU for that set.
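
Here is the promised MARS/SPIM fragment, decoding address 20 (used in access 6 below) into its three fields:

        .text
        main:
            li   $t0, 20          # example byte address = 0001 0100
            andi $t1, $t0, 0x3    # offset    = low 2 bits        -> 00
            srl  $t2, $t0, 2
            andi $t2, $t2, 0x3    # set index = next 2 bits       -> 01 (set 1)
            srl  $t3, $t0, 4      # tag       = remaining 4 bits  -> 0001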

Initial Cache State:

Each set has two ways. Each way has a Valid bit and a Tag field. We also track the LRU way for each set.

Set Index (Binary) | Set Index (Decimal) | Way | Valid Bit | Tag (Binary) | Data (Conceptual Block Addr) | LRU Way in Set
00                 | 0                   | 0   | 0         |              |                              | Way 1 (or 0 if both empty)
00                 | 0                   | 1   | 0         |              |                              |
01                 | 1                   | 0   | 0         |              |                              | Way 1 (or 0)
01                 | 1                   | 1   | 0         |              |                              |
10                 | 2                   | 0   | 0         |              |                              | Way 1 (or 0)
10                 | 2                   | 1   | 0         |              |                              |
11                 | 3                   | 0   | 0         |              |                              | Way 1 (or 0)
11                 | 3                   | 1   | 0         |              |                              |

(For LRU Way in Set: We’ll designate which way is LRU. If Way 0 is hit/filled, Way 1 becomes LRU for that set, and vice-versa).


Sequence of Memory Accesses (Byte Addresses):

1. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 0000, Set Index = 00 (set 0), Offset = 00. Both ways of set 0 are invalid → Miss. The block for bytes 0-3 goes into Set 0, Way 0; Way 1 becomes the LRU way of set 0.


2. Access: Read Byte at Address 16 (Decimal) = 0001 0000 (Binary)

  • Tag = 0001, Set Index = 00 (set 0). Way 0 holds tag 0000, no match; Way 1 is invalid → Miss. The block for bytes 16-19 goes into Set 0, Way 1; Way 0 becomes the LRU way. Set 0 is now full.


3. Access: Read Byte at Address 0 (Decimal) = 0000 0000 (Binary)

  • Tag = 0000, Set Index = 00. Way 0 matches → Hit! Way 1 becomes the LRU way of set 0.


4. Access: Read Byte at Address 32 (Decimal) = 0010 0000 (Binary)

  • Tag = 0010, Set Index = 00. Neither way of set 0 matches and the set is full → Miss with replacement. The LRU way is Way 1 (block for address 16), so it is evicted and the block for bytes 32-35 takes its place; Way 0 becomes the LRU way.


5. Access: Read Byte at Address 4 (Decimal) = 0000 0100 (Binary)

  • Tag = 0000, Set Index = 01 (set 1). Both ways of set 1 are invalid → Miss. The block for bytes 4-7 goes into Set 1, Way 0; Way 1 becomes the LRU way of set 1.


6. Access: Read Byte at Address 20 (Decimal) = 0001 0100 (Binary)

  • Tag = 0001, Set Index = 01 (set 1). Way 0 holds tag 0000, no match; Way 1 is invalid → Miss. The block for bytes 20-23 goes into Set 1, Way 1; Way 0 becomes the LRU way of set 1.


7. Access: Read Byte at Address 16 (Decimal) = 0001 0000 (Binary)

  • Tag = 0001, Set Index = 00 (set 0). Way 0 holds tag 0000 and Way 1 holds tag 0010 → Miss: the block for address 16 was evicted back in access 4. The LRU way of set 0 is Way 0 (block 0), so it is evicted and the block for bytes 16-19 returns in Way 0; Way 1 becomes the LRU way.


Summary of 2-Way Set-Associative Example: of the seven accesses, only access 3 was a hit. Accesses 1, 2, 5, and 6 were compulsory misses, and accesses 4 and 7 were conflict misses within set 0, where addresses 0, 16, and 32 competed for two ways while sets 2 and 3 stayed empty.

This example illustrates how a memory address targets a specific set, and then a limited associative search is performed within that set. It also shows how LRU works on a per-set basis when replacements are necessary.