Courses & Projects by Rob Marano

Assignment 11: Pipelined Datapath and Hazards

<5 points>

Homework Scoring Scheme

Total points	Explanation
0	Not handed in
1	Handed in late
2	Handed in on time, but not every problem is fully worked through with its solution clearly identified
3	Handed in on time, each problem given a boxed answer and a clearly worked-through solution, but fewer than a majority of problems answered correctly
4	Handed in on time, majority of problems answered correctly, each solution boxed clearly, and each problem fully worked through
5	Handed in on time, every problem answered correctly, every solution boxed clearly, and every problem fully worked through.

Reading

Computer Organization and Design (6th Edition) Chapter 4, Sections 4.6, 4.7, 4.8, and 4.9.

Formal Guidelines

You must show all algebraic derivations for timing calculations.
When referencing pipeline stages, use the 5-stage abbreviations: IF, ID, EX, MEM, WB.
Any SystemVerilog code should distinguish between combinational (always_comb) and sequential (always_ff) logic blocks.

Part 1: Pipelining Speedup Analysis (Easy)

Assume a custom processor has the following functional component latencies:

Instruction Memory Fetch (IF): $200\text{ ps}$
Register File Read/Decode (ID): $100\text{ ps}$
ALU Compute (EX): $200\text{ ps}$
Data Memory Access (MEM): $200\text{ ps}$
Register File Write (WB): $100\text{ ps}$
Pipeline Register Overhead: $20\text{ ps}$

Problem 1.1

What is the minimum clock cycle time ($T_c$) at which this 5-stage pipeline can execute instructions correctly?

Problem 1.2

Calculate the total instruction latency (from Fetch to Writeback) for a single lw instruction passing through the pipeline, using the clock cycle time from Problem 1.1. How does this compare to the latency of the same instruction on the un-pipelined datapath?
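As a minimal sketch of the arithmetic behind Part 1, the Python below computes a cycle time and the resulting per-instruction latency. It assumes the usual textbook model, in which the clock period must cover the slowest stage plus the pipeline register overhead, and every instruction spends one full cycle in each stage. The function names and the latencies are hypothetical and deliberately differ from the values listed above.

```python
from typing import Sequence


def cycle_time(stage_ps: Sequence[int], reg_overhead_ps: int) -> int:
    """Clock period: slowest stage plus pipeline register overhead (assumed model)."""
    return max(stage_ps) + reg_overhead_ps


def pipelined_latency(stage_ps: Sequence[int], reg_overhead_ps: int) -> int:
    """One instruction occupies every stage for one full clock cycle."""
    return len(stage_ps) * cycle_time(stage_ps, reg_overhead_ps)


def unpipelined_latency(stage_ps: Sequence[int]) -> int:
    """Without pipeline registers, latency is the sum of the stage delays."""
    return sum(stage_ps)


# Hypothetical latencies in picoseconds (IF, ID, EX, MEM, WB); not the assignment's values.
example_stages = [250, 150, 300, 350, 100]
example_overhead = 25

print("T_c       =", cycle_time(example_stages, example_overhead), "ps")
print("pipelined =", pipelined_latency(example_stages, example_overhead), "ps")
print("single    =", unpipelined_latency(example_stages), "ps")
```

In this model a single instruction takes longer on the pipelined design than on the un-pipelined one; the benefit of pipelining is in throughput, not in the latency of one instruction.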

Part 2: Data Hazards and Hardware Bypassing (Medium)

Suppose the pipelined processor is executing the following sequential code block. Assume the processor implements full physical EX/MEM and MEM/WB forwarding (bypassing), but does not predict branches.

sub $t2, $t1, $t3
and $t4, $t2, $t5
or  $t8, $t2, $t6
add $t9, $t4, $t2

Problem 2.1

Identify all Read-After-Write (RAW) Data Dependencies in the given instruction sequence. List each dependency by the producer instruction, the consumer instruction, and the targeted register.
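One way to check a hand-derived dependency list is to automate the search. The sketch below scans a program in order and reports a RAW dependency whenever an instruction reads a register whose most recent writer is an earlier instruction. The Instr type, the function name, and the three-instruction sample program are hypothetical illustrations, not the sequence from this assignment, and the sketch ignores $zero on the assumption that it is hard-wired to 0.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def find_raw_dependencies(program: List[Instr]) -> List[Tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) for each RAW dependency."""
    deps: List[Tuple[int, int, str]] = []
    last_writer: Dict[str, int] = {}  # register -> index of its most recent writer
    for i, ins in enumerate(program):
        # dict.fromkeys removes duplicate sources while keeping their order
        for src in dict.fromkeys(ins.srcs):
            if src != "$zero" and src in last_writer:
                deps.append((last_writer[src], i, src))
        if ins.dest is not None and ins.dest != "$zero":
            last_writer[ins.dest] = i
    return deps


# Hypothetical sample program (not the assignment's code block).
sample = [
    Instr("add $s0, $s1, $s2", "$s0", ("$s1", "$s2")),
    Instr("sub $s3, $s0, $s4", "$s3", ("$s0", "$s4")),
    Instr("or  $s5, $s3, $s0", "$s5", ("$s3", "$s0")),
]

for producer, consumer, reg in find_raw_dependencies(sample):
    print(f"{sample[producer].text}  ->  {sample[consumer].text}  via {reg}")
```

The distance between producer and consumer indices (1, 2, or more instructions apart) is what determines whether forwarding is needed and from which pipeline register.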

Problem 2.2

Identify which hardware forwarding path (e.g., EX/MEM to EX, or MEM/WB to EX) successfully resolves each of the dependencies you listed in 2.1, preventing any pipeline stalls.

Part 3: Load-Use Stalls and SystemVerilog (Hard)

Consider the following MIPS sequence, which suffers from a specific architectural limitation that pure bypass logic cannot solve:

lw  $t1, 0($t2)
add $t3, $t1, $t4

Problem 3.1

Explain exactly why full hardware bypassing (forwarding) fails to prevent a stall for this specific instruction sequence. During which clock cycles (and stages) do the producer and consumer overlap?

Problem 3.2

Assume the processor includes a hazard unit, hazard.sv. Write the Boolean equation or SystemVerilog assign statement, evaluated in the Decode (ID) stage, that asserts the lwstall signal for the pipeline. You may reference signals from the EX stage to detect this overlap.

Part 4: Branch Prediction and Control Hazards (Conceptual)

Problem 4.1

When handling control hazards (e.g., on encountering a beq), the hardware can assume “Predict-Not-Taken” and flush the pipeline when the branch turns out to be taken, or the architecture can rely on the compiler to move an instruction into a “Branch Delay Slot.” Describe both mechanisms and explain the primary advantages of relying on the compiler for branch delay optimization.
