<5 points>
| Total points | Explanation |
|---|---|
| 0 | Not handed in |
| 1 | Handed in late |
| 2 | Handed in on time, but not every problem is fully worked through with the solution clearly identified |
| 3 | Handed in on time, each problem answered with a boxed answer and a clearly worked-through solution, and less than a majority of problems answered correctly |
| 4 | Handed in on time, majority of problems answered correctly, each solution boxed clearly, and each problem fully worked through |
| 5 | Handed in on time, every problem answered correctly, every solution boxed clearly, and every problem fully worked through |

Computer Organization and Design (6th Edition), Chapter 4, Sections 4.6, 4.7, 4.8, and 4.9.
always_comb) and sequential (always_ff) logic blocks.

Assume a custom silicon processor has the following functional component latencies:
What is the minimum safe global clock cycle time ($T_c$) needed to execute instructions correctly across this 5-stage pipeline?
Calculate the total instruction latency (from Fetch to Writeback) for a single lw instruction passing through the pipeline, using the clock cycle time from your previous answer. How does this compare to the non-pipelined latency?
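
As a sanity check for the two questions above, the calculation can be sketched in Python. The latency values below are placeholders chosen only for illustration (they are not the assignment's values), and the sketch assumes that pipeline register overhead is either zero or supplied separately. Substitute the latencies given in the problem.

```python
# Placeholder stage latencies in picoseconds. These are hypothetical
# values for illustration only, NOT the latencies from the assignment.
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 150, "MEM": 250, "WB": 100}


def clock_cycle_time(latencies, register_overhead=0):
    """The clock must be long enough for the slowest stage to finish."""
    return max(latencies.values()) + register_overhead


def pipelined_latency(latencies, register_overhead=0):
    """One instruction spends exactly one full cycle in every stage."""
    return len(latencies) * clock_cycle_time(latencies, register_overhead)


def single_cycle_latency(latencies):
    """Without pipelining, an instruction takes the sum of all stage delays."""
    return sum(latencies.values())


if __name__ == "__main__":
    print("T_c              :", clock_cycle_time(STAGE_LATENCY_PS), "ps")
    print("pipelined latency:", pipelined_latency(STAGE_LATENCY_PS), "ps")
    print("single-cycle     :", single_cycle_latency(STAGE_LATENCY_PS), "ps")
```

With these placeholder numbers the pipelined latency of one instruction is longer than the single-cycle latency, because every stage is padded out to the slowest stage's delay. Pipelining improves throughput, not the latency of an individual instruction.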
Suppose the pipelined processor executes the following instruction sequence. Assume the processor implements full EX/MEM and MEM/WB forwarding (bypassing) but does not predict branches.
```
sub $t2, $t1, $t3
and $t4, $t2, $t5
or  $t8, $t2, $t6
add $t9, $t4, $t2
```
Identify all Read-After-Write (RAW) data dependencies in the given instruction sequence. List each dependency by its producer instruction, its consumer instruction, and the register involved.
Identify which hardware forwarding path (e.g., EX/MEM to EX, or MEM/WB to EX) successfully resolves each of the dependencies you listed in 2.1, preventing any pipeline stalls.
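
One way to check an answer to the two questions above is to automate the bookkeeping. The Python sketch below deliberately uses a different, made-up instruction sequence (the `PROGRAM` list is hypothetical) so that it illustrates the method without working the assigned problem. It assumes every producer is an ALU instruction (no loads) and that the register file is written in the first half of a cycle and read in the second half, so a producer three or more instructions ahead needs no forwarding.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


# Hypothetical sequence for illustration; not the assignment's code.
PROGRAM = [
    Instr("add", "$s0", ("$s1", "$s2")),
    Instr("sub", "$s3", ("$s0", "$s4")),
    Instr("xor", "$s5", ("$s0", "$s3")),
    Instr("or",  "$s6", ("$s7", "$s0")),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) triples.

    For each source register, the producer is the nearest earlier
    instruction that writes it; older writers are shadowed.
    """
    deps = []
    for j, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):  # de-duplicate, keep order
            for i in range(j - 1, -1, -1):
                if program[i].dest == reg:
                    deps.append((i, j, reg))
                    break
    return deps


def forwarding_path(distance):
    """Map producer-to-consumer distance to the path that supplies the value."""
    if distance == 1:
        return "EX/MEM -> EX"
    if distance == 2:
        return "MEM/WB -> EX"
    return "register file (no forwarding needed)"


if __name__ == "__main__":
    for i, j, reg in raw_dependencies(PROGRAM):
        print(f"{PROGRAM[i].op} -> {PROGRAM[j].op} on {reg}: {forwarding_path(j - i)}")
```

The distance-based rule holds only under the stated assumptions. A load as the producer changes the picture, because its result is not available until the end of the MEM stage.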
Consider the following MIPS sequence, which suffers from a specific architectural limitation that pure bypass logic cannot solve:
```
lw  $t1, 0($t2)
add $t3, $t1, $t4
```
Explain exactly why full hardware bypassing (forwarding) fails to prevent a stall for this specific instruction sequence. During which clock cycles (and stages) do the producer and consumer overlap?
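
When reasoning about which stages overlap in which clock cycle, it can help to draw the pipeline diagram mechanically. The Python sketch below is a minimal helper under stated assumptions: a classic 5-stage pipeline, stalls modeled as extra cycles spent in ID, and stall counts supplied by the caller rather than derived from the instructions. The function names and the instruction labels in the example are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(stalls):
    """Compute (first_cycle, last_cycle) per stage for each instruction.

    stalls[k] is the number of extra cycles instruction k is held in ID.
    A stalled instruction also holds its successor in IF.
    """
    rows = []
    prev = None
    for extra in stalls:
        if prev is None:
            if_start, id_start = 1, 2
        else:
            # IF frees up when the predecessor moves into ID.
            if_start = prev["ID"][0]
            # ID frees up when the predecessor moves into EX.
            id_start = max(if_start + 1, prev["EX"][0])
        ex = id_start + extra + 1
        row = {
            "IF": (if_start, id_start - 1),
            "ID": (id_start, ex - 1),
            "EX": (ex, ex),
            "MEM": (ex + 1, ex + 1),
            "WB": (ex + 2, ex + 2),
        }
        rows.append(row)
        prev = row
    return rows


def render(names, rows):
    """Format the schedule as one text row per instruction, one column per cycle."""
    last_cycle = max(row["WB"][1] for row in rows)
    lines = []
    for name, row in zip(names, rows):
        cells = []
        for cycle in range(1, last_cycle + 1):
            stage = next((s for s in STAGES if row[s][0] <= cycle <= row[s][1]), "")
            cells.append(f"{stage:<4}")
        lines.append(f"{name:<10}" + "".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: the second instruction is held in ID for one cycle.
    print(render(["producer", "consumer", "next"], schedule([0, 1, 0])))
```

Comparing the cycle in which the producer's value first exists with the cycle in which the consumer enters EX, with and without the extra ID cycle, shows whether a forwarding path can deliver the value in time.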
Assume the processor includes a basic hazard unit, hazard.sv. Write the Boolean equation or SystemVerilog assign statement, evaluated in the Decode (ID) stage, that asserts an lwstall signal for the pipeline. You may reference signals from the EX stage to detect this overlap.
When handling control hazards (e.g., on encountering a beq), the hardware can use a “Predict-Not-Taken” scheme that flushes the pipeline on a misprediction, or the architecture can rely on the compiler to move an instruction into a “Branch Delay Slot.” Describe both mechanisms and explain the primary advantages of relying on the compiler to fill the branch delay slot.