Total: 5 points
| Total points | Explanation |
|---|---|
| 0 | Not handed in |
| 1 | Handed in late |
| 2 | Handed in on time, but not every problem is fully worked through with a clearly identified solution |
| 3 | Handed in on time, every problem fully worked through with a boxed answer, but fewer than a majority of problems answered correctly |
| 4 | Handed in on time, every problem fully worked through with a clearly boxed answer, and a majority of problems answered correctly |
| 5 | Handed in on time, every problem fully worked through with a clearly boxed answer, and every problem answered correctly |
1. Cache Fields and Configuration
For a direct-mapped cache design with a 32-bit physical address, the following bits of the address are used to access the cache:
a) What is the cache block size, in words?
b) How many entries (blocks) does the cache have?
c) What is the ratio of the total number of bits required to implement this cache to the number of data storage bits? Assume one valid bit per block and no dirty bit.
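The relationships behind parts (a) to (c) can be checked with a short Python sketch. It assumes byte addressing with 4-byte words, so the block offset field includes the 2-bit byte offset. The field split used below (tag = bits 31-12, index = bits 11-4, offset = bits 3-0) is hypothetical and chosen only for illustration; substitute the widths from the field table in the problem. The class name `DirectMappedConfig` is likewise illustrative.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumption: 32-bit (4-byte) words, byte-addressed memory


@dataclass
class DirectMappedConfig:
    tag_bits: int
    index_bits: int
    offset_bits: int  # block offset, including the 2-bit byte offset

    def block_size_words(self) -> int:
        # 2^offset bytes per block, divided into 4-byte words
        return (2 ** self.offset_bits) // WORD_BYTES

    def num_blocks(self) -> int:
        # One cache entry per index value
        return 2 ** self.index_bits

    def overhead_ratio(self, valid_bits: int = 1) -> float:
        # Every block stores data + tag + valid bit (no dirty bit).
        # Both totals scale with the number of blocks, so compare per block.
        data_bits = self.block_size_words() * WORD_BYTES * 8
        total_bits_per_block = data_bits + self.tag_bits + valid_bits
        return total_bits_per_block / data_bits


# Hypothetical split, NOT the one from the assignment's field table.
cfg = DirectMappedConfig(tag_bits=20, index_bits=8, offset_bits=4)
assert cfg.tag_bits + cfg.index_bits + cfg.offset_bits == 32
print(cfg.block_size_words(), cfg.num_blocks(), cfg.overhead_ratio())
```

Working per block keeps the ratio independent of the number of entries: with these hypothetical widths each block holds 128 data bits plus 21 bits of tag and valid overhead.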
2. Direct-Mapped Cache Access
Below is a list of 32-bit memory address references, given as word addresses (not byte addresses):
0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58, 0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd
For each reference, give the binary address, the tag, and the index for a direct-mapped cache with 16 one-word blocks. Also state whether each reference is a hit or a miss, assuming the cache is initially empty. Present your results as a trace table.
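A direct-mapped lookup splits each word address into an index (the low log2(16) = 4 bits) and a tag (the remaining upper bits), then compares the tag stored at that index. The Python sketch below replays the reference list that way, which can be used to check a hand-built trace table. The function name `trace` is hypothetical, and the sketch treats each reference as a 32-bit word address with no byte offset, as the problem states.

```python
NUM_BLOCKS = 16  # 16 one-word blocks
INDEX_BITS = 4   # log2(NUM_BLOCKS)


def trace(word_addrs):
    # Stored tag per block; None marks an invalid (empty) block
    cache = [None] * NUM_BLOCKS
    rows = []
    for addr in word_addrs:
        index = addr & (NUM_BLOCKS - 1)  # low 4 bits select the block
        tag = addr >> INDEX_BITS         # remaining upper bits form the tag
        hit = cache[index] == tag        # None never equals an int tag
        if not hit:
            cache[index] = tag           # on a miss, the block is replaced
        rows.append((addr, format(addr, "032b"), tag, index, hit))
    return rows


refs = [0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58, 0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd]
for addr, binary, tag, index, hit in trace(refs):
    print(f"0x{addr:02x}  {binary}  tag=0x{tag:x}  index={index:2d}  "
          f"{'hit' if hit else 'miss'}")
```

Using `None` for empty blocks plays the role of the valid bit, so an address whose tag is 0 cannot falsely hit in an empty block.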
3. Cache Performance and Bottlenecks
Assume that main memory accesses take 70 ns and that memory accesses represent 36% of all instructions executing in a pipeline. The following table shows data for L1 caches attached to each of two processors, P1 and P2.
| Processor | L1 Size | L1 Miss Rate | L1 Hit Time |
|---|---|---|---|
| P1 | 2 KB | 8.0% | 0.66 ns |
| P2 | 4 KB | 6.0% | 0.90 ns |
a) Assuming that the L1 hit time completely determines the clock cycle time for P1 and P2, what are their respective clock rates in GHz?
b) What is the Average Memory Access Time (AMAT), in nanoseconds, for P1 and for P2?
c) Assuming a base CPI of 1.0 when there are no memory stalls, what is the effective CPI, including memory stalls, for P1 and for P2? Which processor is faster overall? Show your execution-time calculation.
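The standard relations here are: clock rate = 1 / cycle time; AMAT = hit time + miss rate × miss penalty; CPI = base CPI + (memory accesses per instruction) × miss rate × (miss penalty in cycles); and execution time = instruction count × CPI × cycle time. The Python sketch below applies them to P1 and P2 as a way to check hand calculations. It assumes that the 36% figure counts data accesses only (instruction-fetch misses are not modeled), that the miss penalty in cycles is 70 ns divided by the cycle time without rounding, and that both processors execute the same instruction count, so time per instruction is enough to compare them. The names `analyze` and `processors` are illustrative.

```python
MEM_LATENCY_NS = 70.0
MEM_ACCESSES_PER_INSTR = 0.36  # assumption: data accesses only, per the problem text
BASE_CPI = 1.0

processors = {
    "P1": {"hit_time_ns": 0.66, "miss_rate": 0.08},  # 2 KB L1
    "P2": {"hit_time_ns": 0.90, "miss_rate": 0.06},  # 4 KB L1
}


def analyze(hit_time_ns, miss_rate):
    clock_ghz = 1.0 / hit_time_ns                       # cycle time = L1 hit time
    amat_ns = hit_time_ns + miss_rate * MEM_LATENCY_NS  # miss penalty = 70 ns
    miss_penalty_cycles = MEM_LATENCY_NS / hit_time_ns  # assumption: not rounded up
    cpi = BASE_CPI + MEM_ACCESSES_PER_INSTR * miss_rate * miss_penalty_cycles
    time_per_instr_ns = cpi * hit_time_ns               # CPI x cycle time
    return clock_ghz, amat_ns, cpi, time_per_instr_ns


results = {name: analyze(**p) for name, p in processors.items()}
for name, (ghz, amat, cpi, t) in results.items():
    print(f"{name}: {ghz:.3f} GHz, AMAT {amat:.2f} ns, "
          f"CPI {cpi:.2f}, {t:.3f} ns/instruction")

# Same instruction count assumed, so lower time per instruction wins
faster = min(results, key=lambda n: results[n][3])
print("Faster:", faster)
```

If your course rounds the miss penalty up to whole cycles, or counts every instruction fetch as a memory access (1.36 accesses per instruction), adjust `miss_penalty_cycles` or `MEM_ACCESSES_PER_INSTR` accordingly; the comparison logic stays the same.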
Submit your answers as a PDF or Markdown file via the Microsoft Teams assignment. Show all of your mathematical work clearly to receive full credit.