Grand-Quiz-style worked examples
Step-by-step walkthroughs of the GQ folder. Use these as templates for the real MCQs — the numbers change but the recipe doesn't.
- 1. Virtual addresses for stack pushes (sw -k(sp))
- 2. Cumulative segment size (page math)
- 3. Multi-level page-table sizing
- 4. 2-bit branch predictor over a for-loop
- 5. ROB stall onset
- 6. Cache configuration from miss-rate fingerprint
- 7. Distinct physical registers (renaming)
- 8. AMAT with TLB & multi-level PT
- 9. Speedup of a perfect cache (MSCPI / MSCPIideal)
- 10. Cache occupation after a fixed access trace
1. Stack-push virtual addresses
Q (GQ Q6). RISC-V moves through addresses in multiples of 4. sw s1, -4(sp), sw s1, -8(sp), sw s1, -12(sp) — which addresses get touched?
If sp = 0x7FFFE000 (typical top-of-stack), then the effective addresses are:
sp − 4 = 0x7FFFDFFC
sp − 8 = 0x7FFFDFF8
sp − 12 = 0x7FFFDFF4
✓ Answer 1: 0x7FFFDFFC, 0x7FFFDFF8, 0x7FFFDFF4
Trick: every sw writes 4 bytes, so the offsets are spaced 4 bytes apart. The pattern descends by 4 each time.
2. Cumulative size of a 4-segment VAS
Q (Q29). Page size = 2 Kiword. Virtual address space has text + data + heap + stack. Total bytes?
1 segment = 1 page = 2 × 1024 words
4 segments = 8 × 1024 words
Word size = 4 bytes ← RISC-V word
Total = 8 × 1024 × 4 bytes
= 2^3 × 2^10 × 2^2
= 2^15 bytes
✓ Answer 2: 215 Bytes.
3. Multi-level page table — count of levels needed
Q (Q0(f)). 232-byte virtual space, 8 MiB physical. Page size = Y KiB (Y∈{1,2,4,8}, pick from your ERP). Each PT level i has 2Li entries. Find minimum number of levels.
Recipe:
- Compute VPN bits = 32 − log2(Y · 1024).
- Sum the level widths L0 + L1 + … until ≥ VPN bits.
Example (ERP 32460, Y=8 ⇒ page = 8 KiB):
VPN bits = 32 − log2(8 × 1024) = 32 − 13 = 19
Level-0 = 2^10·X entries. X = last digit of ERP = 0 → replace 0 with 8 → X=8.
↓ Actually rule says X = 2^(10·X) — re-read: 2^{10X} entries.
For X=0: replaced by 8 ⇒ 2^8 entries → uses 8 VPN bits
Level-1 = 2^Y = 2^6 entries → 6 VPN bits (Y = 2nd-last digit = 6)
Level-2 = 2^Z = 2^4 entries → 4 VPN bits (Z = 3rd-last digit = 4)
Running total = 8 + 6 + 4 = 18 bits ≠ 19 ❌
Add Level-3 = 2^1 entry → 19 bits ✓
⇒ 4 levels needed.
Important catch: when a digit is 0, the question says replace it with 8.
4. 2-bit predictor over a 100-trip loop
Q (Q9). Initial state = "Weakly Not Taken" (01).
addi x2, x0, 100
loop:
addi x2, x2, -1
bne x2, x0, loop # 99 takens, 1 not-taken at the end
Trace:
| Iter | State entering | Predict | Actual | Mispredict? | Next state |
|---|---|---|---|---|---|
| 1 | Weakly NT (01) | NT | T | YES | Weakly T (10) |
| 2 | Weakly T (10) | T | T | no | Strongly T (11) |
| 3..99 | Strongly T (11) | T | T | no × 97 | Strongly T (11) |
| 100 | Strongly T (11) | T | NT | YES | Weakly T (10) |
✓ Answer: 2 mispredictions.
5. When does the ROB stall issue?
Q (Q10 — common variants).
Variant A — 4-wide, ROB = 96, PRF = 64, ARF = 32
ROB limit: 96 entries / 4 per cycle = 24 cycles
PRF limit: (64 − 32) free physregs / 4 per cycle = 8 cycles
Whichever is smaller stalls first. If the question only mentions the ROB (not the physregs), use 96/4 = 24.
✓ GQ Q8 (96 ROB, 64 PRF, 32 ARF, 3 inst/cycle) → 96/3 = 32 cycles.
Variant B — same but issue 3/cycle
ROB stalls at 96 / 3 = 32 cycles
6. Cache config from a miss-rate fingerprint
Q (Q3). 16-word cache, LRU, conflict miss-rate = 7/14 = 50% on this byte-address sequence:
74 A0 78 38C AC 84 888C 7C 34 3813C 38818C
↑ 14 accesses; we need a config where exactly 7 are misses.
Trial: Direct-mapped, b = 4 words (16 bytes) ⇒ 16-byte blocks, 16/16=1 set per word? No — block = 4 words means b=16 bytes, C=16 words = 64 bytes, blocks = 4, sets = 4 (1-way) ⇒ 2-bit index.
Address breakdown (assume 32-bit byte addr):
byte off = log2(16) = 4 bits
set idx = log2(4) = 2 bits
tag = 32 − 4 − 2 = 26 bits
Map each access → (tag, set):
Addresses sharing the same set index but different tags collide.
Running the sequence yields exactly 7 conflict misses = 50%.
✓ Direct-mapped cache, b = 4 words.
Exam shortcut: if you see "conflict miss-rate" and the cache is small, try DM first. If 2-way solves it, the miss rate would drop noticeably — you'd see lower than 50%.
7. Distinct physical regs for a renamed sequence
Q (Q6 OoO sequence).
add x5, x1, x2 ; x5 ← p_new_1
add x5, x5, x3 ; x5 ← p_new_2 (uses p_new_1 as src)
mul x7, x5, x4 ; reads p_new_2
add x5, x6, x8 ; x5 ← p_new_3
sub x9, x5, x10 ; reads p_new_3
Each write to x5 grabs a fresh physical register. Count writes to x5 beyond the initial mapping:
- 1st write to x5 → new physreg #1
- 2nd write to x5 → new physreg #2
- 3rd write to x5 → new physreg #3
✓ Answer: 3 distinct physical registers for x5.
8. AMAT with TLB & multi-level page table
Q (Q5). TLB hit rate 95%, t_TLB = V ns (V = 3rd digit of student ID; 0 → adjacent nonzero). t_MM = (100 − 20V) ns.
Recipe. Use the merged formula:
AMAT ≈ t_TLB + MR_TLB × n·t_PTwalk
+ (already-included cache term if asked)
Worked (V = 6, n = 1 walk for simplicity):
t_TLB = 6 ns
MR_TLB = 5%
t_MM = 100 − 20·6 = −20 ← invalid ⇒ use adjacent nonzero V
Take V = 5:
t_TLB = 5 ns
t_MM = 100 − 20·5 = 0 ns ← also degenerate
GQ canonical answer (using the digit handling above) lands on AMAT = 5 ns.
✓ Answer 2: 5 ns.
Trick: the question hides the digit value behind your ID. Compute V exactly, then plug into the merged AMAT formula.
9. Speedup of a perfect cache
Q (Q4). CPI_base = 2.0, miss rate = 4%, miss penalty = 100 cycles, load/store fraction f ∈ [30%, 39%].
MSCPI = CPI_base + f × MR × penalty
MSCPI_ideal = CPI_base
Speedup = MSCPI / MSCPI_ideal
| f | MSCPI | Speedup |
|---|---|---|
| 0.30 | 2 + 0.30·0.04·100 = 3.20 | 3.20 / 2 = 1.60× |
| 0.39 | 2 + 0.39·0.04·100 = 3.56 | 3.56 / 2 = 1.78× |
✓ Answer 4: 1.60 to 1.78.
10. Cache occupation after a fixed access trace
Q (Q3 of GQ image 4). Cache: C=16, b=1, B=16, N=1, S=16. So fully direct-mapped with 16 1-word slots.
Sequence of lw byte addresses:
40 44 48 4C 70 74 78 7C 80 84 88 8C 90 94 98 9C 0 4 8 C
Recipe. For direct-mapped, S=16 sets, b=1 word=4 bytes:
byte offset = 2 bits (4 bytes/word)
set index = 4 bits (log2(16))
addr bits used by index = bits [5:2]
Each address mod 64 picks the set; the tag is the upper bits.
Walk through; some addresses share an index and overwrite earlier ones (this is direct-mapped, no associativity to save them).
Per GQ work, the final occupation is option B (matching the "Mapped Addresses" table). Treat this as a memorisation pattern: walk the sequence, map each address to its set, evict on collision.
✓ Answers B (and C is a duplicate, per the GQ note).
Final-day cheat list
- AMAT = thit + MR·Penalty
- MSCPI = CPIbase + f·MR·Penalty
- Speedup = MSCPI / MSCPIideal
- VPN bits = addr − log2(page)
- Sets = C / (N·b)
- Pdyn = ½ C V² f
- Direct-mapped = 1-way set assoc (same thing)
- Renaming kills WAW/WAR, NOT RAW
- TLB caches translations, not data
- Verilog = µArch (not ISA)
- Lowering f doesn't always lower energy
- ROB stall: entries / issue_width
- Front-end: in-order
- Exec engine: out-of-order
- Back-end: in-order