Mini quizzes

Bite-sized MCQs per topic.

Pipelining

Q1. An ideal pipelined processor has a CPI of:
1
0
5 (number of stages)
Equal to the number of instructions
1. One instruction completes per cycle in steady state — that's what pipelining buys you. Hazards push CPI above 1.
Q2. Why is a multicycle processor sometimes slower than single-cycle for the same program?
It uses more transistors
Each instruction takes multiple cycles, raising CPI well above 1
It has no forwarding
Branch misprediction penalty is higher
Multicycle. The clock is faster but every instruction takes 3-5 cycles. Total time = #inst × CPI × Tc, and CPI > 1 dominates. Pipelining keeps Tc short AND CPI ≈ 1.
Q3. Which hazard cannot be eliminated by forwarding alone?
ALU → ALU RAW
Load-use (lw immediately followed by a dependent op)
EX → MEM forwarding
WAW hazards
Load-use. The load result isn't available until after MEM, so a dependent instruction in EX must stall by 1 bubble even with forwarding.
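As a rough illustration of the load-use rule, the check below flags a bubble whenever a load is immediately followed by an instruction that reads its destination. The tuple encoding, the function names, and the RISC-V-style convention that register 0 is never a real destination are assumptions made for this sketch, not part of the quiz.

```python
def load_use_stall(prev, curr):
    """True if curr must stall one cycle behind prev in a 5-stage pipeline.

    Each instruction is (op, rd, rs1, rs2) with register numbers,
    or None where a field is unused (hypothetical encoding).
    """
    op, rd, _, _ = prev
    _, _, rs1, rs2 = curr
    # Only a load directly ahead of a consumer needs a bubble;
    # every other RAW case is assumed to be covered by forwarding.
    return op == "lw" and rd not in (None, 0) and rd in (rs1, rs2)


def count_bubbles(program):
    """Count load-use bubbles over adjacent instruction pairs."""
    return sum(load_use_stall(a, b) for a, b in zip(program, program[1:]))


program = [
    ("lw", 5, 6, None),   # x5 = mem[x6]
    ("add", 7, 5, 8),     # uses x5 right away: one bubble
    ("lw", 9, 6, None),   # x9 = mem[x6]
    ("sub", 10, 7, 8),    # independent of x9: no bubble
    ("add", 11, 9, 10),   # x9 is two instructions back: forwarding suffices
]
bubbles = count_bubbles(program)
print(bubbles)
```

Only the first lw/add pair stalls; the second load has an independent instruction behind it, which is exactly what a compiler tries to arrange when it schedules around loads.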
Q4. In a 5-stage pipeline, a mispredicted branch costs how many flushed instructions?
5
3
2
0
2. The branch resolves in EX, by which point 2 instructions are already in IF and ID. Both must be flushed, so a mispredicted branch effectively costs 3 cycles (1 for the branch + 2 flushed slots).
Q5. SPECINT2000 mix: 25% loads (40% stall), 10% stores, 13% branches (50% mispredict), 52% R-type. Average CPI ≈
1.00
1.23
1.57
2.00
1.23. CPI_lw = 0.6·1 + 0.4·2 = 1.4. CPI_br = 0.5·1 + 0.5·3 = 2. Avg = 0.25·1.4 + 0.10·1 + 0.13·2 + 0.52·1 = 1.23.
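The same weighted average can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the dictionary keys and the function name are made up for the example.

```python
# Instruction mix from the question (fractions of all instructions).
mix = {"load": 0.25, "store": 0.10, "branch": 0.13, "rtype": 0.52}

# Per-class CPI, assuming a load-use stall adds 1 cycle
# and a misprediction adds 2 cycles.
cpi_load = 0.6 * 1 + 0.4 * 2      # 40% of loads stall for one bubble
cpi_branch = 0.5 * 1 + 0.5 * 3    # 50% of branches flush two instructions
cpi = {"load": cpi_load, "store": 1.0, "branch": cpi_branch, "rtype": 1.0}


def average_cpi(mix, cpi):
    """Weighted average CPI over the instruction mix."""
    return sum(mix[k] * cpi[k] for k in mix)


avg = average_cpi(mix, cpi)
print(f"average CPI = {avg:.2f}")
```

Changing the stall or mispredict fractions in one place shows how sensitive the average is to each instruction class.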

Advanced µArch

Q1. Register renaming eliminates which hazards?
RAW only
RAW and WAW
WAR and WAW
All hazards
WAR & WAW. These are name hazards — fresh physical registers eliminate the conflict. RAW is a true data dependency and survives renaming.
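A toy renamer makes the point concrete. It is a minimal sketch that assumes an unlimited supply of physical registers and a hypothetical (dest, src1, src2) instruction format; real cores recycle physical registers from a finite free list.

```python
from itertools import count


def rename(instrs, num_arch_regs=8):
    """Rename (dest, src1, src2) triples of architectural register numbers."""
    # Initially architectural register i lives in physical register i.
    table = {r: r for r in range(num_arch_regs)}
    fresh = count(num_arch_regs)  # next unused physical register
    out = []
    for dest, src1, src2 in instrs:
        # Sources read the current mapping, so true (RAW) dependencies survive.
        p1, p2 = table[src1], table[src2]
        # Every destination gets a fresh physical register,
        # which removes WAR and WAW conflicts on the name.
        pd = next(fresh)
        table[dest] = pd
        out.append((pd, p1, p2))
    return out


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (4, 1, 5),  # r4 = r1 op r5   (RAW on r1)
    (2, 6, 7),  # r2 = r6 op r7   (WAR on r2 with the first instruction)
    (1, 6, 7),  # r1 = r6 op r7   (WAW on r1 with the first instruction)
]
renamed = rename(program)
for before, after in zip(program, renamed):
    print(before, "->", after)
```

After renaming, the second instruction still reads the physical register written by the first (the RAW dependency), while the third and fourth write registers nobody else touches.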
Q2. The execution-order pattern of a modern OoO core is:
In-order, In-order, Out-of-order
Out-of-order, In-order, Out-of-order
In-order, Out-of-order, In-order
Out-of-order, Out-of-order, In-order
In, Out, In. Front-end fetches/decodes/renames in order. Execution engine schedules out of order. Back-end (ROB) commits in order so the program appears sequential.
Q3. A 2-bit predictor starting at "Weakly Not Taken" runs a 100-iteration loop. Total mispredictions =
1
2
3
100
2. One on iteration 1 (state was 01, branch taken). The state walks to Strongly Taken and stays. The 100th iteration is the loop exit (not taken) → mispredict #2.
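The count can be reproduced with a small simulation of a 2-bit saturating counter. The state encoding (0 to 3) and the function name are choices made for this sketch.

```python
def count_mispredictions(outcomes, state=1):
    """Run a 2-bit saturating counter over a list of branch outcomes.

    States: 0 = Strongly Not Taken, 1 = Weakly Not Taken,
            2 = Weakly Taken,       3 = Strongly Taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# Loop branch: taken 99 times, then not taken on the exit.
outcomes = [True] * 99 + [False]
total = count_mispredictions(outcomes)
print(total)
```

Starting from state 2 or 3 instead would leave only the loop-exit misprediction.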
Q4. A 4-wide superscalar with a 96-entry ROB. Each instruction writes one register. Cycles before the ROB blocks issue (assuming no commits yet)?
96
32
24
8
24. ROB capacity / issue width = 96 / 4 = 24 cycles before the buffer fills.
Q5. 2-bit predictors are better than 1-bit predictors because they cause:
Increased CPI
Reduced IPC
Fewer mispredictions on stable loops (need 2 wrongs to flip direction)
No hardware cost
Hysteresis. 1-bit flips on any wrong guess; 2-bit needs two consecutive wrongs. Loop-exits don't destabilise the predictor.
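To see the hysteresis effect, the sketch below runs 1-bit and 2-bit counters over the same trace. The workload (a 10-iteration loop executed 5 times back to back) is hypothetical and chosen only to keep the numbers small.

```python
def mispredicts(outcomes, bits):
    """Count mispredictions of a saturating counter with the given width."""
    top = (1 << bits) - 1          # 1 for a 1-bit predictor, 3 for 2-bit
    threshold = (top + 1) // 2     # predict taken when state >= threshold
    state = top                    # start fully biased toward taken
    misses = 0
    for taken in outcomes:
        if (state >= threshold) != taken:
            misses += 1
        state = min(state + 1, top) if taken else max(state - 1, 0)
    return misses


# Hypothetical workload: a 10-iteration loop executed 5 times in a row.
one_run = [True] * 9 + [False]
trace = one_run * 5

one_bit = mispredicts(trace, bits=1)
two_bit = mispredicts(trace, bits=2)
print(one_bit, two_bit)
```

The 1-bit predictor misses on every loop exit and again on every re-entry, while the 2-bit predictor only misses on the exits.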

Memory Systems

Q1. If the degree of associativity is 1, the cache is:
1-Way Set Associative only
Direct Mapped only
Both 1 and 2 (they're the same thing)
Neither
Both. A 1-way set-associative cache is a direct-mapped cache. Two names, one structure.
Q2. The TLB is:
A small cache for data
A small cache for instructions
A small cache for address translations
A small cache for page-fault records
Translations. Translation Lookaside Buffer holds recent VPN → PPN mappings so we skip the page-table walk on hits.
Q3. Page size = 2 Kiword, 4 segments (text/data/heap/stack), one page each, 4-byte words. Total cumulative bytes?
2¹³
2¹⁵
2¹⁷
2¹⁰
2¹⁵. 4 × 2 × 1024 words = 2¹³ words. Times 4 bytes/word = 2¹⁵ bytes.
Q4. CPI_base = 2, miss rate 4%, miss penalty 100 cycles, f = 30% (fraction of instructions that access memory). Speedup from a perfect cache:
1.20×
1.60×
2.00×
3.20×
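1.60×. MSCPI (CPI including memory stalls) = CPI_base + f · miss rate · miss penalty = 2 + 0.30·0.04·100 = 3.20. A perfect cache never misses, so its CPI is just 2. Speedup = 3.20 / 2 = 1.60.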
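The calculation generalises to any miss rate or penalty. A minimal sketch, with parameter names invented for the example:

```python
def cpi_with_memory_stalls(cpi_base, mem_fraction, miss_rate, miss_penalty):
    """Base CPI plus average memory-stall cycles per instruction."""
    return cpi_base + mem_fraction * miss_rate * miss_penalty


cpi_real = cpi_with_memory_stalls(cpi_base=2, mem_fraction=0.30,
                                  miss_rate=0.04, miss_penalty=100)
cpi_perfect = 2  # a perfect cache never misses, so only the base CPI remains
speedup = cpi_real / cpi_perfect
print(f"CPI with stalls = {cpi_real:.2f}, speedup = {speedup:.2f}x")
```

Halving the miss rate to 2% in this model drops the stall component from 1.2 to 0.6 cycles per instruction.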
Q5. Increasing the cache block size will:
Always improve miss rate
Always hurt miss rate
Exploit spatial locality but may raise conflict misses & penalty
Have no effect on miss penalty
Trade-off. Bigger blocks bring in more useful neighbours (spatial locality), but for a fixed cache size they reduce the number of blocks ⇒ more conflicts, and each miss transfers more data ⇒ higher penalty.

Vector / RVV

Q1. SIMD / Vector exploits which kind of parallelism?
Instruction-Level
Thread-Level
Data-Level
Bit-Level
Data-level. Same operation across many data elements simultaneously.
Q2. VLEN = 256 bits, SEW = 32 bits, LMUL = 1. VLMAX is:
256
32
8
1
8. VLMAX = LMUL × VLEN / SEW = 1 × 256 / 32 = 8 elements per vector register.
Q3. With AVL = 5 and VLMAX = 8, the runtime VL set by vsetvli is:
8
5
3
13
5. VL = min(AVL, VLMAX) = min(5, 8) = 5. The other 3 lanes are tail.
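The VLMAX and VL rules combine into the usual strip-mining loop. The Python model below is a simplified sketch: it assumes integer LMUL and always takes vl = min(AVL, VLMAX), whereas the RVV specification gives implementations some freedom when AVL lies between VLMAX and 2 × VLMAX. The function names are illustrative, not real intrinsics.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """Maximum number of elements per vector register group (integer LMUL)."""
    return lmul * vlen_bits // sew_bits


def vsetvli(avl, vlen_bits, sew_bits, lmul=1):
    """Simplified model of vsetvli: vl = min(AVL, VLMAX)."""
    return min(avl, vlmax(vlen_bits, sew_bits, lmul))


def strip_mine(n, vlen_bits=256, sew_bits=32):
    """Return the vl chosen on each pass over an n-element array."""
    chunks = []
    remaining = n
    while remaining > 0:
        vl = vsetvli(remaining, vlen_bits, sew_bits)
        chunks.append(vl)      # process vl elements this pass
        remaining -= vl
    return chunks


print(vlmax(256, 32))          # VLEN = 256, SEW = 32, LMUL = 1
print(vsetvli(5, 256, 32))     # AVL = 5
print(strip_mine(21))          # a 21-element array on 256-bit hardware
```

Running `strip_mine(21, vlen_bits=512)` models the same loop on wider hardware: fewer passes, with no change to the loop itself.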
Q4. The biggest advantage of RVV over traditional SIMD (SSE/AVX) is:
It's faster on every operation
It uses less power
VL is runtime-variable, so the same binary scales across hardware widths
It avoids cache misses
Portability. SSE/AVX bake the width into the opcodes (e.g. 128-bit movdqa, 512-bit AVX-512 instructions). RVV uses a runtime VL, so the same binary runs without a recompile and tail handling is automatic.
Q5. With "tail agnostic", the hardware may leave tail lanes:
Always zeroed
Always preserved (old values)
Anything the spec allows (old value or all 1s); don't rely on them
Set to NaN
Anything. Agnostic = the spec lets each tail element either keep its old value or be overwritten with all 1s, so software must not depend on the result. Use tail undisturbed if you need preservation.

Green Computing + Design Verification

Q1. Dynamic power scales as:
P ∝ V
P ∝ V·f
P ∝ ½ C V² f
P ∝ I·V (leakage only)
½ C V² f. Voltage is squared — that's why voltage scaling delivers the biggest power wins.
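A quick numeric check shows why the squared term matters. The capacitance, voltage, and frequency values are hypothetical, picked only so the baseline comes out to 1 W.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = 1/2 * C * V^2 * f (watts for SI inputs)."""
    return 0.5 * c * v**2 * f


# Hypothetical operating point, chosen only for illustration.
base = dynamic_power(c=1e-9, v=1.0, f=2e9)

# Scale only the frequency by 0.8: power falls linearly.
freq_only = dynamic_power(c=1e-9, v=1.0, f=1.6e9)

# Scale voltage and frequency by 0.8 each: power falls with 0.8 cubed.
scaled = dynamic_power(c=1e-9, v=0.8, f=1.6e9)

print(f"frequency only: {freq_only / base:.3f}")
print(f"voltage + frequency: {scaled / base:.3f}")
```

Dropping frequency alone saves 20%, while dropping voltage along with it saves almost half.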
Q2. "Dark silicon" refers to:
Defective transistors
Areas of the die left unpowered because we can't dissipate the heat
Carbon emissions from chip manufacture
Off-state leakage
Unpowered areas. Dennard scaling broke ~2003 — more transistors fit, but per-transistor power didn't fall, so portions of the die must stay dark to stay thermally safe.
Q3. Lower PUE means a data centre is:
Hotter
More efficient (less overhead beyond IT load)
Less reliable
Slower
Efficient. PUE = total facility power / IT power. Ideal = 1.0 (no overhead). Industry avg ~1.5, Google ~1.1.
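To put the ratio in concrete terms, the sketch below converts a PUE value into overhead power for a given IT load. The 1000 kW IT load is a made-up figure; the two PUE values are the ones quoted above.

```python
def pue(total_facility_kw, it_kw):
    """Power Usage Effectiveness: total facility power over IT power."""
    return total_facility_kw / it_kw


def overhead_kw(it_kw, pue_value):
    """Power spent on cooling, distribution, etc. beyond the IT load."""
    return it_kw * (pue_value - 1.0)


it_load = 1000.0  # hypothetical IT load in kW
typical = overhead_kw(it_load, 1.5)
efficient = overhead_kw(it_load, 1.1)
print(f"PUE 1.5 -> {typical:.0f} kW overhead, PUE 1.1 -> {efficient:.0f} kW overhead")
```

For the same IT load, moving from 1.5 to 1.1 cuts the overhead to one fifth.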
Q4. Verilog is used to code:
Microarchitecture (datapath, control, hardware)
Architecture (ISA)
Assembly
High-level language programs
Microarchitecture. Verilog is an HDL — it describes the actual hardware that implements the ISA.
Q5. In a UVM testbench, the component that observes DUT signals passively and broadcasts transactions via an analysis port is the:
Driver
Sequencer
Monitor
Scoreboard
Monitor. Passive — never drives signals. Captures activity → transaction → analysis port → scoreboard/coverage.