← back to syllabus ← back to notes
While we touched upon structural modeling (instantiating gates) in Week 4, SystemVerilog provides a robust set of built-in gate primitives that form the lowest level of structural description natively supported by the language.
Detailed notes on Verilog/SystemVerilog as a whole and notes on how to run SystemVerilog on your computer
SystemVerilog supports the standard logic gates, which can have one output and multiple inputs:
and, nandor, norxor, xnorSyntax: gate_type [instance_name] (output, input1, input2, ...);
// 2-input AND gate
and a1 (out, in1, in2);
// 4-input OR gate (primitives can take N inputs!)
or o1 (out, in1, in2, in3, in4);
Unlike standard logic gates, buffers and inverters have one input and can have multiple outputs.
buf: Passes the input to the output unchanged (used for increasing drive strength).not: Inverts the input.Syntax: gate_type [instance_name] (output1, output2, ..., input);
// 1 input 'A', fanning out to 3 inverted outputs
not n1 (out1, out2, out3, A);
Used to model busses where multiple drivers might exist. They have an input, an output, and a control signal.
bufif1: Passes input to output if control is 1, else high-impedance (Z).notif0: Inverts input to output if control is 0, else high-impedance (Z).You can accurately model physical propagation delays using the # delay control operator when instantiating primitives.
// Output 'y' changes 5 time units after inputs 'a' or 'b' change
and #(5) u1 (y, a, b);
// Specifies separate rise, fall, and turn-off delays
// #(Rise, Fall, Turn-Off)
bufif1 #(2, 3, 4) b1 (out, in, ctrl);
When dealing with busses, instantiating 32 individual AND gates is tedious. SystemVerilog allows instantiating an array of primitives concisely.
logic [7:0] A, B, Y;
// Instantiates 8 AND gates, where Y[i] = A[i] & B[i]
and bitwise_and [7:0] (Y, A, B);
Example 1: Easy (SOP Equation)
Task: Implement $z = (x_1 x_2)’ + x_3$ using nand, and, or.
module easy_prim(output logic z, input logic x1, x2, x3);
logic net1;
nand u1 (net1, x1, x2); // net1 = (x1 * x2)'
or u2 (z, net1, x3); // z = net1 + x3
endmodule
Example 2: Medium (8-bit Odd Parity Generator)
Task: An array of xor and xnor to determine if an 8-bit bus has an odd number of 1s.
module med_prim(output logic p_odd, input logic [7:0] data);
logic [3:0] net1;
logic [1:0] net2;
// Layer 1: pairwise XOR
xor l1 [3:0] (net1, data[7:4], data[3:0]);
// Layer 2: pairwise XOR
xor l2 [1:0] (net2, net1[3:2], net1[1:0]);
// Layer 3: final XNOR (inverts even parity to get odd)
xnor out (p_odd, net2[1], net2[0]);
endmodule
Example 3: Hard (Structural 4-to-1 Multiplexer)
Task: Build a 4:1 mux using only bufif1 tri-state primitives and standard gates.
module hard_prim(output wire y, input logic [3:0] d, input logic [1:0] sel);
logic sel0_n, sel1_n;
logic dec[3:0]; // decoder outputs
// Invert select lines
not (sel0_n, sel[0]);
not (sel1_n, sel[1]);
// 2-to-4 Decoder
and (dec[0], sel1_n, sel0_n);
and (dec[1], sel1_n, sel[0]);
and (dec[2], sel[1], sel0_n);
and (dec[3], sel[1], sel[0]);
// Tri-state drivers
// Only one driver is active at a time; others are high-Z.
bufif1 b0 (y, d[0], dec[0]);
bufif1 b1 (y, d[1], dec[1]);
bufif1 b2 (y, d[2], dec[2]);
bufif1 b3 (y, d[3], dec[3]);
endmodule
While built-in primitives handle basic logic, what if you want to define a custom, optimized gate cell (like a 2-to-1 mux or a specific SR latch) without building it from smaller gates? SystemVerilog allows User-Defined Primitives (UDPs).
UDPs are essentially truth tables or state transition tables evaluated extremely quickly by the simulator.
inout ports.UDPs are declared outside of module boundaries using the primitive ... endprimitive block.
A combinational UDP defines the output solely based on the current inputs.
Example: A Custom 3-Input Majority Gate Outputs 1 if two or more inputs are 1.
primitive majority3 (
output logic y,
input logic a, b, c
);
// Define the truth table
table
// a b c : y
0 0 0 : 0;
0 0 1 : 0;
0 1 0 : 0;
0 1 1 : 1;
1 0 0 : 0;
1 0 1 : 1;
1 1 0 : 1;
1 1 1 : 1;
endtable
endprimitive
You instantiate majority3 exactly as you would an and gate.
A sequential UDP has state (memory). The output port relies on both its current state and the inputs. The output port must be declared as a reg or logic within the primitive.
Example: A Level-Sensitive D-Latch
primitive d_latch_udp (
output logic q,
input logic clk, d
);
// Optional: Set initial state
initial q = 1'b0;
table
// clk d : state : next_state(q)
1 0 : ? : 0; // CLK=1, pass D
1 1 : ? : 1;
0 ? : ? : -; // CLK=0, retain state ('-' means no change)
? (??) : ? : -; // Ignore any edges on D if CLK hasn't changed
endtable
endprimitive
(Note: ? means any state. (01) denotes a rising edge, (??) denotes any edge).
As seen above, UDPs are exceptional for modeling low-level library cells provided by silicon foundries, allowing for fast simulation of specific hardware behaviors.
Example 1: Easy (2-to-1 Mux Combinational UDP) Task: truth table for a Multiplexer.
primitive mux2_udp(output logic y, input logic d0, d1, sel);
table
// d0 d1 sel : y
0 ? 0 : 0; // sel=0, pass d0
1 ? 0 : 1;
? 0 1 : 0; // sel=1, pass d1
? 1 1 : 1;
endtable
endprimitive
Example 2: Medium (Positive Edge-Triggered D-FF)
Task: Sequential UDP transitioning only on posedge clk.
primitive dff_udp(output logic q, input logic clk, d);
initial q = 0;
table
// clk d : state : next_state
(01) 0 : ? : 0; // Rising edge, capture 0
(01) 1 : ? : 1; // Rising edge, capture 1
(1?) ? : ? : -; // Falling edge, hold state
(?0) ? : ? : -; // Any 0 transition, hold state
? (?) : ? : -; // Ignore changes in D without clock edge
endtable
endprimitive
Example 3: Hard (D-FF with Asynchronous Active-Low Clear) Task: Sequential UDP handling asynchronous inputs alongside a clock.
primitive dff_clr_udp(output logic q, input logic clk, d, clr_n);
initial q = 0;
table
// clk d clr_n : state : next_state
? ? 0 : ? : 0; // Async clear takes precedence
? ? (?0) : ? : 0; // Clearing edge
(01) 0 1 : ? : 0; // Rising edge, clr is inactive (1)
(01) 1 1 : ? : 1;
(1?) ? 1 : ? : -; // Ignoring falling clock
? (?) 1 : ? : -; // Ignoring D variations
? ? (01) : 0 : -; // clr recovering, hold 0
endtable
endprimitive
Dataflow modeling (assign) describes how data moves from inputs to outputs concurrently through combinational logic. In Week 4, we saw basic boolean equations. Here we dive into the nuances that make Dataflow powerful for RTL design.
assign)A continuous assignment evaluates instantly anytime a signal on its right-hand side (RHS) changes, driving the result to the left-hand side (LHS) continuously.
Rules:
wire, logic). It cannot be a variable type heavily used in procedural blocks without care.wire sum, carry;
// Both statements execute concurrently. If 'a' changes, both are re-evaluated.
assign sum = a ^ b ^ cin;
assign carry = (a & b) | (cin & (a ^ b));
Instead of declaring a net and then assigning it, you can combine them.
// Explicit
wire out;
assign out = in1 & in2;
// Implicit (Shorthand)
wire out = in1 & in2;
Unlike structural delays, dataflow delays specify how long it takes for a changed value on the RHS to propagate to the LHS.
// If 'a' changes, wait 10 units, then assign the new value of (a & b) to 'y'
assign #10 y = a & b;
wire #10 y; // 'y' always takes 10 units to reflect any driven value
assign y = a & b; // Evaluates immediately, but 'y' won't update for 10 units
SystemVerilog provides a rich set of operators for dataflow equations. Understanding precedence (order of operations) is critical to avoid unexpected logic.
(Highest to Lowest Precedence)
+ - ! ~ & ~& | ~| ^ ~^ (Unary / Reduction)** (Power / Exponentiation)* / % (Arithmetic Multiply/Divide)+ - (Arithmetic Add/Subtract)<< >> <<< >>> (Logical and Arithmetic Shifts)< <= > >= (Relational)== != === !== (Equality and Case Equality)& (Bitwise AND)^ ~^ (Bitwise XOR/XNOR)| (Bitwise OR)&& (Logical AND)|| (Logical OR)?: (Conditional / Ternary)Always use parentheses () to clarify intent, even if you memorize precedence.
Dataflow Modeling Patterns:
assign y = (a + b) - c; (Synthesis tools infer adders/subtractors).assign is_valid = (data_in > 10) && (en == 1); (Returns 1-bit boolean).assign data_out = (sel == 2'b00) ? in0 :
(sel == 2'b01) ? in1 :
(sel == 2'b10) ? in2 : in3;
// Packing state flags
assign status_bus = {is_ready, has_error, 6'b000000};
// Sign-extending a 4-bit number 'a' to 8 bits
assign padded_a = , a};
By mastering these Dataflow tools, complex combinational logic blocks like ALU functions (Add, Subtract, Shift, Compare) become readable and highly synthesizable one-liners.
assign, =, and <=As designs grow complex, understanding EXACTLY how values are placed into variables is critical to synthesize working hardware. SystemVerilog strictly separates continuous dataflow assignments from procedural block assignments.
1. Continuous Assignment (assign)
module.wire or the structural use of logic).2. Procedural Blocking Assignment (=)
always or always_comb procedural blocks.reg or the procedural use of logic).if/else and case structures.3. Procedural Non-Blocking Assignment (<=)
always_ff or always @(posedge clk) blocks.reg or logic).<= statements in the block are evaluated immediately, but the left-hand sides do not change until the very end of the simulation time step. This is explicitly required to model Registers and Flip-Flops. Using <= safely prevents simulation race conditions between discrete clocked hardware elements.[!WARNING] The Golden Rules:
- Never use
=to model sequential logic (Flip-Flops). Use<=.- Never use
<=to model combinational logic. Use=orassign.- Never mix
=and<=in the samealwaysblock.
Example 1: Easy (Full Adder) Task: Utilize logic operations and concatenation for a 1-bit full adder.
module easy_dataflow(output logic sum, cout, input logic a, b, cin);
// Explicit boolean eq:
// assign sum = a ^ b ^ cin;
// assign cout = (a & b) | (cin & (a ^ b));
// Better: Allow synthesis tools to map the arithmetic operator (+)
assign {cout, sum} = a + b + cin;
endmodule
Example 2: Medium (Dataflow 4-to-1 Mux)
Task: A 4:1 Mux modeled strictly with nested conditional (?:) operators.
module med_dataflow(output logic [31:0] y, input logic [31:0] d0, d1, d2, d3, input logic [1:0] s);
assign y = (s == 2'b00) ? d0 :
(s == 2'b01) ? d1 :
(s == 2'b10) ? d2 : d3;
endmodule
Example 3: Hard (Simple Arithmetic Logic Unit)
Task: A dataflow ALU determining its output operation via a 3-bit opcode.
module hard_dataflow(output logic [15:0] result, output logic zero, input logic [15:0] a, b, input logic [2:0] opcode);
// 000: ADD, 001: SUB, 010: AND, 011: OR, 100: XOR, 101: Shift L, 110: Shift R
assign result = (opcode == 3'b000) ? (a + b) :
(opcode == 3'b001) ? (a - b) :
(opcode == 3'b010) ? (a & b) :
(opcode == 3'b011) ? (a | b) :
(opcode == 3'b100) ? (a ^ b) :
(opcode == 3'b101) ? (a << b[3:0]) :
(opcode == 3'b110) ? (a >> b[3:0]) : 16'h0000;
// Reduction NOR logic to test for zero outcome
assign zero = ~(|result);
endmodule
A procedure (or function) is a mechanism for code reuse and abstraction. The MIPS architecture relies heavily on software conventions to implement procedure calls seamlessly.
$a0-$a3).jal Label).$v0-$v1).jr $ra).$a0-$a3 (4-7): Arguments passed to the procedure.$v0-$v1 (2-3): Return values.$t0-$t9 (8-15, 24-25): Caller-Saved temporaries (not preserved across calls).$s0-$s7 (16-23): Callee-Saved saved registers (must be preserved across calls).$sp (29): Stack Pointer.$ra (31): Return Address.The stack is memory used to “spill” registers when we run out of MIPS’s 32 hardware registers, or when we need to preserve variables across a nested procedure call. The stack grows downwards from high memory addresses to low.
addi $sp, $sp, -4 then sw $t0, 0($sp)lw $t0, 0($sp) then addi $sp, $sp, 4Leaf vs. Nested Procedures:
A leaf procedure does not call any other procedures. It may not need to use the stack at all if it only uses $t registers. A nested procedure calls another procedure. It must save its $ra to the stack before jumping, otherwise the second jal will irreparably overwrite the original return address!
MIPS instructions are rigidly 32 bits long. This makes it impossible for an I-Type instruction (like addi) to contain both the opcode/registers AND a full 32-bit constant, as the immediate field is only 16 bits.
We construct 32-bit constants using two instructions:
lui (Load Upper Immediate): Loads a 16-bit constant into the top 16 bits of a register, clearing the bottom 16 bits to 0.ori (OR Immediate): Bitwise ORs a 16-bit constant into the bottom 16 bits.// Load 0xDEADBEEF into $s0
lui $s0, 0xDEAD // $s0 = 0xDEAD0000
ori $s0, $s0, 0xBEEF // $s0 = 0xDEADBEEF
addi $t0, $t1, 5).add $t0, $t1, $t2).lw $t0, 100($s1)).beq $t0, $t1, Label). Used for relatively close branches. Target = $PC + 4 + (\text{offset} \times 4)$.J-Type instruction shifted left by 2, combined with the top 4 bits of the current PC (e.g., j Label). This allows a jump anywhere within the current 256 MB block of memory. To jump further, jr (Register Addressing) must be used.In modern multicore processors, multiple CPUs share the same main memory. This introduces the concept of a Data Race: when two threads try to read and write to the same memory location simultaneously, the final variable state is unpredictable and depends on the exact timing of the threads.
A simple “lock” mechanism stored in memory (0 = unlocked, 1 = locked) cannot be implemented with standard lw and sw. Two threads might both lw the lock simultaneously, see that it is 0, and both sw 1 to claim it. They both think they own the lock! We need an atomic read-modify-write operation (an operation that cannot be interrupted).
MIPS provides two special instructions to build atomic operations hardware-efficiently without complex lock lines on the memory bus.
ll rt, offset(rs) (Load Linked): Loads a word from memory and registers the memory address with the CPU.sc rt, offset(rs) (Store Conditional): Attempts to store to the address. It succeeds (writes to memory and sets rt to 1) ONLY IF no other processor has written to that address since the ll. Otherwise, it fails (does not write, sets rt to 0).If sc fails, the local thread can simply loop back and try the ll again.
Complete these 9 problems. See the answers at the bottom when you are done to check your work!
1. Easy:
Translate the equation $Y = A \cdot B + C$ directly into structural gate primitives and and or.
2. Medium:
Identify the error in the following instantiation of logic primitives: and u1 (in1, in2, out);
3. Hard:
Design a structural 2-bit Equality Comparator ($A=B$) using only xnor and and primitives. Assume $A = {a1, a0}$ and $B = {b1, b0}$.
4. Easy:
Write the table section of a combinational UDP for a 3-input XOR gate ($Y = A \oplus B \oplus C$).
5. Medium:
Write the state table for an RS-Latch Sequential UDP. The latch holds state if $R=0, S=0$. It sets if $S=1$, resets if $R=1$, and state is unpredictable limit (x) if $R=1, S=1$.
6. Hard: Write the core state table for a negative-edge-triggered T-Flip-Flop (Toggles on clock falling edge if $T=1$).
7. Easy: Write a single assign statement for a 2-to-4 decoder’s $0$th signal (dec0), which should be high only when sel is 2'b00.
8. Medium: What is logically incorrect about this dataflow statement trying to select ‘data1’ when ‘enable’ is high, and bitwise ORing the result with a mask? assign out = enable ? data1 : data0 | mask; (Hint: Check operator precedence!)
9. Hard:
Design a 4-input priority encoder using nested conditional operators. $D[3]$ has highest priority, $D[0]$ has lowest. If $D[3]$ is 1, output 2'b11. If $D=\text{0000}$, output 2'b00 and assert an idle signal using concatenation {idle, val}.
10. Easy:
Categorize the following registers into “Caller-Saved” or “Callee-Saved”: $t0, $s0, $ra, $a0.
11. Medium:
Write a MIPS procedure prologue and epilogue that safely backs up and restores $s0 and $s1. Assume $sp needs to stay 4-byte aligned (so allocate 8 bytes).
12. Hard:
alculate the necessary stack frame size (in bytes) and $sp movement for a deeply nested recursive function that needs to save $ra, $a0, $s0, $s1, and $s2 on the stack for each call.
13. Easy:
Write the two instructions needed to load the 32-bit constant 0xDEADBEEF into register $t0.
14. Medium:
Calculate the exact target address of a beq instruction given the $PC = \text{0x00400010}$ and the 16-bit offset is 0x0005.
15. Hard:
Contrast the reach (addressable range) of a j instruction vs. a jr instruction. Explain why jr is strictly necessary for large OS kernels to jump anywhere in a 32-bit address space.
16. Easy: Define a “data race” condition in the context of multicore processors.
17. Medium:
Explain what happens to an sc (store conditional) instruction if an operating system context-switch (an interrupt) occurs entirely between the ll (load linked) and the sc.
18. Hard:
Write a simple atomic “spin-lock” acquisition loop using ll and sc instructions in MIPS assembly. Assume the lock address is in $a0 (0 means unlocked, 1 means locked).
1. Easy
logic net1;
and (net1, A, B);
or (Y, net1, C);
2. Medium:
The port ordering is incorrect. In SystemVerilog primitives, the output is always listed first. It should be and u1 (out, in1, in2);
3. Hard
logic eq0, eq1;
xnor (eq0, a0, b0);
xnor (eq1, a1, b1);
and (A_eq_B, eq0, eq1);
4. Easy
table
// A B C : Y
0 0 0 : 0;
0 0 1 : 1;
0 1 0 : 1;
0 1 1 : 0;
1 0 0 : 1;
1 0 1 : 0;
1 1 0 : 0;
1 1 1 : 1;
endtable
5. Medium
table
// R S : state : next_state
0 0 : ? : -; // hold
1 0 : ? : 0; // reset
0 1 : ? : 1; // set
1 1 : ? : x; // unpredictable
endtable
6. Hard
table
// clk T : state : next_state
(10) 0 : ? : -; // Falling edge, T=0, hold
(10) 1 : 0 : 1; // Falling edge, T=1, toggle 0->1
(10) 1 : 1 : 0; // Falling edge, T=1, toggle 1->0
(0?) ? : ? : -; // Ignore rising edges
? (?) : ? : -; // Ignore T variations
endtable
7. Easy
assign dec0 = (sel == 2'b00); // Evaluates to 1-bit logic 1/0
8. Medium:
According to precedence rules, Bitwise OR (|) evaluates before the Conditional operator (?:). Therefore, the code actually evaluates as assign out = enable ? data1 : (data0 | mask). If ‘enable’ is high, the mask is entirely ignored! The correct notation requires parenthesis: assign out = (enable ? data1 : data0) | mask;
9. Hard
assign {idle, val} = (D[3]) ? {1'b0, 2'b11} :
(D[2]) ? {1'b0, 2'b10} :
(D[1]) ? {1'b0, 2'b01} :
(D[0]) ? {1'b0, 2'b00} : {1'b1, 2'b00}; // D is 0000, assert idle bit
10. Easy:
$t0, $a0.$s0, $ra.11. Medium
// Prologue
addi $sp, $sp, -8 // Allocate 8 bytes
sw $s0, 4($sp) // Save $s0
sw $s1, 0($sp) // Save $s1
// [Procedure Body]
// Epilogue
lw $s1, 0($sp) // Restore $s1
lw $s0, 4($sp) // Restore $s0
addi $sp, $sp, 8 // Deallocate 8 bytes
jr $ra // Return
12. Hard:
Saving 5 registers requires 5 words. Each word is 4 bytes.
Frame Size = $5 \times 4 = 20$ bytes.
Stack Pointer movement: addi $sp, $sp, -20 (prologue) and addi $sp, $sp, 20 (epilogue).
13. Easy
lui $t0, 0xDEAD // Load Upper Immediate
ori $t0, $t0, 0xBEEF // OR Immediate
14. Medium:
beq uses PC-Relative Addressing. The target address is $PC + 4 + (\text{offset} \times 4)$.
$Target = \text{0x00400010} + 4 + (5 \times 4) = \text{0x00400014} + 20 = \text{0x00400014} + \text{0x00000014} = \text{0x00400028}$.
15. Hard:
A j instruction uses Pseudodirect Addressing, allowing it to reach anywhere within the current 256 MB block (determined by the upper 4 bits of the PC). A jr instruction jumps to the address contained in a 32-bit register, giving it an absolute 4 GB reach. jr is necessary for OS kernels and dynamically linked libraries because they might reside in completely different memory segments than the calling code block.
16. Easy: A data race occurs when two or more concurrent threads/processes access the same memory location simultaneously, and at least one access is a write, and the accesses are not synchronized. The final result depends on the unpredictable timing of the execution.
17. Medium:
If a context switch occurs between ll and sc, the sc instruction will fail (store 0 into the target register instead of 1, and not update memory) because the operating system guarantees that any exception/interrupt resets the link register used by the ll/sc pair.
18. Hard
spin_lock:
ll $t0, 0($a0) // Load Linked: Read lock status
bne $t0, $zero, spin_lock // If lock is 1 (taken), spin and try again
add $t1, $zero, 1 // Setup $t1 = 1 (locked value)
sc $t1, 0($a0) // Store Conditional: Try to write 1 to lock
beq $t1, $zero, spin_lock // If sc failed (returned 0), spin and try again
// Lock acquired!
A fully parameterized, combinational Arithmetic Logic Unit (ALU) modeled after the MIPS32 architecture has been added to this week’s materials. It processes standard R-Type instructions (e.g., add, sub, and, or, slt) directly from the opcode and funct bit inputs without utilizing a hardware clock.
Project Files:
README.md: Documentation and simulation instructions.alu.sv: The parameterized ALU SystemVerilog module.tb_alu.sv: The testbench validating signed/unsigned math operations and the zero-flag.Run it at the command line:
iverilog -o alu.vvp alu.sv tb_alu.sv
vvp alu.vvp
To reinforce the transition from gate-level logic to more advanced behavior and dataflow modeling, keep these core practical concepts in mind as you work through the assignments and the MIPS ALU project:
0, 1, x, z). High-impedance (z) and undefined (x) states are essential for debugging bus conflicts and undefined signals. When writing simulation tests, using manual delays (like #10) relative to your defined timescales (1ns/10ps) is crucial for verifying that signals settle properly.PEMDAS meticulously when writing dataflow assign statements. Furthermore, maintaining strict assignment protocols is mandatory: use assign for continuous logic, = for blocking combinational blocks (always_comb), and <= for non-blocking sequential blocks (always_ff). Mixing these up will cause unpredictable race conditions during simulation.Zero flag (critical for the beq branch instruction) and the Overflow flag—is just as important as the primary data output.