Courses & Projects by Rob Marano

Notes for Week 1


🗂️ Download Week 01 Slides (PDF)

Slides for Class 01

Topics

  1. Software eats the world: Computers, their abstraction, and why we study them – prologue
  2. Stored Program Concept and its processing
  3. The alphabet, vocabulary, grammar of computers
    1. 1s and 0s as the alphabet
    2. compute and memory instructions as the vocabulary
    3. implementation of compute and memory instructions as the grammar
  4. Introducing the instructions of a computer delivered by the architecture
    1. Operations of computer hardware
    2. Operands of computer hardware
    3. Signed and unsigned numbers
    4. Representing instructions in the computer
    5. Logical operations
  5. Performance of computer hardware, and how it is measured
    1. ALU
    2. Bus width
    3. Memory
    4. CPU
    5. Clock speed
    6. Multi-core (Amdahl’s Law)
    7. Threading
  6. Intro to the history of computer architecture and modern advancements

Class 01 Lecture Summary

We had our inaugural lecture of the ECE 251 Computer Architecture course this Spring 2026 semester. I outlined the course logistics and the “arc of learning” for the semester, and introduced foundational concepts regarding the definition, history, and design principles of modern computers.

Course Structure and Philosophy

I emphasized a supportive teaching environment, encouraging open communication via chat and direct contact. The course is co-taught with Professor Fontaine and is designed to teach the “art and science” behind modern computer construction, applicable to electrical engineers, computer scientists, and programmers alike.

Topics Covered

1. Definition of a Computer The lecture distinguishes between historical analog computers and modern digital general-purpose computers.

2. Instruction Set Architecture (ISA) The course focuses on MIPS because it is a pedagogically pure RISC (Reduced Instruction Set Computer) architecture, unlike the more complex x86.

3. The Seven Great Ideas in Computer Architecture I introduced seven enduring design principles found in our textbook:

4. Performance Measurement Performance is defined by the time it takes to execute a program, not just clock speed.

5. Computer Architectures Two primary memory architectures are distinguished:

6. Current Revolutions We closed the lecture by contextualizing the course within current technological revolutions, specifically Quantum Computing and Generative AI. AI workloads (like matrix multiplication) rely heavily on specialized architectures like GPUs and NPUs (Neural Processing Units), requiring engineers to understand these underlying hardware principles.

Topics Deep Dive

What is Computer Architecture?

Welcome to Computer Architecture! This course will delve into the fundamental principles governing how computers work at a hardware level. It’s not just about programming (though that’s related!), nor is it solely about circuit design (though that plays a role). Computer architecture sits at the intersection of hardware and software, defining the interface between them.

Think of it as the blueprint of a building, like our New Academic Building. Architects don’t lay every brick, nor do they decide how the occupants will use each room. Instead, they design the structure, layout, and systems (electrical, plumbing) that enable both construction and habitation. Similarly, computer architects define the fundamental organization and behavior of a computer system, enabling both hardware implementation and software execution.

Key Questions to Consider

Why Study Computer Architecture?

The Five Classic Components of a Computer

Every computer, from your smartphone to a supercomputer, can be conceptually broken down into five main components:

  1. Input: Mechanisms for feeding data into the computer (keyboard, mouse, network interface, sensors, etc.).
  2. Output: Mechanisms for displaying or transmitting results (monitor, printer, network interface, actuators, etc.).
  3. Memory: Stores both instructions (the program) and data that the computer is actively using. Think of it as the computer’s workspace. We’ll explore different types of memory (RAM, cache, registers) in detail later.
  4. Datapath (ALU): Performs the actual computations (arithmetic operations, logical comparisons) on the data. This is the “brawn” of the computer. Together with the control unit, it makes up the CPU.
  5. Control Unit: Directs the operation of all other components. It fetches instructions from memory, decodes them, and issues signals to the ALU, memory, and I/O devices to execute those instructions. It’s the “conductor” of the computer’s orchestra.

These five components are interconnected by address, data, and control buses: sets of wires that carry addresses, data, and control signals.

Five Parts of a Computer

The Stored Program Concept

One of the most crucial concepts in computer architecture is the stored program concept. Before this, computers were often hardwired for specific tasks. Changing the program required rewiring the machine—a tedious and error-prone process.

The stored program concept, attributed to John von Neumann, revolutionized computing by storing both the instructions (the program) and the data in the computer’s memory. This allows for:

This concept is fundamental to how all modern computers operate.

von Neumann (Princeton) vs. Harvard Architectures

While the von Neumann architecture is dominant, it’s important to understand its historical context and alternatives.

| Feature | Von Neumann (Princeton) Architecture | Harvard Architecture |
| --- | --- | --- |
| Memory | Single memory space for both instructions and data | Separate memory spaces for instructions and data |
| Access | Instructions and data share the same memory bus | Instructions and data can be accessed simultaneously |
| Advantages | Simpler design, more efficient use of memory | Faster instruction fetch, avoids bottlenecks |
| Disadvantages | Potential bottleneck (von Neumann bottleneck) as both instructions and data compete for the same memory access | More complex design, requires separate memory modules |
| Applications | General-purpose computers, PCs, laptops | Embedded systems, digital signal processors (DSPs) |

The von Neumann bottleneck arises because both instructions and data must travel over the same bus to and from memory. This can limit performance, especially when the CPU needs to fetch instructions and data frequently. The Harvard architecture mitigates this by allowing parallel access to instruction and data memories.
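As a rough illustration of the bottleneck, the sketch below counts memory cycles for a short instruction stream. It rests on simplifying assumptions that are not from the lecture: every memory access takes one cycle, and a Harvard machine overlaps an instruction fetch with data traffic perfectly. The function names and the toy instruction stream are hypothetical.

```python
def von_neumann_cycles(data_accesses_per_instruction):
    """Shared bus: one cycle for the fetch plus one per data access, strictly in sequence."""
    return sum(1 + d for d in data_accesses_per_instruction)


def harvard_cycles(data_accesses_per_instruction):
    """Separate buses: the fetch overlaps with data traffic, so each
    instruction costs whichever of the two takes longer."""
    return sum(max(1, d) for d in data_accesses_per_instruction)


# Hypothetical instruction stream: 1 = load/store (one data access), 0 = register-only.
program = [1, 0, 1, 1, 0, 0]
print(von_neumann_cycles(program))  # 9
print(harvard_cycles(program))      # 6
```

Real machines are more subtle (caches, pipelining, variable latencies), but the gap between the two counts is the contention that the shared bus introduces.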

Architectural Comparison

Von Neumann vs. Harvard Architectures

While modern general-purpose computers primarily use variations of the von Neumann architecture (often with caching and other techniques to reduce the bottleneck), the Harvard architecture is still relevant in specialized applications where performance and parallelism are critical.

Additional readings for these architecture types:

History of the Stored Program Concept

The stored-program concept, a fundamental principle in computer architecture, was pioneered by John von Neumann in the mid-1940s. It revolutionized computing by allowing computers to store both data and instructions in the same memory. This seemingly simple idea had profound implications, enabling computers to become much more flexible and powerful than their predecessors, which relied on fixed programs or manual reconfiguration.

Before the stored-program concept, computers were often limited to specific tasks. These “Fixed Program Computers” had their functionality determined by their physical design and could not be easily reprogrammed. For example, the ENIAC and Colossus had to be physically rewired or reconfigured with switches and cables to change their programs. This process was time-consuming and laborious, often taking weeks to set up and debug a single program. Imagine having to rewire your computer every time you wanted to switch from writing an email to browsing the internet!  

With the stored-program concept, instructions are encoded as binary numbers and stored in memory alongside the data they operate on. This means that the computer can access and execute instructions sequentially, just like it accesses data. This is achieved through a continuous cycle of fetching instructions from memory, decoding them, and then executing them, known as the fetch-decode-execute cycle. The control unit acts like the brain of the computer, fetching instructions from memory and interpreting them. It then instructs the ALU, which is responsible for performing calculations and logical operations, to carry out the tasks specified by the instructions.
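The cycle described above can be sketched as a toy accumulator machine. This is an illustration under stated assumptions, not a real ISA: the four opcodes, the decimal encoding `word = opcode * 100 + address` (chosen for readability instead of binary), and the `run` function are all hypothetical.

```python
# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory):
    """Run the program stored in `memory` until HALT; return the memory afterwards."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the single working register
    while True:
        word = memory[pc]                 # fetch
        pc += 1
        opcode, addr = divmod(word, 100)  # decode: word = opcode * 100 + address
        if opcode == HALT:                # execute
            return memory
        elif opcode == LOAD:
            acc = memory[addr]
        elif opcode == ADD:
            acc += memory[addr]
        elif opcode == STORE:
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 1}")


# Instructions (addresses 0-3) and data (addresses 4-6) share one memory.
memory = [
    104,  # 0: LOAD  4   acc = memory[4]
    205,  # 1: ADD   5   acc += memory[5]
    306,  # 2: STORE 6   memory[6] = acc
    0,    # 3: HALT
    7,    # 4: data
    35,   # 5: data
    0,    # 6: result goes here
]
result = run(memory)
print(result[6])  # 42
```

Loading a different list into `memory` runs a different program on the same "hardware", which is the point of the stored-program concept.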

Relating through an Example

Think of a chef in a kitchen. In early computers, the chef had to follow a single recipe written on the wall, with no way to change it, cooking one dish at a time. In that picture the chef is the computer hardware and the recipe is the program. With the stored-program concept, the chef now has a cookbook in which to store and look up different recipes (programs) as needed. The chef can then follow the instructions in the chosen recipe to prepare a dish (perform a computation).

A crucial aspect of this concept is that instructions are treated as data: both are encoded in binary and handled by the same hardware, which computes (and stores) the results the program requests. As a result, programs can not only be stored and executed, but they can also be manipulated and modified like any other data. This has profound implications, as it allows for the creation of programs that can write or modify other programs, leading to the development of assemblers, compilers, linkers, and other essential software tools. It also enables self-modifying code, where a program can alter its own instructions during execution, allowing for more complex and dynamic behavior.
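To make "instructions are data" concrete, here is a minimal sketch that packs a MIPS R-type instruction into a 32-bit integer and then edits it with ordinary integer operations. The field layout (op 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6) and the register numbers are standard MIPS; the helper names `encode_r_type` and `decode_r_type` are hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-type fields into one 32-bit word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18), funct = 0x20
add_word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"{add_word:#010x}")  # 0x02324020

# Because the instruction is just a number, a program can rewrite it:
# replacing the funct field with 0x22 turns the add into a sub.
sub_word = (add_word & ~0x3F) | 0x22
print(decode_r_type(sub_word)["funct"] == 0x22)  # True
```

An assembler does essentially the first step for every line of assembly; self-modifying code does the second step at run time.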

This ability to store and execute different programs from memory is what allows your computer to run various applications, from TikTok, Snapchat, and web browsers to games, video editing software, and MATLAB, for example.

Key Advantages of the Stored-Program Concept:

  1. Programmability: Computers can be easily reprogrammed to perform different tasks by simply loading a new set of instructions into memory.
  2. Flexibility: A single computer can be used for a wide range of applications.
  3. Self-modifying code: Programs can modify their own instructions during execution, enabling more complex and dynamic behavior. (Think how computer viruses work…)

Limitations of the Von Neumann Architecture (aka Princeton Architecture)

While the Von Neumann architecture revolutionized computing, it also has limitations. One of the most significant is the “Von Neumann bottleneck.” This bottleneck arises because the CPU fetches both data and instructions from the same memory over a single bus. The CPU therefore cannot fetch an instruction and data at the same time, which slows processing, especially for workloads that move large amounts of data.

To mitigate this bottleneck, modern computer architectures employ various techniques, such as:

  1. Memory Hierarchy: Two of the five main components of a modern, general-purpose computer are the CPU and memory. The general term memory covers all the addressable storage locations. The hierarchy begins with cache memory, closest to the CPU. These small, high-speed memories hold frequently accessed data and instructions, which reduces how often the CPU must reach the slower, larger levels further down the hierarchy, ending with main memory.
  2. Modified Harvard architecture: Using separate caches or access paths for data and instructions.  
  3. Branch prediction: Predicting the flow of program execution to pre-fetch instructions and reduce delays.  
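As one way to see why a small cache helps, the sketch below simulates a direct-mapped cache and reports the fraction of accesses that hit. It is a simplified model under assumptions not taken from the lecture: the cache geometry (4 lines of 4 addresses each), the access patterns, and the function name `hit_rate` are all hypothetical.

```python
def hit_rate(addresses, num_lines=4, block_size=4):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    lines = [None] * num_lines  # each line remembers which block it currently holds
    hits = 0
    for addr in addresses:
        block = addr // block_size   # which memory block the address falls in
        index = block % num_lines    # the one cache line that block may occupy
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block     # miss: bring the block in from the next level
    return hits / len(addresses)


sequential = list(range(32))   # walk through 32 consecutive addresses
conflicting = [0, 16] * 8      # two addresses that map to the same cache line

print(hit_rate(sequential))    # 0.75: one miss per 4-address block
print(hit_rate(conflicting))   # 0.0: each access evicts the other block
```

The sequential walk benefits from locality, while the second pattern shows that a cache only helps when the access pattern suits it.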

These advancements help to improve the performance of modern computers, but the fundamental principle of the Von Neumann architecture remains a cornerstone of their design.

In Conclusion

The stored-program concept, a brainchild of John von Neumann, revolutionized computing by allowing both data and instructions to reside in the same memory. This innovation enabled computers to become programmable, flexible, and capable of performing a wide range of tasks. By treating instructions as data, it paved the way for the development of software, operating systems, and ultimately, the digital world we have today. Modern computers, from smartphones to supercomputers, owe their versatility and power to this fundamental principle.

Introducing Performance of a Computer

Background reading on performance

Defining Performance in Computer Architecture

Performance in computer architecture is a multifaceted concept, and there isn’t one single “best” metric. It’s often a balancing act between different factors, and the “right” performance measure depends on the specific application and priorities. Here’s a breakdown of key aspects:

1. Execution Time:

2. Throughput:

3. Latency:

4. Resource Utilization:

5. Power Consumption:

6. Cost:

The “Power Wall” refers to the increasing difficulty and impracticality of continuing to increase processor clock speeds to achieve performance gains. For many years, increasing clock speed was the primary driver of improved CPU performance. However, this approach has run into fundamental physical limitations, leading to the “power wall.”

The Problem:

As clock speeds increase, so does the power consumption of the processor. This increased power consumption manifests as heat. The relationship is roughly cubic: doubling the clock speed can increase power consumption by a factor of eight. This heat becomes increasingly difficult and expensive to dissipate. Think of it like trying to cool a rapidly boiling pot of water; at some point, you can’t add any more heat without it boiling over.

Consequences of Excessive Heat:

The Relationship Between Clock Speed and Power

Where:

Why It’s Close to a Factor of Eight

  1. Linear Increase with Frequency: If you only doubled the clock speed and kept the voltage the same, the power would increase linearly (doubling).
  2. Voltage Increase: However, to reliably double the clock speed, you typically need to increase the voltage. This increase in voltage, when squared, has a much larger impact on power consumption.
  3. Combined Effect: The combination of the linear increase from frequency and the quadratic increase from voltage results in a power increase that is close to a factor of eight when you double the clock speed.
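The three steps above can be written out with the standard dynamic-power relation for CMOS logic. The symbols are assumptions here, since the original equation is not reproduced in these notes: \(C\) is the switched capacitance, \(V\) the supply voltage, and \(f\) the clock frequency. Scaling the voltage in direct proportion to frequency is an idealization.

\[P_{\text{dynamic}} \approx C \, V^{2} \, f\]

Doubling only the frequency doubles the power:

\[P' \approx C \, V^{2} \, (2f) = 2P\]

Doubling the frequency and scaling the voltage with it gives the factor of eight:

\[P' \approx C \, (2V)^{2} \, (2f) = 8 \, C \, V^{2} \, f = 8P\]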

Important Caveats

In Summary

While not an absolute rule, the “factor of eight” is a good rule of thumb to illustrate the significant power challenges associated with increasing clock speeds. It highlights the need for innovative design techniques and power management strategies in modern computer architecture.

The Shift in Focus:

The power wall has forced a fundamental shift in how computer architects design processors. Instead of focusing solely on increasing clock speed, the emphasis has moved towards:

In summary: The power wall is a critical challenge in computer architecture. It signifies the limitations of simply increasing clock speeds to achieve performance gains. The industry has responded by shifting its focus towards multi-core processors, specialized hardware, architectural innovations, and power-efficient designs. Managing power consumption and heat dissipation has become a central concern for computer architects.

Looking Ahead to next lecture — Instruction Set Architecture (ISA)

Today, we’ve laid the foundation for understanding the basic components and principles of computer architecture. Our next lecture will delve into the Instruction Set Architecture (ISA).

The ISA defines the set of instructions that a particular processor can understand and execute. It’s the interface between the hardware and the software. We’ll explore:

Understanding the ISA is crucial for writing efficient code, optimizing compiler design, and designing new processors. It’s the bridge between the high-level world of programming and the low-level world of hardware.

Textbook Chapter 1 Summary Notes

1.1 Introduction

The computer revolution has been driven by rapid improvements in technology, leading to distinct classes of computing systems:

1.2 Seven Great Ideas in Computer Architecture

  1. Use Abstraction to Simplify Design: Hiding lower-level details to manage complexity (e.g., instruction set architecture).
  2. Make the Common Case Fast: Optimize the most frequently executed paths.
  3. Performance via Parallelism: Doing multiple things at once.
  4. Performance via Pipelining: Specific form of parallelism (like an assembly line).
  5. Performance via Prediction: Guessing the outcome to proceed sooner (e.g., branch prediction).
  6. Hierarchy of Memories: Combining small/fast memory (cache) with large/slow memory to give the illusion of large/fast memory.
  7. Dependability via Redundancy: Redundant components to handle failures (e.g., RAID).

1.3 Below Your Program

Software is organized in layers:

  1. Applications: High-level code.
  2. System Software:
    • Operating System (OS): Handles I/O, memory, and resource allocation.
    • Compiler: Translates high-level language (C, Java) into assembly language.
  3. Hardware: Executes the machine code.

Translation Hierarchy:

1.4 Under the Covers (The 5 Classic Components)

Every computer consists of 5 components:

  1. Input: Feeds data (keyboard, mouse, mic).
  2. Output: Conveys results (display, speaker).
  3. Memory: Stores data and programs (DRAM for main memory, SRAM for cache, Flash/Disk for secondary).
  4. Datapath: Performs arithmetic operations (brawn).
  5. Control: Commands the datapath, memory, and I/O (brain).

1.5 Technologies for Processors and Memory

1.6 Performance

Performance is defined by the user’s needs:

Measuring Time:

The CPU Performance Equation: Execution time depends on the number of clock cycles and the clock cycle time.

\[\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}\]

Or equivalently: \(\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}\)
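A small worked example of the equation, using made-up numbers for two hypothetical machines running the same program (the function name `cpu_time` and all values are illustrative assumptions):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = Instruction Count x CPI / Clock Rate, in seconds."""
    return instruction_count * cpi / clock_rate_hz


# Same program (2 billion instructions) on two hypothetical machines.
time_a = cpu_time(2_000_000_000, cpi=2.0, clock_rate_hz=2.0e9)  # faster clock, higher CPI
time_b = cpu_time(2_000_000_000, cpi=1.2, clock_rate_hz=1.5e9)  # slower clock, lower CPI

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")  # A: 2.00 s, B: 1.60 s
```

Machine B finishes first despite its lower clock rate, which is why clock speed alone is not a performance measure.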

CPU Performance Equation


1.7 The Power Wall

For decades, clock rates increased rapidly, but this has slowed due to thermal limits.

1.8 The Sea Change: Multicore

To improve performance without increasing clock rate, manufacturers switched to Multicore Processors (multiple processors per chip).

1.11 Fallacies and Pitfalls

1. The Amdahl’s Law Pitfall

Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.
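Amdahl's Law can be sketched in a few lines: only the affected part of the run time shrinks, so the unaffected part caps the overall speedup. The program below uses hypothetical numbers (100 s total, 80 s in the part being improved), and the function name is an assumption for illustration.

```python
def time_after_improvement(total_time, affected_time, improvement_factor):
    """Amdahl's Law: only the affected part of the run time is divided by the factor."""
    unaffected_time = total_time - affected_time
    return affected_time / improvement_factor + unaffected_time


# Hypothetical program: 100 s total, 80 s of which is spent in the improved part.
for factor in (2, 5, 10, 1_000_000):
    new_time = time_after_improvement(100.0, 80.0, factor)
    print(f"{factor:>9}x faster part -> overall speedup {100.0 / new_time:.2f}x")
```

Even a million-fold improvement of the 80 s portion cannot push the overall speedup past 5x, because the remaining 20 s is untouched.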

2. The Low Utilization Fallacy

Fallacy: Computers at low utilization use little power.

3. The Performance Metric Pitfall

Pitfall: Using a subset of the performance equation as a performance metric.

