Classes will be decided week-to-week.
| Week(s) | Week of | Topic |
|---|---|---|
| 01 | 1/27 | Intro; centralized vs distributed systems; development environment setup |
| 02 | 2/3 | Multi-processing & network programming — Part 1 |
| 03 | 2/10 | Multi-processing & network programming — Part 2 |
| 04 | 2/17 | Multi-processing & network programming — Part 3 |
| 05 | 2/24 | Containerization: Docker and Kubernetes |
| 06 | 3/3 | DevOps and CI/CD |
| 07 | 3/10 | Integrate application to infrastructure |
| 08 | 3/17 | Distributed Architectures |
| 09 | 3/24 | Communication and Coordination |
| 10 | 3/31 | Consistency & Replication |
| 11 | 4/7 | Fault Tolerance |
| 12 | 4/21 | Security |
| 13 | 4/28 | Deploying on k8s on cloud-based virtual bare metal nodes |
| 14 | 5/5 | Deploying on k8s on cloud-based k8s |
| 15 | 5/12 | Final individual projects due |
Follow the links in the table above to each week's materials.
A distributed system is defined as a collection of autonomous computing elements that appears to its users as a single coherent system. This definition highlights two key features:
**Distributed vs. Decentralized Systems**

The distinction between these systems lies in how and why computers are connected:
**The Centralization Myth**

A common misconception is that centralized solutions are inherently unscalable or vulnerable. However, a distinction must be made between logical centralization (a single conceptual service) and physical centralization (a single machine): a logically centralized service can still be physically distributed across many nodes.
Building distributed systems is complex and justified only when specific goals are met:
The primary goal is to make resources (storage, computing power, data, networks) easily accessible and shareable among users. This allows for economic efficiency and collaboration.
The system should hide the fact that its resources are physically distributed. This is typically achieved through a middleware layer.
| Transparency Type | Description |
|---|---|
| Access | Hides differences in data representation and how resources are accessed. |
| Location | Hides where a resource is physically located. |
| Relocation | Hides that a resource may move to another location while in use. |
| Migration | Hides that a resource may move to another location. |
| Replication | Hides that a resource is replicated (copied) across multiple nodes. |
| Concurrency | Hides that a resource may be shared by several competitive users. |
| Failure | Hides the failure and recovery of a resource. |
Note on Transparency: Full transparency is often impossible or undesirable (e.g., hiding network latency in a real-time system is physically impossible).
An open system offers components that can be easily used by or integrated into other systems.
Scalability is measured along three dimensions: size scalability (supporting more users and resources), geographical scalability (nodes lying far apart), and administrative scalability (spanning multiple administrative domains).
Scaling Techniques: hiding communication latencies (e.g., asynchronous communication), distribution (splitting a component into parts spread across the system, as DNS does with its name space), and replication (copying resources to improve availability and balance load).
Focuses on integrating separate applications into an enterprise-wide system.
Systems that blend into the environment, characterized by small, battery-powered, mobile devices.
Developers often err by accepting the following false assumptions about the underlying network:

- The network is reliable.
- The network is secure.
- The network is homogeneous.
- The topology does not change.
- Latency is zero.
- Bandwidth is infinite.
- Transport cost is zero.
- There is one administrator.
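These fallacies argue for defensive coding. The sketch below is illustrative, not from the course materials: the function name, port, timeout, and retry count are all assumptions. It treats the network as unreliable by adding a timeout and bounded retries; pointing it at an unused local port exercises the failure path.

```python
import socket

def send_with_retry(host, port, message, retries=3, timeout=2.0):
    """Attempt delivery while assuming the network can fail or stall."""
    for attempt in range(1, retries + 1):
        try:
            # create_connection honors the timeout, so a stalled network
            # cannot block the caller indefinitely.
            with socket.create_connection((host, port), timeout=timeout) as s:
                s.sendall(message.encode())
                return s.recv(1024).decode()
        except OSError as e:
            print(f"Attempt {attempt} failed: {e}")
    return None  # the caller must handle delivery failure explicitly

# Port 9 is almost certainly closed locally, so this demonstrates the
# failure path rather than a successful exchange.
print(send_with_retry("localhost", 9, "ping"))
```

Note that the failure is surfaced to the caller rather than hidden: full failure transparency is one of the things the fallacies tell us we cannot assume.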
The textbook emphasizes that distributed systems rely on message passing and hiding complexity (transparency). Below are Python examples illustrating Access Transparency (hiding data representation using serialization) and Basic Connectivity (the foundation of distributed systems).
In distributed systems, machines may represent data differently. To achieve access transparency, data must be marshaled (serialized) into a standard format before transmission.
```python
import pickle

# A complex data object (list of dictionaries)
# In a real scenario, this could be a database record or object state.
local_data = [
    {"id": 1, "action": "update", "value": 42},
    {"id": 2, "action": "delete"}
]

print(f"Original Data Type: {type(local_data)}")

# Marshaling (Serialization)
# This simulates preparing data to be sent over the network.
# It hides the internal memory representation of the Python list.
network_message = pickle.dumps(local_data)
print(f"Marshaled (Network) Data: {network_message}")

# --- Network Transmission Simulation ---

# Unmarshaling (Deserialization)
# The receiving node reconstructs the object without knowing
# the sender's internal memory layout.
received_data = pickle.loads(network_message)
print(f"Reconstructed Data: {received_data}")
print(f"Is data identical? {local_data == received_data}")
```
Ref: Concepts based on Section 1.2.2 and Python pickle usage in Note 4.4.
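One caveat worth noting: `pickle` is Python-specific and unsafe to load from untrusted peers, so it only achieves access transparency between Python nodes. As a hedged sketch (not from the course materials), the same marshaling idea with the language-neutral `json` module looks like this:

```python
import json

# The same record structure as the pickle example above.
local_data = [
    {"id": 1, "action": "update", "value": 42},
    {"id": 2, "action": "delete"}
]

# Marshal to UTF-8 bytes: any JSON-aware node (Java, Go, ...) can parse this.
wire = json.dumps(local_data).encode("utf-8")

# Unmarshal on the receiving side.
received = json.loads(wire.decode("utf-8"))
print(received == local_data)  # prints True
```

Because the wire format is standardized text rather than Python's internal object layout, heterogeneous nodes can interoperate, which is the point of access transparency.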
This example demonstrates the “Networked” aspect of distributed systems using sockets. This is the low-level mechanism upon which higher-level distributed abstractions (like RPC) are built.
The Server (Run this first):
```python
from socket import *

def start_server():
    # Create a TCP/IP socket
    server_socket = socket(AF_INET, SOCK_STREAM)
    # Bind the socket to the address and port
    server_socket.bind(('localhost', 8080))
    # Listen for incoming connections (queue up to 1 request)
    server_socket.listen(1)
    print("Server is listening on port 8080...")
    while True:
        # Accept a connection
        connection, client_address = server_socket.accept()
        try:
            print(f"Connection from {client_address}")
            # Receive data in small chunks
            data = connection.recv(1024)
            if data:
                print(f"Received: {data.decode()}")
                # Send data back to the client (Echo)
                response = "Acknowledged: " + data.decode()
                connection.sendall(response.encode())
        finally:
            # Clean up the connection
            connection.close()

if __name__ == "__main__":
    start_server()
```
The Client:
```python
from socket import *

def start_client():
    # Create a TCP/IP socket
    client_socket = socket(AF_INET, SOCK_STREAM)
    # Connect the socket to the server's port
    client_socket.connect(('localhost', 8080))
    try:
        # Send data
        message = "Hello Distributed World"
        print(f"Sending: {message}")
        client_socket.sendall(message.encode())
        # Look for the response
        response = client_socket.recv(1024)
        print(f"Received: {response.decode()}")
    finally:
        print("Closing socket")
        client_socket.close()

if __name__ == "__main__":
    start_client()
```
Ref: Adapted from Note 2.1 illustrating basic connectivity principles discussed in Chapter 1.
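To connect the socket layer to the RPC abstraction mentioned above, here is a minimal sketch using Python's standard-library `xmlrpc` modules. The port 8000 and the `add` procedure are illustrative assumptions, not part of the course materials; the server runs in a background thread so the whole round trip fits in one script.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Run an RPC server in a background thread (port 8000 is arbitrary).
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# The proxy makes the remote procedure look like a local call: it
# marshals the arguments, ships them over HTTP, and unmarshals the result.
proxy = ServerProxy("http://localhost:8000")
result = proxy.add(2, 3)
print(result)  # prints 5
server.shutdown()
```

The client never touches `recv` or byte encoding; the RPC layer hides the message passing, an example of access transparency built on top of sockets.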
In modern distributed systems, processes are often decomposed into threads to achieve higher performance and hide latency.
Why use threads in distributed systems? They let a process overlap communication with computation (hiding network latency), exploit multicore hardware, and keep a server responsive by handling each client in its own thread of control.
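The latency-hiding point can be made concrete with a small sketch; the 0.1-second sleep stands in for a blocking network call, and the server IDs are made up for illustration.

```python
import threading
import time

def fetch(server_id, results):
    # Stand-in for a blocking network request to a remote server.
    time.sleep(0.1)
    results[server_id] = f"reply from server {server_id}"

results = {}
threads = [threading.Thread(target=fetch, args=(i, results)) for i in range(5)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The five 0.1 s waits overlap, so the total is ~0.1 s rather than ~0.5 s:
# each thread's latency is hidden behind the others.
print(f"{len(results)} replies in {elapsed:.2f} s")
```

A single-threaded process issuing the same five calls sequentially would wait roughly five times as long, which is exactly the cost threads avoid.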
Virtualization plays a foundational role in cloud computing by decoupling software from the underlying hardware.
Types of Virtualization: process virtual machines (e.g., the Java Virtual Machine), native virtual machine monitors (hypervisors running directly on the hardware), hosted virtual machine monitors (running on top of a host operating system), and operating-system-level virtualization (containers).
Docker is the industry standard for creating and managing containers. It packages code and all its dependencies into a standard unit of software so an application runs reliably when moved from one computing environment to another.
To containerize our TCPServer from previous weeks, we use a Dockerfile. Consider the following week_05/netprog/Dockerfile snippet:
```dockerfile
# Start from a base Ubuntu image
FROM ubuntu:24.10

# Install Java and networking tools
RUN apt-get update && apt-get install -y openjdk-21-jdk-headless dnsutils dos2unix

# Copy the compiled Java classes into the container
RUN mkdir -p /classes
COPY ./bin/TCPServer.class /classes
COPY ./bin/TCPServer\$ClientHandler.class /classes

# Copy and configure the startup script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && dos2unix /entrypoint.sh

# Expose the port the server listens on
EXPOSE 12345

# Define the command to run when the container starts
CMD ["/bin/bash","-c","/entrypoint.sh"]
```
Build the image, then run it in detached mode with the server port mapped to the host:

```shell
docker build -t transcriptor:v1 .
docker run -d -p 12345:12345 transcriptor:v1
```

While Docker is excellent for running single containers, managing thousands of containers across many physical machines requires an orchestrator. Kubernetes is an open-source platform designed to automate deploying, scaling, and operating containerized applications.
As distributed systems scale, they face challenges: containers fail and must be restarted or replaced, traffic must be load-balanced across replicas, and updates must roll out (and roll back) without downtime.
Kubernetes solves these problems by providing a framework to run distributed systems resiliently.
A Kubernetes cluster consists of a set of worker machines, called Nodes, that run containerized applications. Every cluster has at least one worker node. The worker nodes host the Pods. The Control Plane manages the worker nodes and the Pods in the cluster.
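To make the Control Plane / Pod relationship concrete, here is a hedged sketch of a Deployment manifest for the `transcriptor:v1` image built earlier. The name `tcpserver`, the labels, and the replica count are assumptions for illustration, not part of the course materials.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcpserver              # hypothetical name
spec:
  replicas: 3                  # the Control Plane keeps three Pods running
  selector:
    matchLabels:
      app: tcpserver
  template:
    metadata:
      labels:
        app: tcpserver
    spec:
      containers:
        - name: tcpserver
          image: transcriptor:v1    # image built in the Docker section
          ports:
            - containerPort: 12345  # the port EXPOSEd by the Dockerfile
```

Applying a manifest like this with `kubectl apply -f deployment.yaml` asks the Control Plane to reconcile the cluster toward three running replicas, restarting Pods on worker nodes as needed.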
In this week’s lab, you will:
- Compile the `TCPServer` Java code.
- Use the `Dockerfile` to build a Docker image for the server.
- Use `TCPClient` to communicate with the containerized application.

See the `README.md` inside `weeks/week_05/netprog/` for step-by-step instructions.