Programming [Mikhail Smarshchok] System Design for Interviews and Beyond (2023)

Robots01 · 10 Jan 2023

Description of [Mikhail Smarshchok] System Design for Interviews and Beyond (2023):

Course curriculum

1. Introduction
1.1. Course introduction
1.2. Who will benefit from the course and how
1.3. Course overview

2. How to define system requirements
2.1. System requirements
2.2. Functional requirements
2.3. High availability
2.4. Fault tolerance, resilience, reliability
2.5. Scalability
2.6. Performance
2.7. Durability
2.8. Consistency
2.9. Maintainability, security, cost
2.10. Summary of system requirements

3. How to achieve certain system qualities with the help of hardware
3.1. Regions, availability zones, data centers, racks, servers
3.2. Physical servers, virtual machines, containers, serverless
3.3. Fundamentals of reliable, scalable, and fast communication
3.4. Synchronous vs asynchronous communication
3.5. Asynchronous messaging patterns
3.6. Network protocols
3.7. Blocking vs non-blocking I/O
3.8. Data encoding formats
3.9. Message acknowledgment

4. How to improve system performance with caching
4.1. Deduplication cache
4.2. Metadata cache

5. The importance of queues in distributed systems
5.1. Queue
5.2. Full and empty queue problems
5.3. Start with something simple
5.4. Blocking queue and producer-consumer pattern
5.5. Thread pool
5.6. Big compute architecture

6. Data store internals
6.1. Log
6.2. Index
6.3. Time series data
6.4. Simple key-value database
6.5. B-tree index
6.6. Embedded database
6.7. RocksDB
6.8. LSM-tree vs B-tree
6.9. Page cache

7. How to build efficient communication in distributed systems
7.1. Push vs pull
7.2. Host discovery
7.3. Service discovery
7.4. Peer discovery
7.5. How to choose a network protocol
7.6. Network protocols in real-life systems
7.7. Video over HTTP
7.8. CDN
7.9. Push and pull technologies
7.10. Push and pull technologies in real-life systems
7.11. Large-scale push architectures

8. How to deliver data reliably
8.1. What else to know to build reliable, scalable, and fast systems
8.2. Timeouts
8.3. What to do with failed requests
8.4. When to retry
8.5. How to retry
8.6. Message delivery guarantees
8.7. Consumer offsets

9. How to deliver data quickly
9.1. Batching
9.2. Compression

10. How to deliver data at large scale
10.1. How to scale message consumption
10.2. Partitioning in real-life systems
10.3. Partitioning strategies
10.4. Request routing
10.5. Rebalancing partitions
10.6. Consistent hashing

11. How to protect servers from clients
11.1. System overload
11.2. Autoscaling
11.3. Autoscaling system design
11.4. Load shedding
11.5. Rate limiting

12. How to protect clients from servers
12.1. Synchronous and asynchronous clients
12.2. Circuit breaker
12.3. Fail-fast design principle
12.4. Bulkhead
12.5. Shuffle sharding

13. Epilogue
13.1. The end (but not quite)

Course syllabus
System requirements (functional and non-functional requirements)
Functional requirements (how to define, working backwards approach)
High availability (time-based and count-based availability, design principles behind high availability, processes behind high availability, SLO, SLA)
Fault tolerance, resilience, reliability (error, fault, failure, fault tolerance, resilience, game day vs chaos engineering, expected and unexpected failures, reliability)
Scalability (vertical and horizontal scaling, elasticity vs scalability)
Performance (latency, throughput, percentiles, how to increase read and write throughput, bandwidth)
Durability (backup (full, differential, incremental), RAID, replication, checksum, availability vs durability)
Consistency (consistency models, eventual consistency, linearizability, monotonic reads, read-your-writes (read-after-write), consistent prefix reads)
Maintainability, security, cost (maintainability aspects (failure modes and mitigations, monitoring, testing, deployment), security aspects (CIA triad, identity and permissions management, infrastructure protection, data protection), cost aspects (engineering, maintenance, hardware, software))
Summary of system requirements (a single list of the most popular non-functional requirements)
Regions, availability zones, data centers, racks, servers (how hardware helps to achieve certain qualities)
Physical servers, virtual machines, containers, serverless (pros and cons of different computing environments, what are they good for)
Synchronous vs asynchronous communication (synchronous and asynchronous request-response models, asynchronous messaging)
Asynchronous messaging patterns (message queuing, publish/subscribe, competing consumers, request/response messaging, priority queue, claim check)
Network protocols (TCP, UDP, HTTP, HTTP request and response)
Blocking vs non-blocking I/O (socket (blocking and non-blocking), connection, thread per connection model, thread per request with non-blocking I/O model, event loop model, concurrency vs parallelism)
Data encoding formats (textual vs binary formats, schema sharing options, backward compatibility, forward compatibility)
Message acknowledgment (safe and unsafe acknowledgment modes)
Deduplication cache (local vs external cache, adding data to cache (explicitly, implicitly), cache data eviction (size-based, time-based, explicit), expiration vs refresh)
Metadata cache (cache-aside pattern, read-through and write-through patterns, write-behind (write-back) pattern)
Queue (bounded and unbounded queues, circular buffer (ring buffer) and its applications)
Full and empty queue problems (load shedding, rate limiting, what to do with failed requests, backpressure, elastic scaling)
Start with something simple (similarities between single machine and distributed system concepts, interview tip)
Blocking queue and producer-consumer pattern (producer-consumer pattern, wait and notify, semaphores, blocking queue applications)
Thread pool (pros and cons, CPU-bound and I/O-bound tasks, graceful shutdown)
Big compute architecture (batch computing model, embarrassingly parallel problems)
Log (memory vs disk, log segmentation, message position (offset))
Index (how to implement an efficient index for a messaging system)
Time series data (how to store and retrieve time series data at scale and with low latency)
Simple key-value database (how to build a simple key-value database, log compaction)
B-tree index (how databases and messaging systems use B-tree indexes)
Embedded database (embedded vs remote database)
RocksDB (memtable, write-ahead log, sorted strings table (SSTable))
LSM-tree vs B-tree (log-structured merge-tree data structure, write amplification, read amplification)
Page cache (how to increase disk throughput (batching, zero-copy read))
Push vs pull (pros and cons of both models)
Host discovery (DNS, anycast)
Service discovery (server-side and client-side discovery patterns, service registry and its applications)
Peer discovery (peer discovery options, membership and failure detection problems, seed node, how gossip protocol works and its applications)
How to choose a network protocol (when and how to choose between TCP, UDP and HTTP)
Network protocols in real-life systems (quiz: what network protocol would you choose for various system design problems)
Video over HTTP (adaptive streaming)
CDN (how to use it, how it works, point of presence (POP), benefits)
Push and pull technologies (short polling, long polling, websocket, server-sent events)
Push and pull technologies in real-life systems (quiz: what technology would you choose for various system design problems)
Large-scale push architectures (C10K and C10M problems, examples of large-scale push architectures, the most noticeable problems of handling long-lived connections at large scale)
What else to know to build reliable, scalable, and fast systems (a list of common problems in distributed systems, a list of system design concepts that help solve these problems, three-tier architecture)
Timeouts (fast failures, slow failures, connection and request timeouts)
What to do with failed requests (strategies for handling failed requests (cancel, retry, failover, fallback))
When to retry (idempotency, quiz: which AWS API failures are safe to retry)
How to retry (exponential backoff, jitter)
Message delivery guarantees (at-most-once, at-least-once, exactly-once)
Consumer offsets (log-based messaging systems, checkpointing)
Batching (pros and cons, how to handle batch requests)
Compression (pros and cons, compression algorithms and the trade-offs they make)
How to scale message consumption (single consumer vs multiple consumers, problems with multiple consumers (order of message processing, double processing))
Partitioning in real-life systems (pros and cons, applications of partitioning)
Partitioning strategies (lookup strategy, range strategy, hash strategy)
Request routing (physical and virtual shards, request routing options)
Rebalancing partitions (how to rebalance partitions)
Consistent hashing (how to implement, advantages and disadvantages, virtual nodes, applications of consistent hashing)
System overload (why it is important to protect the system from overload)
Autoscaling (scaling policies (metric-based, schedule-based, predictive))
Autoscaling system design (how to design an autoscaling system)
Load shedding (how to implement it in distributed systems, important considerations)
Rate limiting (how to use the knowledge gained in the course for solving the problem of rate limiting (step by step guide))
Synchronous and asynchronous clients (admission control systems, blocking I/O and non-blocking I/O clients)
Circuit breaker (circuit breaker finite-state machine, important considerations)
Fail-fast design principle (problems with slow services (chain reactions, cascading failures) and ways to solve them)
Bulkhead (how to implement this pattern in distributed systems)
Shuffle sharding (how to implement this pattern in distributed systems)
The end (a list of topics that we will cover in the next module of the course)
