Hi folks
Last year I wrote a blog post series on algorithms that is now a companion series to the official lectures from Prof. Skiena. I wanted to do the same thing with systems design with the same purpose: to really learn this stuff because I expect the people I'm hiring to know it as well. It's only fair, right?
So the next several weeks are going to be blog posts on my learnings as I read through Designing Data-Intensive Applications. If you're a hiring manager or someone on the job market right now, this should be a highly informative series for you, so stay tuned!
Scalability, reliability, and maintainability
As with my series on algorithms, I've decided that I need to really understand systems design.
I run hiring for multiple teams. Some of our questions revolve around systems design. How can I possibly ask these questions if I couldn't answer them perfectly myself?
It's only fair that I master these concepts. The book that everyone seems to agree is the best on this topic is Designing Data-Intensive Applications by Martin Kleppmann.
In the spirit of my last educational series, I will transcribe my raw notes over the next 7 weeks or so. I do this both to teach my readers and also to reinforce the concepts myself. As I recall from my course on learning how to learn, one of the best ways to learn is to teach others.
The syllabus
I've crafted a syllabus for myself, as if I were both the teacher and the student, to keep me on track and to reinforce the concepts. If you have a similar learning plan or have questions about mine, please feel free to reach out.
Week 1 (this week)
Read Chapter 1: Scalability, Reliability, and Maintainability
Practice: Design a scalable system to count events
Listen: the book's podcast
Explore, at a high level, data tools in the space
Week 2
Read Chapter 2: Data Models & Query Languages
Practice: Top K Problem
Week 3
Read Chapter 3: Storage & Retrieval
Practice: Distributed cache
Week 4
Read Chapter 5: Replication
Practice: Rate limiter
Week 5
Read Chapter 6: Partitioning
Practice: Notification service
Week 6
Read Chapter 8: Faults & Reliability
Practice: Distributed messaging queue
Week 7
Read Chapter 9: Consistency & Consensus
Practice: Read the HighScalabilitySeries on Twitter and implement it
Week 8
Skim Ch 10 & 11: Batch and Stream Processing
More possible practice problems that leverage stream/batch processing, like Uber
Notes on Chapter 1
Chapter one is really an overview chapter. It's easy to read and easy to skim. There aren't too many deep concepts here. I found that a handful of short lists and rules of thumb are all you need to cover the basics.
Data storage
When thinking of data storage, consider these 5 buckets:
DB engines (Postgres, Cassandra, CouchDB)
Search indexes (ElasticSearch, Solr)
Caches (Memcached, Redis)
Batch processing (Hadoop, Flink)
Stream processing (Kafka, Samza, Storm, Flink)
Functional vs non-functional requirements
Requirements fall into 2 categories:
Functional requirements: what the system does, such as API design and type signatures
Non-functional requirements like scalability, fault tolerance, and so on
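To make the split concrete, here's a minimal sketch using this week's practice problem (counting events). The names EventCounter, InMemoryEventCounter, and the event ID format are my own assumptions for illustration, not something from the book. The type signatures are the functional requirements; everything the in-memory version leaves out is a non-functional one.

```python
from collections import Counter
from typing import Protocol


class EventCounter(Protocol):
    """Functional requirements: what the system does, stated as an API.

    Hypothetical interface, invented for this example.
    """

    def record(self, event_id: str) -> None: ...

    def count(self, event_id: str) -> int: ...


class InMemoryEventCounter:
    """Satisfies the functional contract on a single machine.

    It says nothing about non-functional requirements: there is no
    durability, no replication, and no plan for more traffic than one
    process can handle.
    """

    def __init__(self) -> None:
        self._counts = Counter()

    def record(self, event_id: str) -> None:
        self._counts[event_id] += 1

    def count(self, event_id: str) -> int:
        # Counter returns 0 for keys it has never seen.
        return self._counts[event_id]


counter: EventCounter = InMemoryEventCounter()
for _ in range(3):
    counter.record("video-42:view")
print(counter.count("video-42:view"))  # 3
```

A system design interview usually starts from an API like this and then spends most of its time on what the in-memory version ignores.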
Requirements gathering
When asking questions regarding a product spec for a large-scale system, focus on these 5 categories of questions:
Users: Who are they? What do they do with the data?
Scale: How many requests/sec? Reads or writes? Where is the bottleneck? How many users are we supporting? How often/fast do users need data?
Performance: How quickly do results need to be returned or confirmed? What latency can we tolerate, and what SLAs apply?
Cost: Are we optimizing for development cost or operational/maintenance cost?
CAP theorem: Network partitions (the P) should be assumed to happen in any distributed system, so partition tolerance is not really optional. That is why you always need replicas and backups. Therefore, you want to ask what is more valuable when a partition does occur: consistency or availability?
If consistency is most important, consider an ACID database like Postgres or MySQL. If availability is most important, consider a BASE-style, eventually consistent NoSQL solution such as CouchDB or Cassandra. MongoDB often gets listed here too, though its default single-primary replication leans toward consistency.
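The Scale questions above are easier to reason about once you turn the answers into rough numbers. Here's a back-of-the-envelope sketch. The function name, the input figures, and the peak factor of 2 are all assumptions I made up for illustration, so plug in whatever the product spec actually tells you.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(daily_active_users, requests_per_user_per_day,
                  read_fraction, peak_factor=2.0):
    """Turn answers to the Scale questions into rough requests/sec.

    peak_factor is an assumed multiplier for how much busier the peak
    is than the daily average; it is a guess, not a measured value.
    """
    total_requests = daily_active_users * requests_per_user_per_day
    avg_rps = total_requests / SECONDS_PER_DAY
    return {
        "avg_rps": avg_rps,
        "peak_rps": avg_rps * peak_factor,
        "read_rps": avg_rps * read_fraction,
        "write_rps": avg_rps * (1 - read_fraction),
    }


# Made-up inputs: 1M daily users, 20 requests each, 90% of them reads.
load = estimate_load(1_000_000, 20, read_fraction=0.9)
for name, value in load.items():
    print(f"{name}: {value:.0f}")
```

The read/write split is the number I'd look at first, since it hints at where the bottleneck is and whether caches or read replicas are worth discussing.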
The three system qualities in 1 line
This chapter can effectively be summarized in 3 sentences:
Scalability determines if this system can grow with the growth of your product.
Reliability determines if this system produces correct results (nearly) each and every time.
Maintainability determines if this system can evolve with your team and is easy to understand, write, and extend.
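One common way to put a number on "(nearly) each and every time" is an availability target. That framing is my own assumption here, since reliability also covers correctness and not just uptime. With that caveat, this small sketch shows how much downtime per year each target allows.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability_percent: float) -> float:
    """Hours per (non-leap) year the system may be down at a given target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
```

Each extra nine cuts the budget by a factor of ten, which is a handy way to ask how much reliability a product really needs before paying for it.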
Further reading and study
As I said before, this is a pretty simple chapter. I also watched this systems design intro. This video extended these concepts and informed some of these notes. I like to accompany learnings with practice to seed new questions for our own interview process. This article on YouTube's architecture further reinforces the sample problem in the YouTube video (how meta). You can check your solution against the one that was really used by YouTube.
Check in next week for a summary of Chapter 2 of the book: Data Models & Query Languages!
That's all folks
Aaaand we're done with issue 133 of the User Interfacing Newsletter.
If you got something out of this newsletter, feel free to forward it to your friends, family, and/or coworkers to help it grow.
Interested in sponsoring the newsletter? Have questions, comments, or corrections? Hit that reply button and let me know what you're up to!