title: Twitter System Design Example for Tech Interviews
published: true
description: Your complete guide to designing twitter or x.com in a system design interview with an example
tags: systemdesign, softwaredevelopment, programming, development
Disclosure: This post includes affiliate links; I may receive compensation if you purchase products or services from the different links provided in this article.
credit - Sandeep/CodeKarle
Hello guys, if you are preparing for a system design interview and looking for common System design problems and resources then you have come to the right place.
In the past, I have talked about essential system design concepts like API Gateway vs Load Balancer, Horizontal vs Vertical Scaling, and Forward Proxy vs Reverse Proxy, as well as common System Design problems. Today I am going to discuss one popular System Design problem: designing Twitter or X.com.
Designing a complex system like Twitter can be challenging, especially in a system design interview.
The biggest challenge is not the complexity but the time: you need to convince your interviewer in about 40 minutes that you know your stuff, and that is only possible if you prepare well and follow a structured approach while answering such questions.
In this system design tutorial, I will also give you a simple guide and a system design template (see below) to help you structure your answer, collect your thoughts, and present a clear design.
By the way, if you are preparing for System design interviews and want to learn System Design in a limited time, you can also check sites like ByteByteGo, DesignGurus.io, Exponent, Educative.io, Codemia.io and Udemy, which have many great System design courses.
Similarly, while answering System design questions you can also follow a System design template like this one from DesignGurus.io to articulate your answer better in a limited time.
Following this template is one of the best things you can do to start your preparation for any system design interview.
Now, let's jump into the problem and solution.
Designing a system like Twitter is a common scenario in system design interviews, but if you want to practice this question from scratch, you can start on Codemia.io, which is a LeetCode-style platform for system design interviews.
It has more than 120 system design problems and is growing, including Designing Twitter. It also provides editorial solutions created by senior engineers from reputed companies.
It also has free System Design questions, and designing Twitter is one of them. You can access it here.
Now coming back to the question itself, it's a great way to showcase your understanding of large-scale distributed systems, as it involves various aspects such as handling massive user bases, ensuring high availability, and maintaining stability under heavy loads.
This solution guide will walk you through the process of designing Twitter, covering system requirements, capacity estimation, API design, database design, high-level design, request flow, detailed component design, trade-offs, and potential failure scenarios.
By the end of this guide, you'll have a solid understanding of how to approach and present this design in an interview setting.
Here is a Twitter architecture diagram to give you the overall idea:
As I said we will solve this problem step by step and we will cover:
So, what are we waiting for? Let's start.
The first thing you should do is get the requirements right, and that starts with the functional requirements.
If you are cloning a real app, such as buying items on Amazon or sending messages on Facebook or Twitter, focus on the functionality you are most familiar with.
To design a robust and user-friendly Twitter-like system, we need to outline the core functionalities.
Users should be able to compose and share tweets, which is the primary function of the platform.
This involves creating a new tweet, attaching optional media, and sharing it with their followers. Additionally, users should be able to follow other users to see their updates in their feeds.
This means managing a list of followed users and ensuring their tweets appear in the user's timeline.
Another essential feature is allowing users to favorite tweets, indicating their appreciation and potentially bookmarking these tweets for future reference.
Here are the key functional requirements for your reference:
For a platform with the scale of Twitter, non-functional requirements are crucial. Scalability is paramount as the system must handle a vast number of users, tweets, and interactions without degradation in performance.
High availability ensures that the platform remains accessible and functional even during peak traffic times or in the event of hardware failures.
Stability is another critical aspect, as the service must be reliable, with minimal downtime and consistent performance, even under high concurrency.
Here are the key non-functional requirements that you should mention during the interview:
Estimating the user base is the first step in understanding the scale of the system. For this design, let's assume a user base of 500 million. This helps us gauge the expected load and the necessary infrastructure to support such a large number of users.
User Base
Traffic
To get a sense of the daily operations, we need to estimate the traffic. Assuming each user tweets once a day, we can expect 500 million tweets daily.
Additionally, if each user views 10 pages of their home feed per day, this results in substantial read operations.
Following relationships also add to the complexity, with each user following 100 others on average, leading to 50 billion follow relationships.
Lastly, if each user favorites 5 tweets daily, we have 2.5 billion favorite operations per day.
Here are the key traffic requirements you should consider or mention:
Breaking down these operations into queries per second (QPS) helps us understand the real-time load.
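For write operations, this works out to roughly 6k QPS on average (500 million tweets / 86,400 seconds), for feed reads about 58k QPS (5 billion page views / 86,400 seconds), and for favorite operations about 29k QPS. Peak traffic can be several times the average, so plan headroom above these figures.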
These numbers help in planning the necessary infrastructure and load-balancing strategies.
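If you want to double-check the arithmetic during preparation, here is a minimal sketch. The class and method names are hypothetical, and the inputs are simply the assumptions stated above (500 million users, 1 tweet, 10 feed pages, and 5 favorites per user per day). It computes averages only; any peak multiplier is a separate assumption.

```java
// Back-of-the-envelope QPS estimate from the assumed daily volumes.
final class QpsEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average queries per second for a given number of operations per day.
    static long averageQps(long operationsPerDay) {
        return operationsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long users = 500_000_000L;          // assumed user base
        long tweetsPerDay = users * 1;      // one tweet per user per day
        long feedReadsPerDay = users * 10;  // ten feed pages per user per day
        long favoritesPerDay = users * 5;   // five favorites per user per day

        System.out.println("write QPS    ~ " + averageQps(tweetsPerDay));
        System.out.println("read QPS     ~ " + averageQps(feedReadsPerDay));
        System.out.println("favorite QPS ~ " + averageQps(favoritesPerDay));
    }
}
```

With these inputs the averages land near 5.8k writes, 58k feed reads, and 29k favorites per second. In an interview it is worth saying out loud whether a number you quote is an average or a peak.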
Understanding the data size is crucial for storage planning. With 500 million tweets daily, and each tweet averaging 300 bytes after considering encoding, this totals roughly 140GiB (150GB) of new data daily, or about 50TiB annually.
For media content, if we assume it to be 100 times the size of tweets, it results in roughly 14TiB daily, or about 5PiB annually.
This estimation underscores the need for a distributed storage architecture.
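The storage numbers can be sanity-checked the same way. This is only a sketch: the class name is made up, the 300-byte tweet size and the 100x media factor are the assumptions from the text, and binary units (GiB, TiB, PiB) are used throughout.

```java
// Back-of-the-envelope storage estimate for tweet text and media.
final class StorageEstimate {
    static final double BYTES_PER_GIB = 1024.0 * 1024 * 1024;

    // New tweet bytes written per day.
    static long tweetBytesPerDay(long tweetsPerDay, long bytesPerTweet) {
        return tweetsPerDay * bytesPerTweet;
    }

    // Converts a byte count to binary gigabytes (GiB).
    static double toGib(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    public static void main(String[] args) {
        long tweetBytes = tweetBytesPerDay(500_000_000L, 300L); // assumed inputs
        long mediaBytes = tweetBytes * 100;                     // media assumed 100x text

        System.out.printf("tweets: %.0f GiB/day, %.1f TiB/year%n",
                toGib(tweetBytes), toGib(tweetBytes) * 365 / 1024);
        System.out.printf("media:  %.1f TiB/day, %.2f PiB/year%n",
                toGib(mediaBytes) / 1024, toGib(mediaBytes) * 365 / 1024 / 1024);
    }
}
```

The exact figures matter less than the order of magnitude: text fits in tens of terabytes per year, while media is in the petabyte range and needs object storage.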
Now, let's talk about API Design which is another important area in System design interviews:
APIs for tweeting need to handle the creation and posting of tweets efficiently. This involves capturing user information, tweet content, location data, and timestamps. Proper error handling and validation are essential to ensure a smooth user experience.
public Result postTweet(Long userId, String tweetText, String location, DateTime date);
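To make the validation part concrete, here is a minimal sketch of what the entry point could check before anything is persisted. The `Result` record, the 280-character limit constant, and the use of `java.time.Instant` in place of `DateTime` are all assumptions for illustration; the original signature does not define them.

```java
import java.time.Instant;

// Input validation for a tweet-posting endpoint (persistence is omitted).
final class TweetApiSketch {
    // Hypothetical result type; the original post does not define Result.
    record Result(boolean ok, String message) {}

    static final int MAX_TWEET_LENGTH = 280; // assumed limit, counted in code points

    static Result postTweet(Long userId, String tweetText, String location, Instant date) {
        if (userId == null || userId <= 0) {
            return new Result(false, "invalid userId");
        }
        if (tweetText == null || tweetText.isBlank()) {
            return new Result(false, "tweet text is empty");
        }
        if (tweetText.codePointCount(0, tweetText.length()) > MAX_TWEET_LENGTH) {
            return new Result(false, "tweet text is too long");
        }
        if (date == null) {
            return new Result(false, "missing timestamp");
        }
        // location is optional in this sketch, so null is accepted.
        return new Result(true, "accepted");
    }
}
```

Returning a result object instead of throwing keeps the error handling explicit for the caller, which is one reasonable choice among several.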
The follow feature requires APIs to manage relationships between users. This includes following and unfollowing users, ensuring data integrity, and updating the user's following list promptly.
public Result followUser(Long userId, Long followedUserId);
public Result unfollowUser(Long userId, Long followedUserId);
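One way to think about data integrity here is idempotency: following the same user twice should not create two rows. The sketch below uses an in-memory map as a stand-in for the follower table. The class name and the rule that a user cannot follow themselves are assumptions, not part of the original design.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the follow relationship store.
final class FollowGraphSketch {
    private final Map<Long, Set<Long>> following = new ConcurrentHashMap<>();

    // Returns true only if the relationship was newly created; repeats are no-ops.
    boolean followUser(long userId, long followedUserId) {
        if (userId == followedUserId) {
            return false; // assumed rule: no self-follow
        }
        return following
                .computeIfAbsent(userId, id -> ConcurrentHashMap.newKeySet())
                .add(followedUserId);
    }

    // Returns true only if an existing relationship was removed.
    boolean unfollowUser(long userId, long followedUserId) {
        Set<Long> followed = following.get(userId);
        return followed != null && followed.remove(followedUserId);
    }

    // Immutable snapshot of the accounts a user follows.
    Set<Long> followingOf(long userId) {
        return Set.copyOf(following.getOrDefault(userId, Set.of()));
    }
}
```

In a relational store the same guarantee would usually come from a unique key on the pair of user IDs.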
APIs for favorites allow users to like or unlike tweets. These operations should be efficient, with proper indexing and error handling to ensure quick updates and an accurate count of favorites on each tweet.
public Result favoriteTweet(Long userId, Long tweetId);
public Result unfavoriteTweet(Long userId, Long tweetId);
The feed rendering API is crucial for fetching and displaying tweets from followed users. This requires efficient querying and pagination to ensure quick load times and a seamless user experience.
public Result getFeeds(Long userId, String location, int pageNo);
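The `pageNo` parameter suggests offset-style pagination. A minimal sketch of that slicing logic is shown below; the class name and the page size of 20 are assumptions. Cursor-based pagination (for example, "tweets older than this ID") is a common alternative when the feed changes between requests.

```java
import java.util.List;

// Offset-based pagination over an already ordered feed.
final class FeedPagingSketch {
    static final int PAGE_SIZE = 20; // assumed page size

    // Returns the items on a 1-based page, or an empty list past the end.
    static <T> List<T> page(List<T> feed, int pageNo) {
        if (pageNo < 1) {
            throw new IllegalArgumentException("pageNo must be >= 1");
        }
        long from = (long) (pageNo - 1) * PAGE_SIZE; // long avoids int overflow
        if (from >= feed.size()) {
            return List.of();
        }
        int to = (int) Math.min(feed.size(), from + PAGE_SIZE);
        return List.copyOf(feed.subList((int) from, to));
    }
}
```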
After API Design, let's talk about Database design.
The database design involves defining tables for users, tweets, and follower relationships. The UserInfo table stores user details, the Tweets table handles tweet content and metadata, and the Follower table manages follow relationships. Proper indexing is essential for fast lookups and updates.
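To picture the three tables, here is a sketch of one row from each, written as Java records. The table names come from the text above, but every column name is an assumption for illustration, since the exact schema is not spelled out here.

```java
import java.time.Instant;

// One row from each table, modelled as immutable records (columns are assumed).
final class SchemaSketch {
    // UserInfo: one row per account.
    record UserInfo(long userId, String name, String email, Instant createdAt) {}

    // Tweets: content plus metadata, owned by a user.
    record Tweet(long tweetId, long userId, String content, Instant createdAt) {}

    // Follower: one row per (user, followed user) pair.
    record Follower(long userId, long followedUserId) {}
}
```

Under these assumed columns, the lookups that need indexes are tweets by `userId` (to build a feed) and follower rows by either side of the pair.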
Here is a simple ERD diagram to understand Twitter Schema architecture better:
Choosing the right storage solutions is critical. For structured data like user profiles and tweets, MySQL is a good choice due to its support for complex queries and transactions.
For media storage, Amazon S3 offers scalable and cost-effective storage for images and videos.
Let's see the high-level design first:
The client layer involves websites and apps sending requests to the server. These requests are distributed via load balancers to ensure even load distribution and high availability. Using a CDN for static files helps reduce latency and improve load times.
The data layer involves caching data with Redis to increase response speed and using MySQL for persistent storage with master-slave architecture to ensure consistency and availability. Amazon S3 is used for storing media files, ensuring scalability and durability.
Explaining the request flow helps in understanding how different components interact. When a client sends a request, it first hits the load balancer, which distributes it to an appropriate server.
The server processes the request, updates the database, and caches the necessary data. For read requests, data is retrieved from the cache or database, and media files are fetched from the CDN, ensuring quick response times.
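The read path described above is often implemented as a cache-aside lookup. Here is a minimal sketch under clear assumptions: a `ConcurrentHashMap` stands in for Redis, a `Function` stands in for the MySQL query, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Cache-aside read path: try the cache, fall back to the database, then fill the cache.
final class CacheAsideSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<K, V> database;                     // stand-in for a MySQL lookup
    private final AtomicInteger databaseReads = new AtomicInteger();

    CacheAsideSketch(Function<K, V> database) {
        this.database = database;
    }

    V read(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit
        }
        V loaded = database.apply(key); // cache miss: go to the database
        databaseReads.incrementAndGet();
        if (loaded != null) {
            cache.put(key, loaded); // populate the cache for later reads
        }
        return loaded;
    }

    // Called after a write so the next read reloads fresh data.
    void invalidate(K key) {
        cache.remove(key);
    }

    int databaseReads() {
        return databaseReads.get();
    }
}
```

Invalidating on write rather than updating the cache in place is one common choice; it keeps the write path simple at the cost of one extra database read afterwards.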
Here is a nice Mermaid diagram to better understand the request flow in the Twitter architecture. When I practice system design problems on Codemia.io, I also use their interface to create such Mermaid diagrams.
Now, let's see the detailed component design and various software architecture components we can use to design Twitter.
Deploying multiple load balancers in a cluster ensures high availability and even load distribution.
Placing load balancers in different locations reduces latency for users, and using various algorithms like round-robin or least connections helps manage the load efficiently.
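The two algorithms mentioned above are easy to sketch. This is a simplified, single-process illustration with a hypothetical class name and server names as plain strings; real load balancers also track health checks and weights, which are left out here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Two server-selection strategies: round-robin and least connections.
final class LoadBalancerSketch {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();
    private final int[] activeConnections;

    LoadBalancerSketch(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
        this.activeConnections = new int[servers.size()];
    }

    // Round-robin: cycle through the servers in order.
    String roundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Least connections: pick the server with the fewest open connections
    // (ties go to the earliest server) and count the new connection.
    synchronized String leastConnections() {
        int best = 0;
        for (int i = 1; i < activeConnections.length; i++) {
            if (activeConnections[i] < activeConnections[best]) {
                best = i;
            }
        }
        activeConnections[best]++;
        return servers.get(best);
    }

    // Call when a connection closes so the counts stay accurate.
    synchronized void release(String server) {
        int index = servers.indexOf(server);
        if (index >= 0 && activeConnections[index] > 0) {
            activeConnections[index]--;
        }
    }
}
```

Round-robin works well when requests cost about the same; least connections tends to do better when some requests are much slower than others.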
If you don't know what a load balancer is, here is a nice diagram from DesignGurus.io, one of my favorite sites for learning System design:
Using a CDN for static content reduces the load on the origin server and improves load times for users. Optimizing caching rules and adjusting TTL helps achieve higher cache hit ratios, ensuring content is served quickly.
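Using Redis for caching involves setting up a Redis cluster for scalability and employing master-slave replication for high availability. Failover is handled by Redis Cluster itself in cluster mode, or by Redis Sentinel in a non-clustered master-slave setup, so the cache remains available even during node failures.
MySQL's master-slave architecture supports high-volume traffic by sending writes to the master and spreading reads across the slaves. Because replication is asynchronous by default, a slave can briefly lag behind the master, so reads that must see the latest write should go to the master. Horizontal partitioning (sharding) helps distribute the load across multiple servers, handling large datasets efficiently.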
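Horizontal partitioning needs a rule that maps each row to a shard. A minimal sketch, assuming we shard by user ID with a simple modulo (the class name and the choice of key are assumptions, not something the design above fixes):

```java
// Maps a user ID to one of N database shards using a modulo rule.
final class ShardRouterSketch {
    private final int shardCount;

    ShardRouterSketch(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        this.shardCount = shardCount;
    }

    // floorMod keeps the result in [0, shardCount) even for negative IDs.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }
}
```

The weakness of plain modulo is that changing the shard count remaps most keys. Consistent hashing is a common alternative when shards are added or removed often.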
This is probably the most important part of a System Design interview, as you have to explain the choices and trade-offs you made and how they help. Let's see:
Choosing MySQL over NoSQL comes down to the need for complex queries and transaction support. NoSQL stores offer schema flexibility and easy horizontal scaling, but many of them provide limited support for joins and multi-row transactions, which this design relies on for relationships such as follows and favorites.
Redis is preferred over Memcached for its richer data types (lists, sets, sorted sets, and hashes) and its built-in replication and clustering. Memcached is efficient for basic key-value caching, but it leaves sharding to the client, while Redis provides these features out of the box, making it a better fit for a large-scale system like this one.
Now, let's take a look at how robust and resilient our system is.
To handle users who follow many accounts, a hybrid model combining pull and push approaches can reduce latency. For users following many people, pushing new tweets reduces the load during feed aggregation, improving user experience.
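Here is a minimal sketch of one common form of the hybrid model. Note the assumptions: the push-or-pull decision is made per author based on follower count (a tiny threshold is used so the example is easy to follow), the class and method names are hypothetical, and tweets are plain strings held in memory. Ordering by time and authors crossing the threshold later are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hybrid feed: push tweets of small accounts on write, pull tweets of large accounts on read.
final class HybridFeedSketch {
    static final int PUSH_FOLLOWER_LIMIT = 2; // assumed threshold, tiny for illustration

    private final Map<Long, Set<Long>> followers = new HashMap<>();          // author -> readers
    private final Map<Long, Set<Long>> following = new HashMap<>();          // reader -> authors
    private final Map<Long, List<String>> pushedTimelines = new HashMap<>(); // precomputed feeds
    private final Map<Long, List<String>> tweetsByAuthor = new HashMap<>();

    void follow(long reader, long author) {
        followers.computeIfAbsent(author, k -> new LinkedHashSet<>()).add(reader);
        following.computeIfAbsent(reader, k -> new LinkedHashSet<>()).add(author);
    }

    void postTweet(long author, String text) {
        tweetsByAuthor.computeIfAbsent(author, k -> new ArrayList<>()).add(text);
        Set<Long> audience = followers.getOrDefault(author, Set.of());
        if (audience.size() <= PUSH_FOLLOWER_LIMIT) {
            // Push: copy the tweet into every follower's precomputed timeline.
            for (long reader : audience) {
                pushedTimelines.computeIfAbsent(reader, k -> new ArrayList<>()).add(text);
            }
        }
        // Otherwise the tweet is only stored once and pulled at read time.
    }

    List<String> feed(long reader) {
        List<String> result = new ArrayList<>(pushedTimelines.getOrDefault(reader, List.of()));
        for (long author : following.getOrDefault(reader, Set.of())) {
            if (followers.getOrDefault(author, Set.of()).size() > PUSH_FOLLOWER_LIMIT) {
                // Pull: fetch tweets of large accounts when the feed is read.
                result.addAll(tweetsByAuthor.getOrDefault(author, List.of()));
            }
        }
        return result;
    }
}
```

The trade-off is write amplification versus read latency: pushing makes reads cheap but multiplies writes by the follower count, which is why very large accounts are usually handled by pulling.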
Handling read hotspots, such as popular users with many followers, involves caching their tweets in Redis and using a cache-aside strategy for consistency. Adding hot zones in Redis servers and using local caches can distribute the load, avoiding excessive calls to the same server.
Future improvements include implementing a multi-region active-active strategy for disaster recovery and high availability.
Deploying service clusters and database clusters in multiple locations with automatic failover and load balancing ensures no single point of failure, maintaining service continuity and reliability.
If you are preparing for a System design interview and looking for the best resources, here is a curated list of the best system design books, online courses, and practice websites that you can check to better prepare for System design interviews. Most of these courses also answer the questions I have shared here.
DesignGuru's Grokking System Design Course: An interactive learning platform with hands-on exercises and real-world scenarios to strengthen your system design skills.
Codemia.io: This is another great platform to practice System design problems for interviews. It has more than 120 System design problems, many of which are free, and also a proper structure to solve them.
ByteByteGo: A live book and course by Alex Xu for System design interview preparation. It contains all the content of System Design Interview book volumes 1 and 2 and will be updated with volume 3 which is coming soon.
Exponent: A specialized site for interview prep, especially for FAANG companies like Amazon and Google. They also have a great system design course and many other materials that can help you crack FAANG interviews.
"System Design Interview" by Alex Xu: This book provides an in-depth exploration of system design concepts, strategies, and interview preparation tips.
"Designing Data-Intensive Applications" by Martin Kleppmann: A comprehensive guide that covers the principles and practices for designing scalable and reliable systems.
LeetCode System Design Tag: LeetCode is a popular platform for technical interview preparation. The System Design tag on LeetCode includes a variety of questions to practice.
"System Design Primer" on GitHub: A curated list of resources, including articles, books, and videos, to help you prepare for system design interviews.
Educative's System Design Course: An interactive learning platform with hands-on exercises and real-world scenarios to strengthen your system design skills.
High Scalability Blog: A blog that features articles and case studies on the architecture of high-traffic websites and scalable systems.
YouTube Channels: Check out channels like "Gaurav Sen" and "Tech Dummies" for insightful videos on system design concepts and interview preparation.
image_credit --- ByteByteGo
Remember to combine theoretical knowledge with practical application by working on real-world projects and participating in mock interviews.
Continuous practice and learning will give you confidence for system design interviews.
That's all about how to design Twitter or X.com in a system design interview. By following this structure, you can design a robust and scalable system similar to Twitter. This guide will help you present your design effectively in a system design interview.
All the best for your System design interview.