TinyURL
System Design : Scalable URL shortener service like TinyURL
URL shortening is used to create shorter aliases for long URLs.
Requirements
Functional
- Write service: given a URL, generate a short and unique alias for it.
- Read service: give a short URL, return the original url.
- Users customization: pick up a custom short link for the URL.
- URL translation has an expire time.
Non-Functional
- High availability.
- Real time with minial latency.
- Shortened links should not be guessable.
Extended requirements
- Telemetry: how many times a reduction happened?
- REST APIs by other service.
Capacity Estimation and constraints
Read-heavy service: R:W = 100:1
Traffic
500M/Month new service
1 day = 24*3600s =86.4K
1 month = 30 24 3600 s = 2.592 million seconds
1 year = 22.896 Million seconds
Write: 500M/Month = 200 URL/s
Read: 20K URL/S
Storage
Store URL in 2 years:
500 Million 12 2 years = 12 Billion
Every URL takes 0.5 K
Then we have 6 Billion K = 6 TB.
Bandwidth
Write: 200 * 0.5 K/s = 100 KB/s
Read: 100KB/S * 100 = 10 MB/S
Memory : 80-20 rules
Cache 20% hot URLs for 24 hours.
20K/S * 86.4K seconds=1.7 B requests
1.7 Billion 0.2 0.5 KB = 0.17 TB = 170 GB
High-Level estimates
Types of URLs | Time estimates |
---|---|
New URLs | 200/s |
URL redirections | 20K/s |
Incoming data | 100KB/s |
Outgoing data | 10 MB/s |
Storage for 5 years | 6 TB |
Memory for cache | 170 GB |
System APIs
createURL(
api_dev_key,
original_url,
custom_alias=None,
user_name=None,
expire_date=None) -> string
deleteURL(api_dev_key, url_key)
DB Design
- 6 TB
- Read-heavy
- No relationship between records.
DB Schema
URL DB: PK - Hash: string
original_url: string
create_time: datetime
user_id: int
User DB: PK - UserID: int
name: string
email: string
create_time: datetime
last_login_time: datatime
DB Type
No relationship between objects. No-SQL for availability and horizontal scalability.
Basic System Design and Algorithm
How to generate a short and unique key for a given URL?
Encoding actual URL
Base64 with 6 letters long. 64^6 = 68.7 Billion
MD5Sum to generate hash. 128 bits.
How to resolve hash collision?
Randomly pick up digits out of 128 bits.
Append use name id or increasing counter in md5 hashing func.
Generating keys off-line
Key Generation Service(KGS) in key-DB.
6 chars * 68.7 Billion (unique keys) = 412 GB.
Data Partitioning and Replication
Range based partitioning
Hash-Based Partitioning
Consistent Hashing into number between 1-256
Cache
How much cache memory should we have?
170GB
Which cache eviction policy would best fit our needs?
Least Recently Used (LRU) can be used.
How can each cache replica be updated?
Whenever there is a cache miss, servers need to access DB.
Load Balancer
- Between Clients and Application servers
- Between Application Servers and database servers
- Between Application Servers and Cache servers
Purging or DB cleanup
- Lazy cleanup
- Cleanup Service to clean periodically.
- Default expiration time
Telemetry
- How many times a short URL has been used?
- What location is it used?
- How would we store these statistics?
- Country of the visitor, date and time of access, web page, browser, platform from where the page was accessed.
Security and permission
Can users created private URLs or allow a particular set?
Use wide-column database like Cassandra. They key for the table will be the HASH, add one columns UserIDs.