# 1. Reliable, Scalable, and Maintainable Applications (20)
- Reliability
- human error
- Scalability
- Maintainability
- operability
- simplicity
- evolvability
CPU is not a limiting factor for `data-intensive` apps (it would be for `compute-intensive`).
- Store data so that they, or another application, can find it again later (databases)
- Remember the result of an expensive operation, to speed up reads (caches - e.g. `Memcached`)
- Allow users to search data by keyword or filter it in various ways (search indexes, full-text search - e.g. `Elasticsearch` or `Solr`)
The boundaries between databases and queues are blurred, so they all fall under the `Data Systems`
category.
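
The caching bullet above ("remember the result of an expensive operation") can be sketched in-process. This is only an illustration: it uses Python's standard `functools.lru_cache` instead of a networked cache such as Memcached, and `expensive_lookup` and `call_count` are hypothetical names.

```python
from functools import lru_cache

# Counts how often the slow path really runs (for illustration only).
call_count = 0


@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow database query or computation."""
    global call_count
    call_count += 1
    return key.upper()


first = expensive_lookup("user:42")   # computed, then stored in the cache
second = expensive_lookup("user:42")  # answered from the cache, no recomputation
```

The second call returns the stored result, so the slow path runs only once. A shared cache like Memcached applies the same idea across processes and machines.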
## Reliability
- hardware fault
- aim to have your app running on several servers (multi machine redundancy)
- software fault-tolerance techniques can hide certain types of faults from the end user.
- software errors
- a service that the system depends on that slows down, becomes unresponsive, or starts
returning corrupted responses.
- cascading failures, where a small fault in one component triggers a fault in another
component, which in turn triggers further faults
- human errors
- provide fully featured non-production sandbox environments where people can explore and
experiment safely, using real data, without affecting real users.
- telemetry
- proper testing to handle user errors gracefully
## Scalability
Scalability is the term we use to describe a system’s ability to cope with increased load, meaning
you can add processing capacity in order to remain reliable under high load.
## Describing load
What load are we talking about for the specific app? For example, the number of active users or the number of
messages per second.
- each time a followed person posts something, that post gets inserted into the 'mailbox' list of
each follower's timeline cache
- but publishing a post is a slow operation for users with a lot of followers
3. mixed approach
- their tweets are pulled into the timeline at the moment when user opens timeline (like in
approach 1)
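
The mixed approach can be sketched as a toy in-memory model. This is a simplified illustration, not Twitter's implementation: the data structures, function names, and `FANOUT_LIMIT` threshold are all hypothetical, and it assumes follower counts do not change between publishing and reading.

```python
from collections import defaultdict
from itertools import count

FANOUT_LIMIT = 2  # hypothetical: authors with more followers are not fanned out

followers = defaultdict(set)        # author -> users following that author
following = defaultdict(set)        # user -> authors that user follows
posts = defaultdict(list)           # author -> [(seq, author, text)]
timeline_cache = defaultdict(list)  # user -> [(seq, author, text)] ("mailbox")
_seq = count(1)                     # global ordering of posts


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, text):
    entry = (next(_seq), author, text)
    posts[author].append(entry)
    # Fan-out on write: copy the post into each follower's mailbox,
    # but only for authors with few followers.
    if len(followers[author]) <= FANOUT_LIMIT:
        for user in followers[author]:
            timeline_cache[user].append(entry)


def home_timeline(user):
    # Start from the precomputed mailbox ...
    entries = list(timeline_cache[user])
    # ... and pull posts of heavily followed authors at read time.
    for author in following[user]:
        if len(followers[author]) > FANOUT_LIMIT:
            entries.extend(posts[author])
    return sorted(entries, reverse=True)  # newest first


follow("alice", "bob")
for fan in ("alice", "carol", "dave"):
    follow(fan, "celeb")
publish("bob", "hi")
publish("celeb", "hello")
alice_timeline = home_timeline("alice")
```

Here `bob`'s post is written into `alice`'s mailbox at publish time, while `celeb`'s post is merged in only when `alice` opens her timeline.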
## Describing Performance
- For batch processing systems such as Hadoop, performance is `throughput`: the number of records we can
process per second.
- For online systems performance is `response time` (request sent, response received). It is not a
single number: the time differs from request to request, so performance is a distribution of
response time values.
For response time, the average is not a good metric, as it doesn't show how many users experience
delays. It is better to use percentiles, starting with the median (p50): 50% of requests are faster than the
median and 50% are slower.
> median response time of less than 200 ms and a 99th percentile under 1 s (if the
> response time is longer, it might as well be down), and the service may be required to
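
To make the difference between the average and percentiles concrete, here is a minimal sketch using the nearest-rank method. The sample response times are made up for illustration, and `percentile` is a hypothetical helper, not a library function.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds; one request is an outlier.
response_times_ms = [120, 95, 110, 105, 130, 100, 115, 90, 125, 1500]

mean = sum(response_times_ms) / len(response_times_ms)  # 249.0
p50 = percentile(response_times_ms, 50)                 # 110
p99 = percentile(response_times_ms, 99)                 # 1500
```

The mean of 249 ms matches no real request: nine of the ten requests took 130 ms or less. The median shows the typical experience, and the high percentile exposes the outlier.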
- `head-of-line blocking` - when requests are queued at the server and slow requests
at the front of the queue block the quick ones behind them. Due to this effect, it is important to
measure response
- `tail latency amplification` - one user request results in multiple requests to other services and needs
all of their responses before it can answer the user. It takes just one slow backend request to slow down
the whole response to the user.
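
The amplification can be quantified with a short calculation. This sketch assumes each backend call is slow independently with the same probability. The 1% figure and the function name are hypothetical.

```python
def prob_slow_request(p_slow_call: float, n_calls: int) -> float:
    """Probability that at least one of n independent backend calls is slow,
    which is enough to make the whole user request slow."""
    return 1 - (1 - p_slow_call) ** n_calls


single = prob_slow_request(0.01, 1)     # one backend call
fan_out = prob_slow_request(0.01, 100)  # a request that needs 100 backend calls
```

Under these assumptions, a 1% chance of a slow call grows to roughly a 63% chance of a slow user request once 100 backend calls are involved.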
- `scaling out` or `shared-nothing` (horizontal scaling, distributing the load across multiple smaller
machines)
- `elastic` systems can detect a load increase and add resources automatically - useful when load is unpredictable
There is no universal scalable architecture (`magic scaling sauce`). The problems are different:
- volume of reads
- writes
- data to store
- complexity of data
## Maintainability
- documentation
- predictable behavior
- Simplicity of code
- avoid:
- tangled dependencies,
- hacks
- focus on:
- good abstraction
- For example, how would you “refactor” Twitter’s architecture for assembling home timelines
(“Describing Load” on page 11) from approach 1 to approach 2?
## Summary
- `functional requirements` (what user gets) - what the system should do, such as allowing data to
be stored, retrieved, searched, and processed in various ways.
- `nonfunctional requirements` (so that it works well) - general properties like security, reliability,
compliance, scalability, compatibility, and maintainability