Tuesday, August 21, 2018

Design of data intensive applications -1


An application has to meet various requirements to be useful. These fall into functional and non-functional requirements.

Some of the non-functional requirements are reliability, scalability, and maintainability.

Reliability: the system continues to work correctly in the face of adversity such as software errors, hardware faults, and human error.


Scalability: as the system grows (in data volume, traffic, or complexity), there should be reasonable ways of dealing with that growth. This matters because a system that works well today can degrade under increasing load.


Describing load: load can be described with load parameters, which depend on the architecture of the system. Examples are the number of requests per second, the ratio of reads to writes in a database, or the rate of cache misses. At Twitter, one useful load parameter is the fan-out of a tweet, i.e. the number of followers it has to be delivered to.
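
To make fan-out concrete, here is a minimal sketch. It assumes tweets are pushed to each follower's home timeline at write time, which is one possible design and not something these notes specify. The follower graph, the names, and the `deliver` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical follower graph: author -> set of followers.
followers = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": set(),
}


def deliver(author, text, followers, timelines):
    """Append the tweet to every follower's home timeline.

    Returns the number of timeline writes, i.e. the fan-out of this tweet.
    """
    targets = followers.get(author, set())
    for follower in targets:
        timelines[follower].append((author, text))
    return len(targets)


timelines = defaultdict(list)
writes = deliver("alice", "hello", followers, timelines)
print(writes)  # one tweet from alice costs 3 timeline writes
```

Under this assumption the write load is driven by the number of followers per author rather than by the number of tweets alone, which is why fan-out is a useful load parameter here.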

To describe performance we can ask two questions: when the load on a system is increased, how is the performance affected? And what resources are needed to keep the performance the same when the load parameters increase? To answer these we need performance numbers. We can take the average, but it is not representative of the pain users actually feel, because it hides how many requests were slow. Percentiles are better, as they tell us what percentage of requests are served within a given time. It is also important to measure response times on the client side.
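
The difference between the average and percentiles can be seen in a small sketch. The response times below are made-up sample data, and `percentile` is a hypothetical helper using the nearest-rank method, which is only one of several ways to define a percentile.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up response times in milliseconds; one request is very slow.
response_times_ms = [12, 14, 15, 15, 16, 18, 20, 22, 25, 1500]

mean_ms = statistics.mean(response_times_ms)
p50_ms = percentile(response_times_ms, 50)
p90_ms = percentile(response_times_ms, 90)
p99_ms = percentile(response_times_ms, 99)

print(mean_ms, p50_ms, p90_ms, p99_ms)  # 165.7 16 25 1500
```

The mean of about 166 ms matches no request that was actually served: half of the requests finished within 16 ms, while the slowest took 1500 ms. The percentiles show both facts, and the mean hides them.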

A system that runs on a single server is easy to work with, but the capacity of a single machine is limited, so at some point we cannot avoid scaling out (using multiple machines). This is easy if the application is stateless, but distributing state across multiple machines is complex. So if possible, keep the database on a single node and scale out only when necessary.

There is no magic scaling sauce. The architecture needs to be rethought at every order-of-magnitude increase in load. It also depends on which requests are frequent and which are rare, i.e. on the load parameters.

Maintainability: over time, maintaining the current system and adapting it to new use cases should be easy.


Operability - operators should be able to quickly view the state of the system and correct any issues easily. So they need metrics, automated deployment, etc.

Simplicity - simple things are easy to reason about. So strive to keep things as simple as possible. Good abstractions help with that.

Evolvability - the system should be able to evolve as requirements change. Good processes such as Agile help make change easy, and good abstractions help too, because a system that is easy to reason about is easier to modify.

Source: Designing Data-Intensive Applications - Martin Kleppmann (book)
