CAP Theorem
CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency: data on every non-failing node in the distributed system is the same. So that updates across distributed system must be done before allowing further reads.
- Availability: Availability can be used in two different meanings:
- Availability of real service - can be measured as ratio expressed as a percentage between working and non-working time of the service
- Availability in context of CAP theorem - for a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. So that data must be replicated between nodes of the system and also server is not allowed to ignore the client's requests.
- Partition tolerance: the system continues to operate even if any one part of the system is lost or fails. Partition tolerance doesn’t require every node still be available to handle requests. It just means that partitions may occur. If you deploy on a typical IP network, partitions will occur; partition tolerance in these environments is not optional. So only a total network failure can cause a system to respond incorrectly.
So in practice every distributed system using network, must use P, and thus we have two possible types of systems: AP or CP. For systems not using network, we have AC, AP, CP models.
AP or AC:
Conventional databases assume no partitioning - clusters were assumed to be small and local (CA).
NoSQL systems may sacrifice consistency.
- On systems that allow reads before updating all the nodes, we will get high Availability
- On systems that lock all the nodes before allowing reads, we will get Consistency
Description of the CAP theorem:
- Setup:
- we have distributed system consisting of 2 servers - S1 and S2
- S1 and S2 are interconnected
- C connects to both S1 and S2
- client - C - can query any of these servers (S1 or S2)
- S1 and S2 keep track on a variable v with initial value = 0 (v=0)
- write is done from C to S1 or S2 (write request and write responce) and read is done from C (read request and read responce) to S1 or S2
- Consistency:
- consistent system:
- C write-request S1 => v=1
- S1 write => v=1
- S1 write-response C => v=1
- S1 update S2
- S2 update => v=1
- inconsistent system:
- C write-request S1 => v=1
- S1 write => v=1
- S1 write-response C => v=1
- S2 is not updated and v on S2 is still v=1
- Patition:
- When partition occurs - S1 and S2 are no more interconnected