High Availability

Fault tolerance

Fault tolerance is the ability of a system to stay up, possibly at reduced capacity, when some of its components fail.

Failure reasons

Software crashes
Hardware failures
Human errors
Planned downtime: maintenance operations, software and hardware upgrades.

Fail soft: A few of the several instances/nodes running the service go offline and bounce back all the time. During these internal failures the system may work at a reduced level, but it does not go down entirely.
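The fail-soft idea can be sketched in a few lines. This is a minimal illustration only: the `Node` class, the `service_capacity` and `service_status` helpers, and the three status labels are hypothetical names chosen for this example, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One instance running the service (hypothetical model)."""
    name: str
    healthy: bool = True


def service_capacity(nodes):
    """Return the fraction of nodes currently able to serve requests."""
    if not nodes:
        return 0.0
    return sum(1 for n in nodes if n.healthy) / len(nodes)


def service_status(nodes):
    """Classify the service as healthy, degraded (fail soft), or down."""
    capacity = service_capacity(nodes)
    if capacity == 1.0:
        return "healthy"
    if capacity > 0.0:
        return "degraded"  # reduced level, but still serving
    return "down"


nodes = [Node("a"), Node("b"), Node("c"), Node("d")]
nodes[1].healthy = False  # one node goes offline
print(service_status(nodes), service_capacity(nodes))
```

With one of four nodes offline the service is classified as degraded rather than down, which is the behaviour the note describes.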

Highly Available Fault-Tolerant Service

The entire massive service is architecturally broken down into smaller, loosely coupled services called microservices, so that a failure in one service does not take down the others.

Redundancy

Active-Passive HA Mode

Systems such as GPS, aircraft, and communication satellites are designed for zero downtime. Their availability is ensured by making the components redundant.
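As a rough sketch of active-passive redundancy, the example below keeps a standby replica idle and promotes it only when the active one fails. The `Replica` and `ActivePassivePair` classes are hypothetical and simulate failure with a flag; a real setup would detect failure over the network.

```python
class Replica:
    """A redundant copy of a component (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ActivePassivePair:
    """Routes all traffic to the active replica; the standby takes over on failure."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Fail over: promote the standby and retry the request once.
            # If the promoted replica is also down, the error propagates.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)


pair = ActivePassivePair(Replica("primary"), Replica("standby"))
first = pair.handle("r1")
pair.active.alive = False  # simulate a crash of the active replica
second = pair.handle("r2")
```

In active-active mode, by contrast, both replicas would serve traffic at the same time instead of one sitting idle.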

Single point of failure: Distributed systems

Monitoring & Automation - to cut down human errors

Replication

Active-Active High Availability

Geographical Distribution of Workload

High Availability Clustering

A High Availability cluster, also known as a fail-over cluster, contains a set of nodes running in conjunction with each other to ensure high availability of the service. A heartbeat network monitors the health of the nodes.
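Heartbeat monitoring can be illustrated with a small sketch: each node reports in periodically, and a node whose last heartbeat is older than a timeout is treated as failed. The `HeartbeatMonitor` class, its method names, and the 3-second timeout are assumptions made for this example; timestamps are passed in explicitly so the logic is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each node (hypothetical model)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node name -> time of last heartbeat

    def record_heartbeat(self, node, now):
        """Called whenever a heartbeat from `node` arrives at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat("node-a", now=0.0)
monitor.record_heartbeat("node-b", now=0.0)
monitor.record_heartbeat("node-a", now=4.0)  # node-b has gone silent
suspects = monitor.failed_nodes(now=5.0)
```

A cluster manager could use the returned list to trigger fail-over for the silent nodes.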

A single state across all the nodes in a cluster is achieved with the help of shared distributed memory and a distributed coordination service such as ZooKeeper.

e.g. Jenkins master and clients.

Disk mirroring/RAID (Redundant Array of Independent Disks), redundant network connections, redundant electrical power.

Which of the following statements is true in the context of scalability & high availability?

Scaling an application horizontally means getting rid of single points of failure, which also makes the application highly available.
