CS865 – Distributed Software Development

Lecture 8

Tanenbaum and Van Steen – Chapter 8

Fault Tolerance
Dealing successfully with partial failure within a Distributed System (a review by Gartner, 1999).

Key technique: Redundancy.

Basic Concepts
Fault Tolerance is closely related to the notion of “Dependability”
In Distributed Systems, dependability is characterized under a number of headings:
  • Availability – the system is ready to be used immediately.
  • Reliability – the system can run continuously without failure.
  • Safety – if the system does (temporarily) fail, nothing catastrophic happens.
  • Maintainability – a failed system can be repaired easily.
What Is “Failure”?

Definition: A system is said to “fail” when it cannot meet its promises.

Types of Faults
  • Transient fault – occurs once and then disappears.
  • Intermittent fault – occurs, vanishes, then reappears, and so on (e.g., a loose contact).
  • Permanent fault – continues to exist until the faulty component is replaced.

Failure Models
Different types of failures (Cristian, 1991; Hadzilacos and Toueg, 1993):

Type of failure – Description

Crash failure – A server halts, but is working correctly until it halts.

Omission failure – A server fails to respond to incoming requests.
  • Receive omission – A server fails to receive incoming messages.
  • Send omission – A server fails to send messages.

Timing failure – A server's response lies outside the specified time interval.

Response failure – A server's response is incorrect.
  • Value failure – The value of the response is wrong.
  • State-transition failure – The server deviates from the correct flow of control.

Arbitrary failure – A server may produce arbitrary responses at arbitrary times.

 
 Failure Masking by Redundancy
 Strategy: hide the occurrence of failure from other processes using redundancy.
Three main types:
  • Information redundancy – extra bits (e.g., error-correcting codes) allow garbled data to be recovered.
  • Time redundancy – an action is performed again if needed (e.g., redoing an aborted transaction).
  • Physical redundancy – extra equipment or processes are added so the system can tolerate the loss of some components.



Distributed Systems Fault Tolerance Topics
  1. Process Resilience
  2. Reliable Client-Server Communication
  3. Reliable Group Communication
  4. Distributed Commit
  5. Recovery Strategies


Process Resilience
(Guerraoui and Schiper, 1997)

Flat Groups versus Hierarchical Groups
(a) Communication in a flat group. (b) Communication in a simple hierarchical group.
 

 Communication in a flat group – all the processes are equal and decisions are made collectively.

Communication in a simple hierarchical group - one of the processes is elected to be the coordinator, which selects another process (a worker) to perform the operation.


Failure Masking and Replication

 By organizing a fault-tolerant group of processes, we can protect a single vulnerable process.

Two approaches to arranging the replication of the group:

Primary-Backup Protocols

Replicated-Write Protocols
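As an illustration of the first approach, here is a minimal primary-backup sketch in Python; the Replica/Primary classes and the write-then-propagate rule are illustrative assumptions, not a prescribed design.

    # Minimal primary-backup sketch (illustrative only): writes go to the
    # primary, which propagates the new state to every backup before
    # acknowledging, so a backup already holds the state if the primary fails.
    class Replica:
        def __init__(self):
            self.state = {}

    class Primary(Replica):
        def __init__(self, backups):
            super().__init__()
            self.backups = backups

        def write(self, key, value):
            self.state[key] = value
            for b in self.backups:          # propagate before acknowledging
                b.state[key] = value
            return "ack"

    backups = [Replica(), Replica()]
    primary = Primary(backups)
    primary.write("x", 1)
    print([b.state for b in backups])        # every backup now holds x = 1

In a replicated-write scheme, by contrast, the write would be sent to all replicas directly rather than funneled through a single primary.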

 Agreement in Faulty Systems

Goal of distributed agreement algorithms: have all the non-faulty processes reach consensus on some issue, and establish that consensus within a finite number of steps.
Complications:
  1. Synchronous versus asynchronous systems.
  2. Communication delay is bounded or not.
  3. Message delivery is ordered or not.
  4. Message transmission is done through unicasting or multicasting.



Circumstances under which distributed agreement can be reached.

 
Note - most distributed systems in practice assume that processes behave asynchronously, message transmission is unicast, and communication delays are unbounded.  

 History Lesson: The Byzantine Empire



How does a process group deal with a faulty member?

The “Byzantine Generals Problem” for 3 loyal generals and 1 traitor.

  1. The generals announce their troop strengths (in units of 1 kilosoldiers) to the other members of the group by sending a message.
  2. Each general assembles a vector from the announcements of step 1; each general knows its own strength. The generals then send their vectors to all the other generals.
  3. Each general examines the vectors it receives as a result of step 2. It is clear to all that General 3 is the traitor. In each ‘column’, the majority value is assumed to be correct.

 
The goal of Byzantine agreement is that consensus is reached on the values of the non-faulty processes only.

Solution in computer terms: each process i has a value vi that all non-faulty processes must agree on; faulty processes may supply arbitrary, inconsistent values.


The algorithm operates in four steps:

  1. Every non-faulty process i sends vi to every other process using reliable unicasting.
  2. The results of the announcements of step 1 are collected together in the form of vectors (Fig. b).
  3. Every process passes its vector from (Fig. b) to every other process.
  4. Each process examines the ith element of each of the newly received vectors; if any value has a majority, that value is taken for process i, otherwise it is marked UNKNOWN (see the sketch below).
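A small Python simulation of these four steps, for four processes of which one is faulty; the process numbering, the values, and the way the traitor garbles its messages are illustrative assumptions only.

    # Sketch of the four-step vector exchange for n = 4 processes with one
    # Byzantine (arbitrarily faulty) process. Values/names are made up.
    import random
    from collections import Counter

    values = {1: 1, 2: 2, 3: 3, 4: 4}   # each process's own value vi
    faulty = {3}                         # process 3 is the traitor

    def announce(sender, receiver):
        # Step 1: a faulty process may report different, arbitrary values.
        return random.randint(0, 9) if sender in faulty else values[sender]

    procs = list(values)

    # Step 2: every process assembles a vector from the step-1 announcements
    # (it knows its own value directly).
    vectors = {p: {q: (values[p] if q == p else announce(q, p)) for q in procs}
               for p in procs}

    # Step 3: every process relays its vector to every other process; the
    # traitor may relay garbage.
    relayed = {p: {q: ({k: random.randint(0, 9) for k in procs}
                       if q in faulty else vectors[q])
                   for q in procs if q != p}
               for p in procs}

    # Step 4: per element, take the majority over the received vectors;
    # elements with no majority are marked UNKNOWN.
    for p in sorted(set(procs) - faulty):
        decision = {}
        for i in procs:
            top, n = Counter(v[i] for v in relayed[p].values()).most_common(1)[0]
            decision[i] = top if n > len(relayed[p]) // 2 else "UNKNOWN"
        print(f"process {p} decides {decision}")

With only three processes and one traitor (the example that follows), each correct process receives just two vectors, so a single lying process is enough to prevent any column from attaining a majority.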

 
Example Again:
With 2 loyal generals and 1 traitor. 
Note: It is no longer possible to determine the majority value in each column, and the algorithm has failed to produce agreement.
 
Two correct processes and one faulty process.
            
 
 
Reliable Client-Server Communication

Kinds of Failures: a point-to-point channel can suffer omission failures (lost messages), which a reliable transport protocol such as TCP masks with acknowledgements and retransmissions, and crash failures (broken connections), which TCP does not mask.


Detecting process failures:

Processes actively send "are you alive?" messages to each other (for which they obviously expect an answer)

Processes passively wait until messages come in from different processes.



Example: RPC Semantics and Failures
The Remote Procedure Call (RPC) mechanism works well only as long as both the client and the server function perfectly!


Five classes of RPC failure can be identified:
  1. The client cannot locate the server, so no request can be sent.

  2. The client’s request to the server is lost, so no response is returned by the server to the waiting client.

  3. The server crashes after receiving the request, and the service request is left acknowledged, but undone.

  4. The server’s reply is lost on its way to the client: the service has completed, but the results never arrive at the client.

  5. The client crashes after sending its request, and the server sends a reply to a newly-restarted client that may not be expecting it.


A server in client-server communication.
(a). A request arrives, is carried out, and a reply is sent.
(b). A request arrives and is carried out, just as before, but the server crashes before it can send the reply.
(c). Again a request arrives, but this time the server crashes before it can even be carried out. And, no reply is sent back.
   
 
 
Server crashes are dealt with by implementing one of three possible implementation philosophies:
  • At-least-once semantics: the client keeps retrying until a reply arrives, so the operation is performed at least once, but possibly more than once.
  • At-most-once semantics: the client gives up immediately and reports the failure, so the operation is performed at most once, but possibly not at all.
  • Guarantee nothing: the operation may have been performed anywhere from zero to many times.


It has proved difficult to provide exactly-once semantics.


Lost replies are difficult to deal with: the client cannot tell whether the request was lost, the reply was lost, or the server is merely slow.


A request that can be repeated any number of times without any nasty side-effects is said to be idempotent. 


Nonidempotent requests (for example, the electronic transfer of funds) are a little harder to deal with. 
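A sketch of why this matters, assuming a made-up transfer operation and client-chosen request ids: blindly retrying a nonidempotent request repeats its side effect, whereas a server-side duplicate filter gives at-most-once execution of the operation itself.

    # Sketch: retrying a nonidempotent operation (a funds transfer) is unsafe
    # unless the server filters duplicates. Request ids and names are made up.
    class Server:
        def __init__(self):
            self.balance = 100
            self.seen = {}                      # request_id -> cached reply

        def transfer(self, request_id, amount):
            if request_id in self.seen:         # duplicate: return the old reply
                return self.seen[request_id]    # and do not repeat the debit
            self.balance -= amount
            reply = f"ok, balance={self.balance}"
            self.seen[request_id] = reply
            return reply

    server = Server()
    # The client's first reply is "lost", so it retries with the same id.
    print(server.transfer("req-42", 10))   # executed once
    print(server.transfer("req-42", 10))   # filtered: same reply, no double debit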


Client Crashes
When a client crashes after sending a request, the server may complete the work with no parent waiting for the result; such an unwanted computation (and its eventual ‘old’ reply) is known as an orphan.


Four orphan solutions have been proposed:
  1. extermination (the orphan is simply killed off),
  2. reincarnation (each client session has an epoch associated with it, making orphans easy to spot),
  3. gentle reincarnation (when a new epoch is identified, an attempt is made to locate each request's owner; otherwise the orphan is killed),
  4. expiration (if the RPC cannot be completed within a standard amount of time, it is assumed to have expired).



In practice, however, none of these methods are desirable for dealing with orphans. 
Orphan elimination is discussed in more detail by Panzieri and Shrivastava (1988).
 
 
Reliable Group Communication
Reliable multicast services guarantee that all messages are delivered to all members of a process group.


Small group: multiple, reliable point-to-point channels will do the job; however, such a solution scales poorly as group membership grows.

Basic Reliable-Multicasting Schemes
A simple solution to reliable multicasting when all receivers are known and are assumed not to fail:
 
(a) Message transmission – note that the third receiver is expecting message 24.
(b) Reporting feedback – the third receiver informs the sender.
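A minimal sketch of this scheme, with invented Sender/Receiver classes: the sender numbers messages and keeps a history buffer, receivers detect sequence-number gaps and report them as feedback, and the sender retransmits the missing messages.

    # Sketch of the basic scheme: sequence-numbered multicast, gap detection
    # at the receivers, negative feedback, and retransmission from a history
    # buffer. The classes and the simulated loss are illustrative only.
    class Receiver:
        def __init__(self):
            self.expected = 0
            self.missing = set()

        def deliver(self, seq, msg):
            if seq == self.expected:
                self.expected += 1              # in order: deliver and advance
            elif seq > self.expected:
                self.missing.update(range(self.expected, seq))  # gap detected
                self.expected = seq + 1
            self.missing.discard(seq)           # retransmitted copy fills the gap

    class Sender:
        def __init__(self, receivers):
            self.receivers, self.history, self.seq = receivers, {}, 0

        def multicast(self, msg, drop_for=()):
            self.history[self.seq] = msg        # keep a copy for retransmission
            for r in self.receivers:
                if r not in drop_for:           # simulate a lost copy
                    r.deliver(self.seq, msg)
            self.seq += 1

        def handle_feedback(self):
            for r in self.receivers:            # receivers report what they miss
                for seq in sorted(r.missing):
                    r.deliver(seq, self.history[seq])

    r1, r2, r3 = Receiver(), Receiver(), Receiver()
    s = Sender([r1, r2, r3])
    s.multicast("m0"); s.multicast("m1", drop_for=(r3,)); s.multicast("m2")
    s.handle_feedback()
    print(r3.expected, r3.missing)   # 3 set(): r3 has recovered the lost message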
 
 An extensive and detailed survey of total-order broadcasts can be found in Defago et al. (2004).
 
Scalability in Reliable Multicasting

A comparison between different scalable reliable multicasting schemes can be found in Levine and Garcia-Luna-Aceves (1998).

Nonhierarchical Feedback Control
Feedback suppression – reducing the number of feedback messages to the sender, as implemented in the Scalable Reliable Multicast (SRM) protocol; see Floyd et al. (1997) and Liu et al. (1998).


Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
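A sketch of the timer-based suppression idea (loosely in the style of SRM, with made-up timings): every receiver that missed the message schedules a retransmission request after a random delay, and all but the first are suppressed once that first request is multicast to the group.

    # Feedback-suppression sketch: the receiver whose random timer fires first
    # multicasts its retransmission request; the others overhear it and cancel
    # their own. Delays and receiver names are illustrative only.
    import random

    def suppressed_feedback(receivers_missing, max_delay=1.0):
        timers = {r: random.uniform(0, max_delay) for r in receivers_missing}
        first = min(timers, key=timers.get)     # this receiver's request fires first
        sent = [first]                          # one request reaches the sender
        suppressed = [r for r in receivers_missing if r != first]
        return sent, suppressed

    sent, suppressed = suppressed_feedback(["r1", "r2", "r3", "r4"])
    print("retransmission request sent by:", sent, "suppressed:", suppressed)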
 
 
 
Hierarchical Feedback Control
Hierarchical reliable multicasting - the main characteristic is that it supports the creation of very large groups.
(a) Sub-groups within the entire group are created, with each local coordinator forwarding messages to its children.
(b) A local coordinator handles retransmission requests locally, using any appropriate multicasting method for small groups.
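A sketch of the hierarchical idea with invented Coordinator/Leaf classes: each local coordinator buffers the messages it forwards and answers retransmission requests from its own subtree, so feedback never has to travel back to the root sender.

    # Hierarchical feedback-control sketch: coordinators keep a local history
    # buffer and serve retransmissions for their own subtree. Illustrative only.
    class Coordinator:
        def __init__(self, name, children=()):
            self.name, self.children, self.history = name, list(children), {}

        def forward(self, seq, msg):
            self.history[seq] = msg             # buffer for local retransmission
            for c in self.children:
                c.forward(seq, msg)

        def retransmit(self, seq):
            return self.history[seq]            # served locally, not by the root

    class Leaf:
        def __init__(self, name):
            self.name, self.delivered = name, {}

        def forward(self, seq, msg):
            self.delivered[seq] = msg

    leaves = [Leaf(f"r{i}") for i in range(4)]
    c1, c2 = Coordinator("c1", leaves[:2]), Coordinator("c2", leaves[2:])
    root = Coordinator("root", [c1, c2])
    root.forward(0, "m0")
    # A receiver under c2 that lost the message asks c2, not the root sender.
    print(c2.retransmit(0))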

 
 
Main problem: construction of the tree.

Conclusion: building reliable multicast schemes that scale to very large groups spread across wide-area networks is a difficult problem; no single best solution exists, and each scheme introduces new problems of its own.
 

Atomic Multicast
Atomic multicast problem: guarantee that a message is delivered either to all group members or to none at all and, in addition, that all messages are delivered in the same order to all members (i.e., reliable multicast in the presence of process failures).


 
 
Virtual Synchrony
The concept of virtual synchrony was proposed by Kenneth Birman as the abstraction that group communication protocols should attempt to build on top of an asynchronous system.

Virtual synchrony is defined as follows:
  1. All recipients have identical group views when a message is delivered. (The group view of a recipient defines the set of "correct" processes from the perspective of that recipient.)
  2. The destination list of the message consists precisely of the members in that view.
  3. The message should be delivered either to all members in its destination list or to no one at all. The latter case can occur only if the sender fails during transmission.
  4. Messages should be delivered in causal or total order (depending on application semantics).


Reliable multicast with the above properties is said to be virtually synchronous (Birman and Joseph, 1987).
 
 Message Ordering
 Four different orderings:
  1. Unordered multicasts

  2. FIFO-ordered multicasts

  3. Causally-ordered multicasts

  4. Totally-ordered multicasts
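As a concrete example of the second ordering, here is a sketch of FIFO-ordered delivery using per-sender sequence numbers and a hold-back queue (class and variable names are illustrative).

    # FIFO-ordered delivery sketch: a message from a given sender is delivered
    # only after all earlier messages from that same sender have been delivered.
    from collections import defaultdict

    class FifoReceiver:
        def __init__(self):
            self.next_seq = defaultdict(int)      # expected seq number per sender
            self.holdback = defaultdict(dict)     # sender -> {seq: msg}
            self.delivered = []

        def receive(self, sender, seq, msg):
            self.holdback[sender][seq] = msg
            # Deliver as many in-order messages from this sender as possible.
            while self.next_seq[sender] in self.holdback[sender]:
                s = self.next_seq[sender]
                self.delivered.append((sender, s, self.holdback[sender].pop(s)))
                self.next_seq[sender] += 1

    r = FifoReceiver()
    r.receive("p1", 1, "b")       # arrives early: held back
    r.receive("p1", 0, "a")       # now both 0 and 1 can be delivered, in order
    print(r.delivered)            # [('p1', 0, 'a'), ('p1', 1, 'b')]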

 
Six different versions of virtually synchronous reliable multicasting.
 
Distributed Commit
Examples of distributed commit, and how it can be solved, are discussed in Tanisch (2000).


General Goal: We want an operation to be performed by all group members or none at all.


One-Phase Commit Protocol: the coordinator simply tells all other group members to (locally) perform the operation. Drawback: if one of the participants cannot actually perform the operation, there is no way to tell the coordinator.


Two-Phase Commit Protocol:
  1. The coordinator sends a VOTE_REQUEST message to all group members.

  2. A group member returns VOTE_COMMIT if it can commit locally, otherwise VOTE_ABORT.

  3. All votes are collected by the coordinator; it sends GLOBAL_COMMIT if every member voted to commit, otherwise GLOBAL_ABORT.

  4. Group members then COMMIT or ABORT based on this last message received from the coordinator.



 
First phase (voting phase): steps 1 and 2.
Second phase (decision phase): steps 3 and 4.
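A minimal sketch of this message flow (no failures, timeouts, or logging; participants are modeled as functions that return their vote).

    # Two-phase commit sketch: collect votes, then decide. A real protocol adds
    # logging, timeouts, and the failure handling discussed below.
    def two_phase_commit(participants):
        # Phase 1 (voting): the coordinator requests and collects the votes.
        votes = [p() for p in participants]          # True = VOTE_COMMIT
        # Phase 2 (decision): commit only if every participant voted to commit.
        return "GLOBAL_COMMIT" if all(votes) else "GLOBAL_ABORT"

    ok = lambda: True          # a participant that can commit locally
    refuses = lambda: False    # a participant that must abort
    print(two_phase_commit([ok, ok, ok]))        # GLOBAL_COMMIT
    print(two_phase_commit([ok, refuses, ok]))   # GLOBAL_ABORT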

(a) The finite state machine for the coordinator in 2PC.
(b) The finite state machine for a participant.
 
 
Big Problem with Two-Phase Commit: it can block. If the coordinator crashes after the voting phase, participants that have already voted to commit cannot decide on their own and must wait until the coordinator recovers.


Three-Phase Commit Protocol:
 
Essence: the states of the coordinator and each participant satisfy the following two conditions:
  1. There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state.
  2. There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.

 
(a) The finite state machine for the coordinator in 3PC.
(b) The finite state machine for a participant.
 
  
 
Recovery
  1. Backward Recovery: return the system to some previous correct state (using checkpoints), then continue executing.

  2. Forward Recovery: bring the system into a correct state, from which it can then continue to execute.



Forward and Backward Recovery
Backward Recovery:
Advantages: it is a generally applicable method, independent of any specific system or process, and it can be offered as a general-purpose service.
Disadvantages: checkpointing and restoring a process to a previous state are relatively costly; there is no guarantee that the same failure will not recur after recovery; and some actions (such as dispensing cash at an ATM) simply cannot be rolled back.

[Despite the cost, backward recovery is implemented more often. The "logging" of information can be thought of as a type of checkpointing.]
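A toy sketch of backward recovery by checkpointing (the "process", its state, and the simulated crash are all invented for illustration): state is saved periodically and, after a failure, execution resumes from the most recent checkpoint.

    # Checkpoint-and-rollback sketch: save state periodically; on failure,
    # restore the last saved state and continue from there.
    import copy

    state = {"count": 0}
    checkpoints = []

    def checkpoint():
        checkpoints.append(copy.deepcopy(state))     # save a recovery point

    def rollback():
        state.clear()
        state.update(checkpoints[-1])                # restore last correct state

    for step in range(1, 6):
        state["count"] = step
        if step % 2 == 0:                            # checkpoint every 2 steps
            checkpoint()

    rollback()      # simulated crash after step 5: roll back to the checkpoint
    print(state)    # {'count': 4}: execution resumes from the saved state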



Disadvantage of Forward Recovery: it must be known in advance which errors may occur; only then can the system be designed to correct those errors and move forward to a correct state.

Example
Consider as an example: Reliable Communications.
Retransmission of a lost/damaged packet - backward recovery technique.
Erasure correction – when a lost/damaged packet can be reconstructed from other successfully delivered packets – is a forward recovery technique [see Rizzo (1997)].
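A toy illustration of erasure correction as forward recovery, using a single XOR parity packet (not a production FEC code such as those discussed by Rizzo): any one lost packet can be rebuilt from the packets that did arrive, with no retransmission.

    # Toy (k+1, k) erasure code: k data packets plus one XOR parity packet.
    # Any single lost packet equals the XOR of all the surviving packets.
    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    packets = [b"AAAA", b"BBBB", b"CCCC"]
    parity = reduce(xor, packets)                    # sent alongside the data

    received = [packets[0], None, packets[2], parity]   # packet 1 was lost
    survivors = [p for p in received if p is not None]
    rebuilt = reduce(xor, survivors)                 # XOR of survivors = lost packet
    print(rebuilt)                                   # b'BBBB'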



 
Recovery-Oriented Computing
Recovery-oriented computing - Start over again (Candea et al., 2004a).

Different flavors:
  • Rebooting – simply restart (part of) the system; this requires that faults be well localized so that only the affected components need to be restarted.
  • Checkpointing and restarting in a changed environment – based on the observation that a fault may be triggered by the environment, so changing the environment may allow execution to proceed.