27-Mar-2025

Leader Election

Leader election is a fundamental problem in distributed systems. It’s the process of designating a single process as the “leader” within a group of distributed processes. This leader then typically takes on special responsibilities like coordinating tasks, managing resources, or acting as a single point of contact. Having a single leader helps avoid conflicts and ensures consistency across the system.

Several algorithms exist to solve leader election, each with its trade-offs in terms of complexity, fault tolerance, and performance. Here, we’ll explore the Bully algorithm.

The Bully Algorithm

The Bully algorithm is relatively simple to understand and implement. It operates on the premise that each process has a unique identifier (ID) and processes with higher IDs are “stronger” or more preferred as leaders.

Here’s a breakdown of the algorithm:

Election Trigger: An election is triggered when a process detects the current leader has failed (e.g., it becomes unresponsive).
Election Message: The process initiating the election sends an “election” message to all processes with higher IDs.
Response:
- If a process receives an “election” message and it is alive, it responds with an “OK” message to the sender, indicating it’s a candidate and will participate in the election. It then starts its own election by sending “election” messages to processes with higher IDs than itself.
- If a process doesn’t respond, it is assumed to be down.
Leader Declaration:
- If a process sends out “election” messages and receives no “OK” responses, it declares itself the leader and sends a “coordinator” message to all other processes to inform them.
- If a process receives an “OK” message, it waits for a predetermined timeout period. If it doesn’t receive a “coordinator” message within that time, it assumes the higher-ID process failed and restarts the election process.

Example (Conceptual):

Imagine processes P1, P2, P3, and P4 (P4 having the highest ID).

P1 detects the leader is down and initiates an election. It sends “election” messages to P2, P3, and P4.
P2, P3, and P4 respond with “OK”.
P2 sends “election” to P3 and P4. P3 sends “election” to P4.
P4 receives “election” from both P2 and P3, responds “OK” to both.
Since P4 receives no further “election” messages, it declares itself the leader and broadcasts a “coordinator” message.

Example (JavaScript - Simplified):

class Process {
  constructor(id, processes) {
    this.id = id;
    this.processes = processes;
    this.isLeader = false;
  }

  startElection() {
    console.log(`Process ${this.id} starting election`);
    let receivedOK = false;
    for (const process of this.processes) {
      if (process.id > this.id) {
        console.log(`Process ${this.id} sending election message to ${process.id}`);
        // Simulate sending message and receiving OK
        if (process.receiveElectionMessage(this.id)) {
          receivedOK = true;
        }
      }
    }
    if (!receivedOK) {
      this.declareLeader();
    }
  }

  receiveElectionMessage(senderId) {
    console.log(`Process ${this.id} received election message from ${senderId}`);
    // Simulate sending OK
    console.log(`Process ${this.id} sending OK to ${senderId}`);

    //Start its own election.
    this.startElection();
    return true;

  }

  declareLeader() {
    console.log(`Process ${this.id} declares itself leader`);
    this.isLeader = true;

    //Inform others. In reality a hearbeat mechanism would be used.
    for (const process of this.processes) {
         if(process.id != this.id){
           process.newLeaderNotification(this.id);
         }
    }
  }


  newLeaderNotification(leaderId) {
    console.log(`Process ${this.id} notified that ${leaderId} is leader`);
  }


}

const processes = [new Process(1, []), new Process(2, []), new Process(3, []), new Process(4, [])];
processes.forEach(p => p.processes = processes)

processes[0].startElection();

The Bully algorithm, while effective, has limitations. The number of messages exchanged can be high, especially in larger clusters. Other algorithms like the Ring algorithm or Paxos offer different approaches and performance characteristics. The choice of algorithm depends heavily on the specific needs of the distributed system.