Application load balancing methods

Natalia Pryntsova
4 min read · Oct 20, 2020

The holy grail of load balancing is a uniform load across workers. Obviously, it only exists in a universe of equally sized tasks and workers that never go offline, slow down, or otherwise fail.

Here we will take a brief look at common load balancing methods.
Rather than going into the details of HOW load balancing is implemented (for example, in NGINX), we will focus on WHY we should use one method or another.

Round Robin

The load balancer sends each new task to the next worker in the pool, cycling through the workers in order.
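To make the idea concrete, here is a minimal Python sketch (the worker and task names are made up, and a real balancer would also handle workers joining and leaving the pool):

```python
import itertools

# A minimal round-robin selector; worker and task names are hypothetical.
workers = ["w1", "w2", "w3"]
next_worker = itertools.cycle(workers)

def route(task):
    # Each call picks the next worker in the cycle.
    worker = next(next_worker)
    print(f"sending {task} to {worker}")

for task in ["t1", "t2", "t3", "t4"]:
    route(task)
# t1 -> w1, t2 -> w2, t3 -> w3, t4 -> w1, ...
```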

Why is it good?
It is a simple and predictable algorithm, and the default for many load balancers. There are no issues if a node dies or comes back online: it is simply added back to the pool of active nodes.

It works well if requests/tasks are more or less the same size.

What’s the problem?
If some tasks are much longer than others, round robin suffers from the head-of-the-queue problem: one long task at the front of a worker's queue slows down all the shorter tasks behind it.

It is a bit like choosing a “bad queue” at the supermarket checkout: someone in front of you is taking ages to pay, so you also queue for longer than necessary.
A round robin LB makes this issue worse by continuing to add more and more tasks to the affected queue.

Head of the queue illustrated

Least loaded

The load balancer will send the task to the “least loaded” worker in the pool, where “least loaded” usually translates into the fewest connections or the fewest outstanding requests.
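A rough sketch of the bookkeeping, assuming the balancer simply counts in-flight requests per worker (worker names are hypothetical):

```python
# A minimal "least outstanding requests" sketch.
outstanding = {"w1": 0, "w2": 0, "w3": 0}  # in-flight requests per worker

def pick_worker():
    # Choose the worker with the fewest in-flight requests.
    return min(outstanding, key=outstanding.get)

def start_task():
    worker = pick_worker()
    outstanding[worker] += 1
    return worker

def finish_task(worker):
    outstanding[worker] -= 1
```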

Why is it good?
With differently sized tasks, it has a better chance of loading the workers uniformly. If one of the workers gets a long-running task, its queue will initially build up, but then new tasks will be routed to the other workers.

What’s the problem?
The head-of-the-queue issue will still happen to some number of short tasks, so you will see “random” latency spikes for tasks that are supposed to be fast but spend time queueing.

The load balancer itself becomes more complex as it has to keep track of which worker has which number of connections/outstanding requests.

It also requires proper monitoring to automatically take out faulty workers from the pool. Why?

  • If a worker crashes and restarts, it comes back with the fewest connections/zero outstanding requests. So all new tasks will go to this (potentially faulty) worker and may overwhelm it again.
  • If a worker is faulty and quickly “deals with” tasks by failing them, it will also appear to be the least loaded, and new tasks will be disproportionately routed to it (see the sketch after the figure below).
W3 is “the best worker” but actually it cannot connect to the DB
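One mitigation is to filter unhealthy workers out of the pool before applying the least-loaded rule. A minimal sketch, assuming some health probe exists (health_check here is hypothetical, e.g. an HTTP /health endpoint or a DB connectivity test):

```python
def pick_worker(outstanding, health_check):
    # Keep only workers that currently pass the health probe.
    healthy = [w for w in outstanding if health_check(w)]
    if not healthy:
        raise RuntimeError("no healthy workers available")
    # Among the healthy ones, pick the least loaded.
    return min(healthy, key=outstanding.get)
```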

Is load balancing based on other metrics, like CPU or memory usage, possible?

There are sophisticated load balancers that allow for it, but for normal request/task load balancing, CPU and memory metrics are too volatile.
Basically, the load balancer will constantly base its routing decisions on stale data, which will lead to overloading and/or underutilizing workers.

If you feel like it would be nice to load balance by CPU, memory, disk or something else, maybe the time has come to switch to a more complex task scheduler instead, so you can use algorithms like token bucket for resource allocation.
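For reference, this is roughly what a token bucket looks like. A minimal sketch, not tied to any particular scheduler; the rate and capacity values are arbitrary examples:

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of tokens in the bucket
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 tasks per second, bursts of up to 10
```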

IP Hashing/Sticky sessions

This set of load balancing methods creates a persistent association between a client and a server/worker.
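In its simplest form, the association comes from hashing the client's IP address onto the worker pool. A minimal sketch (client IP and worker names are made up, and a real implementation would also have to handle workers leaving the pool):

```python
import hashlib

workers = ["w1", "w2", "w3"]

def pick_worker(client_ip):
    # Hash the client IP and map it onto the pool, so the same client
    # keeps landing on the same worker (as long as the pool is stable).
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return workers[int(digest, 16) % len(workers)]

print(pick_worker("203.0.113.7"))  # always the same worker for this IP
```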

Why is it good?
One situation I see where sticky sessions may work better is when an application relies heavily on a local in-memory cache, so sending the same client's requests to the same worker may be beneficial.

What’s the problem?
There are many: a client’s IP address can change between requests, and the client will lose its session. One worker can receive too many requests from a single client and die while the other workers are doing nothing. The system also becomes more vulnerable to DDoS attacks.

The load will never be uniform, or anywhere near it, so the general advice is to avoid “sticky” LB methods.

And finally, load balancer vs reverse proxy.
I have seen people use these terms interchangeably, while there is a functional difference.

A reverse proxy does not need multiple servers/workers to distribute the workload; its main purpose is to act as the “public face” and public IP address for your main application servers. In addition, you can terminate SSL on it or deny/allow client IPs (but you can do the same on an LB).

Summary

As a developer, you may never need to install and configure a load balancer yourself, but you should be aware of the main load balancing methods and be able to choose the one that will give you the best performance.
