Load distribution is a term in computers and telecommunications networks, which refers to the partitioning of a workload M over N resources with capacity less than M. It is often closely related to multihoming and fault tolerance, and those goals often are integrated. Load distribution, however, may be done purely for performance and capacity reasons.
It is perfectly normal to find multiple load distribution mechanisms used in the same network service, for different protocols and for different applications. Some general terms apply to all forms.
Resource selection principles
Resource selection may be based on deterministic, statistical estimation, or measurement. The classic deterministic method is round-robin, where each successive unit of work — a packet in routing, a connection at the end-to-end level, a transaction at the application level — is assigned to the next sequential resource in a set of resources. When the last resource is selected for a unit of work, the next unit of work goes to the first resource in the set, and the cycle repeats.
Resource assignment is the method by which the work is actually sent to the resource. It can be indirect, in which the resource is selected during an association phase when the resource selection takes place in the association manager. Associations certainly include connections, but also connectionless functions such as Domain Name System response, as well as path setup that does not commit resources, as in the Resource Reservation Protocol. Association managers tend to be considered part of the network rather than a server, although they may be a proxy for the server and not visible to the client.
A subset of indirect assignment is done by the application server rather than a network element or directory. Application protocols including the Hypertext Transfer Protocol (HTTP), Telnet, and the File Transfer Protocol can redirect the client to a different address or port than the server originally contacted.
Direct resource assignment is "on the fly", in which packet addresses are changed in transmission.
Packet and frame level
At the most basic, frame-level bridges and packet-level routers can only make decisions about sending traffic over N directly connected transmission link resources.
While early routing researchers thought the routing system could be made dynamically load-aware, this simply has not worked well in practice, causing much overhead but never really reflecting current load. For example, the now obsolete Cisco Interior Gateway Routing Protocol (IGRP) had the capability of using the utilization ratio of a link as a factor in its link cost computation. A 100% utilized 10 Mbps link would be less attractive than a 2% utilized 64 Kbps link. In practice, however, adjustment based on link utilization led to oscillation: successive packets would avoid the more heavily utilized link, until it became lightly used — and again became the preferred outgoing link.
Link utilization also could select a locally lightly loaded link, for which the next link was extremely congested. No global load tracking method was devised that had a reasonable balance between the overhead of tracking and the constant resetting of routes.
What has been effective is to use routing to find paths,and then assign end-to-end paths at the entry point to the network.
In the first data networks, a simplifying assumption could be made that all links had the same speed, and round-robin assignment was used. Assume N=3. The first packet would go to link 0, the second to link 1, the third to link 2, and the fourth to link (last + 1) modulo N, or back to zero. This worked acceptably with slow links, where computing time and memory were much more available than bandwidth. As bandwidth increased, round robin became less attractive, because, for each destination in the forwarding information base, state had to be maintained: the last link used, which had to be updated for every unit of data sent.
A fundamental Internet design concept is that routers, and by extension bridges, are stateless with respect to individual forwarding decisions. In reality, they retain a certain amount of state in their forwarding tables, but these are in the control plane, not the forwarding plane, and are updated at a much slower rate than of data forwarding.
The next approach was to recognize each new destination address, and, when it was first seen, associate it to the next available link, and always send traffic to that destination over that link. With a large number of destinations, such that workload per link averages to roughly the same amount, this can work. With a small number of destinations, as, for example, on a set of links purely internal to an enterprise, "pinhole congestion" may occur, where one link is assigned to a destination receiving far more traffic than any other.
Pinhole congestion was much more easily avoided by hashing the combination of source and destination address (i.e., flow to a link. Where per-destination would bind all the traffic for the Web or DNS server to a single link, source-destination hash would spread the load from different clients onto different links.
Some flow hashing techniques also consider payload information, such as the protocol type field in an IPv4 header.
Several methods are used at association time, usually directing flows to particular paths or servers. When dealing with network resources, connection admission control may be used, the data network analogy of a "all circuits busy" telephone signal that all resources are in use.
Application-level load distribution should be understood first in the simpler case of colocated servers, where there is no significant difference in the cost of communicating with them, before taking on the more complex case of distributed servers and network load distribution.
Even in the simpler local server case, both the nature of the load, and the power of the servers, need to be considered. Are all servers of equal processing power? Do all transaction types take the same processing resources? If the answer to both these questions is "yes", round-robin will be quite satisfactory. Do not, however, fall into the trap of the fallacy of the Norwegian Blue Parrot.
|Is the server only resting?
Devotees of Monty Python consider the Pet Shop skit, centering on the Norwegian Blue Parrot, among their greatest comedic triumphs. Devotees of Monty Python who are also network architects see a dangerous analogy between this bird of lovely plumage, and a technical challenge in load distribution.
In the skit, John Cleese returns, irate, to a pet store where he bought a parrot. In response to the shopkeeper's (Michael Palin) question about the problem, Cleese snaps "I'll tell you what's wrong with it. It's dead, that's what's wrong with it."
Palin insists it is merely resting, or perhaps "pining for the fjords", until Cleese thunders "It's not pining. It's passed on. This parrot is no more. It has ceased to be. It's expired and gone to meet its maker. This is a late parrot. It's a stiff. Bereft of life, it rests in peace. If you hadn't nailed it to the perch, it would be pushing up the daisies. It's rung up the curtain and joined the choir invisible.
A load distribution method that tries to send transaction to the least utilized server must be able to tell the difference between a server that is lightly loaded because it is resting, and one that is not loaded at all because it is dead. Load distribution devices must periodically poll servers for "liveness", and remove dead servers from the pool of candidates to do useful work.
Simple round robin becomes increasingly inefficient when transactions take different amounts of processing time, or some servers are faster than others, so that servers on which a transaction completed are idle. The least load first algorithm is an incremental improvement over simple round robin, and works reasonably well when transactions are equal in load and the servers are equal in capacity. This algorithm assume the transactions are of equal size.
If the work presented by a transaction is proportional to its size, the least weighted load first algorithm will give better results. This assigns the largest transaction to the least loaded server.
When the servers are of unequal capability, least weighted load becomes appropriate.
- ServerCapacity(i): Capacity of server i
- SessionCount (i,j): Number of servers of type j on server i
- LoadingUnit (j): the unit of workload estimation, which could be session or size
- LoadingFactor (n): estimate of server capacity needed for 1 unit of load.
- Foreach session on server(i)
- serverusage(i) = severUsage(i)+(serverUsage(LoadUnit(j) * LoadingFactor (i,j))
- Assign the new unit of work to the server(i) with the lowest serverUsage(i)/serverCapacity(i)
- ↑ P. Srisuresh, D. Gan (August 1998), Load Sharing using IP Network Address Translation (LSNAT), Internet Engineering Task Force, RFC2391