Antonio Gulli: Categorization, Networking, Software

You are here: Home \| Webnet 98

Web Server Load Balance
	Abstract: A common problem with hypertext information servers is that they must cope high load of requests. Jamming.Net is a multithread server, written in Java, which dispatches incoming HTTP connections to several machines according to a preferred policy of balance. Consequently, Jamming.Net is able to share load among a set of conventional web servers, both commercial and public domain, improving performance and reducing latency. Jamming.Net runs without problems on every machine with JDK version 1.1. Tests have been performed on Solaris, NT, Linux.


		Jamming.Net: a Server to Balance WWW Load Antonio Gulli gulli@di.unipi.it Computer Science Dept. Univ. Pisa, Studio Associato I-Tech 1 The Need of Distributed Servers A need for advanced solution to manage high load traffic is necessary, as the WWW increases in size and complexity. Today, the most common solution is to have a single web server running on a very powerful machine with great capacity of memory, fast processors, fast access disk array. This way to face the problem is somewhat limited because the solution is not much scalable. In fact, as soon as the volume of traffic increases, arises the need to modify the hardware configuration of the machine running web server. There are cases in which one must even forced to replace hardware. Besides, this solution is not fault tolerant. If there is a hardware problem in the machine, or a software failure in HTTP server, the WWW service stops working and published information is no longer available on line. We are identifying three requirements that modern systems should meet in order to publish information: Network Scalability: preserving the used hardware architecture and adding only a new HTTP server running on a different machine, whenever the incoming traffic to an existing HTTP server increases. Load Balancing: sharing traffic among a group of HTTP servers according to some policies which depend on local load or some pseudo random heuristic. Fault Tolerance: In case of fault of one of the servers we want to be able to recover, stopping its use and replacing it with one of alive servers automatically. 2 Jamming.Net: a Server to Load Balance Existing WWW Servers To address these requirements, we developed a software server architecture that tries to satisfy them. One more need we want to satisfy is not to build another web server; in fact commercial and public domain servers exist with many features in term of programmability (server side script, module extensions and so on..) and with many tools dedicated to manage them. So, we develop a program acting in front of pre-existing set of HTTP servers, each one eventually running on different computers. In fig. 1 we have the position in network architecture in which Jamming.Net is placed. Like a traditional www proxy [1], Jamming.Net is situated between a www client (the browser) and the information content provider. Compared to a www proxy the marked difference is that Jamming.Net is able to decide what HTTP server has to provide the answer for every incoming request, on the bases of local information. Moreover Jamming.Net is transparent to the www client, in fact there is no need to configure the browser in order to use it. Figure 1: The position in network architecture in which Jamming.Net is placed. 2.1 Java Multithread Architecture To Build this kind of architecture we decide to use Java [2] as developing language; it grants us desirable features such us portability, object oriented programming, general network API and multithreading. While portability allows us to run Jamming.Net on the most common platform (both Unix and NT), O-O Programming allows us a clear prototyping, finally network API programming and multithread allows us to gain benefits from lightweight process and BSD-like network features [3]. Inner server architecture is described in fig. 2. It consists of many threads. A subset of them lives forever and provides general functions such us management, periodic fault discovery, handling of incoming connections. Another subset is created dynamically for every incoming connection and only lives for the duration of same connection. We can explain it in detail: One Listening Thread that listens to incoming connections from Web clients, creates a new thread to handle each one, choosing a www server to serve them, among the configured set. Since now, we call worker a HTTP server in the configured set. The method of choosing a worker is a matter of balance, and we are going to talk about it later. A set of Switching Threads, each one created dynamically by the listening thread when a new HTTP connection arrives. They take the job to store & forward all HTTP packets moving from the assigned www client to the chosen worker, as well as all answer from the worker to the client. In this way, they die as soon as there is no more data to handle. One Control Thread acceptinging to commands given through a control panel applet. In fact these commands are used to modify the server's state in real time (balance policy, workers' type, fault-tolerance). We are going to talk about Jamming.Net management later. One Sampling Thread, running at low priority, which performs periodic network's sampling in order to discover a worker's fault. Figure 2:Jamming.Net server architecture 2.2 Balancing Policy and Network Scalability Inside Jamming.Net, load balancing consists in choosing which worker serves incoming requests, dynamically and for every single HTTP connection. This technique is more useful than HTTP Redirect or DNS Balancing methods [4] because it gives the administrator control on the load generated by each WWW request. In fact, HTTP Redirect performs a redirect to another web server only for the first incoming connection. All subsequent requests go directly to that server. DNS Balancing has many problems well described in [4]: above all, it shows poor fault tolerance and poor balancing in case of spot traffic. Jamming.Net implements three balancing policies in order to let the listener thread choose which worker serves the incoming request: Round Robin: for every incoming connection, next worker in modulo is chosen. Uniform Random: a worker is chosen basing on a generated uniform random number. Origin IP Hashing: given the IP address of WWW client (the browser), a worker is chosen by applying a hash function on it. Consequently incoming connections from the same machine are always served by the same worker, by using this technique. Network Scalability is simply reached enlarging the set of HTTP servers acting as workers. This could be exploited via the management applet, without performing any Jamming.Net reset. So, when traffic goes over a certain threshold, a network administrator can expand the system by preserving the existing hardware and software installation. 2.3 Fault Tolerance Fault tolerance is the set of techniques used by the server to discover the failure of a worker. Implemented techniques are grouped in two classes: Synchronous Sampling: as seen before, a worker is selected when a www connection is arriving, basing on the chosen balancing policy. If it happens that a worker does not answer before a time-out threshold, a fault condition is raised. This kind of testing is synchronous; in fact it depends on both www client request and chosen balancing policy. Asynchronous Sampling: as seen in architecture schema, a sampler thread stays alive with minimum priority. When a threshold time arrives it choices a worker, produces a request and waits for an answer. If it does not come, then a fault condition is raised. This kind of testing is asynchronous since it does not depend on www requests originating from clients. If a fault is raised, some strategies of recovery could be exploited: Worker Remove: the worker is removed from the list of those available, and another available worker is selected. Administrator Notify: the event of worker's fault is notified to the administrator. He / She could decide what to do to recover from fault. 2.4 Security Some security controls are performed by server: Time-out on Listener side: a system of time-outs is exploited on the listener thread in order to avoid that incoming connections with no HTTP request could cause a waste of resources. Network address verifying: when a new worker is inserted in the set of those available, Jamming.Net performs a control that it has a valid network address, and that there is a www server on the given port. This is to prevent an erroneous configuration. 2.5 System Cache In order to reduce communication latency for small TCP-IP connections, a simple LRU cache mechanism was implemented. Each store & forward thread accesses cache concurrently by using a mutex mechanism. Cached data is stored in the local file system and loaded in memory by means of a hash table. It is possible to configure: data TTL, maximum dimension of cached data and maximum cache dimension. By adopting a very small cache substantially improves the achieved performance. 2.6 Workers Architecture: Replication or Shared File System Workers are machines running a WWW server. There are two possible strategies for mantaining the content of these servers. The information published could be: Replicated: for example on a Unix System using the "rdist" [5] mechanism Shared: for example on an Unix System using NFS or AFS [6], on NT windows via SMB [7] Workers could be placed on the same LAN of Jamming.Net server or, improving performances, on a dedicated LAN. In such cases, the machine running Jamming.Net must act as a router for this auxiliary LAN. We can note that in these architectures Jamming.Net could act as a simple firewall. 3 Management Applet The whole server configuration is performed by using an applet written by using JDK1.0 and consequently running on most common browsers. It is possible to modify the inner configuration of Jamming.Net without stopping and restarting the server, by using the applet. We can see a subset of the applet's tabbed panels in fig. 3, they are used to group together server functionality: Workers' Control: this panel gives to the administrator the possibility to modify the workers' set known by Jamming.Net. Balance Technique: this panel allows the administrator to select the balancing policy used by Jamming.Net Fault Panel: this panel permits the administrator to select what to do if a worker fault is raised, what kind of sampling to perform (synchronous and/or asynchronous) and to choose sampling intervals. Cache Panel: this panel permits the administrator to configure and to manage system cache, to choose maximum occupation allowed in file system, maximum file dimension and data TTL. Interactions with server are performed by using three buttons. The Submit button submits to the server the choices selected in the applet. The Resync button loads from server the actual configuration. The Status button loads the server's status (number of faults and statistics). 4 Current State of Work and Future Work Currently, we have a working system that meets the requirements described above. Jamming.Net is used to share traffic on a test bed architecture composed of three workers running Linux on a bi-processor Pentium with Apache[8] used as HTTP server. We are testing what is the maximum throughput reached by the distributed architecture under different load conditions, different dimension of data requested and different balancing policy. Jamming.Net's code is available on request and the executable was accepted by Java Application Catalyst Italia (maintained by Sun Italy), Gamelan (maintained by Developer.com), JavaToys and JARS. We believe that this work could point out the general need to move from an unique information server to a scenario of distributed servers' set. 5 Acknowledgments I want to thank Dr. R. Perego and Dr. S. Orlando because they have suggested me some ideas behind Jamming.Net. I also thank Prof. G. Attardi, all Agent Group in CS Multimedia Lab, all the cool people belonging to Studio I-tech, Dr. Dario Lupi, Nico Tranquilli and Dr. Paola Padella. They all gave me critic review about this work. 6 References [1] A Distributed Testbed for National Information Provisioning, http://ircache.nlanr.net/Cache/ [2] Java Computing, http://www.sun.com/java/ [3] JavaSoft: Technical Articles, http://developer.javasoft.com/developer/technicalArticles/#network [4] ONE-IP:Techniques for Hosting a Service on a Cluster of Machines, http://www6.nttlabs.com/HyperNews/get/PAPER196.html [5] Distributing files remotely over TCP/IP, http://obgyn.umsmed.edu:457/NetAdminG/rdistN.rdist.html [6] Transarc AFS Documentation, http://eis.jpl.nasa.gov/afs/transarc.com/public/www/Public/Documentation/afs_doc.html [7] SMB Resources and Information, http://www.ns.aus.com/smb-res.html [8] Apache's site, http://www.apache.org

Antonio Gullì
gulli@di.unipi.it