Wednesday, September 25, 2013

Setup Basic Load Balancing in Nginx


First, we must understand what load balancing is.

Wikipedia definition ->

"Load balancing is a computer networking method for distributing workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any one of the resources. Using multiple components with load balancing instead of a single component may increase reliability through redundancy. Load balancing is usually provided by dedicated software or hardware, such as a multilayer switch or a Domain Name System server process."


In layman's terms, with respect to nginx, it is a way in which multiple servers stand ready to serve a client's request, increasing reliability and reducing latency. Having a load balancer allows us to dynamically add and remove back-end servers easily and perform many more functions, listed here:
http://en.wikipedia.org/wiki/Load_balancing_(computing)#Load_balancer_features

This diagram depicts the scenario:


Source : http://indonetworksecurity.com/

Nginx is an amazing load balancer and is used by many sites (Wikipedia itself among them).

In this diagram the load balancer can be running nginx and the servers can be running apache, boa or nginx itself. 

So we shall now set up a very simple but effective load balancer using nginx (as the load balancer) and any 2 backend servers (which can run any webserver).

I have assumed you followed the previous post on setting up nginx. [Build and Setup Nginx from Source]

In the conf folder there is an nginx.conf file, and this is the main configuration file for nginx.

under the http directive ->
      create an upstream directive ->

upstream backendserver {
    server backendserver1;
    server backendserver2;
}

backendserver1/2 -> can be an IP/URL of a server residing on the same system as nginx (localhost/127.0.0.1) or on another networked system.

under the server directive inside http
      create a location directive ->


location / {
    proxy_pass http://backendserver;
}

Note that the same name, backendserver, is used in both proxy_pass and upstream.

Restart nginx if it is running, and a simple load balancer is now up, distributing requests in round-robin fashion. (Make sure the backend servers are up as well :P)
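Putting the two pieces together, the relevant part of nginx.conf looks roughly like this (the backend addresses and ports are placeholders for your own servers):

```nginx
http {
    # Pool of backend servers; addresses are illustrative placeholders.
    upstream backendserver {
        server 127.0.0.1:8081;
        server 127.0.0.1:8082;
    }

    server {
        listen 80;

        location / {
            # Forward every request to the pool, round-robin by default.
            proxy_pass http://backendserver;
        }
    }
}
```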

Advanced Configurations :

ip_hash; - specifies that the group should use a load balancing method where requests are distributed between servers based on client IP addresses.


weight - a per-server parameter specifying the proportion of traffic that should be directed to that server.

health_check; - sends "/" requests to each server in the backend group every five seconds. If any communication error or timeout occurs, or a proxied server responds with a status code other than 2xx or 3xx, the health check fails and the server is considered unhealthy. Client requests are not passed to unhealthy servers. (Note: active health checks via this directive are a commercial NGINX Plus feature.)

and many more.
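As a sketch, the upstream block could use these options like so (addresses and weights are illustrative; use one variant at a time):

```nginx
# Weighted round robin: the first server receives roughly 3 of every 4 requests.
upstream backendserver {
    server 127.0.0.1:8081 weight=3;
    server 127.0.0.1:8082;
}

# IP hash: requests from the same client address always go to the same server.
# upstream backendserver {
#     ip_hash;
#     server 127.0.0.1:8081;
#     server 127.0.0.1:8082;
# }
```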

Sources :
http://en.wikipedia.org/wiki/Load_balancing_(computing)
http://wiki.nginx.org/HttpUpstreamModule
http://nginx.org/en/docs/http/ngx_http_upstream_module.html
https://www.digitalocean.com/community/articles/how-to-set-up-nginx-load-balancing

Tuesday, September 24, 2013

Build And Setup Nginx From Source

We will now build nginx from source rather than installing pre-compiled versions (:P), and then host a directory with it.

We have to download these files to do so:

1) latest nginx source code (http://nginx.org)

2) latest PCRE source code (www.pcre.org)

3) latest zlib source code (http://www.zlib.net)


Extract all the files and then cd into the nginx-x.x.x folder.

Note:

PATHN = "the path to folder where nginx build will be stored"
PATHP = "the path to pcre folder"
PATHZ = "the path to zlib folder"

Now run

./configure --prefix=PATHN --with-pcre=PATHP --with-zlib=PATHZ && make && make install
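Filled in with hypothetical example paths (yours will differ), the configure step might look like the sketch below; printing the full command first makes path typos easy to spot before the long build starts:

```shell
# Hypothetical paths -- substitute your own extracted source directories.
PATHN="$HOME/nginx-build"      # install prefix for the finished build
PATHP="$HOME/src/pcre-8.33"    # extracted PCRE source tree
PATHZ="$HOME/src/zlib-1.2.8"   # extracted zlib source tree

# Print the full command before running it, to catch path typos early:
echo ./configure --prefix="$PATHN" --with-pcre="$PATHP" --with-zlib="$PATHZ"
```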


After some time of compiling and linking, the nginx binary will be built.


cd into PATHN (the nginx folder). We will have 4 folders, namely sbin, conf, logs and html:

sbin -> nginx built binary
conf -> contains the configuration files
logs -> contains various logs when server is running
html -> basic html files

We can now run the nginx binary and browse to 127.0.0.1 or localhost in any browser to get the default welcome page.
(nginx should be run with root privileges to bind to port 80, the default port.)




cd into the conf folder and open nginx.conf

go to the server part

change: listen 80 -> listen xxxx, where xxxx stands for any valid port

in location: change root html -> root "path of the folder you want to host"

also add -> autoindex on;   # if you want to enable automatic directory indexing

save, close the file.
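Putting those edits together, the server block might look like this (port 8080 and the /srv/mysite path are placeholder examples):

```nginx
server {
    listen       8080;              # any valid port you choose
    server_name  localhost;

    location / {
        root      /srv/mysite;      # path of the folder you want to host
        autoindex on;               # enable automatic directory index
    }
}
```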


restart nginx and check the new configuration you have set.

to restart nginx -> run killall nginx
                 -> run the nginx binary again

All done :)

Sunday, September 22, 2013

Decoding the Jargon

As per the plan, we started reading up on wiki and a number of other blogs to get the holistic picture of the system that we were planning to patch.

We were already familiar with the concept of a proxy server (a forward proxy server, to be specific) - it acts as an intermediary on the client side, fetching resources on the web and returning them to the client, aiding in implementing caching.

A reverse proxy server is its counterpart on the server side: it takes requests from the client and forwards them to the servers in the internal network. The client connects to the reverse proxy, which abstracts the origin servers and can be designed to provide security, caching, content compression, or be coupled with a load balancer.

BUT, contrary to popular belief, a reverse proxy server isn't synonymous with a load balancer!

A load balancer, as the name indicates, is a mechanism used to enhance performance and reliability through redundancy in the servers in a client-server architecture. It can be implemented in software as well as hardware. 

Our focus is on the current load balancing mechanisms supported by NGINX. 

Clearly a load balancer is a must in a web-based system, as it is directly associated with throughput. NGINX provides 2 different mechanisms - round robin and IP hashing. The round robin technique distributes the workload across servers in a round-robin fashion. IP hash can be used to associate a particular IP/set of IP addresses with a server.

Associating IP addresses with backend servers doesn't prove to be a very efficient technique in stateful applications, where we look to associate client sessions with a single server. With such an association, all the client activity and data is tracked and maintained locally by the same server instance. This concept of having "sticky sessions" to introduce persistence is essential for scaling stateful web applications. IP hash fails here, as multiple clients may come in with the same IP address (courtesy of forward proxies).

Our search for existing implementations of this feature in NGINX led to nginx-sticky-module, which uses cookies to store and identify user sessions. This module is a win-win so long as the cookies aren't disabled by the browser. 

So now we finally had a more well defined task - to look for and implement an alternative solution to counter the limitations of the cookie based approach.

Our answer (not exactly ours) - URL rewriting. Assuming that the application appends a session id or an equivalent to the URL, our nginx module can read it and direct the request to the server to which the user session is mapped. A mapping is created in case the request lacks a session id, indicating a new client session.

For now we decided to restrict ourselves to a system with a single load balancer, to avoid the challenges of sharing the mapping information.

Coming up next - setting up nginx on our machines and the much dreaded anatomy of nginx. 





Intro.

Hi! We are a team of 3 people working on a college project for a course on the Architecture of Open Source Technologies #PESIT #CS. We are extremely ambitious about implementing session based load balancing in NGINX (engine - X), a popular open source web server. A blog will help us gloat at the end of this course and might accidentally help someone somewhere. #pj

Engine - X ?

Wiki describes NGINX as an open source reverse proxy server, load balancer, HTTP cache and web server. #techjargon #kills

So in the pilot week, we decided to read up about load balancing, web servers and session based load balancing in general before getting our hands dirty with the nitty-gritties of NGINX.