Sunday, September 22, 2013

Decoding the Jargon

As per the plan, we started reading up on Wikipedia and a number of other blogs to get a holistic picture of the system we were planning to patch.

We were already familiar with the concept of a proxy server (a forward proxy, to be specific) - it acts as an intermediary on the client side, fetching resources on the web on the client's behalf and returning them, which also makes caching possible.

A reverse proxy server is its counterpart on the server side: it takes requests from clients and forwards them to servers in the internal network. The client connects to the reverse proxy, which abstracts away the origin servers and can be designed to provide security, caching, or content compression, or be coupled with a load balancer.

BUT, contrary to popular belief, a reverse proxy server isn't synonymous with a load balancer!

A load balancer, as the name indicates, is a mechanism that enhances performance and reliability through server redundancy in a client-server architecture. It can be implemented in software as well as hardware.

Our focus is on the current load balancing mechanisms supported by NGINX. 

Clearly a load balancer is a must in a web-based system, as it directly affects throughput. NGINX provides two different mechanisms - round robin and IP hashing. The round robin technique distributes the workload across the servers in rotation. IP hash can be used to associate a particular IP address (or set of addresses) with a server.
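As a quick sketch of how the two mechanisms are selected (the server names here are placeholders, not our actual setup):

```nginx
# Default behaviour: requests are distributed round robin.
upstream backend {
    server app1.example.com;
    server app2.example.com;
}

# With ip_hash, requests from the same client IP
# are always sent to the same server.
upstream backend_sticky_ip {
    ip_hash;
    server app1.example.com;
    server app2.example.com;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}
```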

Associating IP addresses with backend servers isn't a very effective technique for stateful applications, where we want to associate each client session with a single server. That way, all of a client's activity and data is tracked and maintained locally by the same server instance. This concept of "sticky sessions" to introduce persistence is essential for scaling stateful web applications. IP hash falls short because multiple clients may arrive with the same IP address (courtesy of forward proxies).

Our search for existing implementations of this feature in NGINX led us to nginx-sticky-module, which uses cookies to store and identify user sessions. This module works well so long as cookies aren't disabled in the browser.
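For reference, enabling the module is a one-line addition to the upstream block (again, server names are illustrative; this assumes nginx was compiled with nginx-sticky-module):

```nginx
upstream backend {
    # "sticky" issues a cookie identifying the chosen backend,
    # so subsequent requests from that client return to the same server.
    sticky;
    server app1.example.com;
    server app2.example.com;
}
```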

So now we finally had a better-defined task - to find and implement an alternative that overcomes the limitations of the cookie-based approach.

Our answer (not exactly ours) - URL rewriting. Assuming the application appends a session id (or an equivalent) to the URL, our nginx module can read it and direct the request to the server to which the user session is mapped. If the request lacks a session id, indicating a new client session, a mapping is created.
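The module itself will live in nginx's C codebase, but the routing logic we have in mind can be sketched in a few lines of Python (backend names and the hashing choice are illustrative assumptions, not the final design):

```python
import hashlib

backends = ["app1:8080", "app2:8080"]  # hypothetical backend pool
session_map = {}  # session id -> backend; in-memory, single load balancer

def route(session_id):
    """Return the backend for this session, pinning new sessions to one server."""
    if session_id is None:
        # No session id in the URL: fall back to simple distribution;
        # the application embeds the id in URLs it generates afterwards.
        return backends[len(session_map) % len(backends)]
    if session_id not in session_map:
        # First request carrying this id: create the mapping.
        idx = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % len(backends)
        session_map[session_id] = backends[idx]
    return session_map[session_id]
```

Every later request with the same session id then lands on the same backend, which is exactly the persistence property we are after.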

For now we decided to restrict ourselves to a system with a single load balancer, to avoid the challenges of sharing the mapping information.

Coming up next - setting up nginx on our machines and the much dreaded anatomy of nginx. 

1 comment:

  1. Good progress. It is always fruitful and more productive to work with a focused task.
    To test out your work, try making the basic load balancer setup, so that when you come up with your enhancement you won't be spending time creating the test setup.

    Further, it is helpful to define in the beginning itself what kind of URL rewriting you are planning to use. Study and understand how Apache does this from observed behaviour (just using Wireshark) on the client side. Essentially, you need to be fully clear on what kind of URL rewriting you need to work with.
