
Set Up a Reverse Proxy in Apache

Read on to learn how to set up and configure a reverse proxy.

The Apache HTTP Server, more commonly known simply as Apache, is one of the most widely used Web servers. It is a common choice for front-facing servers at enterprises and start-ups alike. It is developed by the Apache Software Foundation and was first released in 1995. At the time of writing, the most recent stable version was 2.4.7, released in November 2013. Apache is open source software and is available for most of the major operating systems, including UNIX, Linux, OS X and Microsoft Windows. Let’s now explore how to install the Apache HTTP server and configure it as a reverse proxy.


Before proceeding to the actual set-up and configuration, I would like to explain what a reverse proxy is. According to Wikipedia, “A reverse proxy is a type of proxy server that retrieves resources on behalf of a client from one or more (internal) servers.” The resources may actually be fetched from several different internal servers, but the client is not aware of this. To the client, the resources appear to come from the proxy server itself. Figure 1 illustrates this scenario. Usually this job is done by a dedicated proxy server. When a dedicated proxy server is not available, the Apache HTTP server can act as the reverse proxy instead.
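To make the idea concrete, here is a minimal sketch of what a reverse proxy does. It is written with Python's standard library rather than Apache and is illustrative only. The path prefixes /app1/ and /app2/ and the routing table are hypothetical. The sketch handles GET requests only and copies back just the status, body and Content-Type. A real proxy such as Apache also forwards headers, supports other methods and streams large responses.

```python
import http.server
import urllib.error
import urllib.request

# Hypothetical routing table: public path prefix -> internal server.
BACKENDS = {
    "/app1/": "http://10.0.0.1:8080",
    "/app2/": "http://10.0.0.2:3000",
}


def pick_backend(path, backends=BACKENDS):
    """Return the internal URL that should serve `path`, or None."""
    for prefix, base in backends.items():
        if path.startswith(prefix):
            return base + path
    return None


class ReverseProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        target = pick_backend(self.path)
        if target is None:
            self.send_error(404, "No internal server for this path")
            return
        try:
            # The proxy, not the client, talks to the internal server.
            with urllib.request.urlopen(target, timeout=10) as upstream:
                status, body = upstream.status, upstream.read()
                ctype = upstream.headers.get("Content-Type", "application/octet-stream")
        except urllib.error.HTTPError as err:
            # Pass backend error statuses (404, 500, ...) through unchanged.
            status, body = err.code, err.read()
            ctype = err.headers.get("Content-Type", "text/plain")
        except OSError:
            # Connection refused, timeout, DNS failure and similar.
            self.send_error(502, "Internal server unreachable")
            return
        # The client only ever sees the proxy's own address.
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=6500):
    """Serve on the given public port until interrupted (blocks)."""
    http.server.ThreadingHTTPServer(("", port), ReverseProxyHandler).serve_forever()
```

Calling run(6500) would start the proxy on port 6500. A request for /app1/index.html would then be fetched from http://10.0.0.1:8080/app1/index.html, while the client only ever talks to the proxy.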

Reverse proxies come with a lot of benefits. According to Wikipedia, the following are the major advantages of using reverse proxies:

1. Reverse proxies can hide the existence and characteristics of an origin server or servers.
2. Application firewall features can protect against common Web-based attacks. Without a reverse proxy, removing malware or initiating takedowns, for example, can become difficult.
3. In the case of secure websites, a Web server may not perform SSL encryption itself, but instead offload the task to a reverse proxy that may be equipped with SSL acceleration hardware. This arrangement is known as SSL termination.
4. A reverse proxy can distribute the load from incoming requests to several servers, with each server serving its own application area. When reverse proxying in front of Web servers, the reverse proxy may have to rewrite the URL in each incoming request to match the internal location of the requested resource.
5. A reverse proxy can reduce the load on its origin servers by caching static content, as well as dynamic content. This is also known as Web acceleration. Proxy caches of this sort can often satisfy a considerable number of website requests, greatly reducing the load on the origin server(s).
6. A reverse proxy can optimise content by compressing it in order to speed up loading times.
7. In a technique known as ‘spoon feeding’, a dynamically generated page can be produced all at once and served to the reverse proxy, which can then return it to the client a little at a time. The program that generates the page need not remain open, thus releasing server resources during the possibly extended time the client requires to complete the transfer.
8. Reverse proxies can be used whenever multiple Web servers need to be accessible via a single public IP address. The Web servers listen on different ports on the same machine with the same local IP address or, possibly, on different machines with different local IP addresses altogether. The reverse proxy analyses each incoming request and delivers it to the right server within the local area network.
9. Reverse proxies can perform A/B testing and multivariate testing without placing JavaScript tags or code into pages.
10. The reverse proxy concept is also used in search engine marketing to automatically embed a destination website with usage tracking code that can be used for campaign reporting or campaign optimisation. This is generally considered bad practice.

Let us now see how to achieve points 4 and 8 from the list above.

Our primary objective is to configure Apache so that it can serve multiple application servers through a single public IP address. We will also look at how to rewrite URLs so that the client never learns about any internal server.

In this article, I will use Fedora as the operating system on my server, but similar functionality can be achieved on any of the OSs that Apache supports. First, to install the Apache HTTP server, run the following command as the root user or with sudo:

yum install httpd

Whether you use the ‘yum’ command depends on the host OS. Other distributions have their own package managers, and binaries are also available for direct installation. ‘httpd’ stands for Hypertext Transfer Protocol Daemon. The Apache Web server implements HTTP and always runs as a daemon (a background process), so the package is named httpd.

Now that the server is installed, let’s just get it going:

/etc/init.d/httpd start
## OR ##
service httpd start

On newer, systemd-based Fedora releases, the equivalent command is systemctl start httpd.

The default port of Apache is 80, and it is set in the httpd.conf file. On Fedora, browse to the following location:

cd /etc/httpd/conf

Figure 1: Reverse proxy set-up: Internet → Proxy Server → Web/Application Server(s) on the Internal Network

Here you will find httpd.conf, Apache’s main configuration file, where all the relevant settings live. On other operating systems the file is in a different location, but it is still httpd.conf that you need to look for.

First, let’s check which port is configured in this file. Look for a line similar to the following:

Listen 80

This is the port on which Apache listens by default. If a different port is open on your public IP, change the number here. For example, to use port 6500, change the number next to ‘Listen’:

Listen 6500
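If you manage several servers, it can help to confirm programmatically which ports a configuration declares. The following is a small, hypothetical helper that is not part of Apache. It extracts the port numbers from the active Listen directives in httpd.conf text. It assumes the common forms Listen 80, Listen 192.0.2.1:8080 and Listen [2001:db8::1]:80. It skips commented-out lines, but it does not follow Include'd files.

```python
import re

# An active "Listen" directive at the start of a line. Lines beginning with "#"
# never match, and "ListenBackLog" is excluded because whitespace must follow "Listen".
LISTEN_RE = re.compile(r"^[ \t]*Listen[ \t]+(\S+)", re.IGNORECASE | re.MULTILINE)


def listen_ports(conf_text):
    """Return the port numbers named by Listen directives in httpd.conf text."""
    ports = []
    for address in LISTEN_RE.findall(conf_text):
        # "6500" -> "6500", "192.0.2.1:8080" -> "8080", "[2001:db8::1]:443" -> "443"
        ports.append(int(address.rsplit(":", 1)[-1]))
    return ports
```

For example, listen_ports(open("/etc/httpd/conf/httpd.conf").read()) would return [6500] after the change above, provided that is the only Listen line in the file.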

Next, restart Apache so that the change takes effect. Then check that the server is actually up and running by typing the following command in the terminal:

netstat -tulpn | grep :portNumber

…where portNumber is the port number that you have set in the httpd.conf file. If the server is running, you will see output similar to the following:

tcp6       0      0 :::6500                 :::*                    LISTEN      61974/httpd

…where 6500 is the port number and 61974 is the PID (process ID) for the httpd daemon.
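netstat comes from the net-tools package, which minimal installations do not always include. As an alternative, the hypothetical helper below simply tries to open a TCP connection to the port. The two checks differ. netstat shows that a socket is in the LISTEN state on the server itself. This check shows that a connection is actually accepted from wherever the script runs. Running it from another machine therefore also tests any firewall in between.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unresolvable host names.
        return False
```

For example, is_listening("a.b.c.d", 6500), run from a machine outside your network, checks that the public port is reachable.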

Now that the server is up and running, let’s set up the reverse proxy and URL rewriting. First, ensure that the following line in the conf file is not commented out:

Include conf.modules.d/*.conf
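The rules later in this article use the [P] flag. That flag needs mod_proxy and, for http:// targets, mod_proxy_http, in addition to mod_rewrite. The running server can list its loaded modules with httpd -M. As a rough static check, the hypothetical helper below scans configuration text for LoadModule lines. It reports which of the three module names (rewrite_module, proxy_module, proxy_http_module) are missing. It does not evaluate <IfModule> conditions or Include directives, so treat it as a sketch.

```python
import re

# Module names the [P]-flag rewrite rules rely on.
REQUIRED = {"rewrite_module", "proxy_module", "proxy_http_module"}

# An active LoadModule line; commented-out ("#LoadModule ...") lines do not match.
LOADMODULE_RE = re.compile(r"^[ \t]*LoadModule[ \t]+(\w+)", re.MULTILINE)


def missing_modules(conf_texts, required=REQUIRED):
    """Return the required module names not loaded by any of the given texts."""
    loaded = set()
    for text in conf_texts:
        loaded.update(LOADMODULE_RE.findall(text))
    return required - loaded
```

For example, pass it the contents of every file matched by glob.glob("/etc/httpd/conf.modules.d/*.conf"). An empty set means all three modules are loaded by those files.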

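This line pulls in the module configuration files under conf.modules.d. On Fedora, these normally load mod_rewrite as well as mod_proxy and mod_proxy_http, which the [P] flag used below depends on. Next, we need to create a new <IfModule> block for the mod_rewrite.c module. Let’s add it at the bottom of the file so that, if we run into an error, we know where to look: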

<IfModule mod_rewrite.c>
    RewriteEngine On
</IfModule>

This block is where we will write all our rewrite rules. For testing purposes, start any application servers that you have on your internal servers. Let’s assume that our public address is a.b.c.d, port 6500, and that we have two application servers. One runs Tomcat on 10.0.0.1, port 8080, and the other runs a nodejs server on 10.0.0.2, port 3000. To add rules for these two application servers, modify the above block as follows:

<IfModule mod_rewrite.c>
    RewriteEngine On

    # Tomcat rewriting rules
    # Matching pattern for sampleWebAppOne, proxied to the internal Tomcat server on port 8080
    RewriteRule ^/sampleWebAppOne/(.*)$ http://10.0.0.1:8080/sampleWebAppOne/$1 [L,P]

    # Nodejs rewriting rules
    # Matching pattern for sampleWebAppTwo, proxied to the internal nodejs server on port 3000
    RewriteRule ^/sampleWebAppTwo/(.*)$ http://10.0.0.2:3000/sampleWebAppTwo/$1 [L,P]
</IfModule>

mod_rewrite is essentially regular expression matching. Here, we match each request’s path against patterns and, depending on which pattern matches, proxy the request to one of the internal servers. For sampleWebAppOne, a client sends a request like the following:

http://a.b.c.d:6500/sampleWebAppOne/someResource

This request is translated according to the pattern written above. The regular expression ^/sampleWebAppOne/(.*)$ matches any request path that begins with /sampleWebAppOne/. The parenthesised group (.*) captures all the characters that follow. If the pattern matches, the captured text is substituted for $1 in the target URL and the request is passed on to the internal server. So the above request is translated to:

http://10.0.0.1:8080/sampleWebAppOne/someResource

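The nodejs rule works in the same way. Specific flags can also be used with these rules, and ours use [L,P]. The L (last) flag means that if the rule matches, no further rules are processed. The P (proxy) flag hands the substituted URL to mod_proxy. mod_proxy then fetches the resource from the internal server and returns it to the client as if it were local. This is not an HTTP redirect, so the client never sees the internal address. The P flag already implies L, so [P] alone would behave the same. This is how we achieve the reverse proxy configuration: the remote content is mapped into the namespace of the local server without exposing the internal servers to the public. Note that RewriteRule does not alter headers such as Location in the internal servers’ responses. If an application issues redirects, add a ProxyPassReverse directive for each internal server so that those headers are rewritten to the public address.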
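To see exactly how the two rules translate request paths, the matching can be emulated in a few lines of Python. This is a sketch under stated assumptions. Python's re module stands in for Apache's PCRE engine, and the two behave the same for these simple patterns. Apache’s $1 back-references are converted to Python’s \g<1> syntax, and only the URL path is considered, not the query string. Returning at the first match mirrors the L flag.

```python
import re

# The two RewriteRule lines from httpd.conf, as (pattern, substitution) pairs.
RULES = [
    (r"^/sampleWebAppOne/(.*)$", "http://10.0.0.1:8080/sampleWebAppOne/$1"),
    (r"^/sampleWebAppTwo/(.*)$", "http://10.0.0.2:3000/sampleWebAppTwo/$1"),
]


def apply_rules(path, rules=RULES):
    """Return the internal URL a request path is proxied to, or None."""
    for pattern, substitution in rules:
        match = re.match(pattern, path)
        if match:
            # Convert Apache back-references ($1, $2, ...) to Python's \g<1> form.
            template = re.sub(r"\$(\d)", r"\\g<\1>", substitution)
            # Like [L]: the first matching rule wins and no later rule is tried.
            return match.expand(template)
    return None
```

A request for /sampleWebAppOne without the trailing slash matches neither pattern, in Apache as well as here, because both patterns require the slash after the application name.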

Finally, to test these, first restart the Apache server:

/etc/init.d/httpd restart
## OR ##
service httpd restart

On newer, systemd-based Fedora releases, the equivalent command is systemctl restart httpd.

If everything is fine, the server will restart without any errors. To confirm that the server is up and running, type the following command again:

netstat -tulpn | grep :portNumber

Next, try accessing resources on the Tomcat server or the nodejs server using URLs with the public IP address and port. If the rewriting works, you will be able to access those resources without the address in the browser’s address bar changing. This means you have configured a reverse proxy with URL rewriting that serves requests through a single public IP address, without exposing the internal servers.
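Checking the address bar by hand works, but a scripted check can also catch a subtler leak. If an internal application answers with a redirect, its Location header may still point at the internal address. The hypothetical helper below sends one request with Python’s http.client, which does not follow redirects. It reports the status code and whether a Location header names one of the internal hosts. The internal addresses are the example values used in this article.

```python
import http.client
from urllib.parse import urlsplit

# Example internal addresses from this article's set-up.
INTERNAL_HOSTS = {"10.0.0.1", "10.0.0.2"}


def probe(url, internal_hosts=INTERNAL_HOSTS, timeout=10):
    """Send one GET to url without following redirects.

    Returns (status, leaked). leaked is True when the response carries a
    Location header whose host is one of the internal servers.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        location = resp.getheader("Location")
        leaked = location is not None and urlsplit(location).hostname in internal_hosts
        return resp.status, leaked
    finally:
        conn.close()
```

For example, probe("http://a.b.c.d:6500/sampleWebAppOne/someResource") should report a 2xx status and False when the resource is served through the proxy without leaking an internal address.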

By: Manit Singh Kalsi

The author works as a mobile evangelist in the Mobility Center of Excellence at Genpact Headstrong Capital Markets. He is a Java and JavaScript developer who enjoys exploring new technologies and frameworks. When not coding, he is either strumming his guitar or playing video games. Follow him @manitsinghkalsi
