HTTP relay - tracking HTTP requests
I need to track call center agents' interaction with a website: specifically, when an agent enters information into a certain page, I need to write that information to a database.
My solution has been to have the agents connect to an internal website that forwards all communication to the external website.
The simplest solution would be to write a custom relay application that inspects the communication between the agent's browser and the web server, looking for a POST with the information that needs to be saved (a minimal sketch of such a relay appears after the list below). However, there are two problems:
1. The external website uses HTTPS, so all communication is encrypted. It would be OK for the agent to connect to the relay over HTTP, since it is behind the corporate firewall, and have the relay connect to the external website over HTTPS. However, I think that adding HTTP -> HTTPS translation to the custom relay application is a non-trivial work item.
2. The pages returned by the server contain absolute URLs and redirects back to that website. The user may connect to the relay but get redirected back to the real website, or links in the returned content may point back to it, so by the time the user gets to the page of interest they are no longer going through the relay.
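To make the idea concrete, here is a minimal sketch of the kind of custom relay meant above. Everything in it is hypothetical: the hostname, the /agent/submit path, and save_to_db are placeholders, not the real site or schema, and it deliberately does not solve either problem above (it speaks plain HTTP to the browser and does no URL rewriting).

```python
# Minimal relay sketch: forward each request to the external site and
# capture POSTs to the page of interest. Hypothetical names throughout.
import http.server
import urllib.request

EXTERNAL = "https://external.example.com"  # placeholder for the real site

def save_to_db(form_data):
    # Placeholder: write the captured POST body to the tracking database.
    print("captured:", form_data)

class RelayHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.forward("GET", None)

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path == "/agent/submit":  # hypothetical page of interest
            save_to_db(body)
        self.forward("POST", body)

    def forward(self, method, body):
        # Fetch from the external site over HTTPS and pass the response
        # through. Note: urlopen follows redirects itself, and nothing here
        # rewrites URLs in the returned HTML -- exactly problem #2.
        req = urllib.request.Request(EXTERNAL + self.path, data=body, method=method)
        if body is not None:
            req.add_header("Content-Type",
                           self.headers.get("Content-Type",
                                            "application/x-www-form-urlencoded"))
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
            self.send_response(resp.status)
            for name, value in resp.getheaders():
                if name.lower() not in ("connection", "transfer-encoding"):
                    self.send_header(name, value)
            self.end_headers()
            self.wfile.write(data)

http.server.HTTPServer(("", 8080), RelayHandler).serve_forever()
```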
To solve #1 I decided to use Apache as the basis for my relay and have it deal with the HTTP -> HTTPS translation.
Initially I tried to accomplish the relay functionality with Apache's proxy and filter modules. You can set up Apache as a reverse proxy and have it translate from HTTP to HTTPS, and with an output filter you can inspect all returned content and rewrite URLs etc. so that they point to the relay (a configuration sketch follows). However, it turned out that redirects from the target server escape the filter and are passed back by the Apache proxy to the client.
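For reference, here is a sketch of that kind of configuration, with placeholder hostnames (external.example.com for the real site, relay.internal for the relay); it assumes mod_proxy, mod_ssl, mod_substitute, and mod_headers are loaded. ProxyPassReverse is the directive intended to fix up Location headers on redirects, but neither it nor the body filter catches redirects buried in meta refreshes or JavaScript, which may be how they slipped past the filter here.

```apache
# Let the proxy speak HTTPS to the backend.
SSLProxyEngine on
# Reverse proxy only, not an open forward proxy.
ProxyRequests Off

ProxyPass        / https://external.example.com/
# Rewrites Location headers on redirects from the backend.
ProxyPassReverse / https://external.example.com/

# Ask for uncompressed pages so the body filter can see the URLs.
RequestHeader unset Accept-Encoding

# Rewrite absolute URLs in returned HTML to point back at the relay.
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|https://external.example.com|http://relay.internal|in"
```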
So next I wrote a CGI that plugs into Apache and uses wget to fetch the content (sketched below). The user accesses pages through the local server, which rewrites URLs in both the outgoing and incoming content, and the CGI uses wget over HTTPS to access the external website. I also take advantage of wget's ability to cache pages in the local web server.
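A sketch of that CGI, with hypothetical names throughout (the hostnames and the PATH_INFO convention are placeholders, and the wget caching options are omitted):

```python
#!/usr/bin/env python3
# Sketch of a wget-based CGI relay. Hypothetical throughout: the hostnames
# and the PATH_INFO convention are placeholders.
import os
import subprocess
import sys

EXTERNAL = "https://external.example.com"         # the real site
RELAY = "http://relay.internal/cgi-bin/relay.py"  # this script

def fetch(path):
    # One wget process per request: each invocation opens and closes its
    # own HTTPS connection, which is the performance problem noted below.
    result = subprocess.run(["wget", "-q", "-O", "-", EXTERNAL + path],
                            check=True, stdout=subprocess.PIPE)
    return result.stdout

def rewrite(body):
    # Point absolute URLs in the returned page back at this relay so the
    # browser stays connected to it (problem #2).
    return body.replace(EXTERNAL.encode(), RELAY.encode())

page = os.environ.get("PATH_INFO", "/")  # e.g. /cgi-bin/relay.py/login -> /login
sys.stdout.write("Content-Type: text/html\r\n\r\n")
sys.stdout.flush()
sys.stdout.buffer.write(rewrite(fetch(page)))
```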
It works! The only problem is that it is slow. I think that is because wget closes the connection after each request (since it is invoked once per request) and has to negotiate the SSL (HTTPS) connection from scratch each time. In contrast, when the browser connects directly to the external website over HTTPS, it keeps the connection open.
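If that diagnosis is right, the handshake cost goes away once the fetching side is a long-running process holding one HTTPS connection open, instead of a fresh wget per request. A minimal illustration with Python's http.client (placeholder hostname, and assuming the server honors keep-alive):

```python
# Connection reuse: the TLS handshake happens once, on the first request;
# later requests ride the same open connection, as a browser's do.
# external.example.com is a placeholder.
import http.client

conn = http.client.HTTPSConnection("external.example.com")

def fetch(path):
    conn.request("GET", path)
    resp = conn.getresponse()
    return resp.read()  # read fully before issuing the next request

first = fetch("/login")              # pays the SSL negotiation cost
second = fetch("/page-of-interest")  # reuses the same connection
```

In practice that means moving from plain per-request CGI to something persistent (FastCGI, mod_wsgi, or a small standalone daemon), since a CGI process cannot hold a connection open across requests.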
So I'm looking for a better solution, short of implementing a complete HTTP -> HTTPS relay.