http relay

wastingtime · 04-27-2009, 12:57 AM

I need to track call center agents' interaction with a website. Specifically, when an agent enters information into a certain page, I need to write that information to a database.

My solution has been to have the agents connect to an internal website that forwards all communication to the external website.

The simplest solution would be to write a custom relay application that inspects the communication between the agent's browser and the web server and looks for a POST containing the information that needs to be saved. However, there are two problems:

1. The external web site uses https, so all communication is encrypted. It would be OK for the agent to connect to the relay over http, since it is behind the corporate firewall, and have the relay connect to the external website over https. However, I think that adding http -> https translation to the custom relay application is a non-trivial work item.

2. The pages returned by the server contain absolute URLs and redirections back to that website. Thus, the user may connect to the relay but then be redirected back to the real website, or links in the returned content might point back to the website, so by the time the user gets to the page of interest they are no longer connected through the relay.
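To make problem #2 concrete, here is a minimal Python sketch of the rewriting a relay would have to do on both the response body and the redirect headers. The host names `www.example.com` and `relay.internal` are placeholders, not the real sites, and the regular expression is a simplification that ignores relative URLs and URLs assembled in JavaScript.

```python
import re

# Hypothetical hosts, used only for illustration.
EXTERNAL_HOST = "www.example.com"
RELAY = "http://relay.internal"

# Match the external origin over http or https, but not a longer host name
# that merely starts with it (e.g. www.example.com.evil.org).
_ORIGIN_RE = re.compile(
    r"https?://" + re.escape(EXTERNAL_HOST) + r"(?![\w.-])",
    re.IGNORECASE,
)


def rewrite_body(html: str) -> str:
    """Point absolute links in returned content at the relay."""
    return _ORIGIN_RE.sub(RELAY, html)


def rewrite_headers(headers: dict) -> dict:
    """Rewrite redirect targets so the browser stays on the relay."""
    out = dict(headers)
    for name in ("Location", "Content-Location"):
        if name in out:
            out[name] = _ORIGIN_RE.sub(RELAY, out[name])
    return out
```

The point of the sketch is that the body and the headers need separate handling: a redirect lives in the `Location` header, so rewriting only the page content is not enough to keep the user on the relay.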

To solve #1 I decided to use Apache as the basis for my relay, and have it deal with the http -> https translation.

Initially I tried to accomplish the relay functionality with Apache's proxy and filter modules. You can set Apache up as a reverse proxy and have it translate from http to https, and with an output filter you can inspect all returned content and translate URLs etc. so that they point to the relay. However, it turned out that redirects issued by the target server escape the filter and are passed back to the client by the Apache proxy unchanged.

So next I wrote a CGI that plugs into Apache and uses wget to fetch the content. The user accesses pages from the local server, which translates URLs in both the outgoing and incoming content. The CGI uses wget with https to access the external website. I also take advantage of wget's ability to cache pages in the local web server.

It works! The only problem is that it is slow. I think that is because wget is invoked once per request, so it closes the connection after each request and has to spend a long time negotiating the SSL (https) connection again every time. In contrast, when the browser connects directly to the external website over https, it keeps the connection open.
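For comparison, this is roughly what connection reuse looks like with Python's standard `http.client`. It is only a sketch under the assumption that the fetching code runs in a long-lived process rather than a per-request CGI, since a process that exits after each request cannot keep a connection open. The class name `UpstreamClient` and the host passed to it are made up for illustration.

```python
import http.client


class UpstreamClient:
    """Keeps one HTTPS connection open and reuses it across requests."""

    def __init__(self, host, connect=http.client.HTTPSConnection):
        self.host = host
        self._connect = connect  # injectable so the class can be tested offline
        self._conn = None

    def _connection(self):
        # Create the connection lazily; the TLS handshake happens on first use.
        if self._conn is None:
            self._conn = self._connect(self.host, timeout=30)
        return self._conn

    def fetch(self, method, path, body=None, headers=None):
        conn = self._connection()
        try:
            conn.request(method, path, body=body, headers=headers or {})
            resp = conn.getresponse()
            data = resp.read()  # read fully before the connection is reused
            return resp.status, resp.getheaders(), data
        except (http.client.HTTPException, OSError):
            # Drop the broken connection so the next call reconnects.
            conn.close()
            self._conn = None
            raise
```

With something like `client = UpstreamClient("www.example.com")` created once at startup, repeated `client.fetch("GET", "/page")` calls share one connection for as long as the server keeps it alive, which is the behaviour the browser gets and the per-request wget does not.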

So I'm looking for a better solution short of implementing a complete http->https relay.

acid_kewpie · 04-28-2009, 07:52 AM

If you have issues with the hostnames that the box contacts, how about modifying your DNS so that it tells your clients that the hostname in the URL is actually your local box? It's not horribly nice in the first instance, but not that uncommon, albeit usually in different architectures. Quite what you do with the request once it hits your box can still be an issue though, but your existing server-side SSL on Apache may still sort out your issues.

You could hopefully do something simpler without Apache too. You could just forward requests to the remote server via an SSL connection handled by a widget like stunnel. Here, you'd only need to get the clients (via DNS entries) to hit the port on the stunnel box where you are listening for the plaintext side of the stunnel connection, and then squirt it through. There's no mention of actual http inspection here at all though, as you could just use something like ngrep to recognise various bits of data straight off the wire without actually terminating the connection or messing with it in any other way.
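Whichever component ends up seeing the plaintext request, picking the form fields out of it is the easy part. A rough Python sketch, assuming the page submits an ordinary `application/x-www-form-urlencoded` form: the path `/account/update`, the field names, the `agent` identifier and the table layout are all invented for the example and would need to match the real page.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical page and field names; replace with the real ones.
TARGET_PATH = "/account/update"
FIELDS = ("customer_id", "notes")


def init_db(conn):
    """Create the table that captured submissions are written to."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions "
        "(agent TEXT, customer_id TEXT, notes TEXT)"
    )


def record_post(conn, agent, method, path, body):
    """Store the fields of interest; return True if the request matched."""
    if method != "POST" or path.split("?", 1)[0] != TARGET_PATH:
        return False
    form = parse_qs(body.decode("utf-8", errors="replace"),
                    keep_blank_values=True)
    # Take the first value of each field, or an empty string if it is absent.
    row = [form.get(name, [""])[0] for name in FIELDS]
    conn.execute("INSERT INTO submissions VALUES (?, ?, ?)", [agent, *row])
    conn.commit()
    return True
```

Here `agent` could be something as simple as the client IP address seen by the relay. SQLite is used only to keep the sketch self-contained; any database would do.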

More formally again, squid really should be able to do server side ssl too, but a brief google hasn't shown anything. It was from there that I drifted towards stunnel...

unSpawn · 04-28-2009, 06:05 PM

...another way to bridge HTTP to HTTPS could be DeleGate (a multi-purpose application-level gateway or proxy server); see the example "A universal TLS gateway by DeleGate".