LinuxQuestions.org

deesto 05-12-2009 08:37 AM

problem with RSS feed and reverse proxy changes
 
I'm hoping someone can help me decipher an overly-complex problem with a simple end: to display an RSS feed in a portlet on a CMS (Plone) site.

The scenario: three back-end servers are running these CMS sites. All three sit behind proxy servers, all running Apache with a virtual host for each back-end Plone server. On top of this is a firewall; to get through it to the outside, we need to specify an environment variable.

To load any RSS feed in the sites, I had to add an http_proxy variable and value to each site's configuration file (zope.conf) and restart the application. Once I did this, the RSS feeds would appear in the portlets. Without the proxy variable, any RSS feed I tried to load -- even internal feeds that didn't come from beyond the firewall -- failed, and the portlet just never appeared. In short, without the proxy setting, no feeds would load.
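For reference, a minimal sketch of how a Python process sees that setting. Zope and Plone run on Python, but whether the feed portlet relies on the standard library's proxy detection is an assumption here, and `describe_proxy_env` is just a name made up for illustration.

```python
import os
import urllib.request


def describe_proxy_env():
    """Print and return the proxy map Python derives from the environment."""
    # getproxies_environment() scans os.environ for variables named
    # <scheme>_proxy (http_proxy, https_proxy, no_proxy, ...).
    proxies = urllib.request.getproxies_environment()
    for scheme in ("http", "https", "no"):
        print(f"{scheme:<5} -> {proxies.get(scheme, '(not set)')}")
    return proxies
```

Calling `describe_proxy_env()` from the same environment the application starts in shows whether the variable actually reaches the process.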

The problem: the maintainers of the firewall just implemented a change to their Squids to prohibit HTTP OPTIONS in their reverse proxies. The claim is that this change should have no effect on services within the firewall. But the moment this change was made, our RSS feed portlets disappeared, and no other configuration change was made on our side.
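One way to check that claim from the CMS host would be to send the same URL through the proxy with different HTTP methods and compare the answers. This is only a sketch: `probe_methods` is a hypothetical helper, and nothing here confirms that the feed portlet itself ever sends OPTIONS.

```python
import urllib.error
import urllib.request


def probe_methods(url, opener, methods=("GET", "HEAD", "OPTIONS"), timeout=10):
    """Send one request per HTTP method and record the status or the error."""
    results = {}
    for method in methods:
        request = urllib.request.Request(url, method=method)
        try:
            with opener.open(request, timeout=timeout) as response:
                results[method] = response.status
        except urllib.error.HTTPError as err:
            # The proxy or server answered, but with an error status (e.g. 403, 405)
            results[method] = err.code
        except (urllib.error.URLError, OSError) as err:
            # No HTTP answer at all: DNS failure, refused connection, timeout
            results[method] = f"failed: {err}"
    return results
```

The opener would be built with something like `urllib.request.build_opener(urllib.request.ProxyHandler({"http": "<proxy URL>"}))`, using the same proxy value as in zope.conf. If GET returns 200 while OPTIONS returns an error status, the method filter is at least visible from inside the firewall.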

It has been suggested that the CMS must be going outside the firewall to resolve the host name of the RSS feed URL (even when that host is the CMS server itself) and pull in the feed, and that resolving the name via the systems' hosts files would fix it. However, the DNS servers we're using are also within the firewall. In addition, the sites are proxied on separate machines, which hold the virtual host definitions for each site, so resolving the site names locally on the back-end servers would break stuff.

I've been running tcpdumps, strace, and wireshark on one of the servers to watch the traffic, but it's encrypted (HTTPS) and running through an stunnel so I can't see much useful data, except that traffic is indeed going from the host to the proxy and back.

However, I did increase the feed refresh time on the Plone server to pull the RSS feed from the other server (on which the RSS feed is published) via its proxy every 1 minute. I see traffic going in and out to do this, but I do not see an access request on the proxy for the feed file, nor on its back-end server. I've also tried removing the http_proxy definition from the CMS configuration file and restarting the application, but this has no effect.

The question(s): how can I trace what is happening with regard to the RSS feed communication (nothing shows in the logs) and pinpoint exactly where it is going wrong?
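One possible approach, sketched here under assumptions: reproduce the fetch outside the CMS with Python's standard library and turn on its HTTP debug output, which prints the request and response headers on the client side before anything is encrypted. The function names are made up for illustration, and this does not necessarily match how the portlet fetches feeds internally.

```python
import urllib.request


def build_tracing_opener(proxy_url=None):
    """Build an opener that prints each request/response header exchange to stdout."""
    handlers = [
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    ]
    if proxy_url:
        # Route both schemes through the given proxy
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    else:
        # An empty mapping disables proxies, ignoring http_proxy in the environment
        handlers.append(urllib.request.ProxyHandler({}))
    return urllib.request.build_opener(*handlers)


def fetch_feed(url, opener, timeout=15):
    """Fetch the feed once and return (status, content type, body length)."""
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
        return response.status, response.headers.get("Content-Type"), len(body)
```

Running `fetch_feed` once with the proxy and once without, from the CMS host and as the same user as the application, would show which hop answers and with what status.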

sarin 05-13-2009 06:25 PM

Quote:

The question(s): how can I trace what is happening with regard to the RSS feed communication (nothing shows in the logs) and pinpoint exactly where it is going wrong?

Maybe you have already tried this, but it is all I can think of, since you don't have access to the proxy servers.

Get a privileged account on one of your external servers, or set up an external server that serves RSS. Ensure that you can access these feeds from some machine on the Internet. Run wireshark on the feed server. Then, from your internal machine, open the RSS page in a browser that has the proxy server enabled. Look for signs of traffic in the wireshark capture on the external server. If you can't see any sign of traffic, ping your proxy admins.
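If no existing feed server is handy, a throwaway one can be sketched in a few lines of Python. Everything below (the feed content, the port, the names) is a placeholder; the point is only that every request that arrives gets recorded with its method and path.

```python
import http.server
import threading

# A tiny placeholder feed, just enough for a client to parse
FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Probe feed</title><link>http://example.invalid/</link>
<description>Test feed for tracing requests</description>
</channel></rss>
"""

SEEN = []  # (method, path, client address) for every request received


class ProbeHandler(http.server.BaseHTTPRequestHandler):
    def _record(self):
        SEEN.append((self.command, self.path, self.client_address[0]))

    def do_GET(self):
        self._record()
        self.send_response(200)
        self.send_header("Content-Type", "application/rss+xml")
        self.send_header("Content-Length", str(len(FEED)))
        self.end_headers()
        self.wfile.write(FEED)

    def do_OPTIONS(self):
        # Recorded separately so a blocked OPTIONS shows up as a missing entry
        self._record()
        self.send_response(204)
        self.send_header("Allow", "GET, OPTIONS")
        self.end_headers()


def start_probe_server(host="0.0.0.0", port=8080):
    """Serve the placeholder feed in a background thread and return the server."""
    server = http.server.ThreadingHTTPServer((host, port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The handler also logs each request to stderr by default, so the feed server's own output can be read alongside a wireshark capture.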

deesto 05-14-2009 10:16 AM

Thanks Sarin. Actually that's part of the problem: as long as the proxy is set on the "internal" machine, it can load an external RSS feed normally within a browser. It's the CMS application that fails to properly pull in and load the RSS feed, and there aren't any log entries to indicate a problem.

Maybe important to mention this part again:
Quote:

I did increase the feed refresh time on the Plone server to pull the RSS feed from the other server (on which the RSS feed is published) via its proxy every 1 minute. I see traffic going in and out to do this, but I do not see an access request on the proxy for the feed file, nor on its back-end server.

So, what should happen here is every minute, the application on the "internal" machine makes a request to reload the RSS data from the feed. I can see it initiating this process every minute via an strace on the application process, and I see communication between the internal machine and its proxy (inside the firewall). What I don't see is any consistency in this request actually making it all the way through to its destination, which is the other "internal" server on which the RSS feed is published, or even to the proxy of that internal server. Instead of Apache's logs showing a request for the RSS feed once per minute, I see an erratic pattern of access between 0-3 times per day.
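To turn that observation into numbers, the access log can be counted per day and compared with the 1440 requests a one-minute refresh should produce. A rough sketch, assuming Apache's common or combined log format; the feed path passed in is whatever the real feed URL path is.

```python
import re
from collections import Counter

# Start of an Apache common/combined log line:
# host ident user [day/Mon/year:HH:MM:SS zone] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):\d{2}:\d{2}:\d{2} [^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def feed_hits_per_day(lines, feed_path):
    """Count requests whose path starts with feed_path, grouped by log date."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(feed_path):
            hits[match.group("day")] += 1
    return hits
```

Passing an open log file as `lines` gives a per-day count; a day with 2 hits against an expected 1440 is the kind of hard figure that is easier to hand to someone else than a description of the pattern.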

I think there is enough evidence here to convince me there is a problem, but not yet enough hard evidence to convince the other folks. I need more clues.


All times are GMT -5.