PDF Fast Web View Won't Work!

paulo.monk · 07-20-2009, 12:15 PM

I don't know if this is the right place to ask this, but since it's a forum for programming questions without restrictions, here it is (sorry if it's not; I'm more of a forum reader than a writer).

Does anyone know how to make PDF Fast Web View work properly?

I'm developing a Java EE 6 web app that opens PDF documents inline. I've run into the same problem as many developers working in this area: the size of these documents. As PDFs get larger, it becomes almost impossible to work with them statically (waiting for the whole document to download and then loading it). Most of the time, Firefox 3 and 3.5 or IE 7 and 8, with the latest Adobe plugin, return a message saying that the file may be corrupt, or something strange like "This document doesn't begin with %PDF" or similar.

My setup is Tomcat 6 and Apache 2.2 (forwarding requests to Tomcat through mod_jk). The server runs Debian Lenny, and my workstations, where almost all tests were run, run Slackware 12.2 with the same versions of Apache and Tomcat.

I found this post showing how to set up a Java file servlet that supports resuming (with a code example), handling all the related HTTP headers properly (ETag, Content-Type, Content-Disposition, Accept-Ranges, etc.). For testing purposes I set up this servlet without any modifications, and this damn Fast Web View still doesn't work. The document starts to load dynamically, but then it freezes the tab and sometimes the whole browser. At least the servlet's resuming capability worked.
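The heart of a resumable file servlet like the one mentioned is honoring the `Range` request header. The linked post's code isn't reproduced in this thread, so here is a minimal sketch, assuming only the single-range form `bytes=start-end` (plus the open-ended `bytes=500-` and suffix `bytes=-500` variants defined by HTTP/1.1). `RangeParser` and `ByteRange` are hypothetical names, not taken from that servlet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses a single-range HTTP "Range: bytes=..." header. */
final class RangeParser {

    /** Inclusive byte range [start, end] of the requested resource. */
    record ByteRange(long start, long end) {
        long length() { return end - start + 1; }
    }

    // Only the single-range form; multi-range ("bytes=0-99,200-299") is rejected.
    private static final Pattern SINGLE_RANGE = Pattern.compile("bytes=(\\d*)-(\\d*)");

    /**
     * Returns the requested range clamped to the file, or null when the header is
     * absent, malformed or unsatisfiable. In a servlet, null would usually mean
     * "send the whole file with 200" (or 416 if the range starts past the end).
     */
    static ByteRange parse(String header, long fileLength) {
        if (header == null) return null;
        Matcher m = SINGLE_RANGE.matcher(header.trim());
        if (!m.matches()) return null;
        String first = m.group(1), last = m.group(2);
        try {
            long start, end;
            if (first.isEmpty()) {                    // suffix form: "bytes=-500" = last 500 bytes
                if (last.isEmpty()) return null;      // "bytes=-" is invalid
                long suffix = Long.parseLong(last);
                if (suffix == 0) return null;
                start = Math.max(0, fileLength - suffix);
                end = fileLength - 1;
            } else {                                  // "bytes=500-" or "bytes=500-999"
                start = Long.parseLong(first);
                end = last.isEmpty() ? fileLength - 1
                                     : Math.min(Long.parseLong(last), fileLength - 1);
            }
            return (start <= end && start < fileLength) ? new ByteRange(start, end) : null;
        } catch (NumberFormatException tooLarge) {    // digits overflow a long
            return null;
        }
    }
}
```

For example, `RangeParser.parse("bytes=900-5000", 1000)` is clamped to bytes 900 to 999, since a client may ask for more than the file contains.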

For testing, I downloaded a relatively large PDF from a site where F.W.V. works, and optimized another one from my own domain with Ghostscript. When I open either of them in Adobe Reader 9.1, the document properties say that Fast Web View is enabled.

Desperate, I tried some lower-level analysis, sniffing the requests and replies with Wireshark, and discovered a possible TCP checksum offload problem. I followed the suggestion from the Wireshark Wiki, but nothing changed (at least from the browser's perspective; the behavior was the same as before). The "only" difference was that the checksum problem no longer appeared in Wireshark's capture log.

I've googled all around and looked through all kinds of forums, and haven't found any related problem, at least not this deep: in most cases, the problem was wrong handling of HTTP headers. But since resumable downloads work with this file servlet code, I really don't know what's happening.

If someone has a clue, please give me a hand! I'm really lost, and almost a month has gone by on this.

I didn't post all the information I've collected from my tests because it's HUGE. Oo Just ask for anything specific and I'll send it right away!

Thanks in advance!

Guttorm · 07-21-2009, 03:50 AM

Hi

I doubt it has anything to do with the HTTP headers. Fast web view in PDF means that the order of things inside the PDF file is arranged so that the things needed to display the first page are at the beginning of the file. Normally things like pictures and fonts can appear anywhere in the file, so to render the first page, the reader may have to read the entire file because some picture on the first page is at the end of the PDF file.

Fast web view takes care of that. So you don't need any special HTTP headers, and it should not need resume, Accept-Ranges, etc. But I think modern readers can make use of those (range requests) when the PDF file is not optimized for fast web view.
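One way to double-check what Adobe Reader's document properties report is to look at the start of the file itself: in a linearized ("Fast Web View") PDF, the first object is the linearization parameter dictionary, which contains the `/Linearized` key and must lie within the first 1024 bytes. Below is a rough heuristic sketch, assuming a plain substring search is good enough; it does not validate the hint tables or the rest of the linearized structure. `LinearizationCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical heuristic: does this PDF look linearized ("Fast Web View")? */
final class LinearizationCheck {

    /** Checks the first (up to) 1024 bytes for the PDF header and a /Linearized key. */
    static boolean looksLinearized(byte[] head) {
        int n = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to exactly one char, so binary bytes can't break the search.
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        return prefix.startsWith("%PDF-") && prefix.contains("/Linearized");
    }

    /** Reads only the first 1024 bytes of the file on disk. */
    static boolean looksLinearized(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return looksLinearized(in.readNBytes(1024));
        }
    }
}
```

Running this on both the downloaded sample and the Ghostscript-optimized file should agree with what Reader shows; if it doesn't, the file on the server may not be the one you think it is.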

When you get warnings like "This document doesn't begin with %PDF", it means just that. A valid PDF file should always start with %PDF, so I suspect something is wrong with how you handle the output, maybe in the HTTP headers? I would play with wget -S to see all the headers, and then check whether the downloaded PDF file is the same as the file on the server.
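To tell a truncated or garbled transfer apart from one with extra bytes in front of the document (a stray newline, a byte order mark, or an HTML error page written before the PDF), it helps to check where `%PDF-` actually appears in what the client received. A minimal sketch, assuming the received bytes are available in memory (e.g. read from a file saved by wget); `PdfHeaderCheck.headerOffset` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: locates the "%PDF-" header in received bytes. */
final class PdfHeaderCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the offset of the first "%PDF-" within the first `window` bytes,
     * or -1 if it isn't there. 0 means the data starts correctly; a positive
     * value means something was prepended before the PDF.
     */
    static int headerOffset(byte[] data, int window) {
        int limit = Math.min(data.length, window) - MAGIC.length;
        outer:
        for (int i = 0; i <= limit; i++) {
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[i + j] != MAGIC[j]) continue outer;   // mismatch: try next offset
            }
            return i;
        }
        return -1;
    }
}
```

A result of -1 on a response that should have been a PDF usually means the body is something else entirely (an error page, or the wrong part of a byte-range response).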

paulo.monk · 07-21-2009, 07:17 AM

I partly agree with you. PDF linearization reorganizes the PDF so that its bytes are in order. This way, as I send bytes from first to last, the file can be displayed even if only part of it has been downloaded so far.

But you do have to handle these headers. The goal is to let the file be requested in parts. I sniffed a connection to another site where PDF Fast Web View works, and the HTTP transaction is made up of many requests, each one asking for a specific byte range. I don't know if there's a way to do it by just opening a stream and sending all the bytes at once, but I see the process as similar to a resumable file servlet. Anyway, some extra controls shouldn't block the process. I ran tests without this header handling before, with bad results (same behavior, browser freezing).
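Each of those byte-range requests has to be answered with `206 Partial Content`, a `Content-Range: bytes start-end/total` header, a matching `Content-Length`, and exactly those bytes, no more and no fewer. A minimal sketch of that part using only the JDK, assuming the range was already parsed and validated; in a servlet, `out` would be `response.getOutputStream()` after `response.setStatus(HttpServletResponse.SC_PARTIAL_CONTENT)` and the headers are set. `PartialContent` is a hypothetical name, not the linked servlet's code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

/** Hypothetical helper for answering a single-range request. */
final class PartialContent {

    /** Value of the Content-Range header for a 206 response, e.g. "bytes 0-99/1000". */
    static String contentRange(long start, long end, long total) {
        return "bytes " + start + "-" + end + "/" + total;
    }

    /** Copies bytes [start, end] (inclusive) of the file to out; Content-Length is end - start + 1. */
    static void copyRange(RandomAccessFile file, long start, long end, OutputStream out)
            throws IOException {
        byte[] buf = new byte[8192];
        file.seek(start);
        long remaining = end - start + 1;
        while (remaining > 0) {
            int n = file.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) break;            // file shorter than expected: stop rather than loop
            out.write(buf, 0, n);
            remaining -= n;
        }
        out.flush();
    }
}
```

Sending even one extra byte (or a different total in `Content-Range`) is enough to make a reader assembling the document from pieces misplace data, which is worth ruling out before blaming the network.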

I thought it could be some Apache/Tomcat configuration, but I didn't find anything related.

Any other ideas?

paulo.monk · 07-21-2009, 07:27 AM

I forgot to say.

The message you mentioned, "This document doesn't begin with %PDF", appears even when I'm trying to open a valid PDF.

For example, if I try to open a relatively large PDF inline ten times, 5 attempts return this error and the other 5 open the file properly. It's more of a transfer/connection issue than an error caused by an incompatible format.

Thanks for the reply.
Any other suggestions?

monk

Guttorm · 07-21-2009, 07:46 AM

Hi again

Well, you only have to handle these headers if the PDF files are not optimized for the web. I've also seen the browser hang and weird things happen, but that was because of corrupt PDF files or simply sending the wrong files.

Quote:

Most of the time, Firefox 3 and 3.5 or IE 7 and 8, with the latest Adobe plugin, return a message saying that the file may be corrupt, or something strange like "This document doesn't begin with %PDF" or similar.

This is why I suspect there is a bug in the code that sends the files. Did you try wget -S and compare your servlet against a regular Apache server sending the same file?

paulo.monk · 07-21-2009, 09:02 AM

The PDF I'm using for testing is the one from the website where PDF Fast Web View works properly. I've tried sending it in two different ways: using the code from the file servlet at the URL I mentioned earlier, and in a simpler way, as you suggested, just getting the HttpServletResponse's output stream and sending the document to the client without worrying about byte ranges, with the content type set to "application/pdf" instead of "multipart/byteranges" (used by the other method). Both ways, I got the same behavior.

I haven't tried wget -S yet, but I'm going to try it right now. I've compared my packet flow with that of the website mentioned earlier (where FWV works). At first glance, the server is responding the same way, but the browser responds... I mean, doesn't. Oo

What do you mean by a "regular apache server"?

Guttorm · 07-21-2009, 09:32 AM

Hi

I just meant putting a PDF file on a regular Apache server, so there's no Tomcat, Java or anything else involved. Does it still behave the same? I just tried downloading a PDF from my computer and got these HTTP headers:

Quote:

HTTP/1.1 200 OK
Date: Tue, 21 Jul 2009 14:17:42 GMT
Server: Apache/2.2.8 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.4-2ubuntu5.6 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g
Last-Modified: Tue, 21 Jul 2009 14:17:26 GMT
ETag: "3e361f-2844830-46f37e9f6e980"
Accept-Ranges: bytes
Content-Length: 42223664
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/pdf

wget -S showed them; what does it show when you download through Tomcat?

Also, try a diff on the downloaded PDF files: is anything appended or prepended to them? I guess packet sniffing could also show the difference, but wget is probably a bit easier.
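The diff can also be done programmatically, which makes it easy to see whether the servlet's output is truncated, has bytes prepended, or has bytes appended. A minimal sketch, assuming Java 12+ for `Files.mismatch` (which returns -1 for identical files, or the size of the smaller file when one is a prefix of the other); `CompareDownload` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: compares the file on the server with a downloaded copy. */
final class CompareDownload {

    static String describe(Path original, Path downloaded) throws IOException {
        long pos = Files.mismatch(original, downloaded);   // -1 means byte-for-byte identical
        if (pos == -1) return "identical";
        long a = Files.size(original), b = Files.size(downloaded);
        if (pos == Math.min(a, b)) {                        // one file is a prefix of the other
            return a > b ? "downloaded copy is truncated at byte " + pos
                         : "downloaded copy has " + (b - a) + " extra bytes appended";
        }
        return "first difference at byte " + pos
                + " (original " + a + " bytes, downloaded " + b + " bytes)";
    }
}
```

A first difference at byte 0 with a larger downloaded file points at something written before the PDF, which matches the "doesn't begin with %PDF" symptom.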

paulo.monk · 07-21-2009, 11:45 AM

That was a GREAT suggestion man!

Right now I'm almost banging my head on the table for not thinking of this before! Oo

Serving it with Apache only, without Tomcat, Struts, etc., it just worked!

Now I just have to figure out why Tomcat/Struts is blocking it. Maybe PDF FWV won't work if the requests are intercepted by Tomcat/Struts. I'm going to try to make it work inside Struts, but at least I have an alternative now.

Thanks A LOT, man!