What's a good way to parse a PDF?
I do the books for my condominium association. The city/county combines billing for water, sewage, trash hauling, and recycling, and makes the bills available only as PDFs, not in any other format. For years I converted them to text files, extracted the data with a script, and inserted it into a spreadsheet. A year ago they started using new formats to write their PDFs, ones that aren't predictable. The PDFs look the same, but the amounts associated with the items they bill for aren't in the same spot every month; in fact, sometimes the last item on the bill has its name on the last page of the PDF (and of the text conversion) but its amount on the first page. I assume they've gone to a columnar format that the conversion doesn't get 'right'. I've tried conversions from Ghostscript, xpdf, OpenOffice, and Acrobat. Does anyone have another suggestion?
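To make the problem concrete, here is a minimal sketch of one possible approach: pair names with amounts by position on the page rather than by order in the text stream. It assumes each word can be extracted together with its page number and coordinates (Poppler's `pdftotext` has a `-bbox` option that writes word bounding boxes, for example). The tuple layout, the function names `group_rows` and `extract_charges`, the tolerance value, and the sample data are all hypothetical, not taken from the actual bills.

```python
import re

# Matches amounts such as 12.50, $1,030.25 or -4.00 (assumed bill format).
AMOUNT = re.compile(r"^\$?-?\d[\d,]*\.\d{2}$")

def group_rows(words, tolerance=3.0):
    """Group (page, x, y, text) tuples into rows of words that share
    a page and sit at roughly the same vertical position."""
    rows = []
    # Sort by page, then top-to-bottom, then left-to-right.
    for page, x, y, text in sorted(words, key=lambda w: (w[0], w[2], w[1])):
        last = rows[-1] if rows else None
        if last and last["page"] == page and abs(last["y"] - y) <= tolerance:
            last["words"].append((x, text))
        else:
            rows.append({"page": page, "y": y, "words": [(x, text)]})
    return rows

def extract_charges(words, tolerance=3.0):
    """Return {item name: amount} for rows whose rightmost word is an amount."""
    charges = {}
    for row in group_rows(words, tolerance):
        texts = [text for _, text in sorted(row["words"])]
        if len(texts) >= 2 and AMOUNT.match(texts[-1]):
            name = " ".join(texts[:-1])
            amount = texts[-1].replace("$", "").replace(",", "")
            charges[name] = float(amount)
    return charges

# Hypothetical words, deliberately out of reading order, as they might
# come out of a PDF whose content stream is written column by column.
sample_words = [
    (1, 400.0, 100.5, "12.50"),
    (1, 400.0, 121.0, "$1,030.25"),
    (1, 400.0, 140.0, "4.00"),
    (1, 50.0, 140.0, "Recycling"),
    (1, 95.0, 120.0, "hauling"),
    (1, 50.0, 120.0, "Trash"),
    (1, 50.0, 100.0, "Water"),
]

if __name__ == "__main__":
    for name, amount in extract_charges(sample_words).items():
        print(f"{name}: {amount:.2f}")
```

Because rows are rebuilt from coordinates, the order in which the PDF writer emitted the text should not matter, although the tolerance would probably need tuning against real bills.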