LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

01-02-2006, 04:50 PM #1
narc
Member
Registered: Aug 2004
Location: Montréal
Distribution: Linux from scratch
Posts: 68
Rep: Reputation: 15
pdftotext - How to output to html with ampersand entities ?


Hello.

I am transforming japanese PDFs into web pages.

pdftotext produces great output in Shift-JIS, EUC-JP and UTF-8, but all of these are raw multibyte (binary) output. Is there a switch where I can output the content as ampersand entities like &#000; or &#x000;?

Thanks.
01-04-2006, 04:20 AM #2
Pierre Lambion
LQ Newbie
Registered: Oct 2003
Distribution: Slackware
Posts: 9
Rep: Reputation: 0
I don't have a direct answer, but maybe you could pipe the HTML pages through a sed script that replaces the characters with the right ampersand entities?
01-04-2006, 02:34 PM #3
narc
Member
Registered: Aug 2004
Location: Montréal
Distribution: Linux from scratch
Posts: 68
Original Poster
Rep: Reputation: 15
I saw a few sed commands during my LFS installation, but that's about it. I thought sed was more about pattern matching than numerical transformation, in which case a lookup table would be necessary. By the time I learned all that, my gray hair would have turned white. :-) Instead, I wrote a small C filter that converts every UTF-8 character above 0x007F into an entity. It works well.

Thanks for your time and answer.
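The thread does not include the filter itself, so here is a minimal sketch of what such a filter might look like. It is not the poster's code. It assumes UTF-8 input (for example from `pdftotext -enc UTF-8`), decodes each multibyte sequence, and prints the code point as a hexadecimal numeric character reference (`&#x3042;` for あ). Several details are assumptions added for illustration: escaping `&`, `<` and `>` (so the ASCII text stays valid HTML), replacing malformed bytes with U+FFFD, and the hypothetical names `filter_entities`, `read_code_point` and `NCR_FILTER_MAIN`.

```c
#include <stdio.h>

/* Sketch (not the poster's filter): UTF-8 in, pure-ASCII HTML text out.
   Code points above U+007F become &#xHHHH; and malformed input becomes
   &#xFFFD;. Escaping &, < and > is an added assumption. */

#define REPLACEMENT 0xFFFDUL

/* Decode one sequence whose lead byte is `lead`, reading continuation
   bytes from `in`. Returns the code point, or REPLACEMENT if invalid. */
static unsigned long read_code_point(FILE *in, int lead)
{
    int extra;
    unsigned long cp, min;

    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
    else return REPLACEMENT;        /* stray continuation byte or 0xF8..0xFF */

    while (extra-- > 0) {
        int c = getc(in);
        if (c == EOF || (c & 0xC0) != 0x80) {
            if (c != EOF)
                ungetc(c, in);      /* let the caller process this byte again */
            return REPLACEMENT;
        }
        cp = (cp << 6) | (unsigned long)(c & 0x3F);
    }
    /* Reject overlong forms, values past U+10FFFF, and UTF-16 surrogates. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return REPLACEMENT;
    return cp;
}

/* Copy `in` to `out`, converting as described above.
   Returns 0 on success, -1 on a read or write error. */
int filter_entities(FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF) {
        int rc;
        if (c < 0x80) {
            switch (c) {
            case '&': rc = fputs("&amp;", out); break;
            case '<': rc = fputs("&lt;", out);  break;
            case '>': rc = fputs("&gt;", out);  break;
            default:  rc = putc(c, out);        break;
            }
        } else {
            rc = fprintf(out, "&#x%lX;", read_code_point(in, c));
        }
        if (rc < 0)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Build with -DNCR_FILTER_MAIN to get a stdin-to-stdout filter. */
#ifdef NCR_FILTER_MAIN
int main(void)
{
    return filter_entities(stdin, stdout) == 0 ? 0 : 1;
}
#endif
```

Built with something like `cc -DNCR_FILTER_MAIN -o ncrfilter ncrfilter.c` (hypothetical file names), it could sit at the end of a pipeline such as `pdftotext -enc UTF-8 input.pdf - | ./ncrfilter > body.txt`. Because the output is plain ASCII, the page's declared charset no longer affects the Japanese text. Switching the format string to `"&#%lu;"` would produce decimal references instead of hexadecimal ones.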
Similar Threads
Thread Thread Starter Forum Replies Last Post
how to print my CGI programs output inside an HTML zaveko Programming 9 10-11-2005 05:37 PM
xpdf, pdftotext phoenix7 General 7 09-08-2005 02:54 AM
Charset in html-output from DocBook? BoonZie Linux - Software 0 12-14-2004 03:59 PM
easy way to wrap a lot of links for html output? the_rhino Programming 7 10-20-2004 12:40 PM
alternative way to write an ampersand (or another fix to my superkaramba problem)? fibbi Linux - Software 1 04-23-2004 02:51 PM

All times are GMT -5.