LinuxQuestions.org
Old 03-21-2010, 03:36 PM   #1
comiconomenclaturist
LQ Newbie
 
Registered: Oct 2007
Posts: 9

Rep: Reputation: 0
dynamically format text and send to another programme...


I have a Word document that I want to read from, formatting the text dynamically and sending it to another programme. The document is about 12 pages long, and I need to preserve CRs or EOLs, but the rest of the text will need to be formatted according to variables such as characters per line and lines per page. I will need to read and print characters like %, , ;, and : too.

I'm not sure where to start with this! Can anyone recommend an application or programming language that would be suitable? Could I do this with python, latex, or even just awk or sed, in a bash script? What would be best?

James
 
Old 03-21-2010, 08:50 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Too vague. We'd need to see a representative sample of the input, and an example of the intended output, at the very least. The right tools to use depend a lot on the formatting of the input you have to deal with.

Also, do you need the script itself to get the text out of the word document and into plain text, or can you handle that separately? You say it's a 12-page document. Will the new formatting be on a per-page basis, or will it restructure the whole document? It's things like this that we need to know.

Last edited by David the H.; 03-21-2010 at 08:53 PM. Reason: addendum
 
Old 03-22-2010, 04:55 AM   #3
comiconomenclaturist
LQ Newbie
 
Registered: Oct 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Sorry for being vague and thanks for the reply. I've attached a sample of some text from the Word document. What I want to do is input two variables into the script (characters per line and lines per page) and read as many lines as necessary to output one page. Then, when requested, I want to send the next page with the same formatting of characters per line and lines per page. (The whole text is similar to the example and will be formatted the same). Ultimately I want to display this text in fullscreen. I've also attached an example of how the output might look with line wrap at 78 characters. And finally yes, it would be possible for me to save the Word document to plain text first if necessary.

thanks for your help
Attached Files
File Type: txt sample_text.txt (2.3 KB, 3 views)
File Type: txt sample_text_output.txt (2.3 KB, 3 views)
 
Old 03-22-2010, 07:10 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
So really there are two operations to perform. First re-wrap the lines to the width you want, then paginate the text to the desired length, outputting one page at a time. Correct?

A couple more clarification questions here. You want to rewrap the text, but not the dashed lines separating blocks, right? Also, do the separators always separate single lines of text, or could there be more than one line in each section (i.e. newlines or even blank lines)? Are there any other possible formatting gotchas to worry about?


Just monkeying about for a few minutes though, I came up with this for the first operation:

Code:
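linewidth=78

# 1. Swap each separator (a line of only underscores) for a short placeholder so fold cannot split it.
# 2. Wrap everything else at word boundaries (-s) to $linewidth columns.
# 3. Turn the placeholder back into a full-length separator. Note that \+ is a GNU sed extension.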
sed 's/^_\+$/<linebreak>/' sample_text.txt | fold -sw "$linewidth" | \
sed 's/<linebreak>/__________________________________________________________________________________________/'
Formatting the bulk of the text while excluding the separators is a little challenging, so I just applied a brute-force method. I used sed to replace the separators with a substitute string, used fold to wrap the text, then converted the separator back to full length. Hey, it works.

Not too sure how to paginate it though, especially while controlling the timing of the output. You could use sed to break it up, and perhaps use a loop of some kind to control the timing of the output based on user input. I'd have to think about it a bit more.
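
For the pagination step, one possible approach is to save the wrapped text to a file and let sed print one line range per page. What follows is only a minimal sketch under that assumption; the function names page_count, print_page and show_pages are made up for illustration and do not come from any post in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: function names and argument order are illustrative.

# page_count FILE LINES_PER_PAGE -> number of pages, rounded up.
page_count() {
    local file=$1 per_page=$2 total
    # grep -c '' counts every line, including a last line with no newline.
    total=$(grep -c '' "$file")
    echo $(( (total + per_page - 1) / per_page ))
}

# print_page FILE LINES_PER_PAGE PAGE -> prints that page (pages start at 1).
print_page() {
    local file=$1 per_page=$2 page=$3
    local first=$(( (page - 1) * per_page + 1 ))
    local last=$(( page * per_page ))
    # -n suppresses default output; print the range, then quit early.
    sed -n "${first},${last}p;${last}q" "$file"
}

# show_pages FILE LINES_PER_PAGE -> print a page, wait for Enter, repeat.
show_pages() {
    local file=$1 per_page=$2 page=1 pages
    pages=$(page_count "$file" "$per_page")
    while [ "$page" -le "$pages" ]; do
        print_page "$file" "$per_page" "$page"
        # Wait for the "next page" request, except after the final page.
        if [ "$page" -lt "$pages" ]; then
            read -r -p "-- page $page of $pages, press Enter --" _unused
        fi
        page=$((page + 1))
    done
}
```

For example, with the wrapped output saved as wrapped.txt (a hypothetical file name), `show_pages wrapped.txt 20` would print 20 lines, wait for Enter, and carry on until the text runs out. Since a page is just a line range, separator lines count as ordinary lines here.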

It might be better to go about this with something like perl, which I personally don't know much about.
 
Old 03-22-2010, 09:28 AM   #5
comiconomenclaturist
LQ Newbie
 
Registered: Oct 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Great, thanks for this starting point. You are right about the separators - they should not be wrapped. And yes, they may separate single lines or multiple lines of text including blank lines. Also, some lines are indented within Word and it would be good to preserve this... I'm going to look at the sed commands more closely and perhaps Perl, like you suggest.

many thanks

James
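
Given the requirements above (separators left alone, blank lines kept, indentation preserved), a line-by-line loop may be easier to extend than the placeholder trick. This is a rough sketch, assuming the indentation survives as leading spaces in the plain-text export and that a separator is any line made up solely of underscores; wrap_keep_indent is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch only: wrap_keep_indent is an illustrative name.
# Usage: wrap_keep_indent WIDTH < input.txt > wrapped.txt
wrap_keep_indent() {
    local width=$1 line indent body inner piece
    # The "|| [ -n ... ]" part also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *[!_]*)
                # Ordinary text: split into leading spaces and the rest.
                indent=${line%%[! ]*}
                body=${line#"$indent"}
                inner=$(( width - ${#indent} ))
                if [ "$inner" -lt 1 ]; then
                    printf '%s\n' "$line"   # indent too deep to wrap; pass through
                    continue
                fi
                # Wrap the body in the remaining width, then re-apply the indent.
                printf '%s\n' "$body" | fold -s -w "$inner" |
                    while IFS= read -r piece; do
                        printf '%s%s\n' "$indent" "$piece"
                    done
                ;;
            *)
                # Empty line or underscores only (a separator): leave untouched.
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

Something like `wrap_keep_indent 78 < sample_text.txt` would then give continuation lines the same indent as the line they came from. Tab indentation is not treated as indent in this sketch; fold simply counts the tab as part of the text.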
 
Old 03-22-2010, 07:47 PM   #6
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,251

Rep: Reputation: 2026
A couple of Perl modules

http://search.cpan.org/~gabor/Text-F...Text/Format.pm
http://search.cpan.org/~muir/Text-Ta...b/Text/Wrap.pm (simpler)

You may need to write your own code:
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials
 
  

