LinuxQuestions.org
Old 01-28-2012, 12:41 AM   #1
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Rep: Reputation: 24
scripting help/advice; use bash?


Hi. I recently did some tweaking to nano so that I could create outlines with it that look good on the screen. I also need to print out those outlines and have them look nice on paper and I've come up with a way of doing that which involves inserting--manually, for now--TeX/LaTeX mark-up, then changing the file's extension to .tex and running pdflatex on it. You can read about the project at audaciousamateur.blogspot.com for more details if you're interested.

It seems to me that, even for someone with my limited knowledge, there should be some non-manual way to add the mark-up to my outline files. Someone who knows perl or python well, for example, could probably easily cobble together some way of doing this task using one of those languages. But I know next to nothing about either language.

In another forum where I asked about this I was directed to The Advanced Bash Scripting Guide. I was kind of gravitating toward bash anyway for this since, if I can lay somewhat dubious claim to being familiar with any sort of scripting, it would be using bash (I've created some extremely rudimentary bash scripts in the past). But I have such a poor grasp of even bash that I really wasn't sure it could process these files.

Well, I actually located in the ABS a sample script that converts a text file to html--something very close to what I need to do. In short, I need to add some lines at the beginning of the file and append some at the end, as well as to insert some mark-up within the file: that's pretty much what the bash script I found does as well (see the script at http://www.tldp.org/LDP/abs/html/con...ts.html#TOHTML ).

I just want to start off this thread by asking those much better versed in bash whether I'm on the right track in considering the ABS sample script as being a good starting point for a script that could be used to process my outline files?

Thanks, James

Last edited by jamtat; 01-28-2012 at 01:07 AM.
 
Old 01-28-2012, 02:04 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
It's hard to answer without specific information about the format of your file before and after inserting the TeX/LaTeX markup.

Bash is not great at string manipulation but it can use sed to do the complex work for it as done in the linked script.

awk might be a better choice. If you know C, awk is relatively easy to learn -- easier than bash.

Can you post an illustrative example of the input file format and the desired output?
 
Old 01-28-2012, 04:29 AM   #3
pyroscope
LQ Newbie
 
Registered: Jan 2012
Distribution: Debian / Ubuntu
Posts: 15

Rep: Reputation: 2
If I understand this, you want simple text files with simple markup that are directly readable for a human, but also publishable. There are (wiki-like) documentation systems just for that.

You might want to take a step back and consider one of those already round wheels, e.g. http://sphinx.pocoo.org/rest.html#li...te-like-blocks, or actually a personal wiki, which is what I use for quick notes. All wikis convert to HTML obviously, good ones also to PDF.
 
Old 01-28-2012, 09:34 AM   #4
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Thanks for the answers thus far. There are illustrations at my blog (I posted the address in the OP), but I'll repeat some of that here.

The outline text file is, obviously, an outline. Each level of the outline gets indented 0 or more tab spaces from the left margin. Unindented lines are the level one parts of the outline; lines indented one tab space from the left margin are level two parts; lines indented two tab spaces from the left margin are level three parts; and so on. I've designated a unique character--the equals sign--as a sort of pseudo-bullet for all outline levels as well. Here's a link to a screenshot of a sample outline I did that should better illustrate visually what I'm describing: http://1.bp.blogspot.com/-o2ZlVk8sLL...s1600/Scr1.png

So, here's what needs to be done to this text file so as to make it print nicely on paper. Nine lines need to be prepended to the beginning. Those lines are:
Quote:
\documentclass{article}
\usepackage{cjwoutl}
\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\pagestyle{myheadings}
\markright{\today{\hfill \Large{***Header*title*here***}\hfill}}
\linespread{1.3} % gives 1.5 line spacing
\begin{document}
\begin{outline}[new]
\begin{Large} % gives ca. 14 pt font
Another three lines need to be appended at the end. Those lines are
Quote:
\end{Large}
\end{outline}
\end{document}
Then, mark-up needs to be added within the body of the outline as follows. Every new line that starts with an equals sign (the equals sign being the pseudo-bullet I've selected to use for all outline levels in the text file) needs to have the equals sign replaced by the mark-up \outl{1}. Every new line followed by a single tab space then the equals sign should have the equals sign replaced by \outl{2}. Every new line followed by two tab spaces and the equals sign should have the equals sign replaced by \outl{3}. And so on, up to \outl{10} (I doubt my outlines will ever go to ten levels, but the cjwoutl package is capable of that so it should be possible for the script to handle it: namely any new line followed by nine tab spaces, then the equals sign, should have the equals sign replaced by \outl{10}). See http://2.bp.blogspot.com/-0ABppprz7A...s1600/Scr2.png for an example of how the file looks after I've added (manually, in that case) the mark-up.
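
The replacement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the thread: the helper name `mark_outline_line` and the `max_level` parameter are made up, and the sketch keeps the leading tabs in place while swapping the equals sign for the mark-up.

```python
import re

# Hypothetical helper for illustration; the name and signature are assumptions.
_BULLET = re.compile(r"^(\t*)=")


def mark_outline_line(line, max_level=10):
    """Replace a leading '=' pseudo-bullet with \\outl{N}, where N = leading tabs + 1."""
    match = _BULLET.match(line)
    if match is None:
        return line  # no pseudo-bullet at the start: leave the line untouched
    tabs = match.group(1)
    level = len(tabs) + 1
    if level > max_level:
        return line  # deeper than the ten levels mentioned above: leave as-is
    # Keep the original tabs, drop the '=', insert the mark-up in its place.
    return tabs + "\\outl{%d}" % level + line[match.end():]


if __name__ == "__main__":
    sample = ["= First point", "\t= Sub-point", "\t\t= Detail", "plain text"]
    for text in sample:
        print(mark_outline_line(text))
```

Lines that do not begin with zero or more tabs followed by an equals sign pass through unchanged, so an equals sign in the middle of a sentence is not touched.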

I hope this gives enough further detail to determine whether a bash script is the right tool, or even a possible tool, to use for this job. As I said, it seems to me the bash script for converting a text file to html works very similarly to what I need--though my scenario is actually a bit simpler in that mark-up only needs to be added in a certain relation to new lines. The bash script I found, so far as I can understand it, needs to do replacements within lines and paragraphs and so, it seems, calls sed.

Further input will be appreciated. And by the way, I do not know C or any other programming language. The only thing remotely resembling programming that I have any familiarity with at all is some rudimentary html and, as I said, very rudimentary bash scripting.

James

Last edited by jamtat; 01-28-2012 at 10:50 AM.
 
Old 01-28-2012, 09:47 AM   #5
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Quote:
Originally Posted by pyroscope View Post
If I understand this, you want simple text files with simple markup that are directly readable for a human, but also publishable. There are (wiki-like) documentation systems just for that.

You might want to take a step back and consider one of those already round wheels, e.g. http://sphinx.pocoo.org/rest.html#li...te-like-blocks, or actually a personal wiki, which is what I use for quick notes. All wikis convert to HTML obviously, good ones also to PDF.
Thanks for your input, pyroscope. I do use moinmoin and am familiar with its mark-up. So I think I understand what you're getting at and it is an interesting thought.

The reason I like the solution I'm proposing is that I can, using TeX/LaTeX mark-up, essentially create a template that will render the printed output in just the form I want it. I can, for example, control margin width, font size, line spacing, header content--even doing tricky things like having the date auto-inserted in the header. So far as I understand it I would have to get involved in a lot of additional tweaking of the file in order to get that kind of output from a wiki file. But I'll certainly be giving the matter some more thought.

James
 
Old 01-28-2012, 10:19 AM   #6
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
I don't know with bash, but with Perl a way to do it:

(change $tab_limit value if you need the script to handle more than 10 tabs)
Code:
#!/usr/bin/perl

my $tab_limit = 10;

print <<END
\\documentclass{article}
\\usepackage{cjwoutl}
\\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\\pagestyle{myheadings}
\\markright{\\today{\\hfill \\Large{***Header*title*here***}\\hfill}}
\\linespread{1.3} % gives 1.5 line spacing
\\begin{document}
\\begin{outline}[new]
\\begin{Large} % gives ca. 14 pt font 
END
;

while (<>) {

        for my $i (1 .. $tab_limit) {
            my $search = '^\t{' . ($i -1). '}=';
            if (/$search/) {
                my $replace = '\\outl{' . $i . '}';
                s/$search/$replace/;
                last;
            }
        }
        print;
}

print <<END
\\end{Large}
\\end{outline}
\\end{document} 
END
;
Then save as edit_tabs.pl (or any name you want)
Make it executable (chmod +x edit_tabs.pl)
Use it like
Code:
./edit_tabs.pl yourfile.txt > newfile.txt
[edit]
I found a better version that removes the need to limit the tab count.
It also removes the equals sign, since that was one of the requirements (and the previous script did not satisfy it).
Code:
#!/usr/bin/perl

print <<END
\\documentclass{article}
\\usepackage{cjwoutl}
\\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\\pagestyle{myheadings}
\\markright{\\today{\\hfill \\Large{***Header*title*here***}\\hfill}}
\\linespread{1.3} % gives 1.5 line spacing
\\begin{document}
\\begin{outline}[new]
\\begin{Large} % gives ca. 14 pt font 
END
;

while (<>) {
	s/^(\t*)=(.*)/"$1\\outl{".((length $1) + 1)."}$2"/e;
	print;
}

print <<END
\\end{Large}
\\end{outline}
\\end{document} 
END
;

Last edited by Cedrik; 01-29-2012 at 07:53 AM.
 
2 members found this post helpful.
Old 01-28-2012, 11:02 AM   #7
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Quote:
Originally Posted by Cedrik View Post
I don't know with bash, but with Perl a way to do it:
. . . snip
Thank you for offering that, Cedrik. I thought this might be a fairly trivial task for someone familiar with a language like perl.

Now testing . . .

Wow. That works pretty well (though I did have some anomalies at first that resulted from some weirdness introduced when I copied and pasted the code). I note that in newfile.txt your script gets rid of the tab spaces where the \outl{#} tags get inserted. Of course pdflatex doesn't care about whether or not there are tab spaces at those points and formats the file just fine for printing anyway. But for my purposes, preserving the tab spaces found in the original outline is helpful: I can make better sense of the file visually with the presence of the tab spaces at those points. So, is there a way to modify your perl script so that it preserves the tab spaces that occur in the original outline in conjunction with the equals signs?

Otherwise, this looks like it could be a great solution.

James

Last edited by jamtat; 01-28-2012 at 11:29 AM.
 
Old 01-28-2012, 01:16 PM   #8
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Would replacing the line
Code:
my $replace = '\\outl{' . $i . '}';
with the line
Code:
my $replace = '^\t{' . ($i). '}\\outl{' . $i . '}';
cause the tab spaces to be preserved?

Thanks, James

Never mind. That doesn't work: it just prepends the characters ^\t{#} to lines that begin with \outl{#}

Last edited by jamtat; 01-28-2012 at 01:50 PM.
 
Old 01-28-2012, 02:53 PM   #9
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
If you want to preserve tabs, change $replace line:
Code:
my $replace = "\t" x ($i - 1) . '\\outl{' . $i . '}';
That should do it
 
1 members found this post helpful.
Old 01-28-2012, 03:47 PM   #10
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Yep, that does do it, Cedrik. Thanks again so much for helping with this! I now have a workable way of inserting the needed mark-up into my outlines!

I still may try and do this with a bash script, though. I've wanted for some time now to advance my pathetic abilities with bash, and figuring out how to do this with bash (if, as it seems to me, it will be possible with bash) would provide an opportunity to learn more about it. So if anyone has further input on whether the bash script I found that adds html mark-up to a text file could be adapted to add TeX mark-up as I'm trying to do, please weigh in.

James
 
Old 01-28-2012, 05:33 PM   #11
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
I would use awk instead of Bash, because awk has all the necessary string facilities, whereas with Bash they're a bit lacking. Bash would certainly be a LOT slower.

Here is a plain awk script. You can supply it
  • -v tab=8
    the size of the tab stops
  • -v template=template.tex
    the path to the LaTeX template file
  • -v title="string"
    the string that replaces ~Title~ in the header; empty by default
and the input file name(s). If there are no input files, the input is read from standard input.

You can add further variables, especially ones similar to the title, very easily. I've tried to comment the code well; I want it to be an example and explanation, and not just a suggested solution.

It should work well even with mixed tabs and spaces. It uses the % minimum-indentation \outl{level} comments in the template (where minimum-indentation is either empty or the desired whitespace string). Within the input, extra spaces or tabs do not matter; extra indentation up to the next outline level is accepted. It does not require empty lines between outline levels, as it tracks the preferred outline level for each line, and only inserts the outline definition before the first non-whitespace character when the outline level changes.
Code:
#!/usr/bin/awk -f
#
# -v tab=8
#       set tab stops at every eight columns (the default).
#
# -v template=template.tex
#       set the path to the LaTeX template file.
#
# -v title=text
#       set the text that replaces ~title~ in the template.
#

# Convert tabs to spaces.
function detab(detab_line) {

    # A tab at 1-based position p expands to tab - (p - 1) % tab spaces.
    while ((detab_pos = index(detab_line, "\t")) > 0)
        detab_line = substr(detab_line, 1, detab_pos - 1) substr(tabsp, (detab_pos - 1) % tab + 1) substr(detab_line, detab_pos + 1)

    return detab_line
}

BEGIN {
    # -v tab=N sets tab width to N spaces.
    if (tab < 1) tab = 8;

    # tabsp is a tab-length string of spaces.
    tabsp = "        "
    while (length(tabsp) < tab) tabsp = tabsp tabsp
    tabsp = substr(tabsp, 1, tab)

    # -v template=path sets the default template.
    if (length(template) < 1) template = "template.tex"

    # Array mapping indentation in spaces to outline level.
    split("", outline)
    outline[0] = 1      # No indentation maps to outline level 1.
    maxspaces  = 0

    # Record separator is a newline, including trailing whitespace.
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)"

    # Field separator is consecutive whitespace.
    FS = "[\t\v\f ]+"

    # Read the header part of the template.
    while ((getline < template) > 0) {

        # "Content" marks the end of the header.
        if (tolower($1) == "content")
            break

        # Does this line define the indentation level?
        if ($0 ~ /^[\t\v\f ]*%[\t\v\f ]*\\outl{/) {

            # Convert tabs to spaces first.
            line = detab($0)

            # Remove the leading percent sign.
            sub(/^[^%]%/, "", line)

            # Calculate the indentation in spaces.
            spaces = index(line, "\\") - 1

            # Parse the outline parameter.
            level = line
            sub(/^[^{]*{/, "", level)
            sub(/}.*$/,    "", level)
            level = int(level)

            # Add to outline array.
            if (spaces >= 0 && level > 0) {
                outline[spaces] = level
                if (spaces > maxspaces)
                    maxspaces = spaces
            }

            # Do not include this line in the template.
            continue
        }

        # This line is output as part of the header.
        line = $0

        # Replace title.
        gsub(/~[Tt]itle~/, title, line)

        # Remove comments.
        gsub(/[\t\v\f ]%[^{}\\]*$/, "", line)

        printf("%s\n", line)
    }

    # Fill in the number-of-spaces-to-outline-level mapping.
    # Each indentation in the template is the minimum;
    # extra spaces are allowed (up to the next level).
    level = outline[0]
    for (spaces = 1; spaces < maxspaces; spaces++)
        if (outline[spaces] > 0)
            level = outline[spaces]
        else
            outline[spaces] = level

    # Start without outline.
    level = 0
}

/^[\t\v\f ]*%/ {
    # Skip comment lines.
    next
}

{
    line = $0

    # Remove comments.
    sub(/[\t\v\f ]%[^{}\\]*$/, "", line)

    # Create a prefix of just the indentation.
    prefix = line
    sub(/[^\t\v\f ].*$/, "", prefix)

    # Indentation size in spaces.
    spaces = length(detab(prefix))

    # We only need the length of the prefix in the input line.
    prefix = length(prefix)

    # Find out the outline level for this indentation.
    if (spaces > maxspaces)
        newlevel = outline[maxspaces]
    else
        newlevel = outline[spaces]

    # Outline level change?
    if (level != newlevel) {
        level = newlevel
        line = substr(line, 1, prefix) "\\outl{" level "}" substr(line, prefix + 1)
    }

    printf("%s\n", line)
}

END {
    # Output template footer.
    while ((getline line < template) > 0)
        printf("%s\n", line)

    close(template)
}
With this, or Cedrik's Perl script, you can use a single Bash command to regenerate the LaTeX and PDF files whenever you save your file. You'll also need inotifywait from the inotify-tools package. For example, if your text files are named *.txt in the current working directory and/or its subdirectories, with the template being the default template.tex in each directory, and the above script is named text2latex, run
Code:
inotifywait -q -m -e close_write,moved_to --format '%w%f' -r . | while read FILE ; do
    [ "$FILE" = "${FILE%%.txt}" ] && continue
    NAME="${FILE##*/}"
    DIR="${FILE%/*}"
    [ -f "$DIR/$NAME" ] || continue
    [ -f "$DIR/template.tex" ] || continue

    clear
    echo -n "$FILE: "
    date

    TEMP="$FILE.$$"
    ./text2latex -v title="${NAME%.txt}" "$FILE" > "$TEMP.tex" && \
        pdflatex "$TEMP.tex" && \
        mv -f "$TEMP.pdf" "$FILE.pdf" && \
        mv -f "$TEMP.tex" "$FILE.tex"
    rm -f "$TEMP."*
done
in another shell. Then, every time you save (or copy/rename/move) a file name ending with .txt, that will automatically create/overwrite the .txt.tex and .txt.pdf files for you.

If you like to use evince to look at the PDFs, you'll soon notice it lacks the option to watch the files; that is, it will not automatically reload the PDF file when the file changes. (You need to hit Ctrl-R to see the updates.) To make life easier, you could run in yet another shell
Code:
inotifywait -q -m -e moved_to --format '%w%f' -r . | while read FILE ; do
    [ "${FILE}" = "${FILE%%.txt.pdf}" ] && continue
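    # Close the previously started viewer, if any, then start a new one in the background.
    kill -HUP $(jobs -p) 2>/dev/null
    evince "$FILE" </dev/null >/dev/null 2>/dev/null &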
done
which will always reopen one instance of evince to the latest .txt.pdf file in the working directory (or any subdirectory). It will not interfere with any other evince instances, and if you happen to close the window, it'll just reopen when the next PDF file emerges.

Pretty nifty, eh?

In a different thread I tried to explain why the Unix philosophy, using small interchangeable modules to construct complex tools, is way better than large monolithic applications that direct you to work in a certain way. Above, you only use bash, awk, pdflatex, inotifywait, evince , and your favourite text editor nano (my preference too, actually!), to construct a fully automated document generation suited to your needs. Talk about powerful...

Last edited by Nominal Animal; 01-28-2012 at 05:43 PM.
 
3 members found this post helpful.
Old 01-28-2012, 11:27 PM   #12
jamtat
Member
 
Registered: Oct 2004
Distribution: Debian/Ubuntu, Arch, Gentoo, Void
Posts: 138

Original Poster
Rep: Reputation: 24
Wow. That is some script you've put together, Nominal. I'm impressed. And even more impressed by the additional suggestions for how to keep the various forms of the files updated. That looks like a lot of work. I'm truly grateful.

That said, understanding the workings of this script is way beyond me. I've looked at it several times now to see if I can get some idea of which does what. I get lost almost immediately and have to give up.

I have tested it though, and it is, of course, quite effective. I assumed it needed to be saved with an *.awk extension and would then need to be chmod +x'd, so I did that. I wasn't sure yet whether it should be used in the same way as Cedrik's (i.e., script.awk outline.file > tex-file.tex), but some brief experimentation cleared that up. I was able this way to do a largely successful first run.

By largely successful what I mean is that the script properly identified and marked up most, though not all, outline levels. I do have some questions about that, but those will need to be prefaced by a bit of explanation. Maybe I can move to that later in this response, assuming you'll be able to devote a bit more attention to clarifying some things. But first, I have some other questions.

I'm not quite understanding about the title aspect of the script. It seems you might be allowing here for some way to sort of automate entry of the header text (i.e., what goes between curly brackets at {***Header*title*here***})--certainly a helpful addition: have I understood correctly? If so, I'm failing to gather from looking at the script from where the input for that is supposed to come. Is it reading some part of the input file for that information? Further clarification on that part of the script will be appreciated.

I haven't quite understood what you've said about your script's handling of spaces as opposed to tabs, either. That touches on an issue I was struggling to comprehend: namely, whether any script that could process these files would be able to distinguish tab spaces from single spaces. As you may be aware, the nano tweak I applied in order to get nano's color highlighting to work on my outline files does not distinguish between the two. It sounds as though your script treats them the same, true? If so, that seems like a plus.

Which brings up another issue I'm wondering about: since your script does not seem to rely on my pseudo-bullet (the equals sign) how does it distinguish an outline level from, say, a wrapped line? I never managed to understand whether, when nano does line wrapping (which I have it set to do), it inserts an end-of-line mark then a new-line mark at the beginning of the next line. If your script searches for (regular expression) new-line marks, then those must occur only when a carriage return is entered, rather than at points where nano simply wraps a line? Clarification on that will be appreciated.

On the pseudo-bullet character I've chosen, I'll just mention that I chose it for two reasons. One is that it can help me better to distinguish outline levels when I'm looking, on a screen, at one of my outline files under nano. Perhaps just as importantly though, I decided such a unique character might be needed in order for some search-and-replace script to even work. Yet your script seems to work fairly effectively even without the presence of the pseudo-bullet (something I discovered by accident, btw). Can you clarify how that happens? What I was trying to do in some initial experiments with searching and replacing regular expressions using nano's built-in search-and-replace function, was to get it to detect instances of end-of-line followed by new-line. I couldn't get it to detect such a combination and was unsure, in any case, whether line wrapping would entail that end-of-line/new-line combination as well. So I decided the pseudo-bullet would probably be needed and, in any case, would be helpful in distinguishing outline levels under nano on the screen. So I introduced it. Maybe I should be rethinking that?

I think I'll leave my questions at that for now and perhaps pose others later, if you will have any more time to devote to this thread. In conclusion, yes, what you've put together is truly nifty. Thanks again for your input on this!

James

Last edited by jamtat; 01-28-2012 at 11:39 PM.
 
Old 01-29-2012, 02:58 AM   #13
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Hi jamtat,

I know you have a working solution, but I wanted to respond. I'm forcing myself to do Python scripting so I can learn it. So, since nobody has posted a Python solution, I'll post mine. It will work the same way that Cedrik's perl script does (i.e. "script.py inputfile.txt > outputfile.txt")

EDIT:
I modified the script based on Cedrik's point (in a later response) that tabs appearing after the equal sign would cause problems with the "\outl{}" text.
/EDIT

EDIT2:
This script runs on my 2.6.6 Python interpreter. As jamtat later discovers, it will not work for 3.2.2. The problem is a change in Python's print() syntax. An updated script for 3.2.2 is posted on the next page of this thread.
/EDIT2

I named the script "texoutline.py" but as long as you use the ".py" extension and adjust your path for python at the top (if necessary), it should work:
Code:
#!/usr/bin/python

import sys
import re

if( len( sys.argv ) != 2 ):
    print >> sys.stderr, "{0} requires one filename to process.".format( sys.argv[0].split('/')[-1] )
    sys.exit( 1)

try:
    rawOutline = open( sys.argv[1], 'r' )
except:
    print >> sys.stderr, "Unable to open {0} for reading".format( sys.argv[1] )
    sys.exit( 2 )

print ( '\\documentclass{article}\n'
        '\\usepackage{cjwoutl}\n'
        '\\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}\n'
        '\\pagestyle{myheadings}\n'
        '\\markright{\\today{\\hfill \\Large{***Header*title*here***}\\hfill}}\n'
        '\\linespread{1.3} % gives 1.5 line spacing\n'
        '\\begin{document}\n'
        '\\begin{outline}[new]\n'
        '\\begin{Large} % gives ca. 14 pt font' )

for inputLine in rawOutline:
    reMatches = re.match( r"(\t*)=(.*)", inputLine )
    if( reMatches == None ):
        print inputLine.rstrip()
    else:
        tabCount = len( reMatches.group(1).split('\t') )
        print "{0}\\outl{{{1:d}}}{2}".format( reMatches.group(1), tabCount, reMatches.group(2) )

print ( '\\end{Large}\n'
        '\\end{outline}\n'
        '\\end{document}\n' )
Since I'm learning, it may not be very Python-ish style-wise. Oh, and one last note, Python cares about indentation. So if you try the script yourself, make sure that the indentation is preserved. Given the problem is all about outlines, I don't think that should be a problem
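
For readers on Python 3, where print is a function rather than a statement, the `print >> sys.stderr, ...` and bare `print ...` forms used above need rewriting. The following is a minimal sketch of the equivalent Python 3 forms only; it is not the updated script mentioned in EDIT2, and the helper names `emit` and `warn` are made up for illustration.

```python
import io
import sys


def emit(lines, stream=None):
    """Write each line using print() as a function (Python 3 syntax)."""
    if stream is None:
        stream = sys.stdout
    for line in lines:
        # Python 2 statement form: print line
        print(line, file=stream)


def warn(message, stream=None):
    """Write a diagnostic message, to standard error by default."""
    # Python 2 statement form: print >> sys.stderr, message
    print(message, file=stream if stream is not None else sys.stderr)


if __name__ == "__main__":
    warn("example warning written to stderr")
    buffer = io.StringIO()
    emit(["\\end{Large}", "\\end{outline}", "\\end{document}"], buffer)
    sys.stdout.write(buffer.getvalue())
```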

Last edited by Dark_Helmet; 01-29-2012 at 03:16 PM.
 
2 members found this post helpful.
Old 01-29-2012, 05:42 AM   #14
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
@Dark_Helmet: Nice; MUCH easier to read than mine, that's for sure!

Quote:
Originally Posted by jamtat View Post
I've looked at it several times now to see if I can get some idea of which does what. I get lost almost immediately and have to give up.
Sorry about that. Do you yourself prefer any specific language? I could rewrite in that, to help you understand the logic.

Quote:
Originally Posted by jamtat View Post
I'm not quite understanding about the title aspect of the script. It seems you might be allowing here for some way to sort of automate entry of the header text (i.e., what goes between curly brackets at {***Header*title*here***})
~Title~ in the template will be replaced with the string from the command line option -v title="string" .

The regular expression for ***Header*title*here*** allowing for leading capitalization is /\*\*\*[Hh]eader\*[Tt]itle\*[Hh]ere\*\*\*/ which hurt my eyes, so I chose an easier string.

Quote:
Originally Posted by jamtat View Post
Is it reading some part of the input file for that information?
No, it is set by option -v title="title string" on the command line when running the script. In my latter examples, the script is run by the Bash-inotifywait loop, so it'd take a bit of sed'ing to parse the title from the input file.

Quote:
Originally Posted by jamtat View Post
I haven't quite understood what you've said about your script's handling of spaces as opposed to tabs, either.
When you supply -v tab=7 to the script, it calculates the columns just like nano -T 7 does. It internally converts the tabs to the correct number of spaces (just like nano -E ), so the character index of a line will always match the column number. It will always use the original whitespace convention, though.
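
The column calculation can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than a translation of the awk script: the function names `expand_tabs` and `indent_columns` are made up, and Python's built-in `str.expandtabs` does the same job in one call.

```python
def expand_tabs(text, tab=8):
    """Replace each tab with enough spaces to reach the next tab stop."""
    out = []
    column = 0  # 0-based column of the next character to be written
    for ch in text:
        if ch == "\t":
            width = tab - (column % tab)  # spaces needed to reach the next stop
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)


def indent_columns(line, tab=8):
    """Return the width, in columns, of the leading whitespace of a line."""
    prefix = line[: len(line) - len(line.lstrip(" \t"))]
    return len(expand_tabs(prefix, tab))


if __name__ == "__main__":
    print(indent_columns("\t\t= two tabs", tab=7))
    print(indent_columns("   \t= spaces then a tab", tab=8))
```

Because a tab only advances to the next stop, three spaces followed by a tab occupy the same number of columns as a single tab.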

Quote:
Originally Posted by jamtat View Post
Which brings up another issue I'm wondering about: since your script does not seem to rely on my pseudo-bullet (the equals sign) how does it distinguish an outline level from, say, a wrapped line?
Because the indentation changes.

Consider this logic:
  • If the line starts without whitespace, it is at outline level 1.
  • If the line starts with 1 to 8 columns of whitespace, it is at level 2.
  • If the line starts with 9 to 16 columns of whitespace, it is at level 3.
  • If the line starts with 17 to 24 columns of whitespace, it is at level 4.
  • If the line starts with at least 25 columns of whitespace, it is at level 5.

Whenever you get a new line of input, you check which indentation level that line needs. In my script, the number of whitespace columns is spaces, the level at current line is newlevel, and the level the last line printed was on is level. level is initialized to zero, so that you get the initial outline level set for the first word.

If newlevel and level differ for a line, you need to set level=newlevel and insert the outline command just after the whitespace on that line.

That is all.
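
For readers who find Python easier to follow than awk, the same logic can be sketched as below. This is a simplified illustration that assumes fixed eight-column steps and five levels, as in the list above; the awk script reads its thresholds from the template instead. The function names are made up for the example.

```python
def level_for_columns(columns, tab=8, max_level=5):
    """Map leading-whitespace width in columns to an outline level."""
    if columns <= 0:
        return 1
    level = 2 + (columns - 1) // tab  # 1..8 -> 2, 9..16 -> 3, 17..24 -> 4, ...
    return min(level, max_level)


def add_outline_marks(lines, tab=8, max_level=5):
    """Insert \\outl{N} after the indentation whenever the level changes."""
    marked = []
    level = 0  # zero means no outline level has been emitted yet
    for line in lines:
        body = line.lstrip(" \t")
        prefix = line[: len(line) - len(body)]
        columns = len(prefix.expandtabs(tab))
        new_level = level_for_columns(columns, tab, max_level)
        if new_level != level:
            level = new_level
            line = prefix + "\\outl{%d}" % level + body
        marked.append(line)
    return marked


if __name__ == "__main__":
    for text in add_outline_marks(["Top", "still top", "\tSub", "\tmore sub", "Back"]):
        print(text)
```

A line that keeps the indentation of the line before it gets no mark-up, which is how continuation lines are told apart from new outline entries in this scheme.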

Quote:
Originally Posted by jamtat View Post
On the pseudo-bullet character I've chosen
I did not notice that. It would help if you provided both your input text, and the LaTeX you want to be generated from it. Lorem ipsum examples, not just (differently indented text goes here). Also mark which part is the header (part before the LaTeX generated from the input text file), and which part is the footer (part after the generated bit).
_ _ _

I think there might be a better interface for you, though.

Assume you sprinkle LaTeX comments into your input text file, something like
Code:
% Title: This is the document title
% Author: Author
% Template: Name of template
% Outline: 1   2   3   4   5
These would be completely optional, defaulting to whatever values you set in ~/.textopdf/config or /etc/textopdf/config . All templates would be in ~/.textopdf/templates/ or /etc/textopdf/templates/.

The Outline line in the input text would define your indentation levels. Lines without indentation are always on the first indentation level, so 1 marks the left margin. Since 2 is three columns to the right, second outline level requires at least three spaces. With none, one, or two spaces at the start of a line, you're at outline level 1.

The template file would not have anything related to indentation, since the script would do that automatically, based on just the Outline: line. If there is an empty Outline: line (either in the input text file, or in the configuration file), then the script would skip indentation-to-outline-level mapping altogether.
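
One possible way to read such an Outline: line is sketched below in Python. It is only a guess at how the proposed interface might be parsed: it assumes that the column offset of each number, measured from the first number, is the minimum indentation for that level, and the function names are invented for the example.

```python
import re


def parse_outline_spec(comment_line):
    """Return [(min_columns, level), ...] from a '% Outline: ...' comment line.

    An empty list means there is no mapping (no Outline: line, or an empty one).
    """
    match = re.match(r"\s*%\s*Outline:(.*)$", comment_line)
    if match is None:
        return []
    spec = match.group(1)
    marks = [(m.start(), int(m.group())) for m in re.finditer(r"\d+", spec)]
    if not marks:
        return []  # an empty Outline: line switches the mapping off
    origin = marks[0][0]  # the first number marks the left margin
    return [(pos - origin, level) for pos, level in marks]


def level_for_indent(columns, thresholds):
    """Pick the deepest level whose minimum indentation is satisfied."""
    level = None
    for min_columns, candidate in thresholds:
        if columns >= min_columns:
            level = candidate
    return level


if __name__ == "__main__":
    thresholds = parse_outline_spec("% Outline: 1  2  3")
    for width in (0, 2, 3, 6):
        print(width, level_for_indent(width, thresholds))
```

With the sample line used in the demo, the second number sits three columns to the right of the first, so zero to two columns of indentation stay at level 1 and three columns reach level 2.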

In the template LaTeX file, you could use e.g ~Author~ to insert the string from the corresponding comment. You can make that automatic, i.e. you can add whatever strings to both the template and the input text, as long as the keyword only contains letters A-Z, a-z, and maybe digits 0-9 and dashes -, so detection is as reliable as possible.

I'd like to avoid using the % character, so that the templates themselves would stay valid LaTeX.
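
The keyword substitution could look roughly like the following Python sketch. It assumes the `% Key: value` comment form and the `~Key~` placeholder form described above, treats keywords case-insensitively, and leaves unknown placeholders untouched; the function names are made up, and none of this is an existing tool.

```python
import re

# Keywords: letters first, then letters, digits, or dashes (an assumption based on the text).
_META = re.compile(r"^\s*%\s*([A-Za-z][A-Za-z0-9-]*):\s*(.*?)\s*$")
_PLACEHOLDER = re.compile(r"~([A-Za-z][A-Za-z0-9-]*)~")


def read_metadata(lines):
    """Collect '% Key: value' comments into a dict keyed by lower-cased keyword."""
    meta = {}
    for line in lines:
        match = _META.match(line)
        if match:
            meta[match.group(1).lower()] = match.group(2)
    return meta


def fill_template(template, meta):
    """Replace ~Key~ placeholders with metadata values; unknown keys stay as-is."""
    def lookup(match):
        return meta.get(match.group(1).lower(), match.group(0))
    return _PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    source = ["% Title: Shopping list", "% Author: James", "= First item"]
    print(fill_template("\\markright{~Title~ by ~Author~}", read_metadata(source)))
```

Passing a function to `sub` rather than a replacement string avoids having to escape backslashes in values that end up inside LaTeX.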

You could even add snippet support: For example,
Code:
% =snippetname [string]
would include ~/.textopdf/snippets/snippetname.tex or /etc/textopdf/snippets/snippetname.tex into the LaTeX, with perhaps ~:~ within it replaced with string.

The main thing with these is to think up suitable patterns -- like I have for the ones with ~ above -- that can be simply replaced from the template, without matching unrelated LaTeX code. Using % would be good because it is a comment character, but % & 8 are so easy to confuse, I picked ~ instead.
 
Old 01-29-2012, 06:45 AM   #15
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 244
@Dark_Helmet
Code:
tabCount = len( inputLine.split('\t') )
I like your tab count method
But it assumes there will never be tabs elsewhere than start of line, no ?
(I mean for using the tab count in \outl{x})
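
The difference is easy to see in a small Python sketch that compares splitting the whole line with counting only the leading tabs. The function names are made up for illustration.

```python
def level_from_whole_line(line):
    """Count every tab in the line, plus one (the split-based method on the full line)."""
    return len(line.split("\t"))


def level_from_leading_tabs(line):
    """Count only the tabs before the first non-tab character, plus one."""
    return len(line) - len(line.lstrip("\t")) + 1


if __name__ == "__main__":
    line = "\t= item\twith a tab in the text"
    print(level_from_whole_line(line), level_from_leading_tabs(line))
```

The two methods agree as long as tabs appear only at the start of the line; a tab inside the text inflates the split-based count.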

@jamtat
FWIW, I have improved the perl script, and noticed it didn't replace the equal sign, have corrected that

Last edited by Cedrik; 01-29-2012 at 07:34 AM.
 
  

