LinuxQuestions.org

Programming forum thread: scripting help/advice; use bash?

Dark_Helmet 01-29-2012 11:47 AM

Thanks to both of you!

Quote:

Originally Posted by Cedrik
But it assumes there will never be tabs elsewhere than start of line, no ?

Absolutely. I guess I unconsciously assumed that tabs were "special" characters and would not appear in the text data. If tabs can appear after the equal sign, then the more technically correct line would be:
Code:

        tabCount = len( reMatches.group(1).split('\t') )
I'll modify the original script in a moment just to be consistent.
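The Python sketch below makes the difference concrete. The helper name outline_level is my own and is not part of the script. Splitting the whole line also counts a tab that appears after the equal sign, while splitting only group(1) counts just the leading tabs. Because N tabs split into N+1 pieces, the result already starts at level 1.

```python
import re

# Same pattern as the script: leading tabs, then the '=' pseudo-bullet.
OUTLINE_LINE = re.compile(r"(\t*)=(.*)")


def outline_level(line):
    """Return leading tabs + 1 for a bullet line, or None for any other line."""
    match = OUTLINE_LINE.match(line)
    if match is None:
        return None
    # N leading tabs split into N + 1 pieces, so the level comes out 1-based.
    return len(match.group(1).split('\t'))


if __name__ == "__main__":
    line = "\t=Point a.\tsee note"
    print(len(line.split('\t')))  # 3: the tab inside the text is counted too
    print(outline_level(line))    # 2: only the leading tab counts
```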

jamtat 01-29-2012 02:12 PM

This is really turning into an embarrassment of riches--now there are 3 different scripts, one of them even having already gone through a revision!

Thanks for providing a python alternative, Dark_Helmet. I get an error when I try to run this one, though:
Code:

[me@mymachine ~]$ ./outl2latex.py sample.outl > sample.tex
  File "./outl2latex.py", line 29
    print inputLine.rstrip()
                  ^
SyntaxError: invalid syntax

This is python 3.2.2, by the way. Does your script require an earlier version, perhaps?

James

PS In my outlines I cannot anticipate any scenario where tab spaces would occur anywhere other than at the left margin.

Dark_Helmet 01-29-2012 02:22 PM

Odd.

The Python interpreter on my machine is 2.6.6.

Though, I don't recall seeing that rstrip() was deprecated or any other change to strings that would account for that syntax error. I'll go check the docs.

EDIT:
According to Online Python Docs that syntax should be fine--as long as inputLine is still treated as a string. I'll double-check whether they changed how a file iterator is handled.

EDIT2:
I don't see any changes to file I/O that would explain it either. For jollies, I may download the 3.2.2 source and see what I can find out. In the meantime, the only other thing I can suggest is to check for typos if you manually typed in the script.

Dark_Helmet 01-29-2012 03:09 PM

I should be a betting man. Because I could make a lot of money betting that whenever I think I know what the problem is, it's something else.
"Yes, Mr. Bookie, I think strings and file I/O are the problem. So please put this $50 on anything but those two"

The syntax error is because print changed from a statement to a function in Python 3. I'll put up a corrected version of the script in a moment.

EDIT:

Here is the new script, modified for the print() function that replaced the print statement in Python 3. It ran on my system with a compiled 3.2.2 interpreter.

Code:

#!/usr/bin/python

import sys
import re

if( len( sys.argv ) != 2 ):
    print ( "{0} requires a filename to process.".format( sys.argv[0].split('/')[-1] ), file=sys.stderr )
    sys.exit( 1 )

try:
    rawOutline = open( sys.argv[1], 'r' )
except IOError:
    # Catch only file-opening errors, not e.g. Ctrl-C (KeyboardInterrupt).
    print ( "Unable to open {0} for reading".format( sys.argv[1] ), file=sys.stderr )
    sys.exit( 2 )

print ( '\\documentclass{article}\n'
        '\\usepackage{cjwoutl}\n'
        '\\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}\n'
        '\\pagestyle{myheadings}\n'
        '\\markright{\\today{\\hfill \\Large{***Header*title*here***}\\hfill}}\n'
        '\\linespread{1.3} % gives 1.5 line spacing\n'
        '\\begin{document}\n'
        '\\begin{outline}[new]\n'
        '\\begin{Large} % gives ca. 14 pt font' )

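for inputLine in rawOutline:
    # Group 1: the leading tabs; group 2: everything after the '=' pseudo-bullet.
    reMatches = re.match( r"(\t*)=(.*)", inputLine )
    if( reMatches is None ):
        # Not a bullet line: copy it through unchanged.
        print ( inputLine.rstrip() )
    else:
        # N leading tabs split into N+1 pieces, i.e. outline level N+1.
        tabCount = len( reMatches.group(1).split('\t') )
        print ( "{0}\\outl{{{1:d}}}{2}".format( reMatches.group(1), tabCount, reMatches.group(2) ) )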

print ( '\\end{Large}\n'
        '\\end{outline}\n'
        '\\end{document}\n' )


jamtat 01-29-2012 04:34 PM

Thanks for that modified version, Dark_Helmet. Sorry to put you to that extra work: I discovered in the meantime that I do actually have an older version of python on this machine (2.7) and that, when invoked with the path to that version, the script ran fine. Still, I'm wondering whether the updated script will be backward-compatible, i.e., whether it'll run using older versions of python? Or does each version of the script only work with a particular version of python?

James

Later edit: Your new version works fine here with my python 3.2.2 as well. It does not work when invoked with the path to python 2.7.

Dark_Helmet 01-29-2012 06:31 PM

No need to apologize. To be honest, I'm not sure why the Debian maintainers have not pushed Python 3.2 out yet as a replacement for 2.6. Then again, given the problem we just went through, they could be concerned that the upgrade could break lots of existing scripts.

Such is the case when a fundamental script tool (such as print) is changed and the change breaks backward compatibility. But some of that is to be expected in a major version change.
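For what it's worth, Python 2.6 and later can be told to treat print as a function too, via from __future__ import print_function; on Python 3 that import does nothing. Below is a minimal sketch that reuses the script's own error message (report_missing_argument is a hypothetical name). As far as I know, putting that same import line right after the shebang of the script above should let it run under 2.7 as well, but I haven't tested that.

```python
# A from __future__ import must come before any other statement in the file.
# On Python 2.6/2.7 it turns print into a function; on Python 3 it is a no-op.
from __future__ import print_function

import sys


def report_missing_argument(program_path, stream=sys.stderr):
    """Write the usage error with the same print() call on Python 2.6+ and 3."""
    program_name = program_path.split('/')[-1]
    print("{0} requires a filename to process.".format(program_name), file=stream)


if __name__ == "__main__":
    report_missing_argument(sys.argv[0])
```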

Back to your original problem, I completely forgot to address your question: could you write a bash script to solve your problem?

Sure, you can do it. You could do it for the same reason I wrote this in Python: to teach yourself more of the ins-and-outs of the scripting language and its features. As a rule of thumb: if you can accomplish a task with a sequence of commands at a terminal, you can code that task as a bash shell script.

There may be languages that are better suited for specific tasks. I think Nominal Animal touched on this (though I only scanned his replies). So, it boils down to your goal. Is your goal a functioning script, to build on your scripting experience, or a combination of both?

If I were to start writing a bash shell script for this particular task, I'd start from a very basic outline like this:
Code:

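#!/bin/bash

# Quoting 'EOF' stops the shell from touching backslashes and $ in the LaTeX.
cat << 'EOF'
<insert all the header markup here>
EOF

# IFS= keeps the leading tabs (plain "read" strips them) and -r keeps backslashes.
while IFS= read -r inputLine ; do
  # $'...' turns \t into a real tab; grep itself does not understand \t.
  grepOut=$( printf '%s\n' "${inputLine}" | grep -e $'^\t*=' )
  if [ -z "${grepOut}" ] ; then
    printf '%s\n' "${inputLine}"
  else
    # GNU sed understands \t in the pattern.
    literalTabs=$( printf '%s\n' "${inputLine}" | sed 's@^\(\t*\)=.*@\1@' )
    everythingElse=$( printf '%s\n' "${inputLine}" | sed 's@^\t*=\(.*\)@\1@' )
    # N leading tabs means outline level N+1.
    tabCount=$(( ${#literalTabs} + 1 ))
    printf '%s\\outl{%d}%s\n' "${literalTabs}" "${tabCount}" "${everythingElse}"
  fi
done < "${1}"

cat << 'EOF'
<insert all the footer markup here>
EOF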

Obviously that's incomplete. And I make no guarantees about the sed/grep patterns :) But, if you want, take it and run with it.

jamtat 01-29-2012 11:15 PM

Thanks for the input on the bash script and for reworking the python script, Dark_Helmet. I'm still kind of inclined to cut my programming teeth, as it were, on something like bash since it's so relevant to administration of my Linux systems. So I'm glad you've provided some kind of starting point and thus some encouragement to pursue it further.

So far, I'm not sure anyone has looked very carefully at the bash script that I found that converts text to html, the one I thought might be adaptable to the task of adding LaTeX mark-up to my outline files (I provided a link above). So why don't I go ahead and paste it here for reference.
Code:

#!/bin/bash
# tohtml.sh [v. 0.2, reldate: 06/26/08, still buggy]

# Convert a text file to HTML format.
# Author: Mendel Cooper
# License: GPL3
# Usage: sh tohtml.sh < textfile > htmlfile
# Script can easily be modified to accept source and target filenames.

#    Assumptions:
# 1) Paragraphs in (target) text file are separated by a blank line.
# 2) Jpeg images (*.jpg) are located in "images" subdirectory.
#    In the target file, the image names are enclosed in square brackets,
#    for example, [image01.jpg].
# 3) Emphasized (italic) phrases begin with a space+underscore
#+  or the first character on the line is an underscore,
#+  and end with an underscore+space or underscore+end-of-line.


# Settings
FNTSIZE=2        # Small-medium font size
IMGDIR="images"  # Image directory
# Headers
HDR01='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
HDR02='<!-- Converted to HTML by ***tohtml.sh*** script -->'
HDR03='<!-- script author: M. Leo Cooper <thegrendel.abs@gmail.com> -->'
HDR10='<html>'
HDR11='<head>'
HDR11a='</head>'
HDR12a='<title>'
HDR12b='</title>'
HDR121='<META NAME="GENERATOR" CONTENT="tohtml.sh script">'
HDR13='<body bgcolor="#dddddd">'  # Change background color to suit.
HDR14a='<font size='
HDR14b='>'
# Footers
FTR10='</body>'
FTR11='</html>'
# Tags
BOLD="<b>"
CENTER="<center>"
END_CENTER="</center>"
LF="<br>"


write_headers ()
  {
  echo "$HDR01"
  echo
  echo "$HDR02"
  echo "$HDR03"
  echo
  echo
  echo "$HDR10"
  echo "$HDR11"
  echo "$HDR121"
  echo "$HDR11a"
  echo "$HDR13"
  echo
  echo -n "$HDR14a"
  echo -n "$FNTSIZE"
  echo "$HDR14b"
  echo
  echo "$BOLD"        # Everything in bold (more easily readable).
  }


process_text ()
  {
  while read line    # Read one line at a time.
  do
    {
    if [ ! "$line" ]  # Blank line?
    then              # Then new paragraph must follow.
      echo
      echo "$LF"      # Insert two <br> tags.
      echo "$LF"
      echo
      continue        # Skip the underscore test.
    else              # Otherwise . . .

      # Unquoted on purpose: bash 3.2+ matches a quoted right-hand side of =~ literally.
      if [[ "$line" =~ \[.*jpg\] ]]  # Is a graphic?
      then                            # Strip away brackets.
        temp=$( echo "$line" | sed -e 's/\[//' -e 's/\]//' )
        line=""$CENTER" <img src="\"$IMGDIR"/$temp\"> "$END_CENTER" "
                                      # Add image tag.
                                      # And, center it.
      fi

    fi


    echo "$line" | grep -q _
    if [ "$?" -eq 0 ]    # If line contains underscore ...
    then
      # ===================================================
      # Convert underscored phrase to italics.
      temp=$( echo "$line" |
              sed -e 's/ _/ <i>/' -e 's/_/<\/i> /' |
              sed -e 's/^_/<i>/'  -e 's/_/<\/i>/' )
      #  Process only underscores prefixed by space,
      #+ or at beginning or end of line.
      #  Do not convert underscores embedded within a word!
      line="$temp"
      # Slows script execution. Can be optimized?
      # ===================================================
    fi


 
    echo
    echo "$line"
    echo
    } # End while
  done
  }  # End process_text ()


write_footers ()  # Termination tags.
  {
  echo "$FTR10"
  echo "$FTR11"
  }


# main () {
# =========
write_headers
process_text
write_footers
# =========
#        }

exit $?

Specifically, what I wondered about is whether sed would even need to be used, since all the tags I need to insert are placed relative to the start of a line (either right at the start or a certain number of tabs in from it) and to my pseudo-bullet, the equals sign.

Any thoughts on the possibility of adapting this script, anyone?

James

jamtat 01-29-2012 11:33 PM

Quote:

Originally Posted by Nominal Animal (Post 4587309)
Sorry about that. Do you yourself prefer any specific language? I could rewrite in that, to help you understand the logic.

I'd be flattering myself if I stated that I know enough about any programming language to have a preference for one over another, Nominal. Within the context of this thread I can say that the perl script seems to me most comprehensible. Anyway thanks for offering to do further work on this.
Quote:

~Title~ in the template will be replaced with string from command line option -v title="string" .

The regular expression for ***Header*title*here*** allowing for leading capitalization is /\*\*\*[Hh]eader\*[Tt]itle\*[Hh]ere\*\*\*/ which hurt my eyes, so I chose an easier string.

No, it is set by option -v title="title string" on the command line when running the script. In my latter examples, the script is run by the Bash-inotifywait loop, so it'd take a bit of sed'ing to parse the title from the input file.
I see. I put that text with all the asterisks between those curly brackets because I expected to add the actual header manually, and I wanted it to catch my attention readily as something in need of further editing. Of course anything, or even nothing at all, could be put between those curly brackets, depending on whether some automated way of entering the header text were in play. On which point, see below.

Here's something I've begun to wonder about in relation to the title/header thing. I'm still kind of attached to this pseudo-bullet (equals sign) idea and have even included it now in the color highlighting scheme I developed for my outline files under nano. So, what about the following scenario: if the first line of the file starts at the left margin and does not begin with an equals sign, nano won't color it. That's a good way, while looking at the file under nano, of helping to distinguish that first line as the title; but it could act, additionally, as an indicator to some conversion script that the text contained in that line needs to go into the header, i.e., between the curly brackets at \Large{***Header*title*here***}. Does that make sense?
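A rough Python sketch of that rule follows. The helper names split_title and fill_title are hypothetical. It assumes that a blank or indented first line means there is no title, in which case the header is left empty.

```python
# The placeholder text from the header markup used earlier in this thread.
PLACEHOLDER = "***Header*title*here***"
MARKRIGHT = "\\markright{\\today{\\hfill \\Large{" + PLACEHOLDER + "}\\hfill}}"


def split_title(lines):
    """Treat the first line as the title when it starts at the left margin
    and is not an '=' bullet. Returns (title or None, remaining lines)."""
    first = lines[0] if lines else ""
    if first.strip() and not first[0].isspace() and not first.startswith('='):
        return first.rstrip(), lines[1:]
    return None, lines


def fill_title(title):
    """Put the title (or nothing) where the asterisk placeholder was."""
    return MARKRIGHT.replace(PLACEHOLDER, title or "")
```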
Quote:

I did not notice that. It would help if you provided both your input text, and the LaTeX you want to be generated from it. Lorem ipsum examples, not just (differently indented text goes here). Also mark which part is the header (part before the LaTeX generated from the input text file), and which part is the footer (part after the generated bit).
I did provide links above to some screenshots of the text as it looks in nano, both with and without the LaTeX mark-up added. A complete file is not shown there, though. I could certainly copy and paste a complete example file here if that would be helpful.

Thanks,
James

Dark_Helmet 01-30-2012 01:43 AM

Quote:

Originally Posted by jamtat (Post 4587895)
Any thoughts on the possibility of adapting this script, anyone?

That script (no offense to the author) is an exercise in unnecessary verbosity.

Creating a separate variable for each line of the header? Yikes. The "cat << EOF ... EOF" is, in my mind, far and away a much more suitable approach.

Creating a function that issues an echo for each header line? Yikes. See previous point re: cat structure.

Defining the functions themselves? I don't see the cost-benefit. The author took the time to write those functions. Why? Because the author had to believe that writing:
Code:

write_headers
process_text
write_footers

was more understandable than:
Code:

cat << EOF
<header stuff>
EOF

process_text

cat << EOF
<footer stuff>
EOF

To be specific, the author felt the "gain" in understanding outweighed the extra effort to code the functions. Abstraction for the sake of abstraction takes away (rather than adds to) the readability/understandability of code sometimes. I think that script is an example of one such instance.

All that said, I'm not trying to bash the script. I just think the author went out of his way with good intentions, but they were unnecessary and backfired. That, or the author used the script as a learning tool and incorporated some shell features as an academic exercise--as opposed to a functional exercise.

Now, if you convert those variables and functions to the "cat << EOF" format, the html script starts to look a lot like the perl, python, and incomplete bash scripts from earlier.

Quote:

Specifically, what I wondered about is whether sed would even need to be used, since all the tags I need to insert are placed relative to the start of a line (either right at the start or a certain number of tabs in from it) and to my pseudo-bullet, the equals sign.
Well, you'll notice that my first dabble at the bash script used grep and sed. You'll probably also notice that the html script you found relies on grep and sed. The reason is that, at a high level, those tools perform two specific steps:
1. grep is used to decide whether you need to modify a line from your outline
2. sed extracts and/or modifies the contents of the line

For the bash script above, you could replace the two sed commands with cut. For instance:
Code:

    literalTabs=$( echo "${inputLine}" | cut -f 1 -d '=' )
    everythingElse=$( echo "${inputLine}" | cut -f 2- -d '=' )

In this particular case, because your modification is fairly simple, you can make that change. Though, as your tasks become more complex, using cut, nested cuts, or various combinations of tr-cut-tr-cut-tr will become unwieldy. Not to mention the code will be unintelligible if more than a week passes without you looking at it.
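In Python terms (the language of the earlier script), those two cut fields amount to splitting the line at its first '='. Here is a minimal sketch; split_at_first_equals is a hypothetical name, and it assumes it is only applied to lines that already passed the grep test, since cut treats lines without any '=' differently.

```python
def split_at_first_equals(line):
    """Rough analogue of the two cut commands: cut -f 1 -d '=' keeps what
    precedes the first '=', and cut -f 2- -d '=' keeps everything after it,
    including any later '=' signs in the text."""
    literal_tabs, _, everything_else = line.partition('=')
    return literal_tabs, everything_else
```

The "2-" in the second cut matters: with plain "-f 2", any '=' inside the outline text would cut it short.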

jamtat 01-30-2012 10:41 AM

Quote:

Originally Posted by Nominal Animal (Post 4587309)
Because the indentation changes.

Consider this logic:
  • If line starts without whitespace, it is at outline level 1.
  • If the line starts with 1 to 8 columns of whitespace, it is at level 2.
  • If the line starts with 9 to 16 columns of whitespace, it is at level 3.
  • If the line starts with 17 to 24 columns of whitespace, it is at level 4.
  • If the line starts with at least 25 columns of whitespace, it is at level 5.

Whenever you get a new line of input, you check which indentation level that line needs. In my script, the number of whitespace columns is spaces, the level at current line is newlevel, and the level the last line printed was on is level. level is initialized to zero, so that you get the initial outline level set for the first word.
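The five rules quoted above map directly onto a small function. This is a Python rendering of the stated rules only, not Nominal Animal's code. The names leading_columns and level_for_columns are hypothetical, and tabs are assumed to expand to 8-column stops.

```python
def leading_columns(line, tab=8):
    """Width of the leading whitespace, with tabs expanded to `tab`-column stops."""
    expanded = line.expandtabs(tab)
    return len(expanded) - len(expanded.lstrip(' '))


def level_for_columns(columns):
    """Outline level from the rules: 0 columns -> 1, 1-8 -> 2, 9-16 -> 3,
    17-24 -> 4, 25 or more -> 5."""
    if columns == 0:
        return 1
    return min(5, (columns - 1) // 8 + 2)
```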

It took a while, obviously, for me to digest all this material, Nominal. I now realize where and probably why this script fails to insert mark-up in certain places where it's supposed to be. If I'm not mistaken, it's because the script only looks for level changes as its indication of where to insert the \outl{#} mark-up. But in a real-world outline, mark-up could need to be inserted at a point where no level changes occur.

Consider the following case:
Quote:

1. level 1 number 1
a. point a of level 1 number 1
b. point b of level 1 number 1
2. level 1 number 2
a. point a of level 1 number 2

And so on. Not only does point a. of level 1 number 1 need the mark-up inserted, but point b. of level 1 number 1 does too. I think your script would fail to insert mark-up for point b. of level 1 number 1. So levels, sub-levels, sub-sub-levels (and so forth) can each have numerous entries that are all on the same level, and each entry needs the \outl{#} tag so it will be processed correctly and print out properly once pdflatex is run on it.
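A small Python sketch makes the difference visible. The function names are illustrative, and the levels for the five sample lines are assumed to be 1, 2, 2, 1, 2. Tagging only where the level changes skips point b., while tagging every line that starts with the '=' pseudo-bullet catches it.

```python
import re

# Optional indentation followed by the '=' pseudo-bullet.
BULLET = re.compile(r"^[\t ]*=")


def marked_by_level_change(levels):
    """Indices that a 'tag only when the level changes' rule would mark."""
    marked, previous = [], 0
    for index, level in enumerate(levels):
        if level != previous:
            marked.append(index)
        previous = level
    return marked


def marked_by_bullet(lines):
    """Indices of lines that start (after indentation) with '='."""
    return [index for index, line in enumerate(lines) if BULLET.match(line)]


if __name__ == "__main__":
    print(marked_by_level_change([1, 2, 2, 1, 2]))  # index 2 (point b.) is missing
```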

I hope what I'm saying is clear and that I have, in fact, correctly identified the shortcoming of your script. Now, if the script were looking for the spacing plus equals sign, I think it would be catching all instances where the \outl{#} tag needs to be inserted, unless there's some other way for it to detect something akin to carriage returns (and provided the text editor's line-wrapping feature does not itself introduce something like a carriage return).

Sorry I was unable, by looking at your code, to identify this. I managed to figure it out by reading over and thinking about the words you used to express your concept. I relate best to that sort of code, it seems :)

James

Nominal Animal 01-31-2012 10:31 PM

Quote:

Originally Posted by jamtat (Post 4588419)
the script only looks for level changes as its indication of where to insert the \outl{#} mark-up. But in a real-world outline, mark-up could need to be inserted at a point where no level changes occur.

Quite right -- good catch!

Quote:

Originally Posted by jamtat (Post 4588419)
If the script were looking for the spacing plus equals sign, I think it would be catching all instances where the \outl{#} tag needs to be inserted, unless there's some other way for it to detect something akin to carriage returns (and provided the text editor's line-wrapping feature does not itself introduce something like a carriage return).

You could write the bulk of the text un-indented, and use the indentation to detect a new paragraph, so every new paragraph would start with at least one space of indentation. Or you could use an empty line between paragraphs. Or, you can use a paragraph mark -- in fact, that is, really, what your = in the first column is.

All are good solutions, it is just a matter of which one you are most comfortable with.

EDIT 2: The script below uses a = before text to start a new paragraph. Spaces prior to the = are considered the indent that defines the outline level; any whitespace after the = is not considered (but is kept in the output).

Quote:

Originally Posted by jamtat (Post 4587903)
if the first line of the file starts at the left margin and does not begin with an equals sign .. distinguish that first line as the title

Hmm, why not leverage the LaTeX comment character for this? Consider a text input file looking like this:
Code:

% Template: template.tex
% Title: Title text
% FooBar: Some other text

=First paragraph goes here.
Multiple lines are perfectly okay.
        They may even be indented.
=Second paragraph, also at the first outline level.
        =Third paragraph is on the second level.
                =Fourth paragraph is on the third outline level.
=                Fifth pararaph. First outline level, just
                has a lot of whitespace.

and the corresponding template.tex in the same directory being
Code:

% Defaults, if not specified in the text input file
% Tab: 8
% Date: \today
\documentclass{article}
\usepackage{cjwoutl}
\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\pagestyle{myheadings}
\markright{~Date~ {\hfill \Large{~Title~}\hfill}}
\linespread{1.3} % gives 1.5 line spacing
\begin{document}
\begin{outline}[new]
\begin{Large} % gives ca. 14 pt font
%\outl{1}
%      \outl{2}
%              \outl{3}
%                      \outl{4}
%                              \outl{5}
%                                      \outl{6}
%                                              \outl{7}
%                                                      \outl{8}
%                                                              \outl{9}
%                                                                      \outl{10}
content
\end{Large}
\end{outline}
\end{document}

Using the following script,
Code:

#!/usr/bin/awk -f
#
# -v tab=8
#      set tab stops at every eight columns (the default).
#
# -v template=template.tex
#      set the path to the LaTeX template file.
#

# Convert tabs to spaces.
function detab(detab_line) {

    if (length(tabsp) != tab) {
    }

    while ((detab_pos = index(detab_line, "\t")) > 0)
        detab_line = substr(detab_line, 1, detab_pos - 1) substr(tabsp, detab_pos % tab) substr(detab_line, detab_pos + 1)

    return detab_line
}

BEGIN {
    # Set tab width to default, unless set on the command line.
    if (tab < 1)
        tab = 8

    # Set template name to default, unless set on the command line.
    if (length(template) < 1)
        template = "template.tex"

    # Record separator is a newline, including trailing whitespace.
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)"

    # Field separator is consecutive whitespace.
    FS = "[\t\v\f ]+"

    # Configuration -- parsed from magic comments.
    split("", config)
    config["tab"] = tab
    config["template"] = template

    # We are not working on anything yet.
    template = ""
    header = ""
    footer = ""
    split("", outline)
    outline[0] = 1
    maxspaces  = 0
    CURR = ""
}

CURR != FILENAME {

    # Empty line?
    if ($0 ~ /^[\t ]*$/)
        next       

    # Configuration comment?
    if ($0 ~ /^%[\t ]*[A-Za-z][0-9A-Za-z]*[\t ]*:/) {
        name = $0
        sub(/^%[\t ]*/, "", name)
        sub(/[\t ]*:.*$/, "", name)
        value = $0
        sub(/^[^:]*:[\t ]*/, "", value)

        # Make the name case-insensitive.
        temp = name
        name = ""
        for (i = 1; i <= length(temp); i++) {
            c = substr(temp, i, 1)
            uc = toupper(c)
            lc = tolower(c)
            if (uc != lc)
                name = name "[" uc lc "]"
            else
                name = name c
        }

        config[name] = value
        next
    }

    # Comment line (skipped)?
    if ($0 ~ /^[\t ]*%/)
        next

    # This is the first line of actual content.
    CURR = FILENAME

    # Set up tabs as currently specified.
    tab = int(config["tab"])
    tabsp = "                "
    while (length(tabsp) < tab)
        tabsp = tabsp tabsp
    tabsp = substr(tabsp, 1, tab)

    # Have we used a template yet?
    if (length(template) < 1) {
        # No, read it.
        template = config["template"]
        if (length(template) < 1) template = "-"
        OLDRS = RS
        RS = "(\r\n|\n\r|\r|\n)"

        while ((getline line < template) > 0) {
            # Content marker line?
            if (line ~ /^[\t\v\f ]*[Cc][Oo][Nn][Tt][Ee][Nn][Tt][\t\v\f ]*$/)
                break

            # Outline level definition?
            if (line ~ /^%[\t ]*\\outl{/) {
                level = line
                sub(/^[^{]*{/, "", level)
                sub(/}.*$/, "", level)
                level = int(level)

                line = detab(line)
                sub(/\\.*$/, "", line)
                sub(/%/, "", line)
                spaces = length(line)
                outline[spaces] = level
                if (spaces > maxspaces)
                    maxspaces = spaces
                continue
            }

            # Default value definition?
            if (line ~ /^%[\t ]*[A-Z][0-9A-Za-z]*:/) {
                name = line
                sub(/^%[\t ]*/, "", name)
                sub(/[\t ]*:.*$/, "", name)
                value = line
                sub(/^[^:]*:[\t ]*/, "", value)

                # Make the name case-insensitive.
                temp = name
                name = ""
                for (i = 1; i <= length(temp); i++) {
                    c = substr(temp, i, 1)
                    uc = toupper(c)
                    lc = tolower(c)
                    if (uc != lc)
                        name = name "[" uc lc "]"
                    else
                        name = name c
                }

                # If not in config already, set.
                if (!(name in config))
                    config[name] = value
                continue
            }

            # Comment line?
            if (line ~ /^[\t ]*%/)
                continue

            # Ordinary header line. Remove comment.
            sub(/[\t ]%.*$/, "", line)
            header = header line "\n"
        }

        # The rest belongs to footer.
        while ((getline line < template) > 0)
            footer = footer line "\n"

        close(template)
        RS = OLDRS

        # Fill in the outline levels.
        level = outline[0]
        for (spaces = 1; spaces < maxspaces; spaces++)
            if (spaces in outline)
                level = outline[spaces]
            else
                outline[spaces] = level

        # Replace all known ~Name~ in the template.
        for (name in config) {
            gsub("~" name "~", config[name], header)
            gsub("~" name "~", config[name], footer)
        }

        # Replace all other ~Name~ entries in the template with empty strings.
        gsub(/~[A-Z][0-9A-Za-z]*~/, "", header)
        gsub(/~[A-Z][0-9A-Za-z]*~/, "", footer)

        # Emit the template.
        printf("%s", header)
    }
}

/^[\t ]*=/ {
    line = $0
    prefix = index(line, "=") - 1

    # Indentation size in spaces.
    spaces = length(detab(substr(line, 1, prefix)))

    # Find out the outline level for this indentation.
    if (spaces > maxspaces)
        level = outline[maxspaces]
    else
        level = outline[spaces]

    # Add outline level definition.
    line = substr(line, 1, prefix) "\\outl{" level "}" substr(line, prefix + 2)

    printf("%s\n", line)
    next
}

{  printf("%s\n", $0)
}

END {
    printf("%s", footer)
}

i.e. running
Code:

./script.awk input.txt
the output is
Code:

\documentclass{article}
\usepackage{cjwoutl}
\usepackage[top=1in,bottom=1in,left=1in,right=1in]{geometry}
\pagestyle{myheadings}
\markright{\today {\hfill \Large{Title text}\hfill}}
\linespread{1.3}
\begin{document}
\begin{outline}[new]
\begin{Large}
\outl{1}First paragraph goes here.
Multiple lines are perfectly okay.
        They may even be indented.
\outl{1}Second paragraph, also at the first outline level.
        \outl{2}Third paragraph is on the second level.
                \outl{3}Fourth paragraph is on the third outline level.
\outl{1}                Fifth pararaph. First outline level, just
                has a lot of whitespace.
\end{Large}
\end{outline}
\end{document}

EDIT 1:

The awk script above now allows defaults to be set in the template file itself. That way you only need to use the initial lines like % Title: if you want to override a string in the template.

In the example cases above, you can add a % Date: string line to your input text file, if you want to specify a certain date. By default, the template file uses \today as you can see in the start of the new template file, and the output.
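For anyone following along in Python rather than awk, the magic-comment and ~Name~ parts of this approach can be sketched as below. It is a simplified re-implementation under my own assumptions, not a translation of the awk script: names are matched case-insensitively by lower-casing them, and defaults are not read from the template. read_magic_comments and fill_placeholders are hypothetical names.

```python
import re

# "% Name: value" lines at the top of the input file.
MAGIC = re.compile(r"^%\s*([A-Za-z][0-9A-Za-z]*)\s*:\s*(.*?)\s*$")
PLACEHOLDER = re.compile(r"~([A-Za-z][0-9A-Za-z]*)~")


def read_magic_comments(lines):
    """Collect '% Name: value' settings (names lower-cased) until the first content line."""
    config = {}
    for line in lines:
        if not line.strip():
            continue                      # blank lines are skipped
        match = MAGIC.match(line)
        if match:
            config[match.group(1).lower()] = match.group(2)
        elif line.lstrip().startswith('%'):
            continue                      # ordinary comment, ignored
        else:
            break                         # first line of real content
    return config


def fill_placeholders(text, config):
    """Replace ~Name~ with its configured value; unknown names become empty."""
    # A function replacement is inserted verbatim, so "\today" keeps its
    # backslash instead of being read as an escape sequence.
    return PLACEHOLDER.sub(lambda m: config.get(m.group(1).lower(), ""), text)
```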

jamtat 01-31-2012 11:24 PM

I think I may not have properly conveyed where the equals sign gets placed in my outlines: it immediately precedes the text at each level. Look, for example, at the bullets in the sample outline below:
  • Level 1 number one
    • point a. of level 1 number one
    • point b. of level 1 number one
  • Level 1 number two
    • point a. of level 1 number two
In my nano outlines, the equals sign takes the place of the bullets in the example seen above--immediately preceding the text and indented along with it--which is why I've called them "pseudo-bullets." Sorry I did not make that clearer.

In case this might help, the color-highlighting rules I've entered into .nanorc (which also show some regular expressions that tell nano where to apply highlighting) are as follows:
Code:

syntax "outl" "\.outl$"
color brightwhite "(^)=.*$"
color brightred "(^[[:blank:]])=.*$"
color brightgreen "(^[[:blank:]]{2})=.*$"
color magenta "(^[[:blank:]]{3})=.*$"
color brightblue "(^[[:blank:]]{4})=.*$"
color brightyellow "(^[[:blank:]]{5})=.*$"
color cyan "(^[[:blank:]]{6})=.*$"
color brightred "(^[[:blank:]]{7})=.*$"
color brightgreen "(^[[:blank:]]{8})=.*$"
color magenta "(^[[:blank:]]{9})=.*$"
color brightblue "(^[[:blank:]]{10})=.*$"

Thanks for your continuing input on this, Nominal.

James

Nominal Animal 01-31-2012 11:44 PM

I edited my post above. Does it match your use case better this way?

Note that I assume that all whitespace after the = is part of the text, not considered "indent" when selecting the outline level. I think, but am not sure, that this matches your use patterns.

sundialsvcs 02-01-2012 08:03 AM

As for me, given that there are so many good scripting tools available, any of which can be used with a simple #!shebang such that the user doesn't have to know or care, I don't choose to use bash (or ksh) for scripting purposes.

The compelling advantage of using "the big guns," in addition to the simple fact that they are intended for the purpose and are therefore often a good bit easier to use (IMHO) than bash, is that you can grab complete and well-developed "packages" to use along with your language-of-choice. For instance, if you need to parse an Apache log file, well (for example, if you decide to use Perl), go to http://search.cpan.org, type in the search term "apache log," and there you have a list of (at the present moment) 373 full-featured packages to choose from. Install the package of your choice, use it in your script, and all of the functionality in that package is something that you didn't have to write. That's huge. And all of the languages are similar ... Perl, Python, Ruby, even PHP. The end-user of your command just invokes the command, identically to the way they'd invoke a bash-script, and doesn't have to know or care how it works.

Always remember that, no matter what it is that you're trying to do, it has certainly been done before by someone else, and done very well. These complete and platform-independent packages of well tested tools are free for the asking.

jamtat 02-01-2012 10:11 PM

EDIT: I take back what I've said below. It seems somehow the mark-up got removed from the template. And now, back to your regularly scheduled programming (no pun intended) . . .

Thanks for your continued input, Nominal. The edited version does match my use case better. I've noted one minor anomaly, though: it's somehow removing the \today mark-up from the header. I've tried it twice now with the same result each time. Since the code is so complex, I cannot determine from looking at it where or why that might be happening.

James


All times are GMT -5.