scripting help/advice; use bash?
Hi. I recently did some tweaking to nano so that I could create outlines with it that look good on the screen. I also need to print out those outlines and have them look nice on paper and I've come up with a way of doing that which involves inserting--manually, for now--TeX/LaTeX mark-up, then changing the file's extension to .tex and running pdflatex on it. You can read about the project at audaciousamateur.blogspot.com for more details if you're interested.
It seems to me that, even for someone with my limited knowledge, there should be some non-manual way to add the mark-up to my outline files. Someone who knows perl or python well, for example, could probably easily cobble together some way of doing this task using one of those languages. But I know next to nothing about either language. In another forum where I asked about this I was directed to The Advanced Bash Scripting Guide. I was kind of gravitating toward bash anyway for this since, if I can lay somewhat dubious claim to being familiar with any sort of scripting, it would be using bash (I've created some extremely rudimentary bash scripts in the past). But I have such a poor grasp of even bash that I really wasn't sure it could process these files. Well, I actually located in the ABS a sample script that converts a text file to html--something very close to what I need to do. In short, I need to add some lines at the beginning of the file and append some at the end, as well as to insert some mark-up within the file: that's pretty much what the bash script I found does as well (see the script at http://www.tldp.org/LDP/abs/html/con...ts.html#TOHTML ). I just want to start off this thread by asking those much better versed in bash whether I'm on the right track in considering the ABS sample script as being a good starting point for a script that could be used to process my outline files? Thanks, James |
It's hard to answer without specific information about the format of your file before and after inserting the TeX/LaTeX markup.
Bash is not great at string manipulation but it can use sed to do the complex work for it as done in the linked script. awk might be a better choice. If you know C, awk is relatively easy to learn -- easier than bash. Can you post an illustrative example of the input file format and the desired output? |
If I understand this, you want simple text files with simple markup that are directly readable for a human, but also publishable. There are (wiki-like) documentation systems just for that.
You might want to take a step back and consider one of those already round wheels, e.g. http://sphinx.pocoo.org/rest.html#li...te-like-blocks, or actually a personal wiki, which is what I use for quick notes. All wikis convert to HTML obviously, good ones also to PDF. |
Thanks for the answers thus far. There are illustrations at my blog (I posted the address in the OP), but I'll repeat some of that here.
The outline text file is, obviously, an outline. Each level of the outline gets indented 0 or more tab spaces from the left margin. Unindented lines are the level one parts of the outline; lines indented one tab space from the left margin are level two parts; lines indented two tab spaces from the left margin are level three parts; and so on. I've designated a unique character--the equals sign--as a sort of pseudo-bullet for all outline levels as well. Here's a link to a screenshot of a sample outline I did that should better illustrate visually what I'm describing: http://1.bp.blogspot.com/-o2ZlVk8sLL...s1600/Scr1.png So, here's what needs to be done to this text file so as to make it print nicely on paper. Nine lines need to be prepended to the beginning. Those lines are: Quote:
Quote:
I hope this gives enough further detail to determine whether a bash script is the right tool, or even a possible tool, to use for this job. As I said, it seems to me the bash script for converting a text file to html works very similarly to what I need--though my scenario is actually a bit simpler in that mark-up only needs to be added in a certain relation to new lines. The bash script I found, so far as I can understand it, needs to do replacements within lines and paragraphs and so, it seems, calls sed. Further input will be appreciated. And by the way, I do not know C or any other programming language. The only thing remotely resembling programming that I have any familiarity with at all is some rudimentary html and, as I said, very rudimentary bash scripting. James |
Quote:
The reason I like the solution I'm proposing is that I can, using TeX/LaTeX mark-up, essentially create a template that will render the printed output in just the form I want it. I can, for example, control margin width, font size, line spacing, header content--even doing tricky things like having the date auto-inserted in the header. So far as I understand it I would have to get involved in a lot of additional tweaking of the file in order to get that kind of output from a wiki file. But I'll certainly be giving the matter some more thought. James |
I don't know about bash, but here is a way to do it with Perl:
(change $tab_limit value if you need the script to handle more than 10 tabs) Code:
#!/usr/bin/perl
Make it executable (chmod +x edit_tabs.pl) and use it like this:
Code:
./edit_tabs.pl yourfile.txt > newfile.txt
I found a better version that removes the need to limit the tab count. It also removes the equals sign, since that was one of the requirements (and the previous script did not satisfy it).
Code:
#!/usr/bin/perl |
Quote:
Now testing . . . Wow. That works pretty well (though I did have some anomalies at first that resulted from some weirdness introduced when I copied and pasted the code). I note that in newfile.txt your script gets rid of the tab spaces where the \outl{#} tags get inserted. Of course pdflatex doesn't care whether or not there are tab spaces at those points and formats the file just fine for printing anyway. But for my purposes, preserving the tab spaces found in the original outline is helpful: I can make better sense of the file visually when the tab spaces are present at those points. So, is there a way to modify your perl script so that it preserves the tab spaces that occur in the original outline in conjunction with the equals signs? Otherwise, this looks like it could be a great solution. James |
Would replacing the line
Code:
my $replace = '\\outl{' . $i . '}';
with
Code:
my $replace = '^\t{' . ($i). '}\\outl{' . $i . '}';
do the trick? Thanks, James
Never mind. That doesn't work--it just prepends the characters ^\t{#} to lines that begin with \outl{#} |
If you want to preserve tabs, change $replace line:
Code:
my $replace = "\t" x ($i - 1) . '\\outl{' . $i . '}'; |
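For readers who do not use Perl, here is a minimal Python sketch of the same idea as the line above: keep the leading tabs and swap the equals sign for \outl{level}. It is not Cedrik's script. The function name mark_line and the pattern are made up for illustration, and it assumes that N leading tabs followed by "=" means outline level N + 1, as described earlier in the thread.

```python
import re

# Assumed input convention (from the thread): zero or more leading tabs,
# then "=" as the pseudo-bullet. N tabs means outline level N + 1.
BULLET = re.compile(r'^(\t*)=')

def mark_line(line):
    """Swap the '=' pseudo-bullet for \\outl{level}, keeping the leading tabs."""
    def repl(match):
        tabs = match.group(1)
        # The level is one more than the number of leading tabs
        return tabs + '\\outl{%d}' % (len(tabs) + 1)
    return BULLET.sub(repl, line, count=1)

if __name__ == '__main__':
    sample = ['= Top level\n', '\t= Second level\n', '\t\tno bullet here\n']
    for text in sample:
        print(mark_line(text), end='')
```

Lines that do not start with tabs plus "=" pass through unchanged, so an equals sign in the middle of a line is left alone.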
Yep, that does do it, Cedrik. Thanks again so much for helping with this! I now have a workable way of inserting the needed mark-up into my outlines!
I still may try and do this with a bash script, though. I've wanted for some time now to advance my pathetic abilities with bash, and figuring out how to do this with bash (if, as it seems to me, it will be possible with bash) would provide an opportunity to learn more about it. So if anyone has further input on whether the bash script I found that adds html mark-up to a text file could be adapted to add TeX mark-up as I'm trying to do, please weigh in. James |
I would use awk instead of Bash, because awk has all the necessary string facilities, whereas with Bash they're a bit lacking. Bash would certainly be a LOT slower.
Here is a plain awk script. You can supply it
You can add further variables, especially ones similar to the title, very easily. I've tried to comment the code well; I want it to be an example and explanation, and not just a suggested solution. It should work well even with mixed tabs and spaces. It uses the % minimum-indentation \\outl{level} comments in the template (where minimum-indentation is either empty or desired whitespace string). Within the input, extra spaces or tabs do not matter; extra indentation up to the next outline level is accepted. It does not require empty lines between outline levels, as it tracks the preferred outline level for each line, and only inserts the outline definition before the first non-whitespace character when the outline level changes. Code:
#!/usr/bin/awk -f
Code:
inotifywait -q -m -e close_write,moved_to --format '%w%f' -r . | while read FILE ; do
If you like to use evince to look at the PDFs, you'll soon notice it lacks the option to watch the files; that is, it will not automatically reload the PDF file when the file changes. (You need to hit Ctrl-R to see the updates.) To make life easier, you could run in yet another shell
Code:
inotifywait -q -m -e moved_to --format '%w%f' -r . | while read FILE ; do
Pretty nifty, eh? In a different thread I tried to explain why the Unix philosophy, using small interchangeable modules to construct complex tools, is way better than large monolithic applications that direct you to work in a certain way. Above, you only use bash, awk, pdflatex, inotifywait, evince, and your favourite text editor nano (my preference too, actually!), to construct a fully automated document generation suited to your needs. Talk about powerful... |
Wow. That is some script you've put together, Nominal. I'm impressed. And even more impressed by the additional suggestions for how to keep the various forms of the files updated. That looks like a lot of work. I'm truly grateful.
That said, understanding the workings of this script is way beyond me. I've looked at it several times now to see if I can get some idea of what does what. I get lost almost immediately and have to give up. I have tested it though, and it is, of course, quite effective. I assumed it needed to be saved with an *.awk extension and would then need to be chmod +x'd, so I did that. I wasn't sure yet whether it should be used in the same way as Cedrik's (i.e., script.awk outline.file > tex-file.tex), but some brief experimentation cleared that up. I was able this way to do a largely successful first run. By largely successful what I mean is that the script properly identified and marked up most, though not all, outline levels. I do have some questions about that, but those will need to be prefaced by a bit of explanation. Maybe I can move to that later in this response--assuming you'll be able to devote a bit more attention to clarifying some things.
But first, I have some other questions. I don't quite understand the title aspect of the script. It seems you might be allowing here for some way to sort of automate entry of the header text (i.e., what goes between curly brackets at {***Header*title*here***})--certainly a helpful addition: have I understood correctly? If so, I'm failing to gather from looking at the script where the input for that is supposed to come from. Is it reading some part of the input file for that information? Further clarification on that part of the script will be appreciated.
I haven't quite understood what you've said about your script's handling of spaces as opposed to tabs, either. That touches on an issue I was struggling to comprehend: namely, whether any script that could process these files would be able to distinguish tab spaces from single spaces. As you may be aware, the nano tweak I applied in order to get nano's color highlighting to work on my outline files does not distinguish between the two. 
It sounds as though your script treats them the same, true? If so, that seems like a plus. Which brings up another issue I'm wondering about: since your script does not seem to rely on my pseudo-bullet (the equals sign) how does it distinguish an outline level from, say, a wrapped line? I never managed to understand whether, when nano does line wrapping (which I have it set to do), it inserts an end-of-line mark then a new-line mark at the beginning of the next line. If your script searches for (regular expression) new-line marks, then those must occur only when a carriage return is entered, rather than at points where nano simply wraps a line? Clarification on that will be appreciated. On the pseudo-bullet character I've chosen, I'll just mention that I chose it for two reasons. One is that it can help me better to distinguish outline levels when I'm looking, on a screen, at one of my outline files under nano. Perhaps just as importantly though, I decided such a unique character might be needed in order for some search-and-replace script to even work. Yet your script seems to work fairly effectively even without the presence of the pseudo-bullet (something I discovered by accident, btw). Can you clarify how that happens? What I was trying to do in some initial experiments with searching and replacing regular expressions using nano's built-in search-and-replace function, was to get it to detect instances of end-of-line followed by new-line. I couldn't get it to detect such a combination and was unsure, in any case, whether line wrapping would entail that end-of-line/new-line combination as well. So I decided the pseudo-bullet would probably be needed and, in any case, would be helpful in distinguishing outline levels under nano on the screen. So I introduced it. Maybe I should be rethinking that? I think I'll leave my questions at that for now and perhaps pose others later, if you will have any more time to devote to this thread. 
In conclusion, yes, what you've put together is truly nifty. Thanks again for your input on this! James |
Hi jamtat,
I know you have a working solution, but I wanted to respond. I'm forcing myself to do Python scripting so I can learn it. So, since nobody has posted a Python solution, I'll post mine. It will work the same way that Cedrik's Perl script does (i.e. "script.py inputfile.txt > outputfile.txt").
EDIT: I modified the script based on Cedrik's point (in a later response) that tabs appearing after the equals sign would cause problems with the "\outl{}" text. /EDIT
EDIT2: This script runs on my 2.6.6 Python interpreter. As jamtat later discovers, it will not work for 3.2.2. The problem is a change in Python's print syntax: print is a statement in Python 2 and a function, print(), in Python 3. An updated script for 3.2.2 is posted on the next page of this thread. /EDIT
I named the script "texoutline.py" but as long as you use the ".py" extension and adjust your path for python at the top (if necessary), it should work:
Code:
#!/usr/bin/python |
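On the print() difference mentioned in EDIT2: the sketch below is not Dark_Helmet's script, just one common way to write output code that runs on both Python 2.6+ and Python 3. The helper names outline_line and emit are hypothetical, and the mapping of N tabs to level N + 1 is the assumption used elsewhere in the thread.

```python
from __future__ import print_function  # no effect on Python 3; enables print() on 2.6+

import sys

def outline_line(tab_count, text):
    """Build one marked-up line: the original tabs, then \\outl{level}, then the text."""
    return '\t' * tab_count + '\\outl{%d} %s' % (tab_count + 1, text)

def emit(lines, out=None):
    """Write lines using print() as a function, which both interpreters accept."""
    if out is None:
        out = sys.stdout
    for line in lines:
        print(line, file=out)

if __name__ == '__main__':
    emit([outline_line(0, 'First point'), outline_line(1, 'Sub-point')])
```

With the __future__ import in place, a bare Python 2 style "print line" becomes a syntax error on both interpreters, which makes the incompatibility show up early instead of only after an upgrade.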
@Dark_Helmet: Nice; MUCH easier to read than mine, that's for sure!
Quote:
Quote:
The regular expression for ***Header*title*here*** allowing for leading capitalization is /\*\*\*[Hh]eader\*[Tt]itle\*[Hh]ere\*\*\*/ which hurt my eyes, so I chose an easier string. Quote:
Quote:
Quote:
Consider this logic:
Whenever you get a new line of input, you check which indentation level that line needs. In my script, the number of whitespace columns is spaces, the level at current line is newlevel, and the level the last line printed was on is level. level is initialized to zero, so that you get the initial outline level set for the first word. If newlevel and level differ for a line, you need to set level=newlevel and insert the outline command just after the whitespace on that line. That is all. Quote:
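The logic above can be sketched in a few lines of Python. This is not the awk script from this thread, only an illustration of the same bookkeeping with the variables level and newlevel. TAB_WIDTH and LEVEL_COLUMNS are assumptions made up for the example; the awk script takes its indentation rules from the template instead.

```python
TAB_WIDTH = 8                    # assumption: a tab advances to the next multiple of 8
LEVEL_COLUMNS = [0, 8, 16, 24]   # assumption: minimum column for levels 1 to 4

def leading_columns(line):
    """Count whitespace columns before the first non-blank character."""
    columns = 0
    for ch in line:
        if ch == ' ':
            columns += 1
        elif ch == '\t':
            columns += TAB_WIDTH - columns % TAB_WIDTH
        else:
            break
    return columns

def level_for(columns):
    """Return the deepest level whose minimum column is not past `columns`."""
    level = 1
    for number, minimum in enumerate(LEVEL_COLUMNS, start=1):
        if columns >= minimum:
            level = number
    return level

def add_outline_marks(lines):
    level = 0   # level of the last line printed; 0 forces a mark on the first line
    result = []
    for line in lines:
        body = line.lstrip(' \t')
        if not body.strip():
            result.append(line)   # blank line: pass through, level unchanged
            continue
        newlevel = level_for(leading_columns(line))
        if newlevel != level:
            level = newlevel
            indent = line[:len(line) - len(body)]
            # Insert the outline command just after the leading whitespace
            line = indent + '\\outl{%d}' % level + body
        result.append(line)
    return result
```

Because only a change of level triggers a mark, a continuation line at the same indentation gets nothing inserted, and no pseudo-bullet is needed to find the levels.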
_ _ _ I think there might be a better interface for you, though. Assume you sprinkle LaTeX comments into your input text file, something like Code:
% Title: This is the document title
The Outline line in the input text would define your indentation levels. Lines without indentation are always on the first indentation level, so 1 marks the left margin. Since 2 is three columns to the right, second outline level requires at least three spaces. With none, one, or two spaces at the start of a line, you're at outline level 1.
The template file would not have anything related to indentation, since the script would do that automatically, based on just the Outline: line. If there is an empty Outline: line (either in the input text file, or in the configuration file), then the script would skip indentation-to-outline-level mapping altogether.
In the template LaTeX file, you could use e.g. ~Author~ to insert the string from the corresponding comment. You can make that automatic, i.e. you can add whatever strings to both the template and the input text, as long as the keyword only contains letters A-Z, a-z, and maybe digits 0-9 and dashes -, so detection is as reliable as possible. I'd like to avoid using the % character, so that the templates themselves would stay valid LaTeX.
You could even add snippet support: For example,
Code:
% =snippetname [string]
The main thing with these is to think up suitable patterns -- like I have for the ones with ~ above -- that can be simply replaced from the template, without matching unrelated LaTeX code. Using % would be good because it is a comment character, but %, & and 8 are so easy to confuse, I picked ~ instead. |
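A rough Python sketch of the ~Keyword~ idea described above, assuming the comments in the outline text look like "% Keyword: value" and the keyword uses only letters, digits, and dashes. The function names read_definitions and fill_template are invented for the example, and snippet support is left out.

```python
import re

# "% Keyword: value" comment lines in the outline text
DEFINITION = re.compile(r'^%\s*([A-Za-z0-9-]+):\s*(.*?)\s*$')
# "~Keyword~" placeholders in the template
PLACEHOLDER = re.compile(r'~([A-Za-z0-9-]+)~')

def read_definitions(lines):
    """Collect keyword/value pairs from comment lines; other lines are ignored."""
    values = {}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values

def fill_template(template, values):
    """Replace known ~Keyword~ placeholders; leave unknown ones exactly as they are."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)

if __name__ == '__main__':
    outline = ['% Title: This is the document title\n', '= first point\n']
    print(fill_template('\\title{~Title~}', read_definitions(outline)))
```

Leaving unknown keywords untouched matters because ~ is also the LaTeX non-breaking space, so a template can contain pairs of tildes that have nothing to do with placeholders.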
@Dark_Helmet
Code:
tabCount = len( inputLine.split('\t') )
But doesn't that assume there will never be tabs anywhere other than at the start of the line? (I mean for using the tab count in \outl{x}.)
@jamtat FWIW, I have improved the Perl script; I noticed it didn't replace the equals sign and have corrected that. |
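To make the concern concrete, here is a small Python comparison of the split-based count with a count of leading tabs only. Both function names are made up for the example; neither is taken from the posted script.

```python
def split_based_count(line):
    # Counts every tab on the line, plus one, wherever the tabs occur
    return len(line.split('\t'))

def leading_tab_count(line):
    # Counts only the tabs before the first non-tab character
    return len(line) - len(line.lstrip('\t'))

if __name__ == '__main__':
    clean = '\t\t= item\n'
    stray = '\t\t= item\twith a tab\n'
    for text in (clean, stray):
        print(split_based_count(text), leading_tab_count(text) + 1)
```

On a line with no stray tabs the two agree once 1 is added to the leading count. A tab later in the line inflates only the split-based number, so the leading count is the safer basis for the level in \outl{x}.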