Newbie SED / AWK / Regex command help request
I'm not too familiar with some of the regex syntax, and I am seeking help with a text parser. I believe SED / AWK would probably do it...
I have a text file, basically a code file. The interpreter doesn't like tabs or comments, and wants each command on its own line, with the complete command on one line only. I do like formatting; it helps me read the code and understand what is going on when I look at it a week later. Different text editors will replace a <tab> with various combinations of special characters or consecutive spaces. So I need to replace:
- tab
- /t
- carriage return / line feed
- everything on a line after // (comments)
- consecutive spaces
... with a single space (in that order, so that consecutive replacements don't add up to multiple spaces; the last replacement is the consecutive spaces), e.g.
Code:
//Test for the start command
Code:
Digital01 = if ((Mode = 2 OR Mode = 3) AND (Calculated_Start = 1 OR Manual Start = 1) Then 1, Else Digital01)
Critcho. |
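One way to read the replacement order requested above, as a minimal sketch and not a tested solution for the poster's interpreter. The function name `flatten_code` is made up for illustration. Comments are stripped first here, because once the lines are joined a `//.*` pattern would swallow everything after the first comment. The literal `/t` item from the list is not handled.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): flatten formatted code to one line.
# Order of steps:
#   1. strip // comments, 2. turn tabs into spaces,
#   3. drop carriage returns, 4. turn line feeds into spaces,
#   5. collapse runs of spaces last, then trim both ends.
flatten_code() {
    tab=$(printf '\t')   # a literal tab, so no \t support is needed in sed
    sed -e 's|//.*||' -e "s/${tab}/ /g" |
        tr -d '\r' |
        tr '\n' ' ' |
        sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}
```

Usage would be along the lines of `flatten_code < input.txt > output.txt`, with both file names being placeholders.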
Is this homework?
A bit of an ugly hack, but I got this working with gawk: several calls to its string function sub(), and concatenation of each line to a single long string that gets displayed at the end. http://ftp.wayne.edu/pub/gnu/Manuals...mono/gawk.html |
Nope, not cheating on homework. This is part of a much larger program that I am working on with a team. It is a minor addition to the project, so none of the main engineers want to divert time to write and test the script, and I am pretty busy at the moment; learning awk looks like it will take me some time...
|
Ok, I warned you -- it ain't pretty.
Code:
[simba@lk ~]$ cat formatted-code
Good luck. :) |
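The approach described above (several `sub()` calls, with each line concatenated onto one long string that is printed at the end) could look roughly like this. This is a sketch under assumptions, not the script that was posted: the function name `join_with_awk` and the variable `joined` are invented here, and it sticks to plain awk features so it should not need gawk specifically.

```shell
#!/bin/sh
# Hypothetical sketch (not the script from the thread).
join_with_awk() {
    awk '
    {
        sub(/\/\/.*/, "")         # drop a comment marker and everything after it
        gsub(/[\t\r]/, " ")       # tabs and carriage returns become spaces
        joined = joined " " $0    # append this line to one long string
    }
    END {
        gsub(/  +/, " ", joined)  # collapse runs of spaces last
        sub(/^ /, "", joined)     # trim the leading space
        sub(/ $/, "", joined)     # trim the trailing space
        print joined
    }'
}
```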
anomie, thanks for your help.
The script as you have it doesn't work on my OS (embedded Linux - BusyBox v1.00), but by following your lead (and studying just the regexes you used rather than the whole book) I have got most of the way with:
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d' temp.lge
Code:
Digital01 = if ( |
Sure - the regexp to search for to get the carriage returns will look something like:
"\r$" (which means match a carriage return at the very end of the line) or "\r" (match a carriage return anywhere in the line) |
in other words,
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d; s/\r$//' temp.lge
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d' temp.lge | tr '\n' ' '
|
Yeah, I had tried that, but it replaced all my "r"'s with spaces....
Code:
[user@Dreadnaught config]$ cat test3.lge.txt |
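Going by the report above, the sed on that system seems to treat `\r` as a plain `r`. Two possible workarounds that do not rely on sed understanding the escape are sketched below. Both function names are hypothetical, and neither was tried on BusyBox v1.00.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread).

# Option 1: let tr interpret the \r escape and delete every carriage return.
strip_cr() {
    tr -d '\r'
}

# Option 2: build a real carriage return with printf and hand it to sed,
# removing it only at the end of each line.
strip_cr_sed() {
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}
```

Either one can go at the front of the pipeline, for example `strip_cr < temp.lge | sed ...`.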
Quote:
I just played with the order a bit to remove double spaces created by the tr command (it can't take null / '' as a second argument). Now I would like to be able to join lines that are not separated by blank lines (i.e. convert paragraphs into lines, but one line per paragraph). It's getting tougher, but I'm getting much closer... My interim solution of writing the code with -- on blank lines will do me for now, i.e.
Code:
[user@Dreadnaught config]$ cat test3.lge.txt |
ah, well, try this:
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d' temp.lge | tr -d '\n' | sed -e 's/  */ /g'
To keep the paragraphs, you will need to change it a little more:
Code:
sed 's/  */ /g; s/\/\/.*//g; s/ *$//; s/^ *//; s/^$/\/\//' tmp.lge | tr '\n' ' ' | sed -e 's/\/\//\n/g'
|
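A different technique for the same "one line per paragraph" goal is awk's paragraph mode, where an empty `RS` makes blank lines the record separator. This is a swapped-in alternative to the marker trick above, not what was posted, and it assumes the awk on the target system supports an empty `RS`. The function name `join_paragraphs` is invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch (not from the thread): one output line per paragraph.
join_paragraphs() {
    # Strip comments and trailing whitespace first, so that comment-only
    # lines become blank lines (and therefore paragraph breaks).
    sed -e 's|//.*||' -e 's/[[:space:]]*$//' |
        # RS = "" splits records on blank lines; assigning $1 = $1 rebuilds
        # the record with single spaces between its fields.
        awk 'BEGIN { RS = "" } { $1 = $1; print }'
}
```

As with the marker version, a line holding only a comment ends the current paragraph, which may or may not be wanted.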
OK, I think I have the final final working version :)
Code:
sed 's/\/\/.*//g; s/ *$//; s/^$/\/\//' $1.lge.txt | tr '\n' ' ' | sed 's/\/\//\n/g; s/  */ /g' | sed 's/^ //g; /^[#tab]*$/d' > $1.lge
Code:
[user@Dreadnaught config]$ cat cleanlge.sh
Thanks all! |