Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
I'm not too familiar with regex syntax, and I am seeking help with a text parser. I believe sed or awk would probably do it...
I have a text file, basically a code file.
The interpreter doesn't like tabs or comments, and it wants each complete command on its own single line. I do like formatting, though: it helps me read the code and understand what is going on when I look at it a week later.
Different text editors will replace a <tab> with various combinations of special characters or consecutive spaces.
So I need to replace:
- tab
- /t
- carriage return / line feed
- everything on a line after // (comments)
- consecutive spaces
... with a single space. The replacements should run in that order, with the consecutive-spaces replacement last, so that the earlier replacements don't add up to multiple spaces.
e.g.
Code:
//Test for the start command
Digital01 = if (
( //system is enabled
Mode = 2
OR Mode = 3
)
AND
( //start request received
Calculated_Start = 1
OR Manual_Start = 1
)
Then
1, //start
Else
Digital01) //Unchanged
... gets replaced with
Code:
Digital01 = if ((Mode = 2 OR Mode = 3) AND (Calculated_Start = 1 OR Manual_Start = 1) Then 1, Else Digital01)
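The replacement order listed above can be sketched as a small pipeline. This is only an illustration, not something posted in the thread: `flatten` is a made-up function name, the sample input is a shortened copy of the example, and it assumes real tab characters (not a literal "/t" sequence). It keeps a space after each opening bracket, so the output is close to the desired form rather than identical to it.

```shell
# Hypothetical helper: strip comments, turn tabs/CR/LF into spaces,
# then collapse runs of spaces last so nothing adds up to a double space.
flatten() {
    sed 's|//.*||' |                     # 1. drop everything after // on each line
    { tr '\t\r\n' '   '; echo; } |       # 2. tab, CR, LF -> space; echo restores a final newline
    sed 's/  */ /g; s/^ //; s/ $//'      # 3. collapse consecutive spaces, trim both ends
}

result=$(printf '//Test for the start command\nDigital01 = if (\n\t( //system is enabled\n\t\tMode = 2\n\t\tOR Mode = 3\n\t)\nThen\n\t1, //start\nElse\n\tDigital01) //Unchanged\n' | flatten)
printf '%s\n' "$result"
```

Doing the space collapse as the final step is what keeps the earlier replacements from stacking up into runs of spaces.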
A bit of an ugly hack, but I got this working with gawk: several calls to its string function sub(), and concatenation of each line to a single long string that gets displayed at the end.
Nope, not cheating on homework. Part of a much larger program that I am working with a team on, just that it is a minor addition to the project so none of the main engineers want to divert time to write and test the script, and I am pretty busy at the moment, learning awk looks like it will take me some time...
[simba@lk ~]$ cat formatted-code
//Test for the start command
Digital01 = if (
( //system is enabled
Mode = 2
OR Mode = 3
)
AND
( //start request received
Calculated_Start = 1
OR Manual_Start = 1
)
Then
1, //start
Else
Digital01) //Unchanged
[simba@lk ~]$ gawk --posix '{ sub(/\/\/.*$/,""); str=(str $0); gsub(/[[:space:]]{2,}/," ",str); }END{ print str; }' formatted-code
Digital01 = if ( ( Mode = 2 OR Mode = 3 ) AND ( Calculated_Start = 1 OR Manual_Start = 1 ) Then 1, Else Digital01)
[simba@lk ~]$ gawk --version | head -1
GNU Awk 3.1.3
Very close to your desired output, but not exact. You might need to play around with it and tweak it a bit. (Which will require learning g/awk and regular expressions if you decide to go this route.)
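For awk builds that don't handle the {2,} interval syntax, the same idea can probably be written with a plain + instead. The following is a rough sketch under that assumption; `flatten_awk` is a hypothetical name and the sample input is a shortened version of the file above.

```shell
# Hypothetical helper: same approach as the gawk one-liner, without --posix.
flatten_awk() {
    awk '
        {
            sub(/\/\/.*$/, "")      # strip the // comment from this line
            str = str " " $0        # append the line, separated by one space
        }
        END {
            gsub(/[ \t\r]+/, " ", str)   # collapse runs of space, tab, CR
            sub(/^ /, "", str)           # trim the leading space
            sub(/ $/, "", str)           # trim the trailing space
            print str
        }
    '
}

result=$(printf 'Digital01 = if (\n\t( //system is enabled\n\t\tMode = 2\n\t)\n' | flatten_awk)
printf '%s\n' "$result"
```

Joining the lines with an explicit space, then collapsing once in the END block, avoids words running together when a line has no trailing whitespace.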
The script as you have it doesn't work on my OS (embedded Linux, BusyBox v1.00), but by following your lead (and studying just the regexes you used rather than the whole book) I have got most of the way with:
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d' temp.lge
this returns:
Code:
Digital01 = if (
(
Mode = 2
OR Mode = 3
)
AND
(
Calculated_Start = 1
OR Manual_Start = 1
)
Then
1,
Else
Digital01)
Sure - the regexp to search for to get the carriage returns will look something like: "\r$" (which means match a carriage return at the very end of the line)
or "\r" (match a carriage return anywhere in the line)
Yeah, I had tried that, but it replaced all my "r"'s with spaces....
Code:
[user@Dreadnaught config]$ cat test3.lge.txt
Line 1 has words with the letter r in it
Line 3 ends with a space
Line 5 next line is 2 spaces
[user@Dreadnaught config]$ sed 's/  */ /g; s/\/\/.*//g; /^$/d; s/\r/ /g' test3.lge.txt >test3.lge
[user@Dreadnaught config]$ cat test3.lge
Line 1 has wo ds with the lette in it
Line 3 ends with a space
Line 5 next line is 2 spaces
[user@Dreadnaught config]$
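When a sed build treats \r as a plain letter r, as in the transcript above, two workarounds may help. This is a sketch, not tested on BusyBox v1.00: `strip_cr_all` and `strip_cr_eol` are made-up names. The first deletes every carriage return with tr; the second passes sed a real carriage-return character, so sed never has to interpret the \r escape itself.

```shell
# Capture a literal carriage return (command substitution only strips newlines).
cr=$(printf '\r')

# Hypothetical helper: delete every CR anywhere in the input.
strip_cr_all() { tr -d '\r'; }

# Hypothetical helper: delete a CR only when it is the last character on a line.
strip_cr_eol() { sed "s/${cr}\$//"; }

result_all=$(printf 'letter r here\r\nsecond\rline\r\n' | strip_cr_all)
result_eol=$(printf 'letter r here\r\nsecond\rline\r\n' | strip_cr_eol)
printf '%s\n' "$result_all"
```

Either way, the letter r in ordinary words is left alone.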
Sure does!! Yup, that is getting me much closer! Thanks heaps!
I just played with the order a bit to remove double spaces created by the tr command (it can't take null / '' as a second argument). Now I would like to be able to join lines that are not separated by blank lines (i.e. convert paragraphs into lines, but one line per paragraph). It's getting tougher, but getting much closer...
My interim solution of writing the code with -- on blank lines will do me for now.
i.e.
Code:
[user@Dreadnaught config]$ cat test3.lge.txt
Line 1 has words with the letter r in it
//Comments on line 2
--
Line 4 ends with a space
--
Line 6 next line is 2 spaces
--
All above lines should stay on their own line
--
Lines 10 through 12
are considered a paragraph
they should end up on one line
[user@Dreadnaught config]$ sed 's/\/\/.*//g; /^$/d' test3.lge.txt | tr '\n' ' ' | tr '\-\-' '\n' | sed 's/\  */ /g; s/^ //g' >test3.lge
[user@Dreadnaught config]$ cat test3.lge
Line 1 has words with the letter r in it
Line 4 ends with a space
Line 6 next line is 2 spaces
All above lines should stay on their own line
Lines 10 through 12 are considered a paragraph they should end up on one line [user@Dreadnaught config]$
sed 's/  */ /g; s/\/\/.*//g; /^$/d; ' temp.lge | tr -d '\n' | sed -e 's/  */ /g;'
Edit: on closer inspection, I read about the paragraphs. This only solves the multiple spaces. Note the tr -d '\n': this is how you do the null replacement you were attempting (-d = delete).
To keep the paragraphs, you will need to change it a little more:
Code:
sed 's/  */ /g; s/\/\/.*//g; s/ *$//; s/^ *//; s/^$/\/\//' tmp.lge | tr '\n' ' ' | sed -e 's/\/\//\n/g'
I kept the space in the tr command, since otherwise some commands ended up strung together with nothing between them...
Last edited by pokemaster; 03-16-2007 at 05:27 PM.
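Another way to get one line per paragraph, offered only as a sketch, is awk's paragraph mode: with RS set to the empty string, awk treats runs of blank lines as record separators. `join_paragraphs` is a hypothetical name, and it assumes blank lines (or comment-only lines) separate the commands.

```shell
# Hypothetical helper: one output line per blank-line-separated paragraph.
join_paragraphs() {
    sed 's|//.*||; s/[[:space:]]*$//' |            # strip comments, then trailing whitespace
    awk 'BEGIN { RS = "" } { $1 = $1; print }'     # paragraph mode; $1=$1 rejoins fields with single spaces
}

result=$(printf 'Line 1 has r\n//Comments on line 2\n\nLines 10 through 12 //c\nare a paragraph   //more\nthey end up on one line\n' | join_paragraphs)
printf '%s\n' "$result"
```

Assigning $1 to itself makes awk rebuild the record from its fields, which also collapses tabs and repeated spaces, so no separate cleanup pass is needed.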
OK, I think I have the final final working version
Code:
sed 's/\/\/.*//g; s/ *$//; s/^$/\/\//' "$1.lge.txt" | tr '\n' ' ' | sed 's/\/\//\n/g; s/  */ /g' | sed 's/^ //g; /^[#tab]*$/d' > "$1.lge"
And it does what I want it to!!
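For anyone reading along, here is the marker trick from that one-liner spread over several commented stages. It is a sketch rather than the exact script: `clean_lge` is a made-up name, it reads stdin instead of a file, it uses [[:space:]] in place of the tab class, and the \n in the sed replacement assumes GNU or BusyBox sed.

```shell
# Hypothetical helper: mark empty lines with //, join everything, split on the markers.
clean_lge() {
    sed 's|//.*||; s/[[:space:]]*$//; s|^$|//|' |   # strip comments and trailing blanks; empty lines become //
    { tr '\n' ' '; echo; } |                        # join all lines into one; echo restores a final newline
    sed 's|//|\n|g' |                               # each // marker becomes a line break
    sed 's/[[:space:]][[:space:]]*/ /g; s/^ //; s/ $//; /^$/d'   # tidy spacing, drop empty lines
}

result=$(printf 'Line 1\n//Comment\n\nLine 4 ends //c\n\nLines 10 //x\nare a paragraph\n' | clean_lge)
printf '%s\n' "$result"
```

Reusing // as the marker is safe here because every real // has already been removed along with its comment by the time the markers are inserted.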
Code:
[user@Dreadnaught config]$ cat cleanlge.sh
if [ $# -ne 1 ]; then
echo "cleanlge script"
echo
echo "function:"
echo " cleans comments from lge file, puts all commands on one line"
echo
echo "usage:"
echo " . ./cleanlge.sh filename"
echo " will clean up filename.lge.txt and save as filename.lge"
echo
else
echo "--------Original File------------"
cat "$1.lge.txt"
echo "--------Cleaned File-------------"
sed 's/\/\/.*//g; s/ *$//; s/^$/\/\//' "$1.lge.txt" | tr '\n' ' ' | sed 's/\/\//\n/g; s/  */ /g' | sed 's/^ //g; /^[#tab]*$/d' > "$1.lge"
cat "$1.lge"
echo
echo "---------------------------------"
fi
[user@Dreadnaught config]$ . ./cleanlge.sh test3
--------Original File------------
Line 1 has words with the letter r in it
//Comments on line 2
Line 4 ends with a space //Comments at the end of line 4
Line 6 next line is 2 spaces
All above lines should stay on their own line
Lines 10 through 12 //and they have comments on each line
are considered a paragraph //more comments
they should end up on one line
--------Cleaned File-------------
Line 1 has words with the letter r in it
Line 4 ends with a space
Line 6 next line is 2 spaces
All above lines should stay on their own line
Lines 10 through 12 are considered a paragraph they should end up on one line
---------------------------------
[user@Dreadnaught config]$ [user@Dreadnaught config]$