LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

03-15-2007, 03:32 PM   #1
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Rep: Reputation: 0
Newbie SED / AWK / Regex command help request


I'm not too familiar with some of the regex syntax, so I am seeking help with a text parser. I believe SED / AWK would probably do it...

I have a text file, basically a code file.

The interpreter doesn't like tabs or comments, and wants each command on its own line, with the complete command on one line only. I do like formatting, though; it helps me read the code and understand what is going on when I look at it a week later.

Different text editors will replace a <tab> with various combinations of special characters or consecutive spaces.

So I need to replace:
- tab
- \t
- carriage return / line feed
- everything on a line after // (comments)
- consecutive spaces

... with a single space, in that order. The consecutive-spaces replacement comes last, so that spaces left behind by the earlier replacements don't add up to multiple spaces.

e.g.
Code:
//Test for the start command
Digital01 = if (
                  ( //system is enabled
                       Mode = 2
                   OR  Mode = 3
                   )
               AND
                   ( //start request received
                       Calculated_Start = 1
                    OR Manual_Start = 1
                    )
            Then
                 1, //start
            Else
                 Digital01) //Unchanged
... gets replaced with

Code:
Digital01 = if ((Mode = 2 OR Mode = 3) AND (Calculated_Start = 1 OR Manual_Start = 1) Then 1, Else Digital01)
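A minimal sketch of the whole transformation, assuming POSIX sed and tr; flatten_code is a hypothetical name and the sketch has not been tried on BusyBox. One deliberate change from the order listed above: comments are stripped first, while the line breaks still exist, because once everything is on one line a single // would swallow the rest of the file.

```shell
#!/bin/sh
# flatten_code (hypothetical name): strip // comments, then put the whole
# input on one line with single spaces. Reads the files given as arguments,
# or standard input if there are none. Assumes POSIX sed and tr.
flatten_code() {
    # Steps, in order:
    #   1. sed removes everything from // to the end of each line.
    #      This must happen before the join, or one // would eat the rest.
    #   2. tr turns tabs, carriage returns and line feeds into spaces.
    #   3. tr -s squeezes each run of spaces down to one space.
    #   4. sed trims the single leading/trailing space left by the join.
    out=$(sed 's,//.*,,' "$@" | tr '\t\r\n' '   ' | tr -s ' ' | sed 's/^ //; s/ $//')
    # Command substitution drops trailing newlines; print exactly one.
    printf '%s\n' "$out"
}
```

It leaves a space just inside each parenthesis ("( ( Mode = 2 ..."); tightening that is a separate step.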
Thanks!!
Critcho.

Last edited by Critcho; 03-15-2007 at 03:51 PM.
 
03-15-2007, 04:28 PM   #2
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Is this homework?

A bit of an ugly hack, but I got this working with gawk: several calls to its string function sub(), and concatenation of each line to a single long string that gets displayed at the end.

http://ftp.wayne.edu/pub/gnu/Manuals...mono/gawk.html
 
03-15-2007, 05:26 PM   #3
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Original Poster
Rep: Reputation: 0
Nope, not cheating on homework. It's part of a much larger program that I am working on with a team. This is just a minor addition to the project, so none of the main engineers want to divert time to writing and testing the script, and I am pretty busy at the moment; learning awk looks like it will take me some time...
 
03-15-2007, 05:39 PM   #4
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Ok, I warned you -- it ain't pretty.

Code:
[simba@lk ~]$ cat formatted-code 
//Test for the start command
Digital01 = if (
                  ( //system is enabled
                       Mode = 2
                   OR  Mode = 3
                   )
               AND
                   ( //start request received
                       Calculated_Start = 1
                    OR Manual_Start = 1
                    )
            Then
                 1, //start
            Else
                 Digital01) //Unchanged

[simba@lk ~]$ gawk --posix '{ sub(/\/\/.*$/,""); str=(str $0); gsub(/[[:space:]]{2,}/," ",str); }END{ print str; }' formatted-code 
Digital01 = if ( ( Mode = 2 OR Mode = 3 ) AND ( Calculated_Start = 1 OR Manual_Start = 1 ) Then 1, Else Digital01)

[simba@lk ~]$ gawk --version | head -1
GNU Awk 3.1.3
Very close to your desired output, but not exact: the spaces just inside the parentheses are still there. You might need to play around with it and tweak it a bit (which will require learning g/awk and regular expressions if you decide to go this route).
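To close that last gap, the parenthesis spacing, here is a sketch in plain awk that avoids gawk-only options such as --posix and interval expressions like {2,}, which some awk versions do not support. It follows the same idea (sub() per line, concatenate, print at END), but joins the lines with a space and then tightens "( " to "(" and " )" to ")". The helper name flatten_awk is hypothetical, and it has not been checked against BusyBox awk.

```shell
#!/bin/sh
# flatten_awk (hypothetical name): reads files given as arguments, or stdin.
flatten_awk() {
    awk '
    {
        sub(/\/\/.*$/, "")           # drop the // comment on this line
        str = str " " $0             # join with a space so tokens cannot fuse
    }
    END {
        gsub(/[ \t\r]+/, " ", str)   # squeeze spaces, tabs and CRs to one space
        sub(/^ /, "", str)           # trim the leading space
        sub(/ $/, "", str)           # trim the trailing space
        gsub(/[(] /, "(", str)       # tighten "( " to "("
        gsub(/ [)]/, ")", str)       # tighten " )" to ")"
        print str
    }' "$@"
}
```

On the sample file from the first post this should print the requested line exactly.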

Good luck.
 
03-15-2007, 08:06 PM   #5
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Original Poster
Rep: Reputation: 0
anomie, thanks for your help.

The script as you have it doesn't work on my OS (embedded Linux, BusyBox v1.00), but by following your lead (and studying just the regexes you used rather than the whole book) I have got most of the way with:

Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d' temp.lge
this returns:

Code:
Digital01 = if (
 (
 Mode = 2
 OR Mode = 3
 )
 AND
 (
 Calculated_Start = 1
 OR Manual_Start = 1
 )
 Then
 1,
 Else
 Digital01)
I just need to replace the carriage returns...
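For reference, here is the same sed command split into one -e per step, with comments; the function name squeeze_and_strip is hypothetical. Note the two spaces before the star in the first expression: "  *" means one space followed by zero or more spaces, i.e. one or more spaces.

```shell
#!/bin/sh
# squeeze_and_strip (hypothetical name): the sed command above, one step per -e.
squeeze_and_strip() {
    # 1. "  *" = a space followed by " *" (zero or more spaces), so this
    #    matches runs of one or more spaces. A single " *" would also match
    #    the empty string; GNU sed, for example, then puts a space between
    #    every character.
    # 2. Delete from // to the end of the line (the comment).
    # 3. Delete lines left completely empty, e.g. comment-only lines.
    sed -e 's/  */ /g' \
        -e 's/\/\/.*//' \
        -e '/^$/d' "$@"
}
```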
 
03-15-2007, 10:44 PM   #6
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Sure - the regexp to search for to get the carriage returns will look something like:
"\r$" (which means match a carriage return at the very end of the line)
or
"\r" (match a carriage return anywhere in the line)
 
03-16-2007, 12:24 PM   #7
pokemaster
Member
 
Registered: Apr 2005
Location: Massachusetts, USA
Distribution: debian,ubuntu,slackware
Posts: 110

Rep: Reputation: 17
in other words,

Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d; /\r/d; ' temp.lge
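However, this won't work, because sed reads its input one line at a time and strips the trailing newline before running the script, so the line break is never there to match. Instead, run this: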

Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d; ' temp.lge | tr '\n' ' ' 
Does this help?
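If you would rather stay inside sed, a common idiom is to collect every line in the hold space and only replace the newlines once the last line has been read; at that point they are embedded in the pattern space, where \n can match them. A sketch, assuming a sed that accepts several commands inside { } separated by ";"; the name join_lines_sed is hypothetical. tr remains the simpler choice, and it does not keep the whole file in memory.

```shell
#!/bin/sh
# join_lines_sed (hypothetical name): join all input lines with single spaces, in sed.
join_lines_sed() {
    # H    append a newline plus the current line to the hold space
    # $    on the last line only:
    #   x          swap the hold space into the pattern space
    #   s/\n/ /g   the embedded newlines are now matchable; make them spaces
    #   s/^ //     drop the space from the newline H added before line 1
    #   p          print (-n suppresses the automatic per-line output)
    sed -n 'H; ${ x; s/\n/ /g; s/^ //; p; }' "$@"
}
```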
 
03-16-2007, 03:46 PM   #8
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Original Poster
Rep: Reputation: 0
Yeah, I had tried that, but it replaced all my "r"s with spaces....

Code:
[user@Dreadnaught config]$ cat test3.lge.txt
Line 1 has words with the letter r in it

Line 3    ends with a space

Line 5 next line is 2 spaces

[user@Dreadnaught config]$ sed 's/  */ /g; s/\/\/.*//g; /^$/d; s/\r/ /g' test3.lge.txt >test3.lge
[user@Dreadnaught config]$ cat test3.lge
Line 1 has wo ds with the lette    in it
Line 3 ends with a space
Line 5 next line is 2 spaces

[user@Dreadnaught config]$
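What happened there is that the sed in use treats \r in the pattern as a plain "r". Two sketches that avoid relying on sed's escape handling (the function names are hypothetical): tr, whose POSIX definition does include the \r escape, and sed fed a real carriage-return character built with printf.

```shell
#!/bin/sh
# 1. POSIX tr recognizes the \r escape itself. tr only reads standard input.
strip_cr_tr() {
    tr -d '\r'
}

# 2. Build a literal CR with printf and hand it to sed, so sed only ever
#    sees the raw character. This removes a CR at the end of each line.
CR=$(printf '\r')
strip_cr_sed() {
    sed "s/$CR\$//" "$@"
}
```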
 
03-16-2007, 04:31 PM   #9
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by pokemaster
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d; ' temp.lge | tr '\n' ' ' 
Does this help?
Sure does!! Yup, that is getting me much closer! Thanks heaps!

I just played with the order a bit to remove the double spaces created by the tr command (it can't take null / '' as a second argument). Now I would like to be able to join lines that are not separated by blank lines (i.e. convert each paragraph into a single line). It's getting tougher, but I'm getting much closer...

My interim solution of writing the code with -- on blank lines will do me for now.

i.e.
Code:
[user@Dreadnaught config]$ cat test3.lge.txt
Line 1 has words with the letter r in it
//Comments on line 2
--
Line 4    ends with a space
--
Line 6 next line is 2 spaces
--
All above lines should stay on their own line
--
  Lines 10 through 12
  are considered a paragraph
  they should end up on one line
[user@Dreadnaught config]$ sed 's/\/\/.*//g; /^$/d' test3.lge.txt | tr '\n' ' ' | tr '\-\-' '\n' | sed 's/\  */ /g; s/^ //g' >test3.lge
[user@Dreadnaught config]$ cat test3.lge
Line 1 has words with the letter r in it

Line 4 ends with a space

Line 6 next line is 2 spaces

All above lines should stay on their own line

Lines 10 through 12 are considered a paragraph they should end up on one line [user@Dreadnaught config]$

Last edited by Critcho; 03-16-2007 at 04:34 PM.
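For the paragraph case without the -- markers, awk's paragraph mode may be simpler: with RS set to the empty string, each group of lines up to a blank line is one record, so joining a paragraph is just a matter of replacing the newlines inside the record. A sketch assuming POSIX sed (with [[:space:]] classes) and awk; flatten_paragraphs is a hypothetical name. Comment-only lines are deleted rather than blanked, so they do not split a paragraph.

```shell
#!/bin/sh
# flatten_paragraphs (hypothetical name): one output line per paragraph.
flatten_paragraphs() {
    # sed: drop comment-only lines, strip the remaining // comments, and trim
    # trailing whitespace so whitespace-only lines become truly empty and
    # reliably separate paragraphs.
    sed -e '/^[[:space:]]*\/\//d' \
        -e 's,//.*,,' \
        -e 's/[[:space:]]*$//' "$@" |
    awk '
    BEGIN { RS = "" }             # paragraph mode: blank lines end a record
    {
        gsub(/[ \t\r\n]+/, " ")   # newlines inside the paragraph become spaces
        sub(/^ /, "")             # trim the leading space of indented lines
        sub(/ $/, "")             # trim any trailing space
        if ($0 != "") print
    }'
}
```

On a file laid out with blank lines between paragraphs, each paragraph comes out as one line and single lines stay as they are.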
 
03-16-2007, 06:15 PM   #10
pokemaster
Member
 
Registered: Apr 2005
Location: Massachusetts, USA
Distribution: debian,ubuntu,slackware
Posts: 110

Rep: Reputation: 17
ah, well, try this:
Code:
sed 's/  */ /g; s/\/\/.*//g; /^$/d; ' temp.lge | tr -d '\n' | sed -e 's/  */ /g;'
Edit: on closer inspection I see you asked about the paragraphs; this only solves the multiple spaces. Note the tr -d '\n': this is how you do the null replacement you were attempting (-d = delete).

To keep the paragraphs, you will need to change it a little more:
Code:
sed 's/  */ /g; s/\/\/.*//g; s/ *$//; s/^ *//; s/^$/\/\//' temp.lge | tr '\n' ' ' | sed -e 's/\/\//\n/g'
I kept the space in the tr command, since otherwise the last word of one line and the first word of the next could run together...
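A quick way to see why that space matters: deleting the newlines outright fuses the last word of one line onto the first word of the next, while translating them to spaces and squeezing afterwards keeps the words apart. The function names are hypothetical and exist only for this comparison.

```shell
#!/bin/sh
# Delete the newlines: "2", "OR" and "Mode" end up glued together.
join_fused() {
    tr -d '\n'
}

# Turn newlines into spaces, then squeeze any doubled spaces.
join_spaced() {
    tr '\n' ' ' | tr -s ' '
}

printf 'Mode = 2\nOR\nMode = 3\n' | join_fused;  echo   # Mode = 2ORMode = 3
printf 'Mode = 2\nOR\nMode = 3\n' | join_spaced; echo   # Mode = 2 OR Mode = 3 (plus a trailing space)
```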

Last edited by pokemaster; 03-16-2007 at 06:27 PM.
 
03-19-2007, 12:22 PM   #11
Critcho
LQ Newbie
 
Registered: Feb 2006
Posts: 20

Original Poster
Rep: Reputation: 0
OK, I think I have the final final working version

Code:
sed 's/\/\/.*//g; s/ *$//; s/^$/\/\//' "$1.lge.txt" | tr '\n' ' ' | sed 's/\/\//\n/g; s/  */ /g' | sed 's/^ //g; /^[#tab]*$/d' > "$1.lge"
And it does what I want it to!!
Code:
[user@Dreadnaught config]$ cat cleanlge.sh
if [ $# -ne 1 ]; then
        echo "cleanlge script"
        echo
        echo "function:"
        echo "  cleans comments from lge file, puts all commands on one line"
        echo
        echo "usage:"
        echo "  . ./cleanlge.sh filename"
        echo "  will clean up filename.lge.txt and save as filename.lge"
        echo
else
        echo "--------Original File------------"
        cat "$1.lge.txt"
        echo "--------Cleaned File-------------"
        sed 's/\/\/.*//g; s/ *$//; s/^$/\/\//' "$1.lge.txt" | tr '\n' ' ' | sed 's/\/\//\n/g; s/  */ /g' | sed 's/^ //g; /^[#tab]*$/d' > "$1.lge"
        cat "$1.lge"
        echo
        echo "---------------------------------"
fi
[user@Dreadnaught config]$ . ./cleanlge.sh test3
--------Original File------------
Line 1 has words with the letter r in it
//Comments on line 2

Line 4    ends with a space //Comments at the end of line 4

Line 6 next line is 2 spaces

All above lines should stay on their own line

  Lines 10 through 12 //and they have comments on each line
  are considered a paragraph //more comments
  they should end up on one line
--------Cleaned File-------------
Line 1 has words with the letter r in it
Line 4 ends with a space
Line 6 next line is 2 spaces
All above lines should stay on their own line
Lines 10 through 12 are considered a paragraph they should end up on one line
---------------------------------
[user@Dreadnaught config]$
Yah!

Thanks all!
 
  


Tags
awk, regex, sed


All times are GMT -5.