There are four great tools in the Unix/Linux world that are terrific for handling problems like these. I'll introduce them individually...
sed ("stream editor") is very useful when you have a single file that you want to do something to, producing another single file as output. (In Linux/Unix-land, sed is often applied as a "filter" when "piping" things together ... but that's another story...)
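A minimal sketch of that file-in, file-out usage (the filenames and the "foo"/"bar" substitution here are made up for illustration):

```shell
# Create a small sample input file for the sketch.
printf 'foo baz\nanother foo\n' > input.txt

# Replace every occurrence of "foo" with "bar" on every line,
# writing the result to output.txt; input.txt itself is untouched.
sed 's/foo/bar/g' input.txt > output.txt

cat output.txt
```

The `s/old/new/g` command is the workhorse: `s` substitutes, and the trailing `g` means "every occurrence on the line," not just the first.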
grep is a great tool for finding which files contain a particular string. It grows on you... For example, when I needed to find all of the files in a great big directory tree (over 3,500 files in various subdirectories) that contained the word "arp" as a whole word (that is to say, surrounded on both sides by characters that are not letters), regardless of UPPer or LoWeR CAse, I "merely" typed:

grep -rilw arp ~/projects/*

Nothin' to it...
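For reference, here's what each of those flags does, demonstrated on a throwaway directory (the paths and file contents below are invented for the sketch):

```shell
# -r : recurse into subdirectories
# -i : ignore case ("arp", "ARP", "Arp", ... all match)
# -l : print only the names of matching files, not the matching lines
# -w : match "arp" only as a whole word
mkdir -p demo/sub
printf 'the ARP cache\n' > demo/sub/network.txt   # "ARP" matches, case-insensitively
printf 'harpoon\n'       > demo/notes.txt          # "arp" inside "harpoon" is not a whole word
grep -rilw arp demo
```

Only demo/sub/network.txt is reported: the whole-word flag keeps "harpoon" from matching.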
awk is probably the tool that you want in this case. The file that you need to process has certain definitely-identifiable characteristics, such as:
- There's one "record" per "line," and it seems that "fields" in each "record" are separated by "one or more spaces."
- "A line that begins with 'O' followed by one-or-more 'digits'" marks the beginning of "something I am interested in," and when I see such a record, "the second field" (filename) "is interesting."
- After I have seen a record like that, zero-or-more records contain useful text...
- "But when I see a record starting with 'X' followed by zero-or-more 'digits'" I want to...
Well, you get the idea. awk is a tool that's designed for things like that.
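Here's a minimal sketch of what such an awk program might look like. The 'O'/'X' record markers and the sample data are just the hypothetical layout described above, not a real format:

```shell
# Made-up sample data: an 'O123...' header line naming a file,
# some body text, and an 'X...' line ending the block.
printf 'O123 alpha.txt extra\nsome useful text\nX9\n' > records.txt

awk '
  /^O[0-9]+/ { printing = 1; print "file: " $2; next }  # block starts; field 2 is the filename
  /^X[0-9]*/ { printing = 0; next }                     # block ends
  printing   { print "  " $0 }                          # body lines in between
' records.txt
```

awk's pattern { action } structure maps directly onto the description above: each regular expression picks out a kind of record, and by default awk splits every line into fields ($1, $2, ...) on runs of whitespace, which is exactly the "one or more spaces" separator we assumed.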
For the truly adventurous, the programming language perl was actually designed by a person who started his quest by "extending awk" and ... well ... "one thing led to another," as things in our peculiar industry so often do.