Old 11-03-2009, 09:08 AM   #1
cs24
LQ Newbie
 
Registered: Nov 2009
Location: Chinese
Distribution: Centos 5.2
Posts: 23

Rep: Reputation: 1
awk or sed for a solution to this question?


I have a file, fileA, like this:
>cat fileA
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
eee 999 000
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666

I want to select the lines between aaa and bbb and write them to a file.
There is one more condition: when the ccc line between aaa and bbb is "333 444", the output goes to file3; when the ccc line between aaa and bbb is "555 444", the output goes to file5.
Like this:
>cat file3
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888

>cat file5
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666

My approach is to write the lines between aaa and bbb to a temporary file, and then use grep to decide which file they belong in (file3 or file5).

When I run this command, it puts every segment (between aaa and bbb) into one file, not just the first segment:
>cat fileA | awk '/aaa/,/bbb/' > file
>cat file
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666

How can I do this? Which command should I use: awk, sed, grep, or perl? How would I write the script?
 
Old 11-03-2009, 07:39 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 239
Code:
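awk 'BEGIN{e["333 444"]="file3.txt";e["555 444"]="file5.txt"}   # ccc value -> output file
$1=="aaa"{ f=1; d=0}     # segment starts: set the flag, reset the line counter
f && $1=="bbb"{          # segment ends: flush the buffered lines, then this one
  for(i=1;i<=d;i++){
    print c[i] > (e[a["ccc"]])
  }
  print $0 > (e[a["ccc"]])
  f=0
  delete a
  delete c
}
f{                       # inside a segment: buffer the line
   c[++d]=$0
   if ($1=="ccc") a["ccc"]=$2 FS $3   # remember the ccc value, e.g. "333 444"
}' file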
output
Code:
$ ./shell.sh

$ more file5.txt
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666

$ more file3.txt
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
 
Old 11-10-2009, 03:42 AM   #3
cs24
LQ Newbie
 
Registered: Nov 2009
Location: Chinese
Distribution: Centos 5.2
Posts: 23

Original Poster
Rep: Reputation: 1
First, thank you so much. This script does exactly what I asked for, and thank you for replying so quickly.
As a beginner with awk, I am confused about some parts of this script.
Code:
awk 'BEGIN{e["333 444"]="file3.txt";e["555 444"]="file5.txt"}
$1=="aaa"{ f=1; d=0}
# This matches the first keyword; I understand this part.
f && $1=="bbb"{
# And this finds the second keyword.
  for(i=1;i<=d;i++){  
# This confuses me: d was set to 0, so i would have to be less than or equal to 0. How can this loop run?
    print c[i] > e[a["ccc"]]
# Is e[a["ccc"]] a two-dimensional array? My guess is that this sets the length of the e[a["ccc"]] array to the number of lines from "aaa" to "bbb".
  }
  print $0 > e[a["ccc"]]
  f=0
  delete a
  delete c
}
f{
   c[++d]=$0 # I am still confused about the variable d.
   if ($1=="ccc") a["ccc"]=$2 FS $3
}' file
And at the beginning there is only e["333 444"]="file3.txt";e["555 444"]="file5.txt". How can that alone send the output to two files?
I have read some material about awk arrays, but there may be parts I still don't know or understand well.
 
Old 11-11-2009, 02:01 AM   #4
cs24
LQ Newbie
 
Registered: Nov 2009
Location: Chinese
Distribution: Centos 5.2
Posts: 23

Original Poster
Rep: Reputation: 1
I added some print statements to the script, so now I know what each variable holds,
but I still can't understand why it works.
I printed e["333 444"] and it is file3.txt, which is obvious. But why can e[a["ccc"]] resolve to e["333 444"] or e["555 444"]?
Also, my original fileA has more than one ccc line in a segment, like this:
aaa 111 222
ccc 333 444
ddd 555 666
ccc 888 999
bbb 777 888
eee 999 000
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666

In this situation the script does not work: e[a["ccc"]] does not resolve to e["333 444"] or e["555 444"], and a "Null file output" error appears.
My original fileA has more than one "ccc" line in a segment. How can I fix this?
 
Old 11-11-2009, 02:12 AM   #5
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Code:
sed -n '/aaa/,/bbb/{p
>                   /bbb/q}' testfile
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
Firstly, using cat is not necessary. I'm mentioning it because that is one of my pet peeves.
I grouped commands after the first range. The second command tests for the end of the range and simply quits.

You could have a pattern match test for /ccc/ between the p and q commands. The command executed could write to a file. However, if the entire range needs to be written to a different file, you will need to use the N or H commands to build up the lines in the pattern space or hold space and then test for the ccc line.
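A minimal sketch of the hold-space idea described above, not a tested production script. It assumes the markers start their lines (the ^ anchors are my addition) and that the ccc line sits strictly between the aaa and bbb lines. The names file3 and file5 come from the first post.

```shell
# Sample input from the first post.
cat > fileA <<'EOF'
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
eee 999 000
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666
EOF

# h starts a fresh segment in the hold space, H appends to it, and g copies
# the finished segment back so the ccc test sees every line at once.
# The w file name runs to the end of its line, so nothing may follow it.
sed -n '/^aaa/,/^bbb/{
  /^aaa/{
    h
    d
  }
  H
  /^bbb/{
    g
    /\nccc 333 444\n/w file3
    /\nccc 555 444\n/w file5
  }
}' fileA
```

Because sed fixes the w file names when the script is parsed, each possible ccc value needs its own line in the script.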

It may be easier to do this with a second sed command, and pipe the fragment you want into this command. Then a range test is no longer necessary.

You could even redirect the output to a temporary file and then test for the ccc pattern with grep, renaming the file depending on the results of grep:
grep <pattern> tempfile && mv tempfile file3
grep <pattern2> tempfile && mv tempfile file5

Not the most elegant, but maybe the most readable.
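The temporary-file approach could be sketched roughly like this for more than one segment. The seg.N file names are hypothetical, and awk is used here only to split the input; a second segment with the same ccc value would overwrite the first, so treat it as a starting point.

```shell
# Sample input from the first post.
cat > fileA <<'EOF'
aaa 111 222
ccc 333 444
ddd 555 666
bbb 777 888
eee 999 000
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666
EOF

# Write each aaa..bbb segment to its own temporary file: seg.1, seg.2, ...
awk '$1=="aaa"{ n++; out="seg." n; inside=1 }
inside{ print > out }
inside && $1=="bbb"{ close(out); inside=0 }' fileA

# Rename each segment according to the ccc line it contains.
for seg in seg.*; do
  [ -e "$seg" ] || continue          # no segments were found
  if grep -q '^ccc 333 444$' "$seg"; then
    mv "$seg" file3
  elif grep -q '^ccc 555 444$' "$seg"; then
    mv "$seg" file5
  fi
done
```

Segments that match neither pattern are left behind under their seg.N names.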

Last edited by jschiwal; 11-11-2009 at 02:23 AM.
 
Old 03-20-2010, 03:45 AM   #6
cs24
LQ Newbie
 
Registered: Nov 2009
Location: Chinese
Distribution: Centos 5.2
Posts: 23

Original Poster
Rep: Reputation: 1
Code:
e["333 444"]="file3.txt"
Does that mean the array's index is 333 444, and that the pair of quotation marks is needed because it contains a blank?
And is the array element's value file3.txt?
 
Old 03-20-2010, 05:23 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,178

Rep: Reputation: 1779
Hi cs24

You need to step through the code ghostdog gave you line by line for each value in your file to understand when each part is executed.
Basically

1. BEGIN is an awk keyword; everything in its {} is run once, before any input is read
2. $1=="aaa": its {} is executed only when this is true. When it is executed, it sets f=1, which means true, and resets d to 0
3. f && $1=="bbb": this requires f to be non-zero (i.e. true) and the first field to equal bbb for its {} to be executed
4. f: this one simply says that if f is non-zero, execute its {}. As this is where d is incremented, you will see that step 3 above will not be run for the first three entries of your file, hence d=3 the first time step 3 is run
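
To watch steps 2 to 4 happen, here is a small trace sketch. It keeps the same three patterns as ghostdog's script but replaces the buffering and file output with printf calls; the printf lines and the trace variable are my additions for illustration only.

```shell
# First five lines of fileA from the first post, traced line by line.
trace=$(printf '%s\n' 'aaa 111 222' 'ccc 333 444' 'ddd 555 666' 'bbb 777 888' 'eee 999 000' |
awk '$1=="aaa"{ f=1; d=0 }
f && $1=="bbb"{
  printf "line %d: bbb seen, loop prints c[1]..c[%d]\n", NR, d
  f=0
}
f{ d++; printf "line %d: stored %s as c[%d]\n", NR, $1, d }')

printf '%s\n' "$trace"
# line 1: stored aaa as c[1]
# line 2: stored ccc as c[2]
# line 3: stored ddd as c[3]
# line 4: bbb seen, loop prints c[1]..c[3]
```

So d is 0 only at the moment aaa is matched; by the time the loop runs it has been incremented once per buffered line. Line 5 (eee) prints nothing because f is 0 again.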

This should be enough to let you work out what the script is doing.
As for your new issue, yes, you are correct that an array in awk may take any string as its index (kinda cool IMHO).
So when the f{} portion is run for the second ccc line it sets a["ccc"]="888 999", and of course there is no e[] entry pointing to a file for this output to go to. Assuming you want everything not in file3.txt or file5.txt to go in one other single file, you simply need an if to tell awk which file to put it in if it does not equal one of the others.
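
One way that fallback might look, as a sketch only. The names other.txt, fileB, key and out are hypothetical, and the last ccc line in a segment still decides the target, as in ghostdog's script.

```shell
# Sample input from post #4, with two ccc lines in the first segment.
cat > fileB <<'EOF'
aaa 111 222
ccc 333 444
ddd 555 666
ccc 888 999
bbb 777 888
eee 999 000
aaa 222 111
fff 888 000
ccc 555 444
bbb 555 666
EOF

awk 'BEGIN{ e["333 444"]="file3.txt"; e["555 444"]="file5.txt" }
$1=="aaa"{ f=1; d=0; key="" }
f && $1=="bbb"{
  # Fall back to other.txt when the ccc value has no entry in e.
  out = (key in e) ? e[key] : "other.txt"
  for(i=1;i<=d;i++) print c[i] > out
  print $0 > out
  f=0
}
f{
  c[++d]=$0
  if ($1=="ccc") key=$2 FS $3      # the last ccc line in the segment wins
}' fileB
```

With this input the first segment ends up in other.txt, because its last ccc value is "888 999", and the second segment goes to file5.txt. If the first ccc value with an entry in e should decide instead, set key only when it is still empty and the new value is in e.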

Hope this helps
 
 
  

