LinuxQuestions.org - sed challenge..datamining


Hello All,

I am working on a project and I came up with this problem. I need to extract certain information from a text file, for example:
YES_CHICK
agagagagadagatdgatagatfgatagatagag
agagagagatagatagatagtagatatagtatagta
fgafgatatatatatattgtgatgatgatgatgat
YES_HUMAN
sgsgsgsgasgafafafatsfgsgsfsgsfgsfsg
fgsfstfstsgtsgstsgstsgtsgstsgstsgts
gstsstsgstsgtsgstsgstsgtsgsststgstgs
YES_DEMON
fgsdgddghudghgdgshghdghsghgdhgdsgdhghd
fgdshdghdshgdshgdsgdhghdsghdshgdsgsh
gshgdhsgdhgsdgshghdsghdsghgdhgshgdhgsh

I want to extract information from it. For example, if the user query is YES_HUMAN, I should get all the lines after YES_HUMAN up to, but not including, YES_DEMON.

I have used sed many times before, but I am having trouble with this one. I am sure it is possible.

If you think it is not possible, what other options do I have? C++ code would also be a great help.

Thanks and regards to all,
FAHAD SAEED

Hi.

Have a read of this:
http://enterprise.linux.com/article....33253&from=rss
The 'Searching, browsing, and exporting records' bit should be particularly interesting.

Dave

It still won't work :( Any other ideas?

Just one way:

Code:

sed -n "/YES_CHICK/,/YES_HUMAN/{/YES_*/!p}" yourfile

Sorry, I forgot to say my sed is GNU-based.
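For reference, here is a sketch of what ghostdog's command does, run against a copy of the sample data from the first post (the temporary file name is made up for the demo). The range `/YES_CHICK/,/YES_HUMAN/` selects every line from the YES_CHICK header through the YES_HUMAN header, and the inner `/YES_*/!p` prints only the lines in that range that do not match, which drops both headers. As a regex, `YES_*` means "YES followed by zero or more underscores"; `/YES_/` expresses the intent more directly. The command is single-quoted here because at an interactive bash prompt `!` inside double quotes is subject to history expansion, which can mangle `!p` before sed ever sees it.

```shell
#!/bin/sh
# Hypothetical demo file holding a copy of the sample data from the first post.
data=${TMPDIR:-/tmp}/yes_demo.txt
cat > "$data" <<'EOF'
YES_CHICK
agagagagadagatdgatagatfgatagatagag
agagagagatagatagatagtagatatagtatagta
fgafgatatatatatattgtgatgatgatgatgat
YES_HUMAN
sgsgsgsgasgafafafatsfgsgsfsgsfgsfsg
YES_DEMON
fgsdgddghudghgdgshghdghsghgdhgdsgdhghd
EOF

# -n: print nothing unless told to.
# /YES_CHICK/,/YES_HUMAN/ selects the header, the data lines, and the next header.
# /YES_/!p prints only the selected lines that are NOT headers.
# Single quotes: no history expansion of "!p" in an interactive bash.
# The ";" before "}" is optional in GNU sed; some older seds want it.
sed -n '/YES_CHICK/,/YES_HUMAN/{/YES_/!p;}' "$data"
```

This should print the three YES_CHICK sequence lines and nothing else.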

The code by ghostdog doesn't work for me in bash. I admit I don't understand the code either; otherwise I would have tried to fix it.

In cases like this, I think awk is your friend. It really pays off to grasp the concept of awk. Once you do, it only takes a few minutes to write a script for this kind of processing; awk was written for exactly this purpose. :) Writing this post took me longer than writing the script.

This is the script:

Code:

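BEGIN {
        pflag=0         # print flag: 1 while inside the requested section
}

{
        # Any header line ends the current section.
        if ($0 ~ /YES_/){
                pflag=0
        }

        # The header that exactly equals the query starts the wanted section.
        if ($0 == flavour) {
                pflag=1
        }

        # Print the data lines of the wanted section, but not the header
        # itself (flavour is used as a regex in this comparison).
        if (pflag == 1 && $0 !~ flavour ){
                print $0
        }
}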

With this input file:

Code:

YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
YES_HUMAN
1. human human
2. human human
3. human human
4. human human
YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird

yesfile is the input file containing your data strings, and yes.awk is the awk script file.

It gives this output:

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_ -f yes.awk

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_m -f yes.awk

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_BIRD -f yes.awk
1. bird bird
2. bird bird
3. bird bird
4. bird bird

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_CHICKEN -f yes.awk
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken

donald_pc:/tmp$

If you want the query string to show up before the data lines, change
if (pflag == 1 && $0 !~ flavour ){
to
if (pflag == 1){

On the command line, "-v flavour=..." sets the awk variable flavour before the script starts, so it acts as a parameter to the awk script.

I know there are awk gurus who could do this much more elegantly and put it all on one line. This script is readable, though :D
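As a sketch of that one-line form, the awk program below is intended to behave like yes.awk on data shaped like yesfile (the demo file name is hypothetical). It differs slightly in that it skips every header line, whereas yes.awk skips any line that matches flavour as a regex.

```shell
#!/bin/sh
# Hypothetical demo copy of yesfile (shortened).
data=${TMPDIR:-/tmp}/yesfile_demo
printf '%s\n' YES_CHICKEN '1. chicken chicken' '2. chicken chicken' \
    YES_HUMAN '1. human human' '2. human human' \
    YES_BIRD '1. bird bird' '2. bird bird' > "$data"

# On a header line: turn printing on only if the header equals the query,
# then skip the header itself ("next"). On any other line, the bare
# pattern "on" prints the line while the flag is set.
awk -v flavour=YES_HUMAN '/YES_/ { on = ($0 == flavour); next } on' "$data"
```

This should print the two human lines. For headers that start with `>`, the same idea would be `awk -v flavour='>YES_BIRD' '/^>/ { on = ($0 == flavour); next } on' file.txt`, with the value quoted so the shell leaves the `>` alone.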

Let me know if this works for you

jlinkels

Thank you all :) That was very nice of you.

The code did work for the input file I gave.

But there is one more hurdle.

The original data file is:

Code:

>YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
>YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird

The code jlinkels gave did not work with this data file, so I tried to modify it like this:

Code:

BEGIN {
        pflag=0
}

{
        if ($0 ~ />YES_/){
                pflag=0
        }

        if ($0 == flavour) {
                pflag=1
        }

        if (pflag == 1 && $0 !~ flavour ){
                print $0
        }
}

To run it, I typed this:

Code:

cat yesfile | awk -v flavour=>YES_BIRD -f yes.awk

But it won't work for the new data file. Please help!

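If you are going to put > in front of the header name, quote the value in the awk command. Unquoted, the shell treats > as output redirection, so awk gets an empty flavour and its output goes to a file named YES_BIRD.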

Code:

cat file.txt

>YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
>YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird

Code:

awk -v flavour=">YES_BIRD" -f yes.awk file.txt

1. bird bird
2. bird bird
3. bird bird
4. bird bird
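A quick way to see what the unquoted `flavour=>YES_BIRD` actually does is to run both forms side by side in a scratch directory. This is a small sketch; the inline awk program only prints the value awk received.

```shell
#!/bin/sh
# Scratch directory, so the stray output file lands somewhere harmless.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Unquoted: the shell reads ">YES_BIRD" as an output redirection. awk receives
# "-v flavour=" (an empty value) and its output is written to a file named YES_BIRD.
echo hello | awk -v flavour=>YES_BIRD '{ print "flavour=[" flavour "]" }'
cat YES_BIRD            # prints: flavour=[]

# Quoted: the ">" stays part of the value, as intended.
echo hello | awk -v flavour=">YES_BIRD" '{ print "flavour=[" flavour "]" }'
# prints: flavour=[>YES_BIRD]
```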

Edit: By the way, the sed command works on my FC6 box

Code:

sed -n '/>YES_HUMAN/,/>YES_BIRD/{/>YES_BIRD/!p}' file.txt

>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
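The sed command above names the following header (>YES_BIRD) explicitly, so it cannot extract the last section, and it keeps the >YES_HUMAN header in its output. A variant that ends the range at whatever header comes next, and drops both headers, might look like the sketch below. It assumes, as in this data file, that every header line starts with > and no data line does; the demo file name is hypothetical.

```shell
#!/bin/sh
# Hypothetical demo copy of file.txt (shortened).
data=${TMPDIR:-/tmp}/file_demo.txt
printf '%s\n' '>YES_CHICKEN' '1. chicken chicken' \
    '>YES_HUMAN' '1. human human' '2. human human' \
    '>YES_BIRD' '1. bird bird' > "$data"

# Start at the exact requested header (^...$ avoids matching e.g. >YES_HUMANOID)
# and end at the next line starting with ">", whatever its name. sed checks the
# end address only from the line after the start, so the start header itself
# does not close the range. Inside the range, print only non-header lines.
sed -n '/^>YES_HUMAN$/,/^>/{/^>/!p;}' "$data"

# The last section works too: with no later header, the range runs to EOF.
sed -n '/^>YES_BIRD$/,/^>/{/^>/!p;}' "$data"
```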

Thank you so much for all of the help :)

The code did work for the modified data file.

However, this didn't work on my RedHat Linux 3.3:

Code:

sed -n "/YES_CHICK/,/YES_HUMAN/{/YES_*/!p}" yourfile
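The post does not say how it fails, so this is only a guess at two likely causes. At an interactive bash prompt, `!p` inside double quotes is subject to history expansion; single quotes avoid that. Some older seds also reject a `}` placed directly after a command on the same line. The two forms below sidestep that second issue: one puts the block on separate lines, the other avoids braces and `!` entirely by chaining two seds. The demo file name and its placeholder data lines are made up.

```shell
#!/bin/sh
# Hypothetical demo file with placeholder data lines.
data=${TMPDIR:-/tmp}/yes_old_sed_demo.txt
printf '%s\n' YES_CHICK chick-line-1 chick-line-2 YES_HUMAN human-line-1 \
    YES_DEMON demon-line-1 > "$data"

# Form 1: "{", the command, and "}" each on their own line, the traditional
# layout that older seds accept.
sed -n '/YES_CHICK/,/YES_HUMAN/{
/YES_/!p
}' "$data"

# Form 2: no braces and no "!". The first sed prints the whole range, headers
# included; the second sed deletes the header lines.
sed -n '/YES_CHICK/,/YES_HUMAN/p' "$data" | sed '/YES_/d'
```

Both forms should print chick-line-1 and chick-line-2.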