LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-14-2007, 12:13 AM   #1
fs11
Member
sed challenge: data mining


Hello All,

I am working on a project and I have run into this problem. I need to extract certain information from a text file, for example:
YES_CHICK
agagagagadagatdgatagatfgatagatagag
agagagagatagatagatagtagatatagtatagta
fgafgatatatatatattgtgatgatgatgatgat
YES_HUMAN
sgsgsgsgasgafafafatsfgsgsfsgsfgsfsg
fgsfstfstsgtsgstsgstsgtsgstsgstsgts
gstsstsgstsgtsgstsgstsgtsgsststgstgs
YES_DEMON
fgsdgddghudghgdgshghdghsghgdhgdsgdhghd
fgdshdghdshgdshgdsgdhghdsghdshgdsgsh
gshgdhsgdhgsdgshghdsghdsghgdhgshgdhgsh


I want to extract one block from it. For example, if the user query is YES_HUMAN, I should get all the lines after YES_HUMAN up to YES_DEMON (not included).


I have worked with sed many times before, but I am having trouble with this one. I am sure it is possible.

If you think it is not possible, what other options do I have? Any C++ code would also be a great help.

Thanks and regards to all,
FAHAD SAEED
01-14-2007, 12:59 AM   #2
ilikejam
Senior Member
Hi.

Have a read of this:
http://enterprise.linux.com/article....33253&from=rss
The 'Searching, browsing, and exporting records' bit should be particularly interesting.

Dave

Last edited by ilikejam; 01-14-2007 at 01:00 AM.
01-14-2007, 01:46 AM   #3
fs11
Member
Original Poster
It still won't work. Any other ideas?
01-14-2007, 02:10 AM   #4
ghostdog74
Senior Member
Just one way:
Code:
sed -n "/YES_CHICK/,/YES_HUMAN/{/YES_*/!p}" yourfile
Sorry, I forgot to say that my sed is GNU sed.
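A likely reason the command above fails when typed into an interactive bash shell: inside double quotes, bash treats `!` as a history-expansion character, so `!p}` is rewritten (or rejected with "event not found") before sed sees it. Single quotes prevent that. Two smaller points: in a regex, `YES_*` means "YES followed by zero or more underscores", so `/^YES_/` states the header test more precisely; and stricter, non-GNU seds generally expect a `;` before the closing `}`. A minimal sketch, assuming the sample data from post #1 is saved in a file (sections.txt is a hypothetical name):

```shell
#!/bin/sh
# Sample data from post #1, trimmed to one line per section.
# "sections.txt" is a hypothetical file name.
cat > sections.txt <<'EOF'
YES_CHICK
agagagagadagatdgatagatfgatagatagag
YES_HUMAN
sgsgsgsgasgafafafatsfgsgsfsgsfgsfsg
YES_DEMON
fgsdgddghudghgdgshghdghsghgdhgdsgdhghd
EOF

# Single quotes: "!" is not history-expanded by interactive bash.
# /^YES_CHICK$/,/^YES_HUMAN$/ : the range from one header to the next.
# /^YES_/!p;  : print only the lines in that range that are not headers;
#               the ";" before "}" keeps stricter seds happy.
out=$(sed -n '/^YES_CHICK$/,/^YES_HUMAN$/{/^YES_/!p;}' sections.txt)
printf '%s\n' "$out"
```

Run as a script (non-interactive), bash does not do history expansion, so the double-quoted form may work there; single quotes are the safer habit either way.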

Last edited by ghostdog74; 01-14-2007 at 07:20 PM.
01-14-2007, 02:54 PM   #5
jlinkels
LQ Guru
The code by Ghostdog doesn't work for me in bash. I admit I don't understand the code either; otherwise I would have tried to fix it.

In these cases, I think awk is your friend. It really pays off to grasp the concept of awk. Once you do, it only takes a few minutes to write a script for this kind of processing; awk was written for exactly this purpose. Writing this post took me longer than writing the script.

This is the script:

Code:
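# yes.awk: print the data lines of the section whose header equals
# flavour, stopping at the next line that contains YES_
BEGIN {
        pflag=0         # 1 while inside the requested section
}

{
        # Any line containing YES_ (a header) ends the current section.
        if ($0 ~ /YES_/){
                pflag=0
        }

        # A header that exactly equals flavour starts the section.
        if ($0 == flavour) {
                pflag=1
        }

        # Inside the section, print every line that does not match
        # flavour as a regex (this skips the header itself).
        if (pflag == 1 && $0 !~ flavour ){
                print $0
        }
}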
With this input file
Code:
YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
YES_HUMAN
1. human human
2. human human
3. human human
4. human human
YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird
yesfile is the input file containing your data strings. yes.awk is the awk script file.

It gives this output:

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_ -f yes.awk

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_m -f yes.awk

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_BIRD -f yes.awk
1. bird bird
2. bird bird
3. bird bird
4. bird bird

donald_pc:/tmp$ cat yesfile | awk -v flavour=YES_CHICKEN -f yes.awk
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken

donald_pc:/tmp$


If you want the query string to show up before the data lines, change
if (pflag == 1 && $0 !~ flavour ){
to
if (pflag == 1){

On the command line, "-v flavour=..." sets the awk variable flavour before the script starts running.

I know that there are awk gurus who can do this much more elegantly and put it all on one line, but this script is readable.
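For comparison, a one-line awk sketch of the same logic (not taken from the thread): every header resets the flag, the flag is switched on only when the header equals flavour exactly, and `next` keeps the header itself out of the output. The sample file reuses the name yesfile from this post, with fewer data lines, and assumes headers start with YES_:

```shell
#!/bin/sh
# Smaller version of the yesfile used above.
printf '%s\n' YES_CHICKEN '1. chicken chicken' \
              YES_HUMAN '1. human human' '2. human human' \
              YES_BIRD '1. bird bird' > yesfile

# /^YES_/ { ...; next } : on a header, set p to 1 only for an exact match,
#                         then move on without printing the header.
# p                     : on data lines, print while p is non-zero.
out=$(awk -v flavour=YES_HUMAN '/^YES_/ { p = ($0 == flavour); next } p' yesfile)
printf '%s\n' "$out"
```

Dropping `; next` prints the header as well, like the pflag == 1 variant above. For headers that begin with `>`, the pattern would become `/^>/` and the value has to be quoted, e.g. `-v flavour='>YES_BIRD'`.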

Let me know if this works for you.

jlinkels
01-14-2007, 07:58 PM   #6
fs11
Member
Original Poster
Thank you all, that was very nice of you.

The code did work for the input file that I gave.

But there is one more hurdle.

The original data file is:


Code:
>YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
>YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird

The code that jlinkels gave did not work with this data file, so I tried to modify it like this:

Code:
BEGIN {
        pflag=0
}

{
        if ($0 ~ />YES_/){
                pflag=0
        }

        if ($0 == flavour) {
                pflag=1
        }


        if (pflag == 1 && $0 !~ flavour ){
                print $0
        }
}
To run it, I typed this:

Code:
cat yesfile | awk -v flavour=>YES_BIRD -f yes.awk
But it won't work for the new data file. Please help!

Last edited by fs11; 01-14-2007 at 07:59 PM.
01-14-2007, 08:18 PM   #7
homey
Senior Member
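If you are going to put > before the field, then put quotes around the value on the awk command line. Unquoted, the shell reads >YES_BIRD as an output redirection instead of as part of the value.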

Code:
cat file.txt
>YES_CHICKEN
1. chicken chicken
2. chicken chicken
3. chicken chicken
4. chicken chicken
>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
>YES_BIRD
1. bird bird
2. bird bird
3. bird bird
4. bird bird
Code:
awk -v flavour=">YES_BIRD" -f yes.awk file.txt
1. bird bird
2. bird bird
3. bird bird
4. bird bird
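What happens with the unquoted command from post #6: in `awk -v flavour=>YES_BIRD`, the shell splits off `>YES_BIRD` as an output redirection, so awk receives an empty flavour and everything it prints goes into a new file named YES_BIRD. A small sketch that makes this visible; it runs in a scratch directory from mktemp so the stray file is harmless:

```shell
#!/bin/sh
# Work in a throwaway directory, because the unquoted form creates a file.
cd "$(mktemp -d)" || exit 1

# Unquoted: ">YES_BIRD" is a redirection. awk sees "-v flavour=" (empty),
# and its output is written to a new file named YES_BIRD.
awk -v flavour=>YES_BIRD 'BEGIN { print "flavour=[" flavour "]" }'

# Quoted: the ">" is part of the value passed to awk.
quoted=$(awk -v flavour=">YES_BIRD" 'BEGIN { print "flavour=[" flavour "]" }')

printf 'unquoted wrote: %s\n' "$(cat YES_BIRD)"   # flavour=[]
printf 'quoted printed: %s\n' "$quoted"           # flavour=[>YES_BIRD]
```

With an empty flavour, the `$0 == flavour` test in yes.awk never matches a non-empty line, which is consistent with the empty output reported in post #6.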
Edit: By the way, the sed command works on my FC6 box:
Code:
sed -n '/>YES_HUMAN/,/>YES_BIRD/{/>YES_BIRD/!p}' file.txt
>YES_HUMAN
1. human human
2. human human
3. human human
4. human human
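The sed command above needs the name of the following header (>YES_BIRD) and also prints the >YES_HUMAN header itself. A variation, sketched under the assumption that every header line starts with > and no data line does, ends the range at whichever header comes next and skips header lines, so only the requested section name has to be known:

```shell
#!/bin/sh
# Trimmed copy of file.txt from the post above.
printf '%s\n' '>YES_CHICKEN' '1. chicken chicken' \
              '>YES_HUMAN' '1. human human' '2. human human' \
              '>YES_BIRD' '1. bird bird' > file.txt

# Start at the exact header. sed tests the end address /^>/ only from the
# following line on, so the range stops at the next header (or end of file).
# /^>/!p; prints only the non-header lines inside the range.
out=$(sed -n '/^>YES_HUMAN$/,/^>/{/^>/!p;}' file.txt)
printf '%s\n' "$out"
```

For the last section (>YES_BIRD) there is no following header, so the range simply runs to the end of the file.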

Last edited by homey; 01-14-2007 at 08:21 PM.
01-14-2007, 08:26 PM   #8
fs11
Member
Original Poster
Thank you so much for all of the help.

The code did work for the modified data file.


However, this didn't work on my RedHat Linux 3.3:

Code:
sed -n "/YES_CHICK/,/YES_HUMAN/{/YES_*/!p}" yourfile

All times are GMT -5.