LinuxQuestions.org


naren_0101bits 12-03-2007 03:29 AM

Help needed in removing intermediate segments from a pipe delimited segment file
 
Hi,

I am stuck trying to apply some regular expressions to a file.

I have data which has multiple FHS and BTS segments like:

FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE

I want output that has only one FHS at the beginning and one BTS at the end.
All other FHS and BTS lines in the middle should be deleted.

The output should look like :

FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE


I would be glad if you could shed some light on this problem.

Thanks in advance.

Naren

bigearsbilly 12-03-2007 03:38 AM

egrep '^(FHS|BTS)'

matthewg42 12-03-2007 03:42 AM

Sounds like a job for Perl or awk (I have no doubt you will get awk posts from others, so here's a little Perl program to do it):
Code:

#!/usr/bin/perl

use strict;
use warnings;

my $bts = undef;
my $fhs_printed = 0;

while (<>) {
    if (/^FHS/) {
        # print only the first FHS line; later ones are dropped
        print unless $fhs_printed;
        $fhs_printed = 1;
    }
    elsif (/^BTS/) {
        $bts = $_;
    }
    else {
        print;
    }
}

# print the last BTS line seen, or warn if there was none
if (defined $bts) {
    print $bts;
}
else {
    warn "no BTS line found... output is missing a BTS line\n";
}

Just save that to a file, chmod 755 the file and run it with the input data file as an argument. Re-direct the output to another file and you have your modified data.

naren_0101bits 12-03-2007 03:56 AM

Hi bigearsbilly,
I don't want only the FHS and BTS lines. I want all the segments, except the FHS and BTS lines that fall between the first and last lines.

Hi matthewg42,
Thanks for the immediate reply. I am trying to do this in awk.

ghostdog74 12-03-2007 04:09 AM

Tested on your sample data only:
Code:

awk '
/^BTS/ { getline;
        if ($0 ~ /^FHS/) { next }       
}
{print}
' file

output:
Code:

# ./test.sh
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE


naren_0101bits 12-03-2007 06:16 AM

Hi ghostdog74,

awk '
/^BTS/ { getline;
if ($0 ~ /^FHS/) { next }
}
{print}
' file

When I run this script, it does not output the final BTS segment line. I want output that has the first FHS and the last BTS, without the intermediate FHS and BTS lines.

AnanthaP 12-03-2007 07:27 AM

Try it in awk; here is the pseudo logic.

{
if (fhs and first fhs) then print;
elseif bts then store $0 in x;
else print ;
}
'END' {
print the last stored bts from x;
}

What are the lines really? Not a class test I hope.
End
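The pseudo logic above could be written as real awk roughly like this. It is only a sketch: the wrapper name strip_inner and the variable names are made up for the example, and, like the pseudo logic, it prints the saved BTS at the very end of the output, so any lines that follow the last BTS in the input will come out before it.

```shell
# Sketch of the pseudo logic above as a shell function wrapping awk.
# strip_inner is a hypothetical name, not something from the thread.
strip_inner() {
    awk '
    /^FHS/ { if (!seen_fhs++) print; next }     # print only the first FHS line
    /^BTS/ { last_bts = $0; next }              # remember the latest BTS, print none yet
    { print }                                   # every other line passes through
    END { if (last_bts != "") print last_bts }  # emit the last BTS at the end
    ' "$@"
}
```

Call it as "strip_inner infile > outfile", or pipe data into it on standard input.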

PAix 12-03-2007 07:42 AM

So I called my script aaa, but as shown, using the file supplied, my result fully supports Ghostdog's assertion. I have highlighted the first and last lines of the output, but other than that it's as output. Did you cut and paste the script that Ghostdog provided?
Code:

ian@C4SL101D:~/bashandawk> cat aaa
#!/bin/sh
awk '
/^BTS/ { getline;
        if ($0 ~ /^FHS/) { next }
}
{print}
' infile
ian@C4SL101D:~/bashandawk> ./aaa
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE


radoulov 12-03-2007 07:47 AM

If the first FHS and the last BTS are not the first and the last line respectively (otherwise it will be easier):

Code:

awk 'f && /^FHS/ {
        fhs[FNR]
        }
/^FHS/ {
        f = 1
        }
/^BTS/ {
        f1 = FNR
        }
{
        x[FNR] = $0
        } END {
                for(i=1; i<=FNR; i++)
                        if (!((x[i] ~ /^BTS/) && (i != f1)) && !(i in fhs))
                                print x[i]
}' filename


makyo 12-03-2007 09:37 AM

Hi.

I never trust the user to supply clean data, so here is what I am using for testing, "data1":
Code:

chaff
detritus
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
garbage
junk

Looking at the two awk scripts, this file fed into "user1" (ghostdog74), produces:
Code:

% ./user1 data1
chaff
detritus
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
garbage
junk

and fed into "user2" (radoulov) produces:
Code:

% ./user2 data1
chaff
detritus
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
garbage
junk

Both of those awk scripts seem to work with "clean" data.

I have a multi-sed solution that I may post later ... cheers, makyo

ghostdog74 12-03-2007 10:20 AM

@OP, the script I posted was only tested on your sample data. For the sample version posted by makyo, it will miss the last BTS. Here's a fix for it:
Code:

awk '
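/^BTS/ { l=$0
        # getline returns 0 at end of input; then $0 is still the BTS line
        # and the print rule below emits it once
        if ((getline) > 0) {
                if ($0 ~ /^FHS/) next
                print l
        }
}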
{print}
' file

output:
Code:

# ./test.sh
chaff
detritus
FHS|12121|LOCAL|2323
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
MSH|10101|POTAMAS|2323
PID|121221|THOMAS|DAVID|23432
OBX|2342|H1211|3232
BTS|0000|MERSTO|LIABLE
garbage
junk


naren_0101bits 12-03-2007 10:45 AM

Hi all,

Thanks a lot for so many helpful and thought-provoking responses, and so quickly.


Naren

radoulov 12-03-2007 10:47 AM

Ha! Didn't notice that BTS is followed by FHS.
And if the final BTS can never be followed by FHS:

sed version:

Code:

sed '/^BTS/{N;/\nFHS/d}' filename
or (for older seds):

Code:

sed '/^BTS/{N;/\nFHS/d;}' filename
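One caveat with the sed version, offered as a sketch and not as a correction: when the final BTS is the very last line of the file, N has no next line to read. GNU sed prints the pattern space before exiting in that case, but POSIX allows sed to quit without printing, so on some implementations the last BTS could be lost. Guarding the N with $! (meaning "not on the last line") should avoid that. The wrapper name drop_pairs is made up for the example.

```shell
# drop_pairs is a hypothetical wrapper name used only for this example.
drop_pairs() {
    # On a BTS line, append the next line unless this is already the last
    # line ($!N), then delete the pair if the appended line starts with FHS.
    sed '/^BTS/{$!N;/\nFHS/d;}' "$@"
}
```

Usage is the same as before: "drop_pairs filename", or pipe the data in on standard input.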


All times are GMT -5.