LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (http://www.linuxquestions.org/questions/programming-9/)
-   -   selecting digits from regular expression (http://www.linuxquestions.org/questions/programming-9/selecting-digits-from-regular-expression-901129/)

vjramana 09-04-2011 08:58 PM

selecting digits from regular expression
 
I shall alter the question to make it more clear.have data set as below:
Quote:

HBOND SUMMARY
output to file HB_lowLyo_D_lipid_A_water_001_064.tbl,
data was sorted, intra-residue interactions are NOT included,
Distance cutoff is 4.00 angstroms, angle cutoff is 120.00 degrees
Hydrogen bond information dumped for occupancies > 0.00

DONOR ACCEPTORH ACCEPTOR
atom# res@atom atom# res@atom atom# res@atom %occupied distance angle
| 4645 58@O12 | 23489 1174@H1 23488 1174@O | 22.79 2.945 ( 0.28) 26.79 (14.41)
| 4645 58@O12 | 23490 1174@H2 23488 1174@O | 22.49 2.965 ( 0.31) 28.01 (14.47)
| 2701 34@O12 | 23333 1122@H1 23332 1122@O | 20.60 2.965 ( 0.23) 30.07 (14.18)
| 2701 34@O12 | 23334 1122@H2 23332 1122@O | 19.74 2.963 ( 0.23) 31.43 (13.88)
| 271 4@O12 | 23334 1122@H2 23332 1122@O | 19.70 2.825 ( 0.19) 21.92 (12.15)
| 271 4@O12 | 23333 1122@H1 23332 1122@O | 19.55 2.826 ( 0.19) 22.22 (12.71)
| 4655 58@O16 | 21156 396@H2 21154 396@O | 19.43 2.933 ( 0.22) 31.95 (15.18)
| 4658 58@O15 | 21156 396@H2 21154 396@O | 18.96 3.163 ( 0.27) 37.03 (14.63)
| 4310 54@O26 | 23202 1078@H2 23200 1078@O | 18.73 2.821 ( 0.24) 25.87 (13.92)
| 4655 58@O16 | 21155 396@H1 21154 396@O | 18.63 2.917 ( 0.22) 31.91 (15.00)
| 1820 23@O16 | 21167 400@H1 21166 400@O | 18.14 2.910 ( 0.22) 27.20 (13.87)
| 1820 23@O16 | 21168 400@H2 21166 400@O | 17.96 2.907 ( 0.21) 26.69 (13.86)
| 3845 48@O16 | 23454 1162@H2 23452 1162@O | 17.68 2.991 ( 0.31) 28.45 (14.88)
| 4658 58@O15 | 21155 396@H1 21154 396@O | 17.31 3.177 ( 0.27) 38.82 (14.69)
| 3845 48@O16 | 23453 1162@H1 23452 1162@O | 17.29 3.016 ( 0.32) 28.84 (14.57)
| 1489 19@O13 | 23201 1078@H1 23200 1078@O | 16.66 2.884 ( 0.23) 31.39 (15.56)
| 3824 48@O26 | 21099 377@H2 21097 377@O | 15.44 2.992 ( 0.30) 30.78 (15.01)
| 4253 53@O15 | 23454 1162@H2 23452 1162@O | 14.98 2.961 ( 0.27) 33.71 (15.09)
| 1459 19@O22 | 23201 1078@H1 23200 1078@O | 14.84 3.012 ( 0.33) 35.08 (16.12)
| 1081 14@O12 | 21173 402@H1 21172 402@O | 14.76 2.937 ( 0.24) 27.54 (14.26)
| 4253 53@O15 | 23453 1162@H1 23452 1162@O | 14.63 2.955 ( 0.25) 33.68 (15.11)
| 1081 14@O12 | 21174 402@H2 21172 402@O | 14.41 2.944 ( 0.25) 28.34 (14.35)
| 3824 48@O26 | 21098 377@H1 21097 377@O | 13.70 3.002 ( 0.30) 31.00 (15.21)
| 3845 48@O16 | 21156 396@H2 21154 396@O | 13.06 2.934 ( 0.26) 27.71 (14.05)
.
.
.
few thousand lines
The first field, $1 represent "|".
The $3 (3rd field) and $6 (6th field) in my data file represent "number-molecule" which has arrangement as below:


Quote:

1 2 3 4 5 6 7 8

9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56

57 58 59 60 61 62 63 64
Any pairs made from above numbers actually represents pairs in the 3rd and 6th field of each line in the data file.

What I want is to select the pairs from the data file made only by the numbers which are arranged at the outer most lines of the above number-molecule ordering.

In short, ANY PAIRS made by only the numbers

Quote:

(1 2 3 4 5 6 7 8 57 58 59 60 61 62 63 64 9 17 25 33 41 49 57 8 16 24 32 40 48 56 64)

in other words

1 , 2
1 , 3
1 , 4
.
.
1 , 57
1 , 58
1 , 59
.
.
.
2, 1
2, 3
2, 4
2, 5
.
.
.
2, 57
2, 58
2, 59
.
.
.
are need to be deleted from the data file.

To achieve this I have tried to write awk script as below to test to print out the line which I suppose to delete. But at this level I fail to select those line pairs.

Quote:

#!/usr/bin/awk -f

BEGIN {
i=0
for (n=1; n<=8; n++) set[i++] = n;
for (n=57; n<=64; n++) set[i++] = n;
for (n=9; n<=49; n+=8) {set[i++] = n; set[i++] = n+7};
}


($1== "|") {
split($3, res1, "@"); split($6, res2, "@"); #print res1[1], res2[1]

if ( (res1[1] in set) == (res2[1] in set) );

{
print;
}

}

Can I get any help to resolve this needs?

Thanks in advance

grail 09-05-2011 02:57 AM

Based on the input you have shown I would expect zero output as nothing in the sixth column is <= 64

Proud 09-05-2011 06:02 AM

There seems to be an assumption that the | characters are to be ignored, but I think most things default to using spaces/whitespace as field separators.

Even ignoring the |s aka using the first line of column/field headers, your 3rd field is of atom#s when it seems more intuitive that you'd mean another res part of a res@atom field, i.e. fields 4 and 6.


All times are GMT -5. The time now is 05:57 AM.