Old 07-12-2011, 01:37 PM   #1
mzh
Member
 
Registered: Apr 2011
Location: Copenhagen
Posts: 71

Rep: Reputation: 0
Incrementing Column alphabetically


Dear Forum,
Let's say I have a file structured like this:
Code:
 MODEL        1
 ATOM      1  N   LYS A   3      31.221  22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS A   3      31.828  24.118  42.476  1.00 49.68           C
 ATOM      3  C   LYS A   3      31.979  23.854  41.021  0.00 48.07           C
 ATOM    963  OG1 THR A 125      35.484  35.831  24.497  0.00 27.99           O
 ATOM    964  CG2 THR A 125      33.105  35.742  24.150  0.00 27.99           C
[...]
 TER     965      THR A 125
 ATOM    966  N   ARG B   2      63.476   8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG B   2      64.895   8.182  12.295  1.00 66.32           C
 ATOM    968  C   ARG B   2      65.347   6.986  13.094  1.00 65.20           C
 HETATM 2023  O   HOH B 545      66.492  13.204  20.035  1.00 48.25           O
 HETATM 2024  O   HOH B 546      43.799  -1.641  46.084  1.00 69.42           O
[...]
 ENDMDL
 MODEL        2
 ATOM      1  N   LYS A   3      48.929 -22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS A   3      48.322 -24.118  42.476  1.00 49.68           C
 ATOM    963  OG1 THR A 125      44.666 -35.831  24.497  0.00 27.99           O
 ATOM    964  CG2 THR A 125      47.045 -35.742  24.150  0.00 27.99           C
[...]
 TER     965      THR A 125
 ATOM    966  N   ARG B   2      16.674  -8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG B   2      15.255  -8.182  12.295  1.00 66.32           C
 HETATM 2023  O   HOH B 545      13.658 -13.204  20.035  1.00 48.25           O
 HETATM 2024  O   HOH B 546      36.351   1.641  46.084  1.00 69.42           O
[...]
 ENDMDL
 MASTER       14    0    0    2   18    0   15    9 2022    2    0   20
 END
As you can see, there are various columns of data (it's a file describing a protein structure, around 4000 lines in total).
What I need is to change the A's and B's in the 5th column so that the letter advances whenever a 'TER' or 'ENDMDL' keyword appears.
So the file above should end up like this:
Code:
 ATOM      1  N   LYS A   3      31.221  22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS A   3      31.828  24.118  42.476  1.00 49.68           C
 ATOM    964  CG2 THR A 125      33.105  35.742  24.150  0.00 27.99           C
[...]
 ATOM    966  N   ARG B   2      63.476   8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG B   2      64.895   8.182  12.295  1.00 66.32           C
 ATOM    968  C   ARG B   2      65.347   6.986  13.094  1.00 65.20           C
 HETATM 2023  O   HOH B 545      66.492  13.204  20.035  1.00 48.25           O
 HETATM 2024  O   HOH B 546      43.799  -1.641  46.084  1.00 69.42           O
[...]
 ATOM      1  N   LYS C   3      48.929 -22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS C   3      48.322 -24.118  42.476  1.00 49.68           C
 ATOM    963  OG1 THR C 125      44.666 -35.831  24.497  0.00 27.99           O
[...]
 ATOM    966  N   ARG D   2      16.674  -8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG D   2      15.255  -8.182  12.295  1.00 66.32           C
 HETATM 2023  O   HOH D 545      13.658 -13.204  20.035  1.00 48.25           O
Now the letter descriptor, the "chain id", has been changed from A/B to A/B/C/D (I omitted a couple of lines, but the change happens at every 'TER' or 'ENDMDL' keyword). The 'TER', 'MODEL' and 'ENDMDL' lines are not essential either.
Is there a smart way to do this?

Thanks a lot for any help.
Martin

Last edited by mzh; 07-13-2011 at 02:35 AM.
 
Old 07-12-2011, 02:09 PM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

The input file has roughly 4000 lines and each individual block seems to be about 6 lines, so you would need around 666 unique tokens.

The alphabet has only 26 letters, so what needs to happen when it reaches Z?
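
To make the question concrete, here is a minimal sketch of the counter-to-letter step. The function name chain_letter and the out-of-range policy (fail instead of wrapping around) are assumptions for illustration only, nothing in this thread settles them.

```shell
# chain_letter N: print the N-th uppercase letter (1 -> A, 26 -> Z).
# Assumed policy: return a non-zero status when N is outside 1..26,
# rather than silently wrapping around or reusing a letter.
chain_letter() {
    awk -v n="$1" 'BEGIN {
        letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        if (n < 1 || n > length(letters)) {
            print "chain index out of range: " n > "/dev/stderr"
            exit 1
        }
        print substr(letters, n, 1)
    }'
}
```

With only a handful of blocks the range check never triggers; with hundreds of blocks a different naming scheme would have to be agreed on first.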
 
Old 07-13-2011, 02:37 AM   #3
mzh
Member
 
Registered: Apr 2011
Location: Copenhagen
Posts: 71

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by druuna View Post
Hi,

The input file has roughly 4000 lines and each individual block seems to be about 6 lines, so you would need around 666 unique tokens.

The alphabet has only 26 letters, so what needs to happen when it reaches Z?
Hey, thanks for the feedback. You're right. I edited my post above to make it clearer. There are 4000 lines in the file and 4 blocks in total, so e.g. the 'A' block of the first model is roughly 1000 lines. In the end there would be four blocks of around 1000 lines each, each with a unique chain descriptor: A, B, C, D.
Sorry for the confusion.
 
Old 07-13-2011, 03:35 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Here's a (long) one-liner that should do what you want/need:
Code:
awk 'BEGIN { l[1] = "A" ; l[2] = "B" ; l[3] = "C" ; l[4] = "D" ; x = 1 } /( TER| ENDMDL)/ { x++ } /(ATOM|HETATM)/ { print $1 "\t" $2 "\t" $3 "\t" $4 "\t" l[x] "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11 "\t" $12 }' infile
Example run:
Code:
$ cat infile
MODEL        1
 ATOM      1  N   LYS A   3      31.221  22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS A   3      31.828  24.118  42.476  1.00 49.68           C
 ATOM      3  C   LYS A   3      31.979  23.854  41.021  0.00 48.07           C
 ATOM    963  OG1 THR A 125      35.484  35.831  24.497  0.00 27.99           O
 ATOM    964  CG2 THR A 125      33.105  35.742  24.150  0.00 27.99           C
 TER     965      THR A 125
 ATOM    966  N   ARG B   2      63.476   8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG B   2      64.895   8.182  12.295  1.00 66.32           C
 ATOM    968  C   ARG B   2      65.347   6.986  13.094  1.00 65.20           C
 HETATM 2023  O   HOH B 545      66.492  13.204  20.035  1.00 48.25           O
 HETATM 2024  O   HOH B 546      43.799  -1.641  46.084  1.00 69.42           O
 ENDMDL
 MODEL        2
 ATOM      1  N   LYS A   3      48.929 -22.957  43.101  1.00 54.39           N
 ATOM      2  CA  LYS A   3      48.322 -24.118  42.476  1.00 49.68           C
 ATOM    963  OG1 THR A 125      44.666 -35.831  24.497  0.00 27.99           O
 ATOM    964  CG2 THR A 125      47.045 -35.742  24.150  0.00 27.99           C
 TER     965      THR A 125
 ATOM    966  N   ARG B   2      16.674  -8.426  12.454  0.00 60.90           N
 ATOM    967  CA  ARG B   2      15.255  -8.182  12.295  1.00 66.32           C
 HETATM 2023  O   HOH B 545      13.658 -13.204  20.035  1.00 48.25           O
 HETATM 2024  O   HOH B 546      36.351   1.641  46.084  1.00 69.42           O
 ENDMDL
 MASTER       14    0    0    2   18    0   15    9 2022    2    0   20
 END

$ awk 'BEGIN { l[1] = "A" ; l[2] = "B" ; l[3] = "C" ; l[4] = "D" ; x = 1 } /( TER| ENDMDL)/ { x++ } /(ATOM|HETATM)/ { print $1 "\t" $2 "\t" $3 "\t" $4 "\t" l[x] "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11 "\t" $12 }' infile
ATOM    1       N       LYS     A       3       31.221  22.957  43.101  1.00    54.39   N
ATOM    2       CA      LYS     A       3       31.828  24.118  42.476  1.00    49.68   C
ATOM    3       C       LYS     A       3       31.979  23.854  41.021  0.00    48.07   C
ATOM    963     OG1     THR     A       125     35.484  35.831  24.497  0.00    27.99   O
ATOM    964     CG2     THR     A       125     33.105  35.742  24.150  0.00    27.99   C
ATOM    966     N       ARG     B       2       63.476  8.426   12.454  0.00    60.90   N
ATOM    967     CA      ARG     B       2       64.895  8.182   12.295  1.00    66.32   C
ATOM    968     C       ARG     B       2       65.347  6.986   13.094  1.00    65.20   C
HETATM  2023    O       HOH     B       545     66.492  13.204  20.035  1.00    48.25   O
HETATM  2024    O       HOH     B       546     43.799  -1.641  46.084  1.00    69.42   O
ATOM    1       N       LYS     C       3       48.929  -22.957 43.101  1.00    54.39   N
ATOM    2       CA      LYS     C       3       48.322  -24.118 42.476  1.00    49.68   C
ATOM    963     OG1     THR     C       125     44.666  -35.831 24.497  0.00    27.99   O
ATOM    964     CG2     THR     C       125     47.045  -35.742 24.150  0.00    27.99   C
ATOM    966     N       ARG     D       2       16.674  -8.426  12.454  0.00    60.90   N
ATOM    967     CA      ARG     D       2       15.255  -8.182  12.295  1.00    66.32   C
HETATM  2023    O       HOH     D       545     13.658  -13.204 20.035  1.00    48.25   O
HETATM  2024    O       HOH     D       546     36.351  1.641   46.084  1.00    69.42   O
Hope this helps.
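
One note on layout: the one-liner above splits each record on whitespace and rejoins the fields with tabs, so the original fixed-width spacing is lost. If the program that reads the file expects the chain ID at a fixed character position (column 22 in the PDB format), a variant that overwrites only that one character might look like the sketch below. The function name relabel_chains and its optional column argument are hypothetical. The sample as pasted in this thread shows a leading space on most lines, in which case the chain ID would sit at position 23, so check your real file first.

```shell
# relabel_chains FILE [COL]: print the ATOM/HETATM records of FILE with the
# chain ID replaced by A, B, C, ... and every other character left untouched.
# COL is the 1-based character position of the chain ID. The default of 22
# follows the PDB format; pass 23 if each line carries one leading space.
relabel_chains() {
    awk -v col="${2:-22}" '
        BEGIN { letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; x = 1 }
        # A TER or ENDMDL record closes the current block: move to the next letter.
        $1 == "TER" || $1 == "ENDMDL" { x++; next }
        # Only coordinate records are printed, as in the one-liner above.
        $1 == "ATOM" || $1 == "HETATM" {
            if (x > length(letters)) {
                print "more than 26 blocks, no letter left" > "/dev/stderr"
                exit 1
            }
            # Keep everything before and after the chain ID character as it was.
            print substr($0, 1, col - 1) substr(letters, x, 1) substr($0, col + 1)
        }
    ' "$1"
}
```

Usage would be along the lines of `relabel_chains infile > outfile`, or `relabel_chains infile 23 > outfile` for the leading-space layout. Comparing records on the first field instead of matching a regular expression anywhere in the line is a design choice that avoids accidental matches inside other records.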
 
Old 07-13-2011, 03:57 AM   #5
mzh
Member
 
Registered: Apr 2011
Location: Copenhagen
Posts: 71

Original Poster
Rep: Reputation: 0
awesome.
 
Old 07-13-2011, 04:16 AM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405


If you need any help with the one-liner, just let me know.

BTW: If this is solved, can you mark the thread as solved (first post -> Thread Tools)?
 
  

