Dear Forum,
Let's say I have a file structured like this:
Code:
MODEL 1
ATOM 1 N LYS A 3 31.221 22.957 43.101 1.00 54.39 N
ATOM 2 CA LYS A 3 31.828 24.118 42.476 1.00 49.68 C
ATOM 3 C LYS A 3 31.979 23.854 41.021 0.00 48.07 C
ATOM 963 OG1 THR A 125 35.484 35.831 24.497 0.00 27.99 O
ATOM 964 CG2 THR A 125 33.105 35.742 24.150 0.00 27.99 C
[...]
TER 965 THR A 125
ATOM 966 N ARG B 2 63.476 8.426 12.454 0.00 60.90 N
ATOM 967 CA ARG B 2 64.895 8.182 12.295 1.00 66.32 C
ATOM 968 C ARG B 2 65.347 6.986 13.094 1.00 65.20 C
HETATM 2023 O HOH B 545 66.492 13.204 20.035 1.00 48.25 O
HETATM 2024 O HOH B 546 43.799 -1.641 46.084 1.00 69.42 O
[...]
ENDMDL
MODEL 2
ATOM 1 N LYS A 3 48.929 -22.957 43.101 1.00 54.39 N
ATOM 2 CA LYS A 3 48.322 -24.118 42.476 1.00 49.68 C
ATOM 963 OG1 THR A 125 44.666 -35.831 24.497 0.00 27.99 O
ATOM 964 CG2 THR A 125 47.045 -35.742 24.150 0.00 27.99 C
[...]
TER 965 THR A 125
ATOM 966 N ARG B 2 16.674 -8.426 12.454 0.00 60.90 N
ATOM 967 CA ARG B 2 15.255 -8.182 12.295 1.00 66.32 C
HETATM 2023 O HOH B 545 13.658 -13.204 20.035 1.00 48.25 O
HETATM 2024 O HOH B 546 36.351 1.641 46.084 1.00 69.42 O
[...]
ENDMDL
MASTER 14 0 0 2 18 0 15 9 2022 2 0 20
END
As you can see, there are various columns of data (it's a file describing a molecular protein structure, around 4000 lines in total).
What I need is to change the A's and B's in the fifth column so that the letter advances to the next one every time a 'TER' or 'ENDMDL' keyword appears.
So the file above should look like this:
Code:
ATOM 1 N LYS A 3 31.221 22.957 43.101 1.00 54.39 N
ATOM 2 CA LYS A 3 31.828 24.118 42.476 1.00 49.68 C
ATOM 964 CG2 THR A 125 33.105 35.742 24.150 0.00 27.99 C
[...]
ATOM 966 N ARG B 2 63.476 8.426 12.454 0.00 60.90 N
ATOM 967 CA ARG B 2 64.895 8.182 12.295 1.00 66.32 C
ATOM 968 C ARG B 2 65.347 6.986 13.094 1.00 65.20 C
HETATM 2023 O HOH B 545 66.492 13.204 20.035 1.00 48.25 O
HETATM 2024 O HOH B 546 43.799 -1.641 46.084 1.00 69.42 O
[...]
ATOM 1 N LYS C 3 48.929 -22.957 43.101 1.00 54.39 N
ATOM 2 CA LYS C 3 48.322 -24.118 42.476 1.00 49.68 C
ATOM 963 OG1 THR C 125 44.666 -35.831 24.497 0.00 27.99 O
[...]
ATOM 966 N ARG D 2 16.674 -8.426 12.454 0.00 60.90 N
ATOM 967 CA ARG D 2 15.255 -8.182 12.295 1.00 66.32 C
HETATM 2023 O HOH D 545 13.658 -13.204 20.035 1.00 48.25 O
Now the letter descriptor, the "chain id", has been changed from A/B to A/B/C/D (I omitted a couple of lines, but the change happens at every 'TER' or 'ENDMDL' keyword). The 'TER', 'MODEL' and 'ENDMDL' lines are also not essential.
Is there a smart way to do this?
Thanks a lot for any help.
Martin
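
One possible approach, sketched in Python and not a definitive solution: walk through the file once, remember when a 'TER' or 'ENDMDL' line has been seen, and move to the next letter at the next ATOM/HETATM line. The function name `relabel_chains` is made up for this illustration. The sketch assumes the chain id is the fifth whitespace-separated field, as in the listing above, that there are at most 26 chains, and that 'TER', 'MODEL' and 'ENDMDL' lines can be dropped while all other lines pass through unchanged. In a strictly fixed-width PDB file the chain id sits in column 22, so replacing that single character would be the safer variant there.

```python
import re
import string

# Records that carry a chain id (assumed to be the fifth whitespace-separated field).
ATOM_RECORDS = ("ATOM", "HETATM")
# Records that close a chain; the next atom line after one of these gets a new letter.
BOUNDARY_RECORDS = ("TER", "ENDMDL")
# Records left out of the output (the post calls them "not essential").
DROPPED_RECORDS = BOUNDARY_RECORDS + ("MODEL",)

# Captures the first four fields together with their spacing, then matches the fifth.
FIFTH_FIELD = re.compile(r"^((?:\S+\s+){4})\S+")


def relabel_chains(lines):
    """Yield lines with chain ids relabeled A, B, C, ... after each TER/ENDMDL."""
    chain_index = -1
    start_new_chain = True  # the first atom line opens chain A
    for line in lines:
        fields = line.split()
        record = fields[0] if fields else ""
        if record in BOUNDARY_RECORDS:
            # Only set a flag, so a TER directly followed by ENDMDL
            # advances the letter once instead of twice.
            start_new_chain = True
        if record in DROPPED_RECORDS:
            continue
        if record in ATOM_RECORDS:
            if start_new_chain:
                chain_index += 1
                start_new_chain = False
            if chain_index >= len(string.ascii_uppercase):
                raise ValueError("more than 26 chains; extend the alphabet")
            letter = string.ascii_uppercase[chain_index]
            # Replace only the fifth field and keep the original spacing.
            line = FIFTH_FIELD.sub(lambda m: m.group(1) + letter, line, count=1)
        yield line
```

To run it on a file, something like `sys.stdout.writelines(relabel_chains(open("input.pdb")))` should work, where `input.pdb` is a placeholder file name and `sys` needs to be imported.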