Editing a file with Linux commands

rama_david · 03-17-2014, 10:22 AM

Hi friends,
I am new to Linux.
I have a .pdb file like this:

....
ATOM 1 N MET 1 54.673 11.020 11.759 1.00 0.00
ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00
ATOM 3 HN2 MET 1 55.655 10.817 11.946 1.00 0.00
ATOM 4 HN3 MET 1 54.425 11.407 12.670 1.00 0.00
ATOM 5 CA MET 1 53.903 9.795 11.449 1.00 0.00
ATOM 6 C MET 1 54.068 8.797 12.543 1.00 0.00
.....
Now I want to add a column containing the letter "A" at the 25th column and delete column 27, throughout the file.
How can I do this with Linux commands?

I would be very thankful for your help.

schneidz · 03-17-2014, 10:58 AM

What have you tried, and where are you stuck? You may want to look into the cut, paste, awk, ... commands.

jefro · 03-17-2014, 05:23 PM

I'm sure the script guys could make a simple script starting with "for"

Can you use nano, vi or vim to edit with?

Others

http://www.thegeekstuff.com/2009/07/...-text-editors/

padeen · 03-17-2014, 08:30 PM

To cut the 27th character, use `cut --complement -c 27`; to insert an A after the first 25 characters, use `sed -e 's/\(.\{25\}\)/\1A/'`. You can pipe them together (the order depends on which action should take place first):

Code:

sed -e 's/\(.\{25\}\)/\1A/' input_file_name | cut --complement -c 27 > output_file_name

`--complement` is a GNU extension. It gets trickier if you're not using GNU tools; you would probably have to use a small awk or sed script.

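For systems without GNU cut, the same two steps could be sketched in awk with substr(). This is only a sketch: the function name `insert_a_drop_27` is made up for illustration, and it assumes the same character positions as the pipeline above (insert A after the first 25 characters, then drop character 27 of the result).

```shell
# Hypothetical helper (the name is illustrative, not from the thread).
# Mirrors: sed 's/\(.\{25\}\)/\1A/' | cut --complement -c 27
insert_a_drop_27() {
    awk '{
        if (length($0) < 25) { print; next }          # sed would not match; leave the line alone
        line = substr($0, 1, 25) "A" substr($0, 26)   # insert A after character 25
        print substr(line, 1, 26) substr(line, 28)    # keep 1-26, skip 27, keep the rest
    }' "$@"
}
```

It would be called as `insert_a_drop_27 input_file_name > output_file_name`. Like the sed/cut version, it works on raw character positions, so it only makes sense if every line has its data in the same columns.
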
Norseman01 · 03-20-2014, 01:01 AM

Maybe this will help.

Prompt:> cat orgdata
ATOM 1 N MET 1 54.673 11.020 11.759 1.00 0.00
ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00
ATOM 3 HN2 MET 1 55.655 10.817 11.946 1.00 0.00
ATOM 4 HN3 MET 1 54.425 11.407 12.670 1.00 0.00
ATOM 5 CA MET 1 53.903 9.795 11.449 1.00 0.00
ATOM 6 C MET 1 54.068 8.797 12.543 1.00 0.00

Prompt:> cat cmd
sed -e 's/\(.\{25\}\)/\1A/' orgdata | cut --complement -c 27 >moddata

Prompt:> cat moddata
ATOM 1 N MET 1 54.673 11.A20 11.759 1.00 0.00
ATOM 2 HN1 MET 1 54.560 1A.703 11.010 1.00 0.00
ATOM 3 HN2 MET 1 55.655 1A.817 11.946 1.00 0.00
ATOM 4 HN3 MET 1 54.425 1A.407 12.670 1.00 0.00
ATOM 5 CA MET 1 53.903 9.A95 11.449 1.00 0.00
ATOM 6 C MET 1 54.068 8.7A7 12.543 1.00 0.00

Prompt:>

I doubt the modified data lines are what you want.

Problem: This is a "Delimited" database file.

How do I know? Same number of groups on the line
with the same delimiter between each group.
Total line lengths differ.

awk is the choice here. Do a `man awk` and read it,
then do an `info awk` and read that.

This is going to sound harsh but it is true:
If you have never programmed a computer, take a beginner class.
Learn (or remember, if you had the class) things like:
for/next, if/else/endif and such, a.k.a. program controls,
read/write commands, assignment commands, and what a var (variable) is.

Anyone can use Linux for ordinary Office use. google OpenOffice,
but to get the most from it you need programming skills.

Am I trying to scare you? NO!!
Pick a determined attitude, go in with your eyes open and in no
time at all you will be teaching us.

The use of awk is to create one var for each "field" in the line.
Then each var can be manipulated individually, realigned, stored or
printed or whatever.
Each field can be assigned to a var (database calls them field names),
something like: f001=ATOM
f002=1
f003=N ... and so on
next line: f001=ATOM
f002=2
f003=HN1
... and so on

ATOM 1 N MET 1 54.673 11.020 11.759 1.00 0.00
ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00
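To make the field idea concrete, here is a minimal sketch. awk splits each line on whitespace by itself and exposes the pieces as $1, $2, $3 and so on, with NF holding the field count. The helper name `show_fields` is invented for this example; the f00N labels are just the names used above.

```shell
# Illustrative helper; awk does the splitting, we only print the first three fields.
show_fields() {
    awk '{ printf "f001=%s f002=%s f003=%s fields=%d\n", $1, $2, $3, NF }' "$@"
}

printf '%s\n' \
    'ATOM 1 N MET 1 54.673 11.020 11.759 1.00 0.00' \
    'ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00' | show_fields
```
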

If we revert to the old IBM Punch Card, a.k.a. Standard Data Format,
we will see the problem:
123456789x123456789x1234X6X89x123456789x123456789x
ATOM 1 N MET 1 54.673 11.020 11.759 1.00 0.00
ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00
See the "X"?
Col-25 and Col-27 land on different parts of the data on different lines.
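A quick way to check this on your own file, as a sketch only (the helper name `show_cols` is made up): print each line's length together with whatever character sits at columns 25 and 27.

```shell
# Illustrative helper: report line length and the characters at columns 25 and 27.
show_cols() {
    awk '{
        printf "len=%d col25=[%s] col27=[%s]\n", length($0), substr($0, 25, 1), substr($0, 27, 1)
    }' "$@"
}
```

Run as `show_cols orgdata`. On the two sample lines as posted here (single-spaced), columns 25 and 27 hold `.` and `2` on the first line but `1` and `.` on the second, which is why a fixed character position hits different data.
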

It should be:
123456789x123456789x1234X6X89x123456789x123456789x
ATOM1N  MET154.67311.02011.7591.000.00
ATOM2HN1MET154.56011.70311.0101.000.00
These two lines should be the same length; see f003.
i.e.
f001,4
f002,1,0
f003,3
f004,3
f005,1,0
f006,6,3
... and so on
The last field line just above reads as:
name, width in columns, number of decimal places
f006 6 3 # max number 99.999
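As a rough sketch of how such fixed-width records could be produced, awk's printf can pad each field to its width. Only f001 to f006 are handled, because the widths of the later fields are not spelled out above; the helper name `to_fixed_width` is invented for the example.

```shell
# Illustrative helper: pack the first six fields into fixed-width columns.
# Widths follow the f001..f006 list above: 4, 1, 3, 3, 1, and 6 with 3 decimals.
to_fixed_width() {
    awk '{ printf "%-4s%1d%-3s%-3s%1d%6.3f\n", $1, $2, $3, $4, $5, $6 }' "$@"
}
```

With `%-3s`, a short f003 such as N is padded with blanks to three columns, so every output line comes out the same length.
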

Spaces or (more commonly) commas are added at print/display time.
ANY byte can be used as a delimiter as long as it is NOT in any
field containing data. If one wishes to waste great amounts of
disk space one can place the delimiter in the database.
I count 10 fields and 9 delimiters in each line.
A database of 1 million lines which has 9 blanks in each line
wastes 9 million bytes of storage. Thus, fully delimited
storage of orgdata is a bad idea.

It's also why this posting reads funny once posted:
the leading blanks are removed to save space.
The SDF method gets more data stored in the same space
(assuming all or most fields are not empty).

So what are you looking to do?
A) Open and read line by line until all is read
B) break each line's fields into individual vars
C) determine which var(s) to modify/delete/whatever
D) do the modifications
E) copy/print/store the re-assembled string
F) go to step A until source is consumed
G) Oh-See-Seven (GONG) done
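The steps above can be sketched as a plain shell loop. This is a skeleton only: it makes no change at step D, the function name `copy_fields` is made up, and only the first three fields get their own vars (the rest stay together in `rest`). awk does steps A, B and F for you, which is why it is usually the shorter route.

```shell
# Skeleton of steps A-G; it copies its input unchanged until you fill in step D.
copy_fields() {
    # A/F: read one line per pass until the input is consumed
    while read -r f001 f002 f003 rest; do
        # B: the first three fields are now in f001, f002, f003
        # C/D: decide what to change and change it here (nothing is changed in this sketch)
        # E: print the re-assembled line
        printf '%s %s %s %s\n' "$f001" "$f002" "$f003" "$rest"
    done
    # G: done
}
```

It reads standard input, so a call would look like `copy_fields < orgdata > moddata`.
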

TIP:
Visual inspection of supplied data suggests:
Col25 = 25 - (MaxLineLength - CurrentLineLength)

Because the translation may not be saying what you mean,
at this point I need to ask you:
Given:
ATOM 2 HN1 MET 1 54.560 11.703 11.010 1.00 0.00
Is this the desired result?
ATOM 2 HN1 MET 1 54.560 A 11703 11.010 1.00 0.00
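If that line really is the desired result, a field-based awk sketch along these lines would produce it. It rests on two assumptions read off the example only: the A goes in as a new field after the 6th field, and the decimal point is removed from the 7th. The helper name `add_a_strip_dot` is invented.

```shell
# Illustrative helper; only useful if the line shown above really is the goal.
add_a_strip_dot() {
    awk '{
        sub(/\./, "", $7)   # remove the decimal point from the 7th field
        $6 = $6 " A"        # add A as a new field after the 6th
        print               # awk rebuilds the line with single spaces
    }' "$@"
}
```

Because it works on fields rather than character positions, the differing line lengths no longer matter.
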

Norseman01