Replace 6th column entries

sam@ · 03-03-2015, 12:14 AM

Hi

My input file looks like this:

Code:

String000002  GeneWise        CW     48945   49354   .       -       0       Pt=PEQU_00004;
String000002  LEN   NA    52125   52604   0.945751        -       .       PID=PEQU_00005;lvid_id=PEQ_28708;
String000002  LEN   CW     52125   52604   .       -       0       Pt=PEQU_00005;
String000002  WEise        NA    66200   66667   45.48   -       .       PID=PEQU_00006;lvid_id=Os03t0797100-00-D1363;Shift=0;
String000002  WEise        CW     66200   66667   .       -       0       Pt=PEQU_00006;
String000002  GUST        NA    90829   91128   0.21    +       .       PID=PEQU_00007;lvid_id=A00088;
String000002  GUST        CW     90829   91128   0.21    +       0       Pt=PEQU_00007;
String000002  LEN   NA    104627  107284  0.499954        -       .       PID=PEQU_00008;lvid_id=PEQ_36749;
String000002  LEN   CW     104627  105584  .       -       1       Pt=PEQU_00008;

I want to replace all my 6th column entries with a dot (.).

My original file has decimal values in column 6, and some 6th column entries are already a dot. I just want every 6th column entry to be a dot.
My required output file is:

Code:

String000002  GeneWise        CW     48945   49354   .       -       0       Pt=PEQU_00004;
String000002  LEN   NA    52125   52604   .        -       .       PID=PEQU_00005;lvid_id=PEQ_28708;
String000002  LEN   CW     52125   52604   .       -       0       Pt=PEQU_00005;
String000002  WEise        NA    66200   66667   .   -       .       PID=PEQU_00006;lvid_id=Os03t0797100-00-D1363;Shift=0;
String000002  WEise        CW     66200   66667   .       -       0       Pt=PEQU_00006;
String000002  GUST        NA    90829   91128   .    +       .       PID=PEQU_00007;lvid_id=A00088;
String000002  GUST        CW     90829   91128   .    +       0       Pt=PEQU_00007;
String000002  LEN   NA    104627  107284  .        -       .       PID=PEQU_00008;lvid_id=PEQ_36749;
String000002  LEN   CW     104627  105584  .       -       1       Pt=PEQU_00008;

Is there a sed or awk command that I could use?

pan64 · 03-03-2015, 12:34 AM

yes: awk ' { $6="." }'

sam@ · 03-03-2015, 08:41 AM

Hi, I tried awk ' { $6="." }' infile > outfile

But it gave me an empty file. Am I missing anything?

pan64 · 03-03-2015, 08:47 AM

yes, choose:

Code:

awk '$6="."' in>out
- or -
awk '{$6=".";print}' in>out
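A small sketch of why the first attempt produced an empty file: an awk action block that contains no print statement writes nothing, even though the assignment to $6 still happens. The two sample lines and the variable names below are made up for illustration and are space separated for brevity.

```shell
# Two hypothetical sample lines (not taken from the real file).
sample='a b c d e 0.94 g
a b c d e . g'

# No print statement: $6 is assigned, but nothing is written.
no_print=$(printf '%s\n' "$sample" | awk '{ $6="." }')

# Explicit print: every line is written after the assignment.
with_print=$(printf '%s\n' "$sample" | awk '{ $6="."; print }')

printf 'without print: [%s]\n' "$no_print"
printf 'with print:\n%s\n' "$with_print"
```

The shorter form awk '$6="."' relies on the assigned value being a non-empty string, which awk treats as true and so triggers the default print action. The explicit print form is likely the safer habit, since assigning an empty string that way would print nothing.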

sam@ · 03-03-2015, 09:44 AM

Hi,
Thanks, it did replace the 6th column entries, but it also got rid of the spacing, which is necessary for further processing.

My columns are separated by tabs, which were eliminated in the process. Here is how it looks now:

Code:

String000002 GeneWise CW 48945 49354 . - 0 Pt=PEQU_00004;
String000002 LEN NA 52125 52604 . - . PID=PEQU_00005;lvid_id=PEQ_28708;
String000002 LEN CW 52125 52604 . - 0 Pt=PEQU_00005;
String000002 WEise NA 66200 66667 . - . PID=PEQU_00006;lvid_id=Os03t0797100-00-D1363;Shift=0;
String000002 WEise CW 66200 66667 . - 0 Pt=PEQU_00006;
String000002 GUST NA 90829 91128 . + . PID=PEQU_00007;lvid_id=A00088;
String000002 GUST CW 90829 91128 . + 0 Pt=PEQU_00007;
String000002 LEN NA 104627 107284 . - . PID=PEQU_00008;lvid_id=PEQ_36749;
String000002 LEN CW 104627 105584 . - 1 Pt=PEQU_00008;

jpollard · 03-03-2015, 09:55 AM

It still has all fields. What is the problem with the spacing?

sam@ · 03-03-2015, 10:10 AM

If you compare it with my required output file, that one has tab spacing between the fields, which is important for further parsing.

Here is the sample output required:

Code:

String000002  GeneWise        CW     48945   49354   .       -       0       Pt=PEQU_00004;
String000002  LEN   NA    52125   52604   .        -       .       PID=PEQU_00005;lvid_id=PEQ_28708;
String000002  LEN   CW     52125   52604   .       -       0       Pt=PEQU_00005;
String000002  WEise        NA    66200   66667   .   -       .       PID=PEQU_00006;lvid_id=Os03t0797100-00-D1363;Shift=0;
String000002  WEise        CW     66200   66667   .       -       0       Pt=PEQU_00006;
String000002  GUST        NA    90829   91128   .    +       .       PID=PEQU_00007;lvid_id=A00088;
String000002  GUST        CW     90829   91128   .    +       0       Pt=PEQU_00007;
String000002  LEN   NA    104627  107284  .        -       .       PID=PEQU_00008;lvid_id=PEQ_36749;
String000002  LEN   CW     104627  105584  .       -       1       Pt=PEQU_00008;

Here is the result of the code:

Code:

String000002 GeneWise CW 48945 49354 . - 0 Pt=PEQU_00004;
String000002 LEN NA 52125 52604 . - . PID=PEQU_00005;lvid_id=PEQ_28708;
String000002 LEN CW 52125 52604 . - 0 Pt=PEQU_00005;
String000002 WEise NA 66200 66667 . - . PID=PEQU_00006;lvid_id=Os03t0797100-00-D1363;Shift=0;
String000002 WEise CW 66200 66667 . - 0 Pt=PEQU_00006;
String000002 GUST NA 90829 91128 . + . PID=PEQU_00007;lvid_id=A00088;
String000002 GUST CW 90829 91128 . + 0 Pt=PEQU_00007;
String000002 LEN NA 104627 107284 . - . PID=PEQU_00008;lvid_id=PEQ_36749;
String000002 LEN CW 104627 105584 . - 1 Pt=PEQU_00008;

It did the replacement, but the tab spacing is required for further parsing.

Is it possible to retain the file's format and replace only the 6th column, using sed or awk?

jpollard · 03-03-2015, 11:19 AM

Not without knowing what it is supposed to be... And the input seems to have varying field width (which would be why the columns don't match up).

It almost looks like the field width is varying depending on the contents - more spaces make things wider...

Now if each field were tab separated, then that can be fixed: use the -F option and specify that the field separator is a tab. By default, fields are separated by one or more whitespace characters (spaces or tabs). If the fields are delimited by just a tab, then using the explicit separator would preserve the spaces, as they are no longer considered field separators.
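A minimal sketch of the tab-separated approach, assuming the real file uses exactly one tab between fields. The tabs were lost earlier because awk rebuilds the line with the output field separator (OFS, a single space by default) whenever a field is assigned, so OFS has to be set to a tab as well. The two sample lines are shortened, hypothetical versions of the data above.

```shell
# Hypothetical two-line sample with real tab characters between fields.
sample=$(printf 'String000002\tLEN\tNA\t52125\t52604\t0.945751\t-\t.\tPID=PEQU_00005;\nString000002\tLEN\tCW\t52125\t52604\t.\t-\t0\tPt=PEQU_00005;')

# FS splits on a single tab; OFS makes awk rejoin the fields with a tab.
result=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" } { $6 = "."; print }')

printf '%s\n' "$result"
```

On the real data the same awk program would be run as awk 'BEGIN { FS = OFS = "\t" } { $6 = "."; print }' infile > outfile. If the file mixes tabs and runs of spaces, this will not split the fields as intended.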

pan64 · 03-04-2015, 12:37 AM

sed 's/^\([^ \t]*\s*[^ \t]*\s*[^ \t]*\s*[^ \t]*\s*[^ \t]*\s*\)[^ \t]*\(\s.*\)/\1.\2/'
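The same idea can be sketched with POSIX character classes: keep the first five fields and their original spacing in a capture group, then swap only the 6th field for a dot. This assumes a sed that accepts -E (GNU and BSD sed both do) and lines that do not start with whitespace. The function name replace_col6 and the sample line are made up for illustration.

```shell
# Replace the 6th whitespace-delimited field with "." and leave all
# original spacing (tabs or runs of spaces) untouched.
# Assumption: lines do not begin with whitespace.
replace_col6() {
    sed -E 's/^(([^[:space:]]+[[:space:]]+){5})[^[:space:]]+/\1./'
}

# Hypothetical line mixing a tab with runs of spaces.
line=$(printf 'String000002  LEN\tNA    52125   52604   0.945751        -       .       PID=PEQU_00005;')
out=$(printf '%s\n' "$line" | replace_col6)
printf '%s\n' "$out"
```

Because the text after the 6th field is never matched, it passes through unchanged, so this works whether the separators are tabs, spaces, or a mix. On a file it would be used as replace_col6 < infile > outfile.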