Hi
I have a file whose contents are as follows:
Code:
sorce1 LEN assumption 695 3570 0.770047 - . ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1 LEN descriptive 3334 3570 . - 0 Parent=f000001.1;
sorce1 LEN assumption 8859 11328 0.628724 + . ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 8859 9032 . + 0 Parent=f000002.1;
sorce1 LEN assumption 354569 361011 0.628724 + . ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 354600 360111 . + 0 Parent=f000012.1;
sorce1 LEN assumption 350567 354686 0.628724 + . ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 350567 353321 . + 0 Parent=f000012.2;
I want it to look like this:
Code:
sorce1 LEN predictive 695 3570 0.770047 - . ID=f000001;source_id=A.off_LEN_10008424;
sorce1 LEN assumption 695 3570 0.770047 - . ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1 LEN descriptive 3334 3570 . - 0 Parent=f000001.1;
sorce1 LEN predictive 8859 11328 0.628724 + . ID=f000002;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 8859 11328 0.628724 + . ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 8859 9032 . + 0 Parent=f000002.1;
sorce1 LEN predictive 350567 361011 0.628724 + . ID=f000012;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 354569 361011 0.628724 + . ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 354600 360111 . + 0 Parent=f000012.1;
sorce1 LEN assumption 350567 354686 0.628724 + . ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 350567 353321 . + 0 Parent=f000012.2;
Basically, I want to add a line whose third column is predictive and whose ID contains only the ID name, without anything after the dot.
So for every assumption line, I need to add a predictive line.
So I used this code:
sed 's/\(.*\)assumption\(.*\)\(ID=[^.]*\)[^;]*\(;.*\)/\1predictive\2\3\4\n&/' file
However, in my file there are some cases where an ID name has variants. For example, one variant of an ID is f000012.1 and the other is f000012.2.
The code above works perfectly for IDs with no variants, but where there are variants I get multiple predictive lines for the same ID.
Result of the code:
Code:
sorce1 LEN predictive 695 3570 0.770047 - . ID=f000001;source_id=A.off_LEN_10008424;
sorce1 LEN assumption 695 3570 0.770047 - . ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1 LEN descriptive 3334 3570 . - 0 Parent=f000001.1;
sorce1 LEN predictive 8859 11328 0.628724 + . ID=f000002;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 8859 11328 0.628724 + . ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 8859 9032 . + 0 Parent=f000002.1;
sorce1 LEN predictive 354569 361011 0.628724 + . ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 354569 361011 0.628724 + . ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 354600 360111 . + 0 Parent=f000012.1;
sorce1 LEN predictive 350567 354686 0.628724 + . ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 350567 354686 0.628724 + . ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1 LEN descriptive 350567 353321 . + 0 Parent=f000012.2;
whereas what I need for f000012 should look like this:
sorce1 LEN predictive 350567 361011 0.628724 + . ID=f000012;source_id=A.off_LEN_10008425;
Is there a way to add only a single predictive line per ID, using the earliest start point and the farthest end point among its variants? The ID name should not have the variant suffix.
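To make the request concrete, here is a rough, untested-on-real-data sketch of the grouping I have in mind, written with awk instead of sed. It assumes the columns are separated by whitespace, that the attributes sit in column 9 with no spaces inside them, and that all variants of an ID share the prefix before the first dot. The function name add_predictive and the two-pass approach (reading the file twice) are just my own illustration, and copying the score and strand from the first variant is also an assumption.

```shell
#!/bin/sh
# Sketch only: add one "predictive" line per base ID (the part of ID= before
# the first dot), spanning the smallest start and largest end of its variants.
# add_predictive is a hypothetical helper name, not part of any existing tool.
add_predictive() {
    awk '
    # Return the base ID, e.g. f000012 from ID=f000012.2;source_id=...
    function base_id(attrs) {
        if (match(attrs, /ID=[^.;]+/))
            return substr(attrs, RSTART + 3, RLENGTH - 3)
        return ""
    }
    # Pass 1: remember the minimum start and maximum end for each base ID.
    NR == FNR {
        if ($3 == "assumption") {
            id = base_id($9)
            if (!(id in lo) || $4 + 0 < lo[id]) lo[id] = $4 + 0
            if (!(id in hi) || $5 + 0 > hi[id]) hi[id] = $5 + 0
        }
        next
    }
    # Pass 2: print the predictive line once, before the first variant.
    $3 == "assumption" {
        id = base_id($9)
        if (!(id in done)) {
            done[id] = 1
            attrs = $9
            sub(/ID=[^;]*/, "ID=" id, attrs)   # strip the variant suffix
            # Assumption: score, strand and phase come from the first variant.
            print $1, $2, "predictive", lo[id], hi[id], $6, $7, $8, attrs
        }
    }
    # Every original line is printed unchanged.
    { print }
    ' "$1" "$1"
}
```

It would be called as add_predictive file > file.new. Because the file is read twice, this would not work on piped input, and the output columns are joined with single spaces rather than whatever separator the input used.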
Thanks in advance.