LinuxQuestions.org
Old 02-16-2014, 09:31 PM   #1
sam@
LQ Newbie
 
Registered: Sep 2013
Posts: 14

Rep: Reputation: Disabled
Adding a line


Hi

I have a file whose contents are as follows:
Code:
sorce1       LEN   assumption   695     3570    0.770047        -       .       ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1       LEN   descriptive     3334    3570    .       -       0       Parent=f000001.1;

sorce1       LEN   assumption    8859    11328   0.628724        +       .       ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     8859    9032    .       +       0       Parent=f000002.1;

sorce1       LEN   assumption    354569    361011   0.628724        +       .       ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive        354600    360111    .       +       0       Parent=f000012.1;

sorce1       LEN   assumption    350567    354686    0.628724        +       .       ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     350567    353321    .       +       0                       Parent=f000012.2;
I want it to look like this:
Code:

sorce1       LEN   predictive    695     3570    0.770047        -       .       ID=f000001;source_id=A.off_LEN_10008424;
sorce1       LEN   assumption   695     3570    0.770047        -       .       ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1       LEN   descriptive     3334    3570    .       -       0       Parent=f000001.1;

sorce1       LEN   predictive    8859    11328   0.628724        +       .       ID=f000002;source_id=A.off_LEN_10008425;
sorce1       LEN   assumption    8859    11328   0.628724        +       .       ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     8859    9032    .       +       0       Parent=f000002.1;

sorce1       LEN   predictive    350567    361011    0.628724        +       .       ID=f000012;source_id=A.off_LEN_10008425;
sorce1       LEN   assumption    354569    361011   0.628724        +       .       ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive        354600    360111    .       +       0       Parent=f000012.1;

sorce1       LEN   assumption    350567    354686    0.628724        +       .       ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     350567    353321    .       +       0                       Parent=f000012.2;
Basically, I want to add a line whose third column is predictive and whose ID contains only the ID name, without anything after the dot.
So for every assumption line, I need to add a predictive line.

So I used this command:
sed 's/\(.*\)assumption\(.*\)\(ID=[^.]*\)[^;]*\(;.*\)/\1predictive\2\3\4\n&/' file


However, in my file there are some cases where the ID name has variants. For example, one variant of an ID is f000012.1 and the other is f000012.2.
The command above worked perfectly for IDs with no variants. But for IDs with variants, I get multiple predictive lines for the same ID.


Result of the command:
Code:
sorce1       LEN   predictive    695     3570    0.770047        -       .       ID=f000001;source_id=A.off_LEN_10008424;
sorce1       LEN   assumption   695     3570    0.770047        -       .       ID=f000001.1;source_id=A.off_LEN_10008424;
sorce1       LEN   descriptive     3334    3570    .       -       0       Parent=f000001.1;

sorce1       LEN   predictive    8859    11328   0.628724        +       .       ID=f000002;source_id=A.off_LEN_10008425;
sorce1       LEN   assumption    8859    11328   0.628724        +       .       ID=f000002.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     8859    9032    .       +       0       Parent=f000002.1;

sorce1       LEN   predictive   354569    361011   0.628724        +       .       ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1       LEN   assumption    354569    361011   0.628724        +       .       ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive        354600    360111    .       +       0       Parent=f000012.1;

sorce1       LEN  predictive     350567    354686    0.628724        +       .       ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1       LEN   assumption    350567    354686    0.628724        +       .       ID=f000012.2;source_id=A.off_LEN_10008425;
sorce1       LEN   descriptive     350567    353321    .       +       0                       Parent=f000012.2;
whereas what I need should look like this:
sorce1 LEN predictive 350567 361011 0.628724 + . ID=f000012;source_id=A.off_LEN_10008425;


Is there a way to add only a single predictive line per ID, using the earliest start point and the farthest end point among its variants? The ID name should not include the variant suffix.

Thanks in advance.
 
Old 02-17-2014, 12:47 AM   #2
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,001

Rep: Reputation: 1003
I probably missed something: your sed works exactly as you explained, and I could not reproduce the problem.
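For reference, here is a small reproduction sketch that runs the sed command from post #1 on just the two variant lines. The sample lines are copied from the first post with the whitespace condensed, and the `\n` in the replacement assumes GNU sed. Under those assumptions the ID suffix is stripped as intended, but each variant still gets its own predictive line, each with its own start and end.

```shell
# Two variant lines of the same base ID, taken from the sample data
# (whitespace condensed to single spaces for readability).
input='sorce1 LEN assumption 354569 361011 0.628724 + . ID=f000012.1;source_id=A.off_LEN_10008425;
sorce1 LEN assumption 350567 354686 0.628724 + . ID=f000012.2;source_id=A.off_LEN_10008425;'

# Same sed command as in post #1; the \n in the replacement assumes GNU sed.
result=$(printf '%s\n' "$input" |
  sed 's/\(.*\)assumption\(.*\)\(ID=[^.]*\)[^;]*\(;.*\)/\1predictive\2\3\4\n&/')

# Show the four resulting lines: one predictive line before each assumption line.
printf '%s\n' "$result"
```

So the duplicate predictive lines come from the command being applied line by line, with no memory of IDs it has already handled.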
 
Old 02-17-2014, 02:48 AM   #3
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,362

Rep: Reputation: 1910
A solution in awk that checks whether the ID has already been used:
Code:
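# Requires GNU awk (gensub is a gawk extension).
/assumption/ {
  line = $0
  # Extract the base ID, i.e. the part of ID=... before the first dot.
  i = gensub(/^.*ID=([^.]+)\.[^;]+;.*$/,"\\1","g",line)
  # Print a predictive line only the first time this base ID is seen.
  if ( ! _[i] ) {
    sub(/assumption/,"predictive",line)
    # Strip the variant suffix (.1, .2, ...) from the ID.
    line = gensub(/^(.*ID=[^.]+)\.[^;]+(;.*$)/,"\\1\\2","g",line)
    print line
  }
  _[i]++
}
# Print every input line unchanged.
1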
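Since gensub is specific to GNU awk, the script will not run under mawk or other awks that lack it. A minimal sketch of the same "seen ID" idea using only POSIX awk functions (match, substr, sub) could look like the following. The function name add_predictive_once is hypothetical, and the sketch assumes each assumption line carries exactly one ID=base.variant; attribute. It only edits a copy of the whole line, so the original spacing is left alone.

```shell
# add_predictive_once: hypothetical helper name; reads the file given as $1.
add_predictive_once() {
  awk '
  /assumption/ {
    # Locate "ID=<base>" and cut out the base ID before the first dot.
    if (match($0, /ID=[^.;]+/)) {
      base = substr($0, RSTART + 3, RLENGTH - 3)
      # Emit a predictive line only the first time this base ID is seen.
      if (!(base in seen)) {
        seen[base] = 1
        line = $0
        sub(/assumption/, "predictive", line)
        # Drop the variant suffix (.1, .2, ...) that follows the base ID.
        sub(/ID=[^.;]+\.[^;]*/, "ID=" base, line)
        print line
      }
    }
  }
  { print }   # always print the original line unchanged
  ' "$1"
}
```

Usage would be something like `add_predictive_once infile > outfile`. Like the gensub version, it takes the start and end from the first variant it meets, not the earliest start and farthest end.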
 
Old 02-17-2014, 02:57 AM   #4
pan64
Senior Member
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 4,001

Rep: Reputation: 1003
Now I think I understand: you need only the last line containing the same ID, and it should be printed at the first occurrence? Is that right?
That can be solved only in two passes: first you need to parse the input file (looking for all the possible IDs) and calculate the lines, and then print the result.
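A rough sketch of that two-pass idea in POSIX awk, reading the same file twice: the first pass records the smallest start and largest end per base ID, and the second pass prints one predictive line before the first assumption line of each ID. Several things here are assumptions, not taken from the thread: the function name add_predictive_span is hypothetical, the fields are taken to be whitespace-separated with start in column 4 and end in column 5, the new line is joined with tabs, and the remaining columns are copied from the first variant.

```shell
# add_predictive_span: hypothetical helper name; reads the file given as $1 twice.
add_predictive_span() {
  awk '
  # Base ID of a line: the text between "ID=" and the first dot.
  function base_id(s) {
    if (!match(s, /ID=[^.;]+/)) return ""
    return substr(s, RSTART + 3, RLENGTH - 3)
  }
  # Pass 1: record the smallest start ($4) and largest end ($5) per base ID.
  NR == FNR {
    if ($3 == "assumption" && (b = base_id($0)) != "") {
      if (!(b in lo) || $4 + 0 < lo[b]) lo[b] = $4 + 0
      if (!(b in hi) || $5 + 0 > hi[b]) hi[b] = $5 + 0
    }
    next
  }
  # Pass 2: before the first assumption line of each base ID, print one
  # predictive line spanning all of its variants.
  $3 == "assumption" && (b = base_id($0)) != "" && !(b in done) {
    done[b] = 1
    # Columns 1-2 and 6-8 are copied from the first variant (an assumption).
    printf "%s\t%s\tpredictive\t%d\t%d\t%s\t%s\t%s\tID=%s;", $1, $2, lo[b], hi[b], $6, $7, $8, b
    # Keep whatever followed the ID attribute, e.g. source_id=...;
    rest = $0
    sub(/^.*ID=[^;]*;/, "", rest)
    print rest
  }
  { print }   # every original line is passed through unchanged
  ' "$1" "$1"
}
```

It would be called as `add_predictive_span infile > outfile`. Only the added predictive lines are rebuilt with tabs; the existing lines keep their original spacing.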
 
Old 02-18-2014, 06:14 PM   #5
sam@
LQ Newbie
 
Registered: Sep 2013
Posts: 14

Original Poster
Rep: Reputation: Disabled
Unhappy reply

@colucix
I used the command:
Code:
awk '/assumption/ {
  line = $0
  i = gensub(/^.*ID=([^.]+)\.[^;]+;.*$/,"\\1","g",line)
  if ( ! _[i] ) {
    sub(/assumption/,"predictive",line)
    line = gensub(/^(.*ID=[^.]+)\.[^;]+(;.*$)/,"\\1\\2","g",line)
    print line
  }
  _[i]++
}
1
' infile > outfile
It is somehow changing the format of the file.

Last edited by sam@; 02-20-2014 at 11:00 AM.
 
  

