LinuxQuestions.org
Old 05-28-2013, 04:26 PM   #1
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Rep: Reputation: Disabled
Inserting Blank Spaces-Line command


Hey guys, I hope you can help me with a simple newbie question. I have a tab-separated file with only 2 columns: a score and a line number. Lines that have no score are missing from the file, and I would like to put them back so that the line numbers run sequentially. Could you help me with a command to insert blank lines for the missing numbers, or even better, insert a score of 0 on the lines that do not have a score? Some of my files contain thousands of lines, so ideally this can be done with a script rather than by filling in the 0s manually.

Thanks!

Example:
------------------------------
Score Line
2 0
1 1
1 2
2 5
2 6
1 7
3 8
2 16
1 18
1 19
1 24
1 25


Want it to look like :
Score Line number
2 0
1 1
1 2
0 3
0 4
2 5
2 6
1 7
3 8
0 9
1 10
3 11
1 12
3 13
1 14
2 15
2 16
0 17
1 18
1 19
0 20
0 21
0 22
0 23
1 24
1 25

Last edited by sawdusted; 05-28-2013 at 04:30 PM.
 
Old 05-28-2013, 07:10 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
1. Please use code tags: https://www.linuxquestions.org/quest...do=bbcode#code
2. Can you expand on that? The second file just looks like a longer version of the first one. I don't understand your question.
 
Old 05-28-2013, 10:37 PM   #3
Beryllos
Member
 
Registered: Apr 2013
Location: Massachusetts
Distribution: Debian
Posts: 304

Rep: Reputation: 121
chrism01, In the first file, line numbers are omitted when the score is zero. He wants those lines put back in so the full range of line numbers is shown for each file.

sawdusted, Where are these score files coming from? What code is used to generate them? Rather than patching up the files after the fact, why don't you rewrite the originating program or script so it includes all the lines (doesn't skip lines with score of zero)?

By the way, is this a homework question?

The way I would approach it is to declare an array and initialize all scores to zero. Then read in the file to insert the non-zero scores wherever they may occur. Then write all lines back to the output file (as the original scoring program should have done).

Try writing your own script to do that. If you get stuck or have specific questions, give us a holler.
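
For reference, the array approach described above could be sketched in awk roughly as follows. This is only a minimal sketch under some assumptions: the input has the score in column 1 and the line number in column 2, any header line such as "Score Line" should be skipped, and numbering starts at 0. The function name `fill_bins` and its optional second argument (the highest line number to print) are made up for this illustration.

```shell
# fill_bins FILE [MAX]
# Hypothetical helper: prints "score<TAB>line" for every line number from 0
# up to MAX (or up to the largest line number seen when MAX is omitted),
# using 0 wherever FILE has no entry.
fill_bins() {
    awk -v max="${2:-}" '
        # Skip anything whose second column is not a plain number (e.g. a header).
        $2 !~ /^[0-9]+$/ { next }
        {
            score[$2 + 0] = $1              # remember the non-zero score
            if ($2 + 0 > last) last = $2 + 0
        }
        END {
            if (max != "") last = max + 0   # caller supplied the upper bound
            for (i = 0; i <= last; i++) {
                s = (i in score) ? score[i] : 0
                printf "%s\t%d\n", s, i
            }
        }
    ' "$1"
}
```

It could be called as `fill_bins scores.txt > scores.filled.txt`, where both file names are placeholders. Passing the upper bound explicitly matters when the trailing line numbers also have no score, because the input alone cannot reveal them.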
 
Old 05-29-2013, 01:11 AM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Now I see it.
I agree, fixing the generating program makes more sense than patching the output after the fact.
 
Old 05-29-2013, 03:33 PM   #5
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Original Poster
Rep: Reputation: Disabled
Thanks for the replies, guys. No, this is not homework.

The original files were generated by counting hits that fall within a particular numbered region. If there were no hits in a region, no count/score was written for it.

I'm not sure how to modify my original script. Maybe you can help?

Basically, I start by extracting the lines that fall within a particular region of a chromosome:
Code:
 grep -w chr10 nelf-ctl.bowtie | gawk '$4>102104816 && $4<102126247'> scd.nelf-ctl
Then I cut out a column and count and sort the hits:
Code:
 cut -f4 scd.nelf-ctl | gawk '{print int(($1-102104816)/10)}'| sort | uniq -c | sort -k2,2n > scd.nelf-ctl.10bp-bin.counts
The output is as I described in the original post.

I have tried to create a sequentially numbered file and to join the counts file with it, but it doesn't always work. Sometimes it joins up to line 100, sometimes up to line 90, and sometimes it skips lines 100-999.

Code:
 gawk 'BEGIN {for (i=0; i<=( 21431/10); i++) print i}' > scd-10bp.allbins
 join -1 1 -2 2 -a 1 scd-10bp.allbins scd.nelf-ctl.10bp-bin.counts > scd.nelf-ctl.10bp-bin.allbins
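
A likely explanation for the inconsistent joins: `join` expects both inputs to be sorted on the join field in collation (text) order, whereas both files here are in numeric order, so `10` comes after `9` instead of right after `1`. In addition, `-a 1` on its own prints unmatched bins without any score. The sketch below sorts both files as text, joins them with `-e 0` so that missing counts become 0, and restores numeric order at the end. The function name `join_bins` is made up for this illustration, and the sketch assumes the counts file has the count in column 1 and the bin in column 2, as `uniq -c` produces.

```shell
# join_bins ALLBINS COUNTS
# Hypothetical helper: ALLBINS lists every bin number, one per line;
# COUNTS holds "count bin" pairs. Prints "count bin" for every bin,
# with 0 for bins that are absent from COUNTS.
join_bins() {
    all_sorted=$(mktemp)
    counts_sorted=$(mktemp)
    # join needs text-sorted input; LC_ALL=C keeps sort and join consistent.
    LC_ALL=C sort -k1,1 "$1" > "$all_sorted"
    LC_ALL=C sort -k2b,2 "$2" > "$counts_sorted"
    # -a 1 keeps bins with no count, -e 0 fills the missing count with 0,
    # -o selects the output columns: count (file 2, field 1), then bin.
    LC_ALL=C join -1 1 -2 2 -a 1 -e 0 -o 2.1,1.1 "$all_sorted" "$counts_sorted" |
        LC_ALL=C sort -k2,2n
    rm -f "$all_sorted" "$counts_sorted"
}
```

With the files from this post it would be run as `join_bins scd-10bp.allbins scd.nelf-ctl.10bp-bin.counts > scd.nelf-ctl.10bp-bin.allbins`.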
Sorry for this long post.

Thank you for your help.
Julian

Last edited by sawdusted; 05-29-2013 at 03:34 PM.
 
Old 05-29-2013, 07:32 PM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
I'd pick one language, e.g. awk (in my case Perl, of course), and do the whole thing in that language.
This way, you can init the score to zero for every record/match before you start and just overwrite it if you get a non-zero 'score'.
If you call lots of other tools, it makes it harder to preserve multiple values, unless you use a lot of temp files.
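
As an illustration of the single-language idea, the grep/cut/sort/uniq pipeline from the earlier post could be folded into one awk program that counts hits per bin and then prints every bin, including the empty ones. This is a rough sketch, not a drop-in replacement: it assumes the alignment file is tab-separated with the chromosome name in column 3 and the position in column 4 (the original `grep -w` matched the name anywhere on the line), and the function name `count_bins` and its parameters are invented for this example.

```shell
# count_bins FILE CHR START END WIDTH
# Hypothetical helper: counts positions (column 4) on chromosome CHR
# (assumed to be column 3) that lie strictly between START and END,
# in bins of WIDTH bases, and prints "count<TAB>bin" for every bin.
count_bins() {
    awk -v chr="$2" -v start="$3" -v end="$4" -v width="$5" '
        BEGIN { FS = "\t" }
        # Same filter and binning as the original pipeline.
        $3 == chr && $4 > start && $4 < end {
            count[int(($4 - start) / width)]++
        }
        END {
            nbins = int((end - start) / width)
            # An unset count evaluates to 0, so empty bins print as 0.
            for (i = 0; i <= nbins; i++)
                printf "%d\t%d\n", count[i] + 0, i
        }
    ' "$1"
}
```

For the region in this thread the call would be `count_bins nelf-ctl.bowtie chr10 102104816 102126247 10 > scd.nelf-ctl.10bp-bin.allbins`. No temporary files or sorting are needed, because the bins are printed in order from the array.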

Re Perl: bioinformatics isn't my area, but I know there are a lot of Perl modules for it; see search.cpan.org.
A couple of examples: http://search.cpan.org/~cjfields/Bio...01/Bio/Perl.pm, https://en.wikipedia.org/wiki/Bioperl

Last edited by chrism01; 05-29-2013 at 07:34 PM.
 
  

