LinuxQuestions.org
Old 04-10-2012, 11:48 AM   #1
rafir
LQ Newbie
 
Registered: Oct 2011
Posts: 8

Rep: Reputation: Disabled
Bash awk coding loop


Hi

I have a very large file of format:

chr1 54478 54479 00 00 00 00 3
chr1 41980 41981 11 11 11 11 4
chr1 54352 54353 00 nn 01 00 4
chr1 52726 52727 nn 00 01 01 12
chr1 47291 47292 nn nn nn 00 13
chr1 46669 46670 01 nn nn nn 14
chr1 47107 47108 nn nn 01 nn 14
chr1 54379 54380 00 00 0n 00 15
chr1 49297 49298 nn nn nn nn 16
chr1 54675 54676 00 00 00 0n 16
chr1 55163 55164 11 11 nn 11 16
chr1 51672 51673 00 nn nn nn 18
...

I would like to subset the data multiple times by column 8. Then, for each subset, I want to count how many times columns 4 and 5 do not match, and likewise for columns 6 and 7. The only way I could think of was to mix bash and awk, but it does not seem to work.

for(i=1; i<=120; i++);
grep $8<=i|
awk 'BEGIN{n=0;m=0}
{
if($4!=$5){n=n+1;}
if($6!=$7){m=m+1;}
}
END{print i, n, m}'
done

**************************************

-or-

for(i=1; i<=120; i++);
awk 'BEGIN{n=0;m=0}
{if($8<=i){
if($4!=$5){n=n+1;}
if($6!=$7){m=m+1;}
}
}
END{print i, n, m}'
done

Thank you

Last edited by rafir; 04-10-2012 at 12:57 PM.
 
Old 04-10-2012, 12:04 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.

Could you please explain what you want a little more clearly? Show us exactly what the output should look like for the above example.

The first thing I noticed:

Code:
for(i=1, i=120, i++){
grep $8>=i | awk
The first part of this command is parsed by the shell, not by awk, so $8 here is taken as the shell's eighth positional parameter. The for loop is also not valid shell syntax.
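To illustrate the point, here is a minimal sketch showing how $8 is read by the shell versus by awk, and what a C-style loop looks like in bash. The function name show_eighth and the awk variable limit are made up for this example.

```shell
#!/bin/bash
# In the shell, $8 is the eighth positional parameter (here, of a function).
show_eighth() { printf '%s\n' "$8"; }
shell_eighth=$(show_eighth a b c d e f g h)

# Inside single quotes the shell leaves $8 alone, so awk sees it as field 8.
awk_eighth=$(echo "chr1 54478 54479 00 00 00 00 3" | awk '{ print $8 }')

# Bash C-style loop: double parentheses, then "do" ... "done".
# -v copies the shell variable i into an awk variable (named limit here).
loop_out=$(
  for (( i = 1; i <= 3; i++ )); do
    awk -v limit="$i" 'BEGIN { print "limit is", limit }'
  done
)

echo "$shell_eighth"
echo "$awk_eighth"
echo "$loop_out"
```

Note that calling awk once per value of i would reread the whole input file each time, which may be slow on a very large file.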
 
Old 04-10-2012, 12:20 PM   #3
rafir
LQ Newbie
 
Registered: Oct 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
That's true. I am mixing up formats. So for (i =1:18) I would want an output of:

3 0 0
4 1 1
...
12 2 1
13 2 2
14 3 3
15 3 4
16 3 6
17 3 6
18 4 6
 
Old 04-10-2012, 12:41 PM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,195

Rep: Reputation: 1796
Yeah, you've still lost me. Maybe you could explain how you are measuring the data you have shown? For example, in 18 4 6, I follow that 18 comes from the last column, but I have absolutely no idea how you arrived at the other two numbers.
 
Old 04-10-2012, 12:49 PM   #5
amani
Senior Member
 
Registered: Jul 2006
Location: Kolkata, India
Distribution: 64-bit GNU/Linux, Kubuntu64, Fedora QA, Slackware,
Posts: 2,754

Rep: Reputation: Disabled
@grail, 4 and 6 must be the numbers of non-matches (see the first post).
 
Old 04-10-2012, 12:58 PM   #6
rafir
LQ Newbie
 
Registered: Oct 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
amani is right. Among the rows whose last column is <= 18, there are 4 non-matches where ($4 != $5) and 6 non-matches where ($6 != $7).
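As a rough check of that rule for a single threshold, the sketch below counts both kinds of non-match over the twelve sample rows from the first post. The temporary file and the awk variable name limit are assumptions made for this example.

```shell
#!/bin/bash
# The twelve sample rows from the first post, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
chr1 54478 54479 00 00 00 00 3
chr1 41980 41981 11 11 11 11 4
chr1 54352 54353 00 nn 01 00 4
chr1 52726 52727 nn 00 01 01 12
chr1 47291 47292 nn nn nn 00 13
chr1 46669 46670 01 nn nn nn 14
chr1 47107 47108 nn nn 01 nn 14
chr1 54379 54380 00 00 0n 00 15
chr1 49297 49298 nn nn nn nn 16
chr1 54675 54676 00 00 00 0n 16
chr1 55163 55164 11 11 nn 11 16
chr1 51672 51673 00 nn nn nn 18
EOF

# For rows with column 8 <= limit, count rows where the pairs differ.
# "+ 0" makes an untouched counter print as 0 instead of an empty string.
count_18=$(awk -v limit=18 '
  $8 <= limit { n += ($4 != $5); m += ($6 != $7) }
  END { print limit, n + 0, m + 0 }
' "$data")

echo "$count_18"   # 18 4 6
rm -f "$data"
```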
 
Old 04-10-2012, 01:05 PM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,195

Rep: Reputation: 1796
Okay ... so it is cumulative ... nice to know

Next silly question: when the last column repeats the same number (i.e. rows 2 and 3 both end in 4), should we hold the output until the last column changes?

As an example, if we were not doing it per change, the output would be:
Code:
3 0 0
4 0 0
4 1 1
It may seem like an odd question given your output example; however, your example also includes values not present in the original data, such as 17.
 
Old 04-10-2012, 01:38 PM   #8
rafir
LQ Newbie
 
Registered: Oct 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, only output when i changes value; every value of i should get exactly one entry. So the complete output from above would be:

1 0 0
2 0 0
3 0 0
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
10 1 1
11 1 1
12 2 1
13 2 2
14 3 3
15 3 4
16 3 6
17 3 6
18 4 6
 
Old 04-11-2012, 02:49 AM   #9
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,367

Rep: Reputation: 1910
No need to mix bash and a powerful tool like awk! Please check this script:
Code:
#!/bin/awk -f

{

  pair_one[$NF] = pair_one[$NF] + ( $4 != $5 )
  pair_two[$NF] = pair_two[$NF] + ( $6 != $7 )
  
}

END {

  for ( i = 1; i <= $NF; i++ ) {
    sum_one += pair_one[i]
    sum_two += pair_two[i]
    print i, sum_one, sum_two
  }
    
}
The loop in the END section terminates at the value found in the last column of the last line of the file. Replace $NF with 120 if you already know that is the maximum value (or if the numbers in the last column are not sorted in ascending order). Also note that, as written, this is a script interpreted by awk (see the shebang on the very first line). Hope this helps.
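If relying on $NF inside END is a concern, one possible variant (a sketch under assumptions, not a tested fix for the real file) tracks the largest last-column value while reading. The max variable and the guard that skips lines without a numeric last field are additions for this example; the array names follow the script above.

```shell
#!/bin/bash
# The twelve sample rows from the first post, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
chr1 54478 54479 00 00 00 00 3
chr1 41980 41981 11 11 11 11 4
chr1 54352 54353 00 nn 01 00 4
chr1 52726 52727 nn 00 01 01 12
chr1 47291 47292 nn nn nn 00 13
chr1 46669 46670 01 nn nn nn 14
chr1 47107 47108 nn nn 01 nn 14
chr1 54379 54380 00 00 0n 00 15
chr1 49297 49298 nn nn nn nn 16
chr1 54675 54676 00 00 00 0n 16
chr1 55163 55164 11 11 nn 11 16
chr1 51672 51673 00 nn nn nn 18
EOF

cumulative=$(awk '
  # Only use lines with at least 8 fields and an all-digit last field.
  NF >= 8 && $NF ~ /^[0-9]+$/ {
    k = $NF + 0                    # force a numeric key
    pair_one[k] += ($4 != $5)      # non-matches of columns 4/5 at this key
    pair_two[k] += ($6 != $7)      # non-matches of columns 6/7 at this key
    if (k > max) max = k           # remember the largest key seen
  }
  END {
    # Running sums from 1 up to the largest key, one line per value of i.
    for (i = 1; i <= max; i++) {
      sum_one += pair_one[i]
      sum_two += pair_two[i]
      print i, sum_one + 0, sum_two + 0
    }
  }
' "$data")

echo "$cumulative"
rm -f "$data"
```

Because the per-key counts are summed only in END, this version does not need the input to be sorted by the last column.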

Last edited by colucix; 04-11-2012 at 02:53 AM.
 
Old 04-11-2012, 03:50 AM   #10
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,195

Rep: Reputation: 1796
Right ... so now that I have all the information, you might want something like:
Code:
awk '{a[$NF] = a[last] + ($4!=$5);b[$NF] = b[last] + ($6!=$7);last = $NF}END{for(i=1;i<=$NF;i++){if(a[i])n = i;print i,a[n],b[n]}}' file
Currently it does not print zeros, but I am sure you can change that as needed.
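One possible way to get zeros printed, offered as a sketch rather than a definitive version: test key membership with (i in a) instead of the stored value, and add 0 when printing. It also loops up to the saved variable last instead of $NF. Like the one-liner above, it assumes the input is sorted by the last column; the four rows used here are just the start of the sample data.

```shell
#!/bin/bash
# First four sample rows from the first post (sorted by the last column).
data=$(mktemp)
cat > "$data" <<'EOF'
chr1 54478 54479 00 00 00 00 3
chr1 41980 41981 11 11 11 11 4
chr1 54352 54353 00 nn 01 00 4
chr1 52726 52727 nn 00 01 01 12
EOF

running=$(awk '
  {
    # Carry the running totals forward from the previous key.
    a[$NF] = a[last] + ($4 != $5)
    b[$NF] = b[last] + ($6 != $7)
    last = $NF
  }
  END {
    for (i = 1; i <= last; i++) {
      if (i in a) n = i            # key exists, even if its total is 0
      print i, a[n] + 0, b[n] + 0  # "+ 0" turns an empty value into 0
    }
  }
' "$data")

echo "$running"
rm -f "$data"
```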
 
Old 04-11-2012, 12:32 PM   #11
rafir
LQ Newbie
 
Registered: Oct 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
When run on the data above, both scripts produce huge (but not identical) files, which look like the result of infinite loops.

What is the meaning of the syntax:

pair_one[$NF] = pair_one[$NF] + ( $4 != $5 )
 
Old 04-11-2012, 01:34 PM   #12
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,367

Rep: Reputation: 1910
Quote:
Originally Posted by rafir
When run on the data above, both scripts produce huge (but not identical) files, which look like the result of infinite loops.
How did you run the code? Please show us what you entered on the command line, what you got, and the current content of your script, preferably inside CODE tags to make it more readable.

Quote:
Originally Posted by rafir
What is the meaning of the syntax:

pair_one[$NF] = pair_one[$NF] + ( $4 != $5 )
This sets the element of the array pair_one whose index is $NF (that is, the value of the last field of the current record) to its current value plus the value of the expression
Code:
( $4 != $5 )
In awk (as in many programming languages) a logical expression evaluates to 0 if it is false and 1 if it is true. Hence the count is increased by 1 if the two fields differ and is left unchanged if they are equal. Hope that's a bit clearer now.
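A quick way to see this at the command line (a small sketch; the variable names truth and count and the array name c are made up for the example):

```shell
#!/bin/bash
# A comparison yields 1 when true and 0 when false.
truth=$(awk 'BEGIN { print ("00" != "nn"), ("11" != "11") }')
echo "$truth"   # 1 0

# Adding that result to an array element counts the mismatch for this key.
count=$(echo "chr1 54352 54353 00 nn 01 00 4" |
  awk '{ c[$NF] = c[$NF] + ($4 != $5); print c[$NF] }')
echo "$count"   # 1
```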
 
Old 04-12-2012, 03:18 AM   #13
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,195

Rep: Reputation: 1796
I am with colucix. I have run the code on the given example and it generates the exact output you have given.
 
  



Tags
awk, bash scripting, loops

