LinuxQuestions.org
Old 05-18-2012, 08:06 AM   #1
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Rep: Reputation: Disabled
Printing columns that have a specific delimiter


Hi all,

My file looks like this.

consensus_3 . GT:PL:DP:SP:GQ 0/0:0,255,106:119:0:99
consensus_6 . GT:PL:DP:SP:GQ 0/0:0,51,39:114:0:50

What I want to do is print the 1st and 3rd columns from the fields that have : as a delimiter. For example, I want the output to look like this:

GT DP 0/0 119
GT DP 0/0 114

What I did was try to loop over the columns and print the 1st and 3rd columns from the fields that have : as a delimiter.

The following is the code I tried for that.

awk '{for(x=1;x<NF;x++);split($x,a,":");print a[1],a[3]}'


But it is not working as I expected. Instead, it prints just the 1st and 3rd columns from the last field.

0/0 119
0/0 114

How can I print the specified columns from all the fields with : as a delimiter, and not just from one field?

Any thoughts would be appreciated.

Thanks

Last edited by jv61; 05-18-2012 at 08:09 AM.
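
A note on why the one-liner above only handles the last field: the semicolon right after for(...) gives the loop an empty body, so the loop only counts x up to NF, and split($x,a,":") then runs once, on the last field. Below is a minimal sketch of a corrected loop, assuming whitespace-separated input like the two sample lines in the post. The names out, sep and fixed_output are only illustrative and do not come from the thread.

```shell
# Sketch only: feed the two sample lines from the post to a corrected loop.
fixed_output=$(
  printf '%s\n' \
    'consensus_3 . GT:PL:DP:SP:GQ 0/0:0,255,106:119:0:99' \
    'consensus_6 . GT:PL:DP:SP:GQ 0/0:0,51,39:114:0:50' |
  awk '{
    out = ""
    for (x = 1; x <= NF; x++) {        # <= so that the last field is included
      if (split($x, a, ":") > 1) {     # more than one piece: the field contains ":"
        sep = (out == "") ? "" : " "
        out = out sep a[1] " " a[3]
      }
    }
    print out
  }'
)
printf '%s\n' "$fixed_output"
# prints:
# GT DP 0/0 119
# GT DP 0/0 114
```

Building the whole line in out and printing it once avoids having to track whether a trailing separator is needed.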
 
Old 05-18-2012, 08:24 AM   #2
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
You need to use the FS (field separator).
Code:
BEGIN {
     FS=":"
}
{
     printf ("%s %s\n", $1, $2);
}
Hope this helps some.
 
Old 05-18-2012, 11:11 AM   #3
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
Many thanks for the reply. I tried it with the field separator, but it still prints only the first column it encounters that has : as a delimiter.

For example this is my file:

Chrom Sample1 Sample2 Sample3
1 AD:DP:GL:CG 1/1:119:23 0/1:110:22
2 AD:DP:GL:GC 0/1:120:24 1/1:100:80

I would like to print something like this

Chrom Sample1 Sample2
1 AD GL 1/1 23
2 AD GL 0/1 24

When I try this command

BEGIN {
FS=":"
}
{
printf ("%s %s\n", $1, $3);
}

it prints only from the Sample1 column, as shown below.

AD GL
AD GL

But I would like to print it from all the sample columns.

Any thoughts on how to do that? Thank you.
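
To see what awk is working with here: once FS is set to ":", spaces no longer separate fields, so each field runs from one colon to the next and can span two of the original columns. A small sketch, using one data row from the example file above, prints every field in brackets. The shell variable colon_fields is only illustrative.

```shell
# Sketch only: show every field awk sees when FS is ":".
colon_fields=$(
  printf '1 AD:DP:GL:CG 1/1:119:23 0/1:110:22\n' |
  awk 'BEGIN { FS = ":" }
       { for (i = 1; i <= NF; i++) printf "[%s]%s", $i, (i < NF ? " " : "\n") }'
)
printf '%s\n' "$colon_fields"
# prints: [1 AD] [DP] [GL] [CG 1/1] [119] [23 0/1] [110] [22]
```

With this split, $1 and $3 can only ever reach into the first colon-separated group, which is why the later samples never show up.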
 
Old 05-19-2012, 08:09 AM   #4
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
Here's a hint. You've got a file with two different delimiters in it, spaces and colons:
Code:
Chrom Sample1 Sample2 Sample3
1 AD:DP:GL:CG 1/1:119:23 0/1:110:22
2 AD:DP:GL:GC 0/1:120:24 1/1:100:80
Run it through sed, changing the spaces to colons (requiring the use of FS) or changing the colons to spaces (the default delimiter in AWK):
Code:
sed 's/:/ /g' chrom
Chrom Sample1 Sample2 Sample3
1 AD DP GL CG 1/1 119 23 0/1 110 22
2 AD DP GL GC 0/1 120 24 1/1 100 80
or
Code:
sed 's/ /:/g' chrom
Chrom:Sample1:Sample2:Sample3
1:AD:DP:GL:CG:1/1:119:23:0/1:110:22
2:AD:DP:GL:GC:0/1:120:24:1/1:100:80
Thus you'll have something that AWK can deal with easily.

Hope this helps some.

Last edited by tronayne; 05-19-2012 at 08:10 AM.
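
The same colon-to-space conversion can also be done inside awk itself, without sed. This is only a sketch, assuming a data row like the one in the chrom file above: gsub() with two arguments works on the whole record, and changing the record makes awk split it into fields again. The shell variable converted is illustrative.

```shell
# Sketch only: replace colons with spaces inside awk, then count the fields.
converted=$(
  printf '1 AD:DP:GL:CG 1/1:119:23 0/1:110:22\n' |
  awk '{ gsub(/:/, " "); print NF " fields: " $0 }'
)
printf '%s\n' "$converted"
# prints: 11 fields: 1 AD DP GL CG 1/1 119 23 0/1 110 22
```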
 
Old 05-19-2012, 07:40 PM   #5
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,326

Rep: Reputation: 919
another quick-and-dirty hack would be to use two awks:
Code:
[schneidz@hyper ~]$ awk -F : '{print $1 " " $3 " " $5 " " $7}' jv61.lst | awk '{print $3 " " $4 " " $6 " " $7}'
GT DP 0/0 119
GT DP 0/0 114
 
Old 05-20-2012, 03:48 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
***Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.***

You need a check to ensure that you are only printing fields that have delimiters. You also need to use printf to keep all the output on a single line. This makes the final line formatting a bit trickier however.

Here's the version I came up with, formatted as a stand-alone script:

Code:
#!/usr/bin/awk -f

{
	for ( x=1 ; x<=NF ; x++ )
	{
		if ( $x ~ /[:]/ )
		{
			split( $x , a , ":" )
			printf( "%s %s" , a[1] , a[3] )
			if ( x != NF )
			{
				printf( " " )
			}
		}
	}
	print ""
}
I don't doubt there are cleaner ways to go about it, though.


Edit: Here's a revised version as mentioned in my follow-up post. To target an arbitrary number of fields, just replace NF with the maximum number you want (3, in this case). You can also print the first line by testing the NR variable. Be sure to set the number of "%s" entries in printf to match the field count.

I'm thinking that the if ":" test may be superfluous in this case, and perhaps even detrimental, if you're going to limit the fields printed by count. But in the absence of further clarification I left it in.

Finally, I took the liberty of changing the format of the output to use tabs instead of spaces, assuming you want something human-readable. Just go back and replace all the "\t"s with spaces if you don't desire that behavior.

Code:
#!/usr/bin/awk -f

{
	if ( NR == 1 )
	{
		printf( "%s\t%s\t%s\n" ,$1,$2,$3 )
		next
	}

	printf( "%s\t", $1 )
	for ( x=1 ; x<=3 ; x++ )
	{
		if ( $x ~ /[:]/ )
		{
			split( $x , a , ":" )
			printf( "%s %s" , a[1] , a[3] )
			if ( x != 3 )
			{
				printf( "\t" )
			}
		}
	}
	print ""
}

Last edited by David the H.; 05-22-2012 at 10:31 AM. Reason: 1) minor rewording 2) forgot the -f on the shebang 3) as posted
 
1 members found this post helpful.
Old 05-20-2012, 05:54 PM   #7
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Rep: Reputation: 1065
If your input file, chrom looks like this:
Code:
Chrom Sample1 Sample2 Sample3
1 AD:DP:GL:CG 1/1:119:23 0/1:110:22
2 AD:DP:GL:GC 0/1:120:24 1/1:100:80
and your AWK program, chrom.awk, looks like this:
Code:
NR == 1 {
	# print the first line (the header), then skip to the next record
	printf ("%s %s %s\n", $1, $2, $3);
	next;
}
{
	printf ("%s %s %s %s %s\n", $1, $2, $4, $6, $8);
}
then
Code:
sed 's/:/ /g' chrom | awk -f chrom.awk
Chrom Sample1 Sample2
1 AD GL 1/1 23
2 AD GL 0/1 24
You could use tabs to space things a little better, but what the heck.

That about what you wanted?

Hope this helps some.
 
1 members found this post helpful.
Old 05-22-2012, 10:06 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
I must say, all of the supplied suggestions so far that use sed or a second awk command for pre-processing assume that all input is exactly like the example the OP posted. But we can't take it for granted, at this point, that every line has exactly the same number of fields and sub-fields. Maybe they do, but the OP hasn't yet said so.

If you read the OP's actual stated requirements, you'll realize that just trying to replicate the example input>>output may not be enough to satisfy them under all conditions. My solution is so far the only one that correctly applies the criteria as described in the OP (print only sub-fields 1 and 3 of fields that contain colon delimiters), and can handle lines of arbitrary length.


(Although I did miss the part from his second post about printing the header and only a subset of fields. I'm going to go back and edit it to add a modified version after this post).


If the OP will come back and further clarify the structure of the input and the desired output, however, it may be possible that one of the above solutions, or a similar simpler command, would be satisfactory. Indeed, if the data structure is absolutely fixed, then you can simply use space OR colon as the delimiter and print only exactly the fields you want:

Code:
awk -F '[ :]' 'NR==1 { print $1,$2,$3 } ; NR>1 { print $1,$2,$4,$6,$8 }'
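
A quick way to check which field number each value lands on with a regex separator is to print the fields together with their indexes. This is only a sketch, using one data row from the chrom example; the shell variable numbered is illustrative. It uses '[ :]+' rather than '[ :]' as an assumption about the real data: when FS is a regular expression, every single match separates two fields, so input aligned with runs of spaces would otherwise produce empty fields.

```shell
# Sketch only: number each field produced by a space-or-colon separator.
numbered=$(
  printf '1 AD:DP:GL:CG 1/1:119:23 0/1:110:22\n' |
  awk -F '[ :]+' '{ for (i = 1; i <= NF; i++) printf "%d=%s%s", i, $i, (i < NF ? " " : "\n") }'
)
printf '%s\n' "$numbered"
# prints: 1=1 2=AD 3=DP 4=GL 5=CG 6=1/1 7=119 8=23 9=0/1 10=110 11=22
```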
 
1 members found this post helpful.
Old 05-22-2012, 11:15 AM   #9
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
Hi all,

Many thanks for all your replies & helpful hints. Sorry for my delayed response.

The following code from David comes close to answering my question.

Code:
#!/usr/bin/awk -f

{
	for ( x=1 ; x<=NF ; x++ )
	{
		if ( $x ~ /[:]/ )
		{
			split( $x , a , ":" )
			printf( "%s %s" , a[1] , a[3] )
			if ( x != NF )
			{
				printf( " " )
			}
		}
	}
	print ""
}

The problem I have here is, the above prints the data from each column in a separate line. For example the above code prints the results like this.

Code:
GT DP
0/0 119
0/0 92
0/0 109
0/1 22
GT DP
0/0 114
0/0 101
0/0 56
0/0 13
GT DP
1/1 99
1/1 73
0/1 101
0/0 12
But I would like to print like this

Code:
GT DP 0/0 119 0/0 92  0/0 109 0/1 22
GT DP 0/0 114 0/0 101 0/0 56  0/0 13
GT DP 1/1 99  1/1 73  0/1 101 0/0 12
David, in answer to your question about my input file: it has 96 Sample columns that have : as a delimiter, while the other columns don't. The following is a portion of my input file. I have shown just two sample columns, but I have 96 sample columns with similar data, from Sample1 to Sample96.

Code:
#CHROM      POS ID   FORMAT        Sample1                 Sample2 
consensus_3  67 .  GT:PL:DP:SP:GQ  0/0:0,255,106:119:0:99  0/0:0,255,96:92:0:99
consensus_6  48 .  GT:PL:DP:SP:GQ  0/0:0,51,39:114:0:50    0/0:0,83,45:101:0:58
consensus_48 20 .  GT:PL:DP:SP:GQ  1/1:98,255,0:99:0:99    1/1:90,220,0:73:0:98
consensus_93 48 .  GT:PL:DP:SP:GQ  1/1:84,205,0:82:0:87    0/1:45,0,53:63:0:48
What I want to do is print the 1st and 3rd columns from the fields that have : as a delimiter and print the rest of the columns unchanged. My output should look something like this:

Code:
#CHROM       POS ID FORMAT  Sample1   Sample2
consensus_3   67 .  GT DP   0/0 119   0/0  92
consensus_6   48 .  GT DP   0/0 114   0/0 101
consensus_48  20 .  GT DP   1/1  99   1/1  73
consensus_93  48 .  GT DP   1/1  82   0/1  63
Thanks
 
Old 05-22-2012, 12:58 PM   #10
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
Solved

That's awesome, thanks for the revised version David. It solved my problem. This forum & post has been really a learning experience for me. Thank you all again for sharing your ideas & help. Much appreciated :)

Code:
#!/usr/bin/awk -f
{
    if ( NR == 1 )
    {
        printf( "%s\n", $0 )
        next
    }

    printf( "%s %s %s ", $1,$2,$3 )
    for ( x=1 ; x<=NF ; x++ )
    {
        if ( $x ~ /[:]/ )
        {
            split( $x , a , ":" )
            printf( "%s %s " , a[1] , a[3] )
            if ( x != NF )
            {
                printf( " " )
            }
        }
    }
    print " "
}
 
Old 05-22-2012, 02:27 PM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by jv61
The problem I have here is, the above prints the data from each column in a separate line. For example the above code prints the results like this.

That's quite odd. There's no place in the original script where a newline should be inserted except after the whole line has been processed. Unless you modified it to insert one yourself somehow.


But now that I understand your exact requirements, we can simplify things quite a bit.

Code:
#!/usr/bin/awk -f

BEGIN{ OFS=" " }
{
	for ( x=1 ; x<=NF ; x++ )
	{
		if ( $x ~ /[:]/ )
		{
			split( $x , a , ":" )
			$x=a[1] OFS a[3]
		}
		else $x=$x
	}
	print
}
Just scan every field on the line, and if it contains a colon, split it and replace it with the modified version. Then print the line.

The addition of the BEGIN block allows you to set whatever output separator you wish between fields. The "else $x=$x" is also there so that otherwise unmodified lines such as the first one also print as separate fields according to OFS, rather than as an unmodified unit. I'm not really sure why that's necessary, to tell the truth, but according to my testing it is.
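
On the "else $x=$x" question: in awk, assigning to any field, even with its own value, makes awk rebuild $0 by joining the fields with OFS. A line on which no field was assigned is printed exactly as it was read, original spacing included. The sketch below first shows that difference, then a compact variant of the script above that forces the rebuild once per line with $1 = $1. The sample data is taken from the earlier post, and the shell variable names are only for illustration.

```shell
# A record with no field assignment keeps its original spacing.
unchanged=$(printf 'a  b   c\n' | awk 'BEGIN { OFS = "-" } { print }')
# Assigning a field, even to itself, rebuilds the record with OFS.
rebuilt=$(printf 'a  b   c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')
printf '%s\n%s\n' "$unchanged" "$rebuilt"
# prints:
# a  b   c
# a-b-c

# Compact variant of the script: one forced rebuild, then trim colon fields.
trimmed=$(
  printf '%s\n' \
    '#CHROM      POS ID   FORMAT        Sample1                 Sample2' \
    'consensus_3  67 .  GT:PL:DP:SP:GQ  0/0:0,255,106:119:0:99  0/0:0,255,96:92:0:99' |
  awk 'BEGIN { OFS = " " }
  {
    $1 = $1                              # force $0 to be rebuilt with OFS
    for (x = 1; x <= NF; x++)
      if (split($x, a, ":") > 1)         # only fields that contain ":"
        $x = a[1] OFS a[3]
    print
  }'
)
printf '%s\n' "$trimmed"
# prints:
# #CHROM POS ID FORMAT Sample1 Sample2
# consensus_3 67 . GT DP 0/0 119 0/0 92
```

Assigning "GT DP" to a field does not change NF, because awk only re-splits the record when $0 itself is assigned, so the loop still visits each original column once.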
 
Old 05-22-2012, 05:52 PM   #12
jv61
LQ Newbie
 
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by David the H.
That's quite odd. There's no place in the original script where a newline should be inserted except after the whole line has been processed. Unless you modified it to insert one yourself somehow.

Yes, I modified that one. Your simplified version is a smarter way to do it.

Thanks

Last edited by jv61; 05-22-2012 at 05:53 PM.
 
  


All times are GMT -5. The time now is 03:37 AM.
