LinuxQuestions.org
Old 02-18-2014, 05:47 PM   #1
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Rep: Reputation: 17
[bash] group without sorting


I have a tab-separated file of the following format:

Code:
<employee-id> <company-id> <name> <company> <phone>
example

Code:
123 11 "John MacDonald" "ABC Ltd" 01234814215
124 11 "Mike Smith"     "ABC Ltd" 01234814333
136 09 "Jane Doe"       "XYZ Ltd" 01234814444
135 09 "Peter Miller"   "XYZ Ltd" 01234814888
I'd like to create a tab-separated output grouping employees by company without changing the order of companies and employees:

Code:
"ABC Ltd"
 "John MacDonald" 01234814215 123
 "Mike Smith"     01234814333 124
"XYZ Ltd"
 "Jane Doe"       01234814444 136
 "Peter Miller"   01234814888 135
Speed is not of the essence. The input files are never more than 100 rows.
 
Old 02-18-2014, 06:22 PM   #2
harryhaller
Member
 
Registered: Sep 2004
Distribution: Slackware-14.2
Posts: 468

Rep: Reputation: Disabled
Try using Awk. Here is a tutorial.
Don't worry, your solution is one of the most basic functions of awk - so you won't have to read very far before you have your solution.
 
Old 02-18-2014, 07:43 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With this InFile ...
Code:
123 11 "John MacDonald" "ABC Ltd" 01234814215
124 11 "Mike Smith"     "ABC Ltd" 01234814333
136 09 "Jane Doe"       "XYZ Ltd" 01234814444
135 09 "Peter Miller"   "XYZ Ltd" 01234814888
... this awk ...
Code:
awk -F \" '{if (CoName!=$4) {CoName=$4; print "\""CoName"\""}
      print " "$1"\""$2"\""$3$5}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
"ABC Ltd"
 123 11 "John MacDonald"  01234814215
 124 11 "Mike Smith"      01234814333
"XYZ Ltd"
 136 09 "Jane Doe"        01234814444
 135 09 "Peter Miller"    01234814888
Daniel B. Martin

Last edited by danielbmartin; 02-18-2014 at 07:52 PM. Reason: Tighten the code, slightly
 
Old 02-18-2014, 07:47 PM   #4
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
Solved:

Code:
awk '
    BEGIN {
        FS  = "\t";
        OFS = "\t";
    }
    {
        if (r != $2)
            print $4;
        r = $2; 
        print "\t" $3, $5 ,$1;
    }
    ' "$INFILE"
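For anyone who wants to try this without a real input file, here is a minimal sketch that feeds the sample rows through the same logic using genuine tab separators. The `printf`-generated input and the variable names `input` and `out` are assumptions for illustration only; the awk body is a compact restatement of the script above, not a different method.

```shell
# Build the sample rows with real tabs (unquoted, as in the actual data).
# printf reuses the format for every group of five arguments.
input=$(printf '%s\t%s\t%s\t%s\t%s\n' \
    123 11 'John MacDonald' 'ABC Ltd' 01234814215 \
    124 11 'Mike Smith'     'ABC Ltd' 01234814333 \
    136 09 'Jane Doe'       'XYZ Ltd' 01234814444 \
    135 09 'Peter Miller'   'XYZ Ltd' 01234814888)

# Print the company name whenever the id in column 2 changes,
# then the employee row with an empty first column.
out=$(printf '%s\n' "$input" | awk '
    BEGIN { FS = OFS = "\t" }
    r != $2 { print $4; r = $2 }
    { print "", $3, $5, $1 }
')
printf '%s\n' "$out"
```

Because rows are only compared with the previous row, the input order is preserved and no sorting is involved; this does assume that all rows of one company are already adjacent.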
 
Old 02-18-2014, 07:49 PM   #5
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
Oops, daniel, our posts crossed!
 
Old 02-19-2014, 10:57 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
@OP - seems to me when I run your code I receive the following output:
Code:
			123 11 "John MacDonald" "ABC Ltd" 01234814215
			124 11 "Mike Smith"     "ABC Ltd" 01234814333
			136 09 "Jane Doe"       "XYZ Ltd" 01234814444
			135 09 "Peter Miller"   "XYZ Ltd" 01234814888
You may need to check this unless your requirement has changed

**** Hold the phone ****

Turns out the sample data as posted is not tab-separated. I only noticed this once I saw that your code sets the field separator to a tab; the example in the original post is padded with spaces.
So after fixing the input file and using your code I now get:
Code:
"ABC Ltd"
	"John MacDonald"	01234814215	123
	"Mike Smith"	"ABC Ltd"	124

	"Jane Doe"		136
	"Peter Miller"	"XYZ Ltd"	135
So a bit closer, but would still need some work
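One quick way to check whether a file really is tab-separated is to count the fields awk sees per line when FS is a single tab. This is only a sketch: the two sample lines are made up, and the variable name `counts` is hypothetical.

```shell
# First line uses real tabs, second line uses spaces only.
counts=$(printf '123\t11\tJohn MacDonald\tABC Ltd\t01234814215\n124 11 Mike Smith ABC Ltd 01234814333\n' |
    awk -F'\t' '{ print NR ": " NF " field(s)" }')
printf '%s\n' "$counts"
```

A line that reports a single field contains no tab at all, and a line that reports more fields than expected probably contains consecutive tabs.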
 
Old 02-19-2014, 11:06 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
forgot to put in a suggested solution for you:
Code:
awk -F'\t+' 'x!=$(NF-1){print $(NF-1);x=$(NF-1)}{print $3,$NF,$1}' OFS="|" infile | column -t -s '|'
This displays your originally requested output
 
Old 02-20-2014, 06:42 AM   #8
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
grail, which Perth, by the way?

Input and output are tab-separated, as described in the OP. But the example I gave was typed up by hand, using quote marks around names containing spaces and spaces instead of tabs. (I didn't think I could post tabs in a code block.) The original input/output has no quotes.

I am a bit puzzled as to why you found my script was not working. I have been using it for days and I can't see where I might have introduced a mistake in my posts.

I just ran a test and this is what I'm getting:

input
Code:
123	11	JohnMacDonald	ABCLtd	01234814215
124	11	MikeSmith	ABCLtd	01234814333
136	09	JaneDoe	XYZLtd	01234814444
135	09	PeterMiller	XYZLtd	01234814888

output
Code:
ABCLtd
	JohnMacDonald	01234814215	123
	MikeSmith	01234814333	124
XYZLtd
	JaneDoe	01234814444	136
	PeterMiller	01234814888	135


Your script works fine. I just need an empty first column in the employee rows (i.e. a tab preceding <name>) and output needs to be tab-separated.

I didn't realize that you can use expressions like $(NF-1) with awk. Learn something new every day!
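As a small illustration of that point: `$` in awk accepts any numeric expression, so fields can be addressed relative to the end of the record. The sample line and the variable name `result` are made up for this sketch.

```shell
# $(NF-1) is the second-to-last field, $NF the last one,
# and $(1+1) is simply another way of writing $2.
result=$(printf 'a\tb\tc\td\n' | awk -F'\t' '{ print $(NF-1), $NF, $(1+1) }')
printf '%s\n' "$result"
```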

Last edited by hashbang#!; 02-20-2014 at 06:51 AM.
 
Old 02-20-2014, 09:28 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
So I had a bit of a look. Based on how you formatted the input data, it appeared that each row contained as many tabs as necessary (i.e. one or more) to make the columns line up. Since your code sets FS to a single tab, awk treats every tab as a separator, so two consecutive tabs enclose an empty field.
Mike Smith is shorter than the previous name and required an extra tab to line the data up, which shifts the later fields by one and gives the bogus output.

This is why you will notice that my FS value uses the '+' after the tab to indicate the possibility of 1 or more.
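The difference between the two FS settings can be seen on a single made-up line that has two consecutive tabs after the name. This is a sketch only; the variable names `line`, `single` and `multi` are hypothetical.

```shell
# A line padded with an extra tab after the name.
line=$(printf '124\t11\tMike Smith\t\tABC Ltd\t01234814333')

# Single-tab FS: the two adjacent tabs enclose an empty 4th field (6 fields).
single=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF ": " $4 }')

# Regex FS \t+: a run of tabs counts as one separator (5 fields).
multi=$(printf '%s\n' "$line" | awk -F'\t+' '{ print NF ": " $4 }')

printf 'single tab: %s\n' "$single"
printf 'tab run:    %s\n' "$multi"
```

With the single-tab separator the company name ends up in `$5` rather than `$4`, which is why counting from the end with `$(NF-1)` and `$NF` is the more forgiving choice here.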

Also the addition of the column command nicely tidies up the output for you without worrying about too much formatting yourself.
For your requirement to have an empty first column in the employee rows:
Code:
awk -F'\t+' 'x!=$(NF-1){print $(NF-1);x=$(NF-1)}{print "",$3,$NF,$1}' OFS="|" infile | column -t -s '|'
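If the goal is tab-separated output rather than visually aligned columns, a variant that sets OFS to a tab and skips `column` may be closer to the stated requirement. This is a sketch under that assumption; the sample input (including the padded row) and the variable names `input` and `grouped` are made up.

```shell
# Made-up input; the second row carries an extra padding tab after the name.
input=$(printf '123\t11\tJohn MacDonald\tABC Ltd\t01234814215\n124\t11\tMike Smith\t\tABC Ltd\t01234814333\n136\t09\tJane Doe\tXYZ Ltd\t01234814444')

# FS tolerates runs of tabs; OFS writes exactly one tab between output fields.
# The leading "" produces the empty first column on employee rows.
grouped=$(printf '%s\n' "$input" |
    awk -F'\t+' -v OFS='\t' '
        x != $(NF-1) { print $(NF-1); x = $(NF-1) }
        { print "", $3, $NF, $1 }')
printf '%s\n' "$grouped"
```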
 
  


All times are GMT -5. The time now is 05:46 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration