LinuxQuestions.org
Old 04-08-2013, 03:56 PM   #1
karthikbhuvanagiri
LQ Newbie
 
Registered: Mar 2013
Posts: 6

Rep: Reputation: Disabled
Group by and string concatenation


Hi,

I was trying to work on a file which had the following data format

Code:
1  hi
1  this
1  is
1  john
2  hello 
3  test
3  case
The expected output file is the following:
Code:
1  hi, this, is, john
2  hello
3  test, case
Can anybody help me write a command?
 
Old 04-08-2013, 05:34 PM   #2
PaBlO_r00t
LQ Newbie
 
Registered: Apr 2013
Posts: 4

Rep: Reputation: Disabled
Here we go:

while read clave; do linea=`grep -e "^$clave" fichero.txt | cut -d " " -f 2 | tr "\n" " "`; echo "$clave $linea">>fichero_output.txt; done < <( cut -d " " -f 1 fichero.txt | grep -v -e "^$" | uniq )
 
Old 04-08-2013, 05:42 PM   #3
karthikbhuvanagiri
LQ Newbie
 
Registered: Mar 2013
Posts: 6

Original Poster
Rep: Reputation: Disabled
Thanks for replying, but I couldn't get the command to work; I am getting `<(' unexpected. Also, could you please explain what exactly the command is doing? I am a newbie in UNIX.

Thanks,
Karthik
 
Old 04-08-2013, 05:56 PM   #4
PaBlO_r00t
LQ Newbie
 
Registered: Apr 2013
Posts: 4

Rep: Reputation: Disabled
Ufff, sorry for my English:

1º) Get the possible keys in the file (the distinct numbers that appear in column 1):
cut -d " " -f 1 fichero.txt | grep -v -e "^$" | uniq

2º) OK, let's read them one by one with a loop:

while read clave; do ..... ; done < <( cut -d " " -f 1 fichero.txt | grep -v -e "^$" | uniq )

3º) For each key value, find all the records in the file that start with it and replace each newline with a space:

grep -e "^$clave" /tmp/fichero.txt | cut -d " " -f 2 | tr "\n" " "

and store the result in a variable:

linea=`grep -e "^$clave" /tmp/fichero.txt | cut -d " " -f 2 | tr "\n" " "`

4º) Now we have the key and the strings; finally, put them together in the output file:

echo "$clave $linea">>fichero_output.txt
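If the shell rejects `<(` (which can happen when the command runs under a plain sh instead of bash), the four steps above can also be sketched with an ordinary pipe. This is only a sketch under assumptions: group_by_key is a hypothetical function name, the input is assumed to be already grouped by key, and awk is swapped in for cut and grep so that columns separated by more than one blank still split correctly and the key must match column 1 exactly. It also joins with ", " as in the requested output, instead of a single space.

```shell
# group_by_key FILE  (hypothetical helper, not part of the original post)
group_by_key() {
    # Step 1: list the keys (column 1), skipping empty lines; uniq drops
    # adjacent repeats, so the input is assumed to be grouped by key.
    awk 'NF { print $1 }' "$1" | uniq |
    # Step 2: read the keys one by one (plain pipe, no process substitution).
    while read -r clave; do
        # Step 3: collect column 2 of every line whose column 1 equals the key.
        linea=$(awk -v k="$clave" '$1 == k { printf "%s%s", sep, $2; sep = ", " }' "$1")
        # Step 4: put key and joined strings together.
        printf '%s %s\n' "$clave" "$linea"
    done
}
```

Usage would be something like `group_by_key fichero.txt > fichero_output.txt`. Running awk once per key rereads the file each time, which is fine for small files but slow for large ones.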
 
1 members found this post helpful.
Old 04-08-2013, 06:10 PM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

Rep: Reputation: 1880
Well the previous answer looks a bit convoluted to me. Perhaps you can identify what language (bash, perl, awk) you know and / or were wanting to write this in?

I would also expect to see what attempt you have made to solve the problem yourself and where you are stuck?
 
Old 04-08-2013, 06:13 PM   #6
PaBlO_r00t
LQ Newbie
 
Registered: Apr 2013
Posts: 4

Rep: Reputation: Disabled
Sorry, it is bash scripting... simply run it at the command prompt.
I'm a teacher of bash scripting and operating systems. I hope it will be helpful.

Last edited by PaBlO_r00t; 04-08-2013 at 06:17 PM.
 
Old 04-08-2013, 06:32 PM   #7
karthikbhuvanagiri
LQ Newbie
 
Registered: Mar 2013
Posts: 6

Original Poster
Rep: Reputation: Disabled
Thanks for the explanation, Pablo, but I still cannot get this to work; I get the same error: (' unexpected

Code:
while read clave;
do linea='grep -e "^$clave" test.txt| cut -d " " -f 2 | tr "\n" " "';
echo "$clave $linea">>output.txt; 
done << ( cut -d " " -f 1 test.txt | grep -v -e "^$" | uniq )
@Grail: I tried writing this using awk and a while read loop, but couldn't get too far. I previously tried doing a sum of the columns, but cannot get this to work for concatenation.

This of course assumes column 2 has decimal values, so I was trying to change this command to accommodate concatenation for string-type fields, but couldn't get it to work:
Code:
awk '($1 == col1) || (col1 == "") {col2 += $2} ($1 != col1) && (col1 != "")  {print col1 " " col2; col2 = $2} {col1 = $1} END {print col1 " " col2}' temp.txt
 
Old 04-08-2013, 07:17 PM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

Rep: Reputation: 1880
Ok ... the problem you are having with Pablo's bash code is with the < signs after the 'done': there must be a space between the two of them, and no space between the second < and the parenthesis, so it should look like:
Code:
done < <( cut -d " " -f 1 test.txt | grep -v -e "^$" | uniq )
I would have thought a slightly cleaner process may have been:
Code:
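#!/bin/bash

# 0 means "no key seen yet", so the keys themselves must be non-zero integers
old=0

while read -r num rest
do
    # first line: print it without a leading newline
    if (( old == 0 ))
    then
        printf "%d %s" "$num" "$rest"
        old=$num
        continue
    fi

    if (( num != old ))
    then
        # new key: start a new output line
        printf "\n%d %s" "$num" "$rest"
    else
        # same key: append to the current line
        printf ", %s" "$rest"
    fi

    old=$num
done<file

# terminate the last output line
printf "\n"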
As for awk, this can be made simpler by:
Code:
awk 'NR==1{printf "%s", $0;old=$1;next}{if(old != $1)printf "\n%s", $0;else printf ", %s", $2;old=$1}' file
I haven't tested either but you should get the drift
 
1 members found this post helpful.
Old 04-08-2013, 09:15 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284
Quote:
Originally Posted by grail View Post
As for awk, this can be made simpler by:
Code:
awk 'NR==1{printf "%s", $0;old=$1;next}{if(old != $1)printf "\n%s", $0;else printf ", %s", $2;old=$1}' file
This variation works even better ...
Code:
awk '{if(old != $1)printf "\n%s", $0;else printf ", %s", $2;old=$1} END{print ""}' "$InFile"
Daniel B. Martin

Last edited by danielbmartin; 04-08-2013 at 09:58 PM. Reason: Improved code
 
Old 04-08-2013, 10:23 PM   #10
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

Rep: Reputation: 1880
@Daniel - Almost, except now you have a newline before the first entry. That was the only reason for the extra NR==1 clause I put in.
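One way to avoid both the newline before the first entry and a missing newline after the last one is to print the line break only when the key changes on a line other than the first. The following is a minimal sketch, not tested against the OP's real data; join_adjacent is a hypothetical function name, the input is assumed to be grouped by key, and, like the one-liners above, it only takes the first word after the key.

```shell
# join_adjacent FILE  (hypothetical helper name)
join_adjacent() {
    awk '
        NR > 1 && $1 == old { printf ", %s", $2; next }           # same key: append the value
        NR > 1              { printf "\n" }                       # key changed: close the previous line
                            { printf "%s %s", $1, $2; old = $1 }  # start a new output line
        END                 { if (NR) printf "\n" }               # close the last line
    ' "$1"
}
```

Because it prints $1 and $2 instead of $0, the output has a single space after the key and no trailing blanks, whatever the spacing in the input.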
 
Old 04-09-2013, 05:43 PM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Here's another awk option for you.

Code:
awk '{ a[$1]=a[$1]?a[$1]", "$2:$2 } END{ for (i in a){ print i,a[i] } }' infile.txt | sort -k1n
It uses an array with the value of field 1 as the index, and a ternary operator to test whether that value has been seen before. If a[$1] is already non-empty (i.e. that $1 value has been processed before), append $2 to it with comma+space between them. If not, initialize it with $2 alone.

At the end, loop through the array and print the indexes and their values.

Finally pass the output through sort, since awk's "for (i in a)" loop does not guarantee any particular order.

Incidentally, GNU awk from v.4 on can be configured to handle the sorting internally, although it does make the command longer.

Code:
gawk 'BEGIN{ PROCINFO["sorted_in"]="@ind_num_asc" } { a[$1]=a[$1]?a[$1]", "$2:$2 } END{ for (i in a){ print i,a[i] } }' infile.txt
http://www.gnu.org/software/gawk/man...y-Sorting.html
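If the keys should come out in the order they first appear in the file instead of in sorted order, the same array idea can be sketched in plain awk with a second array that records that order. This is only an illustration: group_in_order, order and n are names made up here, and the existence test uses awk's "in" operator in place of the ternary, so a value such as 0 in column 2 is not mistaken for an unseen key.

```shell
# group_in_order FILE  (hypothetical helper name)
group_in_order() {
    awk '
        !($1 in a) { order[++n] = $1; a[$1] = $2; next }   # first time this key appears
                   { a[$1] = a[$1] ", " $2 }               # later appearances: append
        END        { for (i = 1; i <= n; i++) print order[i], a[order[i]] }
    ' "$1"
}
```

Unlike the single-pass printf solutions earlier in the thread, this does not need the input to be grouped by key, at the cost of holding every group in memory until the end.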
 
1 members found this post helpful.
Old 04-10-2013, 02:19 PM   #12
pgalba
LQ Newbie
 
Registered: Apr 2013
Posts: 6

Rep: Reputation: Disabled
Try this one:

Code:
awk  '{current=$1; if (prev == current) {printf(", %s", $2)} else {printf("\n%s", $0)}; prev=current}END{printf("\n")}'
If you like it, thank guru H_TeXMeX_H for this post: http://www.linuxquestions.org/questi...54#post4929154

Also hoping that someone can see why the same code is not working for my live data.

Thanks,
 
Old 04-13-2013, 04:05 PM   #13
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 620

Rep: Reputation: 362
Hi.
Here is a sed solution, just for fun:
Code:
$ sed -nr '1{h;b}; H; x; s/^([^ ]+)(.*)\n\1(.*)/\1\2,\3/; ta; $ba; P; b; :a; h; $p' in
1  hi,  this,  is,  john
2  hello 
3  test,  case
 
1 members found this post helpful.
Old 04-14-2013, 11:18 AM   #14
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,067

Rep: Reputation: 284
Quote:
Originally Posted by firstfire View Post
Code:
$ sed -nr '1{h;b}; H; x; s/^([^ ]+)(.*)\n\1(.*)/\1\2,\3/; ta; $ba; P; b; :a; h; $p' in
1  hi,  this,  is,  john
2  hello 
3  test,  case
Minor nitpick: OP asked for one blank following each comma. Your solution has two.

Daniel B. Martin
 
Old 04-14-2013, 12:13 PM   #15
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 620

Rep: Reputation: 362
Quote:
Originally Posted by danielbmartin View Post
Minor nitpick: OP asked for one blank following each comma. Your solution has two.

Daniel B. Martin
Hi, Daniel.
It's easy to fix:
Code:
$ sed -nr '1{h;b}; H; x; s/^([^ ]+)(.*)\n\1 *(.*)/\1\2, \3/; ta; $ba; P; b; :a; h; $p' in
1  hi, this, is, john
2  hello 
3  test, case
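For readers wondering how the one-liner works, here is the same script spread over several lines with comments, as a sketch rather than an authoritative reading. The assumptions: join_with_sed is a hypothetical wrapper name, -E is used as the equivalent spelling of -r, and the script targets GNU sed. Note that the regex only checks that the next line starts with the same characters as the key, so a key 1 followed by a key 10 would be merged by mistake.

```shell
# join_with_sed FILE  (hypothetical wrapper around the script above)
join_with_sed() {
    sed -nE '
        # Line 1: save it in the hold space and start the next cycle.
        1{
            h
            b
        }
        # Append the current line to the saved text, then swap: the pattern
        # space now holds "saved\ncurrent", the hold space holds "current".
        H
        x
        # Same key on both lines: merge them into one line.
        s/^([^ ]+)(.*)\n\1 *(.*)/\1\2, \3/
        # If merged, jump to a. On the last line, jump to a in any case.
        t a
        $b a
        # Key changed: print the finished group (the text before the newline)
        # and end the cycle; the hold space already holds the new line.
        P
        b
        # a: save the pattern space; on the last line, print it.
        :a
        h
        $p
    ' "$1"
}
```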
 
1 members found this post helpful.
  


All times are GMT -5.
