LinuxQuestions.org
Old 12-06-2011, 09:57 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,135

Rep: Reputation: 299
Combining lines based on key


In this contrived example the key field is the first name.

Input file:
Doris Fletcher
Jane Baker
Jane Simmons
Janice Taylor
Linda Archer
Linda Brown
Linda Green
Mary Carter

Desired output file:
Doris Fletcher
Jane Baker Simmons
Janice Taylor
Linda Archer Brown Green
Mary Carter

I am improving self-written REXX programs by replacing REXX code with Linux commands. This provides several benefits:
- more concise programs
- shorter execution times
- learn Linux (learn by doing)

The desired function is already working in REXX, so an awk or Perl solution is not sought. I hope to find a Linux command (or combination of commands) which do this task.

Please advise.

Daniel B. Martin
 
Old 12-06-2011, 10:41 AM   #2
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,769

Rep: Reputation: 2614
Quote:
Originally Posted by danielbmartin
In this contrived example the key field is the first name.

Input file:
Doris Fletcher
Jane Baker
Jane Simmons
Janice Taylor
Linda Archer
Linda Brown
Linda Green
Mary Carter

Desired output file:
Doris Fletcher
Jane Baker Simmons
Janice Taylor
Linda Archer Brown Green
Mary Carter

I am improving self-written REXX programs by replacing REXX code with Linux commands. This provides several benefits:
- more concise programs
- shorter execution times
- learn Linux (learn by doing)

The desired function is already working in REXX, so an awk or Perl solution is not sought. I hope to find a Linux command (or combination of commands) which do this task.
Without using awk or writing a shell script, I'm not sure how you'd do it. I'd use awk to break each line up into two variables (FN and LN), compare the FN with the previous value, and if it's the same, output the LN on the same line. If it's NOT the same, start a new line, and put both on it.
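A minimal sketch of that comparison logic, assuming the input is already sorted on the first field and every line holds exactly two fields. The temporary file and the names `names`, `merged` and `prev` are illustrative only; `FN` and `LN` are ordinary awk variables named after the description above.

```shell
# Illustrative sample data: the input file from the first post.
names=$(mktemp)
printf '%s\n' 'Doris Fletcher' 'Jane Baker' 'Jane Simmons' 'Janice Taylor' \
    'Linda Archer' 'Linda Brown' 'Linda Green' 'Mary Carter' > "$names"

merged=$(awk '
{
    FN = $1; LN = $2
    if (FN == prev) {
        printf " %s", LN            # same key as the previous line: append
    } else {
        if (NR > 1) printf "\n"     # close the output line of the previous key
        printf "%s %s", FN, LN      # new key: start a new output line
    }
    prev = FN
}
END { if (NR > 0) printf "\n" }     # terminate the last output line
' "$names")

printf '%s\n' "$merged"
rm -f "$names"
```

Because only the previous key is remembered, unsorted input would need a `sort` in front of the awk step.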

Since you want to 'learn by doing', reference the shell scripting tutorial at http://tldp.org/LDP/abs/html/. Also, when asking for advice, it's probably best to avoid telling people what you don't want to hear, since we're all just trying to help each other. Perl could probably do this with a one-liner, and (if not), the code would be VERY tight and fast.
 
Old 12-06-2011, 12:30 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,135

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by TB0ne
Also, when asking for advice, it's probably best to avoid telling people what you don't want to hear, since we're all just trying to help each other.
Telling people what I don't want to hear is intended as a courtesy to the reader. Otherwise he may devote time to creating a solution which won't be used. That annoys the person who was "just trying to help."

Daniel B. Martin
 
3 members found this post helpful.
Old 12-06-2011, 12:52 PM   #4
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,769

Rep: Reputation: 2614
Quote:
Originally Posted by danielbmartin
Telling people what I don't want to hear is intended as a courtesy to the reader. Otherwise he may devote time to creating a solution which won't be used. That annoys the person who was "just trying to help."
Only if you come back and post, saying "I didn't use your solution, because it wasn't exactly what I wanted". And since you're doing this to 'learn by doing', none of us here are going to create your solution, since that would (obviously), defeat the purpose of you learning anything. Ruling out obvious solutions would tend to indicate a homework-assignment.

Perl was created exactly for such things. You wanted Linux commands to do this...awk would be it, since it would split the line based on whatever field delimiter you see fit, in this case, a space. Since you have the means to assign the first/last name fields to variables, and you've already GOT working logic, it should be simple for you to use these things (along with the bash tutorial), to get done what you'd like. A bash script would be Linux commands, so it would seem your original query has been answered.
 
0 members found this post helpful.
Old 12-06-2011, 01:18 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,135

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by TB0ne
Ruling out obvious solutions would tend to indicate a homework-assignment.
I assure you, this is *not* homework! I am well into retirement (17 years, now) and dabble in programming as a hobby, hoping to keep my brain from atrophying. Any LQ member who has lingering doubts is invited to contact me off-forum. I will respond with details about my employment history, detail which should convince you that I am in compliance with LQ forum rules.
 
Old 12-06-2011, 01:58 PM   #6
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,769

Rep: Reputation: 2614
Quote:
Originally Posted by danielbmartin
I assure you, this is *not* homework! I am well into retirement (17 years, now) and dabble in programming as a hobby, hoping to keep my brain from atrophying. Any LQ member who has lingering doubts is invited to contact me off-forum. I will respond with details about my employment history, detail which should convince you that I am in compliance with LQ forum rules.
Not really needed, but the phrasing of your question and the conditions set forth do tend to point in the 'homework' direction.

Regardless...the awk command is what you need to easily do this. Cut can also be used, and you've got man pages for both. These commands/man pages plus the scripting guide should be all you need.
 
0 members found this post helpful.
Old 12-06-2011, 03:04 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1949
From what I see, you appear to be assuming that there is an adequate solution for your problem that doesn't use awk or perl. You also don't seem to recognize that awk is one of the core utilities found by default on all *nix boxes and is used ubiquitously in scripting.

Indeed, awk is exactly what any linux/unix user would tell you to use first off, because your request is exactly the kind of thing that it excels at above all other unix tools. As it stands, the three solutions I would suggest are an awk script, a perl script, or a bash script, probably in that order (although I'm most proficient at bash personally and would probably start with that myself).

Whichever language is used, I believe the simplest solution is to populate an associative array/hash with the first field as the index string, and then tack the second field onto that entry as subsequent hits are made. Then you can simply follow up by printing out the whole array at the end.

Other than that, none of the other commonly-available tools will do exactly what you want, although it might be possible to cobble together a working solution by chaining together multiple commands. But why bother when we have awk at hand? Of course there may also be some lesser-known tool floating around that does exactly this, but you'd be just as likely to find them on your own as me, if you tried searching for them.
 
1 members found this post helpful.
Old 12-06-2011, 07:19 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,986
Blog Entries: 11

Rep: Reputation: 880
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.


And I have a strong feeling of deja-vu :} reading this thread.

If you're on bash 4 you're lucky, because you can use the first
column as the subscript for an array (older bash versions only allow
numeric subscripts). Your reluctance to utilise awk still
baffles me; it's not like using awk on Linux is that different
from using REXX on z/OS, OS/2 or even the Amiga. It's there,
it's free, does what you ask, and does it quickly (and easily).
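
A rough sketch of the bash 4 idea, assuming a `bash` of version 4 or later is on the PATH and that each line is "key value". The array names `surnames` and `keys`, the variable `combined` and the temporary file are made up for illustration.

```shell
# Illustrative sample data: the input file from the first post.
names=$(mktemp)
printf '%s\n' 'Doris Fletcher' 'Jane Baker' 'Jane Simmons' 'Janice Taylor' \
    'Linda Archer' 'Linda Brown' 'Linda Green' 'Mary Carter' > "$names"

# Run the loop in an explicit bash, since associative arrays are a bash 4 feature.
combined=$(bash -c '
    declare -A surnames                  # associative array, keyed by first column
    keys=()                              # indexed array: keys in first-seen order
    while read -r key rest; do
        [[ -n $key ]] || continue        # skip blank lines
        [[ -n ${surnames[$key]+set} ]] || keys+=("$key")
        surnames[$key]+=" $rest"         # tack the value onto the entry
    done < "$1"
    for key in "${keys[@]}"; do
        printf "%s%s\n" "$key" "${surnames[$key]}"
    done
' _ "$names")

printf '%s\n' "$combined"
rm -f "$names"
```

The separate `keys` array is there because bash does not promise any particular order when iterating over an associative array.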


Cheers,
Tink

Last edited by Tinkster; 12-06-2011 at 07:27 PM.
 
Old 12-06-2011, 11:11 PM   #9
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,

is 'sed' a viable alternative?
Code:
$ cat file
Janice Flavor
Doris Fletcher
Jane Baker
Jane Simmons
Janice Taylor
Linda Archer
Linda Brown
Janice Wafer
Linda Green
Janice Joice
Mary Carter 

$ sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+) *([^ \n]*) *(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/g;tb;s/\n+/\n/gp' file
Janice Flavor Taylor Wafer Joice 
Doris Fletcher
Jane Baker Simmons  
Linda Archer Brown Green 
Mary Carter
It will produce the desired output even if the input is unsorted.

I am actually not really serious about doing such tasks with 'sed'. As others have already pointed out, 'awk' is far more appropriate for this kind of thing.
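
For readers who want to follow the one-liner, here is the same command laid out over several lines with comments. This is only a reading aid and assumes GNU sed (for `-r` and `\n` inside bracket expressions); the sample file, the `merged` variable and the final tidy-up `sed` that strips the leftover spaces are additions for illustration.

```shell
# Illustrative sample data: the input file from the first post.
names=$(mktemp)
printf '%s\n' 'Doris Fletcher' 'Jane Baker' 'Jane Simmons' 'Janice Taylor' \
    'Linda Archer' 'Linda Brown' 'Linda Green' 'Mary Carter' > "$names"

merged=$(sed -nr '
    # 1. Slurp the whole file: keep appending the next line (N) until
    #    the last line has been read.
    :a
    N
    $!ba

    # 2. Find "KEY VALUE [MORE]" followed later by a line that starts with
    #    the same KEY (\2), and move the value of that later line (\5) up
    #    behind the first value. Loop (tb) until no substitution happens.
    :b
    s/(([^ ]+) +[^ \n]+) *([^ \n]*) *(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/g
    tb

    # 3. Every merged line leaves an empty line behind: squeeze runs of
    #    newlines into one and print the result.
    s/\n+/\n/gp
' "$names" | sed -r 's/ +/ /g; s/ $//')   # tidy the leftover spaces

printf '%s\n' "$merged"
rm -f "$names"
```

The key is matched as plain text rather than as a whole word, so unusual data (for example a surname that equals some first name) could confuse the pattern.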
 
1 members found this post helpful.
Old 12-07-2011, 07:09 AM   #10
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,135

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by crts
is 'sed' a viable alternative?
It will produce the desired output even if the input is unsorted.

I am actually not really serious about doing such tasks with 'sed'. As others have already pointed out, 'awk' is far more appropriate for this kind of thing.
Wow! This is the type of solution I asked for but I may have bitten off more than I can chew. As a Linux newbie I have used sed but only timidly, being awed by the power of this command. Please give an overview explanation of your code. Guided by this, I will tiptoe through the manual to get a better understanding. Perhaps this experience will overcome my reluctance to delve into awk. Thank you, thank you!
 
Old 12-07-2011, 07:24 AM   #11
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,551
Blog Entries: 28

Rep: Reputation: 1176
Quote:
Originally Posted by danielbmartin
I hope to find a Linux command (or combination of commands) which do this task.
Hello Daniel

Good to learn you are still going with this project, especially as I have fond memories of ReXX from VM/CMS days and partly wrote (not finished) a ReXX interpreter on UNIX as an exercise to learn C, UNIX and emacs.

I was going to ask if you regarded a bash script as a "combination of commands" but crts' sed fulfils your "a command" criterion.

Incidentally I find awk a lot easier than sed because it's more of a programming language -- especially if you do everything in the BEGIN section and use getline to read all the lines instead of using awk's pattern matching!
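
A small sketch of that style, with everything inside BEGIN and the file read through getline. It assumes two fields per line; the names `file`, `line`, `field`, `list` and `order`, the `merged` variable and the temporary file are illustrative.

```shell
# Illustrative sample data: the input file from the first post.
names=$(mktemp)
printf '%s\n' 'Doris Fletcher' 'Jane Baker' 'Jane Simmons' 'Janice Taylor' \
    'Linda Archer' 'Linda Brown' 'Linda Green' 'Mary Carter' > "$names"

merged=$(awk -v file="$names" '
BEGIN {
    n = 0
    # Read the file explicitly; getline returns 1 per line, 0 at end of file.
    while ((getline line < file) > 0) {
        split(line, field, " ")                  # field[1] = key, field[2] = value
        key = field[1]
        if (!(key in list)) order[++n] = key     # remember first-seen order
        list[key] = list[key] " " field[2]
    }
    close(file)
    for (i = 1; i <= n; i++) print order[i] list[order[i]]
}')

printf '%s\n' "$merged"
rm -f "$names"
```

Written this way the program reads like a conventional procedural script, at the cost of doing by hand the line splitting that awk normally does for you.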

@crts: that's great
 
Old 12-07-2011, 10:18 AM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,135

Original Poster
Rep: Reputation: 299
Quote:
Originally Posted by catkin
... I have fond memories of ReXX from VM/CMS days ...
As do I. Seventeen years ago I retired after a long career as a mainframe engineer/programmer working for a major computer manufacturer. Knew nothing of PCs, nothing of Linux. During my working years I became proficient with REXX and CMS Pipelines.

Two years ago I installed Ubuntu at the recommendation of a friend. I was enchanted by the similarity of Linux commands to CMS Pipelines. I've made a choice to write code using Linux commands (those few which I have learned) in a style which is frankly imitative of CMS Pipelines. This includes an abhorrence of explicit loops. Someday I may depart from this style, but for the time being I am not using Bash or Perl or awk.

Last edited by danielbmartin; 12-07-2011 at 11:35 AM.
 
Old 12-07-2011, 11:02 AM   #13
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 242
Quote:
Originally Posted by crts
Code:
$ sed -nr ':a N;$! ba;:b s/(([^ ]+) +[^ \n]+) *([^ \n]*) *(.*\n)\2 +([^ \n]+)/\1 \5 \3 \4/g;tb;s/\n+/\n/gp' file
I quit !
 
Old 12-07-2011, 02:08 PM   #14
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Although the OP is not interested in awk solutions, I would personally use a combination of awk and sort in Linux:
Code:
awk '{ for (i = 2; i <= NF; i++) list[$1] = list[$1] " " $i } END { for (i in list) printf("%s%s\n", i, list[i]) }' file | sort
The final sort is needed because the list traversal order is undefined. The input does not need to be sorted, but the output might be unsorted. This should work well in any awk variant available in Linux (gawk, mawk).

On an embedded linux there might not be any awk available, so I would first sort the input, then combine consecutive lines using a simple POSIX shell loop:
Code:
sort file | sh -c '
    currkey=""
    currval=""
    while read key val ; do
        if [ "$key" = "$currkey" ]; then
            currval="$currval $val"
        else
            [ -n "$currkey$currval" ] && echo "$currkey $currval"
            currkey="$key"
            currval="$val"
        fi
    done
    [ -n "$currkey$currval" ] && echo "$currkey $currval"
'
or, written as a standalone utility script,
Code:
#!/bin/sh
if [ $# -lt 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    exec >&2
    echo ""
    echo "Usage: $0 [ -h | --help ]"
    echo "       $0 file(s)..."
    echo ""
    echo "This script will combine all records with the same initial field."
    echo "Duplicates are not removed. The input is considered unsorted."
    echo "The output is always sorted."
    echo ""
    exit 0
fi
sort "$@" | (
  currkey=""
  currval=""
  while read key val ; do
      if [ ":$key" = ":$currkey" ]; then
          currval="$currval $val"
      else
          [ -n "$currkey$currval" ] && echo "$currkey $currval"
          currkey="$key"
          currval="$val"
      fi
  done
  [ -n "$currkey$currval" ] && echo "$currkey $currval"
)
It is pretty common nowadays to use dash (a POSIX shell) instead of Bash, when resources are tight (or the script is simple and minimal execution time is desired; dash loads faster than bash). For example, most initial ramdisks used by Linux distributions use shell scripts written for dash. Some Linux distributions may still have sh symlinked to bash, however, so it might be prudent to specify dash explicitly instead of just using generic sh.
 
Old 12-07-2011, 07:18 PM   #15
timetraveler
Member
 
Registered: Apr 2010
Posts: 243
Blog Entries: 2

Rep: Reputation: 31
Well you have to use some shell to run sed, so do you mean you won't use a bash shell?
Just curious which shell meets your requirements.

Can you post your Rexx code to perform this task?
 
  



All times are GMT -5. The time now is 02:26 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration