Combining lines based on key
In this contrived example the key field is the first name.
Input file:
Doris Fletcher
Jane Baker
Jane Simmons
Janice Taylor
Linda Archer
Linda Brown
Linda Green
Mary Carter

Desired output file:
Doris Fletcher
Jane Baker Simmons
Janice Taylor
Linda Archer Brown Green
Mary Carter

I am improving self-written REXX programs by replacing REXX code with Linux commands. This provides several benefits:
- more concise programs
- shorter execution times
- learn Linux (learn by doing)

The desired function is already working in REXX, so an awk or Perl solution is not sought. I hope to find a Linux command (or combination of commands) that does this task. Please advise.

Daniel B. Martin
Since you want to 'learn by doing', reference the shell scripting tutorial at http://tldp.org/LDP/abs/html/. Also, when asking for advice, it's probably best to avoid telling people what you don't want to hear, since we're all just trying to help each other. Perl could probably do this with a one-liner, and (if not), the code would be VERY tight and fast.
Daniel B. Martin
Perl was created exactly for such things. You wanted Linux commands to do this... awk would be it, since it splits each line on whatever field delimiter you see fit, in this case a space. Since you have the means to assign the first/last name fields to variables, and you've already GOT working logic, it should be simple for you to use these things (along with the bash tutorial) to get done what you'd like. A bash script is made up of Linux commands, so it would seem your original query has been answered.
Regardless, the awk command is what you need to do this easily. The cut command can also be used, and you've got man pages for both. These commands and man pages, plus the scripting guide, should be all you need.
From what I see, you appear to be assuming that there is an adequate solution for your problem that doesn't use awk or perl. You also don't seem to recognize that awk is one of the core utilities found by default on all *nix boxes and is used ubiquitously in scripting.
Indeed, awk is exactly what any Linux/Unix user would tell you to use first, because your request is exactly the kind of thing it excels at above all other Unix tools. As it stands, the three solutions I would suggest are an awk script, a perl script, or a bash script, probably in that order (although I'm most proficient at bash personally and would probably start with that myself).

Whichever language is used, I believe the simplest solution is to populate an associative array/hash with the first field as the index string, and then tack the second field onto that entry as subsequent hits are made. Then you can follow up by printing out the whole array at the end.

Other than that, none of the other commonly available tools will do exactly what you want, although it might be possible to cobble together a working solution by chaining multiple commands. But why bother when we have awk at hand? Of course, there may also be some lesser-known tool floating around that does exactly this, but you'd be just as likely to find it on your own as I would, if you searched for it.
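A minimal sketch of that associative-array approach, written in awk since that is the tool being recommended here. The wrapper name combine_by_key is hypothetical. Because the iteration order of awk's for (k in array) loop is unspecified, the sketch also keeps a numerically indexed array of keys, so groups come out in the order their keys first appear and no trailing sort is needed.

```sh
#!/bin/sh
# combine_by_key (hypothetical name): join all lines that share the same
# first field, whether or not those lines are adjacent in the input.
# Reads the files given as arguments, or stdin when there are none.
combine_by_key() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        !($1 in list) {                           # first time this key appears
            order[++n] = $1                       # remember first-seen order
            list[$1] = $1                         # output line starts with the key
        }
        { for (i = 2; i <= NF; i++) list[$1] = list[$1] " " $i }  # tack on the rest
        END { for (j = 1; j <= n; j++) print list[order[j]] }
    ' "$@"
}
# Example: combine_by_key names.txt
```

Since it does not rely on the input being sorted, the lines Linda Archer, Jane Baker, Linda Brown would come out as Linda Archer Brown followed by Jane Baker.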
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
And I have a strong feeling of deja-vu :} reading this thread. If you're on bash 4 you're lucky, because you can use the first column as the subscript for an associative array (older bash versions only allow numeric subscripts). Your reluctance to utilise awk still baffles me; it's not like using awk on Linux is that different from using REXX on z/OS, OS/2 or even the Amiga. It's there, it's free, it does what you ask, and it does it quickly (and easily).
Cheers, Tink
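For bash 4 or later, the first-column-as-subscript idea can be sketched with an associative array created by declare -A. The helper name combine_by_key is hypothetical. Bash does not guarantee any iteration order for associative arrays, so the sketch records first-seen key order in a separate indexed array. It assumes every non-blank line has at least two whitespace-separated fields.

```bash
#!/usr/bin/env bash
# Requires bash 4.0 or later: associative arrays (declare -A) do not exist
# in bash 3.x, where array subscripts must be numeric.
# combine_by_key (hypothetical name): join all lines sharing a first field.
combine_by_key() {
    declare -A names=()   # key -> accumulated remainder of its lines
    local -a order=()     # keys in the order they first appeared
    local key rest
    # The extra test keeps a final line that lacks a trailing newline.
    while read -r key rest || [[ -n $key ]]; do
        [[ -z $key ]] && continue                # skip blank lines
        if [[ -n ${names[$key]+set} ]]; then
            names[$key]+=" $rest"                # key seen before: append
        else
            names[$key]=$rest                    # new key: start its entry
            order+=("$key")
        fi
    done
    for key in "${order[@]}"; do
        printf '%s %s\n' "$key" "${names[$key]}"
    done
}
# Example: combine_by_key < names.txt
```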
Hi,
Is 'sed' a viable alternative?
Code:
$ cat file
I am actually not really serious about doing such tasks with 'sed'. As others have already pointed out, 'awk' is far more appropriate for this kind of thing.
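The sed command from that post did not survive in this copy of the thread, so the following is not a reconstruction of it. It is an independent sketch of one well-known sed idiom for joining consecutive lines that begin with the same word. The wrapper name combine_adjacent is hypothetical, and the input is assumed to be grouped by key already (the example input is sorted, so it is).

```sh
#!/bin/sh
# combine_adjacent (hypothetical name): join consecutive lines whose first
# space-delimited word is identical. Input must already be grouped by key.
#   :a    label to loop back to
#   $!N   unless on the last line, append the next line after a newline
#   s///  if the appended line starts with the same key as the first line,
#         replace "\nKEY " with one space (\2 is the key, \1 the line so far)
#   ta    if that substitution happened, loop and try the next line too
#   P     otherwise print up to the first newline (the finished group)
#   D     delete that printed part and restart with the remaining line
combine_adjacent() {
    sed -e ':a' \
        -e '$!N' \
        -e 's/^\(\([^ ]*\) .*\)\n\2 /\1 /' \
        -e 'ta' \
        -e 'P' \
        -e 'D' "$@"
}
# Example: sort -k1,1 names.txt | combine_adjacent
```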
Good to learn you are still going with this project, especially as I have fond memories of ReXX from VM/CMS days and partly wrote (but never finished) a ReXX interpreter on UNIX as an exercise to learn C, UNIX and emacs. I was going to ask whether you regarded a bash script as a "combination of commands", but crts' sed fulfils your "a command" criterion. Incidentally, I find awk a lot easier than sed because it's more of a programming language, especially if you do everything in the BEGIN section and use getline to read all the lines instead of using awk's pattern matching! :D
@crts: that's great :)
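To illustrate that style, here is a sketch in which all the work happens in the BEGIN section: a bare getline pulls each record from the normal input instead of relying on awk's pattern-action loop. The name combine_adjacent_awk is hypothetical. It only joins lines that are adjacent, so it assumes the input is already grouped (for example sorted) by the first field.

```sh
#!/bin/sh
# combine_adjacent_awk (hypothetical name): everything runs inside BEGIN.
# A bare getline reads the next record of the normal input (the files named
# as arguments, or stdin) into $0 and re-splits it into $1..$NF.
combine_adjacent_awk() {
    awk 'BEGIN {
        n = 0                                   # number of groups started
        while ((getline) > 0) {
            if (NF == 0) continue               # skip blank lines
            if (n == 0 || $1 != key) {          # a new key begins here
                if (n > 0) print out            # flush the previous group
                key = $1
                out = $1
                n++
            }
            for (i = 2; i <= NF; i++) out = out " " $i
        }
        if (n > 0) print out                    # flush the last group
    }' "$@"
}
# Example: combine_adjacent_awk names.txt
```

Since the program has only a BEGIN rule, awk exits once BEGIN finishes; the getline loop has already consumed all the input by then.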
Two years ago I installed Ubuntu at the recommendation of a friend. I was enchanted by the similarity of Linux commands to CMS Pipelines. I've made a choice to write code using Linux commands (those few which I have learned) in a style which is frankly imitative of CMS Pipelines. This includes an abhorrence of explicit loops. Someday I may depart from this style, but for the time being I am not using Bash or Perl or awk. |
Although the OP is not interested in awk solutions, I would personally use a combination of awk and sort in Linux:
Code:
awk '{ for (i = 2; i <= NF; i++) list[$1] = list[$1] " " $i } END { for (i in list) printf("%s%s\n", i, list[i]) }' file | sort
On an embedded Linux there might not be any awk available, so I would first sort the input, then combine consecutive lines using a simple POSIX shell loop:
Code:
sort file | sh -c '
Code:
#!/bin/sh
Well, you have to use some shell to run sed, so do you mean you won't use a bash shell?
Just curious which shell meets your requirements. Can you post your Rexx code to perform this task?