LinuxQuestions.org

Combining lines based on key (https://www.linuxquestions.org/questions/programming-9/combining-lines-based-on-key-917402/)

danielbmartin 12-07-2011 09:33 PM

Quote:

Originally Posted by timetraveler (Post 4544686)
Can you post your Rexx code to perform this task?

Code:


RecsWritten = 0
Key = subword(InRec.1,01,01)
OutRec = InRec.1
do j = 2 to InRec.0
  NextKey = subword(InRec.j,01,01)
  if NextKey = Key then
    do
      OutRec = OutRec subword(InRec.j,02,01)  /* Extend the output record */
    end
                  else
    do
      rc = LineOut(OutFile,OutRec)          /* Write completed record  */
      RecsWritten = RecsWritten + 1
      OutRec = InRec.j
      Key = NextKey
    end
end j
rc = LineOut(OutFile,OutRec)  /* Flush buffer  */
RecsWritten = RecsWritten + 1
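
For readers who do not know REXX, the same sequential merge can be sketched in Python. This is only an illustration, not the OP's code: the function name combine_adjacent and the sample records are made up. Like the REXX above, it assumes records with the same key are already adjacent, keeps the first record of each run whole, and appends only the second word of each later record.

```python
from itertools import groupby


def combine_adjacent(lines):
    """Merge consecutive lines that share the same first word (the key)."""
    # Split each non-blank line into words; the first word is the key.
    records = [line.split() for line in lines if line.strip()]
    out = []
    # groupby only groups neighbouring records, so the input must already
    # be ordered by key, exactly as the REXX loop requires.
    for _key, group in groupby(records, key=lambda words: words[0]):
        first, *rest = group
        # Keep the first record whole, then add the second word of the rest.
        merged = first + [word for words in rest for word in words[1:2]]
        out.append(" ".join(merged))
    return out


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the thread.
    for record in combine_adjacent(["a 1", "a 2", "b 3"]):
        print(record)
```

The count of records written corresponds to the length of the returned list.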


danielbmartin 12-07-2011 09:46 PM

Quote:

Originally Posted by timetraveler (Post 4544686)
Well you have to use some shell to run sed, so do you mean you won't use a bash shell?

This thread was originally posted on the Newbie forum and moved to the Programming forum by a moderator. I am a newbie. I am so uneducated in Linux that I don't even know what a shell is, so it's difficult to answer your question. As stated earlier in this thread, I have a number of self-written REXX programs. I'm improving them by substituting small chunks of Linux commands for large chunks of REXX code. Does this mean my REXX program is a shell? It is still a REXX program which executes by invoking the REGINA interpreter.

theNbomr 12-07-2011 10:44 PM

I won't dispute the OP's wish to preclude Perl and AWK solutions (although the problem clearly wants a solution that uses associative arrays), however I am curious how Perl solutions are seen as something other than Linux, while a sed solution is not. Hard to interpret the requirements based on any logic I can derive from that.

--- rod.

timetraveler 12-08-2011 12:52 AM

Quote:

Originally Posted by danielbmartin
Code:


RecsWritten = 0
... rexx code ...


Sure, I could have looked up Rexx code, but this way I can connect the dots between Rexx and the other examples given here. Thanks for sharing.

Here's a Linux command to do the same:
(perl one-liners count as Linux commands, but you don't have to use them)

perl -lane '$n{$F[0]} = $n{$F[0]} . " $F[1]";if(eof){print "$_$n{$_}" for keys %n}' names

Linda Archer Brown Green
Jane Baker Simmons
Mary Carter
Janice Taylor
Doris Fletcher
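
The same associative-array idea can be sketched in Python with a dict. This is a rough equivalent of the one-liner, not a replacement for it. The contents of the names file are not shown in this post, so the sample list below is an assumption reconstructed from the output above, and combine_by_key is a made-up name. Unlike the Perl version, lines with fewer than two fields are skipped, and because Python dicts keep insertion order the keys come out in first-seen order.

```python
def combine_by_key(lines):
    """Collect the second field of every line under its first field."""
    combined = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # nothing to append for this line
        key, value = fields[0], fields[1]
        # The dict plays the role of the Perl hash %n.
        combined.setdefault(key, []).append(value)
    return [" ".join([key, *values]) for key, values in combined.items()]


if __name__ == "__main__":
    # Assumed input, reconstructed from the output shown in the post.
    sample = [
        "Linda Archer", "Jane Baker", "Mary Carter", "Linda Brown",
        "Janice Taylor", "Jane Simmons", "Doris Fletcher", "Linda Green",
    ]
    for record in combine_by_key(sample):
        print(record)
```

Because every key is looked up in the dict, the input does not have to be sorted or grouped first.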

If there's a good rexx compiler/interpreter on linux then rexx counts too. That's the gnu/linux way and that's a huge part of the gnu/linux attraction, for many. Lots of choices.

For some, bash shell programming is their favorite system programming language. Others like Python, others Perl, etc. Shell programming implies awk, sed, tail, head, cut, etc.
Perl and Python (and others) can do those things natively. Limit yourself or don't limit yourself; gnu/linux lets you have it any way you want.

timetraveler 12-08-2011 01:03 AM

Quote:

Originally Posted by danielbmartin
This thread was originally posted on the Newbie forum and moved to the Programming forum by a moderator. I am a newbie. I am so uneducated in Linux that I don't even know what a shell is, so it's difficult to answer your question. As stated earlier in this thread, I have a number of self-written REXX programs. I'm improving them by substituting small chunks of Linux commands for large chunks of REXX code. Does this mean my REXX program is a shell? It is still a REXX program which executes by invoking the REGINA interpreter.

I don't know Rexx but I'm not convinced you're improving on them if they already worked. It seems to me that you're using your knowledge of Rexx as a launch pad and bridge to learning more about gnu/linux. That's a great way to do it and you're probably already seeing plenty of similarities.

The shell is the software that gives you a gnu/linux command line. There are several shells around. Bash is probably the most common. There is also tcsh, csh, zsh, ksh and others. Sed, awk, etc. are separate and distinct from the shell but are run from a shell.

The Regina interpreter might be a shell but I don't know. It probably is not but instead is run from a shell. Most likely you are using bash inside your terminal program.
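
To make the distinction concrete: any program can start sed, sort and friends directly, with no shell in between, and that does not make the calling program a shell. Here is a minimal Python sketch of the idea (Python rather than REXX only because I don't know Rexx). It assumes a sort program is on the PATH; the function name is made up for illustration.

```python
import os
import subprocess


def sort_lines_externally(text):
    """Run the external `sort` program directly, without going through a shell."""
    # LC_ALL=C makes the ordering plain byte order, independent of locale.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],             # the program and its arguments, no shell involved
        input=text,           # fed to sort on standard input
        capture_output=True,
        text=True,
        check=True,           # raise an error if sort exits with a failure
        env=env,
    )
    return result.stdout


if __name__ == "__main__":
    print(sort_lines_externally("pear\napple\nfig\n"), end="")
```

The script here plays the same role as a REXX program that hands work to Linux commands: it is an ordinary program that launches other programs.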

Main thing is to have some fun exploring gnu/linux and use whatever tools you like.

David the H. 12-08-2011 01:14 AM

A shell is a command-line interpreter, a text interface into your system. Shells also generally have their own scripting language and can act as interpreters for scripts written in it. I'm not too familiar with REXX, but I believe it's more of a stand-alone interpreted language that can be easily used for scripting tasks. I don't know if it offers a shell interface per se, but in the end much of the functionality is probably very similar.


@theNbomr: sed and awk are core programs found in all *nix systems, as (I believe) specified by POSIX. perl, OTOH, is an optional, multi-platform language, and can't be guaranteed to exist on any given system. So unlike the first two I can understand eliminating it as not a specifically "linux" solution.

danielbmartin 12-08-2011 09:48 AM

Quote:

Originally Posted by timetraveler (Post 4544860)
It seems to me that you're using your knowledge of Rexx as a launch pad and bridge to learning more about gnu/linux.

Yes. Moreover, the similarities between Linux commands and CMS Pipelines provide motivation.

Quote:

Originally Posted by timetraveler (Post 4544860)
I don't know Rexx but I'm not convinced you're improving on them if they already worked.

Execution time was not the original subject of this thread but it is worth mentioning. This is an example.

I had a self-written REXX program which operates on the voter registration list from the county where I live. (This file is a public record, readily downloadable by anyone.) The program sifts the data, slices and dices it, sorts, reformats, etc. This program worked, i.e. it generated the desired result. Then I discovered that I could replace large chunks of REXX code with smaller chunks of Linux commands.

Now we get to the punch line, and that is execution time.
Same input file, 500,000+ records.
Same output file, 220,000+ records.
Execution time for the original REXX-only version: 9+ hours (an overnight run).
Execution time for the new mixed REXX+Linux version: 1 minute.

A breathtaking improvement! As a consequence, execution time for this program is now of small concern. It is still a REXX program but now the Linux commands do all the heavy lifting.

This dramatic reduction in execution time provides the motivation to learn more Linux commands and rework more of my REXX programs.

Daniel B. Martin

timetraveler 12-08-2011 08:13 PM

Quote:

Originally Posted by danielbmartin

....execution time.
Same input file, 500,000+ records.
Same output file, 220,000+ records.
Execution time for the original REXX-only version: 9+ hours (an overnight run).
Execution time for the new mixed REXX+Linux version: 1 minute.

Nice improvement. Your gnu/linux exploration started paying dividends right away it seems.

timetraveler 12-08-2011 08:19 PM

Quote:

Originally Posted by David the H.
....perl, OTOH, is an optional, multi-platform language, and can't be guaranteed to exist on any given system. So unlike the first two I can understand eliminating it as not a specifically "linux" solution.

Can you name one linux distro that doesn't contain perl? I can't think of one. But if the OP doesn't want to try perl it's his choice. Linux is all about choices.

Reuti 12-09-2011 06:29 AM

Oh, REXX - I used it at the time my employer changed from EXEC2 to it. Besides Regina there is also ooREXX which was open sourced by IBM several years ago and it includes a compiler. Maybe the original REXX script can execute faster in precompiled form too.

David the H. 12-09-2011 09:29 AM

Quote:

Originally Posted by timetraveler (Post 4545585)
Can you name one linux distro that doesn't contain perl? I can't think of one. But if the OP doesn't want to try perl it's his choice. Linux is all about choices.

Pretty much every distribution includes perl in their repositories, sure. But do all of them have it installed by default? Can you walk up to any random Linux computer and be certain that your perl script will run on it?

Full agreement here on the second point. :cool:
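
If a script depends on an optional tool, it can check for it at run time instead of assuming it is installed. A minimal sketch in Python, using the standard shutil.which lookup; the function name and the list of tools are only examples.

```python
import shutil


def available_tools(names):
    """Report which of the named programs can be found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    # The result depends entirely on the machine this runs on.
    for name, found in available_tools(["sed", "awk", "perl"]).items():
        print(name, "found" if found else "missing")
```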

ntubski 12-09-2011 10:00 AM

Quote:

Originally Posted by danielbmartin (Post 4545204)
Now we get to the punch line, and that is execution time.
Same input file, 500,000+ records.
Same output file, 220,000+ records.
Execution time for the original REXX-only version: 9+ hours (an overnight run).
Execution time for the new mixed REXX+Linux version: 1 minute.

Seems excessive. Did you use some bad algorithms (e.g. bubblesort) in the REXX-only version?

theNbomr 12-09-2011 10:19 AM

That is a factor of ~500 in speed. The older hardware supporting REXX (I assume) could easily account for all of that difference. But without knowing anything about the hardware, it is hard to make any realistic comparison. However, just having an OS that will run on commodity hardware must be a good thing.
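
The ratio can be checked from the figures quoted in the thread. Since the REXX-only run is given as "9+ hours", 9 hours is a lower bound, so the factor is at least 540, which fits the rough ~500 above. A quick sketch of the arithmetic (the function name is just for illustration):

```python
def speedup(old_seconds, new_seconds):
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


rexx_only = 9 * 60 * 60   # "9+ hours", taken as a lower bound, in seconds
mixed = 60                # "1 minute", in seconds

print(speedup(rexx_only, mixed))  # prints 540.0
```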
I think it would be a rare distro that doesn't include Perl out of the box. I'm not sure the OP meant to limit the discussion to POSIX-only distros, and as I stated earlier I can respect his wish for 'Linux-only' solutions. I just don't know how he defines that, either conceptually, or by some defined standard.

--- rod.

danielbmartin 12-10-2011 09:50 AM

Quote:

Originally Posted by ntubski (Post 4545939)
Seems excessive. Did you use some bad algorithms (e.g. bubblesort) in the REXX-only version?

Bubble sort is awful; I used a QuickSort routine.

The long execution time of the REXX-only version may be attributed to:
1) Regina is interpreted, not compiled, and sorting large files takes a long time.
2) Regina I/O is painfully slow compared to Linux.
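
The first point can be illustrated in another interpreted language. The sketch below is Python, not Regina, so the numbers it prints say nothing about REXX itself; it only shows the general effect of running a hand-written quicksort inside an interpreter versus handing the same data to a sort routine implemented in compiled code. The data size and names are arbitrary.

```python
import random
import time


def quicksort(items):
    """Hand-written quicksort that runs entirely in interpreted code."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)


def time_call(func, data):
    """Return the result of func(data) and the elapsed seconds."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randrange(1_000_000) for _ in range(50_000)]
    mine, t_mine = time_call(quicksort, data)
    builtin, t_builtin = time_call(sorted, data)  # sorted() is compiled code
    print("same result:", mine == builtin)
    print(f"hand-written: {t_mine:.3f}s  built-in: {t_builtin:.3f}s")
```

Timings vary from machine to machine, so no expected figures are given here.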

danielbmartin 12-10-2011 09:59 AM

Quote:

Originally Posted by theNbomr (Post 4545963)
... I can respect his wish for 'Linux-only' solutions. I just don't know how he defines that, either conceptually, or by some defined standard.

Perhaps I'm using incorrect terminology if I say "Linux-only." Allow me to clarify by repeating part of a previous post in this thread.

Seventeen years ago I retired from a mainframe engineer/programmer job. During my working years I became proficient with REXX and CMS Pipelines.

Two years ago I installed Ubuntu on my home PC. (Good-bye Microsoft! Good-bye forever!!) I was enchanted by the similarity of Linux commands to CMS Pipelines. I've made a choice to write code using Linux commands (those few which I have learned) in a style which is frankly imitative of CMS Pipelines. This includes an abhorrence of explicit loops. Someday I may depart from this style, but for the time being I am not using Bash or Perl or awk.


All times are GMT -5.