[SOLVED] how to sort on last column

chrism01 · 06-09-2022, 12:38 AM

If(!) I understand the question (or close-ish):

Code:

# test file t.t contains 4 recs
ne4 tow4 three4 last4
ne2 tow2 three2 last2
ne1 tow1 three1 last1
ne3 tow3 three3 last3

# my code
#!/usr/bin/perl
use strict;
use warnings;

my $file = "t.t";
my %txt_pairs;

open( my $txt_fh, '<', $file ) or
            die "Can't open txt file: $file: $!\n";
while ( defined( my $rec = <$txt_fh> ) )
{
   # Remove unwanted chars
   chomp $rec;                 # newline
   $rec =~ s/^\s+//;           # leading whitespace
   $rec =~ s/\s+$//;           # trailing whitespace

   next unless length($rec);   # anything left?

   # Split the rec on whitespace, reverse the list of fields and
   # keep the first one, ie the last col of the orig rec
   my ($last) = reverse( split( /\s+/, $rec ) );

   # Key = last col, value = whole rec (unchanged)
   # NB: recs with the same last col overwrite each other here
   $txt_pairs{$last} = $rec;

#print "last $last: Rec $rec \n";
}
close($txt_fh) or
            die "Can't close txt file: $file: $!\n";

# Print each key followed by its rec, in sorted key order
for my $key ( sort keys %txt_pairs )
{
    print "$key $txt_pairs{$key}\n";
}

#Results
last1 ne1 tow1 three1 last1
last2 ne2 tow2 three2 last2
last3 ne3 tow3 three3 last3
last4 ne4 tow4 three4 last4

HTH

Skaperen · 06-09-2022, 01:05 AM

does column one have "ne4 tow4 three4" or "ne4" in the first line of your test file?

it's a misleading example because it looks like white space is the delimiter and it's unclear how to parse just the first column. the difficulty is that splitting the columns could be using a delimiter that is a part of the first column. there could be multiple such "delimiters".

pan64 · 06-09-2022, 03:58 AM

Quote:

Originally Posted by Skaperen

does column one have "ne4 tow4 three4" or "ne4" in the first line of your test file?

it's a misleading example because it looks like white space is the delimiter and it's unclear how to parse just the first column. the difficulty is that splitting the columns could be using a delimiter that is a part of the first column. there could be multiple such "delimiters".

And again: without details we cannot help you. You have to specify a way to parse that input file; telling us "it is misleading" or "it won't work" is just useless.

The packages handled by apt are already stored in a database, and there is a Perl API to manipulate it.
And I still think you’re overcomplicating something that can be solved a lot more easily; you just refuse to tell us the real details and your real goal.

boughtonp · 06-09-2022, 08:51 AM

Quote:

Originally Posted by Skaperen

i have no interest in why they are in that form. knowing why won't improve anything.

i did check out the link. i did not send them anything.

the analysis involves some things i am already doing (how i select and upgrade packages) so i know it will be of no interest to others. this will be an ongoing thing i'll be doing each release. it is to look at how packages get renamed or split with upgrades.

I think it's pretty rare that there's only one person in the entire world interested in something, but if what you're doing is truly of no interest to others, I guess the same is true for this thread.

Skaperen · 06-09-2022, 12:51 PM

Quote:

Originally Posted by pan64

You have to specify a way to parse that input file

that's mostly what this thread is about ... how to parse that input file ... in or for sort.

Quote:

Originally Posted by pan64

The packages handled by apt are already stored in a database, and there is a Perl API to manipulate it.
And I still think you’re overcomplicating something that can be solved a lot more easily; you just refuse to tell us the real details and your real goal.

i have minimized the problem and narrowed it down. it is to sort the (uncompressed) files i find in the "apt-file" package. i have described the format in the widest possible scope ... to consider the most difficult cases where some package file path has one or more delimiter characters in it. there are only 2 delimited columns, first and last. last is the sort key.
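A minimal sketch of the two-column parse described above. It assumes the columns are separated by whitespace and that the last column itself contains no whitespace; that is an assumption about the data, not something confirmed in this thread. The sub name `split_last_column` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into (first column, last column).
# The greedy (.*\S) takes everything up to the final run of whitespace,
# so any spaces inside the first column stay in the first column.
sub split_last_column {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /^\s*(.*\S)\s+(\S+)\s*$/;   # no match: fewer than 2 columns
    return ( $1, $2 );
}

# Made-up sample line: a path containing a space, then the sort key
my ( $first, $last ) =
    split_last_column("usr/share/doc/some file.txt   utils/somepkg\n");
print "first=[$first] last=[$last]\n";
# prints: first=[usr/share/doc/some file.txt] last=[utils/somepkg]
```

A line with only one column does not match and yields an empty list, so the caller can decide how to treat it.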

so the goal is to sort the specified files as described. i already figured out one way to parse this but wanted to do it in sort to specify the last column as key. but a solution was given: flip first and last, sort, flip back (i can just use it flipped and skip flip back). and another solution involved a regex i have not yet tested (i don't know regex enough to visualize if it should work).
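The flip, sort, flip back idea can be sketched in Perl as a decorate-sort-undecorate pass. This is only an illustration, not the exact solution posted earlier: `sort_by_last_column` is a hypothetical name, the key is assumed to be the last whitespace-separated token, and the sample lines are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines ordered by their last
# whitespace-separated token. The lines themselves are left unchanged.
sub sort_by_last_column {
    my @lines = @_;
    return
        map  { $_->[1] }                               # flip back: drop the key
        sort { $a->[0] cmp $b->[0]                     # compare on the key,
                   or $a->[1] cmp $b->[1] }            # then on the whole line
        map  { [ ( split ' ', $_ )[-1] // '', $_ ] }   # flip: pair each line with its key
        @lines;
}

# Invented sample data: the first column may contain spaces
print "$_\n" for sort_by_last_column(
    'usr/bin/zeta tool   utils/zpkg',
    'usr/bin/alpha       admin/apkg',
    'usr/lib/beta.so     libs/bpkg',
);
```

Because the key is carried alongside the line instead of being spliced into it, there is nothing to undo afterwards except dropping the key.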

do you think i should use different files, instead, that i don't know the format for (yet)?

i minimized to this narrow problem and asked it. i'm staying on topic with the problem i asked and not going to a wider one of the whole project. i have no interest in (expanding the topic) asking about the wider project.

chrism01 · 06-10-2022, 12:18 AM

My input test file is as stated.

You seemed to be implying that for your data, the 'last column' is separated by <some space> from preceding data, which may or may not have spaces.

My prog splits a (copy of) each rec from the input file on whitespace and temporarily reverses the resulting list, so it can pick off the now-first (was last) col of the orig data. It then uses that as the key to a hash where the 'data' in the hash is the entire rec (unchanged), which seems to be what you wanted.

It then sorts the hash on the 'key', i.e. what was orig the last col (as reqd), and prints the key (optional - just remove it from the print if you want) followed by the entire associated rec.
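One caveat with a plain hash: two recs that share the same last col land on the same key, so only the last one read survives. A hedged variant, assuming all duplicates should be kept, stores an array of recs per key. The sub name `group_by_last_col` and the sample recs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays keyed on the last col,
# so recs with the same key are all kept (in input order).
sub group_by_last_col {
    my @recs = @_;
    my %recs_by_key;
    for my $rec (@recs) {
        my ($last) = reverse split ' ', $rec;   # last whitespace-separated field
        next unless defined $last;              # skip blank recs
        push @{ $recs_by_key{$last} }, $rec;
    }
    return \%recs_by_key;
}

my $grouped = group_by_last_col(
    'ne2 tow2 last2',
    'ne1 tow1 last1',
    'ne9 tow9 last1',    # same key as the rec above
);
for my $key ( sort keys %$grouped ) {
    print "$_\n" for @{ $grouped->{$key} };
}
```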

HTH

PS If those are NOT your requirements, please specify in detail, clearly - thank you.

Skaperen · 06-13-2022, 06:10 PM

Quote:

Originally Posted by chrism01

PS If those are NOT your requirements, please specify in detail, clearly - thank you.

i see it as a workaround. as a solution, it works, and looks like it should work well.

chrism01 · 06-16-2022, 01:00 AM

I did have a slight tweak that might run a bit faster, but the new filtering on LQ won't let me post it.
However the above should work just fine.

Skaperen · 06-19-2022, 06:31 PM

if there is something i can't post in a technical sense, i put it in a web file and post the URL. i would not even do that for general rule violations. this is a family-rated web site. my youngest niece was reading this site when she was 4 y/o.