Explanation of the output above:
If the pattern between "<tell://me/" and ">" is the same on several of the lines that contain "TT", keep only one of those lines.
That line should then be followed by all the lines that followed each of the "TT" lines sharing that pattern between "<tell://me/" and ">".
Looking forward to your help as soon as possible. Let me know if you have any questions.
This is possible (though awkward) to do with "awk", and likely much easier in perl.
The approach is simply to save every TT entry in a hash table of arrays. Each time you get a duplicate TT entry, you push the additional records that follow it (up to the next blank line) onto the end of the array associated with that key.
After you reach the end of the file, you can output everything: walk each key in the hash, print the key, then print each entry in the array nested under it.
The major problem occurs if the input file has millions of records to process. You could run out of memory adding entries to the hash table, or one of the arrays.
Code:
#!/usr/bin/perl
$k = "";
while (<>) {
    if (/^TT: \</) {                            # a key is identified
        $k = $_;
        $tbl{$k} = [] if (!defined($tbl{$k}));  # create new entry only if it doesn't exist
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}}, $_);                  # add a record to the array for this key
    } elsif (/^$/) {                            # blank lines have things start over
        $k = "";
    }
}
# entire file has been read
foreach $k (keys(%tbl)) {
    print $k, @{$tbl{$k}};  # output the key record, and all data records associated with it
    print "\n";             # and restore the blank line between sections
}
Make the script executable (chmod +x script) and run it as "./script <input_file >output_file".
Nope. Because the keys (the "TT:" records) are stored in a hash table, they can come out in any order.
Now if you are referring to the "<tell:" records, then yes - these are in the array in the same order they were read.
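To make the ordering point concrete, here is a minimal sketch. The three "TT:" strings are made-up placeholders, not data from the original post. It only shows that keys() promises nothing about order, while an array hands elements back in the order they were pushed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "TT:" keys, listed in the order they would be read.
my @seen = ("TT: <tell://me/c>", "TT: <tell://me/a>", "TT: <tell://me/b>");

# Store each key in a hash, as the script does.
my %tbl;
$tbl{$_} = [] for @seen;

# keys() returns the keys in no guaranteed order; the order may
# differ from @seen and may change from one run to the next.
my @from_hash = keys %tbl;

# An array always returns its elements in insertion order.
my @from_array = @seen;

print "hash : @from_hash\n";
print "array: @from_array\n";
```

Running it a few times may print the "hash" line in different orders, while the "array" line never changes.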
It is possible to maintain the order of the "TT:" records though, by using another hash table (and a record counter). If the key is undefined in the current hash table (tbl in my example), then all that has to be done is to use the record number as the key in this second hash, with the value associated with it being the key used in the original table.
At the end of the data gathering, sort the keys of the new hash table numerically (Perl's default sort compares strings, so the numeric comparison has to be asked for), use each record number to retrieve the key from the new hash table, and then output the data for that key from the first hash table.
This should only add several new lines, and reword two existing lines:
Code:
#!/usr/bin/perl
$k = "";
$record_counter = 0;
while (<>) {
    $record_counter++;                      # new record read
    if (/^TT: \</) {                        # a key is identified
        $k = $_;
        if (!defined($tbl{$k})) {           # only add definitions if they don't exist
            $tbl{$k} = [];                  # new TT: line seen
            $newtbl{$record_counter} = $k;  # and where it was seen
        }
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}}, $_);              # add a record to the array for this key
    } elsif (/^$/) {                        # blank lines have things start over
        $k = "";
    }
}
# entire file has been read
foreach $i (sort { $a <=> $b } keys(%newtbl)) {  # the keys of newtbl are record numbers,
                                                 # so a numeric sort restores the order read
    $k = $newtbl{$i};       # get the key for this entry
    print $k, @{$tbl{$k}};  # output the key record, and all data records associated with it
    print "\n";             # and restore the blank line between sections
}
I haven't tested this version, but I think it would work.
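One detail worth checking in the record-counter version: the keys of newtbl are record numbers, and Perl's sort compares strings unless told otherwise, so record 10 would sort ahead of record 2. A small sketch of the difference, using made-up record numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record numbers, as they might appear as hash keys.
my @record_numbers = (21, 2, 100, 1, 10);

# Default sort compares the values as strings.
my @as_strings = sort @record_numbers;

# A numeric comparison block gives the order the records were read in.
my @as_numbers = sort { $a <=> $b } @record_numbers;

print "string sort : @as_strings\n";   # 1 10 100 2 21
print "numeric sort: @as_numbers\n";   # 1 2 10 21 100
```

With fewer than ten input lines the two orders happen to agree, which is why a problem like this can go unnoticed on a small test file.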
I was trying something similar with assoc arrays in awk - like you said above, ugly ...
Easy enough to get the data as wanted, but not sorted as requested. I thought I recalled hashes had the same issue.
I was thinking about it more, and it is also possible to use an array to determine the order - just push the key onto a new array when it is not found. The advantage this has is that it eliminates the need for the sort, and even the record counter. Since the array of keys is in the proper order things just come out right:
Code:
#!/usr/bin/perl
$k = "";
while (<>) {
    if (/^TT: \</) {                 # a key is identified
        $k = $_;
        if (!defined($tbl{$k})) {    # only add definitions if they don't exist
            $tbl{$k} = [];           # new TT: line seen
            push(@newarray, $k);     # and add to the order it was seen
        }
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}}, $_);       # add a record to the array for this key
    } elsif (/^$/) {                 # blank lines have things start over
        $k = "";
    }
}
# entire file has been read
foreach $k (@newarray) {    # the keys in newarray are in the order read
    print $k, @{$tbl{$k}};  # output the key record, and all data records associated with it
    print "\n";             # and restore the blank line between sections
}
This version hasn't been tested either, but is really not that different from the original.
As they say "Always a different way to do the same thing".
BTW, it shouldn't even be necessary to "start things over", so the two lines of the blank-line branch (the "elsif (/^$/)" test and the $k = "") could be dropped. I thought resetting the key would help point out errors by creating a null key reference, but if the file holds no records other than "TT:" and "<tell:" lines, no null keys are ever recorded.
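Since none of the versions above were tested, here is a self-contained sketch of the simplified idea (order array, no blank-line reset) that can be run without an input file. The sub name group_records and the sample lines are hypothetical, made up for illustration. The "defined $k" guard is an addition of mine, so that data lines appearing before any "TT:" line are skipped rather than stored under an empty key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group "<tell:" lines under their "TT:" line, keeping the keys in
# first-seen order. Takes input lines, returns output lines.
sub group_records {
    my @lines = @_;
    my (%tbl, @order);
    my $k;                                # current "TT:" key, if any
    for my $line (@lines) {
        if ($line =~ /^TT: </) {          # a key is identified
            $k = $line;
            if (!exists $tbl{$k}) {       # first time this key is seen
                $tbl{$k} = [];
                push @order, $k;          # remember the order it was seen
            }
        } elsif ($line =~ /^<tell:/ && defined $k) {
            push @{ $tbl{$k} }, $line;    # data record for the current key
        }
        # blank lines need no handling at all
    }
    # each key, its data records, then a blank line between sections
    return map { ($_, @{ $tbl{$_} }, "\n") } @order;
}

# Hypothetical sample input: the "alpha" key appears twice.
my @input = (
    "TT: <tell://me/alpha>\n",
    "<tell://one>\n",
    "\n",
    "TT: <tell://me/beta>\n",
    "<tell://two>\n",
    "\n",
    "TT: <tell://me/alpha>\n",
    "<tell://three>\n",
);

print group_records(@input);
```

On this sample the "alpha" line is printed once, followed by "<tell://one>" and "<tell://three>", and the "beta" section comes second, matching the order in which the keys were first read. As in the scripts above, two "TT:" lines count as the same key only when the whole line is identical.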