Old 08-28-2014, 12:45 PM   #1
raosr020
LQ Newbie
 
Registered: Nov 2012
Posts: 17

Rep: Reputation: Disabled
Help needed in formatting the Output file


Hi All,

Need your help in resolving the below issue.

I've a file called "data.txt" with the below lines:

PHP Code:
TT: <tell://me/sreenivas> 
<tell://me/100>

TT: <tell://me/sudheer> 
<tell://me/300>

TT: <tell://me/sreenivas> 
<tell://me/200>

TT: <tell://me/sudheer> 
<tell://me/400> 
I want an output in the below format. Please help me.

PHP Code:
TT: <tell://me/sreenivas>
<tell://me/100>
<tell://me/200>

TT: <tell://me/sudheer> 
<tell://me/300>
<tell://me/400> 
Explanation of the above output:
If the text between "<tell://me/" and ">" is the same on several of the lines that contain "TT", keep only one of those "TT" lines.
That line should then be followed by all of the lines that originally followed each of the matching "TT" lines.

Looking forward to your help as soon as possible. Let me know if you have any queries.

With Regards,
SRK

Last edited by raosr020; 08-28-2014 at 12:52 PM.
 
Old 08-28-2014, 03:07 PM   #2
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
This is possible (though awkward) to do with "awk", and likely much easier in perl.

The approach is to simply save every TT entry in a hash table of arrays. Each time you get a duplicate TT entry, you just push the additional records onto the end of the array associated with that key (until you reach a blank line).

After you reach the end of the file, you can output everything: print each key in the hash, followed by each entry in the array nested under it.

The major problem occurs if the input file has millions of records to process. You could run out of memory adding entries to the hash table, or one of the arrays.
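
For readers who would like to see the idea spelled out, here is a minimal Python sketch of the hash-of-arrays approach described above. The function name `group_records` and the `sample` list are made up for illustration. It assumes Python 3.7 or later, where a plain dict keeps keys in first-seen order, and it strips surrounding whitespace from each line before using it as a key. Like the approach itself, it holds every record in memory.

```python
def group_records(lines):
    """Group each "<tell:" line under the most recent "TT:" line."""
    table = {}   # maps a TT line to the list of data lines seen under it
    key = None   # current TT line; None until the first one is read
    for raw in lines:
        line = raw.strip()
        if line.startswith("TT:"):
            key = line
            table.setdefault(key, [])  # start a list only for a new key
        elif line.startswith("<tell:") and key is not None:
            table[key].append(line)    # append to the existing list
    return table


# Sample input from the first post, one string per line.
sample = [
    "TT: <tell://me/sreenivas>", "<tell://me/100>", "",
    "TT: <tell://me/sudheer>", "<tell://me/300>", "",
    "TT: <tell://me/sreenivas>", "<tell://me/200>", "",
    "TT: <tell://me/sudheer>", "<tell://me/400>",
]

for tt_line, records in group_records(sample).items():
    print(tt_line)
    print("\n".join(records))
    print()
```

To run it against a real file, the list could be replaced by an open file object, since the function only needs something it can iterate over line by line.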
 
Old 08-28-2014, 03:34 PM   #3
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,982

Rep: Reputation: 3626
It might be easier to use a Python script, or the (I forget the name) program that bundles all the parts needed so you don't have to ship Python along with it.

I might be able to write it, but I'm sure some good Python programmer could do this in maybe 10 lines or fewer. Maybe 3 lines.
 
Old 08-28-2014, 07:36 PM   #4
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Try this:
Code:
#!/usr/bin/perl

$k = "";

while (<>) {
  
    if (/^TT: \</) { # a key is identified
        $k = $_;
        $tbl{$k} = [] if (!defined($tbl{$k})); # create new entry only if it doesn't exist
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}},$_); # add a record to the array for this key (dereference needed on current perls)
    } elsif (/^$/) {          # blank lines have things start over
        $k = "";
    }
}

# entire file has been read

foreach $k (keys(%tbl)) {
    print $k, @{$tbl{$k}};   # output the key record, and all data records associated with it
    print "\n";              # and restore the blank line between sections
}
Make the script executable and run as "script <input_file >output_file".

This works for your sample input.
 
Old 08-28-2014, 09:00 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Are those keys guaranteed to be in "entry" order when printed?

Last edited by syg00; 08-28-2014 at 09:01 PM. Reason: clarified - when printing
 
Old 08-28-2014, 10:02 PM   #6
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Nope. Because the keys (the "TT:" records) are stored in a hash table, they could come out in any order.

Now if you are referring to the "<tell:" records, then yes - these are in the array in the same order they were read.

It is possible to maintain the order of the "TT:" records though - by using another hash table (and a record counter). If the key is undefined in the current hash table (tbl in my example), then all that has to be done is to use the record number as the key in this second hash, with the key used in the original table as its value.

At the end of the data gathering, sort the keys of the new hash table numerically (perl's default sort compares strings, so it needs an explicit numeric comparison), then use each one to retrieve the key from the new hash table, and then output the data from the first hash table.

This should only add several new lines, and reword two existing lines:
Code:
#!/usr/bin/perl

$k = "";
$record_counter=0;

while (<>) {
    $record_counter++;   # new record read
    if (/^TT: \</) { # a key is identified
        $k = $_;
        if (!defined($tbl{$k})) { # only add definitions if they don't exist
            $tbl{$k} = [];        # new TT: line seen
            $newtbl{$record_counter} = $k;   # and where it was seen
        }
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}},$_); # add a record to the array for this key (dereference needed on current perls)
    } elsif (/^$/) {          # blank lines have things start over
        $k = "";
    }
}

# entire file has been read

foreach $i (sort { $a <=> $b } keys(%newtbl)) { # the keys of newtbl are record numbers,
                                  # so a numeric sort will force the correct order
    $k = $newtbl{$i};        # get the key for this entry
    print $k, @{$tbl{$k}};   # output the key record, and all data records associated with it
    print "\n";              # and restore the blank line between sections
}
I haven't tested this version, but I think it would work.
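
One thing worth checking with the record-counter approach is that the record numbers are compared as numbers. Perl's default `sort` compares strings, and so does Python's `sorted` when given strings, which would place record 10 before record 4. A small Python illustration, using made-up record numbers and placeholder keys:

```python
# Hypothetical record numbers at which each TT line was first seen,
# stored as strings, the way hash keys are.
first_seen = {"1": "TT: a", "4": "TT: b", "10": "TT: c"}

as_strings = sorted(first_seen)            # string comparison
as_numbers = sorted(first_seen, key=int)   # numeric comparison

print(as_strings)
print(as_numbers)
```

In Perl the numeric form is `sort { $a <=> $b } keys(%newtbl)`.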
 
Old 08-28-2014, 10:32 PM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
I was trying something similar with assoc arrays in awk - like you said above, ugly ...
Easy enough to get the data as wanted, but not sorted as requested. I thought I recalled hashes had the same issue.
 
Old 08-29-2014, 04:22 AM   #8
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
I was thinking about it more, and it is also possible to use an array to determine the order - just push the key onto a new array when it is not found. The advantage this has is that it eliminates the need for the sort, and even the record counter. Since the array of keys is in the proper order things just come out right:

Code:
#!/usr/bin/perl

$k = "";

while (<>) {
    if (/^TT: \</) { # a key is identified
        $k = $_;
        if (!defined($tbl{$k})) { # only add definitions if they don't exist
            $tbl{$k} = [];        # new TT: line seen
            push(@newarray,$k);   # and add to the order it was seen
        }
    } elsif (/^\<tell:/) {
        push(@{$tbl{$k}},$_); # add a record to the array for this key (dereference needed on current perls)
    } elsif (/^$/) {          # blank lines have things start over
        $k = "";
    }
}

# entire file has been read

foreach $k (@newarray) {     # the keys in newarray are in the order read
    print $k, @{$tbl{$k}};   # output the key record, and all data records associated with it
    print "\n";              # and restore the blank line between sections
}
This version hasn't been tested either, but is really not that different from the original.

As they say "Always a different way to do the same thing".

BTW, it shouldn't even be necessary to "start things over", so the whole "elsif (/^$/)" branch, including the $k = "", could be dropped. I thought it would help point out errors by creating a null key reference - but if the file contains nothing other than "TT:" and "<tell:" records (plus the blank lines between them), every "<tell:" line follows a "TT:" line, so no null key is ever recorded.
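
A note on keys: the scripts above use the whole "TT:" line as the hash key, so two lines that differ only in trailing whitespace would end up in separate groups. The question asks for matching on the text between "<tell://me/" and ">", so here is a hedged Python sketch that keys on that text and keeps an explicit list of keys in first-seen order, mirroring the array idea above. The name `merge_by_name` and the regular expression are assumptions for illustration, not part of any script posted in this thread.

```python
import re

# Captures the text between "<tell://me/" and ">".
NAME_RE = re.compile(r"<tell://me/([^>]*)>")


def merge_by_name(lines):
    """Return [(name, [data lines])] in the order names first appear."""
    order = []      # names in first-seen order
    groups = {}     # name -> data lines collected for it
    current = None  # name from the most recent TT line
    for raw in lines:
        line = raw.strip()
        match = NAME_RE.search(line)
        if match is None:
            continue                    # blank or unrecognised line
        if line.startswith("TT:"):
            current = match.group(1)
            if current not in groups:   # first time this name is seen
                groups[current] = []
                order.append(current)
        elif current is not None:
            groups[current].append(line)
    return [(name, groups[name]) for name in order]


# Two TT lines that differ only by a trailing space.
lines = [
    "TT: <tell://me/sreenivas> ", "<tell://me/100>", "",
    "TT: <tell://me/sreenivas>", "<tell://me/200>",
]

for name, records in merge_by_name(lines):
    print(f"TT: <tell://me/{name}>")
    print("\n".join(records))
    print()
```

Keying on the extracted name also means the printed "TT:" line is rebuilt from the name rather than copied from the input, which drops any stray whitespace.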

Last edited by jpollard; 08-29-2014 at 04:29 AM.
 
Old 08-29-2014, 10:59 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Here are some alternatives:
Code:
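#!/usr/bin/awk -f
# Needs GNU awk 4.0 or later: o[a][...] is an array of arrays, a gawk extension.
# Note: "for (j in ...)" visits the entries in an unspecified order.
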
BEGIN{ FS = "[/>]"
       c = 1 
     }   

/^TT/{
  a = $0

  if(!(a in o)) 
    b[c++] = a 

  getline
  
  o[a][$(NF-1)] = $0
}

END{
  for(i = 1; i < c; i++)
  {
    print b[i]
    for(j in o[b[i]])
      print o[b[i]][j]
  }
}
Or maybe a confusing one-liner:
Code:
ruby -ne 'o ||= {}; if /^TT/;a = $_;o["#{a}"] ||= [];end;o["#{a}"] << $_ if /^</;END{o.each{|k,v| puts "#{k}#{v.sort.join}"}}' file
 
  

