LinuxQuestions.org
Old 07-15-2012, 02:16 PM   #1
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Rep: Reputation: Disabled
Extract last paragraph from text file


Hi,

From a text file like the one below, I want to extract the last paragraph. Note that paragraphs are separated by
"----------------------------------------------------------" and not by blank lines.

So in the example below I would like to extract only the last paragraph (shown in red in the original post).


Source = [DATABASE1], Dest = [ConnectionALL], SessionID = []
Table name = [Connection]
Table size = [2]x[12]
[ColLabel] = [ConnectionID] [ClientID] [LoginTime] [NumOfSubscribedSymbol] [REQ_IMAGE] [REQ_IMAGE_UPDATE] [REQ_CLOSE] [REQ_SYMBOL] [REQ_MBO_IMAGE] [REQ_MBP_IMAGE] [REQ_MBO_MBP_IMAGE] [REQ_INVALID]
[User 1] = [1] [TEST01] [2012-07-06 00:00:09] [5] [2454] [5] [0] [1] [0] [0] [0] [0]
[User 2] = [2] [TEST01] [2012-07-06 00:00:11] [1928] [2454] [1928] [0] [1] [0] [0] [0] [0]
----------------------------------------------------------
Source = [DATABASE1], Dest = [ConnectionALL], SessionID = []
Table name = [Connection]
Table size = [2]x[12]
[ColLabel] = [ConnectionID] [ClientID] [LoginTime] [NumOfSubscribedSymbol] [REQ_IMAGE] [REQ_IMAGE_UPDATE] [REQ_CLOSE] [REQ_SYMBOL] [REQ_MBO_IMAGE] [REQ_MBP_IMAGE] [REQ_MBO_MBP_IMAGE] [REQ_INVALID]
[User 1] = [1] [TEST01] [2012-07-06 00:00:09] [5] [2454] [5] [0] [1] [0] [0] [0] [0]
[User 2] = [2] [TEST01] [2012-07-06 00:00:11] [1928] [2454] [1928] [0] [1] [0] [0] [0] [0]
----------------------------------------------------------
Source = [DATABASE1], Dest = [ConnectionALL], SessionID = []
Table name = [Connection]
Table size = [2]x[12]
[ColLabel] = [ConnectionID] [ClientID] [LoginTime] [NumOfSubscribedSymbol] [REQ_IMAGE] [REQ_IMAGE_UPDATE] [REQ_CLOSE] [REQ_SYMBOL] [REQ_MBO_IMAGE] [REQ_MBP_IMAGE] [REQ_MBO_MBP_IMAGE] [REQ_INVALID]
[User 1] = [1] [TEST01] [2012-07-06 00:00:09] [5] [2454] [5] [0] [1] [0] [0] [0] [0]
[User 2] = [2] [TEST01] [2012-07-06 00:00:11] [1928] [2454] [1928] [0] [1] [0] [0] [0] [0]
----------------------------------------------------------
Source = [DATABASE1], Dest = [ConnectionALL], SessionID = []
Table name = [Connection]
Table size = [2]x[12]
[ColLabel] = [ConnectionID] [ClientID] [LoginTime] [NumOfSubscribedSymbol] [REQ_IMAGE] [REQ_IMAGE_UPDATE] [REQ_CLOSE] [REQ_SYMBOL] [REQ_MBO_IMAGE] [REQ_MBP_IMAGE] [REQ_MBO_MBP_IMAGE] [REQ_INVALID]
[User 1] = [1] [TEST01] [2012-07-06 00:00:09] [5] [2454] [5] [0] [1] [0] [0] [0] [0]
[User 2] = [2] [TEST01] [2012-07-06 00:00:11] [1928] [2454] [1928] [0] [1] [0] [0] [0] [0]
----------------------------------------------------------


Any help appreciated

Many Thanks
 
Old 07-15-2012, 02:41 PM   #2
tonyfreeman
Member
 
Registered: Sep 2003
Location: Fort worth, TX
Distribution: Debian testing 64bit at home, EL5 32/64bit at work.
Posts: 196

Rep: Reputation: 30
If the paragraphs are all the same number of lines, then you can use the "tail" command:

Code:
tail -n 8 <filename>
.
 
Old 07-15-2012, 02:45 PM   #3
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
Thanks. Unfortunately the number of lines in each paragraph is unknown and varies, so I need a way to extract the last paragraph based on the paragraph separator (-----------------).
 
Old 07-15-2012, 02:54 PM   #4
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
This awk program seems to do the job:

Code:
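BEGIN { acc = ""; blank = 0; }    # acc collects the lines of the current paragraph
{
    if(blank == 1){               # previous line was a separator: start a new paragraph
        acc = $0 "\n";
        blank = 0;
    } else {                      # otherwise keep appending to the current paragraph
        acc = acc $0 "\n";
    }
}
/----------------------------------------------------------/ {
    blank = 1;                    # remember that a separator line was just seen
}
END   { print acc; }              # acc still contains the closing separator line, if any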
Run it with:

Code:
awk -f awk_file.awk input_file.txt
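
For comparison, here is a compact variant of the same accumulate-and-reset idea, wrapped in a shell function. It is only a sketch: the function name last_paragraph and the sample data are made up for illustration, and it assumes a separator is a line consisting solely of dashes. Unlike the program above, it leaves the separator lines out of the output.

```shell
# Hypothetical sample file with paragraphs of different lengths.
sample=$(mktemp)
printf '%s\n' 'a1' 'a2' '----------' 'b1' '----------' 'c1' 'c2' 'c3' '----------' > "$sample"

# Print the last non-empty paragraph; separator lines are not printed.
# With no file argument, awk reads standard input instead.
last_paragraph() {
    awk '
        # A line made only of dashes closes the current paragraph.
        /^-+$/ { if (acc != "") last = acc; acc = ""; next }
        # Any other line is appended to the paragraph being collected.
        { acc = acc $0 "\n" }
        # At end of input, also accept a final paragraph with no separator after it.
        END { if (acc != "") last = acc; printf "%s", last }
    ' "$@"
}

last_paragraph "$sample"
```

Because the function passes its arguments straight to awk, it can also be used at the end of a pipeline with no file name at all.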
 
Old 07-15-2012, 02:59 PM   #5
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
That does the job, thanks Snark1994.

If anyone has a sed/awk one-liner that does the same thing, that would also be great.
 
Old 07-15-2012, 03:16 PM   #6
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Using ed:
Code:
$ printf "%s\n" \$k '?\-\{58\}?+1,$-1p' | ed -s infile
The printf command simply prints its arguments on separate lines:
Code:
$ printf "%s\n" \$k '?\-\{58\}?+1,$-1p'
$k
?\-\{58\}?+1,$-1p
These arguments are ed commands. The first one, $k, makes ed go to the last line of the file. The second one uses a reverse search to find the previous delimiter and then prints the lines from there to the end of the file, skipping the delimiters themselves (note the +1 and -1).
 
Old 07-15-2012, 03:21 PM   #7
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
Thanks firstfire, that's working nicely as well.
 
Old 07-15-2012, 03:35 PM   #8
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
Can anyone think of a way to do this so that it fits the following usage:

cat inputfile.txt | "command goes here" > "required output generated"

Thank You
 
Old 07-15-2012, 03:35 PM   #9
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by bunti01 View Post
That does the job thanks Snark1994.

If anyone has a sed/awk one liner which does the same thing that would also be great
SED can do amazing things, but this one looks like a stretch.
 
Old 07-15-2012, 03:40 PM   #10
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Here is a sed solution:

Code:
$ sed -rn '{:a; /-{5,}/be; N; ba};  :e; $p' infile
Or you can do
Code:
$ cat infile | sed -rn '{:a; /-{5,}/be; N; ba};  :e; $p' > outfile
if you wish.

Funny, it was simple to invent this script, but it took a while to understand why it works.

Last edited by firstfire; 07-15-2012 at 03:44 PM.
 
Old 07-15-2012, 04:01 PM   #11
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
Many thanks firstfire, that sed option works perfectly.

If you get a spare moment, it would be great to understand what each part of the command is doing, please.

Cheers
 
Old 07-15-2012, 04:15 PM   #12
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
That is pretty clever!!

If I read it right:
"read a new line, and then keep appending lines to the working register until we find 5 or more "-", then, if we are at the end of the file, print the accumulated record. Otherwise go back to the default collection of lines (without appending)."

this of course only works if there is a line of ------- at the end of the last record.
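
The same reading can be laid out as a commented, multi-line version of the script. This is a sketch that assumes GNU sed (for the -r option); the function name last_paragraph_sed and the sample file are hypothetical, and the braces of the original one-liner are left out because, without an address, they only group commands.

```shell
# Hypothetical sample file whose last paragraph ends with a separator line.
sample=$(mktemp)
printf '%s\n' 'a1' 'a2' '----------' 'b1' '----------' 'c1' 'c2' '----------' > "$sample"

last_paragraph_sed() {
    sed -rn '
        # Label "a": top of the collecting loop.
        :a
        # If the pattern space holds a run of 5 or more dashes, jump to label "e".
        /-{5,}/be
        # Otherwise append the next input line to the pattern space and loop.
        N
        ba
        # Label "e": a separator has just been collected.
        :e
        # Print the pattern space only when the current line is the last line of input.
        $p
    ' "$@"
}

last_paragraph_sed "$sample"
```

On the sample this prints c1, c2 and the closing separator line. Paragraphs that end before the last line of the file are collected and then silently discarded, because -n suppresses the automatic print and $p does not match.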

Last edited by pixellany; 07-16-2012 at 04:19 AM. Reason: Fixed error---see later posts
 
Old 07-15-2012, 09:41 PM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Code:
awk 'NF{d=$0}END{print d}' RS='--+' file
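
A hedged reading of how this one-liner works: setting RS to the regular expression --+ makes awk treat every run of two or more dashes as a record separator, so each paragraph becomes one record. NF is non-zero only for records that contain at least one field, which skips the empty record after the final separator, so d ends up holding the last real paragraph. A regular-expression RS is an extension found in gawk and mawk rather than something POSIX guarantees, and any line containing two consecutive dashes would also split a record. The sketch below wraps the command in a function (the name last_record and the sample file are invented) so it can be tried out.

```shell
# Hypothetical sample file with three paragraphs.
sample=$(mktemp)
printf '%s\n' 'a1' 'a2' '----------' 'b1' '----------' 'c1' 'c2' 'c3' '----------' > "$sample"

# Same command as in the post, wrapped in a function for experimentation.
last_record() {
    awk 'NF{d=$0}END{print d}' RS='--+' "$@"
}

last_record "$sample"
```

The record keeps the newlines that surrounded it, so the output typically begins and ends with an empty line around the paragraph.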
 
Old 07-15-2012, 11:28 PM   #14
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Quote:
Originally Posted by pixellany View Post
That is pretty clever!!

If I read it right:
"read a new line, and then keep appending lines to the working register until we find 5 or more "-", then, if we are at the end of the line, print the accumulated record. Otherwise go back the default collection of lines (without appending)."

this of course only works if there is a line of ------- at the end of the last record.
That's right, except for this part
Quote:
if we are at the end of the line
which should read
Quote:
if we are at the end of the file
because $ in the address position means last line of the file.

A somewhat ugly solution to the last problem you mentioned is as follows:
Code:
$ sed -rn '{:a; /-{5,}/be; $be; N; ba};  :e; $p' in
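
To see the difference between the two scripts, both can be run on an input whose last paragraph has no closing separator. This is a small sketch assuming GNU sed; the temporary file and its contents are invented for the demonstration.

```shell
# Hypothetical input: the final paragraph is NOT followed by a separator line.
noterm=$(mktemp)
printf '%s\n' 'a1' '----------' 'c1' 'c2' > "$noterm"

# Original script: N runs out of input before a separator is seen,
# so with -n nothing is printed.
orig_out=$(sed -rn '{:a; /-{5,}/be; N; ba};  :e; $p' "$noterm")

# Amended script: "$be" leaves the loop on the last line,
# so the collected lines reach "$p".
fixed_out=$(sed -rn '{:a; /-{5,}/be; $be; N; ba};  :e; $p' "$noterm")

printf 'original: [%s]\n' "$orig_out"
printf 'amended:  [%s]\n' "$fixed_out"
```

The original script prints nothing here, while the amended one prints c1 and c2. When the file does end with a separator, both behave the same and include that separator in the output.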
 
Old 07-16-2012, 01:42 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Quote:
cat inputfile.txt | "command goes here" > "required output generated"
Most will not show you this form, as it is a useless use of cat.
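
The cat can be dropped because the shell can connect the file to the command's standard input directly, or the command can take the file name as an argument. A minimal sketch, reusing the sed script from earlier in the thread with invented file names in a temporary directory, and assuming GNU sed:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a1' '----------' 'c1' 'c2' '----------' > "$workdir/inputfile.txt"

script='{:a; /-{5,}/be; N; ba};  :e; $p'

# Form from the question: cat feeds the file through a pipe.
cat "$workdir/inputfile.txt" | sed -rn "$script" > "$workdir/with_cat.txt"

# Equivalent without cat: redirect standard input from the file.
sed -rn "$script" < "$workdir/inputfile.txt" > "$workdir/with_redirect.txt"

# Or pass the file name as an argument.
sed -rn "$script" "$workdir/inputfile.txt" > "$workdir/with_argument.txt"

# All three outputs should be identical.
cmp "$workdir/with_cat.txt" "$workdir/with_redirect.txt" &&
cmp "$workdir/with_cat.txt" "$workdir/with_argument.txt" &&
echo "same output"
```

The pipe form still works; it just starts an extra process that the redirect and argument forms avoid.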
 
  

