LinuxQuestions.org
Old 07-16-2012, 04:18 AM   #16
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743

OOPS!!---yes, I was thinking end of file---fingers were not connected to brain.
 
Old 07-16-2012, 04:32 AM   #17
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
Quote:
Originally Posted by firstfire View Post
Here is sed solution:

Code:
$ sed -rn '{:a; /-{5,}/be; N; ba};  :e; $p' infile
Or you can do
Code:
$ cat infile | sed -rn '{:a; /-{5,}/be; N; ba};  :e; $p' > outfile
if you wish.

Funny, it was simple to invent this script, but it took a while to understand why it works..
That's just showing off -.- nice work :P

Also, if the OP considers eir problem solved, could e please mark the thread as 'SOLVED'? Thanks.

Last edited by Snark1994; 07-16-2012 at 04:40 AM.
 
Old 07-16-2012, 04:58 AM   #18
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
That's just showing off -.- nice work :P
There are a few people here who enjoy solving problems with specific commands--I am one of them. To me, it's just like solving puzzles.

I'm working on the FORTRAN solution to the OP's problem....
 
Old 07-16-2012, 06:06 AM   #19
bunti01
LQ Newbie
 
Registered: Jul 2012
Posts: 10

Original Poster
Rep: Reputation: Disabled
thanks all for your quick responses
 
Old 07-16-2012, 06:32 AM   #20
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail View Post
Code:
awk 'NF{d=$0}END{print d}' RS='--+' file
This is brilliant! Please walk us through it.

As a novice awker I cannot fully follow it. I've groped through the darkness only this far:

NF is a System Variable which is the number of fields for the current input record. What is its significance here?

RS is a System Variable which is the record separator. In this thread the individual paragraphs are separated by a string of dashes so RS='--+' might be defining each paragraph as a record. Is this right? Why the +? Why is it cited at the end of this awk instead of the beginning?

$0 is the current input record in its entirety.

d is apparently a variable because if I change it to e or f the code still works.

So... (and here I get shaky)... we read the entire file one paragraph at a time, and each time we overwrite the contents of variable d with the most recent paragraph. Then we hit END which tells us to stop reading and start printing. There's only one thing to print, and that is d, the last paragraph.

Daniel B. Martin
 
Old 07-16-2012, 09:29 AM   #21
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Not too far off, Daniel.

NF - you are correct about its origin. You then need to remember that whatever sits in front of the {} is a condition that is evaluated as true or false. A record with zero fields has an NF value of zero, which counts as false, so the braces are not entered and the value of variable 'd' does not change. This matters for the OP's example because there are dashes after the last visible record, so awk treats the empty text after the last dashes as the final record, which of course we do not wish to print.
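
A quick way to see what the NF test buys you is to run the command with and without it. This is only a sketch: the sample file and the variable names without_guard and with_guard are made up for illustration, and it assumes an awk that accepts a regex as RS (gawk and mawk do).

```shell
# Made-up sample: paragraphs separated by dashed lines, with dashes after the last one too.
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' 'second paragraph' '-----' 'third paragraph' '-----' > "$sample"

# Without the NF test, d is overwritten by the empty record that follows the final dashes.
without_guard=$(awk '{d=$0} END{print d}' RS='--+' "$sample")

# With the NF test, records that have no fields are skipped, so d keeps the last real paragraph.
with_guard=$(awk 'NF{d=$0} END{print d}' RS='--+' "$sample")

printf 'without NF: [%s]\nwith NF: [%s]\n' "$without_guard" "$with_guard"
rm -f "$sample"
```

The records keep the newlines that sat next to the dashes, so the printed paragraph is surrounded by blank lines.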

RS - again origin is correct. The trick to remember with awk is that there are actually 3 places you can set 'system variables':

1. Use -v ... awk -vRS="--+"

2. In the BEGIN ... BEGIN{RS = "--+"}

3. After the 'program' ... this is of course what I have used here

My general rule of thumb: if there is only one variable to set and it is less typing, I set it after the program; otherwise I use BEGIN. I (usually) reserve the -v option for values I wish to draw from the environment.
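
For comparison, here are the three placements side by side. It is a sketch on a made-up two-paragraph file (the names via_v, via_begin and via_arg are just for illustration); all three should give the same paragraph.

```shell
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' 'second paragraph' '-----' > "$sample"

# 1. With -v: the assignment happens before the BEGIN block runs.
via_v=$(awk -v RS='--+' 'NF{d=$0} END{print d}' "$sample")

# 2. Inside a BEGIN block.
via_begin=$(awk 'BEGIN{RS="--+"} NF{d=$0} END{print d}' "$sample")

# 3. As an argument after the program: applied when awk reaches that argument,
#    which is after BEGIN has run but before the file that follows it is read.
via_arg=$(awk 'NF{d=$0} END{print d}' RS='--+' "$sample")

printf '%s\n' "$via_v" "$via_begin" "$via_arg"
rm -f "$sample"
```

The one practical difference is the timing: a value set after the program is not yet visible inside BEGIN.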

As for the '+': * is zero or more and + is one or more. The data lent itself to the latter (try changing it to a * and see the difference).
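
To make the difference visible, count the records each pattern produces on text that contains a single hyphen. A sketch only; the sample text and the names plus_count and star_count are invented for the example.

```shell
sample=$(mktemp)
printf '%s\n' 'a well-known example' '-----' 'another paragraph' '-----' > "$sample"

# '--+' needs at least two dashes, so the hyphen in "well-known" is left alone.
plus_count=$(awk 'END{print NR}' RS='--+' "$sample")

# '--*' is satisfied by a single dash, so "well-known" is split into two records.
star_count=$(awk 'END{print NR}' RS='--*' "$sample")

printf 'records with +: %s\nrecords with *: %s\n' "$plus_count" "$star_count"
rm -f "$sample"
```

On this input the * version reports one more record than the + version, because the hyphenated word is cut in two.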
Quote:
Then we hit END which tells us to stop reading and start printing.
Slight correction here: END is only processed once all files have finished being read (gawk v4+ also has ENDFILE, which lets you run actions when each file completes).

Please let me know if you need any further information
 
Old 07-16-2012, 09:59 AM   #22
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail View Post
Please let me know if you need any further information
All questions answered; thank you.

Purely as a learning exercise, I propose making the OP's problem a bit more difficult. Suppose he wanted the penultimate paragraph. How could that be done?

Daniel B. Martin
 
Old 07-16-2012, 10:28 AM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I would suggest using 2 variables and printing the older one. If you then extend this to any paragraph counted from the end, I would suggest storing the records in an array and printing element (length - N) of the array.
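
Both ideas can be sketched roughly as follows. The variable names prev, cur, c and n, the helper nth_from_end, and the sample file are all made up for illustration; the NF test is kept so that the empty record after the final dashes is not counted.

```shell
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' 'second paragraph' '-----' 'third paragraph' '-----' > "$sample"

# Two variables: prev always trails cur by one non-empty record.
penultimate=$(awk 'NF{prev=cur; cur=$0} END{print prev}' RS='--+' "$sample")

# Array: store every non-empty record, then count back n from the end (n=0 is the last one).
nth_from_end() {
    awk -v n="$1" 'NF{d[++c]=$0} END{print d[c-n]}' RS='--+' "$2"
}
third_from_end=$(nth_from_end 2 "$sample")

printf '%s\n' "$penultimate" "$third_from_end"
rm -f "$sample"
```

The array version holds every paragraph in memory, so for a huge file the two-variable form is the lighter choice when only the penultimate paragraph is wanted.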
 
Old 07-16-2012, 11:44 AM   #24
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

Quote:
Originally Posted by danielbmartin View Post
Suppose he wanted the penultimate paragraph. How could that be done?
With sed it turned out to be very simple:
Code:
$ sed -nr '{:a; /-{5,}/be; N; ba}; :e; x; $p' in
The only new command here is 'x', which swaps pattern space and hold space. These two registers constitute a "ring buffer" of length 2.
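
For anyone who wants to follow the one-liner step by step, here is the same script spread over several lines with comments. It is a sketch that assumes GNU sed and a made-up sample file whose last line is a dashed separator; the variable name penultimate is only for the example.

```shell
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' 'second paragraph' '-----' 'third paragraph' '-----' > "$sample"

penultimate=$(sed -rn '
  :a
  # Once the pattern space contains a run of 5 or more dashes, the paragraph is complete.
  /-{5,}/be
  # Otherwise append the next input line and test again.
  N
  ba
  :e
  # Swap: the paragraph just read goes to hold space, the previous one comes back.
  x
  # On the last input line, print what is now the previous paragraph.
  $p
' "$sample")

printf '%s\n' "$penultimate"
rm -f "$sample"
```

Note that the output includes the dashed line that closed the paragraph, since it was part of the pattern space when the swap happened.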
 
Old 07-16-2012, 02:11 PM   #25
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by firstfire View Post
With sed it turned out to be very simple:
Code:
$ sed -nr '{:a; /-{5,}/be; N; ba}; :e; x; $p' in
Lovely!

Daniel B. Martin
 
Old 07-16-2012, 10:04 PM   #26
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by danielbmartin View Post
Suppose he wanted the penultimate paragraph. How could that be done?
Playing off grail's method I devised this.
Code:
tac "$InFile" | awk 'NR==3 {print $0}' RS='--+' | tac > "$OutFile"
Daniel B. Martin
 
Old 07-16-2012, 11:19 PM   #27
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Code:
awk '{d[NR]=$0}END{print d[NR-2]}' RS='--+' file
 
Old 07-17-2012, 05:23 AM   #28
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
Quote:
Originally Posted by pixellany View Post
There are a few people here who enjoy solving problems with specific commands--I am one of them. To me, it's just like solving puzzles.
As am I; I was just admiring a particularly amazing solution. Well, here's a Haskell version:

Code:
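import System.Environment (getArgs)
import Text.Regex.Posix ((=~))

-- Read inputFile, apply the function to its contents, write the result to outputFile.
interactWith :: (String -> String) -> FilePath -> FilePath -> IO ()
interactWith function inputFile outputFile = do
    input <- readFile inputFile
    writeFile outputFile (function input)

main :: IO ()
main = do
    args <- getArgs
    case args of
        [input, output] -> interactWith (unlines . (!! 1) . reverse . splitSections . lines) input output
        _ -> putStrLn "Usage: this_script.hs inputfile outputfile"

-- Group the lines into sections, each one ending with its dashed separator line.
splitSections :: [String] -> [[String]]
splitSections = foldr step [[]]
  where
    step x acc
        | (x =~ "---------" :: Bool) = [x] : acc
        | otherwise                  = (x : head acc) : tail acc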
Good luck with your FORTRAN, I've only had to use it once and I still have nightmares about it...
 
Old 07-17-2012, 05:59 AM   #29
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail View Post
Code:
awk '{d[NR]=$0}END{print d[NR-2]}' RS='--+' file
grail hits the bulls-eye again! Thank you!

Daniel B. Martin
 
Old 07-17-2012, 06:32 AM   #30
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
I know nothing of internal implementation, hence this question.

Does tac really begin reading a file from its last record, or does it read the entire file and buffer it (or parts of it)?

The answer has performance implications. If the input file is huge, then a solution to the OP's problem which begins with tac might be faster than another solution which reads the entire file, start to end. That assumes tac is clever enough to read only enough to satisfy the following piped commands.

Ideas? Comments?

Daniel B. Martin
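
I have not measured this, but as far as I know GNU tac seeks to the end of a regular file and reads backward in blocks rather than loading the whole file first (piped input is a different matter, since a pipe cannot be seeked). Whether the pipeline then stops early depends on the next command quitting as soon as it has its record. A minimal sketch of that idea, with a made-up sample file and a counter c that is only for illustration:

```shell
sample=$(mktemp)
printf '%s\n' 'first paragraph' '-----' \
    'second paragraph, line 1' 'second paragraph, line 2' '-----' \
    'third paragraph' '-----' > "$sample"

# Read the file backward, take the second non-empty record, and exit at once
# so that awk stops consuming input; the final tac puts the lines back in order.
penultimate=$(tac "$sample" | awk 'NF && ++c==2 {print; exit}' RS='--+' | tac)

printf '%s\n' "$penultimate"
rm -f "$sample"
```

Counting non-empty records with NF, instead of testing NR==3, avoids depending on how the empty record before the first dashes is numbered.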
 
  



All times are GMT -5. The time now is 12:19 AM.
