LinuxQuestions.org

Forum: Programming (https://www.linuxquestions.org/questions/programming-9/)
Thread: search for inverse of pattern and join with the line before it (https://www.linuxquestions.org/questions/programming-9/search-for-inverse-of-pattern-and-join-with-the-line-before-it-739953/)

akelder 07-14-2009 02:58 PM

O'Reilly's sed & awk book to the rescue!

Quote:

The replacement string recalls the first saved substring as "\1" and the second as "\2," which is surrounded by bold-font requests.
Great book!
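For readers without the book at hand, here is a small illustration of saved substrings. It is not the book's own example: the function names swap_words and bold_second are hypothetical, and the bold-font wrapping assumes troff's \fB ... \fR requests.

```shell
# \( ... \) saves a substring (basic regex syntax, portable sed);
# \1 and \2 recall the first and second saved substrings.
swap_words() {
  # Save the first two space-separated words and print them swapped.
  printf '%s\n' "$1" | sed 's/\([^ ]*\) \([^ ]*\)/\2 \1/'
}

# Hypothetical: surround the second saved substring with troff
# bold-font requests. In the replacement, \\ produces a literal backslash.
bold_second() {
  printf '%s\n' "$1" | sed 's/\([^ ]*\) \([^ ]*\)/\1 \\fB\2\\fR/'
}

swap_words 'hello world'    # prints: world hello
bold_second 'see chapter'   # prints: see \fBchapter\fR
```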

akelder 07-14-2009 03:02 PM

Quote:

Originally Posted by sundialsvcs (Post 3607205)
awk, or the Perl programming language, would be an appropriate tool for this, because the task at hand is expressed algorithmically.

(1) Initialize a line-buffer to an empty string.

(2) While not end-of-file, read another newline-delimited string and append it to the buffer.

(3) Look within the buffer for "some text, and a date." If you find that, remove it from the head of the buffer and output it. Keep the tail of the string in the buffer.

(4) Repeat step (3) until no more matches can be found.

(5) When you reach the end-of-file, don't forget what's still in the buffer (if anything). In this case I don't think you intend to do anything with it.

This algorithm suggests itself because, in the data you provide, I see that newlines can appear anywhere within a date, yet the date must still be treated as a single unit.

The two tools that I spoke of are "power tools" for doing this kind of string-manipulation and file parsing.

sundialsvcs, this is very helpful, thanks much. How would this be expressed in awk?
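One way the quoted steps might look in awk. This is a minimal sketch, not sundialsvcs's code: it assumes, as Kenhelm's sed command below does, that each entry starts with 1 to 5 digits, a dot and whitespace, and it swaps the "look for some text and a date" test for a simpler "does this line start a new entry" test. The function name join_records is hypothetical.

```shell
# Sketch of the buffering algorithm in awk.
# ASSUMPTION: an entry starts with 1-5 digits, a dot and a blank;
# every other line continues the current entry.
join_records() {
  awk '
    # Steps (2)-(4): a line that starts a new entry means the buffer
    # now holds a complete entry, so output it and start a new buffer.
    # (Explicit [0-9]? repeats instead of {1,5}, which older awks lack.)
    /^[0-9][0-9]?[0-9]?[0-9]?[0-9]?\.[ \t]/ {
      if (buf != "") print buf
      buf = $0
      next
    }
    # Any other line is appended to the buffer, joined with a space.
    { buf = (buf == "") ? $0 : buf " " $0 }
    # Step (5): flush whatever is still in the buffer at end-of-file.
    END { if (buf != "") print buf }
  '
}
```

Usage: join_records < input.txt (input.txt stands for your data file). Unlike the note on step (5), this sketch prints the leftover buffer, because under the assumption above it holds the last complete entry.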

Kenhelm 07-14-2009 05:47 PM

Code:

    1   2  3   4 5             6     7 8 9 10     11    12
sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'

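1. Tells sed to use extended regular expressions (a GNU sed option)
2. Creates a label, a, to branch back to later
3. N appends a newline and the next line of input to the pattern space
4. The pattern starts with a newline, so it can only match at the start of a line that N appended
5. Pattern to match: 1 to 5 digits, a literal dot and one whitespace character
6. [^\n]* is a string which does not contain a newline. This limits the match to the last line in a multi-line pattern space. With .* instead, the match could start at an earlier line (e.g. the second line in the pattern space), because . also matches a newline there; the address would then succeed even when the last line does not start with a number.
7. $ matches the end of the pattern space, i.e. the end of its last line
8. ! negates the address: the command that follows runs only when the last line does NOT start with a number
9. s substitutes
10. Due to 'greedy matching', .*\n matches everything up to the start of the last line in the pattern space.
(The end of the last line in the pattern space is matched by $, not by \n.)
11. \1 is whatever is matched by the pattern in the ( ),
i.e. everything up to the last \n but not including it. Replacing that \n with a space joins the last line onto the line before it.
12. ba branches back to label a. The loop repeats until N runs out of input; GNU sed then prints the pattern space and exits.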
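To try the command, here is a sketch that wraps it in a shell function (join_with_sed is a hypothetical name), assuming GNU sed, since -r, \s and \n inside a bracket expression are GNU features. The commands are passed as separate -e expressions; it is the same script, but splitting it makes it unambiguous where the label name a ends, because in POSIX sed a label extends to the end of the line. The sample input is made up.

```shell
# Kenhelm's script, one command per -e (requires GNU sed).
join_with_sed() {
  sed -r -e ':a' \
         -e 'N' \
         -e '/\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /' \
         -e 'ba'
}

# Made-up input: the first entry's date is split across two lines.
printf '1. some text, Jul\n14 2009\n2. more text\n' | join_with_sed
# 1. some text, Jul 14 2009
# 2. more text
```

Because the script never leaves the a loop, the whole input is collected in the pattern space before anything is printed, so a very large file is held in memory.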

ghostdog74 07-14-2009 07:46 PM

Quote:

Originally Posted by akelder (Post 3607578)
Ghostdog, this works perfectly, but could you explain how it works?

Cheers!

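If a line has a digit followed by a dot, print a newline first, which ends the previous entry.
For the rest of the lines, printing with printf (which adds no \n) concatenates them onto the current output line.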
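A minimal sketch of the idea described above. ghostdog74's original command is not reproduced here, so this is a reconstruction from the description, assuming an entry starts with digits followed by a dot at the start of a line; join_with_printf is a hypothetical name.

```shell
# Reconstruction of the printf approach (not ghostdog74's exact code).
join_with_printf() {
  awk '
    # New entry: end the previous one with a newline (except before the
    # very first line), then print this line without a trailing newline.
    /^[0-9]+\./ { if (NR > 1) printf "\n"; printf "%s", $0; next }
    # Continuation: print it after a space, still without "\n", so it
    # lands on the same output line as the entry it belongs to.
    { printf " %s", $0 }
    # Terminate the last entry.
    END { if (NR > 0) printf "\n" }
  '
}
```

The line is printed with printf "%s", $0 rather than printf $0, so a % character in the data is not taken as a format directive.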


All times are GMT -5.