search for inverse of pattern and join with the line before it
Let's say I have:
Code:
122. some text, March 2, 1996
123. some text, April 22, 1997
124. some text, April 23,
1998
125. some text, May 1,
1999
20555. some text, August 3, 2007
20556. some text, July 3,
2008
20557. some text, July
4, 2009
20558. some text, August 1, 2010
And I need to turn it into:
Code:
122. some text, March 2, 1996
123. some text, April 22, 1997
124. some text, April 23, 1998
125. some text, May 1, 1999
20555. some text, August 3, 2007
20556. some text, July 3, 2008
20557. some text, July 4, 2009
20558. some text, August 1, 2010
I hacked together some sed that finds lines starting with a one- to five-digit number followed by a dot and one or more spaces, then replaces the newline at the end with a space, so that each matching line is joined with the line that follows:
Code:
cat file | sed -e '/\(^[0-9]\{1,5\}\.\s\+\)/N;s/\n/ /g'
And also sed that will find lines that start with "text" and join each with the preceding line:
Code:
cat file | sed ':a; $!N;s/\ntext/ text/;ta;P;D'
But I cannot figure out how to find lines not matching a pattern (inverse match, like "grep -v") and then append each to the preceding line.
Something like the following (which doesn't work):
Code:
cat file | sed -e ':a; /\(^[0-9]\{1,5\}\.\s\+\)!/N;s/\n/ /;ta;P;D'
Any way to do this with sed (or awk, or something else)?
I don't grasp what this does or how it fits with the question....???
That was formulated according to the output he wants, using the sample input. But it does not check for the digits and such, because the output lines the OP wants all contain letters. Hence my suggestion. Of course, if the input varies more than that, a more thorough check will be needed, like what colucix did.
ghostdog, the input has lines that contain only the year, which has to be appended to the previous line. Only the output has letters in every line. In my code I just checked for a leading number immediately followed by a dot, as suggested by the OP.
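The code being discussed here is not shown in the thread as it stands. As a rough sketch of the check described (a leading number immediately followed by a dot starts a new record, and any other line is a continuation of the previous one), something like the following awk might work. The function name join_continuations and the exact regex are assumptions made for illustration, not the poster's actual code.

```shell
# Hypothetical helper: join every line that does NOT start with
# "<digits>." onto the record before it.
join_continuations() {
    awk '
    /^[0-9]+\./ {
        # A new record starts here, so emit the one collected so far.
        if (rec != "") print rec
        rec = $0
        next
    }
    {
        # Any other line is a continuation: append it with a space.
        rec = (rec == "" ? $0 : rec " " $0)
    }
    END {
        # Emit the last record, which nothing else would flush.
        if (rec != "") print rec
    }' "$@"
}
```

Run as `join_continuations file`, or with the text piped in on standard input. A line such as "4, 2009" is treated as a continuation because its leading digit is followed by a comma rather than a dot.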
awk, or the Perl programming language, would be an appropriate tool for this, because the task at hand is expressed algorithmically.
(1) Initialize a line-buffer to an empty string.
(2) While not end-of-file, read another newline-delimited string and append it to the buffer.
(3) Look within the buffer for "some text, and a date." If you find that, remove it from the head of the buffer and output it. Keep the tail of the string in the buffer.
(4) Repeat step (3) until no more matches can be found.
(5) When you reach the end-of-file, don't forget what's still in the buffer (if anything). In this case I don't think you intend to do anything with it.
This algorithm suggests itself because, in the data you provide, I see that newlines can appear anywhere in a date, which is nevertheless seen as one.
The two tools that I spoke of are "power tools" for doing this kind of string-manipulation and file parsing.
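The buffer steps above could be sketched in awk roughly as follows. This assumes that a complete record always ends in a comma, a space and a four-digit year, which holds for the sample data but is only a guess about the real file. The function name join_by_buffer is made up for the example. Unlike step (5), the sketch prints whatever is left in the buffer at end-of-file rather than discarding it, so nothing is lost silently.

```shell
# Hypothetical helper implementing the line-buffer idea.
join_by_buffer() {
    awk '
    {
        # Step 2: append the newly read line to the buffer.
        buf = (buf == "" ? $0 : buf " " $0)

        # Steps 3 and 4: a buffer ending in ", YYYY" is taken to be
        # a complete record, so output it and empty the buffer.
        if (buf ~ /, [0-9][0-9][0-9][0-9]$/) {
            print buf
            buf = ""
        }
    }
    END {
        # Step 5: print any leftover text instead of dropping it.
        if (buf != "") print buf
    }' "$@"
}
```

Because this version looks for the end of a record rather than the start of the next one, it does not depend on the leading number at all. Trailing whitespace after the year would defeat the match, so the input may need trimming first.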
This uses GNU sed:
All of the lines are appended one at a time to the pattern space, separated by \n. If the start of the latest line read in doesn't match the number pattern, the 's' command replaces the last '\n' in the pattern space with a space.
It can join a continuous run of lines which don't start with the number pattern.
Code:
echo \
'124. some text, April 23,
1998
20557. some text, July
4, 2009
some more text
20558. some text, August 1, 2010' |
sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'
124. some text, April 23, 1998
20557. some text, July 4, 2009 some more text
20558. some text, August 1, 2010
Kenhelm, this is great and works perfectly, but I don't fully understand it. Here's what I see, please correct me.. :-P
1. Tells sed to expect extended regexp
2. Creates a label to return to later
3. Appends the next line of input into the pattern space
4. Pattern starts with newline
5. Pattern to match
6. ?
7. Matches till the end of line
8. Negates previous expression
9. Substitutes
10. Anything followed by a newline?
11. Replaces with ?? <-- Edit: "\1" means first saved substring
12. Returns to label
Thanks again!
Last edited by akelder; 07-14-2009 at 03:05 PM.
Reason: Figured out #11
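For what it's worth, here is one reading of that one-liner, spread over several lines with comments keyed to the numbered guesses above. It is meant to be the same script, only reformatted. The wrapper name join_gnu_sed is made up, and like the original it relies on GNU sed (-r, \s, and \n inside a bracket expression).

```shell
# Hypothetical wrapper around the same GNU sed script, with comments.
join_gnu_sed() {
    sed -r '
        # (2) label "a": a place to jump back to
        :a
        # (3) append the next input line to the pattern space,
        #     separated from what is already there by a newline
        N
        # (4) \n             the newline in front of the newest line
        # (5) [0-9]{1,5}\.\s one to five digits, a dot, one whitespace
        # (6) [^\n]*         any run of characters that are not newlines
        # (7) $              end of the pattern space
        # (8) !              run the command only if the address FAILS
        # (9)-(11) s/(.*)\n/\1 /  the greedy (.*) runs up to the LAST
        #     newline, which is replaced by a space after the saved text
        /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /
        # (12) jump back to label "a" unconditionally
        ba
    ' "$@"
}
```

Items 6 and 7 together pin the number pattern to the most recently appended line, since the match may not cross another newline before the end. When N runs out of input, GNU sed prints the pattern space and exits, which is how the accumulated text gets written at the end.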
Quote:
Originally Posted by PMP
try this out, It worked for me
Code:
cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'
PMP, thanks, that works great. Clever workaround to not have to bother with holding stuff in the buffer.. I see you're using that \1 at the end, too.. Gotta figure out what that means..
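On the \1: it stands for whatever the first \( ... \) group in the pattern matched, so the replacement can put the matched text back. A small sketch of just that piece, using a made-up input string and a made-up function name, and assuming GNU sed because of \+:

```shell
# Hypothetical demo: wrap the first "<digits>." in brackets.
# \( ... \) saves the matched number and dot; \1 re-inserts it.
show_backref() {
    printf '%s\n' 'abc 12. def' | sed 's/\([0-9]\+\.\)/[\1]/'
}
```

Calling show_backref prints `abc [12.] def`. In the tr pipeline above, the same idea puts a newline in front of each saved number instead of brackets around it.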