LinuxQuestions.org - search for inverse of pattern and join with the line before it

akelder

07-14-2009 05:03 AM

search for inverse of pattern and join with the line before it

Let's say I have:

Code:

122. some text, March 2, 1996
123.  some text, April 22, 1997
124. some text, April 23,
1998
125.  some text, May 1,
1999
20555.  some text, August 3, 2007
20556. some text, July 3,
2008
20557. some text, July
4, 2009
20558. some text, August 1, 2010

And I need to turn it into:

Code:

122. some text, March 2, 1996
123.  some text, April 22, 1997
124. some text, April 23, 1998
125.  some text, May 1, 1999
20555.  some text, August 3, 2007
20556. some text, July 3, 2008
20557. some text, July 4, 2009
20558. some text, August 1, 2010

I hacked together some sed that finds lines starting with a number of 1 to 5 digits, followed by a dot and one or more spaces, and replaces the newline at the end with a space, so that each matching line is joined with the line that follows it:

Code:

cat file | sed -e '/\(^[0-9]\{1,5\}\.\s\+\)/N;s/\n/ /g'

I also have some sed that finds lines starting with "text" and joins each one to the preceding line:

Code:

cat file | sed ':a; $!N;s/\ntext/ text/;ta;P;D'

But I cannot figure out how to find lines not matching a pattern (inverse match, like "grep -v") and then append each to the preceding line.

Something like the following (which doesn't work):

Code:

cat file | sed -e ':a; /\(^[0-9]\{1,5\}\.\s\+\)!/N;s/\n/ /;ta;P;D'

Any way to do this with sed (or awk, or something else)?
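For comparison, here is a sketch of how the attempt above could be made to work while keeping the ":a; $!N; ...; ta; P; D" loop from the "text" example. In sed the "!" belongs after the address ("/regex/!command"), not inside the regex, and because N has already appended the next line, the regex has to test the text after the embedded newline. It assumes, as in the question, that a record starts with 1 to 5 digits, a dot and a whitespace character; the function name join_continuations is made up for this example.

```shell
#!/bin/sh
# Sketch: append every line that does NOT start a record to the line before it.
# join_continuations is a hypothetical name; a record is assumed to start with
# 1-5 digits, a dot and a whitespace character.
join_continuations() {
  sed '
    :a
    # Unless this is the last line, append a newline plus the next line.
    $!N
    # Negated address: join only if the text after the newline does not
    # look like the start of a record.
    /\n[0-9]\{1,5\}\.[[:space:]]/!s/\n/ /
    # After a successful join, loop back to pull in another line.
    ta
    # Otherwise print up to the newline and restart with what is left.
    P
    D
  ' "$@"
}

printf '%s\n' '124. some text, April 23,' '1998' '20557. some text, July' '4, 2009' \
  '20558. some text, August 1, 2010' | join_continuations
```

Because the loop keeps joining until the next line looks like a record start, a run of several continuation lines is folded into one record.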

ghostdog74

07-14-2009 05:49 AM

Code:

awk '/[a-zA-Z]/' file

PMP

07-14-2009 06:18 AM

Try this out; it worked for me:

Code:

cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'

pixellany

07-14-2009 06:19 AM

Quote:

Originally Posted by ghostdog74 (Post 3607070)

Code:

awk '/[a-zA-Z]/' file

I don't grasp what this does or how it fits with the question....???

colucix

07-14-2009 06:19 AM

Code:

awk '/[[:digit:]][.]/{
  if ( string != "" )
    print string
  string=$0
}
!/[[:digit:]][.]/{
  print string, $0
  string=""
}
END { if ( string != "" )
      print string
}' testfile

ghostdog74

07-14-2009 07:19 AM

Quote:

Originally Posted by pixellany (Post 3607091)

I don't grasp what this does or how it fits with the question....???

That was based on the output he wants for the sample input. It doesn't check for the digits and such, because every line of the output the OP wants contains letters. Hence my suggestion. Of course, if the input varies more, a more thorough check like colucix's will be needed.

colucix

07-14-2009 07:39 AM

ghostdog, the input has lines that contain only the year (or the day and year), which have to be appended to the previous line. Only the output has letters in every line. In my code I just check for a number immediately followed by a dot, as suggested by the OP.

sundialsvcs

07-14-2009 08:20 AM

awk, or the Perl programming language, would be an appropriate tool for this, because the task at hand is most naturally expressed as an algorithm:

(1) Initialize a line-buffer to an empty string.

(2) While not end-of-file, read another newline-delimited string and append it to the buffer.

(3) Look within the buffer for "some text, and a date." If you find that, remove it from the head of the buffer and output it. Keep the tail of the string in the buffer.

(4) Repeat step (3) until no more matches can be found.

(5) When you reach the end-of-file, don't forget what's still in the buffer (if anything). In this case I don't think you intend to do anything with it.

This algorithm suggests itself because, in the data you provide, newlines can appear anywhere within a date, which must nevertheless be treated as a single unit.

The two tools that I spoke of are "power tools" for doing this kind of string-manipulation and file parsing.
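A minimal awk sketch of the five steps above. The record pattern is an assumption made for this example, not part of the thread: a number and a dot, text without commas, then a date written as "Month D, YYYY". Each new line is added to the buffer after a single space, and join_dates is a made-up name.

```shell
#!/bin/sh
# Sketch of the buffer algorithm. Assumptions: a record looks like
# "N. text, Month D, YYYY" and the text has no comma; join_dates is hypothetical.
join_dates() {
  awk '
    # (1) buf starts out empty; (2) append each input line to it.
    { buf = (buf == "") ? $0 : buf " " $0 }
    # (3)/(4) while the head of the buffer is a complete record,
    # print it and keep only the tail.
    {
      while (match(buf, /^[0-9]+\.[ ]+[^,]*, [A-Z][a-z]+ [0-9]+, [0-9][0-9][0-9][0-9]/)) {
        print substr(buf, 1, RLENGTH)
        buf = substr(buf, RLENGTH + 1)
        sub(/^ +/, "", buf)
      }
    }
    # (5) anything left at end of file is incomplete; report it on stderr.
    END { if (buf != "") print "incomplete: " buf > "/dev/stderr" }
  ' "$@"
}

printf '%s\n' '124. some text, April 23,' '1998' '20557. some text, July' '4, 2009' | join_dates
```

Since the buffer is only emptied once a complete date has been seen, it does not matter where the newline falls inside the date. Reporting the leftover on standard error is just one possible reading of step (5).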

ghostdog74

07-14-2009 08:32 AM

Quote:

Originally Posted by colucix (Post 3607155)

Thanks. I missed that the year has to be appended to the previous line.

Code:

awk '/[0-9][.]/ && NR>1{ print "";}{printf "%s",$0}' file

Kenhelm

07-14-2009 12:08 PM

This uses GNU sed.
All of the lines are appended, one at a time, to the pattern space, separated by \n. If the start of the most recently read line doesn't match the number pattern, the 's' command replaces the last '\n' in the pattern space with a space.
It can join a continuous run of lines that don't start with the number pattern.

Code:

echo \
'124. some text, April 23,
1998
20557. some text, July
4, 2009
some more text
20558. some text, August 1, 2010' |
sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'

124. some text, April 23, 1998
20557. some text, July 4, 2009 some more text
20558. some text, August 1, 2010

akelder

07-14-2009 01:54 PM

Thanks a lot, everyone, this is some great stuff!

akelder

07-14-2009 02:24 PM

Quote:

Originally Posted by Kenhelm (Post 3607449)

Code:

    1  2 3  4        5        6  7 8 9  10  11    12      

sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'

Kenhelm, this is great and works perfectly, but I don't fully understand it. Here's what I see; please correct me. :-P

1. Tells sed to expect extended regexp
2. Creates a label to return to later
3. Appends the next line of input into the pattern space
4. Pattern starts with newline
5. Pattern to match
6. ?
7. Matches till the end of line
8. Negates previous expression
9. Substitutes
10. Anything followed by a newline?
11. Replaces with ?? <-- Edit: "\1" means first saved substring
12. Returns to label

Thanks again!
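As a sketch, here is Kenhelm's command laid out over several lines with one comment per piece; the numbers in parentheses follow akelder's list. The function name join_numbered is made up for this example, and it needs GNU sed (-r, \s, and N printing the pattern space when the input runs out).

```shell
#!/bin/sh
# Kenhelm's command with comments; numbers match akelder's list.
# join_numbered is a hypothetical name. Needs GNU sed.
# (1) -r: extended regular expressions, so { } ( ) need no backslashes.
join_numbered() {
  sed -r '
    # (2) label "a": the top of the loop
    :a
    # (3) append a newline plus the next input line; when there is no
    #     next line, GNU sed prints the pattern space and exits, ending the loop
    N
    # (4)-(7) \n, then 1-5 digits and a dot (5), one whitespace character
    #     (6, \s), and the rest of the line with no further newline (7),
    #     so only the line that N just appended is tested
    # (8) ! runs the s command only when that test FAILS
    # (9)-(11) (.*) is greedy, so the \n after it is the LAST newline;
    #     \1 writes back what (.*) saved, followed by a space
    /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /
    # (12) branch back to label "a" every time
    ba
  ' "$@"
}

printf '%s\n' '124. some text, April 23,' '1998' '20557. some text, July' '4, 2009' \
  'some more text' '20558. some text, August 1, 2010' | join_numbered
```

Since the whole file ends up in the pattern space, this is fine for files of modest size; the loop only stops when N finds no more input.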

akelder

07-14-2009 02:30 PM

Quote:

Originally Posted by ghostdog74 (Post 3607220)

Code:

awk '/[0-9][.]/ && NR>1{ print "";}{printf "%s",$0}' file

Ghostdog, this works perfectly, but could you explain how it works?

Cheers!
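Here is ghostdog74's one-liner written out with comments, as a sketch. Two small changes are additions for this example rather than part of the original: each continuation line is printed after a space (the one-liner prints the pieces with no separator, so with the sample input as shown, "April 23," and "1998" come out as "April 23,1998"), and END adds the newline that the last record would otherwise lack. join_by_number is a made-up name.

```shell
#!/bin/sh
# Commented version of the one-liner; join_by_number is a hypothetical name.
join_by_number() {
  awk '
    # A line with a digit followed by a dot starts a new record.
    /[0-9][.]/ {
      # Finish the previous record with a newline (there is none before line 1).
      if (NR > 1) print ""
      # Print the line itself WITHOUT a newline, so later lines can follow it.
      printf "%s", $0
      next
    }
    # Any other line continues the current record (added: a space in front).
    { printf " %s", $0 }
    # Added: end the last record with a newline.
    END { if (NR > 0) print "" }
  ' "$@"
}

printf '%s\n' '124. some text, April 23,' '1998' '125.  some text, May 1,' '1999' | join_by_number
```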

akelder

07-14-2009 02:45 PM

Quote:

Originally Posted by colucix (Post 3607093)

Code:

awk '/[[:digit:]][.]/{
  if ( string != "" )
    print string
  string=$0
}
!/[[:digit:]][.]/{
  print string, $0
  string=""
}
END { if ( string != "" )
      print string
}' testfile

colucix, thanks very much. Sorry for being dense, but how do I run this? Running it from the shell gives no error, but it doesn't work.
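One way to run colucix's program, as a sketch: save the part between the single quotes to a file and pass it to awk with -f, giving your own data file where colucix had testfile (presumably just the name of the sample file). join.awk and data.txt are made-up names, and the comments are additions. Note that the program joins at most one continuation line per record: a second one would be printed on its own line, preceded by a space.

```shell
#!/bin/sh
# Save the awk program (comments added) to a file; join.awk is a hypothetical name.
cat > join.awk <<'EOF'
# A line with a digit followed by a dot starts a new record.
/[[:digit:]][.]/ {
  # A record still held here had no continuation line: print it unchanged.
  if ( string != "" )
    print string
  string = $0
}
# Any other line is a continuation: print it after the held record.
!/[[:digit:]][.]/ {
  print string, $0
  string = ""
}
# Print a record that was still held when the input ended.
END {
  if ( string != "" )
    print string
}
EOF

# Hypothetical data file, then run the program on it.
printf '%s\n' '122. some text, March 2, 1996' '124. some text, April 23,' '1998' > data.txt
awk -f join.awk data.txt
```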

akelder

07-14-2009 02:56 PM

Quote:

Originally Posted by PMP (Post 3607090)

try this out, It worked for me

Code:

cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'

PMP, thanks, that works great. Clever workaround that avoids having to hold stuff in the buffer. I see you're using that \1 at the end, too; I've got to figure out what that means.
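In sed, "\(" and "\)" mark a group in the pattern, and "\1" in the replacement stands for whatever the first group matched, so s/\([0-9]\+\.\)/\n\1/g puts a newline in front of every "digits + dot" and then writes the number back. Applied to the whole flattened file, that leaves an empty first line and a space at the end of each record. The sketch below is a variant, not PMP's exact command: it matches the space before the number outside the group so that space is dropped, and trims the final space. It needs GNU sed (\+ and \n in the replacement), and join_flat is a made-up name.

```shell
#!/bin/sh
# Variant of the tr + sed approach (GNU sed); join_flat is a hypothetical name.
# tr turns every newline into a space and echo adds one newline at the end,
# so sed sees the whole input as a single line. In the sed script,
# \( \) saves "digits + dot" as group 1, \1 writes it back after the new \n,
# and the space in front of the number, being outside the group, is dropped.
# s/ $// then removes the space left at the very end.
join_flat() {
  { tr '\n' ' '; echo; } | sed 's/ \([0-9]\+\.\)/\n\1/g; s/ $//'
}

printf '%s\n' '124. some text, April 23,' '1998' '20557. some text, July' '4, 2009' | join_flat
```

Like the original, this splits on any number followed by a dot, so text such as "version 3.5" inside a record would also start a new line.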

All times are GMT -5.
