LinuxQuestions.org
Old 07-14-2009, 05:03 AM   #1
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Rep: Reputation: 16
search for inverse of pattern and join with the line before it


Let's say I have:

Code:
122. some text, March 2, 1996
123.  some text, April 22, 1997
124. some text, April 23,
1998
125.  some text, May 1,
1999
20555.   some text, August 3, 2007
20556. some text, July 3,
2008
20557. some text, July
4, 2009
20558. some text, August 1, 2010
And I need to turn it into:

Code:
122. some text, March 2, 1996
123.  some text, April 22, 1997
124. some text, April 23, 1998
125.  some text, May 1, 1999
20555.   some text, August 3, 2007
20556. some text, July 3, 2008
20557. some text, July 4, 2009
20558. some text, August 1, 2010
I hacked together some sed that finds lines starting with a number of 1 to 5 digits, followed by a dot and one or more spaces, and replaces the newline at the end with a space, so that each line containing this pattern is joined with the line that follows:

Code:
cat file | sed -e '/\(^[0-9]\{1,5\}\.\s\+\)/N;s/\n/ /g'
And also sed that will find lines that start with "text" and join each with the preceding line:

Code:
cat file | sed ':a; $!N;s/\ntext/ text/;ta;P;D'
But I cannot figure out how to find lines not matching a pattern (inverse match, like "grep -v") and then append each to the preceding line.

Something like the following (which doesn't work):

Code:
cat file | sed -e ':a; /\(^[0-9]\{1,5\}\.\s\+\)!/N;s/\n/ /;ta;P;D'
Any way to do this with sed (or awk, or something else)?
 
Old 07-14-2009, 05:49 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Code:
awk '/[a-zA-Z]/' file
 
Old 07-14-2009, 06:18 AM   #3
PMP
Member
 
Registered: Apr 2009
Location: ~
Distribution: RHEL, Fedora
Posts: 381

Rep: Reputation: 58
Try this out, it worked for me:
Code:
cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'
 
Old 07-14-2009, 06:19 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by ghostdog74
Code:
awk '/[a-zA-Z]/' file
I don't grasp what this does or how it fits with the question....???
 
Old 07-14-2009, 06:19 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Code:
awk '/[[:digit:]][.]/{
  if ( string != "" )
     print string
  string=$0
}
!/[[:digit:]][.]/{
  print string, $0
  string=""
}
END { if ( string != "" )
       print string
}' testfile
 
Old 07-14-2009, 07:19 AM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by pixellany
I don't grasp what this does or how it fits with the question....???
That was formulated from the output he wants, using the sample input. It does not check for the digits and such, because every line of the output the OP wants contains letters. Hence my suggestion. Of course, if the input varies more than the sample, a more thorough check like colucix's will be needed.
 
Old 07-14-2009, 07:39 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
ghostdog, the input has lines that contain only the year, which has to be appended to the previous line. Only the output has letters in every line. In my code I just checked for the first number immediately followed by a dot, as suggested by the OP.
 
Old 07-14-2009, 08:20 AM   #8
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,663
Blog Entries: 4

Rep: Reputation: 3944
awk, or the Perl programming language, would be an appropriate tool for this, because the task at hand is expressed algorithmically.

(1) Initialize a line-buffer to an empty string.

(2) While not end-of-file, read another newline-delimited string and append it to the buffer.

(3) Look within the buffer for "some text, and a date." If you find that, remove it from the head of the buffer and output it. Keep the tail of the string in the buffer.

(4) Repeat step (3) until no more matches can be found.

(5) When you reach the end-of-file, don't forget what's still in the buffer (if anything). In this case I don't think you intend to do anything with it.

This algorithm suggests itself because, in the data you provide, I see that newlines can appear anywhere in a date, which is nevertheless seen as one.

The two tools that I spoke of are "power tools" for doing this kind of string-manipulation and file parsing.
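
The steps above can be sketched in awk roughly as follows. This is only an illustration under assumptions the thread does not state: a record is treated as complete once the head of the buffer looks like "N. text, Month D, YYYY" (a pattern guessed from the sample data), the function name join_records is made up, and leftover buffer content is printed at end-of-file rather than discarded.

```shell
# join_records: hypothetical helper name; reads the file(s) given as arguments, or stdin.
join_records() {
  awk '
    # Step 2: append each input line to the buffer, separated by one space.
    { buf = (buf == "" ? $0 : buf " " $0) }

    # Steps 3 and 4: while the head of the buffer looks like a complete record
    # (the "N. text, Month D, YYYY" shape is an assumption based on the sample),
    # print it and keep only the tail in the buffer.
    {
      while (match(buf, /^[0-9]+\. .*, [A-Za-z]+ +[0-9]+, +[0-9][0-9][0-9][0-9]/)) {
        print substr(buf, 1, RLENGTH)
        buf = substr(buf, RLENGTH + 1)
        sub(/^ +/, "", buf)
      }
    }

    # Step 5: whatever is still buffered at end-of-file is printed, not dropped.
    END { if (buf != "") print buf }
  ' "$@"
}
```

Called as "join_records file", it should print one record per line. Because it keys on the date at the end of a record rather than on the number at the start, it is more sensitive to variations in the date format than the other answers in this thread.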
 
Old 07-14-2009, 08:32 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by colucix
ghostdog, the input has lines that contain only the year, which has to be appended to the previous line. Only the output has letters in every line. In my code I just checked for the first number immediately followed by a dot, as suggested by the OP.
Thanks. I missed the year that has to be appended to the previous line.
Code:
awk '/[0-9][.]/ && NR>1{ print "";}{printf "%s",$0}' file

Last edited by ghostdog74; 07-14-2009 at 09:00 AM.
 
Old 07-14-2009, 12:08 PM   #10
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
This uses GNU sed:
All of the lines are appended one at a time to the pattern space, separated by \n. If the start of the most recently appended line doesn't match the number pattern, the 's' command replaces the last '\n' in the pattern space with a space.
It can also join a continuous run of lines that don't start with the number pattern.
Code:
echo \
'124. some text, April 23,
1998
20557. some text, July
4, 2009
some more text
20558. some text, August 1, 2010' |

sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'

124. some text, April 23, 1998
20557. some text, July 4, 2009 some more text
20558. some text, August 1, 2010
 
Old 07-14-2009, 01:54 PM   #11
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Original Poster
Rep: Reputation: 16
Thanks a lot, everyone, this is some great stuff!
 
Old 07-14-2009, 02:24 PM   #12
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by Kenhelm
Code:
     1   2 3   4        5        6   7 8 9   10   11    12       
sed -r ':a N; /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /; ba'
Kenhelm, this is great and works perfectly, but I don't fully understand it. Here's what I see, please correct me.. :-P

1. Tells sed to expect extended regexp
2. Creates a label to return to later
3. Appends the next line of input into the pattern space
4. Pattern starts with newline
5. Pattern to match
6. ?
7. Matches till the end of line
8. Negates previous expression
9. Substitutes
10. Anything followed by a newline?
11. Replaces with ?? <-- Edit: "\1" means first saved substring
12. Returns to label

Thanks again!

Last edited by akelder; 07-14-2009 at 03:05 PM. Reason: Figured out #11
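
For what it's worth, here is one reading of the command, laid out one sed command per line with comments keyed to the numbers above. It is a sketch of how GNU sed appears to handle it, not Kenhelm's own explanation, and the wrapper name join_sed is made up. By the column markers, 5 is the number pattern itself, 6 sits on [^\n]* (any run of characters other than newline) and 7 on $, which here anchors to the end of the whole pattern space rather than to a single line. Together they force the number test onto the last line that was appended.

```shell
# join_sed: hypothetical wrapper name; needs GNU sed (-r, \s, and \n inside brackets).
join_sed() {
  sed -r '
    # 2: label to loop back to
    :a
    # 3: append the next input line to the pattern space, after a newline;
    #    when no input is left, GNU sed prints the pattern space and stops
    N
    # 4-8: if the LAST line in the pattern space does not start with
    #      1-5 digits, a dot and a whitespace character ...
    # 9-11: ... replace the last newline with a space; (.*) is greedy, so it
    #       takes everything up to the final newline and \1 puts it back
    /\n[0-9]{1,5}\.\s[^\n]*$/! s/(.*)\n/\1 /
    # 12: jump back to the label
    ba
  ' "$@"
}
```

Since the loop never ends a cycle, the whole file accumulates in the pattern space and is printed once at the end, which is fine for small files but worth knowing for very large ones.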
 
Old 07-14-2009, 02:30 PM   #13
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by ghostdog74
Code:
awk '/[0-9][.]/ && NR>1{ print "";}{printf "%s",$0}' file
Ghostdog, this works perfectly, but could you explain how it works?

Cheers!
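
As far as can be told from the one-liner itself, it has two rules. The first fires on any line containing a digit followed by a dot (except the very first line) and prints an empty string plus a newline, which ends the previous output line. The second has no pattern, so it runs on every line and prints it with printf and no trailing newline. Continuation lines therefore land on the same output line as the numbered line before them. Note that nothing is inserted between the joined parts, so whether "April 23," and "1998" end up separated depends on trailing spaces in the input. The sketch below is a commented variant that adds the space and a final newline itself; the name join_awk is made up.

```shell
# join_awk: hypothetical helper name; reads the file(s) given as arguments, or stdin.
join_awk() {
  awk '
    # A line containing "digit then dot" starts a new record:
    # finish the previous output line first (not needed before line 1).
    /[0-9][.]/ && NR > 1 { printf "\n" }

    # Any other line is a continuation: separate it with one space.
    !/[0-9][.]/          { printf " " }

    # Every line is written without a trailing newline.
                         { printf "%s", $0 }

    # Terminate the last output line, if there was any input at all.
    END                  { if (NR) printf "\n" }
  ' "$@"
}
```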
 
Old 07-14-2009, 02:45 PM   #14
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by colucix
Code:
awk '/[[:digit:]][.]/{
  if ( string != "" )
     print string
  string=$0
}
!/[[:digit:]][.]/{
  print string, $0
  string=""
}
END { if ( string != "" )
       print string
}' testfile
colucix, thanks much, sorry for being dense, but how do I run this? Running it from the shell runs without error, but doesn't work.
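
Two guesses, since the thread never settles this. First, "testfile" is simply the name of colucix's input file and has to be replaced with your own. Second, Debian and Ubuntu of that era typically used mawk as the default awk, and older mawk releases reportedly did not understand POSIX character classes such as [[:digit:]]; in that case neither pattern would behave as intended, yet no error would be shown. A sketch of the same buffering idea that avoids character classes, anchors the number to the start of the line, and also copes with several continuation lines in a row could look like this (the name join_buffered and the variable rec are made up):

```shell
# join_buffered: hypothetical helper name; reads the file(s) given as arguments, or stdin.
join_buffered() {
  awk '
    # A line starting with digits and a dot begins a new record.
    /^[0-9]+[.]/ {
      if (rec != "") print rec   # flush the record collected so far
      rec = $0
      next
    }

    # Anything else is a continuation: append it to the current record.
    { rec = (rec == "" ? $0 : rec " " $0) }

    # Flush the last record at end of input.
    END { if (rec != "") print rec }
  ' "$@"
}
```

Run it as "join_buffered yourfile", or paste only the awk part with the file name after the closing quote.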
 
Old 07-14-2009, 02:56 PM   #15
akelder
Member
 
Registered: Jan 2007
Distribution: debian on servers, ubuntu on desktops/laptops
Posts: 45

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by PMP
Try this out, it worked for me:
Code:
cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'
PMP, thanks, that works great. Clever workaround to not have to bother with holding stuff in the buffer.. I see you're using that \1 at the end, too.. Gotta figure out what that means..
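
On the \1: in sed, \( ... \) saves whatever it matched, and \1 in the replacement pastes that saved text back in (\2 is the second group, and so on). Here it puts the matched number and dot back after the inserted newline. The first command below is a minimal demonstration and prints "4 July". The second part is a sketch of a slightly tidier variant of the tr approach, under assumptions not in the original post: the name join_tr is made up, GNU sed is assumed because \n in the replacement is a GNU extension, and the number is required to be preceded and followed by a space so that the output has no leading blank line or trailing spaces.

```shell
# \( ... \) saves what it matches; \1 and \2 in the replacement paste it back.
echo 'July 4' | sed 's/\([A-Za-z]*\) \([0-9]*\)/\2 \1/'

# join_tr: hypothetical name for a tidied variant of the tr approach; reads stdin.
join_tr() {
  # Flatten everything onto one line, drop the trailing space, then start a
  # new line before each " <1-5 digits>. " sequence.
  tr '\n' ' ' |
    sed 's/ *$//; s/ \([0-9]\{1,5\}\. \)/\n\1/g'
  echo    # tr removed the final newline, so add one back
}
```

Like the original, this would split in the wrong place if the text itself contained something like " 12. " in the middle of a record, so it relies on the data looking like the sample.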
 
  

