search for inverse of pattern and join with the line before it
Let's say I have:
Code:
122. some text, March 2, 1996
Code:
122. some text, March 2, 1996
Code:
cat file | sed -e '/\(^[0-9]\{1,5\}\.\s\+\)/N;s/\n/ /g'
Code:
cat file | sed ':a; $!N;s/\ntext/ text/;ta;P;D'
Something like the following (which doesn't work):
Code:
cat file | sed -e ':a; /\(^[0-9]\{1,5\}\.\s\+\)!/N;s/\n/ /;ta;P;D'
Code:
awk '/[a-zA-Z]/' file
Try this out; it worked for me.
Code:
cat file | tr "\n" " " | sed 's/\([0-9]\+\.\)/\n\1/g'
Code:
awk '/[[:digit:]][.]/{ |
ghostdog, the input has lines that contain only the year, which has to be appended to the previous line. Only the output has letters on every line. In my code I just checked for the first number immediately followed by a dot, as suggested by the OP.
awk, or the Perl programming language, would be an appropriate tool for this, because the task at hand is expressed algorithmically.
(1) Initialize a line-buffer to an empty string.
(2) While not end-of-file, read another newline-delimited string and append it to the buffer.
(3) Look within the buffer for "some text, and a date." If you find that, remove it from the head of the buffer and output it. Keep the tail of the string in the buffer.
(4) Repeat step (3) until no more matches can be found.
(5) When you reach the end-of-file, don't forget what's still in the buffer (if anything). In this case I don't think you intend to do anything with it.
This algorithm suggests itself because, in the data you provide, I see that newlines can appear anywhere in a date, which is nevertheless seen as one. The two tools that I spoke of are "power tools" for doing this kind of string manipulation and file parsing.
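The buffer steps described here could be sketched in awk roughly as follows. This is only an illustration under assumptions: the function name emit_dated_records is made up, and the date pattern (capitalized month name, one- or two-digit day, comma, four-digit year) is a guess at the data, not something the thread specifies.

```shell
# Hypothetical helper; reads the broken-up text on stdin.
# ASSUMPTION: dates look like "March 2, 1996" once the pieces are rejoined.
emit_dated_records() {
  awk '
    {
      # Steps 1-2: append each input line to the buffer, space-separated.
      buf = (buf == "" ? $0 : buf " " $0)
      # Steps 3-4: while the buffer holds a complete date, output everything
      # up to the end of that date and keep the tail in the buffer.
      while (match(buf, /[A-Z][a-z]+ +[0-9][0-9]?, +[0-9][0-9][0-9][0-9]/)) {
        print substr(buf, 1, RSTART + RLENGTH - 1)
        buf = substr(buf, RSTART + RLENGTH)
        sub(/^ +/, "", buf)
      }
    }
    # Step 5: leftover text has no complete date; print it so nothing is lost.
    END { if (buf != "") print buf }
  '
}

# Usage: emit_dated_records < file
```

Because a record is emitted as soon as its date is complete, this handles a line break in the middle of a date as well as one before the year. Printing the leftover buffer at end-of-file is a design choice of this sketch; it could just as well be discarded.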
Quote:
Code:
awk '/[0-9][.]/ && NR>1 { print "" } { printf "%s", $0 } END { print "" }' file
This uses GNU sed:
All of the lines are appended one at a time to the pattern space, separated by \n. If the start of the latest line read in doesn't match the number pattern, then the 's' command replaces the last '\n' in the pattern space with a space. It can join a continuous run of lines which don't start with the number pattern.
Code:
echo \ |
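The command above is cut off, so here is a minimal sketch of the technique as described in the explanation. It is not necessarily the poster's exact command: the function name join_continuations is made up, the number pattern is borrowed from the original question, and GNU sed is assumed (for -E and for \n inside a bracket expression).

```shell
# Hypothetical helper; reads the file on stdin. Assumes GNU sed.
join_continuations() {
  # :a      label to loop back to
  # $!N     unless on the last line, append the next line to the pattern space
  # /.../!  if the newest line does NOT start with "digits followed by a dot" ...
  # s///    ... replace the last newline in the pattern space with a space
  # $!ba    unless on the last line, branch back to :a
  sed -E \
    -e ':a' \
    -e '$!N' \
    -e '/\n[0-9]{1,5}\.[^\n]*$/!s/\n([^\n]*)$/ \1/' \
    -e '$!ba'
}

# Usage: join_continuations < file
```

Note that this variant holds the whole file in the pattern space before printing, which is fine for small inputs but may not suit very large files.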
Thanks a lot, everyone, this is some great stuff!
Quote:
1. Tells sed to expect extended regexp
2. Creates a label to return to later
3. Appends the next line of input into the pattern space
4. Pattern starts with newline
5. Pattern to match
6. ?
7. Matches till the end of line
8. Negates previous expression
9. Substitutes
10. Anything followed by a newline?
11. Replaces with ?? <-- Edit: "\1" means first saved substring
12. Returns to label
Thanks again!
Quote:
Cheers!