Joining lines ending with lowercase and starting with lowercase or uppercase
Hi,
First, I would like to join lines ending with lowercase and starting with lowercase, but with a newline between them, like this:
Second, I would like to join lines ending with lowercase and starting with uppercase, but with a newline between them, like this (when there is no period):
I'm not a pro with sed, but I use it sometimes (for easy stuff). I tried something like this (found on the net) for cases where there is no newline between them:
Code:
sed -r ':a;N;$!ba;s/\n([^A-Z])/ \1/g' file.txt
Third, how can I delete each line in capitals that starts or ends with a number?
Thanks for any help. |
Hi,
So actually, it makes no difference whether the following line starts with lower case or upper case? |
Or you want
Quote:
Quote:
|
No, it makes no difference whether the following line starts with lower case or upper case.
And I really want to join these lines, like I said in the first post. Thank you. |
It's not a good idea to go using commands you don't understand. Here is the online documentation. Time spent reading it will be worthwhile. Have a look at the "s" command and "Other commands" sections to work out what the command you cited in the original post does. Then you can modify it to suit what you want to do now.
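As a starting point for that reading, here is the one-liner from the original post spread over several lines with a comment on each command. This is only a sketch assuming GNU sed (where -E is equivalent to -r); the function name join_lines and the sample text are made up for illustration.

```shell
# Annotated form of: sed -r ':a;N;$!ba;s/\n([^A-Z])/ \1/g'
# join_lines is a hypothetical wrapper name; GNU sed is assumed.
join_lines() {
  sed -E '
    # define a label called "a"
    :a
    # append the next input line to the pattern space, separated by \n
    N
    # if this is not the last input line, branch back to label a
    $!ba
    # the whole file is now in the pattern space: every newline that is
    # followed by a character other than A-Z is replaced by a space
    s/\n([^A-Z])/ \1/g
  ' "$@"
}

# Made-up sample: the second line starts lowercase, the third uppercase.
printf 'the quick\nbrown fox\nJumps over\n' | join_lines
```

With this sample, only the first newline is replaced, because the third line starts with an uppercase letter. Note that the loop reads the entire file into memory before the substitution runs.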
|
OK for lower case
Well, I succeeded for lower case:
I had CR characters in my original files; I converted them to LF and now it works for lower case:
Code:
sed -r ':a;N;$!ba;s/\n\n([^A-Z])/ \1/g' file.txt
1) How do I join lines ending in lower case with lines starting with upper case, where there is no period between them?
2) How can I delete each line in capital letters that starts or ends with a number? Like this:
Quote:
Quote:
|
You have a few questions you need to ask yourself as some of your data is now overlapping.
Code:
this is
I agree with syg00 that you should first go and read up on what the sed line you have does, and also work out whether you need multiple seds piped together or a single script. |
Try using tr.
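A minimal sketch of what that could look like, assuming the goal is to turn Windows CR+LF line endings into plain LF. The helper name strip_cr and the sample text are made up for illustration.

```shell
# strip_cr is a hypothetical helper name: delete every carriage return
# read from standard input and leave the line feeds untouched.
strip_cr() {
  tr -d '\r'
}

# Made-up sample with CR+LF endings; od -c shows the remaining bytes.
printf 'first line\r\nsecond line\r\n' | strip_cr | od -c
```

Deleting the CR with -d is deliberate: translating it instead, as in tr '\r' '\n', would turn each CR into an extra LF and leave the file double spaced.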
|
It appears that you are trying to parse a double spaced text file prepared in Windows.
Your needs are still unclear to me, but perhaps 'sed' is the wrong tool for the task. Given an input file Quote:
Quote:
1. Skip blank lines
2. Skip a line starting with a number
3. Skip a line ending with a number
4. If a line does not end with a period, concatenate with any previous input
5. If a line ends with a period, concatenate with any previous input and print.
This can be done with an awk script containing appropriate /regular expression/ patterns and actions. |
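The five rules above could be sketched in awk roughly as follows. This is not a tested solution for the actual files: it assumes the input already has LF-only line endings, the function name join_sentences and the sample text are made up, and the END rule (printing leftover text that never reached a period) is an addition that is not among the five rules.

```shell
# join_sentences is a hypothetical name; input is assumed to be LF-only.
join_sentences() {
  awk '
    /^[ \t]*$/ { next }                    # 1. skip blank lines
    /^[0-9]/   { next }                    # 2. skip a line starting with a number
    /[0-9]$/   { next }                    # 3. skip a line ending with a number
    { buf = (buf == "" ? $0 : buf " " $0) }  # 4. and 5. concatenate with previous input
    /\.$/      { print buf; buf = "" }     # 5. line ends with a period: print and reset
    END        { if (buf != "") print buf }  # assumption: flush any unfinished text
  ' "$@"
}

# Made-up sample input, double spaced, with one line starting with a number.
printf 'This is a\n\nsentence split\n12 CHAPTER\nover lines.\nNext one.\n' | join_sentences
```

Joining with a single space is also a choice made here; the rules only say "concatenate".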
If it is a Windows text file, the problem is simply that it contains both line feeds and unwanted carriage returns. In that case, loading into an editor may give you the opportunity to ignore the CRs.
|
Quote:
You're right, but I spend so much time learning something new each time, even for something I will use only once in my life, that I do not have enough time to live normally :s (no kidding, it is obsessive). So I refrain (with great difficulty) from learning things that will not benefit me for long. But perhaps I should learn a little more sed. In the end, that is what I did (not enough, I have to admit), and I have almost finished everything I need to do.
1. I cleaned the files of mismatched newlines (CR+LF to LF)
Code:
for i in *.htm ; do tr -d '\r' < "$i" > "tr-$i"; done;
2. Then joined lines (as needed)
Code:
for i in *.htm ; do sed -r ':a;N;$!ba;s/\n([^A-Z])/ \1/g' "$i" ; done;
Code:
for i in *.htm ; do sed -i -r 's/.*[A-Z0-9(]+*[a-z0-9].*[a-z.:? ]$/\<p\>&/' "$i" ; done;
Code:
sed -r 's/^.*[A-Z0-9].*[A-Z0-9]$/\<h2\>&/'
In fact, I want to make an ebook from these files. Now I only need to find a way to clean up a few things, but it should be OK. Finally, there is a long manual job of cleaning up some OCR mistakes. Thank you. |
Quote:
never use it again always makes me laugh. I would add that if you had told us a little more about what you had and what you needed, there may have been people on this site who could have made better suggestions, rather than solving an issue that is not really related. I am not trying to pick on you specifically, but this is a good question to point out that if you do not supply the correct information, you will not get very useful answers. Glad you found a solution :) You might also wish to look up the dos2unix command for future use when altering Windows-based text files. I would finish by saying that awk (or a higher-level language) could easily have performed all your tasks in a single script. |