[SOLVED] Using sed to search and stop at a blank line
print > "file"++i - print the current record (i.e. everything up to the empty line) into a file called "fileN", where N is 1, 2, 3, etc.
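For anyone following along, here is a minimal sketch of that idiom run end to end. The sample data, the scratch directory and the variable name f are my own assumptions, not from the thread. I put parentheses around the file name expression because an unparenthesized concatenation after > is not handled the same way by every awk.

```shell
#!/bin/bash
# Sketch: split a file into file1, file2, ... at each empty line.
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir" || exit 1

# Two "paragraphs" separated by one empty line (made-up sample data).
printf 'able\nchoice1-1\n\nbaker\nchoice2-1\nchoice2-2\n' > inp.txt

# RS="" selects paragraph mode: one record per block of non-empty lines.
awk 'BEGIN { RS = "" } { f = "file" (++i); print > f; close(f) }' inp.txt

cat file2
```

Closing each file after writing it keeps the number of open files small when the input has many paragraphs.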
Thank you for this explanation. I now understand a distinction between record and line.
Now, a nitpick. Empty line could mean a null line, or it could mean a line containing only white space. When displayed on the screen both look alike. Your solution is short and sweet (I admire that) but it depends on empty line = null line.
And I am sure by now you could easily convert this to allow for whitespace
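One possible way to allow for whitespace, offered as a sketch rather than the definitive answer: drop paragraph mode and test NF instead, since NF is 0 both for a null line and for a line holding only blanks or tabs. The sample data, the scratch directory and the variable names f and i are assumptions of mine.

```shell
#!/bin/bash
# Sketch: treat null lines AND whitespace-only lines as paragraph separators.
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir" || exit 1

# The separator line here contains a space, a tab and a space.
printf 'able\nchoice1-1\n \t \nbaker\nchoice2-1\n' > inp.txt

awk '
NF == 0 { if (f) close(f); f = ""; next }   # separator line: end current paragraph
f == "" { f = "file" (++i) }                # first line of a new paragraph: pick next name
        { print > f }                       # write the line to the current file
' inp.txt

cat file1
```

This reads line by line, so it does not depend on RS at all.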
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"
# In this version each output file includes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
rm -f "$o".*    # Blow away any leftover output files.
ofid=""         # Initialize ofid, Output File IDentifier.
# read trims surrounding whitespace, so whitespace-only lines count as blank.
while read -r line
do
  if [[ -z "$ofid" ]]
    then ofid=$line
  fi
  if [[ -z "$line" ]]
    then ofid=""
    else printf '%s\n' "$line" >> "$o.$ofid"
  fi
done < "$InFile"
... and this code (based on grail's superb awk one-liner) is more concise...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"
# In this version each output file includes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
awk -v o="$o" '{print > (o "." $1)}' RS="" "$InFile"
Suggestions and corrections are gratefully accepted.
This is an interesting problem and, as a learning experience, I improved on previous solutions.
Instead of sequence numbers I used the first line in each "paragraph" as part of the output file names.
This InFile ...
Code:
able
choice1-1
choice1-2
choice1-3

baker
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5

charlie
choice3-1
choice3-2

dog
choice4-1
choice4-2
choice4-3
... produces these four OutFiles ...
dbm690out.able
Code:
choice1-1
choice1-2
choice1-3
dbm690out.baker
Code:
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5
dbm690out.charlie
Code:
choice3-1
choice3-2
dbm690out.dog
Code:
choice4-1
choice4-2
choice4-3
This code (using bash) does the job...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"
# In this version each output file excludes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
rm -f "$o".*    # Blow away any leftover output files.
ofid=""         # Initialize ofid, Output File IDentifier.
# read trims surrounding whitespace, so whitespace-only lines count as blank.
while read -r line
do
  if [[ -z "$line" ]]
    then ofid=""        # Blank line: the paragraph has ended.
  elif [[ -z "$ofid" ]]
    then ofid=$line     # First line of a paragraph names the output file.
    else printf '%s\n' "$line" >> "$o.$ofid"
  fi
done < "$InFile"
... and this code (based on grail's superb awk one-liner) is more concise...
Code:
# File identification
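Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"
# In this version each output file excludes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
# Assigning to $1 rebuilds the record with single spaces between fields;
# gsub turns those spaces back into newlines (correct for one-word lines only).
awk -v o="$o" '{t=$1; $1=""; sub(/^ /,""); gsub(" ","\n")} {print > (o "." t)}' RS="" "$InFile"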
Suggestions and corrections are gratefully accepted.
Daniel B. Martin
Last edited by danielbmartin; 03-18-2013 at 10:02 AM.
Reason: Fix bug identified by grail (post #22)
Might want to check the output files that are using the second awk solution. I think you will find that your data is not line for line, but now on a single line.
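The symptom grail describes can be reproduced in isolation. This is only a sketch: the pre-edit code is not shown in the thread, so I am assuming it blanked $1 without converting the separators back, and the sample record is made up. Assigning to any field makes awk rebuild $0 with OFS (a single space by default) between fields, and in paragraph mode the newlines were field separators, so they are lost.

```shell
#!/bin/bash
# Sketch: blanking $1 in paragraph mode collapses the record onto one line.
one_line=$(printf 'able\nchoice1-1\nchoice1-2\n' |
    awk 'BEGIN { RS = "" } { $1 = ""; print }')

# Brackets make the leading space visible.
printf '[%s]\n' "$one_line"     # prints "[ choice1-1 choice1-2]"
```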
Recognition of a bug is the first step toward fixing the bug. The man who points out a flaw in my code is helping me. Thank you, grail.
I edited post #21 to show corrected code. It works but is unlovely. Is there a cleaner way?
Now, let's make the problem more challenging by permitting multi-word "choice" lines.
With this InFile ...
Code:
able
how now
brown cow

baker
now is the time
for all good men
to come to the aid
of their party

charlie
the quick brown fox
jumps over
the lazy programmer

dog
words to live by:
let sleeping dogs lie
...this bash code ...
Code:
# File identification
Path=$(readlink -f "$0" | cut -d'.' -f1)
InFile="${Path}inp.txt"
# In this version each output file excludes the first line of each paragraph.
o="${Path}out"  # o = output file name prefix
rm -f "$o".*    # Blow away any leftover output files.
ofid=""         # Initialize ofid, Output File IDentifier.
# read trims surrounding whitespace, so whitespace-only lines count as blank.
while read -r line
do
  if [[ -z "$line" ]]
    then ofid=""        # Blank line: the paragraph has ended.
  elif [[ -z "$ofid" ]]
    then ofid=$line     # First line of a paragraph names the output file.
    else printf '%s\n' "$line" >> "$o.$ofid"
  fi
done < "$InFile"
# For debugging...
for file in "$o"*; do echo; echo "$file ..."; cat "$file"; done
... produces this result ...
Code:
/home/daniel/Desktop/LQfiles/dbm690out.able ...
how now
brown cow

/home/daniel/Desktop/LQfiles/dbm690out.baker ...
now is the time
for all good men
to come to the aid
of their party

/home/daniel/Desktop/LQfiles/dbm690out.charlie ...
the quick brown fox
jumps over
the lazy programmer

/home/daniel/Desktop/LQfiles/dbm690out.dog ...
words to live by:
let sleeping dogs lie
... but I'm unable to code an equivalent in awk. Anyone care to take a shot at it?
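Here is one shot at it, as a sketch under assumptions rather than a definitive answer. The idea is to keep paragraph mode (RS="") but also set FS to a newline, so each field is a whole line and embedded spaces survive. The scratch directory, the shortened sample data and the output prefix "out" are my own stand-ins for the real paths in the thread.

```shell
#!/bin/bash
# Sketch: name each output file after the first line of its paragraph and
# write the remaining lines, multi-word or not, to that file.
workdir=$(mktemp -d)            # hypothetical scratch directory
cd "$workdir" || exit 1

printf 'able\nhow now\nbrown cow\n\nbaker\nnow is the time\nfor all good men\n' > inp.txt
o=out                           # hypothetical output file name prefix

awk -v o="$o" 'BEGIN { RS = ""; FS = "\n" }
{
    f = o "." $1                          # first line of the paragraph names the file
    for (i = 2; i <= NF; i++)             # every remaining line is one field
        print $i > f
    close(f)
}' inp.txt

cat out.baker
```

Because no field is ever assigned, $0 is never rebuilt and the lines come out exactly as they went in. A header line containing spaces or slashes would end up in the file name, so this assumes one-word headers like those in the sample.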