Using sed to search and stop at a blank line
I'm using a sed statement within a bash shell script to search through a file and stop when it reaches a blank line.
The sed statement is working, but I'm having trouble understanding how. (I found it online somewhere). Code:
"sed -n "\?$i?,\?^$\|pattern?p"
1. Variable $i is coming from a while loop.
2. I'm using the "?" as a delimiter so sed doesn't choke on special characters which may be used in the strings it's searching.
3. I understand using the "," as a range, but am having trouble understanding why the "?" after "$i" would not be commented-out with a "\" and also why the "|pattern" text is there.
4. I understand using "-n" to suppress output, then using "p" to print only what is returned from sed.
If someone could help break this down for me, I would appreciate it. Here is the original sed statement before I modified it: Code:
sed -n "/$i/,/^$\|pattern/p" |
The final example makes sense but I am at a loss how your first would work?? My first issue would be the incorrect number of quotes and why the line would start with them?
Secondly, it is my understanding that sed only allows changing the delimiter for the substitute command, s/// ... so s??? could be used. On a quick test of a file here it definitely does not work for me to have "?" as the delimiter for a range. |
Quote:
Code:
Line 1
Code:
sed '/^ *$/q' $InFile >$OutFile
Code:
Line 1
To eliminate that blank line ... Code:
sed '/^ *$/q' $InFile |sed '$d' >$OutFile |
@grail. Yes, you can change the delimiter of the address regex if you prefix the first delimiting character with a backslash, as in this case (\?regex?). It's in the man page.
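For anyone who wants to try the backslash-prefixed delimiter for themselves, here is a minimal sketch. The sample data and the value of `i` are made up for illustration (the real `$i` comes from the OP's while loop), and it assumes GNU sed.

```shell
# Hypothetical sample data; the search string contains "/" on purpose.
tmp=$(mktemp)
printf '%s\n' 'intro' 'a/b start' 'choice1' 'choice2' '' 'a/b other' > "$tmp"

i='a/b start'   # stands in for the loop variable from the question

# \?...? makes "?" the address delimiter, so the "/" inside $i is harmless.
# The range runs from the first line matching $i to the first empty line.
result=$(sed -n "\?$i?,\?^\$?p" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Only the opening delimiter of each address takes the backslash; the closing "?" is just the delimiter itself, which is why it is not escaped.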
Now let's try breaking down the command, minus the delimiters (and assuming the first quote mark is just a typo): Code:
sed -n
The second address is a complex regex. "|" is the "or" separator, enabled by prefixing it with a backslash because you're still in basic regex mode. If you used the "-r" option to enter extended regex mode, the backslash becomes unnecessary*. So the second address of the range is either "pattern" or "^$", a blank line.
All told, it prints every line from "$i" to either the first instance of "pattern" or the first blank line.
*See the appropriate section of the grep man page for more details on basic vs. extended regex.
Edit: @daniel, I really hate seeing multiple commands chained together when one can do the job. In this case replace "q" with "Q" (a GNU sed extension) and it will exit before printing the last line. Code:
sed '/^ *$/Q' $InFile Code:
i=2 http://wiki.bash-hackers.org/howto/edit-ed http://snap.nlc.dcccd.edu/learn/nlc/ed.html (also read the info page) |
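To see the two points from the breakdown above in action (the "or" in the second address, and "q" versus "Q"), here is a small sketch. The sample file is invented and GNU sed is assumed, since both `\|` in basic regex mode and the `Q` command are GNU extensions.

```shell
# Hypothetical sample file: a "pattern" line appears before the blank line.
tmp=$(mktemp)
printf '%s\n' 'string1' 'choice1' 'pattern here' 'choice2' '' 'string2' > "$tmp"

# Basic regex mode: the alternation has to be written as \|
bre=$(sed -n '/string1/,/^$\|pattern/p' "$tmp")
# Extended regex mode (-E, or -r on older GNU sed): a plain | is enough
ere=$(sed -E -n '/string1/,/^$|pattern/p' "$tmp")
printf '%s\n' "$bre"

# q prints the blank line before quitting; Q quits without printing it.
lines_q=$(sed '/^ *$/q' "$tmp" | wc -l)
lines_Q=$(sed '/^ *$/Q' "$tmp" | wc -l)
echo "q: $lines_q lines, Q: $lines_Q lines"

rm -f "$tmp"
```

In this sample the range stops at "pattern here" because that line comes before the blank one; whichever alternative matches first ends the range.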
Thanks everyone - yes, sorry, the first double-quote before sed is a typo.
Thanks to David H for breaking this down, makes more sense now. The file I'm searching through is formatted like this:
"string1"
"choice1"
"choice2"
"choice3"

"string2"
"choice1"
"choice2"
"choice3"

So what I am doing is searching for each "stringx" and grabbing it, plus its following choices, down to the blank line, because that is where the list ends and the next string begins. Then for each string + choices found in the source file, I'm writing those to a new file. The actual source file can contain hundreds of entries like above. |
Quote:
Daniel B. Martin |
Thanks David ... hadn't seen that one before ... tick something new today :)
|
Quote:
Oh, well if that's what you want, consider using the csplit utility instead (it's part of the coreutils). It splits text into multiple files based on patterns or numbers of lines. Code:
csplit -f "file-" -b "%03d.txt" -z infile.txt '/^$/' '{*}'
The only problem with the above is that the blank lines are still left in the new files. But a simple bit of post-processing with sed can remove those. Code:
for fname in file*.txt; do sed -i '/^$/d' "$fname"; done |
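Here is the csplit approach above run end to end on a tiny made-up input, as a sketch only. It assumes GNU csplit and GNU sed (for `-i`); the temporary directory and the added `-s` flag (which just silences the byte counts csplit normally prints) are my additions.

```shell
# Hypothetical two-entry input in the format the OP described.
work=$(mktemp -d)
printf '%s\n' '"string1"' '"choice1"' '' '"string2"' '"choice1"' '"choice2"' > "$work/infile.txt"

# Split before every empty line; -z drops any empty output files.
csplit -s -f "$work/file-" -b "%03d.txt" -z "$work/infile.txt" '/^$/' '{*}'

# Each piece after the first starts with the blank line, so strip those.
for fname in "$work"/file-*.txt; do sed -i '/^$/d' "$fname"; done

first=$(cat "$work/file-000.txt")
second=$(cat "$work/file-001.txt")
printf '%s\n---\n%s\n' "$first" "$second"

rm -rf "$work"
```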
Quote:
Code:
string1 Code:
#!/bin/bash Daniel B. Martin Mar13 Daniel B. Martin |
Yeah not sure why this would have to be so difficult:
Code:
awk '{print > "file"++i}' RS="" infile |
Code:
while read line ; do |
Quote:
Code:
k=1 Daniel B. Martin |
I don't see why the output file name needs to be modified by a counter. I guess I'm missing something - I thought the idea was to "stop when it reaches a blank line". Either way, the case statement will be faster than [[ or test.
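For reference, a pure-shell "stop at the first blank line" loop using a case statement might look roughly like this. It is only a sketch of the idea, not the code from the earlier posts; the function name `first_block` and the sample data are made up.

```shell
# Hypothetical input; the third line contains only spaces.
tmp=$(mktemp)
printf '%s\n' 'string1' 'choice1' '   ' 'string2' > "$tmp"

# Print lines from stdin until the first empty or whitespace-only line.
first_block() {
  while IFS= read -r line; do
    case $line in
      *[![:space:]]*) printf '%s\n' "$line" ;;  # has a non-blank character
      *) break ;;                                # empty or whitespace only
    esac
  done
}

result=$(first_block < "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

The pattern `*[![:space:]]*` matches any line with at least one non-whitespace character, so both null lines and whitespace-only lines end the loop.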
|
@gnashley - your original idea was correct for the first post, but as of post #5 the OP has asked for each part of the file to be written to a separate file
|
Oops, I guess I've slept since then... No, wait, it seems to be raining in my hat!
|
Quote:
Daniel B. Martin |
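RS="" - Set the record separator to the empty string, which puts awk in "paragraph mode": records are separated by one or more empty lines
print > "file"++i - print the current record (i.e. everything up to the empty line) into a file called "fileN", where N is 1, 2, 3, etc |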
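A quick way to convince yourself of how paragraph mode behaves is to run it on a throwaway file. This is a sketch with invented data; the `dir` variable, the parenthesised `(++i)` and the `close()` call are my additions (the parentheses sidestep any ambiguity in how different awks parse `"file"++i` after `>`, and `close()` avoids running out of open files on large inputs).

```shell
# Hypothetical two-entry input separated by one empty line.
work=$(mktemp -d)
printf '%s\n' 'string1' 'choice1' '' 'string2' 'choice1' 'choice2' > "$work/infile"

# RS="" switches awk to paragraph mode: one record per block of lines.
awk -v dir="$work" '{ out = dir "/file" (++i); print > out; close(out) }' RS="" "$work/infile"

first=$(cat "$work/file1")
second=$(cat "$work/file2")
printf '%s\n---\n%s\n' "$first" "$second"

rm -rf "$work"
```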
Quote:
Now, a nitpick. Empty line could mean a null line, or it could mean a line containing only white space. When displayed on the screen both look alike. Your solution is short and sweet (I admire that) but it depends on empty line = null line. Daniel B. Martin |
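If whitespace-only separator lines are a possibility, one hedged workaround is to normalise them to truly empty lines before awk sees them. The sketch below only counts records to show the difference; the sample data is invented, and the behaviour shown is what I would expect from gawk or mawk, where only truly empty lines separate paragraphs.

```shell
# Hypothetical input; the separator line holds two spaces, not nothing.
tmp=$(mktemp)
printf '%s\n' 'string1' 'choice1' '  ' 'string2' 'choice1' > "$tmp"

# Paragraph mode on the raw file: the whitespace-only line does not split.
records_raw=$(awk -v RS='' 'END { print NR }' "$tmp")

# Blank out whitespace-only lines first, then use paragraph mode.
records_clean=$(sed 's/^[[:space:]]*$//' "$tmp" | awk -v RS='' 'END { print NR }')

echo "raw: $records_raw record(s), cleaned: $records_clean record(s)"

rm -f "$tmp"
```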
Quote:
|
This is an interesting problem and, as a learning experience, I improved on previous solutions.
Instead of sequence numbers I used the first line in each "paragraph" as part of the output file names. This InFile ... Code:
able dbm686out.able Code:
able Code:
baker Code:
charlie Code:
dog Code:
# File identification Code:
# File identification Daniel B. Martin |
This is an interesting problem and, as a learning experience, I improved on previous solutions.
Instead of sequence numbers I used the first line in each "paragraph" as part of the output file names. This InFile ... Code:
able dbm690out.able Code:
choice1-1 Code:
choice2-1 Code:
choice3-1 Code:
choice4-1 Code:
# File identification Code:
# File identification Daniel B. Martin |
Might want to check the output files that are using the second awk solution. I think you will find that your data is not line for line, but now on a single line.
Example: Instead of dbm690out.able being: Code:
choice1-1 Code:
choice1-1 choice1-2 choice1-3 |
Quote:
I edited post #21 to show corrected code. It works but is unlovely. Is there a cleaner way? Daniel B. Martin |
How about:
Code:
awk -vo=$o '{t=$1;$1="";sub(/^\n/,"");print > o "." t}' RS="" OFS="\n" file Code:
ruby -ane 'BEGIN{$/=""};IO.write("name."+ $F[0],$F[1..-1]*"\n")' file |
Now, let's make the problem more challenging by permitting multi-word "choice" lines.
With this InFile ... Code:
able Code:
# File identification Code:
/home/daniel/Desktop/LQfiles/dbm690out.able ... Daniel B. Martin |
My hint will be, have a look at the input field separator (FS)
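One way to read that hint, as a sketch only: with FS set to a newline, each line of a paragraph becomes one field, so multi-word lines stay intact. This uses a plain loop rather than the `$1=""` trick from the earlier one-liner; the names `prefix` and `out` and the sample data are made up.

```shell
# Hypothetical input with multi-word "choice" lines.
work=$(mktemp -d)
printf '%s\n' 'able' 'choice one' 'choice two' '' 'baker' 'choice three' > "$work/infile"

# RS="" gives one record per paragraph; FS="\n" gives one field per line.
awk -v prefix="$work/out" 'BEGIN { RS = ""; FS = "\n" }
{
  out = prefix "." $1                      # first line names the output file
  for (n = 2; n <= NF; n++) print $n > out # remaining lines, one per row
  close(out)
}' "$work/infile"

able=$(cat "$work/out.able")
baker=$(cat "$work/out.baker")
printf '%s\n---\n%s\n' "$able" "$baker"

rm -rf "$work"
```

Note that a paragraph consisting of only its first line would produce no output file in this sketch.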
|