LinuxQuestions.org
Old 03-14-2013, 11:51 AM   #1
Tech109
LQ Newbie
 
Registered: Jan 2011
Posts: 12

Rep: Reputation: 0
Using sed to search and stop at a blank line


I'm using a sed statement within a bash shell script to search through a file and stop when it reaches a blank line.

The sed statement is working, but I'm having trouble understanding how. (I found it online somewhere).

Code:
"sed -n "\?$i?,\?^$\|pattern?p"
A few things to note:
1. Variable $i is coming from a while loop.
2. I'm using the "?" as a delimiter so sed doesn't choke on special characters which may be used in the strings it's searching.
3. I understand using the "," as a range, but am having trouble understanding why the "?" after "$i" does not need to be escaped with a "\", and also why the "\|pattern" text is there.
4. I understand using "-n" to suppress output, then using "p" to print only what is returned from sed.

If someone could help break this down for me, I would appreciate it.

Here is the original sed statement before I modified it:

Code:
sed -n "/$i/,/^$\|pattern/p"
 
Old 03-14-2013, 12:16 PM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
The final example makes sense, but I am at a loss as to how your first would work. My first issue is the unbalanced quotes: why does the line start with one?

Secondly, it is my understanding that sed only allows changing the delimiter for the substitute command, s/// ... so s??? could be used. On a quick test of a file here, it definitely does not work for me to have "?" as the delimiter for a range.
 
Old 03-14-2013, 12:41 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Tech109
I'm using a sed statement within a bash shell script to search through a file and stop when it reaches a blank line.
With this InFile ...
Code:
Line 1
Line 2
Line 3
  
Line 4
Line 5

Line 6
Line 7
... this code ...
Code:
sed '/^ *$/q' "$InFile" > "$OutFile"
... produced this OutFile ...
Code:
Line 1
Line 2
Line 3
Note this (possibly acceptable) defect: the output contains all lines up to and including the first blank line.

To eliminate that blank line ...
Code:
sed '/^ *$/q' "$InFile" | sed '$d' > "$OutFile"
Daniel B. Martin

Last edited by danielbmartin; 03-14-2013 at 01:36 PM. Reason: Improved code
 
Old 03-15-2013, 09:32 AM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
@grail. Yes, you can change the delimiter of the address regex if you prefix the first delimiting character with a backslash, as in this case (\?regex?). It's in the man page.

Now let's try breaking down the command, minus the delimiters (and assuming the first quote mark is just a typo):

Code:
sed -n
$i            #address 1
,
^$\|pattern   #address 2
p             #command
The first address is your "$i" variable, naturally.

The second address is a complex regex. "|" is the "or" separator, enabled by prefixing it with a backslash because you're still in basic regex mode. If you used the "-r" option to enter extended regex mode, the backslash becomes unnecessary*.

So address 2 matches either "pattern" or "^$", an empty line.

All told, it prints each block that starts at a line matching "$i" and ends at the next line matching "pattern" or the next empty line, whichever comes first (both end points included).

*See the appropriate section of the grep man page for more details on basic vs. extended regex.
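
As a rough illustration of the breakdown above, here is a minimal sketch run against a made-up sample file. The file contents and the value assigned to i are assumptions for the demo, not the original poster's data. Note that "\|" in a basic regex is a GNU sed extension, so other sed implementations may behave differently, and that the value of "$i" is still interpreted as a regex even with "?" as the delimiter.

```shell
# Sample data in the layout described in this thread (contents are made up).
tmp=$(mktemp)
printf '%s\n' '"string1"' '"choice1"' '"choice2"' '' '"string2"' '"choice1"' > "$tmp"

i='"string1"'   # stands in for the loop variable from the original post

# \?...? uses "?" as the address delimiter; \| is alternation in GNU basic regex.
# The range runs from the line matching $i through the next empty line
# (or the next line containing "pattern"), inclusive.
range_out=$(sed -n "\?$i?,\?^$\|pattern?p" "$tmp")
printf '%s\n' "$range_out"

rm -f "$tmp"
```

The command substitution drops the trailing empty line, so the printed result shows only "string1" and its two choices.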

Edit: @daniel, I really hate seeing multiple commands chained together when one can do the job. In this case replace "q" with "Q" and it will exit before printing the last line.
Code:
sed '/^ *$/Q' "$InFile"
Unfortunately, this won't work if you need to start printing from any line other than the first though. Speaking of which, ed would do the job easily, thanks to its ability to designate relative line positions.

Code:
i=2
# ";" sets the current line to $i before ed searches for the blank line
printf '%s\n' "$i;/^$/-1p" | ed -s infile.txt
How to use ed:
http://wiki.bash-hackers.org/howto/edit-ed
http://snap.nlc.dcccd.edu/learn/nlc/ed.html
(also read the info page)
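
For comparison, a single sed call can also start at an arbitrary line and leave out the closing blank line by printing only the non-empty lines inside the range. This is a minimal sketch; the start pattern and the sample data are assumptions made up for the demo.

```shell
# Made-up sample file with blank-line-separated blocks.
tmp=$(mktemp)
printf '%s\n' 'string1' 'choice1-1' '' 'string2' 'choice2-1' 'choice2-2' '' 'string3' > "$tmp"

start='string2'   # hypothetical start pattern

# Inside the range (start line through the next empty line),
# /./p prints only lines that contain at least one character.
block=$(sed -n "/$start/,/^\$/{/./p;}" "$tmp")
printf '%s\n' "$block"

rm -f "$tmp"
```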

Last edited by David the H.; 03-15-2013 at 09:54 AM. Reason: as stated
 
2 members found this post helpful.
Old 03-15-2013, 09:59 AM   #5
Tech109
LQ Newbie
 
Registered: Jan 2011
Posts: 12

Original Poster
Rep: Reputation: 0
Thanks everyone - yes, sorry, the first double-quote before sed is a typo.

Thanks to David H for breaking this down, makes more sense now.

The file I'm searching through is formatted like this:

"string1"
"choice1"
"choice2"
"choice3"

"string2"
"choice1"
"choice2"
"choice3"

So what I am doing is searching for each "stringx" and grabbing it, plus its following choices, down to the blank line, because that is where the list ends and the next string begins. Then for each string + choices found in the source file, I'm writing those to a new file. The actual source file can contain hundreds of entries like above.

Last edited by Tech109; 03-15-2013 at 10:01 AM.
 
Old 03-15-2013, 10:04 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by David the H.
Edit: @daniel, I really hate seeing multiple commands chained together when one can do the job. In this case replace "q" with "Q" and it will exit before printing the last line.
Code:
sed '/^ *$/Q' "$InFile"
Perfect! Technical intuition suggested this could be done but I couldn't find the Q. Thank you!
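
One caveat worth noting: "Q" is a GNU sed extension. The sketch below, run on a made-up four-line file, contrasts "q" with a variant that should behave like "Q" using only standard commands (suppress automatic printing with -n, quit on the blank line, otherwise print). The sample data is an assumption for the demo.

```shell
# Made-up sample: the third line holds only spaces.
tmp=$(mktemp)
printf '%s\n' 'Line 1' 'Line 2' '  ' 'Line 3' > "$tmp"

# q prints the matching blank line before quitting.
with_q=$(sed '/^ *$/q' "$tmp")

# With -n, q quits without printing; lines before the match are printed by p.
portable=$(sed -n -e '/^ *$/q' -e 'p' "$tmp")

printf '[%s]\n' "$with_q" "$portable"
rm -f "$tmp"
```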

Daniel B. Martin

Last edited by danielbmartin; 03-15-2013 at 11:17 AM. Reason: Correct wording
 
Old 03-15-2013, 10:22 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Thanks David ... hadn't seen that one before ... tick something new today
 
Old 03-15-2013, 11:18 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by Tech109
So what I am doing is searching for each "stringx" and grabbing it, plus its following choices, down to the blank line, because that is where the list ends and the next string begins. Then for each string + choices found in the source file, I'm writing those to a new file. The actual source file can contain hundreds of entries like above.

Oh, well if that's what you want, consider using the csplit utility instead (it's part of the coreutils). It splits text into multiple files based on patterns or numbers of lines.

Code:
csplit -f "file-" -b "%03d.txt" -z infile.txt '/^$/' '{*}'
Read the info page for the full details on how it's used.

The only problem with the above is that the blank lines are still left in the new files. But a simple bit of post-processing with sed can remove those.

Code:
for fname in file*.txt; do sed -i '/^$/d' "$fname"; done
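
As a possible shortcut, newer GNU coreutils releases of csplit accept a --suppress-matched option that leaves the matched separator lines out of the output, which may make the sed clean-up pass unnecessary. The sketch below assumes such a version is installed; the working directory and sample data are made up for the demo.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
workdir=$(mktemp -d)
printf '%s\n' 'string1' 'choice1-1' '' 'string2' 'choice2-1' > "$workdir/infile.txt"

(
  cd "$workdir" || exit 1
  # -s silences the byte counts; --suppress-matched drops the blank separator lines.
  csplit -s --suppress-matched -z -f "file-" -b "%03d.txt" infile.txt '/^$/' '{*}'
)

first=$(cat "$workdir/file-000.txt")
second=$(cat "$workdir/file-001.txt")
printf '%s\n---\n%s\n' "$first" "$second"

rm -rf "$workdir"
```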
 
Old 03-15-2013, 11:23 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Tech109
So what I am doing is searching for each "stringx" and grabbing it, plus its following choices, down to the blank line, because that is where the list ends and the next string begins. Then for each string + choices found in the source file, I'm writing those to a new file. The actual source file can contain hundreds of entries like above.
With this InFile ...
Code:
string1
choice1-1
choice1-2
choice1-3

string2
choice2-1
choice2-2
choice2-3
choice2-4
choice2-5

string3
choice3-1
choice3-2

string4
choice4-1
choice4-2
choice4-3
... this code ...
Code:
#!/bin/bash
#   Daniel B. Martin   Mar13
#
#   To execute this program, launch a terminal session and enter:
#   bash /home/daniel/Desktop/LQfiles/dbm684.bin
#
#  This program inspired by:
#  http://www.linuxquestions.org/questions/programming-9/
#    using-sed-to-search-and-stop-at-a-blank-line-4175454081/

# File identification
   Path=$(cut -d'.' -f1 <<< "$0")
 InFile="${Path}inp.txt"

echo
echo "Method of LQ member danielbmartin #1"
# Blow away any leftover output files.
rm -f "${Path}out"*.txt
k=1
# read without IFS= trims surrounding whitespace, so whitespace-only lines count as blank.
while read -r line
  do
    if [[ -z "$line" ]]; then k=$((k+1))
                         else printf '%s\n' "$line" >> "${Path}out${k}.txt"
    fi
  done < "$InFile"


echo; echo "Normal end of job."; echo
exit
... produced four subset output files, as specified.

Daniel B. Martin
 
Old 03-15-2013, 12:24 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Yeah not sure why this would have to be so difficult:
Code:
awk -v RS= '{f = "file" ++i; print > f; close(f)}' infile
Based on Daniel's input file this yields 4 output files with the required data stored in them.
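
The one-liner relies on awk's paragraph mode: with RS set to the empty string, each run of lines separated by blank lines becomes one record. A minimal sketch of that behavior, using made-up data and a newline field separator so that each line of a block is one field:

```shell
# Made-up sample with two blank-line-separated blocks.
tmp=$(mktemp)
printf '%s\n' 'string1' 'choice1-1' 'choice1-2' '' 'string2' 'choice2-1' > "$tmp"

# RS= selects paragraph mode; -F'\n' makes $1 the heading line of each block
# and NF the number of lines in it.
summary=$(awk -v RS= -F'\n' '{ print $1 ": " (NF - 1) " choice line(s)" }' "$tmp")
printf '%s\n' "$summary"

rm -f "$tmp"
```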
 
1 members found this post helpful.
Old 03-15-2013, 12:28 PM   #11
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Code:
while read -r line ; do
    case $line in
       '') exit ;;
        *) printf '%s\n' "$line" >> wherever ;;
    esac
  done < "$InFile"
A case statement is probably faster than using the '[[' or 'test' builtins, and also faster than a pipe through sed (twice!) for small files. sed gives me headaches...
 
Old 03-15-2013, 02:21 PM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by gnashley
Code:
while read -r line ; do
    case $line in
       '') exit ;;
        *) printf '%s\n' "$line" >> wherever ;;
    esac
  done < "$InFile"
Did you test this? I don't see any code which modifies the "wherever". Without that, all output goes into the same file. I modified your code thusly ...
Code:
k=1
while read -r line ; do
    case $line in
       '') k=$((k+1)) ;;
        *) printf '%s\n' "$line" >> "${Path}out${k}.txt" ;;
    esac
  done < "$InFile"
... and it works.

Daniel B. Martin
 
Old 03-16-2013, 03:01 AM   #13
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
I don't see why the output file name needs to be modified by a counter. I guess I'm missing something. I thought the idea was to "stop when it reaches a blank line". Either way, the case statement will be faster than [[ or test.
 
Old 03-16-2013, 03:28 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
@gnashley - your original idea was correct for the first post, but as of post #5 the OP has asked for each part of the file to be written to a separate file.
 
Old 03-16-2013, 04:51 AM   #15
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Oops, I guess I've slept since then... No, wait, it seems to be raining in my hat!
 
  



Tags
bash scripting, sed, shell script, shell scripting


