LinuxQuestions.org
12-06-2018, 01:09 AM   #1
bsmile
Member
How to use a regular expression to represent a block of content?


Given an input_file containing the following lines,

Code:
KCUBE_BULK
 -0.5  -0.5  -0.5   ! Original point for 3D k plane


LATTICE
Angstrom
    3.473994998    0.000000000    0.000000000
    0.000000000    6.297732338    0.000000000
    0.000000000   -0.000725376   13.828612779


SURFACE
 1  0  0
 0  1  0
 0  0  1
    
    
ATOM_POSITIONS
12                              ! number of atoms for projectors
Cartisen                          ! Direct or Cartisen coordinate
 Mo   1        1.737000968    5.636422789    0.005638021 
 Mo   2        0.000003469    3.357094654   13.629153510


PROJECTORS
 5 5 5 5 3 3 3 3 3 3 3 3         ! number of projectors

I would like to replace the lines

    3.473994998    0.000000000    0.000000000
    0.000000000    6.297732338    0.000000000
    0.000000000   -0.000725376   13.828612779
with another set of data using the sed command. I would use something roughly like the following to do the replacement:

Code:
      sed -e :a -e N -e '$!ba' -e "s/regexp1/Angstrom\n$string/g" input_file
Defining regexp1 simply as /Angstrom.*\n\n/ replaces everything from Angstrom up to the last pair of newlines in the file, instead of stopping at the two newlines immediately after the data block. Also, * does not seem to work as expected inside the double quotes.
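A minimal sketch of the greediness problem, assuming GNU sed (where `.` also matches the newlines that N puts into the pattern space, and `\n` works in the replacement and inside `[^\n]`). POSIX regular expressions have no non-greedy `.*?`, so `.*` runs on to the last blank-line pair. Making each repeated piece "a newline plus at least one non-newline character" keeps the match from crossing an empty line. The sample text is made up.

```sh
#!/bin/sh
# Made-up sample: two blocks, each followed by an empty line.
sample() { printf 'Angstrom\n1 2 3\n\nSURFACE\n1 0 0\n\nEND\n'; }

# :a / N / $!ba slurps the whole input into one pattern space (GNU sed),
# as in the original post, so s/// can match across line boundaries.

# Greedy: .* also matches newlines, so it runs to the LAST "\n\n".
greedy=$(sample | sed -e ':a' -e 'N' -e '$!ba' -e 's/Angstrom.*\n\n/NEW\n/')

# Each repetition is "\n" plus one or more non-newline characters, so the
# match cannot pass an empty line and stops at the FIRST "\n\n".
first=$(sample | sed -e ':a' -e 'N' -e '$!ba' \
                     -e 's/Angstrom\(\n[^\n]\{1,\}\)*\n\n/NEW\n/')

printf '%s\n---\n%s\n' "$greedy" "$first"
```

The first result swallows the SURFACE block as well; the second replaces only the Angstrom block.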

Thanks for any input. I am a complete newbie with regular expressions and don't even know what to google for this issue.

Last edited by bsmile; 12-06-2018 at 11:29 AM.
 
12-06-2018, 02:09 AM   #2
astrogeek
Moderator
Please place your code snippets inside [CODE]...[/CODE] tags for better readability. You may type those yourself or click the "#" button in the edit controls.

It is not at all clear to me what you intended with the -e's in your example, but the basic method of replacing multiple lines, i.e. a block of text, with sed is to use a line address range, in your case from a line beginning with Angstrom to the next empty line, like this...

Code:
/^Angstrom/,/^$/
The caret, ^, means start of line and $ means end of line. The start address is the word Angstrom anchored at the start of the line, and ^$ means nothing between start and end, i.e. an empty line.
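To check which lines such a range selects before changing anything, you can print only the range with -n and p. A minimal sketch, using a shortened copy of the input file (the sample function is just for illustration):

```sh
#!/bin/sh
# Shortened copy of the input file from the first post.
sample() {
  printf '%s\n' 'LATTICE' 'Angstrom' \
    '    3.473994998    0.000000000    0.000000000' \
    '    0.000000000    6.297732338    0.000000000' \
    '    0.000000000   -0.000725376   13.828612779' \
    '' '' 'SURFACE'
}

# -n suppresses automatic printing; p prints only the lines in the range,
# from the line starting with Angstrom through the next empty line.
range=$(sample | sed -n '/^Angstrom/,/^$/p')
printf '%s\n' "$range"
```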

Then you can apply several commands to that range by grouping them in curly braces, or, if you want to replace everything in the range, use the change command, 'c'.
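A minimal sketch of the curly-brace form, assuming GNU sed: it keeps the Angstrom line and the closing empty line and swaps only the data lines for the contents of a file. The file name new_lattice and its numbers are made up; r queues a file to be printed at the end of the current cycle, and d deletes a line.

```sh
#!/bin/sh
# Hypothetical files in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'LATTICE' 'Angstrom' '  1.0 0.0 0.0' '  0.0 1.0 0.0' '  0.0 0.0 1.0' \
  '' 'SURFACE' > "$tmp/input_file"
printf '%s\n' '  2.0 0.0 0.0' '  0.0 2.0 0.0' '  0.0 0.0 2.0' > "$tmp/new_lattice"

# Inside the range from Angstrom to the first empty line:
#   - on the Angstrom line, queue new_lattice to be printed right after it (r);
#   - delete every other line of the range except the closing empty line.
result=$(sed -e '/^Angstrom/,/^$/{' \
             -e "/^Angstrom/r $tmp/new_lattice" \
             -e '/^Angstrom/!{/^$/!d;}' \
             -e '}' "$tmp/input_file")
printf '%s\n' "$result"
```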

Putting this into a script for reuse (call it replace.sed for now)...

Code:
/^Angstrom/,/^$/c\
Angstrom\
12345\
67890\
ABCD\
Then used with your example input file we get...

Code:
sed -f replace.sed infile.txt
KCUBE_BULK
 -0.5  -0.5  -0.5   ! Original point for 3D k plane


LATTICE
Angstrom
12345
67890
ABCD


SURFACE
 1  0  0
 0  1  0
 0  0  1
    
    
ATOM_POSITIONS
12                              ! number of atoms for projectors
Cartisen                          ! Direct or Cartisen coordinate
 Mo   1        1.737000968    5.636422789    0.005638021 
 Mo   2        0.000003469    3.357094654   13.629153510


PROJECTORS
 5 5 5 5 3 3 3 3 3 3 3 3         ! number of projectors
Is that roughly what you are trying to do?

sed accepts addresses in several forms, and the sed man page covers them all; simply look for "address ranges".

Last edited by astrogeek; 12-06-2018 at 02:44 AM. Reason: typos, clarity
 
12-06-2018, 04:39 PM   #3
bsmile
Member
Original Poster
Thanks, this is very helpful! At least now I can select the part I want to replace. A few follow-up questions:

(1) The following two commands give identical results, as expected. Is it better practice to describe the whole line rather than just a characteristic keyword of that line?

Code:
sed "/^Angstrom/,/^$/c\cccc" input_file
sed "/Angstrom/,/^$/c\cccc" input_file

(2) Actually, I need to replace the three old data lines following Angstrom with three new data lines. I found an answer by googling, but I don't understand what is inside the {}. Could you please explain it a bit?

Code:
cccc=`head -n 5 data_file | tail -n 3`
echo $cccc
sed "/^Angstrom/,/^$/c\\${cccc//$'\n'/\\n}" input_file

(4) The point of the -e options in my original post is to read the whole input_file into sed as a single pattern space and then have the s command do the replacement. The task then reduces to replacing a source string with a target string. This is surely not as elegant as your suggestion, but it should work in principle if we can write a regular expression that represents the following source lines:

Code:
Angstrom
    3.473994998    0.000000000    0.000000000
    0.000000000    6.297732338    0.000000000
    0.000000000   -0.000725376   13.828612779
How could that regular expression be written?
 
12-06-2018, 06:11 PM   #4
astrogeek
Moderator
Quote:
Originally Posted by bsmile
Thanks, this is very helpful! At least now I can select the part I want to replace. A few follow-up questions:

(1) The following two commands give identical results, as expected. Is it better practice to describe the whole line rather than just a characteristic keyword of that line?

Code:
sed "/^Angstrom/,/^$/c\cccc" input_file
sed "/Angstrom/,/^$/c\cccc" input_file
The first, with the ^, is just more specific: it matches Angstrom only at the start of a line, while the second matches it anywhere in a line. Use whatever best fits your incoming data, but in general, the more specific your expressions, the less likely they are to match in unintended places.
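A small illustration with made-up input: a comment line that happens to mention Angstrom is enough to start the unanchored range early, while the anchored one waits for the real header.

```sh
#!/bin/sh
# Made-up input: a comment line mentions Angstrom before the real header.
sample() {
  printf '%s\n' '! units: Angstrom' 'LATTICE' 'Angstrom' '  1.0 0.0 0.0' '' 'SURFACE'
}

# Unanchored: the range already starts at the comment line.
loose=$(sample | sed -n '/Angstrom/,/^$/p')
# Anchored: only a line that begins with Angstrom starts the range.
strict=$(sample | sed -n '/^Angstrom/,/^$/p')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$loose" "$strict"
```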

Quote:
Originally Posted by bsmile
(2) Actually, I need to replace the three old data lines following Angstrom with three new data lines. I found an answer by googling, but I don't understand what is inside the {}. Could you please explain it a bit?

Code:
cccc=`head -n 5 data_file | tail -n 3`
echo $cccc
sed "/^Angstrom/,/^$/c\\${cccc//$'\n'/\\n}" input_file
The curly braces here are just shell parameter expansion, not part of the regular expression itself.

I did not write it, so I cannot say with certainty what was intended, but it looks like a pattern substitution that fixes (escapes) the line breaks in the $cccc string so the whole value can be passed to sed on one line. See man bash, "Parameter Expansion", for details of usage. You might also ask the person who wrote it, if it was an example from elsewhere.

The two backslashes, \\, become a single backslash inside the double quotes, so sed sees c\ followed directly by the text; this simply preserves any whitespace at the start of the substituted string in $cccc.
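A small bash sketch of what the ${cccc//$'\n'/\\n} part appears to do, with a made-up value for cccc and the expansion used in a plain assignment rather than inside the sed command: every newline is replaced by the two characters backslash and n, so the multi-line value becomes a single line. GNU sed then processes escape sequences in c text, turning each \n back into a line break.

```bash
#!/usr/bin/env bash
# Made-up three-line value standing in for the new data lines.
cccc=$'  1.0 0.0 0.0\n  0.0 1.0 0.0\n  0.0 0.0 1.0'

# ${var//pattern/replacement} replaces every match of pattern:
# $'\n' is a newline, and \\n becomes a literal backslash followed by n.
escaped=${cccc//$'\n'/\\n}

printf '%s\n' "$escaped"   # one line, with literal \n separators
```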

Quote:
Originally Posted by bsmile
(4) The point of the -e options in my original post is to read the whole input_file into sed as a single pattern space and then have the s command do the replacement. The task then reduces to replacing a source string with a target string. This is surely not as elegant as your suggestion, but it should work in principle if we can write a regular expression that represents the following source lines:

Code:
Angstrom
    3.473994998    0.000000000    0.000000000
    0.000000000    6.297732338    0.000000000
    0.000000000   -0.000725376   13.828612779
How could that regular expression be written?
Well, the only way I would do that is the way I have shown, with line addressing; otherwise it gets very messy and fragile.

Perhaps someone else will have a better idea how to do that.
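For reference, a sketch of the single-regex approach asked about in (4), assuming GNU sed (the :a/N/$!ba slurp loop, \n in pattern and replacement, [^\n] and the M flag are all GNU behaviour). With M, ^ and $ match at line boundaries inside the slurped text, so the pattern reads "Angstrom at the start of a line, followed by the next three non-empty lines". The replacement data is made up, and this remains more fragile than line addressing.

```sh
#!/bin/sh
# Shortened copy of the input file from the first post.
sample() {
  printf '%s\n' 'LATTICE' 'Angstrom' \
    '    3.473994998    0.000000000    0.000000000' \
    '    0.000000000    6.297732338    0.000000000' \
    '    0.000000000   -0.000725376   13.828612779' \
    '' '' 'SURFACE' ' 1  0  0'
}

# :a / N / $!ba            append every line to the pattern space.
# ^Angstrom                the header at the start of a line (M flag).
# \(\n[^\n]\{1,\}\)\{3\}$  the three following lines, each non-empty.
# The replacement rebuilds the header plus three made-up data lines.
result=$(sample | sed -e ':a' -e 'N' -e '$!ba' \
  -e 's/^Angstrom\(\n[^\n]\{1,\}\)\{3\}$/Angstrom\nNEW 1\nNEW 2\nNEW 3/M')
printf '%s\n' "$result"
```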

Last edited by astrogeek; 12-06-2018 at 06:45 PM.
 
12-06-2018, 06:29 PM   #5
syg00
LQ Veteran
astrogeek gave you a perfectly good solution; there is no reason not to use it.
I see no intrinsic benefit in reading the entire file into memory, but regardless, the regex offered would do the job.

As for which regular expression to use: be as specific as you can. But beware of stray whitespace, especially at the beginning and end of lines.
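A small sketch of that whitespace trap, with made-up input in which the "blank" line after the block holds only spaces (the lines after SURFACE in the first post look like that): /^$/ does not match such a line, while the POSIX class [[:space:]] makes both addresses tolerant of spaces and tabs.

```sh
#!/bin/sh
# Made-up input: the line after the data holds four spaces, not nothing.
sample() { printf '%s\n' 'Angstrom' '  1.0 0.0 0.0' '    ' 'SURFACE' ' 1  0  0' '' 'END'; }

# /^$/ needs a truly empty line, so this range runs on past SURFACE.
strict=$(sample | sed -n '/^Angstrom/,/^$/p')

# Allow leading whitespace before Angstrom, and treat a line holding
# only spaces or tabs as blank.
tolerant=$(sample | sed -n '/^[[:space:]]*Angstrom/,/^[[:space:]]*$/p')

printf 'strict:\n%s\ntolerant:\n%s\n' "$strict" "$tolerant"
```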
 
12-07-2018, 08:40 AM   #6
l0f4r0
Member
By the way, I would replace:
Quote:
Originally Posted by bsmile
cccc=`head -n 5 data_file | tail -n 3`
with
Code:
cccc=$(sed -n '3,5p' data_file)
or
Code:
cccc=$(sed -n '3,+2p' data_file)
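A quick check that the three forms select the same lines, using a made-up ten-line stand-in for data_file. Note that the addr,+N form is a GNU sed extension, so '3,+2p' may not work with other seds.

```sh
#!/bin/sh
# Made-up stand-in for data_file: the numbers 1 to 10, one per line.
sample() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

a=$(sample | head -n 5 | tail -n 3)  # first 5 lines, then the last 3 of those
b=$(sample | sed -n '3,5p')          # print lines 3 through 5
c=$(sample | sed -n '3,+2p')         # GNU: line 3 and the 2 lines after it

printf '%s\n' "$a" "$b" "$c"
```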
 
12-07-2018, 11:27 AM   #7
bsmile
Member
Original Poster
Quote:
Originally Posted by l0f4r0
By the way, I would replace:

with
Code:
cccc=$(sed -n '3,5p' data_file)
or
Code:
cccc=$(sed -n '3,+2p' data_file)
Thanks for the input. I had thought about these options but in the end chose the way I was most familiar with. Could you please explain why $() is preferred over `` and why sed is used instead of head/tail?
 
12-07-2018, 12:03 PM   #8
l0f4r0
Member
Quote:
Originally Posted by bsmile
Could you please explain why $() is preferred over ``[...]?
Quote:
In general, the $() should be the preferred method:
  • it's clean syntax
  • it's intuitive syntax
  • it's more readable
  • it's nestable
  • its inner parsing is separate
Source: http://wiki.bash-hackers.org/syntax/expansion/cmdsubst
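To illustrate the "nestable" point with a made-up path: with $( ) the inner substitution needs no extra escaping, while with backticks the inner pair has to be escaped with backslashes.

```sh
#!/bin/sh
path=/usr/local/bin/tool   # made-up example path

# $( ) nests directly: dirname gives /usr/local/bin, basename gives bin.
new=$(basename "$(dirname "$path")")

# Same thing with backticks: the inner backticks must be escaped.
old=`basename \`dirname "$path"\``

echo "$new $old"   # prints: bin bin
```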

Quote:
Originally Posted by bsmile
Could you please explain [...] why sed is used instead of head/tail?
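Here sed is more readable, direct and concise: a single command selects lines 3 to 5 instead of a head | tail pipeline of two processes, so it also uses less CPU and finishes quicker.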
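One hedged caveat: sed -n '3,5p' still reads the rest of the input after line 5, whereas head -n 5 stops early. On a large data file, adding a q command makes sed stop after line 5 as well. A sketch with a made-up million-line stream:

```sh
#!/bin/sh
# Made-up large input: one million numbered lines.
big() { awk 'BEGIN { for (i = 1; i <= 1000000; i++) print i }'; }

# 3,5p prints lines 3-5; 5q then quits at line 5 instead of reading on.
lines=$(big | sed -n '3,5p;5q')
printf '%s\n' "$lines"
```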

Last edited by l0f4r0; 12-07-2018 at 12:05 PM.
 
12-07-2018, 03:52 PM   #9
MadeInGermany
Member
Using shell builtins for the replacement loop:
Code:
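cccc=$(sed -n '3,5p' data_file)   # new data: lines 3-5 of data_file
prt=1                              # non-empty: copy input lines through
while IFS= read -r line
do
  [ "$prt" ] && printf '%s\n' "$line"
  case $line in
  (Angstrom*)
    # header already printed above: emit the new data, then skip the old lines
    printf '%s\n' "$cccc"
    prt=
  ;;
  (*[!" "]*)
  # a non-blank line
  ;;
  (*)
    # a blank line ends the skipped block; print it if it was skipped
    [ "$prt" ] || printf '%s\n' "$line"
    prt=1
  ;;
  esac
done < input_file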

Last edited by MadeInGermany; 12-07-2018 at 04:00 PM. Reason: add the trailing blank line
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration