Another sed thread ...

t0x · 11-02-2009, 10:02 AM

Hi,

I need help. I know how to use regular expressions, and I thought that I could apply them directly to sed, but it doesn't seem to work.

I need to append a string to multiple php files like this :

Quote:

startwebtext( 0, 'top_bulhosa_titulo' );?><?=$main_title;?><?$main_title=stopwebtext(false);
/*startWebText(0,"instock_message");echo("Em stock, enviamos em 24 horas .");
<h3><%startWebText(0,"sinopse_detalhe_produto")%>Sinopse<%stopWebText()%></h3>
<h3><%startWebText(0,"livro_por_dentro")%>O livro por dentro<%stopWebText()%></h3>

Every time I find something with "startwebtext" I want to add a string right after it.

eg.

This :

startwebtext( [number], [content] );

should become this :

startwebtext( [number], [NEW TEXT] . [content] );

this is not working :

sed -e 's/startwebtext\(\s*([0-9]),\s*(.*?)\s*\);/startwebtext\( $1, \$_GLOBALS['test'] . $2 );/ig' file

David the H. · 11-02-2009, 11:43 AM

It usually helps to post the exact results you get when you say something "isn't working".

Is "\$_GLOBALS['test']" supposed to be a literal string, or a shell variable/array value that gets replaced by the shell? If the latter, you'll have to replace the single quotes around the sed expression with double quotes (or "unquote" the string itself) so that the shell can expand it.

It's better to go with a full regex expression for something like this. The following assumes that you want a case-insensitive match. It works on your example string, but I can't guarantee it will match every possible combination in your file.

Code:

sed -r -e 's/startwebtext\([ ]?([0-9]),[ ]?([^)]*)\)/startWebText(\1, $_GLOBALS['\''test'\''] . \2)/Ig' file
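On the quoting point: inside a single-quoted sed script, the quotes around test close and reopen the shell string, so sed never sees them. The sketch below assumes GNU sed (for -r and the I flag) and that $_GLOBALS['test'] is meant as literal text. The variable name prefix is made up for illustration, and the sample line is the first one from the original post.

```shell
# Sample line taken from the first post.
line="startwebtext( 0, 'top_bulhosa_titulo' );"

# 1. Inner quotes are eaten by the shell: the result contains [test], not ['test'].
broken=$(printf '%s\n' "$line" |
  sed -r 's/startwebtext\( ?([0-9]), ?([^)]*)\)/startwebtext(\1, $_GLOBALS['test'] . \2)/I')

# 2. Each literal quote written as '\'' (close, escaped quote, reopen) survives.
literal=$(printf '%s\n' "$line" |
  sed -r 's/startwebtext\( ?([0-9]), ?([^)]*)\)/startwebtext(\1, $_GLOBALS['\''test'\''] . \2)/I')

# 3. Text held in a shell variable (hypothetical name: prefix) is spliced in by
#    stepping out of the single quotes and into double quotes around the expansion.
prefix="\$_GLOBALS['test']"
expanded=$(printf '%s\n' "$line" |
  sed -r 's/startwebtext\( ?([0-9]), ?([^)]*)\)/startwebtext(\1, '"$prefix"' . \2)/I')

printf '%s\n' "$broken" "$literal" "$expanded"
```

Variants 2 and 3 should print the same line. Variant 3 only works as long as the spliced text contains no /, & or backslash, since those are special in a sed replacement.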

Dr_P_Ross · 11-02-2009, 12:29 PM

t0x,

You wrote:

Quote:

this is not working :
sed -e 's/startwebtext\(\s*([0-9]),\s*(.*?)\s*\);/startwebtext\( $1, \$_GLOBALS['test'] . $2 );/ig' file

There are a couple of things wrong here. In the replacement you have used $1 and $2, presumably intended to be the first and second parts of the old content of startwebtext(...), but the sed syntax should be \1 and \2 rather than $1 and $2. Also, in the match part of the s/../../, there is only one pair of \(..\), and that pair ends just before the semicolon.

I think you want something like this instead:

Quote:

sed -e 's/startwebtext(\(\s*[0-9]\s*,\)\([^)]*\));/startwebtext(\1 \$_GLOBALS['\''test'\'']. \2 );/ig' file

In this, the first \(..\) captures the content up to and including the comma, maybe with spaces around the single digit. The second \(..\) captures everything after the comma that does not include a right parenthesis. And the match expects to see a final right parenthesis and semicolon.

This does of course assume that the text string after the comma does not include a right parenthesis, but if so you can adjust it to suit.
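To make the $1 versus \1 point concrete, here is a minimal sketch using only basic regular expressions, so it should not depend on GNU extensions. NEW is a made-up placeholder for the text to add, and the sample line is the first one from the original post.

```shell
# Sample line taken from the first post.
line="startwebtext( 0, 'top_bulhosa_titulo' );"

# Perl-style $2 is not expanded by sed; it lands in the output as literal text.
wrong=$(printf '%s\n' "$line" |
  sed 's/startwebtext(\( *[0-9] *,\)\([^)]*\));/startwebtext(\1 NEW . $2);/')

# \1 and \2 recall the text matched by the first and second \(...\) groups.
right=$(printf '%s\n' "$line" |
  sed 's/startwebtext(\( *[0-9] *,\)\([^)]*\));/startwebtext(\1 NEW .\2);/')

printf '%s\n' "$wrong" "$right"
```

In a basic regular expression a bare ( is a literal parenthesis and \( opens a group, which is the reverse of the -r (extended) syntax.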

ghostdog74 · 11-02-2009, 06:04 PM

don't have to use long messy regex for this.

Code:

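awk -v FS="," '/startwebtext/{
   # Reprint the first field, then the new text after the first comma.
   printf "%s,%s", $1, "new words"
   # An explicit "%s" format keeps a % in the data from being read as a format.
   for(i=2;i<NF;i++) printf "%s%s", $i, FS
   printf "%s\n", $NF
   next
}
{ print }' file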

output

Code:

$ more file
startwebtext( [number], [content] );
$ ./shell.sh
startwebtext( [number],new words [content] );

Dr_P_Ross · 11-03-2009, 06:05 AM

ghostdog74,

Although awk is very handy for quick solutions to many tasks, your suggested solution is not general enough. It supposes that startwebtext occurs before the first comma on a line, and at most once per line. And in strings (e.g. line 2 of t0x's sample input), white space after commas would be munged. For case-insensitive matching, you would also need the gawk-specific IGNORECASE=1 or similar.

ghostdog74 · 11-03-2009, 06:47 AM

Quote:

Originally Posted by Dr_P_Ross

ghostdog74,

Although awk is very handy for quick solutions to many tasks, your suggested solution is not general enough.

Who says every suggestion I post has to be general and solve every possible case? None of the sed solutions posted address the issue of multiple lines either. If a thorough solution is desired, then more data is needed for test cases. But whatever it is, awk is still the better tool to use, period

pixellany · 11-03-2009, 07:04 AM

Hmmmmm---I guess the OP would have to define the general formula: What is "good enough" depends on the specific data set, n'est-ce pas?

Quote:

But whatever it is, awk is still the better tool to use, period

I'm quite sure that I read somewhere that it is a badge of honor to solve problems with SED, if at all possible---preferably with code that is as obfuscated as possible........

Seriously, the tool to use is:
-the one that works
-the one you know how to use

Now where is the OP?

ghostdog74 · 11-03-2009, 07:23 AM

Quote:

Originally Posted by pixellany

I'm quite sure that I read somewhere that it is a badge of honor to solve problems with SED,

where did you read that?
use sed only for simple subs.

The tool to use is the one that works and at the same time, makes code easy to read and understand.

pixellany · 11-03-2009, 07:46 AM

Quote:

Originally Posted by ghostdog74

where did you read that?
use sed only for simple subs.

The tool to use is the one that works and at the same time, makes code easy to read and understand.

understand by who? A professional programmer has an obligation to write code so that his/her stakeholders can understand it. If I am writing code for my own use, then what counts is whether I understand the utility well enough to solve the problem. Thus I often use SED because I understand it better.

I am however in awe of the AWK gurus, and I will learn it someday.

Quote:

I'm quite sure that I read somewhere that it is a badge of honor to solve problems with SED

where did you read that?

Feeble attempt at humor.....

ghostdog74 · 11-03-2009, 07:57 AM

Quote:

Originally Posted by pixellany

understand by who? ...
...
A professional programmer has an obligation to write code so that his/her stakeholders can understand it.

Stakeholders do not need to know what you do with your programs. All they need to know is whether your company is making money. Even system owners don't have to know code details. They just need to know the business sense of it. The actual person who is going to troubleshoot your code if things happen is the one that needs to read your code.

pixellany · 11-03-2009, 08:00 AM

Sorry, I used "stakeholders" a bit loosely. Your description is correct.

Dr_P_Ross · 11-03-2009, 09:16 AM

I think we can all agree that awk is more powerful and useful (and bigger) than sed, just as perl is even more powerful (and bigger). Sed still has its uses. Harking back to the original question, we don't know in which field, as defined by the FS field separator, the "startwebtext" appears, so an awk solution might need to use gawk's gensub function, in which case we'd still need a slightly non-trivial regexp.

More generally, don't just stick with one tool. The more you learn, the more common ground you find and the easier it becomes to be versatile.
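For what it is worth, a field-independent awk version does not strictly need gensub. A rough sketch, assuming any POSIX awk, a single-digit first argument as in the samples, and a made-up placeholder NEW for the added text; add_prefix is a hypothetical helper name:

```shell
# Insert " NEW ." after the first argument of every startwebtext( N, call,
# wherever it sits on the line and whatever its letter case.
add_prefix() {
  awk '
  {
      out = ""; rest = $0
      # Search a lower-cased copy so the match ignores case in any POSIX awk.
      while (match(tolower(rest), /startwebtext\([ \t]*[0-9][ \t]*,/)) {
          # Copy the original text up to and including the comma, then add the new text.
          out = out substr(rest, 1, RSTART + RLENGTH - 1) " NEW ."
          rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
  }' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf '%s\n' '<h3><%startWebText(0,"livro_por_dentro")%>O livro por dentro<%stopWebText()%></h3>' | add_prefix
```

The regexp is still there, as predicted, but matching on substrings instead of fields handles several calls per line and leaves the rest of the line untouched.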

H_TeXMeX_H · 11-03-2009, 02:36 PM

I don't think it matters if you use sed or awk, as long as it works and is readable (at least to you).

Usually awk is for tables. If you're dealing with tables, use awk or perl. For substitution, deletion, and other quick modifications, you should probably use sed. But it's your choice; it's just that you may have a harder time using the less appropriate tool.
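A tiny illustration of that split, using made-up data (the table contents are purely hypothetical):

```shell
# A small whitespace-separated table (hypothetical data).
table="apples 3
pears 5"

# awk splits each row into fields, so column arithmetic is direct.
total=$(printf '%s\n' "$table" | awk '{ sum += $2 } END { print sum }')

# sed treats each row as plain text, which suits a quick substitution.
renamed=$(printf '%s\n' "$table" | sed 's/^pears/plums/')

printf '%s\n' "$total" "$renamed"
```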

pixellany · 11-03-2009, 02:50 PM

I'll add to my list of criteria for the "right" tool: The one that is closest. I once used my hair brush as a hammer. It was there, and getting a hammer would have been an unknown mission (in a hotel).

Applied to scripting, the message is that the best tool is often the one you can find the quickest ("find" includes looking up some specific syntax.) If you use Google to find your tools, then "best" might simply mean "most popular" (which in turn = the one that is closest.)

H_TeXMeX_H · 11-03-2009, 02:52 PM

... but a hair brush will take more hits to drive a nail, and it can break more easily ...