LinuxQuestions.org
Old 02-16-2009, 12:27 AM   #1
nemobluesix
LQ Newbie
 
Registered: May 2008
Distribution: fedora 9
Posts: 13

Rep: Reputation: 0
awk regexp for one character match


Hello.

I'm working on an awk script and one of the rules should change strings like:
Code:
%some text %param_id% some text%
to:
Code:
%some text 33 some text%
I tried with this expression
Code:
match($0,/%param_(.+)%/,prm)
and it works great but it has a problem. If the original string ends with %param_id% like this
Code:
%some text %param_id%%
(which may happen very often), the matched string is %param_id%% rather than %param_id%, and, of course, prm[1] becomes "id%" instead of "id".

I read the manual and it says that the expression should look like
Code:
match($0,/%param_(.+)%{1}/,prm)
but that doesn't work.

Any ideas?
Thanks
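
Note on the match above: `.+` is greedy, so it runs to the last % on the line, and `%{1}` only asks for exactly one %, which a plain % already does. A bracket expression that excludes % stops at the first closing %. The following is a minimal sketch of that idea, assuming any POSIX awk; it uses RSTART and RLENGTH instead of gawk's three-argument match(), and the helper name first_param is made up for illustration.

```shell
# Hypothetical helper: print the name of the first %param_NAME% placeholder
# found on each input line.
first_param() {
    awk '{
        # [^%]+ cannot run past a percent sign, so the match ends at the
        # first closing % rather than the last one on the line.
        if (match($0, /%param_[^%]+%/)) {
            # Skip the 7-character "%param_" prefix and drop the trailing "%".
            print substr($0, RSTART + 7, RLENGTH - 8)
        }
    }'
}

echo '%some text %param_id%%' | first_param   # prints: id
```

In gawk the same bracket expression can be used inside the original call, with the group written as ([^%]+) in place of (.+).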

Last edited by nemobluesix; 02-16-2009 at 12:28 AM. Reason: spell
 
Old 02-16-2009, 02:51 AM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,985
Blog Entries: 11

Rep: Reputation: 879
I'm not 100% convinced that the tool (match) you're using
is the right tool for your job. What exactly do you want
to do with what's returned in your array prm?


Code:
[tink@tink:~]$ echo '%some text %param_id%%'|awk '{print gensub(/(.*)(%param_[^%]+%)(.*)/, "\\133\\3", "1")}'
%some text 33%
[tink@tink:~]$ echo '%some text %param_id% and now what%'|awk '{print gensub(/(.*)(%param_[^%]+%)(.*)/, "\\133\\3", "1")}'
%some text 33 and now what%
 
Old 02-16-2009, 11:39 AM   #3
nemobluesix
LQ Newbie
 
Registered: May 2008
Distribution: fedora 9
Posts: 13

Original Poster
Rep: Reputation: 0
Hi Tinkster,
Thanks for your reply.
You are right, gensub is better suited here than match. I was using match to save the name of the param so I could use it later in a sub function. It looks silly now. I didn't know that gensub could do that too.
Based on your code I reached this solution:

Code:
$ echo "%text text %param_id% text text text%param_name%%" | awk '
> BEGIN {
> for(i=0;i<ARGC;i++){
> if(match(ARGV[i],/param_(.+)=(.+)/,p)) param[p[1]]=p[2];
> }
> }
> {
> print gensub(/%param_([^%]+)%/, param["\\1"], "g");
> print "debug: param_id - " param["id"];
> }' param_id=100 param_name=abc
%text text  text text text%
debug: param_id - 100
This is the best I could achieve. The output should have been:
Code:
%text text 100 text text textabc%
Why is param["\\1"] empty?

==============
If you are curious, the whole process looks like this:
1) I call the script with an unknown number of arguments and with unknown names (before runtime) like this:
Code:
$ ./test.awk param_id=12 param_name=abc ... data_file
2) the BEGIN section reads all the param_* pairs and saves each value in an array, say param, with the names used as indexes like this:
param["id"]=12;
param["name"]=abc;
...
3) I parse the data_file and replace each param_* with its value:
%param_id% becomes 12
%param_name% becomes abc

The idea is that I want to replace "words" that I don't know before calling the script. Maybe you have a better solution.
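
Note on the empty param["\\1"]: awk evaluates a function's arguments before the function runs, so param["\\1"] is looked up with the literal two-character key \1 before gensub ever sees a match. No element with that key exists, so gensub receives an empty replacement string. A small sketch to check this, assuming any POSIX awk (the helper name show_key is hypothetical):

```shell
# Hypothetical check: which key does param["\\1"] actually look up?
show_key() {
    awk 'BEGIN {
        param["id"] = 100
        key = "\\1"                  # same string literal as in the gensub call
        print length(key)            # the key is a backslash followed by 1
        print ((key in param) ? "present" : "missing")
        print "[" param[key] "]"     # the value gensub would get as replacement
    }'
}

show_key
```

The back-reference is only expanded inside gensub's replacement string, which is why the captured name has to be obtained some other way (for example with match) before it can be used as an array index.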

Last edited by nemobluesix; 02-16-2009 at 12:49 PM.
 
Old 02-16-2009, 12:42 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,985
Blog Entries: 11

Rep: Reputation: 879
The problem here is that awk's regexes are greedy, and this behaviour
(as far as I know) cannot be modified. So the first expression "(.*)"
matches everything including %param_id% and only picks up the last bit.

The only work-around within awk will be two independent statements.
Code:
gensub(/(.*)(%param_id[^%]+%)(.*)/, "\\133\\3", "1")
gensub(/(.*)(%param_name[^%]+%)(.*)/, "\\133\\3", "1")
Regarding the flexible number of arguments: how are you handling
multiple param_names or ids on the command line in terms of assignment
to variables and then arrays? Without testing, my gut says that only the
last thing on the command line will be valid within the BEGIN section.
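
Note on the BEGIN question: ARGV holds every command-line operand, so a loop in BEGIN can see all of the name=value pairs. What is not available in BEGIN are the awk variables those operands would assign, because awk only performs such assignments when it reaches them while working through its file list. A minimal sketch, assuming any POSIX awk (the helper name list_params is hypothetical):

```shell
# Hypothetical helper: show what a BEGIN block can see in ARGV.
list_params() {
    awk 'BEGIN {
        # ARGV[0] is the program name; operands start at index 1.
        for (i = 1; i < ARGC; i++)
            print i ": " ARGV[i]
    }' "$@"
}

list_params param_id=12 param_name=abc
```

Because the program has only a BEGIN block, awk exits without reading any input.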
 
Old 02-16-2009, 12:56 PM   #5
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,985
Blog Entries: 11

Rep: Reputation: 879
And on second thought ... maybe awk isn't quite what you're after in
the first place ... have you considered using m4?
 
Old 02-16-2009, 01:20 PM   #6
nemobluesix
LQ Newbie
 
Registered: May 2008
Distribution: fedora 9
Posts: 13

Original Poster
Rep: Reputation: 0
I should have posted a new reply instead of editing my post; maybe you didn't notice the change.

I'm ok now with the regexp, it works. The problem is that I can't use my array inside gensub. As you see above, inside gensub param["\\1"] evaluates to "" and outside, as expected, to the correct value.

I have never used m4. Based on the problem described in my previous post, do you think m4 would be better? And if so, what's the learning curve? My awk script only needs this one issue fixed and it's done.
Thanks again.
 
Old 02-16-2009, 01:28 PM   #7
nemobluesix
LQ Newbie
 
Registered: May 2008
Distribution: fedora 9
Posts: 13

Original Poster
Rep: Reputation: 0
I'm not using the command-line arguments as variables, as they were meant to be used. I'm using both param_anything and its value as plain values. param_id and param_name were just examples; they can be anything the user thinks of.
I couldn't find a better way to pass these things inside the script...
 
Old 02-16-2009, 10:50 PM   #8
nemobluesix
LQ Newbie
 
Registered: May 2008
Distribution: fedora 9
Posts: 13

Original Poster
Rep: Reputation: 0
working code

Well... this combination might not be the best, but it works:
Code:
$ cat test.awk
#! /bin/awk -f

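BEGIN {
        # Collect every param_NAME=VALUE argument into param[NAME].
        for(i=0;i<ARGC;i++){
                if(match(ARGV[i],/param_(.+)=(.+)/,p)) param[p[1]]=p[2];
        }
}

{
        # Replace placeholders one at a time; match() captures the name in pa[1].
        while(match($0,/%param_([^%]+)%/,pa)){
                if(!sub(/%param_[^%]+%/, param[pa[1]],$0)) break;
        }
}
{ print $0; }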

$ echo %text text %param_one% text text%param_two% text text text text%param_n%% | ./test.awk param_one=111 param_two=222 param_n=xxx
%text text 111 text text222 text text text textxxx%
I could not get gensub to read param["\\1"] so I, again, used match to save the "\\1" piece.
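
Note on the loop above: two things are worth keeping in mind. sub() treats & in the replacement as "the matched text", so a value containing & would not be inserted literally, and a value that itself contains a %param_...% placeholder would be scanned again on the next iteration. A possible one-pass variant is sketched below; it is not the original poster's script, it assumes only POSIX awk (no gawk extensions), and the function name expand_params is made up.

```shell
# Hypothetical one-pass variant: build the output with substr() so that
# inserted values are copied literally and never rescanned.
expand_params() {
    awk 'BEGIN {
        # Collect param_NAME=VALUE operands into param[NAME].
        for (i = 1; i < ARGC; i++) {
            if (ARGV[i] ~ /^param_[^=]+=/) {
                eq = index(ARGV[i], "=")
                param[substr(ARGV[i], 7, eq - 7)] = substr(ARGV[i], eq + 1)
            }
        }
    }
    {
        out = ""
        rest = $0
        # Copy the text before each placeholder, then its value, then move on.
        while (match(rest, /%param_[^%]+%/)) {
            name = substr(rest, RSTART + 7, RLENGTH - 8)
            out = out substr(rest, 1, RSTART - 1) param[name]
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}

echo '%text text %param_one% text text%param_two% text text text text%param_n%%' |
    expand_params param_one=111 param_two=222 param_n=xxx
```

As in the original script, a placeholder with no matching argument is replaced by an empty string.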
 
  


Tags
awk, regexp

