LinuxQuestions.org
Old 08-16-2010, 03:27 PM   #1
gfarrell
LQ Newbie
 
Registered: May 2010
Location: London
Distribution: Mac OSX, Ubuntu, Debian
Posts: 29

Rep: Reputation: 15
Perl regex not matching across multiple lines despite ms flags


Hi all,

I have written a regular expression (tested in regexpal and regextester alpha something) with which I want to replace something like:

Code:
myFunction: function myFunction(arg1, arg2) {
moreCode = someCode() + someMoreCode();
return moreCode();
}
into
Code:
myFunction: (function myFunction ...
})
The regular expression I've used is:
Code:
([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}
which is the most versatile matching pattern I need, and the replacement pattern is:
Code:
$1: (function $1($2) {$3})
So I wrote a bash script to do this for me:
Code:
#! /bin/bash

for file in `find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name *.js`;
do
	echo "Processing $file"
	cp $file $file.bak
	perl -pi -e "s/([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}/\1: (function \1(\2) {\3})/gms" $file;
done
but it only matches functions that occupy a single line, despite my tests showing multi-line matching in online JavaScript testers, and despite using the m and s flags (which should make it multi-line, no?)

This is really frustrating me so I'd love some help.
 
Old 08-16-2010, 05:12 PM   #2
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
We should first verify that it's not a syntax problem. Try this mod:
Code:
#! /bin/bash

while read file; do
	echo "Processing $file"
	cp "$file" "$file.bak"
	perl -pi -e 's/([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}/\1: (function \1(\2) {\3})/gms' "$file";
done < <(exec find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js')
Notice how arguments are properly placed in quotes.
 
Old 08-16-2010, 05:29 PM   #3
gfarrell
LQ Newbie
 
Registered: May 2010
Location: London
Distribution: Mac OSX, Ubuntu, Debian
Posts: 29

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by konsolebox
We should first verify that it's not a syntax problem. Try this mod:
Code:
#! /bin/bash

while read file; do
	echo "Processing $file"
	cp "$file" "$file.bak"
	perl -pi -e 's/([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}/\1: (function \1(\2) {\3})/gms' "$file";
done < <(exec find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js')
Notice how arguments are properly placed in quotes.
Thanks for the reply, but it had the same effect as before, only transforming single-line functions e.g. from
Code:
myFunction: function myFunction(arg) {...},
to
Code:
myFunction: (function myFunction(arg) {...}),
Whereas I am trying to match all function definitions...
 
Old 08-16-2010, 05:58 PM   #4
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
Try this one
Code:
#! /bin/bash

while read file; do
	echo "Processing $file"
	cp "$file" "$file.bak"
	perl -pi -e 'undef $/; s/([[:alnum:]_]+):\s?function\s?\1?\(([[:alnum:]_\s,]*)\) {([^}]*)}/\1: (function \1(\2) {\3})/gs' "$file";
done < <(exec find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js')
Note that this method is limited to functions that don't contain another '}' inside the body.
 
Old 08-16-2010, 06:17 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Aargh, konsolebox just beat me to almost the exact same solution. And I spent all morning trying to figure it out, since I have little direct experience with perl. (Indeed, I still don't know exactly what undef $/ does, only that it seems to work.)

One thing I can add though is that I read that \w equals [[:alnum:]_], so you should be able to use that instead. This is what I came up with in the end:
Code:
perl -pe 'undef $/; s/(\w):\s?(function\s?\1?([^}]+)})/\1: (\2)/gms'

Last edited by David the H.; 08-16-2010 at 06:54 PM. Reason: minor correction
 
Old 08-16-2010, 06:43 PM   #6
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
That was only a difference in time. I also don't know much about perl regexes (I hadn't even noticed \w), so I searched the web for answers. That icon's really funny btw.
 
Old 08-16-2010, 06:50 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
No sweat there. It was a good learning experience for me. It's just a bit of a letdown to finally figure something out, only to be beaten by just a few minutes.

Now I know of a new way to handle multi-line matches. If only sed would introduce similar flags....
 
Old 08-16-2010, 07:28 PM   #8
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I searched again. It appears that it can also be done in sed:
Code:
sed -n '
	# if the first line copy the pattern to the hold buffer
	1h
	# if not the first line then append the pattern to the hold buffer
	1!H
	# if the last line then ...
	$ {
		# copy from the hold to the pattern buffer
		g
		# do the search and replace
		s/\([[:alnum:]_]\+\):[[:blank:]]\?function[[:blank:]]\?\1\?(\([[:alnum:]_[:blank:],]*\)) {\([^}]*\)}/\1: (function \1\(\2\) {\3})/g
		# print
		p
	}
' file
reference: http://austinmatzko.com/2008/04/26/s...h-and-replace/

Last edited by konsolebox; 08-16-2010 at 07:33 PM. Reason: don't need extra \(\)
 
Old 08-17-2010, 04:51 AM   #9
gfarrell
LQ Newbie
 
Registered: May 2010
Location: London
Distribution: Mac OSX, Ubuntu, Debian
Posts: 29

Original Poster
Rep: Reputation: 15
Hi guys,

I really appreciate all the responses but the regex I originally supplied:
Code:
([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}
Actually matched all of these functions properly:
Code:
	aFunction: function aFunction() {
		for(var i = 0; i<10; i++) {
			document.write(i);
		}
		someStuff();
		//This is a comment...
	},
	anotherFunction: function anotherFunction(myArg) {},
	yetAnotherOne: function yetAnotherOne() {}
The problem is not with the regex but with perl because it only matched anotherFunction and yetAnotherOne but not aFunction. This is probably because perl is having issues with my newline and tab characters?
 
Old 08-17-2010, 06:48 AM   #10
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
But even if perl isn't recognizing the newline and tab characters properly, this expression could still match more than just a single function:
Code:
{(\n?((.*)\n\t?)*)}
Code:
... {
...
}
...
... {
...
}
 
Old 08-17-2010, 07:29 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Thought I would put in my 2c, in case it doesn't have to be perl:
Code:
awk 'BEGIN{RS=""}/function/{sub(/function/,"(&");$NF=$NF ")"}1' file
 
Old 08-17-2010, 07:47 AM   #12
gfarrell
LQ Newbie
 
Registered: May 2010
Location: London
Distribution: Mac OSX, Ubuntu, Debian
Posts: 29

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by konsolebox
But even if perl isn't recognizing the newline and tab characters properly, this expression could still match more than just a single function:
Code:
{(\n?((.*)\n\t?)*)}
Code:
... {
...
}
...
... {
...
}
Try it here: http://www.regextester.com/. It works fine both on "Javascript" and "Preg".

@grail: You'll have to explain that to me...
 
Old 08-17-2010, 08:58 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Quote:
@grail: You'll have to explain that to me...
No probs

It sets the Record Separator to be an empty line and then searches for any record containing the word function. Once found, it prepends an opening bracket to the start of the word 'function' and then appends a closing bracket to the last field. The 1 at the end prints everything it runs into, so you can transfer all contents to a new file. If you are replacing the original, you will need to perform a move on completion.
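For anyone who wants to see what this does to whitespace, here is a small sketch with made-up input. Assigning to `$NF` makes awk rebuild the record from its fields joined by single spaces, and in paragraph mode newlines count as field separators, so a multi-line block should come out collapsed onto one line. Note also that `/function/` is case-sensitive and `sub` only touches the first occurrence in each record.

```shell
#!/bin/sh
# Made-up sample: a single blank-line-delimited block.
input='myFunction: function myFunction(arg1, arg2) {
    return arg1 + arg2;
}'

# Same awk program as posted above, fed from stdin instead of a file.
result=$(printf '%s\n' "$input" | awk 'BEGIN{RS=""}/function/{sub(/function/,"(&");$NF=$NF ")"}1')
printf '%s\n' "$result"
```

If the original line breaks and indentation matter, this approach is probably not a drop-in replacement for the perl version.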
 
Old 08-17-2010, 09:52 AM   #14
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
@grail So that's the power of RS. I never thought that awk could do that. That should be helpful with the improvement of my compiler.

--- a wrong idea ---

Last edited by konsolebox; 08-17-2010 at 09:54 AM.
 
Old 08-17-2010, 10:01 AM   #15
gfarrell
LQ Newbie
 
Registered: May 2010
Location: London
Distribution: Mac OSX, Ubuntu, Debian
Posts: 29

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by grail
No probs

It sets the Record Separator to be an empty line and then searches for any record containing the word function. Once found, it prepends an opening bracket to the start of the word 'function' and then appends a closing bracket to the last field. The 1 at the end prints everything it runs into, so you can transfer all contents to a new file. If you are replacing the original, you will need to perform a move on completion.
So if I wanted to make it even more limited so it only matched the pattern
Code:
someName: function someName... }
?
 
  



Tags
perl, regex



All times are GMT -5.
