LinuxQuestions.org
03-25-2009, 06:00 AM   #1
koobi
Member
 
Registered: Jun 2006
Location: Colombo, Sri Lanka
Distribution: Ubuntu
Posts: 103

Rep: Reputation: 15
Capturing only matches in grep/egrep via shell/bash


Hi,
I'm trying to write up a little shell script to automate installing vim scripts once I install a new Linux distro on a machine.

This is the general idea:
1. Let user specify vim dir (we would install the scripts in the plugin dir of this vim dir).
2. Let user choose which plugins they want to install from a list of displayed scripts ($choice would hold this choice).
3. wget the script and install it in dir specified in step 1.

I'm having problems with step 3.
This is what I have so far:
Code:
wget -O blah.zip http://www.vim.org/scripts/$(wget -qO - http://www.vim.org/scripts/script.php?script_id=${choice} | cat | grep -m 1 -o download_script\.php\?src_id=[0-9]*)
In the above code, the nested wget runs in quiet mode (-q) and writes to STDOUT (-O -), fetching the page for the script selected in step 2.
The output is piped through cat to grep, which stops at the first matching line (-m 1) and prints only the part that matched the expression (-o); that is the first (i.e. latest) download link for the vim script.
The outer wget then downloads that URL and writes it to a file called 'blah.zip'.
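
The inner pipeline described above can be tried offline with a rough sketch like the one below. The two-line `page` sample and the variable names are made up for illustration, and the real vim.org markup may differ. The pattern is quoted so that the shell passes it to grep untouched.

```shell
# Hypothetical two-line sample standing in for the downloaded page;
# the real markup may differ.
page='<a href="download_script.php?src_id=9854">rails.zip</a>
<a href="download_script.php?src_id=9000">rails.zip</a>'

# Quoting the pattern keeps the shell from touching the backslashes.
# -m 1 stops after the first matching line; -o prints only the matched text.
# In a basic regex, ? is literal, so only the dot needs escaping.
rel=$(printf '%s\n' "$page" | grep -m 1 -o 'download_script\.php?src_id=[0-9]*')

# Build the absolute URL the outer wget would fetch.
echo "http://www.vim.org/scripts/$rel"
```

Note that the sketch feeds grep directly, without the intermediate cat.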



My problem is that I don't want to output to blah.zip; I want to output to a file named after the relevant script, but I can't capture any matched subexpressions using grep or egrep.
I've tried the following (only the last part of the grep expression where I match the digits is modified here):
Code:
wget -O blah.zip http://www.vim.org/scripts/$(wget -qO - http://www.vim.org/scripts/script.php?script_id=${choice} | cat | grep -m 1 -o (download_script\.php\?src_id=[0-9]*)\"\>([^\<]*)\</a\>)
But I don't know how to access the captured expression in grep or egrep.

So, for example if I were to install the Rails script for vim which is:
Code:
http://www.vim.org/scripts/script.php?script_id=1567
my script should download the following URL (current as of today) to a file called "rails.zip":
Code:
http://www.vim.org/scripts/download_script.php?src_id=9854

Any ideas?



Oh also, in my expression, why do I have to use:
Code:
download_script\.php\?src_id=[0-9]*
instead of:
Code:
download_script\.php\?src_id=[0-9]{,4}
Thanks!

Last edited by koobi; 03-25-2009 at 06:03 AM.
 
03-25-2009, 01:45 PM   #2
raconteur
Member
 
Registered: Dec 2007
Location: Slightly left of center
Distribution: slackware
Posts: 276
Blog Entries: 2

Rep: Reputation: 44
Quote:
Originally Posted by koobi
Oh also, in my expression, why do I have to use:
Code:
download_script\.php\?src_id=[0-9]*
instead of:
Code:
download_script\.php\?src_id=[0-9]{,4}
AFAIK, positional arguments are not supported in bash pattern matching as they are in perl and other apps.

As for your first question, you may be able to use the BASH_REMATCH array to get the name you are after; see the man page for bash for more info. You will have to set up parenthesized expressions and pay attention to the order of execution, but I think it could be done.
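
A minimal sketch of the BASH_REMATCH idea, assuming the download link on the page looks like the made-up `page` sample below (the real markup may differ, and the variable names are only for illustration). Bash's `[[ string =~ regex ]]` test uses extended regular expressions and stores each parenthesized group in the `BASH_REMATCH` array.

```shell
#!/usr/bin/env bash
# Hypothetical sample of one line from the script page; the real markup may differ.
page='<td><a href="download_script.php?src_id=9854">rails.zip</a></td>'

# Keep the regex in a variable so the shell does not mangle quotes or backslashes.
# Group 1: the relative download URL. Group 2: the link text (the file name).
re='(download_script\.php\?src_id=[0-9]+)">([^<]*)</a>'

if [[ $page =~ $re ]]; then
    url="http://www.vim.org/scripts/${BASH_REMATCH[1]}"
    file="${BASH_REMATCH[2]}"
    # Print the command instead of running it, so the sketch works offline.
    echo "wget -O '$file' '$url'"
fi
```

In a real script, `page` would hold the output of the inner wget, and the echo would be replaced by the actual wget call.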
 
03-25-2009, 01:57 PM   #3
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
What's the point of the '| cat |' in your $(wget ... | cat | grep ...) expression? Doesn't that just pipe stdin to stdout, so stdout becomes stdin for the grep, which it already was at the first |?

Using sed or gawk in place of the grep would give you a lot more flexibility.
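
As a rough sketch of the sed route, assuming the link markup looks like the made-up `page` sample below (the real page may differ, and the variable names are hypothetical). With `-n` and the `p` flag, sed prints only the captured groups from lines where the substitution succeeds.

```shell
# Hypothetical sample line; the real page markup may differ.
page='<td><a href="download_script.php?src_id=9854">rails.zip</a></td>'

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. \1 is the src_id, \2 is the link text (file name).
# The output is "src_id file name", taken from the first matching line only.
result=$(printf '%s\n' "$page" |
    sed -n 's/.*download_script\.php?src_id=\([0-9][0-9]*\)">\([^<]*\)<\/a>.*/\1 \2/p' |
    head -n 1)

src_id=${result%% *}   # text before the first space
file=${result#* }      # text after the first space
echo "src_id=$src_id file=$file"
```

Both values are then available for building the wget command, e.g. the URL from `src_id` and the output name from `file`.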
 
03-25-2009, 04:08 PM   #4
koobi
Member
 
Registered: Jun 2006
Location: Colombo, Sri Lanka
Distribution: Ubuntu
Posts: 103

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by raconteur
AFAIK, positional arguments are not supported in bash pattern matching as they are in perl and other apps.

As for your first question, you may be able to use the BASH_REMATCH array to get the name you are after; see the man page for bash for more info. You will have to set up parenthesized expressions and pay attention to the order of execution, but I think it could be done.

Well, `man grep` says:
Quote:
Repetition
A regular expression may be followed by one of several repetition operators:
? The preceding item is optional and matched at most once.
* The preceding item will be matched zero or more times.
+ The preceding item will be matched one or more times.
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{,m} The preceding item is matched at most m times.
{n,m} The preceding item is matched at least n times, but not more than m times.
Thanks, I checked out BASH_REMATCH, it sounds like it should do the job. I'll give it a go and post here if I have more problems. I still can't figure out why the {} won't work.
Also, I don't understand why the + won't match one or more numbers and I have to use * instead (see the last character of my expression).
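
One likely explanation, offered as a sketch rather than a diagnosis of the exact command: plain grep uses basic regular expressions, where an unescaped `+` or `{` is an ordinary character, while `grep -E` (egrep) treats them as repetition operators. In addition, an unquoted `{,4}` on a bash command line is subject to brace expansion before grep ever sees it, so quoting the pattern matters too. The sample line and variable names below are made up, and `{1,4}` is used instead of `{,4}` to stay with the more portable form.

```shell
# Hypothetical sample line; the real page markup may differ.
line='<a href="download_script.php?src_id=9854">rails.zip</a>'

# Basic regex (plain grep): + is an ordinary character here, so this finds nothing.
bre_plus=$(printf '%s\n' "$line" | grep -o 'src_id=[0-9]+')

# Extended regex (grep -E): + and {1,4} are repetition operators.
ere_plus=$(printf '%s\n' "$line" | grep -E -o 'src_id=[0-9]+')
ere_brace=$(printf '%s\n' "$line" | grep -E -o 'src_id=[0-9]{1,4}')

echo "BRE +: '${bre_plus}'  ERE +: '${ere_plus}'  ERE {1,4}: '${ere_brace}'"
```

With `-E`, a literal `?` in the pattern needs a backslash, since `?` becomes an operator there as well.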




Quote:
Originally Posted by PTrenholme
What's the point of the '| cat |' in your $(wget ... | cat | grep ...) expression? Doesn't that just pipe stdin to stdout, so stdout becomes stdin for the grep, which it already was at the first |?

Using sed or gawk in place of the grep would give you a lot more flexibility.
You're right...I didn't realize that. I'll remove the cat...it's redundant there.
I read up on sed a bit but with BASH_REMATCH, I think I might be able to do it (since I will have to backreference a matched expression in the outer wget)...if not, I will definitely look at sed.
 
03-28-2009, 12:37 PM   #5
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Quote:
Originally Posted by koobi
...
Thanks, I checked out BASH_REMATCH, it sounds like it should do the job. I'll give it a go and post here if I have more problems. I still can't figure out why the {} won't work.
Also, I don't understand why the + won't match one or more numbers and I have to use * instead (see the last character of my expression). ...
Try adding a shopt -s extglob at the start of your script. See info bash -> Bash Builtins for a description of the extglob option in the shopt section.
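
For what it's worth, extglob changes bash's own pattern matching (file name globbing, `case`, and `[[ ... == ... ]]`), not the regular expressions that grep interprets, so it may not change how the grep pattern behaves. A small sketch of what it does enable, with made-up variable names:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended shell patterns such as +(...) and ?(...)

choice=1567        # hypothetical user input (a vim.org script_id)

# +([0-9]) is a shell pattern meaning "one or more digits"; it is not a regex.
if [[ $choice == +([0-9]) ]]; then
    valid=yes
else
    valid=no
fi
echo "choice=$choice valid=$valid"
```

A check like this could be used to validate $choice before it is placed into the script.php URL.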
 
  

