04-20-2006, 11:26 AM
|
#1
|
|
Member
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75
Rep:
|
String manipulation with BASH script
All,
I've run into a bit of trouble with some string manipulation I'm trying to do.
In my script, I have a string stored in $TEMP. This is read from a file, and is in an HTML-like format.
What I want to do is strip off an individual HTML-tag from the end.
The original string will look like:
<some html tag>some text<another tag>more text<a final tag>zero-or-more-chars
The last tag and zero or more chars part should be deleted ONLY if the zero-or-more-chars does NOT contain any of the following:
a-z
A-Z
0-9
,
.
:
So basically I know I want to tell it to remove the last instance of \<*\> followed by zero or more characters that are NOT what I've listed above.
I tried the following:
TEMP=${ORIGSTRING%\<*\>[^a-zA-Z0-9\,\.\:*]*}
And it works EXCEPT in cases where there are no extra characters after the tag.
What am I doing wrong? Also, do I have an extraneous asterisk?
Thanks in advance.
|
|
|
|
04-20-2006, 12:09 PM
|
#2
|
|
Member
Registered: Sep 2004
Location: Dayton, Oh
Distribution: Linux Mint 10, Linux Mint 11
Posts: 148
Rep:
|
I don't believe it can be done as such, since Bash matches from the start or the end of a string, not in between (unless you know the positions). Maybe this would work for you.
Code:
TEMP=${ORIGSTRING%\<*\>*}${ORIGSTRING##*\>}
That should give you what you are looking for.
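To see what the two expansions in that line do on their own, here is a small sketch. The sample string and the HEAD and TAIL variable names are only illustrative assumptions, not part of the suggestion itself.

```shell
ORIGSTRING='hello<tag1>-+'

# %pattern removes the shortest matching suffix: the last tag and what follows it.
HEAD=${ORIGSTRING%\<*\>*}
# ##pattern removes the longest matching prefix: everything up to the last ">".
TAIL=${ORIGSTRING##*\>}

TEMP=$HEAD$TAIL
echo "$TEMP"   # prints: hello-+
```

So the combined line drops the last tag but always keeps the trailing text; it does not look at which characters the trailing text contains.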
|
|
|
|
04-20-2006, 12:28 PM
|
#3
|
|
Member
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75
Original Poster
Rep:
|
I should clarify, because I realize the explanation of what I want to do is a bit self-contradictory.
Only the last tag AND any characters following the last tag should be stripped, unless the characters after the tag contain:
a-z
A-Z
0-9
:
,
.
If one of those characters DOES exist after the last tag, then the string should be left alone, unchanged.
So, for example:
hello<tag1>-+
should result in:
hello
hello<tag1>+a
should result in:
hello<tag1>+a
hello<tag1>
should result in:
hello
|
|
|
|
04-20-2006, 01:56 PM
|
#4
|
|
Member
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536
Rep:
|
This should work in zsh (sorry, I'm not familiar with bash, so I don't know how to convert it).
Code:
TEMP=${ORIGSTRING/%\>[^[:alnum:],.:]##/>}
Edit:
Note: It's the ## that bash doesn't understand. In zsh, this means "one or more of the preceding character/pattern" (i.e. equivalent to + in standard regex). Bash probably has some way to do this; the man page should tell you.
Note 2: If you want to try this in zsh, I think you need 'setopt extendedglob' to activate the ##.
Note 3: If you have no luck with bash, you might want to use sed, as regular expressions are far more powerful than shell pattern matching.
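As a rough sketch of the sed route, assuming the string fits on one line and the tags contain no nested angle brackets, something like the following might do. The function name strip_last_tag_sed is made up for this example.

```shell
# Hypothetical helper; takes the string as its first argument.
strip_last_tag_sed() {
    # Delete a tag plus any trailing run of characters that are not
    # letters, digits, ',', '.', ':', '<' or '>', anchored at end of line.
    # If the trailing text contains a protected character, nothing matches.
    printf '%s\n' "$1" | sed 's/<[^<>]*>[^a-zA-Z0-9,.:<>]*$//'
}

strip_last_tag_sed 'hello<tag1>-+'   # prints: hello
strip_last_tag_sed 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_last_tag_sed 'hello<tag1>'     # prints: hello
```

Because the expression is anchored with $ and the bracket expression may match zero characters, the case with nothing after the tag is handled too.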
Last edited by ioerror; 04-20-2006 at 02:05 PM.
|
|
|
|
04-20-2006, 02:15 PM
|
#5
|
|
Member
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75
Original Poster
Rep:
|
ioerror: No luck, doesn't work in bash
Hobbletoe: Actually, what you gave me comes REALLY close to working for my needs (I thought it didn't because I mistyped it the first time I tried it). I'm trying to tweak the second half of it now to see if I can get the sort of results I'm looking for. So far, I've not been able to get the end-text (anything after the tag) to be reliably eliminated/preserved depending on whether it lacks/has the characters I want to keep.
EDIT:
Ok, for those interested, I came up with a solution. It's clumsier than I'd like, but it seems to work:
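HEAD=${ORIGSTRING%\<*\>*}    # everything before the last tag
TAIL=${ORIGSTRING##*\>}      # everything after the last ">"
case $TAIL in
  *[a-zA-Z0-9,.:]*) RESULT=$ORIGSTRING ;;   # tail has a character to keep
  *) RESULT=$HEAD ;;                        # otherwise drop the tag and tail
esac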
Still wish there was a one-liner way I could figure out to do it...
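For reuse, the same case-based logic can be wrapped in a function and checked against the three examples from post #3. This is only a sketch; the name strip_last_tag is an assumption made for this example.

```shell
# strip_last_tag is a hypothetical name for this sketch.
strip_last_tag() {
    local orig head tail
    orig=$1
    head=${orig%\<*\>*}   # everything before the last tag
    tail=${orig##*\>}     # everything after the last ">"
    case $tail in
        *[a-zA-Z0-9,.:]*) printf '%s\n' "$orig" ;;  # tail has a protected character
        *)                printf '%s\n' "$head" ;;  # tail is empty or has none
    esac
}

strip_last_tag 'hello<tag1>-+'   # prints: hello
strip_last_tag 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_last_tag 'hello<tag1>'     # prints: hello
```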
Last edited by King V; 04-20-2006 at 03:12 PM.
|
|
|
|
04-20-2006, 02:56 PM
|
#6
|
|
Member
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536
Rep:
|
Quote:
|
ioerror: No luck, doesn't work in bash
|
I know it doesn't! It's zsh-specific syntax. I don't know if bash has any equivalent. I just looked at the man page and can't find anything except the ksh-style +(...), but that doesn't seem to work in a parameter expansion (in bash anyway; I just tried it in zsh too and it works OK).
Maybe you should just upgrade to zsh? 
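For what it's worth, the ksh-style patterns in bash are only active after `shopt -s extglob`, which is off by default in scripts and may be why the +(...) attempt did not work. With it enabled, a one-liner along these lines seems possible. This is a sketch that assumes bash and uses *(...), meaning "zero or more", rather than +(...):

```shell
shopt -s extglob   # enables ksh-style patterns such as *(...) and +(...)

ORIGSTRING='hello<tag1>-+'
# *([^...]) matches zero or more characters outside the protected set,
# so a tag with nothing after it is stripped as well.
TEMP=${ORIGSTRING%\<*\>*([^a-zA-Z0-9,.:])}
echo "$TEMP"   # prints: hello
```

With ORIGSTRING='hello<tag1>+a' the pattern cannot match, because the "a" is in the protected set, so the string is left unchanged.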
|
|
|
|
04-20-2006, 03:13 PM
|
#7
|
|
Member
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75
Original Poster
Rep:
|
Erp! Right, I meant to say it doesn't work because I couldn't figure out how to translate it. Oddly, though, the zsh syntax didn't give me any sort of errors.
|
|
|
|
04-20-2006, 03:21 PM
|
#8
|
|
Moderator
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
> echo $temp
hello<tag1>+
> echo ${temp%<*>[^[:alpha:]]}
hello
> temp='hello<tag>+a'
> echo ${temp%<*>[^[:alpha:]]}
hello<tag>+a
A more complicated pattern may require regular expressions (or extended patterns) rather than plain file globbing patterns, so that "*" means zero or more of the previous character instead of zero or more of any character.
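The difference matters for the pattern in the original question. In a glob, a bracket expression matches exactly one character and "*" matches any string, so `[^...]*` reads as "one character outside the set, then anything at all". A small sketch with made-up sample strings:

```shell
# Case 1: nothing follows the tag, so the mandatory single character
# has nothing to match and the string is left alone.
s='hello<tag1>'
r1=${s%\<*\>[^a-zA-Z0-9,.:]*}
echo "$r1"   # prints: hello<tag1>

# Case 2: "-" satisfies the bracket expression and "*" takes the rest.
s='hello<tag1>-+'
r2=${s%\<*\>[^a-zA-Z0-9,.:]*}
echo "$r2"   # prints: hello

# Case 3: "*" also swallows protected characters after the first one,
# so this string is stripped even though its tail contains a letter.
s='hello<tag1>-a'
r3=${s%\<*\>[^a-zA-Z0-9,.:]*}
echo "$r3"   # prints: hello
```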
Last edited by jschiwal; 04-20-2006 at 03:29 PM.
|
|
|
|
04-20-2006, 04:19 PM
|
#9
|
|
Member
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536
Rep:
|
Quote:
|
the zsh syntax didn't give me any sort of errors.
|
This is probably because the ## is actually valid syntax in bash. It's used in ${...##...} parameter expansion, which zsh also understands. But zsh has another meaning for ## depending on the context, and it's this alternate meaning we're using here.
I think I've achieved what you want with zsh, but I can't get it to work in bash either. Of course, you don't need to change shells just to write a script, you can just use a '#!/what/ever/zsh' instead of the usual '#!/bin/sh' or '#!/bin/bash'.
Failing that, I think your best bet is to use regex (sed or perl), as I can't see any way to do this with bash.
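One regex option that stays inside the shell is the `[[ string =~ regex ]]` operator, assuming a bash recent enough to have it (3.0 or later). The sketch below is untested against the original data, and the function name strip_last_tag_re is invented for this example.

```shell
# Hypothetical helper; takes the string as its first argument.
strip_last_tag_re() {
    # Keep the regex in a variable so the shell does not reinterpret it.
    # Group 1 is greedy, so the tag that follows it is the last one.
    local re='^(.*)<[^<>]*>[^a-zA-Z0-9,.:<>]*$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # text before the last tag
    else
        printf '%s\n' "$1"                   # no match: leave unchanged
    fi
}

strip_last_tag_re 'hello<tag1>-+'   # prints: hello
strip_last_tag_re 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_last_tag_re 'hello<tag1>'     # prints: hello
```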
|
|
|
|
04-21-2006, 03:15 PM
|
#10
|
|
Member
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75
Original Poster
Rep:
|
Well, in any case, I have a solution that works in bash (see the edit in post #5 of this thread). It's just sorta clumsy.
On the other hand, it's actually kind of easy to follow.
|
|
|
|