String manipulation with BASH script
All,
I've run into a bit of trouble with some string manipulation I'm trying to do. In my script, I have a string stored in $TEMP. This is read from a file, and is in an HTML-like format. What I want to do is strip off an individual HTML tag from the end. The original string will look like:
<some html tag>some text<another tag>more text<a final tag>zero-or-more-chars
The last tag and the zero-or-more-chars part should be deleted ONLY if the zero-or-more-chars does NOT contain any of the following: a-z A-Z 0-9 , . :
So basically I know I want to tell it to remove the last instance of \<*\> followed by zero or more characters that are NOT what I've listed above. I tried the following:
Code:
TEMP=${ORIGSTRING%\<*\>[^a-zA-Z0-9\,\.\:*]*}
And it works EXCEPT in cases where there are no extra characters after the tag. What am I doing wrong? Also, do I have an extraneous asterisk? Thanks in advance. |
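On the "what am I doing wrong" question: in shell pattern matching, `*` is not a repetition operator. A bracket expression matches exactly one character, and a `*` after it matches any string, so `[^...]*` reads as "one character outside the set, then anything at all". The `*` inside the brackets is just a literal added to the excluded set. Here is a small sketch of the effect, assuming bash; `try_pattern` is a hypothetical helper name, and the pattern is the one from the post with the unneeded backslashes and the inner asterisk dropped:

```shell
#!/usr/bin/env bash
# try_pattern is a hypothetical helper, not something from the thread.
# It applies the suffix-removal pattern from the question to its argument.
try_pattern() {
    local s=$1
    # <*>    : a tag
    # [^...] : exactly ONE character outside the set (not "zero or more")
    # *      : any string at all, letters and digits included
    printf '%s\n' "${s%<*>[^a-zA-Z0-9,.:]*}"
}

try_pattern 'hello<tag1>-+'   # tag and tail are removed
try_pattern 'hello<tag1>'     # nothing follows the tag, so no match: unchanged
try_pattern 'hello<tag1>+a'   # also stripped, because the trailing * swallows the "a"
```

The third call shows that the pattern strips too much as well as too little: once one character outside the set follows the tag, the rest of the tail is never checked.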
I don't believe that it can be done as such, since Bash matches from the start or the end of the string, not in between (unless you know the positions). Maybe this would work for you.
Code:
TEMP=${ORIGSTRING%\<*\>*}${ORIGSTRING##*\>} |
I should clarify, because I realize the explanation of what I want to do is a bit self-contradictory.
Only the last tag AND any characters following the last tag should be stripped, unless the characters after the tag contain: a-z A-Z 0-9 : , .
If one of those characters DOES exist after the last tag, then the string should be left alone, unchanged. So, for example:
hello<tag1>-+ should result in: hello
hello<tag1>+a should result in: hello<tag1>+a
hello<tag1> should result in: hello |
This should work in zsh (sorry, I'm not familiar with bash, so I don't know how to convert it).
Code:
TEMP=${ORIGSTRING/%\>[^[:alnum:],.:]##/>}
Note: It's the ## that bash doesn't understand. In zsh, this means "one or more of the preceding character/pattern" (i.e. equivalent to + in standard regex). Bash probably has some way to do this, man page should tell you.
Note 2: If you want to try this in zsh, I think you need 'setopt extendedglob' to activate the ##.
Note 3: If you have no luck with bash, you might want to use sed, as regular expressions are far more powerful than shell pattern matching. |
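To make the sed suggestion concrete, here is a minimal sketch called from a shell script. It assumes a tag never contains `<` or `>` inside it, and `strip_with_sed` is a hypothetical name. The regex is anchored to the end of the line with `$`, so the tag is removed only when everything after it lies outside a-z A-Z 0-9 , . :

```shell
#!/usr/bin/env bash
# strip_with_sed is a hypothetical helper for illustration.
strip_with_sed() {
    # <[^<>]*>          : one tag (assumed to contain no nested < or >)
    # [^a-zA-Z0-9,.:]*  : zero or more characters outside the "keep" set
    # $                 : ...all the way to the end of the line
    printf '%s\n' "$1" | sed 's/<[^<>]*>[^a-zA-Z0-9,.:]*$//'
}

strip_with_sed 'hello<tag1>-+'   # hello
strip_with_sed 'hello<tag1>+a'   # hello<tag1>+a
strip_with_sed 'hello<tag1>'     # hello
```

In a regex, `*` does mean "zero or more of the previous item", which is why the tag-only case works here without special handling.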
ioerror: No luck, doesn't work in bash
Hobbletoe: Actually, what you gave me comes REALLY close to working for my needs (I thought it didn't because I mistyped it the first time I tried it). I'm trying to tweak the second half of it now to see if I can get the sort of results I'm looking for. So far, I've not been able to get the end-text (anything after the tag) to be reliably eliminated/preserved depending on whether it lacks/has the characters I want to keep.
EDIT: Ok, for those interested, I came up with a solution. It's clumsier than I'd like, but it seems to work:
Code:
HEAD=${ORIGSTRING%\<*\>*}
TAIL=${ORIGSTRING##*\>}
case $TAIL in
*[a-zA-Z0-9\,\.\:]*) RESULT=$ORIGSTRING;;
*) RESULT=$HEAD
esac
Still wish there was a one-liner way I could figure out to do it... |
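On the wish for a one-liner: bash's extglob option adds `*(pattern)` for "zero or more occurrences" (and `+(pattern)` for "one or more", the counterpart of zsh's `##`). This is a hedged sketch, assuming bash with extglob turned on; `strip_last_tag` and `pat` are names made up for this example. The pattern is kept in a variable so that it is interpreted when the expansion runs, rather than relying on extglob already being enabled when the line is parsed:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables *(...), +(...), ?(...), @(...), !(...)

# strip_last_tag is a hypothetical name for this example.
strip_last_tag() {
    local s=$1
    # <*>                 : a tag
    # *([^a-zA-Z0-9,.:])  : zero or more characters outside the "keep" set
    local pat='<*>*([^a-zA-Z0-9,.:])'
    # % removes the shortest matching suffix; if nothing matches, s is unchanged.
    printf '%s\n' "${s%$pat}"
}

strip_last_tag 'hello<tag1>-+'   # hello
strip_last_tag 'hello<tag1>+a'   # hello<tag1>+a
strip_last_tag 'hello<tag1>'     # hello
```

Because `%` takes the shortest matching suffix, only the last tag is affected when the string contains several tags.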
Quote:
Maybe you should just upgrade to zsh? ;) |
Erp! Right, I meant to say it doesn't work because I couldn't figure out how to translate it. Oddly, though, the zsh syntax didn't give me any sort of errors.
|
> echo $temp
hello<tag1>+
> echo ${temp%<*>[^[:alpha:]]}
hello
> temp='hello<tag>+a'
> echo ${temp%<*>[^[:alpha:]]}
hello<tag>+a
A more complicated pattern may require enabling the option for regular expressions rather than file globbing patterns, so that "*" means zero or more of the previous character instead of zero or more of any character. |
Quote:
I think I've achieved what you want with zsh, but I can't get it to work in bash either. Of course, you don't need to change shells just to write a script; you can just use a '#!/what/ever/zsh' instead of the usual '#!/bin/sh' or '#!/bin/bash'. Failing that, I think your best bet is to use regex (sed or perl), as I can't see any way to do this with bash. |
Well, in any case, I have a solution that works in bash (see the edit of my 2nd post in this thread). It's just sorta clumsy.
On the other hand, it's actually kind of easy to follow. |