LinuxQuestions.org

King V 04-20-2006 11:26 AM

String manipulation with BASH script
 
All,

I've run into a bit of trouble with some string manipulation I'm trying to do.

In my script, I have a string stored in $TEMP. This is read from a file, and is in an HTML-like format.

What I want to do is strip off an individual HTML-tag from the end.

The original string will look like:

<some html tag>some text<another tag>more text<a final tag>zero-or-more-chars


The last tag and zero or more chars part should be deleted ONLY if the zero-or-more-chars does NOT contain any of the following:
a-z
A-Z
0-9
,
.
:

So basically I know I want to tell it to remove the last instance of \<*\> followed by zero or more characters that are NOT what I've listed above.

I tried the following:

TEMP=${ORIGSTRING%\<*\>[^a-zA-Z0-9\,\.\:*]*}


And it works EXCEPT in cases where there are no extra characters after the tag.

What am I doing wrong? Also, do I have an extraneous asterisk?

Thanks in advance.
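
A note on why the attempt behaves this way, as a minimal sketch assuming bash. In a shell glob, a bracket expression matches exactly one character, and the * after it means "any string", not "repeat the bracket". So the pattern needs at least one character after the tag, and only that first character is checked against the list. The * inside the brackets is just one more literal character in the set. The helper name try_pattern is made up for illustration.

```bash
#!/usr/bin/env bash
# try_pattern is a hypothetical helper: it applies the pattern from the post
# to its argument and prints the result.
try_pattern() {
    local ORIGSTRING=$1
    printf '%s\n' "${ORIGSTRING%\<*\>[^a-zA-Z0-9\,\.\:*]*}"
}

try_pattern 'hello<tag1>-+'   # tag and tail are removed
try_pattern 'hello<tag1>'     # unchanged: no character after ">" for the bracket to match
try_pattern 'hello<tag1>+a'   # also stripped: only "+" is tested, "a" falls under the final *
```

The third case suggests the pattern is too eager as well as too strict: a tail that contains a wanted character is still removed when its first character is not in the list.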

Hobbletoe 04-20-2006 12:09 PM

I don't believe it can be done as such, since Bash pattern removal works from the start or the end of a string, not in between (unless you know the positions). Maybe this would work for you.

Code:

TEMP=${ORIGSTRING%\<*\>*}${ORIGSTRING##*\>}

That should give you what you are looking for.
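
To see what the two expansions do separately, here is a small sketch (the sample string is made up). % removes the shortest matching suffix and ## removes the longest matching prefix, so the first half is everything before the last tag and the second half is everything after the last ">".

```bash
#!/usr/bin/env bash
ORIGSTRING='one<a>two<b>-+'        # hypothetical sample input

HEAD=${ORIGSTRING%\<*\>*}          # drop the shortest suffix of the form <...>...
TAIL=${ORIGSTRING##*\>}            # drop the longest prefix ending in ">"

printf 'HEAD=%s\n' "$HEAD"         # HEAD=one<a>two
printf 'TAIL=%s\n' "$TAIL"         # TAIL=-+
printf 'joined=%s\n' "$HEAD$TAIL"  # joined=one<a>two-+
```

Joined together, the two halves remove only the tag itself and keep the trailing text, whatever it contains.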

King V 04-20-2006 12:28 PM

I should clarify, because I realize the explanation of what I want to do is a bit self-contradictory.

Only the last tag AND any characters following the last tag should be stripped, unless the characters after the tag contain:
a-z
A-Z
0-9
:
,
.

If one of those characters DOES exist after the last tag, then the string should be left alone, unchanged.

So, for example:

hello<tag1>-+

should result in:

hello




hello<tag1>+a

should result in:

hello<tag1>+a




hello<tag1>

should result in:

hello

ioerror 04-20-2006 01:56 PM

This should work in zsh (sorry, I'm not familiar with bash, so I don't know how to convert it).

Code:

TEMP=${ORIGSTRING/%\>[^[:alnum:],.:]##/>}

Edit:
Note: It's the ## that bash doesn't understand. In zsh, this means "one or more of the preceding character/pattern" (i.e. equivalent to + in standard regex). Bash probably has some way to do this; the man page should tell you.

Note 2: If you want to try this in zsh, I think you need 'setopt extendedglob' to activate the ##.

Note 3: If you have no luck with bash, you might want to use sed, as regular expressions are far more powerful than shell pattern matching.

King V 04-20-2006 02:15 PM

ioerror: No luck, doesn't work in bash

Hobbletoe: Actually, what you gave me comes REALLY close to working for my needs (I thought it didn't because I mistyped it the first time I tried it). I'm trying to tweak the second half of it now to see if I can get the sort of results I'm looking for. So far, I've not been able to get the end-text (anything after the tag) to be reliably eliminated/preserved depending on whether it lacks/has the characters I want to keep.

EDIT:

Ok, for those interested, I came up with a solution. It's clumsier than I'd like, but it seems to work:

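# Everything before the last tag
HEAD=${ORIGSTRING%\<*\>*}
# Everything after the last ">"
TAIL=${ORIGSTRING##*\>}
case $TAIL in
    # The tail contains a character worth keeping: leave the string alone
    *[a-zA-Z0-9,.:]*) RESULT=$ORIGSTRING ;;
    # Otherwise drop the last tag and whatever follows it
    *) RESULT=$HEAD ;;
esac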


Still wish there was a one-liner way I could figure out to do it...
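
For checking the approach above against the three examples given earlier in the thread, it can be wrapped in a function. This is only a sketch, and the name strip_tail is made up:

```bash
#!/usr/bin/env bash
# strip_tail is a hypothetical wrapper around the case-based approach above.
strip_tail() {
    local ORIGSTRING=$1 HEAD TAIL
    HEAD=${ORIGSTRING%\<*\>*}      # everything before the last tag
    TAIL=${ORIGSTRING##*\>}        # everything after the last ">"
    case $TAIL in
        *[a-zA-Z0-9,.:]*) printf '%s\n' "$ORIGSTRING" ;;  # keep as is
        *)                printf '%s\n' "$HEAD" ;;        # strip tag and tail
    esac
}

# The three cases from the clarification post
for s in 'hello<tag1>-+' 'hello<tag1>+a' 'hello<tag1>'; do
    printf '%-16s -> %s\n' "$s" "$(strip_tail "$s")"
done
```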

ioerror 04-20-2006 02:56 PM

Quote:

ioerror: No luck, doesn't work in bash

I know it doesn't! It's zsh-specific syntax. I don't know if bash has any equivalent. I just looked at the man page and can't find anything except the ksh-style +(...), but that doesn't seem to work in a parameter expansion (in bash anyway; I just tried it in zsh too and it works OK).

Maybe you should just upgrade to zsh? ;)
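
For what it's worth, bash's ksh-style operators such as +(...) and *(...) are off by default and only take effect after shopt -s extglob; with that set, they can be used inside a parameter expansion. A minimal sketch, assuming a bash with extglob support (the function name strip_last_tag is made up):

```bash
#!/usr/bin/env bash
shopt -s extglob   # turn on the *(...), +(...), ?(...), @(...), !(...) operators

# strip_last_tag is a hypothetical name. *(...) means "zero or more of",
# so the tail after the tag may be empty. If the tail contains any of the
# listed characters, nothing matches and the string comes back unchanged.
strip_last_tag() {
    printf '%s\n' "${1%<*>*([^a-zA-Z0-9,.:])}"
}

strip_last_tag 'hello<tag1>-+'   # hello
strip_last_tag 'hello<tag1>+a'   # hello<tag1>+a
strip_last_tag 'hello<tag1>'     # hello
```

Outside a function this comes down to a single expansion, TEMP=${ORIGSTRING%<*>*([^a-zA-Z0-9,.:])}, provided extglob has been enabled on an earlier line.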

King V 04-20-2006 03:13 PM

Erp! Right, I meant to say it doesn't work because I couldn't figure out how to translate it. Oddly, though, the zsh syntax didn't give me any sort of errors.

jschiwal 04-20-2006 03:21 PM

> echo $temp
hello<tag1>+
> echo ${temp%<*>[^[:alpha:]]}
hello
> temp='hello<tag>+a'
> echo ${temp%<*>[^[:alpha:]]}
hello<tag>+a

A more complicated pattern may require regular expressions rather than file globbing patterns, so that "*" means zero or more of the previous character instead of zero or more of any character.
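
If regular expressions are wanted without leaving bash, one option is the [[ string =~ regex ]] test, available in bash 3.0 and later. A minimal sketch, with strip_last_tag_re as a made-up name; the regex is kept in a variable to sidestep the quoting differences between bash versions:

```bash
#!/usr/bin/env bash
# strip_last_tag_re is a hypothetical name. The regex captures everything
# before the last tag, then requires a tail made only of characters that are
# NOT in the keep list (">" is excluded so the tag matched is the last one).
strip_last_tag_re() {
    local s=$1
    local re='^(.*)<[^>]*>[^a-zA-Z0-9,.:>]*$'
    if [[ $s =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # text before the last tag
    else
        printf '%s\n' "$s"                   # no match: leave unchanged
    fi
}

strip_last_tag_re 'hello<tag1>-+'   # hello
strip_last_tag_re 'hello<tag1>+a'   # hello<tag1>+a
strip_last_tag_re 'hello<tag1>'     # hello
```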

ioerror 04-20-2006 04:19 PM

Quote:

the zsh syntax didn't give me any sort of errors.

This is probably because the ## is actually valid syntax in bash. It's used in ${...##...} parameter expansion, which zsh also understands. But zsh has another meaning for ## depending on the context, and it's this alternate meaning we're using here.

I think I've achieved what you want with zsh, but I can't get it to work in bash either. Of course, you don't need to change shells just to write a script, you can just use a '#!/what/ever/zsh' instead of the usual '#!/bin/sh' or '#!/bin/bash'.

Failing that, I think your best bet is to use regex (sed or perl) as I can't see any way to do this with bash.
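
As a rough idea of what the sed route could look like, here is a sketch that uses only basic regular expression syntax. The function name strip_last_tag_sed is made up, and the expression is one possible reading of the requirements rather than a tested drop-in:

```bash
#!/usr/bin/env bash
# strip_last_tag_sed is a hypothetical name. The expression deletes a tag
# followed, up to end of line, only by characters outside the keep list.
# ">" is excluded from the tail so that the tag matched is the last one.
strip_last_tag_sed() {
    printf '%s\n' "$1" | sed 's/<[^>]*>[^a-zA-Z0-9,.:>]*$//'
}

strip_last_tag_sed 'hello<tag1>-+'   # hello
strip_last_tag_sed 'hello<tag1>+a'   # hello<tag1>+a
strip_last_tag_sed 'hello<tag1>'     # hello
```

In a script this would be used as TEMP=$(strip_last_tag_sed "$ORIGSTRING"), at the cost of one extra process per string.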

King V 04-21-2006 03:15 PM

Well, in any case, I have a solution that works in bash (see the edit to my earlier post in this thread). It's just sorta clumsy.

On the other hand, it's actually kind of easy to follow.


All times are GMT -5.