LinuxQuestions.org
Old 04-20-2006, 11:26 AM   #1
King V
Member
 
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75

Rep: Reputation: 15
String manipulation with BASH script


All,

I've run into a bit of trouble with some string manipulation I'm trying to do.

In my script, I have a string stored in $TEMP. This is read from a file, and is in an HTML-like format.

What I want to do is strip off an individual HTML-tag from the end.

The original string will look like:

<some html tag>some text<another tag>more text<a final tag>zero-or-more-chars


The last tag and zero or more chars part should be deleted ONLY if the zero-or-more-chars does NOT contain any of the following:
a-z
A-Z
0-9
,
.
:

So basically I know I want to tell it to remove the last instance of \<*\> followed by zero or more characters that are NOT what I've listed above.

I tried the following:

TEMP=${ORIGSTRING%\<*\>[^a-zA-Z0-9\,\.\:*]*}


It works EXCEPT in cases where there are no extra characters after the tag.

What am I doing wrong? Also, do I have an extraneous asterisk?

Thanks in advance.
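
For anyone reading along, here is a minimal sketch of how that pattern behaves, assuming bash. A bracket expression such as [^...] always consumes exactly one character, so a tag with nothing after it cannot match, and the * inside the brackets is a literal asterisk rather than a wildcard. The helper name try_pattern is made up for this illustration, and the backslashes are dropped because the expansion is double-quoted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies the pattern from the post to one string.
# The character set is the same as in the post, minus the redundant backslashes.
try_pattern() {
    local s=$1
    printf '%s\n' "${s%<*>[^a-zA-Z0-9,.:*]*}"
}

try_pattern 'hello<tag1>-+'   # tag and tail are stripped
try_pattern 'hello<tag1>'     # unchanged: [^...] needs one character after ">"
try_pattern 'hello<tag1>+a'   # stripped as well: the trailing * swallows the "a"
```

The third call suggests the trailing * is the extraneous part: once one non-listed character has matched, it accepts anything at all, including the characters that were supposed to protect the string.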
 
Old 04-20-2006, 12:09 PM   #2
Hobbletoe
Member
 
Registered: Sep 2004
Location: Dayton, Oh
Distribution: Linux Mint 10, Linux Mint 11
Posts: 148

Rep: Reputation: 18
I don't believe it can be done as such, since Bash matches from the start or the end of a string, not in between (unless you know the positions). Maybe this would work for you.

Code:
TEMP=${ORIGSTRING%\<*\>*}${ORIGSTRING##*\>}
That should give you what you are looking for.
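
A small sketch of what that expression produces, assuming bash. The first expansion keeps everything before the last tag and the second keeps everything after the last ">", so the tag itself is dropped but the text following it is kept. The helper name drop_last_tag is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping the expression from the post above.
drop_last_tag() {
    local s=$1
    # "${s%<*>*}" : everything before the last tag
    # "${s##*>}"  : everything after the last ">"
    printf '%s\n' "${s%<*>*}${s##*>}"
}

drop_last_tag 'hello<tag1>-+'        # prints: hello-+
drop_last_tag '<b>hi<i>there</i>!'   # prints: <b>hi<i>there!
```

Deciding whether the trailing text should also go still needs a separate test on the second part.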
 
Old 04-20-2006, 12:28 PM   #3
King V
Member
 
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75

Original Poster
Rep: Reputation: 15
I should clarify, because I realize the explanation of what I want to do is a bit self-contradictory.

Only the last tag AND any characters following the last tag should be stripped, unless the characters after the tag contain:
a-z
A-Z
0-9
:
,
.

If one of those characters DOES exist after the last tag, then the string should be left alone, unchanged.

So, for example:

hello<tag1>-+

should result in:

hello




hello<tag1>+a

should result in:

hello<tag1>+a




hello<tag1>

should result in:

hello
 
Old 04-20-2006, 01:56 PM   #4
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 30
This should work in zsh (sorry, I'm not familiar with bash, so I don't know how to convert it).

Code:
TEMP=${ORIGSTRING/%\>[^[:alnum:],.:]##/>}
Edit:
Note: It's the ## that bash doesn't understand. In zsh, this means "one or more of the preceding character/pattern" (i.e. equivalent to + in standard regex). Bash probably has some way to do this; the man page should tell you.

Note 2: If you want to try this in zsh, I think you need 'setopt extendedglob' to activate the ##.

Note 3: If you have no luck with bash, you might want to use sed, as regular expressions are far more powerful than shell pattern matching.
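
As a rough sketch of the sed route mentioned in Note 3, callable from a bash script: the expression removes the last tag plus its tail, but only when the tail contains none of the characters to keep. It assumes tags never contain a nested "<" or ">", and the helper name strip_with_sed is made up here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete "<tag>tail" at the end of the string when the
# tail consists only of characters outside letters, digits, ',', '.' and ':'.
strip_with_sed() {
    printf '%s\n' "$1" | sed 's/<[^<>]*>[^[:alnum:],.:]*$//'
}

strip_with_sed 'hello<tag1>-+'   # prints: hello
strip_with_sed 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_with_sed 'hello<tag1>'     # prints: hello
```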

Last edited by ioerror; 04-20-2006 at 02:05 PM.
 
Old 04-20-2006, 02:15 PM   #5
King V
Member
 
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75

Original Poster
Rep: Reputation: 15
ioerror: No luck, doesn't work in bash

Hobbletoe: Actually, what you gave me comes REALLY close to working for my needs (I thought it didn't because I mistyped it the first time I tried it). I'm trying to tweak the second half of it now to see if I can get the sort of results I'm looking for. So far, I've not been able to get the end-text (anything after the tag) to be reliably eliminated/preserved depending on whether it lacks/has the characters I want to keep.

EDIT:

Ok, for those interested, I came up with a solution. It's clumsier than I'd like, but it seems to work:

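# Everything before the last tag, and everything after the last ">"
HEAD=${ORIGSTRING%\<*\>*}
TAIL=${ORIGSTRING##*\>}
case $TAIL in
    # The tail has a character worth keeping: leave the string unchanged
    *[a-zA-Z0-9,.:]*) RESULT=$ORIGSTRING ;;
    # Otherwise drop the last tag and whatever follows it
    *) RESULT=$HEAD ;;
esac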


Still wish there was a one-liner way I could figure out to do it...
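
One possible one-liner, sketched under the assumption that bash's extended globbing is available: with shopt -s extglob, the ksh-style *(...) means "zero or more of", so the tail can be restricted to characters outside the keep list. The function name strip_last_tag is hypothetical, and this has only been reasoned through for the three examples from earlier in the thread.

```bash
#!/usr/bin/env bash
# extglob must be switched on before the function below is parsed.
shopt -s extglob

# Hypothetical helper: remove the last tag and its tail, but only when the
# tail has no letters, digits, ',', '.' or ':'.
strip_last_tag() {
    local s=$1
    printf '%s\n' "${s%<*>*([^[:alnum:],.:])}"
}

strip_last_tag 'hello<tag1>-+'   # prints: hello
strip_last_tag 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_last_tag 'hello<tag1>'     # prints: hello
```

In a script the same thing can be written inline as RESULT="${ORIGSTRING%<*>*([^[:alnum:],.:])}" once extglob is enabled.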

Last edited by King V; 04-20-2006 at 03:12 PM.
 
Old 04-20-2006, 02:56 PM   #6
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 30
Quote:
ioerror: No luck, doesn't work in bash
I know it doesn't! It's zsh-specific syntax. I don't know if bash has any equivalent. I just looked at the man page and can't find anything except the ksh-style +(...), but that doesn't seem to work in a parameter expansion (in bash anyway; I just tried it in zsh too and it works OK).

Maybe you should just upgrade to zsh?
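
For reference, the ksh-style +(...) is accepted inside a bash parameter expansion once extended globbing has been enabled with shopt -s extglob. The sketch below is an attempted bash translation of the zsh expression from earlier in the thread; like the zsh version it trims the characters after the final ">" but keeps the tag. The helper name trim_after_tag is made up.

```bash
#!/usr/bin/env bash
# Without this line bash does not recognise +(...) as a pattern operator.
shopt -s extglob

# Hypothetical helper: replace ">" followed by one or more characters outside
# letters, digits, ',', '.' and ':' at the end of the string with a bare ">".
trim_after_tag() {
    local s=$1
    printf '%s\n' "${s/%>+([^[:alnum:],.:])/>}"
}

trim_after_tag 'hello<tag1>-+'   # prints: hello<tag1>
trim_after_tag 'hello<tag1>+a'   # prints: hello<tag1>+a
```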
 
Old 04-20-2006, 03:13 PM   #7
King V
Member
 
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75

Original Poster
Rep: Reputation: 15
Erp! Right, I meant to say it doesn't work because I couldn't figure out how to translate it. Oddly, though, the zsh syntax didn't give me any sort of errors.
 
Old 04-20-2006, 03:21 PM   #8
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
> echo $temp
hello<tag1>+
> echo ${temp%<*>[^[:alpha:]]}
hello
> temp='hello<tag>+a'
> echo ${temp%<*>[^[:alpha:]]}
hello<tag>+a

A more complicated pattern may require something beyond plain file globbing, such as bash's extended patterns (shopt -s extglob) or regular expressions, where "*" means zero or more of the previous item instead of zero or more of any character.
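
A quick sketch of where the simple pattern stops working, assuming bash: [^[:alpha:]] stands for exactly one character, so the expansion only fires when a single non-letter follows the tag. The helper name strip_one_char_tail is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper wrapping the pattern from the transcript above.
strip_one_char_tail() {
    local s=$1
    printf '%s\n' "${s%<*>[^[:alpha:]]}"
}

strip_one_char_tail 'hello<tag1>+'    # one trailing character: prints hello
strip_one_char_tail 'hello<tag1>'     # nothing after the tag: unchanged
strip_one_char_tail 'hello<tag1>-+'   # two trailing characters: unchanged
```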

Last edited by jschiwal; 04-20-2006 at 03:29 PM.
 
Old 04-20-2006, 04:19 PM   #9
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 30
Quote:
the zsh syntax didn't give me any sort of errors.
This is probably because the ## is actually valid syntax in bash. It's used in ${...##...} parameter expansion, which zsh also understands. But zsh has another meaning for ## depending on the context, and it's this alternate meaning we're using here.

I think I've achieved what you want with zsh, but I can't get it to work in bash either. Of course, you don't need to change shells just to write a script, you can just use a '#!/what/ever/zsh' instead of the usual '#!/bin/sh' or '#!/bin/bash'.

Failing that, I think your best bet is to use regex (sed or perl) as I can't see any way to do this with bash.
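
If an external tool is not wanted, a regex can also be tried inside bash itself. This is a sketch that assumes bash 3.0 or later, where [[ string =~ regex ]] and the BASH_REMATCH array are available, and that tags never contain a nested "<" or ">". The function name strip_with_regex is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: if the string ends in "<tag>" plus a tail with no
# letters, digits, ',', '.' or ':', print only the part before that tag.
strip_with_regex() {
    local s=$1
    # Keeping the regex in a variable avoids shell quoting surprises.
    local re='^(.*)<[^<>]*>[^[:alnum:],.:]*$'
    if [[ $s =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$s"
    fi
}

strip_with_regex 'hello<tag1>-+'   # prints: hello
strip_with_regex 'hello<tag1>+a'   # prints: hello<tag1>+a
strip_with_regex 'hello<tag1>'     # prints: hello
```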
 
Old 04-21-2006, 03:15 PM   #10
King V
Member
 
Registered: Oct 2001
Location: New Jersey
Distribution: Mandrake 10.2
Posts: 75

Original Poster
Rep: Reputation: 15
Well, in any case, I have a solution that works in bash (see the edit in post #5 of this thread). It's just sorta clumsy.

On the other hand, it's actually kind of easy to follow.
 
  


All times are GMT -5.
