LinuxQuestions.org

BozMan 07-26-2010 11:24 PM

Tcl: Remove all alphabetic characters at beginning of string
 
Hello all,

I am working with a Tcl script and have some strings in the following format (RE):
[a-zA-Z]+[0-9]{6}-[0-9]

There are some leading letters, a mix of uppercase and lowercase, then six digits, followed by a hyphen and one more digit.

I would like to remove all of the leading alphabetic characters from the string. The resulting string would then be in this format: [0-9]{6}-[0-9]. In other words, six numeric digits, a hyphen, then one more digit.

I have tried:
Code:

set newstr [string trimleft $origstr alpha]
But that only removes the first alphabetic character, not all of them.
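A note on why this happens: the optional second argument of `string trimleft` is a *set of characters* to strip, not a character-class name. So `alpha` means "strip any leading a, l, p or h", which is probably why only part of the prefix disappeared. A minimal sketch that spells out the full set, assuming only ASCII letters occur (the helper name `trim_leading_ascii_letters` is hypothetical):

```tcl
# Hypothetical helper: strip every leading ASCII letter.
# [string trimleft] treats its second argument as a set of characters,
# so the whole alphabet is listed explicitly here.
proc trim_leading_ascii_letters {str} {
    set letters "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return [string trimleft $str $letters]
}

puts [trim_leading_ascii_letters "AbCdE123456-7"]   ;# prints 123456-7

# For comparison: "alpha" is read as the character set {a l p h},
# so the leading "alpha" goes but "BC" stays.
puts [string trimleft "alphaBC123456-7" alpha]       ;# prints BC123456-7
```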

I couldn't get anything with regsub to work correctly, but I am somewhat of a noob with REs in general and regsub in particular.

There are usually 5 leading letters at the beginning of these strings, and I could in most cases get away with using string replace and constant indices to extract the substring.

However, my preference is for this to be robust enough to handle all cases with 1 through n leading alphabetic characters.
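One way to handle 1 through n leading letters without fixed indices or a regular expression is `string is alpha -failindex`, which reports the index of the first character that is not a letter. A minimal sketch (the proc name `drop_leading_letters` is hypothetical; note that `string is alpha` accepts any Unicode letter, not just A-Z):

```tcl
# Hypothetical helper: drop the leading run of letters, however long it is,
# without using a regular expression.
proc drop_leading_letters {str} {
    # [string is alpha] returns 0 at the first non-letter and stores that
    # character's index in idx; if every character is a letter (or the
    # string is empty) it returns 1 and idx is left unset.
    if {[string is alpha -failindex idx $str]} {
        return ""
    }
    return [string range $str $idx end]
}

puts [drop_leading_letters "abCDe123456-7"]   ;# prints 123456-7
puts [drop_leading_letters "X000001-2"]       ;# prints 000001-2
```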

Wim Sturkenboom 07-26-2010 11:55 PM

The following code might help (it's part of a regular expression tester that I wrote as part of an application so users can test regular expressions).
Code:

proc try_it {regexp str} {

        # global variables; clear them
        global matchstr
        set matchstr ""
        for {set cnt 1} {$cnt <= 10} {incr cnt} {
                global submatch$cnt
                set submatch$cnt ""
        }

        # Match $str against $regexp: the whole match goes into matchstr and
        # each parenthesized group into submatch1 .. submatch10
        set match [regexp $regexp $str matchstr submatch1 submatch2 submatch3 submatch4 submatch5 submatch6 submatch7 submatch8 submatch9 submatch10]
        if {!$match} {
                # Needs Tk and an existing .regexp window as the dialog parent
                tk_messageBox -title "No match" -parent .regexp -message "input string does not match regular expression"
                return
        }
}

'str' is the input under test and 'regexp' is the regular expression. The "set match [regexp ...]" line uses the regular expression to split the string into up to 10 substrings (submatch1 .. submatch10); for your use you can limit it to 2 submatches.

Your regular expression: ^([a-zA-Z]+)([0-9]{6}-[0-9])$
Your string: hallo123456-3
matchstr : hallo123456-3
submatch1 : hallo
submatch2 : 123456-3

The trick lies in the grouping (the parentheses around each part of the expression).
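Outside the tester, the same grouping can be used directly with `regexp`. A minimal sketch; the `->` is just a conventional variable name that receives the whole match, which is not needed here:

```tcl
# Match the whole string and capture the two parts separately:
# group 1 is the letter prefix, group 2 is the part to keep.
set str "hallo123456-3"
set pattern {^([a-zA-Z]+)([0-9]{6}-[0-9])$}

if {[regexp $pattern $str -> letters digits]} {
    puts "letters: $letters"    ;# prints letters: hallo
    puts "digits:  $digits"     ;# prints digits:  123456-3
} else {
    puts "no match: $str"
}
```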

Hope this helps

acvoight 07-27-2010 12:23 AM

My rather poor solution would be:

# write to a new file: redirecting into the input file truncates it before it is read
tr -d '[:alpha:]' < filewithstrings > filewithoutletters

You can probably experiment and such and find a better solution though.

Edit: I didn't realize Tcl was a language; the above is bash, sorry :(.
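For reference, a rough Tcl counterpart of that `tr -d` command, sketched under the assumption of Tcl 8.4 or later (where `regsub` returns the result when no variable name is given). Like `tr -d`, it deletes letters anywhere in the string, which gives the same result here only because these strings carry letters solely at the front:

```tcl
# Delete every alphabetic character anywhere in the string,
# roughly what  tr -d '[:alpha:]'  does to each line.
set origstr "AbCdE123456-7"
set digitsonly [regsub -all {[[:alpha:]]+} $origstr ""]
puts $digitsonly    ;# prints 123456-7
```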

BozMan 07-27-2010 06:05 PM

Thank you to both acvoight and Wim Sturkenboom. Those examples are both useful. Even if not Tcl, still good to know.

I had an epiphany this morning that should work:

Code:

regsub {^[a-zA-Z]+} $strwithletters "" noletters
The $noletters variable then contains just the digits and hyphen, no preceding letters.
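When a variable name is passed, `regsub` returns the number of substitutions it made, which can flag input that had no leading letters. A small sketch around the same pattern (the sample string is made up):

```tcl
set strwithletters "XyZab654321-0"

# regsub returns how many substitutions it made and writes the
# resulting string into the last argument (noletters).
set n [regsub {^[a-zA-Z]+} $strwithletters "" noletters]

if {$n == 0} {
    puts "warning: no leading letters in $strwithletters"
}
puts $noletters    ;# prints 654321-0
```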

I'd still prefer to do this the other way -- search for the pattern that I want to keep, and then store just that pattern into a variable. Right now I am searching for the pattern belonging to the part I want to remove.
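Keeping the wanted part instead can be done with a single capture group: the letters are still matched (so the whole string is validated), but only the digits end up in a variable. A minimal sketch; the proc name `extract_id` is hypothetical:

```tcl
# Hypothetical helper: return only the "NNNNNN-N" tail, or "" if the
# string does not have the expected letters-digits-hyphen-digit shape.
proc extract_id {str} {
    # Only the part to keep is inside parentheses; the letter prefix is
    # matched but not captured.
    if {[regexp {^[a-zA-Z]+([0-9]{6}-[0-9])$} $str -> keep]} {
        return $keep
    }
    return ""
}

foreach s {ABcde123456-7 q000042-9 123456-7 abc12345-6} {
    puts "$s -> '[extract_id $s]'"
}
```

The last two samples come back empty: one has no letters (the pattern requires at least one) and the other has only five digits.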

Wim Sturkenboom's solution of subdividing the entire regular expression (both the letters part and the digits part) into subpatterns is probably the way to go...


All times are GMT -5.