LinuxQuestions.org
Old 07-26-2010, 11:24 PM   #1
BozMan
LQ Newbie
 
Registered: Jan 2007
Posts: 5

Rep: Reputation: 0
Tcl: Remove all alphabetic characters at beginning of string


Hello all,

I am working with a Tcl script and have some strings in the following format (RE):
[a-zA-Z]+[0-9]{6}-[0-9]

There are some leading letters, combinations of capital and lowercase. Then six digits, followed by a hyphen, then one more digit.

I would like to remove all of the leading alphabetic characters from the string. The resulting string would then be in this format: [0-9]{6}-[0-9]. In other words, six numeric digits, a hyphen, then one more digit.

I have tried:
Code:
set newstr [string trimleft $origstr alpha]
But that only removes the first alphabetic character, not all of them.

I couldn't get anything with regsub to work correctly, but I am somewhat of a noob with REs in general and regsub in particular.

There are usually 5 leading letters at the beginning of these strings, and I could in most cases get away with using string replace and constant indices to extract the substring.

However, my preference is for this to be robust enough to handle all cases with 1 through n leading alphabetic characters.
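
A note on the `string trimleft` attempt above: the second argument is read as a literal set of characters to strip, not as the name of a character class. So `alpha` strips only leading `a`, `l`, `p` and `h`. A minimal sketch, assuming a made-up sample value for `origstr`:

```tcl
# Sample input (hypothetical value, chosen for illustration).
set origstr "hallo123456-3"

# "alpha" is read as the character set {a l p h}, so trimming stops at "o".
set partly [string trimleft $origstr alpha]
puts $partly

# Listing every letter explicitly strips the whole alphabetic prefix.
set letters "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
set newstr [string trimleft $origstr $letters]
puts $newstr
```

The explicit letter list works without any regular expression, at the cost of spelling out the alphabet.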
 
Old 07-26-2010, 11:55 PM   #2
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786

Rep: Reputation: 282
The following code might help (it's part of a regular expression tester that I wrote as part of an application so users can test regular expressions).
Code:
proc try_it {regexp str} {

	# global result variables; clear them before each test
	global matchstr
	set matchstr ""
	for {set cnt 1} {$cnt <= 10} {incr cnt} {
		global submatch$cnt
		set submatch$cnt ""
	}

	# regexp returns 1 on a match; the full match goes into matchstr and
	# each parenthesized group into submatch1 .. submatch10
	set match [regexp $regexp $str matchstr submatch1 submatch2 submatch3 submatch4 submatch5 submatch6 submatch7 submatch8 submatch9 submatch10]
	if {!$match} {
		tk_messageBox -title "No match" -parent .regexp -message "input string does not match regular expression"
		return
	}
}
'str' is the input under test and 'regexp' is the regular expression. The regexp call (the line that sets 'match') uses the regular expression to parse the string into up to 10 substrings (submatch1 .. submatch10); for your use you can limit it to 2 submatches.

Your regular expression: ^([a-z]+)([0-9]{6}-[0-9])$
Your string: hallo123456-3
matchstr : hallo123456-3
submatch1 : hallo
submatch2 : 123456-3

The trick lies in the grouping (the parentheses around each part of the expression).

Hope this helps
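
Limited to two submatches, the same idea might look like the sketch below. The sample string is made up, and the letter class is widened to `[a-zA-Z]` on the assumption that the real strings mix upper and lower case, as the original question describes.

```tcl
# Sample input with mixed-case leading letters (hypothetical value).
set str "HaLLo123456-3"

# Group 1 captures the leading letters, group 2 the part to keep.
set re {^([a-zA-Z]+)([0-9]{6}-[0-9])$}

if {[regexp $re $str matchstr submatch1 submatch2]} {
    puts "letters: $submatch1"
    puts "kept:    $submatch2"
} else {
    puts "no match"
}
```

Bracing the pattern keeps Tcl from treating `[a-zA-Z]` as a command substitution.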
 
Old 07-27-2010, 12:23 AM   #3
acvoight
LQ Newbie
 
Registered: Jul 2010
Distribution: Linux Mint
Posts: 21

Rep: Reputation: 1
My rather poor solution would be:

tr -d '[:alpha:]' < filewithstrings > filewithstrings.new

You can probably experiment and such and find a better solution though.

Edit: I didn't realize Tcl was a language; the above is bash, sorry.

Last edited by acvoight; 07-27-2010 at 12:40 AM.
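
For comparison, a rough Tcl counterpart of the `tr -d` idea could use `regsub -all` with the `[[:alpha:]]` class. This is only a sketch on a made-up sample string. Like `tr -d`, it deletes letters anywhere in the string, not just at the start.

```tcl
# Sample input (hypothetical value) with a trailing letter added to show
# that every letter is removed, not only the leading ones.
set line "hallo123456-3x"

# -all repeats the substitution for every match; the result goes into
# the variable named by the last argument.
regsub -all {[[:alpha:]]} $line "" stripped
puts $stripped
```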
 
Old 07-27-2010, 06:05 PM   #4
BozMan
LQ Newbie
 
Registered: Jan 2007
Posts: 5

Original Poster
Rep: Reputation: 0
Thank you to both acvoight and Wim Sturkenboom. Those examples are both useful. Even if not Tcl, still good to know.

I had an epiphany this morning that should work:

Code:
regsub {^[a-zA-Z]+} $strwithletters "" noletters
The $noletters variable then contains just the digits and hyphen, no preceding letters.
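
A small sketch of the same call on a made-up sample value. With a result variable, `regsub` returns the number of substitutions it made. At least in Tcl 8.5 and later, the variable can be left out and the new string is returned directly.

```tcl
# Sample input (hypothetical value).
set strwithletters "ABCde123456-7"

# With a result variable, regsub returns the number of substitutions made.
set count [regsub {^[a-zA-Z]+} $strwithletters "" noletters]
puts "$count substitution(s): $noletters"

# Without a result variable, regsub returns the new string directly.
set direct [regsub {^[a-zA-Z]+} $strwithletters ""]
puts $direct
```

The `^` anchor ensures only the leading run of letters is removed, so `-all` is not needed here.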

I'd still prefer to do this the other way -- search for the pattern that I want to keep, and then store just that pattern into a variable. Right now I am searching for the pattern belonging to the part I want to remove.

Wim Sturkenboom's solution of subdividing the entire regular expression (both the letters part and the digits part) into subpatterns is probably the way to go...
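
One way the "keep what I want" direction might look, as a sketch only: match the whole string, but capture just the digits-and-hyphen part. The helper name `extract_id` and the sample strings are made up for illustration. `->` is an ordinary variable name, used here to discard the full match.

```tcl
# extract_id is a hypothetical helper name, not from the thread.
# Returns the "six digits, hyphen, digit" tail, or "" if the string
# does not have the expected shape.
proc extract_id {str} {
    if {[regexp {^[a-zA-Z]+([0-9]{6}-[0-9])$} $str -> id]} {
        return $id
    }
    return ""
}

puts [extract_id "hallo123456-3"]
puts [extract_id "XyZ000001-9"]
puts [extract_id "123456-3"]   ;# no leading letters, so no match
```

Returning an empty string on a mismatch is a design choice for this sketch; raising an error would work just as well if malformed input should not pass silently.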
 
  



Tags
regular expressions, tcl


All times are GMT -5.