LinuxQuestions.org
Forums > Linux Forums > Linux - Newbie
10-14-2007, 09:59 PM   #1
new_2_unix
awk does not seem to recognize character classes


Hi,

I have a text file with letters and numbers, as follows:

TestString123
AnotherString456

I'm using awk's "sub" function to replace all the letters with "" so that I'm left with just the numbers, as follows:

gawk '{ sub(/TestString/, ""); print $1 }' filename

This works for the first line of the file (the second line doesn't contain "TestString", so it is left unchanged), and I get this output:

123
AnotherString456

My problem is that if I try to use the character class [:alpha:], as follows:

gawk '{ sub(/[[:alpha:]]/, ""); print $1 }' filename

it does not work properly: it removes only part of the letters rather than all of them, as in the following output:

estString123
notherString456

If I try:
gawk '{ sub(/[A-Za-z]/, ""); print $1 }' filename

that also doesn't work properly:

ing123
ing456

I can't figure out why the character class and A-Za-z only partially work, whereas spelling out the word explicitly works.

Any guidance would be really helpful. Thanks!
 
10-14-2007, 10:04 PM   #2
matthewg42
You want to use gsub. sub only substitutes the first occurrence; gsub (the g stands for global) replaces all occurrences.
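A minimal sketch of the sub/gsub difference, assuming a POSIX awk is on the PATH (gawk behaves the same way here). The two sample lines from the question are fed in with printf instead of a file, and [A-Za-z] stands in for [[:alpha:]] only so the sketch also runs on older awks that lack POSIX character classes:

```shell
#!/bin/sh
# The two sample lines from the question, supplied inline instead of a file.
input='TestString123
AnotherString456'

# sub: replaces only the FIRST match of a single letter on each line.
printf '%s\n' "$input" | awk '{ sub(/[A-Za-z]/, ""); print }'

# gsub: replaces EVERY single-letter match on each line, leaving only digits.
printf '%s\n' "$input" | awk '{ gsub(/[A-Za-z]/, ""); print }'
```

The first command should print estString123 and notherString456, matching the output in the question; the second should print 123 and 456.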
 
10-14-2007, 10:32 PM   #3
new_2_unix
Yes! That was it!
Thanks a lot!!
 
10-14-2007, 10:33 PM   #4
Tinkster
sub works as defined. You're asking for ONE occurrence of a single letter
to be removed, and that's what it's doing.

If you want the whole run of letters to disappear, use a +:
Code:
$ gawk '{ sub(/[[:alpha:]]+/, ""); print $1 }' blahblah
123
456
Same with [A-Za-z]: without a + (or *), it will match ONE
(exactly one) character.


Cheers,
Tink
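A sketch of the + versus * point, under the same assumptions (a POSIX awk, input supplied with printf). The third input line, 789Suffix, is a made-up example added only to show where the two quantifiers differ:

```shell
#!/bin/sh
# Two sample lines from the thread plus one hypothetical line (789Suffix).
input='TestString123
AnotherString456
789Suffix'

# +: one or more letters, so sub removes the first real run of letters.
printf '%s\n' "$input" | awk '{ sub(/[A-Za-z]+/, ""); print }'

# *: zero or more letters; on 789Suffix the leftmost match is the empty
# string at the start of the line, so sub replaces nothing there.
printf '%s\n' "$input" | awk '{ sub(/[A-Za-z]*/, ""); print }'
```

Both versions strip the leading letters from the two sample lines, but on 789Suffix the + pattern removes "Suffix" while the * pattern leaves the line unchanged.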
 
10-14-2007, 10:41 PM   #5
new_2_unix
Thanks for that too!
I didn't know that... good thing to learn!
 
10-14-2007, 11:13 PM   #6
Tinkster
Quote:
Originally Posted by new_2_unix
Yes! That was it!
Thanks a lot!!
Actually, in this context sub (with the +) does exactly what you
want. gsub would come in handy if the strings were
something like:

Code:
TestString123AnotherString456
and the expected result were 123456. I'd
say that using gsub in the way described above is
abuse :}


Cheers,
Tink
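A sketch of the concatenated-string case, assuming a POSIX awk; the string is the one from the post, supplied with printf instead of a file:

```shell
#!/bin/sh
# Two runs of letters, each followed by digits, on a single line.
s='TestString123AnotherString456'

# sub removes only the first run of letters.
printf '%s\n' "$s" | awk '{ sub(/[A-Za-z]+/, ""); print }'    # 123AnotherString456

# gsub removes every run, leaving just the digits.
printf '%s\n' "$s" | awk '{ gsub(/[A-Za-z]+/, ""); print }'   # 123456
```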
 
10-15-2007, 06:36 AM   #7
Geist3
I find sed easier for substitutions:

Code:
$ echo st1906ghFU22 > teststring
$ sed 's/[[:alpha:]]//g' teststring
190622
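The g flag in that sed command plays the same role as gsub in awk: without it, s/// replaces only the first match on each line, much like sub. A sketch with the sample string from the post, assuming a sed that supports POSIX character classes (GNU and BSD sed both do):

```shell
#!/bin/sh
s='st1906ghFU22'

# No g flag: only the first letter is deleted (like awk's sub).
printf '%s\n' "$s" | sed 's/[[:alpha:]]//'     # t1906ghFU22

# With g: every letter is deleted (like awk's gsub).
printf '%s\n' "$s" | sed 's/[[:alpha:]]//g'    # 190622
```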
 
  


All times are GMT -5.