LinuxQuestions.org: Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's, this is the place!
10-14-2007, 09:59 PM
#1
LQ Newbie
Registered: Oct 2007
Posts: 26
awk does not seem to recognize character classes
Hi,
I have a text file containing letters and numbers, as follows:
TestString123
AnotherString456
I'm using awk's "sub" function to replace all the letters with "" so that I'm left with just the numbers:
cat filename | gawk '{sub(/TestString/,"")} ; { print $1 }'
This works for the first line of the file, and I get this output:
123
AnotherString456
My problem is that if I use the character class [:alpha:] instead:
cat filename | gawk '{sub(/[[:alpha:]]/,"")} ; { print $1 }'
it does not work properly: it removes only some of the letters, not all of them, as this output shows:
estString123
notherString456
If I try:
cat filename | gawk '{sub(/[A-Za-z]/,"")} ; { print $1 }'
that does not work properly either:
ing123
ing456
I can't figure out why the character class and [A-Za-z] only partially work, whereas spelling out the word explicitly works.
Any guidance would be really helpful. Thanks!
10-14-2007, 10:04 PM
#2
Senior Member
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530
You want to use gsub. sub substitutes only the first occurrence; gsub (the g stands for global) replaces all occurrences.
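To see the difference on the question's own input, here is a minimal sketch, assuming any POSIX awk that supports bracket-expression character classes (gawk, as used in the thread, behaves the same). It feeds the two sample lines through printf instead of reading the file named filename, and prints the whole record rather than $1, which is equivalent here because the lines contain no spaces:

```shell
# sub() replaces only the first match in each line: one letter is removed.
printf 'TestString123\nAnotherString456\n' | awk '{ sub(/[[:alpha:]]/, ""); print }'
# estString123
# notherString456

# gsub() replaces every match in each line: all letters are removed.
printf 'TestString123\nAnotherString456\n' | awk '{ gsub(/[[:alpha:]]/, ""); print }'
# 123
# 456
```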
10-14-2007, 10:32 PM
#3
LQ Newbie
Registered: Oct 2007
Posts: 26
Original Poster
Yes! That was it!
Thanks a lot!!
10-14-2007, 10:33 PM
#4
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Originally Posted by new_2_unix
i can't figure out why the character classes or A-Za-z are only partially working whereas if i spell out the word explicitly, it works.
Works as defined. You're asking for ONE occurrence to be removed, and that's what it's doing.
If you want ALL the letters to disappear, add a + (one or more), so the pattern matches the whole run of letters at once:
Code:
gawk '{sub(/[[:alpha:]]+/,"")} ; { print $1 }' blahblah
123
456
Same with [A-Za-z]: without a + (or *), it matches exactly ONE character.
Cheers,
Tink
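The quantifier fix can be checked the same way. A minimal sketch, assuming a POSIX awk and input piped with printf rather than read from the file (blahblah) used above; because awk regexes pick the leftmost, longest match, [[:alpha:]]+ swallows the whole leading run of letters in a single sub() call:

```shell
# [[:alpha:]]+ matches the entire run "TestString" / "AnotherString",
# so one sub() per line removes all the letters in these inputs.
printf 'TestString123\nAnotherString456\n' | awk '{ sub(/[[:alpha:]]+/, ""); print }'
# 123
# 456

# [A-Za-z]+ behaves the same way on this ASCII input.
printf 'TestString123\nAnotherString456\n' | awk '{ sub(/[A-Za-z]+/, ""); print }'
# 123
# 456
```

This works only because the letters form one contiguous block at the start of each line; sub() still stops after that first run.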
10-14-2007, 10:41 PM
#5
LQ Newbie
Registered: Oct 2007
Posts: 26
Original Poster
Thanks for that too!
I didn't know that... good thing to learn!
10-14-2007, 11:13 PM
#6
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Originally Posted by new_2_unix
Yes! That was it!
Thanks a lot!!
Actually, in this context sub (with the + from my earlier post) does exactly what you want. gsub would come in handy if the strings were something like:
Code:
TestString123AnotherString456
and your expected result were 123456. I'd say that using gsub the way described above is abuse :}
Cheers,
Tink
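Tink's distinction shows up clearly on the combined string. A minimal sketch, assuming a POSIX awk: even with the + quantifier, sub() stops after the first run of letters, while gsub() removes every run:

```shell
line='TestString123AnotherString456'

# sub() replaces only the first run of letters ("TestString").
echo "$line" | awk '{ sub(/[[:alpha:]]+/, ""); print }'
# 123AnotherString456

# gsub() replaces both runs, leaving only the digits.
echo "$line" | awk '{ gsub(/[[:alpha:]]+/, ""); print }'
# 123456
```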
10-15-2007, 06:36 AM
#7
Member
Registered: Oct 2003
Location: Richmond, Virginia USA
Distribution: Slackware 12.2
Posts: 59
I find sed easier for substitutions:
echo st1906ghFU22 > teststring
sed 's/[[:alpha:]]//g' teststring
190622
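sed's s command draws the same line as awk's sub() and gsub(): without the trailing g flag, only the first match on each line is replaced. A quick sketch on the string from the post above, assuming a sed that supports POSIX character classes (GNU and BSD sed do), with the string piped via echo instead of stored in the teststring file:

```shell
# Without g, s/// replaces only the first letter (like awk's sub()).
echo st1906ghFU22 | sed 's/[[:alpha:]]//'
# t1906ghFU22

# With g, every letter is replaced (like awk's gsub()).
echo st1906ghFU22 | sed 's/[[:alpha:]]//g'
# 190622
```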
All times are GMT -5.