LinuxQuestions.org
Old 04-20-2013, 11:11 AM   #1
brgr88
Member
 
Registered: Apr 2006
Distribution: Slackware 14.2
Posts: 48

Rep: Reputation: 15
POSIX Character Classes - Displaying Member Characters?


Is there a way to print the member characters of the various POSIX character classes?
I'm essentially looking for a portable way to list the characters in the classes for use in character sets other than ASCII.

For instance, something like:
echo [:graph:] > file

to give me all printables except space, and so forth.
 
Old 04-21-2013, 11:28 PM   #2
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I'm not sure, but perhaps tput could help you. If not, I guess you could examine grep's source code.
 
Old 04-23-2013, 09:35 AM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,846

Rep: Reputation: 2007
That's an interesting question, and I've wondered about it myself in the past. Unfortunately I don't think there's anything like a simple command available. You'll probably just have to do some testing with the encodings you're interested in.

After a bit of internet snooping I came across this page describing the Unicode Collation Algorithm:

http://www.unicode.org/reports/tr10/

I've only just glanced through it myself, but I think it's safe to say that the answer isn't as simple as pulling up a chart listing the correct order. In particular section 1.8 points out that collation is a function of the language setting, not the character set or encoding used. So if there's an answer anywhere, it's likely buried somewhere in the various language definition files.
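As far as I know, membership in a class like [:alpha:] comes from the locale's LC_CTYPE category, so one way to experiment is to ask grep the same question under different locales. A minimal sketch only: the helper name in_class and the locale en_US.UTF-8 are my own assumptions (check locale -a for what is installed, since a locale that isn't available typically just behaves like C):

```shell
#!/bin/bash
# in_class LOCALE CLASS CHAR
# Hypothetical helper: succeeds if CHAR matches [[:CLASS:]] when grep runs under LOCALE.
in_class() {
    printf '%s\n' "$3" | LC_ALL="$1" grep -q "^[[:$2:]]\$"
}

# The same character can be alphabetic in one locale and not in another.
# en_US.UTF-8 is only a guess at what is installed on your system.
for loc in C en_US.UTF-8
do
    if in_class "$loc" alpha 'é'; then
        echo "$loc: é is in [:alpha:]"
    else
        echo "$loc: é is not in [:alpha:]"
    fi
done
```

Swapping in other class names and characters should show where the locales you care about disagree.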
 
Old 04-23-2013, 06:59 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,762

Rep: Reputation: 612
Quote:
Originally Posted by brgr88
Is there a way to print the member characters of the various POSIX character classes?
Perhaps I don't understand the nuances of the problem. Here's an approach which might be useful. I don't know enough bash to write a character string containing every character and have used this ...
Code:
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
... to represent such a string. See what can be done with it. This code ...
Code:
# The sed step puts one character per line (\n in the replacement needs GNU sed).
# The bracket expressions are quoted so the shell cannot glob-expand them.
echo; echo "This is the character class [:digit:]"
printf "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  \
|sed 's/./&\n/g'  \
|grep '[[:digit:]]' \
|paste -s -d"\0"

echo; echo "This is the character class [:upper:]"
printf "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  \
|sed 's/./&\n/g'  \
|grep '[[:upper:]]' \
|paste -s -d"\0"

echo; echo "This is the character class [:lower:]"
printf "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  \
|sed 's/./&\n/g'  \
|grep '[[:lower:]]' \
|paste -s -d"\0"
... produces this result ...
Code:
This is the character class [:digit:]
0123456789

This is the character class [:upper:]
ABCDEFGHIJKLMNOPQRSTUVWXYZ

This is the character class [:lower:]
abcdefghijklmnopqrstuvwxyz
These three similar code snippets may be combined ...
Code:
for j in digit upper lower
  do
    echo; echo "This is the character class [:$j:]"
    printf "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  \
    |sed 's/./&\n/g'  \
    |grep "[[:$j:]]" \
    |paste -s -d"\0"
  done
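The candidate string does not have to be typed by hand. Here is a rough sketch that generates every printable ASCII character (codes 32 to 126) with printf, so classes like punct can be checked too. ascii_chars is just a name I made up, and this still covers ASCII only:

```shell
#!/bin/bash
# Hypothetical helper: print the 95 printable ASCII characters (space through ~).
ascii_chars() {
    i=32
    while [ "$i" -lt 127 ]
    do
        # Build an octal escape such as \101 and let printf expand it to the character.
        printf "\\$(printf '%03o' "$i")"
        i=$((i + 1))
    done
}

for j in digit upper lower punct
do
    echo "This is the character class [:$j:]"
    # grep -o prints each matching character on its own line; tr joins them again.
    ascii_chars | grep -o "[[:$j:]]" | tr -d '\n'
    echo
done
```

Using grep -o here also avoids the GNU-only \n in the sed replacement.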
Daniel B. Martin

Last edited by danielbmartin; 04-23-2013 at 08:34 PM. Reason: Elaboration
 
Old 04-24-2013, 02:31 AM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,735

Rep: Reputation: 2522
@Daniel, have a read of this http://www.regular-expressions.info/posixbrackets.html and consider different locales as well ...
 
Old 04-24-2013, 06:48 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,772

Rep: Reputation: 3054
So here is something a little more encompassing as far as which characters are being looked at. Obviously, to cover more than the characters in the ASCII table you need to look at Unicode.
Code:
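#!/bin/bash

# Loop over the class names given as arguments ("for class" with no "in" walks "$@").
for class
do

	reg="[[:$class:]]"

	# Octal byte values 001-377; the zero-padded range needs bash 4 or later.
	for i in {001..377}
	do
		# Skip anything containing 8 or 9, which is not valid octal.
		[[ $i =~ [89] ]] && continue

		# Print the byte if it matches the class.
		[[ $(echo -e "\0$i") =~ $reg ]] && echo -ne "\0$i "
	done

	echo
done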
Called with the class(es) you wish to look at as parameters:
Code:
./class.sh digit alnum
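One snag with printing the raw characters is that classes like cntrl and space mostly contain invisible ones. A possible variation, just a sketch along the same lines, prints the byte values in hex instead. The function name is mine, and like the script above it only looks at single bytes, so in a UTF-8 locale the bytes above 7F are not complete characters on their own:

```shell
#!/bin/bash
# Hypothetical helper: list the byte values (in hex) that match [[:CLASS:]].
class_bytes_hex() {
    local pat="[[:$1:]]" i ch out=()
    for ((i = 1; i < 256; i++)); do
        # Turn the byte value into the character via an octal escape.
        printf -v ch "\\$(printf '%03o' "$i")"
        # $pat is left unquoted on purpose so it is treated as a pattern.
        if [[ $ch == $pat ]]; then
            out+=("$(printf '%02X' "$i")")
        fi
    done
    echo "${out[*]}"
}

for class in blank space cntrl
do
    echo "[:$class:] $(class_bytes_hex "$class")"
done
```

Byte 00 is skipped because a shell variable cannot hold a NUL.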
 
Old 04-26-2013, 10:09 PM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,559

Rep: Reputation: 1854
Code:
{ seq 1 $((0xD7FF)); seq $((0xE000)) $((0x10FFFF)); } | # all Unicode code points, minus the surrogates (iconv rejects those)
xargs printf '%08X' | # convert to hex string
xxd -r -p           | # convert to raw binary
iconv -c -f UCS4    | # convert to current text encoding, dropping what it cannot represent
grep -o '[[:graph:]]' # print only the characters in the class
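Going through every code point takes a while, so for a quick look it may be enough to scan one range at a time. This is a sketch under a few assumptions: it relies on the \U escape that bash's printf and echo -e have had since 4.2 instead of xxd and iconv, class_members_in_range is a made-up name, and characters outside ASCII only come out right in a UTF-8 (or otherwise suitable) locale:

```shell
#!/bin/bash
# class_members_in_range CLASS FIRST LAST
# Hypothetical helper: print the characters between code points FIRST and LAST
# (inclusive) that match [[:CLASS:]] in the current locale.
class_members_in_range() {
    local class=$1 first=$2 last=$3 cp
    for ((cp = first; cp <= last; cp++)); do
        # Surrogates (D800-DFFF) are not characters, so skip them.
        if ((cp >= 0xD800 && cp <= 0xDFFF)); then continue; fi
        # %b expands the \UHHHHHHHH escape into the encoded character.
        printf '%b\n' "\\U$(printf '%08X' "$cp")"
    done | grep -o "[[:$class:]]" | tr -d '\n'
    echo
}

class_members_in_range digit 0x20 0x7E    # ASCII
class_members_in_range upper 0xC0 0xFF    # Latin-1 Supplement
```

It is much slower per character than the pipeline above, so it only makes sense for small ranges.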
 
  

