LinuxQuestions.org
Old 05-01-2008, 08:07 PM   #1
bad_jaye
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Rep: Reputation: 0
search unknown special characters on a textfile


Hi--

I need to create a shell script to search for unknown special characters in a file.

sample:
3 0O
2 SP5
1 U ASAKOU
1 MARTSSAUT b
2 XID CHDSC
1 STAR
2 ID75

What I need is to print out the line "1 MARTSSAUT b". I only need to retrieve the visible special characters like "b". Note that the special characters vary from time to time, so I need a flexible script that excludes [A-Z], [a-z], [0-9], whitespace, and any character that can be found on a keyboard, like "\ | * ? ^ # @ ! ~" and so on.

I hope someone can help me with this. I tried using grep, awk and tc but I cannot seem to get the desired result.

Thanks in advance.
 
Old 05-01-2008, 09:40 PM   #2
eggixyz
Member
 
Registered: Apr 2008
Posts: 310

Rep: Reputation: 30
Hey There,

You can use od to get started. It will show every character in the file, including the odd ones, and then you can ignore the regular stuff. (Note that the first field of each output line is the byte offset, in octal, so you can ignore that, too.)

Ex: with your file

Code:
-bash-3.2$ od -c yourFile
0000000    3       0   O  \n   2       S   P   5  \n   1       U       A
0000020    S   A   K   O   U  \n   1       M   A   R   T   S   S   A   U
0000040    T                b     \n   2       X   I   D       C   H
0000060    D   S   C  \n   1       S   T   A   R  \n   2       I   D   7
0000100    5  \n
0000102
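
Since the special characters vary from time to time, it may help to first list which odd bytes a file contains at all. The sketch below builds on the od output above; it assumes the odd characters are single bytes that od -c prints as three octal digits, and list_odd_bytes is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# list_odd_bytes FILE: print each distinct byte that od -c cannot show as a
# character or a backslash escape, as a three-digit octal value, one per line.
# (list_odd_bytes is a hypothetical helper name used for illustration.)
list_odd_bytes() {
    # -An drops the offset column; -c prints printable bytes as themselves,
    # common control bytes as escapes (\n, \t) and the rest as octal digits.
    LC_ALL=C od -An -c "$1" |
        tr -s ' ' '\n' |           # put every od field on its own line
        grep -E '^[0-7]{3}$' |     # keep only the octal-coded bytes
        LC_ALL=C sort -u           # report each value once
}

# Demo on a made-up file in which byte 0xFE (octal 376) is the odd character.
sample=$(mktemp)
printf '1 STAR\n1 MARTSSAUT \376\n2 ID75\n' > "$sample"
list_odd_bytes "$sample"
rm -f "$sample"
```

For the demo file this prints 376. Digits that belong to the text never form a three-digit field, because od -c gives every byte its own column.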
Hope that helps get you started

Let me know if you need further help

, Mike
 
Old 05-02-2008, 08:11 AM   #3
bad_jaye
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 0
Hi eggixyz,

Thanks for the help. I really appreciate it!

The thing is that I need a script that outputs only the lines containing special characters. The text files I search are usually hundreds of lines long, so with your command I would still be checking each line by hand. In the example, I want

1 MARTSSAUT b

to be the only output of the script.

More power to you and God bless!

This is a very tedious and excruciating job if I have to go checking each line. Please help me.. O GOD Help me!!!
 
Old 05-02-2008, 08:17 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
I have been struggling with this, but have not solved it.
 
Old 05-02-2008, 09:35 PM   #5
jsurles
Member
 
Registered: Feb 2007
Location: Katy, TX
Distribution: gentoo, slackware, centos, ESX, gnu/linux
Posts: 33

Rep: Reputation: 15
Quote:
Originally Posted by pixellany
I have been struggling with this, but have not solved it.
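I would do this. You'll still need to add the special characters like !@#$%^&*() etc. to the pattern; I'm not sure if there's a shorthand for those like A-Z or 0-9, but this seems to work:

Code:
# With GNU sed this puts spaces around every character, so the loop sees one
# character per word; grep -v then drops ASCII letters and digits and prints the rest.
set -f   # stop the shell from expanding characters such as * and ? as globs
for each in $(sed 's/\(\)/ /g' samplefile)
do
  printf '%s\n' "$each" | grep -Ev '[A-Za-z0-9]'
done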
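
On the question of a shorthand: POSIX character classes may cover it. The sketch below is one possible way to print only the lines that hold an odd character, assuming the file is treated as single bytes in the C locale; find_odd_lines is a made-up helper name, and -a is a GNU/BSD grep option rather than a POSIX one.

```shell
#!/bin/sh
# find_odd_lines FILE: print "line-number:line" for every line that holds a
# byte which is neither printable ASCII nor whitespace.
# (find_odd_lines is a hypothetical helper name used for illustration.)
find_odd_lines() {
    # In the C locale [:print:] is exactly printable ASCII (space through ~),
    # and [:space:] adds tab and the other whitespace characters.
    # -a keeps grep from reporting the file as binary instead of printing lines.
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}

# Demo on a made-up file in which byte 0xFE is the odd character.
sample=$(mktemp)
printf '3 0O\n1 MARTSSAUT \376\n2 ID75 ~!@#\n' > "$sample"
find_odd_lines "$sample"
rm -f "$sample"
```

In the demo only the MARTSSAUT line is printed, prefixed with its line number; the line with ~!@# is left out because those are ordinary keyboard characters.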
 
Old 05-02-2008, 09:46 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,964
Blog Entries: 11

Rep: Reputation: 865
Quote:
Originally Posted by bad_jaye
Hi--
... script that excludes [A-Z][a-z][0-9],white spaces and any characters that can be found on keyboard like "\ | * ? ^ # @ ! ~" and so on.

I hope someone can help me with this. I tried using grep, awk and tc but I cannot seem to get the desired result.

Thanks in advance.
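The "^I" sequences below stand for literal tab characters, produced in vi by pressing Ctrl-v<TAB>; the tab serves as the delimiter of the s command.
Save the first line as clean.sed, then run the second line as a command. It deletes every character that is not in the bracketed list, leaving only the regular keyboard characters.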

Code:
s^I[^][ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~]^I^Ig
sed -f clean.sed funny_text
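
If typing literal tabs is awkward, a similar clean-up can probably be done with tr. This is only a sketch, assuming single-byte input: it keeps newlines and printable ASCII and deletes everything else (tabs included, as the sed list above does). strip_odd_bytes is a made-up helper name.

```shell
#!/bin/sh
# strip_odd_bytes FILE: write FILE to stdout with every byte removed that is
# not a newline (octal 12) or printable ASCII (octal 40 through 176).
# (strip_odd_bytes is a hypothetical helper name used for illustration.)
strip_odd_bytes() {
    # -c complements the set, -d deletes: delete all bytes NOT listed.
    LC_ALL=C tr -cd '\12\40-\176' < "$1"
}

# Demo on a made-up file in which byte 0xFE is the odd character.
sample=$(mktemp)
printf '1 MARTSSAUT \376\n2 ID75\n' > "$sample"
strip_odd_bytes "$sample"
rm -f "$sample"
```

Dropping the -c (so that tr deletes the listed bytes instead) leaves only the odd bytes, which can then be piped to od -c for inspection.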



Cheers,
Tink
 
Old 05-02-2008, 09:55 PM   #7
eggixyz
Member
 
Registered: Apr 2008
Posts: 310

Rep: Reputation: 30
Hey There,

This will do it for your script.

Code:
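#!/usr/bin/perl
use strict;
use warnings;

# "G" is the name of the input file.
open(my $fh, '<', 'G') or die "Cannot open G: $!";
while (<$fh>) {
        # Print lines holding anything other than an ASCII letter, digit
        # or whitespace (\s already covers tab).
        if ( $_ =~ /[^A-Za-z0-9\s]/ ) {
                print $_;
        }
}
close($fh);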
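And, even though this is ugly, I think this version ignores pretty much everything that's "normal": all 94 printable keyboard characters plus whitespace (\s already covers space, tab and newline). Add anything else you don't want flagged to the bracket list.

Code:
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', 'G') or die "Cannot open G: $!";
while (<$fh>) {
        # @ and $ are escaped so that Perl does not try to interpolate them as variables.
        if ( $_ =~ /[^A-Za-z0-9\s\`\-=\[\]\\;\',\.\/~!\@#\$%^&\*\(\)_+\{\}\|:\"<>\?]/ ) {
                print $_;
        }
}
close($fh);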


If you need it for regexp outside of perl, sed/awk should be able to make all the same matches, with an extra backslash or two.
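
As one possible sed version of the same match, the sketch below prints only the offending lines and uses sed's l command so the odd bytes show up as octal escapes instead of whatever the terminal makes of them. It assumes single-byte handling in the C locale; show_odd_lines is a made-up helper name.

```shell
#!/bin/sh
# show_odd_lines FILE: print each line that holds a byte which is neither
# printable ASCII nor whitespace, in sed's unambiguous "l" format
# (non-printable bytes as \ooo octal escapes, end of line marked with $).
# (show_odd_lines is a hypothetical helper name used for illustration.)
show_odd_lines() {
    LC_ALL=C sed -n '/[^[:print:][:space:]]/l' "$1"
}

# Demo on a made-up file in which byte 0xFE (octal 376) is the odd character.
sample=$(mktemp)
printf '1 STAR\n1 MARTSSAUT \376\n2 ID75\n' > "$sample"
show_odd_lines "$sample"
rm -f "$sample"
```

With GNU sed the demo should print 1 MARTSSAUT \376$ and nothing else. Long lines are wrapped by the l command, so this form suits short records like the sample.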

Hope that helps

, Mike
 
  

