LinuxQuestions.org
Old 01-07-2009, 03:56 AM   #1
zeky
Member
 
Registered: Jul 2002
Location: Vukojebina, Europe, Earth
Distribution: M$ Lunix v6.66
Posts: 168

Rep: Reputation: 31
Perl help - recursively find non-ASCII characters in file


Hi!

I have a nasty non-ASCII character in some files that contain PHP code. I want to recursively find all the files that contain this specific non-ASCII character and, most importantly, I need to know the names of those files.

So far, I found a script that looks into a file for non-ascii characters:

Code:
while (<>) { 
    s/([\x80-\xff])/sprintf "\\x{%02x}",ord($1)/eg; 
    print; 
}
Ok, this is good, the non-ascii character that I'm looking for is:

Code:
\x{ef}\x{bb}\x{bf}
The problem is that I can't get this script to run recursively, and it doesn't tell me the name of the file that contains these characters.

I've tried with bash, but since the script writes everything to standard output, I can't get any usable result. Here is what I've tried:

Code:
find |xargs /usr/local/bin/check_for_non-ascii_characters.sh  |grep -l 'x{ef}\\x{bb}\\x{bf}'

So, I need a way to recursively find non-ascii characters (a specific pattern, mentioned before) in all files and I need the name of the files containing it.

Thanks
 
Old 01-07-2009, 04:01 AM   #2
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
why not simply use the '-R' option in grep?

Code:
# grep -Rl 'x{ef}\\x{bb}\\x{bf}' *
 
Old 01-07-2009, 06:44 AM   #3
zeky
Member
 
Registered: Jul 2002
Location: Vukojebina, Europe, Earth
Distribution: M$ Lunix v6.66
Posts: 168

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by eco View Post
why not simply use the '-R' option in grep?

Code:
# grep -Rl 'x{ef}\\x{bb}\\x{bf}' *
The problem is that I can't use grep directly on the files, because it doesn't recognize the non-ASCII characters. The Perl script recognizes them, so I need some Perl "hack" to see which files contain these characters.
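
One way to keep the file name while still working on a text dump of the bytes, in the spirit of the script from post #1, is to run the check once per file instead of piping everything into a single grep. The following is only a minimal sketch: the function name list_bom_files is made up for illustration, it uses od rather than the Perl script to produce the hex dump, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# list_bom_files DIR (hypothetical helper name):
# print every regular file under DIR whose bytes contain EF BB BF.
# Assumption: file names do not contain newline characters.
list_bom_files() {
    find "$1" -type f | while IFS= read -r file; do
        # od prints every byte as two lowercase hex digits (-v: no "*" lines).
        # tr joins the lines and squeezes blanks, so bytes end up separated
        # by single spaces and "ef bb bf" can only match on byte boundaries.
        if od -An -v -tx1 "$file" | tr -s ' \n' '  ' | grep -q 'ef bb bf'; then
            printf '%s\n' "$file"
        fi
    done
}
```

Called as `list_bom_files .` from the top directory, it prints one matching path per line. Dumping every file through od is slow on large trees, so this is more of an illustration than something to run regularly.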
 
Old 01-07-2009, 09:46 AM   #4
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
GNU grep has an experimental -P switch for Perl regular expressions.
Perhaps running 'grep -lR -P \\x{EF}\\x{BB}\\x{BF} .' from the top directory will give what you want.
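
A hedged variation on the same idea, wrapped in a function so it can be reused. This sketch assumes GNU grep built with -P support; the name grep_bom is hypothetical. Forcing the C locale is my assumption about what is needed in a UTF-8 locale, where -P may otherwise treat \xEF as the character U+00EF rather than the single byte 0xEF.

```shell
#!/bin/sh
# grep_bom DIR (hypothetical helper name):
# list files under DIR that contain the byte sequence EF BB BF.
# Assumes GNU grep with PCRE (-P) support.
grep_bom() {
    # LC_ALL=C makes grep work on raw bytes instead of decoded characters.
    # Single quotes stop the shell from interpreting the backslashes.
    LC_ALL=C grep -lR -P '\xEF\xBB\xBF' "$1"
}
```

`grep_bom .` run from the top directory prints the matching file names, one per line, and returns a non-zero status when nothing matches.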
 
Old 01-07-2009, 07:38 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
You should be able to adapt this:
Code:
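#!/usr/bin/perl -w
use strict;
use File::Find;

# Start the search in the current directory.
@ARGV = ('.');
my ($fname, $rec);

# Called by find() for every entry; $_ is the entry's name inside
# the directory that find() has changed into.
sub check
{
    $fname = $_;
    return unless -f $fname;    # skip directories and other non-regular files
    open(my $fh, "<", $fname) or die "unable to open $fname: $!\n";
    while(defined($rec=<$fh>))
    {
        chomp($rec);
        if( $rec =~ /zxc/ )     # replace zxc with the pattern you need
        {
            print "FNAME: $fname\n";
            print "FULLPATH: $File::Find::name\n";
            last;               # one report per file is enough
        }
    }
    close($fh) or die "unable to close $fname: $!\n";
}
find(\&check, @ARGV);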
It actually searches for a string (zxc) but you can change that.
 
  

