Oh, been there, done that (and it hurt, too!).
For some years I was receiving files from clients that were supposed to be text only: no special characters, no junk. But, as we all know, many users don't know a slash from a backslash and...
I developed a front-end filtering utility that all received files passed through to eliminate every bad character users could come up with: \, ^, tab, backspace (yes, backspace), and who knows what all. It kept a lot of crap out of our databases.
I believe that doing this in SQL is going to be extremely difficult, even if you use the C API (it's probably doable but, wow, what a pain).
I suspect that the easy way is to unload your affected tables to text files and use a stream editor, sed for example, to clean up the mess. You could write a little C filter program that deletes non-ASCII characters, or perhaps make use of this sample program (a subset of the "big" front-end filter):
Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef TRUE
# define TRUE 1
#endif
#ifndef FALSE
# define FALSE 0
#endif

int main (int argc, char *argv [])
{
    int c;
    int error = FALSE;      /* error flag */
    int line = 1;           /* current line number */
    int ropt = FALSE;       /* remove */
    int sopt = FALSE;       /* replace with blank */
    int wopt = FALSE;       /* warn */
    FILE *infile, *outfile;

    /* process command line options */
    while ((c = getopt (argc, argv, "rsw")) != -1) {
        switch (c) {
        case 'r':
            ropt = TRUE;
            break;
        case 's':
            sopt = TRUE;
            break;
        case 'w':
            wopt = TRUE;
            break;
        default:            /* getopt returns '?' on a bad option */
            error = TRUE;
            break;
        }
    }
    if (error || (ropt && sopt)) {
        (void) fprintf (stderr,
            "usage:\t%s [-r remove] [-s sub] [-w warn] [file]\n",
            argv [0]);
        exit (1);
    }
    if (optind == argc) {
        infile = stdin;
    } else {
        if ((infile = fopen (argv [optind], "r")) == (FILE *) NULL) {
            (void) fprintf (stderr, "can't open %s\n", argv [optind]);
            exit (2);
        }
    }

    /* make the output file stdout */
    outfile = stdout;

    /* while there are characters to get... */
    while ((c = getc (infile)) != EOF) {
        if (c == '\n') {            /* newline is ok */
            line++;
        } else if (c != '\t' &&     /* tab and form feed are ok */
                   c != '\f' &&
                   (c < 32 || c > 126)) {
            /* warn about it? */
            if (wopt) {
                if (c < 32) {
                    (void) fprintf (stderr,
                        "at line %d, Ctrl-%c (octal %03o)\n",
                        line, c + 64, c);
                } else {
                    (void) fprintf (stderr,
                        "at line %d, octal %04o\n",
                        line, c);
                }
            }
            if (ropt) {             /* remove it */
                continue;
            } else if (sopt) {      /* substitute a blank */
                c = ' ';
            }
        }
        (void) putc (c, outfile);
    }
    return 0;
}
Save the above as, oh, filter.c and compile it with
Code:
cc -o filter filter.c
The usage is
Code:
usage: filter [-r remove] [-s sub] [-w warn] infile
So you would execute it with one of those options: -r removes bad characters, -s substitutes a blank for each bad character, and -w warns you where bad characters appear in your file. The output goes to standard out, so
Code:
filter -r infile > outfile
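If you'd rather not compile anything, a tr one-liner does roughly what filter -r does: keep tab, newline, form feed, and printable ASCII (octal 40 through 176), and delete everything else. This is a sketch of the same idea, not an exact replacement (it has no warn or substitute modes):

```shell
# Delete (-d) every character NOT in (-c) the set: tab, newline,
# form feed, and printable ASCII space..tilde (octal 40-176).
tr -cd '\11\12\14\40-\176' < infile > outfile
```

For substituting blanks instead, tr's -s/-c combination or the sed route would be the closer analogue, but for a quick cleanup pass the delete form is usually enough.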
Feel free to fiddle with it; it's yours to do with what you wish.
Hope this helps some.