sed character replacement
I need to build a dictionary from books, containing only lowercase letters and no special characters. So I wrote a small script that replaces A-Z with a-z and replaces each of . - ! ? " ; , ' < > \n (newline) * [ ] with a space. It works fine for . ! ? " ; , < > * [ but has problems with ] - \n '
The - is ignored when included in the first command, and ] breaks the regexp whether or not I escape it. I was able to work around that by processing them separately, but I would rather use as few commands as possible because the files are complete books and thus rather large. \n is simply not replaced, and ' causes a syntax error. Here is the script: Code:
#!/bin/sh Code:
ABCDEFGHIJKLMNOPQRSTUVWXYZ.-!?";,'<> |
Code:
sed -i "s/'/ /g" "$CLEANFILENAME"
Try:
Code:
sed -i "s/'/ /g" "$CLEANFILENAME"
troop beat me to it :)
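For what it's worth, the double quotes matter because of shell quoting, not because of sed. Inside a single-quoted string the shell treats the next ' as the end of the string and no escape is possible there, so 's/'/ /g' leaves an unterminated quote. Inside double quotes an apostrophe is an ordinary character. A minimal sketch, using a made-up sample string instead of $CLEANFILENAME:

```shell
#!/bin/sh
# The sample text is made up for illustration.
sample="don't panic, it's fine"

# Double quotes: the apostrophe reaches sed as a literal character.
with_double=$(printf '%s\n' "$sample" | sed "s/'/ /g")

# Single quotes: close the string, add an escaped apostrophe, reopen it.
with_single=$(printf '%s\n' "$sample" | sed 's/'\''/ /g')

printf '%s\n' "$with_double"
printf '%s\n' "$with_single"
```

Both commands hand sed exactly the same script, s/'/ /g, so either form can be used; the double-quoted one is just easier to read.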
Cool - the double quotes helped a lot (I don't understand why). Unfortunately, \n is still completely ignored and ] still needs to be processed separately.
The script now is: Code:
#!/bin/sh Code:
ABCDEFGHIJKLMNOPQRSTUVWXYZ.-!?";,'<> |
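The remaining problems with ] - and \n probably come from two rules. Inside a bracket expression a backslash does not escape ] or -. Instead, ] is taken literally only when it is the first character after the opening [, and - only when it is first or last. And sed reads its input one line at a time, removing the trailing newline before it applies the commands, so a \n in the pattern has nothing to match. A possible single-pass sketch under those rules follows; clean_text is a hypothetical helper name, and it works on stdin rather than on $CLEANFILENAME:

```shell
#!/bin/sh
# clean_text is a hypothetical helper name; it reads stdin, writes stdout.
# In the sed bracket expression, ] comes first and - comes last, so both
# are literal. The \" only gets a double quote through the shell.
# tr works on the raw character stream, so it can map newline to space
# while it lowercases A-Z.
clean_text() {
    sed "s/[].!?\";,'<>*[-]/ /g" | tr 'A-Z\n' 'a-z '
}

cleaned=$(printf '%s\n' 'Hello, World!' "[It's a-b]" | clean_text)
printf '%s\n' "$cleaned"
```

To process a file, something like clean_text < book.txt > book.clean should do, where both file names are placeholders.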
The following script is not what I originally wanted, but it is fast enough for my purpose:
Code:
#!/bin/sh |
What I would do for such a problem is use a negated character class: a '^' right after the opening bracket matches every character that is not listed. For example:
Code:
cat file | tr '[:upper:]' '[:lower:]' | tr '[:space:]' ' ' | sed 's|[^_a-z]| |g'
Cool - I like that (and it works).
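If the number of commands matters, the same "everything except a-z" idea can possibly be done with tr alone, since its -c option complements the first set. This is only a sketch: to_words is a hypothetical helper name, and unlike the sed version it also turns _ into a space.

```shell
#!/bin/sh
# to_words is a hypothetical helper name; it reads stdin, writes stdout.
to_words() {
    # First lowercase everything, then replace every character that is
    # not in a-z (newlines included) with a space.
    tr '[:upper:]' '[:lower:]' | tr -c 'a-z' ' '
}

words=$(printf '%s\n' 'Hello, World!' "It's_here" | to_words)
printf '%s\n' "$words"
```

Because the newlines are replaced too, the output is one long line, which matches the original goal of turning \n into a space.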