LinuxQuestions.org


Hoxygen232 01-23-2013 04:08 AM

bash - exclude some characters from a file
 
hi,

I would like to print this file:
Code:

anche
due
-tre
quattro
pile,
a
patto.
che
eroghino--
la
stessa
tensione,

to another file in this way:
Code:

anche
due
tre
quattro
pile
a
patto
che
eroghino
la
stessa
tensione

that is, without spaces, commas, hyphens, and so on.
The output has to be clean because afterwards I need to parse the file word by word.

I have done this:
Code:

while read line;
 do
    TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' ) 
    echo "WORD TO SEARCH (trasformata): $TRASFORMER:"
    WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )  # search the dictionary
  if [ -z "$WORD" ] 
  then
    echo "not found"
    echo
  else             
    echo "word found.........."
    RIGA=$( fgrep -w -n "$WORD" "$dictfile" | sed s/:$WORD//g | sed 's/ //g') 
    echo $RIGA
    echo "$RIGA" >> $FILE_OUTPUT
  fi
 done < $FILE_INPUT

but when it reaches the word "a" it prints this on the terminal:
Code:

WORD TO SEARCH: a:
word found..........
sed: expression -e #1, character 5: command `s' not terminated

why?

Thank you

millgates 01-23-2013 05:14 AM

Hi,

Code:

TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )
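You don't have to put each character into a separate regexp. You can just write a single bracket expression, with the hyphen first so that it is taken literally rather than as a range:

Code:

sed -e 's/[-., ]//g'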
Also, if you have a recent enough bash, this should work (again with the hyphen first in the brackets):

Code:

TRASFORMER="${line//[-,. ]/}"
You can also transform the text to lower/upper case like this (bash 4 or later):

Code:

lower="${line,,}"
upper="${line^^}"

But this is not necessary, since your greps are case-insensitive anyway.
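
Putting the two expansions together, here is a minimal sketch. The function name clean_word is made up for illustration, and ${var,,} assumes bash 4 or later:

```shell
#!/usr/bin/env bash

# Hypothetical helper: strip hyphens, dots, commas and spaces, then lowercase.
clean_word() {
    local word=${1//[-., ]/}     # hyphen first, so it is literal and not a range
    printf '%s\n' "${word,,}"    # ${var,,} lowercases the value (bash 4+)
}

clean_word '-tre'          # prints: tre
clean_word 'eroghino--'    # prints: eroghino
clean_word 'Patto.'        # prints: patto
```

This avoids starting tr and sed for every line, which may matter on a long input file.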

Code:

sed: expression -e #1, character 5: command `s' not terminated
This is because you don't quote the expression:

Code:

sed s/:$WORD//g
If $WORD contains any whitespace characters, bash will word-split the expression before sed ever sees it.
My guess is that the

Code:

WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )
finds multiple matches in your $dictfile, so $WORD contains multiple words separated by newlines.
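
To see the splitting for yourself, here is a small sketch. The value of WORD is an assumed example of two matches coming back from fgrep, and count_args is a made-up helper that only reports how many arguments it received:

```shell
# Assumed example value: two dictionary matches separated by a newline.
WORD=$(printf 'a\nwhisky-a-gogo')

# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

# Unquoted: the shell splits at the newline, so sed would get "s/:a" as its
# script and "whisky-a-gogo//g" as a file name.
unquoted=$(count_args s/:$WORD//g)

# Quoted: the whole expression stays a single argument.
quoted=$(count_args "s/:$WORD//g")

echo "unquoted: $unquoted, quoted: $quoted"    # prints: unquoted: 2, quoted: 1
```

Quoting keeps the expression in one piece, but as far as I know a script with a raw newline inside the s pattern is still rejected by sed, so the lookup also has to return a single match.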

Hoxygen232 01-23-2013 07:38 AM

You're right. The problem is that, while searching for the word in dictfile, it finds the LAST word that contains 'a';
i.e. I search for the word a and it finds: whisky-a-gogo
So how do I solve it?

thanks

millgates 01-23-2013 08:53 AM

Well, you need a better matching pattern. A dash is not considered a "word" character, so grep -w foo will match "bar-foo-baz". If your "$dictfile" contains one word per line, with no leading/trailing whitespace, you can match whole lines:

Code:

grep "^${WORD}$"
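
An alternative sketch of the same idea uses grep's -x option (match the whole line) together with -F (treat the pattern as a fixed string, so a dot in the word is not a regex wildcard). The three-word dictionary below is a made-up stand-in for your real $dictfile:

```shell
# Hypothetical stand-in for the real dictionary: one word per line.
dictfile=$(mktemp)
printf '%s\n' a anche whisky-a-gogo > "$dictfile"

# -w matches "a" as a word, so the hyphenated entry is reported as well.
word_matches=$(grep -w -F 'a' "$dictfile")

# -x only accepts lines that consist of exactly "a".
line_matches=$(grep -x -F 'a' "$dictfile")

# Line number of the exact match, without needing sed afterwards.
line_no=$(grep -n -x -F 'a' "$dictfile" | cut -d: -f1)

echo "$line_matches is on line $line_no"    # prints: a is on line 1
rm -f "$dictfile"
```

With a single exact match, the line number can be taken straight from grep -n, which would sidestep the sed call that was failing.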

Hoxygen232 01-23-2013 08:57 AM

thanks, it works great


All times are GMT -5.