LinuxQuestions.org
01-23-2013, 04:08 AM   #1
Hoxygen232
Member
Registered: Jan 2013
Posts: 37

bash - exclude some characters from a file


hi,

I would like to print this file:
Code:
anche
due
-tre
quattro
pile,
a
patto.
che
eroghino--
la
stessa
tensione,
to another file in this way:
Code:
anche
due
tre
quattro
pile
a
patto
che
eroghino
la
stessa
tensione
that is, without spaces, commas, dashes, etc.
It has to be free of spaces and punctuation because afterwards I need to parse the file word by word.

I have done this:
Code:
 while read line; 
 do 
     TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )  
     echo "WORD TO SEARCH (trasformata): $TRASFORMER:" 
     WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )  # search the dictionary
  if [ -z "$WORD" ]  
  then
     echo "not found"
     echo 
  else               
     echo "word found.........." 
     RIGA=$( fgrep -w -n "$WORD" "$dictfile" | sed s/:$WORD//g | sed 's/ //g')   
     echo $RIGA
     echo "$RIGA" >> $FILE_OUTPUT 
  fi
 done < $FILE_INPUT
but when it reaches the word "a", it prints this on the terminal:
Code:
WORD TO SEARCH: a:
word found..........
sed: expression -e #1, character 5: command `s' not terminated
why?

Thank you

Last edited by Hoxygen232; 01-23-2013 at 08:58 AM.
 
01-23-2013, 05:14 AM   #2
millgates
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Hi,

Code:
TRASFORMER=$( echo "$line" | tr "[:upper:]" "[:lower:]" | sed -e 's/\.//g' -e 's/\,//g' -e 's/\-//g' -e 's/ //g' )
You don't have to put each character into a separate regexp. You can use a single bracket expression (the dash goes last so that it is not read as a range):

Code:
sed -e 's/[., -]//g'
Also, if you have recent enough bash, this should work:

Code:
TRASFORMER="${line//[,. -]/}"
You can also transform the text to lower/upper case like this (bash 4.0 or later):

Code:
lower="${line,,}"
upper="${line^^}"
But this is not necessary, since your greps are case-insensitive anyway.
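To see the two expansions working together, here is a minimal sketch, assuming bash 4.0 or later. The function name clean_word and the sample words are only illustrative; they are not part of the original script.

```shell
#!/usr/bin/env bash
# clean_word is a hypothetical helper name, not taken from the original script.
# It lowercases its argument and strips commas, dots, spaces and dashes.
clean_word() {
    local w=${1,,}               # lowercase (bash >= 4.0)
    printf '%s\n' "${w//[,. -]/}"  # the dash is last, so it is a literal dash
}

# A few sample words from the input file shown in post #1.
for raw in 'anche' '-tre' 'pile,' 'patto.' 'eroghino--'; do
    clean_word "$raw"
done
```

No external process is started here, which can matter when the loop runs over a large file.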

Code:
sed: expression -e #1, character 5: command `s' not terminated
This is because you don't quote the expression:

Code:
sed s/:$WORD//g
If $WORD contains any whitespace characters, the unquoted expression will be word-split by bash, and sed receives only the first fragment as its script.
My guess is, the

Code:
WORD=$( fgrep -w -i "$TRASFORMER" "$dictfile" )
finds multiple matches in your $dictfile, so $WORD contains multiple words separated by newlines.
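The splitting can be made visible with a small sketch. The value of WORD below is an assumption (two dictionary matches separated by a newline), and count_args is a hypothetical helper that just reports how many arguments it received.

```shell
#!/bin/sh
# Assumed value: two matches from the dictionary, separated by a newline.
WORD=$(printf 'a\nwhisky-a-gogo')

# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

# Unquoted: the shell splits the expansion at the newline,
# so the "expression" arrives as two separate arguments.
unquoted=$(count_args s/:$WORD//g)

# Quoted: the whole expression stays one argument.
quoted=$(count_args "s/:$WORD//g")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Quoting stops the splitting, but a $WORD that holds several matches is still not a useful pattern, so the search itself has to be made more precise as well.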
 
01-23-2013, 07:38 AM   #3
Hoxygen232
Member
Registered: Jan 2013
Posts: 37
Original Poster

You're right. The problem is that, when searching for the word in dictfile, it also finds the LAST word that contains 'a';
e.g. I search for the word "a" and it finds: whisky-a-gogo
So how do I solve it?

thanks
 
01-23-2013, 08:53 AM   #4
millgates
Member
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Well, you need a better matching pattern. A dash is not considered a "word" character, so grep -w foo will match "bar-foo-baz". If your "$dictfile" contains one word per line, with no leading or trailing whitespace, you can match whole lines:

Code:
grep "^${WORD}$"
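A related variant, sketched here under assumptions: grep -x matches whole lines and -F treats the search term as a fixed string, so a word containing a dot or another regex character is still compared literally. The temporary dictionary and the function name lookup_line are made up for illustration.

```shell
#!/bin/sh
# Throwaway dictionary with one word per line (sample data, not the real file).
dictfile=$(mktemp)
printf '%s\n' 'a' 'anche' 'whisky-a-gogo' 'patto' > "$dictfile"

# lookup_line is a hypothetical helper: it prints the line number of an
# exact whole-line match, or nothing if the word is not in the dictionary.
lookup_line() {
    # -n: prefix line numbers, -x: whole line, -F: fixed string; cut keeps the number
    grep -n -x -F -- "$1" "$2" | cut -d: -f1
}

# With -w, "a" also matches inside "whisky-a-gogo": two lines are counted.
grep -c -w -F -- 'a' "$dictfile"

# With -x, only the line that is exactly "a" matches.
lookup_line 'a' "$dictfile"
lookup_line 'gogo' "$dictfile"   # prints nothing: no line is exactly "gogo"

rm -f "$dictfile"
```

Since -n already gives the line number, a lookup like this could stand in for both grep calls and the sed clean-up in the original loop.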
 
2 members found this post helpful.
01-23-2013, 08:57 AM   #5
Hoxygen232
Member
Registered: Jan 2013
Posts: 37
Original Poster

thanks, it works great
 
All times are GMT -5.