LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

06-30-2009, 07:19 AM   #1
Dazamondo
LQ Newbie
 
Registered: Jun 2009
Posts: 7

Rep: Reputation: 0
Command to delete words out of a text file.


Hi

I am trying to delete whole words that contain certain characters, such as ø, out of a text file. I have tried many different ways, such as sed, and I have also been advised to use Perl, but I have not had any luck.

Anyone got any ideas or code that I could try to do this?

Cheers
Daz
 
06-30-2009, 07:35 AM   #2
zhjim
Senior Member
 
Registered: Oct 2004
Distribution: Debian Squeeze x86_64
Posts: 1,748
Blog Entries: 11

Rep: Reputation: 233
What about the tr command?

Code:
NAME
       tr - translate or delete characters

SYNOPSIS
       tr [OPTION]... SET1 [SET2]

DESCRIPTION
       Translate, squeeze, and/or delete characters from standard input, writing to standard output.
......
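A quick sketch of what tr does with the sample words that appear later in the thread: `-d` deletes each character listed in SET1 wherever it occurs, but the rest of the word stays behind. Note also that upstream GNU tr works on bytes rather than multibyte characters, so with UTF-8 input a set containing ø may also delete bytes that belong to other accented letters; treat that as a caveat to check on your own system.

```shell
# tr -d removes each listed character but keeps the rest of the word.
printf 'adlød\nable\n' | tr -d 'ø'
# prints:
# adld
# able
```

So tr alone cannot remove whole words; it only strips characters.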
 
06-30-2009, 07:42 AM   #3
onebuck
Moderator
 
Registered: Jan 2005
Location: Central Florida 20 minutes from Disney World
Distribution: Slackware®
Posts: 13,925
Blog Entries: 44

Rep: Reputation: 3159
Hi,

Welcome to LQ!

Show us what you have tried.
 
06-30-2009, 07:58 AM   #4
Dazamondo
LQ Newbie
 
Registered: Jun 2009
Posts: 7

Original Poster
Rep: Reputation: 0
I can't use tr, as I actually want to delete the whole word that contains the character rather than just the character itself. I did use tr when I only wanted single characters removed.
I have tried many different sed commands, such as cat filename | sed 's/[:allnum:]ø[:allnum:]//g' > newfilename. I have asked some other people and they say it can't be done with sed and needs Perl, but I was very confused when trying Perl and got nowhere, tbh.

I'm very new to this and my commands are probably totally wrong, but any help or advice will do.
 
06-30-2009, 08:09 AM   #5
zhjim
Senior Member
 
Registered: Oct 2004
Distribution: Debian Squeeze x86_64
Posts: 1,748
Blog Entries: 11

Rep: Reputation: 233
I don't see why it couldn't be done with sed. But who knows. Can you provide a snippet of the text you want to clean up?
Just as a note, the sed script you used would only remove numbers around the ø (o with a slash).
Here is my version, if this is the desired behavior:

Code:
sed -e 's/[0-9]ø[0-9]//g' filename > newfile
or
Code:
sed --in-place -e 's/[0-9]ø[0-9]//g' filename
--in-place just does the substitution inside the original file, so only use it if you are sure it does what you want.
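A quick check on made-up input of what a substitution of the form `s/[0-9]ø[0-9]//g` actually removes: only ø together with one digit on each side. Ordinary words such as adlød are left alone.

```shell
# Removes a digit-ø-digit sequence and nothing else.
printf 'a1ø2b\nadlød\n' | sed 's/[0-9]ø[0-9]//g'
# prints:
# ab
# adlød
```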
 
06-30-2009, 08:22 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by Dazamondo
cat filename | sed 's/[:allnum:]ø[:allnum:]//g' > newfilename.
Your sed command should be:
Code:
sed -i.bck 's/[[:alnum:]]ø[[:alnum:]]//g' filename
You can also use a character list in place of ø to include all the characters you want to match in one shot. Note that -i.bck will edit the file in place, making a backup copy of the original file with the suffix .bck appended.
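A minimal sketch of the `-i.bck` behavior, run in a scratch directory on a made-up file called `words.txt` (the name is hypothetical). It also shows that, without asterisks, the pattern removes only ø plus one alphanumeric character on each side, not the whole word.

```shell
# Work in a scratch directory so no real file is touched.
dir=$(mktemp -d)
printf 'adlød\nable\n' > "$dir/words.txt"   # made-up sample file

# Edit in place, keeping the original as words.txt.bck.
sed -i.bck 's/[[:alnum:]]ø[[:alnum:]]//g' "$dir/words.txt"

cat "$dir/words.txt"       # ad, able: only "lød" was removed
cat "$dir/words.txt.bck"   # adlød, able: the untouched original
```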
 
06-30-2009, 09:03 AM   #7
Dazamondo
LQ Newbie
 
Registered: Jun 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Oh right, I thought [:alnum:] was all letters and digits. No, it still doesn't seem to work. Some snippets from the text file are shown below:

able
adlød
adele
administration
administer
Aalbørg

For example, I would want the command to delete adlød and Aalbørg, as they contain the ø symbol. Thanks for your help, guys.
 
06-30-2009, 09:13 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Ok. I forgot to put asterisks to match against any number of alpha-numeric characters. If the file is made of lines containing a single word, you can use the delete command of sed to remove the whole line:
Code:
$ cat testfile
able
adlød
adele
administration
administer
Aalbørg
$ sed '/[[:alnum:]]*ø[[:alnum:]]*/d' testfile
able
adele
administration
administer

Last edited by colucix; 06-30-2009 at 09:16 AM.
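The same approach extends to several characters at once. Because the `[[:alnum:]]*` parts can match the empty string, deleting lines with `/ø/d` alone behaves the same. The sketch below uses `sed -E` with an alternation such as `(ø|æ)` rather than a bracket list like `[øæ]`. When the locale is not UTF-8, sed sees ø and æ as two bytes each, so a bracket list would match the individual bytes, including ones shared with other accented letters such as Å, whereas an alternation still matches the whole byte sequence. The file `testfile` and its contents are made up. The second command removes only the matching words, for files that hold several words per line.

```shell
# Made-up sample: one word per line, plus one line of running text.
cd "$(mktemp -d)"
printf 'able\nadlød\nsæl\nadele\nAalbørg og Århus\n' > testfile

# Delete every line containing ø or æ.
sed -E '/(ø|æ)/d' testfile
# prints: able, adele

# Delete only the words containing ø or æ; the rest of each line stays
# (a line holding a single matching word becomes empty).
sed -E 's/[[:alnum:]]*(ø|æ)[[:alnum:]]*//g' testfile
# prints: able, (empty), (empty), adele, " og Århus"
```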
 
06-30-2009, 09:24 AM   #9
xxloaf
LQ Newbie
 
Registered: May 2007
Distribution: Debian
Posts: 10

Rep: Reputation: 0
Quote:
Originally Posted by Dazamondo
able
adlød
adele
administration
administer
Aalbørg
Does the file you are trying to edit have all the words on separate lines like this?

If so, just use grep to filter those words out:

Code:
grep -v "ø" file.txt > newfile.txt
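If more characters than ø need to go, grep accepts several patterns, one per `-e` option, and `-v` keeps only the lines that match none of them. A sketch using a made-up `file.txt`:

```shell
# Made-up sample file in a scratch directory.
cd "$(mktemp -d)"
printf 'able\nadlød\nsæl\nadele\n' > file.txt

# -v keeps only the lines that match none of the -e patterns.
grep -v -e 'ø' -e 'æ' file.txt > newfile.txt
cat newfile.txt
# prints:
# able
# adele
```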
 
06-30-2009, 09:26 AM   #10
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by xxloaf
Does the file you are trying to edit have all the words on separate lines like this?

If so, just use grep to filter those words out:

Code:
grep -v "ø" file.txt > newfile.txt
xxloaf, you hit the nail on the head! I was going to correct my post to suggest this simple solution.
 
06-30-2009, 09:41 AM   #11
Dazamondo
LQ Newbie
 
Registered: Jun 2009
Posts: 7

Original Poster
Rep: Reputation: 0
I have used both of those commands before. I originally thought grep would be easiest (and used the same command you suggested), but nothing works on this file. I have just tried the commands on a smaller text file and they work great; however, on the one I need them to work on, they don't. Could it be due to the number of words the file holds (one million or so)?

Cheers
 
06-30-2009, 09:44 AM   #12
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Nope. These commands process the file line by line, so the number of lines makes no difference. What do you mean by "they don't work"? Did you get an error message, or just not the desired result?
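To check that size alone is not the problem, here is a sketch that generates a made-up file of one million one-word lines (the name `big.txt` is hypothetical), with every second word containing ø, and filters it with the same grep approach.

```shell
# Made-up test data: one million one-word lines, every second word contains ø.
cd "$(mktemp -d)"
awk 'BEGIN { for (i = 1; i <= 1000000; i++) print (i % 2 ? "word" i : "wørd" i) }' > big.txt

# Count the lines grep -v keeps: the 500000 words without ø.
grep -c -v 'ø' big.txt
# prints: 500000
```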
 
06-30-2009, 09:48 AM   #13
Dazamondo
LQ Newbie
 
Registered: Jun 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Like I say, both suggested commands work fine on a smaller file (the words that need to be removed are removed). However, on the big file the command completes with no errors, but when I open the output it still contains the words that should have been removed. That is why I am so confused: the command works on the small file, but the same command doesn't give the desired result on the big file.
 
06-30-2009, 10:48 AM   #14
xxloaf
LQ Newbie
 
Registered: May 2007
Distribution: Debian
Posts: 10

Rep: Reputation: 0
Hmm, are you sure you are using the correct character in your command?

What happens if you just run:
Code:
grep "ø" file.txt
Note the removal of the "-v".

Does the command return any results? If not, you need to make sure you are matching the correct character; non-ASCII characters like ø can be tricky at times.
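One possible cause, which is an assumption and is not confirmed in the thread, is that the big file uses a different encoding from the terminal. In ISO-8859-1 (Latin-1), ø is the single byte 0xF8, while a UTF-8 terminal sends the two bytes 0xC3 0xB8, so `grep "ø"` never matches. Running `file` on the word list may give a hint about its encoding. The sketch below builds a small made-up Latin-1 file (`latin1.txt` and `utf8.txt` are hypothetical names), shows that the UTF-8 pattern misses it, and converts it with iconv before filtering.

```shell
# Made-up Latin-1 file: \370 is octal for 0xF8, the Latin-1 byte for ø.
cd "$(mktemp -d)"
printf 'adl\370d\nable\n' > latin1.txt

grep -c 'ø' latin1.txt                          # 0: the UTF-8 ø is a different byte sequence
LC_ALL=C grep -c "$(printf '\370')" latin1.txt  # 1: matching the Latin-1 byte finds it

# Convert to UTF-8 first; then the original filter works.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt
grep -v 'ø' utf8.txt                            # able
```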
 
  



All times are GMT -5.
