Linux - Newbie
This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I am trying to delete whole words that contain certain characters, such as ø, from a text file. I have tried several approaches, including sed, and I have also been advised to use perl, but I have not had any luck.
Anyone got any ideas or code that I could try to do this?
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
......
I can't use tr, as I actually want to delete the whole word that has the character in it rather than just the character. I did use tr when I wanted single characters removed.
I have tried many different sed commands, such as cat filename | sed 's/[:allnum:]ø[:allnum:]//g' > newfilename. I have asked some other people and they say it can't be done with sed and needs perl, but I was very confused when trying perl and got nowhere, to be honest.
I'm very new to this and my commands are probably totally wrong but any help or advice will do.
I don't see the reason why it should not be done with sed. But who knows. Can you provide a snippet of the text you want to clear?
Just as a note, the sed script you used would only remove numbers around the ø.
Here is my version, if this is the desired behavior:
Code:
sed -e 's/[0-9]ø[0-9]//g' filename > newfile
or
Code:
sed --in-place -e 's/[0-9]ø[0-9]//g' filename
--in-place just does the substitution inside the original file, so only use it if you are sure it does what you want.
cat filename | sed 's/[:allnum:]ø[:allnum:]//g' > newfilename.
Your sed command should be:
Code:
sed -i.bck 's/[[:alnum:]]ø[[:alnum:]]//g' filename
You can also use a character list in place of ø to include all the characters you want to match in one shot. Note that -i.bck edits the file in place and keeps a backup copy of the original file with the suffix .bck appended.
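As a rough illustration of the two points above (several characters at once, and the .bck backup), here is a minimal sketch. The sample file, the directory name and the extra character å are hypothetical. It also swaps in two things that are not in the command above: alternation with -E instead of a bracketed character list, because a bracketed list of multi-byte characters can misbehave when the locale is not UTF-8, and [^[:space:]]* instead of [[:alnum:]] so that the rest of the word is consumed whatever the locale.

```shell
# Hypothetical sample file; the words are taken from the thread's examples.
workdir=$(mktemp -d)
printf 'able adlød adele\nAalbørg administer\nblåbær able\n' > "$workdir/words.txt"

# -E enables extended regexes so (ø|å) can list several characters.
# [^[:space:]]* grabs the rest of the word on either side of the match.
# The remaining expressions only tidy up the spaces left behind.
sed -i.bck -E \
    -e 's/[^[:space:]]*(ø|å)[^[:space:]]*//g' \
    -e 's/  +/ /g' -e 's/^ //' -e 's/ $//' \
    "$workdir/words.txt"

edited=$(cat "$workdir/words.txt")        # file after the in-place edit
backup=$(cat "$workdir/words.txt.bck")    # untouched copy made by -i.bck
printf 'edited:\n%s\n\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$workdir"
```

If the result is not what you wanted, copying words.txt.bck back over words.txt restores the original.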
Ok. I forgot to put asterisks to match against any number of alpha-numeric characters. If the file is made of lines containing a single word, you can use the delete command of sed to remove the whole line:
Code:
$ cat testfile
able
adlød
adele
administration
administer
Aalbørg
$ sed '/[[:alnum:]]*ø[[:alnum:]]*/d' testfile
able
adele
administration
administer
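For a file with one word per line, grep -v (which also comes up in this thread) should give the same result as the sed delete command. The following is only a sketch under that one-word-per-line assumption; the temporary file stands in for testfile.

```shell
# Recreate the thread's sample list, one word per line.
list=$(mktemp)
printf '%s\n' able adlød adele administration administer Aalbørg > "$list"

# -v inverts the match: print only the lines that do NOT contain ø.
kept=$(grep -v 'ø' "$list")
printf '%s\n' "$kept"
rm -f "$list"
```

Redirect the output to a new file rather than to the input file, since the shell would truncate the input before grep reads it.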
I have used both of those commands before. I originally thought grep would be easiest (and used the same command as you suggested), but nothing works on this file. I have just tried the commands on a smaller text file and they work great; however, on the one I need them to work on, they don't. Could it be due to the number of words that the file holds, one million or so?
Nope. These commands parse the file line after line and the number of lines does not make any difference. What do you mean by "they don't work"? Did you get any error message? Or just not the desired result?
Like I say, both commands suggested work fine on a smaller file (the words that need to be removed are removed). However, on the big file the command completes fine with no errors, but when I open the file it still has the words that should have been removed. That is why I am so confused: on the small file it works, but the same command doesn't get the desired results on the big file.
Hmm are you sure you are using the correct character in your command?
If you just do
Code:
grep "ø" file.txt
(note the removal of the "-v"), does the command return any results? If not, you need to be sure you are grabbing the correct character; those non-ASCII characters can be tricky at times.
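One possible explanation, and it is only an assumption since nothing in the thread confirms it, is that the big file stores ø in a different encoding from the one your terminal sends. The sketch below builds a hypothetical two-line sample in which the same word is written once as UTF-8 and once as ISO-8859-1, then counts matches for each byte sequence and dumps the raw bytes so you can see which form a file really contains.

```shell
# Hypothetical sample: the same-looking word in two encodings.
sample=$(mktemp)
printf 'adl\303\270d\n' >  "$sample"   # ø as UTF-8 (bytes c3 b8)
printf 'adl\370d\n'     >> "$sample"   # ø as ISO-8859-1 (byte f8)

# Count lines containing each byte sequence.
# LC_ALL=C makes grep compare raw bytes instead of decoded characters.
utf8_hits=$(LC_ALL=C grep -c "$(printf '\303\270')" "$sample")
latin1_hits=$(LC_ALL=C grep -c "$(printf '\370')" "$sample")
echo "UTF-8 ø: $utf8_hits line(s), Latin-1 ø: $latin1_hits line(s)"

# Dump the bytes in hex to see what is really stored.
od -An -tx1 "$sample"
rm -f "$sample"
```

Running the two grep -c lines against the real file would show which byte sequence it uses; if only the Latin-1 count is non-zero, a pattern typed in a UTF-8 terminal will never match it.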