I would like a command that automatically finds phrases which appear more than once in a text file and removes the duplicates, keeping exactly one copy of each. Any ideas?
The answer depends on the exact circumstances. Please give us a representative sample of the text, and the kind of changes you want to make.
In general, if you can define regular patterns and rules for matching and modification, then it's probably scriptable. The more variation and unpredictability in the text, the harder it is to work with.
Make a copy of your file and play around with it. Note that with the host file it requires each "phrase" to be a separate line so it will look something like this.
For that matter, if you assume that each phrase is on a separate line, and that the original order doesn't need to be maintained, then all you may really need is:
Code:
sort -u filename
But that's why I requested clarification. Until the OP defines his needs in more detail, we're having to make assumptions like this.
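To make that assumption concrete, here is a small sketch of what sort -u does when every phrase sits on its own line. The file name and its contents are invented for illustration; they are not the OP's data.

```shell
# Hypothetical sample file: one phrase per line, with some repeats.
tmpdir=$(mktemp -d)
printf '%s\n' 'pear' 'apple' 'pear' 'banana' 'apple' > "$tmpdir/phrases.txt"

# sort -u sorts the lines and keeps a single copy of each distinct line.
# The original order of the file is lost.
sorted_unique=$(sort -u "$tmpdir/phrases.txt")
printf '%s\n' "$sorted_unique"

rm -rf "$tmpdir"
```

Only lines that match exactly are merged, so a stray trailing space or a different prefix keeps two otherwise equal phrases apart.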
I have used grep to select some lines from a group of .htm files (250 in total, 10 per file) and store them in a text file. Unfortunately, I've run into another small problem when sorting the list: the filename comes before the actual phrase that I want to sort by. I would be happy to get rid of the filename altogether (in fact, I want to).
Here is a sample of the text I wish to modify (I have changed the actual names, but I'm sure whatever you give me will work for the actual names). The phrases I am worried about are shown in bold. Note that the first number shown in bold is part of the filename, which I want removed.
grep has the -h option, which turns off filename output. See the man page.
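As a rough sketch of the difference, assuming two made-up .htm files and a made-up search pattern (the OP's real files and pattern are not shown in the thread):

```shell
# Hypothetical input files standing in for the OP's .htm files.
tmpdir=$(mktemp -d)
printf '%s\n' '<td>1abcd</td>' '<p>other</p>' > "$tmpdir/page1.htm"
printf '%s\n' '<td>2efgh</td>' '<td>1abcd</td>' > "$tmpdir/page2.htm"

# With more than one file, grep prefixes each match with "filename:".
with_names=$(grep '<td>' "$tmpdir"/*.htm)

# -h suppresses that prefix, leaving only the matching lines.
without_names=$(grep -h '<td>' "$tmpdir"/*.htm)

# Without the prefix, identical lines from different files compare equal,
# so they can be collapsed in the same pipeline.
unique_lines=$(grep -h '<td>' "$tmpdir"/*.htm | sort -u)
printf '%s\n' "$unique_lines"

rm -rf "$tmpdir"
```

Dropping the prefix at the grep stage avoids having to strip it from the text file afterwards.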
But I still don't get it. Do you want to remove whole lines, or just the "1abcd" part? And you want to keep the first instance? I think just removing that phrase would leave some odd remainders. Care to elaborate further?
But that's not what your first example shows. It has several different html components, with the target phrase embedded inside multiple different components. And they aren't on individual lines either, unless your lack of [code][/code] tags around them has broken the formatting. Or is that supposed to be a single line?
But again, if the original order of the text doesn't matter, and the whole lines are truly identical, then the sort command I gave before can do it. If order matters, then crts' awk command will do it.
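For the order-preserving case, a widely used awk idiom is sketched below. The awk command posted earlier in the thread is not reproduced here, so this is not necessarily the same one; the sample file is also invented.

```shell
# Hypothetical sample file: one phrase per line, with some repeats.
tmpdir=$(mktemp -d)
printf '%s\n' 'pear' 'apple' 'pear' 'banana' 'apple' > "$tmpdir/phrases.txt"

# seen[$0]++ evaluates to 0 the first time a line appears and to a
# positive count afterwards, so !seen[$0]++ is true only on the first
# occurrence. awk prints the line when the expression is true.
first_seen=$(awk '!seen[$0]++' "$tmpdir/phrases.txt")
printf '%s\n' "$first_seen"

rm -rf "$tmpdir"
```

Unlike sorting, this keeps each line at the position of its first occurrence, at the cost of holding every distinct line in memory.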
If the lines aren't exactly the same, then we'll need to do more work. Can you show us a larger sample of the actual text, wrapped in code tags, and exactly how you want it to look afterwards?
"sort -u filename" was what I needed. In actual fact all phrases were on one line each, and they were all identical to each other apart from one part which I wanted them to be ordered by.