Sort specific words within sentence
Hello all,
I have a txt file that contains sentences in the following format:
Code:
Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla
I would like the output to look like this:
Code:
Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla
Thanks, |
If I understand you correctly, you really don't care where in the string "id1", "id2", etc. are, nor do you care about any of the words surrounding each instance, you just want the first "id#" to be called "id1", the second to be called "id2", etc. Is that correct?
In that case, I would suggest modifying your approach to simply perform a string replacement on any "id#" you find. Replace the first instance, no matter what number it contains, with id1. Replace the second instance with id2, etc. I imagine this could be easily accomplished with awk or sed. Of course, you'll need to think about edge cases. E.g., is it possible to have an id4 without an id3? If so, do you want the output to show id4 or id3? Can there be duplicates, and if so, what do you want the result to be? |
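A minimal sketch of that replace-in-order idea, assuming the ids are whitespace-separated words of the form "id" plus digits and that numbering restarts on every line. The name renumber_ids is made up for illustration. Because awk rebuilds the line when a field is assigned, runs of spaces collapse to single spaces.

```shell
# Hypothetical helper: renumber id words in order of appearance, per line.
renumber_ids() {
  awk '{
    n = 0                                  # restart numbering on each line
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id[0-9]+$/)               # whole-word ids only, any number
        $i = "id" (++n)                    # overwrite with the running count
    print
  }' "$@"
}

# Gaps (id4 without id3) and duplicates are simply overwritten:
printf '%s\n' 'x id4 y id2 z id2' | renumber_ids
# prints: x id1 y id2 z id3
```

With this approach the edge cases do not need special handling, since the original number is never looked at.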
hi,
Thank you for your reply. Yes, you are right: I want the first id# to become id1, the second id2, and so on. There are no duplicates. Could you please provide the command, or something to get me started? Thank you in advance for your support. |
perl works pretty well for this:
Quote:
|
hi,
thanks for your prompt reply. The Perl one-liner works pretty well for this :) |
I think if you search with almost your exact input data on the forums here you will see this has already been asked and answered.
Which I must say makes me curious if the question is from the same course?? |
OP said "I have a txt file that contain sentences with the following format..." His example contained only one line so I made a two-line test file. A good test file should have unique words.
With this InFile ...
Code:
B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42
Code:
awk 'BEGIN{FS="id"}
Code:
B01 b02 id1 b11 b12 b13 id2 b21 b22 id3 b31 b32 b33 id4 b41 b42
I don't know perl, so might have botched the test, but the results generated by the code in a previous post look wrong. Please make your own test.
Daniel B. Martin |
It sounded to me like he wanted the id count reset for every line of input. Is this not the case?
Code:
perl -e '$n=1;while(defined($x=<STDIN>)) {foreach $w(split(/\s+/, $x)){if ($w =~ /^id\d+$/){print "id".$n++." ";}else{print"$w "}}print"\n";}' |
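Whether the count resets for every line comes down to where the counter is initialised. Here is a small sketch of both behaviours in awk, assuming whole-word ids. The name renumber_mode and its reset/running argument are made up for illustration.

```shell
# Hypothetical helper: renumber_mode reset|running < file
renumber_mode() {
  awk -v mode="$1" '{
    if (mode == "reset") n = 0             # start again at id1 on every line
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id[0-9]+$/)
        $i = "id" (++n)
    print
  }'
}

printf 'a id2 b id1\nc id9 d id3\n' | renumber_mode reset
# prints: a id1 b id2
#         c id1 d id2
printf 'a id2 b id1\nc id9 d id3\n' | renumber_mode running
# prints: a id1 b id2
#         c id3 d id4
```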
Quote:
Daniel B. Martin |
Hi,
thanks for your interest in my issue. I just wanted the id numbers to be reset! To be honest, I didn't understand exactly what the awk command does. Thanks again for your time. |
Here is an awk alternative:
Code:
echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" " |
Quote:
The thread title is Sort specific words within sentence. Sorting means reordering the data without changing any of it. By contrast, what you wanted was to change some of the data (the ID number) without reordering it. The confusion arose from use (or misuse) of the word sort. Daniel B. Martin |
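To make the difference concrete, here is a rough sketch of an actual sort, as opposed to a renumbering. It assumes each id word carries the words after it along, which matches the output shown earlier in this thread for the B01 ... test line. It is not the awk command from that post, and sort_id_segments is a made-up name. A plain insertion sort is used so that it does not depend on gawk-only functions.

```shell
# Hypothetical helper: reorder "idN word word ..." segments by N, per line.
sort_id_segments() {
  awk '{
    nseg = 0; head = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^id[0-9]+$/) {             # an id word opens a new segment
        nseg++
        key[nseg] = substr($i, 3) + 0      # numeric part of the id
        seg[nseg] = $i
      } else if (nseg == 0) {
        head = (head == "" ? $i : head " " $i)   # words before the first id
      } else {
        seg[nseg] = seg[nseg] " " $i       # words belonging to the current id
      }
    }
    for (a = 2; a <= nseg; a++) {          # insertion sort by numeric id
      k = key[a]; s = seg[a]
      for (b = a - 1; b >= 1 && key[b] > k; b--) {
        key[b + 1] = key[b]; seg[b + 1] = seg[b]
      }
      key[b + 1] = k; seg[b + 1] = s
    }
    line = head
    for (a = 1; a <= nseg; a++)
      line = (line == "" ? seg[a] : line " " seg[a])
    print line
  }'
}

printf '%s\n' 'B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42' | sort_id_segments
# prints: B01 b02 id1 b11 b12 b13 id2 b21 b22 id3 b31 b32 b33 id4 b41 b42
```

Here no id number is changed, only the order of the segments. Renumbering does the opposite: the words stay where they are and only the numbers change.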
Hi all,
I had some time to test your code, but unfortunately it is not working. Here are the commands and their output:
Quote:
Quote:
Quote:
Thanks |
Quote:
Code:
B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42
Daniel B. Martin |
Mine is due to the extra character you introduced by putting an id at the end of the line: echo appends a newline character, which ends up attached to the last word.
You can simply fix mine by passing -n to echo:
Code:
echo -n 'id2 word1 id1 id3 word2 id4' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
The perl one you might have to wait for. I can tell you it is due to the fact that you changed your format regarding where ids might appear. So far I was able to correct all but the last:
Code:
echo -n 'id2 word1 id1 id3 word2 id4' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s*id\d+\s*/, $x)); }' |
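Since echo -n is not portable across shells, another option is to strip the newline inside awk instead. This is a minimal sketch, assuming a single line of input and whole-word ids. The name renumber_words is made up. Matching ^id[0-9]+$ and replacing the whole record also keeps multi-digit ids such as id10 from being mangled.

```shell
# Hypothetical helper: one word per record, as in the RS=" " one-liner.
renumber_words() {
  awk 'BEGIN { RS = " "; ORS = " " }
       { sub(/\n$/, "") }                  # drop the newline echo adds to the last word
       /^id[0-9]+$/ { $0 = "id" (++i) }    # replace the whole id word, not just one character
       { print }'
}

echo 'id2 word1 id1 id3 word2 id4' | renumber_words
# prints: id1 word1 id2 id3 word2 id4   (with a trailing space and no final newline)
```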