LinuxQuestions.org


corfuitl 10-27-2014 03:28 PM

Sort specific words within sentence
 
Hello all,

I have a text file that contains sentences in the following format:

Code:

Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla
I want to change the position of id{1..n} so that the smaller numbers come first and the larger ones later. For instance:
Code:

Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla
Do you know if there is any script for fixing this issue?

Thanks,

suicidaleggroll 10-27-2014 05:30 PM

If I understand you correctly, you really don't care where in the string "id1", "id2", etc. are, nor do you care about any of the words surrounding each instance, you just want the first "id#" to be called "id1", the second to be called "id2", etc. Is that correct?

In that case, I would suggest modifying your approach to simply perform a string replacement on any "id#" you find. Replace the first instance, no matter what number it contains, with id1. Replace the second instance with id2, etc. I imagine this could be easily accomplished with awk or sed. Of course you'll need to think about outlier situations. E.g., is it possible to have an id4 without an id3? If so, do you want the output to show id4 or id3? Can there be duplicates, and if so, what do you want the result to be?
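As a rough sketch of the approach described above (not a tested production script), the renumbering can be done with awk by walking the fields of each line and rewriting every word that looks like "id" followed by digits. The function name renumber_ids is made up for illustration. The sketch assumes that the counter restarts at 1 on every line and that the ids are separate whitespace-delimited words.

```shell
# Hypothetical helper (the name is an assumption, not from the thread):
# rewrite the k-th "idN" word on each line as "idk".
renumber_ids() {
  awk '{
    n = 0                          # assumption: numbering restarts on every line
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id[0-9]+$/)       # only whole words such as id2 or id17
        $i = "id" (++n)
    print
  }' "$@"
}

echo 'id2 word1 id1 id3 word2 id4' | renumber_ids
```

Because assigning to a field makes awk rebuild the line, runs of spaces or tabs collapse to single spaces on lines that contain an id. Removing the n = 0 line would make the numbering continue across lines instead of restarting.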

corfuitl 10-27-2014 05:36 PM

hi,

Thank you for your reply. Yes, you are right: I want the first id# to become id1, the second to become id2, and so on. There are no duplicates. Could you please give me the command, or something to start from?

Thank you in advance for your support.

ttk 10-27-2014 06:38 PM

perl works pretty well for this:

Quote:

echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s+id\d+\s+/, $x)); }'
Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla

corfuitl 10-28-2014 04:16 AM

hi,

Thanks for your prompt reply. The Perl one-liner works pretty well for this :)

grail 10-28-2014 05:24 AM

I think if you search the forums here with almost your exact input data, you will see that this has already been asked and answered.
That makes me curious whether the question comes from the same course.

danielbmartin 10-28-2014 11:18 AM

OP said "I have a text file that contains sentences in the following format..." His example contained only one line, so I made a two-line test file. A good test file should have unique words.

With this InFile ...
Code:

B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42
B11 b12 id6 b61 b62 id7 b71 b72 b73 b74 id5 b51 b52 b53

... this awk ...
Code:

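awk 'BEGIN{FS="id"}   # split each line at every literal "id"
  # Prefix a dummy "id0 " so the leading words form a chunk that sorts first,
  # then sort the chunks as strings (asort is a gawk extension).
  {$0="id0 "$0; split($0,w); asort(w,m);
  # Rejoin the sorted chunks, skipping the empty chunk before "id0".
  s=""; for (k=2;k<=NF;k++) {s=s"id"m[k]};
  sub(/id0 /,"",s); print s}' "$InFile" >"$OutFile"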

... produced this OutFile ...
Code:

B01 b02 id1 b11 b12 b13 id2 b21 b22 id3 b31 b32 b33 id4 b41 b42
B11 b12 id5 b51 b52 b53 id6 b61 b62 id7 b71 b72 b73 b74

... which looks correct.

I don't know Perl, so I might have botched the test, but the results generated by the code in a previous post look wrong. Please make your own test.

Daniel B. Martin

ttk 10-28-2014 12:10 PM

It sounded to me like he wanted the id count reset for every line of input. Is this not the case?

Code:

perl -e '$n=1;while(defined($x=<STDIN>)) {foreach $w(split(/\s+/, $x)){if ($w =~ /^id\d+$/){print "id".$n++." ";}else{print"$w "}}print"\n";}'

danielbmartin 10-28-2014 01:04 PM

Quote:

Originally Posted by ttk (Post 5260835)
Code:

perl -e '$n=1;while(defined($x=<STDIN>)) {foreach $w(split(/\s+/, $x)){if ($w =~ /^id\d+$/){print "id".$n++." ";}else{print"$w "}}print"\n";}'

The output from the perl code (second line) is still not the same as that of the awk code.

Daniel B. Martin

corfuitl 10-28-2014 02:55 PM

Hi,

Thanks for your interest in my issue. I just wanted the id numbers to be reset!

To be honest, I didn't understand what exactly the awk command does.

Thanks again for your time.

grail 10-28-2014 06:17 PM

Here is an awk alternative:
Code:

echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
You do end up with an extra space at the end ... but you get the idea :)

danielbmartin 10-29-2014 05:56 AM

Quote:

Originally Posted by corfuitl (Post 5260946)
To be honest, I didn't understand what exactly the awk command does.

After reconsideration, I realize my solution doesn't match your problem.

The thread title is Sort specific words within sentence. Sorting means reordering the data without changing any of it. By contrast, what you wanted was to change some of the data (the ID number) without reordering it. The confusion arose from use (or misuse) of the word sort.
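To make the distinction concrete, here is a sketch of what a literal sort of the id words might look like: the ids found on a line are put in ascending order into the positions where ids already stand, and no number is changed. The function name sort_id_slots is hypothetical, and the sketch assumes ids are separate whitespace-delimited words. It uses a small hand-written insertion sort so that it does not depend on gawk's asort.

```shell
# Hypothetical helper (name is an assumption): sort the id words of each
# line numerically, leaving every other word where it is.
sort_id_slots() {
  awk '{
    n = 0
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id[0-9]+$/) {
        pos[++n] = i                 # field position of the n-th id
        num[n] = substr($i, 3) + 0   # its numeric part
      }
    # insertion sort of num[1..n]
    for (a = 2; a <= n; a++) {
      v = num[a]
      for (b = a - 1; b >= 1 && num[b] > v; b--)
        num[b + 1] = num[b]
      num[b + 1] = v
    }
    for (k = 1; k <= n; k++)
      $(pos[k]) = "id" num[k]
    print
  }' "$@"
}

echo 'B11 b12 id6 b61 b62 id7 b71 b72 b73 b74 id5 b51 b52 b53' | sort_id_slots
```

On this line a sort yields id5, id6, id7 in the three id positions, whereas the renumbering the OP asked for would yield id1, id2, id3. The two only coincide when a line's ids already run from 1 to n without gaps.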

Daniel B. Martin

corfuitl 10-29-2014 09:07 AM

Hi all,

I had some time to test your code, but unfortunately it is not working.

Here are the commands and their outputs:


Quote:

echo 'id2 word1 id1 id3 word2 id4' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s+id\d+\s+/, $x)); }'
id2 word1 id1 id3 word2

Quote:

echo 'id2 word1 id1 id3 word2 id4' | awk 'BEGIN{FS="id"} {$0="id0 "$0; split($0,w); asort(w,m); s=""; for (k=2;k<=NF;k++) {s=s"id"m[k]}; sub(/id0 /,"",s); print s}'
id1 id2 word1 id3 word2 id4

Quote:

echo 'id2 word1 id1 id3 word2 id4' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
id1 word1 id2 id3 word2 id44
I would greatly appreciate it if you could kindly give me some help.

Thanks

danielbmartin 10-29-2014 09:42 AM

Quote:

Originally Posted by corfuitl (Post 5261358)
I had some time to test your code, but unfortunately it is not working.

Help us to help you. Take this sample input file ...
Code:

B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42
B11 b12 id6 b61 b62 id7 b71 b72 b73 b74 id5 b51 b52 b53

... and construct (by hand) the corresponding output file. That will give us a better idea of what you want and also give us something to check against the results produced by our code.

Daniel B. Martin

grail 10-29-2014 09:43 AM

Mine fails because your id is now at the end of the line, so the newline character that echo appends becomes part of the last record.
You can simply fix mine by passing -n to echo:
Code:

echo -n 'id2 word1 id1 id3 word2 id4' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
I believe Daniel has already advised why his probably won't give the desired output.
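The effect of the trailing newline can be made visible with a small diagnostic sketch. The helper name show_records is made up for illustration. It prints each space-separated record in brackets together with its length, using the same record separator as the one-liner above.

```shell
# Hypothetical diagnostic helper (name is an assumption): show what awk
# sees as records when the record separator is a single space.
show_records() {
  awk -v RS=" " '{ printf "[%s] len=%d\n", $0, length($0) }'
}

printf 'id2 word1 id4\n' | show_records   # input ends with a newline, like echo
printf 'id2 word1 id4'   | show_records   # no trailing newline, like echo -n
```

In the first call the last record is id4 plus the newline, four characters long, so sub(/.$/, ...) replaces the newline rather than the digit. The same sub also replaces only the final character, so a two-digit id such as id10 would not be rewritten cleanly either.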

For the Perl one you might have to wait. I can tell you that the failure is due to the fact that you changed where the ids may appear in the line.
I was able to correct all but the last id:
Code:

echo -n 'id2 word1 id1 id3 word2 id4' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s*id\d+\s*/, $x)); }'
So you may need to wait for ttk to help further on that one.

