LinuxQuestions.org
Old 10-27-2014, 03:28 PM   #1
corfuitl
Member
 
Registered: Mar 2012
Posts: 38

Rep: Reputation: Disabled
Sort specific words within sentence


Hello all,

I have a txt file that contains sentences in the following format:

Code:
Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla
I want to change the positions of id{1..n} so that the smaller numbers come first and the larger ones later. For instance:
Code:
Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla
Do you know of a script that can do this?

Thanks,
 
Old 10-27-2014, 05:30 PM   #2
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,492

Rep: Reputation: 2090
If I understand you correctly, you really don't care where in the string "id1", "id2", etc. are, nor do you care about any of the words surrounding each instance, you just want the first "id#" to be called "id1", the second to be called "id2", etc. Is that correct?

In that case, I would suggest modifying your approach to simply perform a string replacement on any "id#" you find. Replace the first instance, no matter what number it contains, with id1. Replace the second instance with id2, etc. I imagine this could be easily accomplished with awk or sed. Of course you'll need to think about edge cases. E.g., is it possible to have an id4 without an id3? If so, do you want the output to show id4 or id3? Can there be duplicates, and if so, what do you want the result to be?
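
A minimal sketch of that replace-in-order idea in awk. It assumes that ids are whole whitespace-separated words of the form id followed by digits, and that the count restarts on every line; both are assumptions, and the helper name renumber_ids is made up for illustration. It ignores the original numbers entirely, so any gaps or duplicates in the input simply disappear in the output:

```shell
# renumber_ids: hypothetical helper name, not from the thread.
# Rewrites every word of the form id<digits> as id1, id2, ... in order
# of appearance, restarting at 1 on each input line.
renumber_ids() {
  awk '{
    n = 0                          # restart the counter for each line
    for (k = 1; k <= NF; k++)
      if ($k ~ /^id[0-9]+$/)       # whole-word match only: id12 yes, idea no
        $k = "id" (++n)
    print                          # assigning to a field collapses runs of blanks
  }'
}

echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | renumber_ids
# prints: Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla
```

Because awk rebuilds the line whenever a field is assigned, tabs or multiple spaces between words come out as single spaces on the lines that contain an id.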

Last edited by suicidaleggroll; 10-27-2014 at 05:32 PM.
 
Old 10-27-2014, 05:36 PM   #3
corfuitl
Member
 
Registered: Mar 2012
Posts: 38

Original Poster
Rep: Reputation: Disabled
hi,

Thank you for your reply. Yes, you are right: I want the first id# to be id1, the second to be id2, and so on. There are no duplicates. Could you please give me the command, or something to start with?

Thank you in advance for your support.
 
Old 10-27-2014, 06:38 PM   #4
ttk
Member
 
Registered: May 2012
Location: Sebastopol, CA
Distribution: Slackware
Posts: 520
Blog Entries: 20

Rep: Reputation: 532
perl works pretty well for this:

Quote:
echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s+id\d+\s+/, $x)); }'
Bla bla bla id1 bla bla bla bla id2 bla bla bla bla bla id3 bla bla bla id4 bla bla
 
1 members found this post helpful.
Old 10-28-2014, 04:16 AM   #5
corfuitl
Member
 
Registered: Mar 2012
Posts: 38

Original Poster
Rep: Reputation: Disabled
hi,

Thanks for your prompt reply. The Perl one-liner works well for this.
 
Old 10-28-2014, 05:24 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,507

Rep: Reputation: 2890
I think if you search the forums here using almost exactly your input data, you will see that this has already been asked and answered.
Which, I must say, makes me curious whether the question is from the same course.
 
Old 10-28-2014, 11:18 AM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,559

Rep: Reputation: 474
OP said "I have a txt file that contain sentences with the following format..." His example contained only one line so I made a two-line test file. A good test file should have unique words.

With this InFile ...
Code:
B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42 
B11 b12 id6 b61 b62 id7 b71 b72 b73 b74 id5 b51 b52 b53
... this awk ...
Code:
awk 'BEGIN{FS="id"}
  {$0="id0 "$0; split($0,w); asort(w,m);
   s=""; for (k=2;k<=NF;k++) {s=s"id"m[k]};
   sub(/id0 /,"",s); print s}' $InFile >$OutFile
... produced this OutFile ...
Code:
B01 b02 id1 b11 b12 b13 id2 b21 b22 id3 b31 b32 b33 id4 b41 b42 
B11 b12 id5 b51 b52 b53 id6 b61 b62 id7 b71 b72 b73 b74
... which looks correct.

I don't know perl, so I might have botched the test, but the results generated by the code in a previous post look wrong. Please make your own test.

Daniel B. Martin
 
Old 10-28-2014, 12:10 PM   #8
ttk
Member
 
Registered: May 2012
Location: Sebastopol, CA
Distribution: Slackware
Posts: 520
Blog Entries: 20

Rep: Reputation: 532
It sounded to me like he wanted the id count reset for every line of input. Is this not the case?

Code:
perl -e '$n=1;while(defined($x=<STDIN>)) {foreach $w(split(/\s+/, $x)){if ($w =~ /^id\d+$/){print "id".$n++." ";}else{print"$w "}}print"\n";}'

Last edited by ttk; 10-28-2014 at 12:19 PM.
 
Old 10-28-2014, 01:04 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,559

Rep: Reputation: 474
Quote:
Originally Posted by ttk
Code:
perl -e '$n=1;while(defined($x=<STDIN>)) {foreach $w(split(/\s+/, $x)){if ($w =~ /^id\d+$/){print "id".$n++." ";}else{print"$w "}}print"\n";}'
The output from the perl code (second line) is still not the same as that of the awk code.

Daniel B. Martin
 
Old 10-28-2014, 02:55 PM   #10
corfuitl
Member
 
Registered: Mar 2012
Posts: 38

Original Poster
Rep: Reputation: Disabled
Hi,

Thanks for your interest in my issue. I just wanted the id numbers to be reset!

To be honest, I didn't understand what exactly the awk command does.

Thanks again for your time.
 
Old 10-28-2014, 06:17 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,507

Rep: Reputation: 2890
Here is an awk alternative:
Code:
echo 'Bla bla bla id2 bla bla bla bla id3 bla bla bla bla bla id1 bla bla bla id4 bla bla' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
You do end up with an extra space at the end ... but you get the idea
 
Old 10-29-2014, 05:56 AM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,559

Rep: Reputation: 474
Quote:
Originally Posted by corfuitl
To be honest, I didn't understand what exactly the awk command does.
After reconsideration, I realize my solution doesn't match your problem.

The thread title is "Sort specific words within sentence". Sorting means reordering the data without changing any of it. By contrast, what you wanted was to change some of the data (the ID numbers) without reordering it. The confusion arose from the use (or misuse) of the word "sort".
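
To make the distinction concrete, here is a sketch of a third reading of the original post: leave every ordinary word where it is and sort only the existing id numbers across the positions the ids already occupy. This is an illustration, not code from the thread; the function name sort_ids_in_place is made up, and ids are assumed to be whole words of the form id followed by digits. When a line holds exactly id1..idn, the result is the same as renumbering; the two differ only when there are gaps (e.g. id6 id7 id5 becomes id5 id6 id7, not id1 id2 id3).

```shell
# sort_ids_in_place: hypothetical helper name, not from the thread.
# Keeps ordinary words in place and redistributes the existing id
# numbers over the id slots in ascending order, line by line.
sort_ids_in_place() {
  awk '{
    n = 0
    for (k = 1; k <= NF; k++)
      if ($k ~ /^id[0-9]+$/) {
        pos[++n] = k                   # field index of this id slot
        num[n] = substr($k, 3) + 0     # numeric part, forced to a number
      }
    for (a = 2; a <= n; a++) {         # insertion sort on num[1..n]
      v = num[a]
      for (b = a - 1; b >= 1 && num[b] > v; b--)
        num[b + 1] = num[b]
      num[b + 1] = v
    }
    for (a = 1; a <= n; a++)           # write the sorted numbers back
      $(pos[a]) = "id" num[a]
    print
  }'
}

echo 'B11 b12 id6 b61 b62 id7 b71 id5 b51' | sort_ids_in_place
# prints: B11 b12 id5 b61 b62 id6 b71 id7 b51
```

The hand-written insertion sort avoids asort(), which is a gawk extension, so the sketch should also run under other awk implementations.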

Daniel B. Martin
 
Old 10-29-2014, 09:07 AM   #13
corfuitl
Member
 
Registered: Mar 2012
Posts: 38

Original Poster
Rep: Reputation: Disabled
Hi all,

I had some time to test your code, but unfortunately it is not working.

Please find the commands and the outputs:


Quote:
echo 'id2 word1 id1 id3 word2 id4' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s+id\d+\s+/, $x)); }'
id2 word1 id1 id3 word2

Quote:
echo 'id2 word1 id1 id3 word2 id4' | awk 'BEGIN{FS="id"} {$0="id0 "$0; split($0,w); asort(w,m); s=""; for (k=2;k<=NF;k++) {s=s"id"m[k]}; sub(/id0 /,"",s); print s}'
id1 id2 word1 id3 word2 id4

Quote:
echo 'id2 word1 id1 id3 word2 id4' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
id1 word1 id2 id3 word2 id44
I would greatly appreciate it if you could kindly give me some help.

Thanks
 
Old 10-29-2014, 09:42 AM   #14
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,559

Rep: Reputation: 474
Quote:
Originally Posted by corfuitl
I had some time to test your codes but unfortunately they are not working.
Help us to help you. Take this sample input file ...
Code:
B01 b02 id2 b21 b22 id3 b31 b32 b33 id1 b11 b12 b13 id4 b41 b42 
B11 b12 id6 b61 b62 id7 b71 b72 b73 b74 id5 b51 b52 b53
... and construct (by hand) the corresponding output file. That will give us a better idea of what you want and also give us something to check against the results produced by our code.

Daniel B. Martin
 
Old 10-29-2014, 09:43 AM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,507

Rep: Reputation: 2890
Mine fails because your id is now at the end of the line and echo appends a newline character, which the sub() then replaces.
You can fix it simply by passing -n to echo:
Code:
echo -n 'id2 word1 id1 id3 word2 id4' | awk '/id/{sub(/.$/,++i)}1' RS=" " ORS=" "
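
For what it is worth, the one-liner has two fragile spots: sub(/.$/,++i) overwrites only the last character of the word, so it can eat a trailing newline (hence id44) and would mangle a multi-digit id such as id10. Below is a sketch of a variant that replaces the whole digit run instead. It assumes a single line of input, and the function name renumber_words is made up for illustration:

```shell
# renumber_words: hypothetical helper name, not from the thread.
# Same word-per-record trick (RS and ORS set to a space), but the whole
# digit run is replaced and an optional trailing newline is left alone.
renumber_words() {
  awk -v RS=' ' -v ORS=' ' '
    /^id[0-9]+\n?$/ { sub(/[0-9]+/, ++i) }   # id, digits, optional newline
    1                                        # print every record
  '
}

echo 'id2 word1 id10 id3 word2 id4' | renumber_words
# the ids come out as id1 id2 id3 id4, with no need for echo -n
```

The trailing space from ORS is still there, and the counter is never reset, so this only suits one line at a time.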
I believe Daniel has already advised why his probably won't give the desired output.

The perl one you might have to wait for; I can tell you it is due to the fact that you changed your format regarding where ids might appear.
So I was able to correct all but the last:
Code:
echo -n 'id2 word1 id1 id3 word2 id4' | perl -e 'while(defined($x=<STDIN>)) { $n = -1; print join("", map { ++$n > 0 ? " id$n $_" : $_ } split(/\s*id\d+\s*/, $x)); }'
So you may need to wait for ttk to help further on that one.
 
  



Tags
awk, bash, perl, python, sed


All times are GMT -5.
