Sorting Email list
Hi,
I have an email list that I would like to sort. The problem is that the addresses can be of different lengths and won't always have the same number of columns (dot-separated parts). Example:

Hello@zmail.com
Hello.World@zmail.net
Hello@mail.zmail.org
Hello.World@mail.zmail.edu

This makes it difficult for me because I want to sort first by the .com/.net/etc. part, then by the domain name, then by the username. So this list:

Bill.Mattews@amail.com
Chester.Cheese@dmail.edu
David@other.ymail.com
Matt@zmail.edu
Carter@bmail.net
Edison@cmail.org
Jason@new.amail.com
Nathon.Apple@other.ymail.com
NoExcuses@bmail.net
Lana@bmail.com

will look like this after sorting:

Bill.Mattews@amail.com
Jason@new.amail.com
Lana@bmail.com
David@other.ymail.com
Nathon.Apple@other.ymail.com
Chester.Cheese@dmail.edu
Matt@zmail.edu
Carter@bmail.net
NoExcuses@bmail.net
Edison@cmail.org

Thanks.

You could do it in three steps:
1. Separate the various fields you want to sort by, putting the extracted fields in front of the original line, using awk. In the case of the domain field, you would have to reverse the character order of the field.
2. Sort by the new fields.
3. Remove the new fields from the sorted file, outputting only the original data (also using awk).

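A minimal sketch of those three steps, assuming a POSIX shell with awk and sort. The function name sort_by_reversed_domain and the "|" separator are made up for illustration, and LC_ALL=C is only there to make the ordering reproducible. Because the key is the domain spelled backwards, addresses are grouped by ending, but the order inside a group follows the reversed spelling rather than the readable name.

```shell
# Hypothetical helper (name is an assumption, not from the thread).
# Stage 1: awk prefixes each line with the domain spelled backwards.
# Stage 2: sort orders the lines on that prefix only.
# Stage 3: awk drops the prefix and prints the original address.
sort_by_reversed_domain() {
    awk -F'@' '{
        key = ""
        for (i = length($2); i > 0; i--) key = key substr($2, i, 1)
        print key "|" $0
    }' |
    LC_ALL=C sort -t'|' -k1,1 |
    awk -F'|' '{ print $2 }'
}

result=$(printf '%s\n' 'Matt@zmail.edu' 'Lana@bmail.com' \
    'Jason@new.amail.com' 'Carter@bmail.net' | sort_by_reversed_domain)
printf '%s\n' "$result"
```

On this small input the .com addresses come first (Jason, then Lana), followed by Carter and then Matt.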
jpollard,
From what you said I looked up awk and found this code: Code:
awk 'BEGIN {FS="."; OFS="|"} {print $NF, $0}' test2 | sort -t"|" -k1

com|Bill.Mattews@amail.com
com|David@other.ymail.com
com|Jason@new.amail.com
com|Lana@bmail.com
com|Nathon.Apple@other.ymail.com
edu|Chester.Cheese@dmail.edu
edu|Matt@zmail.edu
net|Carter@bmail.net
net|NoExcuses@bmail.net
org|Edison@cmail.org

I believe this is part of step 1 that you mentioned, but I don't understand what you meant when you said Quote:
|
You wanted things like "abc.dom.net" and "dom.net" to be grouped together. The easy way to do that is to sort on the reversed strings: "ten.mod" and "ten.mod.cba". This would put them together nearly appropriately.
An alternative (which is more work, but more accurate) would be to reverse the order of the names in the domain: "net.dom.abc" and "net.dom", and then sort. The intermediate file would be:

name net.dom.abc name@abc.dom.net
name1 net.dom name1@dom.net

This gives you three fields; the first two are just keys for sorting. The last phase would just be a simple awk script that prints the third column. The resulting sorted intermediate file would be:

name1 net.dom name1@dom.net
name net.dom.abc name@abc.dom.net

where the domain field is the primary key, and the name field the secondary key. As a side note, it would even be possible to switch the order such that the name field is second, but that is arbitrary as far as sort goes. The two key fields would have to be specified separately so that the sort would start the strings in the proper column.

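The three-field intermediate format described above can be sketched roughly as follows. This is only an illustration: the function name sort_emails and the address alpha@dom.net are invented here, and LC_ALL=C is an assumption made so the ordering does not depend on the locale.

```shell
# Hypothetical helper: builds "name reversed-domain original" lines,
# sorts on field 2 (domain) and then field 1 (name), and prints field 3.
sort_emails() {
    awk -F'@' '{
        n = split($2, label, ".")
        key = label[n]
        for (i = n - 1; i >= 1; i--) key = key "." label[i]
        print $1, key, $0
    }' |
    LC_ALL=C sort -k2,2 -k1,1 |
    awk '{ print $3 }'
}

result=$(printf '%s\n' 'name@abc.dom.net' 'name1@dom.net' 'alpha@dom.net' | sort_emails)
printf '%s\n' "$result"
```

Here the two dom.net addresses sort ahead of abc.dom.net, and alpha comes before name1 because the name is the secondary key.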
jpollard,
I tried to reverse the order of the names in the domains and I came up with this code: First I move everything after the "@" sign to a different file. Code:
awk -F'@' '{print $2}' "$FILE" > temp1

amail.com
dmail.edu
other.ymail.com
zmail.edu
bmail.net
cmail.org
new.amail.com
other.ymail.com
bmail.net
bmail.com

I then tried to reverse the order of the names. I found this code online and I'm still trying to figure out how it works. Code:
awk -F"." '{n=split($0,F); for(i in F) $i=F[n-i+1]}1' temp1 > temp2

com amail
edu dmail
com ymail other
edu zmail
net bmail
org cmail
com amail new
com ymail other
net bmail
com bmail

I then pasted temp2 and the original file into another file with "|" as a separator. Code:
paste -d'|' temp2 "$FILE" > temp3

com amail|Bill.Mattews@amail.com
edu dmail|Chester.Cheese@dmail.edu
com ymail other|David@other.ymail.com
edu zmail|Matt@zmail.edu
net bmail|Carter@bmail.net
org cmail|Edison@cmail.org
com amail new|Jason@new.amail.com
com ymail other|Nathon.Apple@other.ymail.com
net bmail|NoExcuses@bmail.net
com bmail|Lana@bmail.com

I then sorted by the first column and then by the second column, removed the first column, and placed the result into another file. Code:
sort -f -t'|' -k1,1 -k2,2 temp3 | awk -F'|' '{print $2}' > FileSorted.txt

Bill.Mattews@amail.com
Jason@new.amail.com
Lana@bmail.com
David@other.ymail.com
Nathon.Apple@other.ymail.com
Chester.Cheese@dmail.edu
Matt@zmail.edu
Carter@bmail.net
NoExcuses@bmail.net
Edison@cmail.org

I placed all the code in a script so that I didn't have to type each line of code every time. Thanks for your help. I would never have thought of reversing the order of the names if you hadn't mentioned it.

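For anyone who wants to skip the temporary files, the same steps can probably be chained into one pipeline. This is a sketch, not the script from the post: the function name sort_email_file is invented, mktemp is used only to build a sample input, and LC_ALL=C is an assumption added for reproducible ordering.

```shell
# Hypothetical one-pass version of the temp1/temp2/temp3 script.
sort_email_file() {
    awk -F'@' '{
        # Reverse the domain labels: "new.amail.com" becomes "com amail new".
        n = split($2, part, ".")
        key = part[n]
        for (i = n - 1; i >= 1; i--) key = key " " part[i]
        print key "|" $0
    }' "$1" |
    LC_ALL=C sort -f -t'|' -k1,1 -k2,2 |
    awk -F'|' '{ print $2 }'
}

# Sample input: the list from the first post.
FILE=$(mktemp)
printf '%s\n' 'Bill.Mattews@amail.com' 'Chester.Cheese@dmail.edu' \
    'David@other.ymail.com' 'Matt@zmail.edu' 'Carter@bmail.net' \
    'Edison@cmail.org' 'Jason@new.amail.com' 'Nathon.Apple@other.ymail.com' \
    'NoExcuses@bmail.net' 'Lana@bmail.com' > "$FILE"

sorted=$(sort_email_file "$FILE")
rm -f "$FILE"
printf '%s\n' "$sorted"
```

The key and the address are still joined with "|" and sorted with the same sort options as in the post, so the output should match FileSorted.txt.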
Quote:
1) The array F receives the pieces of the domain name, as split by the field separator.
2) n receives the number of elements in the array.

The expression "n - i + 1" is used to process the elements of the array in reverse order. If you look at the awk man page, you will see that split takes two optional parameters. One specifies a pattern to use when splitting the string; in your case you would want to split on the ".", so the pattern would be /\./. (The other optional parameter is an array that receives the separator text matched at each split.) Using the extra parameter would eliminate the use of another array.
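To make the "n - i + 1" indexing concrete, here is a small sketch that passes the separator to split directly instead of relying on -F. The function name reverse_labels is hypothetical, and the explicit counting loop is a substitute for the "for (i in F)" form in the one-liner above.

```shell
# Hypothetical helper: prints the dot-separated labels of each input line
# in reverse order, separated by spaces.
reverse_labels() {
    awk '{
        # The third argument tells split to break the line on literal dots.
        n = split($0, F, /\./)
        # When i runs 1..n, the index n - i + 1 runs n..1.
        for (i = 1; i <= n; i++)
            printf "%s%s", F[n - i + 1], (i < n ? " " : "\n")
    }'
}

reversed=$(printf '%s\n' 'other.ymail.com' | reverse_labels)
printf '%s\n' "$reversed"
```

For "other.ymail.com", n is 3, so the loop prints F[3], F[2], F[1], giving "com ymail other".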