LinuxQuestions.org


Rail89 06-18-2014 02:05 PM

Sorting Email list
 
Hi,
I have an email list that I would like to sort.
The problem is that the addresses can be of different lengths and won't always have the same number of dot-separated fields.

Example

Hello@zmail.com
Hello.World@zmail.net
Hello@mail.zmail.org
Hello.World@mail.zmail.edu

This makes it difficult for me because I want to sort first by the .com/.net/.etc part.

Then by the domain name.

Then the username.

So this list:

Bill.Mattews@amail.com
Chester.Cheese@dmail.edu
David@other.ymail.com
Matt@zmail.edu
Carter@bmail.net
Edison@cmail.org
Jason@new.amail.com
Nathon.Apple@other.ymail.com
NoExcuses@bmail.net
Lana@bmail.com

Will look like this after sorting:

Bill.Mattews@amail.com
Jason@new.amail.com
Lana@bmail.com
David@other.ymail.com
Nathon.Apple@other.ymail.com
Chester.Cheese@dmail.edu
Matt@zmail.edu
Carter@bmail.net
NoExcuses@bmail.net
Edison@cmail.org


Thanks.

jpollard 06-18-2014 08:11 PM

You could do it in three steps:

1. Separate the various fields you want to sort by (by putting the extracted fields in front of the original line) using awk. In the case of the domain field, you would have to reverse the character order of the field.
2. Sort by the new fields.
3. Remove the new fields from the sorted file, outputting only the original data (also using awk).
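
The three steps above can be sketched as a single pipeline. This is only a minimal sketch under some assumptions: the function name sort_emails is hypothetical, every address is assumed to contain exactly one "@" and no "|", and it reverses the order of the dot-separated domain labels rather than the individual characters. It also uses cut instead of awk for the last step.

```shell
# Sketch of the three steps as one pipeline (sort_emails is a hypothetical name).
#   awk  : prefix each line with two keys, the domain labels in reverse order
#          and the user name, separated by "|"
#   sort : order by the domain key, then by the user name, ignoring case
#   cut  : drop the two keys, keeping only the original address
sort_emails() {
    awk -F'@' '{
        n = split($2, part, ".")
        key = part[n]
        for (i = n - 1; i >= 1; i--) key = key "." part[i]
        print key "|" $1 "|" $0
    }' | LC_ALL=C sort -f -t'|' -k1,1 -k2,2 | cut -d'|' -f3-
}

# Example: read addresses on standard input.
printf '%s\n' 'Matt@zmail.edu' 'Jason@new.amail.com' 'Bill.Mattews@amail.com' | sort_emails
```

LC_ALL=C is set only so that the ordering does not depend on the current locale; drop it if locale-aware collation is wanted.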

Rail89 06-20-2014 08:34 AM

jpollard,
From what you said I looked up awk and found this code:

Code:

awk 'BEGIN {FS="."; OFS="|"} {print $NF, $0}' test2 | sort -t"|" -k1
which gave me this result:

com|Bill.Mattews@amail.com
com|David@other.ymail.com
com|Jason@new.amail.com
com|Lana@bmail.com
com|Nathon.Apple@other.ymail.com
edu|Chester.Cheese@dmail.edu
edu|Matt@zmail.edu
net|Carter@bmail.net
net|NoExcuses@bmail.net
org|Edison@cmail.org

I believe this is part of step 1 that you mentioned but I don't understand when you said
Quote:

In the case of the domain field, you would have to reverse the character order of the field.
What exactly do you mean?

jpollard 06-20-2014 09:59 AM

You wanted things like "abc.dom.net" and "dom.net" to be grouped together. The easy way to do that is to sort them as reversed strings: "ten.mod" and "ten.mod.cba". This would put them together, in nearly the right order.
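
A small sketch of the reversed-string idea, to show why it only gets "nearly" the right order. The helper name reverse_chars and the sample domains are made up for illustration; the reversal is done in awk so that nothing beyond awk and sort is needed.

```shell
# Reverse the characters of every input line (reverse_chars is a hypothetical helper).
reverse_chars() {
    awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }'
}

# Reverse, sort, and reverse back. Subdomains end up next to their parent
# domain, but the domains themselves are ordered by their last characters,
# so ba.com is listed before ab.com.
printf '%s\n' 'ab.com' 'ba.com' 'x.ab.com' | reverse_chars | LC_ALL=C sort | reverse_chars
```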

An alternative (which is more work, but more accurate) would be to reverse the order of the names in the domain:
"net.dom.abc" and "net.dom" and then sort...

The intermediate file would be

name net.dom.abc name@abc.dom.net
name1 net.dom name1@dom.net

This gives you three fields, the first two are just keys for sorting. The last phase would just be a simple awk script that prints the third column.

The resulting sorted intermediate file would be:

name1 net.dom name1@dom.net
name net.dom.abc name@abc.dom.net

where the domain field is the primary key, and the name field the secondary key. As a side note, it would even be possible to switch the order such that the name field is second, but that is arbitrary as far as sort goes.

The two key fields would have to be specified separately so that the sort would start the strings in the proper column.
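
As a rough sketch of what specifying the keys separately could look like for the space-separated intermediate file above (the function name sort_intermediate is hypothetical): -k2,2 limits the primary key to the reversed-domain column, -k1,1 limits the secondary key to the name column, and the final awk prints only the third column.

```shell
# Sort the intermediate file by column 2 (reversed domain), then by column 1
# (name), and print only column 3 (the original address).
# sort_intermediate is a hypothetical name; input is read from standard input.
sort_intermediate() {
    LC_ALL=C sort -t' ' -k2,2 -k1,1 | awk '{ print $3 }'
}

printf '%s\n' \
    'name net.dom.abc name@abc.dom.net' \
    'name1 net.dom name1@dom.net' | sort_intermediate
```

Writing -k2 instead of -k2,2 would make the first key run from column 2 to the end of the line, so the name column would never act as a separate secondary key.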

Rail89 06-26-2014 09:22 AM

jpollard,

I tried to reverse the order of the names in the domains and I came up with this code:

First I move everything after the "@" sign to a different file.

Code:

awk -F'@' '{print $2}' "$FILE" > temp1
From the example list of emails I gave before, this line of code gave me this result.

amail.com
dmail.edu
other.ymail.com
zmail.edu
bmail.net
cmail.org
new.amail.com
other.ymail.com
bmail.net
bmail.com

I then tried to reverse the order of the names. I found this code online and I'm still trying to figure out how it works.

Code:

awk -F"." '{n=split($0,F); for(i in F) $i=F[n-i+1]}1' temp1 > temp2
This line of code gave me this as a result.

com amail
edu dmail
com ymail other
edu zmail
net bmail
org cmail
com amail new
com ymail other
net bmail
com bmail

I then pasted temp2 and the original file into another file with "|" as a separator.

Code:

paste -d'|' temp2 "$FILE" > temp3
This line of code gave me this as a result.
com amail|Bill.Mattews@amail.com
edu dmail|Chester.Cheese@dmail.edu
com ymail other|David@other.ymail.com
edu zmail|Matt@zmail.edu
net bmail|Carter@bmail.net
org cmail|Edison@cmail.org
com amail new|Jason@new.amail.com
com ymail other|Nathon.Apple@other.ymail.com
net bmail|NoExcuses@bmail.net
com bmail|Lana@bmail.com

I then sorted by the first column and then by the second column, removed the first column and placed it into another file.

Code:

sort -f -t'|' -k1,1 -k2,2 temp3 | awk -F'|' '{print $2}' > FileSorted.txt

Bill.Mattews@amail.com
Jason@new.amail.com
Lana@bmail.com
David@other.ymail.com
Nathon.Apple@other.ymail.com
Chester.Cheese@dmail.edu
Matt@zmail.edu
Carter@bmail.net
NoExcuses@bmail.net
Edison@cmail.org


I placed all the code in a script so that I didn't have to type each line of code every time.

Thanks for your help. I would never have thought of reversing the order of the names if you hadn't mentioned it.

jpollard 06-26-2014 09:08 PM

Quote:

Originally Posted by Rail89 (Post 5194412)
jpollard,

I tried to reverse the order of the names in the domains and I came up with this code:


Code:

awk -F"." '{n=split($0,F); for(i in F) $i=F[n-i+1]}1' temp1 > temp2

The statement "n=split($0,F)" does two things:
1) the array F receives the pieces of the domain name, split on the field separator
2) n receives the number of elements in the array

The expression "n - i + 1" is used to process the elements of the array in reverse order: field i is assigned element n-i+1, so the first field gets the last element, and so on.

If you look at the "awk" man page, you will see that split has two optional parameters. The first is a pattern to use when splitting the string; in your case you would want to split on the ".", so it would be the pattern "/\./". The other optional parameter (a gawk extension) is an array that receives the separator text found after the corresponding element. Using the extra parameter would eliminate the use of another array.
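
A minimal sketch of split with the separator pattern passed explicitly, so that it does not depend on -F. The function name reverse_labels is hypothetical, and the result is joined with "." here rather than with spaces.

```shell
# Reverse the dot-separated labels of each domain read from standard input
# (reverse_labels is a hypothetical name).
reverse_labels() {
    awk '{
        n = split($0, label, /\./)      # n = number of labels, label[1..n] = the pieces
        out = label[n]
        for (i = n - 1; i >= 1; i--)    # walk the array from the last element to the first
            out = out "." label[i]
        print out
    }'
}

printf '%s\n' 'other.ymail.com' 'amail.com' | reverse_labels
```

For the two sample domains this prints com.ymail.other and com.amail.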

Quote:

Thanks for your help. I would never have thought of reversing the order of the names if you hadn't mentioned it.

No problem. I once had a similar problem with a cross-reference listing and had to do the same type of thing, though mine didn't have as many elements in the name.


All times are GMT -5.