Linux - Software: This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I have a contacts list with approximately 1400 entries.
What I need to do is determine who is related to whom, or in other words, who shares contact info (phone/address) with whom. I need to parse these contacts in a way that shows which ones share the same contact details, like an address or a phone number. I know I could do it "manually" with KAddressBook, but I want something more robust that puts everything in front of me.
My solution could be a stand-alone application, a script, or a web-based (LAMP) app/script.
For starters, where is the list now - in KAddressBook, or in some other app/file?
The best solution would depend on the format of the data, and on which scripting/programming tools and languages you are most comfortable with.
One approach might be to export the data to CSV, vCard, or some other text-based format, and then write a script to do a comparison. If this is a one-off exercise, you could probably find the duplicates with tools like grep or cut to extract the fields you want, and sort and uniq to find the duplicates. For a more permanent tool there are probably better options, though, depending on your skills.
Thank you for your response. Currently I am using KAddressBook and Evolution. I am most comfortable with PHP, but that's about the extent of my programming skills.
I guess what I am hoping for is to be able to list which contacts share a data point, with some analysis of such, if possible. I am playing with a few LDAP web-based apps, but nothing yet does what I want, at least out of the box.
Could grep be used to find relations between data records? I assumed I would still have to input an initial value to find its match, much like a general "search" algorithm.
You have a slew of options ... including both databases (even SQLite "files"), and the ubiquitous spreadsheet.
Although "1,400 records" is daunting for a human being, it's child's play for a computer. Quite frankly, I'd load the data into a spreadsheet (say, OpenOffice, or, dare I say it, Microsoft Excel ...), and, on additional "notebook pages," begin looking for commonality. You might wish to, for instance, sort the records by a particular column or set of columns and then simply scroll through them, looking for groups of identical or nearly-identical information (since sorting places these records adjacent to one another).
"Keep it simple. Very simple." You don't need to write a LAMP website. You probably won't even have to write a script. Your spreadsheet tool already possesses database connectivity, but, with "only" 1,400 records in play, I'm not entirely sure I'd bother.
Could grep be used to find relations between data records? I assumed I would still have to input an initial value to find its match, much like a general "search" algorithm.
Let's look at duplicate phone numbers as an example. By suggesting "grep" I meant using it to extract the phone numbers (first step); grep would be a good tool for this if you export the data to vCard or LDIF file(s). Then use "sort" and "uniq" to find all phone numbers that are duplicated.
Once you have the numbers that are duplicated, you can use the number as pattern for "grep" to find the contacts with that number.
As an example, I have a bunch of contacts in individual vcard files. I would do something like this:
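The actual command appears to have been lost from the post. A reconstruction, assuming each contact lives in its own `.vcf` file with `TEL:` lines (the sample data below is made up purely for illustration), might look like this:

```shell
# Create a few sample vCard files (hypothetical data for illustration)
mkdir -p contacts
cat > contacts/alice.vcf <<'EOF'
BEGIN:VCARD
FN:Alice
TEL:555-1234
END:VCARD
EOF
cat > contacts/bob.vcf <<'EOF'
BEGIN:VCARD
FN:Bob
TEL:555-1234
END:VCARD
EOF
cat > contacts/carol.vcf <<'EOF'
BEGIN:VCARD
FN:Carol
TEL:555-9999
END:VCARD
EOF

# Extract the phone lines, count occurrences, most frequent first
grep -h '^TEL' contacts/*.vcf | sort | uniq -c | sort -rn

# Reverse lookup: list the files (contacts) that share a given number
grep -l 'TEL:555-1234' contacts/*.vcf
```

The `-h` flag suppresses filenames so that identical numbers sort together, and `grep -l` then lists which contact files contain a number once you know it is duplicated.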
This will give me a count (number of occurrences) for each number; where the count is higher than 1, I have a duplicate ;-)
If you export to a CSV file, you could use "cut" to extract the relevant field from the file (no grep required), and then do the same "sort | uniq -c | sort -rn" on that output.
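As a sketch of that CSV variant - the column layout below (name, email, phone) is an assumption, so adjust `-f3` to wherever the phone field actually lands in your export:

```shell
# Hypothetical CSV export: name,email,phone
cat > contacts.csv <<'EOF'
Alice,alice@example.com,555-1234
Bob,bob@example.com,555-1234
Carol,carol@example.com,555-9999
EOF

# Extract the phone field (column 3) and count duplicates
cut -d, -f3 contacts.csv | sort | uniq -c | sort -rn

# Or skip the counting: print only the values that occur more than once
cut -d, -f3 contacts.csv | sort | uniq -d
```

`uniq -d` is a handy shortcut here: it prints only the duplicated values, so you can feed them straight back into grep as search patterns.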
I hope that makes more sense now ;-)
If you want to write something in PHP, I would start out by exporting to CSV, and importing that into an SQLite database as sundialsvcs' suggested. From there it shouldn't be too difficult to write some code to find duplicates. One advantage of such an approach is you could also sanitize the data a little in the PHP code (like standardize the format of phone numbers).
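A minimal sketch of that route using the sqlite3 command-line shell rather than PHP (the CSV layout and column names are assumptions; PHP's PDO SQLite driver could run the same query):

```shell
# Hypothetical CSV export with a header row
cat > contacts.csv <<'EOF'
name,email,phone
Alice,alice@example.com,555-1234
Bob,bob@example.com,555-1234
Carol,carol@example.com,555-9999
EOF

# Import into SQLite and list phone numbers shared by more than one contact
sqlite3 contacts.db <<'EOF'
.mode csv
.import contacts.csv contacts
SELECT phone, COUNT(*), GROUP_CONCAT(name)
FROM contacts
GROUP BY phone
HAVING COUNT(*) > 1;
EOF
```

`GROUP BY phone HAVING COUNT(*) > 1` is the core of the duplicate search; the same query works for an address column, and a PHP script could normalize the phone format before the import, as suggested above.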