Each file has around 19988 lines. The first and last numbers of each line in a.txt are the same as those of the corresponding line in b.txt: line #1 in a.txt and line #1 in b.txt both begin with 2 and end with 91, and so on for all lines. Lines can have different lengths, even corresponding ones: line #1 from a.txt has length 5, but line #1 in b.txt has length 6. The length is the number of numbers minus 1.
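A minimal sketch of the definitions above, assuming each line is a whitespace-separated list of integers. The helper names (`parse_line`, `path_length`, `same_endpoints`) are hypothetical, chosen for illustration only.

```python
def parse_line(line):
    """Split a whitespace-separated line of integers into a list of ints."""
    return [int(tok) for tok in line.split()]

def path_length(nodes):
    """Length as defined in the question: the number of numbers minus 1 (the number of jumps)."""
    return len(nodes) - 1

def same_endpoints(a, b):
    """True if two parsed lines are non-empty and share the same first and last number."""
    return bool(a) and bool(b) and a[0] == b[0] and a[-1] == b[-1]

a = parse_line("2 4 6 7 8 91")
b = parse_line("2 4 6 66 9 19 91")
print(path_length(a), path_length(b))  # prints: 5 6
print(same_endpoints(a, b))            # prints: True
```

Checking `same_endpoints` on every pair is a cheap way to confirm the stated invariant holds before relying on it in any similarity measure.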
Now, what I want to figure out is how similar corresponding lines are to each other, e.g.:
Line #1 from a.txt: 2 4 6 7 8 91
Line #1 from b.txt: 2 4 6 66 9 19 91
From left to right, (2 to 4) and (4 to 6) are the only two jumps shared by both lines, so the jump-similarity degree is 2. Also, how many numbers are shared by both lines? Only (2, 4, 6, 91), so the node-similarity degree is 4 - 2 = 2, since, as I mentioned earlier, the start and end are always the same. I'd appreciate your help with this!
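The two measures described above can be sketched with sets. This is a minimal sketch under two assumptions the question does not settle: a jump is an ordered pair (4 to 6 is not the same as 6 to 4), and a jump or number repeated within one line is counted only once. The function names are hypothetical.

```python
def jumps(nodes):
    """Return the consecutive (from, to) pairs of a line, left to right."""
    return list(zip(nodes, nodes[1:]))

def jump_similarity(a, b):
    """Count the distinct ordered jumps that appear in both lines."""
    return len(set(jumps(a)) & set(jumps(b)))

def node_similarity(a, b):
    """Count the distinct numbers shared by both lines, excluding the start and end,
    which the question says every pair of corresponding lines has in common."""
    shared = set(a) & set(b)
    return len(shared - {a[0], a[-1]})

a = [2, 4, 6, 7, 8, 91]
b = [2, 4, 6, 66, 9, 19, 91]
print(jump_similarity(a, b))  # prints: 2  (jumps 2->4 and 4->6)
print(node_similarity(a, b))  # prints: 2  (numbers 4 and 6)
```

If repeated jumps or numbers should count more than once, `collections.Counter` intersection (`&`) could replace the sets.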
You should also first do careful research to see whether you are, in fact, solving a problem that has already been solved, so that you do not need to write new code for any part of it (other than, say, the text parsing, which is trivial with regular expressions).
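As an illustration of the "trivial with regular expressions" parsing, here is a minimal sketch assuming the files contain only integers, possibly negative, separated by any mix of spaces, tabs, or commas. `NUMBER_RE` and `parse_numbers` are hypothetical names.

```python
import re

# An optional minus sign followed by one or more digits; assumes integers only.
NUMBER_RE = re.compile(r"-?\d+")

def parse_numbers(line):
    """Extract every integer from a line, whatever separators are used between them."""
    return [int(tok) for tok in NUMBER_RE.findall(line)]

print(parse_numbers("2, 4,6\t7 8 91"))  # prints: [2, 4, 6, 7, 8, 91]
```

For plain space-separated input like the example lines, `line.split()` would be enough; the regex only helps if the separators vary.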
Then, bring to bear a real programming language of your choice that has good support for vectors. Perl, Python, Ruby ... not Bash (which isn't a programming language anyway, and please don't start a tangent on this) and not C/C++ (which would be overkill). You want to find and use as much already-built-and-tested code as you can find ... over here, for instance.
Last edited by sundialsvcs; 12-22-2011 at 04:39 AM.
You seem to need to develop your program as two fundamental parts: one that implements the comparison of two records and produces some measure of similarity according to your requirements, and an iteration component that reads one record from each file and calls the comparison routine, passing the two records to it on each iteration. Shell scripting is probably sub-optimal for this, but depending on the complexity of your comparison algorithm, it is probably doable. You should be able to focus your design on these two elements more or less independently: the divide-and-conquer principle.
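The two-part structure described above can be sketched in Python, one of the languages suggested earlier in the thread. This is a minimal sketch assuming whitespace-separated integers and the jump/node definitions from the question; `compare`, `compare_files`, and the script name `compare_lines.py` are hypothetical.

```python
import sys

def compare(rec_a, rec_b):
    """Comparison part: return (jump_similarity, node_similarity) for two parsed records."""
    jumps_a = set(zip(rec_a, rec_a[1:]))
    jumps_b = set(zip(rec_b, rec_b[1:]))
    # The start and end are shared by every pair of records, so they are excluded.
    endpoints = {rec_a[0], rec_a[-1]}
    shared_nodes = (set(rec_a) & set(rec_b)) - endpoints
    return len(jumps_a & jumps_b), len(shared_nodes)

def compare_files(path_a, path_b):
    """Iteration part: read one record from each file per step and yield the comparison."""
    with open(path_a) as fa, open(path_b) as fb:
        # zip() stops at the end of the shorter file.
        for lineno, (line_a, line_b) in enumerate(zip(fa, fb), start=1):
            rec_a = [int(tok) for tok in line_a.split()]
            rec_b = [int(tok) for tok in line_b.split()]
            if not rec_a or not rec_b:
                continue  # skip blank lines but keep the line numbering intact
            yield (lineno, *compare(rec_a, rec_b))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_lines.py a.txt b.txt
    for lineno, jump_sim, node_sim in compare_files(sys.argv[1], sys.argv[2]):
        print(lineno, jump_sim, node_sim)
```

Because `compare` knows nothing about files, it can be tested on its own, and the iteration part can change (for example, to report files with different line counts) without touching it.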
No one here is likely to fully understand your requirements for the record comparison algorithm without a significantly more detailed description. You need to do this anyway, as part of your design process. Developing a rigorous specification should help you understand the probable method/algorithm that will ultimately be used. On the matter of the outer layer that iterates over all records in the files, that should be easily done with standard shell looping constructs and file IO. Shell commands/keywords like while and for are going to be part of the looping code. Getting data records from files will probably use read. If you choose to implement the code in some other language, the basic structure should probably be the same.
Start writing some code, and when you bump into roadblocks, post the relevant fragments for specific help.