sequence of numbers, how to extract which numbers are missing

ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Welcome to LinuxQuestions.org, a friendly and active Linux Community.

You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!

Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.

sequence of numbers, how to extract which numbers are missing

I have a sequence of numbers sorted in an array, but they are not necessarily contigous. (ex 1,2,4,7,8). How can I iterate through determine which numbers are missing? They should each count up by one.

I have done this as the following, but think there may be a better way to do it.
Started at the first value, if the next array value is 1 greater than the previous, then it is in sequence, then move to the next array element.

You'd have to step through each element of the array once so you approach would appear to be alright. Was there something more specific that you were concerned about?

Hi,
I dont know what language you are using, so I will make my examples in Python
One idea is to generate another array with the complete sequence and then filter the ones that are in your array.
You would have:

Code:

numbers = [1,4,5,7,9]
maxNumber = max(numbers)
sequence = range(1, maxNumber + 1) # +1 because it stops one less
# this is a function that returns true if a value is not in your array of numbers
def notIncluded(x): return x not in numbers
dif = filter(notIncluded, sequence)
#here you have en dif the numbers that are missing

Another solution (far more declarative) is, again, to generate the whole sequence of numbers and then remove the ones you have:

Code:

numbers = [1,4,5,7,9]
maxNumber = max(numbers)
sequence = range(1, maxNumber + 1) # +1 because it stops one less
# This aplies the function sequence.remove to all the elements
# in numbers. Since sequence.remove takes one element from the
# sequence list, at the end you'll have in the sequence
# variable, a list with the missing numbers
map(sequence.remove, numbers)

The drawback from this solutions is that you have to generate an array with the whole sequence, which could take a lot of memory if your sequence is extremely long.
Plus, you could need to write some more depending on the language you are using (i.e. in C you would have to work some more for this solution since you dont any of this prebuilt functions).

But its another solution which might be useful (which I hope it is).

I have a sequence of numbers sorted in an array, but they are not necessarily contigous. (ex 1,2,4,7,8). How can I iterate through determine which numbers are missing? They should each count up by one.

I have done this as the following, but think there may be a better way to do it.
Started at the first value, if the next array value is 1 greater than the previous, then it is in sequence, then move to the next array element.

Thanks,

Jon

i dont see a problem with how you are doing it unless you also needed to keep track of which numbers were missing. then you would need 2 counters, one for the index position in the array and one for the ordered count.

Hi,
6000 numbers doesnt seem too much for todays computers (mine took no time in solving it...) unless your max number is 10000000.
But since we are talking Perl, here is a Perl solution. First, since we are talking Perl, you array maybe shouldnt be an array. It should probably be a hash, with your numbers as key. I think this is better because Perl doesnt have a happy way to check if a number is in an array. So, apart from the solution, to change your array to a hash be could write:

And as I am writting this I came with a cooler solution! (I just love Perl ). My new idea is to traverse the numbers array just once, and in each step we generate the missing secuence (with the .. operator). Something like this:

Thank you so much for your replies. demon_vox I was able to use some of the code you gave me to improve my scripts. I hadn't used map or push before (briefly heard of them) but have found ways to improve my scripts with them.

hmm, you could use a search like algorithm to check where you have an error. Since all your numbers add one up, you know what number that should be in the middle. So you read the number in the middle, if it is what you expected this means the first half of the list is ok. If it isn't, this means you have a problem in the first half of the list. You just repeat this for the first have of the list, until you find the index where the problem resides. You solve the problem and start all over again.

Searching a list this way is O(log n) in time, so as long as there aren't to many issues it should give you a speedup. Of course if you have to solve your problem by inserting stuff, this will get slow because the whole tail of the list will have to be copied everytime! This was probably your problem in the first place.

But if every element is in error ie [1,3,5,7,9] Then it is O(n), but using your idea you could start from the end of the list and then mimic a binary search. The best case would therefore be O(2), a list with one error would be O(log n) a list with m errors would be O(min(mlog n, n)) and so a list with n errors would be O(n).

%hash = map {$_, $_} @list; # make a hash of it!!!

this is making a hash of the list by doubling up the members.
like $hash{1} = 1;
map returns what is in the {} block for each member of the list,
in this case just ($_, $_ } (could have been any value)

i.e. simply: (1,2,3) becomes (1,1,2,2,3,3)
so assign this to a hash. like {1=>1, 2=>2, 3=>3 } etc.

as you said the list is sorted, (i assumed ascending) then the last element $list[-1]
must be the highest number. So (1..$list[-1]) is the list of integers between the two.

grep finds in a list (1..max), items ($_) that satisfy the block expression.
so we find items that are NOT defined in the previous hash and return the value.
( just using !defined($hash{$_}) would return 1 each time, we need to return the value
hence && $_)

I hope that helps.

map applies function to each member of a list
grep simply is used for filtering out members of a list

LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.