sequence of numbers, how to extract which numbers are missing

jonlake · 06-20-2006, 12:39 PM

I have a sequence of numbers sorted in an array, but they are not necessarily contigous. (ex 1,2,4,7,8). How can I iterate through determine which numbers are missing? They should each count up by one.

I have done this as the following, but think there may be a better way to do it.
Started at the first value, if the next array value is 1 greater than the previous, then it is in sequence, then move to the next array element.

Thanks,

Jon

graemef · 06-20-2006, 12:56 PM

You'd have to step through each element of the array once so you approach would appear to be alright. Was there something more specific that you were concerned about?

demon_vox · 06-20-2006, 05:29 PM

Hi,
I dont know what language you are using, so I will make my examples in Python

One idea is to generate another array with the complete sequence and then filter the ones that are in your array.
You would have:

Code:

numbers = [1,4,5,7,9]
maxNumber = max(numbers)
sequence = range(1, maxNumber + 1) # +1 because it stops one less

# this is a function that returns true if a value is not in your array of numbers
def notIncluded(x): return x not in numbers

dif = filter(notIncluded, sequence)

#here you have en dif the numbers that are missing

Another solution (far more declarative) is, again, to generate the whole sequence of numbers and then remove the ones you have:

Code:

numbers = [1,4,5,7,9]
maxNumber = max(numbers)
sequence = range(1, maxNumber + 1) # +1 because it stops one less

# This aplies the function sequence.remove to all the elements
# in numbers. Since sequence.remove takes one element from the
# sequence list, at the end you'll have in the sequence
# variable, a list with the missing numbers
map(sequence.remove, numbers)

The drawback from this solutions is that you have to generate an array with the whole sequence, which could take a lot of memory if your sequence is extremely long.
Plus, you could need to write some more depending on the language you are using (i.e. in C you would have to work some more for this solution since you dont any of this prebuilt functions).

But its another solution which might be useful (which I hope it is).

Greetings!

xhi · 06-20-2006, 06:39 PM

Quote:

Originally Posted by jonlake

I have a sequence of numbers sorted in an array, but they are not necessarily contigous. (ex 1,2,4,7,8). How can I iterate through determine which numbers are missing? They should each count up by one.

I have done this as the following, but think there may be a better way to do it.
Started at the first value, if the next array value is 1 greater than the previous, then it is in sequence, then move to the next array element.

Thanks,

Jon

i dont see a problem with how you are doing it unless you also needed to keep track of which numbers were missing. then you would need 2 counters, one for the index position in the array and one for the ordered count.

jonlake · 06-21-2006, 08:34 AM

Thought I put in the language, but I am using perl. My sequence is about 6000 numbers long, so it can take some time.

graemef · 06-21-2006, 08:56 AM

It shouldn't take very long you only need to step through the list once so the time required will be linear.

demon_vox · 06-21-2006, 09:51 AM

Hi,
6000 numbers doesnt seem too much for todays computers (mine took no time in solving it...) unless your max number is 10000000.

But since we are talking Perl, here is a Perl solution. First, since we are talking Perl, you array maybe shouldnt be an array. It should probably be a hash, with your numbers as key. I think this is better because Perl doesnt have a happy way to check if a number is in an array. So, apart from the solution, to change your array to a hash be could write:

Code:

my %hashNumbers = ();
map { $hashNumbers{$_} = 1 } @numbers;

and then we could solve the problem with:

Code:


map {
  push @result, $_ unless (defined($hashNumbers{$_}));
} (1..6000);

And as I am writting this I came with a cooler solution! (I just love Perl

). My new idea is to traverse the numbers array just once, and in each step we generate the missing secuence (with the .. operator). Something like this:

Code:

my @numbers = (1, 4, 5 ,7 ,9);

$lastNumber = 0;
@result = ();

for $currentNumber (@numbers) {
  $min = $lastNumber + 1;
  $max = $currentNumber - 1;
  push @result,  ($min .. $max);
  $lastNumber = $currentNumber;
}

print "$_\n" for (@result);

I like this solution because there is no searching and we just iterate once over the array.

Well I cant think of anything more right now (plus I should get to work jejejejje).

Hope this is useful!
Greetings!

jonlake · 06-21-2006, 03:35 PM

Thank you so much for your replies. demon_vox I was able to use some of the code you gave me to improve my scripts. I hadn't used map or push before (briefly heard of them) but have found ways to improve my scripts with them.

Thanks,

Jon

ppanyam · 06-22-2006, 12:32 AM

Similar to demon_vox's code, but more like a C-programmer!!!

Code:

#!/usr/bin/perl5 -w
my @numbers = (1, 4, 5 ,7 ,9);

@result = ();

for ($i=0;$i<$#numbers;$i++) {
  if ( $numbers[$i+1] - $numbers[$i] > 1)  {
     push @result,  ($numbers[$i]+1 .. $numbers[$i+1]-1);
    } 
}
print "$_\n" for (@result);

elluva · 06-22-2006, 03:14 AM

hmm, you could use a search like algorithm to check where you have an error. Since all your numbers add one up, you know what number that should be in the middle. So you read the number in the middle, if it is what you expected this means the first half of the list is ok. If it isn't, this means you have a problem in the first half of the list. You just repeat this for the first have of the list, until you find the index where the problem resides. You solve the problem and start all over again.

Searching a list this way is O(log n) in time, so as long as there aren't to many issues it should give you a speedup. Of course if you have to solve your problem by inserting stuff, this will get slow because the whole tail of the list will have to be copied everytime! This was probably your problem in the first place.

graemef · 06-22-2006, 04:25 AM

Quote:

Originally Posted by elluva

Searching a list this way is O(log n) in time

But if every element is in error ie [1,3,5,7,9] Then it is O(n), but using your idea you could start from the end of the list and then mimic a binary search. The best case would therefore be O(2), a list with one error would be O(log n) a list with m errors would be O(min(mlog n, n)) and so a list with n errors would be O(n).

bigearsbilly · 06-22-2006, 06:09 AM

my go!

Code:

#!/usr/bin/perl -ws



# make random list
@list = sort{$a<=>$b}  map {int rand(30) + 1} (1..15);

# your list goes here
# ===================
%hash = map {$_, $_} @list; # make a hash of it!!!
@missing = grep { !defined($hash{$_}) && $_ } (1..$list[-1]);

print "@list\n";
print "missing = @missing\n";

ppanyam · 06-23-2006, 12:33 AM

Wow!

Quote:

%hash = map {$_, $_} @list; # make a hash of it!!!
@missing = grep { !defined($hash{$_}) && $_ } (1..$list[-1]);

Can you anglicize your perl code plz. I dont seem to understand anything which isnt procedural(& OO).

ppanyam

bigearsbilly · 06-26-2006, 03:28 AM

I will try!

Code:

%hash = map {$_, $_} @list; # make a hash of it!!!

this is making a hash of the list by doubling up the members.
like $hash{1} = 1;
map returns what is in the {} block for each member of the list,
in this case just ($_, $_ } (could have been any value)

i.e. simply: (1,2,3) becomes (1,1,2,2,3,3)
so assign this to a hash. like {1=>1, 2=>2, 3=>3 } etc.

Code:

@missing = grep { !defined($hash{$_}) && $_ } (1..$list[-1]);

as you said the list is sorted, (i assumed ascending) then the last element $list[-1]
must be the highest number. So (1..$list[-1]) is the list of integers between the two.

grep finds in a list (1..max), items ($_) that satisfy the block expression.
so we find items that are NOT defined in the previous hash and return the value.
( just using !defined($hash{$_}) would return 1 each time, we need to return the value
hence && $_)

I hope that helps.

map applies function to each member of a list
grep simply is used for filtering out members of a list