Pages

SPOJ Prime generator

Another well known source of programming problems is SPOJ, SPhere Online Judge. Among the SPOJ classical problems, the first (interesting) one is about building a prime generator, with a few specific requirements.

I guess the C++ solution I am about to describe here won't be considered a SPOJ top notch one, because instead of focusing on writing the most performant piece of code one can do, I have been more interested in writing something that is easy to understand. Anyone has his own priority, I guess.

The code should give as output all the prime numbers in a close interval [low, high] with a span less or equal to 100,000. The biggest number we are interested in is a mere 1,000,000,000. So no big performance issues, if you are using a decent machine.

Given the relatively small numbers involved, we can safely use the sieve of Eratosthenes to check if a given number is a prime or not, as already discussed in a previous post.

I found interesting to design a solution based on a class, like this:
class PrimeGenerator
{
private:
    enum { MAX_VALUE = 1000000000, MAX_DELTA = 100000 }; // 1
    std::vector<int> primes_; // 2
public:
    PrimeGenerator(); // 3
    std::vector<int> findPrimes(int low, int high); // 4
};
1. In the real world, I'd prefer to have these value configurable, but for our toy generator a couple of constants would suffice.
2. Here I'm going to store a few ten thousands prime numbers, enough of them to let the sieve working fast on any expected input interval. Given the requested MAX_VALUE, working with int (on a 64 bit architecture) is more than enough.
3. The constructor would take care of the boring and time expensive job of setting up the private primes vector.
4. So that this function would be (relatively) blazing fast.

My solution would make sense if we plan an extensive use of this prime generator, otherwise the pain that I take in the constructor to calculate all those prime numbers could be a sheer waste of time.

Here is the constructor:
PrimeGenerator::PrimeGenerator()
{
    primes_.reserve(3401); // 1
    primes_.push_back(2);
    primes_.push_back(3); // 2

    const int MAX_DIV = static_cast<const int>(std::ceil(std::sqrt(static_cast<double>(MAX_VALUE)))); // 3
    for(int i = 5; i <= MAX_DIV; i += 2) // 4
    {
        bool prime = true;
        for(std::vector<int>::iterator it = primes_.begin(); it != primes_.end(); ++it)
        {
            if(i % *it == 0)
            {
                prime = false;
                break;
            }
        }
        if(prime)
            primes_.push_back(i);
    }
}
1. What is this 3401 magic number? Well, I cheated. The sqrt(1,000,000,000) is a bit less than 31,623, and in [1 .. 31,623] there are 3401 prime numbers. So, knowing that these are the primes number that we need (if you wonder why, you could check my previous post on the sieve of Eratosthenes to clear your doubts - hopefully), I reserved this exact number of elements for the primes vector. This would save some memory (re)allocation time.
2. The first cases are so trivial, I decided to resolve them by hand.
3. Whats?! Don't be fooled by this longish line, it is actually saying only that I take the square root of the biggest value my toy generator manages, round to the next integer and converted to an int (counterintuitively the ceiling function ceil() returns a double value, if you think about it, it makes sense, still could be perplexing at first sight). This value is 31623, but I would have felt bad using another magic number in so few lines, being it so easy to calculate.
4. Check only odd numbers till square root max value, doesn't make sense checking for even numbers, since we already know they are not primes. When I found a prime, I add it to the private vector of primes. And so on till I checked all 31K+ values. A smarter prime calculator would cache those values in a file, so that I would have this pain just once. But let's assume this piece of code is meant to be loaded once in memory and stay there alive and running for a long time, so that optimizing its startup won't be much useful.

Once the vector containing that bunch of primes number is ready, I could call the only (other) method in the class public interface to get all the primes number in a given interval:
std::vector<int> PrimeGenerator::findPrimes(int low, int high)
{
    if(low < 1 || high < 1 || low > high || high > MAX_VALUE || high - low > MAX_DELTA) // 1
    {
        std::cout << "Bad input values" << std::endl;
        return std::vector<int>();
    }

    std::vector<int> output;
    output.reserve((high - low) / 2); // 2

    int actualLow = low; // 3
    if(low <= primes_.back())
        for(std::vector<int>::iterator it = primes_.begin(); it != primes_.end() && *it <= high; ++it)
            if(*it >= low)
                output.push_back(*it);
    if(!output.empty()) // 4
        actualLow = output.back() + 1;

    int maxRoot = static_cast<int>(std::ceil(std::sqrt(static_cast<double>(high)))); // 5
    for(int i = actualLow; i <= high; ++i)
    {
        bool prime = true;
        for(std::vector<int>::iterator it = primes_.begin(); it != primes_.end() && *it <= maxRoot; ++it)
        {
            if(i % *it == 0)
            {
                prime = false;
                break;
            }
        }
        if(prime)
            output.push_back(i);
    }

    return output;
}
1. Let's check the input value before start doing the real job. You might want to throw an exception in case of unexpected values. I take here a milder approach, returning no prime number (and logging a message to standard output console).
2. Saving some memory allocation time. The number of prime numbers in a given interval is not a number that you can estimate easily. But we can safely assume they are usually less than half the size of the interval, and the exceptions to this empirical rule are about meaningless.
3. We have already calculated a bunch of prime numbers. It is worthy to check among them before doing again the same expensive job.
4. If we have found at least an element in the cached vector of primes, we start checking the values in the interval above it.
5. Time to apply Eratosthenes, in a restricted version, given what we already know. We don't need to check the values against all the primes we have, it is enough to check them till the root of the actual max value.

As usual, I wrote a number of tests to drive the code development. This post is already too long to include them, but I guess you can figure them out.

No comments:

Post a Comment