Pages

Equilibrium in a vector

We have an array of integer in input, whose size we know for sure being in the range [2..100,000] and each element is in [−1,000..1,000]. We'd like to split it in two parts, so that the sum of the elements on both sides is as close as possible. For some weird reason, we are not asked to return the index for which such condition is fulfilled, but the minimal difference, as an absolute value, between left and right sum.

You could find this problem in the Codility Train page, in the Time Complexity section, under the nickname Tape-Equilibrium. You could submit there your solution for evaluation in one of the many languages supported. C++11 is not one of them, so I have written my code for C++98.

[Good news, now Codility supports C++11. I have refreshed the code accordingly. Please follow the link for details.]

Notice that the number of elements in the vector and the value for each element is such that we can happily work with plain integers, being the maximum sum we could get something about a mere hundred millions. The real issue in this problem is time complexity, we should strive for a linear solution, if we want to get a 100% score.

Firstly, some test cases (written for GoogleTest):
int equilibrium(std::vector<int>& input); // 1

TEST(TapEq, Given) // 2
{
    std::vector<int> input;
    input.push_back(3);
    input.push_back(1);
    input.push_back(2);
    input.push_back(4);
    input.push_back(3);

    EXPECT_EQ(1, equilibrium(input));
}

TEST(TapEq, TwoBig) // 3
{
    std::vector<int> input;
    input.push_back(-1000);
    input.push_back(1000);

    EXPECT_EQ(2000, equilibrium(input));
}

TEST(TapEq, HundredK) // 4
{
    std::vector<int> input(100000, 1000);

    EXPECT_EQ(0, equilibrium(input));
}

TEST(TapEq, AlmostHundredK) // 5
{
    std::vector<int> input(99999, 1000);

    EXPECT_EQ(1000, equilibrium(input));
}
1. I don't feel right to pass around a non-const reference to a STL container if there is not any compelling reason to do that. Here the reason is just that Codility wants us to do that.
2. This is the test given with the problem description. The equilibrium point is such that the vector is split between (3, 1, 2) and (4, 3) leading to a difference of 1, that is going to be returned to the caller.
3. Minimal case, just two elements.
4. Biggest vector I could get in input, each element has value 100,000, so the result is zero.
5. Almost like the case (4), but here we have an odd number of elements, all of them have the same value, so it is not possible to get a perfect equilibrium.

Bad solution O(N**2)

We could think of looping on all the elements in the vector, from the first to the last but one. That would be our pivot, dividing the vector in two parts. Then we'll all the left and right elements, and compare the results:
int equilibrium(std::vector<int>& input)
{
    int result = std::numeric_limits<int>::max(); // 1
    for(unsigned i = 0; i < input.size() - 1; ++i)
    {
        int left = 0; // 2
        for(unsigned j = 0; j <= i; ++j)
            left += input[j];

        int right = 0; // 3
        for(unsigned j = i + 1; j < input.size(); ++j)
            right += input[j];

        int difference = std::abs(left - right); // 4
        if(difference < result)
            result = difference;
    }

    return result;
}
1. In the beginning we have no result, let's remark this initializing the variable to the highest available value.
2. Sum up all the elements on the left of the current pivot (included).
3. Sum up all the elements on the right of the current pivot (excluded).
4. Get the difference between left and right, and keep it, if it is the current minimum value.

This algorithm is straightforward, but it has a major issue. Its time complexity is in the order of N squared, as we can see immediately, given the twin for-loops in a for-loop.

And, all this summing up is not required. Or better, we repeats many time the same addictions we have already done. A better solution would permit us to minimize them to bare necessity.

Linear solution

Let's start splitting the vector in two parts. One contains just the leftmost element, the other one all the other elements. Then it would be just a matter of adding the current border element to the left and simultaneously subtracting it to the right. We still have two O(N) for-loops, but they are one after the other, so the resulting time complexity falls to linear:
int equilibrium(std::vector<int>& input)
{
    std::vector<int>::iterator pivot = input.begin();
    int left = *pivot;
    int right = std::accumulate(++pivot, input.end(), 0); // 1
    int result = std::abs(left - right);

    for(; pivot < input.end() - 1; ++pivot) // 2
    {
        left += *pivot;
        right -= *pivot;
        int diff = std::abs(left - right);
        if(diff < result)
            result = diff;
    }

    return result;
}
1. The first for-loop is hidden in this call to STL accumulate() function.
2. Second for-loop. Notice that I loop till the last but one element, since I want the right side of the vector to contain at least an element.

No comments:

Post a Comment