Jul 25, 2016

The Way of the Lambda - Elegant and Efficient

In Java 8, you can do
items.removeIf(i -> predicate(i));
where you used to do
for (Iterator it = items.iterator(); it.hasNext();) {  
    if (predicate(it.next())) {
        it.remove();    
    }
}
which is not only considerably more elegant but also a more efficient solution in some cases. I have the graphs to prove it!

Background

Recently having spent a few years away from Java development, it is great to be back and find that Java 8 indeed has delivered on what was promised in 2013. With lambdas and streams it is possible to tackle problems in a functional manner and write code that is both readable and efficient. This over due modernization of the language is by far the most important change in Java 8, but there are also other improvements that deserve mentioning. This post elaborates on the topic of Collection.removeIf() which allows for conditional removal of elements of a collection.

The Way of the Iterator

Removing elements of a collection while iterating over it is somewhat like cutting off the branch on which you are sitting, but by using Iterators that can be done without a runtime ConcurrentModificationException. Consider the following typical simplistic example, where the elements of the collection items for which predicate() holds true are removed.
for (Iterator it = items.iterator(); it.hasNext();) {
    if (predicate(it.next())) {
        it.remove();    
    }
}
If the collection in question is a linked list of size n and m elements are removed, the full iteration with removals is O(n + m) = O(n) since iteration steps and removal from the list are constant time operations and m is limited by n. If, however, the data structure of the collection is one where removal is relatively expensive compared to iteration, the complexity becomes less favorable. For an ArrayList a single removal step is O(n) since all items at higher indices have to be moved. There are still n iteration steps to be made so the full sequence of removals becomes O(m*n) which is O(n^2) assuming worst case with the number of removals being a constant fraction of the array size.

In more pragmatic terms, what happens in the ArrayList case is that for each element that is removed, all elements of higher indices are moved to fill the gap in the backing array. This is clearly suboptimal since that operation is repeated for each removal. Thus, to reach its final destination an element will visit all locations between the starting position and the destination. In case many elements are removed, this may be a considerable amount of work in vain.

The Java 8 way of the lambda

In Java 8, the code snippet from above can be expressed as follows.
items.removeIf(i -> predicate(i));
Clearly more concise, this expression is also more efficient on an ArrayList compared to the Iterator approach. Where the Iterator requires completion of each removal operation before continuing to the next one, removeIf operates on a predicate and can therefore iterate over all items once to determine which elements are to be removed and then shift the remaining elements to the correct new indices all in a second pass. Therefore, removeIf is O(n) since it performs two linear passes through the collection.
Each retained element in the resulting collection is moved just once, while for the Iterator approach it would be moved one step at a time towards its destination. See this gist for the relevant JDK code.

The Difference Visualized

We compare execution times for the two approaches as a function of the array size for a rather extreme case where 999 out of 1000 items are removed from an ArrayList. For each data point 40 runs are made, the 5 fastest and slowest outliers are removed and the sum of the remaining 30 form the execution time value. For details please refer to this gist.

We expect the execution time to be considerably shorter for the Java 8 way of the lambda and as a matter of fact that is what we find. We also note that within the resolution of the graph, the lines are quite noise free due to the many repeats and filtering of outlier results.



For array sizes of 150,000 elements, the removeIf() approach is more than 3000 times faster. Even more importantly, the factor between the two increases as the size of the array increases which is expected since the way of the lambda has a superior asymptotical time complexity.

To better compare the time complexity of the two, we scale up the green line to match the green one. By multiplying the execution times of the faster algorithm by ~1500, the graphs can be compared in terms of shape. As expected, we find linear behavior for the Java 8 removeIf() design, while the old ways show a quadratic time complexity due to the inefficient removal of elements.


When scaling up the measurements we get to see some noise in the green line. The sensitivity to random fluctuations in execution time is greater since the execution times are several orders of magnitude shorter compared to the blue. Both lines are fitted to polynomials of power two and as can be clearly seen, the light green line is very similar to a straight line while the blue line clearly aims for the sky in a super linear fashion.

Edit: There is a reddit thread about this blog post. Good points are being made there, for example the fact that the removeIf implementation is atomic (in the database sense) as it first evaluates the predicate for all items before making any alterations to the list. Thus, if the predicate throws an exception for any of the items in the list, the list will remain unchanged.

Also published at DZone: The Way of the Lambda and removeIf().


12 comments:

  1. How does it compare with an index loop?
    for (int i = 0; i < arrayList.size(); i++) {
    if (condition) arrayList.remove(i--);
    }

    ReplyDelete
    Replies
    1. That is also O(n^2). The loop will pass n times, and each remove operation is O(n), so we end up with an O(n*n) solution.

      Delete
    2. The cost to remove a item from a Collection depends on the implementation.
      The most famous being Arraylist O(N) and LinkedList O(1)

      Delete
    3. Initialize your i to the arrayList.size()-1 and decrement. That will be faster.

      Delete
    4. Backwards is faster but still O(n^2). I agree that the cost of removal is highly dependent on the data structure used. Contrasting with a LinkedList in the beginning, this blog post is intended to cover only the special case of an ArrayList.

      Delete
  2. Your solutions also works in O(n^2) time - after each removal all elements in ArrayList will be shifted in memory.
    One of the easiest solutions without Java Streams is to create new list only with elements satisfying predicate:

    List filteredItems = new ArrayList<>();
    for (Item item: items) {
    if (predicate(item)) {
    filteredItems.add(item);
    }
    }

    ReplyDelete
    Replies
    1. I did not intend to convey that I have created any novel solution to the removal problem, on the contrary I meant to point out that the Java 8 remove operation is O(n). Thus, none of the solutions are mine in any sense.

      I think that your comment beautifully illustrates the reason why the The Java 8 implementation is more efficient. Ignoring any language or implementation details and just considering the problem per se, it is easy to see that removal of elements in an array is O(n) (since it entails relocation of O(n) elements). Thus, solving the general problem of removing n elements from an array by using the basic removal operation will always be O(n^2).

      The reason for this is that we have introduced an extra constraint to the original problem when deciding to use item removal operations. The basic item removal operation has to maintain the data structure invariants for each loop of our solution, which (as you point out) requires relocation of all elements of higher indices.

      Your suggestion demonstrates a straight-forward solution to the problem - just copy the data over to a new structure while figuring out the new position of the element. The Java 8 removeIf solution is slightly more complex but avoids creating a new List. (Basically two passes - one to find the elements to move and a second to move each element once to its new position.)

      See https://gist.github.com/lawesson/101ed054fcdb00a94d69b34e04ef0be7

      Delete
  3. Does the Java 8 solution silently use multithreading ?
    Or can it be modifier for this ?

    ReplyDelete
    Replies
    1. As written above, the execution will take place in a single thread. It is how ever trivial to add a ".parallel()" to the stream to make the solution using several threads. When happens then is that the stream is split into chunks of data, to be handled in parallel streams created transparently to the user.

      Good question, BTW, in some future blog post I will use this as an example of the advantages of declarative programming.

      Delete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. Nice post. I also want you to visit this web-page to learn more about facebook spy app.

    ReplyDelete
  6. Admiring the time and effort you put into your blog and detailed information you offer!..
    Lyko Rabattkod

    ReplyDelete