Generally accessible in Java 8, higher order functions open up a whole new level of abstractions allowing us to reason about code and algorithms in a new way. When our code can operate on functions as input and output we are given the power to easily express algorithms that modify behavior just like we have always done for data.
The power to create algorithms that modify behavior makes it possible to reason about functionality. Such a higher order algorithm can therefore extend or optimize the characteristic of a function, before the function receives any input.
Speedment is an Open Source ORM with an API founded in Java 8 streams. This post describes some new features of the API of recently released Speedment version 3.0 which add more features to support higher order functions. A reader already familiar with higher order functions and declarative programming may benefit from skipping ahead to the last section.
Functions Operating on Functions
For example, it is natural to have a program that prints a sorted list of names that are given as input. When compiled, the sorting function is static but can dynamically be used on any data that happens to be a list of names. If our program was to be generalized to take the actual sorting algorithm as input, we have created a higher order function since our program accepts a function as input.
Seen in such a way, functional aspects of Java are not new. Supplying a function as parameter is commonplace in Java, perhaps most notably as event driven callbacks for asynchronous interfaces. The callback idiom is a very simple type of operation on functions since the function itself is never altered - it is just kept as a reference until the time comes to call it.
Runnable callback = new Runnable() {
@Override
public void run() {
System.out.println("ping");
}
};
// Some other time, in a context far, far away...
callback.run();
The concept of higher order functions entails not only that functions can be referred to and invoked within the framework of the language itself, but also that functions are treated as other data on which other functions can operate.
The Bliss of Meta Functions
IntStream.iterate(0, i -> i + 1)
.map(n -> n*2)
.filter(n -> n % 10 != 0)
.limit(10)
.forEachOrdered(System.out::println);
A reader used to for example unix pipes would perhaps interpret this algorithm as a source of data creating an unbounded sequence of integers on line 1, doubling each number on line 2, removing the numbers evenly divisible by 10 on line 3 and then on line 4 throwing away all numbers except for the first 10 before printing them when reaching line 5. Actually quite far from what will happen when the code is executed, this is in a sense a very useful intuition.
Interpreting this example as a stream of data being manipulated from top to bottom by each operation yields a correct understanding of the resulting output while creating minimal cognitive encumbrance on the reader. It makes it possible for a reader and even the designer of the code to correctly understand the meaning of the code in terms of output without really thinking about how the expression actually will be evaluated.
What actually does happen in lines 1 through 4 is that a function is created and modified. No integers are created, let alone filtered or mapped, until we reach line 5. That last line is a terminating operation of the stream entailing that it operates on all aspects of the stream, including both the source of data and the operations to finally return the result. Having access to the information about the entire operation and the source, the terminator of the stream can deliver the result without performing unnecessary operations on items what will be discarded by the limit operation.
This lazy evaluation is possible since the program describes the requested operations rather than actually executing them. What at first may seem like a simple sequence of function invocations on a sequence of integers is actually a creation of a function that is passed to another function.
By using higher order functions we decouple the declarative description of the function from the imperative interpretation. The program simply describes what is computed while leveraging previous work invested in the design of the used stream components which determine how the computation will take place.
Functions operating on other functions allows blissful ignorance of (and therefore also decoupling from) execution details.
Parallelism as a Simple Add-on
IntStream.iterate(0, i -> i + 1)
.parallel()
.map(n -> n*2)
.filter(n -> n % 10 != 0)
.limit(10)
.forEachOrdered(System.out::println);
This small alteration of the stream will yield fundamentally different execution characteristics. In this case, the parallel version is likely to be more wasteful than the original version. The typical implementation of a parallel stream, the Spliterator, will partition the stream of integers in chunks processed concurrently. Since several threads are helping out multiplying integers by two and filtering out numbers that are multiples of ten, the limit operation which requires synchronization between the parallel workers will be fed with more numbers than it will allow to pass through. The first non-parallel example would pull items from the source when needed, stopping just when the limit is reached.
Speedment Taking Higher Order Functions to the Extreme
An ORM provides a mapping between the data model of a relational database and the object centric view of an object oriented language such as Java. The Java 8 stream ORM Speedment provides such a mapping with a fully declarative Java 8 stream API.
As seen above, the conceptual building blocks for declarative programming of applications leveraging relational database data are available even without Speedment;
- the database is queried using the declarative SQL and
- the Java 8 streams features provide a similarly declarative approach to design of the operations on the data once retrieved to the JVM.
However, without Speedment the two declarative languages have to be mixed in an application. In a previous post on this blog we have seen that the designer has to suboptimize two separate programs; the declarative description of the data to retrieve from the database and then in a separate language the operations to perform on that data in the JVM.
The language barrier between the database engine and the JVM forces the designer to design, optimize and then maintain the structure of the data as transferred from the database since it constitutes output from the first part of the solution and input to the next. Taking the declarative approach from its starting point in two declarative building blocks all the way to its logical conclusion of a unified declarative construct, Speedment abstracts away the SQL query and allows the user of the framework to create a design based solely on streams.
Consider the fully declarative Speedment way of counting the users belonging to a particular department in the following code snippet.
long count = users.stream()
.filter(User.DEPARTMENT.equal(1))
.count();
The source of the stream is a manager instance called users. As described in length above, viewing a stream as a sequence of objects flowing from the source and modified on each following line will give the correct understanding of the result of the operation while not necessarily any insight to the actual execution of the program. The same applies here. Retrieving all the users from the database in line 1, then in line 2 filtering out the users belonging to department number 1 and finally counting the remaining users would yield the desired result of the operation.
This understanding of the algorithm has a strong appeal in its abstract simplicity and allows the Speedment framework to decide on all the details of the execution. Not only does Speedment optimize the operations on the data in the JVM, but since also the relational database operations are covered by the abstraction the SQL query will be created by Speedment taking the full pipeline of operations into account.
By means of a sequence of reductional operations on the pipeline Speedment will collapse the operations to a single SQL statement that relieves the JVM from any operations on user instances. The executed query will actually be as follows.
SELECT COUNT(*) FROM user WHERE (department = 1)
As showed above in the example of the parallel streams, in a declarative setting where the framework makes decisions on execution strategy it is easy to reuse a declarative program in a new setting with completely different executional properties. In its enterprise version, the Speedment framework may also chose to execute queries using data from an in-JVM memory data store, improving database latency by orders of magnitude without changing a single line of the declarative program describing the business logic.
Speedment 3.0 Structural Support for Higher Order Functions
users.stream()
.filter(User.BORN.between(1985, 1995)) // Filters out Users born 1985 up to and including 1994
.map(User.CATEGORY.setTo(3)) // Applies a function that sets their category to 3
.forEach(users.updater()); // Applies the updater function to the selected users
The updater construct is new in Speedment 3.0 and lends itself to executional optimization in the Speetment framework since if the preceding pipeline has suitable properties the whole update operation pipeline may be collapsed to a single SQL UPDATE statement. While version 3.0 of the framework does not perform all the possible optimizations to collapse the pipeline to a single SQL statement, the program will yield a correct result by a sequence of UPDATE operations.
Another example of the extended support for higher order functions in Speedment 3.0 are the categorizers. Functions designed to work with the standard Java stream operations, the categorizers allow constructs such as the following which creates a mapping from users to their homepages assuming a homepage has a foreign key pointing to its owner.
Map<User, List<HomePage>> join = homepages.stream()
.collect(
groupingBy(users.finderBy(HomePage.OWNER), toList())
);
As we have seen in two examples above, the beauty of the declarative programming paradigm is that executional properties are abstracted away. When the Speedment framework in a later version implements all the needed optimizations, existing Speedment leveraging applications will start emitting smarter SQL queries to the relational database as soon as they use the newer framework version. A dream come true from an application maintenance perspective, the declarative program describes a solution and therefore remains the same since the problem it solves is the same.