Sep 29, 2016

Java 8: Streams in Hibernate and Beyond

In version 5.2 Hibernate has moved to Java 8 as base line. Keeping up with the new functional paradigm of Java 8 with lambdas and streams, Hibernate 5.2 also supports handling a query result set as a stream. Admittedly a small addition to the API, streams add significant value by allowing the Hibernate user to leverage streams parallelism and functional programming without creating any custom adaptors.

This post will elaborate on the added superficially small but fundamentally important streams feature of Hibernate 5.2 and then discuss how the Java 8 stream ORM Speedment takes the functional paradigm further by removing the language barrier and thus enabling a clean declarative design.

The following text will assume general knowledge of relational databases and the concept of ORM in particular. Without a basic knowledge of Java 8 streams and lambdas the presentation will probably seem overly abstract since basic features will be mentioned without further elaboration.

Imperative Processing of a Query Result

To contrast different approaches to handling the data from a relational database via Hibernate, we consider an example that despite its simplicity illustrates the big picture - an HQL query produces a set of Java objects that are further processed in the JVM. We start out fully imperative and gradually move towards a declarative design.

The table we use is a table of Hares, where a Hare has a name and an id.

  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`id`)

To avoid discussing the query language per se, we use an example of a simplistic HQL query that creates a result set containing all the contents of a table of the database. The naïve approach to finding the item we are looking for would be to iterate over the data of the table as follows.

List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
for (Hare hare : hares) {
  if (hare.getId() == 1) {

Note how the design of the query result handling is fully imperative. The implementation clearly states a step-by-step instruction of how to iterate over the elements and what to do with each element. By the end of the day, when it is time to run the program, all programs are in a sense imperative since the processor will need a very explicit sequence of instrucitons to execute. The imperative approach to programming may therefore seem the most intuitive.

Declaring the Goal, Receiving the Path

In contrast to the imperative design, the declarative approach focuses on what to be done, rather than on how to do it. This does not just tend to create more concise and elegant programs, but introduces a fundamental advantage as it allows the computer to figure out the transition from what to how. Sometimes without even thinking about it, many programmers are used to this approach in the realm of relational databases since the query language SQL is one of the most popular instances of declarative programming. Relieved of the details of exactly how the database engine will retrieve the data the designer can focus on what data to get, and then of course what to do with it after it is retrieved.

Java 8 streams and lambdas allow for a declarative approach to handling collections of data. Instead of listing a sequence of instructions to be carried out, the user of a stream first creates a pipeline of abstract operations to be carried out and when presented with a terminated pipeline, the stream implementation will figure out the imperative details.

Even before Hibernate 5.2, our running example could be ported to the Java 8 domain of streams by just adding a simple method call in the chain of operations since the List itself has a stream method.

List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
  .filter(h -> h.getId() == 1)
  .forEach(h -> System.out.println(h.getName()));

While this example may seem similar to the imperative iteration in the previous design, the fundamental difference is that this program will first create a representation of the operations to be carried out and then lazy evaluate it. Thus, nothing actually happens to the items of the List until the full pipeline is created. We express what we want in terms of a functional composition of basic operations but do not lock down any decisions about how to execute the resulting function.

Since a major feature of functional programming is the compositional design, a more typical streams approach would be to chain stepwise operations on the data. To extract the name of the item, we may map the getter on the stream as follows.

List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
  .filter(h -> h.getId() == 1)
This pattern is suboptimal in many aspects. The obvious problem mentioned above of not using the database to filter the data in the first place is just one of them. Another problem is that this pattern forces the entire table to be loaded in JVM memory before the iteration can start. Notice how a List of Hare is populated in the first line of each code snippet. In order to start filtering that stream, we first instantiate the entire source of the stream. This means that in terms of memory footprint, the JVM memory will have to accommodate all Hares in the database - completely defeating the purpose of analyzing a set of data piece by piece in a stream.

Streaming a Result Set

As shown in the section above, naïve streams of query results in Hibernate before version 5.2 came with a steep price. Even though the underlying JDBC database driver handles the results of a query as a result set over which the user may iterate piece by piece, streams were not supported by the query result instance, forcing the functionally inclined developer to implement a stream source from the result set or go via an intermediate collection to get a stream without custom stream implementation.

With Hibernate 5.2, the query result can produce a stream, allowing the following minimal change in code which has the important advantage of not loading the entire table into an intermediate representation from which to source the stream.

session.createQuery("SELECT h FROM Hare h", Hare.class).stream()
  .filter(h -> h.getId() == 1)
With this improvement, Java 8 streams are efficiently supported out-of-the-box by Hibernate without creating a custom stream source from the result set.

Selection by the Source

As mentioned above, the code snippets so far illustrate the general case of an HQL query generating a result which the JVM will use for some further processing. The details of the example reveal an almost offensive lack of interest in what the database can do for the user locally in terms of filtering data and that shortcoming will be adressed in the following.

The optimization desperately needed for this code is of course to adjust the query to allow the database to create a result set closer to the desired result of the operation. Focusing on just filtering the rows of the database and leaving the extraction of the columns to the JVM, the now familiar code snippet can be updated to the following.

session.createQuery("SELECT h FROM Hare h WHERE id = 1", Hare.class).stream()

Note that this short piece of a program contains two declarative parts that require separate design with different kinds of considerations. Since the program is divided between what happens before and after the stream is created, any optimization will have to consider what happens on both sides of that barrier.

While this indeed is considerably more elegant than the first example (which admittedly for pedagogical reasons was designed to showcase potential for improvement rather than representing a real solution to a problem), the barrier poses a fundamental problem in terms of declarative design. It can rightfully be claimed that the program still is an imperative program composed by two declarative sub routines - first execute the query and then execute the Java part of the program. We may chose to refer to this as the language barrier, since the interface between the two declarative languages creates a barrier over which functional abstraction will not take place.

Enter Speedment - Going Fully Declarative

As discussed above, the advantages of Java 8 streams extend far beyond a more elegant syntax. The appeal of the functional approach to data processing also stems from among other things,
  • the seamless generalization to parallelism (expressing a design as a pipeline of operations is a great starting point for building a set of parallel pipes),
  • design by composition (reuse and modularization of code is encouraged by a paradigm of composing solutions as a composition of smaller operations),
  • higher order functions (behavior expressed as lambdas can be used as language entities such as parameters to methods) and
  • declarative programming (the application designer focuses on what is needed, the framework or stream primitives design determines the details about how, allowing lazy evaluation and shortcuts).

We have shown how the new Hibernate API of version 5.2 adds basic support for streams, which allows for a declarative approach to describing the operations applied to the dataset retrieved from the database. While this is a fundamental insight and improvement, the Hibernate design with a foundation in an explicit query language limits the reach of the declarative features of the resulting programs due to the language barrier constituted by the interface between two languages.

The logical next step along the path from iterative to declarative design would be to break the language barrier and that is what the Java stream ORM Speedment does.

In the Speedment framework, the resulting SQL query is the responsibility of the framework. Thus, a program leveraging Speedment does not use any explicit query language. Instead, all the data operations are expressed as a pipeline of operations on a stream of data and the framework will create the SQL query. Returning to our example, a Speedment based design could be expressed as follows.
  .filter(h -> h.getId() == 1)

The hares manager is the source of the stream of Hares. No SQL will be run or even created until the pipeline of operations is terminated. In the general case, the Speedment framework cannot optimize a SQL query followed by lambda filters since the lambda may contain any functionality. Therefore, the executed SQL query for this example will be a query for all data in the Hares table since the behavior of the first filter cannot be analysed by the framework. To allow the framework to optimize the pipeline, there is a need for a data structure representing the operations in terms of basic known building blocks instead of general lambda operations. This is supported by the framework and is expressed in a program as follows.

The pipeline of operations is now a clean data structure declaratively describing the operations without any runnable code, in contrast to a filter with a lambda. Thus, the SQL query that will be run is no longer a selection of all items of the table, but instead a query of the type "SELECT * FROM hares WHERE ID=1". Thus, by removing the language barrier, a fully declarative design is achieved. The program states "Find me the names of the hares of the database with ID 1" and it is up to the Speedment framework and the database engine to cooperate in figuring out how to turn that program into a set of instructions to execute.

This discussion uses an very simplistic example to illustrate a general point. Please see the Speedment API Quick Start for more elaborate examples of what the framework can do.

Edit: This text is also published at DZone: Streams in Hibernate and Beyond.