Sep 22, 2017

A Declarative Java ORM Speaks Fluent SQL

As a developer you do not want to be micromanaged with detailed lists of instructions. So when you ask your ORM to give you data, why do you have to supply the database query to be executed? The reasons are historical: an ORM that is free to create queries by itself needs an API that declaratively describes the expected result as a combination of data retrieval and modification operations. Java 8 streams provided the needed language support for that. The time had come for a declarative ORM.

Working at Speedment, I am often eager to describe the fundamental advantages of a declarative ORM, and the following is an attempt to do so. An analogy of a factory working in two phases shows how micromanagement of the internal steps of a larger task creates barriers to efficiency improvements. In the same way, explicit SQL in a database application constrains the application to a specific solution, whereas a framework with more freedom could find optimizations on a larger scale.

The Furniture Factory Analogy - The Manufacture and the Assembly Teams

A typical Java-based database application consists of two steps of computation - first the database query and then the JVM code operating on the data from the database. As an analogy, consider a furniture factory that operates in two steps: manufacture of parts and then assembly of the parts into complete products.

The furniture company consists of two teams - the manufacture team that creates parts and the assembly team that uses the parts to create complete products. Both teams have declarative work instructions which means that the instructions describe the expected output rather than a sequence of operations to perform. The factory output in terms of complete furniture is thus fully determined by the assembly team instruction, but the factory efficiency is also highly dependent on well-tuned manufacture team instructions. If the parts sent to assembly are too small and simple, assembly may not be possible and if they are too complex the manufacture process becomes too expensive.

Telling the manufacture team to take instructions from the assembly team will make managing the factory easier and allows the teams to cooperate to find the best solution. The assembly team has thus been promoted to a design team since it has the responsibility to design the parts. Instead of detailed descriptions of all parts needed, the factory instructions now only describe the end result and instead of being constrained by given part schemas the design team may freely decide upon the whole process.

Recent technology advances allow for an analogous revolution for relational data Java applications where the manufacture team corresponds to the database engine, the assembly team is a traditional ORM and the design team that replaces the assembly team is a declarative ORM.

How Java Database Applications Relate to the Furniture Factory Analogy

SQL is a well known example of declarative programming and the SQL query works just as the instructions to the manufacture team in the analogy above - you leave it to the database engine to figure out an optimal execution plan for how to compute the result described by the query. Similarly, the pipeline of a Java stream is a description of a sequence of abstract operations, conceptually similar to the assembly team instructions above, where the framework implementing the stream termination determines the actual execution path.

Putting the two together, a Java application using relational data typically uses an ORM. The current standard API for ORMs is JPA, and Hibernate may be the best-known implementation of that API. Leveraging the declarative power of the SQL language, Hibernate rather transparently exposes the user to the Hibernate Query Language (HQL), which can be seen as SQL for Java objects. While this is useful for developers used to SQL, it does introduce a mix of languages - the Java code will contain HQL code. Therefore, using the furniture factory analogy, even though SQL is replaced with HQL we are still stuck with detailed manufacture team instructions as long as Hibernate is used.

Just as the furniture factory suffered from maintaining two sets of instructions with implicit dependencies, the mix of Java and HQL comes with a price. Apart from being error prone and creating a high maintenance cost, such a mix of languages also creates a barrier over which functional abstraction cannot take place. The situation calls for a solution similar to replacing the furniture factory assembly team with a design team. In order to fully leverage the declarative nature of a functional Java streams application, the language barrier of the ORM framework needs to be removed.

The Language Barrier Limiting Declarative Power

Ever since the 1970s, SQL has allowed applications operating on relational data to leverage a declarative approach to data handling. An application developer does not need to know about decades of database engine research since the code she writes will only describe what data it needs, not how it is to be retrieved. This clever separation of concerns decouples the application from the database engine details, minimizing maintenance and development cost.

The introduction of streams in Java allows for a declarative programming style conceptually similar to SQL in the sense that the streams created are a description of a sequence of operations rather than an explicit sequence of imperative instructions. The description is fed to the framework defining the termination of the stream, allowing the framework to reason about the whole sequence of operations before actually executing any data operations. Thus, just as the SQL query is a declarative statement about what data to retrieve, the stream is a declarative statement about the operations to be executed on the data. Just as the database engine is free to optimize the query execution as long as the result is the same, the stream termination may alter the data operations as long as the semantic invariants hold.

Until recently, a typical relational data application leveraging Java streams would consist of two distinct parts defining a two-step process: first data is retrieved via a database query, and then operations are carried out on the collection of data returned in the result set from the database. The two frameworks that reason about the query and the stream respectively are confined to their respective realms; no optimizations may span both domains. Java streams, however, do have enough expressive power to span both domains, since they are declarative constructs.

Declarative Java Streams Applications

Functional programming has received solid language support in Java with the introduction of streams and lambdas. Allowing the user to create pipelines of abstract operations on a stream of data, the new language constructs introduce a higher order functional approach to programming. Higher order functions take other functions as input and thus allow abstract reasoning about behavior, which opens up a whole new level of abstractions. For instance, we now have language support to create programs that modify other programs.
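The idea of a function that takes another function as input and produces a new one can be made concrete with a few lines of Java. The following is a minimal sketch; the class name HigherOrderDemo and the method twice are made up for this illustration, while IntUnaryOperator and its andThen method are part of the standard library.

```java
import java.util.function.IntUnaryOperator;

public class HigherOrderDemo {

    // A higher order function: takes a function as input and returns
    // a new function that applies the given one twice.
    public static IntUnaryOperator twice(IntUnaryOperator f) {
        return f.andThen(f);
    }

    public static void main(String[] args) {
        IntUnaryOperator addThree = n -> n + 3;

        // Nothing is computed when the new function is built ...
        IntUnaryOperator addSix = twice(addThree);

        // ... only when it is finally applied to a value.
        System.out.println(addSix.applyAsInt(4)); // prints 10
    }
}
```

Here twice never sees an integer; it operates purely on behavior, which is the kind of abstraction a stream pipeline relies on.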

Consider the following code that describes the first ten even integers that are not divisible by 10.


IntStream.iterate(0, i -> i + 1)
  .map(n -> n * 2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

A reader used to, for example, Unix pipes would perhaps interpret this piece of code as a source of data creating an unbounded sequence of integers on line 1, doubling each number on line 2, removing the numbers evenly divisible by 10 on line 3 and then on line 4 throwing away all numbers except for the first 10 before printing them when reaching line 5. This is a very useful intuition even though it is quite far from what will happen when the code is executed.

Interpreting this example as a stream of data being manipulated from top to bottom by each operation yields a correct understanding of the resulting output while creating minimal cognitive encumbrance on the reader. It makes it possible for a reader and even the designer of the code to correctly understand the meaning of the code in terms of output without really thinking about how the expression actually will be evaluated.

What actually happens in lines 1 through 4 is that a function is created and modified. No integers are created, let alone filtered or mapped, until we reach line 5. That last line is a terminating operation of the stream, meaning that it operates on all aspects of the stream, including both the source of data and the operations, to finally return the result. Having access to the information about the entire operation and the source, the terminator of the stream can deliver the result without performing unnecessary operations on items that will be discarded by the limit operation.

This lazy evaluation is possible since the program describes the requested operations rather than actually executing them. What at first may seem like a simple sequence of function invocations on a sequence of integers is actually a creation of a function that is passed to another function. The program simply describes what is computed while leveraging previous work invested in the design of the used stream components that determine how the computation will take place. This is the kind of expressive power that is needed to create an ORM that breaks the language barrier by reasoning about the whole task including both in-JVM and database engine operations.
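The laziness can be observed directly by counting how many integers the source produces. The sketch below is an illustration only: it reuses the pipeline from above and adds a peek() step that increments a counter. The class name LazyStreamDemo and the counter are assumptions introduced for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LazyStreamDemo {

    // Counts how many integers the source has actually produced.
    public static final AtomicInteger generated = new AtomicInteger();

    // Describes the pipeline without a terminal operation, so nothing runs yet.
    public static IntStream pipeline() {
        return IntStream.iterate(0, i -> i + 1)
            .peek(i -> generated.incrementAndGet())
            .map(n -> n * 2)
            .filter(n -> n % 10 != 0)
            .limit(10);
    }

    public static void main(String[] args) {
        IntStream described = pipeline();

        // The pipeline has only been described, so the counter is still zero.
        System.out.println("Produced before termination: " + generated.get());

        // The terminal operation drives the whole computation.
        List<Integer> result = described.boxed().collect(Collectors.toList());
        System.out.println(result);

        // The source is unbounded, yet only the few integers needed
        // to obtain ten results have been produced.
        System.out.println("Produced after termination: " + generated.get());
    }
}
```

The first printout shows zero produced integers, and the last one shows a small finite number even though the source itself never ends.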

Breaking the Language Barrier with Java Streams

The language barrier between the database engine and the JVM forces the designer to design, optimize and then maintain the structure of the data as transferred from the database, since it constitutes output from the first part of the solution and input to the next. Speedment, available on GitHub, is a declarative ORM that breaks this language barrier. Taking the declarative approach from its starting point in two declarative building blocks all the way to its logical conclusion of a unified declarative construct, the Speedment toolkit and runtime abstract away the SQL query and allow the user of the framework to create a design based solely on streams.

Consider the fully declarative way of counting the users belonging to a particular department in the following code snippet.


long count = users.stream()
  .filter(User.DEPARTMENT.equal(1))
  .count();

The source of the stream is a manager instance called users, which is instantiated from code generated by the toolkit analyzing the database metadata. As described above, viewing a stream as a sequence of objects flowing from the source and modified on each following line gives a correct understanding of the result of the operation, though not necessarily any insight into the actual execution of the program. The same applies here. Retrieving all the users from the database in line 1, then in line 2 keeping only the users belonging to department number 1 and finally counting the remaining users would yield the desired result of the operation.

This understanding of the algorithm has a strong appeal in its abstract simplicity and allows the Speedment runtime to decide on all the details of the execution. Not only does it optimize the operations on the data in the JVM, but since the relational database operations are also covered by the abstraction, the generated SQL query will take the full pipeline of operations into account.

By means of a sequence of reductions applied to the pipeline, the runtime will collapse the operations to a single SQL statement that relieves the JVM from any operations on user data instances. The following query will be executed:


SELECT COUNT(*) FROM user WHERE (department = 1)

In this minimalistic example the database engine performs all the data handling operations since the generated SQL query gives the full result, but in general the framework will have to execute part of the pipeline in the JVM. The power of the fully declarative approach is that the framework can freely decide on these aspects. This freedom also opens the door to more elaborate optimizations, for example in-JVM-memory data structures for faster lookup of data. The open-source Speedment ORM has an enterprise version that exploits this to provide efficiency boosts of up to several orders of magnitude.
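To give a feeling for how a described pipeline can be collapsed into a single SQL statement, consider the toy sketch below. It is not Speedment code and says nothing about how the Speedment runtime is implemented. The class ToyQueryPipeline and its methods from, filterEqual and countAsSql are hypothetical names invented for this illustration. Intermediate operations only record what is requested, and the terminal operation renders the whole description as one query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy pipeline, not part of Speedment or any real library.
public class ToyQueryPipeline {

    private final String table;
    private final List<String> conditions = new ArrayList<>();

    private ToyQueryPipeline(String table) {
        this.table = table;
    }

    // Source of the pipeline, corresponding to FROM.
    public static ToyQueryPipeline from(String table) {
        return new ToyQueryPipeline(table);
    }

    // Intermediate operation: records the predicate, touches no data.
    public ToyQueryPipeline filterEqual(String column, int value) {
        conditions.add("(" + column + " = " + value + ")");
        return this;
    }

    // Terminal operation: sees the whole description and renders it
    // as one SQL statement instead of counting objects in the JVM.
    public String countAsSql() {
        StringBuilder sql = new StringBuilder("SELECT COUNT(*) FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = ToyQueryPipeline.from("user")
            .filterEqual("department", 1)
            .countAsSql();
        System.out.println(sql); // prints SELECT COUNT(*) FROM user WHERE (department = 1)
    }
}
```

A real framework has to handle arbitrary predicates, joins and operations that cannot be expressed in SQL, but the principle of deferring all work to the terminal operation is the same.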

Expressing SQL as Java Streams

As seen above, Speedment removes the polyglot requirement for Java database applications by abstracting away SQL from the business logic code. Comparing the two declarative constructs, one finds a striking correspondence between SQL clauses and Java stream operations.

SQL         Java Streams
FROM        stream()
SELECT      map()
WHERE       filter() (before collecting)
HAVING      filter() (after collecting)
JOIN        flatMap()
DISTINCT    distinct()
UNION       concat().distinct()
ORDER BY    sorted()
OFFSET      skip()
LIMIT       limit()
GROUP BY    collect(groupingBy())
COUNT       count()
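The correspondence listed above can be tried out with plain Java streams on an in-memory list. The sketch below does not use Speedment or a database; the User class, its fields and the sample data are made up for this illustration, and the SQL in the comments is only the query each stream is meant to mirror.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SqlAsStreams {

    // Hypothetical entity standing in for a row of a "user" table.
    public static final class User {
        final String name;
        final int department;

        User(String name, int department) {
            this.name = name;
            this.department = department;
        }
    }

    // Made-up sample data replacing the database table.
    static final List<User> USERS = Arrays.asList(
        new User("Ann", 1),
        new User("Bob", 2),
        new User("Cid", 1),
        new User("Dee", 1),
        new User("Eve", 2));

    // SELECT name FROM user WHERE department = 1 ORDER BY name LIMIT 2 OFFSET 1
    public static List<String> namesInDepartmentOne() {
        return USERS.stream()                    // FROM
            .filter(u -> u.department == 1)      // WHERE
            .map(u -> u.name)                    // SELECT
            .sorted()                            // ORDER BY
            .skip(1)                             // OFFSET
            .limit(2)                            // LIMIT
            .collect(Collectors.toList());
    }

    // SELECT department, COUNT(*) FROM user GROUP BY department HAVING COUNT(*) > 2
    public static Map<Integer, Long> largeDepartments() {
        return USERS.stream()
            .collect(Collectors.groupingBy(u -> u.department, Collectors.counting())) // GROUP BY, COUNT
            .entrySet().stream()
            .filter(e -> e.getValue() > 2)       // HAVING: a filter after collecting
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        System.out.println(namesInDepartmentOne()); // prints [Cid, Dee]
        System.out.println(largeDepartments());     // prints {1=3}
    }
}
```

Note how HAVING appears as a second filter applied to the grouped result, matching the "after collecting" remark in the table.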

Conclusions

Traditional ORMs employ an explicit query language for database interaction. This micromanagement imposes unnecessary constraints on the ORM, which can be lifted by instead using a fully declarative approach. Such an approach is enabled by the declarative nature of Java streams, which allows expressing both database data retrieval and in-JVM data operations in the same language. The declarative nature of the resulting application decouples it from the imperative interpretation, which enables seamless adjustment to a multi-threaded and/or in-JVM-memory accelerated solution, since all details of execution and database-to-JVM data transfer have been abstracted away.