Nov 19, 2018

Hibernate Acceleration by Snapshots

A Google search for “Hibernate and performance” will yield innumerable articles describing how to fix performance issues. This post is not yet another entry in that genre of incremental tweaks. Instead we will demonstrate how to remove bottlenecks in your Hibernate project by using the JPA support of Spring Boot in tandem with in-JVM-memory snapshots, which provide speedups of several orders of magnitude. We will use the Sakila sample database for the purposes of this post. The database contains, among other things, Films and Actors and relations between them.

A Straightforward Hibernate and Spring Application

An uncomplicated way of interacting with a relational database is to use the Spring Data JPA module and allow Spring to handle dependency injection and setup of the project. This allows for a pure Java implementation without any XML needed to set up the ORM. For example, a properly annotated plain Film class is all that is needed to map the database table of films into the Java object domain.

@Entity
public class Film {
   @Id
   @Column(name="film_id")
   private int id;
   private String title;
   private String description;
   private int releaseYear;

   // ... more fields, getter and setters et c
}

Perhaps the most straightforward way of retrieving such entities is by means of a Repository


@Repository
public interface FilmRepository extends CrudRepository<Film, Integer> {
}
which allows us to write an application with minimal boilerplate that operates on Films with code such as the following.

@SpringBootApplication
public class HibernateSpeedmentApplication implements CommandLineRunner {

   public static void main(String[] args) {
       SpringApplication.run(HibernateSpeedmentApplication.class, args);
   }

  @Autowired
  FilmRepository filmRepository;

  // ... code using the filmRepository
}

Here Spring injects the filmRepository for us, which can then be used as follows, where we stream over all films and sum the film lengths.


public Long getTotalLengthHibernate() {
   return StreamSupport.stream(filmRepository.findAll().spliterator(), false)
       .mapToLong(com.example.hibernatespeedment.data.Film::getLength)
       .sum();
}

Clearly, this is an inefficient way of summing all the film lengths, since it entails fetching all film entities to the JVM and then summing a single property. Since we just retrieve a single value and are not interested in updating any data we would be better off with a Data Transfer Object that only contains the film length. That would require us to write some code that SELECTs the length column server side in the database. When we realize that we want some of the logic of this operation to be moved to the database, it makes a lot of sense to compute the whole sum in the database instead of transferring the film lengths. We then arrive at the following piece of code.


public Long getTotalLengthHibernateQuery() {
   EntityManager em = entityManagerFactory.createEntityManager();
   Query query = em.createQuery("SELECT SUM(length) FROM Film");
   return (Long) query.getSingleResult();
}

Now the application logic contains an explicit work split between JVM and database including a query language construct that is more or less opaque to the compiler.

Remove a Bottleneck with Speedment

While the stream construct with which we started out in the section above was very inefficient, it has appeal in the way it abstracts away the details of the database operations. The ORM Speedment has a Streams based API that allows such stream operations to be executed efficiently. The Speedment based application code is very similar to the Hibernate example, with the exception that the Repository is replaced by a Manager, and this manager provides streams of entities. Thus, the corresponding Speedment application code would be as follows.

@Autowired
FilmManager filmManager;

public Long getTotalLengthSpeedment() {
   return filmManager.stream()
       .mapToLong(Film.LENGTH.asLong())
       .sum();
}

There are several advantages of deciding on the SQL details at runtime rather than in the application code, including type safety and lower maintenance cost for a more concise business logic code base. The perhaps most prominent advantage of the clean abstraction from database operations, however, is that it allows the runtime to provide acceleration. As a matter of setup configuration and with no modification of any application logic, an optional plugin to the Speedment runtime allows partial snapshots of the database to be prefetched to an in-memory data store, providing several orders of magnitude application speedup without rewriting any part of the application logic.

For this particular example, the Query based Hibernate solution was approximately 5 times faster than the naive approach of streaming over the full set of entities. The Speedment powered solution returned a result 50 times faster than the Query based Hibernate solution. If you try it out, your mileage may vary depending on setup, but clearly the in-memory snapshot will invariably be orders of magnitude faster than round tripping the database with an explicit query which in turn will be significantly faster than fetching the full table to the JVM which happens in the naive implementation.

Coexistence - Using the Right Tool for the Job

While in-memory acceleration does deliver unparalleled speed, it is no panacea. For some tables, a speedup of several orders of magnitude may not yield any noticeable effect on the overall application. For other tables, querying an in-memory snapshot of data may not be acceptable due to transactional dependencies. For example, operating on a snapshot may be perfect for the whole dataset in a business intelligence system, a dashboard of KPIs or a tool for exploring historical trade data. On the other hand, the resulting balance after a bank account deposit needs to be immediately visible to the online bank, and thus serving the bank account from a snapshot would be a terrible idea.

In many real-world scenarios one would need a solution where some data is served from a snapshot while data from other tables are always fetched from the database. In such a common hybrid case, the code that directly fetches data from the database may use the same Speedment API as the snapshot querying code but for projects already using Hibernate it works perfectly well to combine Speedment and Hibernate.

Since both Hibernate and Speedment rely on JDBC under the hood, they will ultimately use the same driver and may therefore work in tandem in an application. Having a Hibernate powered application, a decision to move to Speedment for bottlenecks can therefore be local to the part of the application that will benefit the most. The rest of the Hibernate application will coexist with the code that leverages Speedment.

It is easy to try this out for yourself. The Sakila database is open source and can be downloaded here. Speedment is available as a free version; use the Initializer to download it.




Note: For the Spring autowire of the Speedment FilmManager to work, we also need a configuration class which may look as follows.

@Configuration
public class Setup {
    @Bean
    public SakilaApplication createApplication() {
        SakilaApplication app = new SakilaApplicationBuilder()
            .withBundle(DataStoreBundle.class)
            .withUsername("sakila")
            .withPassword("sakila")
            .build();
        app.getOrThrow(DataStoreComponent.class).load();
        return app;
    }

    @Bean
    public FilmManager createFilmManager(SakilaApplication app) {
        return app.getOrThrow(FilmManager.class);
    }
}

Oct 2, 2018

Sharding for Low Latency Clusters

Scalability of a cluster is often perceived as a matter of being able to scale out, but in practice the constraint of latency may be at least as important; it does not really help to make a system future proof in terms of data size capacity if it turns out that handling an increasing amount of data renders the system intolerably slow. This post gives a high level idea of the concept of sharding. We will elaborate on the principle in general and how it relates to affinity and speed, and then present some code examples of sharding usage for low latency.

Sharding as Divide and Conquer

There are many problems that can be solved by dividing the problem into smaller subproblems and then assembling the subresults. In a cluster environment where several nodes cooperate to deliver a service, such a divide and conquer strategy may turn a single intractable problem into several smaller and therefore solvable problems. If the size of the data is too large for any of the cluster nodes, the cluster may still be able to handle the service if the data can be partitioned between the nodes.
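As a toy illustration of the principle, the following plain Java sketch (all names are made up for this example) treats a list of partitions as the "cluster"; each node sums its own shard and the subresults are assembled into the final answer:

```java
import java.util.List;
import java.util.stream.IntStream;

// A minimal divide-and-conquer sketch: each partition stands in for the
// data held by one cluster node. Every node solves its own subproblem
// (summing its shard) and the subresults are then combined.
public class DivideAndConquer {

    public static long clusterSum(List<int[]> partitions) {
        return partitions.stream()
            .mapToLong(shard -> IntStream.of(shard).sum()) // each node sums locally
            .sum();                                        // assemble the subresults
    }

    public static void main(String[] args) {
        List<int[]> shards = List.of(new int[]{1, 2}, new int[]{3, 4}, new int[]{5});
        System.out.println(clusterSum(shards)); // 15
    }
}
```

No single "node" ever needs to hold more than its own shard, which is the whole point of partitioning the data.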

The strategy of partitioning the data in a way that it lends itself for local computations is often referred to as sharding. A simple example would be a directory lookup service. Let us say we design a clustered lookup service which will return the phone number of a person given her name. Let us assume that the size of the lookup table is more than what we can fit into a single node. This means that the nodes will need to cooperate somehow, and divide the lookup table data between them.

In this example, the keys to the lookup table are totally ordered, so it is easy to devise threshold values by which we partition the keys. We may for example say that the first node holds all names that appear before Charlie in the alphabet, the second all names between Charlie and Delta, and so on. The name is then what we call a shard key, which we use to determine the node for a given key. Having a strategy for finding the node for a given key, it is easy to devise an edge service for our cluster that routes each request to the proper node.
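Such threshold-based routing can be sketched in a few lines of plain Java with a sorted map of range boundaries; the thresholds and node numbers below are made up for illustration:

```java
import java.util.TreeMap;

// A minimal sketch of range-based shard routing. Each entry maps the
// lowest name of a range to the node responsible for that range.
public class RangeRouter {

    private final TreeMap<String, Integer> thresholds = new TreeMap<>();

    public RangeRouter() {
        thresholds.put("", 0);        // names before "Charlie" go to node 0
        thresholds.put("Charlie", 1); // names from "Charlie" up to "Delta" go to node 1
        thresholds.put("Delta", 2);   // names from "Delta" and onwards go to node 2
    }

    // The name is the shard key; floorEntry finds the range it falls into.
    public int nodeFor(String name) {
        return thresholds.floorEntry(name).getValue();
    }

    public static void main(String[] args) {
        RangeRouter router = new RangeRouter();
        System.out.println(router.nodeFor("Alice"));   // 0
        System.out.println(router.nodeFor("Charlie")); // 1
        System.out.println(router.nodeFor("Eve"));     // 2
    }
}
```

An edge service holding such a router can forward each lookup request to the single node that owns the name in question.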

In the simplistic example of phone directory lookup, the problem can always be solved at a single node and the reply from that node will constitute the reply from the cluster. In a more realistic scenario, the nodes involved may be any subset of the nodes, perhaps all of them, and the response from the service is then some kind of accumulation of the subresults. Further, in this example the shard key is identical to the actual input data, but in a more realistic scenario the shard key may be a subset of the input data. The sharding function from shard key to nodes is in general a hash function, or more formally a surjective function from shard key to node.
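When the keys have no useful order, a hash-based sharding function serves the same purpose. A minimal sketch (class and method names made up for illustration):

```java
// A minimal sketch of a hash-based sharding function: a surjective
// mapping from shard key to one of nodeCount nodes.
public class HashSharder {

    // Math.floorMod keeps the result in [0, nodeCount) even when
    // hashCode() is negative.
    public static int nodeFor(String shardKey, int nodeCount) {
        return Math.floorMod(shardKey.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        // The same key always maps to the same node, so the edge
        // service can route requests deterministically.
        System.out.println(nodeFor("Alice", 4) == nodeFor("Alice", 4)); // true
    }
}
```

Unlike the range-based scheme, a hash gives no locality between adjacent keys, but it distributes the data evenly and is trivial to compute at the edge.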

What we have gained in the example of the phone directory is that we have transformed a problem that needs a large amount of data to be solved into several smaller ones by exploiting locality properties of the problem domain. Clearly, such an efficient divide and conquer strategy does not exist if the data cannot be partitioned in any meaningful way.

However, even without the nice property of the total order of keys in this problem, we could divide the data by some other shard key and still deliver the service by just asking all nodes to help. If we for some reason have partitioned the person data by the age of the person, that partitioning will be of little help in directing the search by name. Without the help from an index, the service may still be able to deliver a result by sending the request to all nodes and then assembling the subresults. While this helps address the problem of the size of the data in each node, it is much less efficient than a shard key that helps narrow down the search.

Speed by Affinity

Sharding helps combat overwhelming sizes of data, but it may also be a major factor in speeding up computations, not only by narrowing down the search as demonstrated in the previous paragraph, but also because reducing the amount of data needed allows for lower latency. When dealing with low latency computation, affinity becomes paramount.

The time needed for a system to compute an answer is a function of many factors. Clearly, the design of the algorithm, the size of the data and the computing power of the computing engine are major factors. On a low level of abstraction, the computation speed is limited by the instruction set available to the processor and its operating clock frequency, i.e., the kind of operations it is able to perform and the rate at which those operations can be executed. A computer that can perform 10 complex multiplications per second is faster than a computer only able to perform simple additions at the same or a lower rate. (Yes, with those numbers we envision the original kind of computer; a human person performing arithmetic.)

The sheer computing power of the processor is, however, usually not the only bottleneck. To consider the processor speed a major limiting factor of the computation time, one has to assume that data is readily available when needed, and in real-world situations that is often far from the truth. No matter how clever an algorithm the designer of the system has devised, and no matter how fast the processors involved can process the data, the time needed to compute the result is bounded by the time needed to bring the input data to the processor.

In a computing setting, the term affinity is often used to refer to the action of associating a process with a particular processor, often called the kin processor. Having a kin processor, the process will benefit from having intermediate results and state readily available in the processor registers and caches when it gets scheduled to perform work. Ascending the level of abstraction from processors and caches, it also makes sense to use the term affinity for input data available to computing nodes in a cluster. In the following we use the term at this higher level of abstraction. A computing cluster where the nodes have all needed input data available in RAM at all times has the highest degree of affinity, while a system where the input data needs to be fetched from remote nodes before computation can complete has a lower degree of affinity.

For a computing cluster with no data redundancy and with data randomly spread out over the nodes, the computing time as function of the number of nodes in the cluster will asymptotically behave as if no data is ever available where it is needed. When there are ten nodes, there is a chance of 1/10 that the needed data will be available, but this chance diminishes as the cluster grows. Since the time needed to transfer data between cluster nodes is several orders of magnitude larger than fetching it from local RAM, keeping the data close at hand is paramount to achieve really low latency.

The random distribution of data is a worst case example and there are many elaborate approaches to increasing the chances of having data available when needed in a computation cluster, but in some cases the problem domain per se actually allows proactive distribution of data that fits the usage pattern and then we may use sharding to achieve ultra low latency for really large data sizes.

The Two Major Dimensions of Cluster Scalability

Scalability of a cluster is often perceived as a matter of being able to scale out, i.e., adding more nodes to the cluster as the size of the data grows. While the ability to handle the sheer size of the problem clearly is a non-negotiable limitation of the system, it may not be the bottleneck. In a non-sharded system, the likelihood of having data available when needed asymptotically approaches zero as the size of the problem increases and this lack of affinity has an impact on computation speed.

All systems become obsolete at some point, and when that happens it may very well be because the system is unable to hold the increased set of data, but another common reason is that the system simply is too slow. Two major factors come into play; the increasing amount of data requiring more nodes with added data transfer latency as a result and also evolving requirements on the system - quite often new applications demand more complex computations. Wasting time on data transfer and being asked to do heavier work turns the once competent system into an obsolete object for a migration project.

We conclude that theoretical scale-out capacity is just one piece of the scalability puzzle. Latency as a function of data size and problem complexity cannot be ignored in a real-life scenario.

Sharding to Scale In-JVM-Memory Ultra Low Latency Computing

Narrowing the focus from general computation theory to the more specific domain of Java server side development, we use the term in-JVM-memory to refer to a computation strategy where each node has all the data it needs directly available to the JVM without having to fetch it, even from other local memory. Thus taking affinity to the extreme, this strategy allows for ultra-low latency. Clearly, this requires all the node data needed for the computation to fit in available memory, which poses a limitation on the size of the problem that can be handled.

Since a low latency system using in-JVM-memory data needs all data to fit in the available memory of the node, sharding will be needed to address situations where the data is larger than what can fit in the memory of the node. Therefore, sharding support is a key feature of a scalable in-JVM-memory system.

Concluding this rather theoretical post about scalability of low latency clusters we give a few real examples of sharding support. We will use the Java stream ORM Speedment, which has an acceleration feature that relies on in-JVM-memory techniques.

Immutable Sharding

When the shard key values are known a priori, all sharding details can be given at startup time. As further described in the manual, the following example creates sharded computation engine instances for the shard keys A and B. At other nodes, other shard keys can be used, and in this way the nodes of the cluster divide the data between them, ensuring that this first node has all data needed for solving problems in the part of the data space determined by shard keys A and B.

// Creates a builder from a shard key.
// In this example we are not considering the shard key for
// the builder itself.
Function<String, SpeedmentTestApplicationBuilder> builderMapper = shardKey ->
    new SpeedmentTestApplicationBuilder()
        .withPassword("speedment_test")
        .withBundle(InMemoryBundle.class);


// Creates a ShardedSpeedment object with two keys "A" and "B"
// The content of the different shards are controlled by the given stream decorator
ShardedSpeedment<String> shardedSpeedment = ShardedSpeedment.builder(String.class)
    .withApplicationBuilder(builderMapper)
    .putStreamDecorator(CountryManager.IDENTIFIER, (shardKey, stream) ->
        stream.filter(Country.NAME.startsWith(shardKey)))
    .putShardKey("A")
    .putShardKey("B")
    .build();

// Loads all the shards into memory
shardedSpeedment.load();

// Prints all countries in the "A" shard
shardedSpeedment
    .getOrThrow("A")
    .getOrThrow(CountryManager.class)
    .stream()
    .forEachOrdered(System.out::println);

// Prints all countries in the "B" shard
shardedSpeedment
    .getOrThrow("B")
    .getOrThrow(CountryManager.class)
    .stream()
    .forEachOrdered(System.out::println);

// Closes all the shards
shardedSpeedment.close();

The node that runs this code will only handle data related to countries starting with the letters A and B. Other nodes will be handling other countries. This partitioning of the data is quite static, and while that sometimes suffices, a more flexible sharding strategy calls for a more general approach. With mutable sharding, shards can be added to the cluster dynamically.

Mutable Sharding

When the set of shard keys is unknown at startup, mutable sharding can be used as follows. In contrast to the example of immutable sharding, we here allow for adding new keys to the sharding scheme during the life cycle of the application. More details are to be found in the manual.

// Creator that, when applied, will create a Speedment instance for
// a given shard key
final Function<String, SpeedmentTestApplication> creator = shardKey -> {
    SpeedmentTestApplication app = TestUtil.createSpeedmentBuilder().build();
    app.getOrThrow(DataStoreComponent.class).reload(
        ForkJoinPool.commonPool(),
        StreamSupplierComponentDecorator.builder()
            .withStreamDecorator(CountryManager.IDENTIFIER, s -> s.filter(Country.NAME.startsWith(shardKey)))
            .build()
        );
    return app;
};

// Creates a MutableShardedSpeedment
MutableShardedSpeedment<String> shardedSpeedment = MutableShardedSpeedment.create(String.class);

// Acquires a Speedment instance for the shard key "A"
// (if the shard is already created, returns that shard;
// if the shard is not created, creates and returns a new
// shard that will be reused for subsequent calls for the
// same shard key).
SpeedmentTestApplication aApp = shardedSpeedment.computeIfAbsent("A", creator);

SpeedmentTestApplication bApp = shardedSpeedment.computeIfAbsent("B", creator);

final CountryManager aCountryManager = aApp.getOrThrow(CountryManager.class);
final CountryManager bCountryManager = bApp.getOrThrow(CountryManager.class);

// Prints all countries in the "A" shard
aCountryManager.stream().forEach(System.out::println);

// Prints all countries in the "B" shard
bCountryManager.stream().forEach(System.out::println);

// Closes all the shards
shardedSpeedment.close();

To download and try out Speedment, use the Speedment Initializer, which you can find here.

Sep 11, 2018

A Java Stream to UPDATE a Subset of Columns

The Stream ORM Speedment has received a nice feature in the latest release. While traditionally focused on accelerating database reads, Speedment has always also had database write functionality. In the latest version a much anticipated feature has been added - the ability to determine a subset of columns to update.

The Streams API Goes Both Ways

Persisting data in Speedment follows the same intuitive stream approach as other Speedment data oriented operations. Just as querying the database is expressed as a stream of operations on data items, the POJO entities received from the database may be persisted to the database by simply terminating the stream in a database persister.

Simple retrieval of data can be expressed as


Optional<Film> longFilm = films.stream()
    .filter(Film.LENGTH.greaterThan(120))
    .findAny();
which will find a POJO representing a row in the underlying database for which the supplied predicate holds true. In this case, the user will get a film longer than two hours, if any such film exists. For reading data, Speedment thus supplies a stream source that represents the database, in this case the films instance which runs code generated by Speedment. Analogously, Speedment defines a stream termination that handles data writing which can be used as follows.

Stream.of("Italiano", "Español")
        .map(ln -> new LanguageImpl().setName(ln))
        .forEach(languages.persister());
Here we create a stream of POJOs representing rows in the database that may not yet exist in the database. The languages instance on line 3 is a Manager just like films above and is implemented by code generated by Speedment. The persister() method returns a Consumer of POJOs that will persist the items to the database.

Notice the symmetry where Speedment has generated code handling reading and writing of database data as intuitive Stream operations.

Updating Data in a Single Stream

Since Speedment provides a consistent API of treating database operations as stream operations on POJOs, the reading and writing of data can be composed and combined freely. Thus, a POJO retrieved from a Speedment source is the very same kind of POJO that is needed for persistence. Therefore, it makes a lot of sense to use streams that have both source and termination defined by Speedment. If one for example would like to update some rows of a table of the database, this can be done in the following concise way.

  languages.stream()
        .filter(Language.NAME.equal("Deutsch"))
        .map(Language.NAME.setTo("German"))
        .forEach(languages.updater());
Almost self-explanatory, at least compared to the corresponding JDBC operations, the code above will find any Language named “Deutsch” in the database and rename it to “German”. The terminating operation here is the updater which in contrast to the persister modifies existing rows of the database.

Selecting the Fields to Update

The basic updater will update all relevant columns of the row in question, which makes sense in many cases. However, for the case above when updating a single column of the database this behaviour may be wasteful or even prone to errors.

Even if the code above intends to update only the name of the language, since the updater updates all columns of the row it will actually update the name to a new value and also reset all other columns to the values they had when the POJO was created from the database. If this code is the sole actor modifying the database this may be a minor problem, but in a concurrent environment it may create undesired race conditions where this innocent update of a single field may happen to undo changes to other columns.
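The hazard can be illustrated without a database at all. In the following plain Java sketch (all class and field names are made up for illustration), two clients read the same row, modify different fields and write the whole row back, so the last writer silently undoes the first writer's change:

```java
// An illustrative sketch of the lost-update hazard caused by
// full-row write-back, using plain objects instead of a database.
public class LostUpdateDemo {

    static final class Row {
        String name;
        String lastUpdate;
        Row(String name, String lastUpdate) { this.name = name; this.lastUpdate = lastUpdate; }
        Row copy() { return new Row(name, lastUpdate); }
    }

    // Full-row write-back, as a plain updater does: every field is copied,
    // including fields the client never intended to change.
    static void writeFullRow(Row stored, Row clientCopy) {
        stored.name = clientCopy.name;
        stored.lastUpdate = clientCopy.lastUpdate;
    }

    public static void main(String[] args) {
        Row stored = new Row("Deutsch", "2018-01-01");
        Row clientA = stored.copy();
        Row clientB = stored.copy();

        clientA.name = "German";           // A changes only the name
        clientB.lastUpdate = "2018-09-11"; // B changes only the timestamp

        writeFullRow(stored, clientA);
        writeFullRow(stored, clientB);     // B's stale name overwrites A's change

        System.out.println(stored.name);   // prints "Deutsch": A's update was lost
    }
}
```

A column-subset updater avoids this race by writing back only the fields it actually changed, leaving concurrent writes to other columns intact.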

In Speedment 3.1.6 and later, the user may select which fields to update and persist by supplying a description of the desired fields of the POJO. To improve on the last example, the following code will update only the name of the Language.


   Updater<Language> updater = languages.updater(FieldSet.of(Language.NAME)); 
   languages.stream()
        .filter(Language.NAME.equal("Deutsch"))
        .map(Language.NAME.setTo("German"))
        .forEach(updater);
There are elaborate ways to express the set of fields to update and the interested reader is encouraged to learn more from the Speedment User Guide and to get a free license and example project scaffolding at the initializer.

Sep 22, 2017

A Declarative Java ORM Speaks Fluent SQL

As a developer you do not want to be micromanaged with detailed lists of instructions. So when you ask your ORM to give you data, why do you have to supply the database query to be executed? Clearly, that has historical reasons and an ORM that is free to create queries by itself needs to have an API that declaratively describes the expected result as a combination of data retrieval and modification operations. Java 8 streams provided the needed language support for that. The time had come for a declarative ORM.

Working at Speedment, I am often eager to describe the fundamental advantages of a declarative ORM and the following is an attempt to do so. The reader will be presented with an analogy of a factory working in two phases to describe how micro management of internal steps of a larger task creates barriers for efficiency improvements, just as explicit SQL in a database application constrains the application to a specific solution when a framework with more freedom could find optimizations on a larger scale.

The Furniture Factory Analogy - The Manufacture and the Assembly Teams

A typical Java based database application consists of two steps of computation - first the database query and then the JVM code operating on the data from the database. As an analogy, consider a furniture factory that operates in two steps; manufacture of parts and then assembly of the parts into complete products.

The furniture company consists of two teams - the manufacture team that creates parts and the assembly team that uses the parts to create complete products. Both teams have declarative work instructions which means that the instructions describe the expected output rather than a sequence of operations to perform. The factory output in terms of complete furniture is thus fully determined by the assembly team instruction, but the factory efficiency is also highly dependent on well-tuned manufacture team instructions. If the parts sent to assembly are too small and simple, assembly may not be possible and if they are too complex the manufacture process becomes too expensive.

Telling the manufacture team to take instructions from the assembly team will make managing the factory easier and allows the teams to cooperate to find the best solution. The assembly team has thus been promoted to a design team since it has the responsibility to design the parts. Instead of detailed descriptions of all parts needed, the factory instructions now only describe the end result and instead of being constrained by given part schemas the design team may freely decide upon the whole process.

Recent technology advances allow for an analogous revolution for relational data Java applications where the manufacture team corresponds to the database engine, the assembly team is a traditional ORM and the design team that replaces the assembly team is a declarative ORM.

How Java Database Applications Relate to the Furniture Factory Analogy

SQL is a well known example of declarative programming and the SQL query works just as the instructions to the manufacture team in the analogy above - you leave it to the database engine to figure out an optimal execution plan for how to compute the result described by the query. Similarly, the pipeline of a Java stream is a description of a sequence of abstract operations, conceptually similar to the assembly team instructions above, where the framework implementing the stream termination determines the actual execution path.

Putting the two together, a Java application using relational data typically uses an ORM. The current standard API for ORMs is JPA and Hibernate may be the most well known implementation of that API. Leveraging the declarative power of the SQL language, Hibernate in a rather transparent way exposes the user to the Hibernate Query Language (HQL) that can be seen as SQL for Java objects. While this is useful for developers used to SQL, it does introduce a mix of languages - the Java code will contain HQL code. Therefore, using the furniture factory analogy, even though SQL is replaced with HQL we are still stuck with detailed manufacture team instructions as long as Hibernate is used.

Just as the furniture factory suffered from maintaining two sets of instructions with implicit dependencies, the mix of Java and HQL comes with a price. Apart from being error prone and creating a high maintenance cost, such a mix of languages also creates a barrier over which functional abstraction cannot take place. The situation calls for a solution similar to replacing the furniture factory assembly team with a design team. In order to fully leverage the declarative nature of a functional Java streams application, the language barrier of the ORM framework needs to be removed.

The Language Barrier Limiting Declarative Power

Ever since the 1970s, SQL has allowed applications operating on relational data to leverage a declarative approach to data handling. An application developer does not need to know about decades of database engine research since the code she writes only describes what data is needed, not how it is to be retrieved. This clever separation of concerns decouples the application from the database engine details, minimizing maintenance and development cost.

The introduction of streams in Java allows for a declarative programming style conceptually similar to SQL in the sense that the streams created are a description of a sequence of operations rather than an explicit sequence of imperative instructions. The description is fed to the framework defining the termination of the stream, allowing the framework to reason about the whole sequence of operations before actually executing any data operations. Thus, just as the SQL query is a declarative statement about what data to retrieve, the stream is a declarative statement about the operations to be executed on the data. Just as the database engine is free to optimize the query execution as long as the result is the same, the stream termination may alter the data operations as long as the semantic invariants hold.

Until recently, a typical relational data application leveraging Java streams would consist of two distinct parts defining a two-step process: first data is retrieved via a database query, and then operations are carried out on the collection of data returned in the result set. The two frameworks that reason about the query and the streams, respectively, are confined to their own realms; no optimization may span both domains. Yet Java streams have enough expressive power to span both domains, since they can be seen as declarative constructs.
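That two-step shape can be sketched in plain Java. The database step is stubbed out here with an in-memory list, and the Film class is a minimal illustrative stand-in rather than a real mapped entity; in an actual application the list would be the materialized result set of a query:

```java
import java.util.Arrays;
import java.util.List;

public class TwoStepDemo {

    // Minimal stand-in entity; in a real application these instances would be
    // mapped from the result set returned by the database query (step one).
    static class Film {
        final int length;
        Film(int length) { this.length = length; }
    }

    // Step two: in-JVM stream operations on the already-fetched collection.
    // No optimization can span the query and this code; they are separate realms.
    static long totalLength(List<Film> films) {
        return films.stream()
                .mapToLong(f -> f.length)
                .sum();
    }

    public static void main(String[] args) {
        List<Film> fetched = Arrays.asList(new Film(90), new Film(120));
        System.out.println(totalLength(fetched)); // prints 210
    }
}
```

Whatever the database did to produce the list, the summation above always runs in the JVM; the two halves cannot be jointly optimized.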

Declarative Java Streams Applications

Functional programming has received solid language support in Java with the introduction of streams and lambdas. Allowing the user to create pipelines of abstract operations on a stream of data, the new language constructs introduce a higher order functional approach to programming. Higher order functions take other functions as input and thus allow abstract reasoning about behavior, which opens up a whole new level of abstractions. For instance, we now have language support to create programs that modify other programs.

Consider the following code that describes the first ten even integers that are not divisible by 10.


IntStream.iterate(0, i -> i + 1)
  .map(n -> n * 2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

A reader used to, for example, Unix pipes would perhaps interpret this piece of code as a source of data creating an unbounded sequence of integers on line 1, doubling each number on line 2, removing the numbers evenly divisible by 10 on line 3 and then, on line 4, throwing away all numbers except the first 10 before printing them when reaching line 5. This is a very useful intuition even though it is quite far from what will happen when the code is executed.

Interpreting this example as a stream of data being manipulated from top to bottom by each operation yields a correct understanding of the resulting output while creating minimal cognitive encumbrance on the reader. It makes it possible for a reader and even the designer of the code to correctly understand the meaning of the code in terms of output without really thinking about how the expression actually will be evaluated.

What actually does happen on lines 1 through 4 is that a function is created and modified. No integers are created, let alone filtered or mapped, until we reach line 5. That last line is a terminating operation of the stream, entailing that it operates on all aspects of the stream, including both the source of data and the operations, to finally return the result. Having access to the information about the entire operation and the source, the terminator of the stream can deliver the result without performing unnecessary operations on items that will be discarded by the limit operation.

This lazy evaluation is possible since the program describes the requested operations rather than actually executing them. What at first may seem like a simple sequence of function invocations on a sequence of integers is actually a creation of a function that is passed to another function. The program simply describes what is computed while leveraging previous work invested in the design of the used stream components that determine how the computation will take place. This is the kind of expressive power that is needed to create an ORM that breaks the language barrier by reasoning about the whole task including both in-JVM and database engine operations.
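The laziness can be observed directly. In the following sketch a peek step counts how many integers the source actually produces: nothing is counted while the pipeline is built, and the source stops as soon as the limit is satisfied, after 13 integers rather than infinitely many:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class LazyDemo {

    // Returns { sum of the ten results, number of integers drawn from the source }.
    static int[] run() {
        AtomicInteger pulls = new AtomicInteger();

        // Building the pipeline executes nothing; pulls is still 0 at this point.
        IntStream pipeline = IntStream.iterate(0, i -> i + 1)
                .peek(n -> pulls.incrementAndGet()) // counts integers the source actually produces
                .map(n -> n * 2)
                .filter(n -> n % 10 != 0)
                .limit(10);

        int sum = pipeline.sum(); // the terminal operation makes the elements flow
        return new int[] { sum, pulls.get() };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("sum = " + result[0]);          // prints sum = 126
        System.out.println("source pulls = " + result[1]); // prints source pulls = 13
    }
}
```

The source yields 0, 1, ..., 12; thirteen integers are enough to produce the ten results that survive the filter, and the infinite source is never asked for a fourteenth.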

Breaking the Language Barrier with Java Streams

The language barrier between the database engine and the JVM forces the designer to design, optimize and then maintain the structure of the data as transferred from the database since it constitutes output from the first part of the solution and input to the next. Speedment on GitHub is a declarative ORM that breaks this language barrier. Taking the declarative approach from its starting point in two declarative building blocks all the way to its logical conclusion of a unified declarative construct, the Speedment toolkit and runtime abstracts away the SQL query and allows the user of the framework to create a design based solely on streams.

Consider the fully declarative way of counting the users belonging to a particular department in the following code snippet.


long count = users.stream()
  .filter(User.DEPARTMENT.equal(1))
  .count();

The source of the stream is a manager instance called users, which is instantiated from code generated by the toolkit analyzing the database metadata. As described above, viewing a stream as a sequence of objects flowing from the source and modified on each following line will give a correct understanding of the result of the operation, while not necessarily giving any insight into the actual execution of the program. The same applies here. Retrieving all the users from the database in line 1, then in line 2 filtering out the users belonging to department number 1 and finally counting the remaining users would yield the desired result of the operation.

This understanding of the algorithm has a strong appeal in its abstract simplicity and allows the Speedment runtime to decide on all the details of the execution. Not only does it optimize the operations on the data in the JVM, but since the relational database operations are also covered by the abstraction, the generated SQL query will take the full pipeline of operations into account.

By means of a sequence of reductional operations on the pipeline, the runtime will collapse the operations into a single SQL statement that relieves the JVM of any operations on user data instances. The following query will be executed:


SELECT COUNT(*) FROM user WHERE (department = 1)

In this minimalistic example the database engine performs all the data handling operations, since the generated SQL query gives the full result, but in general the framework will have to execute part of the pipeline in the JVM. The power of the fully declarative approach is that the framework can freely decide on these aspects. This freedom also opens up more elaborate optimizations, for example in-JVM-memory data structures for faster lookup of data. The open-source Speedment ORM has an enterprise version that exploits this to provide efficiency boosts of up to several orders of magnitude.
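The underlying principle of such in-JVM-memory acceleration can be illustrated with a small plain-Java sketch (this is not Speedment's actual data store, just the idea): by paying a one-time cost to build a hash index over a field, an equality filter becomes a constant-time lookup instead of a full scan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IndexSketch {

    // One-time cost: bucket all rows by a key extracted from each row.
    static <T, K> Map<K, List<T>> index(List<T> rows, Function<T, K> key) {
        return rows.stream().collect(Collectors.groupingBy(key));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alice", "bob", "carol", "dan");

        // Index the rows by name length.
        Map<Integer, List<String>> byLength = index(names, String::length);

        // An equality filter is now a hash lookup instead of a full scan.
        System.out.println(byLength.get(3)); // prints [bob, dan]
    }
}
```

A framework that sees the whole declarative pipeline can decide on its own when building such an index pays off; the application code never changes.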

Expressing SQL as Java Streams

As seen above, Speedment removes the polyglot requirement for Java database applications by abstracting away SQL from the business logic code. When analyzing the two declarative approaches, one finds striking similarities between the corresponding constructs in SQL and Java streams.

SQL        Java Streams
FROM       stream()
SELECT     map()
WHERE      filter() (before collecting)
HAVING     filter() (after collecting)
JOIN       flatMap()
DISTINCT   distinct()
UNION      concat().distinct()
ORDER BY   sorted()
OFFSET     skip()
LIMIT      limit()
GROUP BY   collect(groupingBy())
COUNT      count()
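The correspondence can be tried out on an in-memory list with plain streams, no database required; the Hare class below is purely illustrative:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SqlStreamAnalogy {

    static class Hare {
        final String name;
        final int age;
        Hare(String name, int age) { this.name = name; this.age = age; }
    }

    // Roughly: SELECT name FROM hare WHERE age > 3 ORDER BY name LIMIT 2
    static List<String> query(List<Hare> hares) {
        return hares.stream()                                     // FROM
                .filter(h -> h.age > 3)                           // WHERE
                .sorted(Comparator.comparing((Hare h) -> h.name)) // ORDER BY
                .limit(2)                                         // LIMIT
                .map(h -> h.name)                                 // SELECT
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hare> hares = Arrays.asList(
                new Hare("Henry", 9), new Hare("Hansel", 3),
                new Hare("Harry", 400), new Hare("Henrietta", 2));
        System.out.println(query(hares)); // prints [Harry, Henry]
    }
}
```

Each stream operation maps one-to-one onto a clause of the corresponding SQL query, which is what allows a framework to translate the pipeline back into SQL.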

Conclusions

Traditional ORMs employ an explicit query language for database interaction. This micromanagement imposes unnecessary constraints on the ORM, constraints that can be lifted by instead using a fully declarative approach. Such an approach is enabled by the declarative nature of Java streams, which allows expressing both database data retrieval and in-JVM data operations in the same language. The declarative nature of the resulting application decouples it from the imperative interpretation, which enables seamless adjustment to a multi-threaded and/or in-JVM-memory accelerated solution, since all details of execution and database-to-JVM data transfer have been abstracted away.

Nov 25, 2016

Speedment 3.0 - Higher Order Functions

Generally accessible in Java 8, higher order functions open up a whole new level of abstractions allowing us to reason about code and algorithms in a new way. When our code can operate on functions as input and output we are given the power to easily express algorithms that modify behavior just like we have always done for data.

The power to create algorithms that modify behavior makes it possible to reason about functionality. Such a higher order algorithm can therefore extend or optimize the characteristic of a function, before the function receives any input.

Speedment is an Open Source ORM with an API founded in Java 8 streams. This post describes some new features of the API of recently released Speedment version 3.0 which add more features to support higher order functions. A reader already familiar with higher order functions and declarative programming may benefit from skipping ahead to the last section.

Functions Operating on Functions

To many developers, there is a fundamental difference between functions and data. A common view of a traditional program is a clear separation of the two; the program is a static structure of procedures and functions determined at compile time which operates on data, part of which typically is supplied at runtime.

For example, it is natural to have a program that prints a sorted list of names given as input. When compiled, the sorting function is static but can dynamically be used on any data that happens to be a list of names. If our program were generalized to take the actual sorting algorithm as input, we would have created a higher order function, since our program would accept a function as input.

Seen in such a way, functional aspects of Java are not new. Supplying a function as parameter is commonplace in Java, perhaps most notably as event driven callbacks for asynchronous interfaces. The callback idiom is a very simple type of operation on functions since the function itself is never altered - it is just kept as a reference until the time comes to call it.


Runnable callback = new Runnable() {
  @Override
  public void run() {
    System.out.println("ping");
  }
};

// Some other time, in a context far, far away...
callback.run();

The concept of higher order functions entails not only that functions can be referred to and invoked within the framework of the language itself, but also that functions are treated like other data, on which other functions can operate.
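A minimal example of a function operating on another function: twice below takes any function and returns a new function that applies it two times. The names are illustrative only:

```java
import java.util.function.Function;

public class HigherOrder {

    // A higher order function: takes a function and returns a new, modified function.
    static <T> Function<T, T> twice(Function<T, T> f) {
        return f.andThen(f);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> addThree = x -> x + 3;
        Function<Integer, Integer> addSix = twice(addThree); // behavior built from behavior
        System.out.println(addSix.apply(1)); // prints 7
    }
}
```

Unlike the callback idiom, the input function here is not merely stored and invoked later; a genuinely new function is constructed from it.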

The Bliss of Meta Functions

A better example of the power of functional programming in Java 8 would showcase modification of an existing behavior. Even though not obvious from syntax, the Java 8 concept of streams does exactly that. Consider the following code.

IntStream.iterate(0, i -> i + 1)
  .map(n -> n*2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

A reader used to, for example, Unix pipes would perhaps interpret this algorithm as a source of data creating an unbounded sequence of integers on line 1, doubling each number on line 2, removing the numbers evenly divisible by 10 on line 3 and then, on line 4, throwing away all numbers except the first 10 before printing them when reaching line 5. Although quite far from what will happen when the code is executed, this is in a sense a very useful intuition.

Interpreting this example as a stream of data being manipulated from top to bottom by each operation yields a correct understanding of the resulting output while creating minimal cognitive encumbrance on the reader. It makes it possible for a reader and even the designer of the code to correctly understand the meaning of the code in terms of output without really thinking about how the expression actually will be evaluated.

What actually does happen on lines 1 through 4 is that a function is created and modified. No integers are created, let alone filtered or mapped, until we reach line 5. That last line is a terminating operation of the stream, entailing that it operates on all aspects of the stream, including both the source of data and the operations, to finally return the result. Having access to the information about the entire operation and the source, the terminator of the stream can deliver the result without performing unnecessary operations on items that will be discarded by the limit operation.

This lazy evaluation is possible since the program describes the requested operations rather than actually executing them. What at first may seem like a simple sequence of function invocations on a sequence of integers is actually a creation of a function that is passed to another function.

By using higher order functions we decouple the declarative description of the function from the imperative interpretation. The program simply describes what is computed while leveraging previous work invested in the design of the used stream components which determine how the computation will take place.

Functions operating on other functions allows blissful ignorance of (and therefore also decoupling from) execution details.

Parallelism as a Simple Add-on

Decoupling of algorithm from executional details is also the reason why parallelism can be so elegantly implemented for Java 8 streams. The following code will utilize available processor cores to parallelize the work of the stream.

IntStream.iterate(0, i -> i + 1)
  .parallel()
  .map(n -> n*2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

This small alteration of the stream will yield fundamentally different execution characteristics. In this case, the parallel version is likely to be more wasteful than the original version. The machinery behind a parallel stream, the Spliterator, will partition the stream of integers into chunks that are processed concurrently. Since several threads are helping out multiplying integers by two and filtering out numbers that are multiples of ten, the limit operation, which requires synchronization between the parallel workers, will be fed more numbers than it will allow to pass through. The first, non-parallel example would pull items from the source only when needed, stopping just when the limit is reached.
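The decoupling of description from execution can be demonstrated with a bounded variant of the example (a finite source is substituted for the infinite iterate so the sketch stays cheap to run): the sequential and parallel pipelines describe the same result even though the work is scheduled completely differently:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelDemo {

    // The same declarative pipeline; only the execution strategy differs.
    static int[] firstTen(boolean parallel) {
        IntStream source = IntStream.range(0, 100);
        if (parallel) {
            source = source.parallel();
        }
        return source
                .map(n -> n * 2)
                .filter(n -> n % 10 != 0)
                .limit(10)
                .toArray(); // toArray respects encounter order, even when parallel
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(firstTen(false), firstTen(true))); // prints true
        System.out.println(Arrays.toString(firstTen(true)));
        // prints [2, 4, 6, 8, 12, 14, 16, 18, 22, 24]
    }
}
```

Because the stream is ordered, limit and toArray preserve the encounter order regardless of how many threads participate; only the cost profile changes.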

Speedment Taking Higher Order Functions to the Extreme

Similarly to the declarative example above where the composer of the stream describes what to calculate rather than how to do it, the typical user of a relational database is used to query the database by means of the declarative language SQL. Compared to the number of software engineers that have ever designed software that uses data from a relational database, very few are actually concerned with the details of how the database engine cleverly executes the query.

An ORM provides a mapping between the data model of a relational database and the object centric view of an object oriented language such as Java. The Java 8 stream ORM Speedment provides such a mapping with a fully declarative Java 8 stream API.

As seen above, the conceptual building blocks for declarative programming of applications leveraging relational database data are available even without Speedment:

  1. the database is queried using the declarative SQL and
  2. the Java 8 streams features provide a similarly declarative approach to design of the operations on the data once retrieved to the JVM.

However, without Speedment the two declarative languages have to be mixed in an application. In a previous post on this blog we have seen that the designer has to suboptimize two separate programs; the declarative description of the data to retrieve from the database and then in a separate language the operations to perform on that data in the JVM.

The language barrier between the database engine and the JVM forces the designer to design, optimize and then maintain the structure of the data as transferred from the database since it constitutes output from the first part of the solution and input to the next. Taking the declarative approach from its starting point in two declarative building blocks all the way to its logical conclusion of a unified declarative construct, Speedment abstracts away the SQL query and allows the user of the framework to create a design based solely on streams.

Consider the fully declarative Speedment way of counting the users belonging to a particular department in the following code snippet.


long count = users.stream()
  .filter(User.DEPARTMENT.equal(1))
  .count();

The source of the stream is a manager instance called users. As described at length above, viewing a stream as a sequence of objects flowing from the source and modified on each following line will give a correct understanding of the result of the operation, while not necessarily giving any insight into the actual execution of the program. The same applies here. Retrieving all the users from the database in line 1, then in line 2 filtering out the users belonging to department number 1 and finally counting the remaining users would yield the desired result of the operation.

This understanding of the algorithm has a strong appeal in its abstract simplicity and allows the Speedment framework to decide on all the details of the execution. Not only does Speedment optimize the operations on the data in the JVM, but since the relational database operations are also covered by the abstraction, the SQL query will be created by Speedment with the full pipeline of operations taken into account.

By means of a sequence of reductional operations on the pipeline, Speedment will collapse the operations into a single SQL statement that relieves the JVM of any operations on user instances. The executed query will actually be as follows.


SELECT COUNT(*) FROM user WHERE (department = 1)

As shown above in the example of the parallel streams, in a declarative setting where the framework makes decisions on execution strategy, it is easy to reuse a declarative program in a new setting with completely different executional properties. In its enterprise version, the Speedment framework may also choose to execute queries using data from an in-JVM memory data store, improving database latency by orders of magnitude without changing a single line of the declarative program describing the business logic.

Speedment 3.0 Structural Support for Higher Order Functions

The recently released Speedment 3.0 adds improved API support for higher order functions. While the runtime framework's SQL optimization of those constructs is not yet finished, the API allows the design of more complex and elegant declarative programs that are prepared to execute more efficiently in later versions of Speedment. The following code can be used to set the “category” property to 3 for all users born in a particular range of dates.

users.stream()
  .filter(User.BORN.between(1985, 1995)) // Filters out Users born 1985 up to and including 1994
  .map(User.CATEGORY.setTo(3))           // Applies a function that sets their category to 3
  .forEach(users.updater());             // Applies the updater function to the selected users

The updater construct is new in Speedment 3.0 and lends itself to executional optimization in the Speedment framework: if the preceding pipeline has suitable properties, the whole update operation pipeline may be collapsed into a single SQL UPDATE statement. While version 3.0 of the framework does not perform all the possible optimizations to collapse the pipeline into a single SQL statement, the program will still yield a correct result by a sequence of UPDATE operations.

Another example of the extended support for higher order functions in Speedment 3.0 is the categorizers. Designed to work with the standard Java stream operations, the categorizers allow constructs such as the following, which creates a mapping from users to their homepages, assuming a homepage has a foreign key pointing to its owner.

Map<User, List<HomePage>> join = homepages.stream()
  .collect(
    groupingBy(users.finderBy(HomePage.OWNER), toList())
  );
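The categorizer idea can be mimicked with plain streams on in-memory objects (the classes below are illustrative stand-ins, not Speedment's generated code): a classifier function plays the role of finderBy, following the foreign key from a homepage back to its owner.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CategorizerSketch {

    static class HomePage {
        final String url;
        final String ownerName; // stands in for the foreign key to the owning user
        HomePage(String url, String ownerName) { this.url = url; this.ownerName = ownerName; }
    }

    // Group homepages by the user they point to; the classifier plays the
    // role that the finderBy categorizer plays in the Speedment version.
    static Map<String, List<String>> pagesByOwner(List<HomePage> pages) {
        return pages.stream()
                .collect(Collectors.groupingBy(
                        (HomePage p) -> p.ownerName,
                        Collectors.mapping((HomePage p) -> p.url, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<HomePage> pages = Arrays.asList(
                new HomePage("a.example", "ann"),
                new HomePage("b.example", "bo"),
                new HomePage("c.example", "ann"));
        System.out.println(pagesByOwner(pages).get("ann")); // prints [a.example, c.example]
    }
}
```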

As we have seen in the two examples above, the beauty of the declarative programming paradigm is that executional properties are abstracted away. When a later version of the Speedment framework implements all the needed optimizations, existing applications leveraging Speedment will start emitting smarter SQL queries to the relational database as soon as they use the newer framework version. A dream come true from an application maintenance perspective: the declarative program describes a solution and therefore remains the same, since the problem it solves is the same.

Oct 31, 2016

When the Hare Met the Reindeer at JavaOne

The Java Duke, Vaadin Reindeer and Speedment Hare - a picture by Elis Minborg, inspired by John Bauer

Coming from a software development background mainly rooted in Java I have spent roughly two years away from the Java world, exploring the wonderful aspects of embedded system design in the automotive industry. Taking on new challenges yet again related to Java is a cognitive homecoming of sorts for me while the spatial aspects of the new job initially entail transatlantic commuting. My first California visit on the new job at Speedment Inc luckily coincided with the JavaOne 2016 conference in San Francisco.

The story that you are about to read is about one particular encounter at this for me very inspiring event. You will learn about Hares and Reindeer, Nordic cooperation at its finest and a perfect match of frameworks creating an inspiringly elegant hello world web application rendering data from a relational database in a browser.

TL;DR: Jump directly to the take home message or the code in case you do not want to indulge in the full narrative.

JavaOne 2016 Keeping the Promises of Conferences Past

At the conference JFocus in 2013, there was a consensus that Java felt dated as compared to more modern languages such as Scala and Groovy and suffered from somewhat deserved lack of enthusiastic backing. The upcoming Java 8 promised great progress, however, and it was repeatedly stressed that the future for Java looked brighter than the recent past since the new concepts of streams and lambdas would modernize the language.

Having watched the progress from a certain distance since the automotive industry has a way of absorbing one's attention, it was a delight to be back in 2016 to what seemed quite similar to the bright future pictured a few years ago. The future progress promised in 2013 had not only materialized but also been widely embraced. The functional paradigm of Java 8 lambdas and streams had not been received as just a facelift but as the fundamental gamechanger it actually is.

Attending a big conference when just having started a new job is a great way of getting a feel for the general direction of the industry and the way the product of the company is received. An inspiring and greatly informative quick start, participating in such a major congregation immediately tells what parts of our product actually pique others' interest and what kind of questions arise during follow-up discussions.

I learned from several enthusiastic exchanges that Speedment bringing Java 8 style functional programming in an innovative and uniquely consistent way to the realm of relational databases was the key factor that caught others' attention.

An Inspiring Meeting - Vaadin and Speedment

The most inspiring interaction was probably with UI framework designers Vaadin. Creating a framework that can be described as AWT/Swing for the web done right, Vaadin brings elegant programming of web interfaces to Java. Delivering relational data to Java in a way very similar to what Vaadin expects as input, Speedment seems at first glance to be the perfect match with Vaadin and together the two frameworks could in theory bring relational data to the web allowing minimal boilerplate business logic in between.

Capturing the spirit of the meeting the graphic illustration of this blog post is inspired by painter John Bauer and pictures a reindeer and a hare working together. With its roots in Finland, Vaadin has a Reindeer mascot. Speedment has Swedish heritage and mascot Hare.

Nordic people may not be known for overly expressive outbursts of enthusiasm, but I believe that I displayed my excitement with an average of more than one positive word per sentence and an occasional smirk that may have resembled a smile. Thus expressing profound enthusiasm (bordering on the socially acceptable in my home country) about the promising outlook, we decided to try it out. What about trying to get Vaadin and Speedment to work together then and there in the exhibition hall of the conference?

It has happened to me many times that what at first seems like a great idea in theory somehow does not handle the meeting with reality very well. General discussions in the abstract world of concepts quite often reveal overlapping or matching ideas that suggest great synergies, only to let one down when further explored. This is quite natural for at least two reasons:

  1. We are able to reason about fundamentally complex systems largely due to our ability to perform agile transitions between different levels of abstraction. Roaming the realm of higher abstractions, we allow ourselves the luxury of turning a blind eye to details, enabling us to find patterns on a larger scale. When descending the stairs of abstraction to lower levels, we often find ourselves knee deep in a swamp of complicating details.
  2. Additionally, when talking about newly found synergies, one only has deep knowledge about a limited part of the problem since otherwise the topic at hand would not be new in the first place. Compared to thinking within one’s knowledge comfort zone, reasoning about something one does not really know typically gives rise to much creativity and rejoicing but increases the risk of creating things that simply do not work.

Knowing this well, setting out to create running code right there on the exhibition floor seemed like a fun exercise, yet doomed to run into some obstacle, no matter how clean and perfect the match initially seemed. It would turn out that the expected obstacles never materialized.

Creating a Hello World Relational Data Web Application

Granting myself some philosophical leeway, I submit to the reader that many great things that are considered to be invented are actually rather discovered. Instead of being born as the result of a creative process, they were in some ontological sense present all the time waiting to be found. I would say that the application that emerged in my laptop while bright minds of Vaadin and Speedment contributed with each part of the solution was the result of the two frameworks being cleverly designed, rather than an inspired act of invention in that busy exhibition hall.

After some ten minutes of coding no more than standard instructions of the respective framework, the browser of my laptop rendered data from a table of an SQL database. The data was delivered by Speedment, manipulated in straightforward Java 8 code with minimal boilerplate and then sent to Vaadin for rendering.

One reason that this application worked more or less out of the box is that Vaadin and Speedment agree on using standard Java constructs in their interfaces. Had there been any special constructs used for any of the frameworks, the code would be bloated with adapting boilerplate.

For the toy example we started out with one of our example databases. The table of interest in this example is a table of hares that have name, color and age.


CREATE TABLE `hare` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  `color` varchar(45) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO hare (name, color, age) VALUES ("Hansel", "Gray", 3);
INSERT INTO hare (name, color, age) VALUES ("Henrietta", "White", 2);
INSERT INTO hare (name, color, age) VALUES ("Henry", "Black", 9);
INSERT INTO hare (name, color, age) VALUES ("Harry", "Gray", 400);

The business logic of our toy example is minimal; the following declarative code describes filtering out hares that have an age above 3 and collecting them in a list. The Speedment framework has an API founded in Java 8 streams, a fact that was the major cause of excitement when describing Speedment to people I met at JavaOne. For further reading on that topic, please see for example this previous blog post.


  final List<Hare> haresList = hares.stream()
    .filter(Hare.AGE.greaterThan(3))
    .collect(Collectors.toList());

The first line starts out the description of an operation pipeline operating on a stream of hares from the database and the second line adds a filter to the pipeline telling Speedment to find hares of the desired age. The last line terminates the pipeline, defining the result as a list of the filtered hares.
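For readers without the generated Speedment classes at hand, the same pipeline shape can be approximated with plain streams over the sample rows; the hand-written Hare class below is a stand-in for the generated entity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class HareDemo {

    // Stand-in for the entity that Speedment would generate from the table.
    static class Hare {
        final String name;
        final String color;
        final int age;
        Hare(String name, String color, int age) {
            this.name = name; this.color = color; this.age = age;
        }
    }

    // Same pipeline shape as the Speedment version.
    static List<Hare> olderThan(List<Hare> hares, int age) {
        return hares.stream()
                .filter(h -> h.age > age)      // plays the role of Hare.AGE.greaterThan(age)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hare> hares = Arrays.asList(
                new Hare("Hansel", "Gray", 3),
                new Hare("Henrietta", "White", 2),
                new Hare("Henry", "Black", 9),
                new Hare("Harry", "Gray", 400));
        olderThan(hares, 3).forEach(h -> System.out.println(h.name));
        // prints Henry then Harry
    }
}
```

On the sample data this keeps exactly the two hares that are strictly older than 3, matching the two rows rendered in the browser later in the post.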

Having a list of beans, a table of the Vaadin framework can be populated by the following piece of code.


  MTable<Hare> table = new MTable<>(Hare.class)
      .withCaption("Hares older than 3 years")
      .withProperties("name", "color", "age");
  table.setRows(haresList);
  setContent(table);

A First Test Run

The code shown so far is all the business logic code needed to get a live view of the database content in the browser. Asking the database for the contents we happen to find 4 hares.

  mysql> SELECT * FROM hare;
  +-----+-----------+-------+-----+
  | id  | name      | color | age |
  +-----+-----------+-------+-----+
  |   1 | Hansel    | Gray  |   3 |
  |   2 | Henrietta | White |   2 |
  |   3 | Henry     | Black |   9 |
  | 700 | Harry     | Gray  | 400 |
  +-----+-----------+-------+-----+
  4 rows in set (0,00 sec)

Our business logic described above filters out the hares with age strictly greater than 3, yielding two lines of the table presented in the browser.

Further Improvements

A promising feature for a future blog post about Vaadin and Speedment is the lazy loading of MTables. It seems quite straightforward, and probably interacts nicely with Speedment creating streams of data instead of a fully instantiated list.

The upcoming Vaadin 8 API promises to take Java 8 support to a higher level. See for example this blog post. I look forward to exploring how this may enable even more elegant Speedment Vaadin interaction.

Final Remarks: The Take Home Message

Are you a web application developer leveraging data from a relational database? This blog post describes how Speedment and Vaadin provide the ideal framework to create a modern, type safe, easy to maintain and elegant Java 8 streams based application without any query language or cumbersome boilerplate for the UI.

I thoroughly enjoyed being there to see this application appear on my screen. It gave me some insight into the Vaadin framework and I took with me new knowledge of how to give a quick visual presentation of the power of the Speedment framework. For a back end tool provider, demonstrating end user value is often an indirect exercise. To show a relevant example of how to use our tool, we need to add some front end presentation and Vaadin is a great match for doing so in the future.

The two frameworks seamlessly interacting to create a solution to a toy representation of a real use case, in this case a web application using a relational database, is a sign of relevancy and maturity of both frameworks.

What you have seen here is a starting point for more to come, a proof of concept if you will. Since the two frameworks complement each other so well to create a self-contained complete example, we will build on this to create more elaborate Speedment web applications using Vaadin. For a developer new to the field, it is very easy to see that the frameworks solve and abstract away the domain specifics and allow the developer of the business logic to create her application using modern standard Java code with minimal concern for boilerplate or framework specific constructs.

As described in more detail in for example this blog post, Speedment allows a developer to access the relational database in a declarative way without explicitly creating SQL queries, which allows for very elegant code with low maintenance cost. The proof of concept described here shows how nicely Vaadin provides a web front end that works with Speedment out of the box.

Appendix: How to Run the Application

The code shown above is all the Java code needed for the application to run. To get the frameworks configured and running, clearly some dependencies are needed in the pom file and the Speedment framework needs proper database credentials. The following steps will get it running by using this github repository as a starting point.
  1. Create the database table.
    
    CREATE TABLE `hare` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `name` varchar(45) NOT NULL,
      `color` varchar(45) NOT NULL,
      `age` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    );
    
  2. Populate the database with some data to make the application result interesting.
    
    INSERT INTO hare (name, color, age) VALUES ('Hansel', 'Gray', 3);
    INSERT INTO hare (name, color, age) VALUES ('Henrietta', 'White', 2);
    INSERT INTO hare (name, color, age) VALUES ('Henry', 'Black', 9);
    INSERT INTO hare (name, color, age) VALUES ('Harry', 'Gray', 400);
    
  3. Clone the github skeleton containing a pom.xml file and the java code from this blog post.
    git clone https://github.com/lawesson/speedment-vaadin-demo.git
  4. Change directory to the newly created git repo.
    cd speedment-vaadin-demo
  5. Run the Speedment code generation tool.
    mvn speedment:tool
  6. Enter database credentials in the UI, fill in the schema hares and press the Generate button without changing any defaults to create Speedment code.
  7. Run the application. Substitute pwd and user with proper database credentials.
    mvn -Djdbc.password=pwd -Djdbc.username=user compile exec:java
  8. With the application running, point a browser to the Vaadin rendered UI located at http://localhost:8080.
  9. Optionally, add some more hares in the table and reload the browser page to see the data filtered by your application.

Edit: Since this post was originally published, Matti Tahvonen of Vaadin has provided valuable feedback and a pull request to the git repo bringing the example code up to speed with the latest advances of Vaadin technology.

Sep 29, 2016

Java 8: Streams in Hibernate and Beyond

In version 5.2, Hibernate moved to Java 8 as its baseline. Keeping up with the new functional paradigm of Java 8, with lambdas and streams, Hibernate 5.2 also supports handling a query result set as a stream. Admittedly a small addition to the API, streams add significant value by allowing the Hibernate user to leverage stream parallelism and functional programming without creating any custom adaptors.

This post will elaborate on this superficially small but fundamentally important streams feature of Hibernate 5.2, and then discuss how the Java 8 stream ORM Speedment takes the functional paradigm further by removing the language barrier, thus enabling a clean declarative design.

The following text will assume general knowledge of relational databases and the concept of ORM in particular. Without a basic knowledge of Java 8 streams and lambdas the presentation will probably seem overly abstract since basic features will be mentioned without further elaboration.

Imperative Processing of a Query Result

To contrast different approaches to handling the data from a relational database via Hibernate, we consider an example that, despite its simplicity, illustrates the big picture: an HQL query produces a set of Java objects that are further processed in the JVM. We start out fully imperative and gradually move towards a declarative design.

The table we use is a table of Hares, where a Hare has a name and an id.


CREATE TABLE `hare` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`id`)
);

To avoid discussing the query language per se, we use an example of a simplistic HQL query that creates a result set containing all the contents of a table of the database. The naïve approach to finding the item we are looking for would be to iterate over the data of the table as follows.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
for (Hare hare : hares) {
  if (hare.getId() == 1) {
    System.out.println(hare.getName());
  }
}

Note how the design of the query result handling is fully imperative. The implementation clearly states step-by-step instructions for how to iterate over the elements and what to do with each element. At the end of the day, when it is time to run the program, all programs are in a sense imperative, since the processor will need a very explicit sequence of instructions to execute. The imperative approach to programming may therefore seem the most intuitive.

Declaring the Goal, Receiving the Path

In contrast to the imperative design, the declarative approach focuses on what is to be done rather than on how to do it. This not only tends to produce more concise and elegant programs, but also introduces a fundamental advantage: it allows the computer to figure out the transition from what to how. Sometimes without even thinking about it, many programmers are used to this approach in the realm of relational databases, since the query language SQL is one of the most popular instances of declarative programming. Relieved of the details of exactly how the database engine will retrieve the data, the designer can focus on what data to get, and then, of course, on what to do with it after it is retrieved.
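To see the what-versus-how contrast in isolation, here is a minimal, self-contained Java sketch (the class name and sample data are hypothetical, and no database is involved) that finds a matching element first imperatively and then declaratively:

```java
import java.util.List;

public class DeclarativeContrast {
    public static void main(String[] args) {
        List<String> names = List.of("Hansel", "Henrietta", "Henry");

        // Imperative: spell out how to iterate and how to test each element.
        String found = null;
        for (String name : names) {
            if (name.startsWith("Henri")) {
                found = name;
                break;
            }
        }
        System.out.println(found);

        // Declarative: state what we want; the stream decides how.
        names.stream()
             .filter(n -> n.startsWith("Henri"))
             .findFirst()
             .ifPresent(System.out::println);
    }
}
```

Both variants print the same name, but only the first one commits to a particular traversal strategy in source code.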

Java 8 streams and lambdas allow for a declarative approach to handling collections of data. Instead of listing a sequence of instructions to be carried out, the user of a stream first creates a pipeline of abstract operations to be carried out and when presented with a terminated pipeline, the stream implementation will figure out the imperative details.

Even before Hibernate 5.2, our running example could be ported to the Java 8 domain of streams by just adding a simple method call in the chain of operations since the List itself has a stream method.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
hares.stream()
  .filter(h -> h.getId() == 1)
  .forEach(h -> System.out.println(h.getName()));

While this example may seem similar to the imperative iteration in the previous design, the fundamental difference is that this program will first create a representation of the operations to be carried out and then lazily evaluate it. Thus, nothing actually happens to the items of the List until the full pipeline is created. We express what we want in terms of a functional composition of basic operations, but do not lock down any decisions about how to execute the resulting function.
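This laziness can be demonstrated in plain Java, independently of Hibernate. The sketch below (class and variable names are hypothetical) uses peek to count how many elements have actually been visited, before and after the terminal operation runs:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyPipeline {
    public static void main(String[] args) {
        AtomicInteger touched = new AtomicInteger();

        // Building the pipeline runs nothing: peek and filter are only registered.
        Stream<Integer> pipeline = List.of(1, 2, 3, 4).stream()
                .peek(i -> touched.incrementAndGet())
                .filter(i -> i % 2 == 0);

        System.out.println("elements touched so far: " + touched.get());

        // The terminal operation triggers the actual traversal.
        long evens = pipeline.count();
        System.out.println("evens: " + evens + ", touched: " + touched.get());
    }
}
```

Until count() is invoked, no element has passed through the pipeline at all.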

Since a major feature of functional programming is the compositional design, a more typical streams approach would be to chain stepwise operations on the data. To extract the name of the item, we may map the getter on the stream as follows.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
hares.stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);

This pattern is suboptimal in several respects. The obvious problem mentioned above of not using the database to filter the data in the first place is just one of them. Another problem is that this pattern forces the entire table to be loaded into JVM memory before the iteration can start. Notice how a List of Hare is populated in the first line of each code snippet. In order to start filtering that stream, we first instantiate the entire source of the stream. This means that in terms of memory footprint, the JVM memory will have to accommodate all Hares in the database - completely defeating the purpose of analyzing a set of data piece by piece in a stream.
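For contrast, a stream whose source produces elements on demand never needs to materialize the whole dataset. This minimal sketch (plain Java, not Hibernate code, with hypothetical names) draws from an unbounded source, yet only produces as many elements as the pipeline actually consumes:

```java
import java.util.stream.Stream;

public class LazySource {
    public static void main(String[] args) {
        // A lazy source: "rows" are produced one at a time on demand,
        // never collected into an intermediate List first.
        String firstMatch = Stream.iterate(1, i -> i + 1)  // unbounded "table"
                .filter(i -> i == 1_000)
                .map(i -> "row-" + i)
                .findFirst()                               // short-circuits
                .orElseThrow();

        // Only 1000 elements were ever produced, despite the infinite source.
        System.out.println(firstMatch);
    }
}
```

The memory footprint stays constant regardless of how large the source is, which is exactly the property the List-backed stream above gives up.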

Streaming a Result Set

As shown in the section above, naïve streams of query results in Hibernate before version 5.2 came with a steep price. Even though the underlying JDBC database driver handles the results of a query as a result set over which the user may iterate piece by piece, streams were not supported by the query result instance, forcing the functionally inclined developer to implement a stream source from the result set or go via an intermediate collection to get a stream without custom stream implementation.

With Hibernate 5.2, the query result can produce a stream, allowing the following minimal change in code which has the important advantage of not loading the entire table into an intermediate representation from which to source the stream.


session.createQuery("SELECT h FROM Hare h", Hare.class).stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);

With this improvement, Java 8 streams are efficiently supported out of the box by Hibernate, without the need to create a custom stream source from the result set.

Selection by the Source

As mentioned above, the code snippets so far illustrate the general case of an HQL query generating a result which the JVM will use for some further processing. The details of the example reveal an almost offensive lack of interest in what the database can do for the user in terms of filtering the data, and that shortcoming will be addressed in the following.

The optimization desperately needed for this code is of course to adjust the query to allow the database to create a result set closer to the desired result of the operation. Focusing on just filtering the rows of the database and leaving the extraction of the columns to the JVM, the now familiar code snippet can be updated to the following.


session.createQuery("SELECT h FROM Hare h WHERE id = 1", Hare.class).stream()
  .map(Hare::getName)
  .forEach(System.out::println);

Note that this short piece of a program contains two declarative parts that require separate design with different kinds of considerations. Since the program is divided between what happens before and after the stream is created, any optimization will have to consider what happens on both sides of that barrier.

While this is indeed considerably more elegant than the first example (which, admittedly, for pedagogical reasons was designed to showcase potential for improvement rather than to represent a real solution to a problem), the barrier poses a fundamental problem in terms of declarative design. It can rightfully be claimed that the program is still an imperative program composed of two declarative subroutines: first execute the query, then execute the Java part of the program. We may choose to refer to this as the language barrier, since the interface between the two declarative languages creates a barrier over which functional abstraction will not take place.

Enter Speedment - Going Fully Declarative

As discussed above, the advantages of Java 8 streams extend far beyond a more elegant syntax. The appeal of the functional approach to data processing also stems from, among other things,
  • the seamless generalization to parallelism (expressing a design as a pipeline of operations is a great starting point for building a set of parallel pipes),
  • design by composition (reuse and modularization of code is encouraged by a paradigm of composing solutions as a composition of smaller operations),
  • higher order functions (behavior expressed as lambdas can be used as language entities such as parameters to methods) and
  • declarative programming (the application designer focuses on what is needed, the framework or stream primitives design determines the details about how, allowing lazy evaluation and shortcuts).
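As a sketch of the first of these points, the same in-memory pipeline can be run sequentially or across cores with a single added call (plain Java with hypothetical names, unrelated to any database):

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // Sequential pipeline summing 1..1_000_000.
        long seq = LongStream.rangeClosed(1, 1_000_000).sum();

        // The same pipeline run in parallel: one added call, same result,
        // because the pipeline describes what to compute, not how.
        long par = LongStream.rangeClosed(1, 1_000_000).parallel().sum();

        System.out.println(seq == par);
        System.out.println(seq);  // 1_000_000 * 1_000_001 / 2
    }
}
```

Because the pipeline is a description of operations rather than a fixed iteration order, the runtime is free to split the work across threads without any change to the business logic.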

We have shown how the new Hibernate API of version 5.2 adds basic support for streams, which allows for a declarative approach to describing the operations applied to the dataset retrieved from the database. While this is a fundamental insight and improvement, the Hibernate design with a foundation in an explicit query language limits the reach of the declarative features of the resulting programs due to the language barrier constituted by the interface between two languages.

The logical next step along the path from iterative to declarative design would be to break the language barrier and that is what the Java stream ORM Speedment does.

In the Speedment framework, the resulting SQL query is the responsibility of the framework. Thus, a program leveraging Speedment does not use any explicit query language. Instead, all the data operations are expressed as a pipeline of operations on a stream of data and the framework will create the SQL query. Returning to our example, a Speedment based design could be expressed as follows.


hares.stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);

The hares manager is the source of the stream of Hares. No SQL will be run or even created until the pipeline of operations is terminated. In the general case, the Speedment framework cannot optimize a SQL query followed by lambda filters since the lambda may contain any functionality. Therefore, the executed SQL query for this example will be a query for all data in the Hares table since the behavior of the first filter cannot be analysed by the framework. To allow the framework to optimize the pipeline, there is a need for a data structure representing the operations in terms of basic known building blocks instead of general lambda operations. This is supported by the framework and is expressed in a program as follows.


hares.stream()
  .filter(Hare.ID.equal(1))
  .map(Hare.NAME.getter())
  .forEach(System.out::println);

The pipeline of operations is now a clean data structure that declaratively describes the operations without any runnable code, in contrast to a filter with a lambda. Thus, the SQL query that will be run is no longer a selection of all items of the table, but instead a query of the type "SELECT * FROM hare WHERE id = 1". By removing the language barrier, a fully declarative design is achieved. The program states "find me the names of the hares in the database with id 1", and it is up to the Speedment framework and the database engine to cooperate in figuring out how to turn that program into a set of instructions to execute.

This discussion uses a very simplistic example to illustrate a general point. Please see the Speedment API Quick Start for more elaborate examples of what the framework can do.

Edit: This text is also published at DZone: Streams in Hibernate and Beyond.