Nov 25, 2016

Speedment 3.0 - Higher Order Functions

Generally accessible in Java 8, higher order functions open up a whole new level of abstractions allowing us to reason about code and algorithms in a new way. When our code can operate on functions as input and output we are given the power to easily express algorithms that modify behavior just like we have always done for data.

The power to create algorithms that modify behavior makes it possible to reason about functionality. Such a higher order algorithm can therefore extend or optimize the characteristic of a function, before the function receives any input.

Speedment is an Open Source ORM with an API founded in Java 8 streams. This post describes some new features of the API of recently released Speedment version 3.0 which add more features to support higher order functions. A reader already familiar with higher order functions and declarative programming may benefit from skipping ahead to the last section.

Functions Operating on Functions

To many developers, there is a fundamental difference between functions and data. A common view of a traditional program is a clear separation of the two; the program is a static structure of procedures and functions determined at compile time which operates on data, part of which typically is supplied at runtime.

For example, it is natural to have a program that prints a sorted list of names that are given as input. When compiled, the sorting function is static but can dynamically be used on any data that happens to be a list of names. If our program was to be generalized to take the actual sorting algorithm as input, we have created a higher order function since our program accepts a function as input.

Seen in such a way, functional aspects of Java are not new. Supplying a function as parameter is commonplace in Java, perhaps most notably as event driven callbacks for asynchronous interfaces. The callback idiom is a very simple type of operation on functions since the function itself is never altered - it is just kept as a reference until the time comes to call it.


Runnable callback = new Runnable() {
  @Override
  public void run() {
    System.out.println("ping");
  }
};

// Some other time, in a context far, far away...
callback.run();

The concept of higher order functions entails not only that functions can be referred to and invoked within the framework of the language itself, but also that functions are treated as other data on which other functions can operate.

The Bliss of Meta Functions

A better example of the power of functional programming in Java 8 would showcase modification of an existing behavior. Even though not obvious from syntax, the Java 8 concept of streams does exactly that. Consider the following code.

IntStream.iterate(0, i -> i + 1)
  .map(n -> n*2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

A reader used to for example unix pipes would perhaps interpret this algorithm as a source of data creating an unbounded sequence of integers on line 1, doubling each number on line 2, removing the numbers evenly divisible by 10 on line 3 and then on line 4 throwing away all numbers except for the first 10 before printing them when reaching line 5. Actually quite far from what will happen when the code is executed, this is in a sense a very useful intuition.

Interpreting this example as a stream of data being manipulated from top to bottom by each operation yields a correct understanding of the resulting output while creating minimal cognitive encumbrance on the reader. It makes it possible for a reader and even the designer of the code to correctly understand the meaning of the code in terms of output without really thinking about how the expression actually will be evaluated.

What actually does happen in lines 1 through 4 is that a function is created and modified. No integers are created, let alone filtered or mapped, until we reach line 5. That last line is a terminating operation of the stream entailing that it operates on all aspects of the stream, including both the source of data and the operations to finally return the result. Having access to the information about the entire operation and the source, the terminator of the stream can deliver the result without performing unnecessary operations on items what will be discarded by the limit operation.

This lazy evaluation is possible since the program describes the requested operations rather than actually executing them. What at first may seem like a simple sequence of function invocations on a sequence of integers is actually a creation of a function that is passed to another function.

By using higher order functions we decouple the declarative description of the function from the imperative interpretation. The program simply describes what is computed while leveraging previous work invested in the design of the used stream components which determine how the computation will take place.

Functions operating on other functions allows blissful ignorance of (and therefore also decoupling from) execution details.

Parallelism as a Simple Add-on

Decoupling of algorithm from executional details is also the reason why parallelism can be so elegantly implemented for Java 8 streams. The following code will utilize available processor cores to parallelize the work of the stream.

IntStream.iterate(0, i -> i + 1)
  .parallel()
  .map(n -> n*2)
  .filter(n -> n % 10 != 0)
  .limit(10)
  .forEachOrdered(System.out::println);

This small alteration of the stream will yield fundamentally different execution characteristics. In this case, the parallel version is likely to be more wasteful than the original version. The typical implementation of a parallel stream, the Spliterator, will partition the stream of integers in chunks processed concurrently. Since several threads are helping out multiplying integers by two and filtering out numbers that are multiples of ten, the limit operation which requires synchronization between the parallel workers will be fed with more numbers than it will allow to pass through. The first non-parallel example would pull items from the source when needed, stopping just when the limit is reached.

Speedment Taking Higher Order Functions to the Extreme

Similarly to the declarative example above where the composer of the stream describes what to calculate rather than how to do it, the typical user of a relational database is used to query the database by means of the declarative language SQL. Compared to the number of software engineers that have ever designed software that uses data from a relational database, very few are actually concerned with the details of how the database engine cleverly executes the query.

An ORM provides a mapping between the data model of a relational database and the object centric view of an object oriented language such as Java. The Java 8 stream ORM Speedment provides such a mapping with a fully declarative Java 8 stream API.

As seen above, the conceptual building blocks for declarative programming of applications leveraging relational database data are available even without Speedment;

  1. the database is queried using the declarative SQL and
  2. the Java 8 streams features provide a similarly declarative approach to design of the operations on the data once retrieved to the JVM.

However, without Speedment the two declarative languages have to be mixed in an application. In a previous post on this blog we have seen that the designer has to suboptimize two separate programs; the declarative description of the data to retrieve from the database and then in a separate language the operations to perform on that data in the JVM.

The language barrier between the database engine and the JVM forces the designer to design, optimize and then maintain the structure of the data as transferred from the database since it constitutes output from the first part of the solution and input to the next. Taking the declarative approach from its starting point in two declarative building blocks all the way to its logical conclusion of a unified declarative construct, Speedment abstracts away the SQL query and allows the user of the framework to create a design based solely on streams.

Consider the fully declarative Speedment way of counting the users belonging to a particular department in the following code snippet.


long count = users.stream()
  .filter(User.DEPARTMENT.equal(1))
  .count();

The source of the stream is a manager instance called users. As described in length above, viewing a stream as a sequence of objects flowing from the source and modified on each following line will give the correct understanding of the result of the operation while not necessarily any insight to the actual execution of the program. The same applies here. Retrieving all the users from the database in line 1, then in line 2 filtering out the users belonging to department number 1 and finally counting the remaining users would yield the desired result of the operation.

This understanding of the algorithm has a strong appeal in its abstract simplicity and allows the Speedment framework to decide on all the details of the execution. Not only does Speedment optimize the operations on the data in the JVM, but since also the relational database operations are covered by the abstraction the SQL query will be created by Speedment taking the full pipeline of operations into account.

By means of a sequence of reductional operations on the pipeline Speedment will collapse the operations to a single SQL statement that relieves the JVM from any operations on user instances. The executed query will actually be as follows.


SELECT COUNT(*) FROM user WHERE (department = 1)

As showed above in the example of the parallel streams, in a declarative setting where the framework makes decisions on execution strategy it is easy to reuse a declarative program in a new setting with completely different executional properties. In its enterprise version, the Speedment framework may also chose to execute queries using data from an in-JVM memory data store, improving database latency by orders of magnitude without changing a single line of the declarative program describing the business logic.

Speedment 3.0 Structural Support for Higher Order Functions

Recently released Speedment 3.0 adds improved API support for higher order functions. While the runtime framework SQL optimization of those constructs is not yet finished, it allows design of more complex and elegant declarative programs that are prepared to be executing more efficiently in later versions of Speedment. The following code can be used to set the “category” property to 3 for all users born in a particular range of dates.

users.stream()
  .filter(User.BORN.between(1985, 1995)) // Filters out Users born 1985 up to and including 1994
  .map(User.CATEGORY.setTo(3))           // Applies a function that sets their category to 3
  .forEach(users.updater());             // Applies the updater function to the selected users
The updater construct is new in Speedment 3.0 and lends itself to executional optimization in the Speetment framework since if the preceding pipeline has suitable properties the whole update operation pipeline may be collapsed to a single SQL UPDATE statement. While version 3.0 of the framework does not perform all the possible optimizations to collapse the pipeline to a single SQL statement, the program will yield a correct result by a sequence of UPDATE operations. Another example of the extended support for higher order functions in Speedment 3.0 are the categorizers. Functions designed to work with the standard Java stream operations, the categorizers allow constructs such as the following which creates a mapping from users to their homepages assuming a homepage has a foreign key pointing to its owner.

Map<User, List<HomePage>> join = homepages.stream()
  .collect(
    groupingBy(users.finderBy(HomePage.OWNER), toList())
  );
As we have seen in two examples above, the beauty of the declarative programming paradigm is that executional properties are abstracted away. When the Speedment framework in a later version implements all the needed optimizations, existing Speedment leveraging applications will start emitting smarter SQL queries to the relational database as soon as they use the newer framework version. A dream come true from an application maintenance perspective, the declarative program describes a solution and therefore remains the same since the problem it solves is the same.

Oct 31, 2016

When the Hare Met the Reindeer at JavaOne

The Java Duke, Vaadin Reindeer and Speedment Hare - a picture by Elis Minborg, inspired by a John Bauer

Coming from a software development background mainly rooted in Java I have spent roughly two years away from the Java world, exploring the wonderful aspects of embedded system design in the automotive industry. Taking on new challenges yet again related to Java is a cognitive homecoming of sorts for me while the spatial aspects of the new job initially entail transatlantic commuting. My first California visit on the new job at Speedment Inc luckily coincided with the JavaOne 2016 conference in San Francisco.

The story that you are about to read is about one particular encounter at this for me very inspiring event. You will learn about Hares and Reindeer, Nordic cooperation at its finest and a perfect match of frameworks creating an inspiringly elegant hello world web application rendering data from a relational database in a browser.

TL;DR: Jump directly to the take home message or the code in case you do not want to indulge in the full narrative.

JavaOne 2016 Keeping the Promises of Conferences Past

At the conference JFocus in 2013, there was a consensus that Java felt dated as compared to more modern languages such as Scala and Groovy and suffered from somewhat deserved lack of enthusiastic backing. The upcoming Java 8 promised great progress, however, and it was repeatedly stressed that the future for Java looked brighter than the recent past since the new concepts of streams and lambdas would modernize the language.

Having watched the progress from a certain distance since the automotive industry has a way of absorbing one's attention, it was a delight to be back in 2016 to what seemed quite similar to the bright future pictured a few years ago. The future progress promised in 2013 had not only materialized but also been widely embraced. The functional paradigm of Java 8 lambdas and streams had not been received as just a facelift but as the fundamental gamechanger it actually is.

Attending a big conference when just having started a new job is a great way of getting a feel for the general direction of the industry and the way the product of the company is received. An inspiring and greatly informative quick start, participating at such a major congregation immediately tells what parts of our product actually peak others interest and what kind of questions arise during follow-up discussions.

I learned from several enthusiastic exchanges that Speedment bringing Java 8 style functional programming in an innovative and uniquely consistent way to the realm of relational databases was the key factor that caught others attention.

An Inspiring Meeting - Vaadin and Speedment

The most inspiring interaction was probably with UI framework designers Vaadin. Creating a framework that can be described as AWT/Swing for the web done right, Vaadin brings elegant programming of web interfaces to Java. Delivering relational data to Java in a way very similar to what Vaadin expects as input, Speedment seems at first glance to be the perfect match with Vaadin and together the two frameworks could in theory bring relational data to the web allowing minimal boilerplate business logic in between.

Capturing the spirit of the meeting the graphic illustration of this blog post is inspired by painter John Bauer and pictures a reindeer and a hare working together. With its roots in Finland, Vaadin has a Reindeer mascot. Speedment has Swedish heritage and mascot Hare.

Nordic people may not be known for overly expressive outbursts of enthusiasm, but I believe that I displayed my excitement with an average of more than one positive word per sentence and an occasional smirk that may have resembled a smile. Thus expressing profound enthusiasm (bordering the socially acceptable in my home country) about the promising outlook, we decided to try it out. What about trying to get Vaadin and Speedment work together there and then at the exhibition hall of the conference?

It has happened to me many times that what at first seems like a great idea in theory somehow does not handle the meeting with reality very well. General discussions in the abstract world of concepts quite often reveal overlapping or matching ideas that implicate great synergies, only to let one down when further explored. This is quite natural for at least two reasons;

  1. We are able to reason about fundamentally complex systems much due to our ability to perform agile transitions between different levels of abstraction. Roaming the realm of higher abstractions, we allow ourselves the luxury of turning a blind eye to details, enabling us to find patterns on a larger scale. When descending the stairs of abstraction to lower levels we often find ourselves knee deep in a swap of complicating details.
  2. Additionally, when talking about newly found synergies, one only has deep knowledge about a limited part of the problem since otherwise the topic at hand would not be new in the first place. Compared to thinking within one’s knowledge comfort zone, reasoning about something one does not really know typically gives rise to much creativity and rejoicing but increases the risk of creating things that simply do not work.

Well knowing this, setting out to create running code right there on the exhibition floor seemed like a fun exercise yet doomed to run into some obstacle, no matter how clean and perfect the match does initially seem. It would turn out that the expected obstacles never materialized.

Creating a Hello World Relational Data Web Application

Granting myself some philosophical leeway, I submit to the reader that many great things that are considered to be invented are actually rather discovered. Instead of being born as the result of a creative process, they were in some ontological sense present all the time waiting to be found. I would say that the application that emerged in my laptop while bright minds of Vaadin and Speedment contributed with each part of the solution was the result of the two frameworks being cleverly designed, rather than an inspired act of invention in that busy exhibition hall.

After some ten minutes of coding no more than standard instructions of the respective framework, the browser of my laptop rendered data from a table of an SQL database. The data was delivered by Speedment, manipulated in straightforward Java 8 code with minimal boilerplate and then sent to Vaadin for rendering.

One reason that this application worked more or less out of the box is that Vaadin and Speedment agree on using standard Java constructs in their interfaces. Had there been any special constructs used for any of the frameworks, the code would be bloated with adapting boilerplate.

For the toy example we started out with one of our example databases. The table of interest in this example is a table of hares that have name, color and age.


CREATE TABLE `hare` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  `color` varchar(45) NOT NULL,
  `age` int(11) NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO hare (name, color, age) VALUES ("Hansel", "Gray", 3);
INSERT INTO hare (name, color, age) VALUES ("Henrietta", "White", 2);
INSERT INTO hare (name, color, age) VALUES ("Henry", "Black", 9);
INSERT INTO hare (name, color, age) VALUES ("Harry", "Gray", 400);
The business logic of our toy example is minimal, the following declarative code describes filtering out hares that have age above 3 and collecting them in a list. The Speedment framework has an API founded in Java 8 streams, a fact that was the major cause of excitement when describing Speedment to people I met at JavaOne. For further reading on that topic, please see for example this previous blog post.


  final List haresList = hares.stream()
    .filter(Hare.AGE.greaterThan(3))
    .collect(Collectors.toList());

The first line starts out the description of an operation pipeline operating on a stream of hares from the database and the second line adds a filter to the pipeline telling Speedment to find hares of the desired age. The last line terminates the pipeline, defining the result as a list of the filtered hares.

Having a list of beans, a table of the Vaadin framework can be populated by the following piece of code.


  MTable<hareimpl> table = new MTable<>(Hare.class)
      .withCaption("Hares older than 3 years")
      .withProperties("name", "color", "age");
  table.setRows(haresList);
  setContent(table);

A First Test Run

The code shown so far is all the business logic code needed to get a live view of the database content in the browser. Asking the database for the contents we happen to find 4 hares.

  mysql> SELECT * FROM hare;
  +-----+-----------+-------+-----+
  | id  | name      | color | age |
  +-----+-----------+-------+-----+
  |   1 | Hansel    | Gray  |   3 |
  |   2 | Henrietta | White |   2 |
  |   3 | Henry     | Black |   9 |
  | 700 | Harry     | Gray  | 400 |
  +-----+-----------+-------+-----+
  4 rows in set (0,00 sec)

Our business logic described above filters out the hares with age strictly greater than 3, yielding two lines of the table presented in the browser.

Further Improvements

A promising feature for a future blog post about Vaadin and Speedment is the lazy loading of MTables. It seems quite straight-forward, and probably interacts nicely with Speedment creating streams of data instead of a fully instantiated list.

The upcoming Vaadin 8 API promises to take Java 8 support to a higher level. See for example this blog post. I look forward to exploring how this may enable even more elegant Speedment Vaadin interaction.

Final Remarks: The Take Home Message

Are you a web application developer leveraging data from a relational database? This blog post describes how Speedment and Vaadin provide the ideal framework to create a modern, type safe, easy to maintain and elegant Java 8 streams based application without any query language or cumbersome boilerplate for the UI.

I thoroughly enjoyed being there to see this application appear on my screen. It gave me some insight into the Vaadin framework and I took with me new knowledge of how to give a quick visual presentation of the power of the Speedment framework. For a back end tool provider, demonstrating end user value is often an indirect exercise. To show a relevant example of how to use our tool, we need to add some front end presentation and Vaadin is a great match for doing so in the future.

The two frameworks seamlessly interacting to create a solution to a toy representation of a real use case, in this case a web application using a relational database, is a sign of relevancy and maturity of both frameworks.

What you have seen here is a starting point of more to come, a proof of concept if you will. Since the two frameworks complement each other so well to create a self contained complete example, we will build on this to create more elaborate Speedment web applications using Vaadin . For a developer new to the field, it is very easy to see that the frameworks solve and abstract away the domain specifics and allows the developer of the business logic to create her application using modern standard Java code with minimal concern for boilerplate or framework specific constructs.

As described in more detail in for example this blog post, Speedment allows a developer to access the relational database in a declarative way without explicitly creating SQL queries, which allows for very elegant code with low maintenance cost. The proof of concept described here shows how nicely Vaadin provides a web front end that works with Speedment out of the box.

Appendix: How to Run the Application

The code shown above is all the Java code needed for the application to run. To get the frameworks configured and running, clearly some dependencies are needed in the pom file and the Speedment framework needs proper database credentials. The following steps will get it running by using this github repository as a starting point.
  1. Create the database table.
    
    CREATE TABLE `hare` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `name` varchar(45) NOT NULL,
      `color` varchar(45) NOT NULL,
      `age` int(11) NOT NULL,
      PRIMARY KEY (`id`)
    );
    
  2. Populate the database with some data to make the application result interesting.
    
    INSERT INTO hare (name, color, age) VALUES ("Hansel", "Gray", 3);
    INSERT INTO hare (name, color, age) VALUES ("Henrietta", "White", 2);
    INSERT INTO hare (name, color, age) VALUES ("Henry", "Black", 9);
    INSERT INTO hare (name, color, age) VALUES ("Harry", "Gray", 400);
    
  3. Clone the github skeleton containing a pom.xml file and the java code from this blog post.
    git clone https://github.com/lawesson/speedment-vaadin-demo.git
  4. Change directory to the newly created git repo.
    cd speedment-vaadin-demo
  5. Run the Speedment code generation tool.
    mvn speedment:tool
  6. Enter database credentials in the UI, fill in the schema hares and press the Generate button without changing any defaults to create Speedment code.
  7. Run the application. Substitute pwd and user with proper database credentials.
    mvn -Djdbc.password=pwd -Djdbc.username=user compile exec:java
  8. With the application running, point a browser to the Vaadin rendered UI located at http://localhost:8080.
  9. Optionally, add some more hares in the table and reload the browser page to see the data filtered by your application.

Edit: Since this post was originally published, Matti Tahvonen of Vaadin has provided valuable feedback and a pull request to the git repo bringing the example code up to speed with the latest advances of Vaadin technology.

Sep 29, 2016

Java 8: Streams in Hibernate and Beyond

In version 5.2 Hibernate has moved to Java 8 as base line. Keeping up with the new functional paradigm of Java 8 with lambdas and streams, Hibernate 5.2 also supports handling a query result set as a stream. Admittedly a small addition to the API, streams add significant value by allowing the Hibernate user to leverage streams parallelism and functional programming without creating any custom adaptors.

This post will elaborate on the added superficially small but fundamentally important streams feature of Hibernate 5.2 and then discuss how the Java 8 stream ORM Speedment takes the functional paradigm further by removing the language barrier and thus enabling a clean declarative design.

The following text will assume general knowledge of relational databases and the concept of ORM in particular. Without a basic knowledge of Java 8 streams and lambdas the presentation will probably seem overly abstract since basic features will be mentioned without further elaboration.

Imperative Processing of a Query Result

To contrast different approaches to handling the data from a relational database via Hibernate, we consider an example that despite its simplicity illustrates the big picture - an HQL query produces a set of Java objects that are further processed in the JVM. We start out fully imperative and gradually move towards a declarative design.

The table we use is a table of Hares, where a Hare has a name and an id.


CREATE TABLE `hare` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`id`)
);

To avoid discussing the query language per se, we use an example of a simplistic HQL query that creates a result set containing all the contents of a table of the database. The naïve approach to finding the item we are looking for would be to iterate over the data of the table as follows.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
for (Hare hare : hares) {
  if (hare.getId() == 1) {
    System.out.println(hare.getName());
  }
}

Note how the design of the query result handling is fully imperative. The implementation clearly states a step-by-step instruction of how to iterate over the elements and what to do with each element. By the end of the day, when it is time to run the program, all programs are in a sense imperative since the processor will need a very explicit sequence of instrucitons to execute. The imperative approach to programming may therefore seem the most intuitive.

Declaring the Goal, Receiving the Path

In contrast to the imperative design, the declarative approach focuses on what to be done, rather than on how to do it. This does not just tend to create more concise and elegant programs, but introduces a fundamental advantage as it allows the computer to figure out the transition from what to how. Sometimes without even thinking about it, many programmers are used to this approach in the realm of relational databases since the query language SQL is one of the most popular instances of declarative programming. Relieved of the details of exactly how the database engine will retrieve the data the designer can focus on what data to get, and then of course what to do with it after it is retrieved.

Java 8 streams and lambdas allow for a declarative approach to handling collections of data. Instead of listing a sequence of instructions to be carried out, the user of a stream first creates a pipeline of abstract operations to be carried out and when presented with a terminated pipeline, the stream implementation will figure out the imperative details.

Even before Hibernate 5.2, our running example could be ported to the Java 8 domain of streams by just adding a simple method call in the chain of operations since the List itself has a stream method.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
hares.stream()
  .filter(h -> h.getId() == 1)
  .forEach(h -> System.out.println(h.getName()));

While this example may seem similar to the imperative iteration in the previous design, the fundamental difference is that this program will first create a representation of the operations to be carried out and then lazy evaluate it. Thus, nothing actually happens to the items of the List until the full pipeline is created. We express what we want in terms of a functional composition of basic operations but do not lock down any decisions about how to execute the resulting function.

Since a major feature of functional programming is the compositional design, a more typical streams approach would be to chain stepwise operations on the data. To extract the name of the item, we may map the getter on the stream as follows.


List<Hare> hares = session.createQuery("SELECT h FROM Hare h", Hare.class).getResultList();
hares.stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);
This pattern is suboptimal in many aspects. The obvious problem mentioned above of not using the database to filter the data in the first place is just one of them. Another problem is that this pattern forces the entire table to be loaded in JVM memory before the iteration can start. Notice how a List of Hare is populated in the first line of each code snippet. In order to start filtering that stream, we first instantiate the entire source of the stream. This means that in terms of memory footprint, the JVM memory will have to accommodate all Hares in the database - completely defeating the purpose of analyzing a set of data piece by piece in a stream.

Streaming a Result Set

As shown in the section above, naïve streams of query results in Hibernate before version 5.2 came with a steep price. Even though the underlying JDBC database driver handles the results of a query as a result set over which the user may iterate piece by piece, streams were not supported by the query result instance, forcing the functionally inclined developer to implement a stream source from the result set or go via an intermediate collection to get a stream without custom stream implementation.

With Hibernate 5.2, the query result can produce a stream, allowing the following minimal change in code which has the important advantage of not loading the entire table into an intermediate representation from which to source the stream.


session.createQuery("SELECT h FROM Hare h", Hare.class).stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);
With this improvement, Java 8 streams are efficiently supported out-of-the-box by Hibernate without creating a custom stream source from the result set.

Selection by the Source

As mentioned above, the code snippets so far illustrate the general case of an HQL query generating a result which the JVM will use for some further processing. The details of the example reveal an almost offensive lack of interest in what the database can do for the user locally in terms of filtering data and that shortcoming will be adressed in the following.

The optimization desperately needed for this code is of course to adjust the query to allow the database to create a result set closer to the desired result of the operation. Focusing on just filtering the rows of the database and leaving the extraction of the columns to the JVM, the now familiar code snippet can be updated to the following.


session.createQuery("SELECT h FROM Hare h WHERE id = 1", Hare.class).stream()
  .map(Hare::getName)
  .forEach(System.out::println);

Note that this short piece of a program contains two declarative parts that require separate design with different kinds of considerations. Since the program is divided between what happens before and after the stream is created, any optimization will have to consider what happens on both sides of that barrier.

While this indeed is considerably more elegant than the first example (which admittedly for pedagogical reasons was designed to showcase potential for improvement rather than representing a real solution to a problem), the barrier poses a fundamental problem in terms of declarative design. It can rightfully be claimed that the program still is an imperative program composed by two declarative sub routines - first execute the query and then execute the Java part of the program. We may chose to refer to this as the language barrier, since the interface between the two declarative languages creates a barrier over which functional abstraction will not take place.

Enter Speedment - Going Fully Declarative

As discussed above, the advantages of Java 8 streams extend far beyond a more elegant syntax. The appeal of the functional approach to data processing also stems from among other things,
  • the seamless generalization to parallelism (expressing a design as a pipeline of operations is a great starting point for building a set of parallel pipes),
  • design by composition (reuse and modularization of code is encouraged by a paradigm of composing solutions as a composition of smaller operations),
  • higher order functions (behavior expressed as lambdas can be used as language entities such as parameters to methods) and
  • declarative programming (the application designer focuses on what is needed, the framework or stream primitives design determines the details about how, allowing lazy evaluation and shortcuts).

We have shown how the new Hibernate API of version 5.2 adds basic support for streams, which allows for a declarative approach to describing the operations applied to the dataset retrieved from the database. While this is a fundamental insight and improvement, the Hibernate design with a foundation in an explicit query language limits the reach of the declarative features of the resulting programs due to the language barrier constituted by the interface between two languages.

The logical next step along the path from iterative to declarative design would be to break the language barrier and that is what the Java stream ORM Speedment does.

In the Speedment framework, the resulting SQL query is the responsibility of the framework. Thus, a program leveraging Speedment does not use any explicit query language. Instead, all the data operations are expressed as a pipeline of operations on a stream of data and the framework will create the SQL query. Returning to our example, a Speedment based design could be expressed as follows.


hares.stream()
  .filter(h -> h.getId() == 1)
  .map(Hare::getName)
  .forEach(System.out::println);

The hares manager is the source of the stream of Hares. No SQL will be run or even created until the pipeline of operations is terminated. In the general case, the Speedment framework cannot optimize a SQL query followed by lambda filters since the lambda may contain any functionality. Therefore, the executed SQL query for this example will be a query for all data in the Hares table since the behavior of the first filter cannot be analysed by the framework. To allow the framework to optimize the pipeline, there is a need for a data structure representing the operations in terms of basic known building blocks instead of general lambda operations. This is supported by the framework and is expressed in a program as follows.


hares.stream()
  .filter(Hare.ID.equal(1))
  .map(Hare.NAME.getter())
  .forEach(System.out::println);

The pipeline of operations is now a clean data structure declaratively describing the operations without any runnable code, in contrast to a filter with a lambda. Thus, the SQL query that will be run is no longer a selection of all items of the table, but instead a query of the type "SELECT * FROM hares WHERE ID=1". Thus, by removing the language barrier, a fully declarative design is achieved. The program states "Find me the names of the hares of the database with ID 1" and it is up to the Speedment framework and the database engine to cooperate in figuring out how to turn that program into a set of instructions to execute.

This discussion uses an very simplistic example to illustrate a general point. Please see the Speedment API Quick Start for more elaborate examples of what the framework can do.

Edit: This text is also published at DZone: Streams in Hibernate and Beyond.

Aug 29, 2016

The Internal Cache of Integers

There is an IntegerCache in java.lang.Integer which stores instances for values -128 though 127. This means that Integer.valueOf(17) always will return the very same instance, while Integer.of(200) will not. While this clearly has the advantage of reuse of commonly used Integer values, thus relieving the GC from some work, it also has implications for autoboxing and identity comparisons.

The Cache

An easy way to see the cache in action is to investigate identity for two small integers compared to numbers outside the scope of the cache. The following two statements both hold true.
Integer.valueOf(17) == Integer.valueOf(17)
and
Integer.valueOf(200) != Integer.valueOf(200)

The java.lang.Integer.IntegerCache and Autoboxing

The cache ensures that there is exactly one Integer instance for each of the 256 integer numbers closest to 0. While this may have impact for GC, a more fundamental effect is how this affects autoboxing as can be seen in the following example.


public class IntegerCacheTest {
  private static void test(Integer i, Integer i2) {
    System.out.println(i);
    if (i == i2) System.out.println("  the same");
    if (i != i2) System.out.println("  different");
    if (i.equals(i2)) System.out.println("  equal");
  }

  public static void main(String[] args) {
    test(17, 17);
    test(200, 200);
  }
}
Knowing about the caching of instances for integer 17 but not for 200 the reader may already have expected to get differing results for the two comparisons and indeed, the output is as follows.
17
  the same
  equal
200
  different
  equal
What happens is that 17 and 200 are autoboxed when sent as parameters to a method accepting Integers, but for the lower number the cache will ensure we get the identically same instance while the higher number is outside the size limit of the cache and therefore we get two different but equal instances.

Tweaking the Cache Size

The size of the cache can be tweaked with the -XX:AutoBoxCacheMax= option. Thus, if we run the above program setting the cache size to 1000 by adding -XX:AutoBoxCacheMax=1000 to the VM parameters, we get the following result.
17
  the same
  equal
200
  the same
  equal

The Cache Internals

The cache is used as follows, as seen in the source code of java.lang.Integer.

  public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
      return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
  }
The actual cache can be found in java.lang.Integer and looks as follows.


...
  private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];

    static {
      // high value may be configured by property
      int h = 127;
      String integerCacheHighPropValue =
        sun.misc.VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
      if (integerCacheHighPropValue != null) {
        try {
          int i = parseInt(integerCacheHighPropValue);
          i = Math.max(i, 127);
          // Maximum array size is Integer.MAX_VALUE
          h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
        } catch( NumberFormatException nfe) {
          // If the property cannot be parsed into an int, ignore it.
        }
      }
      high = h;

      cache = new Integer[(high - low) + 1];
      int j = low;
      for(int k = 0; k < cache.length; k++)
        cache[k] = new Integer(j++);

      // range [-128, 127] must be interned (JLS7 5.1.7)
      assert IntegerCache.high >= 127;
    }

    private IntegerCache() {}
  }
...
Note how the cache is pre-filled on startup rather than on demand, meaning that the memory footprint of the cache is constant no matter which Integers are actually used in the VM. Tweaking the cache to a very large size thus comes with a steep prize in terms of memory consumption.

Interestingly enough, the parameter to set the size of the cache only affects the upper limit and never the lower, meaning that there is no simple way to get Integers smaller than -128 to be interned in the cache.

Edit: There is a reddit thread about this blog post with valuable feedback. For instance, it is pointed out that new instances of small Integers can still be created with the new operator, a feature that will be deprecated in Java 9.

Also published at DZone: The Internal Cache of Integers.


Jul 25, 2016

The Way of the Lambda - Elegant and Efficient

In Java 8, you can do
items.removeIf(i -> predicate(i));
where you used to do
for (Iterator it = items.iterator(); it.hasNext();) {  
    if (predicate(it.next())) {
        it.remove();    
    }
}
which is not only considerably more elegant but also a more efficient solution in some cases. I have the graphs to prove it!

Background

Recently having spent a few years away from Java development, it is great to be back and find that Java 8 indeed has delivered on what was promised in 2013. With lambdas and streams it is possible to tackle problems in a functional manner and write code that is both readable and efficient. This over due modernization of the language is by far the most important change in Java 8, but there are also other improvements that deserve mentioning. This post elaborates on the topic of Collection.removeIf() which allows for conditional removal of elements of a collection.

The Way of the Iterator

Removing elements of a collection while iterating over it is somewhat like cutting off the branch on which you are sitting, but by using Iterators that can be done without a runtime ConcurrentModificationException. Consider the following typical simplistic example, where the elements of the collection items for which predicate() holds true are removed.
for (Iterator it = items.iterator(); it.hasNext();) {
    if (predicate(it.next())) {
        it.remove();    
    }
}
If the collection in question is a linked list of size n and m elements are removed, the full iteration with removals is O(n + m) = O(n) since iteration steps and removal from the list are constant time operations and m is limited by n. If, however, the data structure of the collection is one where removal is relatively expensive compared to iteration, the complexity becomes less favorable. For an ArrayList a single removal step is O(n) since all items at higher indices have to be moved. There are still n iteration steps to be made so the full sequence of removals becomes O(m*n) which is O(n^2) assuming worst case with the number of removals being a constant fraction of the array size.

In more pragmatic terms, what happens in the ArrayList case is that for each element that is removed, all elements of higher indices are moved to fill the gap in the backing array. This is clearly suboptimal since that operation is repeated for each removal. Thus, to reach its final destination an element will visit all locations between the starting position and the destination. In case many elements are removed, this may be a considerable amount of work in vain.

The Java 8 way of the lambda

In Java 8, the code snippet from above can be expressed as follows.
items.removeIf(i -> predicate(i));
Clearly more concise, this expression is also more efficient on an ArrayList compared to the Iterator approach. Where the Iterator requires completion of each removal operation before continuing to the next one, removeIf operates on a predicate and can therefore iterate over all items once to determine which elements are to be removed and then shift the remaining elements to the correct new indices all in a second pass. Therefore, removeIf is O(n) since it performs two linear passes through the collection.
Each retained element in the resulting collection is moved just once, while for the Iterator approach it would be moved one step at a time towards its destination. See this gist for the relevant JDK code.

The Difference Visualized

We compare execution times for the two approaches as a function of the array size for a rather extreme case where 999 out of 1000 items are removed from an ArrayList. For each data point 40 runs are made, the 5 fastest and slowest outliers are removed and the sum of the remaining 30 form the execution time value. For details please refer to this gist.

We expect the execution time to be considerably shorter for the Java 8 way of the lambda and as a matter of fact that is what we find. We also note that within the resolution of the graph, the lines are quite noise free due to the many repeats and filtering of outlier results.



For array sizes of 150,000 elements, the removeIf() approach is more than 3000 times faster. Even more importantly, the factor between the two increases as the size of the array increases which is expected since the way of the lambda has a superior asymptotical time complexity.

To better compare the time complexity of the two, we scale up the green line to match the green one. By multiplying the execution times of the faster algorithm by ~1500, the graphs can be compared in terms of shape. As expected, we find linear behavior for the Java 8 removeIf() design, while the old ways show a quadratic time complexity due to the inefficient removal of elements.


When scaling up the measurements we get to see some noise in the green line. The sensitivity to random fluctuations in execution time is greater since the execution times are several orders of magnitude shorter compared to the blue. Both lines are fitted to polynomials of power two and as can be clearly seen, the light green line is very similar to a straight line while the blue line clearly aims for the sky in a super linear fashion.

Edit: There is a reddit thread about this blog post. Good points are being made there, for example the fact that the removeIf implementation is atomic (in the database sense) as it first evaluates the predicate for all items before making any alterations to the list. Thus, if the predicate throws an exception for any of the items in the list, the list will remain unchanged.

Also published at DZone: The Way of the Lambda and removeIf().