Part I – And why would I even need that
If you are like most Java programmers, or indeed most programmers, a lot of your code will read:
for (Money dePluribusUnum : wallet) {
    // Just do it.
    // Lots of it.
    // And make us some money.
}
And that’s cool. Mostly.
It happens to the best of us
Loops are cool, until someone asks you to make the whole thing faster. Or someone hands you a wallet so big that it does not fit into main memory. First, you bring out the profiler. But there is nothing to gain: it just takes that long to do the stuff inside the loop. Then you ask for a faster computer. And more RAM. But the execution time just won't budge and the JVM just can't address more memory. Or you could flee to a different language that is somehow 'better' and rewrite all the stuff.
By this time the 'make it faster for bigger wallets' ticket has turned bright red, your boss has talked to your boss's boss, and it has all become a big kerfuffle.
There is no avoiding it: you need a different idea. Foggily you recall Threads. And Iterators. And soon you have entered the daemonic world of race conditions, deadlocks and livelocks. Your code is riddled with synchronized blocks that produce as much sync, speed and fluency as the Hydramatic on an old Chevy Chevette.
Riddled by back-pain and angst you drag yourself to the coffee-machine, only to meet the new chick. Seems friendly enough, but clearly straight from college and no trench experience. She asks you where the COVID masks are and where she can get the key to the cycling shower. You help and then jokingly say: "Now you help me with my problem." And then you tell your sad tale and how fishing is likely off for the weekend.
"Why don't you use Streams?" says the Noob. "Like IO Streams? How is that going to help?" you retort, clearly puzzled by the absurd idea. "No, I mean functional Streams." She sits down with you and you pair program. You set up her SSH keys and the pesky HR service, and she shows you the only way she knows to write fast, resource-sparing code.
By the time the fluorescents come on in the office, you two are the last people in the shop, the pizza is gone, but your weekend is saved. The code is pretty, linear and hardly needs comments. And the thing scales. The more cores you throw at it, the faster it gets. And the memory spike before the loop that blew out: Gone.
Functional Streams. Who would have thought?
The Stream that ate the World
For a long time, Moore's law held steady. At regular intervals we saw an increase in processing power because circuits got smaller and smaller, and clock speeds got higher and higher. The way we worked with loops, variables and instructions mapped directly to processor opcodes and registers. But in the last ten years physics struck. Structures are getting too small to isolate the electrons, and the only way left to scale is to add more processors to the same chip. Enter: the multi-core machine.
At first, this bothered no one. Each processor roughly got one program assigned. Then an idea came down from the realm of graphics cards, where applying an operation to every pixel in parallel was common: how about we write software so that the infrastructure can make use of all the cores? We apply the operation in parallel and then condense to a single result if we must.
Map-Reduce was born and started powering Google's searches. To follow up, Java was retrofitted with a new control system for Threads and a Fork-Join-Task to roughly model this functionality. The canonical example is fittingly an Image Blur function. Here is a good introduction to Java's fork-and-join in case you like procedural thread-oriented thinking.
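For readers who want to see what that procedural, thread-oriented style looks like, here is a minimal sketch of a fork-and-join computation. It sums an array rather than blurring an image to keep it short. The class name SumTask, the helper sum() and the THRESHOLD value are assumptions made for this illustration; RecursiveTask and ForkJoinPool are the standard JDK classes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a slice of an int array by divide and conquer.
final class SumTask extends RecursiveTask<Long> {
    // Assumed cut-off below which splitting further is not worth the overhead.
    private static final int THRESHOLD = 1_000;

    private final int[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(int[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                      // schedule the left half on the pool
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Convenience entry point that runs the task on the JDK's common pool.
    static long sum(int[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }
}
```

Note how much of the code deals with splitting and joining rather than with the actual addition. That bookkeeping is what the functional approach hides from you.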
Retrofitting Java
We will travel away from 2013 towards the present. Java was originally designed as an object-oriented language. Its aim is to keep state encapsulated in an object and to change that object. This property is called mutability. Mutable objects are no problem for a single process working with the data, but they are very tricky to handle when you are running multiple processes. If properties of an object are related, interleaved access from different processes can leave the object in an unforeseen state. Waiting for the work of one process to reach the next defined state can easily lead to repeated access attempts that never succeed (livelock) or to rings of processes that wait for one another and cannot move (deadlock).
Hence, when working with multiple processes, you prefer objects that are either straight-up atomic or have a record style. Atomic values are values that change completely in one processing cycle. They are things that fit into one computational gulp of the processor: integer numbers, single characters, a row of Boolean flags and so on. Records are data structures that are recorded once and are then permanent. An alternative term for Record in Java is Immutable Class. Unfortunately, Immutable Classes in Java are still classes and must be designed diligently by a developer. However, the demand for Records has grown so much that Java gained a dedicated record construct, previewed in version 14 and finalized in version 16, which has the code for a functionally correct Immutable Class generated automatically.
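To make "designed diligently" concrete, here is a sketch of a hand-written Immutable Class. The Money type with its currency and cents fields is an assumption for illustration, loosely borrowed from the wallet loop at the top of this article.

```java
import java.util.Objects;

// Hypothetical immutable value type, written by hand.
final class Money {                       // final: no subclass can add mutable state
    private final String currency;        // final fields: assigned exactly once
    private final long cents;

    Money(String currency, long cents) {
        this.currency = Objects.requireNonNull(currency);
        this.cents = cents;
    }

    String currency() { return currency; }
    long cents() { return cents; }

    // "Changing" a value returns a new instance; the original stays untouched.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(currency, cents + other.cents);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return cents == m.cents && currency.equals(m.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }
}

// With a record (Java 16 or later), the constructor, accessors, equals and
// hashCode above shrink to one line; plus() would still be written by hand:
//   record Money(String currency, long cents) { }
```

Because no method ever modifies an existing Money, any number of threads can read the same instance without synchronization.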
As a result of this trend that returns to separate modelling of data, its natural counterpart, the Function, has reappeared. In the next installment of this blog, we are going to explore how functions were woven into the existing architecture in Java 8. Spoiler: This has added some unexpected properties to the language and without knowing this prequel some things past version 8 of Java will be just strange.
HelloStreams.java
Now, finally, the scene is set to explain what the functional Streams facility is.
It allows you to:
- Get a stream of immutable data (atoms or records),
- That may even be endless (D,A,T,A) and
- Run it through several transformations (red MAP, blue MAP).
- Which are functions that take the type in the input and push out the type in the output
- Or run multiple stream inputs into a reduction,
- Which is a function that takes a whole stream as input and produces an aggregate as output.
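The points above can be sketched in a few lines. The class and method names here (MapReduceSketch, sumOfSquares, describe) are made up for this illustration; the stream operations themselves are standard java.util.stream calls.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class showing maps and reductions on streams.
final class MapReduceSketch {

    // An endless stream, cut to a finite length, mapped, then reduced.
    static int sumOfSquares(int count) {
        return Stream.iterate(1, n -> n + 1)  // endless: 1, 2, 3, ...
                .limit(count)                 // keep only the first `count` elements
                .map(n -> n * n)              // MAP: Integer in, Integer out
                .reduce(0, Integer::sum);     // REDUCE: whole stream in, one int out
    }

    // Two maps in a row, followed by a reduction into a single String.
    static String describe(List<String> letters) {
        return letters.stream()
                .map(String::toLowerCase)     // first MAP
                .map(s -> "[" + s + "]")      // second MAP
                .collect(Collectors.joining());
    }
}
```

For example, sumOfSquares(4) adds 1 + 4 + 9 + 16, and describe(List.of("D", "A", "T", "A")) turns each letter into a bracketed lower-case one before gluing them together.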
Here is an example of some imperative code to say hello to streams:
int x = 0;
int sum = 0;
final int limit = 3004;
while (true) {
    x += 3;
    if (x > limit) break;
    sum += x;
}
And here is its functional equivalent:
import static java.util.stream.IntStream.iterate;
final int limit = 3004;
int sum = iterate(0, x -> x + 3).takeWhile(x -> x <= limit).sum();
It needs fewer than half the lines of the imperative version, and the ratio only improves when the import is reused. It is also clearer. Note that takeWhile is available from Java 9 onwards. As a last step in this article, I will take it apart, so you can see how it works:
iterate(0, x -> x + 3)
: An endless stream of integers in steps of 3: 0, 3, 6, 9 … (The leading 0 does not change the sum.)
takeWhile(x -> x <= limit)
: We take elements, named x, from this stream as long as x does not exceed limit and pass them on to a new outgoing stream. As soon as that is no longer the case, we cut the stream off.
sum()
: We reduce the stream to a single number by adding up its elements.
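If you want to run both versions side by side, they can be wrapped into one small class. This is a sketch under a few assumptions: the method names imperativeSum and streamSum are invented here, the stream version ends in IntStream's built-in sum() as its reduction, and it needs Java 9 or later for takeWhile.

```java
import static java.util.stream.IntStream.iterate;

// Hypothetical wrapper class for the two snippets discussed in this article.
final class HelloStreams {

    // Imperative version: add up 3, 6, 9, ... while the value does not exceed limit.
    static int imperativeSum(int limit) {
        int x = 0;
        int sum = 0;
        while (true) {
            x += 3;
            if (x > limit) break;
            sum += x;
        }
        return sum;
    }

    // Functional version: same result; the leading 0 does not change the sum.
    static int streamSum(int limit) {
        return iterate(0, x -> x + 3)        // endless: 0, 3, 6, 9, ...
                .takeWhile(x -> x <= limit)  // cut the stream off past limit
                .sum();                      // reduce to a single number
    }

    public static void main(String[] args) {
        final int limit = 3004;
        System.out.println("imperative: " + imperativeSum(limit));
        System.out.println("stream:     " + streamSum(limit));
    }
}
```

Both methods return 1504503 for a limit of 3004, which is three times the sum of 1 to 1001.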
In the next installment: Adapting Java to work with Streams and Functions, so watch this space.