Showing posts with label Functional programming. Show all posts

Monday, 14 November 2016

[Java 8 / Parallel stream / Stream] Should I always use parallel stream instead of stream ?

Streams are probably one of the most commonly used features of Java 8. At first people discover the forEach() method, then map() and filter() and so on. Some of them start reading about functional programming, but from my experience I'd say that in general people still think that a stream is just an improved looping structure.

Then comes this exciting moment when they realize that it can all be much, much faster because there's also parallelStream. And then the problems come...

Very often when I'm waiting for something I look into the code and try to fix some crappy parts. I found this one yesterday:
resource.setRegions(product.getRegions().parallelStream().map(Region::getName).collect(toList()));
Our database has something like ten regions. Let's see how long it takes to collect such items using stream and parallelStream.
public static void main(String[] args) {
    final List<Region> regions = IntStream.range(0, 10)
                        .mapToObj(i -> new Region("region:" + i))
                        .collect(toList());

    useLabel("stream()").andLogPerformanceOf(() -> regions.stream()
                                                         .map(Region::getName)
                                                         .collect(toList()));
    useLabel("parallelStream()").andLogPerformanceOf(() -> regions.parallelStream()
                                                              .map(Region::getName)
                                                              .collect(toList()));
}
useLabel(...).andLogPerformanceOf(...) is just a simple wrapper that runs a piece of code and logs the time taken (I'll paste it at the end of the article). The first run shows:
stream() started
stream() completed. Time elapsed = 1 millis
parallelStream() started
parallelStream() completed. Time elapsed = 9 millis
And some more results:
10 elements (times in millis)
Stream    Parallel stream
4         10
2         3
2         14
2         18
1         6
3         17
2         8
5         7
2         14
1         8
As you can see, stream() is faster than parallelStream() in every run. A parallel stream has much higher overhead than a sequential stream, which uses a single thread. To split a collection's computation, the input has to be divided so that the threads process similar amounts of data, the threads have to be run, the results have to be combined, and so on.
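If you'd like to see a part of that overhead with your own eyes, here's a minimal sketch. ThreadsUsedDemo and threadsUsed are names I made up for this illustration; they're not part of our project. It records which threads process the elements. The sequential stream stays on the calling thread, while the parallel one may hand work over to the workers of the common ForkJoinPool, depending on the machine:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the original project.
class ThreadsUsedDemo {
    // Returns the names of the threads that processed the elements.
    static Set<String> threadsUsed(final boolean parallel, final int size) {
        final List<Integer> input = IntStream.range(0, size)
                                             .boxed()
                                             .collect(Collectors.toList());
        return (parallel ? input.parallelStream() : input.stream())
                .map(i -> Thread.currentThread().getName())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // The sequential stream runs entirely on the calling thread.
        System.out.println("stream():         " + threadsUsed(false, 10));
        // The parallel stream may involve several threads; the exact set varies.
        System.out.println("parallelStream(): " + threadsUsed(true, 10));
    }
}
```

Every extra thread name in the second line stands for work that had to be split off, scheduled and merged back, which is hard to pay off with only ten cheap elements.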

Let's make the input list bigger.
100 elements (times in millis)
Stream    Parallel stream
2         12
2         11
1         5
2         7
2         8
1         6
3         6
2         6
4         9
6         18
Parallel stream is still slower.
1000 elements (times in millis)
Stream    Parallel stream
9         14
2         20
2         7
9         23
3         9
2         20
3         6
2         5
3         9
3         5
Still slower.
10 000 elements (times in millis)
Stream    Parallel stream
8         7
6         23
12        9
5         10
6         9
16        19
7         9
14        14
11        22
20        18
For 10k elements the results are similar.
1 000 000 elements (times in millis)
Stream    Parallel stream
1423      65
1715      91
1244      63
1345      68
1458      91
1479      65
1415      48
1584      87
1425      61
1506      73
With a list that contains 1M elements the parallel stream is way faster, but how often do you work with such big collections?
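A single timed run like the ones above also includes one-off costs such as JIT compilation, so small differences should be read with some care. Here's a rough sketch of how the measurement could be repeated with a warm-up phase. WarmedUpTiming, bestNanos and names are made-up names, and the warm-up and run counts are arbitrary assumptions. For anything serious a dedicated harness such as JMH is probably a better choice:

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not the tool used for the measurements in this post.
class WarmedUpTiming {
    // Runs the task a few times unmeasured (warm-up), then returns the
    // shortest of the measured runs in nanoseconds.
    static long bestNanos(final Supplier<?> task, final int warmups, final int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            final long start = System.nanoTime();
            task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    // Builds a list of fake region names: region:0, region:1, ...
    static List<String> names(final int size) {
        return IntStream.range(0, size)
                        .mapToObj(i -> "region:" + i)
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        final List<String> regions = names(10);
        final long sequential = bestNanos(
                () -> regions.stream().map(String::toUpperCase).collect(Collectors.toList()),
                1_000, 100);
        final long parallel = bestNanos(
                () -> regions.parallelStream().map(String::toUpperCase).collect(Collectors.toList()),
                1_000, 100);
        System.out.println("stream():         " + sequential + " ns");
        System.out.println("parallelStream(): " + parallel + " ns");
    }
}
```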

Let's get back to the main question: Should I always use parallel stream instead of stream ?

Definitely not.

You should consider the parallel version:
  • when you work with huge collections
  • when the computation of a single element takes a long time
I suppose that each case should be considered separately. Performance strongly depends on the operations you perform, so in my opinion trying to define some kind of general conditions for when a parallel stream should be used simply doesn't make sense.
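Here's a rough sketch of the second case. slowSquare is a made-up stand-in that sleeps for 20 ms to simulate an expensive computation (a remote call, for example), and SlowElementDemo is a hypothetical class name. With only a handful of elements the parallel version has a good chance of winning here, because the time is spent waiting and not in the stream machinery, but the actual gain depends on how many threads the common pool has on your machine:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; slowSquare only simulates an expensive computation.
class SlowElementDemo {
    static int slowSquare(final int i) {
        try {
            Thread.sleep(20); // pretend this is a slow call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return i * i;
    }

    // Squares 0..size-1 either sequentially or in parallel.
    static List<Integer> squares(final boolean parallel, final int size) {
        final IntStream input = IntStream.range(0, size);
        return (parallel ? input.parallel() : input)
                .map(SlowElementDemo::slowSquare)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        squares(false, 16);
        System.out.println("sequential: " + (System.nanoTime() - start) / 1_000_000 + " millis");

        start = System.nanoTime();
        squares(true, 16);
        System.out.println("parallel:   " + (System.nanoTime() - start) / 1_000_000 + " millis");
    }
}
```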

You've seen an example that transforms a huge collection. You can find another one, which processes a collection where computing a single element takes a long time, in my post here: How to control pool size while using parallel stream.

You should also remember that if you want to make your code parallel, the data it works on HAS TO BE immutable (or at least must not be shared and modified by several threads). I strongly recommend reading about functional programming principles.
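To show what can go wrong, here's a small sketch with made-up names (SharedStateDemo, unsafeDoubled, safeDoubled). The first method mutates a shared ArrayList from a parallel forEach, which is a data race. The second one keeps the lambda free of side effects and lets collect() build the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original project.
class SharedStateDemo {
    // DON'T: ArrayList is not thread-safe, so elements may get lost,
    // end up in a random order, or add() may even throw an exception.
    static List<Integer> unsafeDoubled(final List<Integer> input) {
        final List<Integer> result = new ArrayList<>();
        input.parallelStream().forEach(i -> result.add(i * 2));
        return result;
    }

    // DO: no shared mutable state; collect() merges partial results
    // and keeps the encounter order of the source list.
    static List<Integer> safeDoubled(final List<Integer> input) {
        return input.parallelStream()
                    .map(i -> i * 2)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(safeDoubled(Arrays.asList(1, 2, 3, 4, 5))); // prints [2, 4, 6, 8, 10]
    }
}
```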

That's all. I've promised to paste the tool that logs performance so here you are:
/**
 * @author Grzegorz Taramina
 *         Created on: 13/06/16
 */
public class PerformanceLoggingBlock implements Logging {
    private final String label;

    public static PerformanceLoggingBlock useLabel(final String label) {
        return new PerformanceLoggingBlock(label);
    }

    private PerformanceLoggingBlock(final String label) {
        this.label = label;
    }

    public void andLogPerformanceOf(final Runnable runnable) {
        perfLog().info(label + " started");
        Stopwatch stopwatch = Stopwatch.createStarted();
        runnable.run();
        perfLog().info(label + " completed. Time elapsed = " + stopwatch.elapsed(MILLISECONDS) + " millis");
    }

    public <T> T andLogPerformanceOf(final Supplier<T> supplier) {
        System.out.println(label + " started");
        Stopwatch stopwatch = Stopwatch.createStarted();
        T result = supplier.get();
        System.out.println(label + " completed. Time elapsed = " + stopwatch.elapsed(MILLISECONDS) + " millis");
        return result;
    }
}

Tuesday, 12 July 2016

[Java 8 / Functional programming] Functional util that invokes a command n times.

Recently I've been doing major refactoring of integration tests. I've found many tests which do stuff like that:
for (int i = 0; i < 256; i++) {
    addProduct(UUID.randomUUID().toString());
}
It's pretty ugly, isn't it? It would be nice to have a small tool that invokes a given piece of code n times. In Java 8 we can use IntStream:
IntStream.range(0, 256).forEach(i -> addProduct(UUID.randomUUID().toString()));
It looks better but it's still not very readable. Again, I started with a test that specifies how the tool should work:
    @Test
    public void shouldInvokeCommandFiveTimes() throws Exception {
        // given
        final List<String> list = newArrayList();

        // when
        times(5).invoke(() -> list.add("item"));

        // then
        assertThat(list).containsExactly("item", "item", "item", "item", "item");
    }
I've come up with the following class:
/**
 * @author Grzegorz Taramina
 *         Created on: 12/07/16
 */
public class Times {
    private final int times;

    private Times(final int times) {
        this.times = times;
    }

    public static Times times(final int times) {
        Assert.isTrue(times >= 0, "times must be at least equal to zero");
        return new Times(times);
    }

    public void invoke(final Runnable runnable) {
        IntStream.range(0, times).forEach(i -> runnable.run());
    }
}
It's very simple but makes code concise and readable:
times(5).invoke(() -> addProduct(randomUUID().toString()));
I might have exaggerated by calling this a functional tool. It's simply a higher-order function, but a very useful one.
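If you want to play with it without Spring on the classpath, here's a self-contained sketch. TimesSketch is a made-up name, and I've swapped Assert.isTrue for a plain IllegalArgumentException, which is an assumption about what you'd want in that case:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical stand-alone variant of the Times class from this post.
class TimesSketch {
    private final int times;

    private TimesSketch(final int times) {
        this.times = times;
    }

    static TimesSketch times(final int times) {
        // Plain Java replacement for Spring's Assert.isTrue.
        if (times < 0) {
            throw new IllegalArgumentException("times must be at least equal to zero");
        }
        return new TimesSketch(times);
    }

    // Higher-order part: takes the behaviour to repeat as an argument.
    void invoke(final Runnable runnable) {
        IntStream.range(0, times).forEach(i -> runnable.run());
    }

    public static void main(String[] args) {
        final AtomicInteger counter = new AtomicInteger();
        times(3).invoke(counter::incrementAndGet);
        System.out.println(counter.get()); // prints 3
    }
}
```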

Monday, 11 July 2016

[Java8 / Functional programming] How to create object that will be created lazily ?

Sometimes you may want to create some objects lazily, especially when it comes to really heavy objects that may or may not be used at runtime. In one of the companies I used to work for, we had to use Java 6. Some developers must have read some articles about laziness and started to create literally all the objects lazily, like this:
    public static class OldFashionedHeavyObjectHolder {
        private HeavyObject heavyObject;

        public synchronized HeavyObject getHeavyObject() {
            if (heavyObject == null) {
                heavyObject = new HeavyObject();
            }

            return heavyObject;
        }
    }
After a couple of months we had a lot of classes with tons of getters that check if an object is null and so on. I can see at least four disadvantages of that approach:
  • the method has to be synchronized because more than one thread can invoke the method when heavyObject == null
  • even if heavyObject has already been created you have to check that
  • it's extremely ugly
  • it's hard to test it
Luckily I've changed companies and now I can use all those fancy streams, lambdas and everything else that Java 8 comes with. Basically I wanted to create a tool which works like this:
/**
 * @author Grzegorz Taramina
 *         Created on: 23/06/16
 */
public class LazyInstanceTest {
    @Test
    public void shouldCreateLazyInstance() throws Exception {
        // given
        LazyInstance<String> instance = LazyInstance.of(() -> "i'm lazy");

        // when
        String result = instance.get();

        // then
        assertThat(result).isEqualTo("i'm lazy");
    }
}
String is obviously just a simplification. What I wanted is some kind of factory that creates a holder for a heavy instance and takes care of creating it lazily. I came up with the following class:
/**
 * @author Grzegorz Taramina
 *         Created on: 23/06/16
 */
public class LazyInstance<T> {
    private final Supplier<T> instanceSupplier;
    private Supplier<T> instance = this::create;

    public static <T> LazyInstance<T> of(final Supplier<T> instanceSupplier) {
        return new LazyInstance<>(instanceSupplier);
    }

    /**
     * Creates LazyInstance
     * @param instanceSupplier supplier that will be lazily used while creating instance
     */
    private LazyInstance(final Supplier<T> instanceSupplier) {
        this.instanceSupplier = instanceSupplier;
    }

    public T get() {
        return instance.get();
    }

    private synchronized T create() {
        class InstanceFactory implements Supplier<T> {
            private final T instance = instanceSupplier.get();

            public T get() {
                return instance;
            }
        }

        if (!InstanceFactory.class.isInstance(instance)) {
            instance = new InstanceFactory();
        }

        return instance.get();
    }
}
It works like that:
final LazyInstance<HeavyObject> heavy = LazyInstance.of(HeavyObject::new);
HeavyObject heavyObject = heavy.get();
The main idea of this class is that the supplier is invoked lazily. The synchronized create method returns the value returned by InstanceFactory (which is in fact a Supplier). InstanceFactory in turn returns the value returned by the Supplier provided to the LazyInstance. So basically we're invoking a supplier that invokes a supplier that creates the real instance.

Another thing: instance = new InstanceFactory(); is a really important line because it swaps the suppliers, which means that the synchronized method and the if statement are only executed until the object has been created. After that the instance field (the one in LazyInstance, not the one in InstanceFactory) holds an InstanceFactory which simply returns the real instance.

It may look a bit complicated but I think it does all the stuff quite elegantly. Just to prove that the instance is being created lazily:
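The swap can also be checked with a small counting sketch. LazyInstanceCountDemo and supplierCallsFor are made-up names, and the LazyInstance in the block is just a compact copy of the class from this post so that the example compiles on its own. Several threads call get() at the same moment and the counter shows how many times the original supplier was invoked:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical demo class, not part of the original post.
class LazyInstanceCountDemo {
    // Calls get() from several threads at once and returns how many
    // times the supplier passed to LazyInstance.of was invoked.
    static int supplierCallsFor(final int threads) {
        final AtomicInteger calls = new AtomicInteger();
        final LazyInstance<String> lazy = LazyInstance.of(() -> {
            calls.incrementAndGet();
            return "heavy";
        });
        final CountDownLatch start = new CountDownLatch(1);
        final Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // wait so that all threads call get() together
                    lazy.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (final Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("supplier invoked " + supplierCallsFor(8) + " time(s)");
    }
}

// Compact copy of the LazyInstance class shown in this post.
class LazyInstance<T> {
    private final Supplier<T> instanceSupplier;
    private Supplier<T> instance = this::create;

    static <T> LazyInstance<T> of(final Supplier<T> instanceSupplier) {
        return new LazyInstance<>(instanceSupplier);
    }

    private LazyInstance(final Supplier<T> instanceSupplier) {
        this.instanceSupplier = instanceSupplier;
    }

    T get() {
        return instance.get();
    }

    private synchronized T create() {
        class InstanceFactory implements Supplier<T> {
            private final T instance = instanceSupplier.get();

            public T get() {
                return instance;
            }
        }

        // Only the first thread to get here swaps the supplier.
        if (!InstanceFactory.class.isInstance(instance)) {
            instance = new InstanceFactory();
        }
        return instance.get();
    }
}
```

Threads that read the field before the swap still go through create(), but the check inside the synchronized method keeps them from creating a second instance.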
    public static class HeavyObject {
        public HeavyObject() {
            System.out.println("heavy's being created...");
        }
    }

    public static void main(String [] args) {
        System.out.println("Started executing main method");
        final LazyInstance<HeavyObject> heavy = LazyInstance.of(HeavyObject::new);
        System.out.println("Created lazy instance");
        System.out.println("Calling heavy.get()");
        HeavyObject heavyObject = heavy.get();
        System.out.println("End of main");
    }
The output:
Started executing main method
Created lazy instance
Calling heavy.get()
heavy's being created...
End of main
I should also mention that it looks good in Java 8 because of lambdas, but it can also be implemented in older versions using anonymous classes.
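A rough sketch of how that could look is below. LegacyLazyInstance and the nested Provider interface are names I made up; Provider is only a stand-in for java.util.function.Supplier, which doesn't exist before Java 8. The structure is the same as in LazyInstance, with the lambda and the method reference replaced by anonymous classes:

```java
// Hypothetical pre-Java 8 variant of LazyInstance (no lambdas, no diamond operator).
class LegacyLazyInstance<T> {
    // Stand-in for java.util.function.Supplier.
    public interface Provider<V> {
        V get();
    }

    private final Provider<T> instanceProvider;

    // Anonymous class instead of the this::create method reference.
    private Provider<T> instance = new Provider<T>() {
        public T get() {
            return create();
        }
    };

    public static <T> LegacyLazyInstance<T> of(final Provider<T> instanceProvider) {
        return new LegacyLazyInstance<T>(instanceProvider);
    }

    private LegacyLazyInstance(final Provider<T> instanceProvider) {
        this.instanceProvider = instanceProvider;
    }

    public T get() {
        return instance.get();
    }

    private synchronized T create() {
        class InstanceFactory implements Provider<T> {
            private final T value = instanceProvider.get();

            public T get() {
                return value;
            }
        }

        // Swap the provider so that later calls skip this method.
        if (!InstanceFactory.class.isInstance(instance)) {
            instance = new InstanceFactory();
        }
        return instance.get();
    }

    public static void main(String[] args) {
        final LegacyLazyInstance<String> lazy = LegacyLazyInstance.of(new Provider<String>() {
            public String get() {
                return "i'm lazy";
            }
        });
        System.out.println(lazy.get()); // prints i'm lazy
    }
}
```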