gt_dev

Friday, 10 November 2017

[Bash / Google search] How to query google search in bash ?

Recently my colleague asked me if I could write some script that automatically asks google about some games. He has a list of game titles in a file like:

➜  tools cat products 
Witcher 3
Metro Last Light
Valiant Hearts

And now he wants to ask google where those games can be bought:

where can I buy Witcher 3
where can I buy Metro Last Light
where can I buy Valiant Hearts

and then print first 5 links of each result.
At first I was thinking about writing scala script but I was also wondering if I can do the same in bash.
I've found an app called googler. You can find it on github: https://github.com/jarun/googler
It allows to use google search engine from the command line. You can use either interactive mode or json. For instance I want first two results for "scala cookbook":

googler "scala cookbook" --count 2 --json

It returns:

[
  {
    "abstract": "Save time and trouble when using Scala to build object-oriented, functional, and 
concurrent applications. With more than 250 ready-to-use recipes and 700 code ...",
    "title": "Scala Cookbook - O'Reilly Media",
    "url": "http://shop.oreilly.com/product/0636920026914.do"
  },
  {
    "abstract": "Save time and trouble when using Scala to build object-oriented, functional, and 
concurrent applications. With more than 250 ready-to-use recipes and 700 code ...",
    "title": "Scala Cookbook: Recipes for Object-Oriented and Functional ...",
    "url": "https://www.amazon.com/Scala-Cookbook-Object-Oriented-Functional-Programming/dp/1449339611"
  }
]

And this is exactly what I needed. I want only links so I used jq in order to extract url field from json objects like this:

➜  tools googler "scala cookbook" --count 2 --json | jq '.[].url'
"http://shop.oreilly.com/product/0636920026914.do"
"https://www.amazon.com/Scala-Cookbook-Object-Oriented-Functional-Programming/dp/1449339611"

Note that I used .[].url because it returns array of objects.
Now I want to loop over game titles from the file, print the title and results from google:

➜  tools cat products | while read title; do echo $title && googler "where can I buy $title" --count 2 --json | jq '.[].url'; done
Witcher 3
"http://buy.thewitcher.com/"
"https://www.gog.com/game/the_witcher_3_wild_hunt"
Metro Last Light
"http://store.steampowered.com/app/287390/Metro_Last_Light_Redux/"
"https://www.g2a.com/metro-last-light-redux-steam-key-global-i10000000633010?___store=polish"
Valiant Hearts
"https://store.ubi.com/us/valiant-hearts---the-great-war/575ffdb2a3be1633568b4e7a.html"
"https://www.microsoft.com/pl-pl/store/p/valiant-hearts-the-great-war/c0x15tr4np0b"

And the last thing - allow user to pass the query (where can I buy) and number of expected links and make it executable:

#!/bin/bash

QUERY=$1
RESULTS=$2

cat products | while read title; do echo $title && googler "$1 $title" --count $2 --json | jq '.[].url'; done

It works exactly as I expected so there's really no need to use any more sophisticated programming language. I realize that I could do the same using curl but googler has a lot of useful features so you should definitely try it.

Tuesday, 10 October 2017

[Scala / Scalatest / JUnit] How to run Scalatests with existing JUnit tests ?

The project I've been working on has originally been created in Java. I've added Scala one year ago so it became a cross-language project. Since then all new features have been coded in Scala but I didn't have that much time to add any scala testing framework so all the tests have been written in Scala + JUnit like this one:

import com.wiso.cw.externalsupplier.genba.auth.GenbaHeader._
import org.assertj.core.api.Assertions.assertThat
import org.assertj.core.data.MapEntry.entry
import org.junit.Test
import org.mockito.Mockito._

class GenbaHttpEntityCreatorTest {
  private val defaultHeadersCreator = mock(classOf[GenbaDefaultHeadersCreator])

  private val entityCreator = new GenbaHttpEntityCreator(defaultHeadersCreator)

  @Test
  def should_create_http_entity_with_headers(): Unit = {
    // when
    val entity = entityCreator.createHttpEntity("some content", Token -> "token",
                                                                AppId -> "our app id",
                                                                Accept -> "application/json")

    // then
    assertThat(entity.getBody) isEqualTo "some content"
    assertThat(entity.getHeaders).containsOnly(entry("token", "token"),
                                               entry("appId", "our app id"),
                                               entry("Accept", "application/json"))
  }
}

Recently I decided to add Scalatest but we have thousands of JUnit tests so the only option is to run both Scalatest and JUnit tests when the project is being built by gradle or intellij. At first I added gradle dependency:

testCompile group: 'org.scalatest', name: 'scalatest_2.12', version: '3.0.1'

Note that the artifact name is scalatest_2.12 which means it's gonna work with Scala 2.12 so if you use other version make sure to find appropriate version of scalatest.

Basically if you want to run scalatest like any other junit test just add JUnitRunner:

import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner

@RunWith(classOf[JUnitRunner])

It works but you have to add @RunWith annotation and extend proper spec (see: http://www.scalatest.org/user_guide/selecting_a_style) in every single test. I'd rather prefer creating my own spec. I've created two specs: UnitSpec - for all unit tests and IntegrationSpec - for tests that run spring context. The spec should extend any scalatest spec you'd like to use (for me it's FlatSpec).

@RunWith(classOf[JUnitRunner])
abstract class UnitSpec extends FlatSpec

Now every test that extend UnitSpec uses FlatSpec and can be run like JUnit test. You can obviously add some stuff here like Matchers, MockitoSugar and so on. My UnitSpec looks like that:

@RunWith(classOf[JUnitRunner])
abstract class UnitSpec extends FlatSpec
  with Matchers
  with MockitoSugar
  with OptionValues
  with GivenWhenThen
  with OneInstancePerTest
  with BeforeAndAfter {
  def when[T](methodcall: T): OngoingStubbing[T] = Mockito.when(methodcall)

  def verify[T](methodcall: T): T = Mockito.verify(methodcall)

  def verify[T](methodcall: T, verificationMode: VerificationMode): T = Mockito.verify(methodcall, verificationMode)

  def spy[T](o: T): T = Mockito.spy(o)

  def verifyZeroInterations(mocks: Object*): Unit = Mockito.verifyZeroInteractions(mocks: _*)

  def timeout(milis: Long): VerificationWithTimeout = Mockito.timeout(milis)
}

Note that we still use mockito and I don't want to import Mockito.when, Mockito.verify and so on in tests so it's more conveniant to have those methods in base class.

If you know Scala you may ask if I could use trait instead of abstract class. I was also thinking about that but it seems that annotations are not being inherited from traits.. Consider the following code:

@RunWith(classOf[JUnitRunner])
trait Trait

@RunWith(classOf[JUnitRunner])
abstract class AbstractClass

class ChildTrait extends Trait

class ChildClass extends AbstractClass

object A extends App {
  new ChildTrait().getClass.getAnnotations.foreach(println)

  new ChildClass().getClass.getAnnotations.foreach(println)
}

The output:

Child trait:
 s PU3g! )b#D    9"AA Ue LG C     !$  =S:LGO   7A Q  )
Child class:
@org.junit.runner.RunWith(value=class org.scalatest.junit.JUnitRunner)
^"mCN"B
   ! A  j]&$F  ! y  )

It shows that @RunWith annotation has not been inherited by ChildTrait class so if ChildClass were a test it wouldn't be run with other tests!!! Here's some simple test:

class RetryTest extends UnitSpec {
  private val someService = mock[Service]

  it should "retry twice and return value" in {
    Given("some service returns result after two errors")
    when(someService.doSth()).thenThrow(new RuntimeException)
      .thenThrow(new RuntimeException)
      .thenReturn("done")

    When("running it with double retry")
    val result = retry(3) {
      someService.doSth()
    }

    Then("result should be returned")
    result shouldBe "done"
  }
}

Friday, 28 July 2017

[Scala / Threads / Parallel collections] How to configure number of threads on parallel collection ?

I've already written an article about controlling pool size while using Java's parallel stream. Recently I needed exactly the same feature in Scala. Scala collection can be turned into parallel collection by invoking par method (introduced by Parallelizable trait).

trait Parallelizable[+A, +ParRepr <: Parallel] extends Any

so assuming you don't break basic rules of functional programming (immutablility, statelessness etc.) you can make your algorithms much faster by adding .par. Unfortunately .par method (exactly like .parallelStream() in Java) doesn't take any parameter which could controll the size of a pool.

/** Returns a parallel implementation of this collection.
   *
   *  For most collection types, this method creates a new parallel collection by copying
   *  all the elements. For these collection, `par` takes linear time. Mutable collections
   *  in this category do not produce a mutable parallel collection that has the same
   *  underlying dataset, so changes in one collection will not be reflected in the other one.
   *
   *  Specific collections (e.g. `ParArray` or `mutable.ParHashMap`) override this default
   *  behaviour by creating a parallel collection which shares the same underlying dataset.
   *  For these collections, `par` takes constant or sublinear time.
   *
   *  All parallel collections return a reference to themselves.
   *
   *  @return  a parallel implementation of this collection
   */
  def par: ParRepr = {
    val cb = parCombiner
    for (x <- seq) cb += x
    cb.result()
  }

However you can set the number of threads using ForkJoinTaskSupport like that:

object ParExample extends App {
  val friends = List("rachel", "ross", "joey", "chandler", "pheebs", "monica").par
  friends.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(6))
}

Let's prove it works:

object ParExample extends App {
  val friends = List("rachel", "ross", "joey", "chandler", "pheebs", "monica", "gunther", "mike", "richard").par
  val stopwatch = Stopwatch.createStarted()
  friends.foreach(f => Thread.sleep(5000))
  println(stopwatch.elapsed(MILLISECONDS))
}

It prints 10025 so the pool must have been smaller than the list which contains 9 strings. Now I'll set the pool size:

object ParExample extends App {
  val friends = List("rachel", "ross", "joey", "chandler", "pheebs", "monica", "gunther", "mike", "richard").par
  friends.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(9))
  val stopwatch = Stopwatch.createStarted()
  friends.foreach(f => Thread.sleep(5000))
  println(stopwatch.elapsed(MILLISECONDS))
}

This time it took only 5024 milliseconds because each element of the list has been processed by separate thread. The solution is ok for me but I'd rather want to use concise one-liners so I've created implicit conversion between ParSeq and RichParSeq that adds parallelismLevel(numberOfThreads: Int) method.

class RichParSeq[T](val p: ParSeq[T]) {
  def parallelismLevel(numberOfThreads: Int): ParSeq[T] = {
    p.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(numberOfThreads))
    p
  }
}

object ConfigurableParallelism {
  implicit def parIterableToRichParIterable[T](p: ParSeq[T]): RichParSeq[T] = new RichParSeq[T](p)
}

Now you can use it like that:

import ConfigurableParallelism._

object ParExample extends App {
  val stopwatch = Stopwatch.createStarted()
  val friends = List("rachel", "ross", "joey", "chandler", "pheebs", "monica", "gunther", "mike", "richard").par
    .parallelismLevel(9)
    .foreach(f => Thread.sleep(5000))
  println(stopwatch.elapsed(MILLISECONDS))
}

It took 5231 milliseconds.

Wednesday, 12 April 2017

[Scala] How to convert dates implicitly ?

If you read any book or tutorial about scala you probably know what implicit keyword means. Basically you can create a method that will be called when compiler thinks it's a good idea to do so. Although I'm not a fan of implicit conversions (when you overuse it the code becomes less readable and clear) it's very useful to get rid of method calls in some cases.

When you create a DAO you probably extend some abstract data access object that provides entity manager, jdbc template or other object that connects your application with a database.

I've been using Jooq for 16 months now to construct SQL queries. It's quite nice tool that prevents from typical sql typos. See example below:

def findForUpdate(searchKey: DateTime, productId: String): Option[SearchHistoryReport] = {
    val sql = dslContext.select(SEARCH_HISTORY_REPORT.SEARCH_DATE_AND_HOUR,
      SEARCH_HISTORY_REPORT.PRODUCT_ID,
      SEARCH_HISTORY_REPORT.SEARCH_SCORE)
      .from(SEARCH_HISTORY_REPORT)
      .where(SEARCH_HISTORY_REPORT.SEARCH_DATE_AND_HOUR.equal(new Timestamp(searchKey.getMillis)))
      .and(SEARCH_HISTORY_REPORT.PRODUCT_ID.equal(productId))
      .forUpdate().getSQL(INLINED)
    Try({
      npjt.queryForObject(sql, noParams(), (rs: ResultSet, rowNum: Int) => SearchHistoryReport.builder()
        .searchDateAndHour(fromTimestamp(rs.getTimestamp(SEARCH_HISTORY_REPORT.SEARCH_DATE_AND_HOUR.toString)))
        .productId(rs.getString(SEARCH_HISTORY_REPORT.PRODUCT_ID.toString))
        .searchScore(rs.getBigDecimal(SEARCH_HISTORY_REPORT.SEARCH_SCORE.toString)).build())
    }).toOption
  }

Search date and hour is a datetime column in the database:

MariaDB [_censored]> describe search_history_report;
+----------------------+---------------+------+-----+---------+-------+
| Field                | Type          | Null | Key | Default | Extra |
+----------------------+---------------+------+-----+---------+-------+
| product_id           | varchar(255)  | NO   | PRI | NULL    |       |
| search_date_and_hour | datetime      | NO   | PRI | NULL    |       |
| search_score         | decimal(19,2) | NO   |     | NULL    |       |
+----------------------+---------------+------+-----+---------+-------+

Datetime can be compared to java.sql.Timestamp but all dates in my domain objects are Joda's DateTime.

Let's see another example:

val sql = dslContext.select(sum(SEARCH_HISTORY_REPORT.SEARCH_SCORE).as(SCORE_SUM_ALIAS), SEARCH_HISTORY_REPORT.PRODUCT_ID)
      .from(SEARCH_HISTORY_REPORT)
      .where(SEARCH_HISTORY_REPORT.PRODUCT_ID.in(productIds))
      .and(SEARCH_HISTORY_REPORT.SEARCH_DATE_AND_HOUR.between(new Timestamp(startDate.getMillis)).and(new Timestamp(endDate.getMillis)))
      .groupBy(SEARCH_HISTORY_REPORT.PRODUCT_ID).getSQL(INLINED)

I simply want to get rid of DateTime => Timestamp conversion.

Let's assume that my abstract dao looks like that:

trait Dao {
  def dslContext(): DSLContext = ???

  def jdbcTemplate(): NamedParameterJdbcTemplate = ???
}

It provides methods that return dslContext (jooq) and jdbcTemplate (spring).

I'd be really happy if all my daos could automatically convert org.joda.time.DateTime to java.sql.Timestamp.

Let's create implicit conversion between those types.

trait Dao {
  implicit def asTimestamp(date: DateTime): Timestamp = new Timestamp(date.getMillis)

  def dslContext(): DSLContext = ???

  def jdbcTemplate(): NamedParameterJdbcTemplate = ???
}

That's all. Now when I pass DateTime to method that takes Timestamp the compiler knows that implicit conversion can be used. It calls asTimestamp method that returns Timestamp behind the hood. Programmer doesn't have to remember that jooq likes timestamps only.

Query that uses implicit conversion looks like that:

val sql = dslContext.select(sum(SEARCH_HISTORY_REPORT.SEARCH_SCORE).as(SCORE_SUM_ALIAS), SEARCH_HISTORY_REPORT.PRODUCT_ID)
      .from(SEARCH_HISTORY_REPORT)
      .where(SEARCH_HISTORY_REPORT.PRODUCT_ID.in(productIds))
      .and(SEARCH_HISTORY_REPORT.SEARCH_DATE_AND_HOUR.between(startDate).and(endDate))
      .groupBy(SEARCH_HISTORY_REPORT.PRODUCT_ID).getSQL(INLINED)

I guess it's perfect use case when implicit conversion can be used. The code is very consise and readable and it's natural to pass DateTime so noone has to remember about this conversion.

Wednesday, 22 March 2017

[Scala] How to transform tuple to class instance ?

In scala TupleN contains N fields and this is basically all that tuple can do (except swapping elements in Tuple2). Although it's really simple data structure it's extremely useful especially when you do some collection's processing.

Imagine that users in your system make orders and you want to find all users with their orders.

object A extends App {
  List("some@email.com", "another@email.com", "and@other.com")
    .map(email => (email, findOrders(email)))

  private def findOrders(email: String): List[Order] = List() // call some dao here...
}

case class Account(email: String, orders: List[Order])

As a result we have collection of tuples that contain email and list of orders. We could have created Account object instead of tuple like that:

.map(email => Account(email, findOrders(email))

but let's say we want only users who made at least one order so we have to make additional filtering:

List("some@email.com", "another@email.com", "and@other.com")
    .map(email => (email, findOrders(email)))
    .filterNot(_._2.isEmpty)

In the end we want to return list of accounts which means the tuple has to be transformed into Account instance. The easiest way would be something like that:

List("some@email.com", "another@email.com", "and@other.com")
    .map(email => (email, findOrders(email)))
    .filterNot(_._2.isEmpty)
    .map(t => Account(t._1, t._2))

but it just doesn't feel right. You might have noticed that we used case class instead of class so tupled method can be invoked (it comes from Function2 trait).

/** Creates a tupled version of this function: instead of 2 arguments,
   *  it accepts a single [[scala.Tuple2]] argument.
   *
   *  @return   a function `f` such that `f((x1, x2)) == f(Tuple2(x1, x2)) == apply(x1, x2)`
   */
  @annotation.unspecialized def tupled: Tuple2[T1, T2] => R = {
    case Tuple2(x1, x2) => apply(x1, x2)
  }

As you see in scaladoc it allows to create object's instance from tuple so considering our Account we can either call Account(email, orders) or Account((email, orders)). And this is exaclty what we're looking for:

List("some@email.com", "another@email.com", "and@other.com")
    .map(email => (email, findOrders(email)))
    .filterNot(_._2.isEmpty)
    .map(Account.tupled)

Wednesday, 7 December 2016

[Java / Assertj] How to create your own fluent assertions ?

Everyone should write unit tests. And I mean EVERYONE. Integration tests are very popular and useful but unit tests are the fastest. Having good unit tests allows you to check whether your class works within second or two. UTs also tell you that something's wrong but how they do it ? When test fails you know that some parts of your code don't work as you expect. You see the name of failing test and a class which the test corresponds to so you have general idea what might have gone wrong but if you want to fix that bug you have to read both test and the code. This is why your tests should be extremely readable. I use the following live template for all the tests I write (Intellij Idea):

@Test
public void $METHOD_NAME$() throws Exception {
    // given
    $END$
    
    // when
    
    
    // then
    
}

When section invokes a method which is being tested by particular test. Given section prepares the input and records mocks' behaviour. Then section checks the output. I really like given-when-then template. It clearly separates those three sections so you always know which object is the input and so on. When section typically contains single line of code. Actually I like when each section is one-liner but it's usually impossible. Perfect test for me looks basically like that:

@Test
public void should_calculate_ceiling() throws Exception {
    // given
    double price = 7.8;
    
    // when
    double ceiling = Math.ceil(price);

    // then
    assertThat(ceiling).isEqualTo(8.0);
}

That case is actually too simple because typically you have to check the state of some object. Let's say that you're testing some class that returns an instance of Person:

@AllArgsConstructor
@Getter
public static class Person {
    private final String firstName;
    private final String lastName;
    private final int age;
    private final Sex sex;
    private final Optional<Job> job;

    public static enum Sex {
        MALE, FEMALE
    }

    public static class Job {}
}

I would never implement Person class like this but it's just an example so forgive me that :) If you use standard junit assertions you would probably write something like that:

@Test
public void should_return_joey() throws Exception {
    // when
    Person person = p();

    // then
    assertEquals(Sex.MALE, person.getSex());
    assertEquals("Joey", person.getFirstName());
    assertEquals("Tribiani", person.getLastName());
    assertTrue(person.getAge() > 18);
    assertEquals(Optional.empty(), person.getJob());
}

private Person p() {
    return new Person("Joey", "Tribiani", 25, Sex.MALE, Optional.empty());
}

I really don't like assertEquals(). First of all you have to remember order of parameters. The first one is expected and the second actual. If you make mistake you will see a misleading message. Another thing assertTrue() doesn't tell you why Joey has to be older than 18. This is why you should use some library that contains fluent assertions and allows you to create your own assertions. The most popular assertions library for Java is probably AssertJ. Having for instance a string you can check many things at once:

assertThat("Joey Tribiani").hasSize(13)
                           .startsWith("Jo")
                           .contains("y T")
                           .endsWith("ani")
                           .doesNotContain("Rachel");

Same applies to collections:

List<String> friends = ImmutableList.of("Rachel", "Ross", "Joey", "Chandler", "Pheebs", "Monica");
assertThat(friends).hasSize(6)
                   .doesNotContain("Gunther")
                   .containsSequence("Rachel", "Ross")
                   .endsWith("Monica")
                   .allMatch(friend -> friend.matches("[A-Z][a-z]*"));

And so on... Ok let's write custom assertions for Person class. First of all you have to create a class that extends AbstractAssert

public class PersonAssertions extends AbstractAssert<PersonAssertions, Person> {
}

Then we must add constructor that matches super() and static method assertThat:

public class PersonAssertions extends AbstractAssert<PersonAssertions, Person> {
    public static PersonAssertions assertThat(final Person actual) {
        return new PersonAssertions(actual);
    }

    private PersonAssertions(final Person actual) {
        super(actual, PersonAssertions.class);
    }
}

Now we can start adding assertions. First of all let's create some methods that allow you to check first and last name.

public PersonAssertions hasFirstName(final String name) {
    Assertions.assertThat(name).isEqualTo(actual.getFirstName());
    return this;
}

public PersonAssertions hasLastName(final String lastName) {
    Assertions.assertThat(lastName).isEqualTo(actual.getLastName());
    return this;
}

Note that we always return this to make the assertions fluent. Now:

assertEquals("Joey", person.getFirstName());
assertEquals("Tribiani", person.getLastName());

Can be replaced by:

assertThat(person).hasFirstName("Joey")
                  .hasLastName("Tribiani");

Now let's check gender:

public PersonAssertions isFemale() {
    Assertions.assertThat(actual.getSex()).isSameAs(Sex.FEMALE);
    return this;
}

public PersonAssertions isMale() {
    Assertions.assertThat(actual.getSex()).isSameAs(Sex.MALE);
    return this;
}

Now we have:

assertThat(person).hasFirstName("Joey")
                  .hasLastName("Tribiani")
                  .isMale();

Those assertions are just fancy methods that check internal state of given object. All the created methods are way more readable but we can still add more specific assertions that suits to our domain. Let's assume that we're selling alcohol.... In Poland you can buy alcohol if you're 18 years old.

public PersonAssertions canBuyBeerInPoland() {
    Assertions.assertThat(actual.getAge()).isGreaterThanOrEqualTo(18);
    return this;
}

assertThat(person).hasFirstName("Joey")
                  .hasLastName("Tribiani")
                  .isMale()
                  .canBuyBeerInPoland();

canBuyBeerInPoland() looks much better than assertTrue(person.getAge() > 18) because it uses our domain language. And the last one. Let's check whether given person is unemployed. Person class contains Optional<Job> so we have to check if it's empty or not:

public PersonAssertions isUnemployed() {
    Assertions.assertThat(actual.getJob()).isEmpty();
    return this;
}

Now you can compare junit assertions to our custom assertj assertions:
JUnit:

assertEquals(Sex.MALE, person.getSex());
assertEquals("Joey", person.getFirstName());
assertEquals("Tribiani", person.getLastName());
assertTrue(person.getAge() > 18);
assertEquals(Optional.empty(), person.getJob());

AssertJ:

assertThat(person).hasFirstName("Joey")
                  .hasLastName("Tribiani")
                  .isMale()
                  .canBuyBeerInPoland()
                  .isUnemployed();

It looks much better and you can reuse all those methods. You should also think about overriding error messages. We're still using the same instance of Person:

private Person p() {
    return new Person("Joey", "Tribiani", 25, Sex.MALE, Optional.empty());
}

The follwoing assertion:

assertThat(person).hasFirstName("Joeya");

generates error message like this:

org.junit.ComparisonFailure: 
Expected :"Joey"
Actual   :"Joeya"

You can always override error message like that:

public PersonAssertions hasFirstName(final String name) {
    Assertions.assertThat(name).overridingErrorMessage("Given Person [" + actual + "] has name " + actual.getFirstName() + ". Expected: " + name)
            .isEqualTo(actual.getFirstName());
    return this;
}

Now the same test generates:

java.lang.AssertionError: Given Person [DistributorSalesScheduler.Person(firstName=Joey, lastName=Tribiani, age=25, sex=MALE, job=Optional.empty)] has name Joey. Expected: Joeya

The whole class looks as follows:

public static class PersonAssertions extends AbstractAssert<PersonAssertions, Person> {
    public static PersonAssertions assertThat(final Person actual) {
        return new PersonAssertions(actual);
    }

    private PersonAssertions(final Person actual) {
        super(actual, PersonAssertions.class);
    }

    public PersonAssertions hasFirstName(final String name) {
        Assertions.assertThat(name).overridingErrorMessage("Given Person [" + actual + "] has name " + actual.getFirstName() + ". Expected: " + name)
                .isEqualTo(actual.getFirstName());
        return this;
    }

    public PersonAssertions hasLastName(final String lastName) {
        Assertions.assertThat(lastName).isEqualTo(actual.getLastName());
        return this;
    }

    public PersonAssertions canBuyBeerInPoland() {
        Assertions.assertThat(actual.getAge()).isGreaterThanOrEqualTo(18);
        return this;
    }

    public PersonAssertions isFemale() {
        Assertions.assertThat(actual.getSex()).isSameAs(Sex.FEMALE);
        return this;
    }

    public PersonAssertions isMale() {
        Assertions.assertThat(actual.getSex()).isSameAs(Sex.MALE);
        return this;
    }

    public PersonAssertions hasJob() {
        Assertions.assertThat(actual.getJob()).isPresent();
        return this;
    }

    public PersonAssertions isUnemployed() {
        Assertions.assertThat(actual.getJob()).isEmpty();
        return this;
    }
}

As you see writing custom assertions is very easy and makes tests more readable. All the assertions are reusable and can be unit tested. The project I've been working on contains module with all the utilities for tests so other modules can import the assertions. I'm always very happy when I see that someone's created new set of assertions for our domain objects :)

Monday, 14 November 2016

[Java 8 / Parallel stream / Stream] Should I always use parallel stream instead of stream ?

Streams are probably one of the most commonly used feature of Java 8. At first people discover forEach() method, then map() and filter() and so on. Some of them starts reading about functional programming but from my experience I'd say that in general people still think that stream is just an improved looping structure.

Then comes this exciting moment when they realize that it all can be much, much faster because there's also parallelStream. And then problems come...

Very often when I'm waiting for something I look into the code and try to fix some crappy parts. This one I've found yesterday:

resource.setRegions(product.getRegions().parallelStream().map(Region::getName).collect(toList()));

Our database has something like ten regions. Let's see how long it takes to collect such items using stream and parallelStream.

public static void main(String[] args) {
    final List<Region> regions = IntStream.range(0, 10)
                        .mapToObj(i -> new Regioan("region:" + i))
                        .collect(toList());

    useLabel("stream()").andLogPerformanceOf(() -> regions.stream()
                                                         .map(Region::getName)
                                                         .collect(toList()));
    useLabel("parallelStream()").andLogPerformanceOf(() -> regions.parallelStream()
                                                              .map(Region::getName)
                                                              .collect(toList()));
}

useLabel(...).andLogPerformanceOf(...) is just a simple wrapper that runs a piece of code and logs time taken (I'll paste it at the end of the article). First run shows:

stream() started
stream() completed. Time elapsed = 1 millis
parallelStream() started
parallelStream() completed. Time elapsed = 9 millis

And some more results:

10 elements
Stream	Parallel stream
4	10
2	3
2	14
2	18
1	6
3	17
2	8
5	7
2	14
1	8

As you can see in all cases stream() is faster than parallelStream(). Parallel stream has much higher overhead compared to stream which uses single thread. When you want to split collection's computation you need to divide the input so that the threads compute similar amount of data, run the threads, collect results and so on.

Let's make the input list bigger.

100 elements
Stream	Parallel stream
2	12
2	11
1	5
2	7
2	8
1	6
3	6
2	6
4	9
6	18

Parallel stream is still slower.

1000 elements
Stream	Parallel stream
9	14
2	20
2	7
9	23
3	9
2	20
3	6
2	5
3	9
3	5

Still slower.

10 000 elements
Stream	Parallel stream
8	7
6	23
12	9
5	10
6	9
16	19
7	9
14	14
11	22
20	18

For 10k elements the results are similar.

1 000 000 elements
Stream	Parallel stream
1423	65
1715	91
1244	63
1345	68
1458	91
1479	65
1415	48
1584	87
1425	61
1506	73

Having list that contains 1M elements parallel stream is way faster but how often do you work with such big collections ?

Let's get back to the main question: Should I always use parallel stream instead of stream ?

Definitely not.

You should consider parallel version:

when you work with huge collections
when computation of single element takes much time

I suppose that each case should be considered separately. Performance stronlgy depends on operations you perform so in my opinion trying to define some kind of conditions when parallel stream should be used simply doesn't make sense.

You've seen example that transforms huge collection. You can find another one which shows processing collection for which computing single element takes much time in my post here: How to control pool size while using parallel stream.

You should also remember that if you want to make your code parallel IT HAS TO BE immutable. I stronly recommend reading about functional programming principles.

That's all. I've promised to paste the tool that logs performance so here you are:

/**
 * @author Grzegorz Taramina
 *         Created on: 13/06/16
 */
public class PerformanceLoggingBlock implements Logging {
    private final String label;

    public static PerformanceLoggingBlock useLabel(final String label) {
        return new PerformanceLoggingBlock(label);
    }

    private PerformanceLoggingBlock(final String label) {
        this.label = label;
    }

    public void andLogPerformanceOf(final Runnable runnable) {
        perfLog().info(label + " started");
        Stopwatch stopwatch = Stopwatch.createStarted();
        runnable.run();
        perfLog().info(label + " completed. Time elapsed = " + stopwatch.elapsed(MILLISECONDS) + " millis");
    }

    public <T> T andLogPerformanceOf(final Supplier<T> supplier) {
        System.out.println(label + " started");
        Stopwatch stopwatch = Stopwatch.createStarted();
        T result = supplier.get();
        System.out.println(label + " completed. Time elapsed = " + stopwatch.elapsed(MILLISECONDS) + " millis");
        return result;
    }
}

Wednesday, 9 November 2016

[Scala / Java / Gradle] How to add Scala to Java project and use both ?

I've been developing Java projects for couple of years and I remember the day when I finally could use Java 8. I was pretty excited that I can abandon Guava's FluentIterable, command pattern, anonymous classes and so on but I shortly realized that it's not enough when you want to write concise functional code.

Although Java8 makes significant step forward it's nothing compared to Scala. I haven't heard about a company that decided to rewrite some huge Java project to Scala so far but fortunately Scala runs on JVM so you can use both.

Ok I have Java8 + Gradle project. Let's add Scala :)

In build gradle you need to add scala plugin:

apply plugin: 'scala'

And scala lang/compiler dependencies:

compile group: 'org.scala-lang', name: 'scala-library', version: scalaVersion
compile group: 'org.scala-lang', name: 'scala-compiler', version: scalaVersion

You should also make sure that .java and .scala files are being built together so:

sourceSets.main.scala.srcDir "src/main/java"
sourceSets.test.scala.srcDir "src/test/java"
sourceSets.main.java.srcDirs = []
sourceSets.test.java.srcDirs = []

The project I've been working on has multiple modules so I've added sourceSets and dependencies in subprojects section. Remember about installing Scala plugin in your IDE (in intellij it's called Scala).

That's all :)

Now when I'm trying to build the project I get the following output:

➜  cw git:(develop) git status
On branch develop
Your branch is up-to-date with 'origin/develop'.
nothing to commit, working directory clean
➜  cw git:(develop) gradle build
:compileJava UP-TO-DATE
:compileScala UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:jar UP-TO-DATE
:assemble UP-TO-DATE
:compileTestJava UP-TO-DATE
:compileTestScala UP-TO-DATE
:processTestResources UP-TO-DATE
:testClasses UP-TO-DATE
:test UP-TO-DATE
:check UP-TO-DATE
:build UP-TO-DATE
...

As you can see there's compileScala among others which in fact builds both .java and .scala files. It's because of our sourceSets. It allows you to use Scala classes in .java files and Java classes in .scala files.

You should also make sure that you chose proper version of Scala. I use 2.11.5. It works without any issues with Java 8.

Friday, 23 September 2016

[Bash / Git] How to print number of commits done today ?

Quite simple stuff. You want to know how many commits you've done so far today. How to do that ?

➜  cw git:(develop) git log

It shows commits in the following format:

commit 2affec1442eb6b4ad21cd6d93c7670e49e4f3cba
Author: gt
Date:   Wed Sep 21 13:33:15 2016 +0200

    got rid of some warnings

commit 2bd8a52b239301adedeee8a36cff47c2599f992f
Author: gt
Date:   Wed Sep 21 13:25:24 2016 +0200

    bumped scala version in versions.gradle

You can exactly see when and by whom the commit has been made.

Author: gt
Date:   Wed Sep 21 13:25:24 2016 +0200

We need to filter git log output by author and date. It would be easier to grep the data if both author and commit were in the same line. We also need date only (time isn't important here) so let's format git log output like that:

git log --date=short --format=format:"%ad %aE %s"

It prints:

2016-09-23 mymaila@mycompany.pl commit message 1
2016-09-23 mymaila@mycompany.pl commit message 2
2016-09-23 mymaila@mycompany.pl commit message 3
2016-09-23 mymaila@mycompany.pl commit message 4
2016-09-23 mymaila@mycompany.pl commit message 5
2016-09-23 mymaila@mycompany.pl commit message 6
2016-09-23 mymaila@mycompany.pl commit message 7
2016-09-22 mymaila@mycompany.pl commit message 8
2016-09-22 mymaila@mycompany.pl commit message 9
2016-09-22 mymaila@mycompany.pl commit message 10

Now we can grep such output easily but we still need to know current date in the same format.

➜  cw git:(develop) date +'%Y-%m-%d'
2016-09-23

Let's put it together:

➜  cw git:(develop) ✗ git log --date=short --format=format:"%ad %aE %s" | grep "$(date +'%Y-%m-%d') mymaila@mycompany.pl"
2016-09-23 mymaila@mycompany.pl commit message 1
2016-09-23 mymaila@mycompany.pl commit message 2
2016-09-23 mymaila@mycompany.pl commit message 3
2016-09-23 mymaila@mycompany.pl commit message 4
2016-09-23 mymaila@mycompany.pl commit message 5
2016-09-23 mymaila@mycompany.pl commit message 6
2016-09-23 mymaila@mycompany.pl commit message 7

And the final step - count the lines:

➜  cw git:(develop) ✗ git log --date=short --format=format:"%ad %aE %s" | grep "$(date +'%Y-%m-%d') mymaila@mycompany.pl" | wc -l
8

I guess it's faster to count manually commits in git log than type the whole command above so let's put it into a script:

#!/bin/bash

TODAY=$(date +'%Y-%m-%d')
AUTHOR="mymaila@mycompany.pl"
COMMITS=$(git log --date=short --format=format:"%ad %aE %s" | grep "$TODAY $AUTHOR" | wc -l)

echo "Commits today: $COMMITS"

I've also added alias in .zshrc:

alias hmc="/home/gt/tools/howManyCommitsToday.sh"

➜  cw git:(develop) hmc
Commits today: 8

Wednesday, 10 August 2016

[CleanCode / Patterns] How classes like Period should be constructed ?

I've been thinking recently a lot about constructing objects that contain only two fields. The system I develop contains many classes like this which is rather problematic. I guess Period is a great example.

    public class Period {
        private final DateTime start;
        private final DateTime end;

	...
    }

How can we create instance of Period ?

1. No-arg constructor + setters

This is probably the worst way. It would look like that:

final Period period = new Period();
period.setStart(startDate);
period.setEnd(endDate);

Advantages:

None

Disadvantages:

you create empty object
it takes three lines of code
it's muttable so you never know if the object is consistent
you can make stupid mistakes like invoke same setter twice with different parameters and so on

Basically for me no-arg constructor and set of getters and setters is a data structure. It isn't encapsulation because you won't put any logic to setters. You can make all fields public - it's the same. Let's talk about object consistency for a while. Consider you have Address class:

    public static class Address {
        private final String street;
        private final String houseNumber;
        private final String postalCode;
    }

Some user (let's say Ben) registers in your system with the following address:

new Address("Long street", "16a", "50-500");

After few days Ben finds better flat on the same street say: Long Steet, 46d, 50-500 What now ? Should we update existing address or create new one ? Some people would say update but what if this house belongs to other post therefore should have other postal code ? I think that we should in such case always create new object. I have a validator which checks Address instance so I am sure that the address is correct. When I create and validate object I get consistent and correct object so immutable objects should be created whenever it's possible.

2. All-args constructor + getters

Advantages:

object is immutable
it takes one line to create instance

Disadvantages:

it's not very readable (espiecially for people who aren't developers)
when constructor's parameters are of the same type it's easy to swap them by mistake

3. Builder

I think it's best solution in this case. Most people use builder pattern for classes which have a lot of fields. In my opinion builder is a perfect pattern to create instance of class which has only two fields. Let's get back to Period class:

    @Getter
    public static class Period {
        private final DateTime startDate;
        private final DateTime endDate;

        public static Builder newPeriodThatStarts(final DateTime startDate) {
            return new Builder(startDate);
        }

        private Period(final DateTime startDate, final DateTime endDate) {
            this.startDate = startDate;
            this.endDate = endDate;
        }

        public static class Builder {
            private final DateTime startDate;
            
            private Builder(final DateTime startDate) {
                this.startDate = startDate;
            }
            
            public Period andEnds(final DateTime endDate) {
                return new Period(startDate, endDate);
            }
        }
    }

And this is how you instantiate Period instance:

    public static void main(String [] args) {
        Period period = newPeriodThatStarts(now()).andEnds(now().plusDays(1));
        Period anotherPeriod = newPeriodThatStarts(new DateTime(2016, 2, 15, 10, 0, 0)).andEnds(now());
    }

Advantages:

object is immutable
it takes one line to create instance
it's very readable even for someone who isn't software developer

Disadvantages:

you have to create a builder (or use lombok)

Builder seems to be the best way for me. For Period you could also use all args constructor because the parameters have natural order - the first one is startDate and the second endDate (I can't really imagine the opposite) so you shouldn't swap the parameters by mistake. A lot of domain objects take two strings as parameters and in this case I would definitely use builder.