Benchmark Java Applications using JMH

In this article, we will introduce JMH as a tool to benchmark your Java applications and discuss how to leverage it to gain performance insights into your code.

Before we talk about benchmarks, let us distinguish between the functional requirements and the performance requirements of an application. We use JUnit / Mockito to test the functionality of Java applications: we write test cases to check whether all the functional requirements are satisfied, and automated testing tools such as Selenium make this even simpler. Unfortunately, that's not the only challenge in maintaining applications in production. One of the major concerns is application performance, and that's where benchmarking comes into the picture. Customer-facing applications demand low latency and high throughput. Troubleshooting application performance is one of the hardest jobs for a developer, and the hardest part is figuring out which piece of logic in the application actually causes the bottleneck.

[Image: App in Dev vs. App in Production]

We may measure the performance of an application as a whole using multiple benchmark experiments. However, it is difficult to benchmark a specific block of code rather than the entire application, and you often want to investigate at exactly that level of detail. So we need a benchmarking tool that can measure the performance of individual pieces of application logic. JMH (Java Microbenchmark Harness) serves exactly this purpose.

Pre-requisites

This article assumes that you have basic programming knowledge in Java. You need Java and Maven installed on your system in order to follow along. We use JDK 8 in the examples, although there is no strict requirement on the JDK version. An IDE such as IntelliJ IDEA makes it easier to run the examples, but if you don't have one, you can simply run the code samples using the java command.

What is JMH?

JMH is an OpenJDK tool that helps you implement benchmarks correctly. Remember, JMH specializes in micro-benchmarks, where low-level performance metrics are measured. For example, think of measuring the performance of a block of code or a single method rather than the whole application. In that sense, micro-benchmarking is the opposite of end-to-end performance testing, and it is most useful for in-depth analysis of performance issues. Writing JMH benchmarks is similar to writing JUnit test cases; the main difference is the set of annotations used by JMH.
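As a quick illustration of that similarity, here is a minimal sketch (the class and method names are our own, hypothetical examples, and it assumes JUnit 4 is also on the classpath): the same helper method is checked for correctness with a JUnit test and measured for speed with a JMH benchmark.

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;

public class StringJoinExample {

    // The helper we want to test and benchmark.
    static String join(String a, String b) {
        return a + "," + b;
    }

    // JUnit answers: does it produce the right result?
    @Test
    public void joinsWithComma() {
        assertEquals("a,b", join("a", "b"));
    }

    // JMH answers: how fast does it produce the result?
    // Returning the value lets JMH consume it, so it is not optimized away.
    @Benchmark
    public String joinBenchmark() {
        return join("a", "b");
    }
}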

Writing your first JMH benchmark

Start by creating a Maven project whose pom.xml contains the dependencies below.

<dependency>
       <groupId>org.openjdk.jmh</groupId>
       <artifactId>jmh-core</artifactId>
       <version>1.22</version>
</dependency>

<dependency>
       <groupId>org.openjdk.jmh</groupId>
       <artifactId>jmh-generator-annprocess</artifactId>
       <version>1.22</version>
</dependency>

The ideal approach is to create a separate project/class for your JMH benchmarks instead of writing them inside the application itself. So, let's start by creating a class called BenchMark to hold the JMH benchmarks.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 2)
public class BenchMark {

  // benchmark methods to be added.

}

Don't worry about the unfamiliar annotations in the above code; we will explain each of them in detail as we go along. The @BenchmarkMode annotation specifies the mode in which you want to measure performance. The five available benchmark modes are listed below (a short sketch combining modes follows the list).

  1. Throughput -> Use Throughput mode if you want to measure the number of operations per unit of time. An operation corresponds to one execution of the benchmark method.
  2. AverageTime -> Use AverageTime mode if you want to measure the average time taken by one execution of the benchmark method.
  3. SampleTime -> Use SampleTime mode if you want to sample the execution time of the benchmark method, giving you a distribution that includes min/max and percentiles.
  4. SingleShotTime -> Use SingleShotTime mode if you want to measure the time of a single execution of the benchmark method, with no warm-up; this is useful for cold-start measurements.
  5. All -> Use All mode if you want to measure everything covered by the previous 4 benchmark modes.
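These modes are not mutually exclusive. As a small sketch (the class and method here are hypothetical, not part of our running example), @BenchmarkMode also accepts an array of modes, and JMH will then measure the method in each of them:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

// Measure both throughput (ops/us) and average time (us/op) for the same method.
@BenchmarkMode({Mode.Throughput, Mode.AverageTime})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class ModeExample {

    @Benchmark
    public void allocateSmallArray(Blackhole bh) {
        bh.consume(new int[100]); // trivial workload, just for illustration
    }
}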

The @OutputTimeUnit annotation specifies the time unit in which benchmark results should be displayed. The @Fork annotation sets the default forking parameters and is used to control benchmark execution. We can specify parameters such as value, warmups and jvmArgs with @Fork. The value parameter specifies the number of separate JVM forks in which the benchmark should be executed. The warmups parameter specifies the number of additional warm-up forks, which are not counted towards the actual benchmark results. The jvmArgs parameter specifies the JVM arguments to be used for the forked benchmark processes.
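For instance, here is a hedged sketch of a fully parameterized @Fork (the class name and the specific values are arbitrary examples, not the settings used later in this article):

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 2,                        // run the benchmark in 2 separate JVM forks
      warmups = 1,                      // plus 1 warm-up fork whose results are discarded
      jvmArgs = {"-Xms2G", "-Xmx2G"})   // JVM arguments for the forked processes
public class ForkExample {

    @Benchmark
    public void baseline(Blackhole bh) {
        bh.consume(System.nanoTime());
    }
}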

As we discussed earlier, micro-benchmarks target low-level code: a method or a particular block of code. Each benchmark that we want to measure is placed inside a method annotated with the @Benchmark annotation. For instance, refer to the benchmark method below:

@Benchmark
public void testMethod1(Blackhole bh){
   int[] array = new int[2000];
   bh.consume(array);
}

In the above method, we simply allocate an array of 2000 primitive int elements; the intent is to benchmark the allocation of a primitive array. You might be wondering what a Blackhole is and why it is required in the first place. Let's assume the above method were written like this:

@Benchmark
public void testMethod1(){
   int[] array = new int[2000];
}

Now when you look at the benchmark results, you might come across inconsistent numbers. In fact, there is nothing to benchmark here! Why? Because the array variable is never referenced or used anywhere, the JIT compiler is free to treat the allocation as dead code and eliminate it entirely; the JVM is simply applying its usual optimizations.

So, how do we benchmark the above method without the JVM optimizing it away? That's where Blackhole comes into the picture: a Blackhole consumes values, thereby convincing the JVM that the value is actually being used. Now, for demonstration, let's write another benchmark method for String array allocation.

@Benchmark
public void testMethod2(Blackhole bh){
  String[] array = new String[2000];
  bh.consume(array);
}

Now let’s say we have a benchmark method as shown below:

@Benchmark
public int testMethod3(){
  int a = 100;
  int b = 200;
  int sum = a + b;
  return sum;
}

In this case, our benchmark method returns a value and JMH consumes it, so there is no need for a Blackhole. At first glance this looks perfectly fine, but there is still a chance of JVM interference. As you can see in the above code, we use constants to compute the sum, so the JIT compiler can constant-fold the expression and simply return 300 instead of performing the actual addition. We would therefore not really be evaluating the work we intended to benchmark. How do we solve this problem? By moving the constants into a benchmark state. A benchmark state can be defined as a separate class or as an inner class, and the variables we move into it are called state variables. Let's say we have a state class as shown below:

@State(Scope.Thread)
public static class BenchmarkState {
  public int a = 100;
  public int b = 200;
}

Then our benchmark method can be re-written as shown below:

@Benchmark
public int testMethod3(BenchmarkState state){
  return state.a + state.b;
}

The @State annotation accepts three different scopes for the benchmark state (a brief sketch follows the list).

  • Thread -> State instances are created for every thread that runs the benchmark.
  • Group ->  State instances are created for every thread group that runs the benchmark.
  • Benchmark -> All the threads share the same state object for the benchmark process.
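As a rough sketch of the difference between the scopes (the class names and fields below are hypothetical), a Scope.Benchmark state is shared by all threads, whereas a Scope.Thread state gives each thread its own copy:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

public class StateScopeExample {

    // One instance shared by every thread running the benchmark.
    @State(Scope.Benchmark)
    public static class SharedState {
        public Map<String, String> cache = new ConcurrentHashMap<>();
    }

    // One instance per benchmark thread, so no synchronization is needed.
    @State(Scope.Thread)
    public static class PerThreadState {
        public int counter = 0;
    }
}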

So, the preferred approach is to keep constants out of the benchmark method to eliminate such JVM optimizations, and the @State annotation plays a big role here.
Let's go back to our earlier benchmark implementations, repeated below for reference.

@Benchmark
public void testMethod1(Blackhole bh){
  int[] array = new int[2000];
  bh.consume(array);
}

@Benchmark
public void testMethod2(Blackhole bh){
  String[] array = new String[2000];
  bh.consume(array);
}

Let us compare the benchmark methods testMethod1 and testMethod2 in terms of performance. Since we set the benchmark mode to Mode.Throughput earlier, the throughput of these two benchmark methods will be measured. Let us run them from the main method itself. We use Options / OptionsBuilder to pass the JMH options to the runner; we just have to mention the benchmark class, and the benchmark methods will be picked up automatically since we tagged them with the @Benchmark annotation. Refer to the code below:

public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(BenchMark.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
}
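The OptionsBuilder exposes more knobs than we use here. As a hedged sketch (these particular settings are illustrative and were not used for the output below), you can also control the warm-up and measurement iterations, their duration, the mode and the output time unit from the runner itself:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

public class RunnerExample {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(BenchMark.class.getSimpleName())
                .forks(1)
                .warmupIterations(3)                     // fewer warm-up iterations
                .warmupTime(TimeValue.seconds(5))        // 5 s per warm-up iteration
                .measurementIterations(5)                // measurement iterations
                .measurementTime(TimeValue.seconds(5))   // 5 s per measurement iteration
                .mode(Mode.AverageTime)                  // override the annotated mode
                .timeUnit(TimeUnit.MICROSECONDS)         // override the output time unit
                .build();

        new Runner(opt).run();
    }
}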

Once you execute the main method, you get a result like the one shown below:

"C:\Program Files\Java\jdk1.8.0_191\bin\java.exe" -javaagent:C:\Users\rahul.raj\Downloads\ideaIC-2019.3.win\lib\idea_rt.jar=63365:C:\Users\rahul.raj\Downloads\ideaIC-2019.3.win\bin -Dfile.encoding=UTF-8 -classpath "C:\Program Files\Java\jdk1.8.0_191\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\access-bridge-64.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\cldrdata.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\jaccess.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\jfxrt.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\nashorn.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\sunpkcs11.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\ext\zipfs.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\jce.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\jfr.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\jfxswt.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\resources.jar;C:\Program Files\Java\jdk1.8.0_191\jre\lib\rt.jar;C:\Users\rahul.raj\IdeaProjects\JmhBenchMark\target\classes;C:\Users\rahul.raj\.m2\repository\org\openjdk\jmh\jmh-core\1.21\jmh-core-1.21.jar;C:\Users\rahul.raj\.m2\repository\net\sf\jopt-simple\jopt-simple\4.6\jopt-simple-4.6.jar;C:\Users\rahul.raj\.m2\repository\org\apache\commons\commons-math3\3.2\commons-math3-3.2.jar;C:\Users\rahul.raj\.m2\repository\org\openjdk\jmh\jmh-generator-annprocess\1.21\jmh-generator-annprocess-1.21.jar" com.benchmark.BenchMark
# JMH version: 1.21
# VM version: JDK 1.8.0_191, Java HotSpot(TM) 64-Bit Server VM, 25.191-b12
# VM invoker: C:\Program Files\Java\jdk1.8.0_191\jre\bin\java.exe
# VM options: -Xms2G -Xmx2G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.benchmark.BenchMark.testMethod1

# Run progress: 0.00% complete, ETA 00:03:20
# Fork: 1 of 1
# Warmup Iteration   1: 1398.239 ops/ms
# Warmup Iteration   2: 1288.721 ops/ms
# Warmup Iteration   3: 1367.695 ops/ms
# Warmup Iteration   4: 1502.809 ops/ms
# Warmup Iteration   5: 1656.859 ops/ms
Iteration   1: 1672.573 ops/ms
Iteration   2: 1618.713 ops/ms
Iteration   3: 1589.281 ops/ms
Iteration   4: 1645.972 ops/ms
Iteration   5: 1636.542 ops/ms


Result "com.benchmark.BenchMark.testMethod1":
  1632.616 ±(99.9%) 119.643 ops/ms [Average]
  (min, avg, max) = (1589.281, 1632.616, 1672.573), stdev = 31.071
  CI (99.9%): [1512.973, 1752.260] (assumes normal distribution)


# JMH version: 1.21
# VM version: JDK 1.8.0_191, Java HotSpot(TM) 64-Bit Server VM, 25.191-b12
# VM invoker: C:\Program Files\Java\jdk1.8.0_191\jre\bin\java.exe
# VM options: -Xms2G -Xmx2G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.benchmark.BenchMark.testMethod2

# Run progress: 50.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 1637.763 ops/ms
# Warmup Iteration   2: 1636.405 ops/ms
# Warmup Iteration   3: 1670.132 ops/ms
# Warmup Iteration   4: 1419.568 ops/ms
# Warmup Iteration   5: 1624.282 ops/ms
Iteration   1: 1265.384 ops/ms
Iteration   2: 1203.591 ops/ms
Iteration   3: 1271.569 ops/ms
Iteration   4: 1277.881 ops/ms
Iteration   5: 1208.284 ops/ms


Result "com.benchmark.BenchMark.testMethod2":
  1245.342 ±(99.9%) 139.698 ops/ms [Average]
  (min, avg, max) = (1203.591, 1245.342, 1277.881), stdev = 36.279
  CI (99.9%): [1105.643, 1385.040] (assumes normal distribution)


# Run complete. Total time: 00:03:21

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark               Mode  Cnt     Score     Error   Units
BenchMark.testMethod1  thrpt    5  1632.616 ± 119.643  ops/ms
BenchMark.testMethod2  thrpt    5  1245.342 ± 139.698  ops/ms

The results are as expected, since allocating a primitive int array is faster than allocating a String array. However, note that these numbers can vary depending on your hardware. No code is magic without good hardware!

JMH does a great job of helping you troubleshoot performance bottlenecks, since it lets you narrow the problem down to a specific block of code; anyone who has worked on customer-facing production applications knows how valuable that is.

The possibilities of JMH don't end here. Refer to the JMH documentation and samples for more advanced use cases, and you will take your troubleshooting skills to another level 🙂

So, that's it for now, and we hope you find this article useful!
