January 7, 2015

Performance Optimization on String Concatenation

Java has many powerful built-in classes that we use daily. For small programs they work well, but with large-scale data manipulation even simple code built on them can cause serious performance problems, and we have to look for optimizations.

Problem statement:

String is one of the powerful classes in the java.lang package. It is an immutable class, so any operation that would change a string creates a new object instead. Code that alters strings repeatedly therefore creates a performance bottleneck, which we can avoid by following a few simple steps.
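As a minimal illustration of that immutability, the sketch below appends to a string and shows that the original object is left untouched. The class and method names are made up for this example.

```java
// Illustrative only: ImmutabilityDemo and appendSuffix are hypothetical names.
class ImmutabilityDemo {

    // "Changing" a String really means building a new String object.
    static String appendSuffix(String base, String suffix) {
        return base.concat(suffix);
    }

    public static void main(String[] args) {
        String original = "data";
        String changed = appendSuffix(original, "-v2");

        System.out.println(original);            // prints "data"
        System.out.println(changed);             // prints "data-v2"
        System.out.println(original == changed); // prints "false": two distinct objects
    }
}
```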

Here we discuss concatenation and see how a simple program can result in a performance bottleneck.

Exploring String Concatenation:

Two or more strings can be concatenated in the following ways.

1) Using the [+] plus operator.
2) Using the concat method of String Class.
3) Using the append method of StringBuilder Class.

String concatenation using plus [+] operator.

Developers prefer this way of concatenating strings. They do not even have to think about the types of the operands, because the operator does the job without any hassle. Let’s consider some cases and check whether this is the right way to use the operator.
Consider the following simple code:

public static void main(String[] args) {
	int number = 8;
	String temp = number + " CONCAT";
}

This code looks clean and simple to understand. Let’s look at the bytecode generated for the concatenation line, which is what the Java runtime environment actually executes.

1)        NEW java/lang/StringBuilder
2)        DUP
3)        ILOAD 1
4)        INVOKESTATIC java/lang/String.valueOf (I)Ljava/lang/String;
5)        INVOKESPECIAL java/lang/StringBuilder.<init> (Ljava/lang/String;)V
6)        ALOAD 2
7)        INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
8)        INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;

NOTE: The String.valueOf call (step 4) and the StringBuilder.append call (step 7) are the ones that affect performance.

Explanation of above byte code

  • At step 3: The int value 8 is loaded.
  • At step 4: The integer is converted to a String using the String.valueOf method.
  • At step 5: A StringBuilder object is created, with the String produced at step 4 (“8”) passed as the constructor argument.
  • At step 6: The String “ CONCAT” is loaded.
  • At step 7: The append method of StringBuilder is called.
  • At step 8: The toString method of StringBuilder is called and the resulting value is returned to the caller.

So in the above case, performance suffers if this kind of concatenation runs a large number of times. The String.valueOf and StringBuilder calls in the bytecode above are the main cause.
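Read back into Java source, the bytecode steps above correspond roughly to the following. This is a hand-written equivalent, not decompiler output, and the class and method names are hypothetical.

```java
// Illustrative only: PlusOperatorExpanded and concatExpanded are hypothetical names.
class PlusOperatorExpanded {

    // Hand-written equivalent of: String temp = number + " CONCAT";
    static String concatExpanded(int number, String suffix) {
        String numberAsString = String.valueOf(number);            // step 4
        StringBuilder builder = new StringBuilder(numberAsString); // steps 1, 2 and 5
        builder.append(suffix);                                    // steps 6 and 7
        return builder.toString();                                 // step 8
    }

    public static void main(String[] args) {
        System.out.println(concatExpanded(8, " CONCAT")); // prints "8 CONCAT"
    }
}
```

Written this way, the extra objects become visible: one String for the number, one StringBuilder, and one final String.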

Problem Explanation in above bytecode

The performance caveats are as follows.

  • The String.valueOf method creates two objects to convert the primitive into a String object:
      a) An array of char is created to hold the digits of the number.
      b) A String object is created, and the char array is copied into it.

  • StringBuilder appends the String: StringBuilder keeps an internal char array for the appended text. On each append, the size of that array is checked and, if necessary, expanded to accommodate the new string.

How to use plus operator for String concatenation

public static void main(String[] args) {
	String temp = "FIRST" + " SECOND";
}

In this case too the code looks great, but let’s analyze the bytecode.

L0
LINENUMBER 7 L0
LDC "FIRST SECOND"
ASTORE 1

The line LDC “FIRST SECOND” shows that Java applies a compile-time optimization here: the constants are already joined when the code is compiled.

So the plus operator is at its best when used with String constants. If not all operands are constants, at least the leftmost operand should be a String constant.
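One way to observe the compile-time folding without reading bytecode is to compare object identity. The sketch below deliberately uses == on strings: a folded constant is the same interned object as the literal, while a string built at run time is a new object. The class and method names are hypothetical.

```java
// Illustrative only: ConstantFoldingDemo and its methods are hypothetical names.
class ConstantFoldingDemo {

    static final String FIRST = "FIRST"; // a compile-time constant

    // Both operands are literals, so the compiler emits the single constant "FIRST SECOND".
    static boolean literalsFolded() {
        String folded = "FIRST" + " SECOND";
        return folded == "FIRST SECOND"; // identity check on purpose
    }

    // A static final String initialized with a literal is folded as well.
    static boolean constantsFolded() {
        String folded = FIRST + " SECOND";
        return folded == "FIRST SECOND";
    }

    // A method parameter is not a constant, so the concatenation happens at run time.
    static boolean variableFolded(String first) {
        String built = first + " SECOND";
        return built == "FIRST SECOND";
    }

    public static void main(String[] args) {
        System.out.println(literalsFolded());        // prints "true"
        System.out.println(constantsFolded());       // prints "true"
        System.out.println(variableFolded("FIRST")); // prints "false"
    }
}
```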

String concatenation using StringBuilder

StringBuilder uses its append method to concatenate strings. Internally, it creates a char array and appends the Strings to it. The way StringBuilder appends strings is described below.

Problem Explanation

It checks whether the internal char array has sufficient space. If not, it expands the array as follows.

  • The internal char array is expanded to the length [(CURRENT_LENGTH + 1)*2].
  • If [(CURRENT_LENGTH + 1)*2] is negative (integer overflow), the maximum Integer value is used as the new length.
  • If the length required for the new String is greater than [(CURRENT_LENGTH + 1)*2], the array is expanded to that required length instead.

This expansion logic makes it clear that an append which outgrows the buffer allocates a new, larger char array, copies the existing characters into it, and leaves the old array for garbage collection, which slows the append process down.
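The three expansion rules listed above can be sketched as a small function. This is only a restatement of those rules under the assumption that CURRENT_LENGTH is the current size of the internal array; it is not the JDK source, and the names are hypothetical.

```java
// Illustrative only: CapacitySketch and newCapacity are hypothetical names,
// written from the three rules in the text rather than copied from the JDK.
class CapacitySketch {

    // currentLength:  size of the internal char array before the append
    // requiredLength: number of chars that must fit after the append
    static int newCapacity(int currentLength, int requiredLength) {
        int doubled = (currentLength + 1) * 2; // rule 1: (CURRENT_LENGTH + 1) * 2
        if (doubled < 0) {
            return Integer.MAX_VALUE;          // rule 2: overflow made the value negative
        }
        if (requiredLength > doubled) {
            return requiredLength;             // rule 3: doubling is still not enough
        }
        return doubled;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 17));            // prints "34"
        System.out.println(newCapacity(16, 100));           // prints "100"
        System.out.println(newCapacity(1_500_000_000, 10)); // prints "2147483647"
    }
}
```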

How to use StringBuilder for concatenation

  • Allocate enough space to the StringBuilder up front, so that it does not have to expand the internal char array on each append.
  • If the current string is small and the appended string is longer than [(CURRENT_LENGTH + 1)*2], the internal char array is expanded only to the required length, which leaves no spare room for later appends.
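A minimal sketch of the first recommendation, assuming the total length can be computed before appending. The constructor StringBuilder(int capacity) and the capacity() method are part of the standard API; the class and method names are hypothetical.

```java
// Illustrative only: PresizedBuilderDemo and joinPresized are hypothetical names.
class PresizedBuilderDemo {

    // Sizes the builder once, so no append has to expand the internal array.
    static StringBuilder joinPresized(String... parts) {
        int totalLength = 0;
        for (String part : parts) {
            totalLength += part.length();
        }
        StringBuilder builder = new StringBuilder(totalLength); // initial capacity
        for (String part : parts) {
            builder.append(part);
        }
        return builder;
    }

    public static void main(String[] args) {
        StringBuilder builder = joinPresized("alpha", "beta", "gamma");
        System.out.println(builder);            // prints "alphabetagamma"
        System.out.println(builder.capacity()); // prints "14": the capacity never grew
    }
}
```

When the exact length is not known, a reasonable upper estimate serves the same purpose at the cost of some unused space.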

String concatenation using String.concat method

The concat method on the String class takes the current string, appends the specified string, and returns a new String. Internally, String.concat creates a new char array with enough space for both strings, copies both strings into it, and then creates a new String around that array using a special package-protected constructor. The public constructor would create a duplicate array and copy the characters again; the package-protected constructor shares the char array instead. String.concat is therefore optimized: it only requires the creation of one new String and one new char array.
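The description above can be approximated in user code as follows. The sketch cannot reach the package-protected constructor, so it ends with the public String(char[]) constructor and pays for one extra copy that the real method avoids. The class and method names are hypothetical.

```java
// Illustrative only: ConcatSketch mirrors the description in the text,
// it is not the JDK implementation of String.concat.
class ConcatSketch {

    static String concat(String left, String right) {
        if (right.isEmpty()) {
            return left; // String.concat also returns the receiver for an empty argument
        }
        // One array with room for both strings.
        char[] buffer = new char[left.length() + right.length()];
        left.getChars(0, left.length(), buffer, 0);
        right.getChars(0, right.length(), buffer, left.length());
        // The public constructor copies the array; the internal one shares it.
        return new String(buffer);
    }

    public static void main(String[] args) {
        System.out.println(concat("FIRST", " SECOND")); // prints "FIRST SECOND"
    }
}
```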

How to use concat method for concatenation

Use the concat method only when exactly two String variables have to be concatenated. If value1 and value2 are both non-constant, use concat: it is considerably faster in this case, because the compile-time optimization for the + operator no longer applies and the StringBuilder used in its place brings the overhead of its internal buffer.

String temp = value1.concat(value2);

Performance Results for Concatenation under various scenarios

1)      Concatenating Two Constants

The results below show that the [+] operator is much faster than the other methods.

Type        Input count   [+] operator   concat()   append()
Constants   2             59             4285       4981

2)      Concatenating Two Variables

The results below show that the concat method is faster than the other methods.

Type        Input count   [+] operator   concat()   append()
Variables   2             7292           4486       5281

3)      Concatenating Four Constants

Type        Input count   [+] operator   concat()   append()
Constants   4             52             14370      7529

4)      Concatenating Four Variables

Here the append() method gives the best results.

Type        Input count   [+] operator   concat()   append()
Variables   4             23158          16725      9883

Note: All times are total milliseconds for 100000000 iterations.
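For readers who want to try a comparison of this kind, the sketch below shows one simple way to time the three approaches for two variables. It is not the harness that produced the numbers above, all names are hypothetical, and a naive loop like this is sensitive to JIT warm-up and other runtime effects, so absolute values will differ between machines and Java versions.

```java
import java.util.function.BinaryOperator;

// Illustrative only: ConcatTimingSketch is a hypothetical, simplified timing loop.
class ConcatTimingSketch {

    // Runs the operation `iterations` times and returns the elapsed milliseconds.
    static long timeMillis(BinaryOperator<String> operation, String a, String b, int iterations) {
        long checksum = 0; // uses every result so the loop body cannot be dropped entirely
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += operation.apply(a, b).length();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (checksum != (long) iterations * (a.length() + b.length())) {
            throw new IllegalStateException("unexpected result length");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // far fewer than in the article, to keep the run short
        String value1 = args.length > 0 ? args[0] : "value1";
        String value2 = args.length > 1 ? args[1] : "value2";

        System.out.println("[+] operator: " + timeMillis((x, y) -> x + y, value1, value2, iterations) + " ms");
        System.out.println("concat():     " + timeMillis(String::concat, value1, value2, iterations) + " ms");
        System.out.println("append():     " + timeMillis(
                (x, y) -> new StringBuilder().append(x).append(y).toString(),
                value1, value2, iterations) + " ms");
    }
}
```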

Based on the performance results above, we can conclude that the + operator concatenates constant (final) String objects at compile time and uses StringBuilder internally in all other cases. We suggest the String.concat method when two String objects have to be concatenated, and StringBuilder when many String objects are concatenated.