
Guava Splitter vs StringUtils

So I recently wrote a post about good old reliable Apache Commons StringUtils, which provoked a couple of comments, one of which was that Google Guava provides better mechanisms for joining and splitting Strings. I have to admit, this is a corner of Guava I’ve yet to explore. So I thought I ought to take a closer look and compare it with StringUtils, and I was surprised at what I found.

Splitting strings eh? There can’t be many different ways of doing this surely?

Well, Guava and StringUtils do take a stylistically different approach. Let’s start with the basic usage.
 

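import org.apache.commons.lang3.StringUtils; // Commons Lang 3 package
import com.google.common.base.Splitter;

// Apache StringUtils...
String[] tokens1 = StringUtils.split("one,two,three", ',');

// Guava Splitter...
Iterable<String> tokens2 = Splitter.on(',').split("one,two,three");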

So, my first observation is that Splitter is more object orientated: you have to create a Splitter object, which you then use to do the splitting, whereas the StringUtils split methods use static utility methods in a more functional style.
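To see the static, eager style in runnable form without adding Commons Lang to the classpath, here is a JDK-only sketch of what StringUtils.split(String, char) does according to its Javadoc: adjacent separators are treated as one, and the complete array is built before the method returns. EagerSplit is a hypothetical name used for illustration; it is not the library's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical JDK-only helper mimicking the documented behaviour of
// StringUtils.split(String, char): adjacent separators are treated as one,
// so no empty tokens are produced, and every token is created eagerly.
final class EagerSplit {
    private EagerSplit() {}

    static String[] split(String str, char separator) {
        if (str == null) {
            return null;                // StringUtils.split(null, c) returns null
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= str.length(); i++) {
            // Treat the end of the string as a final separator
            if (i == str.length() || str.charAt(i) == separator) {
                if (i > start) {        // skip empty runs between separators
                    tokens.add(str.substring(start, i));
                }
                start = i + 1;
            }
        }
        // All substrings already exist by the time the array is returned
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("one,two,three", ',')));
        // Adjacent separators collapse: no empty token between them
        System.out.println(Arrays.toString(split("one,,two", ',')));
    }
}
```

One behavioural difference worth knowing: by default Guava's Splitter.on(',') keeps empty strings, so splitting "one,,two" gives "one", "", "two", whereas StringUtils.split gives just "one" and "two".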

Here I much prefer Splitter. Need a reusable splitter that splits comma-separated lists? A splitter that also trims leading and trailing whitespace and ignores empty elements? Not a problem:

Splitter niceCommaSplitter = Splitter.on(',')
                              .omitEmptyStrings()
                              .trimResults();

niceCommaSplitter.split("one,, two,  three"); // "one", "two", "three"
niceCommaSplitter.split("  four  ,  five  "); // "four", "five"
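Here is a rough, JDK-only imitation of that fluent configuration, assuming nothing beyond the standard library. MiniSplitter is hypothetical; like Guava's Splitter it is immutable, so every configuration call returns a new instance and a configured splitter can be stored and shared. It differs from Guava in two deliberate ways: it returns a fully built List rather than a lazy Iterable, and it trims with String.trim() rather than Guava's whitespace matcher.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "configure once, reuse many times" style.
// Each configuration method returns a NEW immutable instance.
final class MiniSplitter {
    private final char separator;
    private final boolean omitEmpty;
    private final boolean trim;

    private MiniSplitter(char separator, boolean omitEmpty, boolean trim) {
        this.separator = separator;
        this.omitEmpty = omitEmpty;
        this.trim = trim;
    }

    static MiniSplitter on(char separator) {
        return new MiniSplitter(separator, false, false);
    }

    MiniSplitter omitEmptyStrings() {
        return new MiniSplitter(separator, true, trim);
    }

    MiniSplitter trimResults() {
        return new MiniSplitter(separator, omitEmpty, true);
    }

    // Eager: builds the whole list before returning (unlike Guava's Iterable)
    List<String> split(String input) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            if (i == input.length() || input.charAt(i) == separator) {
                String token = input.substring(start, i);
                if (trim) {
                    token = token.trim();   // trim first, so "  " counts as empty
                }
                if (!(omitEmpty && token.isEmpty())) {
                    out.add(token);
                }
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MiniSplitter niceCommaSplitter = MiniSplitter.on(',')
                .omitEmptyStrings()
                .trimResults();
        System.out.println(niceCommaSplitter.split("one,, two,  three")); // [one, two, three]
        System.out.println(niceCommaSplitter.split("  four  ,  five  ")); // [four, five]
    }
}
```

Because trimming happens before the emptiness check, the order of the omitEmptyStrings() and trimResults() calls doesn't matter here, which matches how Guava documents the combination.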

That looks really useful. Any other differences?

The other thing to notice is that Splitter returns an Iterable<String>, whereas StringUtils.split returns a String array.

I don’t really see that making much of a difference; most of the time I just want to loop through the tokens in order anyway!

I also didn’t think it was a big deal until I examined the performance of the two approaches. To do this, I ran the following code:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";

long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    StringUtils.split(numberList, ',');
}
System.out.println(System.currentTimeMillis() - start);

start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Splitter.on(',').split(numberList);
}
System.out.println(System.currentTimeMillis() - start);

On my machine this output the following times (in milliseconds):

594
31

Guava’s Splitter is almost 20 times faster!

Now this is a much bigger difference than I was expecting. How can this be? Well, I suspect it’s something to do with the return type. Splitter returns an Iterable<String>, whereas StringUtils.split gives you an array of Strings! So Splitter doesn’t actually need to create any new String objects up front.

It’s also worth noting you can cache your Splitter object, which results in an even faster runtime.
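The standard library has a close analogue of caching a splitter: a compiled java.util.regex.Pattern is also immutable and thread-safe, so it can live in a static final constant rather than being rebuilt inside the loop. The sketch below contrasts the two approaches; CachedSplitting and its method names are hypothetical. With Guava, the equivalent would be a private static final Splitter field built once with Splitter.on(',').

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// JDK analogue of caching a Splitter. Pattern, like Guava's Splitter,
// is immutable and safe to share, so it can be built once and reused.
final class CachedSplitting {
    // Built once, when the class is initialised
    private static final Pattern COMMA = Pattern.compile(",");

    private CachedSplitting() {}

    // Re-creates the splitting object on every call,
    // like calling Splitter.on(',') inside the loop
    static String[] splitFresh(String input) {
        return Pattern.compile(",").split(input);
    }

    // Reuses the shared, pre-built instance
    static String[] splitCached(String input) {
        return COMMA.split(input);
    }

    public static void main(String[] args) {
        String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
        System.out.println(Arrays.toString(splitCached(numberList)));
    }
}
```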

Blimey, end of argument? Guava’s Splitter wins every time?

Hold on a second. This isn’t quite the full story. Notice we’re not actually doing anything with the resulting Strings? As I mentioned, it looks like the Splitter isn’t actually creating any new Strings. I suspect it’s deferring this work to the Iterator object it returns.
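The deferral idea is easy to demonstrate without Guava. The hypothetical LazySplit class below is a simplified stand-in for the Iterable that Splitter.split() returns, not Guava's actual code: its iterator does no work until next() is called, and a counter records how many substrings have been created. As far as I can tell from Guava's source, its Iterable likewise starts a fresh split every time iterator() is called, and this sketch copies that behaviour.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy splitter that counts the substrings it creates,
// to test the idea that the real work happens during iteration.
final class LazySplit implements Iterable<String> {
    static int substringsCreated = 0;   // demo counter, not thread-safe

    private final String input;
    private final char separator;

    LazySplit(String input, char separator) {
        this.input = input;             // just stores the input: no splitting yet
        this.separator = separator;
    }

    @Override
    public Iterator<String> iterator() {
        // A fresh iterator each time: iterating twice splits the string twice
        return new Iterator<String>() {
            private int start = 0;
            private boolean done = false;

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                int end = input.indexOf(separator, start);
                if (end < 0) {          // last token runs to the end of the string
                    end = input.length();
                    done = true;
                }
                String token = input.substring(start, end);
                substringsCreated++;    // the work happens here, on demand
                start = end + 1;
                return token;
            }
        };
    }

    public static void main(String[] args) {
        Iterable<String> tokens = new LazySplit("One,Two,Three", ',');
        System.out.println("after split:   " + substringsCreated);
        for (String t : tokens) {
            t.length();
        }
        System.out.println("after 1 loop:  " + substringsCreated);
        for (String t : tokens) {
            t.length();
        }
        System.out.println("after 2 loops: " + substringsCreated);
    }
}
```

Run as written, it prints 0, then 3, then 6: constructing the Iterable creates nothing, and each full pass over it splits the string again.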

So can we test this?

Sure thing. Here’s some code that iterates over the tokens and checks the length of each generated substring:

final String numberList = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
long start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    final String[] numbers = StringUtils.split(numberList, ',');
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

Splitter splitter = Splitter.on(',');
start = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    Iterable<String> numbers = splitter.split(numberList);
    for (String number : numbers) {
        number.length();
    }
}
System.out.println(System.currentTimeMillis() - start);

On my machine this outputs:

609
2048

Guava’s Splitter is more than 3 times slower!

Indeed, I was expecting them to be about the same, or maybe Guava slightly faster, so this is another surprising result. It looks like by returning an Iterable, Splitter is trading immediate gains for longer-term pain. There’s also a moral here about making sure performance tests are actually testing something useful.
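As a sketch of what a more useful test might look like (an illustration, not a rigorous benchmark), the harness below consumes every token by summing the lengths and printing the total, so the JIT can't discard the work, and it gives each variant a warm-up pass first. It only uses JDK splitting so it runs stand-alone; the StringUtils and Splitter calls from the examples above could be plugged in as extra Function arguments, wrapping the StringUtils array with Arrays.asList. FairerBench and its method names are hypothetical, and for numbers you'd actually rely on, a dedicated harness such as JMH is the better choice.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical hand-rolled timing harness: every result is consumed
// and printed, and each variant is warmed up before being timed.
final class FairerBench {
    static final String NUMBER_LIST = "One,Two,Three,Four,Five,Six,Seven,Eight,Nine,Ten";
    private static final Pattern COMMA = Pattern.compile(",");

    private FairerBench() {}

    // Runs the splitter `iterations` times and returns the total token length
    static long sumLengths(Function<String, Iterable<String>> splitter, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            for (String token : splitter.apply(NUMBER_LIST)) {
                total += token.length();
            }
        }
        return total;
    }

    static void time(String label, Function<String, Iterable<String>> splitter) {
        sumLengths(splitter, 200_000);              // warm-up pass, result ignored
        long start = System.nanoTime();
        long total = sumLengths(splitter, 1_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Printing the total keeps the computation observable
        System.out.println(label + ": total=" + total + ", " + millis + " ms");
    }

    public static void main(String[] args) {
        // Eager: the whole array exists before the first token is read
        time("String.split (eager array)", s -> Arrays.asList(s.split(",")));
        // Stream-based: tokens are produced as the iterator advances
        time("Pattern.splitAsStream", s -> COMMA.splitAsStream(s)::iterator);
    }
}
```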

In conclusion, I think I’ll still use Splitter most of the time. On small lists the difference in performance is going to be negligible, and Splitter just feels much nicer to use. Still, I was surprised by the result, and if you’re splitting lots of Strings and performance is an issue, it might be worth considering switching back to Commons StringUtils.
 

Reference: Guava Splitter vs StringUtils from our JCG partner Tom Jefferys at Tom’s Programming Blog.

4 Comments
oussama zoghlami
12 years ago

Apart from Guava’s Splitter and Joiner, I think StringUtils is richer. It’s time for the Guava team to improve their ‘Strings’ utility class ;)

Karl Isenberg
12 years ago

It’s worth mentioning that the Splitter Iterator delays the actual splitting, whereas StringUtils.split does it all up front. This may make a difference in certain use cases, like when searching for the first match and still needing all the preceding values, but not the following ones, or when you only need a subset of the return values and never store the others to variables. It’s also a boon when parsing large strings as it doesn’t have to store the whole array in memory at one time. There might also be cases where returning an Iterator makes the code simpler than…

assylias
12 years ago

Beware of micro benchmarks: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java For example, it is conceivable that the second loop in your first example is simply ignored by the JVM because it does not have any side effects. There are many factors that could significantly affect your results.

Sam_Sonite
12 years ago

“There’s also a moral here about making sure performance tests are actually testing something useful.” Maybe follow your own advice. How is your test useful? Also, were your tests written in Groovy? All the strings have single quotes. Here are my results if you sum the lengths and print them out:
run:
39000000: 375
39000000: 427
Summing the lengths with a leading whitespace in one value:
run:
40000000: 357
40000000: 436
… and trimming the results:
run:
39000000: 456
39000000: 586
Not nearly as dramatic as you exclaim. Now, this difference is over 1 million trials so the only question…
