November 2012

Imagine the situation you have to calculate a number of occurrence of the words in a text. The classic approach is to traverse through the text with iterator(or collection) and increment the counter (stored in a map) for each occurrence of string

        public Map<String, Integer> count(Iterable<String> input) {
            Map<String, Integer> result = new HashMap<String, Integer>();
            for (String word : input) {
                Integer integer = result.get(word);
                if (integer == null) result.put(word, 1);
                else result.put(word, integer + 1);
            }
            return result;
        }

Since we can't create an instance of a Map parametrized with String key and int value we have to use Integer and rely on boxing/unboxing. And in cases we have a plenty of repeating strings the boxing/unboxing of integers at the line 6(result.put(word, integer + 1)) has a significant impact on the speed of execution. But this can be easily optimized. There are several approaches on how to eliminate this drawback:

Use com.google.common.collect.Multimap in order to count repetitions. It fits this task very well, but if you wanna calculate a bit more sophisticated metric on a String (:String => :Integer) the Multimap doesn't help.
We can use TObjectIntMap by Trove, it's fast (In general if you need to work with collections of primitive types I would recommend Trove).

Create a wrapper (let's call it MutableInteger) to avoid boxing and unboxing (as alternative you can use Apache's commons implementation). So, our code turns to be:

                public Map<String, MutableInteger> count(Iterable<String> input) {
                Map<String, MutableInteger> result = new HashMap<String, MutableInteger>();
                    for (String word : input) {
                        MutableInteger integer = result.get(word);
                        if (integer == null) result.put(word, new MutableInteger(1));
                        else integer.value +=1;
                    }
                    return result;
                }

                class MutableInteger {
                    int value;

                    MutableInteger(int value) {
                        this.value = value;
                    }

                    public int getValue() {
                        return value;
                    }
                }

Use the int[1] instead of Integer to map and your code will look like this

                public Map<String, int[]> count(Iterable<String> input) {
                    Map<String, int[]> result = new HashMap<String, int[]>();
                    for (String word : input) {
                        int[] integer = result.get(word);
                        if (integer == null) result.put(word, new int[]{1});
                        else integer[0] += 1;
                    }
                    return result;
                }

This approach kills the abstraction and doesn't encapsulate the value but the question is does it work faster than 3nd solution?

While the 1-4 solve the problem the 3rd and 4th approaches raise another question "Is the access to objects field faster than the access to element of array with index 0?"

Is the access to objects field faster than the access to element of array with index 0?

Let's try to solve this problem with synthetic tests

        public void accessToArray0(int[] array) {
            System.gc();
            for (int i = 0; i < Integer.MAX_VALUE; i++)
                array[0] += 1; //then write
        }
    
        public void accessToField(MutableInteger mutable) {
            System.gc();
            for (int i = 0; i < Integer.MAX_VALUE; i++)
                mutable.value += 1;
        }
    
        public long testFieldAccess() {
            long l;
            long delta;
            MutableInteger mutable = new MutableInteger(1);
            for (int i = 0; i < 5; i++) {
                accessToField(mutable);
            }
            l = System.currentTimeMillis();
            accessToField(new MutableInteger(0));
            delta = System.currentTimeMillis() - l;
            return delta;
        }
    
        public long testArrayAccess() {
            long l;
            long delta;
            int[] array = {1};
            for (int i = 0; i < 5; i++) {
                accessToArray0(array);
            }
            l = System.currentTimeMillis();
            accessToArray0(new int[]{0});
            delta = System.currentTimeMillis() - l;
            return delta;
        }
    
        public static void main(String[] args) {
            Calculate calculate = new Calculate();
            for (int i = 0; i < 5; i++) {
                System.out.println("Field: " + calculate.testFieldAccess());
                System.out.println("Array: " + calculate.testArrayAccess());
            }
/*
Field: 92
Array: 134
Field: 94
Array: 134
Field: 92
Array: 136
Field: 93
Array: 135
Field: 93
Array: 135
*/
        }

This test gives a 50% better results for an access to field, surprising isn't it? It next article I'll try to explain why does this happen.

Coding for Food

Wednesday, November 28, 2012

Mastering Java: Boxing vs MutableInteger vs int[], Part 1

Is the access to objects field faster than the access to element of array with index 0?

Thursday, November 8, 2012

Grails: MapReduce in Mongo