Thursday, July 25, 2013

» Post-Order Iterative DFS Traversal using Stack

Several days ago I had to traverse a huge graph in depth (depth-first-search) visiting edges in post-order. I was working on Kasaraju's algorithm to find strongly connected components of a graph.

Of course the default (natural) implementation of post-order DFS is very easy to be read and understand. In my implementation of I'm using callback in order to change the behaviour when node is visited - for example I can just print the nodes or add the into a collection without changing the algorithm.

    public void dfsPostOrderRecursive(AdjGraph graph, AdjGraph.Node vertex, 
                                                 Callback callback) {
        if (vertex.isVisited())
            return;

        vertex.markVisited();

        List<AdjGraph.Node> edges = graph.edges(vertex);
        for (AdjGraph.Node node : edges) {
            if (!node.isVisited())
                dfsPostOrderRecursive(graph, node, callback);
        }

        callback.nodeVisited(graph, vertex);  //post order call to callback
    }

The thing is, there were about a million of nodes in the graph. So it's possible to reach a stack depth of 1 million of nested calls while doing recursive DFS. The Java stack by default is not configured to work with such a huge amount of recursive calls...

So there were 2 possible ways to workaround the problem:

  1. Configure JVM stack to work with a million of nested calls. 
  2. Re-write the algorithm to be iterative. 
I'll be honest, at first I've tweaked the stack config. -Xss4m worked and the calculation has finished successfully without StackOverflowError.


java -Xmx1024m -Xss4m -classpath "...." MyApp

Iterative Post-Order DFS. 

The second approach is more generic and can be ported to any platform or language. I've googled if there any existing solution existed but found only algorithms for trees.

While I needed it to work for general ordered cyclic graphs I've decided to implement the algorithm by myself. Frankly it wasn't even hard but a bit trickier that pre-order.

Here you go, iterative post-order DFS using stack:

 public void dfsPostOrderIterative(AdjGraph graph, AdjGraph.Node vertex,
                                         Callback callback) {
        Stack<Level> toVisit = new Stack<Level>();
        toVisit.push(new Level(Collections.singletonList(vertex)));

        while (!toVisit.isEmpty()) {
            Level level = toVisit.peek();

            if (level.index >= level.nodes.size()) {
                toVisit.pop();
                continue;
            }

            AdjGraph.Node node = level.nodes.get(level.index);

            if (!node.isVisited()) {
                if (node.isChildrenExplored()) {
                    node.markVisited();
                    callback.nodeVisited(graph, node);
                    level.index++;
                } else {
                    List<AdjGraph.Node> edges = graph.edges(node);
                    List<AdjGraph.Node> outgoing = Lists.newArrayList
                                  (Collections2.filter(edges, new Predicate<AdjGraph.Node>() {
                        @Override
                        public boolean apply(AdjGraph.Node input) {
                            return !input.isChildrenExplored();
                        }
                    }));

                    if (outgoing.size() > 0)
                        toVisit.add(new Level(outgoing));
                    node.markChildrenExplored();
                }
            } else {
                level.index++;
            }
        }
    }

The main idea of iterative post-ordered DFS is to keep two states for each vertex:

  1. Track whether we explored it's children 
  2. If we finally visited the node. 
On each level of stack we are keep all the outgoing edges(in fact nodes) then iterate though them visiting those which haven't been visited yet.

Pretty simple and straightforward algorithm that might help you to go without -Xss option for JVM with a huge graph.

The complete DFS and Kasaraju's implementations can be found on my GitHub.


 In addition I'd like to recommend you to read a post on why accessing the field of an object is faster then a accessing an element of array. And also I'd recommend to take a look at Java Tuples library. Enjoy and thank you.

No comments:

Post a Comment