Mimic MapReduce using CyclicBarrier to count a billion digits of PI

Hi, There:

I have been playing with java.concurrent.CyclicBarrier. It occurred to me that since CyclicBarrier is designed for waiting for a fixed number of Threads to reach the Barrier before moving on with an optional Runnable action, I could use this feature to mimic a MapReduce process – a Poorman’s MapReduce if you will.

I decide to count the occurrence of 0-9 in the PI up to a billion of digits. For pre-treatment, I sliced the one billion digits into 100 files, each roughly of 10 MB. This mimics the HDFS, without sharding.

The main method creates a CyclicBarrier and injects it into 100 Runnable Counters which count the occurrences of 0-9 in each of those 100 files. The Barrier registers another Runnable Aggregator which will only start to run() after all 100 Counters have reached the barrier. The Aggregator simply sums up all the counts from the 100 Counter instances and outputs the final result in terms of percentage of 0-9. Again, this mimics MapReduce process’ aggregation step.

CyclicBarrier barrier = new CyclicBarrier(100, new Aggregator());
for (int i=0;i<100;i++) new Thread(new Counter(i,barrier)).start();

 

Counter.java


class Counter implements Runnable {

// ...

public void run() {

FileReader reader = null;

try {

reader = new FileReader("pi/pi-slicer." + i);

int c = -1;

while ((c = reader.read()) != -1) ;

int k = Character.getNumericValue(c);
map.put(k, 1 + map.getOrDefault(k,0));

}

Aggregator.aggregate(map);

barrier.await();

} catch (Exception ex){

// ...

} finally {

// ...

}

}

Aggregator.java


class Aggregator implements Runnable {

static Map<Integer, Integer> digitMap = new ConcurrentHashMap<Integer, Integer>();

public void run() {

// calculate percentage of occurrence of 0-9

}

}

}

The key point is that the CyclicBarrier controls the whole process by waiting for all 100 Counters to finish the Reduce process before the aggregator does the final calculation.

The performance improvement from this Poorman’s MapReduce is great, despite the overheads and IO contention. The processing time was reduced from a single process of ~60 to 15 seconds.

By the way, if you care to know, each digit of 0-9 occurs almost exactly 10% in all of these one billion digits – to the precision of 5 decimal points!

Best,

-Tony

 

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s