Mimic MapReduce using CyclicBarrier to count a billion digits of PI

Hi, There:

I have been playing with java.util.concurrent.CyclicBarrier. Since CyclicBarrier is designed to wait for a fixed number of threads to reach the barrier before moving on, optionally running a Runnable action at that point, it occurred to me that I could use this feature to mimic a MapReduce process: a Poorman’s MapReduce, if you will.

I decided to count the occurrences of the digits 0-9 in the first billion digits of PI. As pre-processing, I sliced the one billion digits into 100 files of roughly 10 MB each. This mimics HDFS, without sharding.

The main method creates a CyclicBarrier and injects it into 100 Runnable Counters, each of which counts the occurrences of 0-9 in one of those 100 files. The barrier also registers a Runnable Aggregator, which only starts to run() after all 100 Counters have reached the barrier. The Aggregator simply sums up the counts from the 100 Counter instances and outputs the final result as the percentage of each digit 0-9. Again, this mimics the aggregation step of a MapReduce process.

CyclicBarrier barrier = new CyclicBarrier(100, new Aggregator());
for (int i=0;i<100;i++) new Thread(new Counter(i,barrier)).start();



class Counter implements Runnable {

// ...

public void run() {
    FileReader reader = null;
    try {
        reader = new FileReader("pi/pi-slicer." + i);
        int c = -1;
        // Read one character at a time and tally it inside the loop body
        while ((c = reader.read()) != -1) {
            int k = Character.getNumericValue(c);
            map.put(k, 1 + map.getOrDefault(k, 0));
        }
    } catch (Exception ex) {
        // ...
    } finally {
        // ...
    }
}

}


class Aggregator implements Runnable {

static Map<Integer, Integer> digitMap = new ConcurrentHashMap<Integer, Integer>();

public void run() {
    // calculate percentage of occurrence of 0-9
}

}

The key point is that the CyclicBarrier controls the whole process by waiting for all 100 Counters to finish the Map step before the Aggregator does the final calculation.
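
For readers who want to try the idea without the 100 files, here is a minimal, self-contained sketch. It is not the code from my project: the class name BarrierSketch, the method countDigits and the use of in-memory strings instead of files are all assumptions made for illustration. Each worker counts its own chunk, then calls await(); the barrier action sums the partial counts exactly once, after the last worker arrives.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: one worker thread per chunk, one aggregation at the barrier.
class BarrierSketch {

    // Counts the digits 0-9 across all chunks; chunks must contain at least one element.
    static long[] countDigits(String[] chunks) {
        int workers = chunks.length;
        long[][] partial = new long[workers][10]; // one row per worker, nothing shared
        long[] total = new long[10];

        // The barrier action runs once, in the last thread to call await().
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            for (long[] row : partial) {
                for (int d = 0; d < 10; d++) total[d] += row[d];
            }
        });

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                // "Map" step: tally the digits of this worker's chunk only
                for (char ch : chunks[id].toCharArray()) {
                    if (ch >= '0' && ch <= '9') partial[id][ch - '0']++;
                }
                try {
                    barrier.await(); // report "my chunk is done" and wait for the others
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }

        try {
            // await() only returns after the barrier action has completed,
            // so once every worker has been joined the totals are final.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }
}
```

Calling BarrierSketch.countDigits(new String[] {"3141", "5926", "5358"}) returns an array indexed by digit. Giving each worker its own row avoids any locking during the counting step.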

The performance improvement from this Poorman’s MapReduce is great, despite the overhead and IO contention: the processing time dropped from ~60 seconds for a single process to ~15 seconds.

By the way, if you care to know, each of the digits 0-9 accounts for almost exactly 10% of these one billion digits, to a precision of 5 decimal places!





Convenience Method, Anyone?

Hi, There:

It’s been a while again. I am posting a quick one on the Java Convenience Method (variable arguments, or varargs), like this:

public static Integer addAll(Integer... nums) {
     Integer sum = 0;
     for (Integer num : nums) sum += num;
     return sum;
}

In comparison with a typical Array version:

public static Integer addAll(Integer[] nums) {
     Integer sum = 0;
     for (Integer num : nums) sum += num;
     return sum;
}

They both work fine, with a variable number of Integers and with an Integer[], respectively. But the problem is that if you put both of them in the same class, the compiler will complain with a “Duplicate Method” error. So it is apparent that the compiler treats the two method signatures the same way and considers them to be identical.

It is said that the Convenience Method is really just syntactic sugar. In order to compare them, I looked at them using a bytecode examiner. The two methods, respectively, showed up like this:

public static java.lang.Integer addAll(java.lang.Integer... nums);
 0 iconst_0
 1 invokestatic java.lang.Integer.valueOf(int) : java.lang.Integer [28]


 public static java.lang.Integer addAll(java.lang.Integer[] nums);
 0 iconst_0
 1 invokestatic java.lang.Integer.valueOf(int) : java.lang.Integer [28]

This means the bytecode inspector doesn’t show any real difference in the method bodies. So I decided to examine the methods at runtime, using Reflection to reveal the method type. The result showed that both methods have an identical signature, like this:

public static java.lang.Integer ConvenienceTest.addAll(java.lang.Integer[])

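This shows that the two methods differ only in the source code: both compile to the same signature, taking an Integer[]. The compiler marks the varargs version with a varargs flag (ACC_VARARGS) in the class file and wraps the arguments into an array at each call site.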
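
To see this at runtime, here is a small sketch of my own (the class name VarargsSketch and the second method name addArray are made up, since the two versions cannot share a name in one class). It looks both methods up with the same Integer[] parameter type and uses Method.isVarArgs() to tell them apart.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: both methods take Integer[]; only the varargs flag differs.
class VarargsSketch {

    static Integer addAll(Integer... nums) {
        Integer sum = 0;
        for (Integer num : nums) sum += num;
        return sum;
    }

    static Integer addArray(Integer[] nums) {
        Integer sum = 0;
        for (Integer num : nums) sum += num;
        return sum;
    }

    // Looks the method up by its Integer[] parameter type and reports the varargs flag.
    static boolean isVarArgs(String methodName) {
        try {
            Method m = VarargsSketch.class.getDeclaredMethod(methodName, Integer[].class);
            return m.isVarArgs();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both addAll(1, 2, 3) and addAll(new Integer[] {1, 2, 3}) compile against the varargs version, while addArray only accepts an explicit array. isVarArgs("addAll") is true and isVarArgs("addArray") is false, even though both lookups use Integer[].class.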

So why would you want to use the variable arguments over the Array version? I guess the keyword is “Convenience”, in that when this method is called, one can conveniently list the Integers directly as arguments. But personally, if I have more than three arguments, I would prefer to use the Array version. What do you think?




Internet Explorer caching behavior for RESTful data pulled by AngularJS

Hi, There:

A UI tester told me an AngularJS app doesn’t work in IE11, but works fine in Chrome or FF. So I gave it a try and tested it in IE. In order to see how things were going, I turned on the Developer tool to see what the console said and how the network traffic moved. You guessed it: everything worked just fine! Soon I realized that the tester would never turn on the IE Developer tool when she was testing, so I turned it off. Then I could replicate the problems that she had described to me.

A bit later I realized that, by default, IE actually caches the JSON data that AngularJS gets from the server side. But the cache is off when the Developer tool is on, hence the different behavior. For Chrome or FF, caching is not on by default. Searching through sites such as StackOverflow gave similar suggestions. So it looks like forcibly turning off caching in the application should be the right solution to the problem.

Typically there are two places to turn off caching: the client side and the server side. In principle both should work. But since the app is an AngularJS app that asynchronously gets data from RESTful services on the server side, an HTML <meta> tag like this will not work, since it does not affect the JSON data being requested:

<meta http-equiv="Pragma" content="no-cache">   <!-- does not work -->

The solution that works is to mark the response as non-cacheable on the server side:

response.setHeader("Cache-Control", "private, no-store, no-cache, must-revalidate");
response.setHeader("Pragma", "no-cache");


This should ensure that the particular JSON data pulled by AngularJS is not cached at the client side. Certainly, if caching is desired, these headers should not be set. After implementing this, the above AngularJS application, whose behavior depends fully on fresh JSON data, now works as expected!
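
If several services need the same treatment, the two headers can live in one place. The following is only a sketch under my own assumptions: the class NoCacheHeaders and its methods are hypothetical names, and it takes a generic (name, value) setter so that it compiles without the servlet API on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper: keeps the no-cache headers from this post in one place.
class NoCacheHeaders {

    // The same two headers as shown above, in insertion order.
    static Map<String, String> headers() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Cache-Control", "private, no-store, no-cache, must-revalidate");
        h.put("Pragma", "no-cache");
        return h;
    }

    // Applies the headers through any (name, value) setter.
    static void applyTo(BiConsumer<String, String> setHeader) {
        headers().forEach(setHeader);
    }
}
```

In a servlet or filter, the call would look like NoCacheHeaders.applyTo(response::setHeader), assuming response is an HttpServletResponse.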


-T. Yan


Working around the maven-replacer-plugin phase issue

Hi, Happy Friday!

We often want to change certain application properties during the Maven build process. A very common one is the build timestamp. A nice plugin we can use is the maven-replacer-plugin. Unfortunately, setting the <phase> tag in this plugin is not always straightforward, especially when you deal with different kinds of packaging, such as a bundle for Felix instead of jar files. Often the replacer runs too early (when the file being replaced has not yet been moved into the target folder) or too late (when the actual package has already been built).

I found a way to work around the situation. The method may not be so orthodox, but it works every time.

Assume your file myapp.properties contains this line, which holds the item to be replaced:


And the pom.xml can have this property and the replacer plugin config:




Then each time the plugin runs in the clean phase, it will look for the regex pattern, in this case:


and replace it with the current timestamp, so the resulting line becomes:


This way, the build timestamp is replaced correctly every time, at the earliest phase of the build. When you use this property in the application, simply remove the pattern holder (in this case the two @’s) before using it.
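
Removing the pattern holder in the application can be as small as the sketch below. Everything specific in it is an assumption for illustration: the class name BuildStampSketch, the property key build.timestamp, and the idea that the replaced value is simply wrapped in two @ characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: read a property whose value is wrapped in @...@ and strip the markers.
class BuildStampSketch {

    // propertiesText stands in for the content of myapp.properties.
    static String readStamp(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(key, "");
        // Drop the two @ pattern holders that the replacer leaves around the value
        return raw.replace("@", "").trim();
    }
}
```

For example, readStamp("build.timestamp=@2015-03-06 09:30@", "build.timestamp") returns the value without the @ characters.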

Again, it is not super elegant or orthodox. But I have dealt with the phase issue of this replacer plugin before, and felt the time spent on it was totally not worth it. The above workaround gives me the expected result every time Maven runs.



Integrate Git into ANT targets

Hi, There:

For some of our projects, we still use ANT instead of Maven. And we need to build projects directly out of Git using ANT targets, to ensure consistency for production deployment. If you don’t use Jenkins, then running ANT from Eclipse or the command line is the option to have. The integration of ANT and Git is not all that straightforward. On the web there are some good examples of how to do this, and this one works for us:

<target name="git.get.source">
 <delete dir="${LOCAL_PATH}"/>
 <mkdir dir="${LOCAL_PATH}"/>
 <git command = "clone">
  <args>
   <arg value = "--progress" />
   <arg value = "--verbose" />
   <arg value = "${GIT_URL}" />
   <arg value = "${LOCAL_PATH}" />
  </args>
 </git>
</target>

<!-- git macro utils setup -->
<macrodef name = "git">
 <attribute name = "command" />
 <attribute name = "dir" default = "" />
 <element name = "args" optional = "true" />
 <sequential>
  <echo message = "git @{command}" />
  <exec executable = "${GIT_EXE_PATH}" dir = "@{dir}">
   <arg value = "@{command}" />
   <args/>
  </exec>
 </sequential>
</macrodef>

The above ANT script assumes that you have installed a Git client and that ${GIT_EXE_PATH} is where git.exe resides. After the source code project is cloned into ${LOCAL_PATH}, building the project is simply a matter of running typical ANT targets such as compile and jar.

There is one error I have encountered, however. If ${LOCAL_PATH} is synchronized into any Eclipse project, this annoying error occurs the next time you run git.get.source:

Unable to delete ..

It turns out that Eclipse is keeping an eye on the Git status as well, and holds a file lock on this pack file. It is not really a bug per se; it is just multiple processes working on the Git status of the same clone.

The only way to get around this issue, if you want to run the ANT script within Eclipse, is to use a folder for ${LOCAL_PATH} that is not under the control of an Eclipse project. For example, using c:\temp\myproject\ for ${LOCAL_PATH} works just fine. Since we don’t plan to modify the code from Git in Eclipse, but simply want to build the project out of the Git sources, this trick is good to use.



CipherInputStream to the rescue: Saving memory usage when encrypting huge binary payloads

Hi, There:

If you are one of those lucky ones in the northeast like me, I hope your arms are not too sore after shoveling the snow! Normally I enjoy using my jumbo snow blower for any snow that’s deeper than 4 inches. But it decided to quit with busted shear bolts this time! Until it is fixed, my arms and back will step in for the job!

Anyway, at work we need to encrypt huge binary payloads due to the nature of the data. I found that simply using Cipher to encrypt a large payload often busts the heap when the payload is a couple of hundred MB, depending on the JVM and the encryption parameters, of course. This limit is not acceptable for us, since we sometimes need to encrypt 1 GB of data in a timely way.

Fortunately, JDK 8 already has streaming encryption implemented for us! How nice! The class is CipherInputStream. Here is a demo code excerpt that encrypts byte[] mydata through the streaming process and saves the ciphertext into a file. The time to encrypt is almost linear in the size of byte[] mydata, indicating a nice streaming process. Most importantly, no more memory issues with the heap busting!

// Random salt (SecureRandom rather than Random, since the salt is security-sensitive)
byte[] salt = new byte[8];
new SecureRandom().nextBytes(salt);

// No IV is given here, so a random Initial Vector (IV) is generated when the cipher is initialized
PBEParameterSpec pbeParamSpec = new PBEParameterSpec(salt, 1000);

PBEKeySpec pbeKeySpec = new PBEKeySpec("funkypassword".toCharArray());
SecretKeyFactory keyFac = SecretKeyFactory.getInstance("PBEWithHmacSHA256AndAES_256");
SecretKey pbeKey = keyFac.generateSecret(pbeKeySpec);

// Create PBE Cipher
Cipher pbeCipher = Cipher.getInstance("PBEWithHmacSHA256AndAES_256");
// Initialize PBE Cipher with key and parameters
pbeCipher.init(Cipher.ENCRYPT_MODE, pbeKey, pbeParamSpec);

// Keep the parameters (salt, iteration count, IV); they are needed again to decrypt
AlgorithmParameters algParams = pbeCipher.getParameters();
byte[] encodedAlgParams = algParams.getEncoded();

byte[] mydata = "make believe large payload secret data here".getBytes();

// try-with-resources closes both streams even if an exception is thrown
try (FileOutputStream fos = new FileOutputStream(new File("C:/temp/secrete.cipher"));
     CipherInputStream cis = new CipherInputStream(new ByteArrayInputStream(mydata), pbeCipher)) {
    byte[] b = new byte[1024];
    int i = cis.read(b);
    while (i != -1) {
        fos.write(b, 0, i);
        i = cis.read(b);
    }
}

I am not posting the code here, but if you want to decrypt a huge binary payload, just use CipherOutputStream in a similar way. It works just as nicely as CipherInputStream.
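
For the curious, here is a rough round-trip sketch of both directions. It is not my production code, and it makes several assumptions: the class and method names are made up, everything stays in memory instead of going to a file, the AES_128 variant is used so that it also runs on JVMs without the unlimited-strength policy, and the decrypting cipher reuses the AlgorithmParameters object of the encrypting cipher directly. In real use the encoded parameters would have to be stored next to the ciphertext.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.AlgorithmParameters;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

// Hypothetical sketch: encrypt with CipherInputStream, decrypt with CipherOutputStream.
class StreamCryptoSketch {

    static final String ALG = "PBEWithHmacSHA256AndAES_128"; // assumption: 128-bit variant

    static SecretKey key(char[] password) throws GeneralSecurityException {
        return SecretKeyFactory.getInstance(ALG).generateSecret(new PBEKeySpec(password));
    }

    // Copies in fixed-size buffers, so memory use does not grow with the payload.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
    }

    // Encrypts the payload, decrypts it again, and returns the recovered bytes.
    static byte[] roundTrip(byte[] plain, char[] password) {
        try {
            byte[] salt = new byte[16];
            new SecureRandom().nextBytes(salt);

            Cipher enc = Cipher.getInstance(ALG);
            enc.init(Cipher.ENCRYPT_MODE, key(password), new PBEParameterSpec(salt, 1000));
            AlgorithmParameters params = enc.getParameters(); // salt, iteration count, IV

            // Encrypt: reading from the CipherInputStream yields ciphertext
            ByteArrayOutputStream cipherText = new ByteArrayOutputStream();
            try (InputStream in = new CipherInputStream(new ByteArrayInputStream(plain), enc)) {
                copy(in, cipherText);
            }

            Cipher dec = Cipher.getInstance(ALG);
            dec.init(Cipher.DECRYPT_MODE, key(password), params);

            // Decrypt: writing ciphertext into the CipherOutputStream emits plaintext;
            // close() flushes the final block
            ByteArrayOutputStream recovered = new ByteArrayOutputStream();
            try (OutputStream out = new CipherOutputStream(recovered, dec)) {
                copy(new ByteArrayInputStream(cipherText.toByteArray()), out);
            }
            return recovered.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With file streams in place of the byte array streams, the same copy loop handles payloads much larger than the heap.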

Cheers, and if we can’t help it, let it snow – let it snow !!


JAX-WS Streaming/MTOM with WSSE UsernameToken WITHOUT using MessageHandler

Hi, Happy Chinese New Year!

A while back I wrote a post about a DataHandler issue when using MTOM for streaming SOAP services (Issue with Streaming/MTOM with DataHandler). The issue was essentially that any message handler will cause the whole binary content to be loaded into memory, and hence cause Out of Memory issues at the client side if the binary content is too large. I described a solution for the CXF implementation. Now I have searched for and tested a solution for the default JAX-WS implementation, thanks mostly to a post on StackOverflow.

import javax.xml.namespace.QName;
import javax.xml.soap.SOAPElement;
import javax.xml.soap.SOAPFactory;
import javax.xml.ws.BindingProvider;
import javax.xml.ws.soap.MTOMFeature;
import javax.xml.ws.soap.SOAPBinding;
import com.sun.xml.ws.api.message.Header;
import com.sun.xml.ws.api.message.Headers;
import com.sun.xml.ws.developer.WSBindingProvider;

// Static Strings
private static String SECURITY_NS = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
private static String PASSWORD_TYPE = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText";
private static String AUTH_PREFIX = "wss";

// Prepare Service and Port
MyService service = getMySerice();
MyservicePort port = service.getMyServicePort();
BindingProvider bp = (BindingProvider) port;
SOAPBinding soapbinding = (SOAPBinding) bp.getBinding();

// Build the WSSE elements
SOAPFactory soapFactory = SOAPFactory.newInstance();
SOAPElement security = soapFactory.createElement("Security", AUTH_PREFIX, SECURITY_NS);
SOAPElement uToken = soapFactory.createElement("UsernameToken", AUTH_PREFIX, SECURITY_NS);
SOAPElement username = soapFactory.createElement("Username", AUTH_PREFIX, SECURITY_NS);
username.addTextNode(this.getUsername().trim()); // getUsername() is assumed here, mirroring getPassword()
SOAPElement pass = soapFactory.createElement("Password", AUTH_PREFIX, SECURITY_NS);
pass.addAttribute(new QName("Type"), PASSWORD_TYPE);
pass.addTextNode(this.getPassword().trim());

// Assemble Security > UsernameToken > Username, Password
uToken.addChildElement(username);
uToken.addChildElement(pass);
security.addChildElement(uToken);

// Attach the header directly to the port, without a MessageHandler
Header header = Headers.create(security);
((WSBindingProvider) port).setOutboundHeaders(header);

This way, the Security Header will carry the WSSE UsernameToken without disturbing the MTOM payload, which is being streamed in my operation. If the WSSE header were processed in a MessageHandler, any huge binary payload would cause an Out of Memory exception very quickly and fail the whole SOAP invocation right off the bat.