Kafka Streams process vertically

Hi, There:

Happy Friday! Can it be any colder one day before May 1st in the northeast? 😦

If you ever hear that Kafka Streams process vertically, it is actually more important than it sounds like. I am doing a simple demo below.

The Streams Pipeline:

final KStream<String, String> input = builder.stream(INPUT_TOPIC);

input
.peek((k, v) -> System.out.println(new Date() + " @peek 1 value: " +  v))
.filter((k, v) -> v.length() < 10)
.peek((k, v) -> System.out.println(new Date() + " @peek 2 value: " +  v))
.mapValues(v -> v = holdAndUpper(v))
.peek((k, v) -> System.out.println(new Date() + " @peek 3 value: " +  v))
.print(Printed.toSysOut());


Two meaningful processes, one to filter out anything longer than 10 chars, and mapValues to call an external function holdAndUpper. Between them, there are peek functions.

In the function holdAndUpper, we hold for n seconds and then return the uppercased string of the input. n is the length of the input String. If the input is 8 in length, it will wait for 8 seconds before returning.

private static String holdAndUpper(String s) {
	try { 
		Thread.sleep(s.length()*1000); 
	} catch (InterruptedException e) {
		e.printStackTrace(); 
	}
	return s.toUpperCase();
}

Now input all these three strings in one step.

abcdefghiklmn
abcdefg
ab

The Output:

Fri Apr 30 17:39:35 EDT 2021 @peek 1 value: abcdefghiklmn
Fri Apr 30 17:39:35 EDT 2021 @peek 1 value: abcdefg
Fri Apr 30 17:39:35 EDT 2021 @peek 2 value: abcdefg
Fri Apr 30 17:39:42 EDT 2021 @peek 3 value: ABCDEFG
[KSTREAM-PEEK-0000000005]: null, ABCDEFG
Fri Apr 30 17:39:42 EDT 2021 @peek 1 value: ab
Fri Apr 30 17:39:42 EDT 2021 @peek 2 value: ab
Fri Apr 30 17:39:44 EDT 2021 @peek 3 value: AB
[KSTREAM-PEEK-0000000005]: null, AB

Under this single processor condition, a few important things can be observed from the output.

  • The filter of > 10 chars stops right away and it finished life inside of the processor at the filter. It is immediately followed by the second input to the steam, which has 7 chars.
  • While the 7 chars string (abcdefg) in the processing pipeline, the third input (ab) never enters the pipeline. The 7 chars string took all the expected time of 7 seconds before it exits the processor.
  • The third 2 chars string (ab) only enters the processor after the 7 chars string is finished.

Now, that is the meaning of “vertical processing” of Kafka Streams. It also emphasizes the importance of order of the processing steps. For one, the filter step should be put in the front of the pipeline. If a time-costly step is before a filter, an input could be going through that step and later be filtered off in the filter step. That is not only a waste of the process itself, other inputs could be waiting at the same time for bigger loss.

Depends on the nature of the data processing, the processes can be spread across horizontally to many consumers, or vertically to multiple threads. But just remember in each instance of the processor, the steps are going through vertically for each input.

Cheers!

~T