Easy testing SLAs on distributed components with grep4j
So your distributed architecture looks something like the picture below and you just have received a requirement from the business to make sure that the SLAs of the messages sent by the Producer and then traveling to the downstream systems (consumers) must be fast and never slower than 400 milliseconds.
Requirement says :
The Latency of a message sent from the Producer to any of the Consumers should be never slower than 400 milliseconds.
Sounds familiar? To me yes, and experience taught me that if I want to protect the SLAs in the future, I need as well to automate the test in order to not introduce bottlenecks that then increase the latency of the messages. But how to do it? Producer and Consumers are in separate machines and some of the consumers are not written in Java. Also, between the producer and the consumers there is a Queue (or web-service or RMI or an ESB or another component or whatever), so things are not getting easier for me to test. Well, all components write logs in a similar way, so why not using logs as a data to test?
For example these are 2 sample logs, one from the producer firing a message (id 1546366) and the other from one of the consumer receiving the message (id 1546366):
Producer logs
2013-02-19 10:09:05,795 INFO [org.grep4j.demo.input.core.InputCoreMessageSender] (pool-19-thread-9) Input::MessageSender::Message(1546366) Sent Successfully
Consumer logs
2013-02-19 10:09:06,161 INFO [org.grep4j.demo.handler.bean.mdb.SingleDestPacketHandler] (Thread-62314 (HornetQ-client-global-threads-989457197)) Handler::Packet::Message(1546366) Received::PacketId(982336426)::State(NORMAL)::Dest(CONSUMER4, 1)::DataLevel(EVENT, 7)::Op(CREATE, C)::GlobalId(1546366)::Priority(1)::Src(GUI, 1::Ids(EventId=1546366,SFBId=1546366,isBirType=false,inBir=false))
And this is how my automated performance test looks like using Grep4j :
package com.gdg.grep4j.demo; import static com.gdg.grep4j.demo.profiles.Profiles.consumer1; import static com.gdg.grep4j.demo.profiles.Profiles.consumer2; import static com.gdg.grep4j.demo.profiles.Profiles.consumer3; import static com.gdg.grep4j.demo.profiles.Profiles.producer; import static com.gdg.grep4j.demo.services.TimeService.extractTime; import static org.grep4j.core.Grep4j.constantExpression; import static org.grep4j.core.Grep4j.grep; import static org.grep4j.core.fluent.Dictionary.on; import static org.hamcrest.CoreMatchers.is; import static org.hamcrest.number.OrderingComparison.lessThan; import static org.junit.Assert.assertThat; import org.grep4j.core.result.GrepResults; import org.testng.annotations.BeforeTest; import org.testng.annotations.Test; @Test public class MessageDistributionPerformanceTest { private static final long MAX_ACCETABLE_LATENCY = 400L; private long producerTime = 0; private GrepResults consumersResults; @BeforeTest public void triggerMessageDispatcher() { System.out.println('Producing and firing a Message(1546366) to downstream systems...'); } @BeforeTest public void extractProducerTime() { GrepResults producerResult = grep(constantExpression('Message(1546366) Sent Successfully'), on(producer)); producerTime = extractTime(producerResult.toString()); } @BeforeTest public void grepConsumerLogs() { consumersResults = grep(constantExpression('Message(1546366) Received'), on(consumer1, consumer2, consumer3)); } public void testConsumer1Latency() { long consumer1Time = extractTime(consumersResults.filterOnProfile(consumer1).toString()); assertThat((consumer1Time - producerTime), is(lessThan(MAX_ACCETABLE_LATENCY))); } public void testConsumer2Latency() { long consumer2Time = extractTime(consumersResults.filterOnProfile(consumer2).toString()); assertThat((consumer2Time - producerTime), is(lessThan(MAX_ACCETABLE_LATENCY))); } public void testConsumer3Latency() { long consumer3Time = extractTime(consumersResults.filterOnProfile(consumer3).toString()); assertThat((consumer3Time - producerTime), is(lessThan(MAX_ACCETABLE_LATENCY))); } }
A profile is the grep target context, in my case all the profiles are remote machines (for a better understanding of Profiles see Grep4j page).
TimeService is just a simple service class extracting the time in the logs.
package com.gdg.grep4j.demo.services; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Date; import java.util.TimeZone; import java.util.regex.Matcher; import java.util.regex.Pattern; public class TimeService { private static final Pattern timePattern = Pattern .compile('([0-9][0-9][0-9][0-9])-([0-9][0-9])-([0-9][0-9]) ([0-9][0-9]|2[0-3]):([0-9][0-9]):([0-9][0-9]),([0-9][0-9][0-9])'); public static long extractTime(String text) { Matcher lm = timePattern.matcher(text); if (lm.find()) { SimpleDateFormat sdf = new SimpleDateFormat( 'yyyy-MM-dd HH:mm:ss,SSS'); sdf.setTimeZone(TimeZone.getTimeZone('UTC')); String inputString = lm.group(); Date date = null; try { date = sdf.parse(inputString); } catch (ParseException e) { e.printStackTrace(); } return date.getTime(); } else { throw new IllegalArgumentException('timePattern not found'); } } }
In few simple lines of code I have my extremely flexible test (I can test anything that was produced in the logs) . For the full code : https://github.com/marcocast/grep4j-gdg.git
Reference: Easy testing SLAs on distributed components with grep4j from our JCG partner Marco Castigliego at the Remove duplication and fix bad names blog.