Time to break free from the SQL-92 mindset

Vlad MihalceaMay 14th, 2014Last Updated: May 14th, 2014

0 40 2 minutes read

Are you stuck in the 90s?

If you are only using the SQL-92 language reference, then you are overlooking so many great features like:

Some test data

In my previous article I imported some CSV Codahale metrics into PostgreSQL for further analysis.

Our time series data consists of a total request count and a data recording time stamp:

count	t
151	1398778568
169	1398778583
169	1398778598
169	1398778613
169	1398778628
169	1398778643
1587	1398778658
6473	1398778673
11985	1398778688

Arrival velocity

I want to calculate the arrival velocity which can be defined as:

λ = arrival_velocity = Δcount / Δt

For each time event we need to subtract the current and previous count and time stamp values.

Window functions allow us to aggregate/reference previous/next rows without restricting the SELECT clause to a single result row:

SELECT
	t as "Current time stamp",
	prev_t as "Previous time stamp",
	current_count as "Current total request count",
	prev_count as "Previous total request count",
	ROUND(((current_count - prev_count)::numeric/(t - prev_t)::numeric), 3) as "Velocity [req/sec]"	
FROM (
	SELECT
		t,
		lag(t, 1) over () as prev_t,				
		count as current_count,
		lag(count, 1) over () as prev_count
	FROM
		connection_lease_millis
	WINDOW grouping AS (			
		ORDER BY t
		ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
	)
) raw_data

Giving us the arrival velocity:

Current time stamp	Previous time stamp	Current total request count	Previous total request count	Velocity [req/sec]
1398778568		151
1398778583	1398778568	169	151	1.200
1398778598	1398778583	169	169	0.000
1398778613	1398778598	169	169	0.000
1398778628	1398778613	169	169	0.000
1398778643	1398778628	169	169	0.000
1398778658	1398778643	1587	169	94.533
1398778673	1398778658	6473	1587	325.733
1398778688	1398778673	11985	6473	367.467

Arrival acceleration

But what if we want to calculate the arrival acceleration (e.g. so we can figure out how the arrival rate flactuates), which is

arrival_acceleration = Δarrival_velocity/ Δt

This is how we can do it:

SELECT
	t as "Current time stamp",
	prev_t as "Previous time stamp",
	velocity "Velocity [Req/sec]",
	ROUND((velocity - lag(velocity, 1) over ())::numeric/(t - prev_t)::numeric, 3) as "Acceleration [req/sec2]"		
FROM (
	SELECT
		t,
		prev_t,
		current_count,
		prev_count,
		ROUND(((current_count - prev_count)::numeric/(t - prev_t)::numeric), 3) as velocity	
	FROM (
		SELECT
			t,
			lag(t, 1) over () as prev_t,				
			count as current_count,
			lag(count, 1) over () as prev_count
		FROM
			connection_lease_millis
		WINDOW grouping AS (			
			ORDER BY t
			ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
		)
	) raw_data	
) velocity_data

Giving us:

Current time stamp	Previous time stamp	Velocity [Req/sec]	Acceleration [req/sec2]
1398778568
1398778583	1398778568	1.200
1398778598	1398778583	0.000	-0.080
1398778613	1398778598	0.000	0.000
1398778628	1398778613	0.000	0.000
1398778643	1398778628	0.000	0.000
1398778658	1398778643	94.533	6.302
1398778673	1398778658	325.733	15.413
1398778688	1398778673	367.467	2.782

Giving us a nice overview over the arrival rate distribution:

Conclusion

SQL has more to offer than the standard aggregate functions. The window functions allow you to group rows while still retaining the select criteria.

How many of you are still using the 1.0 versions of Java, C# or Python? Shouldn’t we benefit from the latest SQL features the same way we do with any other programming language we use on a daily basis?

In case you’re still skeptic, this great article may shatter your doubts.

Reference:

Time to break free from the SQL-92 mindset from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog.