Time to break free from the SQL-92 mindset
Are you stuck in the 90s?
If you are only using the SQL-92 language reference, then you are overlooking so many great features like:
Some test data
In my previous article I imported some CSV Codahale metrics into PostgreSQL for further analysis.
Our time series data consists of a total request count and a data recording time stamp:
count | t |
---|---|
151 | 1398778568 |
169 | 1398778583 |
169 | 1398778598 |
169 | 1398778613 |
169 | 1398778628 |
169 | 1398778643 |
1587 | 1398778658 |
6473 | 1398778673 |
11985 | 1398778688 |
Arrival velocity
I want to calculate the arrival velocity which can be defined as:
λ = arrival_velocity = Δcount / Δt
For each time event we need to subtract the current and previous count and time stamp values.
Window functions allow us to aggregate/reference previous/next rows without restricting the SELECT clause to a single result row:
SELECT t as "Current time stamp", prev_t as "Previous time stamp", current_count as "Current total request count", prev_count as "Previous total request count", ROUND(((current_count - prev_count)::numeric/(t - prev_t)::numeric), 3) as "Velocity [req/sec]" FROM ( SELECT t, lag(t, 1) over () as prev_t, count as current_count, lag(count, 1) over () as prev_count FROM connection_lease_millis WINDOW grouping AS ( ORDER BY t ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) ) raw_data
Giving us the arrival velocity:
Current time stamp | Previous time stamp | Current total request count | Previous total request count | Velocity [req/sec] |
---|---|---|---|---|
1398778568 | 151 | |||
1398778583 | 1398778568 | 169 | 151 | 1.200 |
1398778598 | 1398778583 | 169 | 169 | 0.000 |
1398778613 | 1398778598 | 169 | 169 | 0.000 |
1398778628 | 1398778613 | 169 | 169 | 0.000 |
1398778643 | 1398778628 | 169 | 169 | 0.000 |
1398778658 | 1398778643 | 1587 | 169 | 94.533 |
1398778673 | 1398778658 | 6473 | 1587 | 325.733 |
1398778688 | 1398778673 | 11985 | 6473 | 367.467 |
Arrival acceleration
But what if we want to calculate the arrival acceleration (e.g. so we can figure out how the arrival rate flactuates), which is
arrival_acceleration = Δarrival_velocity/ Δt
This is how we can do it:
SELECT t as "Current time stamp", prev_t as "Previous time stamp", velocity "Velocity [Req/sec]", ROUND((velocity - lag(velocity, 1) over ())::numeric/(t - prev_t)::numeric, 3) as "Acceleration [req/sec2]" FROM ( SELECT t, prev_t, current_count, prev_count, ROUND(((current_count - prev_count)::numeric/(t - prev_t)::numeric), 3) as velocity FROM ( SELECT t, lag(t, 1) over () as prev_t, count as current_count, lag(count, 1) over () as prev_count FROM connection_lease_millis WINDOW grouping AS ( ORDER BY t ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) ) raw_data ) velocity_data
Giving us:
Current time stamp | Previous time stamp | Velocity [Req/sec] | Acceleration [req/sec2] |
---|---|---|---|
1398778568 | |||
1398778583 | 1398778568 | 1.200 | |
1398778598 | 1398778583 | 0.000 | -0.080 |
1398778613 | 1398778598 | 0.000 | 0.000 |
1398778628 | 1398778613 | 0.000 | 0.000 |
1398778643 | 1398778628 | 0.000 | 0.000 |
1398778658 | 1398778643 | 94.533 | 6.302 |
1398778673 | 1398778658 | 325.733 | 15.413 |
1398778688 | 1398778673 | 367.467 | 2.782 |
Giving us a nice overview over the arrival rate distribution:
Conclusion
SQL has more to offer than the standard aggregate functions. The window functions allow you to group rows while still retaining the select criteria.
How many of you are still using the 1.0 versions of Java, C# or Python? Shouldn’t we benefit from the latest SQL features the same way we do with any other programming language we use on a daily basis?
In case you’re still skeptic, this great article may shatter your doubts.
Reference: | Time to break free from the SQL-92 mindset from our JCG partner Vlad Mihalcea at the Vlad Mihalcea’s Blog blog. |