Examples

diff

We’ll add more examples, but the best places to look for now are:

grouping_sets_query

Grouping sets are a neat feature of some databases that allow you to GROUP BY multiple combinations of columns in a single pass over your data. Some databases, like PostgreSQL and DuckDB, support them natively, whereas others, like SQLite, don’t. datools.sqlalchemy.grouping_sets_query will generate a GROUPING SETs query if your database allows it or create a synthetic equivalent using a UNION ALL of several GROUP BY queries.

This is concept best explained by example, and we’ll use the test suite for our example. Say you have an underlying relation like SELECT * FROM sensor readings, and you want to COUNT(*) across multiple combinations of created_at and sensor_id. In datools, you’d write:

from datools.sqlalchemy_utils import grouping_sets_query
query, set_index = grouping_sets_query(
    db_engine,
    'SELECT * FROM sensor_readings',
    (
        (Column('created_at'), Column('sensor_id')),
        (Column('created_at'),),
        (Column('sensor_id'),),
        (),
    ),
    (Aggregate(AggregateFunction.COUNT, Column('*'), Column('num_rows')), )
)

print('Query:', query)
print('Set index:', set_index)

On PostgreSQL (which supports GROUPING SETS), this would result in:

Query:
WITH query AS (SELECT * FROM sensor_readings)
SELECT
    GROUPING(created_at, sensor_id) AS grouping_id,
    created_at, sensor_id,
    COUNT(*) AS num_rows
FROM query
GROUP BY GROUPING SETS ((created_at, sensor_id), (created_at), (sensor_id), ())
Set index: {7: (Column(name='created_at'), Column(name='sensor_id')), 11: (Column(name='created_at'),), 13: (Column(name='sensor_id'),), 14: ()}

On SQLite (which doesn’t support GROUPING SETS), this would result in:

Query:
WITH query AS (SELECT * FROM sensor_readings)

SELECT
    0 AS grouping_id,
    created_at, sensor_id,
    COUNT(*) AS num_rows
FROM query
GROUP BY created_at, sensor_id

UNION ALL

SELECT
    1 AS grouping_id,
    created_at, NULL AS sensor_id,
    COUNT(*) AS num_rows
FROM query
GROUP BY created_at

UNION ALL

SELECT
    2 AS grouping_id,
    NULL AS created_at, sensor_id,
    COUNT(*) AS num_rows
FROM query
GROUP BY sensor_id

UNION ALL

SELECT
    3 AS grouping_id,
    NULL AS created_at, NULL AS sensor_id,
    COUNT(*) AS num_rows
FROM query
Set index: {0: (Column(name='created_at'), Column(name='sensor_id')), 1: (Column(name='created_at'),), 2: (Column(name='sensor_id'),), 3: ()}