[Solved] Rails : How to build statistics per day/month/year or How database agnostic SQL functions are missing (ex. : STRFTIME, DATE_FORMAT, DATE_TRUNC)

EverSQL Database Performance Knowledge Base

Rails : How to build statistics per day/month/year or How database agnostic SQL functions are missing (ex. : STRFTIME, DATE_FORMAT, DATE_TRUNC)

Database type:

I have been searching all over the web and I have no clue.

Basically, there are two options :

1) Retrieve all rows from the database using Subscriber.all and aggregate by day in the Rails app using the Enumerable.group_by :

@subscribers = Subscriber.all
@subscriptions_per_day = @subscribers.group_by { |s| s.created_at.beginning_of_day }

I think this is a really bad idea. Retrieving all rows from the database can be acceptable for a small application, but it will not scale at all. Database aggregate and date functions to the rescue !

2) Run a SQL query in the database using aggregate and date functions :

Subscriber.select('STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions').group('day')

Which will run in this SQL query :

SELECT STRFTIME("%Y-%m-%d", created_at) AS day, COUNT(*) AS subscriptions
FROM subscribers
GROUP BY day

Much better. Now aggregates are done in the database which is optimized for this kind of task, and only one row per day is returned from the database to the Rails app.

... but wait... now the app has to go live in my production env which uses MySQL ! Replace STRFTIME() with DATE_FORMAT(). What if tomorrow I switch to PostgreSQL ? Replace DATE_FORMAT() with DATE_TRUNC().

I like to develop with SQLite. Simple and easy. I also like the idea that Rails is database agnostic. But why Rails doesn't provide a way to translate SQL functions that do the exact same thing, but have different syntax in each RDBMS (this difference is really stupid, but hey, it's too late to complain about it) ?

I can't believe that I find so few answers on the Web for such a basic feature of a Rails app : count the subscriptions per day, month or year.

Tell me I'm missing something :)

EDIT

It's been a few years since I posted this question. Experience has shown that I should use the same DB for dev and prod. So I now consider the database agnostic requirement irrelevant.

Dev/prod parity FTW.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
The optimized query:
SELECT
        STRFTIME("%Y-%m-%d",
        subscribers.created_at) AS day,
        COUNT(*) AS subscriptions 
    FROM
        subscribers 
    GROUP BY
        day 
    ORDER BY
        NULL

Related Articles



* original question posted on StackOverflow here.