I'm trying to automatically calculate a table for user retention, so that the end result is something like:
Week | % 0 | 100 1 | 50 2 | 35
For now I'm not bothering with the % and just trying to get a count of users, i.e. users that came back on week 1, week 2, etc.
I have an
accounts table and an
action table. Currently I don't have indexes in either of these. Both have a
user_email that stores the associated user, and both have a
timestamp field that's a
TIMESTAMP WITH TIME ZONE - for
accounts this is the account creation date and for
action it's the timestamp of the action.
The two tables are currently super simple:
accounts Table: timestamp | timestamp with time zone user_email |character varying
action Table: timestamp | timestamp with time zone user_email |character varying
That's it, no keys or anything else.
I managed to piece together a query that seems to do this calculation:
WITH weeks AS ( SELECT (n || 'week')::interval as week from generate_series(0, 3) AS n ) SELECT week, (SELECT count(*) FROM accounts WHERE user_email IN (SELECT user_email FROM action WHERE age(action.timestamp, accounts.timestamp) >= week AND age(action.timestamp, accounts.timestamp) < week + '1 week'::interval)) AS "Returning Users" FROM weeks;
The output here is something like:
week | Returning Users 0 days | 100 7 days | 50 14 days | 35 21 days | 25
The thing is, this takes super long to calculate (with 1300 users and about 100000 actions it takes a few minutes). I'm also convinced there's a more efficient way to calculate this, or that I can at least add indexes somewhere.
Unfortunately my SQL knowledge is quite limited and after looking around for a while I'm still not sure where to start. I think I can add indexes to the
timestamp columns? And I'm not sure my way of checking if timestamps fall in certain ranges is correct.
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `action` ADD INDEX `action_idx_user_email` (`user_email`);
WITH weeks AS (SELECT (n OR 'week')::interval AS week FROM generate_series(0, 3) AS n) SELECT week, (SELECT count(*) FROM accounts WHERE EXISTS ( SELECT 1 FROM action WHERE ( age(action.timestamp, accounts.timestamp) >= week AND age(action.timestamp, accounts.timestamp) < week + '1 week'::interval ) AND ( accounts.user_email = action.user_email ) ) ) AS 'Returning Users' FROM weeks