[Solved] Dynamically generate columns in PostgreSQL
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Dynamically generate columns in PostgreSQL

Database type:

I have seen that there are quit a few similar questions like this one, but I havent understood how to code it myself. Please have in mind that I am just a beginner in this field.

Basically I want to pivot the table like this:

zoom |    day     | point         zoom | 2015-10-01 |  2015-10-02 | ......
------+-----------+-------  ---> ------+------------+-------------+
   1 | 2015-10-01 |   201            1 |    201     |     685     |
   2 | 2015-10-01 |    43            2 |     43     |     346     | 
   3 | 2015-10-01 |    80            3 |     80     |     534     | 
   4 | 2015-10-01 |   324            4 |    324     |     786     | 
   5 | 2015-10-01 |    25            5 |     25     |     685     |
   1 | 2015-10-02 |   685 
   2 | 2015-10-02 |   346 
   3 | 2015-10-02 |   534 
   4 | 2015-10-02 |   555 
   5 | 2015-10-02 |   786
   :
   :
   :

Time can vary.

Results on left I get with:

SELECT 
zoom,
to_char(date_trunc('day', time), 'YYYY-MM-DD') AS day,
count(*) as point
FROM province
WHERE time >= '2015-05-01' AND time < '2015-06-01'
GROUP BY to_char(date_trunc('day', time), 'YYYY-MM-DD'), zoom;

I have read that there are some issues if I use count and also that it would be better if I use CASE and GROUP BY, however I have no idea how to CASE this.

Crosstab itself doesnt support dynamic creation of column names, but that can be achieved with crosstab_hash, if I understood it correctly.

This might be probably nice solution: http://okbob.blogspot.ca/2008/08/using-cursors-for-generating-cross.html however I am stucked with it trying to program it myself.

I have to use this kind of pivoting quite often, so I would appriciate any kind of help and additional explanation behind it.

Edit1

I am trying to figure out how crosstab works with dates, currently without returning dynamic names of columns. Later on I will explain why. It is realted to the main question. For this example I am using only period of 2 dates.

Based on @Erwin Brandstetter answer:

SELECT * FROM crosstab(
       'SELECT zoom, day, point
        FROM   province
        ORDER  BY 1, 2'
      , $$VALUES ('2015-10-01'::date), ('2015-10-02')$$)
AS ct (zoom text, day1 int, day2 int);

returned results are:

zoom |    day1    |    day2     | 
-----+------------+-------------+
   1 |    201     |     685     |
   2 |     43     |     346     | 
   3 |     80     |     534     | 
   4 |    324     |     786     | 

I am trying to get this

zoom | 2015-10-01 |  2015-10-02 | 
-----+------------+-------------+
   1 |    201     |     685     |
   2 |     43     |     346     | 
   3 |     80     |     534     | 
   4 |    324     |     786     | 

but my query doesnt work:

SELECT *
FROM crosstab(
      'SELECT *
       FROM province
       ORDER  BY 1,2')
AS ct (zoom text, "2015-10-01" date, "2015-10-02" date);

ERROR:  return and sql tuple descriptions are incompatible

Edit1, Q1. Why does this doesnt work and how can I return results like that?

I have read links that @Erwin Brandstetter provided me, especially this one: Execute a dynamic crosstab query. I have copied/pasted his function:

CREATE OR REPLACE FUNCTION pivottab(_tbl regclass, 
                                    _row text, _cat text, 
                                    _expr text,
                                    _type regtype)  
RETURNS text AS
$func$
DECLARE
   _cat_list text;
   _col_list text;
BEGIN
-- generate categories for xtab param and col definition list    
EXECUTE format(
 $$SELECT string_agg(quote_literal(x.cat), '), (')
        , string_agg(quote_ident  (x.cat), %L)
   FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
 , ' ' || _type || ', ', _cat, _tbl)
INTO  _cat_list, _col_list;

-- generate query string
RETURN format(
'SELECT * FROM crosstab(
   $q$SELECT %I, %I, %s
      FROM   %I
      GROUP  BY 1, 2
      ORDER  BY 1, 2$q$
 , $c$VALUES (%5$s)$c$
   ) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr, _tbl, _cat_list, _col_list, _type
);

END
$func$ LANGUAGE plpgsql;

and call it with query

SELECT pivottab('province','zoom','day','point','date');

Function returned me:

                         pivottab                         
----------------------------------------------------------
 SELECT * FROM crosstab(                                 +
    $q$SELECT zoom, day, point                           +
       FROM   province                                   +
       GROUP  BY 1, 2                                    +
       ORDER  BY 1, 2$q$                                 +
  , $c$VALUES ('2015-10-01'), ('2015-10-02')$c$          +
    ) ct(zoom text, "2015-10-01" date, "2015-10-02" date)
(1 row)

So when I edited the query and added ; (it would be nice that ; is already there) I got:

ERROR:  column "province.point" must appear in the GROUP BY clause or be used in an aggregate function

Edit1, Q2. Any ideas how to solove this?

Edit1, Q3. I guess next question will be how to execute function automaticlly, which is also mentioned on the same link, but got stucked on previous steps.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 13): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `time` is indexed, the index won’t be used as it’s wrapped with the function `to_char`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX province_idx_time ON "province" ("time");
CREATE INDEX province_idx_zoom ON "province" ("zoom");
The optimized query:
SELECT
        province.zoom,
        to_char(date_trunc('day',
        province.time),
        'YYYY-MM-DD') AS day,
        count(*) AS point 
    FROM
        province 
    WHERE
        province.time >= '2015-05-01' 
        AND province.time < '2015-06-01' 
    GROUP BY
        to_char(date_trunc('day',
        province.time),
        'YYYY-MM-DD'),
        province.zoom

Related Articles



* original question posted on StackOverflow here.