[Solved] Should SQL format the output or just retrieve the raw data?

EverSQL Database Performance Knowledge Base

Should SQL format the output or just retrieve the raw data?

Generally speaking, the SQL queries that I write return unformatted data and I leave it to the presentation layer, a web page or a windows app, to format the data as required. Other people that I work with, including my boss, will insist that it is more efficient to have the database do it. I'm not sure that I buy that and believe that even if there was a measurable performance gain by having the database do it, that there are more compelling reasons to generally avoid this.

For example, I will place my queries in a Data Access layer with the intent of potentially reusing the queries whenever possible. Given this, I ascertain that the queries are more likely to be able to be reused if the data remains in their native type rather than converting the data to a string and applying formatting functions on them, for example, formatting a date column to a DD-MMM-YYYY format for display. Sure, if the SQL was returning the dates as formatted strings, you could reverse the process to revert the value back to a date data type, but this seems awkward, for lack of a better word. Furtehrmore, when it comes to formatting other data, for example, a machine serial number made up of a prefix, base and suffix with separating dashes and leading zeros removed in each sub field, you risk the possibility that you may not be able to correctly revert back to the original serial number when going in the other direction. Maybe this is a bad example, but I hope you see the direction I am going with this...

To take things a step further, I see people write VERY complex SQLs because they are essentially writing what I would call presentation logic into a SQL instead of returning simple data and then applying this presentation logic in the presentation layer. In my mind, this results in very complex, difficult to maintain and more brittle SQL that is less adaptable to change.

Take the following real-life example of what I found in our system and tell me what you think. The rational I was given for this approach was that this made the web app very simple to render the page as it used the following 1-line snippet of classic ADO logic in a Classic ASP web app to process the rows returned:

     oRS.GetString ( , , "</td>" & vbCrLf & "<td style=""font-size:x-small"" nowrap>" ,"</td>" & vbCrLf & "</tr>" & vbCrLf & "<tr>" & vbCrLf & _
     "<td style=""font-size:x-small"" nowrap>" ,"&nbsp;" ) & "</td>" & vbCrLf & "</tr>" & vbCrLf & _

Here's the SQL itself. While I appreciate the author's ability to write a complex SQL, I feel like this is a maintenance nightmare. Am I nuts? The SQL is returning a list of programs that are current running against our database and the status of each:

Because the SQL did not display with CR/LFs when I pasted here, I decided to put the SQL on an otherwise empty personal Google site. Please feel free to comment. Thanks.

By the way-This SQL was actually constructed using VB Script nested WITHIN a classic ASP page, not calling a stored procedure, so you have the additional complexity of embedded concatentations and quoted markup, if you know what I mean, not to mention lack of formatting. The first thing I did when I was asked to help to debug the SQL was to add a debug.print of the SQL output and throw it through a SQL formatter that I just found. Some of the formatting was lost in pasting at the following link:

Edit(Andomar): copied inline: (external link removed, thanks-Chad)

SELECT 
Substring(Datename("dw",start_datetime),1,3) 
+ ', '
+ Cast(start_datetime AS VARCHAR) "Start Time (UTC/GMT)"
,program_name "Program Name"
,run_sequence "Run Sequence"
,CASE 
WHEN batchno = 0
THEN Char(160)
WHEN batchno = NULL
THEN Char(160)
ELSE Cast(batchno AS VARCHAR)
END "Batch #" /* ,Replace(Replace(detail_log ,'K:\' ,'file://servernamehere/DiskVolK/') ,'\' ,'/') "log"*/ /* */
,Cast('<a href="GOIS_ViewLog.asp?Program_Name=' AS VARCHAR(99))
+ Cast(program_name AS VARCHAR)
+ Cast('&Run_Sequence=' AS VARCHAR)
+ Cast(run_sequence AS VARCHAR)
+ Cast('&Page=1' AS VARCHAR)
+ ''
+ Cast('">'
+ CASE 
WHEN end_datetime >= start_datetime
THEN CASE 
WHEN end_datetime <> 'Jan 1 1900 2:00 PM'
THEN CASE 
WHEN (success_code = 10
OR success_code = 0)
AND exit_code = 10
THEN CASE 
WHEN errorcount = 0
THEN 'Completed Successfully'
ELSE 'Completed with Errors'
END
WHEN success_code = 100
AND exit_code = 10
THEN 'Completed with Errors'
ELSE CASE 
WHEN program_name <> 'FileDepCheck'
THEN 'Failed'
ELSE 'File not found'
END
END
ELSE CASE 
WHEN success_code = 10
AND exit_code = 0
THEN 'Failed; Entries for Input File Missing'
ELSE 'Aborted'
END
END
ELSE CASE 
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT 
Count(* )
FROM 
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Processing...'
ELSE 'Aborted without end date'
END
END
+ '</a>' AS VARCHAR) "Status / Log"
,Cast('<a href="' AS VARCHAR)
+ Replace(Replace(detail_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Detail log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\DetailLog.bmp" border="0"></a>' AS VARCHAR(999))
+ Char(160)
+ Cast('<a href="' AS VARCHAR)
+ Replace(Replace(summary_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Summary log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\SummaryLog.bmp" border="0"></a>' AS VARCHAR(999)) "Text Logs"
,errorcount "Error Count"
,warningcount "Warning Count"
,(totmsgcount
- errorcount
- warningcount) "Information Message Count"
,CASE 
WHEN end_datetime > start_datetime
THEN CASE 
WHEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' hr '
ELSE ' '
END
+ CASE 
WHEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' min '
ELSE ' '
END
+ CASE 
WHEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' sec '
ELSE ' '
END
ELSE CASE 
WHEN end_datetime = start_datetime
THEN '< 1 sec'
ELSE CASE 
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT 
Count(* )
FROM 
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Running '
+ Cast(Datediff(mi,start_datetime,Getdate()) AS VARCHAR)
+ ' min'
ELSE '&nbsp;'
END
END
END "Elapsed Time" /* ,end_datetime "End Time (UTC/GMT)" ,datepart("hh" ,
(end_datetime - start_datetime)) "Hr" ,datepart("mi" ,(end_datetime - start_datetime)) "Mins" ,datepart("ss" ,(end_datetime - start_datetime)) "Sec" ,datepart("ms" ,(end_datetime - start_datetime)) "mSecs" ,datepart("dw" ,start_datetime) "dp" ,case when datepart("dw" ,start_datetime) = 6 then ' Fri' when datepart("dw" ,start_datetime) = 5 then ' Thu' else '1' end */
,totalrows "Total Rows"
,inserted "Rows Inserted"
,updated "Rows Updated" /* ,success_code "succ" ,exit_code "exit" */
FROM 
update_log
WHERE start_datetime >= '5/29/2009 16:15'
ORDER BY start_datetime DESC

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 42): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `start_datetime` is indexed, the index won’t be used as it’s wrapped with the function `Abs`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Avoid Calling Functions With Indexed Columns (query line: 42): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `login_time` is indexed, the index won’t be used as it’s wrapped with the function `Abs`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  3. Avoid Calling Functions With Indexed Columns (query line: 86): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `start_datetime` is indexed, the index won’t be used as it’s wrapped with the function `Abs`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  4. Avoid Calling Functions With Indexed Columns (query line: 86): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `login_time` is indexed, the index won’t be used as it’s wrapped with the function `Abs`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  5. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `sysprocesses` ADD INDEX `sysprocesses_idx_program_name` (`program_name`);
ALTER TABLE `update_log` ADD INDEX `update_log_idx_start_datetime` (`start_datetime`);
ALTER TABLE `update_log` ADD INDEX `update_log_idx_program_name` (`program_name`);
The optimized query:
SELECT
        Substring(Datename('dw',
        update_log.start_datetime),
        1,
        3) + ', ' + CAST(update_log.start_datetime AS VARCHAR) 'Start Time (UTC/GMT)',
        update_log.program_name 'Program Name',
        update_log.run_sequence 'Run Sequence',
        CASE 
            WHEN update_log.batchno = 0 THEN Char(160) 
            WHEN update_log.batchno = NULL THEN Char(160) 
            ELSE CAST(update_log.batchno AS VARCHAR) END "Batch #",
CAST('' + CASE 
    WHEN update_log.end_datetime >= update_log.start_datetime THEN CASE 
        WHEN update_log.end_datetime <> 'Jan 1 1900 2:00 PM' THEN CASE 
            WHEN (update_log.success_code = 10 
            OR update_log.success_code = 0) 
            AND update_log.exit_code = 10 THEN CASE 
                WHEN update_log.errorcount = 0 THEN 'Completed Successfully' 
                ELSE 'Completed with Errors' END 
WHEN update_log.success_code = 100 
AND update_log.exit_code = 10 THEN 'Completed with Errors' 
ELSE CASE 
    WHEN update_log.program_name <> 'FileDepCheck' THEN 'Failed' 
    ELSE 'File not found' END END 
ELSE CASE 
    WHEN update_log.success_code = 10 
    AND update_log.exit_code = 0 THEN 'Failed; Entries for Input File Missing' 
    ELSE 'Aborted' END END 
ELSE CASE 
    WHEN ((CAST(Datediff(mi,
    update_log.start_datetime,
    Getdate()) AS INT) <= 240) 
    OR ((SELECT
        Count(*) 
    FROM
        MASTER.dbo.sysprocesses a(nolock) 
    INNER JOIN
        gcsdwdb.dbo.update_log b(nolock) 
            ON a.program_name = b.program_name 
    WHERE
        a.program_name = update_log.program_name 
        AND (Abs(Datediff(n, b.start_datetime, a.login_time))) < 1) > 0)) THEN 'Processing...' 
    ELSE 'Aborted without end date' END END + '' AS VARCHAR) 'Status / Log',
CAST('' AS VARCHAR (999)) + Char(160) + CAST('' AS VARCHAR (999)) 'Text Logs',
update_log.errorcount 'Error Count',
update_log.warningcount 'Warning Count',
(update_log.totmsgcount - update_log.errorcount - update_log.warningcount) 'Information Message Count',
CASE 
    WHEN update_log.end_datetime > update_log.start_datetime THEN CASE 
        WHEN CAST(Datepart('hh',
        (update_log.end_datetime - update_log.start_datetime)) AS INT) > 0 THEN CAST(Datepart('hh',
        (update_log.end_datetime - update_log.start_datetime)) AS VARCHAR) + ' hr ' 
        ELSE ' ' END + CASE 
    WHEN CAST(Datepart('mi',
    (update_log.end_datetime - update_log.start_datetime)) AS INT) > 0 THEN CAST(Datepart('mi',
    (update_log.end_datetime - update_log.start_datetime)) AS VARCHAR) + ' min ' 
    ELSE ' ' END + CASE 
    WHEN CAST(Datepart('ss',
    (update_log.end_datetime - update_log.start_datetime)) AS INT) > 0 THEN CAST(Datepart('ss',
    (update_log.end_datetime - update_log.start_datetime)) AS VARCHAR) + ' sec ' 
    ELSE ' ' END 
ELSE CASE 
    WHEN update_log.end_datetime = update_log.start_datetime THEN '< 1 sec' 
    ELSE CASE 
        WHEN ((CAST(Datediff(mi,
        update_log.start_datetime,
        Getdate()) AS INT) <= 240) 
        OR ((SELECT
            Count(*) 
        FROM
            MASTER.dbo.sysprocesses a(nolock) 
        INNER JOIN
            gcsdwdb.dbo.update_log b(nolock) 
                ON a.program_name = b.program_name 
        WHERE
            a.program_name = update_log.program_name 
            AND (
                Abs(Datediff(n, b.start_datetime, a.login_time))
            ) < 1) > 0)) THEN 'Running ' + CAST(Datediff(mi,
        update_log.start_datetime,
        Getdate()) AS VARCHAR) + ' min' 
        ELSE ' ' END END END 'Elapsed Time',
update_log.totalrows 'Total Rows',
update_log.inserted 'Rows Inserted',
update_log.updated 'Rows Updated' 
FROM
update_log 
WHERE
update_log.start_datetime >= '5/29/2009 16:15' 
ORDER BY
update_log.start_datetime DESC

Related Articles



* original question posted on StackOverflow here.