The GROUP BY clause
As a Data Analyst or Scientist you will probably do segmentations all the time. For
instance, it’s interesting to know the average departure delay of all flights (we have
just learned that it’s 11.36). But when it comes to business decisions, this number is
not actionable at all. However, if we turn this information into a more useful format
– let’s say we break it down by airport – it will instantly become something we can
act on!
Here’s a simplified chart showing how SQL performs automatic segmentation based
on column values:
The process has three important steps:
STEP 1 – Specify which columns you want to work with as an input. In our case we
want to use the list of the airports (origin column) and the departure delays
(depdelay column).
STEP 2 – Specify which column(s) we want to create our segmentation from. For us
it’s the origin. SQL automatically looks for every unique value in this column (in the
above example – airport 1, airport 2 and airport 3), then creates groups from them
and sorts each line from your data table into the right group.
STEP 3 – Finally it calculates the averages using the SQL AVG function for each group
and returns the results on your screen.
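To make the three steps concrete, here is a minimal sketch in plain Python that does by hand what SQL does automatically. The sample rows and delay values are made up for illustration; they are not taken from the real flight_delays table.

```python
from collections import defaultdict

# Hypothetical sample rows: (origin, depdelay). Invented for illustration only.
rows = [
    ("airport 1", 10.0),
    ("airport 2", 0.0),
    ("airport 1", 30.0),
    ("airport 3", 5.0),
    ("airport 2", 4.0),
]

# STEP 1: the input columns are origin and depdelay (already picked above).
# STEP 2: put every row into the group that matches its origin value.
groups = defaultdict(list)
for origin, depdelay in rows:
    groups[origin].append(depdelay)

# STEP 3: compute the average departure delay inside each group.
averages = {origin: sum(delays) / len(delays) for origin, delays in groups.items()}

for origin, avg in sorted(averages.items()):
    print(origin, avg)
# airport 1 20.0
# airport 2 2.0
# airport 3 5.0
```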
The only new thing here is the “grouping” at STEP 2. We have an SQL clause for that.
It’s called GROUP BY. Let’s see it in action:
SELECT
AVG(depdelay),
origin
FROM flight_delays
GROUP BY origin;
If you scroll through the results, you will see that there are some airports with an
average departure delay of more than 30 or even 40 minutes. From a business
perspective it’s important to understand what’s going on at those airports. On the
other hand, it’s also worth taking a closer look at how the good airports
(depdelay close to 0) manage to reach this ideal state. (Yes, it’s
oversimplified, but it’s just an example…)
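If you don't have the flight_delays table at hand, the same GROUP BY query can be tried on a toy table. This is only a sketch: it assumes Python's built-in sqlite3 module instead of the tutorial's own database, the airport codes and delays are invented, and the ORDER BY is an addition so the output order is predictable.

```python
import sqlite3

# In-memory toy database; rows are invented, not the real flight data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight_delays (origin TEXT, depdelay REAL)")
conn.executemany(
    "INSERT INTO flight_delays VALUES (?, ?)",
    [("ABQ", 12.0), ("ABQ", 8.0), ("JFK", 40.0), ("JFK", 50.0), ("SFO", 0.0)],
)

# Same shape as the tutorial query; ORDER BY added only for stable output.
result = conn.execute(
    """
    SELECT
      AVG(depdelay),
      origin
    FROM flight_delays
    GROUP BY origin
    ORDER BY origin
    """
).fetchall()

print(result)  # [(10.0, 'ABQ'), (45.0, 'JFK'), (0.0, 'SFO')]
```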
But what just happened SQL-wise? We have selected two columns:
origin and depdelay. origin has been used to create the segments (GROUP BY
origin). depdelay has been used to calculate the average departure delay in each
segment (AVG(depdelay)).
Test yourself #1
SELECT
month,
SUM(airtime)
FROM flight_delays
GROUP BY month;
I did pretty much the same stuff that I have done before, but now I’ve created the
groups based on the months – and this time I had to use the SUM function.
Test yourself #2
SELECT
AVG(depdelay),
origin
FROM flight_delays
The takeaway from this assignment is something that you might have already
realized: you can use the SQL WHERE clause to filter even those columns that are not
part of your SELECT statement.
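Here is a small sketch of that takeaway, again assuming Python's sqlite3 module and invented rows. The filter on month = 1 is a made-up condition chosen for illustration; the point is only that month is used in WHERE without appearing in the SELECT list.

```python
import sqlite3

# Toy table with invented rows; month is filtered on but never selected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_delays (month INTEGER, origin TEXT, depdelay REAL)"
)
conn.executemany(
    "INSERT INTO flight_delays VALUES (?, ?, ?)",
    [
        (1, "ABQ", 10.0),
        (1, "ABQ", 20.0),
        (2, "ABQ", 90.0),  # dropped by the WHERE clause before grouping
        (1, "JFK", 30.0),
        (2, "JFK", 5.0),   # dropped by the WHERE clause before grouping
    ],
)

filtered = conn.execute(
    """
    SELECT
      AVG(depdelay),
      origin
    FROM flight_delays
    WHERE month = 1        -- hypothetical filter, not from the original exercise
    GROUP BY origin
    ORDER BY origin        -- added only for stable output
    """
).fetchall()

print(filtered)  # [(15.0, 'ABQ'), (30.0, 'JFK')]
```

Rows are filtered first and grouped afterwards, which is why the February delays do not affect the averages.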
---------------
The CTE is given a name, and the statement that defines it is wrapped in parentheses.
Right after the closing parenthesis, we follow with another SQL statement, such as
a SELECT, to view our results.
For our recursive list, we need a starting point. The starting point for our records is the
set of rows whose ParentId is NULL.
Our first SELECT statement grabs those initial records. We follow it with a UNION ALL and
a second SELECT that joins back to the CTE by name. This is where it gets a little trippy.
;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ParentId IS NULL
UNION ALL
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
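Notice how the second SELECT joins on cte_categories from inside the definition of
cte_categories. That self-reference is what makes the CTE recursive. The CTE on its own
is not a complete statement, so we add a SELECT after the closing parenthesis: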
;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ParentId IS NULL
UNION ALL
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
SELECT
MenuId
,Title
,ParentId
FROM cte_categories
This query returns all of the records in MenuSystem.
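To watch the recursion work without a SQL Server instance, here is a rough sketch using Python's sqlite3 module. Several things are assumptions: the menu rows are invented (only the "Movies, Music, & Games" title with MenuId 2 comes from the text), SQLite spells the clause WITH RECURSIVE, the CTE's column list is declared explicitly, and an ORDER BY is added for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MenuSystem (MenuId INTEGER PRIMARY KEY, Title TEXT, ParentId INTEGER)"
)
# Hypothetical menu rows; top-level items have ParentId NULL.
conn.executemany(
    "INSERT INTO MenuSystem VALUES (?, ?, ?)",
    [
        (1, "Books", None),
        (2, "Movies, Music, & Games", None),
        (3, "Movies & TV", 2),
        (4, "Blu-ray", 3),
        (5, "Textbooks", 1),
    ],
)

all_rows = conn.execute(
    """
    WITH RECURSIVE cte_categories (MenuId, Title, ParentId) AS (
        -- anchor: the top-level items
        SELECT ms.MenuId, ms.Title, ms.ParentId
        FROM MenuSystem ms
        WHERE ms.ParentId IS NULL
        UNION ALL
        -- recursive step: children of rows already in the CTE
        SELECT ms.MenuId, ms.Title, ms.ParentId
        FROM MenuSystem ms
        INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
    )
    SELECT MenuId, Title, ParentId
    FROM cte_categories
    ORDER BY MenuId
    """
).fetchall()

for row in all_rows:
    print(row)
```

The anchor picks up MenuId 1 and 2, the first recursive pass adds their children (3 and 5), and the next pass adds 4. Recursion stops when a pass finds no new rows.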
"But JD, why not just return all the records anyway and let C# handle it?"
Even though you could do a "SELECT * FROM MenuSystem", CTEs provide a better way to
grab hierarchical data.
This is where the beauty of recursive common table expressions shines through.
Let's say our user selects the "Movies, Music, & Games" menu option from the Amazon
menu and, on the next page, you want to display all of the menu items from MenuId 2
down. Your CTE would look like this:
;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ms.MenuId=2 -- Make your starting point a single menu item
UNION ALL
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
SELECT
MenuId
,Title
,ParentId
FROM cte_categories
Your results look like this:
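As a stand-in that can be run locally, the sketch below wraps the same idea in a small Python function using sqlite3. The function name subtree, the menu rows other than "Movies, Music, & Games", and the ORDER BY are all assumptions made for illustration; the output is that of this toy table, not of the author's database.

```python
import sqlite3


def subtree(conn, start_id):
    """Return the menu item start_id and everything below it (hypothetical helper)."""
    return conn.execute(
        """
        WITH RECURSIVE cte_categories (MenuId, Title, ParentId) AS (
            -- anchor: a single menu item chosen by the caller
            SELECT ms.MenuId, ms.Title, ms.ParentId
            FROM MenuSystem ms
            WHERE ms.MenuId = ?
            UNION ALL
            -- recursive step: children of rows already collected
            SELECT ms.MenuId, ms.Title, ms.ParentId
            FROM MenuSystem ms
            INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
        )
        SELECT MenuId, Title, ParentId
        FROM cte_categories
        ORDER BY MenuId
        """,
        (start_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MenuSystem (MenuId INTEGER PRIMARY KEY, Title TEXT, ParentId INTEGER)"
)
# Invented rows for illustration.
conn.executemany(
    "INSERT INTO MenuSystem VALUES (?, ?, ?)",
    [
        (1, "Books", None),
        (2, "Movies, Music, & Games", None),
        (3, "Movies & TV", 2),
        (4, "Blu-ray", 3),
        (5, "Textbooks", 1),
    ],
)

branch = subtree(conn, 2)
print([row[0] for row in branch])  # [2, 3, 4]
```

Only the starting condition changed compared with the full-tree query; the recursive half is identical, so "Books" and "Textbooks" never enter the result.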
--------------------------------- Date with CAST and CONVERT in SQL Server ------------
Syntax
-- CAST Syntax:
CAST ( expression AS data_type [ ( length ) ] )
Eg: CAST(o.OrderDate AS date)
-- CONVERT Syntax:
CONVERT ( data_type [ ( length ) ] , expression [ , style ] )
Eg: CONVERT(date, o.OrderDate) AS date
https://www.mssqltips.com/sqlservertip/1145/date-and-time-conversions-using-sql-server/
Problem
There are many instances when dates and times don't show up at your doorstep in the
format you'd like them to be in, nor does the output of a query fit the needs of the people
viewing it. One option is to format the data in the application itself. Another option is to
use the built-in functions SQL Server provides to format the date string for you.
Solution
SQL Server provides a number of options you can use to format a date/time string. One
of the first considerations is the actual date/time needed. The most common is the
current date/time, obtained with getdate(), which returns the date and time according to
the clock of the server running the query. If a universal (UTC) date/time is needed,
use getutcdate() instead. To change the format of the date, you convert the
requested date to a string and specify the style number corresponding to the format
needed.
Below is a list of formats and an example of the output. The date used for all of these
examples is "2006-12-30 00:38:54.840".
You can also format the date or time without dividing characters, as well as concatenate
the date and time string:
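CONVERT itself only runs on SQL Server, so as a rough illustration the Python sketch below mimics the string layouts of a few commonly used style numbers for the sample date. The mapping of style numbers to layouts (101, 103, 108, 112, 120) is written from memory of the SQL Server documentation and should be checked there; the Python names are hypothetical.

```python
from datetime import datetime

# The sample date used in the text: 2006-12-30 00:38:54.840
sample = datetime(2006, 12, 30, 0, 38, 54, 840000)

# Assumed correspondence between CONVERT style numbers and layouts.
# Roughly: CONVERT(varchar, getdate(), 101) -> mm/dd/yyyy, and so on.
layouts = {
    101: "%m/%d/%Y",           # mm/dd/yyyy
    103: "%d/%m/%Y",           # dd/mm/yyyy
    108: "%H:%M:%S",           # hh:mi:ss
    112: "%Y%m%d",             # yyyymmdd
    120: "%Y-%m-%d %H:%M:%S",  # yyyy-mm-dd hh:mi:ss
}
formatted = {style: sample.strftime(fmt) for style, fmt in layouts.items()}

for style, text in formatted.items():
    print(style, text)
# 101 12/30/2006
# 103 30/12/2006
# 108 00:38:54
# 112 20061230
# 120 2006-12-30 00:38:54

# Date and time without dividing characters, concatenated into one string.
compact = sample.strftime("%Y%m%d") + sample.strftime("%H%M%S")
print(compact)  # 20061230003854
```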