
Basic segmentation analysis with SQL, aka the GROUP BY clause

As a Data Analyst or Scientist you will probably do segmentations all the time. For
instance, it’s interesting to know the average departure delay of all flights (we have
just learned that it’s 11.36). But when it comes to business decisions, this number is
not actionable at all. However, if we turn this information into a more useful format
– let’s say we break it down by airport – it will instantly become something we can
act on!
Here’s a simplified chart showing how SQL performs automatic segmentation based
on column values:
The process has three important steps:

STEP 1 – Specify which columns you want to work with as an input. In our case we
want to use the list of the airports (origin column) and the departure delays
(depdelay column).
STEP 2 – Specify which column(s) we want to create our segmentation from. For us
it’s the origin. SQL automatically looks for every unique value in this column (in the
above example – airport 1, airport 2 and airport 3), then creates groups from them
and sorts each line from your data table into the right group.
STEP 3 – Finally it calculates the averages using the SQL AVG function for each group
and returns the results on your screen.
The only new thing here is the “grouping” at STEP 2. We have an SQL clause for that.
It’s called GROUP BY. Let’s see it in action:
SELECT
  AVG(depdelay),
  origin
FROM flight_delays
GROUP BY origin;

If you scroll through the results, you will see that some airports have an
average departure delay of more than 30 or even 40 minutes. From a business
perspective it’s important to understand what’s going on at those airports. On the
other hand it’s also worth taking a closer look at how the good airports
(depdelay close to 0) manage to reach this ideal state. (Yeah, it’s
oversimplified, but just for example…)
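Rather than scrolling, one way to surface those airports is to sort and filter the grouped results. This is a minimal sketch and not part of the original walkthrough: the alias avg_depdelay is a name chosen here for illustration, and the 30-minute threshold simply mirrors the figure mentioned above.

```sql
-- Sketch: list only airports whose average departure delay exceeds 30 minutes,
-- worst first. avg_depdelay is an illustrative alias, not a column in the table.
SELECT
  origin,
  AVG(depdelay) AS avg_depdelay
FROM flight_delays
GROUP BY origin
HAVING AVG(depdelay) > 30        -- filters whole groups, after the averages are computed
ORDER BY avg_depdelay DESC;      -- largest average delay first
```

HAVING is used here instead of WHERE because the condition applies to the per-airport average, which only exists once the groups have been formed.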
But what just happened SQL-wise? We selected two columns: origin and depdelay.
origin was used to create the segments (GROUP BY origin). depdelay was used to
calculate the average departure delay in each of these segments (AVG(depdelay)).
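To see the grouping on data small enough to check by hand, here is a sketch with a hypothetical five-row table. The table name flight_delays_demo, the airport codes and the delay values are all made up for illustration; only the origin and depdelay column names come from the tutorial.

```sql
-- Hypothetical demo table; the real flight_delays table has many more columns.
CREATE TABLE flight_delays_demo (
  origin   VARCHAR(3),
  depdelay INTEGER
);

INSERT INTO flight_delays_demo (origin, depdelay) VALUES
  ('AAA', 10),
  ('AAA', 20),
  ('BBB', 0),
  ('BBB', 4),
  ('CCC', 30);

SELECT origin, AVG(depdelay) AS avg_depdelay
FROM flight_delays_demo
GROUP BY origin
ORDER BY origin;

-- Three groups come back, one per unique origin:
--   AAA -> 15   (average of 10 and 20)
--   BBB -> 2    (average of 0 and 4)
--   CCC -> 30   (a single row)
-- The number formatting (15 vs. 15.0000) depends on the database.
```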


Note: As you can see, the logic of SQL is not as linear as it was in bash. In an
SQL query, the first line can depend heavily on the last line. When you write
really long and complex queries, this can cause unexpected errors and a little
headache too. That’s why I find it very important to give yourself enough time to
practice the basics and to make sure that you fully understand how the different
clauses and functions in SQL relate to each other.
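One way to picture that non-linear logic is to annotate a query with the order in which its clauses are conceptually evaluated. The numbering below is a simplified mental model, not a description of how any particular engine physically runs the query; the distance column is assumed to exist in flight_delays, as it does in a later exercise.

```sql
-- Conceptual evaluation order (the engine is free to optimize the actual execution):
SELECT                      -- 4. compute the output columns for each group
  origin,
  AVG(depdelay)
FROM flight_delays          -- 1. read the rows of the table
WHERE distance > 2000       -- 2. keep only the rows that match
GROUP BY origin             -- 3. sort the remaining rows into groups
ORDER BY origin;            -- 5. sort the final result
```

Read this way, it becomes clearer why the SELECT line at the top can only be understood together with the GROUP BY line near the bottom.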

Test yourself #1

Here’s a little assignment to practice on and to double-check that you understand
everything so far! The task is:
Print the total monthly airtime!
.
.
.
Ready?
Here’s my solution:
SELECT
  month,
  SUM(airtime)
FROM flight_delays
GROUP BY month;
I did pretty much the same stuff that I have done before, but now I’ve created the
groups based on the months – and this time I had to use the SUM function.

Test yourself #2

And another exercise:


Calculate the average departure delay by airport again, but this time use only those
flights that flew more than 2000 miles (you will find this info in
the distance column).
.
.
.
Here’s the query:
SELECT
  AVG(depdelay),
  origin
FROM flight_delays
WHERE distance > 2000
GROUP BY origin;

The takeaway from this assignment is something that you might have already
realized: the SQL WHERE clause can filter rows based on columns that are not
part of your SELECT list.
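A small variation makes the effect of the filter easier to see. This is only a sketch: the aliases long_haul_flights and avg_depdelay are names chosen here, and COUNT(*) is added purely to show how many rows survived the WHERE clause in each group.

```sql
-- distance is used for filtering but never appears in the SELECT list.
SELECT
  origin,
  COUNT(*)      AS long_haul_flights,  -- rows left in the group after filtering
  AVG(depdelay) AS avg_depdelay
FROM flight_delays
WHERE distance > 2000
GROUP BY origin;
```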

---------------

Common Table Expression Structure


CTEs are often written with a leading ";WITH". SQL Server requires the statement
before a CTE to be terminated with a semicolon, so the leading semicolon makes sure
no preceding command bleeds into this CTE.
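The semicolon rule can be seen in a small T-SQL sketch. SQL Server requires whatever statement precedes WITH in the same batch to be terminated, so ";WITH" is a defensive way of guaranteeing that. The variable @StartId and the CTE name cte_start are hypothetical names used only for this illustration.

```sql
-- T-SQL sketch: the statement before WITH must end with a semicolon.
DECLARE @StartId int = 2;          -- terminated, so the CTE can follow directly

WITH cte_start AS (                -- cte_start is a hypothetical name
  SELECT @StartId AS MenuId
)
SELECT MenuId FROM cte_start;      -- returns a single row: 2
```

If the DECLARE line did not end with a semicolon, SQL Server would reject the batch with a syntax error near WITH, which is the situation the leading semicolon guards against.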

The CTE is then given a name and wraps a statement in parentheses. After the
closing parenthesis, we immediately follow with another SQL statement, like
a SELECT, to view our results.

For our recursive list, we need a starting point. The starting point is the set of
records whose ParentId is NULL.

Our first SELECT statement grabs the initial records. A UNION ALL then attaches a
second SELECT that refers back to the CTE by name. This is where it gets a little trippy.

;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ParentId IS NULL

UNION ALL

SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
Notice how we are joining on the cte_categories inside the cte_categories.

As I mentioned before, it is absolutely necessary to have a single statement right after
the closing parenthesis or SQL Server will complain about it.

So our final Common Table Expression looks like this:

;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ParentId IS NULL

UNION ALL

SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
SELECT
MenuId
,Title
,ParentId
FROM cte_categories
This query returns all of the records.

"But JD, why not just return all the records anyway and let C# handle it?"
Even though you could do a "SELECT * FROM MenuSystem", CTEs provide a better way to
grab hierarchical data.
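One thing a recursive CTE can do that a plain SELECT * cannot is carry information down the hierarchy, such as how deep each item sits. The sketch below assumes a temporary table #MenuSystemDemo with made-up rows (only MenuId 2 and its title appear in this article) and adds a hypothetical MenuLevel column to the CTE.

```sql
-- Hypothetical sample rows; only MenuId 2 and its title come from the article.
CREATE TABLE #MenuSystemDemo (
  MenuId   int         NOT NULL PRIMARY KEY,
  Title    varchar(50) NOT NULL,
  ParentId int         NULL
);

INSERT INTO #MenuSystemDemo (MenuId, Title, ParentId) VALUES
  (1, 'Books', NULL),
  (2, 'Movies, Music, & Games', NULL),
  (3, 'Movies & TV', 2),
  (4, 'Blu-ray', 3),
  (5, 'CDs & Vinyl', 2);

;WITH cte_categories AS
(
  -- Anchor: top-level items start at level 0.
  SELECT ms.MenuId, ms.Title, ms.ParentId, 0 AS MenuLevel
  FROM #MenuSystemDemo ms
  WHERE ms.ParentId IS NULL

  UNION ALL

  -- Recursive step: each child sits one level below its parent.
  SELECT ms.MenuId, ms.Title, ms.ParentId, cat.MenuLevel + 1
  FROM #MenuSystemDemo ms
  INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
SELECT MenuId, Title, ParentId, MenuLevel
FROM cte_categories
ORDER BY MenuLevel, MenuId;

-- MenuLevel per MenuId: 1 -> 0, 2 -> 0, 3 -> 1, 5 -> 1, 4 -> 2
```

The explicit ORDER BY is there because the row order of a CTE result is not guaranteed without one.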

This is where the beauty of recursive common table expressions shines through.

Let's say our user selects the "Movies, Music, & Games" menu option from the Amazon
menu and, on the next page, you want to display all of the menu items from MenuId 2
down. Your CTE would look like this:

;WITH cte_categories
AS
(
SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
WHERE ms.MenuId=2 -- Make your starting point a single menu item

UNION ALL

SELECT
ms.MenuId
,ms.Title
,ms.ParentId
FROM MenuSystem ms
INNER JOIN cte_categories cat ON ms.ParentId = cat.MenuId
)
SELECT
MenuId
,Title
,ParentId
FROM cte_categories
Your results look like this:
--------------- Date with CAST and CONVERT in SQL Server ---------------

Syntax
-- CAST syntax:
CAST ( expression AS data_type [ ( length ) ] )
-- Example:
CAST(o.OrderDate AS date)
-- CONVERT syntax:
CONVERT ( data_type [ ( length ) ] , expression [ , style ] )
-- Example:
CONVERT(date, o.OrderDate) AS date
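A short T-SQL sketch of the two functions side by side. The variable @OrderDate is a hypothetical stand-in for the o.OrderDate column, and the column aliases are names chosen here; the practical difference shown is that only CONVERT accepts a style number.

```sql
DECLARE @OrderDate datetime = '2006-12-30T00:38:54.840';  -- stand-in for o.OrderDate

SELECT
  CAST(@OrderDate AS date)              AS cast_result,     -- 2006-12-30
  CONVERT(date, @OrderDate)             AS convert_result,  -- 2006-12-30
  CONVERT(varchar(10), @OrderDate, 103) AS styled_text;     -- 30/12/2006
```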

https://www.mssqltips.com/sqlservertip/1145/date-and-time-conversions-using-sql-server/

Problem

There are many instances when dates and times don't show up at your doorstep in the
format you'd like them to be in, nor does the output of a query fit the needs of the
people viewing it. One option is to format the data in the application itself. Another
option is to use the built-in functions SQL Server provides to format the date string for you.

Solution

SQL Server provides a number of options you can use to format a date/time string. One
of the first considerations is the actual date/time needed. The most common is the
current date/time using getdate(). This provides the current date and time according to
the server providing the date and time. If a universal date/time is needed,
then getutcdate() should be used. To change the format of the date, you convert the
requested date to a string and specify the format number corresponding to the format
needed.
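As a minimal sketch of that paragraph, the query below reads both clocks and formats the UTC value as text. The column aliases are illustrative names; the actual values depend on when and where the query runs, so no sample output is given.

```sql
SELECT
  GETDATE()                               AS server_local_time,  -- server's local clock
  GETUTCDATE()                            AS utc_time,           -- same instant in UTC
  CONVERT(varchar(10), GETUTCDATE(), 23)  AS utc_date_text;      -- yyyy-mm-dd as text
```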

How to get different SQL Server date formats


1. Use the date format option along with CONVERT function
2. To get YYYY-MM-DD use SELECT CONVERT(varchar, getdate(), 23)
3. To get MM/DD/YY use SELECT CONVERT(varchar, getdate(), 1)
4. Check out the chart to get a list of all format options

Below is a list of formats and an example of the output. The date used for all of these
examples is "2006-12-30 00:38:54.840".
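Because getdate() changes on every run, one way to reproduce the samples is to pin the date in a variable first. This is a sketch under that assumption: @d and the column aliases are names chosen here, and the explicit varchar lengths are a suggestion (without a length, CAST and CONVERT default varchar to 30 characters).

```sql
DECLARE @d datetime = '2006-12-30T00:38:54.840';  -- the sample date used in the list

SELECT
  CONVERT(varchar(10), @d, 23)  AS iso_date,        -- 2006-12-30
  CONVERT(varchar(10), @d, 101) AS us_date,         -- 12/30/2006
  CONVERT(varchar(8),  @d, 108) AS time_only,       -- 00:38:54
  CONVERT(varchar(23), @d, 121) AS odbc_canonical;  -- 2006-12-30 00:38:54.840
```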

DATE ONLY FORMATS

Format #  Query                                      Sample
1         select convert(varchar, getdate(), 1)      12/30/06
2         select convert(varchar, getdate(), 2)      06.12.30
3         select convert(varchar, getdate(), 3)      30/12/06
4         select convert(varchar, getdate(), 4)      30.12.06
5         select convert(varchar, getdate(), 5)      30-12-06
6         select convert(varchar, getdate(), 6)      30 Dec 06
7         select convert(varchar, getdate(), 7)      Dec 30, 06
10        select convert(varchar, getdate(), 10)     12-30-06
11        select convert(varchar, getdate(), 11)     06/12/30
12        select convert(varchar, getdate(), 12)     061230
23        select convert(varchar, getdate(), 23)     2006-12-30
101       select convert(varchar, getdate(), 101)    12/30/2006
102       select convert(varchar, getdate(), 102)    2006.12.30
103       select convert(varchar, getdate(), 103)    30/12/2006
104       select convert(varchar, getdate(), 104)    30.12.2006
105       select convert(varchar, getdate(), 105)    30-12-2006
106       select convert(varchar, getdate(), 106)    30 Dec 2006
107       select convert(varchar, getdate(), 107)    Dec 30, 2006
110       select convert(varchar, getdate(), 110)    12-30-2006
111       select convert(varchar, getdate(), 111)    2006/12/30
112       select convert(varchar, getdate(), 112)    20061230

TIME ONLY FORMATS

8         select convert(varchar, getdate(), 8)      00:38:54
14        select convert(varchar, getdate(), 14)     00:38:54:840
24        select convert(varchar, getdate(), 24)     00:38:54
108       select convert(varchar, getdate(), 108)    00:38:54
114       select convert(varchar, getdate(), 114)    00:38:54:840

DATE & TIME FORMATS

0         select convert(varchar, getdate(), 0)      Dec 30 2006 12:38AM
9         select convert(varchar, getdate(), 9)      Dec 30 2006 12:38:54:840AM
13        select convert(varchar, getdate(), 13)     30 Dec 2006 00:38:54:840
20        select convert(varchar, getdate(), 20)     2006-12-30 00:38:54
21        select convert(varchar, getdate(), 21)     2006-12-30 00:38:54.840
22        select convert(varchar, getdate(), 22)     12/30/06 12:38:54 AM
25        select convert(varchar, getdate(), 25)     2006-12-30 00:38:54.840
100       select convert(varchar, getdate(), 100)    Dec 30 2006 12:38AM
109       select convert(varchar, getdate(), 109)    Dec 30 2006 12:38:54:840AM
113       select convert(varchar, getdate(), 113)    30 Dec 2006 00:38:54:840
120       select convert(varchar, getdate(), 120)    2006-12-30 00:38:54
121       select convert(varchar, getdate(), 121)    2006-12-30 00:38:54.840
126       select convert(varchar, getdate(), 126)    2006-12-30T00:38:54.840
127       select convert(varchar, getdate(), 127)    2006-12-30T00:38:54.840

ISLAMIC CALENDAR DATES

130       select convert(nvarchar, getdate(), 130)
131       select convert(nvarchar, getdate(), 131)   10/12/1427 12:38:54:840AM

You can also format the date or time without dividing characters, as well as concatenate
the date and time string:

Sample statement and output

select replace(convert(varchar, getdate(),101),'/','')
Output: 12302006

select replace(convert(varchar, getdate(),101),'/','') + replace(convert(varchar, getdate(),108),':','')
Output: 12302006004426
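The same idea with a pinned date, so the output is reproducible. This is a sketch: @d and the column aliases are names chosen here, and the last column shows style 112 as a possible alternative that produces a separator-free date (in year-first order) without any REPLACE call.

```sql
DECLARE @d datetime = '2006-12-30T00:38:54.840';

SELECT
  REPLACE(CONVERT(varchar(10), @d, 101), '/', '')      AS mmddyyyy,        -- 12302006
  REPLACE(CONVERT(varchar(10), @d, 101), '/', '')
    + REPLACE(CONVERT(varchar(8), @d, 108), ':', '')   AS mmddyyyyhhmiss,  -- 12302006003854
  CONVERT(varchar(8), @d, 112)                         AS yyyymmdd;        -- 20061230
```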
