[Company Logo Image] 

Home Up Contents Coffee Break Credits Glossary Links Search

 
Grouping customers using Azure SQL DB Machine Learning

 

 

Home
Analysis Services
Azure
CLR Integration
High Availability
Open Source
Security
SQL Server 2008
SQL Server 2012
SQL Server 2014
SQL Server 2016
SQL Server 2017
SQL Server 2019
Tips
Troubleshooting
Tuning

Grouping customers using Azure SQL DB Machine Learning.


Date created: June 2, 2019.
 

In this article, we will use Azure SQL Database Machine Learning Services to group customers of an online store using K-means (clustering) based on their purchase and return history.

We are trying to reproduce an old SQL Server tutorial on the new Azure SQL Database Machine Learning service.

 

Installing Microsoft R Client and SQL Server Management Studio.


For the purpose of this article we need to install Microsoft R Client.

 

 

 

 

 

 

 

We also need to install SQL Server Management Studio as explained here.

 

Calculating sales return KPIs.

In this section we are using R Client to connect to an Azure SQL Database and get some KPIs about sales returns like the return frequency, the return order ratio, the return item ratio and the return amount ratio.





On the next image you see how to connect to an Azure SQL Database using Microsoft R Client.


 

Below the query to calculate all required KPIs.




input_query <- "
SELECT
ss_customer_sk AS customer,
round(CASE WHEN ((orders_count = 0) OR (returns_count IS NULL) OR (orders_count IS NULL) OR ((returns_count / orders_count) IS NULL) ) THEN 0.0 ELSE (cast(returns_count as nchar(10)) / orders_count) END, 7) AS orderRatio,
round(CASE WHEN ((orders_items = 0) OR(returns_items IS NULL) OR (orders_items IS NULL) OR ((returns_items / orders_items) IS NULL) ) THEN 0.0 ELSE (cast(returns_items as nchar(10)) / orders_items) END, 7) AS itemsRatio,
round(CASE WHEN ((orders_money = 0) OR (returns_money IS NULL) OR (orders_money IS NULL) OR ((returns_money / orders_money) IS NULL) ) THEN 0.0 ELSE (cast(returns_money as nchar(10)) / orders_money) END, 7) AS monetaryRatio,
round(CASE WHEN ( returns_count IS NULL ) THEN 0.0 ELSE returns_count END, 0) AS frequency
--cast(round(cast(CASE WHEN (returns_count IS NULL) THEN 0.0 ELSE returns_count END as double)) as integer) AS frequency
FROM
(
SELECT
ss_customer_sk,
-- return order ratio
COUNT(distinct(ss_ticket_number)) AS orders_count,
-- return ss_item_sk ratio
COUNT(ss_item_sk) AS orders_items,
-- return monetary amount ratio
SUM( ss_net_paid ) AS orders_money
FROM store_sales s
GROUP BY ss_customer_sk
) orders
LEFT OUTER JOIN
(
SELECT
sr_customer_sk,
-- return order ratio
count(distinct(sr_ticket_number)) as returns_count,
-- return ss_item_sk ratio
COUNT(sr_item_sk) as returns_items,
-- return monetary amount ratio
SUM( sr_return_amt ) AS returns_money
FROM store_returns
GROUP BY sr_customer_sk
) returned ON ss_customer_sk=sr_customer_sk
"



 

customer_returns <- RxSqlServerData(sqlQuery=input_query,colClasses=c(customer ="numeric" , orderRatio="numeric" , itemsRatio="numeric" , monetaryRatio="numeric" , frequency="numeric" ),connectionString=connStr);






Showing the results for 5 customers only.
 


Using Kmeans to perform clustering and analyzing results using a Plot.


return_cluster is the output table to hold the customer group mappings. This is a table where the cluster mappings will be saved in the database. This table is generated from R.






Set.seed for random number generator for predictability.





Generate clusters using rxKmeans and output key / cluster to a table in SQL Server called return_cluster.
 




Read the customer returns cluster table from the database.





Installing the R library "cluster".










Using a plot to visualize data for analysis.








Below the statement that created the plot.






Analyzing cluster means.
 

The clust object contains the results from our Kmeans clustering. Look at the clustering details and analyze results.





Now, let's use SQL Server Management Studio and the following SELECT statement to use the cluster data to show email addresses of customers belonging to cluster #3 from the tables in Azure SQL Database.





Creating a stored procedure with R code for clustering.
 

Putting R code on stored procedures makes it easy to run R code from applications. The following script creates a stored procedure that will perform clustering using R code.


DROP PROC IF EXISTS generate_customer_return_clusters;
GO
CREATE procedure [dbo].[generate_customer_return_clusters]
AS
/*
This procedure uses R to classify customers into different groups based on their
purchase & return history.
*/
BEGIN
DECLARE @duration FLOAT
, @instance_name NVARCHAR(100) = @@SERVERNAME
, @database_name NVARCHAR(128) = db_name()
, @input_query NVARCHAR(MAX) = N'
SELECT
ss_customer_sk AS customer,
round(CASE WHEN ((orders_count = 0) OR (returns_count IS NULL) OR (orders_count IS NULL) OR ((returns_count / orders_count) IS NULL) ) THEN 0.0 ELSE (cast(returns_count as nchar(10)) / orders_count) END, 7) AS orderRatio,
round(CASE WHEN ((orders_items = 0) OR(returns_items IS NULL) OR (orders_items IS NULL) OR ((returns_items / orders_items) IS NULL) ) THEN 0.0 ELSE (cast(returns_items as nchar(10)) / orders_items) END, 7) AS itemsRatio,
round(CASE WHEN ((orders_money = 0) OR (returns_money IS NULL) OR (orders_money IS NULL) OR ((returns_money / orders_money) IS NULL) ) THEN 0.0 ELSE (cast(returns_money as nchar(10)) / orders_money) END, 7) AS monetaryRatio,
round(CASE WHEN ( returns_count IS NULL ) THEN 0.0 ELSE returns_count END, 0) AS frequency

FROM
(
SELECT
ss_customer_sk,
-- return order ratio
COUNT(distinct(ss_ticket_number)) AS orders_count,
-- return ss_item_sk ratio
COUNT(ss_item_sk) AS orders_items,
-- return monetary amount ratio
SUM( ss_net_paid ) AS orders_money
FROM store_sales s
GROUP BY ss_customer_sk
) orders
LEFT OUTER JOIN
(
SELECT
sr_customer_sk,
-- return order ratio
count(distinct(sr_ticket_number)) as returns_count,
-- return ss_item_sk ratio
COUNT(sr_item_sk) as returns_items,
-- return monetary amount ratio
SUM( sr_return_amt ) AS returns_money
FROM store_returns
GROUP BY sr_customer_sk
) returned ON ss_customer_sk=sr_customer_sk
'
EXEC sp_execute_external_script
@language = N'R'
, @script = N'

connStr <- paste("Driver=SQL Server;Server=", instance_name, " ;Database=", database_name, " ;Trusted_Connection=true;", sep="" );

customer_returns <- RxSqlServerData(sqlQuery = input_query,
colClasses = c(customer = "numeric", orderRatio = "numeric", itemsRatio = "numeric", monetaryRatio = "numeric", frequency = "numeric"),
connectionString = connStr);

return_cluster = RxSqlServerData(table = "customer_return_clusters", connectionString = connStr);

set.seed(10);

"customer_return_clusters".
clust <- rxKmeans( ~ orderRatio + itemsRatio + monetaryRatio + frequency, customer_returns, numClusters = 4
, outFile = return_cluster, outColName = "cluster", writeModelVars = TRUE , extraVarsToWrite = c("customer"), overwrite = TRUE);
'
, @input_data_1 = N''
, @params = N'@instance_name nvarchar(100), @database_name nvarchar(128), @input_query nvarchar(max), @duration float OUTPUT'
, @instance_name = @instance_name
, @database_name = @database_name
, @input_query = @input_query
, @duration = @duration OUTPUT;
END;

GO





 

Let's now execute the stored procedure to perform the clustering.

 




The following query verifies customers have been grouped and have assigned a cluster number.







Let's run again the following SELECT statement to use the cluster data and show email addresses of customers belonging to cluster #3 from the tables in Azure SQL Database.




 

 

 

.Send mail to webmaster@sqlcoffee.com with questions or comments about this web site.