When migrating SQL Server workloads to Azure there is a trade-off between optimizing for performance and optimizing for costs. You need to know how much work your sql servers are performing when evaluating them prior to a migration.
Ideally you will have a tool that will capture key metrics such as your servers wait statistics as well as resource consumption, however not all of us have that luxury, and they can often be hard to aggregate and report from. This post will show how you can capture a lot of these metrics from within sql server negating the need for a monitoring tool and at the same time giving you an easy repository to analyse from.
To start of you need to know what the metrics are you need to capture, these typically include:
- CPU – example perfmon counters
- \Processor Information(_Total)% Processor Time
- \Process(sqlservr)% Processor Time
- Memory
- Storage
- \LogicalDisk\Disk Reads/Sec (read IOPS)
- \LogicalDisk\Disk Writes/Sec (write IOPS)
- \LogicalDisk\Disk Read Bytes/Sec (read throughput requirements for the data, log, and tempdb files)
- \LogicalDisk\Disk Write Bytes/Sec (write throughput requirements for the data, log, and tempdb files)
As documented here – https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-collect-baseline?view=azuresql-vm
Perfmon is one way of capturing this, but it is also readily available from within sql server itself.
Perfmon is one way of capturing this, but it is also readily available from within sql server itself.
CPU
For CPU you can use sys.dm_os_ring_buffers DMV and specifically type ‘RING_BUFFER_SCHEDULER_MONITOR’ which contains cpu utilisation captured every 60 seconds to give you the SQL process utilisation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
–CPU WITH SQLProcessCPU AS (SELECT TOP (30) SQLProcessUtilization AS ‘SQL_CPU_Usage’, SystemIdle AS ‘CPU_Idle’, 100 – y.SystemIdle – y.SQLProcessUtilization AS ‘Other_CPU_USage’, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ‘row_number’ FROM ( SELECT record.value(‘(./Record/@id)[1]’, ‘int’) AS record_id, record.value(‘(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]’, ‘int’) AS [SystemIdle], record.value(‘(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]’, ‘int’) AS [SQLProcessUtilization], timestamp FROM ( SELECT timestamp, CONVERT(XML, record) AS [record] FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N‘RING_BUFFER_SCHEDULER_MONITOR’ AN D record like ‘%<SystemHealth>%’ ) AS x ) AS y ) –INSERT INTO [DBA].perf.cpu SELECT GETDATE() AS Dt, AVG(SQLProcessCPU.SQL_CPU_Usage) AS ‘SQLProcessUtilization_1_Minutes’, AVG(SQLProcessCPU.CPU_Idle) AS [AVG_CPU_Idle], AVG(SQLProcessCPU.Other_CPU_USage) AS [AVG_Other_Processes] FROM SQLProcessCPU WHERE row_number BETWEEN 1 AND 2; |

Memory
Memory performance metrics can be found by querying the sys.dm_os_performance_counters and sys.dm_os_sys_memory DMV’s
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
—Memory —INSERT INTO [DBA].perf.memory SELECT GETDATE() AS Dt, pc.object_name, pc.counter_name, pc.instance_name, pc.cntr_value, pc.cntr_type FROM sys.dm_os_performance_counters pc WHERE pc.counter_name IN ( ‘Target Server Memory (KB)’, ‘Total Server Memory (KB)’, ‘Lazy writes/sec’, ‘page life expectancy’ ) AND pc.object_name NOT LIKE ‘%:Buffer Node%’ UNION SELECT GETDATE() AS Dt, ‘sys.dm_os_sys_memory’ AS [object_name], ‘available_physical_memory_kb’ AS counter_name, ‘OS’ AS [instance_name], available_physical_memory_kb AS [cntr_value], ” AS [cntr_type] FROM sys.dm_os_sys_memory; |

Batch Requests/second
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
—SQL Server Metrics DECLARE @v1 BIGINT, @delay SMALLINT = 2, @time DATETIME; SELECT @time = DATEADD(SECOND, @delay, ’00:00:02′); SELECT @v1 = cntr_value FROM master.sys.dm_os_performance_counters WHERE counter_name = ‘Batch Requests/sec’; WAITFOR DELAY @time; /* INSERT INTO perf.SQLServer ( Dt, Instance, Metric, Metric_Value ) */ SELECT GETDATE() AS [Dt], @@SERVERNAME AS [Instance], ‘BatchRequests/sec’ AS [Metric], (cntr_value – @v1) / @delay AS [Metric_Value] FROM master.sys.dm_os_performance_counters WHERE counter_name = ‘Batch Requests/sec’; |

Storage
I have taken Tim Radney’s modification of Paul Randal’s IO Script https://www.sqlskills.com/blogs/tim/capturing-throughput-usage-in-sql-server/
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
—Storage /*============================================================================ File: ShortPeriodIOLatencies.sql Summary: Short snapshot of I/O latencies SQL Server Versions: 2005 onwards —————————————————————————— Written by Paul S. Randal, SQLskills.com (c) 2014, SQLskills.com. All rights reserved. For more scripts and sample code, check out http://www.SQLskills.com You may alter this code for your own *non-commercial* purposes (e.g. in a for-sale commercial tool). Use in your own environment is encouraged. You may republish altered code as long as you include this copyright and give due credit, but you must obtain prior permission before blogging this code. THIS CODE AND INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. OF – Modified to include Tim Radneys analysis for azure storage ============================================================================*/ IF EXISTS ( SELECT * FROM [tempdb].[sys].[objects] WHERE [name] = N‘##SQLskillsStats1’ ) DROP TABLE [##SQLskillsStats1]; IF EXISTS ( SELECT * FROM [tempdb].[sys].[objects] WHERE [name] = N‘##SQLskillsStats2’ ) DROP TABLE [##SQLskillsStats2]; SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms], [num_of_writes], [io_stall_write_ms], [io_stall], [num_of_bytes_read], [num_of_bytes_written], [file_handle] INTO ##SQLskillsStats1 FROM sys.dm_io_virtual_file_stats(NULL, NULL); WAITFOR DELAY 00:02:00 —2 minutes; SELECT [database_id], [file_id], [num_of_reads], [io_stall_read_ms], [num_of_writes], [io_stall_write_ms], [io_stall], [num_of_bytes_read], [num_of_bytes_written], [file_handle] INTO ##SQLskillsStats2 FROM sys.dm_io_virtual_file_stats(NULL, NULL); WITH [DiffLatencies] AS (SELECT — Files that werent in the first snapshot [ts2].[database_id], [ts2].[file_id], [ts2].[num_of_reads], [ts2].[io_stall_read_ms], [ts2].[num_of_writes], [ts2].[io_stall_write_ms], [ts2].[io_stall], [ts2].[num_of_bytes_read], [ts2].[num_of_bytes_written] FROM [##SQLskillsStats2] AS [ts2] LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1] ON [ts2].[file_handle] = [ts1].[file_handle] WHERE [ts1].[file_handle] IS NULL UNION SELECT — Diff of latencies in both snapshots [ts2].[database_id], [ts2].[file_id], [ts2].[num_of_reads] – [ts1].[num_of_reads] AS [num_of_reads], [ts2].[io_stall_read_ms] – [ts1].[io_stall_read_ms] AS [io_stall_read_ms], [ts2].[num_of_writes] – [ts1].[num_of_writes] AS [num_of_writes], [ts2].[io_stall_write_ms] – [ts1].[io_stall_write_ms] AS [io_stall_write_ms], [ts2].[io_stall] – [ts1].[io_stall] AS [io_stall], [ts2].[num_of_bytes_read] – [ts1].[num_of_bytes_read] AS [num_of_bytes_read], [ts2].[num_of_bytes_written] – [ts1].[num_of_bytes_written] AS [num_of_bytes_written] FROM [##SQLskillsStats2] AS [ts2] LEFT OUTER JOIN [##SQLskillsStats1] AS [ts1] ON [ts2].[file_handle] = [ts1].[file_handle] WHERE [ts1].[file_handle] IS NOT NULL) INSERT INTO DBA.perf.storage ( Dt, DB, Drive, Type_Desc, Reads, Writes, ReadLatency_ms, WriteLatency_ms, AvgBPerRead, AvgBPerWrite, Physical_Name, num_of_mb_written, num_of_mb_read, mb_per_sec_Written, mb_per_Sec_Read ) SELECT GETDATE() AS Dt, DB_NAME([vfs].[database_id]) AS [DB], LEFT([mf].[physical_name], 2) AS [Drive], [mf].[type_desc], [num_of_reads] AS [Reads], [num_of_writes] AS [Writes], [ReadLatency(ms)] = CASE WHEN [num_of_reads] = 0 THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END, [WriteLatency(ms)] = CASE WHEN [num_of_writes] = 0 THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END, — [Latency] = — CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0) — THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END, [AvgBPerRead] = CASE WHEN [num_of_reads] = 0 THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END, [AvgBPerWrite] = CASE WHEN [num_of_writes] = 0 THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END, — [AvgBPerTransfer] = — CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0) — THEN 0 ELSE — (([num_of_bytes_read] + [num_of_bytes_written]) / — ([num_of_reads] + [num_of_writes])) END, [mf].[physical_name], (num_of_bytes_written / 1048576) AS num_of_mb_written, (num_of_bytes_read / 1048576) AS num_of_mb_read, (num_of_bytes_written / 1048576 / 120) AS mb_per_sec_Written, (num_of_bytes_read / 1048576 / 120) AS mb_per_sec_read FROM [DiffLatencies] AS [vfs] JOIN sys.master_files AS [mf] ON [vfs].[database_id] = [mf].[database_id] AND [vfs].[file_id] = [mf].[file_id] — ORDER BY [ReadLatency(ms)] DESC ORDER BY [WriteLatency(ms)] DESC; — Cleanup IF EXISTS ( SELECT * FROM [tempdb].[sys].[objects] WHERE [name] = N‘##SQLskillsStats1’ ) DROP TABLE [##SQLskillsStats1]; IF EXISTS ( SELECT * FROM [tempdb].[sys].[objects] WHERE [name] = N‘##SQLskillsStats2’ ) DROP TABLE [##SQLskillsStats2]; |
It’s important to note here the time interval of 2 minutes. If you are less worried about bursty throughput 5 minutes is a good interval, but you must be aware it can aggregate throughput over 2 minutes to give the appearance of lower throughput. See the example below:
Max of 1150mbps read at the time shown:

At a 10 second granularity..

Pulling it all together you can have a holistic view of your server :


And if you’re worried about overhead, all of the above in a stored proc is pretty lightweight..

Comments
One response to “Baselining SQL Server for Azure”
Very handy, thanks Oli 🙂