database-wiki

Don't reinvent the wheel, just realign it!

Performance Analysis? SQL Server slow or Query slow? PSSDIAG is there for the rescue!

Posted by database-wiki on July 3, 2012

Things you need to have:

1. PSSDIAG configuration tool:

Download link:

http://support.microsoft.com/kb/830232

2. Some points to consider before configuring and running PSSDIAG on a production system.

=> —  Make sure that you choose the correct version of SQL Server and platform to be captured (i.e. SQL 2000/2005/2008, x86/x64/ia64). If you make modifications to events, etc. and then you change the version, platform, then you will lose any modifications you have made to that point.

=>—  PSSDIAG should be captured on a local disk from the server that has the SQL instance running on it to make sure that all the data is captured.

=>—  PSSDIAG should be collected on a fast and dedicated disk volume to help ensure minimum performance impact on the SQL Server and also to ensure that the disk statistics are not skewed because of PSSDIAG data collection.

=>—  Make sure you puts PSSDiag on a disk where no database files reside.

=>—  Always provide the server-name and instance name while configuring the PSSDIAG to prevent unnecessary data from being collected.

— =>  Always trace to a locally attached (or SAN) disk, never to a network share or UNC path.

=>—  Don’t capture extremely high frequency events. (Refer below)

=>—  The Show Plan Statistics event can be extremely useful, but be aware that capturing this event has a much larger CPU cost than most other events.

=>  Trace filtering can dramatically reduce .trc file size and the I/O cost of tracing, but you should be aware that it can actually increase the CPU burden of tracing.

For Long Term PSSDIAG collection:

  • Enable NTFS compression in PSSDIAG
  • Eliminate Profiler tracing if a trace is not required
  • If a Profiler trace is required, minimize the size of the trace
  • Enable the “Delete Old Trace Files” custom task group in PSSDIAG
  • Consider using PSSDIAG’s /E and /L command line parameter to restart collection each night
  • Consider using PSSDIAG’s built in NT service support.
  • Collect only Completed events for Batch and SPs if you know that the you are not experiencing timeouts
  • Statement level events can be avoided to reduce the size of data collection but that would prevent us from drilling down to the actual statement

Warning: Do not do any of the following in an attempt to reduce the disk space used by a PSSDIAG capture:

  • Do not try to run PSSDIAG on a remote machine in an attempt to redirect the profiler .TRC files to a remote machine with more disk space
  • Do not use a UNC path or mapped network drive for PSSDIAG’s output path if you are capturing profiler

Possible customization:

  • Cluster (similar to MPS Reports) – For corruption RCA cases
  • Database Mail/Service Broker/Database Mirroring/Full Text Search
  • Delete Old Trace Files – For long running PSSDIAG
  • Merge Repl (Dist)/Merge Repl (Pub-Sub)/Tran Repl (Dist)/Tran Repl (Pub)/Repl (General)
  • Missing Perfmon Counters
  • OSDrivers – For corruption RCA cases
  • PSSLooksAlive – For cluster failover RCA
  • SQL 2000 Blocking Script/SQL 2005 Perf Stats/SQL 2008 Perf Stats –Select relevant one when configuring for long term collection to reduce size.
  • SQL 2000 Perf Stats
  • SQL 2005 tempdb Space and Latching
  • SQL 2008 Backup MDW
  • SQL Backup Restore
  • SQL Blocking – For deadlocks
  • SQL BPA
  • SQL Dumps – To be enabled when you need all SQL Server dumps
  • SQL Memory Error
  • SQL Setup – Not to be collect unless specifically required
  • SQLAgent/SQLMail

High frequency event categories

Broker

Query Notifications

Performance

Security Audit

Scans

Objects

Locks

Transactions (Except SQLTransaction)

OLEDB

Cursors

High frequency events

Stored Procedures : SP:CacheMiss

Stored Procedures : SP:CacheInsert

Stored Procedures : SP:CacheRemove

Stored Procedures : SP:CacheHit

TSQL : Prepare SQL

TSQL : Exec Prepared SQL

TSQL : Unprepare SQL

3. what counters to capture to troubleshoot a specific issue?

For High CPU:

For Timeouts and Deadlocks:

For Blocking and Disk I/O:

For Memory Issues:

Here I have configured the PSSDIAG for the below problem description:

Problem Description:

We have observed that queries take a very variable amount of time to execute. This is a very important issue for us as many of our applications are of a real-time nature and depend upon getting data in a short time.

One particular query which we have focussed on, selects about 100000 rows and takes 500ms usually, but can sometimes take over 10 seconds. The profiler does not show any activity in this period which explains the delay (no other queries which would lock the data).

We need to try to understand why the performance is so variable, what SQL Server is doing during this period which is holding it up from running the query. There are no obvious bottlenecks at the server level such as CPU.

Here is a description of our DB architecture, which is rather complex with respect to replication. (I suggest you draw a diagram as you read through the below, I’m sorry I don’t have a diagram I can send you.)

To summarise the replication architecture, we have DBs which are kept centrally on a pair of servers which provide us with some fall-back/redundancy. Some of the data is needed (for read only) on over 50 other systems, and so we replicate it down to an MSDE/SQLExpress on these systems. Other data only resides at the central servers and our applications query it directly.

There are two main servers, I’ll call them A and B. A is the server where we have been having the long execution times. (We do not run queries on B except when A is out of service.)

A and B have about 5 databases where A is the publisher to B using transactional replication with updatable subscriptions. One of the these databases, which I will call X, is replicated down to many other places as follows.

B has 5 publications set up for X, each to about 10 MSDE (SQL 2000 generation) instances on other servers using transactional replication (simple one-way).

A also replicates to a similar set of MSDE instances as B, but the setup is a little more complex due to limitations of what setup was possible. Because DB X on A is a publisher using updatable subscriptions, we could not set up publishers with simple transactional replication to these MSDEs. So, they are set up as updatable subscriptions down to the MSDE instances. To further complicate matters, since the installation of SQL 2005 SP3, this configuration could not be done (an issue I believe with it being down to MSDE, a more up to date SQL Express would have been ok). Although existing publications continued to operate as before it was not possible to add new ones. Therefore, a new DB has set up on A to act as a copy of X with transactional replication with updatable subscriptions (let’s call it X2), and any new replications to MSDE instances that we need to add get done as one way replication down from X2.

In addition to A and B, there is also another server C. This is set up in the same way as B but to less MSDE instances.

I hope that this description is clear.

I am also sending, in another email, two profiler traces. They were taken on server A and show activity on all databases. At line 246178 in trace 1 you will see a test query taking about half a second. Then at line 247565 you will see exactly the same query taking over 3 seconds. Although there are blocks of activity relating to replication during this 3 seconds, they do not span the whole period and so do not seem to directly explain the delay. There are no other queries running on the same data during this 3 seconds. In trace 3, the query takes over 13 seconds. Again, the trace does not show a solid block of activity in the previous 13 seconds which might explain the duration.

Then I have save the PSSDIAG configuration to pssd.exe.

To run PSSDIAG I need to first send the pssd.exe to cx and he needs to follow the below steps to collect the same:

Note:

If you have configure PSSDIAG for SQL Server 2000 or SQL Server 2005 you need to run pssdiag.exe from the command prompt after extracting pssd.exe. But for SQL Server 2008 onwards you need to run pssdiag.cmd after extracting pssd.exe.

  1. On the server PSSQL2012 that contains the SQL Server 2005 instance (SQL2012), create a folder PSSDIAG in a drive that has at least 20 gigs of free space ( Make sure you do not put this in the already stressed disk or the SQL Server data file are present) . Copy the file pssd.exe in to the PSSDIAG folder then run it.
  2. Navigate to the folder using command prompt.
  3. When users complain that the issue has occurred, then execute pssdiag.exe

 

4. Once PSSDIAG has started (you will see a message saying that PSSDIAG has started) in the cmd prompt.

5. After seeing “PSSDIAG Collection started. Press Ctrl+C to stop” message – wait for 10 minutes to capture issue occurs and then press ctrl+c in the above screen only once. Also make sure you monitor the size of this PSSDIAG folder for size growth.

6. Inside the PSSDIAG folder you will find an output folder. That the data you will need to start you analysis using SQL NEXUS.

3. SQL Nexus to analyze the PSSDIAG.

Download link:

http://sqlnexus.codeplex.com/

Analysis using nexus:

For analysis using nexus you need a test SQL Server instance. You can also use SQL Server express for the same.

Install SQL Server express edition on your test or research server. You can download the same from http://www.microsoft.com/en-us/download/details.aspx?id=23650

Once you have installed nexus and started it for the first time it will problem you for an instance to connect.

You can create a new database from Current DB drop down for each analysis.

You need to using import to import the data into database inside SQL Server 2008 R2 express edition. For me that edition is sqlnexus.

Once the all the data is loaded with out any error particularly profiler trace, you can use SQL Perf Main to view the reports.

Tips and Tricks:

A. Once you have executed the file pssd.exe your folder will look like,

B. Navigate to the folder using command prompt. If it’s a windows 2008 server you need to open cmd prompt as “run as administrator”

C. Run the command as below,

Here I’m starting the PSSDIAG at 19:05 PM on 30th that’s today and stopping at 19:10 PM.

/B means begin time and /E means end time.

For you the command should look like, ( you can also change the date but we want to collect the data at 2:50 to 3:20 that’s when the process should be complete but if it’s going to extend beyond we should be capturing the bad run. )

Pssdiag.exe /R /B 20111201_02:50:00 /E 20111201_03:20:00

We are going to collect the data on 1st December 2011 between 2:50 to 3:20 AM. Once you run the above command the output will be:

D . After which you need to go to the services console and start PSSDIAG service.

I have tested this procedure and it works. All you need to do is use the output folder located in the PSSDIAG folder and import it in SQL Nexus.

For me it’s,

D:\logshipping\tester\output

Delete the PSSDIAG service,

NOTE: The account under which the PSSDIAG service is created/running should have access to connect SQL Server instance and capture profiler trace.

B. For running PSSDIAG on a different server, please edit the PSSDIAG.XML after extracting pssd.exe as

<Machine name=”SERVER_NAME”>

And

Default instance,

<Instance name=”MSSQLSERVER” windowsauth=”true” ssver=”10″ user=””>

Or

Named instance,

<Instance name=”INSTANCE_NAME” windowsauth=”true” ssver=”10″ user=””>

Detailed Analysis of collected Data:

Server Configuraton Values

AWE   – False
Cluster   – No
Database   drives – C:, E:, D:
Machine Name – RSxxxx12
Number   of Databases – 17
Number of Processors – 8
OS   Name – Microsoft(R) Windows(R) Server 2003,EnterpriseEdition
SQL   Bit Version – Intel X86
SQL   Edition -Enterprise  Edition
SQL Server Name – RSxxxx12
SQL   Version – 9.00.4035.00
Total RAM – 4,089.73 MB
VM   – False
Win Bit Version – X86-based PC
Win   Version – Windows NT 5.2 (Build 3790: Service Pack 2)

SP_CONFIGURE VALUES:

name                                 minimum      maximum      config_value run_value   
Ad   Hoc Distributed Queries

0

1

0

0

affinity   I/O   mask

-2147483648

2147483647

0

0

affinity   mask

-2147483648

2147483647

0

0

Agent   XPs

0

1

1

1

allow   updates

0

1

0

0

awe   enabled

0

1

0

0

blocked   process threshold

0

86400

5

5

c2   audit   mode

0

1

0

0

clr   enabled

0

1

0

0

common   criteria compliance enabled

0

1

0

0

cost   threshold for parallelism

0

32767

5

5

cross   db ownership chaining

0

1

0

0

cursor   threshold

-1

2147483647

-1

-1

Database   Mail   XPs

0

1

0

0

default   full-text language

0

2147483647

1033

1033

default   language

0

9999

0

0

default   trace   enabled

0

1

1

1

disallow   results from triggers

0

1

0

0

fill   factor   (%)

0

100

0

0

ft   crawl bandwidth   (max)

0

32767

100

100

ft   crawl bandwidth   (min)

0

32767

0

0

ft   notify bandwidth (max)

0

32767

100

100

ft   notify bandwidth (min)

0

32767

0

0

index   create memory   (KB)

704

2147483647

0

0

in-doubt   xact resolution

0

2

0

0

lightweight   pooling

0

1

0

0

locks

5000

2147483647

0

0

max degree of parallelism            

0

64

0

0

max   full-text crawl range

0

256

4

4

max server memory   (MB)            

16

2147483647

2147483647

2147483647

max   text repl size   (B)

0

2147483647

2147483647

2147483647

max   worker   threads

128

32767

0

0

media   retention

0

365

0

0

min   memory per query (KB)

512

2147483647

1024

1024

min   server memory   (MB)

0

2147483647

0

0

nested   triggers

0

1

1

1

network   packet size   (B)

512

32767

4096

4096

Ole   Automation Procedures

0

1

0

0

open   objects

0

2147483647

0

0

PH   timeout (s)

1

3600

60

60

precompute   rank

0

1

0

0

priority   boost

0

1

0

0

query   governor cost limit

0

2147483647

0

0

query   wait (s)

-1

2147483647

-1

-1

recovery   interval   (min)

0

32767

0

0

remote   access

0

1

1

1

remote   admin connections

0

1

0

0

remote   login timeout (s)

0

2147483647

20

20

remote   proc   trans

0

1

0

0

remote   query timeout (s)

0

2147483647

600

600

Replication   XPs

0

1

0

0

scan   for startup   procs

0

1

1

1

server   trigger recursion

0

1

1

1

set   working set   size

0

1

0

0

show   advanced   options

0

1

1

1

SMO   and DMO   XPs

0

1

1

1

SQL   Mail   XPs

0

1

0

0

transform   noise   words

0

1

0

0

two   digit year   cutoff

1753

9999

2049

2049

user   connections

0

32767

0

0

user   options

0

32767

0

0

Web   Assistant   Procedures

0

1

0

0

xp_cmdshell

0

1

1

1

We definitely see blocking, even though this SPID 1004 which was executing the query which is of your concern never got blocked , but still these kind of blocking can cause performance delays, Queries can suffer delays in execution.

The blocking is caused by LOCK BLOCKING.

first_rownum   last_rownum num_snapshots blocking_start blocking_end head_blocker_session_id blocking_wait_type max_blocked_task_count max_total_wait_duration_ms avg_wait_duration_ms max_wait_duration_ms max_blocking_chain_depth   head_blocker_session_id_orig

0

0

1

19:25.1

19:37.3

1509

LOCK BLOCKING

1332

843984

633

828

2

1509

1

1

1

19:37.3

19:47.6

2064

LOCK BLOCKING

30

31090

1036

1047

2

2064

2

3

2

19:47.6

20:08.1

1509

LOCK BLOCKING

1122

224103

630

1062

2

1509

4

4

1

20:28.8

20:39.3

1509

LOCK BLOCKING

6

1842

307

312

2

1509

5

6

2

20:39.3

20:59.7

1232

LOCK BLOCKING

342

89550

263

1609

2

1232

7

8

2

20:49.5

21:10.0

2064

LOCK BLOCKING

30

4200

124

140

2

2064

10

10

1

21:30.3

21:41.1

499

LOCK BLOCKING

1

421

421

421

1

499

9

12

2

21:30.3

21:51.3

1509

LOCK BLOCKING

6

2314

352

406

2

1509

11

11

1

21:41.1

21:51.3

2064

LOCK BLOCKING

30

17820

594

594

2

2064

13

13

1

22:01.5

22:11.8

1509

LOCK BLOCKING

1260

873705

693

1141

2

1509

14

14

1

22:11.8

22:22.0

1546

LOCK BLOCKING

1

1032

1032

1032

1

1546

15

15

1

23:03.0

23:13.2

499

LOCK BLOCKING

1

1063

1063

1063

1

499

16

16

1

23:13.2

23:23.4

2064

LOCK BLOCKING

30

10800

360

360

2

2064

17

17

1

23:54.6

24:04.9

1509

LOCK BLOCKING

42

42282

1006

1156

2

1509

19

19

1

24:15.1

24:25.3

1509

NULL

6

0

0

0

2

1509

18

18

1

24:15.1

24:25.3

499

WRITELOG

1

63

63

63

1

499

20

20

1

24:35.5

24:46.0

1509

ASYNC_NETWORK_IO

1

16

16

16

2

1509

21

21

1

24:35.5

24:46.0

1509

NULL

33

0

0

0

2

1509

As we see above the Head Blocker is Waiting on a WRITELOG which means that there is a potential delay in Write Operation, this indicates Disk Bottleneck.

WaitType                                      WaitTime             WaitTimePerSec(ms)

——————————————— ——————– ——————–

LCK_M_S                                       1322250              3516

WRITELOG                                      1223125              3252

ASYNC_NETWORK_IO                              589906               1568

PAGEIOLATCH_SH                                418985               1114

SOS_SCHEDULER_YIELD                           312297               830

MSQL_XP                                       186985               497

PAGEIOLATCH_EX                                133156               354

IO_COMPLETION                                 41062                109

LCK_M_U                                       29766                79

SLEEP_BPOOL_FLUSH                             28453                75

Please go through the description of these wait type from the below KB article.

http://support.microsoft.com/kb/822101

Lock wait type:

LCK_M_S 0×03 Share
WRITELOG 0×81 This   waittype indicates that the SPID is waiting for a transaction log I/O request   to complete. This waittype may also indicate a possible disk bottleneck.

ASYNC_NETWORK_IO

The waittype is Network I/O (client fetch) and below is the definition. ( ASYNC_NETWORK_IO )

  • Network I/O (client fetch) – This wait type indicates that SQL Server is waiting for the client application to fetch rows.

PAGEIOLATCH_SH

Poor IO performance can easily become a bottleneck on a database system. The bottleneck is commonly visible through PAGEIOLATCH_xx or WRITELOG waits.

An IO bottleneck is commonly a result of the IO subsystem being undersized or incorrectly configured to handle the load. If the poor performance coincides with spikes in checkpoint IO you may also be able to throttle the number of checkpoint IOs ( http://support.microsoft.com/?kbid=929240 ).

As for external causes for slow IO performance, here are some other common causes:

From the above data, Any wait category that waits for more than 1000 ms per sec is a bottleneck which we need to consider.

We have a considerable amount of missing indexes when we need to create it. Also check with the application team before implementing the same.

To Co-relate our above theory we see the Perfmon counters [NOTE WE REMOVED THE C DRIVE COUNTER]

 

Counter Name                                                Avg                  Max                  Min

—————————————— ——————– ——————– ——————–

LogicalDisk(D:)\% Idle Time                             19.194               59.339                0.000

LogicalDisk(D:)\Avg. Disk Queue Length                    2.348                4.308                0.687

LogicalDisk(D:)\Avg. Disk sec/Read                        0.238                0.925                0.000

LogicalDisk(D:)\Avg. Disk sec/Transfer                    0.278                0.900                0.078

LogicalDisk(D:)\Avg. Disk sec/Write                       0.282                0.910                0.068

LogicalDisk(D:)\Disk Bytes/sec                        88256.383           516716.327            10354.913

LogicalDisk(D:)\Split IO/Sec                              0.143                6.574                0.000

LogicalDisk(E:)\% Idle Time                              84.256               94.898               16.350

LogicalDisk(E:)\Avg. Disk Queue Length                    0.723               11.347                0.111

LogicalDisk(E:)\Avg. Disk sec/Read                        0.022                0.079                0.006

LogicalDisk(E:)\Avg. Disk sec/Transfer                    0.018                0.077                0.006

LogicalDisk(E:)\Avg. Disk sec/Write                       0.009                0.016                0.005

LogicalDisk(E:)\Disk Bytes/sec                      1480158.906          7070527.456           570347.259

LogicalDisk(E:)\Split IO/Sec                              0.769                5.184                0.000

Looking at the performance of the disk:

Counter   Name                                                     Avg                    Max                    Min Server          
LogicalDisk(_Total)\Avg.   Disk Queue Length

129.256

398.375

60.16

\\RSxxxx12\
LogicalDisk(_Total)\Avg.   Disk sec/Transfer

0.145

0.451

0.077

\\RSxxxx12\
LogicalDisk(_Total)\Disk   Bytes/sec

22140967.98

29825877.25

16101317.61

\\RSxxxx12\
LogicalDisk(_Total)\Disk   Transfers/sec

892.64

1058.612

699.41

\\RSxxxx12\
LogicalDisk(_Total)\Split   IO/sec

15.894

28.822

5.58

\\RSxxxx12\
LogicalDisk(C:)\Avg.   Disk Queue Length

126.185

393.851

58.876

\\RSxxxx12\
LogicalDisk(C:)\Avg.   Disk sec/Transfer

0.148

0.467

0.08

\\RSxxxx12\
LogicalDisk(C:)\Disk   Bytes/sec

20572552.69

28850140.58

15002983.59

\\RSxxxx12\
LogicalDisk(C:)\Disk   Transfers/sec

851.455

943.152

670.623

\\RSxxxx12\
LogicalDisk(C:)\Split   IO/sec

14.982

28.027

4.384

\\RSxxxx12\
LogicalDisk(D:)\Avg.   Disk Queue Length

2.348

4.308

0.687

\\RSxxxx12\
LogicalDisk(D:)\Avg. Disk   sec/Transfer            0.278 (278 milliseconds = 0.278 seconds)

0.9

0.078

\\RSxxxx12\     
LogicalDisk(D:)\Disk   Bytes/sec                  88256.383 (0.08 MB per second = 81.92 KB)

516716.327

10354.913

\\RSxxxx12\     
LogicalDisk(D:)\Disk   Transfers/sec

9.177

21.314

4.775

\\RSxxxx12\
LogicalDisk(D:)\Split   IO/sec

0.143

6.574

0

\\RSxxxx12\
LogicalDisk(E:)\Avg.   Disk Queue Length

0.723

11.347

0.111

\\RSxxxx12\
LogicalDisk(E:)\Avg.   Disk sec/Transfer

0.018

0.077

0.006

\\RSxxxx12\
LogicalDisk(E:)\Disk   Bytes/sec

1480158.906

7070527.456

570347.259

\\RSxxxx12\
LogicalDisk(E:)\Disk   Transfers/sec

32.007

149.33

9.369

\\RSxxxx12\
LogicalDisk(E:)\Split   IO/sec

0.769

5.184

0

\\RSxxxx12\

 For 0.08 MB (81.92 KB)of data on an average it take 278 milliseconds = 0.278 seconds which does not look good at all.

Considering For SAN’s – Avg Disk Bytes/sec can be 250 – 300 MB/Sec for single disk – Avg Disk Bytes/Sec > 50-70 MB/Sec

 All our log files are located in the D drive that’s performing bad and we see an obvious wait type WRITELOG.

 We started to dig into the specific query which was provided by you.

SELECT Service.ID, FileShop.szName,  FileShop.szPath FROM FileShop, Service WHERE Service.lPartitionViewID =  FileShop.lPartitionViewID AND FileShop.lPartitionViewID <> 0UNIONALL SELECT 0, FileShop.szName,  FileShop.szPath FROM FileShop WHERE FileShop.lPartitionViewID = 0

On the RTSinfo DB.

From the profiler see only one instance of this query and it took close to 20 seconds to complete.

SQL:BatchStarting          7                          RSxxxx79      4504   Microsoft Open Database Connectivity  RTPerformer   1004          2011-12-02 20:19:29.850                                              RTSinfo    0XE1B7F4B2572D6940834D202EE9196261       0      0      -1823749497          RTformer   SELECT Service.ID, FileShop.szName,  FileShop.szPath FROM FileShop, Service WHERE Service.lPartitionViewID =  FileShop.lPartitionViewID AND FileShop.lPartitionViewID <> 0 UNION ALL SELECT 0, FileShop.szName,  FileShop.szPath FROM FileShop WHERE FileShop.lPartitionViewID = 0

From the PSSDIAG snapshot data, all the query was suspended due to ASYNC_NETWORK_IO before it completed.

./RSxxxx12__SQL_2005_Perf_Stats_Startup.OUT:2011-12-02T20:19:37.293              1004           0           0                   0             0 SUSPENDED       ASYNC_NETWORK_IO                                                               15                                                                                                                                                                                       ASYNC_NETWORK_IO                                             0 2-Read Committed                             1               94                  2375                   76                    1            2             6879                 3627                   11                606323                 9285              76699 2011-12-02T20:19:29.850        2011-12-02T20:19:29.850                           2           0               1                       7471              .00                    0 SELECT                   2011-12-02T20:19:29.867        2-Read only     2-Active        2011-12-02T20:19:29.850        suspended       SELECT                                0                   -1           7           6                      0               76 2011-11-25T19:57:47.530        RSxxxx79             Microsoft Open Database Connectivity                          4504              4 ODBC                           RTformer                                                                                   4096 10.0.16.63           running                    6

./RSxxxx12__SQL_2005_Perf_Stats_Startup.OUT:2011-12-02T20:19:37.293              1004           0                  NULL                 NULL                   NULL                      NULL                      NULL                     NULL RTSinfo                                      NULL NULL                                                         SELECT Service.ID, FileShop.szName,  FileShop.szPath FROM FileShop, Service WHERE Service.lPartitionViewID =  FileShop.lPartitionViewID AND FileShop.lPartitionViewID <> 0 UNION ALL SELECT 0, FileShop.szName,  FileShop.szPath FROM FileShop WHERE FileShop.lPartitionViewID = 0                  0x060007009F1FE836B8A16354000000000000000000000000               

./RSxxxx12__SQL_2005_Perf_Stats_Startup.OUT:2011-12-02T20:19:47.630              1004           0           0                   0             0 SUSPENDED       ASYNC_NETWORK_IO                                                                0                                                                                                                                                                                       ASYNC_NETWORK_IO                                             0 2-Read Committed                             1              140                  3622                  128                    1            2             6879                 3627                   11                606323                 9285              76699 2011-12-02T20:19:29.850        2011-12-02T20:19:29.850                           2           0               1                      17787              .00                    0 SELECT                   2011-12-02T20:19:29.867        2-Read only     2-Active        2011-12-02T20:19:29.850        suspended       SELECT                                0                   -1           7           6                      0              129 2011-11-25T19:57:47.530        RSxxxx79             Microsoft Open Database Connectivity                          4504              4 ODBC                           RTformer                                                                                   4096 10.0.16.63           running                    6

./RSxxxx12__SQL_2005_Perf_Stats_Startup.OUT:2011-12-02T20:19:47.630              1004           0                  NULL                 NULL                   NULL                      NULL                      NULL                     NULL RTSinfo                                      NULL NULL                                                         SELECT Service.ID, FileShop.szName,  FileShop.szPath FROM FileShop, Service WHERE Service.lPartitionViewID =  FileShop.lPartitionViewID AND FileShop.lPartitionViewID <> 0 UNION ALL SELECT 0, FileShop.szName,  FileShop.szPath FROM FileShop WHERE FileShop.lPartitionViewID = 0                  0x060007009F1FE836B8A16354000000000000000000000000               

Analysis:

The disk contention are usually due to these reasons:

1. There is an underlying H/W issue. This could mean that some driver or firmware needs an upgrade (essentially all your firmware/drivers should always be on the latest builds available) or it could be a problem with the SAN configuration or it could be that this particular SAN is not dedicated to SQL Server.

2. OR it could be that the disk contention is due to the fact that the queries executing on SQL Server are performing excessive amount of reads due to sub-optimal plans. Sub-optimal plans would be generated if your indexes are out of shape or if your statistics are skewed. This would require a statistics update with a full scan. (Please refer my statistics update post.) You should be able to find out which process is driving IO by looking at the Process performance object in Perfmon and the I/O Data Bytes counter. There is no fixed value for good and bad performance for this counter. The amount of I//O being pushed through should be compared with the response time of the disks and the I/O thresholds which the disk can handle.

3. OR it could be that the disk activity on the disks has increased over a period of time due to increase in data or addition of new databases. In such a case, it would be a good idea to provide a separate disk or storage area for some of the databases as the current activity might be too much for the disk to handle. Sometimes, load balancing for physical storage also helps in alleviating a disk contention issue.

4. Ensure that the client application processes all rows it is requesting.   If not all rows will be needed or used – consider only querying the rows that are needed (TOP X for example).

5. Look at your NIC configuration on the server and make sure there are no issues (physical card issue, autodetect not picking the fastest speed)

6. Validate the network components between the application/clients and the SQL Server instance (router, for example). This should be done as per

Your application design. Client -> webserver ( if applicable ) -> SQL Server box

- Update the network related drivers on Client , webserver ( if applicable ) , SQL Server box

-Here are a few articles that summarize this:

http://blogs.msdn.com/joesack/archive/2009/01/09/troubleshooting-async-network-io-networkio.aspx

http://support.microsoft.com/kb/822101

-I think it’s pretty safe to say that the network delay is causing the issue and need to elimate this from the equation first.

7. STATISTIC report from perf stats log in the output folder confirms that statistics are outdated. Some cases update stats with full scan can be helpful in other cases update stats by sampling can be helpful, i depends upon environment to environment.

8. Suggest you create indexes on the table(s) for which this particular query belongs to, use database tuning advisor.

SELECT Service.ID, FileShop.szName,  FileShop.szPath FROM Fileshop, Service WHERE Service.lPartitionViewID =  Fileshop.lPartitionViewID AND FileShop.lPartitionViewID <> 0 UNION ALL SELECT 0, FileShop.szName,  FileShop.szPath FROM Fileshop WHERE FileShop.lPartitionViewID = 0

9. Not very related to query just a general recommendation, Change Max Memory Settings in SQL Server to a value less than the Physical Memory on the box. Here is an example to set Max. Server Memory of SQL Server to 2.5 GB:

sp_configure ‘max server memory (MB)’,2560

go

reconfigure with override

go

Additional Info:

LOG_M_S – is this just a normal thing?

Share lock waittype is indicated by LCK_M_S, from our analysis what I can tell you is the physical I/O bottleneck that is prolonging the duration of transactions and locks.

ASYNC_NETWORK_IO – how can the rate at which clients fetch cause a blocking issue?

ASYNC_NETWORK_IO normally means that the server is waiting for the client to fetch the data. Here is some information regarding the waittype:

http://msdn.microsoft.com/en-us/library/ms179984.aspx

Based on our experience, in most of the cases, this problem should be caused by the client application issue (we may see this behavior when we are fetching a very large result set as well, so it’s not necessary means an issue when you see the waittype).

So in our case ASYNC_NETWORK_IO is not causing blocking. What we are saying is, we see Lock Blocking is different and we think its caused by the physical I/O bottleneck but at the same time we also see ASYNC_NETWORK_IO wait type which is often caused by the client not processing results
from the server. This causes the network buffers to fill. The server cannot send more data to the client, so the task executing the batch needs to pause while waiting for the ability to continue sending results and hence the wait type ASYNC_NETWORK_IO.

So LCK_M_S  lock blocking should be correlated with physical I/O bottleneck and ASYNC_NETWORK_IO should be correlated with the network bandwidth or the query that’s coming from the client is not fine-tuned. ( for example, Its like querying entire table instead setting condition for the data fetch using where clause. )

2 Responses to “Performance Analysis? SQL Server slow or Query slow? PSSDIAG is there for the rescue!”

  1. danieladeniji said

    Thanks for sharing this. You really went into a lot of depth and I personally found it correlates with some of the problems that I am having at this time.

    I have being meaning to get a bit more in pssdiag and this is a good encouragement; for that as well.

    Happy SQL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 177 other followers

%d bloggers like this: