Tony Davis

Simple-Talk Editor
News, views and good brews

SQL Server Diagnostics

Published Tuesday, February 12, 2008 11:38 AM

SQL Server diagnostics are a mess. To monitor the well-being of a server, you need to be able to inspect all logged errors, have a whole raft of performance information, WMI data, and be able to study the history of backups and scheduled jobs. You have to monitor server downtime, low disk space, low physical memory, job failures, jobs failing to start, jobs taking too long to finish, blocks, locks and long running queries. There will be DBCC checks too. Because the beast we think of as SQL Server is actually built as a collection of services, you need to look for information in several places, including the host operating system. As well as that, you need to check periodically on the top cached query plans just to ensure that there is nothing awry with indexes or application design. You need also to check for long-running SPIDs, intrusion attempts, and patterns of usage that indicate fraud.

The DBA with around a hundred production servers may rightfully start to believe that her/his workload is excessive, especially as there are a whole raft of other Database administrative functions to perform as well. Around two years ago, the last time we got reliable information, Google admitted to half a million PCs serving information. It must be a lot more now, and it would be interesting to know how they are administered.

The past twenty years have seen an explosion of techniques for distributing the holding, analysis and processing of data. The monolithic systems of a previous generation have, in some cases, been replaced by container-loads of commodity PCs. Relational databases haven’t always kept pace with the change.

A group of SQL Servers requires an approach that is more organic. One has to think of syndromes, where one deduces a cause from a variety of symptoms or indications. It is one thing to collect all the indications of trouble, but quite another to work out the underlying cause. What then are the key symptoms; The twinges of pain the spots, the rashes, the depressions? What, even, is the catalogue of illnesses?

There have been many fascinating approaches to this problem, and a number of software tools that seek to provide the panacea. SQL Response Beta really is a Beta, in that it provides a framework, in order to find out from real users what the key requirements are. It acknowledges that the DBA really knows best. Even this early Beta is the result of much reflection on the predicament of DBAs in administering large numbers of Servers. It doesn’t pretend to provide all the answers. Dan and the other developers really want to know whether they are on the right track; what changes or improvements are necessary to make it more useful; and anything else that would be important to professional DBAs with heavy workloads.

One speaks of DBAs as though they were a uniform breed with similar requirements. The reverse is truer, and it would be fascinating to learn of some of the extremes of the tasks facing working DBAs, and whether a software tool could really help the workload significantly. Do you have a vast number of servers, a huge jelly-mass of inscrutable scripts? Post your headaches as a comment to my blog. All entries will go into a draw for a $50 Amazon voucher!

Cheers,

Tony.

Comments

 

grumpy old dba said:

The usual failures of any or all of these is the ability or lack of it of the person who is viewing/monitoring such.
For example many companies set up monitoring frameworks such as MOM, BMC etc. etc. with loads of alerts and thresholds - usually intended for the non dba and set to values which usually have little relevance and generally either cause so much spam they get turned off or ignored thus invalidating the original purpose.
I doubt there is a universal answer to diagnostics - I swear by SQL Diagnostic Manager ( hope you don't mind another's product mentioned ) but unless the user of this can interpret or understand the output/statistics the cost is wasted. Usually "enterprise" solutions end up being unwieldy requiring dodgy sysadmin logins, agents and/or linked servers. As a DBA who delves into such diagnostics and troyuble shooting I'm not sure what the answer is - consider SSRS on 2005 , this currently requires dot net, IIS, o/s and SQL skills to diagnose problems. Good post not sure where it's going though as so often diagnostics is such a wide area.
February 13, 2008 9:24 AM
 

grumpy old dba said:

The other point I forgot to mention is that tools cost money, at even $100 per instance a tool for 400 servers is clocking  $40,000 ( and I figure $100 to be conservative ) ever asked for a DBA tool that costs that much?
February 13, 2008 9:34 AM
 

Patrick Index said:

I agree wholeheartedly with Tony that SQL Server diagnostics is a joke and was wondering when Red Gate was going to come up with an innovative solution.  I have only used Idera’s SQL Diagnostic Manager which was a help but not perfect.

For me the most important feature would be a blocking history monitor which records all  blocking situations (which are longer than a certain time which could be set – after all in some apps blocking is to be expected) and provides all the SQL of the offending spids.  In the past I have had to write various mini systems to do this which were never perfect.

The second area that I would like clear, concise, detailed information on is the performance of the disk system or sub system in the case of a clustered SAN.  To be able to get away from using Perfmon and trying to workout which are the correct counters to analyse this would be a great relief.

The one feature I liked the best in SQL Diagnostics was “the worst performing query” history which was very comprehensive and useful.  It provided the io stats for the query, the length of time it ran for, how often it ran over a period, and its hierarchy (i.e worst query/longest running query at the top) within the system.  You could also export this data for further analysis.

I also liked the fact that it worked remotely from the server.
February 14, 2008 4:42 AM
 

ValeriyNY said:

Well, I don't have hundreds of SQL Servers in my environment; we have 2 clusters with 4 nodes each in PROD; but each node has 100+ small databases on it. We use Idera's SQL Diagnostic manager; but sometimes it does not keep up with load; e.g. "Proc cache" page is never able to refresh. Sometimes I think Qwest Central (which we had in my previous company) provides me with more valuable information from Application (not System) DBA perspective. Performance Dashboard reports are good as well. Definetely would like to have "blocking history report"
February 14, 2008 11:21 AM
 

johnc said:

Lets just start with the simple stuff..

For SERVER X i need to set a threshold for CPU usage for Low, Medium, High Alert (i.e. 70, 80, 90%). For Database X on Server X I need to set a threshold for L,M,H for amount of free space left... X MBs, XX MBs, XXX MBS. For Physical server X I need to set thresholds for L,M,H if the physical disk spaces is less than X, XX, XX.
If any job failes on SERVER X display it on the list.  

If the Low threshold is reached change an icon to yellow.. if a Medium is reached change icon to lite red & send normal email.  If High is reached send "Important email" and flash something on screen.

I need to be able to flag a specific alert as "noted and don't bug me" anymore.

Do not want a tool where 10 people running it mean ten different programs hitting the servers.

-jfc-
February 15, 2008 10:01 AM
 

GSquared said:

I've been using the SQL Response Beta, and it certainly has some good ideas in it, but it also has a few bugs.

For example, it keeps telling me the log file for one of my databases has 2.7 Gig of free space in it, and that that's excessive.  That would certainly be excessive, and a good target for a shrink file command, except for the minor detail that the log file for that database is only 5 Meg (unless Windows is lying to me about that, which I doubt), and SSMS tells me the log file has 0.2 Meg of free space.

Other than that, I'm finding it quite handy.  Another DBA I showed it to became quite interested after a brief glance.

As far as what I'm mainly in need of with monitoring multiple servers, it would have to start with "this server has no maintenance plans" or "this server's maintenance plans only include backups", because if that's the case (as it is with several of the servers I'm currently working with), no other message is all that important.

After that, a list of the indexes that are badly fragmented, a list of long-running queries, with the ability to filter so it ignores certain procs or jobs that I already know are going to take a long time, outdated table statistics with a time threshhold that I can set; those would certainly be a good start.
February 15, 2008 12:05 PM
 

grumpy old dba said:

as I was browsing in passing I just wanted to say that capturing blocking history is very easy and the sql cat team put out code to do this many many years ago. I have a simple proc and table which captures this ( when i want to ) think it's available on my blog - if not mail me and i'll send you the code
February 27, 2008 10:27 AM
You need to sign in to comment on this blog

















<February 2008>
SuMoTuWeThFrSa
272829303112
3456789
10111213141516
17181920212223
2425262728291
2345678
Monitor your Database Backups Using Operations Manager
 Thomas LaRock shows just how easy it is for a DBAs to monitor any aspect of the databases in their... Read more...

Verity Stob: Geek of the Week
 Real geeks read Verity Stob. Verity writes her painfully funny invective from a powerful advantage, she... Read more...

SQL Server Index Basics
 Given the fundamental importance of indexes in databases, it always comes as a surprise how often the... Read more...

Managing Exchange 2007 Mailbox Quotas with Windows PowerShell
 The use of PowerShell with Exchange Server 2007 can do a great deal to ease the task of managing... Read more...

Optimizing Exchange Server 2007
 Brien Posey ponders an 'off the cuff' remark that Exchange 2007 runs so well with a default... Read more...