Monday, November 26, 2012

SCOM Exchange Management Pack Pitfall

Well, this was certainly a fun one to track down. It was my week on call again, and a holiday at that, so apparently there was no rest for the wicked. The problems first surfaced earlier in the week, then reared their heads again on Saturday. If you get into this situation, you will see what look like random issues with Exchange: queues shutting down, mailboxes getting disconnected, all sorts of weird stuff. I'll explain more after the errors. Here are some of the error messages we started receiving in the email queue and the Exchange event logs.

***************************************************
Alert: The database copy is very low on log volume space. The volume has reached critical levels.

Source: Database Copy (log) Logical Disk Space (D:\DB1) - <server> (Mailbox) -
 
Path: <server>; <server>(Mailbox)

Last modified by: System

Last modified time: 11/24/2012 10:33:24 AM

Alert description: TimeSampled: 2012-11-24T10:32:45.0000000-08:00

ObjectName: LogicalDisk

CounterName: % Free Space

InstanceName: D:\DB1

Value: 4

SampleValue: 11.9463672637939


***************************************************
Log Name:      Microsoft-Exchange-Troubleshooters/Operational
Source:        Database Space
Date:          11/23/2012 4:27:32 PM
Event ID:      5701
Task Category: (1)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:     <servername>
Description:
The database space troubleshooter detected a low space condition on volume D:\DB1\ for database DB1. Provisioning for this database has been disabled. Database is under 16% free space.
***************************************************

Now let's pull the curtain back a bit and find out what's going on here. There are several things involved. First, Exchange 2010 has a PowerShell script called Troubleshoot-DatabaseSpace.ps1. Without SCOM, this script would have to be run manually. The Exchange management pack, however, calls it automatically. More details can be found here - http://letsexchange.blogspot.com/2012/09/exchange-2010-sp1-added-new-script.html

Troubleshoot-DatabaseSpace.ps1 reads the limits it uses to gauge database health from a second file, StoreTSConstants.ps1, which is found in the \Exchange14\Scripts folder. Here are the default constants in that file:

# The percentage of disk space for the EDB file at which we should start quarantining users.
$PercentEdbFreeSpaceDefaultThreshold = 25
# The percentage of disk space for the logs at which we should start quarantining users.
$PercentLogFreeSpaceDefaultThreshold = 25
# The percentage of disk space for the EDB file at which we are at alert levels.
$PercentEdbFreeSpaceAlertThreshold = 16
# The percentage of disk space for the EDB file at which we are at critical levels.
$PercentEdbFreeSpaceCriticalThreshold = 8

So, in our case, we have a 600 GB LUN and were down to roughly 12% free space. That is below the default alert level of 16%, even though we still had 72 GB of storage left. So Exchange went into alert mode. We started receiving reports of users who could not connect to Exchange, or who could receive messages but not send them. Very strange stuff. I changed the values to start alerting at 5% and then put the servers in maintenance mode to get over the immediate issue.
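To make the arithmetic concrete, here is a rough Python sketch of a percentage-only check against our numbers. The two thresholds are the defaults quoted from StoreTSConstants.ps1; the function name and the strict "less than" comparison are my own assumptions for illustration, not the actual logic inside Troubleshoot-DatabaseSpace.ps1.

```python
# Defaults quoted from StoreTSConstants.ps1 (percent free space).
EDB_ALERT_THRESHOLD = 16
EDB_CRITICAL_THRESHOLD = 8


def free_space_status(volume_gb, percent_free):
    """Hypothetical helper: classify a volume using percent free only."""
    free_gb = volume_gb * percent_free / 100
    if percent_free < EDB_CRITICAL_THRESHOLD:
        level = "critical"
    elif percent_free < EDB_ALERT_THRESHOLD:
        level = "alert"
    else:
        level = "ok"
    return free_gb, level


# Our case: a 600 GB LUN at roughly 12% free.
free_gb, level = free_space_status(600, 12)
print(f"{free_gb:.0f} GB free -> {level}")  # 72 GB free -> alert
```

The check never looks at the 72 GB figure, only at the 12%, which is why a volume with plenty of room in absolute terms still lands in the alert band.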

So, don't get caught in the Exchange Management Pack trap. Make sure you set these levels to something that makes sense for your volume sizes. Unlike the SCOM disk alerts, which use a two-factor calculation, this is an old-fashioned percentage calculation, which makes it very easy to get bitten on large volumes. The other option is to turn off these monitors. Given that this space monitoring is somewhat redundant with the SCOM disk space alerts, that may be the safer alternative.
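The difference between the two styles of check can be sketched as follows. This is only an illustration of the idea, assuming the two-factor check means "percent free AND absolute free space must both be below their thresholds". The function names and the 10,000 MB threshold are made up for the example and are not the management pack's actual defaults.

```python
def percent_only_alert(percent_free, percent_threshold):
    """Alert as soon as percent free drops below the threshold."""
    return percent_free < percent_threshold


def two_factor_alert(percent_free, free_mb, percent_threshold, mb_threshold):
    """Alert only when BOTH percent free and absolute free MB are low."""
    return percent_free < percent_threshold and free_mb < mb_threshold


# A 600 GB volume at 12% free, as in the incident described here.
volume_gb = 600
percent_free = 12
free_mb = volume_gb * 1024 * percent_free / 100  # 73728 MB

# Percent-only check with the 16% alert level: fires.
print(percent_only_alert(percent_free, 16))  # True

# Two-factor check with a hypothetical 10,000 MB floor: stays quiet,
# because roughly 72 GB is still free.
print(two_factor_alert(percent_free, free_mb, 16, 10_000))  # False
```

On a small volume the two checks tend to agree. The larger the volume gets, the more free space a fixed percentage represents, and the sooner the percentage-only check starts crying wolf.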








1 comment:

  1. If I want to turn off this monitoring and rely on SCOM disk space alerts, where do I do that?
