
Wednesday, May 23, 2012

SCOM - Putting Systems in Maintenance Mode through Citrix

One of my goals is to let application developers and other staff put their own systems into maintenance mode without needing access to the Management Console or getting me involved. When they do code updates or other system maintenance, it is handy for them to have a quick, basic way to put their systems into maintenance mode. There are also times when it is nice to be able to do this from a cell phone, tablet, or other device at a remote location, which is where Citrix comes in handy. I save the following code to a script file and publish that script through Citrix to the users who need it.

Here is the script:

# Enter the FQDN of your SCOM management server in this variable
$RMSFQDN = "my-managementserver.mydomain.internal"

# Enter the internal DNS suffix for your environment
$DNS = "mydomain.internal"

# Load the Operations Manager snap-in if it is not already loaded
$Name = "Microsoft.EnterpriseManagement.OperationsManager.Client"

$ModuleLoaded = Get-PSSnapin $Name -ErrorAction SilentlyContinue

If (-not $ModuleLoaded)
{
    Add-PSSnapin $Name
}

# Connect to the management group and switch to the monitoring provider
New-ManagementGroupConnection -ConnectionString $RMSFQDN
Set-Location "OperationsManagerMonitoring::"

# You can change the default time for how long systems should be in maintenance
$Hours = 3

$comment = "Computer Maintenance"

# Look up the classes once instead of on every pass through the loop
$managedClass = Get-MonitoringClass -Name "Microsoft.SystemCenter.ManagedComputer"
$windowsClass = Get-MonitoringClass -Name "Microsoft.Windows.Computer"
$unixClass = Get-MonitoringClass -Name "Microsoft.Unix.Computer"

While ($true)
{
    $computerPrincipalName = Read-Host "Enter the computer name to put into maintenance (enter 'done' to finish)"

    # -eq is case-insensitive, so this catches "done", "Done", "DONE", etc.
    # (the original "-ne 'done' -or -ne 'Done'" test only stopped after trying to look up "done")
    If ($computerPrincipalName -eq "done") { break }

    $computerCriteria = "PrincipalName='" + $computerPrincipalName + "." + $DNS + "'"
    Write-Host $computerPrincipalName
    $computer = Get-MonitoringObject -MonitoringClass:$managedClass -Criteria:$computerCriteria

    If ($null -eq $computer)
    {
        # Not a Windows agent; try matching a UNIX/Linux computer by display name
        $computer = Get-MonitoringObject -MonitoringClass:$unixClass | Where-Object { $_.DisplayName -eq $computerPrincipalName }
    }
    Else
    {
        # Windows agent found; put its Windows Computer object into maintenance
        $computer = Get-MonitoringObject -MonitoringClass:$windowsClass -Criteria:$computerCriteria
    }

    If ($null -eq $computer)
    {
        Write-Host "Could not find $computerPrincipalName in SCOM"
    }
    ElseIf ($computer.InMaintenanceMode -eq $false)
    {
        # Each system gets the full window, starting when its name is entered
        $startTime = [System.DateTime]::Now
        $endTime = $startTime.AddHours($Hours)
        "Putting " + $computerPrincipalName + " into maintenance mode"
        New-MaintenanceWindow -StartTime:$startTime -EndTime:$endTime -Comment:$comment -MonitoringObject:$computer
    }
    Else
    {
        Write-Host "$computerPrincipalName is already in maintenance mode"
    }
}

# Close the PowerShell session so the published Citrix application exits
Stop-Process -Id $PID
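One thing to watch: the script always appends $DNS to whatever the user types, so someone who enters a full FQDN ends up searching for a name with the suffix doubled, and the lookup silently finds nothing. Below is a minimal sketch of a guard against that. The function names Get-PrincipalName and Get-PrincipalNameCriteria and the host name web01 are hypothetical, not part of the original script; the only assumption is that the criteria syntax stays the same PrincipalName='...' form the script already uses.

```powershell
# Hypothetical helpers, not part of the original script.
# Builds the PrincipalName that Get-MonitoringObject -Criteria expects,
# without doubling the DNS suffix if the user already typed an FQDN.
function Get-PrincipalName {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][string]$DnsSuffix
    )

    # Drop stray whitespace, easy to pick up when typing on a phone or tablet
    $Name = $Name.Trim()

    # Only append the suffix if it is not already there (case-insensitive)
    if ($Name.EndsWith("." + $DnsSuffix, [System.StringComparison]::OrdinalIgnoreCase)) {
        return $Name
    }
    return $Name + "." + $DnsSuffix
}

# Wraps the name in the same criteria syntax the script builds by hand
function Get-PrincipalNameCriteria {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][string]$DnsSuffix
    )
    "PrincipalName='" + (Get-PrincipalName -Name $Name -DnsSuffix $DnsSuffix) + "'"
}

Get-PrincipalNameCriteria -Name "web01" -DnsSuffix "mydomain.internal"
# PrincipalName='web01.mydomain.internal'
```

In the loop, the $computerCriteria line could then call Get-PrincipalNameCriteria -Name $computerPrincipalName -DnsSuffix $DNS instead of concatenating the strings by hand; short names and FQDNs both resolve to the same criteria.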

Wednesday, May 16, 2012

Monitoring Theory (Murphy's Law of Monitoring)

Yeah, I just created a new monitoring theory. OK, it might not be new, but it is the philosophy behind how I usually configure System Center as the monitoring platform of choice. As systems admins, we all hate the late-night alerts that essentially mean nothing. So how do you keep those alerts from hitting your phone while you sleep? How should monitoring be approached? Do you really care that once a day a server's CPU spikes and generates a critical alert?

Here is my approach to alerting and monitoring: transient problems should be collated over a period of time to see whether there is a long-term trend that needs addressing, while alerts, especially after hours, should be limited to site, server, or service down conditions plus hard disk space issues. Ultimately, aren't those the events that will have the corporate director or a customer calling you in the morning to chew your head off? So start your approach there, either by subscribing to those alerts individually or by lowering the severity of the other alerts (CPU utilization, disk slowness, etc.).

As for the transient information, create an SLA report with a target somewhere between 90% and 95%. Why? Trying to keep a server within acceptable CPU utilization values 99% of the time could get expensive, and is likely a waste of resources when the server is not busy the other 80% of the time. What the SLA gives you is a trend for the conditions you do not necessarily care about if they happen once or twice. However, if a condition is consistent enough over a month or a quarter, you may want to look at upgrade planning or optimization of the server. SLA reports also let you look at the health of all monitored devices at once, instead of reacting to ad hoc alerts for transient conditions.
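The SLA idea boils down to simple arithmetic: out of all the samples collected over the period, what percentage stayed within an acceptable range, and is that percentage at or above the target? SCOM's own reports do this for you; the sketch below only illustrates the calculation. Get-SlaCompliance is a hypothetical name, and it assumes the CPU samples have already been pulled out as plain numbers, with an 80% CPU threshold and a 90% SLA target chosen purely for illustration.

```powershell
# Illustrative sketch only: not a SCOM cmdlet or report.
# Computes the share of samples at or under a threshold and checks it
# against an SLA target. The 80% threshold and 90% target are assumptions.
function Get-SlaCompliance {
    param(
        [Parameter(Mandatory = $true)][double[]]$Samples,  # e.g. CPU % readings over a month
        [double]$Threshold = 80,                            # assumed "acceptable" CPU ceiling
        [double]$Target = 90                                # SLA goal, in percent
    )

    # Count the samples that were within the acceptable range
    $good = @($Samples | Where-Object { $_ -le $Threshold }).Length
    $percent = [math]::Round(100.0 * $good / $Samples.Length, 2)

    # New-Object PSObject keeps this usable on PowerShell 2.0
    New-Object PSObject -Property @{
        CompliancePercent = $percent
        MeetsSla          = ($percent -ge $Target)
    }
}

# Usage: 19 readings at 30% plus one spike to 99%
Get-SlaCompliance -Samples (@(30) * 19 + @(99))
# CompliancePercent = 95, MeetsSla = True
```

A single spike barely moves the number, which is the point of trending instead of alerting: the condition only shows up as a missed SLA when it happens often enough over the month or quarter to be worth planning around.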