Search This Blog

Thursday, January 8, 2015

System Center Custom Application Monitoring

Sometimes, a problem can seem really hard, but turn out to be rather simple if approached from a different perspective. The application team I support has a poorly written application running on a server. This should sound familiar to most system admins out there.

The application crashed the other day, but none of the windows services actually stopped, nor were there any event log errors to really go off of either. How do we monitor that service then? One option might be the TCP port but nobody seemed to know what that was. Digging into the application, it had a small scripting engine, which allowed us to run some basic scripts.

The first thought the application team had was, we'll write an event log saying everything is ok, and when that doesn't appear, we want an alert. Well, we can monitor for missing alerts in SCOM, but that seemed like it would be destined for error.

What we settled on instead was to have the program simply drop a file in the temp directory. It would put the file there every 30 minutes, with the same name. So now what? I created a small and simple batch file that would check for the file, then delete it if it was there. Otherwise, report the file missing and the service stopped.

IF EXIST C:\TEMP\running.log GOTO Good
EVENTCREATE /T ERROR /ID 333 /L application /d "Custom Application Failed"
:Good
DEL C:\TEMP\running.log /q

I then set a schedule task to run every 30 minutes to run this batch file. When the file went missing, it would write the error to the event log. From there, just setup an event monitor in System Center to catch and alert on the event.

Operations Manager Performance Trending with Excel - No SQL Required

A powerful tool for administrators is to trend data to troubleshoot performance problems and forecast future resource needs. In the past, I've run SQL queries but needed a way to instruct support staff on a basic means to accomplish the same tasks right from the console. Thankfully, Operations Manager and Excel allow just that.

Let's get started.

Fire up the Operations Manager Console to the Monitoring section, then open the Windows Computers view (or any section where you access the computer health view, such as SQL)


Find a server you're interested in or simply select one from the list.

 
After selecting the system, select the Performance View under the Navigation pane of the task panel on the right side of the management console.
 
 
Once the Performance View window comes up, in the Performance Actions pane on the right side of the console, change your time frame via Select Time Range, selecting a meaningful period, such as two weeks or longer.
 
 
 
 
Now at the bottom of the performance monitor screen, select a counter you're interested. I'll use Percent Memory Used for this example.


When selected, a graph should display such as the following:
 
  
Going back to the Performance Actions pane, select Copy Data to Clipboard

Open up notepad and past the contents, which should look similar to the following:


Save the file with an xml extension
 
 
  
Now open Excel, select the Data tab and select From Other Sources -> From XML Data Import
 

Select the XML file created earlier; accept the import defaults when prompted
 

This should populate the Excel spreadsheet with an X and Y column. The first column is the date/time stamp and the Y column is the performance data.
 
 
Select all of the Y data and with it highlighted, select the Insert tab -> Line -> 2-D Line to generate a graph.
 

This should yield a graph in Excel such as the following:
 

Right-click on the graph line and select Add Trendline
 

Generally, accepting the default will paint a trendline  that is helpful for finding issues such as a memory leak or consistent data usage on a hard drive. However, you can play around with the trend to perform longer-term forecasts. I added 50 periods to the end of my trend line to see how memory might look in the future after my data set.
 
 
Graph results with the trendline:
 
 
The line extends a bit beyond the graph data and shows an overall flat trend on memory utilization. If the server had a memory leak, as an example, the graph might trend steadily upwards like this:

 
There you have it, a simple but powerful tool for analyzing data recorded in SCOM without a lot of effort.