Showing posts with label monitoring. Show all posts

Friday, July 6, 2012

Setting up SNMP Monitoring in SCOM 2012 (Part - 2)

**Update, Oct. 30th, 2013.** Recently, I have been successfully deploying SCOM without also installing the native SNMP options in Windows. However, the screens I normally use to validate that SNMP is working did not function correctly: while I was able to generate alerts, the event captures that I show in the last step of this post did NOT function. So, until I am able to get all the features working as I'd like, the steps in this post provide maximum flexibility in SNMP monitoring. I hope, in the future, to perform a side-by-side comparison of what works and does not work based on your installation type. If you are so inclined, you could skip to Step 8 in this guide and proceed through Step 15, but the validation steps after that will likely not work properly.

If you have landed on this page, you are likely interested in setting up SNMP monitoring for System Center Operations Manager 2012 and have probably been frustrated by the lack of consolidated information, or by outdated information, on the Internet. Prepare for a quick and accurate guide to get you trapping SNMP events in no time! I have labeled this Part-2, though there is no Part-1 just yet. I presume that you have already discovered your devices in SCOM via SNMP and now want to start seeing what types of traps are being generated from those systems. If you haven't gotten that far yet, let me know and I will post some resources.

First things first: despite what you may have read up until now, you still need to have the SNMP service running on the management server that is receiving the traps. Do not disable the SNMP service. The SNMP Trap service should be installed but turned off. This contradicts almost every other blog out there, but we could not get traps coming in until we turned the SNMP service back on, period. Try the enclosed methods first; if you want to toy around, go from there, but I cannot guarantee that you'll be able to get traps if you disable both services. Additionally, to test traps, we set up a basic CentOS system running SNMP. We added the device to SCOM under networking devices. We did not install the Linux agent.


  1. Open Windows Server Manager, select Features and, on the right, select "Add Features".
  2. Select SNMP Service along with the SNMP WMI Provider. You will have to expand the SNMP Service entry in order to select that additional component.
  3. Select "Install" and wait for the installation to complete.
  4. Open services.msc via the Run prompt, or through Server Manager. Scroll down until you see the SNMP services. Disable the SNMP Trap service.
  5. Double-click the "SNMP Service" to open the property settings.
  6. In the SNMP Service Properties window, select the option to "Accept SNMP packets from any host" and then input a community name, such as "public". SCOM needs a community name and needs the service to accept traps from all devices.
  7. Open the Operations Manager console, select Authoring and then Rules. Right-click on Rules in order to create a new rule.
  8. In the initial Create Rule Wizard, expand "Collection Rules" -> "Event Based" and select "SNMP Trap (Event)".
  9. Select to view all targets, then look for the Node type and set it as the rule target.
  10. Your rule should now have a rule name that you have provided, the category of "Event Collection" and a rule target of "Node".
  11. For now, you have to input an OID, as the screen will not take a blank OID. A generic placeholder such as 1.2.3.4.0 can be used for now.
  12. Go back to the Authoring screen and change the scope to Node so that you can find the newly created rule more easily.
  13. Right-click and select Properties, or double-click the new rule you just created.
  14. Under the "Data sources" option, select "Edit".
  15. Now clear the previously entered OID and select OK.
  16. Now navigate to the "Monitoring" screen in the Operations Manager console. Let's create a folder to group our SNMP alerts and collections to make them easy to find. Right-click the top of the tree labeled "Monitoring" and select "New -> Folder".
  17. Select the newly created folder and add a new event view to that folder.
  18. Narrow the scope to show data related to "Node" objects.
  19. For your "Select conditions", show information generated by specific rules and select the new rule that was created earlier.
  20. Now, generate a trap. In this case, we're using CentOS 6.2 to generate a simple version 2c SNMP trap to send to the SCOM server.
  21. If all is well, you should see those traps show up in the SNMP event view.
If you do not see SNMP events showing up, then it is likely your local SNMP service is not functioning correctly or needs to be reinstalled. I will add a couple of great troubleshooting blogs in a few days.
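For the trap-generation step, here is a minimal sketch of how a test trap could be sent from the CentOS box. It assumes the net-snmp `snmptrap` utility is installed there and simply wraps it from Python. The address `192.0.2.10`, the function names, and the choice of the Net-SNMP example heartbeat OIDs are assumptions for illustration and are not from the original test setup; use the community name you entered in the SNMP Service properties.

```python
import subprocess

# Assumed example OIDs (NET-SNMP-EXAMPLES-MIB heartbeat notification and its
# rate varbind). Swap in OIDs that your own devices actually emit.
TRAP_OID = "1.3.6.1.4.1.8072.2.3.0.1"
VARBIND_OID = "1.3.6.1.4.1.8072.2.3.2.1"


def build_snmptrap_command(scom_host, community="public", heartbeat_rate=123456):
    """Return the argument list for a net-snmp `snmptrap` SNMP v2c test trap."""
    return [
        "snmptrap",
        "-v", "2c",            # SNMP version 2c, as used in this post
        "-c", community,       # community name configured on the SCOM server
        scom_host,             # management server that receives the trap
        "",                    # empty uptime: snmptrap fills in the current value
        TRAP_OID,
        VARBIND_OID, "i", str(heartbeat_rate),  # one integer varbind
    ]


def send_test_trap(scom_host, community="public"):
    """Run snmptrap; raises CalledProcessError if the command fails."""
    subprocess.run(build_snmptrap_command(scom_host, community), check=True)


if __name__ == "__main__":
    # Hypothetical management server address; replace with your own.
    print(build_snmptrap_command("192.0.2.10"))
```

Call `send_test_trap("your-scom-server")` on the Linux host, then check the event view created above. Because the rule's OID filter was cleared, any trap OID should be collected.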

Wednesday, June 20, 2012

Setup and Test SysLog Alerting Through SCOM 2012

The following link provides the details for the full document. I have enclosed the basic text here as well.
https://docs.google.com/document/d/1eSwu81QxGhkRIfua-OBv3XPKXuP4PwD_WfEXv8wYFX4/edit


Document Overview Information


This document contains step-by-step instructions on how to monitor devices via SysLog alerting through Microsoft System Center Operations Manager 2012. The purpose of the document is to fully set up and test SysLog alerting for your network or server devices. This document will be periodically updated with methods to enable syslogging on systems such as VMware, F5 and other devices.
Presumptions


This document presumes the user already has a number of devices discovered within System Center. If feedback requests that the document cover how to add an SNMP device or network equipment, then that may eventually be included.
References


The following sources provided valuable information for this whitepaper:








Jumping In



Essentially, when setting up SysLogging for Operations Manager, you are setting up a global monitoring parameter, meaning a single rule will alert for multiple devices. You could create groups and add some custom information to get more granular with the alerts, such as whether the alerts were coming from a network device, server, VMWare environment or Windows. We’ll explore that further in future document versions. For now, the document presumes you want to get SysLogging up as quickly as possible to interface with as many devices as possible.
SCOM SysLog Variables


As mentioned in Clive Eastwood's blog posting, there are a number of variables that you can add to the alerts to help identify the source. These variables are as follows:


$Data/EventData/DataItem/Facility$


$Data/EventData/DataItem/Severity$


$Data/EventData/DataItem/Priority$


$Data/EventData/DataItem/PriorityName$


$Data/EventData/DataItem/TimeStamp$


$Data/EventData/DataItem/HostName$


$Data/EventData/DataItem/Message$


SysLog Facility names



Syslog categorizes messages from various system components through facility names. Each facility code identifies the kind of component that generated the message: facility 0 is the kernel, and the higher codes cover user-level programs, system daemons and locally defined uses (local0 through local7). Enclosed is a full table of the SysLog Facility names:

| Facility Number | Facility Description |
| --- | --- |
| 0 | kernel messages |
| 1 | user-level messages |
| 2 | mail system |
| 3 | system daemons |
| 4 | security/authorization messages |
| 5 | messages generated internally by syslogd |
| 6 | line printer subsystem |
| 7 | network news subsystem |
| 8 | UUCP subsystem |
| 9 | clock daemon |
| 10 | security/authorization messages |
| 11 | FTP daemon |
| 12 | NTP subsystem |
| 13 | log audit |
| 14 | log alert |
| 15 | clock daemon |
| 16 | local use 0 (local0) |
| 17 | local use 1 (local1) |
| 18 | local use 2 (local2) |
| 19 | local use 3 (local3) |
| 20 | local use 4 (local4) |
| 21 | local use 5 (local5) |
| 22 | local use 6 (local6) |
| 23 | local use 7 (local7) |




In addition to identifying which system area generated the message, the SysLog service assigns a severity to it. These severity levels can be used to set up additional rules and views within SCOM, which will be covered in another section. The severity levels are described as follows (source: wikipedia.org - http://en.wikipedia.org/wiki/Syslog):

| Code | Severity | Description | General Description |
| --- | --- | --- | --- |
| 0 | Emergency | System is unusable. | A "panic" condition usually affecting multiple apps/servers/sites. At this level it would usually notify all tech staff on call. |
| 1 | Alert | Action must be taken immediately. | Should be corrected immediately, therefore notify staff who can fix the problem. An example would be the loss of a backup ISP connection. |
| 2 | Critical | Critical conditions. | Should be corrected immediately, but indicates failure in a primary system, an example is a loss of primary ISP connection. |
| 3 | Error | Error conditions. | Non-urgent failures, these should be relayed to developers or admins; each item must be resolved within a given time. |
| 4 | Warning | Warning conditions. | Warning messages, not an error, but indication that an error will occur if action is not taken, e.g. file system 85% full - each item must be resolved within a given time. |
| 5 | Notice | Normal but significant condition. | Events that are unusual but not error conditions - might be summarized in an email to developers or admins to spot potential problems - no immediate action required. |
| 6 | Informational | Informational messages. | Normal operational messages - may be harvested for reporting, measuring throughput, etc. - no action required. |
| 7 | Debug | Debug-level messages. | Info useful to developers for debugging the application, not useful during operations. |
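The facility and severity codes combine into the single priority number carried at the front of every syslog message: the syslog RFCs (3164 and 5424) define it as facility times 8 plus severity. The sketch below shows that arithmetic in both directions. It assumes that the Priority variable SCOM exposes is this same number, and the function names are made up for illustration.

```python
def syslog_priority(facility, severity):
    """Combine a facility (0-23) and severity (0-7) into the syslog PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be between 0 and 23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return facility * 8 + severity


def split_priority(priority):
    """Split a PRI value back into a (facility, severity) pair."""
    return divmod(priority, 8)


if __name__ == "__main__":
    # Kernel facility (0) with Critical severity (2), the pair used in this guide.
    print(syslog_priority(0, 2))
    # local4 (20) with Notice (5), decoded again from its priority.
    print(split_priority(syslog_priority(20, 5)))
```

Since the facility and severity can always be recovered from the priority, a rule only needs to filter on the two separate fields.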




Now that we have the variables for the alerts along with the system facilities and severity levels to use, we can jump in and set up SysLogging!


Some quick background: if you want to use custom device groups for controlling your syslog alerts, the alert rules have to be created in the same management pack that contains the group. For this reason, I suggest setting up one management pack that holds the syslog rules and monitors as well as the device groups. This ensures you will be able to select those groups during the rule setup. Otherwise, groups that live in other unsealed management packs (i.e. ones that you have created already) will not be available.


1.       Open the Operations Manager Management Console, Select Authoring

2.        Select “Groups” and then “Create a New Group”


a.       For the group name, put in something such as “SysLog Devices”


b.      When selecting your management pack, I suggest creating a new management pack dedicated to syslog. Label it "SysLog Monitoring" or something to that effect so you know it has to do with the SysLog services.


c.       For explicit members, add the SNMP devices or Windows Hosts you might want to process.



4.       Right-click on Rules and select “Create a New Rule”

“Select a Rule Type”



5.       Select “Event Based” under the “Alert Generating Rules” and then select “Syslog (Alert)”. Select the management pack you set up earlier for SysLog alerts and groups.


“Rule Name and Description”



6.       We will label this rule name as “Syslog Kernel Alerts (Critical)” 


7.       Rule category will be “Alert” 


8.       For the “Rule Target”, select the group you set up for syslog devices.

“Build Event Expression”



9.       In the “Build Event Expression” screen, select the “Insert” dropdown and select an “And Group” 


10.   In the first row, under “Parameter Name” type in “Facility” (without the quotes); the “Operator” should be “Equals” and the “Value” should be “0” (zero).


11.   Simply hit “Insert” to add another row. Under “Parameter Name” type in “Severity”; the “Operator” should be “Equals” and the “Value” should be “2”.


“Configure Alerts”



12.   On the “Configure Alerts” screen, you can alter the fields you want to include. I typically include all the variables allowed. If you were grouping devices, you could also include a descriptor here to show what the group was, such as routers, switches, servers, etc. Enclosed are the variables we use.


Event Description: $Data[Default='']/EventDescription$


Facility: $Data/EventData/DataItem/Facility$


Severity: $Data/EventData/DataItem/Severity$


Priority: $Data/EventData/DataItem/Priority$


Priority Name: $Data/EventData/DataItem/PriorityName$


Time Stamp: $Data/EventData/DataItem/TimeStamp$


Host Name: $Data/EventData/DataItem/HostName$


Error Message: $Data/EventData/DataItem/Message$


Post Creation Edits



Once the alert rule has been created, you may want to go back to it and configure the suppression fields. This way, a repeating alert from a system won’t flood your monitoring dashboard.


a.       To do this, go to the Rules section and double-click on your new alert.


b.      Select the “Configuration” tab and then “Edit” under the “Responses” section.  


c.       On the next screen, select the “Alert Suppression” 


d.      You can play around with the options here. The ones checked in this example usually work well to start.
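To make the effect of the suppression fields concrete, here is a rough model of the idea: events that agree on every selected field are folded into the first alert and only bump its repeat count. This is an illustrative sketch, not SCOM's implementation. The field names mirror the syslog variables listed earlier, and the default choice of key fields is an assumption rather than the set checked in the original screenshot.

```python
def suppress(events, key_fields=("HostName", "Facility", "Severity")):
    """Fold events that match on every key field into a single alert.

    Returns one alert per distinct key, in first-seen order, each carrying a
    RepeatCount of how many later duplicates were folded into it.
    """
    alerts = {}
    for event in events:
        key = tuple(event[field] for field in key_fields)
        if key in alerts:
            alerts[key]["RepeatCount"] += 1      # duplicate: count it, no new alert
        else:
            alerts[key] = dict(event, RepeatCount=0)
    return list(alerts.values())


if __name__ == "__main__":
    # Hypothetical sample events from two switches.
    sample = [
        {"HostName": "sw1", "Facility": 0, "Severity": 2, "Message": "link down"},
        {"HostName": "sw1", "Facility": 0, "Severity": 2, "Message": "link down again"},
        {"HostName": "sw2", "Facility": 0, "Severity": 2, "Message": "fan failure"},
    ]
    for alert in suppress(sample):
        print(alert["HostName"], alert["RepeatCount"])
```

Note the trade-off: the more fields you tick, the fewer events count as duplicates. Adding the message text to the key, for instance, would turn the two `sw1` events above into separate alerts.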

Testing the Alert



This section outlines methods that can be used to test SysLog alerts.


1.       First, download the Kiwi Syslog Message Generator.





2.       After installing the Syslog message generator, select the options to match the alert conditions, in this case, a Facility of “Kernel” and Severity of “Critical”. The priority is not used in this case and can be any number. The target IP will be the SCOM Management Server.



3.       The SysLog Alert should now show up in the management console.
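If you would rather not install a GUI tool, a test message can also be sent with a few lines of Python. This is a minimal sketch that assumes the management server is listening for syslog on the standard UDP port 514 and accepts the classic BSD (RFC 3164 style) message layout. The host name `testhost`, the tag `scomtest` and the address `192.0.2.10` are placeholders, not values from this document.

```python
import socket
from datetime import datetime


def build_syslog_message(facility, severity, hostname, text, now=None):
    """Build a BSD-style syslog datagram: <PRI>timestamp hostname text."""
    now = now or datetime.now()
    priority = facility * 8 + severity             # PRI value, e.g. 0 * 8 + 2 = 2
    # Day of month is space-padded in this format, e.g. "Jul  6 09:05:00".
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{priority}>{timestamp} {hostname} {text}".encode("ascii", "replace")


def send_syslog(server, message, port=514):
    """Send one syslog datagram over UDP. UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (server, port))


if __name__ == "__main__":
    # Facility 0 (kernel) and severity 2 (critical) match the rule built above.
    msg = build_syslog_message(0, 2, "testhost", "scomtest: kernel critical test")
    print(msg)
    # Uncomment and point at your own management server to actually send it:
    # send_syslog("192.0.2.10", msg)
```

As with the Kiwi tool, the alert should only fire when both the facility and the severity match the rule's expression.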

Tweaks



This process can be repeated using different severity levels and facilities. For example, if you want one rule that alerts for all critical severity levels, you could set up a rule that looks for severity 0, 1 or 2 along with the kernel facility. The rule uses an AND statement to include the facility, then an OR statement with multiple expressions for the severity level.
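The combined expression can be written out as plain boolean logic, which is a handy way to sanity-check a rule before building it in the wizard. The function below is only a sketch of that AND/OR structure; its name and the use of integers for the field values are assumptions.

```python
KERNEL_FACILITY = 0


def rule_matches(facility, severity):
    """AND group: kernel facility AND (severity 0 OR severity 1 OR severity 2)."""
    return facility == KERNEL_FACILITY and (
        severity == 0 or severity == 1 or severity == 2
    )


if __name__ == "__main__":
    print(rule_matches(0, 1))   # kernel, Alert severity
    print(rule_matches(0, 3))   # kernel, Error severity
    print(rule_matches(3, 2))   # system daemons, Critical severity
```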