
Tuesday, December 23, 2014

AD Site Availability Degraded / AD Site Performance Health Degraded

After deploying the Active Directory Management Packs, one of our domain controllers started spewing alerts. I had not come across anything out there that really dealt with the alert, and the warning from this type of event was not on eventid.net either. But it's all figured out now, and here is the solution to the perplexing problem I encountered.

You could also title this, "How to Perform an Online/Offline Defragmentation of your Health Service Store in System Center".

Problem Description:

First, the SCOM console began to fill up with AD Site Availability Health Degraded and AD Site Performance Health Degraded critical alerts from the Active Directory Management Packs.

AD Site Availability Health Degraded and AD Site Performance Health Degraded

On the offending domain controller, I observed the following Application event log spewing:
 

 
The contents of the warning were as follows:
HealthService (1704) A significant portion of the database buffer cache has been written out to the system paging file. This may result in severe performance degradation. See help link for complete details of possible causes. Log Name: Application | Source: ESENT | Event ID: 906
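To check whether another server is logging the same warning without scrolling through Event Viewer, a query along these lines may help. This is only a sketch: it builds a wevtutil command line for the source and event ID quoted above, and the function name and the default of 10 events are illustrative assumptions, not part of the original troubleshooting.

```python
import subprocess


def build_esent_906_query(max_events: int = 10) -> list:
    """Build a wevtutil command that lists recent ESENT 906 warnings.

    The source name and event ID come from the warning quoted above;
    max_events is an arbitrary, hypothetical default.
    """
    xpath = "*[System[Provider[@Name='ESENT'] and (EventID=906)]]"
    return [
        "wevtutil", "qe", "Application",  # query the Application log
        f"/q:{xpath}",                    # filter on source and event ID
        f"/c:{max_events}",               # limit the number of events returned
        "/rd:true",                       # read newest events first
        "/f:text",                        # plain-text output
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into an elevated prompt.
    print(subprocess.list2cmdline(build_esent_906_query()))
```

The script only prints the command; running it on the affected server is left to the reader.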

Troubleshooting:

Initially I suspected that an application or process was going bonkers on the server, taking up memory and causing the SCOM agent to malfunction or be starved of resources. I loaded the Sysinternals Process Monitor utility to see what was happening when these events fired, since typically only a few minutes passed between events. What it captured was a significant amount of file activity from the Health Service to
C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Health Service Store\HealthServiceStore.edb. Essentially, no other process running at the time of these warnings, or of the corresponding alerts in the System Center Management Console, could account for the issues on the system.
 
 
SCOM HealthService | ReadFile | C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Health Service Store\HealthServiceStore.edb
 
With the smoking gun being the Health Service Database, I performed some quick online maintenance from within the console to start.
 
In the Operations Manager Console, I started by browsing to the Operations Manager folder, then Agent Details and selecting the Agents by Version view.
Management Console Tree -> Operations Manager -> Agent Details -> Agents By Version
 
 
Selecting the offending computer brought up the Health Service Tasks I could perform, with Start Online Store Maintenance being the one I was looking for.
Management Console Health Service Task for Health Service Database Maintenance | Start Online Store Maintenance
 
Final Solution:

Unfortunately, the online store maintenance was not enough to remediate the errors and warnings I was encountering, so I opted for an offline defragmentation of the Health Service Store database. Perform the following if local warnings persist on the client system.
 
  • Log in to the offending client system via console or RDP
  • Open an administrative command prompt
  • Change directory to "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Health Service Store"
  • From the service console (services.msc) or from the command prompt (net stop "Microsoft Monitoring Agent"), stop the Microsoft Monitoring Agent service
  • Run esentutl /r edb (without this, you likely won't be able to perform a defragmentation)
  • Next, run esentutl /d HealthServiceStore.edb
Running esentutl /d HealthServiceStore.edb in order to compact and defragment the health service database after log spewing occurred from loading the Active Directory management packs

When this completed, my HealthServiceStore.edb file went from 174MB to 27MB, and both the warnings in the local Application event log and the critical health alerts in the System Center Operations Manager Console went away.
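The offline defragmentation steps above can be sketched as a small script. Treat it as a minimal sketch under stated assumptions: the path, service name, and esentutl commands are the ones listed in the steps, while the final net start step, the function names, and the idea of scripting this at all are additions that were not part of the original fix.

```python
import subprocess
from pathlib import PureWindowsPath

# Store location and service display name as given in the steps above.
STORE_DIR = PureWindowsPath(
    r"C:\Program Files\Microsoft Monitoring Agent\Agent"
    r"\Health Service State\Health Service Store"
)
SERVICE = "Microsoft Monitoring Agent"


def defrag_plan(restart_service: bool = True) -> list:
    """Return the commands for the offline defragmentation, in order."""
    plan = [
        ["net", "stop", SERVICE],                      # stop the agent first
        ["esentutl", "/r", "edb"],                     # replay logs (recovery)
        ["esentutl", "/d", "HealthServiceStore.edb"],  # offline defragmentation
    ]
    if restart_service:
        # Assumption: restarting the agent is not listed in the original steps.
        plan.append(["net", "start", SERVICE])
    return plan


def run_plan(plan: list, cwd: str = str(STORE_DIR)) -> None:
    """Run each command from the store directory; stop on the first failure."""
    for cmd in plan:
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for cmd in defrag_plan():
        print(subprocess.list2cmdline(cmd))
```

Calling run_plan(defrag_plan()) from an elevated prompt on the affected server would execute the sequence; the dry run is the default so nothing is stopped by accident.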

Wednesday, August 6, 2014

Problems with the 2012 R2 Web Consoles

This post is a little long, but I wanted to include as much pertinent error information as possible to help folks properly identify if they are encountering the same type of issue.

We recently upgraded our systems to SCOM 2012 R2 and encountered some issues with client connectivity to the web console. For perspective on how our systems are distributed: SQL is on a separate system from the management console, while the web console and management console are on the same system.

First, let's start with some of the errors I was seeing:

From a client, attempting to connect to the AppAdvisor console:

Error on the client:

An error has occurred - The additional error information can be found in the Windows Application Log. We apologize for any inconvenience caused by this temporary service outage.


Warning on the SCOM management server when connecting to the AppAdvisor console:

Event code: 3005
Event message: An unhandled exception has occurred.
Event time: 8/5/2014 9:38:10 AM
Event time (UTC): 8/5/2014 4:38:10 PM
Event ID: 20964fc40f3c43348ccff13e467e259a
Event sequence: 7
Event occurrence: 1
Event detail code: 0

Application information:
Application domain: /LM/W3SVC/1/ROOT/AppAdvisor-1-130517302775480349
Trust level: Full
Application Virtual Path: /AppAdvisor
Application Path: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\WebConsole\AppDiagnostics\AppAdvisor\Web\
Machine name: SCOM-MS01

Process information:
Process ID: 4332
Process name: w3wp.exe
Account name: NT AUTHORITY\NETWORK SERVICE

Exception information:
Exception type: WebException
Exception message: The request failed with HTTP status 401: Unauthorized.

Request information:
Request URL: http://scom-ms01/AppAdvisor/Pages/ReportService/ReportServicePageImpl.aspx?_r=&_c=g&_pg=436ac5a4-3e70-41b9-9fe1-5a5c96724dc0&_s=2C369460
Request path: /AppAdvisor/Pages/ReportService/ReportServicePageImpl.aspx
User host address:
User:
Is authenticated: True
Authentication Type: Forms
Thread account name: NT AUTHORITY\NETWORK SERVICE

Thread information:
Thread ID: 17
Thread account name: NT AUTHORITY\NETWORK SERVICE
Is impersonating: False

Similarly, I received the following error when connecting to the AppDiagnostics site:

Event code: 3005
Event message: An unhandled exception has occurred.
Event time: 8/5/2014 9:32:02 AM
Event time (UTC): 8/5/2014 4:32:02 PM
Event ID: 67e2d2ba9c4842c3bc041c62bad932e3
Event sequence: 8
Event occurrence: 1
Event detail code: 0
Application information:
Application domain: /LM/W3SVC/1/ROOT/AppDiagnostics-2-130517299136496487
Trust level: Full
Application Virtual Path: /AppDiagnostics
Application Path: C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\WebConsole\AppDiagnostics\Web\
Machine name: SCOM-MS01

Process information:
Process ID: 8048
Process name: w3wp.exe
Account name: IIS APPPOOL\OperationsManagerAppMonitoring

Exception information:
Exception type: OleDbCommandException
Exception message: Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'.
Command text: Select CONFIGID, CONFIGNAME, CONFIGVALUE From apm.CONFIG
Connection: Provider=SQLOLEDB;Server=scom-sql;database=OperationsManager;Integrated Security=SSPI;

Request information:
Request URL: http://scom-ms01/AppDiagnostics/Pages/Authenticate.aspx?ReturnUrl=/appdiagnostics
Request path: /AppDiagnostics/Pages/Authenticate.aspx
User host address:
User:
Is authenticated: False
Authentication Type:
Thread account name: IIS APPPOOL\OperationsManagerAppMonitoring

Thread information:
Thread ID: 9
Thread account name: IIS APPPOOL\OperationsManagerAppMonitoring
Is impersonating: False
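To help tell whether an event 3005 on your own server matches either of the two errors above, the event text can be checked for the same exception messages. This is a rough sketch: the signature labels and function names are hypothetical, and it only looks at the "Exception message" field of event text you have copied out of the Application log.

```python
import re

# Exception messages taken from the two events above; the labels are my own.
SIGNATURES = {
    "AppAdvisor 401": re.compile(r"HTTP status 401: Unauthorized"),
    "AppDiagnostics anonymous SQL login": re.compile(
        r"Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'"
    ),
}


def parse_fields(event_text: str) -> dict:
    """Split 'Key: value' lines into a dict, keeping the first value per key."""
    fields = {}
    for line in event_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def match_signatures(event_text: str) -> list:
    """Return the labels whose pattern appears in the exception message."""
    message = parse_fields(event_text).get("Exception message", "")
    return [name for name, pattern in SIGNATURES.items() if pattern.search(message)]


if __name__ == "__main__":
    sample = (
        "Exception type: WebException\n"
        "Exception message: The request failed with HTTP status 401: Unauthorized.\n"
    )
    print(match_signatures(sample))
```

A match only means the message text is the same; it does not confirm that the cause is the same as on my systems.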

And finally, on the primary /OperationsManager web console, I'd receive an authentication error. The client would be prompted multiple times for a username and password and eventually bomb out.

 
Server Error - 401 - Unauthorized: Access is denied due to invalid credentials. You do not have permission to view this directory or page using the credentials that you supplied.
 
Solving the problem.

The first step was a prerequisite for fixing both the AppAdvisor and AppDiagnostics issues.
  1. Open the IIS console on the web console server
  2. Select "Application Pools"
  3. Select "OperationsManagerAppMonitoring"
  4. If you are receiving the errors and the pool's "Identity" is set to "ApplicationPoolIdentity", keep the OperationsManagerAppMonitoring pool highlighted and select the "Advanced Settings" option in the Actions pane.
  5. Under "Process Model", change the Identity from ApplicationPoolIdentity to "NetworkService"
  6. Run iisreset from an elevated (administrator) command prompt
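For anyone who prefers the command line to the IIS console, the identity change above can probably be made with appcmd instead. The following is a sketch, not what I ran: it only prints the commands, the function name is hypothetical, and appcmd.exe is assumed to be in its usual location under %windir%\system32\inetsrv.

```python
import subprocess

# Assumption: appcmd.exe normally lives in %windir%\system32\inetsrv,
# so run the printed commands from that folder or add it to PATH.
APPCMD = "appcmd.exe"


def pool_identity_commands(pool: str, identity: str = "NetworkService") -> list:
    """Build the appcmd call that changes a pool's identity, then iisreset."""
    return [
        [
            APPCMD, "set", "apppool",
            f"/apppool.name:{pool}",
            f"/processModel.identityType:{identity}",
        ],
        ["iisreset"],
    ]


if __name__ == "__main__":
    for cmd in pool_identity_commands("OperationsManagerAppMonitoring"):
        print(subprocess.list2cmdline(cmd))
```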
At this point, the AppDiagnostics website started working, but the AppAdvisor site did not. I had to perform additional steps for that site.
  1. Open the IIS console on the web console server
  2. Select and expand the site (Default Web Site on my server) where the Operations Manager web console is installed.
  3. Select the virtual directory named "AppAdvisor"
  4. Open the "Authentication" applet
  5. If not already enabled, enable the "Anonymous Authentication" and "ASP.NET Impersonation" methods
  6. Run iisreset from an elevated (administrator) command prompt
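The authentication changes above could likely be scripted the same way. This sketch assumes the web console lives under "Default Web Site" as it does on my server, uses a hypothetical function name, and only prints appcmd commands for review rather than running them.

```python
import subprocess


def appadvisor_auth_commands(site: str = "Default Web Site",
                             app: str = "AppAdvisor") -> list:
    """Build appcmd calls enabling anonymous auth and ASP.NET impersonation."""
    target = f"{site}/{app}"
    return [
        [
            "appcmd.exe", "set", "config", target,
            "/section:system.webServer/security/authentication/anonymousAuthentication",
            "/enabled:true",
            "/commit:apphost",  # this section is normally locked at server level
        ],
        [
            "appcmd.exe", "set", "config", target,
            "/section:system.web/identity",
            "/impersonate:true",  # ASP.NET Impersonation
        ],
        ["iisreset"],
    ]


if __name__ == "__main__":
    for cmd in appadvisor_auth_commands():
        print(subprocess.list2cmdline(cmd))
```

If the web console is installed under a different site, pass that site name instead of the default.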
The final piece to get into the Operations Manager web console was, oddly enough, to adjust an IE setting. To fix this portion, I took the following steps:
  1. Open "Internet Options" in Internet Explorer
  2. Select the "Advanced" tab
  3. Scroll almost all the way down and uncheck the box for "Enable Integrated Windows Authentication"
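If the IE change needs to be made on more than a handful of clients, a registry edit may be easier than clicking through Internet Options. This is a sketch under an assumption I have not verified here: that the "Enable Integrated Windows Authentication" checkbox maps to the per-user EnableNegotiate value under Internet Settings. The function name is hypothetical and the script only prints the reg command.

```python
import subprocess

# Assumption: this per-user value backs the IE checkbox named in the steps.
IE_SETTINGS_KEY = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings"


def integrated_auth_command(enabled: bool = False) -> list:
    """Build a reg add command that sets EnableNegotiate to 1 or 0."""
    return [
        "reg", "add", IE_SETTINGS_KEY,
        "/v", "EnableNegotiate",
        "/t", "REG_DWORD",
        "/d", "1" if enabled else "0",
        "/f",  # overwrite without prompting
    ]


if __name__ == "__main__":
    # Default matches the step above: uncheck the box (value 0).
    print(subprocess.list2cmdline(integrated_auth_command()))
```

Internet Explorer would most likely need to be restarted before the change takes effect.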
After these adjustments, all web consoles were available for remote clients.

Friday, February 7, 2014

SCOM 2012 Failed Accessing Windows Event Log with Veeam Management Pack

I noticed during a routine health check that our two Management Servers were showing a warning state. The error read as "Failed Accessing Windows Event Log" <management server 1> (Health Service).

Error details show the following:

The Windows Event Log Provider is still unable to open the Veeam Collector event log on computer 'management server 1'. The Provider has been unable to open the Veeam Collector event log for 720 seconds. Most recent error details: The specified channel could not be found. Check channel configuration. One or more workflows were affected by this. Workflow name: many Instance name: many Instance ID: many Management group:

We have the Veeam management pack for SCOM loaded and sure enough, this appears to be a documented issue on the Veeam knowledge base.

http://www.veeam.com/kb1496#/kb1496