Optimization of Watches notifications
The goal of optimization is to reach a state in which you react to every message (email, SMS) you receive and take corrective action at the source of the problem, so that the message is not generated needlessly again. A message that you deliberately don't read is a message you don't need at all, and it only increases the risk that you will miss another, important message.
Today there are many tools to optimize and reduce the number of messages from Watches (and from other parts of CM too), yet our experience shows that they are rarely used, even though they can make the status of your technology significantly more transparent. We are aware that there is plenty of room to improve the message optimization tools; for example, you can expect prioritization of Watches.
Overview of notification types
The fundamental type of notification is a record of an error on CM Server, which can be viewed through CM Portal at Presentation and Evaluation -> Warnings -> Errors by selecting Watches (Online).
CM Server sends an email notification to the company's assigned operators or to the computer when a FAIL state occurs, then every 24 hours while it lasts, and again when the Watch switches back to OK status. Optionally, if notification of the Unknown state is enabled, it also sends a notification for every change Unknown -> OK/FAIL and OK/FAIL -> Unknown.
C-Monitor sends email and SMS messages independently of CM Server notifications, using Actions of Watches (which is why you may receive two reports of one error: one from CM Server and the other from a C-Monitor action).
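The CM Server notification rules described above can be sketched as a small decision function. This is only an illustration, not CM Server's actual implementation: the function name, the string encoding of states, and the parameters are hypothetical. It also assumes that every change involving Unknown is governed solely by the optional setting, which is one reading of the text.

```python
from datetime import datetime, timedelta
from typing import Optional

REMINDER_INTERVAL = timedelta(hours=24)  # repeat interval stated in the text


def should_notify(previous: str, current: str,
                  last_sent: Optional[datetime], now: datetime,
                  notify_unknown: bool = False) -> bool:
    """Return True if an email would be sent under the rules described above.

    States are the strings "OK", "FAIL" and "UNKNOWN" (hypothetical encoding).
    """
    if previous != current:
        if "UNKNOWN" in (previous, current):
            # Assumption: changes to or from Unknown are reported only
            # when notification of the Unknown state is enabled.
            return notify_unknown
        # OK -> FAIL and FAIL -> OK are always reported.
        return True
    if current == "FAIL":
        # Reminder while the failure lasts.
        return last_sent is None or now - last_sent >= REMINDER_INTERVAL
    return False
```

For instance, a Watch that stays in FAIL would produce no email 23 hours after the last one, but would produce a reminder once 24 hours have passed.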
Activate notification by Actions if you need to:
- remind of a Watch's FAIL state more than once every 24 hours
- send the message to operators other than the company's assigned operators and the computer on which the Watch was created
- send an SMS message (CM Server doesn't initiate sending of SMS messages; that is always C-Monitor's job)
Tools to reduce the number of notifications
Delay of switching a Watch to failed state
Switching a Watch to Unknown status without notification
Block notification of Watches' status from CM Server while keeping records of errors
Block evaluation & notification of Watches' status from CM Server
Delay of switching a Watch to failed state
Delaying the occurrence of the failed state is a frequently used setting of CM Watches.
You define a time limit during which the FAIL state must persist before the Watch is evaluated as FAILED.
Delay for Fail state is used in many situations where you don't want to be notified about a short-term FAILED status, such as:
1) You're testing the availability of an IP address on the network using the condition Ping every 5 seconds. If you miss one ping response, the result of the condition is immediately set to the False state and, by default, the Watch would be switched to the FAILED state. You'd then be notified about the change of its state, even though this might be an "insignificant" failure.
Setting Delay for Fail state to e.g. 30 seconds ensures that the Watch switches to the FAILED state only after the condition has remained in the FAIL state for 30 seconds without interruption.
2) You're testing the load of CPU or Memory using the condition CPU usage or Memory usage.
If the limit value of the device's load, which you've set in the condition (e.g. < 95%), is exceeded, the condition is switched to the False state. However, this load may be short-lived and therefore carry no informative value. The FAILED state becomes meaningful only once the FAIL state has lasted for several seconds or minutes. That's why you should use Delay for Fail state, which increases the informative value of the Watch's status.
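The effect of Delay for Fail state in both examples can be illustrated with a short sketch. It is not C-Monitor code: the class and method names are hypothetical, time is passed in as plain seconds, and it assumes the Watch keeps reporting OK while the failure is still inside the delay window.

```python
from typing import Optional


class DelayedFailWatch:
    """Reports FAILED only after the condition has been False for delay_s seconds."""

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        # Time of the first False result in the current uninterrupted run.
        self.fail_since: Optional[float] = None

    def update(self, condition_ok: bool, now_s: float) -> str:
        if condition_ok:
            self.fail_since = None  # any success resets the delay
            return "OK"
        if self.fail_since is None:
            self.fail_since = now_s
        if now_s - self.fail_since >= self.delay_s:
            return "FAILED"
        return "OK"  # assumption: still OK while inside the delay window


# Ping every 5 seconds with a 30-second delay: one lost ping is ignored.
watch = DelayedFailWatch(delay_s=30)
watch.update(True, 0)
watch.update(False, 5)   # single lost ping, Watch stays OK
watch.update(True, 10)   # response is back, delay is reset
```

With this logic a single missed ping never reaches the operator, while an outage that lasts the full 30 seconds still does.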
To set this up through the C-Monitor console, proceed as follows:
1) Open the tab Watches in the C-Monitor console menu
2) Double-click the Watch that you want to edit
3) Tick the checkbox Delay for FAIL state and set the required time period
Switching a Watch to Unknown status without notification
The Unknown state of a Watch is a specific state which signals that the Watch's status is "indefinite", i.e. it cannot be evaluated as either OK or Failed.
The Watch's state can be set to Unknown by any condition, regardless of the results of the Watch's other conditions.
An example makes this clearer:
We want a Watch to check CPU and Memory usage on a server during working hours, when a certain load should not be exceeded, i.e. the Watch should be in the OK state the entire time. If the Watch reaches the FAILED state, notifications are delivered to the operators.
However, maintenance processes run outside office hours and put a heavy load on the devices, so we can assume that the Watch will reach the FAILED state. The operator doesn't need to be notified about it, as this state is not a relevant error.
The Unknown status exists for exactly this purpose: to put the Watch into a state in which it is evaluated neither as OK nor as FAILED.
To define the Unknown state, create a supporting condition. If you want to specify a time interval during which the Watch should be in the Unknown state, the most suitable condition is Time Range.
Notification of the Unknown state is enabled by default for CM Watches, so its checkbox must be unticked.
The described setup is shown in the following image, where in steps 1 and 2 you can see that the condition Time Range was added to the Watch, which causes the Watch to be evaluated as Unknown outside 6:00 - 18:00. To disable notifications about the Unknown state, the checkbox in step 3 mustn't be ticked.
Another example, for internet/network services, is an availability check of a location where tests run for individual services and you don't want to be notified of errors in every individual service when the entire location fails. The availability of the location can be tested by an independent Watch, and the condition "Watch state" can then be added to the Watches of the individual services; it switches their status to Unknown if that Watch is in the failed state.
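The combination of a load condition with a Time Range condition can be sketched as follows. The function and parameter names are hypothetical, and treating 18:00 itself as already outside the range is an assumption; the text does not say how the boundaries are handled.

```python
from datetime import time


def evaluate_watch(load_ok: bool, now: time,
                   start: time = time(6, 0), end: time = time(18, 0)) -> str:
    """Combine a load condition with a Time Range condition.

    Outside start-end the Time Range condition forces Unknown, regardless
    of the load condition; inside the range the load condition decides.
    """
    if not (start <= now < end):  # assumption: end boundary is exclusive
        return "UNKNOWN"
    return "OK" if load_ok else "FAILED"
```

A high load at 2:00, during maintenance, therefore yields Unknown rather than FAILED, and with notification of the Unknown state disabled the operator receives nothing.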
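The dependency between a location Watch and its service Watches can be illustrated in the same way. This is a minimal sketch with hypothetical names, not the way the "Watch state" condition is actually implemented.

```python
from typing import Dict


def service_states(location_state: str, service_ok: Dict[str, bool]) -> Dict[str, str]:
    """State of each service Watch, given the state of the location Watch.

    A failed location Watch switches every dependent service Watch to Unknown;
    otherwise each service Watch is evaluated on its own result.
    """
    if location_state == "FAILED":
        return {name: "UNKNOWN" for name in service_ok}
    return {name: ("OK" if ok else "FAILED") for name, ok in service_ok.items()}
```

When the location fails, only the location Watch reports FAILED; the service Watches go to Unknown and, with Unknown notifications disabled, stay silent.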
Block notification of Watches' status from CM Server while keeping records of errors
Blocking notification of a Watch's status from CM Server while error records are still created is used when the operator should not receive notification messages about error states, but the records must be kept and remain available for browsing.
This option is often used for Watches that change their status frequently, where the frequent notifications would burden the operator unnecessarily.
For example, an operator is assigned to monitor a poor-quality internet line on which outages have begun to occur frequently. The operator disables notification through CM Portal and follows the error states in Watches (Online).
The block is set through CM Portal: Admin zone -> Watches - Settings -> choose the specific Watch and press "edit".
Then select the option BLOCKED notification on CM Server.
Block evaluation & notification of Watches' status from CM Server
Blocking both evaluation and notification of a Watch's status from CM Server is used when you only need to evaluate the Watch in C-Monitor, through the C-Monitor console.
As a result, it won't be possible to check the results of such a Watch through CM Portal at all.
The block is set through CM Portal: Admin zone -> Watches - Settings -> choose the specific Watch and press "edit".
Then select the option BLOCKED evaluation & notification on CM Server.
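The difference between the two blocking options and the unblocked default can be summarized in a small sketch. The enum and function names are hypothetical, and the mapping reflects one reading of the descriptions above: a Watch whose evaluation is blocked creates no error records on CM Server.

```python
from enum import Enum


class ServerMode(Enum):
    NORMAL = "normal"
    BLOCKED_NOTIFICATION = "BLOCKED notification"
    BLOCKED_EVALUATION_AND_NOTIFICATION = "BLOCKED evaluation & notification"


def server_behaviour(mode: ServerMode) -> dict:
    """What CM Server does for a Watch in each mode, per the text above."""
    evaluates = mode is not ServerMode.BLOCKED_EVALUATION_AND_NOTIFICATION
    return {
        # Assumption: error records exist only while CM Server evaluates the Watch.
        "records_errors": evaluates,
        # Email is sent only when nothing is blocked.
        "sends_email": mode is ServerMode.NORMAL,
    }
```

Notifications sent by C-Monitor Actions are independent of CM Server and are not modelled here.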