Vigilance Computer Server, Applications and Network Service Availability Remote Outsourced MSP Health Monitoring - NOC Engineering - Proactive computer server, applications Frame Relay, WAN, SNMP and network Best Practices and monitoring. Home Based in Portland, Oregon and Seattle, Washington.

Vigilance® Monitoring
Monitoring 101 - The Basics

Monitoring Overview

Looking for an expert, professional, NOC Design Engineer to design and build your Network Operations Center initiative?

For most companies it is no longer acceptable to learn they have a computer or network problem by receiving a call from one of their customers. In fact, companies that view their computer operations as mission critical want to be warned well in advance so they can take corrective action before a problem impacts customers or employees. Any IT organization that is charged with delivering on Service Level Agreements (SLAs) to its customers is going to require some type of professional monitoring to meet those objectives. Today's monitoring technologies can solve these problems if you know what tools to employ and how to utilize them properly. This page discusses terminology and monitoring capabilities that can be used to lower your computer service delivery costs while improving performance and reliability.

Standard terminology: Some service providers use the term "7x24x365" to describe their monitoring. Frequently what this means is that there is a computer sitting in the corner sending out ping packets every 5-10 minutes. Others claim to have "professionally staffed" monitoring when in fact their monitoring software is sending alerts to a person who is carrying the "duty pager". Proactive? In many cases, this means paging you about something fairly meaningless, such as a disk being 90% full or CPU usage momentarily running high. This is quite different from knowing what is causing the disk to fill or which processes are causing the high CPU usage. Do you really want a 3 AM page just because the CPU went to 90% for 60 seconds? You'll need to ask the right questions to determine whether 7x24x365 professionally staffed proactive monitoring means the same thing to a prospective MSP as it does to you and us.
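
One way to avoid the 3 AM page for a 60 second CPU spike is to alert only when a value stays over its threshold for a sustained period. The sketch below is an illustration under assumptions, not a description of any particular product: the function name, the 90% threshold and the 300 second minimum duration are all hypothetical values chosen for the example.

```python
from typing import Iterable, List, Tuple


def sustained_breaches(samples: Iterable[Tuple[float, float]],
                       threshold: float = 90.0,
                       min_duration: float = 300.0) -> List[float]:
    """Return the timestamps at which an alert should fire.

    samples holds (timestamp_seconds, cpu_percent) pairs in time order.
    An alert fires once per breach, and only after the value has stayed
    at or above threshold for at least min_duration seconds.
    """
    alerts = []
    breach_start = None   # time the current run above threshold began
    alerted = False       # True once this run has produced its alert
    for ts, value in samples:
        if value >= threshold:
            if breach_start is None:
                breach_start = ts
            if not alerted and ts - breach_start >= min_duration:
                alerts.append(ts)
                alerted = True
        else:
            # Value dropped back under the threshold: reset the run.
            breach_start = None
            alerted = False
    return alerts


# A one-minute spike produces no alert.
spike = [(0, 50.0), (60, 95.0), (120, 50.0)]
# Six minutes above 90% produces exactly one alert.
sustained = [(t, 96.0) for t in range(0, 420, 60)]
print(sustained_breaches(spike), sustained_breaches(sustained))
```

The minimum duration is a trade-off: a longer window suppresses more noise but delays the warning for a real problem.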

A well implemented monitoring service should pay for itself by lowering your costs and improving productivity.

Basic Ping and Port Check Monitoring

The most basic level of monitoring tells you if something is up or down. Ping monitoring confirms whether an IP address is alive, and port monitoring confirms whether a specific port at an IP address is responding. URL monitoring verifies that a specific web page is reachable. These tests do not measure performance. They merely tell you if something is broken. This type of monitoring provides no advance warning, so the alerts it generates will be reactive in nature. Most ping/port check tools run unattended. They send out pages and e-mails if they do not receive a reply from the host being tested within a certain amount of time. It is not uncommon for these types of tools to send out frequent false alarms because network latency or other factors delay the response from the host being tested.
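
To make the idea concrete, a port check amounts to little more than the following sketch, written with Python's standard socket module. The function names, timeout and retry counts are assumptions made for illustration. Retrying before declaring a failure is one simple way to cut down on the false alarms caused by a single slow reply.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def check_with_retries(host: str, port: int, attempts: int = 3,
                       timeout: float = 3.0, delay: float = 1.0) -> bool:
    """Report a failure only if every one of several attempts fails."""
    for attempt in range(attempts):
        if port_is_open(host, port, timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Note what this does not tell you: a successful connection says nothing about how well the service behind the port is performing.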

There are hundreds if not thousands of people offering this service. For most, this is a part time business which runs on a PC in their garage. Typically, ping/port checking is implemented by running free or nearly free open source software. These companies will claim to provide 7x24x365 monitoring but will really just be pinging and port checking your server once every 5-10 minutes and will spray out e-mails and pages if their PC doesn't receive a response in time.

Most of these companies will have some bell or whistle that gives the impression that they are doing more than ping & port checking. However, even with an extra feature or two, this type of service is still very low end and entry level.

The fee for this type of monitoring typically runs in the $100-$150 per month per server range. You can save some money if you are willing to have your server tested less often (e.g. one ping every 20 minutes). Personally, we think you should save even more money and just let your customers call you when your server goes down.

Server Performance Monitoring

This is the next level of monitoring and is where the majority of monitoring services end. The methodology to perform this monitoring usually depends on tools included in the server's operating system. The items being monitored would include such things as CPU usage, server load, disk utilization, memory usage, and entries in selected log files. This monitoring can provide some advance warning about impending system problems if the thresholds for the alerts are set properly. For example, if you set the upper alert level for disk usage at 90% you should receive an alarm in time to take action before the disk becomes full. Of course, these types of tools have no way of knowing what is causing the disk to fill or how fast disk space is being consumed. Like low end monitoring, this service almost always runs unattended on "at home PCs" and sprays out e-mails and pages to a list when a problem is detected.
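
A fixed 90% threshold says nothing about how fast the disk is filling. As a rough illustration of what a rate-aware check adds, the sketch below estimates the hours remaining from a series of usage readings. It assumes simple linear growth between the first and last reading, and the function name and units are hypothetical choices for the example.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]],
                     capacity_gb: float) -> Optional[float]:
    """Estimate hours until the disk is full.

    samples holds (hours_since_start, used_gb) readings in time order.
    Growth is assumed to be linear between the first and last reading.
    Returns None when there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    rate = (used1 - used0) / (t1 - t0)   # GB consumed per hour
    if rate <= 0:
        return None
    return (capacity_gb - used1) / rate


# 80 GB used at hour 0 and 90 GB at hour 10, on a 100 GB disk:
# 1 GB per hour of growth leaves about 10 hours.
print(hours_until_full([(0.0, 80.0), (10.0, 90.0)], capacity_gb=100.0))
```

Real readings could come from a call such as shutil.disk_usage() in the Python standard library. A disk at 70% that will fill in two hours deserves more attention than one that has sat at 91% for a month.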

Many of the people in this business claim that they are providing 7x24x365 "proactive monitoring". We'll leave it to you to decide whether that's true or not.

SNMP Monitoring

Almost every piece of equipment in a company network can be monitored via SNMP polling. This methodology uses device specific Management Information Bases (MIBs) to obtain additional information about a device's health. This information is collected at regular polling intervals. SNMP enabled devices can also be programmed to send SNMP trap information to message handlers in real time. If your servers and network infrastructure are truly mission critical, this is an increased level of monitoring that can be quite important. Building and maintaining an SNMP monitoring environment is a non-trivial undertaking. IT Managers will need to purchase expensive Network Node Manager type software and commit internal resources to bring this capability in-house, or will need to outsource this to a professional monitoring company.

Almost none of the "7x24x365 monitoring" companies you'll find in a Google search will provide meaningful SNMP monitoring.

Application Monitoring

Many monitoring tools can tell you if an application is alive or dead, but not many monitor the actual health and well-being of applications. An application can be alive but performing so slowly that customers or employees can't use it. For example, this type of monitoring might be used to measure the speed of shopping cart transactions for ecommerce companies or to determine how fast server based applications respond to employees. An important application monitoring feature is the ability to monitor log files and to understand the meaning of messages placed there by the application.
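
The difference between "alive" and "healthy" can be sketched as a timed check around a test transaction. This is only an illustration: the function name, the three status labels and the 2 second limit are assumptions, and the transaction itself (a shopping cart request, a database query) would be supplied by whoever knows the application.

```python
import time
from typing import Callable, Tuple


def timed_check(transaction: Callable[[], object],
                slow_after: float = 2.0) -> Tuple[str, float]:
    """Run one test transaction and classify the result.

    Returns ("down", seconds) if the transaction raises an exception,
    ("slow", seconds) if it succeeds but takes longer than slow_after,
    and ("ok", seconds) otherwise.
    """
    start = time.perf_counter()
    try:
        transaction()
    except Exception:
        return "down", time.perf_counter() - start
    elapsed = time.perf_counter() - start
    return ("slow" if elapsed > slow_after else "ok"), elapsed


# Stand-in transaction that succeeds but takes longer than the limit.
status, seconds = timed_check(lambda: time.sleep(0.05), slow_after=0.01)
print(status)
```

A plain up/down check would report this application as healthy. The timed check flags it as slow, which is the case that customers and employees actually notice.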

Bandwidth Monitoring

The high cost of telecom WAN pipelines makes bandwidth monitoring a necessity. It provides a way to optimize the capacity of your circuits, identify bottlenecks, plan future needs, verify bills, and eliminate illegal usage.
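
The arithmetic behind a utilization figure is straightforward. The sketch below assumes two readings of an interface byte counter (for example, the ifInOctets object that SNMP devices expose) taken a known number of seconds apart. The function name and the sample figures are illustrative only.

```python
def utilization_percent(octets_start: int, octets_end: int,
                        interval_s: float, link_bps: float,
                        counter_bits: int = 32) -> float:
    """Percentage of link capacity used between two counter readings.

    The modulo handles a single counter wrap, which happens when a
    32-bit counter passes its maximum between polls.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100.0 / (interval_s * link_bps)


# A T1 circuit (1,544,000 bits per second) polled five minutes apart.
print(utilization_percent(1_000_000, 29_950_000,
                          interval_s=300, link_bps=1_544_000))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the result is wrong. Shorter polling intervals or 64-bit counters avoid the problem.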

Agent Based Monitoring

This is a requirement for IT Managers who are serious about operating system and applications monitoring. It is unlikely that even two 9's SLAs (99% uptime) can be reached without employing this type of technology. Agents are essentially daemons or processes that run on the server and provide "hooks" for other pieces of code that do much more precise levels of monitoring than previously discussed. For example, a monitoring software agent would usually operate with a Smart Plug-In (SPI) or Knowledge Module (KM) that was designed to monitor specific operating systems and applications, such as Solaris and Oracle or IIS and SQL. Agents may operate independently, but more often they also communicate with server consoles or enterprise managers located in a Network Operations Center (NOC). In this configuration, a NOC Tech can verify and troubleshoot equipment in real time as well as receive and view asynchronous messages and alarms from server agents.
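
It helps to translate "nines" into a downtime budget. The small calculation below assumes a 365 day year and treats all downtime as counting against the SLA. Real agreements often exclude scheduled maintenance, so treat the function as a back-of-the-envelope sketch.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted per period at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (100.0 - availability_percent) / 100.0


for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {allowed_downtime_hours(target):.2f} hours per year")
```

Two 9's works out to roughly 87.6 hours of downtime per year, three 9's to under 9 hours, and four 9's to under an hour.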

Local vs Remote Monitoring

Your company firewall can prevent a remote monitoring company from having access to the information needed for availability, performance and bandwidth testing. Outsourced monitoring often requires the creation of a VPN between your data center and the monitoring company. Another option is to choose a monitoring company that places a monitoring appliance inside your data center, behind your firewall. A third choice is to select a monitoring company that utilizes software tools that have been specifically designed to safely monitor devices from untrusted Internet space. Obviously, security is a huge concern any time you allow anyone access to your computing equipment. Make sure your MSP is not going to expose your network to any type of security risk!

Reactive vs Proactive Monitoring

One of the primary purposes of monitoring is to warn you of problems so you can take corrective action in a timely manner. Wherever possible you want to implement monitoring that will give you as much advance warning as possible before a problem impacts employees or customers. Ping and port check monitoring can only inform you after an outage has occurred. Most of the other monitoring methods discussed here can be configured in a way that will give you at least some advance warning so you can take corrective action before an outage occurs. For maximum proactive coverage, having actual Technicians viewing a monitoring console in a NOC is going to be a requirement. This is really the only way to get past two 9's SLAs. Even the best monitoring software is going to generate false alarms or miss important events if implemented to run unattended. For example, a disk drive could be failing and generating error messages in the system log file, an obvious indication that some maintenance is going to be required very soon. However, creating an alarm for every instance of the word "error" in the log would be impractical, since many unimportant things also generate error messages. Having a Technician review these log messages as they come in, to make sure that nothing important is missed, is a valuable monitoring feature.
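
The log file example can be sketched as a three-way triage: alarm on patterns known to matter, ignore known noise, and queue everything else that mentions an error for a Technician to review. The patterns and sample log lines below are made up for illustration. A real list would be built from the hardware and applications actually being monitored.

```python
import re

# Hypothetical patterns, chosen only to illustrate the idea.
ALARM_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bread error\b",
    r"\bwrite error\b",
    r"\bbad sector\b",
)]
KNOWN_NOISE = [re.compile(p, re.IGNORECASE) for p in (
    r"\b0 errors\b",
    r"error_log rotated",
)]


def triage(line: str) -> str:
    """Classify one log line as "alarm", "review" or "ignore"."""
    if any(p.search(line) for p in KNOWN_NOISE):
        return "ignore"
    if any(p.search(line) for p in ALARM_PATTERNS):
        return "alarm"
    if re.search(r"error", line, re.IGNORECASE):
        # Unrecognized error: leave the judgment to a person.
        return "review"
    return "ignore"


for sample in ("disk0: read error on block 8841",
               "backup finished with 0 errors",
               "app: unexpected error while parsing request"):
    print(triage(sample), "<-", sample)
```

The "review" bucket is the point of the exercise. Software can sort the obvious cases, but the ambiguous ones still need a person looking at the console.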

Event and Alert Escalation

You will need to determine how you want to handle alerts that have been generated. Many companies will want custom alert event handling depending on severity, time of day, service redundancy and other factors. Alarms can be sent to pagers, e-mail addresses or a trouble ticket system. Verified failures can also be reported directly to Customer IT Administrators or to a service provider for resolution. You want to avoid monitoring companies that only call the "on call Engineer" or that spray out calls and pages to your entire IT Staff. A competent monitoring company will isolate the problem sufficiently so that only the proper source of solution needs to be notified (e.g. the Oracle DBA is called for Oracle problems, not the Network or UNIX Admins). Or you can choose a monitoring company that has the technical expertise to perform root cause analysis and correct the problem for you.
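
As a rough illustration of rule-based escalation, the sketch below routes an alert to a single owner and picks a delivery method from severity, time of day and redundancy. Every name in it (the owner table, the "noc-triage" fallback, the 8 AM to 6 PM business hours) is a hypothetical choice for the example. Real policies are set per customer.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical mapping from problem category to the one group that owns it.
OWNERS = {
    "oracle": "oracle-dba",
    "network": "network-admin",
    "unix": "unix-admin",
}


@dataclass
class Alert:
    category: str            # e.g. "oracle", "network", "unix"
    severity: str            # "critical" or "warning"
    hour: int                # local hour of day, 0-23
    redundant: bool = False  # True if a standby is still serving users


def route(alert: Alert) -> Tuple[str, str]:
    """Return (owner, method) for an alert."""
    owner = OWNERS.get(alert.category, "noc-triage")
    if alert.severity == "critical" and not alert.redundant:
        return owner, "page"
    if 8 <= alert.hour < 18:
        return owner, "email"
    return owner, "ticket"


print(route(Alert("oracle", "critical", hour=3)))
print(route(Alert("network", "critical", hour=3, redundant=True)))
```

In this sketch a critical failure on a redundant service at 3 AM becomes a ticket rather than a page, because users are still being served and the repair can wait until morning.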

Trouble Ticket Systems

A complete monitoring offering will include an integrated trouble ticket system. This system will automatically open a trouble ticket when an alert is generated so you have an audit trail of the problem and the actions that were taken to correct it. It could also be used by your IT staff to track all IT related tasks and to inform the call center of actions in progress.

Vigilance Monitoring is a Division of Easyrider LAN Pro, established 1990.