It’s 9am on a Wednesday morning. The system is down, the Marketing Manager is calling to say they have an urgent piece of work to complete now, and the Sales Director is shouting that his proposal needs to be sent by 9.15am to meet a deadline. Would you prefer your response to be:
“We’ll get right on it” or “We are already looking into it”
If it’s “We are already looking into it”, you are using proactive monitoring. There is a growing trend among businesses to implement this type of monitoring, and it becomes increasingly important as IT moves from a hardware base to a software base such as cloud computing.
“In an economic climate where costs are constantly under scrutiny, the efficiency and productivity of the IT organisation can be dramatically enhanced by a move to an automated, proactive approach to application lifecycle performance management.” Bernd Greifeneder, CTO & Founder, dynaTrace
“Staff are often blamed, cursed and lambasted for crashing their own computers, causing havoc with email and wasting the IT manager’s time, but is it really their fault? At the same time, users curse the IT team when technical problems slow them down, cause missed deadlines and hold up communication between colleagues.” Urvesh Lakhani, Avanade
Communication between the IT department and the other departments of an organisation is critical to business success. Through dialogue you can discover the real pain points of each department, and proactive monitoring of their systems allows you to get to grips with any problems they may be having with, for example, email or storage space. By working as a team, events such as downtime can be avoided or strategically planned to have the least impact on the workforce.
Do you know your system’s time to death?
Performance monitoring can sound very similar to proactive monitoring, but there are two key differences. With proactive monitoring you expect an alert to be raised when a threshold is breached, and the events are transient, retained no longer than the incident period. With performance monitoring, you are collecting information over a long period of time for analysis.
Performance monitoring works by taking a sample of the current state of the device you wish to track and storing it. A performance counter is typically sampled once every 5 minutes and retained for a minimum of 1 year so that you can compare growth. Many organisations retain data for longer periods, albeit at a reduced sample rate.
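The sampling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production collector: the counter (free disk space), the SQLite schema and the file names are all assumptions chosen for the example.

```python
import shutil
import sqlite3
import time

def sample_disk_free(path="/"):
    """Sample one performance counter: free disk space in bytes."""
    total, used, free = shutil.disk_usage(path)
    return free

def record_samples(db_path="perf_samples.db", interval_s=300, iterations=3):
    """Take a sample every interval_s seconds (300 = 5 minutes) and
    store it with a timestamp for later trend analysis."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (ts REAL, counter TEXT, value REAL)"
    )
    for _ in range(iterations):
        conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?)",
            (time.time(), "disk_free_bytes", sample_disk_free()),
        )
        conn.commit()
        time.sleep(interval_s)
    conn.close()
```

In practice the loop would run as a scheduled job or daemon rather than a fixed number of iterations, and the retention policy (a year at full rate, longer at reduced rate) would be applied by periodically aggregating old rows.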
This information is crucial for establishing the growth in usage of the service and predicting its time to death, and for identifying the usage trends of your systems. Armed with this, you can predict when the service will need to be upgraded or replaced, and track whether utilisation is increasing and whether this will affect when further expenditure is required. Equally, you may find slack capacity: if a system is underutilised you can investigate why it is not being used, or work to increase the utilisation of the asset.
How to make monitoring more effective?
If you have already invested in a monitoring system, this may be a good opportunity to look at what you are actually checking and ask yourself: why is it being checked? What information do we actually need to know?
Monitoring must be aligned to the service you are providing to the users. A simple example of this would be looking at a web service.
- If your monitoring only monitors the memory usage, CPU utilisation, network utilisation and disk space on the server then you will be alerted on events that may or may not affect the service and human intervention will be required to judge whether this is normal behaviour of the application or a problem.
- If you only monitor that the service/daemon is running or the port is available then you are only checking that the underlying application server is available and not the web application that the user interacts with. Therefore it is logical that pages within the web application should be monitored to ensure service is being delivered to the user.
- Most applications and services provide a wealth of diagnostic information in their log files. Make sure you include automatic checks by the monitoring system of these files for events that you know are likely to cause problems or generic terms for critical failures.
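The last two points can be sketched as service-level checks in Python. The URL, the marker text and the failure patterns below are assumptions for illustration; a real deployment would use the pages and log terms specific to its own application.

```python
import urllib.request

def check_page(url, expected_text, timeout=10):
    """Check the page the user actually sees, not just the port:
    True only if the page loads and contains the expected marker text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except OSError:
        return False

def scan_log(path, patterns=("ERROR", "FATAL", "OutOfMemory")):
    """Scan an application log for events known to indicate trouble,
    returning the matching lines for the monitoring system to alert on."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if any(p in line for p in patterns):
                hits.append(line.rstrip())
    return hits
```

A check like `check_page("https://intranet.example.com/home", "Welcome")` exercises the whole stack (server, application and content), which is exactly the distinction the bullet points above draw against a bare service or port check.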
Plan your storage space
When planning your performance monitoring, storage is critical and a prudent approach to selecting appropriate counters is advised. To illustrate, assume the following: each sample is 4KB in size; it is sampled every 5 minutes; samples are retained for 1 year.
This results in a single counter set of roughly 410MB per year. Most systems require a minimum of 10 sets, making the stored size around 4GB per system, and many services span multiple systems.
This does not sound like a large amount, but consider a basic resilient Microsoft Office SharePoint Server 2010 implementation:
- Dual web front end servers
- Indexing server
- Dual SQL Servers
- Dual Active Directory servers
- Dual virtual hosts
- Dual switches
This increases the stored set for 1 year to around 44GB. If we then add the counters you would want to trend for a SharePoint implementation, this roughly doubles to near 90GB.
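The sizing arithmetic above can be reproduced in a few lines, which also makes it easy to plug in your own sample size, interval and system count:

```python
# Storage estimate from the text: 4 KB samples every 5 minutes for a
# year, 10 counter sets per system, 11 systems in the example farm
# (2 web front ends + 1 index + 2 SQL + 2 AD + 2 virtual hosts + 2 switches).
SAMPLE_KB = 4
SAMPLES_PER_YEAR = (60 // 5) * 24 * 365       # 12 per hour -> 105,120 per year

per_set_mb = SAMPLE_KB * SAMPLES_PER_YEAR / 1024
per_system_gb = per_set_mb * 10 / 1024
farm_gb = per_system_gb * 11

print(round(per_set_mb))        # prints 411 (~410 MB per counter set)
print(round(per_system_gb, 1))  # prints 4.0 (GB per system)
print(round(farm_gb))           # prints 44 (GB for the farm per year)
```

Doubling the counter count for SharePoint-specific trends takes the farm figure to roughly 88GB, matching the "near 90GB" estimate above.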
But what should I monitor?
Every environment is different, and an understanding of its characteristics is paramount. Contact us for a starter set to get you going.
Be proactive in your IT provision and make IT really work for your business.