How Key Performance Indicators Can Help You Recognize And Optimize Your Data Center’s Operations Potential
With all of the data flowing in and out of your data center, it’s hard to know where to start with measuring performance. Developing a set of performance baselines and measuring your team’s output is a great place to start. But which KPIs (Key Performance Indicators) should you begin with? Knowing which KPIs to watch, what your baseline is, and what your organization’s required support SLAs/SLOs are will help you determine the best measurement threshold for each KPI. The concept of a data-driven data center might sound a bit redundant, but if you’re not assessing the performance of your data centers, you could be wasting resources.
Monitoring KPIs and ITSM (IT Service Management) metrics will make your data centers more efficient in this fast-paced and ever-evolving data management climate. Developing a set of standards to measure internal and external performance is an often-overlooked piece of this puzzle.
Here are some helpful KPIs to get you started with managing your site’s and team’s performance:
Given the importance of power, cooling, and space consumption in your data center, it is essential to track how every server stacks up in these categories. By following the output and consumption of each server, you can ensure that you are making the best use of your rack unit (RU) space and available power. Underutilized servers are often referred to as zombie servers. They can consume large amounts of power, cooling, and rack space, tying up real estate needed for expansion and growth. By monitoring the processor output and energy consumption of these servers over time, you will be able to trend resource utilization and determine whether the hardware is still needed or can be removed.
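As a rough illustration of trending processor output against energy consumption, the Python sketch below flags servers whose average CPU utilization stays low while they keep drawing power. The thresholds, the server names, and the shape of the readings are all assumptions made for this example, not values from any particular monitoring tool.

```python
from statistics import mean

# Hypothetical thresholds; tune them against your own baselines.
CPU_IDLE_THRESHOLD_PCT = 5.0   # average CPU utilization below this looks idle
MIN_POWER_WATTS = 100.0        # ...while the server still draws at least this much


def find_zombie_candidates(samples):
    """Return names of servers with low average CPU that still draw power.

    `samples` maps a server name to a list of (cpu_percent, watts) readings
    collected over the trending period.
    """
    candidates = []
    for server, server_readings in samples.items():
        if not server_readings:
            continue  # no data, so no conclusion either way
        avg_cpu = mean(cpu for cpu, _ in server_readings)
        avg_watts = mean(watts for _, watts in server_readings)
        if avg_cpu < CPU_IDLE_THRESHOLD_PCT and avg_watts >= MIN_POWER_WATTS:
            candidates.append(server)
    return sorted(candidates)


# Made-up readings for two hypothetical servers.
readings = {
    "app-01": [(42.0, 310.0), (55.0, 335.0), (38.0, 300.0)],
    "legacy-07": [(1.0, 180.0), (2.0, 185.0), (1.5, 182.0)],
}
zombies = find_zombie_candidates(readings)
print(zombies)
```

A server on this list is only a candidate for review. Confirm with the application owner before removing any hardware.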
ITSM TTC – Total Ticket Count
One easy way to measure the output, performance, and workload of your data center operations team is to review the service ticket queue details. Understanding the total ticket count will help you better understand how busy your team is. Drilling down into that data can also reveal what times of day your workload increases and decreases, how many tickets each technician handles, and whether additional or fewer resources are needed to meet the demands of your data center.
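The drill-down described here can be sketched in a few lines of Python. The ticket records and their field names ("opened", "tech") are hypothetical; a real ITSM export will have its own schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ticket export; field names are assumptions for this sketch.
tickets = [
    {"opened": "2024-03-04T08:15:00", "tech": "alice"},
    {"opened": "2024-03-04T08:40:00", "tech": "bob"},
    {"opened": "2024-03-04T14:05:00", "tech": "alice"},
    {"opened": "2024-03-04T14:30:00", "tech": "alice"},
    {"opened": "2024-03-04T22:10:00", "tech": "bob"},
]


def tickets_by_hour(records):
    """Count tickets by the hour of day (0-23) in which they were opened."""
    return Counter(datetime.fromisoformat(r["opened"]).hour for r in records)


def tickets_per_tech(records):
    """Count how many tickets each technician handled."""
    return Counter(r["tech"] for r in records)


by_hour = tickets_by_hour(tickets)
per_tech = tickets_per_tech(tickets)
print("Total ticket count:", len(tickets))
print("By hour:", dict(by_hour))
print("Per tech:", dict(per_tech))
```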
ITSM TTR – Time to Respond
Response time goes hand in hand with your internal and external SLA/SLO metrics. Your ITSM ticket queue is how work flows from the application owner to the data center technicians. Establishing an SLA (Service Level Agreement) response metric will ensure that each ticket is handled based on priority and not necessarily in the order it enters the queue, freeing your team to work on the most critical tasks immediately. Monitoring the response time of the data center operations team will help ensure that your internal and external customers’ work is completed in the required timeframe. By continually measuring and trending this KPI, you will be able to fine-tune your SLA/SLO times to better serve your customers.
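One possible way to trend this KPI is to compute, for each priority, the share of tickets that received a first response within the target. In the sketch below the priority labels, the response targets, and the ticket fields are all illustrative assumptions; substitute your own SLA/SLO values.

```python
from datetime import datetime, timedelta

# Hypothetical response targets per priority; replace with your own SLAs/SLOs.
SLA_RESPONSE = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
}


def response_compliance(records):
    """Return, per priority, the fraction of tickets answered within target."""
    met = {}
    total = {}
    for r in records:
        priority = r["priority"]
        opened = datetime.fromisoformat(r["opened"])
        responded = datetime.fromisoformat(r["first_response"])
        total[priority] = total.get(priority, 0) + 1
        if responded - opened <= SLA_RESPONSE[priority]:
            met[priority] = met.get(priority, 0) + 1
    return {p: met.get(p, 0) / total[p] for p in total}


# Made-up tickets: one P1 answered in 10 minutes, one in 30, one P3 in 2 hours.
sample = [
    {"priority": "P1", "opened": "2024-03-04T09:00:00", "first_response": "2024-03-04T09:10:00"},
    {"priority": "P1", "opened": "2024-03-04T11:00:00", "first_response": "2024-03-04T11:30:00"},
    {"priority": "P3", "opened": "2024-03-04T13:00:00", "first_response": "2024-03-04T15:00:00"},
]
compliance = response_compliance(sample)
print(compliance)
```

Tracking these fractions month over month shows whether a target is realistic or needs to be adjusted.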
ITSM TTC – Time to Complete
This metric is very subjective because similar tasks do not always take the same amount of time to resolve. A troubleshooting issue may be fixed on the first try or take several attempts. Understanding how long a task typically takes to complete helps ensure you allocate enough time in your SLAs. Only you can decide how to interpret this data, and it may take a few tries to get it right. Monitor these metrics closely and discuss them with your team to help you establish the proper benchmarks.
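Because completion times for the same kind of task can vary widely, it may help to look at both the typical case and the worst case for each task type before setting a benchmark. The following Python sketch does that with the median and the maximum; the task names and durations are invented for illustration.

```python
from statistics import median

# Hypothetical completion times in minutes, grouped by task type.
completion_minutes = {
    "disk swap": [20, 25, 30, 90],
    "cable run": [45, 50, 60],
}


def summarize(times_by_task):
    """Return the median and worst-case completion time for each task type."""
    return {
        task: {"median": median(times), "max": max(times)}
        for task, times in times_by_task.items()
        if times  # skip task types with no completed tickets
    }


summary = summarize(completion_minutes)
for task, stats in summary.items():
    print(f"{task}: median {stats['median']} min, worst case {stats['max']} min")
```

A large gap between the median and the worst case, as with the made-up "disk swap" figures, is a prompt to discuss the outliers with your team before committing to an SLA time.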
Capacity Planning (White Space and Power Utilization):
White Space Utilization, PUE (Power Usage Effectiveness), and Power Consumption vs. Availability are all good KPIs to use when tracking Capacity Planning. Understanding what you have now and how much it grows each month will help you forecast when new space or power is needed. Space and energy can be costly additions to the data center. Developing a methodology to track utilization efficiencies will help provide data for when it is time to expand. PUE (Total Facility Energy / Total IT Equipment Energy) helps determine how efficiently your data center is running. The industry average for PUE is 1.8 – 2.0. It is important not to worry about how your PUE compares to other data centers because of the wide variance in the data used to determine each site’s PUE. Your PUE will be affected by many different factors, including the age of the building and the infrastructure inside. By tracking your PUE each month, you will be able to measure how effective your efforts to increase efficiency and decrease energy usage are.
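The PUE formula above and a simple capacity forecast can be sketched in Python as follows. The forecast assumes straight-line monthly growth, which is a simplification, and all of the figures are made up for illustration.

```python
import math


def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def months_until_full(current, capacity, monthly_growth):
    """Estimate months until capacity is reached, assuming linear growth.

    Works for any unit (rack units, kW, square feet) as long as all three
    arguments use the same one. Returns None if usage is not growing.
    """
    if monthly_growth <= 0:
        return None
    remaining = capacity - current
    if remaining <= 0:
        return 0
    return math.ceil(remaining / monthly_growth)


# Made-up monthly figures for a hypothetical site.
monthly_pue = pue(total_facility_kwh=180_000, it_equipment_kwh=100_000)
months_left = months_until_full(current=320, capacity=400, monthly_growth=10)
print(f"PUE this month: {monthly_pue:.2f}")
print(f"Months until power capacity is reached: {months_left}")
```

Recording the monthly PUE value alongside the forecast makes it easier to see whether efficiency efforts are moving the number over time.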