Platform Monitoring — First concepts.

What is monitoring?

Monitoring provides detailed visibility into performance, availability, user experience, and resource utilization, helping us deliver consistent platform performance with a low MTTR (Mean Time To Resolution).

Why is monitoring critical?

When a platform experiences performance issues or is unavailable, the company that owns the platform risks losing customers. Monitoring tools provide real-time performance and availability insights that allow teams to react quickly when an issue arises.

Why should a platform use a monitoring tool?

A monitoring tool can give an organization several advantages, such as:

Reduce MTTR

A monitoring tool helps engineers understand what a normal day looks like by using performance metrics to establish baselines, set proper alerts based on those metrics, and identify the root cause of performance or availability issues.
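As a rough illustration of the baseline idea, the sketch below derives a baseline from historical response times and flags a new measurement that falls far outside it. The function names, the sample data, and the "mean plus three standard deviations" rule are assumptions chosen for illustration, not something a particular monitoring tool prescribes.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as a mean and a standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def should_alert(value, baseline, tolerance=3.0):
    """Return True when value exceeds the mean by more than `tolerance` standard deviations."""
    threshold = baseline["mean"] + tolerance * baseline["stdev"]
    return value > threshold


# Hypothetical response times (in ms) collected on ordinary days.
history_ms = [120, 118, 125, 130, 122, 119, 127]
baseline = build_baseline(history_ms)

print(should_alert(124, baseline))  # False: close to the usual values
print(should_alert(480, baseline))  # True: far above the baseline
```

In practice the tolerance would be tuned per metric, since a threshold that is too tight produces noisy alerts and one that is too loose hides real incidents.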

Increase revenue

A monitoring tool helps engineering teams quickly identify a critical performance or availability issue that frustrates the platform's customers. Frustrated customers start looking for other platforms to meet their needs, which directly impacts the organization's revenue.

Reduce costs

A monitoring tool helps an organization understand what a normal day looks like by showing how much memory and CPU an application actually needs, which reveals whether the resources of a virtual machine are over-provisioned.

Monitoring strategies

There are two types of monitoring strategies: open-box monitoring, formerly known as white-box monitoring, and closed-box monitoring, formerly known as black-box monitoring.

Open-Box Monitoring

Open-box monitoring gives rich insight into the internals of a service or application. To better understand that, consider a simple Python service as an example.
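A minimal sketch of what such a service might look like follows. The class name, the metric names, and the validation rule are hypothetical, and the metrics are kept in plain in-process counters for brevity; a real service would more likely use a metrics library and expose the values on an endpoint.

```python
import time
from collections import Counter


class InstrumentedService:
    """Hypothetical service that records metrics about its own internals."""

    def __init__(self):
        self.counters = Counter()
        self.latencies = []  # seconds spent handling each request

    def handle(self, payload):
        start = time.perf_counter()
        self.counters["requests_total"] += 1
        try:
            if "name" not in payload:
                raise ValueError("missing 'name'")
            return f"hello, {payload['name']}"
        except ValueError:
            # Internal failure detail that an outside observer would not see.
            self.counters["validation_errors_total"] += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

    def metrics(self):
        """Snapshot of internal state, as a metrics endpoint might expose it."""
        return {**self.counters, "latency_samples": len(self.latencies)}


service = InstrumentedService()
service.handle({"name": "ada"})
try:
    service.handle({})  # triggers the validation error path
except ValueError:
    pass

print(service.metrics())
```

The point of the sketch is that the service itself reports how many requests it handled, why some of them failed, and how long each took, which is information only available from the inside.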

Closed-Box Monitoring

Closed-box monitoring, formerly known as black-box monitoring, is a strategy for monitoring services from the outside using artificial probes. This means that we don't rely on any internal service insights beyond the probe result.

What are artificial probes?

An artificial probe can be an HTTP request, a TCP socket connection, or a script that executes a specific task against the service or application.
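The sketch below shows what a TCP probe and an HTTP probe could look like using only the Python standard library. The function names, the timeout values, the success criteria, and the localhost address in the usage section are assumptions made for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, 4xx/5xx responses, and bad URLs.
        return False


if __name__ == "__main__":
    # Hypothetical target: a service assumed to listen on localhost:8080.
    print("tcp:", tcp_probe("localhost", 8080))
    print("http:", http_probe("http://localhost:8080/health"))
```

Both probes reduce the outcome to a single boolean, which mirrors the closed-box idea: the only signal available is whether the probe succeeded.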

Why is closed-box monitoring necessary?

Closed-box monitoring is fundamental to implementing synthetic monitoring, which periodically simulates user behavior against the platform. Many companies already use this strategy.
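One way to picture synthetic monitoring is as a scripted user journey that runs on a schedule. The sketch below is only an illustration: the step names are hypothetical, the steps are stubbed with lambdas instead of real requests, and the scheduling is a plain loop rather than a real scheduler.

```python
import time


def run_journey(steps):
    """Run (name, callable) steps in order, stopping at the first failure.

    A step fails when it returns a falsy value or raises an exception.
    """
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            ok = bool(action())
        except Exception:
            ok = False
        results.append({"step": name, "ok": ok,
                        "seconds": time.perf_counter() - start})
        if not ok:
            break  # later steps depend on earlier ones
    return results


def run_periodically(steps, interval_seconds, iterations, sleep=time.sleep):
    """Run the journey `iterations` times, pausing between runs."""
    history = []
    for i in range(iterations):
        history.append(run_journey(steps))
        if i < iterations - 1:
            sleep(interval_seconds)
    return history


# Hypothetical journey; real steps would issue requests against the platform.
journey = [
    ("open_home_page", lambda: True),
    ("log_in", lambda: True),
    ("check_out", lambda: False),  # simulated failure
]

for result in run_journey(journey):
    print(result["step"], "ok" if result["ok"] else "FAILED")
```

Recording which step failed, and not only that the journey failed, narrows down where users are being blocked even though the check runs entirely from the outside.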

Conclusion

As we can see, monitoring is crucial for any organization, no matter its size. Using open-box and closed-box monitoring strategies, an organization can respond quickly to incidents, scale its business, and increase customer happiness. More sophisticated monitoring implementations can help the organization predict incidents and proactively address performance and reliability issues.

Nicolas Takashi

I love to speak, teach, and write about distributed systems, cloud computing, architecture, systems engineering, and APIs.