As container platforms continue to gain adoption in enterprise software environments, monitoring methodologies that worked well for traditional (virtual and physical machine) deployments have become ineffective. Prometheus is an open source systems monitoring toolkit, originally developed at SoundCloud, that works very well in container environments. This article is intended to provide quick instructions to get you started with Prometheus in a Kubernetes or OpenShift environment.
Why things need to change
For some context, before we get started with Prometheus, let's briefly define some core differences between system and service monitoring in a virtual machine environment and monitoring in a container environment such as Kubernetes. Suppose, for example, we have a critical Java service that receives purchase order requests and persists them in a database. Traditionally, the service might be deployed with three JVM instances on three hosts for high availability. The service might expose metrics through JMX that are available via some connector, such as the RMI connector. There are plenty of tools available with capabilities to query JMX over a remote protocol, e.g. JConsole, or even tools that can compare JMX statistics against SLAs and enable reports and alerting; JBoss Operations Network (JON) comes to mind. Now, if we were to migrate this service to a container environment like Kubernetes, these types of tools would become much less effective for several reasons. Here are a few of them:
We no longer have static metrics endpoints; container instances could get replaced by upgraded instances, and spin up on a different node in the cluster altogether. Remember, it's not very effective to use a Kubernetes service as a monitoring endpoint, because we want more granular statistics about each underlying container that composes a service.
We might not even have a static number of container instances in a service. We might be scaling up and down the number of container instances in a service manually through the Kubernetes API, or we may even have auto-scaling implemented based on some utilization metric.
We would much rather use a language-agnostic protocol for exposing metrics. JMX over RMI is really only a Java thing. With exciting new languages taking the stage, along with the idea that microservice architectures could (or at least should be able to) include components that are implemented in a boundless variety of ways, we need an instrumentation strategy that can be implemented by any service, not just Java services, and we need monitoring that understands an agnostic, standard metrics API.
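To make the "agnostic, standard metrics API" idea concrete: Prometheus scrapes plain text over HTTP, so anything that can serve a text response can be instrumented. Here's a minimal sketch in shell. The metric name orders_received_total and its labels are made up for the purchase order example, and sample_metrics and sum_metric are hypothetical helpers of mine. The parsing assumes samples carry no trailing timestamp.

```shell
# Hypothetical example of what a service might return from its metrics endpoint.
sample_metrics() {
  cat <<'EOF'
# HELP orders_received_total Purchase orders received (made-up metric).
# TYPE orders_received_total counter
orders_received_total{status="accepted"} 1027
orders_received_total{status="rejected"} 3
EOF
}

# Sum every sample of one metric name read from stdin.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP/TYPE comment lines
    NF >= 2 {
      n = $1
      sub(/[{].*/, "", n)         # drop the {label="value"} part
      if (n == name) total += $NF # last field is the sample value
    }
    END { print total + 0 }
  '
}
```

Running `sample_metrics | sum_metric orders_received_total` adds the two samples together. Nothing here is specific to Java; a service written in any language only has to print lines in this shape.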
So most of what we really need is a way to keep track of all these containers popping up, dying, scaling... it's really a moving target, right? So what knows about all of that? Kubernetes does!
That's the idea that makes Prometheus work so nicely in a container environment. The metrics-gathering algorithm begins by retrieving an up-to-date inventory of container instances in the cluster from the Kubernetes master API. And Kubernetes doesn't only know about container inventory, but also about core infrastructure components, nodes, etc. Almost anything in the environment can be, and in most cases has been, instrumented to expose metrics. Even legacy applications, like the Java service example described earlier, can easily plug into this strategy, as Jolokia provides a bridge between the existing JMX-exposed metrics and the Kubernetes master API. Read more about Jolokia JVM Monitoring in OpenShift if you're curious.
Long story short, Prometheus (with a lot of help from Kubernetes) can discover almost anything that is instrumented with metrics throughout a container cluster.
Start up the Container Development Kit
These instructions are targeted at the Red Hat Container Development Kit. If you don't know about the CDK, it's pretty cool. Essentially, you can spin up an entire OpenShift (Kubernetes-based) environment on your local machine, using virtualization software like VirtualBox. I'm going to start with some basic instructions for setting up the CDK, but note that any Kubernetes-based environment will work for the Prometheus setup.
Here's how I got the CDK up and running. (Skip this if you already have a cluster)
Start a Prometheus container
The first thing we need is a service account that has privileges to query the Kubernetes master API.
$ oc create serviceaccount metrics -n sample-project
$ oc adm policy add-cluster-role-to-user cluster-reader system:serviceaccount:sample-project:metrics
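The last argument of the policy command is the service account's full user name, which follows the pattern system:serviceaccount:<project>:<name>. If you pick a different project or account name, a tiny helper like this keeps the two commands in sync. Note that sa_user is a hypothetical name of mine, not an oc feature.

```shell
# Build the full user name for a service account.
sa_user() {
  # $1 = project (namespace), $2 = service account name
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}
```

You could then write the grant as `oc adm policy add-cluster-role-to-user cluster-reader "$(sa_user sample-project metrics)"`.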
Now that we have a service account, we can spin up the Prometheus service.
The Fabric8 community has a really nice collection of templates for common open source applications. I used the Prometheus template from that collection.
You'll notice the template references a service account called metrics. That's why we named the service account in the first step metrics. You can choose a different name; just change it in the template.
Note that this template uses a Docker image that the Fabric8 community contributed. The image has configuration embedded that helps Prometheus discover metrics exposed by Kubernetes. This is not the default Prometheus config; it is specifically targeted at a Kubernetes environment.
$ oc create -f https://raw.githubusercontent.com/fabric8io/templates/master/default/template/prometheus.json -n sample-project
After importing the template, you can browse to the OpenShift web console, select the sample project, and hit the "add to project" button. Prometheus should be an option. Select it and hit "create".
The service and replication controller should be created instantly, but it will take a sec for the pod to spin up. Once the pod is up, expose the service with a route:
$ oc expose service prometheus --port=9090 -n sample-project
You can obtain the URL to your Prometheus web UI by viewing details of the newly created route. For example:
$ oc get routes
The URL will look something like prometheus-sample-project.rhel-cdk.10.1.2.2.xip.io if you are using the CDK.
So what is this UI for? Two things:
- The Status tab shows you the current configuration and a really high-level health report.
- The Graph tab allows you to construct and test metric queries.
For example, type pod:memory_usage_bytes in the input box, and hit "execute". Toggle between "Graph" and "Console".
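The same kind of query can be run without the browser. Prometheus serves an HTTP API alongside the UI, and instant queries go to /api/v1/query. Here's a rough sketch, assuming the Prometheus version in the Fabric8 image includes the v1 HTTP API and that curl is installed. The helper names prom_api_url and prom_query are hypothetical.

```shell
# Build the instant-query endpoint from a base URL.
prom_api_url() {
  # ${1%/} strips one trailing slash so we never emit a double slash
  printf '%s/api/v1/query\n' "${1%/}"
}

# Run an instant query and print the raw JSON response.
prom_query() {
  # $1 = base URL of the Prometheus route, $2 = query expression
  # -G sends the data as URL query parameters; --data-urlencode escapes the expression
  curl -sG "$(prom_api_url "$1")" --data-urlencode "query=$2"
}
```

With the route from the previous step, that would look something like `prom_query "http://$(oc get route prometheus -n sample-project -o jsonpath='{.spec.host}')" 'pod:memory_usage_bytes'`. The response is JSON, so it's easy to feed into scripts.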
There's a whole lot more to Prometheus. This article was meant to help you get started. Here's a great video that includes a more comprehensive overview of metric queries. Also, don't forget that the Prometheus docs site has tons of information, including how you could develop metrics visualizations with Grafana, how you can instrument your services to expose custom metrics, and how you might go about an alerting strategy.