
Understanding Istio Telemetry v2

It’s been a while since I’ve blogged, and just like other posts in the past, this one is meant as a way to dig into something and for me to catalog my own thoughts for later. While digging into some issues for some of our Istio customers as well as for a chapter in my upcoming book, Istio in Action, I found myself knee-deep in the Istio telemetry v2 functionality. Let’s see how it works.

We will use source code from https://github.com/christian-posta/istio-telemetry-v2 for this blog.

Istio telemetry v2

Istio telemetry v2 is a combination of data-plane extensions (i.e., Envoy extensions) and a programmable API that allows operators to tune, customize, and even create “service-level” metrics within the proxy. This “v2” replaces a previous implementation based on an out-of-band integration engine called Mixer.

There are three main concepts in the telemetry v2 functionality that you should understand to fully wrap your head around what it’s doing and how to customize it:

  • Metrics
  • Dimensions
  • Attributes

A metric is a counter, gauge, or histogram/distribution of telemetry signals between service calls (inbound/outbound). For example, some of the Istio standard metrics are:

  • istio_requests_total, a COUNTER measuring the total number of requests
  • istio_request_duration_milliseconds, a DISTRIBUTION measuring request latency
  • istio_request_bytes, a DISTRIBUTION measuring HTTP request body sizes
  • istio_response_bytes, a DISTRIBUTION measuring HTTP response body sizes

For the istio_requests_total metric, we count the total number of requests that have come through. The interesting bit is that a metric can have various dimensions, which are additional properties that give more depth and insight into a particular metric.

From the docs you can see that, for example, the istio_requests_total metric has some out-of-the-box dimensions. Here’s an example of those dimensions:

istio_requests_total{
    response_code="200",
    reporter="destination",
    source_workload="web-api",
    source_workload_namespace="istioinaction",
    source_principal="spiffe://cluster.local/ns/istioinaction/sa/default",
    source_app="web-api",
    source_version="unknown",
    source_cluster="Kubernetes",
    destination_workload="recommendation",
    destination_workload_namespace="istioinaction",
    destination_principal="spiffe://cluster.local/ns/istioinaction/sa/default",
    destination_app="recommendation",
    destination_version="unknown",
    destination_service="recommendation.istioinaction.svc.cluster.local",
    destination_service_name="recommendation",
    destination_service_namespace="istioinaction",
    destination_cluster="Kubernetes",
    request_protocol="http",
    response_flags="-",
    grpc_response_status="",
    connection_security_policy="mutual_tls",
    source_canonical_service="web-api",
    destination_canonical_service="recommendation",
    source_canonical_revision="latest",
    destination_canonical_revision="latest"
  } 5

This means we’ve seen 5 requests from the web-api app to the recommendation app that have a response_code of HTTP 200. If any of these dimensions differ, we’ll see a new entry for this metric. For example, if there are any HTTP 500 response codes, we’d see them on a different line (some dimensions left out for brevity):

istio_requests_total{
    response_code="200",
    reporter="destination",
    source_workload="web-api",
    source_workload_namespace="istioinaction",
    destination_workload="recommendation",
    destination_workload_namespace="istioinaction",
    request_protocol="http",
    connection_security_policy="mutual_tls",
  } 5
 
istio_requests_total{
    response_code="500",
    reporter="destination",
    source_workload="web-api",
    source_workload_namespace="istioinaction",
    destination_workload="recommendation",
    destination_workload_namespace="istioinaction",
    request_protocol="http",
    connection_security_policy="mutual_tls",
  } 3
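
Since every unique combination of dimension values shows up as its own series, it’s easy to pull out just the entries you care about by grepping the proxy’s Prometheus endpoint. For example (same command style we’ll use in the next section, once the sample apps are running), to see only the HTTP 500 entries:

kubectl exec -it -n istioinaction deploy/recommendation -c istio-proxy -- curl -s localhost:15000/stats/prometheus | grep istio_requests_total | grep 'response_code="500"'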

The last important bit of detail is where these dimensions come from. To answer this, we need to understand attributes and CEL expressions. In its simplest form, a dimension gets its values at runtime from attributes, which come either from Envoy’s underlying attributes or from Istio’s peer-metadata filter.

For example, let’s see the Request Attributes that come from Envoy:

Attribute           Description
request.path        The path portion of the URL
request.url_path    The path portion of the URL without the query string
request.host        The host portion of the URL
request.scheme      The scheme portion of the URL, e.g. “http”
request.method      Request method, e.g. “GET”
request.headers     All request headers indexed by the lower-cased header name
request.referer     Referer request header
request.useragent   User agent request header
request.time        Time of the first byte received
request.id          Request ID corresponding to the x-request-id header value
request.protocol    Request protocol (“HTTP/1.0”, “HTTP/1.1”, “HTTP/2”, or “HTTP/3”)

For example, to map an attribute to a dimension, we can configure the metric like this (we’ll see how in the next section):

request_url_path = request.url_path
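
Concretely, in the stats-plugin JSON we’ll look at in the next section, that mapping lives in a dimensions map. A minimal sketch (the request_url_path dimension name is just for illustration):

"dimensions": {
  "request_url_path": "request.url_path"
}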

As stated earlier, there is a wealth of attributes available out of the box from Envoy as well as from Istio’s peer-metadata plugin; please check the respective docs. We can even create our own derivative attributes to use for a particular dimension, which we’ll touch on at the end of this post.

Metrics in Action

Let’s see how this all works with an example. You can follow along with the source at https://github.com/christian-posta/istio-telemetry-v2.

First, set up and deploy the sample applications. The sample applications show a call graph between three different services:

web-api -> recommendation -> purchase-history

We have configured the purchase-history service to return errors (HTTP 500) on 50% of the calls and, for the rest, to respond with a P50 latency of 750ms (variance of 100ms). We can easily make some sample calls with the following command (you should try running this a few times):

$  kubectl -n default exec -it deploy/sleep -- curl -H "Host: istioinaction.io" http://istio-ingressgateway.istio-system/

Now, let’s evaluate the metrics from the recommendation service to see what has been captured and for what dimensions:

kubectl exec -it -n istioinaction deploy/recommendation -c istio-proxy -- curl localhost:15000/stats/prometheus | grep istio_requests_total

We should see something like:

istio_requests_total{
  response_code="200",reporter="destination",source_workload="web-api",source_workload_namespace="istioinaction",destination_workload="recommendation",destination_workload_namespace="istioinaction",request_protocol="http",connection_security_policy="mutual_tls"
} 5
istio_requests_total{
  response_code="500",reporter="destination",source_workload="web-api",source_workload_namespace="istioinaction",destination_workload="recommendation",destination_workload_namespace="istioinaction",request_protocol="http",connection_security_policy="mutual_tls"
} 3
istio_requests_total{
  response_code="200",reporter="source",source_workload="recommendation",source_workload_namespace="istioinaction",destination_workload="purchase-history",destination_workload_namespace="istioinaction",request_protocol="http",connection_security_policy="mutual_tls"
} 5
istio_requests_total{
  response_code="500",reporter="source",source_workload="recommendation",source_workload_namespace="istioinaction",destination_workload="purchase-history",destination_workload_namespace="istioinaction",request_protocol="http",connection_security_policy="mutual_tls"
} 3

We can see four different entries for istio_requests_total across a few different dimensions (some dimensions removed for brevity). We see differences in the reporter, response_code, source_workload, and destination_workload dimensions.

We can see a latency distribution for the requests with the istio_request_duration_milliseconds metric:

kubectl exec -it -n istioinaction deploy/recommendation -c istio-proxy -- curl localhost:15000/stats/prometheus | grep istio_request_duration_milliseconds
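
Since this metric is a DISTRIBUTION, it surfaces in the Prometheus output as a histogram: a set of _bucket series (one per le latency bound) plus _sum and _count series, all carrying the same dimensions. The output should look something like this (labels and values elided):

istio_request_duration_milliseconds_bucket{...,le="500"} ...
istio_request_duration_milliseconds_bucket{...,le="1000"} ...
istio_request_duration_milliseconds_sum{...} ...
istio_request_duration_milliseconds_count{...} ...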

Customizing metrics

We can also customize which dimensions are included in a particular metric. In fact, there is already an out-of-the-box example of how these metrics get configured: when we install Istio, a few EnvoyFilters are installed that configure the metrics.

$  kubectl get EnvoyFilter -A
 
NAMESPACE      NAME                        AGE
istio-system   metadata-exchange-1.8       51m
istio-system   metadata-exchange-1.9       51m
istio-system   stats-filter-1.8            51m
istio-system   stats-filter-1.9            51m
istio-system   tcp-metadata-exchange-1.8   51m
istio-system   tcp-metadata-exchange-1.9   51m
istio-system   tcp-stats-filter-1.8        51m
istio-system   tcp-stats-filter-1.9        51m

The ones we’re interested in are the stats-filter-* EnvoyFilters. If we take a look at stats-filter-1.9, we see an EnvoyFilter definition; the salient part is here:

$   kubectl get EnvoyFilter -n istio-system stats-filter-1.9 -o yaml
- applyTo: HTTP_FILTER
  match:
    context: SIDECAR_OUTBOUND
    listener:
      filterChain:
        filter:
          name: envoy.filters.network.http_connection_manager
          subFilter:
            name: envoy.filters.http.router
    proxy:
      proxyVersion: ^1\.9.*
  patch:
    operation: INSERT_BEFORE
    value:
      name: istio.stats
      typed_config:
        '@type': type.googleapis.com/udpa.type.v1.TypedStruct
        type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
        value:
          config:
            configuration:
              '@type': type.googleapis.com/google.protobuf.StringValue
              value: |
                {
                  "debug": "false",
                  "stat_prefix": "istio",
                  "metrics": [
                    {
                      "dimensions": {
                        "source_cluster": "node.metadata['CLUSTER_ID']",
                        "destination_cluster": "upstream_peer.cluster_id"
                      }
                    }
                  ]
                }
            root_id: stats_outbound
            vm_config:
              code:
                local:
                  inline_string: envoy.wasm.stats
              runtime: envoy.wasm.runtime.null
              vm_id: stats_outbound

This EnvoyFilter is used to ADD a new configuration to Envoy’s HTTP Connection Manager and the chain of filters used to process an HTTP request. Note there are multiple configuration sections in this EnvoyFilter because we configure both the INBOUND and the OUTBOUND paths. Specifically, this filter is added toward the end of the chain, BEFORE the router filter (this is important… the router should be the last filter in the chain). The important config bits are the following:

{
  "debug": "false",
  "stat_prefix": "istio",
  "metrics": [
    {
      "dimensions": {
        "source_cluster": "node.metadata['CLUSTER_ID']",
        "destination_cluster": "upstream_peer.cluster_id"
      }
    }
  ]
}

This configuration stanza corresponds to the stats configuration described in the Istio docs and, if you’re really interested, to the proto definition in the stats extension source.

Specifically, what gets configured here are the source_cluster and destination_cluster dimensions for ALL of the standard Istio metrics (it applies to ALL of them because we don’t explicitly name a metric here).
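
For completeness: the inbound counterpart in the same EnvoyFilter looks almost identical. Here’s a trimmed sketch (see the full stats-filter-1.9 output for the authoritative version); the differences are mainly the match context and the root_id/vm_id:

- applyTo: HTTP_FILTER
  match:
    context: SIDECAR_INBOUND
    listener:
      filterChain:
        filter:
          name: envoy.filters.network.http_connection_manager
          subFilter:
            name: envoy.filters.http.router
    proxy:
      proxyVersion: ^1\.9.*
  patch:
    operation: INSERT_BEFORE
    value:
      name: istio.stats
      # ... same typed_config shape as above, but with
      # root_id: stats_inbound and vm_id: stats_inbound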

Let’s change the metrics a bit. We can edit the stats-filter directly, or we can create a different EnvoyFilter that augments the stats-filter with our new configuration. From the source code for this blog, see the customize-metric.yaml file for the full contents:

{
  "debug": "false",
  "stat_prefix": "istio",
  "metrics": [
    {
      "name": "requests_total",
      "dimensions": {
        "posta": "upstream_peer.istio_version",
        "posta_two": "node.metadata['MESH_ID']"
      },
      "tags_to_remove": [
        "request_protocol"
      ]
    },
    {
      "dimensions": {
        "source_cluster": "node.metadata['CLUSTER_ID']",
        "destination_cluster": "upstream_peer.cluster_id"
      }
    }
  ]
}

In this configuration we’ve added two new dimensions, called posta and posta_two, populated them with attributes from the previous section, and removed the request_protocol dimension via tags_to_remove.
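
For orientation, this JSON gets embedded in an EnvoyFilter just like the stats-filter we inspected above. A minimal sketch of the shape, assuming we patch the istio.stats filter in place (the metadata name here is hypothetical; customize-metric.yaml in the repo is the authoritative version):

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: custom-metrics   # hypothetical name
  namespace: istio-system
spec:
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      # ... same listener/proxy match as stats-filter-1.9 ...
    patch:
      operation: REPLACE   # assumption: swap in the istio.stats config shown above
      value:
        name: istio.stats
        # ... typed_config wrapping the JSON above ...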

$  kubectl apply -f customize-metric.yaml

If we reviewed our metric at this point, we’d see some discrepancies. The posta and posta_two dimensions are not known to our proxy, so before we can use them we need to expose them; these new dimensions are not in the default tag list that Istio knows about. Let’s add the following annotation to our recommendation Deployment pod spec:

template:
  metadata:
    labels:
      app: recommendation
    annotations:
      sidecar.istio.io/extraStatTags: posta,posta_two

This exposes the metric dimensions correctly.

$  kubectl apply -f recommendation-tags.yaml -n istioinaction
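
As an aside, if you want these tags exposed for every workload instead of annotating each Deployment, newer Istio releases (1.8+, to my knowledge) also let you set them mesh-wide via meshConfig in your IstioOperator configuration:

meshConfig:
  defaultConfig:
    extraStatTags:
    - posta
    - posta_two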

Now let’s place a few calls:

$  kubectl -n default exec -it deploy/sleep -- curl -H "Host: istioinaction.io" http://istio-ingressgateway.istio-system/

Now if we review our istio_requests_total metric we should see the new dimensions:

kubectl exec -it -n istioinaction deploy/recommendation -c istio-proxy -- curl localhost:15000/stats/prometheus | grep istio_requests_total
istio_requests_total{
    response_code="200",
    reporter="destination",
    source_workload="web-api",
    source_workload_namespace="istioinaction",
    destination_workload="recommendation",
    destination_workload_namespace="istioinaction",
    request_protocol="http",
    connection_security_policy="mutual_tls",
    posta="1.9.3",
    posta_two="cluster.local"
  } 5
 
istio_requests_total{
    response_code="500",
    reporter="destination",
    source_workload="web-api",
    source_workload_namespace="istioinaction",
    destination_workload="recommendation",
    destination_workload_namespace="istioinaction",
    request_protocol="http",
    connection_security_policy="mutual_tls",
    posta="1.9.3",
    posta_two="cluster.local"   
  } 3

Creating new metrics

The last thing we’ll look at in this blog is creating a new metric. To do that, we need to specify a metric definition in the configuration for the stats plugin. Something like this would work to create a new metric called posta_metric:

{
  "debug": "false",
  "stat_prefix": "istio",
  "definitions": [
    {
      "name": "posta_metric",
      "type": "COUNTER",
      "value": "1"
    }
  ]
}

This is a very simple metric of type COUNTER which just counts requests as they come in (just like istio_requests_total). The value field, however, is actually a string in which you can place a CEL expression that evaluates attributes; just note that the expression should evaluate to an integer.
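
To make that value expression a little less trivial, we could count something other than 1 per request. For example (purely illustrative; not part of the repo’s create-new-metric.yaml), Envoy’s request.total_size attribute evaluates to an integer, so we could sum request sizes into a counter; the posta_request_bytes_total name is my own invention:

{
  "definitions": [
    {
      "name": "posta_request_bytes_total",
      "type": "COUNTER",
      "value": "request.total_size"
    }
  ]
}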

Let’s apply the create-new-metric.yaml from our source code repo:

$  kubectl apply -f create-new-metric.yaml

Just like we exposed extra dimensions on the recommendation deployment in the previous step, we will need to expose this new metric with the statsInclusionPrefixes annotation:

template:
  metadata:
    labels:
      app: recommendation
    annotations:
      sidecar.istio.io/extraStatTags: posta,posta_two
      sidecar.istio.io/statsInclusionPrefixes: istio_posta_metric

Note that even though we called the metric posta_metric, it gets an istio_ prefix anyway (it comes from the stat_prefix “istio” in the plugin configuration).

$  kubectl apply -f recommendation-new-metric.yaml -n istioinaction

Now let’s send some more traffic:

$  kubectl -n default exec -it deploy/sleep -- curl -H "Host: istioinaction.io" http://istio-ingressgateway.istio-system/

Now if we check the proxy’s stats, we should see our new metric:

$  kubectl exec -it -n istioinaction deploy/recommendation -c istio-proxy -- curl localhost:15000/stats/prometheus | grep posta_metric
 
# TYPE istio_posta_metric counter
istio_posta_metric{} 2

Note there are no dimensions for this metric! Just like we customized the dimensions for metrics in the previous section, we could do something like this:

{
  "debug": "false",
  "stat_prefix": "istio",
  "metrics": [
    {
      "name": "posta_metric",
      "dimensions": {
        "posta": "upstream_peer.istio_version",
        "posta_two": "node.metadata['MESH_ID']"
      }
    }
  ]
}

Note, when we name the metric explicitly here, we DON’T need to use the istio_ prefix; the stats plugin resolves the name by default.
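
Putting the two pieces together, a single stats configuration can both define the new metric and attach dimensions to it. A sketch combining the snippets above (the extraStatTags annotation for the new dimension still applies):

{
  "debug": "false",
  "stat_prefix": "istio",
  "definitions": [
    {
      "name": "posta_metric",
      "type": "COUNTER",
      "value": "1"
    }
  ],
  "metrics": [
    {
      "name": "posta_metric",
      "dimensions": {
        "posta": "upstream_peer.istio_version"
      }
    }
  ]
}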

Creating your own attributes

Hopefully this blog has gone into enough detail about understanding metrics and Istio’s telemetry v2. Armed with this information, you should now be able to follow the Istio docs about generating your own attributes so you can use those in dimensions.
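
As a quick preview, attribute generation is handled by a separate istio.attributegen plugin (deployed as another EnvoyFilter) whose configuration classifies requests into a new attribute that you can then reference as a dimension. A sketch modeled on the example in the Istio docs (the operation name and path here are hypothetical):

{
  "attributes": [
    {
      "output_attribute": "istio_operationId",
      "match": [
        {
          "value": "GetRecommendations",
          "condition": "request.url_path == '/recommendations' && request.method == 'GET'"
        }
      ]
    }
  ]
}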

For more information

I cover Istio telemetry v2 deeply in chapter 7 of Istio in Action. Also check the community Istio docs. If you’re deploying Istio and need help, please reach out to me (@christianposta) or ceposta on CNCF/Kubernetes/Istio/Solo.io slack.


Published on Java Code Geeks with permission by Christian Posta, partner at our JCG program. See the original article here: Understanding Istio Telemetry v2

Opinions expressed by Java Code Geeks contributors are their own.

Christian Posta

Christian is a Principal Consultant at FuseSource specializing in developing enterprise software applications with an emphasis on software integration and messaging. His strengths include helping clients build software using industry best practices, Test Driven Design, ActiveMQ, Apache Camel, ServiceMix, Spring Framework, and most importantly, modeling complex domains so that they can be realized in software. He works primarily using Java and its many frameworks, but his favorite programming language is Python. He's in the midst of learning Scala and hopes to contribute to the Apache Apollo project.