Health checks

How to set up health checks for the Dapr sidecar and your application

1 - App health checks

Reacting to apps’ health status changes

The app health checks feature allows probing for the health of your application and reacting to status changes.

Applications can become unresponsive for a variety of reasons. For example, your application:

  • Could be too busy to accept new work;
  • Could have crashed; or
  • Could be in a deadlock state.

Sometimes the condition can be transitory, for example:

  • If the app is just busy and will resume accepting new work eventually
  • If the application is being restarted for whatever reason and is in its initialization phase

App health checks are disabled by default. Once you enable app health checks, the Dapr runtime (sidecar) periodically polls your application via HTTP or gRPC calls. When it detects a failure in the app’s health, Dapr stops accepting new work on behalf of the application by:

  • Unsubscribing from all pub/sub subscriptions
  • Stopping all input bindings
  • Short-circuiting all service-invocation requests, which terminate in the Dapr runtime and are not forwarded to the application
  • Unregistering Dapr Actor types, thereby causing Actor instances to migrate to a different replica if one is available

These changes are meant to be temporary, and Dapr resumes normal operations once it detects that the application is responsive again.

Diagram showing the app health feature. Running Dapr with app health enabled causes Dapr to periodically probe the app for its health.

App health checks vs platform-level health checks

App health checks in Dapr are meant to be complementary to, and not replace, any platform-level health checks, like liveness probes when running on Kubernetes.

Platform-level health checks (or liveness probes) generally ensure that the application is running, and cause the platform to restart the application in case of failures.

Unlike platform-level health checks, Dapr’s app health checks focus on pausing work to an application that is currently unable to accept it, but is expected to be able to resume accepting work eventually. Goals include:

  • Avoiding additional load on an application that is already overloaded.
  • Doing the “polite” thing by not taking messages from queues, bindings, or pub/sub brokers when Dapr knows the application won’t be able to process them.

In this regard, Dapr’s app health checks are “softer”, waiting for an application to be able to process work, rather than terminating the running process in a “hard” way.

Configuring app health checks

App health checks are disabled by default, but can be enabled with either:

  • The --enable-app-health-check CLI flag; or
  • The dapr.io/enable-app-health-check: true annotation when running on Kubernetes.

Adding this flag is both necessary and sufficient to enable app health checks with the default options.

The full list of options is shown below, each given as CLI flag / Kubernetes deployment annotation: description (default value).

  • --enable-app-health-check / dapr.io/enable-app-health-check: Boolean that enables the health checks (default: disabled)
  • --app-health-check-path / dapr.io/app-health-check-path: Path that Dapr invokes for health probes when the app channel is HTTP; this value is ignored if the app channel uses gRPC (default: /healthz)
  • --app-health-probe-interval / dapr.io/app-health-probe-interval: Number of seconds between each health probe (default: 5)
  • --app-health-probe-timeout / dapr.io/app-health-probe-timeout: Timeout in milliseconds for health probe requests (default: 500)
  • --app-health-threshold / dapr.io/app-health-threshold: Maximum number of consecutive failures before the app is considered unhealthy (default: 3)

See the full Dapr arguments and annotations reference for all options and how to enable them.

Additionally, app health checks are impacted by the protocol used for the app channel, which is configured with the following flag or annotation:

  • --app-protocol / dapr.io/app-protocol: Protocol used for the app channel. Supported values are http, grpc, https, grpcs, and h2c (HTTP/2 Cleartext). (default: http)

Health check paths

HTTP

When using HTTP (including http, https, and h2c) for app-protocol, Dapr performs health probes by making an HTTP call to the path specified in app-health-check-path, which is /healthz by default.

For your app to be considered healthy, the response must have an HTTP status code in the 200-299 range. Any other status code is considered a failure. Dapr is only concerned with the status code of the response, and ignores any response header or body.
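The following is a minimal sketch of an app-side health endpoint, assuming a Python app built only on the standard library and assuming Dapr's HTTP probe arrives as a GET request. It answers on /healthz (the default app-health-check-path) with 200 while a hypothetical `ready` flag is set and with 503 otherwise, which Dapr would count as a failed probe. The names `ready`, `HealthHandler`, and `make_server` are illustrative and not part of any Dapr SDK.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness flag: the app clears it when it cannot accept new work.
ready = threading.Event()
ready.set()

# Must match --app-health-check-path / dapr.io/app-health-check-path.
HEALTH_PATH = "/healthz"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != HEALTH_PATH:
            self.send_response(404)
        elif ready.is_set():
            # Any status in 200-299 tells Dapr the app is healthy; the body is ignored.
            self.send_response(200)
        else:
            # Any status outside 200-299 counts as a failed probe.
            self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Silence per-request logging so frequent probes don't flood stdout.
        pass


def make_server(port: int = 7001) -> ThreadingHTTPServer:
    """Create (but do not start) an HTTP server that serves the health endpoint."""
    return ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
```

To run it, call `make_server().serve_forever()`; the port 7001 default simply matches the app port used in the examples on this page.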

gRPC

When using gRPC for the app channel (app-protocol set to grpc or grpcs), Dapr invokes the method /dapr.proto.runtime.v1.AppCallbackHealthCheck/HealthCheck in your application. Most likely, you will use a Dapr SDK to implement the handler for this method.

While responding to a health probe request, your app may decide to perform additional internal health checks to determine if it’s ready to process work from the Dapr runtime. However, this is not required; it’s a choice that depends on your application’s needs.

Intervals, timeouts, and thresholds

Intervals

By default, when app health checks are enabled, Dapr probes your application every 5 seconds. You can configure the interval, in seconds, with app-health-probe-interval. These probes happen regularly, regardless of whether your application is healthy or not.

Timeouts

When the Dapr runtime (sidecar) is initially started, Dapr waits for a successful health probe before considering the app healthy. This means that pub/sub subscriptions, input bindings, and service invocation requests won’t be enabled for your application until this first health check is complete and successful.

Health probe requests are considered successful if the application sends a successful response (as explained above) within the timeout configured in app-health-probe-timeout. The default value is 500, corresponding to 500 milliseconds (half a second).
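To make the success rule concrete, here is a hedged Python sketch of a single probe, assuming the HTTP rule described above: a 2xx status received within the timeout is a success, and anything else (a non-2xx status, a redirect, a refused connection, or a timeout) is a failure. `probe_once` is a hypothetical helper for checking your own endpoint, not Dapr's implementation; it deliberately does not follow redirects so that a 3xx is judged on its own status code.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx is reported as-is (and counts as a failure)."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_OPENER = urllib.request.build_opener(_NoRedirect)


def probe_once(url: str, timeout_ms: int = 500) -> bool:
    """Return True only if `url` answers with a 2xx status; `timeout_ms` is the socket timeout."""
    try:
        with _OPENER.open(url, timeout=timeout_ms / 1000) as resp:
            return 200 <= resp.status <= 299
    except urllib.error.HTTPError as err:
        # Raised for non-2xx statuses, including unfollowed redirects: a failed probe.
        err.close()
        return False
    except OSError:
        # Connection refused, DNS failure, or timeout: also a failed probe.
        return False
```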

Thresholds

Before Dapr considers an app to have entered an unhealthy state, it waits for app-health-threshold consecutive failures (default: 3). With this default, your application must fail health probes 3 times in a row to be considered unhealthy.

If you set the threshold to 1, any single failure causes Dapr to consider your app unhealthy and stop delivering work to it.

A threshold greater than 1 can help exclude transient failures due to external circumstances. The right value for your application depends on your requirements.

Thresholds only apply to failures. A single successful response is enough for Dapr to consider your app to be healthy and resume normal operations.
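The initial-state, threshold, and recovery rules above can be modeled as a small state machine. The sketch below is a simplified model written for illustration, not Dapr's code; `AppHealthTracker` and `worst_case_detection_seconds` are hypothetical names. The second function gives a rough upper bound on how long an unhealthy app may keep receiving work, assuming probes fire exactly every interval and each failing probe takes the full timeout.

```python
from dataclasses import dataclass


@dataclass
class AppHealthTracker:
    """Simplified model of the rules described above (not Dapr's implementation).

    - The app is not considered healthy until its first successful probe.
    - `threshold` consecutive failures mark it unhealthy.
    - A single success marks it healthy again and resets the failure count.
    """

    threshold: int = 3
    healthy: bool = False
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if success:
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.healthy = False
        return self.healthy


def worst_case_detection_seconds(interval_s: float, threshold: int, timeout_ms: int) -> float:
    """Rough upper bound on the time between a failure starting and Dapr pausing work."""
    # Up to one full interval before the first failing probe, `threshold - 1` more
    # intervals for the remaining failures, plus the timeout of the last probe.
    return interval_s * threshold + timeout_ms / 1000
```

Under these assumptions, the defaults (5-second interval, threshold of 3, 500 ms timeout) give an estimate of 15.5 seconds, and the values used in the Example section (3, 2, 200) give 6.2 seconds. Lowering the interval or threshold shortens this window at the cost of more probes and more sensitivity to transient failures.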

Example

Use the CLI flags with the dapr run command to enable app health checks:

dapr run \
  --app-id my-app \
  --app-port 7001 \
  --app-protocol http \
  --enable-app-health-check \
  --app-health-check-path=/healthz \
  --app-health-probe-interval 3 \
  --app-health-probe-timeout 200 \
  --app-health-threshold 2 \
  -- \
    <command to execute>

To enable app health checks in Kubernetes, add the relevant annotations to your Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "my-app"
        dapr.io/app-port: "7001"
        dapr.io/app-protocol: "http"
        dapr.io/enable-app-health-check: "true"
        dapr.io/app-health-check-path: "/healthz"
        dapr.io/app-health-probe-interval: "3"
        dapr.io/app-health-probe-timeout: "200"
        dapr.io/app-health-threshold: "2"
    spec:
      containers:
        - name: my-app
          image: <your-app-image>  # placeholder: replace with your application image
          ports:
            - containerPort: 7001

2 - Sidecar health

Dapr sidecar health checks

Dapr provides a way to determine its health using an HTTP /healthz endpoint. With this endpoint, the daprd process, or sidecar, can be:

  • Probed for its overall health
  • Probed for Dapr sidecar readiness during initialization
  • Checked for readiness and liveness by Kubernetes

In this guide, you learn how the Dapr /healthz endpoint integrates with health probes from the application hosting platform (for example, Kubernetes) as well as the Dapr SDKs.

The following diagram shows the steps that occur when a Dapr sidecar starts, including the healthz endpoint checks and the point at which the app channel is initialized.

Diagram of Dapr checking outbound health connections.

Outbound health endpoint

As shown by the red boundary lines in the diagram above, the v1.0/healthz/ endpoint is used to wait until:

  • All components are initialized;
  • The Dapr HTTP port is available; and,
  • The app channel is initialized.

This is used to check the complete initialization of the Dapr sidecar and its health.

Setting the DAPR_HEALTH_TIMEOUT environment variable lets you control the health timeout, which can be important, for example, in environments with higher latency.

On the other hand, as shown by the green boundary lines in the diagram above, the v1.0/healthz/outbound endpoint returns successfully when:

  • All the components are initialized;
  • The Dapr HTTP port is available; but,
  • The app channel is not yet established.

In the Dapr SDKs, the waitForSidecar/wait_until_ready method (depending on which SDK you use) performs this specific check against the v1.0/healthz/outbound endpoint. With this behavior, instead of waiting for the app channel to be available (see: red boundary lines) with the v1.0/healthz/ endpoint, the SDK waits for a successful response from v1.0/healthz/outbound. This approach enables your application to call the Dapr sidecar APIs before the app channel is initialized, for example, to read secrets with the secrets API.

If you use the waitForSidecar/wait_until_ready method in the SDKs, the correct initialization is performed for you. Otherwise, you can call the v1.0/healthz/outbound endpoint during initialization and, once it responds successfully, call the Dapr sidecar APIs.
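If you are not using an SDK, the polling described above can be sketched in Python with only the standard library. The function name `wait_for_sidecar`, the 0.25-second polling interval, and the 60-second fallback timeout are assumptions for illustration. The sketch reads the sidecar port from the DAPR_HTTP_PORT environment variable (falling back to 3500) and treats DAPR_HEALTH_TIMEOUT as a number of seconds, which is also an assumption. It waits only on v1.0/healthz/outbound, so it can return before the app channel is initialized.

```python
import os
import time
import urllib.error
import urllib.request
from typing import Optional

OUTBOUND_HEALTH_PATH = "/v1.0/healthz/outbound"


def wait_for_sidecar(base_url: Optional[str] = None,
                     timeout_s: Optional[float] = None,
                     poll_interval_s: float = 0.25) -> bool:
    """Poll the sidecar's outbound health endpoint until it returns 2xx or time runs out."""
    if base_url is None:
        # Assumption: the sidecar HTTP port is exposed as DAPR_HTTP_PORT; 3500 is the default.
        base_url = "http://127.0.0.1:" + os.environ.get("DAPR_HTTP_PORT", "3500")
    if timeout_s is None:
        # Assumption: DAPR_HEALTH_TIMEOUT holds seconds; 60 is an illustrative fallback.
        timeout_s = float(os.environ.get("DAPR_HEALTH_TIMEOUT", "60"))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url + OUTBOUND_HEALTH_PATH, timeout=1.0) as resp:
                if 200 <= resp.status <= 299:
                    return True
        except urllib.error.HTTPError as err:
            err.close()  # sidecar is listening but not ready yet: keep polling
        except OSError:
            pass  # sidecar not listening yet: keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

For example, call `wait_for_sidecar()` at startup and only read secrets through the Dapr APIs once it returns True.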

SDKs supporting outbound health endpoint

Currently, the v1.0/healthz/outbound endpoint is supported in the:

Health endpoint: Integration with Kubernetes

When deploying Dapr to a hosting platform like Kubernetes, the Dapr health endpoint is automatically configured for you.

Kubernetes uses readiness and liveness probes to determine the health of the container.

Liveness

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock (a running application that is unable to make progress). Restarting a container in such a state can help to make the application more available despite having bugs.

How to configure a liveness probe in Kubernetes

In the pod configuration file, the liveness probe is added in the containers spec section as shown below:

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3

In the above example, the periodSeconds field specifies that the kubelet should perform a liveness probe every 3 seconds. The initialDelaySeconds field tells the kubelet that it should wait 3 seconds before performing the first probe. To perform a probe, the kubelet sends an HTTP GET request to the server that is running in the container and listening on port 8080. If the handler for the server’s /healthz path returns a success code, the kubelet considers the container to be alive and healthy. If the handler returns a failure code, the kubelet kills the container and restarts it.

Any HTTP status code between 200 and 399 indicates success; any other status code indicates failure.
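This range is wider than the one Dapr uses for its own app health checks, where only 200-299 counts as healthy. The sketch below encodes both rules as stated in this guide, which matters if the same /healthz handler serves both a Kubernetes probe and Dapr: a 3xx response keeps the kubelet satisfied but counts as a failed Dapr probe. Both function names are hypothetical.

```python
def kubelet_http_probe_succeeds(status: int) -> bool:
    """Kubernetes httpGet probes: any status from 200 to 399 is a success."""
    return 200 <= status <= 399


def dapr_app_health_probe_succeeds(status: int) -> bool:
    """Dapr HTTP app health probes: only 200-299 counts as healthy."""
    return 200 <= status <= 299
```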

Readiness

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A pod is considered ready when all of its containers are ready. One use of this readiness signal is to control which pods are used as backends for Kubernetes services. When a pod is not ready, it is removed from Kubernetes service load balancers.

How to configure a readiness probe in Kubernetes

Readiness probes are configured similarly to liveness probes. The only difference is that you use the readinessProbe field instead of the livenessProbe field:

    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3

Sidecar Injector

When integrating with Kubernetes, the Dapr sidecar is injected with a Kubernetes probe configuration telling it to use the Dapr healthz endpoint. This is done by the “Sidecar Injector” system service. The integration with the kubelet is shown in the diagram below.

Diagram of Dapr services interacting

How the Dapr sidecar health endpoint is configured with Kubernetes

As mentioned above, this configuration is done automatically by the Sidecar Injector service. This section describes the specific values that are set on the liveness and readiness probes.

Dapr serves its HTTP health endpoint, /v1.0/healthz, on port 3500. Kubernetes can use this endpoint for readiness and liveness probes. When the Dapr sidecar is injected, the readiness and liveness probes are configured in the pod configuration file with the following values:

    livenessProbe:
      httpGet:
        path: /v1.0/healthz
        port: 3500
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /v1.0/healthz
        port: 3500
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

Delay graceful shutdown

Dapr accepts a dapr.io/block-shutdown-duration annotation or --dapr-block-shutdown-duration CLI flag, which delays the full shutdown procedure for the specified duration, or until the app reports as unhealthy, whichever is sooner.

During this period, all subscriptions and input bindings are closed. This is useful for applications that need to use the Dapr APIs as part of their own shutdown procedure.
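As a hedged sketch of how an app might use this window, the Python code below keeps its health endpoint healthy while it runs its own shutdown work (which may call Dapr APIs) and only then starts failing probes, so that Dapr can end the block early. It assumes app health checks are enabled and block-shutdown-duration is set; `ShutdownAwareHealth`, `install_sigterm_handler`, and the `cleanup` callback are hypothetical names, and your app's health handler would return `status_code()` to Dapr's probes.

```python
import signal
import threading
from typing import Callable


class ShutdownAwareHealth:
    """Hypothetical helper: stays healthy until the app's own shutdown work is done."""

    def __init__(self) -> None:
        self._healthy = threading.Event()
        self._healthy.set()

    def status_code(self) -> int:
        # A /healthz handler would send this status code in response to Dapr's probes.
        return 200 if self._healthy.is_set() else 503

    def shutdown(self, cleanup: Callable[[], None]) -> None:
        try:
            # Assuming the block duration has not elapsed, Dapr APIs are still usable here.
            cleanup()
        finally:
            # Start failing probes so Dapr can stop blocking once it sees the app as unhealthy.
            self._healthy.clear()


def install_sigterm_handler(health: ShutdownAwareHealth, cleanup: Callable[[], None]) -> None:
    """Run the shutdown sequence on SIGTERM (must be called from the main thread)."""

    def _on_sigterm(signum, frame):
        # Do the work off the signal handler so it can block on network calls.
        threading.Thread(target=health.shutdown, args=(cleanup,)).start()

    signal.signal(signal.SIGTERM, _on_sigterm)
```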

Applicable annotations or CLI flags include:

  • --dapr-graceful-shutdown-seconds/dapr.io/graceful-shutdown-seconds
  • --dapr-block-shutdown-duration/dapr.io/block-shutdown-duration

Learn more about these and how to use them in the Annotations and arguments guide.