Monitoring

We’re overhauling Dgraph’s docs to make them clearer and more approachable. If you notice any issues during this transition or have suggestions, please let us know.

Dgraph exposes metrics via the /debug/vars endpoint in JSON format and the /debug/prometheus_metrics endpoint in Prometheus’ text-based format. Dgraph doesn’t store the metrics and only exposes the value of the metrics at that instant. You can either poll this endpoint to get the data in your monitoring systems or install Prometheus. Replace targets in the configuration file below with the IP address of your Dgraph instances and run prometheus using the command prometheus --config.file my_config.yaml.

scrape_configs:
  - job_name: "dgraph"
    metrics_path: "/debug/prometheus_metrics"
    scrape_interval: "2s"
    static_configs:
    - targets:
      - 172.31.9.133:6080     # For Dgraph zero, 6080 is the http endpoint exposing metrics.
      - 172.31.15.230:8080    # For Dgraph alpha, 8080 is the http endpoint exposing metrics.
      - 172.31.0.170:8080
      - 172.31.8.118:8080

Raw data exported by Prometheus is available via /debug/prometheus_metrics endpoint on Dgraph alphas.

Install Grafana to plot the metrics. Grafana runs at port 3000 in default settings. Create a Prometheus data source by following these steps. Import grafana_dashboard.json by following these instructions.

Amazon CloudWatch

Route53’s health checks can be leveraged to create standard CloudWatch alarms to notify on change in the status of the /health endpoints of Alpha and Zero. Considering that the endpoints to monitor are publicly accessible and you have the AWS credentials and awscli setup, we’ll go through an example of setting up a simple CloudWatch alarm configured to alert via email for the Alpha endpoint alpha.acme.org:8080/health. Dgraph Zero’s /health endpoint can also be monitored in a similar way.

Create the Route53 health check

aws route53 create-health-check \
    --caller-reference $(date "+%Y%m%d%H%M%S") \
    --health-check-config file:///tmp/create-healthcheck.json \
    --query 'HealthCheck.Id'

The file /tmp/create-healthcheck.json would need to have the values for the parameters required to create the health check as such:

{
  "Type": "HTTPS",
  "ResourcePath": "/health",
  "FullyQualifiedDomainName": "alpha.acme.org",
  "Port": 8080,
  "RequestInterval": 30,
  "FailureThreshold": 3
}

The reference for the values one can specify while creating or updating a health check can be found on the AWS documentation. The response to the command id the ID of the created health check.

"29bdeaaa-f5b5-417e-a5ce-7dba1k5f131b"

Make a note of the health check ID. This is used to integrate CloudWatch alarms with the health check.

Currently, Route53 metrics are only available in the US East (N. Virginia) region. The CloudWatch Alarm (and the SNS Topic) should therefore be created in us-east-1.

SNS topics are used to create message delivery channels. If you do not have any SNS topics configured, one can be created by running the following command:

aws sns create-topic --region=us-east-1 --name ops --query 'TopicArn'

The response to this command is:

"arn:aws:sns:us-east-1:123456789012:ops"

Be sure to make a note of the topic ARN. This is used to configure the CloudWatch alarm’s action parameter. Run the following command to subscribe your email to the SNS topic:

aws sns subscribe \
    --topic-arn arn:aws:sns:us-east-1:123456789012:ops \
    --protocol email \
    --notification-endpoint ops@acme.org

The subscription needs to be confirmed through AWS Notification - Subscription Confirmation sent through email. Once the subscription is confirmed, CloudWatch can be configured to use the SNS topic to trigger the alarm notification.

Creating a CloudWatch alarm

The following command creates a CloudWatch alarm with --alarm-actions set to the ARN of the SNS topic and the --dimensions of the alarm set to the health check ID.

aws cloudwatch put-metric-alarm \
    --region=us-east-1 \
    --alarm-name dgraph-alpha \
    --alarm-description "Alarm for when Alpha is down" \
    --metric-name HealthCheckStatus \
    --dimensions "Name=HealthCheckId,Value=29bdeaaa-f5b5-417e-a5ce-7dba1k5f131b" \
    --namespace AWS/Route53 \
    --statistic Minimum \
    --period 60 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --evaluation-periods 1 \
    --treat-missing-data breaching \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:ops

One can verify the alarm status from the CloudWatch or Route53 consoles.

Internal endpoints

If the Alpha endpoint is internal to the VPC network, create a Lambda function that periodically (triggered using CloudWatch Event Rules) requests the /health path and creates CloudWatch metrics which could then be used to create the required CloudWatch alarms. The architecture and the CloudFormation template to achieve the same can be found here.

Getting Started

Connecting

Query Language

GraphQL-based Development

Administration

Tools

Resources

Amazon CloudWatch

Create the Route53 health check

Creating a CloudWatch alarm

Internal endpoints

Getting Started

Connecting

Query Language

GraphQL-based Development

Administration

Tools

Resources

​Amazon CloudWatch

​Create the Route53 health check

​[Optional] Creating an SNS topic

​Creating a CloudWatch alarm

​Internal endpoints

Amazon CloudWatch

Create the Route53 health check

[Optional] Creating an SNS topic

Creating a CloudWatch alarm

Internal endpoints