Each Dgraph data node exposes profiling information over the /debug/pprof
endpoint and metrics over the /debug/vars endpoint. Each node has its own
profiling and metrics information. Below is a list of the debugging information
exposed by Dgraph and the corresponding commands to retrieve it.
If you are collecting these metrics from outside the Dgraph instance, you need
to pass the --expose_trace=true flag; otherwise, these metrics can only be
collected by connecting to the instance over localhost.
curl http://<IP>:<HTTP_PORT>/debug/vars
Metrics can also be retrieved in the Prometheus format at
/debug/prometheus_metrics
. See the Metrics section for the full
list of metrics.
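For example, you can fetch the Prometheus-formatted metrics with curl:
curl http://<IP>:<HTTP_PORT>/debug/prometheus_metrics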
Profiling information is available via go tool pprof, the profiling tool built
into Go. The “Profiling Go programs” Go blog post should help you get started
with pprof. Each Dgraph Zero and Dgraph Alpha exposes a debug endpoint at
/debug/pprof/<profile> via the HTTP port.
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
Fetching profile from ...
Saved Profile in ...
The output of the command shows the location where the profile is stored.
In the interactive pprof shell, you can use commands like top
to get a listing
of the top functions in the profile, web
to get a visual graph of the profile
opened in a web browser, or list
to display a code listing with profiling
information overlaid.
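For example, a minimal interactive session might look like this (the function
name passed to list is a placeholder):
(pprof) top
(pprof) web
(pprof) list <function_name>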
CPU profile
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/profile
Memory profile
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
Block profile
Dgraph by default doesn’t collect the block profile. Dgraph must be started with
--profile_mode=block
and --block_rate=<N>
with N > 1.
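For example, to start an Alpha with block profiling enabled (a sketch; any
other flags your deployment needs are omitted):
dgraph alpha --profile_mode=block --block_rate=100
Once the Alpha is running with these flags, you can fetch the block profile: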
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/block
Goroutine stack
The HTTP page /debug/pprof/
is available at the HTTP port of a Dgraph Zero or
Dgraph Alpha. From this page a link to the “full goroutine stack dump” is
available (for example, on a Dgraph Alpha this page would be at
http://localhost:8080/debug/pprof/goroutine?debug=2
). Looking at the full
goroutine stack can be useful to understand goroutine usage at that moment.
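You can also save the full dump directly with curl (the output file name is
arbitrary):
curl 'http://localhost:8080/debug/pprof/goroutine?debug=2' > goroutine_dump.txt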
Instead of sending a request to the server for each CPU, memory, and goroutine
profile, you can use the debuginfo
command to collect all of these profiles,
along with several metrics.
You can run the command like this:
dgraph debuginfo -a <alpha_address:port> -z <zero_address:port> -d <path_to_dir_to_store_profiles>
Your output should look like:
I0311 14:13:53.243667 32654 run.go:118] using directory /tmp/dgraph-debuginfo037351492 for debug info dump.
I0311 14:13:53.243864 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/heap
I0311 14:13:53.243872 32654 debugging.go:70] please wait... (30s)
I0311 14:13:53.245338 32654 debugging.go:58] saving heap metric in /tmp/dgraph-debuginfo037351492/alpha_heap.gz
I0311 14:13:53.245349 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/profile?seconds=30
I0311 14:13:53.245357 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.250079 32654 debugging.go:58] saving cpu metric in /tmp/dgraph-debuginfo037351492/alpha_cpu.gz
I0311 14:14:23.250148 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/state
I0311 14:14:23.250173 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.255467 32654 debugging.go:58] saving state metric in /tmp/dgraph-debuginfo037351492/alpha_state.gz
I0311 14:14:23.255507 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/health
I0311 14:14:23.255528 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.257453 32654 debugging.go:58] saving health metric in /tmp/dgraph-debuginfo037351492/alpha_health.gz
I0311 14:14:23.257507 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/jemalloc
I0311 14:14:23.257548 32654 debugging.go:70] please wait... (30s)
I0311 14:14:23.259009 32654 debugging.go:58] saving jemalloc metric in /tmp/dgraph-debuginfo037351492/alpha_jemalloc.gz
I0311 14:14:23.259055 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:23.259091 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.266092 32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:14:53.266152 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/metrics
I0311 14:14:53.266181 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.276357 32654 debugging.go:58] saving metrics metric in /tmp/dgraph-debuginfo037351492/alpha_metrics.gz
I0311 14:14:53.276414 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/vars
I0311 14:14:53.276439 32654 debugging.go:70] please wait... (30s)
I0311 14:14:53.278295 32654 debugging.go:58] saving vars metric in /tmp/dgraph-debuginfo037351492/alpha_vars.gz
I0311 14:14:53.278340 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/trace?seconds=30
I0311 14:14:53.278366 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.286770 32654 debugging.go:58] saving trace metric in /tmp/dgraph-debuginfo037351492/alpha_trace.gz
I0311 14:15:23.286830 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/goroutine?debug=2
I0311 14:15:23.286886 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.291120 32654 debugging.go:58] saving goroutine metric in /tmp/dgraph-debuginfo037351492/alpha_goroutine.gz
I0311 14:15:23.291164 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/block
I0311 14:15:23.291192 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.304562 32654 debugging.go:58] saving block metric in /tmp/dgraph-debuginfo037351492/alpha_block.gz
I0311 14:15:23.304664 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/mutex
I0311 14:15:23.304706 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.309171 32654 debugging.go:58] saving mutex metric in /tmp/dgraph-debuginfo037351492/alpha_mutex.gz
I0311 14:15:23.309228 32654 debugging.go:68] fetching information over HTTP from http://localhost:8080/debug/pprof/threadcreate
I0311 14:15:23.309256 32654 debugging.go:70] please wait... (30s)
I0311 14:15:23.313026 32654 debugging.go:58] saving threadcreate metric in /tmp/dgraph-debuginfo037351492/alpha_threadcreate.gz
I0311 14:15:23.385359 32654 run.go:150] Debuginfo archive successful: dgraph-debuginfo037351492.tar.gz
When the command finishes, debuginfo prints the tarball’s file name. If no
destination is specified, the file is created in the directory where you ran
the debuginfo command.
The following files contain the metrics collected by the debuginfo
command:
dgraph-debuginfo639541060
├── alpha_block.gz
├── alpha_goroutine.gz
├── alpha_health.gz
├── alpha_heap.gz
├── alpha_jemalloc.gz
├── alpha_mutex.gz
├── alpha_profile.gz
├── alpha_state.gz
├── alpha_threadcreate.gz
├── alpha_trace.gz
├── zero_block.gz
├── zero_goroutine.gz
├── zero_health.gz
├── zero_heap.gz
├── zero_jemalloc.gz
├── zero_mutex.gz
├── zero_profile.gz
├── zero_state.gz
├── zero_threadcreate.gz
└── zero_trace.gz
Command parameters
-a, --alpha string Address of running dgraph alpha. (default "localhost:8080")
-x, --archive Whether to archive the generated report (default true)
-d, --directory string Directory to write the debug info into.
-h, --help help for debuginfo
-m, --metrics strings List of metrics & profiles to dump in the report. (default [heap,cpu,state,health,jemalloc,trace,metrics,vars,trace,goroutine,block,mutex,threadcreate])
-s, --seconds uint32 Duration for time-based metric collection. (default 30)
-z, --zero string Address of running dgraph zero.
The metrics flag (-m)
By default, debuginfo
collects:
heap
cpu
state
health
jemalloc
trace
metrics
vars
trace
goroutine
block
mutex
threadcreate
If needed, you can collect just a subset of them. For example, this command
collects only the jemalloc and health profiles:
dgraph debuginfo -m jemalloc,health
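You can also combine the metrics flag with explicit addresses and a shorter
collection window. For example (localhost:6080 for the Zero HTTP port is an
assumption based on Dgraph’s defaults; adjust to your deployment):
dgraph debuginfo -a localhost:8080 -z localhost:6080 -m jemalloc,health -s 10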
Profile details
- cpu profile: determines where a program spends its time while actively
  consuming CPU cycles (as opposed to sleeping or waiting for I/O).
- heap: reports memory allocation samples; used to monitor current and
  historical memory usage, and to check for memory leaks.
- threadcreate: reports the sections of the program that lead to the creation
  of new OS threads.
- goroutine: reports the stack traces of all current goroutines.
- block: shows where goroutines block waiting on synchronization primitives
  (including timer channels).
- mutex: reports lock contention. When you think your CPU isn’t fully utilized
  due to mutex contention, use this profile.
- trace: captures a wide range of runtime events. The execution tracer is a
  tool for detecting latency and utilization problems. You can examine how well
  the CPU is utilized, and when networking or syscalls cause goroutines to be
  preempted. The tracer is useful for identifying poorly parallelized
  execution, understanding core runtime events, and seeing how your goroutines
  execute.
To debug a running Dgraph cluster, first copy the postings (“p”) directory to
another location. If the Dgraph cluster isn’t running, then you can use the same
postings directory with the debug tool. If the “p” directory has been encrypted,
then the debug tool needs to use the --keyfile <path-to-keyfile>
option. This
file must contain the same key that was used to encrypt the “p” directory.
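For example, a minimal sketch (the copy destination is hypothetical):
cp -r ./p /tmp/p-copy
dgraph debug --postings /tmp/p-copy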
The dgraph debug
tool can be used to inspect Dgraph’s posting list structure.
You can use the debug tool to inspect the data, schema, and indices of your
Dgraph cluster.
Some scenarios where the debug tool is useful:
- Verify that mutations committed to Dgraph have been persisted to disk.
- Verify that indices are created.
- Inspect the history of a posting list.
- Parse a Badger key into a meaningful struct.
Example
Debug the p directory.
dgraph debug --postings ./p
Debug the p directory without opening it in read-only mode. This is typically
necessary when the database wasn’t closed properly.
dgraph debug --postings ./p --readonly=false
Debug the p directory, only outputting the keys for the predicate 0-name
. Note
that 0 is the namespace and name is the predicate.
dgraph debug --postings ./p --readonly=false --pred=0-name
Debug the p directory, looking up a particular key:
dgraph debug --postings ./p --lookup 01000000000000000000046e616d65
Debug the p directory, inspecting the history of a particular key:
dgraph debug --postings ./p --lookup 01000000000000000000046e616d65 --history
Debug an encrypted p directory with the key in a local file at the path
./key_file:
dgraph debug --postings ./p --encryption=key-file=./key_file
The key file contains the key used to decrypt/encrypt the database. This key
should be kept secret. As a best practice:
- Don’t store the key file on disk permanently. Back it up in a safe place and
  delete it after using it with the debug tool.
- If this isn’t possible, make sure correct privileges are set on the key file.
  Only the user who owns the dgraph process should be able to read or write it
  (chmod 600, as shown below).
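For example, assuming the key file is at ./key_file as in the example above:
chmod 600 ./key_file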
Let’s go over an example with a Dgraph cluster that has the following schema,
which uses a term index and a full-text index, and two separately committed
mutations:
$ curl localhost:8080/alter -d '
name: string @index(term) .
url: string .
description: string @index(fulltext) .
'
$ curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -d '{
set {
_:dgraph <name> "Dgraph" .
_:dgraph <dgraph.type> "Software" .
_:dgraph <url> "https://github.com/hypermodeinc/dgraph" .
_:dgraph <description> "Fast, Transactional, Distributed Graph Database." .
}
}'
$ curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -d '{
set {
_:badger <name> "Badger" .
_:badger <dgraph.type> "Software" .
_:badger <url> "https://github.com/hypermodeinc/badger" .
_:badger <description> "Embeddable, persistent and fast key-value (KV) database written in pure Go." .
}
}'
After stopping Dgraph, you can run the debug tool to inspect the postings
directory:
The debug output can be very large. Typically you would redirect the debug
tool’s output to a file first for easier analysis.
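A minimal sketch, assuming a POSIX shell (the output file name is arbitrary):
dgraph debug --postings ./p > debug_out.txt
Running the tool without redirection prints the output directly: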
dgraph debug --postings ./p
Opening DB: ./p
prefix =
{d} ns: 0x0 attr: url uid: 1 ts: 5 item: [79, b0100] sz: 79 dcnt: 1 key: 000000000000000000000375726c000000000000000001
{d} ns: 0x0 attr: url uid: 2 ts: 8 item: [108, b1000] sz: 108 dcnt: 0 isz: 187 icount: 2 key: 000000000000000000000375726c000000000000000002
{d} ns: 0x0 attr: name uid: 1 ts: 5 item: [51, b0100] sz: 51 dcnt: 1 key: 00000000000000000000046e616d65000000000000000001
{d} ns: 0x0 attr: name uid: 2 ts: 8 item: [80, b1000] sz: 80 dcnt: 0 isz: 131 icount: 2 key: 00000000000000000000046e616d65000000000000000002
{i} ns: 0x0 attr: name term: [1] [badger] ts: 8 item: [41, b1000] sz: 41 dcnt: 0 isz: 79 icount: 2 key: 00000000000000000000046e616d650201626164676572
{i} ns: 0x0 attr: name term: [1] [dgraph] ts: 5 item: [38, b0100] sz: 38 dcnt: 1 key: 00000000000000000000046e616d650201646772617068
{d} ns: 0x0 attr: description uid: 1 ts: 5 item: [100, b0100] sz: 100 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e000000000000000001
{d} ns: 0x0 attr: description uid: 2 ts: 8 item: [156, b1000] sz: 156 dcnt: 0 isz: 283 icount: 2 key: 000000000000000000000b6465736372697074696f6e000000000000000002
{i} ns: 0x0 attr: description term: [8] [databas] ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 141 icount: 3 key: 000000000000000000000b6465736372697074696f6e020864617461626173
{i} ns: 0x0 attr: description term: [8] [distribut] ts: 5 item: [48, b0100] sz: 48 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e0208646973747269627574
{i} ns: 0x0 attr: description term: [8] [embedd] ts: 8 item: [48, b1000] sz: 48 dcnt: 0 isz: 93 icount: 2 key: 000000000000000000000b6465736372697074696f6e0208656d62656464
{i} ns: 0x0 attr: description term: [8] [fast] ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 132 icount: 3 key: 000000000000000000000b6465736372697074696f6e020866617374
{i} ns: 0x0 attr: description term: [8] [go] ts: 8 item: [44, b1000] sz: 44 dcnt: 0 isz: 85 icount: 2 key: 000000000000000000000b6465736372697074696f6e0208676f
{i} ns: 0x0 attr: description term: [8] [graph] ts: 5 item: [44, b0100] sz: 44 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e02086772617068
{i} ns: 0x0 attr: description term: [8] [kei] ts: 8 item: [45, b1000] sz: 45 dcnt: 0 isz: 87 icount: 2 key: 000000000000000000000b6465736372697074696f6e02086b6569
{i} ns: 0x0 attr: description term: [8] [kv] ts: 8 item: [44, b1000] sz: 44 dcnt: 0 isz: 85 icount: 2 key: 000000000000000000000b6465736372697074696f6e02086b76
{i} ns: 0x0 attr: description term: [8] [persist] ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 95 icount: 2 key: 000000000000000000000b6465736372697074696f6e020870657273697374
{i} ns: 0x0 attr: description term: [8] [pure] ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 89 icount: 2 key: 000000000000000000000b6465736372697074696f6e020870757265
{i} ns: 0x0 attr: description term: [8] [transact] ts: 5 item: [47, b0100] sz: 47 dcnt: 1 key: 000000000000000000000b6465736372697074696f6e02087472616e73616374
{i} ns: 0x0 attr: description term: [8] [valu] ts: 8 item: [46, b1000] sz: 46 dcnt: 0 isz: 89 icount: 2 key: 000000000000000000000b6465736372697074696f6e020876616c75
{i} ns: 0x0 attr: description term: [8] [written] ts: 8 item: [49, b1000] sz: 49 dcnt: 0 isz: 95 icount: 2 key: 000000000000000000000b6465736372697074696f6e02087772697474656e
{d} ns: 0x0 attr: dgraph.type uid: 1 ts: 5 item: [60, b0100] sz: 60 dcnt: 1 key: 000000000000000000000b6467726170682e74797065000000000000000001
{d} ns: 0x0 attr: dgraph.type uid: 2 ts: 8 item: [88, b1000] sz: 88 dcnt: 0 isz: 148 icount: 2 key: 000000000000000000000b6467726170682e74797065000000000000000002
{i} ns: 0x0 attr: dgraph.type term: [2] [Software] ts: 8 item: [50, b1000] sz: 50 dcnt: 0 isz: 144 icount: 3 key: 000000000000000000000b6467726170682e747970650202536f667477617265
{s} ns: 0x0 attr: url ts: 3 item: [23, b0001] sz: 23 dcnt: 0 isz: 23 icount: 1 key: 010000000000000000000375726c
{s} ns: 0x0 attr: name ts: 3 item: [33, b0001] sz: 33 dcnt: 0 isz: 33 icount: 1 key: 01000000000000000000046e616d65
{s} ns: 0x0 attr: description ts: 3 item: [51, b0001] sz: 51 dcnt: 0 isz: 51 icount: 1 key: 010000000000000000000b6465736372697074696f6e
{s} ns: 0x0 attr: dgraph.type ts: 1 item: [50, b0001] sz: 50 dcnt: 0 isz: 50 icount: 1 key: 010000000000000000000b6467726170682e74797065
{s} ns: 0x0 attr: dgraph.drop.op ts: 1 item: [45, b0001] sz: 45 dcnt: 0 isz: 45 icount: 1 key: 010000000000000000000e6467726170682e64726f702e6f70
{s} ns: 0x0 attr: dgraph.graphql.xid ts: 1 item: [64, b0001] sz: 64 dcnt: 0 isz: 64 icount: 1 key: 01000000000000000000126467726170682e6772617068716c2e786964
{s} ns: 0x0 attr: dgraph.graphql.schema ts: 1 item: [59, b0001] sz: 59 dcnt: 0 isz: 59 icount: 1 key: 01000000000000000000156467726170682e6772617068716c2e736368656d61
{s} ns: 0x0 attr: dgraph.graphql.p_query ts: 1 item: [71, b0001] sz: 71 dcnt: 0 isz: 71 icount: 1 key: 01000000000000000000166467726170682e6772617068716c2e705f7175657279
ns: 0x0 attr: dgraph.graphql ts: 1 item: [98, b0001] sz: 98 dcnt: 0 isz: 98 icount: 1 key: 020000000000000000000e6467726170682e6772617068716c
ns: 0x0 attr: dgraph.graphql.persisted_query ts: 1 item: [105, b0001] sz: 105 dcnt: 0 isz: 105 icount: 1 key: 020000000000000000001e6467726170682e6772617068716c2e7065727369737465645f7175657279
Found 34 keys
Each line in the debug output contains a prefix indicating the type of the key:
{d}: data key
{i}: index key
{c}: count key
{r}: reverse key
{s}: schema key
In the preceding debug output, we see data keys, index keys, and schema keys.
Each index key has a corresponding index type. For example, in
attr: name term: [1] [dgraph]
the [1]
shows that this is the term index
(0x1). In attr: description term: [8] [fast]
, the [8]
shows that
this is the full-text index (0x8). These IDs match the index IDs
in tok.go.
Key lookup
Every key can be inspected further by passing the specific key to the --lookup
flag.
dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374
Opening DB: ./p
Key: 000000000000000000000b6465736372697074696f6e020866617374 Length: 2 Is multi-part list? false Uid: 1 Op: 0
Uid: 2 Op: 0
For data keys, a lookup shows its type and value. Below, we see that the key for
attr: url uid: 1
is a string value.
dgraph debug --postings ./p --lookup 000000000000000000000375726c000000000000000001
Opening DB: ./p
Key: 000000000000000000000375726c000000000000000001 Length: 1 Is multi-part list? false Uid: 18446744073709551615 Op: 1 Type: STRING. String Value: "https://github.com/hypermodeinc/dgraph"
For index keys, a lookup shows the UIDs that are part of this index. Below, we
see that the fast
index for the <description>
predicate has UIDs 0x1 and
0x2.
dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374
Opening DB: ./p
Key: 000000000000000000000b6465736372697074696f6e020866617374 Length: 2 Is multi-part list? false Uid: 1 Op: 0
Uid: 2 Op: 0
Key history
You can also look up the history of values for a key using the --history
option.
dgraph debug --postings ./p --lookup 000000000000000000000b6465736372697074696f6e020866617374 --history
Opening DB: ./p
==> key: 000000000000000000000b6465736372697074696f6e020866617374. PK: UID: 0, Attr: 0-description, IsIndex: true, Term: 0
ts: 8 {item}{discard}{complete}
Num uids = 2. Size = 16
Uid = 1
Uid = 2
ts: 7 {item}{delta}
Uid: 2 Op: 1
ts: 5 {item}{delta}
Uid: 1 Op: 1
Above, we see that UID 0x1 was committed to this index at ts 5, and UID 0x2 was
committed to this index at ts 7.
The debug output also shows UserMeta information:
{complete}: Complete posting list
{uid}: UID posting list
{delta}: Delta posting list
{empty}: Empty posting list
{item}: Item posting list
{deleted}: Delete marker
Parse key
You can parse a key into its constituent components using --parse_key
. This
doesn’t require a p directory.
dgraph debug --parse_key 000000000000000000000b6467726170682e74797065000000000000000001
{d} Key: UID: 1, Attr: 0-dgraph.type, Data key