Upsert-style operations are operations where:
- A node is searched for, and then
- Depending on whether it’s found or not, either:
  - Updating some of its attributes, or
  - Creating a new node with those attributes.
Upsert procedure
In Dgraph, upsert-style behavior can be implemented by users on top of transactions. The steps are as follows:
- Create a new transaction.
- Query for the node. This is usually as simple as
  { q(func: eq(email, "bob@example.com")) { uid }}
  If a uid result is returned, then that’s the uid for the existing node. If no results are returned, then the user account doesn’t exist.
- In the case where the user account doesn’t exist, a new node has to be created. This is done in the usual way by making a mutation (inside the transaction), e.g. the RDF
  _:newAccount <email> "bob@example.com" .
  The uid assigned can be accessed by looking up the blank node name newAccount in the Assigned object returned from the mutation.
- Now that you have the uid of the account (either new or existing), you can modify the account (using additional mutations) or perform queries on it in whichever way you wish.
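The steps above can be sketched with a toy, in-memory model. This is not the Dgraph client API. ToyStore, ToyTxn, find_uid_by_email, mutate, upsert_account and the hex UID format are hypothetical names chosen for illustration. The toy applies writes immediately and has no commit or isolation, so it only shows the control flow: query, create the node if it is missing, then reuse the uid.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a Dgraph cluster (not a real API)."""
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    next_uid: int = 1

    def new_txn(self) -> "ToyTxn":
        return ToyTxn(self)


@dataclass
class ToyTxn:
    """Hypothetical transaction; writes are applied immediately (no isolation)."""
    store: ToyStore

    def find_uid_by_email(self, email: str) -> Optional[str]:
        # Stands in for: { q(func: eq(email, "...")) { uid }}
        for uid, preds in self.store.nodes.items():
            if preds.get("email") == email:
                return uid
        return None

    def mutate(self, blank: str, preds: Dict[str, str]) -> Dict[str, str]:
        # Stands in for: _:newAccount <email> "..." .
        # Returns a map from blank node name to the assigned uid.
        uid = hex(self.store.next_uid)
        self.store.next_uid += 1
        self.store.nodes[uid] = dict(preds)
        return {blank: uid}


def upsert_account(store: ToyStore, email: str) -> str:
    txn = store.new_txn()                  # 1. create a new transaction
    uid = txn.find_uid_by_email(email)     # 2. query for the node
    if uid is None:                        # 3. not found: create it
        assigned = txn.mutate("newAccount", {"email": email})
        uid = assigned["newAccount"]
    return uid                             # 4. uid of the new or existing node
```

Calling upsert_account twice with the same email returns the same uid and leaves a single node in the store. In a real cluster, two concurrent callers could still race each other, and that is where conflicts come in.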
Upserts in DQL and GraphQL
You can also use the Upsert Block in DQL to achieve the upsert procedure in a single mutation. The request contains both the query and the mutation.
In GraphQL, you can use the upsert input variable in an add mutation.
Conflicts
Upsert operations are intended to be run concurrently, as per the needs of the app. As such, it’s possible that two concurrently running operations could try to add the same node at the same time, for example, both trying to add a user with the same email address. If they do, then one of the transactions will fail with an error indicating that the transaction was aborted.
If this happens, the transaction is rolled back and it’s up to the user’s app logic to retry the whole operation. The transaction has to be retried in its entirety, all the way from creating a new transaction.
The choice of index placed on the predicate is important for performance. Hash is almost always the best choice of index for equality checking.
It is the index that typically causes upsert conflicts to occur. The index is stored as many key/value pairs, where each key is a combination of the predicate name and some function of the predicate value (e.g. its hash for the hash index). If two transactions modify the same key concurrently, then one will fail.
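The conflict behavior described above can be approximated with a toy first-committer-wins model. Everything here is hypothetical: AbortedError, ToyOracle, ToyTxn, run_with_retry, and the SHA-256-based key layout. The model is much simpler than Dgraph’s real conflict detection. It only illustrates two points. First, two transactions writing the same email value touch the same index key, so the later commit aborts. Second, the retry must start again from a fresh transaction.

```python
import hashlib
from typing import Callable, Dict, Set, Tuple

IndexKey = Tuple[str, str]


def index_key(predicate: str, value: str) -> IndexKey:
    # Hypothetical key layout: predicate name + hash of the value (as a hash index would).
    return (predicate, hashlib.sha256(value.encode()).hexdigest()[:16])


class AbortedError(Exception):
    """Stand-in for the "transaction aborted" error (hypothetical name)."""


class ToyOracle:
    """Assigns timestamps and remembers when each index key was last committed."""

    def __init__(self) -> None:
        self.ts = 0
        self.last_commit: Dict[IndexKey, int] = {}

    def begin(self) -> "ToyTxn":
        self.ts += 1
        return ToyTxn(self, start_ts=self.ts)


class ToyTxn:
    def __init__(self, oracle: ToyOracle, start_ts: int) -> None:
        self.oracle = oracle
        self.start_ts = start_ts
        self.writes: Set[IndexKey] = set()

    def set_value(self, predicate: str, value: str) -> None:
        # Writing a value touches the index key derived from it.
        self.writes.add(index_key(predicate, value))

    def commit(self) -> None:
        # First committer wins: abort if any of our keys was committed after we began.
        for key in self.writes:
            if self.oracle.last_commit.get(key, 0) > self.start_ts:
                raise AbortedError(f"conflict on predicate {key[0]!r}")
        self.oracle.ts += 1
        for key in self.writes:
            self.oracle.last_commit[key] = self.oracle.ts


def run_with_retry(oracle: ToyOracle, operation: Callable[[ToyTxn], None],
                   max_attempts: int = 3) -> int:
    """Retries the whole operation, starting from a new transaction each time.

    Returns the attempt number that committed.
    """
    for attempt in range(1, max_attempts + 1):
        txn = oracle.begin()
        try:
            operation(txn)
            txn.commit()
            return attempt
        except AbortedError:
            continue
    raise AbortedError(f"gave up after {max_attempts} attempts")
```

Writes of different email values map to different keys and commit independently. Only writes that share a key race each other.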
uid function in upsert
The upsert block contains one query block and one or more mutation blocks. Variables defined
in the query block can be used in the mutation blocks using the uid and val functions.
The uid function allows extracting UIDs from variables defined in the query
block. There are two possible outcomes based on the results of executing the
query block:
- If the variable is empty, i.e. no node matched the query, the uid function returns a new UID in the case of a set operation and is thus treated like a blank node. On the other hand, for a delete/del operation, it returns no UID, and thus the operation becomes a no-op and is silently ignored. A blank node gets the same UID across all the mutation blocks.
- If the variable stores one or more UIDs, the uid function returns all the UIDs stored in the variable. In this case, the operation is performed on all the UIDs returned, one at a time.
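The two outcomes above can be captured in a few lines. This is a toy model of the described semantics, not Dgraph’s implementation. resolve_uid, uid_source and the shared blanks dictionary are hypothetical names, and the hex-formatted UID strings are only illustrative.

```python
from itertools import count
from typing import Dict, Iterator, List


def uid_source(start: int = 1) -> Iterator[str]:
    """Hypothetical generator of fresh UIDs: "0x1", "0x2", and so on."""
    return (hex(n) for n in count(start))


def resolve_uid(var: str, query_vars: Dict[str, List[str]], op: str,
                blanks: Dict[str, str], fresh: Iterator[str]) -> List[str]:
    """Toy model of what uid(var) expands to inside one mutation block.

    query_vars: the UIDs each query-block variable captured.
    op: "set" or "delete".
    blanks: shared by all mutation blocks of one upsert, so an empty
            variable gets the same new UID in every block.
    """
    uids = query_vars.get(var, [])
    if uids:
        return list(uids)      # one or more matches: apply to each, one at a time
    if op == "delete":
        return []              # empty + delete: no UID, so a silent no-op
    if var not in blanks:      # empty + set: behaves like a blank node
        blanks[var] = next(fresh)
    return [blanks[var]]
```

With an empty variable, two set blocks sharing the same blanks dictionary both get the same new UID, while a delete gets nothing. With a populated variable, every captured UID is returned.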
Example: uid function
Consider an example with a database of users, where each user has email and name
information.
We also want to make sure that one email has exactly one corresponding user in
the database. To achieve this, we need to first query whether a user exists in
the database with the given email. If a user exists, we use its UID to update
the name
information. If the user doesn’t exist, we create a new user and
update the email
and name
information.
We can do this using the upsert block. Its query part looks up the user with the
given email and stores the UID in variable v. The mutation part then extracts the
UID from variable v, and stores the name and email information in the database.
If the user exists, the information is updated. If the user doesn’t exist, uid(v)
is treated as a blank node and a new user is created as explained above.
If we run the same mutation again, the data would just be overwritten, and no
new uid is created. Note that the uids
map is empty in the result when the
mutation is executed again and the data
map (key q
) contains the uid that
was created in the previous upsert.
We can achieve the same result using a JSON dataset.
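The rerun behavior described above can be mimicked with a toy store. ToyDB and upsert_by_email are hypothetical, and the response shape only loosely imitates the data and uids maps. The "uid(v)" key used in the uids map is an assumption, not a documented format.

```python
from itertools import count
from typing import Dict, List


class ToyDB:
    """Hypothetical in-memory store that mimics one upsert block (not Dgraph's API)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}
        self._ids = count(1)

    def upsert_by_email(self, email: str, name: str) -> dict:
        # Query block: find UIDs whose email matches and keep them in v.
        v: List[str] = [uid for uid, p in self.nodes.items() if p.get("email") == email]
        data = {"q": [{"uid": uid} for uid in v]}
        uids: Dict[str, str] = {}
        if not v:
            # uid(v) is empty, so the set mutation treats it like a blank node.
            new_uid = hex(next(self._ids))
            self.nodes[new_uid] = {}
            uids["uid(v)"] = new_uid   # key format is an assumption
            v = [new_uid]
        # Mutation block: set name and email on every UID in uid(v).
        for uid in v:
            self.nodes[uid].update({"email": email, "name": name})
        return {"data": data, "uids": uids}
```

Running upsert_by_email twice with the same email leaves one node. The first response has a uids entry and an empty q list. The second has an empty uids map, and its q list holds the UID created the first time.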
Now, suppose we want to add age information for the same user, who has the email
user@company1.io. We can use the upsert block to do this as well. Its query block
queries for a user with email user@company1.io and stores the uid of the user in
variable v. The mutation block then updates the age of the user by extracting the
uid from the variable v using the uid function.
We can achieve the same result using a JSON dataset.
Example: Bulk delete
Let’s say we want to delete all the users of company1 from the database. This
can be achieved in just one query using the upsert block.
We can achieve the same result using a JSON dataset.
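A toy model of the bulk delete follows. It assumes users are matched by the domain part of their email address; this is an assumption, since the actual query is not reproduced here. bulk_delete_by_domain is a hypothetical helper, not part of any Dgraph client. It shows that uid(v) expands to every matched UID and that an empty v turns the delete into a no-op.

```python
from typing import Dict, List


def bulk_delete_by_domain(nodes: Dict[str, Dict[str, str]], domain: str) -> List[str]:
    """Toy model of a bulk-delete upsert (hypothetical helper, not Dgraph's API).

    Query block: v collects every UID whose email ends in "@<domain>".
    Mutation block: each UID in uid(v) is deleted, one at a time.
    Returns the deleted UIDs; an empty v makes the delete a no-op.
    """
    v = [uid for uid, preds in nodes.items()
         if preds.get("email", "").endswith("@" + domain)]
    for uid in v:
        del nodes[uid]  # drop the node together with all its predicates
    return v
```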
val function in upsert
The upsert block allows performing queries and mutations in a single request.
The upsert block contains one query block and one or more mutation blocks.
Variables defined in the query block can be used in the mutation blocks
using the uid and val functions.
The val
function allows extracting values from value variables. Value
variables store a mapping from UIDs to their corresponding values. Hence,
val(v)
is replaced by the value stored in the mapping for the UID (Subject) in
the N-Quad. If the variable v
has no value for a given UID, the mutation is
silently ignored. The val
function can be used with the result of aggregate
variables as well, in which case, all the UIDs in the mutation would be updated
with the aggregate value.
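The val substitution rules above can be sketched as a toy expansion of an N-Quad template into concrete triples. expand_val is hypothetical. Representing an aggregate variable as a single number is an assumption made for illustration only.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]


def expand_val(subjects: List[str], predicate: str,
               a: Union[Dict[str, Number], Number]) -> List[Tuple[str, str, Number]]:
    """Toy expansion of `uid(v) <predicate> val(a) .` into concrete triples.

    `a` is either a value variable (a dict from UID to value) or, as a
    hypothetical encoding of an aggregate variable, a single number.
    """
    triples: List[Tuple[str, str, Number]] = []
    for uid in subjects:
        if isinstance(a, dict):
            if uid not in a:
                continue  # no value for this UID: silently skipped
            triples.append((uid, predicate, a[uid]))
        else:
            triples.append((uid, predicate, a))  # aggregate: same value for every UID
    return triples
```

Take subjects 0x1 and 0x2 with a value variable that only holds 0x1: only 0x1 produces a triple. Passing a single aggregate value instead assigns it to both subjects.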
Let’s say we want to migrate the predicate age to other. We can do this using
an upsert block whose query block stores, in the value variable a, a mapping
from all the UIDs to their age. The mutation block then stores the corresponding
value of age for each UID in the other predicate and deletes the age predicate.
We can achieve the same result using a JSON dataset.
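The migration can be modeled on an in-memory dictionary of nodes. migrate_predicate is a hypothetical helper, not Dgraph’s API. On a toy data layout, it mirrors the two steps described above: copy each value through a value variable, then delete the old predicate.

```python
from typing import Dict


def migrate_predicate(nodes: Dict[str, Dict[str, object]], old: str, new: str) -> int:
    """Toy model of migrating `old` to `new` with one upsert (not Dgraph's API).

    Query block: value variable a maps each UID that has `old` to its value.
    Mutation block: set `new` to val(a) for those UIDs, then delete `old`.
    Returns how many nodes were migrated.
    """
    a = {uid: preds[old] for uid, preds in nodes.items() if old in preds}
    for uid, value in a.items():
        nodes[uid][new] = value   # set: the new predicate takes val(a) for this UID
        del nodes[uid][old]       # delete: the old predicate is removed
    return len(a)
```

Nodes without the old predicate have no entry in a, so they are left untouched.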
External IDs and Upsert Block
The upsert block makes managing external IDs easy. Set the schema.
“http://schema.org/Person” will remain but “Robin Wright” will be deleted.