`pkg/plugin`. This brings some restrictions to how plugins can be used. `pkg/plugin` only works on Linux. Therefore, plugins will only work on Dgraph instances deployed in a Linux environment.
`Tokenizer`. The type of the symbol must be `func() interface{}`. When the function is called, the result returned should be a value that implements the following interface:
`Type()` corresponds to the concrete input type of `Tokens(interface{})` in the following way:

| `Type()` return value | `Tokens(interface{})` input type |
|---|---|
| `"int"` | `int64` |
| `"float"` | `float64` |
| `"string"` | `string` |
| `"bool"` | `bool` |
| `"datetime"` | `time.Time` |
Compile the plugin using the `plugin` build mode so that an `.so` file is produced instead of a regular executable. For example:
When starting Dgraph, use the `--custom_tokenizers` flag to tell Dgraph which tokenizers to load. It accepts a comma-separated list of plugins. E.g.
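A sketch of such an invocation (the subcommand and plugin file names are illustrative; depending on your Dgraph version the subcommand may differ):

```shell
dgraph alpha --custom_tokenizers=plugin1.so,plugin2.so
```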
to apply an index using a tokenizer plugin named `foo` to a `string` predicate named `my_predicate`, use the following in the schema:
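In Dgraph's schema syntax, that index declaration would look like:

```
my_predicate: string @index(foo) .
```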
| Mode | Behavior |
|---|---|
| `anyof` | Returns nodes that match on any of the tokens generated |
| `allof` | Returns nodes that match on all of the tokens generated |
`anyof` and `allof` behave analogously to their builtin counterparts `anyofterms`/`allofterms` and `anyoftext`/`alloftext`.
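A query against such an index follows the usual root-function syntax, with the tokenizer name as the second argument (the predicate and value here are illustrative):

```
{
  q(func: allof(my_predicate, foo, "some value")) {
    uid
    my_predicate
  }
}
```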
Inside `Tokens`, you can assume that `value` will have the concrete type corresponding to that specified by `Type()`. It is safe to do a type assertion.
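For example, a tokenizer whose `Type()` returns `"string"` can assert directly (a sketch; `RuneTokenizer` and its method body are illustrative, not Dgraph's code):

```go
package main

import "fmt"

// RuneTokenizer breaks a string into its constituent runes,
// producing one token per rune.
type RuneTokenizer struct{}

func (RuneTokenizer) Type() string { return "string" }

func (RuneTokenizer) Tokens(value interface{}) ([]string, error) {
	// Safe: Dgraph guarantees the concrete type matches Type().
	str := value.(string)
	var toks []string
	for _, r := range str {
		toks = append(toks, string(r))
	}
	return toks, nil
}

func main() {
	toks, _ := RuneTokenizer{}.Tokens("Aaron")
	fmt.Println(toks) // [A a r o n]
}
```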
Even though the return value is `[]string`, you can always store non-unicode data inside the string. See this blogpost for some interesting background on how strings are implemented in Go and why they can be used to store non-textual data. By storing arbitrary data in the string, you can make the index more compact. In this case, varints are stored in the return values.
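A minimal sketch of that trick, assuming tokens that encode `int64` values (the helper names are hypothetical, not Dgraph's API):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeToken packs an int64 into a short, non-textual token string
// via varint encoding (hypothetical helper illustrating the trick above).
func encodeToken(x int64) string {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, x)
	return string(buf[:n])
}

// decodeToken reverses encodeToken.
func decodeToken(tok string) int64 {
	v, _ := binary.Varint([]byte(tok))
	return v
}

func main() {
	tok := encodeToken(1234567)
	// The token is only a few bytes long and round-trips exactly.
	fmt.Println(len(tok), decodeToken(tok))
}
```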
The only person that has both `A` and `n` in their name is Aaron:
Now search for people that have both `A` and `m` in their name:
"ron"
,
you would find "Aaron"
, but not "Ronald"
. But if you were to search for
"no"
, you would match both "Aaron"
and "Ronald"
. The order of the runes in
the strings doesn’t matter.
It is possible to search for people that have any of the supplied runes in their names (rather than all of the supplied runes). To do this, use `anyof` instead of `allof`:
"Ronald"
doesn’t contain m
or r
, so isn’t found by the search.
`Tokens` method should be implemented.

When Dgraph sees new edges that are to be indexed by your tokenizer, it will tokenize the value. The resultant tokens are used as keys for posting lists. The edge subject is then added to the posting list for each token.

When a query root search occurs, the search value is tokenized. The result of the search is all of the nodes in the union or intersection of the corresponding posting lists (depending on whether `anyof` or `allof` was used).

The input is an IP address in CIDR notation, e.g. `"100.55.22.11/32"`. The output is the set of CIDR ranges that the IP address could possibly fall into. There could be up to 32 different outputs (`"100.55.22.11/32"` does indeed have 32 possible ranges, one for each mask size).
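The generation of those ranges can be sketched as follows (an IPv4-only sketch; the function name and structure are illustrative, not Dgraph's actual implementation):

```go
package main

import (
	"fmt"
	"net"
)

// cidrTokens emits every enclosing CIDR range of an IPv4 address,
// from the /1 mask up to the mask size given in the input.
func cidrTokens(cidr string) ([]string, error) {
	ip, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	maskSize, _ := ipnet.Mask.Size()
	var toks []string
	for bits := 1; bits <= maskSize; bits++ {
		mask := net.CIDRMask(bits, 32)
		rng := &net.IPNet{IP: ip.Mask(mask), Mask: mask}
		toks = append(toks, rng.String())
	}
	return toks, nil
}

func main() {
	toks, _ := cidrTokens("100.55.22.11/32")
	fmt.Println(len(toks)) // 32 ranges, one per mask size
	fmt.Println(toks[11])  // the 12-bit range: 100.48.0.0/12
}
```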
The 12-bit CIDR ranges of `100.55.22.11/32` and `100.49.21.25/32` are both `100.48.0.0/12`. The other IP addresses in the database aren't included in the search result, since they have different CIDR ranges for 12-bit masks (`100.32.0.0/12`, `101.0.0.0/12`, and `100.176.0.0/12` for `100.33.81.19/32`, `101.0.0.5/32`, and `100.176.2.1/32` respectively).
Note that we're using `allof` instead of `anyof`. Only `allof` will work correctly with this index. Remember that the tokenizer generates all possible CIDR ranges for an IP address. If we were to use `anyof`, then the search result would include all IP addresses under the 1-bit mask (in this case `0.0.0.0/1`, which would match all IPs in this dataset).
It doesn't matter whether `anyof` or `allof` is used; the result will always be the same.
The return value of `Type()` is `"int"`, corresponding to the concrete type of the input to `Tokens` (which is `int64`).