hash
and exact
indices, and implemented a keyword-based search to find
your favorite tweets using the term
index and its functions.
In this tutorial, we’ll continue from where we left off and learn about advanced
text search features in Dgraph.
Specifically, we’ll focus on two advanced feature:
tweets
, users
, and hashtags
. It is ready
for us to explore.
names
, hashtags
, twitter handle
, city names
are a few good
examples. These predicates are easy to query using their exact values.
For instance, here is an example query.
Give me all the tweets where the user name is equal to John Campbell
.
You can easily compose queries like these after adding either the hash
or an
exact
index to the string predicates.
But, some of the string predicates store sentences. Sometimes even one or more
paragraphs of text data in them. Predicates representing a tweet, a bio, a blog
post, a product description, or a movie review are just some examples. It is
relatively hard to query these predicates.
It is not practical to query such predicates using the hash
or exact
string
indices. A keyword-based search using the term
index is a good starting point
to query such predicates. We used it in our
previous tutorial to find the tweets with an exact match
for keywords like GraphQL
, Graphs
, and Go
.
But, for some of the use cases, just the keyword-based search may not be
sufficient. You might need a more powerful search capability, and that’s when
you should consider using Full-text search.
Let’s write some queries and understand Dgraph’s Full-text search capability in
detail.
To be able to do a Full-text search, you need to first set a fulltext
index on
the tweet
predicate.
Creating a fulltext
index on any string predicate is similar to creating any
other string indices.
graph data and analyzing it in graphdb
.
You can do so by using either of alloftext
or anyoftext
in-built functions.
Both functions take two arguments. The first argument is the predicate to
search. The second argument is the space-separated string values to search for,
and we call these as the search strings
.
alloftext
function.
Go to the query tab, paste the query below, and click Run. Here is our search
string: graph data and analyze it in graphdb
.
fulltext
index on the tweets, internally, the tweets are
processed, and fulltext
tokens are generated. These fulltext
tokens are then
indexed.
The search string also goes through the same processing pipeline, and fulltext
tokens generated them too.
Here are the steps to generate the fulltext
tokens:
fulltext
tokens
generated by Dgraph.
Actual text data | fulltext tokens generated by Dgraph |
---|---|
Let’s Go and catch @francesc at @Gopherpalooza today, as he scans into Go source code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go source code, using a Go program, that stores data in the GraphDB built in Go!\n#golang #GraphDB #Databases #Dgraph | [analyz build built catch code data databas dgraph francesc go goe golang gopherpalooza graph graphdb program scan sourc store todai us] |
graph data and analyze it in graphdb | [analyz data graph graphdb] |
fulltext
tokens generated for our search string: [analyz
,
data
, graph
, graphdb
].
As you can see from the table above, all of the fulltext
tokens generated for
the search string exist in the matched tweet. Hence, the alloftext
function
returns a positive match for the tweet. It would not have returned a positive
match even if one of the tokens in the search string is missing for the tweet.
But, the anyoftext
function would’ve returned a positive match as long as the
tweets and the search string have at least one of the tokens in common.
If you’re interested to see Dgraph’s fulltext
tokenizer in action,
here is the gist
containing the instructions to use it.
Dgraph generates the same fulltext
tokens even if the words in a search string
is differently ordered. Hence, using the same search string with different order
would not impact the query result.
As you can see, all three queries below are the same for Dgraph.
graph
.
trigram
index on the string predicate to be able to perform regex-based queries.
Using regular expression based search, let’s match all the hashtags that have
this particular pattern:
Starts and ends with any characters of indefinite length, but with the substring graph in it
.
Here is the regex expression we can use: ^.*graph.*$
Check out
this tutorial if
you’re not familiar with writing a regular expression.
Let’s first find all the hashtags in the database using the has()
function.
has()
function, refer to
the first tutorial of the series.
You can see that we have six hashtags in total, and four of them have the
substring graph
in them: Dgraph
, GraphQL
, graphqlconf
, graphDB
.
We should use the built-in function regexp
to be able to use regular
expressions to search for predicates. This function takes two arguments, the
first is the name of the predicate, and the second one is the regular
expression.
Here is the syntax of the regexp
function:
regexp(predicate, /regular-expression/)
Let’s execute the following query to find the hashtags that have the substring
graph
.
Go to the query tab, type in the query, and click Run.
trigram
index on
the hashtag
predicate.
trigram
index is similar to setting any other string index,
let’s do that for the hashtag
predicate.
regexp
query.
Dgraph
and
graphqlconf
.
That’s because regexp
function is case-sensitive by default.
Add the character i
at the end of the second argument of the regexp
function
to make it case insensitive: regexp(predicate, /regular-expression/i)
graph
in them.
Let’s modify the regular expression to match only the hashtags
which have a
prefix called graph
.
product
search in an e-commerce store?
Let’s learn about the fuzzy search in our next tutorial.
Sounds interesting?
Check out our next tutorial of the getting started series
here.