# Quickstart

Everything you need to get started with Badger.
## Prerequisites

- Go - v1.23 or higher
- Text editor - we recommend VS Code
- Terminal - access Badger through a command-line interface (CLI)
## Installing

To start using Badger, run the following command to retrieve the library:
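For Badger v4, the module path is `github.com/dgraph-io/badger/v4`:

```sh
go get github.com/dgraph-io/badger/v4
```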
Then, install the Badger command-line utility into your `$GOBIN` path:
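```sh
# Installs the badger CLI into $GOBIN.
go install github.com/dgraph-io/badger/v4/badger@latest
```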
## Opening a database

The top-level object in Badger is a `DB`. It represents multiple files on disk
in specific directories, which together contain the data for a single database.

To open your database, use the `badger.Open()` function with the appropriate
options. The `Dir` and `ValueDir` options are mandatory and must be specified
in your client. To simplify, you can set both options to the same value.
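For example, using `badger.DefaultOptions()`, which sets `Dir` and `ValueDir`
to the same path (the `/tmp/badger` directory here is illustrative):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// Open the Badger database located in /tmp/badger.
	// It's created if it doesn't exist.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Your code here…
}
```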
Badger obtains a lock on the directories. Multiple processes can’t open the same database at the same time.
### In-memory/diskless mode

By default, Badger ensures all data persists to disk. It also supports a pure
in-memory mode. When Badger is running in this mode, all data remains in memory
only. Reads and writes are much faster, but Badger loses all stored data in the
case of a crash or close. To open Badger in in-memory mode, set the `InMemory`
option.
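A minimal sketch using the `WithInMemory` option builder:

```go
// No directory is needed in in-memory mode.
opt := badger.DefaultOptions("").WithInMemory(true)
db, err := badger.Open(opt)
if err != nil {
	log.Fatal(err)
}
defer db.Close()
```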
### Encryption mode

If you enable encryption in Badger, you also need to set the index cache size.
The cache improves performance; without it, reads can be very slow with
encryption enabled. For example, to set a 100 MB cache:
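A sketch using the option builders, where `encryptionKey` is assumed to be a
16-, 24-, or 32-byte AES key you provide:

```go
opt := badger.DefaultOptions("/tmp/badger").
	WithEncryptionKey(encryptionKey). // enables encryption
	WithIndexCacheSize(100 << 20)     // 100 MB index cache
db, err := badger.Open(opt)
```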
## Transactions

### Read-only transactions

To start a read-only transaction, you can use the `DB.View()` method:
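A minimal example:

```go
err := db.View(func(txn *badger.Txn) error {
	// Your code here…
	return nil
})
```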
You can’t perform any writes or deletes within this transaction. Badger ensures that you get a consistent view of the database within this closure. Any writes that happen elsewhere after the transaction has started aren’t seen by calls made within the closure.
### Read-write transactions

To start a read-write transaction, you can use the `DB.Update()` method:
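A minimal example:

```go
err := db.Update(func(txn *badger.Txn) error {
	// Your code here…
	return nil
})
```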
Badger allows all database operations inside a read-write transaction.
Always check the returned error value. If you return an error within your closure, it's passed through.
An `ErrConflict` error is reported in case of a conflict. Depending on the state
of your app, you have the option to retry the operation if you receive this
error.
An `ErrTxnTooBig` error is reported when the number of pending writes/deletes in
the transaction exceeds a certain limit. In that case, it's best to commit the
transaction and start a new transaction immediately. Here is an example (we
aren't checking for errors in some places for simplicity):
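A sketch along these lines, where `updates` is an assumed map of pending
key/value writes:

```go
updates := make(map[string]string) // assumed to be populated with pending writes
txn := db.NewTransaction(true)
for k, v := range updates {
	if err := txn.Set([]byte(k), []byte(v)); err == badger.ErrTxnTooBig {
		// Commit what we have so far and retry in a fresh transaction.
		_ = txn.Commit()
		txn = db.NewTransaction(true)
		_ = txn.Set([]byte(k), []byte(v))
	}
}
_ = txn.Commit()
```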
### Managing transactions manually

The `DB.View()` and `DB.Update()` methods are wrappers around the
`DB.NewTransaction()` and `Txn.Commit()` methods (or `Txn.Discard()` in case of
read-only transactions). These helper methods start the transaction, execute a
function, and then safely discard your transaction if an error is returned. This
is the recommended way to use Badger transactions.
However, sometimes you may want to manually create and commit your transactions.
You can use the `DB.NewTransaction()` function directly, which takes in a
boolean argument to specify whether a read-write transaction is required. For
read-write transactions, it's necessary to call `Txn.Commit()` to ensure the
transaction is committed. For read-only transactions, calling `Txn.Discard()` is
sufficient. `Txn.Commit()` also calls `Txn.Discard()` internally to clean up the
transaction, so just calling `Txn.Commit()` is sufficient for read-write
transactions. However, if your code doesn't call `Txn.Commit()` for some reason
(for example, it returns prematurely with an error), then make sure you call
`Txn.Discard()` in a `defer` block. Refer to the code below.
The first argument to `DB.NewTransaction()` is a boolean stating if the
transaction should be writable.
Badger allows an optional callback to the `Txn.Commit()` method. Normally, the
callback can be set to `nil`, and the method returns after all the writes have
succeeded. However, if this callback is provided, the `Txn.Commit()` method
returns as soon as it has checked for any conflicts. The actual writing to
disk happens asynchronously, and the callback is invoked once the writing has
finished, or an error has occurred. This can improve the throughput of the app
in some cases. But it also means that a transaction isn't durable until the
callback has been invoked with a `nil` error value.
## Using key/value pairs

To save a key/value pair, use the `Txn.Set()` method:
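For example:

```go
err := db.Update(func(txn *badger.Txn) error {
	return txn.Set([]byte("answer"), []byte("42"))
})
```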
Key/value pairs can also be saved by first creating an `Entry`, then setting
this `Entry` using `Txn.SetEntry()`. `Entry` also exposes methods to set
properties on it.
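For example:

```go
err := db.Update(func(txn *badger.Txn) error {
	e := badger.NewEntry([]byte("answer"), []byte("42"))
	return txn.SetEntry(e)
})
```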
This sets the value of the `"answer"` key to `"42"`. To retrieve this value, we
can use the `Txn.Get()` method:
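A sketch, using `Item.Value()` for access inside the transaction and
`Item.ValueCopy()` when the value must outlive it:

```go
err := db.View(func(txn *badger.Txn) error {
	item, err := txn.Get([]byte("answer"))
	if err != nil {
		return err
	}

	// Access the value within the transaction.
	err = item.Value(func(val []byte) error {
		fmt.Printf("The answer is: %s\n", val)
		return nil
	})
	if err != nil {
		return err
	}

	// Or copy it for use outside the transaction.
	valCopy, err := item.ValueCopy(nil)
	if err != nil {
		return err
	}
	fmt.Printf("The answer is: %s\n", valCopy)
	return nil
})
```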
`Txn.Get()` returns `ErrKeyNotFound` if the value isn't found.
Please note that values returned from `Get()` are only valid while the
transaction is open. If you need to use a value outside of the transaction then
you must use `copy()` to copy it to another byte slice.
Use the `Txn.Delete()` method to delete a key:
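```go
err := db.Update(func(txn *badger.Txn) error {
	return txn.Delete([]byte("answer"))
})
```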
## Monotonically increasing integers

To get unique monotonically increasing integers with strong durability, you can
use the `DB.GetSequence` method. This method returns a `Sequence` object, which
is thread-safe and can be used concurrently via various goroutines.
Badger leases a range of integers to hand out from memory, with the bandwidth
provided to `DB.GetSequence`. The frequency of disk writes is determined by
this lease bandwidth and the frequency of `Next` invocations. Setting a
bandwidth too low causes more disk writes; setting it too high results in
wasted integers if Badger is closed or crashes. To avoid wasted integers, call
`Release` before closing Badger.
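A sketch, with an assumed key name and a lease bandwidth of 1000:

```go
seq, err := db.GetSequence([]byte("user-id"), 1000)
if err != nil {
	log.Fatal(err)
}
defer seq.Release() // returns unused integers in the lease

num, err := seq.Next() // next unique integer
```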
## Merge operations

Badger provides support for ordered merge operations. You can define a func of
type `MergeFunc` which takes in an existing value and a value to be merged with
it. It returns a new value which is the result of the merge operation. All
values are specified in byte arrays. For example, here is a merge function
(`add`) which appends a `[]byte` value to an existing `[]byte` value:
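```go
// Merge function to append one byte slice to another.
func add(originalValue, newValue []byte) []byte {
	return append(originalValue, newValue...)
}
```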
This function can then be passed to the `DB.GetMergeOperator()` method, along
with a key and a duration value. The duration specifies how often the merge
function is run on values that have been added using the `MergeOperator.Add()`
method.
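A sketch (the `"merge"` key and the 200 ms interval are illustrative):

```go
key := []byte("merge")

m := db.GetMergeOperator(key, add, 200*time.Millisecond)
defer m.Stop()

m.Add([]byte("A"))
m.Add([]byte("B"))
m.Add([]byte("C"))

res, _ := m.Get() // res should contain "ABC"
```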
The `MergeOperator.Get()` method can be used to retrieve the cumulative value of
the key associated with the merge operation.
For example, here is a merge operator which increments a counter (a sketch
using big-endian `uint64` encoding):
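```go
func uint64ToBytes(i uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	return buf[:]
}

func bytesToUint64(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

// Merge function to add two uint64 numbers.
func add(existing, latest []byte) []byte {
	return uint64ToBytes(bytesToUint64(existing) + bytesToUint64(latest))
}
```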
It can be used as follows:
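```go
key := []byte("counter") // the key name is illustrative

m := db.GetMergeOperator(key, add, 200*time.Millisecond)
defer m.Stop()

m.Add(uint64ToBytes(1))
m.Add(uint64ToBytes(2))
m.Add(uint64ToBytes(3))

res, _ := m.Get() // res should have 6 encoded
```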
## Setting time to live (TTL) and user metadata on keys

Badger allows setting an optional Time to Live (TTL) value on keys. Once the TTL
has elapsed, the key is no longer retrievable and is eligible for garbage
collection. A TTL can be set as a `time.Duration` value using the
`Entry.WithTTL()` and `Txn.SetEntry()` API methods:
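```go
err := db.Update(func(txn *badger.Txn) error {
	e := badger.NewEntry([]byte("answer"), []byte("42")).WithTTL(time.Hour)
	return txn.SetEntry(e)
})
```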
An optional user metadata value can be set on each key. A user metadata value is
represented by a single byte. It can be used to set certain bits along with the
key to aid in interpreting or decoding the key-value pair. User metadata can be
set using the `Entry.WithMeta()` and `Txn.SetEntry()` API methods:
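```go
err := db.Update(func(txn *badger.Txn) error {
	e := badger.NewEntry([]byte("answer"), []byte("42")).WithMeta(byte(1))
	return txn.SetEntry(e)
})
```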
`Entry` APIs can be used to add user metadata and TTL for the same key. This
`Entry` can then be set using `Txn.SetEntry()`:
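```go
err := db.Update(func(txn *badger.Txn) error {
	e := badger.NewEntry([]byte("answer"), []byte("42")).
		WithMeta(byte(1)).
		WithTTL(time.Hour)
	return txn.SetEntry(e)
})
```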
## Iterating over keys

To iterate over keys, we can use an `Iterator`, which can be obtained using the
`Txn.NewIterator()` method. Iteration happens in byte-wise lexicographical
sorting order:
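```go
err := db.View(func(txn *badger.Txn) error {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchSize = 10
	it := txn.NewIterator(opts)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		k := item.Key()
		err := item.Value(func(v []byte) error {
			fmt.Printf("key=%s, value=%s\n", k, v)
			return nil
		})
		if err != nil {
			return err
		}
	}
	return nil
})
```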
The iterator allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.
By default, Badger prefetches the values of the next 100 items. You can adjust
that with the `IteratorOptions.PrefetchSize` field. However, setting it to a
value higher than `GOMAXPROCS` (which we recommend to be 128 or higher)
shouldn't give any additional benefits. You can also turn off the fetching of
values altogether. See the section below on key-only iteration.
### Prefix scans

To iterate over a key prefix, you can combine `Seek()` and `ValidForPrefix()`:
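A sketch (the `"1234"` prefix is illustrative):

```go
db.View(func(txn *badger.Txn) error {
	it := txn.NewIterator(badger.DefaultIteratorOptions)
	defer it.Close()
	prefix := []byte("1234")
	for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
		item := it.Item()
		k := item.Key()
		err := item.Value(func(v []byte) error {
			fmt.Printf("key=%s, value=%s\n", k, v)
			return nil
		})
		if err != nil {
			return err
		}
	}
	return nil
})
```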
#### Possible pagination implementation using prefix scans

Considering that iteration happens in byte-wise lexicographical sorting order,
it's possible to create a sorting-sensitive key. For example, a simple blog post
key might look like `feed:userUuid:timestamp:postUuid`. Here, the `timestamp`
part of the key is treated as an attribute, and items are stored in the
corresponding order:

| Order ASC | Key |
| --- | --- |
| 1 | feed:tQpnEDVRoCxTFQDvyQEzdo:1733127889:tQpnEDVRoCxTFQDvyQEzdo |
| 2 | feed:tQpnEDVRoCxTFQDvyQEzdo:1733127533:1Mryrou1xoekEaxzrFiHwL |
| 3 | feed:tQpnEDVRoCxTFQDvyQEzdo:1733127486:pprRrNL2WP4yfVXsSNBSx6 |

It's important to properly configure keys for lexicographical sorting to avoid
incorrect ordering.
A prefix scan through the preceding keys can be achieved using the prefix
`feed:tQpnEDVRoCxTFQDvyQEzdo`. All matching keys are returned, sorted by
`timestamp`.
Sorting can be done in ascending or descending order based on `timestamp` or
reversed `timestamp` as needed:
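For descending order, a sketch using the `IteratorOptions.Reverse` flag (in
reverse mode, `Seek` lands on the last key at or before the given key, so we
seek to the end of the prefix range by appending `0xff`):

```go
opts := badger.DefaultIteratorOptions
opts.Reverse = true
it := txn.NewIterator(opts)
defer it.Close()

prefix := []byte("feed:tQpnEDVRoCxTFQDvyQEzdo")
seekKey := append(append([]byte{}, prefix...), 0xff)
for it.Seek(seekKey); it.ValidForPrefix(prefix); it.Next() {
	// newest posts first…
}
```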
This makes it possible to implement simple pagination by using a limit for the number of keys and a cursor (the last key from the previous iteration) to identify where to resume.
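A sketch of such a cursor-based page fetch; the `fetchPage` helper and its
signature are illustrative, not part of the Badger API (imports omitted):

```go
func fetchPage(db *badger.DB, prefix, cursor []byte, limit int) (keys [][]byte, next []byte, err error) {
	err = db.View(func(txn *badger.Txn) error {
		it := txn.NewIterator(badger.DefaultIteratorOptions)
		defer it.Close()

		// Resume from the cursor (the last key of the previous page) if present.
		start := prefix
		if cursor != nil {
			start = cursor
		}
		for it.Seek(start); it.ValidForPrefix(prefix); it.Next() {
			k := it.Item().KeyCopy(nil)
			if cursor != nil && bytes.Equal(k, cursor) {
				continue // skip the cursor key itself
			}
			keys = append(keys, k)
			if len(keys) == limit {
				next = k // cursor for the next page
				break
			}
		}
		return nil
	})
	return keys, next, err
}
```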
### Key-only iteration

Badger supports a unique mode of iteration called key-only iteration. It's
several orders of magnitude faster than regular iteration, because it involves
access to the LSM-tree only, which is usually resident entirely in RAM. To
enable key-only iteration, you need to set the `IteratorOptions.PrefetchValues`
field to `false`. This can also be used to do sparse reads for selected keys
during an iteration, by calling `item.Value()` only when required:
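```go
err := db.View(func(txn *badger.Txn) error {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = false
	it := txn.NewIterator(opts)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		item := it.Item()
		k := item.Key()
		fmt.Printf("key=%s\n", k)
	}
	return nil
})
```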
## Stream

Badger provides a Stream framework, which concurrently iterates over all or a portion of the DB, converting data into custom key-values, and streams it out serially to be sent over network, written to disk, or even written back to Badger. This is a much faster way to iterate over Badger than using a single iterator. Stream supports Badger in both managed and normal mode.
Stream uses the natural boundaries created by SSTables within the LSM tree to
quickly generate key ranges. Each goroutine then picks a range and runs an
iterator to iterate over it. Each iterator iterates over all versions of values
and is created from the same transaction, thus working over a snapshot of the
DB. Every time a new key is encountered, it calls `ChooseKey(item)`, followed by
`KeyToList(key, itr)`. This allows a user to select or reject that key, and if
selected, convert the value versions into custom key-values. The goroutine
batches up 4 MB worth of key-values before sending it over to a channel.
Another goroutine further batches up data from this channel using a smart
batching algorithm and calls `Send` serially.
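A sketch of the Stream API; note that the `Send` callback signature has changed
across major versions, and this sketch assumes the v4 form, which receives a
`*z.Buffer` from ristretto's `z` package (imports omitted):

```go
stream := db.NewStream()
// Use db.NewStreamAt(readTs) instead for managed mode.

// Optional settings.
stream.NumGo = 16                     // number of goroutines used for iteration
stream.Prefix = []byte("some-prefix") // leave nil to iterate over the whole DB
stream.LogPrefix = "Badger.Streaming" // prefix for stream log lines

// ChooseKey is called concurrently for every key. If left nil, all keys are chosen.
stream.ChooseKey = func(item *badger.Item) bool {
	return bytes.HasSuffix(item.Key(), []byte("er"))
}

// KeyToList is called concurrently for chosen keys. If left nil, a default
// implementation picks all valid key-values.
stream.KeyToList = nil

// Send is called serially while Stream.Orchestrate is running.
stream.Send = func(buf *z.Buffer) error {
	// Process the batched key-values in buf…
	return nil
}

// Run the stream.
if err := stream.Orchestrate(context.Background()); err != nil {
	return err
}
```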
This framework is designed for high-throughput key-value iteration, spreading
the work of iteration across many goroutines. `DB.Backup` uses this framework to
provide full and incremental backups quickly. Dgraph is a heavy user of this
framework. In fact, this framework was developed and used within Dgraph before
getting ported over to Badger.
## Garbage collection

Badger values need to be garbage collected for two reasons:

- Badger keeps values separately from the LSM tree. This means that the
  compaction operations that clean up the LSM tree do not touch the values at
  all. Values need to be cleaned up separately.
- Concurrent read/write transactions could leave behind multiple values for a
  single key, because they're stored with different versions. These could
  accumulate, and take up unneeded space beyond the time these older versions
  are needed.

Badger relies on the client to perform garbage collection at a time of their
choosing. It provides the following methods, which can be invoked at an
appropriate time:

- `DB.RunValueLogGC()`: This method is designed to do garbage collection while
  Badger is online. Along with randomly picking a file, it uses statistics
  generated by the LSM-tree compactions to pick files that are likely to lead to
  maximum space reclamation. It's recommended to be called during periods of low
  activity in your system, or periodically. One call results in removal of at
  most one log file. As an optimization, you could also immediately re-run it
  whenever it returns a nil error (indicating a successful value log GC), as
  shown below.
- `DB.PurgeOlderVersions()`: This method is DEPRECATED since v1.5.0. Now,
  Badger's LSM tree automatically discards older/invalid versions of keys.
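A sketch of a periodic GC loop (the five-minute interval and 0.7 discard ratio
are illustrative):

```go
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
again:
	err := db.RunValueLogGC(0.7)
	if err == nil {
		goto again // a file was rewritten; try to reclaim more
	}
}
```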
The `RunValueLogGC` method doesn't garbage collect the latest value log.
## Database backup

There are two public API methods, `DB.Backup()` and `DB.Load()`, which can be
used to do online backups and restores. Badger v0.9 provides a CLI tool
`badger`, which can do offline backup/restore. Make sure you have `$GOPATH/bin`
in your PATH to use this tool.
The command below creates a version-agnostic backup of the database, to a file
`badger.bak` in the current working directory:
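```sh
# <path/to/badgerdb> is a placeholder for your database directory.
badger backup --dir <path/to/badgerdb>
```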
To restore `badger.bak` in the current working directory to a new database:
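```sh
badger restore --dir <path/to/badgerdb>
```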
See `badger --help` for more details.
If you have a Badger database that was created using v0.8 (or below), you can
use the `badger_backup` tool provided in v0.8.1, and then restore it using the
preceding command to upgrade your database to work with the latest version.
We recommend that all users use the `Backup` and `Restore` APIs and tools.
However, Badger is also rsync-friendly because all files are immutable, barring
the latest value log, which is append-only. So, rsync can be used as a
rudimentary way to perform a backup. In the following script, we repeat rsync to
ensure that the LSM tree remains consistent with the MANIFEST file while doing a
full backup:
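A sketch of such a script (the paths are illustrative; the second rsync pass
re-copies anything, including the MANIFEST, that changed while the first pass
was running):

```sh
#!/bin/bash
set -e

SRC=/path/to/badgerdb/   # Badger data directory
DST=/path/to/backup/     # backup destination

rsync -avz "$SRC" "$DST"
rsync -avz "$SRC" "$DST"
```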
## Memory usage

Badger's memory usage can be managed by tweaking several options available in
the `Options` struct that's passed in when opening the database using `DB.Open`.

- Number of memtables (`Options.NumMemtables`)
  - If you modify `Options.NumMemtables`, also adjust
    `Options.NumLevelZeroTables` and `Options.NumLevelZeroTablesStall`
    accordingly.
- Number of concurrent compactions (`Options.NumCompactors`)
- Size of table (`Options.BaseTableSize`)
- Size of value log file (`Options.ValueLogFileSize`)
If you want to decrease the memory usage of a Badger instance, tweak these options (ideally one at a time) until you achieve the desired memory usage.
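A sketch using the option builders (the values are illustrative, not tuned
recommendations):

```go
opt := badger.DefaultOptions("/tmp/badger").
	WithNumMemtables(2).
	WithNumLevelZeroTables(2).
	WithNumLevelZeroTablesStall(4).
	WithNumCompactors(2).
	WithBaseTableSize(2 << 20).    // 2 MB tables
	WithValueLogFileSize(64 << 20) // 64 MB value log files
db, err := badger.Open(opt)
```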