Offline Pruning in Utility Chain

Introduction#

Offline Pruning is ported from go-ethereum to reduce the amount of disk space taken up by the TrieDB (storage for the Merkle Forest).

Offline pruning creates a bloom filter and adds all trie nodes in the active state to the bloom filter to mark the data as protected. This ensures that any part of the active state will not be removed during offline pruning.

After generating the bloom filter, offline pruning iterates over the database and searches for trie nodes that are safe to be removed from disk.

A bloom filter is a probabilistic data structure that reports whether an item is definitely not in a set or possibly in a set. Therefore, for each key we iterate, we check if it is in the bloom filter. If the key is definitely not in the bloom filter, then it is not in the active state and we can safely delete it. If the key is possibly in the set, then we skip over it to ensure we do not delete any active state.

During iteration, the underlying database (LevelDB) writes deletion markers, causing a temporary increase in disk usage.

After iterating over the database and deleting any old trie nodes that it can, offline pruning then runs compaction to minimize the DB size after the potentially large number of delete operations.

Finding the Utility Chain Config File#

In order to enable offline pruning, you need to update the Utility Chain config file to include the parameters offline-pruning-enabled and offline-pruning-data-directory.

The default location of the Utility Chain config file is ~/.dijetsnodego/configs/chains/C/config.json. Please note that by default, this file does not exist. You would need to create it manually. You can update the directory for chain configs by passing in the directory of your choice via the CLI argument: chain-config-dir. See this for more info. For example, if you start your node with:

./build/dijetsnodego --chain-config-dir=/home/ubuntu/chain-configs

The chain config directory will be updated to /home/ubuntu/chain-configs and the corresponding Utility Chain config file will be: /home/ubuntu/chain-configs/C/config.json.

Running Offline Pruning#

In order to enable offline pruning, update the Utility Chain config file to include the following parameters:

{
  "offline-pruning-enabled": true,
  "offline-pruning-data-directory": "/home/ubuntu/offline-pruning"
}

This will set /home/ubuntu/offline-pruning as the directory to be used by the offline pruner. Offline pruning will store the bloom filter in this location, so you must ensure that the path exists.

Now that the Utility Chain config file has been updated, you can start your node with the command (no CLI arguments are necessary if using the default chain config directory):

./build/dijetsnodego

The bloom filter should be populated and committed to disk after about 5 minutes. At this point, if the node shuts down, it will resume the offline pruning session when it restarts (note: this operation cannot be cancelled).

In order to ensure that users do not mistakenly leave offline pruning enabled for the long term (which could result in an hour of downtime on each restart), we have added a manual protection which requires that after an offline pruning session, the node must be started with offline pruning disabled at least once before it will start with offline pruning enabled again. Therefore, once the bloom filter has been committed to disk, you should update the Utility Chain config file to include the following parameters:

{
  "offline-pruning-enabled": false,
  "offline-pruning-data-directory": "/home/ubuntu/offline-pruning"
}

It is important to keep the same data directory in the config file, so that the node knows where to look for the bloom filter on a restart if offline pruning has not finished.

Now if your node restarts, it will be marked as having correctly disabled offline pruning after the run and be allowed to resume normal operation once offline pruning has finished running.

Disk Space Considerations#

To ensure the node does not enter an inconsistent state, the bloom filter used for pruning is persisted to offline-pruning-data-directory for the duration of the operation. This directory should have offline-pruning-bloom-filter-size available in disk space (default 512 MB).

The underlying database (LevelDB) uses deletion markers (tombstones) to identify newly deleted keys. These markers are temporarily persisted to disk until they are removed during a process known as compaction. This will lead to an increase in disk usage during pruning. If your node runs out of disk space during pruning, you may safely restart the pruning operation. This may succeed as restarting the node triggers compaction.

If restarting the pruning operation does not succeed, additional disk space should be provisioned.