Spent UTXO Pruning

This guide explains how to optimize disk usage in cardano-rosetta-java through spent UTXO pruning, including its impact on Rosetta API endpoints and configuration options.

Understanding Spent UTXO Pruning

Spent UTXO pruning is a disk optimization mechanism in cardano-rosetta-java, powered by its underlying indexer, Yaci-Store. This feature selectively removes data related to spent UTXOs from the local database.

Core Principles:

  • Targeted Deletion: Only spent UTXOs are removed. All current, unspent UTXOs are preserved, ensuring the accuracy of the present blockchain state and balances.
  • Distinction from Other Pruning: This mechanism differs from what is commonly understood as 'pruning' in some other blockchain contexts, including certain descriptions in the Coinbase Mesh API (formerly Rosetta). Unlike methods such as Bitcoin's pruning (which removes entire historical blocks), our approach retains full block history but selectively trims the UTXO set by removing only spent outputs.

How it Works: When enabled, the pruning process operates as follows:

  1. New UTXOs are indexed as transactions occur.
  2. UTXOs are marked as spent when consumed in subsequent transactions.
  3. A background job periodically and permanently deletes spent UTXOs that are older than a configurable safety margin (default: 129,600 blocks, ~30 days on mainnet). This buffer safeguards data integrity against chain rollbacks within Cardano's finality window.
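The three steps above can be sketched as a small in-memory model. This is only an illustration: the class and method names are hypothetical, Yaci-Store's real schema and pruning job are not shown in this guide, and the sketch assumes the safety margin is measured from the block in which a UTXO was spent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical model for illustration; not Yaci-Store's actual data structures.
UtxoKey = Tuple[str, int]  # (transaction hash, output index)


@dataclass
class Utxo:
    created_at_block: int
    spent_at_block: Optional[int] = None  # None means the output is still unspent


class UtxoIndex:
    def __init__(self, grace_blocks: int = 129_600) -> None:
        # Mirrors the role of REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT.
        self.grace_blocks = grace_blocks
        self.utxos: Dict[UtxoKey, Utxo] = {}

    def add(self, key: UtxoKey, block: int) -> None:
        """Step 1: index a new UTXO when its transaction appears."""
        self.utxos[key] = Utxo(created_at_block=block)

    def spend(self, key: UtxoKey, block: int) -> None:
        """Step 2: mark a UTXO as spent by a later transaction."""
        self.utxos[key].spent_at_block = block

    def prune(self, tip_block: int) -> int:
        """Step 3: delete UTXOs spent before the safety margin; return the count."""
        cutoff = tip_block - self.grace_blocks
        stale = [
            key
            for key, utxo in self.utxos.items()
            if utxo.spent_at_block is not None and utxo.spent_at_block < cutoff
        ]
        for key in stale:
            del self.utxos[key]
        return len(stale)

    def unspent(self) -> Set[UtxoKey]:
        """Current UTXO set; pruning never removes these entries."""
        return {k for k, u in self.utxos.items() if u.spent_at_block is None}
```

In this model, an output spent long before the cutoff disappears on the next `prune` call, while unspent outputs and recently spent outputs remain available.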

Impact Summary:

| Aspect | Effect |
| --- | --- |
| Disk Storage | ✅ Significantly reduced (e.g., mainnet from ~1TB to ~500GB) |
| Current UTXO Set | ✅ Fully preserved; current balances remain accurate |
| Historical Spent UTXOs | ⚠️ Permanently deleted beyond the safety margin |
| Query Performance | ✅ Improved for queries against the current UTXO set |

Impact on Rosetta API Endpoints

Spent UTXO pruning affects Rosetta API endpoints differently based on their reliance on historical transaction data. The table below summarizes the impact. Note that "Recent" refers to data within the safety margin (default ~30 days).

Oldest Block Identifier

When pruning is enabled, the /network/status endpoint includes an additional oldest_block_identifier object in its response. This identifier corresponds to the oldest block that is still fully queryable with complete data. Below this block index, blocks might have missing data due to pruning, making historical queries unreliable.
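A client could use this field to decide whether a historical query is safe to trust. The sketch below assumes an already-parsed /network/status response; the helper names are hypothetical, and the sample indices and hashes are made-up placeholders rather than real chain data. When the field is absent, the Rosetta specification treats the genesis block as the oldest queryable block, which the sketch interprets as "no pruning limit".

```python
from typing import Any, Dict, Optional


def oldest_queryable_index(status: Dict[str, Any]) -> Optional[int]:
    """Return the index from oldest_block_identifier, or None if it is absent."""
    oldest = status.get("oldest_block_identifier")
    if oldest is None:
        return None
    return int(oldest["index"])


def is_fully_queryable(status: Dict[str, Any], block_index: int) -> bool:
    """True if block_index is at or above the oldest fully queryable block."""
    oldest = oldest_queryable_index(status)
    return oldest is None or block_index >= oldest


# Illustrative response fragment; values are placeholders, not real data.
SAMPLE_STATUS = {
    "current_block_identifier": {"index": 12_000_000, "hash": "<current-hash>"},
    "oldest_block_identifier": {"index": 11_870_400, "hash": "<oldest-hash>"},
}
```

A caller might check `is_fully_queryable(status, index)` before requesting /block for an old index, and route the request to a non-pruned instance when it returns False.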

| Endpoint | Current State | Historical Queries | Impact & Notes |
| --- | --- | --- | --- |
| /account/balance | ✅ Works | ⚠️ Limited | Low - Current balances unaffected |
| /account/coins | ✅ Works | ⚠️ Limited | Low - Current UTXO lists complete |
| /block | ✅ Recent only | ❌ Incomplete | High - Missing old transaction inputs |
| /block/transaction | ✅ Recent only | ❌ Incomplete | High - Missing spent UTXOs operation details |
| /search/transactions | ⚠️ Recent only | ❌ Limited | Medium - Hash search works, address limited |
| /network/status | ✅ Works | ✅ Works | None - Returns additional oldest_block_identifier when pruning enabled |
| /network/* | ✅ Works | ✅ Works | None - Independent of UTXO data |
| /construction/* | ✅ Works | ✅ Works | None - Uses current UTXOs only |

After enabling pruning, searching for transactions by their hash will always work, because transaction records themselves are never pruned. However, searching by address is limited: address-based searches rely on the UTXO set, and once spent UTXOs older than the pruning window are deleted, only transactions involving current or recently spent UTXOs can be found by address. Older history is not returned once pruned.
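The two search modes differ only in which identifier the request carries. The sketch below builds both request bodies for /search/transactions using the `transaction_identifier` and `account_identifier` fields from the Rosetta specification. The helper names are hypothetical, and the `network_identifier` values are an assumption for a mainnet deployment, so adjust them to match your instance.

```python
from typing import Any, Dict

# Assumed network identifier for a mainnet deployment; adjust as needed.
NETWORK_IDENTIFIER = {"blockchain": "cardano", "network": "mainnet"}


def search_by_hash(tx_hash: str) -> Dict[str, Any]:
    """Request body for a hash lookup; unaffected by spent UTXO pruning."""
    return {
        "network_identifier": NETWORK_IDENTIFIER,
        "transaction_identifier": {"hash": tx_hash},
    }


def search_by_address(address: str) -> Dict[str, Any]:
    """Request body for an address lookup; on a pruned instance this only
    covers transactions tied to current or recently spent UTXOs."""
    return {
        "network_identifier": NETWORK_IDENTIFIER,
        "account_identifier": {"address": address},
    }
```

If complete address history matters, the address form of the request is the one to send to a non-pruned instance.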

When Should Spent UTXO Pruning Be Enabled?

Recommended Use Cases

Pruning improves performance by reducing the amount of data processed during API responses, leading to faster query times and lower resource consumption. It also optimizes disk space by focusing on current data rather than maintaining a complete historical record. Consider enabling pruning if your use case aligns with the following:

  • Exchange Integrations & Wallet Services: Primarily for tracking current balances, processing recent deposits/withdrawals, and validating recent transactions.
  • Resource-Constrained Environments: Ideal when disk space is a significant limitation (e.g., under 1TB available for mainnet data).
  • Tip-of-Chain Operations: For applications focused on the latest blockchain state rather than deep historical analysis.
  • Development and Testing: Useful when a full historical dataset is not essential for development or testing purposes.

Hybrid Deployment Strategy

We recommend running pruned nodes for live, day-to-day operations to benefit from performance improvements and reduced storage needs, while maintaining non-pruned (full-history) backup nodes to handle historical transaction reconciliation or audit-related queries as needed.

When Should Spent UTXO Pruning Be Avoided?

Not Suitable For

Avoid pruning if your operational or regulatory requirements necessitate access to complete and auditable historical blockchain data. Pruning is generally not suitable if you need:

  • Complete Historical Data & Deep Queries: For comprehensive auditing, compliance, data analytics, or block explorer-like functionality that requires querying full transaction history from any point in time.
  • Strict Compliance and Audit Trails: If regulatory mandates demand immutable, complete historical records. Pruned data cannot be recovered without a full resync, and historical queries for /block and /block/transaction become unreliable beyond the safety window.

Data Loss Warning

Once data is pruned, it cannot be recovered without a full blockchain resynchronization. Assess your historical data needs carefully before enabling pruning.

Configuration

Spent UTXO pruning is configured via environment variables, typically set in your .env.dockerfile or .env.docker-compose file:

# --- Spent UTXO Pruning Configuration ---

# Enable or disable spent UTXO pruning.
# Default: true (Pruning is enabled by default)
# To disable, set to: false
REMOVE_SPENT_UTXOS=true

# Safety margin: Number of recent blocks for which spent UTXOs are retained.
# Default: 129600 (approximately 30 days of blocks on mainnet)
# This value balances safety for rollbacks against storage savings.
# Example: To keep ~7 days of spent UTXOs, set to 30240.
# Note: Larger REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT values provide longer historical query support
# but use more disk space and delay the realization of storage benefits.
REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT=129600

Configuration Guidelines
  • Default settings have pruning enabled (REMOVE_SPENT_UTXOS=true) for optimal storage efficiency.
  • The provided defaults (REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT=129600) offer ~30 days of rollback safety on mainnet.
  • Decrease REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT for more aggressive space savings (e.g., 2160 for ~12 hours); increase it if you need a longer historical query window.
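The block counts quoted in this section are consistent with an average mainnet block time of roughly 20 seconds (129,600 blocks × 20 s = 30 days). The helpers below are a hypothetical convenience for converting between a retention window and a value for REMOVE_SPENT_UTXOS_LAST_BLOCKS_GRACE_COUNT. The 20-second figure is an assumption inferred from the numbers above, and actual block production varies.

```python
SECONDS_PER_DAY = 86_400
# Assumed average mainnet block time, inferred from "129600 blocks ≈ 30 days".
AVG_BLOCK_TIME_SECONDS = 20


def grace_blocks_for_days(days: float) -> int:
    """Approximate number of blocks produced in the given number of days."""
    return round(days * SECONDS_PER_DAY / AVG_BLOCK_TIME_SECONDS)


def days_for_grace_blocks(blocks: int) -> float:
    """Approximate retention window, in days, for a given block count."""
    return blocks * AVG_BLOCK_TIME_SECONDS / SECONDS_PER_DAY
```

Under that assumption, `grace_blocks_for_days(7)` gives 30240 and `grace_blocks_for_days(0.5)` gives 2160, matching the examples in this section.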

Migration and Operational Notes

This section outlines key considerations when changing pruning settings or managing a system with pruning enabled.

Changing Pruning Settings on an Existing Deployment

To change the pruning configuration, update the REMOVE_SPENT_UTXOS variable in your environment (to either true or false) and restart your cardano-rosetta-java services.

Resynchronization Is Required to Apply Changes

It is critical to understand that disabling pruning only affects how new blocks are handled; it does not retroactively alter your existing database. To restore the full historical data on an instance that has already been pruned, an indexer resynchronization is required.

  • When disabling pruning (false), a resync is required to rebuild the complete transaction history that was previously pruned away.
  • When enabling pruning (true), a resync is not strictly required: the background job removes historically spent UTXOs on its own. The freed disk space, however, is only returned after a VACUUM FULL or a resync.

Without a resynchronization, your database will exist in a mixed state, and you will not see the expected results of your configuration change immediately.

How to Resynchronize the Indexer

The resynchronization process rebuilds the indexer database from your existing Cardano node data, which is much faster than resyncing the entire blockchain from scratch.

This is necessary in two main scenarios:

  • To reclaim disk space: When you enable pruning on an existing instance, a resync rebuilds the database without historically spent UTXOs. If you keep the existing database instead, the spent UTXOs are deleted once REMOVE_SPENT_UTXOS is enabled and the indexer has fully synced, but the files on disk do not shrink until you manually reclaim the space using PostgreSQL's VACUUM FULL command on the affected tables:

    # Connect to the PostgreSQL database
    docker exec -e PGPASSWORD="<password>" -it <postgres-container-name> psql -U <username> -d rosetta-java

    -- Inside psql: select the schema that holds the indexer tables
    SET search_path TO mainnet;

    -- Run VACUUM FULL on the tables that store UTXO data
    VACUUM FULL tx_input;
    VACUUM FULL address_utxo;

    Database Maintenance Required

    The VACUUM FULL operation requires an exclusive lock on the tables and can take significant time to complete. Plan for potential downtime during this maintenance operation. In addition, sufficient free disk space must be available, since VACUUM FULL rewrites each table by creating a new copy before replacing the old one. For example, in this case you would need approximately 400 GB of free space to complete the operation. The process will:

    • Reclaim disk space: Remove the gaps left by deleted spent UTXOs
    • Reorganize table data: Compact the remaining data for better performance
    • Require exclusive access: Block all other operations on these tables during execution

    Alternative Approach

    If you prefer to avoid the VACUUM FULL maintenance window, performing a complete indexer resynchronization (as described above) will achieve the same disk space reclamation while rebuilding the database from scratch with the new pruning configuration.

  • To restore full history: When you disable pruning, a resync will rebuild the complete transaction history that was previously pruned away.

Quick Resynchronization Steps
  1. Stop the stack: Gracefully shut down your services using docker compose down.

  2. Remove the indexer volume: Delete the persistent storage used by the indexer's Postgres database (do not touch the Cardano node data).

    # If your compose file uses a bind mount (default):
    sudo rm -rf "${DB_PATH}" # replace ${DB_PATH} with the value from your .env file

  3. Restart the stack: Start the services again with docker compose up -d. The indexer will begin resyncing from the node, applying your new configuration.

Do Not Delete Cardano Node Data

Removing the node's data volume is unnecessary for this process and will trigger a full, time-consuming blockchain resynchronization, leading to significant downtime.
