Vitalik's interpretation of The Purge: Key challenges and solutions for the long-term development of Ethereum

Vitalik: The Possible Future of Ethereum, The Purge

One of the challenges facing Ethereum is that, by default, the expansion and complexity of any blockchain protocol tend to increase over time. This occurs in two areas:

Historical data: every transaction ever made and every account ever created must be stored permanently by all clients and downloaded by every new client doing a full sync. This causes client load and sync time to keep growing over time, even if the chain's capacity stays the same.

Protocol Functionality: Adding new features is much easier than removing old ones, leading to increased code complexity over time.

For Ethereum to sustain itself in the long term, we need to apply strong counter-pressure to both of these trends, reducing complexity and bloat over time. But at the same time, we need to preserve one of the key properties that make blockchains great: permanence. You can put an NFT, a love letter in transaction calldata, or a smart contract holding a million dollars on chain, go into a cave for ten years, and come out to find it still there waiting for you to read and interact with. For DApps to feel comfortable fully decentralizing and removing their upgrade keys, they need to be confident that their dependencies will not upgrade in a way that breaks them - especially L1 itself.

If we are determined to strike a balance between these two demands, minimizing or reversing bloat, complexity, and decay while maintaining continuity, this is absolutely possible. Living organisms can do it: while most age over time, a lucky few do not. Even social systems can have extremely long lifespans. Ethereum has already succeeded in some cases: proof of work is gone, the SELFDESTRUCT opcode has mostly vanished, and beacon chain nodes already store old data for only up to about six months. Finding this path for Ethereum in a more generalized way, and moving toward a stable long-term end state, is the ultimate challenge for Ethereum's long-term scalability, technical sustainability, and even security.


The Purge: Main Objectives

Reduce the storage requirements for clients by minimizing or eliminating the need for each node to permanently store all history, and perhaps eventually even the state.

Reduce protocol complexity by eliminating unnecessary features.

Table of Contents:

History expiry

State expiry

Feature cleanup

History expiry

What problem does it solve?

As of the time of writing, a fully synchronized Ethereum node requires about 1.1 TB of disk space for the execution client, plus several hundred GB more for the consensus client. The vast majority of this is history: data about historical blocks, transactions, and receipts, most of which are many years old. This means that even if the gas limit does not increase at all, node disk usage will keep growing by several hundred GB each year.

What is it and how does it work?

A key simplifying property of the historical-storage problem is that, because each block points to the previous block via a hash link (and other structures), reaching consensus on the present is sufficient to reach consensus on the history. As long as the network agrees on the latest block, any historical block, transaction, or state (account balance, nonce, code, storage) can be provided by any single participant along with a Merkle proof, and that proof lets anyone else verify its correctness. While consensus is an N/2-of-N trust model, history is a 1-of-N trust model: only a single honest data provider needs to exist.
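This 1-of-N property can be illustrated with a small sketch (a hypothetical `Block` structure; real Ethereum headers commit to transactions, receipts, and state via Merkle roots rather than a flat body). Given only a trusted head hash, a single untrusted peer can serve old blocks, and we verify them ourselves by walking the hash chain:

```python
import hashlib
from dataclasses import dataclass

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

@dataclass(frozen=True)
class Block:
    parent_hash: bytes
    body: bytes  # stand-in for transactions, receipts, etc.

    def hash(self) -> bytes:
        return h(self.parent_hash + self.body)

def verify_historical_block(head_hash: bytes, chain: list[Block], target_index: int) -> bytes:
    """Given only a trusted head hash, verify the contents of an old block
    served by a single (possibly dishonest) peer."""
    expected = head_hash
    for block in reversed(chain):
        if block.hash() != expected:
            raise ValueError("hash chain broken: untrusted data")
        expected = block.parent_hash
    return chain[target_index].body
```

If the peer tampers with any block, the hash link to the trusted head breaks and verification fails; honesty of any one provider is enough.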

This gives us many options for how to store history. A natural choice is a network in which each node stores only a small portion of the data. This is how torrent networks have operated for decades: while the network as a whole stores and distributes millions of files, each participant stores and distributes only a few of them. Perhaps counterintuitively, this approach does not even necessarily reduce the robustness of the data. If, by making nodes cheaper to run, we can get to a network of 100,000 nodes where each node stores a random 10% of the history, then each piece of data is replicated 10,000 times - exactly the same replication factor as a 10,000-node network where every node stores everything.
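The replication arithmetic above is easy to check with a toy model (illustrative only, assuming each node independently stores a uniform random fraction of the data):

```python
def expected_copies(num_nodes: int, fraction_stored: float) -> float:
    # Each node independently stores a random `fraction_stored` of history,
    # so the expected number of copies of any given piece of data is:
    return num_nodes * fraction_stored

def prob_data_lost(num_nodes: int, fraction_stored: float) -> float:
    # Probability that no node at all holds a given piece of data,
    # under the independent uniform-random-selection assumption.
    return (1 - fraction_stored) ** num_nodes
```

Under this model, 100,000 nodes each storing 10% give the same expected 10,000 copies as 10,000 nodes storing everything, and the probability of any piece vanishing entirely is astronomically small.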

Ethereum has already begun to move away from the model where every node stores all history forever. Consensus blocks (i.e., the parts related to proof-of-stake consensus) are stored for only about six months. Blobs are stored for only about 18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. The long-term goal is to establish a unified retention period (perhaps around 18 days) during which each node is responsible for storing everything, and then store older data in a distributed way in a peer-to-peer network made up of Ethereum nodes.
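A node's local retention policy under this model could be sketched as follows (the windows are the approximate figures quoted above, not exact consensus parameters from any client):

```python
from datetime import timedelta

# Rough retention windows described above (illustrative constants):
RETENTION = {
    "consensus_blocks": timedelta(days=180),          # ~6 months
    "blobs": timedelta(days=18),                      # ~18 days
    "execution_history_eip4444": timedelta(days=365), # ~1 year
}

def must_store(kind: str, age: timedelta) -> bool:
    """Whether a node is still responsible for storing this data locally;
    anything older would be fetched from the distributed history network."""
    return age <= RETENTION[kind]
```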


Erasure codes can be used to improve robustness while keeping the replication factor the same. In fact, blobs are already erasure-coded to support data availability sampling. The simplest solution may well be to reuse this erasure coding and put execution- and consensus-block data into blobs as well.
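A toy illustration of the erasure-coding idea, using a single XOR parity chunk (real blob erasure coding uses Reed-Solomon codes over a finite field and tolerates far more loss; this shows only the simplest instance of the principle):

```python
def add_parity(chunks: list[bytes]) -> list[bytes]:
    """Append one XOR parity chunk, so any single lost chunk can be rebuilt."""
    parity = bytes(len(chunks[0]))
    for c in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, c))
    return chunks + [parity]

def recover(coded: list) -> list[bytes]:
    """Rebuild at most one missing chunk (marked None), return the originals."""
    missing = [i for i, c in enumerate(coded) if c is None]
    assert len(missing) <= 1, "XOR parity can only repair one lost chunk"
    if missing:
        rebuilt = bytes(len(next(c for c in coded if c is not None)))
        for c in coded:
            if c is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, c))
        coded[missing[0]] = rebuilt
    return coded[:-1]
```

The point is that robustness comes from the code, not from extra full copies: storage overhead here is one chunk, yet any one loss is recoverable.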

What are the connections to existing research?

EIP-4444;

Torrents and EIP-4444;

Portal Network;

Portal Network and EIP-4444;

Distributed storage and retrieval of SSZ objects in Portal;

How to increase the gas limit (Paradigm).

What else needs to be done, and what needs to be weighed?

The remaining major work includes building and integrating a concrete distributed solution for storing history - at least execution history, but eventually consensus data and blobs as well. The simplest solutions are to simply drop in existing torrent libraries, or an Ethereum-native solution called the Portal Network. Once either of these is in place, we can turn on EIP-4444. EIP-4444 itself does not require a hard fork, but it does require a new network protocol version. It is therefore valuable to enable it for all clients at the same time; otherwise there is a risk of clients failing because they connect to peers expecting to download the full history but never actually receiving it.

The main trade-off is how hard we try to make "ancient" historical data available. The easiest solution would be to simply stop storing ancient history tomorrow and rely on existing archive nodes and various centralized providers for replication. This is easy, but it undermines Ethereum's position as a place for permanent records. A harder but safer path is to first build and integrate a torrent network to store history in a distributed way. Here, "how hard we try" has two dimensions:

How hard do we try to ensure that a maximally large set of nodes really stores all the data?

How deep is the integration of historical storage into the protocol?

An extremely paranoid approach to (1) would involve proofs of custody: actually requiring each proof-of-stake validator to store a certain fraction of the history and regularly prove cryptographically that it is doing so. A more moderate approach is to set a voluntary standard for the percentage of history each client stores.
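A hypothetical sketch of the challenge-response shape such a proof of custody could take (invented function names; the real designs under discussion are more sophisticated and tie responses to validator keys):

```python
import hashlib

def custody_challenge(seed: bytes, num_chunks: int, sample_size: int) -> list[int]:
    """Derive which historical chunks a validator must prove it holds,
    pseudo-randomly from an unpredictable seed (e.g. a recent block hash)."""
    indices, counter = [], 0
    while len(indices) < sample_size:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        idx = int.from_bytes(digest[:8], "big") % num_chunks
        if idx not in indices:
            indices.append(idx)
        counter += 1
    return indices

def custody_response(store: dict, challenge: list[int], salt: bytes) -> bytes:
    # Hash the challenged chunks together with a salt, so responses cannot
    # be precomputed before the challenge is known.
    h = hashlib.sha256(salt)
    for i in challenge:
        h.update(store[i])  # KeyError here means the validator lost the data
    return h.digest()
```

A verifier holding the same chunks recomputes the response and checks equality; a validator that discarded its assigned chunks cannot answer.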

For (2), the basic implementation involves only work that is already done today: Portal already stores ERA files containing the entire history of Ethereum. A more thorough implementation would actually connect this to the sync process, so that anyone who wants to sync a full-history-storing node or an archive node could do so by syncing directly from the Portal network, even if no other archive node is online.

How does it interact with other parts of the roadmap?

If we want to make it extremely easy to run or spin up a node, then reducing historical storage requirements is arguably even more important than statelessness: of the 1.1 TB a node needs, about 300 GB is state and the remaining roughly 800 GB is history. Only by achieving both statelessness and EIP-4444 can we realize the vision of running an Ethereum node on a smartwatch and setting one up in just a few minutes.

Limiting historical storage also makes it more viable for newer Ethereum node implementations to support only recent versions of the protocol, which makes them much simpler. For example, many lines of code can now be safely removed because the empty storage slots created during the 2016 DoS attacks have all been deleted. Now that the transition to proof of stake is itself history, clients can safely remove all proof-of-work-related code.

State expiry

What problem does it solve?

Even if we eliminate the need for clients to store history, clients' storage requirements will keep growing by about 50 GB per year as the state keeps growing: account balances and nonces, contract code, and contract storage. Users can pay a one-time fee and thereby burden current and future Ethereum clients forever.

State is harder to "expire" than history, because the EVM is fundamentally designed around the assumption that once a state object is created, it exists forever and can be read by any transaction at any time. If we introduce statelessness, some argue the problem may not be so bad: only a specialized class of block builders would need to actually store state, while all other nodes (even inclusion list producers!) can run statelessly. However, there is a view that we should not lean too heavily on statelessness, and that we may ultimately want state to expire in order to keep Ethereum decentralized.

What is it and how does it work?

Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously untouched storage slot), that state object lives forever. What we want instead is for objects to automatically expire over time. The key challenge is doing this in a way that achieves three goals:

Efficiency: No need for a lot of extra computation to run the expiration process.

User-friendliness: If someone enters a cave for five years and comes back, they should not lose access to their ETH, ERC20, NFT, and CDP positions...

Developer friendliness: developers should not have to switch to a completely unfamiliar mental model. In addition, applications that are ossified today and no longer updated should continue to work normally.

It is easy to solve the problem without achieving these goals. For example, you could have every state object also store an expiry-date counter (which can be extended by burning ETH, and this could happen automatically on any read or write), plus a process that loops through the state removing objects whose date has expired. However, this introduces extra computation (and even extra storage), and it certainly fails the user-friendliness requirement. Developers would also struggle to reason about edge cases where stored values sometimes reset to zero. Setting the expiry timer at the contract level technically makes developers' lives easier, but it makes the economics harder: developers must think about how to "pass through" ongoing storage costs to their users.
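The strawman described above - per-object expiry counters renewed on access, plus a sweep loop - can be sketched as follows, drawbacks included (illustrative class, not any actual proposal's mechanism):

```python
class ExpiringState:
    """Naive per-object expiry: reads and writes renew the object's lease,
    and a sweep loop deletes anything past its deadline."""

    def __init__(self, lease: int):
        self.lease = lease                      # blocks of life per touch
        self.objects: dict[str, bytes] = {}
        self.expiry: dict[str, int] = {}

    def touch(self, key: str, now: int, value: bytes = None):
        if value is not None:
            self.objects[key] = value
        self.expiry[key] = now + self.lease     # in the strawman, renewing burns ETH

    def read(self, key: str, now: int):
        if key in self.objects:
            self.touch(key, now)                # reading renews the lease
            return self.objects[key]
        return None                             # expired: reads back as empty

    def sweep(self, now: int):
        # Extra work every block: scan for and delete expired objects.
        for key in [k for k, exp in self.expiry.items() if exp < now]:
            del self.objects[key], self.expiry[key]
```

The code makes the failure modes concrete: `sweep` is extra per-block work, and a user who stops touching their objects (the cave-dweller) silently loses them.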

These are problems the Ethereum core development community has worked on for many years, including proposals such as "blockchain rent" and "regenesis". Ultimately, we combined the best parts of these proposals and converged on two categories of "least-bad known solutions":

  • Partial state-expiry proposals
  • Address-period-based state-expiry proposals.

Partial state expiry

The partial state-expiry proposals all follow the same principle. We divide the state into chunks. Everyone permanently stores the "top-level map" recording which chunks are empty or nonempty. The data within each chunk is stored only if it has been accessed recently, and there is a "resurrection" mechanism for when a chunk is no longer stored.
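A minimal sketch of this shared skeleton (hypothetical class; real proposals define chunks and resurrection proofs over the state trie itself):

```python
class ChunkedState:
    """Permanent top-level map of which chunks are nonempty; chunk data is
    kept locally only if accessed recently; dropped chunks can be resurrected."""

    def __init__(self, retention: int):
        self.retention = retention
        self.top_level: dict[int, bool] = {}   # chunk id -> ever nonempty?
        self.chunks: dict[int, dict] = {}      # locally-held chunk data
        self.last_access: dict[int, int] = {}

    def write(self, chunk_id: int, key: str, value, now: int):
        self.top_level[chunk_id] = True
        self.chunks.setdefault(chunk_id, {})[key] = value
        self.last_access[chunk_id] = now

    def expire(self, now: int):
        # Drop chunk *data* not touched recently; the top-level map stays.
        for cid in [c for c, t in self.last_access.items()
                    if now - t > self.retention]:
            self.chunks.pop(cid, None)
            del self.last_access[cid]

    def resurrect(self, chunk_id: int, proved_data: dict, now: int):
        # A user resupplies the old chunk contents along with (in a real
        # design) a proof against a historical state root; proof elided here.
        assert self.top_level.get(chunk_id), "chunk was never nonempty"
        self.chunks[chunk_id] = proved_data
        self.last_access[chunk_id] = now
```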

![Vitalik: The Possible Future of Ethereum, The Purge](https://img-cdn.gateio.im/webp-social/moments-a97b8c7f7927e17a3ec0fa46a48c9f24.webp)

The main differences between these proposals are: (i) how we define "recently", and (ii) how we define "chunk".
