Ethereum's State Machine: Accounts, Storage, and the World State Trie

State as the Foundation of a World Computer

Ethereum's ambition — a programmable blockchain that can run arbitrary smart contracts — required a fundamentally different data model from Bitcoin. Where Bitcoin's state is a set of discrete unspent coins, Ethereum's state is a global key-value store mapping addresses to account data. Every smart contract's storage, every user's ETH balance, every contract's code — it all lives in a single, coherent data structure called the world state.

Understanding how Ethereum manages this state — how it is structured, how it is updated efficiently, how its integrity is verified, and why it grows — is essential to understanding Ethereum's capabilities, its costs, and its future roadmap.

The Account Model

Ethereum's state maps 20-byte addresses to accounts. There are two kinds:

Externally Owned Accounts (EOAs)

Controlled by private keys, EOAs represent users. An EOA's state consists of:

- Balance: amount of ETH (in wei, where 1 ETH = 10^18 wei)
- Nonce: a counter incremented with each transaction sent from this account, preventing replay attacks

EOAs have no code and no storage. They can initiate transactions, sign messages, and receive ETH.

Contract Accounts

Created by deployment transactions, contract accounts consist of:

- Balance: ETH held by the contract
- Nonce: incremented each time the contract deploys a new contract
- Code hash: the Keccak-256 hash of the contract's EVM bytecode (the code itself is stored separately)
- Storage root: the root hash of the contract's persistent storage trie

The separation between the code hash and actual code means multiple contracts with identical bytecode share the same code storage, and the account structure stays compact.
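The account fields described above can be sketched as a single record type. This is a minimal illustration, not client code: the names `Account` and `is_contract` are hypothetical, and it assumes (as the protocol specifies) that an account without code carries the Keccak-256 hash of empty input as its code hash and the empty-trie root as its storage root. Both constants are hardcoded because Python's standard library has no Keccak-256.

```python
from dataclasses import dataclass

WEI_PER_ETH = 10**18

# Keccak-256 of the empty byte string: the code hash of an account with no code.
EMPTY_CODE_HASH = bytes.fromhex(
    "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
)
# Root hash of an empty Merkle Patricia Trie: the storage root of an account
# that has never written to storage.
EMPTY_TRIE_ROOT = bytes.fromhex(
    "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"
)


@dataclass(frozen=True)
class Account:
    """Hypothetical four-field account record (illustrative names)."""
    nonce: int = 0
    balance: int = 0  # denominated in wei
    storage_root: bytes = EMPTY_TRIE_ROOT
    code_hash: bytes = EMPTY_CODE_HASH


def is_contract(account: Account) -> bool:
    """An account is treated as a contract here if it has non-empty code."""
    return account.code_hash != EMPTY_CODE_HASH


# An EOA that has sent three transactions and holds 2 ETH.
eoa = Account(nonce=3, balance=2 * WEI_PER_ETH)
# A contract placeholder; the all-zero code hash is made up for the example.
contract = Account(nonce=1, code_hash=bytes(32))
```

Keeping only the 32-byte code hash and storage root in the record is what keeps each account entry small, whatever the size of the contract's code or storage.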

Merkle Patricia Tries: The Data Structure of State

Ethereum's state is encoded in a Merkle Patricia Trie (MPT) — a hybrid data structure combining a Merkle tree (for cryptographic integrity proofs) with a Patricia trie (for efficient prefix-compressed key-value storage).

Why a Trie?

A trie (also called a prefix tree) is a tree where each node's position is determined by the prefix of its key. In Ethereum's state trie, each 20-byte address (160 bits) is first hashed with Keccak-256, and the resulting 32-byte key is walked as a sequence of 64 hexadecimal digits (nibbles). Each node in the Patricia trie can represent multiple key characters at once (path compression), keeping the tree shallow even for many entries.
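To make the idea of hexadecimal key paths concrete, the sketch below splits a key into nibbles and measures how many leading nibbles two keys share. The shared run is what path compression can collapse into a single node. The helper names and the sample keys are hypothetical, chosen only for illustration.

```python
from typing import List


def to_nibbles(key: bytes) -> List[int]:
    """Split each byte into its high and low 4-bit halves (hex digits)."""
    nibbles = []
    for byte in key:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    return nibbles


def shared_prefix_len(a: List[int], b: List[int]) -> int:
    """Count how many leading nibbles two key paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Two made-up 4-byte keys; real state-trie keys are 32 bytes (64 nibbles).
key_a = bytes.fromhex("a711355f")
key_b = bytes.fromhex("a77d337f")

path_a = to_nibbles(key_a)  # [10, 7, 1, 1, 3, 5, 5, 15]
path_b = to_nibbles(key_b)  # [10, 7, 7, 13, 3, 3, 7, 15]
common = shared_prefix_len(path_a, path_b)  # 2: both paths start with a, 7
```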

The Merkle Property

Each node in the MPT is identified by the hash of its contents. A parent node contains the hashes of its children. The root node's hash — the state root — cryptographically commits to the entire state: if any value anywhere in the trie changes, the root hash changes. This property is what makes the state root, included in every block header, a compact cryptographic fingerprint of Ethereum's complete world state.
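The following toy example illustrates the Merkle property on a tiny nested structure. It is a sketch under simplifying assumptions: it uses SHA-256 from the standard library as a stand-in for Keccak-256, and a plain nested dictionary rather than real MPT node encoding. The function name `node_hash` is hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash for illustration only; Ethereum uses Keccak-256.
    return hashlib.sha256(data).digest()


def node_hash(node) -> bytes:
    """Hash a leaf (bytes) or a branch (dict mapping label -> child node)."""
    if isinstance(node, bytes):
        return h(b"leaf:" + node)
    # A branch commits to the hashes of all its children, in label order.
    parts = b"".join(
        label.encode() + node_hash(child)
        for label, child in sorted(node.items())
    )
    return h(b"branch:" + parts)


state = {
    "a": {"1": b"balance=5", "2": b"balance=7"},
    "b": b"balance=9",
}

root_before = node_hash(state)
untouched_before = node_hash(state["b"])

# Change a single value deep in the tree.
state["a"]["2"] = b"balance=8"

root_after = node_hash(state)
untouched_after = node_hash(state["b"])
```

Only the hashes on the path from the changed leaf up to the root are recomputed to new values; the untouched subtree keeps its hash, which is what lets real clients update the state root incrementally.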

Four Trie Structures

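Ethereum relies on four kinds of MPT. The state trie is global and persists across blocks, the transaction and receipt tries are built fresh for each block, and there is one storage trie per contract account: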

State trie — the global account trie, mapping addresses to account data. This is the "world state."
Transaction trie — all transactions in the block, indexed by position
Receipt trie — transaction receipts (gas used, status, event logs), indexed by position
Storage trie — each contract account has its own storage trie, mapping 32-byte slot keys to 32-byte values

The storage trie root is what is stored as the storage root in the contract's account entry in the state trie — a trie of tries.

The Trie as a Proof System

One of the most powerful properties of the MPT is that it enables Merkle proofs: a compact proof that a specific key-value pair exists in the trie, given only the root hash. A proof consists of the path of nodes from root to the target leaf. Any verifier with the root hash can verify this proof without downloading the entire trie.
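The verification step can be sketched with a simplified binary Merkle tree. This is not the MPT proof format (real proofs carry encoded trie nodes, and Ethereum hashes with Keccak-256 rather than the SHA-256 stand-in used here), but the principle is the same: the verifier recomputes hashes from the leaf up and compares the result against the trusted root. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    # Stand-in hash for illustration only; Ethereum uses Keccak-256.
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every level of the tree, from leaf hashes up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect (sibling hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from the leaf and the sibling hashes."""
    acc = h(leaf)
    for sibling, sibling_is_left in proof:
        acc = h(sibling + acc) if sibling_is_left else h(acc + sibling)
    return acc == root


leaves = [b"alice=5", b"bob=7", b"carol=9", b"dave=1"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = make_proof(levels, 2)  # prove that "carol=9" is in the tree
```

A verifier holding only `root` can call `verify(root, b"carol=9", proof)` without ever seeing the other leaves; a tampered value such as `b"carol=900"` fails the same check.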

This enables light clients — nodes that only track block headers (and their state roots) but can verify specific account balances or storage values by requesting Merkle proofs from full nodes. The eth_getProof RPC call returns exactly these proofs, enabling trustless verification of Ethereum state.
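As an illustration, an eth_getProof request is an ordinary JSON-RPC 2.0 payload whose parameters are the account address, a list of storage keys, and a block identifier. The sketch below only builds the request body and sends nothing over the network. The helper name and the sample address are hypothetical placeholders.

```python
import json
from typing import Dict, List


def build_get_proof_request(
    address: str,
    storage_keys: List[str],
    block: str = "latest",
    request_id: int = 1,
) -> Dict:
    """Build the JSON-RPC body for eth_getProof (no network call is made)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getProof",
        "params": [address, storage_keys, block],
    }


# Placeholder address and storage key, made up for the example.
example_address = "0x" + "11" * 20
slot_zero = "0x" + "00" * 32

request = build_get_proof_request(example_address, [slot_zero])
body = json.dumps(request)
```

The response is expected to carry the account's fields together with an account proof and one storage proof per requested key, which a light client can check against the state root of the chosen block.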

Contract Storage Layout

Every contract account has its own storage trie mapping 32-byte slot keys to 32-byte values. The Solidity compiler assigns state variables to specific slot numbers deterministically:

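Slot 0: first declared state variable (variables smaller than 32 bytes are packed into a shared slot where possible)
Slot 1: next state variable that does not fit in slot 0
...
Mappings use derived slots: the value for a key lives at keccak256(key ++ base_slot)
Dynamic arrays store their length at base_slot and their elements starting at keccak256(base_slot)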

Because storage is a sparse 2^256-entry mapping, there are no collisions between a mapping's computed slot and a fixed-slot variable (the probability of collision is cryptographically negligible).
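The slot derivation can be sketched as follows, based on the Solidity storage layout rules: a mapping entry lives at the hash of the padded key concatenated with the base slot, while a dynamic array keeps its length at the base slot and its elements from the hash of the base slot onward. One loud caveat: Python's standard library has no Keccak-256, so the default hash below is `hashlib.sha3_256`, which is NOT the same function and yields different slot numbers. Pass a real Keccak-256 implementation as `hash_fn` to get actual slots. All helper names are hypothetical.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]
SLOT_SPACE = 2**256


def stand_in_hash(data: bytes) -> bytes:
    # NOT Keccak-256: NIST SHA3-256 uses different padding, so the resulting
    # slot numbers will not match a real contract. Illustration only.
    return hashlib.sha3_256(data).digest()


def uint256(value: int) -> bytes:
    """Encode an integer as a 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def mapping_slot(key: bytes, base_slot: int, hash_fn: HashFn = stand_in_hash) -> int:
    """Slot of mapping[key] for a value-type key: hash(pad32(key) ++ base_slot)."""
    preimage = key.rjust(32, b"\x00") + uint256(base_slot)
    return int.from_bytes(hash_fn(preimage), "big")


def array_element_slot(index: int, base_slot: int, hash_fn: HashFn = stand_in_hash) -> int:
    """Slot of element `index` of a dynamic array whose elements each fill one slot."""
    start = int.from_bytes(hash_fn(uint256(base_slot)), "big")
    return (start + index) % SLOT_SPACE


holder = bytes.fromhex("11" * 20)  # made-up 20-byte address used as a mapping key
balance_slot = mapping_slot(holder, base_slot=1)
first_element = array_element_slot(0, base_slot=2)
fourth_element = array_element_slot(3, base_slot=2)
```

Because the derived slots are hash outputs spread across the whole 2^256 space, they land far from the small slot numbers used by fixed variables, which is why collisions are not a practical concern.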

Storage Access Costs

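Writes cost far more than reads because every node must persist the change and update the storage trie. The SSTORE figures below assume the slot is cold, so each includes the 2,100-gas cold-access surcharge: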

Operation	EVM Opcode	Gas Cost
Read cold slot (first access in tx)	SLOAD	2,100
Read warm slot (already accessed)	SLOAD	100
Write new non-zero value	SSTORE	22,100
Update existing non-zero value	SSTORE	5,000
Zero out a slot (refund)	SSTORE	5,000 (+ 4,800 refund)

The "cold" vs "warm" distinction (introduced in EIP-2929) reflects whether a slot has been accessed before in the current transaction — previously accessed slots are cached in memory, making subsequent accesses cheaper.
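The warm/cold rule can be sketched as a per-transaction set of accessed slots. This is a simplified model covering only SLOAD and using the two read costs from the table above. The class name `AccessTracker` is hypothetical.

```python
COLD_SLOAD_COST = 2100
WARM_SLOAD_COST = 100


class AccessTracker:
    """Tracks which (address, slot) pairs were touched in one transaction."""

    def __init__(self) -> None:
        self.accessed = set()

    def sload_cost(self, address: str, slot: int) -> int:
        """Return the gas charged for reading a slot, then mark it warm."""
        key = (address, slot)
        if key in self.accessed:
            return WARM_SLOAD_COST
        self.accessed.add(key)
        return COLD_SLOAD_COST


tracker = AccessTracker()  # a fresh tracker models the start of a transaction
first_read = tracker.sload_cost("0xToken", 0)   # cold: 2100
second_read = tracker.sload_cost("0xToken", 0)  # warm: 100
other_slot = tracker.sload_cost("0xToken", 1)   # cold again: 2100
total_gas = first_read + second_read + other_slot  # 4300
```

Creating a new `AccessTracker` for each transaction reflects that warmth does not carry over: the same slot is cold again in the next transaction.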

State Bloat: The Long-Term Challenge

Ethereum's state grows with every deployed contract, every token holding, every DeFi position. As of 2024, the Ethereum state contains approximately 250–300 million accounts and requires roughly 80–100 GB of storage on a full node (in LevelDB format). This state bloat is one of Ethereum's most pressing scalability challenges.

Unlike block data (which can be pruned from old nodes), state must be held in fast-access storage for transaction validation. Every SLOAD and SSTORE requires looking up a value in the state trie — if the trie overflows from RAM into disk, node performance degrades dramatically.

State Expiry and Stateless Clients

Ethereum's researchers have proposed several approaches to the state bloat problem:

State expiry — accounts and storage slots that have not been accessed for a long time (e.g., one to two years) would be "expired" and removed from the active state. To use an expired state element, its owner would need to provide a proof that it once existed, allowing it to be restored. This would cap the active state size at a manageable level.

Stateless clients — nodes that validate blocks without storing the full state, by requiring block proposers to include witnesses (Merkle proofs for all state elements accessed in the block) alongside the block itself. The node verifies state transitions using only the witness, then discards it. This would dramatically reduce node storage requirements.

Both approaches depend on an efficient proof system — which brings us to Verkle trees.

Verkle Trees: The Future of Ethereum State

The MPT's Merkle proofs have a fundamental limitation: each level of the trie requires including one hash per sibling node to prove a path. For a trie with branching factor 16 (hex digits), a proof for a leaf at depth 8 might include 8 × 15 = 120 hashes — approximately 4 KB.
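The arithmetic behind that estimate is easy to reproduce. The sketch below uses the same worst-case assumption as the text, namely that every level on the path contributes all 15 sibling hashes of 32 bytes each. The function name is hypothetical.

```python
HASH_SIZE_BYTES = 32


def mpt_proof_bytes(depth: int, branching_factor: int = 16) -> int:
    """Worst-case proof size: all sibling hashes at every level of the path."""
    siblings_per_level = branching_factor - 1
    return depth * siblings_per_level * HASH_SIZE_BYTES


proof_size = mpt_proof_bytes(depth=8)  # 8 * 15 * 32 = 3840 bytes, roughly 4 KB
```

The sibling term grows linearly with the branching factor, so simply widening a hash-based trie makes proofs larger rather than smaller.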

For stateless clients to be practical, witness sizes must be much smaller. Verkle trees (the name combines "vector commitment" and "Merkle") solve this by replacing the hash at each node with a vector commitment, so that proving one child no longer requires listing all of its siblings. Polynomial commitment schemes such as Kate-Zaverucha-Goldberg (KZG) are one way to build such commitments; Ethereum's Verkle specification uses Pedersen commitments with inner product arguments instead. In either case, the proof stays small regardless of the branching factor.

How Verkle Trees Differ

In a Verkle tree:

- The branching factor is much higher (256, compared to 16 for MPT)
- Each internal node contains a vector commitment that commits to all of its children at once
- A proof for any element at depth 6 in a 256-branching tree requires approximately 100–200 bytes, versus 4–8 KB for an MPT proof of equivalent depth

This 20–40x reduction in proof size makes including full witnesses in blocks economically feasible, enabling truly stateless clients.

Ethereum's Verkle tree transition (planned as "The Verge" in Ethereum's roadmap) would replace the MPT with a Verkle tree, enabling stateless validation and making state expiry practical. This transition would require migrating the entire world state — approximately 300 million accounts and their storage — into the new structure, one of the most complex coordinated state migrations ever attempted on a live production system.

The State Root as Trust Anchor

Every Ethereum block header contains a stateRoot field: the root hash of the world state MPT after applying all transactions in the block. This single 32-byte value cryptographically commits to the entire Ethereum world state — every account balance, every contract's code, every storage slot's value.

When an Ethereum validator attests to a block, they are attesting that they have correctly computed this state root by executing all transactions. The consensus mechanism — thousands of validators independently computing the same state root and signing attestations — is what makes the state root trustworthy. Any discrepancy in execution produces a different state root, and the network reaches consensus on only one canonical value.

This architecture means Ethereum's security is not just about consensus on transaction ordering: it is consensus on the correctness of a massive, continuously updated computational state. It is this property that makes Ethereum genuinely a world computer: a deterministic, globally agreed, cryptographically verifiable state machine.