Ethereum: Simple deep-dive into EVM Storage.

One of the greatest features of Web3 is the Open Data Principle. The data doesn't belong just to companies, governments, groups, or individuals: it belongs to all of them and to you at once. Anyone can create a twig (the contract) in the global storage (the tree) and define the rules for how the data (the leaves) gets stored and modified. Contract developers now have to put much more effort into project design, but in return we get more transparency and trust.

Even though the data is publicly readable, projects can still decide how transparent they want to be, and this level of transparency is visible to everyone. How much data are they ready to store on the blockchain? Do they publish the ABIs and the source code? Do they encrypt data?

As you may know, when the ABI is published, you can query the contract's data and logs. But developers may decide to make some state variables private, so no getter is generated for them and they don't appear in the ABI. This is usually done to hide the state variables from other contracts, but for you, as an external observer, the data is still open. It is just more difficult to locate the storage slot and read the value with getStorageAt than to read data exposed via the Application Binary Interface.
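
As a rough illustration of what reading a slot directly involves, the sketch below only builds the JSON-RPC request body for the eth_getStorageAt method. The helper name buildGetStorageAtRequest and the contract address are placeholders invented for this example, and sending the request to a node is left out.

```javascript
// Builds the JSON-RPC payload for eth_getStorageAt.
// `buildGetStorageAtRequest` is a hypothetical helper, not part of any library.
function buildGetStorageAtRequest(contractAddress, slot, blockTag = "latest") {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "eth_getStorageAt",
    // The slot index is passed as a hex quantity, e.g. slot 2 becomes "0x2".
    params: [contractAddress, "0x" + BigInt(slot).toString(16), blockTag],
  };
}

// Placeholder address, used only for illustration.
const request = buildGetStorageAtRequest("0x0000000000000000000000000000000000000001", 2n);
console.log(JSON.stringify(request));
```

The node answers with the raw 32-byte content of the slot, so the hard part is knowing which slot number to ask for. That is what the rest of the article is about.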

Here I briefly explain how contract storage works in the EVM, and in Part II I show a tool that generates TypeScript classes with data-getter methods for all state variables, based on the source code. In a dev or research environment, you can even use the data setters to overwrite the data.

Storage

The storage is divided into 32-byte blocks (the slots). You can think of it as a sparse array of blocks with a maximum size of 2^256 elements. Sparse means an array with "holes", that is, gaps in the sequence of its indices. For example, in JavaScript:

let foo = [];
foo[5] = 'A';
foo[999000] = 'B';

Though the length of the array foo is 999001, it holds only two elements in memory, so the indices act as virtual memory pointers. This is important to know in order to understand later how those indices (the storage locations) are calculated. But before we continue with the storage, let us look at the data types, and in particular their sizes in bytes.

Types

1. Simple fixed-size value types

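Type               Size (bytes)
address            20
bool               1
int{X}, uint{X}    X / 8, e.g. int256 = 32 B, int64 = 8 B, int8 = 1 B
bytes{X}           X, e.g. bytes32 = 32 B, bytes8 = 8 B, bytes1 = 1 B (byte was an alias for bytes1 before Solidity 0.8.0)
enum               Depends on the number of members: the smallest uint type that can hold all values is used. Up to 256 members fit into uint8 (1 B); larger enums, possible only before Solidity 0.8.0, need uint16 (2 B).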
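
The table can be turned into a small lookup helper. This is only a sketch: the function name byteSizeOf is made up for this example, and enums are left out because their size depends on the member count rather than on the type name.

```javascript
// Returns the number of bytes a simple fixed-size value type needs.
// `byteSizeOf` is a name invented for this sketch.
function byteSizeOf(type) {
  if (type === "address") return 20;
  if (type === "bool") return 1;
  let match = /^u?int(\d+)?$/.exec(type);
  if (match) return Number(match[1] ?? 256) / 8; // plain int and uint mean 256 bits
  match = /^bytes(\d+)$/.exec(type);
  if (match) return Number(match[1]);
  throw new Error(`Not a simple fixed-size value type: ${type}`);
}

console.log(byteSizeOf("address")); // 20
console.log(byteSizeOf("int64"));   // 8
console.log(byteSizeOf("bytes32")); // 32
```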

2. Complex fixed-size types

The fixed-size array of fixed-size elements

int256[3]

int256 = 32 B, so 32 B × 3 = 96 B

Structs

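struct Foo { uint256 value; address owner; }

uint256 = 32 B, address = 20 B, so 32 B + 20 B = 52 B of data. In storage the struct occupies two whole slots (64 B), because value fills the first slot completely and owner moves to the next one.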

Structs array

Foo[2]

52 B × 2 = 104 B of data. In storage each element starts in a new slot, so Foo[2] occupies 4 slots.

3. Variable-size types

Dynamic arrays

int256[], int256[][], Foo[]

Mappings

mapping(address => uint256)

Texts

string

Byte Buffers

bytes

Storage Layout

Slot index    Size        Content
0             32 bytes    Storage slot #0
1             32 bytes    Storage slot #1
...           ...         ...
2^256 - 1     32 bytes    Storage slot #115792089237316195423570985008687907853269984665640564039457584007913129639935

The table shows that there is a huge range of storage locations (indices) and that each storage slot has a fixed size of 32 bytes.
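
To make the "sparse array of slots" picture concrete, here is a toy in-memory model. It is not how any client stores data; the class name SparseStorage is invented for this sketch. It relies on one fact about the EVM: a slot that was never written reads as zero.

```javascript
// A toy model of contract storage: only slots that were written take up space.
// `SparseStorage` is a hypothetical name used for illustration only.
class SparseStorage {
  constructor() {
    this.slots = new Map(); // slot index (BigInt) -> 32-byte value (BigInt)
  }
  read(slot) {
    // Slots that were never written read as zero.
    return this.slots.get(BigInt(slot)) ?? 0n;
  }
  write(slot, value) {
    // Writing zero is the same as clearing the slot.
    if (BigInt(value) === 0n) this.slots.delete(BigInt(slot));
    else this.slots.set(BigInt(slot), BigInt(value));
  }
}

const storage = new SparseStorage();
storage.write(0n, 42n);
storage.write(2n ** 256n - 1n, 7n); // the very last slot
console.log(storage.read(0n), storage.read(1n), storage.slots.size);
```

Two writes at opposite ends of the index range still cost only two entries, which is the same effect as in the JavaScript array example earlier.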

Storage Locations

Simple fixed-size value types

So far we know how many bytes are required to store different variable types, and we know how the storage is divided into slots. Now let's see where the data for state variables is stored, starting with a simple contract:

contract FooContract {
    address foo;
    bool isActive;
    uint256 amount;
}

The example shows the declaration order of the state variables: foo is 0, isActive is 1, amount is 2. If there were more variables, the numbering would continue. The compiler takes this order into account, so common sense suggests that each variable occupies the slot with its own index. That is almost right, with one exception: why would isActive occupy the entire slot 1 when it needs only 1 byte and the previous slot still has space left (the address takes only 20 of the 32 bytes)? Right, we can store the boolean in slot 0 as well, and amount then goes to the next slot, slot 1.

Another question you may have: why does amount move to the next slot if slot 0 still has 11 bytes free (address 20 bytes + bool 1 byte = 21 bytes)? Because a uint256 takes the complete 32 bytes, we would have to split it across two slots. The EVM reads storage one word (one slot) at a time, so it is much better to keep a base type inside a single slot. Therefore amount occupies the whole of slot 1, and the rule is very simple:

If the complete value fits into the current slot, we store it there; otherwise we go to the next slot and store it there.
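
The rule is short enough to express in a few lines. The sketch below assumes every variable is a simple value type of at most 32 bytes; the function name assignSlots and the { name, size } shape are invented for this example. The offset is counted in bytes from the low-order end of the slot, which is where Solidity places the first item.

```javascript
// Assigns a slot and a byte offset to each variable, in declaration order.
// `assignSlots` and the { name, size } shape are invented for this sketch.
function assignSlots(variables) {
  const SLOT_SIZE = 32;
  let slot = 0;
  let used = 0; // bytes already taken in the current slot
  return variables.map(({ name, size }) => {
    if (used + size > SLOT_SIZE) {
      // The value does not fit completely: move on to a fresh slot.
      slot += 1;
      used = 0;
    }
    const position = { name, slot, offset: used, size };
    used += size;
    return position;
  });
}

const layout = assignSlots([
  { name: "foo", size: 20 },     // address
  { name: "isActive", size: 1 }, // bool
  { name: "amount", size: 32 },  // uint256
]);
console.log(layout);
// foo      -> slot 0, offset 0
// isActive -> slot 0, offset 20
// amount   -> slot 1, offset 0
```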

Complex fixed-size types

Here is an example with complex fixed-size state variables:

contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    User user;
    User[3] users;
}

I deliberately used the same simple types for the User struct to show that the logic of data locations is also the same:

  • when assigning slot numbers to a struct type, the compiler assigns slot numbers to the struct's members.

  • when assigning slot numbers to a fixed array type, the compiler assigns slot numbers to the elements.

So you can think of those types as just a logical "grouping" of data, which is represented in storage at the same locations as it would be without the "grouping". One detail to keep in mind: a struct or an array always starts in a new slot, and the variable that follows it starts in a new slot as well.

contract FooContract {
    // User user
    address foo;            // slot 0
    bool isActive;          // slot 0
    uint256 amount;         // slot 1

    // User[3] users
    // users[0]
    address User0_foo;      // slot 2
    bool User0_isActive;    // slot 2
    uint256 User0_amount;   // slot 3
    // users[1]
    address User1_foo;      // slot 4
    bool User1_isActive;    // slot 4
    uint256 User1_amount;   // slot 5
    // and so on ...
}

From the previous step we already know how to locate and pack this data.
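
The same arithmetic can be written down for the fixed array of structs. This is a sketch under the assumption that every struct member is a simple value type of at most 32 bytes; the function names are invented for the example.

```javascript
// Number of whole slots taken by a struct, given the byte sizes of its members.
// Assumes every member is a simple value type of at most 32 bytes.
function slotsForStruct(memberSizes) {
  let slots = 1;
  let used = 0;
  for (const size of memberSizes) {
    if (used + size > 32) {
      slots += 1;
      used = 0;
    }
    used += size;
  }
  return slots;
}

// Slot of a member inside element `index` of a fixed-size array of structs.
function fixedArrayMemberSlot(arrayStart, slotsPerElement, index, memberSlot) {
  return arrayStart + index * slotsPerElement + memberSlot;
}

const userSlots = slotsForStruct([20, 1, 32]); // address, bool, uint256
const userStart = 0;                           // `user` is the first state variable
const usersStart = userStart + userSlots;      // `users` begins right after it

// users[1].amount: amount sits in the second slot (index 1) of the struct.
console.log(fixedArrayMemberSlot(usersStart, userSlots, 1, 1)); // 5
```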

Variable-size types

Again, let's start with the example:

contract FooContract {
    address[] users;
    bool isActive;
}

The previous logic won't work here: we don't know the number of elements in the users array, so we can't place its data in front of the isActive boolean, otherwise the location of isActive would depend on the dynamic length of users.

The location of isActive must follow the previous rules: a) it is the 2nd variable, b) it can't share the previous slot. That means isActive must stay in slot 1.

What do we save in slot 0?

For an array we must save at least its length, otherwise users.length would not be possible. So the length of the array goes directly into slot 0.

Where do we save the array items?

We also want the items to be non-fragmented, to have deterministic locations, and not to collide with other contract variables.

  • Not fragmented: we select the storage location of element 0, and every other element goes directly after it:

    locationOf(users[n]) === locationOf(users[0]) + n

  • Deterministic locations: if we want to access, e.g., the item users[14], we should be able to calculate its storage location without any reads from the storage.

  • No collisions: the items must not collide with the contract's other state variables, nor be mixed or grouped with them. So the item locations must be isolated.

Solidity meets all the requirements above with a simple trick: it uses the keccak256 function to hash the slot number of the dynamically sized state variable. The hash is a uint256 number, which is used as the new starting position in storage for the items. You can think of the hashing as a "jump" in storage. Since the users variable has the initial slot 0, the location of users[14] will be:

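// keccak256 and encodePacked are helper functions assumed to be in scope
// (they come from an Ethereum utility library, not from JavaScript itself).
const stateVariableSlotNr = 0;
let hash = keccak256(encodePacked({
    value: stateVariableSlotNr,
    type: 'uint256'
}));
let jump = BigInt(hash); // new slot number: the location of users[0]
let users14Location = jump + 14n;
// hex:    0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e571
// bigint: 18569430475105882587588266137607568536673111973893317399460219858819262702961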

This approach meets all the requirements above:

  • Not fragmented: after the "jump" we can store the dynamic array in the same way as we do with fixed-size arrays: the elements follow one after another by incrementing the slot number.

  • Deterministic locations: from the source code, the compiler knows the initial slot number of the variable, which is why we can quickly calculate the "jump" with keccak256.

  • Isolated: you can see in the example above how huge the "jump" number can be, so there is practically no chance for other variables to end up in the same location.
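
JavaScript has no built-in Keccak-256 (the SHA3-256 offered by most crypto libraries uses a different padding byte and gives different results), so the sketch below carries its own compact implementation in order to be runnable without dependencies. It follows the public Keccak specification, is written for readability rather than speed, and should be treated as an illustration; in a real project you would take keccak256 from an established library. The names toBytes32 and dynamicArraySlot are invented for this example.

```javascript
const MASK64 = (1n << 64n) - 1n;
const rotl64 = (v, n) => (n === 0 ? v : ((v << BigInt(n)) | (v >> BigInt(64 - n))) & MASK64);

// Keccak-f[1600] permutation; lane (x, y) of the state lives at index x + 5 * y.
function keccakF1600(s) {
  let lfsr = 1; // generates the round constants on the fly
  for (let round = 0; round < 24; round++) {
    // Theta step
    const c = [0, 1, 2, 3, 4].map((x) => s[x] ^ s[x + 5] ^ s[x + 10] ^ s[x + 15] ^ s[x + 20]);
    for (let x = 0; x < 5; x++) {
      const d = c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1);
      for (let y = 0; y < 25; y += 5) s[x + y] ^= d;
    }
    // Rho and pi steps
    let px = 1, py = 0, current = s[1];
    for (let t = 0; t < 24; t++) {
      const rotation = (((t + 1) * (t + 2)) / 2) % 64;
      [px, py] = [py, (2 * px + 3 * py) % 5];
      const next = s[px + 5 * py];
      s[px + 5 * py] = rotl64(current, rotation);
      current = next;
    }
    // Chi step
    for (let row = 0; row < 25; row += 5) {
      const t = s.slice(row, row + 5);
      for (let x = 0; x < 5; x++) s[row + x] = t[x] ^ (~t[(x + 1) % 5] & MASK64 & t[(x + 2) % 5]);
    }
    // Iota step
    for (let j = 0; j < 7; j++) {
      const bit = lfsr & 1;
      lfsr = lfsr & 0x80 ? ((lfsr << 1) ^ 0x71) & 0xff : lfsr << 1;
      if (bit) s[0] ^= 1n << BigInt((1 << j) - 1);
    }
  }
}

// Keccak-256 of a Uint8Array, returned as a BigInt (the digest read as a uint256).
function keccak256(bytes) {
  const rate = 136; // bytes absorbed per permutation
  const padded = new Uint8Array(Math.ceil((bytes.length + 1) / rate) * rate);
  padded.set(bytes);
  padded[bytes.length] ^= 0x01;      // Keccak padding start
  padded[padded.length - 1] ^= 0x80; // Keccak padding end
  const state = new Array(25).fill(0n);
  for (let offset = 0; offset < padded.length; offset += rate) {
    for (let i = 0; i < rate; i++) {
      state[i >> 3] ^= BigInt(padded[offset + i]) << BigInt(8 * (i & 7));
    }
    keccakF1600(state);
  }
  let digest = 0n;
  for (let i = 0; i < 32; i++) {
    digest = (digest << 8n) | ((state[i >> 3] >> BigInt(8 * (i & 7))) & 0xffn);
  }
  return digest;
}

// Encodes a slot number as 32 big-endian bytes, like a uint256.
function toBytes32(value) {
  const out = new Uint8Array(32);
  let v = BigInt(value);
  for (let i = 31; i >= 0; i--) {
    out[i] = Number(v & 0xffn);
    v >>= 8n;
  }
  return out;
}

// Slot of element `index` of a dynamic array whose length is stored at `lengthSlot`.
// Assumes one slot per element; the sum wraps around at 2^256 like EVM arithmetic.
function dynamicArraySlot(lengthSlot, index) {
  return (keccak256(toBytes32(lengthSlot)) + index) % (1n << 256n);
}

console.log("0x" + dynamicArraySlot(0n, 0n).toString(16));  // location of users[0]
console.log("0x" + dynamicArraySlot(0n, 14n).toString(16)); // location of users[14]
```

The second line is meant to reproduce the hex value quoted in the snippet above, which is a convenient way to check the implementation.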

How do we save dynamic arrays of complex types?

Let's go one step further and look at arrays of complex types, rather than the simple address[] from the previous example.

contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    User[] users;
}

It turns out that everything works the same way. For example, to get the location of users[14].amount:

  • From the source code, the compiler knows the slot index of users: 0.

  • From the source code, the compiler knows that each element occupies 2 slots (remember packing?):

    0 slot = foo(20bytes) + isActive(1byte)

    1 slot = amount(32bytes)

const stateVariableSlotNr = 0;
const slotsPerElement = 2n;
const amountPropertySlotNr = 1n;
const index = 14n;
let hash = keccak256(encodePacked({
    value: stateVariableSlotNr, 
    type: 'uint256'
}));
let jump = BigInt(hash); // new slot number
let users14AmountLocation = jump 
    + slotsPerElement * index 
    + amountPropertySlotNr;

What about nested dynamic arrays?

I hope that after the previous examples it won't be complicated for you to solve this task on your own and locate users[5].balances[8]:

contract FooContract {
    struct User {
        address foo;
        uint256[] balances;
    }
    User[] users;
}

  • users is at slot 0.

  • users: the first element of the array starts at

    UsersCursor = BigInt(keccak256(bytes32(0)))

  • User consists of 2 slots:

    0 slot = foo-address

    1 slot = balances-array-length

  • users[5]: the item with index 5 starts at

    Item5Cursor = UsersCursor + 5 * 2

  • users[5].balances: the first item of the balances array starts at

    Item5BalancesCursor = BigInt(keccak256(bytes32(Item5Cursor + 1)))

  • users[5].balances[8]: the item is located at

    SlotNumber = Item5BalancesCursor + 8
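
The steps above can be chained in code. In this sketch the hash function is passed in as the keccak256 parameter, assumed to take a 32-byte Uint8Array and return the digest as a BigInt, so that any Keccak-256 implementation can be plugged in. The function names and the hard-coded User layout (2 slots, balances length in the second one) are specific to this example.

```javascript
// Encodes a slot number as 32 big-endian bytes, like a uint256.
function toBytes32(value) {
  const out = new Uint8Array(32);
  let v = BigInt(value);
  for (let i = 31; i >= 0; i--) {
    out[i] = Number(v & 0xffn);
    v >>= 8n;
  }
  return out;
}

// Slot where element `index` of a dynamic array starts.
// `keccak256` is supplied by the caller: (Uint8Array) => BigInt.
function dynamicArrayElementSlot(keccak256, lengthSlot, index, slotsPerElement = 1n) {
  return (keccak256(toBytes32(lengthSlot)) + index * slotsPerElement) % (1n << 256n);
}

// Slot of users[userIndex].balances[balanceIndex] for the contract above.
function userBalanceSlot(keccak256, userIndex, balanceIndex) {
  const usersSlot = 0n;             // `users` is the first state variable
  const slotsPerUser = 2n;          // slot 0: foo, slot 1: balances length
  const balancesSlotInUser = 1n;
  const userStart = dynamicArrayElementSlot(keccak256, usersSlot, userIndex, slotsPerUser);
  const balancesLengthSlot = userStart + balancesSlotInUser;
  return dynamicArrayElementSlot(keccak256, balancesLengthSlot, balanceIndex);
}
```

Calling userBalanceSlot(keccak256, 5n, 8n) with a real Keccak-256 function gives the slot of users[5].balances[8]. Note that two hashes are needed, one for each "jump".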

Mappings

Another dynamically sized type is the mapping. Unfortunately, it is not possible to arrange the storage for a mapping in both a "not fragmented" and a "deterministic locations" way at once. Array indices are sequential, but mapping keys are not, so here we choose determinism over contiguous placement. We still apply similar "jump" logic, but for every key, in the same way as for the 0th item of an array.

You remember, the location of the 0th item was keccak256(bytes32(ARRAY_SLOT_NR)). With mapping items we also use the MAPPING_SLOT_NR, and combine it with the key for a "jump" to every value.

let jump = BigInt(keccak256(encodePacked({
    value: bytes32(key),
    type: 'bytes32'
}, {
    value: bytes32(slotNr),
    type: 'bytes32'
})));

In ARRAY_SLOT_NR we have the array length, but in MAPPING_SLOT_NR we have nothing, as there is no mapping.length. That is why we don't need any extra information in the mapping slot itself.

It is not possible to iterate over the mapping keys, as there is no place in storage where the keys are stored. You must know the key to be able to locate its storage slot. That is why it is important for a well-implemented contract to emit a log every time a mapping item is created. We know all the holders of an ERC20 token not by reading the balances mapping, but by iterating over the Transfer logs.
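
As a sketch, the "jump" for a mapping can be wrapped in one function. The hash function is again passed in as the keccak256 parameter, assumed to take a Uint8Array and return a BigInt. The sketch covers value-type keys such as address or uint256, which are padded to 32 bytes; string and bytes keys are hashed unpadded and are not handled here. The name mappingValueSlot is invented for the example.

```javascript
// Encodes a number as 32 big-endian bytes, like a uint256 or a padded address.
function toBytes32(value) {
  const out = new Uint8Array(32);
  let v = BigInt(value);
  for (let i = 31; i >= 0; i--) {
    out[i] = Number(v & 0xffn);
    v >>= 8n;
  }
  return out;
}

// Slot of mapping[key] for a mapping declared at `mappingSlot`.
// `keccak256` is supplied by the caller: (Uint8Array) => BigInt.
function mappingValueSlot(keccak256, key, mappingSlot) {
  const input = new Uint8Array(64);
  input.set(toBytes32(key), 0);          // the key comes first, padded to 32 bytes
  input.set(toBytes32(mappingSlot), 32); // followed by the slot number of the mapping
  return keccak256(input);
}
```

For mapping(address => uint256) declared as the first state variable, the balance of an address would be at mappingValueSlot(keccak256, BigInt(address), 0n).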

Complex mapping values

contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    mapping(address => User) users;
}

Everything works exactly the same way here: once we have calculated the location of the element from the address key and the slot number of the users state variable, foo and isActive are stored at that slot. The amount is in the next slot (+ 1).


Strings and Bytes

string and bytes are not the same as bytes1[], bytes32 or bytes32[]:

  • bytesX are fixed-size entities in storage, like uintX, address, etc.

  • bytesX[] are arrays of fixed-size entities, like uintX[], address[], etc.

  • bytes and string are dynamically sized buffers in storage. Their layout is slightly different, but very similar to arrays.

In the array's slot we store the number of items in the array. For bytes and strings, we store the size (byte count) of the data.

Solidity also applies a nice trick to pack short strings and byte buffers. If the size of the data is less than 32 bytes, the size and the data are stored in the same slot: up to 31 bytes for the data, and 1 byte (the lowest one) for the size, stored as length * 2. Otherwise the slot holds length * 2 + 1, and the data is split into slots, which are stored in the same manner as array items. For example, 100 bytes of data occupy Math.ceil(100/32) = 4 slots.
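
Here is a minimal sketch of how the main slot of a string or bytes variable could be decoded, following the layout described in the Solidity documentation: the lowest bit tells the short form (length * 2) from the long form (length * 2 + 1). The function name decodeBytesSlot is invented, and the text decoding assumes plain ASCII content.

```javascript
// Decodes the main slot of a `string` or `bytes` state variable.
// `slotValue` is the 32-byte slot content as a BigInt.
function decodeBytesSlot(slotValue) {
  const isLong = (slotValue & 1n) === 1n;
  if (isLong) {
    // Long form: the slot holds length * 2 + 1, the data lives at keccak256(slot).
    const length = Number((slotValue - 1n) / 2n);
    return { kind: "long", length, dataSlots: Math.ceil(length / 32) };
  }
  // Short form: the lowest byte holds length * 2, the data sits in the high-order bytes.
  const length = Number(slotValue & 0xffn) / 2;
  const hex = slotValue.toString(16).padStart(64, "0");
  const bytes = [];
  for (let i = 0; i < length; i++) {
    bytes.push(parseInt(hex.slice(2 * i, 2 * i + 2), 16));
  }
  // ASCII only; real strings are UTF-8 and need a proper decoder.
  return { kind: "short", length, text: String.fromCharCode(...bytes) };
}

// "hello" in the short form: 5 data bytes, zero padding, then 0x0a (= 5 * 2).
const shortSlot = BigInt("0x" + "68656c6c6f".padEnd(62, "0") + "0a");
console.log(decodeBytesSlot(shortSlot));

// A 100-byte value in the long form: the slot holds 100 * 2 + 1 = 201.
console.log(decodeBytesSlot(201n));
```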


Inheritance and multiple inheritance

Inheritance matters for the slot order of the state variables. Let's look directly at an example:

contract Bar {
    uint256 bar;
}
contract Foo {
    uint256 foo;
}
contract Qux is Foo, Bar {
    uint256 qux;
}

The inheritance chain defines the order of the storage variables, starting with the most base contract and ending with the most derived one, so qux won't occupy slot 0. In this example:

0 slot = foo
1 slot = bar
2 slot = qux

For a more complex example, such as a deeper inheritance chain, the rule of thumb stays the same (assuming, as in the example, that every variable fills a whole slot):

The slot number of the first variable in a contract is the slot number of the last variable of the previous contract in the inheritance chain, incremented by one.
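
The rule of thumb can be sketched as a loop over the contracts in their inheritance order. The sketch assumes the order is already known (most base contract first) and that every variable is a full 32-byte value such as uint256, so no packing happens; the function name slotsByInheritance is invented for the example.

```javascript
// Contracts listed from the most base contract to the most derived one.
const linearized = [
  { contract: "Foo", variables: ["foo"] },
  { contract: "Bar", variables: ["bar"] },
  { contract: "Qux", variables: ["qux"] },
];

// Assumes every variable takes exactly one slot (e.g. uint256), so no packing happens.
function slotsByInheritance(contracts) {
  const slots = {};
  let next = 0;
  for (const { variables } of contracts) {
    for (const name of variables) {
      slots[name] = next;
      next += 1;
    }
  }
  return slots;
}

console.log(slotsByInheritance(linearized)); // { foo: 0, bar: 1, qux: 2 }
```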


Conclusion 🏁

By understanding the core concepts of storage variable order, basic type sizes, packing, and the "jumps", you now have a clear view of the EVM's storage.

You can also see that calculating the slot locations for arbitrary contracts and variables is complicated when done manually. That's why we embedded the storage reader functionality into the TypeScript contract class generator, 0xweb. We'll look into it in Part II.

Did you find this article valuable?

Support Alex Kit by becoming a sponsor. Any amount is appreciated!