One of the greatest features of Web3 is the Open Data Principle. The Data doesn't belong just to companies, governments, groups, and individuals - it belongs to all of them and you at once. Anyone can create a twig (the contract) in global storage (the tree), and define the Rules on how the data (the leaves) gets stored and modified. Though the contract developers spend much more effort on project design now, it gives us more transparency and trust.
Even though the data is publicly readable, Projects can still decide how transparent they are ready to go, and this level of transparency is visible to everyone. How much data are they ready to store on the blockchain? Do they publish the ABIs, the source code? Do they encrypt data?
As you may know, when the ABI is published, you can query the contract's data and logs. But developers may decide to make some state variables private, so no ABI getter is generated for them. This is usually done to hide the state variables from other contracts, but for you, as an external observer, the data is still open. It is just more difficult to locate the storage slot and read the value with getStorageAt than to read the data exposed via the Application Binary Interface.
Here I briefly explain how a contract's storage works in the EVM, and in Part II I show the tool that generates TypeScript classes with data-getter methods for all state variables based on the source code. In a dev or research environment, you can even use the data-setters to overwrite the data.
Storage
The storage is divided into 32-byte blocks (the slots). You can think of it as a sparse array of slots. The maximum size of that array is 2^256 elements. Sparse means the array has "holes", or gaps in the sequence of its indices. For example, in JavaScript:
let foo = [];
foo[5] = 'A';
foo[999000] = 'B';
Though the length of the array foo is 999001, it holds only two elements in memory, so the indices act as virtual memory pointers. This is important for understanding later how those indices (the storage locations) are calculated. But before we continue with the storage, let us look at the data types, and in particular their sizes.
Types
1. Simple fixed-size value types
Type | Size (bytes) |
address | 20 |
bool | 1 |
int{X}, uint{X} | X / 8 : int256=32B int8=1B int64=8B |
bytes{X}, byte | X : bytes32=32B bytes8=8B byte=1B (byte is the pre-0.8.0 alias for bytes1) |
enum | The smallest uint type that can hold all members. Up to 256 members: uint8 (1 byte). Since Solidity 0.8.0 an enum cannot have more than 256 members; older compilers moved to uint16 beyond that. |
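The sizes from the table can be sketched as a small lookup. This is only an illustration: byteSize and enumByteSize are hypothetical helper names, and the enum helper assumes the "smallest uint that fits" rule described above.

```typescript
// Hypothetical helper: byte size of a simple fixed-size Solidity value type.
function byteSize(type: string): number {
  if (type === 'address') return 20;
  if (type === 'bool') return 1;
  // uintN / intN: N is a multiple of 8 between 8 and 256
  const intMatch = /^u?int(\d+)$/.exec(type);
  if (intMatch) return Number(intMatch[1]) / 8;
  // bytesN: N is between 1 and 32
  const bytesMatch = /^bytes(\d+)$/.exec(type);
  if (bytesMatch) return Number(bytesMatch[1]);
  throw new Error(`Not a simple fixed-size type: ${type}`);
}

// Assumption: an enum uses the smallest uintN that can hold all of its members.
function enumByteSize(memberCount: number): number {
  let bytes = 1;
  while (memberCount > 2 ** (8 * bytes)) bytes++;
  return bytes;
}

console.log(byteSize('address'), byteSize('int64'), byteSize('bytes32'), enumByteSize(3));
```

Aliases such as plain uint (which means uint256) are deliberately not handled here to keep the sketch short.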
2. Complex fixed-size types
Fixed-size arrays of fixed-size elements, e.g. User[3]
Structs, e.g. struct User
Struct arrays
3. Variable-size types
Dynamic arrays, e.g. address[]
Mappings, e.g. mapping(address => User)
Texts: string
Byte buffers: bytes
Storage Layout
Slot index | Slot size |
0 | 32 bytes |
1 | 32 bytes |
... | ... |
2^256 - 1 | 32 bytes |
From the table, it is easy to see that there is a huge range of storage locations (indices), and that each storage slot has a fixed size of 32 bytes.
Storage Locations
Simple fixed-size value types
So far we know the number of bytes required to store different variable types, and we know how the storage is divided into slots. Now let's see where the EVM stores the data for state variables, starting with a simple contract:
contract FooContract {
    address foo;
    bool isActive;
    uint256 amount;
}
From the example, we can note the order of the declared state variables: foo is 0, isActive is 1, amount is 2. If there were more variables, the order would continue. The compiler takes this order into account. Common sense would say that each variable occupies the slot with the index of the variable. That is almost it, with one exception: why would isActive occupy the entire slot 1 when the data has only 1 byte, and the previous slot still has some space left (the address takes only 20 of the 32 bytes)? Right, we can store the boolean in slot 0, and then amount goes to the next slot, slot 1.
Another question you may have is why amount moves to the next slot if slot 0 still has 11 bytes left: address (20 bytes) + bool (1 byte) = 21 bytes. Because uint256 takes the complete 32 bytes, we would need to split the data across 2 slots, and as the EVM reads data on a per-word basis (one slot at a time), it is much better to keep a base type within one slot. Therefore the amount variable occupies the complete slot 1, and the rule is very simple:
If we can store the complete value in the current slot, we do so; otherwise we go to the next slot and store it there.
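The packing rule can be sketched in a few lines. This is a simplified model, not the compiler's code: layout, StateVar and Location are names made up for the illustration, sizes are passed in by hand, and the offset is counted in bytes from the low-order end of the slot.

```typescript
interface StateVar { name: string; size: number }            // size in bytes, 1..32
interface Location { name: string; slot: number; offset: number }

// Assigns slots to simple fixed-size variables in declaration order.
function layout(vars: StateVar[]): Location[] {
  const result: Location[] = [];
  let slot = 0;
  let used = 0; // bytes already taken in the current slot
  for (const v of vars) {
    if (used + v.size > 32) {
      // The value does not fit completely: move on to the next slot.
      slot++;
      used = 0;
    }
    result.push({ name: v.name, slot, offset: used });
    used += v.size;
  }
  return result;
}

const fooContract = layout([
  { name: 'foo', size: 20 },     // address
  { name: 'isActive', size: 1 }, // bool
  { name: 'amount', size: 32 },  // uint256
]);
console.log(fooContract);
```

For FooContract this places foo and isActive in slot 0 and amount in slot 1, matching the reasoning above.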
Complex fixed-size types
Here is an example with complex fixed-size state variables
contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    User user;
    User[3] users;
}
I deliberately took the same simple types for the User struct to show that the logic of data locations is also the same:
when assigning slot numbers to a struct type, the compiler assigns slot numbers to the underlying member variables.
when assigning slot numbers to a fixed array type, the compiler assigns slot numbers to the elements.
So you can think of those types as just a logical "grouping" of data, which is laid out in storage as if the members were declared one after another. One detail to keep in mind: a struct or an array always starts in a new slot, and whatever follows it starts in a new slot as well.
contract FooContract {
    // User
    address foo;
    bool isActive;
    uint256 amount;
    // User[3]
    // User0
    address User0_foo;
    bool User0_isActive;
    uint256 User0_amount;
    // User1
    address User1_foo;
    bool User1_isActive;
    uint256 User1_amount;
    // and so on ...
}
From the previous step, we already know how to locate and pack this data.
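As a quick sanity check, the slot of any member of users[i] in the example can be computed with plain arithmetic. The helper name memberSlot is hypothetical; the numbers assume that user occupies slots 0 and 1, so users starts at slot 2 and every User takes 2 slots.

```typescript
// Slot of a member inside a fixed-size array of structs.
function memberSlot(
  arraySlot: number,       // first slot of the array
  slotsPerElement: number, // how many slots one struct occupies
  index: number,           // element index
  memberOffset: number     // slot of the member relative to the struct start
): number {
  return arraySlot + slotsPerElement * index + memberOffset;
}

const usersStart = 2;    // `user` takes slots 0..1
const slotsPerUser = 2;  // foo + isActive share one slot, amount takes the next
for (let i = 0; i < 3; i++) {
  const fooSlot = memberSlot(usersStart, slotsPerUser, i, 0);
  const amountSlot = memberSlot(usersStart, slotsPerUser, i, 1);
  console.log(`users[${i}]: foo/isActive in slot ${fooSlot}, amount in slot ${amountSlot}`);
}
```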
Variable-size types
Again, let's start with the example:
contract FooContract {
    address[] users;
    bool isActive;
}
The previous logic won't work here: we don't know the number of elements in the users array, so we can't place its data before the isActive boolean, because the location of isActive would then depend on the dynamic length of users.
The location of isActive must follow the previous rules: a) it is the 2nd variable, and b) we can't store it in the previous slot, because a dynamic array always takes a whole slot for itself. This means isActive must stay in slot 1.
What do we save in slot 0?
In the case of an array we must save at least its length, otherwise users.length would not be possible. So the length of the array goes directly into slot 0.
Where do we save the array items?
We want them to be non-fragmented, to have deterministic locations, and not to collide with other contract variables.
Not fragmented: we select the location in storage of element 0, and every other element goes directly after that location: locationOf(users[n]) === locationOf(users[0]) + n
Deterministic locations: if we want to access, e.g., the item users[14], we should be able to calculate the storage location without any reads from the storage.
No collisions: there should be no collisions with the other state variables of the contract, nor should the items be mixed or grouped with them. So the item locations must be isolated.
The EVM solves all the requirements above with a simple trick: it uses the keccak256 function to hash the slot number of a dynamically sized state variable. The hash is a uint256 number, which is used as the new starting position in storage for the items. You can think of the hashing as a "jump" in memory. Since the users variable has the initial slot 0, the location of users[14] will be:
// keccak256 and encodePacked are assumed to come from your web3 utility library
const stateVariableSlotNr = 0;
let hash = keccak256(encodePacked({
    value: stateVariableSlotNr,
    type: 'uint256'
}));
let jump = BigInt(hash); // new slot number: the location of users[0]
let users14Location = jump + 14n;
// users14Location
// hex: 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e571
// bigint: 18569430475105882587588266137607568536673111973893317399460219858819262702961
This approach satisfies all the requirements above:
Not fragmented: After the "jump" we can store the dynamic array in the same way as we do with fixed-size arrays: the elements follow one after another by incrementing the slot number.
Deterministic locations: From the source code, the compiler knows the initial slot number of the variable, which is why we can quickly calculate the "jump" with keccak256.
Isolated: You can see in the example above how huge the "jump" number can be, so there is practically no chance for other variables to end up in the same location.
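To make the "jump" reproducible without any dependency, here is a compact, unoptimized sketch of Keccak-256 written from scratch with BigInt lanes, plus the slot calculation for users[14]. All function names (rotl64, keccakF1600, toBytes32, dynamicArrayElementSlot) are made up for this illustration; in a real project you would normally take keccak256 from an audited library instead.

```typescript
const MASK64 = (1n << 64n) - 1n;

// Rotates a 64-bit lane left by n bits.
function rotl64(v: bigint, n: number): bigint {
  const s = BigInt(n % 64);
  return ((v << s) | (v >> (64n - s))) & MASK64;
}

// Keccak-f[1600] permutation over 25 lanes; lane index = x + 5 * y.
function keccakF1600(a: bigint[]): void {
  let lfsr = 1; // LFSR that generates the round constants
  for (let round = 0; round < 24; round++) {
    // theta
    const c: bigint[] = [];
    for (let x = 0; x < 5; x++) c.push(a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20]);
    for (let x = 0; x < 5; x++) {
      const d = c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1);
      for (let y = 0; y < 5; y++) a[x + 5 * y] ^= d;
    }
    // rho and pi
    let px = 1;
    let py = 0;
    let current = a[1];
    for (let t = 0; t < 24; t++) {
      [px, py] = [py, (2 * px + 3 * py) % 5];
      const next = a[px + 5 * py];
      a[px + 5 * py] = rotl64(current, ((t + 1) * (t + 2)) / 2);
      current = next;
    }
    // chi
    for (let y = 0; y < 5; y++) {
      const row = a.slice(5 * y, 5 * y + 5);
      for (let x = 0; x < 5; x++) {
        a[x + 5 * y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5]);
      }
    }
    // iota
    for (let j = 0; j < 7; j++) {
      lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256;
      if (lfsr & 2) a[0] ^= 1n << BigInt((1 << j) - 1);
    }
  }
}

// Keccak-256 as used by Ethereum (padding byte 0x01, unlike SHA3-256).
function keccak256(input: Uint8Array): bigint {
  const rate = 136; // bytes absorbed per permutation
  const lanes: bigint[] = new Array<bigint>(25).fill(0n);
  const xorByte = (pos: number, byte: number): void => {
    lanes[pos >> 3] ^= BigInt(byte) << BigInt(8 * (pos & 7));
  };
  let pos = 0;
  for (let i = 0; i < input.length; i++) {
    xorByte(pos++, input[i]);
    if (pos === rate) {
      keccakF1600(lanes);
      pos = 0;
    }
  }
  xorByte(pos, 0x01);
  xorByte(rate - 1, 0x80);
  keccakF1600(lanes);
  let hash = 0n; // first 32 output bytes, read as a big-endian uint256
  for (let i = 0; i < 32; i++) {
    hash = (hash << 8n) | ((lanes[i >> 3] >> BigInt(8 * (i & 7))) & 0xffn);
  }
  return hash;
}

// Big-endian 32-byte encoding of a slot number.
function toBytes32(value: bigint): Uint8Array {
  const out = new Uint8Array(32);
  for (let i = 31; i >= 0; i--) {
    out[i] = Number(value & 0xffn);
    value >>= 8n;
  }
  return out;
}

// Slot of element `index` of a dynamic array declared at `arraySlot`.
function dynamicArrayElementSlot(arraySlot: bigint, index: bigint, slotsPerElement = 1n): bigint {
  return keccak256(toBytes32(arraySlot)) + index * slotsPerElement;
}

const users14 = dynamicArrayElementSlot(0n, 14n);
console.log('0x' + users14.toString(16));
```

The slotsPerElement parameter covers elements that span whole slots; element types smaller than 16 bytes are packed several per slot, which this sketch does not model.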
How do we save dynamic arrays of complex types?
Let's go one step further and look into complex array types, not as simple as address[] in the previous example.
contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    User[] users;
}
It turns out everything works the same way. For example, to get the location of users[14].amount:
From the source code, the compiler knows the slot index of users = 0
From the source code, the compiler knows that each element occupies 2 slots. Remember packing?
0 slot = foo(20bytes) + isActive(1byte)
1 slot = amount(32bytes)
const stateVariableSlotNr = 0;
const slotsPerElement = 2n;
const amountPropertySlotNr = 1n;
const index = 14n;
let hash = keccak256(encodePacked({
    value: stateVariableSlotNr,
    type: 'uint256'
}));
let jump = BigInt(hash); // new slot number
let users14AmountLocation = jump
    + slotsPerElement * index
    + amountPropertySlotNr;
What about nested dynamic arrays?
I hope that after the previous examples it won't be complicated for you to solve this task on your own: find the location of users[5].balances[14]:
contract FooContract {
    struct User {
        address foo;
        uint256[] balances;
    }
    User[] users;
}
users is in slot 0
users: the location of the first element in the array starts at UsersCursor = BigInt(keccak256(bytes32(0)))
User consists of 2 slots:
0 slot = foo-address
1 slot = balances-array-length
users[5]: the location of the item at index 5 is Item5Cursor = UsersCursor + 5 * 2
users[5].balances: the location of the first item in the balances array starts at Item5BalancesCursor = BigInt(keccak256(bytes32(Item5Cursor + 1)))
users[5].balances[14]: the location of the item is SlotNumber = Item5BalancesCursor + 14
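The steps above can be written as one function. To keep the sketch short, the hash is injected: hashSlot stands for "keccak256 of the slot number encoded as 32 bytes" and has to be supplied by you. The names nestedBalanceSlot and HashSlot are hypothetical, and the sums are wrapped to 256 bits on the assumption that slot arithmetic overflows the way uint256 does.

```typescript
// Placeholder type for keccak256(bytes32(slot)); plug in a real implementation.
type HashSlot = (slot: bigint) => bigint;

const MASK256 = (1n << 256n) - 1n;

// Slot of users[userIndex].balances[balanceIndex] for the contract above.
function nestedBalanceSlot(userIndex: bigint, balanceIndex: bigint, hashSlot: HashSlot): bigint {
  const usersSlot = 0n;      // `users` is declared first
  const slotsPerUser = 2n;   // foo + balances length
  const balancesOffset = 1n; // `balances` is the 2nd slot of User

  const usersCursor = hashSlot(usersSlot);                              // users[0]
  const itemCursor = (usersCursor + userIndex * slotsPerUser) & MASK256; // users[i]
  const balancesCursor = hashSlot((itemCursor + balancesOffset) & MASK256);
  return (balancesCursor + balanceIndex) & MASK256;
}

// Usage with a toy stand-in for the hash, only to make the arithmetic visible.
const toyHash: HashSlot = (slot) => (slot + 1n) * 1000n;
console.log(nestedBalanceSlot(5n, 14n, toyHash));
```

With the toy hash the cursor values are easy to follow by hand; with a real keccak256 the same function yields the actual storage slot.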
Mappings
Another dynamically sized type is mapping. Unfortunately, it is not possible to arrange the storage for a mapping in both a "not fragmented" and a "deterministic locations" way at once. The array indices are sorted, but the mapping keys are not, so here we choose determinism over non-fragmentation. We still apply similar "jump" logic, but for every key, the way we did for the 0th item of the array. As you remember, the location of the 0th item was keccak256(bytes32(ARRAY_SLOT_NR)).
With mapping items, we also use the MAPPING_SLOT_NR and combine it with the key for a "jump" to every value.
let jump = BigInt(keccak256(encodePacked({
    value: bytes32(key),    // the key, padded to 32 bytes
    type: 'bytes32'
}, {
    value: bytes32(slotNr), // slot number of the mapping variable
    type: 'bytes32'
})));
In ARRAY_SLOT_NR we have the array length, but in MAPPING_SLOT_NR we have nothing, as there is no mapping.length. That is why we don't need any extra information in the mapping slot itself.
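The byte layout of the hashed value is easy to get wrong, so here is a sketch that builds it explicitly: 32 bytes of key followed by 32 bytes of slot number. The hash itself is injected as a parameter, and the names encodeWord, mappingValueSlot and Keccak256Fn are assumptions of this example. It covers value-type keys such as address or uintN only.

```typescript
// Placeholder type for a keccak256 implementation returning the hash as a bigint.
type Keccak256Fn = (data: Uint8Array) => bigint;

// Big-endian, left-padded 32-byte encoding of a number.
function encodeWord(value: bigint): Uint8Array {
  const out = new Uint8Array(32);
  for (let i = 31; i >= 0; i--) {
    out[i] = Number(value & 0xffn);
    value >>= 8n;
  }
  return out;
}

// Slot of mapping[key] for a mapping declared at `mappingSlot`.
function mappingValueSlot(key: bigint, mappingSlot: bigint, hash: Keccak256Fn): bigint {
  const preimage = new Uint8Array(64);
  preimage.set(encodeWord(key), 0);          // the key comes first
  preimage.set(encodeWord(mappingSlot), 32); // then the slot number of the mapping
  return hash(preimage);
}

// Stand-in "hash" that just returns the preimage as a number, to inspect the layout.
const identityHash: Keccak256Fn = (data) => {
  let n = 0n;
  for (let i = 0; i < data.length; i++) n = (n << 8n) | BigInt(data[i]);
  return n;
};
console.log(mappingValueSlot(0xabn, 2n, identityHash).toString(16));
```

Because the key is part of the preimage, two different keys land in unrelated slots, which is exactly the fragmentation trade-off described above.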
It is not possible to iterate over the mapping keys, as there is no place in storage where the keys are stored. You must know the key to be able to locate its storage slot. That is why it is important for a well-implemented contract to emit a log every time a mapping item is created. We know all the holders of an ERC20 token not by reading the balances mapping, but by iterating over the Transfer logs.
Complex mapping values
contract FooContract {
    struct User {
        address foo;
        bool isActive;
        uint256 amount;
    }
    mapping(address => User) users;
}
Everything works here exactly the same way: after we have calculated the location of the element from the address key and the slot number of the users state variable, foo and isActive are stored at that index. The amount will be at the next slot (+ 1).
Strings and Bytes
string and bytes are not the same as byte[], bytes32 or bytes32[]
bytesX are fixed-sized entities in the storage, like uintX, address, etc.
bytesX[] are arrays of fixed-sized entities, like uintX[], address[], etc.
bytes and string are dynamic-sized buffers in storage that have a slightly different storage layout, but are very similar to arrays.
In the array's slot we store the number of items in the array. For bytes and strings, we store the size (byte count) of the data.
The EVM also applies a nice trick to pack short strings and byte buffers. If the size of the data is less than 32 bytes, the size and the data are stored in the same slot: up to 31 bytes for the data in the higher-order bytes, and the lowest byte for the size, stored as length * 2. Otherwise, the slot holds length * 2 + 1, and the data is split into slots, which are located in the same manner as array items. For example, 100 bytes of data occupy Math.ceil(100/32) = 4 slots.
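A small decoder makes the short/long distinction concrete. It takes the 32-byte value read from the main slot of a string variable as a bigint. The names readStringSlot and StringSlot are invented for this sketch, and the inline text is decoded as plain ASCII for brevity (real strings are UTF-8).

```typescript
interface StringSlot {
  length: number;        // data length in bytes
  inline: string | null; // decoded text when stored in the slot itself
  dataSlots: number;     // extra slots used when the data is stored elsewhere
}

// Interprets the value stored in the main slot of a `string` or `bytes` variable.
function readStringSlot(slotValue: bigint): StringSlot {
  const isLong = (slotValue & 1n) === 1n; // the lowest bit marks the long form
  if (isLong) {
    const length = Number((slotValue - 1n) / 2n);
    return { length, inline: null, dataSlots: Math.ceil(length / 32) };
  }
  const length = Number(slotValue & 0xffn) / 2;
  let text = '';
  for (let i = 0; i < length; i++) {
    // Data is left-aligned: byte i sits i bytes below the top of the slot.
    const byte = Number((slotValue >> BigInt(8 * (31 - i))) & 0xffn);
    text += String.fromCharCode(byte); // ASCII-only assumption
  }
  return { length, inline: text, dataSlots: 0 };
}

// "hello" (5 bytes) in the high-order bytes, length * 2 = 10 in the lowest byte.
const helloSlot = (0x68656c6c6fn << 216n) | 10n;
console.log(readStringSlot(helloSlot));
// A 100-byte value: the main slot holds 100 * 2 + 1.
console.log(readStringSlot(201n));
```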
Inheritance and multiple inheritance
This is important for the slot order of the state variables. Let's look directly into the example:
contract Bar {
    uint256 bar;
}
contract Foo {
    uint256 foo;
}
contract Qux is Foo, Bar {
    uint256 qux;
}
The inheritance chain defines the order of the storage variables, so qux won't occupy slot 0. From the example:
0 slot = foo
1 slot = bar
2 slot = qux
For a more complex example, like a deeper inheritance chain, the rule of thumb stays the same:
The first variable of a contract continues right after the last variable of the previous contract in the inheritance chain. The packing rules still apply across that boundary, so a small variable can share a slot with the last variable of the parent contract.
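The ordering rule can be sketched as a simple counter that runs through the contracts from the most base-like to the most derived. To stay short, this example assumes every variable fills a whole slot (as uint256 does) and that the list is already in linearized order; inheritedSlots and ContractVars are hypothetical names.

```typescript
interface ContractVars { name: string; vars: string[] }

// Assigns consecutive slots across an already linearized inheritance chain.
// Assumption: each variable occupies one full slot, so no packing is modeled.
function inheritedSlots(linearized: ContractVars[]): Map<string, number> {
  const result = new Map<string, number>();
  let next = 0;
  for (const contract of linearized) {
    for (const variable of contract.vars) {
      result.set(`${contract.name}.${variable}`, next++);
    }
  }
  return result;
}

// `contract Qux is Foo, Bar` linearizes to Foo, Bar, Qux.
const quxSlots = inheritedSlots([
  { name: 'Foo', vars: ['foo'] },
  { name: 'Bar', vars: ['bar'] },
  { name: 'Qux', vars: ['qux'] },
]);
console.log(quxSlots);
```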
Conclusion 🏁
By understanding the core concepts of storage variable order, basic type sizes, packing, and the "jumps", you now have a clear view of the EVM's storage.
You have also seen that calculating the slot locations for arbitrary contracts and variables by hand is complicated. That is why we embedded the storage reader functionality into the TypeScript contract class generator, 0xweb. We'll look into it in Part II.