Chapter 3

Data Is Just Data

A class wraps your data in machinery. A dict hands it to you.
[Interactive demo: process heap layout, object graph vs. flat dictionary.

Object graph panel (6 heap allocations at scattered addresses): Order 128B + vtable; Customer 96B + vtable; LineItem[] 64B × n + array header; Address 80B + vtable; TaxRegion 48B + vtable; RateTable 256B + vtable. Each object can occupy its own L1 cache line (64B).

Flat dictionary panel (1 heap allocation): order_data, 312B contiguous — customer_name, customer_email, items: [{sku, qty, price}, ...], address_line1, city, state, zip, tax_region, tax_rate, total, discount, tax, grand_total. 2-3 L1 cache line fetches (contiguous).

Live counters per panel: allocations, cache misses, pointer chases, nanoseconds to allocate and read all data, bytes allocated (with overhead).]

What you just saw

The left panel is a process heap after constructing an Order object graph. The Order points to a Customer, which points to an Address, which points to a TaxRegion, which points to a RateTable. The LineItem array hangs off the Order separately. Six objects, six heap allocations, scattered across the address space by the allocator. To compute the order total, the CPU must chase five pointers. Each pointer chase is a potential cache miss: ~1ns if the data is in L1, ~4ns in L2, ~14ns in L3, and ~60ns if it has been evicted to main memory. The objects carry vtable pointers, alignment padding, and object headers that the computation never uses but the CPU must fetch anyway.
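The object graph the demo constructs can be sketched in Python. Class names follow the demo's labels; the field names and tax math are illustrative assumptions, not taken from the instrumented source:

```python
# A hypothetical sketch of the demo's object graph. Each instance is a
# separate heap allocation; computing the total chases five pointers:
# Order -> items, and Order -> Customer -> Address -> TaxRegion -> RateTable.

class RateTable:
    def __init__(self, base_rate):
        self.base_rate = base_rate

class TaxRegion:
    def __init__(self, name, rate_table):
        self.name = name
        self.rate_table = rate_table

class Address:
    def __init__(self, line1, city, state, zip_code, tax_region):
        self.line1 = line1
        self.city = city
        self.state = state
        self.zip_code = zip_code
        self.tax_region = tax_region

class Customer:
    def __init__(self, name, email, address):
        self.name = name
        self.email = email
        self.address = address

class LineItem:
    def __init__(self, sku, qty, price):
        self.sku = sku
        self.qty = qty
        self.price = price

class Order:
    def __init__(self, customer, items):
        self.customer = customer
        self.items = items

    def grand_total(self):
        subtotal = sum(item.qty * item.price for item in self.items)
        # Five-deep pointer chase just to find the tax rate:
        rate = self.customer.address.tax_region.rate_table.base_rate
        return subtotal * (1 + rate)
```

Every dot in `self.customer.address.tax_region.rate_table.base_rate` is a dereference into a separately allocated object, which is exactly the scatter the left panel visualizes.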

The right panel is the same data as a flat dictionary. One allocation. All fields contiguous in memory. The CPU fetches it in two or three cache line loads. No pointer chases. No vtable overhead. No alignment waste between objects. The prefetcher predicts the sequential access pattern and loads the next cache line before the CPU asks for it.
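The flat version, sketched with the field names the demo lists (the sample values and the total function are illustrative; note that CPython still boxes each value, so the one-allocation picture maps most literally onto a flat record in a language with unboxed fields):

```python
# Hypothetical flat-dict version of the demo's order_data. The total is
# computed by walking one contiguous structure instead of an object graph.
order_data = {
    "customer_name": "Ada",
    "customer_email": "ada@example.com",
    "items": [{"sku": "SKU1", "qty": 2, "price": 10.0}],
    "address_line1": "1 Main St",
    "city": "Springfield",
    "state": "CA",
    "zip": "90000",
    "tax_region": "CA",
    "tax_rate": 0.08,
}

def grand_total(order):
    # Two key lookups and one list walk; no class machinery in the way.
    subtotal = sum(i["qty"] * i["price"] for i in order["items"])
    return subtotal * (1 + order["tax_rate"])
```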

The class version uses more memory (672 bytes with overhead vs 312 bytes flat), takes longer to read (pointer chases plus cache misses), and scatters data the CPU needs together across addresses it must hunt for separately. The dictionary version hands the CPU exactly what it needs, where it needs it, in the order it will read it.
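A rough way to feel the indirection cost on your own machine is to time an attribute chain against a flat key lookup. This self-contained sketch measures Python-level dereferencing, not raw cache-miss latency, and the numbers will vary by CPU and Python build:

```python
import timeit

# Minimal stand-in for the demo: a three-level object chain (three heap
# allocations) versus a single flat dict. Hypothetical names throughout.
class Node:
    def __init__(self, value=None, child=None):
        self.value = value
        self.child = child

chain = Node(child=Node(child=Node(value=42)))  # three separate objects
flat = {"value": 42}                            # one dict

t_chain = timeit.timeit(lambda: chain.child.child.value, number=1_000_000)
t_flat = timeit.timeit(lambda: flat["value"], number=1_000_000)
print(f"attribute chain: {t_chain:.3f}s  flat dict: {t_flat:.3f}s")
```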

How this works: This demo visualizes the execution difference between dishonest and honest code. Timing is proportional to real captured nanosecond costs. Instrumented source code, Dockerfile, and raw trace data: github.com/adamzwasserman/honest-code-traces