Chapter 3

Data Is Just Data

A class wraps your data in machinery. A dict hands it to you.
[Interactive demo: process heap layout, object graph vs. flat dictionary.

Object graph panel (6 heap allocations at scattered addresses): Order 128B + vtable; Customer 96B + vtable; LineItem[] 64B × n + array header; Address 80B + vtable; TaxRegion 48B + vtable; RateTable 256B + vtable. Each object can occupy its own L1 cache line (64B).

Flat dictionary panel (1 heap allocation): order_data, 312B contiguous — customer_name, customer_email, items: [{sku, qty, price}, ...], address_line1, city, state, zip, tax_region, tax_rate, total, discount, tax, grand_total. 2-3 L1 cache line fetches (contiguous).

Live counters per panel: allocations, cache misses, pointer chases, nanoseconds to allocate and read all data, bytes allocated (with overhead).]

What you just saw

The left panel is a process heap after constructing an Order object graph. The Order points to a Customer, which points to an Address, which points to a TaxRegion, which points to a RateTable. The LineItem array hangs off the Order separately. Six objects, six heap allocations, scattered across the address space by the allocator. To compute the order total, the CPU must chase five pointers. Each pointer chase is a potential cache miss: ~1ns if the data is in L1, ~4ns in L2, ~14ns in L3, and ~60ns if it has been evicted to main memory. The objects carry vtable pointers, alignment padding, and object headers that the computation never uses but the CPU must fetch anyway.
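The object graph the demo constructs can be sketched in Python. Class names follow the demo's labels; the field names and tax math are illustrative assumptions, not taken from the instrumented source:

```python
# A hypothetical sketch of the demo's object graph. Each instance is a
# separate heap allocation; computing the total chases five pointers:
# Order -> items, and Order -> Customer -> Address -> TaxRegion -> RateTable.

class RateTable:
    def __init__(self, base_rate):
        self.base_rate = base_rate

class TaxRegion:
    def __init__(self, name, rate_table):
        self.name = name
        self.rate_table = rate_table

class Address:
    def __init__(self, line1, city, state, zip_code, tax_region):
        self.line1 = line1
        self.city = city
        self.state = state
        self.zip_code = zip_code
        self.tax_region = tax_region

class Customer:
    def __init__(self, name, email, address):
        self.name = name
        self.email = email
        self.address = address

class LineItem:
    def __init__(self, sku, qty, price):
        self.sku = sku
        self.qty = qty
        self.price = price

class Order:
    def __init__(self, customer, items):
        self.customer = customer
        self.items = items

    def grand_total(self):
        subtotal = sum(item.qty * item.price for item in self.items)
        # Five-deep pointer chase just to find the tax rate:
        rate = self.customer.address.tax_region.rate_table.base_rate
        return subtotal * (1 + rate)
```

Every dot in `self.customer.address.tax_region.rate_table.base_rate` is a dereference into a separately allocated object, which is exactly the scatter the left panel visualizes.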

The right panel is the same data as a flat dictionary. One allocation. All fields contiguous in memory. The CPU fetches it in two or three cache line loads. No pointer chases. No vtable overhead. No alignment waste between objects. The prefetcher predicts the sequential access pattern and loads the next cache line before the CPU asks for it.
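The flat version, sketched with the field names the demo lists (the sample values and the total function are illustrative; note that CPython still boxes each value, so the one-allocation picture maps most literally onto a flat record in a language with unboxed fields):

```python
# Hypothetical flat-dict version of the demo's order_data. The total is
# computed by walking one contiguous structure instead of an object graph.
order_data = {
    "customer_name": "Ada",
    "customer_email": "ada@example.com",
    "items": [{"sku": "SKU1", "qty": 2, "price": 10.0}],
    "address_line1": "1 Main St",
    "city": "Springfield",
    "state": "CA",
    "zip": "90000",
    "tax_region": "CA",
    "tax_rate": 0.08,
}

def grand_total(order):
    # Two key lookups and one list walk; no class machinery in the way.
    subtotal = sum(i["qty"] * i["price"] for i in order["items"])
    return subtotal * (1 + order["tax_rate"])
```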

The class version uses more memory (672 bytes with overhead vs 312 bytes flat), takes longer to read (pointer chases plus cache misses), and scatters data the CPU needs together across addresses it must hunt for separately. The dictionary version hands the CPU exactly what it needs, where it needs it, in the order it will read it.
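A rough way to feel the indirection cost on your own machine is to time an attribute chain against a flat key lookup. This self-contained sketch measures Python-level dereferencing, not raw cache-miss latency, and the numbers will vary by CPU and Python build:

```python
import timeit

# Minimal stand-in for the demo: a three-level object chain (three heap
# allocations) versus a single flat dict. Hypothetical names throughout.
class Node:
    def __init__(self, value=None, child=None):
        self.value = value
        self.child = child

chain = Node(child=Node(child=Node(value=42)))  # three separate objects
flat = {"value": 42}                            # one dict

t_chain = timeit.timeit(lambda: chain.child.child.value, number=1_000_000)
t_flat = timeit.timeit(lambda: flat["value"], number=1_000_000)
print(f"attribute chain: {t_chain:.3f}s  flat dict: {t_flat:.3f}s")
```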

How this works: This demo visualizes the execution difference between dishonest and honest code. Timing is proportional to real captured nanosecond costs. Instrumented source code, Dockerfile, and raw trace data: github.com/adamzwasserman/honest-code-traces