FLC is an architecture that redefines how modern devices use memory. It offloads traditional memory usage to less expensive flash memory and solid-state drives while using only a small amount of expensive DRAM as cache. This dramatically reduces the size, cost, and power requirements of anything from personal devices to generative-AI-capable servers.
FLC inserts high-bandwidth, moderate-capacity DRAM as a final-level cache (FLC) to enhance the performance of standard DDR memory. The DDR memory can in turn serve as a massive workload cache that hides the latency of storage (e.g., SSD) used as final memory.
Optimal combination of bandwidth, latency, capacity, and power dissipation
Economical & energy-efficient way to build a petabyte-scale accessible DRAM/SSD pool
INNOVATIVE & DISRUPTIVE
Massive Cache Architecture
Very high (>95%) hit rate for the FLC-1 high-bandwidth cache; ~100% hit rate for FLC-2
Fully associative look-up engine with a very large number of entries (e.g., 32K or 64K for a 128MB cache)
Large cache line (e.g., 2KB, 4KB, 16KB, or larger)
Multi-level (2 or more) caching
Effective at inspecting & managing (i.e., masking or mapping out) defective or failing memory addresses
Cache DRAM or HBM3 for FLC level 1
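The architecture points above can be illustrated with a small model. The sketch below is not FLC's actual look-up engine: it is a toy fully associative cache in Python, and the LRU replacement policy, the class name, and the function names are all assumptions made for illustration. It also checks one plausible reading of the "32K/64K" figure, namely that a 128MB cache divided into 4KB or 2KB lines yields 32K or 64K entries.

```python
from collections import OrderedDict

KIB = 1024
MIB = 1024 * KIB


def num_entries(cache_bytes: int, line_bytes: int) -> int:
    """Number of cache lines (look-up entries) for a capacity and line size."""
    return cache_bytes // line_bytes


class FullyAssociativeCache:
    """Toy fully associative cache: any line may occupy any entry.

    LRU replacement is an assumption; the source does not name a policy.
    """

    def __init__(self, cache_bytes: int, line_bytes: int) -> None:
        self.line_bytes = line_bytes
        self.capacity = num_entries(cache_bytes, line_bytes)
        self.lines = OrderedDict()  # line tag -> None, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Access one byte address; return True on a hit."""
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[tag] = None
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


print(num_entries(128 * MIB, 4 * KIB))  # 32768 entries (32K)
print(num_entries(128 * MIB, 2 * KIB))  # 65536 entries (64K)

# Synthetic sequential scan: 64-byte accesses over 1 MiB with 4 KiB lines.
cache = FullyAssociativeCache(128 * MIB, 4 * KIB)
for addr in range(0, MIB, 64):
    cache.access(addr)
print(cache.hit_rate())  # 0.984375: one miss per 64 accesses to each line
```

The scan is a synthetic best case for spatial locality and says nothing about real workloads; it only shows why a large cache line can turn many small accesses into hits after a single line fill.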
Final-Level Cache (FLC): Fundamental High-Memory-Bandwidth Technology
Memory Latency When Fully Active (Without FLC)
Low latency (the published spec, e.g., ~60 ns) when idle
High latency (e.g., >200 ns) when fully active
Why High-Bandwidth FLC Wins
Sufficient bandwidth available in FLC-1 to serve the full stream of memory access requests
Economical & energy-efficient way to build a very large (petabyte-scale) accessible DRAM/SSD pool
What Happens When FLC-1 Misses?
FLC-2 is served by nearly idle DDR, so its latency is low
Much lower than in a conventional implementation without FLC-1
Low FLC-2 activity (only the few percent of the time when FLC-1 misses)
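The reasoning above can be made concrete with a toy queueing estimate. This is a sketch under stated assumptions, not a model taken from FLC: it uses a simple M/M/1-style relation in which latency grows as idle latency divided by (1 - utilization), and it assumes DDR traffic shrinks in proportion to the FLC-1 miss rate, ignoring the extra traffic that large line fills would add. The 70% demand utilization is a hypothetical value chosen so that the model reproduces the ~60 ns idle and ~200 ns loaded figures quoted earlier.

```python
def loaded_latency_ns(idle_latency_ns: float, utilization: float) -> float:
    """M/M/1-style estimate (an assumption): latency = idle / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return idle_latency_ns / (1.0 - utilization)


def ddr_utilization_behind_flc1(demand_utilization: float,
                                flc1_hit_rate: float) -> float:
    """Only FLC-1 misses reach DDR (line-fill amplification is ignored)."""
    return demand_utilization * (1.0 - flc1_hit_rate)


IDLE_NS = 60.0             # idle DDR latency quoted in the text
DEMAND_UTILIZATION = 0.70  # hypothetical load, chosen to give ~200 ns
FLC1_HIT_RATE = 0.95       # lower bound quoted in the text

# Without FLC: DDR carries the whole demand and latency is roughly 200 ns.
without_flc = loaded_latency_ns(IDLE_NS, DEMAND_UTILIZATION)

# With FLC-1 in front: DDR sees only the misses and stays near idle latency.
ddr_util = ddr_utilization_behind_flc1(DEMAND_UTILIZATION, FLC1_HIT_RATE)
with_flc = loaded_latency_ns(IDLE_NS, ddr_util)

print(f"DDR latency without FLC:   {without_flc:.1f} ns")
print(f"DDR utilization with FLC:  {ddr_util:.1%}")
print(f"DDR latency on FLC-1 miss: {with_flc:.1f} ns")
```

Under these assumptions the DDR behind FLC-1 runs at a few percent utilization, so a miss costs about 62 ns rather than about 200 ns. Real DRAM latency curves are not M/M/1, so the numbers are indicative only.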
Without FLC
Typical DDR or CXL memory has very high latency under load, and CXL adds its own inherent overhead.
With FLC
With a hit rate above 95%, the high-bandwidth cache DRAM (FLC-1) significantly reduces latency, as shown in the second graph. Even when an access misses FLC-1, latency remains low because the DDR behind it is lightly utilized, as shown in the third graph.
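As a rough worked example of this comparison, the sketch below applies the standard average-memory-access-time formula: every access pays the FLC-1 latency, and misses additionally pay the DDR latency. The 50 ns FLC-1 latency is a hypothetical placeholder, since the source gives no figure for it; the 60 ns and 200 ns DDR values are the idle and fully active examples quoted earlier, and the function name is illustrative.

```python
def average_latency_ns(flc1_latency_ns: float,
                       flc1_hit_rate: float,
                       miss_path_latency_ns: float) -> float:
    """Average access time: FLC-1 latency plus miss rate times miss-path latency."""
    if not 0.0 <= flc1_hit_rate <= 1.0:
        raise ValueError("hit rate must be in [0, 1]")
    return flc1_latency_ns + (1.0 - flc1_hit_rate) * miss_path_latency_ns


FLC1_LATENCY_NS = 50.0    # hypothetical: not stated in the source
DDR_NEAR_IDLE_NS = 60.0   # lightly utilized DDR behind FLC-1 (from the text)
DDR_ACTIVE_NS = 200.0     # fully active DDR without FLC (from the text)

# Sweep a few hit rates at and above the quoted 95% lower bound.
for hit_rate in (0.95, 0.97, 0.99):
    avg = average_latency_ns(FLC1_LATENCY_NS, hit_rate, DDR_NEAR_IDLE_NS)
    print(f"hit rate {hit_rate:.0%}: average ~{avg:.1f} ns "
          f"(vs ~{DDR_ACTIVE_NS:.0f} ns for fully active DDR without FLC)")
```

With these placeholder values the average stays close to the FLC-1 latency itself, because misses are rare and the miss path is cheap. Substituting measured latencies changes the numbers but not the shape of the argument.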