Sun. Jan 22nd, 2023


In general terms, a cache is used to allow frequently used information to be stored locally in order to speed up processing. For example, a cache is used:

  • In a web browser – when you visit a web page, the content is saved locally on your computer so that if you revisit the page later, large files can simply be reloaded from the local copy rather than downloaded again. This is used extensively for images, CSS and JavaScript files.
  • On a web server – if a web server needs to fetch information from a database in order to serve pages to clients, but that information is the same for multiple requests, the server can be programmed to cache the results of the database queries locally, so that subsequent requests use the cached information rather than querying the database server every time. It is almost always faster to load a resource locally than to request it from a remote source.
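The server-side caching idea above can be sketched in a few lines of Python. The "database" here is a stand-in function (the names and the call counter are illustrative, not from any real framework); `functools.lru_cache` does the memoisation:

```python
import functools

# Hypothetical "database" lookup: in a real server this would be a network
# round-trip to a database; here it is simulated, and we count how often
# the expensive path is actually taken.
DB_CALLS = 0

def query_database(user_id):
    global DB_CALLS
    DB_CALLS += 1
    return {"id": user_id, "name": f"user-{user_id}"}  # stand-in result

@functools.lru_cache(maxsize=1024)
def get_user(user_id):
    # First request for a given user_id reaches the database;
    # repeat requests are served from the local in-process cache.
    return query_database(user_id)

get_user(42)   # cache miss: hits the "database"
get_user(42)   # cache hit: served locally
get_user(7)    # new key, so another database hit
print(DB_CALLS)  # → 2
```

Real servers typically add an expiry policy as well, so that cached results do not go stale when the underlying data changes.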

The principle outlined above (keep frequently used information close) is applied in processor design too.

Cache in CPUs

CPUs simply loop around the Fetch-Decode-Execute-Store cycle. They can operate at speeds of billions of operations per second.

Although RAM is extremely fast in comparison to secondary storage, it is still orders of magnitude slower than the CPU.

Without cache memory, the CPU would spend a significant proportion of its time waiting for data and instructions to be fetched from the system RAM, which would create a performance bottleneck.
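To get a feel for the size of that bottleneck, a back-of-envelope calculation helps. The 3 GHz clock and ~100 ns DRAM latency below are assumed round numbers for illustration, not measurements of any specific CPU:

```python
# Rough illustration of why uncached RAM access stalls a CPU.
clock_hz = 3_000_000_000       # assumed 3 GHz core clock
cycle_ns = 1e9 / clock_hz      # ≈ 0.33 ns per clock cycle
dram_latency_ns = 100          # assumed order of magnitude for a DRAM access

stall_cycles = dram_latency_ns / cycle_ns
print(f"One cycle: {cycle_ns:.2f} ns")
print(f"Cycles stalled per uncached DRAM access: {stall_cycles:.0f}")  # → 300
```

On these assumptions, every trip to RAM costs on the order of hundreds of clock cycles in which the core could otherwise have executed instructions.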

CPUs solve this problem through a tiered system of cache memory.

In a typical design, each core of a CPU has a small amount of L1 (Level 1) cache. This is made from SRAM (a much faster type of memory than the DRAM used for system memory), and operates at the same frequency as the CPU core.

In addition, the CPU has a larger amount of L2 (Level 2) cache, which is shared between the cores on that CPU.

It is also common to find designs where each core has its own L1 and L2 cache, and each CPU then has an L3 cache which is shared between the cores.

Whatever the design, the principle of operation is the same: instead of fetching individual pieces of information from RAM, whole blocks are fetched and stored in the cache. The more frequently or recently data is used (or is expected to be used), the closer to the core it is kept, in the fastest levels of cache. This means the CPU rarely has to wait when reading or storing values and instructions, which provides a huge performance boost to the system.
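The block-fetching principle can be demonstrated with a toy cache simulator. This is a minimal sketch of a direct-mapped cache (the line count and block size are made-up illustrative values, and real caches are far more sophisticated), showing how fetching whole blocks turns most accesses in a sequential scan into hits:

```python
class DirectMappedCache:
    """Toy direct-mapped cache: fetches whole blocks, not single bytes.
    Sizes are illustrative, not modelled on any real CPU."""

    def __init__(self, num_lines=4, block_size=8):
        self.block_size = block_size
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block number stored in each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.block_size   # which memory block holds this byte
        index = block % self.num_lines       # which cache line that block maps to
        if self.lines[index] == block:
            self.hits += 1                   # data already in the cache
        else:
            self.misses += 1                 # fetch the whole block from "RAM"
            self.lines[index] = block

cache = DirectMappedCache()
for addr in range(32):    # sequential scan over 32 bytes
    cache.access(addr)
print(cache.hits, cache.misses)  # → 28 4
```

Only the first byte of each 8-byte block misses; the next seven accesses are satisfied from the cache. This is exactly why sequential (spatially local) access patterns run so much faster on real hardware than scattered ones.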

You can read more here [Extremetech].