
How the features of hardware affect its performance and that of a system

For this page, we will concentrate on:

  • CPU
  • GPU
  • Secondary storage
  • RAM

CPU

The CPU, in very simple terms, fetches instructions from memory, and executes them. This may or may not result in data being fetched or stored.

Factors of the CPU that affect its performance are:

Clock speed

In a CPU, there is a high-precision clock which generates a pulse signal that synchronises the rest of the CPU. If there are 100 pulses per second, the CPU is capable of 100 operations per second (note that this is not necessarily 100 instructions per second – more on that much later on).

Clearly, the faster the clock speed, the more pulses per second there will be, and therefore the more operations per second that can be completed.
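
To put some rough numbers on this, here is a small Python sketch (the figures are illustrative, not taken from any particular CPU) converting a clock speed into the time available for each operation:

    def cycle_time_ns(clock_hz):
        """Time available for one clock cycle, in nanoseconds."""
        return 1e9 / clock_hz

    for ghz in (1, 2, 4):
        hz = ghz * 1e9
        print(f"{ghz}GHz -> {hz:.0e} pulses/sec, {cycle_time_ns(hz):.2f}ns per cycle")

Doubling the clock speed halves the time each transistor has to finish switching – which is exactly why the limits below start to bite.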

There are limits to clock speed: although fast, transistors do not switch state instantaneously. You are traditionally taught that a digital signal looks like a perfect square wave, with instantaneous vertical transitions between the two states.

In reality, the transitions between states are not truly vertical – it takes a (small) amount of time for each one to occur. The effect of this is that once the clock speed becomes high enough, a transistor may not have had time to fully switch from one state to the other, at which point data becomes corrupted (likely resulting in a crash). If you want further information, here is a highly technical overview of transistor switching speed: https://www.sciencedirect.com/topics/engineering/switching-speed

And here is a detailed article about overclocking CPUs – again, knowledge that isn’t going to be examined, but could be useful. Certainly interesting! https://www.tomshardware.com/uk/reviews/cpu-overclocking-guide,4593-2.html

The other issue is that of power consumption. Switching a transistor requires energy, and the relationship between frequency and power is of the order n². That is, if you double the rate of switching, it will require roughly four times as much power. Considering the small size of a CPU (including the heat-spreader, perhaps an inch square), it becomes prohibitively difficult to remove ever-increasing amounts of thermal energy from such a small component. If the CPU heats up too much, there will initially be errors, progressing to irreparable physical damage.
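
As a sketch of that quadratic relationship (the baseline figures below are invented purely for illustration):

    def power_watts(freq_ghz, base_freq_ghz=2.0, base_power_w=30.0):
        """Scale power with the square of frequency, relative to an assumed baseline."""
        return base_power_w * (freq_ghz / base_freq_ghz) ** 2

    for f in (2.0, 3.0, 4.0):
        print(f"{f}GHz -> ~{power_watts(f):.0f}W")
    # 2.0GHz -> ~30W, 3.0GHz -> ~68W, 4.0GHz -> ~120W:
    # doubling the clock quadruples the power to be dissipated.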

Core count

Having two cores means that the CPU behaves and appears as though it were two separate CPUs. This means that it can execute two instructions simultaneously.

Of course, how useful this is depends entirely on the task(s) being processed. Parallel processing (where more than one CPU is working on the same task at the same time) can increase performance, but this depends on whether the task can be parallelised: i.e. can different parts of the job be completed at the same time, or does each step require the output of the previous step in order to compute? If the output of previous steps is required, then having additional cores will not help. This is a good article about parallel computing.
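
Here is a minimal Python sketch of the difference (both tasks are made up for illustration). Summing a list can be split into independent chunks and handed to separate cores; an iterated calculation cannot, because each step needs the previous step's output:

    from concurrent.futures import ProcessPoolExecutor

    def partial_sum(chunk):
        return sum(chunk)

    def parallel_sum(numbers, workers=4):
        # Independent chunks: safe to compute simultaneously on separate cores.
        size = len(numbers) // workers
        chunks = [numbers[i * size:(i + 1) * size] for i in range(workers - 1)]
        chunks.append(numbers[(workers - 1) * size:])
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return sum(pool.map(partial_sum, chunks))

    def sequential_chain(x, steps):
        # Each iteration depends on the last: extra cores cannot help here.
        for _ in range(steps):
            x = (x * x + 1) % 1_000_003
        return x

    if __name__ == "__main__":
        print(parallel_sum(list(range(1_000_000))))  # parallelisable
        print(sequential_chain(2, 1_000_000))        # inherently serial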

Historically, software was not written to take advantage of multiple cores, as this was a specialist configuration. During the last ten years, there has been a gradual shift to multi-core processors, to the point where pretty much all CPUs contain multiple cores. As the hardware has become more prevalent, developers have begun to make more use of it.

The big benefit of multi-core designs is that in general, a multi-core CPU will run at a lower clock speed, but be able to perform more operations at the same time. For example:

A single-core CPU at 4GHz may require 65W. Let’s say it can process 4 billion operations per second.

A quad-core CPU may only run at 2.4GHz, but still fit within the 65W envelope (remember: slowing down the clock speed reduces the amount of power required). But this CPU could in theory process 2.4 billion operations per second on each of its four cores, for a total of 9.6 billion operations per second. Scaling doesn’t work quite this efficiently, but it makes the point.
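
The arithmetic from that example, written out:

    single_core = 1 * 4.0e9    # 1 core at 4GHz    -> 4.0 billion ops/sec
    quad_core   = 4 * 2.4e9    # 4 cores at 2.4GHz -> 9.6 billion ops/sec

    print(f"single-core: {single_core / 1e9:.1f} billion ops/sec")
    print(f"quad-core:   {quad_core / 1e9:.1f} billion ops/sec")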

(Note: at no point do we refer to the quad-core CPU as being 9.6GHz: it isn’t!)

In mobile devices, it is now common (and, as of 2020, even in Intel CPUs) to have a mixture of fast and slow cores within the CPU package to help lower power consumption. When the system is idle, or performing light tasks, the low-speed, basic cores are used. As soon as something more demanding is required, execution switches to the faster, bigger cores.

ARM in Cambridge were the pioneers of this technique.

Cache

RAM is often described as ‘fast temporary storage’ and in comparison to secondary storage, it is. However, RAM is orders of magnitude slower than a CPU.

This causes a problem: having a fast CPU is useless if it has nothing to do. If the processor can process data faster than it can fetch it, a lot of time will be spent doing nothing.

Cache is a special, super-fast type of RAM, which is located on the CPU chip and runs at speeds approaching that of the CPU itself. It is typically split into levels: L1, L2 and L3.

Level 1 (L1) cache is located closest to the CPU, and is the smallest, but fastest. This is extremely ‘expensive’ memory, both in terms of energy and the silicon real-estate required to make it.

Level 2 (L2) cache is next, and is bigger, but not as fast as L1; Level 3 (L3) is last, and is again bigger, but slower. Regardless, all three levels are still faster than RAM.

Cache works by fetching blocks of memory instead of single instructions, and is also used for frequently required data. By holding information close to the CPU, the delays involved in fetching the next piece of information are massively reduced, allowing the CPU to operate at its full potential. GeeksforGeeks has a good explanation here.
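
You can see the effect of cache from Python with a rough experiment (this assumes NumPy is installed, and exact timings will vary by machine). The matrix below is far larger than any cache; reading it in the order it is laid out in memory lets the cache fetch blocks ahead, while jumping down columns defeats it:

    import time
    import numpy as np

    size = 4000
    m = np.random.rand(size, size)   # ~128MB of doubles: far larger than any cache

    t0 = time.perf_counter()
    total = 0.0
    for i in range(size):
        total += m[i, :].sum()       # each row is contiguous in memory: cache-friendly
    t1 = time.perf_counter()
    for j in range(size):
        total += m[:, j].sum()       # each column is scattered: a cache miss per element
    t2 = time.perf_counter()

    print(f"row-by-row sum:   {t1 - t0:.2f}s")
    print(f"column-by-column: {t2 - t1:.2f}s")

Both loops do exactly the same amount of arithmetic; only the memory access pattern differs.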

GPU

Even a modest computer nowadays has 2D and 3D graphics acceleration built in. Consider an average Full HD display (1920*1080). That’s around 2 million pixels. Each pixel is typically represented in 24-bit colour, or 30-bit on better screens.

24 bits per pixel * 2 million pixels = 48 million bits = 6MB per image.

So, for an average 60Hz screen refresh, that’s 60 * 6 MB = 360MB of data created and sent to the screen, per second.

Without specialised hardware, rapid screen updates in 2D are not possible. However, nowadays even the most basic GPU is able to drive higher resolutions than this (for instance, a single 4K display would require over 1GB of data per second).
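
Here is the same arithmetic as a small function (the exact figures come out slightly higher than the rounded ones above, since a Full HD frame is really about 6.2MB):

    def megabytes_per_second(width, height, bits_per_pixel=24, refresh_hz=60):
        """Raw data needed to redraw the whole screen at every refresh."""
        frame_bytes = width * height * bits_per_pixel / 8
        return frame_bytes * refresh_hz / 1e6

    for name, (w, h) in {"Full HD": (1920, 1080), "4K": (3840, 2160)}.items():
        print(f"{name}: ~{megabytes_per_second(w, h):,.0f}MB/sec")
    # Full HD: ~373MB/sec; 4K: ~1,493MB/sec (i.e. nearly 1.5GB per second)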

The world of 3D acceleration is more variable. Without going into too much detail, a GPU contains specialised hardware whose purpose is to place, texture and shade polygons in order to generate 3D images for output. GPUs tend to be massively parallel (i.e. have hundreds of ‘cores’ all operating on different parts of an image).

Aside from their use in generating 3D images, GPUs are also commonly used to accelerate other software – the term GPGPU is often used, meaning ‘General-Purpose GPU’, and referring to the ability of modern GPUs to be programmed using special languages, allowing them to process data in place of the CPU. Tasks that can be split into many sub-tasks which run in parallel are good candidates for GPU acceleration – things like rendering effects in Photoshop, Premiere, and so on. Other tasks, such as cracking passwords, are also well suited to GPU coding – with hundreds or thousands of cores, a GPU can attempt hundreds or thousands of passwords simultaneously.
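
A CPU-bound Python sketch shows why password cracking parallelises so well (the target hash and tiny keyspace are invented for the example; a real GPU cracker applies the same idea across thousands of cores). Every guess is independent, so the keyspace can simply be carved up between workers:

    import hashlib
    import itertools
    import string
    from concurrent.futures import ProcessPoolExecutor

    TARGET = hashlib.sha256(b"zzz").hexdigest()   # pretend this hash was leaked

    def search_slice(first_char):
        # Each worker independently tries every password starting with its letter.
        for rest in itertools.product(string.ascii_lowercase, repeat=2):
            guess = first_char + "".join(rest)
            if hashlib.sha256(guess.encode()).hexdigest() == TARGET:
                return guess
        return None

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            for result in pool.map(search_slice, string.ascii_lowercase):
                if result:
                    print("found:", result)
                    break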

Secondary Storage

As discussed here, SSDs are faster than HDDs due to the lack of moving parts, massively reduced latency (the time taken to initially find the data), and increased throughput. These factors will make any system using an SSD feel far faster and more responsive than one using an HDD.
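
Some back-of-the-envelope numbers on why latency matters so much (these are typical ballpark figures, not measurements of any specific drive):

    hdd_latency_s = 0.010    # ~10ms to move the head and wait for the platter
    ssd_latency_s = 0.0001   # ~0.1ms for a flash lookup

    for name, latency in (("HDD", hdd_latency_s), ("SSD", ssd_latency_s)):
        print(f"{name}: ~{1 / latency:,.0f} random reads per second")
    # HDD: ~100; SSD: ~10,000 - a hundredfold difference before
    # throughput even enters the picture.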

HDDs are still used where speed is less important than capacity – at the time of writing, a 16TB HDD can be purchased for around £400, whereas a 2TB SSD can easily come close to that cost.

Systems which feature frequent read/write actions will benefit from the use of an SSD. For example, virtualisation is a technology which allows a system to run another system ‘in a window’ (I use VMware to run Windows on a Mac laptop – the laptop boots into macOS, but the VMware application allows Windows to run on the computer simultaneously, inside a window just like any other application). Running two operating systems on a machine at the same time obviously increases the amount of processing required, and also requires a lot more disk access. Using an HDD would make for a very frustrating experience, as the system would appear to freeze on a regular basis. Using an SSD can eliminate this.

RAM

We know that RAM is used for temporary storage of instructions and data; on a laptop, PC or server, it is incredibly rare to see an ‘out of memory’ situation.

This is because an operating system manages RAM very carefully, and cleverly. Suppose you have 4GB of RAM, and once the operating system has loaded, along with a couple of applications and documents, the RAM is full. You don’t receive an error. The operating system creates a large file on the secondary storage device (better hope it’s an SSD), and maps this to memory addresses – so maybe the first 4GB of addresses are actual physical RAM, and then gigabytes 4–8 are in a file on the SSD. This is called virtual memory, and the file is called a page file. The operating system treats the secondary storage as if it were RAM, and shuffles things back and forwards so that the most regularly used information is held in RAM.
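
Programs can ask for the same trick directly. Here is a loose analogy using Python’s mmap module, which asks the operating system to map a file into memory addresses, much as it maps a page file (the filename is invented for the demo):

    import mmap

    # Create a 16MB backing file on secondary storage.
    with open("pagefile.bin", "wb") as f:
        f.truncate(16 * 1024 * 1024)

    with open("pagefile.bin", "r+b") as f:
        mem = mmap.mmap(f.fileno(), 0)   # map the whole file into our address space
        mem[0:5] = b"hello"              # looks like an ordinary write to memory...
        mem.flush()                      # ...but the OS pages it out to the file
        print(mem[0:5])                  # b'hello'
        mem.close()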

This has two implications:

  • The faster the secondary storage is, the more responsive the system will be, because the page file will be faster
  • The more RAM you have in a system, the more data you can load before a page file is required. This translates into being able to open more applications and documents simultaneously before the system begins to slow down

Here is a good explanation of how this works.