Saturday, December 12, 2020

Why Is Apple’s M1 Chip So Fast?

 


Real-world experience with the new M1 Macs has started ticking in. They are fast. Real fast. But why? What is the magic?



Image: Apple


What is a microprocessor (CPU)?

A very basic RISC CPU, not the M1. Instructions are moved from memory along the blue arrows into the instruction register. There, a decoder figures out what the instruction is and enables different parts of the CPU through the red control lines. The ALU adds and subtracts numbers placed in the registers.


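load r1, 150    // r1 ← contents of memory address 150
load r2, 200    // r2 ← contents of memory address 200
add r1, r2      // r1 ← r1 + r2
store r1, 310   // memory address 310 ← r1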
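To make the four instructions above concrete, here is a minimal Python sketch of a toy register machine that executes them. It assumes that 150, 200 and 310 are memory addresses rather than literal numbers. The function name `run`, the tuple encoding of instructions and the initial memory contents are all hypothetical, chosen only for illustration.

```python
def run(program, memory):
    """Execute a tiny load/add/store program against a dict used as memory."""
    regs = {}
    for op, a, b in program:
        if op == "load":       # load rX, addr  -> rX = memory[addr]
            regs[a] = memory[b]
        elif op == "add":      # add rX, rY     -> rX = rX + rY
            regs[a] = regs[a] + regs[b]
        elif op == "store":    # store rX, addr -> memory[addr] = rX
            memory[b] = regs[a]
        else:
            raise ValueError(f"unknown op {op!r}")
    return memory


PROGRAM = [
    ("load", "r1", 150),
    ("load", "r2", 200),
    ("add", "r1", "r2"),
    ("store", "r1", 310),
]

mem = run(PROGRAM, {150: 7, 200: 35})  # hypothetical initial memory contents
print(mem[310])  # 42
```

A real CPU does the same three kinds of work, but the decoder and control lines in the diagram replace the `if`/`elif` chain.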
An old mechanical calculator with two registers: the accumulator and the input register. Modern CPUs typically have more than a dozen registers, and they are electronic rather than mechanical.

The M1 is not a CPU!

The M1 is a system on a chip (SoC), meaning that many of the parts making up a computer are placed on one silicon chip.

Example of a computer motherboard. Memory, CPU, graphics cards, IO controllers, network cards, and many other components can be attached to the motherboard to communicate with each other.



Because we can put so many transistors on a silicon die today, companies such as Intel and AMD began putting multiple microprocessors onto one chip. Today we refer to these on-chip microprocessors as CPU cores. Each core is essentially a complete, independent CPU that can read instructions from memory and perform calculations.

A microchip with multiple CPU cores

Apple’s not-so-secret heterogeneous computing strategy

In blue you see multiple CPU cores accessing memory, and in green you see large numbers of GPU cores accessing memory.

What is Special About Apple’s Unified Memory Architecture?

CPUs don’t need a lot of data served, but they want it fast.
This is how your GPU wants its memory: huge portions. The more the merrier.
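The difference between what a CPU and a GPU want from memory can be sketched with a common first-order cost model: the time to fetch a block is a fixed latency plus its size divided by bandwidth. The two memory systems below and their latency and bandwidth figures are hypothetical round numbers, not measurements of any real hardware.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gb_s):
    """Toy model: fixed start-up latency plus size / bandwidth, in microseconds."""
    bytes_per_us = bandwidth_gb_s * 1e3  # 1 GB/s = 1e9 B/s = 1e3 B/us
    return latency_us + size_bytes / bytes_per_us


# Hypothetical memory systems, not real products.
LOW_LATENCY = dict(latency_us=0.1, bandwidth_gb_s=50)
HIGH_BANDWIDTH = dict(latency_us=0.5, bandwidth_gb_s=500)

for size in (64, 64_000_000):  # one cache line vs. a large texture
    a = transfer_time_us(size, **LOW_LATENCY)
    b = transfer_time_us(size, **HIGH_BANDWIDTH)
    print(f"{size:>10} B: low-latency {a:.3f} us, high-bandwidth {b:.3f} us")
```

With these numbers the low-latency memory wins for a 64-byte cache line (about 0.10 µs vs 0.50 µs), while the high-bandwidth memory wins for 64 MB (about 128.5 µs vs 1280.1 µs): small requests are dominated by latency, large ones by bandwidth.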
GeForce RTX 3080
How Macs used GPUs before unified memory. There was even an option of having graphics cards outside the computer, connected with a Thunderbolt 3 cable. There is some speculation that this may still be possible in the future.
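The cost of keeping GPU memory separate can be sketched with another toy model: with a discrete card, input data is copied across a link such as PCIe or Thunderbolt 3 before the GPU can work on it, and results are copied back; when CPU and GPU share one memory pool, those copies are not needed. The 16 GB/s link speed, 1 GB payloads and 20 ms compute time are hypothetical, and the model ignores the overlapping of copies and compute that real drivers can arrange.

```python
def gpu_job_time_ms(in_gb, out_gb, compute_ms, link_gb_s=None):
    """Toy model of one GPU job.

    link_gb_s=None models memory shared by CPU and GPU: no copies.
    Otherwise inputs are copied to the card and results copied back
    over a link of the given speed, with no overlap of copy and compute.
    """
    if link_gb_s is None:
        return compute_ms
    copy_in_ms = in_gb / link_gb_s * 1e3
    copy_out_ms = out_gb / link_gb_s * 1e3
    return copy_in_ms + compute_ms + copy_out_ms


print(gpu_job_time_ms(1, 1, 20, link_gb_s=16))  # discrete card: 145.0
print(gpu_job_time_ms(1, 1, 20))                # shared memory: 20
```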

If SoCs Are So Smart, Why Don’t Intel and AMD Copy This Strategy?

An AMD Ryzen Accelerated Processing Unit (APU), which combines a CPU and a GPU (Radeon Vega) on one silicon chip. It does not, however, contain other co-processors, IO controllers, or unified memory.
TSMC semiconductor foundry in Taiwan. TSMC manufactures chips for other companies such as AMD, Apple, Nvidia, and Qualcomm.

The fundamental challenge of making any CPU run fast

Multi-core or Out-of-Order processors?

The Ampere Altra Max ARM CPU with 128 cores, designed for cloud computing, where having many hardware threads is a benefit.

How Out-of-Order Execution Works

Robot pickers in a warehouse for Komplett.no, an online store in Norway
01: mul r1, r2, r3   // r1 ← r2 × r3
02: add r4, r1, 5    // r4 ← r1 + 5
03: add r6, r2, 1    // r6 ← r2 + 1
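Out-of-order execution can be sketched with a toy single-issue scheduler for the three instructions above. It assumes, hypothetically, that `mul` takes 3 cycles and `add` takes 1, and it tracks only read-after-write dependencies (real CPUs also handle write-after-read and write-after-write hazards, largely through register renaming; this example has none). An in-order core must wait for `01` to finish before starting `02`, and `03` is stuck behind `02`. An out-of-order core notices that `03` only reads `r2` and runs it while `02` is still waiting for `r1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str     # label matching the listing above
    dest: str     # register written
    srcs: tuple   # registers read
    latency: int  # cycles until dest is ready (hypothetical)


PROGRAM = [
    Instr("01 mul", "r1", ("r2", "r3"), 3),
    Instr("02 add", "r4", ("r1",), 1),
    Instr("03 add", "r6", ("r2",), 1),
]


def schedule(program, out_of_order):
    """Return {name: issue cycle}, issuing at most one instruction per cycle.

    Only read-after-write dependencies are tracked; the example has no
    write-after-read or write-after-write hazards.
    """
    ready_at = {}            # register -> cycle its new value is available
    pending = list(program)  # not yet issued, in program order
    issued = {}
    cycle = 0
    while pending:
        # An in-order core may only consider the oldest waiting instruction.
        window = pending if out_of_order else pending[:1]
        for ins in window:
            if all(ready_at.get(r, 0) <= cycle for r in ins.srcs):
                issued[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                pending.remove(ins)
                break  # single issue: at most one instruction this cycle
        cycle += 1
    return issued


print(schedule(PROGRAM, out_of_order=False))  # {'01 mul': 0, '02 add': 3, '03 add': 4}
print(schedule(PROGRAM, out_of_order=True))   # {'01 mul': 0, '03 add': 1, '02 add': 3}
```

In the out-of-order run the independent `add r6` fills a cycle that the in-order core wastes waiting on the multiply.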

ISA Instructions vs Micro-Operations

MOV ax, 24
LDR r0, 24
01: mul r1, r2, r3   // r1 ← r2 × r3
02: add r4, r1, 5    // r4 ← r1 + 5
03: add r6, r2, 1    // r6 ← r2 + 1
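Modern x86 cores translate ISA instructions into simpler internal micro-operations before executing them. The sketch below is a toy illustration of that idea, not how any real decoder works: it splits an x86-style `add [rbx], rax` (add `rax` to the value stored at the address held in `rbx`) into separate load, add and store steps. The micro-op names and the `tmp` register are hypothetical, since real micro-op encodings are internal to each processor.

```python
def decode(instr: str) -> list[str]:
    """Toy decoder: split an x86-style two-operand instruction into micro-ops."""
    op, operands = instr.split(maxsplit=1)
    dst, src = (part.strip() for part in operands.split(","))
    op = op.lower()
    if dst.startswith("[") and dst.endswith("]"):
        # Memory destination: read, compute in a register, write back.
        return [
            f"load tmp, {dst}",   # tmp ← value at the address
            f"{op} tmp, {src}",   # tmp ← tmp op src
            f"store tmp, {dst}",  # value at the address ← tmp
        ]
    # Register-only instruction: already a single simple operation.
    return [f"{op} {dst}, {src}"]


print(decode("add [rbx], rax"))  # ['load tmp, [rbx]', 'add tmp, rax', 'store tmp, [rbx]']
print(decode("MOV ax, 24"))      # ['mov ax, 24']
```

Once everything is broken into simple register-to-register steps like these, an out-of-order scheduler can reorder them just like the three instructions above.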

Why is Out-of-Order execution on AMD and Intel chips inferior to the M1’s?

Why can’t Intel and AMD add more instruction decoders?
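One frequently cited reason can be sketched under stated assumptions. On 64-bit ARM every instruction is exactly 4 bytes, so the start of every instruction in a block is known in advance and each decoder can be handed its own slice. x86 instructions vary from 1 to 15 bytes, so finding where instruction n+1 starts requires first working out the length of instruction n. The `toy_length` encoding below (the first byte stores the instruction's length) is hypothetical and far simpler than real x86.

```python
def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """Fixed-width ISA: every boundary is i * width, computable independently."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes, length_of) -> list[int]:
    """Variable-width ISA: each boundary depends on the previous instruction's length."""
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += length_of(code, pos)  # must finish this before knowing the next start
    return starts


def toy_length(code: bytes, pos: int) -> int:
    """Hypothetical encoding: the first byte of each instruction is its length."""
    return code[pos]


print(fixed_length_starts(bytes(16)))  # [0, 4, 8, 12]
print(variable_length_starts(bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0]), toy_length))  # [0, 1, 4, 6]
```

The first function has no loop-carried dependency, so eight decoders could each compute their own start offset at once; the second is inherently sequential. Real x86 front ends soften this with tricks such as predecoding and micro-op caches, so the sketch shows the shape of the problem, not its full cost.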

But AMD’s Zen 3 cores are still faster, right?

The future
