Notes on Golang Memory Allocation

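This post is a set of study notes on Section 7.1 of 《Go语言设计与实现》 (Go Language Design and Implementation), annotating and extending the content of that section.

1. Summary of Golang Memory Allocation Principles

Golang manages heap memory with a multi-level hierarchy of components. The basic unit of memory management is the mspan.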

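The lowest level is the mcache (the book calls it the thread cache). Each logical processor P owns one, so the thread running on that P can allocate from it without taking a lock. A request for a small object first tries to allocate from the mcache.
Above the mcache is the mcentral. When the memory held by the mcache cannot satisfy the program's request, the mcache asks the mcentral for more.
Above the mcentral is the mheap, which manages the entire heap. The mheap manages memory through many heapArena objects, each covering a fixed-size region (64MB on 64-bit Linux). If the memory covered by the existing heapArena objects cannot satisfy the program, the runtime creates new heapArena objects.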

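When a user program requests memory with make or new and the value has to live on the heap, the request is eventually turned into a call to the runtime function runtime.mallocgc. Based on the requested size, this function sorts requests into three classes: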

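Tiny objects (smaller than 16 bytes and containing no pointers) are served by the mcache's tiny allocator.
Small objects (16 bytes to 32768 bytes) get their memory from the mcache, which refills from the mcentral when needed.
Large objects (larger than 32KB) go straight to runtime.mcache.allocLarge, which computes the number of pages the object needs and allocates that many pages directly from the heap.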
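The three-way split above can be sketched as a small function. This is only an illustration under stated assumptions: classify and its constants are hypothetical names, not runtime APIs, the thresholds are the ones quoted in this post (the exact cut-offs vary slightly between Go versions), and zero-size requests, which the runtime handles separately, are not modeled.

```go
package main

import "fmt"

// Thresholds quoted in the text; treat them as assumptions, since the
// runtime's exact cut-offs differ slightly between Go versions.
const (
	maxTinySize  = 16    // tiny objects are smaller than this
	maxSmallSize = 32768 // small objects are at most this large
)

// classify is a hypothetical helper, not a runtime function. noscan reports
// whether the object contains no pointers; only such objects are eligible
// for the tiny allocator.
func classify(size uintptr, noscan bool) string {
	switch {
	case size > maxSmallSize:
		return "large"
	case noscan && size < maxTinySize:
		return "tiny"
	default:
		return "small"
	}
}

func main() {
	fmt.Println(classify(8, true))     // tiny
	fmt.Println(classify(8, false))    // small: it contains pointers
	fmt.Println(classify(1024, true))  // small
	fmt.Println(classify(32768, true)) // small
	fmt.Println(classify(32769, true)) // large
}
```

Note that an 8-byte object holding a pointer is classified as small, not tiny, because the tiny allocator packs several objects into one block that the garbage collector does not scan.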

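2. Annotations and Supplements

2.1 Span Classes

Original text: 7.1.2 - Memory management unit - Span class

The span class determines the size and the number of the objects stored in a memory management unit (mspan). The book states that Go's memory management module has 67 span classes, and that all the data is precomputed and stored in the variables runtime.class_to_size and runtime.class_to_allocnpages: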

const (
	_MaxSmallSize   = 32768
	smallSizeDiv    = 8
	smallSizeMax    = 1024
	largeSizeDiv    = 128
	_NumSizeClasses = 68
	_PageShift      = 13
)

var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 24, 32, 48, 64, 80...}
var class_to_allocnpages = [_NumSizeClasses]uint8{0, 1, 1, 1, 1, 1, 1, 1...}

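The source shows that runtime.class_to_size and runtime.class_to_allocnpages are global arrays of length _NumSizeClasses = 68. The book later says that every runtime.mcache struct (thread cache) holds 68 * 2 runtime.mspan pointers, which does not match the 67 span classes. Why?

Besides the 67 span classes (IDs 1-67), the two arrays also contain the class with ID 0, which is reserved for large objects over 32KB. 67 + 1 = 68. The factor of 2 comes from each size class existing in two variants: one for objects that contain pointers (scan) and one for objects that do not (noscan).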
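To make the arithmetic concrete, here is a minimal sketch of how a size class and a noscan bit can be packed into a single span class ID, and how many objects fit in one span. The constants come from the listing above. The packing follows my reading of the runtime source (size class in the upper bits, noscan in the lowest bit), and objectsPerSpan is a hypothetical helper added for illustration.

```go
package main

import "fmt"

const (
	numSizeClasses = 68                  // _NumSizeClasses from the listing above
	numSpanClasses = numSizeClasses << 1 // one scan and one noscan variant each
	pageSize       = 1 << 13             // _PageShift = 13, so a page is 8KB
)

// spanClass packs a size class and a noscan bit into one small integer.
type spanClass uint8

// makeSpanClass stores the size class in the upper bits and the noscan
// flag in the lowest bit.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass) << 1
	if noscan {
		sc |= 1
	}
	return sc
}

func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }

func (sc spanClass) noscan() bool { return sc&1 != 0 }

// objectsPerSpan is a hypothetical helper: the number of objects of
// objSize bytes that fit in a span of npages pages.
func objectsPerSpan(objSize, npages uintptr) uintptr {
	return npages * pageSize / objSize
}

func main() {
	fmt.Println(numSpanClasses) // 136

	sc := makeSpanClass(5, true) // size class 5 is 48 bytes in the listing
	fmt.Println(sc, sc.sizeclass(), sc.noscan()) // 11 5 true

	// Size class 5: 48-byte objects in a 1-page span.
	fmt.Println(objectsPerSpan(48, 1)) // 170
}
```

With this encoding the mcache can index its span array directly by span class, which is why the array needs 68 * 2 = 136 slots.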

2.2 systemstack

Original text: 7.1.2 - Thread cache - Initialization

The code of runtime.allocmcache is as follows:

func allocmcache() *mcache {
	var c *mcache
	systemstack(func() {
		lock(&mheap_.lock)
		c = (*mcache)(mheap_.cachealloc.alloc())
		c.flushGen = mheap_.sweepgen
		unlock(&mheap_.lock)
	})
	for i := range c.alloc {
		c.alloc[i] = &emptymspan
	}
	c.nextSample = nextSample()
	return c
}

The function body calls runtime.systemstack. Why is that call there, and what does it actually do?

First, the explanation given in the official Go documentation:

Runtime code often temporarily switches to the system stack using systemstack, mcall, or asmcgocall to perform tasks that must not be preempted, that must not grow the user stack, or that switch user goroutines. Code running on the system stack is implicitly non-preemptible and the garbage collector does not scan system stacks. While running on the system stack, the current user stack is not used for execution.

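Some tasks in the Go runtime must not be preempted, must not grow the user stack, or need to switch goroutines. Such tasks run on the system stack rather than on the goroutine's own small stack (which starts at only 2KB).

What exactly is this system stack? The name suggests a kernel stack maintained by the operating system, but if that were the case, the Go function passed to runtime.systemstack would have to execute in kernel mode, which clearly makes no sense. The system stack is the stack that belongs to each thread (the runtime calls it the thread's g0 stack); it lives in user space, not in the kernel. ChatGPT explains it this way: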

The term "system stack" in Go is not referring to a separate stack maintained by the operating system. Instead, it refers to a specific region of the user stack that is reserved for executing certain critical operations within the Go runtime. These critical operations include system calls, memory allocation, garbage collection, and other low-level tasks that need to bypass the normal Go user stack.

In other words, the "system stack" is still part of the user stack, but it is used for specific purposes within the Go runtime and is not accessible to the user code directly.

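Besides these runtime tasks, other work also has to run on the system stack, such as copying a goroutine's stack. A goroutine runs on a fairly small stack, and on entry to a Go function the generated code checks whether enough stack space remains. If not, the runtime allocates a larger stack and copies the contents of the current stack onto it. The copying code obviously cannot run on the stack that has just run out of space, so it has to run on the system stack. Some garbage collection tasks run on the system stack as well.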
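runtime.systemstack itself cannot be called from user code, but the stack copying described above can be observed indirectly. The following is a rough sketch, not a statement about runtime internals: deepen and stackGrowthDemo are hypothetical helpers, and the demo assumes the compiler keeps marker on the goroutine stack. Whether the two printed addresses differ depends on the Go version and on how large the stack already is, so the program reports the comparison instead of assuming the result.

```go
package main

import (
	"fmt"
	"unsafe"
)

// deepen recurses n levels and uses roughly 1KB of stack per frame, so a
// deep call needs far more stack than a goroutine starts with.
//
//go:noinline
func deepen(n int) int {
	var pad [1024]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return deepen(n-1) + int(pad[n%len(pad)])
}

// stackGrowthDemo records the address of a local variable as a plain
// integer before and after a deep recursion. If the runtime copied the
// stack to a larger one in between, the two integers differ.
func stackGrowthDemo(depth int) (before, after uintptr, result int) {
	var marker int
	before = uintptr(unsafe.Pointer(&marker))
	result = deepen(depth)
	after = uintptr(unsafe.Pointer(&marker))
	return before, after, result
}

func main() {
	before, after, result := stackGrowthDemo(1000)
	fmt.Println("recursion result:", result)
	fmt.Printf("marker address before: %#x\n", before)
	fmt.Printf("marker address after:  %#x\n", after)
	fmt.Println("stack moved:", before != after)
}
```

The addresses are stored as uintptr on purpose: real pointers into the stack are adjusted by the runtime during the copy, while plain integers are not, so only the integer form can reveal that the stack moved.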

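2.3 Central Cache

Original text: 7.1.2 - Central cache

runtime.mcentral is the memory allocator's central cache. Unlike the thread cache, it is shared rather than owned by a single P, and the book notes that accessing the memory management units in the central cache requires a mutex: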

type mcentral struct {
	spanclass spanClass
	partial  [2]spanSet
	full     [2]spanSet
}

A spanSet is a data structure that holds a collection of memory spans. It is used to organize and manage memory spans of a specific size class.

partial: This set contains memory spans that have some free memory blocks available for allocation. In other words, these spans are not fully used.

full: This set contains memory spans that have all their memory blocks allocated and are fully used. In other words, there is no free space available in these spans.

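Both partial and full are two-element arrays. According to the comments in the runtime source, one element holds spans that have already been swept and the other holds spans that have not. sweepgen increases by 2 on every GC cycle, so the swept set is partial[sweepgen/2%2] and the unswept set is partial[1-sweepgen/2%2]: the two sets trade roles each cycle without any spans having to be moved. ChatGPT describes this role swap as "double buffering":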

The use of two elements in the array is an optimization technique known as "double buffering." It allows the allocator to quickly switch between two sets of partial spans without locking or contention, making the allocation process more efficient.

The Go runtime uses "double buffering" to manage partial spans efficiently. While one set of partial spans is being actively used for allocations, the other set can be prepared in the background, and once it is ready, the roles of the two sets are swapped. This minimizes contention and allows for fast memory allocations.
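The role swap is easier to see with the index arithmetic written out. This is a small sketch based on the indexing described in the runtime's comments on mcentral (sweepgen grows by 2 per GC cycle); sweptIndex and unsweptIndex are hypothetical helper names.

```go
package main

import "fmt"

// sweptIndex returns which of the two array elements holds swept spans
// for a given sweepgen.
func sweptIndex(sweepgen uint32) uint32 {
	return sweepgen / 2 % 2
}

// unsweptIndex returns the other element, which holds unswept spans.
func unsweptIndex(sweepgen uint32) uint32 {
	return 1 - sweepgen/2%2
}

func main() {
	// sweepgen advances by 2 per GC cycle, so the roles alternate.
	for sweepgen := uint32(0); sweepgen <= 6; sweepgen += 2 {
		fmt.Printf("sweepgen=%d swept=partial[%d] unswept=partial[%d]\n",
			sweepgen, sweptIndex(sweepgen), unsweptIndex(sweepgen))
	}
}
```

When a new GC cycle starts, every span that was swept in the previous cycle becomes unswept again, and bumping sweepgen relabels the whole set in one step.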

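The thread cache obtains a new memory management unit through the central cache's runtime.mcentral.cacheSpan method. The book describes the method's implementation as follows:

Call runtime.mcentral.partialSwept to look for a usable memory management unit in the runtime.spanSet of swept spans that still have free space;
Call runtime.mcentral.partialUnswept to look for a usable memory management unit in the runtime.spanSet of unswept spans that still have free space;
Call runtime.mcentral.fullUnswept to take a memory management unit from the runtime.spanSet of unswept spans that have no free space, and sweep its memory with runtime.mspan.sweep;
Call runtime.mcentral.grow to request a new memory management unit from the heap;
Update fields such as allocCache on the memory management unit to speed up allocation.

This passage confused me at first. It only made sense once I realized that sweep here refers to garbage collection: sweeping a span reclaims the objects in it that the collector found unreachable, which is why an unswept span with no free space may have room again after it is swept.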
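The lookup order in the list above can be sketched with a toy model. Everything here is a simplification for illustration: span, central, and pop are hypothetical stand-ins for runtime.mspan, runtime.mcentral, and runtime.spanSet, the garbage field is an invented way to model what sweeping reclaims, and the real method also deals with concurrency, sweep credit, and bookkeeping that are left out.

```go
package main

import "fmt"

// span is a toy stand-in for runtime.mspan.
type span struct {
	free    int  // object slots that are free right now
	garbage int  // slots holding dead objects that a sweep would reclaim
	swept   bool // whether the span was swept in the current cycle
}

// sweep reclaims dead objects, as the sweep phase of the GC would.
func (s *span) sweep() {
	s.free += s.garbage
	s.garbage = 0
	s.swept = true
}

// central is a toy stand-in for runtime.mcentral, with one slice per set.
type central struct {
	partialSwept   []*span
	partialUnswept []*span
	fullUnswept    []*span
}

// pop removes and returns the last span of a set, or nil if it is empty.
func pop(set *[]*span) *span {
	if len(*set) == 0 {
		return nil
	}
	s := (*set)[len(*set)-1]
	*set = (*set)[:len(*set)-1]
	return s
}

// cacheSpan tries the sources in the order the book lists them and reports
// which one supplied the span.
func (c *central) cacheSpan() (*span, string) {
	if s := pop(&c.partialSwept); s != nil {
		return s, "partialSwept"
	}
	if s := pop(&c.partialUnswept); s != nil {
		s.sweep() // in this model, unswept spans are swept before use
		return s, "partialUnswept"
	}
	for s := pop(&c.fullUnswept); s != nil; s = pop(&c.fullUnswept) {
		s.sweep()
		if s.free > 0 {
			return s, "fullUnswept"
		}
		// Still full after sweeping: this toy model just skips it.
	}
	// Nothing reusable: model "grow" as a fresh span with 8 free slots.
	return &span{free: 8, swept: true}, "grow"
}

func main() {
	c := &central{
		partialUnswept: []*span{{free: 2, garbage: 1}},
		fullUnswept:    []*span{{garbage: 3}, {garbage: 0}},
	}
	for i := 0; i < 3; i++ {
		s, from := c.cacheSpan()
		fmt.Printf("got span with %d free slots from %s\n", s.free, from)
	}
}
```

The second call shows the point made above: a span that was full before sweeping can supply free slots once its dead objects are reclaimed.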
