Published on: Mar 23, 2024
Advanced Go: Internals, Memory Model, Garbage Collection and Concurrency
Since the release of Go in 2009, Go has gained a lot of traction among developers and become the de-facto choice of programming language for writing efficient Microservices and also command-line tools. Go's popularity comes with its simplicity and how effectively it solves complex concurrency problems with its simple concurrency model. Programmers coming from C/C++ backgrounds can easily start developing applications in Go even with less than 2-3 days of learning. That's much simple Golang is. Golang has been developed keeping in mind modern programming language paradigms and principles and to avoid legacy problems that can't be avoided by other languages. The way Golang demands writing and structuring the code lets developers avoid boring Object-oriented patterns.
Even though developers don't need to understand how Golang internally works, it's better to learn what the components and techniques implemented in Golang make it one of the best modern programming languages. In this blog, we will delve into some of the important concepts that make a better understanding of Golang and its internals.
I will be very brief about the following concepts as each one of them requires separate writing and mostly it will be like notes with some important points. The readers are encouraged to follow the links that will be provided for each concept to learn further.
Go vs C/C++
The developers of Go created the programming language along the lines of C/C++. Even though Go is a compiled language and most of the time it is close to the speed of C/C++, Go is not a worthy contestant for C/C++ in terms of speed. Check this Go vs C++ comparison. But for normal day-to-day tasks like writing APIs, handling network connections, and processing medium-sized data (for larger-size, prefer Rust), Go can be 1.5 times in comparison speed to C/C++ but not orders in comparison. Most of the time this speed difference is not noticeable. This makes Go the best choice for writing Microservices or web applications compared to Java, C#, or Python. Earlier Go's compiler was written in C but now changed to pure Go.
Even though Go is a compiled language and inspired by C/C++, it avoids some common compiler practices as the Go compiler doesn't do optimizations like C/C++ due to its adaptation of better memory model compared to C/C++ (more of this in a later section). The compiler optimizations done in C++ might result in dangerous pointer operations giving unexpected computations and sometimes leading to program crashes. One other important thing that makes Go slower than C/C++ is the Garbage collection implemented in Golang. So, Go has to interfere in the program lifecycle to maintain the liveness of the objects, schedule them for cleaning in the next cycle and other operations add some overhead where there is no automatic garbage collection in C++ but only manual de-allocation which causes problems sometimes (this is one of the problems that Rust solves).
And the most important part of Golang is compilation speed compared to C++. Go's superior compilation makes it the preferred heavy-size application that involves multiple iterations of compilation and deployment to test. It's like a trade-off to C++ as the latter gives performance but takes so much time in compilation where as Go is reasonable speed and compiles very fast.
Go Memory model
For any programming language memory model is one of the most important components. The techniques and principles used in the memory model determine how the programming language should be designed and written.
A memory model speaks about how the program handles the data race and also the operations performed at the memory location and atomic level like what happens when there is simultaneous read or write operations happening at the same memory location or what if two or more parties are writing to the same memory location.
Before going forward on memory models, it's worth learning how Go treats References. There are no references in the Go. People confuse with & operator in Go with the references in C++. There are only pointers in Go where the variable points to the same location of memory but doesn't share the same memory. Why is this? Because Go is strict about dangling pointers that we face in C++. So, to avoid the dangling memory object Go avoids variables with pointers directly referencing to memory location. Go passes data by value for functions, no reference data passing.
Earlier languages like C/C++, Python, and Java were developed when the programs were developed only for single-threaded applications. Go, being a modern language where harvesting all computing resources is essential like utilizing all CPU cores, has been designed to avoid problems involving simultaneous access to the same memory location. Some languages like C/C++ and Java solve this problem by using Locks and Mutexes. Even though Go has native support for Locks and Mutexes, it's not the goal for Go to solve this problem by using them. The Golang inherently solves this problem through better synchronization, restrictions of data race conditions, and a better-designed memory model. By definition, all Python native operations are atomic as it only single-threaded, but in a language like Go where concurrency is the main weapon, it has to do something extraordinary to avoid data race conditions.
The gist of the Go's memory model is, that when multiple goroutines are performing read/write operations on the same memory location, the accessing variables should only see the value before or after the value has been written but not in between. For example,
1If A, B, C, and D are the three variables accessing the same location that has a value of 40 2 3If A is reading 4B is Writing it as 50 5C is writing it as 60 6D is also reading 7 8Then, the following could be one of the outcomes without data race conditions 91. A will read the value as 40 102. B, in the meantime changes the values to 50 11 12If operation C is performed before D, then 133. C also changes the value from 50 to 60 144. D will eventually read 60 15 16else, 173. D will read 50 184. C will write it as 60 19 20Here all operations are performed without race conditions and all are atomic even though all are happening at the same time. 21
If the same thing happened in C/C++, the outcome is undetermined like
1A might read the value as 40 2 3While B is writing, C also writes to the same location, 4and D might read the value as 35 or 240 or even as -3002. 5 6This behavior is unexpected and undetermined. 7
How Go does this is very simple, it organizes the operation by goroutines as follows:
- Each goroutine execution is a set of memory operations
- Each Go program execution is a set of goroutine executions in a synchronized behavior where any write operation is executed before any read operation that comes after without executing them both
- Any write is happening, no other write operation is performed on the same location.
As discussed in the above section, to avoid data race conditions that happen when multiple goroutines access the same memory location, the compiler optimizations were excluded in Go.
Another thing is, that Go implements the memory model with one heap, multiple stacks, and a single constants and instructions memory section. Stack borrows memory from the limited heap. We can dynamically increase the heap memory and the stacks are nothing but the goroutines which we see in later sections.
Goroutines and Concurrency
Go is a Concurrency focused programming language contrary to multi-programming/parallel. Check out this famous video on Concurrency is not Parallelism by Rob Pike who is one of the creators of Go. Go concurrency model is based on Communicating sequential processes (CSP). The famous quote/Go idiom Do not communicate by sharing memory; instead, share memory by communicating. It simply means that other programming languages that were developed when the application was running with only a single thread era, those programming languages avoid data race conditions by using locks/mutexes to communicate between multiple threads. Later synchronized data structures like Queues and Pipes came into existence to solve these problems. Go being a modern programming language looks at this problem differently by taking inspiration from CSP where the process/goroutine that needs to send/notify the data to other process/goroutine is performed by completely passing the ownership of the memory in the synchronized data structure called Channels. The goroutine1 passes the data into this channel where goroutine2 listens and reads where the data ownership is completely handed over to goroutine2.
Even though Go is a concurrent programming language, we can do parallel programming with multiple Goroutines. This can be done because Go multiplexes the multiple goroutine execution to multiple OS threads for multiple cores in the machine. This makes goroutines work in parallel whereas they are processed concurrently by the scheduler. This was a game changer in the computing industry where multiple programming languages started implementing concurrency with Coroutines which have been in existence since the 1950s. Python has these coroutines, Java also introduced lightweight threads, and the big brother C++ also introduced coroutines.
Just like coroutines, goroutines are very lightweight, and the memory overhead is very minimal making goroutines the perfect choice for concurrency compared to threads. 90% of concurrency problems can be solved by using only goroutines and channels. Also, switching between goroutines is done by the process whereas switching between threads is done by OS. This makes goroutines faster.
It's good to have the following points to remember while using goroutines and concurrency:
- Select takes the first non-blocking case in a pseudo-random order. So, we cannot expect which block will execute in order as it can be random.
- Avoid using low-level data structures like Mutexes instead use the sync package for atomic-level access to primitive data types/structures.
- Atomic operations are slow, so use only when required. And don't switch between atomic and non-atomic functions.
- Apply concurrency at call-site: This means if there are multiple synchronous functions, then organize these functions as asynchronous calls by implementing concurrency at the function that is calling these synchronous functions.
Go starts goroutines with small stack memory (2kb size) and dynamically stores the data in a heap when data is grown. A stack per goroutine. This makes, scheduling goroutines very easy. Just like functions that are stored as stacks, when one goroutine spins another goroutine, the execution state is stored for the first goroutine in the stack, and the newly created goroutine is executed as a new stack in a new CPU core. This way of treating goroutines as stacks makes the scheduler suspend the goroutine execution, and continuing the execution is very easy even if it can continue the execution in a separate core completely because the goroutine state is stored in the stack.
Garbage collection
With GC, Go provides security, and fast memory access, as most of the memory is allocated in a stack. Whereas in C++, memory allocation and de-allocation in the heap should be managed by the program itself.
As in any language, the local variables (non-pointer) of a function are cleaned right away when the function is out-of-scope. So, Go doesn't care about this cleaning of local variables. As we all know these variables are stored in the stack these do not need to be looked up for recycling as the process is dynamic. GC handles the memory stored in the heap which is finite in size and the compiler should take care of this heap memory from time to time to avoid the out-of-memory issue (out-of-memory issue is not avoidable in some cases and manual cleaning is highly preferred instead of relying on GC completely). The typical data structures that are stored in the heap are Slices, Maps, and unbounded Channels whose size can be dynamically increased over time in the programming lifecycle. So typically variables with pointers to these data structures are stored in the heap.
One of the popular GC techniques involves maintaining the object reference count like in Python and cleaning the memory when the reference is 0. Go approaches a different technique called Mark-and-Sweep where GC traces all objects that are referred by the pointers as live heap and sweeps the memory held by non-live heap object for allocation of new data.
For a GC to run and do its work, it needs CPU time but frequent GC makes the program run slow. But, if GC executes in long intervals, the memory in the heap might pile up leading to out-of-memory for newly coming memory allocations. So, GOCC maintains a trade-off between CPU and memory with GC frequency. It's very complex to define how GC works because there are numerous parameters to control the frequency of GC depending on the application's needs. Read further about how GC frequency is calculated. Simply put, we can think GC will execute every 2 seconds (let's assume), and if in the meantime, when the heap memory is increased than the defined threshold at 1.5 seconds, then GC will trigger automatically.
For large and memory-intensive applications, it's recommended to run the GC cycle manually or define the behavior of the GC cycle and its frequency by setting parameters like target memory heap size and others.
Internals of Slices, Maps, and Channels
As discussed above there are no references in Go and Maps, Slices, and Channels are simply pointers but not actual data, their zero values are nil. This means, the compiler allocates the values of these data structures in the memory if they are pre-allocated size or else in the heap if they are dynamic and gives the pointer reference to the variables. If the pre-allocated size for Slices or Maps increases, they are moved to heap memory.
We know Slices are the pointers to the underlying Array storage. But, Maps and Channels are direct pointers to the underlying heap data structure that holds these values. So, these all are pointers to some data sources.
Maps and Slices are passed to functions by values and those values are pointers so that changes to duplicate variables change the source (not to be confused with pass-by-reference). Why is this? because inside the function, when the slice is grown/shrunk, a new array will be created by copying the old one and the new one will be referenced.
One more important thing is unlike Slices that are just pointers to backing arrays, for maps, the data is allocated arbitrarily in something called bucket memory structures when the map grows/shrinks. So, we cannot take the direct pointers to the map at that instance because the content of the map can be moved to another location if it shrinks/grows and if we have any reference to that old memory then it becomes a dangling pointer and Go doesn't allow dangling pointers. As the values of the keys can be changed dynamically, we cannot access the address to keys with pointers but values can be pointers.
For channels, one of the most important points to remember is, that when the sender has sent the data and holds the pointer to it, the sender should not modify the data holding by that pointer. Also, it's recommended to use read-only channels when the channels are passed to any functions.
The concepts discussed in this blog are very abstract and you're encouraged to delve deeper. Unlike Java or Python, there is very little to learn about Go and we do not even need to look further at what is not provided in the official documentation as everything has taken care of by Go itself making it's a simple and powerful language to use.
References
- Effective Go
- The Go Memory Model
- Concurrency is not Parallelism
- GO Advanced concurrency
- Concurrency Patterns In Go
- Go Concurrency Patterns
- Advanced Go Concurrency Patterns
- Rethinking Classical Concurrency Patterns
- Memory Optimization and Garbage Collector Management in Go
- https://dave.cheney.net/2017/04/30/if-a-map-isnt-a-reference-variable-what-is-it
- https://stackoverflow.com/questions/32495402/why-does-go-forbid-taking-the-address-of-map-member-yet-allows-slice-el