GopherCon 2018 - Go Scheduler

conference, golang, gophercon2018, notes

These are some notes from my experiences at GopherCon 2018. I don’t expect them to be laid out in any particularly useful way; I am mostly taking them so that in the future I can remember the bits I found most useful.

Go Scheduler Decisions

  • Can support hundreds of thousands of goroutines
  • How does the scheduler do what it does in a fair way?


  • Goroutines are user-space threads managed by the Go runtime

    • Faster to create, destroy, and context-switch than OS threads
    • Smaller memory footprint
    • Need a scheduler to assign them to OS threads


  • Invoked when a goroutine is created and when one blocks
  • Also invoked on syscalls, but those additionally block the underlying OS thread
  • Goals

    • Use a small number of OS threads
    • Support high concurrency
    • Use all possible cores on a box (when appropriate)
  • How do goroutines get onto OS threads?

    • When to create OS threads?
    • When/how to distribute?
  • FIFO runqueue of goroutines that are ready to run

  • Can’t use a single OS thread, since a syscall would block it (and there would be no parallelism)

  • Don’t want one OS thread per goroutine (expensive)

  • Create threads when needed, but keep idle ones around to handle future goroutines

  • Need a lock on the shared runqueue

  • Also, what about creating lots of goroutines?

    • Lock contention ahoy!
    • Limit the number of OS threads, but to what? The number of cores
    • BUT! As the number of cores grows, we’re back to heavy contention on the single runqueue lock
  • Maybe split up the runqueue into N runqueues?

    • But what happens if the thread’s runqueue is empty?
    • Pick another runqueue at random and steal half of its work
    • What happens if an OS thread blocks on a syscall? A background thread redistributes its work and starts another OS thread (since the blocked one can’t run anything else)
  • What about non-cooperative goroutines (long-running, CPU-heavy operations)?

    • Can starve the other goroutines waiting in the runqueues
    • Need a mechanism for preemption
    • A background thread, “sysmon”, detects long-running goroutines and preempts them, putting them on a lower-priority global runqueue (threads check the global queue less often than their own runqueues)
  • Limitations

    • No priorities (FIFO)
    • No strong preemption
    • Not aware of system topology – bad for cache locality