GopherCon 2018 - Machine Learning on Go
These are some notes from my experiences at GopherCon 2018. I don’t expect these will be laid out in any particularly useful way; I am mostly taking them so I can remember some of the bits I found most useful in the future.
- XP - 45M LoC
- Ford F150 - 150M LoC!!!
- Google - 1B LoC
Want better tools to let us write code better (and better code)
Machine Learning on Go Code
- (mostly written in Python)
Input to the model is source code (instead of plain text, images, etc)
- Similar to data mining, NLP, graph-based learning
Trying to extract information about source code
- More structured
- Generation should compile
- Trying to extract things like intent vs. code written
Getting the Data
- GH Archive
- Public Git Archive
- Language Identification
- File Parsing - generate language agnostic ASTs
- Token Extraction - function names
- History analysis
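The file parsing and token extraction steps above can be sketched for the Go-only case using the standard library. This is a minimal sketch, not the pipeline from the talk: it produces a Go-specific AST via `go/parser` rather than a language-agnostic one, and the helper name `funcNames` and the sample source are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames (hypothetical helper) parses Go source text and returns the
// names of all top-level function and method declarations, in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		// Only function and method declarations carry a name we care about here.
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	// Sample input, invented for this sketch.
	src := `package demo

func ReadConfig() {}

type T struct{}

func (t T) writeLog() {}
`
	names, err := funcNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

The extracted names could then be split into sub-tokens or fed to a model as labels; that part is left out here.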
How to analyze?
- series of bytes?
- token sequence?
machine learning on trees is HARD
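For the "token sequence" option, Go's own `go/scanner` package can turn source text into a flat list of tokens. The sketch below is an assumption about what such a representation might look like, not what the speaker used; `tokenize` is a hypothetical helper, and it normalizes automatically inserted semicolons to `;`.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize (hypothetical helper) returns the token sequence of a piece of
// Go source as strings. Identifiers, literals, and keywords keep their text;
// operators use their symbol; every semicolon, including the ones the
// scanner inserts at line ends, is written as ";".
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler, comments skipped

	var toks []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		switch {
		case tok == token.SEMICOLON:
			toks = append(toks, ";")
		case lit != "":
			toks = append(toks, lit)
		default:
			toks = append(toks, tok.String())
		}
	}
	return toks
}

func main() {
	for _, t := range tokenize("x := 1 + y") {
		fmt.Printf("%q ", t)
	}
	fmt.Println()
}
```

A flat sequence like this is much easier to feed to a model than a tree, at the cost of throwing away the nesting structure.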
How to Learn?
- “kind of like a puppy”
Given the previous 9 tokens, predict the next one
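The "previous N tokens, predict the next one" framing can be sketched as a sliding window over a token sequence. To keep the example self-contained, the model here is a plain frequency table standing in for a neural network; it is not the approach from the talk. The names `example`, `windows`, `train`, and `predict` are all hypothetical, and the demo uses a window of 2 instead of 9 so the toy input stays short.

```go
package main

import (
	"fmt"
	"strings"
)

// example pairs a fixed-size context window with the token that followed it.
type example struct {
	context []string
	next    string
}

// windows slides a window of n tokens over the sequence and emits one
// (context, next) pair per position.
func windows(tokens []string, n int) []example {
	var out []example
	for i := n; i < len(tokens); i++ {
		out = append(out, example{context: tokens[i-n : i], next: tokens[i]})
	}
	return out
}

// key joins a context into a single map key; NUL cannot appear in Go tokens.
func key(context []string) string {
	return strings.Join(context, "\x00")
}

// train counts how often each next token follows each context.
func train(examples []example) map[string]map[string]int {
	counts := make(map[string]map[string]int)
	for _, ex := range examples {
		k := key(ex.context)
		if counts[k] == nil {
			counts[k] = make(map[string]int)
		}
		counts[k][ex.next]++
	}
	return counts
}

// predict returns the most frequent next token for a context. Ties are
// broken by picking the lexicographically smallest token. The boolean is
// false when the context was never seen.
func predict(counts map[string]map[string]int, context []string) (string, bool) {
	best, bestN := "", 0
	for tok, n := range counts[key(context)] {
		if n > bestN || (n == bestN && tok < best) {
			best, bestN = tok, n
		}
	}
	return best, bestN > 0
}

func main() {
	tokens := strings.Fields("x := a + b ; y := a + c ; z := a + b ;")
	model := train(windows(tokens, 2))
	next, ok := predict(model, []string{"a", "+"})
	fmt.Println(next, ok)
}
```

A real setup would replace the frequency table with a learned model, but the training pairs would be built the same way.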
Recurrent Neural Networks
- Tried over the Go standard library with charRNN – 61% accuracy
- Generated some Go-like constructs, but nothing really accurate
What can we Build?
- Is the next token right or not?
- Identify “interesting” bits of code in a diff
- Suggesting function names
- Assisted code review
- Bug prediction
- Style Guide Enforcement
- Automated Code Review
- Code Generation
- Natural Analysis