GopherCon 2018 - Machine Learning on Go
These are some notes from my experiences at GopherCon 2018. I don't expect them to be laid out in any particularly useful way; I am mostly taking them so I can remember the bits I found most useful.
Windows XP - 45M LoC
Ford F150 - 150M LoC!!!
Google - 1B LoC
Want better tools to let us write code better (and better code)
Machine Learning on Go Code
(the ML tooling is mostly written in Python)
Input to the model is source code (instead of plain text, images, etc.)
- Similar to data mining, NLP, graph-based learning
Trying to extract information about source code
- More structured than natural language
- Generated code should compile
- Trying to extract things like what the author intended vs. the code actually written
Getting the Data
- GH Archive
- Public Git Archive
- Language Identification
- File Parsing - generate language-agnostic ASTs
- Token Extraction - function names
- History analysis (
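To make the token-extraction step concrete, here is a minimal sketch that pulls function names out of Go source with the standard library's `go/parser` and `go/ast` packages. This is my own illustration, not the pipeline from the talk: `go/ast` produces Go-specific trees rather than the language-agnostic ASTs mentioned above, and the `functionNames` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

// functionNames parses one Go source file and returns the names of its
// top-level function and method declarations, in source order.
// (Hypothetical helper for illustration.)
func functionNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		// *ast.FuncDecl covers functions and methods; *ast.GenDecl
		// (imports, types, vars, consts) is skipped.
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package demo

type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

func NewCounter() *counter { return &counter{} }
`
	names, err := functionNames("demo.go", src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names) // [Inc NewCounter]
}
```

Parsing alone (no type checking) is enough here, so the snippet doesn't need to compile as a whole program to yield names.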
How to Analyze?
- As a series of bytes?
- Machine learning on trees is HARD
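As a rough illustration of why representation matters (my own example, using the standard `go/scanner` package rather than anything shown in the talk), the same snippet can be seen as raw bytes or as a much shorter sequence of tokens, a middle ground between bytes and full trees. The `tokenize` helper is hypothetical, and it deliberately drops the semicolons the scanner inserts automatically at line ends.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize splits Go source into token strings with the standard scanner.
// Identifiers, literals and keywords keep their source text; operators and
// delimiters use the token's string form (e.g. ":=", "{").
func tokenize(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, src, nil, 0) // no error handler; comments are skipped

	var toks []string
	for {
		_, tok, lit := s.Scan()
		switch {
		case tok == token.EOF:
			return toks
		case tok == token.SEMICOLON && lit == "\n":
			// Skip semicolons inserted automatically at a newline or EOF.
			continue
		case lit != "":
			toks = append(toks, lit)
		default:
			toks = append(toks, tok.String())
		}
	}
}

func main() {
	src := []byte("x := a + 1")
	toks := tokenize(src)
	fmt.Printf("%d bytes, %d tokens: %v\n", len(src), len(toks), toks)
	// 10 bytes, 5 tokens: [x := a + 1]
}
```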
How to Learn?
- “kind of like a puppy”
- Given the previous 9 tokens, predict the next one
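A minimal sketch of how the "previous 9 tokens" setup turns a token stream into training examples for next-token prediction. The `makeExamples` helper and `example` type are hypothetical names, the token list in `main` is hand-tokenized for illustration, and the model that would consume these pairs is left out.

```go
package main

import "fmt"

// contextSize is how many preceding tokens are used to predict the next
// one, following the "previous 9 tokens" setup from the talk.
const contextSize = 9

// example is a single training pair: a window of preceding tokens and the
// token that actually follows it. (Hypothetical type.)
type example struct {
	Context []string
	Next    string
}

// makeExamples slides a window of n tokens over the sequence and emits one
// (context, next) pair per position. Sequences with n or fewer tokens yield
// nothing. Context slices share memory with tokens, so treat them as read-only.
func makeExamples(tokens []string, n int) []example {
	var out []example
	for i := n; i < len(tokens); i++ {
		out = append(out, example{Context: tokens[i-n : i], Next: tokens[i]})
	}
	return out
}

func main() {
	// Hand-tokenized: func add(a, b int) int { return a + b }
	tokens := []string{"func", "add", "(", "a", ",", "b", "int", ")", "int",
		"{", "return", "a", "+", "b", "}"}
	for _, ex := range makeExamples(tokens, contextSize) {
		fmt.Printf("%v -> %q\n", ex.Context, ex.Next)
	}
}
```

With 15 tokens and a window of 9, this prints 6 pairs; the first one asks the model to predict `{` after `func add(a, b int) int`.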
Recurrent Neural Networks
- Tried charRNN on the Go standard library: 61% accuracy
- Generated some Go-like constructs, but not really accurate
What Can We Build?
- Is the next token right or not?
- Identify “interesting” bits of code in a diff
- Suggesting function names
- Assisted code review (
- Bug prediction
- Style Guide Enforcement
- Automated Code Review
- Code Generation
- Natural Analysis
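As a rough sketch of the "is the next token right or not?" idea, here is a deliberately simple stand-in for a learned model: a bigram counter that flags any token whose (previous, current) pair never appeared in the training corpus. This swaps plain frequency counting in for the neural models discussed above; `bigramModel`, `train`, and `suspicious` are hypothetical names, and the tiny corpus is made up.

```go
package main

import "fmt"

// bigramModel counts how often each token follows each other token in a
// training corpus: counts[prev][next]. (Hypothetical stand-in for a model.)
type bigramModel map[string]map[string]int

// train builds bigram counts from already-tokenized source files.
func train(corpus [][]string) bigramModel {
	m := bigramModel{}
	for _, toks := range corpus {
		for i := 1; i < len(toks); i++ {
			prev, next := toks[i-1], toks[i]
			if m[prev] == nil {
				m[prev] = map[string]int{}
			}
			m[prev][next]++
		}
	}
	return m
}

// suspicious returns the indices of tokens whose (previous, current) pair
// never occurred in training, as candidates for "this token looks wrong".
// Looking up a missing inner map is safe: it yields the zero count.
func (m bigramModel) suspicious(toks []string) []int {
	var out []int
	for i := 1; i < len(toks); i++ {
		if m[toks[i-1]][toks[i]] == 0 {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	// A made-up, hand-tokenized corpus of idiomatic error checks.
	corpus := [][]string{
		{"if", "err", "!=", "nil", "{", "return", "err", "}"},
	}
	m := train(corpus)

	candidate := []string{"if", "err", "==", "nil", "{", "return", "err", "}"}
	for _, i := range m.suspicious(candidate) {
		fmt.Printf("unusual token %q at position %d\n", candidate[i], i)
	}
}
```

Here `==` and the `nil` after it get flagged. With a real corpus you would likely want a probability threshold rather than "never seen", since rare but valid pairs would otherwise be flagged too.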