GopherCon 2018 - Machine Learning on Go

conference, golang, gophercon2018, notes

These are some notes from my experiences at GopherCon 2018. I don’t expect them to be laid out in any particularly useful way; I am mostly taking them so that I can remember the bits I found most useful.


Fun Facts

  • Windows XP - 45M LoC

  • Ford F150 - 150M LoC!!!

  • Google - 1B LoC

  • We want better tools that help us write code better (and write better code)


Machine Learning on Go Code

  • (mostly written in Python)

  • Input to the model is source code (instead of plain text, images, etc.)

    • Similar to data mining, NLP, graph-based learning
  • Trying to extract information about source code

    • More structured than natural language
    • Generated code should compile
    • Trying to extract things like intent vs. the code actually written
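The "generated code should compile" point suggests a cheap filter for model output. Here is a minimal sketch, assuming each generated snippet is a whole Go file with no imports: the standard go/parser package checks syntax, and go/types checks types. The package name gen and the helpers parses and typeChecks are my own, not something from the talk.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// parses is a hypothetical helper (not from the talk) that reports
// whether src is a syntactically valid Go file.
func parses(src string) bool {
	fset := token.NewFileSet()
	_, err := parser.ParseFile(fset, "generated.go", src, parser.AllErrors)
	return err == nil
}

// typeChecks is a hypothetical helper that reports whether src parses
// and also type-checks. Assumption: src has no imports, because no
// types.Importer is configured.
func typeChecks(src string) bool {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "generated.go", src, 0)
	if err != nil {
		return false
	}
	var conf types.Config // zero value: no Importer, stops at the first error
	_, err = conf.Check("gen", fset, []*ast.File{f}, nil)
	return err == nil
}

func main() {
	ok := "package gen\n\nfunc add(a, b int) int { return a + b }\n"
	typeErr := "package gen\n\nfunc f() int { return \"x\" }\n"
	syntaxErr := "package gen\n\nfunc add(a, b int) int { return a + }\n"
	fmt.Println(parses(ok), typeChecks(ok))               // true true
	fmt.Println(parses(typeErr), typeChecks(typeErr))     // true false
	fmt.Println(parses(syntaxErr), typeChecks(syntaxErr)) // false false
}
```

Parsing alone accepts the second snippet; only type-checking rejects it, which is why "should compile" is a stronger requirement than "should parse". Snippets with imports would need a types.Importer, which I left out.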

Getting the Data

  • GH Archive
  • Public Git Archive
  • Language Identification
  • File Parsing - generate language-agnostic ASTs
  • Token Extraction - e.g. function names
  • History analysis (go-git)
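For the token-extraction step, here is a minimal sketch that pulls function and method names out of a single Go file. It uses Go's own go/parser and go/ast packages rather than the language-agnostic ASTs mentioned above, so it only handles Go; funcNames is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames is a hypothetical helper (not from the talk). It returns the
// names of all top-level function and method declarations in a Go source
// file, in source order. Function literals are anonymous, so they are
// not included.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package demo

type T struct{}

func Add(a, b int) int { return a + b }

func (T) String() string { return "T" }
`
	names, err := funcNames(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [Add String]
}
```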

Data Analysis

  • How to analyze?

    • series of bytes?

    • token sequence?

    • AST?

      • machine learning on trees is HARD
    • flow graph?
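To make the "token sequence?" option concrete, Go's standard go/scanner package can turn a file into a flat list of tokens. This is a minimal sketch; the output format (source text for identifiers, keywords and literals, canonical spelling for operators, and ";" for every semicolon, including the ones Go inserts automatically at line ends) is my own choice, and tokenize is a hypothetical name.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize is a hypothetical helper (not from the talk) that turns Go
// source into a flat token sequence. Identifiers, keywords and literals
// keep their source text, operators use their canonical spelling, and
// every semicolon (explicit or automatically inserted) becomes ";".
// Comments are skipped because the mode does not include ScanComments.
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler: errors are only counted
	var toks []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		switch {
		case tok == token.SEMICOLON:
			// lit is "\n" for semicolons inserted at line ends.
			toks = append(toks, ";")
		case lit != "":
			toks = append(toks, lit)
		default:
			toks = append(toks, tok.String())
		}
	}
	return toks
}

func main() {
	fmt.Printf("%q\n", tokenize("if x := f(); x > 0 {\n\treturn x\n}\n"))
}
```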

How to Learn?

  • Neural Networks

    • “kind of like a puppy”
  • Given the previous 9 tokens, predict the next one

  • Recurrent Neural Networks

    • Tried charRNN on the Go standard library: 61% accuracy
    • Generated some Go-like constructs, but the output was not very accurate
  • code2vec
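The "given the previous 9 tokens, predict the next one" setup needs (context, next token) training pairs. The talk didn't say how its data was prepared, so this is only a sketch of a sliding window over a token sequence; Example and windows are hypothetical names, and the window size is a parameter (9 in the talk).

```go
package main

import (
	"fmt"
	"strings"
)

// Example is a hypothetical training instance (not from the talk): a
// fixed-size context of previous tokens and the token that follows it.
type Example struct {
	Context []string
	Next    string
}

// windows slides a window of size n over toks and emits every
// (previous n tokens -> next token) pair. Inputs with n or fewer tokens
// produce no examples. Context slices alias toks, so treat them as
// read-only.
func windows(toks []string, n int) []Example {
	var out []Example
	for i := n; i < len(toks); i++ {
		out = append(out, Example{Context: toks[i-n : i], Next: toks[i]})
	}
	return out
}

func main() {
	// strings.Fields keeps the demo short; real input would come from a
	// proper tokenizer.
	toks := strings.Fields("func add ( a , b int ) int { return a + b }")
	for _, ex := range windows(toks, 9) {
		fmt.Printf("%v -> %q\n", ex.Context, ex.Next)
	}
}
```

Sharing the backing array avoids copying every window, which matters once the corpus is something the size of the standard library.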

What can we Build?

  • Checking whether the next token is right or not
  • Identifying “interesting” bits of code in a diff
  • Suggesting function names
  • Assisted code review (src-d/lookout)
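For "is the next token right or not?", a real system would use a trained neural model. As a toy stand-in, here is a count-based bigram model (a deliberately simpler technique than anything in the talk) that flags tokens it rarely or never saw after the previous token. bigramModel, train, prob and flag are all hypothetical names.

```go
package main

import "fmt"

// bigramModel is a hypothetical, count-based stand-in for a neural
// model: counts[prev][next] is how often next followed prev in training.
type bigramModel map[string]map[string]int

// train counts adjacent token pairs in every sequence of the corpus.
func train(corpus [][]string) bigramModel {
	m := bigramModel{}
	for _, toks := range corpus {
		for i := 1; i < len(toks); i++ {
			prev, next := toks[i-1], toks[i]
			if m[prev] == nil {
				m[prev] = map[string]int{}
			}
			m[prev][next]++
		}
	}
	return m
}

// prob estimates P(next | prev) as a relative frequency; unseen pairs get 0.
func (m bigramModel) prob(prev, next string) float64 {
	total := 0
	for _, c := range m[prev] {
		total += c
	}
	if total == 0 {
		return 0
	}
	return float64(m[prev][next]) / float64(total)
}

// flag returns the indices of tokens whose probability given the
// previous token is below threshold, i.e. tokens the model finds surprising.
func (m bigramModel) flag(toks []string, threshold float64) []int {
	var out []int
	for i := 1; i < len(toks); i++ {
		if m.prob(toks[i-1], toks[i]) < threshold {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	m := train([][]string{
		{"if", "err", "!=", "nil", "{"},
		{"if", "err", "!=", "nil", "{"},
	})
	fmt.Println(m.flag([]string{"if", "err", "==", "nil", "{"}, 0.1)) // [2 3]
}
```

`if err == nil {` is perfectly valid Go, which is the catch: a low probability means "unusual", not "wrong", so this kind of check is closer to spotting "interesting" bits of a diff than to proving a token is incorrect.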

Future

  • Bug prediction
  • Education
  • Style Guide Enforcement
  • Automated Code Review

Future Future

  • Code Generation
  • Natural Analysis