Blog This is my blog. It's awesome.


Dafny Verification of Machine Learning Code

Written by Oreoluwa Alebiosu on January 15, 2015.

Notes

My goal was to verify a machine learning program, I initially chose NaiveBayes because it was the algorithm I was most familiar with. After spending some time in writing the code, I realized the data structure used for NaiveBayes may be too complex for Dafny. Typically you keep track of the number of occurrences of a cell value in a table for a particular label. The appropriate data structure is map with key as the label and value as the cell value. Another data structure is needed to store map in which case for each label, I store its column/attribute, then its column/attribute contains the number of occurrences of a cell value for that column/attribute. I use both data structures to compute the (prior and posterior) probability of each possible label for a future test case.

I encountered problems with the second data structure, as I couldn't fully verify the code because of an initialization error with the second data structure. I didn't document the root cause of the error, so you are welcome to contact me for more details so I can reporduce the error. However, I read more about incompleteness in dafny and noticed a major source of incompleteness is with quantifiers, in particular, if nested. Although, I tried to resolve it by guiding the proof search by inserting intermediate assertions, or providing witnesses for existential quantifiers. The map data structure seemed too daunting to work with and I couldn't find good documentation/resources to help me. Here is a link to the NaiveBayes dafny code is https://github.com/orealeb/naivebayes_dafny

So I decided to select a different ML code that doesn't require some form of key-value storage, particularly nested key-value storage. I chose kNN, as it is an algorithm I have worked with in my research. Below is a link to my code: https://github.com/orealeb/knn_dafny/blob/master/knn.dfy I was able to prove the correctness for all its invoked methods. Here is a final report of the this project. The final report is below:

More >>