Functional Testing of Machine Learning Models

Black-box testing and oracles

Black-box testing of machine learning (ML) models means testing them with no knowledge of their internal details, such as the algorithm used to create the model or the features it uses. The main objective of black-box testing is to verify, on an ongoing basis, that the models continue to meet quality expectations.

The difficulty in black-box testing lies in identifying the test oracle, a mechanism for determining whether a test has passed or failed.

Why are ML models considered non-testable? How could they be made testable?

In conventional software development, programs are made testable by using a test oracle: a tester, a test engineer or an automated testing mechanism that works alongside the program under test and verifies the outcome of each test against expected values. ML models, however, are often considered non-testable because black-box testing them is difficult. An ML model outputs a prediction, so there is often no known expected value against which to verify the test outcome.
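To make the contrast concrete, here is a minimal Python sketch of a conventional test oracle. The function `celsius_to_fahrenheit` and the `oracle` helper are hypothetical names chosen only for illustration. The point is that a specification supplies the expected values the oracle compares against, which a model's prediction on unseen data typically lacks.

```python
import math


def celsius_to_fahrenheit(celsius: float) -> float:
    """Conventional program under test (hypothetical); its specification is fully known."""
    return celsius * 9.0 / 5.0 + 32.0


def oracle(actual: float, expected: float) -> bool:
    """Test oracle: the test passes if the actual output matches the expected value."""
    return math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-9)


# Each case pairs an input with the expected value taken from the specification.
cases = [(0.0, 32.0), (100.0, 212.0), (-40.0, -40.0)]
results = [oracle(celsius_to_fahrenheit(x), expected) for x, expected in cases]
```

For an ML model, the right-hand column of `cases` is usually what is missing, which is why a different kind of check is needed.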

In the absence of a test oracle, pseudo-oracles are used. With a pseudo-oracle, the outputs produced for the same set of inputs are compared with each other, and correctness is judged from how well they agree.

For example, suppose a program that solves a problem has been coded as two different implementations, one of which is treated as the main program. The same input is passed through both implementations. If the other implementation's output is the same as the main program's output, or close to it (i.e., it falls within a given range around it), then the program can be considered to be working as expected.
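The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed implementation: `mean_main`, `mean_alternative`, `pseudo_oracle` and the `tolerance` parameter are all hypothetical names, and computing a mean stands in for whatever problem the two implementations solve.

```python
import math
from typing import Callable, List, Sequence


def mean_main(values: Sequence[float]) -> float:
    """Main program: the implementation treated as the reference (hypothetical)."""
    return sum(values) / len(values)


def mean_alternative(values: Sequence[float]) -> float:
    """Independent second implementation that uses a running average."""
    average = 0.0
    for count, value in enumerate(values, start=1):
        average += (value - average) / count
    return average


def pseudo_oracle(
    main: Callable[[Sequence[float]], float],
    alternative: Callable[[Sequence[float]], float],
    inputs: List[Sequence[float]],
    tolerance: float = 1e-6,
) -> List[Sequence[float]]:
    """Return the inputs on which the two outputs differ by more than the tolerance."""
    disagreements = []
    for x in inputs:
        if not math.isclose(main(x), alternative(x), rel_tol=0.0, abs_tol=tolerance):
            disagreements.append(x)
    return disagreements


inputs = [[1.0, 2.0, 3.0], [10.0, 20.0], [0.5]]
disagreements = pseudo_oracle(mean_main, mean_alternative, inputs)
```

An empty `disagreements` list means the two implementations agree within the tolerance on every input. Note that agreement does not prove correctness, since both implementations could share the same mistake; it only raises confidence.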

Figure: Absence of test oracle

Using pseudo-oracles is one of many techniques for performing quality control checks on ML models.

Functional testing

To better understand functional testing of ML models, consider the following notation:

      f = function under test
      x = input
      y = output
      H = heuristic (i.e., the test or an experiment)
      O = observation and oracle (i.e., the expected result)

Figure: Flow diagram of functional testing

How does a tester perform functional testing for ML models?

Testers must determine the f, x and y values before testing the models. If need be, they can consult the engineers who are building the product, read and write documentation, draw on their own experience by making assumptions, and research materials online. Through those methods, testers arrive at possible experiments and tests (H) and expected results (O), which in turn help them find functional bugs.

Testers refine their understanding of the f, x and y values with every iteration of the testing process.
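As a rough illustration of how f, x, y, H and O fit together, consider the Python sketch below. Everything in it is hypothetical: the keyword-based `f` is a toy stand-in for a real model, and the cases in `H` are the kind of expectations a tester might form from documentation or experience.

```python
from typing import Callable, List, Tuple


def f(x: str) -> str:
    """Function under test: a toy stand-in for a sentiment model (hypothetical)."""
    negative_words = {"bad", "poor", "terrible"}
    words = set(x.lower().split())
    return "negative" if words & negative_words else "positive"


# H: each test pairs an input x with O, the result the tester expects.
H: List[Tuple[str, str]] = [
    ("The service was terrible", "negative"),
    ("A really good experience", "positive"),
    ("Not bad at all", "positive"),  # the tester's assumption about negation
]


def run_functional_tests(
    f: Callable[[str], str], heuristics: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (x, y, O) for every test whose output y differs from the expected O."""
    failures = []
    for x, expected in heuristics:
        y = f(x)
        if y != expected:
            failures.append((x, y, expected))
    return failures


failures = run_functional_tests(f, H)
```

Here the third case fails because the toy `f` ignores negation. A failure like this is either a functional bug in f or a sign that the tester's expectation (O) needs revisiting, which is the kind of refinement each iteration provides.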