ONNX Inference

ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

In a previous BrainFlow release, we demonstrated loading an ONNX model generated from sklearn into BrainFlow and calling it from different programming languages.

In this follow-up, we detail conversion of a basic TensorFlow model into ONNX using tf2onnx, and then calling that model using BrainFlow’s implementation of the ONNX runtime.

Example - Training with Tensorflow

If you already have a model skip this part and go to inference example. Continue reading this section if you want to train one by yourself.

Before running inference using BrainFlow API we need to train the model. In this example we will use TensorFlow’s Keras API.

Dependencies we will need:

  • brainflow
  • tensorflow~=2.9.1
  • tf2onnx~=1.12.0
  • onnxruntime~=1.12.1

You can install all of them from PYPI using python -m pip install %dependency name here%

The dataset is not specified in this example, only demonstration of building and implementing the ONNX model. The example code demonstrates a 4-channel time series input with 150 samples. For preprocessing, we can use BrainFlow’s built-in preprocessing steps

Here we demonstrate preprocessing of a labeled EEG dataset with 1000 samples, with a segment length of 150 and 4 channels. This example supposes there are 6 classes. Data is not provided for this example, and the first couple of lines should be replaced with your data and labels arranged in the same way.

import numpy as np
from brainflow.data_filter import DataFilter, NoiseTypes, DetrendOperations, FilterTypes

num_classes = 6

# Load raw data (replace np.zeros with your own data)
data = np.zeros(shape=(1000, 150, 4))  # REPLACE WITH YOUR OWN DATA
labels = np.zeros(shape=(1000, num_classes))  # REPLACE WITH YOUR OWN DATA (Labels in vector form)
data_filtered = np.copy(data)
sampling_rate = 250

for i in range(0, data.shape[0]):
    for ch in range(0, data.shape[2]):
        sample = np.copy(data[i, :, ch], order='C')
        DataFilter.detrend(sample, DetrendOperations.CONSTANT)
        DataFilter.remove_environmental_noise(sample, sampling_rate, NoiseTypes.FIFTY_AND_SIXTY)
        DataFilter.perform_highpass(sample, sampling_rate, 8.0, 3, FilterTypes.BUTTERWORTH, ripple=0.0)
        sample = sample * 1e-3
        data_filtered[i, :, ch] = sample

For this example we will use a simple 4-layer CNN as a classier, using TensorFlow as a backend with the Keras ‘Layers’ API:

import tensorflow as tf

def build_cnn_4layer(data_len, num_channels, num_classes, lr=0.001):
    input_shape = [data_len, num_channels]
    input_samples = tf.keras.layers.Input(shape=input_shape)
    input_reshaped = tf.keras.layers.Reshape(target_shape=(*input_shape, 1))(input_samples)

    c1 = tf.keras.layers.Conv2D(20, [8, 1], strides=1, padding='valid', activation='relu')(input_reshaped)
    c2 = tf.keras.layers.Conv2D(20, [1, num_channels], strides=(1, 1), padding='valid', activation='elu')(c1)
    c2_1 = tf.keras.layers.BatchNormalization(axis=-1, momentum=0.1, epsilon=1e-5)(c2)
    c3 = tf.keras.layers.Conv2D(40, [4, 1], strides=(4, 1), padding='valid', activation='relu')(c2_1)
    c3_1 = tf.keras.layers.BatchNormalization(axis=-1, momentum=0.1, epsilon=1e-5)(c3)
    c4 = tf.keras.layers.Conv2D(80, [4, 1], strides=(4, 1), padding='valid', activation='relu')(c3_1)
    c4_1 = tf.keras.layers.BatchNormalization(axis=-1, momentum=0.1, epsilon=1e-5)(c4)
    ff = tf.keras.layers.Flatten()(c4_1)
    output_layer = tf.keras.layers.Dense(num_classes, activation='softmax')(ff)
    model = tf.keras.Model(input_samples, output_layer)
    adam = tf.keras.optimizers.Adam(learning_rate=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    return model
model = build_cnn_4layer(data.shape[1], data.shape[2], labels.shape[1])

The model summary should show the following:

Model: "model"
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 150, 4)]          0         
 reshape (Reshape)           (None, 150, 4, 1)         0         
 conv2d (Conv2D)             (None, 143, 4, 20)        180       
 conv2d_1 (Conv2D)           (None, 143, 1, 20)        1620      
 batch_normalization (BatchN  (None, 143, 1, 20)       80        
 conv2d_2 (Conv2D)           (None, 35, 1, 40)         3240      
 batch_normalization_1 (Batc  (None, 35, 1, 40)        160       
 conv2d_3 (Conv2D)           (None, 8, 1, 80)          12880     
 batch_normalization_2 (Batc  (None, 8, 1, 80)         320       
 flatten (Flatten)           (None, 640)               0         
 dense (Dense)               (None, 6)                 3846      
Total params: 22,326
Trainable params: 22,046
Non-trainable params: 280

Converting model to ONNX format

Once this model is trained, we can convert the model to the ONNX format:

import tensorflow as tf
import tf2onnx

def tf_to_onnx(model, output_path="model.onnx", dtype=tf.float32, opset=11, input_shape=None):
    if input_shape is None:
        input_shape = model.input.shape
    spec = (tf.TensorSpec(input_shape, dtype, name=model.input.name),)
    model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, output_path=output_path, opset=opset)
    output_names = [n.name for n in model_proto.graph.output]
    print('Converted to onnx: ' + output_path)
    print(f"Input: {model.input.name}, output: {output_names[0]}")
    return model_proto, model.input.name, output_names[0]

onnx_model, input_name, output_name = tf_to_onnx(model, onnx_output_path, 
                                        dtype=tf.float32, input_shape=(1, data.shape[1], data.shape[2]))

Important: you have to target opset 11 or less

The snippet above generates the ONNX file. You should inspect the content of this file and check names of output nodes. Go to netron.app and load your model there.

It is also worth checking the model using the ONNX runtime builtin checker, and it is also worth testing the model by running your test data through the ONNX model to see that it matches the output of the TensorFlow model:

import onnx
import onnxruntime

def run_onnx_model(sess, input_shape, x_test, y_test, input_name, dtype=tf.float32):
    correct = 0
    wrong = 0
    for i in range(0, np.shape(x_test)[0]):
        x = x_test[i].reshape((1, *input_shape))
        y = y_test[i]
        if dtype == tf.float32:
            x = x.astype(np.float32)
            y = y.astype(np.float32)
        outputs = sess.run(None, {input_name: x})
        predicted, actual = outputs[0][0].argmax(0), y.argmax(0)
        if predicted == actual:
            correct += 1
            wrong += 1
    print(f"Correct predictions: {correct}, incorrect: {wrong} / {np.shape(x_test)[0]}")

content = onnx_model.SerializeToString()
sess = onnxruntime.InferenceSession(content, providers=['CPUExecutionProvider'])
run_onnx_model(sess, (data.shape[1], data.shape[2]), x_test, y_test, input_name, dtype=tf.float32)

If the prediction’s general accuracy (percentage of correct predictions) does not match the TensorFlow model, then there is something wrong. In general, any issues should be caught during the model generation process. Most likely, there is an issue with the data order in memory (row- vs column- order).

Running inference with BrainFlow

Now that we know the ONNX model works, we can import the ONNX file using BrainFlow and validate, again using test data:

import brainflow as bf

def run_onnx_model_bf(onnx_path, output_name, x_test, y_test, input_shape):
    bfmp = bf.BrainFlowModelParams(bf.BrainFlowMetrics.USER_DEFINED, bf.BrainFlowClassifiers.ONNX_CLASSIFIER)
    bfmp.file = onnx_path
    bfmp.output_name = output_name
    ml = bf.MLModel(bfmp)
    correct = 0
    wrong = 0
    for i in range(0, np.shape(x_test)[0]):
        result = ml.predict(x_test[i].reshape(math.prod(input_shape)))
        expected = y_test[i].argmax(0)
        predicted = result.argmax(0)
        if predicted == expected:
            correct += 1
            wrong += 1
    print(f"Correct predictions: {correct}, incorrect: {wrong} / {np.shape(x_test)[0]}")
run_onnx_model_bf(onnx_output_path, output_name, x_test, y_test, input_shape)

The predictions % accurate should match the expected output from the prior two tests (TensorFlow and ONNX Runtime). But for further granularity, you can run tests on individual samples and check that the results match. Note that for the BrainFlow ONNX implementation, you need to flatten the input sample using np.reshape()

def run_onnx_model_single(sess, x_sample, input_name, dtype=tf.float32):
    x_sample = x_sample.astype(np.float32)
    outputs = sess.run(None, {input_name: x_sample})
    return outputs
def run_onnx_model_bf_single(onnx_path, output_name, x_sample):
    bfmp = bf.BrainFlowModelParams(bf.BrainFlowMetrics.USER_DEFINED, bf.BrainFlowClassifiers.ONNX_CLASSIFIER)
    bfmp.file = onnx_path
    bfmp.output_name = output_name
    ml = bf.MLModel(bfmp)
    result = ml.predict(x_sample)
    return result

tf_sample_shape = (1, *input_shape)
orig_test_sample = np.reshape(x_test[0], tf_sample_shape)
y_tf_orig = model.predict(orig_test_sample)
print("tensorflow_outputs\n" + str(y_tf_orig))
y_ort_orig = run_onnx_model_single(sess, orig_test_sample, input_name, dtype=tf.float32)
print("onnxruntime\n" + str(y_ort_orig))
y_bf_orig = run_onnx_model_bf_single(onnx_output_path, output_name,
                                            np.reshape(orig_test_sample, math.prod(input_shape)))
print("brainflow+onnxruntime\n" + str(y_bf_orig))


Keras model file path: model_exports/eeg/keras/eeg1_6c_w150_cnn-4l.h5
Converted to onnx: model_exports/eeg/onnx/eeg1_6c_w150_cnn-4l.onnx
Input: input_1, output: dense
Correct predictions: 7799, incorrect: 831 / 8630
Brainflow+Onnxruntime Output: 
[2022-09-08 10:02:03.414] [ml_logger] [info] input type is: 1
[2022-09-08 10:02:03.414] [ml_logger] [info] num dims is: 3
[2022-09-08 10:02:03.414] [ml_logger] [info] Input Dim 0 size 1
[2022-09-08 10:02:03.414] [ml_logger] [info] Input Dim 1 size 150
[2022-09-08 10:02:03.414] [ml_logger] [info] Input Dim 2 size 4
[2022-09-08 10:02:03.414] [ml_logger] [info] found output node: dense
[2022-09-08 10:02:03.414] [ml_logger] [info] output type is: 1
[2022-09-08 10:02:03.414] [ml_logger] [info] num dims is: 2
[2022-09-08 10:02:03.414] [ml_logger] [info] Output Dim 0 size 1
[2022-09-08 10:02:03.414] [ml_logger] [info] Output Dim 1 size 6
Correct predictions: 7799, incorrect: 831 / 8630
1/1 [==============================] - 0s 39ms/step
[[9.9034435e-01 5.0133145e-03 9.5097981e-05 3.4038170e-04 2.8566546e-03
[array([[9.9034190e-01, 5.0146505e-03, 9.5106268e-05, 3.4051118e-04,
        2.8568176e-03, 1.3509562e-03]], dtype=float32)]
[9.90341902e-01 5.01465052e-03 9.51062684e-05 3.40511178e-04
 2.85681756e-03 1.35095615e-03]

Here we can see the outputs match up to 4 significant figures, but for all intents and purposes, match. The minor differences are due to the TensorFlow inference model computed on GPU whereas the ONNX models were run on CPU. If the values here differ greatly, then likely there is a problem with the model or, (more likely) the arrangement of the data in memory (row-order vs column-order).

This demonstrates that BrainFlow helps to streamline addition of machine learning techniques to your biometric data pipelines using the ONNX Runtime.


  • This implementation has been tested with following toolkits and frameworks; any others methods will require further testing:
  • Input type can be only float or double
  • Max ONNX opset tested so far is 11, so it may not work with all available TensorFlow ops.
  • Disable ZipMap for classifiers
  • So far we’ve added only default execution provider (CPU)