Snapdragon Neural Processing Engine SDK
Reference Guide
The benchmark shipped in the SNPE SDK consists of a set of Python scripts that run a network on a target Android/LinuxEmbedded device and collect performance metrics. It uses executables and libraries found in the SDK package to run a DLC file on target, using a set of inputs for the network and a file that points to that set of inputs.
The input to the benchmark scripts is a configuration file in JSON format. The SDK ships with a configuration file for running the AlexNet model created in the SNPE SDK. SDK users are encouraged to create their own configuration files and use the benchmark scripts to run on target and collect timing and memory consumption measurements.
The configuration file allows the user to specify the model, input data, target devices, runtimes, and measurements for a benchmark run.
To see all of the command line parameters, use the "-h" option when running snpe_bench.py.
usage: snpe_bench.py [-h] -c CONFIG_FILE [-o OUTPUT_BASE_DIR_OVERRIDE]
                     [-v DEVICE_ID_OVERRIDE] [-r HOST_NAME] [-a]
                     [-t DEVICE_OS_TYPE_OVERRIDE] [-d] [-s SLEEP]
                     [-b USERBUFFER_MODE] [-p PERFPROFILE]
                     [-l PROFILINGLEVEL] [-json] [-cache]

Run the snpe_bench

required arguments:
  -c CONFIG_FILE, --config_file CONFIG_FILE
                        Path to a valid config file. Refer to the sample
                        config file config_help.json for more detail on how
                        to fill in the params in the config file.

optional arguments:
  -o OUTPUT_BASE_DIR_OVERRIDE, --output_base_dir_override OUTPUT_BASE_DIR_OVERRIDE
                        Sets the output base directory.
  -v DEVICE_ID_OVERRIDE, --device_id_override DEVICE_ID_OVERRIDE
                        Use this device ID instead of the one supplied in the
                        config file. Cannot be used with -a.
  -r HOST_NAME, --host_name HOST_NAME
                        Hostname/IP of the remote machine to which devices
                        are connected.
  -a, --run_on_all_connected_devices_override
                        Runs on all connected devices; currently only
                        supports 1. Cannot be used with -v.
  -t DEVICE_OS_TYPE_OVERRIDE, --device_os_type_override DEVICE_OS_TYPE_OVERRIDE
                        Specify the target OS type. Valid options are
                        ['android', 'android-aarch64', 'le_oe_gcc8.2',
                        'le64_oe_gcc8.2'].
  -d, --debug           Set to turn on debug log.
  -s SLEEP, --sleep SLEEP
                        Set the number of seconds to sleep between runs,
                        e.g. 20 seconds.
  -b USERBUFFER_MODE, --userbuffer_mode USERBUFFER_MODE
                        [EXPERIMENTAL] Enable user buffer mode. Defaults to
                        float; can be tf8exact0.
  -p PERFPROFILE, --perfprofile PERFPROFILE
                        Set the benchmark operating mode (balanced, default,
                        sustained_high_performance, high_performance,
                        power_saver, system_settings).
  -l PROFILINGLEVEL, --profilinglevel PROFILINGLEVEL
                        Set the profiling level mode (off, basic, moderate,
                        detailed). Default is basic.
  -json, --generate_json
                        Set to produce json output.
  -cache, --enable_init_cache
                        Enable init caching mode to accelerate the network
                        building process. Defaults to disabled.
snpe_bench.py is the main benchmark script; it measures and reports performance statistics. Here is how to use it with the AlexNet model and data created in the SDK.
cd $SNPE_ROOT/benchmarks
python3 snpe_bench.py -c alexnet_sample.json -a

where -a runs the benchmark on the connected device.
All results are stored in the "HostResultsDir" specified in the configuration JSON file. The benchmark creates a time-stamped directory for each benchmark run. All timing results are stored in microseconds.
For your convenience, a "latest_results" link is created that always points to the most recent run.
# In alexnet_sample.json, "HostResultsDir" is set to "alexnet/results"
cd $SNPE_ROOT/benchmarks/alexnet/results
# Notice the time-stamped directories and the "latest_results" link.
cd $SNPE_ROOT/benchmarks/alexnet/results/latest_results
# Notice the .csv file; open it in a CSV viewer (Excel, LibreOffice Calc).
# Notice the .json file; open it with any text editor.
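If you prefer to inspect the results programmatically, the CSV can be read with the Python standard library. The snippet below is a generic sketch only; the file name and the exact layout of the sections vary with SDK version and profiling level.

# Generic sketch: print the non-empty rows of the benchmark CSV.
# The file name below is an assumption; use the .csv file found in
# your latest_results directory.
import csv

with open("latest_results/benchmark_stats_ALEXNET.csv") as f:
    for row in csv.reader(f):
        if any(cell.strip() for cell in row):   # skip blank separator rows
            print(", ".join(row))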
CSV Benchmark Results File
The CSV file contains results similar to the example below. Some measurements appear in the CSV file only when the profiling level is set to detailed; by default, the profiling level is basic. Note that the colored headings have been added for clarity.
CSV 1: Configuration
This section contains information about the benchmark run configuration.
CSV 2: Initialization metrics
This section contains measurements concerning model initialization. The profiling level affects how many measurements are collected: no metrics are collected when profiling is off. The metrics collected for the basic and detailed levels are shown below.
Profiling level: Basic
Profiling level: Moderate or Detailed
CSV 3: Execution metrics
This section contains measurements concerning the execution of one inference pass of the neural network model. The profiling level affects how many measurements are collected.
Profiling level: Basic
Profiling level: Moderate or Detailed
Profiling level: Detailed
CSV 4: Model Layer Names
This section lists the name of each layer of the neural network model.
Note that this information is present only for the DSP/GPU runtimes, and only if the profiling level is set to detailed.
CSV 5: Model Layer Times
Each row in this section reports the execution time of one layer of the neural network model. For each runtime, the average, minimum, and maximum values are reported. Note that the per-layer values are cycle counts, whereas the values for all other components, such as init and ForwardPropagate, are in microseconds.
Note that this information is present only for the DSP/GPU runtimes, and only if the profiling level is set to detailed.
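Because the per-layer rows are reported in cycles while rows such as init and ForwardPropagate are in microseconds, comparing them requires dividing by the clock rate. The conversion below is purely illustrative; the actual DSP clock frequency depends on the device and the selected performance profile.

# Illustrative conversion only: the clock frequency is an assumption,
# not a value reported by the benchmark.
ASSUMED_DSP_CLOCK_HZ = 800e6          # 800 MHz, an assumption

layer_cycles = 1_200_000              # example value from a per-layer row
layer_time_us = layer_cycles / ASSUMED_DSP_CLOCK_HZ * 1e6
print(f"{layer_time_us:.1f} us")      # 1500.0 us under the assumed clock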
Refer to Measurement Methodology for more background on the numbers.
JSON Benchmark Results File
The benchmark results published in the CSV file can also be made available in JSON format. The contents are the same as in the CSV file, structured as key-value pairs, which makes parsing the results simple and efficient. The JSON file contains results similar to the example below.
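As an illustration, the JSON results can be loaded with a few lines of standard Python. The file name and key names here are assumptions, so inspect your own results file for the actual structure before relying on any particular key.

# Hypothetical example: load the benchmark results JSON and list its
# top-level keys. Key names depend on the SDK version.
import json

with open("latest_results/benchmark_results.json") as f:   # file name is an assumption
    results = json.load(f)

for key, value in results.items():
    print(key, "->", type(value).__name__)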
Prepare inputs
Before running the benchmark, a set of inputs must be prepared: the raw inputs for the network and a file that points to that set of inputs. A sketch of one way to produce them is shown below.
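The snippet below is not part of the SDK; it is one way to generate raw input files and the list file that points to them. The preprocessing (image size, layout, data type) and file names are assumptions and must match how your DLC was created.

# Hypothetical helper: convert images to raw float32 tensors and write an
# input list file (one raw-file path per line). Preprocessing details are
# assumptions that depend on your model.
import os
import numpy as np
from PIL import Image

def prepare_inputs(image_paths, out_dir, list_path, size=(227, 227)):
    os.makedirs(out_dir, exist_ok=True)
    raw_paths = []
    for path in image_paths:
        img = Image.open(path).convert("RGB").resize(size)
        tensor = np.asarray(img, dtype=np.float32)   # HWC layout assumed
        raw = os.path.join(out_dir,
                           os.path.splitext(os.path.basename(path))[0] + ".raw")
        tensor.tofile(raw)
        raw_paths.append(raw)
    with open(list_path, "w") as f:
        f.write("\n".join(raw_paths) + "\n")

prepare_inputs(["dog.jpg", "cat.jpg"], "data/cropped", "data/target_raw_list.txt")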
Config structure
The configuration file is a JSON file with a predefined structure. Refer to $SNPE_ROOT/benchmarks/alexnet_sample.json as an example.
All fields are required unless noted as optional in the sample configuration file.
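For orientation, here is a minimal sketch of the overall structure, modeled on alexnet_sample.json. The field values are illustrative only; treat the shipped sample file as the authoritative reference for field names and defaults.

{
    "Name": "AlexNet",
    "HostRootPath": "alexnet",
    "HostResultsDir": "alexnet/results",
    "DevicePath": "/data/local/tmp/snpebm",
    "Devices": ["<device id>"],
    "Runs": 2,
    "Model": {
        "Name": "ALEXNET",
        "Dlc": "../models/alexnet/dlc/bvlc_alexnet.dlc",
        "InputList": "../models/alexnet/data/target_raw_list.txt",
        "Data": ["../models/alexnet/data/cropped"]
    },
    "Runtimes": ["CPU", "GPU"],
    "Measurements": ["timing"]
}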
Architecture support
Android ARM 32-bit and AArch64 (64-bit) are both supported. In addition to Android, there is limited support for LinuxEmbedded, where only the timing measurement is supported.
The runtime and measurement names are concatenated to make a full run combination name, e.g., GPU_timing.
Note that for each specified runtime, multiple sets of timing measurements are taken, which differ only in the tensor format (ITensors versus User Buffers). For the DSP runtime, for example, a timing measurement is taken for each of these tensor formats.
The easiest way to run the benchmark is to specify the -a option, which will run the benchmark on the lone device connected to your computer.
cd $SNPE_ROOT/benchmarks
python3 snpe_bench.py -c yourmodel.json -a
The benchmark runs an md5sum on the host files (those specified in the JSON configuration) and on the device files. Because of this md5sum check, the files needed for the benchmark run have to be available on the host.
For any file that exists both on the host and on the device with mismatched MD5 checksums, the benchmark copies the file from host to target and issues a warning letting you know that the local files do not match the device files. This ensures that the results you get from a benchmark run accurately reflect the files specified in the JSON file. Conceptually, the check resembles the sketch below.
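This is a conceptual sketch only, not the benchmark scripts' actual code: hash the local file with hashlib and compare it against the device-side md5sum obtained over adb. File paths are illustrative.

# Conceptual sketch of the host/device checksum comparison.
import hashlib
import subprocess

def host_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def device_md5(device_path):
    # 'md5sum' prints "<hash>  <path>"; take the first token.
    out = subprocess.check_output(["adb", "shell", "md5sum", device_path])
    return out.split()[0].decode()

if host_md5("bvlc_alexnet.dlc") != device_md5("/data/local/tmp/snpebm/bvlc_alexnet.dlc"):
    print("Warning: local file does not match the device file; re-pushing.")
    subprocess.check_call(["adb", "push", "bvlc_alexnet.dlc",
                           "/data/local/tmp/snpebm/bvlc_alexnet.dlc"])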
Other options
-v option
Allows you to override the device ID specified in the config file, so that the same config file can be used across multiple devices. You can use this instead of -a, if you have multiple devices attached.
-o option
Result output base directory override applies only if relative paths are specified for HostRootPath and HostResultsDir. It allows you to pool the output regardless of where you run the benchmark from.
-t option
OS Type override currently supports Android arm (armeabi-v7a), Android aarch64 (arm64-v8a), and LinuxEmbedded devices.
-b option
Allows you to specify the type of the input and output user buffers.
-p option
Allows you to profile performance in different operating modes.
-l option
Allows you to specify the level of performance profiling.
Open the results (CSV or JSON file) in your <HostResultsDir>/latest_results folder to view your results. (<HostResultsDir> is what you specified in your JSON configuration file.)
In all cases, the snpe-net-run executable is used to load a model and run inputs through the model.
Performance ("timing")
Timing measurements are taken using internal timing utilities inside the SNPE libraries. When snpe-net-run is executed, the libraries will log timing info to a file. This file is then parsed offline to retrieve total inference times and per-layer times.
The total inference times include both the per-layer computation times and overhead such as data movement between layers, as well as into and out of runtimes, whereas the per-layer times are strictly the computation times for each layer. For smaller networks the overhead can be quite significant relative to the computation time, particularly when offloading the network to run on the GPU or DSP. The sketch below shows one way to quantify this overhead.
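One way to see the overhead directly is to subtract the sum of the per-layer times from the total inference time of a detailed-profiling run. The values here are made-up examples, and both quantities must already be in the same unit (microseconds).

# Hypothetical sketch: estimate inter-layer/data-movement overhead.
def overhead_us(total_forward_propagate_us, per_layer_times_us):
    return total_forward_propagate_us - sum(per_layer_times_us)

total = 5200.0                          # example total inference time in us
layers = [900.0, 1500.0, 1100.0, 800.0] # example per-layer times in us
print(f"overhead: {overhead_us(total, layers):.1f} us "
      f"({overhead_us(total, layers) / total:.0%} of total)")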
In addition, further optimizations present on the GPU/DSP may cause layer times to be mis-attributed in the case of fused conv-neuron or fc-neuron pairs: when executing on the GPU, the total time of the pair is assigned to the conv layer, whereas on the DSP it is assigned to the neuron layer.
The binaries that the benchmark script depends on vary by target architecture, compiler, and STL library, and are listed in the corresponding configuration files.