Snapdragon Neural Processing Engine SDK
Reference Guide
Tools

This chapter describes the various SDK tools and features.


snpe-net-run

snpe-net-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime.

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C/C++ API.


REQUIRED ARGUMENTS:
-------------------
  --container  <FILE>   Path to the DL container containing the network.
  --input_list <FILE>   Path to a file listing the inputs for the network.


OPTIONAL ARGUMENTS:
-------------------
  --use_gpu             Use the GPU runtime for SNPE.
  --use_dsp             Use the DSP fixed point runtime for SNPE.
  --debug               Specifies that output from all layers of the network
                        will be saved.
  --output_dir=<val>
                        The directory to save output to. Defaults to ./output
  --storage_dir=<val>
                        The directory to store SNPE metadata files
  --encoding_type=<val>
                        Specifies the encoding type of input file. Valid settings are "nv21".
                        Cannot be combined with --userbuffer*.
  --use_native_input_files
                        Specifies to consume the input file(s) in their native data type(s).
                        Must be used with --userbuffer_xxx.
  --use_native_output_files
                        Specifies to write the output file(s) in their native data type(s).
                        Must be used with --userbuffer_xxx.
  --userbuffer_auto
                        Specifies to use userbuffer for input and output, with auto detection of types enabled.
                        Must be used with user specified buffer. Cannot be combined with --encoding_type.
  --userbuffer_float
                        Specifies to use userbuffer for inference, and the input type is float.
                        Cannot be combined with --encoding_type.
  --userbuffer_floatN=<val>
                        Specifies to use userbuffer for inference, and the input type is float 16 or float 32.
                        Cannot be combined with --encoding_type.
  --userbuffer_tf8      Specifies to use userbuffer for inference, and the input type is tf8exact0.
                        Cannot be combined with --encoding_type.
  --userbuffer_tfN=<val>
                        Specifies to use userbuffer for inference, and the input type is tf8exact0 or tf16exact0.
                        Must be used with user specified buffer.
  --userbuffer_float_output
                        Overrides the userbuffer output used for inference, and the output type is float. Must be used with user
                        specified buffer.
  --userbuffer_floatN_output=<val>
                        Overrides the userbuffer output used for inference, and the output type is float 16 or float 32. Must be used with user
                        specified buffer.
  --userbuffer_tfN_output=<val>
                        Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
                        Must be used with user specified buffer.
  --userbuffer_tf8_output
                        Overrides the userbuffer output used for inference, and the output type is tf8exact0.
  --userbuffer_uintN_output=<val>
                        Overrides the userbuffer output used for inference, and the output type is Uint N. Must be used with user
                        specified buffer.
  --static_min_max  Specifies to use quantization parameters from the model instead of
                        input specific quantization. Used in conjunction with --userbuffer_tf8.
  --resizable_dim=<val>
                        Specifies the maximum number that resizable dimensions can grow into.
                        Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a
                        positive integer and is not applicable when using ITensor.
  --userbuffer_glbuffer
                        [EXPERIMENTAL]  Specifies to use userbuffer for inference, and the input source is OpenGL buffer.
                        Cannot be combined with --encoding_type.
                        GL buffer mode is only supported on Android OS.
  --data_type_map=<val>
                        Sets data type of IO buffers during prepare.
                        Arguments should be provided in the following format:
                        --data_type_map buffer_name1=buffer_name1_data_type --data_type_map buffer_name2=buffer_name2_data_type
                        Data Type can have the following values: float32, fixedPoint8, fixedPoint16
  --tensor_mode=<val>
                        Sets type of tensor to use.
                        Arguments should be provided in the following format:
                        --tensor_mode itensor
                        Mode can have the following values: userBuffer, itensor
  --perf_profile=<val>
                        Specifies perf profile to set. Valid settings are "low_balanced" , "balanced" , "default",
                        "high_performance" ,"sustained_high_performance", "burst", "low_power_saver", "power_saver",
                        "high_power_saver" and "system_settings".
  --profiling_level=<val>
                        Specifies the profiling level.  Valid settings are "off", "basic", "moderate" and "detailed".
                        Default is detailed.
  --enable_cpu_fallback
                        Enables cpu fallback functionality. Defaults to disable mode.
  --input_name=<val>
                        Specifies the name of input for which dimensions are specified.
  --input_dimensions=<val>
                        Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
                        For multiple inputs, specify --input_name and --input_dimensions multiple times.
  --gpu_mode=<val>  Specifies gpu operation mode. Valid settings are "default", "float16".
                        default = float32 math and float16 storage (equiv. use_gpu arg).
                        float16 = float16 math and float16 storage.
  --enable_init_cache
                        Enable init caching mode to accelerate the network building process. Defaults to disable.
  --platform_options=<val>
                        Specifies value to pass as platform options.
  --priority_hint=<val>
                        Specifies hint for priority level.  Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
                        Note: "normal_high" is only available on DSP.
  --inferences_per_duration=<val>
                        Specifies the number of inferences in specific duration (in seconds). e.g. "10,20".
  --runtime_order=<val>
                        Specifies the order of precedence for runtime e.g. cpu_float32, dsp_fixed8_tf etc.
                        Valid values are:-
                        cpu_float32 (Snapdragon CPU)       = Data & Math: float 32bit
                        gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
                        dsp_fixed8_tf (Hexagon DSP)        = Data & Math: 8bit fixed point Tensorflow style format
                        gpu_float16 (Adreno GPU)           = Data: float 16bit Math: float 16bit
                        aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
                        cpu (Snapdragon CPU)               = Same as cpu_float32
                        gpu (Adreno GPU)                   = Same as gpu_float32_16_hybrid
                        dsp (Hexagon DSP)                  = Same as dsp_fixed8_tf
                        aip (Snapdragon HTA+HVX)           = Same as aip_fixed8_tf
  --set_output_tensors=<val>
                        Specifies a comma separated list of tensors to be output after execution.
  --set_unconsumed_as_output
                        Sets all unconsumed tensors as outputs.
  --udo_package_path=<val>
                        Path to the registration library for UDO package(s).
                        Optionally, user can provide multiple packages as a comma-separated list.
  --duration=<val>      Specifies the duration of the run in seconds. Loops over the input_list until this amount of time has transpired.
  --dbglogs
  --timeout=<val>       Execution is terminated when the time limit is exceeded. Currently only valid for the DSP runtime.
  --userlogs=<val>      Specifies the user level logging as level,<optional logPath>.
  --help                Show this help message.
  --version             Show SNPE Version Number.

This binary outputs raw output tensors into the output folder by default. Examples of using snpe-net-run can be found in the Running AlexNet tutorial.
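
A minimal invocation might look like the following sketch; the container, input list, and output directory names here are hypothetical placeholders, and only flags documented above are used:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --output_dir=results

This runs the network on the DSP fixed point runtime and writes the raw output tensors for each input line into the results directory.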

Additional details:

  • Running batched inputs:
    • snpe-net-run can automatically batch the input data. The batch size is indicated in the model container (DLC file) but can also be set using the "input_dimensions" argument passed to snpe-net-run (see the batch-override sketch after this list). Users do not need to batch their input data. If the input data is not batched, the input size needs to be a multiple of the size of the input data files. snpe-net-run will group the provided inputs into batches and pad the incomplete batch (if present) with zeros.

      In the example below, the model is set to accept batches of three inputs, so snpe-net-run automatically groups the inputs into batches and pads the final batch. Note that five output files are generated by snpe-net-run:

            …
            Processing DNN input(s):
            cropped/notice_sign.raw
            cropped/trash_bin.raw
            cropped/plastic_cup.raw
            Processing DNN input(s):
            cropped/handicap_sign.raw
            cropped/chairs.raw
            Applying padding
  • input_list argument:
    • snpe-net-run can take multiple input files as input data per iteration, and can specify multiple output names, using an input list file formatted as below:

            #<output_name>[<space><output_name>]
            <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
            …

      The first line, starting with a "#", specifies the output layers' names. If there is more than one output, whitespace should be used as a delimiter. Following the first line, you can use multiple lines to supply input files, one line per iteration, with each line supplying one file per input layer. If there is more than one input per line, whitespace should be used as a delimiter.

      Here is an example, where the layer names are "Input_1" and "Input_2", and inputs are located in the path "Placeholder_1/real_input_inputs_1/". Its input list file should look like this:

            #Output_1 Output_2
            Input_1:=Placeholder_1/real_input_inputs_1/0-0#e6fb51.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/0-1#8a171b.rawtensor
            Input_1:=Placeholder_1/real_input_inputs_1/1-0#67c965.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/1-1#54f1ff.rawtensor
            Input_1:=Placeholder_1/real_input_inputs_1/2-0#b42dc6.rawtensor Input_2:=Placeholder_1/real_input_inputs_1/2-1#346a0e.rawtensor

      Note: If the batch dimension of the model is greater than 1, the number of batch elements in the input file has to either match the batch dimension specified in the DLC or it has to be one. In the latter case, snpe-net-run will combine multiple lines into a single input tensor.

  • Running AIP Runtime:
    • AIP Runtime requires a DLC that was quantized and had HTA sections generated offline. See Adding HTA sections
    • AIP Runtime does not support debug_mode
    • AIP Runtime requires a DLC with all the layers partitioned to HTA to support batched inputs
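
As a batch-override sketch for the batched-inputs note above, the batch dimension can be raised to three via the input_dimensions argument. The input name "data" and the dimensions are hypothetical and must match the model's actual input:

      snpe-net-run --container model.dlc --input_list inputs.txt --use_dsp --input_name=data --input_dimensions="3,224,224,3"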

snpe-parallel-run

snpe-parallel-run loads a DLC file, loads the data for the input tensor(s), and executes the network on the specified runtime. This app is similar to snpe-net-run, but is able to run multiple threads of inference on the same network for benchmarking purposes.

DESCRIPTION:
------------
Example application demonstrating how to use SNPE
using the PSNPE and SNPE C/C++ API.


REQUIRED ARGUMENTS:
-------------------
  --container  <FILE>   Path to the DL container containing the network.
  --input_list <FILE>   Path to a file listing the inputs for the network.
  --perf_profile <VAL>
                        Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" , "sustained_high_performance" , "burst" , "power_saver" and "system_settings".
                        NOTE: "balanced" and "default" are the same.  "default" is being deprecated in the future.
  --cpu_fallback        Enables cpu fallback functionality. Valid settings are "false", "true".
  --runtime_order <VAL,VAL,VAL,..>
                        Specifies the order of precedence for runtime e.g cpu,gpu etc. Valid values are:-
                                 cpu_float32 (Snapdragon CPU)       = Data & Math: float 32bit
                                 gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
                                 dsp_fixed8_tf (Hexagon DSP)        = Data & Math: 8bit fixed point Tensorflow style format
                                 gpu_float16 (Adreno GPU)           = Data: float 16bit Math: float 16bit
                                 aip_fixed8_tf (Snapdragon HTA+HVX) = Data & Math: 8bit fixed point Tensorflow style format
                                 cpu (Snapdragon CPU)               = Same as cpu_float32
                                 gpu (Adreno GPU)                   = Same as gpu_float32_16_hybrid
                                 dsp (Hexagon DSP)                  = Same as dsp_fixed8_tf
                                 aip (Snapdragon HTA+HVX)           = Same as aip_fixed8_tf
  --use_cpu             Use the CPU runtime for SNPE.
  --use_gpu             Use the GPU float32 runtime for SNPE.
  --use_gpu_fp16        Use the GPU float16 runtime for SNPE.
  --use_dsp             Use the DSP fixed point runtime for SNPE.
  --use_aip             Use the AIP fixed point runtime for SNPE.


OPTIONAL ARGUMENTS:
-------------------
  --userbuffer_float    Specifies to use userbuffer for inference, and the input type is float.
  --userbuffer_tf8      Specifies to use userbuffer for inference, and the input type is tf8exact0.
  --userbuffer_auto     Specifies to use userbuffer with automatic input and output type detection for inference.
  --use_native_input_files
                        Specifies to consume the input file(s) in their native data type(s).
                        Must be used with --userbuffer_xxx.
  --use_native_output_files
                        Specifies to write the output file(s) in their native data type(s).
                        Must be used with --userbuffer_xxx.
  --input_name <INPUT_NAME>
                        Specifies the name of input for which dimensions are specified.
  --input_dimensions <INPUT_DIM>
                        Specifies new dimensions for input whose name is specified in input_name. e.g. "1,224,224,3".
  --output_dir <DIR>    The directory to save result files
  --static_min_max      Specifies to use quantization parameters from the model instead of
                        input specific quantization. Used in conjunction with --userbuffer_tf8.
  --userbuffer_float_output
                        Overrides the userbuffer output used for inference, and the output type is float.
                        Must be used with user specified buffer.
  --userbuffer_tf8_output
                        Overrides the userbuffer output used for inference, and the output type is tf8exact0.
                        Must be used with user specified buffer.
  --enable_init_cache   Enable init caching mode to accelerate the network building process. Defaults to disable.
  --profiling_level     Specifies the profiling level.  Valid settings are "off", "basic", "moderate" and "detailed". Default is off.
  --platform_options    Specifies value to pass as platform options.  Valid settings: "HtaDLBC:ON/OFF", "unsignedPD:ON/OFF".
  --set_output_tensors  Specifies a comma separated list of tensors to be output after execution.
  --userlogs <VAL>      Specifies the user level logging as level,<optional logPath>.
  --version             Show SNPE Version Number.
  --help                Show this help message.

Additional details:

  • Required runtime argument:
    • For the required arguments pertaining to runtime specification, either --runtime_order OR one or more of the --use_cpu/--use_gpu/--use_dsp/--use_aip flags needs to be specified. The following example demonstrates an equivalent command using either of these options.

          snpe-parallel-run --container container.dlc --input_list input_list.txt
          --perf_profile burst --cpu_fallback true --use_dsp --use_gpu --userbuffer_auto

      is equivalent to

          snpe-parallel-run --container container.dlc --input_list input_list.txt
          --perf_profile burst --cpu_fallback true --runtime_order dsp,gpu --userbuffer_auto
  • Spawning multiple threads:

    • snpe-parallel-run is able to create multiple threads to execute identical inference passes.

      In the example below, the command provides the required container and input list arguments. After these 2 options, the remaining options form a repeating sequence, one group per thread. In this example, the runtime specified for each thread is varied (one for dsp, another for gpu, and the last one for dsp).

            snpe-parallel-run --container container.dlc --input_list input_list.txt
            --perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto
            --perf_profile burst --cpu_fallback true --use_gpu --userbuffer_auto
            --perf_profile burst --cpu_fallback true --use_dsp --userbuffer_auto

      When this command is executed, the following section of output is observed:

            ...
            Processing DNN input(s):
            input.raw
            PSNPE start executing...
            runtimes: dsp_fixed8_tf gpu_float32_16_hybrid dsp_fixed8_tf - Mode :0- Number of images processed: x
             Build time: x seconds.
            ...

      Note that the number of runtimes listed corresponds to the number of threads specified, as well as the order in which those threads were specified.


snpe_bench.py

The python script snpe_bench.py runs a DLC neural network and collects benchmark performance information.

usage: snpe_bench.py [-h] -c CONFIG_FILE [-o OUTPUT_BASE_DIR_OVERRIDE]
                     [-v DEVICE_ID_OVERRIDE] [-r HOST_NAME] [-a]
                     [-t DEVICE_OS_TYPE_OVERRIDE] [-d] [-s SLEEP]
                     [-b USERBUFFER_MODE] [-p PERFPROFILE] [-l PROFILINGLEVEL]
                     [-json] [-cache]

Run the snpe_bench

required arguments:
  -c CONFIG_FILE, --config_file CONFIG_FILE
                        Path to a valid config file
                        Refer to sample config file config_help.json for more
                        detail on how to fill params in config file

optional arguments:
  -o OUTPUT_BASE_DIR_OVERRIDE, --output_base_dir_override OUTPUT_BASE_DIR_OVERRIDE
                        Sets the output base directory.
  -v DEVICE_ID_OVERRIDE, --device_id_override DEVICE_ID_OVERRIDE
                        Use this device ID instead of the one supplied in config
                        file. Cannot be used with -a
  -r HOST_NAME, --host_name HOST_NAME
                        Hostname/IP of remote machine to which devices are
                        connected.
  -a, --run_on_all_connected_devices_override
                        Runs on all connected devices, currently only supports 1.
                        Cannot be used with -v
  -t DEVICE_OS_TYPE_OVERRIDE, --device_os_type_override DEVICE_OS_TYPE_OVERRIDE
                        Specify the target OS type, valid options are
                        ['android', 'android-aarch64', 'le', 'le64_gcc4.9',
                        'le_oe_gcc6.4', 'le64_oe_gcc6.4']
  -d, --debug           Set to turn on debug log
  -s SLEEP, --sleep SLEEP
                        Set number of seconds to sleep between runs e.g. 20
                        seconds
  -b USERBUFFER_MODE, --userbuffer_mode USERBUFFER_MODE
                        [EXPERIMENTAL] Enable user buffer mode, default to
                        float, can be tf8exact0
  -p PERFPROFILE, --perfprofile PERFPROFILE
                        Set the benchmark operating mode (balanced, default,
                        sustained_high_performance, high_performance,
                        power_saver, system_settings)
  -l PROFILINGLEVEL, --profilinglevel PROFILINGLEVEL
                        Set the profiling level mode (off, basic, moderate, detailed).
                        Default is basic.
  -json, --generate_json
                        Set to produce json output.
  -cache, --enable_init_cache
                        Enable init caching mode to accelerate the network
                        building process. Defaults to disable.
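
A typical invocation is sketched below; the config file name is hypothetical, and the chosen profiles simply reuse values listed above:

      python snpe_bench.py -c alexnet_config.json -a -p high_performance -l basic -json

This benchmarks the network described in the config on the connected device (-a currently supports a single device), using the high_performance perf profile, basic profiling, and JSON output.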

snpe-caffe-to-dlc

snpe-caffe-to-dlc converts a Caffe model into an SNPE DLC file.

usage: snpe-caffe-to-dlc [-h] [--input_network INPUT_NETWORK] [-o OUTPUT_PATH]
                         [--out_node OUT_NAMES]
                         [--copyright_file COPYRIGHT_FILE]
                         [--model_version MODEL_VERSION]
                         [--disable_batchnorm_folding]
                         [--input_type INPUT_NAME INPUT_TYPE]
                         [--input_dtype INPUT_NAME INPUT_DTYPE]
                         [--input_encoding INPUT_NAME INPUT_ENCODING]
                         [--input_layout INPUT_NAME INPUT_LAYOUT]
                         [--udl UDL_MODULE FACTORY_FUNCTION]
                         [--enable_preprocessing]
                         [--quantization_overrides QUANTIZATION_OVERRIDES]
                         [--keep_quant_nodes]
                         [--keep_disconnected_nodes]
                         [--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
                         [--strict] [--debug [DEBUG]]
                         [-b CAFFE_BIN]
                         [--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

Script to convert caffemodel into a DLC file.

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --out_node OUT_NAMES, --out_name OUT_NAMES
                        Names of the graph's output tensors. Multiple output names should be
                        provided separately like:
                            --out_name out_1 --out_name out_2
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be
                        saved. If not specified, the converted model will be
                        written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of
                        the file will be added to the output model.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only
                        first 64 bytes will be stored
  --disable_batchnorm_folding
                        If not specified, converter will try to fold batchnorm
                        into previous convolution layer
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg: --input_type
                        "data1" image --input_type "data2" opaque. These
                        options are used by the DSP runtime and the following
                        descriptions state how input will be handled for each
                        option. Image: input is float between 0-255 and the
                        input's mean is 0.0f and the input's max is 255.0f. We
                        will cast the float to uint8ts and pass the uint8ts to
                        the DSP. Default: pass the input as floats to the dsp
                        directly and the DSP will quantize it. Opaque: assumes
                        input is float because the consumer layer (i.e. next
                        layer) requires it as float, therefore it won't be
                        quantized. Choices supported: ['image', 'default',
                        'opaque']
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example: 'data' 'float32'. Default is float32 if not
                        specified. Note that the quotes should always be
                        included in order to handle special characters, spaces,
                        etc. For multiple inputs specify multiple
                        --input_dtype on the command line like: --input_dtype
                        'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
                        Image encoding of the source images. Default is bgr.
                        Eg usage: "data" rgba Note the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify
                        --input_encoding for each on the command line. Eg:
                        --input_encoding "data1" rgba --input_encoding "data2"
                        other. Use options: color encodings (bgr, rgb, nv21, ...)
                        if input is image; time_series: for inputs of rnn
                        models; other: if input doesn't follow above
                        categories or is unknown. Choices supported:['bgr',
                        'rgb', 'rgba', 'argb32', 'nv21', 'time_series',
                        'other']
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                          NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --udl UDL_MODULE FACTORY_FUNCTION
                        Option to add User Defined Layers. Provide a filename and a function
                        name. 1. Filename: Name of the python module to load for registering a custom
                        udl (note: must be in PYTHONPATH). If the file is part of a package, list the
                        package.filename as you would when doing a python import. 2. Function name:
                        Name of the udl factory function that returns a dictionary of key layer type
                        and value function callback.
  --enable_preprocessing
                        If specified, converter will enable preprocessing specified by a data layer
                        transform_param (subtract_mean is supported).
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --validation_target RUNTIME_TARGET PROCESSOR_TARGET
                        A combination of processor and runtime target against
                        which model will be validated. Choices for
                        RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
                        PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
                        snapdragon_835}. If not specified, will validate model
                        against {snapdragon_820, snapdragon_835} across all
                        runtime targets.
  --strict              If specified, will validate in strict mode whereby
                        model will not be produced if it violates constraints
                        of the specified validation target. If not specified,
                        will validate model in permissive mode against the
                        specified validation target.
  --debug [DEBUG]       Run the converter in debug mode.
  -b CAFFE_BIN, --caffe_bin CAFFE_BIN
                        Input caffe binary file containing the weight data
  --udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -udo CUSTOM_OP_CONFIG_PATHS 
                        [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters to use for
                        quantization. These will override any quantization data carried from
                        conversion (eg TF fake quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.

Examples of using this script can be found in Converting Models from Caffe to SNPE.
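
As a sketch (the prototxt, caffemodel, and output file names are hypothetical), a typical conversion command looks like:

      snpe-caffe-to-dlc --input_network deploy.prototxt -b model.caffemodel -o model.dlc --input_encoding "data" rgba

Here deploy.prototxt is assumed to describe the network, model.caffemodel to hold the weights, and the input layer "data" to receive RGBA images, for which the converter adds a preprocessing layer converting to BGR (see the input_encoding notes below).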

Additional details:

  • input_encoding argument:
    • Specifies the encoding type of input images.
    • A preprocessing layer is added to the network to convert input images from the specified encoding to BGR, the encoding used by Caffe.
    • The encoding preprocessing layer can be seen when using snpe-dlc-info.
    • Allowed options are:
      • argb32: The ARGB32 format consists of 4 bytes per pixel: one byte for Red, one for Green, one for Blue and one for the alpha channel. The alpha channel is ignored. For little endian CPUs, the byte order is BGRA. For big endian CPUs, the byte order is ARGB.
      • rgba: The RGBA format consists of 4 bytes per pixel: one byte for Red, one for Green, one for Blue and one for the alpha channel. The alpha channel is ignored. The byte ordering is endian independent and is always RGBA byte order.
      • nv21: NV21 is the Android version of YUV. The Chrominance is down sampled and has a sub sampling ratio of 4:2:0. Note that this image format has 3 channels, but the U and V channels are subsampled. For every four Y pixels there is one U and one V pixel.
      • bgr: The BGR format consists of 3 bytes per pixel: one byte for Red, one for Green and one for Blue. The byte ordering is endian independent and is always BGR byte order.
    • This argument is optional. If omitted then input image encoding is assumed to be BGR and no preprocessing layer is added.
    • See input_preprocessing for more details.
  • disable_batchnorm_folding argument:
    • The disable batchnorm folding argument allows the user to turn off the optimization that folds batchnorm and batchnorm + scaling layers into previous convolution layers when possible.
    • This argument is optional. If omitted, the converter will fold batchnorm and batchnorm + scaling layers into previous convolution layers wherever possible as an optimization. When this occurs, the names of the folded batchnorm and scale layers are concatenated to the name of the convolution layer they were folded into.
      • For example: if a batchnorm layer named 'bn' and a scale layer named 'scale' are folded into a convolution layer named 'conv', the resulting DLC will show the convolution layer named 'conv.bn.scale'.
  • input_type argument:
    • Specifies the expected data type for a certain input layer name.
    • This argument can be passed more than once if you want to specify the expected data type of two or more input layers.
    • input_type argument takes INPUT_NAME followed by INPUT_TYPE.
    • This argument is optional. If omitted for a certain input layer then the expected data type will be of type: default.
    • Allowed options are:
      • default: Specifies that the input contains floating-point values.
      • image: Specifies that the input contains floating-point values that are all integers in the range 0..255.
      • opaque: Specifies that the input contains floating-point values that should be passed to the selected runtime without modification.
        For example an opaque tensor is passed directly to the DSP without quantization.
    • For example: [--input_type "data" image --input_type "roi" opaque].

snpe-diagview

snpe-diagview loads a DiagLog file generated by snpe-net-run whenever it operates on input tensor data. The DiagLog file contains timing information for each layer as well as the entire forward propagate time. If the run uses an input list of input tensors, the timing info reported by snpe-diagview is an average over the entire input set.

snpe-net-run generates files called "SNPEDiag_0.log", "SNPEDiag_1.log", ..., "SNPEDiag_n.log", where n corresponds to the nth iteration of the snpe-net-run execution.

usage: snpe-diagview --input_log DIAG_LOG [-h] [--output CSV_FILE]

Reads a diagnostic log and outputs the contents to stdout

required arguments:
  --input_log     DIAG_LOG
                Diagnostic log file (required)
optional arguments:
  --output        CSV_FILE
                Output CSV file with all diagnostic data (optional)
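
A minimal sketch, assuming snpe-net-run has already written a diag log into ./output (the CSV name is a placeholder):

      snpe-diagview --input_log output/SNPEDiag_0.log --output layer_timings.csv

The per-layer timing data is printed to stdout and, with --output, also written to the named CSV file.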



snpe-dlc-info

snpe-dlc-info outputs layer information from a DLC file, which provides information about the network model.

usage: snpe-dlc-info [-h] -i INPUT_DLC [-s SAVE]

required arguments:
  -i INPUT_DLC, --input_dlc INPUT_DLC
                        path to a DLC file

optional arguments:
  -s SAVE, --save SAVE
                        Save the output to a csv file. Specify a target file path.
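
A minimal sketch (model.dlc and the CSV name are placeholders):

      snpe-dlc-info -i model.dlc -s layer_info.csv

This reads model.dlc and saves the layer information to layer_info.csv.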



snpe-dlc-diff

snpe-dlc-diff compares two DLCs and by default outputs some of the following differences between them in a tabular format:

  • unique layers between the two DLCs
  • parameter differences in common layers
  • differences in dimensions of buffers associated with common layers
  • weight differences in common layers
  • output tensor names differences in common layers
  • unique records between the two DLCs (currently checks for AIP records only)
usage: snpe-dlc-diff [-h] -i1 INPUT_DLC_ONE -i2 INPUT_DLC_TWO [-c] [-l] [-p]
                     [-d] [-w] [-o] [-i] [-x] [-s SAVE]

required arguments:
  -i1 INPUT_DLC_ONE, --input_dlc_one INPUT_DLC_ONE
                        path to the first dl container archive
  -i2 INPUT_DLC_TWO, --input_dlc_two INPUT_DLC_TWO
                        path to the second dl container archive

optional arguments:
  -h, --help            show this help message and exit
  -c, --copyrights      compare copyrights between models
  -l, --layers          compare unique layers between models
  -p, --parameters      compare parameter differences between identically
                        named layers
  -d, --dimensions      compare dimension differences between identically
                        named layers
  -w, --weights         compare weight differences between identically named
                        layers.
  -o, --outputs         compare output tensor name differences between
                        identically named layers
  -i, --diff_by_id      Overrides the default comparison strategy for diffing
                        the components of 2 models. By default comparison is made
                        between identically named layers. With this option the
                        models are ordered by id and the diff is done in order as
                        long as no more than 1 consecutive layer has a different
                        layer type.
  -x, --hta             compare HTA records differences in Models
  -s SAVE, --save SAVE  Save the output to a csv file. Specify a target file
                        path.
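
A hypothetical comparison of a float container against its quantized counterpart, checking parameters, dimensions, and weights and saving the report (file names are placeholders):

      snpe-dlc-diff -i1 model_float.dlc -i2 model_quantized.dlc -p -d -w -s diff_report.csv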



snpe-dlc-viewer

snpe-dlc-viewer visualizes the network structure of a DLC in a web browser.

usage: snpe-dlc-viewer [-h] -i INPUT_DLC [-s]

required arguments:
  -i INPUT_DLC, --input_dlc INPUT_DLC
                        Path to a DLC file

optional arguments:
  -s, --save            Save HTML file. Specify a file name and/or target save path
  -h, --help            Shows this help message and exits
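
A minimal sketch (file names are placeholders):

      snpe-dlc-viewer -i model.dlc -s model.html

On hosts with a native web browser this opens the rendered graph directly; the saved model.html can also be opened later in any of the supported browsers listed below.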

Additional details:


The DLC viewer tool renders the specified network DLC in HTML format that may be viewed on a web browser.
On installations that support a native web browser a browser instance is opened on which the network is automatically rendered.
Users can optionally save the HTML content anywhere on their systems and open on a chosen web browser independently at a later time.

  • Features:
    • Graph-based representation of network model with nodes depicting layers and edges depicting buffer connections.
    • Colored legend to indicate layer types.
    • Zoom and drag options available for ease of visualization.
    • Tool-tips upon mouse hover to describe detailed layer parameters.
    • Sections showing metadata from DLC records
  • Supported browsers:
    • Google Chrome
    • Firefox
    • Internet Explorer on Windows
    • Microsoft Edge Browser on Windows
    • Safari on Mac

snpe-dlc-quantize

snpe-dlc-quantize converts non-quantized DLC models into quantized DLC models.

Command Line Options:
  [ -h,--help ]         Displays this help message.
  [ --version ]         Displays version information.
  [ --verbose ]         Enable verbose user messages.
  [ --quiet ]           Disables some user messages.
  [ --silent ]          Disables all but fatal user messages.
  [ --debug=<val> ]     Sets the debug log level.
  [ --debug1 ]          Enables level 1 debug messages.
  [ --debug2 ]          Enables level 2 debug messages.
  [ --debug3 ]          Enables level 3 debug messages.
  [ --log-mask=<val> ]  Sets the debug log mask to set the log level for one or more areas.
                        Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
  [ --log-file=<val> ]  Overrides the default name for the debug log file.
  [ --log-dir=<val> ]   Overrides the default directory path where debug log files are written.
  [ --log-file-include-hostname ]
                        Appends the name of this host to the log file name.
  [ --input_dlc=<val> ]
                        Path to the dlc container containing the model for which fixed-point encoding
                        metadata should be generated. This argument is required.
  [ --input_list=<val> ]
                        Path to a file specifying the trial inputs. This file should be a plain text file,
                        containing one or more absolute file paths per line. These files will be taken to constitute
                        the trial set. Each path is expected to point to a binary file containing one trial input
                        in the 'raw' format, ready to be consumed by SNPE without any further modifications.
                        This is similar to how input is provided to snpe-net-run application.
  [ --no_weight_quantization ]
                        Generate and add the fixed-point encoding metadata but keep the weights in
                        floating point. This argument is optional.
  [ --output_dlc=<val> ]
                        Path at which the metadata-included quantized model container should be written.
                        If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
  [ --enable_htp ]      Pack HTP information in quantized DLC.
  [ --htp_socs=<val> ]  Specify SoC to generate HTP Offline Cache for.
                        SoCs are specified with an ASIC identifier, in a comma separated list.
                        For example, --htp_socs sm8550
  [ --overwrite_cache_records ]
                        Overwrite HTP cache records present in the DLC.
  [ --use_float_io ]
                        Prepare quantized HTP graph to operate with floating point inputs/outputs (Note: deprecated).
  [ --use_enhanced_quantizer ]
                        Use the enhanced quantizer feature when quantizing the model.  Regular quantization determines the range using the actual
                        values of min and max of the data being quantized.  Enhanced quantization uses an algorithm to determine optimal range.  It can be
                        useful for quantizing models that have long tails in the distribution of the data being quantized.
  [ --use_adjusted_weights_quantizer ]
                        Use the adjusted tf quantizer for quantizing the weights only. This might be helpful for improving the accuracy of some models,
                        such as denoise model as being tested. This option is only used when quantizing the weights with 8 bit.
  [ --optimizations ]   Use this option to enable new optimization algorithms. Usage is:
                        --optimizations <algo_name1> --optimizations <algo_name2>
                        The available optimization algorithms are:
                        cle - Cross layer equalization includes a number of methods for equalizing weights and biases across layers in order to rectify imbalances that cause quantization errors.
                        bc - Bias correction adjusts biases to offset activation quantization errors. Typically used in conjunction with 'cle' to improve quantization accuracy (Note: deprecated).
  [ --override_params ]
                        Use this option to override quantization parameters when quantization was provided from the original source framework (eg TF fake quantization)
  [ --use_encoding_optimizations ]
                        Use this option to enable quantization encoding optimizations. This can reduce requantization in the graph and may improve accuracy for some models
                        (Note: this flag can be passed in, but is a no-op. Recognition of this flag will be removed in the future).
  [ --use_symmetric_quantize_weights ]
                        Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
                        same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
  [ --bias_bitwidth=<val> ]
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing the biases, either 8 (default) or 32. Using 32 bit biases may
                        sometimes provide a small improvement in accuracy. Can't mix with --bitwidth.
  [ --act_bitwidth=<val> ]
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing the activations, either 8 (default) or 16. 8w/16a is only supported
                        by the HTA currently. Can't mix with --bitwidth.
  [ --weights_bitwidth=<val> ]
                        Use the --weights_bitwidth option to select the bitwidth to use when quantizing the weights, either 8 (default) or 16. 8w/16a is only supported
                        by the HTA currently. Can't mix with --bitwidth.
  [ --bitwidth=<val> ]
                        Use the --bitwidth option to select the bitwidth to use when quantizing the weights/activation/bias, either 8 (default) or 16. Can't mix with
                        --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
  [ --udo_package_path=<val> ]
                        Use this option to specify path to the Registration Library for a UDO Package. Usage is:
                        --udo_package_path=<path_to_reg_lib>
                        This option must be specified for Networks with UDO. All UDO's in Network must have host-executable CPU Implementation


Description:
Generate 8 or 16 bit TensorFlow style fixed point weight and activation encodings for a floating point SNPE model.
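
A typical quantization sketch, using a hypothetical calibration input list and requesting an HTP offline cache for one SoC (all file names are placeholders):

      snpe-dlc-quantize --input_dlc=model.dlc --input_list=calibration_inputs.txt --output_dlc=model_quantized.dlc --enable_htp --htp_socs=sm8550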


Additional details:

  • For specifying input_list, refer to input_list argument in snpe-net-run for supported input formats (in order to calculate output activation encoding information for all layers, do not include the line which specifies desired outputs).
  • The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.
  • An example of quantization using snpe-dlc-quantize can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.
  • Using snpe-dlc-quantize is mandatory for running on HTA. See Adding HTA sections.
  • Using snpe-dlc-quantize is mandatory for running on the DSP runtime on Snapdragon 865. It is recommended that offline cache generation be used. It is specified by using the --enable_htp option for snpe-dlc-quantize.
  • When using offline cache generation for HTP, the same input tensors or layers and output tensors or layers should be specified when running snpe-dlc-quantize and when running inference on the model using the SNPE APIs or snpe-net-run. Not doing so will cause the cache to be invalidated, and graph initialization will take longer.
  • Outputs can be specified for snpe-dlc-quantize by modifying the input_list in the following ways (a hypothetical example is shown after this list):
              #<output_layer_name>[<space><output_layer_name>]
              %<output_tensor_name>[<space><output_tensor_name>]
              <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
              …
    Note: Output tensors and layers can be specified individually, but when specifying both, the order shown must be used to specify each.
  • When running a model with an offline generated cache using snpe-net-run:
    • Any output layers specified when snpe-dlc-quantize was called need to be specified using the input list, as shown in the input_list argument to snpe-net-run.
    • Any output tensors specified when snpe-dlc-quantize was called need to be specified using the --set_output_tensors argument to snpe-net-run. Refer to snpe-net-run for documentation.
  • When using the SNPE API:
    • Any output layers specified when snpe-dlc-quantize was called, need to be specified using the SNPEBuilder::setOutputLayers function.
    • Any output tensors specified when snpe-dlc-quantize was called, need to be specified using the SNPEBuilder::setOutputTensors function.
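
The note above on specifying outputs in the input_list can be illustrated with the following hypothetical file, where conv5 is an output layer name, prob is an output tensor name, and data is the input layer fed by raw calibration files (all names are placeholders):

              #conv5
              %prob
              data:=cropped/img_0001.raw
              data:=cropped/img_0002.raw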



snpe-dlc-quant

snpe-dlc-quant converts non-quantized DLC models into quantized DLC models.

Command Line Options:
  [ -h,--help ]         Displays this help message.
  [ --version ]         Displays version information.
  [ --verbose ]         Enable verbose user messages.
  [ --quiet ]           Disables some user messages.
  [ --silent ]          Disables all but fatal user messages.
  [ --debug=<val> ]     Sets the debug log level.
  [ --debug1 ]          Enables level 1 debug messages.
  [ --debug2 ]          Enables level 2 debug messages.
  [ --debug3 ]          Enables level 3 debug messages.
  [ --log-mask=<val> ]  Sets the debug log mask to set the log level for one or more areas.
                        Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
  [ --log-file=<val> ]  Overrides the default name for the debug log file.
  [ --log-dir=<val> ]   Overrides the default directory path where debug log files are written.
  [ --log-file-include-hostname ]
                        Appends the name of this host to the log file name.
  [ --input_dlc=<val> ]
                        Path to the dlc container containing the model for which fixed-point encoding
                        metadata should be generated. This argument is required.
  [ --input_list=<val> ]
                        Path to a file specifying the trial inputs. This file should be a plain text file,
                        containing one or more absolute file paths per line. These files will be taken to constitute
                        the trial set. Each path is expected to point to a binary file containing one trial input
                        in the 'raw' format, ready to be consumed by SNPE without any further modifications.
                        This is similar to how input is provided to snpe-net-run application.
  [ --no_weight_quantization ]
                        Generate and add the fixed-point encoding metadata but keep the weights in
                        floating point. This argument is optional.
  [ --output_dlc=<val> ]
                        Path at which the metadata-included quantized model container should be written.
                        If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
  [ --use_enhanced_quantizer ]
                        Use the enhanced quantizer feature when quantizing the model.  Regular quantization determines the range using the actual
                        values of min and max of the data being quantized.  Enhanced quantization uses an algorithm to determine optimal range.  It can be
                        useful for quantizing models that have long tails in the distribution of the data being quantized.
  [ --use_adjusted_weights_quantizer ]
                        Use the adjusted tf quantizer for quantizing the weights only. This might be helpful for improving the accuracy of some models,
                        such as denoise model as being tested. This option is only used when quantizing the weights with 8 bit.
  [ --optimizations ]   Use this option to enable new optimization algorithms. Usage is:
                        --optimizations <algo_name1> --optimizations <algo_name2>
                        The available optimization algorithms are:
                        cle - Cross layer equalization includes a number of methods for equalizing weights and biases across layers in order to rectify imbalances that cause quantization errors.
  [ --override_params ]
                        Use this option to override quantization parameters when quantization was provided from the original source framework (eg TF fake quantization)
  [ --use_encoding_optimizations ]
                        Use this option to enable quantization encoding optimizations. This can reduce requantization in the graph and may improve accuracy for some models
                        (Note: deprecated).
  [ --use_symmetric_quantize_weights ]
                        Use the symmetric quantizer feature when quantizing the weights of the model. It makes sure min and max have the
                        same absolute values about zero. Symmetrically quantized data will also be stored as int#_t data such that the offset is always 0.
  [ --bias_bitwidth=<val> ]
                        Use the --bias_bitwidth option to select the bitwidth to use when quantizing the biases, either 8 (default) or 32. Using 32 bit biases may
                        sometimes provide a small improvement in accuracy. Can't mix with --bitwidth.
  [ --act_bitwidth=<val> ]
                        Use the --act_bitwidth option to select the bitwidth to use when quantizing the activations, either 8 (default) or 16. 8w/16a is only supported
                        by the HTA currently. Can't mix with --bitwidth.
  [ --weights_bitwidth=<val> ]
                        Use the --weights_bitwidth option to select the bitwidth to use when quantizing the weights, either 8 (default) or 16. 8w/16a is only supported
                        by the HTA currently. Can't mix with --bitwidth.
  [ --bitwidth=<val> ]
                        Use the --bitwidth option to select the bitwidth to use when quantizing the weights/activation/bias, either 8 (default) or 16. Can't mix with
                        --weights_bitwidth or --act_bitwidth or --bias_bitwidth.
  [ --udo_package_path=<val> ]
                        Use this option to specify path to the Registration Library for a UDO Package. Usage is:
                        --udo_package_path=<path_to_reg_lib>
                        This option must be specified for Networks with UDO. All UDO's in Network must have host-executable CPU Implementation


Description:
Generate 8 or 16 bit TensorFlow style fixed point weight and activation encodings for a floating point SNPE model.
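
A minimal sketch (file names are hypothetical); snpe-dlc-quant accepts the same core quantization options listed above but, unlike snpe-dlc-quantize, no HTP cache flags:

      snpe-dlc-quant --input_dlc=model.dlc --input_list=calibration_inputs.txt --output_dlc=model_quantized.dlc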


Additional details:

  • For specifying input_list, refer to input_list argument in snpe-net-run for supported input formats (in order to calculate output activation encoding information for all layers, do not include the line which specifies desired outputs).
  • The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step.
  • An example of quantization using snpe-dlc-quant can be found in the C/C++ Tutorial section: Running the Inception v3 Model. For details on quantization see Quantized vs Non-Quantized Models.
  • Outputs can be specified for snpe-dlc-quant by modifying the input_list in the following ways:
              #<output_layer_name>[<space><output_layer_name>]
              %<output_tensor_name>[<space><output_tensor_name>]
              <input_layer_name>:=<input_layer_path>[<space><input_layer_name>:=<input_layer_path>]
              …
    Note: Output tensors and layers can be specified individually, but when specifying both, the order shown must be used to specify each.
  • When using the SNPE API:
    • Any output layers specified when snpe-dlc-quant was called, need to be specified using the SNPEBuilder::setOutputLayers function.
    • Any output tensors specified when snpe-dlc-quant was called, need to be specified using the SNPEBuilder::setOutputTensors function.



snpe-dlc-graph-prepare

snpe-dlc-graph-prepare is used to perform offline graph preparation on quantized DLCs so they can run on the DSP/HTP runtimes.

Command Line Options:
  [ -h, --help ]
                                   Displays this help message.
  [ --version ]
                                   Displays version information.
  [ --verbose ]
                                   Enable verbose user messages.
  [ --quiet ]
                                   Disables some user messages.
  [ --silent ]
                                   Disables all but fatal user messages.
  [ --debug=<val> ]
                                   Sets the debug log level.
  [ --debug1 ]
                                   Enables level 1 debug messages.
  [ --debug2 ]
                                   Enables level 2 debug messages.
  [ --debug3 ]
                                   Enables level 3 debug messages.
  [ --log-mask=<val> ]
                                   Sets the debug log mask to set the log level for one or more areas.
                                   Example: ".*=USER_ERROR, .*=INFO, NDK=DEBUG2, NCC=DEBUG3"
  [ --log-file=<val> ]
                                   Overrides the default name for the debug log file.
  [ --log-dir=<val> ]
                                   Overrides the default directory path where debug log files are written.
  [ --log-file-include-hostname ]
                                   Appends the name of this host to the log file name.
  [ --input_dlc ]
                                   Path to the dlc container containing the model for which graph cache should be generated. This argument is required.
  [ --output_dlc ]
                                   Path at which the model container, with the cached data included, should be written. If this argument is omitted, the model will be written at
                                   InputModelName_cached.dlc.
  [ --set_output_tensors ]
                                   Specifies a comma separated list of tensors to be output after execution without whitespace.
  [ --set_output_layers ]
                                   Specifies a comma separated list of layers whose output buffers should be output after execution, without whitespace.
  [ --input_list ]
                                   Path to a file specifying the trial inputs. This file should be a plain text file, containing one or more absolute file paths per line. These files
                                   will be taken to constitute the trial set. Each path is expected to point to a binary file containing one trial input in the 'raw' format, ready to
                                   be consumed by SNPE without any further modifications. This is similar to how input is provided to snpe-net-run application.
  [ --htp_socs ]
                                   Specify SoC(s) to generate the HTP Offline Cache for. SoCs are specified with an
                                   ASIC identifier, in a comma-separated list without whitespace. For example,
                                   --htp_socs sm8350,sm8450,sm8550. This flag and --htp_archs are mutually exclusive.
  [ --htp_archs ]
                                   Specify DSP Architecture(s) to generate general HTP Offline Cache for.
                                   Architectures are specified with an ASIC identifier, in a comma-separated list
                                   without whitespace. For example, --htp_archs v68,v73. This flag cannot be
                                   coupled with --htp_socs or --vtcm_override
  [ --vtcm_override ]
                                   Specify a single value representing the VTCM size in MB for the generated HTP Offline Caches.
                                   For example, --vtcm_override 4. This flag can be used in conjunction with --htp_socs to
                                   override the default SOC vtcm size setting
  [ --buffer_data_type ]
                                   Sets data type of IO buffers during prepare. Data Type can be the following:
                                   float32, fixedPoint8, fixedPoint16. Arguments should be formatted as follows:
                                   --buffer_data_type buffer_name1=buffer_name1_data_type
                                   --buffer_data_type buffer_name2=buffer_name2_data_type
                                   (Note: deprecated)
  [ --overwrite_cache_records ]
                                   Erase all HTP cache records present in the DLC before generating requested caches
  [ --use_float_io ]
                                   Prepare quantized HTP Graph to operate with floating point inputs/outputs (Note: deprecated)
  [ --udo_package_path ]
                                   Use this option to specify the path to the registration library for UDO package(s). Usage is:
                                   --udo_package_path=<path_to_reg_lib>
                                   Optionally, the user can provide multiple packages as a comma-separated list.
                                   This option must be specified for networks with UDOs. All UDOs in the network must have a host-executable CPU implementation.
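
For example, an HTP offline cache for a single SoC could be generated as follows (the DLC file names are illustrative):

  snpe-dlc-graph-prepare --input_dlc inception_v3_quantized.dlc \
                         --output_dlc inception_v3_cached.dlc \
                         --htp_socs sm8550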

snpe-tensorflow-to-dlc

snpe-tensorflow-to-dlc converts a TensorFlow model into an SNPE DLC file.

usage: snpe-tensorflow-to-dlc -d INPUT_NAME INPUT_DIM --out_node OUT_NAMES
                              [--input_type INPUT_NAME INPUT_TYPE]
                              [--input_dtype INPUT_NAME INPUT_DTYPE]
                              [--input_encoding INPUT_NAME INPUT_ENCODING]
                              [--debug [DEBUG]]
                              [--input_layout INPUT_NAME INPUT_LAYOUT]
                              [--udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]]
                              [--show_unconsumed_nodes]
                              [--saved_model_tag SAVED_MODEL_TAG]
                              [--saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY]
                              [--disable_batchnorm_folding]
                              [--quantization_overrides QUANTIZATION_OVERRIDES]
                              [--keep_quant_nodes]
                              [--keep_disconnected_nodes]
                              --input_network INPUT_NETWORK [-h]
                              [-o OUTPUT_PATH]
                              [--copyright_file COPYRIGHT_FILE]
                              [--model_version MODEL_VERSION]
                              [--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
                              [--strict]
                              [--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

Script to convert a TensorFlow model into a DLC file.

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers
                        specified in the format [input_name comma-separated-
                        dimensions], for example: 'data' 1,224,224,3 Note that
                        the quotes should always be included in order to
                        handle special characters, spaces, etc. For multiple
                        inputs specify multiple --input_dim on the command
                        line like: --input_dim 'data1' 1,224,224,3 --input_dim
                        'data2' 1,50,100,3
  --out_node OUT_NODE   Name of the graph's output nodes. Multiple output
                        nodes should be provided separately like: --out_node
                        out_1 --out_node out_2
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg: --input_type
                        "data1" image --input_type "data2" opaque. These
                        options get used by the DSP runtime and the following
                        descriptions state how input will be handled for each
                        option. Image: Input is float between 0-255, the
                        input's mean is 0.0f and the input's max is 255.0f. We
                        will cast the floats to uint8s and pass the uint8s to
                        the DSP. Default: Pass the input as floats to the DSP
                        directly and the DSP will quantize it. Opaque: Assumes
                        input is float because the consumer layer (i.e. the
                        next layer) requires it as float, therefore it won't
                        be quantized. Choices supported: image default opaque
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example: 'data' 'float32'. Default is float32 if not
                        specified. Note that the quotes should always be
                        included in order to handle special characters, spaces,
                        etc. For multiple inputs specify multiple
                        --input_dtype on the command line like: --input_dtype
                        'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
                        Image encoding of the source images. Default is bgr.
                        Eg usage: "data" rgba Note the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify
                        --input_encoding for each on the command line. Eg:
                        --input_encoding "data1" rgba --input_encoding "data2"
                        other. Use options: color encodings (bgr, rgb, nv21, ...)
                        if input is image; time_series: for inputs of rnn
                        models; other: if input doesn't follow above
                        categories or is unknown. Choices supported: bgr rgb
                        rgba argb32 nv21 time_series other
  --debug [DEBUG]       Run the converter in debug mode.
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                          NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...], -udo UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)
  --show_unconsumed_nodes
                        Displays a list of unconsumed nodes, if any are
                        found. Nodes which are unconsumed do not violate the
                        structural fidelity of the generated graph.
  --saved_model_tag SAVED_MODEL_TAG
                        Specify the tag to select a MetaGraph from the
                        SavedModel. ex: --saved_model_tag serve. Default value
                        will be 'serve' when it is not assigned.
  --saved_model_signature_key SAVED_MODEL_SIGNATURE_KEY
                        Specify signature key to select input and output of
                        the model. ex: --saved_model_signature_key
                        serving_default. Default value will be
                        'serving_default' when it is not assigned
  --disable_batchnorm_folding
                        If not specified, converter will try to fold batchnorm into previous layer.
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  -h, --help            show this help message and exit
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be
                        saved. If not specified, the converted model will be
                        written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of
                        the file will be added to the output model.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only
                        first 64 bytes will be stored
  --validation_target RUNTIME_TARGET PROCESSOR_TARGET
                        A combination of processor and runtime target against
                        which model will be validated. Choices for
                        RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
                        PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
                        snapdragon_835}. If not specified, will validate model
                        against {snapdragon_820, snapdragon_835} across all
                        runtime targets.
  --strict              If specified, will validate in strict mode whereby
                        model will not be produced if it violates constraints
                        of the specified validation target. If not specified,
                        will validate model in permissive mode against the
                        specified validation target.
  --udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -udo CUSTOM_OP_CONFIG_PATHS 
                        [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.
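
The overrides file format is defined by the AIMET specification. As a rough, non-authoritative sketch (the tensor names and values below are purely illustrative assumptions), such a file maps tensor names to encoding parameters:

  {
      "activation_encodings": {
          "conv1_out:0": [{"bitwidth": 8, "min": 0.0, "max": 6.0}]
      },
      "param_encodings": {
          "conv1/weights": [{"bitwidth": 8, "min": -0.5, "max": 0.5}]
      }
  }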

Examples of using this script can be found in Converting Models from TensorFlow to SNPE.

Additional details:

  • input_network argument:
    • The converter supports a single frozen graph .pb file, a path to a pair of graph meta and checkpoint files, or the path to a SavedModel directory (TF 2.x).
    • If you are using the TensorFlow Saver to save your graph during training, 3 files will be generated as described below:
      1. <model-name>.meta
      2. <model-name>
      3. checkpoint
    • The converter --input_network option specifies the path to the graph meta file. The converter will also use the checkpoint file to read the graph nodes parameters during conversion. The checkpoint file must have the same name without the .meta suffix.
    • This argument is required.
  • input_dim argument:
    • Specifies the input dimensions of the graph's input node(s)
    • The converter requires a node name along with dimensions as input from which it will create an input layer by using the node output tensor dimensions. When defining a graph, there is typically a placeholder name used as input during training in the graph. The placeholder tensor name is the name you must use as the argument. It is also possible to use other types of nodes as input, however the node used as input will not be used as part of a layer other than the input layer.
    • Multiple Inputs
      • Networks with multiple inputs must provide --input_dim INPUT_NAME INPUT_DIM, one for each input node.
    • This argument is required.
  • out_node argument:
    • The name of the last node in your TensorFlow graph which will represent the output layer of your network.

    • Multiple Outputs
      • Networks with multiple outputs must provide several --out_node arguments, one for each output node.
  • output_path argument:
    • Specifies the output DLC file name.
    • This argument is optional. If not provided the converter will create a DLC file with the same name as the graph file name, with a .dlc file extension.
  • SavedModel is the default model format in TensorFlow 2 and is now supported by the SNPE TensorFlow converter.
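
As an illustration, converting a frozen Inception v3 graph might look like the following (the file and node names are examples only):

  snpe-tensorflow-to-dlc --input_network inception_v3.pb \
                         --input_dim input 1,299,299,3 \
                         --out_node "InceptionV3/Predictions/Reshape_1" \
                         --output_path inception_v3.dlc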


snpe-tflite-to-dlc

snpe-tflite-to-dlc converts a TFLite model into a SNPE DLC file.

usage: snpe-tflite-to-dlc -d INPUT_NAME INPUT_DIM
                          [--input_dtype INPUT_NAME INPUT_DTYPE]
                          [--out_node OUT_NODE]
                          [--input_type INPUT_NAME INPUT_TYPE]
                          [--input_encoding INPUT_NAME INPUT_ENCODING]
                          [--debug [DEBUG]]
                          [--input_layout INPUT_NAME INPUT_LAYOUT]
                          [--udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]]
                          [--dump_relay DUMP_RELAY]
                          [--disable_batchnorm_folding]
                          [--quantization_overrides QUANTIZATION_OVERRIDES]
                          [--keep_quant_nodes]
                          [--keep_disconnected_nodes]
                          --input_network INPUT_NETWORK [-h] [-o OUTPUT_PATH]
                          [--copyright_file COPYRIGHT_FILE]
                          [--model_version MODEL_VERSION]
                          [--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
                          [--strict]
                          [--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

Script to convert a TFLite model into a DLC file.

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers
                        specified in the format [input_name comma-separated-
                        dimensions], for example: 'data' 1,224,224,3 Note that
                        the quotes should always be included in order to
                        handle special characters, spaces, etc. For multiple
                        inputs specify multiple --input_dim on the command
                        line like: --input_dim 'data1' 1,224,224,3 --input_dim
                        'data2' 1,50,100,3
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example: 'data' 'float32'. Default is float32 if not
                        specified. Note that the quotes should always be
                        included in order to handle special characters, spaces,
                        etc. For multiple inputs specify multiple
                        --input_dtype on the command line like: --input_dtype
                        'data1' 'float32' --input_dtype 'data2' 'float32'
  --out_node OUT_NODE   Name of the graph's output nodes. Multiple output
                        nodes should be provided separately like: --out_node
                        out_1 --out_node out_2
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg: --input_type
                        "data1" image --input_type "data2" opaque. These
                        options get used by the DSP runtime and the following
                        descriptions state how input will be handled for each
                        option. Image: Input is float between 0-255, the
                        input's mean is 0.0f and the input's max is 255.0f. We
                        will cast the floats to uint8s and pass the uint8s to
                        the DSP. Default: Pass the input as floats to the DSP
                        directly and the DSP will quantize it. Opaque: Assumes
                        input is float because the consumer layer (i.e. the
                        next layer) requires it as float, therefore it won't
                        be quantized. Choices supported: image default opaque
  --input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
                        Image encoding of the source images. Default is bgr.
                        Eg usage: "data" rgba Note the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify
                        --input_encoding for each on the command line. Eg:
                        --input_encoding "data1" rgba --input_encoding "data2"
                        other. Use options: color encodings (bgr, rgb, nv21, ...)
                        if input is image; time_series: for inputs of rnn
                        models; other: if input doesn't follow above
                        categories or is unknown. Choices supported: bgr rgb
                        rgba argb32 nv21 time_series other
  --debug [DEBUG]       Run the converter in debug mode.
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                          NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...], -udo UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and params at the path provided with
                        the argument. Usage: --dump_relay <path_to_dump>
  --disable_batchnorm_folding
                        If not specified, converter will try to fold batchnorm into previous layer.
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  -h, --help            show this help message and exit
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be
                        saved. If not specified, the converted model will be
                        written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of
                        the file will be added to the output model.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only
                        first 64 bytes will be stored
  --validation_target RUNTIME_TARGET PROCESSOR_TARGET
                        A combination of processor and runtime target against
                        which model will be validated. Choices for
                        RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
                        PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
                        snapdragon_835}. If not specified, will validate model
                        against {snapdragon_820, snapdragon_835} across all
                        runtime targets.
  --strict              If specified, will validate in strict mode whereby
                        model will not be produced if it violates constraints
                        of the specified validation target. If not specified,
                        will validate model in permissive mode against the
                        specified validation target.
  --udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...], -udo CUSTOM_OP_CONFIG_PATHS 
                        [CUSTOM_OP_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.

Examples of using this script can be found in Converting Models from TFLite to SNPE.

Additional details:

  • input_network argument:
    • The converter supports a single .tflite file.
    • The converter --input_network option specifies the path to the .tflite file.
    • This argument is required.
  • input_dim argument:
    • Specifies the input dimensions of the graph's input node(s)
    • The converter requires a node name along with dimensions as input from which it will create an input layer by using the node output tensor dimensions. When defining a graph, there is typically a placeholder name used as input during training in the graph. The placeholder tensor name is the name you must use as the argument. It is also possible to use other types of nodes as input, however the node used as input will not be used as part of a layer other than the input layer.
    • Multiple Inputs
      • Networks with multiple inputs must provide --input_dim INPUT_NAME INPUT_DIM, one for each input node.
    • This argument is required.
  • output_path argument:
    • Specifies the output DLC file name.
    • This argument is optional. If not provided the converter will create a DLC file with the same name as the tflite file name, with a .dlc file extension.
  • saved_model_tag:
    • For Tensorflow 2.x networks, this option allows a MetaGraph to be selected from the SavedModel specified by input_network.
    • This argument is optional and will default to "serve" if left unset.
  • saved_model_signature:
    • For Tensorflow 2.x networks, this option specifies the signature key for selecting inputs and outputs of a Tensorflow 2.x SavedModel.
    • This argument is optional and will default to "serving_default" if unspecified.

snpe-onnx-to-dlc

snpe-onnx-to-dlc converts a serialized ONNX model into a SNPE DLC file.

usage: snpe-onnx-to-dlc [-h] [--input_network INPUT_NETWORK] [-o OUTPUT_PATH]
                        [--copyright_file COPYRIGHT_FILE]
                        [--model_version MODEL_VERSION]
                        [--disable_batchnorm_folding]
                        [--input_type INPUT_NAME INPUT_TYPE]
                        [--input_dtype INPUT_NAME INPUT_DTYPE]
                        [--input_encoding INPUT_NAME INPUT_ENCODING]
                        [--input_layout INPUT_NAME INPUT_LAYOUT]
                        [-n, --no_simplification]
                        [--keep_disconnected_nodes]
                        [--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
                        [--strict] [--debug [DEBUG]]
                        [--dry_run [DRY_RUN]]
                        [--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

Script to convert an ONNX model into a DLC file.

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be
                        saved. If not specified, the converted model will be
                        written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of
                        the file will be added to the output model.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only
                        first 64 bytes will be stored
  -n, --no_simplification
                        Do not attempt to simplify the model automatically. This may prevent some
                        models from properly converting
                        when sequences of unsupported static operations are present.
  --disable_batchnorm_folding
                        If not specified, converter will try to fold batchnorm
                        into previous convolution layer
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg: --input_type
                        "data1" image --input_type "data2" opaque. These
                        options get used by the DSP runtime and the following
                        descriptions state how input will be handled for each
                        option. Image: input is float between 0-255, the
                        input's mean is 0.0f and the input's max is 255.0f. We
                        will cast the floats to uint8s and pass the uint8s to
                        the DSP. Default: pass the input as floats to the DSP
                        directly and the DSP will quantize it. Opaque: assumes
                        input is float because the consumer layer (i.e. the
                        next layer) requires it as float, therefore it won't
                        be quantized. Choices supported: ['image', 'default',
                        'opaque']
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example: 'data' 'float32'. Default is float32 if not
                        specified. Note that the quotes should always be
                        included in order to handle special characters, spaces,
                        etc. For multiple inputs specify multiple
                        --input_dtype on the command line like: --input_dtype
                        'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
                        Image encoding of the source images. Default is bgr.
                        Eg usage: "data" rgba Note the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify
                        --input_encoding for each on the command line. Eg:
                        --input_encoding "data1" rgba --input_encoding "data2"
                        other. Use options: color encodings (bgr, rgb, nv21, ...)
                        if input is image; time_series: for inputs of rnn
                        models; other: if input doesn't follow above
                        categories or is unknown. Choices supported:['bgr',
                        'rgb', 'rgba', 'argb32', 'nv21', 'time_series',
                        'other']
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                          NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --validation_target RUNTIME_TARGET PROCESSOR_TARGET
                        A combination of processor and runtime target against
                        which model will be validated. Choices for
                        RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
                        PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
                        snapdragon_835}. If not specified, will validate model
                        against {snapdragon_820, snapdragon_835} across all
                        runtime targets.
  --strict              If specified, will validate in strict mode whereby
                        model will not be produced if it violates constraints
                        of the specified validation target. If not specified,
                        will validate model in permissive mode against the
                        specified validation target.
  --debug [DEBUG]       Run the converter in debug mode.
  --dry_run [DRY_RUN]   Evaluates the model without actually converting any
                        ops, and returns unsupported ops/attributes as well as
                        unused inputs and/or outputs if any. Leave empty or
                        specify "info" to see the dry run as a table, or specify
                        "debug" to show more detailed messages.
  --udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...], -udo UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.

For more information, see ONNX Model Conversion
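
As an illustration, a typical conversion might look like the following (the file names are examples only):

  snpe-onnx-to-dlc --input_network resnet50.onnx \
                   --output_path resnet50.dlc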


snpe-pytorch-to-dlc

snpe-pytorch-to-dlc converts a serialized PyTorch model into a SNPE DLC file.

usage: snpe-pytorch-to-dlc -d INPUT_NAME INPUT_DIM
                           [--input_dtype INPUT_NAME INPUT_DTYPE]
                           [--input_type INPUT_NAME INPUT_TYPE]
                           [--input_encoding INPUT_NAME INPUT_ENCODING]
                           [--debug [DEBUG]]
                           [--input_layout INPUT_NAME INPUT_LAYOUT] [--dump_relay DUMP_RELAY]
                           [--disable_batchnorm_folding]
                           [--quantization_overrides QUANTIZATION_OVERRIDES]
                           [--keep_quant_nodes]
                           [--keep_disconnected_nodes]
                           --input_network INPUT_NETWORK [-h] [-o OUTPUT_PATH]
                           [--copyright_file COPYRIGHT_FILE]
                           [--model_version MODEL_VERSION]
                           [--validation_target RUNTIME_TARGET PROCESSOR_TARGET]
                           [--strict]
                           [--udo_config_paths CUSTOM_OP_CONFIG_PATHS [CUSTOM_OP_CONFIG_PATHS ...]]

Script to convert a PyTorch model into a DLC file.

required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
                        The names and dimensions of the network input layers
                        specified in the format [input_name comma-separated-
                        dimensions], for example: 'data' 1,3,224,224 Note that
                        the quotes should always be included in order to
                        handle special characters, spaces, etc. For multiple
                        inputs specify multiple --input_dim on the command
                        line like: --input_dim 'data1' 1,3,224,224 --input_dim
                        'data2' 1,50,100,3
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
                        Path to the source framework model.

optional arguments:
  --input_dtype INPUT_NAME INPUT_DTYPE
                        The names and datatype of the network input layers
                        specified in the format [input_name datatype], for
                        example: 'data' 'float32'. Default is float32 if not
                        specified. Note that the quotes should always be
                        included in order to handle special characters, spaces,
                        etc. For multiple inputs specify multiple
                        --input_dtype on the command line like: --input_dtype
                        'data1' 'float32' --input_dtype 'data2' 'float32'
  --input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
                        Type of data expected by each input op/layer. Type for
                        each input is |default| if not specified. For example:
                        "data" image. Note that the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify multiple
                        --input_type on the command line. Eg: --input_type
                        "data1" image --input_type "data2" opaque. These
                        options get used by the DSP runtime and the following
                        descriptions state how input will be handled for each
                        option. Image: Input is float between 0-255, the
                        input's mean is 0.0f and the input's max is 255.0f. We
                        will cast the floats to uint8s and pass the uint8s to
                        the DSP. Default: Pass the input as floats to the DSP
                        directly and the DSP will quantize it. Opaque: Assumes
                        input is float because the consumer layer (i.e. the
                        next layer) requires it as float, therefore it won't
                        be quantized. Choices supported: image default opaque
  --input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
                        Image encoding of the source images. Default is bgr.
                        Eg usage: "data" rgba Note the quotes should always be
                        included in order to handle special characters,
                        spaces, etc. For multiple inputs specify
                        --input_encoding for each on the command line. Eg:
                        --input_encoding "data1" rgba --input_encoding "data2"
                        other. Use options: color encodings (bgr, rgb, nv21, ...)
                        if input is image; time_series: for inputs of rnn
                        models; other: if input doesn't follow above
                        categories or is unknown. Choices supported: bgr rgb
                        rgba argb32 nv21 time_series other
  --debug [DEBUG]       Run the converter in debug mode.
  --input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
                        Layout of each input tensor. If not specified, it will use the default
                        based on the Source Framework, shape of input and input encoding.
                        Accepted values are-
                          NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
                        N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
                        NDHWC/NCDHW used for 5d inputs
                        NHWC/NCHW used for 4d image-like inputs
                        NFC/NCF used for inputs to Conv1D or other 1D ops
                        NTF/TNF used for inputs with time steps like the ones used for LSTM op
                        NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
                        NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
                        F used for 1D inputs, e.g. Bias tensor
                        NONTRIVIAL for everything else. For multiple inputs specify multiple
                        --input_layout on the command line.
                        Eg:
                           --input_layout "data1" NCHW --input_layout "data2" NCHW
  --dump_relay DUMP_RELAY
                        Dump Relay ASM and params at the path provided with
                        the argument. Usage: --dump_relay <path_to_dump>
  --disable_batchnorm_folding
                        If not specified, converter will try to fold batchnorm into previous layer.
  --keep_disconnected_nodes
                        Disable Optimization that removes Ops not connected to the main graph.
                        This optimization uses output names provided over commandline OR
                        inputs/outputs extracted from the Source model to determine the main graph
  -h, --help            show this help message and exit
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path where the converted output model should be
                        saved. If not specified, the converted model will be
                        written to a file with the same name as the input model
  --copyright_file COPYRIGHT_FILE
                        Path to copyright file. If provided, the content of
                        the file will be added to the output model.
  --model_version MODEL_VERSION
                        User-defined ASCII string to identify the model, only
                        first 64 bytes will be stored
  --validation_target RUNTIME_TARGET PROCESSOR_TARGET
                        A combination of processor and runtime target against
                        which model will be validated. Choices for
                        RUNTIME_TARGET: {cpu, gpu, dsp}. Choices for
                        PROCESSOR_TARGET: {snapdragon_801, snapdragon_820,
                        snapdragon_835}. If not specified, will validate model
                        against {snapdragon_820, snapdragon_835} across all
                        runtime targets.
  --strict              If specified, will validate in strict mode whereby
                        model will not be produced if it violates constraints
                        of the specified validation target. If not specified,
                        will validate model in permissive mode against the
                        specified validation target.
  --udo_config_paths UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...], -udo UDO_CONFIG_PATHS [UDO_CONFIG_PATHS ...]
                        Path to the UDO configs (space separated, if multiple)

Quantizer Options:
  --quantization_overrides QUANTIZATION_OVERRIDES
                        Use this option to specify a json file with parameters
                        to use for quantization. These will override any
                        quantization data carried from conversion (eg TF fake
                        quantization) or calculated during the normal
                        quantization process. Format defined as per AIMET
                        specification.
  --keep_quant_nodes    Use this option to keep activation quantization nodes in the graph rather
                        than stripping them.

For more information, see PyTorch Model Conversion
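
As an illustration, a typical conversion might look like the following (the file name is an example only; a traced TorchScript model is assumed):

  snpe-pytorch-to-dlc --input_network resnet18_traced.pt \
                      --input_dim input 1,3,224,224 \
                      --output_path resnet18.dlc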


snpe-platform-validator

DESCRIPTION:
------------
snpe-platform-validator checks the SNPE compatibility/capability of a device. This tool runs on the device,
rather than on the host, and requires a few additional files to be pushed to the device besides its own executable.
Additional details below.


REQUIRED ARGUMENTS:
-------------------
  --runtime <RUNTIME>   Specify the runtime to validate. <RUNTIME> : gpu, dsp, aip, all.

OPTIONAL ARGUMENTS:
-------------------
  --coreVersion         Query the runtime core descriptor.
  --libVersion          Query the runtime core library API.
  --testRuntime         Run diagnostic tests on the specified runtime.
  --targetPath <DIR>    The directory to save output on the device. Defaults to /data/local/tmp/platformValidator/output.
  --debug               Turn on verbose logging.
  --help                Show this help message.

Additional details:

  • Files that need to be pushed to the device:
    •    bin/snpe-platform-validator
         lib/libcalculator.so
         lib/libsnpe_dsp_domains_v2.so
         lib/dsp/libcalculator_skel.so
         lib/dsp/libsnpe_dsp_v65_domains_v2_skel.so
         lib/dsp/libsnpe_dsp_v66_domains_v2_skel.so
      
         example: for pushing arm-android-clang6.0 variant to /data/local/tmp/platformValidator
      
         adb push $SNPE_ROOT/bin/arm-android-clang6.0/snpe-platform-validator /data/local/tmp/platformValidator/bin/snpe-platform-validator
         adb push $SNPE_ROOT/lib/arm-android-clang6.0 /data/local/tmp/platformValidator/lib
         adb push $SNPE_ROOT/lib/dsp /data/local/tmp/platformValidator/dsp
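
         Once the files are pushed, the validator can be run from an adb shell. One possible invocation is shown below (the library path setup is an assumption and may need to be adapted to the push locations used above):

         adb shell
         cd /data/local/tmp/platformValidator
         export LD_LIBRARY_PATH=$PWD/lib:$LD_LIBRARY_PATH
         export ADSP_LIBRARY_PATH="$PWD/dsp;$ADSP_LIBRARY_PATH"
         ./bin/snpe-platform-validator --runtime dsp --testRuntime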

snpe-throughput-net-run

snpe-throughput-net-run concurrently runs multiple instances of SNPE for a certain duration of time and measures inference throughput. Each instance of SNPE can have its own model, designated runtime and performance profile. Please note that the "--duration" parameter is common for all instances of SNPE created.

DESCRIPTION:
------------
Example application demonstrating how to load concurrent SNPE objects
using the SNPE C/C++ API.


REQUIRED ARGUMENTS:
-------------------
  --container  <FILE>   Path to the DL container containing the network.
  --duration   <VAL>    Duration of time (in seconds) to run network execution.
  --use_cpu             Use the CPU runtime for SNPE.
  --use_gpu             Use the GPU float32 runtime for SNPE.
  --use_gpu_fp16        Use the GPU float16 runtime for SNPE.
  --use_dsp             Use the DSP fixed point runtime for SNPE.
  --perf_profile <VAL>  Specifies perf profile to set. Valid settings are "balanced" , "default" , "high_performance" ,
                        "sustained_high_performance" , "burst" , "power_saver" and "system_settings".
                        NOTE: "balanced" and "default" are the same.  "default" is being deprecated in the future.
  --use_aip             Use the AIP fixed point runtime for SNPE


OPTIONAL ARGUMENTS:
-------------------
  --debug                               Specifies that output from all layers of the network
                                        will be saved.
  --userbuffer_auto                     Specifies to use userbuffer for input and output, with auto detection of types enabled.
                                        Must be used with user specified buffer.
  --userbuffer_float                    Specifies to use userbuffer for inference, and the input type is float.
                                        Must be used with user specified buffer.
  --userbuffer_floatN                   Specifies to use userbuffer for inference, and the input type is float16 or float32.
                                        Must be used with user specified buffer.
  --userbuffer_tf8                      Specifies to use userbuffer for inference, and the input type is tf8exact0.
                                        Must be used with user specified buffer.
  --userbuffer_tfN                      Specifies to use userbuffer for inference, and the input type is tf8exact0 or tf16exact0.
                                        Must be used with user specified buffer.
  --userbuffer_float_output             Overrides the userbuffer output used for inference, and the output type is float.
                                        Must be used with user specified buffer.
  --userbuffer_floatN_output            Overrides the userbuffer output used for inference, and the output type is float16 or float32.
                                        Must be used with user specified buffer.
  --userbuffer_tf8_output               Overrides the userbuffer output used for inference, and the output type is tf8exact0.
                                        Must be used with user specified buffer.
  --userbuffer_tfN_output               Overrides the userbuffer output used for inference, and the output type is tf8exact0 or tf16exact0.
                                        Must be used with user specified buffer.
  --storage_dir <DIR>                   The directory to store SNPE metadata files
  --version                             Show SNPE Version Number.
  --iterations <VAL>                    Number of times to iterate through entire input list
  --verbose                             Print more debug information.
  --skip_execute                        Don't do execution (just SNPE graph build/teardown)
  --json  <FILE>                        Generated JSON report.
  --input_raw <FILE>                    Path to raw inputs for the network, separated by ",".
  --enable_cpu_fallback                 Enables CPU fallback functionality. Disabled by default.
  --udo_package_path <VAL,VAL>
                                        Path to UDO package with registration library for UDOs.
                                        Optionally, user can provide multiple packages as a comma-separated list.
  --priority_hint <VAL>
                                        Specifies hint for priority level. Valid settings are "low", "normal", "normal_high", "high". Defaults to normal.
                                        Note: "normal_high" is only available on DSP.
  --help                                Show this help message.
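
Example of a minimal single-instance invocation (the container name is illustrative):

  snpe-throughput-net-run --container inception_v3_quantized.dlc \
                          --duration 20 \
                          --use_dsp \
                          --perf_profile burst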

snpe-platform-validator-py

DESCRIPTION:
------------
snpe-platform-validator-py checks the SNPE compatibility/capability of a device. The output is saved as a CSV file in the
"Output" directory. Basic logs are also displayed on the console.

REQUIRED ARGUMENTS:
-------------------
  --runtime <RUNTIME>      Specify the runtime to validate. <RUNTIME> : gpu, dsp, aip, all.
  --directory <ARTIFACTS>  Path to the root of the unpacked SDK directory containing the executable and library files.

OPTIONAL ARGUMENTS:
-------------------
  --buildVariant <VARIANT>      Specify the build variant (e.g: arm-android-clang6.0(default), aarch64-android-clang6.0) to be validated.
  --deviceId                    Uses the specified device for running the adb commands. Defaults to the first device in the adb devices list.
  --coreVersion                 Outputs the version of the runtime that is present on the target.
  --libVersion                  Outputs the library version of the runtime that is present on the target.
  --testRuntime                 Runs a small program on the runtime and checks if SNPE is supported for that runtime.
  --targetPath <PATH>           The path to be used on the device. Defaults to /data/local/tmp/platformValidator
                                NOTE that this directory will be deleted before proceeding with validation.
  --remoteHost <REMOTEHOST>     Run on remote host through remote adb server. Defaults to localhost.
  --debug                       Set to turn on debug log.
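
Example invocation from the host (assuming $SNPE_ROOT points to the root of the unpacked SDK):

  snpe-platform-validator-py --runtime all \
                             --directory $SNPE_ROOT \
                             --buildVariant aarch64-android-clang6.0 \
                             --testRuntime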

snpe-udo-package-generator

DESCRIPTION:
------------
This tool generates a UDO (User Defined Operation) package using a
user provided config file.

USAGE:
------------
snpe-udo-package-generator [-h] --config_path CONFIG_PATH [--debug]
                                  [--output_path OUTPUT_PATH] [-f]
OPTIONAL ARGUMENTS:
-------------------
  -h, --help            show this help message and exit
  --debug               Returns debugging information from generating the package
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        Path where the package should be saved
  -f, --force-generation
                        This option will delete the existing package.
                        Note that appropriate file permissions must be set to use
                        this option.

REQUIRED_ARGUMENTS:
-------------------
  --config_path CONFIG_PATH, -p CONFIG_PATH
                        The path to a config file that defines a UDO.
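
Example invocation (the config file and output directory names are hypothetical):

  snpe-udo-package-generator --config_path MySoftmaxUdo.json --output_path ./MyUdoPackage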