Hi, I am running a neural network with QNN on the Snapdragon 8 Gen 3 HTP backend. However, I have encountered a puzzling issue with memory usage that I hope you can shed some light on.
Specifically, after finalizing the graph, I noticed a roughly 3x increase in memory usage. For instance, when I added an INT8 Conv2D layer to implement FullyConnected (input shape 1,1,1,2048 and weight shape 1,1,2048,2048), the memory consumption should be only about 4MB, yet the finalized graph consumed 12MB. I observed a similar trend with 1,1,4096,4096 weights.
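To make the expectation explicit, here is the arithmetic behind the "should be 4MB" figure (illustrative only — `int8_weight_mb` is a hypothetical helper, not a QNN API call; it assumes 1 byte per INT8 element and no padding or metadata):

```python
def int8_weight_mb(shape):
    """Size in MiB of an INT8 tensor with the given shape (1 byte per element)."""
    n = 1
    for d in shape:
        n *= d
    return n / (1024 * 1024)

print(int8_weight_mb((1, 1, 2048, 2048)))  # 4.0  -> expected ~4MB per layer
print(int8_weight_mb((1, 1, 4096, 4096)))  # 16.0 -> expected ~16MB per layer
```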
Here is the log from adding three Conv2D ops, each with 1,1,2048,2048 weights. The number in parentheses is the virtual memory usage.
Memory Usage: 384 MB(3671)
Memory Usage: 388 MB(3671) after add: model.layers.0.ln1
Memory Usage: 393 MB(3671) after add: model.layers.0.ln2
Memory Usage: 397 MB(3671) after add: model.layers.0.ln3
Memory Usage: 397 MB(3671) at: before graph finalize
Memory Usage: 435 MB(3685) at: after graph finalize
As the log shows, each added Conv2D op costs about 4MB as expected, but graph finalization adds a further 38MB — roughly three times the 12MB of weights actually in the graph.
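The step-to-step deltas can be pulled directly out of the log to make the finalize-time jump explicit (a small sketch — the log text is copied from above, and only the "Memory Usage: N MB" samples are used):

```python
import re

# The log quoted above, concatenated as one string.
log = ("Memory Usage: 384 MB(3671)"
       "Memory Usage: 388 MB(3671) after add: model.layers.0.ln1"
       "Memory Usage: 393 MB(3671) after add: model.layers.0.ln2"
       "Memory Usage: 397 MB(3671) after add: model.layers.0.ln3"
       "Memory Usage: 397 MB(3671) at: before graph finalize"
       "Memory Usage: 435 MB(3685) at: after graph finalize")

# Extract each resident-memory sample and compute consecutive differences.
samples = [int(m) for m in re.findall(r"Memory Usage: (\d+) MB", log)]
deltas = [b - a for a, b in zip(samples, samples[1:])]
print(deltas)  # [4, 5, 4, 0, 38] -> ~4MB per op, then +38MB at finalize
```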
I wonder whether this is attributable to a memory bug in QNN, or whether QNN graph finalization performs specific optimizations that inadvertently inflate memory allocation. Since I intend to run large neural networks, this threefold increase in memory usage poses a significant challenge and could lead to out-of-memory errors.
Any insights or assistance you could provide in resolving this issue would be immensely appreciated.
Looking forward to your prompt response!
Thanks.