Forums - QSML giving different results than BLAS and OPENBLAS

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Tue, 2018-04-17 11:10

Hi,

I have an application that currently uses either BLAS or OPENBLAS on a Snapdragon 410E platform. I was hoping I could switch to the QSML library (lp64 version) by changing -lblas (or -lopenblas) to -lQSML in the makefile. The software compiles and runs, but does not give the same or even similar results. It gives the correct output with both BLAS and OPENBLAS. Is there an additional step needed to use QSML, such as some kind of init call?

I noticed that there were Linux x64 versions of the libraries, which I also tried in an Ubuntu VM. The results I obtained in the VM with the x64 version of the QSML library were the same as the results from the ARM64 QSML library on the Snapdragon 410E. However, on both architectures they did not match the expected output generated with BLAS or OPENBLAS.
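One way to narrow down a mismatch like this is to check a single dgemm call against a plain triple loop, rather than comparing whole program outputs. The sketch below is only an illustration: ref_dgemm_nn and max_abs_diff are hypothetical helper names (not part of QSML, BLAS, or RTKLIB), and it assumes column-major storage with no transposes.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C = alpha*A*B + beta*C for column-major, non-transposed
 * A (m x k), B (k x n), C (m x n). Like BLAS dgemm, C is not read
 * when beta is zero. */
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double *A, int lda, const double *B, int ldb,
                  double beta, double *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            double *c = &C[i + (size_t)j * ldc];
            *c = (beta == 0.0) ? alpha * sum : alpha * sum + beta * (*c);
        }
    }
}

/* Largest absolute element-wise difference between two arrays. */
double max_abs_diff(const double *x, const double *y, size_t len)
{
    double worst = 0.0;
    for (size_t i = 0; i < len; i++) {
        double d = x[i] - y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

To use it, run the library's dgemm and ref_dgemm_nn on copies of the same inputs and check that max_abs_diff stays close to rounding error. What counts as "close" depends on the matrix sizes and scaling, so any tolerance you pick is a judgment call.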

Any suggestions?

Thanks in advance.

-J

 

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Tue, 2018-04-17 11:18

Hi J, 

I am sorry to hear that you are having issues with QSML. Can you provide some details of your use case? For example:

1. Which QSML version are you using?

2. Which routines are being used by the application?

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Tue, 2018-04-17 13:53

Hi Rakib,

I am using QSML version 0.15.2. On the Snapdragon 410E I was using the libraries for Linux, arm64, lp64. On the Ubuntu VM, it was Linux, x64, lp64. I think the only calls the software makes are to dgemm. The software I am trying to incorporate the library into is RTKLIB, which is located here:

http://www.rtklib.com/

Thank you for your assistance,

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Thu, 2018-04-19 16:38

Hi J, 

A bug was fixed recently in GEMM and it seems like you are facing the same issue.

We are preparing a new release. I will post here as soon as the release is available. Once it is released, please give it a try and let me know if the issue still exists.

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Wed, 2018-04-25 11:45

Hi J, 

QSML 0.15.5 was released two days ago. Please give it a try and let me know if the issue still exists.

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Mon, 2018-04-30 08:15

Hi Rakib,

The new version, QSML 0.15.5, still does not produce the correct results when compared to OPENBLAS and BLAS. Aside from dgemm, there may also be problems in dgetrf and dgetrs.

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Mon, 2018-04-30 10:56

Hi J, 

Thank you for trying the new build and for the feedback.

Could you provide some details on how you are building RTKLIB, which version you are using, and which tests are failing?

I am trying to build testers from version 2.4.2 on an x64-Linux machine and I can't seem to build some of the test binaries.

 

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Mon, 2018-04-30 18:40

Hi Rakib,

In both the VM and on the snapdragon 410 I have tried both the 'rtklib_2.4.3' branch and the 'master' from here:

https://github.com/tomojitakasu/RTKLIB/tree/rtklib_2.4.3

In the VM I use the GCC 5.4 compiler and libraries; on the Snapdragon I use the GCC 4.8 versions. I have symbolic links to the shared libraries in /usr/lib on both platforms.

In the master version there is a makeall.sh script in the app folder.  In the 2.4.3 branch version there is a makefile in the app folder.   I make the whole project, but the QSML math I am examining is primarily in the rnx2rtkp section.

In app/rnx2rtkp/gcc/makefile, I make these changes:

OPTS    = -DTRACE -DENAGLO -DENAQZS -DENAGAL -DNFREQ=3 -DLAPACK

CFLAGS  = -Wall -O3 -ansi -pedantic -Wno-unused-but-set-variable -I$(SRC) $(OPTS)

LDLIBS  = -L/usr/lib -lm -lrt -lQSML

From this same folder I type:

make

which creates the rnx2rtkp executable.

I then type:

make test

The test#.pos output files help me identify whether it is working. I change -lQSML to -lblas or -lopenblas to produce test#.pos files to compare against. The test#.pos files from QSML generally do not contain any data, just header information, which indicates something is not working.
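The "header only" check described above can also be automated. The following is a rough sketch rather than anything shipped with RTKLIB: count_solution_lines is a hypothetical helper, and it assumes that header lines in a .pos file start with '%' and that every other non-blank line is a solution record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count lines that are neither blank nor header lines.
 * Assumption: header lines begin with '%'. Lines longer than the
 * buffer are still counted once, because only the first chunk of
 * each line is inspected. */
long count_solution_lines(FILE *fp)
{
    char chunk[4096];
    long count = 0;
    int at_line_start = 1;

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);
        if (at_line_start && chunk[0] != '%' &&
            chunk[0] != '\n' && chunk[0] != '\r')
            count++;
        /* The next chunk starts a new line only if this one ended in '\n'. */
        at_line_start = (len > 0 && chunk[len - 1] == '\n');
    }
    return count;
}
```

A result of 0 for a file produced with one library and a positive count for the same test with another would match the symptom described in this post.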

Thanks again for your assistance,

-J

 

 

 

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Tue, 2018-05-01 12:10

Hi J, 

Thank you for the detailed instructions. I am able to reproduce the issue on my end.

Early investigation suggests that getrf/getri might be the issue.

I will investigate more and keep you updated. 

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Thu, 2018-05-03 14:09

Hi J, 

I was able to figure out the issue. It is a bug in the QSML library: routines such as getrf/getri assume that the error variable is already 0 (zero) on entry.

The standard BLAS/LAPACK implementation makes no such assumption; it sets the variable to 0 (zero) at the beginning of the routine and then proceeds.

I already fixed the issue on our end and the next release should have the fix. 

For now, to bypass the error, you can initialize the error variable called 'info' to 0 (zero) in the following two lines (I am showing the changed lines):

                     "int info=0,lwork=n*16,*ipiv=imat(n,1);"

                     "int info=0,*ipiv=imat(n,1);"

Please let me know if you have any issues. I apologize for the inconvenience.
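To illustrate why an uninitialized 'info' matters, here is a small self-contained sketch. The two factor_* functions are hypothetical stand-ins written for this example, not QSML or LAPACK code: one clears info on entry, the other only writes it on failure, which is the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a routine that follows the reference convention:
 * info is output-only and is cleared on entry. */
void factor_conforming(int n, int *info)
{
    *info = 0;               /* reset first, whatever the caller passed */
    if (n < 0) *info = -1;   /* negative value flags an illegal argument */
}

/* Stand-in for the buggy behaviour: info is written only on failure,
 * so a successful call leaves the caller's old value in place. */
void factor_assumes_zero(int n, int *info)
{
    if (n < 0) *info = -1;
}

/* Caller pattern: continue to the next step only when info == 0.
 * initial_info models whatever happened to be in the variable. */
int call_succeeded(void (*factor)(int, int *), int n, int initial_info)
{
    int info = initial_info;
    assert(factor != NULL);
    factor(n, &info);
    return info == 0;
}
```

With a non-zero starting value, call_succeeded reports failure for factor_assumes_zero even though nothing went wrong, while factor_conforming is unaffected. Initializing info to 0 in the caller makes both behave the same.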

 

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Fri, 2018-05-04 09:17

Hi Rakib,

I was able to compile and run, and got through some of the tests. I have not finished all of the testing with the rest of my software setup, but it looks promising. The first thing I noticed is that QSML does not seem to use more than 1 core. I tried the environment variables OMP_NUM_THREADS=4 and QSML_NUM_THREADS=4 as well as Linux's taskset command, but was unsuccessful.

Is there something else I need to do to allow it to use multiple cores? 

Thank you again for all of your help and assistance.

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Fri, 2018-05-04 09:51

Hi J, 

Just to get it out of the way, you probably already checked that you are linking to libQSML-0.15.5.so and not libQSML-sequential-0.15.5.so, right?

You are correct that you should be able to control the number of threads using the QSML_NUM_THREADS variable. By default, it should be the number of cores.

One thing that I need to mention is that when I was debugging the 'info' issue, I checked for the input sizes for dgemm/dgetrf from all the tests in that directory. Most of the time, the input sizes are below 100. At that range, the threading overhead is usually too high to actually use any parallel implementation. So, QSML just calls the sequential implementation instead. 
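The size-based fallback described above can be pictured roughly as follows. This is a guess at the general shape of such a heuristic, not QSML's actual logic: the function name, the force_parallel parameter, and the cutoff of 100 (taken loosely from "below 100" in this post) are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical cutoff: problems with every dimension below this value
 * are treated as too small to be worth threading. */
#define SMALL_PROBLEM_DIM 100

/* Decide how many threads a GEMM-like call of size m x n x k would use. */
int threads_for_gemm(int m, int n, int k, int max_threads, int force_parallel)
{
    assert(max_threads >= 1);
    if (force_parallel)
        return max_threads;
    if (m < SMALL_PROBLEM_DIM && n < SMALL_PROBLEM_DIM && k < SMALL_PROBLEM_DIM)
        return 1;            /* threading overhead would dominate */
    return max_threads;
}
```

Under a rule like this, a workload made up of many small matrix products would run on one core regardless of the configured thread count, unless the fallback is overridden.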

I hope I was able to clarify.

Please let me know if I could help with anything else.

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Fri, 2018-05-04 10:22

Hi Rakib,

I am using the regular (non-sequential) version of the library.  Is there a way to force it to use multiple cores? 

I know from testing with openblas that my application's test code does in fact benefit from using multiple cores. I see significant speed increases when multiple cores are used with the openblas library. Is there some test code you can post that can confirm that QSML would use multiple cores to execute?
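A library-independent way to check whether a call really used several cores is to compare process CPU time with wall-clock time: a ratio near 1 suggests one busy core, a ratio near N suggests about N busy cores. The sketch below is not QSML test code; cpu_to_wall_ratio and spin_work are hypothetical names, and it assumes a POSIX system with clock_gettime (older glibc versions need -lrt, which the makefile above already links).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <time.h>

/* Read the given clock as seconds. */
static double seconds(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run work(arg) once and return process CPU time divided by wall time. */
double cpu_to_wall_ratio(void (*work)(void *), void *arg)
{
    double wall0, cpu0, wall, cpu;

    assert(work != NULL);
    wall0 = seconds(CLOCK_MONOTONIC);
    cpu0 = seconds(CLOCK_PROCESS_CPUTIME_ID);
    work(arg);
    cpu = seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    wall = seconds(CLOCK_MONOTONIC) - wall0;
    return wall > 0.0 ? cpu / wall : 0.0;
}

/* Sample single-threaded workload: adds up the first *arg integers. */
void spin_work(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long i, count = *(unsigned long *)arg;
    for (i = 0; i < count; i++)
        sink += i;
}
```

To test a BLAS library, replace spin_work with a function that calls dgemm on matrices large enough to run for a noticeable fraction of a second (or calls it in a loop), then compare the ratio across libraries.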

Thanks again.

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Fri, 2018-05-04 15:27

Hi J, 

Yes. There is a way to always force QSML to use multiple cores.

You can set the environment variable QSML_ALWAYS_PARALLEL=1 to get that behavior.
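Since these settings are plain environment variables, it can be worth confirming that the process actually inherits them; a variable that is set without export, or dropped by sudo, never reaches the program. A minimal sketch, with parse_long_or and env_long as hypothetical helper names; it says nothing about how QSML itself reads the variable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse text as a base-10 integer; return fallback when text is NULL,
 * empty, out of range, or has trailing characters. */
long parse_long_or(const char *text, long fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return fallback;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return fallback;
    return value;
}

/* Value of an environment variable as seen by this process. */
long env_long(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_long_or(getenv(name), fallback);
}
```

Printing env_long("QSML_ALWAYS_PARALLEL", -1) and env_long("QSML_NUM_THREADS", -1) at program start shows -1 for anything that did not make it into the environment.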

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Mon, 2018-05-07 07:41

Hi Rakib,

I tried the QSML_ALWAYS_PARALLEL=1 variable, but it is still only using 1 core. My test application on the Snapdragon 410 does show very good performance while using that one core, though. Is there a sample program or test you can post that you know will use multiple cores with QSML? I want to confirm my system setup is capable of using multiple cores with QSML.

Thanks again for your assistance.

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Mon, 2018-05-07 09:55

Hi J, 

I tested the approach on a Snapdragon 835 device and it worked fine, i.e. it forced parallel execution even for tiny problems.

But I think I understand the issue you are having. Snapdragon 410 is not on the list of SOCs for which QSML is optimized. This is partly because it only contains 4 power-efficient CPU cores (Cortex-A53). If QSML doesn't recognize the SOC, it defaults to 1 core only.
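Independently of what the library detects, it is easy to confirm how many cores the operating system itself reports. The sketch below uses sysconf with _SC_NPROCESSORS_ONLN, which is available on Linux/glibc; online_cores and clamp_threads are hypothetical helper names, and this is not how QSML identifies the SOC.

```c
#include <assert.h>
#include <unistd.h>

/* Number of CPU cores currently online according to the OS (at least 1). */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Limit a requested thread count to the number of online cores. */
long clamp_threads(long requested)
{
    long cores = online_cores();
    assert(requested >= 1);
    return requested < cores ? requested : cores;
}
```

If online_cores() reports 4 on the board while the library still uses a single thread, the limit is coming from the library rather than from the system configuration.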

I will try to get a Snapdragon 410 device and verify the issue. If that is really the case, then I will raise the issue and we will try to get it resolved in the next release. 

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Mon, 2018-05-07 17:39

Hi Rakib,

Thanks again for all your help. Is there a published list detailing which Snapdragon processors the Qualcomm Snapdragon Math Library is optimized for and which it is not? I could not find that kind of information in the documentation.

Sincerely,

-J

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Tue, 2018-05-08 11:55

Hi J, 

There is no published list of the processors for which QSML is optimized.

As for debugging the threading issue, I verified it on my end. We will work on a better way to handle unrecognized SOCs.

rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Mon, 2018-10-29 18:06

 

Hi J, 

A new version of the library (now named QML, version 1.0.0) has been released. Please give it a try.

This version should be able to provide parallel performance even if the SOC is not in the supported list.

You should now also be able to override the number of threads using the QML_NUM_THREADS environment variable.

Please let me know if you are having any other issues.

jam513
Join Date: 5 Nov 16
Posts: 11
Posted: Tue, 2018-12-11 12:29

Hi Rakib,

Sorry for the delayed response. I am having trouble building the project with the new version of the library. During the make I get this error:

/usr/bin/ld: warning: libc.so, needed by /usr/lib/gcc/aarch64-linux-gnu/4.9/../../../../lib/libqml.so, not found (try using -rpath or -rpath-link)
 
This error did not appear with the previous version. I have tried different options in the makefile as well as modifying /etc/ld.so.conf.d/libc.conf, but had no luck.
 
Any suggestions?
 
-J
rakihasa
Join Date: 21 Sep 17
Posts: 27
Posted: Wed, 2019-01-02 11:18

Hi J, 

My mistake. In the latest release, we only released binaries for Android systems. Since you are using Linux, we need to release the Linux binaries of QML v1.0.

I will let you know as soon as we release the Linux binaries. 

