Hello,

I've got a question regarding the 3D reconstruction module of FastCV:

Description of my problem: I have a set of corresponding 3D->2D points and I want to calculate the OpenGL pose (4x4 modelview matrix). I also have the projection matrix. How exactly can I do that? I'm a little confused by those functions, because they only evaluate and optimize a 3x4 matrix, but there doesn't seem to be a function that actually calculates such a matrix / pose.

One can create the OpenGL modelview matrix using the FastCV function fcvGeomPoseRefineGNf32(). The 2D->3D point correspondences are given via the input parameter 'corrs'; 'initpose' describes the initial pose, and 'refinedpose' will be the updated pose. Both initpose and refinedpose are 'float' vectors of size 12 and contain the rotation (R) and translation (T) information. R is a 3 x 3 matrix ([r1, r2, r3; r4, r5, r6; r7, r8, r9]) and T is a 3 x 1 vector ([t1; t2; t3]). The rotation and translation in initpose and refinedpose are ordered row-wise, with the translation component interleaved after each row: [r1; r2; r3; t1; r4; r5; r6; t2; r7; r8; r9; t3], where both initpose and refinedpose are 12 x 1 vectors.

Now, to create the 4 x 4 modelview matrix in OpenGL, the 12 x 1 pose vector is assigned to the first 3 rows of the modelview matrix, and the last row is set to [0, 0, 0, 1]. So, overall, the modelview matrix will look like: M = [r1, r2, r3, t1; r4, r5, r6, t2; r7, r8, r9, t3; 0, 0, 0, 1].

The OpenGL projection matrix is independent of the pose we calculate in FastCV. The projection matrix is used for rendering only; it describes the camera frustum, viewing direction, etc. Only the scene contents within the viewing frustum will be rendered on the screen; everything outside the viewing frustum will be clipped away. So, one needs to choose the viewing frustum parameters accordingly; otherwise, contents of the scene that one would like to render will be clipped away.

Thanks, that was very helpful. What should go into the 'indices' field of the 'corrs' struct? Since we haven't done any refinement up to that point, and we want to incorporate all correspondences we have so far, I supposed it should have the same length as the corrs.from array and simply increment by one for each element.

So far it works, but the returned refinedpose is exactly the same as initpose.

Yes, you are correct. If you want all the points to be used for pose estimation, then the size of 'indices' should be the number of points, and the entries simply count up by one. In other words, for N points, the indices array runs from 0 to N - 1.

One way to do a sanity check of correct usage is to pass an identical set of 'to' and 'from' points and then call the pose estimation function. In this case, the refined pose should be the same as 'initpose'.

Are the initial and refined poses the poses of the camera or of the object one wants to track? We may have mixed these up here.

Doing the sanity check with identical 'from' and 'to' values gives the expected result: both poses are identical. What information does the return value of fcvGeomPoseRefineGNf32 (the reprojection error) give us? It varies between 270 and 310 in our case when we find a lot of correspondences.

The initial and refined poses are the poses of the camera that we are tracking. Let's say the current pose of the camera is 'pose1' at frame1. We want to find 'pose2', i.e. the pose of the camera at frame2. For a generic case, we will have a set of corresponding points in frame1 and frame2, and the refinement will update the camera pose such that the points in frame1 are mapped to the points in frame2. Now, if frame1 and frame2 are identical (i.e. the camera didn't move at all), we will have identical sets of points in frame1 and frame2; hence the refined pose will be identical to the initial pose.

I would have expected the reprojection error to be near zero; the reprojection error is the result of fcvGeomPoseEvaluateErrorf32 being called on the calculated refined pose. So, if the refined pose is the same as the initial pose and the sets of points are identical in both frames, the reprojection error should theoretically be 0.

Thanks, it *seems* to work now. The error with initpose being the same as refinedpose was most probably due to a faulty pointer reference in the arguments for fcvGeomPoseRefineGNf32.

Does the refined camera pose need any further calculations (such as sine/cosine) before use? E.g., we pass an identity matrix to the pose calculator and get back a matrix with relatively large values (6.xx to 9.xx). Is it in the same "format" as the initial pose matrix?

The refinedPose is in the same format as the initialPose and shouldn't need any further calculations like sine/cosine/etc. What do your corresponding points look like? Have the points moved significantly between the two frames? In that case, the solution won't be very reliable; the function is meant for incremental motion and pose updates.

The points shouldn't be that far off. Since we're currently testing our implementation, we use very similar (randomly generated) correspondences (maybe 10 to 20 pixels apart in screen space) and get a reprojection error of 1-10, which seems to be okay. The "from" values in the corrs struct are world-space coordinates and the "to" values are screen-space coordinates, is that correct?

Basically, this is the data we use:

from: 3D worldspace coordinates (in tuples of three)

to: 2D Screenspace coordinate (in tuples of two)

indices: Array from 0 to the number of our correspondences minus one

fromStride/toStride: both 0 as suggested in the API docs

numIndices: Length of the indices array

numCorrespondences: Length of the from array

initPose: The rotation matrix of our camera plus its translation vector combined in one float array in the format [r1; r2; r3; t1; r4; r5; r6; t2; r7; r8; r9; t3]

refinedPose: We extract the rotation values and set them as the rotation matrix of our camera, and the translation values as the position of our camera.

Our native method call looks like this:

fcvGeomPoseRefineGNf32(&correspondenceData, 2, 5, 50, (float32_t*) &initPose, pRefinedPose);

The values for min/maxIterations and stopCriteria are more or less arbitrary choices.

Hm, then that's fine too. Another thing that came to mind is that our 3D engine uses a different coordinate system (rotated by 180°, so positive Y goes down and positive Z goes into the screen). Does FastCV normally use a "normal" (positive Y is up, ...) coordinate system, so that we need to convert the matrices, or does that not matter?

Yes, the coordinate system will matter if you want to fill transformation matrices in graphics APIs or rendering libraries. For example, in OpenGL one needs to set up the modelview matrix prior to rendering, and FastCV's convention is the same as OpenGL's axes convention. It follows the typical right-handed coordinate system with the origin at the center of the frame: X points right, Y points up, and Z points out of the screen.

Now, if you use some other graphics API like DirectX, or your own rendering library, you will need to adjust the transformation matrices to render correctly. As an example, in DirectX, Y points down and the origin is at the upper left corner. So, to be consistent with DirectX, the rotation matrix output from FastCV has to be rotated around the X axis by 180 degrees prior to rendering. Similarly, the translation also needs to be adjusted to align the origin with the upper left corner. From your description, it sounds like you are using a coordinate system similar to DirectX's (though I am not sure where your origin is), and you will need to apply the above transformation.