Support for copying of data to 3D textures?

formisk
Join Date: 24 Jan 15
Posts: 8
Posted: Fri, 2015-02-06 07:01

Hello

I have example code that behaves differently on Windows and on Android (Moto G, which has an Adreno 305). It appears as if the Adreno driver refuses to copy anything but the first z layer when using glTexSubImage3D to copy from a buffer bound to GL_PIXEL_UNPACK_BUFFER. Is this a bug?

I have attached code to reproduce this. It does the following:

1. Creates a buffer in CPU RAM holding 64*64*64 ints with the values 0...(64*64*64).

2. Uploads this buffer to GPU memory.

3. Converts the GPU buffer to a texture (I suspect the problem is here; see the sketch after this list).

4. Uses transform feedback to copy the texture to a GPU buffer.

5. Downloads the GPU buffer and prints it.
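
In condensed sketch form, the upload path of steps 2 and 3 is roughly the following (illustrative names, not the literal attached code):

// Step 2: upload the CPU buffer into a GPU buffer bound as an unpack PBO.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, 64 * 64 * 64 * sizeof(GLuint),
             cpuData, GL_STATIC_DRAW);

// Step 3: copy the GPU buffer into a 3D texture. With a buffer bound to
// GL_PIXEL_UNPACK_BUFFER, the last argument is an offset, not a pointer.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_3D, tex);
glTexStorage3D(GL_TEXTURE_3D, 1, GL_R32UI, 64, 64, 64);
glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, 0, 64, 64, 64,
                GL_RED_INTEGER, GL_UNSIGNED_INT, 0);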

PC/Win/ATI output:
4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089
4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129
4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149
 
Android/MotorolaG/Adreno305 output:
I/stdout  ( 6838): 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
 
Notice how the data starts repeating on Android after item 4095. Since 64*64 is 4096, this is exactly where the data crosses into the second z slice for the first time.
 
Is this a bug? Or am I doing something wrong?
carrado.grant@g...
Join Date: 20 Sep 12
Posts: 38
Posted: Fri, 2015-02-06 09:30

How exactly are you using transform feedback to convert the GPU buffer to a texture? It would be great to have more information about the use case to have some frame of reference. In any case, the output of transform feedback depends entirely on how the transform feedback varyings are set up, and also on how you read your attributes in the vertex shader. I assume you are rendering some form of geometry with a vertex/geometry shader that writes to the transform feedback buffer... There are so many variables that it's difficult to say what's going on. Are the results correct on Windows?

The transform feedback stage comes before the rasterizer, so unless you have pixel-sized triangles, whatever you write to the buffer will be per-vertex and NOT per-pixel; I don't see how you would ever be able to fill that buffer using transform feedback. By the way, a buffer of 64x64x64 bytes will only be able to store (64x64x64)/4 32-bit integers, since the size of an integer is 4 bytes and NOT 1 byte as I think your code assumes.

carrado.grant@g...
Join Date: 20 Sep 12
Posts: 38
Posted: Fri, 2015-02-06 09:32

Ignore the last sentence; I only realized after writing it that you had posted a code snippet.

formisk
Join Date: 24 Jan 15
Posts: 8
Posted: Fri, 2015-02-06 10:34

Thank you for taking the time to reply. I will try to answer your questions.

I am only using transform feedback as a mechanism to copy a texture from GPU memory to normal RAM. I cannot be 100% certain, but I don't think transform feedback itself is relevant to the question at hand (I could be wrong!): if I change the code from rendering 64*64*64 values to rendering 32*32*32 values, the copied data repeats as before at the first z transition (now at the 32*32 = 1024 mark).

If you are curious, the end goal of all this is to implement GPU-accelerated generation and dual contouring of 3D Perlin noise volumes. With this approach you are constantly generating data into buffers from shaders and using those buffers as input to generate more data (since there is no support for using large buffers as shader input, they must be converted to textures). But I tried to create an example that is as small and to the point as possible, so all of that should be irrelevant: I am simply trying to copy a GPU buffer into a 3D GPU texture so I can feed it to a shader.

With regard to which varyings are read in the transform feedback shader: in a perfect world, none. I have to cheat a bit here and read from a fake input, otherwise the Adreno OpenGL driver refuses to run the transform feedback pass, something I do not have to do on the PC. But (as you can see in the code) it should have no impact, since the value read is multiplied by 0; see the sketch below.
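
The trick looks something like this in sketch form (illustrative names, not the literal attached code):

#version 300 es
// Transform feedback vertex shader; rasterization is discarded.
in float dummy;                  // fake attribute the Adreno driver insists on
uniform highp usampler3D volume; // the 3D texture being copied out
flat out uint captured;          // varying captured by transform feedback

void main()
{
    int i = gl_VertexID;
    ivec3 p = ivec3(i % 64, (i / 64) % 64, i / (64 * 64));
    // The fake input is multiplied by 0, so it cannot affect the result.
    captured = texelFetch(volume, p, 0).r + uint(dummy * 0.0);
}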

Nothing is rendered to the window; I only print text to stdout. In fact, the rasterizer has been turned off with GL_RASTERIZER_DISCARD.

You can fill a 64*64*64 unsigned int buffer with transform feedback: first create a buffer of the correct size (64*64*64*sizeof(GLuint)), bind it as output to the current transform feedback object, and execute a transform feedback draw with 64*64*64 primitives. If the shader declares the output as a uint (a single value, not a vector), and you tell OpenGL about it before linking the transform feedback program, the entire buffer gets filled. As far as I know, OpenGL ES gives the user no way to specify what data type an output has, so you automatically get 4 bytes per primitive. A sketch of that setup follows.
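
A minimal sketch, with illustrative names (prog is the program about to be linked; error checking and the dummy attribute setup are omitted):

// Declare the captured varying BEFORE linking the program.
const char* varyings[] = { "captured" };
glTransformFeedbackVaryings(prog, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(prog);

// Destination buffer: one GLuint per captured primitive.
GLuint tfBuffer;
glGenBuffers(1, &tfBuffer);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfBuffer);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, 64 * 64 * 64 * sizeof(GLuint),
             nullptr, GL_STATIC_READ);

glEnable(GL_RASTERIZER_DISCARD);          // nothing reaches the rasterizer
glUseProgram(prog);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfBuffer);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, 64 * 64 * 64); // one point per texel
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);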

Last sentence ignored as per instructions.

carrado.grant@g...
Join Date: 20 Sep 12
Posts: 38
Posted: Fri, 2015-02-06 11:22

Thanks for the insight, as this gives some perspective on the problem. If you are looking for an efficient way to transfer 'texture' data CPU->GPU or vice versa, I would stick to using pixel buffer objects (PBOs), since they were created for exactly such a use case and can be very efficient when used correctly.

You would render the result of your noise generation into a texture attached to a framebuffer object and then use a PBO to copy the data into host memory. You may have to create a ring buffer of FBOs/attachments and PBOs to prevent stalling, but that is something you may have to experiment with. A sketch of the readback side is below.
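
In rough sketch form (tex is assumed to be an R32UI volume texture and z a slice index; in ES 3.0 you attach one layer of a 3D texture at a time with glFramebufferTextureLayer, and GL_RGBA_INTEGER + GL_UNSIGNED_INT is the readback combination the spec guarantees for unsigned-integer color buffers):

GLuint fbo, pbo;
glGenFramebuffers(1, &fbo);
glGenBuffers(1, &pbo);

// PBO that receives one 64x64 slice (4 uints per pixel for RGBA_INTEGER).
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, 64 * 64 * 4 * sizeof(GLuint), nullptr,
             GL_STREAM_READ);

// Attach layer z of the 3D texture and kick off an asynchronous readback.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex, 0, z);
glReadPixels(0, 0, 64, 64, GL_RGBA_INTEGER, GL_UNSIGNED_INT, 0);

// Map the PBO later (ideally a frame or two later, to avoid a stall).
GLuint* data = (GLuint*)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                         64 * 64 * 4 * sizeof(GLuint),
                                         GL_MAP_READ_BIT);
// ... consume data ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);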

formisk
Join Date: 24 Jan 15
Posts: 8
Posted: Fri, 2015-02-06 11:27

Thank you, this is a very interesting idea: rendering directly to a texture via a framebuffer. Strangely, it had not crossed my mind, even though I already do that in another part of the code (atmospheric scattering). I will need to seriously look into that.

formisk
Join Date: 24 Jan 15
Posts: 8
Posted: Fri, 2015-02-20 11:16

I rewrote my code to render to the volumetric texture through a framebuffer instead of using transform feedback, which let me zero in on the actual problem, since it persisted even after the reimplementation.

As a reminder, the issue was that only the first depth slice (0) of a 3D texture appeared to be accessible, while the same code read every depth value in the texture fine on my PC.

By rendering to the texture I could eliminate glTexSubImage3D as the culprit, since it was no longer used. Instead it appears that texelFetch was causing the problem: no matter what value was given as depth, it only ever read from depth 0. I have now worked around the issue by using texture() to read from the 3D texture instead (roughly as in the sketch below), and everything works fine, though it's not as good a tool for the job.
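
In sketch form (illustrative names; the texture is assumed to be 64 texels per side, with GL_NEAREST filtering as required for integer formats):

uniform highp usampler3D volume;

// Broken on this driver: always reads slice 0, whatever p.z is.
//   uint v = texelFetch(volume, p, 0).r;

// Workaround: sample at the texel centre with normalized coordinates.
uint fetchWorkaround(ivec3 p)
{
    vec3 uvw = (vec3(p) + 0.5) / 64.0;
    return texture(volume, uvw).r;
}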

I suspect that this is a bug in the Adreno OpenGL driver. It is, after all, not all that common to read from a volumetric texture via texelFetch; perhaps there are only tests for the first depth slice?

carrado.grant@g...
Join Date: 20 Sep 12
Posts: 38
Posted: Fri, 2015-02-20 20:31

Nice... I'm happy you made some progress. I'm going to be facetious and say it's most likely a driver bug (if your application supports it, you could verify with other OpenGL implementations, desktop ones as well if your application is for mobile devices). Not to knock Qualcomm, but I've been running into lots of issues on Adreno GPUs that I have not encountered with other implementations, though I usually devise a workaround for them. I have a pressing one myself that I will be posting soon with regard to pixel buffer objects (again, the issue only manifests itself on an Adreno GPU), but that's for another day. The one thing I will say, though, is that you will usually get a response to your post from them, which is really good.

