I'm a browser developer working on the Android platform.
Our browser's source code is based on the Chromium project (version 69.0.3497.100), built in the single-process configuration.
Recently, we have collected many crashes that occur while calling glBindFramebufferEXT.
The crash call stack is:
Thread Name: 'Chrome_InProcGp'
pid: 17217, tid: 18015 >>> com.UCMobile <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr d6b81398
r0 b6b81394 r1 ee29ad6c r2 d6b81390 r3 80000000
r4 824fb7c8 r5 024fb7c8 r6 ffffffff r7 fffffffc
r8 00000001 r9 00000004 10 ece12414 fp fffffff0
ip 07ffffff sp 911f5270 lr ee25d619 pc ee25d77c cpsr 200e0030
00 pc 0005477c arena_run_reg_alloc LINE:libc.so
01 pc 00054615 je_arena_tcache_fill_small LINE:libc.so
02 pc 0006f301 je_tcache_alloc_small_hard LINE:libc.so
03 pc 00065327 je_calloc LINE:libc.so
04 pc 0017c201 EsxLinkedList::GetNewEntry() LINE:libGLESv2_adreno.so
05 pc 0017cd93 EsxMemPool::RecycleBufferDescriptor(EsxBufferDesc*) LINE:libGLESv2_adreno.so
06 pc 0017cd45 EsxMemPool::Init(EsxMemPoolCreateData const*) LINE:libGLESv2_adreno.so
07 pc 0017cc9b EsxMemPool::Create(EsxMemPoolCreateData*) LINE:libGLESv2_adreno.so
08 pc 001660eb EsxCmdBuf::CreateMemPool(EsxCmdBufCreateData const*) LINE:libGLESv2_adreno.so
09 pc 00165ff3 EsxCmdBuf::Create(EsxCmdBufCreateData*) LINE:libGLESv2_adreno.so
10 pc 000ce41f EsxFramebufferObject::Init(EsxFramebufferObjectCreateData*) LINE:libGLESv2_adreno.so
11 pc 0008b325 EsxContext::GlBindFramebuffer(unsigned int, unsigned int) LINE:libGLESv2_adreno.so
12 pc 00c16d57 gpu::gles2::GLES2DecoderImpl::DoBindFramebuffer(unsigned int, unsigned int) LINE: gles2_cmd_decoder.cc:6182
13 pc 00c06fe3 gpu::gles2::GLES2DecoderImpl::HandleBindFramebuffer(unsigned int, void const volatile*) LINE: gles2_cmd_decoder_autogen.h:103
14 pc 00c19c85 gpu::error::Error gpu::gles2::GLES2DecoderImpl::DoCommandsImpl<false>(unsigned int, void const volatile*, int, int*) LINE: gles2_cmd_decoder.cc:5660
15 pc 00be7c59 gpu::CommandBufferService::Flush(int, gpu::AsyncAPIInterface*) LINE: command_buffer_service.cc:87
16 pc 00c556f3 gpu::CommandBufferStub::OnAsyncFlush(int, unsigned int) LINE: command_buffer_stub.cc:613
17 pc 00206a9f void base::DispatchToMethod<gpu::CommandBufferProxyImpl*, void (gpu::CommandBufferProxyImpl::*)(gpu::error::ContextLostReason, gpu::error::Error), std::__ndk1::tuple<gpu::error::ContextLostReason, gpu::error::Error> >(gpu::CommandBufferProxyImpl* const&, void (gpu::CommandBufferProxyImpl::*)(gpu::error::ContextLostReason, gpu::error::Error), std::__ndk1::tuple<gpu::error::ContextLostReason, gpu::error::Error>&&) LINE: tuple.h:60
(inlined by) void IPC::DispatchToMethod<gpu::CommandBufferProxyImpl, void (gpu::CommandBufferProxyImpl::*)(gpu::error::ContextLostReason, gpu::error::Error), void, std::__ndk1::tuple<gpu::error::ContextLostReason, gpu::error::Error> >(gpu::CommandBufferProxyImpl*, void (gpu::CommandBufferProxyImpl::*)(gpu::error::ContextLostReason, gpu::error::Error), void*, std::__ndk1::tuple<gpu::error::ContextLostReason, gpu::error::Error>&&) LINE: ipc_message_templates.h:51
18 pc 0034e151 bool IPC::MessageT<FileSystemHostMsg_CancelWrite_Meta, std::__ndk1::tuple<int, int>, void>::Dispatch<content::FileAPIMessageFilter, content::FileAPIMessageFilter, void, void (content::FileAPIMessageFilter::*)(int, int)>(IPC::Message const*, content::FileAPIMessageFilter*, content::FileAPIMessageFilter*, void*, void (content::FileAPIMessageFilter::*)(int, int)) LINE: ipc_message_templates.h:146
19 pc 00c55283 gpu::CommandBufferStub::OnMessageReceived(IPC::Message const&) LINE: command_buffer_stub.cc:280
20 pc 00c583b1 gpu::GpuChannel::HandleMessageHelper(IPC::Message const&) LINE: gpu_channel.cc:538
21 pc 00c57ad3 gpu::GpuChannel::HandleMessage(IPC::Message const&) LINE: gpu_channel.cc:514
22 pc 00010aaf base::OnceCallback<void ()>::Run() && LINE: callback.h:99
23 pc 00be9091 gpu::Scheduler::RunNextTask() LINE: scheduler.cc:526
24 pc 007f8eef base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) LINE: callback.h:99
(inlined by) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) LINE: task_annotator.cc:101
25 pc 00801be7 base::MessageLoop::RunTask(base::PendingTask*) LINE: message_loop.cc:426
26 pc 00801d4f base::MessageLoop::DeferOrRunPendingTask(base::PendingTask) LINE: message_loop.cc:437
27 pc 00801e4f base::MessageLoop::DoWork() LINE: message_loop.cc:485
28 pc 0080286b base::MessagePumpDefault::Run(base::MessagePump::Delegate*) LINE: message_pump_default.cc:37
29 pc 0080b0a5 base::RunLoop::Run() LINE: run_loop.cc:102
30 pc 008214ed base::Thread::ThreadMain() LINE: thread.cc:383
31 pc 0083423b base::(anonymous namespace)::ThreadFunc(void*) LINE: platform_thread_posix.cc:76
32 pc 00047b3f __pthread_start(void*) LINE:libc.so
33 pc 00019f21 __start_thread LINE:libc.so
The crash has been observed on the following device:
phone model: OPPO R11
Android system version:7.1.1
CPU: Qualcomm Technologies, Inc SDM660
GPU: Adreno (TM) 512
GPU Driver Version: OpenGL ES 3.2 [email protected] (GIT@abd12f4 I92eb381bc9) (Date:01/23/18)
We think there is heap corruption in libGLESv2_adreno.so.
The function gpu::gles2::GLES2DecoderImpl::DoBindFramebuffer(unsigned int, unsigned int) (gles2_cmd_decoder.cc:6182) is where the GL API glBindFramebufferEXT is called.
The disassembly of pc (ee25d77c, libc.so+0005477c, arena_run_reg_alloc) is:
ee25d73c: 048c lsls r4, r1, #18
ee25d73e: 3414 adds r4, #20
ee25d740: f854 2904 ldr.w r2, [r4], #-4
ee25d744: 4432 add r2, r6
ee25d746: eb00 0282 add.w r2, r0, r2, lsl #2
ee25d74a: 6892 ldr r2, [r2, #8]
ee25d74c: fa92 f5a2 rbit r5, r2
ee25d750: 2a00 cmp r2, #0
ee25d752: fab5 f585 clz r5, r5
ee25d756: bf08 it eq
ee25d758: f04f 35ff moveq.w r5, #4294967295 ; 0xffffffff
ee25d75c: 3b01 subs r3, #1
ee25d75e: eb05 1646 add.w r6, r5, r6, lsl #5
ee25d762: d1ed bne.n 0xee25d740
ee25d764: ea4f 1c56 mov.w ip, r6, lsr #5
ee25d768: f006 031f and.w r3, r6, #31
ee25d76c: 2401 movs r4, #1
ee25d76e: eb00 028c add.w r2, r0, ip, lsl #2
ee25d772: fa04 f303 lsl.w r3, r4, r3
ee25d776: 6895 ldr r5, [r2, #8]
ee25d778: ea83 0405 eor.w r4, r3, r5
ee25d77c: 6094 str r4, [r2, #8] <<===========crash here
ee25d77e: 429d cmp r5, r3
ee25d780: d11f bne.n 0xee25d7c2
ee25d782: 698a ldr r2, [r1, #24]
ee25d784: 2a02 cmp r2, #2
ee25d786: d31c bcc.n 0xee25d7c2
ee25d788: f101 0320 add.w r3, r1, #32
ee25d78c: f04f 0e01 mov.w lr, #1
ee25d790: 2401 movs r4, #1
ee25d792: 681a ldr r2, [r3, #0]
ee25d794: f00c 051f and.w r5, ip, #31
ee25d798: eb02 125c add.w r2, r2, ip, lsr #5
ee25d79c: fa0e f505 lsl.w r5, lr, r5
ee25d7a0: eb00 0882 add.w r8, r0, r2, lsl #2
ee25d7a4: f8d8 7008 ldr.w r7, [r8, #8]
ee25d7a8: ea87 0205 eor.w r2, r7, r5
ee25d7ac: f8c8 2008 str.w r2, [r8, #8]
ee25d7b0: 42af cmp r7, r5
ee25d7b2: d106 bne.n 0xee25d7c2
ee25d7b4: 698a ldr r2, [r1, #24]
ee25d7b6: 3401 adds r4, #1
ee25d7b8: ea4f 1c5c mov.w ip, ip, lsr #5
ee25d7bc: 3304 adds r3, #4
ee25d7be: 4294 cmp r4, r2
ee25d7c0: d3e7 bcc.n 0xee25d792
r2 points into a code segment that has no write permission, so the store to [r2, #8] (the fault address d6b81398) fails. In the current crash log, that address falls inside this mapping:
d694c000-d8cc4000 r-xp 04766000 103:22 647 /system/app/WebViewGoogleNX/WebViewGoogleNX.apk
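As a cross-check, the last few instructions before the faulting store can be replayed against the register dump. The sketch below is only an illustration: the macro, struct, and function names are invented for it, and it assumes the dumped registers reflect the state at the faulting instruction. It recomputes ip, r3, and r2 from r0 and r6 as the instructions at ee25d764 through ee25d776 do, then checks that r2 + 8 equals the fault address and lies inside the r-xp mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump and the maps line in this report. */
#define DUMP_R0    0xb6b81394u
#define DUMP_R2    0xd6b81390u
#define DUMP_R3    0x80000000u
#define DUMP_R4    0x824fb7c8u
#define DUMP_R5    0x024fb7c8u
#define DUMP_R6    0xffffffffu
#define DUMP_IP    0x07ffffffu
#define FAULT_ADDR 0xd6b81398u
#define MAP_START  0xd694c000u
#define MAP_END    0xd8cc4000u

/* Hypothetical helper type: registers recomputed from r0 and r6. */
typedef struct {
    uint32_t ip;
    uint32_t r3;
    uint32_t r2;
    uint32_t store_addr;
} Replay;

/* Replays ee25d764..ee25d77c using 32-bit arithmetic. */
Replay replay_tail(uint32_t r0, uint32_t r6)
{
    Replay s;
    s.ip = r6 >> 5;              /* mov.w ip, r6, lsr #5 */
    s.r3 = 1u << (r6 & 31u);     /* and.w r3, r6, #31 ; lsl.w r3, r4, r3 */
    s.r2 = r0 + (s.ip << 2);     /* add.w r2, r0, ip, lsl #2 */
    s.store_addr = s.r2 + 8u;    /* str r4, [r2, #8] */
    return s;
}

/* Returns 1 when the replay agrees with every dumped value. */
int dump_is_consistent(void)
{
    Replay s = replay_tail(DUMP_R0, DUMP_R6);
    return s.ip == DUMP_IP
        && s.r3 == DUMP_R3
        && s.r2 == DUMP_R2
        && s.store_addr == FAULT_ADDR
        && (DUMP_R3 ^ DUMP_R5) == DUMP_R4      /* eor.w r4, r3, r5 */
        && FAULT_ADDR >= MAP_START && FAULT_ADDR < MAP_END;
}

/* Aborts if the dump and the replay disagree. */
void check_dump(void)
{
    assert(dump_is_consistent());
}
```

All values agree, so the wild pointer in r2 follows directly from r6 = ffffffff. That value is what the bitmap scan produces when the last word it examines is zero (the moveq path at ee25d758), which would be consistent with the allocator expecting a free region in a run whose bitmap has none.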
We also found that the memory pointed to by r0 is always '00000014 00000002 00000000' in all crash logs.
We then wrote the following heap double-free test code:
for (int i = 0; i < 1000; ++i) {
    __android_log_print(ANDROID_LOG_INFO, "ss", "test loop: %d", i);
    void* ptr = malloc(524);  // the allocation size must be in the range [513, 640]
    free(ptr);
    free(ptr);  // intentional double free
}
It reproduces the same crash stack in libc:
#00 pc 000539ba /system/lib/libc.so (arena_run_reg_alloc+109)
#01 pc 00053851 /system/lib/libc.so (je_arena_tcache_fill_small+176)
#02 pc 0006e3dd /system/lib/libc.so (je_tcache_alloc_small_hard+16)
#03 pc 0006441d /system/lib/libc.so (je_calloc+832)
The memory pointed to by r0 also matches '00000014 00000002 00000000'.
So we suspect there is a double free in libGLESv2_adreno.so, and that the size of the double-freed allocation is in the range [513, 640] bytes.
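The [513, 640] range lines up with a single jemalloc small size class. The sketch below shows the rounding under an assumption: that the jemalloc build in this Android release uses the common layout above 128 bytes, with four evenly spaced classes per power-of-two group (..., 512, 640, 768, 896, 1024, ...). The function name is made up for illustration and is not a jemalloc API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rounds a request larger than 128 bytes up to its size class, ASSUMING
 * four evenly spaced classes per power-of-two group. Smaller requests use
 * a different spacing and are not handled here.
 */
size_t small_size_class(size_t size)
{
    assert(size > 128);

    /* Smallest power of two that is >= size; the group is (top/2, top]. */
    size_t top = 256;
    while (top < size) {
        top <<= 1;
    }

    /* Four classes per group means a spacing of (top/2)/4 = top/8. */
    size_t step = top / 8;

    /* Round up to the next multiple of the spacing. */
    return (size + step - 1) / step * step;
}
```

Under that assumption, every request from 513 to 640 bytes, including 524, is served from the 640-byte class, while 641 bytes already moves to the 768-byte class. That would explain why the test only reproduces the stack for sizes in that range.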
From the disassembly, we confirmed that EsxLinkedList::GetNewEntry() makes a 524-byte allocation, so we suspect that an entry in EsxLinkedList has been double freed.
Our browser runs Chromium in single-process mode, so Android's RenderThread and Chromium's Chrome_InProcGp thread run in the same process.
Perhaps libGLESv2_adreno.so does not take the necessary locks in all GL APIs, which could lead to an entry being double freed inside libGLESv2_adreno.so.
Or is there some other reason that could cause this problem?