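1 Background:
I recently ran into a strange problem in a project: during monkey testing on Android, the system went straight to OOM without the LMK (low memory killer) firing first. I am new to Android, and from what I had read online and my own understanding, the LMK should kick in before OOM. So I spent some time investigating and am writing the results down here.
2 Analysis: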
2.1 Log at the time of the failure
DMA free:110724kB
min:3100kB low:9668kB high:10444kB active_anon:199844kB
inactive_anon:3564kB active_file:164kB inactive_file:68kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:892928kB managed:865200kB mlocked:0kB dirty:4kB
writeback:0kB mapped:305472kB shmem:3564kB slab_reclaimable:5716kB
slab_unreclaimable:23428kB kernel_stack:8336kB pagetables:4092kB
unstable:0kB bounce:0kB free_pcp:1044kB local_pcp:352kB
free_cma:107744kB writeback_tmp:0kB pages_scanned:1952
all_unreclaimable? yes
lowmem_reserve[]: 0 0 1496 1496
HighMem free:77480kB
min:512kB low:12148kB high:13520kB active_anon:585924kB
inactive_anon:8164kB active_file:173092kB inactive_file:666684kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:1531904kB managed:1531904kB mlocked:0kB dirty:8kB
writeback:0kB mapped:491188kB shmem:9340kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:18836kB
unstable:0kB bounce:0kB free_pcp:864kB local_pcp:216kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 443*4kB (UMEC)
206*8kB (MC) 12*16kB (C) 41*32kB (C) 34*64kB (C) 34*128kB (C)
18*256kB (C) 27*512kB (C) 5*1024kB (C) 3*2048kB (C) 17*4096kB (C) =
110780kB
HighMem: 985*4kB
(UM) 40*8kB (UM) 18*16kB (UM) 10*32kB (UM) 3*64kB (UM) 2*128kB (UM)
2*256kB (UM) 0*512kB 0*1024kB 1*2048kB (M) 17*4096kB (M) =
77508kB
2.2 Android LMK configuration
adj              0      58     117    176    529    1000
minfree (pages)  18432  23040  27648  32256  36864  46080
size             72MB   90MB   108MB  126MB  144MB  180MB
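The minfree values are page counts, and the size row follows from them. A minimal sketch of the conversion, assuming 4 kB pages (which is what the table implies); the function names are made up for illustration:

```c
/* Assumption: 4 kB pages, consistent with 18432 pages -> 72MB in the table. */
#define PAGE_SIZE_KB 4u

/* Convert a minfree threshold given in pages to whole megabytes. */
unsigned int minfree_pages_to_mb(unsigned int pages)
{
    return pages * PAGE_SIZE_KB / 1024u;
}

/* Convert a size in megabytes to the page count that minfree expects. */
unsigned int mb_to_minfree_pages(unsigned int mb)
{
    return mb * 1024u / PAGE_SIZE_KB;
}
```

For example, minfree_pages_to_mb(46080) gives 180, matching the last column of the table.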
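To confirm that the LMK mechanism works at all, we ran the following experiments:
-> Continuously allocate memory from the application layer
-> Raise the minfree thresholds:
echo "81920,122880,204800,286720,368640,409600" > /sys/module/lowmemorykiller/parameters/minfree
Both cases triggered the LMK as expected, which shows that the LMK mechanism itself works.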
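The first experiment can be approximated with a small native helper. This is only a sketch under assumptions: our actual test allocated from an app, and hog_memory, release_memory and struct chunk are hypothetical names used for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Each block starts with a link so that all blocks stay reachable. */
struct chunk {
    struct chunk *next;
};

static struct chunk *held; /* list of every block allocated so far */

/*
 * Allocate up to max_chunks blocks of chunk_bytes each and write to every
 * byte so the pages are really backed by memory. Returns the number of
 * blocks allocated before malloc failed or the limit was reached.
 */
size_t hog_memory(size_t chunk_bytes, size_t max_chunks)
{
    size_t n;

    if (chunk_bytes < sizeof(struct chunk))
        return 0; /* block too small to hold the list link */

    for (n = 0; n < max_chunks; n++) {
        struct chunk *c = malloc(chunk_bytes);
        if (c == NULL)
            break;
        memset(c, 0xA5, chunk_bytes); /* touch the pages */
        c->next = held;
        held = c;
    }
    return n;
}

/* Free everything that hog_memory() is still holding. */
void release_memory(void)
{
    while (held != NULL) {
        struct chunk *c = held;
        held = c->next;
        free(c);
    }
}
```

Calling hog_memory in a loop with a large limit keeps memory pressure rising; on a typical Linux setup malloc tends to keep succeeding, so the expected end of such a run is the process being killed rather than a NULL return.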
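When the problem occurred, the DMA zone had 110724kB free and the HighMem zone had 77480kB free. The system had plenty of free memory, so the LMK should not have been triggered, let alone OOM. Looking at __alloc_pages_nodemask in mm/page_alloc.c, the call chain at the time of the failure is:
__alloc_pages_nodemask -> __alloc_pages_slowpath
-> get_page_from_freelist -> __alloc_pages_may_oom
The cause is that get_page_from_freelist calls zone_watermark_ok() to check the zone watermark, and the check returns false. According to the log, the DMA zone has 110724kB free, but 107744kB of that is free CMA memory, so almost all of the free memory in the DMA zone is CMA. After subtracting the CMA memory only 2980kB remains, which is below the DMA zone min watermark of 3100kB, and therefore OOM occurs.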
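The arithmetic above can be checked with a small sketch. It is a simplified model, not the kernel code: the real zone_watermark_ok() also accounts for lowmem_reserve, allocation order and other flags, and struct zone_snapshot and min_watermark_ok are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical snapshot of one zone, all values in kB as printed in the log. */
struct zone_snapshot {
    long free_kb;     /* "free" */
    long free_cma_kb; /* "free_cma" */
    long min_kb;      /* "min" watermark */
};

/*
 * Simplified min-watermark test: an allocation that may not use CMA
 * does not get to count the free CMA memory of the zone.
 */
bool min_watermark_ok(const struct zone_snapshot *z, bool can_use_cma)
{
    long usable_kb = z->free_kb;

    if (!can_use_cma)
        usable_kb -= z->free_cma_kb;

    return usable_kb > z->min_kb;
}
```

With the DMA zone values from the log (free 110724, free_cma 107744, min 3100), the check fails when CMA may not be used, because 110724 - 107744 = 2980 is not above 3100, and it passes when CMA may be used.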
3 Summary:
--> When checking a zone's watermark, free CMA memory has to be subtracted if the allocation is not allowed to use CMA:
#ifdef CONFIG_CMA
	/* If the allocation cannot use CMA areas, do not count free CMA pages */
	if (!(alloc_flags & ALLOC_CMA))
		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
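--> For the LMK, the available memory is the total free memory of the system minus the memory reserved by the system. In our case that is roughly 110724kB + 77480kB - 10444kB - 13520kB: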
int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
int other_file = global_page_state(NR_FILE_PAGES) -
		 global_page_state(NR_SHMEM) -
		 total_swapcache_pages();
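So OOM is not necessarily preceded by the LMK: the LMK looks at system-wide free and file-cache totals, while the page allocator checks each zone's watermark with free CMA memory excluded.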
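To see why the LMK stayed quiet here, the threshold lookup can be sketched as follows. This is a simplified model of the lowmemorykiller driver's selection loop, using the adj/minfree table from section 2.2 converted to kB. The inputs are approximations from the log: other_free as estimated above, and other_file as the sum of active_file and inactive_file over both zones, assuming the swap cache is negligible. lmk_min_adj and LMK_NO_KILL are hypothetical names.

```c
#include <stddef.h>

#define LMK_NO_KILL (-1) /* hypothetical marker: no threshold was crossed */

/* adj levels and minfree thresholds from section 2.2 (pages * 4 kB). */
static const int lmk_adj[] = { 0, 58, 117, 176, 529, 1000 };
static const long lmk_minfree_kb[] = {
    73728, 92160, 110592, 129024, 147456, 184320
};

/*
 * Return the lowest adj level whose threshold is crossed, or LMK_NO_KILL.
 * A level counts as crossed only when BOTH the free memory and the file
 * cache are below its minfree value.
 */
int lmk_min_adj(long other_free_kb, long other_file_kb)
{
    size_t i;

    for (i = 0; i < sizeof(lmk_adj) / sizeof(lmk_adj[0]); i++) {
        if (other_free_kb < lmk_minfree_kb[i] &&
            other_file_kb < lmk_minfree_kb[i])
            return lmk_adj[i];
    }
    return LMK_NO_KILL;
}
```

With other_free = 110724 + 77480 - 10444 - 13520 = 164240 kB and other_file = 164 + 68 + 173092 + 666684 = 840008 kB, no level is crossed: the free memory is under the 180MB threshold, but the file cache is far above it, so under these assumptions the LMK has nothing to do even though the DMA zone already fails its watermark check.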
4 Solutions:
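solution 1: reduce CMA reserved memory:
Because the display resolution is high (1920x720), we reserve 256M of CMA memory; a smaller reservation causes GraphicBufferAllocator to fail. The 256M figure is based only on our own estimate: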
framebuffers needed by SurfaceFlinger --> 4
each APK uses triple buffering to draw, at most 3 APKs at the same time --> 3*3 = 9
other non-HMI cases: 2~3
So the total may be about 16 framebuffers, each about 8M after alignment, so we allocate 256M of CMA. We also confirmed that 192M of CMA memory would still fail.
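The "about 8M per buffer" figure can be reproduced with a short calculation. This is a sketch under two assumptions that are not stated above: 4 bytes per pixel, and buffer sizes rounded up to the next power of two. The function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next power of two (for n > 0). */
size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Size of one buffer in bytes: width * height * bytes per pixel, aligned. */
size_t aligned_buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return round_up_pow2(width * height * bytes_per_pixel);
}
```

For 1920x720 at 4 bytes per pixel the raw size is 5529600 bytes, which rounds up to 8388608 bytes (8M). Sixteen such buffers come to 128M, so the buffer count above is better read as a lower bound on the CMA demand than as the full 256M budget.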
solution 2: to be continued