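Handling the xfs_vm_releasepage warning on Linux systems
Problem description
Several machines recently logged the following warning at different times on the same day: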
Mar 26 20:55:03 host1 kernel: WARNING: at fs/xfs/xfs_aops.c:1045 xfs_vm_releasepage+0xcb/0x100 [xfs]()
Mar 26 20:55:03 host1 kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables ebtable_filter ebtables ip6table_filter ip6_tables devlink bridge stp llc xt_multiport sunrpc dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf ipmi_si sg pcspkr ipmi_msghandler shpchp i2c_i801 lpc_ich nfit libnvdimm acpi_power_meter kgwttm(OE) xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 drm_kms_helper igb syscopyarea sysfillrect sysimgblt ptp fb_sys_fops ttm pps_core dca ahci drm i2c_algo_bit libahci megaraid_sas i2c_core libata
Mar 26 20:55:03 host1 kernel: fjes [last unloaded: nf_defrag_ipv4]
Mar 26 20:55:03 host1 kernel: CPU: 10 PID: 224 Comm: kswapd0 Tainted: G OE ------------ 3.10.0-514.21.2.el7.x86_64 #1
Mar 26 20:55:03 host1 kernel: Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.3.7 02/08/2018
Mar 26 20:55:03 host1 kernel: 0000000000000000 00000000e02a0d05 ffff88103c7ebaa0 ffffffff81687073
Mar 26 20:55:03 host1 kernel: ffff88103c7ebad8 ffffffff81085cb0 ffffea0000687620 ffffea0000687600
Mar 26 20:55:03 host1 kernel: ffff88004a71daf8 ffff88103c7ebda0 ffffea0000687600 ffff88103c7ebae8
Mar 26 20:55:03 host1 kernel: Call Trace:
Mar 26 20:55:03 host1 kernel: [] dump_stack+0x19/0x1b
Mar 26 20:55:03 host1 kernel: [] warn_slowpath_common+0x70/0xb0
Mar 26 20:55:03 host1 kernel: [] warn_slowpath_null+0x1a/0x20
Mar 26 20:55:03 host1 kernel: [] xfs_vm_releasepage+0xcb/0x100 [xfs]
Mar 26 20:55:03 host1 kernel: [] try_to_release_page+0x32/0x50
Mar 26 20:55:03 host1 kernel: [] shrink_active_list+0x3d6/0x3e0
Mar 26 20:55:03 host1 kernel: [] shrink_lruvec+0x3f1/0x770
Mar 26 20:55:03 host1 kernel: [] shrink_zone+0x76/0x1a0
Mar 26 20:55:03 host1 kernel: [] balance_pgdat+0x48c/0x5e0
Mar 26 20:55:03 host1 kernel: [] kswapd+0x173/0x450
Mar 26 20:55:03 host1 kernel: [] ? wake_up_atomic_t+0x30/0x30
Mar 26 20:55:03 host1 kernel: [] ? balance_pgdat+0x5e0/0x5e0
Mar 26 20:55:03 host1 kernel: [] kthread+0xcf/0xe0
Mar 26 20:55:03 host1 kernel: [] ? kthread_create_on_node+0x140/0x140
Mar 26 20:55:03 host1 kernel: [] ret_from_fork+0x58/0x90
Mar 26 20:55:03 host1 kernel: [] ? kthread_create_on_node+0x140/0x140
Mar 26 20:55:03 host1 kernel: ---[ end trace 24823c5c7a1ea2be ]---
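The first line of the trace already names the source file, the line number and the symbol that raised the warning. As a rough sketch, the following C helper pulls those three fields out of a log line. The names struct warn_location and parse_warn_location are made up for this illustration, and the code assumes the usual syslog spacing ("WARNING: at <file>:<line> <symbol>+<offset>/...").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a "WARNING: at ..." line. */
struct warn_location {
    char file[128];   /* e.g. fs/xfs/xfs_aops.c */
    int  line;        /* e.g. 1045 */
    char symbol[64];  /* e.g. xfs_vm_releasepage */
};

/*
 * Find "WARNING: at " anywhere in a syslog line and extract the file,
 * line number and symbol that follow it.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_warn_location(const char *log_line, struct warn_location *out)
{
    const char *p;

    assert(log_line != NULL && out != NULL);

    p = strstr(log_line, "WARNING: at ");
    if (p == NULL)
        return -1;

    /* %127[^:] reads up to the colon, %63[^+] reads up to the '+' offset. */
    if (sscanf(p, "WARNING: at %127[^:]:%d %63[^+]",
               out->file, &out->line, out->symbol) != 3)
        return -1;

    return 0;
}
```

Fed the first line of the trace above, the helper reports the file, line and symbol that the later sections look up in the kernel source.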
On these machines, crash information for the kernel and applications is collected by the abrtd service, so a summary can be viewed with abrt-cli:
# abrt-cli list --since 1547518209
id 2181dce8f72761585cb6a904dbff1806c1315c27
reason:         WARNING: at fs/xfs/xfs_aops.c:1045 xfs_vm_releasepage+0xcb/0x100 [xfs]()
time:           Sat 23 Mar 2019 08:30:45 PM CST
cmdline:        BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.16.1.el7.x86_64 root=/dev/sda1 ro crashkernel=auto net.ifnames=0 biosdevname=0
package:        kernel
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/oops-2019-03-23-20:30:45-163925-0
The kernel version is as follows:
CentOS 7
Linux host1 3.10.0-514.21.2.el7.x86_64
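Analysis and handling
Red Hat Knowledgebase
According to the Red Hat Knowledgebase article, XFS prints this kind of warning while the xfs module traverses its code paths, and it does not affect use of the host. Upgrading the kernel to kernel-3.10.0-693.el7 avoids the warning. For details, see redhat-access-2893711
Root Cause:
The messages were informational and they do not affect the system in a negative manner. They are seen because the XFS module is traversing through XFS code path.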
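Code analysis
The Red Hat Knowledgebase article does not mention memory reclaim, but the call trace suggests that the warning was triggered while the kernel was reclaiming memory. Memory usage around that time was as follows: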
04:30:01 PM  kbmemfree  kbmemused   %memused  kbbuffers   kbcached   kbcommit    %commit   kbactive    kbinact    kbdirty
......
08:40:01 PM     513940  130976220      99.61        876  104616380   28610584      21.76   92439660   34840920        524
08:50:01 PM     479896  131010264      99.64        876  104666496   28557292      21.72   92513872   34804240        400
09:00:01 PM     455948  131034212      99.65        876  104675712   28588852      21.74   92418724   34926132        572
09:10:01 PM     556980  130933180      99.58        876  104610352   28552656      21.71   94287212   32983892        900

# sysctl vm.min_free_kbytes
vm.min_free_kbytes = 90112
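To make the trend in the kbmemfree column easier to read, the differences between consecutive samples can be computed directly. The C sketch below is only an illustration: kbmemfree_delta and sar_kbmemfree are hypothetical names, and the four values are copied from the sar rows for 20:40, 20:50, 21:00 and 21:10.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Change in kbmemfree from sample i-1 to sample i, in kB.
 * A positive result means free memory grew during that interval.
 */
long kbmemfree_delta(const long *kbmemfree, size_t count, size_t i)
{
    assert(kbmemfree != NULL && i >= 1 && i < count);
    return kbmemfree[i] - kbmemfree[i - 1];
}

/* kbmemfree values from the sar output: 20:40, 20:50, 21:00, 21:10. */
const long sar_kbmemfree[] = { 513940, 479896, 455948, 556980 };
```

With these samples, free memory falls by 34044 kB and then by 23948 kB over the two intervals up to 21:00, and rises by 101032 kB in the interval ending at 21:10. Every sample stays well above the vm.min_free_kbytes value of 90112.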
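Free memory did not increase between 20:50 and 21:00, which suggests that the system may not have actually reclaimed any memory. Following the call trace in the kernel log, the functions call each other in this order:
shrink_active_list -> try_to_release_page -> xfs_vm_releasepage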
// source/mm/filemap.c
int try_to_release_page(struct page *page, gfp_t gfp_mask)
{
    struct address_space * const mapping = page->mapping;
    ......
    if (mapping && mapping->a_ops->releasepage)
        return mapping->a_ops->releasepage(page, gfp_mask);    /* -> xfs_vm_releasepage */
    return try_to_free_buffers(page);
}
// source/fs/xfs/xfs_aops.c
1034 STATIC int
1035 xfs_vm_releasepage(
1036     struct page     *page,
1037     gfp_t           gfp_mask)
1038 {
1039     int             delalloc, unwritten;
1040
1041     trace_xfs_releasepage(page->mapping->host, page, 0, 0);
1042
1043     xfs_count_page_state(page, &delalloc, &unwritten);
1044
1045     if (WARN_ON_ONCE(delalloc))
1046         return 0;
1047     if (WARN_ON_ONCE(unwritten))
1048         return 0;
1049
1050     return try_to_free_buffers(page);
1051 }
......
1827 const struct address_space_operations xfs_address_space_operations = {
1833     .releasepage    = xfs_vm_releasepage,
The kernel log entry kernel: WARNING: at fs/xfs/xfs_aops.c:1045 shows that line 1045 of source/fs/xfs/xfs_aops.c printed the call trace. The function returned at that point without ever calling try_to_free_buffers:
1045     if (WARN_ON_ONCE(delalloc))
1046         return 0;
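The dispatch through a_ops->releasepage and the early return can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: every model_ name and the reduced struct layouts are assumptions made for illustration, and the delalloc and unwritten fields stand in for what xfs_count_page_state reports.

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct address_space_operations {
    int (*releasepage)(struct page *page, unsigned int gfp_mask);
};

struct address_space {
    const struct address_space_operations *a_ops;
};

struct page {
    struct address_space *mapping;
    int delalloc;    /* nonzero: page still has delayed-allocation buffers */
    int unwritten;   /* nonzero: page still has unwritten buffers */
};

/* Counts how often the model actually tried to free buffers. */
int model_free_buffers_calls;

int model_try_to_free_buffers(struct page *page)
{
    (void)page;
    model_free_buffers_calls++;
    return 1;                       /* buffers released */
}

/* Filesystem hook: bail out early, as lines 1045-1048 do. */
int model_xfs_vm_releasepage(struct page *page, unsigned int gfp_mask)
{
    (void)gfp_mask;
    if (page->delalloc || page->unwritten)
        return 0;                   /* page is not released */
    return model_try_to_free_buffers(page);
}

const struct address_space_operations model_xfs_aops = {
    .releasepage = model_xfs_vm_releasepage,
};

/* Generic entry point: prefer the filesystem hook when one is registered. */
int model_try_to_release_page(struct page *page, unsigned int gfp_mask)
{
    struct address_space *mapping;

    assert(page != NULL);
    mapping = page->mapping;
    if (mapping && mapping->a_ops->releasepage)
        return mapping->a_ops->releasepage(page, gfp_mask);
    return model_try_to_free_buffers(page);
}
```

In this model a page with delalloc set makes model_try_to_release_page return 0 and leaves model_free_buffers_calls unchanged, while a clean page is passed on to model_try_to_free_buffers.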
WARN_ON_ONCE is comparatively simple and can be found in source/include/asm-generic/bug.h:
#define __WARN()    warn_slowpath_null(__FILE__, __LINE__)
......
#define WARN_ON(condition) ({    \
...
        __WARN();    \
......
#define WARN_ON_ONCE(condition) ({    \
....
    if (unlikely(__ret_warn_once))    \
        if (WARN_ON(!__warned))    \
......
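The once-only behaviour can be sketched in portable userspace C. The kernel macro hides a static __warned flag inside each call site; the illustration below makes that flag explicit in a hypothetical struct warn_site, and model_warn_on_once and model_warnings_printed are likewise invented names rather than kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

/* One instance per call site; warned plays the role of the static __warned flag. */
struct warn_site {
    const char *file;
    int line;
    int warned;
};

/* Number of warnings actually printed, kept only so the effect can be observed. */
int model_warnings_printed;

/*
 * Print a warning the first time condition is true for this site.
 * Always returns the truth value of condition, so the caller can still
 * take its early-return branch on every call.
 */
int model_warn_on_once(struct warn_site *site, int condition)
{
    assert(site != NULL);
    if (condition && !site->warned) {
        site->warned = 1;
        model_warnings_printed++;
        fprintf(stderr, "WARNING: at %s:%d\n", site->file, site->line);
    }
    return !!condition;
}
```

Calling model_warn_on_once twice with a true condition on the same site prints one warning but returns 1 both times, which mirrors how the check at line 1045 keeps returning 0 from xfs_vm_releasepage after the trace has been printed once.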
The __WARN macro calls warn_slowpath_null, which appears in the call trace and in turn calls warn_slowpath_common to print the trace:
// source/kernel/panic.c
void warn_slowpath_null(const char *file, int line)
{
    warn_slowpath_common(file, line, __builtin_return_address(0),
                         TAINT_WARN, NULL);
}

static void warn_slowpath_common(const char *file, int line, void *caller,
                                 unsigned taint, struct slowpath_args *args)
{
    disable_trace_on_warning();

    printk(KERN_WARNING "------------[ cut here ]------------\n");
    printk(KERN_WARNING "WARNING: at %s:%d %pS()\n", file, line, caller);

    if (args)
        vprintk(args->fmt, args->args);
    ......
    print_modules();
    dump_stack();
    print_oops_end_marker();
    ......
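From this we can see that the call trace is only a warning, which matches the description in the Red Hat Knowledgebase, and it does not affect use of the host.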
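Summary
Based on the functions above, the call trace can be printed whenever kswapd calls xfs_vm_releasepage during memory reclaim and the page still has delayed-allocation (delalloc) or unwritten buffers. In that case the function returns before try_to_free_buffers runs, which is consistent with free memory not increasing in the sar output. Because the check uses WARN_ON_ONCE, the trace is printed only the first time the condition fires, although the early return still happens every time it holds. The kernel.traceoff_on_warning parameter behind disable_trace_on_warning does not help here: it only stops ftrace tracing when a warning fires and does not suppress the call trace itself. Upgrading the kernel is therefore the only way to avoid this message.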