问题现象:
db2 10.5.10/AIX db2 purescache的insert语句HANG住,检查应该的状态 UOW executing
同时观察到很多SQLO_LT_SQLB_BPD__bpdLatch_SX等待,继续等待。
Latch Waiters With No Holders: Address Holder Waiter Filename LOC LatchType 0x0A00020120B37E20 0 220105 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120B80688 0 221133 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 221647 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 553355 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 537397 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 554325 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 539763 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 411338 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 480716 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 507690 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 404558 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120B80688 0 401092 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 549067 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AD1240 0 539120 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120ADBC00 0 416501 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 481017 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AD1240 0 550402 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 531209 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX 0x0A00020120AF6368 0 423184 /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246 SQLO_LT_SQLB_BPD__bpdLatch_SX
问题分析:
收集应用程序Stack,如下 <StackTrace> -------Frame------ ------Function Offset------ thread_wait getConflictComplex getConflict sqlbfix sqldCreateNewAppendCache sqldExtendAppendCache sqldSearchPageCache Stack traceback unavailable. </StackTrace>
查看诊断日志,发现磁盘满了,自动表空间无法扩大:
2020-03-03-16.53.05.928941 480 I23217218A5756 LVEL: Error (OS)
PID : 12189880 TID : 549067 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : CSPDB
APPHDL : 0-35797 APPID: 197.0.1.101.40182.200303082448
AUTHID : APPUSER HOSTNAME: NODE01
EDUID : 549067 EDUNAME: db2agent (CSPDB) 0
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED : OS, -, gpfs_prealloc
OSERR : ENOSPC (28) "No space left on device"
DATA #1 : String, 132 bytes
A total of 4 analysis will be performed :
- User info
- ulimit info
- Target file info
- File system
Target file handle = 680
DATA #2 : String, 184 bytes
Real user ID of current process = 550
Effective user ID of current process = 550
Real group ID of current process = 550
Effective group ID of current process = 550
DATA #3 : String, 381 bytes
Current process limits (unit in bytes except for nofiles) :
mem (S/H) = unlimited / unlimited
core (S/H) = unlimited / unlimited
cpu (S/H) = unlimited / unlimited
data (S/H) = unlimited / unlimited
fsize (S/H) = unlimited / unlimited
nofiles (S/H) = unlimited / unlimited
stack (S/H) = 4294967296 / 4294967296
rss (S/H) = unlimited / unlimited
DATA #4 : String, 41 bytes
current sbrk(0) value: 0x0000000177eb8200
DATA #5 : String, 261 bytes
Target File Information :
Size = 62914560
Link = No
Reference path = N/A
Type = 0x8000
Permissions = rw-------
UID = 550
GID = 550
Last modified time = 1583225585
DATA #6 : String, 385 bytes
File System Information of the target file :
Block size = 32768 bytes
Total size = 3457784217600 bytes
Free size = 139460608 bytes
Total # of inodes = 835584
FS name = N/A
Mount point = /db2data
FSID (major,minor)= 2, 151
FS type name = mmfs
DIO/CIO mount opt = None
Device type = N/A
FS type = 0x4
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x09000000177BD758 pdOSSeLoggingCallback + 0x5A0
[1] 0x090000001D43457C oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
[2] 0x090000001D4349F0 ossLogSysRC + 0x70
[3] 0x090000001D45D6BC ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
[4] 0x090000001993E164 sqloSystemErrorHandler + 0x518
[5] 0x0900000018AC2924 sqloSetFileSize + 0x26D0
[6] 0x0900000018AC1CE8 sqloSetFileSize + 0x1A94
[7] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
[8] 0x0900000018AA8ED8 sqloSetFileSize + 0x3BC
[9] 0x09000000132DDA1C sqlbResizeDMSContainer__FP12SQLB_POOL_CBiUiT3P12SQLB_GLOBALS + 0x4E0
[10] 0x09000000132DE348 sqlbARExtendContainers__FlP12SQLB_POOL_CBP16SQLB_POOLCONT_CBbPlT5P16SQLB_POOL_MAP_CBP12SQLB_GLOBALS + 0x234
[11] 0x09000000190FE2CC sqlbGrowPool__FP12SQLB_POOL_CBP12SQLB_GLOBALS + 0x96C
[12] 0x0000000000000000 ?unknown + 0x0
[13] 0x0900000019122CA0 sqlbGrowPool__FP12SQLB_POOL_CBP12SQLB_GLOBALS + 0x5F54
[14] 0x09000000190AEF64 sqlbSearchSomeSMPs__FUiT1bPUiPbT4P12SQLB_POOL_CBP16SQLB_OBJECT_DESCT5UsP11SQLB_FIX_CBP12SQLB_GLOBALS + 0x350
[15] 0x09000000190CFF20 sqlbAllocateExtent__FP12SQLB_GLOBALSP12SQLB_POOL_CBP16SQLB_OBJECT_DESCPUiUs + 0x1354
[16] 0x09000000190D4E14 sqlbAllocateExtent__FP12SQLB_GLOBALSP12SQLB_POOL_CBP16SQLB_OBJECT_DESCPUiUs + 0x6248
[17] 0x090000001785AB74 sqlbDMScheckObjAlloc__FP16SQLB_OBJECT_DESCUliT3PPUiP12SQLB_GLOBALS + 0x2C8
[18] 0x0900000016744F18 sqlbExtendObject__FP12SQLB_GLOBALSP16SQLB_OBJECT_DESCUiN23PFP8sqeAgentP9SQLB_PAGEPv_vPvT3PUiP11SQLB_FIX_CB + 0x490
[19] 0x0900000016757FF4 @114@sqldCreateAppendCachePagesForBaseTable__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEUlPUlP11SQLB_FIX_CBlPUiP16SQLD_TABLE_CACHEP16dm + 0x180
[20] 0x090000001676280C sqldCreateNewAppendCache__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEUlT3PUlP16SQLD_TABLE_CACHET6 + 0x77C
[21] 0x0900000016760D64 sqldExtendAppendCache__FP13SQLD_DFM_WORKUlP16SQLD_TABLE_CACHET3 + 0xD34
[22] 0x0900000017C680C4 sqldSearchPageCache__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEP16SQLD_TABLE_CACHET3 + 0x2F70
[23] 0x0900000017C4C564 sqldInsertRow__FP13SQLD_DFM_WORKi + 0x190
[24] 0x0900000017C4C564 sqldInsertRow__FP13SQLD_DFM_WORKi + 0x190
[25] 0x0900000017C4BE5C sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x658
[26] 0x0900000017C4B330 sqlrinsr__FP8sqlrr_cbUsT2iT2PP10SQLD_VALUEQ3_10sqlri_iudo11t_iudoFlags17t_iudoFlagsKernelP8SQLZ_RIDPPv + 0xC8
[27] 0x0900000017C4B5DC sqlriisr__FP8sqlrr_cb + 0x1D0
[28] 0x0900000017B4319C sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x94
[29] 0x0900000017E7380C sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x1ECC
[30] 0x0900000017E72C7C sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x133C
[31] 0x0900000017E726C0 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0xD80
[32] 0x0900000017EC3A1C @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x9B4
[33] 0x0900000017EC3A1C @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x9B4
[34] 0x0900000017E791B8 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x1A5C
[35] 0x0900000018FA87E8 @72@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xA8
[36] 0x0900000018FA92FC @72@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x600
[37] 0x0900000018705C34 RunEDU__8sqeAgentFv + 0x4150C
[38] 0x090000001871B02C RunEDU__8sqeAgentFv + 0x124
[39] 0x090000001715B660 EDUDriver__9sqzEDUObjFv + 0x138
[40] 0x09000000178EDA08 sqloEDUEntry + 0x394
[41] 0x090000000053FE10 _pthread_body + 0xF0
[42] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
问题解决:
清理了无用的磁盘数据,问题解决。
总结:
按照之前的常识,如果表空间无法扩容,insert语句一般会报错,而不是无何止的HANG住,所以HANG的问题除了db2pd之外,还要看db2diag.log,事实上,db2diag.log是解决DB2问题中最重要的工具之一。