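A .NET Crash Analysis: The Outpatient System of a Traditional Chinese Medicine Affiliated Hospital
I. Background
1. The story
A while ago a student from my training camp reached out: their software had crashed at a customer site, they could not find the cause, and they were getting anxious, so they asked me to take a look. My students get free dump analysis for life, so of course I had to do a reading for him.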
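II. Crash Analysis
1. Why did it crash
The training camp already has a routine for analyzing crash dumps: start with !analyze -v to get an automated diagnosis of the crash. The simplified output is as follows: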
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT:(.ecxr)
eax=15c96638 ebx=010fecb0 ecx=00000000 edx=000109a8 esi=000109a8 edi=0000001c
eip=02f1d218 esp=010fec7c ebp=010feca8 iopl=0 nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
02f1d218 8b410c          mov     eax,dword ptr [ecx+0Ch] ds:002b:0000000c=????????
Resetting default scope
EXCEPTION_RECORD:(.exr -1)
ExceptionAddress: 02f1d218
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 0000000c
Attempt to read from address 0000000c
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
010feca8 758d139b 000109a8 0000001c 00000000 0x2f1d218
010fecd4 758c836a 15c9664e 000109a8 0000001c user32!_InternalCallWinProc+0x2b
010fedb8 758c7f6a 15c9664e 00000000 0000001c user32!UserCallWinProcCheckWow+0x33a
010fee1c 758cbb2f 01aef180 00000000 0000001c user32!DispatchClientMessage+0xea
010fee58 77a64f5d 010fee74 00000020 010ff110 user32!__fnDWORD+0x3f
010feee0 758cbdca 010fefb8 00000000 00000000 ntdll!KiUserCallbackDispatcher+0x4d
010feee0 758cbd3e 00000000 00000000 00000000 user32!_PeekMessage+0x2a
010fef1c 6f8a707c 010fefb8 00000000 00000000 user32!PeekMessageW+0x16e
010fef68 6f85443a 00000000 00000000 00000000 System_Windows_Forms_ni+0x22707c
010feffc 6f8540d1 00000000 ffffffff 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop+0x1b6
010ff050 6f853f23 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner+0x175
010ff07c 6f82c83d 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ThreadContext.RunMessageLoop+0x4f
010ff094 02fa0b04 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.Run+0x35
010ff0f8 7337f066 00000000 00000000 00000000 xxx!xxx.Program.Main+0x2bc
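...
DispatchClientMessage in the stack shows that a message taken from the message queue was being handled when the access violation occurred at 0x2f1d218. The next question: which message was being processed?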
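The exception record above can be read mechanically. As a minimal sketch in Python (the helper name describe_access_violation is made up for illustration and is not part of WinDbg), the two parameters of an access violation are the access type and the faulting address, as documented for the Win32 EXCEPTION_RECORD structure:

```python
# Hypothetical helper, not part of WinDbg: decodes the two parameters that
# accompany exception code c0000005 (EXCEPTION_ACCESS_VIOLATION).
EXCEPTION_ACCESS_VIOLATION = 0xC0000005

# Parameter[0] values documented for access violations.
ACCESS_KINDS = {0: "read from", 1: "write to", 8: "execute (DEP) at"}


def describe_access_violation(code: int, params: list[int]) -> str:
    if code != EXCEPTION_ACCESS_VIOLATION or len(params) < 2:
        raise ValueError("not an access violation record")
    kind = ACCESS_KINDS.get(params[0], "access")
    # Parameter[1] is the virtual address that could not be accessed.
    return f"Attempt to {kind} address {params[1]:08x}"


# Values copied from the .exr -1 output above.
print(describe_access_violation(0xC0000005, [0x00000000, 0x0000000C]))
```

A faulting address this close to zero usually means a field was read through a null pointer, which is the lead the rest of the analysis follows.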
2. Which message was being processed
To answer that, use !dso to look for the MSG structure on the call stack. The simplified output is as follows:
0:000> !dso
OS Thread Id: 0x20b0 (0)
ESP/REG  Object   Name
010FEF9C 175ea6ec System.Windows.Forms.NativeMethods+MSG[]
0:000> !mdt -e:2 175ea6ec
175ea6ec (System.Windows.Forms.NativeMethods+MSG[], Elements: 1, ElementMT=6f688e60)
(System.Windows.Forms.NativeMethods+MSG) VALTYPE (MT=6f688e60, ADDR=175ea6f4)
hwnd:00140488 (System.IntPtr)
message:0x113 (System.Int32)
wParam:00000531 (System.IntPtr)
lParam:00000000 (System.IntPtr)
time:0xfbf4f32 (System.Int32)
pt_x:0x118 (System.Int32)
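pt_y:0x42d (System.Int32)
message:0x113 is the classic WM_TIMER message, i.e. a timer event. In C# terms that is the Timer component of a Windows Forms form, as the MSDN documentation for WM_TIMER confirms.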
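To make the MSG fields easier to read, here is a rough Python sketch that names the message and interprets wParam and lParam for WM_TIMER. The lookup table is only a small subset of the IDs in WinUser.h, and describe_msg is a made-up helper for illustration:

```python
# Small subset of Win32 message IDs from WinUser.h; extend as needed.
WINDOW_MESSAGES = {
    0x000F: "WM_PAINT",
    0x0010: "WM_CLOSE",
    0x0012: "WM_QUIT",
    0x001C: "WM_ACTIVATEAPP",
    0x0100: "WM_KEYDOWN",
    0x0111: "WM_COMMAND",
    0x0113: "WM_TIMER",
}


def describe_msg(message: int, wparam: int, lparam: int) -> str:
    name = WINDOW_MESSAGES.get(message, f"unknown (0x{message:04x})")
    if name == "WM_TIMER":
        # For WM_TIMER, wParam is the timer ID and lParam is the optional
        # TimerProc callback passed to SetTimer (0 means none was given).
        return f"{name}: timer id {wparam}, callback 0x{lparam:08x}"
    return f"{name}: wParam=0x{wparam:08x}, lParam=0x{lparam:08x}"


# Field values copied from the !mdt output above.
print(describe_msg(0x113, 0x531, 0x0))
```

With lParam at 0 there is no TimerProc, so the timer message is delivered to the window procedure of the window that owns the timer.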
The next focus is the assembly code at the crash site. Disassembling backwards with the ub command gives:
0:000> .ecxr
eax=15c96638 ebx=010fecb0 ecx=00000000 edx=000109a8 esi=000109a8 edi=0000001c
eip=02f1d218 esp=010fec7c ebp=010feca8 iopl=0 nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
02f1d218 8b410c          mov     eax,dword ptr [ecx+0Ch] ds:002b:0000000c=????????
0:000> ub 02f1d218 La
02f1d200 50              push    eax
02f1d201 107567          adc     byte ptr [ebp+67h],dh
02f1d204 51              push    ecx
02f1d205 83ec04          sub     esp,4
02f1d208 ff7304          push    dword ptr [ebx+4]
02f1d20b ff7308          push    dword ptr [ebx+8]
02f1d20e ff730c          push    dword ptr [ebx+0Ch]
02f1d211 8b13            mov     edx,dword ptr [ebx]
02f1d213 8b4808          mov     ecx,dword ptr [eax+8]
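02f1d216 8b09            mov     ecx,dword ptr [ecx]
Since no function name is shown at 02f1d218, experience suggests this is a small function generated dynamically by the JIT, with 02f1d204 as its entry point. The crash happens because ecx is 0 when the instruction at 02f1d218 reads through it. Tracing ecx back to its source to see whether anything new turns up gives: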
0:000> dp 15c96638+0x8 L1
15c96640  015a8658
0:000> dp 015a8658 L1
015a8658  00000000
0:000> !do 015a8658
<Note: this object has an invalid CLASS field>
Invalid object
0:000> !dumpmd 015a8658
015a8658 is not a MethodDesc
0:000> !dumpmt 015a8658
015a8658 is not a MethodTable
Nothing useful here: 015a8658 is not an object, not a MethodTable, and not a MethodDesc. That dropped me straight into the abyss...
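The pointer chain behind the crash can be replayed offline. The Python sketch below models only the two memory cells shown by the dp commands and treats every other address as unmapped, which is a simplification; the names MEMORY, read_dword and replay are made up for illustration:

```python
# Memory cells observed in the dump (address -> 32-bit value). Anything not
# listed is treated as unmapped, which is a simplification for this sketch.
MEMORY = {
    0x15C96640: 0x015A8658,  # dp 15c96638+0x8 L1
    0x015A8658: 0x00000000,  # dp 015a8658 L1
}


class AccessViolation(Exception):
    """Raised when the sketch reads an address that is not in MEMORY."""


def read_dword(address: int) -> int:
    if address not in MEMORY:
        raise AccessViolation(f"Attempt to read from address {address:08x}")
    return MEMORY[address]


def replay(eax: int) -> int:
    ecx = read_dword(eax + 0x8)    # 02f1d213  mov ecx,dword ptr [eax+8]
    ecx = read_dword(ecx)          # 02f1d216  mov ecx,dword ptr [ecx]
    return read_dword(ecx + 0xC)   # 02f1d218  mov eax,dword ptr [ecx+0Ch]


try:
    replay(0x15C96638)  # eax from .ecxr
except AccessViolation as err:
    print(err)
```

The replay stops at the same address the exception record reports, so the slot at 015a8658 holding 0 is the immediate cause of the fault.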
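3. Finding hope in despair
With no good idea at hand, I went to the doorway for a smoke and a think. message:0x113 is a Win32 timer, so the timer callback most likely crashed unexpectedly inside the JIT-generated function. In principle the corresponding C# Timer should be somewhere in the memory near the crash site. With that in mind I searched the memory around 015a8658, and sure enough, found it: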
0:000> dp 015a8658 L4
015a8658  00000000 2d61d1a8 2d4ef48c 00000000
0:000> !do 2d61d1a8
Name: System.Windows.Forms.NativeMethods+WndProc
MethodTable: 6f687200
EEClass: 6f681458
Size: 32(0x20) bytes
File: C:\windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
71ec2734  40002f3        4        System.Object  0 instance 2d61d164 _target
71ec2734  40002f4        8        System.Object  0 instance 00000000 _methodBase
71ec7b18  40002f5        c        System.IntPtr  1 instance  5b73c34 _methodPtr
71ec7b18  40002f6       10        System.IntPtr  1 instance        0 _methodPtrAux
71ec2734  4000300       14        System.Object  0 instance 00000000 _invocationList
71ec7b18  4000301       18        System.IntPtr  1 instance        0 _invocationCount
0:000> !do 2d61d164
Name: System.Windows.Forms.Timer+TimerNativeWindow
MethodTable: 6f6995e4
EEClass: 6f6ede04
Size: 56(0x38) bytes
File: C:\windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
71ec2734  40005ba        4        System.Object  0 instance 00000000 __identity
71ec7b18  4001cf9       18        System.IntPtr  1 instance        0 handle
6f687200  4001cfa        8 ...veMethods+WndProc  0 instance 2d61d1a8 windowProc
71ec7b18  4001cfb       1c        System.IntPtr  1 instance 15ca1bee windowProcPtr
71ec7b18  4001cfc       20        System.IntPtr  1 instance 77a77f70 defWindowProc
71ec878c  4001cfd       28       System.Boolean  1 instance        1 suppressedGC
71ec878c  4001cfe       29       System.Boolean  1 instance        0 ownHandle
6f685da8  4001cff        c ...orms.NativeWindow  0 instance 00000000 previousWindow
6f685da8  4001d00       10 ...orms.NativeWindow  0 instance 00000000 nextWindow
71ec6018  4001d01       14 System.WeakReference  0 instance 2d61d19c weakThisPtr
70229854  4001d02       24         System.Int32  1 instance        0 windowDpiAwarenessContext
713fe7cc  4001ce3      b88 ...stics.TraceSwitch  0   static 00000000 WndProcChoice
71ec426c  4001ce4      b8c       System.Int32[]  0   static 03111988 primes
71ec878c  4001ceb     1312       System.Boolean  1   static        1 anyHandleCreatedInApp
71ec42a8  4001ced     1304         System.Int32  1   static     1786 handleCount
71ec42a8  4001cee     1308         System.Int32  1   static     2915 hashLoadSize
6f685e9c  4001cef      b90 ...ow+HandleBucket[]  0   static 2c7b5f14 hashBuckets
71ec7b18  4001cf0     130c        System.IntPtr  1   static 77a77f70 userDefWindowProc
71ec3a08  4001cf3     1313          System.Byte  1   static        0 userSetProcFlagsForApp
71ec882c  4001cf4     1310         System.Int16  1   static        1 globalID
71f1c594  4001cf5      b94 ...ntPtr, mscorlib]]  0   static 03111bc8 hashForIdHandle
71f1c6d0  4001cf6      b98 ...Int16, mscorlib]]  0   static 03111c3c hashForHandleId
71ec2734  4001cf7      b9c        System.Object  0   static 03111b90 internalSyncObject
71ec2734  4001cf8      ba0        System.Object  0   static 03111b9c createWindowSyncObject
71ec878c  4001cea      979       System.Boolean  1 TLstatic  anyHandleCreated
    >> Thread:Value 20b0:1 <<
71ec3a08  4001cf1      97a          System.Byte  1 TLstatic  wndProcFlags
    >> Thread:Value 20b0:1 <<
71ec3a08  4001cf2      97b          System.Byte  1 TLstatic  userSetProcFlags
    >> Thread:Value 20b0:1 <<
6f69ad98  400415e       2c ...ndows.Forms.Timer  0 instance 14462858 _owner
71ec42a8  400415f       30         System.Int32  1 instance        0 _timerID
71ec878c  4004161       2a       System.Boolean  1 instance        0 _stoppingTimer
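71ec42a8  4004160     190c         System.Int32  1   static     2462 TimerID
Following the reference chain, it turns out the timer hangs under a DevComponents.DotNetBar.Controls.ComboBoxEx control, so I went straight to look for the corresponding source code.
Damn, the code is encrypted, which left me speechless. Since it belongs to the DevComponents component, I checked the component's version next and found that it dates back to 2002, some 23 years ago. It would be strange if it had no bugs...
My final advice to my friend was to upgrade DevComponents or look for a replacement.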
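III. Summary
Some say bug analysis is a kind of forensic science: you keep looking for hope in despair. As the old poem goes, sifting and washing a thousand times is hard work, but only once the sand is blown away does the gold appear!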