feat(crashdump): Add crash dump functionality for fatal errors #1594
Conversation
OmniBlade
left a comment
Looks good, tested dump generation with a 2022 build.
Another approach I realized after publishing could be to move the GameMemory allocations from using GlobalAlloc to instead create a separate GameMemory heap and use HeapAlloc.
xezon
left a comment
Looks good overall. Just a bunch of small comments.
Pushed a commit attempting to address most of the comments in the previous review:
In addition:
Remaining:
I think this should be enabled by default for now and included in our weekly pre-releases and legi.cc/patch distributions. The patch still crashes now and then without us being able to find a cause (replays are sent to us, but there is no crash to be found). For example, Legi (and only Legi) crashed yesterday on stream while being on the patch. No cause was found. Players on the patch (with crashdump on) can send us the crash dumps and thus help find where the issues and instabilities are.
7bb8e8c to
90a0b49
Compare
Agreed, that's the main use case I was hoping this could address. Once this is merged, the remaining hurdle would be to get the weekly builds to enable the |
xezon
left a comment
I think the cleanup in MiniDumper class needs another look. Make it simple and robust.
xezon
left a comment
Code looks good to me. A few minor comments that could be looked into.
| MiniDumpFilterWriteCombinedMemory = 0x01000000, | ||
| MiniDumpValidTypeFlags = 0x01ffffff, | ||
| MiniDumpNoIgnoreInaccessibleMemory = 0x02000000, | ||
| MiniDumpValidTypeFlagsEx = 0x03ffffff, |
Is this value correct?
From https://learn.microsoft.com/en-us/windows/win32/api/minidumpapiset/ne-minidumpapiset-minidump_type I'd expect this to be 0x02000001.
There does seem to be some discrepancy between the docs and the SDK headers. The SDK headers define it explicitly as 0x03ffffff; you can see this in the original file at C:\Program Files (x86)\Windows Kits\10\Include\10.0.26100.0\um\minidumpapiset.h
My money would be on the docs being wrong, and as we don't use this value directly it should be fine for our purpose. Does that seem reasonable?
And just out of curiosity, how did you notice this?
> Include\10.0.26100.0\

I only have two older versions and MiniDumpValidTypeFlags is the last value for both versions.

> Does that seem reasonable?

Yeah, seems reasonable to me.

> And just out of curiosity, how did you notice this?

I was wondering why we're defining all these structs / enums, so I happened to check a couple or maybe just this one to see where they were coming from.
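As a quick sanity check on the value discussed in this thread, the four constants quoted in the diff excerpt can be compared at compile time. This is only a sketch: the `k`-prefixed names and `isValidDumpType` are hypothetical stand-ins rather than the SDK's or the PR's identifiers, and only the four numeric values are taken from the excerpt. It shows that 0x03ffffff is exactly the older mask plus the one newer flag bit, which is consistent with it being a "valid flags" mask.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the diff excerpt in this thread. The names are
// hypothetical; the real enum is MINIDUMP_TYPE.
enum : std::uint32_t
{
    kFilterWriteCombinedMemory  = 0x01000000,
    kValidTypeFlags             = 0x01ffffff,
    kNoIgnoreInaccessibleMemory = 0x02000000,
    kValidTypeFlagsEx           = 0x03ffffff
};

// The older mask covers every bit up to and including the highest older flag.
static_assert(kValidTypeFlags == (kFilterWriteCombinedMemory << 1) - 1,
              "older mask should cover all bits up to 0x01000000");

// The extended mask is the older mask plus the single newer flag bit.
static_assert((kValidTypeFlags | kNoIgnoreInaccessibleMemory) == kValidTypeFlagsEx,
              "extended mask should equal the older mask plus the newer flag");

// Hypothetical helper: true when `flags` only uses bits inside the extended mask.
constexpr bool isValidDumpType(std::uint32_t flags)
{
    return (flags & ~std::uint32_t(kValidTypeFlagsEx)) == 0;
}
```

A value of 0x02000001 would not accept most of the individual flags under the same check, so the header value looks internally consistent. That is an observation about the arithmetic only, not a statement about which source is authoritative.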
9cbf93d to
a241b80
Compare
Rebased on main.
Refactored a bit, introducing an enum for the memory range phases and modified the output variable of the callback routine so that our callback can always return TRUE.
xezon
left a comment
To me it looks like we can get rid of the concept of m_dumpObjectsSubState by caching the actual pointers to the memory structures: keep iterating one list until its pointer reaches NULL, then move on to the next list, until all lists have reached the end.
Performance-wise it is probably not the biggest of deals, but it should make the logic cleaner.
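A minimal sketch of that idea, assuming plain singly linked lists: the cursor caches the current node pointer and walks each list until it hits NULL before moving to the next list, so no separate sub-state is needed. `Node`, `MultiListCursor`, and `sumAll` are hypothetical names for illustration and do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for one entry of a memory structure list (hypothetical).
struct Node
{
    int value;
    Node* next;
};

// Walks several lists back to back by caching the current node pointer.
class MultiListCursor
{
public:
    MultiListCursor(Node* const* heads, std::size_t count)
        : m_heads(heads), m_count(count), m_index(0),
          m_current(count > 0 ? heads[0] : nullptr)
    {
        skipExhaustedLists();
    }

    // NULL once every list has been fully visited.
    Node* current() const { return m_current; }

    void advance()
    {
        if (m_current)
            m_current = m_current->next;
        skipExhaustedLists();
    }

private:
    // Moves to the head of the next list while the cached pointer is NULL.
    void skipExhaustedLists()
    {
        while (!m_current && m_index + 1 < m_count)
            m_current = m_heads[++m_index];
    }

    Node* const* m_heads;
    std::size_t m_count;
    std::size_t m_index;
    Node* m_current;
};

// Usage example: visit every node of every list exactly once.
int sumAll(Node* const* heads, std::size_t count)
{
    int sum = 0;
    for (MultiListCursor cursor(heads, count); cursor.current(); cursor.advance())
        sum += cursor.current()->value;
    return sum;
}
```

In the dump callback, each call would hand out `current()` and then call `advance()`, and the phase ends when `current()` returns NULL.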
|
Got rid of the |
xezon
left a comment
This looks much cleaner indeed.
xezon
left a comment
A bunch more small comments.
| m_currentAllocator = TheDynamicMemoryAllocator; | ||
| if (m_currentAllocator) | ||
| m_currentSingleBlock = m_currentAllocator->getFirstRawBlock(); | ||
| MoveToNextAllocatorWithRawBlocks(); |
I think this is fine, but I was just wondering if TheDynamicMemoryAllocator can have an iterator like the one you added for TheMemoryPoolFactory?
Basically have a MemoryPoolIterator and DynamicMemoryIterator and then use them the same way. Or is there a specific reason why Memory Pool needed the iterator while Dynamic Memory did not?
| #endif | ||
|
|
||
| #ifdef RTS_ENABLE_CRASHDUMP | ||
| class AllocationRangeIterator |
Perhaps also put "MemoryPool" into its name.
Alternatively, you could also make this class a member of "MemoryPool" and then simply name it "Iterator".
It then would become MemoryPool::Iterator
Unless there is a reason for it to be standalone or it is too much work to touch.
|
|
||
| AllocationRangeIterator(); | ||
| AllocationRangeIterator(const MemoryPoolFactory* factory); | ||
| AllocationRangeIterator(MemoryPool& pool, MemoryPoolBlob& blob); |
This constructor is redundant and unused. Can remove.
|
|
||
| void updateRange(); | ||
| void moveToNextBlob(); | ||
| const MemoryPoolFactory* m_factory; |
This member does not look necessary.
| m_currentPool = NULL; | ||
| m_currentBlobInPool = NULL; | ||
| m_factory = NULL; | ||
| m_range = MemoryPoolAllocatedRange(); |
This means m_range is still uninitialized, because MemoryPoolAllocatedRange has no initializer.
Are the constructors missing a call to updateRange() down here? It looks like the very first iterator is malformed.
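One way to sidestep the question of what `MemoryPoolAllocatedRange()` leaves in its members is to give the struct an explicit default constructor, so the empty state no longer depends on how a particular compiler treats `T()`. The sketch below is a stand-in, not the PR's definition: the two member names come from the diff excerpt in this review, while their types and the `isEmptyRange` helper are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the struct under review. Member names follow the diff
// excerpt; the member types are assumed for illustration.
struct MemoryPoolAllocatedRange
{
    const char* allocationAddr;
    std::size_t allocationSize;

    // Explicit initializer: a default-constructed range is always empty.
    MemoryPoolAllocatedRange()
        : allocationAddr(NULL), allocationSize(0)
    {
    }
};

// Hypothetical helper: true when the range describes no memory.
inline bool isEmptyRange(const MemoryPoolAllocatedRange& range)
{
    return range.allocationAddr == NULL || range.allocationSize == 0;
}
```

With this in place, `m_range = MemoryPoolAllocatedRange();` yields a well-defined empty range. Whether the constructors should additionally call `updateRange()` is a separate question.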
| m_currentBlobInPool->fillAllocatedRange(m_range); | ||
| } | ||
|
|
||
| void AllocationRangeIterator::moveToNextBlob() |
This function looks over-complicated to the eye. Maybe. I asked Chat to simplify it and it claims that the following is logically identical:
    void AllocationRangeIterator::moveToNextBlob()
    {
        if (m_currentBlobInPool)
            m_currentBlobInPool = m_currentBlobInPool->getNextInList();
        while (!m_currentBlobInPool && m_currentPool)
        {
            m_currentPool = m_currentPool->getNextPoolInList();
            if (m_currentPool)
                m_currentBlobInPool = m_currentPool->m_firstBlob;
        }
    }
| { | ||
| m_range.allocationAddr = nullptr; | ||
| m_range.allocationSize = 0; | ||
| return; |
Maybe use else {} instead of return.
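To make the two suggestions in this part of the review concrete, here is a self-contained sketch that combines the simplified `moveToNextBlob()` quoted earlier with an `updateRange()` that uses `else` instead of an early `return`. `Blob`, `Pool`, `Range`, `PoolRangeIterator`, and `totalBytes` are simplified hypothetical stand-ins for the engine's types. Only the method names `getNextInList`, `getNextPoolInList`, `fillAllocatedRange`, and the member `m_firstBlob` mirror the excerpts, so this illustrates the control flow rather than the PR's actual class.

```cpp
#include <cassert>
#include <cstddef>

struct Range
{
    const char* allocationAddr;
    std::size_t allocationSize;
};

// Simplified stand-in for MemoryPoolBlob.
struct Blob
{
    const char* data;
    std::size_t size;
    Blob* next;

    Blob* getNextInList() const { return next; }
    void fillAllocatedRange(Range& range) const
    {
        range.allocationAddr = data;
        range.allocationSize = size;
    }
};

// Simplified stand-in for MemoryPool.
struct Pool
{
    Blob* m_firstBlob;
    Pool* next;

    Pool* getNextPoolInList() const { return next; }
};

class PoolRangeIterator
{
public:
    explicit PoolRangeIterator(Pool* firstPool)
        : m_currentPool(firstPool),
          m_currentBlobInPool(firstPool ? firstPool->m_firstBlob : nullptr)
    {
        if (!m_currentBlobInPool)
            moveToNextBlob(); // skip leading pools that have no blobs
        updateRange();        // the first range is valid right away
    }

    bool atEnd() const { return m_currentBlobInPool == nullptr; }
    const Range& range() const { return m_range; }

    void next()
    {
        moveToNextBlob();
        updateRange();
    }

private:
    void moveToNextBlob()
    {
        if (m_currentBlobInPool)
            m_currentBlobInPool = m_currentBlobInPool->getNextInList();
        while (!m_currentBlobInPool && m_currentPool)
        {
            m_currentPool = m_currentPool->getNextPoolInList();
            if (m_currentPool)
                m_currentBlobInPool = m_currentPool->m_firstBlob;
        }
    }

    void updateRange()
    {
        if (m_currentBlobInPool)
        {
            m_currentBlobInPool->fillAllocatedRange(m_range);
        }
        else
        {
            m_range.allocationAddr = nullptr;
            m_range.allocationSize = 0;
        }
    }

    Pool* m_currentPool;
    Blob* m_currentBlobInPool;
    Range m_range;
};

// Usage example: add up the sizes of all blobs across all pools.
std::size_t totalBytes(Pool* firstPool)
{
    std::size_t total = 0;
    for (PoolRangeIterator it(firstPool); !it.atEnd(); it.next())
        total += it.range().allocationSize;
    return total;
}
```

Calling `updateRange()` from the constructor also addresses the earlier concern about the very first iterator being malformed.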
| } | ||
| } | ||
|
|
||
| void MiniDumper::MoveToNextSingleBlock() |
MoveToNextAllocatorWithRawBlocks and MoveToNextSingleBlock can likely be one function like how you did it in the Memory Pool Iterator.
|
I'm considering reducing the scope of this PR a bit to make it easier to review etc.
Does this seem like a reasonable path forward? |
|
You can do that if you like. I think it would be possible to finalize this change as is as well. |
xezon
left a comment
Looking good.
| void MiniDumper::CreateMiniDump(DumpType dumpType) | ||
| { | ||
| // Create a unique dump file name, using the path from m_dumpDir, in m_dumpFile | ||
| // Create a unique dump file name, using the path from m_dumpDir, in m_dumpFile |
Trailing space :)
This change introduces the option to generate crash dumps, also known as minidumps, on fatal errors.
The main minidump functionality is done by explicitly loading dbghelp.dll from the system directory, as the dbghelp.dll that is bundled with the game is an older version that does not include this functionality. There is an option to create small dumps or full memory dumps; currently both are created.
Small dumps
These mostly contain the stacks of the process threads and some stack variables. The use case for these is to quickly determine where a crash occurred, the type of crash, whether it was already fixed, etc. In addition, if the memory allocation structures are corrupted enough, an extended info dump might not succeed while the small dump should. The size of these dumps is typically on the order of 250 kB.
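The loading approach described above could look roughly like the following sketch. It is not the PR's code: `joinWindowsPath`, `MiniDumpWriteDumpFn`, and `writeDump` are hypothetical names, and it includes a modern SDK's `<dbghelp.h>` for the type definitions, whereas the PR carries its own definitions. The Windows part is guarded so the path helper can still be compiled elsewhere.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins a directory and file name with a backslash.
std::string joinWindowsPath(const std::string& dir, const std::string& file)
{
    if (dir.empty())
        return file;
    const char last = dir[dir.size() - 1];
    if (last == '\\' || last == '/')
        return dir + file;
    return dir + "\\" + file;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>

// Function pointer type matching the documented MiniDumpWriteDump signature.
typedef BOOL (WINAPI *MiniDumpWriteDumpFn)(
    HANDLE, DWORD, HANDLE, MINIDUMP_TYPE,
    PMINIDUMP_EXCEPTION_INFORMATION,
    PMINIDUMP_USER_STREAM_INFORMATION,
    PMINIDUMP_CALLBACK_INFORMATION);

// Hypothetical sketch: loads dbghelp.dll from the system directory and
// writes either a small dump or a full memory dump to dumpPath.
bool writeDump(const char* dumpPath, EXCEPTION_POINTERS* exceptionPointers, bool fullMemory)
{
    char systemDir[MAX_PATH];
    const UINT length = GetSystemDirectoryA(systemDir, MAX_PATH);
    if (length == 0 || length >= MAX_PATH)
        return false;

    HMODULE dbgHelp = LoadLibraryA(joinWindowsPath(systemDir, "dbghelp.dll").c_str());
    if (!dbgHelp)
        return false;

    bool ok = false;
    MiniDumpWriteDumpFn writeFn = reinterpret_cast<MiniDumpWriteDumpFn>(
        GetProcAddress(dbgHelp, "MiniDumpWriteDump"));
    if (writeFn)
    {
        HANDLE file = CreateFileA(dumpPath, GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file != INVALID_HANDLE_VALUE)
        {
            MINIDUMP_EXCEPTION_INFORMATION info;
            info.ThreadId = GetCurrentThreadId();
            info.ExceptionPointers = exceptionPointers;
            info.ClientPointers = FALSE;

            const MINIDUMP_TYPE type = fullMemory ? MiniDumpWithFullMemory : MiniDumpNormal;
            ok = writeFn(GetCurrentProcess(), GetCurrentProcessId(), file, type,
                         exceptionPointers ? &info : NULL, NULL, NULL) != FALSE;
            CloseHandle(file);
        }
    }
    FreeLibrary(dbgHelp);
    return ok;
}
#endif
```

Building the full path before calling `LoadLibraryA` is what keeps the loader from picking up the older dbghelp.dll that sits next to the game executable.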
Full memory
These contain all the memory regions in the process, as they pass the MiniDumpWithFullMemory flag to MiniDumpWriteDump. They are considerably larger than the small dumps but compress relatively well. As an example, a ~400MB dump file is compressed to ~70MB using the Windows 11 built-in 7Z compression.
Storage Location
Crash dumps are stored in a new folder called 'CrashDumps' under the userDir ("Documents\Command and Conquer Generals Zero Hour Data"). On startup, the game creates this directory if it doesn't exist and deletes older dumps so that only the 10 newest small dumps and the 2 newest full memory dumps are left. This is to preserve disk space, as the full memory files can be several hundred MB.
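The retention rule above can be sketched with `std::filesystem`. This assumes C++17, which the VC6 build would not have, so it only illustrates the logic and is not the PR's implementation. `pruneOldDumps` and the idea of telling small and full dumps apart by a file-name prefix are assumptions, since the actual naming scheme is not given here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: creates `dir` if needed, then deletes all but the
// `keep` most recently modified regular files whose names start with
// `prefix`. Returns the number of files removed.
std::size_t pruneOldDumps(const fs::path& dir, const std::string& prefix, std::size_t keep)
{
    std::error_code ec;
    fs::create_directories(dir, ec);

    // Collect (modification time, path) for every matching dump file.
    std::vector<std::pair<fs::file_time_type, fs::path>> dumps;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec))
    {
        const std::string name = it->path().filename().string();
        if (it->is_regular_file(ec) && name.compare(0, prefix.size(), prefix) == 0)
            dumps.emplace_back(it->last_write_time(ec), it->path());
    }

    // Newest first, so everything past index `keep` is surplus.
    std::sort(dumps.begin(), dumps.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });

    std::size_t removed = 0;
    for (std::size_t i = keep; i < dumps.size(); ++i)
    {
        if (fs::remove(dumps[i].second, ec))
            ++removed;
    }
    return removed;
}
```

On startup this would be called once per dump kind, for example with a keep count of 10 for small dumps and 2 for full memory dumps, using whatever prefixes the dump file names actually carry.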
Integration points
For VS2022 builds, unhandled exceptions end up in the UnhandledExceptionFilter in WinMain, which then gets a reference to the actual exception that occurred and includes that in the dump.
For VC6 builds, unhandled exceptions are caught in the catch(...) blocks of GameEngine::execute, which then call RELEASE_CRASH. As there is no exception data available in this case to populate _EXCEPTION_POINTERS from, an intentional exception is triggered to get the trace of the current thread. This makes the stack traces for VC6 a bit more cryptic than those of VS2022 builds, as the C++ exception handling gets included in the trace.
Limitations
In the longer run we'll probably want to replace this code with a more mature solution, like CrashPad, but that currently depends on a newer compiler than VC6.
As the code is intended to be temporary, it's kept behind a new CMake feature so it can be easily removed. There are also some other decisions made with this in mind:
DbgHelpLoader_minidump.h file.