-
Notifications
You must be signed in to change notification settings - Fork 112
AdvLoggerPkg: Fix memory bucket stability #799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/202502
Are you sure you want to change the base?
AdvLoggerPkg: Fix memory bucket stability #799
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release/202502 #799 +/- ##
================================================
Coverage ? 4.36%
================================================
Files ? 36
Lines ? 4007
Branches ? 0
================================================
Hits ? 175
Misses ? 3832
Partials ? 0
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
My understanding is that there are at least two additional changes likely needed.
For others that have worked more with adv logger, is that true? Also, any quick sanity checks of the current changes (while those updates are being made) is appreciated. |
os-d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted offline, I think this works, but we should only do this if we aren't doing the fixed buffer region
85a781d to
af44b6e
Compare
d326db1 to
2c59de3
Compare
|
@cfernald, @kuqin12, @os-d, this is now ready for review. We do plan to move to a fixed runtime region allocated in PEI. This is being checked into release/202502 which is needed by current platforms. The PEI fixed buffer may or may not be something we want to risk merging into 202502 (leaning more toward not). This can simply be adjusted at that time to not perform the buffer migration. That already is controlled with It could also be dropped entirely at that time if all platforms based on future release branches are forced to switch to a fixed PEI RT memory region. |
AdvLoggerPkg/Library/AdvancedLoggerLib/DxeCore/AdvancedLoggerLib.c
Outdated
Show resolved
Hide resolved
| if (ExistingLoggerInfo->LogCurrentOffset > ExistingLoggerInfo->LogBufferOffset) { | ||
| CopyMem ( | ||
| LOG_BUFFER_FROM_ALI (NewLoggerInfo), | ||
| LOG_BUFFER_FROM_ALI (ExistingLoggerInfo), | ||
| USED_LOG_SIZE (ExistingLoggerInfo) | ||
| ); | ||
| NewLoggerInfo->LogCurrentOffset = NewLoggerInfo->LogBufferOffset + USED_LOG_SIZE (ExistingLoggerInfo); | ||
| } | ||
|
|
||
| NewLoggerInfo->DiscardedSize = ExistingLoggerInfo->DiscardedSize; | ||
|
|
||
| // | ||
| // Set the old logger info's NewLoggerInfoAddress to redirect to the new buffer | ||
| // | ||
| ExistingLoggerInfo->NewLoggerInfoAddress = PA_FROM_PTR (NewLoggerInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this window it is possible that we lose logs from an interrupt, because we copy over, then set the NewLoggerInfoAddress, so we could copy, interrupt, log to old buffer, come back and set NewLoggerInfoAddress and never get the interrupt logs. Perhaps we can:
- Write LogCurrentOffset to NewBufferInfo
- Set NewLoggerInfoAddress (and probably should do that with a compare exchange to ensure something doesn't see a partial write again in the interrupt case, like we do for the LogCurrentOffset)
- Do the copy from LogBufferOffset -> OldInfo->LogCurrentOffset
Probably simpler to disable interrupts while doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I raised to TPL_NOTIFY during the migration code. This prevents all practical notifies from interrupting during the full migration section and complies with UEFI spec restrictions around TPL for protocol services invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might still lose logs from an interrupt, though. Say a timer interrupt (or whatever) fires after the copy, before the new address set, so it will log to the old log, but here we've already done the copy and so miss out on those logs. I think it would be safer to entirely disable interrupts (could do TPL_HIGH, but also messing with TPLs can also bring headaches), so could use the CpuArch protocol (definitely available by end of DXE) to disable/enable interrupts during this time period.
If we just do that for the copy and setting the new logger address, should be fine, but I guess the driver lib is relying on the protocol being produced again, right? In which case that gets a little hairier. We can say that we might miss logs in this window and that's okay, but doesn't feel very "advanced" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Timer notifications are registered to
TPL_NOTIFYor lower. -
Timer events are queued at their trigger time and signaled when the TPL is lowered beyond their registered level. For example, a
TPL_NOTIFYtimer notification will not interrupt code executing atTPL_NOTIFY. -
Execution of services from another interrupt source would generally be a violation of calling restrictions if calling into UEFI sources managed by TPL.
-
OnEndOfDxe()is executed atTPL_CALLBACK.- So, a timer notification at
TPL_NOTIFYcould interrupt it
- So, a timer notification at
-
This change surrounds the actual migration section with
TPL_NOTIFYso normal timer notifications are not an issue.
Going to TPL_HIGH requires reducing the guarded section (because of calling restrictions on functions called which try to raise to TPL_NOTIFY and cause TPL inversion). I'd like to keep the set of migration code in the TPL area together right now which includes some of those calling restrictions.
The more likely case may be that an asynchronous (hardware SMI) occurs during the copy, but SMIs are not disabled with normal CPU interrupts anyway.
I guess that it's not an issue to proactively disable interrupts entirely, but in that case, I will register OnEndOfDxe() at TPL_NOTIFY itself and then disable interrupts internally in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about that a bit more, it might make the most sense to unblock the new region, trigger a SW MMI, and then copy in MM which would have isolation against all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer interrupt is a bad example, because you are right that will get dispatched at TPL_NOTIFY (though any logs in the core handling of the timer interrupt and queueing we would miss). For something like the ARM64 hardware interrupt protocol, an interrupt will get generated and the exception handler will directly call the handler. There's an equivalent on the x86 side if different interrupts are registered for.
Thinking about that a bit more, it might make the most sense to unblock the new region, trigger a SW MMI, and then copy in MM which would have isolation against all cases.
That makes sense to me and as you say gives greater isolation.
Explain what the versioning scheme is for the advanced logger strucuture and messages. Signed-off-by: Michael Kubacki <[email protected]>
2c59de3 to
c6abf47
Compare
Right now, the PEI Core instance of AdvancedLoggerLib will always allocate the logger buffer as `EfiRuntimeServicesData`. This means it is not allocated into the DXE Core managed RT Services Data memory bucket. The DXE Core instance of AdvancedLoggerLib either continues to use this buffer it is provided or allocates a new `EfiReservedMemoryType` buffer. This change always allocates a `EfiReservedMemoryType` buffer in the DXE Core instance to ensure it is allocated from the reserved memory type buckets setup by the DXE Core. The PEI Core logger buffer is updated to be `EfiBootServicesData` so it does not affect runtime allocations. Signed-off-by: Michael Kubacki <[email protected]>
Update the library to use the standard C predefined identifier `__func__` instead of `__FUNCTION__` to improve code portability. This was started in some recently modified lines in the library. The remaining instances have been updated for consistency in this change. Signed-off-by: Michael Kubacki <[email protected]>
c6abf47 to
d29e790
Compare
Description
Right now, the PEI Core instance of AdvancedLoggerLib will always allocate the logger buffer as
EfiRuntimeServicesData. This means it is not allocated into the DXE Core managed RT Services Data memory bucket. The DXE Core instance of AdvancedLoggerLib either continues to use this buffer it is provided or allocates a newEfiReservedMemoryTypebuffer.This change always allocates a
EfiReservedMemoryTypebuffer in the DXE Core instance to ensure it is allocated from the reserved memory type buckets setup by the DXE Core. The PEI Core logger buffer is updated to beEfiBootServicesDataso it does not affect runtime allocations.How This Was Tested
Integration Instructions