Firmware Update Spec Update #330
Update the Firmware Update spec

1. Based on feedback from the community and leadership, partial updates are no longer preferred because they cause version compatibility issues. The preferred option is to always update the system with one package containing the images of all components.
2. Update the sequence diagrams to reflect the updated firmware update steps, as follows:
   1. Download the firmware package to the MCU.
   2. Parse the components of the package.
   3. Verify the SoC manifest through a mailbox command (without setting it in Caliptra).
   4. Verify the downloaded MCU image by computing its SHA384 and comparing it with the SHA in the downloaded manifest.
   5. Verify the downloaded SoC images by computing their SHA384 values and comparing them with the SHAs in the downloaded manifest.
   6. Verify and activate Caliptra FW using the FIRMWARE_LOAD mbox command.
   7. Activate MCU FW using Hitless Update Reset.
   8. After boot, set the downloaded manifest in Caliptra using the SET_AUTH_MANIFEST mbox command.
   9. Use image loading to load the rest of the SoC images from the staging partition, then mark the pending partition as `ACTIVE`.
3. Update the API definition.
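The ordering of the update steps above can be sketched as a small step enum with a driver loop. This is only an illustration of the sequence: `UpdateStep`, `next`, and `run_update` are hypothetical names that do not come from the spec, and a real implementation cannot run as one linear loop because the Hitless Update Reset restarts the MCU partway through.

```rust
/// Hypothetical names for the update steps listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateStep {
    DownloadPackage,
    ParseComponents,
    VerifySocManifest,
    VerifyMcuImage,
    VerifySocImages,
    LoadCaliptraFw,
    HitlessUpdateReset,
    SetAuthManifest,
    LoadSocImages,
    Done,
}

impl UpdateStep {
    /// Returns the step that follows `self` in the proposed sequence.
    pub fn next(self) -> UpdateStep {
        match self {
            UpdateStep::DownloadPackage => UpdateStep::ParseComponents,
            UpdateStep::ParseComponents => UpdateStep::VerifySocManifest,
            UpdateStep::VerifySocManifest => UpdateStep::VerifyMcuImage,
            UpdateStep::VerifyMcuImage => UpdateStep::VerifySocImages,
            UpdateStep::VerifySocImages => UpdateStep::LoadCaliptraFw,
            UpdateStep::LoadCaliptraFw => UpdateStep::HitlessUpdateReset,
            UpdateStep::HitlessUpdateReset => UpdateStep::SetAuthManifest,
            UpdateStep::SetAuthManifest => UpdateStep::LoadSocImages,
            UpdateStep::LoadSocImages | UpdateStep::Done => UpdateStep::Done,
        }
    }
}

/// Runs the steps in order and stops at the first one that fails.
/// `run_step` is supplied by the caller and returns `true` on success.
/// On failure, the failing step is returned so nothing after it is activated.
pub fn run_update<F>(mut run_step: F) -> Result<(), UpdateStep>
where
    F: FnMut(UpdateStep) -> bool,
{
    let mut step = UpdateStep::DownloadPackage;
    while step != UpdateStep::Done {
        if !run_step(step) {
            return Err(step);
        }
        step = step.next();
    }
    Ok(())
}
```

The point of the sketch is that every verification step precedes the first activation step (`LoadCaliptraFw`), so a verification failure leaves the running firmware untouched.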
| else MCU RT or SoC image | ||
| API->>API: Verify through AUTHORIZE_AND_STASH<br/>Mailbox Command | ||
| end | ||
| API->>API: Verify SOC Manifest using VERIFY_MANIFEST mbox command |
Hi @mlvisaya,
Since we're updating the flow to verify the SoC Manifest and SoC Component first, followed by Caliptra FW and MCU RT, I wanted to highlight a potential issue:
Case:
If a fix is needed in MCU RT (e.g., a bug in the PLDM stack), it may not be possible to apply it due to the current verification sequence.
Explanation:
The verification of the SoC Manifest and/or SoC Component could fail because of the PLDM bug in MCU RT. This failure would prevent the update process from progressing to the MCU RT FW, effectively creating a deadlock. As a result, the update gets stuck in a loop and can never complete.
Hi @RaunakGu, thanks for the review! The SoC Manifest needs to be verified first because it is used for authorizing the MCU firmware. The SoC Manifest contains the SHA384 of the MCU firmware. If the Manifest is invalid or unauthorized, the MCU FW cannot be updated. I think the manifest verification needs to be bug-free since it is the basis of verification for the MCU FW; otherwise it will indeed lead to a bricked device.
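A minimal sketch of the digest check described above, assuming the manifest has already been verified and parsed into entries, and that the SHA384 of the downloaded image has been computed elsewhere. `ManifestEntry`, `image_id`, and `is_image_authorized` are hypothetical names for illustration; they are not the SoC Manifest layout or any Caliptra API.

```rust
/// Length of a SHA384 digest in bytes.
pub const SHA384_LEN: usize = 48;

/// Hypothetical parsed manifest entry: an image identifier and its expected digest.
pub struct ManifestEntry {
    pub image_id: u32,
    pub digest: [u8; SHA384_LEN],
}

/// Compares two digests without returning early on the first differing byte.
fn digests_match(a: &[u8; SHA384_LEN], b: &[u8; SHA384_LEN]) -> bool {
    let mut diff = 0u8;
    for i in 0..SHA384_LEN {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}

/// Returns true only if the manifest lists `image_id` and the computed
/// digest equals the digest recorded for it. An image that is missing
/// from the manifest is treated as unauthorized.
pub fn is_image_authorized(
    entries: &[ManifestEntry],
    image_id: u32,
    computed: &[u8; SHA384_LEN],
) -> bool {
    entries
        .iter()
        .find(|e| e.image_id == image_id)
        .map_or(false, |e| digests_match(&e.digest, computed))
}
```

Because the manifest is the only source of expected digests, an invalid manifest means no image can pass this check, which is the dependency the comment above points out.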
@mlvisaya, thank you.
It is not only the SoC Manifest: a SoC Component verification failure can cause the same issue, and MCU RT is mutable firmware, which can't be guaranteed to be bug-free.
Could you please read the explanation in the previous comment again?
@RaunakGu, the use case you mentioned here is updating MCU RT firmware when there is a bug in the SoC verification flow. I don't think the deadlock is caused by the verification sequence we proposed. In that situation, using a partial update could be helpful.
Actually, the previous update flow would not get stuck in this case:
Update Reset -> Caliptra ROM -> Verify FMC/RT -> Verify SoC Manifest -> MCU RT -> SoC Component
Verification started with immutable code (Caliptra ROM), and SoC Component verification was done only after the incoming MCU RT had been activated. This ensured that MCU RT was activated before the SoC Components were verified.
@mlvisaya, I see another challenge in meeting the FW update requirements for SSDs (NVMe Commit):
We also need a standalone verify-only operation for Caliptra FW.
Summary of the challenge:
NVMe Commit offers the configurations below. There are cases where immediate activation of the new incoming image is not warranted.
CA=000 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot. The newly placed image is not activated.
CA=001 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot. The newly placed image is activated at the next Controller Level Reset.
CA=010 - The existing image in the specified Firmware Slot is activated at the next Controller Level Reset.
CA=011 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot and is then activated immediately (similar to our hitless update).
-- My recommendation --
Post 2.1 (Reduced Caliptra Mailbox) – SoC Provides Protected SRAM Memory
Option 1: Mutable Verification
Flow: MCU RT updates protected SRAM with Caliptra FW and triggers Verify Only; verification is done by Caliptra RT FW (mutable).
Option 2: Immutable Verification
Flow: MCU RT updates protected SRAM with Caliptra FW and triggers Update Reset; Caliptra ROM (immutable) verifies but does not activate the image.
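To make the Commit Action cases listed in this comment concrete, here is a sketch that decodes the four CA values quoted above and reports when activation happens. The enum and method names are hypothetical, the descriptions follow the comment rather than the NVMe specification text, and other CA encodings are deliberately left undecoded.

```rust
/// When a committed image becomes active, per the cases quoted above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Activation {
    /// The image is stored but not activated (verify-only / stage-only).
    None,
    /// The image is activated at the next Controller Level Reset.
    NextReset,
    /// The image is activated immediately, similar to a hitless update.
    Immediate,
}

/// The four Commit Action values discussed in the comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitAction {
    ReplaceNoActivate,       // CA=000
    ReplaceActivateOnReset,  // CA=001
    ActivateExistingOnReset, // CA=010
    ActivateImmediately,     // CA=011
}

impl CommitAction {
    /// Decodes the CA field. Values not covered in the comment return `None`.
    pub fn from_bits(ca: u8) -> Option<CommitAction> {
        match ca {
            0b000 => Some(CommitAction::ReplaceNoActivate),
            0b001 => Some(CommitAction::ReplaceActivateOnReset),
            0b010 => Some(CommitAction::ActivateExistingOnReset),
            0b011 => Some(CommitAction::ActivateImmediately),
            _ => None,
        }
    }

    /// Reports when the selected image becomes active.
    pub fn activation(self) -> Activation {
        match self {
            CommitAction::ReplaceNoActivate => Activation::None,
            CommitAction::ReplaceActivateOnReset
            | CommitAction::ActivateExistingOnReset => Activation::NextReset,
            CommitAction::ActivateImmediately => Activation::Immediate,
        }
    }
}
```

Only `Activation::Immediate` maps onto a flow that verifies and activates in one shot; the other cases are the ones that would need the standalone verify-only operation requested above.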
| e. If all images have been loaded correctly, then the partition is marked as `ACTIVE`. | ||
| **Detailed steps:** | ||
| Another option is to separate the images to multiple components in a PLDM package. Refer to the [Appendix](#alternative-approach-updating-the-full-flash-image-as-multiple-pldm-firmware-components) for the detailed steps of this approach. |
> Based on feedback from the community and leadership, partial updates are no longer preferred because they cause version compatibility issues. The preferred option is to always update the system with one package containing the images of all components.

@mlvisaya,
Since partial updates have known compatibility issues and also add complexity in maintaining consistent firmware states, can we remove the Partial Update section entirely from the documentation and implementation scope?
When do you plan to merge this PR?
| /// Define the callback function signature for firmware update events. | ||
| /// Returns Ok(()) if the notification is handled successfully, otherwise an error code. | ||
| pub type FirmwareUpdateCallback = fn(FirmwareUpdateNotification) -> Result<(),ErrorCode>; | ||
| ### Alternative approach: Updating the full flash image as multiple PLDM firmware components |
It might be worth adding the pros and cons of the approach, not necessarily as part of this PR, but it would be good if we could open another issue and write them down.
One good thing about exposing the components to an agent is that it allows for cases where different owners own different pieces of the FW components (e.g., silicon vendor vs. OEM/end customer).
I think multiple components lend themselves better to hitless updates and partial updates, although it is not entirely clear how exposing components to the UA, versus having subcomponents that are invisible to the UA, would prevent us from implementing something. It just depends on where component-specific update policies can be applied, I suppose.
Anything else? Just thinking out loud here. I'm sure that if we discuss this more, I can think of more.
Another issue we have had internally is enforcing SVNs across different components, where we have explicit commands (as in the OCP GPU/Accelerator spec) to update the SVN and do not tie it to a FW update (i.e., a FW update does not imply an SVN update).
There are also potential implications around whether or not there is fallback and whether or not images are redundant.
Another thing that comes to mind is how things like Redfish inventory would work, where a platform BMC or platform-level consumer can find out the version of each subcomponent (e.g., during debug; sure, we can look it up, but it's better if the system is self-describing).
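For the `FirmwareUpdateCallback` alias shown in the diff hunk above this thread, a usage sketch may help. Only the alias itself comes from the PR; the variants of `FirmwareUpdateNotification` and `ErrorCode`, and the `reject_activation_handler` and `notify` functions, are placeholders invented for this example.

```rust
/// Placeholder notification kinds; the real enum is defined by the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FirmwareUpdateNotification {
    DownloadComplete,
    VerifyComplete,
    ActivateRequested,
}

/// Placeholder error codes; the real type comes from the platform.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    Fail,
    Unsupported,
}

/// Callback signature as written in the PR diff.
pub type FirmwareUpdateCallback = fn(FirmwareUpdateNotification) -> Result<(), ErrorCode>;

/// Example handler: accepts progress notifications but refuses activation.
pub fn reject_activation_handler(n: FirmwareUpdateNotification) -> Result<(), ErrorCode> {
    match n {
        FirmwareUpdateNotification::ActivateRequested => Err(ErrorCode::Unsupported),
        _ => Ok(()),
    }
}

/// Invokes the registered callback and passes its result back to the caller.
pub fn notify(
    cb: FirmwareUpdateCallback,
    n: FirmwareUpdateNotification,
) -> Result<(), ErrorCode> {
    cb(n)
}
```

Because the alias is a plain `fn` pointer, only free functions and non-capturing closures can be registered; a handler that needs state would have to reach it through some other mechanism.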
| 7. When the Update Agent issues the `ActivateFirmware` command: | ||
| a. The MCU authorizes and activates the Caliptra core FW in one shot using the `CALIPTRA_FW_UPLOAD` mailbox command. This will also reset Caliptra core and boot up with the updated image. | ||
| b. MCU updates the staging flash partition status to 'VALID'. | ||
| c. MCU will perform a Hitless Update Reset to reset the MCU. MCU copies the MCU RT image from the staging memory to the DMA staging address allocated to it in the SoC Manifest. MCU then sends `ACTIVATE_FIRMWARE` mailbox command to Caliptra. Caliptra will then initiate the reset of the MCU and set the `RESET_REASON` to `FW_HITLESS_UPD_RESET`. Refer to the [MCU Hitless Update](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSIntegrationSpecification.md#mcu-hitless-fw-update) section of the Caliptra subsystem integration specification for the details of the MCU Hitless Update Reset flow. |
So it isn't entirely clear how the SoC FW would get activated. What if the SoC FW, or a component of the SoC FW, needs a reset of the device to activate?
| a. The MCU authorizes and activates the Caliptra core FW in one shot using the `CALIPTRA_FW_UPLOAD` mailbox command. This will also reset Caliptra core and boot up with the updated image. | ||
| b. MCU updates the staging flash partition status to 'VALID'. | ||
| c. MCU will perform a Hitless Update Reset to reset the MCU. MCU copies the MCU RT image from the staging memory to the DMA staging address allocated to it in the SoC Manifest. MCU then sends `ACTIVATE_FIRMWARE` mailbox command to Caliptra. Caliptra will then initiate the reset of the MCU and set the `RESET_REASON` to `FW_HITLESS_UPD_RESET`. Refer to the [MCU Hitless Update](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSIntegrationSpecification.md#mcu-hitless-fw-update) section of the Caliptra subsystem integration specification for the details of the MCU Hitless Update Reset flow. | ||
| d. MCU boots up with the updated MCU image. During Image loading, if the `RESET_REASON` is `FW_HITLESS_UPD_RESET`, MCU will check if there are non-active `VALID` partitions. This means that there are downloaded update images before reboot. If there are, then MCU will try to load SoC images from that partition. |
Power loss resiliency could be addressed in the doc as a separate issue.
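The boot-time check in step d of the hunk above can be sketched as a partition selection function. The type names and `select_partition` are hypothetical, and falling back to the `ACTIVE` partition when no `VALID` one exists is an assumption of this sketch rather than something the spec text states.

```rust
/// Staging partition states named in the spec text, plus a placeholder
/// `Invalid` state for partitions holding no usable image.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionStatus {
    Invalid,
    Valid,
    Active,
}

/// Reset reasons relevant to this sketch; `ColdBoot` is a placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResetReason {
    ColdBoot,
    FwHitlessUpdReset,
}

/// Picks the partition index to load SoC images from.
/// After a hitless update reset, a non-active `Valid` partition (freshly
/// downloaded images) is preferred. Otherwise, or if none exists, the
/// `Active` partition is used (assumed fallback).
pub fn select_partition(reason: ResetReason, parts: &[PartitionStatus]) -> Option<usize> {
    if reason == ResetReason::FwHitlessUpdReset {
        if let Some(i) = parts.iter().position(|s| *s == PartitionStatus::Valid) {
            return Some(i);
        }
    }
    parts.iter().position(|s| *s == PartitionStatus::Active)
}
```

The sketch does not cover power loss between marking a partition `VALID` and marking it `ACTIVE`, which is the resiliency gap raised in the comment above.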