@mlvisaya
Collaborator

Update the Firmware Update spec

  1. Based on feedback from the community and leadership, partial updates are no longer preferred because they cause version compatibility issues. The preferred option is to always update the system with one package containing the images of all components.

  2. Update the sequence diagrams to reflect the updated firmware update steps, as follows:

     1. Download the firmware package to the MCU.
     2. Parse the components of the package.
     3. Verify the SoC manifest through a mailbox command (without setting it in Caliptra).
     4. Verify the downloaded MCU image by computing its SHA384 and comparing it with the SHA in the downloaded manifest.
     5. Verify the downloaded SoC images by computing their SHA384 digests and comparing them with the SHAs in the downloaded manifest.
     6. Verify and activate the Caliptra FW using the FIRMWARE_LOAD mbox command.
     7. Activate the MCU FW using Hitless Update Reset.
     8. After boot, set the downloaded manifest in Caliptra using the SET_AUTH_MANIFEST mbox command.
     9. Use image loading to load the rest of the SoC images from the staging partition, then mark the pending partition as `ACTIVE`.

  3. Update the API definition
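
The ordering of the update steps in the description can be sketched as a small state machine. This is only an illustration of the sequence: the `UpdateStep` names and the `run_update` helper are hypothetical and are not part of the spec or the API in this PR.

```rust
/// Hypothetical names for the update steps listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpdateStep {
    DownloadPackage,
    ParseComponents,
    VerifySocManifest,
    VerifyMcuImage,
    VerifySocImages,
    ActivateCaliptraFw,
    ActivateMcuFw,
    SetAuthManifest,
    LoadSocImages,
    Done,
}

impl UpdateStep {
    /// Returns the step that follows `self`; `Done` is terminal.
    fn next(self) -> UpdateStep {
        use UpdateStep::*;
        match self {
            DownloadPackage => ParseComponents,
            ParseComponents => VerifySocManifest,
            VerifySocManifest => VerifyMcuImage,
            VerifyMcuImage => VerifySocImages,
            VerifySocImages => ActivateCaliptraFw,
            ActivateCaliptraFw => ActivateMcuFw,
            ActivateMcuFw => SetAuthManifest,
            SetAuthManifest => LoadSocImages,
            LoadSocImages => Done,
            Done => Done,
        }
    }
}

/// Runs the steps in order and stops at the first failure.
/// `run_step` is a caller-supplied stand-in for the real work of each step;
/// on failure the failing step is returned as the error.
fn run_update<F>(mut run_step: F) -> Result<(), UpdateStep>
where
    F: FnMut(UpdateStep) -> bool,
{
    let mut step = UpdateStep::DownloadPackage;
    while step != UpdateStep::Done {
        if !run_step(step) {
            return Err(step);
        }
        step = step.next();
    }
    Ok(())
}
```

With this shape, a failure in any verification step returns before either activation step is reached, which is the property the new ordering is meant to provide.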

else MCU RT or SoC image
API->>API: Verify through AUTHORIZE_AND_STASH<br/>Mailbox Command
end
API->>API: Verify SOC Manifest using VERIFY_MANIFEST mbox command

@RaunakGu RaunakGu Aug 11, 2025

Hi @mlvisaya,
Since we're updating the flow to verify the SoC Manifest and SoC Components first, followed by Caliptra FW and MCU RT, I wanted to highlight a potential issue:

Case:
If a fix is needed in MCU RT (e.g., a bug in the PLDM stack), it may not be possible to apply it because of the current verification sequence.

Explanation:
Verification of the SoC Manifest and/or SoC Components could fail because of the PLDM bug in MCU RT. That failure would prevent the update process from ever reaching the MCU RT FW, effectively creating a deadlock: the update gets stuck in a loop and can never complete.

Collaborator Author

Hi @RaunakGu, thanks for the review! The SoC Manifest needs to be verified first because it is used to authorize the MCU firmware: the SoC Manifest contains the SHA384 of the MCU firmware. If the Manifest is invalid or unauthorized, the MCU FW cannot be updated. I think the manifest verification needs to be bug-free since it is the basis of verification for the MCU FW; otherwise it will indeed lead to a bricked device.
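
To make the dependency concrete, here is a minimal sketch of the digest check described above. It assumes the SHA384 of the image has already been computed elsewhere (for example by Caliptra), and the `ManifestEntry` layout, `image_id` field, and function names are hypothetical, not the actual manifest format.

```rust
/// Length of a SHA-384 digest in bytes.
const SHA384_LEN: usize = 48;

/// Hypothetical view of one manifest entry: an image identifier and the
/// digest the manifest expects for that image.
struct ManifestEntry {
    image_id: u32,
    digest: [u8; SHA384_LEN],
}

/// Compares two digests without returning early on the first mismatch.
fn digests_match(a: &[u8; SHA384_LEN], b: &[u8; SHA384_LEN]) -> bool {
    let mut diff = 0u8;
    for i in 0..SHA384_LEN {
        diff |= a[i] ^ b[i];
    }
    diff == 0
}

/// Looks up `image_id` in an already-verified manifest and checks the
/// computed digest against the expected one. An image that is not listed
/// in the manifest is never authorized.
fn image_is_authorized(
    manifest: &[ManifestEntry],
    image_id: u32,
    computed: &[u8; SHA384_LEN],
) -> bool {
    manifest
        .iter()
        .find(|entry| entry.image_id == image_id)
        .map_or(false, |entry| digests_match(&entry.digest, computed))
}
```

The check is only meaningful if the manifest itself has been verified first, which is why the manifest comes first in the sequence.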

@mlvisaya, thank you.

It is not only the SoC Manifest: a SoC Component verification failure can cause the same issue,
and MCU RT is mutable firmware, which can't be guaranteed to be bug-free.

Could you please read the explanation in my previous comment again?

Collaborator

@RaunakGu, the use case you describe is updating the MCU RT firmware when there is a bug in the SoC verification flow. I don't think the deadlock is caused by the verification sequence we proposed. In that situation, a partial update could be helpful.

@RaunakGu RaunakGu Aug 12, 2025

Actually, the previous update flow would not get stuck in this case:

Update Reset -> Caliptra ROM -> Verify FMC/RT -> Verify SoC Manifest -> MCU RT -> SoC Component

Verification started with immutable code (Caliptra ROM). SoC Component verification was done only after the incoming MCU RT had been activated, which ensured that the new MCU RT was running before the SoC Components were verified.

@mlvisaya, I see another challenge in meeting the FW update requirements for SSDs (NVMe Firmware Commit):
We also need a standalone verify-only operation for Caliptra FW.

Summary of the challenge:
NVMe Firmware Commit offers the Commit Action (CA) options below. In some of them, immediate activation of the incoming image is not wanted.

CA=000 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot. The newly placed image is not activated.
CA=001 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot. The newly placed image is activated at the next Controller Level Reset.
CA=010 - The existing image in the specified Firmware Slot is activated at the next Controller Level Reset.
CA=011 - Downloaded image replaces the existing image, if any, in the specified Firmware Slot and is then activated immediately. This is similar to our hitless update.
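
As a rough illustration of why a verify-only path matters, the four CA values above can be modeled as below. The enum and method names are hypothetical and only cover the values listed in this comment; any other encoding is treated as unknown.

```rust
/// The four Commit Action values quoted in this comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitAction {
    ReplaceNoActivate,        // CA=000
    ReplaceActivateOnReset,   // CA=001
    ActivateExistingOnReset,  // CA=010
    ReplaceActivateImmediate, // CA=011
}

impl CommitAction {
    /// Decodes the CA field; values not listed above return `None`.
    fn from_bits(ca: u8) -> Option<Self> {
        match ca {
            0b000 => Some(Self::ReplaceNoActivate),
            0b001 => Some(Self::ReplaceActivateOnReset),
            0b010 => Some(Self::ActivateExistingOnReset),
            0b011 => Some(Self::ReplaceActivateImmediate),
            _ => None,
        }
    }

    /// True when a newly downloaded image is placed in the slot.
    fn replaces_image(self) -> bool {
        !matches!(self, Self::ActivateExistingOnReset)
    }

    /// True when activation happens right away (hitless-update-like).
    fn activates_immediately(self) -> bool {
        matches!(self, Self::ReplaceActivateImmediate)
    }

    /// True when the new image has to be verified now but must not be
    /// activated as part of the same operation.
    fn needs_verify_only(self) -> bool {
        self.replaces_image() && !self.activates_immediately()
    }
}
```

Under this model, CA=000 and CA=001 are the cases that cannot be served by a combined verify-and-activate command.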

-- My recommendation --
Post 2.1 (Reduced Caliptra Mailbox) – SoC Provides Protected SRAM Memory
Option 1: Mutable Verification
Flow: MCU RT updates protected SRAM with Caliptra FW and triggers Verify Only; verification is done by Caliptra RT FW (mutable).

Option 2: Immutable Verification
Flow: MCU RT updates protected SRAM with Caliptra FW and triggers Update Reset; Caliptra ROM (immutable) verifies but does not activate the image.

e. If all images have been loaded correctly, then the partition is marked as `ACTIVE`.

**Detailed steps:**
Another option is to separate the images to multiple components in a PLDM package. Refer to the [Appendix](#alternative-approach-updating-the-full-flash-image-as-multiple-pldm-firmware-components) for the detailed steps of this approach.

From the feedback from community and leadership, partial updates are no longer preferred as they cause version compatibility issues. The preferred option is to always update the system with one package containing all the images of all the components.

@mlvisaya,
Since partial updates have known compatibility issues and also add complexity in keeping firmware states consistent, can we remove the Partial Update section entirely from the documentation and implementation scope?

When do you plan to merge this PR?

/// Define the callback function signature for firmware update events.
/// Returns Ok(()) if the notification is handled successfully, otherwise an error code.
pub type FirmwareUpdateCallback = fn(FirmwareUpdateNotification) -> Result<(), ErrorCode>;
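
A minimal sketch of how a callback with this signature might be registered and invoked. The `FirmwareUpdateNotification` variants, the `ErrorCode` variant, and `notify_all` are hypothetical stand-ins; the real definitions are whatever the API in this PR provides.

```rust
/// Hypothetical notification variants standing in for the API's enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FirmwareUpdateNotification {
    Started,
    Complete,
    Aborted,
}

/// Hypothetical error type standing in for the API's `ErrorCode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    NotSupported,
}

/// Same shape as the callback type in the diff.
type FirmwareUpdateCallback = fn(FirmwareUpdateNotification) -> Result<(), ErrorCode>;

/// Example handler: accepts start/complete events and reports that it
/// does not handle aborts.
fn on_update_event(event: FirmwareUpdateNotification) -> Result<(), ErrorCode> {
    match event {
        FirmwareUpdateNotification::Started | FirmwareUpdateNotification::Complete => Ok(()),
        FirmwareUpdateNotification::Aborted => Err(ErrorCode::NotSupported),
    }
}

/// Delivers `event` to each callback in order and stops at the first error.
fn notify_all(
    callbacks: &[FirmwareUpdateCallback],
    event: FirmwareUpdateNotification,
) -> Result<(), ErrorCode> {
    for callback in callbacks {
        callback(event)?;
    }
    Ok(())
}
```

Because the type is a plain `fn` pointer, only free functions and non-capturing closures can be used as callbacks; a handler that needs state would have to reach it some other way.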
### Alternative approach: Updating the full flash image as multiple PLDM firmware components
Contributor

@raghuncstate raghuncstate Sep 14, 2025

It might be worth adding the pros and cons of this approach, not necessarily as part of this PR; it would be good to open another issue and write them down.
One good thing about exposing the components to an agent is that it allows for cases where different owners own different FW components (e.g., silicon vendor vs. OEM/end customer).
I think multiple components lend themselves better to hitless updates and partial updates, although it is not entirely clear how exposing components to the UA, versus having subcomponents invisible to the UA, would prevent us from implementing something. It just depends on where component-specific update policies can be applied, I suppose.
Anything else? Just thinking out loud here. I'm sure that if we discuss more, I can think of more.
Another issue we have had internally is enforcing SVNs across different components, where we have explicit commands (as in the OCP GPU/Accelerator spec) to update the SVN rather than tying it to a FW update (i.e., a FW update does not imply an SVN update).
There are also potential implications around whether or not to support fallback and image redundancy.
Another thing that comes to mind is how things like Redfish inventory would work, where a platform BMC or platform-level consumer can find out the version of each subcomponent (e.g., during debug; sure, we can look it up, but it's better if the system is self-describing).

7. When the Update Agent issues the `ActivateFirmware` command:
a. The MCU authorizes and activates the Caliptra core FW in one shot using the `CALIPTRA_FW_UPLOAD` mailbox command. This will also reset Caliptra core and boot up with the updated image.
b. MCU updates the staging flash partition status to 'VALID'.
c. MCU will perform a Hitless Update Reset to reset the MCU. MCU copies the MCU RT image from the staging memory to the DMA staging address allocated to it in the SoC Manifest. MCU then sends `ACTIVATE_FIRMWARE` mailbox command to Caliptra. Caliptra will then initiate the reset of the MCU and set the `RESET_REASON` to `FW_HITLESS_UPD_RESET`. Refer to the [MCU Hitless Update](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSIntegrationSpecification.md#mcu-hitless-fw-update) section of the Caliptra subsystem integration specification for the details of the MCU Hitless Update Reset flow.
Contributor

So it isn't entirely clear how the SoC FW would get activated. What if the SoC FW, or a component of it, needs a device reset to activate?

a. The MCU authorizes and activates the Caliptra core FW in one shot using the `CALIPTRA_FW_UPLOAD` mailbox command. This will also reset Caliptra core and boot up with the updated image.
b. MCU updates the staging flash partition status to 'VALID'.
c. MCU will perform a Hitless Update Reset to reset the MCU. MCU copies the MCU RT image from the staging memory to the DMA staging address allocated to it in the SoC Manifest. MCU then sends `ACTIVATE_FIRMWARE` mailbox command to Caliptra. Caliptra will then initiate the reset of the MCU and set the `RESET_REASON` to `FW_HITLESS_UPD_RESET`. Refer to the [MCU Hitless Update](https://github.com/chipsalliance/caliptra-ss/blob/main/docs/CaliptraSSIntegrationSpecification.md#mcu-hitless-fw-update) section of the Caliptra subsystem integration specification for the details of the MCU Hitless Update Reset flow.
d. MCU boots up with the updated MCU image. During Image loading, if the `RESET_REASON` is `FW_HITLESS_UPD_RESET`, MCU will check if there are non-active `VALID` partitions. This means that there are downloaded update images before reboot. If there are, then MCU will try to load SoC images from that partition.
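
The selection logic in step d can be sketched as follows. The `Partition` layout, the `Other` reset reason, and the fallback to the currently active partition when no staged partition is found are assumptions made for illustration; the spec text above only describes the hitless-update case.

```rust
/// Reset reasons relevant to this sketch. Only the hitless-update reason
/// comes from the text above; `Other` is a hypothetical catch-all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResetReason {
    FwHitlessUpdReset,
    Other,
}

/// Hypothetical subset of flash partition states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartitionStatus {
    Invalid,
    Valid,
}

/// Hypothetical partition descriptor.
struct Partition {
    id: u8,
    status: PartitionStatus,
    active: bool,
}

/// Picks the partition to load SoC images from.
/// After a hitless update reset, a non-active `Valid` partition holds the
/// images downloaded before the reboot and is preferred. Otherwise (an
/// assumption of this sketch) the currently active partition is used.
fn select_soc_image_source(reason: ResetReason, partitions: &[Partition]) -> Option<u8> {
    if reason == ResetReason::FwHitlessUpdReset {
        let staged = partitions
            .iter()
            .find(|p| !p.active && p.status == PartitionStatus::Valid);
        if let Some(p) = staged {
            return Some(p.id);
        }
    }
    partitions.iter().find(|p| p.active).map(|p| p.id)
}
```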
Contributor

Power loss resiliency could be addressed in the doc as a separate issue.
