feat(adr): ADR-0010 (TrustyAI SDK) #53
Conversation
> - **CLI interface** providing command-line access to all provider functionality
>
> 3. **Distribution**:
>    - Installable via `pip install trustyai`
I expect this to be more like `pip install trustyai-sdk` or `pip install trustyai[sdk]`. Or is the proposal to replace the current `trustyai` python library that includes the core algorithms with the SDK?
Good question @danielezonca. The idea is to replace the TrustyAI "core" indeed.
Actually, I think we could do it the other way around, i.e. `pip install trustyai` for the SDK and `pip install trustyai[cli]` for the CLI, for instance, since the SDK doesn't need the CLI to work, but not the other way around.
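For reference, that split could be expressed with a standard setuptools extra; a minimal sketch only (the package layout and the dependency names are assumptions, not the final packaging):

```python
# Illustrative packaging sketch: the SDK installs by default, the CLI is an extra.
from setuptools import setup, find_packages

setup(
    name="trustyai",
    packages=find_packages(),
    install_requires=["pandas", "pydantic"],   # SDK-only dependencies (assumed)
    extras_require={"cli": ["click"]},         # pulled in by `pip install trustyai[cli]`
)
```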
> - Provides programmatic access to all AI safety capabilities
> - Can be used directly in Python applications or as a core foundation for services
>
> 4. **Broad target support**:
I would include in this list "distribution projects" like Kubeflow (see the Kubeflow SDK) or llama-stack.
We want to make sure it is easy to use the SDK to bring TrustyAI capabilities into similar distributions.
> **Provider Design Principles:**
>
> 1. **Provider-based design**: Each AI safety scope will be represented by a `Provider` interface that defines the capabilities for that domain.
This goal seems to overlap with the llama-stack Safety provider.
As far as I can see there are some overlaps, and the local vs. Kubernetes abstraction looks similar to llama-stack's local vs. remote concept.
The scope of APIs (providers) expected to be covered here seems larger than llama-stack's, but I would like to clarify the correlation here, e.g. how the TrustyAI EvaluationProvider compares to the llama-stack Eval API.
@danielezonca @evaline-ju Very good question (I'll also address @evaline-ju's comment here).
It's no secret that Llama Stack's design has been a big inspiration for the TrustyAI SDK proposal 🙂
Regarding the local vs. remote and local vs. Kubernetes distinction, I see a few key differences. The TrustyAI SDK specialises in two specific deployment targets: local and Kubernetes. These have been TrustyAI's infrastructure priorities. By targeting Kubernetes specifically (rather than a generic "remote" approach), I believe we can have deeper integration with Kubernetes and OpenShift than Llama Stack currently offers.
This specialised approach would allow us to provide common "core" methods and patterns for handling cluster resources, including general resource factories (utilities for translating parameters to Custom Resources and validating them), error handling frameworks, and other capabilities that can be reused across all SDK Providers.
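To make the "resource factory" idea concrete, here is a minimal sketch of what such a utility could look like (the function names and the CR fields below are illustrative assumptions, not the final design):

```python
# Illustrative only: a "resource factory" translating SDK parameters into a
# Custom Resource dict that a Kubernetes provider could validate and submit.
# The CR shape (apiVersion, kind, spec fields) is assumed for the example.
from typing import Any, Dict, List, Optional


def build_eval_job_cr(name: str, model: str, tasks: List[str],
                      limit: Optional[int] = None) -> Dict[str, Any]:
    """Build a hypothetical LMEvalJob-style Custom Resource from SDK parameters."""
    cr: Dict[str, Any] = {
        "apiVersion": "trustyai.opendatahub.io/v1alpha1",  # assumed group/version
        "kind": "LMEvalJob",
        "metadata": {"name": name},
        "spec": {"model": model, "taskList": {"taskNames": tasks}},
    }
    if limit is not None:
        cr["spec"]["limit"] = str(limit)
    return cr


def validate_eval_job_cr(cr: Dict[str, Any]) -> None:
    """Minimal client-side validation before handing the CR to the cluster."""
    if not cr["spec"]["taskList"]["taskNames"]:
        raise ValueError("at least one evaluation task is required")
```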
The TrustyAI SDK also addresses different concerns than Llama Stack. Rather than focusing on LLM operations like inference, TrustyAI provides a unified API layer specialised in AI safety capabilities. I see this as complementary to Llama Stack rather than a duplication.
For example, the TrustyAI SDK could simplify the addition of new safety-focused providers to Llama Stack. By using TrustyAI as a dependency, implementing Llama Stack Providers would become almost trivial, requiring only minimal glue code to convert Llama Stack requests into TrustyAI SDK parameters. We could even make SDK providers directly "pluggable" as Llama Stack Providers.
I'll add a concrete example to the proposal demonstrating how the current LMEval Llama Stack Provider could be simplified using a TrustyAI SDK LMEval Provider.
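In the meantime, here is a rough sketch of the shape I have in mind (class and method names on both the Llama Stack side and the SDK side are assumptions for illustration, not actual APIs):

```python
# Hypothetical sketch: a Llama Stack-facing provider as a thin wrapper around an
# SDK provider. Names and signatures are illustrative, not real APIs.
from typing import Any, Dict, List


class TrustyAILMEvalProvider:
    """Stand-in for a TrustyAI SDK evaluation provider."""

    def evaluate(self, model: str, tasks: List[str], **parameters: Any) -> Dict[str, Any]:
        # Local or Kubernetes execution would be handled inside the SDK.
        raise NotImplementedError("provided by the TrustyAI SDK")


class LlamaStackLMEvalAdapter:
    """Llama Stack-facing provider: only glue code, no evaluation logic."""

    def __init__(self) -> None:
        self._sdk = TrustyAILMEvalProvider()

    def run_eval(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Translate the incoming (Llama Stack-shaped) request into SDK parameters
        # and delegate; result translation would live here as well.
        return self._sdk.evaluate(
            model=request["model"],
            tasks=request.get("tasks", []),
            **request.get("parameters", {}),
        )
```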
> class FileDataset(BaseDataset):
>     """Dataset implementation for file-based data sources (CSV, JSON, Parquet, etc.)."""
Can you please clarify whether this option covers the PVC mounting scenario too?
@danielezonca I added a note on PVC.
IMHO, wouldn't a PVC just be a storage abstraction? The FileDataset would behave the same way on a local and on a Kubernetes provider. Happy to change this, though.
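To illustrate what I mean (the FileDataset constructor and paths below are just examples), the same dataset code would simply point at a path that happens to be a PVC mount when running on Kubernetes:

```python
# Illustrative only: the same file-based dataset, pointed either at a local path
# or at a path where a PVC is mounted inside the pod. FileDataset is assumed
# from the proposal's examples.
local_dataset = FileDataset(path="./data/outcomes.parquet")
pvc_dataset = FileDataset(path="/mnt/trustyai-data/outcomes.parquet")  # PVC mountPath
```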
A few comments/questions
> TBD
>
> ## 8. Alternatives Considered / Rejected
Perhaps similar to @danielezonca's "This goal seems to overlap with llama-stack Safety provider" comment, it might be helpful to spell out why this separate SDK is the best decision going forward, as opposed to leveraging something existing like llama-stack.
@evaline-ju Very good point. I've answered the two comments here
> #### 6.4.1.1. Dataset Abstraction
>
> The TrustyAI SDK implements a unified Dataset abstraction that provides a consistent interface for accessing data from various sources while using pandas DataFrame as the universal data format. This abstraction allows providers to work with data from databases, files, cloud storage, and other sources without requiring knowledge of the underlying storage implementation.
A question about TrustyAI datasets in general: are these mostly static after access, i.e. there aren't additional manipulations that would warrant a need to save or persist a manipulated dataset?
@evaline-ju great question!
In this proposal, Datasets are indeed treated as immutable data loaders backed by DataFrames (or even NumPy arrays, if we want to support multi-dimensional data) and serve simply as a common format for providers.
In this case, manipulations would happen in user code with pandas, and users would handle their own persistence needs.
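For example, the flow I have in mind would look roughly like this (FileDataset and its constructor are assumed from the proposal, not a final API):

```python
# Illustrative flow: the Dataset is only an immutable loader; any manipulation
# happens in user code, on the DataFrame, not on the Dataset itself.
dataset = FileDataset(path="./data/outcomes.csv")
df = dataset.load()  # pandas DataFrame, the common format for providers

# Further manipulation is plain pandas in user code
adults = df[df["age"] >= 18]

# Persistence, if needed, is the user's responsibility
adults.to_parquet("./data/outcomes_adults.parquet")
```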
> model: ModelReference
> tasks: List[str] = Field(description="Evaluation tasks to run")
> dataset: Optional[BaseDataset] = None
> parameters: Dict[str, Any] = Field(default_factory=dict)
Not specific to evaluation, but with various algorithm implementations I've seen args/kwargs parameters get fairly numerous and potentially nested. While the Dict is fairly flexible in Python, how will the nesting be accounted for in the CLI translation, i.e. the `--parameters "embeddings_model=openai/text-embedding-ada-002…"` portion?
@evaline-ju Very interesting question! 🙂
Personally, I'm partial to either:

1. **Nested data serialisation**, similar to cURL's approach, i.e.

   --parameters '{"embeddings_model": "openai/text-embedding-ada-002"}'
   --parameters @parameters.json

or

2. **Nested key serialisation**, similar to Helm's value setting, i.e.

   --parameter embeddings.model=openai/text-embedding-ada-002 --parameter embeddings.other.nested=...
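For the nested key option, the CLI-to-Dict translation could be a small helper along these lines (a sketch only; the function name and flag handling are not part of the proposal):

```python
# Illustrative only: turning repeated --parameter key.path=value flags into the
# nested Dict[str, Any] expected by the request models.
from typing import Any, Dict, List


def parse_nested_parameters(pairs: List[str]) -> Dict[str, Any]:
    """Expand dotted keys, e.g. ["embeddings.model=x"] -> {"embeddings": {"model": "x"}}."""
    result: Dict[str, Any] = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        node = result
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result


# parse_nested_parameters(["embeddings.model=openai/text-embedding-ada-002"])
# -> {"embeddings": {"model": "openai/text-embedding-ada-002"}}
```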
> """Request model for evaluation operations."""
> model: ModelReference
> tasks: List[str] = Field(description="Evaluation tasks to run")
> dataset: Optional[BaseDataset] = None
I realize this is just an example implementation, but I'm curious about the expectations for batch cases, which I assume could be popular, whether that means multiple models or multiple datasets in a request. Will the expectation be to update an existing request, to implement new classes to accommodate these use cases, or to say that the user of trustyai is responsible for those cases?
> - Returns results directly from the local process
>
> 2. **Kubernetes Implementation:**
>    - Provider builds a Custom Resource (CR) from the input parameters
I see the "TrustyAI operator integration for Kubernetes deployments" in the dependency notes but am still a bit confused - will the dev here have to write the logic, or is this a matter of leveraging an existing operator that can do this?
dahlem left a comment:
Great work on this ADR 🚀
>         self._cached_data: Optional[pd.DataFrame] = None
>
>     @abstractmethod
>     def load(self, **kwargs) -> pd.DataFrame:
What assumptions are we making about the underlying data asset, and what are the implications of materialising it to pandas DataFrames, with respect to:
- memory and scale: do we expect that the data asset can be fully loaded into memory?
- parallelism: do we expect no further data manipulations since pandas is single-threaded?
- versioning/lineage: @evaline-ju asked whether the data is static. If it isn't, do we need to track provenance?
- serialization: if data is dynamic does it need to be serialized?
- schema validation: pandas has weak data schema support; it does not enforce a schema throughout the DataFrame lifecycle; type coercion is implicit and error-prone; missing data is loosely handled (NaNs allowed in any column, non-nullable columns not enforced); and there are no built-in validation/constraints
> fairness_provider = FairnessProvider(implementation="fairlearn")
>
> # Provider handles data source abstraction internally
> spd_score = fairness_provider.statistical_parity_difference(
Where does the logic sit if a dataset does not fit into memory? Do the metric providers need to stream through the data and update the metric incrementally under bounded memory constraints? If so, some metrics are not easily streamable, like (PR-)AUC, confusion matrices with arbitrary thresholds, ranking-based metrics, etc.
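For concreteness, a metric like SPD itself could be maintained with bounded memory from running counts, unlike the metrics listed above; a rough sketch (names and the SPD convention used are assumptions for illustration, not part of the ADR):

```python
# Illustrative only: a bounded-memory, incrementally updated SPD, taken here as
# P(y_hat=1 | unprivileged) - P(y_hat=1 | privileged).
from dataclasses import dataclass


@dataclass
class StreamingSPD:
    priv_total: int = 0
    priv_positive: int = 0
    unpriv_total: int = 0
    unpriv_positive: int = 0

    def update(self, privileged: bool, positive_outcome: bool) -> None:
        """Update running counts from a single record; O(1) memory."""
        if privileged:
            self.priv_total += 1
            self.priv_positive += int(positive_outcome)
        else:
            self.unpriv_total += 1
            self.unpriv_positive += int(positive_outcome)

    def value(self) -> float:
        """Current SPD estimate over everything seen so far."""
        if self.priv_total == 0 or self.unpriv_total == 0:
            return float("nan")
        return (self.unpriv_positive / self.unpriv_total
                - self.priv_positive / self.priv_total)


# Usage: stream chunks without materialising the full dataset, e.g.
# spd = StreamingSPD()
# for chunk in pd.read_csv("outcomes.csv", chunksize=10_000):
#     for priv, pos in zip(chunk["privileged"], chunk["outcome"]):
#         spd.update(bool(priv), bool(pos))
```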
> ```python
> # TrustyAI service specific metrics endpoints implementation
> @app.post("/v2/metrics/spd/calculate")
> async def calculate_spd(request: SPDRequest):
What is the pattern to go from ad-hoc metrics to runtime metrics? E.g., I'd like to monitor SPD continuously in deployment over configurable windows of data, using some form of FIFO buffer or discrete non-overlapping buckets (weekly, monthly, etc.). Does this proposal imply that the data needs to materialise first?
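For concreteness, a FIFO-window variant would only need to keep the window itself in memory rather than materialising the full dataset; a rough sketch (names are illustrative only, not part of the proposal):

```python
# Illustrative only: SPD monitored over a FIFO window of the most recent N
# records; only the window, not the full dataset, is held in memory.
from collections import deque


class WindowedSPDMonitor:
    def __init__(self, window_size: int = 10_000):
        # Each entry is (privileged: bool, positive_outcome: bool).
        self.window = deque(maxlen=window_size)

    def observe(self, privileged: bool, positive_outcome: bool) -> float:
        """Add one record, evicting the oldest if the window is full, and return current SPD."""
        self.window.append((privileged, positive_outcome))
        priv = [pos for p, pos in self.window if p]
        unpriv = [pos for p, pos in self.window if not p]
        if not priv or not unpriv:
            return float("nan")
        return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)
```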
> **Local Execution:**
> ```bash
> # Local evaluation execution
> trustyai eval execute --provider lm-evaluation-harness --execution-mode local --model "hf/microsoft/DialoGPT-medium" --limit 10 --tasks "hellaswag,arc"
Can these model strings potentially come from the model registry / model catalogue? How might we enforce a secure supply chain?
Thank you @ruivieira for this ADR!! This is super thorough and clearly written.

As the n00b that I am, it's not obvious to me that our problem statement is fragmentation and a steep learning curve, and that therefore we lack a unified API. I am also concerned that this isn't just inspired by Llama Stack, but actually doing a lot of the same things.

Having said that, I do see the need/value for us to stand on our own feet and have our own SDK, for example for a) customers that don't use LS, or b) the community of AI experts/tinkerers that is looking to experiment or develop without the bloat of LS, etc.

However, given Red Hat's decision to adopt LS as the glue and basis of our GenAI product, it seems to me that this need/value for us to stand on our own feet should manifest itself a bit more loudly in the form of customers looking for trustworthy solutions that don't / won't use LS, or an active community raising issues in our public repos that point at fragmentation issues, etc. In other words it would arise as a "pull" instead of a "push".

Therefore, while I agree with this ADR in the sense of the end state, I would like to propose that we consider a different order of operations. Specifically:
@dmaniloff Thank you for the feedback! You raise excellent points about priorities and the relationship with Llama Stack (LS). Let me try to give my view on the fragmentation problem and how I think TrustyAI and LS can work well together.

You're absolutely right to question whether the fragmentation problem is clear! Here's my view of the current situation: TrustyAI Core (Java-based, with its own API patterns), the Metrics Service (a REST server with ad-hoc endpoints, where each metric has different request/response formats), the LM-Eval integration (deployed as Kubernetes Jobs with custom configuration), and Guardrails (separate configuration logic and deployment patterns). Someone wanting to implement AI safety needs to learn four different interfaces, deployment methods, and configuration approaches. The SDK wouldn't replace these components' APIs, but provide a unified way to use them together.

I agree that making LS interoperability easy is critical. Rather than building TrustyAI "around" LS, I see advantages in making our provider architecture LS-compatible. This would actually increase the reach of TrustyAI providers: direct SDK usage for users wanting specialised AI safety tools, LS integration for broader LLM workflows, and "mixed" scenarios where teams start with direct SDK usage and later integrate with LS infrastructure. For instance, a TrustyAI evaluation provider could execute locally, on Kubernetes, or as an LS provider, all through the same interface.

Llama Stack provides a good general-purpose API for LLM inference and workflows. TrustyAI can complement this by offering an API for AI safety that moves faster in this specific domain. TrustyAI can be the safety SDK that plugs into LS infrastructure when needed, but that is also used by the community at large (which might not need LS).

I agree with your point about the datasets abstraction, but this work needs to happen regardless. In the TrustyAI service, we'd still need to write data abstractions to send data from databases, CSV files, and HDF5 to different algorithms from different libraries like AIF360, fairlearn, and Deon. Whether we implement it in the SDK or in the TrustyAI service, we need a unified way to handle different storage backends.

Another advantage of the SDK is that it can be used as a library dependency or from Jupyter as "pure" Python, with no need for LS at all. This would be helpful for researchers and developers who want to work directly with AI safety tools without additional dependencies (such as an LS distro).

Looking at the effort: an LS-first approach means implementing LS out-of-tree providers, plus backends (K8s/local), plus a TrustyAI community LS distribution. An SDK-first approach means building providers directly in the SDK, plus creating an LS compatibility layer. In my view, the SDK-first approach seems more efficient because we develop each capability once with multiple deployment targets, can iterate faster on AI safety-specific features, and still achieve full LS compatibility.

Perhaps we can find a middle ground: develop the core SDK providers with LS compatibility built in from day one, and deploy these providers both standalone and as LS integrations. This way, we're not building separately from LS, but ensuring TrustyAI can work with LS deployments while also keeping the broader community reach.

A good test run of this would be LMEval and Guardrails: writing them as SDK providers (for LMEval, for instance, the current LS provider is a good example; it's almost an "SDK provider" already) and then turning the out-of-tree provider into a very thin wrapper around the SDK provider. What do you think about this approach?