
Add content from 1-page backgrounder to content somewhere #9

Description

@ljdursi

We have a 1-page backgrounder (text follows) that Guillaume and Mike like, and it would be good to incorporate it into the website somewhere.

Introduction
CanDIG (https://www.distributedenomics.ca) develops an open-source, standards-based, federated and completely distributed platform for querying and analyzing national-scale genomics and human health data sets while the data remain securely and privately controlled by their stewards. The platform enables national-scale genomics projects such as the TF4CN pilot project and PROFYLE, a precision paediatric oncology research effort. CanDIG is funded through a CFI Cyberinfrastructure grant, and also receives funding from the CANARIE Research Data Management program (the CHORD project) and CIHR (as part of the EU Horizon2020 project CINECA, partnering with the EGA and ELIXIR).

Model & Data
The CanDIG model is decentralized and standards-based, with open-source implementations. Queries to the network are distributed to the federation members in a peer-to-peer fashion. CanDIG is contributing its federated querying approach to the EGA and ELIXIR as part of the CINECA project, leading the work package on federated data discovery and queries; by mid-2020, those queries will be interoperable among the CINECA partners.
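As a rough illustration of the peer-to-peer fan-out described above, the sketch below has one node answer a query from its own data and forward it to its peers, returning only per-site counts so that records stay with their stewards. The `Site` class, the site names, and the `count_matching` and `federated_count` methods are hypothetical and are not CanDIG's API; real sites would be reached over HTTPS rather than in-process.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical federation member holding its own records locally."""
    name: str
    records: list
    peers: list = field(default_factory=list)

    def count_matching(self, gene):
        # Answer from local data only; raw records never leave the site.
        return sum(1 for r in self.records if r["gene"] == gene)

    def federated_count(self, gene):
        # Answer locally, then fan the query out to every peer and
        # collect one aggregate count per site.
        results = {self.name: self.count_matching(gene)}
        for peer in self.peers:
            results[peer.name] = peer.count_matching(gene)
        return results


# Two illustrative sites that know about each other (no central server).
site_a = Site("site-a", [{"gene": "TP53"}, {"gene": "BRCA1"}])
site_b = Site("site-b", [{"gene": "TP53"}, {"gene": "TP53"}])
site_a.peers.append(site_b)
site_b.peers.append(site_a)

counts = site_a.federated_count("TP53")
```

Any site can start the query, and the caller only ever sees aggregate counts keyed by site name.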
CanDIG currently supports aligned whole-genome/exome sequencing data, RNAseq data, variant data, clinical and pipeline metadata associated with those data sets, and references to relevant external data sources. Clinical data includes patient demographic information, diagnosis, treatment, and outcome information. CanDIG is helping define a GA4GH standard RNA expression query API and developing an initial implementation.

Standards
CanDIG’s federated model is enabled by strong data governance policies, including mandated standard ontologies both within the data (e.g., HPO and SNOMED-CT) and about the data (e.g., the DUO standard for describing the consented use of a data set). Clinical metadata schemes in the v1 deployment of CanDIG use AACR/ASCO recommendations for cancer data sets, but the project is moving to HL7/FHIR for v2.
Containers are the reference method of deployment for CanDIG v1 services. CanDIG v2 will include pipeline execution for CWL or WDL pipelines with containers from a trusted repository. Access is via institutional credentials, using OpenID Connect (OIDC) for authentication, and local role-based user lists for authorization. We are working with ELIXIR AAI to develop interoperable and secure ways of propagating that role information via OIDC aggregated and distributed claims, allowing DAC authorizations to securely track a user. We follow GA4GH Security Working Group best practices for security of our sites.
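The split between OIDC authentication and local role-based authorization can be sketched as follows. This is a minimal sketch that assumes the ID token has already been validated by the site's OIDC provider, so `claims` is a trusted dictionary; the issuer URL, user, role names, and the `is_authorized` helper are all hypothetical and do not reflect CanDIG's actual implementation.

```python
# Hypothetical local role list, keyed by (issuer, subject) from a verified ID token.
LOCAL_ROLES = {
    ("https://idp.example.org", "alice"): {"profyle_member"},
}

# Hypothetical mapping from each data set to the roles allowed to query it.
DATASET_ROLES = {
    "profyle": {"profyle_member"},
    "tf4cn": {"tf4cn_member"},
}


def is_authorized(claims, dataset):
    """Return True if the authenticated user holds a role permitted on the data set."""
    user = (claims.get("iss"), claims.get("sub"))
    user_roles = LOCAL_ROLES.get(user, set())
    allowed = DATASET_ROLES.get(dataset, set())
    # Access is granted only when the two role sets overlap.
    return bool(user_roles & allowed)


claims = {"iss": "https://idp.example.org", "sub": "alice"}
```

Here the identity comes from the institution's credentials, while the decision about what that identity may access stays with the site that holds the data.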
We aim for interoperability not just amongst our sites but internationally. As a driver project for GA4GH, we contribute to international standards-setting efforts in data modelling and API design for human genomics and related health data, ensuring that emerging standards reflect the needs of Canada’s federated health data system. CanDIG actively participates in international workgroups building API standards for: cloud workflow execution and data objects; dataset discovery and querying; researcher identity and data use authorization ontologies; large-scale genomics analyses; and health data governance and research ethics.

Technology & Team
CanDIG v1, which is up and running today at the BC Genome Sciences Centre, the McGill University/Genome Quebec Innovation Centre, and HPC4Health, uses a series of open-source, standards-based tools: Keycloak at each site to provide OIDC identity services, a Tyk application gateway, and a suite of RESTful (GA4GH) genomics APIs in a Python application.
Taking what the project has learned working with researchers on the PROFYLE and TF4CN efforts, our team at U Sherbrooke, McGill U, HPC4Health (SickKids/UHN), and the BCGSC is developing our version 2 (backend services development is taking place here) to scale the backend architecture up and out, enabling both larger volumes and greater complexity of data. We are taking an API-first approach, using OpenAPI to define RESTful APIs, retaining and extending our successful OpenID Connect authentication approach, and adopting FHIR, Docker, Go/Python, GraphQL, both relational and NoSQL databases, and an object store. A workflow execution service (GA4GH WES) and a data repository service (GA4GH DRS) are part of this version. We are testing CanDIG v2 at two sites now and expect to fully deploy by the end of the year.

Vision
Canada has a unique history of success with federation of health data. CanDIG builds on this expertise and brings it to the international stage. We emphasize open-source and standards-based components to decrease risk and increase interoperability, but also to allow us to focus on those specifically Canadian aspects of the platform where solutions cannot be sourced from elsewhere: the federated aspects of the Canadian health data landscape.
In doing so we are building Canadian capacity to use cloud and data technologies to address problems of health data federation which are increasingly urgent internationally, as data volumes grow beyond scales feasible for centralization, and mobile health/IoT technologies push much health data away from the centre and towards the edge. CanDIG’s experience in providing services for national genomics projects between completely distributed, peer-to-peer data sets also demonstrates that a completely federated approach to data access can be successful, and that the modest increase in technical complexity is both manageable and easily offset by the reduced legal and policy effort.
