
SAIL

The Sovereign AI Landing Zone deployment template is an open-source Infrastructure as Code (IaC) solution designed to deploy large language models (LLMs) to run entirely within an Azure region. This template enables organizations, especially those in highly regulated industries, to implement LLMs on Azure with full data and compute sovereignty.

Install / Use

/learn @Azure/SAIL
README

Objective

This Sovereign AI Landing Zone (SAIL) repository provides a secure foundation for deploying AI models within Canada’s borders on Azure, so organizations can build, scale, and innovate while maintaining the highest standards of privacy and compliance. As the initial focus, we consider sovereignty on Azure as satisfying two key requirements:

  • Data at rest should be stored within Canadian Azure data centres
  • Data in-transit should be processed within Canadian Azure data centres

The critical Azure services supporting the deployment of sovereign AI models in Canada are Microsoft Foundry and Azure Machine Learning.

We will provide a comprehensive review of deployment approaches and templates for AI models satisfying the two sovereignty requirements of data at rest and data in-transit staying within Canada's borders. Initial Azure Bicep scripts for deployment of Azure Machine Learning and Microsoft Foundry through Infrastructure as Code (IaC) can be found in the infra folder. More updates to the IaC scripts and deployment scripts to come!
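As an illustrative sketch of the kind of Bicep found in the infra folder (the resource name, API version, and properties here are assumptions for illustration, not the repo's actual templates), an Azure Machine Learning workspace pinned to Canada Central with a locked-down managed network might look like:

```bicep
// Illustrative sketch only, not the repo's actual infra templates.
// Pins the workspace and its managed network to Canada Central.
param location string = 'canadacentral'

resource ml 'Microsoft.MachineLearningServices/workspaces@2024-04-01' = {
  name: 'sail-aml-workspace'   // placeholder name
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    publicNetworkAccess: 'Disabled'   // inbound via private endpoints only
    managedNetwork: {
      isolationMode: 'AllowOnlyApprovedOutbound'
    }
    // Dependent resources (storage account, Key Vault, etc.)
    // are omitted here for brevity.
  }
}
```

Keeping `location` and the managed virtual network in a Canadian region is what satisfies both the data-at-rest and data-in-transit requirements above.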

Microsoft Foundry AI model deployment options

For sovereignty reasons, it is important to consider AI models deployable within Microsoft Foundry from the list of models Directly Sold by Azure, which satisfy deployment requirements from a data security and privacy perspective as outlined here.

In particular for models from the Directly Sold by Azure list within Microsoft Foundry:

  • Data at rest is stored in the Foundry resource in the customer's Azure tenant, within the same geography as the resource. For Canada, the geography comprises Canada Central and Canada East. Generally, prompts and completions for such models are not stored, except as part of specific features such as fine-tuning and the Assistants API. Abuse monitoring, a temporary data storage feature enabled by default, may store potentially abusive material from prompts and completions for up to 30 days for the sole purpose of Microsoft review. This feature can be disabled by submitting this form.

  • Data in-transit can be processed in various locations depending on the model deployment type. To ensure that AI models accessed through Microsoft Foundry process data in-transit within Canadian Azure regions, they must be deployed as either

    • Standard for Pay-As-You-Go deployments
    • Regional Provisioned for Provisioned Throughput Unit - PTU (dedicated capacity with guaranteed units of throughput) deployments
  • By contrast, the Global deployment type means that data might be processed for inferencing in any Foundry location in the world. The Data Zone deployment type is not applicable for Canada, as only US and Europe regions have Data Zone support.

  • As of March 20, 2026, these are the models within AI Foundry that provide guaranteed data in-transit processing within Canada:

    • Standard for Pay-As-You-Go deployments (available through Microsoft Foundry deployed in Canada East region):
      • gpt-4.1-mini
      • gpt-4o (Version 1120)
      • text embedding models (ada, 3-large, 3-small)
    • Regional Provisioned Throughput Units (PTU) deployments (available through Microsoft Foundry deployed in Canada East region):
      • o3-mini
      • gpt-5-mini (though it is currently out of capacity)
      • gpt-5
      • gpt-5.1
      • gpt-4o (Versions 1120, 0806, 0513 - also available in Canada Central)
      • gpt-4o-mini - also available in Canada Central
  • There are also many AI models that can be deployed using the Microsoft Foundry (classic) hub-based service with managed compute, such as certain Cohere models from the Directly Sold by Azure list. These models are deployed on managed GPU VMs in a hub-based Foundry resource, built on the Azure ML deployment infrastructure described below, ensuring that data in-transit and data at rest remain within the Canada geography. Just remember to set kind: 'hub' in the Azure ML deployment script.
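For the regional deployment types above, the regional-versus-global distinction is carried by the deployment SKU. A hedged Bicep sketch (the account name, API version, deployment name, and capacity are illustrative assumptions, not the repo's scripts):

```bicep
// Illustrative sketch: a regional (non-global) model deployment on an
// existing Foundry / Azure OpenAI account deployed in Canada East.
resource account 'Microsoft.CognitiveServices/accounts@2024-10-01' existing = {
  name: 'sail-foundry-canadaeast'   // placeholder account name
}

resource gpt4o 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  parent: account
  name: 'gpt-4o-regional'           // placeholder deployment name
  sku: {
    name: 'Standard'                // regional pay-as-you-go; NOT 'GlobalStandard'
    capacity: 10                    // illustrative capacity
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-11-20'         // the "1120" version listed above
    }
  }
}
```

A `sku.name` of 'Standard' (or a regional 'ProvisionedManaged' SKU for PTU) keeps inferencing in the account's region, whereas the 'Global*' SKUs allow processing anywhere in the world.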

Azure Machine Learning AI model deployment options

The following is guidance to facilitate deployment of generic AI models, including large language models (LLMs), on Azure Machine Learning's (AML) Managed Online Endpoints for efficient, scalable, and secure real-time inference. Two deployment patterns are described: models served through vLLM, and generic AI models. By leveraging AML's Managed Online Endpoints, the model is deployed within the AML region and secured through inbound and outbound private connections, ensuring a secure and sovereign solution. The AI model is deployed in a managed virtual network within the region of the Azure ML service, which should be Canada Central.

In particular, this pattern gives you the ability to deploy out-of-the-box (OOTB) Hugging Face models onto Managed Online Endpoints in AML, using managed compute.

Prerequisites:

  1. vLLM: A high-throughput, memory-efficient inference engine designed for LLMs. We will create a custom Dockerized environment for vLLM on AML as a foundational step.
  2. (Optional) You can also bring in any generic AI model by leveraging the custom Dockerfile and providing a generic score.py file that loads the model in memory and defines inferencing.
  3. Managed Online Endpoints: A feature in Azure Machine Learning that simplifies deploying machine learning models for real-time inference by handling serving, scaling, securing, and monitoring complexities. At the time of writing, an additional motivation for using this feature is the data and regional residency guarantees achievable through the setup described here.
  4. A model of your choice from Hugging Face (or any generic AI model). Familiarity with Hugging Face models, their workflow, and authentication (AuthN) aspects is assumed.
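As a sketch of the vLLM prerequisite, a minimal custom environment could be built from vLLM's OpenAI-compatible base image (the image tag, port, and MODEL_NAME variable are assumptions for illustration, not the repo's actual Dockerfile):

```dockerfile
# Illustrative sketch of a custom vLLM environment for AML.
FROM vllm/vllm-openai:latest

# AML managed online endpoints route scoring traffic to this port.
ENV VLLM_PORT=8000
EXPOSE 8000

# MODEL_NAME is assumed to be injected via the deployment's
# environment variables (see the deployment YAML).
ENTRYPOINT python3 -m vllm.entrypoints.openai.api_server \
    --model $MODEL_NAME --port $VLLM_PORT
```

Registering this Dockerfile as an AML environment is what the `az ml environment create` command below refers to.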

Key Deployment Steps:

  1. Create a Custom Environment on AzureML: Define a Dockerfile specifying the environment for the model, utilizing vLLM's base container with necessary dependencies.

  2. Deploy the AzureML Managed Online Endpoint: Configure the endpoint and deployment settings using YAML files, specifying the model to deploy, environment variables, and instance configurations.

  3. Test the Deployment: Retrieve the endpoint's scoring URI and API keys, then send test requests to ensure the model is serving correctly. Using MS Entra for authentication and authorization is supported as well: https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-online-auth?view=azureml-api-2

  4. (Optional) Autoscale the AML Endpoint: Set up autoscaling rules to dynamically adjust the number of instances based on real-time metrics, ensuring efficient handling of varying loads.

  5. For pre-trained Foundry large language models, as long as the model offers a managed compute deployment option, you can use the model deployment wizard or follow the guide here: https://learn.microsoft.com/en-us/azure/foundry-classic/how-to/deploy-models-managed?pivots=ai-foundry-portal. Note that for privacy and security reasons, the managed compute endpoint should always be set to use a private endpoint (which is the default configuration in this repo).
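The endpoint and deployment YAML referenced in step 2 might look like the following sketch (the schema URLs are the standard AML YAML schemas; the names, environment reference, instance type, and model are illustrative assumptions):

```yaml
# endpoint.yml - illustrative sketch; names are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: vllm-endpoint
auth_mode: key
public_network_access: disabled   # keep inbound scoring traffic private

---
# deployment.yml - illustrative sketch; environment, instance type,
# and model name are placeholders, not the repo's actual values.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: vllm-deployment
endpoint_name: vllm-endpoint
environment: azureml:vllm-custom-env@latest   # the custom vLLM environment
instance_type: Standard_NC24ads_A100_v4       # GPU SKU sized to the model
instance_count: 1
environment_variables:
  MODEL_NAME: mistralai/Mistral-7B-Instruct-v0.2
```

Setting `public_network_access: disabled` keeps the scoring endpoint reachable only over private connections, in line with the sovereignty requirements above.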

Essence of the steps via code/CLI commands:

  1. Authentication
     az account set --subscription <subscription ID>
     az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>
  2. Build Environment
     az ml environment create -f environment.yml
  3. Deploy to Managed Online Endpoint
     az ml online-endpoint create -f endpoint.yml
     az ml online-deployment create -f deployment.yml --all-traffic
  4. Get API endpoint and API keys
     az ml online
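Once the scoring URI and key are retrieved, a test request can use vLLM's OpenAI-compatible chat completions shape. This payload is illustrative; the model name is whatever was set in the deployment:

```json
{
  "model": "mistralai/Mistral-7B-Instruct-v0.2",
  "messages": [
    { "role": "user", "content": "Summarize data sovereignty in one sentence." }
  ],
  "max_tokens": 64
}
```

POST this body to the endpoint's scoring URI with the API key (or an MS Entra token) in the Authorization header to confirm the model is serving correctly.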