Explore models in Foundry Toolkit

Foundry Toolkit provides comprehensive support for a wide variety of generative AI models, including both Small Language Models (SLMs) and Large Language Models (LLMs).

Within the model catalog, you can explore and utilize models from multiple hosting sources:

  • Models hosted on GitHub, such as Llama 3, Phi-3, and Mistral, including pay-as-you-go options.
  • Models provided directly by publishers, including OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini.
  • Models hosted on Microsoft Foundry.
  • Models downloaded locally from repositories like Foundry Local, Ollama, and ONNX.
  • Custom self-hosted or externally deployed models accessible via Bring-Your-Own-Model (BYOM) integration.

Deploy models directly to Foundry from within the model catalog, streamlining your workflow.

Note

You can use Microsoft Foundry, Foundry Local, and GitHub models that you've added to Foundry Toolkit with GitHub Copilot. For more information, see Changing the model for chat conversations.

Find a model

To find a model in the model catalog:

  1. Select the Foundry Toolkit view in the Activity Bar.

  2. Select MODELS > Catalog to open the model catalog.

  3. Use the filters to reduce the list of available models:

    • Hosted by: Foundry Toolkit supports Microsoft Foundry, Foundry Local, GitHub, ONNX, OpenAI, Ollama, Anthropic, Google, NVIDIA NIM, MiniMax, Kimi, GLM, and Windows AI API as model hosting sources.
    • Publisher: The publisher for AI models, such as Microsoft, Meta, Google, OpenAI, Anthropic, Mistral AI, and more.
    • Feature: Supported features of the model, such as Text Attachment, Image Attachment, Web Search, Structured Outputs, and more.
    • Model type: Filter models that can run remotely, or locally on CPU, GPU, or NPU. This filter depends on the hardware available on your local machine.
    • Fine-tuning Support: Show models that can be fine-tuned.
  4. Browse the models in different categories, such as:

    • Popular Models is a curated list of widely used models across various tasks and domains.
    • GitHub Models provide easy access to popular models hosted on GitHub. They're best for fast prototyping and experimentation.
    • ONNX Models are optimized for local execution and can run on CPU, GPU, or NPU.
    • Ollama Models are popular models that can run locally with Ollama, supporting CPU via GGUF quantization.
  5. Alternatively, use the search box to find a specific model by name or description.

Add a model from the catalog

To add a model from the model catalog:

  1. Locate the model you want to add in the model catalog.

  2. Select Add on the model card.

  3. The flow for adding a model differs slightly depending on the provider:

    • Foundry Local: Foundry Local downloads and runs the model, which might take a few minutes depending on your internet speed. The model is served locally and added to Foundry Toolkit. Learn more in What is Foundry Local?.

    • GitHub: Foundry Toolkit asks for your GitHub credentials to access the model repository. Once authenticated, the model is added directly into Foundry Toolkit.

      Note

      Foundry Toolkit now supports GitHub pay-as-you-go models, so you can keep working after you reach the free-tier limits.

    • ONNX: The model is downloaded from ONNX and added to Foundry Toolkit.

    • Ollama: The model is downloaded from Ollama and added to Foundry Toolkit.

    • OpenAI, Anthropic, and Google: Foundry Toolkit prompts you to enter the API key.

      Tip

      You can edit the API key later by right-clicking the model and selecting Edit. The encrypted value is stored in a YAML file under ${HOME}/.aikt/models/my-models.

    • Custom models: Refer to the Add a custom model section for detailed instructions.

Once added, the model appears under MY RESOURCES/Models in the tree view, and you can use it in the Playground or Agent Builder.

Add a custom model

You can also add your own models that are hosted externally or run locally. There are several options available:

  • Add Ollama models from the Ollama library or custom Ollama endpoints.
  • Add custom models that have an OpenAI compatible endpoint, such as a self-hosted model or a model running on a cloud service.
  • Add custom ONNX models, such as those from Hugging Face, using Foundry Toolkit's model conversion tool.

There are several entry points for adding models to Foundry Toolkit:

  • From the MY RESOURCES section in the tree view, hover over Models and select the + icon.

  • From the Model Catalog, select the + Add model button on the toolbar.

  • From the Add Custom Models section in the model catalog, select + Add Your Own Model.

Add Ollama models

Ollama enables many popular generative AI models to run locally on the CPU via GGUF quantization. If Ollama is installed on your local machine and you've downloaded Ollama models, you can add them to Foundry Toolkit for use in the model playground.

Prerequisites for using Ollama models in Foundry Toolkit:

  • Foundry Toolkit v0.6.2 or newer.
  • Ollama (tested with Ollama v0.4.1).

To add local Ollama models to Foundry Toolkit:

  1. From one of the entry points mentioned previously, select Add Ollama Model.

  2. Next, select Select models from Ollama library.

    If your Ollama runtime is listening at a different endpoint, choose Provide custom Ollama endpoint to specify it.

  3. Select the models you want to add to Foundry Toolkit, and then select OK.

    Note

    Foundry Toolkit only shows models that are already downloaded in Ollama and not yet added to Foundry Toolkit. To download a model from Ollama, you can run ollama pull <model-name>. To see the list of models supported by Ollama, see the Ollama library or refer to the Ollama documentation.

  4. You should now see one or more selected Ollama models in the list of models in the tree view.

    Note

    Attachments aren't supported yet for Ollama models. Foundry Toolkit connects to Ollama through its OpenAI compatible endpoint, which doesn't support attachments.
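
As a quick check outside Foundry Toolkit, you can list the models that Ollama has already downloaded by querying its local REST API. The following is a minimal sketch, assuming Ollama is running at its default address (http://localhost:11434); the /api/tags route returns the locally downloaded models.

    import requests

    # Ask the local Ollama server which models are downloaded.
    # Assumes Ollama is running at its default address.
    response = requests.get("http://localhost:11434/api/tags", timeout=10)
    response.raise_for_status()

    # Each entry corresponds to a model you can add to Foundry Toolkit.
    for model in response.json().get("models", []):
        print(model["name"])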

Add a custom model with OpenAI compatible endpoint

If you have a self-hosted or deployed model that's accessible from the internet through an OpenAI compatible endpoint, you can add it to Foundry Toolkit for use in the playground.

  1. From one of the entry points, select Add Custom Model.
  2. Enter the OpenAI compatible endpoint URL and the required information.

To add a self-hosted or locally running Ollama model:

  1. Select + Add model in the model catalog.
  2. In the model Quick Pick, choose Ollama or Custom model.
  3. Enter the required details for the model.
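
To sanity-check an OpenAI compatible endpoint before or after adding it, you can call it with the standard OpenAI Python client. This is a minimal sketch; the endpoint URL, API key, and model name below are placeholders for your own deployment's values.

    from openai import OpenAI

    # Placeholders: substitute your own endpoint, key, and model name.
    client = OpenAI(
        base_url="https://my-model-host.example.com/v1",  # hypothetical endpoint
        api_key="YOUR_API_KEY",
    )

    completion = client.chat.completions.create(
        model="my-custom-model",  # the model name your server expects
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(completion.choices[0].message.content)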

Add a custom ONNX model

To add a custom ONNX model, first convert it to the Foundry Toolkit model format using the model conversion tool. After conversion, add the model to Foundry Toolkit.

Deploy a model to Microsoft Foundry

You can deploy a model to Microsoft Foundry directly from Foundry Toolkit. The model runs in the cloud, and you access it via an endpoint.

  1. From the model catalog, select the model you want to deploy.

  2. Select Deploy to Microsoft Foundry, either from the dropdown menu or directly from the Deploy to Microsoft Foundry button.

  3. In the model deployment tab, enter the required information, such as the model name, description, and any other settings.

  4. Select Deploy to Microsoft Foundry to start the deployment process.

  5. Confirm the deployment by reviewing the details and selecting Deploy to proceed.

  6. Once the deployment is complete, the model is available in the MY RESOURCES > Your project name > Models section of Foundry Toolkit, and you can use it in the playground or agent builder.
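
After deployment, you can also call the model from your own code. The exact URL, route, and authentication header depend on your deployment (check the deployment details in Foundry for the real values); the following sketch assumes an OpenAI-style chat completions route with key-based authentication.

    import requests

    # Placeholders: copy the real endpoint and key from your deployment details.
    ENDPOINT = "https://my-resource.example.com/chat/completions"
    API_KEY = "YOUR_DEPLOYMENT_KEY"

    payload = {
        "model": "my-deployed-model",  # hypothetical deployment name
        "messages": [{"role": "user", "content": "Summarize what you can do."}],
    }
    response = requests.post(
        ENDPOINT,
        headers={"api-key": API_KEY, "Content-Type": "application/json"},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])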

Select a model for testing

You can test a model in the playground for chat completions.

Use the actions on the model card in the model catalog:

  • Try in Playground: Load the selected model for testing in the Playground.
  • Try in Agent Builder: Load the selected model in the Agent Builder to build AI agents.

Manage models

You can manage your models in the MY RESOURCES/Models section of the Foundry Toolkit view:

  • View the list of models added to Foundry Toolkit.

  • Right-click on a model to access options such as:

    • Load in Playground: Load the model in the Playground for testing.
    • Copy Model Name: Copy the model name to the clipboard for use in other contexts, such as your code integration.
    • Refresh: Refresh the model configuration to ensure you have the latest settings.
    • Edit: Modify the model settings, such as the API key or endpoint.
    • Delete: Remove the model from Foundry Toolkit.
    • About this Model: View detailed information about the model, including its publisher, source, and supported features.
  • Right-click the ONNX section title to access options such as:

    • Start Server: Start the ONNX server to run ONNX models locally.
    • Stop Server: Stop the ONNX server if it's running.
    • Copy Endpoint: Copy the ONNX server endpoint to the clipboard for use in other contexts, such as your code integration.
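
Copy Endpoint and Copy Model Name are designed to feed these values into your own code. A minimal sketch, following the same pattern as the custom endpoint example earlier and assuming the local ONNX server exposes an OpenAI compatible route (paste the copied values over the placeholders):

    from openai import OpenAI

    # Paste the values from "Copy Endpoint" and "Copy Model Name" here.
    client = OpenAI(
        base_url="http://localhost:5272/v1",  # hypothetical copied endpoint
        api_key="not-needed-locally",         # local servers typically ignore the key
    )
    response = client.chat.completions.create(
        model="my-onnx-model",  # hypothetical copied model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)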

License and sign-in

Some models require a license and a sign-in account from the publisher or hosting service. In that case, you're prompted to provide this information before you can run the model in the model playground.

What you learned

In this article, you learned how to:

  • Explore and manage generative AI models in Foundry Toolkit.
  • Find models from various sources, including Microsoft Foundry, Foundry Local, GitHub, ONNX, OpenAI, Anthropic, Google, Ollama, and custom endpoints.
  • Add models to your toolkit and deploy them to Microsoft Foundry.
  • Add custom models, including Ollama and OpenAI compatible models, and test them in the playground or agent builder.
  • Use the model catalog to view available models and select the best fit for your AI application needs.
  • Use filters and search to find models quickly.
  • Browse models by category, such as Popular, GitHub, ONNX, and Ollama.
  • Convert and add custom ONNX models using the model conversion tool.
  • Manage models in MY RESOURCES/Models, including editing, deleting, refreshing, and viewing details.
  • Start and stop the ONNX server and copy endpoints for local models.
  • Handle license and sign-in requirements for some models before testing them.