Documentation

Topics Overview Overview Linux macOS Windows VS Code for the Web Raspberry Pi Network Additional Components Uninstall VS Code Tutorial Copilot Quickstart User Interface Personalize VS Code Install Extensions Tips and Tricks Intro Videos Overview Setup Quickstart Overview Language Models Context Tools Agents Customization Trust & Safety Overview Agents Tutorial Agents Window Planning Memory Tools Subagents Local Agents Copilot CLI Cloud Agents Third-Party Agents Overview Chat Sessions Add Context Inline Chat Review Edits Checkpoints Artifacts Panel Debug Chat Interactions Prompt Examples Overview Instructions Prompt Files Custom Agents Agent Skills Language Models MCP Hooks Plugins Context Engineering Customize AI Test-Driven Development Edit Notebooks with AI Test with AI Test Web Apps with Browser Tools Debug with AI MCP Dev Guide OpenTelemetry Monitoring Inline Suggestions Smart Actions Best Practices Security Troubleshooting FAQ Cheat Sheet Settings Reference MCP Configuration Workspace Context Display Language Layout Keyboard Shortcuts Settings Settings Sync Extension Marketplace Extension Runtime Security Themes Profiles Overview Voice Interactions Command Line Interface Telemetry Basic Editing IntelliSense Code Navigation Refactoring Snippets Overview Multi-Root Workspaces Workspace Trust Tasks Debugging Debug Configuration Testing Port Forwarding Integrated Browser Overview Quickstart Staging & Committing Branches & Worktrees Repositories & Remotes Merge Conflicts Collaborate on GitHub Troubleshooting FAQ Getting Started Tutorial Terminal Basics Terminal Profiles Shell Integration Appearance Advanced Overview Enterprise Policies AI Settings Extensions Telemetry Updates Overview JavaScript JSON HTML Emmet CSS, SCSS and Less TypeScript Markdown PowerShell C++ Java PHP Python Julia R Ruby Rust Go T-SQL C# .NET Swift Working with JavaScript Node.js Tutorial Node.js Debugging Deploy Node.js Apps Browser Debugging Angular Tutorial React Tutorial Vue Tutorial Debugging Recipes Performance Profiling Extensions Tutorial Transpiling Editing Refactoring Debugging Quick Start Tutorial Run Python Code Editing Linting Formatting Debugging Environments Testing Python Interactive Django Tutorial FastAPI Tutorial Flask Tutorial Create Containers Deploy Python Apps Python in the Web Settings Reference Getting Started Navigate and Edit Refactoring Formatting and Linting Project Management Build Tools Run and Debug Testing Spring Boot Modernizing Java Apps Application Servers Deploy Java Apps GUI Applications Extensions FAQ Intro Videos GCC on Linux GCC on Windows GCC on Windows Subsystem for Linux Clang on macOS Microsoft C++ on Windows Build with CMake CMake Tools on Linux CMake Quick Start C++ Dev Tools for Copilot Editing and Navigating Debugging Configure Debugging Refactoring Settings Reference Configure IntelliSense Configure IntelliSense for Cross-Compiling FAQ Intro Videos Get Started Navigate and Edit IntelliCode Refactoring Formatting and Linting Project Management Build Tools Package Management Run and Debug Testing FAQ Overview Node.js Python ASP.NET Core Debug Docker Compose Registries Deploy to Azure Choose a Dev Environment Customize Develop with Kubernetes Tips and Tricks Overview Jupyter Notebooks Data Science Tutorial Python Interactive Data Wrangler Quick Start Data Wrangler PyTorch Support Azure Machine Learning Manage Jupyter Kernels Jupyter Notebooks on the Web Data Science in Microsoft Fabric Foundry Toolkit Overview Foundry Toolkit Copilot Tools Create Agents Models Playground Agent Builder Agent Inspector Evaluation Tool Catalog Fine-Tuning (Automated Setup) Fine-Tuning (Project Template) Model Conversion Tracing Profiling (Windows ML) FAQ File Structure Manual Model Conversion Manual Model Conversion on GPU Setup Environment Without Foundry Toolkit Template Project Migrating from Visualizer to Agent Inspector Overview Getting Started Resources View Deployment VS Code for the Web - Azure Containers Azure Kubernetes Service Kubernetes MongoDB Remote Debugging for Node.js Overview SSH Dev Containers Windows Subsystem for Linux GitHub Codespaces VS Code Server Tunnels SSH Tutorial WSL Tutorial Tips and Tricks FAQ Overview Tutorial Attach to Container Create Dev Container Advanced Containers devcontainer.json Dev Container CLI Tips and Tricks FAQ Default Keyboard Shortcuts Default Settings Substitution Variables Tasks Schema

On this page there are 9 sections

Profiling an app using Windows Machine Learning

Profiling is a tool designed to help developers and AI engineers to diagnose CPU, GPU, and NPU resource usage of processes, profile ONNX models on different execution providers, and capture Windows ML events.

In this article, you learn how to start profiling and how to inspect the resource usage view and the events view.

Prerequisites

Install the latest version of Visual Studio Code.
Install the Foundry Toolkit VS Code extension. For more information, see install Foundry Toolkit.

Profile on app startup

In this mode, the profiling tool profiles the next app that is started and that is sending out Windows ML events. This option is ideal for testing a run-once app. In this case, you start profiling, then run the app, and the resource usage will appear.

The tool starts profiling a newly started app. This means that for profiling a Python notebook, if the kernel is already running, you need to restart the kernel to begin profiling for it. Just starting a new notebook does not automatically start profiling.

Important

To receive Windows ML events, the tool needs to be run in admin mode. If VS Code is not started in admin mode, a notification shows up and guides you to restart VS Code. You need to close all other VS Code instances to make the restart in admin mode work.

Profile a running app

In this mode, the profiling tool starts profiling an already running app. You can select a process based on these criteria:

Process ID: like 12345
Process name: usually the name of app without .exe. The first match will be profiled.
Process path: like c:\Users\xxx\Inference.Service.Agent.exe. The first match will be profiled.

This option is ideal for profiling an app that is already running and you're unable to restart it for profiling purposes.

Profile an ONNX model

In this mode, the profiling tool starts profiling an ONNX model file on a target execution provider (EP) for a given duration. You can see the resource usage while it's running.

This option is ideal for profiling an ONNX model on different EPs.

After profiling, a notification shows up to guide you to open or save the report.

The report contains detailed profiling statistics and results for the ONNX model.

Benchmark time for each operation

If OP Profiling is enabled, op level data will be generated to allow you to inspect the model in more detail.

The report contains detailed latencies for each op.

Profile an ONNX GenAI model

In this mode, the profiling tool starts profiling an ONNX GenAI model on a target execution provider (EP) for a specified number of prompts. You can see the resource usage while it's running.

Note

You need to select the folder of the GenAI model, this is the folder that contains genai_config.json.

Resource Usage view

In the main window, the plot on the top shows usage of CPU, GPU, NPU, and memory. The usage is updated every second, and is kept for 10 minutes. You can use the tools on the top right to navigate the timeline by zooming in, zooming out, and panning.

Note

This feature uses performance counters. To achieve higher accuracy, you could also try Windows Performance Recorder.

Windows ML Events view

In the main window, the plot on the bottom shows Windows ML events. Its timeline is synchronized with the Resource Usage view, so you can easily determine how resources are used when certain events occur.