DirectX Developer Blog
https://devblogs.microsoft.com/directx/
The latest news on Microsoft's Graphics and Display technology
Feed date: Fri, 15 May 2026 02:43:48 +0000

Advanced Shader Delivery expands Public Preview with AMD
https://devblogs.microsoft.com/directx/advanced-shader-delivery-expands-public-preview-with-amd/
Fri, 15 May 2026 14:00:08 +0000

The post Advanced Shader Delivery expands Public Preview with AMD appeared first on DirectX Developer Blog.

Last October, we released Advanced Shader Delivery (ASD) on the ROG Xbox Ally handhelds. Advanced Shader Delivery addresses one of the most frustrating challenges for PC players today – long load times and disruptive stuttering during a game’s first launch. The feature works by delivering precompiled shaders to your game at download time, reducing load time by up to 90% and eliminating shader stutter.

Since the launch of the ROG Xbox Ally handhelds, we’ve partnered with AMD to invest in and improve the experience of players on PC. Today, Advanced Shader Delivery expands beyond ROG Xbox Ally handhelds to Windows 11 PCs with discrete GPUs and gaming laptop integrated GPUs from AMD. Players can take advantage of this preview by joining Xbox Insiders.

This expansion includes Forza Horizon 6, which showcases the advantage of ASD with loading times reduced by 95%. We partnered closely with the team behind Forza Horizon 6 to give more players the advantage of ASD on day one.

Expanded experience:

With ASD, this loading screen finishes 95% faster on an AMD Radeon RX 9060 GPU

Today marks the release of Forza Horizon 6. With Advanced Shader Delivery, Forza Horizon 6 loads in 4 seconds, compared to almost 1.5 minutes without it – an overall time savings of about 95%! This measurement was taken on an AMD Radeon RX 7600 GPU and an AMD Ryzen 7 5800 8-Core processor. ASD also reduces shader stutter during play by avoiding just-in-time compilation of shaders during gameplay.

Forza Horizon 6 is a great title available in the Xbox PC app that showcases the benefits of ASD, loading instantly and with no shader stutter!

Check out our announcement from the ROG Xbox Ally and Ally X release that details other titles that support ASD and are available in the Xbox PC app.

Device Requirements:

To experience Advanced Shader Delivery at launch, you need the following minimum specs:

OS: Windows 11 24H2 or higher

Xbox Gaming Services: 37.113.11003.0 or higher (Microsoft Store > Library > Update Gaming Services)

Xbox Insider Hub: Open Xbox Insider Hub > Select Previews > PC Gaming Preview

GPU: AMD RDNA 3, RDNA 3.5, RDNA 4 architectures

Driver: Adrenalin 26.5.2 or higher

‘Precompiled shaders installed’ will appear in the launch window when Advanced Shader Delivery is working

Call to Action: Bringing ASD to more titles

If you are a game developer interested in lighting up Advanced Shader Delivery for your own title, this blog post details how you can leverage the latest AgilitySDK to take advantage of the benefits of ASD. You can upload your title with a state object database (SODB) to the Xbox Partner Center to support pre-compilation for your title today.

What’s coming next?

In the coming months, we will be enabling ASD on more Windows devices and other IHV hardware. Stay tuned for more information on our blog.

Feedback?

For feedback on the Advanced Shader Delivery experience, you can let us know via our DirectX Discord.


Automatic Super Resolution Preview Comes to the ROG Xbox Ally X for Docked Play
https://devblogs.microsoft.com/directx/autosrpreview/
Thu, 30 Apr 2026 16:00:12 +0000

The post Automatic Super Resolution Preview Comes to the ROG Xbox Ally X for Docked Play appeared first on DirectX Developer Blog.

We previously introduced Automatic Super Resolution (Auto SR) on select Windows 11 Copilot+ PCs, to make games look sharper and play smoother. Today, we’re excited to give Xbox Insiders the opportunity to help us test and refine this feature on the ROG Xbox Ally X for docked play, where balancing framerate (FPS) and image quality can be especially challenging.

Imagine this… You’ve just had a great gaming session on your ROG Xbox Ally X on the go. Everything looks incredibly sharp and plays smoothly on your 7-inch screen. You dock it, lean back, and look up at your TV to keep playing, but now, stretching across a much larger screen, the image looks softer. You push the resolution and graphics settings higher to bring back detail, and FPS drops. You dial it back to keep things smooth, but you lose detail again. You’re left choosing between visuals and FPS, but you want both!

That’s where Auto SR comes in. By upscaling frames rendered at lower resolutions, Auto SR can help deliver smooth gameplay without losing sharpness. On the ROG Xbox Ally X, that means 1440p-like visuals and higher FPS on larger screens, where this balance matters most.

Take a look at Auto SR in action. Below are two frames from Forza Horizon 5: one rendered natively at 1440p, and the other enhanced with Auto SR. Look closely — can you tell the difference?

Not easy, right? Auto SR is the frame on the right, delivering 1440p-like visuals with more than a 30% FPS boost. What matters most: with Auto SR you get both.

Key Highlights

Why We’re Starting with Docked Play

Docked play means larger screens and higher resolutions, where drops in image quality are more noticeable or where some games struggle to maintain smooth FPS. That’s exactly the problem Auto SR was designed to solve, so we’re starting the preview with docked mode where we expect players will see the most value.

Super Resolution Today

Super resolution works by rendering at a lower resolution to boost FPS, then upscaling the frames to restore detail. It is often a core part of how many modern games render, and players expect it.
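To make the pixel savings concrete, here is a small C++ sketch of the arithmetic. It assumes the conventional 16:9 sizes for 720p (1280×720) and 1440p (2560×1440); the Resolution type and the function names are illustrative only and are not part of any Auto SR API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a render or output resolution in pixels.
struct Resolution {
    uint32_t width;
    uint32_t height;
};

// Total number of pixels in one frame at the given resolution.
uint64_t pixel_count(Resolution r) {
    return static_cast<uint64_t>(r.width) * r.height;
}

// How many output pixels correspond to each rendered pixel after upscaling.
double upscale_pixel_ratio(Resolution render, Resolution output) {
    assert(pixel_count(render) > 0);  // avoid dividing by zero
    return static_cast<double>(pixel_count(output)) /
           static_cast<double>(pixel_count(render));
}
```

Under these assumptions, rendering at 720p and upscaling to 1440p means the GPU shades one quarter of the output pixels (921,600 instead of 3,686,400).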

Previously, super resolution came in two forms:

Why Auto SR Matters on the ROG Xbox Ally X

For super resolution to deliver the most value, it needs to provide both high image quality and FPS gains at once. Game-integrated super resolution does an excellent job delivering that balance, but gaps remain, most notably on gaming handhelds like the ROG Xbox Ally X.

Windows integration expands high-quality super resolution coverage: Not every game ships with game-integrated super resolution. Because Auto SR is built into Windows, it can broadly apply high-quality super resolution to existing games, especially those without game-integrated super resolution.

Larger super resolution models avoid memory bandwidth bottlenecks that can limit FPS: Super resolution means fewer pixels for the GPU to render, which traditionally means higher FPS. But game-integrated super resolution still relies on the game to produce more detailed surface textures than a lower resolution render would normally need. Otherwise, the upscaled image looks soft, like zooming in on a low-quality photo. All that texture data must move through memory every frame. On handheld PCs, where memory bandwidth is constrained, this directly limits the FPS gains super resolution is designed to deliver. Until now, the only option has been reducing texture quality at the cost of visual quality. Auto SR takes a different approach to super resolution, and uses larger models that can reconstruct texture detail rather than relying on the game to provide it, avoiding the bandwidth demands that limit FPS on these devices.

NPU enables higher quality and higher FPS super resolution: The longer it takes to render a frame, the lower your FPS. When super resolution runs on the GPU, it counts towards frame time. To avoid impacting FPS, models are limited to a minuscule 1–2ms, constraining their size and quality. Game-integrated super resolution fits in this window and still delivers quality by relying on the game to provide more detailed texture data. Auto SR sidesteps this limit by running larger models on the NPU in parallel with the GPU. This gives Auto SR an entire extra frame of time to run the model — critical for devices like the ROG Xbox Ally X, that couldn’t otherwise run these models without significantly impacting FPS. This also lets the GPU move straight to the next frame, so there is essentially no frame time overhead, giving Auto SR the potential to deliver high-quality super resolution at the theoretical maximum FPS in exchange for a frame of latency. GPU-based super resolution can’t do this.
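The frame-time argument above can be sketched numerically. The following C++ model is a simplification under stated assumptions: a GPU upscaler adds its cost to every frame, while an NPU upscaler runs as a second pipeline stage, so throughput is bounded by the slower stage and the result arrives one frame interval later. The function names and the example timings are hypothetical, not measurements.

```cpp
#include <algorithm>
#include <cassert>

// Frames per second for a given per-frame time in milliseconds.
double fps_from_frame_ms(double frame_ms) {
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}

// GPU-based upscaling: the model's cost is added to the GPU's frame time.
double fps_with_gpu_upscale(double render_ms, double upscale_ms) {
    return fps_from_frame_ms(render_ms + upscale_ms);
}

// NPU-based upscaling (assumed fully pipelined): the GPU starts the next frame
// while the NPU upscales the previous one, so the slower stage sets the pace.
double fps_with_npu_upscale(double render_ms, double npu_model_ms) {
    return fps_from_frame_ms(std::max(render_ms, npu_model_ms));
}

// Extra latency of the pipelined path in this model: one frame interval.
double npu_added_latency_ms(double render_ms, double npu_model_ms) {
    return std::max(render_ms, npu_model_ms);
}
```

For example, with a hypothetical 10 ms render time, a 2 ms GPU upscaler yields about 83 FPS, while an 8 ms NPU model running in parallel still yields 100 FPS at the cost of roughly 10 ms of added latency.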

What does this add up to? Texture-heavy games at higher resolutions and graphics settings are where super resolution is needed the most, but on gaming handhelds, that’s also where super resolution is hardest to deliver. Game-integrated super resolution remains the preferred choice. Auto SR steps in where game-integrated super resolution isn’t available or when hardware constraints prevent it from simultaneously delivering quality and FPS.

Choosing the Right Super Resolution Option On My ROG Xbox Ally X

Alongside Auto SR, the AMD Ryzen AI Z2 Extreme processor also supports AMD FSR Upscaling, RSR, AMD FSR Frame Generation, and AMD Fluid Motion Frames (AFMF). Here’s a quick guide developed in collaboration with AMD to help you choose the best option based on your play goals.

Scenario | Player Guidance
Games run below 60 FPS | Enable Super Resolution. Both Auto SR and AMD FSR Upscaling deliver substantial gains across a wide range of games. Choose the upscaling that best fits your image quality and FPS needs. If neither is available, use RSR.
Games run below 60 FPS with super resolution enabled | Enable Auto SR + AFMF. Disable other super resolution and frame generation options when using this combination.

Examples

Now, let’s see some more examples! Forza Horizon 5 runs smoothly on the ROG Xbox Ally X’s internal screen, hitting 60 FPS at 1080p using “High” settings. Docked to a larger screen, Auto SR helps deliver higher visual detail by enabling the game’s “Ultra” settings with a 30% FPS boost over native 1440p at similar visual quality.

When compared to 720p, the visual improvement is striking, as shown below. Auto SR brings back much of the texture and detail you’d expect from higher resolutions, turning what would be a soft 720p image into something far sharper and more detailed. Auto SR delivers 1440p-level image quality at framerates typically equivalent to rendering the game natively at 720p, though framerates may run slightly lower under heavy power loads.

Getting Started

  1. Enroll in Xbox Insider on PC to get started with Auto SR on your ROG Xbox Ally X.
  2. Confirm Auto SR is available:
    • Open Xbox Game Bar (press the Xbox button)
    • Navigate to the Display Widget and look for the Auto SR tab
  3. Make sure your device is up to date. If the Auto SR tab isn’t showing, the rollout may still be reaching your device.
    • Game Bar: Exit Xbox mode (Game Bar > Settings > Exit Xbox mode), then check Microsoft Store > Downloads for updates.
    • Auto SR package: Install the latest from the Microsoft Store.

Need Step-by-Step Instructions to Enable Auto SR?

Visit the Auto SR support page

 

Preview Notes and Feedback

As a preview feature, Auto SR is still evolving. Every PC game behaves a little differently, and there’s no one-size-fits-all setup. You may need to follow different steps depending on the game, and you might notice minor quirks along the way. Keep an eye on Game Bar status for guidance, and refer back to the support page if needed.

Help shape Auto SR

How is it working in your games? How does the setup and control feel?

Tell us at autosr@microsoft.com

Games to Try Auto SR On

Auto SR is most useful for titles running below 60 FPS. If your game is already running smoothly, Auto SR lets you turn up the resolution or graphics settings to get even better visuals while keeping FPS smooth.

Suggested Games

Once you are set up, try Auto SR with your favorite DirectX games (DX10 or later) or one of these: Assassin’s Creed: Mirage, Assassin’s Creed: Valhalla, Assetto Corsa, Avowed, Control, Dead Island: Definitive Edition, DOOM: The Dark Ages, Far Cry 6, Frostpunk 2, Grounded 2, Psychonauts 2, Rise of the Tomb Raider, The Outer Worlds 2, Tom Clancy’s Rainbow Six Siege, Tom Clancy’s The Division 2 and War Thunder.

What’s Next

Auto SR improves visual quality and framerates across supported games today. With this preview, Auto SR becomes a tool that puts players in control, letting them enable it based on their preference. Your feedback will help us make that control easier to discover and use. We’re also exploring expanding the scenarios Auto SR supports and continuing to improve quality and performance. Stay tuned for more.


Announcing Shader Model 6.10 Preview and AgilitySDK 720 Preview
https://devblogs.microsoft.com/directx/shader-model-6-10-agilitysdk-720-preview/
Mon, 27 Apr 2026 17:01:04 +0000

The post Announcing Shader Model 6.10 Preview and AgilitySDK 720 Preview appeared first on DirectX Developer Blog.

Overview

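Today, we are pleased to announce that Shader Model 6.10 and other features have been officially released with Agility SDK 1.720-preview and complementary DXC 1.10.2605.2. AgilitySDK 1.720-preview exposes the following features. There’s more detail further below, including download and driver links.

Shader Model 6.10 (via DXC 1.10.2605.2): linalg::Matrix, Group Wave Index, Variable Group Shared Memory, Raytracing intrinsics

D3D12: Batched Asynchronous Command List APIs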

Downloads

Hardware Support

IHV | Driver Link(s)
AMD | AMD Software: AgilitySDK Developer Preview Edition 25.30.41.02
Intel | Intel® Arc Graphics – Windows*
NVIDIA | Contact your developer relations representative for in-development driver access.

See Appendix > Feature Support for the full table of each feature’s supported hardware.

Features

HLSL Features (Shader Model 6.10):

linalg::Matrix

Shader Model 6.10 introduces a set of Matrix APIs covering a broad swath of use cases. Collectively the feature is called LinAlg (short for Linear Algebra).

We’ve written a dedicated blog post covering the feature in depth here.

Also see the GDC 2026 blog putting this feature in context of the overall ML story for DirectX here.

HLSL Spec: hlsl-specs/proposals/0035-linalg-matrix.md at main · microsoft/hlsl-specs

Group Wave Index

Shader Model 6.10 introduces two new intrinsics, GetGroupWaveIndex() and GetGroupWaveCount(), that give compute, mesh, amplification, and node shaders direct knowledge of wave-level structure within a thread group. GetGroupWaveIndex() returns the current wave’s index (0 to N-1) and GetGroupWaveCount() returns the total number of waves executing the group. These enable wave-level work specialization and cooperation without relying on unsafe workarounds like dividing SV_GroupIndex by WaveGetLaneCount(), which is not guaranteed to be correct across all hardware. A single code path now works portably across all wave sizes.

HLSL Spec: hlsl-specs/proposals/0048-group-wave-index.md at main · microsoft/hlsl-specs
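As a rough illustration of wave-level work specialization, the following C++ sketch models on the CPU how a shader might split a group's work items across waves once it knows its wave index and the wave count. It is not HLSL: the wave_index and wave_count parameters stand in for the values GetGroupWaveIndex() and GetGroupWaveCount() would return, and the Range type and wave_work_range function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Half-open range [begin, end) of work items assigned to one wave.
struct Range {
    uint32_t begin;
    uint32_t end;
};

// Splits total_items as evenly as possible across wave_count waves.
// The first (total_items % wave_count) waves receive one extra item.
Range wave_work_range(uint32_t total_items, uint32_t wave_index, uint32_t wave_count) {
    assert(wave_count > 0 && wave_index < wave_count);
    const uint32_t base = total_items / wave_count;
    const uint32_t extra = total_items % wave_count;
    const uint32_t begin = wave_index * base + (wave_index < extra ? wave_index : extra);
    const uint32_t size = base + (wave_index < extra ? 1u : 0u);
    return {begin, begin + size};
}
```

The same function covers any wave count: 10 items split across 4 waves gives [0,3), [3,6), [6,8), [8,10), and across 2 waves gives [0,5), [5,10). That is the sense in which a single code path can stay portable across wave sizes.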

Variable Group Shared Memory

Shader Model 6.10 lifts the longstanding 32 KB (28 KB for mesh shaders) cap on groupshared memory by exposing the actual hardware limit through a new runtime query, MaxGroupSharedMemoryPerGroup. Shader authors can use a new [GroupSharedLimit(<bytes>)] entry-point attribute to declare the maximum shared memory their shader requires, giving the compiler a compile-time portability check while still allowing access to the full capacity of modern GPUs. Shaders that omit the attribute continue to be validated against the legacy limits, so existing code is unaffected. This unlocks algorithms like large tile culling, software rasterization bins, and big matrix workloads that were previously constrained by the spec rather than the hardware.

HLSL Spec: hlsl-specs/proposals/0049-variable-groupshared-memory.md at main · microsoft/hlsl-specs
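The validation rules described above can be modelled roughly as follows. This is a CPU-side C++ sketch, not the compiler's or runtime's actual logic: declared_limit_bytes stands in for the [GroupSharedLimit(<bytes>)] value (0 meaning the attribute is omitted), device_max_bytes stands in for the value reported by MaxGroupSharedMemoryPerGroup, and comparing the declared limit against the device value is an assumption about how an application might use the query.

```cpp
#include <cassert>
#include <cstdint>

// Legacy caps quoted in the post.
const uint32_t kLegacyLimitBytes = 32u * 1024u;      // default cap
const uint32_t kLegacyMeshLimitBytes = 28u * 1024u;  // mesh shaders

// Compile-time style check: the shader's groupshared usage must fit within the
// declared limit, or within the legacy cap when no limit is declared (0).
bool fits_declared_limit(uint32_t used_bytes, uint32_t declared_limit_bytes,
                         bool is_mesh_shader) {
    const uint32_t legacy = is_mesh_shader ? kLegacyMeshLimitBytes : kLegacyLimitBytes;
    const uint32_t limit = declared_limit_bytes != 0 ? declared_limit_bytes : legacy;
    return used_bytes <= limit;
}

// Runtime style check (assumed usage): the declared limit must not exceed
// what the device reports as its per-group maximum.
bool device_can_run(uint32_t declared_limit_bytes, uint32_t device_max_bytes) {
    return declared_limit_bytes <= device_max_bytes;
}
```

In this model, a shader using 48 KB passes only if it declares a limit of at least 48 KB, and it is then usable only on devices whose reported maximum covers that declared limit.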

Raytracing intrinsics

TriangleObjectPositions() is an intrinsic that can be called from an Any hit or Closest hit shader or RayQuery to obtain the positions of the vertices for the triangle that has been hit.

Spec: https://github.com/microsoft/hlsl-specs/blob/main/proposals/0041-triangle-object-positions.md
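One common use for per-hit vertex positions is computing a geometric (face) normal. The following C++ sketch shows that math on the CPU; it is an illustration rather than HLSL, and the Vec3 type and helper names are hypothetical. In a shader, p0, p1, and p2 would be the three positions obtained from TriangleObjectPositions().

```cpp
#include <cassert>

// Minimal 3-component vector used only for this illustration.
struct Vec3 {
    float x, y, z;
};

Vec3 subtract(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Unnormalized face normal of the triangle (p0, p1, p2).
// Its direction depends on the winding order of the vertices.
Vec3 face_normal(const Vec3& p0, const Vec3& p1, const Vec3& p2) {
    const Vec3 n = cross(subtract(p1, p0), subtract(p2, p0));
    assert(n.x != 0.0f || n.y != 0.0f || n.z != 0.0f);  // reject degenerate triangles
    return n;
}
```

Swapping any two vertices flips the sign of the result, so the winding convention matters when the normal is used for shading or culling decisions.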

ClusterID() is an intrinsic that can be called from an Any hit or Closest hit shader or RayQuery to return the user-defined ID of a cluster. This isn’t currently useful since clustered geometry support for DXR isn’t ready yet.

HLSL Spec: https://github.com/microsoft/hlsl-specs/blob/main/proposals/0045-clustered-geometry.md

D3D12 Raytracing spec with work-in-progress clustered geometry design (not shipped yet): https://github.com/microsoft/DirectX-Specs/blob/master/d3d/Raytracing2.md

Once the features in this spec ship (tentatively starting with a preview in fall 2026), the ClusterID() intrinsic will become useful.

 

D3D12 Features:

Batched Asynchronous Command List APIs

D3D12’s legacy CopyBufferRegion, ClearUnorderedAccessViewFloat/Uint, ResolveSubresource, and similar commands all execute strictly in series because the old ResourceBarrier model has no way to express a dependency between two operations of the same type (e.g. copy-dest to copy-dest). This means the GPU stalls between every sequential copy or clear, even when the operations touch completely independent memory. The Batched Async Commands feature addresses this by introducing new command list methods that remove the implicit serialization contract, allowing the driver and hardware to overlap independent work within a single batch call. Developers opt into explicit synchronization using enhanced barriers only where true data hazards exist – such as when two copies write to overlapping regions of the same buffer – and everything else runs concurrently.
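The notion of a "true data hazard" between batched copies can be sketched as an interval-overlap test. The C++ below is a conceptual CPU-side model only: CopyDest, writes_overlap, and batch_needs_barrier are hypothetical names, buffer_id is a stand-in for a destination resource, and none of this reflects the actual D3D12 method signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Destination region written by one copy in a batch (hypothetical model).
struct CopyDest {
    int buffer_id;    // stand-in for the destination resource
    uint64_t offset;  // first byte written
    uint64_t size;    // number of bytes written
};

// Two writes conflict only if they target the same buffer and their
// half-open byte ranges [offset, offset + size) intersect.
bool writes_overlap(const CopyDest& a, const CopyDest& b) {
    return a.buffer_id == b.buffer_id &&
           a.offset < b.offset + b.size &&
           b.offset < a.offset + a.size;
}

// A batch needs explicit synchronization only if some pair of writes overlaps.
bool batch_needs_barrier(const std::vector<CopyDest>& batch) {
    for (std::size_t i = 0; i < batch.size(); ++i) {
        for (std::size_t j = i + 1; j < batch.size(); ++j) {
            if (writes_overlap(batch[i], batch[j])) {
                return true;
            }
        }
    }
    return false;
}
```

In this model, copies into adjacent or separate regions can all run concurrently, and only a batch containing overlapping writes to the same buffer would call for a barrier.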

The feature also modernizes clears with ClearTextureSubresources, which clears textures directly by resource pointer and format – no RTV, UAV, descriptor heaps, or special resource flags required. This is notably the first D3D12 clear that works on block-compressed formats. Correspondingly, FillBuffers adds batched, format-aware or raw-pattern buffer fills with configurable repeat counts, replacing the descriptor gymnastics of UAV clears. In addition, new ClearBoundRenderTargetViews and ClearBoundDepthStencilView commands further improve ergonomics by operating on currently bound targets, enabling mid-render-pass clears and batch clearing multiple RTVs in a single call.

PIX

PIX supports all features released here. See the PIX release blog: https://devblogs.microsoft.com/pix/pix-2604-27004-preview/

Appendix

Feature Support

Using the latest drivers linked in Overview > Hardware Support:

Feature | AMD | Intel | NVIDIA
linalg::Matrix | Supported on AMD Radeon RX 9000 series graphics products. | Planned for an upcoming release. | Supported on all RTX hardware.
Group Wave Index | Supported on AMD Radeon RX 7000 and 9000 series graphics products. | Supported on Intel® Arc B-Series Graphics. | Planned for an upcoming release.
Variable Group Shared Memory | Supported on AMD Radeon RX 7000 and 9000 series graphics products. Supports default memory limit size only. Higher size limits are planned for future driver releases. | Supported on Intel® Arc B-Series Graphics. | Supported on all RTX hardware. Values differ across hardware.
Raytracing intrinsics: TriangleObjectPositions/ClusterID | Supported on AMD Radeon RX 7000 and 9000 series graphics products. | Supported on Intel® Arc B-Series Graphics. | Supported on all RTX hardware.
Batched Asynchronous Command List APIs | Supported on AMD Radeon RX 7000 and 9000 series graphics products. | Supported on Intel® Arc B-Series Graphics. | Supported on all RTX hardware.


D3D12 LinAlg Matrix Preview
https://devblogs.microsoft.com/directx/d3d12-linalg-preview/
Mon, 27 Apr 2026 17:00:50 +0000

The post D3D12 LinAlg Matrix Preview appeared first on DirectX Developer Blog.

Welcome to the D3D12 LinAlg Matrix Preview release!

Today, we are excited to announce the preview release for the D3D12 Linear Algebra APIs! This feature set unlocks comprehensive hardware acceleration for matrix-oriented operations across various use cases. Previously, we announced the WaveMMA and Cooperative Vectors features, which supported narrow matrix operation use cases; the LinAlg feature set being announced today subsumes these APIs into a single set of orthogonal APIs. With today’s announcement, we are enabling developers both to efficiently drive neural rendering techniques directly from individual shader threads in real-time graphics pipelines and to utilize higher-bandwidth matrix multiply-accumulate (MMA) operations for ML and image processing applications, all in a single combined API.

The application of machine learning techniques is now ubiquitous across the industry. For graphics development, neural network based rendering methods, which we’ve been calling neural rendering, are quickly growing in popularity. At the same time, offloading high bandwidth matrix compute onto the GPU is unquestionably at an all-time high. As such, GPU vendors continue to adopt and expand specialized hardware for matrix operations, and the new LinAlg Matrix APIs put the power of that hardware into your hands!

This blog post is part of the larger SM6.10 preview announcement. See the parent blog post for the full feature set. Also see the GDC 2026 blog putting this feature in context of the overall ML story for DirectX here.

Motivation

Unlocking efficient use of the GPU’s specialized matrix hardware is the core motivation for the introduction of the LinAlg Matrix APIs. Thanks to the preview process, we were able to go back to the drawing board and evolve the previous design. Thank you for all the feedback and please keep it coming! We will continue to evolve the LinAlg Matrix APIs over the preview period in response to real world feedback.

The new API supports three modes of operations (called Matrix Scopes), motivating different key matrix use cases:

MatrixScope::Thread (previously previewed as Cooperative Vectors)

A thread-scope matrix is expected to be used within the context of a graphics shader thread (SIMT mode). One potential example is running inference on a neural network trained to compute lighting; here, the neural network would be a drop-in replacement for classical physics-based computations. Thread-scope matrices enable incremental adoption of ML techniques into graphics shaders. The compiler can efficiently map these inferencing operations to dedicated hardware accelerators.

MatrixScope::Wave (previously previewed as WaveMMA)

High bandwidth dedicated matrix multiplication hardware is increasingly available in contemporary GPUs. A wave-scope matrix surfaces access to this hardware for complex machine learning and image processing applications. A typical application may employ smaller matrices or manually tiled larger matrices for hardware accelerated matrix-matrix multiplications.

MatrixScope::ThreadGroup

MatrixScope::ThreadGroup is new to the LinAlg Matrix API. It is compatible with all the operations of a wave-scope matrix above but serves a different use case. The input and weight matrices used in LLM-like networks are much larger than the sizes allowed for wave-scope matrices. To serve this case with a wave-scope matrix, manual tiling is mandatory, and for cross-hardware performance, multiple different kernels would be required. Conversely, a threadgroup-scope matrix’s larger size avoids manual tiling. The tiling decision is shifted to the driver, allowing you to ship a single implementation while still retaining optimal tiling.

Feature Overview

Shader Model 6.10 introduces high level linear algebra APIs building on top of the Long Vectors and Native DXIL Vectors features released as part of SM6.9. The high level API is converted into a “mid level” API consumed by the driver.  The mid level API maintains high levels of context, enabling the driver to take advantage of the underlying hardware capabilities. Meanwhile, the high level API enables better source level usage rules and fast iteration. This API is centered on the new Matrix type provided as a permissively licensed HLSL source header. Depending on the declaration of a specific Matrix instance, various operations are enabled or disabled at compilation time. For example, MatrixScope::Thread is roughly limited to matrix-vector operations, while MatrixScope::Wave and MatrixScope::ThreadGroup are roughly limited to matrix-matrix operations. You can view the full table of available operations in the LinAlg spec.

Code Examples

Below are examples of primary use cases for the Matrix header, each serving a different goal with different available operations. You can find some of these examples here on GitHub.

Cooperative Vectors Example

// Compiled with the line below
// bin/dxc -I ./include/hlsl -T cs_6_10 -enable-16bit-types coop-vec-example.hlsl

// System header containing the LinAlg Matrix APIs
#include <dx/linalg.h>

// The API is nested under dx::linalg. Simplify the example by using it
using namespace dx::linalg;

// Byte Address Buffer to load/store the matrices
ByteAddressBuffer InBuff : register(t0);

[numthreads(8, 1, 1)]
[shader("compute")]
void main() {
  // The Matrix type names can get quite long. Alias them for readability
  // Looking at the template arguments we have:
  //   ComponentType::F16 - The matrix holds an F16 type
  //   16 - The M dimension of the matrix is 16
  //   16 - The N dimension of the matrix is 16
  //   MatrixUse::A - The Matrix is an "A" matrix, so it only fits into the "A"
  //     slot of various functions
  //   MatrixScope::Thread - The Matrix is a "Thread" matrix, so it may only be
  //     used with "Thread Matrix" operations. These are the operations
  //     previously covered under the Cooperative Vector API
  using MatrixATy = Matrix<ComponentType::F16, 16, 16, MatrixUse::A, MatrixScope::Thread>;

  // Setup data for later by loading the matrix and creating null vectors
  vector<float16_t, 16> Vec = (vector<float16_t, 16>)0;
  vector<float16_t, 16> Bias = (vector<float16_t, 16>)0;
  MatrixATy MatA = MatrixATy::Load<MatrixLayout::RowMajor>(
      InBuff, 0, /* Row stride = number of columns * element size */ 16 * 2);

  // Do a F16 Matrix x Vector multiply
  vector<float16_t, 16> Layer1 = Multiply<float16_t>(MatA, Vec);

  // Do a F16 Matrix x Vector multiply with a bias Vector
  vector<float16_t, 16> Layer2 = MultiplyAdd<float16_t>(MatA, Layer1, Bias);

  // Create a reference to an in-memory vector at offset 4096 in InBuff
  // without actually loading it in
  VectorRef<ComponentType::F8_E4M3FN, 16> MemBias = {InBuff, /*start offset*/ 4096};

  // Do a F16 Matrix x Vector multiply with a bias vector stored in memory
  vector<float16_t, 16> Layer3 = MultiplyAdd<float16_t>(MatA, Layer2, MemBias);

  // Create some packed data
  vector<uint8_t4_packed, 4> SomeData = (vector<uint8_t4_packed, 4>)0;

  // Do a MatVecMulAdd but reinterpret the Vec data as F8_E4M3FN with a bias
  // stored in memory
  vector<float16_t, 16> Layer4 = MultiplyAdd<float16_t>(
      MatA, MakeInterpretedVector<ComponentType::F8_E4M3FN>(SomeData), MemBias);

  // Do a MatVecMulAdd but reinterpret the Vec data as F8_E4M3FN with a regular
  // bias vector
  vector<float16_t, 16> Layer5 = MultiplyAdd<float16_t>(
      MatA, MakeInterpretedVector<ComponentType::F8_E4M3FN>(SomeData), Bias);

  // Create some uint data
  vector<uint, 16> SomeData2 = (vector<uint, 16>)0;

  // Do a MatVecMulAdd but convert SomeData2 from a U32 to a F8_E4M3FN first
  vector<float16_t, 16> Layer6 = MultiplyAdd<float16_t>(
      MatA, Convert<ComponentType::F8_E4M3FN, ComponentType::U32>(SomeData2), MemBias);
}

OuterProduct and InterlockedAccumulate Example

// Compiled with the line below
// bin/dxc -I ./include/hlsl -T cs_6_10 -enable-16bit-types outerproduct-example.hlsl

// System header containing the LinAlg Matrix APIs
#include <dx/linalg.h>

// The API is nested under dx::linalg. Simplify the example by using it
using namespace dx::linalg;

// Byte Address Buffer to load/store from
RWByteAddressBuffer OutBuff : register(u0);

[numthreads(8, 1, 1)]
[shader("compute")]
void main() {
  // The Matrix type names can get quite long. Alias them for readability
  // Looking at the template arguments we have:
  //   ComponentType::F16 - The matrix holds an F16 type
  //   16 - The M dimension of the matrix is 16
  //   8 - The N dimension of the matrix is 8
  //   MatrixUse::Accumulator - The Matrix is an "Accumulator" matrix, so it
  //     only fits into the "Accumulator" slot of various functions
  //   MatrixScope::Thread - The Matrix is a "Thread" matrix, so it may only be
  //     used with "Thread Matrix" operations.
  using MatrixAccumTy = Matrix<ComponentType::F16, 16, 8, MatrixUse::Accumulator, MatrixScope::Thread>;

  // Create some F16 vectors with placeholder data
  vector<float16_t, 16> VecA = (vector<float16_t, 16>)0;
  vector<float16_t, 8> VecB = (vector<float16_t, 8>)0;

  // Create an Accum matrix by outer producting the two vectors
  MatrixAccumTy MatAcc = OuterProduct<ComponentType::F16>(VecA, VecB);

  // Atomically accumulate the result into the output buffer
  MatAcc.InterlockedAccumulate(OutBuff, 0);
}

Wave Matrix Example

// Compiled with the line below
// bin/dxc -I ./include/hlsl -T cs_6_10 -enable-16bit-types linalg-wave.hlsl

// System header containing the LinAlg Matrix APIs
#include <dx/linalg.h>

// The API is nested under dx::linalg. Simplify the example by using it
using namespace dx::linalg;

// This shader performs matrix multiplication C = α*A*B + β*C
// where A, B, and C are matrices of dimensions MxK, KxN, and MxN respectively.
// The shader uses wave-level parallelism to compute tiles of the output matrix
// C. Each wave computes a TILE_SIZExTILE_SIZE tile of C. The dispatch must
// allocate waves for each tile of the MxN output matrix.

// GEMM constants
cbuffer GemmConstants : register(b0) {
  float alpha; // Scalar multiplier for A*B
  float beta;  // Scalar multiplier for existing C
}

ByteAddressBuffer MatrixA;
ByteAddressBuffer MatrixB;
RWByteAddressBuffer MatrixC;

// Matrix dimensions - can be configured as needed
#define M 1024 // Rows in A and C
#define N 1024 // Columns in B and C
#define K 1024 // Columns in A, rows in B
#define TILE_SIZE 16

// Optimized GEMM using wave-level parallelism
[numthreads(TILE_SIZE, 1, 1)]
void main(uint3 group_id : SV_GroupID) {
  // Matrix type definitions for wave scope
  using MatrixATy = Matrix<ComponentType::F16, TILE_SIZE, TILE_SIZE, MatrixUse::A, MatrixScope::Wave>;
  using MatrixBTy = Matrix<ComponentType::F16, TILE_SIZE, TILE_SIZE, MatrixUse::B, MatrixScope::Wave>;
  using MatrixResultTy = Matrix<ComponentType::F32, TILE_SIZE, TILE_SIZE, MatrixUse::Accumulator, MatrixScope::Wave>;

  // Calculate tile coordinates for this thread group
  uint tile_row = group_id.y;
  uint tile_col = group_id.x;

  // Initialize accumulator
  MatrixResultTy c_tile = MatrixResultTy::Splat(0.0f);

  // Perform tiled matrix multiplication across K dimension
  for (uint k = 0; k < K; k += TILE_SIZE) {
    // Calculate byte offsets for A and B tiles
    uint a_offset = ((tile_row * TILE_SIZE) * K + k) * sizeof(half);
    uint b_offset = (k * N + (tile_col * TILE_SIZE)) * sizeof(half);

    // Load A and B tiles for this K iteration using ByteAddressBuffer
    MatrixATy a_k_tile = MatrixATy::Load(
        MatrixA, a_offset, K * sizeof(half), MatrixLayout::RowMajor);
    MatrixBTy b_k_tile = MatrixBTy::Load(
        MatrixB, b_offset, N * sizeof(half), MatrixLayout::RowMajor);

    // Multiply and accumulate with mixed precision (half inputs -> float accumulation)
    c_tile.MultiplyAccumulate(a_k_tile, b_k_tile);
  }

  // Calculate output offset for GEMM equation: C = α*A*B + β*C
  uint c_offset = ((tile_row * TILE_SIZE) * N + (tile_col * TILE_SIZE)) * sizeof(float);

  // Load existing C tile
  MatrixResultTy c_existing = MatrixResultTy::Load(MatrixC, c_offset, N * sizeof(float), MatrixLayout::RowMajor);

  // Apply GEMM scaling element-wise: α*A*B + β*C
  for (uint i = 0; i < c_tile.Length(); i++) {
    float ab_val = c_tile.Get(i);
    float c_val = c_existing.Get(i);
    float result = alpha * ab_val + beta * c_val;
    c_tile.Set(i, result);
  }

  c_tile.Store(MatrixC, c_offset, N * sizeof(float), MatrixLayout::RowMajor);
}
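The wave shader above reads its tile column from SV_GroupID.x and its tile row from SV_GroupID.y, so the host side has to dispatch one thread group per TILE_SIZE x TILE_SIZE tile of C. The following is a minimal C++ sketch of that arithmetic, assuming M and N are multiples of the tile size (as they are with the 1024 and 16 values used here). The names TileGrid, ComputeTileGrid, and the *TileOffsetBytes helpers are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: thread group counts for the wave-scope GEMM shader.
struct TileGrid {
    std::uint32_t groupsX; // maps to tile_col (SV_GroupID.x)
    std::uint32_t groupsY; // maps to tile_row (SV_GroupID.y)
};

// One thread group per tile x tile block of the M x N output matrix.
// Assumes m and n are exact multiples of the tile size.
inline TileGrid ComputeTileGrid(std::uint32_t m, std::uint32_t n, std::uint32_t tile) {
    assert(tile != 0 && m % tile == 0 && n % tile == 0);
    return { n / tile, m / tile };
}

// Same expression as a_offset in the shader (half = 2 bytes, row-major A with K columns).
inline std::uint32_t ATileOffsetBytes(std::uint32_t tileRow, std::uint32_t k,
                                      std::uint32_t kDim, std::uint32_t tile) {
    return ((tileRow * tile) * kDim + k) * 2u;
}

// Same expression as b_offset in the shader (half = 2 bytes, row-major B with N columns).
inline std::uint32_t BTileOffsetBytes(std::uint32_t tileCol, std::uint32_t k,
                                      std::uint32_t nDim, std::uint32_t tile) {
    return (k * nDim + tileCol * tile) * 2u;
}

// Same expression as c_offset in the shader (float = 4 bytes, row-major C with N columns).
inline std::uint32_t CTileOffsetBytes(std::uint32_t tileRow, std::uint32_t tileCol,
                                      std::uint32_t nDim, std::uint32_t tile) {
    return ((tileRow * tile) * nDim + tileCol * tile) * 4u;
}
```

With the values from the example, ComputeTileGrid(1024, 1024, 16) yields a 64 x 64 grid, which would be passed to ID3D12GraphicsCommandList::Dispatch as the X and Y thread group counts with Z set to 1.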

ThreadGroup Matrix Example

// Compiled with the line below
// bin/dxc -I ./include/hlsl -T cs_6_10 -enable-16bit-types linalg-threadgroup.hlsl

// System header containing the LinAlg Matrix APIs
#include <dx/linalg.h>

// The API is nested under dx::linalg. Simplify the example by using it
using namespace dx::linalg;

// This shader performs matrix multiplication C = α*A*B + β*C
// where A, B, and C are matrices of dimensions MxK, KxN, and MxN respectively.
// The shader uses threadgroup-level parallelism to compute tiles of the output
// matrix C. The GPU driver will generate code to split the matrix into optimal
// tiles based on the hardware capabilities.

// GEMM constants
cbuffer GemmConstants : register(b0) {
  float alpha; // Scalar multiplier for A*B
  float beta;  // Scalar multiplier for existing C
}

ByteAddressBuffer MatrixA;
ByteAddressBuffer MatrixB;
RWByteAddressBuffer MatrixC;

// Matrix dimensions - can be configured as needed
#define M 1024 // Rows in A and C
#define N 1024 // Columns in B and C
#define K 1024 // Columns in A, rows in B

// Optimized GEMM using threadgroup-level parallelism
[numthreads(1024, 1, 1)]
void main() {
  // Matrix type definitions for threadgroup scope
  using MatrixATy = Matrix<ComponentType::F16, M, K, MatrixUse::A, MatrixScope::ThreadGroup>;
  using MatrixBTy = Matrix<ComponentType::F16, N, K, MatrixUse::B, MatrixScope::ThreadGroup>;
  using MatrixResultTy = Matrix<ComponentType::F32, M, N, MatrixUse::Accumulator, MatrixScope::ThreadGroup>;

  MatrixATy a_matrix = MatrixATy::Load(MatrixA, 0, K * sizeof(half), MatrixLayout::RowMajor);
  MatrixBTy b_matrix = MatrixBTy::Load(MatrixB, 0, N * sizeof(half), MatrixLayout::RowMajor);

  // Load existing C matrix for GEMM equation: C = α*A*B + β*C
  MatrixResultTy c_existing = MatrixResultTy::Load(MatrixC, 0, N * sizeof(float), MatrixLayout::RowMajor);

  // Compute A*B
  MatrixResultTy ab_result = Multiply<ComponentType::F32>(a_matrix, b_matrix);

  // Apply GEMM scaling element-wise: α*A*B + β*C
  for (uint i = 0; i < ab_result.Length(); i++) {
    float ab_val = ab_result.Get(i);
    float c_val = c_existing.Get(i);
    float result = alpha * ab_val + beta * c_val;
    ab_result.Set(i, result);
  }

  ab_result.Store(MatrixC, 0, N * sizeof(float), MatrixLayout::RowMajor);
}
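When bringing up either shader, it can help to compare the GPU output against a plain CPU implementation of the same equation, C = α*A*B + β*C. The sketch below is one way to do that in C++. It is not part of the LinAlg API, the name ReferenceGemm is hypothetical, and it assumes tightly packed row-major data held as float (the shaders read half inputs, so a comparison would need a tolerance).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n; all row-major and tightly packed.
inline void ReferenceGemm(const std::vector<float>& a, const std::vector<float>& b,
                          std::vector<float>& c, std::size_t m, std::size_t n,
                          std::size_t k, float alpha, float beta) {
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float acc = 0.0f; // float accumulation, as in the shaders
            for (std::size_t i = 0; i < k; ++i) {
                acc += a[row * k + i] * b[i * n + col];
            }
            // Scale the product and blend with the existing C value.
            c[row * n + col] = alpha * acc + beta * c[row * n + col];
        }
    }
}
```

A triple loop like this is slow for 1024 x 1024 inputs but is easy to reason about, which is what matters for a correctness check.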

Data Preparation

D3D provides a couple of methods for converting weight and bias matrix data between layouts. The available layouts are:

enum D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT {
    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_ROW_MAJOR,
    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_COLUMN_MAJOR,
    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_MUL_OPTIMAL,
    D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_OUTER_PRODUCT_OPTIMAL
};

For instance, D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_MUL_OPTIMAL is a device-specific layout for optimal use with the Matrix-Vector operations such as MultiplyAdd in the code example above.

See ID3D12DevicePreview::GetLinearAlgebraMatrixConversionDestinationInfo() and ID3D12CommandListPreview::ConvertLinearAlgebraMatrix() in the D3D LinAlg spec here.
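To make the two linear layouts concrete, here is a small C++ sketch of how the same element is addressed in row-major versus column-major order, plus a CPU conversion between them. It assumes tightly packed float data, and the function names are hypothetical. It says nothing about MUL_OPTIMAL or OUTER_PRODUCT_OPTIMAL: those are device-specific, so data in those layouts would have to come from the D3D conversion methods named above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major: elements of one row are contiguous.
inline std::size_t RowMajorIndex(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Column-major: elements of one column are contiguous.
inline std::size_t ColumnMajorIndex(std::size_t row, std::size_t col, std::size_t rows) {
    return col * rows + row;
}

// Rewrites a tightly packed rows x cols row-major matrix in column-major order.
inline std::vector<float> RowMajorToColumnMajor(const std::vector<float>& src,
                                                std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[ColumnMajorIndex(r, c, rows)] = src[RowMajorIndex(r, c, cols)];
        }
    }
    return dst;
}
```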

Get Running

LinAlg is part of Shader Model 6.10, currently in preview. This requires:

Device Support:

NVIDIA: Contact your developer relations representative for in-development driver access. 
Intel: Support planned in an upcoming release.
AMD: AMD Software: AgilitySDK Developer Preview Edition 25.30.41.02 
WARP: Supported in the latest WARP software rasterizer preview, available here.

Checking for Support

To enable the LinAlg preview with the AgilitySDK above, turn on experimental shader model support in code before creating a D3D12 device:

UUID Features[] = { D3D12ExperimentalShaderModels };
ThrowIfFailed(D3D12EnableExperimentalFeatures(
    _countof(Features), Features, nullptr, nullptr));

The API provides many different dimensions of hardware support. To fully explore the granular API, see the documentation here.

To quickly get started with LinAlg, query the device for Tier 1 support:

D3D12_FEATURE_DATA_LINEAR_ALGEBRA_SUPPORT linearAlgebraSupport = {};
HRESULT hr = device->CheckFeatureSupport(
    D3D12_FEATURE_LINEAR_ALGEBRA_SUPPORT,
    &linearAlgebraSupport,
    sizeof(linearAlgebraSupport));

if (SUCCEEDED(hr) &&
    linearAlgebraSupport.LinearAlgebraTier >= D3D12_LINEAR_ALGEBRA_TIER_1) {
    // Device supports Tier 1 linear algebra operations
}

Supported Tier 1 features are listed here. Other tiered levels of support are also found in that spec.

PIX

As usual, Day One PIX support is available. Check here for the latest information.

Content from GPU Vendors

AMD

Linear Algebra Matrix is supported on AMD Radeon RX 9000 series graphics products using the AMD Software: AgilitySDK Developer Preview Edition 25.30.41.02 driver.

Intel

We’re working on a Linear Algebra implementation leveraging our XMX cores and expect to share it with ISVs later this year. This new API replaces cooperative vectors and enables efficient use of vector-matrix and matrix-matrix multiplication. It’s a key enabler for neural rendering techniques like texture set neural compression and more. We’re excited to see how developers will leverage this capability and can’t wait to see all the new cool rendering algorithms that will be developed on top of it!

– Matthäus Chajdas, Senior Principal Engineer

NVIDIA

Contact your developer relations representative for in-development driver access.

 

The post D3D12 LinAlg Matrix Preview appeared first on DirectX Developer Blog.

Evolving DirectX for the ML Era on Windows
https://devblogs.microsoft.com/directx/evolving-directx-for-the-ml-era-on-windows/
Thu, 12 Mar 2026 20:33:24 +0000

The post Evolving DirectX for the ML Era on Windows appeared first on DirectX Developer Blog.

At GDC this year, we shared how machine learning is becoming foundational to real-time graphics, and how DirectX is evolving to meet that shift across shader-level and model-level ML. ML is no longer a niche optimization or a postprocess trick. It’s increasingly embedded throughout the graphics pipeline, influencing how frames are generated, how content is authored, and how game developers realize their artistic vision. DirectX is evolving to support this future, one where ML is a first-class citizen alongside traditional rendering workloads.

Introducing DX Linear Algebra

Last year, DirectX took a major step into the ML era with the introduction of Cooperative Vector in Shader Model 6.9. For the first time, developers could access hardware-accelerated vector-matrix operations directly from HLSL, enabling a class of neural rendering techniques that execute inline with traditional shading. These workloads, such as neural texture compression (NTC) and neural radiance caching (NRC), map naturally to highly parallel, per-pixel inference.

Cooperative Vector has since demonstrated that ML can be effectively integrated directly into the graphics pipeline, particularly for scenarios where developers want fine-grained, shader level control over how ML is applied alongside traditional rendering logic.

As ML usage expanded, however, it became clear that not all workloads fit this execution model. Many common and emerging scenarios—such as denoising, temporal upscaling, and more—require matrix–matrix operations, shared data across threads, and batch-oriented execution that go beyond what vector–matrix primitives alone can efficiently express.

To address this gap, we introduced DirectX Linear Algebra, an expansion of DirectX’s math capabilities designed to support both vector-based and matrix-based ML workloads under a single programming model. DX Linear Algebra adds first-class matrix-matrix operations while preserving the ability to author ML directly in HLSL, giving developers explicit control over math, data flow, and execution for shader-level ML scenarios. These capabilities establish a scalable foundation for shader-level ML in DirectX.

Expanding to Model Level ML with DirectX Compute Graph Compiler

While shader-level ML is powerful, many modern ML-driven graphics workloads are best expressed and optimized as full computation graphs, not as isolated operators or hand-authored kernels. These graphs capture end-to-end structure—dataflow, dependencies, and deep fusion—that are difficult or impossible to exploit at the shader level, especially when targeting the full PC ecosystem.

That’s why we introduced DirectX Compute Graph Compiler.

DirectX Compute Graph Compiler is a new DirectX ML compiler API designed to execute full model graphs with native-class GPU performance. Models come in from modern frameworks; DirectX then analyzes and specializes the complete graph for a given device before lowering it into optimized workloads that integrate natively with D3D12 queues and command lists.

Key benefits include:

Shader-level ML and model-level ML now live side by side in DirectX: HLSL Linear Algebra for small, inline workloads and DirectX Compute Graph Compiler for larger models.

Support from our hardware vendor partners

AMD: “DirectX Linear Algebra and DirectX Compute Graph Compiler give developers new ways to integrate machine learning directly into their graphics pipelines while retaining the control and performance characteristics they expect from modern GPUs. We’re excited to collaborate with Microsoft on advancing ML-driven graphics on Windows.” – Robert Shearer, CVP Silicon Design Engineering, AMD

For more, see here

Intel: “DirectX Linear Algebra gives developers a powerful new foundation for bringing matrix-based machine learning directly into real-time graphics workflows. We’re excited to support Linear Algebra on day one.” – Lisa Pearce, Corporate Vice President, Software Group, Intel

For more, see here

NVIDIA: “With DirectX Linear Algebra and DirectX Compute Graph Compiler, developers gain flexible paths to integrate both shader level and model level machine learning seamlessly into their graphics pipelines. We’re pleased to support both capabilities and to collaborate with Microsoft on accelerating ML driven rendering and inference workflows on NVIDIA GeForce RTX GPUs.” – Patrick Neill, Distinguished Engineer, NVIDIA

For more, see here

Qualcomm: “DirectX Compute Graph Compiler is a meaningful step toward making full model ML feel native inside real-time engines. We’re excited to collaborate with Microsoft on a compiler-based approach that takes modern model graphs and produces optimized GPU workloads that integrate directly into DirectX.” – Balaji Calidas, Senior Director of Engineering, Qualcomm

What’s Next

ML is no longer an optional enhancement in rendering; it’s becoming core to how graphics are generated. To support this shift, DirectX is evolving into a platform that delivers efficient ML execution at every scale with first-class tooling and visibility. These layers give developers control over how and where ML is integrated in their pipeline, without sacrificing performance, portability, or artistic intent.

DirectX Compute Graph Compiler will be available for private preview this summer. Please reach out to your Windows representative if you’re interested in joining.

DX Linear Algebra will enter public preview in April, giving developers an early opportunity to experiment with these capabilities and help shape the future of ML‑assisted graphics on Windows. See the Linear Algebra spec for more detail about the feature.

We’re excited to continue this journey with our partners and the developer community. Check out our GDC session, and stay tuned to the DirectX blog for deeper dives, samples, and updates.


DirectX: Bringing Console-Level Developer Tools to Windows
https://devblogs.microsoft.com/directx/directx-bringing-console-level-developer-tools-to-windows/
Thu, 12 Mar 2026 19:26:17 +0000

The post DirectX: Bringing Console-Level Developer Tools to Windows appeared first on DirectX Developer Blog.

On March 12th, 2026, the DirectX team and our hardware partners hosted DirectX: Bringing Console-Level GPU Developer Tools to Windows at GDC. We shared our dream of bringing console-level GPU developer tools to Windows, and today we are announcing a major step toward that goal with the biggest wave of new tooling features in DirectX’s history.

For the first time, all four Windows GPU hardware partners joined us on stage to demonstrate these features running on their hardware. AMD, Intel, NVIDIA, and Qualcomm have worked closely with us throughout feature development, each making significant contributions to make this release possible. This represents the deepest GPU tooling collaboration across the Windows ecosystem and the future of Windows GPU development. 

The announcements included:

DirectX Dump Files 

The Problem: GPU Crashes Are Painful 

Lower-level APIs like D3D12 extract maximum GPU performance, but they make it easier than ever to hit gnarly GPU bugs – whether during development, QA testing, or on retail gamers’ devices. Existing tools like the Debug Layer, GPU-Based Validation, and DRED each help, but none provide thorough crash dump infrastructure with deep OS integration across all hardware vendors.  

We want to change that. 

 

Introducing DirectX Dump Files 

DirectX Dump Files are GPU dump files generated when a TDR occurs. They represent thorough crash dump infrastructure in Windows, with robust integration that brings together critical data from all levels of the stack: hardware, user mode and kernel drivers, user mode and kernel components of Windows, and even your game/application via new D3D12 APIs. 

The first releases will offer immediate help to developers. After that, our robust infrastructure lays an exciting foundation for everyone to build upon, enabling rapid innovation in this space in the future.  

We are very thankful to our hardware partners AMD, Intel, NVIDIA and Qualcomm, who have worked very closely with us throughout feature development. They all joined us on stage during the GDC session to demonstrate DirectX Dump Files running on their hardware. We hope that many of you will try out our previews over the summer and send us your thoughts and feedback. This will help us refine the overall feature and make it as useful as possible for you. 

 

What’s in a DirectX Dump File 

A single .dxdmp file brings together data from every level of the stack, so you don’t have to piece together information from multiple sources: 

All of this maintains Windows process isolation guarantees, ensuring sensitive data from other processes isn’t included in your game’s dump file.

 

Supported Scenarios 

DirectX Dump Files support two critical developer scenarios: 

Both scenarios are fully supported, with customization available via new D3D12 APIs that will allow developers to sacrifice some application performance to improve crash dump actionability. 

 

Configuring Dump Quality vs. Performance 

New D3D12 APIs let you control a trade-off between game performance and dump file actionability, with three levels: no overhead (no runtime performance impact), medium overhead (balanced data with moderate impact), and high overhead (maximum data from the hardware vendor). Hardware vendors will define and document the exact features and impact of each level on their devices. 

These levels are divided into D3D12 support tiers. Tier 1, which all devices supporting DirectX Dump Files will support, includes the medium and high overhead options. Tier 2, which some devices will support initially, offers the no overhead option. On Tier 2 devices, the no overhead dumps will be enabled by default by Microsoft. On Tier 1 devices, we will not enable medium or high overhead options by default due to the application performance implications, so it will be up to developers to opt into these settings.
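The tier rules above can be summarized as a small decision table. The C++ sketch below encodes them with hypothetical enums and helper functions. None of these names come from the D3D12 headers, and the real API surface is not described in this post. It assumes that a Tier 2 device also offers the Tier 1 levels, which follows from Tier 1 being supported by every device that supports DirectX Dump Files.

```cpp
#include <cassert>

// Hypothetical names for illustration only; the real D3D12 enums may differ.
enum class DumpTier { NotSupported = 0, Tier1 = 1, Tier2 = 2 };
enum class DumpOverhead { Disabled, NoOverhead, Medium, High };

// Tier 1 exposes the medium and high overhead levels.
// Tier 2 additionally exposes the no overhead level.
inline bool IsLevelAvailable(DumpTier tier, DumpOverhead level) {
    switch (level) {
    case DumpOverhead::Disabled:   return true;
    case DumpOverhead::NoOverhead: return tier >= DumpTier::Tier2;
    case DumpOverhead::Medium:
    case DumpOverhead::High:       return tier >= DumpTier::Tier1;
    }
    return false;
}

// What is active without any developer opt-in: no overhead dumps on Tier 2,
// nothing on Tier 1 (medium and high must be requested explicitly).
inline DumpOverhead DefaultLevel(DumpTier tier) {
    return tier >= DumpTier::Tier2 ? DumpOverhead::NoOverhead : DumpOverhead::Disabled;
}

// Uses the requested level if the device offers it, otherwise the default.
inline DumpOverhead ResolveLevel(DumpTier tier, DumpOverhead requested) {
    assert(IsLevelAvailable(tier, DefaultLevel(tier)));
    return IsLevelAvailable(tier, requested) ? requested : DefaultLevel(tier);
}
```

Falling back to the default when a level is unavailable is a design choice made for this sketch; an application could just as well treat an unavailable level as an error.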

 

Retrieving Dump Files 

You can either retain the dump file (and use it to investigate a local crash or upload it from a retail gamer’s machine to your server) or let Microsoft collect it via Watson. D3D12 provides an optional callback after dump creation that supplies the dump file path.

 

PIX Support 

PIX provides full support for analyzing crashes with DirectX Dump Files, inspired by Xbox PIX’s support for hang dumps (“HIX”). You can analyze any crash with a DirectX Dump File in the PIX UI, regardless of which hardware it was generated on. 

All four hardware partners have written PIX plugins that decode the hardware and driver state collected in the dump files and present it through the standardized DirectX Dump File UI in PIX, which they each demonstrated during our GDC session.

 

AMD Demo of DirectX Dump Files in PIX:

 

Intel Demo of DirectX Dump Files in PIX:

 

NVIDIA Demo of DirectX Dump Files in PIX:

 

Qualcomm Demo of DirectX Dump Files in PIX:

 

The PIX API also supports programmatic analysis of a GPU crash and extracting information from a DirectX Dump File. These APIs let you write C++, C#, or Python scripts to investigate crashes in your own environment. For example, you could process DirectX Dump Files gathered from retail users’ machines to find patterns and bucket bugs.

DirectX Dump Files will be available starting in early June 2026.

 

Auxiliary DirectX Features

DebugBreak() in HLSL 

A new intrinsic is coming in April in Shader Model 6.10: DebugBreak().  

DebugBreak() will be critical for Live Shader Debugging (see below) but in the short term it can also be used to improve the actionability of DirectX Dump Files. We are adding new D3D12 pipeline state object flags to configure DebugBreak()’s behavior, with one option letting you tell the driver to halt the GPU and immediately trigger a DirectX Dump File when a DebugBreak() is hit. This will enable abort-like behaviors, both in development and even in retail scenarios if necessary, allowing crash dumps to point more accurately to the first problem that occurred rather than a downstream consequence of it that crashed the GPU.  

You can read the full spec here

 

PIX Event Configurability 

PIX events and markers, such as PIXBeginEvent(), have two competing use cases: debugging and profiling. Changes that make them more helpful for one use case may hurt the other. This is the main reason why, until now, PIX events have been absorbed by the D3D12 runtime and have not reached the driver.

At GDC we announced new D3D12 APIs to let you configure this trade-off. If you tell the D3D12 runtime to pass PIX events to the driver, then the PIX events will be included in DirectX Dump Files to improve their actionability. This configuration also lets other driver-level tools use PIX events: see the Partner Announcements section below for examples of this in action. 

 

Preview: Live Shader Debugging 

At GDC we previewed Live Shader Debugging: real-time, on-chip shader debugging that’s coming to Windows. This is a much-loved Xbox feature that we are working to bring to Windows for the first time. It is designed to help you catch ‘needle in the haystack’ type problems – the kind of GPU bugs that are hardest to track down today. This is the deepest GPU tooling collaboration with hardware vendors in Windows history. 

We are targeting the first release of this in 2027. We appreciate that it’s very early to announce a feature like this. However, you are likely to see public work for this in the coming months (DebugBreak() above is a good example!) so we wanted to provide some context now about our goals and motivations. Stay tuned for more details later in 2026 and 2027. 

 

Shader Explorer 

We’re thrilled to announce the first version of Shader Explorer for PIX on Windows. Shader Explorer builds on the back-end shader compilers that driver writers must create as part of Advanced Shader Delivery. Together with PIX, they now give you low-level compile-time performance insights for your shaders alongside your HLSL. 

Insights that it may show include:

Since the compiler is decoupled from the driver, you can use Shader Explorer to analyze shaders for GPUs that you don’t own. 

AMD Demo of Shader Explorer in PIX:

Intel Demo of Shader Explorer in PIX:

 

The Shader Explorer Workflow 

Shader Explorer integrates deeply into PIX’s GPU Captures, giving you an intuitive iterative optimization flow: 

  1. Take a GPU Capture of your application and open it in PIX. 
  2. Analyze your capture and find an interesting shader to optimize. 
  3. Export that shader and its pipeline state object into Shader Explorer.
  4. Iterate on that shader, making changes guided by the static analysis insights. You can select different target GPUs to see how your changes affect each one. 
  5. Export that shader back into your GPU Capture and see its effect on rendering and performance. 

Alternatively, you can load an HLSL file directly into Shader Explorer and iterate on it, without taking a PIX GPU Capture first.

You can also use the PIX API to analyze shaders programmatically, enabling you to write your own tools to analyze any or all of your shaders in bulk. 

Shader Explorer has day one support from AMD and Intel. This collaboration brings new hardware-specific optimization guidance directly into PIX’s workflow. 

 

Partner Announcements 

During the GDC session, we were joined on stage by all four Windows GPU hardware partners: AMD, Intel, NVIDIA, and Qualcomm. We are deeply appreciative of these collaborations to make Windows GPU tooling as great as possible. 

The features we announced today are the result of deep collaboration across the ecosystem. All four partners demonstrated DirectX Dump Files on their respective hardware during the session, and all four have invested in PIX plugin support to surface their hardware-specific information through PIX’s standardized UI. 

Beyond that shared foundation, our partners shared what else they’ve been working on. We encourage you to check them out. 

 

AMD 

AMD announced two integrations that go well beyond DirectX Dump Files support. 

First, they built interop between PIX and Radeon Raytracing Analyzer (RRA). This new integration allows PIX users to export an Acceleration Structure out of a PIX GPU Capture and deeply analyze it inside RRA, bringing together the strengths of both tools. 

Second, AMD showed how the new PIX Event Configurability APIs improve Radeon Graphics Profiler (RGP). With driver-level PIX markers enabled, RGP can now show your PIX markers natively without requiring any PIX header changes – they just work. 

For more on AMD’s support for PIX, see here

 

Intel 

Intel announced that their collaboration with Microsoft on PIX has moved to a whole new level, extending beyond Intel-specific PIX plugin development and into the core of PIX itself, helping to make the tool even better for all developers. They previewed their work to improve the reliability of Timing Data in GPU Captures, by filtering out misleading data caused by GPU preemption from other processes. This is just the start, and we’re excited for what Intel and PIX will build together next.

For more on Intel’s support for PIX, see here.

 

More PIX Announcements 

Here is a wide range of new PIX features also coming in this wave. We will share more details about each of these in ~May 2026 when they are released alongside the other PIX features above. 

 

PIX API 

Our long-term vision is to give you programmatic access to everything that you can see inside the PIX UI. At GDC we announced that the PIX API will be available publicly in May 2026, with support for C++, C#, and Python. It uses a D3D12-style nano-COM interface, supports all new PIX features immediately, and will light up existing PIX features over time. 

 

Tile Mappings Viewer 

We are bringing a dedicated Tile Mappings viewer to PIX to help you debug and fix issues with your tiled/reserved resource mappings. It includes the ability to see tile information for your selected pixel in the Texture Viewer, see the mappings (including visually) for a particular resource, and see the resources mapped into a particular heap. This will also be helpful for upcoming DirectX features – stay tuned. 

 

GPU Hardware Counters in System Monitor 

PIX’s System Monitor view can now show low-level hardware-specific counters while your application runs, powered by our hardware partners’ PIX plugins. These complement the cross-platform GPU and CPU counters already available in System Monitor.

 

New GPU Capture File Format 

We have been working hard to rewrite our GPU Capture file format, and we are pleased to announce the first release of this in May 2026. 

 This first release comes with three main improvements: 

We are excited by the potential of this new file format. We look forward to building many new PIX features on top of it – stay tuned for details later in 2026! 

 

Capture/Replay Reliability 

We have made many reliability improvements to PIX’s capture and replay infrastructure. This includes turning on new D3D features by default, such as Application-Specific Driver State and RecreateAtGpuva, along with many other fixes to improve the consistency and reliability of your GPU captures. 

 

Remote Deployment

We announced new PIX remote deployment features, building on Remote Windows Game Development Tools. Once your local device is paired with a remote device via these remote tools, PIX will be able to automatically deploy itself to your remote machine and connect – removing the need for you to manually run PIX on both devices. This greatly streamlines multi-device development workflows.

 

Dr PIX 

Several improvements to Dr PIX are coming: a new PIX API to access and run all existing experiments, and a new experiment to help you measure how much improvement D3D12’s new Tight Alignment flag could bring to your application. These join last year’s NonUniformResourceIndexing experiment in making Dr PIX an increasingly powerful tool for finding performance wins. 

 

Update Notifications + What’s New 

PIX will now automatically notify you when a new version is available, and an updated What’s New page is built directly into PIX so you can easily see what changed. 

 

Machine Learning 

PIX supports new ML-driven graphics workloads. For more information, please visit our ML blog. 

 

Coming in May 2026 

Our biggest wave of new Windows GPU tooling features ever will mostly be available in preview in May 2026, with the DirectX Dump Files preview coming in early June 2026. DirectX Dump Files will reach retail availability in ~October 2026. Shader Explorer v2 with live/online analysis features will follow in late 2026.   

 

Get in Touch 

We are incredibly excited about this wave of new features and the deepening collaboration with our hardware partners. This is just the next big step – we will keep building toward our dream of console-level GPU developer tools on Windows. 

  

We’d love to hear from you: 


Advanced Shader Delivery: What’s New at GDC 2026
https://devblogs.microsoft.com/directx/advanced-shader-delivery-whats-new-at-gdc-2026/
Thu, 12 Mar 2026 11:00:24 +0000

The post Advanced Shader Delivery: What’s New at GDC 2026 appeared first on DirectX Developer Blog.

Today, at our GDC talk Advanced Shader Delivery for Windows, we announced what we are bringing to the ecosystem to solve shader compilation. Want to find out what this means for your title and customers? Read on!

State of the Industry

Long shader compilation times and in-game shader stutter for D3D12 apps are two of the biggest problems in PC gaming. These problems are caused by compiling shaders at runtime. Unlike console games, PC games do not target a fixed driver and GPU environment, so precompiled shaders must be delivered for a large matrix of drivers and GPUs in the Windows ecosystem.

Last fall, we announced how advanced shader delivery is solving the problem on the ROG Xbox Ally and ROG Xbox Ally X devices. Today, Microsoft is bringing together game developers, IHVs, and game stores to solve shader compilation on PC going forward.

As game developers, you can enable gamers to download fully compiled shaders for their specific hardware. In our previous blog post, you can learn how to trace your game title or programmatically generate a state object database (SODB) file and use an offline compiler to compile the state objects into a precompiled shader database (PSDB) format to test the advanced shader delivery benefits locally.

What’s New and Coming Soon

In the AgilitySDK 1.619 release, we unveiled two new APIs: the app registration API and stats API.

New APIs:

App Identity API: This API enables applications to declare their own application identity to D3D12 and the underlying graphics drivers in a standardized way. Applications can set a default D3D12_APPLICATION_DESC and GUID to self-identify before a D3D12 device is created. Attaching application identity to the SODB will be a requirement for submitting an SODB file to the Xbox Partner Center for your title.

Stats API: This API gives game developers visibility into how well a precompiled shader database (PSDB) performs. To see how well a given PSDB works on a specific hardware configuration, use these APIs to get information on the shader cache hit rate.

PIX support: The May 2026 version of PIX will show these stats as real-time counters in PIX’s System Monitor view as your game runs.
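As a rough illustration of what a shader cache hit rate measures, the C++ sketch below computes it from two counters: lookups served by the PSDB and lookups that fell back to runtime compilation. The PsdbStats struct and its fields are hypothetical, and the actual structures returned by the Stats API are not shown in this post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters; the real Stats API structures may differ.
struct PsdbStats {
    std::uint64_t hits;   // pipeline lookups served from the precompiled database
    std::uint64_t misses; // lookups that needed a runtime compile
};

// Fraction of lookups served by the PSDB, in [0, 1]. Returns 0 when idle.
inline double HitRate(const PsdbStats& stats) {
    assert(stats.hits <= UINT64_MAX - stats.misses); // guard the sum against overflow
    const std::uint64_t lookups = stats.hits + stats.misses;
    if (lookups == 0) {
        return 0.0;
    }
    return static_cast<double>(stats.hits) / static_cast<double>(lookups);
}
```

A hit rate well below 1.0 on a given hardware configuration would suggest that the SODB used to build the PSDB is missing state objects the game creates at runtime.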

Partial Graphics Programs:

Some titles have so many pipeline state objects (PSOs) that most engines cannot enumerate them. Precompiling these PSO-heavy titles in advance for many different hardware configurations, for distribution through advanced shader delivery, would take a significant amount of time and duplicate effort. To address this, we are creating partial graphics programs. Partial graphics programs split pipeline creation into two steps: create partial pre-rasterization and pixel shader programs containing common state used by different graphics pipelines, then link them together with other state.

For titles that have large numbers of PSOs, partial programs will be coming soon to more efficiently re-use graphics programs and link them together at runtime. In the meantime, check out our spec for partial graphics programs today.
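To see why the two-step split can help, consider a simplified cost model. It assumes, purely for illustration, that every pre-rasterization variant is paired with every pixel shader variant, and that a link step is much cheaper than a compile. The names `CompileWork`, `MonolithicWork`, and `PartialProgramWork` are hypothetical and do not come from the spec.

```cpp
#include <cstdint>

// Work needed to make every pipeline combination available.
struct CompileWork {
    std::uint64_t compiles;  // expensive shader compilations
    std::uint64_t links;     // link steps that combine already-compiled partial programs
};

// Monolithic PSOs: each (pre-rasterization, pixel shader) pair is compiled on its own.
constexpr CompileWork MonolithicWork(std::uint64_t preRaster, std::uint64_t pixel) {
    return CompileWork{preRaster * pixel, 0};
}

// Partial programs: compile each half once, then link the pairs.
constexpr CompileWork PartialProgramWork(std::uint64_t preRaster, std::uint64_t pixel) {
    return CompileWork{preRaster + pixel, preRaster * pixel};
}
```

With 200 pre-rasterization variants and 500 pixel shader variants, this model gives 100,000 compiles for monolithic PSOs versus 700 compiles plus 100,000 links for partial programs. Real savings depend on the driver and on how many combinations a title actually uses.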

Industry Alignment

We’re working closely with the GPU hardware vendors to expand advanced shader delivery across the PC ecosystem. Here’s what our partners have to say about support for this feature.

“Advanced Shader Delivery (ASD) is transforming the gaming experience, cutting load times and eliminating in‑game stutter on Xbox ROG Ally devices. It’s truly remarkable what the Microsoft and AMD engineering teams have accomplished in such a short period of time.”

– Rodney Andre, Corp. VP Software Development

For more, see here

“Intel is committed to solving shader compilation challenges on PC to improve the overall gaming experience. Microsoft Advanced Shader Delivery is a critical step toward reducing shader load times and compilation stutters, and Intel is pleased to release drivers supporting this feature on our Lunar Lake and Panther Lake platforms.”

– Lisa Pearce, Corporate Vice President, Software Group, Intel

For more, see here

“To eliminate the shader-related stutters and load times that have plagued gamers for years, NVIDIA is working closely with Microsoft on launching Advanced Shader Delivery for GeForce RTX consumers later this year.”

– Henry Lin, Director of Product Management, Gaming & AI at NVIDIA

For more, see here

“Advanced Shader Delivery is a key feature for Qualcomm Snapdragon® compute platforms. By reducing redundant shader compilation, it improves the overall gaming experience. We are partnering with the Microsoft DirectX team to debut this feature soon on Qualcomm Adreno X2 GPUs.”

– Nagendra Kumar, Senior Director of Engineering

 

And here is what a middleware partner has to say about support for this feature.

“As Unreal, we’re excited about supporting advanced shader delivery in the ecosystem. We’ve been doing early testing and explorations on SODB and PSDB generation, and will have more details coming soon.” – Mihnea Balta, Director, Rendering Engineering at Epic Games

Call to action

In summary, to solve shader compilation for your title, there are two important steps: integrate SODB collection into your game engine and submit an SODB along with your game package to the Xbox Partner Center.

For more details on how to integrate SODB collection into the game development process, follow the links below.

Sample code on how to programmatically generate SODBs

Instructions on how to trace a title for SODBs

In May, the Xbox Partner Center will feature new UI where you can upload an SODB file alongside your game package.

New Xbox Partner Center UI for sharing SODB files

And in the near future, look out for the release of partial programs to more efficiently compile PSOs.

 

For additional questions, you can reach out to us via Discord.

The post Advanced Shader Delivery: What’s New at GDC 2026 appeared first on DirectX Developer Blog.

DirectStorage 1.4 release adds support for Zstandard
https://devblogs.microsoft.com/directx/directstorage-1-4-release-adds-support-for-zstandard/
Wed, 11 Mar 2026 19:00:23 +0000

Today we’re releasing the public preview of DirectStorage 1.4 and the initial public preview of the Game Asset Conditioning Library. Together, they introduce Zstandard (Zstd) compression as an option for game assets on Windows. This new support meets the needs of the gaming ecosystem, bringing an open standard that improves compression ratios, enables faster load times, and provides smoother asset streaming for content-rich games.

We shared this availability at GDC in DirectX State of the Union: DirectStorage and Beyond, along with how our GPU hardware and software vendor partnerships will help this work reach ecosystem scale. Read on to understand all the details!

DirectStorage 1.4 public preview is now available

DirectStorage and Zstd

DirectStorage 1.4 brings Zstd codec support to the runtime. Zstd is a popular and open compression standard that meets our key criteria for the next great compression codec for game development.

We evaluated codecs across the following key criteria: compression ratio and decompression performance, hardware and software availability, and existing adoption. Zstd stands out by delivering competitive compression ratios and decompression performance, broad availability on hardware and software across operating systems, and widespread adoption in OS, cloud, and web scenarios.

In this release, Zstd is added to our multi-tier decompression framework with support for CPU and GPU decompression. This lets developers pick the best execution option for their workload today, while our GPU partners work towards future hardware specific optimizations for Zstd.

We’re also open sourcing Microsoft’s Zstd GPU decompression compute shader on the DirectStorage GitHub as an early, working baseline that all GPU implementations can reference. The shader is in development and is initially optimized for content chunked to 256KB or smaller, consistent with modern game packaging patterns for streaming workloads. We plan to expand capabilities and continue improving performance of the shader over the coming months as these compression investments scale across the PC ecosystem.
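Since the shader is initially optimized for content chunked to 256KB or smaller, a packaging step may want to split larger assets into bounded chunks before compressing them. The following is a minimal sketch of that splitting step only. It assumes each chunk would then be compressed independently, and `ChunkRange` and `SplitIntoChunks` are hypothetical names, not DirectStorage APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte range of one chunk inside the uncompressed asset.
struct ChunkRange {
    std::size_t offset;
    std::size_t size;
};

constexpr std::size_t kMaxChunkBytes = 256 * 1024;  // 256 KiB upper bound per chunk

// Splits an asset of assetBytes into consecutive ranges no larger than maxChunkBytes.
std::vector<ChunkRange> SplitIntoChunks(std::size_t assetBytes,
                                        std::size_t maxChunkBytes = kMaxChunkBytes) {
    assert(maxChunkBytes > 0);
    std::vector<ChunkRange> chunks;
    for (std::size_t offset = 0; offset < assetBytes; offset += maxChunkBytes) {
        const std::size_t remaining = assetBytes - offset;
        chunks.push_back({offset, remaining < maxChunkBytes ? remaining : maxChunkBytes});
    }
    return chunks;
}
```

A 600 KiB asset, for instance, yields two full 262,144-byte chunks followed by a 90,112-byte tail.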

Improved developer orchestration and control

DirectStorage 1.3 introduced EnqueueRequests, giving developers more control over how data requests are issued and synchronized with graphics work. DirectStorage 1.4 continues this investment by adding global D3D12 CreatorID support. Specifying a CreatorID via DStorageSetConfiguration2 associates that CreatorID with the internal D3D12 command queues that DirectStorage manages on a per-device basis. D3D12 command queue grouping can then properly account for DirectStorage workloads, which improves predictability and GPU execution scheduling.

Introducing the Game Asset Conditioning Library

As we invested in Zstd, we saw an opportunity to further improve the compression ratio, which led to the Game Asset Conditioning Library (GACL). GACL is designed to work in your existing content pipeline, delivering up to a 50% improvement in Zstd compression ratios for your assets while keeping runtime decompression cost low when used with DirectStorage.

This initial public preview contains lossless and lossy conditioning techniques including:

DirectStorage and GACL are designed to work together. After a Zstd stream is decompressed at runtime, any shuffle transforms applied during content conditioning are seamlessly reversed by DirectStorage. DirectStorage 1.4 supports this post processing step for BC1, BC3, BC4 and BC5 textures. BC7 support and additional post processing performance improvements will arrive in a future DirectStorage update.
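To make the idea of a reversible shuffle concrete, here is a generic sketch for BC1 data. A BC1 block is 8 bytes: 4 bytes of color endpoints followed by 4 bytes of pixel indices. Grouping all endpoint bytes together and all index bytes together tends to give a general-purpose compressor more regular input. This is an assumed, illustrative layout and not GACL's actual transform. `ShuffleBc1` and `UnshuffleBc1` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBc1BlockBytes = 8;  // 4 endpoint bytes + 4 index bytes
constexpr std::size_t kHalfBlockBytes = 4;

// Build time: reorder blocks into [all endpoint bytes][all index bytes].
std::vector<std::uint8_t> ShuffleBc1(const std::vector<std::uint8_t>& blocks) {
    assert(blocks.size() % kBc1BlockBytes == 0);
    const std::size_t count = blocks.size() / kBc1BlockBytes;
    const std::size_t indexBase = count * kHalfBlockBytes;
    std::vector<std::uint8_t> out(blocks.size());
    for (std::size_t b = 0; b < count; ++b) {
        for (std::size_t i = 0; i < kHalfBlockBytes; ++i) {
            out[b * kHalfBlockBytes + i] = blocks[b * kBc1BlockBytes + i];
            out[indexBase + b * kHalfBlockBytes + i] =
                blocks[b * kBc1BlockBytes + kHalfBlockBytes + i];
        }
    }
    return out;
}

// Runtime: restore the original interleaved block layout after decompression.
std::vector<std::uint8_t> UnshuffleBc1(const std::vector<std::uint8_t>& shuffled) {
    assert(shuffled.size() % kBc1BlockBytes == 0);
    const std::size_t count = shuffled.size() / kBc1BlockBytes;
    const std::size_t indexBase = count * kHalfBlockBytes;
    std::vector<std::uint8_t> blocks(shuffled.size());
    for (std::size_t b = 0; b < count; ++b) {
        for (std::size_t i = 0; i < kHalfBlockBytes; ++i) {
            blocks[b * kBc1BlockBytes + i] = shuffled[b * kHalfBlockBytes + i];
            blocks[b * kBc1BlockBytes + kHalfBlockBytes + i] =
                shuffled[indexBase + b * kHalfBlockBytes + i];
        }
    }
    return blocks;
}
```

The two functions are exact inverses, which is the property the runtime relies on: whatever reordering the build step applies, the post-processing step must undo it byte for byte so the GPU sees valid BCn blocks.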

Building for the PC ecosystem with our GPU hardware vendor partners

We are co-engineering closely with GPU hardware vendors to ensure Zstd decompression performs well across the breadth of gaming hardware. We’re excited about the optimizations each IHV will deliver in driver updates later this year that will light up even better performance when using Zstd through DirectStorage.

AMD: “Aligning the industry on an open compression standard builds a foundation for future game titles to deliver immersive experiences with even larger worlds than realistically possible today. We plan to make optimizations for AMD GPUs available in a public driver during the second half of 2026 and look forward to seeing how developers use these investments to enhance the player experience.” – Daniel Staheli, CVP Software Development, AMD

For more, see: AMD and Microsoft partner on DirectX ML, DirectStorage, and developer tools at GDC 2026

Intel: “We’re co-engineering with Microsoft to tune Zstandard decompression through DirectStorage across our GPU architectures. We look forward to sharing our performance improvements in the months ahead.” – Lisa Pearce, Corporate Vice President, Software Group, Intel

For more, see: Intel & Microsoft collaborate closely on the Future of PC Gaming

NVIDIA: “NVIDIA is excited to bring Zstd support to game developers, with decompression optimizations tailored for NVIDIA GeForce RTX GPUs arriving in the second half of this year.” – Patrick Neill, Distinguished Engineer, NVIDIA

Qualcomm: “Before the end of the year, we’re excited to bring tuned driver updates that reflect our investments in Zstd decompression on our platforms. We look forward to these investments ensuring reliable, high performance asset streaming across Windows games.” – Nagendra Kumar, Senior Director of Engineering, Qualcomm

Get started with Zstd

Get started with Zstd support in DirectStorage 1.4 and Game Asset Conditioning Library 1.0 using the steps below. This public preview release is meant to let you start evaluating Zstd with minimal disruption in your existing content and build pipeline.

  1. Download and try the DirectStorage 1.4 public preview to see Zstd support in action.
  2. Download and integrate the Game Asset Conditioning Library 1.0 public preview into your existing content pipeline.
  3. Run the new GameAssetConditioningDemo sample to see the recommended API usage and integration patterns. The sample starts with the asset content pipeline, uses GACL to shuffle and compress BCn textures at build time, and then loads and renders them at runtime through DirectStorage and D3D12.
  4. Run the updated GpuDecompressionBenchmark sample, which now supports Zstd, to compare throughput and CPU overhead across compression formats on your target hardware.
  5. Check out the Zstd GPU decompression compute shader to see how it integrates with DirectStorage and contribute improvements that scale across the breadth of GPUs.

We want your feedback as you explore these previews, especially around real-world pipeline integration, asset streaming behavior in production content, and what you need next from DirectStorage and Game Asset Conditioning Library. Feel free to open issues on the DirectStorage or Game Asset Conditioning Library GitHub, reach out over email at askwindstorage@microsoft.com, or use the #directstorage channel on the DirectX Discord.

The post DirectStorage 1.4 release adds support for Zstandard appeared first on DirectX Developer Blog.

DirectX Innovation at GDC 2026
https://devblogs.microsoft.com/directx/directx-gdc-2026/
Wed, 04 Mar 2026 18:00:13 +0000

The excitement is building as we head into the 2026 Game Developer Conference and the DirectX team has a lot to share. We will be showcasing major updates in asset streaming, GPU tooling, ML-powered real-time graphics on Windows, and shader compilation at GDC.

If you’re attending GDC, we’d love to see you in person. Below is a quick preview of what we’ll be covering so that you can mark your calendars! 

DirectX State of the Union 2026: DirectStorage and Beyond 

Wednesday, March 11, 11:30 AM – 12:30 PM (Room 3001, West Hall) 

Over the past year, we’ve tackled some of the toughest challenges facing developers on DirectX 12. One major investment is the next chapter in game asset streaming, where we’re introducing Zstandard compression support in DirectStorage. Alongside this, we’ll release the Game Asset Conditioning Library (GACL) and talk about asset pipeline optimizations that will unlock the full potential of NVMe storage. These innovations reduce I/O latency and enable smoother streaming, paving the way for richer, more responsive worlds.

In this session we’ll also explore how we’re addressing other critical developer needs, including advancing raytracing features and shader pre-compilation. We’re evolving ML integration in DirectX and we’re delivering tooling improvements that converge our PC and Xbox developer experience. Join us to learn more about how we’re building the future of graphics. 

DirectX: Bringing Console-level GPU Tools to Windows 

Thursday, March 12, 11:30 AM – 12:30 PM (Room 2020, West Hall) 

Join the DirectX team and our GPU hardware partners as we announce new cutting-edge tooling features in DirectX, PIX, and more – all with the goal of helping you make great, reliable, and performant DirectX games on Windows. 

The talk will introduce major new DirectX features to help you with the biggest pain points of PC graphics development today. We will give you unprecedented insight into the ambitious GPU tooling work that is happening across the industry, both today and in the future. We will show the biggest wave of new features to come to PIX on Windows in its 10-year history. And our hardware partners will join us to showcase our joint collaborations – to DirectX, to PIX, and to the wide range of other GPU tools available on Windows today. 

Evolving DirectX for the ML Era on Windows 

Thursday, March 12, 12:45 PM – 1:45 PM (Room 2024, West Hall) 

Machine learning is reshaping graphics, and DirectX is evolving to meet this moment. This session explores how we’re enabling neural rendering on Windows by introducing linear algebra capabilities in HLSL, unlocking dedicated hardware acceleration for AI operations and allowing developers to embed lightweight models directly in shaders. We’ll also look ahead to the challenges of scaling beyond hand-authored shader models and share our vision for how we will eventually allow any developer to integrate entire models into their engines. Join us to learn how these innovations lay the groundwork for the next generation of real-time graphics and what they mean for the future of gaming. 

Advanced Shader Delivery on Windows 

Thursday, March 12, 3:40 PM – 4:40 PM (Room 2011, West Hall) 

Shader compilation has long been a pain point for PC games, causing slow initial loads and disruptive stutter during gameplay. Advanced Shader Delivery introduces a new approach: distributing precompiled shaders through storefronts so players experience faster startup and smoother performance. In this session, we’ll show how developers can leverage state object collection in their engines to automatically gather shader inputs, package them for submission through the Xbox PC app, and ensure updates stay in sync with game patches and driver changes. Come learn how this system streamlines deployment and improves the player experience across the PC ecosystem. 

See you at GDC! 

We’re looking forward to connecting with the community at GDC 2026 and sharing what we’ve been working on.  

If you’re attending, we hope you’ll stop by one (or all!) of our sessions! 

As always, check out our DirectX Discord server to stay in touch.

The post DirectX Innovation at GDC 2026  appeared first on DirectX Developer Blog.

Announcing Shader Model 6.9 Retail and New D3D12 Improvements
https://devblogs.microsoft.com/directx/shader-model-6-9-retail-and-more/
Thu, 26 Feb 2026 18:05:50 +0000

Today, we are pleased to announce that Shader Model 6.9 and other features have been officially released with Agility SDK 1.619 and complementary DXC 1.9.2602.16. Many of these features have been in preview status since 2025. Simultaneously, we are releasing a handful of new preview features in a separate preview runtime: Agility SDK 1.719-preview.

Overview

AgilitySDK 1.619 exposes the following features. There’s more detail further below, including download and driver links.

AgilitySDK 1.719-preview exposes the features in 1.619 in addition to the following preview D3D features, with more detail further below.

Downloads

Hardware Support

IHV Driver Link(s)
AMD AMD Software: Adrenalin Edition 26.2.1, AMD Software: AgilitySDK Developer Preview Edition 25.30.21.01
Intel Intel® Arc Graphics – Windows
NVIDIA Download The Official NVIDIA Drivers | NVIDIA or the NVIDIA App, which handles automatic driver updates. Driver version 595 and newer.

See Appendix > Feature Support for the full table of each feature’s supported hardware.

AgilitySDK 1.619 Features

Long Vector

The ability to load, store, and perform elementwise operations on HLSL vectors longer than 4 elements and up to 1024.  Required as part of Shader Model 6.9. Spec: https://github.com/microsoft/hlsl-specs/blob/main/proposals/0026-hlsl-long-vector-type.md

In addition to the above spec, check out this overview: https://devblogs.microsoft.com/directx/hlsl-native-and-long-vectors/. This mentions Cooperative Vector benefitting from Long Vectors; however, note that Cooperative Vector has been deprecated in favor of a future design unifying matrix-matrix and vector-matrix operations, coming in Shader Model 6.10. See this earlier blog: https://devblogs.microsoft.com/directx/shader-model-6-9-and-the-future-of-cooperative-vector/

16 bit float Specials

HLSL IsNan(), IsInf(), and IsFinite() now also support 16 bit floats. IsNormal() is newly added and also supports 16 bit floats. Spec: https://github.com/microsoft/hlsl-specs/blob/main/proposals/0038-16bit-isspecialfloat.md
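For readers who want to reason about what these classifications mean for a 16 bit float, the sketch below classifies raw IEEE 754 binary16 bit patterns on the CPU. It assumes the HLSL functions follow the usual IEEE 754 definitions (as C's `isnormal` does). The `Half*` function names are hypothetical and are not HLSL intrinsics.

```cpp
#include <cstdint>

// binary16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits.
constexpr std::uint16_t kHalfExponentMask = 0x7C00;
constexpr std::uint16_t kHalfMantissaMask = 0x03FF;

// NaN: exponent all ones, mantissa non-zero.
constexpr bool HalfIsNan(std::uint16_t bits) {
    return (bits & kHalfExponentMask) == kHalfExponentMask && (bits & kHalfMantissaMask) != 0;
}

// Infinity: exponent all ones, mantissa zero (either sign).
constexpr bool HalfIsInf(std::uint16_t bits) {
    return (bits & kHalfExponentMask) == kHalfExponentMask && (bits & kHalfMantissaMask) == 0;
}

// Finite: anything that is neither NaN nor infinity.
constexpr bool HalfIsFinite(std::uint16_t bits) {
    return (bits & kHalfExponentMask) != kHalfExponentMask;
}

// Normal: finite, non-zero, and not subnormal (exponent neither all zeros nor all ones).
constexpr bool HalfIsNormal(std::uint16_t bits) {
    return (bits & kHalfExponentMask) != kHalfExponentMask && (bits & kHalfExponentMask) != 0;
}
```

Under these definitions, 0x3C00 (1.0) is normal, 0x0001 (the smallest subnormal) is finite but not normal, 0x7C00 is positive infinity, and 0x7E00 is a NaN.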

Previously optional features required in SM 6.9

Spec: https://github.com/microsoft/hlsl-specs/blob/main/proposals/0044-sm69-required-features.md

DXR 1.2

Opacity Micromaps

Opacity Micromaps (OMMs) enable hardware to handle alpha tested geometry more efficiently than relying only on costly AnyHit shader invocations. The overall feature shipped previously and just the small HLSL portion is coming out of preview. 

Blog with more details for OMM overall: https://devblogs.microsoft.com/directx/d3d12-opacity-micromaps/
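As a sizing aid, the sketch below estimates the storage for one opacity micromap. It assumes that each subdivision level splits every micro-triangle into four, so level N has 4^N micro-triangles, and that each micro-triangle stores either a 1-bit (two-state) or 2-bit (four-state) opacity value. The function names are hypothetical. Check the OMM blog and spec for the exact formats and limits.

```cpp
#include <cstdint>

// Number of micro-triangles at a subdivision level: 4^level.
constexpr std::uint64_t MicroTriangleCount(std::uint32_t level) {
    return std::uint64_t{1} << (2 * level);
}

// Bytes needed to store one state per micro-triangle, rounded up to a whole byte.
constexpr std::uint64_t OmmBytes(std::uint32_t level, std::uint32_t bitsPerState) {
    return (MicroTriangleCount(level) * bitsPerState + 7) / 8;
}
```

At subdivision level 4 there are 256 micro-triangles, which take 32 bytes with 1-bit states and 64 bytes with 2-bit states under these assumptions.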

Shader Execution Reordering

Shader Execution Reordering (SER) enables application shader code to inform hardware on how to find coherency across rays so they can be sorted for better parallel execution. 

This feature is coming out of preview now.  The addition since preview is that apps can query if a device actually reorders.

See details in the device support section of this blog which covers SER overall: https://devblogs.microsoft.com/directx/shader-execution-reordering/

D3D Customer-Requested Features

Revised Resource View Creation APIs

As GPU architectures have evolved, the original D3D12 view‑creation model has shown limitations, especially around buffer access patterns, descriptor management, and alignment rules. Revised View Creation modernizes this area of the API in response to multiple customer requests.

Previously, buffer views were limited to being measured in elements. With this change, they can also be measured using byte offsets and sizes. In addition, variants of view creation have been added that return HRESULT rather than void to allow for programmatic error handling, as opposed to relying on debug layer validation and dealing with a device removal.

Spec: https://github.com/microsoft/DirectX-Specs/blob/master/d3d/D3D12RevisedCreateViews.md
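The difference between element-based and byte-based buffer views comes down to simple arithmetic. The sketch below uses hypothetical helper names (`ByteRange`, `ElementsToBytes`, `IsElementAligned`) rather than the actual D3D12 view description structures, which are defined in the spec.

```cpp
#include <cstdint>

// A buffer sub-range expressed in bytes.
struct ByteRange {
    std::uint64_t offsetBytes;
    std::uint64_t sizeBytes;
};

// Converts an element-based range (first element, count, stride) into bytes.
constexpr ByteRange ElementsToBytes(std::uint64_t firstElement,
                                    std::uint64_t numElements,
                                    std::uint64_t strideBytes) {
    return ByteRange{firstElement * strideBytes, numElements * strideBytes};
}

// True when a byte offset can also be expressed as a whole element index.
constexpr bool IsElementAligned(std::uint64_t offsetBytes, std::uint64_t strideBytes) {
    return strideBytes != 0 && offsetBytes % strideBytes == 0;
}
```

With a 12-byte stride, elements 10 through 109 map to byte offset 120 and size 1200. A sub-allocation that starts at byte 64 is not a multiple of 12, so it cannot be described by an element index, while a byte offset expresses it directly.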

Periodic Trim Notifications

Kernel-level trim notifications are now available through D3D12 runtime interfaces, enabling applications to receive notifications when the system should trim residency. No new driver support is required for this feature.

Spec: https://github.com/microsoft/DirectX-Specs/blob/master/d3d/D3D12_PeriodicTrimNotifications.md

Increased 1D Dispatch Limit

Increases the maximum one-dimensional Dispatch/DispatchMesh size (currently 65535) to a device-specific value, which is much larger on most recent hardware.

Spec: https://github.com/microsoft/DirectX-Specs/blob/master/d3d/D3D12IncreasedDispatchDimension.md
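To show what the 65535 limit costs in practice, the sketch below computes how many thread groups a 1D workload needs and, when that exceeds the per-dimension limit, folds it into a 2D grid. This folding is a common workaround and is an assumption of this example, not something the spec prescribes. `GroupsNeeded`, `Grid2D`, and `FoldTo2D` are hypothetical names.

```cpp
#include <cstdint>

constexpr std::uint32_t kLegacyMaxGroupsPerDim = 65535;

// Thread groups needed to cover `items` work items; threadsPerGroup must be non-zero.
constexpr std::uint64_t GroupsNeeded(std::uint64_t items, std::uint32_t threadsPerGroup) {
    return (items + threadsPerGroup - 1) / threadsPerGroup;
}

struct Grid2D {
    std::uint32_t x;
    std::uint32_t y;
};

// Folds a 1D group count into an X by Y grid with X <= maxPerDim and X * Y >= groups.
// The shader must bounds-check, because X * Y can exceed the requested group count.
constexpr Grid2D FoldTo2D(std::uint64_t groups,
                          std::uint32_t maxPerDim = kLegacyMaxGroupsPerDim) {
    return groups <= maxPerDim
        ? Grid2D{static_cast<std::uint32_t>(groups), 1}
        : Grid2D{maxPerDim, static_cast<std::uint32_t>((groups + maxPerDim - 1) / maxPerDim)};
}
```

Ten million items at 64 threads per group need 156,250 groups, which folds to a 65535 by 3 grid under the old limit and launches 40,355 surplus groups. On a device that reports a large enough 1D limit, the same work fits in a single one-dimensional dispatch with no folding or surplus.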

CPU Timeline Query Resolves

A new kind of Query Heap which can be resolved on the CPU timeline, avoiding unnecessary GPU work and overhead. Introduces ID3D12Device15::CreateQueryHeap1 along with ID3D12Device15::ResolveQueryData.

Spec: https://github.com/microsoft/DirectX-Specs/blob/master/d3d/D3D12CpuTimelineQueryResolution.md

AgilitySDK 1.719-preview Features

The following preview features are available in 1.719-preview: https://devblogs.microsoft.com/directx/directx12agility/

Enhanced Barriers update: Fence Barriers

Fence Barriers expand on Enhanced Barriers to provide support for signaling and waiting on fences during command buffer execution. This provides more flexibility to synchronize between more distant dependencies in the command stream and allows for real-time dependencies between the GPU timeline and CPU timeline.

See this separate blog for full details: https://devblogs.microsoft.com/directx/fence-barriers-fine-grained-gpu-synchronization-in-direct3d-12/

VPblit 3DLUT

The D3D12 VPBlit 3DLUT API enables access to dedicated video processing hardware for tone mapping operations that combine CSC, 1D LUT, and 3D LUT stages. While equivalent functionality can be achieved using D3D12 shaders on the 3D engine, exposing the video processing 3DLUT path through this API allows drivers and hardware to execute these operations more efficiently. This offloads tone mapping work from the 3D GPU engine and, in some scenarios, can reduce power consumption by leveraging the video processing engine. For detailed information, please refer to the spec document.

Driver support is available as follows:

Extension Mechanism

The D3D12 Extensions API enables GPU hardware vendors (IHVs) and software developers (ISVs) to collaborate on new experimental graphics features. By providing a structured extension mechanism within the D3D12 ecosystem, vendors can expose experimental or vendor-specific capabilities immediately, gather real-world feedback, and iterate rapidly—dramatically shortening the time it takes to bring features into the core API.

The core interface, ID3D12Extension, derives from ID3D12DeviceChild and integrates seamlessly with D3D12’s object model. Extensions communicate through D3D12_EXTENSION_ARGUMENTS (flexible input/output buffers) and can hook into standard operations like resource creation via ID3D12DeviceApiExtensions.

PIX

PIX supports all retail and preview features released here. See this PIX blog: https://devblogs.microsoft.com/pix/pix-2602-25

Appendix

Feature Support

Using the latest drivers linked in Overview > Hardware Support:

Feature | AMD | Intel | NVIDIA
Long Vector | AMD Radeon RX 9000 series | Intel® Arc B-Series Graphics | All RTX hardware
16 bit float Specials | AMD Radeon RX 9000 series | Intel® Arc B-Series Graphics | All RTX hardware
Opacity Micromaps (OMM) | | | All RTX hardware. Hardware-accelerated on RTX 4xxx+ GPUs, software-emulated on older.
Shader Execution Reordering (SER) | AMD Radeon RX 9000 series supports API but doesn’t reorder. | Intel® Arc B-Series Graphics support API and do reordering. | RTX 4xxx+ GPUs support API and do reordering.
Revised Resource View Creation APIs | AMD Radeon RX 7000 and 9000 series | Intel® Arc B-Series Graphics | All RTX hardware
Periodic Trim Notifications | | Intel® Arc B-Series Graphics | All RTX hardware
Increased Dispatch Grid Limit | AMD Radeon RX 7000 and 9000 series. UINT_MAX compute, 64k mesh. | Intel® Arc B-Series Graphics. Existing 64k limit, to increase in future drivers. | All RTX hardware. Existing 64k limit, to increase in future drivers.
CPU Timeline Query Resolves | AMD Radeon RX 7000 and 9000 series | Intel® Arc B-Series Graphics | All RTX hardware
Fence Barriers (preview) | AMD Radeon RX 7000 and 9000 series | Intel® Arc B-Series Graphics | Contact your developer relations representative for in-development driver access.
VPblit 3DLUT (preview) | AMD Radeon RX 7000 series graphics cards and Ryzen AI 300/400 series processors with integrated graphics | Intel Core Ultra processor family Lunar Lake and Panther Lake platforms | Contact your developer relations representative for specifics.

 

The post Announcing Shader Model 6.9 Retail and New D3D12 Improvements appeared first on DirectX Developer Blog.

]]>