[BUG] Memory leak in Cholesky decomposition (CUDA) · Issue #3702 · arrayfire/arrayfire · GitHub
[BUG] Memory leak in Cholesky decomposition (CUDA) #3702

Description

We observed free GPU memory decreasing gradually over time and traced it to the Cholesky decomposition. Because of our architecture, this calculation occasionally runs in a newly created thread. Each new thread repeats a thread-local allocation of about 40 MB that is never freed, which eventually exhausts GPU memory.

Our software runs a number of services in different threads. Occasionally, the software destroys these threads and creates new ones. We run ArrayFire in one of these threads. Each time the software destroys and re-creates its threads, GPU memory usage increases and never goes back down. After a series of such destroy/re-create cycles, GPU memory fills up.

We traced this allocation to af::choleskyInPlace, so it likely originates inside cuSolver.
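Since the growth is tied to thread creation, one possible mitigation is to route all GPU work through a single long-lived thread, so that any per-thread state is set up only once. The sketch below uses only the C++ standard library. `GpuWorker` and `submit` are hypothetical names, not part of ArrayFire, and this approach has not been tested against ArrayFire or confirmed by its maintainers. It would only sidestep the leak, not fix it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper (not part of ArrayFire): one long-lived thread that
// executes every submitted job, so per-thread GPU state is created only once.
class GpuWorker {
public:
    GpuWorker() : worker_([this] { run(); }) {}

    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining queued jobs are drained before exit
    }

    GpuWorker(const GpuWorker&) = delete;
    GpuWorker& operator=(const GpuWorker&) = delete;

    // Queue a job; the returned future becomes ready once the job has run.
    std::future<void> submit(std::function<void()> job) {
        assert(job);  // an empty std::function cannot be executed
        std::packaged_task<void()> task(std::move(job));
        std::future<void> done = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(task));
        }
        cv_.notify_one();
        return done;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task();  // an exception thrown by the job is stored in its future
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

In a setup like ours, the short-lived service threads would call `worker.submit(...)` with the code that calls af::choleskyInPlace and wait on the returned future, instead of calling ArrayFire directly. The worker itself would have to outlive the destroy/re-create cycles.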

Reproducible Code and/or Steps

The code below reproduces the problem. Each thread allocates an extra 40MB of GPU memory, which is never released even after the thread is joined.

    #include <arrayfire.h>
    #include <cuda_runtime.h>

    #include <iostream>
    #include <thread>

    std::size_t available_mem() {
        std::size_t free = 0;
        std::size_t total = 0;
        cudaMemGetInfo(&free, &total);
        return free;
    }

    int main() try {
        af::info();
        std::size_t init_mem = available_mem();
        for (std::size_t i = 0; i < 10; ++i) {
            std::thread t([&]() {
                af::array x = af::randu(40, 10);
                af::array l = af::matmulTN(x, x);
                x = af::array();
                af::eval(l);
                af::deviceGC();
                if (af::choleskyInPlace(l, false) != 0) {
                    std::cout << "bad" << std::endl;
                    return;
                }
                af::eval(l);
                l = af::array();
                af::deviceGC();
                std::cout << "consumed: "
                          << (init_mem - available_mem()) / 1024.0 / 1024.0
                          << std::endl;
            });
            t.join();
        }
        return 0;
    } catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }

Output:

    consumed: 50
    consumed: 90
    consumed: 130
    consumed: 172
    consumed: 212
    consumed: 252
    consumed: 294
    consumed: 334
    consumed: 374
    consumed: 416
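The per-thread figure can be checked against these readings by taking the difference between consecutive values. The helpers below are a small sketch for that arithmetic. The function names are illustrative, and the readings are treated as whole megabytes exactly as printed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Growth caused by each additional thread: difference between consecutive readings.
std::vector<int> per_thread_growth(const std::vector<int>& consumed_mb) {
    std::vector<int> growth;
    for (std::size_t i = 1; i < consumed_mb.size(); ++i) {
        growth.push_back(consumed_mb[i] - consumed_mb[i - 1]);
    }
    return growth;
}

// Average growth per thread over the whole run (needs at least two readings).
double mean_growth(const std::vector<int>& consumed_mb) {
    assert(consumed_mb.size() >= 2);
    const int total = consumed_mb.back() - consumed_mb.front();
    return static_cast<double>(total) / static_cast<double>(consumed_mb.size() - 1);
}
```

Applied to the ten readings above, the differences are 40, 40, 42, 40, 40, 42, 40, 40, 42, and the mean is 366 / 9, or about 40.7 MB per thread. The first reading of 50 is larger than any later step. This may reflect one-time setup on top of the per-thread cost, but that is an assumption.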

System Information

  1. ArrayFire version: 3.8.3
  2. Devices installed on the system: Nvidia GTX 1070 8GB
  3. Output from the af::info() function:
     ArrayFire v3.8.3 (CUDA, 64-bit Windows, build 987d5675a)
     Platform: CUDA Runtime 11.8, Driver: 13000
     [0] NVIDIA GeForce GTX 1070, 8192 MB, CUDA Compute 6.1
  4. ...

Checklist

  • Using the latest available ArrayFire release
  • GPU drivers are up to date
