To avoid the error RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED, make sure cuDNN is properly initialized:
1. Check the status of cuDNN and install it if it is not present.
2. Check that the PyTorch, cuDNN, and CUDA versions are compatible with one another.
3. Confirm that the library paths are configured correctly.
4. Rectify any potential issues with GPU memory, or try switching to the CPU.
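As a starting point, the four checks above can be combined into a single diagnostic sketch. The helper name diagnose_cudnn is ours, and the script degrades gracefully when PyTorch or CUDA is absent:

```python
def diagnose_cudnn():
    """Report cuDNN availability, versions, and GPU memory, if PyTorch is present."""
    try:
        import torch
    except ImportError:
        return {"torch": None}  # PyTorch itself is missing

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # CUDA version PyTorch was built against
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),
    }
    if report["cuda_available"]:
        # Free GPU memory in bytes; very low values hint at a memory problem
        free_mem, total_mem = torch.cuda.mem_get_info()
        report["free_gpu_memory"] = free_mem
    return report

print(diagnose_cudnn())
```

Running this before anything else gives you, in one place, every fact the solutions below rely on.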
Reasons for encountering RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
The RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED is common for developers who work with convolutional neural networks (CNNs) or other deep learning models that rely on cuDNN, a library that provides optimized implementations of common CNN operations such as activation, convolution, and pooling.
Developed by NVIDIA to make deep learning applications run fast on GPUs, cuDNN, short for CUDA Deep Neural Network library, works with popular frameworks such as Keras, PyTorch, and TensorFlow.
Given the faster training and memory savings cuDNN provides, encountering errors during development can be a big setback. One of the most common is RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED.
Just as the message implies, PyTorch cannot find or initialize cuDNN on the system. The underlying cause can be a mismatch between PyTorch and the installed CUDA version, insufficient GPU memory, or a missing or incompatible cuDNN installation.
We therefore use a blend of options to solve the error: checking and installing cuDNN, fixing version compatibility, correcting library paths, and managing memory issues. The sections below show how these solutions help you resolve RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED with ease so you can continue with your programming endeavors.
Employ extra caution when installing or updating cuDNN, since it can conflict with or overwrite files and libraries that already exist on the system; it is always good practice to back up your system before making changes.
Solutions
Solution 1: Check the State of cuDNN and Install It
Your first step is to confirm that cuDNN is installed correctly and works well on your system.
Here is a short program that checks whether cuDNN is available and whether it is compatible with your version of PyTorch:
import torch

# Prints True if PyTorch can use cuDNN, then the version as an integer
# (for example, 8700 corresponds to cuDNN 8.7.0)
print(torch.backends.cudnn.is_available())
print(torch.backends.cudnn.version())
Output:
True
8700
There are two expected outcomes from the above program: it prints True followed by the version number if cuDNN is available on the system, or it prints False (and None for the version) when cuDNN is not installed or is incompatible with your PyTorch version.
If your case is the latter, don't fret: there is a way to resolve the error.
First, go to the NVIDIA website to download and install cuDNN; downloading requires registering as a developer. Before downloading, check the PyTorch website for compatibility so that the cuDNN you download aligns with the CUDA and PyTorch versions on your device.
After you have downloaded the package, follow the installation steps for your operating system.
On Windows 10, for example, the process requires extracting the files from the package and copying them into the CUDA installation directory, which is normally C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1. You may also need to add the bin folder of the CUDA installation directory to the PATH environment variable.
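You can sanity-check the PATH step from Python. The directory below is the default CUDA 11.1 location mentioned above and may differ on your machine:

```python
import os

# Default CUDA 11.1 install path from the steps above; adjust to your version
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin"

# PATH is split on ';' on Windows and ':' on Unix-like systems
path_entries = os.environ.get("PATH", "").split(os.pathsep)
on_path = any(entry.rstrip("\\/").lower() == cuda_bin.rstrip("\\/").lower()
              for entry in path_entries)
print("CUDA bin on PATH:", on_path)
```

If this prints False after installation, add the bin folder to PATH and open a fresh shell before retrying.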
Solution 2: Rectify Version Compatibility Issues
The next option is to resolve compatibility issues between PyTorch, cuDNN, and CUDA. Ensure that the versions of these components complement each other and, in particular, satisfy the requirements of the deep learning model that you want to build.
import torch

print(torch.__version__)   # PyTorch version, for example 2.0.1+cu118
print(torch.version.cuda)  # CUDA version PyTorch was built against, for example 11.8
Output:
2.0.1+cu118
11.8
This prints the versions of PyTorch and CUDA, which you can compare against the compatibility tables on the PyTorch website. Based on the results, you can then either upgrade or downgrade so that the PyTorch, CUDA, and cuDNN versions match.
Below is an example showing installation of matching components using pip:
# Install PyTorch 1.8.0 with CUDA 11.1 and cuDNN 8.0.5
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
Check the PyTorch website for installation commands based on the desired configurations.
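As a rough extra check, the +cuXYZ tag that PyTorch wheels embed in the version string can be compared against torch.version.cuda. This is a heuristic sketch (the helper cuda_tags_match is ours) and assumes the standard wheel naming convention:

```python
def cuda_tags_match(torch_version, cuda_version):
    """Return True if a '+cuXYZ' suffix in the PyTorch version matches the CUDA version."""
    if "+cu" not in torch_version:
        return False  # CPU-only wheel or non-standard build tag
    tag = torch_version.split("+cu", 1)[1]            # e.g. "118"
    expected = (cuda_version or "").replace(".", "")  # "11.8" -> "118"
    return expected != "" and tag.startswith(expected)

# Versions from the example output above:
print(cuda_tags_match("2.0.1+cu118", "11.8"))  # True
print(cuda_tags_match("1.8.0+cu111", "11.8"))  # False
```

A False result does not prove the install is broken, but it is a quick hint that the wheel was built for a different CUDA release.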
Solution 3: Check the way library path are configured
A different cause of Runtimeerror: cudnn error: cudnn_status_not_initialized is because you have a wrong configuration of the library path. You will find it configured on the environment variable LB_LIBRARY_PATH. You can confirm its location with the help of the command echo $LD_LIBRARY_PATH that outputs the library path or else an error message if not present or configured inappropriately.
To rectify the issue and get your program to run as anticiated, add the directory where the library is situated to the LD_LIBRARY_PATH.Let's take a case where the library is in the directory /usr/local/lib, you can set the environment variable with the help of the following command:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib.
After executing the command, restart the shell and you find that the program now executes well. Then use the ldd command in linux to list all the libraries that are linked to the program.
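You can confirm from Python that the directory actually landed in LD_LIBRARY_PATH, reusing /usr/local/lib from the example above:

```python
import os

lib_dir = "/usr/local/lib"  # directory from the example above; adjust as needed

# LD_LIBRARY_PATH is a colon-separated list on Linux; empty string if unset
entries = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
print(lib_dir in entries)
```

Remember that the export only affects the current shell; add it to your shell profile (for example ~/.bashrc) to make it permanent.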
Solution 4: Rectify Issues with GPU Memory
Finally, we address the memory issues that can cause cuDNN initialization to fail: when GPU memory is low, cuDNN may be unable to initialize because most of the memory is held in PyTorch's internal cache.
The following call releases the cache manually:
torch.cuda.empty_cache()
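To see how much of the GPU PyTorch is actually holding, you can inspect the allocator counters around the cache release. This sketch assumes a CUDA-enabled PyTorch install and skips itself otherwise:

```python
try:
    import torch
    has_cuda = torch.cuda.is_available()
except ImportError:
    has_cuda = False  # PyTorch not installed; nothing to inspect

if has_cuda:
    # Memory actively used by tensors vs. memory reserved in PyTorch's cache
    print("allocated:", torch.cuda.memory_allocated())
    print("reserved: ", torch.cuda.memory_reserved())
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    print("reserved after empty_cache:", torch.cuda.memory_reserved())
```

If reserved memory stays high even after the call, live tensors are still referencing it, and reducing the batch or model size is the more likely fix.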
Alternatively, we can force cuDNN initialization by running a mock convolution at the start of the program:
import torch

def force_cudnn_initialization():
    # Run a throwaway convolution so that cuDNN allocates its handles up front
    s = 32
    dev = torch.device('cuda')
    torch.nn.functional.conv2d(torch.zeros(s, s, s, s, device=dev),
                               torch.zeros(s, s, s, s, device=dev))
Call this function at the start of your program; it has proven effective in resolving the error.
Conclusion
Hopefully, this article has enabled you to understand and solve the RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED that is common in Python. Go for the solution that seems most suitable for resolving your error, and attempt different options if the first one does not work.
Try the solutions and then share your feedback below.