Use the `nvidia-smi` command to check GPU usage. If you find that the program has already terminated but GPU memory is still occupied, a residual process is holding the memory. You can release it as follows:

First, list all running processes with the `ps -ef` command.
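As a quick sketch of what the listing looks like: the PID is the second column (the rows below are illustrative, not real output):

```shell
# List every process on the machine; the second column is the PID
ps -ef
# UID   PID  PPID  C STIME TTY    TIME     CMD        (illustrative)
# root    1     0  0 09:00 ?      00:00:03 /sbin/init
# user  594   480  0 10:12 pts/0  00:01:23 python train.py
```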

The `python train.py` entries in the screenshot are processes I started; the others are system processes unrelated to GPU memory usage.
- Next, terminate the processes:
- From the screenshot, the process IDs for `python train.py` are 594 and 797. You can run `kill -9 594 797` to end them. However, when many processes occupy GPU memory, especially in multi-GPU parallel scenarios, this method becomes cumbersome.
- Here is a more powerful way to terminate processes:
- From the `ps -ef` output, you can see that all my processes contain the keyword `train` (and the unrelated system processes do not, which avoids accidental termination). You can filter your processes with the `grep` command, for example: `ps -ef | grep train`.

Next, extract the process IDs with the `awk` command. `awk` is a complex tool, but here you only need to remember the following command:

```
ps -ef | grep train | awk '{print $2}' | xargs kill -9
```
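Reading the pipeline stage by stage (a sketch; `train` is just the example keyword):

```shell
# Each stage pipes its output into the next:
ps -ef |             # list every process, one per line
  grep train |       # keep only lines containing the keyword "train"
  awk '{print $2}'   # print the second field of each line: the PID
# Appending "| xargs kill -9" turns those PIDs into arguments of kill -9.
```

`xargs` is needed because `kill` takes PIDs as command-line arguments, not on stdin.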

Note that the `grep train` command itself generates a process whose command line contains the keyword, so it shows up in the filtered output as well; it needs to be filtered out so that `kill` does not target it.
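A common way to keep the `grep` process out of the kill list is to add `grep -v grep`. The sketch below demonstrates it on simulated `ps -ef` output, so nothing is actually killed:

```shell
# Simulated "ps -ef" output: the second line is the grep process itself
printf 'user 594 1 0 python train.py\nuser 812 1 0 grep train\n' |
  grep train |
  grep -v grep |     # -v inverts the match: drop the grep line
  awk '{print $2}'
# prints: 594
```

On a real system you would replace the `printf` with `ps -ef` and append `| xargs kill -9`. An equivalent shortcut is `pkill -9 -f train`, which matches the keyword against full command lines directly.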

The `|` symbol is called a pipe. It feeds the output of one command into the input of the next (normally stdout; stderr requires separate handling, e.g. `2>&1`). Pipes are useful in many scenarios. For example, if a directory contains tens of thousands of files, only one of which is a `.txt` file while the rest are images, manually searching the list produced by `ls` would be very cumbersome. Instead, you can run `ls | grep "\.txt$"`.
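As a concrete sketch of that last example, using a throwaway directory so nothing real is touched:

```shell
# Create a scratch directory with two images and one text file
dir=$(mktemp -d)
touch "$dir/a.png" "$dir/b.jpg" "$dir/notes.txt"

# Pipe the file listing into grep; only the .txt name survives
ls "$dir" | grep "\.txt$"
# prints: notes.txt
```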