Why is my program stuck with no output?
First, use the top and nvidia-smi commands to check CPU and GPU usage. If the CPU is at 100% while the GPU is idle, the program is likely stuck at a GPU call; refer to the answer to the previous question for details. Otherwise, add print statements at key lines to pinpoint where the program hangs, then search online for the specific issue. Always analyze the code instead of guessing blindly.

Why am I encountering CUDA OOM (Out of Memory)?
If your program reports OOM, start by setting the batch size to 1 and increase it gradually to find the point of failure. This will help you decide whether to upgrade your configuration or switch to a GPU with more memory. If the first run succeeds but subsequent runs fail, use nvidia-smi to check for residual GPU memory usage. If there is residual usage, find the lingering process with ps -ef and terminate it with kill -9 <PID>. If there is no residual usage, the program may inherently require more memory during computation, which is common with dynamic frameworks.
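The "start at batch size 1 and increase" probe above can be sketched as a small helper. This is a minimal illustration, not a definitive recipe: `try_step` is a hypothetical callable standing in for one training step of your model; with PyTorch, an OOM surfaces as a RuntimeError (torch.cuda.OutOfMemoryError), which the sketch treats generically.

```python
def find_max_batch_size(try_step, start=1, limit=1024):
    """Double the batch size until try_step raises, then return the
    last batch size that ran successfully (0 if even `start` fails).

    try_step(batch_size) is a hypothetical callable that runs one
    training step; it should raise on out-of-memory.
    """
    last_ok = 0
    size = start
    while size <= limit:
        try:
            try_step(size)       # run one step at this batch size
            last_ok = size
            size *= 2            # grow until it breaks
        except RuntimeError:     # sketch: treat RuntimeError as OOM
            break
    return last_ok

# Example with a fake step that "runs out of memory" above batch size 8:
def fake_step(bs):
    if bs > 8:
        raise RuntimeError("CUDA out of memory")

print(find_max_batch_size(fake_step))  # prints 8
```

In practice you would also clear cached memory between attempts (e.g. torch.cuda.empty_cache() in PyTorch), since fragmentation from a failed attempt can skew the result.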
What if there are not enough idle GPUs on the host?
You can start the instance in “no-GPU mode” to download important data, migrate the instance to another host, or wait for GPUs to become available on the current host.

Why can’t I connect to VSCode or SSH after changing the instance image?
For Linux/Mac users, delete the local known_hosts file by running rm ~/.ssh/known_hosts. For Windows users, delete C:/Users/<username>/.ssh/known_hosts. Then retry the connection.
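Deleting the whole known_hosts file discards the saved keys of every host you have ever connected to. A gentler option is to remove only the stale entry; the standard tool for this is ssh-keygen -R <host>. The sketch below (the function name is mine, not part of any library) shows the same idea for plain, unhashed known_hosts entries:

```python
from pathlib import Path

def remove_host_entry(known_hosts: Path, host: str) -> int:
    """Drop lines for `host` from a known_hosts file; return how many
    entries were removed. All other hosts' keys are preserved.

    Note: this sketch does not handle hashed entries
    (HashKnownHosts yes); for those, use: ssh-keygen -R <host>
    """
    if not known_hosts.exists():
        return 0
    kept, dropped = [], 0
    for line in known_hosts.read_text().splitlines():
        # The first field is a comma-separated list of host patterns.
        hosts = line.split(" ", 1)[0].split(",") if line.strip() else []
        if host in hosts:
            dropped += 1
        else:
            kept.append(line)
    known_hosts.write_text("\n".join(kept) + ("\n" if kept else ""))
    return dropped
```

After removing the stale entry, the next connection will prompt you to accept the instance's new host key.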