
GPUhub Elastic Deployment API

Read the API Documentation for more details.

Why Not Serverless

Why choose Elastic Containers over Pure Serverless

Best Practices

Efficient deployments, service discovery and load balancing, container management, and more.

Scheduling Mode Overview

Resource Partitioning Logic

How Are Resources Divided (Computing Power Units)?
  1. GPUhub manages multiple physical servers (hosts) with varying setups, including different GPU models/numbers, CPU cores, and memory amounts.
  2. For each host, the system automatically divides the CPU and memory evenly based on the number of GPUs (e.g., if a host has 8 GPUs, it creates 8 “units”).
  3. Each unit is a fixed bundle: 1 GPU + a portion of CPU + a portion of memory. You can’t change or split these bundles—it’s designed for simplicity.
Simple Example:

  • Host A has: 8x RTX 5090 GPUs, 128 CPU cores, 720GB memory.
  • Each unit on this host: 1x RTX 5090 GPU + 16 CPU cores + 90GB memory.
  • When you create a container (your virtual machine), you can only request 1 to 8 units. The CPU/memory/GPU ratios stay fixed—no custom tweaks.
This keeps things straightforward: you pick how many GPUs you want, and the system gives you proportional CPU and memory automatically.
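
The unit arithmetic above can be checked with a short sketch. This is only an illustration of the even split described in this section, not GPUhub's implementation: the `Host` and `Allocation` classes and the `allocate` function are hypothetical names, and rounding down when a host's CPU or memory does not divide evenly is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one physical server."""
    name: str
    gpu_model: str
    gpus: int
    cpu_cores: int
    memory_gb: int


@dataclass(frozen=True)
class Allocation:
    """Resources a container receives for a given number of units."""
    gpu_model: str
    gpus: int
    cpu_cores: int
    memory_gb: int


def allocate(host: Host, units: int) -> Allocation:
    """Return the resources for `units` units (1 unit = 1 GPU + an even share)."""
    if not 1 <= units <= host.gpus:
        raise ValueError(f"units must be between 1 and {host.gpus}")
    # Assumption: shares are rounded down if they do not divide evenly.
    return Allocation(
        gpu_model=host.gpu_model,
        gpus=units,
        cpu_cores=host.cpu_cores * units // host.gpus,
        memory_gb=host.memory_gb * units // host.gpus,
    )


host_a = Host("Host A", "RTX 5090", gpus=8, cpu_cores=128, memory_gb=720)
print(allocate(host_a, 1))  # 1 GPU, 16 CPU cores, 90 GB memory
print(allocate(host_a, 3))  # 3 GPUs, 48 CPU cores, 270 GB memory
```

Requesting more units than the host has GPUs raises an error in this sketch, which mirrors the "1 to 8 units" limit for Host A.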

Creation & Launch Mechanism

How Are Containers Created and Started (Scheduling)?
  1. When you request a container, you specify basic conditions like: GPU model and count, CPU core range, memory size range, or price limit.
  2. Because units are fixed and hosts differ, the system scans the available hosts and picks one whose resources satisfy your conditions.
  3. Once matched, the container starts right away on that host.
Simple Example:

Available hosts:
  • Host A: 8x RTX 5090, 128 CPU cores, 720GB memory
  • Host B: 8x RTX 4090, 64 CPU cores, 720GB memory
  • Host C: 8x RTX PRO 6000, 128 CPU cores, 360GB memory
Your request: 8 GPUs, CPU cores between 100-200, memory between 224-1024GB.
System matches: Host A or Host C (Host B doesn’t meet CPU range).
  • If on Host A: You get 8x RTX 5090, 128 CPU cores, 720GB memory.
  • If on Host C: You get 8x RTX PRO 6000, 128 CPU cores, 360GB memory.
In short, you set loose requirements and the system finds the best fit quickly, so there is no need to micromanage hardware details. If no host matches, the request is queued until one becomes available. This reduces complexity while keeping resource use efficient.
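
The matching step in this example can be sketched as a simple filter. The names below (`Host`, `Request`, `matches`) are hypothetical and do not come from the GPUhub API. The sketch also ignores price limits and whether units are currently free, so it only shows the range check on GPU count, CPU cores, and memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one physical server."""
    name: str
    gpu_model: str
    gpus: int
    cpu_cores: int
    memory_gb: int


@dataclass(frozen=True)
class Request:
    """Hypothetical scheduling conditions for one container."""
    gpus: int
    cpu_min: int
    cpu_max: int
    mem_min_gb: int
    mem_max_gb: int
    gpu_model: Optional[str] = None  # None means any GPU model is acceptable


def matches(host: Host, req: Request) -> bool:
    """True if `req.gpus` units on `host` fall inside the requested ranges."""
    if req.gpus > host.gpus:
        return False
    if req.gpu_model is not None and req.gpu_model != host.gpu_model:
        return False
    # Resources scale with the number of units (1 unit = 1 GPU + even share).
    cpu = host.cpu_cores * req.gpus // host.gpus
    mem = host.memory_gb * req.gpus // host.gpus
    return (req.cpu_min <= cpu <= req.cpu_max
            and req.mem_min_gb <= mem <= req.mem_max_gb)


HOSTS = [
    Host("Host A", "RTX 5090", 8, 128, 720),
    Host("Host B", "RTX 4090", 8, 64, 720),
    Host("Host C", "RTX PRO 6000", 8, 128, 360),
]

request = Request(gpus=8, cpu_min=100, cpu_max=200, mem_min_gb=224, mem_max_gb=1024)
candidates = [h.name for h in HOSTS if matches(h, request)]
print(candidates)  # ['Host A', 'Host C']
```

Host B is filtered out because its 64 CPU cores fall below the requested minimum of 100, which matches the outcome described in the example.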

Elastic Scheduling Mode

Mode 1: ReplicaSet-Type Scheduling

Applicable Scenarios: Long-term services requiring high availability, such as web apps, APIs, or persistent inference servers.
In GPUhub, a ReplicaSet creates and maintains a stable set of always-running containers, keeping the number of running containers at the specified replica count. Each replica is scheduled and started according to the container scheduling conditions you set. If you modify the scheduling conditions, the system destroys running containers that no longer meet them and starts new ones that do. If you modify the replica count, the system immediately creates new containers or destroys running ones until the number of running containers equals the replica count you set.
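
One way to picture this behavior is as a reconciliation pass that compares the running containers with the desired state. The sketch below is an illustration only: `reconcile` and its parameters are hypothetical names, and the choice of which containers to destroy when scaling down (here, the last ones in the list) is an assumption, since the text does not specify it.

```python
from typing import Callable, List, Tuple


def reconcile(
    running: List[str],
    replicas: int,
    meets_conditions: Callable[[str], bool],
) -> Tuple[List[str], List[str], int]:
    """Return (keep, destroy, to_start) for one reconciliation pass."""
    # Containers that no longer meet the scheduling conditions are destroyed.
    keep = [c for c in running if meets_conditions(c)]
    destroy = [c for c in running if not meets_conditions(c)]
    # Scale down: destroy surplus containers (assumption: the last ones).
    if len(keep) > replicas:
        destroy.extend(keep[replicas:])
        keep = keep[:replicas]
    # Scale up: start enough new containers to reach the replica count.
    to_start = replicas - len(keep)
    return keep, destroy, to_start


running = ["c1-5090", "c2-4090", "c3-5090"]
only_5090 = lambda name: name.endswith("5090")

# Conditions changed to "RTX 5090 only": c2 is replaced.
print(reconcile(running, 3, only_5090))  # (['c1-5090', 'c3-5090'], ['c2-4090'], 1)

# Replica count lowered to 1: surplus containers are destroyed.
print(reconcile(running, 1, only_5090))  # (['c1-5090'], ['c2-4090', 'c3-5090'], 0)
```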

Mode 2: Job-Type Scheduling

Applicable Scenarios: Batch processing tasks, such as distributed training jobs, data processing pipelines, or one-off computations that run to completion.
In GPUhub, a Job creates one or more containers and runs until the specified number of containers have completed execution and exited. Unlike a ReplicaSet, a Job does not replace a finished container in order to maintain a running replica count; new containers are started only while the number of completed containers is still below the target. Each completed container is recorded as done, and once the number of completed containers reaches the target value, the entire scheduling ends and no new containers are started.

Mode 3: Container-Type Scheduling

Applicable Scenarios: Simple, single-instance tasks like quick testing, debugging, or short-lived scripts.
In GPUhub, a Container creates exactly one container, and scheduling ends when that container finishes and exits. This is equivalent to a Job with a target container count of 1.
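
The completion counting behind the Job and Container modes can be sketched with a single function. The name `containers_to_start` is hypothetical, and the sketch assumes that all remaining containers may run at the same time; the text does not say how many containers a Job runs in parallel or how failed containers are counted.

```python
def containers_to_start(target_completions: int, completed: int, running: int) -> int:
    """How many new containers a Job-style scheduler would start right now."""
    remaining = target_completions - completed
    if remaining <= 0:
        return 0  # Target reached: scheduling ends, nothing new is started.
    # Assumption: containers already running will count toward the target
    # once they finish, so only the shortfall is started.
    return max(0, remaining - running)


# Job with a target of 3 completed containers.
print(containers_to_start(3, completed=0, running=0))  # 3
print(containers_to_start(3, completed=1, running=2))  # 0
print(containers_to_start(3, completed=3, running=0))  # 0

# Container mode behaves like a Job with a target of 1.
print(containers_to_start(1, completed=0, running=0))  # 1
print(containers_to_start(1, completed=1, running=0))  # 0
```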

Container Lifecycle

The container’s lifecycle follows the lifecycle of the cmd command you set: when cmd finishes, the container exits and shuts down. Therefore, if cmd starts your application in the background, add sleep infinity at the end of cmd so that the parent process does not exit; otherwise the container shuts down and all other processes in it end. Both of the following methods work (the specific commands are examples only):
# Method 1 example:
python app.py

# Method 2 example:
nohup python app.py & sleep infinity
In Method 2, the application runs in the background and sleep infinity keeps cmd from ending, so the application’s lifecycle is independent of the container’s lifecycle. You then have to manage the application’s lifecycle yourself: if app.py ends, the container keeps running normally, and the application state cannot be inferred from the container state. Use Method 2 if you need to manage the application’s lifecycle yourself; otherwise, Method 1 is recommended. In either case, you can call the API to stop the elastic deployment or to stop a specific container, which shuts the container down.

Billing

Please refer to the Elastic Deployment section of Billing for more details.

Differences Between “Container Instances” and “Serverless Containers”

Container Instances (Rented in Computing Market) vs. Elastic Deployment GPU Containers
  1. Data Retention Rules
    • Container Instances: Have a data retention period after shutdown.
    • Elastic Deployment GPU Containers: Release data immediately upon shutdown and do not retain it.
  2. Data Disk Expansion
    • Container Instances: Support data disk expansion.
    • Elastic Deployment GPU Containers: Do not support expansion; default data disk size is 50GB.
  3. Restart Method
    • Container Instances: Can be restarted as long as not released.
    • Elastic Deployment GPU Containers: Cannot be restarted after stopping. If the deployment has stopped, create a new deployment; if not stopped, adjust the number of running containers by setting replica count.
  4. Entering Container Method
    • Container Instances: (Assumed to support JupyterLab, based on context).
    • Elastic Deployment GPU Containers: Have no JupyterLab entry; to enter the container, obtain SSH command and password (refer to Documentation).