RunPod Deployment

Deploy Subtide on RunPod for GPU-accelerated transcription.


RunPod provides GPU instances for running Subtide with hardware-accelerated Whisper transcription. Two deployment modes are available:

| Mode | Best For | Cost Model |
| --- | --- | --- |
| Serverless | Variable load, pay-per-use | Per-second billing |
| Dedicated | Consistent load, always-on | Hourly billing |

Pull the Subtide RunPod image:

```bash
docker pull ghcr.io/rennerdo30/subtide-runpod:latest
```

Set your backend URL to your RunPod endpoint (a quick reachability check follows the list below):

  • Serverless: https://api.runpod.ai/v2/{ENDPOINT_ID}
  • Dedicated: https://{POD_ID}-5001.proxy.runpod.net
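To confirm a serverless endpoint is live before wiring up the extension, you can hit RunPod's per-endpoint health route; a minimal sketch, assuming `ENDPOINT_ID` and `RUNPOD_API_KEY` are exported in your shell:

```bash
# Query the endpoint's health route; the response summarizes queued jobs
# and idle/running workers for the endpoint.
curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/health" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}"
```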

To create a serverless endpoint:

  1. Go to RunPod Serverless
  2. Click New Endpoint
  3. Configure:
    • Name: subtide
    • Container Image: ghcr.io/rennerdo30/subtide-runpod:latest
    • GPU Type: RTX 3090, RTX 4090, or A100
    • Max Workers: 3-5 (adjust based on load)

Set these environment variables in the endpoint configuration:

```
WHISPER_MODEL=large-v3-turbo
WHISPER_BACKEND=faster
SERVER_API_KEY=sk-xxx
SERVER_API_URL=https://api.openai.com/v1
SERVER_MODEL=gpt-4o
```
Then configure the Subtide extension:

```
Operation Mode: Tier 3 or Tier 4
Backend URL: https://api.runpod.ai/v2/{ENDPOINT_ID}
RunPod API Key: {YOUR_RUNPOD_API_KEY}
```
Serverless characteristics:

  • Pay only when processing requests
  • Cold start: ~10-30 seconds
  • Ideal for sporadic usage
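For reference, a minimal sketch of driving the endpoint by hand with RunPod's standard /run and /status routes; the `audio_url` input field is an assumption, so match the payload to what the Subtide worker actually expects:

```bash
# Submit a job asynchronously, then poll until it completes.
# Assumes ENDPOINT_ID and RUNPOD_API_KEY are exported; requires jq.
JOB_ID=$(curl -s -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/run" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"audio_url": "https://example.com/clip.mp3"}}' | jq -r '.id')

while true; do
  STATUS=$(curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/status/${JOB_ID}" \
    -H "Authorization: Bearer ${RUNPOD_API_KEY}" | jq -r '.status')
  echo "status: ${STATUS}"
  [ "${STATUS}" = "COMPLETED" ] && break
  [ "${STATUS}" = "FAILED" ] && { echo "job failed" >&2; exit 1; }
  sleep 2
done
```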

To deploy a dedicated pod:

  1. Go to RunPod GPU Pods
  2. Click Deploy
  3. Select a GPU (RTX 3090 or better recommended)
  4. Use the Docker image:
    ghcr.io/rennerdo30/subtide-runpod-server:latest

Set these environment variables in the pod configuration:

```
WHISPER_MODEL=large-v3-turbo
WHISPER_BACKEND=faster
SERVER_API_KEY=sk-xxx
SERVER_API_URL=https://api.openai.com/v1
SERVER_MODEL=gpt-4o
PORT=5001
```

Enable HTTP port 5001 in pod settings.
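For reference, the same image can be exercised on any Docker host with an NVIDIA GPU; a minimal sketch mirroring the pod configuration above (requires the NVIDIA Container Toolkit for `--gpus all`):

```bash
# Run the server image with the same environment as the pod configuration.
docker run --gpus all -p 5001:5001 \
  -e WHISPER_MODEL=large-v3-turbo \
  -e WHISPER_BACKEND=faster \
  -e SERVER_API_KEY=sk-xxx \
  -e SERVER_API_URL=https://api.openai.com/v1 \
  -e SERVER_MODEL=gpt-4o \
  -e PORT=5001 \
  ghcr.io/rennerdo30/subtide-runpod-server:latest
```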

Then configure the Subtide extension:

```
Operation Mode: Tier 3 or Tier 4
Backend URL: https://{POD_ID}-5001.proxy.runpod.net
```
Dedicated pod characteristics:

  • Hourly billing while the pod is running
  • No cold start
  • Ideal for consistent usage
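Once the pod is up, a quick reachability check against the proxy URL; the `/health` path here is hypothetical, so substitute whichever route the Subtide server actually exposes:

```bash
# Hypothetical check; /health is a placeholder route, not confirmed by these docs.
curl -s "https://${POD_ID}-5001.proxy.runpod.net/health"
```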

| GPU | VRAM | Whisper Model | Cost |
| --- | --- | --- | --- |
| RTX 3090 | 24 GB | large-v3 | $$ |
| RTX 4090 | 24 GB | large-v3 | $$$ |
| A100 40GB | 40 GB | large-v3 | $$$$ |
| A100 80GB | 80 GB | large-v3 | $$$$$ |

Tip: RTX 3090 or RTX 4090 offer the best price/performance for Subtide.


Two example configurations:

Budget setup (serverless):

GPU: RTX 3090
WHISPER_MODEL=base
Max Workers: 2

  • Lower cost per request
  • Faster processing with smaller models
  • Good for light usage

Performance setup (dedicated):

GPU: RTX 4090
WHISPER_MODEL=large-v3-turbo
Always-on pod

  • No cold start
  • Best transcription quality
  • Good for heavy usage

RunPod requires authentication for serverless endpoints:

```bash
curl -X POST https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync \
  -H "Authorization: Bearer {RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {...}}'
```

The extension handles this automatically when you provide your RunPod API key in the settings.


For serverless endpoints, the RunPod dashboard lets you:

  • View request counts
  • Monitor worker scaling
  • Check error rates
  • See billing information

For dedicated pods, you can:

  • SSH into the pod for logs
  • View GPU utilization (see the sketch after this list)
  • Monitor memory usage
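A minimal sketch for the GPU checks, using standard nvidia-smi query flags from an SSH session inside the pod:

```bash
# Refresh GPU name, utilization, and memory figures every second.
watch -n 1 nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv
```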

Slow cold starts:

  • Increase minimum workers (serverless)
  • Use a dedicated pod for instant response
  • Pre-warm with periodic requests (see the sketch below)
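One way to pre-warm is a small script on a cron schedule; a sketch, assuming the worker tolerates a trivial job (the `warmup` input field is a placeholder, not a documented Subtide field):

```bash
#!/usr/bin/env bash
# prewarm.sh: poke the endpoint so at least one worker stays warm.
# Assumes ENDPOINT_ID and RUNPOD_API_KEY are set in the environment.
# Example crontab entry:  */5 * * * * /path/to/prewarm.sh
curl -s -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"warmup": true}}' > /dev/null
```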
Out-of-memory errors:

  • Use a smaller Whisper model
  • Reduce concurrent requests
  • Upgrade to a larger GPU
Connection failures:

  • Check pod/endpoint status
  • Verify the URL format
  • Ensure the port is exposed
Authentication errors:

  • Verify your RunPod API key
  • Check the endpoint ID
  • Ensure the key has the correct permissions

  1. Use serverless for variable load
  2. Scale workers based on actual usage
  3. Use smaller models when quality isn’t critical
  4. Stop pods when not in use (see the CLI sketch below)
  5. Monitor usage in RunPod dashboard
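For tip 4, pods can be stopped from the dashboard or from the CLI; a sketch using runpodctl (check your installed version's help output, as subcommand names may differ):

```bash
# Stop a running pod so hourly billing pauses; start it again when needed.
runpodctl stop pod {POD_ID}
runpodctl start pod {POD_ID}
```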