# RunPod Deployment

Deploy Subtide on RunPod for GPU-accelerated transcription.
## Overview

RunPod provides GPU instances for running Subtide with hardware-accelerated Whisper transcription. Two deployment modes are available:
| Mode | Best For | Cost Model |
|---|---|---|
| Serverless | Variable load, pay-per-use | Per-second billing |
| Dedicated | Consistent load, always-on | Hourly billing |
## Quick Start

### Docker Image

```bash
docker pull ghcr.io/rennerdo30/subtide-runpod:latest
```

### Configure Extension

Set your backend URL to your RunPod endpoint:

- Serverless: `https://api.runpod.ai/v2/{ENDPOINT_ID}`
- Dedicated: `https://{POD_ID}-5001.proxy.runpod.net`
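If you have an NVIDIA GPU and the NVIDIA Container Toolkit installed, you can smoke-test the server image locally before paying for a pod. This is a minimal sketch: serverless images wrap a RunPod handler loop rather than an HTTP server, so it uses the dedicated `-server` image, the env values are placeholders from the sections below, and the `/health` probe path is an assumption rather than a documented route:

```bash
# Run the dedicated-server image locally (placeholder env values)
docker run --rm --gpus all -p 5001:5001 \
  -e WHISPER_MODEL=base \
  -e WHISPER_BACKEND=faster \
  -e PORT=5001 \
  ghcr.io/rennerdo30/subtide-runpod-server:latest

# In another shell: confirm the server answers (probe path is an assumption)
curl http://localhost:5001/health
```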
## Serverless Deployment

### 1. Create Serverless Endpoint

- Go to RunPod Serverless
- Click New Endpoint
- Configure:
  - Name: `subtide`
  - Container Image: `ghcr.io/rennerdo30/subtide-runpod:latest`
  - GPU Type: RTX 3090, RTX 4090, or A100
  - Max Workers: 3-5 (adjust based on load)
### 2. Environment Variables

Set these in the endpoint configuration:

```bash
WHISPER_MODEL=large-v3-turbo
WHISPER_BACKEND=faster
SERVER_API_KEY=sk-xxx
SERVER_API_URL=https://api.openai.com/v1
SERVER_MODEL=gpt-4o
```

### 3. Configure Extension

```
Operation Mode: Tier 3 or Tier 4
Backend URL: https://api.runpod.ai/v2/{ENDPOINT_ID}
RunPod API Key: {YOUR_RUNPOD_API_KEY}
```

### Serverless Pricing
Section titled “Serverless Pricing”- Pay only when processing requests
- Cold start: ~10-30 seconds
- Ideal for sporadic usage
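Before pointing the extension at the endpoint, you can confirm it is live with RunPod's serverless `/health` route, which reports worker and job counts:

```bash
# Returns worker/job status for the endpoint; a 401 here means a bad API key
curl -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  "https://api.runpod.ai/v2/${ENDPOINT_ID}/health"
```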
## Dedicated Pod Deployment

### 1. Create GPU Pod

- Go to RunPod GPU Pods
- Click Deploy
- Select a GPU (RTX 3090 or better recommended)
- Use the Docker image: `ghcr.io/rennerdo30/subtide-runpod-server:latest`
### 2. Environment Variables

Set in pod configuration:

```bash
WHISPER_MODEL=large-v3-turbo
WHISPER_BACKEND=faster
SERVER_API_KEY=sk-xxx
SERVER_API_URL=https://api.openai.com/v1
SERVER_MODEL=gpt-4o
PORT=5001
```

### 3. Expose Port

Enable HTTP port 5001 in pod settings.
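Once the port is exposed, a quick request through the proxy confirms routing works. The `/health` path here is an assumption; substitute whatever status route the Subtide server actually exposes:

```bash
# {POD_ID} is shown in the RunPod pod dashboard
curl https://{POD_ID}-5001.proxy.runpod.net/health
```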
### 4. Configure Extension

```
Operation Mode: Tier 3 or Tier 4
Backend URL: https://{POD_ID}-5001.proxy.runpod.net
```

### Dedicated Pricing
Section titled “Dedicated Pricing”- Hourly billing while pod is running
- No cold start
- Ideal for consistent usage
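Which mode is cheaper comes down to duty cycle. A rough break-even sketch with deliberately made-up prices (check the RunPod pricing page for real figures):

```bash
# Hypothetical rates: serverless $0.00044 per active second, dedicated $0.44/hour.
# Dedicated wins once busy time per hour exceeds ded / (sls * 60) minutes.
awk 'BEGIN { sls = 0.00044; ded = 0.44;
  printf "break-even: ~%.0f busy minutes per hour\n", ded / (sls * 60) }'
# -> break-even: ~17 busy minutes per hour (with these made-up rates)
```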
## GPU Selection

| GPU | VRAM | Whisper Model | Cost |
|---|---|---|---|
| RTX 3090 | 24 GB | large-v3 | $$ |
| RTX 4090 | 24 GB | large-v3 | $$$ |
| A100 40GB | 40 GB | large-v3 | $$$$ |
| A100 80GB | 80 GB | large-v3 | $$$$$ |
> **Recommendation:** An RTX 3090 or RTX 4090 offers the best price/performance for Subtide.
## Configuration Examples

### Cost-Optimized (Serverless)

```
GPU: RTX 3090
WHISPER_MODEL=base
Max Workers: 2
```

- Lower cost per request
- Faster processing with smaller models
- Good for light usage
### Quality-Optimized (Dedicated)

```
GPU: RTX 4090
WHISPER_MODEL=large-v3-turbo
Always-on pod
```

- No cold start
- Best transcription quality
- Good for heavy usage
## API Authentication

RunPod requires authentication for serverless endpoints:

### Header Authentication

```bash
curl -X POST https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync \
  -H "Authorization: Bearer {RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {...}}'
```

### Extension Configuration

The extension handles this automatically when you provide your RunPod API key in the settings.
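For long transcriptions, RunPod also offers the asynchronous `/run` route: queue a job, then poll `/status/{JOB_ID}`. A minimal sketch using `jq` (the `input` payload is a hypothetical placeholder, not Subtide's actual schema):

```bash
# Queue a job asynchronously (payload fields are hypothetical)
JOB_ID=$(curl -s -X POST "https://api.runpod.ai/v2/${ENDPOINT_ID}/run" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"audio_url": "https://example.com/sample.mp3"}}' | jq -r '.id')

# Poll until the job finishes (statuses: IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED)
while :; do
  STATUS=$(curl -s -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
    "https://api.runpod.ai/v2/${ENDPOINT_ID}/status/${JOB_ID}" | jq -r '.status')
  echo "status: ${STATUS}"
  case "${STATUS}" in COMPLETED|FAILED) break ;; esac
  sleep 5
done
```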
## Monitoring

### Serverless Dashboard

- View request counts
- Monitor worker scaling
- Check error rates
- See billing information
### Pod Monitoring

- SSH into pod for logs
- View GPU utilization
- Monitor memory usage
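From an SSH session inside the pod, standard tools cover most of this (a sketch; `htop` may need installing depending on the base image):

```bash
# GPU utilization and VRAM usage, refreshed every second
watch -n 1 nvidia-smi

# Memory pressure
free -h

# Container stdout is also surfaced in the RunPod web console's log view
```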
## Troubleshooting

### Cold Start Too Slow

- Increase minimum workers (serverless)
- Use dedicated pod for instant response
- Pre-warm with periodic requests (see the cron sketch below)
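A hypothetical pre-warm cron job, sending a tiny `/runsync` request every five minutes so at least one worker stays hot. The `warmup` payload is a made-up placeholder; whether a no-op input keeps a worker warm depends on how the handler treats it, so a small real job is more reliable. Use literal values for the endpoint ID and key, since cron does not expand shell variables:

```bash
# crontab -e  (ENDPOINT_ID and RUNPOD_API_KEY must be literal values here)
*/5 * * * * curl -s -X POST "https://api.runpod.ai/v2/ENDPOINT_ID/runsync" -H "Authorization: Bearer RUNPOD_API_KEY" -H "Content-Type: application/json" -d '{"input": {"warmup": true}}' > /dev/null
```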
### GPU Out of Memory

- Use a smaller Whisper model (see below)
- Reduce concurrent requests
- Upgrade to larger GPU
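For example, step the model down in the pod or endpoint environment; each step down the standard Whisper family roughly halves VRAM requirements:

```bash
# large-v3 -> medium -> small -> base, in decreasing VRAM order
WHISPER_MODEL=medium
```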
### Connection Timeout

- Check pod/endpoint status
- Verify URL format
- Ensure port is exposed
### 401 Unauthorized

- Verify your RunPod API key (see the probe below)
- Check endpoint ID
- Ensure key has correct permissions
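The `/health` probe from the serverless section doubles as an auth check; running it with `-i` helps show whether the key or the endpoint ID is at fault (response-code behavior is approximate):

```bash
# 401 -> bad or missing API key; other 4xx with a valid key usually means
# the endpoint ID is wrong
curl -i -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  "https://api.runpod.ai/v2/${ENDPOINT_ID}/health"
```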
## Cost Optimization

- Use serverless for variable load
- Scale workers based on actual usage
- Use smaller models when quality isn’t critical
- Stop pods when not in use (scriptable; see below)
- Monitor usage in RunPod dashboard
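If you use RunPod's `runpodctl` CLI, stopping idle pods is scriptable (a sketch; check `runpodctl --help` for current syntax):

```bash
# Stop a pod when idle; GPU billing stops, though volume storage may still accrue
runpodctl stop pod ${POD_ID}

# Resume it later
runpodctl start pod ${POD_ID}
```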
## Next Steps

- Docker Deployment - Local Docker deployment
- Local LLM Setup - Run everything locally
- API Reference - API documentation