Fluid Compute and Streaming Costs
Fluid Compute charges for Active CPU time, not invocation count. I/O-bound functions get cheaper, but CPU-bound streaming workloads can cost more than they did under traditional per-invocation pricing.
The Edge Case: Pricing Model Shift Breaks Budget Expectations
Vercel's traditional serverless pricing was simple: $0.60 per million invocations on Pro plans. Run a function 10 million times per day and you pay about $180 per month. Predictable, easy to calculate, and straightforward to budget. Fluid Compute changed this model in early 2026: now you're charged for CPU time consumed, not invocation count. For I/O-bound functions waiting on databases or APIs, costs drop dramatically. For CPU-bound functions processing images or running LLM inference, costs can spike unexpectedly.
The confusion comes from billing granularity. Fluid Compute charges for "active CPU" measured in milliseconds, plus provisioned memory. Idle time waiting on network I/O doesn't cost anything—but if your function spins CPU while waiting (retry loops, polling, busy waiting), you're paying for compute that isn't doing useful work. Worse, streaming responses for AI, video, or large file uploads keep CPU active longer than you expect, driving costs higher than per-invocation pricing.
What Is Fluid Compute?
Fluid Compute is Vercel's new execution model that takes the best of servers and serverless. Traditional serverless scales to zero immediately—you pay per invocation but accept cold start latency. Servers stay running—you pay 24/7 but get zero latency and predictable performance. Fluid sits between: it keeps function instances warm longer and reuses them across invocations, reducing cold starts while still scaling down during idle periods.
Fluid introduces two billing changes:
- Active CPU pricing: You're charged only when CPU is actually doing work, not while the function is idle waiting for I/O.
- Provisioned memory: You're charged for memory allocation per second, regardless of whether that memory is used.
This model is cheaper for I/O-bound workloads (database queries, API calls, file uploads) because most time is spent waiting on network responses, not using CPU. It's more expensive for CPU-bound workloads (image processing, video transcoding, LLM inference) because you're paying for every millisecond of CPU usage.
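The trade-off can be sketched as a back-of-envelope calculator. The rates below are the illustrative figures used in this article's worked examples, not official Vercel pricing:

```typescript
// Rough cost comparison between per-invocation and Active CPU billing.
// All rates are illustrative figures, not official Vercel pricing.
interface Workload {
  invocations: number;  // total requests
  activeCpuMs: number;  // active CPU per invocation (ms)
  durationMs: number;   // wall-clock duration per invocation (ms)
  memoryGb: number;     // provisioned memory (GB)
}

const INVOCATION_RATE = 0.60 / 1_000_000;  // $ per invocation
const CPU_RATE = 0.00025 / 1000;           // $ per active-CPU millisecond
const MEMORY_RATE = 0.000014;              // $ per GB-second of provisioned memory

export function traditionalCost(w: Workload): number {
  return w.invocations * INVOCATION_RATE;
}

export function fluidCost(w: Workload): number {
  const cpu = w.invocations * w.activeCpuMs * CPU_RATE;
  const memory = w.invocations * (w.durationMs / 1000) * w.memoryGb * MEMORY_RATE;
  return cpu + memory;
}
```

Plugging in the LLM example below (10,000 invocations at 500ms of active CPU each) gives $0.006 under per-invocation pricing versus $1.25 of CPU cost under Fluid, before memory; an I/O-bound API with 5ms of CPU per 400ms request flips the comparison the other way.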
The Streaming Cost Trap
Streaming responses improve user experience—send data as it's available instead of buffering the full response. But streaming extends function duration because the function stays alive until the stream completes. With Fluid Compute's Active CPU pricing, this matters: if your function generates streaming data (AI responses, video chunks, database cursors), CPU stays active longer, increasing costs.
Consider an LLM inference endpoint:
```
// Traditional serverless pricing
// Invocations: 10,000 requests
// Duration: 500ms average
// Cost: 10,000 × $0.60 / 1,000,000 = $0.006

// Fluid Compute pricing
// CPU time: 500ms per request (5,000,000ms total)
// Memory: 512MB provisioned
// Cost: (5,000,000ms × $0.00025/1000ms) + memory = $1.25 + memory
```

The same workload costs roughly 200x more under Fluid Compute, not because Vercel is more expensive, but because you're now paying for actual CPU usage instead of per-invocation minimums. For I/O-bound endpoints waiting on databases, Fluid Compute is cheaper. For CPU-bound endpoints doing actual work, it's more expensive.
Streaming makes this worse for CPU-bound generation. If your endpoint spends 30 seconds generating and streaming a response on-CPU, you're paying for 30 seconds of active CPU instead of 500ms; multiply by 10,000 requests and you've got a $75 bill instead of $0.006. If the function is merely proxying tokens from an upstream model API, most of those 30 seconds are idle CPU, and what accrues for the full duration is provisioned memory.
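One way to keep streaming cheap is to make sure the function proxies chunks rather than doing per-chunk CPU work. A minimal sketch using the web streams API (the handler shape is generic, not a specific Vercel signature):

```typescript
// Pass an upstream stream through to the client untouched.
// While awaiting each chunk, the CPU is idle, so under Active CPU pricing
// the stream's duration mostly accrues provisioned-memory cost, not CPU cost.
export function streamPassthrough(source: ReadableStream<Uint8Array>): Response {
  return new Response(source, {
    headers: { "Content-Type": "text/event-stream" },
  });
}

// By contrast, buffering the whole stream (await response.text()) or
// transforming every chunk keeps both memory and CPU engaged.
```

This relies on `ReadableStream` and `Response` being available as globals (Node 18+ or an edge runtime).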
Active CPU vs Idle Time: What Actually Costs
Fluid Compute's Active CPU pricing is deceptively simple: you pay for CPU time when it's doing work. But "doing work" includes more than you think:
- Processing: JSON parsing, data transformation, business logic.
- Garbage collection: Node.js runtime pauses to clean up memory.
- Event loop processing: Handling async callbacks and promises.
- Busy waiting: Polling loops, retry mechanisms, spinlocks.
Idle time—when CPU is genuinely waiting—doesn't cost anything. But if your code has busy waiting or retry loops that spin CPU while waiting for responses, you're paying for wasted compute.
```
// BAD: Tight retry loop wastes CPU
async function waitForResponse(url) {
  while (true) {
    try {
      const response = await fetch(url);
      if (response.ok) return response;
    } catch (e) {
      // No backoff: failed requests retry immediately,
      // keeping CPU busy - costs money with Fluid Compute
    }
  }
}

// GOOD: Exponential backoff - CPU idle during waits
async function waitForResponse(url, retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      const response = await fetch(url);
      if (response.ok) return response;
    } catch (e) {
      if (i === retries - 1) throw e;
      // CPU idle here - no cost with Fluid Compute
      await new Promise(resolve => setTimeout(resolve, 100 * Math.pow(2, i)));
    }
  }
}
```

The exponential backoff pattern adds delay without burning CPU. With per-invocation pricing, both patterns cost the same: you pay per invocation regardless of CPU usage. With Fluid Compute's Active CPU pricing, the tight retry loop can cost 10-100x more because CPU churns through immediate retries.
Memory Allocation: The Silent Cost Driver
Fluid Compute charges for provisioned memory per second, regardless of whether you use it. Configure 1GB of memory and use only 100MB? You pay for 1GB. Configure 512MB and your function spikes to 600MB? You hit memory limits and the function crashes.
Memory costs add up quickly:
```
// Memory pricing (simplified)
// 512MB: $0.000007 per second
// 1GB:   $0.000014 per second
// 2GB:   $0.000028 per second

// Function running 30 seconds
// 512MB: $0.00021
// 1GB:   $0.00042
// 2GB:   $0.00084

// 10,000 invocations
// 512MB: $2.10
// 1GB:   $4.20
// 2GB:   $8.40
```

For functions with predictable memory usage, provision exactly what you need. For functions with spiky memory needs (image processing, video transcoding), test different memory allocations and find the minimum that avoids crashes. Don't default to 2GB for every function; you're paying for memory you don't use.
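Per-function memory can be tuned in vercel.json instead of accepting one default everywhere. A sketch (the file paths are placeholders; `memory` is in MB):

```json
{
  "functions": {
    "api/**/*.ts": {
      "memory": 512
    },
    "api/transcode.ts": {
      "memory": 2048
    }
  }
}
```

This keeps the cheap default for most routes and reserves the large allocation for the one function that actually needs it.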
When Fluid Compute Saves Money
Fluid Compute is cheaper for I/O-bound workloads where most time is spent waiting on network responses. Consider these scenarios:
Database-Heavy APIs
REST APIs that query databases, process results, and return JSON spend 90% of time waiting for PostgreSQL or MySQL. CPU usage is minimal—just deserializing JSON and formatting responses. With Fluid Compute, you're only charged for the 10% when CPU is actually working.
```
// api/users.ts - I/O-bound
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export default async function handler(req, res) {
  // ~200ms waiting on the database (idle CPU - no cost)
  const users = await prisma.user.findMany();

  // ~5ms processing (active CPU - cost)
  const formatted = users.map(user => ({
    id: user.id,
    name: user.name,
  }));

  // ~200ms waiting on the database again (idle CPU - no cost)
  const posts = await prisma.post.findMany();

  return res.json({ users: formatted, posts });
}
```

This function takes roughly 400ms of wall-clock time but uses CPU for only a few milliseconds. With per-invocation pricing, you pay the full $0.60 per million invocations. With Fluid Compute, you pay for milliseconds of CPU time, which is dramatically cheaper.
Webhook Processors
Webhook handlers fetch external APIs, process data, and return. Most time is waiting on HTTP responses. CPU usage is minimal—parsing JSON and making conditional decisions. Fluid Compute reduces costs for webhook-heavy applications by charging only for processing time, not wait time.
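A typical shape, with the HTTP client injected so the I/O boundary is explicit (the endpoint, payload fields, and function names are made up for illustration):

```typescript
// Hypothetical webhook processor: nearly all wall-clock time is the awaited
// external call (idle CPU); the decision logic is a sliver of active CPU.
interface OrderStatus {
  status: string;
}

export async function handleWebhook(
  payload: { orderId: string },
  fetchOrder: (orderId: string) => Promise<OrderStatus>,
): Promise<{ ok: boolean }> {
  // Idle CPU: waiting on the external API costs nothing under Active CPU pricing
  const order = await fetchOrder(payload.orderId);

  // Active CPU: one comparison and one object allocation
  return { ok: order.status === "paid" };
}
```

Injecting `fetchOrder` also makes the handler trivial to test without hitting the external API.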
File Uploads
Streaming file uploads to Vercel Blob or S3 takes time, but most of that time is network I/O. CPU is idle while data streams. With Fluid Compute, you're only charged for the processing time after the upload completes (thumbnail generation, metadata extraction), not the upload duration itself.
When Fluid Compute Costs More
CPU-bound workloads pay more under Fluid Compute because you're now paying for every millisecond of CPU usage. These scenarios get expensive:
LLM Inference
Running a model on the function itself (for example, a small Llama variant) requires sustained CPU throughout the request, and streaming extends this further because the function stays busy while generating tokens. With Fluid Compute, you pay for every millisecond of that on-instance inference time, not the invocation itself. Calling a hosted model API (GPT-4, Claude) behaves differently: the function mostly idles waiting on tokens, so the dominant cost is provisioned memory over the stream's duration.
Mitigation strategy: Use Vercel AI SDK with streaming, but implement request queuing and batching. Process multiple LLM requests on a single function instance instead of one per invocation. This amortizes initialization costs and reduces idle time between requests.
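The batching idea can be sketched as a tiny in-memory micro-batcher: requests arriving within a short window are grouped into one batched call. All names here are hypothetical, and a production version would need bounds on batch size and queue depth:

```typescript
// Micro-batcher sketch: amortize per-request overhead by grouping calls
// that arrive within `windowMs` into a single batched `run` invocation.
export function createBatcher<In, Out>(
  run: (inputs: In[]) => Promise<Out[]>,
  windowMs = 25,
) {
  let pending: { input: In; resolve: (v: Out) => void; reject: (e: unknown) => void }[] = [];
  let timer: ReturnType<typeof setTimeout> | null = null;

  async function flush() {
    const batch = pending;
    pending = [];
    timer = null;
    try {
      // One batched call instead of one call per request
      const outputs = await run(batch.map((p) => p.input));
      batch.forEach((p, i) => p.resolve(outputs[i]));
    } catch (e) {
      batch.forEach((p) => p.reject(e));
    }
  }

  return function enqueue(input: In): Promise<Out> {
    return new Promise<Out>((resolve, reject) => {
      pending.push({ input, resolve, reject });
      if (!timer) timer = setTimeout(flush, windowMs);
    });
  };
}
```

Each caller still gets its own promise, but the expensive backend call runs once per window rather than once per request.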
Image Processing
Sharp, Jimp, and other image libraries use CPU heavily for resizing, compression, and format conversion. A single image resize might take 200ms of sustained CPU usage. With Fluid Compute, you pay for that 200ms. With per-invocation pricing, you pay a flat $0.60 per million invocations regardless of CPU time.
Mitigation strategy: Offload image processing to a dedicated service (Cloudflare Images, Imgix) or use Vercel's Image Optimization API. These services have pre-optimized infrastructure and pricing designed for image processing workloads.
Video Transcoding
FFmpeg-based video transcoding is CPU-intensive and slow. Transcoding a 1-minute video might take 30 seconds of sustained CPU usage. With Fluid Compute, you're paying for 30 seconds of active CPU per invocation. Multiply by 100 videos and the cost adds up quickly.
Mitigation strategy: Use dedicated transcoding services (Mux, AWS Elemental) or move this workload to a long-running server where you pay fixed hourly rates instead of per-millisecond compute.
Monitoring Costs in Real-Time
Fluid Compute pricing makes cost monitoring critical. Use Vercel's usage dashboard to track CPU time and memory allocation by function, and set up spend alerts for unexpected spikes: a sudden increase in CPU time can indicate a performance regression or an inefficient algorithm. The policy worth encoding looks something like this (an illustrative sketch only; alert thresholds are configured in Vercel's dashboard under Spend Management, not via a supported vercel.json schema):

```
{
  "usageAlerts": {
    "cpuTimeThreshold": {
      "value": 1000000, // 1 second of CPU time
      "window": "1h",   // per hour
      "notify": ["email:team@example.com", "slack:#alerts"]
    },
    "memoryThreshold": {
      "value": 2000000000, // 2GB memory
      "window": "1h",
      "notify": ["email:team@example.com", "slack:#alerts"]
    }
  }
}
```

Review your costs weekly and identify the functions driving spend. For expensive functions, consider:
- Migrating to Edge Functions (if workload fits)
- Optimizing algorithms to reduce CPU time
- Moving to long-running servers for constant heavy workloads
- Using specialized services (image optimization, transcoding)
The Bottom Line
Fluid Compute changes Vercel from a per-invocation pricing model to a pay-for-what-you-use model. For I/O-bound workloads, this saves money. For CPU-bound workloads, this increases costs. The key is understanding which category your functions fall into and optimizing accordingly. Don't enable Fluid Compute blindly—profile your functions, measure actual CPU usage, and compare costs before migrating. For mixed workloads, use different runtimes selectively: Edge Functions for lightweight I/O, Fluid Compute for database APIs, and traditional servers for constant CPU-heavy processing.