Cost vs VUs and Batch Size Contour Plots

These contour plots visualize the cost per billion tokens per day (1B_cost) as a function of VUs (Virtual Users) and Batch Size for different hardware configurations (hw_type) and image types (image). The surfaces are interpolated: the white dots are real measurements, and the values in between are estimates.

How to Read the Charts:

  • Color Gradient: Shows the cost levels, with darker colors representing higher costs and lighter colors representing lower costs.
  • Contour Lines: Represent cost levels, helping identify cost-effective regions.
  • White Dots: Represent real data points used to generate the interpolated surface.
  • Red Stars: Highlight the lowest cost point in the dataset.
  • Small Red Dots: Indicate the lowest cost for each batch size.
  • Tight Clusters of Contour Lines: Indicate that cost changes rapidly with small adjustments to batch size or VUs.
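
As a rough sketch, a surface like these could be built with scipy.interpolate.griddata and matplotlib. The column names and the toy cost model below are assumptions for illustration, not the real benchmark data or plotting code:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# Fake benchmark results: (VUs, batch size) -> cost per 1B tokens/day.
# The cost model is a made-up placeholder.
rng = np.random.default_rng(0)
vus = rng.choice([32, 64, 128, 256, 448], size=30)
batch = rng.choice([16, 32, 64, 128, 512], size=30)
cost = 250 + 400 / batch + 0.5 * np.abs(vus - 256)

# Interpolate the scattered real points onto a regular grid
grid_vus, grid_batch = np.meshgrid(
    np.linspace(vus.min(), vus.max(), 100),
    np.linspace(batch.min(), batch.max(), 100),
)
grid_cost = griddata((vus, batch), cost, (grid_vus, grid_batch), method="linear")

fig, ax = plt.subplots()
# Reversed colormap so darker = more expensive, as in the charts
cf = ax.contourf(grid_vus, grid_batch, grid_cost, levels=15, cmap="viridis_r")
ax.contour(grid_vus, grid_batch, grid_cost, levels=15, colors="k", linewidths=0.3)
ax.scatter(vus, batch, c="white", s=12, edgecolors="k")       # real data points
i = cost.argmin()
ax.scatter(vus[i], batch[i], marker="*", c="red", s=200)      # lowest-cost point
fig.colorbar(cf, label="cost per 1B tokens/day ($)")
ax.set_xlabel("VUs")
ax.set_ylabel("Batch size")
fig.savefig("contour.png")
```

Linear interpolation only fills the convex hull of the measured points, so regions far from any white dot are blank rather than extrapolated.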

How to Use:

  • Identify the lowest cost configurations (red stars and dots).
  • Observe how cost changes with batch size and VUs to optimize your setup.
  • Compare different hardware types (hw_type) and image processing strategies (image) to find the best-performing configuration.
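
The red-star and red-dot extraction above can be sketched with a pandas groupby. The column names are assumptions about the underlying data; the rows reuse the minima from the table in the Analysis section:

```python
import pandas as pd

df = pd.DataFrame({
    "hw_type":    ["nvidia-t4", "nvidia-t4", "nvidia-l4", "nvidia-l4"],
    "image":      ["trt-onnx", "default", "trt-onnx", "default"],
    "batch_size": [512, 32, 64, 64],
    "vus":        [48, 32, 448, 448],
    "cost_1b":    [611.07, 622.81, 255.07, 253.82],
})

# Lowest-cost row per (hw_type, image): the red stars
stars = df.loc[df.groupby(["hw_type", "image"])["cost_1b"].idxmin()]

# Cheapest configuration per batch size: the small red dots
dots = df.loc[df.groupby("batch_size")["cost_1b"].idxmin()]

print(stars[["hw_type", "image", "batch_size", "vus", "cost_1b"]])
```

On the full benchmark data each group would contain many rows; here each row is already a per-group minimum, so `stars` simply recovers the table.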

Analysis

Overall, we can see that nvidia-t4s are more expensive for this cost ratio and task. We should consider using the nvidia-l4.

GPU        Image     Batch Size  VUs  Min Cost
nvidia-t4  trt-onnx  512          48  $611.07
nvidia-t4  default    32          32  $622.81
nvidia-l4  trt-onnx   64         448  $255.07
nvidia-l4  default    64         448  $253.82

We can see a clear winner with nvidia-l4 over nvidia-t4 at this cost ratio. Surprisingly, though, default slightly outperforms trt-onnx. We should be careful not to overfit to a single run: these numbers can vary, but it's good to know that each image can be competitive.

nvidia-t4

  • Here we can see that trt-onnx and default perform similarly, with trt-onnx having a slight edge.
  • trt-onnx has a lower overall cost band ($611–$659) than default ($623–$676).

nvidia-l4

  • trt-onnx has a broad area of relatively low cost and hits a very low floor (~255)
    • This is great since it shows that we get consistently good results!
  • default can also dip into the mid-$200s in certain spots, but it has larger, more expensive regions, especially at lower VUs and batch sizes.
    • This means it needs more careful tuning to reach its best configurations.

Conclusion

If I have time, I might analyze nvidia-l4 with trt-onnx across a few more runs. Despite being $1.25 more expensive per 1B tokens, it's a safer, more consistent bet IMO.
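
That cross-run check could be as simple as comparing the mean and spread of each run's minimum cost per image. The run values below are fabricated placeholders (only the first entry of each list comes from the table above):

```python
import statistics

# Minimum cost per run for each image; all values after the first are made up
runs = {
    "trt-onnx": [255.07, 257.4, 254.9],
    "default":  [253.82, 261.1, 250.3],
}

for image, mins in runs.items():
    mean = statistics.mean(mins)
    spread = statistics.stdev(mins)
    print(f"{image}: mean ${mean:.2f}, stdev ${spread:.2f}")
```

A lower standard deviation for trt-onnx would support the "safer, more consistent" claim even if its mean is slightly higher.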