Classification Optimization
Sanity Check Charts:
- No Failed Requests: Verify that all requests were successful.
- Monotonic Series: Ensure that we tried enough VUs.
- Accuracy Distribution: Evaluate the consistency of accuracy across runs.
Cost Analysis Charts
- Best Image by Cost Savings: Identify which image is cheapest for each configuration and by how much.
- Cost vs Latency: Identify optimal configurations balancing cost and latency.
- Cost vs VUs & Batch: Analyze cost trends based on VUs and batch size.
Best Config
Metric | Value |
---|---|
1B_cost | $253.82 |
hw_type | nvidia-l4 |
image | default |
batch_size | 64 |
vus | 448 |
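For context, a figure like `1B_cost` can be derived from measured throughput and an hourly instance price. The sketch below is only an illustration: both the formula and the price are assumptions, not the exact inputs used to produce the table above.

```python
# Hedged sketch: estimate the cost of serving 1B requests from measured throughput.
# Both the formula and `hourly_price_usd` are illustrative assumptions, not the
# actual pricing inputs behind the numbers in this report.
def cost_per_billion_requests(throughput_req_per_sec: float, hourly_price_usd: float) -> float:
    hours_needed = 1_000_000_000 / throughput_req_per_sec / 3600  # hours to serve 1B requests
    return hours_needed * hourly_price_usd

# Example with made-up numbers: 500 req/s on a GPU billed at $1.00/hour.
print(f"${cost_per_billion_requests(500, 1.00):,.2f} per 1B requests")
```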
Failed Requests Check
If all requests were successful, the total count of failed requests should be 0.
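A minimal check, assuming the results live in a DataFrame with a `failed_requests` column (the column name and file path are assumptions):

```python
import pandas as pd

# One row per benchmark run; `failed_requests` is an assumed column name for the
# count of unsuccessful requests in that run, and the file path is hypothetical.
df = pd.read_csv("benchmark_results.csv")

total_failed = df["failed_requests"].sum()
print(f"Total failed requests across all runs: {total_failed}")  # expect 0
```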
Did we try enough VUs?
How do we know that we tried enough VUs? What if we had tried a higher number of VUs and throughput kept increasing? If that's the case, we would see a monotonically increasing relationship between VUs and throughput, and we would need to run more tests. Let's check this out!
We can check by:
- Grouping the data by `hw_type` and `batch_size` to match how we generated the experiments
- Sorting the data by `vus` within each group to ensure we get the data in the correct order for our check
- Checking for a monotonic increase in `throughput_req_per_sec` and flagging the groups whose throughput always increases as VUs increase
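A minimal pandas sketch of that check, assuming the results are loaded into a DataFrame with the column names used in this report:

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# Flag (hw_type, batch_size) groups whose throughput never stops increasing with VUs.
# Any group still increasing at the highest tested VU count suggests we should test higher.
suspicious_groups = []
for (hw, batch), group in df.groupby(["hw_type", "batch_size"]):
    ordered = group.sort_values("vus")
    if ordered["throughput_req_per_sec"].is_monotonic_increasing:
        suspicious_groups.append((hw, batch))

print("Groups that may need higher VU counts:", suspicious_groups)
```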
But how do we know? We can use the slider to check what would have happened if we had not tried past a certain VU count. Let's say we had stopped at 256 instead of our actual 1024; we would have left some potential on the table. We can simulate this by filtering our runs.
Verification
- Put the slider at `256` and see that there are a number of scenarios where we should have checked a higher VU count
- Put the slider at `1024` and verify that there are no scenarios shown
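The slider is interactive; a rough offline equivalent is to cap the VU range and rerun the same monotonicity check (same hypothetical DataFrame as above):

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

def still_increasing_at(runs: pd.DataFrame, max_vus: int) -> list:
    """Groups whose throughput is still monotonically increasing when we pretend
    we never tested beyond `max_vus`."""
    capped = runs[runs["vus"] <= max_vus]
    flagged = []
    for (hw, batch), group in capped.groupby(["hw_type", "batch_size"]):
        if group.sort_values("vus")["throughput_req_per_sec"].is_monotonic_increasing:
            flagged.append((hw, batch))
    return flagged

print(len(still_increasing_at(df, 256)))   # several scenarios expected
print(len(still_increasing_at(df, 1024)))  # should be 0
```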
Are we Accurate Enough?
We shouldn't expect to see significant changes in accuracy. We should see a fairly tight distribution, but there might be some deviation, since at lower VUs we won't get through as many of our 10_000 samples as we did at higher VUs.

Here we can see some deviation with a large z-score, but overall it isn't a big absolute deviation. These cases also occur when we have relatively low `total_requests`, which makes sense.

We should worry more if we see a major `absolute_deviation` with higher `total_requests`. We can see those values here:
hw_type | batch_size | vus | total_requests | accuracy_percentage | absolute_deviation | z_score |
---|---|---|---|---|---|---|
nvidia-l4 | 128 | 1024 | 10000 | 53.85 | 4.16 | -15.59 |
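The deviation columns above can be reproduced with a simple calculation; this sketch assumes the same hypothetical DataFrame and computes the z-score against the distribution over all runs, which may differ from the exact grouping used in the report.

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# Compare each run's accuracy against the distribution over all runs.
mean_acc = df["accuracy_percentage"].mean()
std_acc = df["accuracy_percentage"].std()
df["absolute_deviation"] = (df["accuracy_percentage"] - mean_acc).abs()
df["z_score"] = (df["accuracy_percentage"] - mean_acc) / std_acc

# Runs worth a closer look: noticeable absolute deviation despite a full request count.
# The thresholds here are illustrative, not the ones used to build the table above.
outliers = df[(df["absolute_deviation"] > 2) & (df["total_requests"] >= 10_000)]
print(outliers[["hw_type", "batch_size", "vus", "total_requests",
                "accuracy_percentage", "absolute_deviation", "z_score"]])
```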
Best Image by Cost Savings
Chart
- Color = best image for that `vus`/`batch_size`/GPU combination
- Size = % cost savings vs. the worst (most expensive) image in that group.
- Small dots don't mean much; large dots do.
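As a rough sketch of how the colors and bubble sizes might be derived (same hypothetical DataFrame, with `1B_cost` as the cost column):

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# For each (hw_type, vus, batch_size) group, find the cheapest image and the
# percentage it saves versus the most expensive image in that same group.
rows = []
for (hw, vus, batch), group in df.groupby(["hw_type", "vus", "batch_size"]):
    cheapest = group.loc[group["1B_cost"].idxmin()]
    worst_cost = group["1B_cost"].max()
    rows.append({
        "hw_type": hw, "vus": vus, "batch_size": batch,
        "best_image": cheapest["image"],  # drives the dot color
        "savings_pct": 100 * (worst_cost - cheapest["1B_cost"]) / worst_cost,  # dot size
    })

savings = pd.DataFrame(rows)
```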
Analysis
We can see that `trt-onnx` is quite a bit stronger on `nvidia-l4`. There are no significant red dots.
nvidia-l4
- `trt-onnx` (blue) dominates most points, indicating it's typically the cheaper choice
- At larger batch sizes (right side) and higher VUs (upper part of the chart), you often see big blue bubbles, suggesting `trt-onnx` can save a significant percentage versus `default`
- A few red points (i.e., `default` cheaper) appear at lower batch sizes, but they're less frequent and often show smaller savings differences
nvidia-t4
- There's more of a mix: some points favor `default` and others favor `trt-onnx`
- You can see some large red bubbles, meaning `default` can occasionally produce big savings under certain (VUs, batch_size) conditions
- However, `trt-onnx` is still cheaper in many scenarios, especially toward higher batch sizes
Takeaways
If you have the time/budget, it's better to analyze both images; you can see that they are close at times. But if you only have time/budget for one, at the current cost ratio consider the `nvidia-l4` in this case.
1B Requests Cost vs. Latency
This scatter plot visualizes the relationship between average latency (ms) and cost per billion requests per day for different hardware types (`hw_type`) and images (`image`).
How to Read the Chart:
- Point Symbols: Represent different hardware + image configurations.
- Color Gradient: Represents batch size, helping to see cost trends across different batch sizes.
- Hover Data: Displays additional details like VUs, batch size, and throughput per second.
Key Features:
- Global Minimum Cost (Red Star): Marks the configuration with the lowest cost.
- Pareto Front (Red Dashed Line + Points): Highlights the most efficient configurations, minimizing both cost and latency. These configurations offer the best trade-offs.
How to Use:
- Find the lowest-cost, low-latency configurations by looking at points near the bottom-left.
- Use the Pareto front to identify cost-effective configurations.
- Compare different hardware types and images to optimize your setup.
This visualization helps you select the configuration that best balances performance (low latency) and cost efficiency.
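For reference, the Pareto front itself is cheap to compute without a plotting library; this sketch assumes the same hypothetical DataFrame, with `avg_latency_ms` as an assumed name for the latency column:

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# A configuration is on the Pareto front if no other configuration has both
# lower latency and lower cost. Sweeping in order of increasing latency, a point
# is Pareto-optimal exactly when its cost beats every cost seen so far.
pareto_rows = []
for _, row in df.sort_values("avg_latency_ms").iterrows():
    if not pareto_rows or row["1B_cost"] < pareto_rows[-1]["1B_cost"]:
        pareto_rows.append(row)

pareto_front = pd.DataFrame(pareto_rows)
global_min = df.loc[df["1B_cost"].idxmin()]  # the red-star global minimum cost point
```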
Cost vs VUs and Batch Size Contour Plots
These contour plots visualize the cost per billion requests per day (`1B_cost`) as a function of VUs (Virtual Users) and batch size for different hardware configurations (`hw_type`) and image types (`image`).

There are real data points, but the values in between are interpolated.
How to Read the Charts:
- Color Gradient: Shows the cost levels, with darker colors representing higher costs and lighter colors representing lower costs.
- Contour Lines: Represent cost levels, helping identify cost-effective regions.
- White Dots: Represent real data points used to generate the interpolated surface.
- Red Stars: Highlight the lowest cost point in the dataset.
- Small Red Dots: Indicate the lowest cost for each batch size.
- Tightly Clustered Contour Lines: Indicate that cost changes rapidly with small adjustments to batch size or VUs.
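The in-between values can be filled in with something like `scipy.interpolate.griddata`; the sketch below shows the general approach and is not necessarily the exact method behind these charts:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# One surface per (hw_type, image) pair; take nvidia-l4 + default as an example.
sub = df[(df["hw_type"] == "nvidia-l4") & (df["image"] == "default")]

vus_grid, batch_grid = np.meshgrid(
    np.linspace(sub["vus"].min(), sub["vus"].max(), 100),
    np.linspace(sub["batch_size"].min(), sub["batch_size"].max(), 100),
)

# Interpolate cost between the real (white-dot) measurements to get a smooth surface.
cost_grid = griddata(
    points=sub[["vus", "batch_size"]].to_numpy(),
    values=sub["1B_cost"].to_numpy(),
    xi=(vus_grid, batch_grid),
    method="cubic",
)
```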
How to Use:
- Identify the lowest cost configurations (red stars and dots).
- Observe how cost changes with batch size and VUs to optimize your setup.
- Compare different hardware types (`hw_type`) and images (`image`) to find the best-performing configuration.
Analysis
Overall we can see that `nvidia-t4`s are more expensive for this cost ratio and task. We should consider using the `nvidia-l4`.
GPU | Image | Batch Size | VUs | Min Cost |
---|---|---|---|---|
nvidia-t4 | trt-onnx | 512 | 48 | $611.07 |
nvidia-t4 | default | 32 | 32 | $622.81 |
nvidia-l4 | trt-onnx | 64 | 448 | $255.07 |
nvidia-l4 | default | 64 | 448 | $253.82 |
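A summary like the table above can be pulled out of the results with a single groupby (same hypothetical DataFrame as before):

```python
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # hypothetical results file, as above

# Lowest-cost run for every (hw_type, image) pair, with the settings that achieved it.
idx = df.groupby(["hw_type", "image"])["1B_cost"].idxmin()
summary = df.loc[idx, ["hw_type", "image", "batch_size", "vus", "1B_cost"]]
print(summary.sort_values("1B_cost"))
```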
We can see a clear winner with `nvidia-l4` over `nvidia-t4` at this cost ratio. But surprisingly we see `default` slightly outperform `trt-onnx`.

I think we should be careful not to overfit. These numbers can vary per run, but it's good to know that each image can be competitive.
nvidia-t4
- Here we can see that `trt-onnx` and `default` both perform similarly, but with `trt-onnx` having a slight edge. `trt-onnx` has a lower overall cost band (611–659) than `default` (623–676)
nvidia-l4
- `trt-onnx` has a broad area of relatively low cost and hits a very low floor (~255)
  - This is great since it shows that we get consistently good results!
- `default` can also dip into the mid-200s in certain spots, but it has bigger, more expensive areas, especially at lower VUs and batch sizes
  - This means we need to spend time optimizing it
Conclusion
If I have time, I might analyze the `nvidia-l4` with `trt-onnx` across some different runs. Despite being $1.25 more expensive per 1B requests, it's a safer, more consistent bet IMO.