Customer Success Story

12 Production ML Models on Auto-Scaling Infrastructure

Masraff automates expense entry and accounting workflows using computer vision and NLP models. Their AI pipeline processes receipt images, extracts transaction data, and classifies expenses, replacing manual data entry for corporate finance teams.

Before Datazone, their AWS bill scaled linearly with user traffic. Idle compute sat running 24/7. Storage costs grew unchecked. No autoscaling meant over-provisioning for peak load.

  • 12 ML models in production
  • 4x cloud cost cut
  • 10x storage cost cut
  • Auto-scaling enabled: compute scales with user requests

The Problem

No Auto-Scaling

Inference clusters ran 24/7 at fixed capacity. Peak-hour provisioning meant paying for idle compute during off-hours.

Storage Bloat

Training data, model artifacts, and logs accumulated without lifecycle policies. Storage costs grew faster than user growth.

Linear Cost Scaling

Every 10% increase in users meant 10% higher AWS spend. No cost leverage as the product scaled.

What Datazone Did

1. Training Data in Object Storage

Receipt images and labeled datasets are stored in S3-compatible object storage with automatic lifecycle rules. Cold data is archived after 90 days.

S3-compatible storage · Lifecycle policies · 10x storage savings
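
A 90-day archive rule like the one described above could be sketched as follows. This is a minimal illustration, not Datazone's actual configuration: the `receipts/` prefix, the `GLACIER` storage class, and both function names are assumptions. The dictionary follows the shape the S3 lifecycle API accepts (for example via boto3's `put_bucket_lifecycle_configuration`), and the helper mirrors the rule locally so the cutoff can be checked without a bucket.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER_DAYS = 90  # from the case study: cold data is archived after 90 days


def build_lifecycle_config(prefix: str, storage_class: str = "GLACIER") -> dict:
    """Build a lifecycle configuration in the shape the S3 API expects.

    The prefix and storage class are illustrative assumptions; an
    S3-compatible store may name its archive tier differently.
    """
    return {
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ARCHIVE_AFTER_DAYS, "StorageClass": storage_class}
                ],
            }
        ]
    }


def is_due_for_archive(created: datetime, now: datetime) -> bool:
    """Return True once an object is at least ARCHIVE_AFTER_DAYS old."""
    return now - created >= timedelta(days=ARCHIVE_AFTER_DAYS)


if __name__ == "__main__":
    config = build_lifecycle_config("receipts/")
    now = datetime.now(timezone.utc)
    print(config["Rules"][0]["Transitions"])
    print(is_due_for_archive(now - timedelta(days=120), now))
```

Because the rule lives in the storage layer, no scheduled job has to scan the bucket; the store applies the transition on its own once the rule is enabled.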
2. Spot Instance Training Jobs

Model training runs on spot compute clusters, which are up to 70% cheaper than on-demand. Datazone handles interruptions and checkpointing automatically.

Spot instances · Auto-checkpointing · 70% training cost savings
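
The checkpoint-and-resume pattern that makes spot training workable can be sketched as below. The case study does not describe how Datazone implements it, so everything here is hypothetical: the `SpotInterruption` exception stands in for a reclaimed node, the JSON file stands in for a real model checkpoint, and the loss update is a placeholder.

```python
import json
import os
import tempfile
from typing import Optional


class SpotInterruption(Exception):
    """Hypothetical stand-in for the provider reclaiming a spot node."""


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a kill mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(path: str, total_steps: int, every: int,
          interrupt_at: Optional[int] = None) -> dict:
    """Run a toy training loop, resuming from the last checkpoint if present."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if step == interrupt_at:
            raise SpotInterruption(f"node reclaimed at step {step}")
        state = {"step": step, "loss": 1.0 / step}  # placeholder for a real update
        if step % every == 0 or step == total_steps:
            save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    try:
        train(ckpt, total_steps=100, every=10, interrupt_at=57)
    except SpotInterruption:
        pass  # steps 51-56 are lost; the checkpoint still holds step 50
    final = train(ckpt, total_steps=100, every=10)  # resumes at step 51
    print(final["step"])
```

The checkpoint interval is the trade-off to tune: shorter intervals lose less work per interruption but spend more time writing state.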
3. Auto-Scaling Inference Endpoints

Inference clusters scale from 0 to N nodes based on request volume. During off-peak hours (nights, weekends), compute scales to zero. During peak traffic, additional nodes are provisioned automatically.

Scale-to-zero · Request-based autoscaling · 4x compute cost reduction
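
The request-based scaling rule can be illustrated with a small calculation. This is a sketch under stated assumptions, not Datazone's scheduler: the function names, the per-node throughput, and the 24-hour traffic profile are all made up for illustration, and the resulting ratio is not Masraff's measured saving.

```python
import math
from typing import Sequence


def desired_nodes(requests_per_sec: float, per_node_rps: float, max_nodes: int) -> int:
    """Smallest node count that covers the load, clamped to [0, max_nodes]."""
    if per_node_rps <= 0:
        raise ValueError("per_node_rps must be positive")
    if requests_per_sec <= 0:
        return 0  # scale to zero when there is no traffic
    return min(max_nodes, math.ceil(requests_per_sec / per_node_rps))


def node_hours(hourly_rps: Sequence[float], per_node_rps: float, max_nodes: int) -> int:
    """Total node-hours when the cluster is resized once per hour."""
    return sum(desired_nodes(r, per_node_rps, max_nodes) for r in hourly_rps)


if __name__ == "__main__":
    # Hypothetical day: 8 idle hours, 12 moderate hours, 4 peak hours.
    profile = [0.0] * 8 + [40.0] * 12 + [100.0] * 4
    autoscaled = node_hours(profile, per_node_rps=25.0, max_nodes=4)
    fixed = 4 * len(profile)  # provisioned for peak around the clock
    print(autoscaled, fixed)
```

In this made-up profile the fixed cluster pays for 96 node-hours while the autoscaled one uses 40. The gap widens as the idle share of the day grows, which is why scale-to-zero matters most for workloads with quiet nights and weekends.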

What Changed

Before
  • Fixed compute running 24/7
  • Storage costs growing unchecked
  • AWS bill scaling linearly with users
  • Over-provisioned for peak load
After
  • Inference scales to zero during off-hours
  • Lifecycle policies archive old data automatically
  • 4x reduction in overall cloud spend
  • Compute matches actual usage patterns

Masraff now runs 12 production ML models on infrastructure that costs one quarter of what their previous setup did. Auto-scaling handles traffic spikes without manual intervention. Lifecycle policies cut storage costs by a factor of 10. Their AWS bill no longer scales 1:1 with user growth.