Image-to-video is no longer a demo trick. Two models — OpenAI's Sora 2 and Google's Veo 3.1 — are now both production-ready for ad creatives. We took 80 static product images through both, generated 6-second clips, and tested them as Meta and TikTok hooks.
TL;DR
The benchmark
| Metric | Sora 2 | Veo 3.1 |
|---|---|---|
| Motion realism | 9.1 / 10 | 8.4 / 10 |
| Prompt adherence | 8.0 / 10 | 9.0 / 10 |
| Identity preservation (product) | 8.6 / 10 | 8.4 / 10 |
| Cost / 6s clip | ~$1.20 | ~$0.95 |
| Avg generation time | 90s | 70s |
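To make the cost and time rows concrete, here is a rough sketch of what a batch the size of our test set (80 clips) would come to per model. The per-clip figures are taken from the table above. The function name `batch_estimate` is illustrative, and the time estimate assumes clips are generated one after another, which may not match how either API queues work in practice.

```python
# Approximate per-clip figures copied from the benchmark table.
BENCHMARK = {
    "Sora 2": {"cost_usd": 1.20, "gen_seconds": 90},
    "Veo 3.1": {"cost_usd": 0.95, "gen_seconds": 70},
}


def batch_estimate(model: str, clips: int) -> dict:
    """Estimate total spend and generation time for a batch of clips.

    Assumption: clips are generated sequentially, so time scales linearly.
    """
    row = BENCHMARK[model]
    return {
        "total_cost_usd": round(row["cost_usd"] * clips, 2),
        "total_minutes": round(row["gen_seconds"] * clips / 60, 1),
    }


for name in BENCHMARK:
    print(name, batch_estimate(name, 80))
```

At these rates the price gap between the two models is about $0.25 per clip, so the difference only becomes significant at high volume.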
When to use which
Sora 2: 2026-02, by OpenAI
Veo 3.1: 2026-04, by Google DeepMind
- Beauty + skincare hero shots → Sora 2 (motion + skin)
- Lifestyle scenes with multiple cuts → Veo 3.1 (coherence)
- Cinematic narrative ads → Veo 3.1 (longer scene logic)
- Product showcase + dramatic camera move → Sora 2
What this means for your ad ops
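Neither model is cheap: in our benchmark a 6-second clip cost roughly $1.20 on Sora 2 and $0.95 on Veo 3.1, and that adds up quickly across a full creative library. The brands shipping efficient image-to-video at scale aren't picking one model; they're routing per concept. AdFrame does this automatically: when you turn a static ad into video, we pick the model that fits your concept's motion profile.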
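If you want to try per-concept routing by hand, a minimal sketch could be a simple lookup built from the four pairings in "When to use which". This is a hypothetical illustration, not AdFrame's actual router: the `Concept` categories, the `ROUTES` table, and the `route` function are all names invented for this example, and a real router would likely weigh more signals than a single label.

```python
from enum import Enum


class Concept(Enum):
    # Hypothetical concept labels, mirroring the "When to use which" list.
    BEAUTY_HERO = "beauty or skincare hero shot"
    LIFESTYLE_MULTI_CUT = "lifestyle scene with multiple cuts"
    CINEMATIC_NARRATIVE = "cinematic narrative ad"
    PRODUCT_CAMERA_MOVE = "product showcase with dramatic camera move"


# Static routing table: concept -> model name, per the recommendations above.
ROUTES = {
    Concept.BEAUTY_HERO: "Sora 2",
    Concept.LIFESTYLE_MULTI_CUT: "Veo 3.1",
    Concept.CINEMATIC_NARRATIVE: "Veo 3.1",
    Concept.PRODUCT_CAMERA_MOVE: "Sora 2",
}


def route(concept: Concept) -> str:
    """Return the recommended model name for a given ad concept."""
    return ROUTES[concept]


print(route(Concept.BEAUTY_HERO))
print(route(Concept.CINEMATIC_NARRATIVE))
```

A lookup table keeps the decision auditable: when the benchmark numbers change, you update one mapping rather than retraining anything.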
Turn your static ads into video, automatically
AdFrame's image-to-video router picks Sora 2 or Veo 3.1 based on your concept. No model API to learn.