How I stopped paying OpenAI to run my test suite

Source: DEV Community
I was building an AI project and ran into something that kept bothering me. Every test that touched my LLM code was making a real API call. To OpenAI. Every single time. Tests were slow: 3 to 5 seconds each, just waiting for a response. Every CI run cost real money in tokens, for code that hadn't even shipped yet. And the tests were flaky: same code, same input, different output. Language models are non-deterministic, so I'd get a passing run, then a failing run, with no way to tell if my code was broken or if the model just felt like responding differently.

The existing options aren't great

- Mock the OpenAI Python client: then you're not testing the HTTP layer at all. The real SDK does a lot between your code and the wire.
- VCR-style cassettes: brittle. They break whenever the SDK updates its request format, which happens constantly.
- Ollama / local model: needs a GPU, is still non-deterministic, and is slow to start.

None of these gave me what I actually wanted: fast, deterministic, zero-cost.
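To make the first option concrete, here's a minimal sketch of what mocking the client looks like. `summarize` is a hypothetical function standing in for real LLM code, and the mock setup relies on `MagicMock` auto-creating attributes and indexed items. Notice what this test never exercises: no request serialization, no HTTP, none of the SDK's retry or error handling.

```python
from unittest.mock import MagicMock

def summarize(client, text):
    # Hypothetical function under test: in production it would receive a
    # real OpenAI client; here we inject a MagicMock instead.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return resp.choices[0].message.content

# MagicMock auto-creates the whole attribute chain, so we only need to
# pin the final value the code under test will read.
mock_client = MagicMock()
mock_client.chat.completions.create.return_value.choices[0].message.content = "a summary"

print(summarize(mock_client, "long article text"))  # prints: a summary
```

Fast and deterministic, yes, but the mock happily accepts any arguments, so a typo in the request payload passes the test and fails in production.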