
Kundan | Mar 29, 2025 | 6 min read

Which of the popular LLMs has the best developer experience?

I finally have some free time to share what's been bothering me lately: the developer experience of working with various LLM APIs. This isn't another post about benchmarks or technical capabilities—it's about the real-world frustrations and occasional wins when building actual products with these tools.

The Context: Resume Parsing with LLMs

For context, I've been developing TrackJobs.co, specifically focusing on the resume builder component. My challenge has been:

  1. Taking users' PDF resumes
  2. Extracting the raw text content (using libraries like PDFJS)
  3. Sending this content to various LLMs
  4. Getting structured data back according to a specific schema
  5. Using that data in my application
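The whole pipeline can be sketched in a few lines. Everything here is an illustrative stand-in: `extractText` fakes the PDFJS step and `callLlm` fakes the API call, but the shape — extract, send, validate against a schema — is the real flow:

```typescript
// The expected structured shape (a real app would define this with Zod).
interface ResumeData {
  name: string;
  skills: string[];
}

// Stand-in for PDFJS: the real version walks the PDF's text items.
function extractText(pdf: Uint8Array): string {
  return new TextDecoder().decode(pdf);
}

// Stand-in for the LLM call: the real version sends the text plus a
// schema and gets structured JSON back from the provider.
async function callLlm(resumeText: string): Promise<string> {
  return JSON.stringify({ name: resumeText.split("\n")[0], skills: [] });
}

// Check the parsed JSON actually matches the expected shape -- the step
// this post argues providers should make reliable out of the box.
function toResumeData(raw: string): ResumeData {
  const parsed = JSON.parse(raw);
  if (typeof parsed.name !== "string" || !Array.isArray(parsed.skills)) {
    throw new Error("LLM response did not match schema");
  }
  return parsed as ResumeData;
}

async function parseResume(pdf: Uint8Array): Promise<ResumeData> {
  const text = extractText(pdf);
  const raw = await callLlm(text);
  return toResumeData(raw);
}
```

Five lines of orchestration — the pain is entirely in how each provider handles the middle two steps.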

Sounds simple enough on paper. In reality? It's been a nightmare of inconsistent APIs, limited documentation, and hair-pulling debugging sessions. Let me break down my experience with each major provider.

OpenAI: The Structured Output Struggle

The OpenAI API was my starting point, and while powerful, it has some significant shortcomings:

Schema Support Issues

OpenAI introduced Zod support in their SDK, but the implementation feels half-baked (it's still in beta).

The real pain? I want a single Zod schema to do double duty: validating the form on the frontend and shaping the LLM's response on the backend.
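The pattern I'm after looks like this. To keep the snippet dependency-free, the schema below is a tiny hand-rolled stand-in for Zod (real code would use `z.object({...})` plus the SDK's `zodResponseFormat` helper):

```typescript
// Minimal stand-in for a Zod schema: one definition, used everywhere.
type Schema<T> = { parse: (value: unknown) => T };

const resumeSchema: Schema<{ name: string; email: string }> = {
  parse(value) {
    const v = value as { name?: unknown; email?: unknown };
    if (typeof v?.name !== "string" || v.name.length === 0) {
      throw new Error("name is required");
    }
    if (typeof v?.email !== "string" || !v.email.includes("@")) {
      throw new Error("email looks invalid");
    }
    return { name: v.name, email: v.email };
  },
};

// The same schema validates a form submission...
const fromForm = resumeSchema.parse({ name: "Jane", email: "jane@x.co" });

// ...and the JSON the LLM sends back, so the two can never drift apart.
const fromLlm = resumeSchema.parse(
  JSON.parse('{"name":"Jane","email":"jane@x.co"}')
);
```

When the SDK's schema support only covers a subset of Zod, that single-source-of-truth idea falls apart and you end up maintaining two schemas anyway.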

Incomplete Parsing

When using GPT-4o-Mini for resume parsing (which should be straightforward), I keep hitting roadblocks:
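The most common failure mode is output that simply stops partway through, leaving unparseable JSON. A small helper (an illustrative sketch, not part of any SDK) at least lets you tell "cut off" apart from "well-formed but wrong" so you can retry instead of crashing:

```typescript
// Distinguish truncated output from genuinely invalid output.
type ParseResult =
  | { ok: true; data: unknown }
  | { ok: false; reason: "truncated" | "invalid" };

function parseLlmJson(raw: string): ParseResult {
  try {
    return { ok: true, data: JSON.parse(raw) };
  } catch {
    // Heuristic: unbalanced brackets usually mean the model stopped
    // mid-output (e.g. it hit a length limit).
    const opens = (raw.match(/[\[{]/g) ?? []).length;
    const closes = (raw.match(/[\]}]/g) ?? []).length;
    return { ok: false, reason: opens > closes ? "truncated" : "invalid" };
  }
}
```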

Poor Error Handling

The error handling is poor: failed requests come back with generic messages and no token-usage information, so you often can't even tell whether you blew the context window or hit something else entirely.
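Since the API won't give you context, you end up attaching it yourself. A defensive wrapper like the following (an illustrative sketch, not any provider's API) at least preserves the provider name and attempt count across retries:

```typescript
// Error type that carries context the raw API error lacks.
class LlmCallError extends Error {
  constructor(
    public provider: string,
    public attempts: number,
    public underlying: unknown
  ) {
    super(
      `${provider} call failed after ${attempts} attempt(s): ${String(underlying)}`
    );
  }
}

// Run an LLM call with retries; surface a contextual error on failure.
async function withRetries<T>(
  provider: string,
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw new LlmCallError(provider, maxAttempts, lastError);
}
```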

Slow Responses

Compared to competitors, OpenAI's API is painfully slow:
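When responses can drag on indefinitely, a hard deadline on every call becomes non-negotiable. A sketch using `Promise.race` (real code would also abort the underlying fetch with an `AbortController` so the request actually stops):

```typescript
// Reject a promise that hasn't settled within `ms` milliseconds.
async function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer!); // don't keep the process alive on success
  }
}
```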

Gemini: Faster But Still Frustrating

Google's Gemini API improves on some fronts but introduces its own headaches:

The 8K Token Ceiling

The most significant limitation is the output token limit:

I've confirmed this isn't a rate limit issue. For the 2.0 Flash model:
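The workaround I've leaned on is to never need more than the cap in a single response: split the resume into sections and parse each one separately. The splitter below is an illustrative heuristic (blank-line boundaries, character budget) — it doesn't reflect how Gemini tokenizes:

```typescript
// Greedily pack blank-line-separated sections into chunks so each
// chunk's eventual structured output stays well under the limit.
function chunkBySections(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const section of text.split(/\n\s*\n/)) {
    const candidate = current ? current + "\n\n" + section : section;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = section;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk then gets its own structured-output request, and the results are merged client-side — more requests, more latency, but no truncated JSON.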

Schema Support Limitations

While Gemini's Zod support is better than OpenAI's, it still has significant gaps:

The Classic Google Documentation Problem

Like many Google products, Gemini suffers from fragmented, contradictory documentation:

Unclear Error Messages

The error messages are often useless:

Anthropic (Claude): Premium Price, Premium Pain

While I haven't extensively used Claude, my findings are:

The Apple Approach

Technical Limitations

DeepSeek: The Almost Alternative

DeepSeek offers OpenAI SDK compatibility but has limitations:

Cost Structure Problems

The Deeper Problem: Unpredictability

The biggest issue across all these platforms is the sheer unpredictability:

Companies are investing heavily in model capabilities but neglecting developer experience. They want the spotlight on their fancy new models but aren't focusing on making them practical to implement, especially for solo developers or small teams.

What Developers Actually Need

Based on my experience, here's what would dramatically improve the LLM developer experience:

  1. Complete schema support: First-class support for standard validation libraries like Zod and Pydantic
  2. Better documentation: Clear documentation on limitations and supported features
  3. Improved error handling: Token usage information for failed requests and clear error messages
  4. Predictable parsing: If a schema is enforced, the data should reliably match it
  5. Better structured output support: More work on making the beta features production-ready
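Until point 4 is real, clients have to enforce it themselves: call, validate, and retry with the validation error fed back to the model. A sketch of that loop, with `callModel` and `validate` as stand-ins for a real API call and a real schema check:

```typescript
// Keep calling until the output validates, feeding the validation
// error back as context on each retry.
async function getValidated<T>(
  callModel: (feedback?: string) => Promise<unknown>,
  validate: (data: unknown) => T, // throws on schema mismatch
  maxAttempts = 2
): Promise<T> {
  let feedback: string | undefined;
  for (let attempt = 1; ; attempt++) {
    const raw = await callModel(feedback);
    try {
      return validate(raw);
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      feedback = `Previous output was invalid: ${String(err)}`;
    }
  }
}
```

Every one of us is writing some version of this loop independently — which is exactly the kind of boilerplate the providers should be eliminating.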

Bottom Line: Which One Sucks Least?

If you're choosing an LLM API today, here's my assessment:

My verdict: go with Gemini, since it has a decent knowledge cutoff and is cheap enough for simpler tasks. But the constant patching won't go away anytime soon.

The current landscape feels like companies are rushing to release new models rather than fixing fundamental developer experience issues. Structured outputs have been in beta for 7-8 months with little improvement, while new models keep coming out every few weeks.

I'm still using a mix of these APIs depending on the specific task, but none provides a truly satisfying developer experience. OpenAI has the advantage of a larger output token window, while Gemini wins on speed, but all of them make me want to tear my hair out on a regular basis.

Until these issues are addressed, we developers will continue patching together solutions and fighting with inconsistent APIs. If you've faced similar challenges or found workarounds, I'd love to hear about them.

For now, I'm back to debugging why my perfectly valid Zod schema is causing yet another cryptic error. The joys of working on the bleeding edge, I suppose.