
Key takeaways:
- Developer trust in AI output is declining, and over 75% of developers still turn to another person when they don’t trust an AI answer.
- Debugging AI-generated code takes more time than expected, with “almost right but not quite” solutions being the top frustration.
- Advanced questions on Stack Overflow have doubled since 2023, indicating that LLMs may struggle with complex reasoning problems.
- Agentic AI adoption is split: More than half of developers are still sticking to simpler AI tools, but 70% of adopters report reduced time on tasks thanks to agentic workflows.
- Small language models and MCP servers are emerging as cost-effective solutions for enterprise and domain-specific tasks.
The 2025 Stack Overflow Developer Survey provides a nuanced look at AI adoption among enterprise development teams. AI tools are widely used, but as adoption rises and developers encounter the real-world limits of these tools, trust declines. The survey also highlights the continued value developers place on human knowledge and experience, particularly as AI tools become more prevalent.
On a recent episode of Leaders of Code, Stack Overflow Senior Product Marketing Manager Natalie Rotnov discussed key takeaways for enterprises regarding AI adoption and implementation. This article distills her insights from the survey findings, outlines action items for leadership, and further explores her recommendations for agentic AI in the enterprise. A theme she returns to throughout: the critical role of data quality.
The AI trust decline: why developer skepticism is healthy
The 2025 Stack Overflow survey of nearly 50,000 developers globally revealed a decline in developer trust in AI tools. While this may not surprise developers, it could be unexpected for C-suite executives who have been optimistic about AI tools without fully understanding their teams’ workflows.
Rotnov suggests that developers’ skepticism toward AI is beneficial. She explains that “Developers are skeptics by trade. They have to be critical thinkers, and they’re on the front lines intimately familiar with the nuances of coding, debugging, and problem-solving.” These individuals are precisely who should be working with new AI coding tools.
What’s behind the AI distrust?
The survey identified developers’ biggest frustrations with AI:
- “Almost right, but not quite” solutions. AI produces code that appears correct but contains subtle errors. These errors are pitfalls especially for less seasoned developers, who may lack the experience to spot and correct them.
- Time-consuming debugging. Fixing AI-generated code often takes longer than expected, especially without proper context.
- Lack of complex reasoning. Current AI models struggle with advanced problem-solving and higher-order work.
These concerns align with research findings. Research from Apple suggests that LLMs primarily engage in pattern matching and memorization rather than true reasoning. The paper showed that as tasks grew more complex, model performance deteriorated—evidence that reasoning models are still relatively immature.
Key term: Reasoning models are AI models designed to break down problems and think through solutions step-by-step, mimicking human cognitive processes. OpenAI’s o1 is one example.
Do developers still rely on human knowledge?
Despite AI’s constantly expanding capabilities, the survey revealed that human knowledge remains paramount for complicated technical problems. More than 80% of developers still visit Stack Overflow regularly, and 75% turn to another person when they do not trust AI-generated answers.
Significantly, even as developers experiment with reasoning models, advanced questions on stackoverflow.com have doubled since 2023. Stack Overflow’s parent company, Prosus, uses an LLM to categorize questions as “basic” or “advanced.” The sharp increase in “advanced” questions suggests developers are encountering issues that AI tools cannot resolve.
What does human validation mean for enterprises?
Rotnov emphasizes two important conclusions that enterprises should draw from this data:
- LLMs haven’t mastered complex reasoning problems. Instead, developers turn to human-centered knowledge communities for help.
- AI is creating new problems that communities have never encountered before.
Not only are human expertise and validation still essential, then, but the new problems cropping up because of AI use, misuse, or overuse require human-driven solutions.
Example: A developer using an AI coding assistant might generate a working application quickly, but when they need to optimize performance, handle edge cases, or integrate with legacy systems, they require human expertise and collaborative problem-solving.
What are the key action items for leaders driving enterprise AI projects?
Rotnov outlined two high-level action items business leaders can take to make their AI projects successful while supporting technical teams’ preferred tools and workflows: investing in spaces for knowledge curation/validation and doubling down on retrieval augmented generation (RAG).
Invest in spaces for knowledge curation and validation
What to do: Create internal platforms where developers can document, discuss, and validate new problems and solutions emerging from AI-assisted workflows.
Why it matters: As AI changes how developers work, they need structured spaces to build consensus around new patterns and best practices.
Best practices:
- Choose platforms that support structured formats with metadata (tags, categories, labels).
- Implement quality signals like voting, accepted answers, and expert verification.
- Ensure the format is AI-friendly so this knowledge can feed back into internal LLMs and agents.
Key term: Metadata refers to information about data (like tags, categories, or timestamps) that helps organize and contextualize content, making it easier for both humans and AI systems to understand and retrieve relevant information.
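To make the “AI-friendly format” point above concrete, here is a minimal sketch of what a structured knowledge entry with metadata and quality signals might look like. The field names and serialization are illustrative assumptions, not a specific product’s schema:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class KnowledgeEntry:
    """One curated Q&A-style entry; field names are illustrative, not a product schema."""
    title: str
    body: str
    tags: list[str] = field(default_factory=list)  # metadata for retrieval
    votes: int = 0                                  # quality signal
    accepted: bool = False                          # quality signal
    verified_by: str | None = None                  # expert verification

def to_llm_chunk(entry: KnowledgeEntry) -> str:
    """Serialize an entry as a self-describing JSON chunk an internal LLM or agent can ingest."""
    return json.dumps(asdict(entry), ensure_ascii=False)

entry = KnowledgeEntry(
    title="How do we retry failed deploys in the staging pipeline?",
    body="Use the pipeline's built-in retry with exponential backoff...",
    tags=["ci-cd", "staging", "deployment"],
    votes=12,
    accepted=True,
    verified_by="platform-team",
)
print(to_llm_chunk(entry))
```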
Double down on RAG systems
RAG (retrieval augmented generation) continues to be highly relevant, according to Rotnov, and for good reason. The survey indicated:
- 36% of professional developers are learning RAG.
- Searching for answers represents the highest area of AI adoption in development workflows.
- The “RAG” tag has emerged as one of the most popular new tags on Stack Overflow.
What RAG does: RAG systems summarize internal knowledge sources into concise, relevant answers that appear wherever developers work, including within IDEs, chat platforms, or documentation.
Critical consideration: RAG is only as effective as its underlying data. Summarizing poorly structured or outdated information will lead to suboptimal results.
Example: A developer troubleshooting a deployment issue could query an internal RAG system that retrieves information from documentation, past incident reports, and team wikis to provide a comprehensive answer without manually searching multiple sources.
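As a rough illustration of that retrieve-then-generate flow, here is a minimal sketch. The keyword-overlap scoring stands in for a real vector store, and the LLM call is a placeholder for whatever model endpoint an organization actually uses:

```python
def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank internal documents by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer(query: str, docs: list[str]) -> str:
    """Build a grounded prompt from retrieved context; the LLM call itself is stubbed out."""
    context = "\n---\n".join(retrieve(query, docs))
    prompt = f"Answer using only this internal context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)  # placeholder: swap in your organization's model client

def call_llm(prompt: str) -> str:
    return "(model response would appear here)"

docs = [
    "Deploys to staging fail when the DB migration lock is held; re-run after releasing it.",
    "The on-call wiki lists rollback steps for the payments service.",
    "Past incident INC-1042: deployment timed out due to an expired registry token.",
]
print(answer("Why did my staging deploy fail?", docs))
```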
Future-proofing AI models through reasoning and human validation
For organizations building their own models (whether internal tools or products), Rotnov emphasizes two priorities: improving reasoning capabilities and implementing human validation loops.
Improve reasoning capabilities
The challenge: Current reasoning models are immature and struggle with complex tasks.
The solution: Train models on data that demonstrates human thought processes, not just final answers.
Important data types include:
- Comment threads showing how humans discuss and evaluate solutions.
- Curated knowledge that reveals how understanding evolves over time.
- Decision-making processes that expose the “why” behind conclusions.
Survey insight: For the first time, Stack Overflow asked how people use the platform. The #1 answer? They look at comments. This reveals that developers are looking for more than just the accepted solution. They want to see the discussion, the relevant context, and the diverse perspectives surrounding a question.
Implement human validation loops
The issue: model drift, where AI outputs become less accurate as real-world conditions change.
The fix: Build continuous feedback mechanisms where humans evaluate and correct AI outputs to ensure accuracy and alignment with human values.
Example: Stack Overflow is piloting integrations where AI models appear on leaderboards and users can vote on responses from different models, providing real-time feedback on performance.
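A validation loop can start simple. The sketch below assumes a hypothetical setup where reviewers up- or down-vote model outputs and a rolling acceptance rate flags possible drift; the class name and threshold are illustrative only:

```python
from collections import deque

class FeedbackLoop:
    """Track human up/down votes on model outputs and flag a drop in acceptance rate."""

    def __init__(self, window: int = 100, alert_threshold: float = 0.7):
        self.votes = deque(maxlen=window)  # rolling window of recent reviews
        self.alert_threshold = alert_threshold

    def record(self, model_id: str, output: str, accepted: bool) -> None:
        self.votes.append(accepted)
        # In a real system you would also persist (model_id, output, accepted)
        # so rejected answers can be corrected and fed back into prompts or training.

    def drifting(self) -> bool:
        """Return True when the recent acceptance rate falls below the threshold."""
        if not self.votes:
            return False
        return sum(self.votes) / len(self.votes) < self.alert_threshold

loop = FeedbackLoop(window=50, alert_threshold=0.7)
loop.record("model-a", "suggested fix for the deploy error", accepted=False)
print("Possible drift:", loop.drifting())
```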
The developer tool sprawl problem
One surprising finding: over a third of developers use 6-10 different tools in their work, yet, contrary to common assumptions, tool sprawl does not correlate with job dissatisfaction.
Rotnov observed that this finding was unexpected, as efforts to solve tool sprawl have been ongoing for years. However, it appears developers accept that each tool serves a specific use case and is necessary for their work. When deciding on AI tools and technologies, enterprises should recognize that developers can tolerate a degree of tool sprawl, provided each tool fulfills a distinct function within their workflows.
Agentic AI: The promised solution?
Speaking of workflows: agentic AI describes autonomous systems capable of performing complex tasks across various tools and platforms to achieve specific goals without continuous human guidance. In theory, agentic AI offers a solution to tool sprawl. In practice, adoption of agentic AI systems remains limited:
- 52% of developers either do not use agents or prefer simpler AI tools.
- Security and privacy concerns continue to be significant barriers to agent adoption.
- The immaturity of reasoning models restricts agents’ capabilities.
Nevertheless, for developers who have begun incorporating agentic AI into their workflows, the outcomes are encouraging:
- 70% report that agents reduced the time spent on specific tasks.
- 69% agree that agents increased their productivity.
- Younger or less experienced developers show a greater likelihood of adopting agents.
Similar to the general adoption curve of AI tools, developers will embrace agentic workflows once they observe clear evidence of their effectiveness.
Recommendations for navigating agentic AI
On that note, Rotnov had some recommendations for enterprises rolling out agentic AI systems.
Start small and iterate
As with any new tool or technology, Rotnov recommends that enterprises pilot low-risk agentic use cases before broader implementations: demonstrate value, build consensus, and then roll the technology out to more users once the team understands how it works at a small scale.
Consider piloting with interns or newer developers on onboarding tasks, where mistakes have lower consequences and feedback loops are clear.
Embrace MCP servers
MCP (model context protocol) provides a standardized method for LLMs to access and learn from data sources. It is comparable to the International Image Interoperability Framework (IIIF), which standardizes how images are delivered and described online.
What MCP servers do:
- Help AI learn implicit knowledge: an organization’s language, culture, and operational methods.
- Enable faster familiarization with internal systems.
- Provide read-write access and pre-built prompts for dynamic knowledge sharing.
- Connect to existing AI tools and agents for less context switching.
Real-world application: Stack Overflow recently launched a bi-directional MCP server. A developer creating an internal app in Cursor can connect to the MCP server and instantly access enterprise knowledge—including structure, quality signals (votes, accepted answers), and metadata (tags) to inform their application’s outputs.
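For teams curious about what exposing internal knowledge over MCP involves, here is a minimal sketch using the official MCP Python SDK. The tool name and the search_internal_kb helper are hypothetical; this is not the interface of Stack Overflow’s MCP server:

```python
# Minimal sketch of exposing an internal knowledge search as an MCP tool,
# using the official MCP Python SDK (FastMCP helper).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-knowledge")

def search_internal_kb(query: str) -> list[dict]:
    # Placeholder: query your internal knowledge base, returning quality signals and metadata.
    return [{"title": "Example answer", "votes": 12, "tags": ["ci-cd"], "accepted": True}]

@mcp.tool()
def search_knowledge(query: str) -> list[dict]:
    """Return curated internal answers, including votes, accepted status, and tags."""
    return search_internal_kb(query)

if __name__ == "__main__":
    mcp.run()  # an MCP-aware client (e.g. an IDE agent) can now call search_knowledge
```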
Consider small language models
Why the trend: Small language models (SLMs) are increasing in popularity due to several factors:
- Task-specific: Smaller models can be fine-tuned for particular domains or use cases.
- Cost-effective: Small models are generally less expensive to build and maintain compared to large models.
- Better for the environment: SLMs typically require less computational power.
- Ideal for agents: Smaller models are well-suited for specialized agentic tasks.
Example: A healthcare company might implement an SLM specifically trained on medical coding standards and its internal protocols for processing insurance claims, rather than depending on a general-purpose LLM.
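As a rough sketch of what serving a domain-tuned SLM could look like, the example below uses the Hugging Face transformers pipeline API; the model name is hypothetical:

```python
from transformers import pipeline

# "acme-health/claims-coder-small" is a hypothetical small model fine-tuned on claims data.
classifier = pipeline("text-classification", model="acme-health/claims-coder-small")

claim = "Office visit, established patient, 25 minutes, hypertension follow-up."
print(classifier(claim))  # e.g. a predicted billing-code label with a confidence score
```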
Don’t sleep on APIs
While MCP servers and agents get attention, APIs remain crucial for reducing context switching and the overall cognitive load on developers. In fact, developers are more likely to endorse and become fans of a technology if it has an easy-to-use and robust API.
What to evaluate:
- Is the API well-documented and supported?
- Does it use a REST architecture or other AI-friendly format?
- Is pricing transparent?
- Is there an SDK available for easier integration?
Example: Stack Overflow recently launched a TypeScript SDK for Stack Internal, making it easier for developers to build integrations and custom workflows.
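To show what that checklist looks like in practice, here is a quick call to a well-documented public REST API (the public Stack Exchange API, not Stack Internal or its TypeScript SDK):

```python
import requests

# Fetch the three most recently active Stack Overflow questions from the public API.
resp = requests.get(
    "https://api.stackexchange.com/2.3/questions",
    params={"order": "desc", "sort": "activity", "site": "stackoverflow", "pagesize": 3},
    timeout=10,
)
resp.raise_for_status()
for question in resp.json()["items"]:
    print(question["title"])
```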
Data quality is the key to enterprise AI success
Rotnov emphasized her primary recommendation for enterprises considering AI projects:
“You really need to be looking long and hard about what internal data sources you have that LLMs and AI can learn from and provide accurate answers to your teams.”
Key questions for organizations to consider:
- Are developers given spaces to create new knowledge and problem-solve collaboratively?
- Is that knowledge well-structured with good metadata and quality signals?
- If third-party data is used, does it meet the same quality criteria?
- Is the data conducive to AI, meaning organized in ways that LLMs can effectively learn from?
Regardless of what is being built—agentic systems, RAG implementations, or custom models—the underlying data quality determines success. Even synthetic data generation requires high-quality source material.
Final thoughts
For AI initiatives to succeed, enterprises must balance the productive potential of AI tools with the necessity for continuous human validation and community-driven knowledge infrastructure. Successful developers utilize AI not to replace human judgment or experience, but as a force multiplier. Similarly, successful enterprises integrate AI capabilities with human expertise, employing well-structured knowledge systems and strategic implementation to ensure AI delivers value across all business levels.

