    Simulating lousy conversations: Q&A with Silvio Savarese, Chief Scientist & Head of AI Research at Salesforce

    By Samuel Alejandro | December 27, 2025 | 6 Mins Read

    Customer service is a significant application for large language models (LLMs) and AI agents. A substantial portion of these interactions occurs over the phone, so customer service bots must be able to handle voice conversations. Phone conversations can be complex, often involving hostility, interruptions, background noise, and general unpredictability. Salesforce is addressing this challenge by simulating such chaotic scenarios to improve how its voice agents respond on real-world phone calls. Silvio Savarese, Chief Scientist and Head of AI Research at Salesforce, discussed the development of eVerse, a simulation tool designed to rigorously test AI agents without involving actual customers.

    Some might view AI voice agents and simulated training environments as an overly complex solution for tasks that phone button menus handle adequately. However, Silvio Savarese explains that while phone menus suffice for simple, scripted interactions like checking a balance, they become ineffective for complex, multi-step customer problems that deviate from predefined scripts. Button menus also fall short from a user experience standpoint.

    AI agents, conversely, can capture the nuance in human language, which extends beyond the nature of the request itself. Customers might struggle to articulate their issues or require clarifying questions. Many edge cases exist where a simple button press is insufficient, prompting individuals to call for assistance. This highlights the importance of simulation environments like eVerse, which can create synthetic representations of numerous edge cases to ensure the best possible customer experience. Should human intervention still be necessary, the conversation can be seamlessly transferred, retaining all previously collected context.

    Asked how Salesforce decides which aspects of real conversations to simulate, and why it simulates complex scenarios rather than simply engineering mitigations, Savarese notes that synthetic data generation offers extreme variety at scale, producing scenarios that might not otherwise be conceived. Because this synthetic data is generated independently from the agent's training data, the agent cannot anticipate the nature of the problem. While an agent should ideally handle environmental factors like wind, the primary goal is to train agents for unpredictable scenarios. These could involve various types of noise, language-related challenges, or entirely different issues. Different businesses also face unique challenges, such as ordering at a drive-through or changing a flight in a busy airport. Synthetic data generation allows a small amount of sample data to be extrapolated into many different permutations. As the eVerse simulation loop addresses all possible corner cases, these simulation environments will eventually fulfill their purpose and become less necessary.
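
The idea of extrapolating a small sample into many permutations can be sketched as follows. This is a minimal illustration, not eVerse's actual method: the seed requests, the perturbation axes, and the `generate_scenarios` helper are all hypothetical names chosen for the example.

```python
import itertools
import random

# Hypothetical seed data: a few sample requests plus perturbation axes.
SEED_REQUESTS = ["change my flight", "dispute a charge"]
NOISE = ["quiet room", "busy airport", "drive-through wind"]
MOOD = ["calm", "impatient", "hostile"]
SPEECH = ["clear", "interrupts often", "heavy accent"]


def generate_scenarios(seeds, noises, moods, speech_styles):
    """Yield one scenario dict per combination of seed request and axis values."""
    for request, noise, mood, speech in itertools.product(
        seeds, noises, moods, speech_styles
    ):
        yield {"request": request, "noise": noise, "mood": mood, "speech": speech}


scenarios = list(generate_scenarios(SEED_REQUESTS, NOISE, MOOD, SPEECH))
print(len(scenarios))  # 2 * 3 * 3 * 3 = 54 scenarios from 2 seed requests

# Draw a reproducible subset when the full grid is too large to run at once.
rng = random.Random(0)
batch = rng.sample(scenarios, k=5)
```

Adding one more axis, or one more value on an existing axis, multiplies the scenario count, which is why a small amount of sample data can cover a large space of conditions.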

    Clarifying the distinction between simulation training data and agent training data, Savarese explains that LLMs are pre-trained on vast amounts of general data, much of which is not relevant to specific enterprise scenarios. Agents are sophisticated frameworks built around these LLMs, with capabilities like Agentforce allowing them to be dialed toward determinism or creativity based on the use case. Simulation data fundamentally differs from LLM pre-training data. It starts from small amounts of real enterprise data and uses them to generate realistic synthetic scenarios that LLMs would be highly unlikely to have encountered during pre-training. This approach ensures the agent learns to generalize rather than merely memorizing responses.

    Identifying and fixing gaps is central to eVerse's operation. After large volumes of synthetic scenarios are simulated, agent responses are measured through benchmarking. Some measures are quantitative, such as whether the agent took the correct action, while others are qualitative, like assessing simulated customer satisfaction. Human annotators are also employed to validate agent responses for critical scenarios. The feedback gathered from evaluating agent performance in these simulated scenarios drives continuous improvement. Crucially, edge cases are not static; they evolve with changes in customer behavior, regulations, and business rules. Similar to how flight simulators remain essential for experienced pilots, eVerse becomes increasingly valuable as agents scale, providing a safe environment to test changes where the cost of production failure is too high.
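
A benchmarking pass of this kind might look like the sketch below. It is an assumption-laden illustration rather than eVerse's implementation: `EpisodeResult`, `summarize`, the satisfaction scale, and the `satisfaction_floor` threshold are hypothetical, and the sample results are made up for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    scenario_id: str
    expected_action: str
    agent_action: str
    satisfaction: float  # simulated customer satisfaction, assumed to lie in [0, 1]
    critical: bool = False  # critical scenarios always go to a human annotator


def summarize(results, satisfaction_floor=0.5):
    """Combine a quantitative and a qualitative measure, and queue items for review."""
    if not results:
        raise ValueError("no results to summarize")
    correct = [r.expected_action == r.agent_action for r in results]
    needs_review = [
        r.scenario_id
        for r in results
        if r.critical
        or r.expected_action != r.agent_action
        or r.satisfaction < satisfaction_floor
    ]
    return {
        "action_accuracy": sum(correct) / len(results),
        "mean_satisfaction": mean(r.satisfaction for r in results),
        "needs_human_review": needs_review,
    }


# Made-up results for four simulated episodes.
results = [
    EpisodeResult("s1", "rebook_flight", "rebook_flight", 0.9),
    EpisodeResult("s2", "refund", "escalate", 0.4),
    EpisodeResult("s3", "update_billing", "update_billing", 0.8, critical=True),
    EpisodeResult("s4", "refund", "refund", 0.3),
]
report = summarize(results)
print(report["action_accuracy"])     # 0.75
print(report["needs_human_review"])  # ['s2', 's3', 's4']
```

The review queue is where the feedback loop closes: wrong actions, low satisfaction, and critical scenarios are the cases most likely to expose a gap worth fixing.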

    Addressing the inclusion of less-than-kind customer interactions in simulations without the AI agent responding inappropriately, Savarese highlights that in a simulation, no real humans can have their feelings hurt by a rogue agent. If an agent exhibits inappropriate behavior in certain situations, the simulation environment is precisely where such behavior should be discovered and corrected. The aim is also to ensure agents can handle difficult situations optimally, showing empathy and helping to defuse tense conversations, which is an important aspect of customer service. Situations where agents might "curse back" can be detected either by human review to assess inappropriate responses or by using "judge agents/models" trained on sentiment detection to automatically identify such behavior.
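
The automated check could be wired up roughly as shown here. The article describes judge models trained on sentiment detection; the `keyword_judge` below is only a simple word-list stand-in so the example runs without a model, and every name in it (`BLOCKLIST`, `keyword_judge`, `flag_transcript`) is hypothetical.

```python
import re

# Placeholder lexicon. A real judge would be a trained model, not a word list.
BLOCKLIST = {"idiot", "stupid", "shut up", "damn"}


def keyword_judge(agent_reply: str) -> bool:
    """Return True when the reply contains a blocklisted word or phrase."""
    text = agent_reply.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in BLOCKLIST)


def flag_transcript(turns, judge=keyword_judge):
    """Return the indexes of agent turns that the judge marks as inappropriate."""
    return [
        i
        for i, (speaker, text) in enumerate(turns)
        if speaker == "agent" and judge(text)
    ]


# Made-up transcript: the customer is hostile, and the agent slips on the last turn.
transcript = [
    ("customer", "This is stupid, I've been on hold for an hour!"),
    ("agent", "I'm sorry about the wait. Let me fix this now."),
    ("customer", "You people are useless."),
    ("agent", "Well, that was a stupid thing to say."),
]
flagged = flag_transcript(transcript)
print(flagged)  # [3]
```

Because `judge` is passed in as a plain function, a sentiment model could replace the word list without changing the rest of the loop. Only agent turns are checked, so a hostile simulated customer does not trigger a flag.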

    The discussion also covered the "Move 37" analogy from the Go match between Lee Sedol and AlphaGo, and how to keep simulations rooted in real-world human interactions rather than producing baffling but effective moves. Savarese notes that Go is an ancient board game with an immense number of possible moves, and that AlphaGo's unprecedented "Move 37" baffled experts. However, rather than trying to prevent such surprising moves, Go players now leverage AI to learn and improve their own game, viewing AI as a tool for enhancement. This reflects AI's potential in business scenarios: a tool to improve the performance of salespeople, service personnel, and other organizational functions. It is also crucial to establish guardrails that ensure agents operate within proper, trusted boundaries and avoid off-chart behavior. This can be enforced by using judge agents/models or by implementing determinism, as is being done in the new release of Agentforce.

    Salesforce's partnership with UCSF Health involves testing eVerse in a medical/billing environment. Savarese emphasizes the healthcare space as extremely important, offering a significant opportunity for AI to alleviate pressure on physicians and other workers. The collaboration with UCSF Health began with billing use cases, a major pain point for patients due to the numerous systems required to provide answers and the knowledge often trapped within subject matter experts. The pilot with UCSF Health is showing promising results. By creating a Learning Engine with eVerse, AI agents can involve humans when they lack an answer, preventing "hallucinations" and allowing humans to "teach" the AI the correct way to handle a situation. Industry data suggests that 60-70% of inbound calls to healthcare contact centers are routine inquiries that can be fully automated. For the more complex 30-40% of cases, eVerse continuously improves performance through human-in-the-loop feedback, gradually expanding coverage. The results indicate a potential increase in coverage from the 60-70% range to 84-88%. This means that new skills taught by human experts can be generalized and retained by the Learning Engine, improving coverage and allowing humans to focus on the most complex tasks.
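
To put the coverage figures in perspective, the short calculation below estimates what share of the previously non-automated calls the reported improvement would absorb. It assumes the low ends of the two ranges belong together (60% to 84%) and likewise the high ends (70% to 88%); the article does not state this pairing, and `absorbed_share` is a hypothetical helper.

```python
def absorbed_share(baseline_pct, improved_pct):
    """Fraction of the calls not covered at baseline that the improved coverage absorbs."""
    remaining = 100 - baseline_pct
    return (improved_pct - baseline_pct) / remaining


low = absorbed_share(60, 84)   # 24 points gained out of 40 remaining
high = absorbed_share(70, 88)  # 18 points gained out of 30 remaining
print(f"{low:.0%} {high:.0%}")  # 60% 60%
```

Under that pairing, about 60% of the complex cases move into automated coverage at both ends of the range. Pairing the endpoints the other way (70% to 84%, or 60% to 88%) gives roughly 47% to 70%.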
