    Behind the Scenes: Developing the Question Assistant

By Samuel Alejandro | January 12, 2026 | 6 Mins Read

    Staging Ground was introduced to provide a private space for new users to receive feedback on question drafts from experienced members before public posting on Stack Overflow. While this initiative led to noticeable improvements in question quality, the process of obtaining human feedback and completing the Staging Ground review still required significant time.

    Reviewers frequently found themselves providing repetitive comments, such as indicating missing context or duplicate content. Stack Overflow even offers comment templates for common scenarios. This observation highlighted an opportunity to leverage machine learning and AI to automate feedback for these recurring issues, allowing human reviewers to focus on more complex cases. A partnership with Google provided access to Gemini, an AI tool instrumental in testing, identifying, and generating automated feedback.

    Ultimately, assessing question quality and delivering relevant feedback necessitated a combination of traditional machine learning methods and a generative AI solution. This article details the considerations, implementation, and measured outcomes of Question Assistant.

    What is a good question?

    Large Language Models (LLMs) offer significant text analysis capabilities, prompting an investigation into their ability to rate question quality within specific categories. Initial efforts focused on three defined categories: context and background, expected outcome, and formatting and readability. These categories were selected due to their frequent appearance in reviewer comments aimed at helping new users improve questions in Staging Ground.

    Tests with LLMs revealed an inability to reliably predict quality ratings and provide correlated feedback. The generated feedback was often repetitive and lacked specificity to its assigned category; for instance, all three categories might include generic advice about library or programming language versions. Furthermore, quality ratings and feedback remained unchanged even after question drafts were updated.

    To enable an LLM to reliably rate subjective question quality, a data-driven definition of a ‘quality question’ was necessary. Despite Stack Overflow’s guidelines on asking effective questions, quality is not readily quantifiable. This necessitated the creation of a labeled dataset for training and evaluating machine learning models.

An initial attempt involved creating a ‘ground truth’ dataset for question rating. A survey distributed to 1,000 question reviewers requested quality ratings (1-5) across the three categories; 152 participants completed it. Analysis using Krippendorff’s alpha showed low inter-rater agreement, indicating that this labeled data was not reliable enough for model training and evaluation.
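As a rough illustration of that agreement check, the sketch below computes Krippendorff’s alpha over a tiny, made-up ratings matrix using the third-party krippendorff package; the numbers are placeholders, not the survey data.

import numpy as np
import krippendorff  # pip install krippendorff

# Rows are reviewers, columns are question drafts; np.nan marks drafts a
# reviewer did not rate. Values are illustrative only.
ratings = np.array([
    [3, 4, 2, np.nan, 5],
    [2, 4, 1, 3,      np.nan],
    [4, 5, 2, 3,      4],
], dtype=float)

# Ordinal level of measurement suits 1-5 quality ratings.
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.2f}")

Values well below the commonly cited 0.8 threshold suggest the raters do not agree enough for the labels to train a model against, which is the situation the survey surfaced.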

    Further data exploration led to the conclusion that numerical ratings alone do not offer actionable feedback. A score of ‘3’ in a category, for instance, lacks the context to inform a user on specific improvements needed to post their question.

    Although an LLM could not reliably determine overall quality, the survey confirmed the significance of the feedback categories. This insight guided an alternative approach: developing specific feedback indicators for each category. Instead of directly predicting a score, individual models were constructed to identify whether a question warranted feedback for a particular indicator.

    Building indicator models

    To avoid the broad and generic outputs often associated with LLMs, individual logistic regression models were developed. These models generate a binary response based on the question title and body, essentially determining if a specific comment template is applicable.

    The initial experiment focused on building models for a single category: context and background. This category was subdivided into four distinct and actionable feedback indicators:

    • Problem definition: The problem or goal lacks the information needed to understand what the user is trying to accomplish.
    • Attempt details: The question needs additional information on what you have tried and the pertinent code (as relevant).
    • Error details: The question needs additional information on error messages and debugging logs (as relevant).
    • Missing MRE: Missing a minimal reproducible example using a portion of your code that reproduces the problem.

    These feedback indicators were derived by clustering reviewer comments from Staging Ground posts to identify common themes. These themes aligned with existing comment templates and question close reasons, allowing the use of historical data for model training. Reviewer and close comments were vectorized using term frequency-inverse document frequency (TF-IDF) before being fed into logistic regression models.
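To make the indicator-model idea concrete, here is a minimal sketch of one such classifier in scikit-learn; the training examples, labels, and the choice of hyperparameters are hypothetical, standing in for the historical reviewer-comment and close-reason data described above.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Each example is the question title concatenated with its body; the label
# records whether reviewers applied the matching comment template (1) or not (0).
texts = [
    "App crashes on launch. I tried nothing and it does not work.",
    "Parsing JSON in Python: here is my code, the full traceback, and a small repro.",
]
labels = [1, 0]

indicator_model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
indicator_model.fit(texts, labels)

draft = "My code crashes. Please help."
needs_feedback = bool(indicator_model.predict([draft])[0])  # True => flag the indicator

One pipeline like this per indicator keeps each model narrow and interpretable, which is the point of favoring logistic regression over a single broad LLM judgment.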

    While traditional ML models were developed to flag questions based on quality indicators, an LLM was still required in the workflow to generate actionable feedback. When an indicator flags a question, a preloaded response text, along with the question and system prompts, is sent to Gemini. Gemini then synthesizes this information to produce feedback that is specific to the question and addresses the identified indicator.

    The following diagram illustrates the workflow:

Sequence diagram: the Staging Ground user submits a question title and body to the Question Feedback Service, which consists of the feedback classifiers and an LLM. The classifiers assess the draft against the predefined feedback indicators; for each positive indicator, the preloaded feedback text is retrieved and sent, together with the question details, to the LLM, which generates question-specific feedback that is returned to the user. The cycle repeats for each subsequent edit until the question is refined.
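A hedged sketch of the feedback-generation step in that workflow follows, using Google’s google-generativeai SDK; the model name, system prompt, and function shape are illustrative assumptions, not Stack Overflow’s implementation.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

SYSTEM_PROMPT = (
    "You review Stack Overflow question drafts. Rewrite the provided feedback "
    "template so it refers to specifics of this draft, and keep it brief."
)

def generate_feedback(title: str, body: str, template_text: str) -> str:
    # One call per indicator that the classifiers flagged as positive.
    model = genai.GenerativeModel("gemini-1.5-pro",
                                  system_instruction=SYSTEM_PROMPT)
    prompt = (
        f"Question title: {title}\n\n"
        f"Question body:\n{body}\n\n"
        f"Feedback template for the flagged indicator:\n{template_text}"
    )
    return model.generate_content(prompt).text

Keeping the preloaded template in the prompt constrains the LLM to the indicator at hand, which avoids the generic, repetitive advice seen in the earlier quality-rating experiments.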

    These models were trained and stored within an Azure Databricks ecosystem. For production, a dedicated service on Azure Kubernetes downloads models from Databricks Unity Catalog and hosts them to generate predictions when feedback is requested.
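As a rough sketch of that serving path, the snippet below shows how a pod might pull a registered model from Unity Catalog via MLflow; the three-level model name and alias are hypothetical placeholders.

import mlflow

mlflow.set_registry_uri("databricks-uc")  # point the MLflow client at Unity Catalog

# "main.question_assistant.missing_mre" is a placeholder catalog.schema.model name;
# the alias selects whichever version is currently promoted.
missing_mre_model = mlflow.pyfunc.load_model(
    "models:/main.question_assistant.missing_mre@champion"
)

# Input format depends on the signature the model was logged with;
# for a text pipeline it is typically a column of title-plus-body strings.
prediction = missing_mre_model.predict(["Question title plus body text here"])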

With this pipeline in place, the models were ready to generate feedback for real questions.

    Testing it on site

    The experiment was conducted in two stages: initially on Staging Ground, then expanded to stackoverflow.com for all users utilizing the Ask Wizard. Success was measured by collecting events via Azure Event Hub and logging predictions and results to Datadog. This data helped assess the helpfulness of the generated feedback and informed improvements for future indicator model iterations.

    The initial experiment took place in Staging Ground, targeting new users who often require the most assistance in drafting questions. An A/B test was implemented, with eligible Staging Ground users split equally into control and variant groups. The control group received no assistance from Gemini, while the variant group did. The objective was to determine if Question Assistant could increase the approval rate of questions to the main site and decrease review times.

    Results based on the initial goal metrics were inconclusive; neither approval rates nor average review times showed significant improvement for the variant group. However, the solution addressed a different problem. A notable increase was observed in the success rates of questions—defined as questions remaining open on the site and receiving an answer or a post score of at least +2. Thus, while the original objectives were not met, the experiment confirmed Question Assistant’s value to users and its positive influence on question quality.
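For clarity, the success-rate definition above could be computed per A/B group roughly as follows; the column names and rows are hypothetical stand-ins for the events actually collected through Event Hub and Datadog.

import pandas as pd

questions = pd.DataFrame({
    "group":      ["control", "control", "variant", "variant"],
    "is_open":    [True, True, True, False],
    "answered":   [False, True, True, False],
    "post_score": [1, 0, 3, -2],
})

# Successful = still open AND (answered OR score of at least +2).
questions["successful"] = questions["is_open"] & (
    questions["answered"] | (questions["post_score"] >= 2)
)
success_rate = questions.groupby("group")["successful"].mean()
print(success_rate)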

    The second experiment involved an A/B test on all eligible users on the Ask Question page, integrated with the Ask Wizard. The aim was to validate the findings of the first experiment and assess Question Assistant’s utility for more experienced users.

    A consistent success rate increase of +12% was observed across both experiments. Given these significant and consistent findings, Question Assistant was made available to all Stack Overflow users on March 6, 2025.

    The next step forward

Adjusting direction is a common occurrence in research and early development. Recognizing when a path lacks impact and pivoting to a new approach is crucial for completing a project successfully. By combining traditional machine learning with Gemini, it was possible to integrate suggested indicator feedback with the question text, delivering specific, contextual, and actionable feedback that helps users improve their questions and streamlines their path to the knowledge they need. This represents progress in refining core Q&A flows to simplify asking, answering, and contributing knowledge. The Community Product teams are exploring further iterations on the indicator models and additional optimizations to the question-asking experience built on this feature.
