Is a secure AI assistant possible?

AI agents present significant risks. Even within a chat interface, Large Language Models (LLMs) are prone to errors and undesirable behavior. When these models gain access to external tools like web browsers and email, the repercussions of such mistakes escalate dramatically.

This inherent risk might explain why the first notable LLM-powered personal assistant emerged not from a large AI research institution, which typically faces concerns about public image and legal responsibility, but from an independent software engineer, Peter Steinberger. In November 2025, Steinberger released his tool, now known as OpenClaw, on GitHub, and by late January, it had gained widespread attention.

OpenClaw leverages existing LLMs, enabling users to develop customized assistants. For many, this involves entrusting significant amounts of personal data, including years of emails and entire hard drive contents. This practice has alarmed security specialists. The potential dangers associated with OpenClaw are so vast that a comprehensive review of all the security blog posts and articles discussing its vulnerabilities published recently would likely take a considerable amount of time. The Chinese government even issued a public warning regarding OpenClaw’s security flaws.

Addressing these worries, Steinberger stated on X that individuals without technical expertise should avoid using the software. Despite this caution, there is a strong demand for OpenClaw’s capabilities, extending beyond those capable of conducting their own software security assessments. AI companies aiming to enter the personal assistant market must develop systems that ensure user data safety and security. This will require adopting advanced strategies from agent security research.

Risk management

Essentially, OpenClaw functions as an advanced interface for LLMs. Users can select any LLM to operate as the core intelligence, which then acquires enhanced memory and the capacity to schedule and repeat tasks. Unlike agentic solutions from major AI firms, OpenClaw agents are designed for continuous operation, allowing users to interact via platforms like WhatsApp. This enables them to serve as highly capable personal assistants, managing daily tasks, planning trips, and even developing new applications.

However, such extensive power comes with significant implications. For an AI personal assistant to manage an inbox, it requires access to email accounts and all the sensitive data within them. To facilitate purchases, credit card details must be provided. Furthermore, for tasks like coding on a computer, the assistant needs a certain level of access to local files.

Several issues can arise from this. Firstly, an AI assistant could make an error, as seen when a Google Antigravity coding agent reportedly erased a user’s entire hard drive. Secondly, unauthorized individuals could exploit conventional hacking methods to access the agent, either to steal sensitive information or execute harmful code. Since OpenClaw’s rise in popularity, security researchers have exposed multiple vulnerabilities of this nature, endangering users unfamiliar with security practices.

Both types of risks are manageable. Some users opt to operate their OpenClaw agents on isolated computers or within cloud environments, safeguarding their hard drive data from accidental deletion. Other vulnerabilities could be addressed through established security protocols.

However, many experts highlight a more subtle security threat: prompt injection. This technique essentially hijacks an LLM. Attackers can manipulate an LLM by embedding malicious text or images on a website it might browse, or by sending them to an inbox it monitors, thereby compelling it to perform actions against its user’s intent.

If an LLM possesses access to a user’s private data, the ramifications could be severe. Nicolas Papernot, a professor of electrical and computer engineering at the University of Toronto, states, “Using something like OpenClaw is like giving your wallet to a stranger in the street.” The comfort level of major AI companies in providing personal assistants will likely depend on the robustness of their defenses against these types of attacks.

It is worth noting that prompt injection has not yet led to any publicly reported major incidents. However, with potentially hundreds of thousands of OpenClaw agents now active online, prompt injection could become a more attractive method for cybercriminals. Papernot suggests, “Tools like this are incentivizing malicious actors to attack a much broader population.”

Building guardrails

The concept of “prompt injection” was introduced by LLM blogger Simon Willison in 2022, shortly before ChatGPT’s public release. Even then, it was evident that LLMs would bring forth a novel category of security vulnerability as they became more prevalent. LLMs struggle to differentiate between user instructions and the data used to execute those instructions, such as emails or web search results; they perceive all as mere text. Consequently, if an attacker embeds malicious sentences in an email, and the LLM misinterprets them as a user command, the attacker can compel the LLM to perform desired actions.

Prompt injection remains a challenging issue with no immediate comprehensive solution. Dawn Song, a professor of computer science at UC Berkeley, notes that “A definitive solution is not yet available.” Nevertheless, a strong academic community is actively researching this problem, developing strategies that could eventually ensure the safety of AI personal assistants.

From a technical standpoint, OpenClaw can be used today without the risk of prompt injection by simply keeping it offline. However, this severely limits its utility, as an AI assistant’s primary functions often include email management, calendar organization, and online research. The challenge in defending against prompt injection lies in preventing the LLM from falling victim to hijacking attempts while still allowing it to perform its intended tasks.

One method involves training the LLM to disregard prompt injections. A key phase in LLM development, known as post-training, transforms a model capable of generating realistic text into a helpful assistant by “rewarding” appropriate responses and “punishing” failures. While these terms are metaphorical, the LLM learns from them. Through this process, an LLM can be taught to resist specific instances of prompt injection.

However, a balance must be struck: if an LLM is overly aggressive in rejecting injected commands, it might also dismiss legitimate user requests. Given the inherent randomness in LLM behavior, even models highly trained to resist prompt injection may occasionally falter.

A different strategy focuses on intercepting prompt injection attacks before they reach the LLM. This usually entails employing a dedicated detector LLM to identify prompt injections within the data intended for the primary LLM. Yet, a recent study revealed that even the most effective detectors entirely missed certain types of prompt injection attacks.

The third strategy is more intricate. Instead of scrutinizing LLM inputs for prompt injections, this method aims to establish policies that govern the LLM’s outputs and behaviors, preventing it from executing harmful actions. Some defenses are straightforward: for instance, if an LLM is restricted to emailing only pre-approved addresses, it cannot transmit a user’s credit card details to an attacker. However, such strict policies could hinder the LLM from performing many beneficial tasks, like researching and contacting professional connections for the user.

Neil Gong, a professor of electrical and computer engineering at Duke University, notes, “The challenge is how to accurately define those policies. It’s a trade-off between utility and security.”

More broadly, the agentic AI community grapples with this balance: when will agents achieve sufficient security to be truly valuable? Opinions diverge among experts. Dawn Song, whose company Virtue AI develops an agent security platform, believes that safely deploying an AI personal assistant is feasible today. Conversely, Gong asserts, “We’re not there yet.”

While complete protection against prompt injection for AI agents may not yet be achievable, methods exist to reduce the associated risks. Some of these techniques could potentially be integrated into OpenClaw. At the recent ClawCon event in San Francisco, Steinberger revealed that a security specialist had joined the team to enhance the tool’s safety.

Currently, OpenClaw still has vulnerabilities, yet this has not deterred its many eager users. George Pickett, a volunteer maintainer for the OpenClaw GitHub repository and an advocate for the tool, has implemented personal security measures: he operates it in the cloud to prevent accidental hard drive erasure and has established safeguards to restrict unauthorized access to his assistant.

However, Pickett has not taken specific steps to counter prompt injection. While acknowledging the risk, he notes that he has not encountered any reported instances of it affecting OpenClaw. He commented, “Perhaps my perspective is naive, but it’s unlikely that I’ll be the first one to be hacked.”

Latest Post

Verifying 5G Standalone Activation on Your iPhone

Hands on: the Galaxy S26 and S26 Plus are more of the same for more money

IronCurtain: A Secure AI Agent Designed to Prevent Rogue Actions

IronCurtain: A Secure AI Agent Designed to Prevent Rogue Actions

Project Genie: Experimenting with Infinite, Interactive Worlds

Text Generation Using Diffusion Models and ROI with LLMs

ChatGPT Mobile App Surpasses $3 Billion in Consumer Spending

Automate Your iPhone’s Always-On Display for Better Battery Life and Privacy

Creator Tayla Cannon Lands $1.1M Investment for Rebuildr PT Software

Latest Post

Verifying 5G Standalone Activation on Your iPhone

Hands on: the Galaxy S26 and S26 Plus are more of the same for more money

IronCurtain: A Secure AI Agent Designed to Prevent Rogue Actions

Latest Post

Is a secure AI assistant possible?

Risk management

Building guardrails

Related Posts