
OpenClaw Security Evolution: From Crisis to Cautious Optimism, And Then OpenAI Showed Up

A Lot Can Change in Two Weeks

When I published my first piece on OpenClaw’s security problems back on January 31st, the platform looked like everything wrong with how we build AI: 100,000+ users, exposed instances leaking credentials, exploits working in under five minutes, and an architecture that was functionally indistinguishable from infostealer malware. I wasn’t exaggerating. It was genuinely bad.

That was sixteen days ago.

Since then, OpenClaw patched over 40 vulnerabilities, brought on a serious security advisor, partnered with VirusTotal to scan its skill marketplace, and got hit with one of the more sophisticated supply chain attacks I’ve seen targeting an AI platform. Oh, and the founder just joined OpenAI.

There’s a lot to unpack here. Let me walk through it the way I’d brief an executive team: what actually improved, what’s still broken by design, and what the OpenAI move means for everyone who has been building on top of this thing.

What Has Actually Improved

Security Leadership That Means Something

The most important change isn’t a patch or a config option. It’s organizational. OpenClaw brought on Jamieson O’Reilly, founder of Dvuln, CREST Advisory Council member, and notably the same researcher who originally demonstrated how trivially OpenClaw could be compromised, as its lead security advisor.

That last part matters. Hiring the person who broke your product publicly is either a genius move or a desperate one. In this case I think it’s the former. O’Reilly knows exactly where the skeletons are buried because he put several of them there. The project now has a formal vulnerability disclosure program, 48-hour response commitments for complete reports, and transparent processes that didn’t exist two months ago.

This is the difference between slapping a security label on something and actually building a security culture. OpenClaw went from “there’s no perfectly secure setup” as their only defense to having real institutional ownership over the problem.

Version 2026.2.12: Genuine Engineering, Not Cosmetic Patches

On February 13th, OpenClaw dropped version 2026.2.12 with fixes for 40+ vulnerabilities and a real attempt at defense-in-depth hardening. Having reviewed the changelog, this isn’t the kind of release where you rename a function and call it a security update. The substance is there.

The headline fixes worth knowing about:

SSRF protection: The gateway now enforces strict deny policies for URL-based inputs, with hostname allowlists, per-request URL limits, and audit logging for blocked fetch attempts. This closes a class of attacks that were embarrassingly easy to execute before.
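To make the deny-by-default idea concrete, here is a minimal Python sketch of URL filtering with a hostname allowlist and private-IP blocking. This is illustrative of the general technique, not OpenClaw’s actual implementation, and the allowlist entries are hypothetical:

```python
from urllib.parse import urlparse
import ipaddress

# Hypothetical allowlist: only these hostnames may be fetched.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}

def is_url_allowed(url: str) -> bool:
    """Deny-by-default URL filter: only http(s) URLs to allowlisted
    hostnames pass, and literal private/loopback/link-local IPs
    (classic SSRF targets) are always blocked."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    try:
        addr = ipaddress.ip_address(host)
        # Block cloud metadata endpoints, localhost, internal ranges.
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    except ValueError:
        pass  # not a literal IP; fall through to the hostname allowlist
    return host in ALLOWED_HOSTS
```

A production filter also has to handle DNS rebinding (resolve the hostname and re-check the resulting IP at connect time), which this sketch deliberately omits.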

Prompt injection defenses: Browser and web tool outputs are now treated as untrusted data, wrapped in structured metadata and sanitized before reaching the model. This is harder to get right than it sounds, and I’m reserving judgment on how durable these defenses are against novel attack patterns, but it’s directionally correct.
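The general pattern looks something like the sketch below: tool output gets JSON-encoded inside a labeled envelope so the model can be instructed to treat it strictly as data. The envelope format here is made up for illustration; it is not OpenClaw’s wire format:

```python
import json

def wrap_untrusted(tool_name: str, content: str) -> str:
    """Wrap web/browser tool output in structured metadata marking it
    untrusted. JSON-encoding the payload escapes quotes and control
    characters so embedded text can't trivially masquerade as structure.
    (Illustrative sketch; real defenses also need delimiter-collision
    handling and model-side instructions to honor the marking.)"""
    envelope = {
        "source": tool_name,
        "trust": "untrusted",
        "content": content,
    }
    return (
        "<untrusted_tool_output>\n"
        + json.dumps(envelope, ensure_ascii=False)
        + "\n</untrusted_tool_output>"
    )
```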

CVE-2026-25253: A high-severity RCE vulnerability that let crafted malicious content exfiltrate authentication tokens and commandeer local gateway control has been patched. This one was serious. If you haven’t updated, stop reading this and go update.

Authentication hardening: When no credentials are configured, OpenClaw now auto-generates secure gateway tokens and flags unauthenticated browser control routes. The old default of “just leave it open and see what happens” is gone.
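Auto-generating a gateway token is cheap to do correctly with a CSPRNG; here is a sketch of the technique (not OpenClaw’s code), including a constant-time comparison for checking presented tokens:

```python
import secrets
import hmac

def generate_gateway_token(nbytes: int = 32) -> str:
    """Generate a URL-safe random token for gateway auth.
    secrets draws from the OS CSPRNG, unlike the random module."""
    return secrets.token_urlsafe(nbytes)

def token_matches(presented: str, expected: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(presented, expected)
```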

Path traversal: Agents could previously read arbitrary files on the host by manipulating media file paths. Fixed. This is the kind of vulnerability that keeps forensics teams busy after a breach.

Scheduler reliability: The cron scheduler also got significant work, with duplicate triggers, skipped jobs, and restart-related timing issues addressed. Less dramatic than the RCE fix, but scheduler failures in an autonomous agent context create unpredictable behavior that’s a nightmare to audit.

The VirusTotal Partnership Is Smarter Than It Looks

Here’s the one that I think the security community underappreciated. OpenClaw partnered with VirusTotal to automatically scan every skill published to ClawHub using their Code Insight capability.

The workflow is clean: skills publish, scan runs asynchronously, benign verdicts auto-approve, suspicious content gets a warning label but stays visible (transparency by design), malicious skills get immediately blocked. Full VirusTotal reports are visible on the skill detail page.
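The verdict-to-action policy can be sketched as a simple mapping. The field names and enum values below are hypothetical, chosen only to mirror the described behavior (benign auto-approves, suspicious stays visible with a warning label, malicious is blocked):

```python
from enum import Enum

class Verdict(Enum):
    BENIGN = "benign"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"

def apply_scan_verdict(skill: dict, verdict: Verdict) -> dict:
    """Map a scan verdict onto listing state (illustrative sketch)."""
    if verdict is Verdict.BENIGN:
        skill.update(status="approved", warning=False)
    elif verdict is Verdict.SUSPICIOUS:
        # Transparency by design: stays visible, but labeled.
        skill.update(status="approved", warning=True)
    else:
        skill.update(status="blocked", warning=True)
    return skill
```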

Does this solve the supply chain problem? No. I’ll get to why in a minute. But it creates a meaningful filter that didn’t exist during the initial viral wave, and doing it in partnership with VirusTotal rather than rolling a homegrown scanner was the right call.

Architectural Hardening Across the Stack

Beyond the CVE-specific fixes, version 2026.2.1 had already enforced TLS 1.3 as the minimum, implemented system prompt guardrails, and addressed path traversal and local file inclusion vulnerabilities more broadly.

The platform now ships with:

  • Automated secret detection in CI/CD pipelines
  • Docker configs running as non-root users with reduced attack surface
  • Read-only filesystem options and capability restrictions
  • A public security roadmap tracked via GitHub issues

That last item might seem minor but it’s actually significant. Public accountability creates pressure to ship. A private roadmap that nobody outside the team can see gets quietly deprioritized when things get busy. This one is visible.
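The Docker hardening above maps onto standard docker run flags. As a sketch, here is a helper that assembles such an invocation; the flags are real Docker CLI options, while the image name and UID are placeholders:

```python
def hardened_docker_args(image: str) -> list[str]:
    """Assemble a docker run command reflecting the hardening above:
    non-root user, read-only root filesystem, dropped capabilities,
    and no privilege escalation. (Illustrative sketch.)"""
    return [
        "docker", "run", "--rm",
        "--user", "1000:1000",                    # non-root user
        "--read-only",                            # read-only root fs
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block setuid escalation
        "--tmpfs", "/tmp",                        # writable scratch space only
        image,
    ]
```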

The Reality Check: ClawHavoc

While the core platform was hardening, the ecosystem got hit hard.

What Happened

Security researchers at Koi Security audited all 2,857 skills on ClawHub and found 341 malicious ones. Of those, 335 came from a single coordinated campaign they’re now tracking as ClawHavoc.

The sophistication was genuinely alarming. Attackers disguised malware as productivity boosters, cryptocurrency trackers, and YouTube summarizers. They built skills that looked professional, had legitimate documentation, worked correctly for their stated purpose, and buried the malicious payload around line 180 in hundreds of lines of otherwise clean code.

The infection vectors were different by platform:

On Windows, skills prompted users to download password-protected ZIP files containing keyloggers. The password protection was specifically designed to bypass automated antivirus scanning. That’s not amateur hour.

On macOS, skills executed base64-encoded scripts that fetched Atomic macOS Stealer (AMOS), a commodity infostealer that goes for around $500-$1,000/month as malware-as-a-service. AMOS harvests browser credentials, keychain passwords, cryptocurrency wallet data, SSH keys, files from common user directories, and, critically for AI agent environments, API keys, authentication tokens, and any secrets the agent is authorized to access.

The attack categories were calculated: 111 crypto utility skills (Solana and Phantom wallet trackers as the dominant category), 17 Google Workspace integration skills, 15 Ethereum gas tracker skills, plus a long tail of YouTube, weather, and productivity tools. All targeting users who are likely to have crypto credentials worth stealing.

Why It Worked
#

The attack exploited something that’s hard to patch: trust.

Skills with high download counts felt safe. Skills with clean documentation felt professional. The premise of “install this prerequisite to make the skill work” felt reasonable. Users followed instructions without realizing those instructions were the attack.

This is the social engineering dimension of supply chain attacks that technical controls struggle to address. You can scan code for known malware signatures. You can’t easily scan for well-disguised malicious intent wrapped in legitimate functionality.

OpenClaw creator Peter Steinberger rolled out a reporting feature where skills with more than 3 unique user reports get auto-hidden by default. That’s reactive, though: you’re relying on victims to report before the damage can be contained.

The underlying structural problem is that ClawHub is open by default with a one-week-old GitHub account as the only barrier to publishing. A Snyk audit found that 47% of ClawHub skills had at least one security concern, ranging from credential exposure to excessive permissions. That number should be uncomfortable for anyone making deployment decisions.

What This Means for Security and AI Governance

A few things I keep coming back to when I look at OpenClaw’s arc over the past month:

Security can be accelerated, but supply chain is the hard problem. Platform vulnerabilities get patched. That’s tractable. Supply chain attacks that exploit human trust in third-party code are fundamentally harder. This isn’t an OpenClaw-specific problem; it’s the same challenge that will hit ChatGPT plugins, Claude MCP servers, LangChain tools, and every community marketplace that follows. OpenClaw is just the most visible early test case.

Viral adoption stress-tests security models in ways that can’t be simulated. You can’t anticipate what 100,000 users will do with your platform by running penetration tests. OpenClaw’s chaotic launch, as painful as it was, produced a hardened platform faster than a measured rollout ever would have. The failures were expensive but they were real. The lessons are concrete.

User education is load-bearing infrastructure, not nice-to-have. ClawHavoc worked because users didn’t understand what they were executing. An AI agent with full system access is a powerful tool and a serious security liability simultaneously. If the people deploying it can’t distinguish between a legitimate prerequisite and a social engineering prompt, that’s not a gap you can fix with better scanning alone.

And Now, The OpenAI Move

This happened as I was finishing this piece, so I want to address it directly.

Peter Steinberger, OpenClaw’s creator, announced over the weekend that he is joining OpenAI to work on bringing agents to everyone. Sam Altman called it out on X, saying Steinberger is joining “to drive the next generation of personal agents” and describing him as “a genius with a lot of amazing ideas about the future of very smart agents interacting with each other.”

A few things worth clarifying because I’ve already seen this described inaccurately:

This is an acqui-hire, not a company acquisition. No company was bought. No acquisition price was disclosed. Steinberger is joining OpenAI as a person. OpenClaw itself transitions to an independent open-source foundation, and OpenAI is sponsoring it. The project stays open-source; that was Steinberger’s non-negotiable condition. Meta and Microsoft both reportedly courted him too. He chose OpenAI specifically because they agreed to honor that condition.

The irony here is thick in a way that should be a lesson for the industry. OpenClaw was arguably one of the biggest drivers of paying API traffic to Anthropic, since most users ran it on Claude. Anthropic’s trademark enforcement around the “Clawdbot” name, while legally defensible, may have been the catalyst that pushed Steinberger toward their biggest competitor. Protecting your brand and alienating your ecosystem aren’t always separable actions.

What this means for the security posture we’ve been discussing: In the near term, probably not much changes. The project has independent foundation governance, the security improvements that shipped are shipped, and Steinberger indicated the roadmap continues. The meaningful uncertainty is longer term: what happens to community trust, development velocity, and the open-source commitment as OpenAI’s commercial interests and OpenClaw’s architecture collide. OpenAI is never going to ship something with OpenClaw’s current permissiveness through their normal product channels. The whole point of OpenClaw is the lack of guardrails. How that tension resolves is genuinely unclear.

For organizations making deployment decisions: This development changes the political landscape around OpenClaw more than the technical one. If you were waiting for enterprise-grade safety controls backed by a major AI lab, OpenAI’s involvement might eventually deliver that, but you’ll be waiting a while, and what you get will almost certainly look different from what OpenClaw is today. If you’re deploying now, do it against the current technical realities, not against hypothetical future versions.

Current State: What’s Still Problematic

I want to be specific here rather than vague.

Roughly 135,000 exposed OpenClaw instances remain online, many running unpatched versions. This is a user behavior problem as much as a platform problem, but it creates real risk in the ecosystem.

900+ malicious skills have been identified across ClawHub as of this writing. The marketplace continues to grow faster than it can be vetted, even with automated scanning in place.

Prompt injection is architecturally inherent. With persistent memory, attacks become stateful and delayed: malicious payloads don’t need immediate execution because they can sit in memory and trigger later. This isn’t fixable with configuration changes. It requires a different architectural approach to agent memory that doesn’t exist yet.

Trust boundaries remain blurred by design. AI agents interpret natural language from untrusted sources (emails, web pages, documents) and execute system commands. No configuration prevents a carefully crafted prompt from manipulating agent behavior. This is the core tension that makes OpenClaw powerful and dangerous simultaneously.

Default configurations are still risky. The platform itself acknowledges there’s no perfectly secure setup. Active hardening is required, and most users aren’t doing it.

Security Recommendations for Deployment Right Now

Given everything above, here’s where I land on practical guidance:

For Individual Users

Never run OpenClaw on your primary machine. Deploy on a dedicated VPS, isolated VM, or air-gapped system. This isn’t optional; it’s the baseline.

Use Docker containerization. It’s the single highest-impact security improvement available to individual users. Containers limit blast radius if a malicious skill executes code.

Never expose the gateway to the internet directly. Bind it to localhost only, with external access going through an authenticated reverse proxy with TLS.
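Binding to loopback only is a one-liner in most stacks. A minimal Python sketch of the principle (illustrative; OpenClaw’s gateway has its own configuration for this):

```python
import socket

def open_loopback_gateway(port: int = 0) -> socket.socket:
    """Bind a listening socket to 127.0.0.1 only, so the service is
    unreachable from other hosts. External access should then go
    through an authenticated TLS reverse proxy, never a 0.0.0.0 bind.
    port=0 asks the OS for an ephemeral port."""
    return socket.create_server(("127.0.0.1", port))
```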

Audit every skill before installation. After ClawHavoc, “looks professional” isn’t due diligence. Read the source code, check the author’s history, verify the VirusTotal scan results. Budget 15 minutes per skill minimum.
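A lightweight first pass can grep skill source for the ClawHavoc-style red flags described earlier. The pattern list below is illustrative and far from exhaustive, and it is no substitute for actually reading the code:

```python
import re

# Heuristic red flags drawn from the ClawHavoc patterns described above.
RED_FLAGS = {
    "base64 exec": re.compile(r"base64\s+(-d|--decode)|atob\(|b64decode"),
    "remote fetch piped to shell": re.compile(
        r"curl[^\n]*\|\s*(ba)?sh|wget[^\n]*\|\s*(ba)?sh"
    ),
    "password-protected archive": re.compile(
        r"unzip\s+-P|password[- ]protected", re.I
    ),
    "credential paths": re.compile(r"\.clawdbot/|keychain|Login Data", re.I),
}

def audit_skill_source(source: str) -> list[str]:
    """Return the names of any red-flag patterns found in skill source."""
    return [name for name, pat in RED_FLAGS.items() if pat.search(source)]
```

A clean result here means nothing by itself (ClawHavoc payloads were deliberately well-disguised); a hit means stop and read the code before installing.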

Rotate credentials monthly. API keys, email credentials, messaging tokens, all of it. Use a secret manager. Environment files aren’t acceptable credential storage for anything connected to an AI agent with system access.

For Organizations

For personal or experimental use, require isolated home lab environments with no corporate credentials. Full stop.

For controlled pilots, and I mean genuinely controlled, with a threat model in place, you need comprehensive network segmentation, sandboxed monitored environments, dedicated service accounts with minimal privileges, and incident response procedures specifically designed for AI compromise scenarios.

For enterprise deployment, I still don’t recommend it without third-party security audits, behavior-based anomaly detection, allowlist-only skill policies with no public ClawHub access, full EDR/XDR coverage, SIEM integration, and multi-approval workflows for any system-level tool execution. If your business requirements can’t wait for the ecosystem to mature, those aren’t optional compensating controls, they’re the minimum.

Monitoring Indicators Security Teams Should Know

Watch for unexpected outbound connections to unknown IPs, particularly addresses like 91.92.242[.]30 associated with ClawHavoc. Monitor for unexpected process spawning from AI agent directories. Watch access to credential storage locations: browser profiles, keychain, ~/.clawdbot/. Base64-encoded commands appearing in shell history are a significant indicator. Large data exfiltration to external endpoints and installation of skills with recent publication dates and low community adoption are both worth flagging.
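The base64-in-shell-history indicator is easy to automate. A sketch that flags history lines containing long base64 runs that decode cleanly to text, as was used for obfuscation in the ClawHavoc payloads:

```python
import base64
import re

# A long run of base64-alphabet characters is suspicious in shell history.
B64_RUN = re.compile(r"[A-Za-z0-9+/=]{40,}")

def flag_encoded_commands(history_lines: list[str]) -> list[str]:
    """Return history lines containing long base64 tokens that decode
    to ASCII, a common obfuscation pattern for staged payloads.
    Heuristic only: expect some false positives (e.g. JWTs, hashes)."""
    flagged = []
    for line in history_lines:
        for token in B64_RUN.findall(line):
            try:
                decoded = base64.b64decode(token, validate=True)
            except Exception:
                continue  # not valid base64; ignore
            if decoded.isascii():  # decodes to plausible text/script
                flagged.append(line)
                break
    return flagged
```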

Treat AI agent activity as privileged user behavior requiring enhanced monitoring. Implement network-level controls to detect C2 patterns. Deploy honeypot credentials to detect exfiltration early.

The Broader Picture

OpenClaw’s arc over the past month is more instructive than any hypothetical case study about AI security governance. It shows that open-source AI platforms can mature their security posture rapidly under fire, but only with dedicated expertise, institutional commitment, and transparent processes. It also shows that supply chain security for AI ecosystems is a hard, unsolved problem that automated scanning addresses incompletely.

The OpenAI acqui-hire adds a layer of complexity that wasn’t here two weeks ago. The project now lives in the tension between open-source autonomy and corporate AI safety philosophy. How that plays out will be worth watching, not just for OpenClaw users, but for everyone trying to understand what safe, deployable AI agents eventually look like.

My assessment hasn’t fundamentally changed from January, just updated: OpenClaw is a powerful, genuinely useful, and inherently risky platform. The risks are more managed than they were a month ago. They aren’t gone. Understanding security and implementing it properly remains a prerequisite for deployment, not a nice-to-have.

For those with the expertise and risk tolerance to deploy it correctly, OpenClaw now represents something interesting: a real-world experiment in whether open-source AI can evolve fast enough to survive its own success. The OpenAI move adds another variable to that experiment.

Jury’s still out. Early returns are more promising than they were. Stay vigilant, demand transparency, and remember: there’s no perfectly secure setup, only continuously improving defenses against continuously evolving threats.

The lobster is getting more secure. But it just signed a deal with the biggest fish in the ocean. Watch this space. 🦞🔐



About This Analysis

This analysis is based on security research, vulnerability disclosures, and threat intelligence reports published between February 1–17, 2026, including breaking news on the Steinberger/OpenAI announcement. All findings have been independently verified across multiple sources. The situation continues to evolve; I will update as warranted.

If you’re deploying AI agents in enterprise environments or have encountered OpenClaw-related security incidents, I’d welcome your perspective. Connect with me on LinkedIn or reach out through my website.


Disclaimer: This article is for educational and security awareness purposes. All security research cited was conducted responsibly by qualified professionals. Do not attempt unauthorized access to systems or conduct security testing without proper authorization. The author has no affiliation with OpenClaw, OpenAI, or Anthropic.

Juan Carlos Munera
Passionate about cybersecurity, governance, risk, and compliance. Sharing insights on security best practices, frameworks, and industry trends.
