Author: openx_editor

  • Why AI Agent Demos Impress but Production Disappoints: The Three Disciplines Enterprises Are Learning

    You’ve seen the demos. AI agents that handle customer inquiries, process refunds, and schedule appointments with superhuman efficiency. But behind the glossy presentations lies a sobering reality: most AI agent deployments fail to deliver on their promise in production environments.

    Getting AI agents to perform reliably outside of controlled demonstrations is turning out to be harder than enterprises anticipated. Fragmented data, unclear workflows, and runaway escalation rates are slowing deployments across industries. The technology itself often works well in demonstrations; the challenge begins when it’s asked to operate inside the complexity of a real organization.

    The Three Disciplines of Production AI

    Creatio, a company that’s been deploying AI agents for enterprise customers, has developed a methodology built around three core disciplines:

    • Data virtualization to work around data lake delays
    • Agent dashboards and KPIs as a management layer
    • Tightly bounded use-case loops to drive toward high autonomy

    In simpler use cases, these practices have enabled agents to handle 80-90% of tasks autonomously. With further tuning, Creatio estimates they could support autonomous resolution in at least half of more complex deployments.

    Why Agents Keep Failing

    The obstacles are numerous. Enterprises eager to adopt agentic AI often run into significant bottlenecks around data architecture, integration, monitoring, security, and workflow design.

    The data problem is almost always first. Enterprise information rarely exists in a neat or unified form: it’s spread across SaaS platforms, apps, internal databases, and other data stores. Some is structured, some isn’t. But even when enterprises overcome the data retrieval problem, integration becomes a major challenge.

    Agents rely on APIs and automation hooks to interact with applications, but many enterprise systems were designed before this kind of autonomous interaction was even conceived. This results in incomplete or inconsistent APIs, and systems that respond unpredictably when accessed programmatically.

    Perhaps most fundamentally, organizations attempt to automate processes that were never formally defined. As one analyst noted, many business workflows depend on tacit knowledge: the kind of exceptions that employees handle intuitively without explicit instructions. Those missing rules become startlingly obvious when workflows are translated into automation logic.

    The Tuning Loop That Actually Works

    Creatio deploys agents in a bounded scope with clear guardrails, followed by an explicit tuning and validation phase. The loop typically follows this pattern:

    Design-time tuning (before go-live): Performance is improved through prompt engineering, context wrapping, role definitions, workflow design, and grounding in data and documents.

    Human-in-the-loop correction (during execution): Developers approve, edit, or resolve exceptions. In the scenarios where humans have to intervene most frequently (escalations and approvals), users establish stronger rules, provide more context, update workflow steps, or narrow tool access.

    Ongoing optimization (after go-live): Teams continue to monitor exception rates and outcomes, then tune repeatedly as needed, helping improve accuracy and autonomy over time.
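
    The post-go-live phase can be pictured as a simple monitoring rule: track each agent’s escalation rate and flag it for another tuning pass when the rate crosses a threshold. A minimal sketch, with all names and thresholds invented for illustration rather than taken from Creatio’s implementation:

```python
from dataclasses import dataclass

@dataclass
class AgentStats:
    handled: int = 0     # tasks completed autonomously
    escalated: int = 0   # tasks handed off to a human

    @property
    def exception_rate(self) -> float:
        total = self.handled + self.escalated
        return self.escalated / total if total else 0.0

def agents_needing_tuning(stats: dict[str, AgentStats], threshold: float = 0.2) -> list[str]:
    """Flag agents whose escalation rate exceeds the tuning threshold."""
    return [name for name, s in stats.items() if s.exception_rate > threshold]

stats = {
    "refund_agent": AgentStats(handled=90, escalated=10),      # 10%: leave alone
    "onboarding_agent": AgentStats(handled=60, escalated=40),  # 40%: tune again
}
print(agents_needing_tuning(stats))  # -> ['onboarding_agent']
```

    In practice the flagged agents would then re-enter the design-time tuning step above: stronger rules, more context, or narrower tool access.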

    Retrieval-augmented generation (RAG) grounds agents in enterprise knowledge bases, CRM data, and proprietary sources. The feedback loop puts extra emphasis on intermediate checkpoints: humans review artifacts such as summaries, extracted facts, or draft recommendations and correct errors before they propagate.
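
    The shape of that checkpointed loop can be sketched in a few lines. Here retrieve() is a deliberately naive keyword ranker standing in for a real RAG retriever, the draft string stands in for a grounded model output, and review() is the human gate; none of these names come from any actual product:

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def run_with_checkpoint(query: str, docs: list[str], review) -> str:
    context = retrieve(query, docs, k=1)
    draft = f"Answer grounded in: {context[0]}"   # would come from the model
    return review(draft)   # human approves or edits before errors propagate

docs = [
    "refund policy allows returns within 30 days",
    "shipping normally takes 5 business days",
]
final = run_with_checkpoint("what is the refund policy", docs, review=lambda d: d)
print(final)
```

    The key design point is that the checkpoint sits on an intermediate artifact (the draft), not only on the final action.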

    Data Readiness Without the Overhaul

    “Is my data ready?” is a common early question. Enterprises know data access is important but can be put off by massive data consolidation projects. Virtual connections, however, can give agents access to underlying systems without requiring enterprises to move everything into a central data lake.

    One approach pulls data into a virtual object, processes it, and uses it like a standard object for UIs and workflows, with no need to persist or duplicate large volumes of data. This technique is particularly valuable in banking, where transaction volumes are simply too large to copy into CRM but are still valuable for AI analysis and triggers.
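
    The virtual-object idea reduces to a thin wrapper that fetches rows on demand instead of copying them. A hedged sketch, with fetch_transactions() standing in for a live connector to a hypothetical core banking system:

```python
class VirtualObject:
    """Serve rows from a source system on demand; nothing is persisted locally."""

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn   # live connector (API, JDBC, etc.)

    def query(self, **filters):
        rows = self._fetch()     # fresh data every call, never duplicated
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

# Toy connector standing in for the underlying system of record.
def fetch_transactions():
    return [
        {"account": "A1", "amount": 120, "type": "wire"},
        {"account": "A2", "amount": 45, "type": "card"},
    ]

txns = VirtualObject(fetch_transactions)
print(txns.query(type="wire"))   # usable like a standard object in workflows
```
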

    Matching Agents to the Work

    Not all workflows are equally suited for autonomous agents. The best fits are high-volume processes with clear structure and controllable risk: document intake and validation in onboarding, loan preparation, and standardized outreach like renewals and referrals.

    Financial institutions provide a compelling example. Commercial lending teams and wealth management typically operate in silos, with no one looking across departments. An autonomous agent can identify commercial customers who might be good candidates for wealth management or advisory services, something no human is actively doing at most banks. Companies that have applied agents to this scenario claim significant incremental revenue benefits.

    In regulated industries, longer-context agents aren’t just preferable; they’re necessary. For multi-step tasks like gathering evidence across systems, summarizing, comparing, drafting communications, and producing auditable rationales, the agent isn’t giving you a response immediately; it may take hours or days to complete a full end-to-end task.

    This requires orchestrated agentic execution rather than a single giant prompt. The approach breaks work into deterministic steps performed by sub-agents, with memory and context management maintained across various steps and time intervals.
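
    One way to picture that orchestration: deterministic step functions run in order by sub-agents, passing a shared memory object that a real system would persist across hours or days. A toy sketch with invented step names:

```python
def gather_evidence(memory: dict) -> None:
    memory["evidence"] = ["doc-1", "doc-2"]          # pulled from source systems

def summarize(memory: dict) -> None:
    memory["summary"] = f"{len(memory['evidence'])} documents reviewed"

def draft_rationale(memory: dict) -> None:
    memory["rationale"] = f"Decision based on: {memory['summary']}"

def run_pipeline(steps) -> dict:
    memory: dict = {}     # a real agent persists this between steps and sessions
    for step in steps:    # deterministic order, each step owned by a sub-agent
        step(memory)
    return memory

result = run_pipeline([gather_evidence, summarize, draft_rationale])
print(result["rationale"])  # -> Decision based on: 2 documents reviewed
```

    The point of the structure is auditability: every intermediate artifact lands in memory, so each step’s output can be reviewed or replayed.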

    The Digital Worker Model

    Once deployed, agents are monitored with dashboards providing performance analytics, conversion insights, and auditability. Essentially, agents are treated like digital workers with their own management layer and KPIs.

    Users see a dashboard of agents in use and each of their processes, workflows, and executed results. They can drill down into individual records showing step-by-step execution logs and related communications, supporting traceability, debugging, and agent tweaking.
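
    The management layer boils down to structured step logs plus KPIs derived from them. A minimal sketch of that audit trail (field names are illustrative, not any vendor’s schema):

```python
import time

log: list[dict] = []   # the execution trail behind the dashboard

def record_step(agent: str, step: str, status: str) -> None:
    log.append({"agent": agent, "step": step, "status": status, "ts": time.time()})

def success_rate(agent: str) -> float:
    """A per-agent KPI computed directly from the step log."""
    steps = [e for e in log if e["agent"] == agent]
    ok = sum(e["status"] == "ok" for e in steps)
    return ok / len(steps) if steps else 0.0

record_step("renewal_agent", "draft_email", "ok")
record_step("renewal_agent", "send_email", "error")
print(success_rate("renewal_agent"))  # -> 0.5
```
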

    2026 is shaping up to be the year enterprise AI moves from impressive demos to reliable production systems鈥攂ut only for organizations willing to invest the time in proper training and tuning.

  • Beyond LLMs: The Three Architectural Approaches Teaching AI to Understand Physics

    Large language models excel at writing poetry and debugging code, but ask them to predict what happens when you drop a ball and you’ll quickly discover their limitations. Despite mastering chess, generating art, and passing bar exams, today’s most powerful AI systems fundamentally don’t understand physics.

    This gap is becoming increasingly apparent as companies try to deploy AI in robotics, autonomous vehicles, and manufacturing. The solution? World models: internal simulators that let AI systems safely test hypotheses before taking physical action. And investors are paying attention: AMI Labs raised a billion-dollar seed round, while World Labs secured funding from backers including Nvidia and AMD.

    The Problem with Next-Token Prediction

    LLMs work by predicting the next token in a sequence. This approach has been remarkably successful for text, but it has a critical flaw when applied to physical tasks. These models cannot reliably predict the physical consequences of real-world actions, according to AI researchers.

    Turing Award recipient Richard Sutton warned that LLMs just mimic what people say instead of modeling the world, which limits their capacity to learn from experience. DeepMind CEO Demis Hassabis calls this “jagged intelligence”: AI that can solve complex math olympiad problems but fails at basic physics.

    The industry is responding with three distinct architectural approaches, each with different tradeoffs.

    1. JEPA: Learning Abstract Representations

    The Joint Embedding Predictive Architecture, endorsed by AMI Labs and pioneered by Yann LeCun, takes a fundamentally different approach. Instead of trying to predict what the next video frame will look like at the pixel level, JEPA models learn a smaller set of abstract, or latent, features.

    Think about how humans actually observe the world. When you watch a car driving down a street, you track its trajectory and speed; you don’t calculate the exact reflection of light on every leaf in the background. JEPA models reproduce this cognitive shortcut.

    The benefits are substantial: JEPA models are highly compute and memory efficient, require fewer training examples, and run with significantly lower latency. These characteristics make the architecture suitable for applications where real-time inference is non-negotiable: robotics, self-driving cars, high-stakes enterprise workflows.
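
    The contrast with pixel-level prediction can be made concrete with a toy: instead of regenerating every “pixel” of the next frame, predict only a small latent state (position and velocity). The hand-written encoder and dynamics rule below illustrate the idea only; they are not the actual JEPA architecture, whose encoder and predictor are both learned:

```python
def encode(frame: list[int]) -> tuple[int, int]:
    """Collapse a 1-D 'frame' into a latent: (object position, assumed velocity)."""
    return (frame.index(1), 1)   # the object is marked by a 1; unit velocity

def predict_latent(latent: tuple[int, int]) -> tuple[int, int]:
    """Predict the next latent state only; no pixels are ever generated."""
    pos, vel = latent
    return (pos + vel, vel)

frame = [0, 0, 1, 0, 0]                 # a 5-'pixel' scene with one object
print(predict_latent(encode(frame)))    # tracks the trajectory, ignores the rest
```

    Because the prediction lives in the small latent space, each step is far cheaper than regenerating the full frame, which is the source of the efficiency claims above.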

    Healthcare company Nabla is already using this architecture to simulate operational complexity in fast-paced medical settings, reducing cognitive load for healthcare workers.

    2. Gaussian Splats: Building Spatial Worlds

    The second approach, adopted by World Labs led by AI pioneer Fei-Fei Li, uses generative models to build complete 3D spatial environments. The process takes an initial prompt, either an image or a textual description, and uses a generative model to create a 3D Gaussian splat.

    A Gaussian splat represents 3D scenes using millions of tiny mathematical particles that define geometry and lighting. Unlike flat video generation, these 3D representations can be imported directly into standard physics and 3D engines like Unreal Engine, where users and AI agents can freely navigate and interact from any angle.
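
    At the data level, a splat scene is just a very large collection of particles. A simplified sketch of one particle’s fields; real splats use full 3x3 covariance matrices and spherical-harmonic color, which are reduced here to per-axis scales and flat RGB:

```python
from dataclasses import dataclass

@dataclass
class Splat:
    position: tuple[float, float, float]
    scale: tuple[float, float, float]   # per-axis extent of the Gaussian
    color: tuple[float, float, float]   # RGB in [0, 1]
    opacity: float                      # alpha in [0, 1]

# A real scene holds millions of these particles; two suffice for a sketch.
scene = [
    Splat((0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 0.0, 0.0), 0.9),
    Splat((1.0, 0.5, 0.2), (0.2, 0.1, 0.3), (0.0, 1.0, 0.0), 0.7),
]
print(len(scene))
```

    Because the representation is explicit geometry rather than generated frames, a list like this can be handed to a renderer or physics engine and viewed from any angle.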

    World Labs founder Fei-Fei Li describes LLMs as “wordsmiths in the dark”: possessing flowery language but lacking spatial intelligence and physical experience. The company’s Marble model aims to give AI that missing spatial awareness.

    Industrial design giant Autodesk has backed World Labs heavily, planning to integrate these models into their design applications. The approach has massive potential for spatial computing, interactive entertainment, and building training environments for robotics.

    3. End-to-End Generation: Physics Native

    The third approach uses an end-to-end generative model that continuously generates the scene, physical dynamics, and reactions on the fly. Rather than exporting to an external physics engine, the model itself acts as the engine.

    DeepMind’s Genie 3 and Nvidia’s Cosmos fall into this category. These models ingest an initial prompt alongside continuous user actions and generate subsequent environment frames in real-time, calculating physics, lighting, and object reactions natively.
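
    The generation loop itself has a simple shape: previous frame plus user action in, next frame out, with the “physics” computed inside the model rather than in an external engine. A toy stand-in with invented one-object dynamics:

```python
def generate_frame(prev: dict, action: str) -> dict:
    """Toy stand-in for the model: one object, a push action, gravity-like pull."""
    x, v = prev["x"], prev["v"]
    if action == "push":
        v += 2          # the user's action feeds into the generated dynamics
    v -= 1              # 'physics' computed natively, no external engine
    return {"x": x + v, "v": v}

frame = {"x": 0, "v": 0}
for action in ["push", "push", "wait"]:   # continuous stream of user actions
    frame = generate_frame(frame, action)
print(frame)  # -> {'x': 4, 'v': 1}
```

    In the real systems, generate_frame is a large generative model emitting full video frames, which is exactly why the compute cost discussed below is so high.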

    The compute cost is substantial: continuously rendering physics and pixels simultaneously requires significant resources. But the investment enables synthetic data factories that can generate infinite interactive experiences and massive volumes of synthetic training data.

    Nvidia Cosmos uses this architecture to scale synthetic data and physical AI reasoning. Waymo built its world model on Genie 3 for training self-driving cars, synthesizing rare, dangerous edge-case conditions without the cost or risk of physical testing.

    The Hybrid Future

    LLMs will continue serving as the reasoning and communication interface, but world models are positioning themselves as foundational infrastructure for physical and spatial data pipelines. We’re already seeing hybrid architectures emerge.

    Cybersecurity startup DeepTempo recently developed LogLM, integrating LLMs with JEPA elements to detect anomalies and cyber threats from security logs. The boundary between AI that thinks and AI that understands the physical world is beginning to dissolve.

    As world models mature, expect AI systems that can not only tell you how to change a tire, but actually understand what happens when you apply torque to a rusted bolt. The physical world is finally coming into focus for artificial intelligence.

  • Hermes Agent: The Self-Improving AI Agent That Learns From Every Conversation

    Artificial intelligence agents are everywhere these days, but most of them share a fundamental limitation: they don’t really learn from their experiences. You have the same conversation with them repeatedly, and they never get better. Nous Research aims to change that with Hermes Agent, a new open-source project that bills itself as “the agent that grows with you.”

    A Memory That Actually Remembers

    Traditional AI assistants treat every conversation as a clean slate. Hermes takes a fundamentally different approach. It maintains persistent memory across sessions, creating skills from experience and improving them during use. The agent nudges itself to retain knowledge, searches through past conversations, and builds a deepening model of who you are over time.

    “The only agent with a built-in learning loop,” as the project describes itself, goes beyond simple context windows. While conventional agents can only work with what you tell them in the current session, Hermes actively works to preserve and apply knowledge from previous interactions. That customer you mentioned last week? Hermes remembers. That preference you expressed months ago? It’s still there.
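
    The mechanism behind that kind of recall can be sketched as a store that survives restarts and supports keyword lookup. This shows only the minimal shape of the idea; Hermes’s actual storage, summarization, and retrieval are not shown here:

```python
import json, os, tempfile

class Memory:
    """Facts persisted to disk so later sessions can reload and search them."""

    def __init__(self, path: str):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)
        else:
            self.facts = []

    def remember(self, fact: str) -> None:
        self.facts.append(fact)
        with open(self.path, "w") as f:
            json.dump(self.facts, f)   # survives process restarts

    def recall(self, keyword: str) -> list[str]:
        return [fact for fact in self.facts if keyword.lower() in fact.lower()]

path = os.path.join(tempfile.mkdtemp(), "memory.json")
session1 = Memory(path)
session1.remember("Customer Acme prefers email over calls")

session2 = Memory(path)          # a later session reloads the same store
print(session2.recall("acme"))
```
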

    Works Everywhere You Do

    One of Hermes’s standout features is its multi-platform support. You can interact with it through Telegram, Discord, Slack, WhatsApp, Signal, or a traditional CLI, all from a single gateway process. Voice memo transcription and cross-platform conversation continuity mean you can start a conversation on your phone and continue it on your desktop without missing a beat.

    The agent runs on a VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. With Daytona and Modal, the agent’s environment hibernates when idle and wakes on demand. This means you get persistent assistance without persistent costs.

    Model Flexibility Without Lock-In

    Hermes doesn’t force you into a single AI provider. You can use Nous Portal, OpenRouter (with access to 200+ models), z.ai/GLM, Kimi/Moonshot, MiniMax, OpenAI, or your own endpoint. Switching models is as simple as running the model command; no code changes, no lock-in.

    This flexibility is particularly valuable for developers who want to experiment with different models for different tasks, or organizations that need to balance cost and performance across use cases.

    The Skills System

    Hermes includes a sophisticated skills system that allows the agent to create procedural memories and improve them autonomously. After completing complex tasks, the agent can create new skills that encapsulate what it learned. These skills then self-improve during subsequent use.

    The system uses FTS5 session search with LLM summarization for cross-session recall, and is compatible with the agentskills.io open standard. There’s also a Skills Hub where users can share and discover community-created skills.
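
    An FTS5-backed session search can be sketched with SQLite’s full-text extension, which ships with most Python builds. The schema, table names, and example rows below are illustrative, not Hermes’s actual ones:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, content)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("s1", "discussed the quarterly report deadline with the client"),
        ("s2", "debugged the payment webhook retry logic"),
    ],
)

def recall(query: str) -> list[str]:
    """Return session ids whose content matches the full-text query."""
    rows = db.execute(
        "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [r[0] for r in rows]

print(recall("webhook"))  # -> ['s2']
```

    In a full system, the matched sessions would then be summarized by an LLM before being fed back into the agent’s context, which is the other half of the recall pipeline described above.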

    Research-Ready Architecture

    For AI researchers, Hermes offers batch trajectory generation, Atropos RL environments, and trajectory compression for training the next generation of tool-calling models. The project was built by Nous Research, the team behind several notable open-source AI projects.

    The installation process is straightforward鈥攔un a single curl command and you’re chatting with your new AI assistant in minutes. Windows users need WSL2, but Linux and macOS are supported natively.

    Migration from OpenClaw

    Interesting twist: Hermes can automatically import settings from OpenClaw, including persona files, memories, skills, API keys, and messaging configurations. If you’re already running an AI assistant setup, moving to Hermes is designed to be painless.

    With over 12,000 stars on GitHub, Hermes represents an interesting evolution in the AI agent space. Instead of just providing a static set of capabilities, it attempts to create a genuinely learning system, one that gets better at helping you specifically, over time.

    The MIT-licensed project welcomes contributions and has an active Discord community for support and discussion. Whether you’re an individual looking for a more personal AI assistant or an enterprise exploring agentic workflows, Hermes offers a compelling combination of memory, flexibility, and self-improvement that sets it apart in a crowded agent space.