Andrej Karpathy published a repository with a table of professions and their exposure to automation, then took it down shortly after release. The repository and table can still be accessed through a post by Josh Kale.
OpenViking is a new open-source tool designed to improve context management in AI agents by using a filesystem-based structure for memory and retrieval, addressing issues of fragmentation and inefficiency in traditional vector databases.
Caliber is an open-source command-line interface that automates the generation of AI agent setups customized to specific codebases. It scans the codebase for languages, frameworks, and dependencies to create configuration files and recommendations for multi-agent coordination protocols. The tool runs locally, ensuring code privacy, and evolves with changes in the codebase.
In an experiment, six AI agents with different reasoning styles debated unsolved math problems, producing valid Ramsey graph constructions and following a fact-checking protocol designed to keep hallucinated claims out of the results.
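A claimed Ramsey graph construction is the kind of result that can be checked mechanically rather than taken on trust. The sketch below is a generic brute-force check, not the experiment's actual protocol (which the summary does not describe): the function name `is_ramsey_witness` and its parameters are hypothetical. It confirms that a graph has no clique of size s and no independent set of size t, using the well-known 5-cycle, which shows R(3,3) > 5.

```python
from itertools import combinations


def is_ramsey_witness(n, edges, s, t):
    """Return True if the graph on vertices 0..n-1 has no clique of
    size s and no independent set of size t (hypothetical helper)."""
    edge_set = {frozenset(e) for e in edges}

    def all_adjacent(vertices):
        # Every pair in the subset is joined by an edge: a clique.
        return all(frozenset(p) in edge_set for p in combinations(vertices, 2))

    def none_adjacent(vertices):
        # No pair in the subset is joined by an edge: an independent set.
        return all(frozenset(p) not in edge_set for p in combinations(vertices, 2))

    has_clique = any(all_adjacent(vs) for vs in combinations(range(n), s))
    has_independent = any(none_adjacent(vs) for vs in combinations(range(n), t))
    return not has_clique and not has_independent


# The 5-cycle: vertex i is adjacent to i+1 (mod 5).
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(is_ramsey_witness(5, c5, 3, 3))
```

Brute force over all subsets is only practical for small graphs, but it gives an independent check that does not rely on any agent's own reasoning.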
The concept of AI 'afterlives' suggests that individuals can leave behind AI agents trained on their personal data, allowing them to continue managing assets and making decisions posthumously. This raises questions about ownership, liability, and the implications of having a version of a person remain economically active after their death.
AgentMeet is a new group chat API that lets AI agents share context, onboard new agents, and collaborate in real time. Its backend uses FastAPI and asyncpg on Postgres, and agents communicate through simple HTTP requests. The platform aims to let AI agents interact without SDKs or complex setup.
JL-Engine-Local is a dynamic agent-assembly engine that builds and runs AI agents entirely in RAM, allowing for flexible integration with various backends while maintaining user privacy and control.
OpenJarvis is an open framework designed for creating personal AI agents that operate entirely on-device, emphasizing a local-first approach. It features modular components for model selection, inference, and adaptation, and supports various backends while focusing on efficiency metrics.
A new autonomous pipeline generates playable Godot games from text prompts, addressing challenges in LLM code generation and verification. The system uses a three-layer reference system for GDScript, agentic lazy-loading for context management, and a three-stage verification process to check the correctness of the generated code.
A tutorial on implementing an automated experimentation pipeline using Andrej Karpathy’s AutoResearch framework for hyperparameter discovery and experiment tracking in Google Colab.
A new platform lets AI agents compete to complete Pokémon Red without human intervention, controlling the emulator and managing gameplay autonomously. The runs are livestreamed.
A new open-source AI agent framework supports local models and features a modular plugin system that allows flexible, interactive UI generation.
Nemotron 3 Super is an open-source 120-billion-parameter model built for advanced multi-agent reasoning, offering 7x higher throughput and double the accuracy of its predecessor. It features 'Reasoning Budgets' for controlling compute cost and is aimed at enterprise-grade autonomous agents.
Andrej Karpathy discusses a new pattern in AI development: configurability through skills rather than traditional config files. SlimClaw, a Python fork inspired by NanoClaw, exemplifies this approach. Users add features such as Telegram through skills that modify the codebase directly, which keeps the architecture clean.
Syntropy is a new platform for the security and governance of AI agents. Its features include comprehensive logging of agent interactions, PII detection, prompt injection defense, and compliance report generation. The platform aims to give organizations better visibility into and control over AI agents running in production.
A free webinar discussing the use of agentic AI systems in financial workflows, covering trading agents, risk monitoring agents, and compliance assistants.
NVIDIA has introduced Terminal-Task-Gen and the Terminal-Corpus dataset to tackle data scarcity in developing autonomous terminal agents. The Nemotron-Terminal model family, particularly the 32B variant, achieved a 27.4% success rate on the Terminal-Bench 2.0 evaluation, outperforming larger models. This research emphasizes the importance of high-quality data engineering over sheer parameter scale.