Research Papers List 2026 (active)
#ai-Safety
- Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs : alphaxiv
- Frontier Models are Capable of In-context Scheming : alphaxiv
- Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors : alphaxiv
- Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents : alphaxiv
- Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios : alphaxiv
- Our evaluation of Claude Mythos Preview’s cyber capabilities : AISI Blog
- Tell me about yourself: LLMs are aware of their learned behavior : slides
- Steering Evaluation-Aware Language Models to Act Like They Are Deployed : arxiv
- When can we trust untrusted monitoring? A safety case sketch across collusion strategies : alphaxiv
- AI Control: Improving Safety Despite Intentional Subversion : alphaxiv
- SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks : alphaxiv
- Large Language Model Reasoning Failures : alphaxiv
- Training large language models on narrow tasks can lead to broad misalignment : Nature slide
- Toward a Science of AI Agent Reliability : alphaxiv
- Continuation of Measuring AI Ability to Complete Long Tasks : alphaxiv (Related to "Measuring AI Ability to Complete Long Software Tasks")
- Measuring AI Ability to Complete Long Tasks : METR Blog
- Against the METR graph : slides
- RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts : alphaxiv
- HCAST: Human-Calibrated Autonomy Software Tasks : alphaxiv
- Meta-RL Induces Exploration in Language Agents : alphaxiv
- Scaling Up Active Testing to Large Language Models : alphaxiv
- Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities : alphaxiv
- RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents : alphaxiv
- UK AISI Align Evaluation Case-Study : AISI Reports
Paper reading notes
- LTX-2: Decoupling Audio-Video Streams for 20x Efficiency
- Reading Between the Lines: What This Hallucination Detection Study Really Reveals
Current paper reading list
2026
- Poolside Laguna M.1/XS.2 Technical Report
- Memory in the Age of AI Agents
- Less is More: Tiny Recursive Networks
- Opus 4.7 = 4T params? Incompressible Knowledge Probes
- DeepSeek V4 Pro/Flash
- Self-Distilled RLVR paper
- Agents of Chaos paper
- MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
- Composer 2 + Claude Code Leaks
- Moonshot Attention Residuals + Cursor Composer 2
- Karpathy Autoresearch, ShinkaEvolve, Text-to-LoRA
- Moltbook analysis + Persona Selection
- Discovering Multiagent Learning Algorithms with Large Language Models
- Midtraining Bridges Pretraining and Posttraining Distributions
- Rubric Based RL survey + Alec Radford Generative Meta-Model
- LLaDA 2.1 + RL via Self-Distillation
- Kimi K2.5 Tech Report + Alec Radford on Data Filtering
- Recursive Language Models, Meta Confucius Code Agent
- Anthropic's Assistant Axis: situating and stabilizing the character of large language models
- LTX-2: Efficient Joint Audio-Visual Foundation Model
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
- Show-o2: Improved Native Unified Multimodal Models
2025
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
- NVIDIA Nemotron 3 + 3 Nano
- PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations: Semantic IDs v2
- DeepSeek v3.2, DeepSeekMath v2
- RF-DETR: Realtime Neural Arch Search for Realtime Detection Transformers
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Kimi K2 Thinking + Kimi Linear + Q&A
- Scaling LLMs for Next-Generation Single-Cell Analysis
- Real-Time Detection of Hallucinated Entities in Long-Form Generation
- Thinky LoRA + DeepSeek OCR
- RLVR paper roundup!
- Training scientific reasoning LLMs with biological world models as soft verifiers
- Veo 3 + DeepSeek 3.2
- Meta SuperIntelligence papers: Bootstrapping, CaTARE
- InternVL3.5 & Vision in GPT-OSS
- How much do Language Models Memorize?
- Hierarchical Reasoning Models
- RecSys with Generative Retrieval (RQ-VAE)
- GLM 4.5: Agentic Coding, Reasoning Foundation Model
- Qwen Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT
- GPT-OSS: OpenAI's Open AI
- Anthropic Fellows: Subliminal Learning & Inverse Scaling
- Kimi K2 Tech Report
- Muon, MuonClip, Kimi K2 from Moonshot AI
- Magistral Reasoning
- Evals for long-context Q&A + Ernie 4.5 Technical Report
- Reflect, Retry, Reward: Self-Improving LLMs
- Claude 4 + Self Adapting Language Models
- Apple: The Illusion of Thinking + WWDC25 Foundation Models
- Gemini Diffusion & Diffusion Models Survey
- RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
- Llama 1/2/3/4 by Hand
- Phi-4 Reasoning
- Survey: Long Context, Leaderboard Illusion, Reasoning Economy
- Advances and Challenges in Foundation Agents
- Anthropic: Tracing the thoughts of an LLM
- Autoregressive Image Generation Survey
- RecSys and search in the age of LLMs