ICLR 2026 Workshop

Emergent Trust Risks in Large Reasoning Models (TrustLLM)

ICLR 2026 @ Rio de Janeiro, Brazil, April 26-27, 2026
Exploring Process-Level Safety in Large Reasoning Models

Recent breakthroughs in large reasoning models (LRMs) have unlocked impressive gains across science, mathematics, and law. Yet the reasoning chains these models produce introduce novel trust risks that scale-only LLM studies overlook.

This workshop aims to systematically explore, characterize, and address emergent trust risks in large reasoning models, moving beyond traditional evaluation of final LLM outputs to focus on the full process of reasoning.

Schedule

Time | Session | Description | Speaker | Duration (mins)
Apr 27, 12:00 | Opening Remark | Opening Remark | Taiji Suzuki | 10
Apr 27, 12:10 | Invited Talk 1 | TBD | Maksym Andriushchenko | 40
Apr 27, 12:50 | Invited Talk 2 | TBD | Alessio Lomuscio | 40
Apr 27, 13:30 | Poster/Break | Poster Session | - | 50
Apr 27, 14:20 | Invited Talk 3 | TBD | Hima Lakkaraju | 40
Apr 27, 15:00 | Oral Presentations | TBD | TBD | 30
Apr 27, 15:30 | Lunch | Lunch | - | 90
Apr 27, 17:00 | Invited Talk 4 | TBD | Bo Li | 40
Apr 27, 17:40 | Invited Talk 5 | TBD | Rishi Bommasani | 40
Apr 27, 18:20 | Oral Presentations | TBD | TBD | 30
Apr 27, 18:50 | Invited Talk 6 | TBD | Sahar Abdelnabi | 40
Apr 27, 19:30 | Invited Talk 7 | TBD | Zhangyang Atlas Wang | 40
Apr 27, 20:10 | Panel Discussion | From Theory to Deployment: Process-Level Safety in LRMs | Panelists listed below | 40
Apr 27, 20:50 | Closing Remark & Awards & Community Building | Closing Remark | Wei Huang | 20

Panel Discussion panelists:
• Junchi Yan (SJTU)
• Zhangyang Atlas Wang (XTX Markets & UT Austin)
• Atsushi Nitanda (A*STAR & NTU)
• Maksym Andriushchenko (ELLIS Institute Tübingen)
• Aleksandra Korolova (Princeton University)

Call for Papers

Workshop Description

Recent breakthroughs in large reasoning models (LRMs)—systems that perform explicit, multi-step inference at test time—have unlocked impressive gains across science, mathematics, and law (Guo et al., 2025; Snell et al., 2024). Yet these very reasoning chains introduce novel trust risks that scale-only LLM studies overlook: small logic slips can snowball, intermediate states become new attack surfaces, and plausible-but-false proofs can elude standard output-level filters (Geiping et al., 2025).
What sets this workshop apart: prior workshops have focused on generic robustness, whereas this workshop brings together researchers and practitioners to systematically explore, characterize, and address the emergent trust risks of large reasoning models. We seek to move beyond traditional evaluation of final LLM outputs and instead focus on the full process of reasoning: how errors propagate, how uncertainty compounds, how transparency and accountability are affected, and what new opportunities exist for robust evaluation and intervention (Zhu et al., 2025; Ouyang et al., 2024).
Workshop Goals
• Establish a comprehensive taxonomy of process-level risks in large reasoning models
• Develop standardized benchmarks and evaluation metrics for reasoning safety
• Create a community-driven repository of tools and techniques for safer reasoning
• Foster interdisciplinary collaboration between AI safety, formal methods, and reasoning communities
Core Research Questions
• Foundation — Which theoretical frameworks and abstractions can be used to study emergent trust risks?
• Risk Landscape — Which process-level failure modes threaten high-stakes deployment?
• Metric Design — How can we measure and benchmark step-wise safety at scale?
• Prototype Validation — What lightweight tools or model patches can we release now to seed wider research?
Topics of Interest
• Failure modes and risk propagation in multi-step reasoning.
• Adversarial attacks and defenses in inference-time reasoning.
• Evaluation metrics and empirical benchmarks for reasoning safety.
• Uncertainty quantification, error detection, and correction in reasoning processes.
• Approaches for improving interpretability and transparency of reasoning chains.
• Causal analysis as one lens for understanding reasoning failures.
• Human-AI trust, calibration, and collaborative reasoning.
• Case studies of reasoning models in real-world high-stakes domains.
• New architectures or algorithms for safer, more robust reasoning.

Submission Guide

Submission Instructions

We invite submissions of up to 8 pages (excluding references and appendices). The main manuscript and any appendices must be submitted as a single PDF file through the OpenReview portal (the official workshop venue link will be posted on the website). Papers previously published at other conferences will not be considered for acceptance.
Each paper will receive at least three reviews. Reviewers declare conflicts of interest via the OpenReview interface, and organizers will not handle papers with which they have a conflict.

Tiny Paper Track

In addition to full-length submissions, we encourage contributions to a Tiny Paper Track, designed to promote early-stage, exploratory, or rapidly developing ideas. Tiny papers may be up to 4 pages in length (excluding references) and can include:
• Exploratory experiments or small-scale empirical findings
• Modest but self-contained theoretical insights
• Re-analyses or replications of previously published work
• Implementation notes, tool releases, or negative results
• Fresh perspectives or conceptual critiques
Tiny papers are intended to encourage participation from junior researchers and underrepresented groups, fostering community-building and feedback exchange. All accepted tiny papers will be presented as posters or short talks during the workshop and included in the OpenReview proceedings under the workshop's track configuration.

Timelines

Submission Deadline | Jan 30th, 2026
Reviewer Bidding Dates | Feb 1st - Feb 5th, 2026
Review Deadline | Feb 26th, 2026
Paper Notification Date | Mar 1st, 2026
Camera-Ready Deadline | Mar 5th, 2026
All deadlines are 23:59 AoE (Anywhere on Earth).

Frequently Asked Questions

Can we submit a paper that will also be submitted to ICLR 2026?

Yes.

Can we submit a paper that was accepted at ICLR 2026?

No. ICLR prohibits papers accepted at the main conference from appearing concurrently at its workshops.

Will the reviews be made available to authors?

Yes.

I have a question not addressed here; whom should I contact?

Email the organizers at trustllm-workshop@googlegroups.com.

References

[1] Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X. and Zhang, X., 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.

[2] Snell, C., Lee, J., Xu, K. and Kumar, A., 2024. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314.

[3] Muennighoff, N., Yang, Z., Shi, W., Li, X.L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Candès, E. and Hashimoto, T., 2025. s1: Simple test-time scaling. arXiv preprint arXiv:2501.19393.

[4] Geiping, J., McLeish, S., Jain, N., Kirchenbauer, J., Singh, S., Bartoldson, B.R., Kailkhura, B., Bhatele, A. and Goldstein, T., 2025. Scaling up test-time compute with latent reasoning: A recurrent depth approach. arXiv preprint arXiv:2502.05171.

[5] Zhu, J., Yan, L., Wang, S., Yin, D. and Sha, L., 2025. Reasoning-to-defend: Safety-aware reasoning can defend large language models from jailbreaking. arXiv preprint arXiv:2502.12970.