Reports of the Workshops Held at the 2026 AAAI Conference on Artificial Intelligence
Reza Abbasi-Asl, Shaukat Ali, Nitay Alon, Shuang Ao, Pandarasamy Arjunan, Narges Armanfard, Ivan Au Yeung, Keshav Bhandari, Sree Bhattacharyya, Simone Bianco, Bruno Casella, Nancy F. Chen, Francisco Chicano, Tiansi Dong, Dhari Gandhi, Xiaoxue Gao, Thi Kieu Khanh Ho, Hadi Hojjati, Keisuke Imoto, Noa Izsak, Himanshu Joshi, Yun Sing Koh, Tatsuya Komatsu, Tarun Kumar, Jiahong Liu, Qian Liu, Tianqiao Liu, Joel Mackenzie, Deepak Maurya, Hayden McTavish, Martin Michalowski, Alberto Moraglio, Atsunori Moteki, Apurva Narayan, Lai Xing Ng, Duc Nguyen, Nobutaka Ono, Hong Qin, Edward Raff, Balaraman Ravindran, Mark Rice, Rafal Rzepka, Lesia Semenova, Arash Shaban-Nejad, Rahul Vashisht, Pengyang Wang, Muning Wen, Wai Tuck Wong, Kieran Woodward, Haiyan Yin, Yueyi Zhang, Di Zhao, Joey Tianyi Zhou
The Workshop Program of the Association for the Advancement of Artificial Intelligence’s 40th Conference on Artificial Intelligence (AAAI-26) was held in Singapore on January 26-27, 2026. This report contains summaries of the workshops, which were submitted by most, but not all, of the workshop chairs.
Workshop on Health Intelligence: Special Theme on “Foundation Models and AI Agents” (W1)
The 10th International Workshop on Health Intelligence was held in Singapore on January 26th and 27th, 2026, in conjunction with AAAI 2026. This milestone workshop marked a decade of a successful and influential series at the intersection of artificial intelligence and health. Under the special theme “Foundation Models and AI Agents,” the two-day event brought together a wide range of computer scientists, clinical and health informaticians, researchers, students, industry professionals, and representatives from national and international health agencies and NGOs, all united by an interest in the theory and practice of computational models of population health intelligence and personalized healthcare.
The 10th International Workshop on Health Intelligence (W3PHIAI-26) celebrated a decade of bringing AI and health research together, building on a lineage that began with the AAAI-W3PHI workshops focused on population health (2014-2016), the AAAI-HIAI workshops focused on personalized health (2013-2016), and the subsequent joint W3PHIAI workshops held annually from 2017 through 2025. Over this decade, the series has produced hundreds of talks and high-impact publications that have collectively received thousands of citations, shaping the research agenda in both population health intelligence and personalized healthcare AI. This year’s special theme, “Foundation Models and AI Agents,” reflected the field’s rapidly evolving frontier: the emergence of autonomous and semi-autonomous AI systems reshaping clinical workflows, patient management, health system operations, and public health surveillance.
Day 1 of the workshop focused on medical imaging and the translation of AI into clinical deployment and industry transformation. The day opened with a keynote by Professor Tien Yin Wong, Professor and Senior Vice-Chancellor at Tsinghua Medicine and Vice-Provost of Tsinghua University (Beijing, China), as well as Senior Advisor at SingHealth and the Singapore National Eye Center. His talk, “Foundational Models in Medicine: How to Build the Ecosystem for Clinical Adoption,” examined the gap between the rapid development of foundation models and their slow uptake in clinical settings globally. Professor Wong introduced the “6 Ps” framework for AI adoption, arguing that understanding these intertwined factors is essential for healthcare leaders, clinicians, and engineers to allow foundation models to meaningfully transform medicine. The afternoon’s invited speaker, Dr. Shikoh Gitau, CEO of Qhala (Nairobi, Kenya), offered a critical Global South perspective in her talk, “AI In the Wild: Building Context-Aware Health AI for the Global South,” highlighting the need for context-sensitive AI development in under-resourced health settings.
Presentations on Day 1 were organized around sessions on medical imaging, covering topics such as multimodal classification of age-related macular degeneration, limited-angle CT reconstruction, contrast-free brain perfusion imaging, and early lung cancer diagnosis through virtual follow-up CT generation. A session on pathology data and cancer featured work on immunohistochemistry image analysis and pathology foundation models developed through knowledge distillation. These presentations collectively illustrated the growing role of deep learning and transformer-based architectures in translating imaging data into clinically actionable insights.
Day 2 shifted focus to foundation model training, integration of novel data sources, AI simulation of patient behaviors, and discussions of ethics, fairness, privacy, and safety. The day’s first keynote was delivered by Professor Hua Xu, Robert T. McCluskey Professor and Vice Chair for Research and Development in the Department of Biomedical Informatics and Data Science, and Associate Dean for Biomedical Informatics at Yale School of Medicine. His talk, “From LLMs to Agents: AI for Biomedical Applications,” highlighted his group’s work on biomedical foundation models and AI agents built on state-of-the-art LLMs, with applications spanning real-world evidence generation, medical diagnosis, and literature-based discovery.
A particularly noteworthy session on Day 2 explored AI simulation of patient behaviors. Presentations included work on causal reinforcement learning for agent-patient interaction with clinical domain knowledge, and a multi-agent LLM framework (SynthAgent) for realistic patient simulation, demonstrated through a case study on obesity with mental health comorbidities. These works exemplified the workshop’s theme of agentic AI: systems capable of autonomous, multi-step reasoning and action within complex healthcare contexts.
The afternoon keynote on Day 2 was delivered by Dr. Sumytra Menon, Director of the Centre for Biomedical Ethics at the National University of Singapore. Her talk introduced the concept of the Personalised Patient Preference Predictor (P4), a “digital psychological twin” designed to infer patient decision-making preferences from authorized personal data. Dr. Menon examined how such a tool might bear on clinical and legal standards for shared decision-making and best-interest determinations, raising important governance questions about identity, accuracy, and evidentiary integrity.
Across both days, sessions on fairness, generalizability, and privacy reinforced a recurring theme: that technical performance alone is insufficient. Work evaluating foundation models for skin lesion classification across diverse skin tones and exploring operating-point fairness in dermatology models illustrated the ongoing challenge of building AI that is equitable and clinically trustworthy across diverse patient populations. The poster session added further breadth, with contributions on drug safety agents using graphs and ontologies, mental health conversation data analysis, and liver tumor segmentation. The workshop thus covered a remarkable scope, from imaging and clinical NLP to ethics, patient simulation, and global health equity.
Martin Michalowski, Arash Shaban-Nejad, and Simone Bianco served as co-chairs of this workshop. Creighton Heaukulani and Robert Morris (Singapore Ministry of Health Office for Healthcare Transformation) and Marija Stanojevic (Ellipsis Health) served as co-chairs for the special theme. All the workshop papers are published by Springer in its “Studies in Computational Intelligence” series. This report was written by Martin Michalowski, Arash Shaban-Nejad, and Simone Bianco.
Agentic AI in Financial Services (W2)
Financial services play a crucial role in everyday life, requiring expert-level support to meet highly personalized user needs across domains like banking, insurance, and taxation. The emergence of foundation models, especially large language models (LLMs), has introduced new capabilities in communication, reasoning, and personalization that align well with financial decision-making processes. Recently, agentic AI has extended these models by enabling them to autonomously plan, reason, and act across multi-step tasks, making them highly suitable for complex use cases such as financial advising and compliance.
The first workshop on Agentic AI in Financial Services was held at AAAI on January 26, 2026, aiming to bring together researchers and practitioners to explore the latest advances in agentic AI for a wide range of financial services, and to discuss new ideas related to the design, deployment, ethics, and real-world impact of AI in the financial domain.
The workshop featured three keynote talks. Feng Liu (The University of Melbourne) explored the idea of model reprogramming and how existing pre-trained models can be repurposed for downstream applications, such as customizing language models to detect fraudulent transactions, automate customer support, or enhance risk assessment in banking and insurance. Hariharan Suresh (NVIDIA) demonstrated how NVIDIA’s open-source agentic AI ecosystem can accelerate production-ready development in the agentic space. Guansong Pang (Singapore Management University) discussed recent work towards building generalized graph anomaly detection models that work across domains under zero- or few-shot settings, with applications in finance.
These keynote discussions were interleaved with two oral presentation sessions, each containing three invited talks drawn from the pool of accepted papers. Topics included time series forecasting, benchmarking AI agents, reasoning for financial tasks, model cost optimization, bias and fairness detection, and financial question answering. The workshop also hosted a highly active poster session, where all 22 accepted submissions (out of 57 submitted) were presented interactively. We are grateful to the diverse audience and workshop contributors who made the event a great success. We are excited to see what lies ahead in the agentic AI space, especially as models become increasingly capable and trusted in high-risk settings such as financial services.
Rocky Chen, Hongxu Chen, Joel Mackenzie, Fengbin Zhu, Luiz Pizzato, Ritchie Ng, and Anna Leontjeva served on the organizing committee. Wendy Chen and Naomi (Naime) Ranjbar Kermany handled workshop logistics, and Xurong Liang acted as the web chair. This report was written by Joel Mackenzie.
AI for Healthy Aging and Longevity (W3)
The First International Workshop on AI for Healthy Aging and Longevity (AIAA2026) focused on the intersection of artificial intelligence for healthy aging and longevity research, aiming to address the global challenges brought by rapid population aging. The workshop explored how AI technologies can support active lifestyles, disease management, healthcare affordability, and longevity, enabling older people to lead active, independent, and dignified lives.
No formal report was filed by the organizers for this workshop.
AI in Agriculture (W4)
The First International Workshop on Artificial Intelligence in Agriculture (AgriAI) was held on January 26, 2026, as part of the Fortieth AAAI Conference on Artificial Intelligence in Singapore. The workshop was designed to provide a forum for researchers and practitioners working at the intersection of artificial intelligence and agricultural science. Agriculture today faces mounting challenges arising from climate change, resource scarcity, and the growing demand for food production. The workshop therefore focused on how recent advances in artificial intelligence, including machine learning, computer vision, and robotics, can contribute to improving agricultural productivity, sustainability, and resilience.
The program featured a combination of keynote talks, oral presentations, and poster sessions that showcased emerging work in agricultural AI. The workshop included three keynote presentations delivered by leading researchers working at the intersection of artificial intelligence and agricultural and environmental sciences. In addition, the technical program included six contributed oral presentations and ten poster presentations selected from submitted papers. The oral sessions highlighted research on topics such as crop monitoring using remote sensing and computer vision, AI-enabled phenotyping, predictive modeling for yield and disease detection, and machine learning methods for agricultural decision support. Poster presentations provided an opportunity for participants to present early-stage research and datasets, while encouraging informal discussions and feedback from the community.
A central theme of the workshop was the integration of diverse data sources, such as satellite imagery, drone-based sensing, field sensors, and climate data, to support data-driven agricultural decision-making. Several presentations demonstrated how machine learning models can analyze large-scale geospatial and temporal data to detect crop stress, estimate yields, and monitor environmental conditions. Participants also discussed the growing role of AI in precision agriculture, where intelligent sensing systems and autonomous platforms can support targeted interventions such as irrigation, fertilization, and pest management. These approaches have the potential to increase efficiency while reducing environmental impacts associated with agricultural production.
The workshop also emphasized the importance of interdisciplinary collaboration between AI researchers and agricultural scientists. Many of the challenges in agriculture require domain expertise in agronomy, plant science, and environmental systems, alongside advances in AI algorithms and data infrastructure. Discussions throughout the workshop highlighted the need for open datasets, shared benchmarks, and collaborative platforms that enable researchers from multiple disciplines to contribute to the development of robust AI solutions for agriculture.
The event concluded with a joint panel discussion organized together with the AI for Accelerating Science and Engineering (AI2ASE) workshop. The panel brought together researchers from both communities to discuss broader opportunities for AI in scientific discovery and real-world applications, including agriculture. Panelists explored how advances in foundation models, scientific machine learning, and large-scale data infrastructure could accelerate progress in agricultural research and other scientific domains. The discussion also addressed challenges related to data availability, model interpretability, and the deployment of AI systems in real-world settings.
The workshop was co-chaired by Prof. Sajal Das (Missouri University of Science and Technology, USA) and Soumik Sarkar (Iowa State University, USA). This report was written by Pandarasamy Arjunan.
AI to Accelerate Science and Engineering (W5)
The Workshop on AI to Accelerate Science and Engineering (AI2ASE) brought together researchers from AI and diverse science and engineering communities to identify and understand challenges in applying AI to specific problems, develop and refine AI tools for novel problem settings, and build community between AI researchers and domain area experts. This year’s theme was AI for agricultural and forestry sciences.
No formal report was filed by the organizers for this workshop.
Addressing Challenges and Opportunities in Human-Centric Manufacturing (W6)
The one-day workshop on Human-Centric Manufacturing was centred on exploring the relationship between people and technology in the workplace. Specifically, the focus was to draw from a range of different perspectives to discuss ideas in relation to the development of applications and technologies for manufacturing ecosystems. This ranged from the co-development of learning and reasoning tasks to accommodating diverse cognitive and sensory-physical abilities within the population. Invited topics included advancements in machine learning, human-machine collaboration, privacy and security, explainable and responsible AI, and use cases pertaining to human-centric manufacturing.
The workshop began with an invited presentation from Associate Professor Fook Cheong Yee (Singapore Institute of Technology), who discussed case studies and implementation challenges, such as system complexity and rigidity, safety and mental wellbeing, among other issues. Prof Yee proposed a human-centric manufacturing integration framework and urged system designs to prioritise human collaboration and amplify human contributions – such as cross-domain pattern recognition, ethical judgement, and improvisations in manufacturing systems.
This was followed by a presentation from Professor Cecilia Laschi (National University of Singapore), who discussed soft robotics, drawing from the design of bio-inspired intelligence. She emphasised the importance of understanding and incorporating principles from nature, such as energy efficiency and the role of the body in intelligence, to develop more effective robots, and to collaborate with human operators for more natural interactions.
In addition, the workshop had two oral presentation sessions, interspersed with a single poster session. During the day, work from 13 papers was presented, with topics including collaborative robots, human-AI object detection and augmentation, data privacy preservation techniques for obfuscating human data, intent-aware warehouse planning simulation, and reasoning about 3D assembly tasks and procedural planning.
In conjunction, the workshop reported on the “Robotic Collaborative Assembling for Human-Centered Manufacturing” (RoCo) challenge, co-organised by Assistant Professors Ziwei Wang and Jianfei Yang (Nanyang Technological University). The competition involved over 60 teams, with winners from Tsinghua and Beihang Universities presenting their technical approaches in the workshop. More details can be found at https://rocochallenge.github.io/RoCo2026/.
In the late afternoon, a panel discussion was conducted with three invited speakers: Professor Ashok Goel (Georgia Institute of Technology), Dr. Jamie Ng (Institute for Infocomm Research, A*STAR), and Assistant Professor Jianfei Yang (Nanyang Technological University). The panel discussed a range of topics, from advanced manufacturing in a Singapore context, to AI in education and manufacturing, and human-robot collaboration. Discussions included AI coaches and the requirement for adaptive learning and expertise, the digital divide and the need to avoid excluding demographic groups such as older workers, and the challenges of physical AI systems moving from prototypes to robust operation in real environments. Proposed research directions included modelling human intention for robots and developing a mutual theory of mind between humans and AI.
Based on the presented works, the workshop concluded with plans for a post-workshop publication and establishing a participant mailing list.
The workshop co-chairs were Mark Rice, Gu Ying, Lai Xing Ng (Institute for Infocomm Research, A*STAR), and Shijian Lu (Nanyang Technological University). This report was written by Mark Rice and Lai Xing Ng.
Advancing Artificial Intelligence through Theory of Mind (W7)
The Theory of Mind (ToM) for AI workshop (ToM4AI) was held as part of AAAI-2026 in Singapore. Now in its second year, the workshop continued its mission of bridging multiple scientific communities actively researching Theory of Mind, fostering collaboration across disciplines spanning cognitive science, psychology, and AI.
The AAAI-2026 ToM4AI workshop brought together researchers from psychology, Computational Psychiatry, computer science, robotics, and AI to explore the theoretical and practical dimensions of ToM. The workshop was motivated by the growing importance of social reasoning and mental state inference in the development of safe and effective AI systems, and the need to ground computational approaches in established findings from cognitive science.
The workshop featured three keynote speakers. Dr. Maarten Sap (CMU) addressed the gap between human social reasoning and current LLMs, arguing that individual, interpersonal, and cultural levels of inference are largely absent in today’s models, and presented work on social alignment through public-private knowledge inference. Prof. Geoff Bird (Oxford) brought a psychological perspective, urging the community to move beyond flawed benchmarks and presenting MindSpace Theory — a framework of multiplex representation along continua — arguing that within-person, between-context trait generalisation is central to ToM. Prof. Sarit Kraus (BIU) discussed the role of humans-in-the-loop for providing ethical constraints and complementary advantages, and highlighted how high-level signals are critical for successful social communication in AI agents.
The workshop also included two poster sessions and eight flash talks, providing researchers across all career stages an opportunity to share their findings and receive feedback from the broader community. A hackathon was also held as part of this year’s program, encouraging hands-on collaborative work on ToM-related problems.
A recurring theme across the workshop was the tension between building systems that recapitulate human-like social reasoning versus systems that distil only the functional benefits of ToM — subtextual inference, adaptability, and energy efficiency — without needing to mirror biological reality. Discussions explored whether neural networks should serve as models of human cognition or purely as engineering tools.
Nitay Alon, Joseph M. Barnby, Reuth Mirsky, and Stefan Sarkadi co-organized this workshop. The workshop also announced a Special JAAMAS Issue on ToM4AI, inviting submissions from the community.
Agentic AI Benchmarks and Applications for Enterprise Tasks (W8)
The “Agentic AI Benchmarks and Applications for Enterprise Tasks” workshop at AAAI-26, co-organized by CMU, Keio University, and Fujitsu, successfully fostered discussions on Agentic AI evaluation and real-world applications through invited talks and poster sessions.
The “Agentic AI Benchmarks and Applications for Enterprise Tasks” workshop, held on January 26, 2026, as part of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26) in Singapore, marked a significant step in exploring the evolving landscape of Agentic AI. Co-organized by Carnegie Mellon University (CMU), Keio University, and Fujitsu, the workshop aimed to stimulate discussion about this rapidly advancing domain. Key themes included benchmarks and evaluation for Agentic AI, enterprise applications, human-agent interaction, and multimodal reasoning for enterprises. The organizers adopted a broad approach for topic selection, centered on diverse enterprise-oriented Agentic AI proposals, to thoroughly cover aspects not fully addressed during the main conference. Success criteria were defined by the number of participants and the level of discussion engagement.
Extensive preparatory work was crucial for the workshop’s successful execution. Early formation of the organizing team facilitated vital decisions, including securing renowned invited speakers through the personal network of Dr. Alexandre Drouin from ServiceNow Research, a member of the Steering Committee. Seven distinguished speakers from institutions such as Keio University, IBM, the University of Illinois, and Amazon confirmed their participation. The call for papers, spanning October 1st to 29th, yielded a high volume of quality submissions, with 33 papers ultimately accepted after a rigorous peer review process conducted by 40 reviewers from various companies and universities, including Fujitsu and ServiceNow. Promotional activities, leveraging an interview article with CMU and Professor Graham Neubig’s LinkedIn posts, significantly boosted workshop awareness and participation.
The workshop, held at Singapore EXPO in the largest available room with a 120-person capacity, featured seven invited talks and three poster sessions. The invited talk session drew nearly 100 participants, fostering active Q&A sessions. Prominent researchers, including Fujitsu’s Senior Project Director Hiromichi Kobashi, Keio University’s Professor Komei Sugiura, University of Illinois Urbana-Champaign’s Assistant Professor Daniel Kang, ServiceNow Research’s Dr. Alexandre Drouin, Keio University’s Associate Professor Hirotaka Osawa, Amazon Research’s Applied Scientist Ananth Sadanand, and IBM’s Project Lead Dr. Asim Munawar, shared their latest research and challenges.
Among the highlighted presentations, Dr. Kobashi, Senior Project Director at Fujitsu, emphasized the critical need for objective evaluation of AI agent capabilities and reliability. He introduced Fujitsu’s “FieldWorkArena,” a benchmark designed to replicate real-world field operations, and other Fujitsu benchmarks such as CAD Inspection Assistant, ECHO, and RAG Hard Benchmark, showcasing Fujitsu’s contributions to AI agent research. Assistant Professor Daniel Kang from UIUC delivered a provocative talk titled “AI Agent Benchmarks Are Broken,” critiquing existing benchmarks for lacking “Task Validity” and “Outcome Validity,” citing examples of agents producing correct answers without actual work and inconsistent evaluation criteria. He strongly advocated systematic mechanisms to assess validity and detect annotation errors. Dr. Alexandre Drouin from ServiceNow Research discussed challenges in evaluating AI agent performance, safety, and security. He presented “BrowserGym” for web agent research and “WorkArena,” a benchmark covering a wide array of enterprise tasks, alongside ServiceNow’s efforts in combating false AI detections and prompt injections, and the “DoomArena” framework for security threats.
While on-site operations presented minor challenges, such as speaker PC connections and frequent poster replacements, these were effectively managed with local secretariat support. The high number of poster presenters, while invigorating the workshop, highlighted coordination complexities, suggesting that future editions should improve the scheduling of poster rotations.
The workshop concluded with significant success, evidenced by high participation, lively discussions, and positive external recognition, including an interview with IEEE Spectrum and commendation from Dr. Asim Munawar of IBM. The field of agentic AI is in particularly high demand among corporate researchers, and there is strong anticipation for the next workshop. Building on this success, we aim to create an even more compelling workshop, and we extend our heartfelt gratitude to all participants and collaborators who made this event possible. Further details are available on the workshop’s official website.
The co-chairs of this workshop were Associate Professor Graham Neubig and Assistant Professor Yonatan Bisk from CMU, Professor Hideo Saito from Keio University, and Principal Researcher Atsunori Moteki from Fujitsu Limited. This report was authored by Atsunori Moteki.
AI for CyberSecurity 2026 (W9)
The AAAI-26 Workshop on Artificial Intelligence for Cyber Security (AICS) 2026 brought together researchers and practitioners working at the intersection of artificial intelligence and cyber security, examining both rigorous assurance of learning-enabled systems and the fast-moving reality of large language models embedded into day-to-day security workflows.
The AAAI-26 Workshop on Artificial Intelligence for Cyber Security (AICS) 2026 was held on January 26, 2026, in Singapore, and brought together researchers and practitioners working at the intersection of artificial intelligence and cyber security. The workshop’s program reflected a field that had stretched in two directions at once: toward more rigorous assurance of learning-enabled systems, and toward the fast-moving reality of large language models embedded into day-to-day security workflows. The day moved from theory-heavy work, such as formal reasoning about model behavior, to the newest applications of large language models in agentic offensive and defensive security pipelines.
Several presentations examined the security of artificial intelligence through a more traditional machine-learning lens, focusing on risks that existed even before large language models became central to security workflows. One talk examined training-data privacy leakage in “Training data membership inference via Gaussian process meta-modeling: a post-hoc analysis approach,” highlighting how model behavior could be interrogated after training to infer whether specific data had influenced the learned system. Another line of work focused on robustness that could be argued formally, exemplified by “Improving Neural Network Robustness to Convolutional Perturbations Through Certified Training,” which emphasized certified procedures intended to bound worst-case behavior under a meaningful perturbation class. Together, these works served as a reminder that while the community explored newer paradigms such as agentic systems and large language models, longstanding concerns (privacy leakage, robustness, and verifiable guarantees) remained foundational and continued to provide concrete tools for analyzing and strengthening AI systems.
The workshop also featured a strong set of contributions on large language models (LLMs), both as security enablers and as newly exposed attack surfaces. “Peering Behind the Shield: Guardrail Identification in Large Language Models” focused on techniques that allowed the identification of specific guardrails implemented in an LLM system, providing a stepping stone to further targeted attacks on the system. Another real-world application of LLMs, “Beyond BeautifulSoup: Benchmarking LLM-Powered Web Scraping for Everyday Users,” showed how tool-using systems that interact with real websites can enhance data-collection tasks in the security domain. Agentic and end-to-end security workflows formed another practical backbone of the program. Both “Cybersecurity AI (CAI): An open framework for AI Security” and “Cybersecurity AI: Evaluating Agentic Cybersecurity in Attack/Defense CTFs” pushed the boundaries of what agentic workflows can do to automate both offensive and defensive security, moving evaluation into real adversarial interaction settings rather than static datasets.
Two keynotes anchored the day’s narrative by pairing a practitioner’s view of real-world usage and constraints with an academic perspective on how to formalize and guarantee safety properties in AI systems. Eugene Lim (Open Government Products, Government Technology Agency of Singapore) delivered “Applications of LLMs in Cybersecurity Tooling – Practical Insights from the Field,” describing how practitioners had already been integrating large language models into production security tooling. He grounded the discussion in concrete enterprise pain points where LLMs could add value, including accelerating vulnerability discovery, improving analyst productivity, and even helping uncover business-logic bugs that had previously been difficult to identify. Sun Jun (Singapore Management University) presented “Towards Guaranteed AI Safety,” arguing that the next wave of progress in ensuring safe behavior would depend on translating high-level security goals into precise, checkable properties with clearly specified threat models and assumptions, so that the community could move from aspirational guarantees to guarantees that could be proven, measured, or enforced in deployed systems.
The workshop concluded with a clear takeaway: the next phase of work would be defined by methods that deliver practical impact while making explicit, testable security and safety claims.
Edward Raff, Ahmad Ridley, Dennis Ross, Sagar Samtani, Ankit Shah, Arunesh Sinha, Allan Wollaber, and Wai Tuck Wong served as co-chairs of this workshop. This report was written by Wai Tuck Wong.
AI for Scientific Research (W10)
The Workshop on AI for Scientific Research aimed to foster collaboration and stimulate innovation towards the development of next-generation AI research assistants that are reliable, transparent, and seamlessly integrated into the fabric of scientific discovery. The workshop addressed how recent advancements in generative AI and agentic systems have unlocked the potential to automate and accelerate every stage of research, from hypothesis generation to paper writing, while also examining the social and ethical challenges that arise.
No formal report was filed by the organizers for this workshop.
Workshop on Multi-Agent Path Finding (W11)
The Workshop on Multi-Agent Path Finding addressed computing collision-free paths for multiple agents from their starting locations to given destinations in a known environment. The workshop brought together researchers in artificial intelligence, robotics, and theoretical computer science working on various problem variants and solution approaches, to present research, discuss future research directions, and cross-fertilize the different communities.
No formal report was filed by the organizers for this workshop.
Federated Learning for Critical Applications (W12)
This first edition of the Federated Learning for Critical Applications (FLCA) workshop focused on the unique challenges of deploying FL systems in real-world, safety- or privacy-critical settings—environments where failure is not an option. Unlike existing FL workshops that emphasize general methods, FLCA focused on FL under realistic deployment constraints, including privacy guarantees, adversarial robustness, system-level failures, and regulatory compliance. The workshop’s emphasis was on practical FL systems that are scalable, robust, and aligned with real-world operational and infrastructural constraints.
The workshop was attended by about 50 participants throughout its duration. It consisted of six technical presentations, two spotlight presentations, and three keynote talks.
Each technical presentation was allocated a 15-minute time slot, including questions, which allowed for discussion and exchange of ideas. Each spotlight presentation was allocated a short time slot of a few minutes, excluding questions, and was accompanied by a dedicated poster to further enhance presentation and discussion.
The first keynote talk was delivered by Prof. Qi Dou from The Chinese University of Hong Kong. It provided an overview of how federated learning can enable model development for medical imaging, outlining key challenges and highlighting research directions aimed at making FL more efficient, robust, and applicable to advancing the field of medical imaging analysis.
The second keynote talk was delivered by Dr. Zhaomin Wu from the National University of Singapore. His talk discussed how to enable cross-silo federated learning in practice, including combining rigorous privacy mechanisms with heterogeneous data and unreliable communication, defending against poisoned client data, and enabling machine unlearning when data removal is required. It also outlined future directions for federated learning, highlighting challenges and opportunities in scaling trustworthy deployment for critical applications.
Finally, the last keynote talk was delivered by Katharine Daly and Dr. Daniel Ramage, both from Google Research. They gave a joint presentation explaining the history and evolution of FL at Google and introduced an updated definition of federated learning based on privacy principles (transparency/auditability, data minimization, and data anonymization) rather than on the location of data processing. They also described how this new perspective compares to traditional cross-device federated learning and presented new algorithms and use cases specific to the TEE-hosted federated learning setting.
The workshop’s lively discussions highlighted the importance of a unified community committed to improving the credibility of FL. Ensuring that FL and AI technologies evolve in alignment with real deployment needs requires coordinated collective effort.
Bruno Casella, Samuele Fonio, and Mirko Polato from the University of Turin, together with Michael Kamp and Linara Adilova from TU Dortmund, served as co-chairs of the workshop.
AI in Drug Discovery: From Methods to Molecules (W13)
The Workshop on AI in Drug Discovery aimed to connect algorithmic innovations in artificial intelligence with the practical challenges of discovering and developing new medicines, with emphasis on methodologically rigorous strategies designed to resolve persistent challenges in drug development, specifically target identification, chemical space exploration, and the use of translational models to mitigate risk. The workshop brought together academic and industry experts from AI, computational life sciences, and pharmaceutical research.
No formal report was filed by the organizers for this workshop.
AI for Education: On Opportunities and Challenges of Large Multimodal Models in Education (W14)
The AI for Education workshop (AI4EDU), held on January 26, 2026, at Singapore EXPO (Room: Tourmaline 207-209), brought together researchers, practitioners, and policymakers from around the world to explore the expanding role of large multimodal models (LMMs) and agentic AI systems in educational settings. The workshop addressed both the transformative opportunities and pressing challenges that these technologies present for teaching and learning.
The program opened with a keynote by Professor Wenli Chen, Associate Dean (Research Support) at the National Institute of Education (NIE), Nanyang Technological University (NTU), Singapore, who delivered a talk titled “Augmenting Learner Capability through Meaningful Human-AI Collaboration.” Professor Chen emphasized the importance of designing AI systems that complement, rather than replace, human agency in learning, setting a collaborative tone for the day’s discussions. The second keynote, “The Prompt Is Not Enough: Meta-Task Awareness in Post-Prompting Contexts,” was delivered by Dr. Lung Hsiang Wong, Senior Education Research Scientist at the Centre for Research in Pedagogy and Practice, NIE, NTU. Dr. Wong challenged the field to look beyond surface-level prompt engineering and consider the deeper metacognitive demands placed on both learners and AI systems. After the lunch break, Professor Jiannan Li, Assistant Professor of Computer Science at the School of Computing and Information Systems, Singapore Management University, presented on “Mixed-Modality Learning Interfaces and Opportunities for Multimodal Generation,” highlighting emerging interaction paradigms at the intersection of human-computer interaction and AI-generated content. The final keynote was delivered by Mr. Nansong Wang, Vice Principal of TAL Education Singapore, who shared practical insights on “Exploration and Application of AI in Education by Think Academy,” offering an industry perspective on deploying AI at scale in mathematics education.
The workshop featured four oral sessions spanning a wide range of themes. Oral Session A focused on AI-powered teachable agents and learning interactions. A notable contribution examined whether AI-powered teachable agents benefit students evenly, finding nuanced relationships between individual learner characteristics and mathematical outcomes. Another paper introduced a structured SOEI framework for modeling personality-aligned virtual student agents, advancing the realism and evaluability of simulated learners.
Oral Session B addressed knowledge tracing, personalization, and assessment. RouteKT, a knowledge tracing framework that models students’ problem-solving routes using large language models, was presented alongside work on personalized mathematics tutoring that incorporates persona-, memory-, and forgetting-aware mechanisms. A paper evaluating LLMs as self-assessing educational agents explored dual-role modeling for reliable AI-generated exams, raising important questions about assessment validity in AI-mediated contexts.
Oral Session C explored multi-agent tutoring and adaptive systems. EduVerse was introduced as a user-defined and developmental multi-agent simulation space for educational scenarios, while MASA presented a multi-agent framework for scenario-based assessment of critical thinking through guided interviews. CoLearn offered a multi-agent approach to personalized blended learning in higher education, demonstrating the growing sophistication of agent coordination in instructional design.
Oral Session D closed the day with work on multimodal content generation and recommendation. SpeakerTrainer demonstrated multimodal coaching of presentation skills on mobile devices, and a teacher-in-the-loop story-to-video system showcased how vision-language models can support educational content authoring with teacher oversight. A multimodal sequential recommendation system for online courses leveraged MLLM-enhanced semantic edge embeddings to improve course discovery.
The workshop received 60 submissions in total. After 14 desk rejections and 6 withdrawals, 40 valid submissions underwent peer review, each evaluated by at least two reviewers. Of these, 21 papers were accepted, yielding an acceptance rate of 35%. The breadth of accepted work — spanning knowledge tracing, multi-agent systems, multimodal recommendation, adaptive tutoring, and synthetic data generation — reflected the field’s rapid diversification and growing methodological maturity.
Across sessions, recurring themes included the need for pedagogical alignment in AI-generated content, the importance of equity and fairness across diverse learner populations, and the challenge of building interpretable systems that educators can trust. The workshop surfaced both the remarkable promise of LMMs and agentic AI in education and the critical work that remains to ensure these technologies serve all learners responsibly.
Zitao Liu, Yu Lu, Emmanuel G. Blanchard, and Tianqiao Liu served as cochairs of this workshop.
This report was written by Tianqiao Liu.
Assessing and Improving Reliability of Foundation Models in the Real World (W15)
The Workshop on Assessing and Improving Reliability of Foundation Models in the Real World served as a forum for researchers and practitioners to discuss definitions, metrics, and methods for reliability quantification, explore principled evaluation frameworks, and propose strategies to enhance robustness and trustworthiness across language and vision tasks. By bridging the large language model and vision-language model communities, the workshop aimed to foster cross-domain insights and encourage approaches that ensure dependable performance in operational settings.
No formal report was filed by the organizers for this workshop.
Artificial Intelligence for Air Transportation (W16)
The inaugural Workshop on Artificial Intelligence for Air Transportation brought together researchers and practitioners to explore how AI can address pressing challenges across air traffic management, airport operations, advanced air mobility, and aviation safety. Held in Singapore, a global hub for aviation innovation and regulatory leadership, the workshop provided a dynamic forum for interdisciplinary exchange and collaboration.
No formal report was filed by the organizers for this workshop.
Artificial Intelligence for Time Series Analysis: Theory, Algorithms, and Applications (W17)
The Workshop on Artificial Intelligence for Time Series Analysis provided a platform for researchers and AI practitioners from both academia and industry to discuss potential research directions, key technical issues, and solutions to challenges in practical applications. The workshop focused on both the theoretical and practical aspects of time series data analysis across domains including IoT devices, healthcare, smart vehicles, financial markets, and environmental sciences.
No formal report was filed by the organizers for this workshop.
Audio-Centric AI: Towards Real-World Multimodal Reasoning and Application Use Cases (W18)
The First Workshop on Audio-Centric AI brought together researchers and practitioners to discuss recent advances in audio-centric foundation models, multimodal large language models with speech and audio capabilities, and their evaluation, robustness, and societal impact. The workshop provided a focused venue for examining emerging methodologies in speech understanding, audio generation, cross-modal reasoning, and safety, while fostering interdisciplinary dialogue across the speech, machine learning, and broader AI communities.
The First Workshop on Audio-Centric AI was held in conjunction with the AAAI 2026. The workshop addressed the rapid evolution of audio foundation models and large audio-language models, which extend the paradigm of large language models to speech, environmental sound, and multimodal audio-text reasoning. As audio-capable AI systems become increasingly integrated into real-world applications, the workshop aimed to consolidate current research progress while identifying open challenges in modeling, evaluation, efficiency, and safety.
The technical program featured invited talks, contributed paper presentations, and interactive discussions. Invited speakers included Hung-yi Lee from National Taiwan University, Tsubasa Takahashi from Acompany Co., Ltd, Wenwu Wang from the University of Surrey, Yu Tsao from Academia Sinica, and Bjorn Schuller from Imperial College London. Their talks spanned topics such as scaling speech-language pre-training, multimodal alignment between acoustic and textual representations, paralinguistic modeling for affective computing, and trustworthy AI. Collectively, the invited presentations emphasized the importance of moving beyond conventional automatic speech recognition benchmarks toward holistic modeling of speech content, speaker traits, emotion, and acoustic environments.
The contributed papers spanned multimodal modeling, generation, robustness, and evaluation. Several works advanced speech-language integration through improved pre-training strategies, teacher-student distillation, and joint modeling of linguistic and acoustic representations. A newly proposed benchmark for audio reasoning underscored the need for evaluation protocols that extend beyond conventional speech recognition metrics. Generative research focused on controllable and expressive text-to-speech, including emotional modeling, cross-lingual synthesis in low-resource settings, and architectural designs that enhance diversity and prompt-based control. Robustness and safety were recurring topics, with authors examining adversarial vulnerabilities in speech systems and evaluation frameworks for audio reasoning and information retrieval in multimodal outputs. The program also included broader audio research on environmental sound evaluation, acoustic scene classification, real-time source separation, and regional speech corpus development. Across discussions, participants emphasized standardized evaluation, multilingual generalization, computational efficiency, and responsible deployment as central challenges for future audio foundation models.
Overall, the workshop demonstrated that audio foundation models are transitioning from task-specific pipelines toward unified multimodal systems capable of reasoning across speech, sound, and text. The event provided a timely platform for synthesizing perspectives from academia and industry, while outlining research directions toward scalable, robust, and trustworthy audio AI.
Xiaoxue Gao, Nancy F. Chen, Keisuke Imoto, Nobutaka Ono, and Tatsuya Komatsu served as cochairs of this workshop. This report was written by Xiaoxue Gao.
Automated Spatial and Temporal Anomaly Detection (W19)
The 3rd Workshop on Automated Spatial and Temporal Anomaly Detection (ASTAD) brought together AI researchers and practitioners to discuss recent advances in anomaly detection, with a focus on spatial and temporal data. As AI-driven systems are increasingly deployed in many application domains such as healthcare and industry, the need for robust anomaly detection has become more urgent because rare or unexpected events can seriously affect safety and reliability. In the era of foundation models and large-scale learning, the workshop highlighted emerging directions such as employing foundation models for anomaly detection, zero-shot and few-shot settings, real-time detection for industrial automation and healthcare, and explainable methods that improve transparency and trust. The program combined three invited talks with oral and poster presentations, emphasizing approaches that generalize well, quantify uncertainty, and remain reliable in real-world deployments.
The 3rd ASTAD Workshop was held as part of AAAI 2026 and focused on identifying rare, unexpected, or harmful events in data that changes over time. As real-world systems increasingly depend on continuous streams from sensors, cameras, and other monitoring sources, anomaly detection has become essential for safety and reliability yet remains difficult because anomalies are high-impact but often poorly defined, sparsely labeled, and easy to overlook when data distributions shift. Reflecting new trends in the era of foundation models and large-scale learning, the workshop highlighted directions such as foundation-model-based anomaly detection, zero-shot and few-shot detection, real-time monitoring in domains like healthcare and industrial automation, and explainable methods that improve transparency and trust. Overall, the workshop served as a forum to exchange ideas across time series analysis, computer vision, multimodal learning, and deployment-focused anomaly detection.
Three invited talks anchored the workshop, connecting research advances with practical needs. Dr. Ye Zhu (School of Information Technology, Deakin University) presented “Anomaly Detection Based on Isolation Mechanisms,” explaining why isolation-based ideas can be effective for large-scale and high-dimensional settings, and how they extend to scenarios such as streaming data and time series. A key takeaway from the discussion was that isolation mechanisms can offer strong efficiency and scalability, but there remains significant room to better handle complex modalities and changing temporal dynamics.
Dr. Jie Ren (Google DeepMind) delivered the keynote “Uncertainty Estimations in LLMs,” highlighting why uncertainty matters when we deploy AI systems in the real world. Participants connected this theme directly to anomaly detection: when a system encounters something unusual, it should not only produce a decision but also communicate how confident it is, so downstream users can decide whether to trust the output, request human review, or trigger a safer fallback.
Prof. Guansong Pang (School of Computing and Information Systems, Singapore Management University) presented “Anomalies Are Not a Class: Leveraging Labeled Anomalies in Deep and Generalist Anomaly Detection.” The talk emphasized a practical reality: while anomaly detection is often framed as unsupervised (training on mostly normal data), many applications do contain some labeled anomalies. The challenge is using those labels without turning the task into ordinary closed-set classification, since future anomalies can look different from anything seen before. This sparked active discussion on evaluation protocols and how to measure generalization to unseen anomaly types.
Beyond the keynotes, the workshop included ten oral presentations and ten posters, which together reflected the diversity of the field, from new modeling ideas to application-driven studies. Rather than cataloging every paper, a useful way to summarize the contributed program is by its shared emphasis on robustness: how to detect anomalies under limited labels, how to remain reliable under domain shift, and how to build methods that stay interpretable enough for real-world settings.
The workshop attracted around 40 participants from North America, Europe, and Asia, and the format encouraged frequent questions and lively discussion throughout the day. Overall feedback was strongly positive, and the closing session focused on concrete ways to stay connected, such as follow-on collaborations, shared benchmarks and datasets, and future editions of the workshop that continue building the ASTAD community.
Narges Armanfard (McGill University and Mila – Quebec AI Institute) served as the chair of this workshop, and Thi Kieu Khanh Ho and Hadi Hojjati (McGill University and Mila – Quebec AI Institute) served on the organizing committee.
This report was written by Narges Armanfard, Thi Kieu Khanh Ho, and Hadi Hojjati.
Bodily Expressed Emotion Understanding 2026 (W20)
The 2nd BEEU workshop brought together researchers in computer vision, machine learning, robotics, psychology, dance, theater, design, and graphics to discuss approaches and the latest results on automatically modeling body language. The agenda of the workshop included several invited talks, a research panel, paper presentations, and a discussion of a dataset that will be released by the organizing team. The workshop attracted a group of highly engaged participants who spent the day fostering insightful discussions on interdisciplinary topics related to understanding bodily expression.
The goal of the workshop was to share research, encourage discussions on, and foster collaborative thinking in the space of bodily expressed emotion understanding. Major areas of application that were discussed included social and expressive robotics, assistive technologies, affective computing, and movement theory. The workshop mainly featured several invited (keynote) talks, including those from (a) Heather Knight, Assistant Professor of Computer Science and Robotics at Oregon State University, and Researcher at Allen Control Systems, (b) Beatrice de Gelder, Professor in the Department of Cognitive Neuroscience at the University of Maastricht, (c) Soujanya Poria, Associate Professor of Computer Science at Nanyang Technological University (NTU), Singapore, and (d) Lauren Bedal, Head of Product Design at Archetype AI.
The talks covered a variety of topics. Heather’s talk, titled “Bodily Expressed Emotion from Human and Robot ‘Body’ Perspectives”, explored what embodiment means in applications such as functional or expressive robotics, and how subjective elements like emotions can take shape in robot “bodies”. Beatrice’s talk approached the core focus of the workshop from a neuropsychological perspective, describing how movement is encoded, processed, and expressed in the brain, and how neural substrates can be interpreted to draw connections between aspects of movement and emotions. Soujanya’s talk focused on the ever-popular large language and vision models, providing a detailed overview of their mathematical underpinnings and of how they currently lack the ability to operate effectively in physical space. Finally, Lauren’s keynote approached technology as a design material, exploring the roots of its design in the arts and how subjective signals such as emotions can be leveraged to build foundational physical models that serve a wide range of applications.
The workshop further included a Research Panel featuring Heather Knight, Beatrice de Gelder, and James Wang (Distinguished Professor of Informatics and Intelligent Systems at Penn State), which tackled questions ranging from epistemic differences in defining bodily expressed emotions to subjectivity in affective modeling, potentially exciting or concerning applications of such models, and known failure modes of current state-of-the-art models. The panel also welcomed questions from several members of the audience, which led to a spirited debate on the need for expressive embodied robots.
The workshop also featured talks from members of the organizing committee, including Amy LaViers, director of the RAD Lab, and Justin Lokos, an independent researcher associated with the Wang Group at Penn State. The talks highlighted the team’s recent advances in bodily expressed emotion understanding, including the development of a novel movement annotation scheme (BESST notation) based on Laban Movement Analysis, and the ongoing development of a large-scale annotated bodily expressed emotion dataset and data-sharing infrastructure. Finally, the workshop included six ten-minute presentations of accepted papers, with topics spanning affective modeling of bodily expressed emotions using core computer-vision methods, applications to movement theory, and methods for adjacent tasks such as humor understanding.
James Wang, Amy LaViers, Rachelle Tsachor, Reginald B. Adams Jr., and Sree Bhattacharyya served as cochairs of this workshop. This report was written by Sree Bhattacharyya.
Bridging Neurons and Symbols for NLP and Knowledge Graph Reasoning (W21)
Neural networks have achieved remarkable success across domains such as question answering, game playing, mathematical reasoning, and code generation. However, large language models (LLMs) exhibit unpredictable behaviors, fragile abstract reasoning, and cases of correct answers supported by incorrect explanations. While multi-step prompting and code-based prompts can improve reasoning performance, it remains unclear whether LLMs truly reason or can reach the rigour of symbolic systems, raising concerns about reliability and societal risk. Motivated by recent advances such as the Sphere Neural Network and neural-symbolic collaborative distillation (NesyCD), this workshop brought together researchers in NLP and knowledge graph reasoning to explore how symbolic and neural approaches can be effectively integrated.
The workshop featured six invited talks from leading researchers exploring key themes at the intersection of neural and symbolic AI. Collectively, the talks examined how symbolic knowledge, explicit semantics, structured reasoning, and neural models can be combined to improve interpretability, reliability, and reasoning robustness in modern NLP and knowledge graph systems.
The six talks told a coherent story of how modern AI can move beyond surface-level fluency toward grounded, reliable, and explainable intelligence. The journey began with Roberto Navigli (Professor of Natural Language Processing at the Sapienza University of Rome), who highlighted the limits of pure neural semantics and showed how multilingual symbolic resources such as BabelNet can complement large language models’ internal representations of meaning. Building on such a multimodal semantic foundation, Regina Zhang (a postdoctoral fellow at Nanyang Technological University) demonstrated how robust and efficient graph learning methods enable structured reasoning over spatial-temporal data in complex domains such as urban computing and biology. Bridging these perspectives, Liu Kang (a full professor at the Institute of Automation, Chinese Academy of Sciences) explored bidirectional integration between symbols and neurons: embedding symbolic knowledge into neural models while extracting interpretable symbolic patterns from learned parameters. Addressing the critical issue of trust and safety, Minghui Dong (Professor and Chief Scientist at the Longgang Institute of Zhejiang Sci-Tech University) argued that truly responsible AI systems must know when not to answer, emphasizing evidence-grounded reasoning in high-risk settings. Extending interpretability to language understanding, Ruihong Huang (associate professor in the Department of Computer Science & Engineering at Texas A&M University) showed how discourse and event structures reveal media bias and subtle contextual meaning that fluent text alone cannot capture. Finally, Zheng Wang (Assistant Chief Expert at Huawei) brought these ideas into practice by presenting an open-source platform for building reliable AI agents with advanced knowledge retrieval and planning.
Collectively, these talks traced a clear arc: from semantic grounding and structured learning, through neural-symbolic integration, to deployable and trustworthy AI agents.
The workshop accepted a diverse set of research papers exploring neural-symbolic integration, reasoning in large models, and knowledge-guided AI, reflecting current challenges and innovations at the interface of symbolic and neural approaches. Topics covered included neural-symbolic reasoning and alignment; chain-of-thought and latent steering methods for guiding model reasoning; graph- and table-based reasoning with large language models; neurosymbolic model tuning and prompt-engineering strategies; theoretical and empirical studies of the internal reasoning mechanisms of neural models; generalist and hybrid task frameworks that integrate symbolic and neural components; and issues of safety, robustness, and interpretability in advanced AI systems. Multiple contributions combined symbolic structures with neural models to enhance robustness, interpretability, and reasoning quality.
The acceptance rate was 33.33%, reflecting a rigorous and selective peer-review process in which submissions were thoroughly evaluated for scientific quality, originality, and relevance. The accepted papers highlight advances in combining symbolic reasoning with neural models, exploring both theoretical foundations and practical frameworks for improved AI reasoning and safety.
The top one or two papers among the accepted submissions were selected for the Best Paper Award, recognizing outstanding scientific quality, novelty, and impact.
Yunfei Long, Yansong Feng, Xianpei Han, Shizhu He, and Tiansi Dong served as cochairs of this workshop. This report was written by Tiansi Dong.
Consistency in Video Generative Models: from Clip to Wild (W22)
The Workshop on Consistency in Video Generative Models addressed a pivotal challenge in video generation, ensuring consistency across four key dimensions: intra-clip world knowledge consistency, inter-clip camera consistency, inter-shot element consistency, and human-in-the-loop preference consistency. The workshop brought together researchers and practitioners to foster collaboration, establish benchmarks, and develop robust methodologies for creating trustworthy and credible video generative systems.
No formal report was filed by the organizers for this workshop.
Creative AI for Live Interactive Performances (W23)
The workshop on Creative AI for Live Interactive Performances brought together researchers, artists and industry practitioners to explore interactive AI systems that collaborate with humans across the performing arts, including music, visual arts, dance and drama. Unlike traditional generative AI that produces finished outputs, the workshop focused on systems that respond dynamically to human input, adapt to context and facilitate ongoing human-AI collaboration in real time.
The workshop on Creative AI for Live Interactive Performances was held on January 26, 2026, at the Fortieth AAAI Conference on Artificial Intelligence in Singapore. The day opened with a brief introduction to the workshop and to the SomaBotics programme from the chairs Kieran Woodward (University of Nottingham), Alicia Falcon-Caro (University of Nottingham), and Richard Ramchurn (University of Nottingham), followed by four oral presentations spanning a range of approaches to creative AI. These included DanceChat, which uses large language models to guide music-to-dance generation; TalkSketch, demonstrating multimodal generative AI for real-time sketch ideation using speech; Move-Me, a wearable AI choreographic companion system; and an autonomous policy debating system. A highlight of the morning session was AI Lens, a live generative AI camera system presented by Dr. Richard Ramchurn (University of Nottingham). The system transforms images in real time, and the live demonstration proved particularly engaging as attendees interacted with the transformations as they happened.
The afternoon featured a keynote by Dr. Jose Luis Contreras-Vidal (University of Houston) on the challenges and opportunities for brain-computer interface-driven generative AI in the performing arts. His work with Balinese Gamelan performers illustrated the potential for brain-computer interfaces to enable new forms of creative expression, with video showing dancers wearing EEG headsets whose brain signals drove live visualisations.
A distinctive feature of the workshop was its emphasis on interactive demonstration and networking. Across two poster and demo sessions, attendees engaged directly with working systems. TalkSketch invited participants to try real-time sketch generation through speech. The autonomous debating system drew interest as people tested its persuasive capabilities in dialogue. Directing Space, presented by researchers exploring architecture as a performer using explainable AI, offered an interactive experience to direct lighting in real time. Other contributions included TradJockey, a live remixing system for traditional music; SightDog, exploring AI-enhanced guide dogs through creative dialogue; human-centred video generation; experiential AI; and live embodied human-AI performance systems.
The closing discussion addressed how to better involve artists in creative AI research and conferences. There was consensus that the most compelling work emerged where technical and creative perspectives intersected and that future events should bring artists along as collaborators rather than end-users. Finally, plans for continuing to build this interdisciplinary community were discussed. Overall, the workshop was successful in bringing together AI researchers, creative professionals and industry to discuss advances in the emerging field of creative AI for live interactive performance.
Accepted papers from the workshop will be published as Springer CCIS proceedings in March 2026.
Kieran Woodward, Steve Benford, Alicia Falcon-Caro, and Richard Ramchurn served as cochairs of this workshop. This report was written by Kieran Woodward.
Deployable AI Workshop (W24)
Artificial Intelligence has become a broad, rapidly advancing research area, and recent generative models have demonstrated strong performance across many tasks. This workshop examined a persistent tension in the field: impressive capabilities in controlled evaluations do not reliably translate into effective use in real-world settings. Across keynotes and discussions, the workshop emphasized that responsible deployment depends on progress in algorithmic methods, system design, and societal governance, with fairness, ethics, explainability, and privacy treated as central requirements rather than afterthoughts.
The workshop convened researchers working on the many facets of deployability, with the shared aim of supporting deployments that contribute to societal benefit. Participation from a global audience reinforced the sense that deployment challenges are widespread and context-dependent. A recurring theme was that model quality, while necessary, rarely suffices on its own: deployment places models inside sociotechnical environments shaped by users, institutions, operational constraints, and policy expectations.
The opening keynote was delivered by Professor Ramayya Krishnan of Carnegie Mellon University, titled “AI Capabilities vs. AI Deployment: Models and methods to fill the gap.” He situated his remarks in the context of rapidly improving model performance and the growing use of benchmarks that reflect realistic tasks. At the same time, he noted that organizations often struggle to integrate these systems into practice. His talk analyzed this capability-deployment gap and made a case for a systems perspective that accounts for user needs, organizational processes, and policy constraints.
A further highlight was a fireside chat on “Global North, South & the Future,” featuring Professor Gopal Ramchurn of the University of Southampton and Responsible AI UK, and Professor Balaraman Ravindran of the Indian Institute of Technology Madras, the Centre for Responsible AI (CeRAI), and the Wadhwani School of Data Science and AI (WSAI). The exchange underscored that deployment conditions vary substantially across regions and institutions, and that these differences shape both priorities and risk. The discussion encouraged participants to approach deployability with attention to infrastructure, governance capacity, and the distribution of benefits and harms, particularly in settings where resources and policy environments differ markedly.
In his keynote, Professor Pradeep Varakantham of Singapore Management University discussed a research agenda for sequential decision-making systems intended to assist humans. He highlighted recent work on constrained reinforcement learning and defenses against adversarial attacks, emphasizing the need for such agents to remain safe under operational constraints and robust to malicious interference.
Professor Pang Wei Koh of the University of Washington delivered a keynote on finding supervision for complex tasks, where solving or even verifying model outputs requires substantial time and expertise, limiting scalable data collection. He described three complementary approaches: learning from relative quality signals through “delta learning,” diagnosing model weaknesses with EvalTree to target data collection, and training a long-form deep research model through reinforcement learning with rubrics that evolve during training to maintain discriminative supervision on difficult tasks.
Dr. Boyi Li of NVIDIA Research and the University of California, Berkeley, presented “FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos,” arguing that motion understanding is a prerequisite for physical reasoning. She introduced an automated pipeline that constructs large-scale motion datasets from videos using object tracking and large language models, and reported that models trained on these datasets achieve substantial gains in motion understanding and spatial reasoning relative to strong baselines.
Across the day, the keynotes offered complementary perspectives on deployable AI, reinforcing the view that deployability is shaped jointly by technical performance, system-level constraints, and societal impact. The contributed program added depth by addressing practical barriers and responsible deployment objectives. The workshop accepted 42 papers from 57 submissions, presented through oral and poster sessions that enabled both broad exchange and detailed technical discussion. The poster sessions were particularly lively, sparking conversations across multiple dimensions of deployability. In total, the accepted papers included 201 authors representing 20 countries, reflecting strong international engagement and diverse deployment contexts. Overall, the workshop highlighted a research agenda in which algorithmic advances are coupled with credible evaluation, operational fit, and explicit commitments to fairness, ethics, explainability, privacy, and security.
Balaraman Ravindran served as the workshop’s program chair. Dr. Gokul S. Krishnan served as the workshop’s session chair. Aravindan Raghuveer, Arpita Biswas, Arun Rajkumar, Chandrashekar L., Devika Jay, Harish Ramaswamy, Krishna Pillutla, and Rahul Vashisht served on the committee.
This report was written by Rahul Vashisht.
Emerging AI Technologies for Music (W25)
The 1st International Workshop on Emerging Artificial Intelligence Technologies for Music (EAIM 2026) explored human-centric AI systems that empower creativity through controllability, interpretability, and collaboration in music composition, performance, and production. The workshop featured four keynote addresses, four oral presentations, a poster session, and an open panel discussion. Workshop proceedings were published in Proceedings of Machine Learning Research (PMLR, Vol. 303).
The 1st International Workshop on Emerging Artificial Intelligence Technologies for Music (EAIM 2026) was held on 26 January 2026 in Singapore as part of the AAAI-26 conference. The event addressed the gap between current AI capabilities and their practical adoption by the creative community, advocating for systems that serve as collaborators rather than replacements for human artists.
The morning keynotes focused on generative pipelines and co-creativity. Ziqian Ning (AI Research Scientist at ByteDance Seed) presented the “DiffRhythm” series, showcasing open-source, non-autoregressive models capable of generating high-fidelity, full-length songs with vocals and accompaniment in seconds. He also introduced “SongEval,” a benchmark utilizing professional musician annotations to evaluate musical aesthetics beyond standard objective metrics. Dorien Herremans (Associate Professor at the Singapore University of Technology and Design) followed with a panoramic view of music AI in the age of large language models. She detailed “Mustango” and “Text2Midi,” two text-to-music systems offering explicit control through textual prompts, including chords and tempo, and discussed paradigms for aligning AI outputs with human aesthetic preferences, such as reinforcement learning, direct preference optimization, and inference-time alignment methods.
Four oral presentations followed, spanning diverse research frontiers. Brandon Carone (PhD candidate, NYU) showed that large language models perform near ceiling on symbolic MIDI but drop sharply on raw audio, identifying the audio encoder as the primary bottleneck for music perception. Najla Sadek and Joseph Bakarji (Music Intelligence Lab, American University of Beirut) demonstrated that an unsupervised autoencoder trained only on Bach’s Well-Tempered Clavier spontaneously recovers the circle of fifths in its latent space. Yi-Hsuan Yang (National Taiwan University) presented “FABIO,” a model for converting vocal timbral techniques such as whispers or screams while preserving speaker identity. Finally, Yuxuan Liu (Xi’an Jiaotong-Liverpool University) introduced “TS-RAMIA,” a grey-box membership inference framework designed to audit whether copyrighted symbolic music was used in model training. Nine papers were then presented at the poster session, spanning music visualisation, voice synthesis, South Asian rhythm learning, and diffusion-based therapeutic sleep music.
The afternoon program featured industry and historical perspectives. Harry Tan (researcher at Universal Music Group) discussed translating research into products, such as “MUSTAC” for automated catalog tagging and “Solos” for scientifically motivated sound therapy. He also discussed a recent paper from his lab, titled “SLAP,” which closes the audio-text embedding gap without large batches of negative samples, and presented his research on lossy compression detection, controllable music restoration, and a two-stage approach to deepfake singer identification. He noted that UMG’s stance is method-dependent: creative augmentation is welcomed; anything that infringes on artist rights is not.
Ethan Manilow (Google DeepMind Magenta) offered a historical analysis tracing music technology from notation through acoustic recording, electric recording, magnetic tape, and DAWs to the present. Drawing on Attali, Benjamin, and McLuhan, he argued that each new technology faced accusations of soullessness before enabling entirely new genres. He positioned AI as both a continuation of the recording story and the start of a distinct generative chapter, one where artists may release models rather than albums, and challenged the researchers present to define what that world should look like.
A closing panel featuring the four keynote speakers, moderated by Keshav Bhandari, explored the pace of progress in music generation, the shift toward creative sandbox interfaces, the promising future of specialist models orchestrated by larger foundational systems, and the importance of human agency. The consensus was that controllability, interpretability, and human-centred design must remain first-class priorities as the field matures.
The proceedings are published in the Proceedings of Machine Learning Research (PMLR, Vol. 303). It is hoped that this inaugural event will serve as a springboard for continued collaboration at the intersection of AI and music.
Dorien Herremans, Keshav Bhandari, Abhinaba Roy, Simon Colton, and Mathieu Barthet served as co-chairs of this workshop. This report was written by Keshav Bhandari.
AI Governance Workshop: Alignment, Morality, Law, and Design (W26)
The workshop examined the growing challenge of governing increasingly autonomous and agentic AI systems. As AI evolves from generative tools to systems capable of independent planning, decision-making, and task execution, existing governance approaches are struggling to keep pace. The program brought together researchers, industry practitioners, and policymakers to develop actionable frameworks for safe, aligned, and accountable autonomous agents under the organizing theme of “Governance by Alignment, Morality, Law and Design,” embedding responsibility mechanisms directly into agentic architectures rather than applying governance as an external constraint.
Over 120 participants convened for the full-day workshop. Sessions were structured to move from foundational perspectives through technical methods to deployment and policy challenges, ensuring both conceptual grounding and practical applicability.
The workshop opened with remarks by Himanshu Joshi (COHUMAIN Labs, UT Austin), who framed the day around the need to move beyond principle-based responsible AI toward measurable, system-level accountability. He emphasized that governance must become continuous and operational, particularly as AI systems assume more autonomous roles in high-stakes environments.
The opening spotlight session highlighted emerging tools making governance technically actionable. Dhari Gandhi (Vector Institute for Artificial Intelligence) presented the Responsible AI Governance (ResAI) platform, an end-to-end framework that operationalizes governance across the AI lifecycle by translating abstract requirements into concrete, practitioner-ready controls. Dmitrii Volkov (Palisade Research) examined the reproducibility of evidence related to malicious AI behavior, positioning experimental rigor and standardized safety evaluation as foundational governance concerns. Djallel Bouneffouf (IBM Research) explored how large language models can simultaneously drive consensus and amplify divergence, underscoring societal risks when generative systems influence group reasoning at scale.
The keynote address was delivered by Professor Simon Chesterman (National University of Singapore), titled Silicon Sovereigns: Artificial Intelligence, International Law, and the Tech-Industrial Complex. Chesterman offered a geopolitical and legal analysis of AI concentration among a small number of technology actors, warning that fragmentation across national regulatory regimes and misalignment between legal frameworks and technical realities risk introducing systemic governance failures. His remarks served as a conceptual anchor for subsequent discussions throughout the day.
Panel discussions connected research advances with deployment realities. The morning panel, moderated by Himanshu Joshi and featuring Artem Petrov, Abhijeet Khadilkar, Professor Simon Chesterman, and Miro Pluckebaum, examined the persistent gap between governance frameworks and operational practice, emphasizing the need for stronger auditability, continuous monitoring, and human-in-the-loop design. Peer-reviewed oral presentations demonstrated strong technical engagement, covering red-teaming of AI agents in high-stakes domains, runtime steering mechanisms for safe reinforcement learning, executable governance through machine-interpretable policy constraints, emergent persuasion risks in LLMs, data attribution techniques, and scalable alignment approaches for multi-agent settings. Four poster sessions held throughout the day showcased path-breaking work by scholars, researchers, and industry leaders on making AI systems more secure and robust.
Afternoon spotlight sessions addressed frontier safety and legal preparedness. Adam Gleave (FAR.AI) discussed advances in red-teaming infrastructure for frontier models. Professor Sara Migliorini (University of Macau) examined liability regimes for rogue AI behavior and legal readiness for autonomous systems. Smrite Goudhaman and Himanshu Joshi presented the Agentic AI Governance “Algorithm to Accountability” HITL Framework, proposing a structured approach for embedding human-in-the-loop oversight into agentic workflows.
The workshop concluded with an industry roundtable moderated by Himanshu Joshi and featuring Smrite Goudhaman, Parishrut Jassal, Archana Vaidheeswaran, and Samir Vats, focusing on organizational readiness for agentic AI adoption, monitoring infrastructure, compliance automation, and cross-functional governance processes. Participants consistently emphasized that ensuring trustworthy autonomous AI requires tightly integrated technical, legal, and socio-technical approaches, alongside sustained collaboration between research, industry, and policy communities.
Baihan Lin, Dhari Gandhi, Djallel Bouneffouf, Franziska Boenisch, Himanshu Joshi, Sara Migliorini, Sedef Kocak, and Shaina Raza served on the Program Committee. Baihan Lin and Himanshu Joshi served as co-chairs of this workshop. This report was written by Himanshu Joshi and Dhari Gandhi.
Foretell of Future AI from Mathematical Foundation (W27)
The workshop “Foretell of Future AI from Mathematical Foundation” was held in January 2026 as Workshop W27 at the 40th AAAI Conference in Singapore. It brought together researchers from mathematics, computer science, and related fields to discuss how rigorous mathematical thinking can inform, guide, and anticipate the development of modern AI systems.
The workshop attracted a diverse audience of faculty, postdoctoral researchers, industry practitioners, and students from different institutions worldwide. The event was organized by Gitta Kutyniok (Ludwig Maximilian University of Munich), Simon See (NVIDIA AI Technology Center), Li Qianxiao (National University of Singapore), Leevan Ling (Hong Kong Baptist University), and Michael K. Ng (Hong Kong Baptist University), together with NVIDIA AI Technology Center MATH4AI members Charles Cheung, Juntao Yang, and Ivan Au Yeung. Their shared aim was to highlight how solid mathematical foundations can help address questions of reliability, interpretability, efficiency, and scalability in artificial intelligence, while also inspiring new AI architectures and methodologies.
The one-day program combined invited and contributed presentations with a poster session and a closing panel discussion. Five invited talks set the stage by surveying key fronts in mathematical AI. Yuan Yao (Hong Kong University of Science and Technology) revisited Smale’s 18th problem in the age of AI, discussing how classical questions in complexity and computation theory resonate with contemporary learning systems. Jianyu Hu (Nanyang Technological University) presented work on physics-informed kernels for partial differential equations, illustrating how kernel methods can bridge numerical analysis and data-driven modeling. Kelin Xia (Nanyang Technological University) introduced the audience to mathematical AI from the viewpoint of topological data analysis and topological deep learning, demonstrating how topological invariants can capture global structure in high-dimensional data. Tan Minh Nguyen (National University of Singapore) examined weight space symmetries in modern deep learning architectures, with implications for optimization and generalization. Finally, Rong Tang (Hong Kong University of Science and Technology) discussed minimax rates of distribution regression, providing statistical learning theory insights for problems where inputs are probability distributions rather than individual data points. These talks were complemented by four contributed presentations on topics such as gradient methods on curved spaces, expressivity limits of transformers, adaptive stepsizing in Bayesian neural networks, and the design of efficient spiking neural networks, providing concrete case studies of how mathematical structure can shape algorithm design.
Beyond the oral sessions, the workshop featured a poster session with ten posters, which created a setting for deeper technical exchanges and was particularly valuable for early-career researchers and students. An outcome of the day was the launch of the MATH4AI Special Interest Group, which brought together a group of professors committed to sustained collaboration on mathematical aspects of AI, including joint research, mentoring, and future community activities. The workshop concluded with a panel discussion featuring Simon See (NVIDIA AI Technology Center), Juan-Pablo Ortega (Nanyang Technological University), Yuan Yao (Hong Kong University of Science and Technology), and Benedict Leimkuhler (University of Edinburgh). The panelists reflected on open challenges and future directions for AI grounded in mathematics, emphasizing closer integration between pure mathematics and AI engineering. They also discussed how mathematics might proactively lead the formulation of new AI paradigms, rather than only explaining or fine-tuning existing algorithms, underscoring the central role of mathematically trained researchers in shaping the next generation of artificial intelligence.
Gitta Kutyniok, Simon See, Li Qianxiao, Leevan Ling, and Michael K. Ng served as cochairs of this workshop. This report was written by Ivan Au Yeung.
Post-AI Formal Methods (W28)
The Post-AI Formal Methods workshop brought together researchers working at the intersection of formal methods and artificial intelligence to explore how rigorous reasoning techniques can support the safety, reliability, and trustworthiness of modern AI systems. As AI increasingly relies on learned, optimized, and generative components rather than explicit programming, new conceptual and technical challenges arise. The event provided a focused forum within AAAI for a growing interdisciplinary community and marked the first dedicated gathering of this emerging area under a common banner.
As AI systems increasingly emerge through automated learning, large-scale optimization, and generative processes rather than direct human programming, longstanding questions of correctness, safety, and interpretability take on renewed urgency. Traditionally, formal methods provided rigorous guarantees for explicitly designed systems; however, modern AI systems often arise from training dynamics that challenge classical specification and verification paradigms. The workshop was convened to assess how formal reasoning methodologies may adapt to address this transition.
Researchers engaging at the intersection of formal methods and AI have contributed significantly to AAAI; however, their participation is often scattered across various tracks and sessions. Many have indicated the importance of a dedicated venue to facilitate more comprehensive discourse among peers confronting comparable conceptual and technical challenges. Consequently, this inaugural edition established a focal point within AAAI, fostering sustained dialogue among communities in symbolic reasoning, verification, and machine learning. The workshop was further strengthened by the guidance of a distinguished steering committee including Professor Moshe Y. Vardi, Dr.-Ing. Swen Jacobs, Professor Guy Katz, Associate Professor Bettina Konighofer, Professor Cesar Sanchez, and Professor Sriram Sankaranarayanan, whose support contributed to shaping its vision and outreach.
Held as a full-day, in-person event, the workshop maintained steady engagement throughout the day, with a minimum attendance of 35 participants and a peak of over 55. The response to the call for papers demonstrated broad international interest. From 20 submissions representing institutions across North America, Europe, and Asia, 16 papers were accepted across two complementary tracks: one highlighting AAAI main-track submissions and recently published work from leading venues, and another dedicated to exploratory contributions, extended abstracts, and emerging research directions. Each submission was reviewed by at least two members of a geographically and methodologically diverse evaluation committee, reflecting the breadth of expertise required in this interdisciplinary area.
The invited speakers list reflected the scope and diversity of research within the field. Dr. Soonho Kong (Amazon Web Services) discussed advances in automated reasoning and formal verification tools relevant to large-scale AI systems. Professor Hana Chockler (King’s College London) presented perspectives on causality, explainability, and formal reasoning for AI behavior. Professor Jin Song Dong (National University of Singapore) addressed verification and assurance techniques for complex and safety-critical systems. These talks illustrated both the progress of formal reasoning techniques and the new challenges posed by learning-enabled and generative AI systems.
The workshop concluded with a panel discussion led by the organizers. Panelists reflected on the evolving role of formal reasoning in an era where AI systems are increasingly opaque and emergent. The discussion addressed a central question: what does it mean to trust systems that we did not explicitly design? Participants emphasized that empirical performance alone cannot guarantee reliability, particularly in high-stakes settings, and that formal verification and logical reasoning provide essential foundations for accountability, transparency, and principled assurance in AI.
To encourage early-career engagement, the workshop introduced the P-AI-FM Registration Awards, supported by CISPA, which were awarded to Zhensu Sun and Belona Sonna. The organizers gratefully acknowledge the support of the CISPA Helmholtz Center for Information Security, Amazon Web Services, and the IMDEA Software Institute.
The robust attendance and dynamic engagement at the inaugural workshop highlighted a clear demand for a dedicated forum within AAAI, with participants advocating for its continuation as a recurring event. The lively discussions and diverse representation underscored the growing importance of formal reasoning about AI systems—particularly those not explicitly designed by humans but still requiring trust and understanding—marking this intersection as a central theme for the AI community and establishing a foundation for ongoing collaboration and future editions.
Noa Izsak, Andoni Rodriguez, and Djordje Zikelic served as cochairs of this workshop. This report was written by Noa Izsak.
Spatial Reasoning and Therapeutics with Artificial Intelligence: From Omics to Imaging (W29)
The Workshop on Spatial Reasoning and Therapeutics with AI addressed the application of artificial intelligence to spatial reasoning problems in biomedical research, spanning from omics data to imaging. The workshop explored how AI methods can advance understanding of spatially resolved biological systems and accelerate therapeutic discovery.
No formal report was filed by the organizers for this workshop.
AI for Urban Planning (W30)
The 2nd Workshop on AI for Urban Planning was held at Singapore Expo, Singapore, on January 26, 2026, and included two invited keynote speeches, five oral paper presentations, three task/problem definition presentations, two tutorials, two poster sessions, and a panel discussion. This report summarizes the workshop and was submitted by the workshop chair.
The workshop commenced its agenda with Opening Remarks. This introduction welcomed researchers, practitioners, and policymakers to the event, setting the stage to explore innovative AI-driven solutions for urban planning and to foster a co-learning paradigm known as the “New Urban Science” aimed at building smarter, more equitable, and sustainable cities.
Dr. Yu Zheng delivered the first invited talk, titled “Urban Computing: Enabling Spatio-temporal Intelligences in Cities”. He presented a data-centric computing framework designed to unlock the potential of massive, dynamic spatial and spatio-temporal big data, discussing key research challenges such as capturing spatio-temporal properties in AI models and cross-domain multimodal data fusion in the physical world.
Following this, Professor Tan Yigitcanlar presented the second invited talk on “Generative AI in Urban Planning”. His keynote explored how generative AI, large language models (LLMs), and agent-based architectures are reshaping planning by augmenting human capacity to analyze complexity, while also introducing the emerging paradigm of near-real-time “quantum cities”.
The morning continued with a Tea Break and the Poster Sharing I session, which gave attendees an opportunity to network and to engage directly with researchers presenting posters on their ongoing work.
The workshop then featured the “Task / Problem Definition” session, which successfully highlighted three distinct papers. The presented works included “UrbanControlNet: Reimaging Global Urban Development with Satellite Imagery,” “Urban-Net: A Standardized Dataset for AI for Sustainable City Planning,” and “Charting the Future of Urban Planning with AGI: A Five-Stage Research Agenda”.
The workshop then broke for lunch, giving attendees time to rest, network, and informally discuss the morning’s presentations before the afternoon sessions began.
The afternoon resumed with the primary “Paper Presentation” block, which showcased five research papers. The comprehensive presentations covered a theory-grounded benchmark for urban plan quality (PlanKG), generative AI adoption by U.S. local governments, hidden archetypes of Seoul’s urban morphological patterns, few-shot urban AI modeling, and public transport route generation for equitable access.
Attendees then participated in the second Tea Break with Poster Sharing II, which offered another dedicated window for informal discussion and the dissemination of new research findings via poster presentations.
Following the break, the “Tutorial” session focused on practical applications through two paper presentations. The tutorials demonstrated the evaluation of LLM-generated survey data for urban studies using a four-tier protocol, as well as the implementation of a ChatGPT-assisted planning support system platform.
The Panel Discussion featured a highly interactive, 30-minute debate among six experts driven by a “2+1” moderation strategy to maximize high-impact dialogue. The panelists explored core themes including the value of generative design versus solving practical policy inefficiencies, the need to identify an “ImageNet” standardized benchmark problem for the field, and ways to bridge the gap between AI’s need for massive data and the messy, human-centric reality of urban planning. They also debated the future role of human planners in the age of agentic AI—weighing the benefits of an AI “co-pilot” against the risks of an “auto-pilot” scenario—before concluding with rapid-fire “One-Year Moonshot” visions for collaborative breakthroughs expected by the 3rd Workshop at AAAI 2027.
Finally, the workshop concluded with Closing Remarks. This brief final session officially wrapped up the 2nd Workshop on AI for Urban Planning, concluding a successful day of interdisciplinary knowledge exchange between AI researchers, urban planners, and policymakers.
AI for Environmental Science (W31)
Recent advances have demonstrated the transformative potential of AI across a broad range of environmental applications. Despite these advancements, the integration of AI into environmental research remains fragmented across disciplines. This workshop seeks to advance the integration of AI into environmental science by providing a platform for interdisciplinary exchange, fostering collaboration across research communities, and accelerating the development of impactful, real-world solutions that address pressing global environmental challenges.
The half-day workshop was supported by TAIAO (https://taiao.ai/). It included a keynote talk, paper presentations, posters, and panel discussions. The keynote speaker, Prof. Woon-Seng Gan, Professor of Audio Engineering and Director of the Smart Nation TRANS Lab at Nanyang Technological University, delivered a talk titled “Listening to the City: Edge AI and Acoustic Intelligence for Sustainable Urban Environments.” The talk highlighted the importance of the acoustic environment as a critical yet often overlooked dimension of urban sustainability. Drawing on real-world deployments in Singapore, Prof. Gan discussed how signal processing and edge AI, powered by lightweight deep learning models, enable continuous environmental noise monitoring in dense urban settings. Applications include real-time urban soundscape classification and energy-efficient Active Noise Control (ANC) for natural ventilation, demonstrating how acoustic intelligence can support trustworthy decision systems and healthier cities.
Fifteen oral presentations and 14 poster presentations then introduced novel AI methods for environmental science using various AI tools, including large language models, foundation models, graph neural networks, spatio-temporal transformers, diffusion models, transfer learning, retrieval-augmented generation (RAG), causal inference methods, and energy-efficient edge AI systems. These contributions covered a wide range of applications, including hydrological and weather forecasting, wildfire risk prediction, air quality and groundwater contamination modelling, methane plume detection, shoreline and coastal dynamics analysis, solar irradiance prediction, urban noise monitoring, waste detection, forest carbon estimation, biodiversity monitoring and wildlife re-identification, smart grid privacy analysis, and energy-efficient control systems, among others.
The panel discussion, “Bridging the Gap: How Can AI Truly Serve Environmental Scientists?”, was led by Prof. Yun Sing Koh from the University of Auckland and featured panelists Assistant Prof. Gianmarco Mengaldo from the National University of Singapore, Assistant Prof. Luca Dal Zilio from Nanyang Technological University, Singapore, and Dr. Chen Wang from the National Institute of Water and Atmospheric Research, New Zealand. The discussion moved beyond technical promise and examined how AI can be designed, deployed, and evaluated in ways that genuinely serve environmental science needs.
Dr. Qian Liu (University of Auckland), Dr. Di Zhao (University of Auckland), Dr. Chen Wang (National Institute of Water and Atmospheric Research, New Zealand), Prof. Yun Sing Koh (University of Auckland), and Prof. Albert Bifet (University of Waikato) served as cochairs of this workshop. This report was written by Dr. Qian Liu, Dr. Di Zhao, and Prof. Yun Sing Koh.
Artificial Intelligence with Biased or Scarce Data (W32)
The Workshop on Artificial Intelligence with Biased or Scarce Data addressed the fundamental challenges that arise when training and deploying AI systems in real-world settings where data is limited, imbalanced, or systematically biased. The workshop explored methods for robust learning, data augmentation, and fairness-aware modeling to improve the reliability and equity of AI systems under realistic data constraints.
No formal report was filed by the organizers for this workshop.
Linguistic and Cognitive Approaches to Dialogue Agents (W33)
LACATODA 2026 explored the intersection of cognitive modeling and modern large language models (LLMs) to enhance the social, emotional, and reasoning capabilities of conversational agents. The workshop featured 12 accepted presentations and one invited talk, and addressed the critical transition from surface-level linguistic fluency in dialogue agents to deeper, cognitively grounded interaction. Many presentations showed that while current LLMs demonstrate remarkable generative success, they often struggle with multi-step reasoning, personality consistency, and emotional intelligence.
A significant portion of the program focused on the social and psychological dimensions of artificial intelligence. James Hale and his team at the University of Southern California (USC) evaluated LLMs as dispute mediators, finding that they could successfully sense escalation indicators and generate messages preferred 2-to-1 by human observers over those of novice human mediators. Research by Bin Han (USC) and Hsien-Te Kao (Aptima, Inc.) examined the stability of personality expression in LLMs. Their findings suggested that while models can be conditioned with personality prompts, their behavioral realization remains highly context-sensitive, reflecting human-like adaptation rather than fixed, rigid traits. Complementing this, Shinji Muraji and colleagues from Hokkaido University investigated emotion-conditioned response generation using facial expression labels from visual novel data, discovering that explicit emotional conditioning significantly enhances the perceived “character-likeness” of role-playing agents.
The workshop also addressed practical security, utility, and domain-specific challenges. Rene Melendez from the Kitami Institute of Technology demonstrated the ease of generating realistic Japanese phishing emails with LLMs, underscoring an urgent need for localized defense models for non-Roman script languages. Addressing safety from a different angle, Rafal Rzepka of Hokkaido University presented lightweight embedding guardrails designed to prevent functional misalignment and unnecessary costs in specialized domains like export control by filtering out off-topic queries before they reach expensive LLM cores. Further extending the scope of utility, Yuta Nakajima from the Kitami Institute of Technology proposed an end-to-end method for assessing product review helpfulness, utilizing both subjective perception and objective information density to support consumer decision-making.
Technical highlights included novel methods for enhancing LLM reasoning and internal reliability. Murilo da Luz and colleagues from the Federal University of Goias introduced PREGU (Partial Reasoning Guided by Uncertainty), a method that monitors output distribution entropy during generation to trigger localized latent-space searches when the model is uncertain. Similarly, Soyeong Jeong from KAIST presented a multi-agent framework that adaptively coordinates specialized agents (focusing on factuality, personalization, and coherence) to proactively refine responses without requiring explicit user correction. Additional contributions explored structural components of ironic statements through neuro-symbolic approaches and optimized dialogue disentanglement through new context-aware assignment strategies. Together, these studies highlighted the ongoing need for dialogue agents that align more closely with human cognitive structures and social norms.
Dr. Rui Mao from Nanyang Technological University delivered an invited talk titled “Metaphorical Cognition and Its Computational Practices.” He argued that metaphor is not merely a linguistic ornament but a fundamental cognitive mechanism through which individuals organize concepts, interpret abstract domains, and make decisions. Through the MetaPro framework, Dr. Mao demonstrated how large-scale metaphor processing can reveal latent cognitive patterns in discourse across critical domains such as AI, finance, and mental well-being, offering a bridge between humanistic insight and high-performance data-driven analysis.
Rafal Rzepka, Michal Ptaszynski, and Pawel Dybala served as chairs of this workshop, while Siaw-Fong Chung handled the local organization. This report was written by Rafal Rzepka.
Navigating the Model Uncertainty and the Rashomon Effect: From Theory and Tools to Applications and Impact (W34)
At AAAI 2026, the MURE workshop (Model Uncertainty and the Rashomon Effect) brought together researchers and practitioners interested in model multiplicity. Named after the classic film where witnesses provide differing but equally plausible descriptions of the same event, the “Rashomon Effect” in machine learning describes situations where many distinct models achieve similarly strong predictive performance, yet imply different explanations, behaviors, or downstream decisions. Our goal was to make this phenomenon more actionable—to clarify when it arises in practice, why it matters for reliability and accountability, and what kinds of tools and interfaces can help stakeholders reason about it.
The program included invited talks, oral presentations, posters, and a community perspectives session. A recurring theme across all formats was that multiplicity is not a niche phenomenon: it shows up naturally in real pipelines due to underspecification, correlated features, data shifts, and flexible model classes, especially in high-stakes settings where interpretability, fairness, recourse, and safety are central. Talks and discussions spanned both methodological angles (e.g., how to characterize or approximate sets of good models; how to probe alternative rationales) and application-driven motivations.
The invited talks spoke to significant potential for positive impact from the study of model multiplicity. Brian Lim (Associate Professor, National University of Singapore) brought a particularly hopeful perspective from the HCI community. A key takeaway was that multiplicity should be treated as a design opportunity. In this view, explanations are interfaces for iteration and control, and multiplicity motivates richer mechanisms for stakeholder interaction. Emily Black (Assistant Professor, New York University) focused her talk on domain expert stakeholder interactions, sharing her advice from adapting the Rashomon Effect to directly inform political policies around model fairness. Yilin Ning (Senior Research Fellow, Centre for Biomedical Data Science, Duke-NUS Medical School) discussed solutions to technical challenges in trustworthy health AI, including support for robust variable importance and transparent, accuracy-preserving fair model selection.
The oral session and community perspectives session were well-attended and generated strong discussion. We also ran an interactive “multiplicity map” poster, where we asked attendees to post where they encountered multiplicity in their own work. Participation was modest, but we did collect several concrete examples that we plan to use as seeds for future community-building and case studies.
The workshop format was intentionally discussion-forward, designed to surface concrete “multiplicity in the wild” case studies and connect them to methodological gaps. Throughout these sessions, conversations consistently converged on shared needs: better methods for computing multiplicity, datasets that inherently exhibit it, and user-facing tools for exploring alternative rationales and downstream decisions. We plan to carry this momentum forward by organizing follow-up activities focused on addressing these gaps and fostering ongoing, cross-disciplinary collaboration.
Overall, MURE helped align vocabulary, surfaced concrete open problems around evaluation and communication, and sparked new connections and potential collaborations.
Lesia Semenova, Chudi Zhong, Varun Babbar, Hayden McTavish, Lucas Monteiro Paes, and Zachery Boner organized this workshop. This report was written by Lesia Semenova and Hayden McTavish.
Language Models for Underserved Communities (W35)
The Workshop on Language Models for Underserved Communities (W35) convened at the 2026 Association for the Advancement of Artificial Intelligence Conference in Singapore. The event gathered a multidisciplinary coalition of researchers, policymakers, and industry practitioners bound by a unified objective: ensuring that rapid advancements in natural language processing actively bridge the global digital divide rather than exacerbate existing inequalities.
The discussions situated the current explosion of large language models within a broader historical context of technology deployment, emphasizing that without intentional design and localized evaluation, computational innovations have historically marginalized low-resource populations. Attendees critically examined how the artificial intelligence community can pivot toward more equitable development practices.
The workshop commenced with a series of keynote presentations that established a rigorous foundation for the day’s proceedings. Dr. Jian Gang Ngui (AI Singapore) opened the session by outlining the critical need for a robust artificial intelligence infrastructure designed explicitly for the public interest, drawing heavily on the unique linguistic and cultural landscape of Southeast Asia. Following this, Professor Simon Chesterman (National University of Singapore) provided a sobering examination of artificial intelligence governance, discussing the practical realities of regulation and the complexities of measuring compliance across varied jurisdictions. Professor Tan Zhi-Xuan (National University of Singapore) then introduced a compelling framework termed “contractualist alignment,” advocating for a paradigm where artificial intelligence systems are governed by negotiated, role-specific norms rather than monolithic, universal preferences. Concluding the keynotes, Elina Noor (Carnegie Endowment for International Peace) illuminated the often-unspoken political and historical dimensions embedded within modern language models, urging attendees to recognize the inherent biases in contemporary training data and model architectures.
Throughout the event, several exceptional research papers were highlighted for their significant contributions to the field. Isaac Lim, Shaun Khoo, Watson Chua, Jessica Foo, Jia Yi Goh, and Roy Ka-Wei Lee (GovTech Singapore and Singapore University of Technology and Design) presented a robust approach to safety alignment in low-resource varieties of English, utilizing Singlish as a case study. Their work demonstrated the urgent need for culturally nuanced safety protocols that move beyond standard English benchmarks. Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, and Serge Stinckwich (United Nations University Institute in Macau and IBM Research) presented an evaluation of the policymaking capabilities of large language models, exploring how these tools might be utilized responsibly in public sector decision-making. Additionally, Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, and David Mortensen (Sungkyunkwan University, Carnegie Mellon University, University of British Columbia, and Chungnam National University) explored language generalization on unseen low-resource varieties by harnessing linguistic dissimilarity rather than relying solely on cross-lingual transfer.
The poster sessions underscored the workshop’s commitment to equitable and resource-efficient natural language processing for underserved communities. Filip Trhlik, Andrew Caines, and Paula Buttery (University of Cambridge) won Best Poster for showcasing bias dynamics in scaled-down language models, proposing a compute-efficient sandbox to democratize pre-training debiasing efforts. Additionally, Davide Gabrielli, Simone Sestito, and Iacopo Masi (Sapienza University of Rome) were runners-up for their innovative approach to inverse language modeling aimed at developing robust and grounded systems. The broader poster presentations spanned several vital themes. To overcome data scarcity, researchers introduced novel frameworks for low-resource machine translation, proposing resource-efficient data augmentation for African and Southeast Asian minority languages, alongside revitalization strategies for endangered dialects. Another critical theme was culturally aware evaluation. Presenters shared benchmarks explicitly designed to expose multilingual safety gaps, assess systemic bias in detection models, and rigorously measure hallucination risks within public service and youth-facing applications. Finally, the sessions highlighted the tangible impact of inclusive artificial intelligence, especially recent large language models, through domain-specific innovations, featuring frameworks tailored for behavioral and maternal mental health support, as well as specialized tools for sign language synthesis. Collectively, these diverse contributions demonstrated the community’s profound dedication to building language technologies that are context-aware, rigorously evaluated, and actively beneficial to historically marginalized populations.
Sang Truong, Sarah Luger, Rafael Mosquera, Duc Nguyen, Fagun Patel, Francesca Vera, Tracy Navichoque, and Sanmi Koyejo served as cochairs of this workshop. This report was written by Duc Nguyen.
LLM-based Multi-Agent Systems: Towards Responsible, Reliable, and Scalable Agentic Systems (W36)
The LaMAS 2026 Workshop focused on the emerging paradigm of Large Language Model-based Multi-Agent Systems (LaMAS), where multiple LLM agents collaborated to solve complex tasks. As interest in agentic AI grew rapidly, the workshop addressed foundational challenges arising from agent interaction, including coordination strategies, evaluation of emergent behaviors, and robustness under dynamic conditions. Particular emphasis was placed on safety, alignment, and responsible design, with discussions examining how to systematically understand failure modes and establish transparent, verifiable, and trustworthy multi-agent LLM systems.
The rapid shift from standalone large language models to interacting, tool-using agents introduced a new paradigm of LLM-based multi-agent systems. While these systems demonstrated impressive capabilities, they also exposed unresolved challenges related to coordination, emergent behaviors, and alignment in multi-agent settings. Research in large language models and multi-agent systems had largely progressed in parallel, with limited integration of safety principles and evaluation methodologies across the two areas. The LaMAS 2026 Workshop was established to bridge this gap and to provide a focused forum for advancing the reliable and responsible design of LLM-based multi-agent systems.
The workshop brought together approximately 200 participants for a full-day program, reflecting strong interest in LLM-based multi-agent systems within the AAAI community. Around 30 papers were accepted through peer review, complemented by four keynote talks and a panel discussion.
The accepted papers broadly clustered around three themes. The first examined coordination and orchestration mechanisms for multi-agent systems, including communication protocols, task decomposition, and system observability. The second focused on safety, robustness, and alignment, addressing issues such as hallucination mitigation, red teaming, ethical design, and vulnerability analysis. The third explored evaluation and emergent reasoning, introducing new benchmarks and analytical frameworks for understanding memory limits, disagreement, persuasion, and collective intelligence in interacting LLM agents. The workshop also recognized outstanding contributions through a Best Paper Award sponsored by Amazon Web Services (AWS), highlighting particularly innovative and impactful research among the accepted submissions. In addition, the workshop was supported by Responsible AI UK (RAi UK), whose commitment to advancing trustworthy and responsible AI research closely aligned with the workshop’s focus on safety, accountability, and reliable multi-agent system design.
The four keynote talks offered complementary perspectives spanning technical foundations, system design, and governance. Bruce Yang, Founder and CEO of Agnes AI, Singapore, discussed the development of Southeast Asian foundation models and the design of memory-driven agentic systems to enhance collaboration in real-world environments. Botao “Amber” Hu, a PhD candidate at the University of Oxford, examined the challenge of accountability in decentralized and sovereign agent ecosystems, questioning how responsibility could be assigned when agents acted autonomously. Michael Wooldridge, Ashall Professor of the Foundations of Artificial Intelligence at the University of Oxford, reflected on lessons from decades of multi-agent systems research and their relevance to the rapidly evolving landscape of LLM-based agents, emphasizing enduring principles alongside emerging coordination challenges. Elham Tabassi, Director of the Artificial Intelligence and Emerging Technology Initiative at the Brookings Institution, addressed accountability as a design constraint in distributed multi-agent systems and argued that responsibility should be embedded into technical architectures from the outset rather than treated as a post hoc policy layer.
The program also featured a panel discussion with Sarvapali D. (Gopal) Ramchurn (University of Southampton), Wan Sie Lee (Infocomm Media Development Authority, Singapore), Stefano V. Albrecht (DeepFlow, London), Ramayya Krishnan (Carnegie Mellon University), and Mengyue Yang (University of Bristol). The discussion explored fundamental research questions, emerging industrial needs, and long-term societal implications of LLM-based multi-agent systems. Panelists emphasized the importance of rigorous evaluation, safety-by-design principles, and collaboration across academia, industry, and policy communities.
Several cross-cutting themes emerged from the workshop. Participants highlighted the need for principled coordination mechanisms, systematic evaluation of emergent behaviors, and safety-by-design approaches that integrated accountability directly into multi-agent architectures. Discussions also underscored the importance of bridging foundational multi-agent theory with the practical realities of LLM-based systems. The strong engagement throughout the day reflected growing momentum in this field and reinforced the need for sustained collaboration to advance reliable and responsible agentic AI.
Shuang Ao and Muning Wen served as cochairs of this workshop. This report was written by Shuang Ao and Muning Wen.
Machine Ethics: From Formal Methods to Emergent Machine Ethics (W37)
The Workshop on Machine Ethics examined how ethical principles can be incorporated into AI systems, spanning approaches from formal specification and verification to emergent ethical behavior arising from learning and interaction. The workshop brought together researchers working on value alignment, normative reasoning, and the design of AI systems that behave in accordance with human ethical standards.
No formal report was filed by the organizers for this workshop.
Machine Learning for Wireless Communication and Networks (W38)
The Workshop on Machine Learning for Wireless Communication and Networks explored how machine learning techniques can address the growing complexity of wireless communication systems and networks. The workshop covered topics including intelligent resource management, signal processing, network optimization, and the application of deep learning and reinforcement learning to next-generation wireless infrastructure.
No formal report was filed by the organizers for this workshop.
Neuro for AI & AI for Neuro: Towards Multi-Modal Natural Intelligence (W39)
This workshop brought together researchers from artificial intelligence, computational neuroscience, and neuromorphic engineering to explore the bidirectional relationship between biological neural systems and machine learning. The program was organized around two complementary themes: how principles from neuroscience can inspire more efficient and robust AI architectures, and how advanced AI methods are transforming neuroscience research itself.
The workshop covered both directions in depth. Presentations addressed how AI methods can make sense of large-scale neural datasets, from cell-type atlases to whole-brain connectivity resources. Discussions also examined how biological principles might guide the design of more energy-efficient and interpretable AI systems, a pressing challenge given the steep compute demands of modern deep learning.
Andreas Tolias of Stanford University opened with a compelling vision: building “digital twins” of the brain. By combining large-scale neural recordings with AI-based predictive models, his group has created systems that link stimuli, neural activity, and behavior in a single framework. These digital twins make it possible to run experiments in silico at a scale that would simply be impossible in a living brain. Closed-loop experiments then serve as a rigorous way to validate what these models reveal about neural representation.
Mitya Chklovskii of the Flatiron Institute challenged a foundational assumption in modern AI: that the ReLU neuron is a good model of biological computation. He introduced the Rectified Spectral Unit (ReSU), which instead learns the stochastic dynamics underlying its inputs from short stimulus trajectories, and does so without backpropagation. A three-layer ReSU network trained on natural-scene translations was able to reproduce key computations in the Drosophila motion-vision pathway, suggesting that this framework captures fundamental biological sensory processing. Adrienne Fairhall of the University of Washington took a related but distinct angle, asking how network structure might self-organize. She showed that local learning rules can give rise to common dynamical motifs without any external supervision, pointing to a class of inductive biases that may be innate to biological networks.
Turning to AI-driven neuroscience, Martin Schrimpf of EPFL presented Brain-Score, an open platform with over 100 neural and behavioral benchmarks for evaluating AI models against brain data. A clear pattern emerged across many models: better task performance tends to correlate with representations that more closely resemble those found in the brain. Adeel Razi of Monash University pushed this idea further into the physical world with DishBrain, a platform in which living neuronal cultures learn to act in closed-loop environments. The work raises fascinating questions about where the boundary between biological and artificial intelligence really lies.
Elisa Donati of ETH Zurich showed how principles such as excitatory-inhibitory balance and sparse temporal coding can be directly translated into spiking neuromorphic chips for low-power neural interfacing. Guozhang Chen of Peking University offered a structural perspective, presenting a graph variational autoencoder that learns compact latent representations of connectomes. The resulting low-dimensional blueprint can generate novel, biologically plausible circuits and also substantially reduce the number of trainable parameters in AI models. On the AI system design side, Mike Zheng Shou of the National University of Singapore introduced Show-o, a single 1.3-billion-parameter transformer that handles both multimodal understanding and generation within one unified model, drawing a natural parallel to how a single brain supports integrated perception and action alike.
Across sessions, a recurring theme was the importance of open-source software and reproducible research practices. Presenters highlighted tools for calcium imaging analysis, electrophysiological data pipelines, and brain simulation software as critical shared infrastructure. The workshop also surfaced productive open questions around causally interpretable and mechanistically grounded models of neural circuits.
The day concluded with a lively panel discussion that surfaced opposing views in NeuroAI. Some participants questioned whether neuroscience is meaningfully contributing to AI in its current form, while others contended that principled grounding in cortical computation remains a largely underexplored source of insight. On the question of what a genuine breakthrough in NeuroAI would look like over the next five to ten years, panelists pointed to different bottlenecks: new theory, richer multimodal datasets, and neuromorphic hardware each had their advocates. There was broad agreement, however, that clearer benchmarks are needed for what it means for a model to genuinely illuminate brain function, rather than simply correlate with it.
Reza Abbasi-Asl (UCSF), Asim Iqbal (Tibbling Technologies), Shinya Ito (Allen Institute), Anton Arkhipov (Allen Institute), Sophia Sanborn (Stanford University), Naomi Donovan (UCSF), and Macarena Aloi (Allen Institute) served as co-organizers of this workshop. This report was prepared by Reza Abbasi-Asl and edited and approved by all co-organizers.
Neuromorphic Intelligence: From Algorithms to Systems (W40)
While Artificial Intelligence has achieved extraordinary milestones, traditional deep learning frameworks often struggle with high power consumption, latency in real-time processing, and a lack of flexibility in unpredictable environments. To address these challenges, the field is turning toward neuromorphic intelligence, which represents a compelling paradigm shift inspired by the brain’s inherently efficient, parallel, and event-driven processing capabilities.
This workshop served as a platform to showcase interdisciplinary breakthroughs in neuromorphic systems, spanning the full spectrum from advanced software algorithms to specialized hardware architectures. By exploring the synergistic co-design of these elements, the event aimed to replicate biological advantages to overcome the physical and computational bottlenecks of conventional AI. The scope of this exploration included the transition from frame-based to event-based data acquisition in neuromorphic sensing, as well as the development of spiking neural networks and bio-inspired topologies. Furthermore, the workshop emphasized the critical importance of algorithm-hardware co-design for optimizing low-power execution on neuromorphic chips and the establishment of rigorous standards for benchmarking neuromorphic performance.
The workshop attracted 8 submissions in the realm of Neuromorphic Intelligence. After careful review by the organizing committee, 4 papers were accepted for presentation at the workshop.
On January 27, 2026, this workshop took place at Conference H of the Singapore EXPO. The poster session was hosted on Level 2.
The morning session immediately delved into the technical complexities of event-driven perception, featuring Prof. Gim Hee Lee from the National University of Singapore, who explored how event-based data can be harnessed for depth and motion sensing. This was followed by Prof. Lin Wang of NTU, who bridged the gap between bio-inspired sensing and modern Foundation AI models, specifically addressing the hurdles of Embodied AI. The first half of the day concluded with Prof. Zhiwei Xiong from USTC, who demonstrated the potential of event-based structured light for high-speed, high-dynamic-range 3D sensing, followed by a technical oral presentation from Yunshan Qi on efficient neural radiance fields.
After a midday break, the afternoon proceedings resumed with a focus on Spiking Neural Networks (SNNs) and hardware efficiency. Daye Kang from Seoul City University presented two significant papers exploring adaptive spiking transformers and reinforcement-learned dynamic execution for Swin-B architectures. The international perspective continued with Dr. Benoit Cottereau from CNRS, France, who discussed efficient scene understanding.
The final segment of the workshop highlighted the practical constraints of AI deployment. Dr. Manon Dampfhoffer from CEA, France, detailed the synergy between Graph Neural Networks and neuromorphic sensors to achieve low-latency edge AI. The technical sessions were rounded out by Dr. Danda Paudel from INSAIT, providing further insights into the future of sensing technology.
The day concluded with a formal closing statement by Dr. Yueyi Zhang, summarizing the breakthrough contributions made by the accepted papers—ranging from SAR image classification to swarm macro behaviors—and reinforcing the workshop’s success in fostering global collaboration in neuromorphic intelligence.
Yueyi Zhang, Zongwei Wu, Lin Wang, Zhiwei Xiong, and Pascal Vasseur served as co-chairs of this workshop. This report was written by Yueyi Zhang.
New Frontiers in Information Retrieval (W41)
The Workshop on New Frontiers in Information Retrieval explored emerging research directions at the intersection of information retrieval and artificial intelligence. The workshop addressed how advances in large language models, neural retrieval architectures, and multimodal systems are reshaping how information is indexed, searched, and surfaced across diverse domains and modalities.
No formal report was filed by the organizers for this workshop.
Next-Gen Code Development with Collaborative Artificial Intelligence Agents (W42)
The Workshop on Next-Gen Code Development with Collaborative AI Agents examined how AI agent systems are transforming software development workflows. The workshop addressed topics including automated code generation, multi-agent collaboration for programming tasks, AI-assisted debugging and testing, and the integration of large language models into professional software engineering practices.
No formal report was filed by the organizers for this workshop.
Orchestrating Synthesized Human and Artificial Intelligence-Agentic Workflows: Artificial Intelligence Agency Benefits, Disruptions and Management (W43)
The Workshop on Orchestrating Synthesized Human and AI-Agentic Workflows examined the benefits, disruptions, and management challenges that arise when human and AI agents collaborate within shared workflows. The workshop addressed how to design, coordinate, and govern hybrid systems in which autonomous AI agents and human workers jointly execute complex, multi-step tasks across organizational and technical boundaries.
No formal report was filed by the organizers for this workshop.
Personalization in the Era of Large Foundation Models (W44)
The workshop brought together researchers and practitioners to examine how large foundation models can move beyond “one-size-fits-all” behavior toward systems that adapt to individual users. Through invited keynotes, oral presentations, and poster sessions, the program highlighted emerging methods for scalable personalization, lifelong user modeling, and trustworthiness considerations in real-world deployments.
“Personalization in the Era of Large Foundation Models” was held on January 27, 2026, co-located with AAAI 2026 in Singapore, as a full-day event (9:00-17:00) at the Singapore EXPO (Level 2, Peridot 201), with dedicated poster areas supporting informal discussion throughout the day. The workshop’s central premise was that while foundation models perform strongly across language, vision, and multimodal tasks, they often fail to reflect individual preferences, behavioral patterns, and contextual needs—creating a gap between general capability and personalized user experience. The organizers positioned the workshop as a forum to advance theory, scalable architectures, evaluation, lifelong learning, and ethical considerations for next-generation AI systems that “adapt to and grow with” individual users.
The workshop bridged foundational and applied perspectives on personalizing large foundation models. Topics ranged from theory (generalization under user heterogeneity and privacy-utility trade-offs) to practical infrastructure (benchmarks, datasets, metrics, and evaluation protocols). The program also highlighted scalable algorithmic directions such as parameter-efficient adaptation, preference alignment, retrieval-augmented personalization, federated or on-device approaches, and agentic personalization. Across these threads, long-term memory and lifelong learning—particularly preference drift and catastrophic forgetting—were emphasized alongside trustworthiness requirements in safety, fairness, transparency, and privacy-preserving data practices.
The keynote talks connected research advances with deployment realities across domains. Dr. Jay Katukuri (JPMorganChase) discussed personalization in finance and the role of rich metadata; Prof. Hamed Zamani (University of Massachusetts Amherst) emphasized evaluation and retrieval-based personalization; and Prof. Xiangnan He (University of Science and Technology of China) outlined a vision of “personal intelligence” grounded in memory and continual adaptation. Dr. Quanyu Dai (Huawei) highlighted socially intelligent personal assistants with long-term memory, while Prof. Yulan He (King’s College London) compared retrieval, model adaptation, and preference-based alignment, underscoring challenges under sparse and heterogeneous personal data.
The paper sessions and posters reflected this breadth: oral talks addressed collaborative personalization under heterogeneous clients, taxonomy-adaptive moderation with guardrails, federated agent reinforcement learning, interaction “context equilibria,” and methods to mitigate catastrophic forgetting in continual fine-tuning. The awards highlighted both systems and safety-aware directions. The Best Paper recognized “Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients,” while Outstanding Papers included work on robust moderation guardrails, federated agent reinforcement learning, interaction equilibria, continual fine-tuning to reduce forgetting, and recommendation-oriented representations. Overall, the workshop underscored a shared community view: personalization is rapidly becoming a defining systems challenge for foundation models, demanding integrated progress in memory, alignment, evaluation, scalability, and responsible data practices.
Across the technical sessions, the workshop conveyed an overall message: personalization is becoming a defining capability for next-generation AI systems, but it requires integrated advances—spanning modeling, memory, alignment, evaluation, and responsible data practices—to be trustworthy in real-world use. By bringing together perspectives from academia and industry, the workshop served as a checkpoint on where the field is converging and where open problems remain, particularly in building personalization that is durable, controllable, and scalable beyond small, curated settings.
Jiahong Liu (The Chinese University of Hong Kong), Yang Zhang (National University of Singapore), Weizhi Zhang (University of Illinois Chicago), Runcong Zhao (King’s College London), and Lucas Vinh Tran (JPMorganChase) served as organizers of this workshop. This report was written by Jiahong Liu.
Foundations of Agentic Systems Theory (W45)
The Workshop on Foundations of Agentic Systems Theory addressed the theoretical underpinnings of agentic AI systems, including formal models of agency, planning, and decision-making in autonomous agents. The workshop brought together researchers working on the mathematical and computational foundations needed to reason rigorously about the behavior, safety, and capabilities of agentic systems.
No formal report was filed by the organizers for this workshop.
Quantum Computing and Artificial Intelligence (W46)
Small-scale quantum computers are becoming increasingly accessible through platforms offered by companies such as IBM, IQM, Google, and D-Wave. As a result, new opportunities are emerging to use them to enhance classical Artificial Intelligence (AI), e.g., to improve prediction quality or speed up training that benefits from quantum phenomena such as superposition and entanglement. This led to the field of Quantum Artificial Intelligence (QAI), which uses quantum computing (QC) to enhance classical AI. Moreover, there is increasing attention to applying classical AI methods to address QC challenges (AI4QC), including quantum software development, quantum noise learning and mitigation, and optimization tasks (e.g., minor-embedding in quantum annealing). Accordingly, this workshop focused on theoretical and applied contributions spanning QAI and AI4QC.
The Second International Workshop on Quantum Computing and Artificial Intelligence (QC+AI 2026) was held in conjunction with the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026) in Singapore on January 27, 2026. The workshop invited contributions covering both theoretical and applied advances in Quantum Artificial Intelligence (QAI), as well as work applying classical AI techniques to various aspects of quantum computing (QC), including optimization problems arising in the QC context. The workshop received 17 submissions, each of which was peer-reviewed by at least three reviewers in a single-blind process. A total of three lightning talks and seven full papers were accepted for presentation.
The workshop opened with a keynote by Jayne Thompson of Nanyang Technological University, titled “Saving Resources with Quantum Agents.” The keynote argued that the growing computational and energy demands of autonomous agents and large AI models reflect a fundamental energetic cost inherent to classical decision-making under uncertainty, rather than merely engineering inefficiencies: any classical agent executing complex strategies must pay this unavoidable cost to remain prepared for all possible future contingencies. The talk further demonstrated that quantum agents can surpass these classical limits, achieving equivalent strategic performance while requiring substantially lower memory and energy.
The main workshop program was organized into two technical sessions, followed by a panel and closing remarks. The morning session on Quantum Optimization featured full and lightning presentations on QUBO-based modeling, learning-assisted graph reduction, hybrid learning and optimization methods for combinatorial problems, global optimization for training variational quantum algorithms, and new QUBO formulations for quantum annealing. After the lunch break, the afternoon session on Quantum Machine Learning included presentations on hybrid quantum-classical control frameworks, quantum kernel methods for medical risk classification, quantum-enhanced word embeddings with variational circuits, quantum acceleration for explainable graph neural networks, and a critical perspective on the current state of quantum deep learning.
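To give a flavor of the QUBO-based modeling discussed in the morning session, the following minimal sketch encodes a toy max-cut instance (a 4-node cycle, chosen purely for illustration) as a quadratic unconstrained binary optimization problem and solves it by brute force; a quantum annealer would instead sample low-energy states of the same objective. The graph, matrix construction, and brute-force solver here are our illustrative assumptions, not material from any workshop paper.

```python
from itertools import product

# Hypothetical toy instance: max-cut on a 4-node cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

# Build the QUBO matrix Q so that minimizing x^T Q x maximizes the cut:
# cut(x) = sum over edges of (x_i + x_j - 2 * x_i * x_j) = -x^T Q x.
Q = [[0] * n for _ in range(n)]
for i, j in edges:          # assumes i < j for each edge
    Q[i][i] -= 1            # linear terms accumulate -deg(i) on the diagonal
    Q[j][j] -= 1
    Q[i][j] += 2            # quadratic coupling per edge

def energy(x):
    """QUBO objective x^T Q x for a binary assignment x."""
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Exhaustive search over all 2^n binary assignments; on an annealer this
# minimization is delegated to the hardware.
best = min(product([0, 1], repeat=n), key=energy)
print(best, -energy(best))  # optimal partition and its cut value
```

For the cycle, the alternating assignment cuts all four edges, so the minimum-energy state has energy -4, corresponding to a cut of size 4.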
The program concluded with a panel discussion and closing session. The panel was held as an open discussion with all workshop participants rather than by invitation to specific individuals. A set of questions was prepared by the organizers, including: (1) How can classical AI help to solve the challenges of fault-tolerant quantum computing? (2) What are the pros and cons of gate-based (digital) quantum computers and quantum annealers? (in terms of qubits, computation, flexibility, noise, etc.) (3) When will fault-tolerant quantum computing be a reality? (10, 15, 20 years, never?) (4) What are the pros and cons of quantum-inspired hardware and algorithms? (5) How many logical qubits do we need to solve real-world problems with quantum advantage? When will those qubits be available in quantum computers? The discussion drew out both optimism and skepticism about timelines, emphasized the critical role of AI in error correction, compilation, and optimization, and highlighted significant scalability challenges that remain to be addressed.
Shaukat Ali, Francisco Chicano, and Alberto Moraglio were workshop co-chairs. This report was jointly authored by all three of them.
Workshop on Reproducible AI (W47)
The Reproducible Artificial Intelligence Workshop at AAAI-26 brought together researchers and practitioners to discuss how reproducibility expectations must evolve for modern AI—especially as agentic workflows, large language models, and complex software stacks make it harder to specify, rerun, and meaningfully compare results.
Reproducibility has long been a cornerstone of empirical science, but it has become increasingly difficult to achieve in contemporary artificial intelligence research, where results can hinge on fragile software dependencies, non-determinism, and subtle choices in evaluation design. The AAAI 2026 Workshop on Reproducible Artificial Intelligence (W47), held as part of the AAAI-26 workshop program in Singapore, centered on making these challenges explicit and on developing practical community approaches that go beyond “code is available” checkboxes toward repeatable, interpretable, and comparable experimentation.
The workshop opened by framing reproducibility as both a technical and a communication problem: authors must be able to state what claim is being tested, which baseline is being used as the reference point, and which parts of a pipeline are essential versus incidental. This theme was reinforced in an early session on problem framing and reference baselines, including work arguing that “automated reproducibility” efforts can fail when the underlying problem statement is underspecified, as well as work proposing open, reproducible reference baselines intended to make language model and dataset comparisons more transparent and easier to rerun.
A keynote by Chih-Jen Lin (Distinguished Professor, National Taiwan University; Affiliated Professor, Mohamed bin Zayed University of Artificial Intelligence) provided a complementary perspective: even when experiments are repeatable, they may still be misleading if evaluation practices rely on unrealistic assumptions. Using concrete examples, the talk highlighted how “rough use” of machine learning techniques—such as inadvertently leaking information or mishandling the separation of training, validation, and test data—can inflate performance estimates and distort comparisons, undermining the reliability of downstream conclusions.
Subsequent sessions broadened the scope from traditional ML pipelines to emerging agentic and LLM-mediated workflows. Several contributions examined how reproducibility breaks in practice when AI systems generate or modify code: results can depend on untracked dependency versions, environment setup gaps, and toolchain drift, even when high-level prompts and repository snapshots are shared. Related work explored “AI copilots” in scientific settings, treating reproducibility not only as a property to measure after the fact, but as a capability that tools can support during the research process (for example, by capturing provenance and enforcing structured experiment documentation).
The workshop also featured papers that pushed reproducibility into applied and domain-specific evaluation. Topics ranged from meta-benchmarking for evaluating cybersecurity AI agents in a repeatable manner, to stability-focused evaluation for small, open-source medical language models where accuracy alone may not capture robustness under small perturbations. Other work emphasized reproducible evaluation in vision pipelines, including generated-image detection and high-resolution reasoning strategies, underscoring that “same code” does not guarantee “same outcome” when data processing and evaluation protocols are not precisely specified.
To translate these discussions into coordinated next steps, the program reserved time for an interactive, workshop-wide effort toward a shared position paper, including breakout assignments and synthesis discussions. In parallel, the organizers launched the Vibe Coding Reproducibility Challenge to encourage reflective, from-scratch reproduction attempts (with LLM assistance permitted) and to seed a collective analysis paper on what helps—or harms—reproducibility in this new development style.
Odd Erik Gundersen (Norwegian University of Science and Technology), Edward Raff (CrowdStrike and University of Maryland, Baltimore County), and Sagar Samtani (Indiana University) served as cochairs of this workshop. This report was written by Edward Raff, Sr. Director of Data Science at CrowdStrike and Visiting Associate Professor at the University of Maryland, Baltimore County.
Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (W48)
The Safe, Ethical, Certified, Uncertainty-Aware, Robust, and Explainable Artificial Intelligence for Health workshop (W48) at AAAI 2026 examined how to make trustworthy AI for health operational in real-world clinical and biomedical settings. The workshop brought together researchers and industry leaders to connect safety, robustness, uncertainty quantification, explainability, and certification-oriented evaluation with concrete modeling practices, post-deployment monitoring, and domain adaptation in high-stakes environments.
The Safe, Ethical, Certified, Uncertainty-Aware, Robust, and Explainable Artificial Intelligence for Health workshop (W48) was organized as part of the AAAI 2026 Workshop Program. The workshop addressed a central challenge in contemporary AI for health: how to translate principles of safety and ethics into measurable, technically grounded practices that can withstand real-world deployment pressures.
The opening keynote by Catherine Fang, Professor at Carnegie Mellon University, framed the workshop around the question of failure. Rather than treating “trust” as an abstract desideratum, she examined concrete failure modes in clinical AI systems and emphasized the need to communicate reliability to clinicians and patients through calibrated uncertainty, transparent reporting of limitations, and actionable explanations. Her talk underscored that operational trust requires not only high average performance but also explicit signals about when systems may be unreliable.
John Braun of the University of British Columbia Okanagan presented “AI, the SIR Model, and DE-Constrained Kernel Smoothing,” illustrating how AI in health can encompass statistically principled inference as well as deep learning. Using the classical SIR compartmental model as a foundation, he demonstrated how nonparametric kernel smoothing can transform noisy infection time series into stable trajectory and derivative estimates. By constraining estimation with the structure of the governing differential equations, he showed how key epidemiological parameters and reproduction numbers can be inferred more robustly from imperfect real-world data. The presentation highlighted a hybrid paradigm that integrates mechanistic disease models with data-driven smoothing to produce interpretable and stable results.
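The kernel-smoothing step underlying this hybrid paradigm can be sketched briefly. The code below simulates a discrete-time SIR infection curve, adds observation noise, and applies a Gaussian-kernel (Nadaraya-Watson) smoother; the smoothed curve then supports stable derivative estimates. All parameter values (beta, gamma, the noise level, and the bandwidth) are illustrative assumptions, and this is only the unconstrained smoothing step, not the DE-constrained estimator presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward-simulate a discrete-time SIR model (parameters are illustrative).
beta, gamma, N, T = 0.2, 0.1, 1000.0, 160
S, I = N - 1.0, 1.0
trajectory = []
for _ in range(T):
    new_inf = beta * S * I / N
    S, I = S - new_inf, I + new_inf - gamma * I
    trajectory.append(I)
true_I = np.array(trajectory)
t = np.arange(T, dtype=float)
y = true_I + rng.normal(0.0, 20.0, T)   # noisy observed prevalence

def kernel_smooth(t, y, bandwidth=3.0):
    """Nadaraya-Watson estimate of the mean curve with a Gaussian kernel."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

smooth = kernel_smooth(t, y)
# A smooth curve also yields stable derivative estimates; constraining the
# fit with the SIR differential equations is what allows parameters such as
# the reproduction number to be recovered from these derivatives.
dI_dt = np.gradient(smooth, t)
print(f"estimated peak prevalence of about {smooth.max():.0f} on day {smooth.argmax()}")
```

In this toy setting the smoother substantially reduces the pointwise error of the noisy observations while preserving the location and height of the epidemic peak.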
Explainability through generative modeling and counterfactual reasoning formed a major technical thread. Suparshva Jain of Tata Consultancy Services Research, New Delhi, presented “DERM-Clu: A Diffusion-based Explainable Representation Model using Clustering for Skin Lesion Classification,” coauthored with Amit Sangroya and Lovekesh Vig. The work structured the latent space of a diffusion-based model through clustering to enable interpretable representations in dermoscopy classification. By enabling counterfactual-style exploration, illustrating how changes in salient features could alter predictions, the approach sought to make model behavior more transparent, while also confronting issues of representation bias and label noise in medical datasets.
Dylan Hadfield-Menell of the Massachusetts Institute of Technology and collaborators presented “Side Effects May Include: Alignment Decay from Medical Fine-Tuning.” Their work examined how domain-specific fine-tuning of large language models for medical tasks can inadvertently erode previously established safety properties. The findings emphasized that post-fine-tuning safety evaluation should be treated as a primary requirement, particularly in high-stakes medical contexts where alignment failures may carry significant consequences.
Ritambhara Singh of Brown University and collaborators extended counterfactual reasoning to high-dimensional biological data in “SPACE: Sparsely Primed Autoencoder Counterfactual Explanations for Single-Cell Gene Perturbation Experiments.” Their framework generated sparse and biologically plausible gene-level counterfactuals, supporting hypothesis generation and guiding experimental follow-up in single-cell genomics. This work illustrated how explainability techniques can be adapted to omics-scale data while respecting biological constraints.
The workshop also connected trustworthy AI to intervention design and industrial biotechnology. Sun Jie, Chief Executive Officer and Co-founder of ChemT Biotechnology, presented an AI-driven “VirtualCell” perspective for pharmaceutical biomanufacturing. By modeling cell behavior as a controllable system and identifying actionable levers that drive desired outputs, the approach aimed to reduce trial-and-error in manufacturing pipelines characterized by intrinsic biological variability. The discussion extended to the broader impact of AI automation on industry practices and workforce preparation.
Apurva Narayan and Hong Qin served as cochairs of this workshop. The organizing committee also included Dr. Elham Dolatabadi, Dr. Rishi Ganesan, Dr. Yalda Mohsenzadeh, Dr. Letu Qingge, Dr. Laleh Seyyed-Kalantari, and Dr. Ritambhara Singh. This report was written by Hong Qin and Apurva Narayan.
Hong Qin is an associate professor at Old Dominion University.
Apurva Narayan is an associate professor at Western University.
Shaping Responsible Synthetic Data in the Era of Foundation Models (W49)
The Workshop on Shaping Responsible Synthetic Data in the Era of Foundation Models addressed the growing use of synthetically generated data for training and evaluating AI systems, with a focus on ensuring that such data is produced and used responsibly. The workshop explored the fidelity, fairness, and privacy implications of synthetic data, as well as governance frameworks for its use in foundation model development.
No formal report was filed by the organizers for this workshop.
Graphs and more Complex Structures For Learning and Reasoning (W50)
The fifth workshop on Graphs and more Complex Structures for Learning and Reasoning (GCLR) was co-located with the 2026 AAAI Conference in Singapore. Building on previous editions, the 2026 workshop focused on complex graph-structured data and the growing influence of foundation models and large language models (LLMs). It brought together researchers from academia and industry to discuss advances in graph machine learning, integration of graph models with LLMs, fairness and privacy, and applications ranging from combinatorial optimization to collaborative intelligence. The day-long program featured keynote talks, an accepted-paper poster session, and lively discussions.
The GCLR 2026 workshop continued its mission of bridging graph theory, network science, and modern AI. The opening remarks focused on how traditional graph models often fail to capture complex real-world systems and emphasized the need for knowledge graphs, multilayer graphs, hypergraphs, and other complex structures. The organizers highlighted a dual focus: developing trustworthy, fair and privacy-aware algorithms and harnessing graph foundation models to learn representations that generalize across domains. This framing set the stage for a series of invited talks illustrating emerging directions.
Qingyun Sun (Beihang University) opened the technical program with a talk on Graph Machine Learning for the Large-Model Era. She revisited core tasks in graph learning and presented two development roadmaps: graph foundation models that learn generalizable representations and Graph Retrieval-Augmented Generation (GraphRAG), which integrates graph-structured knowledge with LLMs to improve reasoning and generation. The second keynote by Xiao Li (Singapore University of Technology and Design) took the audience from molecules to time series. Li demonstrated how heterogeneous information networks unify multi-typed biological data and reviewed their use in identifying disease-gene associations and drug-target interactions. Fang Yuan (Singapore Management University) discussed semantic-structural integration in text-attributed graphs. Yuan’s work highlighted the challenge of jointly modeling semantics and structure and pointed to promising zero-shot transfer capabilities. These talks showcased the versatility of graph-structured thinking across domains.
After a coffee break, Zhiguang Cao (SMU) reviewed learning-based approaches for vehicle routing problems. He traced the evolution from handcrafted heuristics to deep neural networks and described efforts toward foundation models that can transfer knowledge across multiple routing variants. Fayao Liu (A*STAR) then discussed scaling structural reasoning beyond the image grid. Her talk spanned monocular depth estimation using deep convolutional neural fields, spatio-temporal reasoning in outdoor point clouds, and CADCrafter, a model that produces parametric CAD sequences from images.
In the afternoon, Xavier Bresson (National University of Singapore) introduced methods for integrating LLMs with graph neural networks. He presented two approaches: enriching text-attributed graph node features using LLM reasoning and GraphRAG, which grounds LLM responses in sub-graphs to reduce hallucinations. Wee Peng Tay (Nanyang Technological University) followed with a talk on continuous dynamics in graph neural networks. By drawing on dynamical-systems theory, he showed that continuous-time GNNs offer interpretability, stability and robustness guarantees. The final invited speaker, Mengting Wan (Microsoft), explored collaboration intelligence. She talked about how higher-order, semantic-rich interaction graphs derived from group chat, co-editing and other digital traces can be combined with LLMs to infer intent and norms, enabling intelligent systems to support teamwork.
Beyond invited talks, the workshop included a well-curated poster session of 22 accepted papers. The papers were reviewed by a diverse program committee, and the poster session fostered deep technical discussions.
Attendees commented on the interdisciplinary nature of the talks and the prominence of foundation models and LLM integration. The workshop served as a vibrant platform for cross-disciplinary collaboration, with participants expressing enthusiasm for future editions.
The fifth edition of the GCLR workshop was co-organized by Balaraman Ravindran (IIT Madras), Ginestra Bianconi (Queen Mary University of London), Tarun Kumar (Hewlett-Packard Labs), Deepak Maurya (Purdue University), and Gayathri Saranathan (Hewlett-Packard Labs).
How Can We Trust and Control Agentic AI? Toward Alignment, Robustness, and Verifiability in Autonomous LLM Agents (W51)
The workshop How Can We Trust and Control Agentic AI? Toward Alignment, Robustness, and Verifiability in Autonomous LLM Agents was held on January 27, 2026, as a full-day event at the AAAI 2026 Conference. The workshop was motivated by the rapid emergence of agentic AI systems built on large language models that operate autonomously, interact with other agents and users, and make decisions in open-ended environments. As these systems move closer to real-world deployment, ensuring their alignment, robustness, and verifiability has become a central challenge.
The primary goal of the workshop was to frame trust and control as system-level properties of agentic AI rather than as isolated characteristics of individual models. The scope covered topics including multi-agent reinforcement learning, agentic workflows, symbolic and hybrid reasoning, safety and alignment mechanisms, privacy-preserving agent systems, and evaluation methodologies for autonomous decision-making. By bringing together researchers from machine learning, multi-agent systems, and AI systems communities, the workshop aimed to clarify emerging problem formulations and identify shared challenges that cut across algorithmic and architectural boundaries.
The program was designed to balance visionary perspectives with technical depth. It consisted of invited keynote talks, peer-reviewed oral presentations, an industry-focused session, and a poster session. Morning sessions emphasized foundational questions in reasoning, safety, and verification, while afternoon sessions focused on system architectures, optimization methods, and real-world applications. The workshop concluded with an award ceremony recognizing outstanding contributions, followed by a poster session that enabled extended technical discussion and interaction among participants.
Invited speakers included Minlie Huang (Tsinghua University, China), Chi Zhang (Westlake University, China), Carl Yang (Emory University, USA), Stefano V. Albrecht (Nanyang Technological University, Singapore), Yewen (Evan) Pu (Nanyang Technological University, Singapore), and Kaifa Zhao (Tencent Cloud, China). Together, these talks offered a comprehensive view of the agentic AI landscape. Notably, Stefano V. Albrecht articulated a unifying perspective on alignment, verification, and interpretability in multi-agent systems, arguing that trustworthiness must be addressed at the level of agent-agent interaction and system composition rather than by extending single-agent alignment techniques. Yewen (Evan) Pu presented a forward-looking vision on predicting the performance of symbolic and prompt-based agent programs, emphasizing example-driven generalization and performance modeling as foundations for principled agent design and optimization.
In addition to the invited keynote talks, the workshop featured a set of peer-reviewed oral presentations that showcased recent technical advances in trustworthy and controllable agentic AI. Presentations were delivered by Bo-Wen Zhang (Nanjing University), Canyu Chen (Northwestern University), Huichi Zhou (University College London), Tianyi Tang (Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore), Hyosik Moon (University of Toronto), and Hussein Jawad (Capgemini Invent, Paris, France). These talks addressed topics including trustworthy planning and constraint satisfaction in real-world environments, federated and privacy-aware agent learning, robustness in retrieval-augmented generation, and optimization of agent behavior under limited feedback. Collectively, the oral presentations grounded the high-level perspectives offered by the invited speakers in concrete algorithmic designs and empirical evaluations.
The workshop received nearly 100 submissions, reflecting strong and growing interest in trustworthy agentic AI. After peer review, 69 papers were accepted for presentation. The accepted contributions spanned a broad range of topics, including agentic workflow optimization, federated and privacy-aware agent learning, robustness and trust in retrieval-augmented generation, constraint satisfaction in real-world planning, and evaluation of multi-agent coordination under uncertainty. The diversity and depth of submissions underscored both the maturity of the field and the need for integrative venues connecting learning, reasoning, and systems research.
Overall, the workshop highlighted a clear shift in the field from isolated agent capabilities toward system-level reasoning about coordination, trust, and control. Discussions emphasized that scaling model size alone is insufficient, and that architectural structure, interaction protocols, and rigorous evaluation methodologies are now central research challenges. Key open problems include the lack of standardized benchmarks for agentic systems operating under non-stationarity, partial observability, and real-world constraints, as well as the difficulty of reconciling theoretical guarantees with practical deployment requirements. At the same time, the workshop identified promising directions such as predictive models of agent behavior and performance, reusable agentic workflows, and shared evaluation protocols, laying a strong foundation for future collaborations, benchmarks, and follow-up workshops on trustworthy and controllable agentic AI.
Haiyan Yin, Joey Tianyi Zhou, Piotr Koniusz, Sebastian Tschiatschek, and Simon See served as cochairs of this workshop.
This report was written by Haiyan Yin and Joey Tianyi Zhou.
From Understanding Model Behavior to Discovering New Scientific Knowledge (W52)
The Workshop on Explainable AI for Science examined how methods for understanding model behavior can be extended and applied to accelerate scientific discovery. The workshop brought together researchers working on interpretability, mechanistic understanding, and knowledge extraction from AI models, with a focus on translating these techniques into actionable insights for scientific domains including biology, chemistry, physics, and medicine.
No formal report was filed by the organizers for this workshop.
Author Biographies
Reza Abbasi-Asl is an Associate Professor in the Department of Neurology and the Department of Bioengineering and Therapeutic Sciences at the University of California, San Francisco.
Shaukat Ali is a Chief Research Scientist at Simula Research Laboratory, Norway.
Nitay Alon is a PhD student at the Hebrew University and the Max Planck Institute for Biological Cybernetics.
Shuang Ao is an Assistant Professor in the Warwick Manufacturing Group (WMG) at the University of Warwick, UK.
Pandarasamy Arjunan is an Assistant Professor at the Indian Institute of Science, where his research focuses on artificial intelligence and machine learning applications for agriculture and environmental systems.
Narges Armanfard is an associate professor at McGill University and a core member of Mila – Quebec AI Institute.
Ivan Au Yeung is a senior solutions architect at the NVIDIA AI Technology Center in Hong Kong.
Keshav Bhandari is a PhD student at the Centre for Digital Music, Queen Mary University of London.
Sree Bhattacharyya is a third-year PhD candidate at the College of Information Sciences and Technology at Penn State.
Simone Bianco, PhD, is a Vice President of Computation at Altos Labs.
Bruno Casella is a postdoctoral researcher at the Computer Science Department of the University of Turin.
Nancy F. Chen is a Senior Principal Scientist at the Agency for Science, Technology and Research (A*STAR).
Francisco Chicano is a Professor at the University of Malaga, Spain.
Tiansi Dong is a visiting fellow at the Department of Computer Science and Technology, The University of Cambridge.
Dhari Gandhi is Manager, Applied AI Programs, at the Vector Institute for Artificial Intelligence.
Xiaoxue Gao is a Research Scientist at the Agency for Science, Technology and Research (A*STAR).
Thi Kieu Khanh Ho is a PhD candidate at McGill University and Mila – Quebec AI Institute.
Hadi Hojjati is a PhD candidate at McGill University and Mila – Quebec AI Institute.
Noa Izsak is a doctoral researcher at the CISPA Helmholtz Center for Information Security working on formal methods and verification for concurrent systems.
Himanshu Joshi is the Founder, CEO, and Lead AI Researcher at COHUMAIN Labs.
Yun Sing Koh is a professor at the School of Computer Science, University of Auckland.
Tarun Kumar is a senior research engineer at Hewlett Packard Enterprise.
Jiahong Liu is a Ph.D. candidate at the Department of Computer Science and Engineering, The Chinese University of Hong Kong.
Qian Liu is a lecturer at the School of Computer Science, University of Auckland.
Tianqiao Liu is a Senior NLP Researcher at TAL Education Group.
Joel Mackenzie is a senior lecturer and an ARC DECRA fellow at the University of Queensland.
Deepak Maurya is a PhD student at Purdue University.
Hayden McTavish is a PhD student in Computer Science at Duke University.
Martin Michalowski, PhD, FAMIA, FIAHSI, is an Associate Professor in the School of Nursing at the University of Minnesota.
Alberto Moraglio is a Professor at the University of Exeter, UK.
Atsunori Moteki is a Principal Researcher at Fujitsu Research’s Artificial Intelligence Laboratories.
Apurva Narayan is an associate professor at Western University.
Duc Nguyen is a PhD student at the School of Computing, National University of Singapore.
Hong Qin is an associate professor at Old Dominion University.
Edward Raff is Sr. Director of Data Science at CrowdStrike and Visiting Associate Professor at the University of Maryland, Baltimore County.
Balaraman Ravindran is a professor at the Indian Institute of Technology Madras.
Rafal Rzepka is an associate professor at the Faculty of Information Science and Technology, Hokkaido University, Japan.
Lesia Semenova is an Assistant Professor of Computer Science at Rutgers University.
Arash Shaban-Nejad, PhD, MPH, FAMIA, is an Associate Professor and Director of Population and Precision Health in the Center for Biomedical Informatics at the University of Tennessee Health Science Center-Oak Ridge National Laboratory.
Rahul Vashisht is a Ph.D. Candidate at the Department of Computer Science and Engineering, IIT Madras.
Pengyang Wang is an Assistant Professor in the Department of Computer and Information Science at the University of Macau.
Muning Wen is a Research Assistant Professor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University.
Wai Tuck Wong is Head of Labs Engineering at watchTowr and a PhD candidate at Singapore Management University.
Kieran Woodward is a Research Fellow in Computer Science at the University of Nottingham.
Haiyan Yin is an Early Career Principal Investigator and Senior Researcher at the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore.
Yueyi Zhang is a staff algorithm engineer at Midea Group, China.
Di Zhao is a postdoctoral research fellow at the School of Computer Science, University of Auckland.
Joey Tianyi Zhou is a Principal Investigator at the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore, and an adjunct faculty member at the National University of Singapore (NUS).