
Reports of the Association for the Advancement of Artificial Intelligence’s 2025 Fall Symposium Series

By Bertrand Braunschweig, Kimberly A. Cornell, James Hendler, Brian Hu, Ross Mead, Thilanka Munasinghe, Apurva Narayan, Daniel E. O’Leary, David Porfirio, Michael J. Prietula, Hong Qin, Andrew Schoen, William Swartout, and Jennifer C. Wei

The Association for the Advancement of Artificial Intelligence’s 2025 Fall Symposium Series was held November 6–8, 2025, at the Westin Arlington Gateway in Arlington, Virginia. There were six symposia in the program: AI for Social Good: Emerging Methods, Measures, Data, and Ethics; AI Trustworthiness and Risk Assessment for Challenged Contexts; Engineering Safety-Critical AI Systems; First AAAI Symposium on Quantum Information and Machine Learning: Bridging Quantum Computing and Artificial Intelligence; Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health; and Unifying Representations for Robot Application Development. This report contains summaries of the symposia, submitted by most, but not all, of the symposium organizers.

AI for Social Good: Emerging Methods, Measures, Data, and Ethics (S1)

AI has demonstrated transformative potential across sectors such as aging, combating information manipulation, disaster response, education, environmental sustainability, government, healthcare, social care, transportation, and urban planning. Yet the systematic development of AI for social good remains fragmented across these many research communities, with limited convergence around effective methodologies, equitable impact measurement, access to important data, or long-term engagement with targeted populations. The main objective of this symposium was to convene researchers, practitioners, and policymakers across these disciplines, with a particular focus on finding methods, measures, and data that could be used in multiple settings.

There were roughly 30 participants. The plenary speakers included Lijun Yu (Google), who explored AI-based video generation; William Swartout (University of Southern California), who discussed using generative AI to support USC’s writing program; Milind Tambe (Harvard University), who analyzed the use of bandit algorithms to optimize health programs; David Bray (LeadDoAdapt Ventures), who examined leadership for AI; Edward Queen (Emory University), who investigated AI ethics; Dan Kokotov (Cnaught), who analyzed the productivity gains from generative AI; and Daniel O’Leary (University of Southern California), who reviewed corporate social good efforts with an analysis of Qlik for social good.

The idea of “AI for social good” has been around for a long time. In the 1970s, researchers built medical expert systems to help doctors prescribe drugs and make diagnoses, with the goal of improving care. Historically, building such systems required not just AI expertise but also input from and collaboration with domain experts. This symposium continued that tradition, showing how the latest techniques from machine learning and generative AI could be applied to a variety of social issues, including more equitable and adaptive allocation of recovery resources in disasters, fairer pricing of ride shares such as Uber during surge conditions, positively influencing public perceptions about water consumption to improve sustainability, facilitating the use of carbon credits, reducing the dropout rate in maternal health programs in India, and using AI to identify influencers to help stem the spread of HIV among the homeless in Los Angeles.

This year, however, something was different. About half the talks focused on AI itself, in particular generative AI and how to deal with the problems it is creating now and will create in the future. These talks addressed both generative AI artifacts that do social good and generative AI as an artifact at the center of key concerns.

Participants examined several questions around the use of generative AI. What are the implications of the ability to create compellingly realistic videos? What are the emerging issues associated with the ability to build deepfakes? Can we rethink education, so that we redesign instruction and create educational apps that use generative AI to enhance students’ critical thinking rather than delegate thinking to the machine? Can we build a framework for auditing “truth” and uncovering misinformation that considers both the artifact and the context surrounding it? What is the disparity between what companies claim about their chatbots’ ethical behavior and their actual real-world behavior? When we crawl the web to assemble the data used to build LLMs, how can we ethically remove data that shouldn’t be included, or add data that would otherwise be missed? How can we use multiple agents to help reduce bias in LLMs? Can we treat chatbot safety as a process rather than a score, particularly for social good tools used in emotionally charged contexts? What is the role of AI ethics, AI assurance, trustworthiness, and competence when using generative AI? How can these ideas be formalized to create standardized practices?

Recent advances in generative AI have made possible applications we could only have dreamed of a few years ago, but they have also introduced a recursive element into the AI for social good arena, because generative AI itself now raises serious issues for society. Increasingly, the generative AI artifact provides both the ability to model the world and a source of the very problems it seeks to solve.

Daniel E. O’Leary and Michael J. Prietula served as cochairs of this symposium. This report was written by Daniel E. O’Leary, Michael J. Prietula, and William Swartout.

AI Trustworthiness and Risk Assessment for Challenged Contexts (S2)

The focus of this symposium was AI trustworthiness broadly, and in particular methods that help provide bounds on fairness, reproducibility, reliability, and accountability when quantifying AI-system risk, spanning the entire AI lifecycle from theoretical research formulations through system implementation, deployment, and operation. The symposium gathered 38 participants over 2.5 days for a program that included 24 accepted papers, five keynotes, and two roundtable discussions.

AI systems, including those built on large language and multimodal foundation models, have proven their value in all aspects of human society, rapidly transforming traditional robotics and computational systems into intelligent systems with emergent, and often unanticipated, beneficial behaviors. However, the rapid embrace of AI-based critical systems introduces new classes of errors that raise the level of risk and limit trustworthiness. The design of AI-based critical systems therefore requires proving their trustworthiness. Such systems must be assessed across many dimensions by different parties (researchers, developers, regulators, customers, insurance companies, end users, and so on) for different reasons, at both the full-system level and the level of individual AI components. At the theoretical and foundational level, assessment methods must go beyond explainability to deliver uncertainty estimates and formalisms that can bound the limits of the AI, provide traceability, and quantify risk.

The symposium explored these issues from different angles, including the use of a safety engineering approach that combines trustworthiness practices refined over decades with data-driven machine learning. A key question was: Can we still apply most of these engineering practices to critical systems that involve AI, where life, the environment, or property is at risk, and provide formal guarantees of safety or performance?

These questions were addressed in seven technical sessions, five keynote talks, and two panel discussions. The three main technical sessions focused on AI risks, trustworthiness factors, and alignment. The session on AI risks covered methodologies and tools for analyzing the risk chain, especially in light of the dependencies between different components of AI systems; it included a talk on insurance companies, which are not yet prepared to deal with the risks of complex AI systems, and ended with a roundtable discussion among the speakers. The session on trustworthiness factors examined analysis along three dimensions (“PPP”): purpose, process, and performance, and included a paper on the newly proposed RUM (Robustness/Uncertainty/Monitoring) methodology, which should be particularly useful for challenged contexts. The session on alignment, a key issue for general-purpose AI and large language models, also featured a roundtable with the speakers; opinions diverged on whether the alignment problem will ever be solved, but participants agreed that work on this important matter should continue, and the ethics2vec formalism for aligning artificial agents with human preferences was presented as an example. Takeaways from the remaining sessions were that knowledge graphs, and more generally the hybridization of AI paradigms, can improve the correctness and performance of AI systems; that specifying multimodal applications is challenging; and that more work is needed on AI explainability and on metrics for time-series prediction problems.

Among the keynotes, Tom Dietterich (Oregon State University) gave a motivating talk based on the report on the safety of AI authored by the Scientific Academies. The report advocates a new ML methodology that includes using data covering corner cases, using surrogate models, and training on as much variation as possible in order to detect novelty. David Sadek (Thales) presented his company’s approach to trustworthy AI, based on four pillars: validity, security, explainability, and responsibility, illustrated with many examples from critical sectors such as aeronautics, defense, and safety. These two keynotes were held jointly with the Engineering Safety-Critical AI Systems symposium: because of the government shutdown, some keynote speakers from both symposia could not attend, and the topics of the two symposia were close enough to share these talks. Biplav Srivastava (University of South Carolina) and Stefan Buijsman (TU Delft) took a step back and considered trustworthiness from the viewpoint of users, including a call for “holistic AI” and the ability to define valid metrics tied to the conception of the system. Paolo Shakarian (Syracuse University) devoted his talk to artificial metacognition, in relation to the now well-known theory of System 1/System 2 thinking.
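
The “detect novelty” step admits a simple illustration: train a detector on the distribution the system was built for, then flag inputs that fall outside it. The sketch below uses scikit-learn’s IsolationForest on synthetic data; it is a minimal illustration of the general idea, not the methodology advocated in the report.

    # Minimal novelty-detection sketch (illustrative only; not the
    # methodology advocated in the Scientific Academies report).
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(1000, 4))         # in-distribution data
    X_new = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),  # familiar inputs
                       rng.normal(8.0, 1.0, size=(5, 4))]) # novel corner cases

    detector = IsolationForest(random_state=0).fit(X_train)
    print(detector.predict(X_new))  # +1 = familiar, -1 = flagged as novel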

In conclusion, trustworthiness and risk assessment for challenged contexts have many facets, not only scientific and technological but also societal; AI researchers should therefore consider using human-subjects research methods to evaluate claims of trustworthiness and the effects of AI alignment on outcomes and subjective experience. Symposia like this one continue to be a collaborative venue for exchanging concepts and ideas about these key matters.

Bertrand Braunschweig and Brian Hu served as cochairs of this symposium. This report was written by Bertrand Braunschweig and Brian Hu.

Engineering Safety-Critical AI Systems (S3)

Artificial intelligence is increasingly applied in high-risk settings, but foundational practices for engineering safe AI systems remain scarce, and research on safety engineering for AI is scattered across disparate fields of study. The AI community needs more discussion of how to identify, assess, and mitigate AI hazards. This symposium addressed a fundamental question: How should we build AI systems for safety-critical applications? As part of a broader effort to advance the discipline of engineering AI for safety, the symposium brought together three communities: domain experts who want to use AI in safety-critical applications; AI researchers, engineers, and practitioners who build AI capabilities; and safety and systems engineers who are tasked with designing, developing, and testing systems that include AI components. No formal report was filed by the organizers for this symposium.

First AAAI Symposium on Quantum Information and Machine Learning: Bridging Quantum Computing and Artificial Intelligence (S4)

The inaugural AAAI Symposium on Quantum Information and Machine Learning: Bridging Quantum Computing and Artificial Intelligence was held November 6–8, 2025, at the Westin Arlington Gateway in Arlington, Virginia, as part of the AAAI Fall Symposium Series. With all 38 registrants in attendance, the event ranked among the best attended in the series, drawing a diverse mix of participants from academia, industry, and government laboratories. The program examined how quantum algorithms and hybrid architectures can transform AI tasks across generative modeling, optimization, reinforcement learning, and neural and graph models, while also addressing scalability, benchmarking, cybersecurity, and ethics. Sessions covered foundations and practical implementations, including hybrid QC+HPC workflows and the use of classical ML for quantum systems, such as error mitigation and compilation, alongside governance for responsible deployment.

The two keynote talks, held on two separate days, provided complementary perspectives from industry and government. Ismael Faro (IBM Vice President) discussed quantum hardware and hybrid workflows, emphasizing practical integration paths for near-term applications. Dr. Milt Halem (University of Maryland, Baltimore County) underscored the need for benchmarking, transparency, and reproducibility, advocating for an engineering-first focus over isolated algorithmic claims.

Three panels explored critical cross-cutting themes. A Quantum Education panel highlighted curriculum gaps and the value of hands-on, platform-agnostic training using open-source ecosystems, along with experiential activities that connect principles to practice. An Industry–Academia Interplay panel considered shared benchmarks, open tooling, and co-developed testbeds to accelerate technology transfer. A Quantum in Finance and Industry panel discussed early wins in risk modeling, anomaly detection, and compliance, where quantum sampling and optimization may yield practical benefits.

The technical program featured 15 peer-reviewed paper presentations plus a poster session, illustrating breadth across algorithms, models, and applications. Representative research included AI-driven synthesis of permutation circuits across general topologies; quantum GANs for forecasting complex time series such as stock indices; a hybrid quantum-classical molecular autoencoder to enhance classical decoding in computational chemistry; and a fine-tuned text classifier combining classical BERT with quantum components for NLP. Work on quantum network science investigated how graph structure relates to entanglement performance, informing quantum graph neural networks; other contributions explored variational algorithms for time-series anomaly detection and parametric quantum feature selection for high-stakes financial use cases. In the session on generative models, reinforcement learning, and hybrid QC+HPC, highlights included BenchRL-QAS for benchmarking RL-driven quantum architecture search; vectorized attention with learnable encodings for quantum transformer models; and an end-to-end LLM-QUBO pipeline that converts natural-language problem descriptions into QUBO formulations for quantum optimization hardware.
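
To make the QUBO target concrete, the toy sketch below formulates max-cut on a triangle graph as a QUBO and solves it by brute force. It is a hypothetical illustration of what the output of such a pipeline looks like, not the LLM-QUBO system presented, which targets quantum optimization hardware rather than exhaustive search.

    # Toy QUBO for max-cut on a triangle graph (hypothetical example;
    # not the LLM-QUBO pipeline presented at the symposium).
    import itertools
    import numpy as np

    edges = [(0, 1), (0, 2), (1, 2)]  # triangle graph
    n = 3
    Q = np.zeros((n, n))
    for i, j in edges:
        Q[i, i] -= 1   # -degree terms on the diagonal
        Q[j, j] -= 1
        Q[i, j] += 2   # +2 per edge in the upper triangle

    # Minimizing x^T Q x over binary x maximizes the cut.
    best = min(itertools.product([0, 1], repeat=n),
               key=lambda x: np.array(x) @ Q @ np.array(x))
    print(best)  # e.g. (0, 0, 1): vertex 2 on one side, a maximum cut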

The symposium showcased rapid progress and underscored the importance of interdisciplinary collaboration to translate theoretical potential into practical results. Discussions pointed to shared benchmarks and reproducibility, hybrid workflows linking HPC and quantum platforms, and scalable educational pathways, coupled with governance and ethical frameworks, as essential next steps for the QIML community.

James Hendler, Kimberly A. Cornell, Jennifer C. Wei, and Thilanka Munasinghe served as cochairs of this symposium. This report was written by Kimberly A. Cornell, Jennifer C. Wei, Thilanka Munasinghe, and James Hendler.

Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (S5)

The 2025 symposium on Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H) convened researchers from artificial intelligence, biomedical informatics, security, and clinical sciences, an interdisciplinary community working at the intersection of AI, health systems, and security-critical applications. Held over 2.5 days (November 6–8, 2025), the symposium focused on building AI systems that are robust to distribution shift, aligned with clinical constraints, transparent in their decision-making, and safe for real-world deployment. The program included ten invited talks, thirty-one accepted lightning-talk papers, a poster session, and a panel discussion on trustworthy and regulatory-aligned AI for health.

Invited speakers highlighted both the scientific opportunities and the system-level challenges in deploying next-generation AI for health. Professor Aidong Zhang (University of Virginia) opened the symposium with a comprehensive overview of interpretable machine learning for biomedical data, discussing Generalized Additive Models, Concept Bottleneck Models, Self-Explaining Neural Networks, and neurosymbolic pipelines such as DeepGSEA and ProtoCell. Her talk emphasized the need for biologically grounded explanations that map model representations to human-understandable concepts in imaging, omics, and multimodal health datasets.
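
The bottleneck structure she described can be sketched compactly: the model is forced to route every prediction through a small set of human-interpretable concept scores, which can then be inspected, supervised, or intervened on. The sketch below uses hypothetical dimensions and is not a reimplementation of any model from the talk.

    # Concept-bottleneck sketch (hypothetical dimensions; not a
    # reimplementation of the models discussed in the talk).
    import torch
    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, n_features=32, n_concepts=8, n_classes=2):
            super().__init__()
            self.to_concepts = nn.Linear(n_features, n_concepts)
            self.to_label = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            # Each concept score should align with a nameable property,
            # e.g. "marker gene expressed" in an omics setting.
            concepts = torch.sigmoid(self.to_concepts(x))
            return concepts, self.to_label(concepts)

    model = ConceptBottleneck()
    concepts, logits = model(torch.randn(4, 32))
    print(concepts.shape, logits.shape)  # predictions flow through concepts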

Professor Gamze Gürsoy (Columbia University and the New York Genome Center) examined the tension between large-scale data integration and the heightened risk of re-identification in modern biomedicine. She presented privateQTL and related secure analytics frameworks leveraging federated computation, multiparty computation, and encrypted workflows to support genomics research without releasing individual-level data. Her results demonstrated that privacy-preserving computations can match or exceed the statistical power of traditional pooled analyses.
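
The multiparty-computation principle underlying such frameworks can be illustrated with additive secret sharing: each data holder splits its private value into random shares, and only the aggregate is ever reconstructed. The toy sketch below illustrates that principle only; it is not the privateQTL protocol.

    # Toy additive secret sharing (illustrates the multiparty-computation
    # principle only; not the privateQTL protocol).
    import random

    P = 2**61 - 1  # prime modulus for the share arithmetic

    def share(value, n_parties=3):
        """Split a private value into n random shares that sum to it mod P."""
        shares = [random.randrange(P) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % P)
        return shares

    private_values = [42, 17, 99]                # one per data holder
    all_shares = [share(v) for v in private_values]

    # Each party locally sums the shares it received; combining the
    # partial sums reveals the aggregate but no individual value.
    partial_sums = [sum(col) % P for col in zip(*all_shares)]
    print(sum(partial_sums) % P)                 # 158 = 42 + 17 + 99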

Recent developments in NeuroAI were highlighted by Professor Apurva Ratan Murty (Georgia Institute of Technology), who presented a biologically inspired framework linking cortical organization with the design of topographic AI models. Through TopoLoss and TopoNets, his group induces brain-like modularity and local similarity constraints, producing models that are more interpretable, more robust, and better aligned with cognitive organization in vision, audition, and language.

Professor Gangqing “Michael” Hu (West Virginia University) discussed the capabilities and limitations of multimodal large language models for dermatologic diagnosis. His analysis showed that few-shot prompting can substantially improve performance, but that models remain unstable, inconsistent across skin tones, and still below state-of-the-art CNN systems. He also analyzed limits of customized GPT-style models, pointing to key research gaps in reference datasets, sampling strategies, and domain-tuned visual-language reasoning.

In addition to the invited talks, the symposium featured thirty-one accepted lightning-talk papers, which fell naturally into several thematic clusters. A major theme centered on fairness, data imbalance, and bias detection, spanning gender-specific diagnostic disparities, missing medication information, and causal disentanglement techniques for understanding modality-specific bias. A second cluster addressed robustness and reliability, with work on adversarial vulnerabilities in medical imaging, conformal-prediction pipelines for CT interpretation and text extraction, and calibration methods for large models operating under distribution shift.
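
Split conformal prediction, one standard way to build such pipelines, takes only a few lines: hold out a calibration set, compute a quantile of its residuals, and use that quantile as the interval width. The sketch below runs on synthetic regression data and illustrates the general technique, not any specific accepted paper.

    # Split conformal prediction sketch (synthetic data; illustrates the
    # general technique, not any specific accepted paper).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=500)

    X_fit, X_cal, y_fit, y_cal = X[:250], X[250:], y[:250], y[250:]
    model = LinearRegression().fit(X_fit, y_fit)

    residuals = np.abs(y_cal - model.predict(X_cal))
    q = np.quantile(residuals, 0.9)   # calibrated width for ~90% coverage

    x_new = rng.normal(size=(1, 3))
    pred = model.predict(x_new)[0]
    print(f"interval: [{pred - q:.2f}, {pred + q:.2f}]")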

Other talks emphasized interpretability and concept-level reasoning, including temporal concept tracing in critical-care prediction, interpretable gait-alignment models for prosthetics, and explanation-quality optimization for clinical decision support. Rapid advances in clinical LLM pipelines were represented by work on multi-agent structuring of clinical text, automated report generation, medical QA toxicity evaluation, and safety safeguards for clinical-adjacent agents. Disease-specific efforts included new models for Alzheimer’s risk prediction, SARS-CoV-2 fitness landscape modeling, and coma prognosis. The poster session on Day 2 further showcased emerging work on federated learning, secure multimodal data integration, differential privacy extensions, and human-AI trust calibration for clinical workflows.

A panel discussion on November 6 brought together six experts to examine regulatory expectations, reproducibility, and the broader challenges of certifying AI systems for high-stakes health settings. Across invited talks, accepted papers, and interactive discussions, the symposium reinforced the importance of safety, privacy, robustness, fairness, biological interpretability, and human-in-the-loop workflows as central pillars of trustworthy AI for health.

Apurva Narayan and Hong Qin served as cochairs of this symposium. This report was written by Hong Qin and Apurva Narayan.

Unifying Representations for Robot Application Development (S6)

The third AAAI Fall Symposium on Unifying Representations for Robot Application Development (UR-RAD) was held November 6–8, 2025, in Arlington, Virginia. In addition to four invited speakers and twelve paper presentations, the program featured several new activities, including author and speaker panels on the first and second days, a hardware session at the end of the second day, and a mentorship program for junior researchers.

Historically, UR-RAD has focused on how roboticists and artificial intelligence (AI) researchers use formal languages and computational abstractions to represent robot tasks, behaviors, and social interactions. The underlying goals of UR-RAD are, therefore, to categorize current trends in representations for robot application development, identify opportunities for adopting new representations, and identify areas where representational standardization versus representational diversity would be beneficial.
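
As a deliberately simplified illustration of what such an abstraction can look like, the sketch below represents a robot task as a small tree of typed actions that tools can inspect, transform, or execute. It is a generic example, not a formalism proposed at the symposium.

    # A deliberately simplified task representation (generic example;
    # not a formalism proposed at the symposium).
    from dataclasses import dataclass, field

    @dataclass
    class Action:
        name: str
        params: dict = field(default_factory=dict)

    @dataclass
    class Sequence:
        children: list

    def execute(node):
        """Walk the task tree, 'executing' each primitive action."""
        if isinstance(node, Sequence):
            for child in node.children:
                execute(child)
        else:
            print(f"executing {node.name} with {node.params}")

    fetch_mug = Sequence([
        Action("navigate", {"target": "kitchen"}),
        Action("grasp", {"object": "mug"}),
        Action("handover", {"recipient": "user"}),
    ])
    execute(fetch_mug)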

This year, UR-RAD increased its focus on inclusion. With the panels and mentorship sessions, UR-RAD 2025 aimed to increase inclusion of junior researchers in discussions with senior members of the field. With the hardware session, UR-RAD 2025 aimed to increase industry representation at UR-RAD and increase participation from hardware developers.

The first day of UR-RAD 2025 featured invited talks from two roboticists, Prof. Zhi Tan (Northeastern University) and Dr. Jeremy Marvel. Dr. Marvel discussed standardization in robotics, while Prof. Tan presented his group’s work in social robotics. The first day also featured eight paper presentations on robot architectures, robot control, and representations for robotics, as well as two author and speaker panels, one in the morning and another in the afternoon. These panels began with a member of the organizing committee prompting discussion among the authors and speakers from the most recent sessions, focusing on the use of representations within their own work. As the panels progressed, audience interaction increased.

The second day of UR-RAD featured invited talks from the social robotics and artificial intelligence communities: Prof. Cindy Bethel (Mississippi State University) and Prof. Ken Forbus (Northwestern University). Prof. Bethel spoke about architecture-related challenges in social robotics research, while Prof. Forbus spoke about large-scale ontologies and qualitative representations for social reasoning. After each talk, the UR-RAD organizers once again convened a speaker and author panel. The second day also featured four paper presentations on social robotics and human-robot interaction. The day concluded with a robot hardware session featuring short talks and a panel that provided insights from the creators and developers of several robot hardware platforms: Ross Mead (Quori platform), Jon Ferguson (Jibo platform), Kayla Matheus (Ommie platform), and Saad Elbeleidy (Hugo by Peerbots platform).

The Best Paper Award was given to Huajing Zhao, Brian Flynn, Adam Norton, and Holly Yanco for their work titled “Towards Developing Standards and Guidelines for Robot Grasping and Manipulation Pipelines in the COMPARE Ecosystem,” recognizing the strength of their research and its relevance to the symposium. The Best Paper Runner-Up Award went to Pranay Dugar, Aayam Shrestha, Fangzhou Yu, Bart van Marum, and Alan Fern for their work titled “Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots.”

The third day of UR-RAD focused on summarizing and expanding upon points from the preceding days, particularly how the findings translate into practical constraints and goals. Attendees engaged in in-depth discussions on several critical topics, including the tools and methods different stakeholders use for robot application development; methods for detecting, modeling, and understanding human behaviors; and goals and concerns for how future robot application development should proceed, given recent advances in large language models. The third day concluded with discussions of possible future improvements to the UR-RAD symposium, as well as feedback on changes made this year, including talk lengths, mentorship sessions, and post-presentation author panels.

David Porfirio, Ruchen Wen, Ross Mead, Laura M. Hiatt, Saad Elbeleidy, Laura Stegner, Jason Wilson, and Andrew Schoen served as co-organizers for UR-RAD. This report was written by David Porfirio, Ross Mead, and Andrew Schoen.

Author Bios

Bertrand Braunschweig is the scientific coordinator for the European Trustworthy AI Association (ETAIA).

Kimberly A. Cornell is an Assistant Professor at the University at Albany’s College of Emergency Preparedness, Homeland Security, and Cybersecurity, where she directs the Cybersecurity and Cryptography Lab.

James Hendler is the Acting Department Head of the Cognitive Science Department, the Tetherless World Professor of Computer, Web and Cognitive Sciences at Rensselaer Polytechnic Institute, and director of the RPI-IBM Artificial Intelligence Research Collaboration.

Brian Hu is a Technical Leader on the Computer Vision team at Kitware, Inc.

Ross Mead is the Founder and CEO of Semio AI, Inc., and the Executive Director of Semio Community.

Thilanka Munasinghe is a Senior Lecturer at Rensselaer Polytechnic Institute.

Apurva Narayan is an Associate Professor in the Department of Computer Science and Electrical and Computer Engineering at Western University, with affiliate appointments at the University of British Columbia.

Daniel E. O’Leary is the Ernst and Young Professor in the Leventhal and Marshall Schools at the University of Southern California.

David Porfirio is an Assistant Professor in the Department of Computer Science at George Mason University.

Michael J. Prietula is a Professor in the Goizueta Business School and in the Rollins School of Public Health at Emory University.

Hong Qin is an Associate Professor in the School of Data Science and the Department of Computer Science at Old Dominion University.

Andrew Schoen is a Frontend Developer and Interaction Designer at Semio AI, Inc., and a contributing member of Semio Community.

William Swartout is Chief Science Officer at the USC Institute for Creative Technologies, Co-Director of the Center for Generative AI and Society, and a Research Professor in the Computer Science Department at the USC Viterbi School of Engineering.

Jennifer C. Wei is a Project Scientist for NASA’s Earth Science Data and Information Systems and the lead scientist at the Goddard Earth Science Data and Information Services Center.