The online interactive magazine of the Association for the Advancement of Artificial Intelligence

Authors: Sedef Akinli-Kocak, Simone Bianco, Jill Burstein, Lu Cheng, Mengnan Du, Jonas Ehrhardt, Alborz Geramifard, Bissan Ghaddar, Leilani H. Gilpin, Fengxiang He, René Heesch, Rachneet Kaur, Faiza Khan Khattak, Tarun Kumar, Mingxiao Li, Bo Li, Yuxi Li, Lydia Liu, Debshila Basu Mallick, Deepak Maurya, Martin Michalowski, Vahid Partovi Nia, Blessing Ogbuokiri, Kassiani Papasotiriou, Edward Raff, Balaraman Ravindran, Shaina Raza, Aaqib Saeed, Laleh Seyed-Kalantari, Arash Shaban-Nejad, Ankit Shah, Amarda Shehu, Ryan Shi, Suchetha Siddagangappa, Arunesh Sinha, Dimitris Spathis, Biplav Srivastava, Marija Stannojevic, Graham W. Taylor, Yihang Wang, Di Xu, Yichao Yan, Qin Yang, Xiaodan Zhu

The Workshop Program of the Association for the Advancement of Artificial Intelligence’s 38th Conference on Artificial Intelligence (AAAI-24) was held in Vancouver, Canada on February 26-27, 2024. There were 35 workshops in the program: “Are Large Language Models Simply Causal Parrots?”, AI for Credible Elections: A Call To Action with Trusted AI, AI for Digital Human, AI for Education: Bridging Innovation and Responsibility, AI in Finance for Social Impact, AI to Accelerate Science and Engineering, AI-based Planning for Cyber-Physical Systems (CAIPI), AIBED: Artificial Intelligence for Brain Encoding and Decoding, Artificial Intelligence for Cyber Security (AICS), Artificial Intelligence for Operations Research, Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications, Artificial Intelligence with Biased or Scarce Data, Cooperative Multi-Agent Systems Decision-Making and Learning: From Individual Needs to Swarm Intelligence, Deployable AI (DAI), EIW-III: The 3rd Edge Intelligence Workshop on Large Language and Vision Models, FACTIFY 3.0 – Workshop Series on Multimodal Fact-Checking and Hate Speech Detection, Graphs and more Complex Structures For Learning and Reasoning (GCLR), Health Intelligence (W3PHIAI-24), Human-Centric Representation Learning (HCRL), Imageomics: Discovering Biological Knowledge from Images using AI, Large Language Models for Biological Discoveries, Learnable Optimization (LEANOPT), Machine Learning for Cognitive and Mental Health, Neuro-Symbolic Learning and Reasoning in the era of Large Language Models (NuCLeaR), Privacy-Preserving Artificial Intelligence, Public Sector Large Language Models: Algorithmic and Sociotechnical Design, Recommendation Ecosystems: Modeling, Optimization and Incentive Design, Responsible Language Models, Scientific Document Understanding, Sustainable AI, Synergy of Reinforcement Learning and Large Language Models, Workshop on Ad Hoc Teamwork, XAI4DRL: eXplainable Artificial Intelligence for Deep Reinforcement Learning, XAI4Sci: Explainable machine learning for sciences. This report contains summaries of the workshops, which were submitted by some, but not all, of the workshop chairs.

“Are Large Language Models Simply Causal Parrots?” (W1)

Understanding causal interactions is central to human cognition and, thereby, a central quest in science, engineering, business, and law. Developmental psychology has shown that children explore the world much as scientists do, asking questions such as “What if?” and “Why?”. AI research aims to replicate these capabilities in machines. Deep learning, in particular, has brought about powerful tools for function approximation by means of end-to-end trainable deep neural networks. This capability has been corroborated by tremendous success in countless applications. One of these successes is Large Language Models (LLMs). However, their lack of interpretability and the community’s indecisiveness in assessing their reasoning capabilities are a hindrance to building systems of human-like ability. Therefore, both enabling and better understanding causal reasoning capabilities in LLMs is of critical importance for research on the path towards human-level intelligence. The Pearlian formalization of causality has revealed a theoretically sound and practically strict hierarchy of reasoning that serves as a helpful grounding of what LLMs can achieve.

Our aim is to bring together researchers interested in identifying to what extent we can consider the output and internal workings of LLMs to be causal. Ultimately, we intend to answer the question: “Are LLMs causal parrots, or can they reason causally?” No formal report was filed by the organizers for this workshop.

AI for Credible Elections: A Call To Action with Trusted AI (W2)

This brief report presents highlights from the day-long workshop at AAAI-2024 on how Artificial Intelligence (AI)-related technologies can be used to address challenges in conducting elections, exploring issues at the intersection of AI, cybersecurity, law, political science, and journalism.

The third workshop on “AI for Credible Elections: A Call to Action with Trusted AI” [1] was held in person at the AAAI 2024 conference on Monday, February 26, 2024. With 2024 being the year of elections, as almost half the world’s population is experiencing elections, the need for accurate information for voters, candidates, and other stakeholders could never have been greater. Concomitantly, Artificial Intelligence (AI), the family of data-driven technologies that researchers have been developing for over five decades to provide information, might be considered the right tool to rise to the occasion and address these needs. However, when it comes to AI and elections, there is a general perception of pessimism – so much so that whenever AI is referenced in connection with elections, it often draws negative reactions due to the fear of bots, manipulation, and voter suppression. In particular, there is a concern about large information gaps and disorders (like mis- and dis-information).

Given this context, continuing our quest to make AI relevant for elections constructively, the workshop explored the challenges of credible elections globally in an academic setting with apolitical discussion on significant issues at the intersection of Artificial Intelligence (AI), Security, Journalism, Law, and Political Science. The event drew interest from researchers, journalists, government, and even a technology-conscious local representative. We give a brief summary of the discussions – more details, presentations, and papers are on the website.

A key theme of the workshop was data. The invited talk was given by Prof. Matthew Saltzman of Clemson University, representing the League of Women Voters (LWV), who talked about the US elections and how LWV has been providing voting-related information to voters. There was also a paper collating and generalizing common questions asked in the context of elections globally. A novel theme of the workshop was election monitoring. There was an interesting paper on using crowdsourcing platforms to enable people to report on elections in Africa (Kenya, Nigeria), repurposing the Ushahidi platform widely used for emergency management. Information disorder was also a prominent topic, with Prof. Matthew Wright of Rochester Institute of Technology talking about his work on technology for journalists, Prof. Anita Nikolich of the University of Illinois Urbana-Champaign sharing her work on scams and senior citizens, two papers discussing the implications of fake news for elections, and a benchmark. There was also a demonstration on simulating and controlling information spread in opinion networks that won the conference’s best demo award.

A significant part of the day was spent in three panels discussing with experts from different fields: why AI should not be used for elections (Panel A; AI non-use), why AI should be used for elections (Panel B; AI use), and what is needed to facilitate AI for elections (Panel C; AI enablement), along with a session on funding. Panel A was moderated by Prof. Nikolich and featured Prof. Sriraam Natarajan (University of Texas at Dallas; AI expert), Prof. Bryant Walker Smith (University of South Carolina; law expert), and Ms. Rachel V. See (Senior Counsel, Labor & Employment, Seyfarth Shaw; regulations). Prof. Smith was concerned that companies should not be able to influence opinions easily, Prof. Natarajan was worried that governments did not leverage AI researchers well and relied too much on CEOs, and Ms. See argued that regulations come too late in the game when AI is already built – proactive conversations should happen between developers, deployers, stakeholders, and purchasers of AI systems. All were concerned that the use of AI for reducing the information gap in elections is tricky given the self-interest of social media companies, governments’ automation bias, and the lack of sufficient safeguards. The second panel was moderated by Prof. Biplav Srivastava of the University of South Carolina and featured Prof. Virginia Dignum (Umea University; AI and ethics expert), Prof. Smith, and Mr. Josh Lawson (Aspen Institute; digital policy expert).

Prof. Dignum talked about the efforts made by the European Union to promote safe data access and AI impact, Prof. Smith provided context for AI use based on his extensive work in self-driving cars, and Mr. Lawson discussed the role his non-profit organization is playing in improving digital literacy. All agreed that the electoral information gap and disorder is an important societal opportunity to apply AI, not avoid it. The panel also discussed how, with the growing electorate and scale of elections worldwide, an increased role for technology is inevitable. The third panel was moderated by Prof. Wright and featured Mr. Lawson, Prof. Nikolich, Prof. Srivastava, and Prof. Andrea Hickerson (University of Mississippi; journalism expert). The discussion centered around the need for authoritative sources of data for AI and the important role open data can play, the need for AI practices and standards for testing, the need to reuse practices from around the world, and how AI innovations may be funded. In fact, the last session of the day was a discussion between Dr. Ott Velsberg, Chief Data Officer, Ministry of Economic Affairs and Communications, Republic of Estonia, and Prof. Nikolich, a former program manager at NSF. Dr. Velsberg described how Estonia is able to make small investments in AI technology and engage companies to transform elections.

We now combine the lessons of the three workshops to synthesize a common message. People around the world are worried about their democracies, but 2024 will also see them participating in elections in large numbers. In this regard, the workshop series has considered the challenges and opportunities of using technology, especially AI, from multiple perspectives and with a global outlook.

A multi-pronged solution is needed: process, people, and technology.

We have considered elections with perspectives from the US, India, the UK, Ireland, Canada, Switzerland, Brazil, Kenya, and Nigeria. Many problems have already been tackled at one scale (voter identification, voting technologies) and can be easily reused across geographies.

How a technology is designed and how issues around its usage are handled can affect voters’ trust in it. Electronic voting suffers from a transparency problem, which a paper trail helps mitigate somewhat. Crowdsourced election monitoring may be a promising idea for the future.

Information disorders are a key concern in elections but need not be a deal-breaker. AI can specifically help elections by disseminating official information (e.g., about candidates and the electoral process) personalized to a voter’s cognitive needs at scale, in their language and format. When AI helps prevent mis- and disinformation, the impact can be far-reaching.

More focus is needed on developing data sources, the information system stack, testing, and funding for AI and elections.

The different jurisdictions in the US have a lot of freedom in organizing elections, but as a result, also a lot of complexity. It is an open question whether this provides advantages and is desirable for a credible election.

Biplav Srivastava, Anita Nikolich, Andrea Hickerson, Tarmo Koppel, Chris Dawes, and Sachindra Joshi served as co-chairs of the workshop. This report was written by Biplav Srivastava.

AI for Digital Human (W3)

The first international workshop on AI for Digital Human was conducted with the objective of bringing together researchers interested in the latest advancements in the field of digital human and how artificial intelligence can be leveraged to improve the quality and efficiency of the process. The event attracted participants from around the globe, highlighting the significance of the discussed topic.

In the opening keynote, Prof. Dinesh K. Pai, Professor of Computer Science at the University of British Columbia (UBC) and founder of Vital Mechanics Research, gave a talk on “Digital Humans: Science and Simulation in the Time of AI.” This talk discussed how to develop realistic computational models of the human body and explored some fundamental technical limits to sensing and accurately predicting the actual behaviors of a real system. Wei Sen Loi, a staff researcher at Huawei Canada, and Joel Johnson, a graduate student at UBC and a research intern at Huawei Canada, delivered a talk on “Deep Albedo: Real-time biophysically-based facial map modifications.” They presented a method that leverages autoencoders and simulated biophysically-based skin parameters to enable an efficient spatially-varying mapping between the biophysical parameters and their resulting skin color. These two talks raised lively discussions among the participants and speakers.

In addition to the talks, we accepted ten high-quality papers out of 20 submissions. Authors of these papers presented their work in the oral session. The best paper award was given to “Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation”. During the workshop, the winners of the co-organized competitions were announced. Benjamin MacAdam, a postdoctoral researcher at Simon Fraser University, hosted all the local sessions.

This workshop was co-organized by Yichao Yan, Di Xu, Haozhe Jia, Matthias Nießner, Jiankang Deng, and KangKang Yin. This report was written by Yichao Yan and Di Xu.

AI for Education: Bridging Innovation and Responsibility (W4)

This two-day workshop connected experts, educators, learners, and policymakers to delve into the transformative promise and ethical challenges posed by AI for education (AI4ED). Thanks to generous scholarships from Eedi and Google, the workshop allowed emerging scholars from underrepresented communities to participate in this event.

Day 1 began with a focus on innovations in AI, specifically the burgeoning field of GenAI and its implications for education. Prof. Vered Shwartz’s (University of British Columbia) opening talk provided a comprehensive overview of the mechanics underpinning Large Language Models (LLMs), ensuring a level playing field for all participants. This foundational session was crucial, as LLMs dominated much of the workshop’s discussions, from their capabilities to their limitations in educational settings. A series of contributed talks from emerging and expert researchers followed, showcasing innovative use cases: LLMs for computer science education, assessment of tutoring practices, student advising, generating hints to support student understanding, using GenAI to create richer generative learning experiences, and enhancing critical thinking. The posters represented a wide variety of research, from reducing hallucinations in nursing education to the use of LLMs for scaling knowledge graph generation, co-creating concept analogies, and optimizing prompt engineering. Gary Marcus (professor emeritus, New York University), a well-known GenAI critic, delivered a provocative keynote that sparked debate and called for greater regulatory oversight of these technologies. With over 70 submissions, the mathematical reasoning competition revealed that LLMs performed better on English than on Mandarin math problems, and that industry-leading models outperformed custom ones. Day 1 culminated in the forward-looking panel “AI4ED 2034: Prioritizing Use Cases for Equitable Impact,” which explored groundbreaking use cases with a strong emphasis on equity, including AI tools to foster computational thinking, personalize learning with precision, transparently support faculty evaluations while mitigating potential biases, enhance peer collaboration in STEM, and promote inclusion and diverse perspectives.
Ethical concerns were woven into the discussion, emphasizing the importance of AI literacy for educators, the creation of open standards for AI use, and interoperable tools to ensure fairness and inclusion.

Day 2 focused on responsibility in AI4ED. Prof. Virginia Dignum’s (Umea University) keynote called for socio-technical innovation, regulation, data protection, and an interdisciplinary approach to address the transformative impact AI will have on educational assessments. Four invited speakers discussed different aspects of responsible AI (RAI). Prof. Emma Brunskill (Stanford University) showcased how AI-based personalized learning systems can aid the lowest-performing students, also noting the failure of LLMs to accurately simulate learning dynamics, pointing to an area ripe for further R&D. Dr. Victoria Yaneva (National Board of Medical Examiners) presented barriers and opportunities for adopting AI generation of test items in scientific domains. Dr. Will Belzak (Duolingo) discussed how complex measurement bias can be evaluated for LLM-generated and machine learning-scored test questions. Prof. Renzhe Yu (Columbia University) discussed digital inequalities in the era of LLMs. The “Responsibly Developing, Evaluating, and Using (Generative) AI4ED” panel brought together thought leaders to discuss the ethical deployment of AI in education. The consensus underscored AI as a tool to support, not replace, educators, stressing the importance of involving stakeholders at every stage to ensure the representation of the diverse needs of students, educators, and educational institutions. Finally, the need for randomized controlled trials to obtain robust evidence of the impact of these technologies on learning outcomes beyond engagement was emphasized. The day two poster session included spotlight talks from a diverse group of new researchers. A common theme among these talks was the vulnerabilities and biases that arise from the use of LLMs in educational applications.
The workshop wrapped up with a critical panel on “Centering Equity and Underserved Populations in AI4ED” that highlighted the ways in which inherent biases in GenAI models can lead to misrepresentation or loss of cultural nuances, particularly among indigenous and other underrepresented communities. Panelists shared insights into mitigating these challenges, emphasizing the importance of inclusivity and diversity in AI model training, development, and application.

While acknowledging the immense potential of AI to enhance educational experiences, the workshop highlighted an equally strong need for equity-driven responsible AI development. This workshop served as a humble attempt to reconcile the historic tension between innovation and responsibility, and a call-to-action for the AI4ED community to connect with each other, collaborate, and innovate responsibly.

The co-chairs for the workshop were Muktha Ananda (Google), Debshila Basu Mallick (OpenStax, Rice University), Jill Burstein (Duolingo), Lydia Liu (Princeton), Zitao Liu (Jinan University), James Sharpnack (Duolingo), Jack Wang (Adobe), and Serena Wang (UC Berkeley). This report was authored by Debshila Basu Mallick, Jill Burstein, and Lydia T. Liu.

AI in Finance for Social Impact (W5)

The “AI in Finance for Social Impact” workshop, held at AAAI 2024, marked the first of a promising series aimed at exploring the role of Artificial Intelligence (AI) in fostering socially responsible finance. Gathering experts from both academia and industry, the event served as a crucial platform for sharing ideas, experiences, and best practices to advance socially responsible finance and encourage collaboration among stakeholders. This focus aligns with the growing demand from consumers and regulators for companies to adopt Environmental, Social, and Governance (ESG) considerations in their investment strategies, enhancing risk management and aligning investments with broader social goals.

As AI’s presence in finance grows, its potential to support ethical and sustainable practices becomes more evident. The technology is being employed to combat financial crime, enhance financial inclusion, advance ESG investing, and develop privacy-preserving solutions. Moreover, there is an ongoing industry effort to ensure that AI is developed and utilized responsibly and ethically, particularly by addressing potential biases in AI-driven financial decision-making.

The workshop began with opening remarks by the Organizing Committee, followed by a series of engaging sessions. The morning session featured two invited talks and six contributed paper presentations, while the afternoon session included two more invited talks and three paper presentations. Notable speakers included Bryan Wilder from Carnegie Mellon University, Quanquan Gu from the University of California, Los Angeles, Fei Fang from Carnegie Mellon University, and Brian Barr from Capital One. Participants also enjoyed poster sessions during coffee and lunch breaks. The workshop received 26 paper submissions across Full, Short, and Extended Abstract formats. Approximately half were accepted for oral presentations and about 20% for poster sessions, with 14 full papers and six posters ultimately accepted. The event’s highlights were the Best Paper award given to “FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation” by Marco Schreyer and the Best Poster award to “Breaking Barriers: Unveiling Gender Disparities in Corporate Board Career Paths Using Deep Learning” by Yuhao Zhou. The workshop attracted a peak attendance of 75 participants, characterized by a high level of engagement evidenced by insightful comments and questions from the audience.

In summary, the workshop not only highlighted the potential of AI in promoting socially responsible finance but also set the stage for future discussions and developments in this critical area. The exchange of knowledge and ideas at this workshop is expected to contribute significantly to the advancement of socially responsible financial practices globally.

Rachneet Kaur, Kassiani Papasotiriou, and Suchetha Siddagangappa served as co-chairs of this workshop.

AI to Accelerate Science and Engineering (W6)

Scientists and engineers in diverse application domains are increasingly relying on computational and artificial intelligence (AI) tools to accelerate scientific discovery and engineering design. AI, machine learning, and reasoning algorithms are useful in building models and making decisions towards this goal. We have already seen several success stories of AI in applications such as materials discovery, ecology, wildlife conservation, and molecule design optimization. This workshop aims to bring together researchers from AI and diverse science and engineering communities to achieve the following goals: (1) identify and understand the challenges in applying AI to specific science and engineering problems; (2) develop, adapt, and refine AI tools for novel problem settings and challenges; and (3) build community and provide education to encourage collaboration between AI researchers and domain experts. No formal report was filed by the organizers for this workshop.

AI-based Planning for Cyber-Physical Systems (CAIPI) (W7)

The Workshop on “AI Planning for Cyber-Physical Systems” was held on February 26, 2024, at the 38th Annual AAAI Conference on Artificial Intelligence in Vancouver, Canada. Cyber-Physical Systems (CPS) pose a challenging domain for planning algorithms due to their complexity and their application in a broad spectrum of real-world scenarios. Recent advances in AI planning, including neuro-symbolic architectures, Large Language Models (LLMs), deep reinforcement learning, and developments in symbolic planning paradigms, introduce new approaches to tackling the complexity of CPS planning problems. The workshop served as the first platform for researchers from the different subdomains of AI planning to share their knowledge in lively discussions.

The workshop hosted a keynote by IJCAI Computers and Thought Award winner Johan de Kleer, highlighting the imperative of resilience in CPS, the importance of preventing failures and ensuring continuous operation, and pointing out the origin of failures in inadequate anticipation of potential risks. De Kleer underscored AI’s role, especially in planning and Machine Learning (ML), in confronting unforeseen challenges, thereby bolstering system resilience.

The workshop featured six paper presentations covering a wide range of applications in AI-based planning for CPS. Topics included cybersecurity enhancements in production settings, the use of LLMs for human-centric planning, and online health-aware replanning for UAVs, each addressing a distinct challenge and proposing a solution to it.

One study tackled cybersecurity vulnerabilities in cyber-physical production systems, proposing a novel method to efficiently isolate compromised devices, thus avoiding full production shutdowns. Another explored LLMs in human-in-the-loop CPS scenarios, introducing a model that ensures the feasibility and safety of generated plans, demonstrated through automated insulin delivery systems for Type 1 Diabetes patients. Addressing decision-making under uncertainty, the Policy-Augmented Monte Carlo Tree Search (PA-MCTS) merged policy estimates with current environmental models, providing a robust solution for dynamic settings. Another paper bridged AI planning with practical applications by automating the creation of planning problems from semantic capability models, simplifying the planning process for manufacturing systems and autonomous robots. A health-aware replanning framework was introduced, incorporating prognostics and fault detection to enable safer planning based on system health information. Lastly, a novel approach used knowledge graphs to structure information, enhancing decision-making and planning accuracy in the automotive electrical systems domain.

The workshop ended with a late lunch together, where the course was set for future collaborations and a joint project. Participants expressed their interest in continuing the discussions at future meetings and suggested repeating the workshop next year.

Oliver Niggemann, Gautam Biswas, Roni Stern, Alois Zoitl, Borahan Tümer, Alexander Diedrich, Jonas Ehrhardt, René Heesch, and Niklas Widulle served on the organizing committee. This report was written by René Heesch and Jonas Ehrhardt.

AIBED: Artificial Intelligence for Brain Encoding and Decoding (W8)

The 1st Artificial Intelligence for Brain Encoding and Decoding Workshop, held in conjunction with AAAI 2024, was a success. The workshop was inspired by the significant advancements in the use of deep artificial neural networks for brain encoding and decoding, which have led to remarkable achievements in reconstructing high-quality images and videos from brain activity. The workshop aimed to explore the intersection of AI and neuroscience, focusing on how AI, particularly deep artificial neural networks, can facilitate the encoding and decoding of brain activities. It also provided a platform for researchers engaged in brain encoding and decoding, as well as those interested in the field, to discuss recent developments and share insights for future research.

The workshop received a total of eight submissions. These submissions underwent a rigorous double-blind review process to ensure fairness and anonymity, culminating in the selection of four papers for oral presentation. The four papers are:

A multi-modal visual information processing framework for encoding fMRI activity aided by verbal semantic information. Shuxiao Ma, Linyuan Wang, Ruoxi Qin, Senbao Hou, Bin Yan;

A Comparative Analysis of Language Models for the Classification of Alzheimer’s Disease Based on Connected Speech. Helena Balabin, Laure Spruyt, Ella Eycken, Ines Kabouche, Bastiaan Tamm, Jolien Schaeverbeke, Patrick Dupont, Marie-Francine Moens, Rik Vandenberghe;

Asynchronous Signalling in Spike Neural Networks: Enabling on Chip Training with Intrinsic Temporal Learning Capacity. Hao Yu, Nancy Fulda, Jordan Yorgason;

BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding. Yulong Liu, Yongqiang Ma, Haodong Jing, Wei Zhou, Guibo Zhu, Nanning Zheng.

The workshop was streamed online, with more than 5,600 viewers from around the globe. The event was scheduled to last approximately five hours and fifteen minutes, commencing at 15:00 and concluding at 20:15 Beijing Time. The entire event was recorded during the online stream and will be uploaded to a public website for convenient access. The workshop was organized into three main parts: opening remarks, a series of four invited presentations by researchers from various institutions, and oral presentations by four paper authors. Dr. Jingyuan Sun, a postdoctoral researcher from KU Leuven, began the event with an introduction covering the latest developments in brain encoding and decoding, then proceeded to discuss language and visual decoding from brain activity. This was succeeded by a talk from Professor Shaonan Wang of the Institute of Automation at the Chinese Academy of Sciences, entitled “Toward a Comprehensive Study of Human and Machine Language Understanding.” Following this, Professor Yu Takagi from Osaka University delved into how deep generative models can be integrated with human brain activity. Jiaxin Qing of the Chinese University of Hong Kong wrapped up this series with insights into reconstructing videos and images from brain signals. Each presentation was followed by a 10-minute question and discussion period. Concluding the workshop, four paper authors presented their research: Shuxiao Ma from the PLA Strategic Support Force Information Engineering University, Hao Yu from Brigham Young University, Helena Balabin from KU Leuven, and Jianxiong Gao from Fudan University. Each of these oral presentations was followed by a five-minute Q&A session.

Jingyuan Sun, Marie-Francine Moens, Shaonan Wang, Helena Balabin, Zijiao Chen, Jiaxin Qing, Mingxiao Li, and Xinpei Zhao co-organized the workshop. Jingyuan Sun, Marie-Francine Moens, Helena Balabin, and Mingxiao Li are from the Department of Computer Science, KU Leuven, Belgium; Shaonan Wang and Xinpei Zhao are from the Institute of Automation, Chinese Academy of Sciences, China; Zijiao Chen is from the National University of Singapore; and Jiaxin Qing is from the Chinese University of Hong Kong. This report was written by Mingxiao Li.

Artificial Intelligence for Cyber Security (AICS) (W9)

The AICS 2024 workshop focused on research and applications of AI to problems in cyber security, including attacks on machine learning in cyber security, systems security, adversarial attacks and security budget allocation, and large language models. Talks emphasized the application of AI to operational problems and surveyed systems and basic research on techniques that enable the resilience of cyber-enabled systems. The focus this year was on large language models (LLMs) and their relation to cyber security.

The workshop began with a keynote by Prof. Yevgeniy Vorobeychik (Professor of Computer Science and Engineering at Washington University in St. Louis), entitled “Towards Trustworthy Autonomous AI-Driven Systems.” Prof. Vorobeychik stressed the need for a principled means to address problems in AI for cyber-systems to help address, among other things, robust learning and control in complex perceptual environments. The talk explored various aspects of robust reinforcement learning in adversarial settings, including curriculum-based learning, stability analysis, and certified robustness.

Within the adversarial attack session, the first paper focused on practical and realizable attacks in adversarial machine learning, especially pointing out the disconnect between attacks handled in research versus attacks realizable in practice. A second paper discussed adversarial attacks on node attributes in graph neural networks.

Within the systems security session, the first paper evaluated the use of minority over-sampling methods to tackle the class imbalance that occurs in network traffic classification; combined with a deep neural network, this showed promising results for DDoS detection on modern network traffic datasets. The second paper introduced a deep learning framework for quantifying uncertainty in the decisions of a network intrusion detection classifier by learning class-conditional representations of its internal features. The third paper examined issues in detecting malware from raw bytes using State Space Models, where extreme sequence lengths of up to 200 million bytes are an impediment to modern language models; by adapting techniques from classical AI and vector symbolic architectures, the authors showed improved accuracy and reduced runtime on long-sequence classification of malware. The fourth paper contended that reinforcement learning is not suitable for network intrusion detection when making temporal observations through the arrival of sequential packets; instead, their method conditioned a causal decision transformer on past trajectories, comprising rewards, network packets, and detection decisions, to generate future detection decisions that attain the desired return. The final paper introduced a dataset and parser, MalDICT. Anti-virus products are notorious for inconsistent and undocumented output formats, making automated processing a challenge; the paper addressed this by reverse engineering the outputs of over 70 anti-virus products and implementing parsers to extract malware behaviors, platforms, exploitation, and packers.

After lunch, the second keynote was by Scott Coull, Director of Data Science Research at Mandiant (part of Google Cloud), entitled “The Importance of Systems Thinking in Machine Learning for Cybersecurity.” Scott stressed the need for a practical look at problems in AI for cyber-systems, not just a deep, hyperfocused approach to solving well-established problems in the research community. The talk made interesting points on the general misalignment between researchers and practitioners and emphasized the importance of blending the academic and industry research communities and developing a systems-focused view of machine learning problems in cybersecurity.

Within the adversarial attack and budget allocation security session, the first paper discussed performing a red team evaluation using deep reinforcement learning to assess the robustness of network intrusion detection systems against evasion attacks. The second paper focused on generating adversarial examples against multilingual text classifiers, challenging the assumption that victim models are monolingual. The third paper presented an optimization framework for budget allocation for tasks such as cybersecurity infrastructure improvement, incident response planning, and employee training. It demonstrated a direct impact of the budget on the selection of the optimal set of preventive mitigation measures and the associated cybersecurity risk.

Within the LLM session, the first paper exposed that many LLMs output hate speech when prompted progressively. The second paper investigated the use of LLMs for interpreting Tactics, Techniques, and Procedures (TTPs), which describe an attacker’s process; its results reveal that LLMs offer inaccurate interpretations of cyberattack procedures, with significant improvements when Retrieval-Augmented Generation (RAG) is used for decoder-only LLMs. The final paper presented a multi-task benchmark for evaluating LLMs on cybersecurity tasks; in one task, for example, an LLM was used to detect phishing emails.

AICS 2024 was the eighth AI for Cyber Security workshop. The workshop had a healthy attendance of 25-30 people on average. This year, AICS was co-chaired by James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Ankit Shah, Arunesh Sinha, Diane Staheli, and Allan Wollaber. This report was written by Edward Raff, Ankit Shah, and Arunesh Sinha.

Artificial Intelligence for Operations Research (W10)

The Artificial Intelligence for Operations Research Workshop (W10), held at AAAI 2024 on February 26th, 2024, provided a platform for exploring the integration of AI within the Operations Research workflow. It specifically focused on its impact on OR modeling and model-solving phases. The workshop catered to an audience from both academia and industry, who are at the intersection of Machine Learning and Optimization.

The workshop featured seven international speakers from various countries (Canada, USA, Europe, and Asia): Serdar Kadioglu, Madeleine Udell, Yong Zhang, Ellen Vitercik, Martin Takac, Phebe Vayanos, and Maximilian Schiffer. It showcased the latest advancements and methodologies in AI for OR, particularly emphasizing deep learning algorithms and the transformative capabilities of language models like GPT-4 in mathematical modeling and problem-solving. The morning session focused on integrating AI for OR Modeling, with three speakers presenting the recent advancements of language models in formulating mathematical models, generating optimization problem formulations, and assisting in decision-making processes within OR contexts. The second session covered AI for Model Solving, focusing on devising effective strategies and algorithms to solve mathematical optimization models.

The last part of the workshop consisted of a panel discussion on the opportunities and challenges of integrating LLMs in Operations Research. The panel included Serdar Kadioglu, Madeleine Udell, and Yong Zhang and was moderated by Bissan Ghaddar. The discussions centered on the following key questions:

  • How can LLMs enhance the decision-making process?
  • What are some successful stories of integrating LLMs with OR (in academia as well as in the industry)?
  • How does Explainable AI fit into the picture? What about ethical considerations, including bias and mitigation strategies, when using LLMs in decision-making?
  • How can we build an interdisciplinary community in this area?

The workshop ended with a poster session in which six posters presented different aspects of integrating AI and optimization. This sparked several discussions, and the papers behind the presented posters will be published in a special issue of INFOR. Attendees and speakers recognized the workshop’s significance in addressing innovative approaches at the intersection of AI and Operations Research. The organizers received very positive feedback from the audience and speakers: ‘Congratulations on the successful AAAI Workshop!’ and ‘Thank you again for the amazing workshop – I am leaving Vancouver with great optimism and excitement in this domain!’ The workshop’s success not only highlights the potential for AI-driven solutions to drive transformative change in OR practices but has also generated momentum, with some participants deciding to collaborate on new projects on these topics.

For a comprehensive overview, visit the workshop’s website:

Bissan Ghaddar, Claudia D’Ambrosio, Giuseppe Carenini, Jie Wang, Yong Zhang, Zhenan Fan, and Zirui Zhou served as co-chairs of this workshop. This report was written by Bissan Ghaddar.

Artificial Intelligence for Time Series Analysis (AI4TS): Theory, Algorithms, and Applications (W11)

Time series data are becoming ubiquitous in numerous real-world applications, e.g., IoT devices, healthcare, wearable devices, smart vehicles, financial markets, biological sciences, environmental sciences, etc. Given the availability of massive amounts of data, their complex underlying structures/distributions, together with the high-performance computing platforms, there is a great demand for developing new theories and algorithms to tackle fundamental challenges (e.g., representation, classification, prediction, causal analysis, etc.) in various types of applications.

The goal of this workshop was to provide a platform for researchers and AI practitioners from both academia and industry to discuss potential research directions, examine key technical issues, and present solutions to related challenges in practical applications. The workshop focused on both the theoretical and practical aspects of time series analysis and aimed to trigger research innovations in theories, algorithms, and applications, inviting contributions from researchers and AI practitioners in machine learning, data science, statistics, econometrics, and many other related areas. No formal report was filed by the organizers for this workshop.

Artificial Intelligence with Biased or Scarce Data (W12)

Despite notable advancements, integrating Artificial Intelligence (AI) into practical uses like autonomous vehicles, industrial robotics, and healthcare remains a formidable task. This complexity arises from the diverse and rare occurrences in the real world, necessitating AI algorithms to train on extensive data. However, these domains often suffer from data scarcity, making it difficult to gather raw or annotated data. Even when data is available, inherent biases creep in during collection, leading to skewed models. To address these concerns and with AI’s growing prominence, we aim to establish a platform for academics and industry professionals to deliberate on the challenges and remedies in constructing AI systems when confronted with limited data and biases. No formal report was filed by the organizers for this workshop.

Cooperative Multi-Agent Systems Decision-Making and Learning: From Individual Needs to Swarm Intelligence (W13)

With the tremendous growth of AI technology, robotics, IoT, and high-speed wireless sensor networks (such as 5G) in recent years, an artificial ecosystem termed artificial social systems has gradually formed, involving AI agents ranging from software entities to hardware devices. How to integrate artificial social systems into human society so that they coexist harmoniously is a critical issue for the sustainable development of human beings. Rational decision-making and efficient learning from multi-agent system (MAS) interaction are preconditions for guaranteeing that agents work safely, balancing group utilities against system costs in the long term, and satisfying group members’ needs in their cooperation. From the cognitive modeling perspective, embodying the realistic constraints, capabilities, and tendencies of individual agents in their interactions with physical and social environments may provide a more realistic basis for understanding cooperative multi-agent interaction.

A number of research trends and challenges provide insights into this field. One important issue is how to model the behaviors of cooperative MAS from the perspective of individual cognitive models, such as agent needs and innate values (utilities), in decision-making and learning. Another crucial problem is how to build a robust, stable, and reliable trust network among AI agents, such as trust among robots and between humans and robots, evaluating their performance and status on common ground when they make collective decisions and learn from interactions in complex and uncertain environments. Furthermore, exploring practical and efficient reinforcement learning (RL) methods, such as deep RL and multi-agent RL, for global and partial cooperation in centralized, decentralized, and distributed settings remains challenging: the complexity of cooperative multi-agent problems rises rapidly with the number of agents and their behavioral sophistication, especially when determining action sequences and strategies and learning from interactions while adapting to complex, dynamically changing environments. In the invited speakers’ session, Prof. Maria Gini (University of Minnesota) discussed how to coordinate a large number of robots fulfilling a task, Prof. Giovanni Beltrame (Polytechnique Montreal) introduced the role of hierarchy in multi-agent decision-making, and Prof. Christopher Amato (Northeastern University) clarified fundamental challenges and misunderstandings of multi-agent RL.

Moreover, cooperative MAS considers multiple agents interacting in complex or uncertain environments to jointly solve tasks and maximize the group’s utility, viewing the system’s utility from the perspective of individual needs. Balancing rewards between agents and groups through interaction and adaptation in cooperation optimizes the global system’s utility and guarantees sustainable development for each group member, much as human society does. Prof. Aaron Courville (Université de Montréal) introduced the Q-value Shaping method for optimizing an agent’s individual utility while fostering cooperation among adversaries in partially competitive environments, Prof. Michael L. Littman (Brown University) discussed the implementation of interacting agents and safe(r) AI in MAS-human interaction, and Prof. Kevin Leyton-Brown (University of British Columbia) talked about modeling nonstrategic human play in games, describing how the economic rationality of such models can be assessed and presenting initial experimental findings showing the extent to which these models replicate human-like cognitive biases.

The application domains include self-driving cars, delivery drones, multi-robot rescue, swarm-robot space exploration, automated warehouse systems, IoT devices, smart homes, unmanned medical care systems, automatic planting and harvesting systems, military scouting and patrolling, and real-time strategy (RTS) video games. In particular, the factory of the future is likely to use robots for a much broader range of tasks in a much more collaborative manner with humans, which intrinsically requires operation in proximity to humans and raises safety and efficiency issues. Specifically, Prof. Sven Koenig (University of Southern California) introduced multi-agent path finding and its applications, including warehousing, manufacturing, and train scheduling, and Prof. Marco Pavone (Stanford University) discussed artificial-currency-based government welfare programs, e.g., transit benefits programs that provide eligible users with subsidized public transit.

Fourteen peer-reviewed papers were presented in the workshop, including five oral and nine poster presentations. Topics included MAS RL in communication, traffic routing, Bayesian Soft Actor-Critic, multi-agent imperfect-information games, cognitive MAS RL, innate-values-driven RL, edge-computing-based human-robot cognitive fusion, and relational planning in MAS RL. Some had been accepted by the previous AAMAS conference, ACM SIGMOD, and top AI journals such as ACM TIST. The recording, photos, and papers of the workshop are available on the workshop site. This report was written by Qin Yang.

Deployable AI (DAI) (W14)

Artificial Intelligence (AI) has evolved into a vast interdisciplinary research area, and we have reached an era in which AI is beginning to be deployed as real-world solutions across various sectors and domains. Over the last few years, generative AI in the form of models like GPT-4 and Bard has not only garnered interest across several sectors and shown tremendous success in various tasks but has also begun to be applied in various sectors in naive ways. Moving to a wider scope of deployment of these AI models in the real world is not simply a question of translational research and engineering; it also requires addressing fundamental research questions and issues involving algorithmic, systemic, and societal aspects, while adhering to Responsible AI standards with respect to fairness, ethics, privacy, explainability, and security. No formal report was filed by the organizers for this workshop.

EIW-III: The 3rd Edge Intelligence Workshop on Large Language and Vision Models (W16)

The workshop focused on the edge deployment of large language and vision models and how to make them more efficient in terms of Data, Model, Training, and Inference targeting edge and cloud platforms. This was an interdisciplinary workshop that covered the theory, hardware, and software aspects of AI models, especially for large language and vision models. The workshop offered an interactive platform for gathering different experts and talents from different research areas from academia and industry through invited talks, panel discussions, paper submissions, poster presentations, and contributed oral presentations.

There has been a growing demand to make intelligent applications more energy efficient by moving them to edge devices, given advantages such as increased privacy and lower network latency compared to cloud computing. Edge devices have been a key enabler for modern AI applications. In recent years, we have seen a trend toward homogenizing edge and IoT devices, in which billions of mobile and IoT devices are connected to the Internet, generating huge amounts of data. AI for edge and IoT devices has received significant attention, leading to various synonyms such as Edge AI, Edge Intelligence, Low Resource Computing, Energy Efficient Deep Learning, and Embedded AI. Both enterprises and customers have shown considerable interest in keeping computation at the edge. However, deploying large language and vision models, which are growing in size (such as GPT, PaLM, LLaMA, ViT, Swin Transformer, and conversational models, e.g., ChatGPT), on resource-constrained edge devices poses various challenges. For instance, training these over-parameterized models on huge amounts of data and fine-tuning them is very expensive and consumes a large amount of memory and computational power. On the other hand, embedded devices operate with limited memory and computing power, which is the main challenge in bringing intelligence to the edge.

The scope of this workshop specifically covered:

  • Efficient Pre-training and Fine-tuning: How can we run training on edge devices with limited memory and computational power?
  • Parameter-efficient tuning solutions, i.e., training only a portion of the entire network.
  • Accelerating the fine-tuning process (e.g., by improving the optimizer and layer-skipping).

Data Efficiency:

  • Pre-training with unlabeled data.
  • Fine-tuning with labeled data.
  • Labeling data and incorporating human-annotated data or human feedback is very time-consuming and costly.
  • How can we reduce the requirements for human-labeled data (e.g., using RLAIF instead of RLHF)?
  • Can we rely on machine-generated data for training models (e.g., data collected from ChatGPT)?
  • How can we effectively use data compression and data distillation?

Efficient Deployment:

  • How can we reduce the inference time or memory footprint of a trained model for a particular task?
  • How do we rely on in-context learning and prompt engineering of large language models or fine-tuning smaller models by knowledge transfer from larger models?
  • How can we use neural model compression techniques such as (post-training) quantization, pruning, layer decomposition, and knowledge distillation (KD) for language and vision models?
  • How can we evaluate the impact of different efficient deployment solutions on the inductive biases learned by the original models (such as OOD generalization, in-context learning, faithfulness, and hallucination)?

Other Efficient Applications:

  • How can we leverage knowledge localization, knowledge editing, or targeted editing/training of foundation models?
  • How can we extend such models for efficient dense retrieval and search?
  • How should we train these large models on the device?
  • How can we incorporate external knowledge into pre-trained models?
  • How can we implement efficient federated learning for such big models, reducing communication costs and tackling heterogeneous data and heterogeneous models?

The workshop was held as a one-day event at AAAI-24 in Vancouver, Canada, with over 50 participants, 14 accepted papers, five invited talks, and a panel discussion. The papers and slides are available on the workshop website. Warren Gross and Vahid Partovi Nia were co-chairs. This report was written by Vahid Partovi Nia.

FACTIFY 3.0 – Workshop Series on Multimodal Fact-Checking and Hate Speech Detection (W17)

FACTIFY 3.0 is a workshop series on multimodal fact-checking and hate speech detection. Two shared tasks were organized, namely Factify 3.0 and Dehate. The organizers tentatively planned to invite Prof. Eduard Hovy, Professor at the Language Technologies Institute, Carnegie Mellon University; Prof. Yejin Choi, Professor at the University of Washington; and Zhe Gan, a Staff Research Scientist at Apple AI/ML who has worked on large-scale multimodality. No formal report was filed by the organizers for this workshop.

Graphs and more Complex Structures For Learning and Reasoning (GCLR) (W18)

The fourth edition of the Graphs and Complex Structures for Learning and Reasoning (GCLR) workshop convened to foster interdisciplinary dialogue among scholars spanning computer science, mathematics, statistics, physics, and related domains. The global participation and enthusiastic feedback underscored the significance of the workshop’s themes.

In the opening keynote, “Human Allied Learning of Neurosymbolic Models,” Sriraam Natarajan explored the convergence of symbolic and statistical approaches in Artificial Intelligence (AI). He emphasized the necessity of integrating different pathways in AI to enable seamless human interaction. Prof. Natarajan highlighted the potential of rich, structured, and complex data models for fostering human-AI collaboration. Additionally, he discussed advancements in incorporating human advice into learning algorithms and the significance of “closing the loop” to enhance the self-awareness of AI agents.

These insights laid the groundwork for subsequent discussions on leveraging AI methodologies in diverse domains. Speakers such as Mahashweta Das delved into the application of Graph Neural Networks (GNNs) in financial data analysis. She discussed how GNNs offer a powerful framework for modeling complex financial networks and extracting meaningful insights from large-scale financial datasets. Clara Stegehuis elucidated the challenges in detecting underlying geometries in network structures. She emphasized the limitations of traditional metrics, such as triangle counts and clustering coefficients, in identifying geometries induced by hyperbolic spaces. Michael Galkin presented innovative approaches for knowledge graph reasoning, emphasizing the importance of foundation models in graph learning. Galkin demonstrated the superior performance of ULTRA in zero-shot inference mode and its applicability in complex logical query answering on knowledge graphs. His talk underscored the significance of foundation models in advancing reasoning capabilities in graph-based systems.

Furthermore, Serina Chang highlighted the role of AI methods in understanding complex human networks and decision-making processes, particularly in the context of the COVID-19 pandemic response. Chang shared insights from her team’s work on developing AI-driven tools for pandemic response and supply chain disruptions, showcasing the transformative potential of AI in addressing societal challenges. Jiliang Tang provided a retrospective on the evolution of graph-based learning techniques and envisioned future trajectories toward graph foundation models. Karthik Subbian concluded the keynote session by addressing practical challenges in graph representation learning, underscoring the complexities inherent in real-world applications.

Collectively, these keynote presentations underscored the interdisciplinary nature of AI research and its transformative potential in addressing contemporary societal challenges. In addition to the talks, the workshop received many high-quality submissions. Our program committee consisted of more than 50 researchers with diverse areas of expertise. All paper submissions received at least three constructive reviews, and many received five. Based on the reviews, 20 high-quality papers were accepted, and their authors presented their work in the poster session.

The audience was very attentive and asked several interesting questions during the keynote talks and poster presentation session, which made this hybrid event very interactive. We believe some of the attendees made new friends at the GCLR workshop, which may lead to future collaborations.

The fourth edition of the GCLR workshop was co-organized by Balaraman Ravindran (IIT Madras), Ginestra Bianconi (Queen Mary University of London), Philip S. Chodrow (Middlebury College), Nitesh Chawla (University of Notre Dame), Tarun Kumar (Hewlett-Packard Labs), Deepak Maurya (Purdue University), Revathy Venkataramanan (Univ. of Southern California, Hewlett-Packard Labs), and Rucha Bhalachandra Joshi (NISER Bhubaneswar). This report was written by Deepak Maurya, Tarun Kumar, and Balaraman Ravindran.

Health Intelligence (W3PHIAI-24) (W19)

The 8th International Workshop on Health Intelligence was held in Vancouver, Canada, on February 26th and 27th, 2024. This workshop brought together a wide range of computer scientists, clinical and health informaticians, researchers, students, industry professionals, national and international health and public health agencies, and NGOs interested in the theory and practice of computational models of population health intelligence and personalized healthcare to highlight the latest achievements in the field.

Population health intelligence includes a set of activities to extract, capture, and analyze multi-dimensional socio-economic, behavioral, environmental, and health data to support decision-making to improve the health of different populations. Advances in artificial intelligence tools and techniques and internet technologies are dramatically changing the ways that scientists collect data and how people interact with each other, as well as with their environment. The Internet is also increasingly used to collect, analyze, and monitor health-related reports and activities and to facilitate health promotion programs and preventive interventions. In addition, to tackle and overcome several issues in personalized healthcare, information technology will need to evolve to improve communication, collaboration, and teamwork between patients, their families, healthcare communities, and care teams involving practitioners from different fields and specialties.

This workshop follows the success of previous health-related AAAI workshops, including the ones focused on personalized (HIAI 2013-16) and population (W3PHI 2014-16) healthcare and the seven subsequent joint workshops held at AAAI-17 through AAAI-23 (W3PHIAI-17 – W3PHIAI-23). This year’s workshop brought together a wide range of participants from the multidisciplinary field of medical and health informatics. Participants were interested in the theory and practice of computational models of web-based public health intelligence as well as personalized healthcare delivery. The papers (full and short) and the posters presented at the workshop covered a broad range of disciplines within Artificial Intelligence, including Generative AI, Prediction, Deep Learning, Large Language Models, and Knowledge Discovery.

The workshop included three invited talks: (1) Elizabeth Borycki (University of Victoria) spoke on the evolution of the field of technology safety in healthcare; (2) Thomas Kannampallil (Washington University) described the design, development, and implementation of machine learning-based clinical decision support for perioperative settings; and (3) Dimitris Spathis (Nokia Bell Labs) presented work on self-supervised multimodal learning to provide labels for machine learning on multimodal data. Presentations during the workshop highlighted the promise of AI in healthcare while also focusing on issues concerning ethics, systems thinking for design and deployment, the financial and time costs of implementing AI methods in practice, and communication barriers between AI developers and practitioners. There was a robust discussion during various sessions on the use (and misuse) of generative AI, including large language models, in the provision of care.

Martin Michalowski, Arash Shaban-Nejad, and Simone Bianco served as co-chairs of this workshop, and all the workshop papers are published by Springer in its “Studies in Computational Intelligence” series. This report was written by Martin Michalowski, Arash Shaban-Nejad, and Simone Bianco.

Human-Centric Representation Learning (HCRL) (W20)

The Human-Centric Representation Learning workshop at AAAI 2024 brought together researchers broadly interested in modern representation learning for human-centric data. Representation learning has become a key research area in machine learning and artificial intelligence, with the goal of automatically learning useful representations of data for a wide range of tasks. Powerful models like GPT-4 or Llama 2 are trained in a self-supervised way in order to learn generalized representations. However, traditional representation learning approaches often fail to consider the human perspective and context, leading to representations that may not be interpretable or meaningful to humans. Here are some highlights from our workshop:

Accepted papers spanned a diverse range of topics in cutting-edge AI research and applications. This included computer vision, multimodal learning, fairness and ethics considerations, interpretability and explainability of models, learning effective representations, continual learning, generative modeling techniques, and novel applications in healthcare, among others. All accepted papers are available on our website and arXiv. Several papers tackled challenges in multimodal and sequential data representation learning through novel self-supervised and graph neural network approaches. One of the papers found that “self-supervised learning has the capacity to achieve performance on par with supervised methods while significantly enhancing fairness – exhibiting up to a 27% increase in fairness with a mere 1% loss in performance through self-supervision.” For wearable sensor data with missing values, another paper showed that “transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals.”

Other works advanced interpretable representation learning for high-stakes domains like healthcare and brain data analysis. One of the papers demonstrated the value of “explainable GNNs, providing personalized feature importance scores for enhanced interpretability and clinical relevance” in predicting fatty liver disease from health data. In brain decoding, another paper introduced a cross-subject alignment technique that enabled “high-quality brain decoding and a potential reduction in scan time by 90%” when reconstructing visual stimuli from fMRI data across subjects. For continual learning on wearable sensors, a paper combined contrastive learning with self-training “to leverage unlabelled and labeled data,” achieving “the best overall trade-off in continual learning” for activity recognition.

We awarded three papers that share the common goal of aligning AI models, especially large language models, with human values, preferences, and social intelligence. One proposes techniques for improved controllability of language model outputs through activation steering, allowing humans to guide model behavior. Another explores hybrid natural language and feedback signals to fine-tune models toward satisfying human feedback during training itself. The third takes a fundamental perspective, studying how AI can learn representations mirroring human conceptualization and reasoning, positing this as a path to encode human values and social skills within AI models. Collectively, these works recognize the critical need for advanced AI capabilities to be directed responsibly, cohering with human preferences, value systems, and societal norms.

We had five keynote talks from academic and industry experts who covered a wide range of AI research topics. Ishan Misra (Meta) discussed using powerful diffusion models as multimodal world models that can generate videos and integrate with large language models. Jimeng Sun (University of Illinois) presented work on leveraging generative AI to improve various aspects of clinical trial development. Marinka Zitnik (Harvard University) covered research towards developing universal foundation models for time series data across diverse domains. Neil Zeghidour (KyutAI) introduced audio language models that unify audio analysis and synthesis using neural codecs and autoregressive sequence modeling. Ahmad Beirami (Google) talked about alignment techniques to fine-tune large language models to satisfy human preferences while minimally perturbing the base model. Despite their distinct focus areas, the keynotes collectively highlighted the frontiers of generative AI capabilities, multimodal learning, domain-specific adaptations of foundation models, human-centered controls, and applications across areas like healthcare and audio processing. They exemplified the rapid progress towards more capable, general, and human-compatible AI systems that can synthesize rich data modalities while adhering to human-specified constraints and real-world requirements.

Dimitris Spathis (Nokia Bell Labs / University of Cambridge) and Aaqib Saeed (Eindhoven University of Technology) served as co-chairs of this workshop and co-authored this report. The rest of the organizers included Ali Etemad (Queen’s University), Sana Tonekaboni (MIT), Stefanos Laskaridis (Brave), Shohreh Deldari (University of New South Wales), Ian Tang (Nokia Bell Labs / University of Cambridge), Patrick Schwab (GSK), and Shyam Tailor (Google).

Imageomics: Discovering Biological Knowledge from Images using AI (W21)

Imageomics is an emerging scientific field that uses images, ranging from microscopic cell images to videos of charismatic megafauna, to automatically extract biological information, specifically traits, for understanding the evolution or function of living organisms. A central goal of Imageomics is to make traits computable from images by grounding AI models in existing scientific knowledge. The goal of this workshop is to nurture the community of researchers working at the intersection of AI and biology and shape the vision of the nascent yet rapidly growing field of Imageomics. No formal report was filed by the organizers for this workshop.

Large Language Models for Biological Discoveries (W22)

Rapid advances in large language models (LLMs) provide an unprecedented opportunity to further scientific inquiry across scientific disciplines. Despite remarkable feats in natural language tasks, the potential of LLMs beyond natural language has yet to be realized. The “Large Language Models for Biological Discoveries” workshop provided an international forum that brought together diverse researchers from computer science, information science, and molecular, cellular, and systems biology to focus on unique challenges to LLMs for advancing biological discoveries, such as standardized datasets, community-accepted benchmarks, experimental noise and uncertainty quantification, interpretation, and injection of prior biological knowledge.

Twelve presentations from authors of accepted papers, a keynote, and two invited talks provided a broad overview of the state-of-the-art in LLM-enabled research across benchmark problems in genomics, proteomics, bioinformatics, and cheminformatics. Scientific outcomes included formulating new problem spaces, standardizing datasets, establishing community-accepted benchmarks, accounting for experimental error and quantifying uncertainty, and injecting prior biological knowledge. The program met these outcomes through three thematic sessions that focused on LLMs for Genomics, LLMs for Proteomics, and LLMs for Bio & ChemInformatics.

The workshop met key additional outcomes by including more researchers from the identified intersectional communities and catalyzing further innovation on accessible and inclusive LLMs to power the next scientific breakthroughs. A GitHub project that accompanied the workshop provides a long-term platform for sharing workshop research articles, datasets, benchmarks, metrics, and other resulting knowledge, both for this debut offering at AAAI 2024 and for other planned annual offerings.

This workshop, organized by co-chairs Amarda Shehu, Yana Bromberg, and Liang Zhao, provided a forum for diverse researchers to develop a deep understanding of scientific problems; of community-accepted standards, datasets, metrics, and benchmark tasks that truly capture our ability to advance on a problem; and of the precious knowledge gathered over decades of hard-fought research. The workshop co-chairs put together this report.

Learnable Optimization (LEANOPT) (W23)

The AAAI Workshop on Learnable Optimization (LEANOPT) builds on the momentum that has gathered over the past six years in both the operations research (OR) and machine learning (ML) communities toward establishing modern ML methods as “first-class citizens” at all levels of the OR toolkit.

While much progress has been made, many challenges remain due in part to data uncertainty, the hard constraints inherent to OR problems, and the high stakes involved. LEANOPT will serve as an interdisciplinary forum for researchers in OR and ML to discuss technical issues at this interface and present new ML approaches and software tools that accelerate classical optimization algorithms (e.g., for continuous, combinatorial, mixed-integer, stochastic optimization) as well as novel applications. No formal report was filed by the organizers for this workshop.

Machine Learning for Cognitive and Mental Health (W24)

With a mental health crisis magnified by COVID-19 and an aging population (10.7% of the population aged over 65 is diagnosed with Alzheimer’s disease, and 18% with mild cognitive impairment (MCI)), there is an immediate need for developing systems that can better understand and characterize cognitive and mental health (CMH) by tracking various biomarkers from functional magnetic resonance imaging (fMRI), electroencephalogram (EEG), speech, electronic health records (EHR), movement, cognitive surveys, wearable devices, and structured, genomic, and epigenomic data. One of the core technical opportunities for accelerating the computational analysis of CMH lies in machine learning (ML), which can model the heterogeneity of and interconnections between diverse input signals. In addition, it is imperative to emphasize the necessity for increased data sharing and enhanced collaboration within the CMH research community.

Recently, major progress has been made in pre-trained deep and multimodal (MM) learning from text, speech, images, video, signals, and structured data, and there has also been initial success in using deep learning and MM streams to improve the prediction of patient status or response to treatment in CMH applications. However, computational and theoretical challenges remain in ML for CMH, spanning 1) collecting and sharing quality data for moderate and severe patients, 2) learning from many diverse and understudied signals, 3) theoretically understanding how modalities naturally connect and interact in MM learning, 4) real-world deployment concerns such as safety, robustness, interpretability, and collaboration with various stakeholders, and 5) extending models to low-resource and multilingual environments.

This workshop had three primary goals: 1) bring together experts from multiple disciplines working on ML and CMH to learn from each other, 2) encourage the development of shared goals and approaches across these communities, and 3) stimulate the creation of better MM technologies for real-world CMH impact. To achieve these goals, this workshop included a diverse lineup of invited speakers across fields associated with ML and CMH, hosting experts from computer vision (CV), natural language processing (NLP), MM learning, signal processing, human-computer interaction, neuroscience, psychiatry, and psychology. To encourage discussion and further collaboration toward the advancement of ML for CMH, the workshop combined invited talks, contributed papers and posters, and panel discussions. In addition, the organizers hosted a mentorship program, with the help of mentors from the program committee, to increase reach and to help researchers from around the world who are new to this field improve the quality of their papers before the submission deadline.

This workshop encouraged collaboration to solve critical CMH tasks and published papers that developed new datasets, models, and resources to foster CMH research. In addition, the workshop encouraged multilingual and multimodal research. The organizers made an effort to invite keynote speakers, panelists, and program committee members from diverse backgrounds, spanning both academia and industry. Moreover, the program committee comprised researchers from 12 countries across five continents.

The workshop featured six keynote speakers, oral sessions, poster sessions, a panel discussion, and a networking lunch. Of the 20 submitted papers, six were selected for oral and poster presentation, and an additional nine were selected for poster presentation only, for an acceptance rate of 75%. All accepted papers are published in the workshop’s open-access proceedings. Further details about the workshop, including the program, presentations, posters of accepted papers, and details about the organizers and program chairs, can be accessed on the workshop website. The event was sponsored by Cambridge Cognition and Winterlight Labs.

Keynote speakers were: 1) Dr. Peter Foltz, Professor at the University of Colorado, Boulder; 2) Dr. Irina Rish, Professor at the University of Montreal, MILA, CIFAR; 3) Dr. Guillermo Cecchi, Principal Researcher at IBM; 4) Dr. Paola Pedrelli, Assistant Professor at Harvard Medical School; 5) Dr. Robert JT Morris, Chief Technology Strategist at the Ministry of Health Transformation, Singapore; and 6) Dr. Sunny X. Tang, Assistant Professor at Northwell Health.

Panel Speakers were: 1) Dr. Peter Foltz, Professor at the University of Colorado, Boulder; 2) Dr. Paola Pedrelli, Assistant Professor at Harvard Medical School; 3) Dr. Frank Rudzicz, Associate Professor at Dalhousie University, Vector Institute, CIFAR; 4) Jekaterina Novikova, ML Director at Winterlight Labs; 5) Vikram Ramanarayanan, Chief Science Officer at Modality.AI; 6) Dr. Xiaoxiao Li, Assistant Professor at the University of British Columbia, Vector Institute, CIFAR.

Dr. Marija Stanojevic, Dr. Elizabeth Shriberg, Paul Pu Liang, Dr. Jelena Curcic, Dr. Zining Zhu, Malikeh Ehghaghi, and Ali Akram served as co-chairs of this workshop. This report was written by Marija Stanojevic.

Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (NuCLeaR) (W25)

The workshop on Neuro-Symbolic Learning and Reasoning in the Era of Large Language Models (NuCLeaR) was held at AAAI 2024 on February 26-27, 2024. The workshop aims to provide a dedicated platform for researchers to present and share their cutting-edge advancements in the next generation of neuro-symbolic AI (NSAI). By creating an environment conducive to knowledge exchange and the exploration of innovative ideas, it aims to foster collaboration and inspire new breakthroughs. No formal report was filed by the organizers for this workshop.

Privacy-Preserving Artificial Intelligence (W26)

The rise of machine learning, optimization, and Large Language Models (LLMs) has created new paradigms for computing, but it has also ushered in complex privacy challenges. The intersection of AI and privacy is not merely a technical dilemma but a societal concern that demands careful consideration.

The Privacy Preserving AI workshop, in its 5th edition, will provide a multi-disciplinary platform for researchers, AI practitioners, and policymakers to focus on the theoretical and practical aspects of designing privacy-preserving AI systems and algorithms. The emphasis will be placed on policy considerations, broader implications of privacy in LLMs, and the societal impact of privacy within AI. No formal report was filed by the organizers for this workshop.

Public Sector Large Language Models: Algorithmic and Sociotechnical Design (W27)

Just as pre-trained Large Language Models (LLMs) like GPT-4 have revolutionized the field of AI with their remarkable capabilities in natural language understanding and generation, LLM-powered systems also hold great potential for contributing to the well-being of our society through public sector applications, which often involve governmental, publicly funded, and non-profit organizations. The AAAI-24 Workshop on Public Sector Large Language Models: Algorithmic and Sociotechnical Design brought together researchers and practitioners from AI, HCI, and the social sciences to present their research on LLM-powered tools in the context of public sector applications. The audience consisted of (i) AI researchers looking to develop LLM-powered tools for public sector applications and address the real-world challenges raised by these technologies; (ii) HCI and social science researchers studying how LLM-based systems interact with humans and human organizations; and (iii) researchers and practitioners from public sector organizations looking to leverage LLM technologies for social good.

Invited speakers at the workshop included prominent researchers from various institutions. Malihe Alikhani, Assistant Professor at Northeastern University, discussed the importance of designing inclusive and equitable language technologies to serve diverse populations, particularly underserved communities. Kevin Leyton-Brown, Professor at the University of British Columbia, presented a methodology for assessing the economic rationality of LLM agents, focusing on decision-making processes. Niloufar Salehi, Assistant Professor at UC Berkeley, shared insights on human-computer interaction and the application of LLMs in areas such as machine translation in healthcare. Diyi Yang, Assistant Professor at Stanford University, discussed her research in computational social science and natural language processing, aiming to build socially aware NLP systems.

The 11 papers presented at the AAAI-24 Workshop on Public Sector Large Language Models: Algorithmic and Sociotechnical Design collectively addressed a diverse array of topics showcasing the potential applications and challenges of integrating LLMs into public sector domains. Researchers proposed innovative approaches for leveraging LLMs in specific contexts, such as the continued pre-training of models for specialized domains like government and climate change communication. These efforts aimed to enhance the effectiveness and relevance of LLM-powered tools for addressing pressing societal issues.

Moreover, several papers focused on the societal implications and ethical considerations associated with deploying LLMs in public sector settings. For instance, discussions centered around designing LLMs to be more inclusive and equitable, ensuring that these technologies effectively serve diverse populations while minimizing biases and disparities. Additionally, researchers explored the practical applications of LLMs in areas such as media monitoring for environmental conservation and providing support for frontline workers, such as homelessness caseworkers, through LLM-based tools.

The co-chairs of the workshop were Ryan Shi, Hong Shen, Sera Linardi, Lei Li, and Fei Fang. This report was written by Ryan Shi.

Recommendation Ecosystems: Modeling, Optimization and Incentive Design (W28)

The workshop centers on the multi-faceted landscape of Recommender Ecosystems (RESs), which couple the behaviors of users, content providers, vendors, and advertisers to determine the long-term behavior of the platform. While prevalent in numerous online products, the modeling, learning, and optimization technologies traditionally used in recommenders prioritize interactions with a single user. Recent research has delved into multi-agent dynamics and economic interactions within RESs, encompassing areas like fairness, popularity bias, market design, social dynamics, and more. Despite its significance, this research remains fragmented across various academic domains. This workshop aspires to bridge these communities, emphasizing the convergence of diverse topics like game-theoretic models, AI techniques, and social dynamics to holistically comprehend RESs. By fostering interdisciplinary dialogue, the workshop aims to spotlight the complexities of RESs, engendering fresh insights and solutions. No formal report was filed by the organizers for this workshop.

Responsible Language Models (W29)

This report highlights key insights from the Responsible Language Models (ReLM) workshop at AAAI-2024, the first workshop at AAAI on the responsible use of language models (LMs). The one-day workshop took place in person in Vancouver on February 26, 2024. The ReLM workshop focused on the development, implementation, and applications of LMs aligned with responsible AI principles. Both theoretical and practical challenges in the design and deployment of responsible LMs were discussed, including bias identification and quantification, bias mitigation, transparency, privacy and security issues, hallucination, uncertainty quantification, and various other risks associated with LMs. The workshop drew 68 registered attendees, showcasing widespread interest and engagement from various stakeholders.

Research Contributions: A total of 21 accepted articles, including six spotlight presentations and 15 posters, demonstrated the depth and breadth of research on the responsible use of LMs. One paper, ‘Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs,’ was awarded ‘best paper,’ while another, ‘Inverse Prompt Engineering for Safety in Large Language Models,’ was named ‘runner-up.’

Talks: Six invited speakers, five panelists, and six spotlight papers discussed current and best practices, gaps, and likely AI-based interventions with diverse speakers from the United States, Canada, and India. The workshop successfully promoted collaboration among NLP researchers from academia and industry.

Our invited keynote speaker, Filippo Menczer from Indiana University, USA, presented a talk titled “AI and Social Media Manipulation: The Good, the Bad, and the Ugly.” The talk focused on analyzing and modeling the spread of information and misinformation in social networks, as well as detecting and countering the manipulation of social media. In the first invited talk, Frank Rudzicz from Dalhousie University, Canada, discussed the dangers of language models in his talk titled “Quis custodiet ipsos custodes?” The second invited talk was by Kun Zhang from Carnegie Mellon University and Mohamed bin Zayed University of Artificial Intelligence, UAE, who spoke about “Causal Representation Learning: Discovery of the Hidden World.” Muhammad Abdul-Mageed from the University of British Columbia, Canada, gave the third invited talk on “Inclusive Language Models,” focusing on applications related to speech and language understanding and generation tasks. Balaraman Ravindran from IIT Madras, India, presented “InSaAF: Incorporating Safety through Accuracy and Fairness. Are LLMs ready for the Indian Legal Domain?” The talk examined LLMs’ performance in legal tasks within the Indian context, introducing the Legal Safety Score to measure fairness and accuracy and suggesting fine-tuning with legal datasets. Lastly, Sarath Chandar from École Polytechnique de Montréal, Canada, discussed “Rethinking Interpretability” in his talk.

Panel discussion: The topic of the panel discussion was Bridging the Gap: Responsible Language Model Deployment in Industry and Academia. The panel was moderated by Peter Lewis, Ontario Tech University, Canada, and featured Antoaneta Vladimirova from Roche, Donny Cheung from Google, Emre Kiciman from Microsoft Research, Eric Jiawei He from Borealis AI, and Jiliang Tang from Michigan State University. The discussion focused on the challenges and opportunities associated with deploying LMs responsibly in real-world scenarios. The panelists advocated for the implementation of policies to establish standardized protocols for LMs before deployment. This emphasis on proactive measures aligns with the goal of ensuring responsible and ethical use of LMs.

Sponsors & Partners: This workshop was sponsored by Connected Minds from the University of Guelph, and VISTA & CareAI from York University. Vector Institute for AI was the Institute partner to provide support for the organization of the workshop.

In conclusion, the AAAI ReLM workshop is the first in a series on the responsible use of LMs and served as a valuable platform for collaboration, knowledge exchange, and advocacy, making significant strides toward enabling responsible language model deployment across diverse domains.

This report was written by Faiza Khan Khattak, Lu Cheng, Sedef Akinli-Kocak, Laleh Seyed-Kalantari, Mengnan Du, Fengxiang He, Bo Li, Blessing Ogbuokiri, Shaina Raza, Yihang Wang, Xiaodan Zhu, and Graham W. Taylor.

Scientific Document Understanding (W30)

Due to the fast growth of scientific publications, keeping abreast of new findings and recognizing unsolved challenges are becoming more difficult for researchers in various fields. It is thus necessary to be equipped with state-of-the-art technologies to effectively combine precious findings from diverse scientific documents into one easily accessible resource. Due to its importance, there have been some efforts to achieve this goal for scientific document understanding (SDU). However, despite all of the recent progress, the fragmented research focusing on different aspects of this domain necessitates a forum for researchers from different perspectives to discuss achievements, new challenges, new resource requirements, and impacts of scientific document understanding on various fields. Furthermore, the recent introduction of advanced resources and tools designed for the processing of scientific documents, such as large language models (LLMs) and generative AI systems like Galactica (Taylor et al., 2022) and Med-PaLM (Singhal et al., 2023), opens up new opportunities to advance research and applications of scientific document understanding. The SDU workshop is thus designed to address these gaps specifically for the scientific community. In addition to the recent focus on scholarly text processing and document understanding in natural language processing, this workshop extends SDU to other scientific areas, including but not limited to scientific image processing, automatic programming, knowledge graph manipulation, and data management. We hope that this workshop will foster collaborations with researchers working on different scientific and AI areas for SDU. No formal report was filed by the organizers for this workshop.

Sustainable AI (W31)

While AI has made remarkable achievements across various domains, there remain legitimate concerns regarding its sustainability. The pursuit of enhanced accuracy in tackling large-scale problems has led to the adoption of increasingly deep neural networks, resulting in elevated energy consumption and the emission of carbon dioxide, which contributes to climate change. As an illustration, researchers estimated that the training of a state-of-the-art deep learning NLP model alone generated approximately 626,000 pounds of carbon dioxide emissions. The environmental sustainability of AI is not the only concern; its societal impact is equally significant. Ethical considerations surrounding AI, including fairness, privacy, explainability, and safety, have gained increasing attention. For instance, biases and privacy issues associated with AI can limit its widespread application in various domains. Furthermore, AI has the potential to make a profound societal impact by directly addressing contemporary sustainability challenges. Climate modeling, urban planning, and design (for mitigating urban heat islands or optimizing renewable energy deployment), as well as the development of green technologies (such as advanced battery materials or optimized wind/ocean turbine design), are areas where AI techniques can be extensively applied. Leveraging AI in these areas is crucial for ensuring a net benefit to sustainability. No formal report was filed by the organizers for this workshop.

Synergy of Reinforcement Learning and Large Language Models (W32)

The AAAI 2024 workshop on the Synergy of Reinforcement Learning and Large Language Models took place on February 26 at the Vancouver Convention Center.

Large Language Models (LLMs) took the world by storm with the launch of ChatGPT and GPT-4, following the development of BERT, T5, and GPT. ChatGPT and many other LLMs adopt reinforcement learning from human feedback (RLHF) for human alignment, and they excel at several NLP tasks out of the box, such as machine translation, summarization, and much more. At the same time, reinforcement learning (RL) has achieved remarkable successes, such as playing Go, StarCraft II, and Gran Turismo. Beyond games, RL has helped with magnetic control of tokamak plasmas, navigating stratospheric balloons, matrix multiplication, and sorting. We believe there is huge untapped potential in the marriage of these two fields. In particular, LLMs can further benefit from the RL framework for planning, exploration, and personalization. While the RL community has started to bring LLM advances to its side (e.g., Decision Transformer, Trajectory Transformer), LLMs provide unparalleled opportunities to enhance RL further: rich representations, explainability, and task decomposition.

The goal of our workshop was to bring the RL and LLM communities together to facilitate cross-pollination. We discussed possible opportunities, including but not limited to the above areas. Furthermore, we had practitioners share their success stories of working on RL & LLM problems and the insights gained from such applications.

We had five invited talks: 1) Asli Celikyilmaz from Meta about Charting New Pathways: Next-Gen LLM Reasoners, 2) Yun Nung (Vivian) Chen from National Taiwan University about From Bots to Buddies: Making Conversational Agents More Human-Like, 3) Aleksandra Faust from Google DeepMind about Autonomous Agents in the Age of Large Language Models, 4) Matthew Taylor from the University of Alberta about Glide-RL: A Student Teacher Framework for Instruction Learning and Generalization, and 5) Ahmad Beirami from Google about Language Model Alignment: Theory & Practice. After each talk, there was a Q&A session.

We had eight contributed talks for accepted papers, including one oral presentation and seven lightning talks, with the following titles: 1) Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4, 2) Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts, 3) CriticGPT: Multimodal LLM as a Critic for Robot Manipulation, 4) Decision Transformer With Tokenized Actions, 5) Reinforcement Learning for Optimizing RAG for Domain Chatbots, 6) Software Security Vulnerability Repair Using Reinforcement Learning with Large Language Models, 7) Exploring Reinforcement Learning with Large Language Models for Enhancing Badminton Players’ Strategies, and 8) DeLF: Designing Learning Environments with Foundation Models.

Each accepted paper came with a poster, and we scheduled four poster sessions.

We had a panel discussion with Michael Littman from Brown University and Lihong Li from Amazon, together with Ahmad Beirami, Yun Nung (Vivian) Chen, and Aleksandra Faust. Alborz Geramifard from Meta was the moderator, taking collected questions as well as questions from the audience. The panelists discussed, and even debated, topics including hallucination, LLMs’ potential for cold start in RL training, AI agents, world models, embodiment, generative AI as policy optimization, challenges and opportunities for academia with limited resources, and more.

We had insightful invited talks, excellent contributed talks, thought-provoking panel discussions, stimulating Q&As, engaging poster discussions, and casual chats. We managed to make the workshop as interactive as possible.

A heartfelt thank you to the wonderful speakers, panelists, reviewers, audience, and the AAAI 2024 conference and workshop organizing committees.

A workshop is a perfect occasion to reunite with old friends, meet new ones, and share, discuss, and learn. Let’s work together toward further synergy of reinforcement learning and large language models and the further development of AI.

More information is available on the workshop website.

Alborz Geramifard from Meta, Yuxi Li, Minmin Chen from Google DeepMind, and Dilek Hakkani-Tur from UIUC served as co-chairs of this workshop. This report was written by Yuxi Li and Alborz Geramifard.

Workshop on Ad Hoc Teamwork (W33)

Research on ad hoc teamwork (AHT) has been around for at least 18 years (Rovatsos, Weiß, and Wolf, 2002; Bowling & McCracken, 2005), but it was first introduced as a formal challenge by Stone et al. (2010). The challenge discussed in these papers is: “To create an autonomous agent that is able to efficiently and robustly collaborate with previously unknown teammates on tasks to which they are all individually capable of contributing as team members.”

This workshop aims to bridge the collaboration between research communities working on the various research topics related to AHT and their real-world applications. At the same time, we hope this event can play an important role in attracting new researchers to the field and connecting them with established researchers in areas related to AHT. No formal report was filed by the organizers for this workshop.

XAI4DRL: eXplainable Artificial Intelligence for Deep Reinforcement Learning (W34)

In reinforcement learning (RL), agents are trained to interact with environments through actions and rewards. In this workshop, we examined the need for explanations in the realm of deep reinforcement learning (DRL), a domain where it is essential for human users to anticipate the agent’s future actions. The goal of our workshop was two-fold: (1) to invite established researchers in the field to position previous work and motivate ongoing challenges and (2) to call for workshop submissions to highlight ongoing and preliminary work. We recorded more than 200 registrations for the first workshop on eXplainable AI (XAI) for deep reinforcement learning (XAI4DRL) at AAAI 2024.

We had the pleasure of inviting four highly experienced researchers as keynote speakers, who gave diverse perspectives on the importance of explainability for developing fair and safe systems that can be deployed in real-world settings. Dr. Jerone Andrews is a Research Scientist at Sony AI, and he gave our first keynote on the role of XAI in human decision making, titled “A View From Somewhere: Decomposing the Dimensions of Human Decision Making.” Professor Mark Riedl is a Professor in the School of Interactive Computing, College of Computing, Georgia Institute of Technology, and Associate Director of the GT Machine Learning Center. He gave a talk on the role of XAI in RL and human decision making, titled “Toward Human-Centered Explainable Artificial Intelligence.” Dr. Melissa Gervasio is a Technical Director at SRI International, and she spoke about industry insights in a talk titled “Toward Actionable Explanations for Autonomy.” Finally, Dr. Anurag Koul is a postdoctoral researcher at Microsoft Research who was unable to attend the workshop in person, but he contributed a pre-recorded talk titled “Explaining RL Agents from the Lens of Perception, Memory, and Uncertainties.” All four keynote speakers gave insightful talks. Dr. Gervasio, Dr. Koul, and Dr. Andrews offered compelling insights from industry, and Professor Riedl provided insights from various academic projects, from story-enabled intelligence to game playing. After each keynote talk, we gave ample time for questions and discussion.

Eight submissions were received, and all were peer-reviewed. Four of the eight papers were invited for oral presentation (contributed talks), and four were invited for a poster session, which was well attended by the community. The contributed talks covered a wide range of topics, from explainable RL for Alzheimer’s disease progression to cooperative multiagent RL. They examined core questions in deploying DRL agents in scenarios with real-time collaboration between humans and agents, including how to integrate counterfactual explanations and how to ensure that small changes remain understandable to humans. They also highlighted ongoing challenges, such as how to leverage explanations to enhance agent behavior, how best to integrate XAI in multi-agent RL (MARL), and how to represent explanation uncertainty across applications and scenarios. The diversity of topics highlights the need for nuanced research applying XAI to RL.

Co-chairs: Prof. Roberto Capobianco is a senior research scientist at Sony AI; Oliver Chang is a PhD student at UC Santa Cruz; Biagio La Rosa is a PhD student at Sapienza University of Rome; Michela Proietti is a PhD student at Sapienza University of Rome, and Alessio Ragno is a PhD student at Sapienza University of Rome. This report was written by Leilani H. Gilpin.

XAI4Sci: Explainable machine learning for sciences (W35)

As the deployment of machine learning technology becomes increasingly common in applications of consequence, such as medicine or science, the need for explanations of the system output has become a focus of great concern. Unfortunately, many state-of-the-art models are opaque, making their use challenging from an explanation standpoint, and current approaches to explaining these opaque models have stark limitations and have been the subject of serious criticism.

The XAI4Sci: Explainable Machine Learning for Sciences workshop brought together a diverse community of researchers and practitioners working at the interface of science and machine learning to discuss the unique and pressing need for explainable machine learning models to support science and scientific discovery. These needs include the ability to (1) leverage machine learning as a tool to make measurements and perform other activities in a manner comprehensible to and verifiable by working scientists and (2) enable scientists to use the explanations of machine learning models to generate new hypotheses and to further knowledge of the underlying science.

The XAI4Sci workshop invited researchers to contribute short papers demonstrating progress in the development and application of explainable machine learning techniques to real-world problems in the sciences (including but not limited to physics, materials science, earth science, cosmology, biology, chemistry, and forensic science). The target audience comprised members of the scientific community interested in explainable machine learning and researchers in the machine learning community interested in scientific applications of explainable machine learning. The workshop provided a platform for dialogue between these communities on exciting open problems at the interface of explainable machine learning and science. Leading researchers from both communities covered state-of-the-art techniques and set the stage for the workshop.


Sedef Akinli-Kocak, Vector Institute for AI.

Simone Bianco, Principal Investigator and Director, Bay Area Institute, Altos Labs.

Jill Burstein is the Principal Assessment Scientist at Duolingo.

Lu Cheng, University of Illinois Chicago.

Mengnan Du, New Jersey Institute of Technology.

Jonas Ehrhardt is a PhD candidate at the Professorship Computer Science in Mechanical Engineering at the Helmut-Schmidt-University.

Alborz Geramifard is a research director at Meta.

Bissan Ghaddar is an Associate Professor at Ivey Business School, Western University.

Leilani H. Gilpin is an Assistant Professor at UC Santa Cruz.

Fengxiang He, University of Edinburgh.

René Heesch is a PhD candidate at the Professorship Computer Science in Mechanical Engineering at the Helmut-Schmidt-University.

Rachneet Kaur is a Research Scientist on the AI Research team at J.P. Morgan Chase.

Faiza Khan Khattak, Vector Institute for AI.

Tarun Kumar is a Research Engineer at Hewlett Packard Labs.

Mingxiao Li, Computer Science Department, KU Leuven.

Bo Li, University of Edinburgh.

Yuxi Li is the founder of

Lydia T. Liu is an Assistant Professor of Computer Science at Princeton University.

Debshila Basu Mallick is the Director of Research at OpenStax, Rice University.

Deepak Maurya is a Research Scholar at Purdue University.

Martin Michalowski, Associate Professor, School of Nursing, University of Minnesota.

Vahid Partovi Nia is an Adjunct Professor at Polytechnique Montreal and a Principal Research Scientist at Huawei Noah’s Ark AI Research Lab.

Blessing Ogbuokiri, York University.

Kassiani Papasotiriou is a Research Scientist on the AI Research team at J.P. Morgan Chase.

Edward Raff is Director of Emerging AI at Booz Allen Hamilton and a Visiting Associate Professor in the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County.

Balaraman Ravindran heads the Wadhwani School of Data Science and Artificial Intelligence (WSAI), the Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), and the Centre for Responsible AI (CeRAI) at IIT Madras.

Shaina Raza, Vector Institute for AI.

Aaqib Saeed is an assistant professor at the Eindhoven University of Technology.

Laleh Seyed-Kalantari, York University.

Arash Shaban-Nejad, Associate Professor, University of Tennessee Health Science Center-Oak Ridge National Laboratory (UTHSC-ORNL) Center for Biomedical Informatics.

Ankit Shah is an Assistant Professor in the Department of Industrial and Management Systems Engineering, University of South Florida.

Amarda Shehu is a Professor of Computer Science at George Mason University.

Ryan Shi is an assistant professor in the Department of Computer Science at the University of Pittsburgh.

Suchetha Siddagangappa is a Research Scientist on the AI Research team at J.P. Morgan Chase.

Arunesh Sinha is an Assistant Professor in the Department of Management Science and Information Systems, Rutgers Business School, Rutgers University.

Dimitris Spathis is a senior research scientist at Nokia Bell Labs and a visiting researcher at the University of Cambridge.

Biplav Srivastava is a Professor in the AI Institute at the University of South Carolina.

Marija Stanojevic, Ph.D., is an Applied Machine Learning Scientist at Cambridge Cognition.

Graham W. Taylor, Vector Institute for AI, University of Guelph.

Yihang Wang, University of Chicago.

Di Xu is a senior research scientist from Huawei Cloud.

Yichao Yan is an assistant professor at the AI Institute, Shanghai Jiao Tong University.

Qin Yang is an assistant professor in the Computer Science and Information Systems Department at Bradley University and the director of the Intelligent Social Systems and Swarm Robotics Lab (IS3R).

Xiaodan Zhu, Queen’s University.