Karl Aberer, Ebrahim Bagheri, Marya Bazzi, Rumi Chunara, Ziv Epstein, Fabian Flöck, Adriana Iamnitchi, Diana Inkpen, Maurice Jakesch, Kyriaki Kalimeri, Elena Kochkina, Ugur Kursuncu, Maria Liakata, Yelena Mejova, George Mohler, Daniela Paolotti, Jérémie Rappaz, Manon Revel, Horacio Saggion, Indira Sen, Panayiotis Smeros, Katrin Weller, Sanjaya Wijeratne, Christopher C. Yang, Fattane Zarrinkalam

The Association for the Advancement of Artificial Intelligence’s 2021 International Conference on Web and Social Media was held virtually from June 8 to 10, 2021. There were eight workshops in the program: Data for the Wellbeing of Most Vulnerable; Emoji 2021: International Workshop on Emoji Understanding and Applications in Social Media; Information Credibility and Alternative Realities in Troubled Democracies; International Workshop on Cyber Social Threats (CySoc 2021); International Workshop on Social Sensing (SocialSens 2021): Special Edition on Information Operations on Social Media; Participatory Development of Quality Guidelines for Social Media Research: A Structured, Hands-on Design Workshop; Mediate 2021: News Media and Computational Journalism; and Mining Actionable Insights from Social Networks: Special Edition on Healthcare Social Analytics.

Data for the Wellbeing of Most Vulnerable

This workshop focused on applying new data analytics to address the needs of the most vulnerable populations, introduce resilience in vulnerable situations, and help battle new sources of vulnerability. The aim was to highlight the latest developments in the use of new sources of data, including web and social media, in efforts to address the health and other needs of the most vulnerable, including children, families, marginalized groups, and those under threat of poverty, conflict, natural disaster, or epidemic risk. The workshop brought together practitioners from the humanitarian sector and researchers from around the world. The main themes of the keynotes, paper and abstract presentations, and an interactive panel concerned the development of data sharing agreements and monitoring infrastructure before potential disasters strike. At the same time, data collection should be performed such that the populations are not harmed in the process, and the resulting insights should contribute to a holistic understanding of the situation and to subsequent decision-making. Finally, researchers should always keep in mind that a lack of data on a population does not mean there is no problem: there are still vulnerable populations (such as those not using social media or having restricted access to technology) who may need assistance and study, and who are currently bypassed by “big data” efforts.

The scale, reach, and real-time nature of the Internet are opening new frontiers for understanding the vulnerabilities in our societies, including inequalities and fragility in the face of a changing world. Thus, the aim of this workshop was to encourage the community to use new sources of data to study the wellbeing of vulnerable populations, including children, the elderly, racial or ethnic minorities, the socioeconomically disadvantaged, the underinsured, and those with certain medical conditions. The selection of appropriate data sources, the identification of vulnerable groups, and ethical considerations in the subsequent analysis are of great importance in extending the benefits of the big data revolution to these populations.

As such, the topic is highly multidisciplinary, and the workshop attracted a diverse audience, including academics from computer science, medicine, and the social sciences, as well as representatives of humanitarian organizations including UNICEF and the UN World Food Program.

The use of new data sources for the benefit of underserved and vulnerable populations presents several exciting research opportunities, as well as serious challenges in terms of reachability and privacy. Many new sources have been employed to study such populations, including social media, internet searches, and app usage. Most of these are owned by companies, which provide limited access via their APIs. The APIs often restrict the scope of data collection, and some of these restrictions are reasonable in terms of privacy preservation. Recently, companies have begun providing special access (or producing special-purpose datasets) under the theme of “data for good”, such as Facebook’s Data for Good and Twitter for Good initiatives. Other data, such as phone traces owned by telephone companies or search queries usually owned by a few large service providers, may require a connection to a particular research group inside the company. In this workshop, we saw that collaboration between companies and the humanitarian sector on data sharing is increasing, and that many resources are becoming well organized and documented, such as those hosted by the Humanitarian Data Exchange (HDX).

On the other hand, the desire for more data must not overtake the need for privacy and an even more careful consideration of possible harms to the subjects the research is ostensibly trying to benefit. For instance, as one of the keynote speakers, Claudia Cappa, pointed out during open discussion, it is possible to directly harm the subjects when attempting to learn more, such as in cases of domestic violence or other abuse. Further privacy considerations concern minors, as well as those who may be entangled in legal or bureaucratic systems, such as migrants and wage workers. Several researchers in the workshop mentioned requests from governments and other agencies to provide surveillance services using the computational tools discussed, which had to be handled carefully and with the utmost consideration for the welfare of the target population.

Below, we summarize the presentations and discussions that took place at the workshop. Attendees broadly agreed that this is an exciting area of research that is becoming ever more important. The need for establishing methodologies and data sharing agreements is greatest before disasters strike and must be addressed by all stakeholders as soon as possible.

The first keynote speaker, Claudia Cappa, is a Senior Adviser for Statistics and Monitoring in the Data and Analytics Section at the UNICEF headquarters. Coming from a qualitative research background, she discussed her experience with leveraging social media data to assess issues traditionally addressed via surveys and interviews. She presented a study in which Twitter and Reddit data were reviewed for cases of abusive content and children’s exposure to violence during the COVID-19 pandemic. Her findings show the potential of assessing important issues of vulnerable populations online. She also pointed out the need to establish collaborations with data owners prior to a disaster or a pandemic, so that solid baselines are in place when an emergency occurs.

Elisa Omodei, the second keynote speaker of the workshop, is the Lead Data Scientist of the Hunger Monitoring Unit at the UN World Food Program’s Research, Assessment and Monitoring division. She presented the HungerMap, a near real-time tool to map and predict food insecurity with non-traditional data. She underlined that tools like this initially require a significant financial and organizational commitment. However, their potential is enormous, especially during emergencies caused by pandemics or political conflict, or simply in places where survey data cannot be obtained.

Bridging the two worlds, Vedran Sekara, the final keynote speaker of our workshop, Assistant Professor at the IT University of Copenhagen and a Principal Researcher & Machine Learning Lead at UNICEF, underlined the potential but also the limitations of AI in addressing the issues of the most vulnerable. He pointed out that operationalizing AI models can be very challenging, and that the difficulty of transferring models should not be overlooked. Importantly, he stressed that not everything that matters is or can be measured (marian woods paradox), and that real breakthroughs in the humanitarian sector can only be accomplished through close collaboration with local stakeholders.

A core part of our workshop was the panel discussion, which was organized in an interactive way, giving all participants the opportunity to ask questions and take part in the discussion. Supported by an online Q&A management system, a variety of topics were considered, especially those concerning the integration of the insights achieved via data science methods into the decision-making process of those who may provide direct benefit to the studied populations. To give academic research a better chance of producing real-life benefits, the panelists suggested that it should be presented not only at academic venues, but also at those organized by humanitarian organizations, such as Humanitarian Networks and Partnerships Weeks (HNPW). Ideally, this would then result in studies co-designed by several partners.

A question about understudied vulnerable populations brought the discussion to the inherent biases of some datasets, such as social media, which reach only certain geographic and demographic groups. Dr. Sekara recommended considering data sources that are more “egalitarian”, such as satellite imagery and phone records, which are more likely to capture all residents of a geography.

Another question concerned the ways in which big data can be harnessed to improve the design of surveys. One way to do this is to merge big data insights with survey data, as in the work by Blumenstock et al. (Blumenstock, Cadamuro, and On, 2015), which models socioeconomic status using a combination of mobile usage and phone surveys. The survey questions themselves could be designed in light of the knowledge extracted from data, such as satellite imagery, in order to hone the questions to the lived environment of the respondents.

A large part of the debate concerned privacy considerations when such research is performed. The panelists agreed that it is best to avoid individual-level data and to work with aggregated statistics as much as possible. All agreed that the project aims must also be aligned among everybody involved, such that there is no doubt that no harm will come to the subjects. Here, Dr. Cappa brought up an example of data collection that may actually trigger harm if it is not performed extremely carefully. Finally, a participant pointed out that privacy can be not only an individual concern but also a communal one, and that vulnerable communities must be considered as well.

Finally, the panelists were asked about summer school or internship programs that young researchers interested in the area could take advantage of. UNICEF has an internship program, and its Data and Analytics Section may look for interns in the future; the section may also be open to informal collaborations on joint papers. This summer there will be a summer school on Behavioral Digital Trace Data in Response to the COVID-19 Pandemic, which is also open to visiting PhD students. The Center for Humanitarian Data also has a summer program that welcomes young researchers. Finally, the panelists encouraged attendees to reach out directly to people who work in the field.

In summary, the following insights emerged from the discussion:

1. Decision-making for vulnerable populations should not be automatic.

2. It is fundamental to establish a collaboration with governments, statistics departments, and academics from the relevant geographic locales, a priori and not during an emergency.

3. Data holders should listen to the needs of the community and provide actionable data. Ideally, the data should be fine-grained, recent, and community-specific.

4. At the same time, attention should be paid not to harm the vulnerable community by collecting data.

5. Importantly, especially in the context of vulnerable populations, the absence of data does not mean the absence of a problem.

All submissions were reviewed by at least three multidisciplinary program committee (PC) members in the fields of computer science, digital epidemiology, and computational social sciences. Two full papers and three abstracts were accepted based on the rigor of the analysis and the quality of the results and presentation. We provide a brief description of each contribution below.

The role of vulnerability in mediating the relationship between threat perception and the use of face masks in the context of COVID-19, presented by Emanuele Del Fava. In this study, the authors conducted a multi-country survey employing Facebook as a recruitment tool. They ran advertisements stratified by demographic group: sex (2), age (4), and region. The survey assessed several aspects of the pandemic, including threat perception. Their findings showed increased threat perception and mask wearing among people with vulnerabilities: those who felt highly threatened were 2.25 times more likely to wear a face mask than those who felt little to moderately threatened. The association between high perceived threat and wearing face masks was weaker among women and, to a lesser extent, older adults, although both threat perception and mask wearing were higher among women and other vulnerable groups. The authors find Facebook-based recruitment to be a timely and cost-effective way to collect data, one that even allows several different studies to be run at the same time to compare results, which would then complement findings from surveys.

Tactical Reframing of Online Disinformation Campaigns Against The Istanbul Convention, presented by Tugrulcan Elmas. The study deals with the Istanbul Convention, signed by 34 countries in 2011 to protect women from domestic violence. Turkey announced its withdrawal from the convention in 2021, which triggered a campaign. The authors perform a case study of tactical reframing using Facebook data from CrowdTangle, capturing most public pages and groups discussing the convention. They show a bottom-up campaign against the convention that develops over time, with increased mentions of the rights of divorced men, references to homosexuality, and the joining of religious and political groups. In future work, the authors would like to use Twitter to understand whether users change their narrative around the issue.

The Role of Data-Driven Discovery in Detecting Vulnerable Sub-populations, presented by Girmaw Abebe Tadesse and Skyler Speakman. The authors began by stressing the need to be careful about how we define vulnerability, arguing that it is a discovery question, not a modeling one. Thus, they propose using tools from anomalous pattern detection to find vulnerable sub-populations in data, treating the problem as a pattern detection task. Their aim was to overcome the limitations of human-driven confirmatory analysis by using data-driven techniques. In several case studies, the authors illustrate how well-defined sub-populations with an abnormal target statistic can be found in data. For instance, in sub-Saharan Africa they find abnormal under-5 child mortality for a particular set of women. In another study, examining neonatal mortality in Ghana, they find a particular scenario in which the mortality rate is abnormally high. The work is a collaboration with the Bill & Melinda Gates Foundation, and the code will be made publicly available.

Getting “Clean” from Nonsuicidal Self-injury: Addiction Language and Experiences on the Subreddit r/selfharm, presented by Himelein-Wachowiak. The study deals with two vulnerable populations: people with addiction and people who self-injure. The authors looked for words concerning addiction in the dedicated Reddit community r/selfharm. They used the diagnostic criteria for substance use and addiction: 11 criteria, each of which can have levels. They coded posts for symptoms of addiction, finding the top ones to be urges/cravings, escalating severity/tolerance, physically hazardous non-suicidal self-injury (NSSI), consistent efforts to quit or cut back, and causing interpersonal problems. The authors conclude that clinicians who treat NSSI may want to adapt techniques from addiction treatment. The authors also point out that, in their experience, Reddit may contain more “honest” self-expression than Twitter and Facebook, but unfortunately there is a lack of informed consent for such research.

For a complete listing of papers and recordings of the presentations, see the workshop website at https://sites.google.com/view/dataforvulnerable21.

This workshop was organized by Yelena Mejova, a Senior Researcher at ISI Foundation in Turin, Italy; Kyriaki Kalimeri, a Researcher at ISI Foundation in Turin, Italy; Daniela Paolotti, a Senior Researcher at ISI Foundation in Turin, Italy; and Rumi Chunara, an Assistant Professor in the departments of Computer Science and Engineering and Epidemiology/Biostats at New York University, USA. We thank our workshop program committee members for their helpful reviews and support. This report was written by Yelena Mejova, Kyriaki Kalimeri, Daniela Paolotti, and Rumi Chunara.

Emoji 2021: International Workshop on Emoji Understanding and Applications in Social Media

The 4th International Workshop on Emoji Understanding and Applications in Social Media was held on the 7th of June, 2021 as a full-day online meeting. This workshop brought together computer and social science researchers as well as leading industry practitioners to discuss and exchange ideas on understanding social, cultural, communicative, and linguistic roles of emoji while leading the discussions on building novel computational methods to understand them.

With the rise of social media, emoji have become an extremely popular way to enhance electronic communication. Social media data has been used to study how emoji are used across different languages, cultures, and user communities, and emoji have served as features for training machine learning models that solve problems spanning many applications. The ability to automatically process, derive meaning from, and interpret text fused with emoji is essential as society embraces emoji as a standard form of online communication. However, several properties of emoji make it difficult to apply traditional Natural Language Processing techniques to analyze and understand them: their pictorial nature, the fact that the same emoji may be used in different contexts to express different meanings, the fact that the same emoji may be rendered differently across devices and platforms, and their use in cultures and communities around the world that interpret them differently. The goal of this workshop was to stimulate research and discussion on developing novel approaches to address the above challenges.

Spencer Cappallo (Machine Learning Researcher, Replicant.ai) gave the opening keynote of the workshop on “Emoji are Emoji, and We can use them as such”, where he discussed the importance of treating emoji as a new modality that straddles language and graphics. He showed how emoji can be interpreted with respect to their relationships to both image and text in online communications. Susan C. Herring (Professor, Indiana University, Bloomington) gave the second keynote on “The Emoji as Language Question Revisited”, where she discussed how emoji present varying degrees of languageness in different (sub)cultures, on different platforms, and at different points in time. In an invited talk, Neil Cohn (Professor, Tilburg University, Netherlands) tackled the emoji language debate from another point of view by comparing emoji to the elements of natural written languages and of the visual languages found in comics. Cesco Reale (Founder, KomunIKON) gave an invited talk on the KomunIKON project, whose goal is to develop an iconic language that can translate any sentence into a sequence of icons. Jennifer Daniel (Creative Director, Google and Chair, Unicode Emoji Subcommittee) gave an invited talk on a strategy for improving the coverage of heart emoji, highlighting the gaps in the existing heart emoji category.

The eight research papers presented at the workshop covered a wide variety of topics, including how emoji meanings change over time, how emoji are used by specific user groups (e.g., hackers, people of color), how certain emoji are used in online communication, and the linguistic properties of emoji, among others. A panel on “The Pictographic Languages” provided an animated and engaging forum for attendees to discuss research problems related to pictographic languages with leading researchers and practitioners. The panel consisted of Susan C. Herring (Professor, Indiana University, Bloomington), Cesco Reale (Founder, KomunIKON), and Neil Cohn (Professor, Tilburg University, Netherlands), who discussed topics such as the grammar, semantics, discourse, and universality of pictographic languages. Our interdisciplinary workshop program sparked highly engaging and thought-provoking discussions among the nearly one hundred researchers who attended the workshop.

Sanjaya Wijeratne, Horacio Saggion, Jennifer 8. Lee, and Amit Sheth served as co-chairs of the workshop. The accepted papers are published under ICWSM Workshop Proceedings and can be accessed at http://workshop-proceedings.icwsm.org/index.php?year=2021. The co-chairs would like to thank Emojination for partnering with Adobe Inc. to sponsor the workshop registration fees of close to fifty attendees. This report was written by Sanjaya Wijeratne and Horacio Saggion.

Information Credibility and Alternative Realities in Troubled Democracies

The workshop on Information Credibility & Alternative Realities in Troubled Democracies was held virtually on June 7 at AAAI ICWSM 2021. The workshop’s goal was to bring together researchers and practitioners to investigate how challenges in information credibility lead to alternative realities and how to mitigate their distorting effects on the public debate.

Fake news, conspiracies, manipulated media, selective reporting, facts derided as lies: citizens of democracies encounter wildly conflicting information about the world. For example, after the 2020 United States presidential election, many Republican voters believed that Trump had won and that the official results were fraudulent. COVID-19 has produced its own myriad of false beliefs, and even outright denials of the pandemic or of other complex phenomena such as climate change are not uncommon. With blurring borders between fact and opinion, the prospects of deliberative democracy are increasingly bleak. Certainly, truth has always been precarious; neither misinformation nor widespread false beliefs are unique to our time. But the web and social media technologies have distinctly rearranged the struggles over truth and falsehood. Web technologies have created novel vectors for misinformation and disinformation through their unfiltered publication process, breakneck pace, and obsession with engagement (rather than accuracy).

The workshop on Information Credibility & Alternative Realities in Troubled Democracies brought together academics and practitioners to explore how to measure and mitigate the challenges of information credibility online. Complex problems require complex solutions, so the workshop convened different methods and ways of seeing, from qualitative/ethnographic work and educational curricula to large-data observational analyses and controlled experiments. Throughout the day, we explored information disorders and ways to cope with them.

A central theme was the importance of building a more trustworthy information ecosystem. Renee DiResta from the Stanford Internet Observatory started the discussion with a keynote reporting on the Election Integrity Partnership, exploring approaches to defend against voting-related mis- and disinformation. Keeley Erhardt from the MIT Media Lab followed with a talk arguing that misinformation research has focused too narrowly on factual veracity. While a focus on factual falsehood simplifies the identification and operationalization of information issues, it misses more subtle and systemic information operations that seek to undermine public faith. Andrew Beers and Sarah Nguyen from the University of Washington built on this in their description of a single Twitter user who achieved an outsized impact on the COVID-19 discourse not by large-scale production of falsehood but by systematically and strategically starting bad faith arguments. Marco Di Giovanni et al. further explored vaccine narratives in the Italian Tweetosphere through temporal and geographical analyses.

A second central theme was building individuals’ core competencies for discerning truth from falsehood. In his keynote, David Rand from MIT’s Sloan School discussed laboratory and field experiments that suggest shifting attention to accuracy could reduce the sharing of misinformation. His talk was a call for interventions that improve the cues users have to make informed decisions and mitigate the distracting scrolling patterns social media can induce. Building on this call, James Stomber from the Credibility Coalition discussed a new framework for designing and evaluating credibility tools. Similarly, Emilia Hughes from the University of Washington discussed credibility tools for YouTube based on various designs that would quote the references of the videos. Sarah Cen and Devavrat Shah from the Massachusetts Institute of Technology further expanded on theoretical guarantees showing that algorithmic filtering could be adapted to reduce the spread of fake news. On another note, Nils Kobis et al. investigated people’s capacity to detect deep fakes, hinting at the potentially problematic societal consequences.

In the open discussion concluding the workshop, we discussed cross-cutting tensions in this work, which can be summarized as three questions: How do we enable individual choice and deliberation in online information while acknowledging that many users do not inform themselves according to enlightenment ideals? How do we, as academics, reflect on our own biases that may impact research with inherent political connotations, and how do we conduct information research that relies on unbiased research concepts and agendas? When there’s a gap between what people do and what researchers think they “should” do, how do we close the gap while respecting the rights and autonomy of participants?

A scheme emerged for thinking through these problems, which divides interventions into two classes. The first class is concerned with more widespread basic information competency and the corresponding education needs. In line with this, Thomas Nygren from Uppsala University provided an insightful presentation on his ongoing work on training adolescents in civic online reasoning using a literacy tool. The second class focuses on design interventions that bring out the competency developed by the first class, such as Rand’s accuracy prompts or Stomber and Hughes’ credibility indicators. Tarunima Prabhakar and Anushree Gupta from Tattle Civic Technologies also studied the impact of such policies. The discussions suggested that the way of thinking through the core tensions in misinformation may differ for these two classes of interventions. Such conversations across disciplines and perspectives are crucial for understanding and reducing the spread of misinformation. Participants said they found the workshop enlightening and that they would attend more workshops on a similar topic.

Maurice Jakesch, Manon Revel, and Ziv Epstein served as co-chairs of this workshop. The symposium papers are published under the Proceedings of ICWSM-21 or on the workshop website (https://zivepstein.github.io/info-credibility-workshop/). This report was written by Maurice Jakesch, Ziv Epstein, and Manon Revel.

International Workshop on Cyber Social Threats

The Second Cyber Social Threats Workshop (http://cysoc.aiisc.ai/) was held on June 7, 2021. The goal of this workshop was to facilitate a rich forum for researchers and practitioners from both academia and industry in the areas of computing and social science, to discuss novel avenues for research on interdisciplinary aspects of harmful communications on social media, to exchange ideas and experiences, and to identify new opportunities for collaboration. The workshop garnered significant attention from the community with 119 registrants and 84 unique active participants, for two keynotes, one panel discussion, seven paper presentations, two demo presentations and one synthesis/brainstorming session.

Online platforms have become a prominent communication medium used on a daily basis, which has also introduced novel challenges due to their misuse by malicious actors and organizations. These cyber social threats often significantly impact the well-being of individuals as well as communities and our society at large, and they include online extremism, cyberbullying, harassment, fake news, human trafficking, gender-based violence, and many others. The misuse of technology has been particularly rampant during the COVID-19 pandemic, as efforts to spread misinformation on COVID-19 dramatically increased. To stimulate novel research directions, exchange ideas and experiences, and identify new opportunities for collaboration, the Cyber Social Threats workshop provided a forum that brought together researchers and practitioners from both academia and industry in the areas of computational social science, social network analysis and mining, natural language processing, computational linguistics, human-computer interaction, and cognitive science to present their related fundamental research and emerging applications.

The two invited keynotes provided diverse insights from both social science and computing perspectives on the analysis, detection, and countering of potentially harmful content and behaviors. Brooke Foucault Welles from Northeastern University discussed the online framing of real-world protests with a case study around the protests in Baltimore. She explained how communications in this contested online network evolved as users constructed meaning and debated questions of protest and race. She shared the study’s findings that justice-oriented messages spread even in such contested networks, and that these networks incubate social justice messages along with extreme and hateful ideas. Huan Liu from Arizona State University shared recent advances in research that goes beyond detection toward AI-enabled strategies to combat cyber social threats. He shared findings from his group’s research showing that explainability and causality are essential for the fast detection of fake news. He then described the next frontiers of countering strategies, namely the mitigation of disinformation/fake news and bias on online platforms, and how to develop novel algorithms and models accordingly.

A panel discussion on “The Role of AI in Countering Cyber Social Threats” featured Joan Donovan from Harvard University, Fil Menczer from Indiana University, and Alexandra Olteanu from Microsoft Research as panelists, and Aleszu Bajak from USA Today as the moderator. The discussion provided insights on challenges and opportunities from computing, social impact/implications, and ethics perspectives.

The workshop accepted seven papers out of 13 submissions, all of which were reviewed by at least three multi-disciplinary program committee (PC) members (25 in total). The papers covered the COVID-19 vaccine, toxicity, misinformation dissemination, online user migration, human trafficking, organized inauthentic online activities, and an antisemitism dataset.

Two demonstrations were presented to the participants on very timely issues: online extremism and COVID-19 vaccine adoption. Welton Chang and Eric Curwin from Human Rights First presented a platform, “Extremism Explorer”, that collects and analyzes content from multiple social networks, allowing researchers to study online violent hate speech in real time. Matthew DeVerna from Indiana University presented a tool, “CoVaxxy”, for visualizing the relationship between COVID-19 vaccine adoption and online (mis)information.

Lastly, a synthesis exercise session took place in which the participants brainstormed the ideas they found most important, urgent, and high-impact for potential future work and collaborations. The ideas included understanding self-hate and the characteristics of self-deprecating language, and investigating the content and network of actors that disseminate hate speech towards Jewish and Muslim communities. For combatting human trafficking, participants also identified the need to develop tools that facilitate real-time data collection, detection, and response capabilities.

At the end of the workshop, the participants expressed their interest in collaborating on the identified problems and areas, as well as participating in future workshops to be organized in the coming years. Ugur Kursuncu, Jeremy Blackburn, Yelena Mejova, Megan Squire, and Amit Sheth served as co-chairs of this workshop. This report was written by Ugur Kursuncu. The papers of the workshop were published as proceedings of the ICWSM workshops. A more extensive report can be found here: http://workshop-proceedings.icwsm.org/abstract?id=2021_81

International Workshop on Social Sensing: Special Edition on Information Operations on Social Media

The 6th International Workshop on Social Sensing (SocialSens 2021) was held online at the AAAI International Conference on Web and Social Media on June 7, 2021. The goal of the workshop was to bring together social scientists and computer scientists in a multidisciplinary meeting to discuss research that interprets social media as measurement instruments. The theme of the 2021 workshop was information operations on social media.

Social media offers an unprecedented view into human habits, customs, culture, stances, and descriptions of physical events that transpire in the world. It also gives unprecedented opportunities to spread misinformation, influence opinion, distract from the truth, or advance specific agendas. The SocialSens workshop brought together researchers from a variety of disciplines including computer science, information & data science, communication, social work, psychology, and anthropology. The workshop was divided into two thematic paper sessions, a short paper session of vision abstracts, an expert panel discussion on information influence, and a keynote on mapping emotions on social media.

In the first paper session, three papers were presented on social sensing during the pandemic. Research was presented on analyzing and modeling 2020 Twitter narratives, including the COVID-19 pandemic and the murder of George Floyd. In the second paper session, three papers were presented on mis- and disinformation, including fact-checking, claims of voter fraud in the 2020 U.S. election, and multi-platform information operations. The short paper session included vision abstracts on asymmetric polarization that occurs online, identity signaling in online text, and new methods for identifying false information in social media and the intent behind it.

Cecile Paris from CSIRO, Australia, gave the keynote talk at the workshop. She presented two projects on using social media to understand mental health. These projects include tracking mental health of individuals on Twitter in real-time at scale and designing social media platforms specifically for mental health applications, allowing individuals to express emotion in order to identify emotional trajectories.

The workshop concluded with a panel on resilience and vulnerability to information influence. The panel included Kathleen Carley from Carnegie Mellon University, Jeff Hancock from Stanford University, and Maria Rodriguez from the University at Buffalo. The panelists discussed the amount and variety of false information online, where it comes from, and the intent behind it. Recent efforts and challenges in mitigating the spread of mis- and disinformation were also presented. An individual’s skepticism may be increased with the goal of identifying fake news, but at the cost of less trust in information in general. Preventing the spread of misinformation in minority populations, who have lost trust in institutions due to systemic racism, presents its own unique set of challenges.

Adriana Iamnitchi and George Mohler served as co-program chairs of this workshop. The papers of the workshop were published by AAAI Press. This report was written by Adriana Iamnitchi and George Mohler.

Mediate 2021: News Media and Computational Journalism

With a drastic shift towards digital communication, individuals and organizations are able to disseminate information almost instantly to a large audience with little-to-no regulation, creating both new challenges and opportunities. This digital shift in our media sphere has caused a profound change in the production and consumption of information, which in turn has strong implications for the social and political landscape. The challenges that result from mass information diffusion have become more visible to the general public in light of recent events such as the COVID-19 infodemic and the US election. In this second edition of MEDIATE, we continued our focus on the topic of misinformation and examined it via three interrelated lenses: (1) automated methods tackling misinformation; (2) uptake of automated information verification; and (3) ethics, regulation, and governance. In line with the spirit of this workshop series, we brought together media practitioners and academics to discuss these three themes, with particular emphasis on cross-discipline interaction. Our workshop shed light on a variety of perspectives around common themes, opportunities for collaborations and further research, as well as open challenges.

Online media today plays an unprecedented role on political, economic, and social scales. The rise of Web technologies enables most individuals to disseminate information almost instantly to a large audience with little-to-no regulation or quality control. This transformation has permanently altered the “information sphere” we live in (Elisa Shearer 2018).

The digital shift presents benefits to both the media industry and the public. In particular, digitalization has reduced the cost of publishing, has built new bridges between media outlets and their audiences, and generally has facilitated access to information. However, these opportunities come at a price: digital information diffusion tends to amplify disinformation and polarization phenomena and makes it hard to distinguish credible information from misleading content (Myllylahti 2018). This change has already led multiple disciplines to re-examine the notions of “truth” online. Over the past year, the COVID-19 infodemic and, more recently, the US election, have further brought the challenges of online media to the attention of the general public.

We focus on misinformation through three interrelated themes: (1) automated and semi-automated methods to counter misinformation; (2) real-world use-cases in which such methods can be employed; and (3) envisioned challenges towards the regulation of such methods. Importantly, we simultaneously considered the perspectives and experiences of practitioners and researchers, attempting to identify concrete challenges and opportunities that can be tackled at the cross-section of these two realms.

The problem of misinformation spread is one of the most significant challenges of the information age. Social media platforms enable it to spread rapidly, reaching broad audiences before manual verification can be performed. The severe harm that inaccurate information can cause to society in critical situations has led to an increased interest within the scientific community to develop tools that assist with the verification of information from social media. Throughout the workshop, we considered novel methodologies, describing their advantages and challenges in creating accurate systems that would be adopted in practice by journalists and the public. We also highlighted problems in misinformation that require further attention from the academic community.

With the rise of high-profile cases of the negative real-world impact of misinformation, news agencies and fact-checking organizations have significantly increased their efforts in debunking false or inaccurate information. Despite such efforts, manual verification is incapable of scaling with the abundance of (mis)information. While researchers have been offering automated solutions, there are a number of challenges to overcome before they have the potential to become widely adopted. Conversations with practitioners have revealed lack of trust in automated solutions for detecting misinformation due to poor generalizability to new topics, lack of interpretability, and potential algorithmic bias. Throughout the workshop, we discussed which automated solutions have been successfully adopted, the challenges in the adoption of solutions addressing misinformation directly, and (partial) solutions to said challenges.

Human-led moderation is limited by the volume of data that content curators can process. Therefore, automation has the potential to significantly increase the efficiency and scope of content moderation and fact-checking. However, unregulated interventions on social media – typically, content removal – could be regarded as a form of censorship imposed by private companies. This situation could be exacerbated by the introduction of computational methods into moderation processes. Throughout the workshop, we discussed the types of interventions that are the most effective to prevent the spread of false claims, the ethical boundaries of automated content moderation, the limits of automated information regulation, and how one can implement public and private governance on automated content moderation. In addition, the workshop facilitated interdisciplinary discussions around important questions on digital governance and democracy.

We had six invited keynote speakers who shared their perspectives on the three main themes of the workshop. We note that while we group key messages from each speaker by theme below, each invited speaker often touched on more than one theme throughout their talk.

Automated Methods Tackling Misinformation: Kristina Lerman, Principal Scientist at the USC Information Sciences Institute, talked about evaluating science skepticism and measuring polarization in the US using social media posts (Rao et al. 2020). Her research estimated polarization across multiple dimensions, including attitudes towards science, politics, and political moderacy, using a large-scale COVID-19 Twitter dataset (Chen, Lerman, and Ferrara 2020). The findings show that opinions about COVID-19 are strongly polarized and that the polarization dimensions are correlated. For instance, conservatives are more anti-science, while moderates (centrists) are more pro-science. This study found that existing anti-science attitudes (in 2016) created fertile ground for COVID-19 misinformation and mistrust of experts to spread. Divergent responses to the COVID-19 pandemic in the US showed that partisanship and mistrust of institutions, including science, can increase resistance to COVID-19 mitigation measures and vaccine hesitancy.

Chris Bregler, Director / Principal Scientist at Google AI, gave a talk titled “Context is everything: On Manipulated Media, Context Retargeting, and Misinformation Mitigation”. The key point of the talk was to raise awareness of the danger posed by so-called cheapfakes. While deepfakes are indeed dangerous and certainly enable various forms of abuse, cheapfakes are more prevalent and require more attention than they currently receive from academics. Cheapfakes are also more difficult to detect as they are more general. While more researchers have moved into cheapfakes over the past year, an important aspect of them that still has not garnered enough attention is the change of context (e.g., change of caption, time, location) in original material (Aneja, Bregler, and Nießner 2021). For this purpose, a new challenge to encourage research in out-of-context detection has been announced.

Mevan Babakar, COO at FullFact and Board Member of Democracy Club International Fact-Checking Network, talked about how fact-checking is performed at FullFact and how it is automated (Babakar and Moy 2016). Three tasks are being automated at FullFact: (1) claim detection, i.e., identifying factual statements that can be checked from the daily stream of sentences from UK media; (2) claim matching, i.e., identifying repetitions of claims, and (3) robochecking, i.e., checking a claim in real-time against primary sources. From a methodology perspective, both claim matching and claim detection solutions currently in use at FullFact rely on BERT-based models. Specifically, claim matching involves an ensemble of Sentence-BERT models for semantic similarity, topic detection, and entity extraction. Mevan pointed out that claim matching and claim detection are synergistic tasks.
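The claim-matching step can be illustrated with a small sketch. This is not FullFact's system: the talk described an ensemble of Sentence-BERT models, whereas the code below swaps in a plain bag-of-words cosine similarity so that it runs with the Python standard library only. The function names, the example claims, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences.

    Stand-in for the sentence-embedding similarity mentioned in the talk.
    """
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_claim(sentence, checked_claims, threshold=0.5):
    """Return (best_match, score); best_match is None below the threshold.

    `threshold` is an assumed value, not one reported by the speakers.
    """
    if not checked_claims:
        return None, 0.0
    best = max(checked_claims, key=lambda c: cosine_similarity(sentence, c))
    score = cosine_similarity(sentence, best)
    return (best if score >= threshold else None), score


if __name__ == "__main__":
    # Hypothetical, previously fact-checked claims.
    checked = [
        "Unemployment fell by 2 percent last year",
        "The city built 500 new homes",
    ]
    print(match_claim("Last year unemployment fell by 2 percent", checked))
    print(match_claim("Bananas are yellow", checked))
```

A word-overlap measure like this misses paraphrases that share no vocabulary, which is one reason a production system would prefer learned sentence embeddings.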

Uptake of Automated Information Verification: Mevan Babakar pointed out that FullFact has experienced an impressive increase in their ability to fact-check information since their adoption of automated claim detection, which boosted the number of detected claims by 1000x. However, she highlighted the need for accountability at every level of information dissemination and the fact-checking process. She also emphasized the importance of fact-checking organizations retaining independence from governmental and other private organizations. Furthermore, Mevan mentioned a range of crucial and still open questions about the fact-checking process that will inform the future direction of discussions and research in this field, such as how to prioritize the most valuable claims, how third parties are using the data, and how platform-wide changes affect long-term behaviors and attitudes.

Ethics, Regulation, and Governance: Rasmus Kleis Nielsen, Professor at the University of Oxford and Director of the Reuters Institute, gave a talk on “‘News you don’t believe’: Public perspectives on f*ke news and misinformation and what they can tell us about automated and regulatory responses”. Rasmus noted that workable and robust mitigants to problems in online media such as misinformation lie at the intersection of (1) what research can validate and improve, (2) what policy can endorse, and (3) what the public needs or understands the problem to be. Studies from the Reuters Institute for the Study of Journalism surveyed individuals from a number of countries and contain statistics and qualitative insights into different facets of this question (Newman et al. 2020; Nielsen and Graves 2017). One important finding of these studies is that people’s conception of fake news is much broader than the commonly studied interpretation of “false information”; it includes satire, poor journalism, and certain types of advertising. Another finding of these studies is that there is significant public exposure to “poor journalism” (e.g., factual mistakes, misleading headlines, clickbait). For policy-level attempts at tackling misinformation, it is crucial to be evidence-based but also to account for the public’s perception and understanding of misinformation.

David Leslie, Ethics Theme Lead at the Alan Turing Institute, gave a talk on “Governing Critical Digital Infrastructure as a Global Public Utility” (Leslie 2020; Leslie 2019). He drew from existing work and ideas and highlighted five key practical goals that regulators should think about in safeguarding digital governance: (1) prohibiting targeted advertising; (2) securing common carriage, equity, fair pricing, and non-discrimination; (3) setting in place structural regulations that remove incentives to and organizational enablers of predatory behavior; (4) instituting mechanisms of democratic governance and community ownership; and (5) building a global regulatory capacity to manage the global character of critical digital infrastructures. David also mentioned the importance of researcher awareness for mitigating potential misuse of automated systems and the ongoing work in this space by the Alan Turing Institute.

Ramsha Jahangir, Journalist at Coda Story and Scholar at Erasmus Mundus Journalism, gave a talk on “Content regulation in the digital age”. Ramsha highlighted the tension between government regulations, platform interventions, and internet freedoms, e.g., governments can ban media platforms, and media platforms can de-platform individuals, control misinformation labeling, and ban political advertisements. All these actions have repercussions on internet freedoms. While the tension between government, platforms, and internet freedoms is present on a global scale, it is crucial to remember that it can manifest with very different degrees of citizen censorship in different parts of the world (Jahangir 2020). The way in which these three entities should interact in the years to come is a big open question with profound ramifications.

We had two contributions on automated methods tackling misinformation and three contributions on the uptake of automation, discussing potential solutions that can be implemented by social media platforms to combat the spread of misinformation.

In the first contribution on this topic (Gruppi et al. 2021), the authors study the utilization of tweets by news sources of different credibility levels. Specifically, the authors show the differences in tweets embedded by reliable and unreliable sources regarding quantity and quality, the topics they cover, and the individuals they cite. The main takeaway of this study is that unreliable sources use significantly more Twitter-based content than reliable sources. Furthermore, 41% of the users cited by reliable sources were accounts verified by Twitter, against 14% cited by unreliable sources.

In the second contribution (Gruppi, Horne, and Adalı 2021), the authors propose a novel news veracity detection model. In particular, the authors show that content sharing behaviors, formulated as networks, represent signals of reliability. They investigate the interplay between network and text features in a predictive task and show that incorporating content sharing features leads to performance gain and makes the model more robust to concept drifts.

Uptake of Automated Information Verification: Mohsen Mosleh gave a talk on “Reducing Inaccurate Information on Social Media”. In this talk, Mohsen described several cognitive reflection experiments deployed on Twitter to study the prevalence of fundamentally low-quality information. Specifically, the experiments focused on two research questions, namely, who you follow and what you share. The main takeaway of these experiments was that people who engage in less cognitive reflection are more likely to consume and share low-quality content.

In another contribution on the same topic (Sumpter and Neal 2021), the authors conducted a cross-sectional survey in which 204 survey respondents rated the credibility of four news articles, each randomly assigned a credibility warning (i.e., an assessment of the article’s credibility determined by either an AI agent or human journalist, or no assessment at all). The authors found that AI warnings are as successful as, if not more successful than, warnings provided by a journalist at influencing participants’ assessments of a news article’s credibility (regardless of the warning’s accuracy). Furthermore, language sentiment may influence the degree to which a user perceives and believes such warnings.

In the last contribution (Gausen, Luk, and Guo 2021), the authors developed an agent-based model of the spread of information on social media networks with four agent types (susceptible, believer, denier, and cured). The main take-away of this work was that agent-based modeling could be a useful tool for policy-makers to evaluate misinformation countermeasures at scale before implementing them on a social media platform.
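To make the idea concrete, the following is a minimal sketch of an agent-based spread model that uses the four agent types named above. It is not the authors' model: the ring network, the synchronous update, the transition rules, and every probability are assumptions chosen for illustration, and all function and parameter names are hypothetical.

```python
import random

# The four agent types named in the report; the transition rules below are assumed.
SUSCEPTIBLE, BELIEVER, DENIER, CURED = "susceptible", "believer", "denier", "cured"
ALL_STATES = (SUSCEPTIBLE, BELIEVER, DENIER, CURED)


def make_ring_network(n, k):
    """Adjacency lists for a ring in which each node links to k neighbours per side."""
    return {i: [(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)}


def step(states, network, rng, p_believe=0.3, p_deny=0.1, p_cure=0.05):
    """Advance one synchronous round and return the new state of every agent."""
    new_states = dict(states)
    for node, state in states.items():
        if state == SUSCEPTIBLE:
            neighbours = [states[m] for m in network[node]]
            # Assumed rule: exposure to a believer may convert a susceptible agent;
            # otherwise exposure to a denier may turn it into a denier.
            if BELIEVER in neighbours and rng.random() < p_believe:
                new_states[node] = BELIEVER
            elif DENIER in neighbours and rng.random() < p_deny:
                new_states[node] = DENIER
        elif state == BELIEVER and rng.random() < p_cure:
            # Assumed rule: believers are cured at a constant rate.
            new_states[node] = CURED
    return new_states


def simulate(n=200, k=2, rounds=30, believers=3, deniers=3, seed=0, **rates):
    """Run the model and return the per-round counts of each agent type."""
    rng = random.Random(seed)
    network = make_ring_network(n, k)
    states = {i: SUSCEPTIBLE for i in range(n)}
    chosen = rng.sample(range(n), believers + deniers)
    for node in chosen[:believers]:
        states[node] = BELIEVER
    for node in chosen[believers:]:
        states[node] = DENIER
    history = []
    for _ in range(rounds):
        states = step(states, network, rng, **rates)
        counts = {s: 0 for s in ALL_STATES}
        for s in states.values():
            counts[s] += 1
        history.append(counts)
    return history


if __name__ == "__main__":
    baseline = simulate()
    # Hypothetical countermeasure: something that lowers the chance of belief.
    intervention = simulate(p_believe=0.1)
    print("baseline final counts:    ", baseline[-1])
    print("intervention final counts:", intervention[-1])
```

Comparing the two runs mirrors the use case described above: a countermeasure is expressed as a change in model parameters and its effect is inspected in simulation before anything is deployed on a real platform.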

Several opportunities and open questions were highlighted during the workshop. We mention a few below:

• Discussions throughout the workshop have highlighted that communication of uncertainty and explanations of model decisions will play a key role in unlocking public trust in automated fact-checking.

• The need for accountability at every level of information dissemination and fact-checking process, as well as the importance of the independence of fact-checking organizations, were also highlighted during the workshop.

• An array of important open questions to enhance and enable trustworthy fact-checking decisions and interventions were presented, such as how to prioritize the most valuable claims, what is an acceptable margin of error in automated solutions, how third parties are using the fact-checked information, and how platform-wide changes affect long-term behaviors and attitudes.

• In light of the current pandemic, when ongoing research unravels new facts at a high pace in real-time, there is a rising concern on how automated verification tools will tackle time-sensitive claims. This highlights the need for further research in this direction.

• While deepfakes are dangerous and leave room for serious forms of abuse, cheapfakes and, in particular, out-of-context cheapfakes are far more prevalent “in the wild”. Further academic effort and attention is needed to automate the detection of out-of-context cheapfakes. In general, researchers should not ignore seemingly easier but yet unresolved problems and consider their prevalence in the real world.

• Nudging users to consider the accuracy of the information affects how users consume and share content. Measuring and comparing the causal effect of various intervention methods, such as misinformation labeling or nudging, on the way in which individuals interact with information, remains an important open question.

• Credible and robust mitigants for misinformation need to lie at the intersection of (1) what research can validate and improve, (2) what policy can endorse, and (3) what the public perception of the problem is.

• Measures taken by governments and platforms differ significantly across countries. Therefore, it becomes apparent that efforts to tackle misinformation need to account for the circumstances and needs of different countries, as well as the scale of the problem.

• Misinformation is a multifaceted problem where different entities play a key role: government, policy makers, platforms, publishers. The way in which the incentives of these different components should be balanced against each other and against the freedoms of citizens is an open and ongoing discussion with deep ramifications.

This workshop was organized as part of the efforts of the Alan Turing Institute’s special interest group “Media in the Digital Age”. The organizers of the interest group have multifaceted expertise in research areas related to news media such as rumor detection and verification (Kochkina, Liakata, and Zubiaga 2018; Kochkina and Liakata 2020; Gorrell et al. 2019), network science (Bazzi et al. 2020; Bazzi et al. 2016), selection bias (Rappaz, Bourgeois, and Aberer 2019; Bourgeois, Rappaz, and Aberer 2018) and scientific misinformation (Smeros, Castillo, and Aberer 2019; Romanou et al. 2020). The purpose of the interest group is to bring together and facilitate concrete collaborations between academics and practitioners (e.g., journalists, fact-checkers, and platforms) in order to explore important topics such as the adoption of technology in the media sphere, misinformation, content moderation, personalization, the interplay between key players of the media sphere, and the future of digital media.

Panayiotis Smeros, Jérémie Rappaz, Marya Bazzi, Elena Kochkina, Maria Liakata, and Karl Aberer served as co-chairs of this workshop and coauthored this report. The contributed papers of the workshop are published in the Workshop Proceedings of the International AAAI Conference on Web and Social Media.

Work by Maria Liakata and Elena Kochkina was supported by a UKRI/EPSRC grant (EP/V048597/1) to Profs Yulan He and Maria Liakata as well as project funding from the Alan Turing Institute, grant EP/N510129/1. Marya Bazzi was supported by the Alan Turing Institute under the EPSRC Grant No. EP/N510129/1.

Mining Actionable Insights from Social Networks: Special Edition on Healthcare Social Analytics

The 6th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2021) was held virtually at ICWSM 2021. For this edition, we ran a special edition of the workshop with a focus on healthcare social analytics. The goal of this special edition was to investigate different techniques that use social media data for building diagnostic, predictive, and prescriptive analysis models for health applications.

With the emergence and growing popularity of social media such as blogging systems, wikis, social bookmarking, social networks, and microblogging services, many users are extensively engaged in at least some of these applications to express their feelings and views about a wide variety of social topics as they happen in real time by commenting, tagging, joining, sharing, liking, and publishing posts. This has resulted in an ocean of data which presents an interesting opportunity for performing data mining and knowledge discovery in many domains including healthcare. Recent advances in machine learning and natural language processing present exciting opportunities for developing automatic methods for the collection, extraction, representation, analysis, and validation of social media data for health applications. These methods should be able to address the unique challenges of processing social media data while discovering, in a timely manner, meaningful patterns that identify emerging health threats. This topic was especially timely given the impact of social media during the COVID-19 pandemic, which has affected people’s physical, mental, and social health.

This workshop brought together researchers and practitioners from across the world and from different disciplines, such as information retrieval, data mining and machine learning as well as social network analysis, healthcare informatics, network science and complex networks. The major theme of papers presented at the workshop was mining actionable insight from social media data for health research and applications. This workshop included three invited talks on this theme: 1) The talk delivered by Stevie Chancellor (University of Minnesota) outlined human-centered machine learning for dangerous mental health behaviors discussed on social media, 2) The talk by Abeed Sarker (Emory University School of Medicine) covered how insights can be obtained about non-medical use of prescription medications from social media via natural language processing, and 3) The talk by Jia Xue (University of Toronto) focused on Twitter as a tool for understanding sexual assault and family violence.

The workshop also included a panel on privacy and ethical considerations of mining health data from social media, with 5 panelists: Khaled El Emam (University of Ottawa), Farzaneh Etminani (Halmstad University), Tristan Henderson (University of St Andrews), Kirsten Ostherr (Rice University) and Reihaneh Rabbany (McGill University). Further, as a follow-up to this edition of the workshop, we are organizing a special issue of IEEE Transactions on Network Science and Engineering (Impact Factor: 5.213), which is widely recognized by the community as a leading publication venue known for its diligent review process.

Ebrahim Bagheri, Diana Inkpen, Christopher C. Yang and Fattane Zarrinkalam served as co-chairs of this workshop and wrote this report. The papers of the workshop were published in the Workshop Proceedings of ICWSM 2021.

Biographies

Karl Aberer works at École Polytechnique Fédérale de Lausanne (EPFL).

Ebrahim Bagheri is an Associate Professor in the Department of Electrical, Computer and Biomedical Engineering at Ryerson University.

Marya Bazzi is at the University of Warwick and the Alan Turing Institute.

Rumi Chunara is an Assistant Professor in the departments of Computer Science and Engineering and Epidemiology/Biostats at New York University, USA.

Ziv Epstein is a Ph.D. candidate at the MIT Media Lab.

Fabian Flöck is at GESIS.

Adriana Iamnitchi is a professor of Computer Science and Engineering at University of South Florida.

Diana Inkpen is a Professor in the School of Electrical Engineering and Computer Science at the University of Ottawa.

Maurice Jakesch is a Ph.D. candidate at the Department of Information Science at Cornell University.

Kyraki Kalimeri is a Researcher at ISI Foundation, in Turin, Italy.

Elena Kochkina is at the Alan Turing Institute and Queen Mary University of London.

Ugur Kursuncu is a Postdoctoral Fellow, Artificial Intelligence Institute, University of South Carolina, SC, USA.

Maria Liakata is at the Alan Turing Institute and Queen Mary University of London.

Yelena Mejova is a Senior Researcher at ISI Foundation, in Turin, Italy.

George Mohler is a professor in the Department of Computer and Information Science at Indiana University – Purdue University Indianapolis.

Daniela Paolotti is a Senior Researcher at ISI Foundation in Turin, Italy.

Jérémie Rappaz works at École Polytechnique Fédérale de Lausanne (EPFL).

Manon Revel is a Ph.D. candidate at the MIT Institute for Data Systems and Society.

Horacio Saggion is a Professor and the Head of the Large Scale Text Understanding Systems Lab, Universitat Pompeu Fabra, Spain.

Indira Sen is at GESIS.

Panayiotis Smeros works at École Polytechnique Fédérale de Lausanne (EPFL).

Katrin Weller works at GESIS.

Sanjaya Wijeratne is a Research Scientist at Holler Technologies, Inc., USA.

Christopher C. Yang is a Professor in the College of Computing and Informatics at Drexel University.

Fattane Zarrinkalam is a Research Scientist at Thomson Reuters Labs.

Reports of the Association for the Advancement of Artificial Intelligence’s 15th International Conference on Web and Social Media