How to reduce AI harms and risks at once

Concerns about AI ethics are sometimes pitted against concerns about AI safety. The risks from AI are too important for the community of people addressing them to fracture.

Cross-posted from the Simon Institute Blog.

Concerns about AI ethics (the harms AI causes due to imperfect development and training) are sometimes pitted against concerns about AI safety (the risk that AI development could lead to the loss of human control). Luckily, a collaborative approach to addressing these interconnected risks is easy to find.

AI ethics is a field that seeks to create fair and accountable AI systems by addressing bias, discrimination, privacy, and transparency concerns with present-day AI systems. AI safety is a field that focuses on mitigating the potential risks associated with frontier AI systems by researching how to create safe and secure AI systems aligned with human interests and values.

Among policymakers, academics, and the public, the discussion can appear monopolized by those at the extremes of either side, who dismiss the concerns of the other as mere distractions. However, this perspective is not universal. Many in these communities acknowledge and appreciate that AI ethics and AI safety are fundamentally interconnected.

At their core, AI ethics and AI safety share the common goal of ensuring that AI works for the benefit of all of humanity. There is a consensus that current AI systems can cause real harm, and that these harms do not always result from human misuse or from inherent agency within the AI systems, but can also result from poorly designed systems. The trend toward building ever more advanced systems without a coordinated policy response to current ones could result in worse societal outcomes.

Considering each of these issues in isolation or in opposition hinders the development of effective solutions. The complexity of modern AI systems, such as the foundation models behind ChatGPT, which remain a black box to humans, makes it hard to understand how they make decisions. Both advocates for AI ethics and AI safety researchers have called for transparent, interpretable AI systems.

Take, for instance, a job recruitment AI system that has been trained without explicit reference to gender yet appears to favor male applicants for certain positions, such as those in STEM fields. If the AI system were more interpretable, it might reveal that the algorithm indirectly uses gender-related factors, such as certain activities or keywords in a CV that are more typical of one gender than the other. Likewise, more interpretable systems allow researchers to detect and correct deceptive AI behavior or AI systems misaligned with human values.
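To make the recruitment example concrete, here is a minimal sketch of how an auditor might look for such proxy signals. It is a simple correlation audit on outcomes, not a model interpretability technique, and everything in it is an assumption made for illustration: the toy records, the keywords, and the function names `selection_rate_by_group` and `keyword_profile` are all hypothetical.

```python
from collections import Counter

# Hypothetical audit data: (keywords on the CV, gender, shortlisted by the model?).
# Gender is never shown to the model; it is held out and used only for the audit.
APPLICANTS = [
    ({"python", "rugby"}, "m", True),
    ({"python", "rugby"}, "m", True),
    ({"java", "rugby"}, "m", True),
    ({"python", "chess"}, "m", False),
    ({"python", "netball"}, "f", False),
    ({"java", "netball"}, "f", False),
    ({"python", "netball"}, "f", False),
    ({"java", "chess"}, "f", True),
]


def selection_rate_by_group(records):
    """Share of applicants shortlisted within each gender group."""
    totals, selected = Counter(), Counter()
    for _, gender, shortlisted in records:
        totals[gender] += 1
        selected[gender] += int(shortlisted)
    return {group: selected[group] / totals[group] for group in totals}


def keyword_profile(records):
    """For each keyword: (share of holders who are male, share of holders shortlisted)."""
    holders, male, selected = Counter(), Counter(), Counter()
    for keywords, gender, shortlisted in records:
        for kw in keywords:
            holders[kw] += 1
            male[kw] += int(gender == "m")
            selected[kw] += int(shortlisted)
    return {kw: (male[kw] / holders[kw], selected[kw] / holders[kw]) for kw in holders}


rates = selection_rate_by_group(APPLICANTS)
profile = keyword_profile(APPLICANTS)

print("Selection rate by gender:", rates)
for kw, (male_share, shortlist_share) in sorted(profile.items()):
    print(f"{kw:8s} male share={male_share:.2f} shortlisted={shortlist_share:.2f}")
```

In this toy data, a hobby keyword that appears only on men's CVs is always shortlisted, while one that appears only on women's CVs never is. A pattern like this would suggest that the keyword is acting as a stand-in for gender even though gender was never an input. A real audit would need far more data and proper statistical testing.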

Research is being conducted to develop controllable AI systems with well-understood capabilities and restrictions. Such a bounded design would set clear parameters for what the AI system can and cannot do before deployment. Policymakers can, for example, require developers to ensure that systems cannot access more personal data than is necessary for their function. By establishing clear limits on the power and capabilities of AI systems, we can prevent them from exceeding our control or understanding. Third-party auditors can then verify and evaluate the limits and capabilities of these models, holding AI developers accountable for any violation of ethical and safety standards.

To explain how current harms and future risks relate, Critch (2023) provides a multidimensional taxonomy of the causes of society-scale risks from AI, which we summarise as follows:

  1. Diffusion of responsibility: Harm arises from the absence of collective responsibility for AI systems, as “no one is uniquely accountable for technology’s creation or use.”
  2. Unintentionally large AI impacts: AI that is not expected to have a society-scale impact ends up having one. For example, AI researchers build a natural language model that produces artificial hate speech in order to train robust hate-speech detection. However, the corpus is accidentally leaked, resulting in the large-scale reuse of “scientifically proven” insults.
  3. Unintentionally negative AI impacts: AI that is intended to have a large societal impact turns out to be harmful by mistake.
  4. Willful indifference: Creators of AI technology are willfully unconcerned about its moral consequences, allowing it to cause widespread societal harms like pollution, misinformation, or injustice.
  5. Weaponization (criminal and state): AI is created to intentionally inflict harm, such as in terrorism, or is deployed in war or law enforcement.

Similarly, the Center for AI Safety writes that there is “considerable overlap” between researchers from both sides. They highlight that “many existing policy proposals designed to address the present impacts of AI systems would also promote AI safety if implemented”:

  1. Legal Liability for AI Harms: The AI Now Institute advocates for stricter legal liability frameworks for the consequences caused by AI systems to discourage developers from relinquishing responsibility, thus addressing misinformation and algorithmic bias, and promoting safety as a key aspect of AI design.
  2. Increased Regulatory Scrutiny for AI Development: The Institute also suggests more regulatory scrutiny throughout the AI product cycle, with accountability for developers’ data and design choices. Such transparency measures aim to fight algorithmic bias, prevent unauthorized profit from copyrighted materials, and avoid issues stemming from goal misspecification.
  3. Human Oversight in AI Systems: The European Union’s proposed AI Act highlights the necessity of human intervention in high-risk AI systems. Human oversight can mitigate the risks of algorithmic bias and the spread of false information, and enable early detection and management of hazardous AI systems.

To realize the potential of AI and effectively manage its risks, the conversation must not be hijacked by a polarized dichotomy that fails to appreciate the interwoven nature of AI ethics and safety. Neither AI ethics nor AI safety risks are distractions; both are legitimate and pressing concerns. Dismissing or downplaying the concerns of one side is itself a distraction from the substantive and productive progress needed to drive the safe and ethical development and deployment of AI systems.