Hattie Zhou

Societal Alignment Frameworks Can Improve LLM Alignment

Karolina Stanczak

Nicholas Meade

Mehar Bhatia

Hattie Zhou

Konstantin Böttinger

Jeremy Barnes

Jason Stanley

Jessica Montgomery

Richard Zemel

Nicolas Papernot

Nicolas Chapados

Denis Therien

Timothy P Lillicrap

Ana Marasovic

Sylvie Delacroix

Gillian K. Hadfield

Siva Reddy

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than attempting to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

2025-03-04

Bi-Align @ International Conference on Learning Representations (poster)

doi.org

openreview.net

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Tikeng Notsawo Pascal Junior

Pascal Notsawo

2024-03-03

ICLR.cc/2024/Workshop/ME-FoMo (poster)

doi.org

openreview.net

Teaching Algorithmic Reasoning via In-context Learning

Hattie Zhou

Azade Nova

Hugo Larochelle

Aaron Courville

Behnam Neyshabur

Hanie Sedghi

2022-11-14

ArXiv (preprint)

doi.org

openreview.net

Fortuitous Forgetting in Connectionist Networks

Forgetting is often seen as an unwanted characteristic in both human and machine learning. However, we propose that forgetting can in fact be favorable to learning. We introduce "forget-and-relearn" as a powerful paradigm for shaping the learning trajectories of artificial neural networks. In this process, the forgetting step selectively removes undesirable information from the model, and the relearning step reinforces features that are consistently useful under different conditions. The forget-and-relearn framework unifies many existing iterative training algorithms in the image classification and language emergence literature, and allows us to understand the success of these algorithms in terms of the disproportionate forgetting of undesirable information. We leverage this understanding to improve upon existing algorithms by designing more targeted forgetting operations. Insights from our analysis provide a coherent view on the dynamics of iterative training in neural networks and offer a clear path towards performance improvements.

2022-01-27

ICLR.cc/2022/Conference (poster)

doi.org

openreview.net
