
Taylor Webb

Associate Academic Member
Research Topics
Cognitive Science
Computer Vision
Consciousness
Deep Learning
Reasoning

Current Students

Research Intern - Université de Montréal
Collaborating Alumni
Master's Research - Université de Montréal

Publications

Visual serial processing deficits explain divergences in human and VLM reasoning
Nicholas Budny
Kia Ghods
Declan Campbell
Raja Marjieh
Amogh Joshi
Sreejan Kumar
Jonathan D. Cohen
Thomas L. Griffiths
Why do Vision Language Models (VLMs), despite success on standard benchmarks, often fail to match human performance on surprisingly simple visual reasoning tasks? While the underlying computational principles are still debated, we hypothesize that a crucial factor is a deficit in visually-grounded serial processing. To test this hypothesis, we compared human and VLM performance across tasks designed to vary serial processing demands in three distinct domains: geometric reasoning, perceptual enumeration, and mental rotation. Tasks within each domain varied serial processing load by manipulating factors such as geometric concept complexity, perceptual individuation load, and transformation difficulty. Across all domains, our results revealed a consistent pattern: decreased VLM accuracy was strongly correlated with increased human reaction time (used as a proxy for serial processing load). As tasks require more demanding serial processing -- whether composing concepts, enumerating items, or performing mental transformations -- the VLM-human performance gap widens reliably. These findings support our hypothesis, indicating that limitations in serial, visually grounded reasoning represent a fundamental bottleneck that distinguishes current VLMs from humans.
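The core analysis the abstract describes is straightforward to illustrate: for each task condition, pair the VLM's accuracy with the mean human reaction time and test for a negative correlation. A minimal sketch, assuming scipy; the numbers below are invented placeholders, not the paper's data:

```python
# Hypothetical sketch of the analysis pattern described in the abstract:
# correlating per-condition VLM accuracy with mean human reaction time
# (RT serves as a proxy for serial processing load).
from scipy.stats import pearsonr

# One entry per task condition: (mean human RT in seconds, VLM accuracy).
# Values are illustrative placeholders only.
conditions = [
    (0.8, 0.95),  # low serial load: fast humans, accurate model
    (1.4, 0.82),
    (2.1, 0.61),
    (3.0, 0.44),  # high serial load: slow humans, inaccurate model
]

human_rt = [rt for rt, _ in conditions]
vlm_acc = [acc for _, acc in conditions]

# The reported pattern is a strong negative correlation:
# as human RT rises, VLM accuracy falls.
r, p = pearsonr(human_rt, vlm_acc)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```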
Whither symbols in the era of advanced neural networks?
Thomas L. Griffiths
Brenden M. Lake
R. Thomas McCoy
Ellie Pavlick
Some of the strongest evidence that human minds should be thought about in terms of symbolic systems has been the way they combine ideas, produce novelty, and learn quickly. We argue that modern neural networks -- and the artificial intelligence systems built upon them -- exhibit similar abilities. This undermines the argument that the cognitive processes and representations used by human minds are symbolic, although the fact that these neural networks are typically trained on data generated by symbolic systems illustrates that such systems play an important role in characterizing the abstract problems that human minds have to solve. This argument leads us to offer a new agenda for research on the symbolic basis of human thought.
Visual symbolic mechanisms: Emergent symbol processing in vision language models
Declan Campbell
To accurately process a visual scene, observers must bind features together to represent individual objects. This capacity is necessary, for instance, to distinguish an image containing a red square and a blue circle from an image containing a blue square and a red circle. Recent work has found that language models solve this 'binding problem' via a set of symbol-like, content-independent indices, but it is unclear whether similar mechanisms are employed by vision language models (VLMs). This question is especially relevant, given the persistent failures of VLMs on tasks that require binding. Here, we identify a set of emergent symbolic mechanisms that support binding in VLMs via a content-independent, spatial indexing scheme. Moreover, we find that binding errors can be traced directly to failures in these mechanisms. Taken together, these results shed light on the mechanisms that support symbol-like processing in VLMs, and suggest possible avenues for addressing the persistent binding failures exhibited by these models.
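To make the binding problem concrete: the two scenes the abstract describes contain exactly the same features and differ only in how those features are bound to objects. A minimal sketch of such a stimulus pair, assuming Pillow; the colors, sizes, and layout are illustrative assumptions, not the paper's actual stimuli:

```python
# Hypothetical sketch of a binding-problem stimulus pair: same features
# (red, blue, square, circle), different feature-to-object bindings.
from PIL import Image, ImageDraw

def scene(square_color: str, circle_color: str) -> Image.Image:
    """Draw one square and one circle with the given fill colors."""
    img = Image.new("RGB", (200, 100), "white")
    d = ImageDraw.Draw(img)
    d.rectangle([20, 30, 60, 70], fill=square_color)
    d.ellipse([120, 30, 160, 70], fill=circle_color)
    return img

# A model that only detects which features are present, without binding
# them to objects, cannot tell these two images apart.
scene("red", "blue").save("red_square_blue_circle.png")
scene("blue", "red").save("blue_square_red_circle.png")
```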