
Kathy Garcia

NSF Fellow (12%) • ICLR Paper (25%) • Best Oral @ ICML LatinX • 2x VSS Talks (15%) • CCN Talk (25%) • 350+ models benchmarked • WSJ & PopSci Features

Building human-aligned AI through computational cognitive neuroscience.

I am a 4th-year Ph.D. student in Computational Cognitive Science at Johns Hopkins University, where I investigate how humans and machines understand social interactions in the Computational Cognitive Neuroscience Lab with Professor Leyla Isik.

I am fortunate to have had my research published at top-tier venues, including ICLR, and featured in The Wall Street Journal. I develop computational models that bridge human vision and artificial intelligence, with a focus on creating socially intelligent systems that understand dynamic human behavior.

Previously, I earned my B.S. at Stanford University and completed an NIH Postbac Fellowship in affective neuroscience at Northwestern University, mentored by Professor Robin Nusslock, Dr. Zachary Anderson, and Dr. Cassandra VanDunk.

🎓 Mentorship: First-gen Latina | Happy to chat about research, grad school, or careers!

News

[Oct 2025] 📄 New preprint on aligning video models with human social judgments!
[Sept 2025] 🎉 Paper accepted at NeurIPS 2025 UniReps Workshop on social tokens for grounding LLMs!
[Aug 2025] 🧠 Selected for MIT Brains, Minds & Machines Summer Course - research led to new paper!
[July 2025] 🏆 Inducted into Sigma Xi Scientific Research Honor Society!
[May 2025] 📰 Featured in The Wall Street Journal on AI and social perception!
[Apr 2025] 📰 Featured in Johns Hopkins Hub on reading social cues!
[Feb 2025] 🏆 Awarded John I. Yellott Travel Award for VSS 2025!
[Feb 2025] 🏆 Received NEI Early Career Scientist Travel Grant!
[Feb 2025] 🎤 Talk accepted at VSS 2025 on social scene perception!
[Jan 2025] 🎉 Paper accepted at ICLR 2025 on dynamic social vision!
[Aug 2024] 🎤 Delivered a talk at CCN 2024 on deep learning vs. human vision!
[July 2024] 🥇 Won Best Oral Presentation at LatinX in AI @ ICML 2024!
[July 2024] 🎤 Delivered a talk at LatinX in AI @ ICML 2024 on social vision models!
[June 2024] 📄 New preprint on dynamic social vision now available!
[May 2024] 🏆 Received FoVea 2024 Travel Award!
[May 2024] 🎤 Delivered a talk at VSS 2024 on neural benchmarking!
[Apr 2024] 🎤 Gave brown bag talk at JHU Cognitive Science on DNN benchmarking!
[Apr 2024] 🎓 Awarded the prestigious NSF GRFP!
[Sept 2021] 🎓 Awarded the Kelly Miller Fellowship at Johns Hopkins University!

📰 Featured in Media

WSJ, JHU Hub, PopSci, Marketplace, CISAC News

Research

Building the next generation of socially intelligent AI systems. I bridge human cognition and artificial intelligence by developing computational models that understand social interactions the way humans do. My research agenda focuses on closing the gap between human social perception and AI capabilities through neural benchmarking, multimodal learning, and cognitively inspired architectures.

Working with researchers across MIT, Stanford, and Johns Hopkins, I develop methods that enable AI systems to perceive, interpret, and predict complex social dynamics. This work spans computer vision, natural language processing, and cognitive neuroscience — positioning AI systems to become truly collaborative partners in human social environments.


Research Projects

NeurIPS 2025

Look, Then Speak: Social Tokens for Grounding LLMs in Visual Interactions
Kathy Garcia, Vighnesh Subramaniam, Boris Katz, Brian Cheung
Accepted at UniReps Workshop, NeurIPS 2025

PDF Project Page BibTeX Discussion

We introduce social tokens: visual features extracted from video frames and strategically inserted into language models to enhance their comprehension of social interactions. This method significantly improves LLM performance on social tasks by enabling richer integration of social cues. Our results demonstrate that social tokens help bridge the gap between human and model understanding of complex social dynamics.
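A minimal sketch of the core mechanism (the encoder dimensions, projection layer, and insertion point below are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class SocialTokenInserter(nn.Module):
    """Hypothetical sketch: project per-frame visual features into an
    LLM's embedding space and splice them into the token sequence."""

    def __init__(self, visual_dim: int, llm_dim: int, n_social_tokens: int = 4):
        super().__init__()
        # Learned map from the video encoder's feature space to the
        # language model's embedding space.
        self.project = nn.Linear(visual_dim, llm_dim)
        self.n_social_tokens = n_social_tokens

    def forward(self, frame_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, n_frames, visual_dim) from a frozen video encoder
        # text_embeds: (batch, seq_len, llm_dim) from the LLM's embedding layer
        social = self.project(frame_feats[:, : self.n_social_tokens])
        # Prepend the social tokens so the LLM attends to them alongside text.
        return torch.cat([social, text_embeds], dim=1)

# Toy usage with made-up dimensions:
inserter = SocialTokenInserter(visual_dim=768, llm_dim=4096)
fused = inserter(torch.randn(2, 16, 768), torch.randn(2, 32, 4096))  # (2, 36, 4096)
```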

Preprint

Aligning Video Models with Human Social Judgments via Behavior-Guided Fine-Tuning
Kathy Garcia, Leyla Isik
arXiv preprint, 2025

PDF arXiv Project Page

We introduce behavior-guided fine-tuning to align video models with human social perception. Using 49,000+ human similarity judgments, we fine-tune video models via a hybrid triplet-RSA loss, significantly improving alignment with how humans perceive social interactions while maintaining performance on downstream tasks.
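A rough sketch of how a hybrid triplet + RSA objective might be combined (the distance functions, weighting, and names below are my assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def triplet_rsa_loss(anchor, positive, negative, model_rdm, human_rdm,
                     margin: float = 0.2, alpha: float = 0.5):
    """Hypothetical hybrid objective: a triplet term pulls together video
    embeddings humans judged similar, while an RSA term aligns the model's
    representational dissimilarity matrix (RDM) with the human-judgment RDM."""
    # Triplet term over an (anchor, positive, negative) video triple.
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    # RSA term: 1 - Pearson correlation between the flattened RDMs.
    m = model_rdm.flatten()
    h = human_rdm.flatten()
    m = (m - m.mean()) / (m.std() + 1e-8)
    h = (h - h.mean()) / (h.std() + 1e-8)
    rsa = 1.0 - (m * h).mean()
    # Weighted combination; alpha trades off the two terms.
    return alpha * triplet + (1.0 - alpha) * rsa
```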

A Large-Scale Study of Social Scene Judgments: Alignment with Deep Neural Networks and Social-Affective Features
Kathy Garcia, Leyla Isik
Vision Sciences Society (VSS), 2025 (Talk Presentation)

DOI Project Page

🎤 VSS 2025 Talk (15% acceptance)

We benchmark how closely deep neural networks capture the social understanding reflected in human judgments and brain responses to real-world interactions. Our dataset and framework reveal that while AI models are closing the gap, humans still rely on uniquely social cues to interpret complex social scenes.
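For intuition, model-to-human alignment of this kind is often scored with representational similarity analysis; a minimal sketch, assuming a condensed human dissimilarity matrix derived from the behavioral judgments:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(model_feats: np.ndarray, human_rdm: np.ndarray) -> float:
    """Correlate a model's representational geometry with human judgments.
    model_feats: (n_videos, n_features) activations for each video.
    human_rdm: condensed pairwise dissimilarities from behavioral judgments."""
    model_rdm = pdist(model_feats, metric="correlation")  # condensed model RDM
    rho, _ = spearmanr(model_rdm, human_rdm)
    return rho
```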

ICLR 2025

Modeling Dynamic Social Vision Highlights Gaps Between Deep Learning and Humans
Kathy Garcia, Emalie McMahon, Colin Conwell, Michael F. Bonner, Leyla Isik
International Conference on Learning Representations (ICLR), 2025

PDF Poster Project Page Code & Data BibTeX

🏆 Best Oral @ ICML LatinX 2024 • 📰 WSJ, JHU Hub, PopSci, Marketplace

We present a dataset of natural videos and captions involving complex multi-agent interactions, and we benchmark 350+ image, video, and language models on behavioral and neural responses to the videos. Together, these results identify a major gap in AI models' ability to match human behavior and neural responses, and highlight the importance of studying vision in dynamic, natural contexts.
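Neural benchmarking of this kind typically uses cross-validated encoding models; a minimal sketch under that assumption (the regression setup and scoring below are illustrative, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def voxelwise_alignment(features: np.ndarray, voxels: np.ndarray) -> np.ndarray:
    """Predict each voxel's response from model features with cross-validated
    ridge regression, then score held-out prediction accuracy per voxel.
    features: (n_videos, n_features); voxels: (n_videos, n_voxels)."""
    model = RidgeCV(alphas=np.logspace(-2, 5, 8))
    preds = cross_val_predict(model, features, voxels, cv=5)
    # Pearson r between predicted and observed responses, per voxel.
    p = (preds - preds.mean(0)) / (preds.std(0) + 1e-8)
    v = (voxels - voxels.mean(0)) / (voxels.std(0) + 1e-8)
    return (p * v).mean(0)
```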

Large-scale Deep Neural Network Benchmarking in Dynamic Social Vision
Kathy Garcia, Colin Conwell, Emalie McMahon, Michael F. Bonner, Leyla Isik
Vision Sciences Society (VSS), 2024 • Cognitive Computational Neuroscience (CCN), 2024

DOI Project Page

🎤 VSS 2024 Talk (15% acceptance) • CCN 2024 Contributed Talk (25% acceptance)

We benchmark 300+ DNNs spanning diverse architectures, objectives, and training sets against fMRI responses to a curated dataset of 200 naturalistic social videos, with a focus on the "lateral" visual stream.

Predicting Dimensional Symptoms of Psychopathology from Task-Based fMRI using Support Vector Regression
Kathy Garcia, Zach Anderson, Iris Ka-Yi Chat, Katherine S.F. Damme, Katherine Young, Susan Y. Bookheimer, Richard Zinbarg, Michelle Craske, Robin Nusslock
SfN Global Connectome, 2021 (Virtual poster presentation)

This study develops a novel machine learning approach using Support Vector Regression (SVR) to explore potential fMRI biomarkers for symptoms of anxiety and depression. We find that MID task-fMRI data does not accurately predict these symptoms, with results indicating a poor model fit.
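A minimal sketch of the analysis as described (the feature extraction, kernel, and hyperparameters are assumptions; the study's reported outcome is a poor predictive fit):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def predict_symptoms(fmri_features: np.ndarray, symptom_scores: np.ndarray):
    """Cross-validated SVR from task-fMRI features (e.g., flattened MID-task
    activation maps, one row per subject) to a continuous symptom dimension."""
    model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    # Held-out R^2 per fold; scores near or below zero indicate a poor fit,
    # consistent with the study's conclusion.
    return cross_val_score(model, fmri_features, symptom_scores, cv=5, scoring="r2")
```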

Publications

Garcia, K., Subramaniam, V., Katz, B., & Cheung, B. (2025). Look, Then Speak: Social Tokens for Grounding LLMs in Visual Interactions. In NeurIPS 2025 Workshop on Unifying Representations in Neural Models (UniReps).
PDF BibTeX

Garcia, K., & Isik, L. (2025). Aligning Video Models with Human Social Judgments via Behavior-Guided Fine-Tuning. arXiv preprint arXiv:2510.01502.
PDF

Garcia, K., & Isik, L. (2025). Semantic and Social Features Drive Human Grouping of Dynamic, Visual Events in Large-Scale Similarity Judgements. Journal of Vision, 25(9), 2621.
DOI

Garcia, K., McMahon, E., Conwell, C., Bonner, M. F., & Isik, L. (2025). Modeling Dynamic Social Vision Highlights Gaps Between Deep Learning and Humans. In International Conference on Learning Representations (ICLR).
PDF BibTeX

Garcia, K., Conwell, C., McMahon, E., Bonner, M. F., & Isik, L. (2024). Large-scale Deep Neural Network Benchmarking in Dynamic Social Vision. Journal of Vision, 24(10), 716.
DOI

McMahon, E., Conwell, C., Garcia, K., Bonner, M. F., & Isik, L. (2024). Language model prediction of visual cortex responses to dynamic social scenes. Journal of Vision, 24(10), 904.
DOI

Garcia, K., McMahon, E., Conwell, C., Bonner, M. F., & Isik, L. (2024). Dynamic, social vision highlights gaps between deep learning and human behavior and neural responses. Cognitive Computational Neuroscience (CCN).
PDF

Media Coverage

Some of my research has been covered by popular media, including The Wall Street Journal, the Johns Hopkins Hub, Popular Science, Marketplace, and CISAC News.

Selected Honors

Best Oral Presentation Award, LatinX in AI Workshop, ICML (2024)

Sigma Xi Scientific Research Honor Society (2025)

National Eye Institute Early Career Scientist Travel Grant (2025)

John I. Yellott Travel Award for Vision Science, VSS (2025)

FoVea Travel and Networking Award (2024)

Kelly Miller Fellowship, Johns Hopkins University (2021-2025)

Northwestern University Interdepartmental Neuroscience Research Fellowship (2020)

Stanford University El Centro Latino Acknowledgement (2017)

Miscellaneous

Teaching Assistant, Cognitive Neuropsychology of Visual Perception - Spring 2024

Teaching Assistant, Neuroimaging Methods in High-Level Vision - Fall 2023

Teaching Assistant, Computational Cognitive Neuroscience of Vision - Spring 2023