Overview

Our lab develops robust natural language processing models that maintain reliable performance across a wide range of scenarios and resist adversarial attacks. We focus on several key areas:

Model Robustness and Reliability

We work on improving the robustness of language models across different dimensions:

  • Reliable generations and calibration (illustrated in the sketch below)
  • Robustness against visual and textual perturbations
  • Consistent and unbiased evaluation
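
For concreteness, here is a minimal sketch of how calibration is often measured: expected calibration error (ECE) compares a model's stated confidence against its empirical accuracy within confidence bins. This is an illustrative example only, not code from any of the papers listed below.

    # Minimal sketch of expected calibration error (ECE); illustrative only.
    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """Bin predictions by confidence and compare the average confidence
        in each bin to the empirical accuracy in that bin."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(confidences[mask].mean() - correct[mask].mean())
                ece += mask.mean() * gap
        return ece

    # A model that reports 90% confidence but is right only 60% of the time
    # is overconfident, and ECE captures that gap.
    print(expected_calibration_error([0.9] * 5, [1, 1, 1, 0, 0]))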

Security and Privacy

We investigate security and privacy aspects of language models:

  • Detecting and preventing adversarial attacks
  • Proper source attribution
  • Privacy implications and membership inference (see the sketch after this list)
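
To make the membership-inference setting concrete, the sketch below scores a candidate text by its average per-token log-likelihood under a causal language model, the classic loss-thresholding baseline: texts the model finds unusually easy to predict are flagged as likely training members. This is a generic illustration with an assumed stand-in model and threshold; it is not the ReCaLL method or any code released by the lab.

    # Toy likelihood-thresholding baseline for membership inference.
    # The model name and threshold are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def avg_log_likelihood(text: str) -> float:
        """Average per-token log-likelihood of `text` under the model."""
        ids = tok(text, return_tensors="pt")["input_ids"]
        with torch.no_grad():
            out = model(ids, labels=ids)
        # The Hugging Face loss is the mean token cross-entropy; negate it
        # to obtain the mean log-likelihood.
        return -out.loss.item()

    def is_likely_member(text: str, threshold: float = -3.0) -> bool:
        """Flag texts with unusually high likelihood; in practice the threshold
        would be tuned on held-out member/non-member data."""
        return avg_log_likelihood(text) > threshold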

Publications

Enhancing Large Language Models’ Situated Faithfulness to External Contexts
Yukun Huang, Sanxing Chen, Hongyi Cai, Bhuwan Dhingra (2025)
In ICLR.  [ link | arxiv | code | data | twitter ]

Real-time Fake News from Adversarial Feedback
Sanxing Chen, Yukun Huang, Bhuwan Dhingra (2024)
arXiv preprint.  [ arxiv | code | twitter ]

ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Gong, Bhuwan Dhingra (2024)
In EMNLP.  [ link | arxiv | code | twitter ]

Adversarial Math Word Problem Generation
Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra (2024)
arXiv preprint.  [ arxiv | code | twitter ]

Calibrating Long-form Generations from Large Language Models
Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, Bhuwan Dhingra (2024)
In EMNLP Findings.  [ arxiv | code ]

Learning the Legibility of Visual Text Perturbations
Dev Seth, Rickard Stureborg, Danish Pruthi, Bhuwan Dhingra (2023)
In EACL.  [ link | arxiv ]