Overview
Our lab develops robust natural language processing models that maintain reliable performance across diverse scenarios and resist adversarial attacks. We focus on several key areas:
Model Robustness and Reliability
We work on improving the robustness of language models across different dimensions:
- Reliable generations and calibration
- Robustness against visual and textual perturbations
- Consistent and unbiased evaluation
Security and Privacy
We investigate security aspects of language models:
- Detecting and preventing adversarial attacks
- Proper source attribution
- Privacy implications and membership inference
Featured Publications
Enhancing Large Language Models’ Situated Faithfulness to External Contexts
Yukun Huang, Sanxing Chen, Hongyi Cai, Bhuwan Dhingra (2025)
In ICLR. [ link | arxiv | code | data | twitter ]
Real-time Fake News from Adversarial Feedback
Sanxing Chen, Yukun Huang, Bhuwan Dhingra (2024)
arXiv preprint. [ arxiv | code | twitter ]
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Gong, Bhuwan Dhingra (2024)
In EMNLP. [ link | arxiv | code | twitter ]
Adversarial Math Word Problem Generation
Roy Xie, Chengxuan Huang, Junlin Wang, Bhuwan Dhingra (2024)
arXiv preprint. [ arxiv | code | twitter ]
Calibrating Long-form Generations from Large Language Models
Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, Bhuwan Dhingra (2024)
In EMNLP Findings. [ arxiv | code ]
Learning the Legibility of Visual Text Perturbations
Dev Seth, Rickard Stureborg, Danish Pruthi, Bhuwan Dhingra (2023)
In EACL. [ link | arxiv ]