Accepted Papers


All accepted papers are hosted on OpenReview and listed here. OpenReview also provides BibTeX entries for each paper.

🏆 Outstanding Paper Award Dated Data: Tracing Knowledge Cutoffs in Large Language Models
Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

🏆 Outstanding Paper Award Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao

🏆 Outstanding Paper Award AI-generated text boundary detection with RoFT
Laida Kushnareva, Tatiana Gaintseva, Dmitry Abulkhanov, Kristian Kuznetsov, German Magai, Eduard Tulchinskii, Serguei Barannikov, Sergey Nikolenko, Irina Piontkovskaya

🏆 Outstanding Paper Award Auxiliary task demands mask the capabilities of smaller language models
Jennifer Hu, Michael Frank

A Survey on Deep Learning for Theorem Proving
Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

🔦 Spotlight Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
Chunqiu Steven Xia, Yinlin Deng, Lingming Zhang

🔦 Spotlight Transformer Circuit Evaluation Metrics Are Not Robust
Joseph Miller, Bilal Chughtai, William Saunders

🔦 Spotlight Long-Form Answers to Visual Questions from Blind and Low Vision People
Mina Huh, Fangyuan Xu, Yi-Hao Peng, Chongyan Chen, Hansika Murugu, Danna Gurari, Eunsol Choi, Amy Pavel

Locating and Editing Factual Associations in Mamba
Arnab Sen Sharma, David Atkinson, David Bau

Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers
Jong Myoung Kim, Young-Jun Lee, Yong-Jin Han, Ho-Jin Choi, Sangkeun Jung

LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun

Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided
Hongli Zhan, Allen Zheng, Yoon Kyung Lee, Jina Suh, Junyi Jessy Li, Desmond Ong

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy J. Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
Omid Ghahroodi, Marzia Nouri, Mohammad Vali Sanian, Alireza Sahebi, Doratossadat Dastgheib, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban

How Susceptible are LLMs to Influence in Prompts?
Sotiris Anagnostidis, Jannis Bulian

PairEval: Open-domain Dialogue Evaluation Metric with Pairwise Comparisons
ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents
Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh

Investigating Instruction Tuning Large Language Models on Graphs
Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chen-Chuan Chang, Shou-De Lin, Jiawei Han

FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers
Joshua Nathaniel Williams, J Zico Kolter

CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game
Silin Du, Xiaowei Zhang

Factual and Tailored Recommendation Endorsements using Language Models and Reinforcement Learning
Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, ChihWei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li, Junli Wang, Junjie Hu, Ming Jiang

Guiding Language Model Reasoning with Planning Tokens
Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

Large Language Model is not a (Multilingual) Compositional Relation Reasoner
Jinman Zhao, Xueyan Zhang

Instruction Mining: Instruction Data Selection for Tuning Large Language Models
Yihan Cao, Yanbin Kang, Chi Wang, Lichao Sun

Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle

Predicting Emergent Capabilities by Finetuning
Charlie Victor Snell, Eric Wallace, Dan Klein, Sergey Levine

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler

CATS: Context-Aware Thresholding for Sparsity in Large Language Models
Donghyun Lee, Jaeyong Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini

Efficient Hybrid Long Sequence Modeling with State Space Augmented Transformers
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis X Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Does Collaborative Human–LM Dialogue Generation Help Information Extraction from Human–Human Dialogues?
Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf

🔦 Spotlight Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation
Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu

Exploring the Mystery of Influential Data for Mathematical Reasoning
Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models
Chongyan Sun, Ken Lin, Shiwei Wang, Hulong Wu, Chengfei Fu, Zhen Wang

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Yixuan Tang, Yi Yang

How bad is training on synthetic data? A statistical analysis of language model collapse
Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Abdelkader DEBBAH

V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng, Daniel Goldstein, Quentin Gregory Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Teddy Ferdinan, Kranthi Kiran GV, Haowen Hou, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Jian Zhu, Rui-Jie Zhu

Linearizing Large Language Models
Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar

VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning
Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal

OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Zeyu Leo Liu, Yiheng Xu, Hongjin SU, Dongchan Shin, Caiming Xiong, Tao Yu

TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
Haorui Wang, Rongzhi Zhang, Yinghao Li, Lingkai Kong, Yuchen Zhuang, Xiusi Chen, Chao Zhang

Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
François Remy, Pieter Delobelle, Hayastan Avetisyan, Alfiya Khabibullina, Miryam de Lhoneux, Thomas Demeester

RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang, Shishir G Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez

PhonATe: Impact of Type-Written Phonological Features of African American Language on Generative Language Modeling Tasks
Nicholas Deas, Jessica A Grieser, Xinmeng Hou, Shana Kleiner, Tajh Martin, Sreya Nandanampati, Desmond U. Patton, Kathleen McKeown

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank

ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han, Rafal Dariusz Kocielnik, Adhithya Prakash Saravanan, Roy Luoyao Jiang, Or Sharir, Anima Anandkumar

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, Bowen Zhou

Resolving Knowledge Conflicts in Large Language Models
Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang, Richard He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Joshua M. Susskind, Navdeep Jaitly

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li, Rulin Shao, Anze Xie, Eric Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

Web Retrieval Agents for Evidence-Based Misinformation Detection
Jacob-Junqi Tian, Hao Yu, Yury Orlovskiy, Tyler Vergho, Mauricio Rivera, Mayank Goel, Zachary Yang, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine

Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
Lin Guan, Yifan Zhou, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati

Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

Reasoning about concepts with LLMs: Inconsistencies abound
Rosario Uceda Sosa, Karthikeyan Natesan Ramamurthy, Maria Chang, Moninder Singh

Logits of API-Protected LLMs Leak Proprietary Information
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Eric Zelikman, Georges Raif Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah Goodman

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Roziere, Jacob Kahn, Shang-Wen Li, Wen-tau Yih, Jason E Weston, Xian Li

AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts
Zefang Liu, Jiahua Luo

Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut

An Incomplete Loop: Instruction Inference, Instruction Following, and In-Context Learning in Language Models
Emmy Liu, Graham Neubig, Jacob Andreas

Learning to Plan for Language Modeling from Unlabeled Data
Nathan Cornille, Marie-Francine Moens, Florian Mai

Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao, Dana Alon, Donald Metzler

🔦 Spotlight SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

Eliciting Latent Knowledge from "Quirky" Language Models
Alex Troy Mallen, Madeline Brumley, Julia Kharchenko, Nora Belrose

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name
Yoonsang Lee, Xi Ye, Eunsol Choi

Is ChatGPT a Good Sentiment Analyzer?
Zengzhi Wang, Qiming Xie, Yi Feng, Zixiang Ding, Zinong Yang, Rui Xia

Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang

LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset
Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, Huan Sun

Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
Yilun Hua, Yoav Artzi

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu

CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices
Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang, Kuai Li, Chen Chen, Tao Yang, Maosong Sun

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz

Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao, Junbo Li, Bowen Tan, Hongyi Wang, William Marshall, Bhargav M Kanakiya, Joel Hestness, Natalia Vassilieva, Zhiqiang Shen, Eric Xing, Zhengzhong Liu

🔦 Spotlight GeniL: A Multilingual Dataset on Generalizing Language
Aida Mostafazadeh Davani, Sagar Gubbi Venkatesh, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran

RULER: What’s the Real Context Size of Your Long-Context Language Models?
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

Can Language Models Solve Olympiad Programming?
Quan Shi, Michael Tang, Karthik R Narasimhan, Shunyu Yao

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan, Eric Chamoun, Rami Aly, Chenxi Whitehouse, Andreas Vlachos

Bring Your Own Data! Self-Sensitivity Evaluation for Large Language Models
Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee

DeStein: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li, Han Jiang, Chuanyang Gong, Zhihua Wei

Understanding Retrieval Augmentation for Long-Form Question Answering
Hung-Ting Chen, Fangyuan Xu, Shane Arora, Eunsol Choi

LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification
Zhen Qin, Junru Wu, Jiaming Shen, Tianqi Liu, Xuanhui Wang

LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu, Zhenmei Shi, Yingyu Liang

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning
Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan

NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal, Tushar Aggarwal, Abhijeet Awasthi, Nagarajan Natarajan, Aditya Kanade

TarGEN: Targeted Data Generation with Large Language Models
Himanshu Gupta, Kevin Scaria, Ujjwala Anantheswaran, Shreyas Verma, Mihir Parmar, Saurabh Arjun Sawant, Chitta Baral, Swaroop Mishra

Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori, Thomas Serre, Ellie Pavlick

UniMem: Towards a Unified View of Long-Context Large Language Models
Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun

Towards Verifiable Text Generation with Symbolic References
Lucas Torroba Hennigen, Zejiang Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen, Xidong Wang, Ke Ji, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Song Dingjie, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang

LMD3: Language Model Data Density Dependence
John Kirchenbauer, Garrett Honke, Gowthami Somepalli, Jonas Geiping, Katherine Lee, Daphne Ippolito, Tom Goldstein, David Andre

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara

🔦 Spotlight Tuning Language Models by Proxy
Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith

Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Haoran Ranran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

HDT: Hierarchical Document Transformer
Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger

With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation
Yan Wang, Dongyang Ma, Deng Cai

CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text
Zhenru Lin, Yiqun Yao, Yang Yuan

Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao, Han Wu, Zhijiang Guo, Zhou Biyan, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini

Scalable Model Editing via Customized Expert Networks
Zihan Yao, Yu He, Tianyu Qi, Ming Li

Fine-grained Hallucination Detection and Editing for Language Models
Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi

ORAG: Ontology-Guided Retrieval-Augmented Generation for Theme-Specific Entity Typing
Jinfeng Xiao, Linyi Ding, James Barry, Mohab Elkaref, Geeth De Mel, Jiawei Han

Unified View of Grokking, Double Descent and Emergent Abilities: A Comprehensive Study on Algorithm Task
Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee

Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner, Dongyang Fan, Martin Jaggi

Benchmarks as Microscopes: A Call for Model Metrology
Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra

Tabular Transfer Learning via Prompting LLMs
Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, Jinwoo Shin

How Multilingual are Large Language Models Fine-tuned for Translation?
Aquia Richburg, Marine Carpuat

O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models
Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani Sehwag, Jared Vann, Deepeka Garg, Sumitra Ganesh

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu

Do Membership Inference Attacks Work on Large Language Models?
Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi

Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
James A. Michaelov, Catherine Arnett, Ben Bergen

🔦 Spotlight The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark

Hummer: Towards Limited Competitive Preference Dataset
Yusen Wu, Li Jiang, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall, Edward Emanuel Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro Von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M Rush, Thomas Wolf

Nonparametric Variational Regularisation of Pretrained Transformers
Fabio James Fehr, James Henderson

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Jiri Hron, Laura A Culp, Gamaleldin Fathy Elsayed, Rosanne Liu, Jasper Snoek, Simon Kornblith, Alex Rizkowsky, Isabelle Simpson, Jascha Sohl-Dickstein, Noah Fiedel, Aaron T Parisi, Alexander A Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Izzeddin Gur, Jaehoon Lee, John D Co-Reyes, Kathleen Kenealy, Kelvin Xu, Kevin Swersky, Igor Mordatch, Lechao Xiao, Maxwell Bileschi, Peter J Liu, Roman Novak, Sharad Vikram, Tris Warkentin, Jeffrey Pennington

Redesigning Information Markets in the Era of Language Models
Martin Weiss, Nasim Rahaman, Manuel Wuthrich, Yoshua Bengio, Li Erran Li, Bernhard Schölkopf, Christopher Pal

Large Language Model Routing with Benchmark Datasets
Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin

Language Models as Critical Thinking Tools: A Case Study of Philosophers
Andre Ye, Jared Moore, Rose Novick, Amy X Zhang

Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates
Avanika Narayan, Mayee F Chen, Kush Bhatia, Christopher Re

Prompt Exploration with Prompt Regression
Michael Feffer, Ronald Xu, Yuekai Sun, Mikhail Yurochkin

FABLES: Evaluating faithfulness and content selection in book-length summarization
Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer

Mapping the Increasing Use of LLMs in Scientific Papers
Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

Scattered Mixture-of-Experts Implementation
Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville

What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang

Data Checklist: On Unit-Testing Datasets with Usable Information
Heidi Chenyu Zhang, Shabnam Behzad, Kawin Ethayarajh, Dan Jurafsky

MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
Vera Neplenbroek, Arianna Bisazza, Raquel Fernández

MambaByte: Token-free Selective State Space Model
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush

Description-Based Text Similarity
Shauli Ravfogel, Valentina Pyatkin, Amir David Nissan Cohen, Avshalom Manevich, Yoav Goldberg

Generating Synthetic Datasets for Few-shot Prompt Tuning
Xu Guo, Zilin Du, Boyang Li, Chunyan Miao

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana, Eric Nalisnick, Antske Fokkens, Swabha Swayamdipta

Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
Xueying Bai, Yifan Sun, Niranjan Balasubramanian

An In-Context Learning Agent for Formal Theorem-Proving
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, Swarat Chaudhuri

Efficient Parallelization Layouts for Large-Scale Distributed Model Training
Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo

Unforgettable Generalization in Language Models
Eric Zhang, Leshem Choshen, Jacob Andreas

MileBench: Benchmarking MLLMs in Long Context
Song Dingjie, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng

A Language Agent for Autonomous Driving
Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang

LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin

🔦 Spotlight GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

Stronger Random Baselines for In-Context Learning
Gregory Yauney, David Mimno

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

Decoupling Noise and Toxic Parameters for Language Model Detoxification by Task Vector Merging
Yongmin Kim, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo

Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Guillem Ramírez, Alexandra Birch, Ivan Titov

Adaptive Quantization Error Reconstruction for LLMs with Mixed Precision
Lin Ou, Jinpeng Xia, Yuewei Zhang, Chuzhan Hao, Hao Henry Wang

StyleTalker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li, Xilin Jiang, Jordan Darefsky, Ge Zhu, Nima Mesgarani

🔦 Spotlight Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr

Compression Represents Intelligence Linearly
Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian

How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, Yanghua Xiao

Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas
Louis Kwok, Michal Bravansky, Lewis Griffin

Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su

LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory
Jillian Ross, Yoon Kim, Andrew Lo

CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias
Vipul Gupta, Pranav Narayanan Venkit, Hugo Laurençon, Shomir Wilson, Rebecca J. Passonneau

Chinese Tiny LLM: Pretraining a Chinese-Centered Large Language Model
Xeron Du, Zhouliang Yu, Songyang Gao, Ding Pan, Cheng Yuyang, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Wenhu Chen, Ge Zhang

Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa, Jifan Chen, Junyi Jessy Li, Greg Durrett

LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric Xing

From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies
Shuxian Fan, Adam Visokay, Kentaro Hoffman, Stephen Salerno, Li Liu, Jeffrey T. Leek, Tyler McCormick

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi

"Merge Conflicts!'" Exploring the Impacts of External Knowledge Distractors to Parametric Knowledge Graphs
Cheng Qian, Xinran Zhao, Tongshuang Wu

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Adam Karvonen

AgentKit: Structured LLM Reasoning with Dynamic Graphs
Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen Marcus McAleer, Russ Salakhutdinov, Yonatan Bisk, Yuanzhi Li, Tom Mitchell

A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong

Best Practices and Lessons Learned on Synthetic Data
Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

Let’s Think Dot by Dot: Hidden computation in transformer language models
Jacob Pfau, William Merrill, Samuel R. Bowman

Multi-hop Question Answering under Temporal Knowledge Editing
Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal

Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

Building a Large Japanese Web Corpus for Large Language Models
Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki

Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Nathan Godey, Éric Villemonte de la Clergerie, Benoît Sagot

Are Language Models Robust Coreference Resolvers?
Nghia T. Le, Alan Ritter

Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

ScenicNL: Generating Probabilistic Scenario Programs from Natural Language
Karim Elmaaroufi, Devan Shanker, Ana Cismaru, Marcell Vazquez-Chanlatte, Alberto Sangiovanni-Vincentelli, Matei Zaharia, Sanjit A. Seshia

Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad Andrei Negru, Vasile Suciu, Mihai Surdeanu

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey
Philipp Mondorf, Barbara Plank

Forklift: An Extensible Neural Lifter
Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael O'Boyle

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis

What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross, Melissa Hall, Adriana Romero-Soriano, Adina Williams

Empowering Large Language Model Agents through Action Learning
Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi-Hong Deng, Hongxia Yang

On Limitations of the Transformer Architecture
Binghui Peng, Srini Narayanan, Christos Papadimitriou

IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger

On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding, Ken Liu, Pura Peetathawatchai, Berivan Isik, Sanmi Koyejo

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Information-Theoretic Distillation for Reference-less Summarization
Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

Faithful and Unfaithful Error Recovery in Chain of Thought
Evelyn Yee, Alice Li, Chenyu Tang, Yeonho Jung, Ramamohan Paturi, Leon Bergen

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun

Evaluating Language Models for Efficient Code Generation
Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, Lingming Zhang

Early Weight Averaging meets High Learning Rates for LLM Pre-training
Sunny Sanyal, Atula Tejaswi, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi

Chain-of-Symbol Prompting For Spatial Reasoning in Large Language Models
Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang

What is in Your Safe Data? Identifying Benign Data that Breaks Safety
Luxi He, Mengzhou Xia, Peter Henderson

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen

Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt, Harsha Nori, Vanessa Cristiny Rodrigues Vasconcelos, Besmira Nushi, Rich Caruana

Reverse Training to Nurse the Reversal Curse
Olga Golovneva, Zeyuan Allen-Zhu, Jason E Weston, Sainbayar Sukhbaatar

LLM4Causal: Democratized Causal Tools for Everyone via Large Language Model
Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song

🔦 Spotlight Starling-7B: Improving Helpfulness and Harmlessness with RLAIF
Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, Jiantao Jiao

RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro

JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, Chaowei Xiao

🔦 Spotlight A Long Way to Go: Investigating Length Correlations in RLHF
Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett

Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers
Andy Yang, David Chiang

CoLLEGe: Concept Embedding Generation for Large Language Models
Ryan Teehan, Brenden M. Lake, Mengye Ren

CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, Zhenguo Li

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

Model Autophagy Analysis to Explicate Self-consumption within Human-AI Interactions
Shu Yang, Muhammad Asif Ali, Lu Yu, Lijie Hu, Di Wang

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal

Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J Zico Kolter, Zhuang Liu

Suspicion Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Jiaxian Guo, Bo Yang, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, Yutaka Matsuo

Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models
Simon Yu, Jie He, Pasquale Minervini, Jeff Z. Pan

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Alex Zhuang, Ge Zhang, Tianyu Zheng, Xeron Du, Junjie Wang, Weiming Ren, Wenhao Huang, Jie Fu, Xiang Yue, Wenhu Chen

D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett

🔦 Spotlight Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
Duarte Miguel Alves, José Pombal, Nuno M Guerreiro, Pedro Henrique Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, Andre Martins

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

Self-Guide: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Graham Neubig, Tongshuang Wu

3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation
Huaisheng Zhu, Teng Xiao, Vasant G Honavar

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Huihan Li, Liwei Jiang, Nouha Dziri, Xiang Ren, Yejin Choi

LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models
Haoran Li, Junqi Liu, Zexian Wang, Shiyuan Luo, Xiaowei Jia, Huaxiu Yao

CTIKG: LLM-Powered Knowledge Graph Construction from Cyber Threat Intelligence
Liangyi Huang, Xusheng Xiao

Enhancing Adversarial Robustness of LLMs with Analytic Hierarchy Process
Jiahao Zhao, Minzheng Wang, Nan Xu, Yin Luo, Wenji Mao

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

Length-Controlled AlpacaEval: A Simple Debiasing of Automatic Evaluators
Yann Dubois, Percy Liang, Tatsunori Hashimoto

STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Chinmaya Andukuri, Jan-Philipp Fränken, Tobias Gerstenberg, Noah Goodman

Should We Attend More or Less? Modulating Attention for Fairness
Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar

On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Yung-Sung Chuang, Luca Daniel

Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval Augmented Parsing with Expert Knowledge
Mohammad Saqib Hasan, Sayontan Ghosh, Dhruv Verma, Geoff Kuenning, Erez Zadok, Scott Smolka, Niranjan Balasubramanian

Do Language Models Plan Ahead for Future Tokens?
Wilson Wu, John Xavier Morris, Lionel Levine

Automata-based constraints for language model decoding
Terry Koo, Frederick Liu, Luheng He

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang

TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J Zico Kolter

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D. Burke, Heng Ji, Tom Hope

Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez, Belinda Z. Li, Jacob Andreas

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier

CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs
Jingzhe Shi, Jialuo Li, Qinwei Ma, Zaiwen Yang, Huan Ma, Lei Li

Forcing Diffuse Distributions out of Language Models
Yiming Zhang, Avi Schwarzschild, Nicholas Carlini, J Zico Kolter, Daphne Ippolito

Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin

TMMLU+: An Improved Traditional Chinese Evaluation Suite for Foundation Models
Zhi Rui Tam, Ya Ting Pai, Yen-Wei Lee, Hong-Han Shuai, Jun-Da Chen, Wei Min Chu, Sega Cheng

BumbleBee: Dynamic KV-Cache Streaming Submodular Summarization for Infinite-Context Transformers
Lilly Kumari, Shengjie Wang, Tianyi Zhou, Nikhil Sarda, Anthony Rowe, Jeff Bilmes

Keep the Cost Down: A Review on Methods to Optimize LLM’s KV-Cache Consumption
Shi Luohe, Hongyi Zhang, Yao Yao, Zuchao Li, Hai Zhao

PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models
Siddharth Mishra-Sharma, Yiding Song, Jesse Thaler

IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Haz Sameen Shahgir, Khondker Salman Sayeed, Abhik Bhattacharjee, Wasi Uddin Ahmad, Yue Dong, Rifat Shahriyar

Yes, no, maybe? Revisiting language models' response stability under paraphrasing for the assessment of political leaning
Patrick Haller, Jannis Vamvas, Lena Ann Jäger

Measuring Taiwanese Mandarin Language Understanding
Po-Heng Chen, Sijia Cheng, Wei-Lin Chen, Yen-Ting Lin, Yun-Nung Chen

Pairwise Proximal Policy Optimization: Language Model Alignment with Comparative RL
Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

Poly-Visual-Expert Vision-Language Models
Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang, Rui Zheng, Zhiheng Xi, Yuhao Zhou, Shihan Dou, Junjie Ye, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong

MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Hongyuan Mei, Matthew Walter

ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yuxuan Wang, Alan Yuille, Zhuowan Li, Zilong Zheng

🔦 Spotlight Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alexander D'Amour, Krishnamurthy Dj Dvijotham, Adam Fisch, Katherine A Heller, Stephen R Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant

SteP: Stacked LLM Policies for Web Actions
Paloma Sodhi, S.R.K Branavan, Yoav Artzi, Ryan McDonald

LLM-Datasets: An Open Framework for Pretraining Datasets of Large Language Models
Malte Ostendorff, Pedro Ortiz Suarez, Lucas Fonseca Lage, Georg Rehm

Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
Costas Mavromatis, Petros Karypis, George Karypis

Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Tomasz Korbak, Henry Sleight, Rajashree Agrawal, John Hughes, Dhruv Bhandarkar Pai, Andrey Gromov, Dan Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo

Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
Harry Dong, Beidi Chen, Yuejie Chi

WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles, Sam Miller, Patricio Cerda-Mardini, Tanaya Guha, Victor Sanchez, Bertie Vidgen

🔦 Spotlight Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai

Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues
KuanChao Chu, Yi-Pei Chen, Hideki Nakayama

StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
Yiran Wu, Tianwei Yue, Shaokun Zhang, Chi Wang, Qingyun Wu

🔦 Spotlight MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu, Yuge Tu, Xu Han, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Xinrong Zhang, Zhen Leng Thai, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

Bot or Human? Detecting ChatGPT Imposters with A Single Question
Hong Wang, Xuan Luo, Weizhi Wang, Melody Yu, Xifeng Yan

🔦 Spotlight Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs
Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald

Enhancing Language Models with Idiomatic Reasoning
Jianing Zhou, Ziheng Zeng, Hongyu Gong, Suma Bhat

ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation
Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui

ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
Mingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng Zhang

🔦 Spotlight Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi, Denise H J Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah Goodman

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
Declan Grabb, Max Lamparth, Nina Vasan

Prompt Public Large Language Models to Synthesize Data for Private On-device Applications
Shanshan Wu, Zheng Xu, Yanxiang Zhang, Yuanbo Zhang, Daniel Ramage

Agent-DocEdit: Language-Instructed LLM Agent for Content-Rich Document Editing
Te-Lin Wu, Rajiv Jain, Yufan Zhou, Puneet Mathur, Vlad I Morariu

From Strategic Narratives to Code-Like Cognitive Models: An LLM-Based Approach in A Sorting Task
Hanbo Xie, Hua-Dong Xiong, Robert Wilson

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval
Griffin Thomas Adams, Jason Zucker, Noémie Elhadad

Effective Prompt Extraction from Language Models
Yiming Zhang, Nicholas Carlini, Daphne Ippolito

ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training
Zonghan Yang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu

InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification
Yujia Hu, Zhiqiang Hu, Chun Wei Seah, Roy Ka-Wei Lee