Accepted Papers
All accepted papers are hosted on OpenReview and listed here. OpenReview also provides BibTeX entries for each paper.
🏆 Outstanding Paper Award Dated Data: Tracing Knowledge Cutoffs in Large Language Models
Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme
🏆 Outstanding Paper Award Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
🏆 Outstanding Paper Award AI-generated text boundary detection with RoFT
Laida Kushnareva, Tatiana Gaintseva, Dmitry Abulkhanov, Kristian Kuznetsov, German Magai, Eduard Tulchinskii, Serguei Barannikov, Sergey Nikolenko, Irina Piontkovskaya
🏆 Outstanding Paper Award Auxiliary task demands mask the capabilities of smaller language models
Jennifer Hu, Michael Frank
A Survey on Deep Learning for Theorem Proving
Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
🔦 Spotlight Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
Chunqiu Steven Xia, Yinlin Deng, Lingming Zhang
🔦 Spotlight Transformer Circuit Evaluation Metrics Are Not Robust
Joseph Miller, Bilal Chughtai, William Saunders
🔦 Spotlight Long-Form Answers to Visual Questions from Blind and Low Vision People
Mina Huh, Fangyuan Xu, Yi-Hao Peng, Chongyan Chen, Hansika Murugu, Danna Gurari, Eunsol Choi, Amy Pavel
Locating and Editing Factual Associations in Mamba
Arnab Sen Sharma, David Atkinson, David Bau
Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers
Jong Myoung Kim, Young-Jun Lee, Yong-Jin Han, Ho-Jin Choi, Sangkeun Jung
LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun
Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided
Hongli Zhan, Allen Zheng, Yoon Kyung Lee, Jina Suh, Junyi Jessy Li, Desmond Ong
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy J. Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev
Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?
Omid Ghahroodi, Marzia Nouri, Mohammad Vali Sanian, Alireza Sahebi, Doratossadat Dastgheib, Ehsaneddin Asgari, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban
How Susceptible are LLMs to Influence in Prompts?
Sotiris Anagnostidis, Jannis Bulian
PairEval: Open-domain Dialogue Evaluation Metric with Pairwise Comparisons
ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo
HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents
Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh
Investigating Instruction Tuning Large Language Models on Graphs
Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chen-Chuan Chang, Shou-De Lin, Jiawei Han
FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers
Joshua Nathaniel Williams, J Zico Kolter
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark
Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game
Silin Du, Xiaowei Zhang
Factual and Tailored Recommendation Endorsements using Language Models and Reinforcement Learning
Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, ChihWei Hsu, Mohammad Ghavamzadeh, Craig Boutilier
How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li, Junli Wang, Junjie Hu, Ming Jiang
Guiding Language Model Reasoning with Planning Tokens
Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni
Large Language Model is not a (Multilingual) Compositional Relation Reasoner
Jinman Zhao, Xueyan Zhang
Instruction Mining: Instruction Data Selection for Tuning Large Language Models
Yihan Cao, Yanbin Kang, Chi Wang, Lichao Sun
Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle
Predicting Emergent Capabilities by Finetuning
Charlie Victor Snell, Eric Wallace, Dan Klein, Sergey Levine
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner, Yang Gao, Dana Alon, Donald Metzler
CATS: Context-Aware Thresholding for Sparsity in Large Language Models
Donghyun Lee, Jaeyong Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
Efficient Hybrid Long Sequence Modeling with State Space Augmented Transformers
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis X Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao
Does Collaborative Human–LM Dialogue Generation Help Information Extraction from Human–Human Dialogues?
Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf
🔦 Spotlight Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation
Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu
Exploring the Mystery of Influential Data for Mathematical Reasoning
Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models
Chongyan Sun, Ken Lin, Shiwei Wang, Hulong Wu, Chengfei Fu, Zhen Wang
Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Yixuan Tang, Yi Yang
How bad is training on synthetic data? A statistical analysis of language model collapse
Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Abdelkader DEBBAH
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng, Daniel Goldstein, Quentin Gregory Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Teddy Ferdinan, Kranthi Kiran GV, Haowen Hou, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Jian Zhu, Rui-Jie Zhu
Linearizing Large Language Models
Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar
VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning
Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Zeyu Leo Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
Haorui Wang, Rongzhi Zhang, Yinghao Li, Lingkai Kong, Yuchen Zhuang, Xiusi Chen, Chao Zhang
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP
François Remy, Pieter Delobelle, Hayastan Avetisyan, Alfiya Khabibullina, Miryam de Lhoneux, Thomas Demeester
RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang, Shishir G Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
PhonATe: Impact of Type-Written Phonological Features of African American Language on Generative Language Modeling Tasks
Nicholas Deas, Jessica A Grieser, Xinmeng Hou, Shana Kleiner, Tajh Martin, Sreya Nandanampati, Desmond U. Patton, Kathleen McKeown
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao, Tina Behnia, Vala Vakilian, Christos Thrampoulidis
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Xinpeng Wang, Chengzhi Hu, Bolei Ma, Paul Röttger, Barbara Plank
ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs
Pengrui Han, Rafal Dariusz Kocielnik, Adhithya Prakash Saravanan, Roy Luoyao Jiang, Or Sharir, Anima Anandkumar
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, Bowen Zhou
Resolving Knowledge Conflicts in Large Language Models
Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
How Far Are We from Intelligent Visual Deductive Reasoning?
Yizhe Zhang, Richard He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Joshua M. Susskind, Navdeep Jaitly
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Dacheng Li, Rulin Shao, Anze Xie, Eric Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang
Web Retrieval Agents for Evidence-Based Misinformation Detection
Jacob-Junqi Tian, Hao Yu, Yury Orlovskiy, Tyler Vergho, Mauricio Rivera, Mayank Goel, Zachary Yang, Jean-François Godbout, Reihaneh Rabbany, Kellin Pelrine
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
Lin Guan, Yifan Zhou, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati
Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap
Reasoning about concepts with LLMs: Inconsistencies abound
Rosario Uceda Sosa, Karthikeyan Natesan Ramamurthy, Maria Chang, Moninder Singh
Logits of API-Protected LLMs Leak Proprietary Information
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Eric Zelikman, Georges Raif Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah Goodman
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Roziere, Jacob Kahn, Shang-Wen Li, Wen-tau Yih, Jason E Weston, Xian Li
AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts
Zefang Liu, Jiahua Luo
Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut
An Incomplete Loop: Instruction Inference, Instruction Following, and In-Context Learning in Language Models
Emmy Liu, Graham Neubig, Jacob Andreas
Learning to Plan for Language Modeling from Unlabeled Data
Nathan Cornille, Marie-Francine Moens, Florian Mai
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao, Dana Alon, Donald Metzler
🔦 Spotlight SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin
Eliciting Latent Knowledge from "Quirky" Language Models
Alex Troy Mallen, Madeline Brumley, Julia Kharchenko, Nora Belrose
AmbigDocs: Reasoning across Documents on Different Entities under the Same Name
Yoonsang Lee, Xi Ye, Eunsol Choi
Is ChatGPT a Good Sentiment Analyzer?
Zengzhi Wang, Qiming Xie, Yi Feng, Zixiang Ding, Zinong Yang, Rui Xia
Multi-FAct: Assessing Factuality of Multilingual LLMs using FActScore
Sheikh Shafayat, Eunsu Kim, Juhyun Oh, Alice Oh
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset
Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, Huan Sun
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
Yilun Hua, Yoav Artzi
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu
CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices
Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang, Kuai Li, Chen Chen, Tao Yang, Maosong Sun
Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz
Crystal: Illuminating LLM Abilities on Language and Code
Tianhua Tao, Junbo Li, Bowen Tan, Hongyi Wang, William Marshall, Bhargav M Kanakiya, Joel Hestness, Natalia Vassilieva, Zhiqiang Shen, Eric Xing, Zhengzhong Liu
🔦 Spotlight GeniL: A Multilingual Dataset on Generalizing Language
Aida Mostafazadeh Davani, Sagar Gubbi Venkatesh, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran
RULER: What’s the Real Context Size of Your Long-Context Language Models?
Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall
Can Language Models Solve Olympiad Programming?
Quan Shi, Michael Tang, Karthik R Narasimhan, Shunyu Yao
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan, Eric Chamoun, Rami Aly, Chenxi Whitehouse, Andreas Vlachos
Bring Your Own Data! Self-Sensitivity Evaluation for Large Language Models
Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng, Wonjun Kang, Yicong Chen, Hyung Il Koo, Kangwook Lee
DeStein: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li, Han Jiang, Chuanyang Gong, Zhihua Wei
Understanding Retrieval Augmentation for Long-Form Question Answering
Hung-Ting Chen, Fangyuan Xu, Shane Arora, Eunsol Choi
LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification
Zhen Qin, Junru Wu, Jiaming Shen, Tianqi Liu, Xuanhui Wang
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning
Quanyu Long, Yin Wu, Wenya Wang, Sinno Jialin Pan
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal, Tushar Aggarwal, Abhijeet Awasthi, Nagarajan Natarajan, Aditya Kanade
TarGEN: Targeted Data Generation with Large Language Models
Himanshu Gupta, Kevin Scaria, Ujjwala Anantheswaran, Shreyas Verma, Mihir Parmar, Saurabh Arjun Sawant, Chitta Baral, Swaroop Mishra
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori, Thomas Serre, Ellie Pavlick
UniMem: Towards a Unified View of Long-Context Large Language Models
Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun
Towards Verifiable Text Generation with Symbolic References
Lucas Torroba Hennigen, Zejiang Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen, Xidong Wang, Ke Ji, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Song Dingjie, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang
LMD3: Language Model Data Density Dependence
John Kirchenbauer, Garrett Honke, Gowthami Somepalli, Jonas Geiping, Katherine Lee, Daphne Ippolito, Tom Goldstein, David Andre
The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara
🔦 Spotlight Tuning Language Models by Proxy
Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Haoran Ranran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang
HDT: Hierarchical Document Transformer
Haoyu He, Markus Flicke, Jan Buchmann, Iryna Gurevych, Andreas Geiger
With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation
Yan Wang, Dongyang Ma, Deng Cai
CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text
Zhenru Lin, Yiqun Yao, Yang Yuan
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao, Han Wu, Zhijiang Guo, Zhou Biyan, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song
Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini
Scalable Model Editing via Customized Expert Networks
Zihan Yao, Yu He, Tianyu Qi, Ming Li
Fine-grained Hallucination Detection and Editing for Language Models
Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi
ORAG: Ontology-Guided Retrieval-Augmented Generation for Theme-Specific Entity Typing
Jinfeng Xiao, Linyi Ding, James Barry, Mohab Elkaref, Geeth De Mel, Jiawei Han
Unified View of Grokking, Double Descent and Emergent Abilities: A Comprehensive Study on Algorithm Task
Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner, Dongyang Fan, Martin Jaggi
Benchmarks as Microscopes: A Call for Model Metrology
Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra
Tabular Transfer Learning via Prompting LLMs
Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, Jinwoo Shin
How Multilingual are Large Language Models Fine-tuned for Translation?
Aquia Richburg, Marine Carpuat
O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models
Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani Sehwag, Jared Vann, Deepeka Garg, Sumitra Ganesh
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu
Do Membership Inference Attacks Work on Large Language Models?
Michael Duan, Anshuman Suri, Niloofar Mireshghallah, Sewon Min, Weijia Shi, Luke Zettlemoyer, Yulia Tsvetkov, Yejin Choi, David Evans, Hannaneh Hajishirzi
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics
James A. Michaelov, Catherine Arnett, Ben Bergen
🔦 Spotlight The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks, Max Tegmark
Hummer: Towards Limited Competitive Preference Dataset
Yusen Wu, Li Jiang, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall, Edward Emanuel Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro Von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M Rush, Thomas Wolf
Nonparametric Variational Regularisation of Pretrained Transformers
Fabio James Fehr, James Henderson
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Jiri Hron, Laura A Culp, Gamaleldin Fathy Elsayed, Rosanne Liu, Jasper Snoek, Simon Kornblith, Alex Rizkowsky, Isabelle Simpson, Jascha Sohl-Dickstein, Noah Fiedel, Aaron T Parisi, Alexander A Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Izzeddin Gur, Jaehoon Lee, John D Co-Reyes, Kathleen Kenealy, Kelvin Xu, Kevin Swersky, Igor Mordatch, Lechao Xiao, Maxwell Bileschi, Peter J Liu, Roman Novak, Sharad Vikram, Tris Warkentin, Jeffrey Pennington
Redesigning Information Markets in the Era of Language Models
Martin Weiss, Nasim Rahaman, Manuel Wuthrich, Yoshua Bengio, Li Erran Li, Bernhard Schölkopf, Christopher Pal
Large Language Model Routing with Benchmark Datasets
Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
Language Models as Critical Thinking Tools: A Case Study of Philosophers
Andre Ye, Jared Moore, Rose Novick, Amy X Zhang
Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates
Avanika Narayan, Mayee F Chen, Kush Bhatia, Christopher Re
Prompt Exploration with Prompt Regression
Michael Feffer, Ronald Xu, Yuekai Sun, Mikhail Yurochkin
FABLES: Evaluating faithfulness and content selection in book-length summarization
Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer
Mapping the Increasing Use of LLMs in Scientific Papers
Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
Scattered Mixture-of-Experts Implementation
Shawn Tan, Yikang Shen, Rameswar Panda, Aaron Courville
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang
Data Checklist: On Unit-Testing Datasets with Usable Information
Heidi Chenyu Zhang, Shabnam Behzad, Kawin Ethayarajh, Dan Jurafsky
MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
Vera Neplenbroek, Arianna Bisazza, Raquel Fernández
MambaByte: Token-free Selective State Space Model
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush
Description-Based Text Similarity
Shauli Ravfogel, Valentina Pyatkin, Amir David Nissan Cohen, Avshalom Manevich, Yoav Goldberg
Generating Synthetic Datasets for Few-shot Prompt Tuning
Xu Guo, Zilin Du, Boyang Li, Chunyan Miao
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana, Eric Nalisnick, Antske Fokkens, Swabha Swayamdipta
Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective
Xueying Bai, Yifan Sun, Niranjan Balasubramanian
An In-Context Learning Agent for Formal Theorem-Proving
Amitayush Thakur, George Tsoukalas, Yeming Wen, Jimmy Xin, Swarat Chaudhuri
Efficient Parallelization Layouts for Large-Scale Distributed Model Training
Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo
Unforgettable Generalization in Language Models
Eric Zhang, Leshem Choshen, Jacob Andreas
MileBench: Benchmarking MLLMs in Long Context
Song Dingjie, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
A Language Agent for Autonomous Driving
Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin
🔦 Spotlight GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
Stronger Random Baselines for In-Context Learning
Gregory Yauney, David Mimno
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki
Decoupling Noise and Toxic Parameters for Language Model Detoxification by Task Vector Merging
Yongmin Kim, Takeshi Kojima, Yusuke Iwasawa, Yutaka Matsuo
Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection
Guillem Ramírez, Alexandra Birch, Ivan Titov
Adaptive Quantization Error Reconstruction for LLMs with Mixed Precision
Lin Ou, Jinpeng Xia, Yuewei Zhang, Chuzhan Hao, Hao Henry Wang
StyleTalker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li, Xilin Jiang, Jordan Darefsky, Ge Zhu, Nima Mesgarani
🔦 Spotlight Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images
Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr
Compression Represents Intelligence Linearly
Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian
How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, Yanghua Xiao
Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas
Louis Kwok, Michal Bravansky, Lewis Griffin
Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning
Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su
LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory
Jillian Ross, Yoon Kim, Andrew Lo
CALM: A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias
Vipul Gupta, Pranav Narayanan Venkit, Hugo Laurençon, Shomir Wilson, Rebecca J. Passonneau
Chinese Tiny LLM: Pretraining a Chinese-Centered Large Language Model
Xeron Du, Zhouliang Yu, Songyang Gao, Ding Pan, Cheng Yuyang, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Wenhu Chen, Ge Zhang
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa, Jifan Chen, Junyi Jessy Li, Greg Durrett
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, Preslav Nakov, Timothy Baldwin, Eric Xing
From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies
Shuxian Fan, Adam Visokay, Kentaro Hoffman, Stephen Salerno, Li Liu, Jeffrey T. Leek, Tyler McCormick
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi
"Merge Conflicts!'" Exploring the Impacts of External Knowledge Distractors to Parametric Knowledge Graphs
Cheng Qian, Xinran Zhao, Tongshuang Wu
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
Adam Karvonen
AgentKit: Structured LLM Reasoning with Dynamic Graphs
Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen Marcus McAleer, Russ Salakhutdinov, Yonatan Bisk, Yuanzhi Li, Tom Mitchell
A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong
Best Practices and Lessons Learned on Synthetic Data
Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai
Let’s Think Dot by Dot: Hidden computation in transformer language models
Jacob Pfau, William Merrill, Samuel R. Bowman
Multi-hop Question Answering under Temporal Knowledge Editing
Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal
Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
Building a Large Japanese Web Corpus for Large Language Models
Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Nathan Godey, Éric Villemonte de la Clergerie, Benoît Sagot
Are Language Models Robust Coreference Resolvers?
Nghia T. Le, Alan Ritter
Information Guided Regularization for Fine-tuning Language Models
Mandar Sharma, Nikhil Muralidhar, Shengzhe Xu, Raquib Bin Yousuf, Naren Ramakrishnan
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei
ScenicNL: Generating Probabilistic Scenario Programs from Natural Language
Karim Elmaaroufi, Devan Shanker, Ana Cismaru, Marcell Vazquez-Chanlatte, Alberto Sangiovanni-Vincentelli, Matei Zaharia, Sanjit A. Seshia
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Robert Vacareanu, Vlad Andrei Negru, Vasile Suciu, Mihai Surdeanu
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey
Philipp Mondorf, Barbara Plank
Forklift: An Extensible Neural Lifter
Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael O'Boyle
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross, Melissa Hall, Adriana Romero-Soriano, Adina Williams
Empowering Large Language Model Agents through Action Learning
Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi-Hong Deng, Hongxia Yang
On Limitations of the Transformer Architecture
Binghui Peng, Srini Narayanan, Christos Papadimitriou
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding, Ken Liu, Pura Peetathawatchai, Berivan Isik, Sanmi Koyejo
Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning
Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
Information-Theoretic Distillation for Reference-less Summarization
Jaehun Jung, Ximing Lu, Liwei Jiang, Faeze Brahman, Peter West, Pang Wei Koh, Yejin Choi
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy
Faithful and Unfaithful Error Recovery in Chain of Thought
Evelyn Yee, Alice Li, Chenyu Tang, Yeonho Jung, Ramamohan Paturi, Leon Bergen
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
Evaluating Language Models for Efficient Code Generation
Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, Lingming Zhang
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Sunny Sanyal, Atula Tejaswi, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi
Chain-of-Symbol Prompting For Spatial Reasoning in Large Language Models
Hanxu Hu, Hongyuan Lu, Huajian Zhang, Yun-Ze Song, Wai Lam, Yue Zhang
What is in Your Safe Data? Identifying Benign Data that Breaks Safety
Luxi He, Mengzhou Xia, Peter Henderson
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Sebastian Bordt, Harsha Nori, Vanessa Cristiny Rodrigues Vasconcelos, Besmira Nushi, Rich Caruana
Reverse Training to Nurse the Reversal Curse
Olga Golovneva, Zeyuan Allen-Zhu, Jason E Weston, Sainbayar Sukhbaatar
LLM4Causal: Democratized Causal Tools for Everyone via Large Language Model
Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song
🔦 Spotlight Starling-7B: Improving Helpfulness and Harmlessness with RLAIF
Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, Jiantao Jiao
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro
JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Weidi Luo, Siyuan Ma, Xiaogeng Liu, Xiaoyu Guo, Chaowei Xiao
🔦 Spotlight A Long Way to Go: Investigating Length Correlations in RLHF
Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers
Andy Yang, David Chiang
CoLLEGe: Concept Embedding Generation for Large Language Models
Ryan Teehan, Brenden M. Lake, Mengye Ren
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing HONG, Lingpeng Kong, Xin Jiang, Zhenguo Li
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon
Model Autophagy Analysis to Explicate Self-consumption within Human-AI Interactions
Shu Yang, Muhammad Asif Ali, Lu Yu, Lijie Hu, Di Wang
EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal
Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J Zico Kolter, Zhuang Liu
Suspicion Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Jiaxian Guo, Bo Yang, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, Yutaka Matsuo
Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models
Simon Yu, Jie He, Pasquale Minervini, Jeff Z. Pan
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Alex Zhuang, Ge Zhang, Tianyu Zheng, Xeron Du, Junjie Wang, Weiming Ren, Wenhao Huang, Jie Fu, Xiang Yue, Wenhu Chen
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
🔦 Spotlight Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
Duarte Miguel Alves, José Pombal, Nuno M Guerreiro, Pedro Henrique Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, Andre Martins
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang
Self-Guide: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Graham Neubig, Tongshuang Wu
3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation
Huaisheng Zhu, Teng Xiao, Vasant G Honavar
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting
Huihan Li, Liwei Jiang, Nouha Dziri, Xiang Ren, Yejin Choi
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models
Haoran Li, Junqi Liu, Zexian Wang, Shiyuan Luo, Xiaowei Jia, Huaxiu Yao
CTIKG: LLM-Powered Knowledge Graph Construction from Cyber Threat Intelligence
Liangyi Huang, Xusheng Xiao
Enhancing Adversarial Robustness of LLMs with Analytic Hierarchy Process
Jiahao Zhao, Minzheng Wang, Nan Xu, Yin Luo, Wenji Mao
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha
Length-Controlled AlpacaEval: A Simple Debiasing of Automatic Evaluators
Yann Dubois, Percy Liang, Tatsunori Hashimoto
STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Chinmaya Andukuri, Jan-Philipp Fränken, Tobias Gerstenberg, Noah Goodman
Should We Attend More or Less? Modulating Attention for Fairness
Abdelrahman Zayed, Goncalo Mordido, Samira Shabanian, Sarath Chandar
On Robustness-Accuracy Characterization of Language Models using Synthetic Datasets
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Yung-Sung Chuang, Luca Daniel
Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval Augmented Parsing with Expert Knowledge
Mohammad Saqib Hasan, Sayontan Ghosh, Dhruv Verma, Geoff Kuenning, Erez Zadok, Scott Smolka, Niranjan Balasubramanian
Do Language Models Plan Ahead for Future Tokens?
Wilson Wu, John Xavier Morris, Lionel Levine
Automata-based constraints for language model decoding
Terry Koo, Frederick Liu, Luheng He
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversations
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang
TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J Zico Kolter
SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D. Burke, Heng Ji, Tom Hope
Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez, Belinda Z. Li, Jacob Andreas
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs
Jingzhe Shi, Jialuo Li, Qinwei Ma, Zaiwen Yang, Huan Ma, Lei Li
Forcing Diffuse Distributions out of Language Models
Yiming Zhang, Avi Schwarzschild, Nicholas Carlini, J Zico Kolter, Daphne Ippolito
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
Charles Jin
TMMLU+: An Improved Traditional Chinese Evaluation Suite for Foundation Models
Zhi Rui Tam, Ya Ting Pai, Yen-Wei Lee, Hong-Han Shuai, Jun-Da Chen, Wei Min Chu, Sega Cheng
BumbleBee: Dynamic KV-Cache Streaming Submodular Summarization for Infinite-Context Transformers
Lilly Kumari, Shengjie Wang, Tianyi Zhou, Nikhil Sarda, Anthony Rowe, Jeff Bilmes
Keep the Cost Down: A Review on Methods to Optimize LLM’s KV-Cache Consumption
Shi Luohe, Hongyi Zhang, Yao Yao, Zuchao Li, Hai Zhao
PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models
Siddharth Mishra-Sharma, Yiding Song, Jesse Thaler
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Haz Sameen Shahgir, Khondker Salman Sayeed, Abhik Bhattacharjee, Wasi Uddin Ahmad, Yue Dong, Rifat Shahriyar
Yes, no, maybe? Revisiting language models' response stability under paraphrasing for the assessment of political leaning
Patrick Haller, Jannis Vamvas, Lena Ann Jäger
Measuring Taiwanese Mandarin Language Understanding
Po-Heng Chen, Sijia Cheng, Wei-Lin Chen, Yen-Ting Lin, Yun-Nung Chen
Pairwise Proximal Policy Optimization: Language Model Alignment with Comparative RL
Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu
Poly-Visual-Expert Vision-Language Models
Xiaoran Fan, Tao Ji, 江常皓, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang, Rui Zheng, Zhiheng Xi, Yuhao Zhou, Shihan Dou, Junjie Ye, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Hongyuan Mei, Matthew Walter
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yuxuan Wang, Alan Yuille, Zhuowan Li, Zilong Zheng
🔦 Spotlight Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alexander D'Amour, Krishnamurthy Dj Dvijotham, Adam Fisch, Katherine A Heller, Stephen R Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant
SteP: Stacked LLM Policies for Web Actions
Paloma Sodhi, S.R.K Branavan, Yoav Artzi, Ryan McDonald
LLM-Datasets: An Open Framework for Pretraining Datasets of Large Language Models
Malte Ostendorff, Pedro Ortiz Suarez, Lucas Fonseca Lage, Georg Rehm
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
Costas Mavromatis, Petros Karypis, George Karypis
Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Tomasz Korbak, Henry Sleight, Rajashree Agrawal, John Hughes, Dhruv Bhandarkar Pai, Andrey Gromov, Dan Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation
Harry Dong, Beidi Chen, Yuejie Chi
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles, Sam Miller, Patricio Cerda-Mardini, Tanaya Guha, Victor Sanchez, Bertie Vidgen
🔦 Spotlight Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
Eric Zelikman, Eliana Lorch, Lester Mackey, Adam Tauman Kalai
Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues
KuanChao Chu, Yi-Pei Chen, Hideki Nakayama
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
Yiran Wu, Tianwei Yue, Shaokun Zhang, Chi Wang, Qingyun Wu
🔦 Spotlight MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu, Yuge Tu, Xu Han, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Xinrong Zhang, Zhen Leng Thai, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun
Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng
Bot or Human? Detecting ChatGPT Imposters with A Single Question
Hong Wang, Xuan Luo, Weizhi Wang, Melody Yu, Xifeng Yan
🔦 Spotlight Will the Real Linda Please Stand up...to Large Language Models? Examining the Representativeness Heuristic in LLMs
Pengda Wang, Zilin Xiao, Hanjie Chen, Frederick L. Oswald
Enhancing Language Models with Idiomatic Reasoning
Jianing Zhou, Ziheng Zeng, Hongyu Gong, Suma Bhat
ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation
Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui
ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction
Mingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng Zhang
🔦 Spotlight Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi, Denise H J Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah Goodman
Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
Declan Grabb, Max Lamparth, Nina Vasan
Prompt Public Large Language Models to Synthesize Data for Private On-device Applications
Shanshan Wu, Zheng Xu, Yanxiang Zhang, Yuanbo Zhang, Daniel Ramage
Agent-DocEdit: Language-Instructed LLM Agent for Content-Rich Document Editing
Te-Lin Wu, Rajiv Jain, Yufan Zhou, Puneet Mathur, Vlad I Morariu
From Strategic Narratives to Code-Like Cognitive Models: An LLM-Based Approach in A Sorting Task
Hanbo Xie, Hua-Dong Xiong, Robert Wilson
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval
Griffin Thomas Adams, Jason Zucker, Noémie Elhadad
Effective Prompt Extraction from Language Models
Yiming Zhang, Nicholas Carlini, Daphne Ippolito
ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training
Zonghan Yang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification
Yujia Hu, Zhiqiang Hu, Chun Wei Seah, Roy Ka-Wei Lee