COLM 2025: Review Guidelines

Thank you for being part of the COLM program committee! We aim for a technically deep, exciting, forward-looking, insightful, and impactful program. We ask that you thoughtfully consider each paper assigned to you during the review process, and avoid simply applying mechanistic measures of quality. Papers can excel along several dimensions. Of course, few papers, if any, excel along all dimensions, and our job as the program committee is to carefully weigh contributions along the different dimensions to identify the best work to include at COLM.

A non-exhaustive list of such dimensions includes:

Empiricism, Data, and Evaluation: A strong empirical foundation is often core to a strong paper, increasing the likelihood the approach generalizes beyond the experiments in the paper. Work that is strong along this dimension uses data that is as natural as possible, strong experimental design, and evaluation metrics that are known or shown to measure what they claim to.

Technological Impact: Our field, both researchers and practitioners, has benefited enormously from work that demonstrated impact through the use of the technology, software, data, or other artifacts it presented. Work that excels along this dimension provides high quality, thoughtfully designed, and well packaged resources and artifacts that will enable future high quality and impactful work.

Ambition, Vision, Forward-outlook: Progress is driven both by gradual development of techniques and big ambitious leaps forward. There are many challenges and risks in work that goes beyond the boundary of current research, but such work is critical for progress. Work that stands out along this dimension looks forward beyond problems currently studied, or extends currently studied problems in significant ways.

Understanding Depth, Principled Approach: Our goal as researchers is not only to develop methods and build artifacts but also to understand both the methods we use and the phenomenon we study, natural language. Papers that excel along this dimension will deepen our understanding, for example by taking a principled approach to modeling and learning, or through careful and deep analysis.

Clarity, Honesty, and Trust: The paper you review is one of the main artifacts of the work. Abiding by the highest standards of scientific reporting is key to quality. Especially nowadays, work in our field is read broadly, and has a dramatic impact on the perception of nascent technologies. Work that shines along this dimension will be written clearly, provide a measured and balanced presentation, and, as much as possible, release research materials.

Excellence in each of the above aspects already makes a significant contribution. However, weakness in some aspects cannot be an immediate reason to reject a paper. For example, it is sometimes extremely challenging to design robust empirical studies when studying extremely ambitious scenarios and problems, especially when they dramatically depart from current research practices and empirical evaluation requires significant innovation on its own. Accepting such papers, especially when authors are clear and honest about such limitations, is a risk, but one that can pay off greatly for the field when the paper shines along other dimensions. Another example is papers that may have a strong empirical foundation and a principled approach but seem weaker on impact in the immediate future. This, to some degree, is how next-word prediction language modeling was viewed in the community for a long time. However, the principled approach proved itself in the long run, even if it looked risky along the way.

That said, conversely, even significant strength along a certain dimension is not a panacea. For example, work can present a forward-looking vision, beyond the wildest dreams of current research, but if it is very weak in other aspects, it may provide little to advance research beyond a compelling story.

Finally, we ask that you take into account that most researchers do not have access to large-scale compute. While some questions in our field require significant computational resources to study at the upper limits of scale, few publishing authors have these resources. Limiting this type of research to the few labs that have them will stifle innovation and understanding. Naturally, this runs the risk that some small-scale results will not hold when studied later on at a large scale. But some results will, and they will not make it unless we, the program committee, make a bet on them.