Behind the scenes: the struggle for each paper (2021)

schneems · on Jan 7, 2024

I had an assignment in the OMSCS course where we had to turn the results of a project into a paper and a presentation. It was eye-opening as to why so many CS papers are difficult to decipher.

I’m used to writing on the web where the scroll is unlimited and everything is hyperlinkable and potentially interactive. Journal papers are limited by length and so was our assignment. I had to cut virtually all helpful explanation needed to reproduce my results, which was deeply frustrating. We were implementing an algorithm based on another paper and it was hard because key details were omitted or assumptions not stated. After that exercise I have to think some of it was intentional to get it down to size.

I find most people aren’t good at technical communication and teaching others without a LOT of practice. Even then it requires feedback and iteration to make sure the ideas are communicated well. Forcing people to be more succinct and omit details makes the final product worse to consume. I don’t know how common such limitations are these days, but I do know that the average paper is still out of reach of the average programmer (where it would likely have the most benefit).

godelski · on Jan 7, 2024

> Journal papers are limited by length and so was our assignment

I have always thought this was a bit silly and that it creates really weird effects that also decrease readability. An interesting point is that reviewers are not required to read the appendix of works. So everything is required to be in the front matter. This is a bit silly when we do things like research graphics or do generative works and such. You want to include images and samples but then your space is eaten up. What if you want to discuss analysis on those images and explore some? You could easily do this on a blog but you're forced to throw this into the appendix. But then a reviewer can ask a question that's explained there and your work can still get rejected because it isn't in the front matter. Another weird incentive is that people end up padding works to fit page limits. This is because if you turn in a shorter paper reviewers will frequently reject your work the same way your boss might not think you're working if they don't see you at your desk.

We live in the 21st century and we still publish like it's the 15th. Computers gave us the ability to embed images, which is why there are so many more graphs and charts now, and it's not like more pages cost more. So just remove it. Some papers should be only a few pages and there's nothing wrong with that. Some papers should be far larger and there's nothing wrong with that. It's just weird to set these up considering they were likely created under other constraints but momentum continued and we back justify the continued decisions (there is something to be said about readability, but that can just be a reason to reject).

Side note: CS groups typically publish in conferences

jltsiren · on Jan 7, 2024

Page limits force you to focus. As a researcher, you are often expected to communicate your ideas in 1 page, 3 pages, 10 pages, or 30 pages, for various purposes. If a journal asks for a 10-page paper, you write a 10-page paper. If a conference asks for a 1-page abstract, you write a 1-page abstract. Most people reading a paper are not interested in going through all the details, and those details should usually not be in the main paper.

It's also easier to find reviewers for short papers than for long ones.

Some of the issues you mention are specific to CS conferences. Because there is only time for 1-2 rounds of reviews, the reviews focus more on accepting/rejecting the paper and less on clearing any misunderstandings before judging it. Conferences are also more likely to have one-size-fits-all page limits, while journals often have several categories of papers with different expectations of length.

godelski · on Jan 7, 2024

> Page limits force you to focus.

This can be solved in better ways, which is, in fact, reviewers. I'm okay with a soft requirement but a standardization is what I'm getting at as being problematic. Some papers are noisy because they should be 3 pages but are 10. Some papers are noisy because they are 10 pages and should be 30. There is no universal rule, and that's what I'm getting at.

> It's also easier to find reviewers for short papers than for long ones.

That's a separate problem that needs to be addressed, but is not easy.

> Some of the issues you mention are specific to CS conferences.

Yes, but the author here is CS and we are on a CS focused website. But in general what I said isn't specific to conferences. If conferences are the problem then let's abandon them in favor of good science instead of keeping them around (or turn them into being meetup focused). Certainly the lack of back and forth between authors and reviewers is not a meaningful review process (most author rebuttals are limited to one page and often reviewers are not aligned in critiques). Are we all on the same team (better science) or strictly competing against one another?

jltsiren · on Jan 8, 2024

If the paper does not fit reasonably within the page limit, you should submit it to another venue. If you can't write a meaningful 10-page version of a 30-page paper, you probably can't give a meaningful 25-minute talk on it either. You should submit it directly to a journal that accepts long papers.

Some conferences also have special tracks for short papers, and some journals publish "letters" instead of or in addition to full-length papers.

godelski · on Jan 8, 2024

> If you can't write a meaningful 10-page version of a 30-page paper, you probably can't give a meaningful 25-minute talk on it either.

I can't really tell what's going on here anymore but I don't think we're having a conversation. You're just describing something that's not in good faith here. You're letting "meaningful" do the heavy lifting here. Yes, of course everyone can distill a paper, but not every paper can be distilled and then accepted into publication. Frankly, that's because reviewers act exactly like this and place weird arbitrary bars on what it means to be good work, forgetting that all works are incomplete and thus encouraging embellishing and lying and setting continually new absurd bars.

Stop doing gymnastics to protect a system or just respond to my actual critiques. There's no perfect system so you can even say my critiques are valid yet not enough of a concern to abandon or modify our current system. It's not an all or nothing situation here. But I don't need to be lectured on something as silly as "if you can't do it in 10 pages, you aren't doing it right." My claim was that there isn't a one size fits all standard. I stand by that. You can respond to what I wrote but there's not a good "teaching moment" here.

If you just want to tell me how I'm wrong without listening to my actual concern then don't comment. You're creating noise and just an angrier internet. If you think I have failed to consider something and that thing is important, do lay it out. But communicate what that actually is rather than just saying "dumb." Give a real critique. The same goes for when you review. Don't be reviewer 2. Reviewer 2 just holds back science.

jltsiren · on Jan 9, 2024

My point was that if a paper needs 30 pages, don't submit it to a conference. That makes as little sense as submitting an algorithms paper to a zoology journal. Conferences are centered around talks, and you can't present a long paper adequately in a short talk.

Journals can be more flexible than conferences. They don't need page limits, because they don't have the physical constraints imposed by conference dates and the number of parallel tracks. But journals also have audiences, and audience expectations are more important than your paper. You should take those expectations into account when choosing the journal. Don't send an algorithms paper to a zoology journal, and don't send a long paper to a journal that focuses on short papers.

Write the paper first, and then choose a journal or a conference that publishes papers like that. Just as there is no single page limit that fits all papers, there is no single venue that publishes papers of all lengths.

sideshowb · on Jan 7, 2024

I think desirability of page limits is very subject specific. Some people will just waffle if you don't give them a page limit. Other times it means there's not room for the technical details.

godelski · on Jan 7, 2024

But the reviewers can reject if it isn't enough or reject if it is too much. What I'm arguing is the alignment mechanism already exists. The page limit is over constraining

outrun86 · on Jan 7, 2024

Distill.pub was one effort to modernize publishing in CS. Chris Olah wrote some thoughts [1] about why he didn’t feel it was tenable. Seems like the primary challenge was the additional effort and skill involved in crafting rich-content/interactive material.

[1] https://distill.pub/2021/distill-hiatus/

godelski · on Jan 7, 2024

Honestly, I don't get why we don't just submit to OpenReview and call it a day. Paper is visible and distributed. There are comment sections where peer review can not just happen, but happen in the open (added bonus!). You can iterate and even see the difference between submissions. What is the conference/journal providing that isn't covered here? A stamp of approval? From a well known noisy system that creates other disincentives?

vladms · on Jan 7, 2024

Not sure the openness of the review would solve so many problems of the system. For example, it would not touch things like reproducibility and data and code availability.

Then you will need moderation (or do you imagine that things will be civilized between people on the internet?) and would need to manage various possibilities of bullying/targeting/etc. Of course these things can happen now, but the difference would be between a potentially fully automated and simple system and something very clunky (be friends with an editor, convince him to report who the reviewers are, manage to recognize another of his papers, etc.)

godelski · on Jan 7, 2024

> For example, it would not touch things like reproducibility and data and code availability.

These are different issues, which are certainly important. But I do think in some way this would help. OpenReview does allow you to post comments many months after. Effectively think about this as a GitHub issues page. It certainly could be organized better but it is better than what exists now. OR also has links for code and community implementations (as does arxiv now). Here's an example that has all these things[0]. Granted data is missing, but I don't see why this can't also be integrated, but would need to also push cultural norms.

> Then you will need moderation

I think OR has this a bit solved, similarly arxiv. They are not anonymous accounts and are tied to your ORCID record. Arxiv requires you to have a verifier that is already someone with an arxiv account. Yes, this can be abused, but it is also an easier moderation problem than, say, Reddit or even HN. I think if you're posting bullying comments under a named profile, then it is good that that is visible so others can see. Mind you, bullying does already exist but it is just behind closed doors. It is worse now because only the Area Chair can take action and often they are overworked and works do get dismissed (which results in A LOT of wasted time, and money) because of this bullying. The larger the field, the more noise too and the more this happens. It is just far less common to see people bullying in public than behind closed doors.

I must stress though, that there is no perfect system here. There is no system that can make the amount of bullying 0. So we have to be careful in our critiques because there will always be valid critiques that are in fact of concern (like this one) but are fundamentally unsolvable. The question then becomes whether we improve upon the existing frameworks and whether whatever costs have been incurred are worth the added benefits. So I just want to make sure that this idea isn't killed because of an impossible bar, despite the critique being valid.

Edit: I'd actually add that this system encourages reproduction. Because if we still measure on citations and number of publications this means that reproduction works can still count towards those metrics and thus someone's career advancements. The whole conference/journal system currently discourages such effort in favor of the absurdly nebulous novelty concept (which also makes papers noisy). My proposal would also allow for the publication of failures, which is also an important thing for academics.

[0] https://openreview.net/forum?id=Hkxzx0NtDB

sideshowb · on Jan 7, 2024

Promotion and filtering I guess? What does a record label provide when you can just upload music to Spotify?

godelski · on Jan 7, 2024

> What does a record label provide when you can just upload music to Spotify?

I believe this is an illustrative example in support of my proposition, not against. Many artists are in fact turning away from record labels in favor of self publishing. Similarly for books.

But I will say that I still think there's value and so I'll expand on my ideas about conferences. I think they should exist, but be focused on meet and greets. So instead of being an indicator of the validity of work, have them invite authors to speak about their works. Allow others to sign up for poster sessions. How to do that appropriately does need to be worked out, but there's nothing wrong with it simply being under recommendation from the advisement of the organizing members. Yes, there will still be preferential bias, but I do mean "still" because we do have preferential biases towards certain institutions and labs. This would just make it a bit more explicit that they are not the arbitrators of quality but just treated as a "reward."

Importantly I think this allows opening the doors for different kinds of research that are not incentivized by our systems. Most important being reproduction

ketzo · on Jan 7, 2024

What a great resource, both for self-reflection and for a student who wanted to chase a similar career. I should really do something similar for my history of paid work.

It’s not like I have a crazy illustrious career or anything, but it can feel like kind of a blur, just a rollercoaster that led inexorably towards the present, which couldn’t be further from the truth; I would love to be able to reflect on my successes (and failures!) and see the small, concrete steps I took towards each.

Even without writing it out, I know the connections I have made and the mentors / coworkers / friends who have helped me deserve much more credit than any individual strokes of brilliance on my part! Another thing that’s very easy for me to forget, day-to-day.

ShadowBlades512 · on Jan 7, 2024

I have started to at least write 4-5 bullet points per month of my job in my personal notes as a reminder of what stuff I have done. I find I will remember a lot of details as long as I have notes that remind me that a project even existed or an event happened. That has been enough for me.

ketzo · on Jan 7, 2024

That's a great idea, I think I'll start doing that. Sounds super worthwhile for very little effort.

halgir · on Jan 7, 2024

No way - reading this I thought I recalled one of the papers (Starcraft from the Stands). Pulled up my Zotero library, and sure enough, I cited it in my BA thesis almost ten years ago.

What a pleasant coincidence - thanks for the contribution!

amadeuspagel · on Jan 7, 2024

There's this new thing that some academics are working on at CERN - kind of like academic papers, with references and so forth, but on the computer.

Once this is ready, people will just be able to publish their "papers" there. I guess they'll be called something else then. But this sort of struggle to publish a "paper" will no longer be necessary.

jll29 · on Jan 7, 2024

Thanks for sharing a behind-the-curtain view on the history of your publications.

Thank you even more for publishing WebGazer and for following a "systems" approach in your research, when most people produce only papers. It's systems as research artifacts that encode the exact methods as described in the papers but in sufficient detail to be executable that drive innovation. Sadly, system papers are rather hard to publish, despite taking longer (software that is released needs to be much more polished than software that you are going to keep to yourself).

fallon54 · on Jan 7, 2024

AKA why you probably don't want to be in academia

vladms · on Jan 7, 2024

I think most of the things in life come with a lot of struggle, strange things, things that should be different and so on.

Making a startup? Go and check how hard and crazy that is. Make a family? Similar convoluted process with ups and downs.

What I think is wrong is that people have a very "idealized" image of a scientist scribbling an equation on a board and getting some prize (or defeating the aliens). These images are good for kids but after high school I think people should give it a thought and say "ok, things are not exactly how I imagined in life, let's try to understand more what I like and want". You know, the same process that makes people realize there is no Santa Claus.

ajsnigrutin · on Jan 7, 2024

Yep... Everyone in academia complains about publishing papers, about the high prices of publishing, about "publish or perish", and then when they get high enough in "academia", require the same pain from the newcomers. It's like a closed circle of people both requiring papers for maintaining and advancing your career and at the same time complaining about those papers (and the publishing process), and not even thinking about some kind of "change".

academia_hack · on Jan 7, 2024

I've given "Accept - Minor Revisions" to every paper I've peer reviewed since getting my PhD other than two that were outright plagiarism. Figure it's important to the morale of grad students to get some positive validation and the vast majority of published research is garbage anyways so I don't feel particularly inclined to defend the trash heap as an unpaid reviewer. In practice, I find that I've tipped the scales in favor of a lot of borderline papers over the years and am quite happy about that.

JohnKemeny · on Jan 7, 2024

This is only true to some extent, having myself been on a fair number of hiring committees.

While the institution and national agencies measure impact in terms of number of "level 1/level 2" papers, colleagues don't care at all about this value. What's important is number of single-author papers, number of papers without their advisor, number of different small group collaborations, and most of all, having papers accepted in the top venues.

A person with 50 shit papers will not even be considered for the job.

ajsnigrutin · on Jan 7, 2024

Sure, but the same group of people that complains about (the publishing of) papers is then in a position to change that, but doesn't. All those people that went through this process and complained, then (well.. a few years later) sit in university committees that decide what the hiring (and scoring) rules are, what the requirements are for TAs, for professors and tenures, etc., and decide that the "pain of publishing" is an ok thing to subject new generations to.

edit: i'm from Slovenia, universities here are "autonomous" (not a direct part of the government.. except for being government funded), and they decide all the internal rules themselves.

cvwright · on Jan 7, 2024

The problem is that academia is otherwise a pretty cushy job. It attracts a lot of people who want the prestige and like to talk, but don’t actually want to do any work.

Peer review and the paper chase are the least bad solution that we’ve come up with to address this.

godelski · on Jan 7, 2024

> But this paper was critical to getting me accepted to a Ph.D. program. Why do I think that? Well I was rejected by every Ph.D. program I applied to before this publication (but that's another story), a story about people and opportunity.

This is an interesting note. We're talking about a student from one of the top CS schools (UIUC) and applying to another top school (UW). If you think about this a bit carefully, the paper being published did not change who he was or his capabilities, it was simply a difference in measured (distinct from measurable) signal.

It's incredible how many extremely noisy signals we use in academia but act as if we use a clear meritocracy. The review process is extremely noisy itself, with computer science in particular being generally more noisy given its preference of conferences over journals. I'm glad Jeff mentions people and opportunities, and it reminds me of the old saying about there being no self made man. But I think this is a very clear example of an instance where we need to think harder and more carefully. Counterfactually, it is almost certain that had that paper been rejected, but all else stays the same (i.e. getting into UW), his success story would also not change. Signals are definitely hard to measure and certainly schools are getting a lot of applicants, so I don't blame anyone for doing this, but I think it is incredibly important to remember these counterfactuals. To remember that metrics are guides and not causal variables themselves. Because there's a great irony in that metrics destroy meritocracies.

nkurz · on Jan 7, 2024

Your point is correct, but I'm not surprised by the difference. I think "legibility" is the term of art here. Writing a paper like this makes it almost[1] certain to the institution he is applying to that he is capable of writing a paper of this quality, while all the other metrics (GPA, GRE, etc) are much more probabilistic. Since someone incapable of writing such a paper is probably unsuited for a PhD, it seems entirely appropriate to choose applicants who have demonstrated ability to clear this bar over those that have not.

[1] "Almost" to account for the slight chance that he didn't actually author the paper but somehow managed to get his name put on it anyway.

godelski · on Jan 7, 2024

I agree and there's a lot in my comment to point to that. But my point is to distinguish between the metrics and the goals. I'm certain the author included in their CV that they had a pending paper when applying, so there is a signal, albeit a weaker one, but publishing is a weak signal to begin with.

I agree that you need to use metrics. But we need to be clear that metrics are not enough and very incomplete themselves. With something like admissions, I'm not sure there's anything except noisy signals and the strongest one by far is the interview.

> Since someone incapable of writing such a paper is probably unsuited for a PhD,

I very much disagree with this. The explicit purpose of schooling is to train people. Many undergrads are not going to have the opportunities to publish. It is not hard to train someone to write something publishable and this is not something I would be much concerned with myself given how much writing they're going to be doing over the next few years. The far more valuable skills are in being able to perform research which is quite ambiguous (there are at least 2 ways to read this sentence and both are correct: research type v measure). Your first 2 years of your PhD are almost exclusively training, with more class work and learning how to begin research. This isn't a job you're applying for, it is a training program.

nkurz · on Jan 7, 2024

>> Since someone incapable of writing such a paper is probably unsuited for a PhD

> I very much disagree with this.

Your disagreement is justified. I phrased that poorly. I meant it as a shorthand for "incapable of being trained to write such a paper". Showing that you already have the skill is proof, everything else just points to the possibility with varying degrees of accuracy.

I in turn disagree that "the purpose of schooling is to train people", at least if "schooling" refers to PhD programs. I think it's more that there aren't enough applicants who are able to perform without extensive training, so in practical terms PhD programs need to be willing to provide training. But at the same time, it's perfectly understandable that they would prefer to take applicants who have demonstrated ability to perform over those with statistical potential.

I'd prefer something like "The purpose of PhD programs is to advance the field". I'm personally in the odd category that I've co-authored several computer science research papers despite having dropped out to become a programmer prior to my BA. I've demonstrated my ability to perform much of the role of a PhD while simultaneously demonstrating that I perhaps shouldn't be relied upon to finish!

dlemire · on Jan 7, 2024

> I'd prefer something like "The purpose of PhD programs is to advance the field".

If you read Wikipedia under 'Doctor of Philosophy', you will find that a Ph.D. was once more of a prestigious title you got after doing the scholarship:

"The first higher doctorate in the modern sense was Durham University's DSc, introduced in 1882. This was soon followed by other universities, including the University of Cambridge establishing its ScD in the same year and the University of London transforming its DSc into a research degree in 1885. These were, however, very advanced degrees, rather than research-training degrees at the PhD level—Harold Jeffreys said that getting a Cambridge ScD was "more or less equivalent to being proposed for the Royal Society."

It is still possible to get a doctorate in this manner. Please see wikipedia under 'Doctor of Philosophy by publication'.

"A Doctor of Philosophy by publication (also known as a Ph.D. by Published Work, PhD by portfolio or Ph.D. under Special Regulation) is a manner of awarding a Ph.D. degree offered by some universities in which a series of articles usually with a common theme are published in scholarly, peer-reviewed journals to meet the requirements for the degree, in lieu of presentation of a final dissertation. Many PhD by Publication programs require the submission of a formal thesis and a viva voce."

It is offered in several countries in Europe. The wikipedia entry is incomplete: it is not just offered in the UK.

Furthermore, it is relatively common to get advanced degrees from well known universities (e.g., Harvard) without having an undergraduate degree.

godelski · on Jan 7, 2024

I see your point and I think that brings us a bit closer to alignment. But I think if someone is __incapable__ of writing such a paper there would likely be many larger flags and they probably should not have been able to pass their undergraduate curriculum.

I do want to make it clear: I'm not opposed to arbitrary filters when there is a high number of applicants and you simply need to reduce the number. I am opposed to pretending that such a filter is not arbitrary. I think we need to be clear about how strong of a signal any filter is, and be quite explicit that they are not all equal indicators. That is my main point: being explicit about the strength of a signal.

In regards to training, I do agree that schooling isn't __just__ training, but I'd fully disagree that this isn't one of the most important aspects of it, even in grad school. Your first two years (in US systems) are nearly identical to a masters and highly focused on classes. What are classes but training? Even being a TA or lecturer is, in part, training (full instructor of record would not be). Post conditional, I still think you are in training at least up until candidacy. That is much more arguable given the variability of advisors, with some being very hands on (training) and some being very hands off (on your own).

I'd prefer something like "The purpose of PhD programs is to train people to advance the field." Because by all accounts, it seems like you've done this (even with the self-deprecating humor. That is exceptionally common in PhDs too lol). I still maintain training because this isn't the end, but the beginning. Post PhD is where you can choose to go to be an academic researcher or industry researcher (or abandon research). Those are the actual jobs (which should have continued training) but your degree is more akin to a certification from your institution. You do come out with a body of work that is distinct from the institution, but the institution's goal is not to keep you around and continue performing work. They are explicitly formed to graduate you. To educate you. And what is education except a form of training?

Fwiw, I think we are decently aligned, but sometimes text is hard to communicate, especially post by post. I do think your critiques are valid, even where I disagree.

BrandoElFollito · on Jan 7, 2024

Another thing is that there is not enough pushback from the community at large.

My PhD thesis was less than 40 pages long. The introduction was 1/2 a page (basically "if you need an introduction you should not read this, here are 3,4 books to get you started").

Then I copied/pasted from my articles and then came the acknowledgments (which I actually found valuable because I wanted to thank my advisor for his non-science-related help and a friend for her magnificent idea that turned around the thesis. And my parents, wife, dog etc.)

Then the conclusion ("brilliant work")

And then a discussion with myself about everything that I fucked up and what could be improved (my advisor fainted on that one).

The jury was 8 people. The younger/more dynamic ones were super happy (especially that they made their review a page long as well). The older ones were disgusted and said that clearly. I got my PhD.

I fought in Academia for a few years to bring some change but eventually left (also for other reasons). If I was to stay for my whole career I would have tried again and again to change the status quo.

dotnet00 · on Jan 7, 2024

A friend who recently finished her PhD had a similar experience, where all the senior scientists at our lab were concerned because her thesis was "only" 100 pages long and she didn't go through a professional editor to have it perfected.

My preliminary defense thesis had to be 50+ pages, but during the presentation, it was pretty obvious that the committee had at best looked at the table of contents. It all feels like such an unnecessary waste of effort. Even with my own thesis, over half of it is just padding with very fundamental background information because the work isn't really so complicated as to require that many pages to discuss, it's just demonstrating more advanced simulation capabilities by implementing GPU acceleration for a niche but simulation heavy field.

BrandoElFollito · on Jan 7, 2024

Theses at my time were about 200 pages long. A friend of mine wrote two tomes.

I clearly stated that I would not waste my time and the reviewers are free to provide comments and we will see during the defense.

I found out that a lot of these "rules" are traditions that one can challenge and suddenly they are not traditions anymore.

dotnet00 · on Jan 7, 2024

Yes, the only hard rule in my department is having a minimum of 50 pages, the idea that 100 pages is not enough came from the scientists applying their own experiences from years ago. Technically there was nothing they could do about her thesis having fewer pages, but as inexperienced students, it's obviously a little scary when people you look up to sound concerned (since academia is full of all sorts of unproductive and unstated expectations).

A friend at another department only had a minimum requirement of 5 pages, and his thesis ended up being just a collection of his publications.

BrandoElFollito · on Jan 8, 2024

> his thesis ended up being just a collection of his publications

We have in France the concept of "thesis on publications" where you write an intro, an outro/conclusion, and bind your articles in the middle. This is very helpful when you have already published everything that builds your thesis.

taopai · on Jan 7, 2024

Papers... our new religion...

darthoctopus · on Jan 7, 2024

[2021]

patrickmay · on Jan 7, 2024

Nearly three times the number of papers published by Claudine Gay. Why isn't he President of Harvard?

vaidhy · on Jan 7, 2024

I downvoted this comment because it is not pertinent to this topic and just flamebait.

lapinot · on Jan 8, 2024

The tone is probably flamebait, but the content is on topic imho. 15 papers before getting a phd?! Being a phd myself and having serious issues with a lot of the academic practices, reading the title i thought i would identify with the author and situations, which i didn't. I don't think this kind of experience is representative of the common "struggle with papers" among young researchers i have seen around me.