Why bad scientific code beats code following "best practices" (yosefk.com)
198 points by smackay on May 12, 2014 | 168 comments


I am a scientist, and I have seen a lot of terrible code. Most scientists have no formal training in computer science or coding. Many advisors don't place much value on having their grad students take such classes, though even a short language-specific introductory class would vastly improve their students' productivity.

I recently undertook a complete rewrite of our group's analysis software, which was written by our previous postdoc. It was ~30k lines of code in 2 files (one header, one source file), with pretty much every bad coding practice you can imagine. It was so complicated that the postdoc was essentially the only one who could make changes and add features.

The rewritten framework is only ~6k lines of code and replicates the exact same functionality. It's easy enough to use that, just by following some examples, the grad students have been able to implement studies in a couple of days that took weeks in the old framework. The holy grail is for it to be easy enough for the faculty to use, but that will probably take a dedicated tutorial.

My point is that following "best practices" may be overkill, but taking a thoughtful approach to the design of the software can vastly improve your productivity in the long run. Posts like the OP help scientists who write bad code defend poor practices. Any scientist worth his salt should support following good practices because it will always lead to better science.


I work in R&D for a large science services company, and I'm often responsible for turning nifty research projects into marketable products. Because of this, I often take over a lot of code from scientists and academics. And it's usually (read: always) pretty bad.

'Software engineers' get a bad rap for over-engineering code. And, I understand that. But, the opposite is so, so much worse. I see what you're describing every time I take over a project.

The worst characteristic, though, is the lack of version control. Usually these teams will have used email to exchange source files. They usually have directories full of 'version_X' sub-directories of different code. And, usually, each member of the team will have different versions of the code.

The second worst characteristic I find is code that doesn't actually work unless it is placed exactly in the right directory of a now non-existent server. They send me code (in a zip file, of course), no instructions, no configuration. And, then I spend several days or even weeks just trying to get it to work the way that they said it worked back at their research 'demo' a year ago. 'It worked last year', they say. And, then imply that I'm some sort of hack because I can't understand what they're doing.


I'm a scientist who does lots of code. Most of my "projects" are 1000 lines or less (usually much less) to do a single function or calculation.

Last year I was pulled into my first larger-scale project (about 8 science coders at multiple institutions over 5 years). We were able to produce reasonable, readable code for each other on a file-by-file basis. But Version Control was the worst, worst part. Files emailed back and forth between subgroups that never made it into the tree, edits lost, we all had our own forked version at the end, essentially.

The most telling part was when I emailed both IT in my department and several professors (PIs) on the project, including those that taught "scientific programming", asking about setting up a source repository, if one of them could host one, and NONE of them had any clue what git, subversion, etc. even were, let alone where/how to set something up.


You could set up a private BitBucket repo and simply give them the link to a .zip download, while you would enter any code you receive into the repo. It might be unfair that you would have to do all the version control, but it's better than nothing...


At one company I worked at, we had EVCS. "Eliott Version Control System." Everyone emailed Eliott change sets and he put them together.


If you can say, what company? It sounds like a pretty interesting role: despite the frustration and difficulty of dealing with such code, turning that into something more generally useful/usable seems like it would be relatively fulfilling in the end.


>The second worst characteristic I find is code that doesn't actually work unless it is placed exactly in the right directory of a now non-existent server. They send me code (in a zip file, of course), no instructions, no configuration. And, then I spend several days or even weeks just trying to get it to work the way that they said it worked back at their research 'demo' a year ago. 'It worked last year', they say. And, then imply that I'm some sort of hack because I can't understand what they're doing.

As a graduate student who has had to deal with this kind of code, and finally joined forces with another grad student to fight back and make our software retargetable... I'm so, so sorry.


I wouldn't say the opposite is so much worse; rather, whichever annoyance you're dealing with seems worse than the one you aren't.

Also, there is a difference between over-engineered and not engineered. It is truly the "over-engineered" that has me annoyed nowadays.


This is true in finance and in other data-heavy fields as well. I've been shocked at the kinds of Excel sheets that, with a mess of spaghetti VB code written by someone long gone, factor into trades worth millions...sure, it "works"...but besides the very minor question of code elegance, who knows what optimization of returns could be made if the code weren't such a fright and a knowledgeable partner could actually tweak and experiment with it? Or if it were abstracted enough to be applied to the other kinds of trades that the firm is making (but hell, what do I know, I'm not as rich as my hedge fund friends)?

What's particularly annoying is working with analysts who have a system of pasting SQL scripts from a (hand-labeled, hand-versioned) text file to perform the necessary data-munging/pivoting for in-house use...their SQL work is, to be fair, so much of a leap forward from however such bulk data work was being done previously that they take offense when I offer to help them automate the work...as if their system of hand-pasting/executing scripts, then eyeballing the results for an hour to spot-check them, was inherently more reliable than a batch script with well-defined automated test parameters...What they fail to see is that it's not just about faster/better error-checking, but about more flexible analysis and output. Once the process has been abstracted, instead of producing one "clean" giant database that is faceted along one dimension (time, perhaps), the script can loop through and spit out a variety of useful permutations, which would be impossible/insanity if you stick with the hand-tweaked process.

That's the problem I see with the OP...A scientist can recognize when something seems to work, when it comes to the domain of programming and structure, but "what works" may simply be "what seems to work better than what I did last time"...which is not a foolproof standard of evaluation


I'm a programmer and I've worked with scientists (planetary geology). The code is usually pretty bad, but ignoring how "pretty" or maintainable it might be, from the outside it ran way too slow, used too much memory, and botched edge conditions. On the good side, the intentions were pretty clear and the mathematics were sound. So it was pretty easy to fix things up to handle needed data volume and deal with the missed edge cases. As long as I was brought in within a certain window of time it was easy indeed.

The real issue is not best practices, per se, but what passes for them in some rather large circles. Yosefk's "DriverController, ControllerManager, DriverManager, ManagerController, controlDriver ad infinitum" is a fine warning sign. Nothing there is named after anything in the problem domain, and that's a sure sign of trouble. It's a sign that the programmer thinks the problem domain is software engineering or computer science, but that's wrong.

I've always seen becoming intimate with the problem domain as an integral part of programming in the real world. I've succeeded to the extent that I have occasionally been asked to provide help outside of software, by top people. How can anyone do a good job providing software solutions otherwise?


The question is (from direct experience): how long did it take you, and what was the cost to your career in terms of papers you didn't publish, research you didn't do, etc.?

It took me far too long to realize that there's almost no reward for code quality in academia. Code rarely gets re-used. Of the small amount that does, result consistency is a higher priority than maintainability, except for the .0001% of projects that end up being maintained by a large, collaborative team. So if you're the sucker who spends 30% of his time cleaning up the old code, you're at a 30% disadvantage to the people on the team who will quite happily use your work to publish papers, get postdocs/professorships and succeed.

I'm being a little harsh, but not by much. Unless you're tenured faculty, publishing is job one. The same rule applies to startups: code quality doesn't matter until you're successful, and once you're successful, someone else will be maintaining the code. The costs of badness are externalized to those who will voluntarily bear the burden.


I think you've hit the nail on the head. Scientists are not there to create great software. They are there to create great science. For the small amount of software that does end up in a commercial product, it will probably be rewritten anyway, and probably by somebody who wasn't doing the research in the first place.


So it "saves time for research" in the sense that scientists don't check that the code component operates correctly? In that case, why bother with code at all? Just make up plausible output and no one will look any further.


You're making an invalid assumption: "nice code" is neither a necessary nor sufficient condition for "correct code".


Where did I make that assumption? I said that scientists aren't checking that the results of the code are in fact correct.


If you're concluding that from what I said, then you're making the assumption. Bad code can still be well-tested.


Sorry, I wasn't clear: I meant that the reviewers aren't checking that the code is running correctly.

And yes, I'm sure scientists do a bang-up job testing their own code, just like they do a bang-up job validating their own experience, checking their own logic, and criticizing their own experiments.

But the whole point of science is not to trust yourself; it is to make what you did reproducible. To the extent that you seal off part of the process from this kind of review, you're not doing science, but something else.


I think it depends on whether it needs to be maintained over a period of time or if multiple people need to work on the codebase. If it's just being written for one paper then sure, just get it done as quickly as possible.

However, there's no reason not to follow some best practices. Using a VCS has pretty much no cost other than some initial learning curve, and the productivity benefits can be substantial. So - I think there's a balancing act in terms of optimal speed between writing good code and writing code as fast as possible.


I hate "best practices", precisely because it implies there is one (and only one) "best" way to do something, and it's usually implied that there is only one tool that does things that way. That being said, I can see why "best practices" have come into being.

Like the article author, I too have worked on code created by physicists, mathematicians and yes, even electrical engineers. The article author is lucky; "bad" coding practices I've come across include:

- create a new directory, copy the files you want to change into the directory, then make new changes - that's version control! (nb - no, they didn't name anything to indicate which was the new "version").

- constructors with (I shit you not), 29 arguments, none of them defaulted. Of course, that was because it was converted from Matlab code where the original functions had 30 arguments . . .

- etc, etc, etc

I'll tell you what: give me your paper, and I'll implement the code from that much better than you ever could. Sure, I've had plenty of experience cleaning up other people's messes ("we've got this standalone RADAR sim written in Matlab; it should be quick and easy for you to convert to C++ and have it interact with two other sims!"), which is precisely why I don't do it anymore. Or at least, I'll have a look and give you a better estimate than I used to, but I'll be honest and also quote you a much shorter time to rewrite it from scratch.


> "taking a thoughtful approach to the design of the software can vastly improve your productivity in the long run"

I think taking a "thoughtful approach" is the key to a lot of different practices. "Best practice", as used by most people in many different crafts and arts, is a method to avoid thinking about what it is you are trying to do.

The most effective kinds of "best practice" are the ones you mastered by making a lot of mistakes, not something you pulled out from a book or a class. It is naive to think you can substitute standards for personal mastery.


I've waded through a lot of legacy and current scientific code (and still do that sometimes).

The worst part (not taking into account the coding style per se) for me was the (sometimes) inability to reuse the code I've encountered or adapt it to other cases.

I think scientific advisors should make a point which goes something like "If you're serious about your work, you might find one day that someone else wants to use parts of your code, so take that into account when planning your program". In my experience, a lot of programs are written as quick-hack solutions, and then there is no time to rewrite them, they grow bigger and it just snowballs from there.

The way CS was taught to us (and we're a big university) was pretty bad. No coding style, no experience with CVS, nothing concerning planning before writing new code. In the end, a lot of people got the bare minimum amount of knowledge needed to code, and started doing research using that knowledge.


I agree of course, I just think a scientist taking a more thoughtful approach > a scientist taking a sloppy approach > a "software engineer" taking an overly thoughtful approach. Because the latter could have written ~200K LOC spread in 5 directories and you'd need a debugger to tell which piece of code calls which.


I think you're comparing apples to oranges, both here and repeatedly in your original article.

For one thing, you describe many "sins" that "software engineers" commit, but in reality code that was flawed in most of those ways would not even have passed review and made it into the VCS at a lot of software shops, nor would any serious undergrad CS or SE course advocate using those practices as indiscriminately as you seem to be suggesting.

For another thing, how many "scientists taking a sloppy approach" do you actually know who can successfully build the equivalent of a ~200K LOC project at all, even if those 200K lines were over-engineered, over-abstract code that could have been done in 50K or 100K lines by better developers? It's one thing to say a scientist writing a one-page script to run some data through an analysis library and chart the output can get by without much programming skill, but something else to suggest that the guy building the analysis library itself could.


It's not that a single scientist writes it, but rather that someone publishes a paper on something, with ugly code used to prove it, and then becomes a professor. Subsequent generations of graduate students are tasked with extending / improving this existing codebase until it is basically Cthulhu in C form. ;)

I recall reading a propulsion simulation's code developed in this way. "Written" in C++, initially by automated translation of the original Fortran code. Successive generations of graduate students had grafted on bits of stuff, but the core was basically translated Fortran, with a generous helping of cut-and-paste rather than methods for many things. (I don't mean this as an insult to Fortran: I've tremendous respect for its capabilities, and have read well-written code in that as well.)

The net result was that fixing bugs in the system was very challenging, as it was a very brittle black box. It was not Daily-WTF-worthy, but still very frightening. I'm very grateful I was not the one maintaining it. ;)


You must not have been in science or you'd have encountered the 200K LOC program, written in five programming languages (two of them obscure), which can only be compiled on the author's computer. Oh, and add 50K of C code from ancient versions of other projects (which could've been used as libraries) for undocumented reasons.

Though I have also had colleagues who were brilliant programmers.


This describes almost every published application I have ever tried to get running. It ends up being impossible to get the application working on anything other than the author's workstation.


I would alter your list to say that a competent software engineer working together with a scientist > a scientist taking a thoughtful approach > a sloppy scientist > someone who is neither a competent software engineer nor a thoughtful scientist.

From the article and your comment above, it sounds to me like you have had to work with a terrible programmer who ranted about best practices to cover for his incompetence. We've all worked with someone like that, even in software shops. Don't tar us all with that brush.


I think it's a pretty shoddy software engineer who writes more LOC than the scientist. Good code is concise, readable without comments, etc. Bad software engineers writing bad code is no different from a bad scientist reasoning that the sun is cold because the temperature in January is below freezing.


What's really interesting here is comparing the two lists of problems the author gives.

On one hand, the problems are either product defects (crashes, missing files, etc.) or maintainability defects (globals, bad names, obscure clever libraries, etc.).

On the other hand, the problems the author mentions are basically things anathema to snowflake programmers (files spread all over, deep hierarchies, "grep-defeating techniques", etc.)

The academic's code scales vertically, because you can always (hah!) find some really bright researcher who is smart enough to grok the code and spend all the time in valgrind and whatnot to make it work. However, God help you if you can't find (or, more appropriately given the current academic culture, force) somebody to waste many hours of their lives fixing mudball code.

The other extreme scales horizontally, right? You have these many files, and deep hierarchies, and dynamic loading, but that's how a lot of people are used to doing it and that's what the tooling is designed to support. The big accomplishment of Java and C# isn't that it lets you get a 100x return from a 50x programmer, but that it lets you scale to having 50-100 programmers in a semi-reasonable way on a project.

In an ideal world, you have a small number of academics and engineers that communicate tightly and write good, compact, and clean code; in the real world, you want to pick tools that help you deal with the fact that it is hard to scale vertically.

EDIT:

At second read-through, I think the author just needs to use better tools. A good IDE makes code discovery much easier than mere grep, and helps solve a lot of other problems.

I do not understand the insistence of academics on using unfriendly tools.


> I do not understand the insistence of academics on using unfriendly tools.

My step father teaches doctorate business students. Until VERY recently he was running Corel Wordperfect simply because it was the first word processor he had installed. Never underestimate the potential stubbornness of smart people :)


Setting aside the straw man of needlessly baroque architectures, I think there's an argument to be made that erring on the side of verbose but primitive code works in science because:

A) It needs to be read and understood by scientists who are primarily oriented around data rather than code.

B) Many people will need to read and understand the code who are not part of a core team maintaining a system over time. Peer reviewability is paramount.

C) In fact there is likely no "system" to be designed and maintained anyway, all scientific code is one-off in some sense.

All that said, software engineering as a discipline can further these goals, and it's a mistake to assume that getting "software engineers" involved will inevitably lead to complexification. A good software engineer can assess the goals and improve code along many axes, not just traditional enterprise software development patterns.


Agreed.

Another mitigating argument in his favor is that he appears to be practicing debugger-driven development. Personally, it gives me hives, but given his circumstances (not an expert, lots of code, much of it not his, lots of throwaway code), it may be his best option.


I don't know about programmers-vs-scientists-vs-engineers-vs-… and all that stuff (basically, these are just some words and I can question the fact that they mean anything at all), but I agree with the main point of the article. Or the way I interpreted it, anyway. That is, "the road to hell is paved with good intentions". I have to deal with it on a daily basis. There is some legacy code in the project considered to be bad and always referred to as that. And, well, yeah, it is bad. But when I have to deal with some new architectural marvel from some of my colleagues, who are considered to be good and actually are pretty bright, adequate people, then I often think that that "legacy code" was actually easier to deal with before the "refactoring". Exactly for the same reasons the author mentioned.

I mean, some god object with multi-screen functions, 9000 ifs and non-escaped SQL is ugly and horrible, but in fact pretty simple to debug, comparatively easy to understand and often even easy to clean up a little bit without breaking anything. But some metaprogramming-reflection-abstract-class-GenericBusinessObjectManagerProviderFactory-10-levels-of-inheritance thing is not. It might even not be ugly; it's often clever and somewhat elegant. If you know how it works. But if you don't (and for starters you always don't, unless you are the author of that elegant solution), it takes you hours of pain and bloody tears before you can understand what happens here and finally make the changes you wanted to.

I actually believe that this is a problem, because it isn't something that some person does because he is dumb. He's not! It's the culture that overly praises clever techniques and "elegant" solutions, while spreading the myth that "not sophisticated enough" means "bad". It doesn't! "Hard to understand" is "bad". Nobody really needs "cleverness" and "elegance"; at the end of the day, they need something that works and is easy to understand and develop further. And the truth is that something "not sophisticated enough" (even if it's a goto, copy-paste, a mutable variable, a global object, whatever) is often easier to understand than the sophisticated alternative.


I've been faced with this countless times too and I think the problem is a culture that equates complex and abstract solutions to elegance, when we should really have one that equates simplicity and conciseness to elegance.

By simplicity I also mean considering the solution as a whole, not the false simplicity of refactoring everything into one-statement-methods.


When I've taken this thought to its extreme, Chuck Moore's ideology around Forth makes total sense:

If a problem is only going to be solved in a complex, Byzantine fashion, it's the wrong problem. Walk away from it. Solve a different one. Quit the job. Reconsider your lifestyle.

And most people aren't going to be able to consider it seriously on that level. The monstrous systems are there because everyone involved has collectively agreed that whatever is justifying the problem is so important that it's OK to let the resulting system grow monster-sized and swallow everyone up. On that basis the only thing anyone can hope for is a painkiller to make the monster a little less soul-crushing.


That reminds me: I was trying to code something using Twisted Python the other day and regretted it for exactly those reasons. Abstraction layers and factories around factories around abstractions, oh my. What's more, despite those abstractions (or more realistically, because of them) it wasn't easy to replace the underlying XMPP transport with a different one. I wound up having to replace a constructor and an undocumented internal method so I could substitute several of the layers with different implementations someone else had written, again in a way that was documented precisely nowhere.


People forget the rule when writing tools and libraries: make easy things easy and hard things possible. Instead, people often aim for "can do everything possible in the spec, because I am awesome!"

Years ago, we had to go from a Sun JMS library to one from Weblogic. I had to instantiate a half dozen objects just to send a message. In the Sun one, you used one API call to authenticate and connect, and another to send.

Easy things easy, hard things possible. Otherwise, you're guilty of bad design.


That reminds me of writing to a file in Java versus writing to a file in Perl / Python. (It's been a while since I did any Java; I hope it has improved since then.)


I agree with the author to the extent that you should keep the target audience in mind when writing code. I disagree that it means not using best practices. If you spend 30 minutes to an hour explaining objects or other conventions to your academic colleagues you'll all be better off when reading and writing code. It also sounds like an issue of developers being overly clever in general. Clever does not mean elegant and in my experience cleverness usually backfires for the reasons listed in the article. Elegance makes everything look easy.


> Oh, and one really mean observation that I'm afraid is too true to be omitted: idleness is the source of much trouble. A scientist has his science to worry about so he doesn't have time to complexify the code needlessly. Many programmers have no real substance in their work - the job is trivial - so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born.

Jesus, seriously? Can't tell if the author is just trolling flippantly in response to what may have been an unfair post...but ignoring the "programmers have no real substance in their work" thing...the OP mistakenly thinks that "science" is all one needs to keep something on track. Uh, no. Just because someone thinks they know what they're doing scientifically doesn't mean they are good at examining or scrutinizing the way they work...which can include everything from the efficiency of data collection to the accuracy of such measurements. A good software engineer is not just fluff in such a situation.


The entire post reeks of ignorance. I don't understand how it got to the front page.


Totally agree. Seems he has a bit of a chip on his shoulder, by frequently mentioning "software engineering" in quotes.

And as someone who had the grave misfortune of getting some experience with scientific code, all I can do is laugh at the OP's link. Yeah sure, scientists are great at programming. I say let them have at it, because I want nothing to do with it. I value my sanity too much :)


Seriously. It seems like there have been more neophytes willing to come out against mature practices like factory methods and other artifacts of polymorphic architecture. Oh well, complaining probably triggers a serotonin release or something for them.


"Hey look, it's possible to misuse design patterns! Don't they suck, amirite guys?"

Online forums are generally filled with programmers who have never written anything large and complex.


Even worse: "Hey look, it's possible to be confused by design patterns! Don't people who use them suck, amirite guys?"

The whole internet is amateur hour. I just got caught in a daydream, pretending like HN was above the noise for a while. Maybe it was.


I know a lot of programmers who don't really do much real work. In particular one place that I worked they seemed to spend half the day promoting recycling, drinking water, and eating healthy; taking walks; playing chess; napping; talking about stand-up desks, etc. Maybe the OP works/has worked in a place like that.


Actually that sounds a bit like the bioinformaticians in the last place I worked. They would write a script that took 3 hours to run, launch it, and play on Facebook until it completed. (They could have learned to use a debugger / IDE / library that would have improved their efficiency; I don't think any of them used "advanced tools" like those.) As the database programmer I always had too much work to slack off like that.

Another point, is that when I have to make a bigger architectural decision, then I don't jump straight in and code. I usually come up with two or three possible solutions in my head (or on a piece of paper). I will stare at the wall for most of the afternoon, go for walks, as I play the ideas off against each other. After a day or two I usually have an idea of which solution is better, and why.


Most of the bioinformaticians I work with write short scripts to count things up (and some need my help to write a couple of nested loops after doing it for 4 years). Hardly difficult stuff.


Yes, some tasks can be done very simply, and decent programmers know this. I've written lots of Twitter-accessing code to do data analysis, but I love using the ruby-t gem, which lets me in a single line of UNIX (with pipes) do something like unfollow every user I currently follow who tweets more than 50 times/day and doesn't follow me back.

The OP is railing against dumb programmers...OK, that's fine. But just because I've heard of Josef Mengele doesn't mean that it's a good use of time to continually rant about the dangers of science.

Speaking of bioinformatics, one of my favorite programming blogs is from bioinformatics scientist Neil Saunders, who writes about scripts, complex and short, that he uses to data munge and efficiently run experiments. The title of his blog was inspired by an encounter he had with a fellow scientist who did not see the value of programming:

http://nsaunders.wordpress.com/about-2/about/

> You may be wondering about the title of this blog.

> Early in my bioinformatics career, I gave a talk to my department. It was fairly basic stuff – how to identify genes in a genome sequence, standalone BLAST, annotation, data munging with Perl and so on. Come question time, a member of the audience raised her hand and said:

> “It strikes me that what you’re doing is rather desperate. Wouldn’t you be better off doing some experiments?”

> It was one of the few times in my life when my jaw literally dropped and swung uselessly on its hinges. Perhaps I should have realised there and then that I was in the wrong department and made a run for it. Instead, I persisted for years, surrounded by people who either couldn’t or wouldn’t “get it”.

> Ultimately though, her breathtakingly-stupid question did make a great blog title.


I was in the same department as Neil when he gave this presentation - it is an amazingly small world at times.


Ha! That's funny...I don't know him at all, and have no idea whether he is actually well-known or well-regarded, given that he's not a social-media-celebrity or whatnot...but I stumbled upon his blog years ago while searching for various scraping techniques, and even as an engineer, reading his blog was a revelation for me that scientifically-minded, rigorously-logical people may yet still have a blind spot towards process.


He is a funny character, but quite serious. I know he had lots of problems dealing with all the biologists in my old department.


It's surprising there is no comment about wrong scientific code: code which apparently does one thing, but actually doesn't, and may produce harmful results.

(By "bad" code, I understand code which doesn't meet best practices)

It's very easy to accidentally produce wrong scientific code, partly since scientists are doing research. They use novel mathematical algorithms to solve hard problems, and it's not typically obvious what output is expected. It's not CRUD.

In this sense, the sins of the scientific programmer might actually be important - fragile code which crashes when something is wrong could be considered good - this may help to avoid publishing wrong code.


I think the comment is implicit: bad code leads to wrong code. Or rather: with bad code, how can you tell either way?

> In this sense, the sins of the scientific programmer might actually be important

No, you’re turning facts on their head: “fail early”, as described by you, is not a “sin”, on the contrary, it’s a hallmark of good software engineering [1]. Of course a nicely formatted error message is preferred to a coredump, but the end result is similar. But that’s not what (a lot of) bad scientific code does. Instead, it veers into the realm of undefined or unpredictable behaviour by failing to recognise the existence of a problem.

[1] https://en.wikipedia.org/wiki/Fail-fast


It's not clear that a nicely formatted error message is preferable to a core dump.

With a core dump one can explore the execution environment at the time of the crash.

A nice compromise is a macro that prints an error message and then calls __builtin_trap().
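
For concreteness, here is a minimal sketch of such a macro (names are hypothetical; it assumes GCC or Clang, since __builtin_trap() is a compiler builtin): you get a readable message for humans and a trap that leaves the execution state explorable in a debugger or core dump.

    #include <stdio.h>

    /* Hypothetical FAIL_FAST macro, a sketch only: print where we are,
       then trap so a debugger or a core dump captures the state at the
       point of failure. __builtin_trap() is a GCC/Clang builtin. */
    #define FAIL_FAST(msg) do {                                              \
            fprintf(stderr, "%s:%d: fatal: %s\n", __FILE__, __LINE__, msg);  \
            __builtin_trap();                                                \
        } while (0)

    double safe_divide(double num, double den)
    {
        if (den == 0.0)
            FAIL_FAST("division by zero in safe_divide");
        return num / den;
    }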


But a nicely formatted error message, so long as it is correct, can be read by many more people than a core dump.

I don't know how I would read a core dump of an error in a python program to understand where an exception came from.


This can very easily happen.

For example, if you do a simulation with random numbers, and you're doing random()%NUM where NUM > RAND_MAX.

The trick is that RAND_MAX varies between platforms (cough Visual C cough).


random()%NUM is usually wrong even if NUM < RAND_MAX.

random() is often a linear congruential generator (LCG: http://en.wikipedia.org/wiki/Linear_congruential_generator) for speed and simplicity purposes. LCGs are a multiply, an add and a modulus (the modulus is usually implicit from the machine word size). That means their low bits are highly predictable and not random at all.

    X(n+1) = (a * X(n) + c) mod m
Assume m is a power of 2 since it's usually implemented via machine word wrap-around. If c is relatively prime with m (in order to fill the whole range of m), then it will be odd. a-1 is normally a multiple of 4 since m is a power of two, so a is odd too.

So if X(n) is odd, X(n+1) will be even (odd*odd + odd => odd + odd => even), and X(n+2) will be odd (odd*even + odd => even + odd => odd), and so on, with zero randomness in the low bit.

So if you're trying to simulate coin flips and use %2, you will get a 1,0,1,0... sequence.
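
For anyone who wants to see it happen, here is a minimal toy LCG (power-of-two modulus via unsigned wrap-around, with often-quoted illustrative constants, not any particular libc's implementation) whose low bit alternates perfectly:

    #include <stdio.h>

    /* Toy LCG with an implicit modulus of 2^32 via unsigned wrap-around.
       a is odd (with a-1 divisible by 4) and c is odd, so the low bit of
       the state flips on every step. Constants are illustrative only. */
    static unsigned int state = 12345;

    static unsigned int lcg_next(void)
    {
        state = 1664525u * state + 1013904223u;   /* a*X + c mod 2^32 */
        return state;
    }

    int main(void)
    {
        for (int i = 0; i < 16; i++)
            printf("%u", lcg_next() % 2);   /* strictly alternating 0s and 1s */
        printf("\n");
        return 0;
    }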


It is often wrong even if random() were to produce perfectly random numbers; random()%NUM, for most values of NUM, will be biased.

For example, if RAND_MAX is 255, random()%10 equals 0, 1, 2, 3, 4, or 5 26/256 of the time, but 6, 7, 8, or 9 only 25/256 of the time.

That's why good libraries have a 'next(maxValue)' function that is more complex than random()%NUM. For example, see lines 251-268 of http://developer.classpath.org/doc/java/util/Random-source.h....
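
For reference, the idea behind such a next(maxValue)-style function is simple rejection: throw away the raw values that would over-fill some residues and draw again. A minimal sketch in C (illustrative only, not the implementation linked above):

    #include <stdlib.h>

    /* Return an unbiased value in [0, bound), assuming rand() itself is
       uniform on [0, RAND_MAX] and 0 < bound <= RAND_MAX. Raw values at
       or above 'limit' would fall into a partial bucket, so they are
       rejected and redrawn; every residue then occurs equally often. */
    int unbiased_rand(int bound)
    {
        int limit = RAND_MAX - (RAND_MAX % bound);  /* a multiple of bound */
        int r;
        do {
            r = rand();
        } while (r >= limit);
        return r % bound;
    }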


I just tested this quickly (on MacOS 10.9.2), it is not a 1,0,1,0 sequence. (It is repeatable since I'm not seeding)

On Linux, same thing. It even gives the same sequence as the Mac OS version

(it's 100 numbers, "1,0,1,1...0,1," no \n at the end)

    ./rt | md5sum
    7a5a5a0758ca83c95b21906be6052666


@barrkel made a small mistake, confusing rand() and random().

rand() is the earliest C random number generator. Its low-order bits (back in the day) went through a predictable sequence, so rand() & 0x1 was a bad source of random bits.

I don't think that rand() was specified so fully as to make this behavior required, but typical implementations exhibited it, so you could not use rand() for any serious work.

random() came later, does not use an LCG, and thus fixed this problem, so you would not see it if your code calls random(), whose man page says:

  The difference [between random() and rand()] is that rand() produces a much less
  random sequence -- in fact, the low dozen bits generated by rand go through a
  cyclic pattern.  All of the bits generated by random() are usable.  For
  example, `random()&01' will produce a random binary value.
Typically, because of this screwup, people use a third-party generator, like the "Mersenne twister".


LCG can be used for serious work, depending on your definition of serious. You need to use multiplication instead of modulus.
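
If "multiplication instead of modulus" means mapping into the target range by scaling (one common reading of that advice), a minimal sketch is below; the result is then driven by the high-order bits rather than the weak low-order ones, though it is still slightly biased unless combined with rejection.

    #include <stdint.h>
    #include <stdlib.h>

    /* Map rand() into [0, num) by scaling instead of by % num.
       result = floor(r * num / (RAND_MAX + 1)), so for num = 2 this is
       just the top bit of r, not the problematic bottom bit. */
    int scaled_rand(int num)
    {
        uint64_t r = (uint64_t)rand();
        return (int)((r * (uint64_t)num) / ((uint64_t)RAND_MAX + 1));
    }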


My comments were not specific to C. They were a general statement about runtime library provided random number functions across all languages - a risk to be aware of.

There are other mitigations. For example, Java's RNG uses an LCG, but returns the high 32 bits and uses a 48-bit modulus to counter this weakness.
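
A sketch of that mitigation in C, using the constants documented for java.util.Random (48-bit state, only the top 32 bits ever exposed). This is an illustration of the idea, not a drop-in replacement for the real class:

    #include <stdint.h>

    /* 48-bit LCG, java.util.Random-style: advance the state mod 2^48,
       then return only bits 47..16. The weak low-order bits of the state
       never reach the caller, so the strict even/odd alternation of the
       lowest state bit stays hidden. */
    static uint64_t state48 = 42;

    static uint32_t next32(void)
    {
        state48 = (state48 * 0x5DEECE66Dull + 0xBull) & ((1ull << 48) - 1);
        return (uint32_t)(state48 >> 16);
    }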


If you are a computational scientist who is using random() for any part of your simulation, you are an idiot.


Apparently people didn't like what I said. But I stand by it. If you are using the built-in random number generator in your simulations you are doing something VERY stupid.

The built-in random() is not a good source of random numbers for scientific purposes. I've heard too many stories about how people don't think about the random number generator, only to have it bite them in the ass.


> The built-in random() is not a good source of random numbers for scientific purposes.

Unless you are going to be more specific about when it's not good, this is not generally true.

Scientific purposes often includes Monte Carlo methods, and random() in Python uses a Mersenne Twister - a perfectly good match.

Be careful about calling people idiots.


You're probably getting downvotes for how you said it, and that you provided no explanation for what you said. A phrasing that would not clash with HN culture: "If you are a computational scientist who is using random() for any part of your simulation, you are making an enormous mistake."


1 - You don't know what the purpose of the simulation is and what is a good source for every specific case.

2 - You're assuming people don't test their generators and just accept blindly whatever result it gives and/or don't compare results of known cases

3 - You say "If you do X you're an idiot" and provide exactly ZERO alternatives to it. Doesn't look very credible


It's because you labeled these people as idiots. How about telling the truth and admitting that they may just be uninformed and making a mistake? One can be intelligent and, at the same time, make mistakes due to lack of experience, lack of training, lack of <whatever>.


I am a programmer who previously wrote scientific software, as a scientist. I can confirm the author's impression: the worst code usually came from the people with more "software engineering" expertise, but there was a catch: I can't think of any of them who were actually good programmers in the first place. Most of them couldn't fix a division by zero exception if the program consisted of a single line that read "1/0".

This isn't unexpected: everyone fucks up things that are his profession to fuck up. I was fucking up mathematical models of integrated devices and they were fucking up code.

But things really aren't that bad. Honestly. When I moved to industry, the first company I worked at was a small place where the lead developers were exceptional, both as programmers and as leaders, so we wrote exceptional code and I also thought, gee, I was coding a load of crap back then.

Then I moved to a larger, fairly well-known company and frankly, it's comparable. The mission-critical parts are ok, but the rest is such a gigantic pile of shit that it probably led to a few PhDs being awarded.


"the worst code usually came from the people with more "software engineering" expertise, but there was a catch: I can't think of any of them who were actually good programmers in the first place."

So you basically define an entire group of people based on only the ones you've met. Even then, you admit that none of them are good. Thus, all "software engineers" are bad?

If that's not what you were trying to say, then perhaps you should clarify a bit more. But it's how I interpreted what you were saying. And I probably wouldn't be the only one.


> So you basically define an entire group of people based on only the ones you've met. Even then, you admit that none of them are good. Thus, all "software engineers" are bad?

No, sorry if that's what ended up being understood (I assume you didn't read my whole comment?)

What I meant was that when I worked there, I saw worse code coming from actual programmers than from scientists who wrote code but didn't think of themselves as programmers. This isn't much of a surprise; it was a research lab and money was fairly tight since we were researching neither weapons nor patentable drugs. Most of it was spent on equipment and scientists. The under-paid programmers were usually under-skilled, too; brighter folks quickly left for greener pa$ture$, leaving behind the ones who couldn't otherwise land a job.


I think the parent was trying to say that, in his experience, these people were not good programmers to begin with.

So it's not that best practices and software engineering is to blame, it's about poor and/or inexperienced programmers (we all know there are a lot of them) attempting to apply principles that they don't understand, and the result is bad code.

You also seem to have completely skipped the middle of the parent's comment.


It seems to me that the article ultimately addresses a straw man... I don't see anyone saying "Scientists would be better off if they adopted best practices that a first-year student would think are a good idea without any understanding" or "Scientists would be better off if they got terrible software engineers to help them". Most, if not all, of what's in the article is bad software engineering.

In fact, we are well aware that bad programming is not confined to scientists, and at the risk of being tautological, better programming practices lead to better programming outcomes than worse ones, regardless of science or not. (And note my careful confinement to "programming outcomes"... I won't promise that better programming practices lead to better final outcomes unconditionally, though I'd be willing to state there are times and places where it is called for.)

There are a lot of cargo-culting "software engineers" who blindly apply principles in stupid manners, but that's hardly what people are calling for here.


There's a guy who used to frequent a forum who routinely asked for help with his C programs.

He had no clue when it came to programming. He didn't understand the standard C functions, didn't understand memory allocation, didn't understand Big O.

He formatted his code in very odd ways, requested we tell him how to shut compiler warnings off, refused to use different sorting algorithms, it was just insane to see what this guy would do.

I recommended he switch to a language like Python so he could concentrate on getting the methods and ideas into code, instead of coming to us for help on arrays, file I/O, etc. He said he had been using C since the 80's and wasn't about to switch languages.

His justification for not wanting to learn to do things 'the right way' was that his way worked, and he had been published (with others) in a few academic papers, and therefore he was doing things right. Not that I would trust a single result generated from any of his programs...


You can train anyone to do anything with the positive reinforcement that is a salary. If he gets money for crappy C code, there's no motivation for him to improve.


I just can’t find any relationship between the sins of software engineers he enumerates and the "best practices" he referenced in his title. In my experience best practices actually make those sins less probable.


As a java developer I can say that his list of sins very nearly is our list of best practices.

I think that the programming culture has a big impact here. Yossi's complaints are typically about C++ programmers, and I can't comment on their culture directly. But I think Java and C++ both have "Design Patterns: Elements of Reusable Object-Oriented Software" as a spiritual foundation. So I suspect they are depraved along similar lines.

(Since it's hard to get tone across on a forum, I am being playful here. Although I am worn out by Java programming culture)


I used to work in a place that had a mix of Java and C or C++ projects going on, and different groups specializing in each. I always said I preferred dealing with the legacy C code than the new Java code. Function too long? I don't care. I can start at the top, read to the bottom, and understand what it's doing. If it actually is too long (rather than just longer than some new graduate's magic number k that is supposed to be the hard upper bound on function length), then I can quite easily break it up.

I could take Java code written by the best minds in that half of the company yesterday, and have absolutely no idea what it did, how it worked, where the files were or what resources it needed. Nothing actually happened in the code that anyone at the company wrote, as far as I could tell. They spent all their time writing what look to me to be prayers to the gods of third party jar files asking them to do whatever function needed to be done. "Well, I need to get a list of customers sorted by last name. Those heathens over in C++ land would write a SQL query, but I have Spring, Hibernate, and SOAP, so instead, I'll edit a generated XML file to refer to another generated XML file to refer to another generated XML file to refer to another generated XML file to refer to another generated XML file to refer to another generated XML file to create an object creation factory to create an object that can read something from a generated XML file that loads another generated XML file that tells Hibernate to load four gigabytes of customer data which I need to then prune down to what I need by editing a generated XML file so that Hibernate can send 20 records to a SOAP library that reads a generated XML file that reads a generated XML file to write a few bits over the wire where the client can read a generated XML file that parses a generated XML file that reads a generated XML file before crashing because the client's JAX-WS jar was 0.0.0.0.0.0.2 iterations off from the server's JAX-WS jar. But at least I don't have to write a difficult and error prone for loop."


Getting "a list of customers sorted by last name" in hibernate does not require xml or anything like that. It is one java method with annotation containing order by query.

It is considerably shorter then traditional non-hibernate version.


I rather hoped that the rest of the text would have been a clue that I was exaggerating things slightly.


Java code can be hard to read if you are just using an editor. It's usually a must to use an IDE when working on a large Java project; they make things like finding stuff much easier. And with newer versions of Spring and Hibernate, you can mostly get by with only annotations and zero XML configuration.


Totally agree. I usually like what Yossi writes, but I don't agree with this. SW engineering best practices in my mind are all about writing clear, easy-to-understand code, keeping it as simple as possible - i.e. nothing like what he calls best practices.


The author exposes a problem of ideological practice in software engineering. We're trained to organize code for extensibility and high availability, among other goals. We should probably learn to evaluate situations in which those goals are not helpful and adapt to the situation. After all, a surgeon probably doesn't follow surgical best practices when removing a splinter from his child's finger. SE's should probably recognize when a simple, non-abstracted approach works sufficiently for a situation and leave it at that.


Fast, cheap, good, pick two.


I really wish programming would get over its "OOP style" madness. If you are writing C++ or even Java that uses, rather than creates, an object hierarchy, then just write in procedural style with functional decomposition.


I'm not sure I understand what you're getting at, but I think I try to do this when I use Python: I never create classes, but write in as functional a style as possible. The problem is that I seem to be fighting with most of the libraries that I use, because they are all written by good, normal Python programmers who use the OO features of the language. You can't really use their methods as if they were pure functions, because they're not: they mutate arguments and have all kinds of side effects, often undocumented.


It's called encapsulation/abstraction. You're not supposed to know the internal state of a class, and you shouldn't care if it's changing. The fact that you're complaining about that means you're not using those provided classes correctly. It's like complaining that a car isn't driving properly on ice. And that's because it wasn't meant to drive on ice, but rather on non-slippery surfaces.

Perhaps you shouldn't try to pigeon-hole classes that were made for normal OO usage into your functional tastes.


I think everything you say here is correct. It sounded as if the comment I was replying to was recommending a practice that I've tried to follow, and I was explaining how trying to use a functional style in an OO ecosystem caused me problems.

"You're not supposed to know the internal state of a class, and you shouldn't care if it's changing."

A recent headache was caused by trying to use a library (forgive the vagueness, I don't want to pick on anybody) that interfaced with an external service. One method was documented as returning a piece of information that had been previously stored in the main object through which you interact with the service. But when you invoke it the library makes additional API calls to the service and changes other data in the object. If you use the method in some expression that calls it 20 times it will make the set of API calls 20 times. There is no way to know this (ahead of time) unless you read the code. It's not documented because "You're not supposed to know the internal state of a class, and you shouldn't care if it's changing.". The author made assumptions about why you were using the method and what you were going to do next.

So what appears to be a function for retrieving a single value actually returns several values, returning one as a result, stuffing others silently into an object, and initiating network activity. This kind of lack of orthogonality and hiding behavior from the programmer is what motivates me to learn functional programming and avoid OO systems - although I understand they are a good match for programming GUIs and similar things.


Yes, several of the original issues relating to mutual incomprehension would be solved if:

- scientists took a little time to read about design patterns

- programmers did not use design patterns gratuitously


Yeah. Most scientists are doing mathematics, not programming. Somehow math has done just fine without developing the patterns you see in programming. It's as if they weren't necessary in the domain.


That's why I came up with the idea to make a huge repository (http://codingstyleguide.com) of programming conventions, "best practices", etc for any language. Where anyone (scientist, experts and newbies) can visit this platform and take a look at the "best" conventions to use.


Thanks for this! I have been looking for something similar.

I find it easy enough to pick up the syntax of a new language, but this will allow me to go that step further and do things properly.


Thank you @collyw! There are still too many guidelines to post... The idea is to have different solutions for every writing convention in any programming language.


A large part of this boils down to being able to estimate how much technical debt you can afford to carry. I haven't seen very much code written by researchers (scientists/PhD students/postdocs). However, from the little I've seen, the tendency sometimes is either not to be aware of technical debt accumulating, or to unintentionally overestimate how much can be afforded. This results in unorganized codebases.

The other extreme is software engineers who focus overly on the mechanics and always underestimate how much technical debt they can afford. This results in over architected systems which try to plan for all eventualities.

Two useful skills to have as a software engineer are to know when to stop writing code and when it's okay to write messy code. The latter being done with the knowledge of when or even if you'll have to clean it up later.


Technical debt is often irrelevant in scientific code. It's one-off code for a specific experiment. In many cases, once the paper is published nobody will ever run the code again. That's not always true, but it often is.


Science is supposed to be reproducible. Writing a script that runs once on a specific machine is unlikely to achieve that.


Meaningful reproducibility would mean writing your own code and performing your own experiment in a different lab with different people to see if the results hold up. Running the same code more times on the same machine, or repeating a measurement in the same lab with the same people isn't what we mean by "reproducing" a result.


Exactly. This is the kind of technical debt I'm referring to when it comes to code written for science. Not something that can be used as a platform/library, but also not code having references to a single config file hard-coded in 5 places.


That is very true. I re-read my comment and realized it might come across as being critical of folks in academia not caring about technical debt, which was not the intention.

Technical debt in these scenarios is usually limited to a single paper cycle (or maybe 2). This is avoided with simple practices such as planning out your code with comments before writing it, not copy-pasting code into 3 or 4 places, etc. There isn't much overhead, but it saves a fair bit of time debugging issues caused by human error. It also ensures you write as little code as necessary for the problem at hand.

I completely agree that when it's just a single paper, there is not much need to ensure you have an elegant library which can be re-used.


TL;DR:

People being incompetent part-time do less damage than people being incompetent full-time. But for the sake of my straw man argument I ignore the fact that incompetence is the problem here.


The thing is, computers were built to handle scientific problems, and it shows in numerical programs. An especially egregious example was a simulation program with a ~1k-LOC main. However, it was possible to work with it since it had a lot of structure; that main routine looked something like:

    int main() {
        initMatrix(reasonableName);   // Repeat this block
        loadData(reasonableName);     // 100 times.

        for (...) {
            someBookkeeping(reasonableName);
            // Another 100 lines
            for (...) {
                numericalStuff(reasonableName);
                // Again, repeat 100 times
            }
        }
        cleanUpAndOutput(reasonableName);
        // Again, repeat 100 times
        return 0;
    }
So that is of course horrible code, but at least it is horrible in a consistent way. I got really burned by code where the high-level architecture was built by a software engineer and the details were filled in by a physicist. Then you get atrocities like should-be-separate classes that spread their functionality over multiple levels of inheritance (along with a constructor several pages long...).


"Best Practice" isn't just about the code aesthetics, but things like source control, testing, assumptions made, documentation etc.

http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjo... gives some nice advice.


Bad scientific code is bad, and bad programming code is bad.

Every time you do a clever thing, you have to do three things:

1- Document your smart idea in your code.

2- Document your smart idea, preferably with a drawing.

3- Document your smart idea, preferably with audio or video.

People forget their smart ideas after six months or so. So if you have to debug the code later, you end up re-spending at least as much time as you spent developing the smart idea in the first place, every single time you have to debug it.

In my opinion, smart ideas are great, though; if you follow the three principles above, solving a bug becomes fast.
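As a toy illustration of point 1, here's a sketch of what "document the smart idea in the code" can look like. The algorithm is a standard one (Welford's single-pass variance), and the docstring's file paths are made up for the example:

    def running_variance(values):
        """Single-pass sample variance (Welford's algorithm).

        Smart idea: keep a running mean and a running sum of squared
        deviations, so the whole dataset never has to fit in memory.
        The derivation drawing lives in docs/welford_sketch.png and a
        short screen recording in docs/welford_walkthrough.mp4.
        """
        mean, m2, count = 0.0, 0.0, 0
        for x in values:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        return m2 / (count - 1) if count > 1 else 0.0

    print(running_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))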

Most people don't know much about psychology, so they believe that because they know it today, they will still know it in six months' time. Or worse, they fear that if they document their work they could be fired (this is the mentality of weak programmers who know they are weak; hopefully you won't work with these people, and if you do, quit as fast as possible).


Simple code is simple to fix, but without some layers of abstraction there will likely be so much of it that understanding is hampered along a different dimension.

Quantity of code can be reduced by increasing its complexity; somewhere between the two extremes lies your ideal: code that is neither so dense that you can't read it any more, nor so verbose that you're overwhelmed by sheer quantity.

It's never black-or-white, it is always a trade off.


I am surprised the author calls himself a "programmer" and judges that "... Many programmers have no real substance in their work - the job is trivial ...". On what is such a judgment based? Personal experience? Then it is totally biased.

He also says that "A scientist has his science to worry about ...", which is quite demeaning of the job a programmer has to do in comparison: the programmer has to understand his own field AND the domain, i.e. the scientist's field. Sure, it might 'only' be to get the knowledge from the scientist. But doing away with software abstraction for the sake of writing simple (simplistic) code hides the fact that if the code is ever to be worked on in the future, it will be a giant spaghetti monster.

The author might as well write VB macros in MS Excel. :p


Yet in reality, I work on chip architecture and hardware accelerators and have never written VB macros in MS Excel. Go figure.

(I did write one VB macro in VS 6 though, I think. Perhaps it was that incident that distorted my worldview.)


Sorry if my previous comment seemed a bit harsh; I guess your blog post was bound to take some flak after such an opinionated view of the programmers in your field (and the perceived generalization to programmers outside this field). That being said, you are possibly touching on a more fundamental subject regarding human behavior, where some of us are perfectly happy not learning as much as we can and just getting by in our daily job. ;)


Although I've not worked with much scientific code, I think the phrase "best practices" has a lot to do with why this phenomenon happens; those who were subjected to formal software engineering education will have been exposed to them, and unfortunately for many, the best in "best practices" leads them to believe that these are the ultimate way to do something and should always be followed. Instead of thinking and reasoning about the problem, they are taught to apply a set of "best practices", and that doing so will provide the best solution. When faced with a problem, dogmatic adherence to these principles replaces actual thinking, and groupthink further indoctrinates them into something resembling a religion. Engineering is all about tradeoffs when solving problems. There is never a universal "best" practice that fits all situations, so I think anyone who claims to be practicing or teaching "software engineering" does not deserve to be called a "software engineer" if the bulk of their thought process consists of regurgitating "best practices".

On the other hand, those who haven't will approach the problem from a completely different perspective: they'll be primarily concerned with solving the problem itself, and will tend to use actual thought rather than relying on memorised practices. It's true that this can lead to "bad code" depending on their skill level and knowledge, but they'll also be far less likely to overcomplicate things and more easily understand and use abstractions appropriately. One of my favourite examples of this is the demoscene; it's comprised of programmers, most of them young and self-taught and never subjected to formal CS education or "best practices", who manage to do pretty amazing things with software and hardware. That's why I believe these people, the ones who learned "bottom up" from the basic principles of how the computer works and how it can be programmed, can eventually produce better code than the "indoctrinated software engineers" who were taught more on what to think than how to think.


Respectfully, I think the author got the reason wrong. In software, there are inherent problems, and non-inherent problems (as observed by No Silver Bullet). Scientists, when writing scientific code, can only encounter/create non-inherent problems: local (or relatively local) bugs in their code. Programmers, otoh, are employed in order to tackle (sometimes successfully, sometimes not) the inherent problems, which mostly distill to the problem of "scale". Note that most of the problems the author listed in those bullets may be described as "this doesn't scale".

So when the author is called to solve a problem created by a colleague, he either gets a very local bug in some scientific code, which is apparently easy to debug (I'm surprised about the concurrency stuff also being easy, but if that's the case - great for him), or a problem with a large code base, badly architected, which we all know is very hard to solve.

The author seems to imply that if software engineers would ditch this extravagance and start writing simple code instead, we would be better off - I highly doubt this. I mean, the code would certainly be easier to understand, but how much duplication would there be? How much more code would we have to understand?


Okay, here's my attempt to distill things a bit. Now, I'm not very old, but my time putzing about this field has taught me that there are two components to carrying on successfully: experience, and communication.

Right now I run a company where we do science. To do our science, we have to write a ton of code. Web stuff, server deployments, numerical analysis, machine learning, hardware description, USB drivers, data visualization, you name it. Over the years I've gotten pretty good at picking the right level of abstraction for the job, and it's served me very well. The key is to have foresight about your situation, and foresight only gets better with experience. When I see one of the less experienced engineers at work doing something dubious, it's usually very easy to steer them away from danger. That's the experience bit; somebody has to know better, and has to act on it.

But knowing better isn't enough. If you want your academic lab to use better coding practices, you knowing better and flatly telling them to change their ways will never work. You have to convince them that they want to do it. If you can't come up with an argument for the use of your other language, or design paradigm, or SCM software that is both factually solid and contextually relevant enough, then 9 times out of 10 you are probably not hitting on the right solution. That doesn't mean that your lab director who is refusing to change is correct in his refusal, but I'll bet that if you hit upon a solution that was right for the situation, and put some thought into your explanation, you would get a much better reception. The same principle applies when you're pitching a new methodology out in non-academic land; it's just that more often than not both parties' expectations are already closer to being in alignment.

Why does it work? Because most people in these types of environments, such as academic coding circles, are really quite smart. If something is sensible and sufficiently low friction, they will probably go for it. So, no, bad scientific code doesn't beat 'best practices'. But a good solution beats a bad solution every.fucking.time. If you can't show people that it's a good solution, it's probably not the right thing to do. Even if your idea is the better one, if you haven't convinced the people who are going to have to deal with it of this fact, they won't understand it, they will misuse it, mess it up, screw up their research and they will blame the software engineer. And they'd be at least partially correct in doing so, because you only painted half the house.

I'd end by saying 'It's really not rocket science', but... it really might be. ;-)


I think that the 'raw coding' style the author sees in scientists is desirable because that's essentially the base level of programming that even software engineers think in. But then software engineers jump one level higher and try to organize and abstract things. Not everyone can effectively write complex code on that level, and it's also an area where everyone has their own opinion on best practices or style.

Personally I often tend to write messy and 'not best practice' code until I know what direction I'm headed in. Then I refactor. I recently turned a few hundred lines of if-statements into a sensibly organized piece of code.

But I think the problem here is bad software engineers. And I don't think we should be apologists for poor coders (scientists or otherwise). If an organization accepts mediocre code from non-software-engineers, that's fine and more power to them. But I don't think it's good to encourage poor programming any more than it is to tell kids grammar and spelling don't matter because it's better to read short slang phrases than long sentences no one will understand anyway.


Alex Papadimoulis (from The Daily WTF) wrote an interesting short essay on exactly this problem, why this is caused and how you can detect and prevent it in yourself.

Programming Sucks! Or At Least, It Ought To: http://thedailywtf.com/Articles/Programming-Sucks!-Or-At-Lea...


The defeatist attitude in that article is interesting. I disagree. It's just a hard problem, and when we try to solve it we often fail and make things more complicated. That doesn't mean the problem is unsolvable or that all of the tedium is inherent and irremovable.


So he's comparing the minor faults made by people writing single-use scientific programs to the excesses of the worst of 'enterprise' style coding.

It's not really any wonder he comes to the conclusion he does. It's a shame he doesn't know what software engineering is though. Hint - it's not about making things as complex, abstract and verbose as possible.


This rings true. Programmers are powerful, so programmers can do powerfully bad things. Non-programmers may write bad code, but don't (and probably can't) write powerfully bad code.


I also like to quote the following description of the problems generated by following apparent "best practices":

http://blogs.msdn.com/b/ricom/archive/2007/02/02/performance...

"The project has far too many layers of abstraction and all that nice readable code turns out to be worthless crap that never had any hope of meeting the goals much less being worth maintaining over time."

The problem is that once programmers learn something that is "hard" to them - that is, something that demanded a big investment from them - some of them start to believe that anything they touch will benefit from these hard-to-acquire techniques. That's how we end up with "far too many layers of abstraction" where "the nice readable code turns out to be worthless crap." There's a lot of code produced by following recipes, without questioning whether the recipes are appropriate to the problem.

Another problem is what I call the "religious approach" to programming and design: blindly believing and applying, without questioning, all that is written in some books. It's an interesting psychological problem that often ends up implemented in the code, frequently due to the "solitary" approach to design and code writing. If you have "architects" who don't look at the implementation and aren't ready to question and redo their own designs, you can be almost sure the result will be ugly and maybe even totally wrong.


This reminds me of a talk I attended at AGU about the huge legacy of code for climate modeling. The question was whether the projects should be started from scratch with better engineering principles. Check out the slides here. http://www.cs.toronto.edu/~sme/presentations/Easterbrook-AGU...

The goals of scientific code are often different from those of code written by software engineers. Having reproducible, consistent output for a given input is very important, and it's hard to move a bunch of Fortran to Python with confidence that the output maintains 1:1 precision on historical inputs.
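One way to build that confidence is a golden-output regression test: run the historical inputs through the old code once, store the result, and assert that the port stays within an explicit tolerance. A minimal sketch below; the two "models" are toy stand-ins and the tolerance is just an example:

    import numpy as np

    def legacy_style_mean(xs):
        # Stand-in for the quantity the old Fortran code computed
        # (naive accumulation, the way much legacy code does it).
        total = 0.0
        for x in xs:
            total += x
        return total / len(xs)

    def ported_mean(xs):
        # Stand-in for the rewritten NumPy version.
        return float(np.mean(xs))

    rng = np.random.default_rng(42)
    historical_inputs = rng.uniform(1.0, 2.0, size=10_000)

    # Bitwise equality across compilers and languages is usually hopeless;
    # pin down an explicit tolerance and run this check on every change.
    np.testing.assert_allclose(ported_mean(historical_inputs),
                               legacy_style_mean(historical_inputs),
                               rtol=1e-9, atol=0.0)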


TL;DR Software Engineers complexify the code because they have nothing to do and scientists do all the real work.

I was left with the impression that the author lacks a broad view of the software development landscape and thus tends to generalise badly.


Here's the problem

Programmers (and I see an example almost every day here on HN) don't know the math and equations (or the concepts) of even basic scientific calculations. Let alone some more complicated stuff.

Scientists, on the other hand: most don't know today's basic best practices, or the most modern techniques. We even have to be thankful they're not using Fortran (not that it's bad, but...)

But guess what: whoever can produce results "wins". In this case, that's the scientists with ugly code.


Speak for yourself. Some of us have had a reasonable amount of training in mathematics and various sciences as well as computer programming.

It's true that you don't always need this to be a programmer, but some of us have a decent grasp of some of this stuff.


"Speak for yourself. Some of us have had a reasonable amount of training in mathematics and various sciences as well as computer programming"

It should be obvious that if I'm pointing out the issue I'm aware of it, hence I know something about math and science.


Then... don't presume to speak for everyone else?

"Programmers (and I see an example almost every day here on HN) don't know the math and equations (or the concepts) of even basic scientific calculations. Let alone some more complicated stuff."

This is a really sweeping statement and really does not apply to everyone.


Well, true, in the way the sentence is written I am generalizing.

It was meant to have a "usually" after the first parenthesis and the "don't".

And just as I know programmers who have never heard of the Newton-Raphson method, I also know ones who know a lot about scientific subjects and mathematical methods.


The only reason I object is because people keep saying this stuff and it becomes accepted wisdom. Like "Programmers don't know the basics of science", "Software engineers always build an over-abstract, enterprisey-mess" or "people that studied computer science are only interested in solving esoteric technical things and have no view on business needs".

Whatever it is, it's starting to feel like there are a whole load of stereotypes building up that don't apply to me but might prejudice future work opportunities.


This might happen, but it may be easy to filter out in a CV/interview setting (especially if the recruiter knows what they're looking for) and, of course, in the job application itself (one of the reasons to tune up the CV and add relevant information to the cover letter).

I always made sure to get the point across, for example "oh, I see that your job opening mentions Finite Element Method and the area of numerical computation is something that interests me" or something similar for the other examples (if it's relevant to my case).


"We even have to be thankful they're not using Fortran (not that it's bad, but...)"

But what? If they happen not to be using Fortran, why should we be thankful?


How easily can you find a Fortran programmer as opposed to a Python/C/Java one?

How easily can you integrate a Fortran program into an existing workflow or piece of software?


Humans can only handle so much complexity. With a lot of scientific code the underlying concepts are complex, and this results in "simple" code. This is not a bad thing in the main, as the people who understand the concepts (the scientists) can understand the code - if you show some biologist elegant code, they end up spending all their time trying to understand what the code is rather than what the code is trying to do.
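A toy contrast, with invented numbers: both versions compute the same per-gene average, but the second one reads like the calculation the biologist already has in their head.

    from statistics import mean

    expression = {"geneA": [2.0, 3.0, 4.0], "geneB": [1.0, 1.5]}

    # "Elegant" version: the reader has to decode zip/map before they can
    # even see that this is just an average per gene.
    means_clever = dict(zip(expression, map(mean, expression.values())))

    # Plain version: mirrors how the scientist thinks about the problem,
    # so they can check the science rather than the syntax.
    means_plain = {}
    for gene, values in expression.items():
        means_plain[gene] = sum(values) / len(values)

    assert means_clever == means_plain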


I think I have a unique perspective on this, since I have both a strong CS training as well as a strong scientific computing training. I can look at code both from the perspective of producing useful scientific results as well as from the perspective of code quality (readability, maintainability, etc).

Scientific computing is a unique problem in programming, different from what I think software engineers are accustomed to solving themselves. Performance is often extremely important - it trumps almost everything - yet changes need to be made to the code frequently, which often necessitates rewrites. The software engineer's solution to this is abstraction: hide the details of conceptually unrelated components so that changes can be made in isolation without needing to propagate them through the whole source. Unfortunately, throw in things like paper deadlines and abstraction quickly becomes a goal that contradicts performance. Since the kinds of changes one needs to make to scientific code can rarely be predicted (it's research; by definition we don't know what the results are before we get them), spending a huge amount of time on an extensible framework is almost always a waste, since we can't design it for the unknown (believe me, I have seen dozens of these, and none of them have widespread adoption in research, despite being of very high "code quality").

Therefore an accepted practice is to write short hacks whose rewriting, if that becomes necessary, will be as painless as possible. This is of course a generalization; not everybody does this. I'm just giving the reasoning behind this kind of code and why modern practices are often ignored. The usual assumption is that this neglect comes from stubbornness or a "get off my lawn" mentality on the researcher's part, but in my experience it usually doesn't.
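For what it's worth, the "cheap to rewrite" style often looks something like the sketch below (the physics here is a meaningless placeholder): a flat, boring driver plus one isolated kernel, so that when the research question changes, the rewrite touches as little code as possible.

    import numpy as np

    def kernel(state, dt):
        # The only performance-sensitive part; when the model changes
        # next month, this is the function we rewrite or re-profile.
        return state + dt * np.sin(state)

    # Flat, boring driver: easy to read, and cheap to throw away.
    state = np.linspace(0.0, 1.0, 100_000)
    for _ in range(100):
        state = kernel(state, dt=1e-3)
    print(float(state.mean()))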


I agree that professional programmers often overestimate the benefits of their hard-won expertise and their beloved ideas about programming in general, and underestimate the value of being immersed in a specific domain. Here's another essay that makes the same point: http://prog21.dadgum.com/190.html


I think this is an experience thing as well.

To me this post sums it up well: https://medium.com/p/db854689243


They keep saying that drinking lots of sugary drinks is bad for your health.

But I've seen people who drink lots of sugary drinks, and I've seen people who eat greasy burgers and nothing else, and let me tell you, the people who eat greasy burgers are much worse off.

It is therefore important to understand that drinking lots of sugary drinks beats eating only greasy burgers.


I work at a scientific institution where scientific programming is our bread and butter (www.ucar.edu) and has been for decades. There are people here who are trying to make progress in the culture clash that Yossi describes. In particular, the UCAR Software Engineering Assembly holds an annual conference to tackle some of these thorny problems. This year's conference is over (https://sea.ucar.edu/conference/2014), but hopefully it will be back next year. The conference had sections on best practices and software carpentry. Also, I am seeing a sea change where young scientists fresh out of grad school are actually pretty good programmers, making traditional software developers (with little hard science background) somewhat obsolete.


I am a physics grad student and the paper I just submitted involved a lot of code (mostly fitting in Python plus plenty of LaTeX). I'm really trying to beat the stereotype here and write documented, maintainable, version-controlled code. I see plenty of the bad code happening all around me. Research is supposed to be reproducible, and the way the code that goes into the analysis is typically handled often runs counter to that goal.

Once I put the paper on the arXiv I open sourced everything and made a portal with links to all the repos. The only thing missing is the repo for the source of the submitted paper which I'll add once it makes it through APS review.

Here is my work if you're interested or have any feedback:

http://evansosenko.com/spin-lifetime/


Scientific code is often about solving ill-defined problems, working interactively and refining the work along the way. This is a different job than developing an application for end-users.

I am quite confident that, most of the time, scientific code is pretty bad. But I am also pretty sure that even in industry code is not so good anyway.

My worst experiences with scientific code were less about the quick-and-dirty approach than about deliberate obfuscation and ego-coders who are more concerned with impressing others than with writing simple and effective code.


From the security point of view, I can confirm this. The team I was working on found a large number (100+) of exploitable binaries in Debian, most of them in scientific packages.


I work with scientists who self-trained in application development. They use design patterns and factory methods just like the next programmer because they have learned what is practical. We all started out writing shitty code but some of us had a need and a venue to refine the skill to the point that our employers could sell our work products. Researchers rarely see that pressure.


What is practical in the short term is usually not practical in the long term in my experience.


There is a strong, growing movement to improve scientific code. As two quick examples, the WSSSPE series: http://wssspe.researchcomputing.org.uk/ and the Software Sustainability Institute: http://software.ac.uk/ .


The takeaway is not that software engineering is bad. The takeaway is that it can be misapplied. People can write spaghetti horrors using patterns every bit as unnecessarily complex as the worst maze of gotos.

The takeaway is that your programmer had better be aligned with your goals, not programming for programming's sake.


[This is a sketch]

Lately, I've been thinking about the act of writing in a programming language as the act of writing. Which is to say I've been thinking about writing and wondering if it makes sense to think of programming as ontologically distinct. In other words, I'm wondering what it would be like if I were a lumper rather than a splitter.

As the push toward greater 'coding literacy' gains ground -- in the anglophone world, at least -- there will simply be more marginally literate code, for the same reason there is more marginally literate writing as a result of seeking universal literacy. A goal of functional literacy tends to produce the functionally literate, not a society in which everyone is a person of letters.

Programming as just another form of written communication suggests that perhaps programmers are being trained [programmed ?] in a manner that predisposes them toward inflexibility as writers [1]. Imagine if Creative Writing graduates entered the workplace believing that everything must be written in meter. Imagine, having to maintain iambic pentameter or extend a sonnet. As bad as that sounds, adding a protagonist to a blank verse epic is probably a closer analogy.

Programming as just another form of written communication suggests that its rise in the academy has encumbered the minds of 'some people' with values closer to those of literary theory than practical empiricism. There aren't 147 accepted patterns for beam design. There are three - LRFD, SoM, and pure empiricism from tables. Civil engineers are trained to apply them in order of increasing complexity because civil engineering training values simple standard solutions, not creative ones.

The difference between engineering training and creative writing training is that the engineering ideal is doing it right the first time, not progressive refinement toward the great American novel through iterative rewrites - and editing. An engineer learns when a Fink, Howe, or Warren truss is appropriate. Civil engineering culture says "it's ok to solve simple problems", not "every problem is a prototype truss design problem." Treating every problem as the chance to create a snowflake is what the Creative Writing department does.

We can call programming "software engineering" or "computer science", but it's still pretty much just writing. What makes it unique is what makes all writing unique - the peculiarities of the audience and programs I will admit have some particularly peculiar members of their audience - computational devices whose peculiarity dictates certain conventions [keeping in mind that the goal of writing is effective communication].

What the article hits is that these conventions become stylized and that classically trained programmers fall into the practice of using style as a starting point.

Of course this is a sketch of an idea that's kicking around in my head, and as I've written it I've got the impression that despite all the bigO one finds in computer science departments, the process of constructing software systems is pretty much entirely empirical. That's why there is no SoM or LRFD - bigO is just the kernel of an SoM analog. What's missing is the time and experience which ground the field in humanist values. It will take time before the idea that the bridge is built so that people can cross the river becomes the obvious driving force.

[1] Not all mind you, or any necessarily. I'm just accepting the characterization in the article as fact.


Do you have thoughts on literate programming? This is turning programming into the act of writing. I find it to be very liberating.

Many of the sins mentioned in the article can be addressed with good use of literate programming techniques. In particular, it makes reading the code a joy.


I'm wondering if good coders are just good writers, and "a good coder" is simply what we call someone who writes well in a particular genre in the same way we call someone "a good poet."

To put it another way, we see a continuum between the acts of writing a product manual and a textbook and a news feature and a short story and a novel. Why are we convinced computer programs are ontologically different?

All that said, Literate Programming often seems overdone to me. Stripped down to its essentials, code can be crafted like poetry. Literate programming is only literature when being crafted as a literate program allows something literary to be expressed that can't be said without untangling the knot of procedure and weaving it into a document.


The problem with literate programming - which I absolutely love myself - is that you demand that people be both good coders and good writers. From my personal experience, there certainly are people who fit the bill, but they are a minority. Being (or hoping to be) one of them, it took me a long time to understand that not everybody is able to write good prose and good code at the same time... and that there are even people utterly incapable of one of these while still being proficient in the other.

Having said this, encouraging literate programming practices is certainly a good idea, just don't expect it to be a silver bullet for code quality issues.


You might really enjoy "Critical Code Studies."


So long as my above thesis - roughly, that all programming is driven by literary theory - holds, I can still try to write code. As much as I enjoy writing for humans, I enjoy writing for computers too.

After I wrote that, I wondered how alluring literary theory is among computer scientists:

   Knuth -> Literate Programming
   Abelson & Sussman -> Programs must be written for people to read
   Guido van Rossum -> The Python Way
Literary theory is a siren. The web offers the potential for a vast salon. Programming is steeped in aesthetics.


I actually almost went and got an English master's after my CS undergrad. I've found that there are more people from the humanities who are interested in code than there are coders interested in the humanities, though there are a number of us :)


Hi Steve, I want your take on Tara McPherson's thesis that sexism and racism in computing are rooted in the modularity ("separate but equal") of UNIX kernel design.

Original paper: http://dhdebates.gc.cuny.edu/debates/text/29

One Synopsis: http://samplereality.com/gmu/digital/2012/09/11/race-and-tec...

Follow up discussions: http://vanilla.dhpoco.org/discussion/26/week-2-readings-and-...

My take: I found the paper highly interesting and would like your thoughts on it since it's relevant to what you're talking about. I find her sourcing of Eric S. Raymond and his writings really interesting. ESR comes off as anti-authoritarian (the hacker glider, "special") and also very tribalist (open source, GNU). On one hand, the presumption of meritocracy as a given in IT is really highlighted in ESR's manifestos; on the other hand, he really belongs to a subset of "old school" hackers, along with Steve Levy and Richard Stallman.

I'd be highly interested in a socio-literary critical analysis of code and blog posts by DHH, PG and patio11, to reconstruct the attitudes of the "Entrepreneur/Ruby" generation and ask whether the debate around, and adoption of, new frameworks/languages/paradigms (Go, Ember.js, Rust, Responsive Design) is a new form of elitism and/or marketing for software developers as a consumer class. Also, whether the shift from centralized servers (Sun SPARCs/Windows NT) to a more distributed programming model (Node.js, Celery, RabbitMQ, Hadoop) mirrors the fragmentary nature and autonomous organizing of modern living (contract vs. full-time tenured employment, delayed marriages, millennials' definition of hooking up, and founder dating).


I was actually physically in attendance when McPherson presented this paper, long ago. Well, at least, she presented it at my university; I don't think it was the first time she revealed it or anything. I also own a copy of this book somewhere...

My take at the time was that McPherson wasn't making a causal or even correlative claim: just showing that this similar social structure appeared at the same time in two completely distinct situations. Nothing more than that. Which ends up being _interesting_, but not something that you can directly _use_ in any day-to-day way. That said, I do think the 'modular form' makes a lot of sense, and it has impacted some of my thinking. For a recent variant of this, you may enjoy Galloway's criticism of the OOO people. He draws a sort of similar conceptual schema with regards to {Ford,Taylor}ism's modularity and OOP, as well as the anti-humanism of OOO.

Then again, re-reading just now:

> In short, I suggest that these two moments are deeply interdependent.

Sooooo maybe that assessment was off. I'll have to give this another read. I guess I'm currently in agreement with your 'summary' link.

> My take is her sourcing of Eric S. Raymond and his writings to be really interesting.

Yeah. I mean, it's the logical place to go. And if you want to talk about the intersection of programming and race, well, esr is one of the more embarrassing sources you can pull up. She didn't touch on any of that, though, which I was pretty surprised about.

> I'd be highly interested for a socio-literary critical analysis of code

I'll take only the first half of this sentence to plug my own project: set up an email reminder to check out http://metaphysics.io, which I recently purchased for this express purpose. I'm still working out exactly what the project is, but I have four initial essays that I've drafted up so far. I think I'm going to release them as "issues", rather than one at a time like blog posts. Three of them are explicitly computer science related and one is not. The first issue deals mostly with affect theory, because I've been enamored with it lately.

You may want to check out what Kevin Brock has been doing lately around the intersection of rhetoric and code. There are other people who touch on this from time to time too, one of the most recent: https://www.destroyallsoftware.com/blog/2014/tdd-straw-men-a...

> whether the debate/adoption of new frameworks/languages/paradigms (Go, Ember.js, Rust, Responsive Design) is a new form of elitism and/or marketing for software developers as a consumer class.

This idea is super interesting. I want to describe it as much more... cynical than what I think, but it may in fact be true.

> Also the shift from centralized servers to a more distributed programming model mirror the fragmentary nature, autonomous organizing of modern living

I will most certainly be writing some thoughts about this in the future. I can almost see a parallel to McPherson: two modes of organization, springing up in the same time and place.


Thanks Steve for your reply.

I've bookmarked metaphysics.io and read the TDD/rhetoric article.

I see his point regarding people using "lores" (like ESR's writing or 37signals) to promote their ideas. The funny thing is, had I encountered this article randomly on HN, I'd not have registered the whole "lore" bit and would have skipped right to the millisecond benchmarks the author presented and the code snippets. But the truth is, by invoking the lore, the author has already primed me to think that the TDD mocking practice is somewhat flawed; the code snippets themselves just become window dressing. And yet I'd have come away satisfied with myself for looking at the numbers "objectively".

Also looked up Galloway and remembered that I skimmed a bit of this 4 years ago: http://www.amazon.com/Protocol-Control-Exists-Decentralizati...

I found it highly interesting that the book was published at the dawn of Facebook (April 2004), with Galloway focused on the governmental and commercial cabal (Oracle, Sun, Microsoft, Cisco, Network Solutions) driving networking protocol standards, Java standards and RFC committees. Using Foucault's panopticon as the device, he painted an almost "1984"-like dystopia. Of course, in 2014, where we have BuzzFeed, Facebook, Instagram and Snapchat combined with the NSA and Edward Snowden, but also an MSFT led by an Indian guy who wants to run the .NET ecosystem like Apache, I'm not sure whether we are living in a Stallman-Communist state, an NSA-Authoritarian state or a Soma-Facebook-Brave New World state.

What's more interesting to me, however, is a programmer's role in all of this. Steve Levy's "Hackers," "2600" and cyberpunk seem far away from today's world of HBO's "Silicon Valley" and accelerator culture, just like the 80's. I'm hoping that grunge will happen in response to the Sex Pistols, though.


The advantage coders have is that they have an easier time finding paid employment writing. So if nothing else, they tend to be better typists a few years after graduation...and perhaps more attentive to punctuation.

The "learn to code" meme is running rampant. But it's going to be hard for former humanities majors to make the jump. They tend not to have the math background, and so long as programming is treated as engineering or science, that will be a barrier to entry.


I haven't found it to be that big of a deal, and I've spent lots of years teaching people. It's true that some background in discrete math/logic is incredibly helpful, but I think it's pretty rare for a math background to be needed for non-theoretical computer science.

I'm actually teaching a programming workshop at this year's Cultural Studies Association, so we'll see. :)


No other domain allows such concise examples for illustrating concepts. And no other domain locks so many people rigid with fear: merely introducing numbers can evoke memories of failure experienced as a tween. As a form of illiteracy it separates the afflicted from the field's important navigational landmarks and useful abstractions, and limits what can be communicated to the peculiar portion of the audience.

I guess my literary theory is rooted in the idea of a classical canon. There's forever to learn more maths, but learn more a programmer must, to communicate ever more clearly.


Not having a strong math background is a substantial disadvantage to learning computer science, but practical programming isn't computer science, and there's quite a lot of material available aimed at teaching programming that doesn't assume a strong math background and doesn't try to teach CS as such.


Another problem in many research settings is compensation. Labs/centers often cannot compete with industry on pay. They then assume that experienced professionals are willing to take a substantial pay cut to "do science".

In reality, the opposite is true. You are asking someone to forgo like-minded software people at Large Software Corp. and instead work with peers who largely view software as plebeian and inferior to their science (see the last 2 paragraphs of the article!).

These under-paying research labs end up with bad engineers, or young engineers who now have zero mentorship opportunities (just as bad as a bad engineer).

The take-away is that scientific labs should expect to pay slightly more than the going rate for mid-career software engineers.


> The take-away is that scientific labs should expect to pay slightly more than the going rate for mid-career software engineers.

Whether or not they should do this (I've been in a number of labs where a skilled software engineer is highly valued, as are many other lab technical staff members), this isn't going to happen. "Slightly more than the going rate" is pretty grant-breaking.


> "Slightly more than the going rate" is pretty grant-breaking.

That's probably correct. But it's somewhat disingenuous to sample from the lower end of the distribution and then make sweeping statements about the profession.


I don't follow.


Most of these gripes sound like "programming is hard" to me.


It's not so much scientists vs. programmers, it's more about experience.

If you write a lot of software, hopefully you're getting better at it, learning new skills, applying them, etc. Abstraction is like calculus: if you know it, it greatly simplifies things; if you don't, it mystifies them.

More experienced software writers generally write at a higher level of abstraction.


I would amend your statement by saying that more experienced developers are generally proficient at writing at a higher level of abstraction but, more importantly, are also better at choosing the correct level of abstraction for the job.

I think that's really the key. When I peek into a well designed system that really fits the bill, the feeling I usually get is 'how sensible', not 'wow, this is a really great way to abstract the problem!' or 'this is kind of complicated, but very well designed!'.

The same sentiment applies pretty much everywhere, though. Humans draw their intellectual power from the ability to generalize, and so we see the same phenomena across disciplines. Writing? Most really effective writing is carefully constructed to include just the right statements, using just the right words, and nothing more.

Circuit design? Same thing. You can wire a circuit that will work (or not) in many, many ways. But when someone with experience runs across the work of an amateur, even if it's a functional circuit, it's very rare that he/she can't list off a bunch of ways the design could be fundamentally improved.

Carpentry? Yup. You can hack and nail and drill and glue something together that will work. Or--and I've seen this a lot--you can get a mechanical engineer to produce a profoundly complicated and structurally sound deck made with laser cutters and brackets made on a 6-axis CNC mill, but a seasoned carpenter can produce a much more elegant solution using a skill saw, a drill and a pencil.


Yeah I think you hit it on the head... the right level of abstraction.

This is why I'm convinced LOC is one of the best measures of design: if a given design results in fewer LOC for a given problem, it's the winner.


I think that Yossi's point was that the non-programmer writes at a very low level of abstraction and that this is a great virtue. Although there is a great deal of nuance to this debate, I fall more and more on his side of it.

I think that one of the biggest problems with this debate is the numerous examples of exceptionally powerful and useful abstractions we use every day.

Programming languages are a wonderful abstraction over assembly which is a marvelous abstraction over machine code. Compilers leverage this abstraction to produce exceptionally fast machine code that few humans could ever match.

The file system is a fantastic abstraction over a broad range of very complex storage media; you can even pretend a hard drive on another computer is part of your local file system.

So abstraction is clearly the most powerful tool in our toolbox. It's what allows us to climb above the Roman Numerals of computing.

I think the thing that is lost in this debate is the number of failed abstractions. Each of the abstractions listed above was hard won and is painstakingly maintained by dedicated developers. Even within their ranks there are countless failed examples: file systems that were unreliable and programming languages that made life harder than it needed to be. The other feature of those abstractions listed above is that there are few of them and we spend a large amount of time learning about them. As a Java developer I have invested a significant amount of time learning the details of Java garbage collection. Now there is a wonderful abstraction that has some sharp, spiky corner cases.

We can look at another class of common abstractions: third-party libraries. I will use Spring as an example. The list of abstractions explodes, each one appears unique, and I know developers who swear by Spring and use it extensively yet struggle to answer concrete questions about it. Vast quantities of unreliable and very hard-to-read code have been built on top of these kinds of abstractions. (I note reluctantly that plenty of reliable software has been built on top of Spring too, and there are developers who have a very good grasp of at least some of it.)

Finally, let's survey one last category of abstraction: the ad hoc, per-project abstraction. This is what Yossi is actually talking about in his article. In enterprise Java these are typically a swirling mass of classes whose names are concatenations of various design patterns. They tend to be of very low quality; the programmer who wrote them had deadlines and has since moved on. They are 100% unfamiliar, they are 100% awkward and, like a man lost in the desert desperate for a drink of water, I find myself yearning for a concrete instantiation and a plain old method call.

My personal feeling is that, yes abstractions are extraordinarily powerful, but only when they are good and only when we understand them. I think producing new and useful ones is always harder than we expect. It's good to remember that we didn't replace roman numerals with ten thousand unique numeral systems.

YMMV :)


> My personal feeling is that, yes abstractions are extraordinarily powerful, but only when they are good and only when we understand them.

Certainly. Saying "I have created an abstraction" only works in conjunction with "I actually understand the problem on an abstract level, and I am familiar enough with the existing codebase." However, thinking that "amateur code" - with bad or no data structures ("I actually want to use a dictionary, but let's have a 15-case if/elif/else chain instead"), 5-screen-long functions, badly named variables and functions, and state all over the place - is "better" is folly. It isn't; it's just as terrible if not worse, and usually extremely brittle. Not to mention usually impossible to unit-test properly.
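A tiny made-up example of the dictionary point (the detector names and calibration factors are invented):

    # The if/elif version grows a new branch every time a case is added...
    def calibration_if(detector):
        if detector == "north":
            return 1.02
        elif detector == "south":
            return 0.97
        elif detector == "east":
            return 1.10
        else:
            raise ValueError(f"unknown detector: {detector}")

    # ...while the dictionary version keeps the data as data.
    CALIBRATION = {"north": 1.02, "south": 0.97, "east": 1.10}

    def calibration_dict(detector):
        try:
            return CALIBRATION[detector]
        except KeyError:
            raise ValueError(f"unknown detector: {detector}") from None

    assert calibration_if("north") == calibration_dict("north")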


Dictionaries are a great example. They are a useful general-purpose abstraction. Oftentimes programmers will develop a hierarchy of classes that, once you get to the root of it, is just a dictionary with some window dressing to hide that fact. Sometimes it's easier to expose the dictionary and code the special cases directly. The programmer may fret about tight coupling, but if you know you're never going to replace the dictionary-based implementation, it can be a lot easier to understand, especially if it's one-off code. And when dictionaries are provided by your language/libraries, you don't have to unit-test them at all; you assume they work.


But now your program is weakly typed. Using the same logic, maybe your id is just an integer, but having an "ID" type will prevent bugs in the future, just like "dressing up" the dictionary with a meaningful name and exposing only the minimum set of methods necessary will keep people from being tempted to, say, add things they shouldn't to your dictionary. Not to mention that strong typing has documentary value.
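A minimal sketch of that kind of "dressing up" (all names here are invented): a NewType for the ID plus a thin wrapper that exposes only the operations callers should have.

    from typing import Dict, NewType

    SampleId = NewType("SampleId", int)   # an ID is not just any int

    class ResultStore:
        """Still a dict inside, but callers only see the operations
        we actually want to allow, and the types document intent."""

        def __init__(self) -> None:
            self._results: Dict[SampleId, float] = {}

        def record(self, sample: SampleId, value: float) -> None:
            self._results[sample] = value

        def lookup(self, sample: SampleId) -> float:
            return self._results[sample]

    store = ResultStore()
    store.record(SampleId(42), 3.14)
    print(store.lookup(SampleId(42)))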


>>Complete lack of interest in parallelism bugs

Sounds like

>>The result is that you don't know who calls what or why, debuggers are of moderate use at best, IDEs & grep die a slow, horrible death, etc. You literally have to give up on ever figuring this thing out before tears start flowing freely from your eyes.




