Ugh... The type hierarchy of Python is painfully intertwined with the implementation of its type system. That implementation varies by Python implementation, and since the Python language is as old as Visual Basic, Borland Pascal, Werther's caramels, and trilobites, it varies dramatically over time. Add to that the fact that Python has a philosophy of wearing leaky abstractions proudly on its sleeve and barfing implementation-level details at its users.
Python really is one of the most complex, arbitrary, and poorly abstracted languages in popular use. I have no idea how anyone lives with it for anything except writing extremely repetitive, formulaic, and superficial code, because having to dig into its guts feels like a nightmare. Even Java's type system, as antiquated and simplistic as it is, is easier to understand and work with by comparison.
For all its criticisms, it's an unusually good language for just getting shit done.
Sure, you can build whatever you want in almost any language you want, but 90% of those will be more verbose, harder to read, require compile steps, and be 10x the LOC, and take 4x longer to write... You can have it; I'll keep writing Python.
> unusually good language for just getting shit done.
I agree, but only for very specific/small values of "shit".
A codebase with thousands of lines of Python is IMHO already stretching the limits of readability; when you get to the tens/hundreds of thousands of lines, it's definitely past them.
I've tried and used 10-15 programming languages in my days, and while I haven't used them all professionally Python is still by far my preferred language out of all that I've tried. TypeScript comes in at a close second, although it depends a lot on what you use it for of course.
I've managed to write both buggy and working code in all languages, and can't say that Python is more buggy than any other. Slower at runtime, for sure. But not less correct.
Not the person you replied to, but I was a big Python fan until I found Scala. I try to write code that looks like Python (which means avoiding some libraries that rely too heavily on symbol-heavy names), but the type system can help me check that I really know the things I think I know, and even in small Python scripts there are conveniences that I miss (e.g. case classes - though nowadays attrs is more or less equivalent).
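For anyone who hasn't seen it, a tiny sketch of what I mean by attrs closing that gap (the class here is made up):

    import attr

    @attr.s(auto_attribs=True, frozen=True)
    class Point:
        x: float
        y: float

    # __init__, __repr__, __eq__ and immutability come for free,
    # much like a Scala case class.
    assert Point(1.0, 2.0) == Point(1.0, 2.0)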
Correctness is not necessarily the issue. The hard thing with large Python codebases is that the lack of types and data definitions makes it hard to understand what data any piece of code is operating on. This may not be an issue for a project written/read by a single person, but when working on big team projects it's a different story.
Any statically-typed language will avoid this issue.
Python supports type annotations and static typing via mypy and co. I find statically typed Python is absolutely comparable to other statically typed languages. At least I don't feel much of a difference working with it compared to go, typescript.
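For anyone who hasn't tried it, a rough sketch of what that looks like in practice (the names here are made up); mypy flags the bad call before anything runs:

    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        age: int

    def greeting(user: User, shout: bool = False) -> str:
        msg = f"Hello, {user.name}"
        return msg.upper() if shout else msg

    greeting(User("Ada", 36))   # fine
    # greeting("Ada")           # mypy: incompatible type "str"; expected "User"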
I think Python is the language I've written the largest variety of programs in.
I recently tried to get a stream of controller locations from Steam VR.
I was stumbling around looking for information about that in any language and the thing became a breeze only after I found Python bindings for it.
I've never done commercial work in Python in my life. I did PHP, some Java, some Ruby, C# and JavaScript. But I keep encountering Python in stuff I do out of curiosity.
> I have no idea how anyone lives with it for anything except writing extremely repetitive, formulaic, and superficial code, because having to dig into its guts feels like a nightmare.
It’s interesting that Python powers an outsized chunk of contemporary scientific research.
Perhaps its fitness for purpose is a result of its messiness, or perhaps that messiness maps better to the way that many people think? Would a more rigorously-typed alternate-universe Python have been as successful?
> Would a more rigorously-typed alternate-universe Python have been as successful?
One of the reasons Python became popular was its reputation as “executable pseudocode”. Rigorous typing would detract from that.
Guido and the authors of Mypy clearly believe that now the language is popular, it can afford to add explicit types and become less like pseudocode. But I don’t think it would have become as popular had it taken this direction from the beginning.
> One of the reasons Python became popular was its reputation as “executable pseudocode”. Rigorous typing would detract from that.
Not necessarily, if it had good type inference. I write Scala that looks pretty close to Python (unfortunately the community as a whole makes extensive use of symbolic names, which I think is a mistake, but you can avoid those libraries), and with Scala 3's indentation-based syntax it can become even more so.
Hmm, now that I look at it my personal code is mostly gnarly functional libraries. https://github.com/m50d/plus-minus-zero/ is probably the project that's closest to something I'd do in Python, but I've still used a fancy reactive library and some functional tools for dealing with the explicit async futures.
(It's a quick calculator tool to compute your European Mahjong Association rank score based on your tournament results, and can import your existing results through some quick and dirty web scraping. The domain linked from github has expired, but it's running at https://adoring-khorana-c20461.netlify.app/ - Scala usually runs on the JVM but this build is set up to compile to JS and run in the browser).
> Would a more rigorously-typed alternate-universe Python have been as successful?
Yes, perhaps even more so. It certainly would've been easier for alternative implementations like PyPy.
Maybe even easier to make the transition to Python3 if the interface had been better designed.
But, hey, with all its faults I still think it's really great and deserves a lot of credit. That said, it could be a good time for Nim or some other new language to shine.
You make the common assumption that success is caused by some property of the thing itself, rather than by the most important criterion: being in the right place at the right time.
It would not surprise me at all if, in an alternate universe, the exact same language had been released a mere six months later and been forgotten into obscurity.
Microsoft is undeniably the giant it is today largely because at that exact moment IBM needed a cheap operating system for its line of personal computers; that Bill Gates had a family connection to IBM's leadership at the time and could help arrange that deal certainly also helped.
The success of many giant companies can be traced to a pivotal event that was sheer dumb luck — for FedEx, this event was actually gambling in a casino with what money they had left when they were close to bankruptcy.
You're right that that observation explains the cause of many things.
I fully agree that you can trace the shape and popularity of C# to many specific "place and time" business events.
However, Python is different. When it got traction it wasn't the only thing around, and its usage over the years follows a curve that no other language is following.
Because of that I believe there's more to Python's popularity than just the circumstances of its birth and development.
IMO it's because researchers like it: you can slap together some super sloppy, hacky mess of a prototype and it will usually work. Stricter languages stop you from generating pure chaos.
Researchers also love excel, Fortran, and Matlab. Let that sink in for a bit.
This is true. I primarily write research code as well as some code that goes to prod.
I love writing Python for exactly the reason you've mentioned. If something needs to be rewritten in a "proper" language, we've got guys who can do that quickly, and I don't need to worry about them that much. Did I mention those guys usually get paid half as much as I do? As long as things work, man.
> Would a more rigorously-typed alternate-universe Python have been as successful?
Maybe at some point we'll get TypedPython (like we got TypeScript).
In my opinion that's the way language design should be done.
First discover what semantics people want to use to express themselves, and if you get that right enough, then try to describe those semantics with a type system as rich as you need to let people express themselves more precisely.
Starting by designing the type system is too hard and leads to the kind of fiasco that kept Haskell in obscurity and spawned a whole slew of dynamic languages.
Although maybe we got better at making flexible enough type systems... Rust seems to be doing a lot of stuff right.
We already have TypedPython. It's called Mypy, and any organization doing serious or larger scale Python development should already have this hooked up to their CI.
Together with PyCharm you get almost the same level of fearless refactoring that only C# or Java can offer.
It's a bit shocking to see people not using it or not even being aware of its existence, because without it large codebases quickly become a steaming pile of unsafe mess.
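For reference, the setup is small. Something along these lines (the options and paths here are just an example, not a prescription):

    # mypy.ini (or the [tool.mypy] table in pyproject.toml)
    [mypy]
    python_version = 3.8
    disallow_untyped_defs = True
    no_implicit_optional = True
    warn_return_any = True
    warn_unused_ignores = True
    ignore_missing_imports = True

Then the CI job is basically `pip install mypy && mypy your_package/`, failing the build on any error.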
The same arguments I used to make against MATLAB are used against Python on HN. Yet both are as successful as or more successful than your favorite language (from where I’m sitting).
Sure! In Python 2.x, "class" was an instance of "type." In Python 3.x, "type" is an instance of "class." (This is, of course, an over-simplification of a much more complicated reality.)
Could you expand on that? I’m not sure what exactly you’re saying, because you’re not using standard or unambiguous terminology. (`type` is a thing, but “class” isn’t a thing, but rather a category of things, a syntactic construct.)
Python 2:
>>> class OldStyleClass: pass
>>> isinstance(OldStyleClass, type)
False
>>> class NewStyleClass(object): pass
>>> isinstance(NewStyleClass, type)
True
Python 3:
>>> class Class: pass
>>> isinstance(Class, type)
True
Old-style classes were kind of their own thing in their own space, but with new-style classes, classes are to `type` as instances are to `object`, and I don’t recall there being any major 2/3 difference beyond the simple removal of old-style classes.
The 'class' versus 'type' distinction here is actually arbitrary. The same C level code is responsible for the message that you get from 'type(whatever)' in both Python 2 and Python 3, but in Python 2 it drew a distinction between heap-allocated things (which were reported as 'class') and things that were not heap allocated (which were reported as 'type'). Non heap allocated things had to be created in C; heap allocated things were usually implemented in Python and were usually made with 'class X(base): ...'.
(This change was introduced in Python 3.0a5, bug #2565. Looking at the bug, this is a followup of making the type() of new style classes be reported as 'class ...', but preserving old behavior of type() for built-ins, done in 2001.)
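Concretely, the only user-visible difference was in the repr (reconstructed from memory, so treat the exact strings as approximate):
Python 2:
>>> class Foo(object): pass
>>> type(Foo())
<class '__main__.Foo'>
>>> type(1)
<type 'int'>
Python 3:
>>> type(1)
<class 'int'>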
Ah, gotcha. But I don’t think that that’s actually a difference—it’s just a slight terminology change around “type” and “class”, which I think may have been related to tidying up old-style classes. (That is: yeah, if you’re comparing old-style classes, it may have been a difference (I’m not certain), but I believe it’s completely superficial once you’re comparing the recommended form of classes for the last quite a few years of Python 2.)
As I recall, in the early 1.x days it wasn't apparent that the class/type dichotomy was a problem. It took a lot of work during the 2.x era to bring them together.
Thanks for the correction; it’s about five years since I’ve done any serious work in Python, and six or seven since I last did any involved metaprogramming, and I assiduously avoided old-style classes even then, and so I had completely forgotten about types.ClassType. Python 2.4 was the oldest version I ever worked with, also.
> I have no idea how anyone lives with it for anything except writing extremely repetitive, formulaic, and superficial code, because having to dig into its guts feels like a nightmare.
This is sort of why I like Python: it prioritizes getting something working today and trades off being able to build a complex system over several years. Java is a great language if you're going to build a large-scale application and want a bunch of developers across time and space to productively contribute to it. But business requirements change, and if you can get what you want done by using a series of applications that live for a few weeks or months, why wouldn't you do that?
Another commenter mentioned scientific computing, which is a good example. When you're evaluating hypotheses, it's important to be able to write new code with high throughput. That means you need enough power of abstraction to be able to have libraries like NumPy etc., but you don't want so much structure that it's hard to write 10 lines of code and validate or falsify your gut feeling and then move on. Python was built around a REPL; Java only grew JShell fairly recently, and Jupyter was built for Python, not Java.
And even when you're trying to reproduce old work, it's a lot more scientifically valuable to have someone be able to look at the paper (the spec) and write a new implementation than to say "Start with this 100 kloc Java codebase which we wrote for Java 1.2 but still works today." There are of course useful applications for that capability and it's great that Java supports it, but this isn't one of those applications.
I don't do scientific Python myself, but I support users who do and I do a lot of infrastructure work. For that, it's still much more valuable to be able to write new code and get rid of it than to work within an old and well-built system. If I want to answer questions of, say, what our storage use patterns look like so the business can figure out what to invest in, it's a lot more valuable to be able to take that problem statement and produce an answer quickly than to build a long-lasting application that can keep producing answers to variants of that question, because the next question isn't likely to look too similar in terms of implementation.
Or maybe put it this way - if you're the IRS, and you're writing code to process and verify people's tax returns, Java sounds great. You're going to want to keep the tax code from N years ago still working, and you're going to want this application to live many years. But for me, I have a little program to do the math for me to fill out my taxes, and that program is in Python, because I have no need to keep maintaining the script when the tax code changes (or when I move states), and I'm the only author. I make a copy of the Python file each year and I drop anything irrelevant. The ability for me to read the entire script top to bottom and be convinced "OK, this solves this one task accurately" is far more valuable than the program being able to solve hundreds of tasks it needed to solve in the past.
This is a traditional argument and I understand where you're coming from.
But look at where it got us. At every company where I've worked with data scientists, every piece of "data science" code is written twice -- once in Python, and again in another language. We had to hire at least one software engineer for every data scientist. There is an entire industry that targets "productionalizing" things written in Python by data scientists, because Python code is not production-ready code.
Python for education? Absolutely. Python for whiteboard interviews? Great. Python as a DSL for data science? Obviously. Python as a scripting language? Sure.
Python as a production-ready language for a growing company? I have spoken to people at lots of companies that started with Python, and then had to dedicate months or years for a full rewrite. And if they have people writing Python code, they have hired more people to rewrite that code immediately so it's not in Python anymore. I say "bye" to every interviewer who pitches me on a job writing Python: every time I have looked into an opportunity like that, it was an attempt to throw fresh meat at rewriting a creaky, unmaintainable Django monstrosity. This isn't me being an armchair philosopher, this is the industry around me.
I can agree that large projects in dynamically typed languages can be unwieldy without type hinting, but there are tools to make it manageable (Ruby, JavaScript, and PHP are no different with regard to typing).
On that list, I only have first-hand experience with Uber. The Uber entry on that list links to a blog post from Uber engineering. This four-year-old blog post says:
> We rip out and replace older Python code
How many other companies on that list also "use" Python the same way that Uber "uses" Python?
If you follow the link next to that text, it says that they're ripping out sync Python using Flask from their monolith and replacing it with async Python using Tornado in a microservice, though some teams are also exploring Go.
Which seems like an entirely reasonable way to use Python (no quotation marks needed), and exactly what I'm advocating - Python is a language where you can ship something today and reimplement it next year, also in Python, for the same engineering effort that you'd spend doing it once in a more highly structured language. Alternatively, you can reimplement it in another language. You can safely rip out and replace the original Python, because it's a language that optimizes for humans both reading and writing it.
And just about every place I've worked, business requirements are constantly changing, and the scale and structure of the company (and associated Conway's Law implications) are changing, and so code you write today is going to be tech debt in a year anyway. A language that encourages you to write outrippable code and makes it easy to replace it is your ally under these conditions.
(Put another way: Python is a language that is readable enough to avoid the https://www.joelonsoftware.com/2000/04/06/things-you-should-... trap, which is fundamentally about code that is so complex that a human can't figure out all of what's going on and the only safe way is to treat the existing code as a relic.)
Again, I'm not saying this is the language for everyone to use for all cases. There are cases where you want to make the code a little harder for humans to read and write so that the computer can help you with things. If that is indeed your use case, go write Java! But I think there's plenty of stuff you can call "production-ready" that doesn't fit this particular mold.
I wrote that I have first-hand experience at Uber because I was ripping out that Python. There is no more Python, and certainly no more has been added in the four years since that blog post was written. Python is only for scripting data pipelines and automation. Almost everything else is Java/Go.
Python isn't unwieldy only because of dynamic typing, but also because it eschews functional idioms. JavaScript is much better in this regard, and it makes the code much more maintainable.
Sure: `reduce()` was removed from the builtins between Python 2 and Python 3 (it survives only in `functools`). Here is Guido arguing that `lambda` and several other functional builtins should be removed as well.[1]
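To make the point concrete (a trivial sketch): the functional spelling still works, it has just been pushed out of the builtins, and the blessed answer is usually a builtin or a comprehension anyway.

    from functools import reduce
    from operator import add

    total = reduce(add, [1, 2, 3, 4])   # still works, but now lives in functools
    total = sum([1, 2, 3, 4])           # the spelling the language nudges you toward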
To the point about data scientists - there isn't really a way you could solely hire the software engineers, right? So there is value in Python allowing the data scientists to iterate much quicker than they could if they were writing in a "production-ready" language.
I agree there's work to be done in closing that gap so you don't need an extra software engineer for productionalizing mostly-working Python (and I'm excited about tools for managing large-scale Python - e.g., several of my coworkers are trying out MyPy, which I haven't personally felt too much of a need for but seems like it could help), but the gap exists precisely because you can write something working in Python quickly, and it's not so much of an investment that you'd feel bad throwing it away if it doesn't work.
The company that started with Python and spent years on a full rewrite made enough money with Python to survive those years. If they had started in a more "production-ready" language they might not have shipped at all, and if they did they might not have shipped the right thing.
And at least for me personally, as an infra person, the question I'm evaluated on at the end of the day (or year) is "Did the infrastructure work," not "Did you write production-quality software to support the infrastructure." Some of the most critical software I've written (across multiple companies) has been 50 lines of Python shared via a Slack snippet or a shared homedir and only retreated into version control months later. There are a lot of problems that genuinely require only 50 lines of complexity, and the ceremony of a language like Java makes it much harder to understand what's going on. For those problems that do require unmanageable Django monoliths, by all means, write it in something else.
Given that 90% of companies fail, maybe that's a good tradeoff. Build the thing that might work in Python, see if it gains any traction (the famous product-market fit), and then rewrite it if it worked. If Python saves you more than 10% of the effort on the first write, then on average it's worthwhile.
Thankfully, this rewriting has never been my job. Companies where I've seen it were typically targeting Spark as the execution environment, so the production languages were Java and Scala, at a ratio of about 2:1. PySpark in production was either disallowed up front or quickly disallowed retroactively after experiencing the magic and delight of data scientists shipping production PySpark.
I don’t dislike Python, but as far as I know reproducibility is a big problem in scientific circles (in both senses, actually, but here I mean copy-pasted code not running at all).
ML is famous for it and python is the most used language there.
Reproducibility would be a problem in scientific circles regardless of language. Researchers are negatively incentivized on reproducibility, to avoid getting scooped and the like.
You're right, reproducibility is a problem in science, all science and not just ML (again, as you say), so I don't think that proves anything wrong with Python.
That is true, but maybe Python doesn't help, with its primarily interpreter-based ecosystem (think of Jupyter), where a result may have only worked for the author because a variable they use refers to something that is no longer in the source code (but is still defined in the running kernel). This bit me more than once with Sage, for example, but it may simply be a problem with Jupyter and the way the scientific community writes code.
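A contrived sketch of that failure mode (the cell contents are made up):

    # In [1]:
    scores = [0.2, 0.8, 0.9]

    # In [2]:  (this cell was later deleted from the notebook)
    threshold = 0.7

    # In [3]:  still "works" for the author because `threshold` lives on in
    # the running kernel; a fresh Restart & Run All raises NameError instead.
    good_scores = [s for s in scores if s > threshold]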
I always love coming into these Python threads. It's extremely cathartic to know there are enough people who hate the language as much as I do, even if we are stuck with it for any number of good reasons.
All of the reasons we're stuck with it related to datascience and scientific computing are the exact reasons why we hate it. It's exceptionally fast to write code that does a very specific task without sparing a single solitary thought for any sort of software development best practices. It's incredibly approachable, which means it can be used by any keen scientific or mathematical minds who spend more time reading and writing whitepapers than they do writing code. The code, as an entity in and of itself, clearly shows that. But the _output_ of that code also shows that, which is why it isn't going to go away or be swapped out for anything else.
This is like the ideal breeding ground for further scientific or academic pursuits, and since everyone is standing on the shoulders of numpy/scipy/sklearn/pytorch-esque giants, you're stuck in the scenario of having to replicate all of these libraries in any other language. I can't honestly say it's worth the effort to replace the entire stack, and it's also not going to help when these same researchers continue to use python going forward and start using some other library that you didn't port yet.
And so, they will continue pumping out new papers and prototypes-masquerading-as-quality-libraries, and we'll continue roping them in and duct-taping fix after fix after fix on this kaleidoscope of horrors, and we'll continue self-medicating to dull the daily frustration of Python while also accepting that it's the price we're going to need to pay if we want to work with these brilliant minds with all their neat new ideas pushing forth such a breadth of new understanding from the data _we already have_. Or at least that's what we say when we try to ignore the growing bald spot and bursts of incandescent rage every time we get a "quick script that needs a little polish" that clocks in at 2000 loc, literally doesn't have a single function definition in it, and has variables that take on up to 14 different types depending on some global state and...
You know, I'm going to stop typing this. It's christmas eve and I wasn't planning on being this angry today
Edit: Some people weren't liking this; it may be because they think I'm being sarcastic when I talk about these brilliant people. I want to just clarify - I am not being sarcastic. They ARE brilliant. They're far more intelligent than I, and they're really pushing the boundaries of human understanding in mathematics and sciences. I acknowledge they bring a whole hell of a lot more good to the table than bad, and there's a damn good reason why I am going to be doing this thing I'm doing in perpetuity, or until I can't handle it anymore.
But that doesn't mean their coding practices aren't often _abysmal_, and I won't apologize for my frustrations in this and the language that just _lets_ them do it.
I'm a deep learning researcher. I have no idea what you're talking about. When I crank out some Pytorch code of questionable quality it's not because of Python. It's because I either don't know better, or don't care. Switching to another language won't change this.
Some languages force you into better practices. The extra overhead you may not want to deal with, and that learning curve, are exactly the kind of friction you want if you want to write code of less questionable quality.
Good Python looks a lot like what you would get from a compiled language, and every researcher I work with looks at it and goes "holy shit this is so much extra work I don't wanna do it"
And like I said, that's fair. The code is not the point, it's just a tool to get the data - that's the point.
I equate it to the one-time-use jigs woodworkers use. Some can get pretty fancy and be awesome, but most are just slapped together and will probably get chucked in the bin. I don't mean to shame anyone - there's a reason for it, and it serves its purpose.
It's when someone hands you that shitty-ass jig, built in a language that lets you do some pretty heinous things by design, that the frustration builds like crazy! I've seen some pretty gnarly Java and C# in my day too, and all I know is the worst Java is still an order of magnitude easier to handle than the worst Python I see on the regular. It's wild.
Can you please show me some examples of what good Python code looks like, and point out the overhead? Next month I will be working on a major redesign of a fairly complicated simulations code (Pytorch) with the goal of making it more flexible, and incorporating some new features. This code will be used by many others, so I want to follow good software engineering practices.
I don't have any offhand, but I didn't want to leave you hanging too long without saying anything either.
The two main things that get me all bent out of shape are purposefully disjoint types passed in as the same parameter, with a bunch of logic around handling each of them (often in very different ways), and never, ever checking the types or ranges of values.
If you start throwing in type hinting and make use of mypy, it keeps your own code pretty coherent. If you do need to have disjoint types coming in, spend a lot of time thinking about how you want it to work. It may be reasonably cheap to force everything into a single type from the many possible types coming in, which should simplify things a lot. If that won't work, consider wrapping any of these types in a composite object that unifies the _how_ of accessing the data inside the type into a single low-cost abstraction. Whatever you do, don't let the logic about how to operate over your abstract data input bleed into the logic of how you're building off of it.
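A minimal sketch of what I mean (all the names here are hypothetical): normalize the messy inputs once at the boundary, so everything downstream only ever sees one boring type.

    from dataclasses import dataclass
    from pathlib import Path
    from typing import List, Sequence, Union

    @dataclass
    class Samples:
        values: List[float]

    def as_samples(raw: Union[Sequence[float], str, Path]) -> Samples:
        # The only place that knows about the three possible input shapes.
        if isinstance(raw, (str, Path)):
            text = Path(raw).read_text()
            return Samples([float(line) for line in text.split()])
        return Samples(list(raw))

    def mean(samples: Samples) -> float:
        # Downstream code never plays the isinstance() guessing game again.
        return sum(samples.values) / len(samples.values)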
This is one of those things that Python makes hard, not because it purposefully stops you, but because it makes it _so very easy_ to spew implementation details through every single function call. It's easy, and programming is hard, and people have deadlines and it's something you can easily convince yourself you don't need to do, and suddenly you're writing bad code and nothing and nobody is there to stop you from it.
I'm a big fan of keeping my type hints inline in my code rather than in separate stub files (typeshed-style); I WANT people to see what types I'm agreeing to support. You don't have to agonize over documentation or dig 14 levels deep into my code to see what I'm handling. The goal here should be for someone to read my function signature and go "oh, okay, I know exactly what I need to provide to use this".
Another vexing thing that comes about ALL the time in dsci/ml code is single letter identifiers. A lot of this is because the paper says this variable is `p` or maybe even `epsilon` so... that's what the variables get named. I've even seen `f()`, `g()`, and `h()` in the wild, and of course there wasn't a lick of documentation around it.
Unless your audience is only ever the people who wrote the paper itself, or those who studied it vigorously (more so than just reading it), these are terrible choices.
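A hypothetical but representative before/after:

    # straight from the paper:
    def f(p, eps):
        return p * (1 - p) < eps

    # the same check, readable without having the paper open:
    def is_low_variance(success_prob: float, tolerance: float) -> bool:
        return success_prob * (1 - success_prob) < tolerance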
This advice is not python specific - it's language agnostic, but I notice it most in Python solely because I run into it the most dealing with non-devs taking a stab at writing their first libraries (or, rather, a bit of polish on their initial prototype). But in general, write your code not so that it's easy to write, but so that it's easy to read, easy to reason about, and doesn't require a chain of whitepapers to understand. The goal is so that someone reading this later doesn't have to literally be you, at this exact moment in time, to understand what is going on. You want your code to live long past your current attention on it, so write it in a way that is easy for another maintainer to pick up and run with.
In the end, other languages make some of this table stakes - not the naming, obviously, but the types? Range checking? Handling error conditions explicitly? Python gives you all the rope in the world necessary to bind yourself into a knot with, so it's all up to the developer to do the right thing. When I lament python, it's not because it's inherently bad, but because allowing people to play fast and loose with the rules means you're going to find a lot of people who just don't give a shit.
You know, what you are describing sounds exactly like most production code I've inherited over the years. It's what imperative programming devolves into. Start at the top, run to the bottom. You see a lot less of this with compiled languages, and I suspect that has a lot to do with the additional complexity of actually compiling the code filtering out people who don't want to be bothered with details. Enjoy what's left of Christmas Eve.
It seems like a lot of criticism of Python is directed at the data science community but I'm curious what the alternative is here?
Is it surprising people trained in math are writing hard to maintain programs? Is biology+Perl any better (I'd argue worse).
What languages exist that are hard to write bad code in? I've worked at places that rewrite abominations of Java every 5 years because they are such coupled, unmaintainable messes.
JPMC has Athena, and Pinterest seems firmly committed to Python. I know some people who worked at Braintree (now part of PayPal) that heavily used Python.
Home Assistant and Ansible are two massive OSS projects
I'm hoping the default choice becomes Julia. For one, it's Lisp. Imagine tensorflow or whatever being distributed as one Julia codebase, no separate C++ to compile, no need because Julia can do it all. I hope it replaces R, Mathematica, Matlab, Excel VBA.
> What languages exist that are hard to write bad code in?
A language with a good type system (ex. Haskell, OCaml, Rust etc.) makes it hard to write bad code, because it eliminates a large class of potentially "bad" programs, and thus provides guarantees about the code that does type check.
If I have an arbitrary Haskell program, it might still be messy and hard to read, but I know I'll have certain guarantees about the code. For example, if a function isn't in IO, it won't have any effects, and replacing it with a function that returns the same values for the same arguments will be safe.
Additionally, the language makes it easy for libraries to have well-designed APIs that ensure the user doesn't forget to make the required checks. For example, removing null and forcing Option/Maybe for functions that might fail forces the caller to handle the failure case, instead of just forgetting to check for null, which is common.
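You can get a watered-down version of that last guarantee even in typed Python: with mypy's strict optional checking, the caller has to deal with the None case before touching the value. A sketch (names made up; it's not a real Maybe, but it catches the "forgot to check" class of bug):

    from typing import Optional

    def find_user(user_id: int) -> Optional[str]:
        return {1: "ada", 2: "grace"}.get(user_id)

    name = find_user(3)
    # name.upper()            # mypy: Item "None" of "Optional[str]" has no attribute "upper"
    if name is not None:
        print(name.upper())   # fine: the None case has been handled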
If they don’t have bad code it’s because it’s a bunch of enthusiasts who are good at coding. Realistically I’m sure there are even now examples of code that would be considered “bad” in any of those languages.
It's more the fact that these languages force certain best practices using their compiler. While Python can be great when used with best practices, its level of freedom makes it easy for the user to get away with obvious mistakes. It's great to have freedom like that when you're writing small projects, but in a large project it could lead to unintended effects that cause more damage than the time lost using a stricter language.
That’s all great in principle, but is there any evidence to support this claim?
Certain properties are of course enforced, so it's impossible for a project to do certain things that make the code harder to read. On the other hand, humans are so creative that I have a hard time imagining that, given time and a wider mix of talent, you won't still have code hygiene issues. Maybe not the exact problems another language might have, but certainly your own flavor will develop as developers have less contact with the core language enthusiasts who establish said best practices.
Again, happy to be proven wrong but I’d like at least anecdotes or some kind of evidence rather than a theoretical argument from first principles that completely ignores the human element.
Words like "any effects" and "safe" have broader meanings than the way they are used in type systems. I ask that you be a bit more precise in your advocacy.
I can replace a sorting method having O(N log N) time with one having O(N^2) time, without changing the type. But it will certainly have an effect on my system, and if the run-time is too long, may result in an unsafe system (eg, in a real-time control system which requires a maximum response time).
While I know what you mean, "bad code" includes things like unacceptable performance.
I also find it hard to understand how to apply type systems to algorithms. What is the type of an algorithm which returns a graph diameter, and how does it differ from one returning the graph eccentricity or other graph property?
(That's where "same values for the same arguments" comes in, but that's that hard part of the problem, yes?)
In a world where unsafePerformIO, error, etc don't exist, sure.
The reason most Haskell codebases are cleaner is because the culture manages to uniquely emphasize good practices such that people don't try to break abstraction barriers.
I'm sure it's possible to write Haskell code that "misbehaves" as much as python code does. Once smart people with no training are let loose on something, that'll definitely happen.
We're lucky that Haskell hasn't been attacked yet :)
> and a replacing it with a function that returns the same values for the same arguments will be safe
No. That is only safe at compile time but not at runtime.
Because the type alone often does not guarantee the meaning and semantics of the values in the specific context and program are the same.
I'm not sure I get the argument here. So long as the input-output mapping for the function is identical, it cannot change the behavior of the program. This is what I would consider safe.
The semantics of values are irrelevant because the function will behave identically regardless of those semantics.
This is only true if the type describes the whole meaning of the values.
If the exit code of a unix program changes from 0 to 1 it still returns an int, but the meaning of that value is a completely different one.
But the exit code can't change from 0 to 1. For any given input to the Unix program, if the output was 0 with one implementation, then it would always remain 0 with any other _equivalent_ implementation.
This is literally what is meant by "returns the same values for the same arguments". Any function that fits this description, by definition, cannot suffer from the problem you are describing.
I often think that stopping people from writing code might be the best way to avoid bad code.
Stricter languages like Rust, Ada, or even F# (quite strongly typed) will not let you compile until you reach a certain level of explicitness/correctness, thereby teaching you some basic things before you can even compile.
Languages like Perl and Python allow you to do a lot of things, which ends up unmaintainable because it's too easy. The easy version then just breaks at runtime.
I don't know if it's related to C classes at all, but an interesting semi-related tidbit: 2.x has a weird distinction between "classic" classes and "new-style" classes that inherit from object. In 3.x, everything descends from object.
And “classes in C” is really just a matter of agreeing on a convention. GObject and COM are object systems that have C implementations.
I suppose it’s a mental thing when you’re “below” the abstract object system instead of getting to treat them as tangible things unto themselves. Like how C doesn’t really have strings — they’re pointers or structs and so you don’t get any sort of encapsulation that’s not enforced by convention. You have to maintain the invariants or only use methods that maintain them.
Could you be specific? The article's targeted at Python 3 and I don't see anything that's really obviously Python 2 only. Did you mean the link in the sidebar maybe? That was written back in 2011 and I don't know whether it's still true or not, but it only talks about "new style" (at the time) classes so it still could be.