The article claims fixed point arithmetic is the “correct” way to do arithmetic in databases. In the general case? No, I’m totally unconvinced.
The reality is that any system for storing real numbers in the computer will be lossy. It’s up to the system designer to decide how much precision to store and where the loss happens. Fixed point numbers are still lossy - they just lose detail in very small numbers and they take arbitrarily more memory to store very large numbers. This makes sense for finance, but in a lot of scientific computing tasks or for 3D modelling the way floating point numbers work is great. There’s no right answer here across all tasks. Databases, like programming languages, should let you pick what sort of numbers you want.
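To make the trade-off concrete, here's a rough sketch in Rust (purely illustrative; the cents-in-an-i64 scheme is just one assumed fixed point layout): binary floats can't represent 0.1 exactly, so repeated addition drifts, while scaled integers stay exact down to their chosen smallest unit and no further.

```rust
fn main() {
    // Binary floating point can't represent 0.1 exactly, so repeated
    // addition drifts away from the intended total.
    let mut float_total = 0.0_f64;
    for _ in 0..1_000 {
        float_total += 0.10;
    }
    println!("float total: {:.10}", float_total); // not exactly 100.0

    // A fixed point scheme (here: whole cents in an i64) keeps money
    // exact, at the cost of a fixed smallest representable unit.
    let mut cents_total = 0_i64;
    for _ in 0..1_000 {
        cents_total += 10; // ten cents per iteration
    }
    println!("cents total: {}.{:02}", cents_total / 100, cents_total % 100);
}
```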
With integers / rational numbers you can have arbitrary precision (eg Haskell’s Integer type is an arbitrary precision integer). But it’s a trade-off: arbitrary precision integers are slow and big. Almost all of the numbers that matter fit within an i64, so I think it’s a fine default for most data. I do like the idea of choosing what happens when integers overflow though. Wrap-around is a spicy choice. Raising an error on overflow might be a better default.
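Rust happens to expose all three overflow policies as explicit methods, which makes a handy sketch of the choices (just an illustration of the options, not a claim about what any particular database does):

```rust
fn main() {
    let max = i64::MAX;

    // Wrap-around: silently continues from i64::MIN (the "spicy" choice).
    println!("wrapping:   {}", max.wrapping_add(1));

    // Error on overflow: checked_add returns None instead of a wrong
    // value, forcing the caller to decide what happens next.
    println!("checked:    {:?}", max.checked_add(1));

    // Saturating: clamps at the largest representable value.
    println!("saturating: {}", max.saturating_add(1));
}
```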
I think the author implicitly relies on the unstated assumption that, for business-related databases, the vast majority of non-integer data is money (which is quite well served by fixed point arithmetic with limited precision) and any other measurements are quite rare. That's often a reasonable assumption, but not always, so if the author were explicit about it, that would preempt this discussion.
> I think that the author implicitly relies on the unstated assumption that for business related databases the vast majority of non-integer data is money
That would certainly result in a much shorter article. "Hey, databases deal with money all the time. They should have a fixed point integer type for money values. Floats are a bad choice."
If that's all they wanted to say, I wish they'd just said that.
Arbitrary precision is not a good fit for databases, because databases benefit enormously from having constant offsets for records and fields.
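A rough sketch of why that matters (the row layout and `balance_offset` helper are hypothetical, just to illustrate the point): when every field has a known width, locating any field of any row is one multiplication and one addition, with no per-row length headers or pointer chasing.

```rust
// Hypothetical fixed-width row layout: an 8-byte id followed by an
// 8-byte fixed point balance (e.g. cents in an i64).
const ID_SIZE: usize = 8;
const BALANCE_SIZE: usize = 8;
const ROW_SIZE: usize = ID_SIZE + BALANCE_SIZE;

/// Byte offset of the balance field of row `row_index` within a page:
/// constant-time arithmetic, no scanning of variable-length values.
fn balance_offset(row_index: usize) -> usize {
    row_index * ROW_SIZE + ID_SIZE
}

fn main() {
    // Row 3's balance starts at byte 56: 3 * 16 + 8.
    println!("{}", balance_offset(3));
}
```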
As a practical matter, any column in a database will tend to have values with a defined range of magnitudes and required precision. It's not like you're throwing arbitrary values in there; e.g., you're not going to be mixing high-precision atomic measurements with interstellar parsecs and ounces of cocktail liquors.
That doesn't mean databases shouldn't support floats or arbitrary precision, but I wager most use cases would favor fixed point.
i agree with you that, in general, different contexts benefit from different representations of numbers.
but it's also helpful for a system like a database to come with reasonable defaults that give non-surprising behavior for common use cases. if databases are mostly going to be used for basic operational accounting and record keeping in businesses, for small data, where compute isn't a bottleneck, then defaulting to fixed point arithmetic that gives non-surprising results for typical businessy calculations seems like a good decision.
it's great if the system offers opt-in ways for the discerning user to select the best numeric type for their application, but it's also great if users who don't read the manual get reasonable results for typical use cases.
> I do like the idea of choosing what happens when integers overflow though
it's entertaining to focus on what happens to the overall system surrounding the integer when answering 'what happens'.
e.g. 'when the auto-incrementing integer used in the bowels of the internet banking customer session management system overflows during a holiday period, what happens is none of the bank's customers can log in to access their money any more, so they become very angry, and then a bunch of staff get paged and spend days firefighting to bring the system back up'
one can almost imagine a design meeting where the architect outlines to a panel of business and technical stakeholders the various options for how the overall banking system will react when this particular integer overflows, and everyone votes for this choice.
> 'when the auto-incrementing integer used in the bowels of the internet banking customer session management system overflows during a holiday period, what happens is none of the bank's customers can log in to access their money any more, so they become very angry, and then a bunch of staff get paged and spend days firefighting to bring the system back up'
There are a lot of outcomes that are much worse than simple downtime with a clear error message in the logs. I'd much rather have downtime than quiet data corruption & mysterious downstream errors, especially in a banking system.
Also, a 64 bit signed integer can store numbers up to 9223372036854775807. If you consume one integer every time someone logs in to your website, your system will be able to handle the entire population of the USA logging in every second, continuously, for 835 years. It's a pretty safe integer size for most tasks.
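The rough arithmetic behind that figure, assuming a US population of about 350 million:

```rust
fn main() {
    let max_ids = i64::MAX as f64;          // ~9.22e18 distinct values
    let us_population = 350_000_000.0_f64;  // assumed, roughly
    let seconds_per_year = 365.25 * 24.0 * 3600.0;

    // One login per person per second: how long until the counter runs out?
    let years = max_ids / us_population / seconds_per_year;
    println!("{:.0} years", years); // on the order of 835
}
```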
Heh, it's all about frequency and impact, and "usually" it's actually pretty acceptable to have incorrect results much more often than people fall into the trap of believing (aka more than zero times).