Introducing Numbat: A programming language with physical units as types

bugsmith@programming.dev · 1 year ago

Introducing Numbat: A programming language with physical units as types

ScreaminOctopus@sh.itjust.works · 1 year ago

This would be so nice in a mainstream language, I wonder if it would be possible with rust’s macro system?

SittingWave@programming.dev · edit-2 1 year ago

I disagree.

I worked with a software for quantum physics and electronic transport from microscale to mesoscale. It had a “python based” DSL that had support for units through that module. Seems the perfect scenario for such entity, so we wrote it integrating another similar package (it’s not the units package, I can’t find it anymore. In any case, it let you say things like speed = 3 * meters / second)

The results were… interesting.

There are many major problems:

managing scales. What if you add 1 meter and 1 nanometer? it’s technically possible, but you have loss of precision? or should it convert everything to nanometers? or increase the precision and set it to meters? Now multiply this for all the various rules of conversion and potential scale difference of various units and you get in a mess very fast.
constants. Researchers (the target of that language) often use fixed constants. Of course these constants have units. Of course they are important for dimensional analysis, but if all your work is in one measure domain (e.g. you are always using atomic units) then you just don’t care about the unit of measure of that constant. It’s known, but who cares? However, to perform math with dimensional correctness, now you force the researcher to define the constant in the script as the number followed by the unit, which again adds nothing but a chore of finding somewhere and writing in your meta language the unit.
Sometimes you are handling logarithms of metricated units, e.g. what’s the unit of measure of log(3 meters)? or what is the unit of measure of a cholesky decomposition of a matrix of metricated stuff? I honestly still don’t know as of today, but… does it matter? Do you care? Especially if they are in-transit numbers?
Massive performance impact and trouble when specifying arrays or mixing them. When specifying geometry information of large molecules, what do you do? specify an array, followed by the unit (meaning that the whole array numbers are all in the same unit?), or do you grant to specify one element in e.g. nanometers and the other in micrometers? now you have to eventually reconcile and normalise. What if you have to perform a multiplication between two matrices one in nanometers and one in micrometers? again, reconciliation. It’s a nightmare. Additionally, now these values are no longer memory contiguous, which trashes your cache and makes it close to impossible to transfer data to C, for example, for performance gain.
Units tend to be short names. This pollutes the namespace with short names such as m, s, etc. The result is that the likelihood of users overriding unit names is very high. So you write them in extended form, but then it becomes a chore because now instead of saying 3 * m / s they have to write 3 * meters / second. Or worse; 3 * units.meter / units.second.
Dimensional analysis implies that you might have to simplify all your units, to a normalised form, otherwise you end up with really complex behavior trying to perform operations. E.g. fuel efficiency is measured in meters squared, which is a very weird measure because it’s basically cubic meters (of fuel) divided by length traveled (in meters). The reynolds number is actually a pure (no unit) number. What should you do if you use the equation? simplify the pile of units until you eventually reduce it to a pure number, or leave it as it is?

So, it looks cool, but in practice it’s pointless. The only practice to follow is:

accept and output whatever unit makes sense for the business domain. Store them in a variable named explicitly with the unit (e.g. length_angstrom) until converted for internal use. Then you can implicitly assume the units to be standardized at one unit realm and omit the explicit unit.
convert at the interfaces fro user into metric or business specific units (eg. atomic units) and only use this form internally, in strict consistence.

In other words:

user gives stuff in micrometers -> store it in length_um -> convert it in nanometers -> store it in length -> use length from now on (implicitly in nanometers)

The reverse for output