The problem is that often getting rid of these type-unsafe idioms requires much ...

mpweiher · on Aug 4, 2021

Right. The other thing is that in the parent, "static type checking" got shortened to "types", which in turn got confused with "modelling".

The problem with stringly-typing is the model, which is SQL as strings. Statically type-checking that all your strings are, in fact, strings does not buy you anything whatsoever in this scenario.

On the other hand, if you have an actual model of the SQL, then it also doesn't matter whether that model is checked at runtime (dynamic typing) or at compile-time (static typing). In either case it will not allow injection attacks, if implemented correctly.

And of course the actual place where nastiness happens is the user input, and I haven't yet seen a place that can statically typecheck users. ;-). So, as you point out, you will need to do dynamic checking and sanitising of input.

chriswarbo · on Aug 4, 2021

> And of course the actual place where nastiness happens is the user input, and I haven't yet seen a place that can statically typecheck users. ;-)

User input has type `ByteString`. Not only can we check that statically, we absolutely should; and we should enforce that type, to reject any code which assumes otherwise.

Nastiness happens when developers treat user input as something other than ByteString; e.g. they might try appending it to a fragment of HTML (XSS); or wrap it in quotation marks and send it to a database (SQL injection).

We don't need to 'statically typecheck users'; we do need to statically check that `myApp` has type `ByteString -> Foo`, so we can avoid ever executing implementations which actually have type `HTMLFragment -> Foo` (XSS), or `SQLFragment -> Foo` (SQL injection), or whatever.
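A minimal TypeScript sketch of that idea. It assumes "branded" string types as a stand-in for distinct `ByteString` and `HTMLFragment` types; TypeScript strings stand in for bytes here. `UserInput`, `HtmlFragment`, `receive`, `escapeHtml`, `trustedHtml` and `renderGreeting` are hypothetical names. The only function that turns user input into an `HtmlFragment` is the escaper, so passing raw input where HTML is expected fails to compile.

```typescript
// Hypothetical nominal ("branded") types: both are strings at runtime,
// but the brand stops one being used where the other is expected.
type UserInput = string & { readonly __brand: "UserInput" };
type HtmlFragment = string & { readonly __brand: "HtmlFragment" };

// Entry point for untrusted data: everything from outside is UserInput.
function receive(raw: string): UserInput {
  return raw as UserInput;
}

// The only conversion from UserInput to HtmlFragment: replace each
// HTML-significant character with a numeric character reference.
function escapeHtml(input: UserInput): HtmlFragment {
  const escaped = input.replace(/[&<>"']/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
  return escaped as HtmlFragment;
}

// Markup written by the programmer. Unchecked: it accepts any string.
function trustedHtml(markup: string): HtmlFragment {
  return markup as HtmlFragment;
}

function concatHtml(...parts: HtmlFragment[]): HtmlFragment {
  return parts.join("") as HtmlFragment;
}

function renderGreeting(who: UserInput): HtmlFragment {
  // Passing `who` directly to concatHtml would be a compile-time error:
  // UserInput is not assignable to HtmlFragment.
  return concatHtml(trustedHtml("<p>Hello, "), escapeHtml(who), trustedHtml("</p>"));
}
```

Note that `trustedHtml` accepts any `string`, including a `UserInput`, so it remains an unchecked escape hatch; the types catch the direct mix-up, not deliberate misuse of that helper.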

seanwilson · on Aug 4, 2021

> On the other hand, if you have an actual model of the SQL, then it also doesn't matter whether that model is checked at runtime (dynamic typing) or at compile-time (static typing). In either case it will not allow injection attacks, if implemented correctly.

It's possible to implement anything correctly with dynamic types but the point is static typing makes it easier to do so. Static types exhaustively check for you that certain errors aren't possible (vs tests, which check a finite number of cases), and they do it at compile time (vs at runtime, when the error actually occurs on particular inputs).

> And of course the actual place where nastiness happens is the user input, and I haven't yet seen a place that can statically typecheck users. ;-). So, as you point out, you will need to do dynamic checking and sanitising of input.

It sounds like you're talking about dynamic (type) checking here when you really mean regular runtime behaviour? You absolutely can at compile-time make sure your program is going to respond in a sensible way to problematic user inputs at runtime e.g. the compiler verifies that either the user input string value will get turned into a `valid email` value at runtime or the user is asked to try again. I wouldn't call the condition that checks if the string is a valid email a dynamic type check.
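A sketch of what that could look like in TypeScript, assuming a branded `ValidEmail` type and a deliberately naive validity check (`parseEmail`, `askForEmail` and the `readInput` callback are hypothetical). The return type `ValidEmail | undefined` forces every caller, at compile time, to handle the invalid case before it can obtain a `ValidEmail`; the check itself of course still runs at runtime.

```typescript
// Hypothetical branded type: only parseEmail hands one out.
type ValidEmail = string & { readonly __brand: "ValidEmail" };

// Deliberately naive placeholder check: exactly one "@", a non-empty part
// before it, and a "." somewhere after it. Real address syntax is far looser.
function parseEmail(raw: string): ValidEmail | undefined {
  const trimmed = raw.trim();
  const at = trimmed.indexOf("@");
  if (at <= 0 || at !== trimmed.lastIndexOf("@")) {
    return undefined;
  }
  if (trimmed.indexOf(".", at + 1) < 0) {
    return undefined;
  }
  return trimmed as ValidEmail;
}

// Keeps asking until the input parses or attempts run out; `readInput` is a
// hypothetical input source (a form field, a terminal, ...).
function askForEmail(readInput: () => string, maxAttempts: number = 3): ValidEmail | undefined {
  for (let i = 0; i < maxAttempts; i++) {
    const email = parseEmail(readInput());
    if (email !== undefined) {
      return email; // narrowed to ValidEmail here
    }
  }
  return undefined; // callers must handle "no valid address was given"
}
```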

I don't see how user input is different from, or more special than, any other value either; e.g. it's not like you can predict what exact values are going to come from files, random number generators or complex calculations at runtime, you just care that they're in the range you expect.

mpweiher · on Aug 4, 2021

> It sounds like you're talking about dynamic (type) checking here when you really mean regular runtime behaviour?

No. Quite the opposite. People here confuse "static typing" with "checking stuff", even such obviously dynamic things as "checking outside input".

> You absolutely can at compile-time make sure your program is going to respond in a sensible way to problematic user inputs at runtime

Right. Sort of. But that is just normal code. Code that absolutely, definitely has to be executed at runtime and at no other time. Because that is when the actual users of your system are going to input the actual data. Not while you're compiling stuff. Your users (and the rest of the world you interact with) aren't there while you're compiling. And your compiler isn't around when your users are inputting data.

> the compiler verifies that either the user input string value will get turned into a `valid email` value at runtime

No, the compiler cannot verify user input, because the compiler isn't running when the user is inputting data. You can implement a model that has the concept of "valid e-mail", and you can implement your system such that it converts input data such as strings into your structured model at the edges and then only deals with "valid e-mail" objects internally. But implementing such a model has nothing whatsoever to do with whether you use a statically type-checked language or a dynamically type-checked language.

It also turns out that such systems tend to be really, really bad. What you actually want to do is keep the user input exactly as it was input, for auditing purposes if nothing else, and build your structures as an enrichment on top of that basic data. Because your idea of what constitutes a "valid" e-mail is almost certainly wrong, and it's better if the system can at least represent data even if it doesn't completely understand it, rather than destroy user data.
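A minimal sketch of that "enrichment" approach in TypeScript. It assumes a hypothetical `EmailField` record that always stores the text exactly as typed and adds a parsed interpretation only when a (naive, placeholder) check succeeds, so nothing the user entered is thrown away.

```typescript
// Hypothetical record: the raw input is always kept; interpretation is optional.
interface EmailField {
  readonly raw: string;      // exactly what the user typed, never modified
  readonly address?: string; // enrichment, present only if we understood the input
}

// Naive placeholder check; likely wrong at the edges, which is the point:
// when it is wrong, the raw text is still there.
function looksLikeEmail(s: string): boolean {
  const at = s.indexOf("@");
  return at > 0 && at === s.lastIndexOf("@") && s.indexOf(".", at + 1) > at;
}

function enrichEmail(raw: string): EmailField {
  const candidate = raw.trim();
  return looksLikeEmail(candidate) ? { raw, address: candidate } : { raw };
}
```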

seanwilson · on Aug 4, 2021

> > the compiler verifies that either the user input string value will get turned into a `valid email` value at runtime or the user is asked to try again.

> No, the compiler cannot verify user input, because the compiler isn't running when the user is inputting data.

I'm not following why you thought I meant that. It's like you're arguing it's impossible to make any compile-time guarantees because you don't know what exact literal inputs your program will be dealing with at runtime, even though the whole field of formal verification (https://en.wikipedia.org/wiki/Formal_verification) exists. E.g. at compile time you can prove a sorting function gives the correct output for all possible inputs without having to run the code: you don't know in advance what the input is going to be (which is normal when coding), but you still know the output will be correct with respect to the function specification.

simiones · on Aug 5, 2021

There are two ways to have the compiler ensure that you produce safe SQL for any user input:

1. Define SQL in your type system, and force the programmer to specify, for any piece of user input they want to use in a query, what its semantics are supposed to be. This is the SQL DSL approach, such as the short-lived LINQ-to-SQL or Haskell's Selda. ORMs also do something similar.

2. Enforce that any string sent to the DB passes through some kind of checker that ensures certain properties hold for that SQL. The checker will have to understand all of the semantics of SQL, just like in 1.

There are many libraries that take path 1, but they don't support the full capabilities of SQL (usually they support a tiny subset), even for a single DB.

mpweiher · on Aug 5, 2021

No, I am not saying that you cannot make any compile-time guarantees. I am saying that these have little to nothing to do with the actual securing against SQL injection attacks, and that the idea that static types help there is simply an unsupported assumption (and a circular argument, see parallel post).

mpweiher · on Aug 5, 2021

> It's possible to implement anything correctly with dynamic types but the point is static typing makes it easier to do so

This is where the argument became circular and we can basically stop.

The parent claim (by chriswarbo) was that, for example, SQL injection attacks were, in fact, incontrovertible proof that SQL injection attacks "are classic type errors". That simply isn't true, at least not in the sense of static type checking (and only made true, as I mentioned above, by conflating "static type", "type" and "model" into one incoherent gooey mush).

chriswarbo's incorrect claim was in response to my referenced article, The Safyness of Static Typing [1], where I (as a prelude to my actual point) look at the evidence for the claim that "static types make it easier to implement something correctly".

There isn't any.

Or to be precise, there isn't any that passes any statistical or other standards. What little evidence there is goes both ways and has, at best, tiny effect sizes.

And yes, that article is somewhat old, but the evidentiary situation has not changed, despite further attempts to make the claim that static typing is provably better. Claims that were resoundingly debunked.

So that's the background. With this background we have the idea that SQL injection attacks are somehow evidence for the problems with dynamic typing, which they are not. I can have a statically checked string type and have exactly the same SQL injection attacks, and in principle, checking code needs to run at runtime against the dynamic input it is represented with. Which I think we agree on:

> It's possible to implement anything correctly with dynamic types

but apparently chriswarbo does not. So that brings us to the circularity of the argument:

> but the point is static typing makes it easier to do so

This hasn't been shown. It was claimed. And it was claimed that SQL injection somehow proves this. Which it doesn't, because we agree that what prevents the attack is the code that runs, the model that executes. It may be that such code is easier to write with static types, but that hasn't been shown, it is just claimed and the claim repeated in support of the claim. Circle.

[1] https://blog.metaobject.com/2014/06/the-safyness-of-static-t...

seanwilson · on Aug 5, 2021

> Or to be precise, there isn't any that passes any statistical or other standards. What little evidence there is goes both ways and has, at best, tiny effect sizes.

You mean the link from [1] to "An experiment about static and dynamic type systems" based on "an empirical study with 49 [undergraduate] subjects that studies the impact of a static type system for the development of a parser over 27 hours working time.". I think studies like this are more distracting here than useful (low skill level, not large scale enough, too much noise, toy example). The article here sums up a lot of my thoughts on this: https://news.ycombinator.com/item?id=27892615

And do you really need a study that proves e.g. (to pick a simpler example to summarise) a language that makes null dereferences impossible by ruling them out at compile time is going to have less null dereference errors than one that lets those errors happen at runtime? It's like insisting on an empirical study that 2 + 2 = 4.
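For concreteness, this is the kind of check meant here, sketched in TypeScript with `strictNullChecks` enabled (it is part of `--strict`); `UserRecord`, `usersById`, `findUser` and `greetingFor` are hypothetical. The compiler refuses to let the code dereference a possibly-`undefined` value until the "not found" case has been handled.

```typescript
interface UserRecord {
  readonly displayName: string;
}

// Hypothetical in-memory lookup table.
const usersById: Record<string, UserRecord> = {
  u1: { displayName: "Ada" },
};

// The return type admits the "not found" case explicitly.
function findUser(id: string): UserRecord | undefined {
  return Object.prototype.hasOwnProperty.call(usersById, id) ? usersById[id] : undefined;
}

function greetingFor(id: string): string {
  const user = findUser(id);
  // Writing `return "Hi " + user.displayName;` at this point is rejected
  // under strictNullChecks, because `user` may be undefined.
  if (user === undefined) {
    return "Hi, stranger";
  }
  return "Hi " + user.displayName; // `user` is narrowed to UserRecord here
}
```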

> It may be that such code is easier to write with static types, but that hasn't been shown,

The typed SQL example might look something like:

   function getUserInput(): string

   function createSanitisedString(s: string): SanitisedString
   function createSafeSqlQuery(s: SanitisedString): SqlQuery

For the code `createSafeSqlQuery(getUserInput())`, a static type checker would stop the entire program from starting with a type error pinpointing the exact line where the unsanitised data is coming from. With a dynamic type checker:

1. At best, the code will fail only after the user input is received by createSafeSqlQuery during runtime and you won't know where the input originated from.

2. At worst, the coder forgets to add a check like `typeof s === SanitisedString` or a call to `createSanitisedString` in `createSafeSqlQuery` and creates an injection attack.
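One way to make those signatures concrete in TypeScript, assuming branded types for `SanitisedString` and `SqlQuery`. The function bodies, the quote-doubling escape, the stand-in `getUserInput` and the `products` table are illustrative assumptions, not a recommendation over parameterised queries. With these definitions, `createSafeSqlQuery(getUserInput())` is rejected at compile time because `string` is not assignable to `SanitisedString`.

```typescript
type SanitisedString = string & { readonly __brand: "SanitisedString" };
type SqlQuery = string & { readonly __brand: "SqlQuery" };

// Hypothetical stand-in for real user input (here, a hostile search term).
function getUserInput(): string {
  return "tea cups' OR '1'='1";
}

// Escapes a value for use inside a single-quoted SQL string literal by
// doubling embedded single quotes (the standard SQL rule). Dialect caveat:
// e.g. MySQL also treats backslash as an escape by default, which this
// ignores; parameterised queries sidestep the issue entirely.
function createSanitisedString(s: string): SanitisedString {
  return s.replace(/'/g, "''") as SanitisedString;
}

// Builds a product search for an already-sanitised term.
function createSafeSqlQuery(s: SanitisedString): SqlQuery {
  return ("SELECT * FROM products WHERE title = '" + s + "'") as SqlQuery;
}

// createSafeSqlQuery(getUserInput());  // compile error: string is not SanitisedString
const searchQuery = createSafeSqlQuery(createSanitisedString(getUserInput()));
```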

The static type checker clearly wins here for me for safety and ease of implementing correctly. I don't need a study to know that compile-time errors are better than production runtime errors, that automated and exhaustive checks of correctness are better than the coder having to remember to add manual checks, and that it's better to know the exact line the unsanitised input came from over only knowing it comes from somewhere.

What languages have you used that have strong static type systems that you've written large programs in? Have you tried OCaml or Haskell for example?

simiones · on Aug 5, 2021

> function createSanitisedString(s: string): SanitisedString
> function createSafeSqlQuery(s: SanitisedString): SqlQuery

The problem with the whole argument is that these functions are not actually enough to work with SQL, since they don't allow us to create dynamic SQL from safe strings.

Here are some examples of why we can't use the functions you proposed:

  user_filter = raw_user_input_col_name ++ " LIKE ?"
  createSafeSqlQuery("SELECT * FROM my_table WHERE " ++ user_filter ++ ";") //type error

  sanitisedQuery = createSanitisedString("SELECT * FROM my_table WHERE " ++ user_filter ++ ";") //should return error or quote everything
  createSafeSqlQuery("SELECT * FROM my_table WHERE " ++ createSanitisedString(user_filter) ++ ";") //filter will be wrong, type error

Let's now add something to allow us to achieve our goal:

  function asSanitisedString(s: string) : SanitisedString {
    return s
  }
  user_filter = createSanitisedString(raw_user_input_col_name) ++ asSanitisedString(" LIKE ?")
  query = asSanitisedString("SELECT * FROM my_table WHERE ") ++ user_filter ++ asSanitisedString(";")
  createSafeSqlQuery(query) //works, nice
  createSafeSqlQuery(asSanitisedString("SELECT * FROM my_table WHERE " ++ raw_user_input_col_name ++ " LIKE ?;")) //oops, this also works
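The same loophole can be reproduced with TypeScript branded types. In this sketch the names are modelled on the pseudocode above: `asSanitisedString` plays the role of the unchecked conversion, and `colFromUser` is a hypothetical hostile input. The unchecked `as` assertion type-checks just as happily when handed concatenated user input, so the static type records intent but does not enforce it.

```typescript
type SanitisedString = string & { readonly __brand: "SanitisedString" };

// The escape hatch: an unchecked assertion, needed to write trusted SQL at all.
function asSanitisedString(s: string): SanitisedString {
  return s as SanitisedString;
}

// Pretend this sends the query to the database; here it just returns it.
function createSafeSqlQuery(q: SanitisedString): string {
  return q;
}

// Hypothetical hostile "column name" typed by a user.
const colFromUser = "1=1; DROP TABLE my_table; --";

// Type-checks, although user input was spliced straight into the SQL text.
const oops = createSafeSqlQuery(
  asSanitisedString("SELECT * FROM my_table WHERE " + colFromUser + " LIKE ?;"),
);
```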

You can achieve that goal safely with a non-string API: you can have an SQL DSL or just an SQL AST library; or you can use an ORM. But either way, you can't fix it without modeling the entirety of SQL into your type system (or as much of SQL as you are willing to support).

If you don't believe me, go looking for a library that allows arbitrary strings as SQL but statically ensures user input can't be used to construct an SQL query. I don't know of any.

seanwilson · on Aug 5, 2021

> The problem with the whole argument is that these functions are not actually enough to work with SQL

My bad for being unclear but I meant you would be sanitising something like the user entering "tea cups" into a shop search input form and searching your shop product database with that. I didn't mean the user would be entering SQL queries.

> But either way, you can't fix it without modeling the entirety of SQL into your type system

Using the type system to check correctness sounds good to me.

This is getting way too into the weeds about SQL anyway. Null dereference checks are a less distracting example to focus on for instance. As long as you can encode it into the type system, all my points are still relevant.

simiones · on Aug 6, 2021

> My bad for being unclear but I meant you would be sanitising something like the user entering "tea cups" into a shop search input form and searching your shop product database with that. I didn't mean the user would be entering SQL queries.

No, I understood that. But my point is that the programmer is going to be composing SQL queries, and they will be doing it based on user input. The SQL API will have a very hard time distinguishing which parts it receives are trusted programmer input and which parts are untrusted user input.

> Using the type system to check correctness sounds good to me.

Sure, but SQL is a huge language with significant differences between any 2 major DB implementations (and that depends on SQL server configurations - e.g. in MySQL you can specify via config file whether "a" is a string or an identifier). I have never seen a full SQL DSL or AST used in any SQL client library, in any language: it's just too much work.

mpweiher · on Aug 5, 2021

> You mean the link from [1] to "An experiment about static and dynamic type systems"

No. I mean, yes, that is one study, but there are a lot more. They all come out essentially the same.

Dan Luu wrote an overview article a while ago:

https://danluu.com/empirical-pl/

Note that the "A Large Scale Study of Programming Languages and Code Quality in GitHub" paper, the one that makes some of the strongest arguments for safety of the bunch, was later completely taken apart at OOPSLA 2019. Its findings, which also had only very small effect sizes and statistical significance, are completely invalid.

> do you really need a study that proves ... less null dereference errors

1. Languages don't have null errors, programs do.

2. I don't need a study to show that a program has less than or equal null errors in a language with null safety, because the program in a language without might still have zero. If you're going to make a logical claim, then let's stick to what's logically provable. If you're going to hand-wave...well you might as well use a dynamically typed language ;-)

3. I do need a study to show that such differences matter at all. All software has bugs; if this is a significant source of bugs, then a mechanism to make me not have them might matter.

4. I do need a study to show that the net effect is positive. For example, the famous Capers Jones study showed that dynamic languages such as Smalltalk and Objective-C were significantly more productive than statically typed languages, including Haskell, C++ and Java. Studies also have long shown that bugs scale linearly with code size, a very strong correlation. So let's say null-references are 1% of your total bugs, but you now write twice the amount of code, that means a move to a null-checked language will significantly increase your total bug count, despite eliminating a certain class of bugs.

(In fact other, older studies showed that type errors accounted for 10% of bugs, so if you can save just 10% of code using a dynamically typed language, you're ahead in terms of bugs).
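Spelled out under the comment's own assumption that bug count grows linearly with code size (k bugs per line; the symbols k, L, t and s are introduced here for illustration), the two scenarios work out as follows:

```latex
% Scenario 1: null references are 1% of bugs; the null-safe program
% is twice as long (2L lines instead of L).
B_{\text{null-unsafe}} = kL, \qquad
B_{\text{null-safe}} = (1 - 0.01)\,k\,(2L) = 1.98\,kL

% Scenario 2: type errors are a fraction t = 0.1 of bugs and are all caught
% statically; the dynamically typed program is a fraction s shorter than
% the statically typed one (L lines).
B_{\text{static}} = (1 - t)\,kL, \qquad
B_{\text{dynamic}} = k\,(1 - s)\,L

B_{\text{dynamic}} \le B_{\text{static}} \iff s \ge t
```

So at exactly s = t = 0.1 the two totals are equal (both 0.9kL), and only a saving beyond 10% puts the dynamically typed program strictly ahead, within this deliberately simple model.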

Of course, I can also forgo such studies and make an engineering judgement. And this is fine; we do it all the time, because very little in software has actually been demonstrated. But you then need to be aware that this is a subjective judgement call, and others may reasonably come to a different conclusion.

And being aware of this makes for much, much better engineering judgement, IMHO.

igouy · on Aug 5, 2021

> … but you now write twice the amount of code…

We could take some JavaScript programs —

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

— and try to transliterate them into Dart —

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

— and see how much or how little boilerplate is required when a language supports null safety, local type inference and `implicit-dynamic: false`?

seanwilson · on Aug 5, 2021

> 1. Languages don't have null errors, programs do.

> 2. I don't need a study to show that a program has less than or equal null errors in a language with null safety, because the program in a language without might still have zero. If you're going to make a logical claim, then let's stick to what's logically provable. If you're going to hand-wave...well you might as well use a dynamically typed language ;-)

I think now you're just nitpicking and being uncharitable for the sake of it instead of responding to the overall point of my message I took care to write.

> Studies also have long shown that bugs scale linearly with code size, a very strong correlation. So let's say null-references are 1% of your total bugs, but you now write twice the amount of code, that means a move to a null-checked language will significantly increase your total bug count, despite eliminating a certain class of bugs.

> (In fact other, older studies showed that type errors accounted for 10% of bugs, so if you can save just 10% of code using a dynamically typed language, you're ahead in terms of bugs).

Most software engineering studies have so many confounding factors even in isolation that combining results from multiple studies like this is nonsensical and misleading. This reads like brilliant satire if I'm honest.

> And being aware of this makes for much, much better engineering judgement, IMHO.

You didn't reply to the question about if you've written large projects with strong static type systems like OCaml or Haskell. If the answer is no, they're worth learning for more awareness instead of relying on likely very limited and/or flawed empirical studies in my opinion.

igouy · on Aug 5, 2021

> … famous Capers Jones study…

Have you ever considered the validity of that study, which purports to compare programming languages apparently without considering obvious differences in the available programming tools?

Even back in the late '80s, those Smalltalk implementations provided a very integrated development environment — writing Smalltalk in a simple text editor really isn't the same ;-)

chriswarbo · on Aug 5, 2021

> > It's possible to implement anything correctly with dynamic types but the point is static typing makes it easier to do so

> This is where the argument became circular and we can basically stop.

I think any appeal to "easier" is inherently subjective. I've certainly tried to avoid making any such claims. Instead, I've mostly tried to argue that static types can forbid certain programs from ever running, and we can use that to forbid things like vulnerable SQL queries.

> > It's possible to implement anything correctly with dynamic types

> but apparently chriswarbo does not.

I never said any such thing; besides which, it's trivially the case that anything implemented with static types could also be implemented with dynamic types, since static types only exist at compile time (they are "erased" during compilation), and hence every running program is dynamically typed.

In fact, I don't think I've said anything about correct implementations at all. My point is that static types can forbid incorrect implementations. When it comes to security, that is much more important; i.e. I would much rather bang my head against a screen filled with compiler errors, than expose some insecure program to the harsh reality of the Internet.

> The parent claim (by chriswarbo) was that, for example, SQL injection attacks were, in fact, incontrovertible proof that SQL injection attacks "are classic type errors".

I've not claimed that (in fact, I'm struggling to parse what that sentence might even mean). I claim that SQL injection attacks are "classic type errors" in the sense that:

- It's a widespread problem, easily encountered, commonly warned about, and most developers have probably done it at some point.

- It's a class of error that can be entirely ruled out at the language/API level, through use of static types. Similar to how segfaults can be entirely ruled out at the language level, using managed memory like in Python/JS/etc.

- Due to the above, it's a common example used to illustrate the usefulness of static types in blog posts, articles, etc. One which sticks in my mind is "A type-based solution to the “strings problem”: a fitting end to XSS and SQL-injection holes?" https://blog.moertel.com/posts/2006-10-18-a-type-based-solut...

- Due to the above, it's considered "folk wisdom" (or a "folk theorem") in the relevant fields (e.g. Programming Language Theory, Formal Methods, etc.)

Other examples of "classic type errors" might include:

- Mixing up integers with strings-of-digits, e.g. 'print("You were born around " + (time.now.year - input("How old are you?"))'

- Forgetting to 'unwrap' some list element, e.g. 'open(listdir("config/")).read()' instead of 'listdir("config/").map(f => open(f).read())'

- Mixing up units of measure, e.g. 'distance = speed + time' instead of 'distance = speed * time'

Basically any common mix-up between values/components in a program, where the computer could help us spot the mix-up (perhaps entirely automatically, if type inference is involved). In the case of SQL injection, the mix-up is between 'string of bytes' and 'SQL query'.
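The units-of-measure example, sketched with TypeScript branded number types (`Metres`, `Seconds`, `MetresPerSecond` and the helper functions are hypothetical). `speed + time` on its own still evaluates to a plain `number`, but `const d: Metres = speed + time` fails to compile, because the only function producing a `Metres` from a speed and a time multiplies them.

```typescript
type Metres = number & { readonly __unit: "m" };
type Seconds = number & { readonly __unit: "s" };
type MetresPerSecond = number & { readonly __unit: "m/s" };

// Constructors that attach a unit to a raw number.
const metres = (n: number) => n as Metres;
const seconds = (n: number) => n as Seconds;
const metresPerSecond = (n: number) => n as MetresPerSecond;

// The only well-typed way to turn a speed and a time into a distance.
function distanceTravelled(speed: MetresPerSecond, time: Seconds): Metres {
  return metres(speed * time);
}

// Distances add to distances; addDistances(speed, time) would not compile.
function addDistances(a: Metres, b: Metres): Metres {
  return metres(a + b);
}

const travelled = distanceTravelled(metresPerSecond(3), seconds(4));
```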

> And yes, that article is somewhat old, but the evidentiary situation has not changed, despite further attempts to make the claim that static typing is provably better. Claims that were resoundingly debunked.

I very much appreciate when researchers attempt to ground things in a bit more empiricism! Unfortunately those particular studies just aren't looking at the sorts of things that I find relevant; in particular, those studies (and that linked blog post, and many of the comments here) seem overly preoccupied with mundane trivialities like "string", or "int", or "SQLColumnName".

Personally, I'm much more interested in how types can help me:

- Avoid leaking secret information https://link.springer.com/chapter/10.1007/978-3-030-17138-4_...

- Guarantee big-O resource usage https://twanvl.nl/blog/agda/sorting#a-monad-for-keeping-trac...

- Guarantee the correct order of operations https://docs.idris-lang.org/en/latest/st/machines.html

- Prevent AI agents from losing capabilities http://chriswarbo.net/projects/powerplay

mpweiher · on Aug 15, 2021

[SQL injection]

> It's a class of error that can be entirely ruled out at the language/API level, through use of static types.

Repeating this claim doesn't make it true.

Once again: this has nothing to do with static vs. dynamic types, and everything to do with modelling.

To make this clear, let's compare a dynamically and a statically type-checked version of this with the model of SQL as just strings.

Both the dynamically and the statically type-checked version of this program will be susceptible to SQL injection attacks. The statically type-checked version will verify at compile time that all the values are indeed strings.

Now let's compare a dynamically and a statically checked program with a proper model for the SQL, not strings.

Both of these will not be susceptible to SQL injection attacks.

It has nothing to do with static vs. dynamic, and everything with how you model the problem.
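A small TypeScript illustration of that comparison (the `products` table and function names are hypothetical, and the `?` placeholder syntax varies between database drivers). Both functions are fully statically typed; the first still allows injection because its model of a query is a string, while the second keeps the statement text and the user-supplied value apart, which, as argued above, is a property of the model rather than of when the types are checked.

```typescript
// Model 1: SQL as strings. Every value is statically a string, yet the
// user's text becomes part of the SQL grammar.
function searchAsString(term: string): string {
  return "SELECT * FROM products WHERE title = '" + term + "'";
}

// Model 2: SQL as structure. The statement text is fixed; user data only
// ever travels as a bound parameter value.
interface ParameterisedQuery {
  readonly text: string;
  readonly params: readonly string[];
}

function searchAsModel(term: string): ParameterisedQuery {
  return { text: "SELECT * FROM products WHERE title = ?", params: [term] };
}

const attack = "x' OR '1'='1";
const stringly = searchAsString(attack); // the WHERE clause now contains OR '1'='1'
const modelled = searchAsModel(attack);  // text unchanged; attack is just a value
```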

chriswarbo · on Aug 4, 2021

> Optional requires generic types (kind-1 types) to be usable

My gut tells me that's not quite right (for some reasonable definition of 'usable'), since we can always build elimination forms into the language (after all, 'null' must be built in, and few take objection to building in elimination forms like if/then/else).

> it also requires some kind of pattern matching or monad comprehensions - otherwise, code using Optional is neither type safe nor readable.

I really like comprehensions in Python, but hardly ever use them elsewhere. In fact, when I do use them in Haskell and Scala, I usually find myself having to refactor them into calls to `map`/`join`/etc. soon after, to have more fine-grained scoping or somesuch.

> trying to offer a generic SQL library that presents an AST as the API for even the most popular RDBMSs quickly gets complex

That's orthogonal to anything I said. Represent SQL using arrays of bytes in memory if you like; offer a string-like interface for constructing and manipulating them if you like; just make sure to distinguish them from other string-like types, in a way that's statically enforceable, and where the only API to convert a 'String' to an 'SQL' is the escaping function.

Note that I don't particularly care if an SQL value is valid; it would be nice to statically enforce that, but you're right that (vendor-supplied) SQL is pretty complex in its own right. What's much more important to get right is that SQL is not user-generated.
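A sketch of that design in TypeScript, assuming a hypothetical `Sql` class with a private constructor: programmer-written fragments enter through `Sql.trusted`, user strings only through `Sql.escapeLiteral`, and fragments combine with `append`. As in the comment, validity of the resulting SQL is not checked. One caveat: TypeScript cannot restrict `Sql.trusted` to string literals, so that entry point still relies on discipline.

```typescript
class Sql {
  // Private: outside code cannot wrap an arbitrary string as Sql directly.
  private constructor(readonly text: string) {}

  // For SQL written by the programmer (intended for literals only;
  // a plain `string` parameter cannot enforce that).
  static trusted(text: string): Sql {
    return new Sql(text);
  }

  // The only route from user data to Sql: render it as a quoted SQL
  // string literal, doubling any embedded single quotes.
  static escapeLiteral(userText: string): Sql {
    return new Sql("'" + userText.replace(/'/g, "''") + "'");
  }

  // Appending fragments; the result may or may not be valid SQL.
  append(other: Sql): Sql {
    return new Sql(this.text + other.text);
  }
}

const surnameFromUser = "O'Brien";
const stmt = Sql.trusted("SELECT * FROM people WHERE surname = ").append(Sql.escapeLiteral(surnameFromUser));
```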

simiones · on Aug 4, 2021

> My gut tells me that's not quite right (for some reasonable definition of 'usable'), since we can always build elimination forms into the language (after all, 'null' must be built in, and few take objection to building in elimination forms like if/then/else).

True, you could go the Go array route and have Optional be a special type that is generic, without otherwise supporting generic data structures.

> I really like comprehensions in Python, but hardly ever use them elsewhere.

I think code like possiblyMissing.map(value => code) introduces lots of unpleasant nesting, and breaking out functions for every possibly-missing value also seems to me to hurt readability. I suppose this may be a matter of taste, or it may be related to other language features.

> just make sure to distinguish them from other string-like types, in a way that's statically enforceable, and where the only API to convert a 'String' to an 'SQL' is the escaping function.

I think this is the part that doesn't really work, because, in order to escape a piece of user input, you need to understand what it's meant to be. For example, a common (unsafe) idiom is:

  sqlFormatString = "SELECT %s FROM %s WHERE %s;";
  filterFormatString = "%s LIKE ?";
  filterString = sprintf(filterFormatString, userInputColumnName);
  sqlQuery = sprintf(sqlFormatString, columnName, tableName, filterString);
  preparedQuery = dbImpl.prepare(sqlQuery, userInputValue);

Which of course does the right thing for userInputValue; may or may not do the right thing with columnName and tableName, depending on their source; and does the really wrong thing with userInputColumnName.

Now, you could replace this with a type safe dbImpl.prepare(), which only takes a NotUserGeneratedSQL type. But now you need to have some way to build a NotUserGeneratedSQL.

One option is to go all in on an SQL AST library, where you could build the query above like

  query = selectQuery();
  query.columnNames = [ escapeColName(columnName) ];
  query.from = escapeTableName(tableName); 
  queryFilter = equalityComparison();
  queryFilter.left = escapeColName(userInputColumnName);
  queryFilter.right = preparedArg();
  query.filter = queryFilter;

  preparedQuery = dbImpl.prepare(query, userInputValue)

But this, as I said, gets tedious for the user and complex for the implementer.
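For a sense of scale, here is a deliberately tiny version of such an AST in TypeScript (all names hypothetical), supporting only `SELECT col, ... FROM table WHERE col LIKE ?`. Identifiers are rendered with standard SQL double-quote quoting (MySQL needs backticks unless `ANSI_QUOTES` is set) and values always become parameters, so even a hostile column name cannot change the statement's structure. Every additional SQL feature would need its own node type and rendering rule, which is where the tedium comes from.

```typescript
interface Identifier { readonly kind: "identifier"; readonly name: string; }
interface Placeholder { readonly kind: "placeholder"; }
interface LikeFilter { readonly kind: "like"; readonly column: Identifier; readonly pattern: Placeholder; }
interface SelectQuery {
  readonly kind: "select";
  readonly columns: readonly Identifier[];
  readonly from: Identifier;
  readonly where?: LikeFilter;
}

const ident = (name: string): Identifier => ({ kind: "identifier", name });
const placeholder: Placeholder = { kind: "placeholder" };

// Standard SQL identifier quoting: wrap in double quotes, double embedded ones.
function renderIdentifier(id: Identifier): string {
  return '"' + id.name.replace(/"/g, '""') + '"';
}

function renderSelect(q: SelectQuery): string {
  let sql = "SELECT " + q.columns.map(renderIdentifier).join(", ") + " FROM " + renderIdentifier(q.from);
  if (q.where !== undefined) {
    // The pattern is always a placeholder; its value is bound separately.
    sql += " WHERE " + renderIdentifier(q.where.column) + " LIKE ?";
  }
  return sql;
}

const userInputColumnName = 'title" = 1; DROP TABLE products; --';
const rendered = renderSelect({
  kind: "select",
  columns: [ident("title"), ident("price")],
  from: ident("products"),
  where: { kind: "like", column: ident(userInputColumnName), pattern: placeholder },
});
```

The hostile name ends up as one (nonexistent) quoted column, so the database reports an error instead of running the injected `DROP`; a real library would likely also check names against the schema.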

However, I don't think there is any alternative that can actually enforce validation of user input in constructed queries. You can help users think about it by forcing them to use some kind of string -> NotUserGeneratedSQL function, but this function can't actually be written to reject the example I gave initially. A function that creates an empty minimal NotUserGeneratedSQL and offers other functions to build it up to a useful query will either accept strings (making it possible to introduce SQL injection) or have to accept structured SQL, making it fall into the AST problem.

Of course, the problem of NotUserGeneratedSQL is easy if you don't want to support any kind of dynamic query, other than prepared statements. But if you want to dynamically generate queries based on user input (e.g. user chooses which columns they want to see, which columns they want to filter by etc.), then I don't think it's possible to statically ensure that SQL injection is not possible without an ORM or SQL AST API.

Edit: if you believe otherwise, please show me a type-safe, SQL-injection-safe library in any language that isn't an ORM/EDSL or a straight up SQL AST manipulator. I am quite certain none exists, but I would be happy to be proven wrong.

chriswarbo · on Aug 5, 2021

What I had in mind was something along these lines:

    allowedColumns: Map[UserInput, SQL] = {
      "name": "name",
      "age": "age",
      ...
    }

    strict: SQL = strictComparison? "=" : "LIKE"
    
    query: Maybe[SQL] = allowedColumns
      .get(userCol)
      .map(col => "SELECT " + col + " FROM tbl WHERE tbl.foo " + strict + addParameter(":?", userFoo))

In particular:

- We can write literals which look like strings but have type SQL

- We can append fragments of SQL together (potentially making it invalid; oh well)

- We can include user input via parameterised queries, in locations where arbitrary strings/ints/etc. are allowed

- Anything 'structural', like identifiers, choice of comparison operations, building up sub-expressions, etc. must be done programmatically, using the above features. In this case we select the column name (written as a static, literal SQL value) by looking up the user's input in a map. We also allow choosing between the type of comparison to use (again, both are SQL literals).
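A runnable approximation of that pseudocode in TypeScript. TypeScript cannot overload string-literal syntax, so this sketch (all names hypothetical) wraps trusted literals in an explicit `sqlLiteral` helper and models a query as SQL text plus a parameter list. `Maybe[SQL]` becomes `SqlFragment | undefined`, and a column name missing from the whitelist simply yields no query.

```typescript
// Hypothetical representation: SQL text plus the values bound to its "?" slots.
interface SqlFragment {
  readonly text: string;
  readonly params: readonly unknown[];
}

// Stands in for "a literal of type SQL"; meant to be called with literals only.
function sqlLiteral(text: string): SqlFragment {
  return { text, params: [] };
}

// User data enters only as a bound parameter, never as SQL text.
function addParameter(value: unknown): SqlFragment {
  return { text: "?", params: [value] };
}

// Appending fragments (which may produce invalid SQL; validity is not checked).
function concatSql(...parts: SqlFragment[]): SqlFragment {
  let text = "";
  const params: unknown[] = [];
  for (const part of parts) {
    text += part.text;
    params.push(...part.params);
  }
  return { text, params };
}

// Whitelist: user-facing column choices mapped to SQL literals.
const allowedColumns: Record<string, SqlFragment> = {
  name: sqlLiteral("name"),
  age: sqlLiteral("age"),
};

// Returns undefined (the Maybe's "nothing") for any column not in the whitelist.
function buildQuery(userCol: string, userFoo: string, strictComparison: boolean): SqlFragment | undefined {
  if (!Object.prototype.hasOwnProperty.call(allowedColumns, userCol)) {
    return undefined;
  }
  const col = allowedColumns[userCol];
  const comparison = strictComparison ? sqlLiteral(" = ") : sqlLiteral(" LIKE ");
  return concatSql(sqlLiteral("SELECT "), col, sqlLiteral(" FROM tbl WHERE tbl.foo"), comparison, addParameter(userFoo));
}
```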

It seems to me that requirements like 'user chooses which columns they want to see, which columns they want to filter by etc.' are fundamentally incompatible with a safe, string-like representation. Instead, the options are:

- Fully representing the structure of the language. This results in an AST approach, which allows safe dynamic queries. I think any alternative, like tracking the offset of each delimiter in a string, will turn out to be equivalent to maintaining an AST.

- A "flat", string-like representation, whose dynamism is limited to choosing between some combination of pre-supplied fragments. This is what I've shown above. This is safe, but the 'dynamism' is inherently limited up-front (i.e. it's overly conservative).

- A "flat", string-like representation, which has unrestricted dynamism, but hence is also inherently unsafe (i.e. it's overly liberal).
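For comparison, a minimal TypeScript sketch of the first option. The AST here covers a single query shape and is entirely hypothetical (Select, render and the schema table are made-up names): user choices only ever select nodes and values, identifiers are checked against an assumed schema and quoted ANSI-style, and values become `$n` parameters.

```typescript
type Comparison = "=" | "LIKE";

// One node type for one query shape; a real SQL AST would cover far more.
interface Select {
  columns: string[]; // identifiers chosen by the user, validated before rendering
  table: string;
  filter?: { column: string; op: Comparison; value: unknown };
}

// Assumed schema: which columns each table has.
const schema: Record<string, readonly string[]> = { tbl: ["name", "age", "foo"] };

// Quote an identifier ANSI-style, doubling any embedded double quotes.
function quoteIdent(ident: string): string {
  return '"' + ident.replace(/"/g, '""') + '"';
}

function render(q: Select): { text: string; params: unknown[] } {
  const known = schema[q.table];
  if (known === undefined) throw new Error(`unknown table: ${q.table}`);
  // Structural pieces must name real columns; anything else is rejected.
  const check = (c: string): string => {
    if (!known.includes(c)) throw new Error(`unknown column: ${c}`);
    return quoteIdent(c);
  };
  let text = `SELECT ${q.columns.map(check).join(", ")} FROM ${quoteIdent(q.table)}`;
  const params: unknown[] = [];
  if (q.filter) {
    params.push(q.filter.value); // values are never spliced into the text
    text += ` WHERE ${check(q.filter.column)} ${q.filter.op} $${params.length}`;
  }
  return { text, params };
}
```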

simiones · on Aug 6, 2021

> - We can write literals which look like strings but have type SQL

What I'm not clear on is: what prevents me from accidentally/stupidly doing:

  filter : SQL = "WHERE " + userInputCol + " = ?"

Is this special string handling some compiler magic that distinguishes literal strings from string variables? If so, then I think that in a language that supports something like this you can indeed make a safe library. The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it into a query at runtime; and you can't take queries from a separate file, they must be in source code. But these may be acceptable trade-offs.
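One existing mechanism close to this 'compiler magic' is the tagged template literal in JavaScript/TypeScript: the tag function receives the literal segments of the template separately from the interpolated expressions. A minimal sketch (sqlQuery and BoundQuery are hypothetical names, `$n` placeholders assume a Postgres-style driver, and the tag can still be called as an ordinary function) that treats literal segments as SQL and every interpolated value as a parameter:

```typescript
interface BoundQuery {
  text: string;
  params: unknown[];
}

// `strings` holds the literal segments written in the source; `values` holds the
// interpolated expressions. For any template, strings.length === values.length + 1,
// so segment i (for i >= 1) follows placeholder $i, which binds values[i - 1].
function sqlQuery(strings: TemplateStringsArray, ...values: unknown[]): BoundQuery {
  const text = strings.map((s, i) => (i === 0 ? s : `$${i}` + s)).join("");
  return { text, params: values };
}

const userInputCol = "name = name; DROP TABLE users; --";
const userValue = "alice";

// The interpolated column ends up as a bound value, not as SQL text.
const filtered = sqlQuery`SELECT * FROM users WHERE ${userInputCol} = ${userValue}`;
```

Interpolating userInputCol in the column position makes it a bound value, so it cannot turn into SQL structure; the flip side is that column names cannot be chosen this way at all, which is where a whitelist comes in.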

Do you know of any library that implements this?

chriswarbo · on Aug 7, 2021

> Is this special string handling some compiler magic that distinguishes literal strings from string variables?

Ah, maybe I should have made it clearer that I was overloading the double-quote syntax, so we can write:

    "foo": String
    "bar": SQL
    "baz": UserInput
    "quux": Shell
    etc.

I was also relying on type inference to figure out which is which, and on '+' returning the same type as both arguments, e.g.

    +: String  -> String  -> String
    +: SQL     -> SQL     -> SQL
    +: Int     -> Int     -> Int
    +: Float   -> Float   -> Float
    +: List[T] -> List[T] -> List[T]
    etc.

This way, we see how your example fails to typecheck:

    // Code as written
    filter : SQL = "WHERE " + userInputCol + " = ?"

    // Right-hand-side must have type SQL, to match left-hand-side
    "WHERE " + userInputCol + " = ?": SQL

    // Resolving order-of-operations of the two '+' operations
    ("WHERE " + userInputCol) + " = ?" : SQL
  
    // Arguments to outer '+' have same type as return value, which is SQL
    "WHERE " + userInputCol : SQL
    " = ?" : SQL

    // Arguments to inner '+' have same type as return value, which is SQL
    "WHERE " : SQL
    userInputCol : SQL

We've inferred that userInputCol must have type SQL, so it will fail for String/UserInput/whatever.
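The same inference can be reproduced in TypeScript with a hypothetical branded SQL type and a concat function standing in for the SQL-typed '+'. TypeScript has no overloaded literals, so lit is an assumed stand-in for an SQL-typed literal; unlike a real literal it accepts any string, which is the same encapsulation gap as Haskell's fromString.

```typescript
// Hypothetical branded type: a string at runtime, a distinct type to the checker.
type SQL = string & { readonly __sql: true };

// Assumed stand-in for an SQL-typed "..." literal. Unlike a real literal it
// accepts any string, so by itself it is not a safety boundary.
function lit(text: string): SQL {
  return text as SQL;
}

// The analogue of '+: SQL -> SQL -> SQL': both arguments and the result are SQL.
function concat(a: SQL, b: SQL): SQL {
  return (a + b) as SQL;
}

const userInputCol: string = "age; DROP TABLE tbl; --";

// Rejected by the checker: built-in '+' on strings yields string, not SQL.
// const filter1: SQL = "WHERE " + userInputCol + " = ?";

// Rejected by the checker: userInputCol is a string where concat expects SQL.
// const filter2: SQL = concat(concat(lit("WHERE "), userInputCol), lit(" = ?"));

// Accepted: every piece is SQL.
const filter: SQL = concat(concat(lit("WHERE "), lit("age")), lit(" = ?"));
```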

> The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it into a query at runtime; and you can't take queries from a separate file, they must be in source code.

Yep, although macros could help with that sort of thing, e.g. Haskell's quasiquotation https://wiki.haskell.org/Quasiquotation

A couple of Google hits for 'haskell quasiquote sql':

https://hackage.haskell.org/package/postgresql-simple-0.6.2/...

https://hackage.haskell.org/package/postgresql-query

> Do you know of any library that implements this?

Not completely. Decoupling double-quoted literal syntax from a single String type can be done with Haskell's OverloadedStrings extension, but that relies on a function 'fromString : String -> t', which is exactly what we're trying to avoid (https://hackage.haskell.org/package/base-4.6.0.1/docs/Data-S...).

Scala's custom interpolators are similar, but they rely on a function 'StringContext -> t', and a StringContext is easily created from a String (https://www.scala-lang.org/api/current/scala/StringContext.h...).

To ensure safety, we would need some way to encapsulate the underlying fromString/StringContext implementation to prevent it being called by anything other than the literal-expansion at compile-time.
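TypeScript's tagged templates, roughly analogous to Scala's custom interpolators, have the same gap. In this sketch (the sql tag and SQL brand are hypothetical), the tag's signature admits no interpolated values, yet a TemplateStringsArray can be assembled by hand, and the forged call even type-checks:

```typescript
// Hypothetical branded type for SQL text.
type SQL = string & { readonly __sql: true };

// Literal-only tag: the signature admits no interpolated values, so sql`...`
// can only wrap text that appears in the source code...
function sql(strings: TemplateStringsArray): SQL {
  return strings.join("") as SQL;
}

const fromLiteral = sql`SELECT name FROM tbl`;

// ...but, like a StringContext, the argument type is easy to build at runtime:
// an array with a `raw` property satisfies TemplateStringsArray structurally.
const userInput = "name FROM tbl; DROP TABLE tbl; --";
const forged = Object.assign([userInput], { raw: [userInput] });
const laundered: SQL = sql(forged);
```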

Of course, if we're willing to use macros then it's pretty easy, like those quasiquote examples above.

Haskell's module system is famously mediocre, so it might be more practical to do this overloading + encapsulation with Idris: https://idris2.readthedocs.io/en/latest/reference/overloaded...

(Of course, Idris also has an incredibly expressive type system, and a very powerful macro system, AKA "elaborator reflection", so it can definitely be done; but I haven't figured out the cleanest way)