1. High performance read/write of Scylla/Cassandra with high availability[1]. It...

charlysl · on Sept 6, 2021

> Zero impedance mismatch between database and application representation

Does this include informacion hiding/encapsulation? (to prevent saved objects' internal representation from being exposed).

Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

This is important because it is a good practice for client code to make no assumptions about the internal representation, accessing data only via the a public interface.

If it happens to be exposed by the database, the clients can use it in their queries. If the internal representation changed later on, such clients would be broken.

Of course, this can be solved by only allowing data access via, say, well designed restful apis (that don't expose internal details), but this would still provide no guarantees.

How about another reason for impedance mismatch, that of storing objects that belong to a class hierarchy?

takeda · on Sept 6, 2021

> Traditional databases don't have an encapsulation mechanism AFAIK, which is one of the reasons for impedance mismatch.

It actually does, those are views and functions.

The real problem with impedance mismatch is that SQL is declarative (you say what you want and database figures out how to get it) when most programming languages are iterative (you say what should be done).

The issue is that you have two very different languages. For one you have powerful IDE with type checking auto completion and refactoring capabilities, the SQL often is sent as a string and don't have these benefits. The various ORM are attempts to use iterative and object oriented language to access relational objects using a declarative language.

I think JetBrains is addressing the problem the right way. They added Data Grip functionality to their IDEs like PyCharm for example. What it does is that if you connect the IDE to a database and let it download the schema you get the same functionality for the data. Basically it will detect SQL statements in the string and offer the same capability for it as you have with the primary language.

At that point the impedance mismatch no longer feels like a mismatch. You basically have two languages, one to obtain data you need and another to process/present it. You can get database to return exact fields you need for your projects and even the object mapping starts feeling unnecessary.

Why data is stored in a relational way? Because that's most optimal way to store the data and the way it is stored allows multiple applications access the same data differently.

For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Ultimately the data is the most important thing businesses and it stays, while applications that use it come and go.

wruza · on Sept 6, 2021

For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

Iow, it would be nice to have “extql” a direct access to indexes and rows, in sort of a way EXPLAIN works, and skip SQL completely.

  function get_items(store_id) {
    for (var item in items.id) {
      var res = item.{name, code}
      var item_id = i.id
      res.price = prices.any({item_id, store_id})?.price
      if (!res.price) continue
      res.props = todict(props.all({item_id}).{name, value})
      yield res // or collect for a bigger picture
    }
  }

This query could be an equivalent of “select from items inner join prices on (store_id, item_id) left join props on (item_id)” but saving space for many props and being much more programmable. Also, it would be nice to have the same engine (sql+extql) at the “client” side, where the inverse problem exists – all your data is nosql, no chance to walk indexes or declare relations.

takeda · on Sept 6, 2021

> Honestly, SQL has this problem too, but it presents itself not in the way you store, but in the way you query. There are simple schemas and complex ones, and irrespective of that there are obvious query sets and unplanned ones (i.e. written at runtime as part of the data analysis process). SQL and its autoplanning is required only for complex+unplanned cases, in my opinion. In all other cases I know my data and I’d better walk through the indexes myself rather than writing 4-story queries to satisfy the planner. At the end of the day, nested loops through the indexes is what RDBMS does. There is no declarative magic at the fetch-and-iterate level.

The thing is that what worked at specific time can change. For example if you have simple join with two tables, let say A and B. You search by column in target A to get value from column in table B. Now if both tables are large then it makes sense to lookup in A by an index, then use foreign key and index to find the row in table B.

Now if A and B have few elements. Even if there is an index on both of them, it actually is faster just to scan one or both tables.

It might be actually more beneficial to ensure that tables are properly analyzed, have right indices and preferences in the query planner are tuned.

If you need to override query planner, you don't have to make sophisticated queries, you can just use this[1] extension. Though if things aren't working right it is either lack of data, mis-configuration or a bug.

[1] http://pghintplan.osdn.jp/pg_hint_plan.html

charlysl · on Sept 6, 2021

> For example with NoSQL you need to know how the data will be used so you correctly plan how it will be stored. If application changes you might need to restructure the entire data.

This point is very important, and well explained in Stonebraker's paper "What Goes Around Comes Around". What is most interesting is that he is actually talking about half a century old pre-relational IMS IBM databases, but they had exactly the same issue, hence the paper's title. Codd invented the relational model after watching how developers struggled with the very problem you mentioned.

Stonebraker famously quipped that "NoSQL really stands for not-yet-SQL".

He also addresses the impedance matching issue in the "OO databases" section; there is actually a lot more to it, and he gives it all an insider's historical perspective.

Scarbutt · on Sept 6, 2021

Your view of what impedance mismatch is doesn't sound accurate. It's not about declarative or imperative or syntax or strings, etc...

It's about data modeling, one models data using relations, the other models data in a hierarchical way (using maps, arrays, objects, etc...). They are two different ways to structure your data, hence the impedance mismatch.

takeda · on Sept 7, 2021

Perhaps I was using the wrong word. I was referring that traditionally things fell a bit off when working with SQL. Because often it wasn't a code, just a bit of strings that you were sending.

Because of that, developers started to abstract that with code and objects that were then populated with data.

With IDEs understanding the SQL that's no longer necessary. I can construct a specific SQL to get the exact structure my program needs. Even if it is hierarchical I can use various jsonb aggregation functions. That's a game changer to me.

bbsimonbb · on Sept 6, 2021

Not just jetbrains. pgtyped does this for postgres and typescript, and queryfirst for c# against sql server/postgres/mysql.

takeda · on Sept 6, 2021

I did not know that, I first encountered that in PyCharm.

I'm glad there's more.

Edit: actually what you mentioned is slightly different. This is what I'm talking about: https://youtu.be/_FlpiNno088?t=2863

vaughan · on Sept 6, 2021

> good practice for client code to make no assumptions about the internal representation

If your internal representation and API start to differ then it adds complexity fast. Its far better to have as close to a 1-1 mapping for your backend and frontend data models as possible.

zozbot234 · on Sept 5, 2021

Postgres comes with the building blocks for both sharding and HA out of the box, and they're extensively discussed in the docs. You don't need proprietary addons other than as pure convenience.

zem · on Sept 5, 2021

don't underestimate the importance of convenience. I'm convinced one of the reasons MySQL had so much more mindshare than postgres back in the day was that it was far easier to get up and running, even if postgres might have been easier to use once everything was set up correctly.

bostik · on Sept 6, 2021

That's funny, I must have been an outlier then.

I've been using Postgres since 1998, and I tried getting MySQL up first. There was more documentation available for the latter, so it should have been simple. Failed. It just didn't work.

Out of frustration I then tried Postgres, because I just wanted a decent database for my project. It was surprisingly easy, I only had to learn about pg_hba.conf to get to a functional state. Everything else was in place out of the box.

I've been a happy user ever since. MySQL may have had the mindshare (thanks to prevalence of LAMP) but everything outside the magic happy path was confusing and fragile.

ahoka · on Sept 6, 2021

No one cares what was in 1998, that's the whole point of this discussion. Postgres devs kept digging their heads into sand for many years, saying that high availability is somehow not the task of the database. In reality only the datastores need to be HA/durable in an otherwise stateless architecture.

BarryMilo · on Sept 6, 2021

I don't even remember choosing MySQL when I started (15 years ago). It was just so dominant, we didn't question it.

Nowadays I would still use it because I assume it is the dumbest database system and that's exactly what I need for my 1-5 user app.

5e92cb50239222b · on Sept 6, 2021

It still has some features over PostgreSQL that pushed me to choose it (actually MariaDB) for a new project about a year ago, namely multi-master replication. Yes, I know, terrible database and horrible feature, but it really helped in that particular domain.

I couldn't find anything decent for postgres, while MariaDB/MySQL have that built-in, with some differences in implementation. Especially for a customer who refuses to pay for his software, because there are some commercial solutions.

jjirsa · on Sept 6, 2021

Sharding is not the same as natural clustering, because eventually you’ll need to reshard and then you’ll be writing a lot more code.

jasfi · on Sept 6, 2021

Postgres has many HA solutions, and it's getting better all the time. Postgres has good performance, benchmarks need to show the various trade-offs systems make so that informed decisions can be made. Feel free to post a link.

The Graph model can be made available through extensions. See AGE: https://age.incubator.apache.org/ They plan to support OpenCypher.

The JSONB type allows for No-SQL like development if that's what you really want.

zeroc8 · on Sept 7, 2021

Just because you can use JSONB, it doesn't mean that it is as easy to work with as it is the case with MongoDB. So if you really need a JSON store, just use Mongo.

pas · on Sept 5, 2021

> independence mismatch

you probably mean impedance mismatch, right?

https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...

vaughan · on Sept 6, 2021

> Zero impedance mismatch

I see the problem here being that too many manual optimizations need to be done when implementing a schema.

You start with a logical schema (ERD diagram) and then implement it via a physical schema with denormalizations added for efficiency (usually because the relational model has scaling limits with number of joins, or because db migrations are too difficult, or handling unstructured data without introducing a ton of new tables). The db should do the denormalization automatically, allowing the user to interface with their logical schema directly.

Another reason is we can't use SQL in the browser - we have to go through many caching layers and API layers which complicate things.

cwp · on Sept 6, 2021

+1 to the application/database mismatch. GemStone is really amazing. I never got to do commercial work with it, but damn is it a great environment. I've never seen anything like it.