MongoDB 6 Released (mongodb.com)
54 points by Fudgel on July 19, 2022 | 60 comments



> MongoDB 6.0 adds additional capabilities to two key operators, $lookup and $graphlookup, improving JOINS and graph traversals, respectively. Both $lookup and $graphlookup now provide full support for sharded deployments.

> The performance of $lookup has also been upgraded. For instance, if there is an index on the foreign key and a small number of documents have been matched, $lookup can get results between 5 and 10 times faster than before. If a larger number of documents are matched, $lookup will be twice as fast as previous iterations. If there are no indexes available (and the join is for exploratory or ad hoc queries), then $lookup will yield a hundredfold performance improvement.

It should have been done a long time ago. Better late than never.
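For reference, the join being described is the $lookup aggregation stage. A rough sketch of what it looks like from pymongo, with hypothetical orders/customers collections and field names:

    from pymongo import MongoClient

    db = MongoClient()["shop"]

    pipeline = [
        {"$lookup": {
            "from": "customers",         # foreign collection to join against
            "localField": "customerId",  # field on the orders documents
            "foreignField": "_id",       # field on the customers documents
            "as": "customer",            # output array of matched documents
        }},
        {"$unwind": "$customer"},        # flatten the single match per order
    ]

    for doc in db.orders.aggregate(pipeline):
        print(doc["_id"], doc["customer"]["name"])

Per the quoted release notes, how fast this runs depends heavily on whether the foreign-key field is indexed and how many documents match.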


Who are the large Mongo customers? It “seems” like it died 10-15 years ago, but clearly they’re still going.


https://www.mongodb.com/who-uses-mongodb

MongoDB is anything but dead.

Also MongoDB isn't even 15 years old, so how could it have died 15 years ago?

Disclaimer: ex-MongoDB employee.


Most large corporations. They have a very formal and tight process for spinning up Oracle instances, so the developers go to Mongo because it’s easy. Once it’s in production it’s very hard to rip out, and the salespeople jack up prices every year. God forbid you get more than 2 years behind on upgrades and it’s more fees. Internal IT departments get stuck with growing masses of instances that are hard to replace with their 1 yr planning cycles.

MongoDB blends the commercial aggressiveness of Oracle with the audit mindset of IBM. The sleaziest in the game.


I get an email about once a year claiming I use mongodb and am not paying license fees. It’s really weird because they’ll say stuff like a system I designed is based on mongodb and they have “evidence” from project staff. It’s such a strange claim and I guess they don’t know I made the system.

It’s also strange because they keep demanding license fees and asking to audit and I just deny and tell them to buzz off.

They remind me of Oracle. It also makes me wonder if there’s some legal compulsion that could force them to do discovery. Based on their behavior I would not use mongodb.


I wonder if there's a case for suing them. Such things can be disruptive to a business and are no different from someone calling in a bomb hoax.


It’s the same playbook. They hire ultra aggressive folks to sell because it’s hard to rip out.

Oracle eventually faded in relevance, much like IBM. So will Mongo. It will be around as a relic, living off auditing its installed base.


Oracle's doom is a tricky one to predict; they have been around for a long time. Can they survive Larry Ellison's retirement? Who can tell?

But IBM? You are having a laugh. IBM was there at the start of the computer industry and they have survived every single generation of change since, getting stronger each time.

Saying IBM has faded in relevance is absolutely farcical.


IBM’s stock price is roughly flat over the past 5 years. https://finance.yahoo.com/quote/IBM/

The S&P is up 50%. https://finance.yahoo.com/quote/%5EGSPC/

Go back 10 years and IBM is down roughly a third while the S&P has almost tripled. This is while IBM has been doing massive buybacks to raise the price.

Is that farcical?

What's interesting is a Mongo employee looking at IBM as a role model. This confirms the "get in and then squeeze your customers" strategy.

Just be careful. As an employee, the endgame isn’t pretty when the company realizes that young auditors are cheaper than old salespeople and engineers.


I don't think I mentioned role model. We most definitely don't model ourselves on IBM. I was just pointing out their longevity which is impressive to me.


Do you have recent data on this auditing stuff you're referencing? MongoDB is growing much faster than Oracle, so maybe it won't fade...


If you want to ping me an email at Joe.Drumgoole@mongodb.com I can get that fixed pretty quickly.


Looks like an automated email.


It seems like it’s from a human. It’s from the same person and it is worded differently each time with oddly specific claims. They also email other people in my org asking to connect with me because they think I’m misusing licenses.


> Who are the large Mongo customers?

Companies that adopted this technical debt during its prime (2010-2015) and are now paying for enterprise licenses and support to service that debt.


Mongo has customers at some of the largest technology companies around and perhaps the best-in-class cloud-agnostic managed service. There's less hype than when the webscale video came out, but they've successfully converted a good bit of that hype into a solid platform with happy customers. I've seen a lot of Mongo new deployments even still.


I worked at a BigCo that everyone knows about for quite a while, and when I left a couple years ago they were heavily investing in MongoDB for greenfield projects. It ended up being my organization's (somewhat large, several thousand employees) de jure database for new projects. Mongo is going to be around for a while.

Personally I disliked it, primarily due to a personal bias towards RDBMS (most data is inherently relational, and I saw a lot of unnecessary hoops being jumped through to join data) and also because the driver situation was absolutely terrible when I had to work with it. Needing custom drivers to serialize almost any JSON data was incredibly annoying.


This "inherently relational" trope, in my view, is a sign that we've all deeply normalized normalization in our brains... it's funny that it's even in the nomenclature: "normal". Yet denormalization is how our brains actually think: is it possible that all along we've been contorting ourselves to the 1970s machine, and that that's no longer necessary?


Not a huge Mongo customer, but the startup I work at uses MongoDB exclusively as our database across our different services. As someone who worked as an SRE at a database company with a DBaaS product, MongoDB Atlas is by far the best DBaaS platform I've used. From an operations standpoint it's fantastic.

From a developer standpoint it's alright, we're getting to the scale now where we're thinking about migrating to a more global architecture and I'm glad we're using something like Mongo over trying to hack this out with Postgres.

The standard issues with trying to use MongoDB like a regular ol' SQL DB still apply though, and our developers (me included) still fall into using anti-patterns and have to speak with solutions engineers once in a while.


What kind of scale does the company have, from a QPS etc. point of view?


They do indeed seem to be going strong. I don't really see a viable alternative to MongoDB (as a document DB). It fits well in a lot of use cases. Edit: we (Syncari) use MongoDB; we are not a very large customer, but we have a cluster that handles billions of documents.


>Fits well in a lot of use cases.

Can you share the use cases? Why Mongo works better than Postgres?


I generally dislike Mongo, but I’ll give an anecdote.

Circa 2013 or so I was working for a regional newspaper group. They decided they wanted a weather section. So we subscribed to, iirc, Accuweather.

I wrote a script that would pull it down at the update interval (every 15 minutes, I think it was).

This was, for the time, for a shoestring outfit with only a few servers, a lot of data.

Something like 50k locations, with around 30 “rows” for each location - current conditions plus hour by hour going out a bit more than a day.

Loading this into our poor little MySQL server was… slow.

So I just set up a small mongo instance, and loaded/read from that. Worked great… we didn't care about durability so we dodged that whole minefield. Mongo had a way to atomically swap two tables… so the import script would load into a scratch table, then swap that into the one the web server actually read from.

Loaded super fast since we didn't care about durability, and read performance was more than adequate for the very simple queries we ran against it.
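For the curious, that load-then-swap pattern is roughly the following, sketched with pymongo; the collection names are hypothetical and fetch_forecasts stands in for the feed download:

    from pymongo import MongoClient

    client = MongoClient()
    db = client["weather"]

    db["forecast_scratch"].drop()                           # start from an empty scratch collection
    db["forecast_scratch"].insert_many(fetch_forecasts())   # hypothetical helper that pulls the feed

    # renameCollection with dropTarget atomically replaces the live collection
    client.admin.command(
        "renameCollection", "weather.forecast_scratch",
        to="weather.forecast_live", dropTarget=True,
    )

Readers see either the old data or the new data, never a half-finished import.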

(To give you a feel for the era, this is when MyISAM was the default table store, and didn't support FKs or transactions.)


I am not very well versed in modern Postgres solutions, but at a certain scale, you need to shard your Postgres databases into separate servers. It is not distributed, at least not without some fancy 3rd party plugins? As a result, your application code gets messy dealing with all these shards.

With Mongo, it's distributed by default. And if your data keeps growing, you simply add more nodes without any additional changes in the application code.


As someone that fairly recently spent many months trying to shard a large mongodb system, this is patently false. Mongodb is not sharded “by default”, and sharding it is a lot of work and comes with a very heavy performance impact.


Absolutely true that this is an area of improvement for MongoDB - moving from a "replica set" (a single-shard cluster, but without the config server cluster) to a sharded cluster can be hard. Atlas does it for you, but even then it can be hard. My info says they are working very hard on this.

Live resharding indeed helps for a sharded cluster that already exists.


Yes, you need to enable sharding, but it's provided for you if you want it - that's what I meant. Adding shards as well. These are all configuration options in Mongo, vs. having to do it in a more manual way in Postgres (and change your app code more).

Of course it has a perf impact - nothing is free.
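To make the "it's a configuration option" point concrete, enabling sharding is roughly this from pymongo, assuming a running mongos router and a hypothetical app.events collection:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos-host:27017")  # connect to the router, not a shard

    client.admin.command("enableSharding", "app")        # mark the database as shardable
    client.admin.command(
        "shardCollection", "app.events",
        key={"customerId": "hashed"},                    # hashed key spreads writes across shards
    )

The application keeps using the same driver API; the hard parts, as noted above, are choosing a good shard key and running the extra infrastructure.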


You can even do live resharding since 5.0.
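Roughly, and again assuming pymongo plus the hypothetical app.events collection from the sketch above:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos-host:27017")

    # reshardCollection (MongoDB 5.0+) rewrites the collection under a new shard key online
    client.admin.command(
        "reshardCollection", "app.events",
        key={"region": 1, "customerId": 1},
    )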


Our schemas are dynamic and the application maintains the schema (this is a core feature for us, not a workaround via "schemaless").

JSON indexing is quite advanced in Mongo; Postgres is catching up (GIN indexes on jsonb arrays), as are aggregations on JSON. (See the sketch at the end of this comment.)

MongoDB Atlas - fully managed and by MongoDB core committers

We don't have a lot of multi-document ACID requirements.

We are on GCP, and Cloud SQL is a very basic Postgres deployment, and we don't want to manage infra.
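To illustrate the JSON indexing point above, a small sketch with pymongo and a hypothetical records collection:

    from pymongo import MongoClient

    coll = MongoClient()["app"]["records"]
    # documents look like {"payload": {"status": "...", "tags": ["...", "..."]}}

    coll.create_index([("payload.status", 1)])   # index on a nested field
    coll.create_index([("payload.tags", 1)])     # multikey index over an array
    coll.create_index([("payload.$**", 1)])      # wildcard index over everything under payload (4.2+)

    list(coll.find({"payload.tags": "urgent"}))  # served by the multikey index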


Mongo (similar for other NoSql propositions) is most suitable for specific data access patterns.

The most obvious one being where you are requesting data and related data based upon a unique business identifier.

NoSql DBs mostly make this both trivial and fast.

Postgres and other SQL options become relevant where aggregation and reporting become primary considerations.


Just to explain a little bit why the notion you're pointing to is incorrect -- identity and referential integrity are fundamental concepts in the "SQL options", i.e. relational database management systems (RDBMS), based on the relational model[0], rooted in relational algebra[1].

NoSQL Databases arose from a realization that sometimes you don't need all the features of a relational database, for example:

- Redis which at the most basic level is a key value cache

- MongoDB which stores (denormalized) documents

NoSQL Databases like MongoDB are best when:

- All data related to a certain entity can be reasonably self contained (i.e. does not require relating to other possibly dynamic pieces of data)

- referential integrity (knowing that User.storeId can be treated as just a string) does not need to be maintained

- Constraint checking does not need to be performed or can be performed at the application level

- Schemas are either wildly variable or do not change at all, or checking them just isn't important.

The best example I hear often is a news site or blog -- if your main model is something like an Article that contains the author, content, tags and all data necessary to display one entry (and 99% of the time you get one Article and are not required to request related data that exist in separate collections), then NoSQL makes sense.
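A minimal sketch of that Article shape (all field names hypothetical):

    article = {
        "_id": "some-slug",
        "title": "…",
        "author": {"name": "…", "bio": "…"},       # embedded, not a foreign key
        "tags": ["…", "…"],
        "body": "…",
        "comments": [{"user": "…", "text": "…"}],  # everything needed to render the page
    }
    # one round trip renders the page:
    # db.articles.find_one({"_id": "some-slug"})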

It's a tired conversation I think people have had repeatedly for a long time, so I won't go into it again (I'm also VERY biased towards RDBMS and in particular Postgres) -- would love to hear from experienced MongoDB proponents though!

One case I know Mongo was trusted for very early on was large scale out use cases, and after WiredTiger their execution engine improved immensely as well. I assume it's still great for that, where scale out is often quite lacking in RDBMS or not in the core experience.

[0]: https://en.wikipedia.org/wiki/Relational_database#Relational...

[1]: https://en.wikipedia.org/wiki/Relational_algebra


The key is what kind of ACID properties you need in your business.

An RDBMS puts a lot of effort into supporting a wide range of ACID properties. However, that also makes it very difficult to use as a cluster.

So if your business data is too large to be held by a single database instance, you will need a database cluster. And maybe someday, to get better performance, you change your schema to remove some data constraints - removing FKs, using application logic instead of stored procedures, etc.

That's the reason we need NoSQL databases.

If OLTP is not the key part of your business and you expect to store a large amount of data in the future, then it would be better to carefully design your data schema so you can use a NoSQL database instead of a traditional RDBMS.


I don’t find the schema argument compelling at all. Postgres JSON columns are highly performant and queryable.


Not the JSON schema of one object, but the "data schema" of the whole project.

E.g. use a nested document instead of a FK to store a 1-to-1 relationship, so you can take advantage of single-document atomicity, and use an array to store a 1-to-many relationship instead of an additional mapping table.
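For example (hypothetical fields):

    user = {
        "_id": "u123",
        "profile": {"display_name": "Ada", "timezone": "UTC"},  # 1-to-1 as an embedded document
        "addresses": [                                           # 1-to-many as an embedded array
            {"label": "home", "city": "London"},
            {"label": "work", "city": "Cambridge"},
        ],
    }
    # a single-document update is atomic without a transaction, e.g.:
    # db.users.update_one({"_id": "u123"}, {"$push": {"addresses": {"label": "gym", "city": "Leeds"}}})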


> The key is what kind of ACID properties you need in your business.

> An RDBMS puts a lot of effort into supporting a wide range of ACID properties. However, that also makes it very difficult to use as a cluster.

Note that while common, ACID is actually not a requirement for RDBMSes -- there are NoSQL datastores that provide ACID guarantees, like FoundationDB[0].

> So if your business data is too large to be held by a single database instance, you will need a database cluster. And maybe someday, to get better performance, you change your schema to remove some data constraints - removing FKs, using application logic instead of stored procedures, etc.

I agree with the point, but I want to note that the vast majority of apps do not have data too large to be held by a single instance, especially with the eye-watering density of hardware these days.

Some proof for my essentially wild conjecture:

- Reddit started with just postgres[1]

- LetsEncrypt (the reason most of the internet will have TLS in the future if not already) supports over 235MM sites with just one MySQL box[2]

Now it's not that you can't mis-use RDBMS (Postgres), or that it's always the right tool for the job, but I just want to note that it's very possible to get very far with application size without scaling out horizontally. Scaling out horizontally & compromising your data model should be the last options you pursue, IMO.

Also, that said, Citus for Postgres is now fully open source[3], and TimescaleDB has been open source and only got open-er[4] (they have a clustering mechanism), so the specific use case for NoSQL is eroding somewhat going forward.

> If OLTP is not the key part of your business and you expect to store a large amount of data in the future, then it would be better to carefully design your data schema so you can use a NoSQL database instead of a traditional RDBMS.

I'd argue that you should default to OLTP and only deviate when you need the extra power, but we'd probably agree to disagree there :)

[EDIT] Highscalability.com has a great writeup on this from 2010 (!) that as I skim through still looks mostly relevant:

http://highscalability.com/blog/2010/12/6/what-the-heck-are-...

[0]: https://www.foundationdb.org/

[1]: http://highscalability.com/blog/2013/8/26/reddit-lessons-lea...

[2]: https://letsencrypt.org/2021/01/21/next-gen-database-servers...

[3]: https://www.citusdata.com/blog/2022/06/17/citus-11-goes-full...

[4]: https://www.timescale.com


> Note that while common, ACID is actually not a requirement for RDBMSes -- there are NoSQL datastores that provide ACID guarantees, like FoundationDB[0].

Yes, most NoSQL databases provide some kind of ACID guarantees, though not as strong as an RDBMS's. MongoDB also provides single-document atomicity. But in an RDBMS you can configure the ACID properties.

> but I want to note that the vast majority of apps do not have data too large to be held by a single instance

I agree with that. For me, the order would be: 1. Do not use database 2. Use SQLite 3. Use single instance RDBMS 4. Use single instance NoSql 5. Use single instance RDBMS + NoSql cluster 6. Use RDBMS cluster

> I'd argue that you should default to OLTP and only deviate when you need the extra power, but we'd probably agree to disagree there :)

Personally I would prefer NoSQL when the business model fits some specific patterns. But I agree with you on that.

The Highscalability.com article is very useful, and so are some of its other articles. I did not know the site before; thanks for sharing.


>>> MongoDB also provides single-document atomicity.

And multi-document, in sharded clusters. 2018: https://www.mongodb.com/blog/post/mongodb-multi-document-aci...

>>> But in an RDBMS you can configure the ACID properties.

You can tune this in MongoDB too, via write concerns: https://www.mongodb.com/docs/manual/reference/write-concern/
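For the record, both knobs are a few lines in pymongo. A rough sketch against a replica set, with hypothetical collection and document names:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    accounts = client["bank"]["accounts"]

    with client.start_session() as session:
        # transaction-level write concern; commits on clean exit, aborts on error
        with session.start_transaction(write_concern=WriteConcern(w="majority")):
            accounts.update_one({"_id": "a"}, {"$inc": {"balance": -100}}, session=session)
            accounts.update_one({"_id": "b"}, {"$inc": {"balance": 100}}, session=session)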


> I agree with that. For me, the order would be: 1. Do not use database 2. Use SQLite 3. Use single instance RDBMS 4. Use single instance NoSql 5. Use single instance RDBMS + NoSql cluster 6. Use RDBMS cluster

Yeah, it's funny - HN seems to rediscover this way of thinking every so often! I think it's why SQLite projects are so popular (that, and of course how well built SQLite is).


I'm confused. You appear to have repeated and expanded upon my position, but you are also implying it's wrong.

Am I missing something here?


> Mongo (similar for other NoSql propositions) is most suitable for specific data access patterns.

> The most obvious one being where you are requesting data and related data based upon a unique business identifier. NoSql DBs mostly make this both trivial and fast.

I don’t think that’s right though? Requesting data and related data is the purview of RDBMS, unless your example was a completely denormalized single object (I know Mongo supports links between objects as well but that doesn’t seem to be what you’re saying?)

Also, the point about SQL-based databases becoming relevant where reporting and aggregation are primary misses the importance of relations, and of operations on relations - which is the point.

Maybe the comment was focused on "SQL" rather than using it as a shorthand for "relational database" like most people do. Yes, SQL makes aggregation and reporting easier, but its primary job is data retrieval, and in most cases via an identifying primary key.


No viable alternative? Why not one of the many other document db's, such as CouchDB? Genuinely curious what MongoDB does better in general than all the other document oriented databases.


> what MongoDB does better

Hosting and infrastructure?

Sure for your weekend project there are plenty of alternatives. At enterprise level you aren't paying for just the base product you're paying to have someone else worry about all the other stuff that keeps it up and running.

Yes you can find Couchdb hosting services but it's very nice to have the company that builds the product also control the hosting.

I've worked at a company that was a fairly large MongoDB customer. Even though MongoDB was a major part of our overall product, there was never a second of considering hosting our own solution, and there was likewise never a time I remember Mongo being responsible for a major outage or other infrastructure issue. That's worth quite a lot of money - not having to worry about an essential part of your own internal product infra.


I see, so it's more the managed infrastructure than the actual DB itself?


Because MongoDB is web scale.

(Honest answer: a company can continue way longer than you think just supporting legacy customers - need Lotus Notes? https://www.hcltechsw.com/notes )


I don't know anything about the actual numbers behind Mongo, but I question whether Mongo needs many large customers to justify their business model. Lucee is able to run a stable, ongoing open source software business by taking a fraction of Adobe ColdFusion's legacy customer base - most of whom switched explicitly for price reasons and probably haven't given the Lucee corporation a single cent.


The BSON interface of the MongoDB API, a commercially friendly license, and mutable collection structures. It certainly isn't the optimal solution for every application, but it does make sense in some roles.

Also, I look at projects which have survived over a decade rather fondly. ;)


Try typing "mongodb customers" into Google.

Here are a few I know are public references:

MetLife, Epic Games, Bosch, Inland Revenue (UK), Department for Work and Pensions (UK), Barclays, HSBC, Boots, Forbes, Toyota, Sanoma, Intuit, Verizon.


Startups. When you are not sure how to map data relationships yet, MongoDB is the best tool. You can easily build a prototype without worrying about the database.

MongoDB Community Edition is powerful software.


friends don't let friends use mongodb.

But in all seriousness: hstore is good enough for 99% of MongoDB users.


Isn't it experimental?


video game companies fo sho


No-SQL databases have their pros and cons and so do SQL databases.

Would it be feasible to use both, from within the same application, depending on the type of data in question?

The SQL vs. No-SQL debate has been going on for a long time. Why not use both? Has anybody tried that?


You don't need to switch to MongoDB to have a JSON document server. Postgres has excellent JSON support, you can just put it all in Postgres. Bonus feature: you can easily migrate your unstructured data to structured data.
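For instance, a rough sketch with psycopg2 and a hypothetical events table:

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc jsonb)")
        cur.execute("INSERT INTO events (doc) VALUES (%s)",
                    [Json({"type": "signup", "user": "ada"})])
        # query straight into the document; add a GIN index on doc if this gets hot
        cur.execute("SELECT doc->>'user' FROM events WHERE doc->>'type' = %s", ["signup"])
        print(cur.fetchall())

Later you can promote hot JSON fields into real columns, which is the unstructured-to-structured migration bonus mentioned above.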


And another bonus: yay, your backups are now all consistent, and you can focus on one technology stack instead of two.

Reducing the cognitive load is important in large code bases.


I've never understood the conflating of JSON-like with "unstructured": I would call this richly structured


For small apps, which are most apps, a distributed, managed, partitioned postgresql cluster is sufficient.

Heck, even a single one on a good machine is sufficient.

It has a colorful mix of SQL and NoSQL constructs, accessed through SQL queries.


For small apps, go with SQLite.


Is queryable encryption available in the Community edition?


Nice ARM support! 6.0 is not showing up on brew yet.



