
As a relative idiot when it comes to this sort of thing, I'd like to insert the following supplementary question: what is the sort of application/dataset for which Mongo is particularly suited?

I've used it on small projects, and have enjoyed it. Perhaps my data has just been simple/loosely-coupled enough to never run into these problems?

I read a lot of posts like this on HN before ever trying Mongo, so I've at least been convinced to always enforce a schema at the application layer. Others seem to keep learning that lesson in harder ways.



The biggest lure of Mongo is that it gives you a nice SQL-like query API, so it's fairly easy to get started with compared to other NoSQL alternatives. I primarily use it for small-to-medium-size apps, when I know upfront that I will never need to scale beyond a certain number of users in the short to medium term.
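To illustrate what that query API looks like, here is a toy matcher for Mongo-style filter documents. This is a sketch of the filter syntax only, not pymongo itself; real code would pass the same filter dict to `collection.find()` against a running server.

```python
def matches(doc, query):
    """Return True if doc satisfies a Mongo-style query filter."""
    for field, cond in query.items():
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 21}
            for op, val in cond.items():
                if op == "$gt" and not doc.get(field, float("-inf")) > val:
                    return False
                elif op == "$lt" and not doc.get(field, float("inf")) < val:
                    return False
                elif op == "$in" and doc.get(field) not in val:
                    return False
        elif doc.get(field) != cond:  # exact-match form
            return False
    return True

users = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 19},
]
# Roughly equivalent in the mongo shell: db.users.find({age: {$gt: 21}})
adults = [u for u in users if matches(u, {"age": {"$gt": 21}})]
```

The point is that filters are just nested documents, which is why the API feels familiar if you're coming from SQL WHERE clauses.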

It's not as bad as it's made out to be. It's only if you really need to scale out that you'd probably be better off picking something else.


Can you explain your reasoning? Isn't high scalability one of MongoDB's key features?


When you go to the supermarket, you may see products calling themselves "quality" or "luxury" or similar. Words like "powerful" and "scalable" are like that.


Thanks, but you didn't really explain anything. In what specific ways does it fail to scale well?


A few of the major issues I've seen are: the global write lock, which is only now starting to be addressed; reliance on the OS page cache instead of caching intelligently based on the data; poor failover support; and sharding not being the "turnkey" solution it was advertised as, since misconfiguring it can lead to poor performance. Bringing extra nodes online can also take a long time while data migrates over to them.


Scaling has a number of components to it; it's not just about the absolute number of requests. There are always trade-offs (read up on the CAP theorem). My experience with Mongo has been that it has very inconsistent performance, and such inconsistency makes capacity planning very difficult. Also, stuff like failover and sharding isn't elegant.


"what is the sort of application/dataset for which Mongo is particularly suited"

The majority of the NoSQL databases are based on Amazon's Dynamo: loosely coupled replication. MongoDB is one of the few (next to HBase and a few others) that adopts Google BigTable's architecture: data is divided into ranges, and each mongod node serves multiple ranges.
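The range partitioning described above can be sketched in a few lines. The split points and server names here are made up for illustration; the key space is cut into contiguous ranges and each server can own several of them.

```python
import bisect

# Hypothetical split points cutting the key space into four ranges:
# (-inf, "g"), ["g", "n"), ["n", "t"), ["t", +inf)
split_points = ["g", "n", "t"]
# Each range is owned by one server; a server may own several ranges.
range_owners = ["mongod-0", "mongod-1", "mongod-0", "mongod-2"]

def owner_of(key):
    """Route a key to the server that owns its range."""
    return range_owners[bisect.bisect_right(split_points, key)]

assert owner_of("apple") == "mongod-0"
assert owner_of("mango") == "mongod-1"
assert owner_of("zebra") == "mongod-2"
```

Because each range has exactly one owner at a time, a single node can serialize writes for the keys it owns, which is what makes the atomicity discussed below cheaper than in a leaderless Dynamo-style design.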

This means MongoDB is able to provide atomicity where it's harder for other NoSQL databases. In particular, we need to be able to do some sort of "compare and swap" operation that is guaranteed to be atomic/consistent, while still being able to have our mongod nodes distributed over multiple datacenters.
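Compare-and-swap here means: update a value only if it still holds the value you last read. MongoDB exposes this via findAndModify (find_one_and_update in pymongo) with a filter on the expected current value. The in-memory stand-in below is just to pin down the semantics; it is not real MongoDB code.

```python
import threading

class Doc:
    """In-memory stand-in for a document supporting atomic CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Atomically set value to `new` iff it currently equals `expected`.
        Returns True on success, False if another writer got there first."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

counter = Doc(0)
assert counter.compare_and_swap(0, 1)      # succeeds: value was 0
assert not counter.compare_and_swap(0, 2)  # fails: value is now 1
```

On a single range owner this is one atomic operation; in a leaderless system the same guarantee requires coordinating a quorum of replicas.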

In Dynamo-based architectures, in order to provide the same amount of atomicity, you always end up writing to at least half + 1 of the replicated nodes you have available in your cluster. This is more awkward, and reduces the flexibility of the whole system (the atomicity guarantee Mongo provides also works for stored javascript procedures, for example).
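The "half + 1" figure comes from quorum arithmetic: with N replicas, a write must reach W nodes and a read R nodes such that W + R > N, so every read quorum overlaps every write quorum. Writing to floor(N/2) + 1 nodes is the smallest W that is also a majority:

```python
def min_write_quorum(n_replicas):
    """Smallest majority write quorum: floor(N/2) + 1."""
    return n_replicas // 2 + 1

def quorums_overlap(n, w, r):
    """Reads see the latest write iff read and write quorums intersect."""
    return w + r > n

assert min_write_quorum(3) == 2
assert min_write_quorum(5) == 3
assert quorums_overlap(5, 3, 3)       # any reader overlaps any writer
assert not quorums_overlap(5, 2, 2)   # stale reads possible
```

So every "atomic" write in a Dynamo-style cluster pays for a round trip to a majority, whereas a range owner can decide locally.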

Having said that, we've been using MongoDB for about 3 years in production at this point, but we're far from happy about the availability it provides (issues like MongoDB not detecting that a node has gone down, failing to fail over, etc.). We run an HA service, and to date all of our uptime failures have been either the fault of our hosting provider or of MongoDB not failing over when it should. As such, we're always looking for a better alternative to move to, but at the moment MongoDB is about as good as it gets.


MongoDB is not even remotely close to the BigTable architecture. It has a different data model, a different sharding model, and just about a different everything.


I know that MongoDB is very different in architecture from BigTable (as opposed to, say, HBase and BigTable), but I always understood that the fundamental way it assigns regions to region servers and looks them up (or, in MongoDB terminology, shards and shard servers) was based on the BigTable architecture.

Could you elaborate on the differences in the sharding model between the two?


Mongo's sharding model is the only thing I would call remotely close to BigTable. Everything else is leagues different.


You say you don't like the availability issues, then state "but at the moment MongoDB is about as good as it gets"?

Just. Wow.

Get your head out of the sand mate. MongoDB is nowhere NEAR as good as it gets.


I think you misunderstood what I meant; I meant that MongoDB is as good as it gets for our requirements, not availability-wise. I agree if availability is your only concern, there are far better solutions.


Documents with lots of optional fields, or lots of fields that can hold multiple values.

Workloads where writes are rare or you have a single writer separate from your readers, most reads are simple fetch-by-ID, and more complex queries are unpredictable and/or suitable for overnight batch runs.

Workloads where performance doesn't matter but you want schemaless for convenience.



