Can anyone point me to a good technical review/discussion which is favorable to CouchDB? I've spent a lot of time reading their documentation and I didn't "get it". I am afraid I'm not smart enough to understand why running your queries in the form of two JavaScript functions can be faster or more convenient than mighty SQL.
Frankly, I am not into the "distributed" thing either, since I haven't seen a startup (with my own eyes) that had such insane load requirements that $1K of MySQL hardware couldn't easily handle them.
However, I realize that the authors didn't spend all that time implementing something without a real need for it; I just haven't discovered it yet - most of the CouchDB stuff I was able to find was tutorials, docs, etc., but I wish I could find a systems architect blog post titled "How CouchDB saved our ass".
This is mostly an "it depends" sort of issue. Most people/sites don't really need a big SQL engine to store their data but follow the path of least resistance in that general direction. CouchDB is part of a growing movement away from general-purpose RDBMSs towards task-specific data engines, and from a world where ACID was all that mattered to one where the BASE model can provide an alternative set of benefits.
What CouchDB gives you is 1) schema-less data storage (so you can change your data schema on the fly), 2) built-in map-reduce that lets you perform some queries faster (at the cost of needing to re-think how you perform certain SQL JOIN queries), 3) an easy path towards data distribution, and 4) a database that is accessed via HTTP using REST principles.
Running your queries as a couple of JavaScript functions (or Python functions, or Ruby functions, or functions in any other language that has a view server) is not necessarily faster than SQL, and in fact is probably slower. What makes the approach faster and more scalable overall is the built-in map/reduce primitives that these functions use. Since the db is accessed via HTTP, the data format is JSON, and you can write queries in JavaScript, this system also introduces the intriguing possibility that you can cut the app server out of the picture and write web apps that interact directly with the data.
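To make the map/reduce idea concrete, here is a minimal sketch in Python rather than a real CouchDB view. The names map_doc, reduce_values and run_view, and the sample documents, are made up for illustration; in CouchDB itself the map function would call emit(key, value) and the server would build and maintain the index for you.

```python
from collections import defaultdict

# Hypothetical sample documents; CouchDB stores schema-less JSON like this.
docs = [
    {"_id": "a", "type": "post", "author": "ann"},
    {"_id": "b", "type": "post", "author": "bob"},
    {"_id": "c", "type": "comment", "author": "ann"},
    {"_id": "d", "type": "post", "author": "ann"},
]

def map_doc(doc):
    """Map step: yield (key, value) pairs for one document."""
    if doc.get("type") == "post":
        yield doc["author"], 1

def reduce_values(values):
    """Reduce step: collapse all values emitted under one key."""
    return sum(values)

def run_view(documents):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}

posts_per_author = run_view(docs)
print(posts_per_author)  # {'ann': 2, 'bob': 1}
```

The point of the split is that map_doc only ever looks at one document, so its output for a document does not have to be recomputed until that document changes.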
CouchDB uses the map/reduce query model, which achieves its scalability by being intrinsically parallelizable. It aims to provide the foundation for databases that are scalable from the beginning, or at least offer a straightforward path to scalability.
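"Intrinsically parallelizable" here means roughly that a reduce can be run on separate chunks of the data and the partial results combined afterwards. A small Python sketch of that property, with hypothetical names (count_by_key, merge_counts) and made-up data; it illustrates the idea and is not how CouchDB is implemented:

```python
from collections import Counter
from functools import reduce

# Hypothetical (key, value) pairs, as a map step might emit them.
emitted = [("ann", 1), ("bob", 1), ("ann", 1), ("cat", 1), ("bob", 1), ("ann", 1)]

def count_by_key(pairs):
    """Reduce one partition on its own: sum the values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def merge_counts(left, right):
    """Combine two partial results; Counter addition sums per key."""
    return left + right

# Split the work into partitions that could live on different machines.
partitions = [emitted[0:2], emitted[2:4], emitted[4:6]]
partials = [count_by_key(p) for p in partitions]
combined = reduce(merge_counts, partials, Counter())

# The combined partials equal a single pass over all the data.
assert combined == count_by_key(emitted)
print(dict(combined))
```

This only works because summing is associative; a reduce that depends on seeing every value at once would not split up this way.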
Judging by all the "How we scaled MySQL with mad hacks at [some growing startup]" articles that appear here, I would say there is a serious need for CouchDB. If your company starts growing as fast as Twitter and you don't have a team of brilliant ninja developers to build you a custom messaging server from scratch, you'll see the need for it too.
It's too bad people think MySQL is representative of all relational databases. I guess "mad hacks" make for better press / blog buzz than "we used postgres/oracle/etc. and it wasn't a big deal". (Some people love all-nighters, too.)
Then again, the last thing I read about MySQL had somebody complaining how slow it was, but they hadn't ever indexed their tables. MySQL may not be the issue.
I would love to hear about CouchDB from someone who also understands relational databases. Under what circumstances would it be useful, besides hypothetical scaling? I don't need a better big RDBMS, but I wouldn't want to miss another incredibly useful "small" database tool like sqlite.
A lot of people who didn't get distributed DBs changed their minds after reading this tutorial. I hope it can be informative in some way for you. Sorry my English sucks but the article should be understandable.
SQL does not scale out to multiple databases easily, CouchDB does.
And I know many startups where the above is not true. In fact, any successful startup hits a major bottleneck when they have to distribute MySQL. It's not a function of money, it's a function of the time and expertise to implement sharding well.
"SQL does not scale out to multiple databases easily, CouchDB does."
I don't believe CouchDB currently does this any better than what people are doing with RDBMSs like MySQL. Right now with CouchDB, there's no built-in partitioning, so if you want to distribute documents across databases on multiple host systems, you use either some sort of reverse proxy to distribute requests across hosts or application-level database sharding. Both of these approaches are currently used by folks scaling out things like MySQL.
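For anyone wondering what application-level sharding looks like, a rough Python sketch follows. The host names, the "app" database name, and the helpers shard_for and doc_url are all assumptions made up for this example; the only CouchDB-specific part is that a document lives at /database/document-id over HTTP.

```python
import hashlib

# Hypothetical host list; each entry would be a separate CouchDB server.
HOSTS = [
    "http://db0.example.com:5984",
    "http://db1.example.com:5984",
    "http://db2.example.com:5984",
]

def shard_for(doc_id, hosts=HOSTS):
    """Hash the document id so the same id always maps to the same host."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

def doc_url(doc_id, db="app"):
    """Build the URL the application would read or write this document at."""
    return f"{shard_for(doc_id)}/{db}/{doc_id}"

print(doc_url("user:42"))
```

Plain modulo hashing like this remaps most ids as soon as the host list changes, which is part of why doing sharding well takes time and expertise whether the backend is CouchDB or MySQL.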
To what extent is this a problem with relational databases as a whole versus MySQL in particular?
Also, is it known what problems couchdb encounters as it scales? Issues with indexing?
I'm curious because JSON is a subset of Lua's primary datatype in much the same way it is Javascript's object notation - if you change surface details like braces and string quoting, it's semantically identical.
We are using couchdb on massify.com for an upcoming product.
The replication is handy, but in the end you'll end up sharding in the app layer, because afaik it doesn't add anything in that regard. We are a long ways off from dealing with that though, so I honestly don't know much about it other than setting up replication which is easy as pie.
If you still have to shard with CouchDB, then yes, CouchDB is completely useless. But in theory, CouchDB should be able to take care of sharding for you, and if it can, that'd make it much better than MySQL.
What would you use it for? Also, how would one go about implementing something like this in practice?
CouchDB isn't truly distributed, so one would need to run multiple isolated instances which would in turn impose restrictions on the size of a single database as well as the number of concurrent writes.
I've been toying with the idea of writing a CouchDB interface on top of HBase (or possibly some other column oriented DBMS) to overcome these issues and am wondering if anyone else has considered doing the same thing.
I'm not so sure about the isolated part. If I recall correctly, all instances should have the same data. Hence the size of a given database should match on all instances.
And why restrict the size? ...just charge per mb and put the onus on the user.
I thought you could write a javascript function to filter out datasets for each server, the client then queries multiple servers looking for what it wants?
Specifically CouchDB. Not SimpleDB or Google's Big Table. I've played with CouchDB and Python and I like it but it's overkill for my tiny just for fun projects, but if it turned out more cost effective to host in-house in the future, not having the lock-in is super attractive.
I like the concept. There could be an interesting business idea here. One of the problems that concerns me is speed of data access. If I'm running the service on AWS, your app will need fast access to your data. If you are hosting your app on AWS, why not run couchDB yourself? You will get much better speeds utilizing the cloud service's internal infrastructure rather than accessing your data via the internet. If you are hosting your web service somewhere else, then why not use the database solutions provided there? I'm just trying to understand the need for this kind of service.