What do you do when you get more data than fits into ram? I've been working on a...

antirez · on Oct 21, 2011

What do you do when you get more data than fits on disk?

You do the same with memory: either you buy more RAM, or you distribute the data across multiple hosts (that is what we are trying to do with Redis Cluster).

The whole point is, what order of magnitude is my data? Is memory a big enough storage for my needs?

For instance, for Lamer News you can store many millions of news items and comments in 1 GB, so it makes perfect sense for this application. For other applications you want a mixed solution where fast metadata is stored in Redis and larger documents on disk in some other database. For other apps you need Cassandra and nothing else, and for other apps a good SQL database. It depends, as usual :)
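
To illustrate the "distribute the data across multiple hosts" option, here is a minimal sketch of client-side sharding by key hash. It is not Redis Cluster's algorithm; the host names and the `host_for_key` helper are hypothetical, and a plain hash-modulo scheme like this one remaps most keys whenever the host list changes.

```python
import zlib

# Hypothetical host names, for illustration only.
HOSTS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def host_for_key(key, hosts=HOSTS):
    """Pick a host by hashing the key.

    The same key always maps to the same host as long as the
    host list does not change.
    """
    digest = zlib.crc32(key.encode("utf-8"))
    return hosts[digest % len(hosts)]
```

A client would call `host_for_key("news:42")` before each command and send the command to the returned host.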

LeafStorm · on Oct 20, 2011

The two attempts at disk storage so far - VM and Diskstore - are both based on the concept of RAM as the primary datastore and the disk as merely auxiliary storage. However, antirez is a perfectionist, so he scrapped both of them when they didn't work as well as he had hoped.

In the long term (i.e. after cluster is finished), there are plans to manipulate data structures directly on disk (an incredibly elegant solution, and also really good for SSDs). Though in the short term, you are right in that RAM is far more expensive than disk, and this does limit the utility of Redis as a primary datastore.

llimllib · on Oct 21, 2011

So... what's going to happen when lamernews runs out of RAM?

ohyes · on Oct 21, 2011

I think it is actually pretty obvious what /happens/ when it runs out of RAM.

You have to either buy more RAM, start using the VM option, or start using OS virtual memory. The latter two will slow it down a bit.

Another possibility is to figure out which sets of data are taking up the most space, and offload them to a separate disk-backed DB, and just cache the most frequently accessed subset of that data through Redis (this could be easily done through key expiration). I find this option kind of painful, mostly because I really like the way that Redis approaches data, and using another database is comparatively inconvenient.
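
A rough sketch of that cache-with-expiration idea, assuming a dict-like disk-backed store. `ExpiringCache` is a hypothetical in-process stand-in for Redis (its `setex` only mimics the spirit of Redis's SETEX command), and `fetch` is an illustrative helper, not part of any library.

```python
import time


class ExpiringCache:
    """In-process stand-in for a Redis instance that uses key expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The key has expired: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)


def fetch(key, cache, backing_store, ttl_seconds=300):
    """Read through the cache; on a miss, load from the backing store."""
    value = cache.get(key)
    if value is None:
        value = backing_store[key]  # the slow, disk-backed lookup
        cache.setex(key, ttl_seconds, value)
    return value
```

Keys that are read often keep getting reloaded into the cache, while cold keys expire and live only in the backing store.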

Having worked with it a bit, I think Redis is a completely awesome piece of technology, and in practice it is perfectly good to use for something like this, at least in the short term. I don't want to bring down Antirez's hard work in any way.

Being the type of developer that hems and haws over corner cases and unlikely what-ifs, however, I felt the need to ask, as it has been in the back of my mind for a while.

I am encouraged that there will be a version that manipulates the data structures on disk, as that was the best solution that I could think of as well. (Basically you could then run two instances of Redis, a RAM-based one and a disk-based one, echoing commands to both, but allowing data on the RAM-based Redis to expire and taking responses from whichever returns an answer first.)

My only hope is that clustering won't take too long and that it won't be abandoned in favor of a different castle in the sky if it doesn't turn out to be perfect.

nknight · on Oct 21, 2011

That's pretty mind-numbingly self-evident, isn't it? The kernel kills Redis and the site is down.

aaronblohowiak · on Oct 21, 2011

Depending on your settings, the kernel could also page out some RAM, which early reports suggest isn't so bad on a high-end SSD.

bermanoid · on Oct 21, 2011

Isn't that more or less the exact use-case for Membase? Or does Redis offer a lot that Membase doesn't when the data does all fit in memory?