
Could you help me out here: which is easier to get started developing with, MPI or Hadoop?

It's not just about cheap processor time, it's about ease of use. Hadoop is a platform that does much of the hard part for you, giving you an enormous head start on solving problems in parallel, because all you really have to think about is the algorithm you're executing. Thinking in mapreduce is still hard, but it's a whole lot easier than thinking in MPI, and it takes a lot less time to get real computations happening on your data.
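To make "all you have to think about is the algorithm" concrete, here's a minimal sketch in plain Python, not real Hadoop code. The names `map_phase`, `reduce_phase`, and `run_mapreduce` are made up for illustration; `run_mapreduce` is only a single-machine stand-in for the grouping and scheduling a framework would do across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Combine all values that were emitted for one key.
    return word, sum(counts)

def run_mapreduce(documents):
    # Hypothetical stand-in for the framework: group mapper output
    # by key, then hand each group to the reducer.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

counts = run_mapreduce(["the cat sat", "the dog sat down"])
```

The part you write is the two small functions at the top. Everything in `run_mapreduce` is what the platform is supposed to take off your hands.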

The cloud computing time could be MORE expensive than supercomputer time. That doesn't matter. The point is that this stuff is now accessible to a much wider group than MPI was, and they're learning the skills to crunch gobs of data.

It's pretty clear that you're a 'smart kid' and that both MPI and mapreduce are pretty easy for you. That's not the case for most people. You're talking about hard scientists. I'm talking about normal developers.



You have a good point, but for what it's worth, here's the practical example that is the basis of my opinion on this. I had a very large corpus of text and I wanted to construct a word association graph. This involves multiplying together lots of sparse 10^5 x 10^9 matrices that are too large to fit in a single machine's memory, so the work has to be done out of core. Getting together 6 computers and setting up Hadoop and HDFS took me about 2 days starting from nothing.

Figuring out how to do all of these out-of-core sparse matrix manipulations honestly took me about a week of tinkering before I had anything worth even trying. I'm certainly not an expert at this stuff, so maybe a better coder could have done it a lot faster, but that's my problem.
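For anyone wondering what a sparse matrix product looks like when phrased as map/reduce steps, here's a rough single-machine sketch. It is one common way to do it, not the code I actually ran: matrices are assumed to be lists of (row, col, value) triples for nonzero entries, and the name `sparse_matmul` is made up. In a real out-of-core job each grouping step would be a shuffle over data on disk rather than an in-memory dict.

```python
from collections import defaultdict

def sparse_matmul(a_entries, b_entries):
    # Inputs are iterables of (row, col, value) for nonzero entries only.
    # "Map": key every entry by the shared inner index k.
    a_by_k = defaultdict(list)
    b_by_k = defaultdict(list)
    for i, k, v in a_entries:
        a_by_k[k].append((i, v))
    for k, j, v in b_entries:
        b_by_k[k].append((j, v))

    # "Reduce": for each k, emit partial products keyed by (i, j),
    # then sum all partial products that share the same (i, j).
    result = defaultdict(float)
    for k, a_list in a_by_k.items():
        for i, a_val in a_list:
            for j, b_val in b_by_k.get(k, []):
                result[(i, j)] += a_val * b_val
    return dict(result)

A = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)]
B = [(0, 1, 4.0), (2, 1, 5.0), (1, 0, 6.0)]
C = sparse_matmul(A, B)
```

Only index pairs that actually meet on some k ever produce work, which is the whole reason to stay sparse. The hard part at scale is that the partial products per k can blow up, and that's where the tinkering goes.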

What would be a real advance in bringing massive-scale data analysis to "the people" would be something like R or MATLAB on top of Hadoop that makes doing things like this completely automagic.


Regarding the real advance: that stuff is the very 'revolution' that's coming that I talk about :) VisiCalc kind of sucked, but Excel is pretty good. It's still very early, but the way forward has become clear.



