Of course benchmarking is essential, but the Control.Parallel library lets you decouple expressions that can run in parallel (called sparks) from threads, and manages the whole shebang for you. You more-or-less say x `par` y, which sparks x so the runtime can evaluate it on another core if one is idle while the current thread carries on with y (`pseq` is the companion that pins down evaluation order, so y doesn't end up demanding x before the spark gets a chance to run). The RTS statistics will tell you how many sparks got "converted" to run on another CPU and whatnot, so you can compare and see if you actually got an improvement. It is a bit easier than spinning up dedicated threads, assuming you're parallelizing pure computations. For true concurrency, you still need to deal with threads.
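To make that concrete, here is a minimal sketch of the usual `par`/`pseq` idiom. The names `fib` and `parSum` are made up for illustration, and the naive Fibonacci is only a stand-in for some expensive pure computation; it assumes the `parallel` package is installed so that Control.Parallel is available.

```haskell
import Control.Parallel (par, pseq)

-- Hypothetical stand-in for an expensive pure computation.
fib :: Int -> Integer
fib n
  | n < 2     = toInteger n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Spark `a` so an idle core may pick it up, evaluate `b` on the
-- current thread, and only then combine the two results.
parSum :: Int -> Int -> Integer
parSum m n = a `par` (b `pseq` (a + b))
  where
    a = fib m
    b = fib n

main :: IO ()
main = print (parSum 27 28)
```

If you build this with `ghc -O2 -threaded` and run it with `+RTS -N -s`, the statistics should include a SPARKS line showing how many sparks were converted versus fizzled or garbage collected, which is the thing to compare against a plain sequential run. Whether it is actually faster depends on the workload, so benchmark it.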