This is a fairly deep topic, so the full answer is long and detailed. The short answer is "aliasing." Picking a single sample for each output pixel produces an image that many different source images could have produced: detail finer than the new pixel grid can represent (above half the new sampling rate) does not disappear, it shows up as false coarse patterns. A good downsampling algorithm instead filters first, so the result represents only a lower-frequency, bandlimited version of the original image.
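To make that concrete, here is a minimal 1-D sketch in Python with NumPy. The function names `decimate` and `box_downsample` are made up for illustration, and the box (block-average) filter is only a crude stand-in for a proper low-pass filter, not what any particular image library does.

```python
import numpy as np

def decimate(signal, factor):
    """Naive downsampling: keep every `factor`-th sample, discard the rest."""
    return signal[::factor]

def box_downsample(signal, factor):
    """Average each block of `factor` samples, then keep one value per block.

    Assumption: a box average is used here as a simple, imperfect low-pass filter.
    """
    usable = len(signal) // factor * factor  # drop any ragged tail
    return signal[:usable].reshape(-1, factor).mean(axis=1)

# One row of one-pixel-wide black/white stripes, and one row of solid black.
stripes = np.tile([0.0, 1.0], 8)
black = np.zeros(16)

naive_stripes = decimate(stripes, 2)
naive_black = decimate(black, 2)
filtered_stripes = box_downsample(stripes, 2)

# Point sampling cannot tell the two rows apart: both come out solid black.
print(np.array_equal(naive_stripes, naive_black))

# Averaging first gives mid-gray, which is what the stripes look like from afar.
print(filtered_stripes)
```

With point sampling the striped row and the black row become identical, which is exactly the confusion aliasing refers to. The averaged version loses the stripes too, but it replaces them with their mean brightness rather than with a pattern that was never in the source.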