It is very often stated that minimizing squared residuals is preferred over minimizing absolute residuals because it is computationally simpler. But it may also be better for other reasons. Namely, if the assumptions are true (and this is not so uncommon), then it provides a solution that is (on average) more accurate.
Maximum likelihood
Least squares regression and quantile regression (when performed by minimizing the absolute residuals) can be seen as maximizing the likelihood function for Gaussian and Laplace distributed errors respectively, and in this sense they are closely related.
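Gaussian distribution:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
with the log-likelihood being maximized when the sum of squared residuals is minimized
$$\log \mathcal{L}(x) = -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \frac{1}{2\sigma^2} \underbrace{\sum_{i=1}^n (x_i-\mu)^2}_{\text{sum of squared residuals}}$$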
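Laplace distribution:
$$f(x) = \frac{1}{2b} e^{-\frac{|x-\mu|}{b}}$$
with the log-likelihood being maximized when the sum of absolute residuals is minimized
$$\log \mathcal{L}(x) = -n\log(2) - n\log(b) - \frac{1}{b} \underbrace{\sum_{i=1}^n |x_i-\mu|}_{\text{sum of absolute residuals}}$$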
Note: the Laplace distribution and the sum of absolute residuals relate to the median, but this can be generalized to other quantiles by giving different weights to negative and positive residuals.
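To make the link between the loss functions and the mean, median and other quantiles concrete, here is a minimal sketch for a plain location estimate (no regressors). The function names `sum_squared` and `pinball`, the weight convention (weight tau on positive residuals, 1 - tau on negative ones) and the simulated data are my own choices for illustration.

```python
import random
import statistics

def sum_squared(q, data):
    """Sum of squared residuals around a candidate location q."""
    return sum((x - q) ** 2 for x in data)

def pinball(q, data, tau):
    """Asymmetrically weighted sum of absolute residuals.

    Positive residuals get weight tau, negative ones weight 1 - tau;
    tau = 0.5 is half the plain sum of absolute residuals.
    """
    return sum(tau * (x - q) if x >= q else (1 - tau) * (q - x) for x in data)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(101)]  # odd n: unique median

# The weighted absolute loss is piecewise linear and convex in q, so a
# minimum is attained at one of the data points; searching those is enough.
best_median = min(data, key=lambda q: pinball(q, data, 0.5))
best_q90 = min(data, key=lambda q: pinball(q, data, 0.9))

mean = statistics.mean(data)
print("absolute loss minimizer:", best_median, "median:", statistics.median(data))
print("tau = 0.9 minimizer:", best_q90, "91st order statistic:", sorted(data)[90])
# The squared loss is smallest at the sample mean; nudging q raises it.
print(sum_squared(mean, data) < sum_squared(mean + 0.01, data))
```

With equal weights the minimizer is the sample median, with tau = 0.9 it moves to an upper order statistic, and the squared loss is minimized at the sample mean.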
Known error distribution
When we know the error distribution (when the assumptions are likely true), it makes sense to choose the associated likelihood function. Maximizing that likelihood, which means minimizing the corresponding sum of residuals, gives the better estimate.
Very often the errors are (approximately) normally distributed. In that case least squares is the best way to find the parameter μ (which is both the mean and the median of the distribution). It is best because it has the lowest sampling variance of all unbiased estimators. Or, more strongly, you can say that it is stochastically dominant (see the illustration in this question comparing the distribution of the sample median and the sample mean).
So, when the errors are normally distributed, the sample mean is a better estimator of the distribution median than the sample median is. Likewise, least squares regression is a better estimator of the quantiles than the least sum of absolute residuals.
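A small simulation can illustrate this. The sketch below compares the sampling variance of the sample mean and the sample median, first for normal errors and then for Laplace errors. The sample size, number of repetitions, seed and the helper name `sampling_variances` are arbitrary choices of mine, not part of the argument above.

```python
import random
import statistics

def sampling_variances(draw, n=25, reps=2000):
    """Variance, over repeated samples, of the sample mean and sample median."""
    means, medians = [], []
    for _ in range(reps):
        sample = [draw() for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

random.seed(1)

# Normal errors with mu = 0 and sigma = 1.
normal_mean_var, normal_median_var = sampling_variances(
    lambda: random.gauss(0.0, 1.0)
)

# Laplace errors with mu = 0 and b = 1, drawn as the difference of two
# independent unit exponentials.
laplace_mean_var, laplace_median_var = sampling_variances(
    lambda: random.expovariate(1.0) - random.expovariate(1.0)
)

print("normal:  var(mean) =", normal_mean_var, " var(median) =", normal_median_var)
print("laplace: var(mean) =", laplace_mean_var, " var(median) =", laplace_median_var)
```

Under normal errors the mean should come out with the smaller variance, and under Laplace errors the median should, which matches choosing the loss function that belongs to the error distribution.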
Because so many problems deal with normally distributed errors, the least squares method is very popular. To work with other types of distributions one can use the generalized linear model (GLM). And the method of iteratively reweighted least squares, which can be used to solve GLMs, also works for the Laplace distribution (i.e. for absolute deviations), which is equivalent to finding the median (or, in the generalized version, other quantiles).
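As a rough sketch of how repeated least squares can handle absolute deviations, the code below estimates a location parameter by repeatedly solving a weighted least squares problem with weights 1/|residual|. The function name `irls_median`, the iteration count and the small constant `eps` are assumptions for this illustration; it is not a production-grade quantile regression solver.

```python
import random
import statistics

def irls_median(data, iterations=500, eps=1e-9):
    """Approximate the minimizer of sum |x_i - mu| by reweighted least squares."""
    mu = statistics.mean(data)  # start from the ordinary least squares solution
    for _ in range(iterations):
        # |r| = r^2 / |r|, so weights 1/|r| turn the absolute loss into a
        # weighted squared loss; eps guards against division by zero.
        weights = [1.0 / max(abs(x - mu), eps) for x in data]
        # Closed-form solution of the weighted least squares problem.
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu

random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(101)]
print(irls_median(data), statistics.median(data))
```

The two printed values should agree closely. Giving different weights to positive and negative residuals would, under the same assumptions, target other quantiles.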
Unknown error distribution
Robustness
The median and other quantiles have the advantage that they are very robust with regard to the type of distribution. The actual values do not matter much; the quantiles only care about the order. So no matter what the distribution is, minimizing the absolute residuals (which is equivalent to finding the quantiles) works very well.
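A tiny example of "only the order matters", with made-up numbers: replacing the largest value by an extreme outlier leaves the median untouched but drags the mean along.

```python
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = [1.0, 2.0, 3.0, 4.0, 500.0]  # largest value replaced by an outlier

print(statistics.median(clean), statistics.median(corrupted))  # 3.0 3.0
print(statistics.mean(clean), statistics.mean(corrupted))      # 3.0 102.0
```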
The question becomes complex and broad here, and it depends on what type of knowledge we have or do not have about the distribution function. For instance, a distribution may be approximately normal but with some additional outliers. This can be dealt with by removing the outer values. Removing the extreme values even works for estimating the location parameter of the Cauchy distribution, where the truncated mean can be a better estimator than the median. So not only in the ideal situation when the assumptions hold, but also in some less ideal applications (e.g. additional outliers), there might be good robust methods that still use some form of a sum of squared residuals instead of a sum of absolute residuals.
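The following sketch illustrates the Cauchy case by simulation. The trimming fraction of 0.38 per tail, the sample size, the seed and the helper names `trimmed_mean` and `cauchy` are illustrative assumptions. The gap between the truncated mean and the median is modest, so it may not be resolved reliably at this number of repetitions; the failure of the plain mean is the clear effect.

```python
import math
import random
import statistics

def trimmed_mean(sample, trim=0.38):
    """Mean of the central values after dropping a fraction `trim` from each tail."""
    ordered = sorted(sample)
    k = int(trim * len(ordered))
    return statistics.mean(ordered[k:len(ordered) - k])

def cauchy():
    """Standard Cauchy draw (location 0, scale 1) via the inverse CDF."""
    return math.tan(math.pi * (random.random() - 0.5))

random.seed(3)
errors = {"mean": [], "median": [], "trimmed mean": []}
for _ in range(2000):
    sample = [cauchy() for _ in range(51)]
    errors["mean"].append(abs(statistics.mean(sample)))
    errors["median"].append(abs(statistics.median(sample)))
    errors["trimmed mean"].append(abs(trimmed_mean(sample)))

# Typical (median) absolute error of each estimator of the true location 0.
typical_error = {name: statistics.median(errs) for name, errs in errors.items()}
print(typical_error)
```

Both robust estimators should show a much smaller typical error than the untrimmed mean, even though the truncated mean is still a least squares estimate on the retained values.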
I imagine that regression with truncated residuals might be computationally much more complex. So it may actually be quantile regression that gets performed because it is computationally simpler (not simpler than ordinary least squares, but simpler than truncated least squares).
Biased/unbiased
Another issue is biased versus unbiased estimators. Above I described the maximum likelihood estimate for the mean, i.e. the least squares solution, as a good or preferable estimator because it often has the lowest variance of all unbiased estimators (when the errors are normally distributed). But biased estimators may be better (lower expected sum of squared errors).
This again makes the question broad and complex. There are many different estimators and many different situations in which to apply them. An adapted sum of squared residuals loss function often works well to reduce the error (e.g. all kinds of regularization methods), but it need not work well in all cases. Intuitively it is not strange to imagine that, since the sum of squared residuals loss function often works well among unbiased estimators, the optimal biased estimator is probably something close to a sum of squared residuals loss function.
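As one example of such an adapted loss, here is a minimal sketch comparing ordinary least squares with ridge regression (squared residuals plus a penalty on the slope) for a single regressor without intercept. The true slope, noise level, penalty `lam`, design points and the helper name `slopes` are all made-up values for illustration.

```python
import random

def slopes(xs, ys, lam):
    """No-intercept slope from ordinary least squares and from ridge.

    Ridge minimizes sum (y - b*x)^2 + lam * b^2, giving b = Sxy / (Sxx + lam).
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx + lam)

random.seed(4)
true_slope, sigma, lam = 0.5, 1.0, 1.0        # illustrative values
xs = [-1.0 + 2.0 * i / 9 for i in range(10)]  # fixed design on [-1, 1]

ols_sq_err, ridge_sq_err = [], []
for _ in range(5000):
    ys = [true_slope * x + random.gauss(0.0, sigma) for x in xs]
    b_ols, b_ridge = slopes(xs, ys, lam)
    ols_sq_err.append((b_ols - true_slope) ** 2)
    ridge_sq_err.append((b_ridge - true_slope) ** 2)

mse_ols = sum(ols_sq_err) / len(ols_sq_err)
mse_ridge = sum(ridge_sq_err) / len(ridge_sq_err)
print("MSE OLS:", mse_ols, " MSE ridge:", mse_ridge)
```

In this setting the biased ridge estimate should have the lower mean squared error. For a large enough true slope the ordering reverses, which is the sense in which regularization need not work well in all cases.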