"Knowing when to draw the line: designing more informative ecological experiments."
Reviewed 10/18/11
Data analysis has at its root the experimental design of a project. Every scientist who hopes to one day analyze their data must begin with the most appropriate arrangement of treatments and replications across space and time. Two alternative methods for analysis are linear regression and analysis of variance (ANOVA), each with its own trade-offs in the information it produces and in its statistical power.
ANOVA has traditionally been used for discrete independent variables, such as presence/absence or type-based categories. It has also been used for continuous variables that can be grouped into categories along a gradient, such as nutrient levels or classes of population density. Regression, on the other hand, is strictly meant for fully continuous independent variables that have a linear relationship with the response. The variables can be transformed to achieve that relationship, but the basic linear pattern must be met.
ANOVA and linear regression both have at their base the same mathematical model, the general linear model. The difference is that while regression estimates a single parameter (the slope) for the relationship between the independent and response variables, ANOVA creates a dummy variable term for each level of the discrete independent variable. It can then look at each level and determine whether it differs significantly from the other terms in the model. It becomes readily apparent, then, that the benefit of ANOVA comes from its ability to look at each term separately, without trying to force any sort of pattern onto the relationship between the response and independent variables. The cost, however, can be a model with an overabundance of terms, which reduces statistical power and makes it harder to relate the terms to one another.
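To make the shared-model point concrete, here is a minimal sketch in plain Python that fits the same data both ways. The data values, the four nutrient levels, and the function names `fit_regression` and `fit_anova` are all invented for illustration and do not come from the reviewed paper. The sketch relies on the fact that, in a one-way ANOVA, the least-squares estimate for each level is simply that level's mean.

```python
from statistics import mean

# Hypothetical data: 4 nutrient levels with 3 replicates each (invented values).
levels = [0.0, 1.0, 2.0, 3.0]
x = [lv for lv in levels for _ in range(3)]
y = [1.1, 0.9, 1.0, 2.1, 1.8, 2.1, 2.9, 3.2, 2.9, 4.1, 3.9, 4.0]

def fit_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def fit_anova(x, y):
    """One-way ANOVA cell means: one estimated parameter per level of x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return {lv: mean(vals) for lv, vals in groups.items()}

b0, b1 = fit_regression(x, y)
cell_means = fit_anova(x, y)

n = len(y)
reg_params = 2                      # intercept and slope
anova_params = len(cell_means)      # one mean per level
reg_resid_df = n - reg_params       # residual degrees of freedom, regression
anova_resid_df = n - anova_params   # residual degrees of freedom, ANOVA

print("regression:", round(b0, 3), round(b1, 3), "residual df =", reg_resid_df)
print("ANOVA cell means:", cell_means, "residual df =", anova_resid_df)
```

With the same twelve observations, the regression spends two parameters and the ANOVA spends four, so the ANOVA is left with fewer residual degrees of freedom. That gap widens as more levels are added, which is the "overabundance of terms" cost described above.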
Choosing one method over the other for data analysis and experimental design can be easy in some cases: for example, where there is no limit to the number of experimental units, or where an independent variable has no underlying continuity and so must be analyzed using discrete dummy terms. However, as mentioned before, the decision is more complicated when a continuous variable can be grouped into categories.
After reading this review I have decided that there are several main questions to focus on. A limited number of experimental units may require analysis using ANOVA, since it allows more replication at each level. If regression really is the desired method of analysis in this case, then the experiment should be designed so that falling back on ANOVA is possible. If the relationship turns out to be non-linear, then it will also be necessary to use ANOVA. Regression analysis requires that the residuals be normally distributed; this is not as much of a requirement for ANOVA. For regression, however, accurate measurement of the independent variable is critical.
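One way to picture the fall-back design is a replicated regression: several levels of the independent variable, each with replicates, so that either analysis can be run on the same data. With such a design, the classical lack-of-fit F test offers one check on whether a straight line is adequate. The sketch below is my own illustration rather than a procedure taken from the paper; the function name `lack_of_fit_f` and both data sets are hypothetical.

```python
from statistics import mean

def lack_of_fit_f(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit.

    Assumes more than two distinct levels of x and replicates at the levels.
    """
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    k = len(groups)

    # Pure error: scatter of replicates around their own level mean
    # (this is the residual sum of squares of the one-way ANOVA).
    ss_pe = sum((yi - mean(vals)) ** 2 for vals in groups.values() for yi in vals)
    # Lack of fit: squared distance between each level mean and the fitted
    # line, weighted by the number of replicates at that level.
    ss_lof = sum(len(vals) * (mean(vals) - (b0 + b1 * lv)) ** 2
                 for lv, vals in groups.items())

    df_lof, df_pe = k - 2, n - k
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe

# Hypothetical designs: 4 levels with 3 replicates each (invented values).
x = [lv for lv in (0.0, 1.0, 2.0, 3.0) for _ in range(3)]
y_linear = [1.1, 0.9, 1.0, 2.1, 1.8, 2.1, 2.9, 3.2, 2.9, 4.1, 3.9, 4.0]
y_curved = [0.1, -0.1, 0.0, 1.1, 0.9, 1.0, 4.1, 3.9, 4.0, 9.1, 8.9, 9.0]

f_linear, df1, df2 = lack_of_fit_f(x, y_linear)
f_curved, _, _ = lack_of_fit_f(x, y_curved)
print("linear data: F =", round(f_linear, 3), "on", (df1, df2), "df")
print("curved data: F =", round(f_curved, 3), "on", (df1, df2), "df")
```

A small F suggests the line is adequate and the regression estimates can be used. A large F, judged against the F distribution with the reported degrees of freedom, suggests the relationship is not linear, in which case the level-by-level ANOVA remains available because the replicates are already in place.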
Lastly, the benefits of regression are very simply described but can have huge effects on the quality of research produced. Regression generally has much higher statistical power than ANOVA, particularly at lower R-squared values. Regression is also much better equipped to describe the relationship between a response and an independent variable. Its parameter estimates can then be fed into ecological models and used in future research, a great incentive for those running simulation-based analyses.