Considerations during data analysis in plant breeding research

Rouxléne van der Merwe
Senior lecturer at Plant Breeding, University of the Free State

As in any research experimentation, the use of experimental designs and statistical analyses capable of providing accurate inferences is of fundamental importance in plant breeding field trials. Data collected from comparative experiments should be evaluated using appropriate statistical analysis. The choice of statistical analysis depends on the purpose of the research, the design of the field experiment and the nature of the response variable.

The purpose of experimentation could be one of many: for example, to describe the yield of different genotypes of a crop, to assess variability in quality, to determine the degree of association between different variables (plant characteristics), to predict the yield of a crop based on one or more independent variables (such as nutrient level, irrigation or planting date), or to support decision making.

Although different designs can be used to carry out different experiments, the choice of experimental design should be carefully considered, as it is determined by various factors. A main factor is soil heterogeneity: designs that include blocking can be used to account for heterogeneity in the field. Another factor is the choice and number of treatments. An experimental treatment is any process whose effect is to be measured and compared with others. The quantitative or qualitative settings of a treatment are called the treatment levels. When there are two or more types of treatments, each type is referred to as a factor. For example, when 18 different genotypes are evaluated in terms of yield, the genotypes are the treatments, giving 18 treatment levels, and a randomised complete block design would be sufficient for this trial.
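
As an illustration of the randomised complete block design mentioned above, the following minimal Python sketch randomises 18 genotypes independently within each block, so that every block contains each genotype exactly once. The genotype labels, the number of blocks (three) and the function name `rcbd_layout` are assumptions made for this example only and do not come from a specific trial.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # every block holds each treatment exactly once
        layout.append(order)
    return layout


# 18 hypothetical genotype labels, G01 to G18
genotypes = [f"G{i:02d}" for i in range(1, 19)]

# Three blocks is an assumed number of replications
layout = rcbd_layout(genotypes, n_blocks=3, seed=42)

for block_no, order in enumerate(layout, start=1):
    print(f"Block {block_no}: {' '.join(order)}")
```

Each printed row is the plot order for one block. In the field, blocks would be positioned so that the plots within a block are as uniform as possible.
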
However, when 18 different genotypes are evaluated in terms of their yield response to two different planting dates, and the planting date trials are performed on two separate fields, then genotype and planting date become two factors, and the trial will be planted using a split-plot design. An experiment should be well designed in order to avoid bias and should be sufficiently powerful to detect true treatment effects that are of biological significance.

The response variables can be categorised as qualitative or quantitative. Qualitative variables are those measured on a nominal or ordinal scale. Plant breeders often use these levels of measurement to classify genotypes into categories such as seed colour (yellow or white maize) or growth habit (determinate or indeterminate). Some categories are ranked, with genotypes arranged in sequence from the highest to the lowest, depending on the variable being measured; a disease rating scale is an example. For these variables, where assessments are subjective, non-parametric statistical methods can be used, which make no assumptions about the underlying distribution or its parameters, such as the mean and variance.

Quantitative variables are those measured on an interval or ratio scale; ratio scale measurements are the most widely used and can be subjected to mathematical procedures. Quantitative variables can be either discrete or continuous. Discrete variables take whole-number values and are usually counts of things (or frequencies), such as number of pods per plant and number of seeds per pod. Data collected on discrete variables are subjected to non-parametric statistical methods during data analysis, such as frequency distributions from which the mode, median and variance can be determined. Most economically important traits in plants, such as yield, plant height and dry weight, exhibit quantitative differences that have continuous distributions.
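
For the discrete count variables described above, the frequency distribution and the summary statistics named there (mode, median and variance) can be sketched with the Python standard library. The pod counts below are invented purely for illustration.

```python
from collections import Counter
from statistics import median, multimode, variance

# Invented pod counts for ten plants of one genotype
pods_per_plant = [12, 15, 12, 18, 14, 12, 15, 16, 13, 15]

# Frequency distribution: how often each count occurs
frequencies = Counter(pods_per_plant)
for count in sorted(frequencies):
    print(f"{count} pods: {frequencies[count]} plant(s)")

modes = multimode(pods_per_plant)  # all most-frequent values (ties are kept)
med = median(pods_per_plant)       # middle value of the sorted counts
var = variance(pods_per_plant)     # sample variance (n - 1 denominator)

print("mode(s):", modes)
print("median:", med)
print("variance:", round(var, 3))
```

`multimode` is used rather than `mode` so that tied most-frequent counts are all reported.
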
Data collected on continuous variables are subjected to parametric statistical methods during data analysis.

In plant breeding research, the study of genetic variation in quantitative traits is essential for successful exploitation of genetic variability. Quantitative traits can be analysed with parametric statistical tools to help the breeder design a suitable breeding methodology. Such analyses include estimation of parameters of genetic variability, the nature of gene action and gene interactions, genotype x environment interaction, and estimation of parameters useful in selection. The most common parametric test used to compare two or more means (e.g. genotype means) is the analysis of variance (ANOVA). It is sufficient for determining the coefficient of variation when dealing with the effect of treatments (e.g. genotypes) on individual traits (e.g. yield). However, when the plant breeder wishes to compare several traits or analyse more than two traits simultaneously, multivariate analysis is required. There are several methods of conducting multivariate data analysis; the choice of method depends on the research aim or purpose.

Finally, each statistical technique has certain strengths, weaknesses and conditions of use that should be clearly understood before attempting to apply the technique and interpret its results. It is advisable for any plant breeder or young researcher to become familiar with basic biometry and data analysis techniques and to consult a specialist in this field before a breeding or research project commences. This will not only reduce the probability of failure, and thus save time and resources, but also improve the precision of experimentation and plant breeding research.
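
To make the ANOVA and the coefficient of variation mentioned above more concrete, the sketch below partitions the total sum of squares of a small randomised complete block trial into genotype, block and error components. The yields (four genotypes in three blocks, in t/ha) and the function name `rcbd_anova` are invented for illustration; the sketch assumes balanced data with one observation per genotype per block.

```python
from math import sqrt


def rcbd_anova(data):
    """ANOVA for a balanced RCBD.

    data maps each genotype to a list of yields, one per block,
    with blocks in the same order for every genotype.
    """
    t = len(data)                          # number of genotypes
    b = len(next(iter(data.values())))     # number of blocks
    values = [y for ys in data.values() for y in ys]
    n = t * b
    grand_mean = sum(values) / n
    correction = sum(values) ** 2 / n      # correction factor

    ss_total = sum(y * y for y in values) - correction
    ss_treat = sum(sum(ys) ** 2 for ys in data.values()) / b - correction
    block_totals = [sum(ys[j] for ys in data.values()) for j in range(b)]
    ss_block = sum(bt ** 2 for bt in block_totals) / t - correction
    ss_error = ss_total - ss_treat - ss_block

    df_treat, df_block = t - 1, b - 1
    df_error = df_treat * df_block
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error

    return {
        "ss_total": ss_total, "ss_treat": ss_treat,
        "ss_block": ss_block, "ss_error": ss_error,
        "df_treat": df_treat, "df_block": df_block, "df_error": df_error,
        "f_genotype": ms_treat / ms_error,
        # coefficient of variation: root error mean square as % of grand mean
        "cv_percent": 100 * sqrt(ms_error) / grand_mean,
    }


# Invented yields (t/ha): genotype -> [block 1, block 2, block 3]
yields = {
    "G1": [4.2, 4.5, 4.8],
    "G2": [5.1, 5.3, 5.9],
    "G3": [3.8, 4.0, 4.4],
    "G4": [4.9, 5.2, 5.4],
}

result = rcbd_anova(yields)
print(f"F (genotypes) = {result['f_genotype']:.2f} "
      f"on {result['df_treat']} and {result['df_error']} df")
print(f"CV = {result['cv_percent']:.2f} %")
```

The F value would then be compared with the F distribution on the stated degrees of freedom, using statistical tables or software. In practice a dedicated statistical package would normally be used; the hand calculation is shown only to expose the arithmetic.
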
