2.2 Estimating genetic effects of the parents

Because the content of this section may not be common knowledge among breeders, we first describe, for the interested reader, some basic aspects of the statistical methods used. These are then illustrated with data in Example 2 (section 2.2.1 and Appendix 2) and in Example 3 (Appendix 3).

2.2.1 General combining ability (GCA)

The genotypic effects of the dura female and the pisifera male parents can be estimated from the performance of their tenera offspring, provided that these effects are additive. The expected yield of the tenera offspring of the cross DixPj, E(yij), can then be written as the sum of a general constant, μ, the genotypic effect αi of the dura mother Di and the genotypic effect βj of the pisifera father Pj:

E(yij) = μ + αi + βj.

In quantitative genetics these additive effects of the parents are called General Combining Ability (GCA) values.

For a set of C crosses, derived from A dura and B pisifera, where C ≤ A*B, the parameters μ, α1, ..., αA, β1, ..., βB can be estimated using the Least Squares Method. Assume that there are nij plots available of a certain tenera cross DixPj; if no cross has been made then nij = 0. Let us first consider the case of a completely randomized design (CRD), i.e. the plots are allotted at random to the progenies. (In section 2.5 we will consider the use of an incomplete block design to compare the tenera offspring.)

In the following example we have made C=16 crosses between A=5 dura and B=4 pisifera. In the experimental field there were 20 plots available and a completely randomized design (CRD) was used. The number of asterisks (*) in the table below shows how many plots are used for a certain cross; hence two asterisks mean two plots, etc. So n11=2, n12=1, n13=0 (no cross of dura 1 x pisifera 3), n14=1, etc.

EXAMPLE 1

 

pisifera

1

2

3

4


dura 1

**

*

*

dura 2

*

*

**

dura 3

*

*

*

*

dura 4

**

**

dura 5

*

*

*

*


The actual yield yijk of the k-th plot of a tenera offspring of the cross DixPj is a random sample from the population of all possible observations from this cross, with population mean or expectation E(yijk) and variance σ²; hence, the statistical model is yijk = E(yijk) + eijk, where eijk is the effect of the environment or error on this k-th plot. These error terms eijk are such that the expectation E(eijk) = 0 and the variance Var(eijk) = σ². They can be assumed to be uncorrelated with one another because a randomization procedure, as in a completely randomized design (CRD), has been used to allot the plots of the field to the crosses.

With such a model for the yields, the Least Squares Method searches for estimates m, ai and bj of the parameters μ, αi and βj respectively, such that the sum of the squared deviations between the observations and the estimates of their expected values, Σi Σj Σk [yijk - (m + ai + bj)]², taken over k=1,...,nij, i=1,...,A and j=1,...,B, is minimal.

Good statistical packages such as SAS, SPSS, SYSTAT, BMDP and GENSTAT can provide these Least Squares estimates for the parameters. For the Normal Equations and their solution see Appendix 1.
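As a sketch of what such a package solves, setting the partial derivatives of the sum of squares with respect to m, ai and bj equal to zero gives the system below. It is written in the usual dot notation, where a dot replaces a subscript that has been summed over (for example ni. is the number of plots of dura Di and yi.. the total yield of its offspring). This is the standard textbook form; the layout used in Appendix 1 may differ.

```latex
\begin{aligned}
n_{..}\,m + \sum_{i=1}^{A} n_{i.}\,a_i + \sum_{j=1}^{B} n_{.j}\,b_j &= y_{...} \\
n_{i.}\,m + n_{i.}\,a_i + \sum_{j=1}^{B} n_{ij}\,b_j &= y_{i..}, \qquad i = 1,\dots,A \\
n_{.j}\,m + \sum_{i=1}^{A} n_{ij}\,a_i + n_{.j}\,b_j &= y_{.j.}, \qquad j = 1,\dots,B
\end{aligned}
```

These 1+A+B equations are linearly dependent: the A dura equations add up to the first equation, and so do the B pisifera equations. A solution is therefore fixed only after imposing two side conditions, for example aA = 0 and bB = 0. Differences such as a1 - a2 do not depend on the side conditions chosen, provided that the crossing scheme is connected.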

From the Least Squares estimates for the parameters one can calculate the Least Squares Mean for a dura Di, LSM(Di), as m + ai + Σj bj/B, and the Least Squares Mean for a pisifera Pj, LSM(Pj), as m + Σi ai/A + bj. We can rank all the dura and the pisifera according to their General Combining Ability on the basis of their Least Squares Means LSM(Di) and LSM(Pj), provided that the crossing scheme is connected. The term connected crossing scheme will be explained below.

If we make all possible crosses between A dura and B pisifera, hence we have C=A*B crosses, the crossing scheme is called a complete diallel scheme; if the number of crosses C is less than A*B the crossing scheme is called an incomplete diallel scheme.

In a complete diallel scheme with C=A*B crosses, where each cross has the same number of plots, nij = n for i=1,...,A and j=1,...,B , the Least Squares Mean for a dura or pisifera is just the average of the observations.

In this case LSM(Di) = yi../(n*B) and LSM(Pj) = y.j./(n*A). A complete diallel is always connected.

For an incomplete diallel scheme with C<A*B crosses one must use a good statistical package to get the Least Squares Means for the dura and the pisifera parents. The difficulty with an incomplete diallel scheme is that it can be disconnected and not all statistical packages notice this.

Furthermore, a good statistical package provides the estimate for the common variance σ² as the Mean Square Error (or Mean Square Residual) from the Analysis Of Variance (ANOVA) Table.

In order to compare the entire set of the A dura and the B pisifera on the basis of the GCA values, the parents must be crossed according to a so-called connected crossing scheme. A crossing scheme is called connected if for each dura pair (Dh,Di) of the A dura there is a chain of dura from dura Dh to dura Di in which every two adjacent links of the chain have been crossed with the same pisifera. Otherwise the crossing design is called disconnected. In the same vein, the crossing scheme is connected if for each pisifera pair (Pk,Pj) of the B pisifera there is a chain of pisifera from pisifera Pk to pisifera Pj in which every two adjacent links of the chain have been crossed with the same dura. Another way to check whether the crossing scheme is connected is to form a two-way table of the crosses with the A dura as rows and the B pisifera as columns. The crossing scheme is connected if we cannot split the table into separate tables by interchanging rows and columns.

Let us elucidate this by a little example with C=8 crosses made from A=4 dura and B=4 pisifera. Let the realized crosses be indicated by an asterisk (*) in the table.

             Pisifera
             P1   P2   P3   P4
Dura   D1    *    -    *    -
       D2    -    *    -    *
       D3    *    -    *    -
       D4    -    *    -    *


From the cross of dura D1 with pisifera P1, D1xP1, we can make a chain to the cross of dura D3 with P1, D3xP1; from D3xP1 we can go to the cross D3xP3, from D3xP3 to the cross D1xP3, and from there we come back to the cross D1xP1. This chain misses dura D2 and D4. Hence this crossing scheme is disconnected.

When we have rearranged the table as follows (interchange P3 with P2 and also interchange D3 with D2),

             Pisifera
             P1   P3   P2   P4
Dura   D1    *    *    -    -
       D3    *    *    -    -
       D2    -    -    *    *
       D4    -    -    *    *


we see directly that there are two disconnected sets of four crosses each. The first set contains the connected crosses D1xP1, D1xP3, D3xP1 and D3xP3; the second set contains the connected crosses D2xP2, D2xP4, D4xP2 and D4xP4. In such a disconnected crossing scheme no unbiased estimate can be made for the difference in effect between, for example, dura D1 and D2 or for the difference in effect between pisifera P3 and P4.

A more practical method of checking whether a crossing scheme is connected is to draw a chain from one cross to another following a horizontal or vertical direction only. If all the crosses are connected by one continuous chain then the crossing scheme is connected. In the above mentioned example the crossing scheme is connected if e.g. the following 8 crosses were made:

Pisifera

P1 P2 P3 P4

Dura D1

*

*

D2

*

*

D3

*

*

D4

*

*

A necessary condition for a connected design is that the number of crosses C is at least A+B-1. In the example above we have A=4 and B=4, so at least 4+4-1=7 crosses are needed for a connected design. Here we have 8 crosses, and the crossing scheme would still be connected if, for example, the cross D4xP1 had not been made. But we must realize that the condition C ≥ A+B-1 is not sufficient. We must always check for connectedness by making a continuous chain through the crosses of the crossing scheme.
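The chain check can also be automated. The following is a minimal Python sketch, not part of the original text: it treats every dura and every pisifera as a node and every realized cross as a link, and then counts the connected sets. The function names and the 7-cross chain are illustrative assumptions; the 8-cross scheme is the disconnected example discussed earlier in this section.

```python
from collections import defaultdict


def connected_sets(crosses):
    """Return the connected sets of parents in a crossing scheme.

    `crosses` is a list of (dura, pisifera) label pairs, one per realized cross.
    """
    neighbours = defaultdict(set)
    for dura, pisifera in crosses:
        neighbours[dura].add(pisifera)
        neighbours[pisifera].add(dura)

    seen, sets_found = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Walk along the chain of crosses that can be reached from this parent.
        stack, members = [start], set()
        while stack:
            parent = stack.pop()
            if parent in members:
                continue
            members.add(parent)
            stack.extend(neighbours[parent] - members)
        seen |= members
        sets_found.append(members)
    return sets_found


def is_connected(crosses):
    """A scheme is connected when all parents fall into one single set."""
    return len(connected_sets(crosses)) == 1


# The disconnected scheme of 8 crosses from the text.
disconnected = [("D1", "P1"), ("D1", "P3"), ("D2", "P2"), ("D2", "P4"),
                ("D3", "P1"), ("D3", "P3"), ("D4", "P2"), ("D4", "P4")]

# A hypothetical chain of A+B-1 = 7 crosses (illustrative only).
chain = [("D1", "P1"), ("D1", "P2"), ("D2", "P2"), ("D2", "P3"),
         ("D3", "P3"), ("D3", "P4"), ("D4", "P4")]

print(is_connected(disconnected))  # False, although C = 8 >= A+B-1 = 7
print(is_connected(chain))         # True with only 7 crosses
```

The first scheme shows that C ≥ A+B-1 alone does not guarantee connectedness: it has 8 crosses but falls apart into two sets of four parents each.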

EXAMPLE 2

Assume that C=9 progenies (2 plots each), from A=5 dura and B=3 pisifera, are tested in a completely randomized design. We assume an additive model for the genetic effects of the dura and pisifera parents. Yield records (kg/plot) were as follows:

              Pisifera
              P1        P2        P3        Total
Dura   D1     44, 48    -         -           92
       D2     45, 42    45, 43    -          175
       D3     33, 36    35, 32    36, 38     210
       D4     -         44, 42    46, 48     180
       D5     -         -         53, 55     108
Total         248       241       276        765

This crossing design is connected because there is one continuous chain which connects all the crosses. For the analysis of this Example 2 see Appendix 2.

A solution of the Normal Equations gives:

m=54; a1= -4.8333333; a2= -6.9 ;

a3= -16.7666667; a4= -7.2333333; a5= 0;

b1= -3.1666667; b2= -3.5333333; b3= 0 .

The estimate for the common variance σ² is 3.00303 based on 11 degrees of freedom.
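For readers without access to one of the packages mentioned, the fit can be reproduced with a short Python sketch that uses only the standard library. Two assumptions are made here and are not stated in the original text: the placement of the yields in the cells is inferred from the row and column totals of the Example 2 table, and the side conditions a5 = 0 and b3 = 0 are chosen to match the solution quoted above. All function and variable names are illustrative.

```python
from fractions import Fraction

# Example 2 yields (kg/plot), keyed by (dura, pisifera).
# ASSUMPTION: cell placement inferred from the row and column totals.
yields = {
    (1, 1): [44, 48],
    (2, 1): [45, 42], (2, 2): [45, 43],
    (3, 1): [33, 36], (3, 2): [35, 32], (3, 3): [36, 38],
    (4, 2): [44, 42], (4, 3): [46, 48],
    (5, 3): [53, 55],
}
A, B = 5, 3


def design_row(i, j):
    """Indicator row [m, a1..a4, b1, b2] for a plot of DixPj (a5 = b3 = 0)."""
    row = [0] * (1 + (A - 1) + (B - 1))
    row[0] = 1
    if i < A:
        row[i] = 1
    if j < B:
        row[(A - 1) + j] = 1
    return row


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination on exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in matrix[r]] + [Fraction(rhs[r])] for r in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[r][n] for r in range(n)]


# One (design row, yield) pair per plot.
plots = [(design_row(i, j), y) for (i, j), ys in yields.items() for y in ys]
p = len(plots[0][0])

# Normal Equations (X'X) beta = X'y.
xtx = [[sum(x[r] * x[c] for x, _ in plots) for c in range(p)] for r in range(p)]
xty = [sum(x[r] * y for x, y in plots) for r in range(p)]
beta = solve(xtx, xty)

# Residual sum of squares, degrees of freedom and Mean Square Error.
fitted = [sum(b * v for b, v in zip(beta, x)) for x, _ in plots]
sse = sum((y - f) ** 2 for (_, y), f in zip(plots, fitted))
df = len(plots) - p
mse = sse / df

print([round(float(b), 5) for b in beta])  # m, a1..a4, b1, b2
print(df, round(float(mse), 5))            # 11 3.00303
```

Exact fractions are used instead of floating point so that the sketch needs no external library; a numerical least squares routine would give the same estimates.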

A breeder is not interested in testing the hypothesis that all dura effects (or all pisifera effects) are the same, but is much more interested in how much pairs of dura or pisifera differ in their GCA values.

The difference between two dura effects, for example between D1 and D2, is α1 - α2 and is estimated by a1 - a2 = 2.06667 with an estimated standard error of 1.61327.

The difference between two pisifera effects, for example between P2 and P3, is β2 - β3 and is estimated by b2 - b3 = -3.53333 with an estimated standard error of 1.183813.

It is often reasonable to assume that the error terms (and hence the yields) are Normally distributed, so we can construct, for example, a 95% confidence interval for differences between the General Combining Abilities of the parents. The 5% two-sided significance point of a t-distribution with 11 degrees of freedom is 2.201.

Hence the 95% confidence limits for α1 - α2 are 2.06667 ± 1.61327*2.201 = 2.06667 ± 3.55081 and the 95% confidence interval is -1.48414 < α1 - α2 < 5.61748.

In the same way the 95% confidence limits for β2 - β3 are calculated as -3.53333 ± 1.183813*2.201 = -3.53333 ± 2.60557 and the 95% confidence interval is -6.13890 < β2 - β3 < -0.92776.

Note that if a 95% confidence interval for the difference of two parental effects contains zero, then this means that the null hypothesis "These two parental effects are equal" is not rejected with a significance level 5%. If the 95% confidence interval for the difference of two parental effects does not contain zero, this null-hypothesis of equal parental effects is rejected with a significance level of 5%.
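The interval arithmetic and the accompanying test decision can be written down in a few lines of Python. This is only an illustrative sketch: the function name is an assumption, and the estimate, standard error and t-value are simply the figures quoted above for the dura difference α1 - α2.

```python
def confidence_interval(estimate, std_error, t_value):
    """Two-sided confidence limits: estimate -/+ t_value * std_error."""
    half_width = t_value * std_error
    return estimate - half_width, estimate + half_width


# 5% two-sided point of the t-distribution with 11 degrees of freedom (from the text).
T_11_DF = 2.201

# Difference a1 - a2 between dura D1 and D2, with its standard error (from the text).
low, high = confidence_interval(2.06667, 1.61327, T_11_DF)
print(round(low, 5), round(high, 5))  # -1.48414 5.61748

# Equal parental effects are rejected at the 5% level only if zero lies outside.
rejected = not (low < 0 < high)
print(rejected)  # False
```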

To rank the parents according to their GCA values, we can for example use the Least Squares Mean (LSM). The Least Squares Mean of, for example, dura D1 is estimated by

m + a1 + (b1 + b2 + b3)/3 = 54 + (-4.83333) + [(-3.16667) + (-3.53333) + 0]/3 = 46.93333, etc.

Dura    LSM        rank      Pisifera    LSM        rank
D1      46.9333    2         P1          43.6867    2
D2      44.8667    3         P2          43.32      3
D3      35         5         P3          46.8533    1
D4      44.5333    4
D5      51.7667    1
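The Least Squares Means and ranks of Example 2 can be recomputed from the quoted solution of the Normal Equations with the following Python sketch. The dictionary layout and the function name are illustrative assumptions; the numbers are those given in the text.

```python
# Solution of the Normal Equations quoted in the text (a5 = b3 = 0).
m = 54.0
a = {"D1": -4.8333333, "D2": -6.9, "D3": -16.7666667, "D4": -7.2333333, "D5": 0.0}
b = {"P1": -3.1666667, "P2": -3.5333333, "P3": 0.0}

mean_a = sum(a.values()) / len(a)  # average dura effect
mean_b = sum(b.values()) / len(b)  # average pisifera effect

# LSM(Di) = m + ai + average of the bj; LSM(Pj) = m + average of the ai + bj.
lsm_dura = {name: m + effect + mean_b for name, effect in a.items()}
lsm_pisifera = {name: m + mean_a + effect for name, effect in b.items()}


def rank_descending(values):
    """Rank 1 goes to the largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


dura_rank = rank_descending(lsm_dura)
pisifera_rank = rank_descending(lsm_pisifera)

for name in a:
    print(name, round(lsm_dura[name], 4), dura_rank[name])
for name in b:
    print(name, round(lsm_pisifera[name], 4), pisifera_rank[name])
```

Because every dura LSM differs from its effect ai only by the same constant m + mean_b, ranking on the LSM and ranking on the solution ai necessarily agree; the same holds for the pisifera.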

We can get the same ranking of parents according to their GCA values, if we use a solution of the Normal Equations for these parental effects.

        Normal Equations                  Normal Equations
Dura    solution      rank    Pisifera    solution      rank
D1       -4.83333     2       P1          -3.16667      2
D2       -6.9         3       P2          -3.53333      3
D3      -16.7667      5       P3           0            1
D4       -7.23333     4
D5        0           1

2.2.2 Specific combining ability (SCA)

Sometimes the additive model of the genetic effects of the parents does not fully explain the performance of their offspring. This is attributable to an interaction between the genetic effects of the parents. In other words, besides the additive genetic effects (General Combining Ability) of the parents there is also a specific interaction effect due to the specific combination of the parents. In quantitative genetics this specific interaction effect is called Specific Combining Ability (SCA). For this interaction model the expected yield of the tenera offspring of the cross DixPj, E(yij), can be written as the sum of a general constant, μ*, the GCA effect αi* of the dura mother Di, the GCA effect βj* of the pisifera father Pj and the SCA effect (αβ)ij* of the realized cross:

E(yij) = μ* + αi* + βj* + (αβ)ij* = μij

When we have a set of C crosses, derived from A dura and B pisifera, where C ≤ A*B, the C parameters μij can be estimated using the Least Squares Method.

Assume that there are nij plots available for a certain tenera cross DixPj; if no cross has been made then nij = 0. We consider here the case of a completely randomized design (CRD). In section 2.5 we will consider the case of an incomplete block design.

The actual yield yijk of the k-th plot of a tenera offspring of the cross DixPj is yijk = E(yijk) + eijk, where eijk is the effect of the environment or error on this k-th plot. These errors eijk are such that the expectation E(eijk) = 0 and the variance Var(eijk) = σ². They can be assumed to be uncorrelated with one another because a randomization procedure, such as in a completely randomized design (CRD), has been used to allot the plots of the field to the crosses.

The Least Squares Method searches for estimates mij of the parameters μij such that the sum of the squared deviations between the observations and the estimates of their expected values, Σi Σj Σk [yijk - mij]², taken over k=1,...,nij, i=1,...,A and j=1,...,B, is minimal.

The Least Squares estimates mij for the parameters μij are found as solutions of the Normal Equations, which are in this case very simple.

Let us denote the sum of the observations of the nij plots of the cross DixPj by yij., so that yij. = Σk yijk. The Normal Equations are then:

nij * mij = yij. (4)

for i=1,...,A and j=1,...,B . There are only C Normal Equations present, because if a certain offspring DixPj has not been realized, then nij = 0 for such a progeny and we have no observations of this progeny.

The parameter estimates are then mij = yij. / nij, the progeny means of the crosses DixPj .

To estimate the Specific Combining Abilities of these progenies we must now calculate the estimates m for μ, ai for αi and bj for βj according to the additive model

E(yijk) = μ + αi + βj

as has been explained in section 2.2.1.

The estimate for the Specific Combining Ability (αβ)ij* is

(ab)ij* = mij - (m + ai + bj).

See Appendix 3 for the analysis and Example 3.
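The calculation itself is short, as the following Python sketch shows. Since the data of Example 3 are not reproduced in this section, the sketch reuses the yields and the additive solution of Example 2 purely as an illustration. The placement of the yields in the cells is an assumption inferred from the row and column totals of that example, and the variable names are illustrative.

```python
# Example 2 yields (kg/plot), keyed by (dura, pisifera).
# ASSUMPTION: cell placement inferred from the row and column totals.
yields = {
    (1, 1): [44, 48],
    (2, 1): [45, 42], (2, 2): [45, 43],
    (3, 1): [33, 36], (3, 2): [35, 32], (3, 3): [36, 38],
    (4, 2): [44, 42], (4, 3): [46, 48],
    (5, 3): [53, 55],
}

# Additive-model solution quoted in section 2.2.1 (a5 = b3 = 0).
m = 54.0
a = {1: -4.8333333, 2: -6.9, 3: -16.7666667, 4: -7.2333333, 5: 0.0}
b = {1: -3.1666667, 2: -3.5333333, 3: 0.0}

sca = {}
for (i, j), plots in yields.items():
    progeny_mean = sum(plots) / len(plots)          # mij = yij. / nij
    sca[(i, j)] = progeny_mean - (m + a[i] + b[j])  # (ab)ij* = mij - (m + ai + bj)

for (i, j), value in sorted(sca.items()):
    print(f"D{i}xP{j}: {value:+.4f}")
```

Only realized crosses (nij > 0) receive an SCA estimate, because a progeny mean mij exists only for those crosses.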

REMARK

For many characteristics of oil palms the Specific Combining Ability is not so large. Hence for a first screening of parents the additive model to estimate the General Combining Abilities is a good tool.