Should forest regeneration studies have more replications?

When it comes to testing for differences in seedling survival, researchers sometimes make a Type II statistical error (i.e. failure to reject a false null hypothesis) due to the inherent variability associated with survival in tree planting studies. For example, in one trial (with five replications) first-year survival of seedlings planted in October (42%) was not significantly different (alpha = 0.05) from that of seedlings planted in December (69%). Did planting in a dry October truly have no effect on survival? Authors who make a Type II error might not be aware that as seedling survival decreases (down to an overall average of 50% survival), statistical power declines. As a result, declaring an 8% difference “significant” is very difficult when survival averages 90% or less. We estimate that about half of regeneration trials (average survival of pines <90%) cannot declare a 12% difference as statistically significant (alpha = 0.05). When researchers realize their tree planting trials have low statistical power, they should consider using more replications. Other ways to increase power include: (1) use a one-tailed test, (2) use a potentially more powerful contrast test (instead of an overall treatment F-test) and (3) conduct survival trials under a roof.

Although researchers often fail to reject the null hypothesis, they should never accept a null hypothesis. Even so, often researchers conclude that various treatments did not affect seedling survival. This may be true when the difference between means is very small. However, in several studies, an examined factor was not significant (α = 0.05) but the treatment increased survival of pine seedlings by at least 20% (Table 1). Could this type of increase be both biologically significant and statistically insignificant? This certainly can happen when the study design has low statistical power.
There is always a chance that some of our conclusions about treatment effects are wrong (Park 2008). Simply due to chance, researchers might conclude a treatment worked, but in reality it didn't (i.e. Type I error). More commonly we say the treatment had no effect on seedling survival, but there really was a treatment effect (i.e. Type II error). Examples of possible Type II errors are provided in Table 1. Because of a combination of limited resources and tradition, the design of most seedling survival trials has insufficient replication to detect a "true" 8% difference in survival. This is because most researchers in the southern United States use four or fewer replications. Installing additional replications will cost more, but it will also improve the power of survival tests (i.e. 1 - beta).
Table 1. Examples of forest regeneration studies where the listed treatment increased seedling survival by more than 8% but the increase was not statistically significant (α = 0.05). A Type II error is likely (see comments) but the power of the test was too low to declare the increase as statistically significant.
We have noticed that statistically, it may be relatively easy to declare a 30 cm increase in height as significant, but hard to declare an 8% difference in seedling survival as statistically significant. This is partly because when average survival becomes less than 90%, the standard errors increase. There are various ways to increase the power of our tests (Table 2), but many choose to use fewer than 5 replications and some analyze trials with a simple multiple-range test (see various ...).
We and others have pointed out the importance of replications in both nursery (VanderSchaaf et al. 2003) and field trials (Zedeker et al. 1993; Foster 2001). In a previous paper, we examined the decline in power as plantations get older and increase in biomass (South and VanderSchaaf 2006). In this paper, we examine the impact of replication on the (a priori) power of establishment trials that report survival. In several cases, the power of the statistical test was so low that it was not able to declare a 20% difference as significant. Perhaps one reason authors don't typically report statistical power (Peterman 1990) is that the statistical power is low or that they are only concerned about making Type I statistical errors. This paper examines various ways to increase the power of regeneration trials.

Simulation studies
Computer simulations were created to examine the effects of replication on standard error, standard deviation, LSD and power. Simulation #1 assumed a completely randomized design (CRD) with two treatments. Plot size for experimental units did not vary (i.e. the experimental area doubled when the number of replications doubled). Simulation #2 involved a randomized complete block design (RCB) with two treatments and a total of 200 seedlings per treatment. The numbers of replications simulated were 4, 8, 10 and 20, with 50, 25, 20 and 10 seedlings planted per experimental unit, respectively.
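The first simulation can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes plot survival follows a binomial distribution, uses the 85% and 90% treatment means given in the Figure 2 caption, and assumes 50 seedlings per plot (a hypothetical plot size, since none is stated for Simulation #1). All function and variable names are illustrative, and the t critical values are hard-coded from standard tables.

```python
import math
import random
import statistics

# Two-tailed t critical values (alpha = 0.05) for the error degrees of
# freedom of a two-treatment CRD, df = 2 * (reps - 1).
T_CRIT_05 = {6: 2.447, 14: 2.145, 18: 2.101, 38: 2.024}


def plot_survival(rng, true_rate, seedlings):
    """Percent survival of one plot, drawn as a binomial count."""
    alive = sum(rng.random() < true_rate for _ in range(seedlings))
    return 100.0 * alive / seedlings


def simulate_crd(reps, rates=(0.85, 0.90), seedlings=50, runs=100, seed=1):
    """Average pooled SD, standard error of a mean, and LSD over many runs."""
    rng = random.Random(seed)
    t_crit = T_CRIT_05[2 * (reps - 1)]
    sds, ses, lsds = [], [], []
    for _ in range(runs):
        plots = [[plot_survival(rng, p, seedlings) for _ in range(reps)]
                 for p in rates]
        # With equal replication, the pooled within-treatment variance
        # (the error mean square) is the mean of the two variances.
        mse = statistics.mean([statistics.variance(g) for g in plots])
        sds.append(math.sqrt(mse))
        ses.append(math.sqrt(mse / reps))
        lsds.append(t_crit * math.sqrt(2.0 * mse / reps))
    return statistics.mean(sds), statistics.mean(ses), statistics.mean(lsds)


results = {r: simulate_crd(r) for r in (4, 8, 10, 20)}
for r, (sd, se, lsd) in results.items():
    print(f"reps={r:2d}  SD={sd:5.2f}  SE={se:5.2f}  LSD={lsd:5.2f}")
```

Because plot size is held constant, adding replications adds seedlings, so the standard error and LSD should fall as replications increase while the standard deviation stays roughly level. Real trials also carry site variation, which this binomial-only sketch leaves out.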

Literature survey
We selected 50 papers that were published in the proceedings of the Biennial Southern Silvicultural Research Conference (papers are available at www.cpe.vt.edu/ssrc/bssrc-proceedings.html). Only papers that involved pine seedlings and detected a significant difference (α = 0.05) are included in Table 3. Papers that did not report either F-test or mean comparison test results (e.g. Hassan and Silva 1999) were not included in the survey. In cases where multiple pairs of means were declared different, the pair with the smallest difference was selected.
Various mean comparison tests were included and in some cases, the type of test used was not reported. In theory, most values listed in Table 3 are slightly greater than an LSD value. The difference between the two means (B-A) was plotted on the y-axis and the lower mean (A) was plotted on the x-axis (Fig. 1).
Figure 1. The relationship between the smallest significant difference (B-A from Table 3) and lower survival value (A from Table 3). Detecting a 10% difference in survival is likely when average seedling survival is greater than 90% but unlikely when average survival is less than 70%. For this dataset, the LSD values appear to increase as the survival of the lowest treatment mean decreases.

Results
Simulations that varied replications revealed contrasting results. The standard error decreased when doubling replications involved doubling the area planted (Fig. 2) but the standard error increased when doubling replications resulted in smaller (i.e. fewer seedlings per plot) experimental units (Fig. 3).
The BSSRC survey suggests that an 8% difference in survival may be declared significant in less than 3 out of 10 trials. More than 1 out of 5 trials cannot detect a 15% difference in survival (Fig. 1). Due to low statistical power (1-beta), many trials are not able to detect a significant difference in survival, even when the treatment caused a "real" increase in survival.
The survey data also indicate that statistical power declines as seedling mortality increases. When survival (of the lowest mean) is more than 70%, an 8% increase in survival might be declared significant perhaps 4 out of 10 times. However, the chance is near zero when survival (of the lowest mean) is less than 70% (Fig. 1). The results from a greenhouse trial (Fig. 4) are similar to those obtained from the survey results.
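One reason power falls as survival drops toward 50% is that the sampling variance of a proportion, p(1 - p)/n, grows as p moves away from 100%. The sketch below illustrates this under a binomial-sampling assumption, using 30 seedlings per plot (the plot size given in the Figure 4 caption, used here only for illustration). The function name is hypothetical, and real plots usually vary more than this because of site effects.

```python
import math


def binomial_sd_percent(survival_rate, seedlings):
    """SD (in percentage points) of plot survival under binomial sampling."""
    variance = survival_rate * (1.0 - survival_rate) / seedlings
    return 100.0 * math.sqrt(variance)


SEEDLINGS_PER_PLOT = 30  # illustrative plot size (assumption)

sd_by_rate = {p: binomial_sd_percent(p, SEEDLINGS_PER_PLOT)
              for p in (0.95, 0.90, 0.80, 0.70, 0.50)}
for rate, sd in sd_by_rate.items():
    print(f"survival={rate:.0%}  plot SD={sd:4.1f} percentage points")
```

Under these assumptions the plot-to-plot standard deviation more than doubles between 95% and 50% survival, so the same number of replications yields a larger LSD at lower survival.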

Discussion
Prior to installing a seedling survival trial, researchers may ask how to design a trial so it might have a chance of detecting an 8% difference in survival. The following discusses several options one might consider.

Increasing Power by Increasing Study Area
Sometimes doubling the number of replications doubles the study area and hence the number of seedlings planted. Increasing the number of replications (from 4 to 8) will increase power and will decrease the LSD value (Fig. 1). The effect of simulated replications is shown in Figure 2 and an actual case is illustrated in Figure 4. As replication increases, the standard error and LSD values decrease. In contrast, the standard deviations and coefficient of variation increase slightly (from two to seven replications) because more variability is entered into the system as the total number of experimental units increases. In Figure 2, a survival difference of 16% was detected with four replications while an 8% difference was detected with ten replications.

Reducing the LSD by Reducing the Size of the Experimental Unit
For short-term trials, it is possible to increase the number of replications without increasing the area planted. For example, a study with four blocks and 50 seedlings per experimental unit will require the same number of seedlings as a study with eight blocks and 25 seedlings per experimental unit. The trial with eight replications will have about the same power as one with four replications, but the LSD will likely be smaller (Fig. 3). When appropriate for short-term study objectives, we strongly recommend using 10 seedlings per experimental unit with a minimum of 10 replications.
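The trade-off between plot size and number of blocks can be checked with a short calculation. This is a rough sketch under assumptions not stated in the paper: plot survival varies only through binomial sampling (no block-by-treatment or site effects), the two treatment means are the 70% and 80% used for Figure 3, and the two-treatment RCB is analyzed as paired differences with blocks - 1 error degrees of freedom. The function name is hypothetical and the t critical values are hard-coded from standard tables.

```python
import math

# Two-tailed t critical values (alpha = 0.05) for error df = blocks - 1,
# the error df of an RCB with two treatments.
T_CRIT_05 = {3: 3.182, 7: 2.365, 9: 2.262, 19: 2.093}


def expected_lsd(blocks, seedlings, rates=(0.70, 0.80)):
    """Approximate LSD and SE of the difference, in percentage points."""
    # Variance of the within-block difference between the two plot values,
    # assuming independent binomial sampling in each plot.
    var_diff = sum(p * (1.0 - p) for p in rates) / seedlings
    se_diff = 100.0 * math.sqrt(var_diff / blocks)
    return T_CRIT_05[blocks - 1] * se_diff, se_diff


designs = {4: 50, 8: 25, 10: 20, 20: 10}  # blocks -> seedlings per plot
lsd_by_blocks = {}
for blocks, seedlings in designs.items():
    lsd, se_diff = expected_lsd(blocks, seedlings)
    lsd_by_blocks[blocks] = lsd
    print(f"blocks={blocks:2d}  seedlings/plot={seedlings:2d}  "
          f"SE(diff)={se_diff:4.2f}  LSD={lsd:5.2f}")
```

Under these assumptions the standard error of the difference is identical for all four designs, because the total number of seedlings per treatment is fixed at 200. The LSD shrinks only because the t critical value falls as error degrees of freedom rise, and the values for 4 and 20 blocks come out near the 14% and 9% quoted for Figure 3.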

Do Not Use Pseudoreplication
Pseudoreplication occurs when treatments are not replicated but an ANOVA (analysis of variance) is carried out by assuming sub-samples (or in some cases individual trees) are the same as replication. A major factor that leads to pseudoreplication is an inability (or reluctance) of the author to define the correct experimental unit. Some journal editors do not require ANOVA tables be included in manuscripts. Therefore, one might never know if the error term involved 13 or 2841 degrees of freedom. Reviewers who would normally reject non-replicated trials can be fooled into thinking that a statistical analysis (that involves pseudoreplication) is valid. Although some forestry research involves pseudoreplication (Hay and Rennie 1983; Dong and Burdett 1986; Smith 1989; Kamaluddin et al. 2005; de Souza et al. 2016), it should not be used as a method to reject a null hypothesis.

Higher Alpha Values Increase Power
A method some silviculturists choose to increase the power of a tree planting test is to raise the alpha value (Table 4) and increase the probability of a Type I error. Some wisely use a 0.1 level to reduce the Type II error (Walker et al. 1981; Walker et al. 1985; Amishev and Fox 2006; Cram et al. 2007; Dean et al. 2013; Scott and Stagg 2013; Curtis et al. 2015). In some cases an alpha value of 0.15 has been used (Xydias 1983; Haywood et al. 1998). Regardless of the alpha value selected, authors should list the actual p-value for treatment effects. In some trials one-tailed F-tests (or one-tailed t-tests) can be employed and this may increase power. For example, in some cases a researcher is only interested in knowing if a treatment increases seedling survival. There may be no real need to know if a decrease in survival is "statistically significant." A one-tailed F-test at a 0.05 alpha level produces results equivalent to those of a two-tailed test at the 0.1 alpha level. This knowledge allows researchers to increase statistical power without increasing the frequency of Type I errors (since no conclusions are made regarding treatments that lower seedling survival). We strongly recommend that a priori, one-tailed tests be used for survival trials where appropriate.
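The effect of the alpha level and of one-tailed testing can be illustrated with a normal approximation. This is a simplified sketch, not the paper's method: it uses z rather than t critical values (which overstates power when error degrees of freedom are few), and it borrows the standard deviation of 7 and the 8% difference from the Figure 2 power line, with four replications chosen for illustration. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_z(alpha, tails):
    """Critical value of a z-test for the given alpha and number of tails."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / tails)


def approx_power(diff, sd, reps, alpha, tails):
    """Approximate chance of declaring a true increase 'diff' significant."""
    se_diff = sd * math.sqrt(2.0 / reps)
    return STD_NORMAL.cdf(diff / se_diff - critical_z(alpha, tails))


# Assumed inputs: SD of 7, an 8% true difference, four replications.
for label, alpha, tails in [("two-tailed 0.05", 0.05, 2),
                            ("two-tailed 0.10", 0.10, 2),
                            ("one-tailed 0.05", 0.05, 1),
                            ("two-tailed 0.15", 0.15, 2)]:
    power = approx_power(8.0, 7.0, 4, alpha, tails)
    print(f"{label}: critical z={critical_z(alpha, tails):.3f}  power={power:.2f}")
```

The one-tailed 0.05 test and the two-tailed 0.10 test share the same critical value, so they give the same approximate power for detecting an increase in survival.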

Pre-planned Contrast Tests Increase Power
In most of the studies in Table 3, means were compared using a test such as Duncan's multiple range test, Tukey's test or Newman-Keuls test. A few used a preplanned contrast procedure (Mize and Schultz 1985; Warren 1986) to test for treatment effects. An example of this method is illustrated in a study with 11 nursery treatments and four replications (Blake and South 1991). When survival ranged from 89% to 98%, an F-test (p = 0.68) suggested no treatment effect. However, a pre-planned contrast test (that compared only top-pruned treatments) revealed a significant (α = 0.05) treatment effect.

LSD as an Indication of Statistical Power
Although reporting statistical power is important (Peterman 1990), most forest regeneration papers do not provide any indication of the power of the test (i.e. no LSD values and no beta values). However, when comparing studies with the same experimental design, LSD values do provide the reader with some indication of the statistical power (Nemec 1991). In one study (Jackson et al. 2012), LSD values were reported when comparing six treatment means. This provided an opportunity to compare published LSD values (Fig. 4) with Figure 1. As expected, LSD values for pine seedlings (outplanted in sand pits) increased as the percent survival (of the lowest mean) decreased. Increasing the number of replications decreased the LSD values (Fig. 4). When the lowest mean was 80%, the LSD for four replications was about 18% and replicating 12 times reduced this to 14%. The predicted LSD values (Fig. 2) were similar in magnitude to those in Figure 4. We recommend researchers routinely report LSD values when reporting survival means. This will provide the reader with some idea of the power of the test. However, in certain cases, a lower LSD value is not associated with higher statistical power (Fig. 3).
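A reported LSD can be turned into a rough power estimate. The sketch below is a simplification introduced here, not a procedure from the paper: it treats the LSD as a z critical value times the standard error of the difference, ignoring the extra uncertainty that comes with few error degrees of freedom. Under that approximation, a true difference equal to the LSD would be declared significant only about half the time. The function name is hypothetical; the LSDs of 18% and 14% are the values quoted above for a lowest mean of 80%.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_from_lsd(true_diff, lsd, alpha=0.05):
    """Approximate two-tailed power when only the LSD is reported."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation: LSD = z_crit * SE(diff), so SE(diff) = LSD / z_crit.
    se_diff = lsd / z_crit
    return STD_NORMAL.cdf(true_diff / se_diff - z_crit)


for lsd in (18.0, 14.0):
    for true_diff in (8.0, lsd):
        print(f"LSD={lsd:4.1f}  true diff={true_diff:4.1f}  "
              f"power~{power_from_lsd(true_diff, lsd):.2f}")
```

Read this way, an LSD well above the difference of interest signals that the trial had little chance of declaring that difference significant.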

Roofed Survival Trials
When the objective is to test nursery practices or tree planting treatments on first-year survival, then we highly recommend the planting site include a roof that protects seedlings from rainfall. Much time and effort has been wasted designing and installing outdoor studies that end up with adequate rainfall and high survival. For example, in one study (South et al. 2012), treatments that received rainfall resulted in an LSD of 5.3 but the difference in survival was only 1% (95% vs 96%). However, when these seedlings were planted under a roof and exposed to a four-month drought, the treatment effect was significant (P = 0.007; LSD = 14.6; means were 28% and 74%). Approximately one-third of the studies listed in Table 3 could have been established in a roofed stress house.

Use Figure 1 to Evaluate Your Study
After analyzing a seedling survival trial, researchers may want to evaluate their study design. A simple way is to plot the LSD value (α = 0.05) on the Y-axis of Figure 1 with the lowest treatment mean on the X-axis. If the point is below the slope-line, your study is above average. However, if the point is above the line, you might need to increase the number of replications in future trials.

Acknowledgements
Authors wish to gratefully acknowledge Dr. Paul Jackson who provided the data for Figure 4

Figure 2. Simulation #1 used a completely randomized design with two treatment means (85% and 90%). Each data point in the graph represents the average value from 100 simulations. The size of each experimental unit was the same for each data point. The average values for the standard error, standard deviation (SD) and least significant difference (LSD) vary with the number of replications. A 15% LSD (α = 0.05) can be expected using four replications while a 10% LSD requires seven replications. The star represents an LSD value from a spacing study that contained 22 replications. The standard error of the mean (squares) and standard deviations (diamonds) are also plotted. An a priori power line (1-beta) is plotted assuming a constant standard deviation of 7, α = 0.05 and an 8% survival difference between two means.

Figure 3. Simulation #2 involved a randomized complete block design with two treatment means (70% and 80%). Each data point in the graph represents the average value from 10 simulations. The graph shows the relationship between the least significant difference (LSD) and number of replications where the size of each experimental unit decreased as the number of replications increased. This simulation involved planting 200 seedlings per treatment, regardless of the number of replications (i.e. a total of 400 seedlings planted). A 14% LSD (α = 0.05) was achieved using four replications (50 seedlings per plot) while a 9% LSD requires twenty replications (10 seedlings per plot).

Table 2. The probability of correctly rejecting a null hypothesis (when the null hypothesis is false) is influenced by various factors.

Table 3. Examples of survival means (A;B) reported from 50 selected papers published in various volumes of the proceedings of the Biennial Southern Silvicultural Research Conferences (available at www.cpe.vt.edu/ssrc/bssrc-proceedings.html; Year = year of meeting; Page = location of means A and B). The mean separation test detected a significant (α = 0.05) difference between the lower mean (A) and the greater mean (B). In most of these trials, B-A is slightly larger than a least significant difference (LSD) value. RCB = Randomized complete block; CRD = completely randomized design.
Figure 4. The relationship between the least significant difference (LSD) and seedling survival (for the lowest reported mean out of six treatment means in a randomized complete block design), for stored pine seedlings reported by Jackson et al. (2012; table 5). As survival decreases, the LSD values increase. Each square (dashed line) represents four replications and each dot (solid line) represents 12 replications. There were 30 seedlings per experimental unit.
and to Stanley Zarnoch and Steve Grossnickle for providing helpful reviews of the manuscript. This paper was presented at a meeting of the Biennial Southern Silvicultural Research Conference on March 14, 2017.

References
Amishev DV, Fox TR (2006) Impact of weed control and fertilization on growth of four species of pine in the Virginia Piedmont. In: Proceedings of the 13th biennial southern silvicultural research conference. US Forest Service. General Technical Report SRS-92. p. 121-123.
Blake JI, South DB (1991) Effects of plant growth regulators on loblolly pine seedling development and field performance. In: Proceedings of the 6th biennial southern silvicultural research conference. US Forest Service. General Technical Report SE-70. p. 100-107.
Boyer WD (1989) Response of planted longleaf pine bare-root and container stock to site preparation and release: fifth-year results. In: Proceedings of the 5th biennial southern silvicultural research conference. US Forest Service. General Technical Report SE-74. p. 165-168.
Clabo DC, Clatterbuck WK (2015) Sprouting capability of shortleaf pine seedlings following clipping and burning: first-year results. In: Proceedings of the 17th biennial southern silvicultural research conference. US Forest Service. General Technical Report SRS-203. p. 137-142.
Cram MM, Enebak SA, Fraedrich SW, Dwinell SW, Zarnoch SJ (2007) Evaluation of fumigants, EPTC herbicide, and Paenibacillus macerans in the production of loblolly pine seedlings. Forest Sci 53(1): 73-83.
Curtis CM, Aust WM, Seiler JR, Strahm BD (2015) Survival and growth of restored Piedmont riparian forests as affected by site preparation, planting stock and planting aids. In: Proceedings of the 17th biennial southern silvicultural research conference. US Forest Service. General Technical Report SRS-203. p. 431-436.
Dean TJ, Scott A, Holley AG (2013) Before and after comparisons of tree height in successive loblolly pine plantations with intervening machine, whole-tree harvesting. In: Proceedings of the 15th biennial southern silvicultural research conference. US Forest Service. General Technical Report SRS-175. p. 9-12.
de Souza DPL, Gallagher T, Mitchell D, McDonald T, Smidt M (2016) Determining the effects of felling method and season of year on the regeneration of short rotation coppice. International Journal of Forest Engineering 27(1): 53-65. https://doi.org/10.1080/14942119.2015.1135616
DeWit JN, Terry TA (1983) Site preparation effects on early loblolly pine growth, hardwood competition, and soil physical properties. In: Proceedings of the 2nd biennial southern silvicultural research conference. US Forest Service. General Technical Report SE-24. p. 40-47.
Dong H, Burdett AN (1986) Chemical root pruning of Chinese pine seedlings raised in cupric sulfide impregnated paper containers. New Forest 1(1): 67-73. https://doi.org/10.1007/BF00028122
Ezell AW, Yeiser JL, Lauer DK, Quicke HE (2013) Effect of application timing on efficiency of site preparation treatments using Chopper® Gen2™. In: Proceedings of the 15th biennial southern silvicultural research conference. US Forest Service. General Technical Report SRS-175. p. 219-221.
Foster JR (2001) Statistical power in forest monitoring. Forest Ecology and Management 151: 211-222. https://doi.org/10.1016/S0378-1127(01)00591-6