This function analyzes outcome ratios in the ERA dataset for each combination of grouping variables specified by column names in the Aggregate.By parameter. We suggest applying the PrepareERA function to data before use with this function.

ERAAnalyze(Data, rmOut = T, Aggregate.By, ROUND = 5, Fast = F)

Arguments

Data

A prepared ERA dataset (see the PrepareERA function)

rmOut

Logical T/F. If TRUE extreme outliers are detected and removed for each combination of grouping variables.

Aggregate.By

Column names for grouping variables. Statistics will be compiled for each combination of these variables.

ROUND

Integer value for the number of decimal places numeric columns in the output dataset should be rounded to.

Fast

Logical T/F. If FALSE, lmer and lm models are used to estimate means, errors and significance where sufficient data exist.

Value

A data.table of response ratios and percentage change values, each row representing a combination of the grouping variables specified in the Aggregate.By parameter. RR = outcome response ratio, $log(MeanT/MeanC)$; PC = outcome proportional change, $MeanT/MeanC$.
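For instance, both measures and their percent-change forms follow directly from the treatment and control means (a minimal sketch with illustrative values):

```r
MeanT <- 1500  # illustrative treatment mean outcome (e.g. kg/ha)
MeanC <- 1200  # illustrative control mean outcome

PC <- MeanT / MeanC  # proportional change = 1.25
RR <- log(PC)        # response ratio = log(1.25)

100 * PC - 100       # percent change from PC = 25
100 * exp(RR) - 100  # percent change recovered from RR = 25
```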

Output columns when Fast is TRUE:

  • Observations = no. rows of data

  • Studies = no. studies (publications)

  • Sites = no. of geographic locations

  • RR.Shapiro.Sig = P-value from a Shapiro-Wilk test of RR

  • RR = weighted mean of RR

  • RR.median = weighted median of RR

  • RR.se = weighted standard error of RR

  • RR.CIlow = lower 95% confidence interval of RR

  • RR.CIhigh = upper 95% confidence interval of RR

  • RR.var = weighted variance of RR

  • RR.Quantiles05 = weighted quantiles of the RR

  • PC.Shapiro.Sig = P-value from a Shapiro-Wilk test of PC

  • PC = weighted mean of PC

  • PC.median = weighted median of PC

  • PC.se = weighted standard error of PC

  • PC.CIlow = lower 95% confidence interval of PC

  • PC.CIhigh = upper 95% confidence interval of PC

  • PC.var = weighted variance of PC

  • PC.Quantiles05 = weighted quantiles of the PC

  • PC.pc = percent change based on PC (100 x PC - 100)

  • PC.pc.se.low = lower standard error confidence interval of % change based on PC

  • PC.pc.se.high = upper standard error confidence interval of % change based on PC

  • RR.pc = % change based on RR (100 x exp(RR) - 100)

  • RR.pc.se.low = lower standard error confidence interval of % change based on RR

  • RR.pc.se.high = upper standard error confidence interval of % change based on RR

  • RR.pc.jen = % change based on RR with correction for Jensen inequality (100 x exp(RR+RR.var/2) - 100)

  • RR.pc.jen.low = lower standard error confidence interval of % change based on RR with correction for Jensen inequality

  • RR.pc.jen.high = upper standard error confidence interval of % change based on RR with correction for Jensen inequality

  • RR.pc.jen.CIlow = lower 95% confidence interval of % change based on RR with correction for Jensen inequality

  • RR.pc.jen.CIhigh = upper 95% confidence interval of % change based on RR with correction for Jensen inequality

Where all units are identical for the grouping variables (row), the following columns will have values (otherwise they are NA):

  • Units = the unit of recording for an outcome (e.g. kg/ha)

  • MeanT.Obs = number of experimental treatment observations

  • MeanT = weighted mean of experimental treatment outcome values

  • MeanT.se = weighted standard error of experimental treatment outcome values

  • MeanC.Obs = number of control treatment observations

  • MeanC = weighted mean of control treatment outcome values

  • MeanC.se = weighted standard error of control treatment outcome values

When Fast = FALSE, means, standard errors and confidence intervals are replaced by estimates from, in order of preference, lmer then lm models, where the minimum data requirements for these models are met. Percentage change values are then recalculated from the updated estimates.

Additional columns when Fast = FALSE:

  • Model = type of model that was applied to data, NA = no model was applied.

  • RR.t.value = t statistic from RR model

  • RR.Pr(>|t|) = P-value from the RR model for the test that the outcome differs from zero

  • RR.Sigma2 = RR model sigma2

  • PC.t.value = t statistic from PC model

  • PC.Pr(>|t|) = P-value from the PC model for the test that the outcome differs from zero

  • PC.Sigma2 = PC model sigma2

Details

Several actions are or can be applied by this function:

  1. Outlier Removal: Outliers are defined using an extreme outliers method: values below $Q1 - 3*IQR$ or above $Q3 + 3*IQR$ (where IQR is the interquartile range) are removed. The ERA outcome variables analyzed by this function are ratios between an experimental treatment and a control outcome and should be approximately normally distributed. When the control approaches zero (e.g. yield collapse) this skews the distribution of the outcome ratio, producing extremely high values tending to infinity and requiring outlier removal. The use of outcome ratios, whilst necessary to standardize outcomes between studies, means this approach is inappropriate for studying nil outcomes (e.g. total crop yield failure); a binomial approach would be better for such instances. Outlier removal is optional and enabled if the rmOut parameter is set to TRUE (default).
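The removal rule can be sketched as follows (a minimal illustration assuming Tukey-style fences at 3*IQR beyond the quartiles, not the package's internal code; the function name is mine):

```r
# Drop values more than 3*IQR below Q1 or above Q3 (extreme outliers)
rm_extreme_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x[x >= q[1] - 3 * iqr & x <= q[2] + 3 * iqr]
}

# A near-zero control inflates one ratio toward infinity:
rm_extreme_outliers(c(1.1, 0.9, 1.2, 1.0, 50))  # the 50 is removed
```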

  2. Weighting: Within-study variance measures for mean outcomes are infrequently reported in the agricultural literature, so traditional meta-analytic approaches cannot be applied to most ERA outcomes. Instead, individual observations are up-weighted by replication and down-weighted by the number of observations submitted from the same study (colname = Code) for each combination of grouping variables. Studies with more replications are likely to produce less variable information than studies with fewer, and controlling for the number of observations contributed by a study to the dataset weights each study equally. Outcome ratios are therefore weighted according to: $Weighting = ((RepE*RepC)/(RepE+RepC))/Ns$, where RepE and RepC are the numbers of replications for the experimental treatment and the control, and Ns is the total number of observations contributed to the overall dataset by the study to which the observation belongs.
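The weighting can be sketched as a small helper (a hedged illustration; the function name is mine, and the formula is read as the harmonic-style replication term divided by Ns):

```r
# Weighting = ((RepE * RepC) / (RepE + RepC)) / Ns
# RepE, RepC: replicates of the experimental and control treatments
# Ns: observations the parent study contributes to the dataset
era_weight <- function(RepE, RepC, Ns) {
  ((RepE * RepC) / (RepE + RepC)) / Ns
}

era_weight(RepE = 4, RepC = 4, Ns = 2)  # (16/8)/2 = 1
```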

  3. Test of Normality: A Shapiro-Wilk test (shapiro.test) is applied to raw and log-transformed outcome ratios for each combination of grouping variables. This can be used to judge whether values based on mean proportional change, mean response ratio or median proportional change should be used to evaluate practice performance.

  4. Statistics calculated (in all cases na.rm=T):

    • weighted means use the weighted.mean function

    • weighted medians use the weighted.median function

    • weighted standard errors use the weighted_se function

    • 95% confidence intervals use the confint function with method = "Wald"

    • weighted variance uses the Hmisc wtd.var function

    • weighted quantiles use the weighted.quantile function (documented with weighted.median) with probs=seq(0,1,0.25)

  5. Response ratios are back-transformed and converted to % change with and without a correction for the Jensen inequality. The correction applied is as per Tandini & Mehrabi 2017.
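The back-transformations in step 5 can be reproduced numerically; a brief sketch with illustrative values (RR.var is the weighted variance of RR):

```r
RR <- 0.2       # illustrative weighted mean response ratio
RR.var <- 0.1   # illustrative weighted variance of RR

RR.pc     <- 100 * exp(RR) - 100              # naive % change, ~22.14
RR.pc.jen <- 100 * exp(RR + RR.var / 2) - 100 # Jensen-corrected, ~28.40
```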

  6. When Fast = FALSE and the minimum data requirements are met, a linear mixed-effects or linear model is applied to the data to generate means, standard errors and variance.

    • Linear mixed effects models use lme4 where outcomes from a grouping variable combination are from at least three sites of which two must have at least three observations. The model is weighted and includes a random intercept for site (lmer(Value~1 + (1|Site),weights=Weights)).

    • If the minimum data requirements for the lmer are not met then a linear model with weights is applied (lm(Value~1,weights=Weights)) if there are at least 5 outcome observations for the grouping variable combination.

    • If the minimum data requirements for the lm are not met no test is applied to the outcome values.
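The fallback logic above can be sketched as follows (assuming the lme4 package for the mixed model; the function name is mine and `dat` stands in for one grouping's data with columns Value, Site and Weights — an illustration, not the package's internal code):

```r
# lmer -> lm -> no-model fallback for one grouping-variable combination
fit_group <- function(dat) {
  n_sites <- length(unique(dat$Site))
  sites_with_3 <- sum(table(dat$Site) >= 3)
  if (n_sites >= 3 && sites_with_3 >= 2) {
    # >=3 sites, >=2 of them with >=3 observations:
    # weighted model with a random intercept for site
    lme4::lmer(Value ~ 1 + (1 | Site), data = dat, weights = Weights)
  } else if (nrow(dat) >= 5) {
    # otherwise a weighted intercept-only linear model
    lm(Value ~ 1, data = dat, weights = Weights)
  } else {
    NA  # minimum data requirements not met: no model applied
  }
}
```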

Also note that any groupings of data specified in the Aggregate.By parameter for which values of MeanC and MeanT are identical are removed from the dataset before analysis.
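A typical call might look as follows. This is a hypothetical sketch: ERA.Compiled and the Aggregate.By column names are assumptions for illustration, so substitute the grouping columns present in your prepared dataset.

```r
# library(ERAg)                         # package assumed to provide these functions
# Prepared <- PrepareERA(ERA.Compiled)  # ERA.Compiled: assumed raw ERA dataset
#
# Result <- ERAAnalyze(Data = Prepared,
#                      rmOut = TRUE,                              # remove extreme outliers
#                      Aggregate.By = c("PrName", "Out.SubInd"),  # hypothetical grouping columns
#                      ROUND = 3,
#                      Fast = FALSE)                              # fit lmer/lm where possible
```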