ERA-Explore-and-Analyze.rmd
In this vignette we will cover how to visualize ERA data and run simple meta-analysis.
Observations in the main ERA.Compiled
often consist of
bundles of practices which may relate to more than one product.
# No. combination practices
nrow(ERAg::ERA.Compiled[grepl("-",SubPrName)])
#> [1] 47113
# No. solo practices
nrow(ERAg::ERA.Compiled[!grepl("-",SubPrName)])
#> [1] 66173
This makes visualization of the dataset difficult as there are over a
1000 sub-practice combinations creating too many values to map onto a
figure. We can use the ERAComboSplit
function to “split”
these practice and product combinations into duplicate individual rows
each contain a unique combination of any practice x product combination
present in the original observation.
# There are many subpractice combinations in ERA
ERAg::ERA.Compiled[grepl("-",SubPrName),length(unique(SubPrName))]
#> [1] 1177
ERAg::ERA.Compiled[,unique(SubPrName)][1:10]
#> [1] "Inputs K-Inputs Micro-Inputs P-Inputs Urea"
#> [2] "Inputs K-Inputs P-Inputs Urea"
#> [3] "Inputs P-Inputs Urea"
#> [4] "Inputs Urea"
#> [5] "Inputs P"
#> [6] "Inputs K"
#> [7] "Inputs K-Inputs P"
#> [8] "Inputs K-Inputs Urea"
#> [9] "Inputs N-Inputs Urea"
#> [10] "Residue Unkn (nonNfix; NoFate)"
# Lets split these
ERA.Compiled.Split<-ERAComboSplit(Data=ERAg::ERA.Compiled)
# The dataset gets longer
dim(ERAg::ERA.Compiled)
#> [1] 113286 137
dim(ERA.Compiled.Split)
#> [1] 209953 143
# Suffixed .Combo columns contain split names and codes
grep(".Combo",colnames(ERA.Compiled.Split),value=T)
#> [1] "SubPrName.Combo" "PrName.Combo" "Theme.Combo"
#> [4] "Product.Simple.Combo" "Product.Subtype.Combo" "Product.Type.Combo"
# Combine codes have gone in the .Combo columns
ERA.Compiled.Split[grepl("-",SubPrName.Combo),length(unique(SubPrName.Combo))]
#> [1] 0
# Only single values are presnt
ERA.Compiled.Split[,unique(SubPrName.Combo)][1:10]
#> [1] "Inputs K" "Inputs Micro"
#> [3] "Inputs P" "Inputs Urea"
#> [5] "Inputs N" "Residue Unkn (nonNfix; NoFate)"
#> [7] "Irrigation Deficit" "Irrigation Supp"
#> [9] "Residue Incorp (nonNfix)" "Mulch (nonNfix)"
We can use the “split” dataset to explore how many instance of
products, practices or outcomes there are. Treemaps are great way of
visualizing these data using the treemap::treemap()
function.
# First we count the number of studies per level of the practice heirarchy (see PracticeCodes object for more information on the practice heirarchy)
ERA.Compiled.Split.Pracs<-ERA.Compiled.Split[,list(N.Studies=length(unique(Code))),by=list(SubPrName.Combo,PrName.Combo,Theme.Combo)]
head(ERA.Compiled.Split.Pracs)
#> SubPrName.Combo PrName.Combo Theme.Combo
#> <char> <char> <char>
#> 1: Inputs K Inorganic Fertilizer Nutrient Management
#> 2: Inputs Micro Inorganic Fertilizer Nutrient Management
#> 3: Inputs P Inorganic Fertilizer Nutrient Management
#> 4: Inputs Urea Inorganic Fertilizer Nutrient Management
#> 5: Inputs N Inorganic Fertilizer Nutrient Management
#> 6: Residue Unkn (nonNfix; NoFate) Crop Residue Soil Management
#> N.Studies
#> <int>
#> 1: 264
#> 2: 80
#> 3: 498
#> 4: 277
#> 5: 406
#> 6: 21
# Visualize with the treemap function
treemap::treemap(ERA.Compiled.Split.Pracs,
index=c("Theme.Combo","PrName.Combo"),
vSize="N.Studies",
type="index",
palette = "Paired",
title="Number of Studies reporting ERA practices")
In this example we will analyze outcomes for the effectiveness of water harvesting practices using pits.
First let’s explore the ERA PracticeCodes
to see where
we might find the names of the practices we are interested in. Theme is
the highest level of organization for practice which of these relate to
water harvesting?
PracticeCodes<-data.table(PracticeCodes)
PracticeCodes[,unique(Theme)]
#> [1] "Agroforestry" "Soil Management" "Blue Water Management"
#> [4] "Green Water Management" "Crop Management" "Genetic improvement"
#> [7] "Nutrient Management" "Energy" "Animals"
#> [10] "Postharvest" "Non-CSA" "Water Management"
Water Management
is a sensible place to look, let’s look
at the practices nesting within this theme.
PracticeCodes[Theme=="Water Management",unique(Practice)]
#> [1] "Irrigation" "Planting"
The Water Harvesting
practice is indeed in the
Water Management
theme, there are also sub-practices nested
within practices so let’s see what these are for water harvesting.
# Note that we are using the Subpractice.S field, this is the field used to generate the names in the ERA.Compiled$SubPrName and ERA.Compiled$SubPrName.Base fields
PracticeCodes[Practice=="Water Harvesting",unique(Subpractice.S)]
#> [1] "Water Tanks" "Zai Pits" "Stone Rows"
#> [4] "Grass Strips" "Planting Basins" "Dam"
#> [7] "Terraces" "Bunds" "Ridge & Furrow"
#> [10] "Tied Ridges" "Microcatchments" "Fanya Juu or Chini"
#> [13] "Trench"
Two of these practices seem to relate to pits
Planting Basins
and Zai Pits
, let’s check
their definitions to make sure.
FocalPractices<-c("Planting Basins","Zai Pits")
PracticeCodes[Subpractice %in% FocalPractices,Definition]
#> [1] "Small pits to capture water. Pits are filled with manure, compost or other nutrient sources before the rainy season. Crops are planted directly into the pits."
#> [2] "Small planting pits dug to capture water, crops are planted directly into the pits."
Great both are are types of planting pit.
Now let’s see what data we have in ERA on these practices
(PitPracs<-ERAg::ERA.Compiled[grepl(paste(FocalPractices,collapse = "|"),SubPrName),SubPrName] %>%
table %>%
sort(decreasing = T))
#> .
#> MinTill-Planting Basins
#> 661
#> Planting Basins
#> 361
#> MinTill-Mulch (nonNfix)-Planting Basins
#> 107
#> AgFor Prune Mulch (Nfix)-MinTill-Planting Basins
#> 100
#> Mulch (nonNfix)-Planting Basins
#> 92
#> Zai Pits
#> 58
#> AgFor Prune Mulch (nonNfix)-MinTill-Planting Basins
#> 50
#> Planting Basins-Rotation (Mixed)
#> 34
#> MinTill-Mulch (Nfix)-Mulch (nonNfix)-Planting Basins
#> 28
#> Inputs Manure-Zai Pits
#> 27
#> Mulch (noID)-Planting Basins
#> 24
#> Mulch (noID)-Planting Basins-Rotation (Mixed)
#> 24
#> NoTill-Planting Basins
#> 21
#> Inputs K-Inputs N-Inputs P-Planting Basins
#> 20
#> AgFor Alley (Nfix)-AgFor Prune (Unknown)-Planting Basins
#> 16
#> Inputs K-Inputs Manure-Inputs N-Inputs P-MinTill-Mulch (noID)-Planting Basins
#> 12
#> Inputs Manure-MinTill-Mulch (noID)-Planting Basins
#> 12
#> Inputs Urea-Zai Pits
#> 10
#> Irrigation Supp-MinTill-Planting Basins
#> 10
#> Irrigation Supp-Planting Basins
#> 10
#> Intercrop (Mixed)-pH-Planting Basins
#> 8
#> Intercrop (Mixed)-Planting Basins
#> 8
#> Mulch (nonNfix)-Zai Pits
#> 8
#> Inputs Manure-Inputs Urea-Zai Pits
#> 6
#> MinTill-Planting Basins-Residue Incorp (noID)
#> 6
#> Green Manure (Nfix; Time)-MinTill-Planting Basins-Residue Unkn (NoID; NoFate)-Rotation (Mixed)
#> 3
#> Inputs K-Inputs Manure-Inputs N-Inputs P-MinTill-Planting Basins
#> 2
#> Inputs K-Inputs Manure-Inputs N-Inputs P-Planting Basins
#> 2
#> Inputs K-Inputs N-Inputs P-MinTill-Planting Basins
#> 2
#> Inputs K-Inputs N-MinTill-Planting Basins
#> 2
#> Inputs K-Inputs N-Planting Basins
#> 2
#> Inputs K-Inputs P-MinTill-Planting Basins
#> 2
#> Inputs K-Inputs P-Planting Basins
#> 2
#> Inputs Manure-MinTill-Planting Basins
#> 2
#> Inputs Manure-Planting Basins
#> 2
#> Inputs N-Inputs P-MinTill-Planting Basins
#> 2
#> Inputs N-Inputs P-Planting Basins
#> 2
#> MinTill-Mulch (nonNfix)-Planting Basins-Rotation (Mixed)
#> 2
#> MinTill-Zai Pits
#> 2
#> Inputs K-Inputs N-Inputs P-Inputs Urea-MinTill-Planting Basins
#> 1
#> Inputs Manure-Inputs P-Inputs Urea-MinTill-Zai Pits
#> 1
#> Intercrop (Mixed)-MinTill-Mulch (Nfix)-Mulch (nonNfix)-Planting Basins
#> 1
Try using the treemap
approach as a way of visualizing
the data above.
It seems planting basins are often used in combination with minimum tillage and mulch, together this is known as conservation agriculture. Let’s subset the data to where we have a reasonable number of observations, at least 50.
PitPracs<-PitPracs[PitPracs>=50] %>% names
ERA.Pits<-ERAg::ERA.Compiled[SubPrName %in% PitPracs]
What outcomes do we have data for?
ERA.Pits.Out<-ERA.Pits[,list(N.Studies=length(unique(Code))),by=list(SubPrName,Out.SubInd)]
# Visualize with the treemap function
treemap::treemap(ERA.Pits.Out,
index=c("Out.SubInd"),
vSize="N.Studies",
type="index",
palette = "Paired",
title="Outcomes present for pit-based water harvesting practices")
What crops do we have data for?
ERA.Pits.Crops<-ERA.Pits[,list(N.Studies=length(unique(Code))),by=list(Product.Simple)]
# Visualize with the treemap function
treemap::treemap(ERA.Pits.Crops,
index=c("Product.Simple"),
vSize="N.Studies",
type="index",
palette = "Paired",
title="Crops present for pit-based water harvesting practices")
Where does this data come from?
ERAgON::ERAHexPlot(Data=ERA.Pits,Showpoints = "Yes",Point.Col = "black",Mid = "yellow",
High = "red",)
#> Warning in layer_sf(geom = GeomSf, data = data, mapping = mapping, stat = stat,
#> : Ignoring unknown aesthetics: x and y
#> Warning: `stat(log(count))` was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(log(count))` instead.
#> ℹ The deprecated feature was likely used in the ERAgON package.
#> Please report the issue to the authors.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
100*(ERA.Pits[,table(Country)] / nrow(ERA.Pits)) %>% round(digits=3)
#> Country
#> Ethiopia Mozambique Niger Sudan Zambia Zimbabwe
#> 3.0 0.3 3.0 3.5 0.6 89.6
Where can I find the papers that contain this data?
ERA.Compiled
provides you with some basic bibliographic
data, the most important of which is the DOI
file which
enables you to locate the article on-line when adding the prefix
https://doi.org/
(e.g.,
https://doi.org/10.1007/s10705-005-6209-9
).
ERA.Pits[,list(Code,Journal,Author,DOI)] %>% unique
#> Code Journal Author DOI
#> <char> <char> <char> <char>
#> 1: NN0035 NUTR CYCL AGROECOSYS Fatondji D 10.1007/s10705-005-6209-9
#> 2: NN0304 EXP AGR Mashingaidze N 10.1017/s0014479712001020
#> 3: HK0259 J SUSTAIN AGR Umar BB 10.1080/10440046.2012.661700
#> 4: HK0280 PHYS CHEM EARTH Mupangwa W 10.1016/j.pce.2007.07.030
#> 5: HK0281 PHYS CHEM EARTH Mupangwa W 10.1016/j.pce,2008.06.049
#> 6: HK0282 FIELD CROP RES Mupangwa W 10.1016/j.fcr.2012.02.020
#> 7: HK0283 J ARID ENVIRON Mupangwa W 10.1016/j.jaridenv.2012.11.007
#> 8: HK0347 CROP PROT Mutsamba EF 10.1016/j.cropro.2016.01.004
#> 9: DK0032 FIELD CROP RES Kafesu 10.1016/j.fcr.2018.01.014
#> 10: EO0021 S AFR J PLANT SOIL Mupangwa 10.1080/02571862.2018.1438673
#> 11: AG0044 AIMS AGR FOOD Mupangwa 10.3934/agrfood.2016.1.85
#> 12: NJ0060 J NEMATOL Mashavakure 10.21307/jofnem-2018-033
#> 13: AN0068 RENEW AGR FOOD SYST Thierfelder 10.1017/S1742170515000332
#> 14: AN0083 SOIL TILL RES Sime 10.1016/j.still.2014.12.001
#> 15: AN0093 LAND DEGRAD DEV Wildemeersch 10.1002/ldr.2416
#> 16: AN0123 OUTLOOK AGR Ali 10.1177/0030727016665034
#> 17: AN0128 BIOSCI J Ali 10.14393/BJ-v33n4a2017-33954
#> 18: NJ0096 NUTR CYCL AGROECOSYS Nyagumbo 10.1007/s10705-015-9733-2
#> 19: NJ0106 J SCI FOOD AGR Ali 10.1002/jsfa.7047
#> 20: JS0245 EXP AGR Amede 10.1017/s0014479710000803
#> 21: AG0053 INT J AGR SUSTAIN Mupangwa 10.1080/14735903.2015.1011393
#> 22: EO0020 RENEW AGR FOOD SYST Mupangwa 10.1017/s1742170516000041
#> Code Journal Author DOI
There is also a DataLoc
field in
ERA.Compiled
that shows where data were extracted from
within in a publication,
ERA.Pits[1:5,list(DOI,PrName,Out.SubInd,MeanT,MeanC,Units,DataLoc)]
#> DOI PrName Out.SubInd MeanT MeanC Units
#> <char> <char> <char> <num> <num> <char>
#> 1: 10.1007/s10705-005-6209-9 Water Harvesting Crop Yield 0.03175 0.02350 Mg/ha
#> 2: 10.1007/s10705-005-6209-9 Water Harvesting Crop Yield 0.18332 0.14246 Mg/ha
#> 3: 10.1007/s10705-005-6209-9 Water Harvesting Crop Yield 1.17388 0.71884 Mg/ha
#> 4: 10.1007/s10705-005-6209-9 Water Harvesting Crop Yield 0.42047 0.10704 Mg/ha
#> 5: 10.1007/s10705-005-6209-9 Water Harvesting Crop Yield 0.37579 0.26944 Mg/ha
#> DataLoc
#> <char>
#> 1: Fig 3a
#> 2: Fig 3a
#> 3: Fig 3a
#> 4: Fig 3b
#> 5: Fig 3b
There are some pre-processing steps that we need to apply to the data
using the ERAg::PrepareERA
function before conducting a
meta-analysis based on response ratios. These steps include:
1) Negative outcomes: where outcomes have a notable number
(e.g. > 0.5%) of negative values for the control (MeanC
column) or experimental treatment (MeanT
column), this is
because negative numbers are incompatible with ratios.
2) Inverse outcomes: when a lower value indicates a better
outcome, the outcome values for control (MeanC
) and
experimental treatment (MeanT
) are swapped (excluding
economic outcomes).
See ?PrepareERA for more details
ERA.Pits.Prep<-ERAg::PrepareERA(Data = ERA.Pits,Perc.Neg = 0.5,RmNeg = T)
# Have we lost any data?
dim(ERA.Pits)
#> [1] 1429 137
dim(ERA.Pits.Prep)
#> [1] 1429 137
We haven’t lost any data which means that there were insufficient negative outcomes to cause us an issue.
To analyze the data we will use the ERAg::ERAAnalyze
function which performs the following:
Weighting
: Within-study variance measures for mean
outcomes are infrequently reported in agricultural literature, so
traditional meta-analysis approaches cannot be applied to most ERA
outcomes. Therefore individual observations are up-weighted by
replication and down-weighted by the number of observations submitted
from the same study (Code
column) for each combination of
grouping variables. Studies with more replications are likely to produce
less variable information than studies with fewer. Controlling for the
number of observations contributed by a study to the dataset weights
each study equally. As such, outcome ratios are weighted according to:
Weighting = ((RepsE * RepsC)/(RepsE)+(RepsC))/(Ns)
where
Rep
is the number of replications for RepC
the
control and RepE
the experimental treatment, and
Ns
is the total number of observations contributed to the
overall dataset by the study to which the observation belongs.
Outlier Removal
: Outliers are defined using an
extreme outliers method where response ratios above or below
(interquartile range) are removed. The ERA outcome variables analyzed by
this function are ratios between am experimental treatment and control
outcome and should be approximately normally distributed. When the
control approaches zero (e.g. yield collapse) this skews the
distribution of the outcome ratio producing extremely high values
tending to infinity and requiring outlier removal. The use of outcome
ratios, whilst necessary to standardize outcomes between studies, means
this approach is inappropriate to study nil outcomes (e.g. total crop
yield failure), a binomial approach would be better for such instances.
Outlier removal is optional and enabled if the rmOut parameter is set to
TRUE (default).
Test of Normality
: A Shapiro-Wilk test is applied to
raw and log-transformed outcome ratios for each combination of grouping
variables. This can be used to judge whether values based on mean
proportional change, mean response ratio or median proportional change
should be used to evaluate practice performance.
Basic Statistical Tests
: When
Fast = FALSE
where minimum data requirements are met
linear-mixed effects or linear model is applied to the data to generate
means, standard errors and variance. Linear mixed effects models use
lmer::lme4
where outcomes from a grouping variable
combination are from at least three sites of which two must have at
least three observations. The model is weighted and includes a random
intercept for site
(lmer(Value~1 + (1|Site),weights=Weights)
). If the minimum
data requirements for the linear-mixed effects are not met but there are
at least 5 outcome observations for the grouping variable combination
then a weighted linear model is used
(lm(Value~1,weights=Weights)
) . If the minimum data
requirements for the linear model are not met no test is applied to the
outcome values.
Correction of Jensen Inequality
: The log-scale
response ratios are back-transformed and converted to % change with and
without a correction for the Jensen inequality. The correction applied
is as per Tandini &
Mehrabi 2017. We recommend using the corrected values.
In all case we advise caution in the interpretation of analysis
outputs. The distribution of the data for each combination of values
specified in the Aggregate.By
field should be checked
through plotting. Combinations with limited data are vulnerable to
outliers and should be scrutinized carefully.
Ok now that we have an idea what the function is doing let’s run an
analysis of outcomes subindicator (Out.SubInd
) by
subpractice (SubPrName
) using the
ERAg::ERAAnalyze
function.
We will not specify a product (i.e. crop) column in the
Aggregate.By
argument so results are averaged across all
the crops present within each combination of subindicator x subpractice
as specified by c("Out.SubInd","SubPrName")
in the
Aggregate.By
argument.
Analysis1<-ERAg::ERAAnalyze(Data=ERA.Pits.Prep,
# rmOut: Outliers will be removed
rmOut = TRUE,
# Aggregate.By: Data are analyzed as subsets of these fields
Aggregate.By = c("Out.SubInd","SubPrName"),
# Fast: Tests will be applied to the data if FALSE
Fast=FALSE)
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
ERAAnalyze
outputs a data.table with many columns, we
will run through what many of these are next, but you can also see the
function description for more information
(?ERAg::ERAAnalyze
).
colnames(Analysis1)
#> [1] "Out.SubInd" "SubPrName" "Observations" "Studies"
#> [5] "Sites" "RR.Shapiro.Sig" "RR" "RR.median"
#> [9] "RR.var" "RR.se" "RR.CIlow" "RR.CIhigh"
#> [13] "RR.Quantiles0.25" "PC.Shapiro.Sig" "PC" "PC.median"
#> [17] "PC.se" "PC.CIlow" "PC.CIhigh" "PC.var"
#> [21] "PC.Quantiles0.25" "Units" "Model" "MeanT.Obs"
#> [25] "MeanT" "MeanT.se" "MeanC.Obs" "MeanC"
#> [29] "MeanC.se" "RR.t value" "RR.Pr(>|t|)" "RR.Sigma2"
#> [33] "PC.t value" "PC.Pr(>|t|)" "PC.Sigma2" "RR.pc.se.low"
#> [37] "RR.pc" "RR.pc.se.high" "RR.pc.jen.low" "RR.pc.jen"
#> [41] "RR.pc.jen.high" "RR.pc.jen.CIlow" "RR.pc.jen.CIhigh" "PC.pc.se.low"
#> [45] "PC.pc" "PC.pc.se.high"
The first 5 columns describe the subset of data analyzed
(combinations of the fields provided in the Aggregate.By
argument) and the amount of data that were available.
Analysis1[,list(Out.SubInd, SubPrName, Observations, Studies, Sites)] %>% head
#> Out.SubInd SubPrName Observations Studies Sites
#> <char> <char> <int> <int> <int>
#> 1: Crop Yield Zai Pits 28 3 4
#> 2: Crop Residue Yield Zai Pits 12 1 2
#> 3: Water Use Efficiency Zai Pits 11 2 3
#> 4: Crop Yield Planting Basins 119 9 10
#> 5: Nitrogen Agronomic Efficiency Planting Basins 4 1 1
#> 6: Water Use Efficiency Planting Basins 9 1 1
We can use the data availability fields to filter the results to combinations that meet a minimum data requirement, let’s specify a minimum of 3 studies.
Analysis1<-Analysis1[Studies>=3]
Analysis1[,list(Out.SubInd, SubPrName, Observations, Studies, Sites)]
#> Out.SubInd SubPrName Observations
#> <char> <char> <int>
#> 1: Crop Yield Zai Pits 28
#> 2: Crop Yield Planting Basins 119
#> 3: Crop Yield MinTill-Planting Basins 31
#> 4: Soil Moisture Planting Basins 78
#> 5: Infiltration Rate Planting Basins 11
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins 51
#> 7: Soil Moisture MinTill-Planting Basins 588
#> Studies Sites
#> <int> <int>
#> 1: 3 4
#> 2: 9 10
#> 3: 5 5
#> 4: 6 7
#> 5: 3 3
#> 6: 4 7
#> 7: 3 5
A response ratio (RR) is simply the natural log of the ratio of the experimental outcome to the control outcome. If maize yields with planting basins are 1.5 Mg/ha and without them are 1.1 Mg/ha the response ratio is log(1.5/1.1) = 0.310. RRs greater than zero indicate the experimental treatment is better than the control and vice-versa for RRs less than zero.
Measures of central tendency and error for RRs are provided in the
fields prefixed with RR.
and include mean
(RR
), median, variance, standard error and quantiles at 25%
intervals. If a test was applied to the data then the mean and standard
error are derived from that test.
A Shapiro-Wilks test of normality is applied to RRs. If a value in
the RR.Shapiro.Sig
field is <0.05 then the data for that
row are statistically non-normal therefore estimates of central tendency
and tests results may be unreliable.
Analysis1[,list(Out.SubInd,SubPrName,RR.Shapiro.Sig, RR, RR.median, RR.var, RR.se, RR.Quantiles0.25)] %>% head
#> Out.SubInd SubPrName RR.Shapiro.Sig
#> <char> <char> <num>
#> 1: Crop Yield Zai Pits 0.02136
#> 2: Crop Yield Planting Basins 0.01802
#> 3: Crop Yield MinTill-Planting Basins 0.34706
#> 4: Soil Moisture Planting Basins 0.00643
#> 5: Infiltration Rate Planting Basins 0.53826
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins 0.00001
#> RR RR.median RR.var RR.se
#> <num> <num> <num> <num>
#> 1: 1.60856 1.56759080 1.93062 0.41504
#> 2: 0.34048 0.28430648 0.18272 0.08592
#> 3: 0.35755 0.95097629 0.05203 0.16766
#> 4: 0.13466 0.18669793 0.05141 0.05812
#> 5: 0.04268 -0.09502693 0.06270 0.07082
#> 6: 0.32865 0.15415068 0.18962 0.17219
#> RR.Quantiles0.25
#> <char>
#> 1: 0.17618|0.6611|1.50677|2.05494|4.89784
#> 2: -1.28606|0.10469|0.28327|0.66602|1.4929
#> 3: -0.34971|0.78828|0.84251|0.89675|0.95098
#> 4: -0.39531|0.02917|0.17919|0.35003|0.64185
#> 5: -0.17829|-0.12999|-0.09863|0.16834|0.53714
#> 6: -0.50858|-0.01417|0.1463|0.30769|2.93563
If tests were performed (argument Fast=F
) then test
results presented in the t value
, Pr(>|t|)
and Sigma2
. The Pr(>|t|)
field indicates
significance.
Analysis1[,list(Out.SubInd,SubPrName,Model, `RR`,`RR.se`,`RR.t value`, `RR.Pr(>|t|)`, RR.Sigma2)] %>% head
#> Out.SubInd SubPrName Model
#> <char> <char> <char>
#> 1: Crop Yield Zai Pits lmerModLmerTest
#> 2: Crop Yield Planting Basins lmerModLmerTest
#> 3: Crop Yield MinTill-Planting Basins lmerModLmerTest
#> 4: Soil Moisture Planting Basins lmerModLmerTest
#> 5: Infiltration Rate Planting Basins lm
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins lmerModLmerTest
#> RR RR.se RR.t value RR.Pr(>|t|) RR.Sigma2
#> <num> <num> <num> <num> <num>
#> 1: 1.60856 0.41504 3.87563 0.03317 0.25526
#> 2: 0.34048 0.08592 3.96281 0.00343 0.02497
#> 3: 0.35755 0.16766 2.13255 0.09646 0.00706
#> 4: 0.13466 0.05812 2.31702 0.05524 0.00260
#> 5: 0.04268 0.07082 0.60264 0.56017 0.02508
#> 6: 0.32865 0.17219 1.90870 0.11167 0.01603
For easier interpretation mean RR and RR +/- standard error are
back-transformed and converted into percentage change (e.g., a ratio of
1.1 = a 10% increase). The RR.pc.
columns back-transform
log ratios using the exponent. The RR.pc.jen.
columns
back-transform log ratios using the exponent with a correction for the
Jensen inequality.
Analysis1[,list(Out.SubInd,SubPrName, RR.pc.se.low, RR.pc, RR.pc.se.high, RR.pc.jen.low, RR.pc.jen.high)] %>% head
#> Out.SubInd SubPrName RR.pc.se.low
#> <char> <char> <num>
#> 1: Crop Yield Zai Pits 229.86721
#> 2: Crop Yield Planting Basins 28.98939
#> 3: Crop Yield MinTill-Planting Basins 20.91166
#> 4: Soil Moisture Planting Basins 7.95454
#> 5: Infiltration Rate Planting Basins -2.77478
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins 16.93640
#> RR.pc RR.pc.se.high RR.pc.jen.low RR.pc.jen.high
#> <num> <num> <num> <num>
#> 1: 399.56124 656.55118 274.77288 759.54243
#> 2: 40.56221 53.17333 30.60992 55.09769
#> 3: 42.98221 69.08139 21.33923 69.67930
#> 4: 14.41477 21.26160 8.09497 21.41934
#> 5: 4.36039 12.01919 -1.54790 13.43275
#> 6: 38.90916 65.01068 17.87741 66.33855
Analyses are performed on the “raw” ratio of experimental to control
outcomes, these columns are prefixed PC.
. The proportional
change columns are as per the response ratio columns, but there are no
.jen.
columns as values are not logged.
We urge caution in using statistics based on proportional data due to the imbalanced scaling of ratios below or above one. Ratio or proportion data is often right skewed as values cannot be lower than zero but can tend to infinity.
Below we compare percent changes estimates using the ratio vs. response ratio.
The different statistics/methods give different effect size estimates and have different amounts of data availability. Generally the more data that are available the closer the convergence of the methods, although not always.
# Add percentage change results based on median
Analysis1[,PC.pc.median:=100*(PC.median-1)]
Analysis1[,RR.pc.median:=100*(RR.median-1)]
Analysis1[,list(Out.SubInd,SubPrName,Observations,RR.pc.jen,RR.pc.median,PC.pc,PC.pc.median)]
#> Out.SubInd SubPrName Observations
#> <char> <char> <int>
#> 1: Crop Yield Zai Pits 28
#> 2: Crop Yield Planting Basins 119
#> 3: Crop Yield MinTill-Planting Basins 31
#> 4: Soil Moisture Planting Basins 78
#> 5: Infiltration Rate Planting Basins 11
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins 51
#> 7: Soil Moisture MinTill-Planting Basins 588
#> RR.pc.jen RR.pc.median PC.pc PC.pc.median
#> <num> <num> <num> <num>
#> 1: 467.56778 56.759080 1523.465 379.508197
#> 2: 42.32813 -71.569352 52.000 32.884013
#> 3: 43.48782 -4.902371 53.348 158.823529
#> 4: 14.56361 -81.330207 16.424 20.526316
#> 5: 5.67731 -109.502693 7.208 -9.065156
#> 6: 40.02699 -84.584932 84.019 16.666667
#> 7: 13.29785 -83.794414 14.169 17.592593
A check of the normality of the data reveals potential issues and these are only “solved” (according to the significance of the Shapiro-Wilks test) by logging the data in one instance (Soil Moisture and MinTill-Planting Basins).
Analysis1[,list(Out.SubInd,SubPrName, RR.Shapiro.Sig, PC.Shapiro.Sig)]
#> Out.SubInd SubPrName RR.Shapiro.Sig
#> <char> <char> <num>
#> 1: Crop Yield Zai Pits 0.02136
#> 2: Crop Yield Planting Basins 0.01802
#> 3: Crop Yield MinTill-Planting Basins 0.34706
#> 4: Soil Moisture Planting Basins 0.00643
#> 5: Infiltration Rate Planting Basins 0.53826
#> 6: Crop Yield MinTill-Mulch (nonNfix)-Planting Basins 0.00001
#> 7: Soil Moisture MinTill-Planting Basins 0.00000
#> PC.Shapiro.Sig
#> <num>
#> 1: 0.00000
#> 2: 0.00000
#> 3: 0.00395
#> 4: 0.00002
#> 5: 0.32726
#> 6: 0.00000
#> 7: 0.00000
Let’s check the data distribution. If we take the combination of
Soil Moisture
and MinTill-Planting Basins
plotting the outcome response ratio and ratio vs a normal curve it
shows: 1) the right skew (due to the nature of ratio data) in the ratio
data is corrected by the natural log transformation; 2) whilst the
response ratio distribution is not quite normal, it’s not terrible
either and probably good enough for our purposes.
RR<-ERA.Pits.Prep[Out.SubInd=="Soil Moisture" & SubPrName=="MinTill-Planting Basins",yi]
hist(RR, prob=T)
curve(dnorm(x, mean=mean(RR), sd=sqrt(var(RR))), col="darkblue", lwd=2, add=TRUE, yaxt="n")
Proportion<-ERA.Pits.Prep[Out.SubInd=="Soil Moisture" & SubPrName=="MinTill-Planting Basins",MeanT/MeanC]
hist(Proportion, prob=T)
curve(dnorm(x, mean=mean(Proportion), sd=sqrt(var(Proportion))), col="darkblue", lwd=2, add=TRUE, yaxt="n")
Interestingly results can change if the numerator and denominator are swapped. For example:
Con<-c(1000,1500,2000) # Control outcomes
Exp<-c(1100,500,1200) # Experimental outcomes
# ***Experimental outcome is numerator***
Exp/Con
#> [1] 1.1000000 0.3333333 0.6000000
mean(Exp/Con)
#> [1] 0.6777778
# % change estimated from mean of ratios
100*(mean(Exp/Con)-1)
#> [1] -32.22222
# % change estimated from mean of response ratios
100*(exp(mean(log(Exp/Con)))-1)
#> [1] -39.63189
# ***Control outcome is numerator***
Con/Exp
#> [1] 0.9090909 3.0000000 1.6666667
mean(Con/Exp)
#> [1] 1.858586
# % change estimated from mean of ratios
100*(mean(Con/Exp)-1)
#> [1] 85.85859
# % change estimated from mean of response ratios
100*(exp(mean(log(Con/Exp)))-1)
#> [1] 65.65038
When averaging ratios if the experimental treatment is the numerator then the experimental treatment yields -32.2% less than the control, and if the control treatment is the numerator then we find the control yields 85.9% more than the experimental treatment. The results using response ratios are less dispersed (-39.6% vs 65.7%).
Which is true: 1) the control is 65.7% better than the experimental treatment or 2) the experimental treatment is 39.6% worse than the control?