Practical Statistics – 5
“….It is also useful to explore how the data is distributed….”
So, moving forward, we get to make some boxes 'n' stuff; though done as ‘boxes over time’, this will be as the book decrees.
Rather than rework the same dataset, I’ve gotten a new one to pore over for this and perhaps the next couple of sections:
“….Rivendell – Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry – Solute chemistry (2007-2015)
Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si).
Description:
Significant solute flux from the weathered bedrock zone – which underlies soils and saprolite – has been suggested by many studies. However, controlling processes for the hydrochemistry dynamics in this zone are poorly understood. This work reports the first results from a four-year (2009-2012) high-frequency (1-3 day) monitoring of major solutes (Ca, Mg, Na, K and Si) in the perched, dynamic groundwater in a 4000 m² zero-order basin located at the Angelo Coast Range Reserve, Northern California. Groundwater samples were autonomously collected at three wells (downslope, mid-slope, and upslope) aligned with the axis of the drainage. Rain and throughfall samples, profiles of well headspace pCO2, vertical profiles and time series of groundwater temperature, and contemporaneous data from an extensive hydrologic and climate sensor network provided the framework for data analysis….”
http://criticalzone.org/eel/data/dataset/4742/
Let’s have a look-see at the data, which includes 4 files:
- 20150519_mastersheet_rainwater
- 20150519_mastersheet_GW_and_SW
- 20150519_ICP_rainwater_results_nM
- 20150519_ICP_GW_SW_results_nM
The ICP-labeled files hold the results of Inductively Coupled Plasma Mass Spectrometry, with GW being Ground Water and SW being Stream Water.
<pre><code>
# install.packages("readxl")  # uncomment on first use
library("readxl")

# read_excel() returns a tibble; data.frame() converts it and makes the column names syntactic
rainwater <- data.frame(
  read_excel("/home/Daiten/R/Project 4/data/20150519_mastersheet_rainwater.xls"))
rainwater_ICP <- data.frame(
  read_excel("/home/Daiten/R/Project 4/data/20150519_ICP_rainwater_results_nM.xls"))
GW_and_SW <- data.frame(
  read_excel("/home/Daiten/R/Project 4/data/20150519_mastersheet_GW_and_SW.xls"))
GW_and_SW_ICP <- data.frame(
  read_excel("/home/Daiten/R/Project 4/data/20150519_ICP_GW_SW_results_nM.xls"))
</code></pre>
Let's pull some names:
<pre><code>
GW_and_SW_colnames<-colnames(GW_and_SW)
GW_and_SW_ICP_colnames<-colnames(GW_and_SW_ICP)
</code></pre>
GW_and_SW_colnames returns:
[1] "Sample.id" "sample.no" "description" "descp.no" "date" "time"
[7] "year" "month" "day" "hour" "minute" "filter.pore.size"
[13] "method" "method.No." "Isco..no."
and GW_and_SW_ICP_colnames returns:
[1] "Sample.Id" "location"
[3] "sampling_date_JD" "rundate"
[5] "evaporated.no...0..yes...1." "bad_filters..no...0.yes..1."
[7] "storage_time_days" "Li7.LR."
[9] "Li7.LR._SD" "B11.LR."
[11] "B11.LR._SD" "Ca43.LR."
[13] "Ca43.LR._SD" "Rb85.LR."
[15] "Rb85.LR._SD" "Sr88.LR."
[17] "Sr88.LR._SD" "In115.LR."
[19] "In115.LR._SD" "Na23.MR."
[21] "Na23.MR._SD" "Mg25.MR."
[23] "Mg25.MR._SD" "Al27.MR."
[25] "Al27.MR._SD" "Si28.MR."
[27] "Si28.MR._SD" "P31.MR."
[29] "P31.MR._SD" "Ca44.MR."
[31] "Ca44.MR._SD" "Mn55.MR."
[33] "Mn55.MR._SD" "Fe56.MR."
[35] "Fe56.MR._SD" "Fe57.MR."
[37] "Fe57.MR._SD" "Co59.MR."
[39] "Co59.MR._SD" "Ni60.MR."
[41] "Ni60.MR._SD" "Cu63.MR."
[43] "Cu63.MR._SD" "Cu65.MR."
[45] "Cu65.MR._SD" "Zn64.MR."
[47] "Zn64.MR._SD" "Zn66.MR."
[49] "Zn66.MR._SD" "Rb85.MR."
[51] "Rb85.MR._SD" "Sr88.MR."
[53] "Sr88.MR._SD" "Mo98.MR."
[55] "Mo98.MR._SD" "In115.MR."
[57] "In115.MR._SD" "Ba138.MR."
[59] "Ba138.MR._SD" "K39.HR."
[61] "K39.HR._SD" "In115.HR."
[63] "In115.HR._SD"
Since the number of columns is small enough, we'll separate the measurement columns from the _SD columns and make a box plot for one location by hand, so we don't have to ask someone for a regex, again.
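For comparison, the pattern-based split is short. Here is a minimal sketch on a made-up data frame, using base R's `grepl()`. The `toy` values are invented for illustration; only the column-name shapes mirror the ICP sheet.

```r
# Toy stand-in for GW_and_SW_ICP: values are invented, names copy the real pattern.
toy <- data.frame(
  Sample.Id   = c("a", "b", "c"),
  location    = c("0", "0", "1"),
  Li7.LR.     = c(1.2, 1.5, 0.9),
  Li7.LR._SD  = c(0.1, 0.2, 0.1),
  Na23.MR.    = c(310, 295, 402),
  Na23.MR._SD = c(4, 6, 5),
  K39.HR.     = c(22, 25, 19),
  K39.HR._SD  = c(1, 2, 1)
)

nms <- colnames(toy)

# Standard-deviation columns end in "_SD".
is_sd <- grepl("_SD$", nms)

# Measurement columns end in ".LR.", ".MR." or ".HR.".
is_measure <- grepl("\\.(LR|MR|HR)\\.$", nms)

# Same idea as location_zero: keep the location 0 rows, then split the columns.
toy_zero      <- toy[toy$location == "0", ]
toy_zero_LMHR <- toy_zero[, is_measure]
toy_zero_SD   <- toy_zero[, is_sd]

colnames(toy_zero_LMHR)
colnames(toy_zero_SD)
```

With only 63 columns the hand-typed lists are manageable too, and that is the version used here.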
<pre><code>
# Keep only the rows sampled at location 0
location_zero <- subset(GW_and_SW_ICP, location == "0")

# Measurement columns, picked by hand
location_zero_LMHR <- data.frame(location_zero$Li7.LR., location_zero$B11.LR., location_zero$Ca43.LR., location_zero$Rb85.LR.,
  location_zero$Sr88.LR., location_zero$In115.LR., location_zero$Na23.MR., location_zero$Mg25.MR.,
  location_zero$Al27.MR., location_zero$Si28.MR., location_zero$P31.MR., location_zero$Ca44.MR.,
  location_zero$Mn55.MR., location_zero$Fe56.MR., location_zero$Fe57.MR., location_zero$Co59.MR.,
  location_zero$Ni60.MR., location_zero$Cu63.MR., location_zero$Cu65.MR., location_zero$Zn64.MR.,
  location_zero$Zn66.MR., location_zero$Rb85.MR., location_zero$Sr88.MR., location_zero$Mo98.MR.,
  location_zero$In115.MR., location_zero$Ba138.MR., location_zero$K39.HR., location_zero$In115.HR.)

# Matching standard-deviation (_SD) columns, in the same order
location_zero_SD <- data.frame(location_zero$Li7.LR._SD, location_zero$B11.LR._SD, location_zero$Ca43.LR._SD, location_zero$Rb85.LR._SD,
  location_zero$Sr88.LR._SD, location_zero$In115.LR._SD, location_zero$Na23.MR._SD, location_zero$Mg25.MR._SD,
  location_zero$Al27.MR._SD, location_zero$Si28.MR._SD, location_zero$P31.MR._SD, location_zero$Ca44.MR._SD,
  location_zero$Mn55.MR._SD, location_zero$Fe56.MR._SD, location_zero$Fe57.MR._SD, location_zero$Co59.MR._SD,
  location_zero$Ni60.MR._SD, location_zero$Cu63.MR._SD, location_zero$Cu65.MR._SD, location_zero$Zn64.MR._SD,
  location_zero$Zn66.MR._SD, location_zero$Rb85.MR._SD, location_zero$Sr88.MR._SD, location_zero$Mo98.MR._SD,
  location_zero$In115.MR._SD, location_zero$Ba138.MR._SD, location_zero$K39.HR._SD, location_zero$In115.HR._SD)

# Short labels; the three In115 columns keep their resolution suffix so no label repeats
x_axis <- c("Li7", "B11", "Ca43", "Rb85.LR", "Sr88.LR", "In115.LR", "Na23", "Mg25", "Al27", "Si28", "P31", "Ca44", "Mn55", "Fe56", "Fe57",
  "Co59", "Ni60", "Cu63", "Cu65", "Zn64", "Zn66", "Rb85.MR", "Sr88.MR", "Mo98", "In115.MR", "Ba138", "K39", "In115.HR")
y_axis <- c(-5, 10)
colnames(location_zero_LMHR) <- x_axis
colnames(location_zero_SD) <- x_axis

# Each jpeg() opens a new device, so each plot needs its own dev.off() to finish writing its file
jpeg("SoluteBoxes1.jpg", width = 2000, height = 1000)
boxplot(log10(location_zero_LMHR), main = "Log10 LMHR Solute concentration ranges for location 0",
  xlab = "Solutes", ylab = "log10(concentration)", ylim = y_axis)
dev.off()

jpeg("SoluteBoxes2.jpg", width = 2000, height = 1000)
boxplot(log10(location_zero_SD), main = "Log10 SD Solute concentration ranges for location 0",
  xlab = "Solutes", ylab = "log10(concentration)", ylim = y_axis)
dev.off()
</code></pre>
Returns: [images: SoluteBoxes1.jpg, SoluteBoxes2.jpg]
However, this is dirty: the data has values like -9999, and I reckon those should be scrubbed.
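As a preview, here is a minimal sketch of one way to do that scrub, on made-up numbers: swap the sentinel for `NA` before taking logs. The `toy` data frame is invented, and treating -9999 as a "no data" code is an assumption about this dataset, not something checked against its documentation here.

```r
# Invented values; -9999 stands in for the suspected "no data" code.
toy <- data.frame(
  Li7  = c(1.2, -9999, 0.9, 1.5),
  Na23 = c(310, 295, -9999, 402)
)

# log10() of a negative number is NaN (with a warning), so the sentinel pollutes the plot input.
suppressWarnings(log10(-9999))

# Replace the sentinel with NA everywhere; boxplot() ignores NAs when computing its statistics.
scrubbed <- toy
scrubbed[scrubbed == -9999] <- NA

# How many values were scrubbed per column.
colSums(is.na(scrubbed))

boxplot(log10(scrubbed), xlab = "Solutes", ylab = "log10(concentration)")
```

Counting the scrubbed values per column first is a cheap way to see whether a solute has enough real observations left to be worth a box.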
"Next Time"