Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics. So let’s do an internet course: Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far, and if I need any more I’ll download whatever dataset types are described. The first module is the introduction and really only gives definitions (Module 1 Learning Objectives), all of which are quite nice and defined as follows (statistical terms). Statistic: a characteristic that describes a sample. Parameter: a characteristic that describes a population. Descriptive statistics: describing and communicating information simply. Inferential statistics: drawing conclusions from data and inferring. Population: a set of all individuals of interest. Sample: a selection…
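To make the statistic/parameter and descriptive/inferential split concrete, here is a minimal R sketch. The "population" is simulated with rnorm() purely for illustration (it is not course data), and the t-test interval is just one common way of doing the inferential step.

```r
# Hypothetical "population": 10,000 simulated values (not course data)
set.seed(123)
population <- rnorm(10000, mean = 50, sd = 10)

# Parameter: a characteristic describing the whole population
mu <- mean(population)

# Sample: a selection of 100 individuals drawn from the population
smp <- sample(population, size = 100)

# Statistic: the same characteristic, computed on the sample only
x_bar <- mean(smp)

# Descriptive statistics: summarize the sample we actually have
desc <- summary(smp)

# Inferential statistics: use the sample to say something about mu,
# here a 95% confidence interval from a one-sample t-test
ci <- t.test(smp)$conf.int

cat("parameter mu =", round(mu, 2), " statistic x_bar =", round(x_bar, 2), "\n")
cat("95% CI for mu:", round(ci, 2), "\n")
```

In real work mu is unknown; the simulation just lets you see the parameter and the statistic side by side.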

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in R by the replace argument of the sample function: replace = FALSE draws without replacement (ordinary sampling), while replace = TRUE resamples with replacement (bootstrapping). I've found this super adorable repository of small, cute little datasets about fish, including a set with fish weight and length. Two classic variables. Gotta love the classics. There are also three identifiers. This is great for what I'm doing: not huge and unwieldy for my skill set, so these sets will work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peek. General exploration: colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
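A minimal sketch of the replace = FALSE / replace = TRUE difference, plus a basic bootstrap of the mean. The weights vector is invented for illustration (it is not BGHRfish), and 1,000 replicates is an arbitrary choice.

```r
# Hypothetical fish weights (invented values, not the BGHRfish data)
weights <- c(120, 95, 210, 180, 60, 150, 130, 75)

set.seed(42)  # make the random draws reproducible

# Sampling WITHOUT replacement: each element can be picked at most once
subsample <- sample(weights, size = 5, replace = FALSE)

# One bootstrap resample: same size as the data, drawn WITH replacement,
# so some values repeat and others are left out
boot_one <- sample(weights, size = length(weights), replace = TRUE)

# Bootstrap distribution of the mean: repeat the resampling many times
boot_means <- replicate(1000, mean(sample(weights, replace = TRUE)))

# Bootstrap estimate of the standard error of the mean
boot_se <- sd(boot_means)

print(subsample)
print(boot_one)
cat("bootstrap SE of the mean:", round(boot_se, 2), "\n")
```

With df loaded, the same idea applies to a real column, e.g. replacing weights with df$weight after dropping missing values.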

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes 'n' stuff. I already did some in 'Boxes Over Time', but this time it will be as the book decrees. Rather than rework the same dataset, I've gotten a new one to pore over for this and perhaps the next couple of sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
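A small sketch of the usual first looks at a distribution in base R: percentiles, a binned frequency table, and the numbers a box plot is built from. The calcium values are invented (not the Rivendell data); boxplot.stats() returns the box plot statistics without drawing anything.

```r
# Hypothetical calcium concentrations from one stream site (invented values,
# not the Rivendell data)
ca <- c(2.1, 2.4, 2.2, 3.0, 2.8, 2.5, 9.5, 2.6, 2.3, 2.7)

# Percentiles: where given fractions of the data fall
pct <- quantile(ca, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))

# Frequency table: counts in equal-width bins (the basis of a histogram)
freq <- table(cut(ca, breaks = 4))

# The numbers a box plot draws: lower whisker, lower hinge, median,
# upper hinge, upper whisker, plus outliers beyond 1.5 * the hinge spread.
# boxplot(ca) would draw exactly these.
bs <- boxplot.stats(ca)

print(pct)
print(freq)
print(bs$stats)
print(bs$out)  # the 9.5 reading is flagged as an outlier
```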

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it, but I ain't gonna do it. Before I walk away from it, I wanted to do one more distribution chart: soil type by area, done functionally. Before, I had a big ol' loop to do it for me. This time, with subsets, it's down to a 'handful' of lines, and graphed. SmallTreeData
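One way the "subsets instead of a big loop" idea can look, as a sketch. It assumes the one-hot Wilderness_Area and Soil_Type columns of the Kaggle cover type data; the tiny data frame below is a hypothetical stand-in for SmallTreeData with invented values.

```r
# Hypothetical stand-in for SmallTreeData: one-hot area and soil columns
# named as in the Kaggle cover type data (values invented)
SmallTreeData <- data.frame(
  Wilderness_Area1 = c(1, 1, 0, 0, 1),
  Wilderness_Area2 = c(0, 0, 1, 1, 0),
  Soil_Type1 = c(1, 0, 0, 1, 0),
  Soil_Type2 = c(0, 1, 1, 0, 0),
  Soil_Type3 = c(0, 0, 0, 0, 1)
)

# Pick the column groups by name pattern instead of hard-coding them
area_cols <- grep("^Wilderness_Area", names(SmallTreeData), value = TRUE)
soil_cols <- grep("^Soil_Type", names(SmallTreeData), value = TRUE)

# For each area: subset its rows, then count rows per soil type.
# Result is a matrix with soil types as rows and areas as columns.
soil_by_area <- sapply(area_cols, function(a) {
  colSums(SmallTreeData[SmallTreeData[[a]] == 1, soil_cols, drop = FALSE])
})

print(soil_by_area)
```

To graph it, something like barplot(soil_by_area, beside = TRUE, legend.text = TRUE) draws grouped bars of soil counts per area.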

Boxes Over Time

Here are some sequential box plots of temperature for each year.
# RStudio errors out on large graph sizes, so write it to a file instead
png("rplot.png", width = 4096, height = 4096)
par(mfrow = c(6,5))
for(e in sort(unique(df$year))) { temper
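An alternative to the per-year loop over a 6 x 5 panel grid, sketched with the formula interface of boxplot(), which puts one box per year on a single panel. The loop in the excerpt is cut off at "temper", so the column name temperature and the simulated data are assumptions.

```r
# Hypothetical daily temperatures for three years; the column names year and
# temperature are assumptions (the original loop is cut off at "temper")
set.seed(1)
df <- data.frame(
  year = rep(2019:2021, each = 365),
  temperature = rnorm(3 * 365, mean = 12, sd = 8)
)

# One box per year via the formula interface. plot = FALSE returns the
# numbers instead of drawing; drop it (after opening png(...)) to plot.
bp <- boxplot(temperature ~ year, data = df, plot = FALSE)

# Each column of bp$stats is one year: lower whisker, lower hinge,
# median, upper hinge, upper whisker
colnames(bp$stats) <- bp$names
print(round(bp$stats, 1))
```

Whether one wide panel reads better than 30 small ones is a judgment call; the grid does give each year its own y-axis.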

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more... brute-force coding than actual R coding: too much for, and if, and while, and whatever nonsense. In this one I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is, and I ain't about to fall into any wiki holes at the moment, but I did try to use more of R's built-in functions rather than all the loops…
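A short sketch of what "functions are themselves values" looks like in base R: a loop next to its vapply() equivalent, a list of functions applied in one call, and Reduce() folding a function over a vector. The numbers are arbitrary. (Referentially transparent, loosely, means a call can be replaced by its result because it depends only on its inputs and has no side effects; square() below is an example.)

```r
# Loop version: grow an accumulator step by step
squares_loop <- numeric(0)
for (i in 1:5) squares_loop <- c(squares_loop, i^2)

# Functional version: pass a function (a value) to vapply,
# which also checks that each result is a single number
square <- function(x) x^2
squares_fun <- vapply(1:5, square, numeric(1))

# Functions stored in a named list and applied to the same data
summaries <- list(mean = mean, median = median, sd = sd)
x <- c(4, 8, 15, 16, 23, 42)
results <- vapply(summaries, function(f) f(x), numeric(1))

# Reduce folds a two-argument function over a vector: ((1 + 2) + 3) + ...
total <- Reduce(`+`, 1:5)

print(squares_fun)
print(results)
print(total)
```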

Practical Statistics – 2

Estimates of Location and Variability. A 'typical value', average, or estimate of where most of the data is located provides a central tendency for variables with measured data, telling us where most values for a particular phenomenon will be found. These tendencies are the mean and the median in their various forms. Average elevation for each area: for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
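A sketch of per-area location and variability estimates that looks columns up with [[ ]] instead of building names with paste() and assign(). The Elevation and Wilderness_Area column names follow the Kaggle cover type data, but the small data frame and its values are hypothetical, and the 20% trim is an arbitrary choice.

```r
# Hypothetical stand-in for SmallTreeData: an Elevation column and two
# one-hot area columns (values invented for illustration)
SmallTreeData <- data.frame(
  Elevation = c(2800, 2900, 2950, 3000, 3600, 2300, 2350, 2400, 2450, 2500),
  Wilderness_Area1 = rep(c(1, 0), each = 5),
  Wilderness_Area2 = rep(c(0, 1), each = 5)
)
area_cols <- paste0("Wilderness_Area", 1:2)

# [[ ]] looks a column up by its name string, so no assign()/paste() needed
elev_in <- function(a) SmallTreeData$Elevation[SmallTreeData[[a]] == 1]

# Location: mean, median, and a trimmed mean (drops 20% from each end)
location <- sapply(area_cols, function(a) {
  x <- elev_in(a)
  c(mean = mean(x), median = median(x), trimmed = mean(x, trim = 0.2))
})

# Variability: standard deviation, interquartile range, and
# median absolute deviation (scaled by 1.4826 by default)
variability <- sapply(area_cols, function(a) {
  x <- elev_in(a)
  c(sd = sd(x), iqr = IQR(x), mad = mad(x))
})

print(location)
print(round(variability, 2))
```

In area 1 the single high elevation pulls the mean (3050) above the median and trimmed mean (both 2950), which is the usual argument for the robust estimates.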

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book: To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science. To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis, using the Kaggle Forest Cover Type Prediction data set. Roosevelt National Forest: located in north central Colorado, northwest of Denver, Colorado, in the southern Rocky Mountain Region, the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData
Read More

Boxes Over Time

Here's some sequential box plots for temperature over each year. #RStudio errors out on large graphs sizes, write it to file instead png("rplot.png", width = 4096, height = 4096) par(mfrow = c(6,5)) for(e in sort(unique(df$year))) { temper
Read More

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more.....brute force coding than actual R coding. to much for, and if, and while whatever, nonsense. In this I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is and I ain't about to fall into any wiki holes at the moment but I did try to use more of R's built in functions rather than all the loops…
Read More

Practical Statistics – 2

Estimates of Location and Variability A 'Typical Value', average, or estimate of where most of the data is located provides a central tendency for variables with measure data. Telling us where most values for a particular phenomenon will be found. These tendencies are the Mean and the Median in their various forms Average elevation for each area for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
Read More

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book:To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis Using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north central Colorado, northwest of Denver Colorado, in the southern Rocky Mountain Region the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData
Read More

Boxes Over Time

Here's some sequential box plots for temperature over each year. #RStudio errors out on large graphs sizes, write it to file instead png("rplot.png", width = 4096, height = 4096) par(mfrow = c(6,5)) for(e in sort(unique(df$year))) { temper
Read More

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more.....brute force coding than actual R coding. to much for, and if, and while whatever, nonsense. In this I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is and I ain't about to fall into any wiki holes at the moment but I did try to use more of R's built in functions rather than all the loops…
Read More

Practical Statistics – 2

Estimates of Location and Variability A 'Typical Value', average, or estimate of where most of the data is located provides a central tendency for variables with measure data. Telling us where most values for a particular phenomenon will be found. These tendencies are the Mean and the Median in their various forms Average elevation for each area for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
Read More

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book:To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis Using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north central Colorado, northwest of Denver Colorado, in the southern Rocky Mountain Region the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData
Read More

Boxes Over Time

Here's some sequential box plots for temperature over each year. #RStudio errors out on large graphs sizes, write it to file instead png("rplot.png", width = 4096, height = 4096) par(mfrow = c(6,5)) for(e in sort(unique(df$year))) { temper
Read More

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more.....brute force coding than actual R coding. to much for, and if, and while whatever, nonsense. In this I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is and I ain't about to fall into any wiki holes at the moment but I did try to use more of R's built in functions rather than all the loops…
Read More

Practical Statistics – 2

Estimates of Location and Variability A 'Typical Value', average, or estimate of where most of the data is located provides a central tendency for variables with measure data. Telling us where most values for a particular phenomenon will be found. These tendencies are the Mean and the Median in their various forms Average elevation for each area for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
Read More

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book:To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis Using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north central Colorado, northwest of Denver Colorado, in the southern Rocky Mountain Region the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData
Read More

Boxes Over Time

Here's some sequential box plots for temperature over each year. #RStudio errors out on large graphs sizes, write it to file instead png("rplot.png", width = 4096, height = 4096) par(mfrow = c(6,5)) for(e in sort(unique(df$year))) { temper
Read More

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more.....brute force coding than actual R coding. to much for, and if, and while whatever, nonsense. In this I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is and I ain't about to fall into any wiki holes at the moment but I did try to use more of R's built in functions rather than all the loops…
Read More

Practical Statistics – 2

Estimates of Location and Variability A 'Typical Value', average, or estimate of where most of the data is located provides a central tendency for variables with measure data. Telling us where most values for a particular phenomenon will be found. These tendencies are the Mean and the Median in their various forms Average elevation for each area for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
Read More

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book:To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis Using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north central Colorado, northwest of Denver Colorado, in the southern Rocky Mountain Region the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData
Read More

Boxes Over Time

Here's some sequential box plots for temperature over each year. #RStudio errors out on large graphs sizes, write it to file instead png("rplot.png", width = 4096, height = 4096) par(mfrow = c(6,5)) for(e in sort(unique(df$year))) { temper
Read More

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more.....brute force coding than actual R coding. to much for, and if, and while whatever, nonsense. In this I hoped to do more functional programming: "....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...." I don't know what referentially transparent is and I ain't about to fall into any wiki holes at the moment but I did try to use more of R's built in functions rather than all the loops…
Read More

Practical Statistics – 2

Estimates of Location and Variability A 'Typical Value', average, or estimate of where most of the data is located provides a central tendency for variables with measure data. Telling us where most values for a particular phenomenon will be found. These tendencies are the Mean and the Median in their various forms Average elevation for each area for (e in seq(1:4)) { assign("x", paste("SmallTreeData$Wilderness_Area", e, sep="")) z
Read More

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book:To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.To explain which concepts are important and useful from a data science perspective, which are less so, and why....” Part 1: Exploratory Data Analysis Using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north central Colorado, northwest of Denver Colorado, in the southern Rocky Mountain Region the forest is a mixed coniferous forest occupying the…
Read More

Biostatistics Tutorial Full course for Beginners to Experts

Module 1 - Introduction to Statistics So let’s do an internet course; Academic Lesson’s “Biostatistics Tutorial Full course for Beginners to Experts”. But we’ll use the data I’ve collected so far and if I need any more I’ll download any missing dataset types described. The first module is the introduction and only really gives definitions: Module 1 Learning Objectives All of Which is all quite nice and defined as: Statistical Statistic: A characteristic that describes a sample Parameter: A characteristic that describes a population Descriptive Statistics, describing and communicating information simply Inferential Statistics, drawing conclusions from data and inferring Population: A set of all individuals of interest Sample: A selection…
Read More

Practical Statistics – 6

It would appear sampling and bootstrapping are differentiated in r by: replace = TRUE/FALSE in the sample function. Given this super adorable repository of small cute little sets about fish. I've found a set with Fish weight and length. Two classic variables. Gotta love the classics. There's also three identifiers. This is great for what I'm doing - not being huge and unwieldy for my skillset - they'll work wonderfully for the small sections in the books. Here we have: "FSAdata::BGHRfish – view download documentation – Fish information from samples collected from Big Hill Reservoir, KS, 2014." Let's take a peak: General Exploration colnames(df) [1] "UID" "fishID" "specCode" "length" "weight"…
Read More

Practical Statistics – 5

"….It is also useful to explore how the data is distributed…." So moving forward we get to make some boxes'n stuff, though done in 'boxes over time', this will be as the book decrees. Rather than re-work the same dataset I've gotten a new one to pour over for this and perhaps the next couple sections: "….Rivendell - Rainfall Chemistry, Stream Water Chemistry, Throughfall Chemistry, Groundwater Chemistry - Solute chemistry (2007-2015) Water samples from rainfall, stream, and wells. Major solutes (Ca, Mg, Na, K, and Si). Description: Significant solute flux from the weathered bedrock zone - which underlies soils and saprolite - has been suggested by many studies. However, controlling…
Read More

Practical Statistics – 4

Truth of the matter is I'm burnt out on this set. I suppose there is more that can be done with it but I ain't gonna do it. Before I walk away from it I wanted to do one more distribution chart, soil type by area - Functionally. Before I had a big 'ol loop to do it for me. This time with subsets It's down to a 'handful' of lines, and graphed. SmallTreeData

Boxes Over Time

Here are some sequential box plots of temperature for each year.

# RStudio errors out on large graph sizes, so write it to a file instead
png("rplot.png", width = 4096, height = 4096)
par(mfrow = c(6,5))
for(e in sort(unique(df$year))) { temper
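The excerpt's loop is cut off, so here's a complete sketch of the same pattern: open a png() device, split it into panels with par(mfrow), draw one boxplot per group, and close the device with dev.off() so the file actually gets written. It uses the built-in airquality data with Month standing in for the post's year column; the file name comes from tempfile() rather than rplot.png, and the image size is arbitrary.

```r
# Stand-in data: the built-in airquality data (daily readings, May to
# September 1973); Month plays the role of the post's year column
df <- airquality
groups <- sort(unique(df$Month))

# Write to a file rather than the RStudio plot pane; tempfile() is used
# here so the sketch doesn't overwrite anything
out_file <- tempfile(fileext = ".png")
png(out_file, width = 1600, height = 400)

# One panel per group in a single row (the post uses a 6 x 5 grid instead)
par(mfrow = c(1, length(groups)))

# Shared y-limits so the panels can be compared by eye
ylim <- range(df$Temp)
for (m in groups) {
  boxplot(df$Temp[df$Month == m], ylim = ylim,
          main = paste("Month", m), ylab = "Temperature (F)")
}

# Close the device; the file isn't complete until this runs
invisible(dev.off())

# Formula alternative: one call groups the data and draws every box on a
# single shared axis
boxplot(Temp ~ Month, data = df, xlab = "Month", ylab = "Temperature (F)")
```

The formula version is the loop-free option: boxplot() does the grouping itself, though every box then shares one panel instead of getting its own.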

Graphs and Functional Programming

One of my concerns with my recent efforts was that it felt like I was doing more... brute-force coding than actual R coding: too many for, if, and while loops and whatever other nonsense. In this one I hoped to do more functional programming:

"....Functional programming is a programming paradigm where values are passed around into functions, and those functions are themselves values. Functional programmers often try to make large parts of their code referentially transparent...."

I don't know what referentially transparent means, and I ain't about to fall into any wiki holes at the moment, but I did try to use more of R's built-in functions rather than all the loops…
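A small sketch of the loop-versus-functional contrast in base R, using mtcars as stand-in data. vapply() takes the function mean as an argument (functions as values). The last two definitions illustrate referential transparency in its usual sense: a function is referentially transparent when any call can be replaced by its result without changing the program. That holds for scale_by() but not for scale_and_count(), which reads and modifies global state. Both function names are made up for the example.

```r
# Loop style: grow a named vector one column at a time
col_means_loop <- numeric(0)
for (name in names(mtcars)) {
  col_means_loop[name] <- mean(mtcars[[name]])
}

# Functional style: mean() itself is passed as a value to vapply(), which
# applies it to every column and checks that each result is a single number
col_means_fun <- vapply(mtcars, mean, numeric(1))

# Functions are values, so they can sit in a list and be applied in turn
summaries <- list(mean = mean, median = median, sd = sd)
mpg_summary <- vapply(summaries, function(f) f(mtcars$mpg), numeric(1))
print(mpg_summary)

# Referentially transparent: the result depends only on the arguments, so
# any call could be swapped for its value without changing the program
scale_by <- function(x, k) x * k

# Not referentially transparent: it reads and modifies a global counter,
# so the same call returns a different value each time
counter <- 0
scale_and_count <- function(x, k) {
  counter <<- counter + 1
  x * k * counter
}
```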

Practical Statistics – 2

Estimates of Location and Variability

A 'typical value', average, or estimate of where most of the data is located provides a central tendency for variables with measured data, telling us where most values for a particular phenomenon will be found. These tendencies are the mean and the median in their various forms.

Average elevation for each area
for (e in seq(1:4)) {
  assign("x", paste("SmallTreeData$Wilderness_Area", e, sep=""))
  z
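A sketch of the kind of location and variability estimates the book lists, using mtcars$mpg as a stand-in variable, followed by a loop-free version of "average elevation for each area". One thing worth checking in the excerpt's loop: assign("x", paste(...)) stores the text "SmallTreeData$Wilderness_Area1", not the column, so it only works if the truncated code later evaluates that string (for example with eval(parse())). Indexing with [[ ]] and a pasted column name avoids that. The trees data frame and its elevations are hypothetical, and weighting mpg by car weight is only there to show weighted.mean().

```r
# Stand-in numeric variable: mpg from the built-in mtcars data
x <- mtcars$mpg

# Estimates of location
location <- c(
  mean     = mean(x),
  median   = median(x),                       # middle value, robust to outliers
  trimmed  = mean(x, trim = 0.1),             # drops 10% from each end first
  weighted = weighted.mean(x, w = mtcars$wt)  # weights picked only to illustrate
)

# Estimates of variability
variability <- c(
  variance = var(x),
  sd       = sd(x),
  iqr      = IQR(x),
  mad      = mad(x)  # median absolute deviation, scaled by 1.4826 by default
)
print(round(location, 2))
print(round(variability, 2))

# Per-area means without assign()/paste(). Hypothetical toy data in the
# post's one-hot layout (these are not real elevations):
trees <- data.frame(
  Elevation        = c(2600, 2800, 3100, 2900, 3300, 2700),
  Wilderness_Area1 = c(1, 1, 0, 0, 0, 0),
  Wilderness_Area2 = c(0, 0, 1, 1, 1, 1)
)

# [[ ]] with a pasted name returns the column itself, not a text string
area_means <- vapply(1:2, function(e) {
  in_area <- trees[[paste0("Wilderness_Area", e)]] == 1
  mean(trees$Elevation[in_area])
}, numeric(1))
names(area_means) <- paste0("Wilderness_Area", 1:2)
print(area_means)
```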

Practical Statistics – 1

“....This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics....” “....The two goals of this book: To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science. To explain which concepts are important and useful from a data science perspective, which are less so, and why....”

Part 1: Exploratory Data Analysis, using the Kaggle Cover Types Predictions Data Set. Roosevelt National Forest: Located in north-central Colorado, northwest of Denver, Colorado, in the southern Rocky Mountain Region, the forest is a mixed coniferous forest occupying the…