《生物统计学》上机内容（四）

教学内容

本节课主要介绍如何运用R语言进行随机区组单因素方差检验，是单因素方差分析的扩展。
以下内容仅用于北京林业大学生态与自然保护学院《生物统计学》的第四节上机内容。

参考资料

https://rcompanion.org/handbook/
《R语言与统计分析》（汤银才著）

数据下载链接

lesson_04_one_way_aov_with_blocks.csv

R语言教程

# 预处理
# 清理内存
rm(list = ls())
gc()

##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 461134 24.7     995700 53.2   643651 34.4
## Vcells 828707  6.4    8388608 64.0  1649944 12.6

# 加载 packages
# 如果packages加载没成功
# install.packages("agricolae")
# install.packages("car")
# install.packages("psych")
# install.packages("skimr")
library(agricolae)

## Warning: 程辑包'agricolae'是用R版本4.1.1 来建造的

library(car)

## 载入需要的程辑包：carData

library(psych)

## Warning: 程辑包'psych'是用R版本4.1.1 来建造的

##
## 载入程辑包：'psych'

## The following object is masked from 'package:car':
##
##     logit

library(skimr)

## Warning: 程辑包'skimr'是用R版本4.1.1 来建造的

# 加载工作区域
# 需要在setwd里面设置本地磁盘对应的文件路径
setwd("F:/2018年生物统计课程/数据")
getwd()

## [1] "F:/2018年生物统计课程/数据"

随机区组单因素方差检验 One-way ANOVA with Random Blocks

练习题1：研究4种修剪方式A（对照）、B、C、D（k=4）对果树单株产量（kg/株）的影响，4次重复（n=4）,随机完全区组设计，其产量结果如下，试作方差分析？

Table 1. 修剪方式的影响

处理	区组1	区组2	区组3	区组4
A （对照）	25	23	27	26
B	32	27	26	31
C	21	19	20	22
D	20	21	18	21

数据预处理过程

# 加载数据
# 需要在read.csv里面设置本地磁盘对应的文件路径
clipper.data <- read.csv("F:/2018年生物统计课程/数据/lesson_04_one_way_aov_with_blocks.csv")

# 将处理treatment变为处理因子
# 将区组block变为处理因子
clipper.data$treatment <- as.factor(clipper.data$treatment)
clipper.data$block <- as.factor(clipper.data$block)

# 统计描述过程
# 查看数据格式
head(clipper.data) # 展示数据前六行

##   treatment block biomass
## 1         A     1      25
## 2         B     1      32
## 3         C     1      21
## 4         D     1      20
## 5         A     2      23
## 6         B     2      27

str(clipper.data) # 展示数据格式

## 'data.frame':    16 obs. of  3 variables:
##  $ treatment: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ block    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
##  $ biomass  : int  25 32 21 20 23 27 19 21 27 26 ...

# 使用skimr包进行更具体的统计描述
skim(clipper.data)

Data summary

Name	clipper.data
Number of rows	16
Number of columns	3
————————
Column type frequency:
factor	2
numeric	1
————————
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
treatment	0	1	FALSE	4	A: 4, B: 4, C: 4, D: 4
block	0	1	FALSE	4	1: 4, 2: 4, 3: 4, 4: 4

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
biomass	0	1	23.69	4.19	18	20.75	22.5	26.25	32	▆▇▅▃▃

方差分析过程

# 方差齐性和正太性检验的具体过程
# step1：使用leveneTest进行方差齐性检验
leveneTest(biomass ~ treatment, data = clipper.data)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value  Pr(>F)
## group  3  3.1935 0.06259 .
##       12
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#  step2：使用qqPlot对数据进行qq图绘制
# 对数据进行线性拟合
#library(car)
fit.lm <- lm(biomass ~ treatment + block, data = clipper.data)

qqPlot(fit.lm, # 拟合方程
       simulate = TRUE, # 分布模拟？
       labels = TRUE, # 对异常值进行标记？
       col = "red", # 点的颜色
       main = "qqplot", # 总标题
       ylab = "Studentized residual" # y轴标题
       )

## [1] 8 9

# step3：使用qqPlot对数据进行正态性检验
# 一般而言，
# 当分析大于50行的大样本数据时，用Kolmogorov-Smirnov检验；
# 当分析小于50行的小样本数据时，用Shapiro-Wilk 检验；
# shapiro.test(x)
# 使用tapply功能对数据进行shapiro.test的批量处理
tapply(clipper.data$biomass, clipper.data$treatment, shapiro.test)

## $A
##
##  Shapiro-Wilk normality test
##
## data:  X[[i]]
## W = 0.97137, p-value = 0.85
##
##
## $B
##
##  Shapiro-Wilk normality test
##
## data:  X[[i]]
## W = 0.88207, p-value = 0.3476
##
##
## $C
##
##  Shapiro-Wilk normality test
##
## data:  X[[i]]
## W = 0.99291, p-value = 0.9719
##
##
## $D
##
##  Shapiro-Wilk normality test
##
## data:  X[[i]]
## W = 0.82743, p-value = 0.1612

# 方差分析的具体过程
# step1：使用lm对数据进行线性拟合
# fit.lm <- lm(biomass ~ treatment, data = clipper.data)

# step2: 使用car包的Anova对数据进行方差分析
# SPSS默认的方差分析类型为III型
# library(car)
Anova(fit.lm, type = "III")

## Anova Table (Type III tests)
##
## Response: biomass
##              Sum Sq Df  F value    Pr(>F)
## (Intercept) 1552.58  1 516.3316 2.938e-09 ***
## treatment    217.69  3  24.1316 0.0001226 ***
## block         18.69  3   2.0716 0.1743753
## Residuals     27.06  9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# step3: 使用agricolae包的Tukey检验对数据进行多重比较
# library(agricolae)
comparison <- HSD.test(fit.lm, # 拟合的模型
                       "treatment", # 处理因子
                       group = TRUE,
                       main = "Tukey comparison")

# 展示Tukey分析的结果
comparison

## $statistics
##    MSerror Df    Mean       CV     MSD
##   3.006944  9 23.6875 7.320546 3.82783
##
## $parameters
##    test    name.t ntr StudentizedRange alpha
##   Tukey treatment   4          4.41489  0.05
##
## $means
##   biomass      std r Min Max   Q25  Q50   Q75
## A   25.25 1.707825 4  23  27 24.50 25.5 26.25
## B   29.00 2.943920 4  26  32 26.75 29.0 31.25
## C   20.50 1.290994 4  19  22 19.75 20.5 21.25
## D   20.00 1.414214 4  18  21 19.50 20.5 21.00
##
## $comparison
## NULL
##
## $groups
##   biomass groups
## B   29.00      a
## A   25.25      a
## C   20.50      b
## D   20.00      b
##
## attr(,"class")
## [1] "group"

# 使用boxplot函数对数据进行盒状图的构建
boxplot(biomass ~ treatment, # 因变量与自变量之间的关系 y ~ x
        data = clipper.data, # 数据来源
        ylim = c(10, 40), # y轴范围
        xlab = "Clipping style", # y轴标题
        ylab = "Biomass (kg)" # x轴标题
        )

# comparison$groups的数据来源于Tukey检验的结果，将用于作图
comparison$groups

##   biomass groups
## B   29.00      a
## A   25.25      a
## C   20.50      b
## D   20.00      b

# 在comparison$groups里面添加文本坐标的关键信息
# 首先确定文本的x轴位置
comparison$groups$x.pos <- c(2, 1, 3, 4)

# 在添加在盒状图中添加Tukey的结果
text(x = comparison$groups$x.pos, # 字符的x轴位置
     y = comparison$groups$biomass + 5, # 字符的y轴位置
     labels = comparison$groups$groups # 输出本文字符
      )

教学内容

参考资料

数据下载链接

R语言教程

随机区组单因素方差检验 One-way ANOVA with Random Blocks

数据预处理过程

方差分析过程

你可能也喜欢

《生物统计学》-投币游戏

《生物统计学》上机内容（二）

《生物统计学》上机内容（五）