Correlation Analysis

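📌 Correlation analysis

The linear relationship between two variables is called correlation, and the analysis of that relationship is called correlation analysis.

The two variables are usually assumed to be continuous.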

> library(MASS)
> str(cats)  # body weight and heart weight of cats, by sex
'data.frame':	144 obs. of  3 variables:
 $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ Bwt: num  2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
 $ Hwt: num  7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...

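We use correlation analysis to examine the relationship between cats' body weight and heart weight. Before running the analysis, it helps to get a rough picture of the relationship from a scatter plot.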

> plot(cats$Hwt ~ cats$Bwt, col="forestgreen", pch=19,
+      xlab="Body weight (kg)", ylab="Heart weight (g)",
+      main="Body weight and Heart weight of Cats")

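📌 Correlation coefficient
Measures the strength of the linear relationship between two variables.

Its value ranges from -1 to +1.

-1: perfect negative correlation

+1: perfect positive correlation

0: no linear relationship between the two variables

It is symmetric: the correlation coefficient of x and y equals that of y and x.

It is unaffected by linear transformations of the variables (shifting, or rescaling by a positive factor).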
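
The properties listed above can be checked numerically. The following is a minimal sketch, assuming the cats data from the MASS package used in this post; the variable names (x, y, r_manual and so on) are illustrative and not part of the original.

```r
library(MASS)  # provides the cats data set

x <- cats$Bwt  # body weight (kg)
y <- cats$Hwt  # heart weight (g)

# Pearson's r from its definition: the sum of cross-products of deviations,
# scaled by the square root of the product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_xy  <- cor(x, y)                 # built-in computation
r_yx  <- cor(y, x)                 # symmetry: swapping the arguments changes nothing
r_lin <- cor(1000 * x, 2 * y + 5)  # shifting and positive rescaling leave r unchanged
r_neg <- cor(-x, y)                # a negative factor only flips the sign

print(c(manual = r_manual, xy = r_xy, yx = r_yx, linear = r_lin, negated = r_neg))
```

The first four values should coincide, and the last should be their negative.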

> cor(cats$Bwt, cats$Hwt)
[1] 0.8041274

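The correlation coefficient between cats' body weight and heart weight is 0.8041274, which is close to 1, so there is a strong positive linear relationship.
The method argument of cor() accepts one of three options: method=c("pearson", "kendall", "spearman").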
For continuous (interval- or ratio-scale) data, use "pearson", which is the default.

For ordinal-scale data, use "kendall" or "spearman".
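
To see how the three options compare on the same data, the sketch below computes each coefficient for the cats data. It assumes the MASS package is installed; the names methods and coefs are illustrative.

```r
library(MASS)  # provides the cats data set

methods <- c("pearson", "kendall", "spearman")

# Compute the body weight / heart weight correlation once per method;
# sapply() names the result after the method strings
coefs <- sapply(methods, function(m) cor(cats$Bwt, cats$Hwt, method = m))

print(round(coefs, 3))
```

The "pearson" entry reproduces the value returned by the plain cor() call above.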

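📝 Pearson and Spearman correlation coefficients

The Pearson correlation coefficient relies on the normality assumption.

The Spearman correlation coefficient is computed from ranks, so it can be used for ordinal data that do not satisfy the normality assumption, and it is less sensitive to outliers.

If the Pearson and Spearman coefficients differ greatly, the data may contain outliers that strongly influence the Pearson coefficient.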
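
A small made-up example illustrates the point about outliers. The data below are artificial (not from the original post): twenty points on a perfect line plus one extreme point.

```r
# Artificial data: y equals x for the first 20 points, then one extreme outlier
x <- 1:21
y <- c(1:20, -100)

r_pearson  <- cor(x, y, method = "pearson")   # uses the raw values
r_spearman <- cor(x, y, method = "spearman")  # uses ranks only
r_clean    <- cor(x[-21], y[-21])             # Pearson without the outlier

print(c(pearson = r_pearson, spearman = r_spearman, pearson_no_outlier = r_clean))
```

A single outlier is enough to turn the Pearson coefficient negative here, while the Spearman coefficient stays clearly positive because the outlier can shift the ranks only by a bounded amount.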

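The cor() function computes the correlation coefficient but does not test its significance. To test significance, use the cor.test() function.
Null hypothesis: the population correlation coefficient is 0.
Alternative hypothesis: the population correlation coefficient is not 0.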

> cor.test(cats$Bwt, cats$Hwt)

	Pearson's product-moment correlation

data:  cats$Bwt and cats$Hwt
t = 16.119, df = 142, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7375682 0.8552122
sample estimates:
      cor 
0.8041274

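The p-value is below 2.2e-16, so the null hypothesis is rejected.
That is, there is a significant positive correlation between cats' body weight and heart weight.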
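
The t statistic and degrees of freedom in the output follow from the sample correlation r and the sample size n through t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal sketch that recomputes them by hand (variable names are illustrative):

```r
library(MASS)  # provides the cats data set

r <- cor(cats$Bwt, cats$Hwt)  # sample correlation
n <- nrow(cats)               # number of observations (144)

# Test statistic for H0: population correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(t = t_stat, df = n - 2, p = p_value))
```

The printed t and df should agree with the cor.test() output shown above.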

> cor.test(cats$Bwt, cats$Hwt, alternative="greater", conf.level=0.99)

	Pearson's product-moment correlation

data:  cats$Bwt and cats$Hwt
t = 16.119, df = 142, p-value < 2.2e-16
alternative hypothesis: true correlation is greater than 0
99 percent confidence interval:
 0.7231755 1.0000000
sample estimates:
      cor 
0.8041274

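With alternative="greater":
Null hypothesis: the correlation coefficient is less than or equal to 0.
Alternative hypothesis: the correlation coefficient is greater than 0.
The p-value is below 2.2e-16, so the null hypothesis is rejected; that is, the correlation coefficient is greater than 0. This agrees with the two-sided result above.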
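
Instead of reading the numbers off the printed output, the individual pieces can be extracted from the object that cor.test() returns. The sketch below assumes the same one-sided call as above; res is an illustrative name.

```r
library(MASS)  # provides the cats data set

res <- cor.test(cats$Bwt, cats$Hwt, alternative = "greater", conf.level = 0.99)

res$estimate   # sample correlation coefficient
res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # p-value of the one-sided test
res$conf.int   # one-sided interval: the upper bound is fixed at 1
```

Because the alternative is one-sided, only the lower confidence bound carries information; the upper bound is always 1.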

> cor.test(~ Bwt + Hwt, data=cats)

	Pearson's product-moment correlation

data:  Bwt and Hwt
t = 16.119, df = 142, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7375682 0.8552122
sample estimates:
      cor 
0.8041274

An advantage of the formula interface is that the test can be applied to a subset of the data through the subset argument.
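
A minimal sketch of such a call, restricting the test to female cats and, for comparison, to male cats. The names res_f and res_m are illustrative.

```r
library(MASS)  # provides the cats data set

# Test the correlation for female cats only
res_f <- cor.test(~ Bwt + Hwt, data = cats, subset = (Sex == "F"))

# The same test for male cats only
res_m <- cor.test(~ Bwt + Hwt, data = cats, subset = (Sex == "M"))

print(res_f)
print(c(female = unname(res_f$estimate), male = unname(res_m$estimate)))
```
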
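For female cats only, the p-value is on the order of 0.0001, so the null hypothesis is rejected.
Body weight and heart weight are therefore also correlated among female cats.
However, cor.test() has the limitation that it can only test the correlation between two variables at a time.

📌 Correlation matrix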

> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ... 

> iris.cor <- cor(iris[-5])
> iris.cor

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

> class(iris.cor)  # matrix format
[1] "matrix" "array" 

> iris.cor["Petal.Width", "Petal.Length"]
[1] 0.9628654
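
Because the result is an ordinary matrix, it can also be reshaped into a list of variable pairs sorted by strength. This is a sketch using only base R; the data frame name pairs_df is illustrative.

```r
iris.cor <- cor(iris[-5])

# Keep each pair once by taking only the upper triangle (the matrix is symmetric)
ut <- upper.tri(iris.cor)

pairs_df <- data.frame(
  var1 = rownames(iris.cor)[row(iris.cor)[ut]],
  var2 = colnames(iris.cor)[col(iris.cor)[ut]],
  r    = iris.cor[ut],
  stringsAsFactors = FALSE
)

# Strongest relationships first, regardless of sign
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
print(pairs_df)
```
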

> library(psych)
> corr.test(iris[-5])
Call:corr.test(x = iris[-5])
Correlation matrix 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         1.00       -0.12         0.87        0.82
Sepal.Width         -0.12        1.00        -0.43       -0.37
Petal.Length         0.87       -0.43         1.00        0.96
Petal.Width          0.82       -0.37         0.96        1.00
Sample Size 
[1] 150
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         0.00        0.15            0           0
Sepal.Width          0.15        0.00            0           0
Petal.Length         0.00        0.00            0           0
Petal.Width          0.00        0.00            0           0

 To see confidence intervals of the correlations, print with the short=FALSE option

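First table: the correlation matrix.
Second table: the p-values of the correlation coefficients (entries above the diagonal are adjusted for multiple tests).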
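
For readers without the psych package, a similar table of p-values can be approximated in base R by running cor.test() on every pair and adjusting the results with p.adjust(). This is a sketch, not psych's implementation; it assumes the Holm adjustment, which is the documented default of corr.test().

```r
vars  <- names(iris)[1:4]
pairs <- combn(vars, 2)  # each column holds one pair of variable names

# Raw p-value of the Pearson test for each pair
p_raw <- apply(pairs, 2, function(v) cor.test(iris[[v[1]]], iris[[v[2]]])$p.value)
names(p_raw) <- apply(pairs, 2, paste, collapse = "-")

# Adjust for multiple testing (Holm's method)
p_holm <- p.adjust(p_raw, method = "holm")

print(round(cbind(raw = p_raw, holm = p_holm), 4))
```
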
📌 95% confidence intervals

> print(corr.test(iris[-5]), short=FALSE)
Call:corr.test(x = iris[-5])
Correlation matrix 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         1.00       -0.12         0.87        0.82
Sepal.Width         -0.12        1.00        -0.43       -0.37
Petal.Length         0.87       -0.43         1.00        0.96
Petal.Width          0.82       -0.37         0.96        1.00
Sample Size 
[1] 150
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length         0.00        0.15            0           0
Sepal.Width          0.15        0.00            0           0
Petal.Length         0.00        0.00            0           0
Petal.Width          0.00        0.00            0           0

 Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
            raw.lower raw.r raw.upper raw.p lower.adj upper.adj
Spl.L-Spl.W     -0.27 -0.12      0.04  0.15     -0.27      0.04
Spl.L-Ptl.L      0.83  0.87      0.91  0.00      0.81      0.91
Spl.L-Ptl.W      0.76  0.82      0.86  0.00      0.74      0.88
Spl.W-Ptl.L     -0.55 -0.43     -0.29  0.00     -0.58     -0.25
Spl.W-Ptl.W     -0.50 -0.37     -0.22  0.00     -0.51     -0.20
Ptl.L-Ptl.W      0.95  0.96      0.97  0.00      0.94      0.98
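
The raw.lower and raw.upper columns appear consistent with the usual normal-theory interval based on Fisher's z transformation, which is also what base R's cor.test() reports. A minimal sketch for one pair, with illustrative variable names:

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

r <- cor(x, y)
n <- length(x)

z  <- atanh(r)         # Fisher's z transformation of r
se <- 1 / sqrt(n - 3)  # approximate standard error on the z scale

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(c(lower = ci[1], r = r, upper = ci[2]), 2))
```

Rounded to two decimals, the bounds can be compared with the Spl.L-Spl.W row of the table above.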

📌 Visualizing the correlation matrix

> str(state.x77)
 num [1:50, 1:8] 3615 365 2212 2110 21198 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
  ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...
 
> cor(state.x77)
            Population     Income  Illiteracy    Life Exp     Murder     HS Grad      Frost        Area
Population  1.00000000  0.2082276  0.10762237 -0.06805195  0.3436428 -0.09848975 -0.3321525  0.02254384
Income      0.20822756  1.0000000 -0.43707519  0.34025534 -0.2300776  0.61993232  0.2262822  0.36331544
Illiteracy  0.10762237 -0.4370752  1.00000000 -0.58847793  0.7029752 -0.65718861 -0.6719470  0.07726113
Life Exp   -0.06805195  0.3402553 -0.58847793  1.00000000 -0.7808458  0.58221620  0.2620680 -0.10733194
Murder      0.34364275 -0.2300776  0.70297520 -0.78084575  1.0000000 -0.48797102 -0.5388834  0.22839021
HS Grad    -0.09848975  0.6199323 -0.65718861  0.58221620 -0.4879710  1.00000000  0.3667797  0.33354187
Frost      -0.33215245  0.2262822 -0.67194697  0.26206801 -0.5388834  0.36677970  1.0000000  0.05922910
Area        0.02254384  0.3633154  0.07726113 -0.10733194  0.2283902  0.33354187  0.0592291  1.00000000

> # Shows scatter plots, histograms, and correlation coefficients together
> pairs.panels(state.x77, pch=21, bg="red", hist.col="gold",
+              main="Correlation Plot of US States Data")

> library(corrgram)
> corrgram(state.x77, lower.panel=panel.shade,
+          upper.panel=panel.pie, text.panel=panel.txt,
+          order=TRUE, main="Corrgram of US States Data")

> cols <- colorRampPalette(c("red", "pink", "green", "blue"))
> corrgram(state.x77, col.regions=cols,
+          lower.panel=panel.pie, upper.panel=panel.conf,
+          text.panel=panel.txt, order=FALSE,
+          main="Corrgram of US States Data")

Reference

More information on this topic (correlation analysis) is available at https://velog.io/@revudn46/상관분석-상관계수