How I build my own calculator that groups any data set into classes using R


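When I was in college, I remember vividly that one of our lecturers loved to keep us busy with far too many assignments. One time we were given 20 different data sets. They were large, and the task was to group each of them and calculate the necessary statistical values for each one.
I must say I was a bit of a lazy boy. I didn't want to stress myself repeating something whose basics I already understood. So I thought about how to make it faster, but I couldn't see a better way by hand. Luckily, I figured I should be able to build a calculator that could do this for me. That was how I got the idea. LOL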

How to group a large data set into classes and calculate all the important statistical values

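First of all, so that my code doesn't look like magic: if you don't know how to group a data set, kindly read up on that before continuing. And before I show you the code, let me explain the steps I took to arrive at this simple project.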

Steps taken to build a calculator that groups a large data set into classes and calculates some important statistical values

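Step 1: I need code that groups the data into classes with a given class width, so that I can build the grouped frequency table.
Step 2: I need to calculate the class boundaries. This is done by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit of each class.
Step 3: I need to write code that calculates the class mark (x), which is the midpoint of each class.
Step 4: I need code that multiplies each class mark by its frequency (fx).
Step 5: I need code that calculates the mean as the sum of fx divided by the sum of the frequencies.
Step 6: I need code that gets the deviations and their squares.
Step 7: I need to calculate the variance and the standard deviation using the formula.
As you can see, everything listed above is what I want my calculator to do for me. Let's start picking the steps off one by one.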
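Before any grouping code is written, the arithmetic in Steps 3 to 7 can be sketched on a table that is already grouped. This is only an illustration: the class limits and frequencies are the ones the example later in this article produces, the variable names are made up for this sketch, and the variance divides by the total frequency (the population formula), which is an assumption because the steps do not say which formula is meant.

```r
# Grouped table for illustration (limits and frequencies from this article's example)
lower <- c(0, 10, 20, 30, 40, 50)   # lower class limits
upper <- lower + 9                  # upper class limits (class width 10)
f     <- c(1, 13, 8, 3, 4, 1)       # class frequencies

x  <- (lower + upper) / 2           # Step 3: class marks (midpoints)
fx <- f * x                         # Step 4: frequency times class mark
grouped_mean <- sum(fx) / sum(f)    # Step 5: mean = sum(fx) / sum(f)

deviation    <- x - grouped_mean    # Step 6: deviations from the mean
deviation_sq <- deviation^2         #         and their squares

# Step 7: variance and standard deviation (population formula, assumed)
grouped_var <- sum(f * deviation_sq) / sum(f)
grouped_sd  <- sqrt(grouped_var)

print(c(mean = grouped_mean, variance = grouped_var, sd = grouped_sd))
```

If the sample formula is wanted instead, the only change would be dividing by `sum(f) - 1` in the variance line.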

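How to write code that groups a data set into classes using a given width.

I'm writing the code as a function because I need it to be reusable: with everything inside the function, I can call it whenever I need it. If you don't know how to group data, please read my article on that before looking at the code, since there is a lot to explain here. Let's look at the following code.
Code>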

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth # first lower class limit: largest multiple of classwidth not above the minimum
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth # first multiple of classwidth above the maximum
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # label for each class, e.g. "10 - 19"
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # tabulate the data into the classes
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  mytable
}

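The code above is only a function definition. Running it produces no output, because the function has not been called yet. Let's try it with the following data.

Example.

Construct a grouped frequency table for the following scores of 30 students.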
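24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13, 13
14, 16, 13, 16, 13, 18, 19, 7, 9, 8, 6, 20, 26, 28, 29, 30, 18, 19, 15, 17, 12, 14, 15, 14, 16, 13, 12, 14, 13
Only the first 30 values are used in the code below.
Code>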

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth # first lower class limit: largest multiple of classwidth not above the minimum
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth # first multiple of classwidth above the maximum
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # label for each class, e.g. "10 - 19"
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # tabulate the data into the classes
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  mytable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function with a class width of 10
createGroupTable(score,10)

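Result>

     Var1 Freq
1   0 - 9    1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1
7 60 - 69    0

If you try this with different data sets and class widths of your choice, you will notice that there is always an extra class at the end that we don't need at all, so the code isn't perfect yet. I need to adjust it by removing the last row.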
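To see where that extra class comes from, the limits can be recomputed by hand. This is a small sketch using the minimum (8) and maximum (56) of the example scores; it only repeats the arithmetic the function already does and assumes whole-number data, as in the example.

```r
# Values from the example: smallest score 8, largest score 56, class width 10
classwidth   <- 10
minimumValue <- (8 %/% classwidth) * classwidth         # 0
MaximumValue <- ((56 %/% classwidth) + 1) * classwidth  # 60

# Lower limits as the function builds them: the sequence runs up to and including 60
lowerclass <- seq(minimumValue, MaximumValue, classwidth)
print(lowerclass)           # 0 10 20 30 40 50 60

# Break points passed to cut(): 8 breaks give 7 classes, the last one labelled "60 - 69"
breaks <- seq(minimumValue - 1, MaximumValue + classwidth - 1, classwidth)
print(breaks)               # -1 9 19 29 39 49 59 69
print(length(breaks) - 1)   # 7

# MaximumValue is always above the largest score, so the class starting there stays empty
print(56 < MaximumValue)    # TRUE
```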

How to remove the last row of a table using R.

Removing the last row is no problem here. All I need to do is find the index of the last row, like this.
Code>

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth # first lower class limit: largest multiple of classwidth not above the minimum
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth # first multiple of classwidth above the maximum
  d=MaximumValue+classwidth # d-1 is the last upper class limit
  lowerclass=seq(minimumValue,MaximumValue,classwidth) # sequence of all lower class limits
  upperclass=lowerclass+classwidth-1 # sequence of all upper class limits
  classInterval=paste(lowerclass,'-',upperclass) # label for each class, e.g. "10 - 19"
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval)) # tabulate the data into the classes
  mytable=data.frame(alldata) # turn the table into a two-column data frame
  lastIndex=length(mytable$Freq) # index of the last row
  newTable=mytable[-lastIndex,] # drop the extra last class
  newTable
}

# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

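Result>

     Var1 Freq
1   0 - 9    1
2 10 - 19   13
3 20 - 29    8
4 30 - 39    3
5 40 - 49    4
6 50 - 59    1

You can see that the last row is gone. For any large data set, you can call the function as many times as you like; just pass in the variable holding the data and the class width.
Now I need to deal with the other steps. These won't take long.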
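Because the function can be called as often as needed, a whole batch of data sets (like the 20 mentioned at the start) could be grouped in one go. Here is a rough sketch, assuming the data sets are kept in a named list. The list `datasets`, its element names and its values are invented for this illustration, and the function is repeated only so the snippet runs on its own.

```r
# Same logic as the function above (the version that drops the extra last class)
createGroupTable <- function(data, classwidth) {
  minimumValue <- (min(data) %/% classwidth) * classwidth
  MaximumValue <- ((max(data) %/% classwidth) + 1) * classwidth
  d <- MaximumValue + classwidth
  lowerclass <- seq(minimumValue, MaximumValue, classwidth)
  upperclass <- lowerclass + classwidth - 1
  classInterval <- paste(lowerclass, '-', upperclass)
  alldata <- table(cut(data, seq(minimumValue - 1, d - 1, classwidth),
                       labels = classInterval))
  mytable <- data.frame(alldata)
  mytable[-nrow(mytable), ]   # remove the last row
}

# Hypothetical data sets, made up for this sketch
datasets <- list(
  test1 = c(12, 25, 31, 8, 19, 44, 27, 36, 15, 22),
  test2 = c(55, 61, 48, 72, 66, 59, 70, 53)
)

# One grouped table per data set, all with class width 10
tables <- lapply(datasets, createGroupTable, classwidth = 10)
print(tables$test1)
print(tables$test2)
```

`lapply` passes each element of the list as `data` and forwards `classwidth = 10` to every call, so the result is a list of data frames with the same names as the data sets.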

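How to create a grouped frequency table with class boundaries using R

Now that we can build the class intervals with their frequencies, we need to subtract 0.5 from every lower class limit and add 0.5 to every upper class limit. Study the following code.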

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth
  d=MaximumValue+classwidth
  lowerclass=seq(minimumValue,MaximumValue,classwidth)
  upperclass=lowerclass+classwidth-1
  classInterval=paste(lowerclass,'-',upperclass)
  lowerclassBound=lowerclass-0.5 # lower class boundaries
  upperclassBound=upperclass+0.5 # upper class boundaries
  classBoundary=paste(lowerclassBound,'-', upperclassBound)
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
  mytable=data.frame(alldata)
  mytable$classBound=classBoundary # add the boundaries as a new column
  pureTable=mytable[!(mytable$Freq==0),] # keep only the classes with a non-zero frequency
  pureTable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

Result>

     Var1 Freq  classBound
1   0 - 9    1  -0.5 - 9.5
2 10 - 19   13  9.5 - 19.5
3 20 - 29    8 19.5 - 29.5
4 30 - 39    3 29.5 - 39.5
5 40 - 49    4 39.5 - 49.5
6 50 - 59    1 49.5 - 59.5
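A quick way to convince yourself that the boundaries are right is to check two properties: neighbouring classes share a boundary, and the distance between the two boundaries of a class equals the class width. The snippet below is a small check written for this purpose, using the class limits from the table above; the name `gapFree` is made up for the sketch.

```r
# Class limits from the example above (class width 10)
lowerclass <- seq(0, 50, 10)
upperclass <- lowerclass + 9

lowerclassBound <- lowerclass - 0.5
upperclassBound <- upperclass + 0.5

# Each upper boundary equals the next class's lower boundary, so there are no gaps
gapFree <- all(upperclassBound[-length(upperclassBound)] == lowerclassBound[-1])
print(gapFree)                                     # TRUE

# The distance between the two boundaries of a class is the class width
print(unique(upperclassBound - lowerclassBound))   # 10
```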

Now we need to include the code that calculates the class mark (x) and fx.
Look at the code below.

createGroupTable=function(data,classwidth){
  minimumValue=(min(data)%/%classwidth)*classwidth
  MaximumValue=((max(data)%/%classwidth)+1)*classwidth
  d=MaximumValue+classwidth
  lowerclass=seq(minimumValue,MaximumValue,classwidth)
  upperclass=lowerclass+classwidth-1
  classInterval=paste(lowerclass,'-',upperclass)
  lowerclassBound=lowerclass-0.5
  upperclassBound=upperclass+0.5
  classBoundary=paste(lowerclassBound,'-', upperclassBound)
  classMark=(lowerclass+upperclass)/2 # class mark (x): midpoint of each class
  alldata=table(cut(data,seq(minimumValue-1,d-1,classwidth), labels=classInterval))
  mytable=data.frame(alldata)
  Freq=mytable$Freq
  Fx=Freq*classMark # frequency times class mark
  mytable$classBound=classBoundary
  mytable$classMark=classMark
  mytable$Fx=Fx
  pureTable=mytable[!(mytable$Freq==0),] # keep only the classes with a non-zero frequency
  pureTable
}
# the function's code ends here
score=c(24, 46, 16, 33, 16, 13, 28, 19, 47, 49, 8, 56, 20, 26, 28, 29, 30, 18, 19, 15, 47, 32, 14, 25, 14, 16, 23, 12, 14, 13)
# now call the function
createGroupTable(score,10)

Result>

     Var1 Freq  classBound classMark    Fx
1   0 - 9    1  -0.5 - 9.5       4.5   4.5
2 10 - 19   13  9.5 - 19.5      14.5 188.5
3 20 - 29    8 19.5 - 29.5      24.5 196.0
4 30 - 39    3 29.5 - 39.5      34.5 103.5
5 40 - 49    4 39.5 - 49.5      44.5 178.0
6 50 - 59    1 49.5 - 59.5      54.5  54.5

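As you can see above, we have just added two columns. Now we can go ahead and calculate the mean as the sum of fx divided by the sum of f.
We'll add some code for that.
I'll show it to you in my next article, which will also cover the standard deviation.
Follow me so that you don't miss any of my articles.
Happy coding! 🖐️🖐️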

Reference

More information on this topic (How I build my own calculator that groups any data set into classes using R) can be found at https://dev.to/maxwizard01/how-i-build-my-own-calculator-that-group-any-data-set-given-in-to-classes-using-r-2529

