R Language: Data Structures


Home: https://lartpang.github.io/
Recently I took the introductory R course on edX, and this post summarizes it. The course focuses mainly on R's data structures, and it also covers basic plotting.
Workspace

ls()

## character(0)

rm(a)

## Warning in rm(a): object 'a' not found

ls()

## character(0)
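A minimal sketch of the workspace functions, using two hypothetical objects x and y: ls() lists what is in the workspace, rm() removes objects by name, and rm(list = ls()) clears everything at once.

```r
# Hypothetical objects created only for this illustration
x <- 42
y <- "hello"
print(ls())          # now includes "x" and "y"

# Remove a single object by name
rm(x)
print(exists("x"))   # FALSE: x is gone

# Remove every object listed by ls(), i.e. clear the workspace
rm(list = ls())
print(ls())          # character(0)
```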

Basic data types

logical

TRUE/FALSE/NA/T/F (the full forms TRUE and FALSE are recommended); in some contexts 0 and non-zero numbers also act as FALSE and TRUE

numeric

integer is numeric

numeric not always integer

character

Other atomic types:

double: double-precision floating point (numeric values are stored as doubles)

complex: complex numbers

raw: store raw bytes

is.*() returns whether its argument is of the corresponding type *.
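The list above names complex and raw without showing them. A minimal sketch (z and r are illustrative names) that creates both, uses typeof() to reveal the storage type behind class(), and calls a couple of is.*() tests:

```r
# Illustrative values for the less common atomic types
z <- 2 + 3i          # complex number
r <- as.raw(255)     # raw byte
print(class(z))      # "complex"
print(class(r))      # "raw"
print(r)             # ff

# typeof() shows the storage type: plain numbers are doubles
print(typeof(2))     # "double"
print(typeof(2L))    # "integer"

# is.*() tests for one type and returns TRUE or FALSE
print(is.complex(z))      # TRUE
print(is.character(2))    # FALSE
```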

# logical
TRUE

## [1] TRUE

class(TRUE)

## [1] "logical"

FALSE

## [1] FALSE

class(NA)

## [1] "logical"

T

## [1] TRUE

F

## [1] FALSE

# numeric
2

## [1] 2

class(2)

## [1] "numeric"

2.5

## [1] 2.5

class(2.5)

## [1] "numeric"

2L

## [1] 2

class(2L) 

## [1] "integer"

is.numeric(2)

## [1] TRUE

is.numeric(2L)

## [1] TRUE

#integer is numeric 
#numeric not always integer
is.integer(2)

## [1] FALSE

is.integer(2L)

## [1] TRUE

# character
"I love data science!"

## [1] "I love data science!"

class("I love data science!")

## [1] "character"

Coercion: as.*() converts its argument to the corresponding type *. Some conversions fail and produce NA.

as.numeric(TRUE)

## [1] 1

as.numeric(FALSE)

## [1] 0

as.character(4)

## [1] "4"

as.numeric("4.5")

## [1] 4.5

as.integer("4.5")

## [1] 4

as.numeric("Hello")

## Warning: NAs introduced by coercion

## [1] NA
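A few more coercion cases worth knowing, as a short sketch: strings that spell a logical value convert, other strings give NA (silently, for as.logical()), as.integer() truncates toward zero, and logicals become 1/0 in arithmetic.

```r
# Strings that spell a logical value can be converted
print(as.logical("TRUE"))   # TRUE
print(as.logical("yes"))    # NA: not a recognised spelling

# as.integer() truncates toward zero rather than rounding
print(as.integer(3.9))      # 3
print(as.integer(-3.9))     # -3

# TRUE/FALSE become 1/0, so sum() counts the TRUE values
print(sum(c(TRUE, FALSE, TRUE)))  # 2
```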

Vectors

Sequence of data elements

Same basic type

Automatic coercion if necessary

character, numeric, logical

Single value = vector

Create vectors with c(), or with the : operator.

# c()
drawn_suits
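The assignment to drawn_suits is cut off above, so here is a hypothetical illustration of both ways to build a vector (the values are made up):

```r
# c() combines values into a vector (hypothetical values)
suits <- c("hearts", "spades", "diamonds")
print(suits)
print(length(suits))   # 3

# The colon operator builds an integer sequence
nums <- 1:5
print(nums)            # 1 2 3 4 5
print(class(nums))     # "integer"
```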

Naming: names()

remain
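A minimal sketch of naming a vector, with hypothetical card counts: names() can be assigned after creation, or the names can be written directly inside c().

```r
# Hypothetical counts, named afterwards with names()<-
remaining <- c(11, 12, 11, 13)
names(remaining) <- c("spades", "hearts", "diamonds", "clubs")
print(remaining)

# Same result by naming the elements inside c()
remaining2 <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)
print(identical(remaining, remaining2))  # TRUE
```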

A single value is still a vector

my_apples
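To check that a single value really is a vector, a sketch using a hypothetical variable my_count: is.vector() is TRUE and the length is 1.

```r
my_count <- 5               # hypothetical single value
print(is.vector(my_count))  # TRUE
print(length(my_count))     # 1
print(my_count[1])          # 5: indexing works as for longer vectors
```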

Coercion

drawn_ranks
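The drawn_ranks example is truncated above; as a hypothetical sketch, mixing types in c() coerces every element to the most flexible type present (logical < integer < double < complex < character):

```r
# Numbers mixed with a string: everything becomes character
mixed <- c(7, 4, "A", 10)   # hypothetical values
print(mixed)                # "7" "4" "A" "10"
print(class(mixed))         # "character"

# Logical mixed with integer or double is promoted to the number type
print(class(c(TRUE, 2L)))   # "integer"
print(class(c(TRUE, 2.5)))  # "numeric"
```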

Basic arithmetic
Arithmetic extends naturally from single numbers to vectors: the operators work element by element.

# with number: +-*/
earnings 
sum(bank)

## [1] 30

earnings > expenses

## [1]  TRUE  TRUE FALSE

## multiplication and division are done element-wise!
earnings * c(1, 2, 3)

## [1]  50 200  90
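A short sketch of element-wise arithmetic, including recycling: a single number is reused for every element, and a shorter vector is repeated when its length divides the longer one (values are illustrative).

```r
a <- c(10, 20, 30)
b <- c(1, 2, 3)
print(a + b)    # 11 22 33: element by element
print(a / b)    # 10 10 10

# A single number is recycled across the whole vector
print(a * 2)    # 20 40 60

# A length-2 vector is recycled against a length-4 vector
print(c(1, 2, 3, 4) + c(10, 20))   # 11 22 13 24
```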

Subsetting
Ways to index:

By position (R indexes from 1)

By name, using names()

By logical values

remain
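The three indexing styles listed above, sketched on a hypothetical named vector (negative positions, which drop elements, are included too):

```r
remaining <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)  # hypothetical

print(remaining[2])               # by position: hearts 12
print(remaining[c(1, 4)])         # several positions
print(remaining["clubs"])         # by name
print(remaining[-1])              # negative index drops the first element
print(remaining[c(TRUE, FALSE, TRUE, FALSE)])  # by logical vector
print(remaining[remaining > 11])  # logical condition: hearts and clubs
```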

Matrices

Vector: 1D array of data elements

Matrix: 2D array of data elements

Rows and columns

One atomic vector type

Create with matrix()
By default the matrix is filled column by column

# matrix() fills column by column unless byrow = TRUE
matrix(1:6, nrow = 2)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

matrix(1:6, ncol = 3)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

matrix(1:6, nrow = 2, byrow = TRUE)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

# recycling: a shorter data vector is reused to fill the matrix
matrix(1:3, nrow = 2, ncol = 3)

##      [,1] [,2] [,3]
## [1,]    1    3    2
## [2,]    2    1    3

matrix(1:4, nrow = 2, ncol = 3)

## Warning in matrix(1:4, nrow = 2, ncol = 3): data length [4] is not a
## sub-multiple or multiple of the number of columns [3]

##      [,1] [,2] [,3]
## [1,]    1    3    1
## [2,]    2    4    2

# cbind() binds vectors as columns, rbind() as rows
cbind(1:3, 1:3)

##      [,1] [,2]
## [1,]    1    1
## [2,]    2    2
## [3,]    3    3

rbind(1:3, 1:3)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    1    2    3

m

Naming: rownames(), colnames()
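A minimal sketch of naming rows and columns, with illustrative names: assign rownames() and colnames() after creation, or pass dimnames = list(rows, cols) to matrix().

```r
m <- matrix(1:6, nrow = 2)
rownames(m) <- c("r1", "r2")          # illustrative row names
colnames(m) <- c("a", "b", "c")       # illustrative column names
print(m)

# Same result at creation time with dimnames = list(rows, cols)
m2 <- matrix(1:6, nrow = 2,
             dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(identical(m, m2))   # TRUE
```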

Coercion

num
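The num example is truncated above. As a hypothetical sketch, binding a numeric matrix to a character one shows that a matrix holds only one atomic type, so every element becomes character:

```r
num <- matrix(1:4, nrow = 2)          # hypothetical numeric matrix
chr <- matrix(c("a", "b"), nrow = 2)  # hypothetical character matrix
combined <- cbind(num, chr)
print(combined)                       # all elements are now strings
print(class(combined[1, 1]))          # "character"
```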

Subsetting
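Matrix subsetting uses [row, column]; leaving one side empty selects the whole row or column. A sketch on an illustrative 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m[1, 2])                    # single element: 3
print(m[2, ])                     # whole second row: 2 4 6
print(m[, "c"])                   # a column by name: 5 6
print(m[, c(TRUE, FALSE, TRUE)])  # columns a and c via a logical vector
```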

Matrix arithmetic

colSums(), rowSums()

Standard arithmetic possible

Element-wise computation

the_fellowship
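A sketch of the points above on an illustrative matrix: rowSums() and colSums() aggregate, the usual operators work element-wise, and true matrix multiplication needs %*%.

```r
m <- matrix(1:6, nrow = 2)
print(rowSums(m))   # 9 12
print(colSums(m))   # 3 7 11

# Standard operators are element-wise, even between two matrices
print(m * 2)
print(m * m)

# Matrix multiplication uses %*%; t() transposes
print(m %*% t(m))   # 2 x 2 result
```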

Factors

Factors for categorical variables

Limited number of different values

Belong to category

Create with factor()

blood
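The blood assignment is cut off above; a hypothetical sketch of factor(): the distinct values become levels (sorted alphabetically by default) and the data are stored as integer codes.

```r
blood_types <- c("B", "AB", "O", "A", "O", "O", "A", "B")  # hypothetical data
f <- factor(blood_types)
print(f)
print(levels(f))        # "A" "AB" "B" "O"
print(as.integer(f))    # 3 2 4 1 4 4 1 3: the underlying codes
print(table(f))         # count per level
```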

Rename factor levels

blood
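A sketch of renaming levels, with hypothetical labels: assigning to levels() replaces the labels in the order of the existing levels, and factor(levels = ..., labels = ...) does the same at creation time.

```r
f <- factor(c("B", "AB", "O", "A", "O"))   # levels: "A" "AB" "B" "O"
# New labels are matched to the existing levels in order
levels(f) <- c("BT_A", "BT_AB", "BT_B", "BT_O")   # hypothetical labels
print(f)

# Equivalent at creation time
g <- factor(c("B", "AB", "O", "A", "O"),
            levels = c("A", "AB", "B", "O"),
            labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))
print(identical(f, g))   # TRUE
```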

Ordered factor

blood
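For ordinal data, a minimal sketch with hypothetical sizes: ordered = TRUE plus an explicit level order makes comparisons and max() meaningful.

```r
sizes <- c("M", "L", "S", "M")   # hypothetical data
s <- factor(sizes, levels = c("S", "M", "L"), ordered = TRUE)
print(s)              # Levels: S < M < L
print(s[1] < s[2])    # TRUE, because "M" < "L"
print(max(s))         # L
```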

Lists
Vector-Matrix-List

Vector: 1D, same type

Matrix: 2D, same type

List:

Different R objects

No coercion

Loss of some functionality

Create with list()

list("Rsome times", 190, 5)

## [[1]]
## [1] "Rsome times"
## 
## [[2]]
## [1] 190
## 
## [[3]]
## [1] 5

song

Named lists

#1
song
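A sketch of a named list built from the values in the list() example above; the element names title, duration and track are assumptions for illustration. Names can be given inside list() or set afterwards with names().

```r
# Element names are illustrative assumptions
song_info <- list(title = "Rsome times", duration = 190, track = 5)
str(song_info)
print(names(song_info))   # "title" "duration" "track"

# Naming an unnamed list afterwards gives the same result
unnamed <- list("Rsome times", 190, 5)
names(unnamed) <- c("title", "duration", "track")
print(identical(song_info, unnamed))   # TRUE
```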

Nested lists

similar_song
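Because a list can hold any R object, it can also hold another list. A hypothetical sketch (second song and its fields are made up):

```r
first <- list(title = "Rsome times", duration = 190)
# Hypothetical second song that contains the first one as an element
second <- list(title = "Another song", duration = 230, similar = first)
str(second)
print(second$similar$title)   # "Rsome times"
```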

Subsetting: [ versus [[

similar_song

[[ or [? [[ selects a list element; [ returns a sublist; [[ and $ can both subset and extend lists.
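The difference between [ and [[ shows up in the result type. A sketch on a list with illustrative element names:

```r
song_info <- list(title = "Rsome times", duration = 190, track = 5)

a <- song_info[1]     # [ keeps the list structure: a one-element sublist
b <- song_info[[1]]   # [[ extracts the element itself
print(class(a))       # "list"
print(class(b))       # "character"

print(song_info$duration)               # $ is shorthand for [["duration"]]
print(song_info[c("title", "track")])   # [ can select several elements
```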
Extending lists
This introduces $, one of the more important operators in R.

similar_song
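A minimal sketch of extending a list, with hypothetical new elements: assign through $ or [[ to a name that does not exist yet, or concatenate with c().

```r
song_info <- list(title = "Rsome times", duration = 190)
song_info$genre <- "R&B"              # hypothetical element added with $
song_info[["track"]] <- 5             # added with [[
song_info <- c(song_info, rating = 4) # hypothetical element added by concatenation
print(names(song_info))   # "title" "duration" "genre" "track" "rating"
```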

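Data frames

Observations

Variables

Example: people

each person = observation

properties = variables

Rows = observations

Columns = variables (age, name, …)

Different variables can have different types, but all observations of a single variable share the same type.
Data frames are what you usually work with after importing data.
Creating a data frame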

name

Naming the columns of a data frame

name
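The creation and naming code above is truncated; a hypothetical sketch of both steps (the people below are made up, not the course data): data.frame() takes column names from the variables, and names() renames them, or names can be given as arguments.

```r
# Hypothetical data, not the course's people data frame
name <- c("Ann", "Bob", "Cy")
age <- c(28, 30, 21)
child <- c(FALSE, TRUE, TRUE)
people <- data.frame(name, age, child)   # columns named after the variables
str(people)

names(people) <- c("Name", "Age", "Child")                      # rename afterwards
people2 <- data.frame(Name = name, Age = age, Child = child)    # or name on creation
print(names(people2))
```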

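Here the character vectors are automatically converted to factors; to avoid this implicit behavior, set the parameter stringsAsFactors = FALSE. (Since R 4.0.0 the default is stringsAsFactors = FALSE, so strings stay character unless you ask otherwise.)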

name
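Setting stringsAsFactors explicitly makes the result the same on any R version (the default changed to FALSE in R 4.0.0). A sketch with a hypothetical name column:

```r
name <- c("Ann", "Bob", "Cy")                        # hypothetical data
df_f <- data.frame(name, stringsAsFactors = TRUE)   # strings become factors
df_c <- data.frame(name, stringsAsFactors = FALSE)  # strings stay character
print(class(df_f$name))   # "factor"
print(class(df_c$name))   # "character"
```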

Subsetting
Subsetting a data frame reuses the syntax of matrices and lists: [ from matrices; [[ and $ from lists.

name
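A sketch of the two styles on a small hypothetical data frame: matrix-style [row, column] and list-style $ and [[.

```r
people <- data.frame(name = c("Ann", "Bob", "Cy"),     # hypothetical data
                     age = c(28, 30, 21),
                     child = c(FALSE, TRUE, TRUE))
print(people[2, ])                 # matrix-style: the second row
print(people[, "age"])             # one column as a vector
print(people[people$age > 25, ])   # rows that satisfy a condition
print(people$name)                 # list-style: $ ...
print(people[["child"]])           # ... or [[
print(people["age"])               # [ with a single index keeps a data frame
```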

Extending a data frame
Add columns = add variables; add rows = add observations

name
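A hypothetical sketch of both kinds of extension: new columns via $ or cbind(), and a new row via rbind() with a one-row data frame that has the same column names.

```r
people <- data.frame(name = c("Ann", "Bob"), age = c(28, 30))  # hypothetical data

# Add variables (columns)
people$height <- c(163, 177)
people <- cbind(people, weight = c(55, 70))

# Add an observation (row); column names must match
new_person <- data.frame(name = "Cy", age = 21, height = 163, weight = 60)
people <- rbind(people, new_person)
print(people)
```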

Sorting
This part mainly introduces sort() and order(); order() is better suited to reordering the rows of a data frame.

str(people)

## 'data.frame':    5 obs. of  4 variables:
##  $ name  : chr  "Anne" "Pete" "Frank" "Julia" ...
##  $ age   : num  28 30 21 39 35
##  $ child : logi  FALSE TRUE TRUE FALSE TRUE
##  $ height: num  163 177 163 162 157

# sort() returns the values in ascending order
sort(people$age)

## [1] 21 28 30 35 39

# order() returns the positions that would sort the vector
ranks
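The ranks example is cut off; a hypothetical sketch of why order() suits data frames: it returns row positions, which can be used directly as a row index.

```r
people <- data.frame(name = c("Ann", "Bob", "Cy"),   # hypothetical data
                     age = c(28, 30, 21))
ord <- order(people$age)
print(ord)            # 3 1 2: positions of the ages in ascending order
print(people[ord, ])  # rows reordered by age
print(people[order(people$age, decreasing = TRUE), ])  # oldest first
```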

Graphics
This part introduces plot() and hist() from the graphics package. plot() draws a different kind of chart depending on the type of the data it is given.

plot() (categorical): bar chart, e.g. plot(countries$continent)

plot() (numerical): scatter plot of the values against their index, e.g. plot(countries$population)

plot() (2 x numerical): scatter plot, e.g. plot(countries$area, countries$population), plot(log(countries$area), log(countries$population))

plot() (2 x categorical): a variant of the bar chart (a spine plot), e.g. plot(countries$continent, countries$religion)

hist() draws a histogram, e.g. hist(africa$population), hist(africa$population, breaks = 10)
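The countries and africa data come from the course and are not shown here, so this sketch substitutes R's built-in mtcars dataset to exercise each plot type. It draws to a null pdf device so it also runs non-interactively, and uses plot = FALSE to get hist()'s bin counts without drawing.

```r
pdf(NULL)   # null device: nothing is written to disk

plot(factor(mtcars$cyl))        # categorical -> bar chart
plot(mtcars$mpg)                # one numeric -> values against index
plot(mtcars$wt, mtcars$mpg)     # two numeric -> scatter plot
hist(mtcars$mpg, breaks = 10)   # histogram; breaks is only a suggestion

# hist() can return the bin counts instead of drawing
h <- hist(c(1, 2, 2, 3, 3, 3), breaks = c(0, 1, 2, 3), plot = FALSE)
print(h$counts)                 # 1 2 3

dev.off()
```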
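Other graphics functions: barplot(), boxplot(), pairs()
Customizing plots
Plots are customized by changing graphical parameters, which mostly need little explanation.
This introduces par(), which sets graphical parameters shared by all plots on the current device. It stores common attributes, so when drawing several plots you can set the basic ones once for all of them.
For example: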

par(col = "blue") 
plot(mercury$temperature, mercury$pressure)
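A convenient detail: par() returns the previous values, so a setting can be undone afterwards. This sketch assumes the course's mercury data matches R's built-in pressure dataset (vapor pressure of mercury against temperature), and draws to a null device.

```r
pdf(NULL)
old <- par(col = "blue")   # par() returns the previous values invisibly
col_during <- par("col")   # "blue": applies to later plots on this device
plot(pressure$temperature, pressure$pressure)   # built-in stand-in for mercury
par(old)                   # restore the previous setting
col_after <- par("col")    # "black"
dev.off()
```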

Commonly used plot() arguments:

plot(mercury$temperature, mercury$pressure, 
       xlab = "Temperature", 
       ylab = "Pressure", 
       main = "T vs P for Mercury", # main title
       type = "o", 
       col = "orange", 
       col.main = "darkgray", 
       cex.axis = 0.6, # cex: scaling factor; cex.axis scales the axis annotation
       lty = 5, #Line Type
       pch = 4  #Plot Symbol
       )
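To see what type controls, a sketch that redraws the same data with each common value, again using the built-in pressure dataset as an assumed stand-in for mercury: "p" points, "l" lines, "b" both, "o" points over lines, "h" vertical lines.

```r
pdf(NULL)
# One plot per type value; each goes to a new page of the null device
for (tp in c("p", "l", "b", "o", "h")) {
  plot(pressure$temperature, pressure$pressure, type = tp,
       main = paste("type =", tp))
}
# pch picks the point symbol (0-25); lty the line type (1 solid, 2 dashed, ...)
plot(pressure$temperature, pressure$pressure, type = "b", pch = 16, lty = 2)
dev.off()
```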

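Multiple plots: the mfrow and mfcol parameters arrange several plots in one figure. The difference is that mfrow fills the grid row by row with the plots produced by the following plot() calls, while mfcol fills it column by column.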

# fill row by row
par(mfrow = c(2,2)) 
plot(shop$ads, shop$sales) 
plot(shop$comp, shop$sales) 
plot(shop$inv, shop$sales) 
plot(shop$size_dist, shop$sales)

# fill column by column
par(mfcol = c(2,2)) 
plot(shop$ads, shop$sales) 
plot(shop$comp, shop$sales) 
plot(shop$inv, shop$sales) 
plot(shop$size_dist, shop$sales)
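The shop data is course-specific, so this sketch uses the built-in mtcars dataset instead; the comments note where the second plot would land under mfrow versus mfcol.

```r
pdf(NULL)
par(mfrow = c(2, 2))          # 2 x 2 grid, filled row by row
plot(mtcars$wt, mtcars$mpg)   # top-left
plot(mtcars$hp, mtcars$mpg)   # top-right (with mfcol: bottom-left)
plot(mtcars$disp, mtcars$mpg)
plot(mtcars$qsec, mtcars$mpg)
grid_dims <- par("mfrow")     # 2 2
par(mfrow = c(1, 1))          # back to a single plot
dev.off()
```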

Reset the grid

par(mfrow = c(1,1))

layout() is more flexible than setting mfrow or mfcol.

grid
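The grid definition above is truncated; a hypothetical sketch of layout(): a matrix of figure numbers assigns each cell to a plot, so repeating a number makes a plot span several cells. layout() invisibly returns the number of figures.

```r
pdf(NULL)
# Hypothetical layout: figure 1 spans the top row, 2 and 3 share the bottom
grid_layout <- matrix(c(1, 1,
                        2, 3), nrow = 2, byrow = TRUE)
n_figs <- layout(grid_layout)
plot(mtcars$wt, mtcars$mpg)     # figure 1: whole top row
plot(mtcars$hp, mtcars$mpg)     # figure 2: bottom-left
plot(mtcars$disp, mtcars$mpg)   # figure 3: bottom-right
layout(1)                       # reset to a single figure
dev.off()
```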

Reset the grid

layout(1) 
par(mfcol = c(1,1))

Reset all parameters

old_par
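The usual idiom behind old_par, sketched with illustrative settings: snapshot every settable parameter with par(no.readonly = TRUE), change what you need, then pass the snapshot back to par() to restore it all at once.

```r
pdf(NULL)
old_par <- par(no.readonly = TRUE)   # snapshot of all settable parameters
par(col = "red", mfrow = c(1, 2), cex = 0.8)   # illustrative changes
plot(mtcars$wt, mtcars$mpg)
par(old_par)                         # restore everything
restored <- identical(par("mfrow"), old_par$mfrow)
dev.off()
```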

Linear fitting
This introduces lm() (linear model): lm(a ~ b) fits the linear model a = k*b + c.

plot(shop$ads, shop$sales, 
      pch = 16, col = 2, 
      xlab = "advertisement", 
      ylab = "net sales")
lm_sales
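The lm_sales line is truncated, so here is a hypothetical sketch with made-up data: coef() returns c as "(Intercept)" and k under the predictor's name, and abline() adds the fitted line to an existing scatter plot.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3.1, 4.9, 7.2, 8.8, 11.0)   # hypothetical data, roughly y = 2x + 1
fit <- lm(y ~ x)                   # fits y = k * x + c
print(coef(fit))                   # (Intercept) 1.09, x 1.97

pdf(NULL)
plot(x, y, pch = 16)
abline(fit, col = "blue")          # draw the fitted line on the scatter plot
dev.off()
```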
