Sparse PLS on R


Reference

library

install.packages("spls")
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done
library(spls)
Sparse Partial Least Squares (SPLS) Regression and
Classification (version 2.2-3)
library(ggplot2)

data

data(yeast)

xはクロマチンの免疫沈降、yは発現サイクル?

help(yeast)
yeast {spls} R Documentation

Yeast Cell Cycle Dataset

Description

This is the Yeast Cell Cycle dataset used in Chun and Keles (2010).

Usage

 data(yeast) 

Format

A list with two components:

x

ChIP-chip data. A matrix with 542 rows and 106 columns.

y

Cell cycle gene expression data. A matrix with 542 rows and 18 columns.

Details

Matrix y is cell cycle gene expression data (Spellman et al., 1998) of 542 genes from an α factor based experiment. Each column corresponds to mRNA levels measured at every 7 minutes during 119 minutes (a total of 18 measurements). Matrix x is the chromatin immunoprecipitation on chip (ChIP-chip) data of Lee et al. (2002) and it contains the binding information for 106 transcription factors. See Chun and Keles (2010) for more details.

Source

Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thomson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, and Young RA (2002), "Transcriptional regulatory networks in Saccharomyces cerevisiae", Science, Vol. 298, pp. 799–804.

Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, and Futcher B (1998), "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hydrization", Molecular Biology of the Cell, Vol. 9, pp. 3273–3279.

References

Chun H and Keles S (2010), "Sparse partial least squares for simultaneous dimension reduction and variable selection", Journal of the Royal Statistical Society - Series B, Vol. 72, pp. 3–25.

Examples


data(yeast)
yeast$x[1:5,1:5]
yeast$y[1:5,1:5]


[Package spls version 2.2-3 ]
yeast$x[1:5,1:5]
A matrix: 5 × 5 of type dbl
ABF1_YPD ACE2_YPD ADR1_YPD ARG80_YPD ARG81_YPD
21 -0.2722730 0.21932294 0.9238359567 -0.4755756 -0.10389318
41 0.1691280 0.53831198 0.0097604993 -0.3219534 -0.19750606
71 -0.1388962 0.02636382 0.0877516229 -0.2234093 0.10307741
78 -0.2865169 -0.31409427 -0.0454998435 0.3262217 0.27757502
102 -0.4950561 -0.14827419 0.0002987512 -0.2179458 -0.02539585
plot(yeast$x[,1])

yeast$y[1:5,1:5]
A matrix: 5 × 5 of type dbl
alpha0 alpha7 alpha14 alpha21 alpha28
1 -0.36 -0.42 0.29 -0.14 -0.19
2 1.04 0.19 0.47 -1.03 -0.63
5 -0.30 -0.45 0.75 0.37 0.27
8 -0.46 0.12 -0.06 -0.76 -0.70
9 -1.35 -0.86 -0.22 -0.38 -0.65
plot(yeast$y[,1])

↓ 変数が542個もある・・

str(yeast)
List of 2
 $ x: num [1:542, 1:106] -0.272 0.169 -0.139 -0.287 -0.495 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:542] "21" "41" "71" "78" ...
  .. ..$ : chr [1:106] "ABF1_YPD" "ACE2_YPD" "ADR1_YPD" "ARG80_YPD" ...
 $ y: num [1:542, 1:18] -0.36 1.04 -0.3 -0.46 -1.35 -2.06 -1.61 -0.07 0.11 0.15 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:542] "1" "2" "5" "8" ...
  .. ..$ : chr [1:18] "alpha0" "alpha7" "alpha14" "alpha21" ...

SPLS

set.seed(1)
cv <- cv.spls(yeast$x, yeast$y, eta = seq(0.1,0.9,0.1), K = c(5:10))
eta = 0.1 
eta = 0.2 
eta = 0.3 
eta = 0.4 
eta = 0.5 
eta = 0.6 
eta = 0.7 
eta = 0.8 
eta = 0.9 

Optimal parameters: eta = 0.6, K = 8

f <- spls(yeast$x, yeast$y, eta = cv$eta.opt, K = cv$K.opt)
print(f)
Sparse Partial Least Squares for multivariate responses
----
Parameters: eta = 0.6, K = 8, kappa = 0.5
PLS algorithm:
pls2 for variable selection, simpls for model fitting

SPLS chose 56 variables among 106 variables

Selected variables: 
ACE2_YPD    ARG80_YPD   ARG81_YPD   ASH1_YPD    AZF1_YPD    
BAS1_YPD    CBF1_YPD    CHA4_YPD    CRZ1_YPD    FHL1_YPD    
FKH1_YPD    FKH2_YPD    FZF1_YPD    GAT1_YPD    GAT3_YPD    
GCN4_YPD    GCR2_YPD    GLN3_YPD    HAA1_YPD    HAP2_YPD    
HAP5_YPD    HIR1_YPD    HIR2_YPD    IME4_YPD    INO4_YPD    
A1..MATA1._YPD  MBP1_YPD    MCM1_YPD    MET4_YPD    MSN2_YPD    
NDD1_YPD    NRG1_YPD    PHD1_YPD    PHO2_YPD    PUT3_YPD    
RCS1_YPD    REB1_YPD    RFX1_YPD    RIM101_YPD  RME1_YPD    
RTG1_YPD    RTG3_YPD    SIP4_YPD    SOK2_YPD    STB1_YPD    
STE12_YPD   STP2_YPD    SWI4_YPD    SWI5_YPD    SWI6_YPD    
THI2_YPD    YAP1_YPD    YAP6_YPD    YAP7_YPD    YFL044C_YPD 
YJL206C_YPD 
coef.f <- coef(f)
coef.f[1:5,1:5]
A matrix: 5 × 5 of type dbl
alpha0 alpha7 alpha14 alpha21 alpha28
ABF1_YPD 0.0000000 0.000000000 0.00000000 0.0000000000 0.000000000
ACE2_YPD 0.0874325 0.068452293 0.01374781 -0.0002541969 -0.033302624
ADR1_YPD 0.0000000 0.000000000 0.00000000 0.0000000000 0.000000000
ARG80_YPD -0.0486881 -0.019092797 0.02063442 0.0300421634 0.007925553
ARG81_YPD -0.0168849 0.009465868 0.06353825 0.0541704059 0.006978985
plot.spls(f, yvar=1 )

coefplot.spls( f, nwin=c(2,2), xvar=c(1:4) )