FM, FFM, and DeepFM in PyTorch


Reference: a blog on recommender systems and deep learning

1. Factorization Machine (FM) (2010)


The FM model equation is

$$\hat{y} = w_0 + \sum_{i=1}^{n}{w_i x_i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}{\langle V_i, V_j\rangle x_i x_j}$$

where the input is $X=(x_1,x_2,\ldots,x_n)^{'}$, $X\in R^{n\times 1}$, and the weight vector forming the cross terms is $V_i=(v_{i1},v_{i2},\ldots,v_{ik})^{'}$, $V_i\in R^{k\times 1}$.
The goal of our problem thus reduces to computing the sum of the upper triangle of the matrix $T$, where

$$T=\left[\begin{array}{ccc} V_1^{'}V_1x_1x_1 & \cdots & V_1^{'}V_nx_1x_n \\ \vdots & V_i^{'}V_jx_ix_j & \vdots \\ V_n^{'}V_1x_nx_1 & \cdots & V_n^{'}V_nx_nx_n \end{array}\right]$$
$$V=\left[\begin{array}{c} V_1^{'}\\V_2^{'}\\\vdots\\V_n^{'}\end{array}\right]_{n\times k} =\left[\begin{array}{cccc} v_{11} & v_{12} & \cdots & v_{1k}\\v_{21} & v_{22} & \cdots & v_{2k}\\\vdots & \vdots & v_{ij} & \vdots\\v_{n1} & v_{n2} & \cdots & v_{nk}\end{array}\right]_{n\times k}$$
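As a quick sanity check on this structure, note that $T_{ij}=V_i^{'}V_j x_i x_j$ means $T=\mathrm{diag}(X)\,VV^{'}\,\mathrm{diag}(X)$, so the pairwise interaction term should equal the strict upper-triangular sum of $T$. The sketch below verifies this numerically; the sizes n=6, k=3 and the random data are arbitrary choices for illustration:

```python
import torch

torch.manual_seed(0)
n, k = 6, 3
x = torch.randn(n)
V = torch.randn(n, k)          # row i is V_i'

# T_{ij} = V_i' V_j x_i x_j, i.e. T = diag(x) (V V') diag(x)
T = torch.diag(x) @ (V @ V.t()) @ torch.diag(x)

# strict upper-triangular sum of T
upper = torch.triu(T, diagonal=1).sum()

# direct pairwise interaction sum over i < j
direct = sum(torch.dot(V[i], V[j]) * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n))

print(torch.allclose(upper, direct, atol=1e-5))
```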

Derivation:


$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle V_i,V_j\rangle x_ix_j
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\langle V_i,V_j\rangle x_ix_j - \frac{1}{2}\sum_{i=1}^{n}\langle V_i,V_i\rangle x_ix_i \\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{K}{v_{ik}v_{jk}x_ix_j} - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{K}{v_{ik}v_{ik}x_ix_i} \\
&= \frac{1}{2}\sum_{k=1}^{K}\Big[\sum_{i=1}^{n}(v_{ik}x_i)\sum_{j=1}^{n}(v_{jk}x_j)\Big] - \frac{1}{2}\sum_{k=1}^{K}\sum_{i=1}^{n}x_i^2v_{ik}^2 \\
&= \frac{1}{2}\sum_{k=1}^{K}\Big(\sum_{i=1}^{n}v_{ik}x_i\Big)^2 - \frac{1}{2}\sum_{k=1}^{K}\Big(\sum_{i=1}^{n}v_{ik}^2x_i^2\Big) \\
&= \frac{1}{2}\sum_{k=1}^{K}\Big[\Big(\sum_{i=1}^{n}v_{ik}x_i\Big)^2 - \sum_{i=1}^{n}v_{ik}^2x_i^2\Big]
\end{aligned}$$

We now rearrange this in terms of the feature dimension $n$ (covering both one-hot-encoded and dense features) and the dimension $k$ of the cross-term parameter matrix. Note: the model input is $input\in R^{n\times 1}$.
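The identity above, which reduces the pairwise sum from $O(n^2k)$ to $O(nk)$, can be checked numerically. A minimal sketch, assuming arbitrary sizes n=8, K=4 and random data:

```python
import torch

torch.manual_seed(1)
n, K = 8, 4
x = torch.randn(n)
V = torch.randn(n, K)          # V[i, k] = v_{ik}

# left-hand side: O(n^2 K) double loop over i < j
lhs = sum(torch.dot(V[i], V[j]) * x[i] * x[j]
          for i in range(n) for j in range(i + 1, n))

# right-hand side: O(n K) form,
# 1/2 * sum_k [(sum_i v_{ik} x_i)^2 - sum_i v_{ik}^2 x_i^2]
s = V.t() @ x                                    # s[k] = sum_i v_{ik} x_i
rhs = 0.5 * (s.pow(2).sum() - (V.pow(2).t() @ x.pow(2)).sum())

print(torch.allclose(lhs, rhs, atol=1e-5))
```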
Vectorized, with squares taken elementwise:

$$fm_1 = (v^{'}\cdot input)^2 \qquad v\in R^{n\times k},\quad fm_1\in R^{k\times 1}$$
$$fm_2 = (v^{'})^2\cdot input^2 \qquad fm_2\in R^{k\times 1}$$
$$out = W^{'}\cdot input + \frac{1}{2}\cdot\mathbf{1}(fm_1-fm_2) \qquad W\in R^{n\times 1}$$

where $\mathbf{1} = (1,1,\ldots,1)\in R^{1\times k}$. The parameters learned by the model are $W$ and $v$. The code is as follows:
import torch
import torch.nn as nn

class FM_model(nn.Module):
    def __init__(self, n, k):
        super(FM_model, self).__init__()
        self.n = n  # number of features, e.g. len(items) + len(users)
        self.k = k  # dimension of the latent factors
        self.linear = nn.Linear(self.n, 1, bias=True)       # w_0 + W'x
        self.v = nn.Parameter(torch.randn(self.k, self.n))  # cross-term factors

    def fm_layer(self, x):
        # x: (batch, n)
        linear_part = self.linear(x)  # (batch, 1)
        # (batch, n) x (n, k) -> (batch, k)
        inter_part1 = torch.mm(x, self.v.t())
        # elementwise squares: (batch, n) x (n, k) -> (batch, k)
        inter_part2 = torch.mm(torch.pow(x, 2), torch.pow(self.v, 2).t())
        # sum over the k latent dimensions, keeping the batch axis
        pair_interactions = torch.sum(torch.pow(inter_part1, 2) - inter_part2,
                                      dim=1, keepdim=True)
        output = linear_part + 0.5 * pair_interactions
        return output  # (batch, 1)

    def forward(self, x):
        return self.fm_layer(x)
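A minimal usage sketch follows. The layer is restated compactly here (as `FM`) so the snippet runs on its own; the sizes n=10, k=5 and the random batch are illustrative choices, not values from the text:

```python
import torch
import torch.nn as nn

# Restated compactly so this snippet runs standalone; mirrors FM_model above.
class FM(nn.Module):
    def __init__(self, n, k):
        super().__init__()
        self.linear = nn.Linear(n, 1, bias=True)    # w_0 + W'x
        self.v = nn.Parameter(torch.randn(k, n))    # latent factors, (k, n)

    def forward(self, x):                           # x: (batch, n)
        linear_part = self.linear(x)                            # (batch, 1)
        inter1 = torch.mm(x, self.v.t()).pow(2)                 # (batch, k)
        inter2 = torch.mm(x.pow(2), self.v.pow(2).t())          # (batch, k)
        return linear_part + 0.5 * (inter1 - inter2).sum(dim=1, keepdim=True)

torch.manual_seed(0)
model = FM(n=10, k=5)
batch = torch.randn(4, 10)    # 4 samples with 10 features each
out = model(batch)
print(out.shape)              # torch.Size([4, 1])
```

Each sample in the batch gets one scalar prediction, so the output shape is (batch, 1).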

2. Field-aware FM (FFM) (2015)


3. DeepFM (2017)