Table of Contents
- Preface
- I. Formula Derivation
- II. Code
  - 1. LayerSetUp
  - 2. Reshape
  - 3. Forward
  - 4. Backward
  - 5. Complete code
- Summary
Preface
While converting an ONNX model to a Caffe model, I found that Caffe lacks some ONNX operators. Matrix multiplication is one example: by default Caffe provides it through InnerProductLayer. However, InnerProductLayer takes a single input matrix X, supplies the weight matrix W itself as an internal parameter blob (we cannot feed W in as an input), and then outputs Y (Y = WX). This is not the matrix multiplication we want. The operator we want takes two inputs, X and W, and produces the output Y through the operator itself, much like Y = np.matmul(X, W). In short, Caffe's default matrix multiplication operator, InnerProductLayer, differs from the operator we need:
- InnerProductLayer accepts only X as an input; we cannot supply the weight matrix W ourselves.
- The operator we want takes two inputs, X and W, and behaves like numpy's np.matmul.
I. Formula Derivation
Assume the loss function $l: \mathbb{R}^{M \times N} \rightarrow \mathbb{R}$ takes as input $Z \in \mathbb{R}^{M \times N}$ with $Z = XY$, where $X$ and $Y$ are the inputs of the matrix multiplication, $X \in \mathbb{R}^{M \times K}$, $Y \in \mathbb{R}^{K \times N}$, and $Z$ is therefore the output of the matrix multiplication. During backward propagation, the partial derivatives we need are $\frac{\partial l(Z)}{\partial X}$ and $\frac{\partial l(Z)}{\partial Y}$. So as long as we can compute $\frac{\partial l(Z)}{\partial X} \in \mathbb{R}^{M \times K}$ and $\frac{\partial l(Z)}{\partial Y} \in \mathbb{R}^{K \times N}$, we can derive the backward-pass formulas.

First consider $\frac{\partial l(Z)}{\partial X}$. By the chain rule, $\frac{\partial l(Z)}{\partial X} = \frac{\partial l(Z)}{\partial Z}\frac{\partial Z}{\partial X} = \frac{\partial l(Z)}{\partial Z}\frac{\partial (XY)}{\partial X} = \frac{\partial l(Z)}{\partial Z}\, Y^{T}$. Since $\frac{\partial l(Z)}{\partial X} \in \mathbb{R}^{M \times K}$, $\frac{\partial l(Z)}{\partial Z} \in \mathbb{R}^{M \times N}$ and $Y^{T} \in \mathbb{R}^{N \times K}$, the shapes on both sides are consistent. Translated into Caffe code, this is X.diff = Top.diff * Y.data.transpose().

Similarly, $\frac{\partial l(Z)}{\partial Y} = \frac{\partial l(Z)}{\partial Z}\frac{\partial Z}{\partial Y} = \frac{\partial l(Z)}{\partial Z}\frac{\partial (XY)}{\partial Y} = X^{T}\,\frac{\partial l(Z)}{\partial Z}$. Since $\frac{\partial l(Z)}{\partial Y} \in \mathbb{R}^{K \times N}$, $X^{T} \in \mathbb{R}^{K \times M}$ and $\frac{\partial l(Z)}{\partial Z} \in \mathbb{R}^{M \times N}$, the shapes are again consistent. In Caffe code, this is Y.diff = X.data.transpose() * Top.diff. A small worked example follows.
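To make the formulas concrete, here is a small worked example with $M = K = N = 2$ and $l(Z) = \sum_{m,n} Z_{mn}$, so that $\frac{\partial l}{\partial Z}$ is the all-ones matrix (the values are arbitrary and only serve as a sanity check):

$$
X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},\quad
Y = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix},\quad
Z = XY = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix},
$$

$$
\frac{\partial l}{\partial X} = \frac{\partial l}{\partial Z}\,Y^{T}
= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} 5 & 7 \\ 6 & 8 \end{bmatrix}
= \begin{bmatrix} 11 & 15 \\ 11 & 15 \end{bmatrix},\qquad
\frac{\partial l}{\partial Y} = X^{T}\,\frac{\partial l}{\partial Z}
= \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
  \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 4 \\ 6 & 6 \end{bmatrix}.
$$

Checking against the definition directly: $\frac{\partial l}{\partial X_{mk}} = \sum_n Y_{kn}$ (the row sums of $Y$) and $\frac{\partial l}{\partial Y_{kn}} = \sum_m X_{mk}$ (the column sums of $X$), which agrees with the matrices above.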
II. Code
1. LayerSetUp
template <typename Dtype>
void MatMulLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin setting up";
  // Use the number of axes to check that the two bottom blobs are valid.
  const int xaxes = bottom[0]->num_axes();
  LOG_IF(INFO, Caffe::root_solver()) << "xaxes: " << xaxes;
  const int yaxes = bottom[1]->num_axes();
  LOG_IF(INFO, Caffe::root_solver()) << "yaxes: " << yaxes;
  CHECK(xaxes >= 2)
      << "X blob must be of shape (B1xB2...xBn,M,K) or (M,K)!";
  CHECK(yaxes >= 2)
      << "Y blob must be of shape (B1xB2...xBn,K,N) or (K,N)!";
  CHECK(xaxes == yaxes)
      << "X blob must have the same number of axes as Y blob";
  // Get M, K and N.
  const int xK_ = bottom[0]->shape(-1);
  const int yK_ = bottom[1]->shape(-2);
  CHECK(xK_ == yK_)
      << "dimensions of X are incompatible with Y; they do not match the matmul formula";
  M_ = bottom[0]->shape(-2);
  K_ = xK_;
  N_ = bottom[1]->shape(-1);
}
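The post only shows the .cpp file. For completeness, here is a minimal sketch of what the matching header caffe/layers/mat_mul_layer.hpp (the name used by the include in the complete code below) could look like. The members M_, K_, N_ and the CPU method signatures are taken from the code in this post; everything else (include guard, blob-count hints, the GPU declarations required by the STUB_GPU macro) follows the usual Caffe layer boilerplate and is my assumption, not the author's original header.

```cpp
#ifndef CAFFE_MAT_MUL_LAYER_HPP_
#define CAFFE_MAT_MUL_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

// MatMulLayer: two bottoms X (..., M, K) and Y (..., K, N), one top Z = XY of
// shape (..., M, N).
template <typename Dtype>
class MatMulLayer : public Layer<Dtype> {
 public:
  explicit MatMulLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "MatMul"; }
  virtual inline int ExactNumBottomBlobs() const { return 2; }
  virtual inline int ExactNumTopBlobs() const { return 1; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  // Declared so that STUB_GPU(MatMulLayer) (or a .cu implementation) can
  // define them.
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  int M_, K_, N_;  // X is (..., M, K), Y is (..., K, N), Z is (..., M, N).
};

}  // namespace caffe

#endif  // CAFFE_MAT_MUL_LAYER_HPP_
```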
2. Reshape
template <typename Dtype>
void MatMulLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // Create the top blob: same shape as bottom[0] (X), except that the last
  // axis K is replaced by N, e.g. (B, M, K) x (B, K, N) -> (B, M, N).
  const int end_index = bottom[0]->num_axes() - 1;
  vector<int> top_shape = bottom[0]->shape();
  LOG_IF(INFO, Caffe::root_solver()) << "bottom shape end: " << top_shape[end_index];
  LOG_IF(INFO, Caffe::root_solver()) << "bottom shape end-1: " << top_shape[end_index - 1];
  N_ = bottom[1]->shape(-1);
  top_shape[end_index] = N_;
  LOG_IF(INFO, Caffe::root_solver()) << "top shape end: " << top_shape[end_index];
  LOG_IF(INFO, Caffe::root_solver()) << "top shape end-1: " << top_shape[end_index - 1];
  top[0]->Reshape(top_shape);
}
3. Forward
template <typename Dtype>
void MatMulLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin forward";
  // Get matrix1 (X), matrix2 (Y) and the output (Z).
  const Dtype* bottom_matrix1 = bottom[0]->cpu_data();
  const Dtype* bottom_matrix2 = bottom[1]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  // Number of elements to skip to move to the next 2-D matrix in each blob.
  const int botm1_stride = M_ * K_;
  const int botm2_stride = K_ * N_;
  const int top_stride = M_ * N_;
  // Product of the leading (batch) axes, if any.
  int batch_size = 1;
  const int xaxes = bottom[0]->num_axes();
  if (xaxes > 2) {
    batch_size = top[0]->count(0, top[0]->CanonicalAxisIndex(-2));
  }
  // For each matrix in the batch: Z = 1.0 * X * Y + 0.0 * Z.
  for (int batch = 0; batch < batch_size; batch++) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans,
                          CblasNoTrans,
                          M_, N_, K_,
                          (Dtype)1.,
                          bottom_matrix1 + botm1_stride * batch,
                          bottom_matrix2 + botm2_stride * batch,
                          (Dtype)0.,
                          top_data + top_stride * batch);
  }
}
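For readers who have not used caffe_cpu_gemm before: it wraps BLAS GEMM and computes C = alpha * op(A) * op(B) + beta * C, where op() applies the requested transpose, op(A) is M×K, op(B) is K×N and C is M×N; so each call above computes Z = 1.0 * X * Y + 0.0 * Z for one matrix of the batch. The following self-contained sketch (plain C++, no Caffe dependency; the function name and values are illustrative only) spells out the same batched computation with explicit loops, which can be handy for checking the stride bookkeeping:

```cpp
// Reference sketch of what Forward_cpu computes: for each matrix in the batch,
// Z = X * Y with X (M x K), Y (K x N), Z (M x N). Not part of the layer itself.
#include <cstdio>
#include <vector>

static void batched_matmul(const std::vector<float>& X, const std::vector<float>& Y,
                           std::vector<float>& Z, int batch, int M, int K, int N) {
  for (int b = 0; b < batch; ++b) {
    const float* x = X.data() + b * M * K;   // same stride bookkeeping as botm1_stride
    const float* y = Y.data() + b * K * N;   // botm2_stride
    float* z = Z.data() + b * M * N;         // top_stride
    for (int m = 0; m < M; ++m) {
      for (int n = 0; n < N; ++n) {
        float acc = 0.f;
        for (int k = 0; k < K; ++k) acc += x[m * K + k] * y[k * N + n];
        z[m * N + n] = acc;                  // equivalent to alpha = 1, beta = 0 in GEMM
      }
    }
  }
}

int main() {
  const int batch = 2, M = 2, K = 3, N = 2;
  std::vector<float> X(batch * M * K), Y(batch * K * N), Z(batch * M * N);
  for (size_t i = 0; i < X.size(); ++i) X[i] = static_cast<float>(i % 5);
  for (size_t i = 0; i < Y.size(); ++i) Y[i] = static_cast<float>(i % 3);
  batched_matmul(X, Y, Z, batch, M, K, N);
  std::printf("Z[0][0][0] = %.1f\n", Z[0]);
  return 0;
}
```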
4. Backward
template <typename Dtype>
void MatMulLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin backward";
  // Number of elements to skip to move to the next 2-D matrix in each blob.
  const int botm1_stride = M_ * K_;
  const int botm2_stride = K_ * N_;
  const int top_stride = M_ * N_;
  // Product of the leading (batch) axes, if any.
  int batch_size = 1;
  const int xaxes = bottom[0]->num_axes();
  if (xaxes > 2) {
    batch_size = top[0]->count(0, top[0]->CanonicalAxisIndex(-2));
  }
  const Dtype* top_diff = top[0]->cpu_diff();
  const Dtype* bottom_1 = bottom[0]->cpu_data();
  const Dtype* bottom_2 = bottom[1]->cpu_data();
  Dtype* botm_1_diff = bottom[0]->mutable_cpu_diff();
  Dtype* botm_2_diff = bottom[1]->mutable_cpu_diff();
  if (propagate_down[0]) {
    // X.diff = Z.diff * Y.data.transpose()
    for (int batch = 0; batch < batch_size; batch++) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans,
                            CblasTrans,
                            M_, K_, N_,
                            (Dtype)1.,
                            top_diff + top_stride * batch,
                            bottom_2 + botm2_stride * batch,
                            (Dtype)0.,
                            botm_1_diff + botm1_stride * batch);
    }
  }
  if (propagate_down[1]) {
    // Y.diff = X.data.transpose() * Z.diff
    for (int batch = 0; batch < batch_size; batch++) {
      caffe_cpu_gemm<Dtype>(CblasTrans,
                            CblasNoTrans,
                            K_, N_, M_,
                            (Dtype)1.,
                            bottom_1 + botm1_stride * batch,
                            top_diff + top_stride * batch,
                            (Dtype)0.,
                            botm_2_diff + botm2_stride * batch);
    }
  }
}
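A natural way to validate the backward pass is Caffe's numerical gradient checker. Below is a minimal sketch of a gtest-style test, assuming Caffe's standard test harness (caffe/test/test_gradient_check_util.hpp, built as part of the test suite which supplies gtest's main); the blob shapes, filler settings, tolerances and test name are illustrative only, so treat this as a starting point rather than part of the original post.

```cpp
#include <vector>

#include "gtest/gtest.h"

#include "caffe/blob.hpp"
#include "caffe/filler.hpp"
#include "caffe/layers/mat_mul_layer.hpp"

#include "caffe/test/test_gradient_check_util.hpp"

namespace caffe {

TEST(MatMulLayerSimpleTest, TestGradient) {
  typedef float Dtype;
  // X: (2, 3, 4), Y: (2, 4, 5)  ->  Z: (2, 3, 5)
  vector<int> x_shape, y_shape;
  x_shape.push_back(2); x_shape.push_back(3); x_shape.push_back(4);
  y_shape.push_back(2); y_shape.push_back(4); y_shape.push_back(5);
  Blob<Dtype> blob_x(x_shape), blob_y(y_shape), blob_top;

  // Fill the two inputs with random Gaussian values.
  FillerParameter filler_param;
  GaussianFiller<Dtype> filler(filler_param);
  filler.Fill(&blob_x);
  filler.Fill(&blob_y);

  vector<Blob<Dtype>*> bottom, top;
  bottom.push_back(&blob_x);
  bottom.push_back(&blob_y);
  top.push_back(&blob_top);

  LayerParameter layer_param;
  MatMulLayer<Dtype> layer(layer_param);
  // Compares Backward_cpu against finite-difference gradients of Forward_cpu.
  GradientChecker<Dtype> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, bottom, top);
}

}  // namespace caffe
```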
5. Complete code
#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layers/mat_mul_layer.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
void MatMulLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin setting up";
  // Use the number of axes to check that the two bottom blobs are valid.
  const int xaxes = bottom[0]->num_axes();
  LOG_IF(INFO, Caffe::root_solver()) << "xaxes: " << xaxes;
  const int yaxes = bottom[1]->num_axes();
  LOG_IF(INFO, Caffe::root_solver()) << "yaxes: " << yaxes;
  CHECK(xaxes >= 2)
      << "X blob must be of shape (B1xB2...xBn,M,K) or (M,K)!";
  CHECK(yaxes >= 2)
      << "Y blob must be of shape (B1xB2...xBn,K,N) or (K,N)!";
  CHECK(xaxes == yaxes)
      << "X blob must have the same number of axes as Y blob";
  // Get M, K and N.
  const int xK_ = bottom[0]->shape(-1);
  const int yK_ = bottom[1]->shape(-2);
  CHECK(xK_ == yK_)
      << "dimensions of X are incompatible with Y; they do not match the matmul formula";
  M_ = bottom[0]->shape(-2);
  K_ = xK_;
  N_ = bottom[1]->shape(-1);
}
template <typename Dtype>
void MatMulLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  // Create the top blob: same shape as bottom[0] (X), except that the last
  // axis K is replaced by N, e.g. (B, M, K) x (B, K, N) -> (B, M, N).
  const int end_index = bottom[0]->num_axes() - 1;
  vector<int> top_shape = bottom[0]->shape();
  LOG_IF(INFO, Caffe::root_solver()) << "bottom shape end: " << top_shape[end_index];
  LOG_IF(INFO, Caffe::root_solver()) << "bottom shape end-1: " << top_shape[end_index - 1];
  N_ = bottom[1]->shape(-1);
  top_shape[end_index] = N_;
  LOG_IF(INFO, Caffe::root_solver()) << "top shape end: " << top_shape[end_index];
  LOG_IF(INFO, Caffe::root_solver()) << "top shape end-1: " << top_shape[end_index - 1];
  top[0]->Reshape(top_shape);
}
template <typename Dtype>
void MatMulLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin forward";
  // Get matrix1 (X), matrix2 (Y) and the output (Z).
  const Dtype* bottom_matrix1 = bottom[0]->cpu_data();
  const Dtype* bottom_matrix2 = bottom[1]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  // Number of elements to skip to move to the next 2-D matrix in each blob.
  const int botm1_stride = M_ * K_;
  const int botm2_stride = K_ * N_;
  const int top_stride = M_ * N_;
  // Product of the leading (batch) axes, if any.
  int batch_size = 1;
  const int xaxes = bottom[0]->num_axes();
  if (xaxes > 2) {
    batch_size = top[0]->count(0, top[0]->CanonicalAxisIndex(-2));
  }
  // For each matrix in the batch: Z = 1.0 * X * Y + 0.0 * Z.
  for (int batch = 0; batch < batch_size; batch++) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans,
                          CblasNoTrans,
                          M_, N_, K_,
                          (Dtype)1.,
                          bottom_matrix1 + botm1_stride * batch,
                          bottom_matrix2 + botm2_stride * batch,
                          (Dtype)0.,
                          top_data + top_stride * batch);
  }
}
template <typename Dtype>
void MatMulLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  LOG_IF(INFO, Caffe::root_solver()) << "begin backward";
  // Number of elements to skip to move to the next 2-D matrix in each blob.
  const int botm1_stride = M_ * K_;
  const int botm2_stride = K_ * N_;
  const int top_stride = M_ * N_;
  // Product of the leading (batch) axes, if any.
  int batch_size = 1;
  const int xaxes = bottom[0]->num_axes();
  if (xaxes > 2) {
    batch_size = top[0]->count(0, top[0]->CanonicalAxisIndex(-2));
  }
  const Dtype* top_diff = top[0]->cpu_diff();
  const Dtype* bottom_1 = bottom[0]->cpu_data();
  const Dtype* bottom_2 = bottom[1]->cpu_data();
  Dtype* botm_1_diff = bottom[0]->mutable_cpu_diff();
  Dtype* botm_2_diff = bottom[1]->mutable_cpu_diff();
  if (propagate_down[0]) {
    // X.diff = Z.diff * Y.data.transpose()
    for (int batch = 0; batch < batch_size; batch++) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans,
                            CblasTrans,
                            M_, K_, N_,
                            (Dtype)1.,
                            top_diff + top_stride * batch,
                            bottom_2 + botm2_stride * batch,
                            (Dtype)0.,
                            botm_1_diff + botm1_stride * batch);
    }
  }
  if (propagate_down[1]) {
    // Y.diff = X.data.transpose() * Z.diff
    for (int batch = 0; batch < batch_size; batch++) {
      caffe_cpu_gemm<Dtype>(CblasTrans,
                            CblasNoTrans,
                            K_, N_, M_,
                            (Dtype)1.,
                            bottom_1 + botm1_stride * batch,
                            top_diff + top_stride * batch,
                            (Dtype)0.,
                            botm_2_diff + botm2_stride * batch);
    }
  }
}
#ifdef CPU_ONLY
STUB_GPU(MatMulLayer);
#endif
INSTANTIATE_CLASS(MatMulLayer);
REGISTER_LAYER_CLASS(MatMul);
}
Summary
That is all there is to the Caffe matrix multiplication operator. The GPU implementation is the same as the CPU one except that the cpu-suffixed function names are swapped for their gpu counterparts, so it is not covered in detail here; see the sketch below.
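As a concrete illustration of that remark, here is a minimal sketch of what the Forward_gpu in a mat_mul_layer.cu might look like, mirroring Forward_cpu with cpu_data/caffe_cpu_gemm swapped for gpu_data/caffe_gpu_gemm. The actual .cu file is not shown in the post, so this is my reconstruction rather than the author's code.

```cpp
#include <vector>

#include "caffe/layers/mat_mul_layer.hpp"
#include "caffe/util/math_functions.hpp"

namespace caffe {

template <typename Dtype>
void MatMulLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_matrix1 = bottom[0]->gpu_data();
  const Dtype* bottom_matrix2 = bottom[1]->gpu_data();
  Dtype* top_data = top[0]->mutable_gpu_data();
  const int botm1_stride = M_ * K_;
  const int botm2_stride = K_ * N_;
  const int top_stride = M_ * N_;
  int batch_size = 1;
  if (bottom[0]->num_axes() > 2) {
    batch_size = top[0]->count(0, top[0]->CanonicalAxisIndex(-2));
  }
  for (int batch = 0; batch < batch_size; ++batch) {
    // caffe_gpu_gemm has the same interface as caffe_cpu_gemm but runs cuBLAS.
    caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans,
                          M_, N_, K_,
                          (Dtype)1.,
                          bottom_matrix1 + botm1_stride * batch,
                          bottom_matrix2 + botm2_stride * batch,
                          (Dtype)0.,
                          top_data + top_stride * batch);
  }
}

// Backward_gpu follows the CPU version in the same way (caffe_gpu_gemm instead
// of caffe_cpu_gemm), followed by INSTANTIATE_LAYER_GPU_FUNCS(MatMulLayer);

}  // namespace caffe
```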