PyTorch Tutorials: (Prototype) FX Graph Mode Post Training Dynamic Quantization

doc : (prototype) FX Graph Mode Post Training Dynamic Quantization — PyTorch Tutorials 1.11.0 cu102 documentation

author : Jerry Zhang

May 24, 2022

tag : translation study

topic : PyTorch quantization


(prototype) FX Graph Mode Post Training Dynamic Quantization

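  • This tutorial introduces the steps for post training dynamic quantization in graph mode, based on torch.fx.
  • There is a separate tutorial for FX Graph Mode Post Training Static Quantization.
  • A comparison between FX Graph Mode Quantization and Eager Mode Quantization can be found in the quantization docs.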

tldr; The FX Graph Mode API for dynamic quantization looks like the following:

import torch
from torch.quantization import default_dynamic_qconfig
# Note that this is temporary, we'll expose these functions to torch.quantization after official release
from torch.quantization.quantize_fx import prepare_fx, convert_fx

float_model.eval()
# apply the dynamic qconfig imported above globally (the "" key)
qconfig_dict = {"": default_dynamic_qconfig}
prepared_model = prepare_fx(float_model, qconfig_dict)  # fuse modules and insert observers
# no calibration is required for dynamic quantization
quantized_model = convert_fx(prepared_model)  # convert the model to a dynamically quantized model

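In this tutorial, we apply dynamic quantization to an LSTM-based next-word prediction model, closely following the word language model from the PyTorch examples. We copy the code from Dynamic Quantization on an LSTM Word Language Model and omit the descriptions.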

    1. Define the Model, Download Data and Model

    Download the data and unzip it into the data folder:

    mkdir data
    cd data
    wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
    unzip wikitext-2-v1.zip

    Download the model to the data folder:

    wget https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth

    Define the model:

    # imports
    import os
    from io import open
    import time
    import copy

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Model Definition
    class LSTMModel(nn.Module):
        """Container module with an encoder, a recurrent module, and a decoder."""

        def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
            super(LSTMModel, self).__init__()
            self.drop = nn.Dropout(dropout)
            self.encoder = nn.Embedding(ntoken, ninp)
            self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
            self.decoder = nn.Linear(nhid, ntoken)

            self.init_weights()

            self.nhid = nhid
            self.nlayers = nlayers

        def init_weights(self):
            initrange = 0.1
            self.encoder.weight.data.uniform_(-initrange, initrange)
            self.decoder.bias.data.zero_()
            self.decoder.weight.data.uniform_(-initrange, initrange)

        def forward(self, input, hidden):
            emb = self.drop(self.encoder(input))
            output, hidden = self.rnn(emb, hidden)
            output = self.drop(output)
            decoded = self.decoder(output)
            return decoded, hidden


    def init_hidden(lstm_model, bsz):
        # get the weight tensor and create hidden layer in the same device
        weight = lstm_model.encoder.weight
        # get weight from quantized model
        if not isinstance(weight, torch.Tensor):
            weight = weight()
        device = weight.device
        nlayers = lstm_model.rnn.num_layers
        nhid = lstm_model.rnn.hidden_size
        return (torch.zeros(nlayers, bsz, nhid, device=device),
                torch.zeros(nlayers, bsz, nhid, device=device))


    # Load Text Data
    class Dictionary(object):
        def __init__(self):
            self.word2idx = {}
            self.idx2word = []
    
        def add_word(self, word):
            if word not in self.word2idx:
                self.idx2word.append(word)
                self.word2idx[word] = len(self.idx2word) - 1
            return self.word2idx[word]
    
        def __len__(self):
            return len(self.idx2word)
    
    
    class Corpus(object):
        def __init__(self, path):
            self.dictionary = Dictionary()
            self.train = self.tokenize(os.path.join(path, 'wiki.train.tokens'))
            self.valid = self.tokenize(os.path.join(path, 'wiki.valid.tokens'))
            self.test = self.tokenize(os.path.join(path, 'wiki.test.tokens'))
    
        def tokenize(self, path):
            """Tokenizes a text file."""
            assert os.path.exists(path)
            # Add words to the dictionary
            with open(path, 'r', encoding="utf8") as f:
                for line in f:
                    words = line.split() + ['<eos>']
                    for word in words:
                        self.dictionary.add_word(word)
    
            # Tokenize file content
            with open(path, 'r', encoding="utf8") as f:
                idss = []
                for line in f:
                    words = line.split() + ['<eos>']
                    ids = []
                    for word in words:
                        ids.append(self.dictionary.word2idx[word])
                    idss.append(torch.tensor(ids).type(torch.int64))
                ids = torch.cat(idss)
    
            return ids
    
    model_data_filepath = 'data/'
    
    corpus = Corpus(model_data_filepath + 'wikitext-2')
    
    ntokens = len(corpus.dictionary)
    
    # Load Pretrained Model
    model = LSTMModel(
        ntoken = ntokens,
        ninp = 512,
        nhid = 256,
        nlayers = 5,
    )
    
    model.load_state_dict(
        torch.load(
            model_data_filepath + 'word_language_model_quantize.pth',
            map_location=torch.device('cpu')
            )
        )
    
    model.eval()
    print(model)
    
    bptt = 25
    criterion = nn.CrossEntropyLoss()
    eval_batch_size = 1
    
    # create test data set
    def batchify(data, bsz):
        # Work out how cleanly we can divide the dataset into bsz parts.
        nbatch = data.size(0) // bsz
        # Trim off any extra elements that wouldn't cleanly fit (remainders).
        data = data.narrow(0, 0, nbatch * bsz)
        # Evenly divide the data across the bsz batches.
        return data.view(bsz, -1).t().contiguous()
    
    test_data = batchify(corpus.test, eval_batch_size)
    
    # Evaluation functions
    def get_batch(source, i):
        seq_len = min(bptt, len(source) - 1 - i)
        data = source[i:i+seq_len]
        target = source[i+1:i+1+seq_len].reshape(-1)
        return data, target
    
    def repackage_hidden(h):
      """Wraps hidden states in new Tensors, to detach them from their history."""
    
      if isinstance(h, torch.Tensor):
          return h.detach()
      else:
          return tuple(repackage_hidden(v) for v in h)
    
    def evaluate(model_, data_source):
        # Turn on evaluation mode which disables dropout.
        model_.eval()
        total_loss = 0.
        hidden = init_hidden(model_, eval_batch_size)
        with torch.no_grad():
            for i in range(0, data_source.size(0) - 1, bptt):
                data, targets = get_batch(data_source, i)
                output, hidden = model_(data, hidden)
                hidden = repackage_hidden(hidden)
                output_flat = output.view(-1, ntokens)
                total_loss += len(data) * criterion(output_flat, targets).item()
        return total_loss / (len(data_source) - 1)
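
To make the batching helpers above easier to follow, here is a minimal pure-Python sketch that traces them on a toy sequence. The names `batchify_list` and `get_batch_list` are hypothetical stand-ins that use nested lists instead of tensors; they are not part of the tutorial code.

```python
# Pure-Python stand-ins for batchify/get_batch (hypothetical helpers, lists instead of tensors).

def batchify_list(data, bsz):
    """Split `data` into `bsz` contiguous chunks and lay them out as columns."""
    nbatch = len(data) // bsz
    data = data[:nbatch * bsz]  # trim elements that do not fit evenly
    columns = [data[j * nbatch:(j + 1) * nbatch] for j in range(bsz)]
    # Row i holds the i-th token of every chunk, mirroring view(bsz, -1).t()
    return [[column[i] for column in columns] for i in range(nbatch)]


def get_batch_list(source, i, bptt):
    """Return up to `bptt` rows starting at row i, plus the flattened rows shifted by one."""
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = [tok for row in source[i + 1:i + 1 + seq_len] for tok in row]
    return data, target


rows = batchify_list(list(range(11)), bsz=2)   # the 11th element is trimmed
data, target = get_batch_list(rows, 0, bptt=3)
print("rows:  ", rows)
print("data:  ", data)
print("target:", target)
```

Each target entry is the token that follows the corresponding input token within its column, which is what the next-word prediction loss compares against.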
    

2. Post Training Dynamic Quantization

To dynamically quantize a model, we can use the same functions as in post training static quantization, but with a dynamic qconfig.

from torch.quantization.quantize_fx import prepare_fx, convert_fx
from torch.quantization import default_dynamic_qconfig, float_qparams_weight_only_qconfig

# Full docs for supported qconfig for floating point modules/ops can be found in docs for quantization (TODO: link)
# Full docs for qconfig_dict can be found in the documents of prepare_fx (TODO: link)
qconfig_dict = {
    "object_type": [
        (nn.Embedding, float_qparams_weight_only_qconfig),
        (nn.LSTM, default_dynamic_qconfig),
        (nn.Linear, default_dynamic_qconfig)
    ]
}
# Deepcopying the original model because quantization api changes the model inplace and we want
# to keep the original model for future comparison
model_to_quantize = copy.deepcopy(model)
prepared_model = prepare_fx(model_to_quantize, qconfig_dict)
print("prepared model:", prepared_model)
quantized_model = convert_fx(prepared_model)
print("quantized model", quantized_model)

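For dynamically quantized objects, prepare_fx does nothing for modules, but inserts observers for the weights of dynamically quantizable functionals and torch ops. It also fuses modules such as Conv + Bn and Linear + ReLU.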

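In convert_fx, the float modules are converted to dynamically quantized modules and the float ops are converted to dynamically quantized ops. In the example model, nn.Embedding, nn.Linear and nn.LSTM are dynamically quantized.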
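
As a rough illustration of what "dynamic" means here, the following pure-Python sketch quantizes weights once, ahead of time, and recomputes the quantization parameters for activations on every call from the range actually observed. It assumes a simple per-tensor affine scheme on the unsigned 8-bit range; the helper names are hypothetical, and PyTorch's actual kernels and observers differ in detail.

```python
# Minimal sketch of per-tensor affine quantization (assumed scheme, not PyTorch's exact kernels).

def choose_qparams(values, qmin=0, qmax=255):
    """Pick scale and zero point so that [min, max], extended to contain 0, maps onto [qmin, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:  # all values are zero
        scale = 1.0
    zero_point = qmin - round(lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers in [qmin, qmax]."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in qvalues]


# Weights are known in advance, so they are quantized once at conversion time.
weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
w_scale, w_zp = choose_qparams(weights)
w_q = quantize(weights, w_scale, w_zp)


def quantize_activation(activation):
    """Activations are only known at inference time: compute qparams per input."""
    scale, zp = choose_qparams(activation)
    return quantize(activation, scale, zp), scale, zp


a_q, a_scale, a_zp = quantize_activation([0.0, 0.5, 3.0])
print("quantized weights:", w_q)
print("round-trip weights:", dequantize(w_q, w_scale, w_zp))
print("activation scale chosen at runtime:", a_scale)
```

Because the activation range is measured per input, no calibration pass is needed, which matches the tutorial's note that dynamic quantization skips calibration.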

Now we can compare the size and runtime of the quantized model.

def print_size_of_model(model):
    torch.save(model.state_dict(), "temp.p")
    print('Size (MB):', os.path.getsize("temp.p")/1e6)
    os.remove('temp.p')

print_size_of_model(model)
print_size_of_model(quantized_model)

There is a 4x size reduction because we quantized all the weights in the model (nn.Embedding, nn.Linear and nn.LSTM) from float (4 bytes) to quantized int (1 byte).
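
The size reduction can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts parameters for the model configuration used above (ninp=512, nhid=256, nlayers=5). It assumes a vocabulary of 33278 tokens for WikiText-2, assumes biases stay in float32, and ignores the small overhead of stored scales and zero points, so the numbers are estimates rather than the exact file sizes printed by print_size_of_model.

```python
# Rough size estimate for the LSTM language model (assumptions noted in comments).

def lstm_param_counts(ninp, nhid, nlayers):
    """Return (weight_count, bias_count) for a multi-layer LSTM."""
    weights = 0
    biases = 0
    for layer in range(nlayers):
        in_features = ninp if layer == 0 else nhid
        weights += 4 * nhid * (in_features + nhid)  # W_ih and W_hh for the 4 gates
        biases += 2 * 4 * nhid                      # b_ih and b_hh
    return weights, biases


def model_param_counts(ntoken, ninp, nhid, nlayers):
    """Embedding + LSTM + Linear decoder, matching the LSTMModel layout."""
    lstm_w, lstm_b = lstm_param_counts(ninp, nhid, nlayers)
    weights = ntoken * ninp + lstm_w + nhid * ntoken
    biases = lstm_b + ntoken
    return weights, biases


NTOKEN = 33278  # assumption: WikiText-2 vocabulary size including <eos>
weights, biases = model_param_counts(NTOKEN, ninp=512, nhid=256, nlayers=5)

float_bytes = 4 * (weights + biases)   # everything stored as float32
quantized_bytes = weights + 4 * biases  # assumption: 1-byte weights, float32 biases
ratio = float_bytes / quantized_bytes

print("float model (MB):     {:.1f}".format(float_bytes / 1e6))
print("quantized model (MB): {:.1f}".format(quantized_bytes / 1e6))
print("estimated reduction:  {:.2f}x".format(ratio))
```

Under these assumptions the ratio lands slightly below 4, since the weights dominate the parameter count and the float32 biases account for only a small remainder.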

torch.set_num_threads(1)

def time_model_evaluation(model, test_data):
    s = time.time()
    loss = evaluate(model, test_data)
    elapsed = time.time() - s
    print('''loss: {0:.3f}\nelapsed time (seconds): {1:.1f}'''.format(loss, elapsed))

time_model_evaluation(model, test_data)
time_model_evaluation(quantized_model, test_data)

There is a roughly 2x speedup for this model. Also note that the speedup may vary depending on model, device, build, input batch sizes, threading etc.
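
Since a single timed run can be noisy, one option is to repeat the measurement and compare medians. The sketch below is a minimal, hypothetical helper built on the standard library; `slow_workload` and `fast_workload` are stand-ins for calls such as evaluate(model, test_data) and evaluate(quantized_model, test_data), not part of the tutorial.

```python
import statistics
import time


def median_seconds(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_fn, candidate_fn, repeats=5):
    """Ratio of median baseline time to median candidate time."""
    return median_seconds(baseline_fn, repeats) / median_seconds(candidate_fn, repeats)


# Stand-in workloads (hypothetical); replace with the model evaluations being compared.
def slow_workload():
    return sum(i * i for i in range(200_000))


def fast_workload():
    return sum(i * i for i in range(50_000))


print("estimated speedup: {:.2f}x".format(speedup(slow_workload, fast_workload)))
```

Keeping torch.set_num_threads(1) fixed, as the tutorial does, makes such comparisons more repeatable across runs.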

3. Conclusion

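This tutorial introduces the API for post training dynamic quantization in FX Graph Mode, which dynamically quantizes the same modules as Eager Mode Quantization.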
