position_ids = position_ids.view(-1, seq_length).long() RuntimeError: shape '[-1, 0]' is invalid for input of size 1073. It seems to be a problem in flash-attn, although I followed your instructions ...
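The shape `[-1, 0]` in that message indicates that `seq_length` was 0 when `view` was called, so the `-1` dimension could not be resolved for a tensor of 1073 elements. A minimal, dependency-free sketch of that rule follows. The helper `infer_view_shape` is hypothetical and only approximates the shape check behind `Tensor.view`; it is not PyTorch or flash-attn code.

```python
from math import prod


def infer_view_shape(numel, shape):
    """Resolve a single -1 in `shape` so the result holds `numel` elements.

    Hypothetical helper: an approximation of the rule, not PyTorch's code.
    """
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension can be inferred")
    # Product of the explicitly given dimensions (1 if there are none).
    known = prod(d for d in shape if d != -1)
    if unknown:
        # The -1 can only be resolved when the known dimensions are
        # non-zero and divide the element count evenly.
        if known > 0 and numel % known == 0:
            resolved = list(shape)
            resolved[unknown[0]] = numel // known
            return resolved
    elif known == numel:
        return list(shape)
    raise ValueError(
        f"shape '{list(shape)}' is invalid for input of size {numel}"
    )


# A non-zero sequence length that divides 1073 resolves cleanly.
print(infer_view_shape(1073, (-1, 29)))

# A sequence length of 0 reproduces the failure from the report.
try:
    infer_view_shape(1073, (-1, 0))
except ValueError as err:
    print(err)
```

Under this reading, the thing to check is why `seq_length` is 0 at the call site (for example, which tensor's shape it is derived from), not the `view` call itself.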
PyTorch's CUDA support speeds up computation by running it in parallel on NVIDIA GPUs. Deep learning models are trained using GPU memory, which stores ...
from transformers import AutoConfig
import torch
from flash_attn.models.gpt_neox import gpt_neox_config_to_gpt2_config
from flash_attn.models.gpt import GPTModel
...