Llama 3.1 8B Load Failure: torch._tensor split Error in vLLM
Hey guys,
I'm encountering an issue while trying to load Llama 3.1 8B using vLLM, and I'm hoping someone can help me out. Specifically, I'm getting an `AttributeError: module 'torch._tensor' has no attribute 'split'` error. This happens with both v0.9.2 and v0.10.0. Let's dive into the details, environment, and the error itself.
Understanding the Llama3.1 8B Loading Bug
When dealing with large language models like Llama 3.1 8B, loading issues can be quite a headache. The error message `AttributeError: module 'torch._tensor' has no attribute 'split'` indicates that there's a problem with how PyTorch, the underlying deep learning framework, is handling tensor operations within vLLM. This can stem from various factors, including version incompatibilities, CUDA issues, or incorrect configurations. To get this sorted, let's dig into the environment and the traceback to pinpoint the exact cause.
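To see why the attribute lookup fails, here's a quick check. This is a minimal sketch and the exact behavior depends on the PyTorch build, but on recent versions `torch._tensor` is an internal module that defines the `Tensor` class rather than exposing tensor operations directly:

```python
import torch
import torch._tensor  # internal module; not a stable API

# The public split API lives on the torch namespace and on Tensor instances...
print(hasattr(torch, "split"))          # True
print(hasattr(torch.Tensor, "split"))   # True

# ...but there is typically no module-level split on torch._tensor,
# which is exactly the lookup the traceback shows failing.
print(hasattr(torch._tensor, "split"))  # expected: False
```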
Environment Details
First, let’s take a look at the environment details. Knowing the specifics of the system, PyTorch installation, CUDA setup, and library versions is crucial for troubleshooting. Here’s a breakdown of the key components:
System Information
- OS: Ubuntu 22.04.5 LTS (x86_64)
- GCC Version: 11.4.0
- Python Version: 3.11.13
- CPU: AMD Ryzen Threadripper PRO 5995WX 64-Cores
- GPUs: 4x NVIDIA RTX A5000
- NVIDIA Driver Version: 570.153.02
This setup indicates a high-performance computing environment, which is excellent for running large models. However, it also means that any misconfiguration could lead to significant issues.
PyTorch and CUDA
- PyTorch Version: 2.7.0+cu126
- CUDA Version: 12.4.131
- CUDA Module Loading: LAZY
One thing worth flagging: the build tag `2.7.0+cu126` means this PyTorch wheel was built against CUDA 12.6, while the local toolkit reports CUDA 12.4.131. PyTorch wheels bundle their own CUDA runtime, so this mismatch is usually harmless as long as the driver is new enough, but the packages that ship alongside PyTorch do need to match the torch release. In particular, torch 2.7.0 expects Triton 3.3.x, and as the package list below shows, this environment has Triton 2.3.0, which pairs with the much older torch 2.3 series.
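Before changing anything, it's worth confirming what the environment actually reports. Here's a small diagnostic sketch; run it in the same Python environment that launches vLLM (the expected values in the comments are taken from the report above):

```python
import torch
import triton

print("torch:", torch.__version__)           # reported above as 2.7.0+cu126
print("CUDA build:", torch.version.cuda)     # the CUDA version the wheel was built with
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
print("triton:", triton.__version__)         # reported above as 2.3.0
```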
Python Packages
Here’s a snippet of the relevant Python packages installed:
```
[pip3] torch==2.7.0
[pip3] transformers==4.53.3
[pip3] triton==2.3.0
[conda] torch==2.7.0
[conda] transformers==4.53.3
[conda] triton==2.3.0
```
The transformers version (4.53.3) is recent and supports the Llama 3.1 architecture, so it's unlikely to be the problem. The entry that stands out is `triton==2.3.0`: Triton is the compiler backend that TorchDynamo/Inductor uses to generate GPU kernels, and torch 2.7.0 ships pinned to Triton 3.3.x. A stale Triton left over from an earlier torch install is a plausible trigger for the backend compilation failure we'll see in the traceback.
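Rather than guessing, you can let the installed package metadata confirm the mismatch. This sketch assumes a standard pip install of torch; running `pip check` from the shell should flag the same conflict:

```python
from importlib.metadata import requires, version

print("installed triton:", version("triton"))   # 2.3.0 in this environment

# torch's wheel metadata declares which triton version it was built for.
for req in requires("torch") or []:
    if req.lower().startswith("triton"):
        print("torch declares:", req)
```

If the declared requirement and the installed version disagree, reinstalling torch (or installing the Triton version it declares) is the first thing to try.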
vLLM Information
- vLLM Version: 0.9.2
- CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
The vLLM version is current, but `CUDA Archs: Not Set` is worth noting. Explicitly setting the target CUDA architecture can resolve compatibility issues when kernels are compiled at build or run time.
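If vLLM is rebuilt from source, or any extension gets JIT-compiled at runtime, the architecture can be pinned explicitly. The RTX A5000 is an Ampere GA102 part, compute capability 8.6, so a sketch under that assumption looks like this (set it before anything compiles; it's harmless if nothing does):

```python
import os

# Pin the CUDA architecture for any torch extension compiled in this process.
# RTX A5000 = Ampere GA102, compute capability 8.6.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.6"
```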
Decoding the Error: AttributeError: module 'torch._tensor' has no attribute 'split'
The core of the issue lies in this error message: `AttributeError: module 'torch._tensor' has no attribute 'split'`. Let’s break this down.
What Does This Error Mean?
This error indicates that the `torch._tensor` module, which is part of PyTorch’s internal API, doesn't have a function or attribute named `split`. The `split` function is typically used to divide a tensor into multiple chunks along a dimension. This operation is common in neural network architectures, especially in attention mechanisms or when splitting layers for parallel processing.
Why Is This Happening?
- Deprecated or Missing Internal: `torch._tensor.split` may have been removed, or never exposed, in the PyTorch version in use (2.7.0). PyTorch's internal APIs (those starting with `_`) are not guaranteed to be stable across versions.
- Incorrect Usage: Something in the vLLM codebase, or in code generated from it, may be resolving `split` against the wrong module.
- TorchDynamo Compilation Issue: TorchDynamo, PyTorch’s JIT compiler front end, might be mishandling the compilation of the `split` operation. This is suggested by the traceback, which includes `torch._dynamo.exc.BackendCompilerFailed`, and it is exactly the path a mismatched Triton would break; a quick way to test this hypothesis is sketched below.
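Since the traceback points at the compilation backend, one experiment is to take Dynamo out of the picture entirely. vLLM's `LLM` constructor accepts `enforce_eager=True`, which skips graph compilation; the model name below is a placeholder for whatever checkpoint is actually being loaded:

```python
from vllm import LLM

# Run the model in eager mode to bypass the torch.compile / Dynamo path.
# "meta-llama/Llama-3.1-8B-Instruct" is assumed here; substitute your checkpoint.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enforce_eager=True)
```

If loading succeeds in eager mode, the bug is confined to the compilation path, which strengthens the Triton-mismatch theory above.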
Analyzing the Traceback
The traceback provides a roadmap of where the error occurred in the code execution. Let’s walk through the key parts:
- vLLM Initialization: The error starts during the initialization of the `LLM` class from vLLM: `llm = LLM(model=`