
Multiple deserialization vulnerabilities in PyTorch Lightning 2.4.0 and earlier versions



Vulnerability Note VU#252619

Original Release Date: 2025-04-03 | Last Revised: 2025-04-03

  

Overview

PyTorch Lightning versions 2.4.0 and earlier do not use any verification mechanisms to ensure that model files are safe to load before loading them. Users of PyTorch Lightning should use caution when loading models from unknown or unmanaged sources.

Description

PyTorch Lightning, a high-level framework built on top of PyTorch, is designed to streamline deep learning model training, scaling, and deployment. PyTorch Lightning is widely used in AI research and production environments, often integrating with various cloud and distributed computing platforms to manage large-scale machine learning workloads.

PyTorch Lightning contains multiple vulnerabilities related to the deserialization of untrusted data (CWE-502). These vulnerabilities arise from the unsafe use of torch.load(), which is used to deserialize model checkpoints, configurations, and sometimes metadata. While torch.load() provides an optional weights_only=True parameter to mitigate the risks of loading arbitrary code, PyTorch Lightning does not require or enforce this safeguard as a principal security requirement for the product.
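As a rough illustration of the safeguard described above (not code taken from PyTorch Lightning itself, and with a placeholder file name), the sketch below contrasts a call to torch.load() that unpickles arbitrary objects with one that passes weights_only=True:

```python
# Illustrative sketch only; "checkpoint.ckpt" is a placeholder path.
import torch

# Unsafe pattern: with weights_only left at its historical default of False,
# torch.load() unpickles arbitrary Python objects, so a crafted checkpoint
# can execute code during deserialization.
state = torch.load("checkpoint.ckpt", map_location="cpu")

# Safer pattern: weights_only=True restricts deserialization to tensors and
# other allow-listed types and raises an error on anything else.
state = torch.load("checkpoint.ckpt", map_location="cpu", weights_only=True)
```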

Kasimir Schulz of HiddenLayer identified and reported the following five vulnerabilities:

  1. The DeepSpeed integration in PyTorch Lightning loads optimizer states and model checkpoints without enforcing safe deserialization practices. It does not validate the integrity or origin of serialized data before passing it to torch.load(), allowing deserialization of arbitrary objects.
  2. The PickleSerializer class directly uses Python’s pickle module to serialize and deserialize data. Because pickle can execute embedded code during deserialization, any untrusted or manipulated input processed by this class can introduce security risks (a minimal illustration of this behavior follows the list).
  3. The _load_distributed_checkpoint component is responsible for handling distributed training checkpoints. It processes model state data across multiple nodes, but it does not include safeguards to verify or restrict the content being deserialized.
  4. The _lazy_load function is designed to defer loading of model components for efficiency. However, it does not enforce security controls on the serialized input, allowing for the potential deserialization of unverified objects.
  5. The Cloud_IO module facilitates storage and retrieval of model files from local and remote sources. It provides multiple deserialization pathways, such as handling files from disk, from remote servers, and from in-memory byte streams, without applying constraints on how the serialized data is interpreted.
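
To make the risk described in item 2 concrete, the minimal, self-contained sketch below shows how Python’s pickle module can execute code during deserialization; none of the class or payload names here come from PyTorch Lightning.

```python
# Minimal demonstration of unsafe deserialization (CWE-502) with pickle.
# Illustrative only; it does not use any PyTorch Lightning APIs.
import pickle


class Malicious:
    # pickle uses __reduce__ to decide how to rebuild the object; returning a
    # callable and its arguments means that callable runs inside pickle.loads().
    def __reduce__(self):
        import os
        return (os.system, ("echo code executed during deserialization",))


payload = pickle.dumps(Malicious())

# Any code path that unpickles attacker-controlled bytes runs the embedded call.
pickle.loads(payload)
```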

Impact

A user could unknowingly load a malicious file from a local or remote source; code embedded in the file would then execute within the system’s context, potentially leading to full system compromise.
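
Consistent with the cautionary note in the Overview, the sketch below shows one precaution a user could apply before loading a checkpoint from an unmanaged source; it assumes a trusted digest for the file is available through some other channel, and the digest value and file name are placeholders.

```python
# Sketch of a cautionary check before loading a checkpoint from an unmanaged
# source. EXPECTED_SHA256 and the file name are placeholders; obtain the real
# digest from a trusted channel.
import hashlib

import torch

EXPECTED_SHA256 = "<digest published by a trusted source>"


def load_verified_checkpoint(path: str):
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != EXPECTED_SHA256:
        raise ValueError(f"Unexpected checkpoint digest for {path}: {digest}")
    # weights_only=True further limits what torch.load() will deserialize.
    return torch.load(path, map_location="cpu", weights_only=True)


state = load_verified_checkpoint("checkpoint.ckpt")
```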

 

