Introduction to Safetensors
Safetensors is a library designed to provide a lightweight, secure, and efficient way of serializing and deserializing machine learning tensors. It prioritizes speed and safety, eliminating the arbitrary-code-execution risks associated with pickle and other insecure formats.
Whether you’re developing machine learning models, working with pre-trained transformers, or managing complex workflows, Safetensors makes it both easy and reliable to save and load your tensor data.
Why Safetensors?
- Security: Eliminates issues related to unauthorized code execution that arise with formats like pickle.
- Speed: Engineered for fast serialization and deserialization.
- Cross-compatibility: Operates seamlessly across Python and Rust ecosystems.
Key Safetensors APIs Explained with Examples
Let’s dive into some of the most essential APIs offered by Safetensors and learn how to use them in your workflows.
1. Saving a Tensor
The `save_file` function is used to save tensors securely into a `.safetensors` file.
```python
from safetensors.numpy import save_file
import numpy as np

# Example data
tensors = {
    "tensor1": np.random.rand(3, 3),
    "tensor2": np.random.rand(5),
}

# Save tensors to safetensors format
save_file(tensors, "data.safetensors")
```
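The `.safetensors` format itself is deliberately simple: an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by that header and then the raw tensor bytes. As a rough sketch of why this is safe to parse, here is a tiny file written and read by hand with only the standard library (the tensor name `"example"` and its values are made up for illustration):

```python
import json
import struct

# Build a minimal .safetensors file by hand: one 2x2 float32 tensor
# named "example" (a made-up name for illustration).
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # 16 raw bytes
header = {
    "example": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}
}
header_bytes = json.dumps(header).encode("utf-8")

with open("handmade.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
    f.write(header_bytes)
    f.write(data)

# Read the header back without touching the tensor bytes
with open("handmade.safetensors", "rb") as f:
    (n,) = struct.unpack("<Q", f.read(8))
    parsed = json.loads(f.read(n))

print(parsed["example"]["shape"])  # [2, 2]
```

Because loading only ever means reading lengths and JSON, there is no code to execute, which is the core of the security claim above.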
2. Loading a Tensor
The `load_file` function allows you to load tensors from a `.safetensors` file.
```python
from safetensors.numpy import load_file

# Load tensors back from the file
loaded_tensors = load_file("data.safetensors")

# Access the loaded tensors
print(loaded_tensors["tensor1"])
print(loaded_tensors["tensor2"])
```
3. Metadata Support
Safetensors supports adding custom metadata to your serialized files, which can be critical for identifying the model version or other key information.
```python
import numpy as np
from safetensors.numpy import save_file

# Tensors with metadata
tensors = {"weights": np.random.rand(10, 10)}
metadata = {"model_name": "MyModel", "version": "1.0"}

# Save with metadata (both keys and values must be strings)
save_file(tensors, "model_metadata.safetensors", metadata=metadata)
```
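One caveat worth noting: safetensors metadata is a flat string-to-string mapping, so structured values such as numbers or lists must be encoded yourself, for example as JSON. A minimal sketch of that pattern (the `config` contents are made up for illustration):

```python
import json

# safetensors metadata values must be strings, so serialize structured
# data to JSON before saving and decode it after loading.
config = {"layers": 12, "hidden_size": 768, "tags": ["bert", "base"]}
metadata = {"model_name": "MyModel", "config": json.dumps(config)}

# ... pass `metadata` to save_file(...) exactly as shown above ...

# After reading the metadata back, decode the JSON field:
restored = json.loads(metadata["config"])
print(restored["hidden_size"])  # 768
```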
4. Checking Metadata
Easily read metadata from a `.safetensors` file without loading the tensor values.
```python
from safetensors import safe_open

# Read the metadata from the header without loading any tensor data
with safe_open("model_metadata.safetensors", framework="numpy") as f:
    metadata = f.metadata()

print(metadata)
```
5. Integration with PyTorch
Safetensors integrates seamlessly with PyTorch tensors:
```python
import torch
from safetensors.torch import save_file, load_file

# Example PyTorch tensor
pytorch_tensor = torch.rand(3, 3)

# Save it to file
tensors = {"pytorch_tensor": pytorch_tensor}
save_file(tensors, "pytorch_data.safetensors")

# Load it back
loaded_tensors = load_file("pytorch_data.safetensors")
print(loaded_tensors["pytorch_tensor"].shape)
```
Application Example: Saving and Loading Transformer Models
Here’s an example where Safetensors is used to save and load a transformer model effectively:
```python
from transformers import AutoModel
from safetensors.torch import save_file, load_file

# Load a pre-trained transformer model
model = AutoModel.from_pretrained("bert-base-uncased")

# Serialize model weights (detached, on CPU, contiguous)
tensors = {
    name: param.detach().cpu().contiguous()
    for name, param in model.state_dict().items()
}
save_file(tensors, "transformer_model.safetensors")

# Deserialize model weights (already returned as torch tensors)
loaded_tensors = load_file("transformer_model.safetensors")

# Apply the weights back to the model
model.load_state_dict(loaded_tensors, strict=False)
```
With Safetensors, you can streamline your model storage process while ensuring security and efficiency.
Conclusion
Safetensors is a practical tool for serializing machine learning models quickly and safely. By understanding and using its APIs, you can simplify your workflows, avoid the security risks of pickle-based formats, and speed up model loading.
Try Safetensors today and experience a safer and faster way to manage your models!