intel/intel-extension-for-pytorch
50K+
Intel® Extension for PyTorch* extends PyTorch* with up-to-date feature optimizations for an extra performance boost on Intel hardware.
On Intel CPUs optimizations take advantage of the following instuction sets:
On Intel GPUs Intel® Extension for PyTorch* provides easy GPU acceleration through the PyTorch* xpu
device. The following Intel GPUs are supported:
Images available here start with the Ubuntu* 22.04 base image with Intel® Extension for PyTorch* built for different use cases as well as some additional software. The Python Dockerfile is used to generate The images below at https://github.com/intel/ai-containers.
Note: There are two dockerhub repositories (
intel/intel-extension-for-pytorch
andintel/intel-optimized-pytorch
) that are routinely updated with the latest images, however, some legacy images have not be published to both repositories.
The images below include support for both CPU and GPU optimizations:
Tag(s) | Pytorch | IPEX | Driver | Dockerfile |
---|---|---|---|---|
2.5.10-xpu-pip-base ,2.5.10-xpu | v2.5.1 | v2.5.10+xpu | 1057 | v0.4.0-Beta |
2.3.110-xpu-pip-base ,2.3.110-xpu | v2.3.1 | v2.3.110+xpu | 950 | v0.4.0-Beta |
2.1.40-xpu-pip-base ,2.1.40-xpu | v2.1.0 | v2.1.40+xpu | 914 | v0.4.0-Beta |
2.1.30-xpu | v2.1.0 | v2.1.30+xpu | 803 | v0.4.0-Beta |
2.1.20-xpu | v2.1.0 | v2.1.20+xpu | 803 | v0.3.4 |
2.1.10-xpu | v2.1.0 | v2.1.10+xpu | 736 | v0.2.3 |
xpu-flex-2.0.110-xpu | v2.0.1 | v2.0.110+xpu | 647 | v0.1.0 |
docker run -it --rm \
--device /dev/dri \
-v /dev/dri/by-path:/dev/dri/by-path \
--ipc=host \
intel/intel-extension-for-pytorch:2.5.10-xpu
The images below additionally include Jupyter Notebook server:
Tag(s) | Pytorch | IPEX | Driver | Jupyter Port | Dockerfile |
---|---|---|---|---|---|
2.5.10-xpu-pip-jupyter | v2.5.1 | v2.5.10+xpu | 1057 | 8888 | v0.4.0-Beta |
2.3.110-xpu-pip-jupyter | v2.3.1 | v2.3.110+xpu | 950 | 8888 | v0.4.0-Beta |
2.1.40-xpu-pip-jupyter | v2.1.0 | v2.1.40+xpu | 914 | 8888 | v0.4.0-Beta |
2.1.20-xpu-pip-jupyter | v2.1.0 | v2.1.20+xpu | 803 | 8888 | v0.3.4 |
2.1.10-xpu-pip-jupyter | v2.1.0 | v2.1.10+xpu | 736 | 8888 | v0.2.3 |
docker run -it --rm \
-p 8888:8888 \
--device /dev/dri \
-v /dev/dri/by-path:/dev/dri/by-path \
intel/intel-extension-for-pytorch:2.5.10-xpu-pip-jupyter
After running the command above, copy the URL (something like http://127.0.0.1:$PORT/?token=***
) into your browser to access the notebook server.
The images below are built only with CPU optimizations (GPU acceleration support was deliberately excluded):
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.6.0-pip-base , latest | v2.6.0 | v2.6.0+cpu | v0.4.0-Beta |
2.5.0-pip-base , latest | v2.5.0 | v2.5.0+cpu | v0.4.0-Beta |
2.4.0-pip-base | v2.4.0 | v2.4.0+cpu | v0.4.0-Beta |
2.3.0-pip-base | v2.3.0 | v2.3.0+cpu | v0.4.0-Beta |
2.2.0-pip-base | v2.2.0 | v2.2.0+cpu | v0.3.4 |
2.1.0-pip-base | v2.1.0 | v2.1.0+cpu | v0.2.3 |
2.0.0-pip-base | v2.0.0 | v2.0.0+cpu | v0.1.0 |
docker run -it --rm intel/intel-extension-for-pytorch:latest
The images below additionally include Jupyter Notebook server:
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.6.0-pip-jupyter | v2.6.0 | v2.6.0+cpu | v0.4.0-Beta |
2.5.0-pip-jupyter | v2.5.0 | v2.5.0+cpu | v0.4.0-Beta |
2.4.0-pip-jupyter | v2.4.0 | v2.4.0+cpu | v0.4.0-Beta |
2.3.0-pip-jupyter | v2.3.0 | v2.3.0+cpu | v0.4.0-Beta |
2.2.0-pip-jupyter | v2.2.0 | v2.2.0+cpu | v0.3.4 |
2.1.0-pip-jupyter | v2.1.0 | v2.1.0+cpu | v0.2.3 |
2.0.0-pip-jupyter | v2.0.0 | v2.0.0+cpu | v0.1.0 |
docker run -it --rm \
-p 8888:8888 \
-v $PWD/workspace:/workspace \
-w /workspace \
intel/intel-extension-for-pytorch:2.6.0-pip-jupyter
After running the command above, copy the URL (something like http://127.0.0.1:$PORT/?token=***
) into your browser to access the notebook server.
The images below additionally include Intel® oneAPI Collective Communications Library (oneCCL) and Neural Compressor (INC):
Tag(s) | Pytorch | IPEX | oneCCL | INC | Dockerfile |
---|---|---|---|---|---|
2.4.0-pip-multinode | v2.4.0 | v2.4.0+cpu | v2.4.0 | v3.0 | v0.4.0-Beta |
2.3.0-pip-multinode | v2.3.0 | v2.3.0+cpu | v2.3.0 | v2.6 | v0.4.0-Beta |
2.2.0-pip-multinode | v2.2.2 | v2.2.0+cpu | v2.2.0 | v2.6 | v0.4.0-Beta |
2.1.100-pip-mulitnode | v2.1.2 | v2.1.100+cpu | v2.1.0 | v2.6 | v0.4.0-Beta |
2.0.100-pip-multinode | v2.0.1 | v2.0.100+cpu | v2.0.0 | v2.6 | v0.4.0-Beta |
NotePasswordless SSH connection is also enabled in the image, but the container does not contain any SSH ID keys. The user needs to mount those keys at `/root/.ssh/id_rsa` and `/etc/ssh/authorized_keys`.
TipBefore mounting any keys, modify the permissions of those files with `chmod 600 authorized_keys; chmod 600 id_rsa` to grant read access for the default user account.
Setup and Run IPEX Multi-Node Container
ImportantMaintainence, Bug Fixes, and Releases of [Intel® Extension for PyTorch*] Multi-Node Container for Xeon Processors have ceased development. The last supported version is `2.4.0`. For future releases, please use the [Intel® Extension for PyTorch*] Multi-Node Container for XPU.
Some additional assembly is required to utilize this container with OpenSSH. To perform any kind of DDP (Distributed Data Parallel) execution, containers are assigned the roles of launcher and worker respectively:
SSH Server (Worker)
/etc/ssh/authorized_keys
SSH Client (Launcher)
/root/.ssh/id_rsa
To add these files correctly please follow the steps described below.
Setup ID Keys
You can use the commands provided below to generate the identity keys for OpenSSH.
ssh-keygen -q -N "" -t rsa -b 4096 -f ./id_rsa
touch authorized_keys
cat id_rsa.pub >> authorized_keys
Configure the permissions and ownership for all of the files you have created so far
chmod 600 id_rsa config authorized_keys
chown root:root id_rsa.pub id_rsa config authorized_keys
Create a hostfile for torchrun
or ipexrun
. (Optional)
Host host1
HostName <Hostname of host1>
IdentitiesOnly yes
IdentityFile ~/.root/id_rsa
Port <SSH Port>
Host host2
HostName <Hostname of host2>
IdentitiesOnly yes
IdentityFile ~/.root/id_rsa
Port <SSH Port>
...
Configure Intel® oneAPI Collective Communications Library in your python script
import oneccl_bindings_for_pytorch
import os
dist.init_process_group(
backend="ccl",
init_method="tcp://127.0.0.1:3022",
world_size=int(os.environ.get("WORLD_SIZE")),
rank=int(os.environ.get("RANK")),
)
Now start the workers and execute DDP on the launcher
Worker run command:
docker run -it --rm \
--net=host \
-v $PWD/authorized_keys:/etc/ssh/authorized_keys \
-v $PWD/tests:/workspace/tests \
-w /workspace \
intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
bash -c '/usr/sbin/sshd -D'
Launcher run command:
docker run -it --rm \
--net=host \
-v $PWD/id_rsa:/root/.ssh/id_rsa \
-v $PWD/tests:/workspace/tests \
-v $PWD/hostfile:/workspace/hostfile \
-w /workspace \
intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
bash -c 'ipexrun cpu --nnodes 2 --nprocs-per-node 1 --master-addr 127.0.0.1 --master-port 3022 /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl'
Note[Intel® MPI] can be configured based on your machine settings. If the above commands do not work for you, see the documentation for how to configure based on your network.
Enable DeepSpeed* optimizations
To enable DeepSpeed* optimizations with Intel® oneAPI Collective Communications Library, add the following to your python script:
import deepspeed
# Rather than dist.init_process_group(), use deepspeed.init_distributed()
deepspeed.init_distributed(backend="ccl")
Additionally, if you have a DeepSpeed* configuration you can use the below command as your launcher to run your script with that configuration:
docker run -it --rm \
--net=host \
-v $PWD/id_rsa:/root/.ssh/id_rsa \
-v $PWD/tests:/workspace/tests \
-v $PWD/hostfile:/workspace/hostfile \
-v $PWD/ds_config.json:/workspace/ds_config.json \
-w /workspace \
intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
bash -c 'deepspeed --launcher IMPI \
--master_addr 127.0.0.1 --master_port 3022 \
--deepspeed_config ds_config.json --hostfile /workspace/hostfile \
/workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl --deepspeed'
The image below is an extension of the IPEX Multi-Node Container designed to run Hugging Face Generative AI scripts. The container has the typical installations needed to run and fine tune PyTorch generative text models from Hugging Face. It can be used to run multinode jobs using the same instructions from the IPEX Multi-Node container.
Tag(s) | Pytorch | IPEX | oneCCL | HF Transformers | Dockerfile |
---|---|---|---|---|---|
2.4.0-pip-multinode-hf-4.44.0-genai | v2.4.0 | v2.4.0+cpu | v2.4.0 | v4.44.0 | v0.4.0-Beta |
Below is an example that shows single node job with the existing finetune.py
script.
# Change into home directory first and run the command
docker run -it \
-v $PWD/workflows/charts/huggingface-llm/scripts:/workspace/scripts \
-w /workspace/scripts \
intel/intel-extension-for-pytorch:2.4.0-pip-multinode-hf-4.44.0-genai \
bash -c 'python finetune.py <script-args>'
The images below are TorchServe* with CPU Optimizations:
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.6.0-serving-cpu | v2.6.0 | v2.6.0+cpu | v0.4.0-Beta |
2.5.0-serving-cpu | v2.5.0 | v2.5.0+cpu | v0.4.0-Beta |
2.4.0-serving-cpu | v2.4.0 | v2.4.0+cpu | v0.4.0-Beta |
2.3.0-serving-cpu | v2.3.0 | v2.3.0+cpu | v0.4.0-Beta |
2.2.0-serving-cpu | v2.2.0 | v2.2.0+cpu | v0.3.4 |
For more details, follow the procedure in the TorchServe instructions.
The images below are TorchServe* with XPU Optimizations:
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.3.110-serving-xpu | v2.3.1 | v2.3.110+xpu | v0.4.0-Beta |
The images below are built only with CPU optimizations (GPU acceleration support was deliberately excluded) and include Intel® Distribution for Python*:
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.6.0-idp-base | v2.6.0 | v2.6.0+cpu | v0.4.0-Beta |
2.5.0-idp-base | v2.5.0 | v2.5.0+cpu | v0.4.0-Beta |
2.4.0-idp-base | v2.4.0 | v2.4.0+cpu | v0.4.0-Beta |
2.3.0-idp-base | v2.3.0 | v2.3.0+cpu | v0.4.0-Beta |
2.2.0-idp-base | v2.2.0 | v2.2.0+cpu | v0.3.4 |
2.1.0-idp-base | v2.1.0 | v2.1.0+cpu | v0.2.3 |
2.0.0-idp-base | v2.0.0 | v2.0.0+cpu | v0.1.0 |
The images below additionally include Jupyter Notebook server:
Tag(s) | Pytorch | IPEX | Dockerfile |
---|---|---|---|
2.6.0-idp-jupyter | v2.6.0 | v2.6.0+cpu | v0.4.0-Beta |
2.5.0-idp-jupyter | v2.5.0 | v2.5.0+cpu | v0.4.0-Beta |
2.4.0-idp-jupyter | v2.4.0 | v2.4.0+cpu | v0.4.0-Beta |
2.3.0-idp-jupyter | v2.3.0 | v2.3.0+cpu | v0.4.0-Beta |
2.2.0-idp-jupyter | v2.2.0 | v2.2.0+cpu | v0.3.4 |
2.1.0-idp-jupyter | v2.1.0 | v2.1.0+cpu | v0.2.3 |
2.0.0-idp-jupyter | v2.0.0 | v2.0.0+cpu | v0.1.0 |
The images below additionally include Intel® oneAPI Collective Communications Library (oneCCL) and Neural Compressor (INC):
Tag(s) | Pytorch | IPEX | oneCCL | INC | Dockerfile |
---|---|---|---|---|---|
2.4.0-idp-multinode | v2.4.0 | v2.4.0+cpu | v2.4.0 | v3.0 | v0.4.0-Beta |
2.3.0-idp-multinode | v2.3.0 | v2.3.0+cpu | v2.3.0 | v2.6 | v0.4.0-Beta |
2.2.0-idp-multinode | v2.2.0 | v2.2.0+cpu | v2.2.0 | v2.4.1 | v0.3.4 |
2.1.0-idp-mulitnode | v2.1.0 | v2.1.0+cpu | v2.1.0 | v2.3.1 | v0.2.3 |
2.0.0-idp-multinode | v2.0.0 | v2.0.0+cpu | v2.0.0 | v2.1.1 | v0.1.0 |
The images below are built only with CPU and GPU optimizations and include Intel® Distribution for Python*:
Tag(s) | Pytorch | IPEX | Driver | Dockerfile |
---|---|---|---|---|
2.5.10-xpu-idp-base | v2.5.1 | v2.5.10+xpu | 1057 | v0.4.0-Beta |
2.3.110-xpu-idp-base | v2.3.1 | v2.3.110+xpu | 950 | v0.4.0-Beta |
2.1.40-xpu-idp-base | v2.1.0 | v2.1.40+xpu | 914 | v0.4.0-Beta |
2.1.30-xpu-idp-base | v2.1.0 | v2.1.30+xpu | 803 | v0.4.0-Beta |
2.1.10-xpu-idp-base | v2.1.0 | v2.1.10+xpu | 736 | v0.2.3 |
The images below additionally include Jupyter Notebook server:
Tag(s) | Pytorch | IPEX | Driver | Jupyter Port | Dockerfile |
---|---|---|---|---|---|
2.5.10-xpu-idp-jupyter | v2.5.1 | v2.5.10+xpu | 1057 | 8888 | v0.4.0-Beta |
2.3.110-xpu-idp-jupyter | v2.3.1 | v2.3.110+xpu | 950 | 8888 | v0.4.0-Beta |
2.1.40-xpu-idp-jupyter | v2.1.0 | v2.1.40+xpu | 914 | 8888 | v0.4.0-Beta |
2.1.20-xpu-idp-jupyter | v2.1.0 | v2.1.20+xpu | 803 | 8888 | v0.3.4 |
2.1.10-xpu-idp-jupyter | v2.1.0 | v2.1.10+xpu | 736 | 8888 | v0.2.3 |
To build the images from source, clone the AI Containers repository, follow the main README.md
file to setup your environment, and run the following command:
cd pytorch
docker compose build ipex-base
docker compose run ipex-base
You can find the list of services below for each container in the group:
Service Name | Description |
---|---|
ipex-base | Base image with Intel® Extension for PyTorch* |
jupyter | Adds Jupyter Notebook server |
multinode | Adds Intel® oneAPI Collective Communications Library and INC |
xpu | Adds Intel GPU Support |
xpu-jupyter | Adds Jupyter notebook server to GPU image |
serving | TorchServe* |
The following images are available for MLPerf-optimized workloads. Instructions are available at 'Get Started with Intel MLPerf'.
Tag(s) | Base OS | MLPerf Round | Target Platform |
---|---|---|---|
mlperf-inference-4.1-resnet50 | rockylinux:8.7 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
mlperf-inference-4.1-retinanet | ubuntu:22.04 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
mlperf-inference-4.1-gptj | ubuntu:22.04 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
mlperf-inference-4.1-bert | ubuntu:22.04 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
mlperf-inference-4.1-dlrmv2 | rockylinux:8.7 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
mlperf-inference-4.1-3dunet | ubuntu:22.04 | Inference v4.1 | Intel(R) Xeon(R) Platinum 8592+ |
View the License for the Intel® Extension for PyTorch*.
The images below also contain other software which may be under other licenses (such as Pytorch*, Jupyter*, Bash, etc. from the base).
It is the image user's responsibility to ensure that any use of The images below comply with any relevant licenses for all software contained within.
* Other names and brands may be claimed as the property of others.
docker pull intel/intel-extension-for-pytorch