Intel's Deep Learning Deployment Toolkit

With the Deep Learning Deployment Toolkit you can:

  • Optimize trained deep learning networks through model compression and weight quantization, which are tailored to end-point device characteristics
  • Deliver a unified API to integrate inference with application logic
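As a rough illustration of the weight-quantization idea mentioned above (not the toolkit's actual algorithm), the sketch below maps float weights onto signed 8-bit integers with a shared per-tensor scale, then dequantizes them to show how the values are approximated:

```python
# Illustrative sketch of symmetric 8-bit weight quantization. This is a
# simplified assumption of how quantization works in general, not the
# Deep Learning Deployment Toolkit's implementation.

def quantize(weights, bits=8):
    """Quantize a list of floats to signed integers with a shared scale."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize(weights)        # q = [50, -127, 0, 127]
restored = dequantize(q, scale)     # close to the original weights
```

Storing the integer codes plus one scale per tensor is what shrinks the model; the tolerable approximation error depends on the network and the end-point device.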

The Deep Learning Deployment Toolkit comprises two main components: Model Optimizer and Inference Engine.

Model Optimizer is a cross-platform command line tool that performs static model analysis and adjusts deep learning models for optimal execution on end-point target devices:

  • Takes as input a trained network, containing the network topology, parameters, and the trained weights and biases, produced using the Caffe* framework
  • Performs horizontal and vertical fusion of the network layers
  • Prunes unused branches in the network
  • Applies weights compression methods
  • Produces as output an Intermediate Representation (IR) of the network - a pair of files that describe the whole model:
    Topology file - an XML file that describes the network topology
    Trained data file - a .bin file that contains the weights and biases as binary data
  • The produced IR is used as an input for the Inference Engine.
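To make the two-file IR layout above concrete, here is a schematic sketch of reading an IR-like pair: an XML topology file plus a binary blob of weights. The element and attribute names (`weights-offset`, `weights-count`) are simplified assumptions for illustration; the real IR schema is more detailed.

```python
# Hypothetical, simplified IR-like pair: XML topology + raw float32 weights.
# Not the actual IR schema - the real files carry much more metadata.
import struct
import xml.etree.ElementTree as ET

topology_xml = """
<net name="demo">
  <layer id="0" name="input" type="Input"/>
  <layer id="1" name="fc1" type="FullyConnected"
         weights-offset="0" weights-count="4"/>
</net>
"""

# Weights blob: four little-endian float32 values, as they might sit
# in the .bin file at the offset the topology points to.
weights_bin = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)

net = ET.fromstring(topology_xml)
layers = {l.get("name"): l for l in net.iter("layer")}

fc1 = layers["fc1"]
offset = int(fc1.get("weights-offset"))
count = int(fc1.get("weights-count"))
fc1_weights = struct.unpack_from(f"<{count}f", weights_bin, offset)
```

Splitting topology from weights lets the runtime parse the small XML quickly and memory-map or stream the large binary blob separately.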

Model Optimizer developer guide:

Inference Engine is a runtime that delivers a unified API to integrate inference with application logic:

  • Takes as input an IR produced by Model Optimizer
  • Optimizes inference execution for target hardware
  • Delivers an inference solution with a reduced footprint on embedded inference platforms
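The "unified API" idea can be sketched as follows: one object loads a model description and exposes a single `infer()` call that works regardless of which layers the topology contains. This is a toy, hypothetical illustration in pure Python, not the actual Inference Engine API.

```python
# Toy sketch of a unified inference API. Layer kernels are looked up by
# type, so infer() stays the same no matter what the topology holds.
# Hypothetical illustration only - not the Inference Engine's real API.

LAYER_KERNELS = {
    "relu": lambda x, p: [max(0.0, v) for v in x],   # clamp negatives
    "scale": lambda x, p: [v * p for v in x],        # multiply by parameter
}

class ToyEngine:
    def __init__(self, topology):
        # topology: ordered list of (layer_type, parameter) pairs,
        # standing in for the IR produced by Model Optimizer
        self.topology = topology

    def infer(self, inputs):
        data = list(inputs)
        for layer_type, param in self.topology:
            data = LAYER_KERNELS[layer_type](data, param)
        return data

engine = ToyEngine([("scale", 2.0), ("relu", None)])
result = engine.infer([-1.0, 0.5])   # -> [0.0, 1.0]
```

On real hardware, each kernel would instead dispatch to an implementation optimized for the target device, which is where the per-platform optimization described above happens.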

Inference Engine developer guide:
