Our work-in-progress proposal to distribute machine/deep learning models reusing the Maven infrastructure.
The codename "moven" comes from combining "models" with "maven", although better name proposals are welcome.
The issue of how to distribute large models arose when integrating the Financial
Deep Learning Classifier.
In that case we needed to distribute 4GB of models to whoever wanted to run the classifier.
The issue was solved for that particular case by manually sending the models around, but there
doesn't seem to be any well-accepted approach to this problem (TensorFlow, for instance, maintains
a dedicated repository for its models).
This topic was later discussed in a brainstorming session we had in Passau on July 13th, with
André Freitas, Leonardo Souza, Rupert Westenthaler and Sergio Fernández present.
- Maven plugin to build/distribute model artifacts
- Consumable/usable from Java and Python applications
The expected high-level workflow is something like:
- You separately provide a Maven package with your model(s), and publish it using the regular Maven infrastructure.
- You declare your model dependency in your application (currently Java and Python are supported).
- You retrieve the declared models by running the tool in the root folder of your project.
Generate a Moven artifact
You can find an example artifact at `java/example`. Basically, you need to place your models
under `src/main/models` and trigger the `copy-models` goal in your build lifecycle:
```xml
<plugin>
  <groupId>io.redlink.ssix.moven</groupId>
  <artifactId>moven-maven-plugin</artifactId>
  <version>0.1.0</version>
  <executions>
    <execution>
      <phase>process-resources</phase>
      <goals>
        <goal>copy-models</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```
Alternatively, you can simply use the provided archetype to build the skeleton of your model artifact,
as with any other archetype:
```shell
mvn archetype:generate \
  -DarchetypeGroupId=io.redlink.ssix.moven \
  -DarchetypeArtifactId=moven-model-archetype \
  -DarchetypeVersion=<archetype-version> \
  -DgroupId=<my.groupid> \
  -DartifactId=<my-artifactId>
```
You should place your models at `src/main/models`, as described before.
Then you can deploy your models to any regular Maven repository as usual.
Use the models in your Java application
First you have to declare a dependency on the model artifact in your `pom.xml`, as usual.
Models are available at `META-INF/resources/models` inside the JAR file, so typically you'd
retrieve them from the classpath:
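As a minimal sketch, loading could look like the following. Note that `classifier.bin` is a hypothetical file name used for illustration; use whatever file your model artifact actually ships.

```java
import java.io.InputStream;

public class ModelLoader {

    // Resolve a model shipped under META-INF/resources/models on the classpath.
    // Returns null when the resource is missing.
    public static InputStream openModel(String name) {
        return Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("META-INF/resources/models/" + name);
    }

    public static void main(String[] args) {
        // "classifier.bin" is a hypothetical file name; replace it with the
        // actual model file packaged in your artifact.
        InputStream in = openModel("classifier.bin");
        System.out.println(in != null ? "model found" : "model not on classpath");
    }
}
```

You would then hand the resulting stream to your ML framework's model-loading routine.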
Models are also exposed via HTTP as static resources when the JAR is deployed
in any Servlet >=3.0 container. This takes inspiration from James Ward and the
WebJars project; you can get further technical details from
section 10.5 of JSR 315 (the Servlet 3.0 specification).
Use the models in your Python application
The module is published at PyPI, so you can install it by executing:

```shell
pip install moven
```
You declare your model dependencies in a `models.txt` file. Each line
declares a dependency using the simple `groupId:artifactId:version` syntax
(inspired by jip).
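For example, a `models.txt` declaring two model artifacts could look like this (both coordinates are hypothetical, shown only to illustrate the syntax):

```
io.redlink.ssix:financial-models:1.0.0
io.redlink.ssix:sentiment-models:0.2.1
```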
Then you can run `moven` to retrieve all required models.
Models will be copied into the `./moven` folder, organized in sub-folders by
`artifactId`. The actual model artifacts will be cached (at
`$VIRTUAL_ENV/.jip/cache` if you're using a virtual environment).
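As a minimal sketch of consuming that layout from Python (assuming the default `./moven` folder described above, and using a hypothetical `financial-models` artifact and `classifier.bin` file for illustration):

```python
from pathlib import Path

# Default folder where moven copies the retrieved models.
MOVEN_DIR = Path("moven")

def model_path(artifact_id: str, filename: str) -> Path:
    """Build the path of a model file inside the ./moven layout,
    where each artifactId gets its own sub-folder."""
    return MOVEN_DIR / artifact_id / filename

# "financial-models" and "classifier.bin" are hypothetical names;
# replace them with your actual artifactId and model file.
path = model_path("financial-models", "classifier.bin")
print(path)
```

From there you would open the file with whatever ML framework your application uses.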
Use it for continuous integration
If you are using a CI system that supports custom Docker images, you can use the one
provided by the automated build.
In case you want to build a (custom) image based on this one, you can build it by executing:

```shell
docker build -t ssix/moven .
```
This tool is available under Apache License, Version 2.0.