Moven CI Docker Image
Moven

Our work-in-progress proposal to distribute machine/deep learning models reusing the Maven
infrastructure.

The codename "moven" comes from combining "models" with "maven", although we are
open to better name proposals.

Rationale

The issue of how to distribute large models arose while integrating the Financial
Deep Learning Classifier.

In that case we needed to distribute 4GB of models to whoever wanted to run the classifier.
The issue was solved just for that case by sending the models manually. But there does not
seem to be any well-accepted approach to this problem (for instance, TensorFlow has
a dedicated repository for models).

This topic was later discussed in a brainstorming session we had in Passau on July 13th, with
André Freitas, Leonardo Souza, Rupert Westenthaler and Sergio Fernández present.

You can find the slides of our presentation at Apache Big Data in Seville in November 2016.

Features

  • Maven plugin to build/distribute model artifacts
  • Consumable/usable from Java and Python applications

Workflow

The expected high-level workflow is something like:

  1. You separately provide a Maven package with your model(s), and publish it using the regular Maven infrastructure.
  2. You declare your model dependency in your application (currently Java and Python are supported).

Installation

Simply run:

mvn install

in the root folder of the tool.

Documentation

Generate a Moven artifact

You can find an example artifact at java/example. Essentially, you need to place your models
at src/main/models and trigger the copy-models goal in your build lifecycle:

<plugin>
  <groupId>io.redlink.ssix.moven</groupId>
  <artifactId>moven-maven-plugin</artifactId>
  <version>0.1.0</version>
  <executions>
    <execution>
      <phase>process-resources</phase>
      <goals>
        <goal>copy-models</goal>
      </goals>
    </execution>
  </executions>
</plugin>

Or you can simply use the provided archetype to build the skeleton of your model artifact,
as with any other archetype:

mvn archetype:generate                         \
  -DarchetypeGroupId=io.redlink.ssix.moven     \
  -DarchetypeArtifactId=moven-model-archetype  \
  -DarchetypeVersion=<archetype-version>       \
  -DgroupId=<my.groupid>                       \
  -DartifactId=<my-artifactId>

You should place your models at src/main/models, as described before.

Then you can deploy your models to any regular Maven repository as usual.

Use the models in your Java application

First, declare a dependency on the model artifact in your pom.xml file as usual.

Models are available at META-INF/resources/models inside the JAR file, so typically you'd
retrieve them from the classpath:

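// Load the model from the classpath; the result is null if the resource is missing.
InputStream model = this.getClass().getClassLoader()
    .getResourceAsStream("META-INF/resources/models/foo.ex");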

Models are also exposed via HTTP as static resources when the JAR is deployed
in any Servlet >=3.0 container. This was inspired by James Ward and the
WebJars project; you can get further technical details from
section 10.5 of JSR 315 (Servlet 3.0 specification).
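As an illustration of the HTTP exposure, here is a minimal Python sketch of a client fetching a model from a deployed web application. It relies on the Servlet 3.0 rule that content under META-INF/resources in a bundled JAR is served from the context root, so META-INF/resources/models/foo.ex would be reachable at <context>/models/foo.ex. The host, port, context path and the helper names model_url and download_model are hypothetical and not part of Moven.

```python
import shutil
import urllib.request


def model_url(context_url: str, model_name: str) -> str:
    """Map a model under META-INF/resources/models to its static URL."""
    return context_url.rstrip("/") + "/models/" + model_name


def download_model(context_url: str, model_name: str, target: str) -> None:
    """Stream the model to a local file without loading it fully into memory."""
    url = model_url(context_url, model_name)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)


# Hypothetical deployment; download_model only works against a running container.
print(model_url("http://localhost:8080/myapp", "foo.ex"))
```

Streaming with shutil.copyfileobj matters here because models can be several gigabytes in size.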

Use the models in your Python application

The module is published on PyPI, so you can install it by executing:

pip install moven

You have to declare your model dependencies in a models.txt file. Each line
declares a dependency using the simple syntax from jip,
groupId:artifactId:version (which is inspired by Groovy's Grape);
e.g., io.redlink.ssix.moven:moven-syntaxnet-example:1.0-SNAPSHOT.
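To make the line syntax concrete, the following sketch splits models.txt entries into their three parts. It is not Moven's or jip's actual parser; the names ModelCoordinate, parse_coordinate and parse_models_file are hypothetical, and skipping blank lines is an assumption.

```python
from typing import List, NamedTuple


class ModelCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(line: str) -> ModelCoordinate:
    """Split one 'groupId:artifactId:version' line into its three parts."""
    parts = line.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {line!r}")
    return ModelCoordinate(*parts)


def parse_models_file(text: str) -> List[ModelCoordinate]:
    """Parse the content of a models.txt file, skipping blank lines (an assumption)."""
    return [parse_coordinate(line) for line in text.splitlines() if line.strip()]


coords = parse_models_file(
    "io.redlink.ssix.moven:moven-syntaxnet-example:1.0-SNAPSHOT\n"
)
print(coords[0].artifact_id)
```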

Then you can run moven to retrieve all required models:

moven models.txt

Models will be copied into the ./moven folder, organized by artifactId in sub-folders.
Actual model artifacts will be cached at $HOME/.jip/cache ($VIRTUAL_ENV/.jip/cache if you're
using a virtual environment).
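The locations described above can be resolved from Python roughly as follows. This is a sketch under assumptions: the helper names jip_cache_dir and list_models are hypothetical, and the layout inside each artifactId sub-folder is not specified here, so the listing simply walks it recursively.

```python
import os
from pathlib import Path
from typing import List, Mapping


def jip_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Cache location: inside the virtualenv if one is active, else under $HOME."""
    base = env.get("VIRTUAL_ENV") or env.get("HOME") or str(Path.home())
    return Path(base) / ".jip" / "cache"


def list_models(artifact_id: str, root: Path = Path("moven")) -> List[Path]:
    """Return the files retrieved for one artifact, or an empty list if none."""
    folder = root / artifact_id
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.rglob("*") if p.is_file())


print(jip_cache_dir())
print(list_models("moven-syntaxnet-example"))
```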

Use it for continuous integration

If you are using any CI that supports custom Docker images, you can use the one
provided by the automated build.

If you want to build a custom image based on it, you can do so by executing:

docker build -t ssix/moven .

License

This tool is available under the Apache License, Version 2.0.

Owner: wikier