
Last pushed: a year ago
Short Description
Read data from MS Access MDB files with Python, on Alpine.
Full Description


Read data from MS Access MDB files with Python (read-only), built on Alpine.

Based on:

  • MDB Tools: a set of programs to help you use Microsoft Access files in various settings.
  • mdbread: a simple Cython-based wrapper for the excellent MDBTools package to read data from MS Access MDB files.

Thanks to these two amazing projects.

What this project does

  • Compiles and installs MDB Tools on Alpine.
  • Removes mdbread's DEPRECATED code and its pandas dependency, for a smaller image that is easier to build, then installs mdbread.

Notes:

  1. You can edit the file repositories to change the apk (Alpine Linux package manager) repositories for faster downloads.
  2. mdbread uses pandas (a popular Python data analysis package) to return DataFrame objects. I removed pandas, which cuts the image size roughly in half (from 235MB to 123MB). If you need DataFrame objects, you can modify mdbread/mdbread.pyx and the Dockerfile to add pandas support back; there is a tag on GitHub that was built with pandas, which may help.

How to use

  • Pull and test

      >> docker run mio101/mdbread-alpine python -c "import mdbread;print(mdbread)"
      Unable to find image 'mio101/mdbread-alpine:latest' locally
      latest: Pulling from mio101/mdbread-alpine
      2aecc7e1714b: Already exists
      665060bf69b5: Pull complete
      7d37ce5e6761: Pull complete
      24e603a66a49: Pull complete
      Digest: sha256:378300b9e6a7803907cda9fe7e98b2859e78ba1426cb76fd40eabd87d66bc36b
      Status: Downloaded newer image for mio101/mdbread-alpine:latest
      <module 'mdbread' from '/usr/lib/python2.7/site-packages/'>

    Note the last line, which prints the mdbread module info: it means the mdbread module was loaded successfully.
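
    To read your own MDB file, the container needs access to it. The following is only a sketch, assuming you mount a host directory into the container: the /data mount point, the helper name docker_command and the script name read_mdb.py are hypothetical and not part of this image. It only builds the docker command line and does not run it.

```python
import os


def docker_command(mdb_dir, script, image="mio101/mdbread-alpine"):
    """Build a `docker run` argument list that mounts mdb_dir at /data.

    /data is an assumed mount point chosen for this example; the image
    itself does not define it.
    """
    mount = "{}:/data".format(os.path.abspath(mdb_dir))
    return [
        "docker", "run", "--rm",   # remove the container when it exits
        "-v", mount,               # share the host directory with the container
        "-w", "/data",             # run the script from the mounted directory
        image, "python", script,
    ]


# Hypothetical script name; it would contain the mdbread calls you need.
cmd = docker_command(".", "read_mdb.py")
```

    Passing cmd to subprocess.call(cmd) would start the container, provided docker is installed and the script exists in the mounted directory.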

  • mdbread Usage

    • From mdbread

        >>> import mdbread
        >>> db = mdbread.MDB("MyDB.mdb")
        >>> print db.tables
        ["tbl1", "tbl2", "tbl3"]
        >>> tbl = db["tbl1"]
        >>> print tbl.columns

      To get the data in a table, you have three options:

      mdbread.Table.records() returns a generator of dictionaries, where the keys are column names and the values are the data.

      iter(mdbread.Table) returns a namedtuple for each row. You can also use this form in a loop, as in for row in tbl.

      NOT SUPPORTED IN THIS PROJECT: mdbread.Table.to_data_frame() returns a pandas DataFrame containing all the data for the entire table (possibly requiring lots of memory).
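
      Since to_data_frame() is not available here, one possible pandas-free substitute is to collect the rows into plain Python structures. The sketch below assumes only what is described above, namely that iterating a table yields namedtuples. The helper table_to_columns and the FakeRow / fake_table stand-ins are hypothetical, used so the snippet runs without an MDB file; with the real image you would pass db["tbl1"] instead.

```python
from collections import namedtuple


def table_to_columns(rows):
    """Collect an iterable of namedtuple rows into {column name: [values]}."""
    columns = {}
    for row in rows:
        # A namedtuple exposes its field names as _fields.
        for name, value in zip(row._fields, row):
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical stand-in for iter(db["tbl1"]); not real mdbread output.
FakeRow = namedtuple("FakeRow", ["id", "name"])
fake_table = [FakeRow(1, "alice"), FakeRow(2, "bob")]

columns = table_to_columns(fake_table)
```

      Like to_data_frame(), this holds the whole table in memory, so for large tables it may be better to process rows one at a time.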

    • Tips

      mdbread.Table.records() and iter(mdbread.Table) return a generator of namedtuples, which is convenient when dealing with data. Bravo :)
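
      As a small illustration of why namedtuple rows are convenient, the sketch below uses only the standard library. Row is a hypothetical stand-in for a row returned by the table; its column names are made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for one row from iter(db["tbl1"]); not real mdbread output.
Row = namedtuple("Row", ["id", "name"])
row = Row(id=1, name="alice")

by_attribute = row.name          # access a column by name
by_position = row[0]             # or by position, like a plain tuple
row_id, row_name = row           # unpack all columns at once
as_dict = dict(row._asdict())    # convert to a regular dict keyed by column name
```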
