Read data from MS Access MDB files by Python, only read, built on alpine.
- MDB Tools MDB Tools is a set of programs to help you use Microsoft Access file in various settings.
- mdbread A simple Cython-based wrapper for the excellent MDBTools package to read data from MS Access MDB files.
Thanks these two amazing projects.
What this project did
- Compile and install MDB Tools on alpine.
- Remove mdbread's DEPRECATED code, remove pandas for a smaller size image and easy to build, then install mdbread.
- You can modify the file
repositoriesto change the repositories of apk(Alpine Linux package management) for a faster download.
- Mdbread use pandas (a famous python data analyis packages) to return
DataFrameobject, i remove the pandas to save about half size of this image(235MB > 123MB). If u need
DataFrameobject, u can modify mdbread/mdbread.pyx and Dockerfile to add pandas support, there's a tag in github that built with pandas, it would help.
How to use
Pull and test
>> docker run mio101/mdbread-alpine python -c "import mdbread;print(mdbread)" Unable to find image 'mio101/mdbread-alpine:latest' locally latest: Pulling from mio101/mdbread-alpine 2aecc7e1714b: Already exists 665060bf69b5: Pull complete 7d37ce5e6761: Pull complete 24e603a66a49: Pull complete Digest: sha256:378300b9e6a7803907cda9fe7e98b2859e78ba1426cb76fd40eabd87d66bc36b Status: Downloaded newer image for mio101/mdbread-alpine:latest <module 'mdbread' from '/usr/lib/python2.7/site-packages/mdbread.so'>
Notice the last line that printed info of module mdbread, it means that the mdbread module has been successfully loaded.
>>> import mdbread >>> db = mdbread.MDB("MyDB.mdb") >>> print db.tables ["tbl1", "tbl2", "tbl3"] >>> tbl = db["tbl1"] >>> print tbl.columns ["foo","bar","baz"]
To get the data in a table, you have three options:
mdbread.Table.records()returns a generator of dictionaries, where the keys are column names and the values are the data.
iter(mdbread.Table)will return a namedtuple for each row. You can also use this form with
for row in tbl:
NOT SUPPORT IN THIS PROJECT:
mdbread.Table.to_data_frame()will return a pandas
DataFramecontaining all the data for the entire table (possibly requiring lots of memory).
iter(mdbread.Table)will return a generator of namedtuple, it's convenience to use for dealing with data, bravo :)