Short Description
Custom Docker images for data science
Full Description

This repository contains images that make it easy to start practicing with data science tools without having to install them, which is especially useful for teachers and learners.

The image is based on dataquestio/python3-starter (https://hub.docker.com/r/dataquestio/python3-starter/) and includes:

  • Python 3, along with the following pre-installed libraries:
    pyzmq, scipy, pandas, matplotlib, statsmodels, scikit-learn, seaborn, nltk, gensim, sympy, bokeh, networkx, theano, requests, beautifulsoup4
  • Spark 1.6.2
  • R
  • Jupyter Notebook server (localhost:8888)
  • RStudio Server (localhost:8787)
  • PySpark Notebooks (localhost:8889)
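To check that the listed libraries can actually be imported (for example, from a shell inside the container), a short Bash sketch along these lines could be used. It is not part of the image: `EXPECTED_MODULES` and `missing_modules` are hypothetical names, and the script assumes the Python 3 interpreter is available as `python3`. The list uses import names, which differ from the package names for three of the libraries.

```bash
#!/usr/bin/env bash
# Import names of the libraries listed above. A few differ from the package
# names: pyzmq -> zmq, scikit-learn -> sklearn, beautifulsoup4 -> bs4.
EXPECTED_MODULES=(zmq scipy pandas matplotlib statsmodels sklearn seaborn
                  nltk gensim sympy bokeh networkx theano requests bs4)

# Print each given module that python3 cannot import, one per line.
missing_modules() {
  local module
  for module in "$@"; do
    python3 -c "import $module" 2>/dev/null || echo "$module"
  done
}

missing=$(missing_modules "${EXPECTED_MODULES[@]}")
if [ -n "$missing" ]; then
  echo "Missing modules:"
  echo "$missing"
else
  echo "All expected libraries can be imported."
fi
```

Importing each module in a separate `python3` process keeps one failing import from hiding the others, at the cost of a few seconds of startup time.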

INSTALLATION

  • Pull the image from Docker Hub:
    docker pull deccar/data-science

  • Create a directory on the host to share with the container:
    mkdir ~/notebooks

  • Launch a container from the image in the background, publishing the Jupyter, RStudio and PySpark/Jupyter ports so they can be reached directly from the host, and mounting ~/notebooks at /home/ds/notebooks:
    docker run -d -p 8888:8888 -p 8787:8787 -p 8889:8889 -v ~/notebooks:/home/ds/notebooks deccar/data-science

  • List the running containers:
    docker ps

  • Copy the container ID and substitute it into the following command to open a shell inside the container:
    docker exec -ti <container_id> /bin/bash

  • Inside the container's shell, start RStudio Server in the background, then launch PySpark in local mode using all available cores:
    nohup rserver &
    pyspark --master local[*]
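The pull, directory and run steps can also be combined in a small Bash function. This is a sketch, not part of the image: `start_container` and `NOTEBOOKS_DIR` are hypothetical names, and it assumes Bash and the Docker CLI on the host. It relies on `docker run -d` printing the new container's ID on stdout, so the ID does not have to be copied from `docker ps`.

```bash
#!/usr/bin/env bash
set -euo pipefail

IMAGE="deccar/data-science"
# Hypothetical variable: host directory shared with the container.
NOTEBOOKS_DIR="$HOME/notebooks"

# Pull the image, create the shared directory and start a detached container
# with the same ports and volume as the manual steps; print the container ID.
start_container() {
  mkdir -p "$NOTEBOOKS_DIR"   # like `mkdir ~/notebooks`, but safe to repeat
  # Send pull progress to stderr so that only the container ID reaches stdout.
  docker pull "$IMAGE" >&2
  docker run -d \
    -p 8888:8888 -p 8787:8787 -p 8889:8889 \
    -v "$NOTEBOOKS_DIR":/home/ds/notebooks \
    "$IMAGE"
}
```

For example, `container_id=$(start_container)` followed by `docker exec -ti "$container_id" /bin/bash` opens the same shell as the manual steps; RStudio Server and PySpark are then started by hand as before.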

Owner
deccar