How to Automatically Clear Outputs from Jupyter Notebooks before Committing

By Andre Perunicic | August 24, 2017

This post explains how to automatically clear the output cells of Jupyter or IPython notebooks whenever you stage them for a commit in a particular git repo. Enabling this in a repository makes collaborating on notebooks easier and keeps the repository’s size from ballooning due to embedded plots and data printouts!

  1. Clone the script (into, say, ~/GitHub) from https://github.com/toobaz/ipynb_output_filter via

    git clone https://github.com/toobaz/ipynb_output_filter.git ~/GitHub/ipynb_output_filter
    
  2. Make the script executable

    chmod +x ~/GitHub/ipynb_output_filter/ipynb_output_filter.py
    
  3. Create a ~/.gitattributes file

    touch ~/.gitattributes
    

    and add the following filter attribute to it:

    echo "*.ipynb    filter=dropoutput_ipynb" >> ~/.gitattributes
    
  4. Configure git to find your ~/.gitattributes file:

    git config --global core.attributesfile ~/.gitattributes
    
  5. Add the dropoutput_ipynb filter section (which triggers the script above) to your repo’s git config by pasting the following into path/to/repo/.git/config:

    [filter "dropoutput_ipynb"]
        clean = ~/GitHub/ipynb_output_filter/ipynb_output_filter.py
        smudge = cat
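
If you would rather not edit .git/config by hand, the same section can probably be written with git config instead. The following is only a sketch: it sets the two keys in a throwaway repository (the `repo` variable is an assumption made for illustration), so swap in the path to your own repo when using it for real.

```shell
# Hypothetical scratch repository, used so the sketch is safe to run anywhere.
# For real use, set repo=path/to/repo and skip the two setup lines.
repo="$(mktemp -d)"
git init -q "$repo"

# Write the [filter "dropoutput_ipynb"] section into the repo-local config.
# Single quotes keep ~ literal in the config file; git runs the filter
# command through a shell, which expands it when the filter is invoked.
git -C "$repo" config filter.dropoutput_ipynb.clean '~/GitHub/ipynb_output_filter/ipynb_output_filter.py'
git -C "$repo" config filter.dropoutput_ipynb.smudge cat

# Show what was written.
git -C "$repo" config --get-regexp '^filter\.dropoutput_ipynb\.'
```

Because these commands omit --global, the filter is only defined in that one repository, which matches the per-repo setup described in step 5.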

From then on, staging and committing any *.ipynb file in that repo should strip the output from the committed version, while the copy in your working directory is left untouched.
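
One way to check that git actually associates the filter with notebooks is git check-attr. The sketch below is an illustration under assumptions: to avoid touching your real configuration it uses a throwaway repository with a repo-local .gitattributes file, and `analysis.ipynb` is a made-up file name. In your own repo, only the check-attr command is needed.

```shell
# Hypothetical scratch repository so nothing global is modified.
repo="$(mktemp -d)"
git init -q "$repo"

# Same attribute line as in step 3, written to a repo-local .gitattributes.
printf '%s\n' '*.ipynb    filter=dropoutput_ipynb' > "$repo/.gitattributes"

# Ask git which filter applies to a notebook path (the file need not exist).
result="$(git -C "$repo" check-attr filter -- analysis.ipynb)"
echo "$result"
# prints: analysis.ipynb: filter: dropoutput_ipynb
```

If the last field reads "unspecified" in your own repo, git is not picking up the attributes file, so steps 3 and 4 are the first place to look.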

If you found this post from Google, I hope this solved your problem. Don’t hesitate to get in touch with us if you need help setting up a data sourcing, analysis or reporting infrastructure.
