Package management and environments#

So far, we have worked with a couple of Python packages, like matplotlib. A full Anaconda install comes with these packages pre-installed, but how do you install new packages, or a specific version of an existing package? And how do you keep working with different versions from devolving into a mess?

Finding and installing packages#

Python packages mainly come from two sources, PyPI and conda; a few are only distributed through GitHub:

  • The Python Package Index, PyPI (pypi.org), is the Python standard, but has some disadvantages when it comes to packages with more complex dependencies (other packages or non-Python code). It is used through the pip command.

  • Conda comes with the Anaconda-type distributions. Its package library contains many of the packages that exist on PyPI, plus additional non-Python packages that Python packages might rely on for speed.

  • Some packages are only available from GitHub, a website and framework for collaboratively developing and publishing code.

How do you find packages and installation instructions?

  • First, check the package documentation or GitHub page. This will typically have instructions for installation.

  • Second, if out of luck, search directly on anaconda.org or pypi.org.

Let’s say we have read a paper in which SLEAP was used for tracking animals, and we want to use it on our own behavioral videos. Google “sleap installation” to find the installation instructions and “sleap github” to find the GitHub repository (the project’s page).

How to use PyPI and conda#

PyPI#

PyPI is used through the pip terminal command. To install a package from PyPI, open the command line and enter:

pip install PACKAGE_NAME

Again, specifics like the exact spelling of the package name can be found on the package’s website.
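After installing, you can check from within Python whether the package is visible and which version you got. The following is a minimal sketch using the standard library module importlib.metadata. The helper name installed_version is made up for illustration, and the name you pass is the one used with pip install, which can differ from the name you import.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package as a string, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "matplotlib" is just an example; replace it with the package you installed.
print(installed_version("matplotlib"))
```

If this prints None although you just installed the package, the Python you are running is probably not the one pip installed into.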

Anaconda#

Anaconda is accessed through the conda command:

conda install PACKAGE_NAME

Sometimes you will see instructions that include one or more occurrences of the argument -c, short for “channel”:

conda install PACKAGE_NAME -c CHANNEL_NAME

Channels are the locations inside the Anaconda package library where packages are stored and where conda looks for them, for example the community-maintained conda-forge channel.

Package management, dependency hell, and environments#

Why is package management hard? There are many different Python packages, sometimes for very specific purposes, developed by many independent developers. However, this freedom and diversity can complicate things if two packages you need, possibly for different projects, rely on different versions of a third package.

[Figure from https://medium.com/knerd/the-nine-circles-of-python-dependency-hell-481d53e3e025]

This can lead to unresolvable dependencies, a situation known as “Python version hell” or “Python dependency hell”.
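To make the conflict concrete, here is a toy sketch. It is not how pip or conda actually resolve dependencies. The packages package_a and package_b and their version requirements are hypothetical, and each requirement is simplified to a range "at least this version, below that version".

```python
def parse_version(text):
    """Turn a version string like '3.8' into a tuple of integers, e.g. (3, 8)."""
    return tuple(int(part) for part in text.split("."))


def compatible_range(requirements):
    """Intersect half-open version ranges [minimum, maximum).

    Returns (lowest allowed version, upper bound), or None if no version
    fits all ranges at the same time.
    """
    lowest = max(parse_version(low) for low, _ in requirements)
    upper = min(parse_version(high) for _, high in requirements)
    return (lowest, upper) if lowest < upper else None


# Hypothetical requirements on matplotlib, made up for illustration:
# package_a needs matplotlib >=3.5,<3.8 and package_b needs matplotlib >=3.8,<4.0.
package_a = ("3.5", "3.8")
package_b = ("3.8", "4.0")

print(compatible_range([package_a]))             # ((3, 5), (3, 8)): fine on its own
print(compatible_range([package_a, package_b]))  # None: no version satisfies both
```

Each package works on its own, but no single matplotlib version satisfies both, so they cannot share one installation.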

Even worse, if you are not careful, you can completely break your Python installation: you install a new package that requires, and therefore installs, a different version of matplotlib. This can break existing packages that only work with the original version of matplotlib, not the newly installed one.

Environments#

Conda environments are a way out of this dilemma, since they allow you to have many independent Python installations, called environments, on your computer. When you install (Ana)conda, an environment called base is created, which contains the basics, like Python itself and everything needed for installing and managing packages (see below).

You should never directly install anything into base!

Rather, when you start a new project, create an environment and install packages into it. That way, if you mess up, you can delete that specific environment and still have a working Python installation.

  1. Create the environment: conda create --name my_env. This will create an empty environment. You can also specify packages to install during creation, for instance a Python version or numpy: conda create --name my_env python=3.12 numpy

  2. Activate the environment: conda activate my_env. Your command line should indicate that the active environment is now my_env. Any call to conda will now only affect the active environment my_env. The same holds for pip, provided Python (and with it, usually, pip) is installed in my_env; in an empty environment, the pip command, if it is found at all, belongs to a different Python installation.

  3. Install your packages: conda install matplotlib. Conda will find the matplotlib package, plus all packages matplotlib depends on (basically, all imports), and install them.

It is good practice to have one environment per project. If you want to switch between environments, always deactivate the current one, with conda deactivate. Otherwise, you get weird interaction effects.
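If you are unsure which environment your code actually runs in, you can ask Python itself. The sketch below uses only the standard library. The helper name describe_environment is made up for illustration, and CONDA_DEFAULT_ENV is the environment variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Collect where the running Python interpreter lives."""
    return {
        # Path of the python executable that runs this code.
        "executable": sys.executable,
        # Root folder of the installation or environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

After conda activate my_env, the prefix should point to the folder of my_env, assuming Python is installed in that environment.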

I will demonstrate this in one example:

  • create env

  • activate env

  • install conda packages

  • install pip package

  • deactivate env

Tips and tricks#

I lost track of my environments and what is installed in them#

You can list all conda environments on your machine with conda env list.

You can list all packages installed in the currently active environment with conda list.
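A similar overview is available from within Python, which can be handy inside a notebook. This is a minimal sketch based on the standard library module importlib.metadata. The helper name list_installed is made up for illustration, and it only sees Python packages visible to the running interpreter, not the non-Python packages that conda list also shows.

```python
from importlib import metadata


def list_installed():
    """Return a sorted list of (name, version) pairs for the running interpreter."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            pairs.add((name, dist.metadata.get("Version", "unknown")))
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in list_installed():
    print(f"{name}=={version}")
```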

What if I messed up my environment?#

If you messed up, you can easily remove the environment and start from scratch:

conda env remove --name my_env

How do I work with jupyter notebooks in my environment?#

Install the following packages:

  • conda install ipykernel (if you use VSCode)

  • conda install jupyterlab (otherwise)

Anaconda is too big!#

Install miniforge: https://github.com/conda-forge/miniforge.