Python
Managing Python packages
Python packages evolve quickly, and many depend on specific versions of other packages. Package managers install the packages a user requires, together with their dependencies, automatically, but sometimes no combination of versions satisfies all dependencies.
You can use the system-provided Python, or our tensorflow.sif singularity container, each with its own collection of packages. But no fixed collection of Python packages can satisfy everyone's needs.
The currently preferred solution is for users to install their own Python distribution with one or more separate collections of packages for their various needs. Conda, from the Anaconda distribution, is perhaps the most popular package manager; it maintains such collections, called environments.
Do not use pip to install packages unless there is no other way. It does not try to resolve version conflicts, and you can end up with a broken installation.
Install Anaconda
Go to www.anaconda.org, click Download Anaconda, Linux installers,
right click on the 64-Bit (x86) Installer, and copy the link.
Open an ssh window on alderaan, type wget and paste the link to create a command line like
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
and press enter. (The file name will change in future versions.) After the installer downloads, run it with the same file name, for example
/bin/sh Anaconda3-2022.05-Linux-x86_64.sh
and follow the directions. You should see
Do you wish the installer to initialize Anaconda3 by running conda init? [yes|no]
Answer yes. Edit your ~/.bash_profile file to add the line
source ~/.bashrc
then log out and back in. The conda command should now be available.
If you do not want Conda to activate the base environment every time you log in, you can turn that off by running
conda config --set auto_activate_base false
as suggested by the Anaconda installer.
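Optionally, before running a downloaded installer, you may want to check that the file arrived intact. The following is a minimal sketch, assuming a published SHA-256 hash is available for the installer and that sha256sum from GNU coreutils is installed. The function name verify_sha256 is hypothetical, and the expected hash in the usage line is a placeholder that you must replace with the published value.

```shell
#!/bin/bash
# verify_sha256 FILE EXPECTED_HASH   (hypothetical helper name)
# Returns 0 if the SHA-256 digest of FILE equals EXPECTED_HASH, 1 otherwise.
# EXPECTED_HASH must be lowercase hex, which is what sha256sum prints.
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (PUBLISHED_HASH is a placeholder, not a real value):
# verify_sha256 Anaconda3-2022.05-Linux-x86_64.sh "$PUBLISHED_HASH" \
#     && /bin/sh Anaconda3-2022.05-Linux-x86_64.sh
```

Chaining the installer after the check with && means the installer only runs when the digests match.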
Notes: If you already have your own custom settings in ~/.bash_profile or ~/.bashrc, you should review both files to make sure that they do what you intended, because a login shell will now source both files and the resulting environment may change.
Initializing Anaconda makes changes to your ~/.bashrc file. When a new interactive non-login shell starts, it sources ~/.bashrc and makes conda available, but a login shell sources ~/.bash_profile instead. See man bash and search for INVOCATION for more details.
You cannot change bash to another shell because of how authentication is set up on the clusters.
Create Conda environments and install packages
Activate the base environment:
conda activate
You should see your prompt change to start with (base). Create your first environment, for example:
conda create --name myenv python=3.6 paramiko gdal matplotlib tensorflow pandas
Of course, these are just examples; use the names and versions of the packages that you need. Note that you can request specific versions of everything, even Python itself.
Conda will search for a combination of the versions of dependencies that allows it to install what you asked for. It is best to install all packages at once to minimize the chances of a version conflict. If Conda says that some packages cannot be found, leave installing them for the next step.
Now, use the conda-forge repository to add into the environment the packages that could not be found in the previous step:
conda activate myenv
conda install -c conda-forge netCDF4 PyGrib
Finally, use pip to install packages that cannot be found even on conda-forge:
pip install MesoPy
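When pip is needed, it helps to confirm first that the pip on your PATH belongs to the active environment, so that the package does not end up in a different Python installation. A minimal sketch, assuming conda activate has set the CONDA_PREFIX variable to the directory of the active environment; the function name pip_in_active_env is hypothetical.

```shell
#!/bin/bash
# pip_in_active_env   (hypothetical helper name)
# Prints the path of pip and returns 0 only if that pip lives inside
# the directory of the currently active Conda environment.
pip_in_active_env() {
    local pip_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no Conda environment is active" >&2
        return 1
    fi
    pip_path=$(command -v pip) || { echo "pip not found on PATH" >&2; return 1; }
    case "$pip_path" in
        "$CONDA_PREFIX"/*) echo "$pip_path" ;;
        *) echo "pip at $pip_path is outside $CONDA_PREFIX" >&2; return 1 ;;
    esac
}

# Usage, with the environment already activated:
# pip_in_active_env && pip install MesoPy
```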
You may want to deactivate Conda when you are not using the environment:
conda deactivate
To make more environments, it is best to start again from the base environment like above.
Using Conda environments in a batch script
Make a batch script like this:
#!/bin/bash
#SBATCH --partition=math-alderaan
#SBATCH --job-name=conda
#SBATCH --nodes=1 # Number of requested nodes
#SBATCH --time=1:00:00 # Max wall time
#SBATCH --ntasks=1 # Number of tasks per job
#
# first emulate what happens at login or interactive shell
source ~/.bashrc
# now we can do what we normally would at the command line
conda activate myenv
python mycode.py
and submit to the scheduler using sbatch as usual.
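Depending on how your ~/.bashrc is written, sourcing it from a non-interactive batch job may not make conda available, and a failed conda activate would otherwise let the job continue with the wrong Python. The sketch below is one possible way to guard against both. It assumes Anaconda is installed in ~/anaconda3, relies on the etc/profile.d/conda.sh script that Conda installations ship, and on the CONDA_DEFAULT_ENV variable that conda activate sets. The function names load_conda and require_env are hypothetical.

```shell
#!/bin/bash
# load_conda [CONDA_ROOT]   (hypothetical helper name)
# Make the conda command available in a non-interactive shell.
load_conda() {
    local conda_root="${1:-$HOME/anaconda3}"   # assumed install location
    if [ -f "$conda_root/etc/profile.d/conda.sh" ]; then
        . "$conda_root/etc/profile.d/conda.sh"
    elif [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"                      # fall back to sourcing ~/.bashrc
    fi
    # succeed only if conda is now defined (as a function or a program)
    command -v conda >/dev/null 2>&1
}

# require_env NAME   (hypothetical helper name)
# Succeed only if NAME is the currently active Conda environment.
require_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

# In the batch script, after the #SBATCH lines:
# load_conda || { echo "conda not found" >&2; exit 1; }
# conda activate myenv
# require_env myenv || { echo "myenv is not active" >&2; exit 1; }
# python mycode.py
```

Stopping the job early when the environment is missing makes the failure visible in the job output instead of surfacing later as an import error.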
Uninstalling Anaconda
Sometimes you may need to uninstall Anaconda, e.g. to save space, or if something goes wrong and you need to start over. Delete the Anaconda install directory
cd
rm -rf anaconda3
Then, edit ~/.bashrc and delete the lines from
# >>> conda initialize >>>
to
# <<< conda initialize <<<
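The same edit can be made non-interactively. A minimal sketch using sed, assuming GNU sed as found on Linux and that the two marker lines appear in the file exactly as shown above; the function name remove_conda_init is hypothetical. It keeps a backup copy of the file with a .bak suffix.

```shell
#!/bin/bash
# remove_conda_init [FILE]   (hypothetical helper name)
# Delete the block between the conda initialize markers, inclusive.
remove_conda_init() {
    local file="${1:-$HOME/.bashrc}"
    # -i.bak edits in place and keeps the original as FILE.bak;
    # the address range matches from the opening marker to the closing one
    sed -i.bak '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$file"
}

# Usage:
# remove_conda_init ~/.bashrc
```

If the result is not what you expected, the original file can be restored from ~/.bashrc.bak.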