You may think I am crazy, but I really enjoy setting up my computer. I am the kind of person who likes to wipe their computer clean every so often and start fresh with a clean slate. This blog post is an update to an earlier blog post from 2020. A lot has changed since then, so here is a look at my current setup.
Like what you see here? Or do you know of tools that I should have included? I would love to hear from you on Twitter (https://twitter.com/TheReaLSamlam).
TL;DR
Feeling bold and want to completely clone my setup? Just run the following script and install everything at your own risk ☠️:
setup.sh
Homebrew
Using a package manager is an easy way to keep tools up to date, and it forces a consistent approach to downloading and installing new tools. Whenever possible, I try to brew install tools and software using Homebrew. To install Homebrew, run the following command:
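For reference, here is a sketch of that step using the installer one-liner published on the Homebrew homepage (https://brew.sh). The macOS check and the skip-if-present guard are additions for illustration, so the snippet is safe to re-run:

```shell
# Sketch: install Homebrew only on macOS and only when brew is not already available.
# The guard is an illustrative addition; the installer command is the one from brew.sh.
if [ "$(uname -s)" = "Darwin" ] && ! command -v brew >/dev/null 2>&1; then
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
fi

# Confirm the installation when brew is on the PATH.
if command -v brew >/dev/null 2>&1; then
  brew --version
fi
```

On Apple silicon the installer prints a couple of extra lines to add to your shell profile so that brew ends up on your PATH; follow those before continuing.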
Throughout this blog post we will use brew install wherever possible to install our tools.
~/.zshrc
If you are using the zsh shell (now the default on macOS) you will save your configurations and environment variables in the ~/.zshrc file. Below is my ~/.zshrc file. If there are any programs that you choose not to install, just delete the associated lines from this file.
Python
Search “how to install python” on Google and you will find many differing opinions. My current approach is:
- Use pyenv to manage my Python versions (e.g. it lets me have both Python 3.9 and 3.10 on my computer).
- Use venv for creating virtual environments.
- Use poetry for managing dependencies in projects.
- Use pipx to install system wide packages and command line tools.
pyenv
Installation
pyenv (https://github.com/pyenv/pyenv) allows you to manage multiple versions of Python on your computer. To install it, I like to use the pyenv install script from https://github.com/pyenv/pyenv-installer:
The script installs “pyenv and friends” which includes:
- pyenv
- pyenv-doctor
- pyenv-installer
- pyenv-update
- pyenv-virtualenv
- pyenv-which-ext
Follow the instructions from the terminal output to configure pyenv. For me, I added the following line to my ~/.zshrc file:
Using pyenv
Now that pyenv is installed, you can start to install different versions of Python. Check all of the available versions using the pyenv install --list command:
As of writing this blog post, 3.10.0 is the most current version of Python. You can install it using:
And then validate that the installation worked:
Global pyenv
I want to make the latest version of Python my default. To do so, I run the following command:
Local pyenv
For some projects I may want to use a different version of Python than my default, 3.10.0. To change the default version for a specific project I can use the local command. First, install a new version:
Then set it as the default python version for a new project:
As soon as I navigate away from the project, my python version changes back to my global default.
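Put together, the global and local workflow might look like the following sketch. It assumes pyenv is installed and initialised in your shell; 3.9.4 stands in for the second version, and my-other-project is a hypothetical directory name. The guard simply skips the commands when pyenv is missing.

```shell
# Sketch of the pyenv workflow; skipped entirely when pyenv is not installed.
if command -v pyenv >/dev/null 2>&1; then
  pyenv install --skip-existing 3.10.0   # newest version at the time of writing
  pyenv install --skip-existing 3.9.4    # assumed second version for one project
  pyenv global 3.10.0                    # default interpreter everywhere

  mkdir -p my-other-project              # hypothetical project directory
  (
    cd my-other-project
    pyenv local 3.9.4                    # writes a .python-version file here
    cat .python-version
    python --version                     # resolved through the pyenv shims
  )
  python --version                       # outside the project: global default again
fi
```

The local setting is nothing more than the .python-version file in the project directory, which is why the version switches back as soon as you leave it.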
venv
Creating a new virtual environment
venv comes built into Python. It is used to create virtual environments. Every time I start a new project I create a new virtual environment. Let's demonstrate by creating a new project:
To create a new virtual environment run the following command:
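As a sketch, the two steps could look like this. The directory name my-project is hypothetical, and python3 is used so the snippet also works where no plain python command exists; with pyenv active, python behaves the same way.

```shell
# Create a new project directory (the name is a placeholder).
mkdir -p my-project

# Work inside it in a subshell so your current directory is left unchanged.
(
  cd my-project
  # Create a virtual environment in ./venv with the currently active interpreter.
  python3 -m venv venv
  # Peek at what was generated.
  ls venv/bin
)
```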
Let's break down the command python -m venv venv:
- python -m venv runs the venv module from the command line.
- The last part of the command, the second venv, is the positional argument for ENV_DIR. Run python -m venv --help to see all the arguments and options for the venv program.
- By convention, I always name the ENV_DIR venv. However, you can name it anything you like (e.g. python -m venv my-virtual-environment would also work).
The python -m venv venv command creates a new directory named venv in our current project. Let's take a look inside:
Inside the venv/bin/ directory are several files and scripts, including the ones used to activate the virtual environment. Run the following command to activate your virtual environment:
If you are using starship your command line will now have a nice indicator letting you know that you are using a virtual environment:
You can prove to yourself that you are in a brand new isolated Python environment by running:
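One way to run that check, as a sketch: inside a virtual environment sys.prefix points at the environment while sys.base_prefix points at the interpreter it was created from, so the two differ. The demo-venv name is hypothetical, and the environment is created here only to keep the snippet self-contained.

```shell
# Create and activate a scratch environment (name is a placeholder).
python3 -m venv demo-venv
. demo-venv/bin/activate

# Prints True when the interpreter is running inside a virtual environment.
python -c 'import sys; print(sys.prefix != sys.base_prefix)'

# A fresh environment lists only the packaging tools it was bootstrapped with.
python -m pip list

# Leave the environment again.
deactivate
```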
Nice! You have a brand new canvas to start your next python project on. While the virtual environment is activated, anything you pip install will only be installed into the virtual environment.
First I will update pip to the most current version.
Then let's install a package:
Since I do this often, I have included a little helper function in my ~/.zshrc file to automate the creation and updating of new virtual environments.
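A minimal sketch of such a helper, assuming the environment is always named venv and lives in the current directory (the exact body here is an assumption, not a verbatim copy of the function):

```shell
# Hypothetical helper for ~/.zshrc: create, activate, and update a virtual environment.
py_new_env() {
  # Create ./venv with whichever Python is currently active.
  python -m venv venv || return 1
  # Activate it in the current shell session.
  . venv/bin/activate || return 1
  # Bring pip up to date inside the new environment.
  python -m pip install --upgrade pip
}
```

Because activation has to happen in your current shell, this needs to be a shell function rather than a standalone script.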
With this function you can call py_new_env from the command line at any time to create, activate, and then update a new virtual environment 🎉!
Using with pyenv
When you run the command python -m venv venv, the virtual environment will automatically be created using whichever version of Python you currently have activated. If you are unsure, run the following command to check before creating a new virtual environment:
I can see that Python 3.10.0 is currently active. I can double check by just running:
If I want to create my virtual environment using a different python version I must first activate the other version using pyenv:
Then I can create my virtual environment:
Let's compare the two different virtual environments we created:
You can see that the first one we created (venv) is using python3.10, and the second one we created (venv-394) is using python3.9.
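Another way to see which interpreter an environment was built from, as a sketch: every environment created by venv contains a pyvenv.cfg file that records the base interpreter's location and version. The compare-venv name is hypothetical.

```shell
# Create a scratch environment with the currently active interpreter.
python3 -m venv compare-venv

# Version of the interpreter that was used to create it.
python3 --version

# pyvenv.cfg records the base interpreter's directory (home) and its version.
grep -E '^(home|version)' compare-venv/pyvenv.cfg

# The lib directory is also named after the major.minor version.
ls compare-venv/lib
```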
venv vs. pyenv
When I first started using these tools I would often get them mixed up. What is venv doing? What is pyenv doing? Do I need both?
- pyenv controls your Python version (e.g. 3.10 vs. 3.9).
- venv isolates your project dependencies (the things you pip install).
poetry
poetry is a tool for python dependency management and packaging. From their website https://python-poetry.org/:
Python packaging and dependency management made easy
Installation
To install poetry run the following command:
Follow the instructions from the terminal output to configure poetry. For me, I added the following line to my ~/.zshrc file:
Verify that the installation worked by running:
Starting a new project with poetry
I use poetry for almost every new project that I start. Let's create an example project to demonstrate:
Your directory now has a new file, pyproject.toml. This file is used to keep track of the dependencies required for your project.
Go ahead and manually update the project's description in pyproject.toml. To add a dependency, use the poetry add command. This command essentially replaces pip install. When you use poetry add, a few things happen:
- The new package is added to your pyproject.toml file.
- The poetry dependency resolver verifies the version requirements.
- A virtual environment is created by poetry.
Take a look at your project, you will notice a new file:
The new poetry.lock file contains a detailed description of all the packages you are using (this includes requests, and all of the packages that requests depends on). The pyproject.toml file has been automatically updated to include requests as a dependency.
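The steps so far can be sketched as follows, assuming poetry is installed and on your PATH. The directory name poetry-demo is hypothetical, poetry init --no-interaction accepts the default answers instead of prompting, and the guard skips everything when poetry is missing.

```shell
# Sketch of starting a project with poetry; skipped when poetry is not installed.
if command -v poetry >/dev/null 2>&1; then
  mkdir -p poetry-demo                 # hypothetical project directory
  (
    cd poetry-demo
    poetry init --no-interaction       # writes pyproject.toml with default metadata
    poetry add requests                # resolves, installs, and records the dependency
    ls                                 # pyproject.toml and poetry.lock should now exist
  )
fi
```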
Using the poetry virtual environment
In order to isolate your package, poetry creates a virtual environment. To access this virtual environment you can prefix any command with poetry run and it will run inside the virtual environment.
It can be annoying to prefix every command with poetry run. Alternatively, you can run all commands in the poetry virtual environment with poetry shell.
Adding development dependencies
When creating a Python package it is common that you, as the developer, will need a particular package (e.g. black for code formatting) that the end user will not need in order to run the program. poetry handles this by allowing you to specify development dependencies with the --dev option. Newer Poetry releases (1.2 and later) express the same idea with dependency groups, e.g. poetry add --group dev black.
As you can see, black has been added as a dev-dependency. When you publish your package to PyPI it will not include black as a requirement, but it will include requests.
poetry vs venv
It may not be clear when you should use poetry vs. when you should use venv. There is no correct answer, but in general I use poetry when:
- I am creating a Python package that I will share with others via PyPI and/or GitHub.
I use venv when:
- I do not think that I will be sharing the code with anyone.
- I want to quickly experiment.
If you are unsure about which tool to use, just choose one and get started! You can always change your mind later on. The important thing is that you are using a virtual environment!
pipx
Oh no, another Python tool 🤯!
Yes, it is a lot to take in. But hopefully you will begin to find these tools useful as you continue your python journey. From the pipx website (https://pypa.github.io/pipx/):
pipx — Install and Run Python Applications in Isolated Environments
To install pipx run the following:
I use pipx to install command line tools that I always want to be available to me:
Python packages
Below is a collection of my favourite Python packages. I will not go into detail here, but these are the packages that I use on the most regular basis.
R
Installing R
Install the latest version of R from CRAN:
Select “Download R for macOS”.
If you are using a newer Mac with an M1 chip select the second option for Apple silicon arm64. Otherwise, choose the first option for Intel 64-bit.
Click on the link and follow the instructions as prompted. Select all of the default configuration options.
Installing XQuartz
In order to use R on macOS you will also need to install XQuartz (https://www.xquartz.org/). From the R for macOS page on CRAN:
Note: the use of X11 (including tcltk) requires XQuartz to be installed since it is no longer part of OS X. Always re-install XQuartz when upgrading your macOS to a new major version.
You can install XQuartz using homebrew:
RStudio Desktop
If you use R, you are probably already using RStudio Desktop. You can install RStudio from here: https://www.rstudio.com/products/rstudio/download/#download.
R packages
Whenever I am starting with a fresh R install, the first thing I do is install the tidyverse 🤓. It includes 90% of what I use on a regular basis. The tidyverse is actually not one package, but a collection of packages that follow a common design language. The tidyverse is my favourite part of R!
Terminal
Every nerd's favourite place to be… the terminal. As a data scientist / developer, your terminal setup is a great way to express yourself. Do you like to get crazy and customize everything? Do you keep it simple and stick to the defaults? I like to take a middle-ground approach. I want things to look pretty, but I also do not want to waste too much time configuring things.
iTerm2
iTerm2 is a replacement for the default terminal app that comes with your Mac. It includes some nice features such as tabs and split panes.
Image from https://iterm2.com/features.html
To install run the following command:
Starship
starship is a cross-shell prompt. According to their website:
The minimal, blazing-fast, and infinitely customizable prompt for any shell!
Gif from https://starship.rs/guide/
I really like starship because:
- I think the defaults look good and are reasonable.
- I can easily have a consistent prompt across all of my devices.
- When needed, starship also has lots of customization options.
To install starship run the following command:
Command line tools
exa
From the exa website (https://the.exa.website/):
A modern replacement for ls.
Image from https://github.com/ogham/exa
It has defaults that I prefer, and has a nice coloured output. Run the following to install:
bat
From the bat repo (https://github.com/sharkdp/bat):
A cat clone with wings.
Run the following to install:
just
From the just repo (https://github.com/casey/just):
just is a handy way to save and run project-specific commands.
just is very similar to the ubiquitous make command. It is inspired by make, but focuses on just being a command runner as opposed to a build tool. To install just run the following command:
bpytop
From the bpytop GitHub repo (https://github.com/aristocratos/bpytop):
Resource monitor that shows usage and stats for processor, memory, disks, network and processes.
To install bpytop use Homebrew:
VS Code
Installation
VS Code is quickly becoming the editor of choice for many languages, including Python. For me, I use VS Code for most Python projects. I will also use VS Code when I want to quickly view a text based file (e.g. .json, .txt, .R, etc.).
Install VS Code directly from the VS Code website and follow the installation instructions:
https://code.visualstudio.com/download
Plugins
Out of the box VS Code comes with many great features, but it will not have everything you need. One of VS Code's greatest strengths is the strong ecosystem of plugins that are available. My favourites are:
- Python
- Python Indent
- autoDocstring - Python Docstring Generator
- Code Spell Checker
- CodeSnap
- Draw.io Integration
- File Utils
- Git Graph
- just
- Open Folder Context Menus for VS Code
- Open in GitHub
- Path Intellisense
- Peacock
- Rainbow CSV
- Visual Studio IntelliCode
Apps
Docker Desktop
Very helpful for creating reproducible environments and for deploying applications to production (https://www.docker.com/products/docker-desktop/).
Typora
A nice text editor for markdown (https://typora.io/). Typora costs $14.99 but is worth it if you are writing a lot of markdown.
Fonts
You can manage and install fonts using Homebrew. The website https://www.nerdfonts.com/ provides many fun fonts that include glyphs (icons). When you use these fonts with tools like starship your terminal output will look much better!
I use the Fira Code Nerd Font which is recommended by starship:
Remember to update iTerm2 (and any other terminals you may be using) to use your new font.
Inspiration and reference
I have referenced and used the below guides many times. Check them out for additional inspiration and ideas on how to create your perfect data science setup.
- Modern Python Developers Toolkit by Sebastian Witowski.
- UBC MDS software stack.