In this lab you will set up your Python development environment, get the Cloud Dataflow SDK for Python, and run an example pipeline using the Cloud Console.

Create a Cloud Storage bucket

  1. In the Cloud Console, click the Navigation menu, then click Storage.
  2. Click Create bucket.
  3. In the Create bucket dialog, specify the following attributes:
  4. Click Create.

Install pip and the Cloud Dataflow SDK

  1. The Cloud Dataflow SDK for Python requires Python version 3.7. Check that you are using Python version 3.7 by running:
python3 --version
  2. Install pip, Python's package manager. Check if you already have pip installed by running:
pip3 --version
  3. After installation, check that you have pip version 7.0.0 or newer. To update pip, run the following command:
sudo pip3 install -U pip
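
The version checks in the steps above can also be scripted. The following is a minimal sketch, not part of the lab itself: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the sample strings only imitate the general shape of `python3 --version` and `pip3 --version` output. It extracts the first dotted version number from a line of text and compares it against a minimum such as 7.0.0.

```python
import re
import sys


def version_tuple(text):
    """Return the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Return True if the version in `found` is at least `minimum`."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that "7.0" compares equal to "7.0.0".
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Report the major and minor version of the running interpreter.
    print("Interpreter version:", sys.version_info[:2])
    # Sample string only; paste the real output of `pip3 --version` here.
    print("pip is new enough:", meets_minimum("pip 20.0.2", "7.0.0"))
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "10.0.0" sorts before "7.0.0" alphabetically.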

This step is optional but highly recommended: install virtualenv and create a Python virtual environment for initial experiments.

If you do not have virtualenv version 13.1.0 or newer, install it by running:

sudo pip3 install --upgrade virtualenv
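
For comparison, a virtual environment can also be created from Python itself. The sketch below is an assumption-laden illustration rather than the lab's own method: it uses the standard library `venv` module instead of the virtualenv package named above, and the helper name `create_env` and the temporary target directory are hypothetical.

```python
import os
import tempfile
import venv


def create_env(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return the path of its config file."""
    # venv.create builds the directory layout and writes pyvenv.cfg into env_dir.
    venv.create(env_dir, with_pip=with_pip)
    return os.path.join(env_dir, "pyvenv.cfg")


if __name__ == "__main__":
    # Hypothetical location; choose any directory you can write to.
    target = os.path.join(tempfile.mkdtemp(), "env")
    print("Created environment described by:", create_env(target))
```

Passing `with_pip=True` asks `venv` to bootstrap pip into the new environment, which requires the `ensurepip` module to be available on your system.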