In this lab you will set up your Python development environment, get the Cloud Dataflow SDK for Python, and run an example pipeline using the Cloud Console.
Create a Cloud Storage bucket
- In the Cloud Console, click the Navigation menu, then click Storage.
- Click Create bucket.
- In the Create bucket dialog, specify the following attributes:
- Name: A unique bucket name. Do not include sensitive information in the bucket name, as the bucket namespace is global and publicly visible.
- Location type: Region
- Location: us-central1 (the location where bucket data will be stored)
- Click Create.
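Choosing a name that is both unique and free of sensitive information can be awkward by hand. The following is a minimal sketch, not part of the lab itself, of one way to derive such a name by appending a random suffix to a neutral prefix. The helper `make_bucket_name`, the prefix `dataflow-lab`, and the `bucket_settings` dictionary are hypothetical names introduced for illustration. A random suffix makes a collision unlikely but does not guarantee uniqueness.

```python
import uuid


def make_bucket_name(prefix: str) -> str:
    """Return a lowercase name: the prefix plus a random 8-character hex suffix."""
    suffix = uuid.uuid4().hex[:8]
    return f"{prefix.lower()}-{suffix}"


# Values mirroring the Create bucket dialog described above.
# "dataflow-lab" is a hypothetical, non-sensitive prefix.
bucket_settings = {
    "name": make_bucket_name("dataflow-lab"),
    "location_type": "Region",
    "location": "us-central1",
}

print(bucket_settings["name"])
```

The printed name can be pasted into the Name field of the dialog. If the Console reports that the name is taken, run the script again to get a different suffix.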
Install pip and the Cloud Dataflow SDK
- The Cloud Dataflow SDK for Python requires Python version 3.7. Check that you are using Python version 3.7 by running:
python3 --version
- Install pip, Python's package manager. First, check whether pip is already installed by running:
pip3 --version
- After installation, check that you have pip version 7.0.0 or newer. To update pip, run the following command:
sudo pip3 install -U pip
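The two version checks above can also be done programmatically. The sketch below is an illustration under stated assumptions rather than part of the lab: it assumes `pip3 --version` prints text beginning with `pip <version>`, and the helper names (`python_is_supported`, `parse_pip_version`, `pip_is_new_enough`, `installed_pip_output`) are hypothetical. The required versions are taken from the steps above.

```python
import re
import subprocess
import sys

REQUIRED_PYTHON = (3, 7)   # Python version named in the lab text
MINIMUM_PIP = (7, 0, 0)    # minimum pip version named in the lab text


def python_is_supported(version_info=sys.version_info):
    """Return True when the major.minor version is exactly 3.7."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def parse_pip_version(output):
    """Extract a 3-part version tuple from text such as 'pip 20.0.2 from ...'."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized pip output: {output!r}")
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions such as '7.0' so they compare correctly with (7, 0, 0).
    return (parts + (0, 0, 0))[:3]


def pip_is_new_enough(output):
    """Return True when the reported pip version is 7.0.0 or newer."""
    return parse_pip_version(output) >= MINIMUM_PIP


def installed_pip_output():
    """Run 'pip3 --version' and return its text; raises if the command fails."""
    result = subprocess.run(
        ["pip3", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `pip_is_new_enough(installed_pip_output())` reports whether the `sudo pip3 install -U pip` step is needed, and `python_is_supported()` checks the interpreter running the script.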
This step is optional but highly recommended: install virtualenv and create a Python virtual environment for initial experiments.
If you do not have virtualenv version 13.1.0 or newer, install it by running:
sudo pip3 install --upgrade virtualenv