This lab gives you hands-on practice training TensorFlow 2.x models, both locally and on AI Platform. After training, you will learn how to deploy your model to AI Platform for serving (prediction). You'll train your model to predict a person's income category using the United States Census Income Dataset.
This lab gives you an introductory, end-to-end experience of training and prediction on AI Platform, using a census dataset.
You can read the files directly from Cloud Storage or copy them to your local environment.
%%bash
mkdir -p data
gsutil -m cp gs://cloud-samples-data/ml-engine/census/data/* data/
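After the copy finishes, a quick look at what landed in the local directory can catch an empty or partial download. The following is a minimal sketch, not part of the lab's code: the helper name `preview_files` is hypothetical, and it only assumes that the files were copied into `data/` as in the cell above.

```python
from itertools import islice
from pathlib import Path


def preview_files(data_dir="data", n_lines=2):
    """Map each file name in data_dir to its first n_lines lines."""
    previews = {}
    for path in sorted(Path(data_dir).glob("*")):
        if path.is_file():
            with path.open() as handle:
                # islice stops early if the file has fewer than n_lines lines.
                previews[path.name] = [
                    line.rstrip("\n") for line in islice(handle, n_lines)
                ]
    return previews


# Only print a preview if the directory from the copy step exists.
if Path("data").is_dir():
    for name, lines in preview_files().items():
        print(name, lines)
```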
util.py contains utility methods for cleaning and preprocessing the data, as well as performing any feature engineering needed by transforming and normalizing the data.
%%writefile trainer/util.py
def _download_and_clean_file(filename, url):
    """Downloads data from url, and makes changes to match the CSV format.

    The CSVs may use spaces after the comma delimiters (non-standard) or include
    rows which do not represent well-formed examples. This function strips out
    some of these problems.

    Args:
        filename: filename to save url to
        url: URL of resource to download
    """
def download(data_dir):
    """Downloads census data if it is not already present.

    Args:
        data_dir: directory where we will access/save the census data
    """
def preprocess(dataframe):
    """Converts categorical features to numeric. Removes unused columns.

    Args:
        dataframe: Pandas dataframe with raw data

    Returns:
        Dataframe with preprocessed data
    """
def standardize(dataframe):
    """Scales numerical columns using their means and standard deviation to get
    z-scores: the mean of each numerical column becomes 0, and the standard
    deviation becomes 1. This can help the model converge during training.

    Args:
        dataframe: Pandas dataframe

    Returns:
        Input dataframe with the numerical columns scaled to z-scores
    """
def load_data():
    """Loads data into preprocessed (train_x, train_y, eval_x, eval_y)
    dataframes.

    Returns:
        A tuple (train_x, train_y, eval_x, eval_y), where train_x and eval_x are
        Pandas dataframes with features for training and train_y and eval_y are
        numpy arrays with the corresponding labels.
    """
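The cleaning, encoding, and scaling steps described in the docstrings above can be sketched on a tiny in-memory sample. This is a minimal illustration and not the lab's util.py: the sample rows, the helper name `clean_lines`, and the extra `categorical`, `unused`, and `numeric` parameters are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical miniature of a raw CSV: spaces after commas, one blank row,
# and one row that is not a well-formed example. Values are illustrative.
RAW = (
    "age, workclass, fnlwgt, hours_per_week, income_bracket\n"
    "39, State-gov, 77516, 40, <=50K\n"
    "\n"
    "not a data row\n"
    "50, Private, 83311, 13, >50K\n"
    "38, Private, 215646, 45, <=50K\n"
)


def clean_lines(text):
    """Strip spaces after commas and drop rows that contain no comma."""
    cleaned = []
    for line in text.splitlines():
        line = line.strip().replace(", ", ",")
        if "," not in line:
            continue  # blank or malformed row
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


def preprocess(df, categorical, unused):
    """Drop unused columns and map each categorical column to integer codes."""
    df = df.drop(columns=unused)
    for col in categorical:
        df[col] = df[col].astype("category").cat.codes
    return df


def standardize(df, numeric):
    """Scale the given numeric columns to z-scores."""
    df = df.copy()
    df[numeric] = (df[numeric] - df[numeric].mean()) / df[numeric].std()
    return df


frame = pd.read_csv(io.StringIO(clean_lines(RAW)))
frame = preprocess(
    frame, categorical=["workclass", "income_bracket"], unused=["fnlwgt"]
)
frame = standardize(frame, numeric=["age", "hours_per_week"])
print(frame)
```

Note that pandas computes `.std()` as the sample standard deviation (`ddof=1`), so the scaled columns have a sample standard deviation of 1. In practice the means and standard deviations would normally be computed on the training split and reused for the evaluation split.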
model.py defines the input function and the model architecture.
%%writefile trainer/model.py
def input_fn(features, labels, shuffle, num_epochs, batch_size):
    """Generates an input function to be used for model training.

    Args:
        features: numpy array of features used for training or inference
        labels: numpy array of labels for each example
        shuffle: boolean for whether to shuffle the data or not (set True for
            training, False for evaluation)
        num_epochs: number of epochs to provide the data for
        batch_size: batch size for training

    Returns:
        A tf.data.Dataset that can provide data to the Keras model for training or
        evaluation
    """
def create_keras_model(input_dim, learning_rate):
    """Creates Keras Model for Binary Classification.

    The single output node + Sigmoid activation makes this a Logistic
    Regression.

    Args:
        input_dim: How many features the input has
        learning_rate: Learning rate for training

    Returns:
        The compiled Keras model (still needs to be trained)
    """
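To make the two signatures above concrete, here is one way their bodies could look. It is a sketch under stated assumptions, not the lab's model.py: the model is reduced to a single sigmoid output node with no hidden layers, the Adam optimizer and the accuracy metric are choices made for the example, and the random data stands in for the census features.

```python
import numpy as np
import tensorflow as tf


def input_fn(features, labels, shuffle, num_epochs, batch_size):
    """Build a tf.data.Dataset of (features, labels) batches."""
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    if shuffle:
        # Buffer the whole (small) dataset so the shuffle is uniform.
        dataset = dataset.shuffle(buffer_size=len(features))
    return dataset.repeat(num_epochs).batch(batch_size)


def create_keras_model(input_dim, learning_rate):
    """Compile a binary classifier with one sigmoid output node."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Synthetic stand-in data: 32 examples, 4 features, binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4)).astype("float32")
y = (x[:, :1] > 0).astype("float32")

dataset = input_fn(x, y, shuffle=True, num_epochs=2, batch_size=8)
model = create_keras_model(input_dim=4, learning_rate=0.01)
model.fit(dataset, epochs=1, verbose=0)
```

Because the dataset is repeated `num_epochs` times before batching, a single pass over it here covers 64 examples in 8 batches.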
task.py trains on data loaded and preprocessed in util.py. By running inside the tf.distribute.MirroredStrategy() scope, it is possible to train in a distributed fashion. The trained model is then saved in the TensorFlow SavedModel format.
%%writefile trainer/task.py
def get_args():
    """Argument parser.

    Returns:
        Dictionary of arguments.
    """
def train_and_evaluate(args):
    """Trains and evaluates the Keras model.

    Uses the Keras model defined in model.py and trains on data loaded and
    preprocessed in util.py. Saves the trained model in TensorFlow SavedModel
    format to the path defined in part by the --job-dir argument.

    Args:
        args: dictionary of arguments - see get_args() for details
    """
    ...
    ...
    export_path = os.path.join(args.job_dir, 'keras_export')
    tf.keras.models.save_model(keras_model, export_path)
if __name__ == '__main__':
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        args = get_args()
        tf.compat.v1.logging.set_verbosity(args.verbosity)
        train_and_evaluate(args)
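The code above reads `args.job_dir` and `args.verbosity` as attributes, which is how `argparse` exposes parsed flags. A minimal sketch of what `get_args()` might look like follows. Only `--job-dir` and `--verbosity` are implied by the code above; the other flags, all defaults, and the optional `argv` parameter are assumptions added for illustration.

```python
import argparse


def get_args(argv=None):
    """Parse command-line flags; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--job-dir", type=str, required=True,
        help="local or GCS location for checkpoints and the exported model")
    # Hypothetical training flags, named after the parameters of
    # input_fn() and create_keras_model().
    parser.add_argument("--num-epochs", type=int, default=20)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument(
        "--verbosity", default="INFO",
        choices=["DEBUG", "ERROR", "FATAL", "INFO", "WARN"])
    return parser.parse_args(argv)


# argparse turns "--job-dir" into the attribute "job_dir".
args = get_args(["--job-dir", "/tmp/census", "--num-epochs", "5"])
print(args.job_dir, args.num_epochs, args.verbosity)
```

Passing an explicit list to `get_args` makes the parser easy to exercise in a notebook cell without touching `sys.argv`.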