# TensorFlow Cloud

The TensorFlow Cloud repository provides APIs that make it easy to go from debugging, training, and tuning your Keras and TensorFlow code in a local environment to distributed training and tuning on the cloud.
## Introduction

- TensorFlow Cloud `run` API for GCP training/tuning
## Installation

### Requirements

- Python >= 3.6
- Google AI Platform APIs enabled for your GCP account. We use the AI Platform for deploying docker images on GCP.
- Either a functioning version of docker, if you want to use a local docker process for your build, or a Cloud Storage bucket to use with Google Cloud Build for docker image build and publishing.
- (optional) nbconvert, if you are using a notebook file as `entry_point`, as shown in usage guide #4.
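As a quick sanity check, the Python version requirement above can be verified programmatically; a minimal sketch using only the standard library:

```python
import sys

# Verify the interpreter satisfies the Python >= 3.6 requirement
# before installing tensorflow-cloud.
if sys.version_info < (3, 6):
    raise RuntimeError(
        "tensorflow-cloud requires Python >= 3.6, found "
        + ".".join(map(str, sys.version_info[:3]))
    )
print("Python version OK:", ".".join(map(str, sys.version_info[:3])))
```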
For detailed end-to-end setup instructions, please see the Setup instructions section below.
### Install latest release

```shell
pip install -U tensorflow-cloud
```
### Install from source

```shell
git clone https://github.com/tensorflow/cloud.git
cd cloud
pip install src/python/.
```
## High level overview

The TensorFlow Cloud package provides the `run` API for training your models on GCP.
To start, let's walk through a simple workflow using this API.
1. Let's begin with Keras model training code such as the following, saved as `mnist_example.py`.

   ```python
   import tensorflow as tf

   (x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()

   x_train = x_train.reshape((60000, 28 * 28))
   x_train = x_train.astype('float32') / 255

   model = tf.keras.Sequential([
       tf.keras.layers.Dense(512, activation='relu', input_shape=(28 * 28,)),
       tf.keras.layers.Dropout(0.2),
       tf.keras.layers.Dense(10, activation='softmax')
   ])

   model.compile(loss='sparse_categorical_crossentropy',
                 optimizer=tf.keras.optimizers.Adam(),
                 metrics=['accuracy'])

   model.fit(x_train, y_train, epochs=10, batch_size=128)
   ```
2. After you have tested this model on your local environment for a few epochs, probably with a small dataset, you can train the model on Google Cloud by writing the following simple script, `scale_mnist.py`.

   ```python
   import tensorflow_cloud as tfc
   tfc.run(entry_point='mnist_example.py')
   ```

   Running `scale_mnist.py` will automatically apply the TensorFlow one-device strategy and train your model at scale on Google Cloud Platform. Please see the usage guide section for detailed instructions and additional API parameters.
3. You will see output similar to the following on your console. This information can be used to track the training job status.

   ```shell
   user@desktop$ python scale_mnist.py
   Job submitted successfully.
   Your job ID is:  tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e
   Please access your job logs at the following URL:
   https://console.cloud.google.com/mlengine/jobs/tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e?project=prod-123
   ```
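The job ID in the output above can be captured and passed to monitoring tools such as `gcloud ai-platform jobs describe`. A small illustrative sketch (the output string is copied from the example above) that parses it with the standard library:

```python
import re

# Example console output from a run() submission (from the text above).
output = (
    "Job submitted successfully.\n"
    "Your job ID is:  tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e\n"
)

# Pull out the job ID so it can be passed to monitoring tools,
# e.g. `gcloud ai-platform jobs describe <job-id>`.
match = re.search(r"Your job ID is:\s*(\S+)", output)
job_id = match.group(1) if match else None
print(job_id)  # tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e
```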
## Setup instructions

End-to-end instructions to help set up your environment for TensorFlow Cloud. You can use one of the following notebooks to set up your project, or follow the instructions below.
<table align="left"> <td> <a href="https://colab.research.google.com/github/tensorflow/cloud/blob/master/examples/google_cloud_project_setup_instructions.ipynb"> <img width="50" src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo">Run in Colab </a> </td> <td> <a href="https://github.com/tensorflow/cloud/blob/master/examples/google_cloud_project_setup_instructions.ipynb"> <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">View on GitHub </a> </td> <td> <a href="https://www.kaggle.com/nitric/google-cloud-project-setup-instructions"> <img width="90" src="https://www.kaggle.com/static/images/site-logo.png" alt="Kaggle logo">Run in Kaggle </a> </td> </table>

1. Create a new local directory.

   ```shell
   mkdir tensorflow_cloud
   cd tensorflow_cloud
   ```
2. Make sure you have `python >= 3.6`.

   ```shell
   python -V
   ```
3. Set up a virtual environment.

   ```shell
   virtualenv tfcloud --python=python3
   source tfcloud/bin/activate
   ```
4. Set up your Google Cloud project.

   Verify that the gcloud SDK is installed.

   ```shell
   which gcloud
   ```

   Set the default gcloud project.

   ```shell
   export PROJECT_ID=<your-project-id>
   gcloud config set project $PROJECT_ID
   ```
5. Create a service account.

   ```shell
   export SA_NAME=<your-sa-name>
   gcloud iam service-accounts create $SA_NAME
   gcloud projects add-iam-policy-binding $PROJECT_ID \
       --member serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
       --role 'roles/editor'
   ```

   Create a key for your service account.

   ```shell
   gcloud iam service-accounts keys create ~/key.json --iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
   ```

   Create the GOOGLE_APPLICATION_CREDENTIALS environment variable.

   ```shell
   export GOOGLE_APPLICATION_CREDENTIALS=~/key.json
   ```
6. Create a Cloud Storage bucket. Using Google Cloud Build is the recommended method for building and publishing docker images, although we optionally allow for a local docker daemon process depending on your specific needs.

   ```shell
   BUCKET_NAME="your-bucket-name"
   REGION="us-central1"
   gcloud auth login
   gsutil mb -l $REGION gs://$BUCKET_NAME
   ```

   (optional for local docker setup)

   ```shell
   sudo dockerd
   ```
7. Authenticate access to Google Container Registry.

   ```shell
   gcloud auth configure-docker
   ```
8. Install nbconvert if you plan to use a notebook file as `entry_point`, as shown in usage guide #4.

   ```shell
   pip install nbconvert
   ```
9. Install the latest release of tensorflow-cloud.

   ```shell
   pip install tensorflow-cloud
   ```
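One note on the bucket-creation step above: Cloud Storage bucket names must be globally unique and follow the GCS naming rules. A small sketch that pre-checks a candidate name against the basic character rules (this helper is illustrative, not part of tensorflow-cloud):

```python
import re

# GCS bucket names: 3-63 characters; lowercase letters, digits,
# '-', '_', and '.'; must start and end with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the basic GCS character rules."""
    return bool(BUCKET_RE.match(name))

print(is_valid_bucket_name("your-bucket-name"))  # True
print(is_valid_bucket_name("Bad_Bucket"))        # False (uppercase)
```

This only checks syntax; whether the name is actually available is determined at `gsutil mb` time.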
## Usage guide

As described in the high level overview, the `run` API allows you to train your models at scale on GCP. The `run` API can be used in four different ways, defined by where you are running the API (terminal vs IPython notebook) and your `entry_point` parameter. `entry_point` is an optional Python script or notebook file path to the file that contains your TensorFlow Keras training code. This is the most important parameter in the API.
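To make the four combinations concrete, here is an illustrative decision helper; the mode labels are ours and not part of the tensorflow-cloud API:

```python
from typing import Optional

def run_mode(in_notebook: bool, entry_point: Optional[str]) -> str:
    """Map (environment, entry_point) to one of the four usage modes
    described below. Illustrative only -- not part of the tfc API."""
    if entry_point is not None:
        if entry_point.endswith('.py'):
            return 'python file as entry_point'        # usage guide #1
        if entry_point.endswith('.ipynb'):
            return 'notebook file as entry_point'      # usage guide #2
        raise ValueError('entry_point must be a .py or .ipynb file')
    # With entry_point=None, the calling file itself contains the model.
    if in_notebook:
        return 'run within a notebook containing the model'
    return 'run within a script containing the model'  # usage guide #3

print(run_mode(False, 'mnist_example.py'))  # python file as entry_point
```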
```python
run(entry_point=None,
    requirements_txt=None,
    distribution_strategy='auto',
    docker_config='auto',
    chief_config='auto',
    worker_config='auto',
    worker_count=0,
    entry_point_args=None,
    stream_logs=False,
    job_labels=None,
    **kwargs)
```
1. Using a python file as `entry_point`.

   If you have your `tf.keras` model in a python file (`mnist_example.py`), then you can write the following simple script (`scale_mnist.py`) to scale your model on GCP.

   ```python
   import tensorflow_cloud as tfc
   tfc.run(entry_point='mnist_example.py')
   ```

   Please note that all the files in the same directory tree as `entry_point` will be packaged in the docker image created, along with the `entry_point` file. It's recommended to create a new directory to house each cloud project, which includes the necessary files and nothing else, to optimize image build times.
2. Using a notebook file as `entry_point`.

   If you have your `tf.keras` model in a notebook file (`mnist_example.ipynb`), then you can write the following simple script (`scale_mnist.py`) to scale your model on GCP.

   ```python
   import tensorflow_cloud as tfc
   tfc.run(entry_point='mnist_example.ipynb')
   ```

   Please note that all the files in the same directory tree as `entry_point` will be packaged in the docker image created, along with the `entry_point` file. Like the python script `entry_point` above, we recommend creating a new directory to house each cloud project, which includes the necessary files and nothing else, to optimize image build times.
3. Using `run` within a python script that contains the `tf.keras` model.

   You can use the `run` API from within your python file that contains the `tf.keras` model (`mnist_scale.
