TensorFlow Cloud

The TensorFlow Cloud repository provides APIs that let you move easily from debugging, training, and tuning your Keras and TensorFlow code in a local environment to distributed training and tuning on Google Cloud.

Introduction

TensorFlow Cloud run API for GCP training/tuning

Installation

Requirements

For detailed end-to-end setup instructions, see the Setup instructions section.

Install latest release

pip install -U tensorflow-cloud

Install from source

git clone https://github.com/tensorflow/cloud.git
cd cloud
pip install src/python/.

High level overview

The TensorFlow Cloud package provides the run API for training your models on GCP. To start, let's walk through a simple workflow using this API.

  1. Let's begin with Keras model training code such as the following, saved as mnist_example.py.

    import tensorflow as tf
    
    (x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()
    
    x_train = x_train.reshape((60000, 28 * 28))
    x_train = x_train.astype('float32') / 255
    
    model = tf.keras.Sequential([
      tf.keras.layers.Dense(512, activation='relu', input_shape=(28 * 28,)),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    
    model.fit(x_train, y_train, epochs=10, batch_size=128)
    
  2. After you have tested this model in your local environment for a few epochs, probably with a small dataset, you can train it on Google Cloud by writing the following short script, scale_mnist.py.

    import tensorflow_cloud as tfc
    tfc.run(entry_point='mnist_example.py')
    

    Running scale_mnist.py automatically applies the TensorFlow one-device strategy (tf.distribute.OneDeviceStrategy) and trains your model at scale on Google Cloud Platform. See the usage guide section for detailed instructions and additional API parameters.

  3. You will see an output similar to the following on your console. This information can be used to track the training job status.

    user@desktop$ python scale_mnist.py
    Job submitted successfully.
    Your job ID is:  tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e
    Please access your job logs at the following URL:
    https://console.cloud.google.com/mlengine/jobs/tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e?project=prod-123
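
If you capture this console output in a script (for example, to store the job ID alongside experiment notes), the job ID and log URL can be pulled out with a couple of regular expressions. The following is a minimal sketch that assumes the output looks like the sample above; the real message format may differ between releases, and the helper name `parse_job_output` is hypothetical, not part of TensorFlow Cloud.

```python
import re

# Sample console output copied from the example above; real output may differ.
SAMPLE_OUTPUT = """\
Job submitted successfully.
Your job ID is:  tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e
Please access your job logs at the following URL:
https://console.cloud.google.com/mlengine/jobs/tf_cloud_train_519ec89c_a876_49a9_b578_4fe300f8865e?project=prod-123
"""


def parse_job_output(text):
    """Return (job_id, logs_url) found in captured console text.

    Either element is None when the corresponding line is not present.
    This is a hypothetical helper, not a TensorFlow Cloud API.
    """
    job_match = re.search(r"Your job ID is:\s*(\S+)", text)
    url_match = re.search(r"https://console\.cloud\.google\.com/\S+", text)
    job_id = job_match.group(1) if job_match else None
    logs_url = url_match.group(0) if url_match else None
    return job_id, logs_url


job_id, logs_url = parse_job_output(SAMPLE_OUTPUT)
print(job_id)
print(logs_url)
```

Returning None for missing fields, rather than raising, keeps the helper usable when the submission fails and prints something else.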
    

Setup instructions

End-to-end instructions to help set up your environment for TensorFlow Cloud. You can use one of the following notebooks to set up your project, or follow the instructions below.

<table align="left"> <td> <a href="https://colab.research.google.com/github/tensorflow/cloud/blob/master/examples/google_cloud_project_setup_instructions.ipynb"> <img width="50" src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo">Run in Colab </a> </td> <td> <a href="https://github.com/tensorflow/cloud/blob/master/examples/google_cloud_project_setup_instructions.ipynb"> <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">View on GitHub </a> </td> <td> <a href="https://www.kaggle.com/nitric/google-cloud-project-setup-instructions"> <img width="90" src="https://www.kaggle.com/static/images/site-logo.png" alt="Kaggle logo">Run in Kaggle </a> </td> </table>
  1. Create a new local directory

    mkdir tensorflow_cloud
    cd tensorflow_cloud
    
  2. Make sure you have Python >= 3.6

    python -V
    
  3. Set up virtual environment

    virtualenv tfcloud --python=python3
    source tfcloud/bin/activate
    
  4. Set up your Google Cloud project

    Verify that the gcloud SDK is installed.

    which gcloud
    

    Set default gcloud project

    export PROJECT_ID=<your-project-id>
    gcloud config set project $PROJECT_ID
    
  5. Authenticate your GCP account

    Create a service account.

    export SA_NAME=<your-sa-name>
    gcloud iam service-accounts create $SA_NAME
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
        --role 'roles/editor'
    

    Create a key for your service account.

    gcloud iam service-accounts keys create ~/key.json --iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
    

    Create the GOOGLE_APPLICATION_CREDENTIALS environment variable.

    export GOOGLE_APPLICATION_CREDENTIALS=~/key.json
    
  6. Create a Cloud Storage bucket. Google Cloud Build is the recommended method for building and publishing Docker images, although a local Docker daemon process can optionally be used instead, depending on your needs.

    BUCKET_NAME="your-bucket-name"
    REGION="us-central1"
    gcloud auth login
    gsutil mb -l $REGION gs://$BUCKET_NAME
    

    (Optional, for local Docker setup)

    sudo dockerd

  7. Authenticate access to Google Cloud registry.

    gcloud auth configure-docker
    
  8. Install nbconvert if you plan to use a notebook file entry_point as shown in usage guide #4.

    pip install nbconvert
    
  9. Install latest release of tensorflow-cloud

    pip install tensorflow-cloud
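
Before submitting a first job, it can help to confirm that the pieces from the steps above are in place. The following is a minimal sketch that checks only what this section mentions: the Python version, that gcloud is on the PATH, and that GOOGLE_APPLICATION_CREDENTIALS points to an existing file. The function name `check_environment` is hypothetical, and the script does not verify that the credentials are valid or that the project and bucket exist.

```python
import os
import shutil
import sys


def check_environment(environ=None):
    """Return a dict mapping each prerequisite to True or False.

    Hypothetical helper: it only inspects the local machine and does not
    contact Google Cloud.
    """
    env = os.environ if environ is None else environ
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return {
        # Step 2: Python >= 3.6
        "python>=3.6": sys.version_info >= (3, 6),
        # Step 4: the gcloud SDK is installed and on the PATH
        "gcloud on PATH": shutil.which("gcloud") is not None,
        # Step 5: the credentials variable is exported ...
        "credentials variable set": bool(key_path),
        # ... and the key file it names exists
        "credentials file exists": bool(key_path)
        and os.path.isfile(os.path.expanduser(key_path)),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.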
    

Usage guide

As described in the high level overview, the run API lets you train your models at scale on GCP. It can be used in four different ways, determined by where you run the API (terminal vs. IPython notebook) and by the entry_point parameter. entry_point is the optional path to the Python script or notebook file that contains your TensorFlow Keras training code. It is the most important parameter in the API.

run(entry_point=None,
    requirements_txt=None,
    distribution_strategy='auto',
    docker_config='auto',
    chief_config='auto',
    worker_config='auto',
    worker_count=0,
    entry_point_args=None,
    stream_logs=False,
    job_labels=None,
    **kwargs)
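
As a rough illustration of how a few of the parameters in the signature above might be combined, the sketch below collects arguments into a dict and hands them to run. It assumes that entry_point is either None, a .py script, or a .ipynb notebook, as the text describes. The helper names `build_run_kwargs` and `submit`, the file name requirements.txt, and the label values are hypothetical; only the parameter names come from the signature.

```python
def build_run_kwargs(entry_point=None, requirements_txt=None,
                     stream_logs=False, job_labels=None):
    """Collect a few run() arguments into a dict after a basic sanity check.

    Hypothetical helper; only the keyword names mirror the run signature.
    """
    # Assumption: entry_point is None, a Python script, or a notebook file.
    if entry_point is not None and not entry_point.endswith((".py", ".ipynb")):
        raise ValueError(f"Unexpected entry_point: {entry_point!r}")
    kwargs = {"entry_point": entry_point, "stream_logs": stream_logs}
    # Only pass optional values that were actually provided.
    if requirements_txt is not None:
        kwargs["requirements_txt"] = requirements_txt
    if job_labels:
        kwargs["job_labels"] = dict(job_labels)
    return kwargs


def submit(**kwargs):
    """Submit the job. The import is lazy so the helper above works
    even where tensorflow-cloud is not installed."""
    import tensorflow_cloud as tfc
    return tfc.run(**kwargs)


kwargs = build_run_kwargs(
    "mnist_example.py",
    requirements_txt="requirements.txt",  # hypothetical file name
    job_labels={"team": "demo"},          # hypothetical label
)
print(kwargs)
# To actually submit (requires a configured GCP project): submit(**kwargs)
```
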
  1. Using a python file as entry_point.

    If you have your tf.keras model in a python file (mnist_example.py), then you can write the following simple script (scale_mnist.py) to scale your model on GCP.

    import tensorflow_cloud as tfc
    tfc.run(entry_point='mnist_example.py')
    

    Please note that all the files in the same directory tree as entry_point will be packaged in the docker image created, along with the entry_point file. It's recommended to create a new directory to house each cloud project which includes necessary files and nothing else, to optimize image build times.

  2. Using a notebook file as entry_point.

    If you have your tf.keras model in a notebook file (mnist_example.ipynb), then you can write the following simple script (scale_mnist.py) to scale your model on GCP.

    import tensorflow_cloud as tfc
    tfc.run(entry_point='mnist_example.ipynb')
    

    Please note that all the files in the same directory tree as entry_point will be packaged in the docker image created, along with the entry_point file. As with the python script entry_point above, we recommend creating a new directory to house each cloud project which includes necessary files and nothing else, to optimize image build times.

  3. Using run within a python script that contains the tf.keras model.

    You can use the run API from within your python file that contains the tf.keras model (`mnist_scale.
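
Because items 1 and 2 note that everything in the entry_point directory tree ends up in the Docker image, it may be worth previewing that tree before submitting. The sketch below simply walks the directory that contains entry_point and reports file sizes. It is an approximation under that assumption: the exact packaging rules are not shown here, and `list_packaged_candidates` is a hypothetical helper. The demo builds a throwaway directory so the example runs on its own.

```python
import os
import tempfile


def list_packaged_candidates(entry_point):
    """Return sorted (relative_path, size_in_bytes) pairs for every regular
    file under the directory that contains entry_point.

    Hypothetical helper; it only approximates what might be packaged.
    """
    root = os.path.dirname(os.path.abspath(entry_point))
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skip broken symlinks
                found.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(found)


# Demo on a temporary project directory with one stray data file.
with tempfile.TemporaryDirectory() as tmp:
    entry = os.path.join(tmp, "mnist_example.py")
    open(entry, "w").close()
    os.mkdir(os.path.join(tmp, "data"))
    with open(os.path.join(tmp, "data", "big.bin"), "wb") as f:
        f.write(b"x" * 10)
    demo_files = list_packaged_candidates(entry)

for path, size in demo_files:
    print(f"{size:>10}  {path}")
```

Large or unexpected entries in this listing are good candidates to move out of the project directory before calling run.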
