Google Cloud Platform (GCP) Overview
Google Cloud Customers
Showcase of Big Data Customers
Feature Engineering
Where to do it?
On the fly, as data is sent to the input function
As a separate pre-processing step over all the data, before the training step - done in Dataflow so it is scaled and distributed; also good for time-windowed aggregations in a real-time pipeline (with Dataflow then used for predictions too)
Alternatively, do the pre-processing in Dataflow in plain Python, so the same pre-processing steps can be reused at inference time on the serving inputs of the trained model
Do the pre-processing in Dataflow and create a set of pre-processed features using tf.transform (min, max, vocab, etc. stored in metadata.json) so they become part of the actual model graph and can be replayed in TensorFlow during serving/inference (see the sketch below)
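A minimal sketch of the tf.transform option, assuming TensorFlow Transform is installed; the feature names ('fare', 'pickup_zone') are hypothetical. The analyzers run over the full dataset (e.g. in Dataflow), and the resulting constants are baked into the serving graph:

```python
import tensorflow_transform as tft

# Hypothetical input features: 'fare' (numeric) and 'pickup_zone' (string).
def preprocessing_fn(inputs):
    """tf.transform preprocessing: analyzers (min/max/vocab) are computed
    over the whole dataset during the analysis phase, then embedded as
    constants so the same transforms replay at serving time."""
    return {
        # Scale to [0, 1] using the dataset-wide min and max
        'fare_scaled': tft.scale_to_0_1(inputs['fare']),
        # Build a vocabulary over the full dataset and map strings to integer ids
        'pickup_zone_id': tft.compute_and_apply_vocabulary(inputs['pickup_zone']),
    }
```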
Google's core infrastructure
Level 1: Security
Communications to Google Cloud are encrypted in transit, with multiple layers of security to protect against denial-of-service attacks, backed by Google security teams 24/7
Data is encrypted at rest and distributed for availability and reliability
Example - BigQuery
Table data is encrypted with envelope encryption - data is encrypted with data encryption keys, and those keys are in turn encrypted with key encryption keys
Customer-managed encryption keys can also be used
Can monitor and flag queries for anomalous behaviour
Can limit data access with authorised views, at the row and column level
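As a rough illustration of authorised views, a sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Create a view in a dataset that analysts are allowed to query.
view = bigquery.Table("my-project.shared_views.eu_orders")
view.view_query = """
    SELECT order_id, order_total
    FROM `my-project.private_data.orders`
    WHERE region = 'EU'
"""
view = client.create_table(view)

# Authorise the view against the source dataset, so users of the view
# never need direct access to the underlying table.
source = client.get_dataset("my-project.private_data")
entries = list(source.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source.access_entries = entries
client.update_dataset(source, ["access_entries"])
```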
Google manages (when using GCP managed services)
Deployment
Web app security
Identity
Operations
Access and authorisation
Network security
OS, data and content
Audit logging
Network
Storage and encryption
Hardware
You manage (when using GCP managed services)
Content
Access policies
Usage
Level 2: Compute Power
Level 2: Storage
Level 2: Networking
Overview
Private network with petabit bisectional, full-duplex bandwidth, edge points of presence, and software-defined networking
Thousands of miles of fibre optic cable, crossing oceans, with repeaters to amplify optical signals
Carries ~40% of the world's internet traffic daily
Any machine in Google's Jupiter Network can communicate with any other machine at over 10 gigabits per second
This speed means co-location of compute and storage is not necessary
>100 edge points of presence globally
>7500 edge node locations globally
Level 3: Big Data, ML & Analytics Product Suite
Google Cloud Public Datasets
Big Data Product Life-cycle
2/5 - Ingestion Products
Cloud Pub/Sub
Cloud Dataflow
Cloud Composer
3/5 - Analytics Products
Cloud Dataprep
BigQuery
Cloud Dataproc
Cloud Datalab
Overview
Based on JupyterLab
CLI: datalab create dataengvm --zone us-central1-a
4/5 - Machine Learning Products
Cloud TPU
Cloud ML
Cloud AutoML
ML APIs
TensorFlow
5/5 - Products to Serve Data and Insights to Users
Dialogflow
Data Studio Dashboards/BI
1/5 - Storage Products
Cloud Bigtable
Cloud Storage
Cloud SQL
Cloud Spanner
Cloud Datastore
Compute Engine Disks
GCP documentation on Compute Engine disk options
Persistent Disks
Disks that are attached to a VM (running in Compute Engine)
Must be attached to a VM in the same zone
GCP documentation on snapshot creation
GCP documentation on snapshot best practices
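For illustration, a hedged sketch of creating a snapshot with the google-cloud-compute Python client; the project, zone, disk, and snapshot names are hypothetical:

```python
from google.cloud import compute_v1

client = compute_v1.DisksClient()

# Snapshot an existing persistent disk (names are hypothetical).
snapshot = compute_v1.Snapshot(name="dataengvm-backup-001")
operation = client.create_snapshot(
    project="my-project",
    zone="us-central1-a",   # the disk's zone
    disk="dataengvm",
    snapshot_resource=snapshot,
)
operation.result()  # block until the snapshot operation completes
```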
Compute Products
Compute Engine
Infrastructure as a Service (IaaS)
Quick lift and shift, or maximum flexibility to manage server instances yourself
Google Kubernetes Engine (GKE)
Clusters of machines (managed by Google), under your administrative control
Run and orchestrate multiple portable containers in an efficient way
App Engine
Fully managed Platform as a Service (PaaS) framework
Run long-lived code (e.g. web applications) that can autoscale without needing to worry about infrastructure provisioning or resource management
Cloud Functions
Pre-trained AI building blocks
Sight
Cloud Vision API
Cloud Video Intelligence API
AutoML Vision
Language
Cloud Translation API
Converts text from one language to another
GCP Cloud Translation API documentation
Features
Supports 100+ languages
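A minimal sketch using the google-cloud-translate Python client (basic/v2 edition); the sample sentence is arbitrary:

```python
from google.cloud import translate_v2 as translate

client = translate.Client()

# Translate to English; the source language is auto-detected.
result = client.translate("Il fait beau aujourd'hui", target_language="en")
print(result["translatedText"])          # e.g. "The weather is nice today"
print(result["detectedSourceLanguage"])  # e.g. "fr"
```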
Cloud Natural Language API
Extracts entities, analyses sentiment, and parses syntax (parts of speech)
Features
Sentiment score - from -1.0 to 1.0 (positive or negative)
Sentiment magnitude - from 0.0 upwards, unbounded (how intense the feeling is, regardless of polarity)
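A minimal sentiment-analysis sketch with the google-cloud-language Python client; the sample text is arbitrary:

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="The keynote was fantastic!",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score)      # -1.0 (negative) to 1.0 (positive)
print(sentiment.magnitude)  # 0.0 upwards: overall strength of emotion
```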
AutoML Translation
Translation using Automated Machine Learning
AutoML Natural Language
Natural Language Processing (NLP) using Automated Machine Learning
Conversation
Dialogflow Enterprise Edition
Cloud Text-to-Speech
Converts text into high-quality speech audio
Cloud Speech-to-Text
Converts audio to text for data processing
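A minimal transcription sketch with the google-cloud-speech Python client; the Cloud Storage URI is hypothetical:

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
# Audio file staged in Cloud Storage (hypothetical bucket/object).
audio = speech.RecognitionAudio(uri="gs://my-bucket/meeting.wav")

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```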
Resource Hierarchy
Organisation
Overview
Not required, but allows policies to be set and applied throughout all the projects under that organisation
Root node of an entire GCP hierarchy
Folders
A logical grouping for a collection of projects or nested folders
Use a folder for logically grouping different teams and/or products
Projects
A base-level, logical organising entity for creating and using resources and services, and for managing billing, APIs, and permissions
Use a project for each environment - e.g. Dev, Test, Production
Actions that can be performed
Create
Manage
Delete
Undelete
Resources
Lowest level in the hierarchy
Examples: BigQuery dataset, Cloud Storage bucket, Compute Engine instance
Cloud Identity and Access Management (IAM)
IAM policies control user access to resources
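As an illustration, a hedged sketch of adding a role binding to a project's IAM policy with the google-cloud-resource-manager client; the project ID, role, and member are hypothetical:

```python
from google.cloud import resourcemanager_v3
from google.iam.v1 import policy_pb2

client = resourcemanager_v3.ProjectsClient()
resource = "projects/my-project-id"  # hypothetical project

# Read the current policy, append a binding, and write it back.
policy = client.get_iam_policy(request={"resource": resource})
policy.bindings.append(
    policy_pb2.Binding(
        role="roles/bigquery.dataViewer",
        members=["user:analyst@example.com"],
    )
)
client.set_iam_policy(request={"resource": resource, "policy": policy})
```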
Zones and Regions
Physically organise the GCP resources
Region - an independent geographic area, consisting of 2 or more zones (most have 3 zones, some have 4)
Zone - a deployment area within a region; a single failure domain within a region (e.g. independent power supply, network switches, etc.)
History of Google Inventions
2002 - Google File System (GFS)
For sharding and storing petabytes of data at scale
Foundation for Cloud Storage and BigQuery Managed Storage
Cloud Storage
2004 - MapReduce
Challenge was how to index the expanding content of the Web
MapReduce-based programs can automatically parallelise and execute on a large cluster of commodity machines (see the toy sketch below)
A year later, Apache Hadoop was created by Doug Cutting and Mike Cafarella
The issue was that developers had to write code to manage all of the commodity server infrastructure, rather than just focusing on application logic - so Google moved away from MapReduce to Dremel between 2008 and 2010
Cloud Dataproc
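A toy, single-process word count in the MapReduce style (not any Google API), just to show the mapper/reducer contract that the framework parallelises across machines:

```python
from collections import defaultdict

def mapper(line):
    # Emit (key, value) pairs; many mappers run in parallel over input splits.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Combine all values for one key; reducers also run in parallel.
    yield word, sum(counts)

def run(lines):
    groups = defaultdict(list)
    for line in lines:                       # "map" phase
        for word, count in mapper(line):
            groups[word].append(count)       # "shuffle": group by key
    return dict(                             # "reduce" phase
        result
        for word, counts in groups.items()
        for result in reducer(word, counts)
    )

print(run(["the quick brown fox", "the lazy dog"]))  # {'the': 2, ...}
```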
2006 - Bigtable
Challenge was how to record and retrieve millions of streaming user actions with high throughput
Was the inspiration behind MongoDB and HBase
Cloud Bigtable
2008 to 2010 - Dremel
The issue with MapReduce was that developers had to write code to manage all of the commodity server infrastructure, rather than just focusing on application logic
Decomposes data into shards and compresses them into columnar format across distributed storage
Uses a query optimiser to farm out a task for each shard of data to be processed in parallel across commodity hardware
Automatically manages data imbalances, worker communication, and scaling
Became the query engine behind BigQuery
BigQuery
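A minimal sketch of running a Dremel-backed query through the google-cloud-bigquery Python client; bigquery-public-data.usa_names.usa_1910_2013 is a real public dataset:

```python
from google.cloud import bigquery

client = bigquery.Client()

# BigQuery's engine fans this aggregation out over columnar shards in parallel.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.name, row.total)
```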
2009 - Colossus
Next-generation distributed file system, the successor to GFS
Cloud Storage
2010 - Flume
Data pipelines
Cloud Dataflow
2011 - Megastore
Cloud Datastore
2012 - Spanner
Planet-scale relational database
Cloud Spanner (in 2016)
2013 - Pub/Sub
Messaging
Cloud Pub/Sub
2013 - MillWheel
Data pipelines
2014 - F1
2015 - TensorFlow
Machine Learning framework and library
TensorFlow
Cloud ML Engine
2017 - TPU
Hardware specialised for machine learning
AutoML
Hardware
CPU
GPU
TPU
An Application-Specific Integrated Circuit (ASIC), faster than GPUs for ML workloads
Case Studies
eBay - uses Cloud TPU Pods, giving a 10x speed-up for training image-recognition models (from months to a few days); the increased memory also lets them process many more images at once
TPU v1 - ~92 teraops (8-bit integer; inference only)
Cloud TPU v2 - 180 teraflops, 64-GB High Bandwidth Memory (HBM)
Cloud TPU v3 - 420 teraflops, 128-GB High Bandwidth Memory (HBM)
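A hedged sketch of targeting a Cloud TPU from TensorFlow 2; the TPU name is hypothetical and comes from your TPU node/VM setup:

```python
import tensorflow as tf

# Connect TensorFlow to the TPU cluster ("my-tpu" is hypothetical).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created here are placed and replicated across the TPU cores.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
```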