Overview of the Google Cloud Platform

Below is an overview of some of the Google Cloud Platform (GCP) services that are useful for machine learning.

Google Cloud Customers
- Showcase of Big Data Customers

Feature Engineering
- Where to do it?
  - On the fly, as data is sent to the input function.
  - As a separate pre-processing step on all the data, before the training step - in Dataflow, so it is scaled and distributed; also good for time-windowed aggregations in a real-time pipeline (and use Dataflow for the predictions too). An alternative is to do the pre-processing in plain Python.
  - In Dataflow using tf.transform, in order to reuse the same pre-processing steps for inference on the serving inputs of the trained model: the pre-processed features (min, max, vocab, etc., stored in metadata.json) become part of the actual model graph and can be used in TensorFlow during serving/inference (see the sketch after this list).
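As a rough sketch of what the tf.transform option above might look like, here is a minimal preprocessing_fn. It assumes the tensorflow_transform library; the column names (trip_distance, payment_type, fare_amount) are made up for illustration.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """A minimal tf.Transform preprocessing_fn (sketch only).

    The full-pass statistics (min, max, vocabulary) are computed once over the
    whole dataset - typically by a Dataflow/Apache Beam analyze job - and the
    resulting constants are embedded in the transform graph, so training and
    serving apply exactly the same transformations.
    """
    return {
        # Scale a numeric feature to [0, 1] using the dataset-wide min/max.
        "trip_distance_scaled": tft.scale_to_0_1(inputs["trip_distance"]),
        # Map a string feature to an integer id using a full-pass vocabulary.
        "payment_type_id": tft.compute_and_apply_vocabulary(inputs["payment_type"]),
        # The label is usually passed through unchanged.
        "fare_amount": inputs["fare_amount"],
    }
```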
Google's core infrastructure
- Level 1: Security
  - Communications to Google Cloud are encrypted in transit, with multiple layers of security to protect against denial-of-service attacks, backed by Google security teams 24/7.
  - Data is encrypted at rest and distributed for availability and reliability.
  - Example - BigQuery:
    - Table data is encrypted with envelope encryption - data is encrypted with data encryption keys, and those keys are in turn encrypted with key encryption keys.
    - Customer-managed keys can also be used.
    - Queries can be monitored and flagged for anomalous behaviour.
    - Data access can be limited with authorised views, at the row and column level.
  - GCP managed services manage: deployment, web app security, identity, operations, access and authorisation, network security, OS, data and content, audit logging, network, storage and encryption, and hardware.
  - You manage (when using GCP managed services): content, access policies, and usage.
- Level 2: Compute Power
- Level 2: Storage
- Level 2: Networking
  - Private network with petabit bisectional, full-duplex bandwidth and edge points of presence, built on software-defined networking.
  - Thousands of miles of fibre-optic cable, crossing oceans, with repeaters to amplify the optical signals.
  - Carries ~40% of the world's internet traffic daily.
  - Any machine in Google's Jupiter network can communicate with any other machine at over 10 gigabits per second - this speed means co-location of compute and storage is not necessary.
  - >100 edge points of presence and >7,500 edge node locations globally.
- Level 3: Big Data, ML & Analytics Product Suite
  - Google Cloud Public Datasets
  - Big Data Product Life-cycle:
    - 1/5 - Storage Products: Cloud Bigtable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore
    - 2/5 - Ingestion Products: Cloud Pub/Sub, Cloud Dataflow, Cloud Composer
    - 3/5 - Analytics Products: Cloud Dataprep, BigQuery, Cloud Dataproc, Cloud Datalab
      - Cloud Datalab is based on Jupyter; CLI: datalab create dataengvm --zone us-central1-a
    - 4/5 - Machine Learning Products: Cloud TPU, Cloud ML, Cloud AutoML, ML APIs, TensorFlow
    - 5/5 - Products to Serve Data and Insights to Users: Dialogflow, Data Studio, Dashboards/BI

Compute Engine Disks
- GCP doco on Compute Engine disk options
- Persistent Disks
  - Disks that are attached to a VM (running in Compute Engine).
  - Must be attached to a VM in the same zone.
  - GCP doco on Snapshot creation
  - GCP doco on Snapshot best practices

Compute Products
- Compute Engine
  - Infrastructure as a Service (IaaS).
  - Quick lift-and-shift, or maximum flexibility to manage server instances yourself.
- Google Kubernetes Engine (GKE)
  - Clusters of machines (managed by Google), under your administrative control.
  - Run and orchestrate multiple portable containers in an efficient way.
- App Engine
  - Fully managed Platform as a Service (PaaS) framework.
  - Run long-lived code (e.g. web applications) that can autoscale without needing to worry about infrastructure provisioning or resource management.
- Cloud Functions

Pre-trained AI building blocks
- Sight
  - Cloud Vision API
  - Cloud Video Intelligence API
  - AutoML Vision
- Language
  - Cloud Translation API
    - Converts text from one language to another; supports 100+ languages.
    - GCP Cloud Translation API documentation
  - Cloud Natural Language API
    - Recognises parts of speech, entities and sentiment.
    - Sentiment score - from -1.0 (negative) to 1.0 (positive).
    - Sentiment magnitude - from 0.0 upwards (how intense the feeling is).
    - A minimal client sketch follows this list.
  - AutoML Translation
    - Translation using Automated Machine Learning.
  - AutoML Natural Language
    - Natural Language Processing (NLP) using Automated Machine Learning.
- Conversation
  - Dialogflow Enterprise Edition
  - Cloud Text-to-Speech
    - Converts text into high-quality speech audio.
  - Cloud Speech-to-Text
    - Converts audio to text for data processing.
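To make the sentiment score/magnitude fields concrete, here is a small sketch of calling the Cloud Natural Language API. It assumes the google-cloud-language Python client (v2-style request API) and application-default credentials; the sample text is made up.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="The support team resolved my issue quickly - great service!",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# analyze_sentiment returns a document-level sentiment plus per-sentence sentiments.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment

print(f"score:     {sentiment.score:+.2f}")    # -1.0 (negative) to +1.0 (positive)
print(f"magnitude: {sentiment.magnitude:.2f}")  # 0.0 and up - overall strength of feeling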
Resource Hierarchy
- Organisation
  - Not required, but allows policies to be set and applied throughout all the projects under that organisation.
  - The root node of an entire GCP hierarchy.
- Folders
  - A logical grouping for a collection of projects or nested folders.
  - Use a folder for logically grouping different teams and/or products.
- Projects
  - A base-level, logical organising entity for creating and using resources and services, and for managing billing, APIs and permissions.
  - Use a project for each environment - e.g. Dev, Test, Production.
  - Actions that can be performed: create, manage, delete, undelete.
- Resources
  - The lowest level in the hierarchy.
  - Examples: BigQuery dataset, Cloud Storage bucket, Compute Engine instance.

Cloud Identity and Access Management (IAM)
- IAM policies control user access to resources.

Zones and Regions
- Physically organise the GCP resources.
- Region - an independent geographic area (a data-centre location somewhere in the world), consisting of 2 or more zones (most have 3 zones, some have 4).
- Zone - a deployment area within a region; a single failure domain within that region (e.g. independent power supply, switches, etc.).

History of Google Inventions
- 2002 - Google File System (GFS)
  - For sharding and storing petabytes of data at scale.
  - The foundation for Cloud Storage and BigQuery managed storage.
  - Related product: Cloud Storage.
- 2004 - MapReduce
  - The challenge was how to index the expanding content of the web.
  - MapReduce-based programs can automatically parallelise and execute on a large cluster of commodity machines.
  - A year later, Apache Hadoop was created by Doug Cutting and Mike Cafarella.
  - The issue was that developers had to write the code to manage all of the commodity server infrastructure rather than just focus on application logic - so Google moved away from MapReduce to Dremel between 2008 and 2010.
  - Related product: Cloud Dataproc.
- 2006 - Bigtable
  - The challenge was how to record and retrieve millions of streaming user actions with high throughput.
  - Was the inspiration behind MongoDB and HBase.
  - Related product: Cloud Bigtable.
- 2008 to 2010 - Dremel
  - Addressed the MapReduce issue of developers having to manage the commodity server infrastructure rather than focusing on application logic.
  - Decomposes data into shards and compresses it into a columnar format across distributed storage.
  - Uses a query optimiser to farm out a task for each shard of data, processed in parallel across commodity hardware.
  - Automatically manages data imbalances, worker communication, and scaling.
  - Became the query engine behind BigQuery.
  - Related product: BigQuery.
- 2009 - Colossus
  - Next-generation distributed data store.
  - Related product: Cloud Storage.
- 2010 - Flume
  - Data pipelines.
  - Related product: Cloud Dataflow.
- 2011 - Megastore
  - Related product: Cloud Datastore.
- 2012 - Spanner
  - Planet-scale relational database.
  - Related product: Cloud Spanner (launched in 2016).
- 2013 - Pub/Sub
  - Messaging.
  - Related product: Cloud Pub/Sub.
- 2013 - Millwheel
  - Data pipelines.
- 2014 - F1
- 2015 - TensorFlow
  - Machine learning framework and library.
  - Related products: TensorFlow, Cloud ML Engine.
- 2017 - TPU
  - Hardware specialised for machine learning.
  - Related product: AutoML.

Hardware
- CPU
- GPU
- TPU
  - An Application-Specific Integrated Circuit (ASIC), faster than GPUs for machine learning workloads.
  - Case study: eBay uses Cloud TPU Pods, giving them a 10x speed-up for training image recognition models - from months to a few days - and with the increased memory they can process many more images at once.
  - Cloud TPU v1 - ~90 teraflops.
  - Cloud TPU v2 - 180 teraflops, 64 GB High Bandwidth Memory (HBM).
  - Cloud TPU v3 - 420 teraflops, 128 GB High Bandwidth Memory (HBM).
  - A minimal TensorFlow sketch for targeting a Cloud TPU follows this section.
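As a rough illustration of using a Cloud TPU from TensorFlow 2.x, the sketch below distributes a Keras model with tf.distribute.TPUStrategy. The TPU name "my-tpu", the model and the (commented-out) dataset are placeholders, not a prescribed setup.

```python
import tensorflow as tf

# "my-tpu" is a placeholder for the name/address of a provisioned Cloud TPU.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model across the TPU cores.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Placeholder model - any Keras model built inside the scope is TPU-ready.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# train_dataset would be a tf.data.Dataset, ideally reading from Cloud Storage:
# model.fit(train_dataset, epochs=5)
```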