TensorFlow
TensorFlow Playground
High-Level Summary
Framework
Machine Learning
Library
Computational Graphs
Declarative Programming
Abstraction for describing computations as a directed graph
Edges
Tensors
Multidimensional arrays / matrices
Nodes
Operations
Terminology
Why?
Dependency-Driven Scheduling
Dependencies specify the order of execution
Parallel processing on distributed cores
Runtime
Executing graphs on a variety of hardware
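A minimal sketch of the graph model (TF 1.x style): building the graph only declares nodes and edges; a Session then schedules and runs them by dependency.
    import tensorflow as tf

    # Nodes are operations; the edges between them carry tensors
    a = tf.constant(3.0)   # source node emitting a constant tensor
    b = tf.constant(5.0)
    c = tf.add(a, b)       # runs only after its dependencies a and b

    # Declarative: nothing has executed yet
    with tf.Session() as sess:
        print(sess.run(c))  # the runtime executes the graph -> 8.0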
Coding Steps
1. Set up a collection of feature columns
2. Create a model, passing in the feature columns
Model Types
Linear Regression - tf.estimator.LinearRegressor(feature_columns)
Deep Neural Network - tf.estimator.DNNRegressor(hidden_units=[128, 64, 32], feature_columns=feature_columns)
Linear Classification - tf.estimator.LinearClassifier(feature_columns)
Deep Neural Network Classification - tf.estimator.DNNClassifier(hidden_units=hidden_units, feature_columns=feature_columns)
3. Write an input function / generator that returns a (features, labels) tuple, with features being a dict of {column name: tensor}
Input Functions
Pandas Input Function - tf.estimator.inputs.pandas_input_fn(x, y, batch_size, num_epochs, shuffle, queue_capacity, num_threads)
4. Train, passing in input function and number of steps
5. Use the trained model to predict - see the end-to-end sketch below
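A minimal end-to-end sketch of steps 1-5 (the CSV file and the column names sq_footage / price are hypothetical):
    import pandas as pd
    import tensorflow as tf

    df = pd.read_csv('train.csv')  # hypothetical training data

    # 1. Feature columns
    featcols = [tf.feature_column.numeric_column('sq_footage')]

    # 2. Model
    model = tf.estimator.LinearRegressor(featcols)

    # 3. Input function returning (features_dict, labels)
    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=df[['sq_footage']], y=df['price'],
        batch_size=128, num_epochs=10, shuffle=True)

    # 4. Train for a fixed number of steps
    model.train(train_input_fn, steps=1000)

    # 5. Predict (returns a generator of predictions)
    predict_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=df[['sq_footage']], shuffle=False)
    predictions = model.predict(predict_input_fn)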
Function Reference
tf
decode_csv(value_column, record_defaults=[[1], [2]]) - record_defaults sets each column's dtype and its default value for empty fields
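A hedged sketch of using it inside a CSV-decoding helper (the feature name is hypothetical):
    import tensorflow as tf

    def my_decode_csv(value_column):
        # two integer columns, defaulting to 1 and 2 when a field is empty
        col_a, col_b = tf.decode_csv(value_column, record_defaults=[[1], [2]])
        features = {'feature_a': col_a}  # hypothetical feature name
        label = col_b
        return features, label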
tf.feature_column
Can be thought of as doing some types of pre-processing
numeric_column('colname')
categorical_column_with_vocabulary_list('blahId', vocabulary_list=['123', '434']) - for one-hot encoding categorical values
categorical_column_with_identity('blahId', num_buckets=10) - when the values are already integers in [0, num_buckets); e.g. hour of day (0 to 23) with num_buckets=24
categorical_column_with_hash_bucket('blahId', hash_bucket_size=500) - don't need a full vocab known ahead of time; almost one-hot encoding based on the hash of the value
crossed_column([dayofweek, hourofday], 24*7) - to create a day_hour feature cross
bucketized_column(source_column, boundaries) - possibly use np.linspace() to create the boundaries
Legacy tf.contrib.layers equivalents
real_valued_column('colname')
sparse_column_with_keys('dayofweek', keys=['Sun',...'Sat'])
sparse_column_with_integerized_feature('hourofday', bucket_size=24)
embedding_column(mycrosspair, 10)
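A hedged sketch combining several of the column types above (feature names and boundary values are hypothetical):
    import numpy as np
    import tensorflow as tf

    hourofday = tf.feature_column.categorical_column_with_identity(
        'hourofday', num_buckets=24)
    dayofweek = tf.feature_column.categorical_column_with_vocabulary_list(
        'dayofweek',
        vocabulary_list=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

    # day_hour feature cross, hashed into 24*7 buckets
    day_hour = tf.feature_column.crossed_column(
        [dayofweek, hourofday], hash_bucket_size=24 * 7)
    day_hour_embed = tf.feature_column.embedding_column(day_hour, 10)

    # bucketize a continuous column, using np.linspace for the boundaries
    lat = tf.feature_column.numeric_column('pickup_lat')
    lat_buckets = tf.feature_column.bucketized_column(
        lat, boundaries=np.linspace(38.0, 42.0, 10).tolist())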
tf.transform
Lets users define pre-processing pipelines and run them with large-scale data processing frameworks (e.g. Apache Beam), while also exporting the pipeline so it can run inside the TensorFlow graph at prediction time
YouTube - Introduction to TensorFlow Transform
Blog - Pre-processing for Machine Learning with tf.Transform
Concepts
Analyzers - e.g. mean, stddev, quantiles - implemented as Apache Beam data pipelines; analyzers run once over the full dataset and inject their results as constants into the TensorFlow graph
Scaling functions
tft.scale_to_z_score; tft.scale_to_0_1 scales to between 0 and 1
Bucketisation
tft.quantiles
tft.apply_buckets
Bag of Words / N-Grams
tf.string_split (a core TF op, usable inside the preprocessing function)
tft.ngrams
tft.string_to_int
Feature Crosses
tf.string_join (a core TF op, usable inside the preprocessing function)
tft.string_to_int
tft.apply_saved_model - inline any other saved TensorFlow model
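A hedged preprocessing_fn sketch (the fare and dayofweek feature names are hypothetical):
    import tensorflow_transform as tft

    def preprocessing_fn(inputs):
        # the mean/stddev and vocabulary analyzers run as a Beam pipeline
        # over the full dataset; their results become constants in the graph
        return {
            'fare_scaled': tft.scale_to_z_score(inputs['fare']),
            'dayofweek_id': tft.string_to_int(inputs['dayofweek']),
        }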
tf.logging
set_verbosity(tf.logging.INFO) - levels: DEBUG, INFO, WARN (default), ERROR, FATAL
tf.gfile
file_list = Glob(filename_pattern)
tf.data
dataset = TextLineDataset(file_list).map(my_decode_csv)
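A hedged read_dataset sketch combining tf.gfile.Glob, TextLineDataset, and the my_decode_csv helper sketched above:
    import tensorflow as tf

    def read_dataset(filename_pattern, mode, batch_size=512):
        def _input_fn():
            file_list = tf.gfile.Glob(filename_pattern)  # expands sharded files
            dataset = tf.data.TextLineDataset(file_list).map(my_decode_csv)
            if mode == tf.estimator.ModeKeys.TRAIN:
                dataset = dataset.shuffle(10000).repeat()  # loop indefinitely
            dataset = dataset.batch(batch_size)
            # returns the features and labels nodes of the graph
            return dataset.make_one_shot_iterator().get_next()
        return _input_fn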
tf.estimator
train_and_evaluate(estimator, train_spec, eval_spec) - handles fault-tolerant, distributed training and evaluation
Features
Distributes the graph
Shares variables
Evaluates periodically
Handles machine failures
Creates checkpoint files
Recovers from failures - workers and the chief
Saves summaries for TensorBoard
tf.estimator.TrainSpec
Contains the input function
Contains max_steps - the total number of training steps (steps rather than epochs, so a job restarted from a checkpoint after a crash resumes the count instead of starting over)
tf.estimator.EvalSpec
Contains the input function
Contains steps = None - evaluate until the input function signals end of input (i.e. over the full eval set)
Contains start_delay_secs=60 for starting evaluation after N seconds
Contains throttle_secs=600 for evaluating every N seconds
Contains exporters - e.g. tf.estimator.LatestExporter, which exports SavedModels for serving (checkpointing is handled automatically)
tf.estimator.ModeKeys.TRAIN and ModeKeys.EVAL - indicate whether an input function / graph is being built for training or evaluation
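A hedged train_and_evaluate sketch (featcols, the input functions, and serving_input_fn are assumed to be defined elsewhere):
    import tensorflow as tf

    estimator = tf.estimator.LinearRegressor(featcols, model_dir='outdir')

    train_spec = tf.estimator.TrainSpec(
        input_fn=train_input_fn, max_steps=5000)

    exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)
    eval_spec = tf.estimator.EvalSpec(
        input_fn=eval_input_fn,
        steps=None,             # evaluate on the full eval set
        start_delay_secs=60,    # wait a minute before the first evaluation
        throttle_secs=600,      # then evaluate at most every 10 minutes
        exporters=exporter)

    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)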
Wide-and-deep DNN - tf.estimator.DNNLinearCombinedClassifier(model_dir=..., linear_feature_columns=sparse_wide_columns, dnn_feature_columns=dense_deep_columns, dnn_hidden_units=[100, 50])
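A hedged sketch of the wide-and-deep split, reusing the hypothetical column variables from the feature-column sketch above (sparse/crossed columns feed the linear side, dense/embedded columns feed the DNN side):
    model = tf.estimator.DNNLinearCombinedClassifier(
        model_dir='outdir',
        linear_feature_columns=[day_hour],           # wide: sparse / crossed
        dnn_feature_columns=[day_hour_embed, lat],   # deep: dense / embedded
        dnn_hidden_units=[100, 50])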
Tips
Start each training run fresh by deleting the output directory - shutil.rmtree(OUTDIR, ignore_errors=True)
Read sharded CSV files - dataset = tf.data.TextLineDataset(filenames).map(decode_csv_to_features_label)
Return the features and labels nodes of the graph - call dataset.make_one_shot_iterator().get_next()
Use TensorBoard to monitor training