Overview of TensorFlow

Below is a high-level overview of TensorFlow, a Python-based deep learning library.

High-Level Summary

TensorFlow is a framework and machine learning library built around computational graphs. (For an interactive, in-browser introduction, see TensorFlow Playground.)

- Declarative programming: an abstraction for describing computations as a directed graph.
- Terminology: edges are tensors (multidimensional arrays / matrices); nodes are operations.
- Why graphs? Dependency-driven scheduling: dependencies specify the order of execution, which enables parallel processing on distributed cores.
- Runtime: graphs can be executed on a variety of hardware.

Coding Steps

1. Set up a collection of feature columns.
2. Create a model, passing in the feature columns. Model types:
   - Linear regression: tf.estimator.LinearRegressor(feature_columns)
   - Deep neural network regression: tf.estimator.DNNRegressor(feature_columns, hidden_units=[128, 64, 32])
   - Linear classification: tf.estimator.LinearClassifier(feature_columns)
   - Deep neural network classification: tf.estimator.DNNClassifier(feature_columns, hidden_units)
3. Write an input function / generator function that returns (features, labels), where features is a dict.
   - Pandas input function: tf.estimator.inputs.pandas_input_fn(x, y, batch_size, num_epochs, shuffle, queue_capacity, num_threads)
4. Train, passing in the input function and the number of steps.
5. Use the trained model to predict.
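The five steps fit together as in the following minimal sketch, using the TF 1.x Estimator API described above; the file name train.csv and the column names sqft and price are hypothetical placeholders.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical training data with a 'sqft' feature and a 'price' label.
df = pd.read_csv('train.csv')

# 1. Set up a collection of feature columns.
feature_columns = [tf.feature_column.numeric_column('sqft')]

# 2. Create a model, passing in the feature columns.
model = tf.estimator.LinearRegressor(feature_columns)

# 3. Input function returning (features, labels); features is a dict
#    keyed by column name, which pandas_input_fn builds from the DataFrame.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=df[['sqft']], y=df['price'],
    batch_size=128, num_epochs=10, shuffle=True)

# 4. Train, passing in the input function and the number of steps.
model.train(train_input_fn, steps=1000)

# 5. Use the trained model to predict (predict() returns a generator).
predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=df[['sqft']], shuffle=False)
print(next(model.predict(predict_input_fn)))
```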
Function Reference

tf

- decode_csv(value_column, record_defaults=[[1], [2]])

tf.feature_column

Feature columns can be thought of as doing some types of pre-processing (a sketch follows after the Tips section).

- numeric_column('colname')
- categorical_column_with_vocabulary_list('blahId', keys=['123', '434']) - one-hot encodes categorical values
- categorical_column_with_identity('blahId', num_buckets=10) - for N known id values, e.g. hour of day being 0 to 23
- categorical_column_with_hash_bucket('blahId', hash_bucket_size=500) - no need for a full vocabulary known ahead of time; roughly one-hot encodes based on the hash of the value
- crossed_column([dayofweek, hourofday], 24*7) - creates a day_hour feature cross
- bucketized_column(source_column, boundaries) - np.linspace() is handy for creating the boundaries
- embedding_column(mycrosspair, 10)
- Legacy tf.contrib.layers equivalents: real_valued_column, sparse_column_with_keys('dayofweek', keys=['Sun', ..., 'Sat']), sparse_column_with_integerized_feature('hourofday', bucket_size=24)

tf.transform

Allows users to define pre-processing pipelines and run them using large-scale data processing frameworks, while also exporting the pipeline in a way that can be run as part of a TensorFlow graph when making predictions. See the YouTube talk "Introduction to Tensorflow Transform" and the blog post "Pre-processing for Machine Learning with tf.Transform". A sketch of a preprocessing function follows after the Tips section.

Concepts:
- Analyzers (e.g. mean, stddev, quantiles) are implemented as Apache Beam data pipelines; analyzers run once over the data and inject their results as constants into the TensorFlow graph.
- Scale-to functions: tft.scale_to_z_score, or tft.scale_to_0_1 for scaling to between 0 and 1
- Bucketisation: tft.quantiles, tft.apply_buckets
- Bag of words / n-grams: tft.string_split, tft.ngrams, tft.string_to_int
- Feature crosses: tft.string_join, tft.string_to_int
- tft.apply_saved_model - inline any other saved TensorFlow model

tf.logging

- set_verbosity(tf.logging.INFO) - levels are DEBUG, INFO, WARN (the default), ERROR and FATAL

tf.gfile

- file_list = Glob(filename_pattern)

tf.data

- dataset = TextLineDataset(file_list).map(my_decode_csv)

tf.estimator

- train_and_evaluate(estimator, train_spec, eval_spec) - handles fault-tolerant, distributed training and evaluation (a sketch follows after the Tips section). It:
  - distributes the graph
  - shares variables
  - evaluates periodically
  - handles machine failures
  - creates checkpoint files
  - recovers from failures (workers and the chief)
  - saves summaries for TensorBoard
- tf.estimator.TrainSpec contains the input function and max_steps, the number of training steps (not epochs: training may have recovered from a crash, so it is a total count of steps).
- tf.estimator.EvalSpec contains the input function, steps=None, start_delay_secs=60 (start evaluation after N seconds), throttle_secs=600 (evaluate every N seconds), and exporters (for checkpointing).
- ModeKeys.TRAIN and ModeKeys.EVAL
- Wide-and-deep model: tf.estimator.DNNLinearCombinedClassifier(model_dir=..., linear_feature_columns=sparse_wide_columns, dnn_feature_columns=dense_deep_columns, dnn_hidden_units=[100, 50])

Tips

- Start training a model fresh each time: shutil.rmtree(OUTDIR, ignore_errors=True)
- Read sharded CSV files: dataset = tf.data.TextLineDataset(filenames).map(decode_csv_to_features_label)
- Return the features and labels node in the graph: call dataset.make_one_shot_iterator().get_next()
- Use TensorBoard to monitor training
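To make the tf.feature_column reference above concrete, here is a minimal sketch combining several of the column types; the column names (dayofweek, hourofday, sqft) and the bucket boundaries are hypothetical. The resulting list can be passed straight to the canned estimators from the Coding Steps.

```python
import numpy as np
import tensorflow as tf

# Categorical column from a fixed vocabulary (one-hot encoded).
dayofweek = tf.feature_column.categorical_column_with_vocabulary_list(
    'dayofweek', vocabulary_list=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

# Categorical column with known integer ids 0..23.
hourofday = tf.feature_column.categorical_column_with_identity(
    'hourofday', num_buckets=24)

# Numeric column, then bucketized into ranges (np.linspace builds the boundaries).
sqft = tf.feature_column.numeric_column('sqft')
sqft_buckets = tf.feature_column.bucketized_column(
    sqft, boundaries=np.linspace(500, 5000, 10).tolist())

# Feature cross of day and hour, then embedded into 10 dimensions.
day_hour = tf.feature_column.crossed_column(
    [dayofweek, hourofday], hash_bucket_size=24 * 7)
day_hour_embed = tf.feature_column.embedding_column(day_hour, dimension=10)

feature_columns = [sqft_buckets, day_hour_embed]
```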
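The tf.transform concepts above combine into a preprocessing function along the lines of this sketch; the input keys (sqft, fare, dayofweek) are hypothetical, and the surrounding Apache Beam pipeline that runs the analyzers is omitted. (In later tf.Transform releases, tft.string_to_int was renamed tft.compute_and_apply_vocabulary.)

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Analysis runs as an Apache Beam pipeline; the analyzer results
    (mean/stddev, quantile boundaries, vocabularies) are injected into
    the serving-time TensorFlow graph as constants."""
    outputs = {}
    # Analyzer-backed scaling: z-score uses the full-dataset mean and stddev.
    outputs['sqft_scaled'] = tft.scale_to_z_score(inputs['sqft'])
    # Bucketise fares into 5 quantile buckets.
    boundaries = tft.quantiles(inputs['fare'], num_buckets=5, epsilon=0.01)
    outputs['fare_bucket'] = tft.apply_buckets(inputs['fare'], boundaries)
    # Map strings to integer ids using a vocabulary computed over the dataset.
    outputs['dayofweek_id'] = tft.string_to_int(inputs['dayofweek'])
    return outputs
```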
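The tf, tf.gfile, and tf.data entries, together with the Tips on sharded CSVs and one-shot iterators, combine into an input function like the following sketch; the schema (sqft, price) and the file pattern are hypothetical.

```python
import tensorflow as tf

CSV_COLUMNS = ['sqft', 'price']   # hypothetical schema
CSV_DEFAULTS = [[0.0], [0.0]]     # one default value per column

def my_decode_csv(value_column):
    # Parse one CSV line into a (features, label) pair; features is a dict.
    columns = tf.decode_csv(value_column, record_defaults=CSV_DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    label = features.pop('price')
    return features, label

def make_input_fn(filename_pattern, batch_size=128, num_epochs=None):
    def input_fn():
        # Expand the shard pattern, e.g. 'train-*.csv', into a file list.
        file_list = tf.gfile.Glob(filename_pattern)
        dataset = tf.data.TextLineDataset(file_list).map(my_decode_csv)
        dataset = dataset.shuffle(10000).repeat(num_epochs).batch(batch_size)
        # Returning the iterator's next element yields the features and
        # labels nodes in the graph.
        return dataset.make_one_shot_iterator().get_next()
    return input_fn
```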
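Finally, a minimal sketch of how TrainSpec, EvalSpec and train_and_evaluate fit together; it reuses the hypothetical make_input_fn from the previous sketch, and the output directory and file patterns are placeholders.

```python
import shutil
import tensorflow as tf

OUTDIR = 'trained_model'                   # hypothetical output directory
shutil.rmtree(OUTDIR, ignore_errors=True)  # start training fresh each time
tf.logging.set_verbosity(tf.logging.INFO)  # surface training-loop log lines

feature_columns = [tf.feature_column.numeric_column('sqft')]
estimator = tf.estimator.LinearRegressor(feature_columns, model_dir=OUTDIR)

train_spec = tf.estimator.TrainSpec(
    input_fn=make_input_fn('train-*.csv'),
    max_steps=5000)          # a total step count, so training can resume

eval_spec = tf.estimator.EvalSpec(
    input_fn=make_input_fn('eval-*.csv', num_epochs=1),
    steps=None,              # evaluate over the full evaluation set
    start_delay_secs=60,     # wait a minute before the first evaluation
    throttle_secs=600)       # then evaluate at most every ten minutes

# Fault-tolerant (and optionally distributed) training with periodic
# evaluation; checkpoints and TensorBoard summaries land in OUTDIR.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```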