trainer

Primitive Trainer Interface.

Classes

Trainer

Trainer provides an interface for all trainers to inherit from.

class ashpy.trainers.trainer.Trainer(epochs, example_dim, logdir='/home/docs/checkouts/readthedocs.org/user_builds/ashpy/checkouts/v0.2.0/docs/source/log', log_eval_mode=<LogEvalMode.TEST: 1>, global_step=None, metrics=None, callbacks=None)[source]

Bases: abc.ABC

Trainer provides an interface for all trainers to inherit from.

__init__(epochs, example_dim, logdir='/home/docs/checkouts/readthedocs.org/user_builds/ashpy/checkouts/v0.2.0/docs/source/log', log_eval_mode=<LogEvalMode.TEST: 1>, global_step=None, metrics=None, callbacks=None)[source]

Primitive trainer interface. Handles model saving and restoring.

Parameters
  • epochs (int) – Number of training epochs.

  • example_dim (Tuple[int, int]) – Dimension of an example. For GANs the example has dimension (2, 1), since it is composed of a tuple whose first element is itself a 2-tuple and whose second element is a single item. For a classifier the example has dimension (1, 1), since it is composed of the example and its label.

  • logdir (str) – Checkpoint and log directory.

  • log_eval_mode (ashpy.modes.LogEvalMode) – Mode to use when evaluating and logging.

  • global_step (Optional[tf.Variable]) – tf.Variable that keeps track of the training steps.

  • metrics (Optional[List[ashpy.metrics.Metric]]) – List of metrics.

  • callbacks (Optional[List[ashpy.callbacks.Callback]]) – List of callbacks to handle events.
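To make the example_dim convention concrete, here is a plain-Python sketch (independent of ashpy; the variable and function names are illustrative) of how the (2, 1) and (1, 1) shapes map onto actual training examples:

```python
# GAN example, example_dim = (2, 1): the first element is itself a
# 2-tuple (e.g. real batch and noise batch), the second is a single
# element (e.g. the labels). Names are illustrative, not ashpy's.
gan_example = ((["real_batch"], ["noise_batch"]), ["labels"])

# Classifier example, example_dim = (1, 1): one input, one label.
classifier_example = (["features"], ["labels"])

def example_dim(example):
    """Compute the (first, second) dimension of a 2-tuple example."""
    first, _second = example
    first_dim = len(first) if isinstance(first, tuple) else 1
    return (first_dim, 1)

print(example_dim(gan_example))         # (2, 1)
print(example_dim(classifier_example))  # (1, 1)
```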

_current_epoch()[source]

Get the current epoch using the (restored) variables.

Return type

Tensor

Returns

current_epoch (tf.Tensor) – The current epoch of training.

_dataset_from_example(example, dims)[source]

Get a dataset from a given example.

Return type

DatasetV2

Returns

The dataset containing only the example.

_log_metrics_and_reset()[source]

Call log and reset_states on each metric.

_measure_performance()[source]

Measure performance on the dataset.

_measure_performance_if_needed(example, measure_performance_freq)[source]

Measure performance if needed.

Measure performance when self._global_step % measure_performance_freq == 0.
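The modulo condition above can be sketched as a small standalone predicate (a simplified stand-in for the trainer's internal logic, not ashpy's actual implementation; the guard for non-positive frequencies is an assumption added here for safety):

```python
def should_measure(global_step: int, measure_performance_freq: int) -> bool:
    """Return True when performance should be measured at this step.

    Mirrors the ``global_step % measure_performance_freq == 0``
    condition; non-positive frequencies disable measurement.
    """
    if measure_performance_freq <= 0:
        return False
    return global_step % measure_performance_freq == 0

print(should_measure(100, 10))  # True
print(should_measure(101, 10))  # False
```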

_on_batch_end()[source]

Handle the end of a training batch.

Return type

None

_on_batch_start()[source]

Handle the start of a training batch.

Return type

None

_on_epoch_end()[source]

Handle the end of the training epoch.

Return type

None

_on_epoch_start()[source]

Handle the start of the training epoch.

Return type

None

_on_exception()[source]

Handle the exception.

Return type

None

_on_train_end()[source]

Handle the end of training.

Return type

None

_on_train_start()[source]

Handle the start of training.

Return type

None

_reduce(per_replica_tensor, reduce_op)[source]

Reduce the input tensor in a distributed fashion, using the specified op.
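In ashpy this reduction is delegated to TensorFlow's distribution strategy; the following is a plain-Python stand-in that illustrates only the idea of combining per-replica values with a named op (function and op names here are illustrative, not the tf.distribute API):

```python
def reduce_per_replica(per_replica_values, reduce_op="SUM"):
    """Combine the values produced by each replica with the given op."""
    if reduce_op == "SUM":
        return sum(per_replica_values)
    if reduce_op == "MEAN":
        return sum(per_replica_values) / len(per_replica_values)
    raise ValueError(f"Unsupported reduce_op: {reduce_op}")

# One loss value per GPU/replica.
losses_per_replica = [1.0, 2.0, 3.0, 4.0]
print(reduce_per_replica(losses_per_replica, "SUM"))   # 10.0
print(reduce_per_replica(losses_per_replica, "MEAN"))  # 2.5
```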

_restore_or_init()[source]

Restore or initialize the persistence layer (checkpoint).

_save()[source]

Save the current checkpointable object status.

_update_global_batch_size(dataset, executors=None)[source]

Set the self._global_batch_size variable where needed.

Parameters

dataset (tf.data.Dataset) – A dataset from which the batch size will be extracted.

executors (Optional[Union[List[ashpy.losses.executor.Executor], ashpy.losses.executor.Executor]]) – A list of executors (or a single executor) with the global_batch_size property.
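The propagation pattern can be sketched in plain Python: normalize a single executor into a list, then set the property on each. DummyExecutor and update_global_batch_size below are hypothetical illustrations, not ashpy's Executor class or the trainer's actual method:

```python
class DummyExecutor:
    """Stand-in for an object exposing a ``global_batch_size`` property."""

    def __init__(self):
        self._global_batch_size = None

    @property
    def global_batch_size(self):
        return self._global_batch_size

    @global_batch_size.setter
    def global_batch_size(self, value):
        self._global_batch_size = value

def update_global_batch_size(batch_size, executors=None):
    """Set ``global_batch_size`` on a single executor or a list of them."""
    if executors is None:
        return
    if not isinstance(executors, list):
        executors = [executors]
    for executor in executors:
        executor.global_batch_size = batch_size

single = DummyExecutor()
update_global_batch_size(32, single)
print(single.global_batch_size)  # 32
```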

_validate_callbacks()[source]

Check if every callback is an ashpy.callbacks.Callback.

_validate_metrics()[source]

Check if every metric is an ashpy.metrics.Metric.
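Both validators follow the same pattern: check that every element is an instance of the expected base class and fail loudly otherwise. A minimal sketch, assuming BaseCallback stands in for ashpy.callbacks.Callback (the helper name and error message are illustrative):

```python
class BaseCallback:
    """Stand-in for the library's callback base class."""

def validate_all(items, expected_type, name):
    """Raise TypeError if any item is not an ``expected_type`` instance."""
    for item in items:
        if not isinstance(item, expected_type):
            raise TypeError(
                f"Every {name} must be a {expected_type.__name__}, "
                f"got {type(item).__name__}"
            )

# Valid list: passes silently.
validate_all([BaseCallback(), BaseCallback()], BaseCallback, "callback")

# Invalid list: raises TypeError.
try:
    validate_all([BaseCallback(), object()], BaseCallback, "callback")
except TypeError as error:
    print(error)
```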

abstract call(*args, **kwargs)[source]

Execute the training process.
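Since call is abstract, every concrete trainer must override it. A minimal sketch of the inheritance pattern using only the standard library (MinimalTrainer and its body are illustrative, not part of ashpy):

```python
import abc

class AbstractTrainer(abc.ABC):
    """Bare-bones analogue of the abstract Trainer interface."""

    @abc.abstractmethod
    def call(self, *args, **kwargs):
        """Execute the training process."""

class MinimalTrainer(AbstractTrainer):
    def call(self, dataset):
        # A real trainer would loop over epochs and batches here.
        return f"trained on {len(dataset)} examples"

print(MinimalTrainer().call([1, 2, 3]))  # trained on 3 examples
```

Instantiating AbstractTrainer directly raises TypeError, which is how the ABC machinery enforces that subclasses implement call.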

property context

Return the training context.

Return type

Context

local_example(example, dims)[source]

Return a local example from a distributed example.

Returns

A local example from a distributed example.

measure_metrics()[source]

Measure the metrics.

Return type

None

model_selection()[source]

Use the metrics to perform model selection.

Return type

None