trainer

Primitive Trainer Interface.

Classes

Trainer Trainer provide an interface for all trainers to inherit from.
class ashpy.trainers.trainer.Trainer(epochs, example_dim, logdir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ashpy/checkouts/v0.3.1/docs/source/log'), log_eval_mode=<LogEvalMode.TEST: 1>, global_step=None, metrics=None, callbacks=None)[source]

Bases: abc.ABC

Trainer provide an interface for all trainers to inherit from.

__init__(epochs, example_dim, logdir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ashpy/checkouts/v0.3.1/docs/source/log'), log_eval_mode=<LogEvalMode.TEST: 1>, global_step=None, metrics=None, callbacks=None)[source]

Primitive trainer interface. Handles model saving and restore.

Parameters:
  • epochs (int) – Number of training epochs.
  • example_dim (Tuple[int, int]) – Dimension of an example. In the case of GANs the example has dimension (2,1) since it’s composed by a tuple in which the first element is a tuple with 2 components and the second component is a single element. In the case of classifier the example has dimension (1, 1) since it’s composed by the example and the label.
  • logdir (str) – Checkpoint and log directory.
  • log_eval_mode (py:class:ashpy.modes.LogEvalMode) – to use when evaluating and logging.
  • global_step (Optional[py:class:ashpy.modes.LogEvalMode]) – tf.Variable that keeps track of the training steps.
  • metrics (Optional[List[ashpy.metrics.Metric]]) – list of metrics.
  • callbacks (Optional[List[ashpy.callbacks.Callback]]) – list of callbacks to handle events.
Return type:

None

static _check_name_collision(objects, obj_type)[source]

Check that all objects have unique name.

_current_epoch()[source]

Get the current epoch using the (restored) variables.

Return type:Tensor
Returns:current_epoch (tf.Tensor) – the current epoch of training.
_dataset_from_example(example, dims)[source]

Get a dataset from a given example.

Return type:DatasetV2
Returns:The dataset containing only the example.
_generate_checkpoint_map()[source]

Generate a human readable map of the id and type mapping in the checkpoint.

_log_metrics_and_reset()[source]

Call for each metric the log and reset_states.

_measure_performance()[source]

Measure performance on dataset.

_measure_performance_if_needed(example, measure_performance_freq)[source]

Measure performance if needed.

Measure performance if self._global_step % measure_performance_freq is 0.

_on_batch_end()[source]

Handle the end of a training batch.

Return type:None
_on_batch_start()[source]

Handle the start of a training batch.

Return type:None
_on_epoch_end()[source]

Handle the end of the training epoch.

Return type:None
_on_epoch_start()[source]

Handle the start of the training epoch.

Return type:None
_on_exception()[source]

Handle the exception.

Return type:None
_on_train_end()[source]

Handle the end of training.

Return type:None
_on_train_start()[source]

Handle the start of training.

Return type:None
_reduce(per_replica_tensor, reduce_op)[source]

Reduce the input tensor in a distributed fashion, using the specified op.

_restore_or_init()[source]

Restore or initialize the persistence layer (checkpoint).

_save()[source]

Save the current checkpointable object status.

_update_checkpoint(ckpt_dict)[source]

Update the checkpoint with the new checkpoint dictionary.

_update_global_batch_size(dataset, executors=None)[source]

Set the self._global_batch_size variable where needed.

Parameters:dataset (tf.data.Dataset) – a dataset from which the batch size will be extracted.
:param executors (Union[List[ashpy.losses.executor.Executor],: ashpy.losses.executor.Executor]: a list of executor
with the property “global_batch_size”.
_validate_callbacks()[source]

Check if every callback is an ashpy.callbacks.Callback.

_validate_metrics()[source]

Check if every metric is an ashpy.metrics.Metric.

call(*args, **kwargs)[source]

Execute the training process.

context

Return the training context.

Return type:Context
local_example(example, dims)[source]

Return a local example from a distributed example.

Returns:A local example from a distributed example.
measure_metrics()[source]

Measure the metrics.

Return type:None
model_selection()[source]

Use the metrics to perform model selection.

Return type:None