Checkpointing

From Komputery Dużej Mocy w ACK CYFRONET AGH

Checkpointing is a mechanism that automatically saves the state of a computation at regular intervals. If one of the nodes involved in the computation fails, or if the job is killed for administrative reasons (for example, because it exceeded the maximum run time allowed in the queue), the job can then be resumed from the last checkpoint instead of being restarted from the beginning.

Some scientific applications already implement checkpointing; for these, the user only needs to enable it.
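For your own code, or for applications without built-in support, the idea can be sketched at the application level. The Python example below is a minimal illustration, not a Cyfronet-provided tool: the names CHECKPOINT_FILE, CHECKPOINT_INTERVAL, save_checkpoint, load_checkpoint and run are hypothetical, the "work" is a placeholder sum, and JSON stands in for whatever state format a real program would use. It assumes the batch system sends SIGTERM some time before forcibly killing a job (Slurm, for example, does this when a job reaches its time limit); check the behaviour of the scheduler you actually use.

```python
import json
import os
import signal
import tempfile
import time

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical path; keep it on storage that survives the job
CHECKPOINT_INTERVAL = 60.0  # assumed number of seconds between periodic checkpoints


def save_checkpoint(state, path=CHECKPOINT_FILE):
    """Write state atomically: dump to a temporary file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches disk before the rename
    os.replace(tmp_path, path)  # atomic on POSIX: readers never see a half-written checkpoint


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(total_steps, path=CHECKPOINT_FILE, interval=CHECKPOINT_INTERVAL):
    # Resume from the last checkpoint if one exists, otherwise start from scratch.
    state = load_checkpoint(path) or {"step": 0, "result": 0}
    stop_requested = False

    def on_sigterm(signum, frame):
        # Only record the request; the checkpoint is written at the next safe point
        # in the main loop rather than inside the signal handler.
        nonlocal stop_requested
        stop_requested = True

    signal.signal(signal.SIGTERM, on_sigterm)  # must be called from the main thread
    last_save = time.monotonic()

    while state["step"] < total_steps:
        state["result"] += state["step"]  # placeholder for one unit of real work
        state["step"] += 1
        if stop_requested or time.monotonic() - last_save >= interval:
            save_checkpoint(state, path)
            last_save = time.monotonic()
            if stop_requested:
                return state  # exit cleanly; a resubmitted job continues from here

    save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    print(run(1_000_000))
```

If the job is killed and then resubmitted, the second call to run() loads the saved state and continues from the last completed step. Writing to a temporary file and renaming it means that a failure during the write leaves the previous checkpoint intact, which matters precisely in the node-failure case this mechanism is meant to handle.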