tensorflow_1.15.2_openBLAS_bert.out
Starting job 1581003 on c7-7 at Fri Nov 27 02:49:07 CET 2020
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) StdEnv
Training corpus: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/
WordPiece vocabulary used: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norwegian_wordpiece_vocab_20k.txt
BERT configuration file: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json
Directory for TF record files: /cluster/home/andreku/norbert/data/tfrecords/
Directory for the trained model: /cluster/home/andreku/norbert/model/
Creating pretraining data (TF records)...
2020-11-27 02:49:26.018191: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From utils/create_pretraining_data.py:70: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki2.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki5.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki4.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki3.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki0.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki1.txt
*** Writing to output files ***
/cluster/home/andreku/norbert/data/tfrecords/0.tfr
/cluster/home/andreku/norbert/data/tfrecords/1.tfr
/cluster/home/andreku/norbert/data/tfrecords/2.tfr
/cluster/home/andreku/norbert/data/tfrecords/3.tfr
Creating pretraining data (TF records) finished.
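The deprecation warning above (utils/create_pretraining_data.py:70) asks for tf.io.TFRecordWriter in place of tf.python_io.TFRecordWriter. A minimal sketch of such a writer loop is shown below; the function name, input format and feature keys are placeholders, not the script's actual code.

    import collections
    import tensorflow as tf  # TF 1.15

    def write_instances(output_file, instances):
        # instances: iterable of lists of WordPiece ids (hypothetical input format)
        with tf.io.TFRecordWriter(output_file) as writer:  # replaces deprecated tf.python_io.TFRecordWriter
            for input_ids in instances:
                features = collections.OrderedDict()
                features["input_ids"] = tf.train.Feature(
                    int64_list=tf.train.Int64List(value=list(input_ids)))
                example = tf.train.Example(features=tf.train.Features(feature=features))
                writer.write(example.SerializeToString())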
Training BERT on the files from /cluster/home/andreku/norbert/data/tfrecords/...
2020-11-27 02:57:00.495256: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:57:00.495260: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:57:00.495259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:57:00.495257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
WARNING! Combining --use_xla with --manual_fp16 may prevent convergence.
This warning message will be removed when the underlying
issues have been fixed and you are running a TF version
that has that fix.
WARNING! Combining --use_xla with --manual_fp16 may prevent convergence.
This warning message will be removed when the underlying
issues have been fixed and you are running a TF version
that has that fix.
WARNING! Combining --use_xla with --manual_fp16 may prevent convergence.
This warning message will be removed when the underlying
issues have been fixed and you are running a TF version
that has that fix.
WARNING! Combining --use_xla with --manual_fp16 may prevent convergence.
This warning message will be removed when the underlying
issues have been fixed and you are running a TF version
that has that fix.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1127 02:57:36.098473 47493880456384 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1127 02:57:36.098610 47913607222464 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1127 02:57:36.098645 47379959473344 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W1127 02:57:36.098819 47510213129408 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
INFO:tensorflow:***** Configuaration *****
I1127 02:57:58.593601 47493880456384 run_pretraining.py:577] ***** Configuaration *****
INFO:tensorflow: logtostderr: False
I1127 02:57:58.593803 47493880456384 run_pretraining.py:579] logtostderr: False
INFO:tensorflow: alsologtostderr: False
I1127 02:57:58.593867 47493880456384 run_pretraining.py:579] alsologtostderr: False
INFO:tensorflow: log_dir:
I1127 02:57:58.593924 47493880456384 run_pretraining.py:579] log_dir:
INFO:tensorflow: v: 0
I1127 02:57:58.593979 47493880456384 run_pretraining.py:579] v: 0
INFO:tensorflow: verbosity: 0
I1127 02:57:58.594031 47493880456384 run_pretraining.py:579] verbosity: 0
INFO:tensorflow: stderrthreshold: fatal
I1127 02:57:58.594082 47493880456384 run_pretraining.py:579] stderrthreshold: fatal
INFO:tensorflow: showprefixforinfo: True
I1127 02:57:58.594133 47493880456384 run_pretraining.py:579] showprefixforinfo: True
INFO:tensorflow: run_with_pdb: False
I1127 02:57:58.594184 47493880456384 run_pretraining.py:579] run_with_pdb: False
INFO:tensorflow: pdb_post_mortem: False
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "1"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b17e6e8f0d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I1127 02:57:58.594235 47493880456384 run_pretraining.py:579] pdb_post_mortem: False
INFO:tensorflow: run_with_profiling: False
I1127 02:57:58.594284 47493880456384 run_pretraining.py:579] run_with_profiling: False
I1127 02:57:58.594083 47379959473344 estimator.py:212] Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "1"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b17e6e8f0d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow: profile_file: None
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "2"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b363be151d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I1127 02:57:58.594334 47493880456384 run_pretraining.py:579] profile_file: None
INFO:tensorflow: use_cprofile_for_profiling: True
I1127 02:57:58.594384 47493880456384 run_pretraining.py:579] use_cprofile_for_profiling: True
I1127 02:57:58.594178 47510213129408 estimator.py:212] Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "2"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b363be151d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "3"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b9426cb3150>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow: only_check_args: False
I1127 02:57:58.594434 47493880456384 run_pretraining.py:579] only_check_args: False
I1127 02:57:58.594252 47913607222464 estimator.py:212] Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "3"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b9426cb3150>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b17e6ceecb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow: op_conversion_fallback_to_while_loop: False
I1127 02:57:58.594484 47493880456384 run_pretraining.py:579] op_conversion_fallback_to_while_loop: False
W1127 02:57:58.594417 47379959473344 model_fn.py:630] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b17e6ceecb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow: test_random_seed: 301
I1127 02:57:58.594543 47493880456384 run_pretraining.py:579] test_random_seed: 301
INFO:tensorflow: test_srcdir:
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b363bc79cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:***** Running training *****
I1127 02:57:58.594595 47493880456384 run_pretraining.py:579] test_srcdir:
I1127 02:57:58.594599 47379959473344 run_pretraining.py:623] ***** Running training *****
W1127 02:57:58.594530 47510213129408 model_fn.py:630] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b363bc79cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow: test_tmpdir: /tmp/absl_testing
I1127 02:57:58.594644 47493880456384 run_pretraining.py:579] test_tmpdir: /tmp/absl_testing
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b9426b12cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow: test_randomize_ordering_seed:
W1127 02:57:58.594609 47913607222464 model_fn.py:630] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b9426b12cb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow: Batch size = 128
INFO:tensorflow:***** Running training *****
I1127 02:57:58.594694 47493880456384 run_pretraining.py:579] test_randomize_ordering_seed:
I1127 02:57:58.594661 47379959473344 run_pretraining.py:624] Batch size = 128
I1127 02:57:58.594704 47510213129408 run_pretraining.py:623] ***** Running training *****
INFO:tensorflow: xml_output_file:
I1127 02:57:58.594743 47493880456384 run_pretraining.py:579] xml_output_file:
INFO:tensorflow: bert_config_file: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json
INFO:tensorflow:***** Running training *****
I1127 02:57:58.594794 47493880456384 run_pretraining.py:579] bert_config_file: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json
INFO:tensorflow: Batch size = 128
I1127 02:57:58.594791 47913607222464 run_pretraining.py:623] ***** Running training *****
I1127 02:57:58.594764 47510213129408 run_pretraining.py:624] Batch size = 128
INFO:tensorflow: input_files_dir: /cluster/home/andreku/norbert/data/tfrecords/
I1127 02:57:58.594844 47493880456384 run_pretraining.py:579] input_files_dir: /cluster/home/andreku/norbert/data/tfrecords/
INFO:tensorflow: Batch size = 128
INFO:tensorflow: eval_files_dir: None
I1127 02:57:58.594896 47493880456384 run_pretraining.py:579] eval_files_dir: None
I1127 02:57:58.594849 47913607222464 run_pretraining.py:624] Batch size = 128
INFO:tensorflow: output_dir: /cluster/home/andreku/norbert/model/
I1127 02:57:58.594947 47493880456384 run_pretraining.py:579] output_dir: /cluster/home/andreku/norbert/model/
INFO:tensorflow: dllog_path: /cluster/home/andreku/norbert/bert_dllog.json
I1127 02:57:58.594998 47493880456384 run_pretraining.py:579] dllog_path: /cluster/home/andreku/norbert/bert_dllog.json
INFO:tensorflow: init_checkpoint: None
I1127 02:57:58.595048 47493880456384 run_pretraining.py:579] init_checkpoint: None
INFO:tensorflow: optimizer_type: lamb
I1127 02:57:58.595099 47493880456384 run_pretraining.py:579] optimizer_type: lamb
INFO:tensorflow: max_seq_length: 128
I1127 02:57:58.595149 47493880456384 run_pretraining.py:579] max_seq_length: 128
INFO:tensorflow: max_predictions_per_seq: 20
I1127 02:57:58.595200 47493880456384 run_pretraining.py:579] max_predictions_per_seq: 20
INFO:tensorflow: do_train: True
I1127 02:57:58.595249 47493880456384 run_pretraining.py:579] do_train: True
INFO:tensorflow: do_eval: False
I1127 02:57:58.595298 47493880456384 run_pretraining.py:579] do_eval: False
INFO:tensorflow: train_batch_size: 128
I1127 02:57:58.595348 47493880456384 run_pretraining.py:579] train_batch_size: 128
INFO:tensorflow: eval_batch_size: 8
I1127 02:57:58.595397 47493880456384 run_pretraining.py:579] eval_batch_size: 8
INFO:tensorflow: learning_rate: 0.0001
I1127 02:57:58.595450 47493880456384 run_pretraining.py:579] learning_rate: 0.0001
INFO:tensorflow: num_train_steps: 1000
I1127 02:57:58.595500 47493880456384 run_pretraining.py:579] num_train_steps: 1000
INFO:tensorflow: num_warmup_steps: 100
I1127 02:57:58.595556 47493880456384 run_pretraining.py:579] num_warmup_steps: 100
INFO:tensorflow: save_checkpoints_steps: 1000
I1127 02:57:58.595606 47493880456384 run_pretraining.py:579] save_checkpoints_steps: 1000
INFO:tensorflow: display_loss_steps: 1
I1127 02:57:58.595656 47493880456384 run_pretraining.py:579] display_loss_steps: 1
INFO:tensorflow: iterations_per_loop: 1000
I1127 02:57:58.595705 47493880456384 run_pretraining.py:579] iterations_per_loop: 1000
INFO:tensorflow: max_eval_steps: 100
I1127 02:57:58.595754 47493880456384 run_pretraining.py:579] max_eval_steps: 100
INFO:tensorflow: num_accumulation_steps: 1
I1127 02:57:58.595804 47493880456384 run_pretraining.py:579] num_accumulation_steps: 1
INFO:tensorflow: allreduce_post_accumulation: False
I1127 02:57:58.595854 47493880456384 run_pretraining.py:579] allreduce_post_accumulation: False
INFO:tensorflow: verbose_logging: False
I1127 02:57:58.595903 47493880456384 run_pretraining.py:579] verbose_logging: False
INFO:tensorflow: horovod: True
I1127 02:57:58.595952 47493880456384 run_pretraining.py:579] horovod: True
INFO:tensorflow: report_loss: True
I1127 02:57:58.596003 47493880456384 run_pretraining.py:579] report_loss: True
INFO:tensorflow: manual_fp16: True
I1127 02:57:58.596053 47493880456384 run_pretraining.py:579] manual_fp16: True
INFO:tensorflow: amp: False
I1127 02:57:58.596103 47493880456384 run_pretraining.py:579] amp: False
INFO:tensorflow: use_xla: True
I1127 02:57:58.596153 47493880456384 run_pretraining.py:579] use_xla: True
INFO:tensorflow: init_loss_scale: 4294967296
I1127 02:57:58.596203 47493880456384 run_pretraining.py:579] init_loss_scale: 4294967296
INFO:tensorflow: ?: False
I1127 02:57:58.596254 47493880456384 run_pretraining.py:579] ?: False
INFO:tensorflow: help: False
I1127 02:57:58.596303 47493880456384 run_pretraining.py:579] help: False
INFO:tensorflow: helpshort: False
I1127 02:57:58.596354 47493880456384 run_pretraining.py:579] helpshort: False
INFO:tensorflow: helpfull: False
I1127 02:57:58.596405 47493880456384 run_pretraining.py:579] helpfull: False
INFO:tensorflow: helpxml: False
I1127 02:57:58.596455 47493880456384 run_pretraining.py:579] helpxml: False
INFO:tensorflow:**************************
I1127 02:57:58.596499 47493880456384 run_pretraining.py:580] **************************
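For readability, the flag values dumped above correspond roughly to the argument vector below. This is a reconstruction from the log, not the actual job script; the launcher (Horovod/MPI, one process per GPU) and any flags not echoed in the dump are assumptions.

    # Reconstructed from the flag dump above (a sketch that mirrors, not copies, the job script).
    RUN_PRETRAINING_ARGS = [
        "--bert_config_file=/cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json",
        "--input_files_dir=/cluster/home/andreku/norbert/data/tfrecords/",
        "--output_dir=/cluster/home/andreku/norbert/model/",
        "--dllog_path=/cluster/home/andreku/norbert/bert_dllog.json",
        "--do_train=True",
        "--train_batch_size=128",
        "--max_seq_length=128",
        "--max_predictions_per_seq=20",
        "--num_train_steps=1000",
        "--num_warmup_steps=100",
        "--save_checkpoints_steps=1000",
        "--learning_rate=1e-4",
        "--optimizer_type=lamb",
        "--horovod=True",
        "--manual_fp16=True",
        "--use_xla=True",
    ]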
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "0"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b326e601090>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I1127 02:57:58.597053 47493880456384 estimator.py:212] Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
visible_device_list: "0"
}
graph_options {
optimizer_options {
global_jit_level: ON_1
}
rewrite_options {
memory_optimization: NO_MEM_OPT
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b326e601090>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
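The "Using config" dump above is consistent with a tf.estimator.RunConfig built on a ConfigProto that pins one visible GPU per Horovod rank, enables XLA JIT (global_jit_level ON_1) and disables graph memory optimization. A minimal sketch, assuming only standard TF 1.15 APIs (not the script's actual code):

    import tensorflow as tf  # TF 1.15
    from tensorflow.core.protobuf import rewriter_config_pb2

    session_config = tf.compat.v1.ConfigProto()
    session_config.gpu_options.visible_device_list = "0"  # rank-local GPU index (differs per worker above)
    session_config.graph_options.optimizer_options.global_jit_level = (
        tf.compat.v1.OptimizerOptions.ON_1)                # XLA, as --use_xla requests
    session_config.graph_options.rewrite_options.memory_optimization = (
        rewriter_config_pb2.RewriterConfig.NO_MEM_OPT)

    run_config = tf.estimator.RunConfig(
        model_dir="/cluster/home/andreku/norbert/model/",
        session_config=session_config,
        save_summary_steps=1000,       # chief only; the other ranks show None above
        save_checkpoints_steps=1000,
        keep_checkpoint_max=5,
        log_step_count_steps=10000)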
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b326e46ccb0>) includes params argument, but params are not passed to Estimator.
W1127 02:57:58.597308 47493880456384 model_fn.py:630] Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b326e46ccb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:***** Running training *****
I1127 02:57:58.597487 47493880456384 run_pretraining.py:623] ***** Running training *****
INFO:tensorflow: Batch size = 128
I1127 02:57:58.597556 47493880456384 run_pretraining.py:624] Batch size = 128
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1127 02:57:58.633793 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1127 02:57:58.633807 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1127 02:57:58.633828 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1127 02:57:58.633791 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
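The deprecation above suggests replacing Variable.initialized_value with Variable.read_value. A minimal sketch with a placeholder variable (not the library code that triggered the warning):

    import tensorflow as tf  # TF 1.15

    v = tf.compat.v1.Variable(0, name="step")  # placeholder variable
    old_style = v.initialized_value()          # deprecated form flagged above
    new_style = v.read_value()                 # suggested replacement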
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W1127 02:57:58.648817 47379959473344 deprecation.py:323] From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1127 02:57:58.648943 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W1127 02:57:58.649004 47493880456384 deprecation.py:323] From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1127 02:57:58.649064 47510213129408 deprecation.py:323] From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W1127 02:57:58.649065 47913607222464 deprecation.py:323] From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W1127 02:57:58.649117 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1127 02:57:58.649194 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1127 02:57:58.649196 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
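The warnings above recommend replacing the contrib parallel_interleave transformation with Dataset.interleave plus a non-deterministic tf.data option. A minimal sketch follows; the file pattern, cycle_length and the commented-out contrib call are illustrative, not taken from run_pretraining.py:508.

    import tensorflow as tf  # TF 1.15

    files = tf.data.Dataset.list_files("/cluster/home/andreku/norbert/data/tfrecords/*.tfr")

    # Deprecated contrib style flagged above (arguments illustrative):
    # d = files.apply(tf.contrib.data.parallel_interleave(
    #     tf.data.TFRecordDataset, sloppy=True, cycle_length=4))

    # Replacement suggested by the warning:
    d = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)
    options = tf.data.Options()
    options.experimental_deterministic = False  # counterpart of sloppy=True
    d = d.with_options(options)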
WARNING:tensorflow:From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
W1127 02:57:58.675407 47379959473344 deprecation.py:323] From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
W1127 02:57:58.675494 47493880456384 deprecation.py:323] From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
W1127 02:57:58.675501 47510213129408 deprecation.py:323] From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
W1127 02:57:58.675592 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
WARNING:tensorflow:From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
W1127 02:57:58.675670 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
W1127 02:57:58.675669 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
W1127 02:57:58.675637 47913607222464 deprecation.py:323] From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
W1127 02:57:58.675801 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
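The warnings above recommend splitting the fused contrib map_and_batch into map followed by batch (tf.data's static optimizations re-fuse them). A minimal sketch; the parser and batch arguments are placeholders rather than the script's decoder, and the tf.io.parse_single_example spelling used here is also the replacement requested in the next warning block.

    import tensorflow as tf  # TF 1.15

    def _decode(record):  # placeholder parser
        return tf.io.parse_single_example(
            record, {"input_ids": tf.io.FixedLenFeature([128], tf.int64)})

    d = tf.data.TFRecordDataset("/cluster/home/andreku/norbert/data/tfrecords/0.tfr")

    # Deprecated contrib style flagged at run_pretraining.py:525 (arguments illustrative):
    # d = d.apply(tf.contrib.data.map_and_batch(_decode, batch_size=128, drop_remainder=True))

    # Replacement suggested by the warning:
    d = d.map(_decode, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    d = d.batch(128, drop_remainder=True)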
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W1127 02:57:58.757650 47379959473344 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W1127 02:57:58.758238 47493880456384 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W1127 02:57:58.759057 47913607222464 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
W1127 02:57:58.759206 47510213129408 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.
WARNING:tensorflow:From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1127 02:57:58.875067 47493880456384 deprecation.py:323] From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1127 02:57:58.875084 47913607222464 deprecation.py:323] From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1127 02:57:58.875098 47510213129408 deprecation.py:323] From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1127 02:57:58.875125 47379959473344 deprecation.py:323] From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
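The replacement suggested for the deprecated to_int32 at run_pretraining.py:540 is a plain cast. A short sketch with a placeholder tensor:

    import tensorflow as tf  # TF 1.15

    labels = tf.constant([[1.0], [0.0]])       # placeholder tensor
    old_style = tf.to_int32(labels)            # deprecated form flagged above
    new_style = tf.cast(labels, tf.int32)      # suggested replacement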
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
I1127 02:57:58.910511 47379959473344 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Calling model_fn.
I1127 02:57:58.910566 47913607222464 estimator.py:1148] Calling model_fn.
INFO:tensorflow:*** Features ***
I1127 02:57:58.910585 47510213129408 estimator.py:1148] Calling model_fn.
I1127 02:57:58.910651 47379959473344 run_pretraining.py:257] *** Features ***
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:*** Features ***
INFO:tensorflow:*** Features ***
I1127 02:57:58.910698 47913607222464 run_pretraining.py:257] *** Features ***
INFO:tensorflow: name = input_ids, shape = (128, 128)
I1127 02:57:58.910714 47510213129408 run_pretraining.py:257] *** Features ***
I1127 02:57:58.910659 47493880456384 estimator.py:1148] Calling model_fn.
I1127 02:57:58.910735 47379959473344 run_pretraining.py:259] name = input_ids, shape = (128, 128)
INFO:tensorflow: name = input_ids, shape = (128, 128)
INFO:tensorflow:*** Features ***
INFO:tensorflow: name = input_ids, shape = (128, 128)
I1127 02:57:58.910781 47913607222464 run_pretraining.py:259] name = input_ids, shape = (128, 128)
INFO:tensorflow: name = input_mask, shape = (128, 128)
I1127 02:57:58.910788 47493880456384 run_pretraining.py:257] *** Features ***
I1127 02:57:58.910800 47510213129408 run_pretraining.py:259] name = input_ids, shape = (128, 128)
I1127 02:57:58.910806 47379959473344 run_pretraining.py:259] name = input_mask, shape = (128, 128)
INFO:tensorflow: name = input_mask, shape = (128, 128)
INFO:tensorflow: name = input_mask, shape = (128, 128)
I1127 02:57:58.910852 47913607222464 run_pretraining.py:259] name = input_mask, shape = (128, 128)
INFO:tensorflow: name = masked_lm_ids, shape = (128, 20)
INFO:tensorflow: name = input_ids, shape = (128, 128)
I1127 02:57:58.910870 47510213129408 run_pretraining.py:259] name = input_mask, shape = (128, 128)
I1127 02:57:58.910872 47379959473344 run_pretraining.py:259] name = masked_lm_ids, shape = (128, 20)
I1127 02:57:58.910872 47493880456384 run_pretraining.py:259] name = input_ids, shape = (128, 128)
INFO:tensorflow: name = masked_lm_ids, shape = (128, 20)
INFO:tensorflow: name = masked_lm_positions, shape = (128, 20)
INFO:tensorflow: name = masked_lm_ids, shape = (128, 20)
I1127 02:57:58.910917 47913607222464 run_pretraining.py:259] name = masked_lm_ids, shape = (128, 20)
INFO:tensorflow: name = input_mask, shape = (128, 128)
I1127 02:57:58.910934 47510213129408 run_pretraining.py:259] name = masked_lm_ids, shape = (128, 20)
I1127 02:57:58.910934 47379959473344 run_pretraining.py:259] name = masked_lm_positions, shape = (128, 20)
I1127 02:57:58.910941 47493880456384 run_pretraining.py:259] name = input_mask, shape = (128, 128)
INFO:tensorflow: name = masked_lm_positions, shape = (128, 20)
INFO:tensorflow: name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow: name = masked_lm_positions, shape = (128, 20)
I1127 02:57:58.910980 47913607222464 run_pretraining.py:259] name = masked_lm_positions, shape = (128, 20)
INFO:tensorflow: name = masked_lm_ids, shape = (128, 20)
I1127 02:57:58.910997 47510213129408 run_pretraining.py:259] name = masked_lm_positions, shape = (128, 20)
I1127 02:57:58.910996 47379959473344 run_pretraining.py:259] name = masked_lm_weights, shape = (128, 20)
I1127 02:57:58.911006 47493880456384 run_pretraining.py:259] name = masked_lm_ids, shape = (128, 20)
INFO:tensorflow: name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow: name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow: name = next_sentence_labels, shape = (128, 1)
I1127 02:57:58.911043 47913607222464 run_pretraining.py:259] name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow: name = masked_lm_positions, shape = (128, 20)
I1127 02:57:58.911058 47510213129408 run_pretraining.py:259] name = masked_lm_weights, shape = (128, 20)
I1127 02:57:58.911058 47379959473344 run_pretraining.py:259] name = next_sentence_labels, shape = (128, 1)
I1127 02:57:58.911068 47493880456384 run_pretraining.py:259] name = masked_lm_positions, shape = (128, 20)
INFO:tensorflow: name = next_sentence_labels, shape = (128, 1)
INFO:tensorflow: name = next_sentence_labels, shape = (128, 1)
INFO:tensorflow: name = segment_ids, shape = (128, 128)
I1127 02:57:58.911104 47913607222464 run_pretraining.py:259] name = next_sentence_labels, shape = (128, 1)
INFO:tensorflow: name = masked_lm_weights, shape = (128, 20)
I1127 02:57:58.911120 47510213129408 run_pretraining.py:259] name = next_sentence_labels, shape = (128, 1)
I1127 02:57:58.911121 47379959473344 run_pretraining.py:259] name = segment_ids, shape = (128, 128)
I1127 02:57:58.911129 47493880456384 run_pretraining.py:259] name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow: name = segment_ids, shape = (128, 128)
INFO:tensorflow: name = segment_ids, shape = (128, 128)
I1127 02:57:58.911166 47913607222464 run_pretraining.py:259] name = segment_ids, shape = (128, 128)
INFO:tensorflow: name = next_sentence_labels, shape = (128, 1)
I1127 02:57:58.911181 47510213129408 run_pretraining.py:259] name = segment_ids, shape = (128, 128)
I1127 02:57:58.911189 47493880456384 run_pretraining.py:259] name = next_sentence_labels, shape = (128, 1)
INFO:tensorflow: name = segment_ids, shape = (128, 128)
I1127 02:57:58.911251 47493880456384 run_pretraining.py:259] name = segment_ids, shape = (128, 128)
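The feature names and shapes listed above (batch size 128, sequence length 128, 20 masked positions per sequence) correspond, per example, to a parsing spec like the one below. The dtypes are assumptions based on the standard BERT pretraining record format, not read from the script:

    import tensorflow as tf  # TF 1.15

    max_seq_length = 128
    max_predictions_per_seq = 20
    name_to_features = {
        "input_ids":            tf.io.FixedLenFeature([max_seq_length], tf.int64),
        "input_mask":           tf.io.FixedLenFeature([max_seq_length], tf.int64),
        "segment_ids":          tf.io.FixedLenFeature([max_seq_length], tf.int64),
        "masked_lm_positions":  tf.io.FixedLenFeature([max_predictions_per_seq], tf.int64),
        "masked_lm_ids":        tf.io.FixedLenFeature([max_predictions_per_seq], tf.int64),
        "masked_lm_weights":    tf.io.FixedLenFeature([max_predictions_per_seq], tf.float32),
        "next_sentence_labels": tf.io.FixedLenFeature([1], tf.int64),
    }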
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W1127 02:57:58.911300 47379959473344 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W1127 02:57:58.911346 47913607222464 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W1127 02:57:58.911365 47510213129408 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W1127 02:57:58.911433 47493880456384 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W1127 02:57:58.912754 47379959473344 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W1127 02:57:58.912822 47913607222464 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W1127 02:57:58.912829 47510213129408 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
W1127 02:57:58.912909 47493880456384 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
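The two warning blocks above ask for the tf.compat.v1 spellings of variable_scope and get_variable. A minimal sketch; the scope name, variable name and shape are placeholders (20k matches the WordPiece vocabulary used in this run, the hidden size of 768 is an assumption since norbert_config.json is not shown in the log):

    import tensorflow as tf  # TF 1.15

    with tf.compat.v1.variable_scope("bert", reuse=tf.compat.v1.AUTO_REUSE):
        embedding_table = tf.compat.v1.get_variable(
            "word_embeddings", shape=[20000, 768],
            initializer=tf.compat.v1.truncated_normal_initializer(stddev=0.02))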
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W1127 02:57:58.961073 47379959473344 deprecation.py:506] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W1127 02:57:58.961161 47913607222464 deprecation.py:506] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W1127 02:57:58.961200 47510213129408 deprecation.py:506] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W1127 02:57:58.961334 47493880456384 deprecation.py:506] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
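The keep_prob -> rate migration described in these dropout warnings is just passing the complement. A minimal sketch, with an illustrative keep_prob rather than the value from the BERT config:

    import tensorflow as tf  # TF 1.15

    keep_prob = 0.9  # illustrative only
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 768])
    # deprecated form: tf.nn.dropout(x, keep_prob=keep_prob)
    y = tf.nn.dropout(x, rate=1.0 - keep_prob)  # rate = 1 - keep_prob, as the warning says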
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1127 02:57:58.975179 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1127 02:57:58.975510 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1127 02:57:58.975659 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1127 02:57:58.975763 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
W1127 02:57:58.975786 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1127 02:57:58.976104 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1127 02:57:58.976248 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1127 02:57:58.976366 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
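Both deprecations flagged here (tf.layers.dense and Layer.apply) point at the Keras layer API. A minimal sketch of the suggested replacement, not the actual modeling.py code:

    import tensorflow as tf  # TF 1.15

    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 768])
    dense = tf.keras.layers.Dense(3072, activation=tf.nn.relu)  # replaces tf.layers.dense(...)
    h = dense(x)  # calling the layer directly replaces the deprecated Layer.apply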
WARNING:tensorflow:From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
W1127 02:58:01.261205 47379959473344 module_wrapper.py:139] From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
decayed_learning_rate_at_crossover_point = 4.000000e-04, adjusted_init_lr = 4.000000e-04
Initializing LAMB Optimizer
WARNING:tensorflow:From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
W1127 02:58:01.303079 47493880456384 module_wrapper.py:139] From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
decayed_learning_rate_at_crossover_point = 4.000000e-04, adjusted_init_lr = 4.000000e-04
WARNING:tensorflow:From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
W1127 02:58:01.307732 47510213129408 module_wrapper.py:139] From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
decayed_learning_rate_at_crossover_point = 4.000000e-04, adjusted_init_lr = 4.000000e-04
Initializing LAMB Optimizer
Initializing LAMB Optimizer
WARNING:tensorflow:From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
W1127 02:58:01.331020 47913607222464 module_wrapper.py:139] From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.
decayed_learning_rate_at_crossover_point = 4.000000e-04, adjusted_init_lr = 4.000000e-04
Initializing LAMB Optimizer
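The run_pretraining.py:295 warning printed by each worker alongside its "Initializing LAMB Optimizer" line concerns collecting the variables the optimizer will update; the compat form it recommends looks roughly like this sketch (not the script itself):

    import tensorflow as tf  # TF 1.15

    tvars = tf.compat.v1.trainable_variables()  # replaces the deprecated tf.trainable_variables
    for var in tvars:
        print(var.name, var.shape)  # e.g. to report how many parameters will be trained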
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1127 02:58:01.734069 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1127 02:58:01.734835 47510213129408 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1127 02:58:01.735529 47913607222464 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W1127 02:58:01.735536 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
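The tf.where deprecation refers to the 2.x op's np.where-style broadcasting, which TF 1.15 exposes through the compat.v2 namespace. A tiny sketch with toy tensors, not taken from the training graph:

    import tensorflow as tf  # TF 1.15

    cond = tf.constant([True, False, True])
    # tf.compat.v2.where broadcasts the scalar branches against cond, like np.where
    y = tf.compat.v2.where(cond, 1.0, 0.0)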
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
W1127 02:58:05.910739 47379959473344 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
W1127 02:58:05.961147 47510213129408 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
W1127 02:58:05.993538 47493880456384 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
W1127 02:58:05.996852 47913607222464 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
W1127 02:58:06.240663 47379959473344 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
W1127 02:58:06.295029 47510213129408 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
W1127 02:58:06.329781 47493880456384 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
W1127 02:58:06.333020 47913607222464 module_wrapper.py:139] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.
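The optimization.py warnings map tf.is_finite onto tf.math.is_finite and tf.global_norm onto tf.linalg.global_norm. The kind of all-finite gradient check plus global-norm computation they sit in looks roughly like the sketch below (toy gradients, not the NVIDIA optimizer code):

    import tensorflow as tf  # TF 1.15

    grads = [tf.constant([0.1, float("nan")]), tf.constant([2.0])]  # toy gradients
    finite_flags = [tf.reduce_all(tf.math.is_finite(g)) for g in grads]
    all_finite = tf.reduce_all(tf.stack(finite_flags))  # an overflowed step would be skipped
    grad_norm = tf.linalg.global_norm(grads)            # replaces the deprecated tf.global_norm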
INFO:tensorflow:Done calling model_fn.
I1127 02:58:15.510627 47379959473344 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1127 02:58:15.697726 47510213129408 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
I1127 02:58:15.809911 47493880456384 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I1127 02:58:15.811256 47493880456384 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
I1127 02:58:15.824473 47913607222464 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Graph was finalized.
I1127 02:58:47.664816 47379959473344 monitored_session.py:240] Graph was finalized.
2020-11-27 02:58:47.696844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
INFO:tensorflow:Graph was finalized.
I1127 02:58:48.071923 47510213129408 monitored_session.py:240] Graph was finalized.
2020-11-27 02:58:48.084972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
INFO:tensorflow:Graph was finalized.
I1127 02:58:48.395794 47913607222464 monitored_session.py:240] Graph was finalized.
2020-11-27 02:58:48.411693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
INFO:tensorflow:Graph was finalized.
I1127 02:58:48.467956 47493880456384 monitored_session.py:240] Graph was finalized.
2020-11-27 02:58:48.484066: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-27 02:58:49.716379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:14:00.0
2020-11-27 02:58:49.716420: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:b1:00.0
2020-11-27 02:58:49.716562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:88:00.0
2020-11-27 02:58:49.716629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:39:00.0
2020-11-27 02:58:49.716700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:51.386188: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 02:58:51.386188: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 02:58:51.386187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 02:58:51.386190: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 02:58:51.734149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-27 02:58:51.734149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-27 02:58:51.734171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-27 02:58:51.734215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-27 02:58:52.149409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-27 02:58:52.149411: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-27 02:58:52.149412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-27 02:58:52.149419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-27 02:58:52.778962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-27 02:58:52.778968: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-27 02:58:52.778967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-27 02:58:52.778972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-27 02:58:53.025336: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-27 02:58:53.025337: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-27 02:58:53.025339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-27 02:58:53.025339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-27 02:58:54.429955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 02:58:54.429956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 02:58:54.429956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 02:58:54.429970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 02:58:54.439842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 1
2020-11-27 02:58:54.439993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 2
2020-11-27 02:58:54.440065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 3
2020-11-27 02:58:54.440105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:54.440106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:54.440114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:54.440136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-11-27 02:58:54.440171: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:55.749275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.749315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 3
2020-11-27 02:58:55.749322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3: N
2020-11-27 02:58:55.753723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b1:00.0, compute capability: 6.0)
2020-11-27 02:58:55.756771: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.790619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.790656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 2
2020-11-27 02:58:55.790663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2: N
2020-11-27 02:58:55.794820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:88:00.0, compute capability: 6.0)
2020-11-27 02:58:55.796612: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.833128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.833169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 1
2020-11-27 02:58:55.833176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1: N
2020-11-27 02:58:55.837259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:39:00.0, compute capability: 6.0)
2020-11-27 02:58:55.840814: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.922214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.922259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-11-27 02:58:55.922267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-11-27 02:58:55.926298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:14:00.0, compute capability: 6.0)
2020-11-27 02:58:55.929590: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
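Each of the four workers above maps a different physical P100 (PCI bus IDs 14, 39, 88 and b1) to its own /device:GPU:0, consistent with one-GPU-per-rank pinning. A quick way for a process to confirm what it can see, assuming TF 1.15's experimental config API:

    import tensorflow as tf  # TF 1.15

    gpus = tf.config.experimental.list_physical_devices("GPU")
    print(gpus)  # with per-rank pinning, each worker should report exactly one GPU here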
INFO:tensorflow:Running local_init_op.
I1127 02:59:01.640508 47379959473344 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Running local_init_op.
I1127 02:59:01.714041 47493880456384 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Running local_init_op.
I1127 02:59:01.775563 47913607222464 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Running local_init_op.
I1127 02:59:01.842157 47510213129408 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1127 02:59:02.036135 47379959473344 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1127 02:59:02.105047 47493880456384 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1127 02:59:02.162716 47913607222464 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1127 02:59:02.238831 47510213129408 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /cluster/home/andreku/norbert/model/model.ckpt.
I1127 02:59:51.432053 47493880456384 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /cluster/home/andreku/norbert/model/model.ckpt.
c7-7:41737:41793 [0] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41737:41793 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41737:41793 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41737:41793 [0] NCCL INFO Using network IB
NCCL version 2.6.4+cuda10.1
c7-7:41740:41792 [3] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41739:41795 [2] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41738:41794 [1] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41740:41792 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41738:41794 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41739:41795 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41740:41792 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41740:41792 [3] NCCL INFO Using network IB
c7-7:41738:41794 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41738:41794 [1] NCCL INFO Using network IB
c7-7:41739:41795 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41739:41795 [2] NCCL INFO Using network IB
c7-7:41739:41795 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41739:41795 [2] NCCL INFO Trees [0] 3/-1/-1->2->1|1->2->3/-1/-1 [1] 3/-1/-1->2->1|1->2->3/-1/-1
c7-7:41739:41795 [2] NCCL INFO Setting affinity for GPU 2 to 03f0,0003f000
c7-7:41737:41793 [0] NCCL INFO Channel 00/02 : 0 1 2 3
c7-7:41737:41793 [0] NCCL INFO Channel 01/02 : 0 1 2 3
c7-7:41740:41792 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41740:41792 [3] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] -1/-1/-1->3->2|2->3->-1/-1/-1
c7-7:41740:41792 [3] NCCL INFO Setting affinity for GPU 3 to 03f0,0003f000
c7-7:41738:41794 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41738:41794 [1] NCCL INFO Trees [0] 2/-1/-1->1->0|0->1->2/-1/-1 [1] 2/-1/-1->1->0|0->1->2/-1/-1
c7-7:41738:41794 [1] NCCL INFO Setting affinity for GPU 1 to 3f00003f
c7-7:41737:41793 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41737:41793 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1|-1->0->1/-1/-1 [1] 1/-1/-1->0->-1|-1->0->1/-1/-1
c7-7:41737:41793 [0] NCCL INFO Setting affinity for GPU 0 to 3f00003f
c7-7:41738:41794 [1] NCCL INFO Ring 00 : 1[39000] -> 2[88000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 00 : 3[b1000] -> 0[14000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 00 : 2[88000] -> 3[b1000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO Ring 00 : 0[14000] -> 1[39000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 00 : 2[88000] -> 1[39000] via P2P/IPC
c7-7:41738:41794 [1] NCCL INFO Ring 00 : 1[39000] -> 0[14000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 00 : 3[b1000] -> 2[88000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 01 : 2[88000] -> 3[b1000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 01 : 3[b1000] -> 0[14000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO Ring 01 : 0[14000] -> 1[39000] via P2P/IPC
c7-7:41738:41794 [1] NCCL INFO Ring 01 : 1[39000] -> 2[88000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 01 : 3[b1000] -> 2[88000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 01 : 2[88000] -> 1[39000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO comm 0x2b942817e920 rank 3 nranks 4 cudaDev 3 busId b1000 - Init COMPLETE
c7-7:41738:41794 [1] NCCL INFO Ring 01 : 1[39000] -> 0[14000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO comm 0x2b32702ae640 rank 0 nranks 4 cudaDev 0 busId 14000 - Init COMPLETE
c7-7:41739:41795 [2] NCCL INFO comm 0x2b36401cc990 rank 2 nranks 4 cudaDev 2 busId 88000 - Init COMPLETE
c7-7:41738:41794 [1] NCCL INFO comm 0x2b17e817e930 rank 1 nranks 4 cudaDev 1 busId 39000 - Init COMPLETE
c7-7:41737:41793 [0] NCCL INFO Launch mode Parallel
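The NCCL INFO block above (bootstrap over ib0, two ring channels 0 -> 1 -> 2 -> 3 connected via P2P/IPC, four ranks on one node) is the kind of output NCCL produces when its debug level is at least INFO. If it were missing, it could be requested by setting NCCL_DEBUG before the workers start, for example:

    import os

    # must be in the environment before NCCL is initialized by the first collective
    os.environ.setdefault("NCCL_DEBUG", "INFO")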
WARNING:tensorflow:From run_pretraining.py:146: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.
W1127 03:00:19.543303 47493880456384 module_wrapper.py:139] From run_pretraining.py:146: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.
2020-11-27 03:00:42.097413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 03:00:44.042029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 03:00:44.388321: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 03:00:44.700785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 03:00:45.060258: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 03:00:45.078043: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 03:00:45.383347: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 03:00:45.388458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:00:48.592843 - Iteration: 1 throughput_train : 17.862 seq/s mlm_loss : 10.0161 nsp_loss : 0.6303 total_loss : 10.6464 avg_loss_step : 10.6464 learning_rate : 0.0 loss_scaler : 4294967296
INFO:tensorflow:loss = 10.646426, step = 0
I1127 03:00:48.593047 47493880456384 basic_session_run_hooks.py:262] loss = 10.646426, step = 0
INFO:tensorflow:loss = 10.658132, step = 0
I1127 03:00:49.077786 47379959473344 basic_session_run_hooks.py:262] loss = 10.658132, step = 0
INFO:tensorflow:loss = 10.675867, step = 0
I1127 03:00:49.085876 47510213129408 basic_session_run_hooks.py:262] loss = 10.675867, step = 0
INFO:tensorflow:loss = 10.671821, step = 0
I1127 03:00:49.093230 47913607222464 basic_session_run_hooks.py:262] loss = 10.671821, step = 0
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.408520 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:16.408800 - Iteration: 1 throughput_train : 18.407 seq/s mlm_loss : 10.0516 nsp_loss : 0.6226 total_loss : 10.6742 avg_loss_step : 10.6742 learning_rate : 0.0 loss_scaler : 4294967296
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409372 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409482 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409999 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.281277 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.281934 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:18.282180 - Iteration: 1 throughput_train : 273.343 seq/s mlm_loss : 10.0447 nsp_loss : 0.6394 total_loss : 10.6841 avg_loss_step : 10.6841 learning_rate : 0.0 loss_scaler : 2147483648
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.286860 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.288407 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.167413 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.168865 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.169528 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:20.169768 - Iteration: 1 throughput_train : 271.284 seq/s mlm_loss : 10.0232 nsp_loss : 0.6320 total_loss : 10.6552 avg_loss_step : 10.6552 learning_rate : 0.0 loss_scaler : 2147483648
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.173998 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.053810 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:22.054053 - Iteration: 1 throughput_train : 271.760 seq/s mlm_loss : 10.0441 nsp_loss : 0.6293 total_loss : 10.6734 avg_loss_step : 10.6734 learning_rate : 0.0 loss_scaler : 1073741824
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.056735 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.057951 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.057997 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.931773 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:23.932012 - Iteration: 1 throughput_train : 272.674 seq/s mlm_loss : 10.0205 nsp_loss : 0.6204 total_loss : 10.6409 avg_loss_step : 10.6409 learning_rate : 0.0 loss_scaler : 1073741824
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.933689 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.935443 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.937267 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
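The repeated global-step warnings carry their own fix: the training op has to increment the global step. A minimal, self-contained sketch of what the hook is asking for, using a stand-in Adam optimizer rather than the LAMB setup this job actually runs:

    import tensorflow as tf  # TF 1.15

    w = tf.compat.v1.get_variable("w", initializer=1.0)
    loss = tf.square(w - 3.0)
    global_step = tf.compat.v1.train.get_or_create_global_step()
    opt = tf.compat.v1.train.AdamOptimizer(1e-4)
    # passing global_step makes minimize() increment it, which is what the hook checks for
    train_op = opt.minimize(loss, global_step=global_step)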
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:25.838017 - Iteration: 1 throughput_train : 268.666 seq/s mlm_loss : 10.0254 nsp_loss : 0.6318 total_loss : 10.6573 avg_loss_step : 10.6573 learning_rate : 0.0 loss_scaler : 536870912
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:27.716195 - Iteration: 1 throughput_train : 272.647 seq/s mlm_loss : 10.0365 nsp_loss : 0.6306 total_loss : 10.6672 avg_loss_step : 10.6672 learning_rate : 0.0 loss_scaler : 536870912
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:29.590927 - Iteration: 1 throughput_train : 273.147 seq/s mlm_loss : 10.0342 nsp_loss : 0.6437 total_loss : 10.6779 avg_loss_step : 10.6779 learning_rate : 0.0 loss_scaler : 268435456
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:31.473217 - Iteration: 1 throughput_train : 272.054 seq/s mlm_loss : 10.0322 nsp_loss : 0.6440 total_loss : 10.6761 avg_loss_step : 10.6761 learning_rate : 0.0 loss_scaler : 268435456
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:33.362713 - Iteration: 1 throughput_train : 271.017 seq/s mlm_loss : 10.0186 nsp_loss : 0.6424 total_loss : 10.6611 avg_loss_step : 10.6611 learning_rate : 0.0 loss_scaler : 134217728
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:35.244203 - Iteration: 1 throughput_train : 272.166 seq/s mlm_loss : 10.0325 nsp_loss : 0.6334 total_loss : 10.6659 avg_loss_step : 10.6659 learning_rate : 0.0 loss_scaler : 134217728
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:37.134643 - Iteration: 1 throughput_train : 270.878 seq/s mlm_loss : 10.0104 nsp_loss : 0.6264 total_loss : 10.6368 avg_loss_step : 10.6368 learning_rate : 0.0 loss_scaler : 67108864
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:39.021636 - Iteration: 1 throughput_train : 271.373 seq/s mlm_loss : 10.0204 nsp_loss : 0.6293 total_loss : 10.6497 avg_loss_step : 10.6497 learning_rate : 0.0 loss_scaler : 67108864
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:40.918920 - Iteration: 1 throughput_train : 269.902 seq/s mlm_loss : 9.9981 nsp_loss : 0.6241 total_loss : 10.6222 avg_loss_step : 10.6222 learning_rate : 0.0 loss_scaler : 33554432
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:42.811608 - Iteration: 1 throughput_train : 270.562 seq/s mlm_loss : 10.0181 nsp_loss : 0.6417 total_loss : 10.6598 avg_loss_step : 10.6598 learning_rate : 0.0 loss_scaler : 33554432
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:44.708990 - Iteration: 1 throughput_train : 269.888 seq/s mlm_loss : 10.0143 nsp_loss : 0.6288 total_loss : 10.6432 avg_loss_step : 10.6432 learning_rate : 0.0 loss_scaler : 16777216
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:46.604911 - Iteration: 1 throughput_train : 270.097 seq/s mlm_loss : 10.0303 nsp_loss : 0.6504 total_loss : 10.6807 avg_loss_step : 10.6807 learning_rate : 0.0 loss_scaler : 16777216
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:48.496796 - Iteration: 1 throughput_train : 270.671 seq/s mlm_loss : 9.9989 nsp_loss : 0.6374 total_loss : 10.6362 avg_loss_step : 10.6362 learning_rate : 0.0 loss_scaler : 8388608
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:50.386115 - Iteration: 1 throughput_train : 271.039 seq/s mlm_loss : 10.0406 nsp_loss : 0.6216 total_loss : 10.6623 avg_loss_step : 10.6623 learning_rate : 0.0 loss_scaler : 8388608
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:52.294803 - Iteration: 1 throughput_train : 268.289 seq/s mlm_loss : 10.0155 nsp_loss : 0.6210 total_loss : 10.6365 avg_loss_step : 10.6365 learning_rate : 0.0 loss_scaler : 4194304
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:54.193717 - Iteration: 1 throughput_train : 269.670 seq/s mlm_loss : 10.0482 nsp_loss : 0.6340 total_loss : 10.6822 avg_loss_step : 10.6822 learning_rate : 0.0 loss_scaler : 4194304
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:56.081120 - Iteration: 1 throughput_train : 271.314 seq/s mlm_loss : 10.0234 nsp_loss : 0.6274 total_loss : 10.6508 avg_loss_step : 10.6508 learning_rate : 0.0 loss_scaler : 2097152
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:57.983241 - Iteration: 1 throughput_train : 269.214 seq/s mlm_loss : 10.0021 nsp_loss : 0.6238 total_loss : 10.6259 avg_loss_step : 10.6259 learning_rate : 0.0 loss_scaler : 2097152
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:59.876384 - Iteration: 1 throughput_train : 270.492 seq/s mlm_loss : 10.0168 nsp_loss : 0.6303 total_loss : 10.6471 avg_loss_step : 10.6471 learning_rate : 0.0 loss_scaler : 1048576
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:01.776965 - Iteration: 1 throughput_train : 269.435 seq/s mlm_loss : 10.0485 nsp_loss : 0.6272 total_loss : 10.6756 avg_loss_step : 10.6756 learning_rate : 0.0 loss_scaler : 1048576
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:03.668076 - Iteration: 1 throughput_train : 270.783 seq/s mlm_loss : 10.0267 nsp_loss : 0.6193 total_loss : 10.6460 avg_loss_step : 10.6460 learning_rate : 0.0 loss_scaler : 524288
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:05.566324 - Iteration: 1 throughput_train : 269.766 seq/s mlm_loss : 10.0401 nsp_loss : 0.6316 total_loss : 10.6717 avg_loss_step : 10.6717 learning_rate : 0.0 loss_scaler : 524288
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:07.461631 - Iteration: 1 throughput_train : 270.197 seq/s mlm_loss : 10.0415 nsp_loss : 0.6244 total_loss : 10.6660 avg_loss_step : 10.6660 learning_rate : 0.0 loss_scaler : 262144
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:09.352564 - Iteration: 1 throughput_train : 270.818 seq/s mlm_loss : 10.0427 nsp_loss : 0.6339 total_loss : 10.6765 avg_loss_step : 10.6765 learning_rate : 0.0 loss_scaler : 262144
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:11.245946 - Iteration: 1 throughput_train : 270.464 seq/s mlm_loss : 10.0306 nsp_loss : 0.6332 total_loss : 10.6638 avg_loss_step : 10.6638 learning_rate : 0.0 loss_scaler : 131072
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:13.141670 - Iteration: 1 throughput_train : 270.138 seq/s mlm_loss : 10.0604 nsp_loss : 0.6395 total_loss : 10.6999 avg_loss_step : 10.6999 learning_rate : 0.0 loss_scaler : 131072
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:15.034842 - Iteration: 1 throughput_train : 270.488 seq/s mlm_loss : 10.0276 nsp_loss : 0.6205 total_loss : 10.6481 avg_loss_step : 10.6481 learning_rate : 0.0 loss_scaler : 65536
Skipping time record for 0 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:16.927021 - Iteration: 1 throughput_train : 270.631 seq/s mlm_loss : 10.0274 nsp_loss : 0.6343 total_loss : 10.6617 avg_loss_step : 10.6617 learning_rate : 0.0 loss_scaler : 65536
Skipping time record for 1 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:18.875661 - Iteration: 2 throughput_train : 262.791 seq/s mlm_loss : 10.0255 nsp_loss : 0.6242 total_loss : 10.6497 avg_loss_step : 10.6497 learning_rate : 0.0 loss_scaler : 32768
Skipping time record for 2 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:20.824668 - Iteration: 3 throughput_train : 262.741 seq/s mlm_loss : 10.0220 nsp_loss : 0.6288 total_loss : 10.6509 avg_loss_step : 10.6509 learning_rate : 4e-06 loss_scaler : 32768
Skipping time record for 3 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:22.779278 - Iteration: 4 throughput_train : 261.989 seq/s mlm_loss : 10.0407 nsp_loss : 0.6191 total_loss : 10.6598 avg_loss_step : 10.6598 learning_rate : 8e-06 loss_scaler : 32768
Skipping time record for 4 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:24.723752 - Iteration: 5 throughput_train : 263.353 seq/s mlm_loss : 10.0448 nsp_loss : 0.6121 total_loss : 10.6569 avg_loss_step : 10.6569 learning_rate : 1.2e-05 loss_scaler : 32768
Skipping time record for 5 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:26.672401 - Iteration: 6 throughput_train : 262.790 seq/s mlm_loss : 10.0324 nsp_loss : 0.5929 total_loss : 10.6253 avg_loss_step : 10.6253 learning_rate : 1.6e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:28.626130 - Iteration: 7 throughput_train : 262.100 seq/s mlm_loss : 10.0436 nsp_loss : 0.5690 total_loss : 10.6126 avg_loss_step : 10.6126 learning_rate : 2e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:30.570816 - Iteration: 8 throughput_train : 263.320 seq/s mlm_loss : 10.0315 nsp_loss : 0.5704 total_loss : 10.6018 avg_loss_step : 10.6018 learning_rate : 2.4e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:32.512938 - Iteration: 9 throughput_train : 263.668 seq/s mlm_loss : 10.0385 nsp_loss : 0.5515 total_loss : 10.5900 avg_loss_step : 10.5900 learning_rate : 2.8e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:34.459389 - Iteration: 10 throughput_train : 263.082 seq/s mlm_loss : 10.0142 nsp_loss : 0.5288 total_loss : 10.5431 avg_loss_step : 10.5431 learning_rate : 3.2e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:36.404591 - Iteration: 11 throughput_train : 263.252 seq/s mlm_loss : 10.0280 nsp_loss : 0.4976 total_loss : 10.5255 avg_loss_step : 10.5255 learning_rate : 3.6e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:38.354096 - Iteration: 12 throughput_train : 262.670 seq/s mlm_loss : 10.0271 nsp_loss : 0.4700 total_loss : 10.4971 avg_loss_step : 10.4971 learning_rate : 4e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:40.307365 - Iteration: 13 throughput_train : 262.163 seq/s mlm_loss : 10.0201 nsp_loss : 0.4389 total_loss : 10.4589 avg_loss_step : 10.4589 learning_rate : 4.4e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:42.260597 - Iteration: 14 throughput_train : 262.172 seq/s mlm_loss : 10.0028 nsp_loss : 0.4064 total_loss : 10.4092 avg_loss_step : 10.4092 learning_rate : 4.8e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:44.214010 - Iteration: 15 throughput_train : 262.143 seq/s mlm_loss : 9.9913 nsp_loss : 0.3739 total_loss : 10.3652 avg_loss_step : 10.3652 learning_rate : 5.2e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:46.161220 - Iteration: 16 throughput_train : 262.979 seq/s mlm_loss : 9.9777 nsp_loss : 0.3383 total_loss : 10.3160 avg_loss_step : 10.3160 learning_rate : 5.6e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:48.108658 - Iteration: 17 throughput_train : 262.948 seq/s mlm_loss : 10.0070 nsp_loss : 0.3076 total_loss : 10.3146 avg_loss_step : 10.3146 learning_rate : 6e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:50.056367 - Iteration: 18 throughput_train : 262.911 seq/s mlm_loss : 9.9829 nsp_loss : 0.2693 total_loss : 10.2522 avg_loss_step : 10.2522 learning_rate : 6.4e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:51.993858 - Iteration: 19 throughput_train : 264.299 seq/s mlm_loss : 9.9751 nsp_loss : 0.2352 total_loss : 10.2103 avg_loss_step : 10.2103 learning_rate : 6.8e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:53.937635 - Iteration: 20 throughput_train : 263.444 seq/s mlm_loss : 9.9623 nsp_loss : 0.2138 total_loss : 10.1761 avg_loss_step : 10.1761 learning_rate : 7.2e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:55.880048 - Iteration: 21 throughput_train : 263.629 seq/s mlm_loss : 9.9588 nsp_loss : 0.1891 total_loss : 10.1479 avg_loss_step : 10.1479 learning_rate : 7.6e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:57.826433 - Iteration: 22 throughput_train : 263.091 seq/s mlm_loss : 9.9589 nsp_loss : 0.1648 total_loss : 10.1237 avg_loss_step : 10.1237 learning_rate : 8e-05 loss_scaler : 32768
DLL 2020-11-27 03:02:59.775349 - Iteration: 23 throughput_train : 262.753 seq/s mlm_loss : 9.9276 nsp_loss : 0.1462 total_loss : 10.0738 avg_loss_step : 10.0738 learning_rate : 8.4e-05 loss_scaler : 32768
DLL 2020-11-27 03:03:01.715156 - Iteration: 24 throughput_train : 264.000 seq/s mlm_loss : 9.9098 nsp_loss : 0.1270 total_loss : 10.0368 avg_loss_step : 10.0368 learning_rate : 8.8e-05 loss_scaler : 32768
DLL 2020-11-27 03:03:03.662734 - Iteration: 25 throughput_train : 262.932 seq/s mlm_loss : 9.9122 nsp_loss : 0.1125 total_loss : 10.0247 avg_loss_step : 10.0247 learning_rate : 9.2e-05 loss_scaler : 32768
DLL 2020-11-27 03:03:05.609201 - Iteration: 26 throughput_train : 263.098 seq/s mlm_loss : 9.9003 nsp_loss : 0.0961 total_loss : 9.9964 avg_loss_step : 9.9964 learning_rate : 9.6e-05 loss_scaler : 32768
DLL 2020-11-27 03:03:07.548543 - Iteration: 27 throughput_train : 264.067 seq/s mlm_loss : 9.9011 nsp_loss : 0.0831 total_loss : 9.9842 avg_loss_step : 9.9842 learning_rate : 1e-04 loss_scaler : 32768
DLL 2020-11-27 03:03:09.491554 - Iteration: 28 throughput_train : 263.563 seq/s mlm_loss : 9.8791 nsp_loss : 0.0727 total_loss : 9.9518 avg_loss_step : 9.9518 learning_rate : 0.000104 loss_scaler : 32768
DLL 2020-11-27 03:03:11.438947 - Iteration: 29 throughput_train : 262.957 seq/s mlm_loss : 9.8622 nsp_loss : 0.0619 total_loss : 9.9241 avg_loss_step : 9.9241 learning_rate : 0.000108 loss_scaler : 32768
DLL 2020-11-27 03:03:13.390545 - Iteration: 30 throughput_train : 262.389 seq/s mlm_loss : 9.8653 nsp_loss : 0.0551 total_loss : 9.9203 avg_loss_step : 9.9203 learning_rate : 0.000112 loss_scaler : 32768
DLL 2020-11-27 03:03:15.340530 - Iteration: 31 throughput_train : 262.606 seq/s mlm_loss : 9.8029 nsp_loss : 0.0476 total_loss : 9.8505 avg_loss_step : 9.8505 learning_rate : 0.000116 loss_scaler : 32768
DLL 2020-11-27 03:03:17.289971 - Iteration: 32 throughput_train : 262.677 seq/s mlm_loss : 9.7908 nsp_loss : 0.0428 total_loss : 9.8336 avg_loss_step : 9.8336 learning_rate : 0.00012 loss_scaler : 32768
DLL 2020-11-27 03:03:19.241101 - Iteration: 33 throughput_train : 262.450 seq/s mlm_loss : 9.7904 nsp_loss : 0.0379 total_loss : 9.8283 avg_loss_step : 9.8283 learning_rate : 0.000124 loss_scaler : 32768
DLL 2020-11-27 03:03:21.181789 - Iteration: 34 throughput_train : 263.863 seq/s mlm_loss : 9.7822 nsp_loss : 0.0350 total_loss : 9.8171 avg_loss_step : 9.8171 learning_rate : 0.000128 loss_scaler : 32768
DLL 2020-11-27 03:03:23.120842 - Iteration: 35 throughput_train : 264.087 seq/s mlm_loss : 9.7403 nsp_loss : 0.0318 total_loss : 9.7721 avg_loss_step : 9.7721 learning_rate : 0.000132 loss_scaler : 32768
DLL 2020-11-27 03:03:25.062710 - Iteration: 36 throughput_train : 263.706 seq/s mlm_loss : 9.7183 nsp_loss : 0.0281 total_loss : 9.7463 avg_loss_step : 9.7463 learning_rate : 0.000136 loss_scaler : 32768
DLL 2020-11-27 03:03:27.010874 - Iteration: 37 throughput_train : 262.859 seq/s mlm_loss : 9.6493 nsp_loss : 0.0260 total_loss : 9.6753 avg_loss_step : 9.6753 learning_rate : 0.00014 loss_scaler : 32768
DLL 2020-11-27 03:03:28.957299 - Iteration: 38 throughput_train : 263.089 seq/s mlm_loss : 9.6475 nsp_loss : 0.0240 total_loss : 9.6715 avg_loss_step : 9.6715 learning_rate : 0.000144 loss_scaler : 32768
DLL 2020-11-27 03:03:30.907874 - Iteration: 39 throughput_train : 262.541 seq/s mlm_loss : 9.6494 nsp_loss : 0.0226 total_loss : 9.6720 avg_loss_step : 9.6720 learning_rate : 0.000148 loss_scaler : 32768
DLL 2020-11-27 03:03:32.857617 - Iteration: 40 throughput_train : 262.654 seq/s mlm_loss : 9.6227 nsp_loss : 0.0215 total_loss : 9.6442 avg_loss_step : 9.6442 learning_rate : 0.000152 loss_scaler : 32768
DLL 2020-11-27 03:03:34.797107 - Iteration: 41 throughput_train : 264.058 seq/s mlm_loss : 9.6086 nsp_loss : 0.0205 total_loss : 9.6292 avg_loss_step : 9.6292 learning_rate : 0.000156 loss_scaler : 32768
DLL 2020-11-27 03:03:36.743103 - Iteration: 42 throughput_train : 263.176 seq/s mlm_loss : 9.5962 nsp_loss : 0.0196 total_loss : 9.6158 avg_loss_step : 9.6158 learning_rate : 0.00016 loss_scaler : 32768
DLL 2020-11-27 03:03:38.681478 - Iteration: 43 throughput_train : 264.210 seq/s mlm_loss : 9.5094 nsp_loss : 0.0187 total_loss : 9.5281 avg_loss_step : 9.5281 learning_rate : 0.000164 loss_scaler : 32768
DLL 2020-11-27 03:03:40.627741 - Iteration: 44 throughput_train : 263.133 seq/s mlm_loss : 9.5604 nsp_loss : 0.0181 total_loss : 9.5785 avg_loss_step : 9.5785 learning_rate : 0.000168 loss_scaler : 32768
DLL 2020-11-27 03:03:42.584452 - Iteration: 45 throughput_train : 261.709 seq/s mlm_loss : 9.5284 nsp_loss : 0.0179 total_loss : 9.5463 avg_loss_step : 9.5463 learning_rate : 0.000172 loss_scaler : 32768
DLL 2020-11-27 03:03:44.531023 - Iteration: 46 throughput_train : 263.098 seq/s mlm_loss : 9.4623 nsp_loss : 0.0173 total_loss : 9.4796 avg_loss_step : 9.4796 learning_rate : 0.000176 loss_scaler : 32768
DLL 2020-11-27 03:03:46.475050 - Iteration: 47 throughput_train : 263.441 seq/s mlm_loss : 9.4430 nsp_loss : 0.0170 total_loss : 9.4600 avg_loss_step : 9.4600 learning_rate : 0.00018 loss_scaler : 32768
DLL 2020-11-27 03:03:48.420719 - Iteration: 48 throughput_train : 263.218 seq/s mlm_loss : 9.4856 nsp_loss : 0.0169 total_loss : 9.5025 avg_loss_step : 9.5025 learning_rate : 0.000184 loss_scaler : 32768
DLL 2020-11-27 03:03:50.356966 - Iteration: 49 throughput_train : 264.499 seq/s mlm_loss : 9.4995 nsp_loss : 0.0166 total_loss : 9.5161 avg_loss_step : 9.5161 learning_rate : 0.000188 loss_scaler : 32768
DLL 2020-11-27 03:03:52.294758 - Iteration: 50 throughput_train : 264.281 seq/s mlm_loss : 9.5366 nsp_loss : 0.0165 total_loss : 9.5531 avg_loss_step : 9.5531 learning_rate : 0.000192 loss_scaler : 32768
DLL 2020-11-27 03:03:54.239294 - Iteration: 51 throughput_train : 263.341 seq/s mlm_loss : 9.4012 nsp_loss : 0.0164 total_loss : 9.4176 avg_loss_step : 9.4176 learning_rate : 0.000196 loss_scaler : 32768
DLL 2020-11-27 03:03:56.183770 - Iteration: 52 throughput_train : 263.352 seq/s mlm_loss : 9.3971 nsp_loss : 0.0166 total_loss : 9.4137 avg_loss_step : 9.4137 learning_rate : 0.0002 loss_scaler : 32768
DLL 2020-11-27 03:03:58.120446 - Iteration: 53 throughput_train : 264.409 seq/s mlm_loss : 9.3740 nsp_loss : 0.0163 total_loss : 9.3903 avg_loss_step : 9.3903 learning_rate : 0.000204 loss_scaler : 32768
DLL 2020-11-27 03:04:00.070836 - Iteration: 54 throughput_train : 262.551 seq/s mlm_loss : 9.3342 nsp_loss : 0.0161 total_loss : 9.3502 avg_loss_step : 9.3502 learning_rate : 0.000208 loss_scaler : 32768
DLL 2020-11-27 03:04:02.009290 - Iteration: 55 throughput_train : 264.166 seq/s mlm_loss : 9.3070 nsp_loss : 0.0162 total_loss : 9.3232 avg_loss_step : 9.3232 learning_rate : 0.000212 loss_scaler : 32768
DLL 2020-11-27 03:04:03.965430 - Iteration: 56 throughput_train : 261.778 seq/s mlm_loss : 9.2850 nsp_loss : 0.0162 total_loss : 9.3012 avg_loss_step : 9.3012 learning_rate : 0.000216 loss_scaler : 32768
DLL 2020-11-27 03:04:05.915510 - Iteration: 57 throughput_train : 262.592 seq/s mlm_loss : 9.2349 nsp_loss : 0.0158 total_loss : 9.2507 avg_loss_step : 9.2507 learning_rate : 0.00022 loss_scaler : 32768
DLL 2020-11-27 03:04:07.867688 - Iteration: 58 throughput_train : 262.311 seq/s mlm_loss : 9.2527 nsp_loss : 0.0155 total_loss : 9.2682 avg_loss_step : 9.2682 learning_rate : 0.000224 loss_scaler : 32768
DLL 2020-11-27 03:04:09.808519 - Iteration: 59 throughput_train : 263.845 seq/s mlm_loss : 9.3086 nsp_loss : 0.0148 total_loss : 9.3234 avg_loss_step : 9.3234 learning_rate : 0.000228 loss_scaler : 32768
DLL 2020-11-27 03:04:11.749229 - Iteration: 60 throughput_train : 263.860 seq/s mlm_loss : 9.2488 nsp_loss : 0.0148 total_loss : 9.2635 avg_loss_step : 9.2635 learning_rate : 0.000232 loss_scaler : 32768
DLL 2020-11-27 03:04:13.697693 - Iteration: 61 throughput_train : 262.811 seq/s mlm_loss : 9.2955 nsp_loss : 0.0143 total_loss : 9.3097 avg_loss_step : 9.3097 learning_rate : 0.00023599999 loss_scaler : 32768
DLL 2020-11-27 03:04:15.654668 - Iteration: 62 throughput_train : 261.666 seq/s mlm_loss : 9.3006 nsp_loss : 0.0133 total_loss : 9.3139 avg_loss_step : 9.3139 learning_rate : 0.00024 loss_scaler : 32768
DLL 2020-11-27 03:04:17.602752 - Iteration: 63 throughput_train : 262.860 seq/s mlm_loss : 9.2198 nsp_loss : 0.0128 total_loss : 9.2326 avg_loss_step : 9.2326 learning_rate : 0.000244 loss_scaler : 32768
DLL 2020-11-27 03:04:19.535990 - Iteration: 64 throughput_train : 264.883 seq/s mlm_loss : 9.2287 nsp_loss : 0.0119 total_loss : 9.2405 avg_loss_step : 9.2405 learning_rate : 0.000248 loss_scaler : 32768
DLL 2020-11-27 03:04:21.482309 - Iteration: 65 throughput_train : 263.110 seq/s mlm_loss : 9.1751 nsp_loss : 0.0114 total_loss : 9.1865 avg_loss_step : 9.1865 learning_rate : 0.000252 loss_scaler : 32768
DLL 2020-11-27 03:04:23.425658 - Iteration: 66 throughput_train : 263.504 seq/s mlm_loss : 9.1630 nsp_loss : 0.0108 total_loss : 9.1737 avg_loss_step : 9.1737 learning_rate : 0.000256 loss_scaler : 32768
DLL 2020-11-27 03:04:25.366943 - Iteration: 67 throughput_train : 263.782 seq/s mlm_loss : 9.2060 nsp_loss : 0.0100 total_loss : 9.2160 avg_loss_step : 9.2160 learning_rate : 0.00026 loss_scaler : 32768
DLL 2020-11-27 03:04:27.312225 - Iteration: 68 throughput_train : 263.240 seq/s mlm_loss : 9.1642 nsp_loss : 0.0094 total_loss : 9.1737 avg_loss_step : 9.1737 learning_rate : 0.000264 loss_scaler : 32768
DLL 2020-11-27 03:04:29.264004 - Iteration: 69 throughput_train : 262.363 seq/s mlm_loss : 9.2531 nsp_loss : 0.0086 total_loss : 9.2617 avg_loss_step : 9.2617 learning_rate : 0.000268 loss_scaler : 32768
DLL 2020-11-27 03:04:31.209444 - Iteration: 70 throughput_train : 263.218 seq/s mlm_loss : 9.1726 nsp_loss : 0.0083 total_loss : 9.1808 avg_loss_step : 9.1808 learning_rate : 0.000272 loss_scaler : 32768
DLL 2020-11-27 03:04:33.161850 - Iteration: 71 throughput_train : 262.279 seq/s mlm_loss : 9.1361 nsp_loss : 0.0076 total_loss : 9.1437 avg_loss_step : 9.1437 learning_rate : 0.000276 loss_scaler : 32768
DLL 2020-11-27 03:04:35.113795 - Iteration: 72 throughput_train : 262.340 seq/s mlm_loss : 9.1495 nsp_loss : 0.0070 total_loss : 9.1565 avg_loss_step : 9.1565 learning_rate : 0.00028 loss_scaler : 32768
DLL 2020-11-27 03:04:37.067169 - Iteration: 73 throughput_train : 262.148 seq/s mlm_loss : 9.1399 nsp_loss : 0.0066 total_loss : 9.1465 avg_loss_step : 9.1465 learning_rate : 0.000284 loss_scaler : 32768
DLL 2020-11-27 03:04:39.012996 - Iteration: 74 throughput_train : 263.166 seq/s mlm_loss : 9.1689 nsp_loss : 0.0061 total_loss : 9.1750 avg_loss_step : 9.1750 learning_rate : 0.000288 loss_scaler : 32768
DLL 2020-11-27 03:04:40.958430 - Iteration: 75 throughput_train : 263.222 seq/s mlm_loss : 9.0112 nsp_loss : 0.0057 total_loss : 9.0169 avg_loss_step : 9.0169 learning_rate : 0.000292 loss_scaler : 32768
DLL 2020-11-27 03:04:42.901463 - Iteration: 76 throughput_train : 263.554 seq/s mlm_loss : 8.9815 nsp_loss : 0.0051 total_loss : 8.9866 avg_loss_step : 8.9866 learning_rate : 0.000296 loss_scaler : 32768
DLL 2020-11-27 03:04:44.846331 - Iteration: 77 throughput_train : 263.297 seq/s mlm_loss : 9.0860 nsp_loss : 0.0047 total_loss : 9.0907 avg_loss_step : 9.0907 learning_rate : 0.00029999999 loss_scaler : 32768
DLL 2020-11-27 03:04:46.791285 - Iteration: 78 throughput_train : 263.291 seq/s mlm_loss : 9.0976 nsp_loss : 0.0043 total_loss : 9.1018 avg_loss_step : 9.1018 learning_rate : 0.000304 loss_scaler : 32768
DLL 2020-11-27 03:04:48.734075 - Iteration: 79 throughput_train : 263.600 seq/s mlm_loss : 9.1228 nsp_loss : 0.0040 total_loss : 9.1268 avg_loss_step : 9.1268 learning_rate : 0.000308 loss_scaler : 32768
DLL 2020-11-27 03:04:50.680997 - Iteration: 80 throughput_train : 263.018 seq/s mlm_loss : 9.0853 nsp_loss : 0.0036 total_loss : 9.0889 avg_loss_step : 9.0889 learning_rate : 0.000312 loss_scaler : 32768
DLL 2020-11-27 03:04:52.630948 - Iteration: 81 throughput_train : 262.613 seq/s mlm_loss : 9.0399 nsp_loss : 0.0033 total_loss : 9.0432 avg_loss_step : 9.0432 learning_rate : 0.000316 loss_scaler : 32768
DLL 2020-11-27 03:04:54.577493 - Iteration: 82 throughput_train : 263.080 seq/s mlm_loss : 9.0501 nsp_loss : 0.0030 total_loss : 9.0530 avg_loss_step : 9.0530 learning_rate : 0.00032 loss_scaler : 32768
DLL 2020-11-27 03:04:56.525278 - Iteration: 83 throughput_train : 262.902 seq/s mlm_loss : 9.0453 nsp_loss : 0.0027 total_loss : 9.0480 avg_loss_step : 9.0480 learning_rate : 0.000324 loss_scaler : 32768
DLL 2020-11-27 03:04:58.468384 - Iteration: 84 throughput_train : 263.535 seq/s mlm_loss : 9.0070 nsp_loss : 0.0025 total_loss : 9.0095 avg_loss_step : 9.0095 learning_rate : 0.000328 loss_scaler : 32768
DLL 2020-11-27 03:05:00.408075 - Iteration: 85 throughput_train : 263.999 seq/s mlm_loss : 9.0142 nsp_loss : 0.0023 total_loss : 9.0165 avg_loss_step : 9.0165 learning_rate : 0.000332 loss_scaler : 32768
DLL 2020-11-27 03:05:02.363888 - Iteration: 86 throughput_train : 261.822 seq/s mlm_loss : 9.0216 nsp_loss : 0.0021 total_loss : 9.0237 avg_loss_step : 9.0237 learning_rate : 0.000336 loss_scaler : 32768
DLL 2020-11-27 03:05:04.309124 - Iteration: 87 throughput_train : 263.245 seq/s mlm_loss : 9.0332 nsp_loss : 0.0019 total_loss : 9.0351 avg_loss_step : 9.0351 learning_rate : 0.00034 loss_scaler : 32768
DLL 2020-11-27 03:05:06.254732 - Iteration: 88 throughput_train : 263.195 seq/s mlm_loss : 9.0397 nsp_loss : 0.0018 total_loss : 9.0414 avg_loss_step : 9.0414 learning_rate : 0.000344 loss_scaler : 32768
DLL 2020-11-27 03:05:08.194217 - Iteration: 89 throughput_train : 264.026 seq/s mlm_loss : 8.9267 nsp_loss : 0.0016 total_loss : 8.9283 avg_loss_step : 8.9283 learning_rate : 0.000348 loss_scaler : 32768
DLL 2020-11-27 03:05:10.135184 - Iteration: 90 throughput_train : 263.824 seq/s mlm_loss : 8.9722 nsp_loss : 0.0014 total_loss : 8.9736 avg_loss_step : 8.9736 learning_rate : 0.000352 loss_scaler : 32768
DLL 2020-11-27 03:05:12.082224 - Iteration: 91 throughput_train : 263.002 seq/s mlm_loss : 9.0116 nsp_loss : 0.0013 total_loss : 9.0129 avg_loss_step : 9.0129 learning_rate : 0.000356 loss_scaler : 32768
DLL 2020-11-27 03:05:14.028997 - Iteration: 92 throughput_train : 263.038 seq/s mlm_loss : 8.9696 nsp_loss : 0.0012 total_loss : 8.9708 avg_loss_step : 8.9708 learning_rate : 0.00036 loss_scaler : 32768
DLL 2020-11-27 03:05:15.973743 - Iteration: 93 throughput_train : 263.312 seq/s mlm_loss : 8.8364 nsp_loss : 0.0011 total_loss : 8.8375 avg_loss_step : 8.8375 learning_rate : 0.000364 loss_scaler : 32768
DLL 2020-11-27 03:05:17.934060 - Iteration: 94 throughput_train : 261.220 seq/s mlm_loss : 8.8243 nsp_loss : 0.0010 total_loss : 8.8253 avg_loss_step : 8.8253 learning_rate : 0.000368 loss_scaler : 32768
DLL 2020-11-27 03:05:19.873869 - Iteration: 95 throughput_train : 263.982 seq/s mlm_loss : 8.9199 nsp_loss : 0.0009 total_loss : 8.9208 avg_loss_step : 8.9208 learning_rate : 0.000372 loss_scaler : 32768
DLL 2020-11-27 03:05:21.825101 - Iteration: 96 throughput_train : 262.437 seq/s mlm_loss : 8.9223 nsp_loss : 0.0009 total_loss : 8.9232 avg_loss_step : 8.9232 learning_rate : 0.000376 loss_scaler : 32768
DLL 2020-11-27 03:05:23.768475 - Iteration: 97 throughput_train : 263.498 seq/s mlm_loss : 8.8848 nsp_loss : 0.0008 total_loss : 8.8856 avg_loss_step : 8.8856 learning_rate : 0.00038 loss_scaler : 32768
DLL 2020-11-27 03:05:25.719176 - Iteration: 98 throughput_train : 262.512 seq/s mlm_loss : 8.7879 nsp_loss : 0.0008 total_loss : 8.7887 avg_loss_step : 8.7887 learning_rate : 0.000384 loss_scaler : 32768
DLL 2020-11-27 03:05:27.664530 - Iteration: 99 throughput_train : 263.230 seq/s mlm_loss : 8.9027 nsp_loss : 0.0008 total_loss : 8.9034 avg_loss_step : 8.9034 learning_rate : 0.000388 loss_scaler : 32768
DLL 2020-11-27 03:05:29.611703 - Iteration: 100 throughput_train : 262.985 seq/s mlm_loss : 8.8705 nsp_loss : 0.0007 total_loss : 8.8712 avg_loss_step : 8.8712 learning_rate : 0.000392 loss_scaler : 32768
DLL 2020-11-27 03:05:31.569702 - Iteration: 101 throughput_train : 261.534 seq/s mlm_loss : 8.9104 nsp_loss : 0.0007 total_loss : 8.9112 avg_loss_step : 8.9112 learning_rate : 0.000396 loss_scaler : 32768
DLL 2020-11-27 03:05:33.508654 - Iteration: 102 throughput_train : 264.099 seq/s mlm_loss : 8.9061 nsp_loss : 0.0007 total_loss : 8.9068 avg_loss_step : 8.9068 learning_rate : 0.0003794733 loss_scaler : 32768
DLL 2020-11-27 03:05:35.457738 - Iteration: 103 throughput_train : 262.725 seq/s mlm_loss : 8.7767 nsp_loss : 0.0007 total_loss : 8.7774 avg_loss_step : 8.7774 learning_rate : 0.00037926243 loss_scaler : 32768
DLL 2020-11-27 03:05:37.407342 - Iteration: 104 throughput_train : 262.656 seq/s mlm_loss : 8.8475 nsp_loss : 0.0007 total_loss : 8.8482 avg_loss_step : 8.8482 learning_rate : 0.00037905143 loss_scaler : 32768
DLL 2020-11-27 03:05:39.357321 - Iteration: 105 throughput_train : 262.607 seq/s mlm_loss : 8.8304 nsp_loss : 0.0007 total_loss : 8.8311 avg_loss_step : 8.8311 learning_rate : 0.0003788403 loss_scaler : 32768
DLL 2020-11-27 03:05:41.301994 - Iteration: 106 throughput_train : 263.324 seq/s mlm_loss : 8.7950 nsp_loss : 0.0007 total_loss : 8.7957 avg_loss_step : 8.7957 learning_rate : 0.0003786291 loss_scaler : 32768
DLL 2020-11-27 03:05:43.252256 - Iteration: 107 throughput_train : 262.569 seq/s mlm_loss : 8.8664 nsp_loss : 0.0007 total_loss : 8.8671 avg_loss_step : 8.8671 learning_rate : 0.00037841775 loss_scaler : 32768
DLL 2020-11-27 03:05:45.209929 - Iteration: 108 throughput_train : 261.573 seq/s mlm_loss : 8.9076 nsp_loss : 0.0007 total_loss : 8.9083 avg_loss_step : 8.9083 learning_rate : 0.00037820628 loss_scaler : 32768
DLL 2020-11-27 03:05:47.177711 - Iteration: 109 throughput_train : 260.229 seq/s mlm_loss : 8.7935 nsp_loss : 0.0007 total_loss : 8.7942 avg_loss_step : 8.7942 learning_rate : 0.0003779947 loss_scaler : 32768
DLL 2020-11-27 03:05:49.127389 - Iteration: 110 throughput_train : 262.646 seq/s mlm_loss : 8.7110 nsp_loss : 0.0008 total_loss : 8.7117 avg_loss_step : 8.7117 learning_rate : 0.000377783 loss_scaler : 32768
DLL 2020-11-27 03:05:51.061484 - Iteration: 111 throughput_train : 264.765 seq/s mlm_loss : 8.6658 nsp_loss : 0.0007 total_loss : 8.6665 avg_loss_step : 8.6665 learning_rate : 0.00037757118 loss_scaler : 32768
DLL 2020-11-27 03:05:53.002014 - Iteration: 112 throughput_train : 263.885 seq/s mlm_loss : 8.8376 nsp_loss : 0.0007 total_loss : 8.8383 avg_loss_step : 8.8383 learning_rate : 0.00037735925 loss_scaler : 32768
DLL 2020-11-27 03:05:54.956762 - Iteration: 113 throughput_train : 261.965 seq/s mlm_loss : 8.8310 nsp_loss : 0.0007 total_loss : 8.8317 avg_loss_step : 8.8317 learning_rate : 0.0003771472 loss_scaler : 32768
DLL 2020-11-27 03:05:56.904382 - Iteration: 114 throughput_train : 262.923 seq/s mlm_loss : 8.7930 nsp_loss : 0.0007 total_loss : 8.7937 avg_loss_step : 8.7937 learning_rate : 0.000376935 loss_scaler : 32768
DLL 2020-11-27 03:05:58.839994 - Iteration: 115 throughput_train : 264.556 seq/s mlm_loss : 8.7929 nsp_loss : 0.0007 total_loss : 8.7936 avg_loss_step : 8.7936 learning_rate : 0.0003767227 loss_scaler : 32768
DLL 2020-11-27 03:06:00.784568 - Iteration: 116 throughput_train : 263.336 seq/s mlm_loss : 8.6613 nsp_loss : 0.0007 total_loss : 8.6619 avg_loss_step : 8.6619 learning_rate : 0.0003765103 loss_scaler : 32768
DLL 2020-11-27 03:06:02.728695 - Iteration: 117 throughput_train : 263.401 seq/s mlm_loss : 8.7185 nsp_loss : 0.0006 total_loss : 8.7191 avg_loss_step : 8.7191 learning_rate : 0.00037629774 loss_scaler : 32768
DLL 2020-11-27 03:06:04.673613 - Iteration: 118 throughput_train : 263.289 seq/s mlm_loss : 8.6652 nsp_loss : 0.0006 total_loss : 8.6658 avg_loss_step : 8.6658 learning_rate : 0.00037608508 loss_scaler : 32768
DLL 2020-11-27 03:06:06.621225 - Iteration: 119 throughput_train : 262.924 seq/s mlm_loss : 8.6610 nsp_loss : 0.0005 total_loss : 8.6616 avg_loss_step : 8.6616 learning_rate : 0.0003758723 loss_scaler : 32768
DLL 2020-11-27 03:06:08.571999 - Iteration: 120 throughput_train : 262.502 seq/s mlm_loss : 8.7262 nsp_loss : 0.0005 total_loss : 8.7267 avg_loss_step : 8.7267 learning_rate : 0.0003756594 loss_scaler : 32768
DLL 2020-11-27 03:06:10.541920 - Iteration: 121 throughput_train : 259.947 seq/s mlm_loss : 8.6293 nsp_loss : 0.0005 total_loss : 8.6298 avg_loss_step : 8.6298 learning_rate : 0.00037544637 loss_scaler : 32768
DLL 2020-11-27 03:06:12.490909 - Iteration: 122 throughput_train : 262.739 seq/s mlm_loss : 8.5872 nsp_loss : 0.0005 total_loss : 8.5877 avg_loss_step : 8.5877 learning_rate : 0.00037523327 loss_scaler : 32768
DLL 2020-11-27 03:06:14.435157 - Iteration: 123 throughput_train : 263.379 seq/s mlm_loss : 8.7120 nsp_loss : 0.0005 total_loss : 8.7124 avg_loss_step : 8.7124 learning_rate : 0.00037502 loss_scaler : 32768
DLL 2020-11-27 03:06:16.376265 - Iteration: 124 throughput_train : 263.806 seq/s mlm_loss : 8.6522 nsp_loss : 0.0004 total_loss : 8.6527 avg_loss_step : 8.6527 learning_rate : 0.0003748066 loss_scaler : 32768
DLL 2020-11-27 03:06:18.322954 - Iteration: 125 throughput_train : 263.049 seq/s mlm_loss : 8.6442 nsp_loss : 0.0004 total_loss : 8.6446 avg_loss_step : 8.6446 learning_rate : 0.0003745931 loss_scaler : 32768
DLL 2020-11-27 03:06:20.267349 - Iteration: 126 throughput_train : 263.359 seq/s mlm_loss : 8.6053 nsp_loss : 0.0004 total_loss : 8.6057 avg_loss_step : 8.6057 learning_rate : 0.00037437948 loss_scaler : 32768
DLL 2020-11-27 03:06:22.212657 - Iteration: 127 throughput_train : 263.239 seq/s mlm_loss : 8.5943 nsp_loss : 0.0004 total_loss : 8.5946 avg_loss_step : 8.5946 learning_rate : 0.00037416574 loss_scaler : 32768
DLL 2020-11-27 03:06:24.156897 - Iteration: 128 throughput_train : 263.381 seq/s mlm_loss : 8.5235 nsp_loss : 0.0004 total_loss : 8.5238 avg_loss_step : 8.5238 learning_rate : 0.00037395186 loss_scaler : 32768
DLL 2020-11-27 03:06:26.095908 - Iteration: 129 throughput_train : 264.091 seq/s mlm_loss : 8.5214 nsp_loss : 0.0003 total_loss : 8.5217 avg_loss_step : 8.5217 learning_rate : 0.0003737379 loss_scaler : 32768
DLL 2020-11-27 03:06:28.048722 - Iteration: 130 throughput_train : 262.224 seq/s mlm_loss : 8.4610 nsp_loss : 0.0003 total_loss : 8.4613 avg_loss_step : 8.4613 learning_rate : 0.00037352374 loss_scaler : 32768
DLL 2020-11-27 03:06:30.000980 - Iteration: 131 throughput_train : 262.298 seq/s mlm_loss : 8.6204 nsp_loss : 0.0003 total_loss : 8.6207 avg_loss_step : 8.6207 learning_rate : 0.0003733095 loss_scaler : 32768
DLL 2020-11-27 03:06:31.948226 - Iteration: 132 throughput_train : 262.975 seq/s mlm_loss : 8.7430 nsp_loss : 0.0003 total_loss : 8.7432 avg_loss_step : 8.7432 learning_rate : 0.00037309516 loss_scaler : 32768
DLL 2020-11-27 03:06:33.887387 - Iteration: 133 throughput_train : 264.073 seq/s mlm_loss : 8.6713 nsp_loss : 0.0003 total_loss : 8.6716 avg_loss_step : 8.6716 learning_rate : 0.00037288066 loss_scaler : 32768
DLL 2020-11-27 03:06:35.836325 - Iteration: 134 throughput_train : 262.748 seq/s mlm_loss : 8.6065 nsp_loss : 0.0003 total_loss : 8.6068 avg_loss_step : 8.6068 learning_rate : 0.00037266605 loss_scaler : 32768
DLL 2020-11-27 03:06:37.780946 - Iteration: 135 throughput_train : 263.331 seq/s mlm_loss : 8.6003 nsp_loss : 0.0002 total_loss : 8.6006 avg_loss_step : 8.6006 learning_rate : 0.00037245132 loss_scaler : 32768
DLL 2020-11-27 03:06:39.728538 - Iteration: 136 throughput_train : 262.928 seq/s mlm_loss : 8.5613 nsp_loss : 0.0002 total_loss : 8.5615 avg_loss_step : 8.5615 learning_rate : 0.00037223648 loss_scaler : 32768
DLL 2020-11-27 03:06:41.670525 - Iteration: 137 throughput_train : 263.687 seq/s mlm_loss : 8.5178 nsp_loss : 0.0002 total_loss : 8.5180 avg_loss_step : 8.5180 learning_rate : 0.0003720215 loss_scaler : 32768
DLL 2020-11-27 03:06:43.614944 - Iteration: 138 throughput_train : 263.356 seq/s mlm_loss : 8.4490 nsp_loss : 0.0002 total_loss : 8.4492 avg_loss_step : 8.4492 learning_rate : 0.00037180638 loss_scaler : 32768
DLL 2020-11-27 03:06:45.561708 - Iteration: 139 throughput_train : 263.039 seq/s mlm_loss : 8.5115 nsp_loss : 0.0002 total_loss : 8.5117 avg_loss_step : 8.5117 learning_rate : 0.00037159116 loss_scaler : 32768
DLL 2020-11-27 03:06:47.510462 - Iteration: 140 throughput_train : 262.771 seq/s mlm_loss : 8.6405 nsp_loss : 0.0002 total_loss : 8.6407 avg_loss_step : 8.6407 learning_rate : 0.00037137582 loss_scaler : 32768
DLL 2020-11-27 03:06:49.456153 - Iteration: 141 throughput_train : 263.187 seq/s mlm_loss : 8.4564 nsp_loss : 0.0002 total_loss : 8.4566 avg_loss_step : 8.4566 learning_rate : 0.00037116033 loss_scaler : 32768
DLL 2020-11-27 03:06:51.409935 - Iteration: 142 throughput_train : 262.097 seq/s mlm_loss : 8.5094 nsp_loss : 0.0002 total_loss : 8.5096 avg_loss_step : 8.5096 learning_rate : 0.00037094473 loss_scaler : 32768
DLL 2020-11-27 03:06:53.356027 - Iteration: 143 throughput_train : 263.132 seq/s mlm_loss : 8.5267 nsp_loss : 0.0002 total_loss : 8.5268 avg_loss_step : 8.5268 learning_rate : 0.000370729 loss_scaler : 32768
DLL 2020-11-27 03:06:55.302222 - Iteration: 144 throughput_train : 263.117 seq/s mlm_loss : 8.5299 nsp_loss : 0.0002 total_loss : 8.5300 avg_loss_step : 8.5300 learning_rate : 0.00037051315 loss_scaler : 32768
DLL 2020-11-27 03:06:57.234569 - Iteration: 145 throughput_train : 265.003 seq/s mlm_loss : 8.5149 nsp_loss : 0.0002 total_loss : 8.5151 avg_loss_step : 8.5151 learning_rate : 0.00037029717 loss_scaler : 32768
DLL 2020-11-27 03:06:59.181302 - Iteration: 146 throughput_train : 263.044 seq/s mlm_loss : 8.4608 nsp_loss : 0.0002 total_loss : 8.4609 avg_loss_step : 8.4609 learning_rate : 0.00037008105 loss_scaler : 32768
DLL 2020-11-27 03:07:01.123964 - Iteration: 147 throughput_train : 263.597 seq/s mlm_loss : 8.4247 nsp_loss : 0.0002 total_loss : 8.4249 avg_loss_step : 8.4249 learning_rate : 0.00036986484 loss_scaler : 32768
DLL 2020-11-27 03:07:03.071779 - Iteration: 148 throughput_train : 262.897 seq/s mlm_loss : 8.4268 nsp_loss : 0.0001 total_loss : 8.4270 avg_loss_step : 8.4270 learning_rate : 0.00036964848 loss_scaler : 32768
DLL 2020-11-27 03:07:05.015625 - Iteration: 149 throughput_train : 263.436 seq/s mlm_loss : 8.4862 nsp_loss : 0.0001 total_loss : 8.4863 avg_loss_step : 8.4863 learning_rate : 0.00036943197 loss_scaler : 32768
DLL 2020-11-27 03:07:06.962769 - Iteration: 150 throughput_train : 263.001 seq/s mlm_loss : 8.5256 nsp_loss : 0.0001 total_loss : 8.5258 avg_loss_step : 8.5258 learning_rate : 0.00036921538 loss_scaler : 32768
DLL 2020-11-27 03:07:08.906361 - Iteration: 151 throughput_train : 263.483 seq/s mlm_loss : 8.3895 nsp_loss : 0.0001 total_loss : 8.3896 avg_loss_step : 8.3896 learning_rate : 0.00036899865 loss_scaler : 32768
DLL 2020-11-27 03:07:10.853418 - Iteration: 152 throughput_train : 263.004 seq/s mlm_loss : 8.4238 nsp_loss : 0.0001 total_loss : 8.4239 avg_loss_step : 8.4239 learning_rate : 0.00036878177 loss_scaler : 32768
DLL 2020-11-27 03:07:12.796118 - Iteration: 153 throughput_train : 263.592 seq/s mlm_loss : 8.4345 nsp_loss : 0.0001 total_loss : 8.4347 avg_loss_step : 8.4347 learning_rate : 0.00036856477 loss_scaler : 32768
DLL 2020-11-27 03:07:14.743465 - Iteration: 154 throughput_train : 262.961 seq/s mlm_loss : 8.4097 nsp_loss : 0.0001 total_loss : 8.4098 avg_loss_step : 8.4098 learning_rate : 0.00036834765 loss_scaler : 32768
DLL 2020-11-27 03:07:16.688479 - Iteration: 155 throughput_train : 263.277 seq/s mlm_loss : 8.4177 nsp_loss : 0.0001 total_loss : 8.4178 avg_loss_step : 8.4178 learning_rate : 0.00036813042 loss_scaler : 32768
DLL 2020-11-27 03:07:18.638132 - Iteration: 156 throughput_train : 262.650 seq/s mlm_loss : 8.3631 nsp_loss : 0.0001 total_loss : 8.3632 avg_loss_step : 8.3632 learning_rate : 0.00036791302 loss_scaler : 32768
DLL 2020-11-27 03:07:20.583636 - Iteration: 157 throughput_train : 263.224 seq/s mlm_loss : 8.4297 nsp_loss : 0.0001 total_loss : 8.4299 avg_loss_step : 8.4299 learning_rate : 0.00036769552 loss_scaler : 32768
DLL 2020-11-27 03:07:22.521711 - Iteration: 158 throughput_train : 264.226 seq/s mlm_loss : 8.3151 nsp_loss : 0.0001 total_loss : 8.3152 avg_loss_step : 8.3152 learning_rate : 0.00036747789 loss_scaler : 32768
DLL 2020-11-27 03:07:24.468532 - Iteration: 159 throughput_train : 263.042 seq/s mlm_loss : 8.5058 nsp_loss : 0.0001 total_loss : 8.5059 avg_loss_step : 8.5059 learning_rate : 0.0003672601 loss_scaler : 32768
DLL 2020-11-27 03:07:26.420171 - Iteration: 160 throughput_train : 262.388 seq/s mlm_loss : 8.3777 nsp_loss : 0.0001 total_loss : 8.3778 avg_loss_step : 8.3778 learning_rate : 0.00036704223 loss_scaler : 32768
DLL 2020-11-27 03:07:28.362922 - Iteration: 161 throughput_train : 263.584 seq/s mlm_loss : 8.3775 nsp_loss : 0.0001 total_loss : 8.3776 avg_loss_step : 8.3776 learning_rate : 0.00036682418 loss_scaler : 32768
DLL 2020-11-27 03:07:30.311312 - Iteration: 162 throughput_train : 262.831 seq/s mlm_loss : 8.3661 nsp_loss : 0.0001 total_loss : 8.3662 avg_loss_step : 8.3662 learning_rate : 0.00036660602 loss_scaler : 32768
DLL 2020-11-27 03:07:32.261074 - Iteration: 163 throughput_train : 262.646 seq/s mlm_loss : 8.3650 nsp_loss : 0.0001 total_loss : 8.3651 avg_loss_step : 8.3651 learning_rate : 0.00036638777 loss_scaler : 32768
DLL 2020-11-27 03:07:34.197052 - Iteration: 164 throughput_train : 264.507 seq/s mlm_loss : 8.3360 nsp_loss : 0.0001 total_loss : 8.3361 avg_loss_step : 8.3361 learning_rate : 0.00036616935 loss_scaler : 32768
DLL 2020-11-27 03:07:36.141043 - Iteration: 165 throughput_train : 263.414 seq/s mlm_loss : 8.2951 nsp_loss : 0.0001 total_loss : 8.2952 avg_loss_step : 8.2952 learning_rate : 0.0003659508 loss_scaler : 32768
DLL 2020-11-27 03:07:38.082575 - Iteration: 166 throughput_train : 263.750 seq/s mlm_loss : 8.3435 nsp_loss : 0.0001 total_loss : 8.3436 avg_loss_step : 8.3436 learning_rate : 0.00036573215 loss_scaler : 32768
DLL 2020-11-27 03:07:40.025780 - Iteration: 167 throughput_train : 263.521 seq/s mlm_loss : 8.2895 nsp_loss : 0.0001 total_loss : 8.2896 avg_loss_step : 8.2896 learning_rate : 0.00036551332 loss_scaler : 32768
DLL 2020-11-27 03:07:41.969348 - Iteration: 168 throughput_train : 263.472 seq/s mlm_loss : 8.2783 nsp_loss : 0.0001 total_loss : 8.2784 avg_loss_step : 8.2784 learning_rate : 0.0003652944 loss_scaler : 32768
DLL 2020-11-27 03:07:43.912044 - Iteration: 169 throughput_train : 263.593 seq/s mlm_loss : 8.2646 nsp_loss : 0.0001 total_loss : 8.2647 avg_loss_step : 8.2647 learning_rate : 0.0003650753 loss_scaler : 32768
DLL 2020-11-27 03:07:45.853093 - Iteration: 170 throughput_train : 263.813 seq/s mlm_loss : 8.2263 nsp_loss : 0.0001 total_loss : 8.2264 avg_loss_step : 8.2264 learning_rate : 0.00036485613 loss_scaler : 32768
DLL 2020-11-27 03:07:47.799568 - Iteration: 171 throughput_train : 263.080 seq/s mlm_loss : 8.2997 nsp_loss : 0.0001 total_loss : 8.2998 avg_loss_step : 8.2998 learning_rate : 0.0003646368 loss_scaler : 32768
DLL 2020-11-27 03:07:49.750046 - Iteration: 172 throughput_train : 262.538 seq/s mlm_loss : 8.3229 nsp_loss : 0.0001 total_loss : 8.3229 avg_loss_step : 8.3229 learning_rate : 0.00036441733 loss_scaler : 32768
DLL 2020-11-27 03:07:51.694425 - Iteration: 173 throughput_train : 263.363 seq/s mlm_loss : 8.2844 nsp_loss : 0.0001 total_loss : 8.2845 avg_loss_step : 8.2845 learning_rate : 0.00036419774 loss_scaler : 32768
DLL 2020-11-27 03:07:53.638045 - Iteration: 174 throughput_train : 263.467 seq/s mlm_loss : 8.1636 nsp_loss : 0.0001 total_loss : 8.1636 avg_loss_step : 8.1636 learning_rate : 0.00036397803 loss_scaler : 32768
DLL 2020-11-27 03:07:55.574559 - Iteration: 175 throughput_train : 264.434 seq/s mlm_loss : 8.1904 nsp_loss : 0.0001 total_loss : 8.1905 avg_loss_step : 8.1905 learning_rate : 0.00036375815 loss_scaler : 32768
DLL 2020-11-27 03:07:57.513801 - Iteration: 176 throughput_train : 264.060 seq/s mlm_loss : 8.2629 nsp_loss : 0.0001 total_loss : 8.2630 avg_loss_step : 8.2630 learning_rate : 0.00036353816 loss_scaler : 32768
DLL 2020-11-27 03:07:59.482989 - Iteration: 177 throughput_train : 260.046 seq/s mlm_loss : 8.2931 nsp_loss : 0.0001 total_loss : 8.2931 avg_loss_step : 8.2931 learning_rate : 0.00036331802 loss_scaler : 32768
DLL 2020-11-27 03:08:01.445179 - Iteration: 178 throughput_train : 260.982 seq/s mlm_loss : 8.2797 nsp_loss : 0.0001 total_loss : 8.2797 avg_loss_step : 8.2797 learning_rate : 0.0003630978 loss_scaler : 32768
DLL 2020-11-27 03:08:03.392348 - Iteration: 179 throughput_train : 262.996 seq/s mlm_loss : 8.2624 nsp_loss : 0.0001 total_loss : 8.2625 avg_loss_step : 8.2625 learning_rate : 0.00036287738 loss_scaler : 32768
DLL 2020-11-27 03:08:05.336201 - Iteration: 180 throughput_train : 263.445 seq/s mlm_loss : 8.1726 nsp_loss : 0.0001 total_loss : 8.1726 avg_loss_step : 8.1726 learning_rate : 0.00036265687 loss_scaler : 32768
DLL 2020-11-27 03:08:07.286192 - Iteration: 181 throughput_train : 262.604 seq/s mlm_loss : 8.2768 nsp_loss : 0.0001 total_loss : 8.2768 avg_loss_step : 8.2768 learning_rate : 0.0003624362 loss_scaler : 32768
DLL 2020-11-27 03:08:09.231375 - Iteration: 182 throughput_train : 263.253 seq/s mlm_loss : 8.1903 nsp_loss : 0.0001 total_loss : 8.1903 avg_loss_step : 8.1903 learning_rate : 0.0003622154 loss_scaler : 32768
DLL 2020-11-27 03:08:11.179220 - Iteration: 183 throughput_train : 262.898 seq/s mlm_loss : 8.1683 nsp_loss : 0.0001 total_loss : 8.1683 avg_loss_step : 8.1683 learning_rate : 0.00036199446 loss_scaler : 32768
DLL 2020-11-27 03:08:13.119181 - Iteration: 184 throughput_train : 263.962 seq/s mlm_loss : 8.2485 nsp_loss : 0.0001 total_loss : 8.2485 avg_loss_step : 8.2485 learning_rate : 0.0003617734 loss_scaler : 32768
DLL 2020-11-27 03:08:15.063698 - Iteration: 185 throughput_train : 263.343 seq/s mlm_loss : 8.1498 nsp_loss : 0.0001 total_loss : 8.1499 avg_loss_step : 8.1499 learning_rate : 0.0003615522 loss_scaler : 32768
DLL 2020-11-27 03:08:17.016611 - Iteration: 186 throughput_train : 262.213 seq/s mlm_loss : 8.1920 nsp_loss : 0.0000 total_loss : 8.1920 avg_loss_step : 8.1920 learning_rate : 0.00036133087 loss_scaler : 32768
DLL 2020-11-27 03:08:18.960383 - Iteration: 187 throughput_train : 263.444 seq/s mlm_loss : 8.0945 nsp_loss : 0.0000 total_loss : 8.0946 avg_loss_step : 8.0946 learning_rate : 0.00036110939 loss_scaler : 32768
DLL 2020-11-27 03:08:20.907984 - Iteration: 188 throughput_train : 262.927 seq/s mlm_loss : 8.1721 nsp_loss : 0.0000 total_loss : 8.1721 avg_loss_step : 8.1721 learning_rate : 0.0003608878 loss_scaler : 32768
DLL 2020-11-27 03:08:22.841622 - Iteration: 189 throughput_train : 264.825 seq/s mlm_loss : 8.1278 nsp_loss : 0.0000 total_loss : 8.1279 avg_loss_step : 8.1279 learning_rate : 0.00036066602 loss_scaler : 32768
DLL 2020-11-27 03:08:24.788764 - Iteration: 190 throughput_train : 262.987 seq/s mlm_loss : 8.1138 nsp_loss : 0.0000 total_loss : 8.1138 avg_loss_step : 8.1138 learning_rate : 0.00036044416 loss_scaler : 32768
DLL 2020-11-27 03:08:26.737418 - Iteration: 191 throughput_train : 262.784 seq/s mlm_loss : 8.1500 nsp_loss : 0.0000 total_loss : 8.1500 avg_loss_step : 8.1500 learning_rate : 0.00036022213 loss_scaler : 32768
DLL 2020-11-27 03:08:28.697663 - Iteration: 192 throughput_train : 261.235 seq/s mlm_loss : 8.1091 nsp_loss : 0.0000 total_loss : 8.1091 avg_loss_step : 8.1091 learning_rate : 0.00035999998 loss_scaler : 32768
DLL 2020-11-27 03:08:30.642138 - Iteration: 193 throughput_train : 263.359 seq/s mlm_loss : 8.1852 nsp_loss : 0.0000 total_loss : 8.1852 avg_loss_step : 8.1852 learning_rate : 0.0003597777 loss_scaler : 32768
DLL 2020-11-27 03:08:32.593501 - Iteration: 194 throughput_train : 262.428 seq/s mlm_loss : 8.1016 nsp_loss : 0.0000 total_loss : 8.1016 avg_loss_step : 8.1016 learning_rate : 0.00035955527 loss_scaler : 32768
DLL 2020-11-27 03:08:34.535703 - Iteration: 195 throughput_train : 263.659 seq/s mlm_loss : 8.1958 nsp_loss : 0.0000 total_loss : 8.1959 avg_loss_step : 8.1959 learning_rate : 0.00035933268 loss_scaler : 32768
DLL 2020-11-27 03:08:36.483789 - Iteration: 196 throughput_train : 262.860 seq/s mlm_loss : 8.0895 nsp_loss : 0.0000 total_loss : 8.0896 avg_loss_step : 8.0896 learning_rate : 0.00035911 loss_scaler : 32768
DLL 2020-11-27 03:08:38.428640 - Iteration: 197 throughput_train : 263.297 seq/s mlm_loss : 8.1169 nsp_loss : 0.0000 total_loss : 8.1169 avg_loss_step : 8.1169 learning_rate : 0.00035888716 loss_scaler : 32768
DLL 2020-11-27 03:08:40.370687 - Iteration: 198 throughput_train : 263.678 seq/s mlm_loss : 8.0807 nsp_loss : 0.0000 total_loss : 8.0808 avg_loss_step : 8.0808 learning_rate : 0.0003586642 loss_scaler : 32768
DLL 2020-11-27 03:08:42.310836 - Iteration: 199 throughput_train : 263.936 seq/s mlm_loss : 8.1241 nsp_loss : 0.0000 total_loss : 8.1241 avg_loss_step : 8.1241 learning_rate : 0.00035844106 loss_scaler : 32768
DLL 2020-11-27 03:08:44.260879 - Iteration: 200 throughput_train : 262.597 seq/s mlm_loss : 8.0980 nsp_loss : 0.0000 total_loss : 8.0980 avg_loss_step : 8.0980 learning_rate : 0.0003582178 loss_scaler : 32768
DLL 2020-11-27 03:08:46.211101 - Iteration: 201 throughput_train : 262.574 seq/s mlm_loss : 8.0260 nsp_loss : 0.0000 total_loss : 8.0260 avg_loss_step : 8.0260 learning_rate : 0.0003579944 loss_scaler : 32768
DLL 2020-11-27 03:08:48.163125 - Iteration: 202 throughput_train : 262.337 seq/s mlm_loss : 8.0189 nsp_loss : 0.0000 total_loss : 8.0189 avg_loss_step : 8.0189 learning_rate : 0.00035777086 loss_scaler : 32768
DLL 2020-11-27 03:08:50.101095 - Iteration: 203 throughput_train : 264.244 seq/s mlm_loss : 7.9208 nsp_loss : 0.0000 total_loss : 7.9209 avg_loss_step : 7.9209 learning_rate : 0.0003575472 loss_scaler : 32768
DLL 2020-11-27 03:08:52.045706 - Iteration: 204 throughput_train : 263.332 seq/s mlm_loss : 8.0649 nsp_loss : 0.0000 total_loss : 8.0649 avg_loss_step : 8.0649 learning_rate : 0.00035732338 loss_scaler : 32768
DLL 2020-11-27 03:08:53.998160 - Iteration: 205 throughput_train : 262.273 seq/s mlm_loss : 8.1221 nsp_loss : 0.0000 total_loss : 8.1222 avg_loss_step : 8.1222 learning_rate : 0.0003570994 loss_scaler : 32768
DLL 2020-11-27 03:08:55.941833 - Iteration: 206 throughput_train : 263.458 seq/s mlm_loss : 7.9739 nsp_loss : 0.0000 total_loss : 7.9740 avg_loss_step : 7.9740 learning_rate : 0.0003568753 loss_scaler : 32768
DLL 2020-11-27 03:08:57.888539 - Iteration: 207 throughput_train : 263.048 seq/s mlm_loss : 7.9126 nsp_loss : 0.0000 total_loss : 7.9127 avg_loss_step : 7.9127 learning_rate : 0.0003566511 loss_scaler : 32768
DLL 2020-11-27 03:08:59.816639 - Iteration: 208 throughput_train : 265.588 seq/s mlm_loss : 8.0615 nsp_loss : 0.0000 total_loss : 8.0616 avg_loss_step : 8.0616 learning_rate : 0.0003564267 loss_scaler : 32768
DLL 2020-11-27 03:09:01.767805 - Iteration: 209 throughput_train : 262.445 seq/s mlm_loss : 8.1732 nsp_loss : 0.0000 total_loss : 8.1732 avg_loss_step : 8.1732 learning_rate : 0.0003562022 loss_scaler : 32768
DLL 2020-11-27 03:09:03.698268 - Iteration: 210 throughput_train : 265.260 seq/s mlm_loss : 8.0377 nsp_loss : 0.0000 total_loss : 8.0378 avg_loss_step : 8.0378 learning_rate : 0.00035597754 loss_scaler : 32768
DLL 2020-11-27 03:09:05.647353 - Iteration: 211 throughput_train : 262.726 seq/s mlm_loss : 8.0882 nsp_loss : 0.0000 total_loss : 8.0882 avg_loss_step : 8.0882 learning_rate : 0.0003557527 loss_scaler : 32768
DLL 2020-11-27 03:09:07.588237 - Iteration: 212 throughput_train : 263.839 seq/s mlm_loss : 8.0901 nsp_loss : 0.0000 total_loss : 8.0901 avg_loss_step : 8.0901 learning_rate : 0.00035552774 loss_scaler : 32768
DLL 2020-11-27 03:09:09.529088 - Iteration: 213 throughput_train : 263.841 seq/s mlm_loss : 8.0172 nsp_loss : 0.0000 total_loss : 8.0173 avg_loss_step : 8.0173 learning_rate : 0.00035530268 loss_scaler : 32768
DLL 2020-11-27 03:09:11.479403 - Iteration: 214 throughput_train : 262.561 seq/s mlm_loss : 8.0245 nsp_loss : 0.0000 total_loss : 8.0246 avg_loss_step : 8.0246 learning_rate : 0.00035507744 loss_scaler : 32768
DLL 2020-11-27 03:09:13.430404 - Iteration: 215 throughput_train : 262.469 seq/s mlm_loss : 7.9356 nsp_loss : 0.0000 total_loss : 7.9357 avg_loss_step : 7.9357 learning_rate : 0.00035485206 loss_scaler : 32768
DLL 2020-11-27 03:09:15.389328 - Iteration: 216 throughput_train : 261.406 seq/s mlm_loss : 7.9099 nsp_loss : 0.0000 total_loss : 7.9099 avg_loss_step : 7.9099 learning_rate : 0.00035462657 loss_scaler : 32768
DLL 2020-11-27 03:09:17.341977 - Iteration: 217 throughput_train : 262.247 seq/s mlm_loss : 7.9273 nsp_loss : 0.0000 total_loss : 7.9273 avg_loss_step : 7.9273 learning_rate : 0.0003544009 loss_scaler : 32768
DLL 2020-11-27 03:09:19.286963 - Iteration: 218 throughput_train : 263.280 seq/s mlm_loss : 8.0235 nsp_loss : 0.0000 total_loss : 8.0235 avg_loss_step : 8.0235 learning_rate : 0.00035417508 loss_scaler : 32768
DLL 2020-11-27 03:09:21.223025 - Iteration: 219 throughput_train : 264.496 seq/s mlm_loss : 7.8798 nsp_loss : 0.0000 total_loss : 7.8798 avg_loss_step : 7.8798 learning_rate : 0.00035394914 loss_scaler : 32768
DLL 2020-11-27 03:09:23.169923 - Iteration: 220 throughput_train : 263.022 seq/s mlm_loss : 7.9186 nsp_loss : 0.0000 total_loss : 7.9186 avg_loss_step : 7.9186 learning_rate : 0.00035372304 loss_scaler : 32768
DLL 2020-11-27 03:09:25.112939 - Iteration: 221 throughput_train : 263.548 seq/s mlm_loss : 7.8740 nsp_loss : 0.0000 total_loss : 7.8741 avg_loss_step : 7.8741 learning_rate : 0.0003534968 loss_scaler : 32768
DLL 2020-11-27 03:09:27.060205 - Iteration: 222 throughput_train : 262.972 seq/s mlm_loss : 7.8641 nsp_loss : 0.0000 total_loss : 7.8641 avg_loss_step : 7.8641 learning_rate : 0.0003532704 loss_scaler : 32768
DLL 2020-11-27 03:09:29.003529 - Iteration: 223 throughput_train : 263.507 seq/s mlm_loss : 8.0296 nsp_loss : 0.0000 total_loss : 8.0296 avg_loss_step : 8.0296 learning_rate : 0.0003530439 loss_scaler : 32768
DLL 2020-11-27 03:09:30.950318 - Iteration: 224 throughput_train : 263.036 seq/s mlm_loss : 7.9842 nsp_loss : 0.0000 total_loss : 7.9842 avg_loss_step : 7.9842 learning_rate : 0.0003528172 loss_scaler : 32768
DLL 2020-11-27 03:09:32.911519 - Iteration: 225 throughput_train : 261.105 seq/s mlm_loss : 7.8753 nsp_loss : 0.0000 total_loss : 7.8753 avg_loss_step : 7.8753 learning_rate : 0.0003525904 loss_scaler : 32768
DLL 2020-11-27 03:09:34.856809 - Iteration: 226 throughput_train : 263.238 seq/s mlm_loss : 8.0229 nsp_loss : 0.0000 total_loss : 8.0229 avg_loss_step : 8.0229 learning_rate : 0.00035236342 loss_scaler : 32768
DLL 2020-11-27 03:09:36.808877 - Iteration: 227 throughput_train : 262.324 seq/s mlm_loss : 7.8167 nsp_loss : 0.0000 total_loss : 7.8167 avg_loss_step : 7.8167 learning_rate : 0.00035213633 loss_scaler : 32768
DLL 2020-11-27 03:09:38.750480 - Iteration: 228 throughput_train : 263.739 seq/s mlm_loss : 7.8891 nsp_loss : 0.0000 total_loss : 7.8891 avg_loss_step : 7.8891 learning_rate : 0.00035190905 loss_scaler : 32768
DLL 2020-11-27 03:09:40.700379 - Iteration: 229 throughput_train : 262.619 seq/s mlm_loss : 7.8387 nsp_loss : 0.0000 total_loss : 7.8387 avg_loss_step : 7.8387 learning_rate : 0.00035168167 loss_scaler : 32768
DLL 2020-11-27 03:09:42.638737 - Iteration: 230 throughput_train : 264.186 seq/s mlm_loss : 7.9388 nsp_loss : 0.0000 total_loss : 7.9389 avg_loss_step : 7.9389 learning_rate : 0.0003514541 loss_scaler : 32768
DLL 2020-11-27 03:09:44.587679 - Iteration: 231 throughput_train : 262.745 seq/s mlm_loss : 7.8525 nsp_loss : 0.0000 total_loss : 7.8525 avg_loss_step : 7.8525 learning_rate : 0.00035122642 loss_scaler : 32768
DLL 2020-11-27 03:09:46.537407 - Iteration: 232 throughput_train : 262.640 seq/s mlm_loss : 7.8987 nsp_loss : 0.0000 total_loss : 7.8987 avg_loss_step : 7.8987 learning_rate : 0.00035099857 loss_scaler : 32768
DLL 2020-11-27 03:09:48.482687 - Iteration: 233 throughput_train : 263.242 seq/s mlm_loss : 7.9447 nsp_loss : 0.0000 total_loss : 7.9447 avg_loss_step : 7.9447 learning_rate : 0.00035077057 loss_scaler : 32768
DLL 2020-11-27 03:09:50.431191 - Iteration: 234 throughput_train : 262.805 seq/s mlm_loss : 7.8990 nsp_loss : 0.0000 total_loss : 7.8990 avg_loss_step : 7.8990 learning_rate : 0.00035054245 loss_scaler : 32768
DLL 2020-11-27 03:09:52.370179 - Iteration: 235 throughput_train : 264.095 seq/s mlm_loss : 7.7710 nsp_loss : 0.0000 total_loss : 7.7710 avg_loss_step : 7.7710 learning_rate : 0.00035031413 loss_scaler : 32768
DLL 2020-11-27 03:09:54.325579 - Iteration: 236 throughput_train : 261.878 seq/s mlm_loss : 7.8712 nsp_loss : 0.0000 total_loss : 7.8712 avg_loss_step : 7.8712 learning_rate : 0.00035008567 loss_scaler : 32768
DLL 2020-11-27 03:09:56.283402 - Iteration: 237 throughput_train : 261.553 seq/s mlm_loss : 7.8356 nsp_loss : 0.0000 total_loss : 7.8356 avg_loss_step : 7.8356 learning_rate : 0.00034985712 loss_scaler : 32768
DLL 2020-11-27 03:09:58.229327 - Iteration: 238 throughput_train : 263.155 seq/s mlm_loss : 7.8350 nsp_loss : 0.0000 total_loss : 7.8350 avg_loss_step : 7.8350 learning_rate : 0.00034962836 loss_scaler : 32768
DLL 2020-11-27 03:10:00.172347 - Iteration: 239 throughput_train : 263.547 seq/s mlm_loss : 7.9478 nsp_loss : 0.0000 total_loss : 7.9479 avg_loss_step : 7.9479 learning_rate : 0.0003493995 loss_scaler : 32768
DLL 2020-11-27 03:10:02.114698 - Iteration: 240 throughput_train : 263.644 seq/s mlm_loss : 7.8153 nsp_loss : 0.0000 total_loss : 7.8153 avg_loss_step : 7.8153 learning_rate : 0.00034917044 loss_scaler : 32768
DLL 2020-11-27 03:10:04.062529 - Iteration: 241 throughput_train : 262.896 seq/s mlm_loss : 7.8213 nsp_loss : 0.0000 total_loss : 7.8214 avg_loss_step : 7.8214 learning_rate : 0.00034894125 loss_scaler : 32768
DLL 2020-11-27 03:10:06.010827 - Iteration: 242 throughput_train : 262.832 seq/s mlm_loss : 7.7995 nsp_loss : 0.0000 total_loss : 7.7995 avg_loss_step : 7.7995 learning_rate : 0.0003487119 loss_scaler : 32768
DLL 2020-11-27 03:10:07.946752 - Iteration: 243 throughput_train : 264.514 seq/s mlm_loss : 7.7922 nsp_loss : 0.0000 total_loss : 7.7922 avg_loss_step : 7.7922 learning_rate : 0.0003484824 loss_scaler : 32768
DLL 2020-11-27 03:10:09.885653 - Iteration: 244 throughput_train : 264.109 seq/s mlm_loss : 7.8098 nsp_loss : 0.0000 total_loss : 7.8098 avg_loss_step : 7.8098 learning_rate : 0.0003482528 loss_scaler : 32768
DLL 2020-11-27 03:10:11.819470 - Iteration: 245 throughput_train : 264.805 seq/s mlm_loss : 7.7841 nsp_loss : 0.0000 total_loss : 7.7841 avg_loss_step : 7.7841 learning_rate : 0.00034802296 loss_scaler : 32768
DLL 2020-11-27 03:10:13.757695 - Iteration: 246 throughput_train : 264.199 seq/s mlm_loss : 7.8663 nsp_loss : 0.0000 total_loss : 7.8663 avg_loss_step : 7.8663 learning_rate : 0.00034779301 loss_scaler : 32768
DLL 2020-11-27 03:10:15.704388 - Iteration: 247 throughput_train : 263.049 seq/s mlm_loss : 7.7867 nsp_loss : 0.0000 total_loss : 7.7867 avg_loss_step : 7.7867 learning_rate : 0.00034756292 loss_scaler : 32768
DLL 2020-11-27 03:10:17.656853 - Iteration: 248 throughput_train : 262.273 seq/s mlm_loss : 7.7180 nsp_loss : 0.0000 total_loss : 7.7180 avg_loss_step : 7.7180 learning_rate : 0.00034733268 loss_scaler : 32768
DLL 2020-11-27 03:10:19.604249 - Iteration: 249 throughput_train : 262.955 seq/s mlm_loss : 7.7878 nsp_loss : 0.0000 total_loss : 7.7878 avg_loss_step : 7.7878 learning_rate : 0.00034710226 loss_scaler : 32768
DLL 2020-11-27 03:10:21.546382 - Iteration: 250 throughput_train : 263.667 seq/s mlm_loss : 7.7676 nsp_loss : 0.0000 total_loss : 7.7676 avg_loss_step : 7.7676 learning_rate : 0.00034687173 loss_scaler : 32768
DLL 2020-11-27 03:10:23.486447 - Iteration: 251 throughput_train : 263.950 seq/s mlm_loss : 7.9536 nsp_loss : 0.0000 total_loss : 7.9536 avg_loss_step : 7.9536 learning_rate : 0.000346641 loss_scaler : 32768
DLL 2020-11-27 03:10:25.442185 - Iteration: 252 throughput_train : 261.837 seq/s mlm_loss : 7.8250 nsp_loss : 0.0000 total_loss : 7.8250 avg_loss_step : 7.8250 learning_rate : 0.00034641015 loss_scaler : 32768
DLL 2020-11-27 03:10:27.382303 - Iteration: 253 throughput_train : 263.941 seq/s mlm_loss : 7.7938 nsp_loss : 0.0000 total_loss : 7.7938 avg_loss_step : 7.7938 learning_rate : 0.00034617912 loss_scaler : 32768
DLL 2020-11-27 03:10:29.337095 - Iteration: 254 throughput_train : 261.959 seq/s mlm_loss : 7.7801 nsp_loss : 0.0000 total_loss : 7.7801 avg_loss_step : 7.7801 learning_rate : 0.00034594798 loss_scaler : 32768
DLL 2020-11-27 03:10:31.274122 - Iteration: 255 throughput_train : 264.362 seq/s mlm_loss : 7.7076 nsp_loss : 0.0000 total_loss : 7.7076 avg_loss_step : 7.7076 learning_rate : 0.00034571663 loss_scaler : 32768
DLL 2020-11-27 03:10:33.218959 - Iteration: 256 throughput_train : 263.300 seq/s mlm_loss : 7.8528 nsp_loss : 0.0000 total_loss : 7.8528 avg_loss_step : 7.8528 learning_rate : 0.00034548517 loss_scaler : 32768
DLL 2020-11-27 03:10:35.166403 - Iteration: 257 throughput_train : 262.947 seq/s mlm_loss : 7.7599 nsp_loss : 0.0000 total_loss : 7.7599 avg_loss_step : 7.7599 learning_rate : 0.00034525353 loss_scaler : 32768
DLL 2020-11-27 03:10:37.113473 - Iteration: 258 throughput_train : 262.998 seq/s mlm_loss : 7.7255 nsp_loss : 0.0000 total_loss : 7.7255 avg_loss_step : 7.7255 learning_rate : 0.00034502172 loss_scaler : 32768
DLL 2020-11-27 03:10:39.069940 - Iteration: 259 throughput_train : 261.735 seq/s mlm_loss : 7.6998 nsp_loss : 0.0000 total_loss : 7.6998 avg_loss_step : 7.6998 learning_rate : 0.0003447898 loss_scaler : 32768
DLL 2020-11-27 03:10:41.015847 - Iteration: 260 throughput_train : 263.155 seq/s mlm_loss : 7.7650 nsp_loss : 0.0000 total_loss : 7.7650 avg_loss_step : 7.7650 learning_rate : 0.0003445577 loss_scaler : 32768
DLL 2020-11-27 03:10:42.956601 - Iteration: 261 throughput_train : 263.853 seq/s mlm_loss : 7.8025 nsp_loss : 0.0000 total_loss : 7.8025 avg_loss_step : 7.8025 learning_rate : 0.0003443254 loss_scaler : 32768
DLL 2020-11-27 03:10:44.902363 - Iteration: 262 throughput_train : 263.173 seq/s mlm_loss : 7.7155 nsp_loss : 0.0000 total_loss : 7.7155 avg_loss_step : 7.7155 learning_rate : 0.00034409302 loss_scaler : 32768
DLL 2020-11-27 03:10:46.835082 - Iteration: 263 throughput_train : 264.953 seq/s mlm_loss : 7.6332 nsp_loss : 0.0000 total_loss : 7.6332 avg_loss_step : 7.6332 learning_rate : 0.00034386042 loss_scaler : 32768
DLL 2020-11-27 03:10:48.781628 - Iteration: 264 throughput_train : 263.069 seq/s mlm_loss : 7.6907 nsp_loss : 0.0000 total_loss : 7.6907 avg_loss_step : 7.6907 learning_rate : 0.00034362767 loss_scaler : 32768
DLL 2020-11-27 03:10:50.729578 - Iteration: 265 throughput_train : 262.880 seq/s mlm_loss : 7.7665 nsp_loss : 0.0000 total_loss : 7.7665 avg_loss_step : 7.7665 learning_rate : 0.0003433948 loss_scaler : 32768
DLL 2020-11-27 03:10:52.671405 - Iteration: 266 throughput_train : 263.708 seq/s mlm_loss : 7.9586 nsp_loss : 0.0000 total_loss : 7.9586 avg_loss_step : 7.9586 learning_rate : 0.00034316175 loss_scaler : 32768
DLL 2020-11-27 03:10:54.620939 - Iteration: 267 throughput_train : 262.667 seq/s mlm_loss : 7.6711 nsp_loss : 0.0000 total_loss : 7.6711 avg_loss_step : 7.6711 learning_rate : 0.00034292857 loss_scaler : 32768
DLL 2020-11-27 03:10:56.561886 - Iteration: 268 throughput_train : 263.828 seq/s mlm_loss : 7.5332 nsp_loss : 0.0000 total_loss : 7.5332 avg_loss_step : 7.5332 learning_rate : 0.0003426952 loss_scaler : 32768
DLL 2020-11-27 03:10:58.503822 - Iteration: 269 throughput_train : 263.694 seq/s mlm_loss : 7.6867 nsp_loss : 0.0000 total_loss : 7.6867 avg_loss_step : 7.6867 learning_rate : 0.00034246166 loss_scaler : 32768
DLL 2020-11-27 03:11:00.450943 - Iteration: 270 throughput_train : 262.991 seq/s mlm_loss : 7.5312 nsp_loss : 0.0000 total_loss : 7.5312 avg_loss_step : 7.5312 learning_rate : 0.00034222798 loss_scaler : 32768
DLL 2020-11-27 03:11:02.403755 - Iteration: 271 throughput_train : 262.224 seq/s mlm_loss : 7.6066 nsp_loss : 0.0000 total_loss : 7.6067 avg_loss_step : 7.6067 learning_rate : 0.00034199413 loss_scaler : 32768
DLL 2020-11-27 03:11:04.348764 - Iteration: 272 throughput_train : 263.276 seq/s mlm_loss : 7.6654 nsp_loss : 0.0000 total_loss : 7.6654 avg_loss_step : 7.6654 learning_rate : 0.00034176014 loss_scaler : 32768
DLL 2020-11-27 03:11:06.297751 - Iteration: 273 throughput_train : 262.739 seq/s mlm_loss : 7.6799 nsp_loss : 0.0000 total_loss : 7.6799 avg_loss_step : 7.6799 learning_rate : 0.00034152597 loss_scaler : 32768
DLL 2020-11-27 03:11:08.233250 - Iteration: 274 throughput_train : 264.573 seq/s mlm_loss : 7.6734 nsp_loss : 0.0000 total_loss : 7.6734 avg_loss_step : 7.6734 learning_rate : 0.00034129166 loss_scaler : 32768
DLL 2020-11-27 03:11:10.176786 - Iteration: 275 throughput_train : 263.476 seq/s mlm_loss : 7.6108 nsp_loss : 0.0000 total_loss : 7.6108 avg_loss_step : 7.6108 learning_rate : 0.00034105717 loss_scaler : 32768
DLL 2020-11-27 03:11:12.136437 - Iteration: 276 throughput_train : 261.309 seq/s mlm_loss : 7.6594 nsp_loss : 0.0000 total_loss : 7.6594 avg_loss_step : 7.6594 learning_rate : 0.00034082253 loss_scaler : 32768
DLL 2020-11-27 03:11:14.082201 - Iteration: 277 throughput_train : 263.176 seq/s mlm_loss : 7.5897 nsp_loss : 0.0000 total_loss : 7.5897 avg_loss_step : 7.5897 learning_rate : 0.00034058772 loss_scaler : 32768
DLL 2020-11-27 03:11:16.035464 - Iteration: 278 throughput_train : 262.165 seq/s mlm_loss : 7.7030 nsp_loss : 0.0000 total_loss : 7.7030 avg_loss_step : 7.7030 learning_rate : 0.00034035274 loss_scaler : 32768
DLL 2020-11-27 03:11:17.992439 - Iteration: 279 throughput_train : 261.668 seq/s mlm_loss : 7.6785 nsp_loss : 0.0000 total_loss : 7.6785 avg_loss_step : 7.6785 learning_rate : 0.0003401176 loss_scaler : 32768
DLL 2020-11-27 03:11:19.943323 - Iteration: 280 throughput_train : 262.485 seq/s mlm_loss : 7.7485 nsp_loss : 0.0000 total_loss : 7.7485 avg_loss_step : 7.7485 learning_rate : 0.00033988233 loss_scaler : 32768
DLL 2020-11-27 03:11:21.894416 - Iteration: 281 throughput_train : 262.457 seq/s mlm_loss : 7.6150 nsp_loss : 0.0000 total_loss : 7.6150 avg_loss_step : 7.6150 learning_rate : 0.00033964685 loss_scaler : 32768
DLL 2020-11-27 03:11:23.842499 - Iteration: 282 throughput_train : 262.863 seq/s mlm_loss : 7.5638 nsp_loss : 0.0000 total_loss : 7.5638 avg_loss_step : 7.5638 learning_rate : 0.00033941126 loss_scaler : 32768
DLL 2020-11-27 03:11:25.780875 - Iteration: 283 throughput_train : 264.179 seq/s mlm_loss : 7.5457 nsp_loss : 0.0000 total_loss : 7.5457 avg_loss_step : 7.5457 learning_rate : 0.00033917546 loss_scaler : 32768
DLL 2020-11-27 03:11:27.724616 - Iteration: 284 throughput_train : 263.461 seq/s mlm_loss : 7.6641 nsp_loss : 0.0000 total_loss : 7.6641 avg_loss_step : 7.6641 learning_rate : 0.0003389395 loss_scaler : 32768
DLL 2020-11-27 03:11:29.666251 - Iteration: 285 throughput_train : 263.758 seq/s mlm_loss : 7.6784 nsp_loss : 0.0000 total_loss : 7.6784 avg_loss_step : 7.6784 learning_rate : 0.00033870342 loss_scaler : 32768
DLL 2020-11-27 03:11:31.614474 - Iteration: 286 throughput_train : 262.844 seq/s mlm_loss : 7.5921 nsp_loss : 0.0000 total_loss : 7.5921 avg_loss_step : 7.5921 learning_rate : 0.0003384671 loss_scaler : 32768
DLL 2020-11-27 03:11:33.559413 - Iteration: 287 throughput_train : 263.288 seq/s mlm_loss : 7.4688 nsp_loss : 0.0000 total_loss : 7.4688 avg_loss_step : 7.4688 learning_rate : 0.00033823066 loss_scaler : 32768
DLL 2020-11-27 03:11:35.500978 - Iteration: 288 throughput_train : 263.746 seq/s mlm_loss : 7.5700 nsp_loss : 0.0000 total_loss : 7.5700 avg_loss_step : 7.5700 learning_rate : 0.00033799408 loss_scaler : 32768
DLL 2020-11-27 03:11:37.447423 - Iteration: 289 throughput_train : 263.083 seq/s mlm_loss : 7.5455 nsp_loss : 0.0000 total_loss : 7.5455 avg_loss_step : 7.5455 learning_rate : 0.0003377573 loss_scaler : 32768
DLL 2020-11-27 03:11:39.397945 - Iteration: 290 throughput_train : 262.533 seq/s mlm_loss : 7.6285 nsp_loss : 0.0000 total_loss : 7.6285 avg_loss_step : 7.6285 learning_rate : 0.00033752035 loss_scaler : 32768
DLL 2020-11-27 03:11:41.345427 - Iteration: 291 throughput_train : 262.943 seq/s mlm_loss : 7.5731 nsp_loss : 0.0000 total_loss : 7.5731 avg_loss_step : 7.5731 learning_rate : 0.00033728324 loss_scaler : 32768
DLL 2020-11-27 03:11:43.297010 - Iteration: 292 throughput_train : 262.390 seq/s mlm_loss : 7.5067 nsp_loss : 0.0000 total_loss : 7.5067 avg_loss_step : 7.5067 learning_rate : 0.00033704596 loss_scaler : 32768
DLL 2020-11-27 03:11:45.240539 - Iteration: 293 throughput_train : 263.479 seq/s mlm_loss : 7.6464 nsp_loss : 0.0000 total_loss : 7.6464 avg_loss_step : 7.6464 learning_rate : 0.00033680853 loss_scaler : 32768
DLL 2020-11-27 03:11:47.195189 - Iteration: 294 throughput_train : 261.978 seq/s mlm_loss : 7.5721 nsp_loss : 0.0000 total_loss : 7.5721 avg_loss_step : 7.5721 learning_rate : 0.00033657093 loss_scaler : 32768
DLL 2020-11-27 03:11:49.141612 - Iteration: 295 throughput_train : 263.085 seq/s mlm_loss : 7.6829 nsp_loss : 0.0000 total_loss : 7.6829 avg_loss_step : 7.6829 learning_rate : 0.00033633318 loss_scaler : 32768
DLL 2020-11-27 03:11:51.090035 - Iteration: 296 throughput_train : 262.817 seq/s mlm_loss : 7.6925 nsp_loss : 0.0000 total_loss : 7.6925 avg_loss_step : 7.6925 learning_rate : 0.0003360952 loss_scaler : 32768
DLL 2020-11-27 03:11:53.020580 - Iteration: 297 throughput_train : 265.250 seq/s mlm_loss : 7.7207 nsp_loss : 0.0000 total_loss : 7.7207 avg_loss_step : 7.7207 learning_rate : 0.0003358571 loss_scaler : 32768
DLL 2020-11-27 03:11:54.963243 - Iteration: 298 throughput_train : 263.594 seq/s mlm_loss : 7.5999 nsp_loss : 0.0000 total_loss : 7.5999 avg_loss_step : 7.5999 learning_rate : 0.00033561882 loss_scaler : 32768
DLL 2020-11-27 03:11:56.902025 - Iteration: 299 throughput_train : 264.123 seq/s mlm_loss : 7.4692 nsp_loss : 0.0000 total_loss : 7.4692 avg_loss_step : 7.4692 learning_rate : 0.00033538035 loss_scaler : 32768
DLL 2020-11-27 03:11:58.844819 - Iteration: 300 throughput_train : 263.576 seq/s mlm_loss : 7.5501 nsp_loss : 0.0000 total_loss : 7.5501 avg_loss_step : 7.5501 learning_rate : 0.00033514178 loss_scaler : 32768
DLL 2020-11-27 03:12:00.787070 - Iteration: 301 throughput_train : 263.651 seq/s mlm_loss : 7.5963 nsp_loss : 0.0000 total_loss : 7.5963 avg_loss_step : 7.5963 learning_rate : 0.00033490296 loss_scaler : 32768
DLL 2020-11-27 03:12:02.733924 - Iteration: 302 throughput_train : 263.027 seq/s mlm_loss : 7.6253 nsp_loss : 0.0000 total_loss : 7.6253 avg_loss_step : 7.6253 learning_rate : 0.00033466402 loss_scaler : 32768
DLL 2020-11-27 03:12:04.675663 - Iteration: 303 throughput_train : 263.720 seq/s mlm_loss : 7.5718 nsp_loss : 0.0000 total_loss : 7.5718 avg_loss_step : 7.5718 learning_rate : 0.00033442487 loss_scaler : 32768
DLL 2020-11-27 03:12:06.612002 - Iteration: 304 throughput_train : 264.456 seq/s mlm_loss : 7.6788 nsp_loss : 0.0000 total_loss : 7.6788 avg_loss_step : 7.6788 learning_rate : 0.00033418558 loss_scaler : 32768
DLL 2020-11-27 03:12:08.554859 - Iteration: 305 throughput_train : 263.569 seq/s mlm_loss : 7.5251 nsp_loss : 0.0000 total_loss : 7.5252 avg_loss_step : 7.5252 learning_rate : 0.0003339461 loss_scaler : 32768
DLL 2020-11-27 03:12:10.512245 - Iteration: 306 throughput_train : 261.613 seq/s mlm_loss : 7.6095 nsp_loss : 0.0000 total_loss : 7.6095 avg_loss_step : 7.6095 learning_rate : 0.00033370644 loss_scaler : 32768
DLL 2020-11-27 03:12:12.461434 - Iteration: 307 throughput_train : 262.714 seq/s mlm_loss : 7.5467 nsp_loss : 0.0000 total_loss : 7.5467 avg_loss_step : 7.5467 learning_rate : 0.00033346665 loss_scaler : 32768
DLL 2020-11-27 03:12:14.406582 - Iteration: 308 throughput_train : 263.260 seq/s mlm_loss : 7.4174 nsp_loss : 0.0000 total_loss : 7.4174 avg_loss_step : 7.4174 learning_rate : 0.00033322664 loss_scaler : 32768
DLL 2020-11-27 03:12:16.353040 - Iteration: 309 throughput_train : 263.080 seq/s mlm_loss : 7.5884 nsp_loss : 0.0000 total_loss : 7.5884 avg_loss_step : 7.5884 learning_rate : 0.00033298647 loss_scaler : 32768
DLL 2020-11-27 03:12:18.298823 - Iteration: 310 throughput_train : 263.173 seq/s mlm_loss : 7.5277 nsp_loss : 0.0000 total_loss : 7.5277 avg_loss_step : 7.5277 learning_rate : 0.00033274613 loss_scaler : 32768
DLL 2020-11-27 03:12:20.242278 - Iteration: 311 throughput_train : 263.488 seq/s mlm_loss : 7.6201 nsp_loss : 0.0000 total_loss : 7.6201 avg_loss_step : 7.6201 learning_rate : 0.00033250562 loss_scaler : 32768
DLL 2020-11-27 03:12:22.195966 - Iteration: 312 throughput_train : 262.109 seq/s mlm_loss : 7.4652 nsp_loss : 0.0000 total_loss : 7.4652 avg_loss_step : 7.4652 learning_rate : 0.00033226493 loss_scaler : 32768
DLL 2020-11-27 03:12:24.144064 - Iteration: 313 throughput_train : 262.859 seq/s mlm_loss : 7.5447 nsp_loss : 0.0000 total_loss : 7.5447 avg_loss_step : 7.5447 learning_rate : 0.0003320241 loss_scaler : 32768
DLL 2020-11-27 03:12:26.083747 - Iteration: 314 throughput_train : 263.999 seq/s mlm_loss : 7.4444 nsp_loss : 0.0000 total_loss : 7.4444 avg_loss_step : 7.4444 learning_rate : 0.00033178306 loss_scaler : 32768
DLL 2020-11-27 03:12:28.025880 - Iteration: 315 throughput_train : 263.666 seq/s mlm_loss : 7.5540 nsp_loss : 0.0000 total_loss : 7.5540 avg_loss_step : 7.5540 learning_rate : 0.00033154184 loss_scaler : 32768
DLL 2020-11-27 03:12:29.974407 - Iteration: 316 throughput_train : 262.802 seq/s mlm_loss : 7.5016 nsp_loss : 0.0000 total_loss : 7.5016 avg_loss_step : 7.5016 learning_rate : 0.00033130046 loss_scaler : 32768
DLL 2020-11-27 03:12:31.914944 - Iteration: 317 throughput_train : 263.887 seq/s mlm_loss : 7.4427 nsp_loss : 0.0000 total_loss : 7.4427 avg_loss_step : 7.4427 learning_rate : 0.00033105887 loss_scaler : 32768
DLL 2020-11-27 03:12:33.847894 - Iteration: 318 throughput_train : 264.927 seq/s mlm_loss : 7.4909 nsp_loss : 0.0000 total_loss : 7.4909 avg_loss_step : 7.4909 learning_rate : 0.00033081716 loss_scaler : 32768
DLL 2020-11-27 03:12:35.797321 - Iteration: 319 throughput_train : 262.683 seq/s mlm_loss : 7.4878 nsp_loss : 0.0000 total_loss : 7.4878 avg_loss_step : 7.4878 learning_rate : 0.00033057525 loss_scaler : 32768
DLL 2020-11-27 03:12:37.745347 - Iteration: 320 throughput_train : 262.870 seq/s mlm_loss : 7.4414 nsp_loss : 0.0000 total_loss : 7.4414 avg_loss_step : 7.4414 learning_rate : 0.00033033316 loss_scaler : 32768
DLL 2020-11-27 03:12:39.691390 - Iteration: 321 throughput_train : 263.138 seq/s mlm_loss : 7.4378 nsp_loss : 0.0000 total_loss : 7.4378 avg_loss_step : 7.4378 learning_rate : 0.0003300909 loss_scaler : 32768
DLL 2020-11-27 03:12:41.638205 - Iteration: 322 throughput_train : 263.033 seq/s mlm_loss : 7.5072 nsp_loss : 0.0000 total_loss : 7.5072 avg_loss_step : 7.5072 learning_rate : 0.00032984844 loss_scaler : 32768
DLL 2020-11-27 03:12:43.579769 - Iteration: 323 throughput_train : 263.744 seq/s mlm_loss : 7.4456 nsp_loss : 0.0000 total_loss : 7.4456 avg_loss_step : 7.4456 learning_rate : 0.00032960583 loss_scaler : 32768
DLL 2020-11-27 03:12:45.523174 - Iteration: 324 throughput_train : 263.494 seq/s mlm_loss : 7.4672 nsp_loss : 0.0000 total_loss : 7.4672 avg_loss_step : 7.4672 learning_rate : 0.000329363 loss_scaler : 32768
DLL 2020-11-27 03:12:47.465975 - Iteration: 325 throughput_train : 263.576 seq/s mlm_loss : 7.5592 nsp_loss : 0.0000 total_loss : 7.5592 avg_loss_step : 7.5592 learning_rate : 0.00032912003 loss_scaler : 32768
DLL 2020-11-27 03:12:49.420059 - Iteration: 326 throughput_train : 262.057 seq/s mlm_loss : 7.4740 nsp_loss : 0.0000 total_loss : 7.4740 avg_loss_step : 7.4740 learning_rate : 0.00032887686 loss_scaler : 32768
DLL 2020-11-27 03:12:51.362595 - Iteration: 327 throughput_train : 263.627 seq/s mlm_loss : 7.4757 nsp_loss : 0.0000 total_loss : 7.4757 avg_loss_step : 7.4757 learning_rate : 0.00032863353 loss_scaler : 32768
DLL 2020-11-27 03:12:53.315047 - Iteration: 328 throughput_train : 262.290 seq/s mlm_loss : 7.5258 nsp_loss : 0.0000 total_loss : 7.5258 avg_loss_step : 7.5258 learning_rate : 0.00032839002 loss_scaler : 32768
DLL 2020-11-27 03:12:55.259760 - Iteration: 329 throughput_train : 263.328 seq/s mlm_loss : 7.5241 nsp_loss : 0.0000 total_loss : 7.5241 avg_loss_step : 7.5241 learning_rate : 0.0003281463 loss_scaler : 32768
DLL 2020-11-27 03:12:57.206000 - Iteration: 330 throughput_train : 263.109 seq/s mlm_loss : 7.4842 nsp_loss : 0.0000 total_loss : 7.4842 avg_loss_step : 7.4842 learning_rate : 0.0003279024 loss_scaler : 32768
DLL 2020-11-27 03:12:59.146238 - Iteration: 331 throughput_train : 263.924 seq/s mlm_loss : 7.5165 nsp_loss : 0.0000 total_loss : 7.5165 avg_loss_step : 7.5165 learning_rate : 0.00032765834 loss_scaler : 32768
DLL 2020-11-27 03:13:01.102445 - Iteration: 332 throughput_train : 261.769 seq/s mlm_loss : 7.5780 nsp_loss : 0.0000 total_loss : 7.5780 avg_loss_step : 7.5780 learning_rate : 0.0003274141 loss_scaler : 32768
DLL 2020-11-27 03:13:03.046597 - Iteration: 333 throughput_train : 263.395 seq/s mlm_loss : 7.5336 nsp_loss : 0.0000 total_loss : 7.5336 avg_loss_step : 7.5336 learning_rate : 0.00032716966 loss_scaler : 32768
DLL 2020-11-27 03:13:04.992116 - Iteration: 334 throughput_train : 263.207 seq/s mlm_loss : 7.5650 nsp_loss : 0.0000 total_loss : 7.5650 avg_loss_step : 7.5650 learning_rate : 0.00032692504 loss_scaler : 32768
DLL 2020-11-27 03:13:06.935790 - Iteration: 335 throughput_train : 263.457 seq/s mlm_loss : 7.5931 nsp_loss : 0.0000 total_loss : 7.5931 avg_loss_step : 7.5931 learning_rate : 0.00032668028 loss_scaler : 32768
DLL 2020-11-27 03:13:08.883392 - Iteration: 336 throughput_train : 262.928 seq/s mlm_loss : 7.5645 nsp_loss : 0.0000 total_loss : 7.5645 avg_loss_step : 7.5645 learning_rate : 0.00032643529 loss_scaler : 32768
DLL 2020-11-27 03:13:10.829737 - Iteration: 337 throughput_train : 263.112 seq/s mlm_loss : 7.4921 nsp_loss : 0.0000 total_loss : 7.4921 avg_loss_step : 7.4921 learning_rate : 0.00032619011 loss_scaler : 32768
DLL 2020-11-27 03:13:12.772453 - Iteration: 338 throughput_train : 263.597 seq/s mlm_loss : 7.4980 nsp_loss : 0.0000 total_loss : 7.4980 avg_loss_step : 7.4980 learning_rate : 0.00032594477 loss_scaler : 32768
DLL 2020-11-27 03:13:14.721102 - Iteration: 339 throughput_train : 262.788 seq/s mlm_loss : 7.4121 nsp_loss : 0.0000 total_loss : 7.4121 avg_loss_step : 7.4121 learning_rate : 0.00032569922 loss_scaler : 32768
DLL 2020-11-27 03:13:16.660199 - Iteration: 340 throughput_train : 264.084 seq/s mlm_loss : 7.4249 nsp_loss : 0.0000 total_loss : 7.4249 avg_loss_step : 7.4249 learning_rate : 0.00032545353 loss_scaler : 32768
DLL 2020-11-27 03:13:18.607639 - Iteration: 341 throughput_train : 262.947 seq/s mlm_loss : 7.2939 nsp_loss : 0.0000 total_loss : 7.2939 avg_loss_step : 7.2939 learning_rate : 0.00032520763 loss_scaler : 32768
DLL 2020-11-27 03:13:20.558980 - Iteration: 342 throughput_train : 262.426 seq/s mlm_loss : 7.4397 nsp_loss : 0.0000 total_loss : 7.4397 avg_loss_step : 7.4397 learning_rate : 0.0003249615 loss_scaler : 32768
DLL 2020-11-27 03:13:22.505950 - Iteration: 343 throughput_train : 263.025 seq/s mlm_loss : 7.4897 nsp_loss : 0.0000 total_loss : 7.4897 avg_loss_step : 7.4897 learning_rate : 0.00032471525 loss_scaler : 32768
DLL 2020-11-27 03:13:24.444568 - Iteration: 344 throughput_train : 264.160 seq/s mlm_loss : 7.4679 nsp_loss : 0.0000 total_loss : 7.4679 avg_loss_step : 7.4679 learning_rate : 0.0003244688 loss_scaler : 32768
DLL 2020-11-27 03:13:26.390096 - Iteration: 345 throughput_train : 263.206 seq/s mlm_loss : 7.3869 nsp_loss : 0.0000 total_loss : 7.3869 avg_loss_step : 7.3869 learning_rate : 0.0003242221 loss_scaler : 32768
DLL 2020-11-27 03:13:28.330263 - Iteration: 346 throughput_train : 263.940 seq/s mlm_loss : 7.2949 nsp_loss : 0.0000 total_loss : 7.2949 avg_loss_step : 7.2949 learning_rate : 0.00032397528 loss_scaler : 32768
DLL 2020-11-27 03:13:30.271210 - Iteration: 347 throughput_train : 263.850 seq/s mlm_loss : 7.4573 nsp_loss : 0.0000 total_loss : 7.4573 avg_loss_step : 7.4573 learning_rate : 0.00032372828 loss_scaler : 32768
DLL 2020-11-27 03:13:32.215272 - Iteration: 348 throughput_train : 263.406 seq/s mlm_loss : 7.4474 nsp_loss : 0.0000 total_loss : 7.4474 avg_loss_step : 7.4474 learning_rate : 0.00032348104 loss_scaler : 32768
DLL 2020-11-27 03:13:34.164977 - Iteration: 349 throughput_train : 262.643 seq/s mlm_loss : 7.4986 nsp_loss : 0.0000 total_loss : 7.4986 avg_loss_step : 7.4986 learning_rate : 0.00032323363 loss_scaler : 32768
DLL 2020-11-27 03:13:36.109154 - Iteration: 350 throughput_train : 263.390 seq/s mlm_loss : 7.3208 nsp_loss : 0.0000 total_loss : 7.3208 avg_loss_step : 7.3208 learning_rate : 0.00032298604 loss_scaler : 32768
DLL 2020-11-27 03:13:38.059980 - Iteration: 351 throughput_train : 262.492 seq/s mlm_loss : 7.4378 nsp_loss : 0.0000 total_loss : 7.4378 avg_loss_step : 7.4378 learning_rate : 0.00032273828 loss_scaler : 32768
DLL 2020-11-27 03:13:39.986562 - Iteration: 352 throughput_train : 265.797 seq/s mlm_loss : 7.4937 nsp_loss : 0.0000 total_loss : 7.4937 avg_loss_step : 7.4937 learning_rate : 0.0003224903 loss_scaler : 32768
DLL 2020-11-27 03:13:41.941464 - Iteration: 353 throughput_train : 261.949 seq/s mlm_loss : 7.2853 nsp_loss : 0.0000 total_loss : 7.2853 avg_loss_step : 7.2853 learning_rate : 0.00032224212 loss_scaler : 32768
DLL 2020-11-27 03:13:43.885609 - Iteration: 354 throughput_train : 263.397 seq/s mlm_loss : 7.3577 nsp_loss : 0.0000 total_loss : 7.3577 avg_loss_step : 7.3577 learning_rate : 0.00032199378 loss_scaler : 32768
DLL 2020-11-27 03:13:45.830018 - Iteration: 355 throughput_train : 263.362 seq/s mlm_loss : 7.4886 nsp_loss : 0.0000 total_loss : 7.4886 avg_loss_step : 7.4886 learning_rate : 0.00032174523 loss_scaler : 32768
DLL 2020-11-27 03:13:47.780367 - Iteration: 356 throughput_train : 262.556 seq/s mlm_loss : 7.4123 nsp_loss : 0.0000 total_loss : 7.4123 avg_loss_step : 7.4123 learning_rate : 0.0003214965 loss_scaler : 32768
DLL 2020-11-27 03:13:49.737945 - Iteration: 357 throughput_train : 261.589 seq/s mlm_loss : 7.4444 nsp_loss : 0.0000 total_loss : 7.4444 avg_loss_step : 7.4444 learning_rate : 0.00032124756 loss_scaler : 32768
DLL 2020-11-27 03:13:51.682312 - Iteration: 358 throughput_train : 263.364 seq/s mlm_loss : 7.4967 nsp_loss : 0.0000 total_loss : 7.4967 avg_loss_step : 7.4967 learning_rate : 0.00032099843 loss_scaler : 32768
DLL 2020-11-27 03:13:53.629990 - Iteration: 359 throughput_train : 262.916 seq/s mlm_loss : 7.4295 nsp_loss : 0.0000 total_loss : 7.4295 avg_loss_step : 7.4295 learning_rate : 0.0003207491 loss_scaler : 32768
DLL 2020-11-27 03:13:55.576567 - Iteration: 360 throughput_train : 263.066 seq/s mlm_loss : 7.4240 nsp_loss : 0.0000 total_loss : 7.4240 avg_loss_step : 7.4240 learning_rate : 0.0003204996 loss_scaler : 32768
DLL 2020-11-27 03:13:57.521788 - Iteration: 361 throughput_train : 263.247 seq/s mlm_loss : 7.3376 nsp_loss : 0.0000 total_loss : 7.3376 avg_loss_step : 7.3376 learning_rate : 0.00032024988 loss_scaler : 32768
DLL 2020-11-27 03:13:59.463882 - Iteration: 362 throughput_train : 263.674 seq/s mlm_loss : 7.3068 nsp_loss : 0.0000 total_loss : 7.3068 avg_loss_step : 7.3068 learning_rate : 0.00032 loss_scaler : 32768
DLL 2020-11-27 03:14:01.413182 - Iteration: 363 throughput_train : 262.696 seq/s mlm_loss : 7.2490 nsp_loss : 0.0000 total_loss : 7.2490 avg_loss_step : 7.2490 learning_rate : 0.00031974987 loss_scaler : 32768
DLL 2020-11-27 03:14:03.355906 - Iteration: 364 throughput_train : 263.587 seq/s mlm_loss : 7.4289 nsp_loss : 0.0000 total_loss : 7.4289 avg_loss_step : 7.4289 learning_rate : 0.0003194996 loss_scaler : 32768
DLL 2020-11-27 03:14:05.299932 - Iteration: 365 throughput_train : 263.410 seq/s mlm_loss : 7.2907 nsp_loss : 0.0000 total_loss : 7.2907 avg_loss_step : 7.2907 learning_rate : 0.00031924908 loss_scaler : 32768
DLL 2020-11-27 03:14:07.249375 - Iteration: 366 throughput_train : 262.678 seq/s mlm_loss : 7.3538 nsp_loss : 0.0000 total_loss : 7.3538 avg_loss_step : 7.3538 learning_rate : 0.0003189984 loss_scaler : 32768
DLL 2020-11-27 03:14:09.198219 - Iteration: 367 throughput_train : 262.760 seq/s mlm_loss : 7.3458 nsp_loss : 0.0000 total_loss : 7.3458 avg_loss_step : 7.3458 learning_rate : 0.00031874754 loss_scaler : 32768
DLL 2020-11-27 03:14:11.145848 - Iteration: 368 throughput_train : 262.927 seq/s mlm_loss : 7.3479 nsp_loss : 0.0000 total_loss : 7.3479 avg_loss_step : 7.3479 learning_rate : 0.00031849643 loss_scaler : 32768
DLL 2020-11-27 03:14:13.090710 - Iteration: 369 throughput_train : 263.296 seq/s mlm_loss : 7.3657 nsp_loss : 0.0000 total_loss : 7.3657 avg_loss_step : 7.3657 learning_rate : 0.00031824518 loss_scaler : 32768
DLL 2020-11-27 03:14:15.033829 - Iteration: 370 throughput_train : 263.532 seq/s mlm_loss : 7.3974 nsp_loss : 0.0000 total_loss : 7.3974 avg_loss_step : 7.3974 learning_rate : 0.0003179937 loss_scaler : 32768
DLL 2020-11-27 03:14:16.973736 - Iteration: 371 throughput_train : 263.970 seq/s mlm_loss : 7.4322 nsp_loss : 0.0000 total_loss : 7.4322 avg_loss_step : 7.4322 learning_rate : 0.00031774203 loss_scaler : 32768
DLL 2020-11-27 03:14:18.920804 - Iteration: 372 throughput_train : 263.002 seq/s mlm_loss : 7.5240 nsp_loss : 0.0000 total_loss : 7.5240 avg_loss_step : 7.5240 learning_rate : 0.00031749014 loss_scaler : 32768
DLL 2020-11-27 03:14:20.868788 - Iteration: 373 throughput_train : 262.878 seq/s mlm_loss : 7.4041 nsp_loss : 0.0000 total_loss : 7.4041 avg_loss_step : 7.4041 learning_rate : 0.00031723807 loss_scaler : 32768
DLL 2020-11-27 03:14:22.809522 - Iteration: 374 throughput_train : 263.867 seq/s mlm_loss : 7.3821 nsp_loss : 0.0000 total_loss : 7.3821 avg_loss_step : 7.3821 learning_rate : 0.0003169858 loss_scaler : 32768
DLL 2020-11-27 03:14:24.754125 - Iteration: 375 throughput_train : 263.332 seq/s mlm_loss : 7.4149 nsp_loss : 0.0000 total_loss : 7.4149 avg_loss_step : 7.4149 learning_rate : 0.0003167333 loss_scaler : 32768
DLL 2020-11-27 03:14:26.700220 - Iteration: 376 throughput_train : 263.129 seq/s mlm_loss : 7.3221 nsp_loss : 0.0000 total_loss : 7.3221 avg_loss_step : 7.3221 learning_rate : 0.00031648064 loss_scaler : 32768
DLL 2020-11-27 03:14:28.642378 - Iteration: 377 throughput_train : 263.663 seq/s mlm_loss : 7.3969 nsp_loss : 0.0000 total_loss : 7.3969 avg_loss_step : 7.3969 learning_rate : 0.00031622776 loss_scaler : 32768
DLL 2020-11-27 03:14:30.580951 - Iteration: 378 throughput_train : 264.161 seq/s mlm_loss : 7.3166 nsp_loss : 0.0000 total_loss : 7.3166 avg_loss_step : 7.3166 learning_rate : 0.00031597467 loss_scaler : 32768
DLL 2020-11-27 03:14:32.529395 - Iteration: 379 throughput_train : 262.845 seq/s mlm_loss : 7.3591 nsp_loss : 0.0000 total_loss : 7.3591 avg_loss_step : 7.3591 learning_rate : 0.00031572138 loss_scaler : 32768
DLL 2020-11-27 03:14:34.464180 - Iteration: 380 throughput_train : 264.702 seq/s mlm_loss : 7.3062 nsp_loss : 0.0000 total_loss : 7.3062 avg_loss_step : 7.3062 learning_rate : 0.00031546789 loss_scaler : 32768
DLL 2020-11-27 03:14:36.411542 - Iteration: 381 throughput_train : 262.993 seq/s mlm_loss : 7.3154 nsp_loss : 0.0000 total_loss : 7.3154 avg_loss_step : 7.3154 learning_rate : 0.0003152142 loss_scaler : 32768
DLL 2020-11-27 03:14:38.362082 - Iteration: 382 throughput_train : 262.562 seq/s mlm_loss : 7.2916 nsp_loss : 0.0000 total_loss : 7.2916 avg_loss_step : 7.2916 learning_rate : 0.0003149603 loss_scaler : 32768
DLL 2020-11-27 03:14:40.308844 - Iteration: 383 throughput_train : 263.065 seq/s mlm_loss : 7.2635 nsp_loss : 0.0000 total_loss : 7.2635 avg_loss_step : 7.2635 learning_rate : 0.0003147062 loss_scaler : 32768
DLL 2020-11-27 03:14:42.262053 - Iteration: 384 throughput_train : 262.182 seq/s mlm_loss : 7.4183 nsp_loss : 0.0000 total_loss : 7.4183 avg_loss_step : 7.4183 learning_rate : 0.00031445187 loss_scaler : 32768
DLL 2020-11-27 03:14:44.207124 - Iteration: 385 throughput_train : 263.301 seq/s mlm_loss : 7.3548 nsp_loss : 0.0000 total_loss : 7.3548 avg_loss_step : 7.3548 learning_rate : 0.0003141974 loss_scaler : 32768
DLL 2020-11-27 03:14:46.153029 - Iteration: 386 throughput_train : 263.188 seq/s mlm_loss : 7.3124 nsp_loss : 0.0000 total_loss : 7.3124 avg_loss_step : 7.3124 learning_rate : 0.00031394267 loss_scaler : 32768
DLL 2020-11-27 03:14:48.094674 - Iteration: 387 throughput_train : 263.765 seq/s mlm_loss : 7.2628 nsp_loss : 0.0000 total_loss : 7.2628 avg_loss_step : 7.2628 learning_rate : 0.00031368775 loss_scaler : 32768
DLL 2020-11-27 03:14:50.043893 - Iteration: 388 throughput_train : 262.740 seq/s mlm_loss : 7.2957 nsp_loss : 0.0000 total_loss : 7.2957 avg_loss_step : 7.2957 learning_rate : 0.0003134326 loss_scaler : 32768
DLL 2020-11-27 03:14:51.996387 - Iteration: 389 throughput_train : 262.299 seq/s mlm_loss : 7.1609 nsp_loss : 0.0000 total_loss : 7.1609 avg_loss_step : 7.1609 learning_rate : 0.00031317724 loss_scaler : 32768
DLL 2020-11-27 03:14:53.947261 - Iteration: 390 throughput_train : 262.519 seq/s mlm_loss : 7.4116 nsp_loss : 0.0000 total_loss : 7.4116 avg_loss_step : 7.4116 learning_rate : 0.0003129217 loss_scaler : 32768
DLL 2020-11-27 03:14:55.901634 - Iteration: 391 throughput_train : 262.049 seq/s mlm_loss : 7.3070 nsp_loss : 0.0000 total_loss : 7.3070 avg_loss_step : 7.3070 learning_rate : 0.00031266594 loss_scaler : 32768
DLL 2020-11-27 03:14:57.851915 - Iteration: 392 throughput_train : 262.588 seq/s mlm_loss : 7.3381 nsp_loss : 0.0000 total_loss : 7.3381 avg_loss_step : 7.3381 learning_rate : 0.00031240997 loss_scaler : 32768
DLL 2020-11-27 03:14:59.787004 - Iteration: 393 throughput_train : 264.626 seq/s mlm_loss : 7.3303 nsp_loss : 0.0000 total_loss : 7.3303 avg_loss_step : 7.3303 learning_rate : 0.00031215377 loss_scaler : 32768
DLL 2020-11-27 03:15:01.727589 - Iteration: 394 throughput_train : 263.879 seq/s mlm_loss : 7.3183 nsp_loss : 0.0000 total_loss : 7.3183 avg_loss_step : 7.3183 learning_rate : 0.00031189743 loss_scaler : 32768
DLL 2020-11-27 03:15:03.680450 - Iteration: 395 throughput_train : 262.219 seq/s mlm_loss : 7.2667 nsp_loss : 0.0000 total_loss : 7.2667 avg_loss_step : 7.2667 learning_rate : 0.0003116408 loss_scaler : 32768
DLL 2020-11-27 03:15:05.628807 - Iteration: 396 throughput_train : 262.825 seq/s mlm_loss : 7.3126 nsp_loss : 0.0000 total_loss : 7.3126 avg_loss_step : 7.3126 learning_rate : 0.00031138398 loss_scaler : 32768
DLL 2020-11-27 03:15:07.570734 - Iteration: 397 throughput_train : 263.695 seq/s mlm_loss : 7.2125 nsp_loss : 0.0000 total_loss : 7.2125 avg_loss_step : 7.2125 learning_rate : 0.000311127 loss_scaler : 32768
DLL 2020-11-27 03:15:09.511404 - Iteration: 398 throughput_train : 263.866 seq/s mlm_loss : 7.3387 nsp_loss : 0.0000 total_loss : 7.3387 avg_loss_step : 7.3387 learning_rate : 0.00031086974 loss_scaler : 32768
DLL 2020-11-27 03:15:11.463021 - Iteration: 399 throughput_train : 262.388 seq/s mlm_loss : 7.1753 nsp_loss : 0.0000 total_loss : 7.1753 avg_loss_step : 7.1753 learning_rate : 0.0003106123 loss_scaler : 32768
DLL 2020-11-27 03:15:13.414586 - Iteration: 400 throughput_train : 262.393 seq/s mlm_loss : 7.2465 nsp_loss : 0.0000 total_loss : 7.2465 avg_loss_step : 7.2465 learning_rate : 0.00031035463 loss_scaler : 32768
DLL 2020-11-27 03:15:15.367561 - Iteration: 401 throughput_train : 262.205 seq/s mlm_loss : 7.1308 nsp_loss : 0.0000 total_loss : 7.1308 avg_loss_step : 7.1308 learning_rate : 0.00031009674 loss_scaler : 32768
DLL 2020-11-27 03:15:17.315753 - Iteration: 402 throughput_train : 262.849 seq/s mlm_loss : 7.2739 nsp_loss : 0.0000 total_loss : 7.2739 avg_loss_step : 7.2739 learning_rate : 0.00030983868 loss_scaler : 32768
DLL 2020-11-27 03:15:19.270262 - Iteration: 403 throughput_train : 261.999 seq/s mlm_loss : 7.1999 nsp_loss : 0.0000 total_loss : 7.1999 avg_loss_step : 7.1999 learning_rate : 0.00030958035 loss_scaler : 32768
DLL 2020-11-27 03:15:21.228614 - Iteration: 404 throughput_train : 261.483 seq/s mlm_loss : 7.2858 nsp_loss : 0.0000 total_loss : 7.2858 avg_loss_step : 7.2858 learning_rate : 0.00030932183 loss_scaler : 32768
DLL 2020-11-27 03:15:23.174665 - Iteration: 405 throughput_train : 263.136 seq/s mlm_loss : 7.3039 nsp_loss : 0.0000 total_loss : 7.3039 avg_loss_step : 7.3039 learning_rate : 0.0003090631 loss_scaler : 32768
DLL 2020-11-27 03:15:25.124872 - Iteration: 406 throughput_train : 262.578 seq/s mlm_loss : 7.2997 nsp_loss : 0.0000 total_loss : 7.2997 avg_loss_step : 7.2997 learning_rate : 0.00030880413 loss_scaler : 32768
DLL 2020-11-27 03:15:27.065716 - Iteration: 407 throughput_train : 263.842 seq/s mlm_loss : 7.2214 nsp_loss : 0.0000 total_loss : 7.2214 avg_loss_step : 7.2214 learning_rate : 0.00030854496 loss_scaler : 32768
DLL 2020-11-27 03:15:29.011778 - Iteration: 408 throughput_train : 263.135 seq/s mlm_loss : 7.4462 nsp_loss : 0.0000 total_loss : 7.4462 avg_loss_step : 7.4462 learning_rate : 0.00030828555 loss_scaler : 32768
DLL 2020-11-27 03:15:30.963769 - Iteration: 409 throughput_train : 262.336 seq/s mlm_loss : 7.3811 nsp_loss : 0.0000 total_loss : 7.3811 avg_loss_step : 7.3811 learning_rate : 0.00030802598 loss_scaler : 32768
DLL 2020-11-27 03:15:32.907842 - Iteration: 410 throughput_train : 263.407 seq/s mlm_loss : 7.2664 nsp_loss : 0.0000 total_loss : 7.2664 avg_loss_step : 7.2664 learning_rate : 0.00030776614 loss_scaler : 32768
DLL 2020-11-27 03:15:34.858753 - Iteration: 411 throughput_train : 262.480 seq/s mlm_loss : 7.3723 nsp_loss : 0.0000 total_loss : 7.3723 avg_loss_step : 7.3723 learning_rate : 0.00030750607 loss_scaler : 32768
DLL 2020-11-27 03:15:36.800987 - Iteration: 412 throughput_train : 263.652 seq/s mlm_loss : 7.2971 nsp_loss : 0.0000 total_loss : 7.2971 avg_loss_step : 7.2971 learning_rate : 0.00030724582 loss_scaler : 32768
DLL 2020-11-27 03:15:38.741682 - Iteration: 413 throughput_train : 263.862 seq/s mlm_loss : 7.2804 nsp_loss : 0.0000 total_loss : 7.2804 avg_loss_step : 7.2804 learning_rate : 0.0003069853 loss_scaler : 32768
DLL 2020-11-27 03:15:40.696054 - Iteration: 414 throughput_train : 262.016 seq/s mlm_loss : 7.2973 nsp_loss : 0.0000 total_loss : 7.2973 avg_loss_step : 7.2973 learning_rate : 0.0003067246 loss_scaler : 32768
DLL 2020-11-27 03:15:42.639584 - Iteration: 415 throughput_train : 263.482 seq/s mlm_loss : 7.3678 nsp_loss : 0.0000 total_loss : 7.3678 avg_loss_step : 7.3678 learning_rate : 0.00030646368 loss_scaler : 32768
DLL 2020-11-27 03:15:44.588588 - Iteration: 416 throughput_train : 262.741 seq/s mlm_loss : 7.2747 nsp_loss : 0.0000 total_loss : 7.2747 avg_loss_step : 7.2747 learning_rate : 0.00030620254 loss_scaler : 32768
DLL 2020-11-27 03:15:46.535568 - Iteration: 417 throughput_train : 263.012 seq/s mlm_loss : 7.2525 nsp_loss : 0.0000 total_loss : 7.2525 avg_loss_step : 7.2525 learning_rate : 0.00030594118 loss_scaler : 32768
DLL 2020-11-27 03:15:48.481952 - Iteration: 418 throughput_train : 263.091 seq/s mlm_loss : 7.3423 nsp_loss : 0.0000 total_loss : 7.3423 avg_loss_step : 7.3423 learning_rate : 0.00030567954 loss_scaler : 32768
DLL 2020-11-27 03:15:50.421816 - Iteration: 419 throughput_train : 263.976 seq/s mlm_loss : 7.3036 nsp_loss : 0.0000 total_loss : 7.3036 avg_loss_step : 7.3036 learning_rate : 0.00030541772 loss_scaler : 32768
DLL 2020-11-27 03:15:52.367017 - Iteration: 420 throughput_train : 263.251 seq/s mlm_loss : 7.3738 nsp_loss : 0.0000 total_loss : 7.3738 avg_loss_step : 7.3738 learning_rate : 0.0003051557 loss_scaler : 32768
DLL 2020-11-27 03:15:54.303752 - Iteration: 421 throughput_train : 264.402 seq/s mlm_loss : 7.2585 nsp_loss : 0.0000 total_loss : 7.2585 avg_loss_step : 7.2585 learning_rate : 0.00030489342 loss_scaler : 32768
DLL 2020-11-27 03:15:56.253725 - Iteration: 422 throughput_train : 262.606 seq/s mlm_loss : 7.2003 nsp_loss : 0.0000 total_loss : 7.2003 avg_loss_step : 7.2003 learning_rate : 0.00030463093 loss_scaler : 32768
DLL 2020-11-27 03:15:58.186356 - Iteration: 423 throughput_train : 264.963 seq/s mlm_loss : 7.1696 nsp_loss : 0.0000 total_loss : 7.1696 avg_loss_step : 7.1696 learning_rate : 0.00030436818 loss_scaler : 32768
DLL 2020-11-27 03:16:00.139047 - Iteration: 424 throughput_train : 262.242 seq/s mlm_loss : 7.1525 nsp_loss : 0.0000 total_loss : 7.1525 avg_loss_step : 7.1525 learning_rate : 0.00030410523 loss_scaler : 32768
DLL 2020-11-27 03:16:02.081917 - Iteration: 425 throughput_train : 263.567 seq/s mlm_loss : 7.2703 nsp_loss : 0.0000 total_loss : 7.2703 avg_loss_step : 7.2703 learning_rate : 0.00030384207 loss_scaler : 32768
DLL 2020-11-27 03:16:04.027022 - Iteration: 426 throughput_train : 263.264 seq/s mlm_loss : 7.2664 nsp_loss : 0.0000 total_loss : 7.2664 avg_loss_step : 7.2664 learning_rate : 0.00030357862 loss_scaler : 32768
DLL 2020-11-27 03:16:05.968365 - Iteration: 427 throughput_train : 263.775 seq/s mlm_loss : 7.3296 nsp_loss : 0.0000 total_loss : 7.3296 avg_loss_step : 7.3296 learning_rate : 0.000303315 loss_scaler : 32768
DLL 2020-11-27 03:16:07.924970 - Iteration: 428 throughput_train : 261.721 seq/s mlm_loss : 7.2815 nsp_loss : 0.0000 total_loss : 7.2815 avg_loss_step : 7.2815 learning_rate : 0.00030305114 loss_scaler : 32768
DLL 2020-11-27 03:16:09.857325 - Iteration: 429 throughput_train : 265.002 seq/s mlm_loss : 7.2173 nsp_loss : 0.0000 total_loss : 7.2173 avg_loss_step : 7.2173 learning_rate : 0.00030278703 loss_scaler : 32768
DLL 2020-11-27 03:16:11.800881 - Iteration: 430 throughput_train : 263.476 seq/s mlm_loss : 7.0816 nsp_loss : 0.0000 total_loss : 7.0816 avg_loss_step : 7.0816 learning_rate : 0.0003025227 loss_scaler : 32768
DLL 2020-11-27 03:16:13.749441 - Iteration: 431 throughput_train : 262.797 seq/s mlm_loss : 7.3168 nsp_loss : 0.0000 total_loss : 7.3168 avg_loss_step : 7.3168 learning_rate : 0.00030225815 loss_scaler : 32768
DLL 2020-11-27 03:16:15.700028 - Iteration: 432 throughput_train : 262.525 seq/s mlm_loss : 7.2327 nsp_loss : 0.0000 total_loss : 7.2327 avg_loss_step : 7.2327 learning_rate : 0.00030199336 loss_scaler : 32768
DLL 2020-11-27 03:16:17.643313 - Iteration: 433 throughput_train : 263.511 seq/s mlm_loss : 7.2969 nsp_loss : 0.0000 total_loss : 7.2969 avg_loss_step : 7.2969 learning_rate : 0.00030172837 loss_scaler : 32768
DLL 2020-11-27 03:16:19.602435 - Iteration: 434 throughput_train : 261.383 seq/s mlm_loss : 7.2798 nsp_loss : 0.0000 total_loss : 7.2798 avg_loss_step : 7.2798 learning_rate : 0.00030146306 loss_scaler : 32768
DLL 2020-11-27 03:16:21.545878 - Iteration: 435 throughput_train : 263.491 seq/s mlm_loss : 7.2688 nsp_loss : 0.0000 total_loss : 7.2688 avg_loss_step : 7.2688 learning_rate : 0.00030119758 loss_scaler : 32768
DLL 2020-11-27 03:16:23.489194 - Iteration: 436 throughput_train : 263.507 seq/s mlm_loss : 7.3614 nsp_loss : 0.0000 total_loss : 7.3614 avg_loss_step : 7.3614 learning_rate : 0.0003009319 loss_scaler : 32768
DLL 2020-11-27 03:16:25.440544 - Iteration: 437 throughput_train : 262.423 seq/s mlm_loss : 7.2099 nsp_loss : 0.0000 total_loss : 7.2099 avg_loss_step : 7.2099 learning_rate : 0.0003006659 loss_scaler : 32768
DLL 2020-11-27 03:16:27.397897 - Iteration: 438 throughput_train : 261.617 seq/s mlm_loss : 7.3102 nsp_loss : 0.0000 total_loss : 7.3102 avg_loss_step : 7.3102 learning_rate : 0.00030039973 loss_scaler : 32768
DLL 2020-11-27 03:16:29.343546 - Iteration: 439 throughput_train : 263.193 seq/s mlm_loss : 7.1580 nsp_loss : 0.0000 total_loss : 7.1580 avg_loss_step : 7.1580 learning_rate : 0.00030013328 loss_scaler : 32768
DLL 2020-11-27 03:16:31.295704 - Iteration: 440 throughput_train : 262.314 seq/s mlm_loss : 7.2427 nsp_loss : 0.0000 total_loss : 7.2427 avg_loss_step : 7.2427 learning_rate : 0.00029986663 loss_scaler : 32768
DLL 2020-11-27 03:16:33.237572 - Iteration: 441 throughput_train : 263.704 seq/s mlm_loss : 7.3552 nsp_loss : 0.0000 total_loss : 7.3552 avg_loss_step : 7.3552 learning_rate : 0.00029959972 loss_scaler : 32768
DLL 2020-11-27 03:16:35.179117 - Iteration: 442 throughput_train : 263.747 seq/s mlm_loss : 7.3647 nsp_loss : 0.0000 total_loss : 7.3647 avg_loss_step : 7.3647 learning_rate : 0.00029933258 loss_scaler : 32768
DLL 2020-11-27 03:16:37.128443 - Iteration: 443 throughput_train : 262.694 seq/s mlm_loss : 7.1967 nsp_loss : 0.0000 total_loss : 7.1967 avg_loss_step : 7.1967 learning_rate : 0.0002990652 loss_scaler : 32768
DLL 2020-11-27 03:16:39.076634 - Iteration: 444 throughput_train : 262.849 seq/s mlm_loss : 7.2465 nsp_loss : 0.0000 total_loss : 7.2465 avg_loss_step : 7.2465 learning_rate : 0.0002987976 loss_scaler : 32768
DLL 2020-11-27 03:16:41.020180 - Iteration: 445 throughput_train : 263.475 seq/s mlm_loss : 7.1700 nsp_loss : 0.0000 total_loss : 7.1700 avg_loss_step : 7.1700 learning_rate : 0.00029852972 loss_scaler : 32768
DLL 2020-11-27 03:16:42.964342 - Iteration: 446 throughput_train : 263.392 seq/s mlm_loss : 7.1666 nsp_loss : 0.0000 total_loss : 7.1666 avg_loss_step : 7.1666 learning_rate : 0.0002982616 loss_scaler : 32768
DLL 2020-11-27 03:16:44.913690 - Iteration: 447 throughput_train : 262.692 seq/s mlm_loss : 7.3221 nsp_loss : 0.0000 total_loss : 7.3221 avg_loss_step : 7.3221 learning_rate : 0.00029799328 loss_scaler : 32768
DLL 2020-11-27 03:16:46.862555 - Iteration: 448 throughput_train : 262.758 seq/s mlm_loss : 7.3824 nsp_loss : 0.0000 total_loss : 7.3824 avg_loss_step : 7.3824 learning_rate : 0.0002977247 loss_scaler : 32768
DLL 2020-11-27 03:16:48.810640 - Iteration: 449 throughput_train : 262.861 seq/s mlm_loss : 7.2208 nsp_loss : 0.0000 total_loss : 7.2208 avg_loss_step : 7.2208 learning_rate : 0.00029745587 loss_scaler : 32768
DLL 2020-11-27 03:16:50.761030 - Iteration: 450 throughput_train : 262.553 seq/s mlm_loss : 7.1204 nsp_loss : 0.0000 total_loss : 7.1204 avg_loss_step : 7.1204 learning_rate : 0.0002971868 loss_scaler : 32768
DLL 2020-11-27 03:16:52.696460 - Iteration: 451 throughput_train : 264.580 seq/s mlm_loss : 7.2672 nsp_loss : 0.0000 total_loss : 7.2672 avg_loss_step : 7.2672 learning_rate : 0.00029691748 loss_scaler : 32768
DLL 2020-11-27 03:16:54.640078 - Iteration: 452 throughput_train : 263.467 seq/s mlm_loss : 7.3183 nsp_loss : 0.0000 total_loss : 7.3183 avg_loss_step : 7.3183 learning_rate : 0.00029664792 loss_scaler : 32768
DLL 2020-11-27 03:16:56.588397 - Iteration: 453 throughput_train : 262.829 seq/s mlm_loss : 7.1531 nsp_loss : 0.0000 total_loss : 7.1531 avg_loss_step : 7.1531 learning_rate : 0.00029637813 loss_scaler : 32768
DLL 2020-11-27 03:16:58.544146 - Iteration: 454 throughput_train : 261.831 seq/s mlm_loss : 7.1511 nsp_loss : 0.0000 total_loss : 7.1511 avg_loss_step : 7.1511 learning_rate : 0.00029610808 loss_scaler : 32768
DLL 2020-11-27 03:17:00.497813 - Iteration: 455 throughput_train : 262.110 seq/s mlm_loss : 7.2128 nsp_loss : 0.0000 total_loss : 7.2128 avg_loss_step : 7.2128 learning_rate : 0.0002958378 loss_scaler : 32768
DLL 2020-11-27 03:17:02.446322 - Iteration: 456 throughput_train : 262.804 seq/s mlm_loss : 7.2407 nsp_loss : 0.0000 total_loss : 7.2407 avg_loss_step : 7.2407 learning_rate : 0.00029556724 loss_scaler : 32768
DLL 2020-11-27 03:17:04.380240 - Iteration: 457 throughput_train : 264.789 seq/s mlm_loss : 7.2755 nsp_loss : 0.0000 total_loss : 7.2755 avg_loss_step : 7.2755 learning_rate : 0.00029529646 loss_scaler : 32768
DLL 2020-11-27 03:17:06.330658 - Iteration: 458 throughput_train : 262.547 seq/s mlm_loss : 7.3426 nsp_loss : 0.0000 total_loss : 7.3426 avg_loss_step : 7.3426 learning_rate : 0.0002950254 loss_scaler : 32768
DLL 2020-11-27 03:17:08.270300 - Iteration: 459 throughput_train : 264.005 seq/s mlm_loss : 7.2810 nsp_loss : 0.0000 total_loss : 7.2810 avg_loss_step : 7.2810 learning_rate : 0.0002947541 loss_scaler : 32768
DLL 2020-11-27 03:17:10.218623 - Iteration: 460 throughput_train : 262.828 seq/s mlm_loss : 7.3477 nsp_loss : 0.0000 total_loss : 7.3477 avg_loss_step : 7.3477 learning_rate : 0.00029448257 loss_scaler : 32768
DLL 2020-11-27 03:17:12.164246 - Iteration: 461 throughput_train : 263.193 seq/s mlm_loss : 7.1943 nsp_loss : 0.0000 total_loss : 7.1943 avg_loss_step : 7.1943 learning_rate : 0.0002942108 loss_scaler : 32768
DLL 2020-11-27 03:17:14.109038 - Iteration: 462 throughput_train : 263.306 seq/s mlm_loss : 7.2505 nsp_loss : 0.0000 total_loss : 7.2505 avg_loss_step : 7.2505 learning_rate : 0.00029393873 loss_scaler : 32768
DLL 2020-11-27 03:17:16.067267 - Iteration: 463 throughput_train : 261.500 seq/s mlm_loss : 7.1528 nsp_loss : 0.0000 total_loss : 7.1528 avg_loss_step : 7.1528 learning_rate : 0.00029366647 loss_scaler : 32768
DLL 2020-11-27 03:17:18.014657 - Iteration: 464 throughput_train : 262.954 seq/s mlm_loss : 7.2128 nsp_loss : 0.0000 total_loss : 7.2128 avg_loss_step : 7.2128 learning_rate : 0.0002933939 loss_scaler : 32768
DLL 2020-11-27 03:17:19.962783 - Iteration: 465 throughput_train : 262.855 seq/s mlm_loss : 7.1456 nsp_loss : 0.0000 total_loss : 7.1456 avg_loss_step : 7.1456 learning_rate : 0.00029312112 loss_scaler : 32768
DLL 2020-11-27 03:17:21.912294 - Iteration: 466 throughput_train : 262.668 seq/s mlm_loss : 7.2387 nsp_loss : 0.0000 total_loss : 7.2387 avg_loss_step : 7.2387 learning_rate : 0.00029284807 loss_scaler : 32768
DLL 2020-11-27 03:17:23.856882 - Iteration: 467 throughput_train : 263.333 seq/s mlm_loss : 6.9721 nsp_loss : 0.0000 total_loss : 6.9721 avg_loss_step : 6.9721 learning_rate : 0.00029257475 loss_scaler : 32768
DLL 2020-11-27 03:17:25.792373 - Iteration: 468 throughput_train : 264.572 seq/s mlm_loss : 7.3186 nsp_loss : 0.0000 total_loss : 7.3186 avg_loss_step : 7.3186 learning_rate : 0.0002923012 loss_scaler : 32768
DLL 2020-11-27 03:17:27.737909 - Iteration: 469 throughput_train : 263.206 seq/s mlm_loss : 7.2957 nsp_loss : 0.0000 total_loss : 7.2957 avg_loss_step : 7.2957 learning_rate : 0.0002920274 loss_scaler : 32768
DLL 2020-11-27 03:17:29.689296 - Iteration: 470 throughput_train : 262.416 seq/s mlm_loss : 7.2355 nsp_loss : 0.0000 total_loss : 7.2355 avg_loss_step : 7.2355 learning_rate : 0.0002917533 loss_scaler : 32768
DLL 2020-11-27 03:17:31.635084 - Iteration: 471 throughput_train : 263.172 seq/s mlm_loss : 7.0653 nsp_loss : 0.0000 total_loss : 7.0653 avg_loss_step : 7.0653 learning_rate : 0.000291479 loss_scaler : 32768
DLL 2020-11-27 03:17:33.580531 - Iteration: 472 throughput_train : 263.221 seq/s mlm_loss : 7.1232 nsp_loss : 0.0000 total_loss : 7.1232 avg_loss_step : 7.1232 learning_rate : 0.00029120437 loss_scaler : 32768
DLL 2020-11-27 03:17:35.533341 - Iteration: 473 throughput_train : 262.225 seq/s mlm_loss : 7.1772 nsp_loss : 0.0000 total_loss : 7.1772 avg_loss_step : 7.1772 learning_rate : 0.0002909295 loss_scaler : 32768
DLL 2020-11-27 03:17:37.472083 - Iteration: 474 throughput_train : 264.130 seq/s mlm_loss : 7.1375 nsp_loss : 0.0000 total_loss : 7.1375 avg_loss_step : 7.1375 learning_rate : 0.00029065443 loss_scaler : 32768
DLL 2020-11-27 03:17:39.417440 - Iteration: 475 throughput_train : 263.230 seq/s mlm_loss : 7.1084 nsp_loss : 0.0000 total_loss : 7.1084 avg_loss_step : 7.1084 learning_rate : 0.00029037904 loss_scaler : 32768
DLL 2020-11-27 03:17:41.365612 - Iteration: 476 throughput_train : 262.850 seq/s mlm_loss : 7.0644 nsp_loss : 0.0000 total_loss : 7.0644 avg_loss_step : 7.0644 learning_rate : 0.0002901034 loss_scaler : 32768
DLL 2020-11-27 03:17:43.316128 - Iteration: 477 throughput_train : 262.535 seq/s mlm_loss : 7.2486 nsp_loss : 0.0000 total_loss : 7.2486 avg_loss_step : 7.2486 learning_rate : 0.00028982753 loss_scaler : 32768
DLL 2020-11-27 03:17:45.250002 - Iteration: 478 throughput_train : 264.792 seq/s mlm_loss : 7.1969 nsp_loss : 0.0000 total_loss : 7.1969 avg_loss_step : 7.1969 learning_rate : 0.00028955136 loss_scaler : 32768
DLL 2020-11-27 03:17:47.210227 - Iteration: 479 throughput_train : 261.233 seq/s mlm_loss : 7.3055 nsp_loss : 0.0000 total_loss : 7.3055 avg_loss_step : 7.3055 learning_rate : 0.00028927496 loss_scaler : 32768
DLL 2020-11-27 03:17:49.163325 - Iteration: 480 throughput_train : 262.187 seq/s mlm_loss : 7.1899 nsp_loss : 0.0000 total_loss : 7.1899 avg_loss_step : 7.1899 learning_rate : 0.00028899824 loss_scaler : 32768
DLL 2020-11-27 03:17:51.094211 - Iteration: 481 throughput_train : 265.204 seq/s mlm_loss : 7.1026 nsp_loss : 0.0000 total_loss : 7.1026 avg_loss_step : 7.1026 learning_rate : 0.0002887213 loss_scaler : 32768
DLL 2020-11-27 03:17:53.038726 - Iteration: 482 throughput_train : 263.343 seq/s mlm_loss : 7.2032 nsp_loss : 0.0000 total_loss : 7.2032 avg_loss_step : 7.2032 learning_rate : 0.00028844408 loss_scaler : 32768
DLL 2020-11-27 03:17:54.986559 - Iteration: 483 throughput_train : 262.895 seq/s mlm_loss : 7.3204 nsp_loss : 0.0000 total_loss : 7.3204 avg_loss_step : 7.3204 learning_rate : 0.0002881666 loss_scaler : 32768
DLL 2020-11-27 03:17:56.922554 - Iteration: 484 throughput_train : 264.503 seq/s mlm_loss : 7.1323 nsp_loss : 0.0000 total_loss : 7.1323 avg_loss_step : 7.1323 learning_rate : 0.00028788886 loss_scaler : 32768
DLL 2020-11-27 03:17:58.882423 - Iteration: 485 throughput_train : 261.281 seq/s mlm_loss : 7.2339 nsp_loss : 0.0000 total_loss : 7.2339 avg_loss_step : 7.2339 learning_rate : 0.00028761083 loss_scaler : 32768
DLL 2020-11-27 03:18:00.827993 - Iteration: 486 throughput_train : 263.206 seq/s mlm_loss : 7.2105 nsp_loss : 0.0000 total_loss : 7.2105 avg_loss_step : 7.2105 learning_rate : 0.00028733254 loss_scaler : 32768
DLL 2020-11-27 03:18:02.779055 - Iteration: 487 throughput_train : 262.466 seq/s mlm_loss : 7.3181 nsp_loss : 0.0000 total_loss : 7.3181 avg_loss_step : 7.3181 learning_rate : 0.000287054 loss_scaler : 32768
DLL 2020-11-27 03:18:04.728417 - Iteration: 488 throughput_train : 262.692 seq/s mlm_loss : 7.3262 nsp_loss : 0.0000 total_loss : 7.3262 avg_loss_step : 7.3262 learning_rate : 0.00028677515 loss_scaler : 32768
DLL 2020-11-27 03:18:06.674498 - Iteration: 489 throughput_train : 263.151 seq/s mlm_loss : 7.2364 nsp_loss : 0.0000 total_loss : 7.2364 avg_loss_step : 7.2364 learning_rate : 0.00028649607 loss_scaler : 32768
DLL 2020-11-27 03:18:08.612851 - Iteration: 490 throughput_train : 264.196 seq/s mlm_loss : 7.2315 nsp_loss : 0.0000 total_loss : 7.2315 avg_loss_step : 7.2315 learning_rate : 0.00028621667 loss_scaler : 32768
DLL 2020-11-27 03:18:10.557102 - Iteration: 491 throughput_train : 263.380 seq/s mlm_loss : 7.1599 nsp_loss : 0.0000 total_loss : 7.1599 avg_loss_step : 7.1599 learning_rate : 0.00028593704 loss_scaler : 32768
DLL 2020-11-27 03:18:12.502386 - Iteration: 492 throughput_train : 263.240 seq/s mlm_loss : 7.1576 nsp_loss : 0.0000 total_loss : 7.1576 avg_loss_step : 7.1576 learning_rate : 0.00028565712 loss_scaler : 32768
DLL 2020-11-27 03:18:14.450672 - Iteration: 493 throughput_train : 262.835 seq/s mlm_loss : 7.1592 nsp_loss : 0.0000 total_loss : 7.1592 avg_loss_step : 7.1592 learning_rate : 0.0002853769 loss_scaler : 32768
DLL 2020-11-27 03:18:16.403204 - Iteration: 494 throughput_train : 262.653 seq/s mlm_loss : 7.1827 nsp_loss : 0.0000 total_loss : 7.1827 avg_loss_step : 7.1827 learning_rate : 0.00028509647 loss_scaler : 32768
DLL 2020-11-27 03:18:18.346185 - Iteration: 495 throughput_train : 263.552 seq/s mlm_loss : 7.2456 nsp_loss : 0.0000 total_loss : 7.2456 avg_loss_step : 7.2456 learning_rate : 0.0002848157 loss_scaler : 32768
DLL 2020-11-27 03:18:20.293166 - Iteration: 496 throughput_train : 263.010 seq/s mlm_loss : 7.2091 nsp_loss : 0.0000 total_loss : 7.2091 avg_loss_step : 7.2091 learning_rate : 0.00028453468 loss_scaler : 32768
DLL 2020-11-27 03:18:22.232044 - Iteration: 497 throughput_train : 264.112 seq/s mlm_loss : 7.1442 nsp_loss : 0.0000 total_loss : 7.1442 avg_loss_step : 7.1442 learning_rate : 0.0002842534 loss_scaler : 32768
DLL 2020-11-27 03:18:24.182440 - Iteration: 498 throughput_train : 262.559 seq/s mlm_loss : 7.1780 nsp_loss : 0.0000 total_loss : 7.1780 avg_loss_step : 7.1780 learning_rate : 0.0002839718 loss_scaler : 32768
DLL 2020-11-27 03:18:26.118348 - Iteration: 499 throughput_train : 264.517 seq/s mlm_loss : 7.1481 nsp_loss : 0.0000 total_loss : 7.1481 avg_loss_step : 7.1481 learning_rate : 0.00028368997 loss_scaler : 32768
DLL 2020-11-27 03:18:28.067147 - Iteration: 500 throughput_train : 262.767 seq/s mlm_loss : 7.1578 nsp_loss : 0.0000 total_loss : 7.1578 avg_loss_step : 7.1578 learning_rate : 0.00028340783 loss_scaler : 32768
DLL 2020-11-27 03:18:30.029405 - Iteration: 501 throughput_train : 260.962 seq/s mlm_loss : 7.1923 nsp_loss : 0.0000 total_loss : 7.1923 avg_loss_step : 7.1923 learning_rate : 0.00028312538 loss_scaler : 32768
DLL 2020-11-27 03:18:31.981118 - Iteration: 502 throughput_train : 262.373 seq/s mlm_loss : 7.0698 nsp_loss : 0.0000 total_loss : 7.0698 avg_loss_step : 7.0698 learning_rate : 0.0002828427 loss_scaler : 32768
DLL 2020-11-27 03:18:33.929806 - Iteration: 503 throughput_train : 262.782 seq/s mlm_loss : 7.1298 nsp_loss : 0.0000 total_loss : 7.1298 avg_loss_step : 7.1298 learning_rate : 0.0002825597 loss_scaler : 32768
DLL 2020-11-27 03:18:35.887803 - Iteration: 504 throughput_train : 261.539 seq/s mlm_loss : 7.1348 nsp_loss : 0.0000 total_loss : 7.1348 avg_loss_step : 7.1348 learning_rate : 0.00028227642 loss_scaler : 32768
DLL 2020-11-27 03:18:37.837362 - Iteration: 505 throughput_train : 262.661 seq/s mlm_loss : 7.0968 nsp_loss : 0.0000 total_loss : 7.0968 avg_loss_step : 7.0968 learning_rate : 0.0002819929 loss_scaler : 32768
DLL 2020-11-27 03:18:39.777842 - Iteration: 506 throughput_train : 263.893 seq/s mlm_loss : 7.1746 nsp_loss : 0.0000 total_loss : 7.1746 avg_loss_step : 7.1746 learning_rate : 0.00028170907 loss_scaler : 32768
DLL 2020-11-27 03:18:41.712612 - Iteration: 507 throughput_train : 264.670 seq/s mlm_loss : 7.0472 nsp_loss : 0.0000 total_loss : 7.0472 avg_loss_step : 7.0472 learning_rate : 0.00028142493 loss_scaler : 32768
DLL 2020-11-27 03:18:43.658805 - Iteration: 508 throughput_train : 263.117 seq/s mlm_loss : 7.1247 nsp_loss : 0.0000 total_loss : 7.1247 avg_loss_step : 7.1247 learning_rate : 0.0002811405 loss_scaler : 32768
DLL 2020-11-27 03:18:45.608370 - Iteration: 509 throughput_train : 262.662 seq/s mlm_loss : 7.1646 nsp_loss : 0.0000 total_loss : 7.1646 avg_loss_step : 7.1646 learning_rate : 0.0002808558 loss_scaler : 32768
DLL 2020-11-27 03:18:47.562977 - Iteration: 510 throughput_train : 261.985 seq/s mlm_loss : 7.1234 nsp_loss : 0.0000 total_loss : 7.1234 avg_loss_step : 7.1234 learning_rate : 0.00028057082 loss_scaler : 32768
DLL 2020-11-27 03:18:49.510394 - Iteration: 511 throughput_train : 262.951 seq/s mlm_loss : 7.1751 nsp_loss : 0.0000 total_loss : 7.1751 avg_loss_step : 7.1751 learning_rate : 0.00028028557 loss_scaler : 32768
DLL 2020-11-27 03:18:51.460393 - Iteration: 512 throughput_train : 262.604 seq/s mlm_loss : 7.1782 nsp_loss : 0.0000 total_loss : 7.1782 avg_loss_step : 7.1782 learning_rate : 0.00027999998 loss_scaler : 32768
DLL 2020-11-27 03:18:53.401768 - Iteration: 513 throughput_train : 263.771 seq/s mlm_loss : 7.2937 nsp_loss : 0.0000 total_loss : 7.2937 avg_loss_step : 7.2937 learning_rate : 0.00027971412 loss_scaler : 32768
DLL 2020-11-27 03:18:55.344469 - Iteration: 514 throughput_train : 263.590 seq/s mlm_loss : 7.2783 nsp_loss : 0.0000 total_loss : 7.2783 avg_loss_step : 7.2783 learning_rate : 0.00027942797 loss_scaler : 32768
DLL 2020-11-27 03:18:57.287717 - Iteration: 515 throughput_train : 263.518 seq/s mlm_loss : 7.2636 nsp_loss : 0.0000 total_loss : 7.2636 avg_loss_step : 7.2636 learning_rate : 0.00027914153 loss_scaler : 32768
DLL 2020-11-27 03:18:59.237231 - Iteration: 516 throughput_train : 262.671 seq/s mlm_loss : 7.2480 nsp_loss : 0.0000 total_loss : 7.2480 avg_loss_step : 7.2480 learning_rate : 0.0002788548 loss_scaler : 32768
DLL 2020-11-27 03:19:01.186263 - Iteration: 517 throughput_train : 262.733 seq/s mlm_loss : 7.0492 nsp_loss : 0.0000 total_loss : 7.0492 avg_loss_step : 7.0492 learning_rate : 0.00027856775 loss_scaler : 32768
DLL 2020-11-27 03:19:03.135507 - Iteration: 518 throughput_train : 262.704 seq/s mlm_loss : 7.1615 nsp_loss : 0.0000 total_loss : 7.1615 avg_loss_step : 7.1615 learning_rate : 0.0002782804 loss_scaler : 32768
DLL 2020-11-27 03:19:05.081326 - Iteration: 519 throughput_train : 263.169 seq/s mlm_loss : 7.1776 nsp_loss : 0.0000 total_loss : 7.1776 avg_loss_step : 7.1776 learning_rate : 0.0002779928 loss_scaler : 32768
DLL 2020-11-27 03:19:07.025144 - Iteration: 520 throughput_train : 263.440 seq/s mlm_loss : 7.2151 nsp_loss : 0.0000 total_loss : 7.2151 avg_loss_step : 7.2151 learning_rate : 0.00027770488 loss_scaler : 32768
DLL 2020-11-27 03:19:08.972771 - Iteration: 521 throughput_train : 262.925 seq/s mlm_loss : 7.1575 nsp_loss : 0.0000 total_loss : 7.1575 avg_loss_step : 7.1575 learning_rate : 0.00027741664 loss_scaler : 32768
DLL 2020-11-27 03:19:10.912775 - Iteration: 522 throughput_train : 263.957 seq/s mlm_loss : 7.0164 nsp_loss : 0.0000 total_loss : 7.0164 avg_loss_step : 7.0164 learning_rate : 0.00027712813 loss_scaler : 32768
DLL 2020-11-27 03:19:12.867211 - Iteration: 523 throughput_train : 262.006 seq/s mlm_loss : 7.1357 nsp_loss : 0.0000 total_loss : 7.1357 avg_loss_step : 7.1357 learning_rate : 0.0002768393 loss_scaler : 32768
DLL 2020-11-27 03:19:14.815340 - Iteration: 524 throughput_train : 262.855 seq/s mlm_loss : 7.0565 nsp_loss : 0.0000 total_loss : 7.0565 avg_loss_step : 7.0565 learning_rate : 0.00027655016 loss_scaler : 32768
DLL 2020-11-27 03:19:16.765248 - Iteration: 525 throughput_train : 262.617 seq/s mlm_loss : 7.1156 nsp_loss : 0.0000 total_loss : 7.1156 avg_loss_step : 7.1156 learning_rate : 0.00027626075 loss_scaler : 32768
DLL 2020-11-27 03:19:18.713168 - Iteration: 526 throughput_train : 262.883 seq/s mlm_loss : 7.0051 nsp_loss : 0.0000 total_loss : 7.0051 avg_loss_step : 7.0051 learning_rate : 0.000275971 loss_scaler : 32768
DLL 2020-11-27 03:19:20.658228 - Iteration: 527 throughput_train : 263.273 seq/s mlm_loss : 6.9388 nsp_loss : 0.0000 total_loss : 6.9388 avg_loss_step : 6.9388 learning_rate : 0.00027568097 loss_scaler : 32768
DLL 2020-11-27 03:19:22.603323 - Iteration: 528 throughput_train : 263.267 seq/s mlm_loss : 6.9356 nsp_loss : 0.0000 total_loss : 6.9356 avg_loss_step : 6.9356 learning_rate : 0.00027539063 loss_scaler : 32768
DLL 2020-11-27 03:19:24.541240 - Iteration: 529 throughput_train : 264.252 seq/s mlm_loss : 6.9701 nsp_loss : 0.0000 total_loss : 6.9701 avg_loss_step : 6.9701 learning_rate : 0.00027509997 loss_scaler : 32768
DLL 2020-11-27 03:19:26.487712 - Iteration: 530 throughput_train : 263.082 seq/s mlm_loss : 7.0495 nsp_loss : 0.0000 total_loss : 7.0495 avg_loss_step : 7.0495 learning_rate : 0.00027480902 loss_scaler : 32768
DLL 2020-11-27 03:19:28.429878 - Iteration: 531 throughput_train : 263.671 seq/s mlm_loss : 7.1275 nsp_loss : 0.0000 total_loss : 7.1275 avg_loss_step : 7.1275 learning_rate : 0.00027451775 loss_scaler : 32768
DLL 2020-11-27 03:19:30.362474 - Iteration: 532 throughput_train : 264.970 seq/s mlm_loss : 7.2831 nsp_loss : 0.0000 total_loss : 7.2831 avg_loss_step : 7.2831 learning_rate : 0.00027422616 loss_scaler : 32768
DLL 2020-11-27 03:19:32.318731 - Iteration: 533 throughput_train : 261.765 seq/s mlm_loss : 7.0477 nsp_loss : 0.0000 total_loss : 7.0477 avg_loss_step : 7.0477 learning_rate : 0.00027393427 loss_scaler : 32768
DLL 2020-11-27 03:19:34.267678 - Iteration: 534 throughput_train : 262.745 seq/s mlm_loss : 7.2889 nsp_loss : 0.0000 total_loss : 7.2889 avg_loss_step : 7.2889 learning_rate : 0.0002736421 loss_scaler : 32768
DLL 2020-11-27 03:19:36.212019 - Iteration: 535 throughput_train : 263.367 seq/s mlm_loss : 7.2063 nsp_loss : 0.0000 total_loss : 7.2063 avg_loss_step : 7.2063 learning_rate : 0.00027334958 loss_scaler : 32768
DLL 2020-11-27 03:19:38.155338 - Iteration: 536 throughput_train : 263.506 seq/s mlm_loss : 7.0838 nsp_loss : 0.0000 total_loss : 7.0838 avg_loss_step : 7.0838 learning_rate : 0.00027305676 loss_scaler : 32768
DLL 2020-11-27 03:19:40.101339 - Iteration: 537 throughput_train : 263.145 seq/s mlm_loss : 7.1725 nsp_loss : 0.0000 total_loss : 7.1725 avg_loss_step : 7.1725 learning_rate : 0.00027276363 loss_scaler : 32768
DLL 2020-11-27 03:19:42.045766 - Iteration: 538 throughput_train : 263.361 seq/s mlm_loss : 7.3377 nsp_loss : 0.0000 total_loss : 7.3377 avg_loss_step : 7.3377 learning_rate : 0.00027247018 loss_scaler : 32768
DLL 2020-11-27 03:19:43.988432 - Iteration: 539 throughput_train : 263.595 seq/s mlm_loss : 7.2170 nsp_loss : 0.0000 total_loss : 7.2170 avg_loss_step : 7.2170 learning_rate : 0.0002721764 loss_scaler : 32768
DLL 2020-11-27 03:19:45.929438 - Iteration: 540 throughput_train : 263.824 seq/s mlm_loss : 7.0227 nsp_loss : 0.0000 total_loss : 7.0227 avg_loss_step : 7.0227 learning_rate : 0.0002718823 loss_scaler : 32768
DLL 2020-11-27 03:19:47.877439 - Iteration: 541 throughput_train : 262.874 seq/s mlm_loss : 7.0522 nsp_loss : 0.0000 total_loss : 7.0522 avg_loss_step : 7.0522 learning_rate : 0.00027158792 loss_scaler : 32768
DLL 2020-11-27 03:19:49.821037 - Iteration: 542 throughput_train : 263.470 seq/s mlm_loss : 7.2414 nsp_loss : 0.0000 total_loss : 7.2414 avg_loss_step : 7.2414 learning_rate : 0.0002712932 loss_scaler : 32768
DLL 2020-11-27 03:19:51.782809 - Iteration: 543 throughput_train : 261.027 seq/s mlm_loss : 7.2697 nsp_loss : 0.0000 total_loss : 7.2697 avg_loss_step : 7.2697 learning_rate : 0.00027099813 loss_scaler : 32768
DLL 2020-11-27 03:19:53.730627 - Iteration: 544 throughput_train : 262.897 seq/s mlm_loss : 7.2600 nsp_loss : 0.0000 total_loss : 7.2600 avg_loss_step : 7.2600 learning_rate : 0.00027070276 loss_scaler : 32768
DLL 2020-11-27 03:19:55.680624 - Iteration: 545 throughput_train : 262.603 seq/s mlm_loss : 7.0905 nsp_loss : 0.0000 total_loss : 7.0905 avg_loss_step : 7.0905 learning_rate : 0.00027040706 loss_scaler : 32768
DLL 2020-11-27 03:19:57.611166 - Iteration: 546 throughput_train : 265.250 seq/s mlm_loss : 7.0384 nsp_loss : 0.0000 total_loss : 7.0384 avg_loss_step : 7.0384 learning_rate : 0.00027011108 loss_scaler : 32768
DLL 2020-11-27 03:19:59.552783 - Iteration: 547 throughput_train : 263.738 seq/s mlm_loss : 7.0897 nsp_loss : 0.0000 total_loss : 7.0897 avg_loss_step : 7.0897 learning_rate : 0.00026981474 loss_scaler : 32768
DLL 2020-11-27 03:20:01.490483 - Iteration: 548 throughput_train : 264.270 seq/s mlm_loss : 7.0288 nsp_loss : 0.0000 total_loss : 7.0288 avg_loss_step : 7.0288 learning_rate : 0.0002695181 loss_scaler : 32768
DLL 2020-11-27 03:20:03.442452 - Iteration: 549 throughput_train : 262.340 seq/s mlm_loss : 7.0422 nsp_loss : 0.0000 total_loss : 7.0422 avg_loss_step : 7.0422 learning_rate : 0.00026922108 loss_scaler : 32768
DLL 2020-11-27 03:20:05.393829 - Iteration: 550 throughput_train : 262.419 seq/s mlm_loss : 7.1266 nsp_loss : 0.0000 total_loss : 7.1266 avg_loss_step : 7.1266 learning_rate : 0.00026892376 loss_scaler : 32768
DLL 2020-11-27 03:20:07.338064 - Iteration: 551 throughput_train : 263.382 seq/s mlm_loss : 7.2177 nsp_loss : 0.0000 total_loss : 7.2177 avg_loss_step : 7.2177 learning_rate : 0.0002686261 loss_scaler : 32768
DLL 2020-11-27 03:20:09.271888 - Iteration: 552 throughput_train : 264.800 seq/s mlm_loss : 7.2969 nsp_loss : 0.0000 total_loss : 7.2969 avg_loss_step : 7.2969 learning_rate : 0.00026832815 loss_scaler : 32768
DLL 2020-11-27 03:20:11.209971 - Iteration: 553 throughput_train : 264.218 seq/s mlm_loss : 7.1176 nsp_loss : 0.0000 total_loss : 7.1176 avg_loss_step : 7.1176 learning_rate : 0.00026802986 loss_scaler : 32768
DLL 2020-11-27 03:20:13.153740 - Iteration: 554 throughput_train : 263.445 seq/s mlm_loss : 7.0981 nsp_loss : 0.0000 total_loss : 7.0981 avg_loss_step : 7.0981 learning_rate : 0.00026773117 loss_scaler : 32768
DLL 2020-11-27 03:20:15.107648 - Iteration: 555 throughput_train : 262.078 seq/s mlm_loss : 7.0844 nsp_loss : 0.0000 total_loss : 7.0844 avg_loss_step : 7.0844 learning_rate : 0.00026743222 loss_scaler : 32768
DLL 2020-11-27 03:20:17.054797 - Iteration: 556 throughput_train : 262.988 seq/s mlm_loss : 7.0976 nsp_loss : 0.0000 total_loss : 7.0976 avg_loss_step : 7.0976 learning_rate : 0.0002671329 loss_scaler : 32768
DLL 2020-11-27 03:20:19.006990 - Iteration: 557 throughput_train : 262.308 seq/s mlm_loss : 7.1912 nsp_loss : 0.0000 total_loss : 7.1912 avg_loss_step : 7.1912 learning_rate : 0.0002668333 loss_scaler : 32768
DLL 2020-11-27 03:20:20.951309 - Iteration: 558 throughput_train : 263.371 seq/s mlm_loss : 7.2057 nsp_loss : 0.0000 total_loss : 7.2057 avg_loss_step : 7.2057 learning_rate : 0.0002665333 loss_scaler : 32768
DLL 2020-11-27 03:20:22.892554 - Iteration: 559 throughput_train : 263.791 seq/s mlm_loss : 7.0939 nsp_loss : 0.0000 total_loss : 7.0939 avg_loss_step : 7.0939 learning_rate : 0.00026623296 loss_scaler : 32768
DLL 2020-11-27 03:20:24.841706 - Iteration: 560 throughput_train : 262.721 seq/s mlm_loss : 6.9790 nsp_loss : 0.0000 total_loss : 6.9790 avg_loss_step : 6.9790 learning_rate : 0.00026593232 loss_scaler : 32768
DLL 2020-11-27 03:20:26.787927 - Iteration: 561 throughput_train : 263.112 seq/s mlm_loss : 7.1894 nsp_loss : 0.0000 total_loss : 7.1894 avg_loss_step : 7.1894 learning_rate : 0.0002656313 loss_scaler : 32768
DLL 2020-11-27 03:20:28.733536 - Iteration: 562 throughput_train : 263.197 seq/s mlm_loss : 7.2672 nsp_loss : 0.0000 total_loss : 7.2672 avg_loss_step : 7.2672 learning_rate : 0.00026532996 loss_scaler : 32768
DLL 2020-11-27 03:20:30.674247 - Iteration: 563 throughput_train : 263.860 seq/s mlm_loss : 7.1595 nsp_loss : 0.0000 total_loss : 7.1595 avg_loss_step : 7.1595 learning_rate : 0.00026502827 loss_scaler : 32768
DLL 2020-11-27 03:20:32.612386 - Iteration: 564 throughput_train : 264.210 seq/s mlm_loss : 7.1448 nsp_loss : 0.0000 total_loss : 7.1448 avg_loss_step : 7.1448 learning_rate : 0.00026472626 loss_scaler : 32768
DLL 2020-11-27 03:20:34.559971 - Iteration: 565 throughput_train : 262.931 seq/s mlm_loss : 7.1942 nsp_loss : 0.0000 total_loss : 7.1942 avg_loss_step : 7.1942 learning_rate : 0.0002644239 loss_scaler : 32768
DLL 2020-11-27 03:20:36.505449 - Iteration: 566 throughput_train : 263.214 seq/s mlm_loss : 7.2053 nsp_loss : 0.0000 total_loss : 7.2053 avg_loss_step : 7.2053 learning_rate : 0.00026412116 loss_scaler : 32768
DLL 2020-11-27 03:20:38.456541 - Iteration: 567 throughput_train : 262.460 seq/s mlm_loss : 7.2482 nsp_loss : 0.0000 total_loss : 7.2482 avg_loss_step : 7.2482 learning_rate : 0.0002638181 loss_scaler : 32768
DLL 2020-11-27 03:20:40.395718 - Iteration: 568 throughput_train : 264.069 seq/s mlm_loss : 7.1790 nsp_loss : 0.0000 total_loss : 7.1790 avg_loss_step : 7.1790 learning_rate : 0.00026351467 loss_scaler : 32768
DLL 2020-11-27 03:20:42.335309 - Iteration: 569 throughput_train : 264.015 seq/s mlm_loss : 7.0584 nsp_loss : 0.0000 total_loss : 7.0584 avg_loss_step : 7.0584 learning_rate : 0.00026321094 loss_scaler : 32768
DLL 2020-11-27 03:20:44.282949 - Iteration: 570 throughput_train : 262.933 seq/s mlm_loss : 6.9871 nsp_loss : 0.0000 total_loss : 6.9871 avg_loss_step : 6.9871 learning_rate : 0.0002629068 loss_scaler : 32768
DLL 2020-11-27 03:20:46.232002 - Iteration: 571 throughput_train : 262.734 seq/s mlm_loss : 7.0186 nsp_loss : 0.0000 total_loss : 7.0186 avg_loss_step : 7.0186 learning_rate : 0.00026260235 loss_scaler : 32768
DLL 2020-11-27 03:20:48.189396 - Iteration: 572 throughput_train : 261.611 seq/s mlm_loss : 7.1793 nsp_loss : 0.0000 total_loss : 7.1793 avg_loss_step : 7.1793 learning_rate : 0.00026229752 loss_scaler : 32768
DLL 2020-11-27 03:20:50.130271 - Iteration: 573 throughput_train : 263.846 seq/s mlm_loss : 7.1713 nsp_loss : 0.0000 total_loss : 7.1713 avg_loss_step : 7.1713 learning_rate : 0.00026199236 loss_scaler : 32768
DLL 2020-11-27 03:20:52.078693 - Iteration: 574 throughput_train : 262.832 seq/s mlm_loss : 7.2160 nsp_loss : 0.0000 total_loss : 7.2160 avg_loss_step : 7.2160 learning_rate : 0.00026168683 loss_scaler : 32768
DLL 2020-11-27 03:20:54.026601 - Iteration: 575 throughput_train : 262.888 seq/s mlm_loss : 6.9750 nsp_loss : 0.0000 total_loss : 6.9750 avg_loss_step : 6.9750 learning_rate : 0.00026138092 loss_scaler : 32768
DLL 2020-11-27 03:20:55.971629 - Iteration: 576 throughput_train : 263.274 seq/s mlm_loss : 7.0823 nsp_loss : 0.0000 total_loss : 7.0823 avg_loss_step : 7.0823 learning_rate : 0.0002610747 loss_scaler : 32768
DLL 2020-11-27 03:20:57.918076 - Iteration: 577 throughput_train : 263.082 seq/s mlm_loss : 7.2118 nsp_loss : 0.0000 total_loss : 7.2118 avg_loss_step : 7.2118 learning_rate : 0.00026076808 loss_scaler : 32768
DLL 2020-11-27 03:20:59.870169 - Iteration: 578 throughput_train : 262.322 seq/s mlm_loss : 7.0925 nsp_loss : 0.0000 total_loss : 7.0925 avg_loss_step : 7.0925 learning_rate : 0.00026046112 loss_scaler : 32768
DLL 2020-11-27 03:21:01.818958 - Iteration: 579 throughput_train : 262.767 seq/s mlm_loss : 7.1628 nsp_loss : 0.0000 total_loss : 7.1628 avg_loss_step : 7.1628 learning_rate : 0.00026015379 loss_scaler : 32768
DLL 2020-11-27 03:21:03.770499 - Iteration: 580 throughput_train : 262.395 seq/s mlm_loss : 7.0864 nsp_loss : 0.0000 total_loss : 7.0864 avg_loss_step : 7.0864 learning_rate : 0.0002598461 loss_scaler : 32768
DLL 2020-11-27 03:21:05.719906 - Iteration: 581 throughput_train : 262.685 seq/s mlm_loss : 6.9141 nsp_loss : 0.0000 total_loss : 6.9141 avg_loss_step : 6.9141 learning_rate : 0.00025953804 loss_scaler : 32768
DLL 2020-11-27 03:21:07.664120 - Iteration: 582 throughput_train : 263.388 seq/s mlm_loss : 7.0748 nsp_loss : 0.0000 total_loss : 7.0748 avg_loss_step : 7.0748 learning_rate : 0.0002592296 loss_scaler : 32768
DLL 2020-11-27 03:21:09.607752 - Iteration: 583 throughput_train : 263.463 seq/s mlm_loss : 7.0926 nsp_loss : 0.0000 total_loss : 7.0926 avg_loss_step : 7.0926 learning_rate : 0.00025892083 loss_scaler : 32768
DLL 2020-11-27 03:21:11.555554 - Iteration: 584 throughput_train : 262.902 seq/s mlm_loss : 7.1773 nsp_loss : 0.0000 total_loss : 7.1773 avg_loss_step : 7.1773 learning_rate : 0.00025861166 loss_scaler : 32768
DLL 2020-11-27 03:21:13.496964 - Iteration: 585 throughput_train : 263.765 seq/s mlm_loss : 7.0778 nsp_loss : 0.0000 total_loss : 7.0778 avg_loss_step : 7.0778 learning_rate : 0.00025830214 loss_scaler : 32768
DLL 2020-11-27 03:21:15.450330 - Iteration: 586 throughput_train : 262.150 seq/s mlm_loss : 7.1004 nsp_loss : 0.0000 total_loss : 7.1004 avg_loss_step : 7.1004 learning_rate : 0.00025799224 loss_scaler : 32768
DLL 2020-11-27 03:21:17.395173 - Iteration: 587 throughput_train : 263.302 seq/s mlm_loss : 7.3831 nsp_loss : 0.0000 total_loss : 7.3831 avg_loss_step : 7.3831 learning_rate : 0.00025768197 loss_scaler : 32768
DLL 2020-11-27 03:21:19.343337 - Iteration: 588 throughput_train : 262.849 seq/s mlm_loss : 7.1921 nsp_loss : 0.0000 total_loss : 7.1921 avg_loss_step : 7.1921 learning_rate : 0.0002573713 loss_scaler : 32768
DLL 2020-11-27 03:21:21.304618 - Iteration: 589 throughput_train : 261.093 seq/s mlm_loss : 7.0348 nsp_loss : 0.0000 total_loss : 7.0349 avg_loss_step : 7.0349 learning_rate : 0.00025706028 loss_scaler : 32768
DLL 2020-11-27 03:21:23.253165 - Iteration: 590 throughput_train : 262.800 seq/s mlm_loss : 6.9590 nsp_loss : 0.0000 total_loss : 6.9590 avg_loss_step : 6.9590 learning_rate : 0.0002567489 loss_scaler : 32768
DLL 2020-11-27 03:21:25.204190 - Iteration: 591 throughput_train : 262.478 seq/s mlm_loss : 7.0475 nsp_loss : 0.0000 total_loss : 7.0475 avg_loss_step : 7.0475 learning_rate : 0.0002564371 loss_scaler : 32768
DLL 2020-11-27 03:21:27.149005 - Iteration: 592 throughput_train : 263.316 seq/s mlm_loss : 7.0136 nsp_loss : 0.0000 total_loss : 7.0136 avg_loss_step : 7.0136 learning_rate : 0.00025612494 loss_scaler : 32768
DLL 2020-11-27 03:21:29.090985 - Iteration: 593 throughput_train : 263.688 seq/s mlm_loss : 7.0588 nsp_loss : 0.0000 total_loss : 7.0588 avg_loss_step : 7.0588 learning_rate : 0.00025581243 loss_scaler : 32768
DLL 2020-11-27 03:21:31.034236 - Iteration: 594 throughput_train : 263.516 seq/s mlm_loss : 7.1540 nsp_loss : 0.0000 total_loss : 7.1540 avg_loss_step : 7.1540 learning_rate : 0.0002554995 loss_scaler : 32768
DLL 2020-11-27 03:21:32.979977 - Iteration: 595 throughput_train : 263.179 seq/s mlm_loss : 7.1299 nsp_loss : 0.0000 total_loss : 7.1299 avg_loss_step : 7.1299 learning_rate : 0.0002551862 loss_scaler : 32768
DLL 2020-11-27 03:21:34.933181 - Iteration: 596 throughput_train : 262.172 seq/s mlm_loss : 6.9452 nsp_loss : 0.0000 total_loss : 6.9452 avg_loss_step : 6.9452 learning_rate : 0.00025487252 loss_scaler : 32768
DLL 2020-11-27 03:21:36.886540 - Iteration: 597 throughput_train : 262.154 seq/s mlm_loss : 6.9286 nsp_loss : 0.0000 total_loss : 6.9286 avg_loss_step : 6.9286 learning_rate : 0.00025455843 loss_scaler : 32768
DLL 2020-11-27 03:21:38.837778 - Iteration: 598 throughput_train : 262.439 seq/s mlm_loss : 7.0848 nsp_loss : 0.0000 total_loss : 7.0848 avg_loss_step : 7.0848 learning_rate : 0.00025424396 loss_scaler : 32768
DLL 2020-11-27 03:21:40.789211 - Iteration: 599 throughput_train : 262.412 seq/s mlm_loss : 7.1805 nsp_loss : 0.0000 total_loss : 7.1805 avg_loss_step : 7.1805 learning_rate : 0.00025392912 loss_scaler : 32768
DLL 2020-11-27 03:21:42.723330 - Iteration: 600 throughput_train : 264.772 seq/s mlm_loss : 7.0631 nsp_loss : 0.0000 total_loss : 7.0631 avg_loss_step : 7.0631 learning_rate : 0.00025361383 loss_scaler : 32768
DLL 2020-11-27 03:21:44.667979 - Iteration: 601 throughput_train : 263.341 seq/s mlm_loss : 7.1914 nsp_loss : 0.0000 total_loss : 7.1914 avg_loss_step : 7.1914 learning_rate : 0.00025329823 loss_scaler : 32768
DLL 2020-11-27 03:21:46.617833 - Iteration: 602 throughput_train : 262.634 seq/s mlm_loss : 7.1009 nsp_loss : 0.0000 total_loss : 7.1009 avg_loss_step : 7.1009 learning_rate : 0.0002529822 loss_scaler : 32768
DLL 2020-11-27 03:21:48.562578 - Iteration: 603 throughput_train : 263.323 seq/s mlm_loss : 7.1826 nsp_loss : 0.0000 total_loss : 7.1826 avg_loss_step : 7.1826 learning_rate : 0.00025266578 loss_scaler : 32768
DLL 2020-11-27 03:21:50.516434 - Iteration: 604 throughput_train : 262.088 seq/s mlm_loss : 7.0750 nsp_loss : 0.0000 total_loss : 7.0750 avg_loss_step : 7.0750 learning_rate : 0.00025234895 loss_scaler : 32768
DLL 2020-11-27 03:21:52.460586 - Iteration: 605 throughput_train : 263.401 seq/s mlm_loss : 7.1380 nsp_loss : 0.0000 total_loss : 7.1380 avg_loss_step : 7.1380 learning_rate : 0.00025203172 loss_scaler : 32768
DLL 2020-11-27 03:21:54.402108 - Iteration: 606 throughput_train : 263.761 seq/s mlm_loss : 6.9981 nsp_loss : 0.0000 total_loss : 6.9982 avg_loss_step : 6.9982 learning_rate : 0.0002517141 loss_scaler : 32768
DLL 2020-11-27 03:21:56.348618 - Iteration: 607 throughput_train : 263.084 seq/s mlm_loss : 7.1024 nsp_loss : 0.0000 total_loss : 7.1024 avg_loss_step : 7.1024 learning_rate : 0.00025139606 loss_scaler : 32768
DLL 2020-11-27 03:21:58.309506 - Iteration: 608 throughput_train : 261.155 seq/s mlm_loss : 7.1363 nsp_loss : 0.0000 total_loss : 7.1363 avg_loss_step : 7.1363 learning_rate : 0.00025107767 loss_scaler : 32768
DLL 2020-11-27 03:22:00.243032 - Iteration: 609 throughput_train : 264.848 seq/s mlm_loss : 7.1706 nsp_loss : 0.0000 total_loss : 7.1706 avg_loss_step : 7.1706 learning_rate : 0.00025075884 loss_scaler : 32768
DLL 2020-11-27 03:22:02.189001 - Iteration: 610 throughput_train : 263.146 seq/s mlm_loss : 7.0137 nsp_loss : 0.0000 total_loss : 7.0137 avg_loss_step : 7.0137 learning_rate : 0.0002504396 loss_scaler : 32768
DLL 2020-11-27 03:22:04.133536 - Iteration: 611 throughput_train : 263.346 seq/s mlm_loss : 7.1882 nsp_loss : 0.0000 total_loss : 7.1882 avg_loss_step : 7.1882 learning_rate : 0.00025011998 loss_scaler : 32768
DLL 2020-11-27 03:22:06.091560 - Iteration: 612 throughput_train : 261.534 seq/s mlm_loss : 7.0072 nsp_loss : 0.0000 total_loss : 7.0072 avg_loss_step : 7.0072 learning_rate : 0.00024979992 loss_scaler : 32768
DLL 2020-11-27 03:22:08.036186 - Iteration: 613 throughput_train : 263.347 seq/s mlm_loss : 7.2069 nsp_loss : 0.0000 total_loss : 7.2069 avg_loss_step : 7.2069 learning_rate : 0.00024947946 loss_scaler : 32768
DLL 2020-11-27 03:22:09.989369 - Iteration: 614 throughput_train : 262.203 seq/s mlm_loss : 7.0523 nsp_loss : 0.0000 total_loss : 7.0523 avg_loss_step : 7.0523 learning_rate : 0.00024915856 loss_scaler : 32768
DLL 2020-11-27 03:22:11.939356 - Iteration: 615 throughput_train : 262.627 seq/s mlm_loss : 6.9251 nsp_loss : 0.0000 total_loss : 6.9251 avg_loss_step : 6.9251 learning_rate : 0.00024883728 loss_scaler : 32768
DLL 2020-11-27 03:22:13.883108 - Iteration: 616 throughput_train : 263.448 seq/s mlm_loss : 6.9566 nsp_loss : 0.0000 total_loss : 6.9566 avg_loss_step : 6.9566 learning_rate : 0.00024851557 loss_scaler : 32768
DLL 2020-11-27 03:22:15.833874 - Iteration: 617 throughput_train : 262.502 seq/s mlm_loss : 7.1425 nsp_loss : 0.0000 total_loss : 7.1425 avg_loss_step : 7.1425 learning_rate : 0.00024819348 loss_scaler : 32768
DLL 2020-11-27 03:22:17.786790 - Iteration: 618 throughput_train : 262.225 seq/s mlm_loss : 6.9496 nsp_loss : 0.0000 total_loss : 6.9496 avg_loss_step : 6.9496 learning_rate : 0.00024787092 loss_scaler : 32768
DLL 2020-11-27 03:22:19.733366 - Iteration: 619 throughput_train : 263.064 seq/s mlm_loss : 6.9826 nsp_loss : 0.0000 total_loss : 6.9826 avg_loss_step : 6.9826 learning_rate : 0.00024754796 loss_scaler : 32768
DLL 2020-11-27 03:22:21.680087 - Iteration: 620 throughput_train : 263.047 seq/s mlm_loss : 7.0752 nsp_loss : 0.0000 total_loss : 7.0752 avg_loss_step : 7.0752 learning_rate : 0.00024722458 loss_scaler : 32768
DLL 2020-11-27 03:22:23.622326 - Iteration: 621 throughput_train : 263.661 seq/s mlm_loss : 7.1954 nsp_loss : 0.0000 total_loss : 7.1954 avg_loss_step : 7.1954 learning_rate : 0.00024690077 loss_scaler : 32768
DLL 2020-11-27 03:22:25.571710 - Iteration: 622 throughput_train : 262.709 seq/s mlm_loss : 7.1284 nsp_loss : 0.0000 total_loss : 7.1284 avg_loss_step : 7.1284 learning_rate : 0.00024657653 loss_scaler : 32768
DLL 2020-11-27 03:22:27.503534 - Iteration: 623 throughput_train : 265.090 seq/s mlm_loss : 7.0029 nsp_loss : 0.0000 total_loss : 7.0030 avg_loss_step : 7.0030 learning_rate : 0.00024625188 loss_scaler : 32768
DLL 2020-11-27 03:22:29.455088 - Iteration: 624 throughput_train : 262.395 seq/s mlm_loss : 7.1417 nsp_loss : 0.0000 total_loss : 7.1417 avg_loss_step : 7.1417 learning_rate : 0.0002459268 loss_scaler : 32768
DLL 2020-11-27 03:22:31.402952 - Iteration: 625 throughput_train : 262.903 seq/s mlm_loss : 7.2715 nsp_loss : 0.0000 total_loss : 7.2715 avg_loss_step : 7.2715 learning_rate : 0.0002456013 loss_scaler : 32768
DLL 2020-11-27 03:22:33.352297 - Iteration: 626 throughput_train : 262.696 seq/s mlm_loss : 7.2432 nsp_loss : 0.0000 total_loss : 7.2432 avg_loss_step : 7.2432 learning_rate : 0.00024527535 loss_scaler : 32768
DLL 2020-11-27 03:22:35.305663 - Iteration: 627 throughput_train : 262.165 seq/s mlm_loss : 7.1894 nsp_loss : 0.0000 total_loss : 7.1894 avg_loss_step : 7.1894 learning_rate : 0.00024494898 loss_scaler : 32768
DLL 2020-11-27 03:22:37.260157 - Iteration: 628 throughput_train : 261.999 seq/s mlm_loss : 7.0775 nsp_loss : 0.0000 total_loss : 7.0775 avg_loss_step : 7.0775 learning_rate : 0.00024462212 loss_scaler : 32768
DLL 2020-11-27 03:22:39.203642 - Iteration: 629 throughput_train : 263.483 seq/s mlm_loss : 6.9457 nsp_loss : 0.0000 total_loss : 6.9457 avg_loss_step : 6.9457 learning_rate : 0.00024429488 loss_scaler : 32768
DLL 2020-11-27 03:22:41.145131 - Iteration: 630 throughput_train : 263.757 seq/s mlm_loss : 6.9169 nsp_loss : 0.0000 total_loss : 6.9169 avg_loss_step : 6.9169 learning_rate : 0.0002439672 loss_scaler : 32768
DLL 2020-11-27 03:22:43.095743 - Iteration: 631 throughput_train : 262.522 seq/s mlm_loss : 6.8909 nsp_loss : 0.0000 total_loss : 6.8909 avg_loss_step : 6.8909 learning_rate : 0.00024363906 loss_scaler : 32768
DLL 2020-11-27 03:22:45.050648 - Iteration: 632 throughput_train : 261.947 seq/s mlm_loss : 6.9957 nsp_loss : 0.0000 total_loss : 6.9957 avg_loss_step : 6.9957 learning_rate : 0.00024331047 loss_scaler : 32768
DLL 2020-11-27 03:22:46.996099 - Iteration: 633 throughput_train : 263.222 seq/s mlm_loss : 7.0746 nsp_loss : 0.0000 total_loss : 7.0747 avg_loss_step : 7.0747 learning_rate : 0.00024298145 loss_scaler : 32768
DLL 2020-11-27 03:22:48.951313 - Iteration: 634 throughput_train : 261.907 seq/s mlm_loss : 7.0754 nsp_loss : 0.0000 total_loss : 7.0754 avg_loss_step : 7.0754 learning_rate : 0.00024265201 loss_scaler : 32768
DLL 2020-11-27 03:22:50.894697 - Iteration: 635 throughput_train : 263.502 seq/s mlm_loss : 7.1159 nsp_loss : 0.0000 total_loss : 7.1159 avg_loss_step : 7.1159 learning_rate : 0.00024232209 loss_scaler : 32768
DLL 2020-11-27 03:22:52.839473 - Iteration: 636 throughput_train : 263.309 seq/s mlm_loss : 7.0654 nsp_loss : 0.0000 total_loss : 7.0654 avg_loss_step : 7.0654 learning_rate : 0.00024199173 loss_scaler : 32768
DLL 2020-11-27 03:22:54.783736 - Iteration: 637 throughput_train : 263.379 seq/s mlm_loss : 7.0250 nsp_loss : 0.0000 total_loss : 7.0250 avg_loss_step : 7.0250 learning_rate : 0.0002416609 loss_scaler : 32768
DLL 2020-11-27 03:22:56.731716 - Iteration: 638 throughput_train : 262.876 seq/s mlm_loss : 7.0833 nsp_loss : 0.0000 total_loss : 7.0833 avg_loss_step : 7.0833 learning_rate : 0.00024132965 loss_scaler : 32768
DLL 2020-11-27 03:22:58.670307 - Iteration: 639 throughput_train : 264.149 seq/s mlm_loss : 7.0435 nsp_loss : 0.0000 total_loss : 7.0435 avg_loss_step : 7.0435 learning_rate : 0.0002409979 loss_scaler : 32768
DLL 2020-11-27 03:23:00.616991 - Iteration: 640 throughput_train : 263.052 seq/s mlm_loss : 7.1228 nsp_loss : 0.0000 total_loss : 7.1228 avg_loss_step : 7.1228 learning_rate : 0.00024066574 loss_scaler : 32768
DLL 2020-11-27 03:23:02.567431 - Iteration: 641 throughput_train : 262.544 seq/s mlm_loss : 7.0899 nsp_loss : 0.0000 total_loss : 7.0899 avg_loss_step : 7.0899 learning_rate : 0.00024033307 loss_scaler : 32768
DLL 2020-11-27 03:23:04.524282 - Iteration: 642 throughput_train : 261.686 seq/s mlm_loss : 7.1281 nsp_loss : 0.0000 total_loss : 7.1281 avg_loss_step : 7.1281 learning_rate : 0.00023999998 loss_scaler : 32768
DLL 2020-11-27 03:23:06.473948 - Iteration: 643 throughput_train : 262.649 seq/s mlm_loss : 7.0125 nsp_loss : 0.0000 total_loss : 7.0125 avg_loss_step : 7.0125 learning_rate : 0.0002396664 loss_scaler : 32768
DLL 2020-11-27 03:23:08.419540 - Iteration: 644 throughput_train : 263.200 seq/s mlm_loss : 7.0583 nsp_loss : 0.0000 total_loss : 7.0583 avg_loss_step : 7.0583 learning_rate : 0.00023933238 loss_scaler : 32768
DLL 2020-11-27 03:23:10.376749 - Iteration: 645 throughput_train : 261.636 seq/s mlm_loss : 6.9369 nsp_loss : 0.0000 total_loss : 6.9369 avg_loss_step : 6.9369 learning_rate : 0.0002389979 loss_scaler : 32768
DLL 2020-11-27 03:23:12.318104 - Iteration: 646 throughput_train : 263.773 seq/s mlm_loss : 7.1417 nsp_loss : 0.0000 total_loss : 7.1417 avg_loss_step : 7.1417 learning_rate : 0.00023866293 loss_scaler : 32768
DLL 2020-11-27 03:23:14.262519 - Iteration: 647 throughput_train : 263.359 seq/s mlm_loss : 7.1459 nsp_loss : 0.0000 total_loss : 7.1459 avg_loss_step : 7.1459 learning_rate : 0.0002383275 loss_scaler : 32768
DLL 2020-11-27 03:23:16.202768 - Iteration: 648 throughput_train : 263.930 seq/s mlm_loss : 7.0461 nsp_loss : 0.0000 total_loss : 7.0461 avg_loss_step : 7.0461 learning_rate : 0.00023799158 loss_scaler : 32768
DLL 2020-11-27 03:23:18.151602 - Iteration: 649 throughput_train : 262.768 seq/s mlm_loss : 6.9983 nsp_loss : 0.0000 total_loss : 6.9983 avg_loss_step : 6.9983 learning_rate : 0.0002376552 loss_scaler : 32768
DLL 2020-11-27 03:23:20.089050 - Iteration: 650 throughput_train : 264.309 seq/s mlm_loss : 7.0473 nsp_loss : 0.0000 total_loss : 7.0473 avg_loss_step : 7.0473 learning_rate : 0.00023731834 loss_scaler : 32768
DLL 2020-11-27 03:23:22.039600 - Iteration: 651 throughput_train : 262.529 seq/s mlm_loss : 7.0731 nsp_loss : 0.0000 total_loss : 7.0731 avg_loss_step : 7.0731 learning_rate : 0.00023698098 loss_scaler : 32768
DLL 2020-11-27 03:23:23.973682 - Iteration: 652 throughput_train : 264.768 seq/s mlm_loss : 7.1553 nsp_loss : 0.0000 total_loss : 7.1553 avg_loss_step : 7.1553 learning_rate : 0.00023664316 loss_scaler : 32768
DLL 2020-11-27 03:23:25.931862 - Iteration: 653 throughput_train : 261.518 seq/s mlm_loss : 7.1410 nsp_loss : 0.0000 total_loss : 7.1410 avg_loss_step : 7.1410 learning_rate : 0.00023630487 loss_scaler : 32768
DLL 2020-11-27 03:23:27.872907 - Iteration: 654 throughput_train : 263.826 seq/s mlm_loss : 7.1481 nsp_loss : 0.0000 total_loss : 7.1481 avg_loss_step : 7.1481 learning_rate : 0.00023596609 loss_scaler : 32768
DLL 2020-11-27 03:23:29.816172 - Iteration: 655 throughput_train : 263.525 seq/s mlm_loss : 7.1475 nsp_loss : 0.0000 total_loss : 7.1475 avg_loss_step : 7.1475 learning_rate : 0.0002356268 loss_scaler : 32768
DLL 2020-11-27 03:23:31.767129 - Iteration: 656 throughput_train : 262.486 seq/s mlm_loss : 7.0285 nsp_loss : 0.0000 total_loss : 7.0285 avg_loss_step : 7.0285 learning_rate : 0.00023528704 loss_scaler : 32768
DLL 2020-11-27 03:23:33.721638 - Iteration: 657 throughput_train : 262.008 seq/s mlm_loss : 6.7844 nsp_loss : 0.0000 total_loss : 6.7844 avg_loss_step : 6.7844 learning_rate : 0.0002349468 loss_scaler : 32768
DLL 2020-11-27 03:23:35.662933 - Iteration: 658 throughput_train : 263.792 seq/s mlm_loss : 6.9796 nsp_loss : 0.0000 total_loss : 6.9796 avg_loss_step : 6.9796 learning_rate : 0.00023460605 loss_scaler : 32768
DLL 2020-11-27 03:23:37.613840 - Iteration: 659 throughput_train : 262.493 seq/s mlm_loss : 7.1014 nsp_loss : 0.0000 total_loss : 7.1014 avg_loss_step : 7.1014 learning_rate : 0.0002342648 loss_scaler : 32768
DLL 2020-11-27 03:23:39.558120 - Iteration: 660 throughput_train : 263.390 seq/s mlm_loss : 6.9130 nsp_loss : 0.0000 total_loss : 6.9130 avg_loss_step : 6.9130 learning_rate : 0.00023392304 loss_scaler : 32768
DLL 2020-11-27 03:23:41.504818 - Iteration: 661 throughput_train : 263.066 seq/s mlm_loss : 6.9852 nsp_loss : 0.0000 total_loss : 6.9852 avg_loss_step : 6.9852 learning_rate : 0.0002335808 loss_scaler : 32768
DLL 2020-11-27 03:23:43.447441 - Iteration: 662 throughput_train : 263.612 seq/s mlm_loss : 6.8094 nsp_loss : 0.0000 total_loss : 6.8094 avg_loss_step : 6.8094 learning_rate : 0.00023323807 loss_scaler : 32768
DLL 2020-11-27 03:23:45.397238 - Iteration: 663 throughput_train : 262.643 seq/s mlm_loss : 6.9847 nsp_loss : 0.0000 total_loss : 6.9847 avg_loss_step : 6.9847 learning_rate : 0.00023289482 loss_scaler : 32768
DLL 2020-11-27 03:23:47.348038 - Iteration: 664 throughput_train : 262.505 seq/s mlm_loss : 7.0528 nsp_loss : 0.0000 total_loss : 7.0528 avg_loss_step : 7.0528 learning_rate : 0.00023255104 loss_scaler : 32768
DLL 2020-11-27 03:23:49.295678 - Iteration: 665 throughput_train : 262.924 seq/s mlm_loss : 6.9873 nsp_loss : 0.0000 total_loss : 6.9873 avg_loss_step : 6.9873 learning_rate : 0.00023220679 loss_scaler : 32768
DLL 2020-11-27 03:23:51.243775 - Iteration: 666 throughput_train : 262.871 seq/s mlm_loss : 7.1582 nsp_loss : 0.0000 total_loss : 7.1582 avg_loss_step : 7.1582 learning_rate : 0.00023186201 loss_scaler : 32768
DLL 2020-11-27 03:23:53.192768 - Iteration: 667 throughput_train : 262.748 seq/s mlm_loss : 7.0979 nsp_loss : 0.0000 total_loss : 7.0979 avg_loss_step : 7.0979 learning_rate : 0.00023151671 loss_scaler : 32768
DLL 2020-11-27 03:23:55.142280 - Iteration: 668 throughput_train : 262.672 seq/s mlm_loss : 7.1601 nsp_loss : 0.0000 total_loss : 7.1601 avg_loss_step : 7.1601 learning_rate : 0.00023117094 loss_scaler : 32768
DLL 2020-11-27 03:23:57.104023 - Iteration: 669 throughput_train : 261.047 seq/s mlm_loss : 7.0957 nsp_loss : 0.0000 total_loss : 7.0957 avg_loss_step : 7.0957 learning_rate : 0.00023082459 loss_scaler : 32768
DLL 2020-11-27 03:23:59.047447 - Iteration: 670 throughput_train : 263.515 seq/s mlm_loss : 7.1941 nsp_loss : 0.0000 total_loss : 7.1941 avg_loss_step : 7.1941 learning_rate : 0.00023047773 loss_scaler : 32768
DLL 2020-11-27 03:24:00.991592 - Iteration: 671 throughput_train : 263.401 seq/s mlm_loss : 7.1064 nsp_loss : 0.0000 total_loss : 7.1064 avg_loss_step : 7.1064 learning_rate : 0.00023013038 loss_scaler : 32768
DLL 2020-11-27 03:24:02.946119 - Iteration: 672 throughput_train : 262.000 seq/s mlm_loss : 7.0649 nsp_loss : 0.0000 total_loss : 7.0649 avg_loss_step : 7.0649 learning_rate : 0.0002297825 loss_scaler : 32768
DLL 2020-11-27 03:24:04.888017 - Iteration: 673 throughput_train : 263.698 seq/s mlm_loss : 7.2841 nsp_loss : 0.0000 total_loss : 7.2841 avg_loss_step : 7.2841 learning_rate : 0.00022943408 loss_scaler : 32768
DLL 2020-11-27 03:24:06.836771 - Iteration: 674 throughput_train : 262.783 seq/s mlm_loss : 7.2462 nsp_loss : 0.0000 total_loss : 7.2462 avg_loss_step : 7.2462 learning_rate : 0.00022908511 loss_scaler : 32768
DLL 2020-11-27 03:24:08.780666 - Iteration: 675 throughput_train : 263.437 seq/s mlm_loss : 6.9762 nsp_loss : 0.0000 total_loss : 6.9762 avg_loss_step : 6.9762 learning_rate : 0.00022873563 loss_scaler : 32768
DLL 2020-11-27 03:24:10.728669 - Iteration: 676 throughput_train : 262.875 seq/s mlm_loss : 7.1139 nsp_loss : 0.0000 total_loss : 7.1139 avg_loss_step : 7.1139 learning_rate : 0.00022838563 loss_scaler : 32768
DLL 2020-11-27 03:24:12.678210 - Iteration: 677 throughput_train : 262.673 seq/s mlm_loss : 6.9546 nsp_loss : 0.0000 total_loss : 6.9546 avg_loss_step : 6.9546 learning_rate : 0.00022803509 loss_scaler : 32768
DLL 2020-11-27 03:24:14.619714 - Iteration: 678 throughput_train : 263.755 seq/s mlm_loss : 7.0491 nsp_loss : 0.0000 total_loss : 7.0491 avg_loss_step : 7.0491 learning_rate : 0.00022768397 loss_scaler : 32768
DLL 2020-11-27 03:24:16.560428 - Iteration: 679 throughput_train : 263.875 seq/s mlm_loss : 7.1945 nsp_loss : 0.0000 total_loss : 7.1945 avg_loss_step : 7.1945 learning_rate : 0.00022733232 loss_scaler : 32768
DLL 2020-11-27 03:24:18.500545 - Iteration: 680 throughput_train : 263.955 seq/s mlm_loss : 7.2359 nsp_loss : 0.0000 total_loss : 7.2359 avg_loss_step : 7.2359 learning_rate : 0.00022698016 loss_scaler : 32768
DLL 2020-11-27 03:24:20.450423 - Iteration: 681 throughput_train : 262.631 seq/s mlm_loss : 7.2378 nsp_loss : 0.0000 total_loss : 7.2378 avg_loss_step : 7.2378 learning_rate : 0.00022662744 loss_scaler : 32768
DLL 2020-11-27 03:24:22.396063 - Iteration: 682 throughput_train : 263.204 seq/s mlm_loss : 7.2309 nsp_loss : 0.0000 total_loss : 7.2309 avg_loss_step : 7.2309 learning_rate : 0.00022627415 loss_scaler : 32768
DLL 2020-11-27 03:24:24.336008 - Iteration: 683 throughput_train : 263.976 seq/s mlm_loss : 7.0434 nsp_loss : 0.0000 total_loss : 7.0434 avg_loss_step : 7.0434 learning_rate : 0.00022592032 loss_scaler : 32768
DLL 2020-11-27 03:24:26.280672 - Iteration: 684 throughput_train : 263.335 seq/s mlm_loss : 7.1367 nsp_loss : 0.0000 total_loss : 7.1367 avg_loss_step : 7.1367 learning_rate : 0.00022556593 loss_scaler : 32768
DLL 2020-11-27 03:24:28.228948 - Iteration: 685 throughput_train : 262.845 seq/s mlm_loss : 7.0214 nsp_loss : 0.0000 total_loss : 7.0214 avg_loss_step : 7.0214 learning_rate : 0.000225211 loss_scaler : 32768
DLL 2020-11-27 03:24:30.175596 - Iteration: 686 throughput_train : 263.056 seq/s mlm_loss : 7.0182 nsp_loss : 0.0000 total_loss : 7.0182 avg_loss_step : 7.0182 learning_rate : 0.0002248555 loss_scaler : 32768
DLL 2020-11-27 03:24:32.112172 - Iteration: 687 throughput_train : 264.423 seq/s mlm_loss : 6.9856 nsp_loss : 0.0000 total_loss : 6.9856 avg_loss_step : 6.9856 learning_rate : 0.00022449941 loss_scaler : 32768
DLL 2020-11-27 03:24:34.066370 - Iteration: 688 throughput_train : 262.043 seq/s mlm_loss : 7.1135 nsp_loss : 0.0000 total_loss : 7.1135 avg_loss_step : 7.1135 learning_rate : 0.00022414279 loss_scaler : 32768
DLL 2020-11-27 03:24:36.002090 - Iteration: 689 throughput_train : 264.558 seq/s mlm_loss : 7.0519 nsp_loss : 0.0000 total_loss : 7.0519 avg_loss_step : 7.0519 learning_rate : 0.00022378558 loss_scaler : 32768
DLL 2020-11-27 03:24:37.946283 - Iteration: 690 throughput_train : 263.399 seq/s mlm_loss : 6.9851 nsp_loss : 0.0000 total_loss : 6.9851 avg_loss_step : 6.9851 learning_rate : 0.00022342784 loss_scaler : 32768
DLL 2020-11-27 03:24:39.887002 - Iteration: 691 throughput_train : 263.872 seq/s mlm_loss : 6.9778 nsp_loss : 0.0000 total_loss : 6.9778 avg_loss_step : 6.9778 learning_rate : 0.0002230695 loss_scaler : 32768
DLL 2020-11-27 03:24:41.826115 - Iteration: 692 throughput_train : 264.100 seq/s mlm_loss : 6.8826 nsp_loss : 0.0000 total_loss : 6.8826 avg_loss_step : 6.8826 learning_rate : 0.00022271056 loss_scaler : 32768
DLL 2020-11-27 03:24:43.767739 - Iteration: 693 throughput_train : 263.748 seq/s mlm_loss : 7.1220 nsp_loss : 0.0000 total_loss : 7.1220 avg_loss_step : 7.1220 learning_rate : 0.00022235104 loss_scaler : 32768
DLL 2020-11-27 03:24:45.714254 - Iteration: 694 throughput_train : 263.072 seq/s mlm_loss : 6.9653 nsp_loss : 0.0000 total_loss : 6.9653 avg_loss_step : 6.9653 learning_rate : 0.00022199098 loss_scaler : 32768
DLL 2020-11-27 03:24:47.650199 - Iteration: 695 throughput_train : 264.509 seq/s mlm_loss : 7.0284 nsp_loss : 0.0000 total_loss : 7.0284 avg_loss_step : 7.0284 learning_rate : 0.00022163031 loss_scaler : 32768
DLL 2020-11-27 03:24:49.594442 - Iteration: 696 throughput_train : 263.383 seq/s mlm_loss : 6.9866 nsp_loss : 0.0000 total_loss : 6.9866 avg_loss_step : 6.9866 learning_rate : 0.00022126906 loss_scaler : 32768
DLL 2020-11-27 03:24:51.544360 - Iteration: 697 throughput_train : 262.624 seq/s mlm_loss : 7.0649 nsp_loss : 0.0000 total_loss : 7.0649 avg_loss_step : 7.0649 learning_rate : 0.00022090721 loss_scaler : 32768
DLL 2020-11-27 03:24:53.496310 - Iteration: 698 throughput_train : 262.342 seq/s mlm_loss : 6.9949 nsp_loss : 0.0000 total_loss : 6.9949 avg_loss_step : 6.9949 learning_rate : 0.00022054477 loss_scaler : 32768
DLL 2020-11-27 03:24:55.442191 - Iteration: 699 throughput_train : 263.160 seq/s mlm_loss : 7.2054 nsp_loss : 0.0000 total_loss : 7.2054 avg_loss_step : 7.2054 learning_rate : 0.00022018173 loss_scaler : 32768
DLL 2020-11-27 03:24:57.388667 - Iteration: 700 throughput_train : 263.079 seq/s mlm_loss : 7.0942 nsp_loss : 0.0000 total_loss : 7.0942 avg_loss_step : 7.0942 learning_rate : 0.00021981809 loss_scaler : 32768
DLL 2020-11-27 03:24:59.338946 - Iteration: 701 throughput_train : 262.565 seq/s mlm_loss : 6.9557 nsp_loss : 0.0000 total_loss : 6.9557 avg_loss_step : 6.9557 learning_rate : 0.00021945382 loss_scaler : 32768
DLL 2020-11-27 03:25:01.286580 - Iteration: 702 throughput_train : 262.924 seq/s mlm_loss : 6.8414 nsp_loss : 0.0000 total_loss : 6.8414 avg_loss_step : 6.8414 learning_rate : 0.00021908901 loss_scaler : 32768
DLL 2020-11-27 03:25:03.242617 - Iteration: 703 throughput_train : 261.793 seq/s mlm_loss : 6.8979 nsp_loss : 0.0000 total_loss : 6.8979 avg_loss_step : 6.8979 learning_rate : 0.00021872355 loss_scaler : 32768
DLL 2020-11-27 03:25:05.183694 - Iteration: 704 throughput_train : 263.814 seq/s mlm_loss : 7.0620 nsp_loss : 0.0000 total_loss : 7.0620 avg_loss_step : 7.0620 learning_rate : 0.00021835748 loss_scaler : 32768
DLL 2020-11-27 03:25:07.137410 - Iteration: 705 throughput_train : 262.115 seq/s mlm_loss : 7.0214 nsp_loss : 0.0000 total_loss : 7.0214 avg_loss_step : 7.0214 learning_rate : 0.00021799082 loss_scaler : 32768
DLL 2020-11-27 03:25:09.087071 - Iteration: 706 throughput_train : 262.659 seq/s mlm_loss : 7.1363 nsp_loss : 0.0000 total_loss : 7.1363 avg_loss_step : 7.1363 learning_rate : 0.00021762348 loss_scaler : 32768
DLL 2020-11-27 03:25:11.033028 - Iteration: 707 throughput_train : 263.148 seq/s mlm_loss : 7.0697 nsp_loss : 0.0000 total_loss : 7.0697 avg_loss_step : 7.0697 learning_rate : 0.00021725558 loss_scaler : 32768
DLL 2020-11-27 03:25:12.985814 - Iteration: 708 throughput_train : 262.229 seq/s mlm_loss : 6.9899 nsp_loss : 0.0000 total_loss : 6.9899 avg_loss_step : 6.9899 learning_rate : 0.00021688704 loss_scaler : 32768
DLL 2020-11-27 03:25:14.935678 - Iteration: 709 throughput_train : 262.621 seq/s mlm_loss : 7.1026 nsp_loss : 0.0000 total_loss : 7.1026 avg_loss_step : 7.1026 learning_rate : 0.0002165179 loss_scaler : 32768
DLL 2020-11-27 03:25:16.877306 - Iteration: 710 throughput_train : 263.735 seq/s mlm_loss : 7.2044 nsp_loss : 0.0000 total_loss : 7.2044 avg_loss_step : 7.2044 learning_rate : 0.00021614808 loss_scaler : 32768
DLL 2020-11-27 03:25:18.825260 - Iteration: 711 throughput_train : 262.883 seq/s mlm_loss : 7.0509 nsp_loss : 0.0000 total_loss : 7.0509 avg_loss_step : 7.0509 learning_rate : 0.00021577763 loss_scaler : 32768
DLL 2020-11-27 03:25:20.762617 - Iteration: 712 throughput_train : 264.328 seq/s mlm_loss : 6.9376 nsp_loss : 0.0000 total_loss : 6.9376 avg_loss_step : 6.9376 learning_rate : 0.00021540657 loss_scaler : 32768
DLL 2020-11-27 03:25:22.699380 - Iteration: 713 throughput_train : 264.398 seq/s mlm_loss : 6.9922 nsp_loss : 0.0000 total_loss : 6.9922 avg_loss_step : 6.9922 learning_rate : 0.00021503486 loss_scaler : 32768
DLL 2020-11-27 03:25:24.644235 - Iteration: 714 throughput_train : 263.304 seq/s mlm_loss : 7.0038 nsp_loss : 0.0000 total_loss : 7.0038 avg_loss_step : 7.0038 learning_rate : 0.00021466252 loss_scaler : 32768
DLL 2020-11-27 03:25:26.585650 - Iteration: 715 throughput_train : 263.778 seq/s mlm_loss : 7.0077 nsp_loss : 0.0000 total_loss : 7.0077 avg_loss_step : 7.0077 learning_rate : 0.0002142895 loss_scaler : 32768
DLL 2020-11-27 03:25:28.523752 - Iteration: 716 throughput_train : 264.224 seq/s mlm_loss : 6.9868 nsp_loss : 0.0000 total_loss : 6.9868 avg_loss_step : 6.9868 learning_rate : 0.00021391585 loss_scaler : 32768
DLL 2020-11-27 03:25:30.470348 - Iteration: 717 throughput_train : 263.062 seq/s mlm_loss : 7.0199 nsp_loss : 0.0000 total_loss : 7.0199 avg_loss_step : 7.0199 learning_rate : 0.00021354156 loss_scaler : 32768
DLL 2020-11-27 03:25:32.414807 - Iteration: 718 throughput_train : 263.353 seq/s mlm_loss : 6.9882 nsp_loss : 0.0000 total_loss : 6.9883 avg_loss_step : 6.9883 learning_rate : 0.00021316658 loss_scaler : 32768
DLL 2020-11-27 03:25:34.358761 - Iteration: 719 throughput_train : 263.420 seq/s mlm_loss : 7.0232 nsp_loss : 0.0000 total_loss : 7.0232 avg_loss_step : 7.0232 learning_rate : 0.00021279095 loss_scaler : 32768
DLL 2020-11-27 03:25:36.316535 - Iteration: 720 throughput_train : 261.562 seq/s mlm_loss : 6.9758 nsp_loss : 0.0000 total_loss : 6.9758 avg_loss_step : 6.9758 learning_rate : 0.00021241467 loss_scaler : 32768
DLL 2020-11-27 03:25:38.260948 - Iteration: 721 throughput_train : 263.358 seq/s mlm_loss : 6.9860 nsp_loss : 0.0000 total_loss : 6.9860 avg_loss_step : 6.9860 learning_rate : 0.0002120377 loss_scaler : 32768
DLL 2020-11-27 03:25:40.203508 - Iteration: 722 throughput_train : 263.611 seq/s mlm_loss : 6.7907 nsp_loss : 0.0000 total_loss : 6.7907 avg_loss_step : 6.7907 learning_rate : 0.0002116601 loss_scaler : 32768
DLL 2020-11-27 03:25:42.149083 - Iteration: 723 throughput_train : 263.213 seq/s mlm_loss : 6.9649 nsp_loss : 0.0000 total_loss : 6.9649 avg_loss_step : 6.9649 learning_rate : 0.00021128179 loss_scaler : 32768
DLL 2020-11-27 03:25:44.099758 - Iteration: 724 throughput_train : 262.521 seq/s mlm_loss : 7.0133 nsp_loss : 0.0000 total_loss : 7.0133 avg_loss_step : 7.0133 learning_rate : 0.00021090278 loss_scaler : 32768
DLL 2020-11-27 03:25:46.039909 - Iteration: 725 throughput_train : 263.936 seq/s mlm_loss : 7.0436 nsp_loss : 0.0000 total_loss : 7.0436 avg_loss_step : 7.0436 learning_rate : 0.00021052313 loss_scaler : 32768
DLL 2020-11-27 03:25:47.978228 - Iteration: 726 throughput_train : 264.188 seq/s mlm_loss : 6.9242 nsp_loss : 0.0000 total_loss : 6.9242 avg_loss_step : 6.9242 learning_rate : 0.0002101428 loss_scaler : 32768
DLL 2020-11-27 03:25:49.922639 - Iteration: 727 throughput_train : 263.369 seq/s mlm_loss : 6.9340 nsp_loss : 0.0000 total_loss : 6.9340 avg_loss_step : 6.9340 learning_rate : 0.00020976175 loss_scaler : 32768
DLL 2020-11-27 03:25:51.869811 - Iteration: 728 throughput_train : 262.997 seq/s mlm_loss : 6.9410 nsp_loss : 0.0000 total_loss : 6.9410 avg_loss_step : 6.9410 learning_rate : 0.00020938003 loss_scaler : 32768
DLL 2020-11-27 03:25:53.814381 - Iteration: 729 throughput_train : 263.348 seq/s mlm_loss : 6.9553 nsp_loss : 0.0000 total_loss : 6.9553 avg_loss_step : 6.9553 learning_rate : 0.00020899758 loss_scaler : 32768
DLL 2020-11-27 03:25:55.752746 - Iteration: 730 throughput_train : 264.190 seq/s mlm_loss : 7.0088 nsp_loss : 0.0000 total_loss : 7.0088 avg_loss_step : 7.0088 learning_rate : 0.00020861447 loss_scaler : 32768
DLL 2020-11-27 03:25:57.696858 - Iteration: 731 throughput_train : 263.399 seq/s mlm_loss : 7.0216 nsp_loss : 0.0000 total_loss : 7.0216 avg_loss_step : 7.0216 learning_rate : 0.00020823063 loss_scaler : 32768
DLL 2020-11-27 03:25:59.646614 - Iteration: 732 throughput_train : 262.636 seq/s mlm_loss : 6.9938 nsp_loss : 0.0000 total_loss : 6.9938 avg_loss_step : 6.9938 learning_rate : 0.00020784608 loss_scaler : 32768
DLL 2020-11-27 03:26:01.595195 - Iteration: 733 throughput_train : 262.794 seq/s mlm_loss : 7.1580 nsp_loss : 0.0000 total_loss : 7.1580 avg_loss_step : 7.1580 learning_rate : 0.00020746082 loss_scaler : 32768
DLL 2020-11-27 03:26:03.542897 - Iteration: 734 throughput_train : 262.913 seq/s mlm_loss : 7.0598 nsp_loss : 0.0000 total_loss : 7.0598 avg_loss_step : 7.0598 learning_rate : 0.00020707485 loss_scaler : 32768
DLL 2020-11-27 03:26:05.489226 - Iteration: 735 throughput_train : 263.100 seq/s mlm_loss : 7.1554 nsp_loss : 0.0000 total_loss : 7.1554 avg_loss_step : 7.1554 learning_rate : 0.00020668816 loss_scaler : 32768
DLL 2020-11-27 03:26:07.436096 - Iteration: 736 throughput_train : 263.036 seq/s mlm_loss : 6.9996 nsp_loss : 0.0000 total_loss : 6.9996 avg_loss_step : 6.9996 learning_rate : 0.00020630073 loss_scaler : 32768
DLL 2020-11-27 03:26:09.380758 - Iteration: 737 throughput_train : 263.322 seq/s mlm_loss : 7.1907 nsp_loss : 0.0000 total_loss : 7.1907 avg_loss_step : 7.1907 learning_rate : 0.00020591258 loss_scaler : 32768
DLL 2020-11-27 03:26:11.330650 - Iteration: 738 throughput_train : 262.617 seq/s mlm_loss : 6.9549 nsp_loss : 0.0000 total_loss : 6.9549 avg_loss_step : 6.9549 learning_rate : 0.0002055237 loss_scaler : 32768
DLL 2020-11-27 03:26:13.274491 - Iteration: 739 throughput_train : 263.434 seq/s mlm_loss : 7.0746 nsp_loss : 0.0000 total_loss : 7.0746 avg_loss_step : 7.0746 learning_rate : 0.00020513407 loss_scaler : 32768
DLL 2020-11-27 03:26:15.222379 - Iteration: 740 throughput_train : 262.888 seq/s mlm_loss : 7.0119 nsp_loss : 0.0000 total_loss : 7.0119 avg_loss_step : 7.0119 learning_rate : 0.00020474372 loss_scaler : 32768
DLL 2020-11-27 03:26:17.167334 - Iteration: 741 throughput_train : 263.285 seq/s mlm_loss : 6.9504 nsp_loss : 0.0000 total_loss : 6.9504 avg_loss_step : 6.9504 learning_rate : 0.00020435262 loss_scaler : 32768
DLL 2020-11-27 03:26:19.113672 - Iteration: 742 throughput_train : 263.099 seq/s mlm_loss : 7.0823 nsp_loss : 0.0000 total_loss : 7.0823 avg_loss_step : 7.0823 learning_rate : 0.00020396076 loss_scaler : 32768
DLL 2020-11-27 03:26:21.058406 - Iteration: 743 throughput_train : 263.316 seq/s mlm_loss : 7.1353 nsp_loss : 0.0000 total_loss : 7.1353 avg_loss_step : 7.1353 learning_rate : 0.00020356814 loss_scaler : 32768
DLL 2020-11-27 03:26:23.004111 - Iteration: 744 throughput_train : 263.185 seq/s mlm_loss : 6.9096 nsp_loss : 0.0000 total_loss : 6.9096 avg_loss_step : 6.9096 learning_rate : 0.00020317477 loss_scaler : 32768
DLL 2020-11-27 03:26:24.962059 - Iteration: 745 throughput_train : 261.537 seq/s mlm_loss : 6.9148 nsp_loss : 0.0000 total_loss : 6.9149 avg_loss_step : 6.9149 learning_rate : 0.00020278065 loss_scaler : 32768
DLL 2020-11-27 03:26:26.906694 - Iteration: 746 throughput_train : 263.329 seq/s mlm_loss : 7.0222 nsp_loss : 0.0000 total_loss : 7.0222 avg_loss_step : 7.0222 learning_rate : 0.00020238575 loss_scaler : 32768
DLL 2020-11-27 03:26:28.849332 - Iteration: 747 throughput_train : 263.599 seq/s mlm_loss : 7.0976 nsp_loss : 0.0000 total_loss : 7.0976 avg_loss_step : 7.0976 learning_rate : 0.00020199007 loss_scaler : 32768
DLL 2020-11-27 03:26:30.797783 - Iteration: 748 throughput_train : 262.814 seq/s mlm_loss : 7.0411 nsp_loss : 0.0000 total_loss : 7.0411 avg_loss_step : 7.0411 learning_rate : 0.00020159363 loss_scaler : 32768
DLL 2020-11-27 03:26:32.741763 - Iteration: 749 throughput_train : 263.418 seq/s mlm_loss : 7.0408 nsp_loss : 0.0000 total_loss : 7.0408 avg_loss_step : 7.0408 learning_rate : 0.00020119641 loss_scaler : 32768
DLL 2020-11-27 03:26:34.687990 - Iteration: 750 throughput_train : 263.113 seq/s mlm_loss : 7.0200 nsp_loss : 0.0000 total_loss : 7.0200 avg_loss_step : 7.0200 learning_rate : 0.00020079839 loss_scaler : 32768
DLL 2020-11-27 03:26:36.635163 - Iteration: 751 throughput_train : 262.985 seq/s mlm_loss : 7.0793 nsp_loss : 0.0000 total_loss : 7.0793 avg_loss_step : 7.0793 learning_rate : 0.00020039959 loss_scaler : 32768
DLL 2020-11-27 03:26:38.587543 - Iteration: 752 throughput_train : 262.290 seq/s mlm_loss : 7.0888 nsp_loss : 0.0000 total_loss : 7.0889 avg_loss_step : 7.0889 learning_rate : 0.00019999997 loss_scaler : 32768
DLL 2020-11-27 03:26:40.541880 - Iteration: 753 throughput_train : 262.033 seq/s mlm_loss : 6.9689 nsp_loss : 0.0000 total_loss : 6.9689 avg_loss_step : 6.9689 learning_rate : 0.00019959957 loss_scaler : 32768
DLL 2020-11-27 03:26:42.490621 - Iteration: 754 throughput_train : 262.782 seq/s mlm_loss : 7.0568 nsp_loss : 0.0000 total_loss : 7.0568 avg_loss_step : 7.0568 learning_rate : 0.00019919837 loss_scaler : 32768
DLL 2020-11-27 03:26:44.431352 - Iteration: 755 throughput_train : 263.880 seq/s mlm_loss : 6.9777 nsp_loss : 0.0000 total_loss : 6.9777 avg_loss_step : 6.9777 learning_rate : 0.00019879636 loss_scaler : 32768
DLL 2020-11-27 03:26:46.381201 - Iteration: 756 throughput_train : 262.632 seq/s mlm_loss : 6.8878 nsp_loss : 0.0000 total_loss : 6.8878 avg_loss_step : 6.8878 learning_rate : 0.00019839354 loss_scaler : 32768
DLL 2020-11-27 03:26:48.329564 - Iteration: 757 throughput_train : 262.851 seq/s mlm_loss : 6.9815 nsp_loss : 0.0000 total_loss : 6.9815 avg_loss_step : 6.9815 learning_rate : 0.00019798988 loss_scaler : 32768
DLL 2020-11-27 03:26:50.263999 - Iteration: 758 throughput_train : 264.735 seq/s mlm_loss : 6.8626 nsp_loss : 0.0000 total_loss : 6.8626 avg_loss_step : 6.8626 learning_rate : 0.0001975854 loss_scaler : 32768
DLL 2020-11-27 03:26:52.215179 - Iteration: 759 throughput_train : 262.447 seq/s mlm_loss : 6.8865 nsp_loss : 0.0000 total_loss : 6.8865 avg_loss_step : 6.8865 learning_rate : 0.0001971801 loss_scaler : 32768
DLL 2020-11-27 03:26:54.158885 - Iteration: 760 throughput_train : 263.468 seq/s mlm_loss : 6.9496 nsp_loss : 0.0000 total_loss : 6.9496 avg_loss_step : 6.9496 learning_rate : 0.00019677397 loss_scaler : 32768
DLL 2020-11-27 03:26:56.107146 - Iteration: 761 throughput_train : 262.842 seq/s mlm_loss : 7.0728 nsp_loss : 0.0000 total_loss : 7.0728 avg_loss_step : 7.0728 learning_rate : 0.00019636698 loss_scaler : 32768
DLL 2020-11-27 03:26:58.057779 - Iteration: 762 throughput_train : 262.534 seq/s mlm_loss : 7.1621 nsp_loss : 0.0000 total_loss : 7.1621 avg_loss_step : 7.1621 learning_rate : 0.00019595916 loss_scaler : 32768
DLL 2020-11-27 03:26:59.999778 - Iteration: 763 throughput_train : 263.701 seq/s mlm_loss : 7.0799 nsp_loss : 0.0000 total_loss : 7.0799 avg_loss_step : 7.0799 learning_rate : 0.00019555048 loss_scaler : 32768
DLL 2020-11-27 03:27:01.948907 - Iteration: 764 throughput_train : 262.733 seq/s mlm_loss : 6.9233 nsp_loss : 0.0000 total_loss : 6.9233 avg_loss_step : 6.9233 learning_rate : 0.00019514096 loss_scaler : 32768
DLL 2020-11-27 03:27:03.891286 - Iteration: 765 throughput_train : 263.634 seq/s mlm_loss : 6.9289 nsp_loss : 0.0000 total_loss : 6.9289 avg_loss_step : 6.9289 learning_rate : 0.00019473057 loss_scaler : 32768
DLL 2020-11-27 03:27:05.837773 - Iteration: 766 throughput_train : 263.078 seq/s mlm_loss : 6.9259 nsp_loss : 0.0000 total_loss : 6.9259 avg_loss_step : 6.9259 learning_rate : 0.00019431929 loss_scaler : 32768
DLL 2020-11-27 03:27:07.783792 - Iteration: 767 throughput_train : 263.141 seq/s mlm_loss : 7.0267 nsp_loss : 0.0000 total_loss : 7.0267 avg_loss_step : 7.0267 learning_rate : 0.00019390717 loss_scaler : 32768
DLL 2020-11-27 03:27:09.730431 - Iteration: 768 throughput_train : 263.057 seq/s mlm_loss : 6.9713 nsp_loss : 0.0000 total_loss : 6.9713 avg_loss_step : 6.9713 learning_rate : 0.00019349417 loss_scaler : 32768
DLL 2020-11-27 03:27:11.672904 - Iteration: 769 throughput_train : 263.623 seq/s mlm_loss : 7.0375 nsp_loss : 0.0000 total_loss : 7.0375 avg_loss_step : 7.0375 learning_rate : 0.00019308028 loss_scaler : 32768
DLL 2020-11-27 03:27:13.620243 - Iteration: 770 throughput_train : 262.962 seq/s mlm_loss : 6.9546 nsp_loss : 0.0000 total_loss : 6.9546 avg_loss_step : 6.9546 learning_rate : 0.0001926655 loss_scaler : 32768
DLL 2020-11-27 03:27:15.569877 - Iteration: 771 throughput_train : 262.657 seq/s mlm_loss : 7.0779 nsp_loss : 0.0000 total_loss : 7.0779 avg_loss_step : 7.0779 learning_rate : 0.0001922498 loss_scaler : 32768
DLL 2020-11-27 03:27:17.515130 - Iteration: 772 throughput_train : 263.247 seq/s mlm_loss : 7.0709 nsp_loss : 0.0000 total_loss : 7.0709 avg_loss_step : 7.0709 learning_rate : 0.00019183324 loss_scaler : 32768
DLL 2020-11-27 03:27:19.465553 - Iteration: 773 throughput_train : 262.561 seq/s mlm_loss : 7.1586 nsp_loss : 0.0000 total_loss : 7.1586 avg_loss_step : 7.1586 learning_rate : 0.00019141576 loss_scaler : 32768
DLL 2020-11-27 03:27:21.411975 - Iteration: 774 throughput_train : 263.096 seq/s mlm_loss : 7.1291 nsp_loss : 0.0000 total_loss : 7.1291 avg_loss_step : 7.1291 learning_rate : 0.00019099737 loss_scaler : 32768
DLL 2020-11-27 03:27:23.359734 - Iteration: 775 throughput_train : 262.906 seq/s mlm_loss : 7.0225 nsp_loss : 0.0000 total_loss : 7.0225 avg_loss_step : 7.0225 learning_rate : 0.00019057804 loss_scaler : 32768
DLL 2020-11-27 03:27:25.306435 - Iteration: 776 throughput_train : 263.050 seq/s mlm_loss : 6.7556 nsp_loss : 0.0000 total_loss : 6.7556 avg_loss_step : 6.7556 learning_rate : 0.0001901578 loss_scaler : 32768
DLL 2020-11-27 03:27:27.255866 - Iteration: 777 throughput_train : 262.692 seq/s mlm_loss : 6.8302 nsp_loss : 0.0000 total_loss : 6.8302 avg_loss_step : 6.8302 learning_rate : 0.00018973663 loss_scaler : 32768
DLL 2020-11-27 03:27:29.196414 - Iteration: 778 throughput_train : 263.883 seq/s mlm_loss : 6.9484 nsp_loss : 0.0000 total_loss : 6.9484 avg_loss_step : 6.9484 learning_rate : 0.00018931454 loss_scaler : 32768
DLL 2020-11-27 03:27:31.139501 - Iteration: 779 throughput_train : 263.540 seq/s mlm_loss : 6.8814 nsp_loss : 0.0000 total_loss : 6.8814 avg_loss_step : 6.8814 learning_rate : 0.00018889148 loss_scaler : 32768
DLL 2020-11-27 03:27:33.081837 - Iteration: 780 throughput_train : 263.644 seq/s mlm_loss : 6.8018 nsp_loss : 0.0000 total_loss : 6.8018 avg_loss_step : 6.8018 learning_rate : 0.00018846747 loss_scaler : 32768
DLL 2020-11-27 03:27:35.029407 - Iteration: 781 throughput_train : 262.931 seq/s mlm_loss : 6.9526 nsp_loss : 0.0000 total_loss : 6.9526 avg_loss_step : 6.9526 learning_rate : 0.00018804253 loss_scaler : 32768
DLL 2020-11-27 03:27:36.981657 - Iteration: 782 throughput_train : 262.303 seq/s mlm_loss : 6.9237 nsp_loss : 0.0000 total_loss : 6.9237 avg_loss_step : 6.9237 learning_rate : 0.00018761662 loss_scaler : 32768
DLL 2020-11-27 03:27:38.927913 - Iteration: 783 throughput_train : 263.109 seq/s mlm_loss : 6.8622 nsp_loss : 0.0000 total_loss : 6.8622 avg_loss_step : 6.8622 learning_rate : 0.00018718973 loss_scaler : 32768
DLL 2020-11-27 03:27:40.864483 - Iteration: 784 throughput_train : 264.426 seq/s mlm_loss : 6.9765 nsp_loss : 0.0000 total_loss : 6.9765 avg_loss_step : 6.9765 learning_rate : 0.00018676186 loss_scaler : 32768
DLL 2020-11-27 03:27:42.811028 - Iteration: 785 throughput_train : 263.071 seq/s mlm_loss : 7.0138 nsp_loss : 0.0000 total_loss : 7.0138 avg_loss_step : 7.0138 learning_rate : 0.00018633301 loss_scaler : 32768
DLL 2020-11-27 03:27:44.760318 - Iteration: 786 throughput_train : 262.700 seq/s mlm_loss : 6.9907 nsp_loss : 0.0000 total_loss : 6.9907 avg_loss_step : 6.9907 learning_rate : 0.00018590318 loss_scaler : 32768
DLL 2020-11-27 03:27:46.698962 - Iteration: 787 throughput_train : 264.144 seq/s mlm_loss : 7.2020 nsp_loss : 0.0000 total_loss : 7.2020 avg_loss_step : 7.2020 learning_rate : 0.00018547235 loss_scaler : 32768
DLL 2020-11-27 03:27:48.639566 - Iteration: 788 throughput_train : 263.878 seq/s mlm_loss : 7.0324 nsp_loss : 0.0000 total_loss : 7.0324 avg_loss_step : 7.0324 learning_rate : 0.00018504052 loss_scaler : 32768
DLL 2020-11-27 03:27:50.595212 - Iteration: 789 throughput_train : 261.852 seq/s mlm_loss : 7.0646 nsp_loss : 0.0000 total_loss : 7.0646 avg_loss_step : 7.0646 learning_rate : 0.00018460766 loss_scaler : 32768
DLL 2020-11-27 03:27:52.536775 - Iteration: 790 throughput_train : 263.775 seq/s mlm_loss : 7.1055 nsp_loss : 0.0000 total_loss : 7.1055 avg_loss_step : 7.1055 learning_rate : 0.00018417381 loss_scaler : 32768
DLL 2020-11-27 03:27:54.472441 - Iteration: 791 throughput_train : 264.576 seq/s mlm_loss : 6.8430 nsp_loss : 0.0000 total_loss : 6.8430 avg_loss_step : 6.8430 learning_rate : 0.00018373893 loss_scaler : 32768
DLL 2020-11-27 03:27:56.417840 - Iteration: 792 throughput_train : 263.246 seq/s mlm_loss : 6.8850 nsp_loss : 0.0000 total_loss : 6.8850 avg_loss_step : 6.8850 learning_rate : 0.00018330301 loss_scaler : 32768
DLL 2020-11-27 03:27:58.363681 - Iteration: 793 throughput_train : 263.168 seq/s mlm_loss : 6.9649 nsp_loss : 0.0000 total_loss : 6.9649 avg_loss_step : 6.9649 learning_rate : 0.00018286608 loss_scaler : 32768
DLL 2020-11-27 03:28:00.312285 - Iteration: 794 throughput_train : 262.803 seq/s mlm_loss : 6.8303 nsp_loss : 0.0000 total_loss : 6.8303 avg_loss_step : 6.8303 learning_rate : 0.00018242803 loss_scaler : 32768
DLL 2020-11-27 03:28:02.247975 - Iteration: 795 throughput_train : 264.550 seq/s mlm_loss : 6.7059 nsp_loss : 0.0000 total_loss : 6.7059 avg_loss_step : 6.7059 learning_rate : 0.00018198899 loss_scaler : 32768
DLL 2020-11-27 03:28:04.189328 - Iteration: 796 throughput_train : 263.784 seq/s mlm_loss : 6.8815 nsp_loss : 0.0000 total_loss : 6.8815 avg_loss_step : 6.8815 learning_rate : 0.00018154888 loss_scaler : 32768
DLL 2020-11-27 03:28:06.123885 - Iteration: 797 throughput_train : 264.703 seq/s mlm_loss : 6.9717 nsp_loss : 0.0000 total_loss : 6.9717 avg_loss_step : 6.9717 learning_rate : 0.0001811077 loss_scaler : 32768
DLL 2020-11-27 03:28:08.074127 - Iteration: 798 throughput_train : 262.572 seq/s mlm_loss : 7.0314 nsp_loss : 0.0000 total_loss : 7.0314 avg_loss_step : 7.0314 learning_rate : 0.0001806654 loss_scaler : 32768
DLL 2020-11-27 03:28:10.030905 - Iteration: 799 throughput_train : 261.695 seq/s mlm_loss : 6.9870 nsp_loss : 0.0000 total_loss : 6.9870 avg_loss_step : 6.9870 learning_rate : 0.00018022205 loss_scaler : 32768
DLL 2020-11-27 03:28:11.967739 - Iteration: 800 throughput_train : 264.390 seq/s mlm_loss : 7.0242 nsp_loss : 0.0000 total_loss : 7.0242 avg_loss_step : 7.0242 learning_rate : 0.00017977762 loss_scaler : 32768
DLL 2020-11-27 03:28:13.912341 - Iteration: 801 throughput_train : 263.333 seq/s mlm_loss : 6.9614 nsp_loss : 0.0000 total_loss : 6.9614 avg_loss_step : 6.9614 learning_rate : 0.00017933207 loss_scaler : 32768
DLL 2020-11-27 03:28:15.851202 - Iteration: 802 throughput_train : 264.119 seq/s mlm_loss : 6.8788 nsp_loss : 0.0000 total_loss : 6.8788 avg_loss_step : 6.8788 learning_rate : 0.00017888543 loss_scaler : 32768
DLL 2020-11-27 03:28:17.798613 - Iteration: 803 throughput_train : 262.958 seq/s mlm_loss : 6.9172 nsp_loss : 0.0000 total_loss : 6.9172 avg_loss_step : 6.9172 learning_rate : 0.00017843764 loss_scaler : 32768
DLL 2020-11-27 03:28:19.743369 - Iteration: 804 throughput_train : 263.330 seq/s mlm_loss : 6.8688 nsp_loss : 0.0000 total_loss : 6.8688 avg_loss_step : 6.8688 learning_rate : 0.00017798874 loss_scaler : 32768
DLL 2020-11-27 03:28:21.686579 - Iteration: 805 throughput_train : 263.541 seq/s mlm_loss : 7.0042 nsp_loss : 0.0000 total_loss : 7.0042 avg_loss_step : 7.0042 learning_rate : 0.0001775387 loss_scaler : 32768
DLL 2020-11-27 03:28:23.630887 - Iteration: 806 throughput_train : 263.374 seq/s mlm_loss : 6.9142 nsp_loss : 0.0000 total_loss : 6.9142 avg_loss_step : 6.9142 learning_rate : 0.00017708754 loss_scaler : 32768
DLL 2020-11-27 03:28:25.575169 - Iteration: 807 throughput_train : 263.379 seq/s mlm_loss : 6.9277 nsp_loss : 0.0000 total_loss : 6.9277 avg_loss_step : 6.9277 learning_rate : 0.00017663518 loss_scaler : 32768
DLL 2020-11-27 03:28:27.523783 - Iteration: 808 throughput_train : 262.791 seq/s mlm_loss : 7.0044 nsp_loss : 0.0000 total_loss : 7.0044 avg_loss_step : 7.0044 learning_rate : 0.0001761817 loss_scaler : 32768
DLL 2020-11-27 03:28:29.462463 - Iteration: 809 throughput_train : 264.140 seq/s mlm_loss : 7.1039 nsp_loss : 0.0000 total_loss : 7.1039 avg_loss_step : 7.1039 learning_rate : 0.00017572704 loss_scaler : 32768
DLL 2020-11-27 03:28:31.404054 - Iteration: 810 throughput_train : 263.742 seq/s mlm_loss : 6.9743 nsp_loss : 0.0000 total_loss : 6.9743 avg_loss_step : 6.9743 learning_rate : 0.0001752712 loss_scaler : 32768
DLL 2020-11-27 03:28:33.349908 - Iteration: 811 throughput_train : 263.163 seq/s mlm_loss : 6.9784 nsp_loss : 0.0000 total_loss : 6.9784 avg_loss_step : 6.9784 learning_rate : 0.00017481417 loss_scaler : 32768
DLL 2020-11-27 03:28:35.301479 - Iteration: 812 throughput_train : 262.393 seq/s mlm_loss : 7.0608 nsp_loss : 0.0000 total_loss : 7.0608 avg_loss_step : 7.0608 learning_rate : 0.00017435593 loss_scaler : 32768
DLL 2020-11-27 03:28:37.235982 - Iteration: 813 throughput_train : 264.711 seq/s mlm_loss : 7.0819 nsp_loss : 0.0000 total_loss : 7.0819 avg_loss_step : 7.0819 learning_rate : 0.0001738965 loss_scaler : 32768
DLL 2020-11-27 03:28:39.167683 - Iteration: 814 throughput_train : 265.091 seq/s mlm_loss : 7.2069 nsp_loss : 0.0000 total_loss : 7.2069 avg_loss_step : 7.2069 learning_rate : 0.00017343585 loss_scaler : 32768
DLL 2020-11-27 03:28:41.114099 - Iteration: 815 throughput_train : 263.087 seq/s mlm_loss : 6.9523 nsp_loss : 0.0000 total_loss : 6.9523 avg_loss_step : 6.9523 learning_rate : 0.00017297397 loss_scaler : 32768
DLL 2020-11-27 03:28:43.059582 - Iteration: 816 throughput_train : 263.214 seq/s mlm_loss : 7.0288 nsp_loss : 0.0000 total_loss : 7.0288 avg_loss_step : 7.0288 learning_rate : 0.00017251086 loss_scaler : 32768
DLL 2020-11-27 03:28:45.004560 - Iteration: 817 throughput_train : 263.283 seq/s mlm_loss : 6.9679 nsp_loss : 0.0000 total_loss : 6.9679 avg_loss_step : 6.9679 learning_rate : 0.00017204648 loss_scaler : 32768
DLL 2020-11-27 03:28:46.956909 - Iteration: 818 throughput_train : 262.288 seq/s mlm_loss : 6.8440 nsp_loss : 0.0000 total_loss : 6.8440 avg_loss_step : 6.8440 learning_rate : 0.00017158086 loss_scaler : 32768
DLL 2020-11-27 03:28:48.907714 - Iteration: 819 throughput_train : 262.497 seq/s mlm_loss : 6.8333 nsp_loss : 0.0000 total_loss : 6.8333 avg_loss_step : 6.8333 learning_rate : 0.00017111398 loss_scaler : 32768
DLL 2020-11-27 03:28:50.851101 - Iteration: 820 throughput_train : 263.498 seq/s mlm_loss : 6.8919 nsp_loss : 0.0000 total_loss : 6.8919 avg_loss_step : 6.8919 learning_rate : 0.00017064581 loss_scaler : 32768
DLL 2020-11-27 03:28:52.798281 - Iteration: 821 throughput_train : 262.985 seq/s mlm_loss : 7.0756 nsp_loss : 0.0000 total_loss : 7.0756 avg_loss_step : 7.0756 learning_rate : 0.00017017635 loss_scaler : 32768
DLL 2020-11-27 03:28:54.751387 - Iteration: 822 throughput_train : 262.189 seq/s mlm_loss : 7.0195 nsp_loss : 0.0000 total_loss : 7.0195 avg_loss_step : 7.0195 learning_rate : 0.0001697056 loss_scaler : 32768
DLL 2020-11-27 03:28:56.695710 - Iteration: 823 throughput_train : 263.387 seq/s mlm_loss : 6.9612 nsp_loss : 0.0000 total_loss : 6.9612 avg_loss_step : 6.9612 learning_rate : 0.00016923355 loss_scaler : 32768
DLL 2020-11-27 03:28:58.641915 - Iteration: 824 throughput_train : 263.134 seq/s mlm_loss : 6.8476 nsp_loss : 0.0000 total_loss : 6.8476 avg_loss_step : 6.8476 learning_rate : 0.00016876016 loss_scaler : 32768
DLL 2020-11-27 03:29:00.590179 - Iteration: 825 throughput_train : 262.837 seq/s mlm_loss : 6.8095 nsp_loss : 0.0000 total_loss : 6.8095 avg_loss_step : 6.8095 learning_rate : 0.00016828546 loss_scaler : 32768
DLL 2020-11-27 03:29:02.539913 - Iteration: 826 throughput_train : 262.640 seq/s mlm_loss : 6.9008 nsp_loss : 0.0000 total_loss : 6.9008 avg_loss_step : 6.9008 learning_rate : 0.00016780938 loss_scaler : 32768
DLL 2020-11-27 03:29:04.499045 - Iteration: 827 throughput_train : 261.380 seq/s mlm_loss : 6.9234 nsp_loss : 0.0000 total_loss : 6.9234 avg_loss_step : 6.9234 learning_rate : 0.00016733198 loss_scaler : 32768
DLL 2020-11-27 03:29:06.448263 - Iteration: 828 throughput_train : 262.710 seq/s mlm_loss : 6.8311 nsp_loss : 0.0000 total_loss : 6.8311 avg_loss_step : 6.8311 learning_rate : 0.0001668532 loss_scaler : 32768
DLL 2020-11-27 03:29:08.398186 - Iteration: 829 throughput_train : 262.614 seq/s mlm_loss : 6.9833 nsp_loss : 0.0000 total_loss : 6.9833 avg_loss_step : 6.9833 learning_rate : 0.00016637305 loss_scaler : 32768
DLL 2020-11-27 03:29:10.342656 - Iteration: 830 throughput_train : 263.351 seq/s mlm_loss : 6.8822 nsp_loss : 0.0000 total_loss : 6.8822 avg_loss_step : 6.8822 learning_rate : 0.00016589148 loss_scaler : 32768
DLL 2020-11-27 03:29:12.286770 - Iteration: 831 throughput_train : 263.399 seq/s mlm_loss : 6.8550 nsp_loss : 0.0000 total_loss : 6.8550 avg_loss_step : 6.8550 learning_rate : 0.00016540856 loss_scaler : 32768
DLL 2020-11-27 03:29:14.223740 - Iteration: 832 throughput_train : 264.370 seq/s mlm_loss : 6.9561 nsp_loss : 0.0000 total_loss : 6.9561 avg_loss_step : 6.9561 learning_rate : 0.0001649242 loss_scaler : 32768
DLL 2020-11-27 03:29:16.165404 - Iteration: 833 throughput_train : 263.733 seq/s mlm_loss : 6.8544 nsp_loss : 0.0000 total_loss : 6.8544 avg_loss_step : 6.8544 learning_rate : 0.00016443842 loss_scaler : 32768
DLL 2020-11-27 03:29:18.109233 - Iteration: 834 throughput_train : 263.440 seq/s mlm_loss : 6.9844 nsp_loss : 0.0000 total_loss : 6.9844 avg_loss_step : 6.9844 learning_rate : 0.0001639512 loss_scaler : 32768
DLL 2020-11-27 03:29:20.050892 - Iteration: 835 throughput_train : 263.733 seq/s mlm_loss : 6.9159 nsp_loss : 0.0000 total_loss : 6.9159 avg_loss_step : 6.9159 learning_rate : 0.00016346251 loss_scaler : 32768
DLL 2020-11-27 03:29:22.002353 - Iteration: 836 throughput_train : 262.408 seq/s mlm_loss : 6.9142 nsp_loss : 0.0000 total_loss : 6.9142 avg_loss_step : 6.9142 learning_rate : 0.00016297236 loss_scaler : 32768
DLL 2020-11-27 03:29:23.959769 - Iteration: 837 throughput_train : 261.611 seq/s mlm_loss : 6.9673 nsp_loss : 0.0000 total_loss : 6.9673 avg_loss_step : 6.9673 learning_rate : 0.00016248075 loss_scaler : 32768
DLL 2020-11-27 03:29:25.897218 - Iteration: 838 throughput_train : 264.305 seq/s mlm_loss : 7.0895 nsp_loss : 0.0000 total_loss : 7.0895 avg_loss_step : 7.0895 learning_rate : 0.00016198763 loss_scaler : 32768
DLL 2020-11-27 03:29:27.847529 - Iteration: 839 throughput_train : 262.562 seq/s mlm_loss : 7.1113 nsp_loss : 0.0000 total_loss : 7.1113 avg_loss_step : 7.1113 learning_rate : 0.00016149302 loss_scaler : 32768
DLL 2020-11-27 03:29:29.801018 - Iteration: 840 throughput_train : 262.138 seq/s mlm_loss : 6.9197 nsp_loss : 0.0000 total_loss : 6.9197 avg_loss_step : 6.9197 learning_rate : 0.00016099686 loss_scaler : 32768
DLL 2020-11-27 03:29:31.752143 - Iteration: 841 throughput_train : 262.471 seq/s mlm_loss : 7.0030 nsp_loss : 0.0000 total_loss : 7.0030 avg_loss_step : 7.0030 learning_rate : 0.0001604992 loss_scaler : 32768
DLL 2020-11-27 03:29:33.682008 - Iteration: 842 throughput_train : 265.358 seq/s mlm_loss : 6.9480 nsp_loss : 0.0000 total_loss : 6.9480 avg_loss_step : 6.9480 learning_rate : 0.00015999998 loss_scaler : 32768
DLL 2020-11-27 03:29:35.622089 - Iteration: 843 throughput_train : 263.946 seq/s mlm_loss : 7.0357 nsp_loss : 0.0000 total_loss : 7.0357 avg_loss_step : 7.0357 learning_rate : 0.0001594992 loss_scaler : 32768
DLL 2020-11-27 03:29:37.563073 - Iteration: 844 throughput_train : 263.824 seq/s mlm_loss : 6.8540 nsp_loss : 0.0000 total_loss : 6.8540 avg_loss_step : 6.8540 learning_rate : 0.00015899682 loss_scaler : 32768
DLL 2020-11-27 03:29:39.514069 - Iteration: 845 throughput_train : 262.470 seq/s mlm_loss : 6.9167 nsp_loss : 0.0000 total_loss : 6.9167 avg_loss_step : 6.9167 learning_rate : 0.00015849287 loss_scaler : 32768
DLL 2020-11-27 03:29:41.464709 - Iteration: 846 throughput_train : 262.522 seq/s mlm_loss : 6.9563 nsp_loss : 0.0000 total_loss : 6.9563 avg_loss_step : 6.9563 learning_rate : 0.00015798732 loss_scaler : 32768
DLL 2020-11-27 03:29:43.403287 - Iteration: 847 throughput_train : 264.152 seq/s mlm_loss : 6.9096 nsp_loss : 0.0000 total_loss : 6.9096 avg_loss_step : 6.9096 learning_rate : 0.00015748014 loss_scaler : 32768
DLL 2020-11-27 03:29:45.351202 - Iteration: 848 throughput_train : 262.885 seq/s mlm_loss : 7.0075 nsp_loss : 0.0000 total_loss : 7.0075 avg_loss_step : 7.0075 learning_rate : 0.00015697132 loss_scaler : 32768
DLL 2020-11-27 03:29:47.299579 - Iteration: 849 throughput_train : 262.824 seq/s mlm_loss : 6.8904 nsp_loss : 0.0000 total_loss : 6.8904 avg_loss_step : 6.8904 learning_rate : 0.00015646082 loss_scaler : 32768
DLL 2020-11-27 03:29:49.241340 - Iteration: 850 throughput_train : 263.718 seq/s mlm_loss : 6.9523 nsp_loss : 0.0000 total_loss : 6.9523 avg_loss_step : 6.9523 learning_rate : 0.00015594868 loss_scaler : 32768
DLL 2020-11-27 03:29:51.191276 - Iteration: 851 throughput_train : 262.612 seq/s mlm_loss : 7.0171 nsp_loss : 0.0000 total_loss : 7.0171 avg_loss_step : 7.0171 learning_rate : 0.00015543486 loss_scaler : 32768
DLL 2020-11-27 03:29:53.139418 - Iteration: 852 throughput_train : 262.854 seq/s mlm_loss : 6.9770 nsp_loss : 0.0000 total_loss : 6.9770 avg_loss_step : 6.9770 learning_rate : 0.00015491933 loss_scaler : 32768
DLL 2020-11-27 03:29:55.088081 - Iteration: 853 throughput_train : 262.785 seq/s mlm_loss : 7.1312 nsp_loss : 0.0000 total_loss : 7.1312 avg_loss_step : 7.1312 learning_rate : 0.00015440206 loss_scaler : 32768
DLL 2020-11-27 03:29:57.029673 - Iteration: 854 throughput_train : 263.740 seq/s mlm_loss : 6.9693 nsp_loss : 0.0000 total_loss : 6.9693 avg_loss_step : 6.9693 learning_rate : 0.00015388304 loss_scaler : 32768
DLL 2020-11-27 03:29:58.971206 - Iteration: 855 throughput_train : 263.749 seq/s mlm_loss : 6.9006 nsp_loss : 0.0000 total_loss : 6.9006 avg_loss_step : 6.9006 learning_rate : 0.0001533623 loss_scaler : 32768
DLL 2020-11-27 03:30:00.918501 - Iteration: 856 throughput_train : 262.968 seq/s mlm_loss : 6.8803 nsp_loss : 0.0000 total_loss : 6.8803 avg_loss_step : 6.8803 learning_rate : 0.00015283977 loss_scaler : 32768
DLL 2020-11-27 03:30:02.868788 - Iteration: 857 throughput_train : 262.565 seq/s mlm_loss : 6.9488 nsp_loss : 0.0000 total_loss : 6.9488 avg_loss_step : 6.9488 learning_rate : 0.00015231545 loss_scaler : 32768
DLL 2020-11-27 03:30:04.821727 - Iteration: 858 throughput_train : 262.208 seq/s mlm_loss : 7.1134 nsp_loss : 0.0000 total_loss : 7.1134 avg_loss_step : 7.1134 learning_rate : 0.0001517893 loss_scaler : 32768
DLL 2020-11-27 03:30:06.777028 - Iteration: 859 throughput_train : 261.891 seq/s mlm_loss : 7.1815 nsp_loss : 0.0000 total_loss : 7.1815 avg_loss_step : 7.1815 learning_rate : 0.00015126132 loss_scaler : 32768
DLL 2020-11-27 03:30:08.725621 - Iteration: 860 throughput_train : 262.793 seq/s mlm_loss : 7.1078 nsp_loss : 0.0000 total_loss : 7.1078 avg_loss_step : 7.1078 learning_rate : 0.00015073152 loss_scaler : 32768
DLL 2020-11-27 03:30:10.663019 - Iteration: 861 throughput_train : 264.312 seq/s mlm_loss : 7.1144 nsp_loss : 0.0000 total_loss : 7.1144 avg_loss_step : 7.1144 learning_rate : 0.00015019985 loss_scaler : 32768
DLL 2020-11-27 03:30:12.609102 - Iteration: 862 throughput_train : 263.133 seq/s mlm_loss : 6.9658 nsp_loss : 0.0000 total_loss : 6.9658 avg_loss_step : 6.9658 learning_rate : 0.00014966629 loss_scaler : 32768
DLL 2020-11-27 03:30:14.552379 - Iteration: 863 throughput_train : 263.514 seq/s mlm_loss : 6.8953 nsp_loss : 0.0000 total_loss : 6.8953 avg_loss_step : 6.8953 learning_rate : 0.00014913078 loss_scaler : 32768
DLL 2020-11-27 03:30:16.504053 - Iteration: 864 throughput_train : 262.385 seq/s mlm_loss : 6.9409 nsp_loss : 0.0000 total_loss : 6.9409 avg_loss_step : 6.9409 learning_rate : 0.00014859338 loss_scaler : 32768
DLL 2020-11-27 03:30:18.444361 - Iteration: 865 throughput_train : 263.920 seq/s mlm_loss : 6.9798 nsp_loss : 0.0000 total_loss : 6.9798 avg_loss_step : 6.9798 learning_rate : 0.00014805402 loss_scaler : 32768
DLL 2020-11-27 03:30:20.374200 - Iteration: 866 throughput_train : 265.364 seq/s mlm_loss : 6.7321 nsp_loss : 0.0000 total_loss : 6.7321 avg_loss_step : 6.7321 learning_rate : 0.00014751269 loss_scaler : 32768
DLL 2020-11-27 03:30:22.320670 - Iteration: 867 throughput_train : 263.081 seq/s mlm_loss : 6.9625 nsp_loss : 0.0000 total_loss : 6.9625 avg_loss_step : 6.9625 learning_rate : 0.00014696934 loss_scaler : 32768
DLL 2020-11-27 03:30:24.267699 - Iteration: 868 throughput_train : 263.009 seq/s mlm_loss : 7.0219 nsp_loss : 0.0000 total_loss : 7.0219 avg_loss_step : 7.0219 learning_rate : 0.000146424 loss_scaler : 32768
DLL 2020-11-27 03:30:26.216193 - Iteration: 869 throughput_train : 262.807 seq/s mlm_loss : 6.9241 nsp_loss : 0.0000 total_loss : 6.9241 avg_loss_step : 6.9241 learning_rate : 0.00014587663 loss_scaler : 32768
DLL 2020-11-27 03:30:28.164333 - Iteration: 870 throughput_train : 262.855 seq/s mlm_loss : 7.0425 nsp_loss : 0.0000 total_loss : 7.0425 avg_loss_step : 7.0425 learning_rate : 0.0001453272 loss_scaler : 32768
DLL 2020-11-27 03:30:30.101536 - Iteration: 871 throughput_train : 264.345 seq/s mlm_loss : 7.0188 nsp_loss : 0.0000 total_loss : 7.0188 avg_loss_step : 7.0188 learning_rate : 0.00014477567 loss_scaler : 32768
DLL 2020-11-27 03:30:32.051562 - Iteration: 872 throughput_train : 262.615 seq/s mlm_loss : 6.9611 nsp_loss : 0.0000 total_loss : 6.9611 avg_loss_step : 6.9611 learning_rate : 0.00014422202 loss_scaler : 32768
DLL 2020-11-27 03:30:33.999095 - Iteration: 873 throughput_train : 262.938 seq/s mlm_loss : 7.0169 nsp_loss : 0.0000 total_loss : 7.0169 avg_loss_step : 7.0169 learning_rate : 0.00014366626 loss_scaler : 32768
DLL 2020-11-27 03:30:35.941688 - Iteration: 874 throughput_train : 263.606 seq/s mlm_loss : 6.9855 nsp_loss : 0.0000 total_loss : 6.9855 avg_loss_step : 6.9855 learning_rate : 0.00014310832 loss_scaler : 32768
DLL 2020-11-27 03:30:37.889674 - Iteration: 875 throughput_train : 262.876 seq/s mlm_loss : 6.7696 nsp_loss : 0.0000 total_loss : 6.7696 avg_loss_step : 6.7696 learning_rate : 0.00014254822 loss_scaler : 32768
DLL 2020-11-27 03:30:39.842236 - Iteration: 876 throughput_train : 262.260 seq/s mlm_loss : 6.7091 nsp_loss : 0.0000 total_loss : 6.7091 avg_loss_step : 6.7091 learning_rate : 0.0001419859 loss_scaler : 32768
DLL 2020-11-27 03:30:41.787213 - Iteration: 877 throughput_train : 263.291 seq/s mlm_loss : 7.0089 nsp_loss : 0.0000 total_loss : 7.0089 avg_loss_step : 7.0089 learning_rate : 0.00014142132 loss_scaler : 32768
DLL 2020-11-27 03:30:43.738122 - Iteration: 878 throughput_train : 262.499 seq/s mlm_loss : 6.9753 nsp_loss : 0.0000 total_loss : 6.9753 avg_loss_step : 6.9753 learning_rate : 0.0001408545 loss_scaler : 32768
DLL 2020-11-27 03:30:45.676014 - Iteration: 879 throughput_train : 264.247 seq/s mlm_loss : 6.9691 nsp_loss : 0.0000 total_loss : 6.9691 avg_loss_step : 6.9691 learning_rate : 0.00014028541 loss_scaler : 32768
DLL 2020-11-27 03:30:47.621066 - Iteration: 880 throughput_train : 263.273 seq/s mlm_loss : 6.9666 nsp_loss : 0.0000 total_loss : 6.9666 avg_loss_step : 6.9666 learning_rate : 0.00013971397 loss_scaler : 32768
DLL 2020-11-27 03:30:49.564009 - Iteration: 881 throughput_train : 263.559 seq/s mlm_loss : 6.9325 nsp_loss : 0.0000 total_loss : 6.9325 avg_loss_step : 6.9325 learning_rate : 0.00013914017 loss_scaler : 32768
DLL 2020-11-27 03:30:51.512811 - Iteration: 882 throughput_train : 262.767 seq/s mlm_loss : 6.9320 nsp_loss : 0.0000 total_loss : 6.9320 avg_loss_step : 6.9320 learning_rate : 0.00013856404 loss_scaler : 32768
DLL 2020-11-27 03:30:53.460503 - Iteration: 883 throughput_train : 262.923 seq/s mlm_loss : 7.0282 nsp_loss : 0.0000 total_loss : 7.0282 avg_loss_step : 7.0282 learning_rate : 0.00013798548 loss_scaler : 32768
DLL 2020-11-27 03:30:55.400606 - Iteration: 884 throughput_train : 263.965 seq/s mlm_loss : 6.9080 nsp_loss : 0.0000 total_loss : 6.9080 avg_loss_step : 6.9080 learning_rate : 0.00013740448 loss_scaler : 32768
DLL 2020-11-27 03:30:57.359322 - Iteration: 885 throughput_train : 261.435 seq/s mlm_loss : 6.8399 nsp_loss : 0.0000 total_loss : 6.8399 avg_loss_step : 6.8399 learning_rate : 0.00013682104 loss_scaler : 32768
DLL 2020-11-27 03:30:59.302665 - Iteration: 886 throughput_train : 263.505 seq/s mlm_loss : 6.8415 nsp_loss : 0.0000 total_loss : 6.8415 avg_loss_step : 6.8415 learning_rate : 0.00013623506 loss_scaler : 32768
DLL 2020-11-27 03:31:01.254166 - Iteration: 887 throughput_train : 262.401 seq/s mlm_loss : 6.8820 nsp_loss : 0.0000 total_loss : 6.8820 avg_loss_step : 6.8820 learning_rate : 0.00013564657 loss_scaler : 32768
DLL 2020-11-27 03:31:03.195339 - Iteration: 888 throughput_train : 263.798 seq/s mlm_loss : 6.9256 nsp_loss : 0.0000 total_loss : 6.9256 avg_loss_step : 6.9256 learning_rate : 0.00013505551 loss_scaler : 32768
DLL 2020-11-27 03:31:05.149884 - Iteration: 889 throughput_train : 261.994 seq/s mlm_loss : 6.8912 nsp_loss : 0.0000 total_loss : 6.8912 avg_loss_step : 6.8912 learning_rate : 0.00013446188 loss_scaler : 32768
DLL 2020-11-27 03:31:07.099369 - Iteration: 890 throughput_train : 262.676 seq/s mlm_loss : 6.9530 nsp_loss : 0.0000 total_loss : 6.9530 avg_loss_step : 6.9530 learning_rate : 0.00013386556 loss_scaler : 32768
DLL 2020-11-27 03:31:09.044788 - Iteration: 891 throughput_train : 263.222 seq/s mlm_loss : 6.9154 nsp_loss : 0.0000 total_loss : 6.9154 avg_loss_step : 6.9154 learning_rate : 0.00013326661 loss_scaler : 32768
DLL 2020-11-27 03:31:10.994857 - Iteration: 892 throughput_train : 262.597 seq/s mlm_loss : 6.8963 nsp_loss : 0.0000 total_loss : 6.8963 avg_loss_step : 6.8963 learning_rate : 0.00013266497 loss_scaler : 32768
DLL 2020-11-27 03:31:12.947912 - Iteration: 893 throughput_train : 262.195 seq/s mlm_loss : 6.8198 nsp_loss : 0.0000 total_loss : 6.8198 avg_loss_step : 6.8198 learning_rate : 0.00013206057 loss_scaler : 32768
DLL 2020-11-27 03:31:14.893208 - Iteration: 894 throughput_train : 263.245 seq/s mlm_loss : 7.0223 nsp_loss : 0.0000 total_loss : 7.0223 avg_loss_step : 7.0223 learning_rate : 0.0001314534 loss_scaler : 32768
DLL 2020-11-27 03:31:16.829614 - Iteration: 895 throughput_train : 264.452 seq/s mlm_loss : 7.0538 nsp_loss : 0.0000 total_loss : 7.0538 avg_loss_step : 7.0538 learning_rate : 0.00013084337 loss_scaler : 32768
DLL 2020-11-27 03:31:18.776792 - Iteration: 896 throughput_train : 262.991 seq/s mlm_loss : 6.8960 nsp_loss : 0.0000 total_loss : 6.8960 avg_loss_step : 6.8960 learning_rate : 0.00013023053 loss_scaler : 32768
DLL 2020-11-27 03:31:20.723640 - Iteration: 897 throughput_train : 263.044 seq/s mlm_loss : 7.0073 nsp_loss : 0.0000 total_loss : 7.0073 avg_loss_step : 7.0073 learning_rate : 0.0001296148 loss_scaler : 32768
DLL 2020-11-27 03:31:22.670810 - Iteration: 898 throughput_train : 262.988 seq/s mlm_loss : 7.1470 nsp_loss : 0.0000 total_loss : 7.1470 avg_loss_step : 7.1470 learning_rate : 0.00012899611 loss_scaler : 32768
DLL 2020-11-27 03:31:24.614356 - Iteration: 899 throughput_train : 263.485 seq/s mlm_loss : 7.0015 nsp_loss : 0.0000 total_loss : 7.0015 avg_loss_step : 7.0015 learning_rate : 0.00012837444 loss_scaler : 32768
DLL 2020-11-27 03:31:26.556425 - Iteration: 900 throughput_train : 263.677 seq/s mlm_loss : 7.0501 nsp_loss : 0.0000 total_loss : 7.0501 avg_loss_step : 7.0501 learning_rate : 0.0001277497 loss_scaler : 32768
DLL 2020-11-27 03:31:28.506146 - Iteration: 901 throughput_train : 262.645 seq/s mlm_loss : 7.1122 nsp_loss : 0.0000 total_loss : 7.1122 avg_loss_step : 7.1122 learning_rate : 0.00012712195 loss_scaler : 32768
DLL 2020-11-27 03:31:30.463407 - Iteration: 902 throughput_train : 261.639 seq/s mlm_loss : 7.0145 nsp_loss : 0.0000 total_loss : 7.0145 avg_loss_step : 7.0145 learning_rate : 0.00012649108 loss_scaler : 32768
DLL 2020-11-27 03:31:32.411648 - Iteration: 903 throughput_train : 262.847 seq/s mlm_loss : 6.8233 nsp_loss : 0.0000 total_loss : 6.8233 avg_loss_step : 6.8233 learning_rate : 0.00012585704 loss_scaler : 32768
DLL 2020-11-27 03:31:34.359607 - Iteration: 904 throughput_train : 262.886 seq/s mlm_loss : 6.8065 nsp_loss : 0.0000 total_loss : 6.8065 avg_loss_step : 6.8065 learning_rate : 0.00012521975 loss_scaler : 32768
DLL 2020-11-27 03:31:36.297610 - Iteration: 905 throughput_train : 264.242 seq/s mlm_loss : 6.8583 nsp_loss : 0.0000 total_loss : 6.8583 avg_loss_step : 6.8583 learning_rate : 0.00012457925 loss_scaler : 32768
DLL 2020-11-27 03:31:38.240592 - Iteration: 906 throughput_train : 263.572 seq/s mlm_loss : 6.9624 nsp_loss : 0.0000 total_loss : 6.9624 avg_loss_step : 6.9624 learning_rate : 0.00012393543 loss_scaler : 32768
DLL 2020-11-27 03:31:40.194320 - Iteration: 907 throughput_train : 262.102 seq/s mlm_loss : 6.8486 nsp_loss : 0.0000 total_loss : 6.8486 avg_loss_step : 6.8486 learning_rate : 0.00012328826 loss_scaler : 32768
DLL 2020-11-27 03:31:42.136163 - Iteration: 908 throughput_train : 263.712 seq/s mlm_loss : 6.9456 nsp_loss : 0.0000 total_loss : 6.9456 avg_loss_step : 6.9456 learning_rate : 0.00012263766 loss_scaler : 32768
DLL 2020-11-27 03:31:44.085632 - Iteration: 909 throughput_train : 262.677 seq/s mlm_loss : 6.9101 nsp_loss : 0.0000 total_loss : 6.9101 avg_loss_step : 6.9101 learning_rate : 0.00012198356 loss_scaler : 32768
DLL 2020-11-27 03:31:46.032442 - Iteration: 910 throughput_train : 263.034 seq/s mlm_loss : 6.9922 nsp_loss : 0.0000 total_loss : 6.9922 avg_loss_step : 6.9922 learning_rate : 0.00012132597 loss_scaler : 32768
DLL 2020-11-27 03:31:47.972133 - Iteration: 911 throughput_train : 264.004 seq/s mlm_loss : 7.1033 nsp_loss : 0.0000 total_loss : 7.1033 avg_loss_step : 7.1033 learning_rate : 0.000120664794 loss_scaler : 32768
DLL 2020-11-27 03:31:49.926667 - Iteration: 912 throughput_train : 262.007 seq/s mlm_loss : 7.0447 nsp_loss : 0.0000 total_loss : 7.0447 avg_loss_step : 7.0447 learning_rate : 0.000119999975 loss_scaler : 32768
DLL 2020-11-27 03:31:51.874750 - Iteration: 913 throughput_train : 262.862 seq/s mlm_loss : 6.8380 nsp_loss : 0.0000 total_loss : 6.8380 avg_loss_step : 6.8380 learning_rate : 0.00011933142 loss_scaler : 32768
DLL 2020-11-27 03:31:53.823622 - Iteration: 914 throughput_train : 262.756 seq/s mlm_loss : 6.9580 nsp_loss : 0.0000 total_loss : 6.9580 avg_loss_step : 6.9580 learning_rate : 0.00011865913 loss_scaler : 32768
DLL 2020-11-27 03:31:55.777274 - Iteration: 915 throughput_train : 262.115 seq/s mlm_loss : 6.9403 nsp_loss : 0.0000 total_loss : 6.9403 avg_loss_step : 6.9403 learning_rate : 0.000117983014 loss_scaler : 32768
DLL 2020-11-27 03:31:57.726804 - Iteration: 916 throughput_train : 262.678 seq/s mlm_loss : 7.1607 nsp_loss : 0.0000 total_loss : 7.1607 avg_loss_step : 7.1607 learning_rate : 0.000117302996 loss_scaler : 32768
DLL 2020-11-27 03:31:59.678018 - Iteration: 917 throughput_train : 262.447 seq/s mlm_loss : 7.1012 nsp_loss : 0.0000 total_loss : 7.1012 avg_loss_step : 7.1012 learning_rate : 0.00011661903 loss_scaler : 32768
DLL 2020-11-27 03:32:01.619762 - Iteration: 918 throughput_train : 263.740 seq/s mlm_loss : 6.9754 nsp_loss : 0.0000 total_loss : 6.9754 avg_loss_step : 6.9754 learning_rate : 0.00011593096 loss_scaler : 32768
DLL 2020-11-27 03:32:03.566299 - Iteration: 919 throughput_train : 263.072 seq/s mlm_loss : 7.2180 nsp_loss : 0.0000 total_loss : 7.2180 avg_loss_step : 7.2180 learning_rate : 0.00011523884 loss_scaler : 32768
DLL 2020-11-27 03:32:05.510501 - Iteration: 920 throughput_train : 263.389 seq/s mlm_loss : 7.0958 nsp_loss : 0.0000 total_loss : 7.0958 avg_loss_step : 7.0958 learning_rate : 0.00011454254 loss_scaler : 32768
DLL 2020-11-27 03:32:07.456779 - Iteration: 921 throughput_train : 263.107 seq/s mlm_loss : 6.9975 nsp_loss : 0.0000 total_loss : 6.9975 avg_loss_step : 6.9975 learning_rate : 0.00011384197 loss_scaler : 32768
DLL 2020-11-27 03:32:09.406610 - Iteration: 922 throughput_train : 262.626 seq/s mlm_loss : 6.9562 nsp_loss : 0.0000 total_loss : 6.9562 avg_loss_step : 6.9562 learning_rate : 0.000113137074 loss_scaler : 32768
DLL 2020-11-27 03:32:11.358496 - Iteration: 923 throughput_train : 262.351 seq/s mlm_loss : 6.9001 nsp_loss : 0.0000 total_loss : 6.9001 avg_loss_step : 6.9001 learning_rate : 0.00011242771 loss_scaler : 32768
DLL 2020-11-27 03:32:13.293055 - Iteration: 924 throughput_train : 264.702 seq/s mlm_loss : 6.9168 nsp_loss : 0.0000 total_loss : 6.9168 avg_loss_step : 6.9168 learning_rate : 0.00011171388 loss_scaler : 32768
DLL 2020-11-27 03:32:15.244974 - Iteration: 925 throughput_train : 262.347 seq/s mlm_loss : 6.9462 nsp_loss : 0.0000 total_loss : 6.9462 avg_loss_step : 6.9462 learning_rate : 0.00011099547 loss_scaler : 32768
DLL 2020-11-27 03:32:17.190964 - Iteration: 926 throughput_train : 263.145 seq/s mlm_loss : 7.0298 nsp_loss : 0.0000 total_loss : 7.0298 avg_loss_step : 7.0298 learning_rate : 0.00011027237 loss_scaler : 32768
DLL 2020-11-27 03:32:19.130395 - Iteration: 927 throughput_train : 264.035 seq/s mlm_loss : 6.8945 nsp_loss : 0.0000 total_loss : 6.8945 avg_loss_step : 6.8945 learning_rate : 0.00010954445 loss_scaler : 32768
DLL 2020-11-27 03:32:21.073587 - Iteration: 928 throughput_train : 263.527 seq/s mlm_loss : 6.7719 nsp_loss : 0.0000 total_loss : 6.7719 avg_loss_step : 6.7719 learning_rate : 0.00010881172 loss_scaler : 32768
DLL 2020-11-27 03:32:23.020428 - Iteration: 929 throughput_train : 263.031 seq/s mlm_loss : 6.8506 nsp_loss : 0.0000 total_loss : 6.8506 avg_loss_step : 6.8506 learning_rate : 0.000108074004 loss_scaler : 32768
DLL 2020-11-27 03:32:24.964830 - Iteration: 930 throughput_train : 263.363 seq/s mlm_loss : 6.8781 nsp_loss : 0.0000 total_loss : 6.8781 avg_loss_step : 6.8781 learning_rate : 0.00010733124 loss_scaler : 32768
DLL 2020-11-27 03:32:26.909971 - Iteration: 931 throughput_train : 263.261 seq/s mlm_loss : 6.9124 nsp_loss : 0.0000 total_loss : 6.9124 avg_loss_step : 6.9124 learning_rate : 0.000106583284 loss_scaler : 32768
DLL 2020-11-27 03:32:28.844607 - Iteration: 932 throughput_train : 264.691 seq/s mlm_loss : 6.9373 nsp_loss : 0.0000 total_loss : 6.9373 avg_loss_step : 6.9373 learning_rate : 0.00010583 loss_scaler : 32768
DLL 2020-11-27 03:32:30.787501 - Iteration: 933 throughput_train : 263.568 seq/s mlm_loss : 6.9597 nsp_loss : 0.0000 total_loss : 6.9597 avg_loss_step : 6.9597 learning_rate : 0.00010507136 loss_scaler : 32768
DLL 2020-11-27 03:32:32.732301 - Iteration: 934 throughput_train : 263.327 seq/s mlm_loss : 6.8342 nsp_loss : 0.0000 total_loss : 6.8342 avg_loss_step : 6.8342 learning_rate : 0.000104307204 loss_scaler : 32768
DLL 2020-11-27 03:32:34.674462 - Iteration: 935 throughput_train : 263.680 seq/s mlm_loss : 6.8903 nsp_loss : 0.0000 total_loss : 6.8903 avg_loss_step : 6.8903 learning_rate : 0.000103537415 loss_scaler : 32768
DLL 2020-11-27 03:32:36.627343 - Iteration: 936 throughput_train : 262.234 seq/s mlm_loss : 6.9973 nsp_loss : 0.0000 total_loss : 6.9973 avg_loss_step : 6.9973 learning_rate : 0.00010276185 loss_scaler : 32768
DLL 2020-11-27 03:32:38.572531 - Iteration: 937 throughput_train : 263.271 seq/s mlm_loss : 6.7610 nsp_loss : 0.0000 total_loss : 6.7610 avg_loss_step : 6.7610 learning_rate : 0.00010198034 loss_scaler : 32768
DLL 2020-11-27 03:32:40.520489 - Iteration: 938 throughput_train : 262.893 seq/s mlm_loss : 6.7461 nsp_loss : 0.0000 total_loss : 6.7461 avg_loss_step : 6.7461 learning_rate : 0.00010119284 loss_scaler : 32768
DLL 2020-11-27 03:32:42.463832 - Iteration: 939 throughput_train : 263.518 seq/s mlm_loss : 6.8241 nsp_loss : 0.0000 total_loss : 6.8241 avg_loss_step : 6.8241 learning_rate : 0.00010039917 loss_scaler : 32768
DLL 2020-11-27 03:32:44.415486 - Iteration: 940 throughput_train : 262.393 seq/s mlm_loss : 7.0425 nsp_loss : 0.0000 total_loss : 7.0425 avg_loss_step : 7.0425 learning_rate : 9.959917e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:46.363619 - Iteration: 941 throughput_train : 262.856 seq/s mlm_loss : 7.0274 nsp_loss : 0.0000 total_loss : 7.0274 avg_loss_step : 7.0274 learning_rate : 9.8792654e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:48.305431 - Iteration: 942 throughput_train : 263.711 seq/s mlm_loss : 6.8771 nsp_loss : 0.0000 total_loss : 6.8771 avg_loss_step : 6.8771 learning_rate : 9.7979544e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:50.247842 - Iteration: 943 throughput_train : 263.630 seq/s mlm_loss : 6.9546 nsp_loss : 0.0000 total_loss : 6.9546 avg_loss_step : 6.9546 learning_rate : 9.7159624e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:52.187188 - Iteration: 944 throughput_train : 264.049 seq/s mlm_loss : 6.8345 nsp_loss : 0.0000 total_loss : 6.8345 avg_loss_step : 6.8345 learning_rate : 9.6332726e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:54.132547 - Iteration: 945 throughput_train : 263.238 seq/s mlm_loss : 6.9657 nsp_loss : 0.0000 total_loss : 6.9657 avg_loss_step : 6.9657 learning_rate : 9.5498675e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:56.084704 - Iteration: 946 throughput_train : 262.313 seq/s mlm_loss : 6.9970 nsp_loss : 0.0000 total_loss : 6.9970 avg_loss_step : 6.9970 learning_rate : 9.465722e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:58.033721 - Iteration: 947 throughput_train : 262.744 seq/s mlm_loss : 6.9875 nsp_loss : 0.0000 total_loss : 6.9875 avg_loss_step : 6.9875 learning_rate : 9.380827e-05 loss_scaler : 32768
DLL 2020-11-27 03:32:59.973992 - Iteration: 948 throughput_train : 263.937 seq/s mlm_loss : 6.7989 nsp_loss : 0.0000 total_loss : 6.7989 avg_loss_step : 6.7989 learning_rate : 9.2951566e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:01.913029 - Iteration: 949 throughput_train : 264.100 seq/s mlm_loss : 6.8354 nsp_loss : 0.0000 total_loss : 6.8354 avg_loss_step : 6.8354 learning_rate : 9.208689e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:03.861673 - Iteration: 950 throughput_train : 262.792 seq/s mlm_loss : 6.9060 nsp_loss : 0.0000 total_loss : 6.9060 avg_loss_step : 6.9060 learning_rate : 9.1213966e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:05.807666 - Iteration: 951 throughput_train : 263.149 seq/s mlm_loss : 6.9194 nsp_loss : 0.0000 total_loss : 6.9194 avg_loss_step : 6.9194 learning_rate : 9.033266e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:07.756321 - Iteration: 952 throughput_train : 262.797 seq/s mlm_loss : 7.0511 nsp_loss : 0.0000 total_loss : 7.0511 avg_loss_step : 7.0511 learning_rate : 8.944268e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:09.700120 - Iteration: 953 throughput_train : 263.442 seq/s mlm_loss : 7.0058 nsp_loss : 0.0000 total_loss : 7.0058 avg_loss_step : 7.0058 learning_rate : 8.854374e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:11.640156 - Iteration: 954 throughput_train : 263.954 seq/s mlm_loss : 6.9056 nsp_loss : 0.0000 total_loss : 6.9056 avg_loss_step : 6.9056 learning_rate : 8.7635584e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:13.595436 - Iteration: 955 throughput_train : 261.895 seq/s mlm_loss : 6.9963 nsp_loss : 0.0000 total_loss : 6.9963 avg_loss_step : 6.9963 learning_rate : 8.671787e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:15.540825 - Iteration: 956 throughput_train : 263.231 seq/s mlm_loss : 6.9382 nsp_loss : 0.0000 total_loss : 6.9382 avg_loss_step : 6.9382 learning_rate : 8.579039e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:17.486852 - Iteration: 957 throughput_train : 263.143 seq/s mlm_loss : 6.9335 nsp_loss : 0.0000 total_loss : 6.9335 avg_loss_step : 6.9335 learning_rate : 8.485277e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:19.434531 - Iteration: 958 throughput_train : 262.930 seq/s mlm_loss : 6.9266 nsp_loss : 0.0000 total_loss : 6.9266 avg_loss_step : 6.9266 learning_rate : 8.390468e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:21.366225 - Iteration: 959 throughput_train : 265.092 seq/s mlm_loss : 6.8848 nsp_loss : 0.0000 total_loss : 6.8848 avg_loss_step : 6.8848 learning_rate : 8.294574e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:23.308822 - Iteration: 960 throughput_train : 263.605 seq/s mlm_loss : 7.0642 nsp_loss : 0.0000 total_loss : 7.0642 avg_loss_step : 7.0642 learning_rate : 8.1975544e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:25.255484 - Iteration: 961 throughput_train : 263.061 seq/s mlm_loss : 6.8351 nsp_loss : 0.0000 total_loss : 6.8351 avg_loss_step : 6.8351 learning_rate : 8.099378e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:27.197392 - Iteration: 962 throughput_train : 263.723 seq/s mlm_loss : 6.9702 nsp_loss : 0.0000 total_loss : 6.9702 avg_loss_step : 6.9702 learning_rate : 7.9999954e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:29.151166 - Iteration: 963 throughput_train : 262.112 seq/s mlm_loss : 6.9918 nsp_loss : 0.0000 total_loss : 6.9918 avg_loss_step : 6.9918 learning_rate : 7.899364e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:31.105493 - Iteration: 964 throughput_train : 262.034 seq/s mlm_loss : 7.0583 nsp_loss : 0.0000 total_loss : 7.0583 avg_loss_step : 7.0583 learning_rate : 7.7974284e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:33.063540 - Iteration: 965 throughput_train : 261.527 seq/s mlm_loss : 7.1737 nsp_loss : 0.0000 total_loss : 7.1737 avg_loss_step : 7.1737 learning_rate : 7.694147e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:35.000369 - Iteration: 966 throughput_train : 264.391 seq/s mlm_loss : 6.8307 nsp_loss : 0.0000 total_loss : 6.8307 avg_loss_step : 6.8307 learning_rate : 7.589462e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:36.945232 - Iteration: 967 throughput_train : 263.301 seq/s mlm_loss : 7.0447 nsp_loss : 0.0000 total_loss : 7.0447 avg_loss_step : 7.0447 learning_rate : 7.483311e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:38.889566 - Iteration: 968 throughput_train : 263.380 seq/s mlm_loss : 7.0521 nsp_loss : 0.0000 total_loss : 7.0521 avg_loss_step : 7.0521 learning_rate : 7.375633e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:40.837848 - Iteration: 969 throughput_train : 262.857 seq/s mlm_loss : 6.8837 nsp_loss : 0.0000 total_loss : 6.8837 avg_loss_step : 6.8837 learning_rate : 7.266353e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:42.787294 - Iteration: 970 throughput_train : 262.679 seq/s mlm_loss : 6.7635 nsp_loss : 0.0000 total_loss : 6.7635 avg_loss_step : 6.7635 learning_rate : 7.155411e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:44.718959 - Iteration: 971 throughput_train : 265.112 seq/s mlm_loss : 6.9365 nsp_loss : 0.0000 total_loss : 6.9365 avg_loss_step : 6.9365 learning_rate : 7.042722e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:46.672159 - Iteration: 972 throughput_train : 262.187 seq/s mlm_loss : 6.9507 nsp_loss : 0.0000 total_loss : 6.9507 avg_loss_step : 6.9507 learning_rate : 6.9282e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:48.605473 - Iteration: 973 throughput_train : 264.875 seq/s mlm_loss : 6.9384 nsp_loss : 0.0000 total_loss : 6.9384 avg_loss_step : 6.9384 learning_rate : 6.811746e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:50.553178 - Iteration: 974 throughput_train : 262.918 seq/s mlm_loss : 6.7134 nsp_loss : 0.0000 total_loss : 6.7134 avg_loss_step : 6.7134 learning_rate : 6.693273e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:52.503393 - Iteration: 975 throughput_train : 262.577 seq/s mlm_loss : 6.7645 nsp_loss : 0.0000 total_loss : 6.7645 avg_loss_step : 6.7645 learning_rate : 6.572664e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:54.433898 - Iteration: 976 throughput_train : 265.260 seq/s mlm_loss : 6.7382 nsp_loss : 0.0000 total_loss : 6.7382 avg_loss_step : 6.7382 learning_rate : 6.449802e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:56.374996 - Iteration: 977 throughput_train : 263.808 seq/s mlm_loss : 6.9420 nsp_loss : 0.0000 total_loss : 6.9420 avg_loss_step : 6.9420 learning_rate : 6.324552e-05 loss_scaler : 32768
DLL 2020-11-27 03:33:58.326857 - Iteration: 978 throughput_train : 262.357 seq/s mlm_loss : 7.0544 nsp_loss : 0.0000 total_loss : 7.0544 avg_loss_step : 7.0544 learning_rate : 6.196764e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:00.269736 - Iteration: 979 throughput_train : 263.566 seq/s mlm_loss : 6.9911 nsp_loss : 0.0000 total_loss : 6.9911 avg_loss_step : 6.9911 learning_rate : 6.0662922e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:02.214705 - Iteration: 980 throughput_train : 263.284 seq/s mlm_loss : 7.0535 nsp_loss : 0.0000 total_loss : 7.0535 avg_loss_step : 7.0535 learning_rate : 5.9329526e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:04.164418 - Iteration: 981 throughput_train : 262.643 seq/s mlm_loss : 6.9968 nsp_loss : 0.0000 total_loss : 6.9968 avg_loss_step : 6.9968 learning_rate : 5.7965462e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:06.115629 - Iteration: 982 throughput_train : 262.442 seq/s mlm_loss : 6.9746 nsp_loss : 0.0000 total_loss : 6.9746 avg_loss_step : 6.9746 learning_rate : 5.6568515e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:08.056133 - Iteration: 983 throughput_train : 263.889 seq/s mlm_loss : 7.1669 nsp_loss : 0.0000 total_loss : 7.1669 avg_loss_step : 7.1669 learning_rate : 5.51361e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:10.007734 - Iteration: 984 throughput_train : 262.389 seq/s mlm_loss : 6.8964 nsp_loss : 0.0000 total_loss : 6.8964 avg_loss_step : 6.8964 learning_rate : 5.3665553e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:11.952179 - Iteration: 985 throughput_train : 263.356 seq/s mlm_loss : 6.9327 nsp_loss : 0.0000 total_loss : 6.9327 avg_loss_step : 6.9327 learning_rate : 5.2153555e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:13.899200 - Iteration: 986 throughput_train : 263.005 seq/s mlm_loss : 6.8967 nsp_loss : 0.0000 total_loss : 6.8967 avg_loss_step : 6.8967 learning_rate : 5.0596398e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:15.850486 - Iteration: 987 throughput_train : 262.435 seq/s mlm_loss : 7.0340 nsp_loss : 0.0000 total_loss : 7.0340 avg_loss_step : 7.0340 learning_rate : 4.8989674e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:17.791556 - Iteration: 988 throughput_train : 263.815 seq/s mlm_loss : 7.0388 nsp_loss : 0.0000 total_loss : 7.0388 avg_loss_step : 7.0388 learning_rate : 4.7328533e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:19.735560 - Iteration: 989 throughput_train : 263.415 seq/s mlm_loss : 7.0282 nsp_loss : 0.0000 total_loss : 7.0282 avg_loss_step : 7.0282 learning_rate : 4.5606932e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:21.687572 - Iteration: 990 throughput_train : 262.337 seq/s mlm_loss : 6.9032 nsp_loss : 0.0000 total_loss : 6.9032 avg_loss_step : 6.9032 learning_rate : 4.381774e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:23.632230 - Iteration: 991 throughput_train : 263.328 seq/s mlm_loss : 6.9428 nsp_loss : 0.0000 total_loss : 6.9428 avg_loss_step : 6.9428 learning_rate : 4.195231e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:25.575963 - Iteration: 992 throughput_train : 263.451 seq/s mlm_loss : 6.9965 nsp_loss : 0.0000 total_loss : 6.9965 avg_loss_step : 6.9965 learning_rate : 3.999986e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:27.521152 - Iteration: 993 throughput_train : 263.253 seq/s mlm_loss : 6.9281 nsp_loss : 0.0000 total_loss : 6.9281 avg_loss_step : 6.9281 learning_rate : 3.794721e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:29.466436 - Iteration: 994 throughput_train : 263.243 seq/s mlm_loss : 6.8886 nsp_loss : 0.0000 total_loss : 6.8886 avg_loss_step : 6.8886 learning_rate : 3.577699e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:31.412530 - Iteration: 995 throughput_train : 263.134 seq/s mlm_loss : 6.9287 nsp_loss : 0.0000 total_loss : 6.9287 avg_loss_step : 6.9287 learning_rate : 3.3466327e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:33.367772 - Iteration: 996 throughput_train : 261.900 seq/s mlm_loss : 6.9638 nsp_loss : 0.0000 total_loss : 6.9638 avg_loss_step : 6.9638 learning_rate : 3.098382e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:35.317308 - Iteration: 997 throughput_train : 262.673 seq/s mlm_loss : 6.9143 nsp_loss : 0.0000 total_loss : 6.9143 avg_loss_step : 6.9143 learning_rate : 2.8284087e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:37.266406 - Iteration: 998 throughput_train : 262.731 seq/s mlm_loss : 6.9291 nsp_loss : 0.0000 total_loss : 6.9291 avg_loss_step : 6.9291 learning_rate : 2.5298059e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:39.209309 - Iteration: 999 throughput_train : 263.564 seq/s mlm_loss : 6.7641 nsp_loss : 0.0000 total_loss : 6.7641 avg_loss_step : 6.7641 learning_rate : 2.1908761e-05 loss_scaler : 32768
DLL 2020-11-27 03:34:41.159398 - Iteration: 1000 throughput_train : 262.595 seq/s mlm_loss : 6.8897 nsp_loss : 0.0000 total_loss : 6.8897 avg_loss_step : 6.8897 learning_rate : 1.7888427e-05 loss_scaler : 32768
INFO:tensorflow:Saving checkpoints for 1000 into /cluster/home/andreku/norbert/model/model.ckpt.
I1127 03:35:07.889332 47493880456384 basic_session_run_hooks.py:606] Saving checkpoints for 1000 into /cluster/home/andreku/norbert/model/model.ckpt.
INFO:tensorflow:Loss for final step: 6.968466.
I1127 03:35:08.175190 47510213129408 estimator.py:371] Loss for final step: 6.968466.
INFO:tensorflow:Loss for final step: 6.9535007.
I1127 03:35:08.179248 47379959473344 estimator.py:371] Loss for final step: 6.9535007.
INFO:tensorflow:Loss for final step: 6.982458.
I1127 03:35:08.204637 47913607222464 estimator.py:371] Loss for final step: 6.982458.
Skipping time record for 1000 due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:35:28.905232 - Iteration: 1001 throughput_train : 10.724 seq/s mlm_loss : 6.9356 nsp_loss : 0.0000 total_loss : 6.9356 avg_loss_step : 6.9356 learning_rate : 1.2648652e-05 loss_scaler : 32768
INFO:tensorflow:Loss for final step: 6.9356194.
I1127 03:35:29.232800 47493880456384 estimator.py:371] Loss for final step: 6.9356194.
INFO:tensorflow:-----------------------------
I1127 03:35:29.233134 47493880456384 run_pretraining.py:642] -----------------------------
INFO:tensorflow:Total Training Time = 2250.64 for Sentences = 512000
I1127 03:35:29.233198 47493880456384 run_pretraining.py:644] Total Training Time = 2250.64 for Sentences = 512000
INFO:tensorflow:Total Training Time W/O Overhead = 1934.17 for Sentences = 491520
I1127 03:35:29.233260 47493880456384 run_pretraining.py:646] Total Training Time W/O Overhead = 1934.17 for Sentences = 491520
INFO:tensorflow:Throughput Average (sentences/sec) with overhead = 227.49
I1127 03:35:29.233312 47493880456384 run_pretraining.py:647] Throughput Average (sentences/sec) with overhead = 227.49
INFO:tensorflow:Throughput Average (sentences/sec) = 254.12
I1127 03:35:29.233363 47493880456384 run_pretraining.py:648] Throughput Average (sentences/sec) = 254.12
DLL 2020-11-27 03:35:29.233412 - throughput_train : 254.125 seq/s
INFO:tensorflow:-----------------------------
I1127 03:35:29.233526 47493880456384 run_pretraining.py:650] -----------------------------
Training BERT finished.
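For reference, the throughput summary above can be cross-checked with a short Python sketch; all input values are copied from the log, and the 512-sequence global batch is inferred from 512000 sentences over 1000 iterations:

# Sanity check of the reported training throughput (inputs copied from the summary above).
total_time_s = 2250.64            # Total Training Time, with checkpoint/warmup overhead
time_wo_overhead_s = 1934.17      # Total Training Time W/O Overhead
sentences_total = 512_000         # Sentences, with overhead
sentences_wo_overhead = 491_520   # Sentences, without the warmup/checkpoint steps
iterations = 1000

print(f"sequences per global step: {sentences_total / iterations:.0f}")                      # 512
print(f"throughput with overhead : {sentences_total / total_time_s:.2f} seq/s")              # ~227.49
print(f"throughput w/o overhead  : {sentences_wo_overhead / time_wo_overhead_s:.2f} seq/s")  # ~254.12
# At ~1.95 s per iteration this matches the ~263 seq/s reported per step: 512 / 1.95 ≈ 263.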
Task and CPU usage stats:
       JobID    JobName  AllocCPUS   NTasks     MinCPU MinCPUTask     AveCPU    Elapsed ExitCode
------------ ---------- ---------- -------- ---------- ---------- ---------- ---------- --------
1581003            BERT         24                                               00:46:27      0:0
1581003.bat+      batch         24        1   03:42:39          0   03:42:39   00:46:27      0:0
1581003.ext+     extern         24        1   00:00:00          0   00:00:00   00:46:27      0:0
Memory usage stats:
       JobID     MaxRSS MaxRSSTask     AveRSS MaxPages   MaxPagesTask   AvePages
------------ ---------- ---------- ---------- -------- -------------- ----------
1581003
1581003.bat+  17655412K          0  17655412K     9049              0       9049
1581003.ext+          0          0          0        0              0          0
Disk usage stats:
       JobID  MaxDiskRead MaxDiskReadTask    AveDiskRead MaxDiskWrite MaxDiskWriteTask   AveDiskWrite
------------ ------------ --------------- -------------- ------------ ---------------- --------------
1581003
1581003.bat+        9.60M               0          9.60M        0.07M                0          0.07M
1581003.ext+        0.00M               0          0.00M            0                0              0
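The Slurm accounting above can be summarised with a similar sketch; the GiB conversion and the CPU-utilisation figure are derived here from the reported MaxRSS, AveCPU, AllocCPUS and Elapsed values (one common way to gauge how busy the allocated cores were):

# Interpret the Slurm accounting figures above (inputs copied from the tables).
max_rss_kib = 17655412                # MaxRSS of the batch step, reported in KiB
alloc_cpus = 24                       # AllocCPUS
elapsed_s = 46 * 60 + 27              # Elapsed 00:46:27
ave_cpu_s = 3 * 3600 + 42 * 60 + 39   # AveCPU 03:42:39 of the batch step

print(f"peak resident memory: {max_rss_kib / 2**20:.1f} GiB")               # ~16.8 GiB
print(f"CPU utilisation     : {ave_cpu_s / (alloc_cpus * elapsed_s):.1%}")  # ~20% of the 24 allocated cores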
Job 1581003 completed at Fri Nov 27 03:35:35 CET 2020