tensorflow_1.15.2_openBLAS_bert.out 328.59 KiB
Starting job 1581003 on c7-7 at Fri Nov 27 02:49:07 CET 2020

The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) StdEnv
Training corpus: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/
WordPiece vocabulary used: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norwegian_wordpiece_vocab_20k.txt
BERT configuration file: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json
Directory for TF record files: /cluster/home/andreku/norbert/data/tfrecords/
Directory for the trained model: /cluster/home/andreku/norbert/model/
Creating pretraining data (TF records)...
2020-11-27 02:49:26.018191: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From utils/create_pretraining_data.py:70: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki2.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki5.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki4.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki3.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki0.txt
creating instance from /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/no_wiki/nowiki1.txt
*** Writing to output files ***
/cluster/home/andreku/norbert/data/tfrecords/0.tfr
/cluster/home/andreku/norbert/data/tfrecords/1.tfr
/cluster/home/andreku/norbert/data/tfrecords/2.tfr
/cluster/home/andreku/norbert/data/tfrecords/3.tfr
Creating pretraining data (TF records) finished.
Training BERT on the files from /cluster/home/andreku/norbert/data/tfrecords/...
2020-11-27 02:57:00.495256: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
WARNING! Combining --use_xla with --manual_fp16 may prevent convergence.
         This warning message will be removed when the underlying
         issues have been fixed and you are running a TF version
         that has that fix.
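[Editor's note: the --manual_fp16 path behind this warning relies on dynamic loss scaling (note init_loss_scale: 4294967296, i.e. 2**32, later in this log). The control flow can be sketched in plain Python; the update rule — skip and halve on overflow, double after a window of good steps — is the standard scheme, and the growth interval below is illustrative, not NVIDIA's exact implementation:]

```python
class DynamicLossScaler:
    """Standard dynamic loss-scaling loop used with fp16 training (sketch)."""

    def __init__(self, init_scale=2.0 ** 32, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads_overflowed: bool) -> bool:
        """Record one step's overflow status; return True if the step should apply."""
        if grads_overflowed:
            # Inf/NaN in the scaled gradients: skip this step, shrink the scale.
            self.scale = max(self.scale / 2.0, 1.0)
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A long run of finite gradients: try a larger scale again.
            self.scale *= 2.0
            self._good_steps = 0
        return True
```

[The loss is multiplied by `scale` before the backward pass and the gradients divided by it afterwards, keeping small fp16 gradients out of the denormal range.]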
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

INFO:tensorflow:***** Configuaration *****
INFO:tensorflow:  logtostderr: False
INFO:tensorflow:  alsologtostderr: False
INFO:tensorflow:  log_dir: 
INFO:tensorflow:  v: 0
INFO:tensorflow:  verbosity: 0
INFO:tensorflow:  stderrthreshold: fatal
INFO:tensorflow:  showprefixforinfo: True
INFO:tensorflow:  run_with_pdb: False
INFO:tensorflow:  pdb_post_mortem: False
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  visible_device_list: "1"
}
graph_options {
  optimizer_options {
    global_jit_level: ON_1
  }
  rewrite_options {
    memory_optimization: NO_MEM_OPT
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b17e6e8f0d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:  run_with_profiling: False
INFO:tensorflow:  profile_file: None
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  visible_device_list: "2"
}
graph_options {
  optimizer_options {
    global_jit_level: ON_1
  }
  rewrite_options {
    memory_optimization: NO_MEM_OPT
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b363be151d0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:  use_cprofile_for_profiling: True
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': None, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  visible_device_list: "3"
}
graph_options {
  optimizer_options {
    global_jit_level: ON_1
  }
  rewrite_options {
    memory_optimization: NO_MEM_OPT
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b9426cb3150>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:  only_check_args: False
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x2b17e6ceecb0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:  op_conversion_fallback_to_while_loop: False
INFO:tensorflow:  test_random_seed: 301
INFO:tensorflow:  test_srcdir: 
INFO:tensorflow:***** Running training *****
INFO:tensorflow:  test_tmpdir: /tmp/absl_testing
INFO:tensorflow:  test_randomize_ordering_seed: 
INFO:tensorflow:  Batch size = 128
INFO:tensorflow:  xml_output_file: 
INFO:tensorflow:  bert_config_file: /cluster/shared/nlpl/software/easybuild_ak/tests/text_data/norbert_config.json
INFO:tensorflow:  input_files_dir: /cluster/home/andreku/norbert/data/tfrecords/
INFO:tensorflow:  eval_files_dir: None
INFO:tensorflow:  output_dir: /cluster/home/andreku/norbert/model/
INFO:tensorflow:  dllog_path: /cluster/home/andreku/norbert/bert_dllog.json
INFO:tensorflow:  init_checkpoint: None
INFO:tensorflow:  optimizer_type: lamb
INFO:tensorflow:  max_seq_length: 128
INFO:tensorflow:  max_predictions_per_seq: 20
INFO:tensorflow:  do_train: True
INFO:tensorflow:  do_eval: False
INFO:tensorflow:  train_batch_size: 128
INFO:tensorflow:  eval_batch_size: 8
INFO:tensorflow:  learning_rate: 0.0001
INFO:tensorflow:  num_train_steps: 1000
INFO:tensorflow:  num_warmup_steps: 100
INFO:tensorflow:  save_checkpoints_steps: 1000
INFO:tensorflow:  display_loss_steps: 1
INFO:tensorflow:  iterations_per_loop: 1000
INFO:tensorflow:  max_eval_steps: 100
INFO:tensorflow:  num_accumulation_steps: 1
INFO:tensorflow:  allreduce_post_accumulation: False
INFO:tensorflow:  verbose_logging: False
INFO:tensorflow:  horovod: True
INFO:tensorflow:  report_loss: True
INFO:tensorflow:  manual_fp16: True
INFO:tensorflow:  amp: False
INFO:tensorflow:  use_xla: True
INFO:tensorflow:  init_loss_scale: 4294967296
INFO:tensorflow:  ?: False
INFO:tensorflow:  help: False
INFO:tensorflow:  helpshort: False
INFO:tensorflow:  helpfull: False
INFO:tensorflow:  helpxml: False
INFO:tensorflow:**************************
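[Editor's note: the flags above (learning_rate: 0.0001, num_warmup_steps: 100, num_train_steps: 1000) describe BERT's usual warmup-then-decay schedule. As a simplified sketch — linear warmup followed by linear decay to zero; the actual pretraining script uses a polynomial decay that behaves slightly differently around the warmup boundary:]

```python
def bert_lr(step, base_lr=1e-4, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```

[With these values the peak rate 1e-4 is reached at step 100 and decays to zero by step 1000.]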
INFO:tensorflow:Using config: {'_model_dir': '/cluster/home/andreku/norbert/model/', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': gpu_options {
  visible_device_list: "0"
}
graph_options {
  optimizer_options {
    global_jit_level: ON_1
  }
  rewrite_options {
    memory_optimization: NO_MEM_OPT
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 10000, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x2b326e601090>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W1127 02:57:58.648943 47379959473344 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
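[Editor's note: the recommended replacement, tf.data.Dataset.interleave(...), reads block_length items from each of cycle_length open inputs in round-robin order. A pure-Python model of that deterministic order follows; the exact exhaustion/replacement behaviour is our reading of the tf.data docs, not TensorFlow's code:]

```python
import itertools

def interleave(sources, cycle_length, block_length):
    """Round-robin block_length items from each of cycle_length open inputs."""
    remaining = iter(sources)
    slots = [iter(s) for s in itertools.islice(remaining, cycle_length)]
    out, i = [], 0
    while slots:
        if i >= len(slots):
            i = 0                      # wrap around the cycle
        taken, exhausted = 0, False
        while taken < block_length:
            try:
                out.append(next(slots[i]))
                taken += 1
            except StopIteration:
                exhausted = True
                break
        if exhausted:
            nxt = next(remaining, None)
            if nxt is not None:
                slots[i] = iter(nxt)   # a fresh input takes over this slot
                i += 1
            else:
                slots.pop(i)           # no inputs left: shrink the cycle
        else:
            i += 1
    return out
```

[In the run above the "sources" are the four .tfr shards; `sloppy` execution relaxes this deterministic order for throughput, which is what the deprecation note's `tf.data.Options` remark is about.]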
WARNING:tensorflow:From run_pretraining.py:508: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From run_pretraining.py:525: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/contrib/data/python/ops/batching.py:276: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
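[Editor's note: the fused `map_and_batch` is behaviorally a map followed by a batch, as the message above says. A minimal pure-Python sketch of that unfused pipeline (illustrative, not the tf.data API):]

```python
def map_then_batch(records, map_func, batch_size, drop_remainder=False):
    """Apply map_func to each record, then group results into batches,
    mirroring Dataset.map(map_func).batch(batch_size, drop_remainder)."""
    mapped = [map_func(r) for r in records]
    batches = [mapped[i:i + batch_size]
               for i in range(0, len(mapped), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()  # drop the final partial batch, as BERT pretraining does
    return batches
```

With `drop_remainder=True` (required here so every batch has the static shape the model expects), a trailing partial batch is discarded.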
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.parse_single_example is deprecated. Please use tf.io.parse_single_example instead.

WARNING:tensorflow:From run_pretraining.py:540: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
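[Editor's note: `tf.to_int32(x)` is shorthand for `tf.cast(x, tf.int32)`; for floating-point inputs the cast truncates toward zero, the same rule as Python's built-in `int()`. A TensorFlow-free illustration of that truncation:]

```python
def cast_to_int32(x):
    """Truncate-toward-zero conversion, the rule tf.cast(x, tf.int32)
    applies to a float scalar (sketch only; real tf.cast handles
    tensors and arbitrary dtypes)."""
    return int(x)  # int() truncates toward zero
```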
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (128, 128)
INFO:tensorflow:  name = input_mask, shape = (128, 128)
INFO:tensorflow:  name = masked_lm_ids, shape = (128, 20)
INFO:tensorflow:  name = masked_lm_positions, shape = (128, 20)
INFO:tensorflow:  name = masked_lm_weights, shape = (128, 20)
INFO:tensorflow:  name = next_sentence_labels, shape = (128, 1)
INFO:tensorflow:  name = segment_ids, shape = (128, 128)
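[Editor's note: the feature shapes logged above follow directly from this run's settings of batch_size = 128, max_seq_length = 128, and max_predictions_per_seq = 20. A small sketch tying each feature name to those dimensions (values taken from the log itself):]

```python
# Dimensions as logged for this pretraining run
batch_size, max_seq_length, max_predictions_per_seq = 128, 128, 20

feature_shapes = {
    "input_ids":            (batch_size, max_seq_length),
    "input_mask":           (batch_size, max_seq_length),
    "segment_ids":          (batch_size, max_seq_length),
    "masked_lm_positions":  (batch_size, max_predictions_per_seq),
    "masked_lm_ids":        (batch_size, max_predictions_per_seq),
    "masked_lm_weights":    (batch_size, max_predictions_per_seq),
    "next_sentence_labels": (batch_size, 1),
}
```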
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:176: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:427: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:366: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
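[Editor's note: the `keep_prob` → `rate` migration above is a one-line conversion; `rate` is the fraction of units *dropped*, not kept. A quick sketch:]

```python
def keep_prob_to_rate(keep_prob):
    """Convert the deprecated dropout keep_prob argument to rate:
    rate = 1 - keep_prob (probability a unit is zeroed out)."""
    return 1.0 - keep_prob

# BERT's default hidden_dropout_prob of 0.1 corresponds to keep_prob = 0.9
```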
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/modeling.py:683: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.Dense instead.
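[Editor's note: `keras.layers.Dense`, like the deprecated `tf.layers.dense` it replaces, computes `activation(x @ W + b)`. A minimal pure-Python version of that computation for a single input vector (illustrative only, no TensorFlow required):]

```python
def dense(x, weights, bias, activation=None):
    """Compute activation(x @ W + b) for one input vector x.
    weights: input_dim x output_dim nested list; bias: length output_dim.
    Sketch of what a Dense layer does, not the Keras API."""
    out_dim = len(bias)
    y = [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
         for j in range(out_dim)]
    return [activation(v) for v in y] if activation else y
```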
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W1127 02:57:58.976366 47493880456384 deprecation.py:323] From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From run_pretraining.py:295: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

decayed_learning_rate_at_crossover_point = 4.000000e-04, adjusted_init_lr = 4.000000e-04
Initializing LAMB Optimizer
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-TensorFlow/1.15.2-foss-2019b-Python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:169: The name tf.is_finite is deprecated. Please use tf.math.is_finite instead.

WARNING:tensorflow:From /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/software/NLPL-nvidia_BERT/20.06.8-foss-2019b-TensorFlow-1.15.2-Python-3.7.4/optimization.py:178: The name tf.global_norm is deprecated. Please use tf.linalg.global_norm instead.

INFO:tensorflow:Done calling model_fn.
I1127 02:58:15.510627 47379959473344 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I1127 02:58:15.811256 47493880456384 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I1127 02:58:47.664816 47379959473344 monitored_session.py:240] Graph was finalized.
2020-11-27 02:58:47.696844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-27 02:58:49.716379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:14:00.0
2020-11-27 02:58:49.716420: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:b1:00.0
2020-11-27 02:58:49.716562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:88:00.0
2020-11-27 02:58:49.716629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:49.716665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:39:00.0
2020-11-27 02:58:49.716700: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:51.386188: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 02:58:51.734149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-11-27 02:58:52.149409: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-11-27 02:58:52.778962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-11-27 02:58:53.025336: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-11-27 02:58:54.429955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 02:58:54.439842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 1
2020-11-27 02:58:54.439993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 2
2020-11-27 02:58:54.440065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 3
2020-11-27 02:58:54.440105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-11-27 02:58:54.440136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-11-27 02:58:55.749275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.749315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      3 
2020-11-27 02:58:55.749322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   N 
2020-11-27 02:58:55.753723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b1:00.0, compute capability: 6.0)
2020-11-27 02:58:55.756771: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.790619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.790656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      2 
2020-11-27 02:58:55.790663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   N 
2020-11-27 02:58:55.794820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:88:00.0, compute capability: 6.0)
2020-11-27 02:58:55.796612: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.833128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.833169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      1 
2020-11-27 02:58:55.833176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   N 
2020-11-27 02:58:55.837259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:39:00.0, compute capability: 6.0)
2020-11-27 02:58:55.840814: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
2020-11-27 02:58:55.922214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 02:58:55.922259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-11-27 02:58:55.922267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-11-27 02:58:55.926298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15125 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:14:00.0, compute capability: 6.0)
2020-11-27 02:58:55.929590: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Running local_init_op.
I1127 02:59:01.640508 47379959473344 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I1127 02:59:02.036135 47379959473344 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /cluster/home/andreku/norbert/model/model.ckpt.
I1127 02:59:51.432053 47493880456384 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /cluster/home/andreku/norbert/model/model.ckpt.
c7-7:41737:41793 [0] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41737:41793 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41737:41793 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41737:41793 [0] NCCL INFO Using network IB
NCCL version 2.6.4+cuda10.1
c7-7:41740:41792 [3] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41739:41795 [2] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41738:41794 [1] NCCL INFO Bootstrap : Using [0]ib0:10.33.7.7<0>
c7-7:41740:41792 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41738:41794 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41739:41795 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
c7-7:41740:41792 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41740:41792 [3] NCCL INFO Using network IB
c7-7:41738:41794 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41738:41794 [1] NCCL INFO Using network IB
c7-7:41739:41795 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:10.33.7.7<0>
c7-7:41739:41795 [2] NCCL INFO Using network IB
c7-7:41739:41795 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41739:41795 [2] NCCL INFO Trees [0] 3/-1/-1->2->1|1->2->3/-1/-1 [1] 3/-1/-1->2->1|1->2->3/-1/-1
c7-7:41739:41795 [2] NCCL INFO Setting affinity for GPU 2 to 03f0,0003f000
c7-7:41737:41793 [0] NCCL INFO Channel 00/02 :    0   1   2   3
c7-7:41737:41793 [0] NCCL INFO Channel 01/02 :    0   1   2   3
c7-7:41740:41792 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41740:41792 [3] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] -1/-1/-1->3->2|2->3->-1/-1/-1
c7-7:41740:41792 [3] NCCL INFO Setting affinity for GPU 3 to 03f0,0003f000
c7-7:41738:41794 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41738:41794 [1] NCCL INFO Trees [0] 2/-1/-1->1->0|0->1->2/-1/-1 [1] 2/-1/-1->1->0|0->1->2/-1/-1
c7-7:41738:41794 [1] NCCL INFO Setting affinity for GPU 1 to 3f00003f
c7-7:41737:41793 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
c7-7:41737:41793 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1|-1->0->1/-1/-1 [1] 1/-1/-1->0->-1|-1->0->1/-1/-1
c7-7:41737:41793 [0] NCCL INFO Setting affinity for GPU 0 to 3f00003f
c7-7:41738:41794 [1] NCCL INFO Ring 00 : 1[39000] -> 2[88000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 00 : 3[b1000] -> 0[14000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 00 : 2[88000] -> 3[b1000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO Ring 00 : 0[14000] -> 1[39000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 00 : 2[88000] -> 1[39000] via P2P/IPC
c7-7:41738:41794 [1] NCCL INFO Ring 00 : 1[39000] -> 0[14000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 00 : 3[b1000] -> 2[88000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 01 : 2[88000] -> 3[b1000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 01 : 3[b1000] -> 0[14000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO Ring 01 : 0[14000] -> 1[39000] via P2P/IPC
c7-7:41738:41794 [1] NCCL INFO Ring 01 : 1[39000] -> 2[88000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO Ring 01 : 3[b1000] -> 2[88000] via P2P/IPC
c7-7:41739:41795 [2] NCCL INFO Ring 01 : 2[88000] -> 1[39000] via P2P/IPC
c7-7:41740:41792 [3] NCCL INFO comm 0x2b942817e920 rank 3 nranks 4 cudaDev 3 busId b1000 - Init COMPLETE
c7-7:41738:41794 [1] NCCL INFO Ring 01 : 1[39000] -> 0[14000] via P2P/IPC
c7-7:41737:41793 [0] NCCL INFO comm 0x2b32702ae640 rank 0 nranks 4 cudaDev 0 busId 14000 - Init COMPLETE
c7-7:41739:41795 [2] NCCL INFO comm 0x2b36401cc990 rank 2 nranks 4 cudaDev 2 busId 88000 - Init COMPLETE
c7-7:41738:41794 [1] NCCL INFO comm 0x2b17e817e930 rank 1 nranks 4 cudaDev 1 busId 39000 - Init COMPLETE
c7-7:41737:41793 [0] NCCL INFO Launch mode Parallel
WARNING:tensorflow:From run_pretraining.py:146: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead.

2020-11-27 03:00:42.097413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-11-27 03:00:44.042029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:00:48.592843 - Iteration: 1  throughput_train : 17.862 seq/s mlm_loss : 10.0161  nsp_loss : 0.6303  total_loss : 10.6464  avg_loss_step : 10.6464  learning_rate : 0.0  loss_scaler : 4294967296 
INFO:tensorflow:loss = 10.646426, step = 0
I1127 03:00:48.593047 47493880456384 basic_session_run_hooks.py:262] loss = 10.646426, step = 0
INFO:tensorflow:loss = 10.658132, step = 0
I1127 03:00:49.077786 47379959473344 basic_session_run_hooks.py:262] loss = 10.658132, step = 0
INFO:tensorflow:loss = 10.675867, step = 0
I1127 03:00:49.085876 47510213129408 basic_session_run_hooks.py:262] loss = 10.675867, step = 0
INFO:tensorflow:loss = 10.671821, step = 0
I1127 03:00:49.093230 47913607222464 basic_session_run_hooks.py:262] loss = 10.671821, step = 0
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.408520 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:16.408800 - Iteration: 1  throughput_train : 18.407 seq/s mlm_loss : 10.0516  nsp_loss : 0.6226  total_loss : 10.6742  avg_loss_step : 10.6742  learning_rate : 0.0  loss_scaler : 4294967296 
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409372 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409482 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:16.409999 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.281277 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.281934 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:18.282180 - Iteration: 1  throughput_train : 273.343 seq/s mlm_loss : 10.0447  nsp_loss : 0.6394  total_loss : 10.6841  avg_loss_step : 10.6841  learning_rate : 0.0  loss_scaler : 2147483648 
W1127 03:01:18.286860 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:18.288407 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.167413 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.168865 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:20.169528 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:20.169768 - Iteration: 1  throughput_train : 271.284 seq/s mlm_loss : 10.0232  nsp_loss : 0.6320  total_loss : 10.6552  avg_loss_step : 10.6552  learning_rate : 0.0  loss_scaler : 2147483648 
W1127 03:01:20.173998 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.053810 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:22.054053 - Iteration: 1  throughput_train : 271.760 seq/s mlm_loss : 10.0441  nsp_loss : 0.6293  total_loss : 10.6734  avg_loss_step : 10.6734  learning_rate : 0.0  loss_scaler : 1073741824 
W1127 03:01:22.056735 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.057951 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:22.057997 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.931773 47493880456384 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:23.932012 - Iteration: 1  throughput_train : 272.674 seq/s mlm_loss : 10.0205  nsp_loss : 0.6204  total_loss : 10.6409  avg_loss_step : 10.6409  learning_rate : 0.0  loss_scaler : 1073741824 
W1127 03:01:23.933689 47913607222464 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.935443 47379959473344 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W1127 03:01:23.937267 47510213129408 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:25.838017 - Iteration: 1  throughput_train : 268.666 seq/s mlm_loss : 10.0254  nsp_loss : 0.6318  total_loss : 10.6573  avg_loss_step : 10.6573  learning_rate : 0.0  loss_scaler : 536870912 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:27.716195 - Iteration: 1  throughput_train : 272.647 seq/s mlm_loss : 10.0365  nsp_loss : 0.6306  total_loss : 10.6672  avg_loss_step : 10.6672  learning_rate : 0.0  loss_scaler : 536870912 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:29.590927 - Iteration: 1  throughput_train : 273.147 seq/s mlm_loss : 10.0342  nsp_loss : 0.6437  total_loss : 10.6779  avg_loss_step : 10.6779  learning_rate : 0.0  loss_scaler : 268435456 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:31.473217 - Iteration: 1  throughput_train : 272.054 seq/s mlm_loss : 10.0322  nsp_loss : 0.6440  total_loss : 10.6761  avg_loss_step : 10.6761  learning_rate : 0.0  loss_scaler : 268435456 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:33.362713 - Iteration: 1  throughput_train : 271.017 seq/s mlm_loss : 10.0186  nsp_loss : 0.6424  total_loss : 10.6611  avg_loss_step : 10.6611  learning_rate : 0.0  loss_scaler : 134217728 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:35.244203 - Iteration: 1  throughput_train : 272.166 seq/s mlm_loss : 10.0325  nsp_loss : 0.6334  total_loss : 10.6659  avg_loss_step : 10.6659  learning_rate : 0.0  loss_scaler : 134217728 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:37.134643 - Iteration: 1  throughput_train : 270.878 seq/s mlm_loss : 10.0104  nsp_loss : 0.6264  total_loss : 10.6368  avg_loss_step : 10.6368  learning_rate : 0.0  loss_scaler : 67108864 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:39.021636 - Iteration: 1  throughput_train : 271.373 seq/s mlm_loss : 10.0204  nsp_loss : 0.6293  total_loss : 10.6497  avg_loss_step : 10.6497  learning_rate : 0.0  loss_scaler : 67108864 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:40.918920 - Iteration: 1  throughput_train : 269.902 seq/s mlm_loss : 9.9981  nsp_loss : 0.6241  total_loss : 10.6222  avg_loss_step : 10.6222  learning_rate : 0.0  loss_scaler : 33554432 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:42.811608 - Iteration: 1  throughput_train : 270.562 seq/s mlm_loss : 10.0181  nsp_loss : 0.6417  total_loss : 10.6598  avg_loss_step : 10.6598  learning_rate : 0.0  loss_scaler : 33554432 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:44.708990 - Iteration: 1  throughput_train : 269.888 seq/s mlm_loss : 10.0143  nsp_loss : 0.6288  total_loss : 10.6432  avg_loss_step : 10.6432  learning_rate : 0.0  loss_scaler : 16777216 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:46.604911 - Iteration: 1  throughput_train : 270.097 seq/s mlm_loss : 10.0303  nsp_loss : 0.6504  total_loss : 10.6807  avg_loss_step : 10.6807  learning_rate : 0.0  loss_scaler : 16777216 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:48.496796 - Iteration: 1  throughput_train : 270.671 seq/s mlm_loss : 9.9989  nsp_loss : 0.6374  total_loss : 10.6362  avg_loss_step : 10.6362  learning_rate : 0.0  loss_scaler : 8388608 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:50.386115 - Iteration: 1  throughput_train : 271.039 seq/s mlm_loss : 10.0406  nsp_loss : 0.6216  total_loss : 10.6623  avg_loss_step : 10.6623  learning_rate : 0.0  loss_scaler : 8388608 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:52.294803 - Iteration: 1  throughput_train : 268.289 seq/s mlm_loss : 10.0155  nsp_loss : 0.6210  total_loss : 10.6365  avg_loss_step : 10.6365  learning_rate : 0.0  loss_scaler : 4194304 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:54.193717 - Iteration: 1  throughput_train : 269.670 seq/s mlm_loss : 10.0482  nsp_loss : 0.6340  total_loss : 10.6822  avg_loss_step : 10.6822  learning_rate : 0.0  loss_scaler : 4194304 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:56.081120 - Iteration: 1  throughput_train : 271.314 seq/s mlm_loss : 10.0234  nsp_loss : 0.6274  total_loss : 10.6508  avg_loss_step : 10.6508  learning_rate : 0.0  loss_scaler : 2097152 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:57.983241 - Iteration: 1  throughput_train : 269.214 seq/s mlm_loss : 10.0021  nsp_loss : 0.6238  total_loss : 10.6259  avg_loss_step : 10.6259  learning_rate : 0.0  loss_scaler : 2097152 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:01:59.876384 - Iteration: 1  throughput_train : 270.492 seq/s mlm_loss : 10.0168  nsp_loss : 0.6303  total_loss : 10.6471  avg_loss_step : 10.6471  learning_rate : 0.0  loss_scaler : 1048576 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:01.776965 - Iteration: 1  throughput_train : 269.435 seq/s mlm_loss : 10.0485  nsp_loss : 0.6272  total_loss : 10.6756  avg_loss_step : 10.6756  learning_rate : 0.0  loss_scaler : 1048576 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:03.668076 - Iteration: 1  throughput_train : 270.783 seq/s mlm_loss : 10.0267  nsp_loss : 0.6193  total_loss : 10.6460  avg_loss_step : 10.6460  learning_rate : 0.0  loss_scaler : 524288 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:05.566324 - Iteration: 1  throughput_train : 269.766 seq/s mlm_loss : 10.0401  nsp_loss : 0.6316  total_loss : 10.6717  avg_loss_step : 10.6717  learning_rate : 0.0  loss_scaler : 524288 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:07.461631 - Iteration: 1  throughput_train : 270.197 seq/s mlm_loss : 10.0415  nsp_loss : 0.6244  total_loss : 10.6660  avg_loss_step : 10.6660  learning_rate : 0.0  loss_scaler : 262144 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:09.352564 - Iteration: 1  throughput_train : 270.818 seq/s mlm_loss : 10.0427  nsp_loss : 0.6339  total_loss : 10.6765  avg_loss_step : 10.6765  learning_rate : 0.0  loss_scaler : 262144 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:11.245946 - Iteration: 1  throughput_train : 270.464 seq/s mlm_loss : 10.0306  nsp_loss : 0.6332  total_loss : 10.6638  avg_loss_step : 10.6638  learning_rate : 0.0  loss_scaler : 131072 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:13.141670 - Iteration: 1  throughput_train : 270.138 seq/s mlm_loss : 10.0604  nsp_loss : 0.6395  total_loss : 10.6999  avg_loss_step : 10.6999  learning_rate : 0.0  loss_scaler : 131072 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:15.034842 - Iteration: 1  throughput_train : 270.488 seq/s mlm_loss : 10.0276  nsp_loss : 0.6205  total_loss : 10.6481  avg_loss_step : 10.6481  learning_rate : 0.0  loss_scaler : 65536 
Skipping time record for  0  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:16.927021 - Iteration: 1  throughput_train : 270.631 seq/s mlm_loss : 10.0274  nsp_loss : 0.6343  total_loss : 10.6617  avg_loss_step : 10.6617  learning_rate : 0.0  loss_scaler : 65536 
Skipping time record for  1  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:18.875661 - Iteration: 2  throughput_train : 262.791 seq/s mlm_loss : 10.0255  nsp_loss : 0.6242  total_loss : 10.6497  avg_loss_step : 10.6497  learning_rate : 0.0  loss_scaler : 32768 
Skipping time record for  2  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:20.824668 - Iteration: 3  throughput_train : 262.741 seq/s mlm_loss : 10.0220  nsp_loss : 0.6288  total_loss : 10.6509  avg_loss_step : 10.6509  learning_rate : 4e-06  loss_scaler : 32768 
Skipping time record for  3  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:22.779278 - Iteration: 4  throughput_train : 261.989 seq/s mlm_loss : 10.0407  nsp_loss : 0.6191  total_loss : 10.6598  avg_loss_step : 10.6598  learning_rate : 8e-06  loss_scaler : 32768 
Skipping time record for  4  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:24.723752 - Iteration: 5  throughput_train : 263.353 seq/s mlm_loss : 10.0448  nsp_loss : 0.6121  total_loss : 10.6569  avg_loss_step : 10.6569  learning_rate : 1.2e-05  loss_scaler : 32768 
Skipping time record for  5  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:02:26.672401 - Iteration: 6  throughput_train : 262.790 seq/s mlm_loss : 10.0324  nsp_loss : 0.5929  total_loss : 10.6253  avg_loss_step : 10.6253  learning_rate : 1.6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:28.626130 - Iteration: 7  throughput_train : 262.100 seq/s mlm_loss : 10.0436  nsp_loss : 0.5690  total_loss : 10.6126  avg_loss_step : 10.6126  learning_rate : 2e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:30.570816 - Iteration: 8  throughput_train : 263.320 seq/s mlm_loss : 10.0315  nsp_loss : 0.5704  total_loss : 10.6018  avg_loss_step : 10.6018  learning_rate : 2.4e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:32.512938 - Iteration: 9  throughput_train : 263.668 seq/s mlm_loss : 10.0385  nsp_loss : 0.5515  total_loss : 10.5900  avg_loss_step : 10.5900  learning_rate : 2.8e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:34.459389 - Iteration: 10  throughput_train : 263.082 seq/s mlm_loss : 10.0142  nsp_loss : 0.5288  total_loss : 10.5431  avg_loss_step : 10.5431  learning_rate : 3.2e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:36.404591 - Iteration: 11  throughput_train : 263.252 seq/s mlm_loss : 10.0280  nsp_loss : 0.4976  total_loss : 10.5255  avg_loss_step : 10.5255  learning_rate : 3.6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:38.354096 - Iteration: 12  throughput_train : 262.670 seq/s mlm_loss : 10.0271  nsp_loss : 0.4700  total_loss : 10.4971  avg_loss_step : 10.4971  learning_rate : 4e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:40.307365 - Iteration: 13  throughput_train : 262.163 seq/s mlm_loss : 10.0201  nsp_loss : 0.4389  total_loss : 10.4589  avg_loss_step : 10.4589  learning_rate : 4.4e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:42.260597 - Iteration: 14  throughput_train : 262.172 seq/s mlm_loss : 10.0028  nsp_loss : 0.4064  total_loss : 10.4092  avg_loss_step : 10.4092  learning_rate : 4.8e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:44.214010 - Iteration: 15  throughput_train : 262.143 seq/s mlm_loss : 9.9913  nsp_loss : 0.3739  total_loss : 10.3652  avg_loss_step : 10.3652  learning_rate : 5.2e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:46.161220 - Iteration: 16  throughput_train : 262.979 seq/s mlm_loss : 9.9777  nsp_loss : 0.3383  total_loss : 10.3160  avg_loss_step : 10.3160  learning_rate : 5.6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:48.108658 - Iteration: 17  throughput_train : 262.948 seq/s mlm_loss : 10.0070  nsp_loss : 0.3076  total_loss : 10.3146  avg_loss_step : 10.3146  learning_rate : 6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:50.056367 - Iteration: 18  throughput_train : 262.911 seq/s mlm_loss : 9.9829  nsp_loss : 0.2693  total_loss : 10.2522  avg_loss_step : 10.2522  learning_rate : 6.4e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:51.993858 - Iteration: 19  throughput_train : 264.299 seq/s mlm_loss : 9.9751  nsp_loss : 0.2352  total_loss : 10.2103  avg_loss_step : 10.2103  learning_rate : 6.8e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:53.937635 - Iteration: 20  throughput_train : 263.444 seq/s mlm_loss : 9.9623  nsp_loss : 0.2138  total_loss : 10.1761  avg_loss_step : 10.1761  learning_rate : 7.2e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:55.880048 - Iteration: 21  throughput_train : 263.629 seq/s mlm_loss : 9.9588  nsp_loss : 0.1891  total_loss : 10.1479  avg_loss_step : 10.1479  learning_rate : 7.6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:57.826433 - Iteration: 22  throughput_train : 263.091 seq/s mlm_loss : 9.9589  nsp_loss : 0.1648  total_loss : 10.1237  avg_loss_step : 10.1237  learning_rate : 8e-05  loss_scaler : 32768 
DLL 2020-11-27 03:02:59.775349 - Iteration: 23  throughput_train : 262.753 seq/s mlm_loss : 9.9276  nsp_loss : 0.1462  total_loss : 10.0738  avg_loss_step : 10.0738  learning_rate : 8.4e-05  loss_scaler : 32768 
DLL 2020-11-27 03:03:01.715156 - Iteration: 24  throughput_train : 264.000 seq/s mlm_loss : 9.9098  nsp_loss : 0.1270  total_loss : 10.0368  avg_loss_step : 10.0368  learning_rate : 8.8e-05  loss_scaler : 32768 
DLL 2020-11-27 03:03:03.662734 - Iteration: 25  throughput_train : 262.932 seq/s mlm_loss : 9.9122  nsp_loss : 0.1125  total_loss : 10.0247  avg_loss_step : 10.0247  learning_rate : 9.2e-05  loss_scaler : 32768 
DLL 2020-11-27 03:03:05.609201 - Iteration: 26  throughput_train : 263.098 seq/s mlm_loss : 9.9003  nsp_loss : 0.0961  total_loss : 9.9964  avg_loss_step : 9.9964  learning_rate : 9.6e-05  loss_scaler : 32768 
DLL 2020-11-27 03:03:07.548543 - Iteration: 27  throughput_train : 264.067 seq/s mlm_loss : 9.9011  nsp_loss : 0.0831  total_loss : 9.9842  avg_loss_step : 9.9842  learning_rate : 1e-04  loss_scaler : 32768 
DLL 2020-11-27 03:03:09.491554 - Iteration: 28  throughput_train : 263.563 seq/s mlm_loss : 9.8791  nsp_loss : 0.0727  total_loss : 9.9518  avg_loss_step : 9.9518  learning_rate : 0.000104  loss_scaler : 32768 
DLL 2020-11-27 03:03:11.438947 - Iteration: 29  throughput_train : 262.957 seq/s mlm_loss : 9.8622  nsp_loss : 0.0619  total_loss : 9.9241  avg_loss_step : 9.9241  learning_rate : 0.000108  loss_scaler : 32768 
DLL 2020-11-27 03:03:13.390545 - Iteration: 30  throughput_train : 262.389 seq/s mlm_loss : 9.8653  nsp_loss : 0.0551  total_loss : 9.9203  avg_loss_step : 9.9203  learning_rate : 0.000112  loss_scaler : 32768 
DLL 2020-11-27 03:03:15.340530 - Iteration: 31  throughput_train : 262.606 seq/s mlm_loss : 9.8029  nsp_loss : 0.0476  total_loss : 9.8505  avg_loss_step : 9.8505  learning_rate : 0.000116  loss_scaler : 32768 
DLL 2020-11-27 03:03:17.289971 - Iteration: 32  throughput_train : 262.677 seq/s mlm_loss : 9.7908  nsp_loss : 0.0428  total_loss : 9.8336  avg_loss_step : 9.8336  learning_rate : 0.00012  loss_scaler : 32768 
DLL 2020-11-27 03:03:19.241101 - Iteration: 33  throughput_train : 262.450 seq/s mlm_loss : 9.7904  nsp_loss : 0.0379  total_loss : 9.8283  avg_loss_step : 9.8283  learning_rate : 0.000124  loss_scaler : 32768 
DLL 2020-11-27 03:03:21.181789 - Iteration: 34  throughput_train : 263.863 seq/s mlm_loss : 9.7822  nsp_loss : 0.0350  total_loss : 9.8171  avg_loss_step : 9.8171  learning_rate : 0.000128  loss_scaler : 32768 
DLL 2020-11-27 03:03:23.120842 - Iteration: 35  throughput_train : 264.087 seq/s mlm_loss : 9.7403  nsp_loss : 0.0318  total_loss : 9.7721  avg_loss_step : 9.7721  learning_rate : 0.000132  loss_scaler : 32768 
DLL 2020-11-27 03:03:25.062710 - Iteration: 36  throughput_train : 263.706 seq/s mlm_loss : 9.7183  nsp_loss : 0.0281  total_loss : 9.7463  avg_loss_step : 9.7463  learning_rate : 0.000136  loss_scaler : 32768 
DLL 2020-11-27 03:03:27.010874 - Iteration: 37  throughput_train : 262.859 seq/s mlm_loss : 9.6493  nsp_loss : 0.0260  total_loss : 9.6753  avg_loss_step : 9.6753  learning_rate : 0.00014  loss_scaler : 32768 
DLL 2020-11-27 03:03:28.957299 - Iteration: 38  throughput_train : 263.089 seq/s mlm_loss : 9.6475  nsp_loss : 0.0240  total_loss : 9.6715  avg_loss_step : 9.6715  learning_rate : 0.000144  loss_scaler : 32768 
DLL 2020-11-27 03:03:30.907874 - Iteration: 39  throughput_train : 262.541 seq/s mlm_loss : 9.6494  nsp_loss : 0.0226  total_loss : 9.6720  avg_loss_step : 9.6720  learning_rate : 0.000148  loss_scaler : 32768 
DLL 2020-11-27 03:03:32.857617 - Iteration: 40  throughput_train : 262.654 seq/s mlm_loss : 9.6227  nsp_loss : 0.0215  total_loss : 9.6442  avg_loss_step : 9.6442  learning_rate : 0.000152  loss_scaler : 32768 
DLL 2020-11-27 03:03:34.797107 - Iteration: 41  throughput_train : 264.058 seq/s mlm_loss : 9.6086  nsp_loss : 0.0205  total_loss : 9.6292  avg_loss_step : 9.6292  learning_rate : 0.000156  loss_scaler : 32768 
DLL 2020-11-27 03:03:36.743103 - Iteration: 42  throughput_train : 263.176 seq/s mlm_loss : 9.5962  nsp_loss : 0.0196  total_loss : 9.6158  avg_loss_step : 9.6158  learning_rate : 0.00016  loss_scaler : 32768 
DLL 2020-11-27 03:03:38.681478 - Iteration: 43  throughput_train : 264.210 seq/s mlm_loss : 9.5094  nsp_loss : 0.0187  total_loss : 9.5281  avg_loss_step : 9.5281  learning_rate : 0.000164  loss_scaler : 32768 
DLL 2020-11-27 03:03:40.627741 - Iteration: 44  throughput_train : 263.133 seq/s mlm_loss : 9.5604  nsp_loss : 0.0181  total_loss : 9.5785  avg_loss_step : 9.5785  learning_rate : 0.000168  loss_scaler : 32768 
DLL 2020-11-27 03:03:42.584452 - Iteration: 45  throughput_train : 261.709 seq/s mlm_loss : 9.5284  nsp_loss : 0.0179  total_loss : 9.5463  avg_loss_step : 9.5463  learning_rate : 0.000172  loss_scaler : 32768 
DLL 2020-11-27 03:03:44.531023 - Iteration: 46  throughput_train : 263.098 seq/s mlm_loss : 9.4623  nsp_loss : 0.0173  total_loss : 9.4796  avg_loss_step : 9.4796  learning_rate : 0.000176  loss_scaler : 32768 
DLL 2020-11-27 03:03:46.475050 - Iteration: 47  throughput_train : 263.441 seq/s mlm_loss : 9.4430  nsp_loss : 0.0170  total_loss : 9.4600  avg_loss_step : 9.4600  learning_rate : 0.00018  loss_scaler : 32768 
DLL 2020-11-27 03:03:48.420719 - Iteration: 48  throughput_train : 263.218 seq/s mlm_loss : 9.4856  nsp_loss : 0.0169  total_loss : 9.5025  avg_loss_step : 9.5025  learning_rate : 0.000184  loss_scaler : 32768 
DLL 2020-11-27 03:03:50.356966 - Iteration: 49  throughput_train : 264.499 seq/s mlm_loss : 9.4995  nsp_loss : 0.0166  total_loss : 9.5161  avg_loss_step : 9.5161  learning_rate : 0.000188  loss_scaler : 32768 
DLL 2020-11-27 03:03:52.294758 - Iteration: 50  throughput_train : 264.281 seq/s mlm_loss : 9.5366  nsp_loss : 0.0165  total_loss : 9.5531  avg_loss_step : 9.5531  learning_rate : 0.000192  loss_scaler : 32768 
DLL 2020-11-27 03:03:54.239294 - Iteration: 51  throughput_train : 263.341 seq/s mlm_loss : 9.4012  nsp_loss : 0.0164  total_loss : 9.4176  avg_loss_step : 9.4176  learning_rate : 0.000196  loss_scaler : 32768 
DLL 2020-11-27 03:03:56.183770 - Iteration: 52  throughput_train : 263.352 seq/s mlm_loss : 9.3971  nsp_loss : 0.0166  total_loss : 9.4137  avg_loss_step : 9.4137  learning_rate : 0.0002  loss_scaler : 32768 
DLL 2020-11-27 03:03:58.120446 - Iteration: 53  throughput_train : 264.409 seq/s mlm_loss : 9.3740  nsp_loss : 0.0163  total_loss : 9.3903  avg_loss_step : 9.3903  learning_rate : 0.000204  loss_scaler : 32768 
DLL 2020-11-27 03:04:00.070836 - Iteration: 54  throughput_train : 262.551 seq/s mlm_loss : 9.3342  nsp_loss : 0.0161  total_loss : 9.3502  avg_loss_step : 9.3502  learning_rate : 0.000208  loss_scaler : 32768 
DLL 2020-11-27 03:04:02.009290 - Iteration: 55  throughput_train : 264.166 seq/s mlm_loss : 9.3070  nsp_loss : 0.0162  total_loss : 9.3232  avg_loss_step : 9.3232  learning_rate : 0.000212  loss_scaler : 32768 
DLL 2020-11-27 03:04:03.965430 - Iteration: 56  throughput_train : 261.778 seq/s mlm_loss : 9.2850  nsp_loss : 0.0162  total_loss : 9.3012  avg_loss_step : 9.3012  learning_rate : 0.000216  loss_scaler : 32768 
DLL 2020-11-27 03:04:05.915510 - Iteration: 57  throughput_train : 262.592 seq/s mlm_loss : 9.2349  nsp_loss : 0.0158  total_loss : 9.2507  avg_loss_step : 9.2507  learning_rate : 0.00022  loss_scaler : 32768 
DLL 2020-11-27 03:04:07.867688 - Iteration: 58  throughput_train : 262.311 seq/s mlm_loss : 9.2527  nsp_loss : 0.0155  total_loss : 9.2682  avg_loss_step : 9.2682  learning_rate : 0.000224  loss_scaler : 32768 
DLL 2020-11-27 03:04:09.808519 - Iteration: 59  throughput_train : 263.845 seq/s mlm_loss : 9.3086  nsp_loss : 0.0148  total_loss : 9.3234  avg_loss_step : 9.3234  learning_rate : 0.000228  loss_scaler : 32768 
DLL 2020-11-27 03:04:11.749229 - Iteration: 60  throughput_train : 263.860 seq/s mlm_loss : 9.2488  nsp_loss : 0.0148  total_loss : 9.2635  avg_loss_step : 9.2635  learning_rate : 0.000232  loss_scaler : 32768 
DLL 2020-11-27 03:04:13.697693 - Iteration: 61  throughput_train : 262.811 seq/s mlm_loss : 9.2955  nsp_loss : 0.0143  total_loss : 9.3097  avg_loss_step : 9.3097  learning_rate : 0.00023599999  loss_scaler : 32768 
DLL 2020-11-27 03:04:15.654668 - Iteration: 62  throughput_train : 261.666 seq/s mlm_loss : 9.3006  nsp_loss : 0.0133  total_loss : 9.3139  avg_loss_step : 9.3139  learning_rate : 0.00024  loss_scaler : 32768 
DLL 2020-11-27 03:04:17.602752 - Iteration: 63  throughput_train : 262.860 seq/s mlm_loss : 9.2198  nsp_loss : 0.0128  total_loss : 9.2326  avg_loss_step : 9.2326  learning_rate : 0.000244  loss_scaler : 32768 
DLL 2020-11-27 03:04:19.535990 - Iteration: 64  throughput_train : 264.883 seq/s mlm_loss : 9.2287  nsp_loss : 0.0119  total_loss : 9.2405  avg_loss_step : 9.2405  learning_rate : 0.000248  loss_scaler : 32768 
DLL 2020-11-27 03:04:21.482309 - Iteration: 65  throughput_train : 263.110 seq/s mlm_loss : 9.1751  nsp_loss : 0.0114  total_loss : 9.1865  avg_loss_step : 9.1865  learning_rate : 0.000252  loss_scaler : 32768 
DLL 2020-11-27 03:04:23.425658 - Iteration: 66  throughput_train : 263.504 seq/s mlm_loss : 9.1630  nsp_loss : 0.0108  total_loss : 9.1737  avg_loss_step : 9.1737  learning_rate : 0.000256  loss_scaler : 32768 
DLL 2020-11-27 03:04:25.366943 - Iteration: 67  throughput_train : 263.782 seq/s mlm_loss : 9.2060  nsp_loss : 0.0100  total_loss : 9.2160  avg_loss_step : 9.2160  learning_rate : 0.00026  loss_scaler : 32768 
DLL 2020-11-27 03:04:27.312225 - Iteration: 68  throughput_train : 263.240 seq/s mlm_loss : 9.1642  nsp_loss : 0.0094  total_loss : 9.1737  avg_loss_step : 9.1737  learning_rate : 0.000264  loss_scaler : 32768 
DLL 2020-11-27 03:04:29.264004 - Iteration: 69  throughput_train : 262.363 seq/s mlm_loss : 9.2531  nsp_loss : 0.0086  total_loss : 9.2617  avg_loss_step : 9.2617  learning_rate : 0.000268  loss_scaler : 32768 
DLL 2020-11-27 03:04:31.209444 - Iteration: 70  throughput_train : 263.218 seq/s mlm_loss : 9.1726  nsp_loss : 0.0083  total_loss : 9.1808  avg_loss_step : 9.1808  learning_rate : 0.000272  loss_scaler : 32768 
DLL 2020-11-27 03:04:33.161850 - Iteration: 71  throughput_train : 262.279 seq/s mlm_loss : 9.1361  nsp_loss : 0.0076  total_loss : 9.1437  avg_loss_step : 9.1437  learning_rate : 0.000276  loss_scaler : 32768 
DLL 2020-11-27 03:04:35.113795 - Iteration: 72  throughput_train : 262.340 seq/s mlm_loss : 9.1495  nsp_loss : 0.0070  total_loss : 9.1565  avg_loss_step : 9.1565  learning_rate : 0.00028  loss_scaler : 32768 
DLL 2020-11-27 03:04:37.067169 - Iteration: 73  throughput_train : 262.148 seq/s mlm_loss : 9.1399  nsp_loss : 0.0066  total_loss : 9.1465  avg_loss_step : 9.1465  learning_rate : 0.000284  loss_scaler : 32768 
DLL 2020-11-27 03:04:39.012996 - Iteration: 74  throughput_train : 263.166 seq/s mlm_loss : 9.1689  nsp_loss : 0.0061  total_loss : 9.1750  avg_loss_step : 9.1750  learning_rate : 0.000288  loss_scaler : 32768 
DLL 2020-11-27 03:04:40.958430 - Iteration: 75  throughput_train : 263.222 seq/s mlm_loss : 9.0112  nsp_loss : 0.0057  total_loss : 9.0169  avg_loss_step : 9.0169  learning_rate : 0.000292  loss_scaler : 32768 
DLL 2020-11-27 03:04:42.901463 - Iteration: 76  throughput_train : 263.554 seq/s mlm_loss : 8.9815  nsp_loss : 0.0051  total_loss : 8.9866  avg_loss_step : 8.9866  learning_rate : 0.000296  loss_scaler : 32768 
DLL 2020-11-27 03:04:44.846331 - Iteration: 77  throughput_train : 263.297 seq/s mlm_loss : 9.0860  nsp_loss : 0.0047  total_loss : 9.0907  avg_loss_step : 9.0907  learning_rate : 0.00029999999  loss_scaler : 32768 
DLL 2020-11-27 03:04:46.791285 - Iteration: 78  throughput_train : 263.291 seq/s mlm_loss : 9.0976  nsp_loss : 0.0043  total_loss : 9.1018  avg_loss_step : 9.1018  learning_rate : 0.000304  loss_scaler : 32768 
DLL 2020-11-27 03:04:48.734075 - Iteration: 79  throughput_train : 263.600 seq/s mlm_loss : 9.1228  nsp_loss : 0.0040  total_loss : 9.1268  avg_loss_step : 9.1268  learning_rate : 0.000308  loss_scaler : 32768 
DLL 2020-11-27 03:04:50.680997 - Iteration: 80  throughput_train : 263.018 seq/s mlm_loss : 9.0853  nsp_loss : 0.0036  total_loss : 9.0889  avg_loss_step : 9.0889  learning_rate : 0.000312  loss_scaler : 32768 
DLL 2020-11-27 03:04:52.630948 - Iteration: 81  throughput_train : 262.613 seq/s mlm_loss : 9.0399  nsp_loss : 0.0033  total_loss : 9.0432  avg_loss_step : 9.0432  learning_rate : 0.000316  loss_scaler : 32768 
DLL 2020-11-27 03:04:54.577493 - Iteration: 82  throughput_train : 263.080 seq/s mlm_loss : 9.0501  nsp_loss : 0.0030  total_loss : 9.0530  avg_loss_step : 9.0530  learning_rate : 0.00032  loss_scaler : 32768 
DLL 2020-11-27 03:04:56.525278 - Iteration: 83  throughput_train : 262.902 seq/s mlm_loss : 9.0453  nsp_loss : 0.0027  total_loss : 9.0480  avg_loss_step : 9.0480  learning_rate : 0.000324  loss_scaler : 32768 
DLL 2020-11-27 03:04:58.468384 - Iteration: 84  throughput_train : 263.535 seq/s mlm_loss : 9.0070  nsp_loss : 0.0025  total_loss : 9.0095  avg_loss_step : 9.0095  learning_rate : 0.000328  loss_scaler : 32768 
DLL 2020-11-27 03:05:00.408075 - Iteration: 85  throughput_train : 263.999 seq/s mlm_loss : 9.0142  nsp_loss : 0.0023  total_loss : 9.0165  avg_loss_step : 9.0165  learning_rate : 0.000332  loss_scaler : 32768 
DLL 2020-11-27 03:05:02.363888 - Iteration: 86  throughput_train : 261.822 seq/s mlm_loss : 9.0216  nsp_loss : 0.0021  total_loss : 9.0237  avg_loss_step : 9.0237  learning_rate : 0.000336  loss_scaler : 32768 
DLL 2020-11-27 03:05:04.309124 - Iteration: 87  throughput_train : 263.245 seq/s mlm_loss : 9.0332  nsp_loss : 0.0019  total_loss : 9.0351  avg_loss_step : 9.0351  learning_rate : 0.00034  loss_scaler : 32768 
DLL 2020-11-27 03:05:06.254732 - Iteration: 88  throughput_train : 263.195 seq/s mlm_loss : 9.0397  nsp_loss : 0.0018  total_loss : 9.0414  avg_loss_step : 9.0414  learning_rate : 0.000344  loss_scaler : 32768 
DLL 2020-11-27 03:05:08.194217 - Iteration: 89  throughput_train : 264.026 seq/s mlm_loss : 8.9267  nsp_loss : 0.0016  total_loss : 8.9283  avg_loss_step : 8.9283  learning_rate : 0.000348  loss_scaler : 32768 
DLL 2020-11-27 03:05:10.135184 - Iteration: 90  throughput_train : 263.824 seq/s mlm_loss : 8.9722  nsp_loss : 0.0014  total_loss : 8.9736  avg_loss_step : 8.9736  learning_rate : 0.000352  loss_scaler : 32768 
DLL 2020-11-27 03:05:12.082224 - Iteration: 91  throughput_train : 263.002 seq/s mlm_loss : 9.0116  nsp_loss : 0.0013  total_loss : 9.0129  avg_loss_step : 9.0129  learning_rate : 0.000356  loss_scaler : 32768 
DLL 2020-11-27 03:05:14.028997 - Iteration: 92  throughput_train : 263.038 seq/s mlm_loss : 8.9696  nsp_loss : 0.0012  total_loss : 8.9708  avg_loss_step : 8.9708  learning_rate : 0.00036  loss_scaler : 32768 
DLL 2020-11-27 03:05:15.973743 - Iteration: 93  throughput_train : 263.312 seq/s mlm_loss : 8.8364  nsp_loss : 0.0011  total_loss : 8.8375  avg_loss_step : 8.8375  learning_rate : 0.000364  loss_scaler : 32768 
DLL 2020-11-27 03:05:17.934060 - Iteration: 94  throughput_train : 261.220 seq/s mlm_loss : 8.8243  nsp_loss : 0.0010  total_loss : 8.8253  avg_loss_step : 8.8253  learning_rate : 0.000368  loss_scaler : 32768 
DLL 2020-11-27 03:05:19.873869 - Iteration: 95  throughput_train : 263.982 seq/s mlm_loss : 8.9199  nsp_loss : 0.0009  total_loss : 8.9208  avg_loss_step : 8.9208  learning_rate : 0.000372  loss_scaler : 32768 
DLL 2020-11-27 03:05:21.825101 - Iteration: 96  throughput_train : 262.437 seq/s mlm_loss : 8.9223  nsp_loss : 0.0009  total_loss : 8.9232  avg_loss_step : 8.9232  learning_rate : 0.000376  loss_scaler : 32768 
DLL 2020-11-27 03:05:23.768475 - Iteration: 97  throughput_train : 263.498 seq/s mlm_loss : 8.8848  nsp_loss : 0.0008  total_loss : 8.8856  avg_loss_step : 8.8856  learning_rate : 0.00038  loss_scaler : 32768 
DLL 2020-11-27 03:05:25.719176 - Iteration: 98  throughput_train : 262.512 seq/s mlm_loss : 8.7879  nsp_loss : 0.0008  total_loss : 8.7887  avg_loss_step : 8.7887  learning_rate : 0.000384  loss_scaler : 32768 
DLL 2020-11-27 03:05:27.664530 - Iteration: 99  throughput_train : 263.230 seq/s mlm_loss : 8.9027  nsp_loss : 0.0008  total_loss : 8.9034  avg_loss_step : 8.9034  learning_rate : 0.000388  loss_scaler : 32768 
DLL 2020-11-27 03:05:29.611703 - Iteration: 100  throughput_train : 262.985 seq/s mlm_loss : 8.8705  nsp_loss : 0.0007  total_loss : 8.8712  avg_loss_step : 8.8712  learning_rate : 0.000392  loss_scaler : 32768 
DLL 2020-11-27 03:05:31.569702 - Iteration: 101  throughput_train : 261.534 seq/s mlm_loss : 8.9104  nsp_loss : 0.0007  total_loss : 8.9112  avg_loss_step : 8.9112  learning_rate : 0.000396  loss_scaler : 32768 
DLL 2020-11-27 03:05:33.508654 - Iteration: 102  throughput_train : 264.099 seq/s mlm_loss : 8.9061  nsp_loss : 0.0007  total_loss : 8.9068  avg_loss_step : 8.9068  learning_rate : 0.0003794733  loss_scaler : 32768 
DLL 2020-11-27 03:05:35.457738 - Iteration: 103  throughput_train : 262.725 seq/s mlm_loss : 8.7767  nsp_loss : 0.0007  total_loss : 8.7774  avg_loss_step : 8.7774  learning_rate : 0.00037926243  loss_scaler : 32768 
DLL 2020-11-27 03:05:37.407342 - Iteration: 104  throughput_train : 262.656 seq/s mlm_loss : 8.8475  nsp_loss : 0.0007  total_loss : 8.8482  avg_loss_step : 8.8482  learning_rate : 0.00037905143  loss_scaler : 32768 
DLL 2020-11-27 03:05:39.357321 - Iteration: 105  throughput_train : 262.607 seq/s mlm_loss : 8.8304  nsp_loss : 0.0007  total_loss : 8.8311  avg_loss_step : 8.8311  learning_rate : 0.0003788403  loss_scaler : 32768 
DLL 2020-11-27 03:05:41.301994 - Iteration: 106  throughput_train : 263.324 seq/s mlm_loss : 8.7950  nsp_loss : 0.0007  total_loss : 8.7957  avg_loss_step : 8.7957  learning_rate : 0.0003786291  loss_scaler : 32768 
DLL 2020-11-27 03:05:43.252256 - Iteration: 107  throughput_train : 262.569 seq/s mlm_loss : 8.8664  nsp_loss : 0.0007  total_loss : 8.8671  avg_loss_step : 8.8671  learning_rate : 0.00037841775  loss_scaler : 32768 
DLL 2020-11-27 03:05:45.209929 - Iteration: 108  throughput_train : 261.573 seq/s mlm_loss : 8.9076  nsp_loss : 0.0007  total_loss : 8.9083  avg_loss_step : 8.9083  learning_rate : 0.00037820628  loss_scaler : 32768 
DLL 2020-11-27 03:05:47.177711 - Iteration: 109  throughput_train : 260.229 seq/s mlm_loss : 8.7935  nsp_loss : 0.0007  total_loss : 8.7942  avg_loss_step : 8.7942  learning_rate : 0.0003779947  loss_scaler : 32768 
DLL 2020-11-27 03:05:49.127389 - Iteration: 110  throughput_train : 262.646 seq/s mlm_loss : 8.7110  nsp_loss : 0.0008  total_loss : 8.7117  avg_loss_step : 8.7117  learning_rate : 0.000377783  loss_scaler : 32768 
DLL 2020-11-27 03:05:51.061484 - Iteration: 111  throughput_train : 264.765 seq/s mlm_loss : 8.6658  nsp_loss : 0.0007  total_loss : 8.6665  avg_loss_step : 8.6665  learning_rate : 0.00037757118  loss_scaler : 32768 
DLL 2020-11-27 03:05:53.002014 - Iteration: 112  throughput_train : 263.885 seq/s mlm_loss : 8.8376  nsp_loss : 0.0007  total_loss : 8.8383  avg_loss_step : 8.8383  learning_rate : 0.00037735925  loss_scaler : 32768 
DLL 2020-11-27 03:05:54.956762 - Iteration: 113  throughput_train : 261.965 seq/s mlm_loss : 8.8310  nsp_loss : 0.0007  total_loss : 8.8317  avg_loss_step : 8.8317  learning_rate : 0.0003771472  loss_scaler : 32768 
DLL 2020-11-27 03:05:56.904382 - Iteration: 114  throughput_train : 262.923 seq/s mlm_loss : 8.7930  nsp_loss : 0.0007  total_loss : 8.7937  avg_loss_step : 8.7937  learning_rate : 0.000376935  loss_scaler : 32768 
DLL 2020-11-27 03:05:58.839994 - Iteration: 115  throughput_train : 264.556 seq/s mlm_loss : 8.7929  nsp_loss : 0.0007  total_loss : 8.7936  avg_loss_step : 8.7936  learning_rate : 0.0003767227  loss_scaler : 32768 
DLL 2020-11-27 03:06:00.784568 - Iteration: 116  throughput_train : 263.336 seq/s mlm_loss : 8.6613  nsp_loss : 0.0007  total_loss : 8.6619  avg_loss_step : 8.6619  learning_rate : 0.0003765103  loss_scaler : 32768 
DLL 2020-11-27 03:06:02.728695 - Iteration: 117  throughput_train : 263.401 seq/s mlm_loss : 8.7185  nsp_loss : 0.0006  total_loss : 8.7191  avg_loss_step : 8.7191  learning_rate : 0.00037629774  loss_scaler : 32768 
DLL 2020-11-27 03:06:04.673613 - Iteration: 118  throughput_train : 263.289 seq/s mlm_loss : 8.6652  nsp_loss : 0.0006  total_loss : 8.6658  avg_loss_step : 8.6658  learning_rate : 0.00037608508  loss_scaler : 32768 
DLL 2020-11-27 03:06:06.621225 - Iteration: 119  throughput_train : 262.924 seq/s mlm_loss : 8.6610  nsp_loss : 0.0005  total_loss : 8.6616  avg_loss_step : 8.6616  learning_rate : 0.0003758723  loss_scaler : 32768 
DLL 2020-11-27 03:06:08.571999 - Iteration: 120  throughput_train : 262.502 seq/s mlm_loss : 8.7262  nsp_loss : 0.0005  total_loss : 8.7267  avg_loss_step : 8.7267  learning_rate : 0.0003756594  loss_scaler : 32768 
DLL 2020-11-27 03:06:10.541920 - Iteration: 121  throughput_train : 259.947 seq/s mlm_loss : 8.6293  nsp_loss : 0.0005  total_loss : 8.6298  avg_loss_step : 8.6298  learning_rate : 0.00037544637  loss_scaler : 32768 
DLL 2020-11-27 03:06:12.490909 - Iteration: 122  throughput_train : 262.739 seq/s mlm_loss : 8.5872  nsp_loss : 0.0005  total_loss : 8.5877  avg_loss_step : 8.5877  learning_rate : 0.00037523327  loss_scaler : 32768 
DLL 2020-11-27 03:06:14.435157 - Iteration: 123  throughput_train : 263.379 seq/s mlm_loss : 8.7120  nsp_loss : 0.0005  total_loss : 8.7124  avg_loss_step : 8.7124  learning_rate : 0.00037502  loss_scaler : 32768 
DLL 2020-11-27 03:06:16.376265 - Iteration: 124  throughput_train : 263.806 seq/s mlm_loss : 8.6522  nsp_loss : 0.0004  total_loss : 8.6527  avg_loss_step : 8.6527  learning_rate : 0.0003748066  loss_scaler : 32768 
DLL 2020-11-27 03:06:18.322954 - Iteration: 125  throughput_train : 263.049 seq/s mlm_loss : 8.6442  nsp_loss : 0.0004  total_loss : 8.6446  avg_loss_step : 8.6446  learning_rate : 0.0003745931  loss_scaler : 32768 
DLL 2020-11-27 03:06:20.267349 - Iteration: 126  throughput_train : 263.359 seq/s mlm_loss : 8.6053  nsp_loss : 0.0004  total_loss : 8.6057  avg_loss_step : 8.6057  learning_rate : 0.00037437948  loss_scaler : 32768 
DLL 2020-11-27 03:06:22.212657 - Iteration: 127  throughput_train : 263.239 seq/s mlm_loss : 8.5943  nsp_loss : 0.0004  total_loss : 8.5946  avg_loss_step : 8.5946  learning_rate : 0.00037416574  loss_scaler : 32768 
DLL 2020-11-27 03:06:24.156897 - Iteration: 128  throughput_train : 263.381 seq/s mlm_loss : 8.5235  nsp_loss : 0.0004  total_loss : 8.5238  avg_loss_step : 8.5238  learning_rate : 0.00037395186  loss_scaler : 32768 
DLL 2020-11-27 03:06:26.095908 - Iteration: 129  throughput_train : 264.091 seq/s mlm_loss : 8.5214  nsp_loss : 0.0003  total_loss : 8.5217  avg_loss_step : 8.5217  learning_rate : 0.0003737379  loss_scaler : 32768 
DLL 2020-11-27 03:06:28.048722 - Iteration: 130  throughput_train : 262.224 seq/s mlm_loss : 8.4610  nsp_loss : 0.0003  total_loss : 8.4613  avg_loss_step : 8.4613  learning_rate : 0.00037352374  loss_scaler : 32768 
DLL 2020-11-27 03:06:30.000980 - Iteration: 131  throughput_train : 262.298 seq/s mlm_loss : 8.6204  nsp_loss : 0.0003  total_loss : 8.6207  avg_loss_step : 8.6207  learning_rate : 0.0003733095  loss_scaler : 32768 
DLL 2020-11-27 03:06:31.948226 - Iteration: 132  throughput_train : 262.975 seq/s mlm_loss : 8.7430  nsp_loss : 0.0003  total_loss : 8.7432  avg_loss_step : 8.7432  learning_rate : 0.00037309516  loss_scaler : 32768 
DLL 2020-11-27 03:06:33.887387 - Iteration: 133  throughput_train : 264.073 seq/s mlm_loss : 8.6713  nsp_loss : 0.0003  total_loss : 8.6716  avg_loss_step : 8.6716  learning_rate : 0.00037288066  loss_scaler : 32768 
DLL 2020-11-27 03:06:35.836325 - Iteration: 134  throughput_train : 262.748 seq/s mlm_loss : 8.6065  nsp_loss : 0.0003  total_loss : 8.6068  avg_loss_step : 8.6068  learning_rate : 0.00037266605  loss_scaler : 32768 
DLL 2020-11-27 03:06:37.780946 - Iteration: 135  throughput_train : 263.331 seq/s mlm_loss : 8.6003  nsp_loss : 0.0002  total_loss : 8.6006  avg_loss_step : 8.6006  learning_rate : 0.00037245132  loss_scaler : 32768 
DLL 2020-11-27 03:06:39.728538 - Iteration: 136  throughput_train : 262.928 seq/s mlm_loss : 8.5613  nsp_loss : 0.0002  total_loss : 8.5615  avg_loss_step : 8.5615  learning_rate : 0.00037223648  loss_scaler : 32768 
DLL 2020-11-27 03:06:41.670525 - Iteration: 137  throughput_train : 263.687 seq/s mlm_loss : 8.5178  nsp_loss : 0.0002  total_loss : 8.5180  avg_loss_step : 8.5180  learning_rate : 0.0003720215  loss_scaler : 32768 
DLL 2020-11-27 03:06:43.614944 - Iteration: 138  throughput_train : 263.356 seq/s mlm_loss : 8.4490  nsp_loss : 0.0002  total_loss : 8.4492  avg_loss_step : 8.4492  learning_rate : 0.00037180638  loss_scaler : 32768 
DLL 2020-11-27 03:06:45.561708 - Iteration: 139  throughput_train : 263.039 seq/s mlm_loss : 8.5115  nsp_loss : 0.0002  total_loss : 8.5117  avg_loss_step : 8.5117  learning_rate : 0.00037159116  loss_scaler : 32768 
DLL 2020-11-27 03:06:47.510462 - Iteration: 140  throughput_train : 262.771 seq/s mlm_loss : 8.6405  nsp_loss : 0.0002  total_loss : 8.6407  avg_loss_step : 8.6407  learning_rate : 0.00037137582  loss_scaler : 32768 
DLL 2020-11-27 03:06:49.456153 - Iteration: 141  throughput_train : 263.187 seq/s mlm_loss : 8.4564  nsp_loss : 0.0002  total_loss : 8.4566  avg_loss_step : 8.4566  learning_rate : 0.00037116033  loss_scaler : 32768 
DLL 2020-11-27 03:06:51.409935 - Iteration: 142  throughput_train : 262.097 seq/s mlm_loss : 8.5094  nsp_loss : 0.0002  total_loss : 8.5096  avg_loss_step : 8.5096  learning_rate : 0.00037094473  loss_scaler : 32768 
DLL 2020-11-27 03:06:53.356027 - Iteration: 143  throughput_train : 263.132 seq/s mlm_loss : 8.5267  nsp_loss : 0.0002  total_loss : 8.5268  avg_loss_step : 8.5268  learning_rate : 0.000370729  loss_scaler : 32768 
DLL 2020-11-27 03:06:55.302222 - Iteration: 144  throughput_train : 263.117 seq/s mlm_loss : 8.5299  nsp_loss : 0.0002  total_loss : 8.5300  avg_loss_step : 8.5300  learning_rate : 0.00037051315  loss_scaler : 32768 
DLL 2020-11-27 03:06:57.234569 - Iteration: 145  throughput_train : 265.003 seq/s mlm_loss : 8.5149  nsp_loss : 0.0002  total_loss : 8.5151  avg_loss_step : 8.5151  learning_rate : 0.00037029717  loss_scaler : 32768 
DLL 2020-11-27 03:06:59.181302 - Iteration: 146  throughput_train : 263.044 seq/s mlm_loss : 8.4608  nsp_loss : 0.0002  total_loss : 8.4609  avg_loss_step : 8.4609  learning_rate : 0.00037008105  loss_scaler : 32768 
DLL 2020-11-27 03:07:01.123964 - Iteration: 147  throughput_train : 263.597 seq/s mlm_loss : 8.4247  nsp_loss : 0.0002  total_loss : 8.4249  avg_loss_step : 8.4249  learning_rate : 0.00036986484  loss_scaler : 32768 
DLL 2020-11-27 03:07:03.071779 - Iteration: 148  throughput_train : 262.897 seq/s mlm_loss : 8.4268  nsp_loss : 0.0001  total_loss : 8.4270  avg_loss_step : 8.4270  learning_rate : 0.00036964848  loss_scaler : 32768 
DLL 2020-11-27 03:07:05.015625 - Iteration: 149  throughput_train : 263.436 seq/s mlm_loss : 8.4862  nsp_loss : 0.0001  total_loss : 8.4863  avg_loss_step : 8.4863  learning_rate : 0.00036943197  loss_scaler : 32768 
DLL 2020-11-27 03:07:06.962769 - Iteration: 150  throughput_train : 263.001 seq/s mlm_loss : 8.5256  nsp_loss : 0.0001  total_loss : 8.5258  avg_loss_step : 8.5258  learning_rate : 0.00036921538  loss_scaler : 32768 
DLL 2020-11-27 03:07:08.906361 - Iteration: 151  throughput_train : 263.483 seq/s mlm_loss : 8.3895  nsp_loss : 0.0001  total_loss : 8.3896  avg_loss_step : 8.3896  learning_rate : 0.00036899865  loss_scaler : 32768 
DLL 2020-11-27 03:07:10.853418 - Iteration: 152  throughput_train : 263.004 seq/s mlm_loss : 8.4238  nsp_loss : 0.0001  total_loss : 8.4239  avg_loss_step : 8.4239  learning_rate : 0.00036878177  loss_scaler : 32768 
DLL 2020-11-27 03:07:12.796118 - Iteration: 153  throughput_train : 263.592 seq/s mlm_loss : 8.4345  nsp_loss : 0.0001  total_loss : 8.4347  avg_loss_step : 8.4347  learning_rate : 0.00036856477  loss_scaler : 32768 
DLL 2020-11-27 03:07:14.743465 - Iteration: 154  throughput_train : 262.961 seq/s mlm_loss : 8.4097  nsp_loss : 0.0001  total_loss : 8.4098  avg_loss_step : 8.4098  learning_rate : 0.00036834765  loss_scaler : 32768 
DLL 2020-11-27 03:07:16.688479 - Iteration: 155  throughput_train : 263.277 seq/s mlm_loss : 8.4177  nsp_loss : 0.0001  total_loss : 8.4178  avg_loss_step : 8.4178  learning_rate : 0.00036813042  loss_scaler : 32768 
DLL 2020-11-27 03:07:18.638132 - Iteration: 156  throughput_train : 262.650 seq/s mlm_loss : 8.3631  nsp_loss : 0.0001  total_loss : 8.3632  avg_loss_step : 8.3632  learning_rate : 0.00036791302  loss_scaler : 32768 
DLL 2020-11-27 03:07:20.583636 - Iteration: 157  throughput_train : 263.224 seq/s mlm_loss : 8.4297  nsp_loss : 0.0001  total_loss : 8.4299  avg_loss_step : 8.4299  learning_rate : 0.00036769552  loss_scaler : 32768 
DLL 2020-11-27 03:07:22.521711 - Iteration: 158  throughput_train : 264.226 seq/s mlm_loss : 8.3151  nsp_loss : 0.0001  total_loss : 8.3152  avg_loss_step : 8.3152  learning_rate : 0.00036747789  loss_scaler : 32768 
DLL 2020-11-27 03:07:24.468532 - Iteration: 159  throughput_train : 263.042 seq/s mlm_loss : 8.5058  nsp_loss : 0.0001  total_loss : 8.5059  avg_loss_step : 8.5059  learning_rate : 0.0003672601  loss_scaler : 32768 
DLL 2020-11-27 03:07:26.420171 - Iteration: 160  throughput_train : 262.388 seq/s mlm_loss : 8.3777  nsp_loss : 0.0001  total_loss : 8.3778  avg_loss_step : 8.3778  learning_rate : 0.00036704223  loss_scaler : 32768 
DLL 2020-11-27 03:07:28.362922 - Iteration: 161  throughput_train : 263.584 seq/s mlm_loss : 8.3775  nsp_loss : 0.0001  total_loss : 8.3776  avg_loss_step : 8.3776  learning_rate : 0.00036682418  loss_scaler : 32768 
DLL 2020-11-27 03:07:30.311312 - Iteration: 162  throughput_train : 262.831 seq/s mlm_loss : 8.3661  nsp_loss : 0.0001  total_loss : 8.3662  avg_loss_step : 8.3662  learning_rate : 0.00036660602  loss_scaler : 32768 
DLL 2020-11-27 03:07:32.261074 - Iteration: 163  throughput_train : 262.646 seq/s mlm_loss : 8.3650  nsp_loss : 0.0001  total_loss : 8.3651  avg_loss_step : 8.3651  learning_rate : 0.00036638777  loss_scaler : 32768 
DLL 2020-11-27 03:07:34.197052 - Iteration: 164  throughput_train : 264.507 seq/s mlm_loss : 8.3360  nsp_loss : 0.0001  total_loss : 8.3361  avg_loss_step : 8.3361  learning_rate : 0.00036616935  loss_scaler : 32768 
DLL 2020-11-27 03:07:36.141043 - Iteration: 165  throughput_train : 263.414 seq/s mlm_loss : 8.2951  nsp_loss : 0.0001  total_loss : 8.2952  avg_loss_step : 8.2952  learning_rate : 0.0003659508  loss_scaler : 32768 
DLL 2020-11-27 03:07:38.082575 - Iteration: 166  throughput_train : 263.750 seq/s mlm_loss : 8.3435  nsp_loss : 0.0001  total_loss : 8.3436  avg_loss_step : 8.3436  learning_rate : 0.00036573215  loss_scaler : 32768 
DLL 2020-11-27 03:07:40.025780 - Iteration: 167  throughput_train : 263.521 seq/s mlm_loss : 8.2895  nsp_loss : 0.0001  total_loss : 8.2896  avg_loss_step : 8.2896  learning_rate : 0.00036551332  loss_scaler : 32768 
DLL 2020-11-27 03:07:41.969348 - Iteration: 168  throughput_train : 263.472 seq/s mlm_loss : 8.2783  nsp_loss : 0.0001  total_loss : 8.2784  avg_loss_step : 8.2784  learning_rate : 0.0003652944  loss_scaler : 32768 
DLL 2020-11-27 03:07:43.912044 - Iteration: 169  throughput_train : 263.593 seq/s mlm_loss : 8.2646  nsp_loss : 0.0001  total_loss : 8.2647  avg_loss_step : 8.2647  learning_rate : 0.0003650753  loss_scaler : 32768 
DLL 2020-11-27 03:07:45.853093 - Iteration: 170  throughput_train : 263.813 seq/s mlm_loss : 8.2263  nsp_loss : 0.0001  total_loss : 8.2264  avg_loss_step : 8.2264  learning_rate : 0.00036485613  loss_scaler : 32768 
DLL 2020-11-27 03:07:47.799568 - Iteration: 171  throughput_train : 263.080 seq/s mlm_loss : 8.2997  nsp_loss : 0.0001  total_loss : 8.2998  avg_loss_step : 8.2998  learning_rate : 0.0003646368  loss_scaler : 32768 
DLL 2020-11-27 03:07:49.750046 - Iteration: 172  throughput_train : 262.538 seq/s mlm_loss : 8.3229  nsp_loss : 0.0001  total_loss : 8.3229  avg_loss_step : 8.3229  learning_rate : 0.00036441733  loss_scaler : 32768 
DLL 2020-11-27 03:07:51.694425 - Iteration: 173  throughput_train : 263.363 seq/s mlm_loss : 8.2844  nsp_loss : 0.0001  total_loss : 8.2845  avg_loss_step : 8.2845  learning_rate : 0.00036419774  loss_scaler : 32768 
DLL 2020-11-27 03:07:53.638045 - Iteration: 174  throughput_train : 263.467 seq/s mlm_loss : 8.1636  nsp_loss : 0.0001  total_loss : 8.1636  avg_loss_step : 8.1636  learning_rate : 0.00036397803  loss_scaler : 32768 
DLL 2020-11-27 03:07:55.574559 - Iteration: 175  throughput_train : 264.434 seq/s mlm_loss : 8.1904  nsp_loss : 0.0001  total_loss : 8.1905  avg_loss_step : 8.1905  learning_rate : 0.00036375815  loss_scaler : 32768 
DLL 2020-11-27 03:07:57.513801 - Iteration: 176  throughput_train : 264.060 seq/s mlm_loss : 8.2629  nsp_loss : 0.0001  total_loss : 8.2630  avg_loss_step : 8.2630  learning_rate : 0.00036353816  loss_scaler : 32768 
DLL 2020-11-27 03:07:59.482989 - Iteration: 177  throughput_train : 260.046 seq/s mlm_loss : 8.2931  nsp_loss : 0.0001  total_loss : 8.2931  avg_loss_step : 8.2931  learning_rate : 0.00036331802  loss_scaler : 32768 
DLL 2020-11-27 03:08:01.445179 - Iteration: 178  throughput_train : 260.982 seq/s mlm_loss : 8.2797  nsp_loss : 0.0001  total_loss : 8.2797  avg_loss_step : 8.2797  learning_rate : 0.0003630978  loss_scaler : 32768 
DLL 2020-11-27 03:08:03.392348 - Iteration: 179  throughput_train : 262.996 seq/s mlm_loss : 8.2624  nsp_loss : 0.0001  total_loss : 8.2625  avg_loss_step : 8.2625  learning_rate : 0.00036287738  loss_scaler : 32768 
DLL 2020-11-27 03:08:05.336201 - Iteration: 180  throughput_train : 263.445 seq/s mlm_loss : 8.1726  nsp_loss : 0.0001  total_loss : 8.1726  avg_loss_step : 8.1726  learning_rate : 0.00036265687  loss_scaler : 32768 
DLL 2020-11-27 03:08:07.286192 - Iteration: 181  throughput_train : 262.604 seq/s mlm_loss : 8.2768  nsp_loss : 0.0001  total_loss : 8.2768  avg_loss_step : 8.2768  learning_rate : 0.0003624362  loss_scaler : 32768 
DLL 2020-11-27 03:08:09.231375 - Iteration: 182  throughput_train : 263.253 seq/s mlm_loss : 8.1903  nsp_loss : 0.0001  total_loss : 8.1903  avg_loss_step : 8.1903  learning_rate : 0.0003622154  loss_scaler : 32768 
DLL 2020-11-27 03:08:11.179220 - Iteration: 183  throughput_train : 262.898 seq/s mlm_loss : 8.1683  nsp_loss : 0.0001  total_loss : 8.1683  avg_loss_step : 8.1683  learning_rate : 0.00036199446  loss_scaler : 32768 
DLL 2020-11-27 03:08:13.119181 - Iteration: 184  throughput_train : 263.962 seq/s mlm_loss : 8.2485  nsp_loss : 0.0001  total_loss : 8.2485  avg_loss_step : 8.2485  learning_rate : 0.0003617734  loss_scaler : 32768 
DLL 2020-11-27 03:08:15.063698 - Iteration: 185  throughput_train : 263.343 seq/s mlm_loss : 8.1498  nsp_loss : 0.0001  total_loss : 8.1499  avg_loss_step : 8.1499  learning_rate : 0.0003615522  loss_scaler : 32768 
DLL 2020-11-27 03:08:17.016611 - Iteration: 186  throughput_train : 262.213 seq/s mlm_loss : 8.1920  nsp_loss : 0.0000  total_loss : 8.1920  avg_loss_step : 8.1920  learning_rate : 0.00036133087  loss_scaler : 32768 
DLL 2020-11-27 03:08:18.960383 - Iteration: 187  throughput_train : 263.444 seq/s mlm_loss : 8.0945  nsp_loss : 0.0000  total_loss : 8.0946  avg_loss_step : 8.0946  learning_rate : 0.00036110939  loss_scaler : 32768 
DLL 2020-11-27 03:08:20.907984 - Iteration: 188  throughput_train : 262.927 seq/s mlm_loss : 8.1721  nsp_loss : 0.0000  total_loss : 8.1721  avg_loss_step : 8.1721  learning_rate : 0.0003608878  loss_scaler : 32768 
DLL 2020-11-27 03:08:22.841622 - Iteration: 189  throughput_train : 264.825 seq/s mlm_loss : 8.1278  nsp_loss : 0.0000  total_loss : 8.1279  avg_loss_step : 8.1279  learning_rate : 0.00036066602  loss_scaler : 32768 
DLL 2020-11-27 03:08:24.788764 - Iteration: 190  throughput_train : 262.987 seq/s mlm_loss : 8.1138  nsp_loss : 0.0000  total_loss : 8.1138  avg_loss_step : 8.1138  learning_rate : 0.00036044416  loss_scaler : 32768 
DLL 2020-11-27 03:08:26.737418 - Iteration: 191  throughput_train : 262.784 seq/s mlm_loss : 8.1500  nsp_loss : 0.0000  total_loss : 8.1500  avg_loss_step : 8.1500  learning_rate : 0.00036022213  loss_scaler : 32768 
DLL 2020-11-27 03:08:28.697663 - Iteration: 192  throughput_train : 261.235 seq/s mlm_loss : 8.1091  nsp_loss : 0.0000  total_loss : 8.1091  avg_loss_step : 8.1091  learning_rate : 0.00035999998  loss_scaler : 32768 
DLL 2020-11-27 03:08:30.642138 - Iteration: 193  throughput_train : 263.359 seq/s mlm_loss : 8.1852  nsp_loss : 0.0000  total_loss : 8.1852  avg_loss_step : 8.1852  learning_rate : 0.0003597777  loss_scaler : 32768 
DLL 2020-11-27 03:08:32.593501 - Iteration: 194  throughput_train : 262.428 seq/s mlm_loss : 8.1016  nsp_loss : 0.0000  total_loss : 8.1016  avg_loss_step : 8.1016  learning_rate : 0.00035955527  loss_scaler : 32768 
DLL 2020-11-27 03:08:34.535703 - Iteration: 195  throughput_train : 263.659 seq/s mlm_loss : 8.1958  nsp_loss : 0.0000  total_loss : 8.1959  avg_loss_step : 8.1959  learning_rate : 0.00035933268  loss_scaler : 32768 
DLL 2020-11-27 03:08:36.483789 - Iteration: 196  throughput_train : 262.860 seq/s mlm_loss : 8.0895  nsp_loss : 0.0000  total_loss : 8.0896  avg_loss_step : 8.0896  learning_rate : 0.00035911  loss_scaler : 32768 
DLL 2020-11-27 03:08:38.428640 - Iteration: 197  throughput_train : 263.297 seq/s mlm_loss : 8.1169  nsp_loss : 0.0000  total_loss : 8.1169  avg_loss_step : 8.1169  learning_rate : 0.00035888716  loss_scaler : 32768 
DLL 2020-11-27 03:08:40.370687 - Iteration: 198  throughput_train : 263.678 seq/s mlm_loss : 8.0807  nsp_loss : 0.0000  total_loss : 8.0808  avg_loss_step : 8.0808  learning_rate : 0.0003586642  loss_scaler : 32768 
DLL 2020-11-27 03:08:42.310836 - Iteration: 199  throughput_train : 263.936 seq/s mlm_loss : 8.1241  nsp_loss : 0.0000  total_loss : 8.1241  avg_loss_step : 8.1241  learning_rate : 0.00035844106  loss_scaler : 32768 
DLL 2020-11-27 03:08:44.260879 - Iteration: 200  throughput_train : 262.597 seq/s mlm_loss : 8.0980  nsp_loss : 0.0000  total_loss : 8.0980  avg_loss_step : 8.0980  learning_rate : 0.0003582178  loss_scaler : 32768 
DLL 2020-11-27 03:08:46.211101 - Iteration: 201  throughput_train : 262.574 seq/s mlm_loss : 8.0260  nsp_loss : 0.0000  total_loss : 8.0260  avg_loss_step : 8.0260  learning_rate : 0.0003579944  loss_scaler : 32768 
DLL 2020-11-27 03:08:48.163125 - Iteration: 202  throughput_train : 262.337 seq/s mlm_loss : 8.0189  nsp_loss : 0.0000  total_loss : 8.0189  avg_loss_step : 8.0189  learning_rate : 0.00035777086  loss_scaler : 32768 
DLL 2020-11-27 03:08:50.101095 - Iteration: 203  throughput_train : 264.244 seq/s mlm_loss : 7.9208  nsp_loss : 0.0000  total_loss : 7.9209  avg_loss_step : 7.9209  learning_rate : 0.0003575472  loss_scaler : 32768 
DLL 2020-11-27 03:08:52.045706 - Iteration: 204  throughput_train : 263.332 seq/s mlm_loss : 8.0649  nsp_loss : 0.0000  total_loss : 8.0649  avg_loss_step : 8.0649  learning_rate : 0.00035732338  loss_scaler : 32768 
DLL 2020-11-27 03:08:53.998160 - Iteration: 205  throughput_train : 262.273 seq/s mlm_loss : 8.1221  nsp_loss : 0.0000  total_loss : 8.1222  avg_loss_step : 8.1222  learning_rate : 0.0003570994  loss_scaler : 32768 
DLL 2020-11-27 03:08:55.941833 - Iteration: 206  throughput_train : 263.458 seq/s mlm_loss : 7.9739  nsp_loss : 0.0000  total_loss : 7.9740  avg_loss_step : 7.9740  learning_rate : 0.0003568753  loss_scaler : 32768 
DLL 2020-11-27 03:08:57.888539 - Iteration: 207  throughput_train : 263.048 seq/s mlm_loss : 7.9126  nsp_loss : 0.0000  total_loss : 7.9127  avg_loss_step : 7.9127  learning_rate : 0.0003566511  loss_scaler : 32768 
DLL 2020-11-27 03:08:59.816639 - Iteration: 208  throughput_train : 265.588 seq/s mlm_loss : 8.0615  nsp_loss : 0.0000  total_loss : 8.0616  avg_loss_step : 8.0616  learning_rate : 0.0003564267  loss_scaler : 32768 
DLL 2020-11-27 03:09:01.767805 - Iteration: 209  throughput_train : 262.445 seq/s mlm_loss : 8.1732  nsp_loss : 0.0000  total_loss : 8.1732  avg_loss_step : 8.1732  learning_rate : 0.0003562022  loss_scaler : 32768 
DLL 2020-11-27 03:09:03.698268 - Iteration: 210  throughput_train : 265.260 seq/s mlm_loss : 8.0377  nsp_loss : 0.0000  total_loss : 8.0378  avg_loss_step : 8.0378  learning_rate : 0.00035597754  loss_scaler : 32768 
DLL 2020-11-27 03:09:05.647353 - Iteration: 211  throughput_train : 262.726 seq/s mlm_loss : 8.0882  nsp_loss : 0.0000  total_loss : 8.0882  avg_loss_step : 8.0882  learning_rate : 0.0003557527  loss_scaler : 32768 
DLL 2020-11-27 03:09:07.588237 - Iteration: 212  throughput_train : 263.839 seq/s mlm_loss : 8.0901  nsp_loss : 0.0000  total_loss : 8.0901  avg_loss_step : 8.0901  learning_rate : 0.00035552774  loss_scaler : 32768 
DLL 2020-11-27 03:09:09.529088 - Iteration: 213  throughput_train : 263.841 seq/s mlm_loss : 8.0172  nsp_loss : 0.0000  total_loss : 8.0173  avg_loss_step : 8.0173  learning_rate : 0.00035530268  loss_scaler : 32768 
DLL 2020-11-27 03:09:11.479403 - Iteration: 214  throughput_train : 262.561 seq/s mlm_loss : 8.0245  nsp_loss : 0.0000  total_loss : 8.0246  avg_loss_step : 8.0246  learning_rate : 0.00035507744  loss_scaler : 32768 
DLL 2020-11-27 03:09:13.430404 - Iteration: 215  throughput_train : 262.469 seq/s mlm_loss : 7.9356  nsp_loss : 0.0000  total_loss : 7.9357  avg_loss_step : 7.9357  learning_rate : 0.00035485206  loss_scaler : 32768 
DLL 2020-11-27 03:09:15.389328 - Iteration: 216  throughput_train : 261.406 seq/s mlm_loss : 7.9099  nsp_loss : 0.0000  total_loss : 7.9099  avg_loss_step : 7.9099  learning_rate : 0.00035462657  loss_scaler : 32768 
DLL 2020-11-27 03:09:17.341977 - Iteration: 217  throughput_train : 262.247 seq/s mlm_loss : 7.9273  nsp_loss : 0.0000  total_loss : 7.9273  avg_loss_step : 7.9273  learning_rate : 0.0003544009  loss_scaler : 32768 
DLL 2020-11-27 03:09:19.286963 - Iteration: 218  throughput_train : 263.280 seq/s mlm_loss : 8.0235  nsp_loss : 0.0000  total_loss : 8.0235  avg_loss_step : 8.0235  learning_rate : 0.00035417508  loss_scaler : 32768 
DLL 2020-11-27 03:09:21.223025 - Iteration: 219  throughput_train : 264.496 seq/s mlm_loss : 7.8798  nsp_loss : 0.0000  total_loss : 7.8798  avg_loss_step : 7.8798  learning_rate : 0.00035394914  loss_scaler : 32768 
DLL 2020-11-27 03:09:23.169923 - Iteration: 220  throughput_train : 263.022 seq/s mlm_loss : 7.9186  nsp_loss : 0.0000  total_loss : 7.9186  avg_loss_step : 7.9186  learning_rate : 0.00035372304  loss_scaler : 32768 
DLL 2020-11-27 03:09:25.112939 - Iteration: 221  throughput_train : 263.548 seq/s mlm_loss : 7.8740  nsp_loss : 0.0000  total_loss : 7.8741  avg_loss_step : 7.8741  learning_rate : 0.0003534968  loss_scaler : 32768 
DLL 2020-11-27 03:09:27.060205 - Iteration: 222  throughput_train : 262.972 seq/s mlm_loss : 7.8641  nsp_loss : 0.0000  total_loss : 7.8641  avg_loss_step : 7.8641  learning_rate : 0.0003532704  loss_scaler : 32768 
DLL 2020-11-27 03:09:29.003529 - Iteration: 223  throughput_train : 263.507 seq/s mlm_loss : 8.0296  nsp_loss : 0.0000  total_loss : 8.0296  avg_loss_step : 8.0296  learning_rate : 0.0003530439  loss_scaler : 32768 
DLL 2020-11-27 03:09:30.950318 - Iteration: 224  throughput_train : 263.036 seq/s mlm_loss : 7.9842  nsp_loss : 0.0000  total_loss : 7.9842  avg_loss_step : 7.9842  learning_rate : 0.0003528172  loss_scaler : 32768 
DLL 2020-11-27 03:09:32.911519 - Iteration: 225  throughput_train : 261.105 seq/s mlm_loss : 7.8753  nsp_loss : 0.0000  total_loss : 7.8753  avg_loss_step : 7.8753  learning_rate : 0.0003525904  loss_scaler : 32768 
DLL 2020-11-27 03:09:34.856809 - Iteration: 226  throughput_train : 263.238 seq/s mlm_loss : 8.0229  nsp_loss : 0.0000  total_loss : 8.0229  avg_loss_step : 8.0229  learning_rate : 0.00035236342  loss_scaler : 32768 
DLL 2020-11-27 03:09:36.808877 - Iteration: 227  throughput_train : 262.324 seq/s mlm_loss : 7.8167  nsp_loss : 0.0000  total_loss : 7.8167  avg_loss_step : 7.8167  learning_rate : 0.00035213633  loss_scaler : 32768 
DLL 2020-11-27 03:09:38.750480 - Iteration: 228  throughput_train : 263.739 seq/s mlm_loss : 7.8891  nsp_loss : 0.0000  total_loss : 7.8891  avg_loss_step : 7.8891  learning_rate : 0.00035190905  loss_scaler : 32768 
DLL 2020-11-27 03:09:40.700379 - Iteration: 229  throughput_train : 262.619 seq/s mlm_loss : 7.8387  nsp_loss : 0.0000  total_loss : 7.8387  avg_loss_step : 7.8387  learning_rate : 0.00035168167  loss_scaler : 32768 
DLL 2020-11-27 03:09:42.638737 - Iteration: 230  throughput_train : 264.186 seq/s mlm_loss : 7.9388  nsp_loss : 0.0000  total_loss : 7.9389  avg_loss_step : 7.9389  learning_rate : 0.0003514541  loss_scaler : 32768 
DLL 2020-11-27 03:09:44.587679 - Iteration: 231  throughput_train : 262.745 seq/s mlm_loss : 7.8525  nsp_loss : 0.0000  total_loss : 7.8525  avg_loss_step : 7.8525  learning_rate : 0.00035122642  loss_scaler : 32768 
DLL 2020-11-27 03:09:46.537407 - Iteration: 232  throughput_train : 262.640 seq/s mlm_loss : 7.8987  nsp_loss : 0.0000  total_loss : 7.8987  avg_loss_step : 7.8987  learning_rate : 0.00035099857  loss_scaler : 32768 
DLL 2020-11-27 03:09:48.482687 - Iteration: 233  throughput_train : 263.242 seq/s mlm_loss : 7.9447  nsp_loss : 0.0000  total_loss : 7.9447  avg_loss_step : 7.9447  learning_rate : 0.00035077057  loss_scaler : 32768 
DLL 2020-11-27 03:09:50.431191 - Iteration: 234  throughput_train : 262.805 seq/s mlm_loss : 7.8990  nsp_loss : 0.0000  total_loss : 7.8990  avg_loss_step : 7.8990  learning_rate : 0.00035054245  loss_scaler : 32768 
DLL 2020-11-27 03:09:52.370179 - Iteration: 235  throughput_train : 264.095 seq/s mlm_loss : 7.7710  nsp_loss : 0.0000  total_loss : 7.7710  avg_loss_step : 7.7710  learning_rate : 0.00035031413  loss_scaler : 32768 
DLL 2020-11-27 03:09:54.325579 - Iteration: 236  throughput_train : 261.878 seq/s mlm_loss : 7.8712  nsp_loss : 0.0000  total_loss : 7.8712  avg_loss_step : 7.8712  learning_rate : 0.00035008567  loss_scaler : 32768 
DLL 2020-11-27 03:09:56.283402 - Iteration: 237  throughput_train : 261.553 seq/s mlm_loss : 7.8356  nsp_loss : 0.0000  total_loss : 7.8356  avg_loss_step : 7.8356  learning_rate : 0.00034985712  loss_scaler : 32768 
DLL 2020-11-27 03:09:58.229327 - Iteration: 238  throughput_train : 263.155 seq/s mlm_loss : 7.8350  nsp_loss : 0.0000  total_loss : 7.8350  avg_loss_step : 7.8350  learning_rate : 0.00034962836  loss_scaler : 32768 
DLL 2020-11-27 03:10:00.172347 - Iteration: 239  throughput_train : 263.547 seq/s mlm_loss : 7.9478  nsp_loss : 0.0000  total_loss : 7.9479  avg_loss_step : 7.9479  learning_rate : 0.0003493995  loss_scaler : 32768 
DLL 2020-11-27 03:10:02.114698 - Iteration: 240  throughput_train : 263.644 seq/s mlm_loss : 7.8153  nsp_loss : 0.0000  total_loss : 7.8153  avg_loss_step : 7.8153  learning_rate : 0.00034917044  loss_scaler : 32768 
DLL 2020-11-27 03:10:04.062529 - Iteration: 241  throughput_train : 262.896 seq/s mlm_loss : 7.8213  nsp_loss : 0.0000  total_loss : 7.8214  avg_loss_step : 7.8214  learning_rate : 0.00034894125  loss_scaler : 32768 
DLL 2020-11-27 03:10:06.010827 - Iteration: 242  throughput_train : 262.832 seq/s mlm_loss : 7.7995  nsp_loss : 0.0000  total_loss : 7.7995  avg_loss_step : 7.7995  learning_rate : 0.0003487119  loss_scaler : 32768 
DLL 2020-11-27 03:10:07.946752 - Iteration: 243  throughput_train : 264.514 seq/s mlm_loss : 7.7922  nsp_loss : 0.0000  total_loss : 7.7922  avg_loss_step : 7.7922  learning_rate : 0.0003484824  loss_scaler : 32768 
DLL 2020-11-27 03:10:09.885653 - Iteration: 244  throughput_train : 264.109 seq/s mlm_loss : 7.8098  nsp_loss : 0.0000  total_loss : 7.8098  avg_loss_step : 7.8098  learning_rate : 0.0003482528  loss_scaler : 32768 
DLL 2020-11-27 03:10:11.819470 - Iteration: 245  throughput_train : 264.805 seq/s mlm_loss : 7.7841  nsp_loss : 0.0000  total_loss : 7.7841  avg_loss_step : 7.7841  learning_rate : 0.00034802296  loss_scaler : 32768 
DLL 2020-11-27 03:10:13.757695 - Iteration: 246  throughput_train : 264.199 seq/s mlm_loss : 7.8663  nsp_loss : 0.0000  total_loss : 7.8663  avg_loss_step : 7.8663  learning_rate : 0.00034779301  loss_scaler : 32768 
DLL 2020-11-27 03:10:15.704388 - Iteration: 247  throughput_train : 263.049 seq/s mlm_loss : 7.7867  nsp_loss : 0.0000  total_loss : 7.7867  avg_loss_step : 7.7867  learning_rate : 0.00034756292  loss_scaler : 32768 
DLL 2020-11-27 03:10:17.656853 - Iteration: 248  throughput_train : 262.273 seq/s mlm_loss : 7.7180  nsp_loss : 0.0000  total_loss : 7.7180  avg_loss_step : 7.7180  learning_rate : 0.00034733268  loss_scaler : 32768 
DLL 2020-11-27 03:10:19.604249 - Iteration: 249  throughput_train : 262.955 seq/s mlm_loss : 7.7878  nsp_loss : 0.0000  total_loss : 7.7878  avg_loss_step : 7.7878  learning_rate : 0.00034710226  loss_scaler : 32768 
DLL 2020-11-27 03:10:21.546382 - Iteration: 250  throughput_train : 263.667 seq/s mlm_loss : 7.7676  nsp_loss : 0.0000  total_loss : 7.7676  avg_loss_step : 7.7676  learning_rate : 0.00034687173  loss_scaler : 32768 
DLL 2020-11-27 03:10:23.486447 - Iteration: 251  throughput_train : 263.950 seq/s mlm_loss : 7.9536  nsp_loss : 0.0000  total_loss : 7.9536  avg_loss_step : 7.9536  learning_rate : 0.000346641  loss_scaler : 32768 
DLL 2020-11-27 03:10:25.442185 - Iteration: 252  throughput_train : 261.837 seq/s mlm_loss : 7.8250  nsp_loss : 0.0000  total_loss : 7.8250  avg_loss_step : 7.8250  learning_rate : 0.00034641015  loss_scaler : 32768 
DLL 2020-11-27 03:10:27.382303 - Iteration: 253  throughput_train : 263.941 seq/s mlm_loss : 7.7938  nsp_loss : 0.0000  total_loss : 7.7938  avg_loss_step : 7.7938  learning_rate : 0.00034617912  loss_scaler : 32768 
DLL 2020-11-27 03:10:29.337095 - Iteration: 254  throughput_train : 261.959 seq/s mlm_loss : 7.7801  nsp_loss : 0.0000  total_loss : 7.7801  avg_loss_step : 7.7801  learning_rate : 0.00034594798  loss_scaler : 32768 
DLL 2020-11-27 03:10:31.274122 - Iteration: 255  throughput_train : 264.362 seq/s mlm_loss : 7.7076  nsp_loss : 0.0000  total_loss : 7.7076  avg_loss_step : 7.7076  learning_rate : 0.00034571663  loss_scaler : 32768 
DLL 2020-11-27 03:10:33.218959 - Iteration: 256  throughput_train : 263.300 seq/s mlm_loss : 7.8528  nsp_loss : 0.0000  total_loss : 7.8528  avg_loss_step : 7.8528  learning_rate : 0.00034548517  loss_scaler : 32768 
DLL 2020-11-27 03:10:35.166403 - Iteration: 257  throughput_train : 262.947 seq/s mlm_loss : 7.7599  nsp_loss : 0.0000  total_loss : 7.7599  avg_loss_step : 7.7599  learning_rate : 0.00034525353  loss_scaler : 32768 
DLL 2020-11-27 03:10:37.113473 - Iteration: 258  throughput_train : 262.998 seq/s mlm_loss : 7.7255  nsp_loss : 0.0000  total_loss : 7.7255  avg_loss_step : 7.7255  learning_rate : 0.00034502172  loss_scaler : 32768 
DLL 2020-11-27 03:10:39.069940 - Iteration: 259  throughput_train : 261.735 seq/s mlm_loss : 7.6998  nsp_loss : 0.0000  total_loss : 7.6998  avg_loss_step : 7.6998  learning_rate : 0.0003447898  loss_scaler : 32768 
DLL 2020-11-27 03:10:41.015847 - Iteration: 260  throughput_train : 263.155 seq/s mlm_loss : 7.7650  nsp_loss : 0.0000  total_loss : 7.7650  avg_loss_step : 7.7650  learning_rate : 0.0003445577  loss_scaler : 32768 
DLL 2020-11-27 03:10:42.956601 - Iteration: 261  throughput_train : 263.853 seq/s mlm_loss : 7.8025  nsp_loss : 0.0000  total_loss : 7.8025  avg_loss_step : 7.8025  learning_rate : 0.0003443254  loss_scaler : 32768 
DLL 2020-11-27 03:10:44.902363 - Iteration: 262  throughput_train : 263.173 seq/s mlm_loss : 7.7155  nsp_loss : 0.0000  total_loss : 7.7155  avg_loss_step : 7.7155  learning_rate : 0.00034409302  loss_scaler : 32768 
DLL 2020-11-27 03:10:46.835082 - Iteration: 263  throughput_train : 264.953 seq/s mlm_loss : 7.6332  nsp_loss : 0.0000  total_loss : 7.6332  avg_loss_step : 7.6332  learning_rate : 0.00034386042  loss_scaler : 32768 
DLL 2020-11-27 03:10:48.781628 - Iteration: 264  throughput_train : 263.069 seq/s mlm_loss : 7.6907  nsp_loss : 0.0000  total_loss : 7.6907  avg_loss_step : 7.6907  learning_rate : 0.00034362767  loss_scaler : 32768 
DLL 2020-11-27 03:10:50.729578 - Iteration: 265  throughput_train : 262.880 seq/s mlm_loss : 7.7665  nsp_loss : 0.0000  total_loss : 7.7665  avg_loss_step : 7.7665  learning_rate : 0.0003433948  loss_scaler : 32768 
DLL 2020-11-27 03:10:52.671405 - Iteration: 266  throughput_train : 263.708 seq/s mlm_loss : 7.9586  nsp_loss : 0.0000  total_loss : 7.9586  avg_loss_step : 7.9586  learning_rate : 0.00034316175  loss_scaler : 32768 
DLL 2020-11-27 03:10:54.620939 - Iteration: 267  throughput_train : 262.667 seq/s mlm_loss : 7.6711  nsp_loss : 0.0000  total_loss : 7.6711  avg_loss_step : 7.6711  learning_rate : 0.00034292857  loss_scaler : 32768 
DLL 2020-11-27 03:10:56.561886 - Iteration: 268  throughput_train : 263.828 seq/s mlm_loss : 7.5332  nsp_loss : 0.0000  total_loss : 7.5332  avg_loss_step : 7.5332  learning_rate : 0.0003426952  loss_scaler : 32768 
DLL 2020-11-27 03:10:58.503822 - Iteration: 269  throughput_train : 263.694 seq/s mlm_loss : 7.6867  nsp_loss : 0.0000  total_loss : 7.6867  avg_loss_step : 7.6867  learning_rate : 0.00034246166  loss_scaler : 32768 
DLL 2020-11-27 03:11:00.450943 - Iteration: 270  throughput_train : 262.991 seq/s mlm_loss : 7.5312  nsp_loss : 0.0000  total_loss : 7.5312  avg_loss_step : 7.5312  learning_rate : 0.00034222798  loss_scaler : 32768 
DLL 2020-11-27 03:11:02.403755 - Iteration: 271  throughput_train : 262.224 seq/s mlm_loss : 7.6066  nsp_loss : 0.0000  total_loss : 7.6067  avg_loss_step : 7.6067  learning_rate : 0.00034199413  loss_scaler : 32768 
DLL 2020-11-27 03:11:04.348764 - Iteration: 272  throughput_train : 263.276 seq/s mlm_loss : 7.6654  nsp_loss : 0.0000  total_loss : 7.6654  avg_loss_step : 7.6654  learning_rate : 0.00034176014  loss_scaler : 32768 
DLL 2020-11-27 03:11:06.297751 - Iteration: 273  throughput_train : 262.739 seq/s mlm_loss : 7.6799  nsp_loss : 0.0000  total_loss : 7.6799  avg_loss_step : 7.6799  learning_rate : 0.00034152597  loss_scaler : 32768 
DLL 2020-11-27 03:11:08.233250 - Iteration: 274  throughput_train : 264.573 seq/s mlm_loss : 7.6734  nsp_loss : 0.0000  total_loss : 7.6734  avg_loss_step : 7.6734  learning_rate : 0.00034129166  loss_scaler : 32768 
DLL 2020-11-27 03:11:10.176786 - Iteration: 275  throughput_train : 263.476 seq/s mlm_loss : 7.6108  nsp_loss : 0.0000  total_loss : 7.6108  avg_loss_step : 7.6108  learning_rate : 0.00034105717  loss_scaler : 32768 
DLL 2020-11-27 03:11:12.136437 - Iteration: 276  throughput_train : 261.309 seq/s mlm_loss : 7.6594  nsp_loss : 0.0000  total_loss : 7.6594  avg_loss_step : 7.6594  learning_rate : 0.00034082253  loss_scaler : 32768 
DLL 2020-11-27 03:11:14.082201 - Iteration: 277  throughput_train : 263.176 seq/s mlm_loss : 7.5897  nsp_loss : 0.0000  total_loss : 7.5897  avg_loss_step : 7.5897  learning_rate : 0.00034058772  loss_scaler : 32768 
DLL 2020-11-27 03:11:16.035464 - Iteration: 278  throughput_train : 262.165 seq/s mlm_loss : 7.7030  nsp_loss : 0.0000  total_loss : 7.7030  avg_loss_step : 7.7030  learning_rate : 0.00034035274  loss_scaler : 32768 
DLL 2020-11-27 03:11:17.992439 - Iteration: 279  throughput_train : 261.668 seq/s mlm_loss : 7.6785  nsp_loss : 0.0000  total_loss : 7.6785  avg_loss_step : 7.6785  learning_rate : 0.0003401176  loss_scaler : 32768 
DLL 2020-11-27 03:11:19.943323 - Iteration: 280  throughput_train : 262.485 seq/s mlm_loss : 7.7485  nsp_loss : 0.0000  total_loss : 7.7485  avg_loss_step : 7.7485  learning_rate : 0.00033988233  loss_scaler : 32768 
DLL 2020-11-27 03:11:21.894416 - Iteration: 281  throughput_train : 262.457 seq/s mlm_loss : 7.6150  nsp_loss : 0.0000  total_loss : 7.6150  avg_loss_step : 7.6150  learning_rate : 0.00033964685  loss_scaler : 32768 
DLL 2020-11-27 03:11:23.842499 - Iteration: 282  throughput_train : 262.863 seq/s mlm_loss : 7.5638  nsp_loss : 0.0000  total_loss : 7.5638  avg_loss_step : 7.5638  learning_rate : 0.00033941126  loss_scaler : 32768 
DLL 2020-11-27 03:11:25.780875 - Iteration: 283  throughput_train : 264.179 seq/s mlm_loss : 7.5457  nsp_loss : 0.0000  total_loss : 7.5457  avg_loss_step : 7.5457  learning_rate : 0.00033917546  loss_scaler : 32768 
DLL 2020-11-27 03:11:27.724616 - Iteration: 284  throughput_train : 263.461 seq/s mlm_loss : 7.6641  nsp_loss : 0.0000  total_loss : 7.6641  avg_loss_step : 7.6641  learning_rate : 0.0003389395  loss_scaler : 32768 
DLL 2020-11-27 03:11:29.666251 - Iteration: 285  throughput_train : 263.758 seq/s mlm_loss : 7.6784  nsp_loss : 0.0000  total_loss : 7.6784  avg_loss_step : 7.6784  learning_rate : 0.00033870342  loss_scaler : 32768 
DLL 2020-11-27 03:11:31.614474 - Iteration: 286  throughput_train : 262.844 seq/s mlm_loss : 7.5921  nsp_loss : 0.0000  total_loss : 7.5921  avg_loss_step : 7.5921  learning_rate : 0.0003384671  loss_scaler : 32768 
DLL 2020-11-27 03:11:33.559413 - Iteration: 287  throughput_train : 263.288 seq/s mlm_loss : 7.4688  nsp_loss : 0.0000  total_loss : 7.4688  avg_loss_step : 7.4688  learning_rate : 0.00033823066  loss_scaler : 32768 
DLL 2020-11-27 03:11:35.500978 - Iteration: 288  throughput_train : 263.746 seq/s mlm_loss : 7.5700  nsp_loss : 0.0000  total_loss : 7.5700  avg_loss_step : 7.5700  learning_rate : 0.00033799408  loss_scaler : 32768 
DLL 2020-11-27 03:11:37.447423 - Iteration: 289  throughput_train : 263.083 seq/s mlm_loss : 7.5455  nsp_loss : 0.0000  total_loss : 7.5455  avg_loss_step : 7.5455  learning_rate : 0.0003377573  loss_scaler : 32768 
DLL 2020-11-27 03:11:39.397945 - Iteration: 290  throughput_train : 262.533 seq/s mlm_loss : 7.6285  nsp_loss : 0.0000  total_loss : 7.6285  avg_loss_step : 7.6285  learning_rate : 0.00033752035  loss_scaler : 32768 
DLL 2020-11-27 03:11:41.345427 - Iteration: 291  throughput_train : 262.943 seq/s mlm_loss : 7.5731  nsp_loss : 0.0000  total_loss : 7.5731  avg_loss_step : 7.5731  learning_rate : 0.00033728324  loss_scaler : 32768 
DLL 2020-11-27 03:11:43.297010 - Iteration: 292  throughput_train : 262.390 seq/s mlm_loss : 7.5067  nsp_loss : 0.0000  total_loss : 7.5067  avg_loss_step : 7.5067  learning_rate : 0.00033704596  loss_scaler : 32768 
DLL 2020-11-27 03:11:45.240539 - Iteration: 293  throughput_train : 263.479 seq/s mlm_loss : 7.6464  nsp_loss : 0.0000  total_loss : 7.6464  avg_loss_step : 7.6464  learning_rate : 0.00033680853  loss_scaler : 32768 
DLL 2020-11-27 03:11:47.195189 - Iteration: 294  throughput_train : 261.978 seq/s mlm_loss : 7.5721  nsp_loss : 0.0000  total_loss : 7.5721  avg_loss_step : 7.5721  learning_rate : 0.00033657093  loss_scaler : 32768 
DLL 2020-11-27 03:11:49.141612 - Iteration: 295  throughput_train : 263.085 seq/s mlm_loss : 7.6829  nsp_loss : 0.0000  total_loss : 7.6829  avg_loss_step : 7.6829  learning_rate : 0.00033633318  loss_scaler : 32768 
DLL 2020-11-27 03:11:51.090035 - Iteration: 296  throughput_train : 262.817 seq/s mlm_loss : 7.6925  nsp_loss : 0.0000  total_loss : 7.6925  avg_loss_step : 7.6925  learning_rate : 0.0003360952  loss_scaler : 32768 
DLL 2020-11-27 03:11:53.020580 - Iteration: 297  throughput_train : 265.250 seq/s mlm_loss : 7.7207  nsp_loss : 0.0000  total_loss : 7.7207  avg_loss_step : 7.7207  learning_rate : 0.0003358571  loss_scaler : 32768 
DLL 2020-11-27 03:11:54.963243 - Iteration: 298  throughput_train : 263.594 seq/s mlm_loss : 7.5999  nsp_loss : 0.0000  total_loss : 7.5999  avg_loss_step : 7.5999  learning_rate : 0.00033561882  loss_scaler : 32768 
DLL 2020-11-27 03:11:56.902025 - Iteration: 299  throughput_train : 264.123 seq/s mlm_loss : 7.4692  nsp_loss : 0.0000  total_loss : 7.4692  avg_loss_step : 7.4692  learning_rate : 0.00033538035  loss_scaler : 32768 
DLL 2020-11-27 03:11:58.844819 - Iteration: 300  throughput_train : 263.576 seq/s mlm_loss : 7.5501  nsp_loss : 0.0000  total_loss : 7.5501  avg_loss_step : 7.5501  learning_rate : 0.00033514178  loss_scaler : 32768 
DLL 2020-11-27 03:12:00.787070 - Iteration: 301  throughput_train : 263.651 seq/s mlm_loss : 7.5963  nsp_loss : 0.0000  total_loss : 7.5963  avg_loss_step : 7.5963  learning_rate : 0.00033490296  loss_scaler : 32768 
DLL 2020-11-27 03:12:02.733924 - Iteration: 302  throughput_train : 263.027 seq/s mlm_loss : 7.6253  nsp_loss : 0.0000  total_loss : 7.6253  avg_loss_step : 7.6253  learning_rate : 0.00033466402  loss_scaler : 32768 
DLL 2020-11-27 03:12:04.675663 - Iteration: 303  throughput_train : 263.720 seq/s mlm_loss : 7.5718  nsp_loss : 0.0000  total_loss : 7.5718  avg_loss_step : 7.5718  learning_rate : 0.00033442487  loss_scaler : 32768 
DLL 2020-11-27 03:12:06.612002 - Iteration: 304  throughput_train : 264.456 seq/s mlm_loss : 7.6788  nsp_loss : 0.0000  total_loss : 7.6788  avg_loss_step : 7.6788  learning_rate : 0.00033418558  loss_scaler : 32768 
DLL 2020-11-27 03:12:08.554859 - Iteration: 305  throughput_train : 263.569 seq/s mlm_loss : 7.5251  nsp_loss : 0.0000  total_loss : 7.5252  avg_loss_step : 7.5252  learning_rate : 0.0003339461  loss_scaler : 32768 
DLL 2020-11-27 03:12:10.512245 - Iteration: 306  throughput_train : 261.613 seq/s mlm_loss : 7.6095  nsp_loss : 0.0000  total_loss : 7.6095  avg_loss_step : 7.6095  learning_rate : 0.00033370644  loss_scaler : 32768 
DLL 2020-11-27 03:12:12.461434 - Iteration: 307  throughput_train : 262.714 seq/s mlm_loss : 7.5467  nsp_loss : 0.0000  total_loss : 7.5467  avg_loss_step : 7.5467  learning_rate : 0.00033346665  loss_scaler : 32768 
DLL 2020-11-27 03:12:14.406582 - Iteration: 308  throughput_train : 263.260 seq/s mlm_loss : 7.4174  nsp_loss : 0.0000  total_loss : 7.4174  avg_loss_step : 7.4174  learning_rate : 0.00033322664  loss_scaler : 32768 
DLL 2020-11-27 03:12:16.353040 - Iteration: 309  throughput_train : 263.080 seq/s mlm_loss : 7.5884  nsp_loss : 0.0000  total_loss : 7.5884  avg_loss_step : 7.5884  learning_rate : 0.00033298647  loss_scaler : 32768 
DLL 2020-11-27 03:12:18.298823 - Iteration: 310  throughput_train : 263.173 seq/s mlm_loss : 7.5277  nsp_loss : 0.0000  total_loss : 7.5277  avg_loss_step : 7.5277  learning_rate : 0.00033274613  loss_scaler : 32768 
DLL 2020-11-27 03:12:20.242278 - Iteration: 311  throughput_train : 263.488 seq/s mlm_loss : 7.6201  nsp_loss : 0.0000  total_loss : 7.6201  avg_loss_step : 7.6201  learning_rate : 0.00033250562  loss_scaler : 32768 
DLL 2020-11-27 03:12:22.195966 - Iteration: 312  throughput_train : 262.109 seq/s mlm_loss : 7.4652  nsp_loss : 0.0000  total_loss : 7.4652  avg_loss_step : 7.4652  learning_rate : 0.00033226493  loss_scaler : 32768 
DLL 2020-11-27 03:12:24.144064 - Iteration: 313  throughput_train : 262.859 seq/s mlm_loss : 7.5447  nsp_loss : 0.0000  total_loss : 7.5447  avg_loss_step : 7.5447  learning_rate : 0.0003320241  loss_scaler : 32768 
DLL 2020-11-27 03:12:26.083747 - Iteration: 314  throughput_train : 263.999 seq/s mlm_loss : 7.4444  nsp_loss : 0.0000  total_loss : 7.4444  avg_loss_step : 7.4444  learning_rate : 0.00033178306  loss_scaler : 32768 
DLL 2020-11-27 03:12:28.025880 - Iteration: 315  throughput_train : 263.666 seq/s mlm_loss : 7.5540  nsp_loss : 0.0000  total_loss : 7.5540  avg_loss_step : 7.5540  learning_rate : 0.00033154184  loss_scaler : 32768 
DLL 2020-11-27 03:12:29.974407 - Iteration: 316  throughput_train : 262.802 seq/s mlm_loss : 7.5016  nsp_loss : 0.0000  total_loss : 7.5016  avg_loss_step : 7.5016  learning_rate : 0.00033130046  loss_scaler : 32768 
DLL 2020-11-27 03:12:31.914944 - Iteration: 317  throughput_train : 263.887 seq/s mlm_loss : 7.4427  nsp_loss : 0.0000  total_loss : 7.4427  avg_loss_step : 7.4427  learning_rate : 0.00033105887  loss_scaler : 32768 
DLL 2020-11-27 03:12:33.847894 - Iteration: 318  throughput_train : 264.927 seq/s mlm_loss : 7.4909  nsp_loss : 0.0000  total_loss : 7.4909  avg_loss_step : 7.4909  learning_rate : 0.00033081716  loss_scaler : 32768 
DLL 2020-11-27 03:12:35.797321 - Iteration: 319  throughput_train : 262.683 seq/s mlm_loss : 7.4878  nsp_loss : 0.0000  total_loss : 7.4878  avg_loss_step : 7.4878  learning_rate : 0.00033057525  loss_scaler : 32768 
DLL 2020-11-27 03:12:37.745347 - Iteration: 320  throughput_train : 262.870 seq/s mlm_loss : 7.4414  nsp_loss : 0.0000  total_loss : 7.4414  avg_loss_step : 7.4414  learning_rate : 0.00033033316  loss_scaler : 32768 
DLL 2020-11-27 03:12:39.691390 - Iteration: 321  throughput_train : 263.138 seq/s mlm_loss : 7.4378  nsp_loss : 0.0000  total_loss : 7.4378  avg_loss_step : 7.4378  learning_rate : 0.0003300909  loss_scaler : 32768 
DLL 2020-11-27 03:12:41.638205 - Iteration: 322  throughput_train : 263.033 seq/s mlm_loss : 7.5072  nsp_loss : 0.0000  total_loss : 7.5072  avg_loss_step : 7.5072  learning_rate : 0.00032984844  loss_scaler : 32768 
DLL 2020-11-27 03:12:43.579769 - Iteration: 323  throughput_train : 263.744 seq/s mlm_loss : 7.4456  nsp_loss : 0.0000  total_loss : 7.4456  avg_loss_step : 7.4456  learning_rate : 0.00032960583  loss_scaler : 32768 
DLL 2020-11-27 03:12:45.523174 - Iteration: 324  throughput_train : 263.494 seq/s mlm_loss : 7.4672  nsp_loss : 0.0000  total_loss : 7.4672  avg_loss_step : 7.4672  learning_rate : 0.000329363  loss_scaler : 32768 
DLL 2020-11-27 03:12:47.465975 - Iteration: 325  throughput_train : 263.576 seq/s mlm_loss : 7.5592  nsp_loss : 0.0000  total_loss : 7.5592  avg_loss_step : 7.5592  learning_rate : 0.00032912003  loss_scaler : 32768 
DLL 2020-11-27 03:12:49.420059 - Iteration: 326  throughput_train : 262.057 seq/s mlm_loss : 7.4740  nsp_loss : 0.0000  total_loss : 7.4740  avg_loss_step : 7.4740  learning_rate : 0.00032887686  loss_scaler : 32768 
DLL 2020-11-27 03:12:51.362595 - Iteration: 327  throughput_train : 263.627 seq/s mlm_loss : 7.4757  nsp_loss : 0.0000  total_loss : 7.4757  avg_loss_step : 7.4757  learning_rate : 0.00032863353  loss_scaler : 32768 
DLL 2020-11-27 03:12:53.315047 - Iteration: 328  throughput_train : 262.290 seq/s mlm_loss : 7.5258  nsp_loss : 0.0000  total_loss : 7.5258  avg_loss_step : 7.5258  learning_rate : 0.00032839002  loss_scaler : 32768 
DLL 2020-11-27 03:12:55.259760 - Iteration: 329  throughput_train : 263.328 seq/s mlm_loss : 7.5241  nsp_loss : 0.0000  total_loss : 7.5241  avg_loss_step : 7.5241  learning_rate : 0.0003281463  loss_scaler : 32768 
DLL 2020-11-27 03:12:57.206000 - Iteration: 330  throughput_train : 263.109 seq/s mlm_loss : 7.4842  nsp_loss : 0.0000  total_loss : 7.4842  avg_loss_step : 7.4842  learning_rate : 0.0003279024  loss_scaler : 32768 
DLL 2020-11-27 03:12:59.146238 - Iteration: 331  throughput_train : 263.924 seq/s mlm_loss : 7.5165  nsp_loss : 0.0000  total_loss : 7.5165  avg_loss_step : 7.5165  learning_rate : 0.00032765834  loss_scaler : 32768 
DLL 2020-11-27 03:13:01.102445 - Iteration: 332  throughput_train : 261.769 seq/s mlm_loss : 7.5780  nsp_loss : 0.0000  total_loss : 7.5780  avg_loss_step : 7.5780  learning_rate : 0.0003274141  loss_scaler : 32768 
DLL 2020-11-27 03:13:03.046597 - Iteration: 333  throughput_train : 263.395 seq/s mlm_loss : 7.5336  nsp_loss : 0.0000  total_loss : 7.5336  avg_loss_step : 7.5336  learning_rate : 0.00032716966  loss_scaler : 32768 
DLL 2020-11-27 03:13:04.992116 - Iteration: 334  throughput_train : 263.207 seq/s mlm_loss : 7.5650  nsp_loss : 0.0000  total_loss : 7.5650  avg_loss_step : 7.5650  learning_rate : 0.00032692504  loss_scaler : 32768 
DLL 2020-11-27 03:13:06.935790 - Iteration: 335  throughput_train : 263.457 seq/s mlm_loss : 7.5931  nsp_loss : 0.0000  total_loss : 7.5931  avg_loss_step : 7.5931  learning_rate : 0.00032668028  loss_scaler : 32768 
DLL 2020-11-27 03:13:08.883392 - Iteration: 336  throughput_train : 262.928 seq/s mlm_loss : 7.5645  nsp_loss : 0.0000  total_loss : 7.5645  avg_loss_step : 7.5645  learning_rate : 0.00032643529  loss_scaler : 32768 
DLL 2020-11-27 03:13:10.829737 - Iteration: 337  throughput_train : 263.112 seq/s mlm_loss : 7.4921  nsp_loss : 0.0000  total_loss : 7.4921  avg_loss_step : 7.4921  learning_rate : 0.00032619011  loss_scaler : 32768 
DLL 2020-11-27 03:13:12.772453 - Iteration: 338  throughput_train : 263.597 seq/s mlm_loss : 7.4980  nsp_loss : 0.0000  total_loss : 7.4980  avg_loss_step : 7.4980  learning_rate : 0.00032594477  loss_scaler : 32768 
DLL 2020-11-27 03:13:14.721102 - Iteration: 339  throughput_train : 262.788 seq/s mlm_loss : 7.4121  nsp_loss : 0.0000  total_loss : 7.4121  avg_loss_step : 7.4121  learning_rate : 0.00032569922  loss_scaler : 32768 
DLL 2020-11-27 03:13:16.660199 - Iteration: 340  throughput_train : 264.084 seq/s mlm_loss : 7.4249  nsp_loss : 0.0000  total_loss : 7.4249  avg_loss_step : 7.4249  learning_rate : 0.00032545353  loss_scaler : 32768 
DLL 2020-11-27 03:13:18.607639 - Iteration: 341  throughput_train : 262.947 seq/s mlm_loss : 7.2939  nsp_loss : 0.0000  total_loss : 7.2939  avg_loss_step : 7.2939  learning_rate : 0.00032520763  loss_scaler : 32768 
DLL 2020-11-27 03:13:20.558980 - Iteration: 342  throughput_train : 262.426 seq/s mlm_loss : 7.4397  nsp_loss : 0.0000  total_loss : 7.4397  avg_loss_step : 7.4397  learning_rate : 0.0003249615  loss_scaler : 32768 
DLL 2020-11-27 03:13:22.505950 - Iteration: 343  throughput_train : 263.025 seq/s mlm_loss : 7.4897  nsp_loss : 0.0000  total_loss : 7.4897  avg_loss_step : 7.4897  learning_rate : 0.00032471525  loss_scaler : 32768 
DLL 2020-11-27 03:13:24.444568 - Iteration: 344  throughput_train : 264.160 seq/s mlm_loss : 7.4679  nsp_loss : 0.0000  total_loss : 7.4679  avg_loss_step : 7.4679  learning_rate : 0.0003244688  loss_scaler : 32768 
DLL 2020-11-27 03:13:26.390096 - Iteration: 345  throughput_train : 263.206 seq/s mlm_loss : 7.3869  nsp_loss : 0.0000  total_loss : 7.3869  avg_loss_step : 7.3869  learning_rate : 0.0003242221  loss_scaler : 32768 
DLL 2020-11-27 03:13:28.330263 - Iteration: 346  throughput_train : 263.940 seq/s mlm_loss : 7.2949  nsp_loss : 0.0000  total_loss : 7.2949  avg_loss_step : 7.2949  learning_rate : 0.00032397528  loss_scaler : 32768 
DLL 2020-11-27 03:13:30.271210 - Iteration: 347  throughput_train : 263.850 seq/s mlm_loss : 7.4573  nsp_loss : 0.0000  total_loss : 7.4573  avg_loss_step : 7.4573  learning_rate : 0.00032372828  loss_scaler : 32768 
DLL 2020-11-27 03:13:32.215272 - Iteration: 348  throughput_train : 263.406 seq/s mlm_loss : 7.4474  nsp_loss : 0.0000  total_loss : 7.4474  avg_loss_step : 7.4474  learning_rate : 0.00032348104  loss_scaler : 32768 
DLL 2020-11-27 03:13:34.164977 - Iteration: 349  throughput_train : 262.643 seq/s mlm_loss : 7.4986  nsp_loss : 0.0000  total_loss : 7.4986  avg_loss_step : 7.4986  learning_rate : 0.00032323363  loss_scaler : 32768 
DLL 2020-11-27 03:13:36.109154 - Iteration: 350  throughput_train : 263.390 seq/s mlm_loss : 7.3208  nsp_loss : 0.0000  total_loss : 7.3208  avg_loss_step : 7.3208  learning_rate : 0.00032298604  loss_scaler : 32768 
DLL 2020-11-27 03:13:38.059980 - Iteration: 351  throughput_train : 262.492 seq/s mlm_loss : 7.4378  nsp_loss : 0.0000  total_loss : 7.4378  avg_loss_step : 7.4378  learning_rate : 0.00032273828  loss_scaler : 32768 
DLL 2020-11-27 03:13:39.986562 - Iteration: 352  throughput_train : 265.797 seq/s mlm_loss : 7.4937  nsp_loss : 0.0000  total_loss : 7.4937  avg_loss_step : 7.4937  learning_rate : 0.0003224903  loss_scaler : 32768 
DLL 2020-11-27 03:13:41.941464 - Iteration: 353  throughput_train : 261.949 seq/s mlm_loss : 7.2853  nsp_loss : 0.0000  total_loss : 7.2853  avg_loss_step : 7.2853  learning_rate : 0.00032224212  loss_scaler : 32768 
DLL 2020-11-27 03:13:43.885609 - Iteration: 354  throughput_train : 263.397 seq/s mlm_loss : 7.3577  nsp_loss : 0.0000  total_loss : 7.3577  avg_loss_step : 7.3577  learning_rate : 0.00032199378  loss_scaler : 32768 
DLL 2020-11-27 03:13:45.830018 - Iteration: 355  throughput_train : 263.362 seq/s mlm_loss : 7.4886  nsp_loss : 0.0000  total_loss : 7.4886  avg_loss_step : 7.4886  learning_rate : 0.00032174523  loss_scaler : 32768 
DLL 2020-11-27 03:13:47.780367 - Iteration: 356  throughput_train : 262.556 seq/s mlm_loss : 7.4123  nsp_loss : 0.0000  total_loss : 7.4123  avg_loss_step : 7.4123  learning_rate : 0.0003214965  loss_scaler : 32768 
DLL 2020-11-27 03:13:49.737945 - Iteration: 357  throughput_train : 261.589 seq/s mlm_loss : 7.4444  nsp_loss : 0.0000  total_loss : 7.4444  avg_loss_step : 7.4444  learning_rate : 0.00032124756  loss_scaler : 32768 
DLL 2020-11-27 03:13:51.682312 - Iteration: 358  throughput_train : 263.364 seq/s mlm_loss : 7.4967  nsp_loss : 0.0000  total_loss : 7.4967  avg_loss_step : 7.4967  learning_rate : 0.00032099843  loss_scaler : 32768 
DLL 2020-11-27 03:13:53.629990 - Iteration: 359  throughput_train : 262.916 seq/s mlm_loss : 7.4295  nsp_loss : 0.0000  total_loss : 7.4295  avg_loss_step : 7.4295  learning_rate : 0.0003207491  loss_scaler : 32768 
DLL 2020-11-27 03:13:55.576567 - Iteration: 360  throughput_train : 263.066 seq/s mlm_loss : 7.4240  nsp_loss : 0.0000  total_loss : 7.4240  avg_loss_step : 7.4240  learning_rate : 0.0003204996  loss_scaler : 32768 
DLL 2020-11-27 03:13:57.521788 - Iteration: 361  throughput_train : 263.247 seq/s mlm_loss : 7.3376  nsp_loss : 0.0000  total_loss : 7.3376  avg_loss_step : 7.3376  learning_rate : 0.00032024988  loss_scaler : 32768 
DLL 2020-11-27 03:13:59.463882 - Iteration: 362  throughput_train : 263.674 seq/s mlm_loss : 7.3068  nsp_loss : 0.0000  total_loss : 7.3068  avg_loss_step : 7.3068  learning_rate : 0.00032  loss_scaler : 32768 
DLL 2020-11-27 03:14:01.413182 - Iteration: 363  throughput_train : 262.696 seq/s mlm_loss : 7.2490  nsp_loss : 0.0000  total_loss : 7.2490  avg_loss_step : 7.2490  learning_rate : 0.00031974987  loss_scaler : 32768 
DLL 2020-11-27 03:14:03.355906 - Iteration: 364  throughput_train : 263.587 seq/s mlm_loss : 7.4289  nsp_loss : 0.0000  total_loss : 7.4289  avg_loss_step : 7.4289  learning_rate : 0.0003194996  loss_scaler : 32768 
DLL 2020-11-27 03:14:05.299932 - Iteration: 365  throughput_train : 263.410 seq/s mlm_loss : 7.2907  nsp_loss : 0.0000  total_loss : 7.2907  avg_loss_step : 7.2907  learning_rate : 0.00031924908  loss_scaler : 32768 
DLL 2020-11-27 03:14:07.249375 - Iteration: 366  throughput_train : 262.678 seq/s mlm_loss : 7.3538  nsp_loss : 0.0000  total_loss : 7.3538  avg_loss_step : 7.3538  learning_rate : 0.0003189984  loss_scaler : 32768 
DLL 2020-11-27 03:14:09.198219 - Iteration: 367  throughput_train : 262.760 seq/s mlm_loss : 7.3458  nsp_loss : 0.0000  total_loss : 7.3458  avg_loss_step : 7.3458  learning_rate : 0.00031874754  loss_scaler : 32768 
DLL 2020-11-27 03:14:11.145848 - Iteration: 368  throughput_train : 262.927 seq/s mlm_loss : 7.3479  nsp_loss : 0.0000  total_loss : 7.3479  avg_loss_step : 7.3479  learning_rate : 0.00031849643  loss_scaler : 32768 
DLL 2020-11-27 03:14:13.090710 - Iteration: 369  throughput_train : 263.296 seq/s mlm_loss : 7.3657  nsp_loss : 0.0000  total_loss : 7.3657  avg_loss_step : 7.3657  learning_rate : 0.00031824518  loss_scaler : 32768 
DLL 2020-11-27 03:14:15.033829 - Iteration: 370  throughput_train : 263.532 seq/s mlm_loss : 7.3974  nsp_loss : 0.0000  total_loss : 7.3974  avg_loss_step : 7.3974  learning_rate : 0.0003179937  loss_scaler : 32768 
DLL 2020-11-27 03:14:16.973736 - Iteration: 371  throughput_train : 263.970 seq/s mlm_loss : 7.4322  nsp_loss : 0.0000  total_loss : 7.4322  avg_loss_step : 7.4322  learning_rate : 0.00031774203  loss_scaler : 32768 
DLL 2020-11-27 03:14:18.920804 - Iteration: 372  throughput_train : 263.002 seq/s mlm_loss : 7.5240  nsp_loss : 0.0000  total_loss : 7.5240  avg_loss_step : 7.5240  learning_rate : 0.00031749014  loss_scaler : 32768 
DLL 2020-11-27 03:14:20.868788 - Iteration: 373  throughput_train : 262.878 seq/s mlm_loss : 7.4041  nsp_loss : 0.0000  total_loss : 7.4041  avg_loss_step : 7.4041  learning_rate : 0.00031723807  loss_scaler : 32768 
DLL 2020-11-27 03:14:22.809522 - Iteration: 374  throughput_train : 263.867 seq/s mlm_loss : 7.3821  nsp_loss : 0.0000  total_loss : 7.3821  avg_loss_step : 7.3821  learning_rate : 0.0003169858  loss_scaler : 32768 
DLL 2020-11-27 03:14:24.754125 - Iteration: 375  throughput_train : 263.332 seq/s mlm_loss : 7.4149  nsp_loss : 0.0000  total_loss : 7.4149  avg_loss_step : 7.4149  learning_rate : 0.0003167333  loss_scaler : 32768 
DLL 2020-11-27 03:14:26.700220 - Iteration: 376  throughput_train : 263.129 seq/s mlm_loss : 7.3221  nsp_loss : 0.0000  total_loss : 7.3221  avg_loss_step : 7.3221  learning_rate : 0.00031648064  loss_scaler : 32768 
DLL 2020-11-27 03:14:28.642378 - Iteration: 377  throughput_train : 263.663 seq/s mlm_loss : 7.3969  nsp_loss : 0.0000  total_loss : 7.3969  avg_loss_step : 7.3969  learning_rate : 0.00031622776  loss_scaler : 32768 
DLL 2020-11-27 03:14:30.580951 - Iteration: 378  throughput_train : 264.161 seq/s mlm_loss : 7.3166  nsp_loss : 0.0000  total_loss : 7.3166  avg_loss_step : 7.3166  learning_rate : 0.00031597467  loss_scaler : 32768 
DLL 2020-11-27 03:14:32.529395 - Iteration: 379  throughput_train : 262.845 seq/s mlm_loss : 7.3591  nsp_loss : 0.0000  total_loss : 7.3591  avg_loss_step : 7.3591  learning_rate : 0.00031572138  loss_scaler : 32768 
DLL 2020-11-27 03:14:34.464180 - Iteration: 380  throughput_train : 264.702 seq/s mlm_loss : 7.3062  nsp_loss : 0.0000  total_loss : 7.3062  avg_loss_step : 7.3062  learning_rate : 0.00031546789  loss_scaler : 32768 
DLL 2020-11-27 03:14:36.411542 - Iteration: 381  throughput_train : 262.993 seq/s mlm_loss : 7.3154  nsp_loss : 0.0000  total_loss : 7.3154  avg_loss_step : 7.3154  learning_rate : 0.0003152142  loss_scaler : 32768 
DLL 2020-11-27 03:14:38.362082 - Iteration: 382  throughput_train : 262.562 seq/s mlm_loss : 7.2916  nsp_loss : 0.0000  total_loss : 7.2916  avg_loss_step : 7.2916  learning_rate : 0.0003149603  loss_scaler : 32768 
DLL 2020-11-27 03:14:40.308844 - Iteration: 383  throughput_train : 263.065 seq/s mlm_loss : 7.2635  nsp_loss : 0.0000  total_loss : 7.2635  avg_loss_step : 7.2635  learning_rate : 0.0003147062  loss_scaler : 32768 
DLL 2020-11-27 03:14:42.262053 - Iteration: 384  throughput_train : 262.182 seq/s mlm_loss : 7.4183  nsp_loss : 0.0000  total_loss : 7.4183  avg_loss_step : 7.4183  learning_rate : 0.00031445187  loss_scaler : 32768 
DLL 2020-11-27 03:14:44.207124 - Iteration: 385  throughput_train : 263.301 seq/s mlm_loss : 7.3548  nsp_loss : 0.0000  total_loss : 7.3548  avg_loss_step : 7.3548  learning_rate : 0.0003141974  loss_scaler : 32768 
DLL 2020-11-27 03:14:46.153029 - Iteration: 386  throughput_train : 263.188 seq/s mlm_loss : 7.3124  nsp_loss : 0.0000  total_loss : 7.3124  avg_loss_step : 7.3124  learning_rate : 0.00031394267  loss_scaler : 32768 
DLL 2020-11-27 03:14:48.094674 - Iteration: 387  throughput_train : 263.765 seq/s mlm_loss : 7.2628  nsp_loss : 0.0000  total_loss : 7.2628  avg_loss_step : 7.2628  learning_rate : 0.00031368775  loss_scaler : 32768 
DLL 2020-11-27 03:14:50.043893 - Iteration: 388  throughput_train : 262.740 seq/s mlm_loss : 7.2957  nsp_loss : 0.0000  total_loss : 7.2957  avg_loss_step : 7.2957  learning_rate : 0.0003134326  loss_scaler : 32768 
DLL 2020-11-27 03:14:51.996387 - Iteration: 389  throughput_train : 262.299 seq/s mlm_loss : 7.1609  nsp_loss : 0.0000  total_loss : 7.1609  avg_loss_step : 7.1609  learning_rate : 0.00031317724  loss_scaler : 32768 
DLL 2020-11-27 03:14:53.947261 - Iteration: 390  throughput_train : 262.519 seq/s mlm_loss : 7.4116  nsp_loss : 0.0000  total_loss : 7.4116  avg_loss_step : 7.4116  learning_rate : 0.0003129217  loss_scaler : 32768 
DLL 2020-11-27 03:14:55.901634 - Iteration: 391  throughput_train : 262.049 seq/s mlm_loss : 7.3070  nsp_loss : 0.0000  total_loss : 7.3070  avg_loss_step : 7.3070  learning_rate : 0.00031266594  loss_scaler : 32768 
DLL 2020-11-27 03:14:57.851915 - Iteration: 392  throughput_train : 262.588 seq/s mlm_loss : 7.3381  nsp_loss : 0.0000  total_loss : 7.3381  avg_loss_step : 7.3381  learning_rate : 0.00031240997  loss_scaler : 32768 
DLL 2020-11-27 03:14:59.787004 - Iteration: 393  throughput_train : 264.626 seq/s mlm_loss : 7.3303  nsp_loss : 0.0000  total_loss : 7.3303  avg_loss_step : 7.3303  learning_rate : 0.00031215377  loss_scaler : 32768 
DLL 2020-11-27 03:15:01.727589 - Iteration: 394  throughput_train : 263.879 seq/s mlm_loss : 7.3183  nsp_loss : 0.0000  total_loss : 7.3183  avg_loss_step : 7.3183  learning_rate : 0.00031189743  loss_scaler : 32768 
DLL 2020-11-27 03:15:03.680450 - Iteration: 395  throughput_train : 262.219 seq/s mlm_loss : 7.2667  nsp_loss : 0.0000  total_loss : 7.2667  avg_loss_step : 7.2667  learning_rate : 0.0003116408  loss_scaler : 32768 
DLL 2020-11-27 03:15:05.628807 - Iteration: 396  throughput_train : 262.825 seq/s mlm_loss : 7.3126  nsp_loss : 0.0000  total_loss : 7.3126  avg_loss_step : 7.3126  learning_rate : 0.00031138398  loss_scaler : 32768 
DLL 2020-11-27 03:15:07.570734 - Iteration: 397  throughput_train : 263.695 seq/s mlm_loss : 7.2125  nsp_loss : 0.0000  total_loss : 7.2125  avg_loss_step : 7.2125  learning_rate : 0.000311127  loss_scaler : 32768 
DLL 2020-11-27 03:15:09.511404 - Iteration: 398  throughput_train : 263.866 seq/s mlm_loss : 7.3387  nsp_loss : 0.0000  total_loss : 7.3387  avg_loss_step : 7.3387  learning_rate : 0.00031086974  loss_scaler : 32768 
DLL 2020-11-27 03:15:11.463021 - Iteration: 399  throughput_train : 262.388 seq/s mlm_loss : 7.1753  nsp_loss : 0.0000  total_loss : 7.1753  avg_loss_step : 7.1753  learning_rate : 0.0003106123  loss_scaler : 32768 
DLL 2020-11-27 03:15:13.414586 - Iteration: 400  throughput_train : 262.393 seq/s mlm_loss : 7.2465  nsp_loss : 0.0000  total_loss : 7.2465  avg_loss_step : 7.2465  learning_rate : 0.00031035463  loss_scaler : 32768 
DLL 2020-11-27 03:15:15.367561 - Iteration: 401  throughput_train : 262.205 seq/s mlm_loss : 7.1308  nsp_loss : 0.0000  total_loss : 7.1308  avg_loss_step : 7.1308  learning_rate : 0.00031009674  loss_scaler : 32768 
DLL 2020-11-27 03:15:17.315753 - Iteration: 402  throughput_train : 262.849 seq/s mlm_loss : 7.2739  nsp_loss : 0.0000  total_loss : 7.2739  avg_loss_step : 7.2739  learning_rate : 0.00030983868  loss_scaler : 32768 
DLL 2020-11-27 03:15:19.270262 - Iteration: 403  throughput_train : 261.999 seq/s mlm_loss : 7.1999  nsp_loss : 0.0000  total_loss : 7.1999  avg_loss_step : 7.1999  learning_rate : 0.00030958035  loss_scaler : 32768 
DLL 2020-11-27 03:15:21.228614 - Iteration: 404  throughput_train : 261.483 seq/s mlm_loss : 7.2858  nsp_loss : 0.0000  total_loss : 7.2858  avg_loss_step : 7.2858  learning_rate : 0.00030932183  loss_scaler : 32768 
DLL 2020-11-27 03:15:23.174665 - Iteration: 405  throughput_train : 263.136 seq/s mlm_loss : 7.3039  nsp_loss : 0.0000  total_loss : 7.3039  avg_loss_step : 7.3039  learning_rate : 0.0003090631  loss_scaler : 32768 
DLL 2020-11-27 03:15:25.124872 - Iteration: 406  throughput_train : 262.578 seq/s mlm_loss : 7.2997  nsp_loss : 0.0000  total_loss : 7.2997  avg_loss_step : 7.2997  learning_rate : 0.00030880413  loss_scaler : 32768 
DLL 2020-11-27 03:15:27.065716 - Iteration: 407  throughput_train : 263.842 seq/s mlm_loss : 7.2214  nsp_loss : 0.0000  total_loss : 7.2214  avg_loss_step : 7.2214  learning_rate : 0.00030854496  loss_scaler : 32768 
DLL 2020-11-27 03:15:29.011778 - Iteration: 408  throughput_train : 263.135 seq/s mlm_loss : 7.4462  nsp_loss : 0.0000  total_loss : 7.4462  avg_loss_step : 7.4462  learning_rate : 0.00030828555  loss_scaler : 32768 
DLL 2020-11-27 03:15:30.963769 - Iteration: 409  throughput_train : 262.336 seq/s mlm_loss : 7.3811  nsp_loss : 0.0000  total_loss : 7.3811  avg_loss_step : 7.3811  learning_rate : 0.00030802598  loss_scaler : 32768 
DLL 2020-11-27 03:15:32.907842 - Iteration: 410  throughput_train : 263.407 seq/s mlm_loss : 7.2664  nsp_loss : 0.0000  total_loss : 7.2664  avg_loss_step : 7.2664  learning_rate : 0.00030776614  loss_scaler : 32768 
DLL 2020-11-27 03:15:34.858753 - Iteration: 411  throughput_train : 262.480 seq/s mlm_loss : 7.3723  nsp_loss : 0.0000  total_loss : 7.3723  avg_loss_step : 7.3723  learning_rate : 0.00030750607  loss_scaler : 32768 
DLL 2020-11-27 03:15:36.800987 - Iteration: 412  throughput_train : 263.652 seq/s mlm_loss : 7.2971  nsp_loss : 0.0000  total_loss : 7.2971  avg_loss_step : 7.2971  learning_rate : 0.00030724582  loss_scaler : 32768 
DLL 2020-11-27 03:15:38.741682 - Iteration: 413  throughput_train : 263.862 seq/s mlm_loss : 7.2804  nsp_loss : 0.0000  total_loss : 7.2804  avg_loss_step : 7.2804  learning_rate : 0.0003069853  loss_scaler : 32768 
DLL 2020-11-27 03:15:40.696054 - Iteration: 414  throughput_train : 262.016 seq/s mlm_loss : 7.2973  nsp_loss : 0.0000  total_loss : 7.2973  avg_loss_step : 7.2973  learning_rate : 0.0003067246  loss_scaler : 32768 
DLL 2020-11-27 03:15:42.639584 - Iteration: 415  throughput_train : 263.482 seq/s mlm_loss : 7.3678  nsp_loss : 0.0000  total_loss : 7.3678  avg_loss_step : 7.3678  learning_rate : 0.00030646368  loss_scaler : 32768 
DLL 2020-11-27 03:15:44.588588 - Iteration: 416  throughput_train : 262.741 seq/s mlm_loss : 7.2747  nsp_loss : 0.0000  total_loss : 7.2747  avg_loss_step : 7.2747  learning_rate : 0.00030620254  loss_scaler : 32768 
DLL 2020-11-27 03:15:46.535568 - Iteration: 417  throughput_train : 263.012 seq/s mlm_loss : 7.2525  nsp_loss : 0.0000  total_loss : 7.2525  avg_loss_step : 7.2525  learning_rate : 0.00030594118  loss_scaler : 32768 
DLL 2020-11-27 03:15:48.481952 - Iteration: 418  throughput_train : 263.091 seq/s mlm_loss : 7.3423  nsp_loss : 0.0000  total_loss : 7.3423  avg_loss_step : 7.3423  learning_rate : 0.00030567954  loss_scaler : 32768 
DLL 2020-11-27 03:15:50.421816 - Iteration: 419  throughput_train : 263.976 seq/s mlm_loss : 7.3036  nsp_loss : 0.0000  total_loss : 7.3036  avg_loss_step : 7.3036  learning_rate : 0.00030541772  loss_scaler : 32768 
DLL 2020-11-27 03:15:52.367017 - Iteration: 420  throughput_train : 263.251 seq/s mlm_loss : 7.3738  nsp_loss : 0.0000  total_loss : 7.3738  avg_loss_step : 7.3738  learning_rate : 0.0003051557  loss_scaler : 32768 
DLL 2020-11-27 03:15:54.303752 - Iteration: 421  throughput_train : 264.402 seq/s mlm_loss : 7.2585  nsp_loss : 0.0000  total_loss : 7.2585  avg_loss_step : 7.2585  learning_rate : 0.00030489342  loss_scaler : 32768 
DLL 2020-11-27 03:15:56.253725 - Iteration: 422  throughput_train : 262.606 seq/s mlm_loss : 7.2003  nsp_loss : 0.0000  total_loss : 7.2003  avg_loss_step : 7.2003  learning_rate : 0.00030463093  loss_scaler : 32768 
DLL 2020-11-27 03:15:58.186356 - Iteration: 423  throughput_train : 264.963 seq/s mlm_loss : 7.1696  nsp_loss : 0.0000  total_loss : 7.1696  avg_loss_step : 7.1696  learning_rate : 0.00030436818  loss_scaler : 32768 
DLL 2020-11-27 03:16:00.139047 - Iteration: 424  throughput_train : 262.242 seq/s mlm_loss : 7.1525  nsp_loss : 0.0000  total_loss : 7.1525  avg_loss_step : 7.1525  learning_rate : 0.00030410523  loss_scaler : 32768 
DLL 2020-11-27 03:16:02.081917 - Iteration: 425  throughput_train : 263.567 seq/s mlm_loss : 7.2703  nsp_loss : 0.0000  total_loss : 7.2703  avg_loss_step : 7.2703  learning_rate : 0.00030384207  loss_scaler : 32768 
DLL 2020-11-27 03:16:04.027022 - Iteration: 426  throughput_train : 263.264 seq/s mlm_loss : 7.2664  nsp_loss : 0.0000  total_loss : 7.2664  avg_loss_step : 7.2664  learning_rate : 0.00030357862  loss_scaler : 32768 
DLL 2020-11-27 03:16:05.968365 - Iteration: 427  throughput_train : 263.775 seq/s mlm_loss : 7.3296  nsp_loss : 0.0000  total_loss : 7.3296  avg_loss_step : 7.3296  learning_rate : 0.000303315  loss_scaler : 32768 
DLL 2020-11-27 03:16:07.924970 - Iteration: 428  throughput_train : 261.721 seq/s mlm_loss : 7.2815  nsp_loss : 0.0000  total_loss : 7.2815  avg_loss_step : 7.2815  learning_rate : 0.00030305114  loss_scaler : 32768 
DLL 2020-11-27 03:16:09.857325 - Iteration: 429  throughput_train : 265.002 seq/s mlm_loss : 7.2173  nsp_loss : 0.0000  total_loss : 7.2173  avg_loss_step : 7.2173  learning_rate : 0.00030278703  loss_scaler : 32768 
DLL 2020-11-27 03:16:11.800881 - Iteration: 430  throughput_train : 263.476 seq/s mlm_loss : 7.0816  nsp_loss : 0.0000  total_loss : 7.0816  avg_loss_step : 7.0816  learning_rate : 0.0003025227  loss_scaler : 32768 
DLL 2020-11-27 03:16:13.749441 - Iteration: 431  throughput_train : 262.797 seq/s mlm_loss : 7.3168  nsp_loss : 0.0000  total_loss : 7.3168  avg_loss_step : 7.3168  learning_rate : 0.00030225815  loss_scaler : 32768 
DLL 2020-11-27 03:16:15.700028 - Iteration: 432  throughput_train : 262.525 seq/s mlm_loss : 7.2327  nsp_loss : 0.0000  total_loss : 7.2327  avg_loss_step : 7.2327  learning_rate : 0.00030199336  loss_scaler : 32768 
DLL 2020-11-27 03:16:17.643313 - Iteration: 433  throughput_train : 263.511 seq/s mlm_loss : 7.2969  nsp_loss : 0.0000  total_loss : 7.2969  avg_loss_step : 7.2969  learning_rate : 0.00030172837  loss_scaler : 32768 
DLL 2020-11-27 03:16:19.602435 - Iteration: 434  throughput_train : 261.383 seq/s mlm_loss : 7.2798  nsp_loss : 0.0000  total_loss : 7.2798  avg_loss_step : 7.2798  learning_rate : 0.00030146306  loss_scaler : 32768 
DLL 2020-11-27 03:16:21.545878 - Iteration: 435  throughput_train : 263.491 seq/s mlm_loss : 7.2688  nsp_loss : 0.0000  total_loss : 7.2688  avg_loss_step : 7.2688  learning_rate : 0.00030119758  loss_scaler : 32768 
DLL 2020-11-27 03:16:23.489194 - Iteration: 436  throughput_train : 263.507 seq/s mlm_loss : 7.3614  nsp_loss : 0.0000  total_loss : 7.3614  avg_loss_step : 7.3614  learning_rate : 0.0003009319  loss_scaler : 32768 
DLL 2020-11-27 03:16:25.440544 - Iteration: 437  throughput_train : 262.423 seq/s mlm_loss : 7.2099  nsp_loss : 0.0000  total_loss : 7.2099  avg_loss_step : 7.2099  learning_rate : 0.0003006659  loss_scaler : 32768 
DLL 2020-11-27 03:16:27.397897 - Iteration: 438  throughput_train : 261.617 seq/s mlm_loss : 7.3102  nsp_loss : 0.0000  total_loss : 7.3102  avg_loss_step : 7.3102  learning_rate : 0.00030039973  loss_scaler : 32768 
DLL 2020-11-27 03:16:29.343546 - Iteration: 439  throughput_train : 263.193 seq/s mlm_loss : 7.1580  nsp_loss : 0.0000  total_loss : 7.1580  avg_loss_step : 7.1580  learning_rate : 0.00030013328  loss_scaler : 32768 
DLL 2020-11-27 03:16:31.295704 - Iteration: 440  throughput_train : 262.314 seq/s mlm_loss : 7.2427  nsp_loss : 0.0000  total_loss : 7.2427  avg_loss_step : 7.2427  learning_rate : 0.00029986663  loss_scaler : 32768 
DLL 2020-11-27 03:16:33.237572 - Iteration: 441  throughput_train : 263.704 seq/s mlm_loss : 7.3552  nsp_loss : 0.0000  total_loss : 7.3552  avg_loss_step : 7.3552  learning_rate : 0.00029959972  loss_scaler : 32768 
DLL 2020-11-27 03:16:35.179117 - Iteration: 442  throughput_train : 263.747 seq/s mlm_loss : 7.3647  nsp_loss : 0.0000  total_loss : 7.3647  avg_loss_step : 7.3647  learning_rate : 0.00029933258  loss_scaler : 32768 
DLL 2020-11-27 03:16:37.128443 - Iteration: 443  throughput_train : 262.694 seq/s mlm_loss : 7.1967  nsp_loss : 0.0000  total_loss : 7.1967  avg_loss_step : 7.1967  learning_rate : 0.0002990652  loss_scaler : 32768 
DLL 2020-11-27 03:16:39.076634 - Iteration: 444  throughput_train : 262.849 seq/s mlm_loss : 7.2465  nsp_loss : 0.0000  total_loss : 7.2465  avg_loss_step : 7.2465  learning_rate : 0.0002987976  loss_scaler : 32768 
DLL 2020-11-27 03:16:41.020180 - Iteration: 445  throughput_train : 263.475 seq/s mlm_loss : 7.1700  nsp_loss : 0.0000  total_loss : 7.1700  avg_loss_step : 7.1700  learning_rate : 0.00029852972  loss_scaler : 32768 
DLL 2020-11-27 03:16:42.964342 - Iteration: 446  throughput_train : 263.392 seq/s mlm_loss : 7.1666  nsp_loss : 0.0000  total_loss : 7.1666  avg_loss_step : 7.1666  learning_rate : 0.0002982616  loss_scaler : 32768 
DLL 2020-11-27 03:16:44.913690 - Iteration: 447  throughput_train : 262.692 seq/s mlm_loss : 7.3221  nsp_loss : 0.0000  total_loss : 7.3221  avg_loss_step : 7.3221  learning_rate : 0.00029799328  loss_scaler : 32768 
DLL 2020-11-27 03:16:46.862555 - Iteration: 448  throughput_train : 262.758 seq/s mlm_loss : 7.3824  nsp_loss : 0.0000  total_loss : 7.3824  avg_loss_step : 7.3824  learning_rate : 0.0002977247  loss_scaler : 32768 
DLL 2020-11-27 03:16:48.810640 - Iteration: 449  throughput_train : 262.861 seq/s mlm_loss : 7.2208  nsp_loss : 0.0000  total_loss : 7.2208  avg_loss_step : 7.2208  learning_rate : 0.00029745587  loss_scaler : 32768 
DLL 2020-11-27 03:16:50.761030 - Iteration: 450  throughput_train : 262.553 seq/s mlm_loss : 7.1204  nsp_loss : 0.0000  total_loss : 7.1204  avg_loss_step : 7.1204  learning_rate : 0.0002971868  loss_scaler : 32768 
DLL 2020-11-27 03:16:52.696460 - Iteration: 451  throughput_train : 264.580 seq/s mlm_loss : 7.2672  nsp_loss : 0.0000  total_loss : 7.2672  avg_loss_step : 7.2672  learning_rate : 0.00029691748  loss_scaler : 32768 
DLL 2020-11-27 03:16:54.640078 - Iteration: 452  throughput_train : 263.467 seq/s mlm_loss : 7.3183  nsp_loss : 0.0000  total_loss : 7.3183  avg_loss_step : 7.3183  learning_rate : 0.00029664792  loss_scaler : 32768 
DLL 2020-11-27 03:16:56.588397 - Iteration: 453  throughput_train : 262.829 seq/s mlm_loss : 7.1531  nsp_loss : 0.0000  total_loss : 7.1531  avg_loss_step : 7.1531  learning_rate : 0.00029637813  loss_scaler : 32768 
DLL 2020-11-27 03:16:58.544146 - Iteration: 454  throughput_train : 261.831 seq/s mlm_loss : 7.1511  nsp_loss : 0.0000  total_loss : 7.1511  avg_loss_step : 7.1511  learning_rate : 0.00029610808  loss_scaler : 32768 
DLL 2020-11-27 03:17:00.497813 - Iteration: 455  throughput_train : 262.110 seq/s mlm_loss : 7.2128  nsp_loss : 0.0000  total_loss : 7.2128  avg_loss_step : 7.2128  learning_rate : 0.0002958378  loss_scaler : 32768 
DLL 2020-11-27 03:17:02.446322 - Iteration: 456  throughput_train : 262.804 seq/s mlm_loss : 7.2407  nsp_loss : 0.0000  total_loss : 7.2407  avg_loss_step : 7.2407  learning_rate : 0.00029556724  loss_scaler : 32768 
DLL 2020-11-27 03:17:04.380240 - Iteration: 457  throughput_train : 264.789 seq/s mlm_loss : 7.2755  nsp_loss : 0.0000  total_loss : 7.2755  avg_loss_step : 7.2755  learning_rate : 0.00029529646  loss_scaler : 32768 
DLL 2020-11-27 03:17:06.330658 - Iteration: 458  throughput_train : 262.547 seq/s mlm_loss : 7.3426  nsp_loss : 0.0000  total_loss : 7.3426  avg_loss_step : 7.3426  learning_rate : 0.0002950254  loss_scaler : 32768 
DLL 2020-11-27 03:17:08.270300 - Iteration: 459  throughput_train : 264.005 seq/s mlm_loss : 7.2810  nsp_loss : 0.0000  total_loss : 7.2810  avg_loss_step : 7.2810  learning_rate : 0.0002947541  loss_scaler : 32768 
DLL 2020-11-27 03:17:10.218623 - Iteration: 460  throughput_train : 262.828 seq/s mlm_loss : 7.3477  nsp_loss : 0.0000  total_loss : 7.3477  avg_loss_step : 7.3477  learning_rate : 0.00029448257  loss_scaler : 32768 
DLL 2020-11-27 03:17:12.164246 - Iteration: 461  throughput_train : 263.193 seq/s mlm_loss : 7.1943  nsp_loss : 0.0000  total_loss : 7.1943  avg_loss_step : 7.1943  learning_rate : 0.0002942108  loss_scaler : 32768 
DLL 2020-11-27 03:17:14.109038 - Iteration: 462  throughput_train : 263.306 seq/s mlm_loss : 7.2505  nsp_loss : 0.0000  total_loss : 7.2505  avg_loss_step : 7.2505  learning_rate : 0.00029393873  loss_scaler : 32768 
DLL 2020-11-27 03:17:16.067267 - Iteration: 463  throughput_train : 261.500 seq/s mlm_loss : 7.1528  nsp_loss : 0.0000  total_loss : 7.1528  avg_loss_step : 7.1528  learning_rate : 0.00029366647  loss_scaler : 32768 
DLL 2020-11-27 03:17:18.014657 - Iteration: 464  throughput_train : 262.954 seq/s mlm_loss : 7.2128  nsp_loss : 0.0000  total_loss : 7.2128  avg_loss_step : 7.2128  learning_rate : 0.0002933939  loss_scaler : 32768 
DLL 2020-11-27 03:17:19.962783 - Iteration: 465  throughput_train : 262.855 seq/s mlm_loss : 7.1456  nsp_loss : 0.0000  total_loss : 7.1456  avg_loss_step : 7.1456  learning_rate : 0.00029312112  loss_scaler : 32768 
DLL 2020-11-27 03:17:21.912294 - Iteration: 466  throughput_train : 262.668 seq/s mlm_loss : 7.2387  nsp_loss : 0.0000  total_loss : 7.2387  avg_loss_step : 7.2387  learning_rate : 0.00029284807  loss_scaler : 32768 
DLL 2020-11-27 03:17:23.856882 - Iteration: 467  throughput_train : 263.333 seq/s mlm_loss : 6.9721  nsp_loss : 0.0000  total_loss : 6.9721  avg_loss_step : 6.9721  learning_rate : 0.00029257475  loss_scaler : 32768 
DLL 2020-11-27 03:17:25.792373 - Iteration: 468  throughput_train : 264.572 seq/s mlm_loss : 7.3186  nsp_loss : 0.0000  total_loss : 7.3186  avg_loss_step : 7.3186  learning_rate : 0.0002923012  loss_scaler : 32768 
DLL 2020-11-27 03:17:27.737909 - Iteration: 469  throughput_train : 263.206 seq/s mlm_loss : 7.2957  nsp_loss : 0.0000  total_loss : 7.2957  avg_loss_step : 7.2957  learning_rate : 0.0002920274  loss_scaler : 32768 
DLL 2020-11-27 03:17:29.689296 - Iteration: 470  throughput_train : 262.416 seq/s mlm_loss : 7.2355  nsp_loss : 0.0000  total_loss : 7.2355  avg_loss_step : 7.2355  learning_rate : 0.0002917533  loss_scaler : 32768 
DLL 2020-11-27 03:17:31.635084 - Iteration: 471  throughput_train : 263.172 seq/s mlm_loss : 7.0653  nsp_loss : 0.0000  total_loss : 7.0653  avg_loss_step : 7.0653  learning_rate : 0.000291479  loss_scaler : 32768 
DLL 2020-11-27 03:17:33.580531 - Iteration: 472  throughput_train : 263.221 seq/s mlm_loss : 7.1232  nsp_loss : 0.0000  total_loss : 7.1232  avg_loss_step : 7.1232  learning_rate : 0.00029120437  loss_scaler : 32768 
DLL 2020-11-27 03:17:35.533341 - Iteration: 473  throughput_train : 262.225 seq/s mlm_loss : 7.1772  nsp_loss : 0.0000  total_loss : 7.1772  avg_loss_step : 7.1772  learning_rate : 0.0002909295  loss_scaler : 32768 
DLL 2020-11-27 03:17:37.472083 - Iteration: 474  throughput_train : 264.130 seq/s mlm_loss : 7.1375  nsp_loss : 0.0000  total_loss : 7.1375  avg_loss_step : 7.1375  learning_rate : 0.00029065443  loss_scaler : 32768 
DLL 2020-11-27 03:17:39.417440 - Iteration: 475  throughput_train : 263.230 seq/s mlm_loss : 7.1084  nsp_loss : 0.0000  total_loss : 7.1084  avg_loss_step : 7.1084  learning_rate : 0.00029037904  loss_scaler : 32768 
DLL 2020-11-27 03:17:41.365612 - Iteration: 476  throughput_train : 262.850 seq/s mlm_loss : 7.0644  nsp_loss : 0.0000  total_loss : 7.0644  avg_loss_step : 7.0644  learning_rate : 0.0002901034  loss_scaler : 32768 
DLL 2020-11-27 03:17:43.316128 - Iteration: 477  throughput_train : 262.535 seq/s mlm_loss : 7.2486  nsp_loss : 0.0000  total_loss : 7.2486  avg_loss_step : 7.2486  learning_rate : 0.00028982753  loss_scaler : 32768 
DLL 2020-11-27 03:17:45.250002 - Iteration: 478  throughput_train : 264.792 seq/s mlm_loss : 7.1969  nsp_loss : 0.0000  total_loss : 7.1969  avg_loss_step : 7.1969  learning_rate : 0.00028955136  loss_scaler : 32768 
DLL 2020-11-27 03:17:47.210227 - Iteration: 479  throughput_train : 261.233 seq/s mlm_loss : 7.3055  nsp_loss : 0.0000  total_loss : 7.3055  avg_loss_step : 7.3055  learning_rate : 0.00028927496  loss_scaler : 32768 
DLL 2020-11-27 03:17:49.163325 - Iteration: 480  throughput_train : 262.187 seq/s mlm_loss : 7.1899  nsp_loss : 0.0000  total_loss : 7.1899  avg_loss_step : 7.1899  learning_rate : 0.00028899824  loss_scaler : 32768 
DLL 2020-11-27 03:17:51.094211 - Iteration: 481  throughput_train : 265.204 seq/s mlm_loss : 7.1026  nsp_loss : 0.0000  total_loss : 7.1026  avg_loss_step : 7.1026  learning_rate : 0.0002887213  loss_scaler : 32768 
DLL 2020-11-27 03:17:53.038726 - Iteration: 482  throughput_train : 263.343 seq/s mlm_loss : 7.2032  nsp_loss : 0.0000  total_loss : 7.2032  avg_loss_step : 7.2032  learning_rate : 0.00028844408  loss_scaler : 32768 
DLL 2020-11-27 03:17:54.986559 - Iteration: 483  throughput_train : 262.895 seq/s mlm_loss : 7.3204  nsp_loss : 0.0000  total_loss : 7.3204  avg_loss_step : 7.3204  learning_rate : 0.0002881666  loss_scaler : 32768 
DLL 2020-11-27 03:17:56.922554 - Iteration: 484  throughput_train : 264.503 seq/s mlm_loss : 7.1323  nsp_loss : 0.0000  total_loss : 7.1323  avg_loss_step : 7.1323  learning_rate : 0.00028788886  loss_scaler : 32768 
DLL 2020-11-27 03:17:58.882423 - Iteration: 485  throughput_train : 261.281 seq/s mlm_loss : 7.2339  nsp_loss : 0.0000  total_loss : 7.2339  avg_loss_step : 7.2339  learning_rate : 0.00028761083  loss_scaler : 32768 
DLL 2020-11-27 03:18:00.827993 - Iteration: 486  throughput_train : 263.206 seq/s mlm_loss : 7.2105  nsp_loss : 0.0000  total_loss : 7.2105  avg_loss_step : 7.2105  learning_rate : 0.00028733254  loss_scaler : 32768 
DLL 2020-11-27 03:18:02.779055 - Iteration: 487  throughput_train : 262.466 seq/s mlm_loss : 7.3181  nsp_loss : 0.0000  total_loss : 7.3181  avg_loss_step : 7.3181  learning_rate : 0.000287054  loss_scaler : 32768 
DLL 2020-11-27 03:18:04.728417 - Iteration: 488  throughput_train : 262.692 seq/s mlm_loss : 7.3262  nsp_loss : 0.0000  total_loss : 7.3262  avg_loss_step : 7.3262  learning_rate : 0.00028677515  loss_scaler : 32768 
DLL 2020-11-27 03:18:06.674498 - Iteration: 489  throughput_train : 263.151 seq/s mlm_loss : 7.2364  nsp_loss : 0.0000  total_loss : 7.2364  avg_loss_step : 7.2364  learning_rate : 0.00028649607  loss_scaler : 32768 
DLL 2020-11-27 03:18:08.612851 - Iteration: 490  throughput_train : 264.196 seq/s mlm_loss : 7.2315  nsp_loss : 0.0000  total_loss : 7.2315  avg_loss_step : 7.2315  learning_rate : 0.00028621667  loss_scaler : 32768 
DLL 2020-11-27 03:18:10.557102 - Iteration: 491  throughput_train : 263.380 seq/s mlm_loss : 7.1599  nsp_loss : 0.0000  total_loss : 7.1599  avg_loss_step : 7.1599  learning_rate : 0.00028593704  loss_scaler : 32768 
DLL 2020-11-27 03:18:12.502386 - Iteration: 492  throughput_train : 263.240 seq/s mlm_loss : 7.1576  nsp_loss : 0.0000  total_loss : 7.1576  avg_loss_step : 7.1576  learning_rate : 0.00028565712  loss_scaler : 32768 
DLL 2020-11-27 03:18:14.450672 - Iteration: 493  throughput_train : 262.835 seq/s mlm_loss : 7.1592  nsp_loss : 0.0000  total_loss : 7.1592  avg_loss_step : 7.1592  learning_rate : 0.0002853769  loss_scaler : 32768 
DLL 2020-11-27 03:18:16.403204 - Iteration: 494  throughput_train : 262.653 seq/s mlm_loss : 7.1827  nsp_loss : 0.0000  total_loss : 7.1827  avg_loss_step : 7.1827  learning_rate : 0.00028509647  loss_scaler : 32768 
DLL 2020-11-27 03:18:18.346185 - Iteration: 495  throughput_train : 263.552 seq/s mlm_loss : 7.2456  nsp_loss : 0.0000  total_loss : 7.2456  avg_loss_step : 7.2456  learning_rate : 0.0002848157  loss_scaler : 32768 
DLL 2020-11-27 03:18:20.293166 - Iteration: 496  throughput_train : 263.010 seq/s mlm_loss : 7.2091  nsp_loss : 0.0000  total_loss : 7.2091  avg_loss_step : 7.2091  learning_rate : 0.00028453468  loss_scaler : 32768 
DLL 2020-11-27 03:18:22.232044 - Iteration: 497  throughput_train : 264.112 seq/s mlm_loss : 7.1442  nsp_loss : 0.0000  total_loss : 7.1442  avg_loss_step : 7.1442  learning_rate : 0.0002842534  loss_scaler : 32768 
DLL 2020-11-27 03:18:24.182440 - Iteration: 498  throughput_train : 262.559 seq/s mlm_loss : 7.1780  nsp_loss : 0.0000  total_loss : 7.1780  avg_loss_step : 7.1780  learning_rate : 0.0002839718  loss_scaler : 32768 
DLL 2020-11-27 03:18:26.118348 - Iteration: 499  throughput_train : 264.517 seq/s mlm_loss : 7.1481  nsp_loss : 0.0000  total_loss : 7.1481  avg_loss_step : 7.1481  learning_rate : 0.00028368997  loss_scaler : 32768 
DLL 2020-11-27 03:18:28.067147 - Iteration: 500  throughput_train : 262.767 seq/s mlm_loss : 7.1578  nsp_loss : 0.0000  total_loss : 7.1578  avg_loss_step : 7.1578  learning_rate : 0.00028340783  loss_scaler : 32768 
DLL 2020-11-27 03:18:30.029405 - Iteration: 501  throughput_train : 260.962 seq/s mlm_loss : 7.1923  nsp_loss : 0.0000  total_loss : 7.1923  avg_loss_step : 7.1923  learning_rate : 0.00028312538  loss_scaler : 32768 
DLL 2020-11-27 03:18:31.981118 - Iteration: 502  throughput_train : 262.373 seq/s mlm_loss : 7.0698  nsp_loss : 0.0000  total_loss : 7.0698  avg_loss_step : 7.0698  learning_rate : 0.0002828427  loss_scaler : 32768 
DLL 2020-11-27 03:18:33.929806 - Iteration: 503  throughput_train : 262.782 seq/s mlm_loss : 7.1298  nsp_loss : 0.0000  total_loss : 7.1298  avg_loss_step : 7.1298  learning_rate : 0.0002825597  loss_scaler : 32768 
DLL 2020-11-27 03:18:35.887803 - Iteration: 504  throughput_train : 261.539 seq/s mlm_loss : 7.1348  nsp_loss : 0.0000  total_loss : 7.1348  avg_loss_step : 7.1348  learning_rate : 0.00028227642  loss_scaler : 32768 
DLL 2020-11-27 03:18:37.837362 - Iteration: 505  throughput_train : 262.661 seq/s mlm_loss : 7.0968  nsp_loss : 0.0000  total_loss : 7.0968  avg_loss_step : 7.0968  learning_rate : 0.0002819929  loss_scaler : 32768 
DLL 2020-11-27 03:18:39.777842 - Iteration: 506  throughput_train : 263.893 seq/s mlm_loss : 7.1746  nsp_loss : 0.0000  total_loss : 7.1746  avg_loss_step : 7.1746  learning_rate : 0.00028170907  loss_scaler : 32768 
DLL 2020-11-27 03:18:41.712612 - Iteration: 507  throughput_train : 264.670 seq/s mlm_loss : 7.0472  nsp_loss : 0.0000  total_loss : 7.0472  avg_loss_step : 7.0472  learning_rate : 0.00028142493  loss_scaler : 32768 
DLL 2020-11-27 03:18:43.658805 - Iteration: 508  throughput_train : 263.117 seq/s mlm_loss : 7.1247  nsp_loss : 0.0000  total_loss : 7.1247  avg_loss_step : 7.1247  learning_rate : 0.0002811405  loss_scaler : 32768 
DLL 2020-11-27 03:18:45.608370 - Iteration: 509  throughput_train : 262.662 seq/s mlm_loss : 7.1646  nsp_loss : 0.0000  total_loss : 7.1646  avg_loss_step : 7.1646  learning_rate : 0.0002808558  loss_scaler : 32768 
DLL 2020-11-27 03:18:47.562977 - Iteration: 510  throughput_train : 261.985 seq/s mlm_loss : 7.1234  nsp_loss : 0.0000  total_loss : 7.1234  avg_loss_step : 7.1234  learning_rate : 0.00028057082  loss_scaler : 32768 
DLL 2020-11-27 03:18:49.510394 - Iteration: 511  throughput_train : 262.951 seq/s mlm_loss : 7.1751  nsp_loss : 0.0000  total_loss : 7.1751  avg_loss_step : 7.1751  learning_rate : 0.00028028557  loss_scaler : 32768 
DLL 2020-11-27 03:18:51.460393 - Iteration: 512  throughput_train : 262.604 seq/s mlm_loss : 7.1782  nsp_loss : 0.0000  total_loss : 7.1782  avg_loss_step : 7.1782  learning_rate : 0.00027999998  loss_scaler : 32768 
DLL 2020-11-27 03:18:53.401768 - Iteration: 513  throughput_train : 263.771 seq/s mlm_loss : 7.2937  nsp_loss : 0.0000  total_loss : 7.2937  avg_loss_step : 7.2937  learning_rate : 0.00027971412  loss_scaler : 32768 
DLL 2020-11-27 03:18:55.344469 - Iteration: 514  throughput_train : 263.590 seq/s mlm_loss : 7.2783  nsp_loss : 0.0000  total_loss : 7.2783  avg_loss_step : 7.2783  learning_rate : 0.00027942797  loss_scaler : 32768 
DLL 2020-11-27 03:18:57.287717 - Iteration: 515  throughput_train : 263.518 seq/s mlm_loss : 7.2636  nsp_loss : 0.0000  total_loss : 7.2636  avg_loss_step : 7.2636  learning_rate : 0.00027914153  loss_scaler : 32768 
DLL 2020-11-27 03:18:59.237231 - Iteration: 516  throughput_train : 262.671 seq/s mlm_loss : 7.2480  nsp_loss : 0.0000  total_loss : 7.2480  avg_loss_step : 7.2480  learning_rate : 0.0002788548  loss_scaler : 32768 
DLL 2020-11-27 03:19:01.186263 - Iteration: 517  throughput_train : 262.733 seq/s mlm_loss : 7.0492  nsp_loss : 0.0000  total_loss : 7.0492  avg_loss_step : 7.0492  learning_rate : 0.00027856775  loss_scaler : 32768 
DLL 2020-11-27 03:19:03.135507 - Iteration: 518  throughput_train : 262.704 seq/s mlm_loss : 7.1615  nsp_loss : 0.0000  total_loss : 7.1615  avg_loss_step : 7.1615  learning_rate : 0.0002782804  loss_scaler : 32768 
DLL 2020-11-27 03:19:05.081326 - Iteration: 519  throughput_train : 263.169 seq/s mlm_loss : 7.1776  nsp_loss : 0.0000  total_loss : 7.1776  avg_loss_step : 7.1776  learning_rate : 0.0002779928  loss_scaler : 32768 
DLL 2020-11-27 03:19:07.025144 - Iteration: 520  throughput_train : 263.440 seq/s mlm_loss : 7.2151  nsp_loss : 0.0000  total_loss : 7.2151  avg_loss_step : 7.2151  learning_rate : 0.00027770488  loss_scaler : 32768 
DLL 2020-11-27 03:19:08.972771 - Iteration: 521  throughput_train : 262.925 seq/s mlm_loss : 7.1575  nsp_loss : 0.0000  total_loss : 7.1575  avg_loss_step : 7.1575  learning_rate : 0.00027741664  loss_scaler : 32768 
DLL 2020-11-27 03:19:10.912775 - Iteration: 522  throughput_train : 263.957 seq/s mlm_loss : 7.0164  nsp_loss : 0.0000  total_loss : 7.0164  avg_loss_step : 7.0164  learning_rate : 0.00027712813  loss_scaler : 32768 
DLL 2020-11-27 03:19:12.867211 - Iteration: 523  throughput_train : 262.006 seq/s mlm_loss : 7.1357  nsp_loss : 0.0000  total_loss : 7.1357  avg_loss_step : 7.1357  learning_rate : 0.0002768393  loss_scaler : 32768 
DLL 2020-11-27 03:19:14.815340 - Iteration: 524  throughput_train : 262.855 seq/s mlm_loss : 7.0565  nsp_loss : 0.0000  total_loss : 7.0565  avg_loss_step : 7.0565  learning_rate : 0.00027655016  loss_scaler : 32768 
DLL 2020-11-27 03:19:16.765248 - Iteration: 525  throughput_train : 262.617 seq/s mlm_loss : 7.1156  nsp_loss : 0.0000  total_loss : 7.1156  avg_loss_step : 7.1156  learning_rate : 0.00027626075  loss_scaler : 32768 
DLL 2020-11-27 03:19:18.713168 - Iteration: 526  throughput_train : 262.883 seq/s mlm_loss : 7.0051  nsp_loss : 0.0000  total_loss : 7.0051  avg_loss_step : 7.0051  learning_rate : 0.000275971  loss_scaler : 32768 
DLL 2020-11-27 03:19:20.658228 - Iteration: 527  throughput_train : 263.273 seq/s mlm_loss : 6.9388  nsp_loss : 0.0000  total_loss : 6.9388  avg_loss_step : 6.9388  learning_rate : 0.00027568097  loss_scaler : 32768 
DLL 2020-11-27 03:19:22.603323 - Iteration: 528  throughput_train : 263.267 seq/s mlm_loss : 6.9356  nsp_loss : 0.0000  total_loss : 6.9356  avg_loss_step : 6.9356  learning_rate : 0.00027539063  loss_scaler : 32768 
DLL 2020-11-27 03:19:24.541240 - Iteration: 529  throughput_train : 264.252 seq/s mlm_loss : 6.9701  nsp_loss : 0.0000  total_loss : 6.9701  avg_loss_step : 6.9701  learning_rate : 0.00027509997  loss_scaler : 32768 
DLL 2020-11-27 03:19:26.487712 - Iteration: 530  throughput_train : 263.082 seq/s mlm_loss : 7.0495  nsp_loss : 0.0000  total_loss : 7.0495  avg_loss_step : 7.0495  learning_rate : 0.00027480902  loss_scaler : 32768 
DLL 2020-11-27 03:19:28.429878 - Iteration: 531  throughput_train : 263.671 seq/s mlm_loss : 7.1275  nsp_loss : 0.0000  total_loss : 7.1275  avg_loss_step : 7.1275  learning_rate : 0.00027451775  loss_scaler : 32768 
DLL 2020-11-27 03:19:30.362474 - Iteration: 532  throughput_train : 264.970 seq/s mlm_loss : 7.2831  nsp_loss : 0.0000  total_loss : 7.2831  avg_loss_step : 7.2831  learning_rate : 0.00027422616  loss_scaler : 32768 
DLL 2020-11-27 03:19:32.318731 - Iteration: 533  throughput_train : 261.765 seq/s mlm_loss : 7.0477  nsp_loss : 0.0000  total_loss : 7.0477  avg_loss_step : 7.0477  learning_rate : 0.00027393427  loss_scaler : 32768 
DLL 2020-11-27 03:19:34.267678 - Iteration: 534  throughput_train : 262.745 seq/s mlm_loss : 7.2889  nsp_loss : 0.0000  total_loss : 7.2889  avg_loss_step : 7.2889  learning_rate : 0.0002736421  loss_scaler : 32768 
DLL 2020-11-27 03:19:36.212019 - Iteration: 535  throughput_train : 263.367 seq/s mlm_loss : 7.2063  nsp_loss : 0.0000  total_loss : 7.2063  avg_loss_step : 7.2063  learning_rate : 0.00027334958  loss_scaler : 32768 
DLL 2020-11-27 03:19:38.155338 - Iteration: 536  throughput_train : 263.506 seq/s mlm_loss : 7.0838  nsp_loss : 0.0000  total_loss : 7.0838  avg_loss_step : 7.0838  learning_rate : 0.00027305676  loss_scaler : 32768 
DLL 2020-11-27 03:19:40.101339 - Iteration: 537  throughput_train : 263.145 seq/s mlm_loss : 7.1725  nsp_loss : 0.0000  total_loss : 7.1725  avg_loss_step : 7.1725  learning_rate : 0.00027276363  loss_scaler : 32768 
DLL 2020-11-27 03:19:42.045766 - Iteration: 538  throughput_train : 263.361 seq/s mlm_loss : 7.3377  nsp_loss : 0.0000  total_loss : 7.3377  avg_loss_step : 7.3377  learning_rate : 0.00027247018  loss_scaler : 32768 
DLL 2020-11-27 03:19:43.988432 - Iteration: 539  throughput_train : 263.595 seq/s mlm_loss : 7.2170  nsp_loss : 0.0000  total_loss : 7.2170  avg_loss_step : 7.2170  learning_rate : 0.0002721764  loss_scaler : 32768 
DLL 2020-11-27 03:19:45.929438 - Iteration: 540  throughput_train : 263.824 seq/s mlm_loss : 7.0227  nsp_loss : 0.0000  total_loss : 7.0227  avg_loss_step : 7.0227  learning_rate : 0.0002718823  loss_scaler : 32768 
DLL 2020-11-27 03:19:47.877439 - Iteration: 541  throughput_train : 262.874 seq/s mlm_loss : 7.0522  nsp_loss : 0.0000  total_loss : 7.0522  avg_loss_step : 7.0522  learning_rate : 0.00027158792  loss_scaler : 32768 
DLL 2020-11-27 03:19:49.821037 - Iteration: 542  throughput_train : 263.470 seq/s mlm_loss : 7.2414  nsp_loss : 0.0000  total_loss : 7.2414  avg_loss_step : 7.2414  learning_rate : 0.0002712932  loss_scaler : 32768 
DLL 2020-11-27 03:19:51.782809 - Iteration: 543  throughput_train : 261.027 seq/s mlm_loss : 7.2697  nsp_loss : 0.0000  total_loss : 7.2697  avg_loss_step : 7.2697  learning_rate : 0.00027099813  loss_scaler : 32768 
DLL 2020-11-27 03:19:53.730627 - Iteration: 544  throughput_train : 262.897 seq/s mlm_loss : 7.2600  nsp_loss : 0.0000  total_loss : 7.2600  avg_loss_step : 7.2600  learning_rate : 0.00027070276  loss_scaler : 32768 
DLL 2020-11-27 03:19:55.680624 - Iteration: 545  throughput_train : 262.603 seq/s mlm_loss : 7.0905  nsp_loss : 0.0000  total_loss : 7.0905  avg_loss_step : 7.0905  learning_rate : 0.00027040706  loss_scaler : 32768 
DLL 2020-11-27 03:19:57.611166 - Iteration: 546  throughput_train : 265.250 seq/s mlm_loss : 7.0384  nsp_loss : 0.0000  total_loss : 7.0384  avg_loss_step : 7.0384  learning_rate : 0.00027011108  loss_scaler : 32768 
DLL 2020-11-27 03:19:59.552783 - Iteration: 547  throughput_train : 263.738 seq/s mlm_loss : 7.0897  nsp_loss : 0.0000  total_loss : 7.0897  avg_loss_step : 7.0897  learning_rate : 0.00026981474  loss_scaler : 32768 
DLL 2020-11-27 03:20:01.490483 - Iteration: 548  throughput_train : 264.270 seq/s mlm_loss : 7.0288  nsp_loss : 0.0000  total_loss : 7.0288  avg_loss_step : 7.0288  learning_rate : 0.0002695181  loss_scaler : 32768 
DLL 2020-11-27 03:20:03.442452 - Iteration: 549  throughput_train : 262.340 seq/s mlm_loss : 7.0422  nsp_loss : 0.0000  total_loss : 7.0422  avg_loss_step : 7.0422  learning_rate : 0.00026922108  loss_scaler : 32768 
DLL 2020-11-27 03:20:05.393829 - Iteration: 550  throughput_train : 262.419 seq/s mlm_loss : 7.1266  nsp_loss : 0.0000  total_loss : 7.1266  avg_loss_step : 7.1266  learning_rate : 0.00026892376  loss_scaler : 32768 
DLL 2020-11-27 03:20:07.338064 - Iteration: 551  throughput_train : 263.382 seq/s mlm_loss : 7.2177  nsp_loss : 0.0000  total_loss : 7.2177  avg_loss_step : 7.2177  learning_rate : 0.0002686261  loss_scaler : 32768 
DLL 2020-11-27 03:20:09.271888 - Iteration: 552  throughput_train : 264.800 seq/s mlm_loss : 7.2969  nsp_loss : 0.0000  total_loss : 7.2969  avg_loss_step : 7.2969  learning_rate : 0.00026832815  loss_scaler : 32768 
DLL 2020-11-27 03:20:11.209971 - Iteration: 553  throughput_train : 264.218 seq/s mlm_loss : 7.1176  nsp_loss : 0.0000  total_loss : 7.1176  avg_loss_step : 7.1176  learning_rate : 0.00026802986  loss_scaler : 32768 
DLL 2020-11-27 03:20:13.153740 - Iteration: 554  throughput_train : 263.445 seq/s mlm_loss : 7.0981  nsp_loss : 0.0000  total_loss : 7.0981  avg_loss_step : 7.0981  learning_rate : 0.00026773117  loss_scaler : 32768 
DLL 2020-11-27 03:20:15.107648 - Iteration: 555  throughput_train : 262.078 seq/s mlm_loss : 7.0844  nsp_loss : 0.0000  total_loss : 7.0844  avg_loss_step : 7.0844  learning_rate : 0.00026743222  loss_scaler : 32768 
DLL 2020-11-27 03:20:17.054797 - Iteration: 556  throughput_train : 262.988 seq/s mlm_loss : 7.0976  nsp_loss : 0.0000  total_loss : 7.0976  avg_loss_step : 7.0976  learning_rate : 0.0002671329  loss_scaler : 32768 
DLL 2020-11-27 03:20:19.006990 - Iteration: 557  throughput_train : 262.308 seq/s mlm_loss : 7.1912  nsp_loss : 0.0000  total_loss : 7.1912  avg_loss_step : 7.1912  learning_rate : 0.0002668333  loss_scaler : 32768 
DLL 2020-11-27 03:20:20.951309 - Iteration: 558  throughput_train : 263.371 seq/s mlm_loss : 7.2057  nsp_loss : 0.0000  total_loss : 7.2057  avg_loss_step : 7.2057  learning_rate : 0.0002665333  loss_scaler : 32768 
DLL 2020-11-27 03:20:22.892554 - Iteration: 559  throughput_train : 263.791 seq/s mlm_loss : 7.0939  nsp_loss : 0.0000  total_loss : 7.0939  avg_loss_step : 7.0939  learning_rate : 0.00026623296  loss_scaler : 32768 
DLL 2020-11-27 03:20:24.841706 - Iteration: 560  throughput_train : 262.721 seq/s mlm_loss : 6.9790  nsp_loss : 0.0000  total_loss : 6.9790  avg_loss_step : 6.9790  learning_rate : 0.00026593232  loss_scaler : 32768 
DLL 2020-11-27 03:20:26.787927 - Iteration: 561  throughput_train : 263.112 seq/s mlm_loss : 7.1894  nsp_loss : 0.0000  total_loss : 7.1894  avg_loss_step : 7.1894  learning_rate : 0.0002656313  loss_scaler : 32768 
DLL 2020-11-27 03:20:28.733536 - Iteration: 562  throughput_train : 263.197 seq/s mlm_loss : 7.2672  nsp_loss : 0.0000  total_loss : 7.2672  avg_loss_step : 7.2672  learning_rate : 0.00026532996  loss_scaler : 32768 
DLL 2020-11-27 03:20:30.674247 - Iteration: 563  throughput_train : 263.860 seq/s mlm_loss : 7.1595  nsp_loss : 0.0000  total_loss : 7.1595  avg_loss_step : 7.1595  learning_rate : 0.00026502827  loss_scaler : 32768 
DLL 2020-11-27 03:20:32.612386 - Iteration: 564  throughput_train : 264.210 seq/s mlm_loss : 7.1448  nsp_loss : 0.0000  total_loss : 7.1448  avg_loss_step : 7.1448  learning_rate : 0.00026472626  loss_scaler : 32768 
DLL 2020-11-27 03:20:34.559971 - Iteration: 565  throughput_train : 262.931 seq/s mlm_loss : 7.1942  nsp_loss : 0.0000  total_loss : 7.1942  avg_loss_step : 7.1942  learning_rate : 0.0002644239  loss_scaler : 32768 
DLL 2020-11-27 03:20:36.505449 - Iteration: 566  throughput_train : 263.214 seq/s mlm_loss : 7.2053  nsp_loss : 0.0000  total_loss : 7.2053  avg_loss_step : 7.2053  learning_rate : 0.00026412116  loss_scaler : 32768 
DLL 2020-11-27 03:20:38.456541 - Iteration: 567  throughput_train : 262.460 seq/s mlm_loss : 7.2482  nsp_loss : 0.0000  total_loss : 7.2482  avg_loss_step : 7.2482  learning_rate : 0.0002638181  loss_scaler : 32768 
DLL 2020-11-27 03:20:40.395718 - Iteration: 568  throughput_train : 264.069 seq/s mlm_loss : 7.1790  nsp_loss : 0.0000  total_loss : 7.1790  avg_loss_step : 7.1790  learning_rate : 0.00026351467  loss_scaler : 32768 
DLL 2020-11-27 03:20:42.335309 - Iteration: 569  throughput_train : 264.015 seq/s mlm_loss : 7.0584  nsp_loss : 0.0000  total_loss : 7.0584  avg_loss_step : 7.0584  learning_rate : 0.00026321094  loss_scaler : 32768 
DLL 2020-11-27 03:20:44.282949 - Iteration: 570  throughput_train : 262.933 seq/s mlm_loss : 6.9871  nsp_loss : 0.0000  total_loss : 6.9871  avg_loss_step : 6.9871  learning_rate : 0.0002629068  loss_scaler : 32768 
DLL 2020-11-27 03:20:46.232002 - Iteration: 571  throughput_train : 262.734 seq/s mlm_loss : 7.0186  nsp_loss : 0.0000  total_loss : 7.0186  avg_loss_step : 7.0186  learning_rate : 0.00026260235  loss_scaler : 32768 
DLL 2020-11-27 03:20:48.189396 - Iteration: 572  throughput_train : 261.611 seq/s mlm_loss : 7.1793  nsp_loss : 0.0000  total_loss : 7.1793  avg_loss_step : 7.1793  learning_rate : 0.00026229752  loss_scaler : 32768 
DLL 2020-11-27 03:20:50.130271 - Iteration: 573  throughput_train : 263.846 seq/s mlm_loss : 7.1713  nsp_loss : 0.0000  total_loss : 7.1713  avg_loss_step : 7.1713  learning_rate : 0.00026199236  loss_scaler : 32768 
DLL 2020-11-27 03:20:52.078693 - Iteration: 574  throughput_train : 262.832 seq/s mlm_loss : 7.2160  nsp_loss : 0.0000  total_loss : 7.2160  avg_loss_step : 7.2160  learning_rate : 0.00026168683  loss_scaler : 32768 
DLL 2020-11-27 03:20:54.026601 - Iteration: 575  throughput_train : 262.888 seq/s mlm_loss : 6.9750  nsp_loss : 0.0000  total_loss : 6.9750  avg_loss_step : 6.9750  learning_rate : 0.00026138092  loss_scaler : 32768 
DLL 2020-11-27 03:20:55.971629 - Iteration: 576  throughput_train : 263.274 seq/s mlm_loss : 7.0823  nsp_loss : 0.0000  total_loss : 7.0823  avg_loss_step : 7.0823  learning_rate : 0.0002610747  loss_scaler : 32768 
DLL 2020-11-27 03:20:57.918076 - Iteration: 577  throughput_train : 263.082 seq/s mlm_loss : 7.2118  nsp_loss : 0.0000  total_loss : 7.2118  avg_loss_step : 7.2118  learning_rate : 0.00026076808  loss_scaler : 32768 
DLL 2020-11-27 03:20:59.870169 - Iteration: 578  throughput_train : 262.322 seq/s mlm_loss : 7.0925  nsp_loss : 0.0000  total_loss : 7.0925  avg_loss_step : 7.0925  learning_rate : 0.00026046112  loss_scaler : 32768 
DLL 2020-11-27 03:21:01.818958 - Iteration: 579  throughput_train : 262.767 seq/s mlm_loss : 7.1628  nsp_loss : 0.0000  total_loss : 7.1628  avg_loss_step : 7.1628  learning_rate : 0.00026015379  loss_scaler : 32768 
DLL 2020-11-27 03:21:03.770499 - Iteration: 580  throughput_train : 262.395 seq/s mlm_loss : 7.0864  nsp_loss : 0.0000  total_loss : 7.0864  avg_loss_step : 7.0864  learning_rate : 0.0002598461  loss_scaler : 32768 
DLL 2020-11-27 03:21:05.719906 - Iteration: 581  throughput_train : 262.685 seq/s mlm_loss : 6.9141  nsp_loss : 0.0000  total_loss : 6.9141  avg_loss_step : 6.9141  learning_rate : 0.00025953804  loss_scaler : 32768 
DLL 2020-11-27 03:21:07.664120 - Iteration: 582  throughput_train : 263.388 seq/s mlm_loss : 7.0748  nsp_loss : 0.0000  total_loss : 7.0748  avg_loss_step : 7.0748  learning_rate : 0.0002592296  loss_scaler : 32768 
DLL 2020-11-27 03:21:09.607752 - Iteration: 583  throughput_train : 263.463 seq/s mlm_loss : 7.0926  nsp_loss : 0.0000  total_loss : 7.0926  avg_loss_step : 7.0926  learning_rate : 0.00025892083  loss_scaler : 32768 
DLL 2020-11-27 03:21:11.555554 - Iteration: 584  throughput_train : 262.902 seq/s mlm_loss : 7.1773  nsp_loss : 0.0000  total_loss : 7.1773  avg_loss_step : 7.1773  learning_rate : 0.00025861166  loss_scaler : 32768 
DLL 2020-11-27 03:21:13.496964 - Iteration: 585  throughput_train : 263.765 seq/s mlm_loss : 7.0778  nsp_loss : 0.0000  total_loss : 7.0778  avg_loss_step : 7.0778  learning_rate : 0.00025830214  loss_scaler : 32768 
DLL 2020-11-27 03:21:15.450330 - Iteration: 586  throughput_train : 262.150 seq/s mlm_loss : 7.1004  nsp_loss : 0.0000  total_loss : 7.1004  avg_loss_step : 7.1004  learning_rate : 0.00025799224  loss_scaler : 32768 
DLL 2020-11-27 03:21:17.395173 - Iteration: 587  throughput_train : 263.302 seq/s mlm_loss : 7.3831  nsp_loss : 0.0000  total_loss : 7.3831  avg_loss_step : 7.3831  learning_rate : 0.00025768197  loss_scaler : 32768 
DLL 2020-11-27 03:21:19.343337 - Iteration: 588  throughput_train : 262.849 seq/s mlm_loss : 7.1921  nsp_loss : 0.0000  total_loss : 7.1921  avg_loss_step : 7.1921  learning_rate : 0.0002573713  loss_scaler : 32768 
DLL 2020-11-27 03:21:21.304618 - Iteration: 589  throughput_train : 261.093 seq/s mlm_loss : 7.0348  nsp_loss : 0.0000  total_loss : 7.0349  avg_loss_step : 7.0349  learning_rate : 0.00025706028  loss_scaler : 32768 
DLL 2020-11-27 03:21:23.253165 - Iteration: 590  throughput_train : 262.800 seq/s mlm_loss : 6.9590  nsp_loss : 0.0000  total_loss : 6.9590  avg_loss_step : 6.9590  learning_rate : 0.0002567489  loss_scaler : 32768 
DLL 2020-11-27 03:21:25.204190 - Iteration: 591  throughput_train : 262.478 seq/s mlm_loss : 7.0475  nsp_loss : 0.0000  total_loss : 7.0475  avg_loss_step : 7.0475  learning_rate : 0.0002564371  loss_scaler : 32768 
DLL 2020-11-27 03:21:27.149005 - Iteration: 592  throughput_train : 263.316 seq/s mlm_loss : 7.0136  nsp_loss : 0.0000  total_loss : 7.0136  avg_loss_step : 7.0136  learning_rate : 0.00025612494  loss_scaler : 32768 
DLL 2020-11-27 03:21:29.090985 - Iteration: 593  throughput_train : 263.688 seq/s mlm_loss : 7.0588  nsp_loss : 0.0000  total_loss : 7.0588  avg_loss_step : 7.0588  learning_rate : 0.00025581243  loss_scaler : 32768 
DLL 2020-11-27 03:21:31.034236 - Iteration: 594  throughput_train : 263.516 seq/s mlm_loss : 7.1540  nsp_loss : 0.0000  total_loss : 7.1540  avg_loss_step : 7.1540  learning_rate : 0.0002554995  loss_scaler : 32768 
DLL 2020-11-27 03:21:32.979977 - Iteration: 595  throughput_train : 263.179 seq/s mlm_loss : 7.1299  nsp_loss : 0.0000  total_loss : 7.1299  avg_loss_step : 7.1299  learning_rate : 0.0002551862  loss_scaler : 32768 
DLL 2020-11-27 03:21:34.933181 - Iteration: 596  throughput_train : 262.172 seq/s mlm_loss : 6.9452  nsp_loss : 0.0000  total_loss : 6.9452  avg_loss_step : 6.9452  learning_rate : 0.00025487252  loss_scaler : 32768 
DLL 2020-11-27 03:21:36.886540 - Iteration: 597  throughput_train : 262.154 seq/s mlm_loss : 6.9286  nsp_loss : 0.0000  total_loss : 6.9286  avg_loss_step : 6.9286  learning_rate : 0.00025455843  loss_scaler : 32768 
DLL 2020-11-27 03:21:38.837778 - Iteration: 598  throughput_train : 262.439 seq/s mlm_loss : 7.0848  nsp_loss : 0.0000  total_loss : 7.0848  avg_loss_step : 7.0848  learning_rate : 0.00025424396  loss_scaler : 32768 
DLL 2020-11-27 03:21:40.789211 - Iteration: 599  throughput_train : 262.412 seq/s mlm_loss : 7.1805  nsp_loss : 0.0000  total_loss : 7.1805  avg_loss_step : 7.1805  learning_rate : 0.00025392912  loss_scaler : 32768 
DLL 2020-11-27 03:21:42.723330 - Iteration: 600  throughput_train : 264.772 seq/s mlm_loss : 7.0631  nsp_loss : 0.0000  total_loss : 7.0631  avg_loss_step : 7.0631  learning_rate : 0.00025361383  loss_scaler : 32768 
DLL 2020-11-27 03:21:44.667979 - Iteration: 601  throughput_train : 263.341 seq/s mlm_loss : 7.1914  nsp_loss : 0.0000  total_loss : 7.1914  avg_loss_step : 7.1914  learning_rate : 0.00025329823  loss_scaler : 32768 
DLL 2020-11-27 03:21:46.617833 - Iteration: 602  throughput_train : 262.634 seq/s mlm_loss : 7.1009  nsp_loss : 0.0000  total_loss : 7.1009  avg_loss_step : 7.1009  learning_rate : 0.0002529822  loss_scaler : 32768 
DLL 2020-11-27 03:21:48.562578 - Iteration: 603  throughput_train : 263.323 seq/s mlm_loss : 7.1826  nsp_loss : 0.0000  total_loss : 7.1826  avg_loss_step : 7.1826  learning_rate : 0.00025266578  loss_scaler : 32768 
DLL 2020-11-27 03:21:50.516434 - Iteration: 604  throughput_train : 262.088 seq/s mlm_loss : 7.0750  nsp_loss : 0.0000  total_loss : 7.0750  avg_loss_step : 7.0750  learning_rate : 0.00025234895  loss_scaler : 32768 
DLL 2020-11-27 03:21:52.460586 - Iteration: 605  throughput_train : 263.401 seq/s mlm_loss : 7.1380  nsp_loss : 0.0000  total_loss : 7.1380  avg_loss_step : 7.1380  learning_rate : 0.00025203172  loss_scaler : 32768 
DLL 2020-11-27 03:21:54.402108 - Iteration: 606  throughput_train : 263.761 seq/s mlm_loss : 6.9981  nsp_loss : 0.0000  total_loss : 6.9982  avg_loss_step : 6.9982  learning_rate : 0.0002517141  loss_scaler : 32768 
DLL 2020-11-27 03:21:56.348618 - Iteration: 607  throughput_train : 263.084 seq/s mlm_loss : 7.1024  nsp_loss : 0.0000  total_loss : 7.1024  avg_loss_step : 7.1024  learning_rate : 0.00025139606  loss_scaler : 32768 
DLL 2020-11-27 03:21:58.309506 - Iteration: 608  throughput_train : 261.155 seq/s mlm_loss : 7.1363  nsp_loss : 0.0000  total_loss : 7.1363  avg_loss_step : 7.1363  learning_rate : 0.00025107767  loss_scaler : 32768 
DLL 2020-11-27 03:22:00.243032 - Iteration: 609  throughput_train : 264.848 seq/s mlm_loss : 7.1706  nsp_loss : 0.0000  total_loss : 7.1706  avg_loss_step : 7.1706  learning_rate : 0.00025075884  loss_scaler : 32768 
DLL 2020-11-27 03:22:02.189001 - Iteration: 610  throughput_train : 263.146 seq/s mlm_loss : 7.0137  nsp_loss : 0.0000  total_loss : 7.0137  avg_loss_step : 7.0137  learning_rate : 0.0002504396  loss_scaler : 32768 
DLL 2020-11-27 03:22:04.133536 - Iteration: 611  throughput_train : 263.346 seq/s mlm_loss : 7.1882  nsp_loss : 0.0000  total_loss : 7.1882  avg_loss_step : 7.1882  learning_rate : 0.00025011998  loss_scaler : 32768 
DLL 2020-11-27 03:22:06.091560 - Iteration: 612  throughput_train : 261.534 seq/s mlm_loss : 7.0072  nsp_loss : 0.0000  total_loss : 7.0072  avg_loss_step : 7.0072  learning_rate : 0.00024979992  loss_scaler : 32768 
DLL 2020-11-27 03:22:08.036186 - Iteration: 613  throughput_train : 263.347 seq/s mlm_loss : 7.2069  nsp_loss : 0.0000  total_loss : 7.2069  avg_loss_step : 7.2069  learning_rate : 0.00024947946  loss_scaler : 32768 
DLL 2020-11-27 03:22:09.989369 - Iteration: 614  throughput_train : 262.203 seq/s mlm_loss : 7.0523  nsp_loss : 0.0000  total_loss : 7.0523  avg_loss_step : 7.0523  learning_rate : 0.00024915856  loss_scaler : 32768 
DLL 2020-11-27 03:22:11.939356 - Iteration: 615  throughput_train : 262.627 seq/s mlm_loss : 6.9251  nsp_loss : 0.0000  total_loss : 6.9251  avg_loss_step : 6.9251  learning_rate : 0.00024883728  loss_scaler : 32768 
DLL 2020-11-27 03:22:13.883108 - Iteration: 616  throughput_train : 263.448 seq/s mlm_loss : 6.9566  nsp_loss : 0.0000  total_loss : 6.9566  avg_loss_step : 6.9566  learning_rate : 0.00024851557  loss_scaler : 32768 
DLL 2020-11-27 03:22:15.833874 - Iteration: 617  throughput_train : 262.502 seq/s mlm_loss : 7.1425  nsp_loss : 0.0000  total_loss : 7.1425  avg_loss_step : 7.1425  learning_rate : 0.00024819348  loss_scaler : 32768 
DLL 2020-11-27 03:22:17.786790 - Iteration: 618  throughput_train : 262.225 seq/s mlm_loss : 6.9496  nsp_loss : 0.0000  total_loss : 6.9496  avg_loss_step : 6.9496  learning_rate : 0.00024787092  loss_scaler : 32768 
DLL 2020-11-27 03:22:19.733366 - Iteration: 619  throughput_train : 263.064 seq/s mlm_loss : 6.9826  nsp_loss : 0.0000  total_loss : 6.9826  avg_loss_step : 6.9826  learning_rate : 0.00024754796  loss_scaler : 32768 
DLL 2020-11-27 03:22:21.680087 - Iteration: 620  throughput_train : 263.047 seq/s mlm_loss : 7.0752  nsp_loss : 0.0000  total_loss : 7.0752  avg_loss_step : 7.0752  learning_rate : 0.00024722458  loss_scaler : 32768 
DLL 2020-11-27 03:22:23.622326 - Iteration: 621  throughput_train : 263.661 seq/s mlm_loss : 7.1954  nsp_loss : 0.0000  total_loss : 7.1954  avg_loss_step : 7.1954  learning_rate : 0.00024690077  loss_scaler : 32768 
DLL 2020-11-27 03:22:25.571710 - Iteration: 622  throughput_train : 262.709 seq/s mlm_loss : 7.1284  nsp_loss : 0.0000  total_loss : 7.1284  avg_loss_step : 7.1284  learning_rate : 0.00024657653  loss_scaler : 32768 
DLL 2020-11-27 03:22:27.503534 - Iteration: 623  throughput_train : 265.090 seq/s mlm_loss : 7.0029  nsp_loss : 0.0000  total_loss : 7.0030  avg_loss_step : 7.0030  learning_rate : 0.00024625188  loss_scaler : 32768 
DLL 2020-11-27 03:22:29.455088 - Iteration: 624  throughput_train : 262.395 seq/s mlm_loss : 7.1417  nsp_loss : 0.0000  total_loss : 7.1417  avg_loss_step : 7.1417  learning_rate : 0.0002459268  loss_scaler : 32768 
DLL 2020-11-27 03:22:31.402952 - Iteration: 625  throughput_train : 262.903 seq/s mlm_loss : 7.2715  nsp_loss : 0.0000  total_loss : 7.2715  avg_loss_step : 7.2715  learning_rate : 0.0002456013  loss_scaler : 32768 
DLL 2020-11-27 03:22:33.352297 - Iteration: 626  throughput_train : 262.696 seq/s mlm_loss : 7.2432  nsp_loss : 0.0000  total_loss : 7.2432  avg_loss_step : 7.2432  learning_rate : 0.00024527535  loss_scaler : 32768 
DLL 2020-11-27 03:22:35.305663 - Iteration: 627  throughput_train : 262.165 seq/s mlm_loss : 7.1894  nsp_loss : 0.0000  total_loss : 7.1894  avg_loss_step : 7.1894  learning_rate : 0.00024494898  loss_scaler : 32768 
DLL 2020-11-27 03:22:37.260157 - Iteration: 628  throughput_train : 261.999 seq/s mlm_loss : 7.0775  nsp_loss : 0.0000  total_loss : 7.0775  avg_loss_step : 7.0775  learning_rate : 0.00024462212  loss_scaler : 32768 
DLL 2020-11-27 03:22:39.203642 - Iteration: 629  throughput_train : 263.483 seq/s mlm_loss : 6.9457  nsp_loss : 0.0000  total_loss : 6.9457  avg_loss_step : 6.9457  learning_rate : 0.00024429488  loss_scaler : 32768 
DLL 2020-11-27 03:22:41.145131 - Iteration: 630  throughput_train : 263.757 seq/s mlm_loss : 6.9169  nsp_loss : 0.0000  total_loss : 6.9169  avg_loss_step : 6.9169  learning_rate : 0.0002439672  loss_scaler : 32768 
DLL 2020-11-27 03:22:43.095743 - Iteration: 631  throughput_train : 262.522 seq/s mlm_loss : 6.8909  nsp_loss : 0.0000  total_loss : 6.8909  avg_loss_step : 6.8909  learning_rate : 0.00024363906  loss_scaler : 32768 
DLL 2020-11-27 03:22:45.050648 - Iteration: 632  throughput_train : 261.947 seq/s mlm_loss : 6.9957  nsp_loss : 0.0000  total_loss : 6.9957  avg_loss_step : 6.9957  learning_rate : 0.00024331047  loss_scaler : 32768 
DLL 2020-11-27 03:22:46.996099 - Iteration: 633  throughput_train : 263.222 seq/s mlm_loss : 7.0746  nsp_loss : 0.0000  total_loss : 7.0747  avg_loss_step : 7.0747  learning_rate : 0.00024298145  loss_scaler : 32768 
DLL 2020-11-27 03:22:48.951313 - Iteration: 634  throughput_train : 261.907 seq/s mlm_loss : 7.0754  nsp_loss : 0.0000  total_loss : 7.0754  avg_loss_step : 7.0754  learning_rate : 0.00024265201  loss_scaler : 32768 
DLL 2020-11-27 03:22:50.894697 - Iteration: 635  throughput_train : 263.502 seq/s mlm_loss : 7.1159  nsp_loss : 0.0000  total_loss : 7.1159  avg_loss_step : 7.1159  learning_rate : 0.00024232209  loss_scaler : 32768 
DLL 2020-11-27 03:22:52.839473 - Iteration: 636  throughput_train : 263.309 seq/s mlm_loss : 7.0654  nsp_loss : 0.0000  total_loss : 7.0654  avg_loss_step : 7.0654  learning_rate : 0.00024199173  loss_scaler : 32768 
DLL 2020-11-27 03:22:54.783736 - Iteration: 637  throughput_train : 263.379 seq/s mlm_loss : 7.0250  nsp_loss : 0.0000  total_loss : 7.0250  avg_loss_step : 7.0250  learning_rate : 0.0002416609  loss_scaler : 32768 
DLL 2020-11-27 03:22:56.731716 - Iteration: 638  throughput_train : 262.876 seq/s mlm_loss : 7.0833  nsp_loss : 0.0000  total_loss : 7.0833  avg_loss_step : 7.0833  learning_rate : 0.00024132965  loss_scaler : 32768 
DLL 2020-11-27 03:22:58.670307 - Iteration: 639  throughput_train : 264.149 seq/s mlm_loss : 7.0435  nsp_loss : 0.0000  total_loss : 7.0435  avg_loss_step : 7.0435  learning_rate : 0.0002409979  loss_scaler : 32768 
DLL 2020-11-27 03:23:00.616991 - Iteration: 640  throughput_train : 263.052 seq/s mlm_loss : 7.1228  nsp_loss : 0.0000  total_loss : 7.1228  avg_loss_step : 7.1228  learning_rate : 0.00024066574  loss_scaler : 32768 
DLL 2020-11-27 03:23:02.567431 - Iteration: 641  throughput_train : 262.544 seq/s mlm_loss : 7.0899  nsp_loss : 0.0000  total_loss : 7.0899  avg_loss_step : 7.0899  learning_rate : 0.00024033307  loss_scaler : 32768 
DLL 2020-11-27 03:23:04.524282 - Iteration: 642  throughput_train : 261.686 seq/s mlm_loss : 7.1281  nsp_loss : 0.0000  total_loss : 7.1281  avg_loss_step : 7.1281  learning_rate : 0.00023999998  loss_scaler : 32768 
DLL 2020-11-27 03:23:06.473948 - Iteration: 643  throughput_train : 262.649 seq/s mlm_loss : 7.0125  nsp_loss : 0.0000  total_loss : 7.0125  avg_loss_step : 7.0125  learning_rate : 0.0002396664  loss_scaler : 32768 
DLL 2020-11-27 03:23:08.419540 - Iteration: 644  throughput_train : 263.200 seq/s mlm_loss : 7.0583  nsp_loss : 0.0000  total_loss : 7.0583  avg_loss_step : 7.0583  learning_rate : 0.00023933238  loss_scaler : 32768 
DLL 2020-11-27 03:23:10.376749 - Iteration: 645  throughput_train : 261.636 seq/s mlm_loss : 6.9369  nsp_loss : 0.0000  total_loss : 6.9369  avg_loss_step : 6.9369  learning_rate : 0.0002389979  loss_scaler : 32768 
DLL 2020-11-27 03:23:12.318104 - Iteration: 646  throughput_train : 263.773 seq/s mlm_loss : 7.1417  nsp_loss : 0.0000  total_loss : 7.1417  avg_loss_step : 7.1417  learning_rate : 0.00023866293  loss_scaler : 32768 
DLL 2020-11-27 03:23:14.262519 - Iteration: 647  throughput_train : 263.359 seq/s mlm_loss : 7.1459  nsp_loss : 0.0000  total_loss : 7.1459  avg_loss_step : 7.1459  learning_rate : 0.0002383275  loss_scaler : 32768 
DLL 2020-11-27 03:23:16.202768 - Iteration: 648  throughput_train : 263.930 seq/s mlm_loss : 7.0461  nsp_loss : 0.0000  total_loss : 7.0461  avg_loss_step : 7.0461  learning_rate : 0.00023799158  loss_scaler : 32768 
DLL 2020-11-27 03:23:18.151602 - Iteration: 649  throughput_train : 262.768 seq/s mlm_loss : 6.9983  nsp_loss : 0.0000  total_loss : 6.9983  avg_loss_step : 6.9983  learning_rate : 0.0002376552  loss_scaler : 32768 
DLL 2020-11-27 03:23:20.089050 - Iteration: 650  throughput_train : 264.309 seq/s mlm_loss : 7.0473  nsp_loss : 0.0000  total_loss : 7.0473  avg_loss_step : 7.0473  learning_rate : 0.00023731834  loss_scaler : 32768 
DLL 2020-11-27 03:23:22.039600 - Iteration: 651  throughput_train : 262.529 seq/s mlm_loss : 7.0731  nsp_loss : 0.0000  total_loss : 7.0731  avg_loss_step : 7.0731  learning_rate : 0.00023698098  loss_scaler : 32768 
DLL 2020-11-27 03:23:23.973682 - Iteration: 652  throughput_train : 264.768 seq/s mlm_loss : 7.1553  nsp_loss : 0.0000  total_loss : 7.1553  avg_loss_step : 7.1553  learning_rate : 0.00023664316  loss_scaler : 32768 
DLL 2020-11-27 03:23:25.931862 - Iteration: 653  throughput_train : 261.518 seq/s mlm_loss : 7.1410  nsp_loss : 0.0000  total_loss : 7.1410  avg_loss_step : 7.1410  learning_rate : 0.00023630487  loss_scaler : 32768 
DLL 2020-11-27 03:23:27.872907 - Iteration: 654  throughput_train : 263.826 seq/s mlm_loss : 7.1481  nsp_loss : 0.0000  total_loss : 7.1481  avg_loss_step : 7.1481  learning_rate : 0.00023596609  loss_scaler : 32768 
DLL 2020-11-27 03:23:29.816172 - Iteration: 655  throughput_train : 263.525 seq/s mlm_loss : 7.1475  nsp_loss : 0.0000  total_loss : 7.1475  avg_loss_step : 7.1475  learning_rate : 0.0002356268  loss_scaler : 32768 
DLL 2020-11-27 03:23:31.767129 - Iteration: 656  throughput_train : 262.486 seq/s mlm_loss : 7.0285  nsp_loss : 0.0000  total_loss : 7.0285  avg_loss_step : 7.0285  learning_rate : 0.00023528704  loss_scaler : 32768 
DLL 2020-11-27 03:23:33.721638 - Iteration: 657  throughput_train : 262.008 seq/s mlm_loss : 6.7844  nsp_loss : 0.0000  total_loss : 6.7844  avg_loss_step : 6.7844  learning_rate : 0.0002349468  loss_scaler : 32768 
DLL 2020-11-27 03:23:35.662933 - Iteration: 658  throughput_train : 263.792 seq/s mlm_loss : 6.9796  nsp_loss : 0.0000  total_loss : 6.9796  avg_loss_step : 6.9796  learning_rate : 0.00023460605  loss_scaler : 32768 
DLL 2020-11-27 03:23:37.613840 - Iteration: 659  throughput_train : 262.493 seq/s mlm_loss : 7.1014  nsp_loss : 0.0000  total_loss : 7.1014  avg_loss_step : 7.1014  learning_rate : 0.0002342648  loss_scaler : 32768 
DLL 2020-11-27 03:23:39.558120 - Iteration: 660  throughput_train : 263.390 seq/s mlm_loss : 6.9130  nsp_loss : 0.0000  total_loss : 6.9130  avg_loss_step : 6.9130  learning_rate : 0.00023392304  loss_scaler : 32768 
DLL 2020-11-27 03:23:41.504818 - Iteration: 661  throughput_train : 263.066 seq/s mlm_loss : 6.9852  nsp_loss : 0.0000  total_loss : 6.9852  avg_loss_step : 6.9852  learning_rate : 0.0002335808  loss_scaler : 32768 
DLL 2020-11-27 03:23:43.447441 - Iteration: 662  throughput_train : 263.612 seq/s mlm_loss : 6.8094  nsp_loss : 0.0000  total_loss : 6.8094  avg_loss_step : 6.8094  learning_rate : 0.00023323807  loss_scaler : 32768 
DLL 2020-11-27 03:23:45.397238 - Iteration: 663  throughput_train : 262.643 seq/s mlm_loss : 6.9847  nsp_loss : 0.0000  total_loss : 6.9847  avg_loss_step : 6.9847  learning_rate : 0.00023289482  loss_scaler : 32768 
DLL 2020-11-27 03:23:47.348038 - Iteration: 664  throughput_train : 262.505 seq/s mlm_loss : 7.0528  nsp_loss : 0.0000  total_loss : 7.0528  avg_loss_step : 7.0528  learning_rate : 0.00023255104  loss_scaler : 32768 
DLL 2020-11-27 03:23:49.295678 - Iteration: 665  throughput_train : 262.924 seq/s mlm_loss : 6.9873  nsp_loss : 0.0000  total_loss : 6.9873  avg_loss_step : 6.9873  learning_rate : 0.00023220679  loss_scaler : 32768 
DLL 2020-11-27 03:23:51.243775 - Iteration: 666  throughput_train : 262.871 seq/s mlm_loss : 7.1582  nsp_loss : 0.0000  total_loss : 7.1582  avg_loss_step : 7.1582  learning_rate : 0.00023186201  loss_scaler : 32768 
DLL 2020-11-27 03:23:53.192768 - Iteration: 667  throughput_train : 262.748 seq/s mlm_loss : 7.0979  nsp_loss : 0.0000  total_loss : 7.0979  avg_loss_step : 7.0979  learning_rate : 0.00023151671  loss_scaler : 32768 
DLL 2020-11-27 03:23:55.142280 - Iteration: 668  throughput_train : 262.672 seq/s mlm_loss : 7.1601  nsp_loss : 0.0000  total_loss : 7.1601  avg_loss_step : 7.1601  learning_rate : 0.00023117094  loss_scaler : 32768 
DLL 2020-11-27 03:23:57.104023 - Iteration: 669  throughput_train : 261.047 seq/s mlm_loss : 7.0957  nsp_loss : 0.0000  total_loss : 7.0957  avg_loss_step : 7.0957  learning_rate : 0.00023082459  loss_scaler : 32768 
DLL 2020-11-27 03:23:59.047447 - Iteration: 670  throughput_train : 263.515 seq/s mlm_loss : 7.1941  nsp_loss : 0.0000  total_loss : 7.1941  avg_loss_step : 7.1941  learning_rate : 0.00023047773  loss_scaler : 32768 
DLL 2020-11-27 03:24:00.991592 - Iteration: 671  throughput_train : 263.401 seq/s mlm_loss : 7.1064  nsp_loss : 0.0000  total_loss : 7.1064  avg_loss_step : 7.1064  learning_rate : 0.00023013038  loss_scaler : 32768 
DLL 2020-11-27 03:24:02.946119 - Iteration: 672  throughput_train : 262.000 seq/s mlm_loss : 7.0649  nsp_loss : 0.0000  total_loss : 7.0649  avg_loss_step : 7.0649  learning_rate : 0.0002297825  loss_scaler : 32768 
DLL 2020-11-27 03:24:04.888017 - Iteration: 673  throughput_train : 263.698 seq/s mlm_loss : 7.2841  nsp_loss : 0.0000  total_loss : 7.2841  avg_loss_step : 7.2841  learning_rate : 0.00022943408  loss_scaler : 32768 
DLL 2020-11-27 03:24:06.836771 - Iteration: 674  throughput_train : 262.783 seq/s mlm_loss : 7.2462  nsp_loss : 0.0000  total_loss : 7.2462  avg_loss_step : 7.2462  learning_rate : 0.00022908511  loss_scaler : 32768 
DLL 2020-11-27 03:24:08.780666 - Iteration: 675  throughput_train : 263.437 seq/s mlm_loss : 6.9762  nsp_loss : 0.0000  total_loss : 6.9762  avg_loss_step : 6.9762  learning_rate : 0.00022873563  loss_scaler : 32768 
DLL 2020-11-27 03:24:10.728669 - Iteration: 676  throughput_train : 262.875 seq/s mlm_loss : 7.1139  nsp_loss : 0.0000  total_loss : 7.1139  avg_loss_step : 7.1139  learning_rate : 0.00022838563  loss_scaler : 32768 
DLL 2020-11-27 03:24:12.678210 - Iteration: 677  throughput_train : 262.673 seq/s mlm_loss : 6.9546  nsp_loss : 0.0000  total_loss : 6.9546  avg_loss_step : 6.9546  learning_rate : 0.00022803509  loss_scaler : 32768 
DLL 2020-11-27 03:24:14.619714 - Iteration: 678  throughput_train : 263.755 seq/s mlm_loss : 7.0491  nsp_loss : 0.0000  total_loss : 7.0491  avg_loss_step : 7.0491  learning_rate : 0.00022768397  loss_scaler : 32768 
DLL 2020-11-27 03:24:16.560428 - Iteration: 679  throughput_train : 263.875 seq/s mlm_loss : 7.1945  nsp_loss : 0.0000  total_loss : 7.1945  avg_loss_step : 7.1945  learning_rate : 0.00022733232  loss_scaler : 32768 
DLL 2020-11-27 03:24:18.500545 - Iteration: 680  throughput_train : 263.955 seq/s mlm_loss : 7.2359  nsp_loss : 0.0000  total_loss : 7.2359  avg_loss_step : 7.2359  learning_rate : 0.00022698016  loss_scaler : 32768 
DLL 2020-11-27 03:24:20.450423 - Iteration: 681  throughput_train : 262.631 seq/s mlm_loss : 7.2378  nsp_loss : 0.0000  total_loss : 7.2378  avg_loss_step : 7.2378  learning_rate : 0.00022662744  loss_scaler : 32768 
DLL 2020-11-27 03:24:22.396063 - Iteration: 682  throughput_train : 263.204 seq/s mlm_loss : 7.2309  nsp_loss : 0.0000  total_loss : 7.2309  avg_loss_step : 7.2309  learning_rate : 0.00022627415  loss_scaler : 32768 
DLL 2020-11-27 03:24:24.336008 - Iteration: 683  throughput_train : 263.976 seq/s mlm_loss : 7.0434  nsp_loss : 0.0000  total_loss : 7.0434  avg_loss_step : 7.0434  learning_rate : 0.00022592032  loss_scaler : 32768 
DLL 2020-11-27 03:24:26.280672 - Iteration: 684  throughput_train : 263.335 seq/s mlm_loss : 7.1367  nsp_loss : 0.0000  total_loss : 7.1367  avg_loss_step : 7.1367  learning_rate : 0.00022556593  loss_scaler : 32768 
DLL 2020-11-27 03:24:28.228948 - Iteration: 685  throughput_train : 262.845 seq/s mlm_loss : 7.0214  nsp_loss : 0.0000  total_loss : 7.0214  avg_loss_step : 7.0214  learning_rate : 0.000225211  loss_scaler : 32768 
DLL 2020-11-27 03:24:30.175596 - Iteration: 686  throughput_train : 263.056 seq/s mlm_loss : 7.0182  nsp_loss : 0.0000  total_loss : 7.0182  avg_loss_step : 7.0182  learning_rate : 0.0002248555  loss_scaler : 32768 
DLL 2020-11-27 03:24:32.112172 - Iteration: 687  throughput_train : 264.423 seq/s mlm_loss : 6.9856  nsp_loss : 0.0000  total_loss : 6.9856  avg_loss_step : 6.9856  learning_rate : 0.00022449941  loss_scaler : 32768 
DLL 2020-11-27 03:24:34.066370 - Iteration: 688  throughput_train : 262.043 seq/s mlm_loss : 7.1135  nsp_loss : 0.0000  total_loss : 7.1135  avg_loss_step : 7.1135  learning_rate : 0.00022414279  loss_scaler : 32768 
DLL 2020-11-27 03:24:36.002090 - Iteration: 689  throughput_train : 264.558 seq/s mlm_loss : 7.0519  nsp_loss : 0.0000  total_loss : 7.0519  avg_loss_step : 7.0519  learning_rate : 0.00022378558  loss_scaler : 32768 
DLL 2020-11-27 03:24:37.946283 - Iteration: 690  throughput_train : 263.399 seq/s mlm_loss : 6.9851  nsp_loss : 0.0000  total_loss : 6.9851  avg_loss_step : 6.9851  learning_rate : 0.00022342784  loss_scaler : 32768 
DLL 2020-11-27 03:24:39.887002 - Iteration: 691  throughput_train : 263.872 seq/s mlm_loss : 6.9778  nsp_loss : 0.0000  total_loss : 6.9778  avg_loss_step : 6.9778  learning_rate : 0.0002230695  loss_scaler : 32768 
DLL 2020-11-27 03:24:41.826115 - Iteration: 692  throughput_train : 264.100 seq/s mlm_loss : 6.8826  nsp_loss : 0.0000  total_loss : 6.8826  avg_loss_step : 6.8826  learning_rate : 0.00022271056  loss_scaler : 32768 
DLL 2020-11-27 03:24:43.767739 - Iteration: 693  throughput_train : 263.748 seq/s mlm_loss : 7.1220  nsp_loss : 0.0000  total_loss : 7.1220  avg_loss_step : 7.1220  learning_rate : 0.00022235104  loss_scaler : 32768 
DLL 2020-11-27 03:24:45.714254 - Iteration: 694  throughput_train : 263.072 seq/s mlm_loss : 6.9653  nsp_loss : 0.0000  total_loss : 6.9653  avg_loss_step : 6.9653  learning_rate : 0.00022199098  loss_scaler : 32768 
DLL 2020-11-27 03:24:47.650199 - Iteration: 695  throughput_train : 264.509 seq/s mlm_loss : 7.0284  nsp_loss : 0.0000  total_loss : 7.0284  avg_loss_step : 7.0284  learning_rate : 0.00022163031  loss_scaler : 32768 
DLL 2020-11-27 03:24:49.594442 - Iteration: 696  throughput_train : 263.383 seq/s mlm_loss : 6.9866  nsp_loss : 0.0000  total_loss : 6.9866  avg_loss_step : 6.9866  learning_rate : 0.00022126906  loss_scaler : 32768 
DLL 2020-11-27 03:24:51.544360 - Iteration: 697  throughput_train : 262.624 seq/s mlm_loss : 7.0649  nsp_loss : 0.0000  total_loss : 7.0649  avg_loss_step : 7.0649  learning_rate : 0.00022090721  loss_scaler : 32768 
DLL 2020-11-27 03:24:53.496310 - Iteration: 698  throughput_train : 262.342 seq/s mlm_loss : 6.9949  nsp_loss : 0.0000  total_loss : 6.9949  avg_loss_step : 6.9949  learning_rate : 0.00022054477  loss_scaler : 32768 
DLL 2020-11-27 03:24:55.442191 - Iteration: 699  throughput_train : 263.160 seq/s mlm_loss : 7.2054  nsp_loss : 0.0000  total_loss : 7.2054  avg_loss_step : 7.2054  learning_rate : 0.00022018173  loss_scaler : 32768 
DLL 2020-11-27 03:24:57.388667 - Iteration: 700  throughput_train : 263.079 seq/s mlm_loss : 7.0942  nsp_loss : 0.0000  total_loss : 7.0942  avg_loss_step : 7.0942  learning_rate : 0.00021981809  loss_scaler : 32768 
DLL 2020-11-27 03:24:59.338946 - Iteration: 701  throughput_train : 262.565 seq/s mlm_loss : 6.9557  nsp_loss : 0.0000  total_loss : 6.9557  avg_loss_step : 6.9557  learning_rate : 0.00021945382  loss_scaler : 32768 
DLL 2020-11-27 03:25:01.286580 - Iteration: 702  throughput_train : 262.924 seq/s mlm_loss : 6.8414  nsp_loss : 0.0000  total_loss : 6.8414  avg_loss_step : 6.8414  learning_rate : 0.00021908901  loss_scaler : 32768 
DLL 2020-11-27 03:25:03.242617 - Iteration: 703  throughput_train : 261.793 seq/s mlm_loss : 6.8979  nsp_loss : 0.0000  total_loss : 6.8979  avg_loss_step : 6.8979  learning_rate : 0.00021872355  loss_scaler : 32768 
DLL 2020-11-27 03:25:05.183694 - Iteration: 704  throughput_train : 263.814 seq/s mlm_loss : 7.0620  nsp_loss : 0.0000  total_loss : 7.0620  avg_loss_step : 7.0620  learning_rate : 0.00021835748  loss_scaler : 32768 
DLL 2020-11-27 03:25:07.137410 - Iteration: 705  throughput_train : 262.115 seq/s mlm_loss : 7.0214  nsp_loss : 0.0000  total_loss : 7.0214  avg_loss_step : 7.0214  learning_rate : 0.00021799082  loss_scaler : 32768 
DLL 2020-11-27 03:25:09.087071 - Iteration: 706  throughput_train : 262.659 seq/s mlm_loss : 7.1363  nsp_loss : 0.0000  total_loss : 7.1363  avg_loss_step : 7.1363  learning_rate : 0.00021762348  loss_scaler : 32768 
DLL 2020-11-27 03:25:11.033028 - Iteration: 707  throughput_train : 263.148 seq/s mlm_loss : 7.0697  nsp_loss : 0.0000  total_loss : 7.0697  avg_loss_step : 7.0697  learning_rate : 0.00021725558  loss_scaler : 32768 
DLL 2020-11-27 03:25:12.985814 - Iteration: 708  throughput_train : 262.229 seq/s mlm_loss : 6.9899  nsp_loss : 0.0000  total_loss : 6.9899  avg_loss_step : 6.9899  learning_rate : 0.00021688704  loss_scaler : 32768 
DLL 2020-11-27 03:25:14.935678 - Iteration: 709  throughput_train : 262.621 seq/s mlm_loss : 7.1026  nsp_loss : 0.0000  total_loss : 7.1026  avg_loss_step : 7.1026  learning_rate : 0.0002165179  loss_scaler : 32768 
DLL 2020-11-27 03:25:16.877306 - Iteration: 710  throughput_train : 263.735 seq/s mlm_loss : 7.2044  nsp_loss : 0.0000  total_loss : 7.2044  avg_loss_step : 7.2044  learning_rate : 0.00021614808  loss_scaler : 32768 
DLL 2020-11-27 03:25:18.825260 - Iteration: 711  throughput_train : 262.883 seq/s mlm_loss : 7.0509  nsp_loss : 0.0000  total_loss : 7.0509  avg_loss_step : 7.0509  learning_rate : 0.00021577763  loss_scaler : 32768 
DLL 2020-11-27 03:25:20.762617 - Iteration: 712  throughput_train : 264.328 seq/s mlm_loss : 6.9376  nsp_loss : 0.0000  total_loss : 6.9376  avg_loss_step : 6.9376  learning_rate : 0.00021540657  loss_scaler : 32768 
DLL 2020-11-27 03:25:22.699380 - Iteration: 713  throughput_train : 264.398 seq/s mlm_loss : 6.9922  nsp_loss : 0.0000  total_loss : 6.9922  avg_loss_step : 6.9922  learning_rate : 0.00021503486  loss_scaler : 32768 
DLL 2020-11-27 03:25:24.644235 - Iteration: 714  throughput_train : 263.304 seq/s mlm_loss : 7.0038  nsp_loss : 0.0000  total_loss : 7.0038  avg_loss_step : 7.0038  learning_rate : 0.00021466252  loss_scaler : 32768 
DLL 2020-11-27 03:25:26.585650 - Iteration: 715  throughput_train : 263.778 seq/s mlm_loss : 7.0077  nsp_loss : 0.0000  total_loss : 7.0077  avg_loss_step : 7.0077  learning_rate : 0.0002142895  loss_scaler : 32768 
DLL 2020-11-27 03:25:28.523752 - Iteration: 716  throughput_train : 264.224 seq/s mlm_loss : 6.9868  nsp_loss : 0.0000  total_loss : 6.9868  avg_loss_step : 6.9868  learning_rate : 0.00021391585  loss_scaler : 32768 
DLL 2020-11-27 03:25:30.470348 - Iteration: 717  throughput_train : 263.062 seq/s mlm_loss : 7.0199  nsp_loss : 0.0000  total_loss : 7.0199  avg_loss_step : 7.0199  learning_rate : 0.00021354156  loss_scaler : 32768 
DLL 2020-11-27 03:25:32.414807 - Iteration: 718  throughput_train : 263.353 seq/s mlm_loss : 6.9882  nsp_loss : 0.0000  total_loss : 6.9883  avg_loss_step : 6.9883  learning_rate : 0.00021316658  loss_scaler : 32768 
DLL 2020-11-27 03:25:34.358761 - Iteration: 719  throughput_train : 263.420 seq/s mlm_loss : 7.0232  nsp_loss : 0.0000  total_loss : 7.0232  avg_loss_step : 7.0232  learning_rate : 0.00021279095  loss_scaler : 32768 
DLL 2020-11-27 03:25:36.316535 - Iteration: 720  throughput_train : 261.562 seq/s mlm_loss : 6.9758  nsp_loss : 0.0000  total_loss : 6.9758  avg_loss_step : 6.9758  learning_rate : 0.00021241467  loss_scaler : 32768 
DLL 2020-11-27 03:25:38.260948 - Iteration: 721  throughput_train : 263.358 seq/s mlm_loss : 6.9860  nsp_loss : 0.0000  total_loss : 6.9860  avg_loss_step : 6.9860  learning_rate : 0.0002120377  loss_scaler : 32768 
DLL 2020-11-27 03:25:40.203508 - Iteration: 722  throughput_train : 263.611 seq/s mlm_loss : 6.7907  nsp_loss : 0.0000  total_loss : 6.7907  avg_loss_step : 6.7907  learning_rate : 0.0002116601  loss_scaler : 32768 
DLL 2020-11-27 03:25:42.149083 - Iteration: 723  throughput_train : 263.213 seq/s mlm_loss : 6.9649  nsp_loss : 0.0000  total_loss : 6.9649  avg_loss_step : 6.9649  learning_rate : 0.00021128179  loss_scaler : 32768 
DLL 2020-11-27 03:25:44.099758 - Iteration: 724  throughput_train : 262.521 seq/s mlm_loss : 7.0133  nsp_loss : 0.0000  total_loss : 7.0133  avg_loss_step : 7.0133  learning_rate : 0.00021090278  loss_scaler : 32768 
DLL 2020-11-27 03:25:46.039909 - Iteration: 725  throughput_train : 263.936 seq/s mlm_loss : 7.0436  nsp_loss : 0.0000  total_loss : 7.0436  avg_loss_step : 7.0436  learning_rate : 0.00021052313  loss_scaler : 32768 
DLL 2020-11-27 03:25:47.978228 - Iteration: 726  throughput_train : 264.188 seq/s mlm_loss : 6.9242  nsp_loss : 0.0000  total_loss : 6.9242  avg_loss_step : 6.9242  learning_rate : 0.0002101428  loss_scaler : 32768 
DLL 2020-11-27 03:25:49.922639 - Iteration: 727  throughput_train : 263.369 seq/s mlm_loss : 6.9340  nsp_loss : 0.0000  total_loss : 6.9340  avg_loss_step : 6.9340  learning_rate : 0.00020976175  loss_scaler : 32768 
DLL 2020-11-27 03:25:51.869811 - Iteration: 728  throughput_train : 262.997 seq/s mlm_loss : 6.9410  nsp_loss : 0.0000  total_loss : 6.9410  avg_loss_step : 6.9410  learning_rate : 0.00020938003  loss_scaler : 32768 
DLL 2020-11-27 03:25:53.814381 - Iteration: 729  throughput_train : 263.348 seq/s mlm_loss : 6.9553  nsp_loss : 0.0000  total_loss : 6.9553  avg_loss_step : 6.9553  learning_rate : 0.00020899758  loss_scaler : 32768 
DLL 2020-11-27 03:25:55.752746 - Iteration: 730  throughput_train : 264.190 seq/s mlm_loss : 7.0088  nsp_loss : 0.0000  total_loss : 7.0088  avg_loss_step : 7.0088  learning_rate : 0.00020861447  loss_scaler : 32768 
DLL 2020-11-27 03:25:57.696858 - Iteration: 731  throughput_train : 263.399 seq/s mlm_loss : 7.0216  nsp_loss : 0.0000  total_loss : 7.0216  avg_loss_step : 7.0216  learning_rate : 0.00020823063  loss_scaler : 32768 
DLL 2020-11-27 03:25:59.646614 - Iteration: 732  throughput_train : 262.636 seq/s mlm_loss : 6.9938  nsp_loss : 0.0000  total_loss : 6.9938  avg_loss_step : 6.9938  learning_rate : 0.00020784608  loss_scaler : 32768 
DLL 2020-11-27 03:26:01.595195 - Iteration: 733  throughput_train : 262.794 seq/s mlm_loss : 7.1580  nsp_loss : 0.0000  total_loss : 7.1580  avg_loss_step : 7.1580  learning_rate : 0.00020746082  loss_scaler : 32768 
DLL 2020-11-27 03:26:03.542897 - Iteration: 734  throughput_train : 262.913 seq/s mlm_loss : 7.0598  nsp_loss : 0.0000  total_loss : 7.0598  avg_loss_step : 7.0598  learning_rate : 0.00020707485  loss_scaler : 32768 
DLL 2020-11-27 03:26:05.489226 - Iteration: 735  throughput_train : 263.100 seq/s mlm_loss : 7.1554  nsp_loss : 0.0000  total_loss : 7.1554  avg_loss_step : 7.1554  learning_rate : 0.00020668816  loss_scaler : 32768 
DLL 2020-11-27 03:26:07.436096 - Iteration: 736  throughput_train : 263.036 seq/s mlm_loss : 6.9996  nsp_loss : 0.0000  total_loss : 6.9996  avg_loss_step : 6.9996  learning_rate : 0.00020630073  loss_scaler : 32768 
DLL 2020-11-27 03:26:09.380758 - Iteration: 737  throughput_train : 263.322 seq/s mlm_loss : 7.1907  nsp_loss : 0.0000  total_loss : 7.1907  avg_loss_step : 7.1907  learning_rate : 0.00020591258  loss_scaler : 32768 
DLL 2020-11-27 03:26:11.330650 - Iteration: 738  throughput_train : 262.617 seq/s mlm_loss : 6.9549  nsp_loss : 0.0000  total_loss : 6.9549  avg_loss_step : 6.9549  learning_rate : 0.0002055237  loss_scaler : 32768 
DLL 2020-11-27 03:26:13.274491 - Iteration: 739  throughput_train : 263.434 seq/s mlm_loss : 7.0746  nsp_loss : 0.0000  total_loss : 7.0746  avg_loss_step : 7.0746  learning_rate : 0.00020513407  loss_scaler : 32768 
DLL 2020-11-27 03:26:15.222379 - Iteration: 740  throughput_train : 262.888 seq/s mlm_loss : 7.0119  nsp_loss : 0.0000  total_loss : 7.0119  avg_loss_step : 7.0119  learning_rate : 0.00020474372  loss_scaler : 32768 
DLL 2020-11-27 03:26:17.167334 - Iteration: 741  throughput_train : 263.285 seq/s mlm_loss : 6.9504  nsp_loss : 0.0000  total_loss : 6.9504  avg_loss_step : 6.9504  learning_rate : 0.00020435262  loss_scaler : 32768 
DLL 2020-11-27 03:26:19.113672 - Iteration: 742  throughput_train : 263.099 seq/s mlm_loss : 7.0823  nsp_loss : 0.0000  total_loss : 7.0823  avg_loss_step : 7.0823  learning_rate : 0.00020396076  loss_scaler : 32768 
DLL 2020-11-27 03:26:21.058406 - Iteration: 743  throughput_train : 263.316 seq/s mlm_loss : 7.1353  nsp_loss : 0.0000  total_loss : 7.1353  avg_loss_step : 7.1353  learning_rate : 0.00020356814  loss_scaler : 32768 
DLL 2020-11-27 03:26:23.004111 - Iteration: 744  throughput_train : 263.185 seq/s mlm_loss : 6.9096  nsp_loss : 0.0000  total_loss : 6.9096  avg_loss_step : 6.9096  learning_rate : 0.00020317477  loss_scaler : 32768 
DLL 2020-11-27 03:26:24.962059 - Iteration: 745  throughput_train : 261.537 seq/s mlm_loss : 6.9148  nsp_loss : 0.0000  total_loss : 6.9149  avg_loss_step : 6.9149  learning_rate : 0.00020278065  loss_scaler : 32768 
DLL 2020-11-27 03:26:26.906694 - Iteration: 746  throughput_train : 263.329 seq/s mlm_loss : 7.0222  nsp_loss : 0.0000  total_loss : 7.0222  avg_loss_step : 7.0222  learning_rate : 0.00020238575  loss_scaler : 32768 
DLL 2020-11-27 03:26:28.849332 - Iteration: 747  throughput_train : 263.599 seq/s mlm_loss : 7.0976  nsp_loss : 0.0000  total_loss : 7.0976  avg_loss_step : 7.0976  learning_rate : 0.00020199007  loss_scaler : 32768 
DLL 2020-11-27 03:26:30.797783 - Iteration: 748  throughput_train : 262.814 seq/s mlm_loss : 7.0411  nsp_loss : 0.0000  total_loss : 7.0411  avg_loss_step : 7.0411  learning_rate : 0.00020159363  loss_scaler : 32768 
DLL 2020-11-27 03:26:32.741763 - Iteration: 749  throughput_train : 263.418 seq/s mlm_loss : 7.0408  nsp_loss : 0.0000  total_loss : 7.0408  avg_loss_step : 7.0408  learning_rate : 0.00020119641  loss_scaler : 32768 
DLL 2020-11-27 03:26:34.687990 - Iteration: 750  throughput_train : 263.113 seq/s mlm_loss : 7.0200  nsp_loss : 0.0000  total_loss : 7.0200  avg_loss_step : 7.0200  learning_rate : 0.00020079839  loss_scaler : 32768 
DLL 2020-11-27 03:26:36.635163 - Iteration: 751  throughput_train : 262.985 seq/s mlm_loss : 7.0793  nsp_loss : 0.0000  total_loss : 7.0793  avg_loss_step : 7.0793  learning_rate : 0.00020039959  loss_scaler : 32768 
DLL 2020-11-27 03:26:38.587543 - Iteration: 752  throughput_train : 262.290 seq/s mlm_loss : 7.0888  nsp_loss : 0.0000  total_loss : 7.0889  avg_loss_step : 7.0889  learning_rate : 0.00019999997  loss_scaler : 32768 
DLL 2020-11-27 03:26:40.541880 - Iteration: 753  throughput_train : 262.033 seq/s mlm_loss : 6.9689  nsp_loss : 0.0000  total_loss : 6.9689  avg_loss_step : 6.9689  learning_rate : 0.00019959957  loss_scaler : 32768 
DLL 2020-11-27 03:26:42.490621 - Iteration: 754  throughput_train : 262.782 seq/s mlm_loss : 7.0568  nsp_loss : 0.0000  total_loss : 7.0568  avg_loss_step : 7.0568  learning_rate : 0.00019919837  loss_scaler : 32768 
DLL 2020-11-27 03:26:44.431352 - Iteration: 755  throughput_train : 263.880 seq/s mlm_loss : 6.9777  nsp_loss : 0.0000  total_loss : 6.9777  avg_loss_step : 6.9777  learning_rate : 0.00019879636  loss_scaler : 32768 
DLL 2020-11-27 03:26:46.381201 - Iteration: 756  throughput_train : 262.632 seq/s mlm_loss : 6.8878  nsp_loss : 0.0000  total_loss : 6.8878  avg_loss_step : 6.8878  learning_rate : 0.00019839354  loss_scaler : 32768 
DLL 2020-11-27 03:26:48.329564 - Iteration: 757  throughput_train : 262.851 seq/s mlm_loss : 6.9815  nsp_loss : 0.0000  total_loss : 6.9815  avg_loss_step : 6.9815  learning_rate : 0.00019798988  loss_scaler : 32768 
DLL 2020-11-27 03:26:50.263999 - Iteration: 758  throughput_train : 264.735 seq/s mlm_loss : 6.8626  nsp_loss : 0.0000  total_loss : 6.8626  avg_loss_step : 6.8626  learning_rate : 0.0001975854  loss_scaler : 32768 
DLL 2020-11-27 03:26:52.215179 - Iteration: 759  throughput_train : 262.447 seq/s mlm_loss : 6.8865  nsp_loss : 0.0000  total_loss : 6.8865  avg_loss_step : 6.8865  learning_rate : 0.0001971801  loss_scaler : 32768 
DLL 2020-11-27 03:26:54.158885 - Iteration: 760  throughput_train : 263.468 seq/s mlm_loss : 6.9496  nsp_loss : 0.0000  total_loss : 6.9496  avg_loss_step : 6.9496  learning_rate : 0.00019677397  loss_scaler : 32768 
DLL 2020-11-27 03:26:56.107146 - Iteration: 761  throughput_train : 262.842 seq/s mlm_loss : 7.0728  nsp_loss : 0.0000  total_loss : 7.0728  avg_loss_step : 7.0728  learning_rate : 0.00019636698  loss_scaler : 32768 
DLL 2020-11-27 03:26:58.057779 - Iteration: 762  throughput_train : 262.534 seq/s mlm_loss : 7.1621  nsp_loss : 0.0000  total_loss : 7.1621  avg_loss_step : 7.1621  learning_rate : 0.00019595916  loss_scaler : 32768 
DLL 2020-11-27 03:26:59.999778 - Iteration: 763  throughput_train : 263.701 seq/s mlm_loss : 7.0799  nsp_loss : 0.0000  total_loss : 7.0799  avg_loss_step : 7.0799  learning_rate : 0.00019555048  loss_scaler : 32768 
DLL 2020-11-27 03:27:01.948907 - Iteration: 764  throughput_train : 262.733 seq/s mlm_loss : 6.9233  nsp_loss : 0.0000  total_loss : 6.9233  avg_loss_step : 6.9233  learning_rate : 0.00019514096  loss_scaler : 32768 
DLL 2020-11-27 03:27:03.891286 - Iteration: 765  throughput_train : 263.634 seq/s mlm_loss : 6.9289  nsp_loss : 0.0000  total_loss : 6.9289  avg_loss_step : 6.9289  learning_rate : 0.00019473057  loss_scaler : 32768 
DLL 2020-11-27 03:27:05.837773 - Iteration: 766  throughput_train : 263.078 seq/s mlm_loss : 6.9259  nsp_loss : 0.0000  total_loss : 6.9259  avg_loss_step : 6.9259  learning_rate : 0.00019431929  loss_scaler : 32768 
DLL 2020-11-27 03:27:07.783792 - Iteration: 767  throughput_train : 263.141 seq/s mlm_loss : 7.0267  nsp_loss : 0.0000  total_loss : 7.0267  avg_loss_step : 7.0267  learning_rate : 0.00019390717  loss_scaler : 32768 
DLL 2020-11-27 03:27:09.730431 - Iteration: 768  throughput_train : 263.057 seq/s mlm_loss : 6.9713  nsp_loss : 0.0000  total_loss : 6.9713  avg_loss_step : 6.9713  learning_rate : 0.00019349417  loss_scaler : 32768 
DLL 2020-11-27 03:27:11.672904 - Iteration: 769  throughput_train : 263.623 seq/s mlm_loss : 7.0375  nsp_loss : 0.0000  total_loss : 7.0375  avg_loss_step : 7.0375  learning_rate : 0.00019308028  loss_scaler : 32768 
DLL 2020-11-27 03:27:13.620243 - Iteration: 770  throughput_train : 262.962 seq/s mlm_loss : 6.9546  nsp_loss : 0.0000  total_loss : 6.9546  avg_loss_step : 6.9546  learning_rate : 0.0001926655  loss_scaler : 32768 
DLL 2020-11-27 03:27:15.569877 - Iteration: 771  throughput_train : 262.657 seq/s mlm_loss : 7.0779  nsp_loss : 0.0000  total_loss : 7.0779  avg_loss_step : 7.0779  learning_rate : 0.0001922498  loss_scaler : 32768 
DLL 2020-11-27 03:27:17.515130 - Iteration: 772  throughput_train : 263.247 seq/s mlm_loss : 7.0709  nsp_loss : 0.0000  total_loss : 7.0709  avg_loss_step : 7.0709  learning_rate : 0.00019183324  loss_scaler : 32768 
DLL 2020-11-27 03:27:19.465553 - Iteration: 773  throughput_train : 262.561 seq/s mlm_loss : 7.1586  nsp_loss : 0.0000  total_loss : 7.1586  avg_loss_step : 7.1586  learning_rate : 0.00019141576  loss_scaler : 32768 
DLL 2020-11-27 03:27:21.411975 - Iteration: 774  throughput_train : 263.096 seq/s mlm_loss : 7.1291  nsp_loss : 0.0000  total_loss : 7.1291  avg_loss_step : 7.1291  learning_rate : 0.00019099737  loss_scaler : 32768 
DLL 2020-11-27 03:27:23.359734 - Iteration: 775  throughput_train : 262.906 seq/s mlm_loss : 7.0225  nsp_loss : 0.0000  total_loss : 7.0225  avg_loss_step : 7.0225  learning_rate : 0.00019057804  loss_scaler : 32768 
DLL 2020-11-27 03:27:25.306435 - Iteration: 776  throughput_train : 263.050 seq/s mlm_loss : 6.7556  nsp_loss : 0.0000  total_loss : 6.7556  avg_loss_step : 6.7556  learning_rate : 0.0001901578  loss_scaler : 32768 
DLL 2020-11-27 03:27:27.255866 - Iteration: 777  throughput_train : 262.692 seq/s mlm_loss : 6.8302  nsp_loss : 0.0000  total_loss : 6.8302  avg_loss_step : 6.8302  learning_rate : 0.00018973663  loss_scaler : 32768 
DLL 2020-11-27 03:27:29.196414 - Iteration: 778  throughput_train : 263.883 seq/s mlm_loss : 6.9484  nsp_loss : 0.0000  total_loss : 6.9484  avg_loss_step : 6.9484  learning_rate : 0.00018931454  loss_scaler : 32768 
DLL 2020-11-27 03:27:31.139501 - Iteration: 779  throughput_train : 263.540 seq/s mlm_loss : 6.8814  nsp_loss : 0.0000  total_loss : 6.8814  avg_loss_step : 6.8814  learning_rate : 0.00018889148  loss_scaler : 32768 
DLL 2020-11-27 03:27:33.081837 - Iteration: 780  throughput_train : 263.644 seq/s mlm_loss : 6.8018  nsp_loss : 0.0000  total_loss : 6.8018  avg_loss_step : 6.8018  learning_rate : 0.00018846747  loss_scaler : 32768 
DLL 2020-11-27 03:27:35.029407 - Iteration: 781  throughput_train : 262.931 seq/s mlm_loss : 6.9526  nsp_loss : 0.0000  total_loss : 6.9526  avg_loss_step : 6.9526  learning_rate : 0.00018804253  loss_scaler : 32768 
DLL 2020-11-27 03:27:36.981657 - Iteration: 782  throughput_train : 262.303 seq/s mlm_loss : 6.9237  nsp_loss : 0.0000  total_loss : 6.9237  avg_loss_step : 6.9237  learning_rate : 0.00018761662  loss_scaler : 32768 
DLL 2020-11-27 03:27:38.927913 - Iteration: 783  throughput_train : 263.109 seq/s mlm_loss : 6.8622  nsp_loss : 0.0000  total_loss : 6.8622  avg_loss_step : 6.8622  learning_rate : 0.00018718973  loss_scaler : 32768 
DLL 2020-11-27 03:27:40.864483 - Iteration: 784  throughput_train : 264.426 seq/s mlm_loss : 6.9765  nsp_loss : 0.0000  total_loss : 6.9765  avg_loss_step : 6.9765  learning_rate : 0.00018676186  loss_scaler : 32768 
DLL 2020-11-27 03:27:42.811028 - Iteration: 785  throughput_train : 263.071 seq/s mlm_loss : 7.0138  nsp_loss : 0.0000  total_loss : 7.0138  avg_loss_step : 7.0138  learning_rate : 0.00018633301  loss_scaler : 32768 
DLL 2020-11-27 03:27:44.760318 - Iteration: 786  throughput_train : 262.700 seq/s mlm_loss : 6.9907  nsp_loss : 0.0000  total_loss : 6.9907  avg_loss_step : 6.9907  learning_rate : 0.00018590318  loss_scaler : 32768 
DLL 2020-11-27 03:27:46.698962 - Iteration: 787  throughput_train : 264.144 seq/s mlm_loss : 7.2020  nsp_loss : 0.0000  total_loss : 7.2020  avg_loss_step : 7.2020  learning_rate : 0.00018547235  loss_scaler : 32768 
DLL 2020-11-27 03:27:48.639566 - Iteration: 788  throughput_train : 263.878 seq/s mlm_loss : 7.0324  nsp_loss : 0.0000  total_loss : 7.0324  avg_loss_step : 7.0324  learning_rate : 0.00018504052  loss_scaler : 32768 
DLL 2020-11-27 03:27:50.595212 - Iteration: 789  throughput_train : 261.852 seq/s mlm_loss : 7.0646  nsp_loss : 0.0000  total_loss : 7.0646  avg_loss_step : 7.0646  learning_rate : 0.00018460766  loss_scaler : 32768 
DLL 2020-11-27 03:27:52.536775 - Iteration: 790  throughput_train : 263.775 seq/s mlm_loss : 7.1055  nsp_loss : 0.0000  total_loss : 7.1055  avg_loss_step : 7.1055  learning_rate : 0.00018417381  loss_scaler : 32768 
DLL 2020-11-27 03:27:54.472441 - Iteration: 791  throughput_train : 264.576 seq/s mlm_loss : 6.8430  nsp_loss : 0.0000  total_loss : 6.8430  avg_loss_step : 6.8430  learning_rate : 0.00018373893  loss_scaler : 32768 
DLL 2020-11-27 03:27:56.417840 - Iteration: 792  throughput_train : 263.246 seq/s mlm_loss : 6.8850  nsp_loss : 0.0000  total_loss : 6.8850  avg_loss_step : 6.8850  learning_rate : 0.00018330301  loss_scaler : 32768 
DLL 2020-11-27 03:27:58.363681 - Iteration: 793  throughput_train : 263.168 seq/s mlm_loss : 6.9649  nsp_loss : 0.0000  total_loss : 6.9649  avg_loss_step : 6.9649  learning_rate : 0.00018286608  loss_scaler : 32768 
DLL 2020-11-27 03:28:00.312285 - Iteration: 794  throughput_train : 262.803 seq/s mlm_loss : 6.8303  nsp_loss : 0.0000  total_loss : 6.8303  avg_loss_step : 6.8303  learning_rate : 0.00018242803  loss_scaler : 32768 
DLL 2020-11-27 03:28:02.247975 - Iteration: 795  throughput_train : 264.550 seq/s mlm_loss : 6.7059  nsp_loss : 0.0000  total_loss : 6.7059  avg_loss_step : 6.7059  learning_rate : 0.00018198899  loss_scaler : 32768 
DLL 2020-11-27 03:28:04.189328 - Iteration: 796  throughput_train : 263.784 seq/s mlm_loss : 6.8815  nsp_loss : 0.0000  total_loss : 6.8815  avg_loss_step : 6.8815  learning_rate : 0.00018154888  loss_scaler : 32768 
DLL 2020-11-27 03:28:06.123885 - Iteration: 797  throughput_train : 264.703 seq/s mlm_loss : 6.9717  nsp_loss : 0.0000  total_loss : 6.9717  avg_loss_step : 6.9717  learning_rate : 0.0001811077  loss_scaler : 32768 
DLL 2020-11-27 03:28:08.074127 - Iteration: 798  throughput_train : 262.572 seq/s mlm_loss : 7.0314  nsp_loss : 0.0000  total_loss : 7.0314  avg_loss_step : 7.0314  learning_rate : 0.0001806654  loss_scaler : 32768 
DLL 2020-11-27 03:28:10.030905 - Iteration: 799  throughput_train : 261.695 seq/s mlm_loss : 6.9870  nsp_loss : 0.0000  total_loss : 6.9870  avg_loss_step : 6.9870  learning_rate : 0.00018022205  loss_scaler : 32768 
DLL 2020-11-27 03:28:11.967739 - Iteration: 800  throughput_train : 264.390 seq/s mlm_loss : 7.0242  nsp_loss : 0.0000  total_loss : 7.0242  avg_loss_step : 7.0242  learning_rate : 0.00017977762  loss_scaler : 32768 
DLL 2020-11-27 03:28:13.912341 - Iteration: 801  throughput_train : 263.333 seq/s mlm_loss : 6.9614  nsp_loss : 0.0000  total_loss : 6.9614  avg_loss_step : 6.9614  learning_rate : 0.00017933207  loss_scaler : 32768 
DLL 2020-11-27 03:28:15.851202 - Iteration: 802  throughput_train : 264.119 seq/s mlm_loss : 6.8788  nsp_loss : 0.0000  total_loss : 6.8788  avg_loss_step : 6.8788  learning_rate : 0.00017888543  loss_scaler : 32768 
DLL 2020-11-27 03:28:17.798613 - Iteration: 803  throughput_train : 262.958 seq/s mlm_loss : 6.9172  nsp_loss : 0.0000  total_loss : 6.9172  avg_loss_step : 6.9172  learning_rate : 0.00017843764  loss_scaler : 32768 
DLL 2020-11-27 03:28:19.743369 - Iteration: 804  throughput_train : 263.330 seq/s mlm_loss : 6.8688  nsp_loss : 0.0000  total_loss : 6.8688  avg_loss_step : 6.8688  learning_rate : 0.00017798874  loss_scaler : 32768 
DLL 2020-11-27 03:28:21.686579 - Iteration: 805  throughput_train : 263.541 seq/s mlm_loss : 7.0042  nsp_loss : 0.0000  total_loss : 7.0042  avg_loss_step : 7.0042  learning_rate : 0.0001775387  loss_scaler : 32768 
DLL 2020-11-27 03:28:23.630887 - Iteration: 806  throughput_train : 263.374 seq/s mlm_loss : 6.9142  nsp_loss : 0.0000  total_loss : 6.9142  avg_loss_step : 6.9142  learning_rate : 0.00017708754  loss_scaler : 32768 
DLL 2020-11-27 03:28:25.575169 - Iteration: 807  throughput_train : 263.379 seq/s mlm_loss : 6.9277  nsp_loss : 0.0000  total_loss : 6.9277  avg_loss_step : 6.9277  learning_rate : 0.00017663518  loss_scaler : 32768 
DLL 2020-11-27 03:28:27.523783 - Iteration: 808  throughput_train : 262.791 seq/s mlm_loss : 7.0044  nsp_loss : 0.0000  total_loss : 7.0044  avg_loss_step : 7.0044  learning_rate : 0.0001761817  loss_scaler : 32768 
DLL 2020-11-27 03:28:29.462463 - Iteration: 809  throughput_train : 264.140 seq/s mlm_loss : 7.1039  nsp_loss : 0.0000  total_loss : 7.1039  avg_loss_step : 7.1039  learning_rate : 0.00017572704  loss_scaler : 32768 
DLL 2020-11-27 03:28:31.404054 - Iteration: 810  throughput_train : 263.742 seq/s mlm_loss : 6.9743  nsp_loss : 0.0000  total_loss : 6.9743  avg_loss_step : 6.9743  learning_rate : 0.0001752712  loss_scaler : 32768 
DLL 2020-11-27 03:28:33.349908 - Iteration: 811  throughput_train : 263.163 seq/s mlm_loss : 6.9784  nsp_loss : 0.0000  total_loss : 6.9784  avg_loss_step : 6.9784  learning_rate : 0.00017481417  loss_scaler : 32768 
DLL 2020-11-27 03:28:35.301479 - Iteration: 812  throughput_train : 262.393 seq/s mlm_loss : 7.0608  nsp_loss : 0.0000  total_loss : 7.0608  avg_loss_step : 7.0608  learning_rate : 0.00017435593  loss_scaler : 32768 
DLL 2020-11-27 03:28:37.235982 - Iteration: 813  throughput_train : 264.711 seq/s mlm_loss : 7.0819  nsp_loss : 0.0000  total_loss : 7.0819  avg_loss_step : 7.0819  learning_rate : 0.0001738965  loss_scaler : 32768 
DLL 2020-11-27 03:28:39.167683 - Iteration: 814  throughput_train : 265.091 seq/s mlm_loss : 7.2069  nsp_loss : 0.0000  total_loss : 7.2069  avg_loss_step : 7.2069  learning_rate : 0.00017343585  loss_scaler : 32768 
DLL 2020-11-27 03:28:41.114099 - Iteration: 815  throughput_train : 263.087 seq/s mlm_loss : 6.9523  nsp_loss : 0.0000  total_loss : 6.9523  avg_loss_step : 6.9523  learning_rate : 0.00017297397  loss_scaler : 32768 
DLL 2020-11-27 03:28:43.059582 - Iteration: 816  throughput_train : 263.214 seq/s mlm_loss : 7.0288  nsp_loss : 0.0000  total_loss : 7.0288  avg_loss_step : 7.0288  learning_rate : 0.00017251086  loss_scaler : 32768 
DLL 2020-11-27 03:28:45.004560 - Iteration: 817  throughput_train : 263.283 seq/s mlm_loss : 6.9679  nsp_loss : 0.0000  total_loss : 6.9679  avg_loss_step : 6.9679  learning_rate : 0.00017204648  loss_scaler : 32768 
DLL 2020-11-27 03:28:46.956909 - Iteration: 818  throughput_train : 262.288 seq/s mlm_loss : 6.8440  nsp_loss : 0.0000  total_loss : 6.8440  avg_loss_step : 6.8440  learning_rate : 0.00017158086  loss_scaler : 32768 
DLL 2020-11-27 03:28:48.907714 - Iteration: 819  throughput_train : 262.497 seq/s mlm_loss : 6.8333  nsp_loss : 0.0000  total_loss : 6.8333  avg_loss_step : 6.8333  learning_rate : 0.00017111398  loss_scaler : 32768 
DLL 2020-11-27 03:28:50.851101 - Iteration: 820  throughput_train : 263.498 seq/s mlm_loss : 6.8919  nsp_loss : 0.0000  total_loss : 6.8919  avg_loss_step : 6.8919  learning_rate : 0.00017064581  loss_scaler : 32768 
DLL 2020-11-27 03:28:52.798281 - Iteration: 821  throughput_train : 262.985 seq/s mlm_loss : 7.0756  nsp_loss : 0.0000  total_loss : 7.0756  avg_loss_step : 7.0756  learning_rate : 0.00017017635  loss_scaler : 32768 
DLL 2020-11-27 03:28:54.751387 - Iteration: 822  throughput_train : 262.189 seq/s mlm_loss : 7.0195  nsp_loss : 0.0000  total_loss : 7.0195  avg_loss_step : 7.0195  learning_rate : 0.0001697056  loss_scaler : 32768 
DLL 2020-11-27 03:28:56.695710 - Iteration: 823  throughput_train : 263.387 seq/s mlm_loss : 6.9612  nsp_loss : 0.0000  total_loss : 6.9612  avg_loss_step : 6.9612  learning_rate : 0.00016923355  loss_scaler : 32768 
DLL 2020-11-27 03:28:58.641915 - Iteration: 824  throughput_train : 263.134 seq/s mlm_loss : 6.8476  nsp_loss : 0.0000  total_loss : 6.8476  avg_loss_step : 6.8476  learning_rate : 0.00016876016  loss_scaler : 32768 
DLL 2020-11-27 03:29:00.590179 - Iteration: 825  throughput_train : 262.837 seq/s mlm_loss : 6.8095  nsp_loss : 0.0000  total_loss : 6.8095  avg_loss_step : 6.8095  learning_rate : 0.00016828546  loss_scaler : 32768 
DLL 2020-11-27 03:29:02.539913 - Iteration: 826  throughput_train : 262.640 seq/s mlm_loss : 6.9008  nsp_loss : 0.0000  total_loss : 6.9008  avg_loss_step : 6.9008  learning_rate : 0.00016780938  loss_scaler : 32768 
DLL 2020-11-27 03:29:04.499045 - Iteration: 827  throughput_train : 261.380 seq/s mlm_loss : 6.9234  nsp_loss : 0.0000  total_loss : 6.9234  avg_loss_step : 6.9234  learning_rate : 0.00016733198  loss_scaler : 32768 
DLL 2020-11-27 03:29:06.448263 - Iteration: 828  throughput_train : 262.710 seq/s mlm_loss : 6.8311  nsp_loss : 0.0000  total_loss : 6.8311  avg_loss_step : 6.8311  learning_rate : 0.0001668532  loss_scaler : 32768 
DLL 2020-11-27 03:29:08.398186 - Iteration: 829  throughput_train : 262.614 seq/s mlm_loss : 6.9833  nsp_loss : 0.0000  total_loss : 6.9833  avg_loss_step : 6.9833  learning_rate : 0.00016637305  loss_scaler : 32768 
DLL 2020-11-27 03:29:10.342656 - Iteration: 830  throughput_train : 263.351 seq/s mlm_loss : 6.8822  nsp_loss : 0.0000  total_loss : 6.8822  avg_loss_step : 6.8822  learning_rate : 0.00016589148  loss_scaler : 32768 
DLL 2020-11-27 03:29:12.286770 - Iteration: 831  throughput_train : 263.399 seq/s mlm_loss : 6.8550  nsp_loss : 0.0000  total_loss : 6.8550  avg_loss_step : 6.8550  learning_rate : 0.00016540856  loss_scaler : 32768 
DLL 2020-11-27 03:29:14.223740 - Iteration: 832  throughput_train : 264.370 seq/s mlm_loss : 6.9561  nsp_loss : 0.0000  total_loss : 6.9561  avg_loss_step : 6.9561  learning_rate : 0.0001649242  loss_scaler : 32768 
DLL 2020-11-27 03:29:16.165404 - Iteration: 833  throughput_train : 263.733 seq/s mlm_loss : 6.8544  nsp_loss : 0.0000  total_loss : 6.8544  avg_loss_step : 6.8544  learning_rate : 0.00016443842  loss_scaler : 32768 
DLL 2020-11-27 03:29:18.109233 - Iteration: 834  throughput_train : 263.440 seq/s mlm_loss : 6.9844  nsp_loss : 0.0000  total_loss : 6.9844  avg_loss_step : 6.9844  learning_rate : 0.0001639512  loss_scaler : 32768 
DLL 2020-11-27 03:29:20.050892 - Iteration: 835  throughput_train : 263.733 seq/s mlm_loss : 6.9159  nsp_loss : 0.0000  total_loss : 6.9159  avg_loss_step : 6.9159  learning_rate : 0.00016346251  loss_scaler : 32768 
DLL 2020-11-27 03:29:22.002353 - Iteration: 836  throughput_train : 262.408 seq/s mlm_loss : 6.9142  nsp_loss : 0.0000  total_loss : 6.9142  avg_loss_step : 6.9142  learning_rate : 0.00016297236  loss_scaler : 32768 
DLL 2020-11-27 03:29:23.959769 - Iteration: 837  throughput_train : 261.611 seq/s mlm_loss : 6.9673  nsp_loss : 0.0000  total_loss : 6.9673  avg_loss_step : 6.9673  learning_rate : 0.00016248075  loss_scaler : 32768 
DLL 2020-11-27 03:29:25.897218 - Iteration: 838  throughput_train : 264.305 seq/s mlm_loss : 7.0895  nsp_loss : 0.0000  total_loss : 7.0895  avg_loss_step : 7.0895  learning_rate : 0.00016198763  loss_scaler : 32768 
DLL 2020-11-27 03:29:27.847529 - Iteration: 839  throughput_train : 262.562 seq/s mlm_loss : 7.1113  nsp_loss : 0.0000  total_loss : 7.1113  avg_loss_step : 7.1113  learning_rate : 0.00016149302  loss_scaler : 32768 
DLL 2020-11-27 03:29:29.801018 - Iteration: 840  throughput_train : 262.138 seq/s mlm_loss : 6.9197  nsp_loss : 0.0000  total_loss : 6.9197  avg_loss_step : 6.9197  learning_rate : 0.00016099686  loss_scaler : 32768 
DLL 2020-11-27 03:29:31.752143 - Iteration: 841  throughput_train : 262.471 seq/s mlm_loss : 7.0030  nsp_loss : 0.0000  total_loss : 7.0030  avg_loss_step : 7.0030  learning_rate : 0.0001604992  loss_scaler : 32768 
DLL 2020-11-27 03:29:33.682008 - Iteration: 842  throughput_train : 265.358 seq/s mlm_loss : 6.9480  nsp_loss : 0.0000  total_loss : 6.9480  avg_loss_step : 6.9480  learning_rate : 0.00015999998  loss_scaler : 32768 
DLL 2020-11-27 03:29:35.622089 - Iteration: 843  throughput_train : 263.946 seq/s mlm_loss : 7.0357  nsp_loss : 0.0000  total_loss : 7.0357  avg_loss_step : 7.0357  learning_rate : 0.0001594992  loss_scaler : 32768 
DLL 2020-11-27 03:29:37.563073 - Iteration: 844  throughput_train : 263.824 seq/s mlm_loss : 6.8540  nsp_loss : 0.0000  total_loss : 6.8540  avg_loss_step : 6.8540  learning_rate : 0.00015899682  loss_scaler : 32768 
DLL 2020-11-27 03:29:39.514069 - Iteration: 845  throughput_train : 262.470 seq/s mlm_loss : 6.9167  nsp_loss : 0.0000  total_loss : 6.9167  avg_loss_step : 6.9167  learning_rate : 0.00015849287  loss_scaler : 32768 
DLL 2020-11-27 03:29:41.464709 - Iteration: 846  throughput_train : 262.522 seq/s mlm_loss : 6.9563  nsp_loss : 0.0000  total_loss : 6.9563  avg_loss_step : 6.9563  learning_rate : 0.00015798732  loss_scaler : 32768 
DLL 2020-11-27 03:29:43.403287 - Iteration: 847  throughput_train : 264.152 seq/s mlm_loss : 6.9096  nsp_loss : 0.0000  total_loss : 6.9096  avg_loss_step : 6.9096  learning_rate : 0.00015748014  loss_scaler : 32768 
DLL 2020-11-27 03:29:45.351202 - Iteration: 848  throughput_train : 262.885 seq/s mlm_loss : 7.0075  nsp_loss : 0.0000  total_loss : 7.0075  avg_loss_step : 7.0075  learning_rate : 0.00015697132  loss_scaler : 32768 
DLL 2020-11-27 03:29:47.299579 - Iteration: 849  throughput_train : 262.824 seq/s mlm_loss : 6.8904  nsp_loss : 0.0000  total_loss : 6.8904  avg_loss_step : 6.8904  learning_rate : 0.00015646082  loss_scaler : 32768 
DLL 2020-11-27 03:29:49.241340 - Iteration: 850  throughput_train : 263.718 seq/s mlm_loss : 6.9523  nsp_loss : 0.0000  total_loss : 6.9523  avg_loss_step : 6.9523  learning_rate : 0.00015594868  loss_scaler : 32768 
DLL 2020-11-27 03:29:51.191276 - Iteration: 851  throughput_train : 262.612 seq/s mlm_loss : 7.0171  nsp_loss : 0.0000  total_loss : 7.0171  avg_loss_step : 7.0171  learning_rate : 0.00015543486  loss_scaler : 32768 
DLL 2020-11-27 03:29:53.139418 - Iteration: 852  throughput_train : 262.854 seq/s mlm_loss : 6.9770  nsp_loss : 0.0000  total_loss : 6.9770  avg_loss_step : 6.9770  learning_rate : 0.00015491933  loss_scaler : 32768 
DLL 2020-11-27 03:29:55.088081 - Iteration: 853  throughput_train : 262.785 seq/s mlm_loss : 7.1312  nsp_loss : 0.0000  total_loss : 7.1312  avg_loss_step : 7.1312  learning_rate : 0.00015440206  loss_scaler : 32768 
DLL 2020-11-27 03:29:57.029673 - Iteration: 854  throughput_train : 263.740 seq/s mlm_loss : 6.9693  nsp_loss : 0.0000  total_loss : 6.9693  avg_loss_step : 6.9693  learning_rate : 0.00015388304  loss_scaler : 32768 
DLL 2020-11-27 03:29:58.971206 - Iteration: 855  throughput_train : 263.749 seq/s mlm_loss : 6.9006  nsp_loss : 0.0000  total_loss : 6.9006  avg_loss_step : 6.9006  learning_rate : 0.0001533623  loss_scaler : 32768 
DLL 2020-11-27 03:30:00.918501 - Iteration: 856  throughput_train : 262.968 seq/s mlm_loss : 6.8803  nsp_loss : 0.0000  total_loss : 6.8803  avg_loss_step : 6.8803  learning_rate : 0.00015283977  loss_scaler : 32768 
DLL 2020-11-27 03:30:02.868788 - Iteration: 857  throughput_train : 262.565 seq/s mlm_loss : 6.9488  nsp_loss : 0.0000  total_loss : 6.9488  avg_loss_step : 6.9488  learning_rate : 0.00015231545  loss_scaler : 32768 
DLL 2020-11-27 03:30:04.821727 - Iteration: 858  throughput_train : 262.208 seq/s mlm_loss : 7.1134  nsp_loss : 0.0000  total_loss : 7.1134  avg_loss_step : 7.1134  learning_rate : 0.0001517893  loss_scaler : 32768 
DLL 2020-11-27 03:30:06.777028 - Iteration: 859  throughput_train : 261.891 seq/s mlm_loss : 7.1815  nsp_loss : 0.0000  total_loss : 7.1815  avg_loss_step : 7.1815  learning_rate : 0.00015126132  loss_scaler : 32768 
DLL 2020-11-27 03:30:08.725621 - Iteration: 860  throughput_train : 262.793 seq/s mlm_loss : 7.1078  nsp_loss : 0.0000  total_loss : 7.1078  avg_loss_step : 7.1078  learning_rate : 0.00015073152  loss_scaler : 32768 
DLL 2020-11-27 03:30:10.663019 - Iteration: 861  throughput_train : 264.312 seq/s mlm_loss : 7.1144  nsp_loss : 0.0000  total_loss : 7.1144  avg_loss_step : 7.1144  learning_rate : 0.00015019985  loss_scaler : 32768 
DLL 2020-11-27 03:30:12.609102 - Iteration: 862  throughput_train : 263.133 seq/s mlm_loss : 6.9658  nsp_loss : 0.0000  total_loss : 6.9658  avg_loss_step : 6.9658  learning_rate : 0.00014966629  loss_scaler : 32768 
DLL 2020-11-27 03:30:14.552379 - Iteration: 863  throughput_train : 263.514 seq/s mlm_loss : 6.8953  nsp_loss : 0.0000  total_loss : 6.8953  avg_loss_step : 6.8953  learning_rate : 0.00014913078  loss_scaler : 32768 
DLL 2020-11-27 03:30:16.504053 - Iteration: 864  throughput_train : 262.385 seq/s mlm_loss : 6.9409  nsp_loss : 0.0000  total_loss : 6.9409  avg_loss_step : 6.9409  learning_rate : 0.00014859338  loss_scaler : 32768 
DLL 2020-11-27 03:30:18.444361 - Iteration: 865  throughput_train : 263.920 seq/s mlm_loss : 6.9798  nsp_loss : 0.0000  total_loss : 6.9798  avg_loss_step : 6.9798  learning_rate : 0.00014805402  loss_scaler : 32768 
DLL 2020-11-27 03:30:20.374200 - Iteration: 866  throughput_train : 265.364 seq/s mlm_loss : 6.7321  nsp_loss : 0.0000  total_loss : 6.7321  avg_loss_step : 6.7321  learning_rate : 0.00014751269  loss_scaler : 32768 
DLL 2020-11-27 03:30:22.320670 - Iteration: 867  throughput_train : 263.081 seq/s mlm_loss : 6.9625  nsp_loss : 0.0000  total_loss : 6.9625  avg_loss_step : 6.9625  learning_rate : 0.00014696934  loss_scaler : 32768 
DLL 2020-11-27 03:30:24.267699 - Iteration: 868  throughput_train : 263.009 seq/s mlm_loss : 7.0219  nsp_loss : 0.0000  total_loss : 7.0219  avg_loss_step : 7.0219  learning_rate : 0.000146424  loss_scaler : 32768 
DLL 2020-11-27 03:30:26.216193 - Iteration: 869  throughput_train : 262.807 seq/s mlm_loss : 6.9241  nsp_loss : 0.0000  total_loss : 6.9241  avg_loss_step : 6.9241  learning_rate : 0.00014587663  loss_scaler : 32768 
DLL 2020-11-27 03:30:28.164333 - Iteration: 870  throughput_train : 262.855 seq/s mlm_loss : 7.0425  nsp_loss : 0.0000  total_loss : 7.0425  avg_loss_step : 7.0425  learning_rate : 0.0001453272  loss_scaler : 32768 
DLL 2020-11-27 03:30:30.101536 - Iteration: 871  throughput_train : 264.345 seq/s mlm_loss : 7.0188  nsp_loss : 0.0000  total_loss : 7.0188  avg_loss_step : 7.0188  learning_rate : 0.00014477567  loss_scaler : 32768 
DLL 2020-11-27 03:30:32.051562 - Iteration: 872  throughput_train : 262.615 seq/s mlm_loss : 6.9611  nsp_loss : 0.0000  total_loss : 6.9611  avg_loss_step : 6.9611  learning_rate : 0.00014422202  loss_scaler : 32768 
DLL 2020-11-27 03:30:33.999095 - Iteration: 873  throughput_train : 262.938 seq/s mlm_loss : 7.0169  nsp_loss : 0.0000  total_loss : 7.0169  avg_loss_step : 7.0169  learning_rate : 0.00014366626  loss_scaler : 32768 
DLL 2020-11-27 03:30:35.941688 - Iteration: 874  throughput_train : 263.606 seq/s mlm_loss : 6.9855  nsp_loss : 0.0000  total_loss : 6.9855  avg_loss_step : 6.9855  learning_rate : 0.00014310832  loss_scaler : 32768 
DLL 2020-11-27 03:30:37.889674 - Iteration: 875  throughput_train : 262.876 seq/s mlm_loss : 6.7696  nsp_loss : 0.0000  total_loss : 6.7696  avg_loss_step : 6.7696  learning_rate : 0.00014254822  loss_scaler : 32768 
DLL 2020-11-27 03:30:39.842236 - Iteration: 876  throughput_train : 262.260 seq/s mlm_loss : 6.7091  nsp_loss : 0.0000  total_loss : 6.7091  avg_loss_step : 6.7091  learning_rate : 0.0001419859  loss_scaler : 32768 
DLL 2020-11-27 03:30:41.787213 - Iteration: 877  throughput_train : 263.291 seq/s mlm_loss : 7.0089  nsp_loss : 0.0000  total_loss : 7.0089  avg_loss_step : 7.0089  learning_rate : 0.00014142132  loss_scaler : 32768 
DLL 2020-11-27 03:30:43.738122 - Iteration: 878  throughput_train : 262.499 seq/s mlm_loss : 6.9753  nsp_loss : 0.0000  total_loss : 6.9753  avg_loss_step : 6.9753  learning_rate : 0.0001408545  loss_scaler : 32768 
DLL 2020-11-27 03:30:45.676014 - Iteration: 879  throughput_train : 264.247 seq/s mlm_loss : 6.9691  nsp_loss : 0.0000  total_loss : 6.9691  avg_loss_step : 6.9691  learning_rate : 0.00014028541  loss_scaler : 32768 
DLL 2020-11-27 03:30:47.621066 - Iteration: 880  throughput_train : 263.273 seq/s mlm_loss : 6.9666  nsp_loss : 0.0000  total_loss : 6.9666  avg_loss_step : 6.9666  learning_rate : 0.00013971397  loss_scaler : 32768 
DLL 2020-11-27 03:30:49.564009 - Iteration: 881  throughput_train : 263.559 seq/s mlm_loss : 6.9325  nsp_loss : 0.0000  total_loss : 6.9325  avg_loss_step : 6.9325  learning_rate : 0.00013914017  loss_scaler : 32768 
DLL 2020-11-27 03:30:51.512811 - Iteration: 882  throughput_train : 262.767 seq/s mlm_loss : 6.9320  nsp_loss : 0.0000  total_loss : 6.9320  avg_loss_step : 6.9320  learning_rate : 0.00013856404  loss_scaler : 32768 
DLL 2020-11-27 03:30:53.460503 - Iteration: 883  throughput_train : 262.923 seq/s mlm_loss : 7.0282  nsp_loss : 0.0000  total_loss : 7.0282  avg_loss_step : 7.0282  learning_rate : 0.00013798548  loss_scaler : 32768 
DLL 2020-11-27 03:30:55.400606 - Iteration: 884  throughput_train : 263.965 seq/s mlm_loss : 6.9080  nsp_loss : 0.0000  total_loss : 6.9080  avg_loss_step : 6.9080  learning_rate : 0.00013740448  loss_scaler : 32768 
DLL 2020-11-27 03:30:57.359322 - Iteration: 885  throughput_train : 261.435 seq/s mlm_loss : 6.8399  nsp_loss : 0.0000  total_loss : 6.8399  avg_loss_step : 6.8399  learning_rate : 0.00013682104  loss_scaler : 32768 
DLL 2020-11-27 03:30:59.302665 - Iteration: 886  throughput_train : 263.505 seq/s mlm_loss : 6.8415  nsp_loss : 0.0000  total_loss : 6.8415  avg_loss_step : 6.8415  learning_rate : 0.00013623506  loss_scaler : 32768 
DLL 2020-11-27 03:31:01.254166 - Iteration: 887  throughput_train : 262.401 seq/s mlm_loss : 6.8820  nsp_loss : 0.0000  total_loss : 6.8820  avg_loss_step : 6.8820  learning_rate : 0.00013564657  loss_scaler : 32768 
DLL 2020-11-27 03:31:03.195339 - Iteration: 888  throughput_train : 263.798 seq/s mlm_loss : 6.9256  nsp_loss : 0.0000  total_loss : 6.9256  avg_loss_step : 6.9256  learning_rate : 0.00013505551  loss_scaler : 32768 
DLL 2020-11-27 03:31:05.149884 - Iteration: 889  throughput_train : 261.994 seq/s mlm_loss : 6.8912  nsp_loss : 0.0000  total_loss : 6.8912  avg_loss_step : 6.8912  learning_rate : 0.00013446188  loss_scaler : 32768 
DLL 2020-11-27 03:31:07.099369 - Iteration: 890  throughput_train : 262.676 seq/s mlm_loss : 6.9530  nsp_loss : 0.0000  total_loss : 6.9530  avg_loss_step : 6.9530  learning_rate : 0.00013386556  loss_scaler : 32768 
DLL 2020-11-27 03:31:09.044788 - Iteration: 891  throughput_train : 263.222 seq/s mlm_loss : 6.9154  nsp_loss : 0.0000  total_loss : 6.9154  avg_loss_step : 6.9154  learning_rate : 0.00013326661  loss_scaler : 32768 
DLL 2020-11-27 03:31:10.994857 - Iteration: 892  throughput_train : 262.597 seq/s mlm_loss : 6.8963  nsp_loss : 0.0000  total_loss : 6.8963  avg_loss_step : 6.8963  learning_rate : 0.00013266497  loss_scaler : 32768 
DLL 2020-11-27 03:31:12.947912 - Iteration: 893  throughput_train : 262.195 seq/s mlm_loss : 6.8198  nsp_loss : 0.0000  total_loss : 6.8198  avg_loss_step : 6.8198  learning_rate : 0.00013206057  loss_scaler : 32768 
DLL 2020-11-27 03:31:14.893208 - Iteration: 894  throughput_train : 263.245 seq/s mlm_loss : 7.0223  nsp_loss : 0.0000  total_loss : 7.0223  avg_loss_step : 7.0223  learning_rate : 0.0001314534  loss_scaler : 32768 
DLL 2020-11-27 03:31:16.829614 - Iteration: 895  throughput_train : 264.452 seq/s mlm_loss : 7.0538  nsp_loss : 0.0000  total_loss : 7.0538  avg_loss_step : 7.0538  learning_rate : 0.00013084337  loss_scaler : 32768 
DLL 2020-11-27 03:31:18.776792 - Iteration: 896  throughput_train : 262.991 seq/s mlm_loss : 6.8960  nsp_loss : 0.0000  total_loss : 6.8960  avg_loss_step : 6.8960  learning_rate : 0.00013023053  loss_scaler : 32768 
DLL 2020-11-27 03:31:20.723640 - Iteration: 897  throughput_train : 263.044 seq/s mlm_loss : 7.0073  nsp_loss : 0.0000  total_loss : 7.0073  avg_loss_step : 7.0073  learning_rate : 0.0001296148  loss_scaler : 32768 
DLL 2020-11-27 03:31:22.670810 - Iteration: 898  throughput_train : 262.988 seq/s mlm_loss : 7.1470  nsp_loss : 0.0000  total_loss : 7.1470  avg_loss_step : 7.1470  learning_rate : 0.00012899611  loss_scaler : 32768 
DLL 2020-11-27 03:31:24.614356 - Iteration: 899  throughput_train : 263.485 seq/s mlm_loss : 7.0015  nsp_loss : 0.0000  total_loss : 7.0015  avg_loss_step : 7.0015  learning_rate : 0.00012837444  loss_scaler : 32768 
DLL 2020-11-27 03:31:26.556425 - Iteration: 900  throughput_train : 263.677 seq/s mlm_loss : 7.0501  nsp_loss : 0.0000  total_loss : 7.0501  avg_loss_step : 7.0501  learning_rate : 0.0001277497  loss_scaler : 32768 
DLL 2020-11-27 03:31:28.506146 - Iteration: 901  throughput_train : 262.645 seq/s mlm_loss : 7.1122  nsp_loss : 0.0000  total_loss : 7.1122  avg_loss_step : 7.1122  learning_rate : 0.00012712195  loss_scaler : 32768 
DLL 2020-11-27 03:31:30.463407 - Iteration: 902  throughput_train : 261.639 seq/s mlm_loss : 7.0145  nsp_loss : 0.0000  total_loss : 7.0145  avg_loss_step : 7.0145  learning_rate : 0.00012649108  loss_scaler : 32768 
DLL 2020-11-27 03:31:32.411648 - Iteration: 903  throughput_train : 262.847 seq/s mlm_loss : 6.8233  nsp_loss : 0.0000  total_loss : 6.8233  avg_loss_step : 6.8233  learning_rate : 0.00012585704  loss_scaler : 32768 
DLL 2020-11-27 03:31:34.359607 - Iteration: 904  throughput_train : 262.886 seq/s mlm_loss : 6.8065  nsp_loss : 0.0000  total_loss : 6.8065  avg_loss_step : 6.8065  learning_rate : 0.00012521975  loss_scaler : 32768 
DLL 2020-11-27 03:31:36.297610 - Iteration: 905  throughput_train : 264.242 seq/s mlm_loss : 6.8583  nsp_loss : 0.0000  total_loss : 6.8583  avg_loss_step : 6.8583  learning_rate : 0.00012457925  loss_scaler : 32768 
DLL 2020-11-27 03:31:38.240592 - Iteration: 906  throughput_train : 263.572 seq/s mlm_loss : 6.9624  nsp_loss : 0.0000  total_loss : 6.9624  avg_loss_step : 6.9624  learning_rate : 0.00012393543  loss_scaler : 32768 
DLL 2020-11-27 03:31:40.194320 - Iteration: 907  throughput_train : 262.102 seq/s mlm_loss : 6.8486  nsp_loss : 0.0000  total_loss : 6.8486  avg_loss_step : 6.8486  learning_rate : 0.00012328826  loss_scaler : 32768 
DLL 2020-11-27 03:31:42.136163 - Iteration: 908  throughput_train : 263.712 seq/s mlm_loss : 6.9456  nsp_loss : 0.0000  total_loss : 6.9456  avg_loss_step : 6.9456  learning_rate : 0.00012263766  loss_scaler : 32768 
DLL 2020-11-27 03:31:44.085632 - Iteration: 909  throughput_train : 262.677 seq/s mlm_loss : 6.9101  nsp_loss : 0.0000  total_loss : 6.9101  avg_loss_step : 6.9101  learning_rate : 0.00012198356  loss_scaler : 32768 
DLL 2020-11-27 03:31:46.032442 - Iteration: 910  throughput_train : 263.034 seq/s mlm_loss : 6.9922  nsp_loss : 0.0000  total_loss : 6.9922  avg_loss_step : 6.9922  learning_rate : 0.00012132597  loss_scaler : 32768 
DLL 2020-11-27 03:31:47.972133 - Iteration: 911  throughput_train : 264.004 seq/s mlm_loss : 7.1033  nsp_loss : 0.0000  total_loss : 7.1033  avg_loss_step : 7.1033  learning_rate : 0.000120664794  loss_scaler : 32768 
DLL 2020-11-27 03:31:49.926667 - Iteration: 912  throughput_train : 262.007 seq/s mlm_loss : 7.0447  nsp_loss : 0.0000  total_loss : 7.0447  avg_loss_step : 7.0447  learning_rate : 0.000119999975  loss_scaler : 32768 
DLL 2020-11-27 03:31:51.874750 - Iteration: 913  throughput_train : 262.862 seq/s mlm_loss : 6.8380  nsp_loss : 0.0000  total_loss : 6.8380  avg_loss_step : 6.8380  learning_rate : 0.00011933142  loss_scaler : 32768 
DLL 2020-11-27 03:31:53.823622 - Iteration: 914  throughput_train : 262.756 seq/s mlm_loss : 6.9580  nsp_loss : 0.0000  total_loss : 6.9580  avg_loss_step : 6.9580  learning_rate : 0.00011865913  loss_scaler : 32768 
DLL 2020-11-27 03:31:55.777274 - Iteration: 915  throughput_train : 262.115 seq/s mlm_loss : 6.9403  nsp_loss : 0.0000  total_loss : 6.9403  avg_loss_step : 6.9403  learning_rate : 0.000117983014  loss_scaler : 32768 
DLL 2020-11-27 03:31:57.726804 - Iteration: 916  throughput_train : 262.678 seq/s mlm_loss : 7.1607  nsp_loss : 0.0000  total_loss : 7.1607  avg_loss_step : 7.1607  learning_rate : 0.000117302996  loss_scaler : 32768 
DLL 2020-11-27 03:31:59.678018 - Iteration: 917  throughput_train : 262.447 seq/s mlm_loss : 7.1012  nsp_loss : 0.0000  total_loss : 7.1012  avg_loss_step : 7.1012  learning_rate : 0.00011661903  loss_scaler : 32768 
DLL 2020-11-27 03:32:01.619762 - Iteration: 918  throughput_train : 263.740 seq/s mlm_loss : 6.9754  nsp_loss : 0.0000  total_loss : 6.9754  avg_loss_step : 6.9754  learning_rate : 0.00011593096  loss_scaler : 32768 
DLL 2020-11-27 03:32:03.566299 - Iteration: 919  throughput_train : 263.072 seq/s mlm_loss : 7.2180  nsp_loss : 0.0000  total_loss : 7.2180  avg_loss_step : 7.2180  learning_rate : 0.00011523884  loss_scaler : 32768 
DLL 2020-11-27 03:32:05.510501 - Iteration: 920  throughput_train : 263.389 seq/s mlm_loss : 7.0958  nsp_loss : 0.0000  total_loss : 7.0958  avg_loss_step : 7.0958  learning_rate : 0.00011454254  loss_scaler : 32768 
DLL 2020-11-27 03:32:07.456779 - Iteration: 921  throughput_train : 263.107 seq/s mlm_loss : 6.9975  nsp_loss : 0.0000  total_loss : 6.9975  avg_loss_step : 6.9975  learning_rate : 0.00011384197  loss_scaler : 32768 
DLL 2020-11-27 03:32:09.406610 - Iteration: 922  throughput_train : 262.626 seq/s mlm_loss : 6.9562  nsp_loss : 0.0000  total_loss : 6.9562  avg_loss_step : 6.9562  learning_rate : 0.000113137074  loss_scaler : 32768 
DLL 2020-11-27 03:32:11.358496 - Iteration: 923  throughput_train : 262.351 seq/s mlm_loss : 6.9001  nsp_loss : 0.0000  total_loss : 6.9001  avg_loss_step : 6.9001  learning_rate : 0.00011242771  loss_scaler : 32768 
DLL 2020-11-27 03:32:13.293055 - Iteration: 924  throughput_train : 264.702 seq/s mlm_loss : 6.9168  nsp_loss : 0.0000  total_loss : 6.9168  avg_loss_step : 6.9168  learning_rate : 0.00011171388  loss_scaler : 32768 
DLL 2020-11-27 03:32:15.244974 - Iteration: 925  throughput_train : 262.347 seq/s mlm_loss : 6.9462  nsp_loss : 0.0000  total_loss : 6.9462  avg_loss_step : 6.9462  learning_rate : 0.00011099547  loss_scaler : 32768 
DLL 2020-11-27 03:32:17.190964 - Iteration: 926  throughput_train : 263.145 seq/s mlm_loss : 7.0298  nsp_loss : 0.0000  total_loss : 7.0298  avg_loss_step : 7.0298  learning_rate : 0.00011027237  loss_scaler : 32768 
DLL 2020-11-27 03:32:19.130395 - Iteration: 927  throughput_train : 264.035 seq/s mlm_loss : 6.8945  nsp_loss : 0.0000  total_loss : 6.8945  avg_loss_step : 6.8945  learning_rate : 0.00010954445  loss_scaler : 32768 
DLL 2020-11-27 03:32:21.073587 - Iteration: 928  throughput_train : 263.527 seq/s mlm_loss : 6.7719  nsp_loss : 0.0000  total_loss : 6.7719  avg_loss_step : 6.7719  learning_rate : 0.00010881172  loss_scaler : 32768 
DLL 2020-11-27 03:32:23.020428 - Iteration: 929  throughput_train : 263.031 seq/s mlm_loss : 6.8506  nsp_loss : 0.0000  total_loss : 6.8506  avg_loss_step : 6.8506  learning_rate : 0.000108074004  loss_scaler : 32768 
DLL 2020-11-27 03:32:24.964830 - Iteration: 930  throughput_train : 263.363 seq/s mlm_loss : 6.8781  nsp_loss : 0.0000  total_loss : 6.8781  avg_loss_step : 6.8781  learning_rate : 0.00010733124  loss_scaler : 32768 
DLL 2020-11-27 03:32:26.909971 - Iteration: 931  throughput_train : 263.261 seq/s mlm_loss : 6.9124  nsp_loss : 0.0000  total_loss : 6.9124  avg_loss_step : 6.9124  learning_rate : 0.000106583284  loss_scaler : 32768 
DLL 2020-11-27 03:32:28.844607 - Iteration: 932  throughput_train : 264.691 seq/s mlm_loss : 6.9373  nsp_loss : 0.0000  total_loss : 6.9373  avg_loss_step : 6.9373  learning_rate : 0.00010583  loss_scaler : 32768 
DLL 2020-11-27 03:32:30.787501 - Iteration: 933  throughput_train : 263.568 seq/s mlm_loss : 6.9597  nsp_loss : 0.0000  total_loss : 6.9597  avg_loss_step : 6.9597  learning_rate : 0.00010507136  loss_scaler : 32768 
DLL 2020-11-27 03:32:32.732301 - Iteration: 934  throughput_train : 263.327 seq/s mlm_loss : 6.8342  nsp_loss : 0.0000  total_loss : 6.8342  avg_loss_step : 6.8342  learning_rate : 0.000104307204  loss_scaler : 32768 
DLL 2020-11-27 03:32:34.674462 - Iteration: 935  throughput_train : 263.680 seq/s mlm_loss : 6.8903  nsp_loss : 0.0000  total_loss : 6.8903  avg_loss_step : 6.8903  learning_rate : 0.000103537415  loss_scaler : 32768 
DLL 2020-11-27 03:32:36.627343 - Iteration: 936  throughput_train : 262.234 seq/s mlm_loss : 6.9973  nsp_loss : 0.0000  total_loss : 6.9973  avg_loss_step : 6.9973  learning_rate : 0.00010276185  loss_scaler : 32768 
DLL 2020-11-27 03:32:38.572531 - Iteration: 937  throughput_train : 263.271 seq/s mlm_loss : 6.7610  nsp_loss : 0.0000  total_loss : 6.7610  avg_loss_step : 6.7610  learning_rate : 0.00010198034  loss_scaler : 32768 
DLL 2020-11-27 03:32:40.520489 - Iteration: 938  throughput_train : 262.893 seq/s mlm_loss : 6.7461  nsp_loss : 0.0000  total_loss : 6.7461  avg_loss_step : 6.7461  learning_rate : 0.00010119284  loss_scaler : 32768 
DLL 2020-11-27 03:32:42.463832 - Iteration: 939  throughput_train : 263.518 seq/s mlm_loss : 6.8241  nsp_loss : 0.0000  total_loss : 6.8241  avg_loss_step : 6.8241  learning_rate : 0.00010039917  loss_scaler : 32768 
DLL 2020-11-27 03:32:44.415486 - Iteration: 940  throughput_train : 262.393 seq/s mlm_loss : 7.0425  nsp_loss : 0.0000  total_loss : 7.0425  avg_loss_step : 7.0425  learning_rate : 9.959917e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:46.363619 - Iteration: 941  throughput_train : 262.856 seq/s mlm_loss : 7.0274  nsp_loss : 0.0000  total_loss : 7.0274  avg_loss_step : 7.0274  learning_rate : 9.8792654e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:48.305431 - Iteration: 942  throughput_train : 263.711 seq/s mlm_loss : 6.8771  nsp_loss : 0.0000  total_loss : 6.8771  avg_loss_step : 6.8771  learning_rate : 9.7979544e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:50.247842 - Iteration: 943  throughput_train : 263.630 seq/s mlm_loss : 6.9546  nsp_loss : 0.0000  total_loss : 6.9546  avg_loss_step : 6.9546  learning_rate : 9.7159624e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:52.187188 - Iteration: 944  throughput_train : 264.049 seq/s mlm_loss : 6.8345  nsp_loss : 0.0000  total_loss : 6.8345  avg_loss_step : 6.8345  learning_rate : 9.6332726e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:54.132547 - Iteration: 945  throughput_train : 263.238 seq/s mlm_loss : 6.9657  nsp_loss : 0.0000  total_loss : 6.9657  avg_loss_step : 6.9657  learning_rate : 9.5498675e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:56.084704 - Iteration: 946  throughput_train : 262.313 seq/s mlm_loss : 6.9970  nsp_loss : 0.0000  total_loss : 6.9970  avg_loss_step : 6.9970  learning_rate : 9.465722e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:58.033721 - Iteration: 947  throughput_train : 262.744 seq/s mlm_loss : 6.9875  nsp_loss : 0.0000  total_loss : 6.9875  avg_loss_step : 6.9875  learning_rate : 9.380827e-05  loss_scaler : 32768 
DLL 2020-11-27 03:32:59.973992 - Iteration: 948  throughput_train : 263.937 seq/s mlm_loss : 6.7989  nsp_loss : 0.0000  total_loss : 6.7989  avg_loss_step : 6.7989  learning_rate : 9.2951566e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:01.913029 - Iteration: 949  throughput_train : 264.100 seq/s mlm_loss : 6.8354  nsp_loss : 0.0000  total_loss : 6.8354  avg_loss_step : 6.8354  learning_rate : 9.208689e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:03.861673 - Iteration: 950  throughput_train : 262.792 seq/s mlm_loss : 6.9060  nsp_loss : 0.0000  total_loss : 6.9060  avg_loss_step : 6.9060  learning_rate : 9.1213966e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:05.807666 - Iteration: 951  throughput_train : 263.149 seq/s mlm_loss : 6.9194  nsp_loss : 0.0000  total_loss : 6.9194  avg_loss_step : 6.9194  learning_rate : 9.033266e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:07.756321 - Iteration: 952  throughput_train : 262.797 seq/s mlm_loss : 7.0511  nsp_loss : 0.0000  total_loss : 7.0511  avg_loss_step : 7.0511  learning_rate : 8.944268e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:09.700120 - Iteration: 953  throughput_train : 263.442 seq/s mlm_loss : 7.0058  nsp_loss : 0.0000  total_loss : 7.0058  avg_loss_step : 7.0058  learning_rate : 8.854374e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:11.640156 - Iteration: 954  throughput_train : 263.954 seq/s mlm_loss : 6.9056  nsp_loss : 0.0000  total_loss : 6.9056  avg_loss_step : 6.9056  learning_rate : 8.7635584e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:13.595436 - Iteration: 955  throughput_train : 261.895 seq/s mlm_loss : 6.9963  nsp_loss : 0.0000  total_loss : 6.9963  avg_loss_step : 6.9963  learning_rate : 8.671787e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:15.540825 - Iteration: 956  throughput_train : 263.231 seq/s mlm_loss : 6.9382  nsp_loss : 0.0000  total_loss : 6.9382  avg_loss_step : 6.9382  learning_rate : 8.579039e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:17.486852 - Iteration: 957  throughput_train : 263.143 seq/s mlm_loss : 6.9335  nsp_loss : 0.0000  total_loss : 6.9335  avg_loss_step : 6.9335  learning_rate : 8.485277e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:19.434531 - Iteration: 958  throughput_train : 262.930 seq/s mlm_loss : 6.9266  nsp_loss : 0.0000  total_loss : 6.9266  avg_loss_step : 6.9266  learning_rate : 8.390468e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:21.366225 - Iteration: 959  throughput_train : 265.092 seq/s mlm_loss : 6.8848  nsp_loss : 0.0000  total_loss : 6.8848  avg_loss_step : 6.8848  learning_rate : 8.294574e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:23.308822 - Iteration: 960  throughput_train : 263.605 seq/s mlm_loss : 7.0642  nsp_loss : 0.0000  total_loss : 7.0642  avg_loss_step : 7.0642  learning_rate : 8.1975544e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:25.255484 - Iteration: 961  throughput_train : 263.061 seq/s mlm_loss : 6.8351  nsp_loss : 0.0000  total_loss : 6.8351  avg_loss_step : 6.8351  learning_rate : 8.099378e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:27.197392 - Iteration: 962  throughput_train : 263.723 seq/s mlm_loss : 6.9702  nsp_loss : 0.0000  total_loss : 6.9702  avg_loss_step : 6.9702  learning_rate : 7.9999954e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:29.151166 - Iteration: 963  throughput_train : 262.112 seq/s mlm_loss : 6.9918  nsp_loss : 0.0000  total_loss : 6.9918  avg_loss_step : 6.9918  learning_rate : 7.899364e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:31.105493 - Iteration: 964  throughput_train : 262.034 seq/s mlm_loss : 7.0583  nsp_loss : 0.0000  total_loss : 7.0583  avg_loss_step : 7.0583  learning_rate : 7.7974284e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:33.063540 - Iteration: 965  throughput_train : 261.527 seq/s mlm_loss : 7.1737  nsp_loss : 0.0000  total_loss : 7.1737  avg_loss_step : 7.1737  learning_rate : 7.694147e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:35.000369 - Iteration: 966  throughput_train : 264.391 seq/s mlm_loss : 6.8307  nsp_loss : 0.0000  total_loss : 6.8307  avg_loss_step : 6.8307  learning_rate : 7.589462e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:36.945232 - Iteration: 967  throughput_train : 263.301 seq/s mlm_loss : 7.0447  nsp_loss : 0.0000  total_loss : 7.0447  avg_loss_step : 7.0447  learning_rate : 7.483311e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:38.889566 - Iteration: 968  throughput_train : 263.380 seq/s mlm_loss : 7.0521  nsp_loss : 0.0000  total_loss : 7.0521  avg_loss_step : 7.0521  learning_rate : 7.375633e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:40.837848 - Iteration: 969  throughput_train : 262.857 seq/s mlm_loss : 6.8837  nsp_loss : 0.0000  total_loss : 6.8837  avg_loss_step : 6.8837  learning_rate : 7.266353e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:42.787294 - Iteration: 970  throughput_train : 262.679 seq/s mlm_loss : 6.7635  nsp_loss : 0.0000  total_loss : 6.7635  avg_loss_step : 6.7635  learning_rate : 7.155411e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:44.718959 - Iteration: 971  throughput_train : 265.112 seq/s mlm_loss : 6.9365  nsp_loss : 0.0000  total_loss : 6.9365  avg_loss_step : 6.9365  learning_rate : 7.042722e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:46.672159 - Iteration: 972  throughput_train : 262.187 seq/s mlm_loss : 6.9507  nsp_loss : 0.0000  total_loss : 6.9507  avg_loss_step : 6.9507  learning_rate : 6.9282e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:48.605473 - Iteration: 973  throughput_train : 264.875 seq/s mlm_loss : 6.9384  nsp_loss : 0.0000  total_loss : 6.9384  avg_loss_step : 6.9384  learning_rate : 6.811746e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:50.553178 - Iteration: 974  throughput_train : 262.918 seq/s mlm_loss : 6.7134  nsp_loss : 0.0000  total_loss : 6.7134  avg_loss_step : 6.7134  learning_rate : 6.693273e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:52.503393 - Iteration: 975  throughput_train : 262.577 seq/s mlm_loss : 6.7645  nsp_loss : 0.0000  total_loss : 6.7645  avg_loss_step : 6.7645  learning_rate : 6.572664e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:54.433898 - Iteration: 976  throughput_train : 265.260 seq/s mlm_loss : 6.7382  nsp_loss : 0.0000  total_loss : 6.7382  avg_loss_step : 6.7382  learning_rate : 6.449802e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:56.374996 - Iteration: 977  throughput_train : 263.808 seq/s mlm_loss : 6.9420  nsp_loss : 0.0000  total_loss : 6.9420  avg_loss_step : 6.9420  learning_rate : 6.324552e-05  loss_scaler : 32768 
DLL 2020-11-27 03:33:58.326857 - Iteration: 978  throughput_train : 262.357 seq/s mlm_loss : 7.0544  nsp_loss : 0.0000  total_loss : 7.0544  avg_loss_step : 7.0544  learning_rate : 6.196764e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:00.269736 - Iteration: 979  throughput_train : 263.566 seq/s mlm_loss : 6.9911  nsp_loss : 0.0000  total_loss : 6.9911  avg_loss_step : 6.9911  learning_rate : 6.0662922e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:02.214705 - Iteration: 980  throughput_train : 263.284 seq/s mlm_loss : 7.0535  nsp_loss : 0.0000  total_loss : 7.0535  avg_loss_step : 7.0535  learning_rate : 5.9329526e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:04.164418 - Iteration: 981  throughput_train : 262.643 seq/s mlm_loss : 6.9968  nsp_loss : 0.0000  total_loss : 6.9968  avg_loss_step : 6.9968  learning_rate : 5.7965462e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:06.115629 - Iteration: 982  throughput_train : 262.442 seq/s mlm_loss : 6.9746  nsp_loss : 0.0000  total_loss : 6.9746  avg_loss_step : 6.9746  learning_rate : 5.6568515e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:08.056133 - Iteration: 983  throughput_train : 263.889 seq/s mlm_loss : 7.1669  nsp_loss : 0.0000  total_loss : 7.1669  avg_loss_step : 7.1669  learning_rate : 5.51361e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:10.007734 - Iteration: 984  throughput_train : 262.389 seq/s mlm_loss : 6.8964  nsp_loss : 0.0000  total_loss : 6.8964  avg_loss_step : 6.8964  learning_rate : 5.3665553e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:11.952179 - Iteration: 985  throughput_train : 263.356 seq/s mlm_loss : 6.9327  nsp_loss : 0.0000  total_loss : 6.9327  avg_loss_step : 6.9327  learning_rate : 5.2153555e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:13.899200 - Iteration: 986  throughput_train : 263.005 seq/s mlm_loss : 6.8967  nsp_loss : 0.0000  total_loss : 6.8967  avg_loss_step : 6.8967  learning_rate : 5.0596398e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:15.850486 - Iteration: 987  throughput_train : 262.435 seq/s mlm_loss : 7.0340  nsp_loss : 0.0000  total_loss : 7.0340  avg_loss_step : 7.0340  learning_rate : 4.8989674e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:17.791556 - Iteration: 988  throughput_train : 263.815 seq/s mlm_loss : 7.0388  nsp_loss : 0.0000  total_loss : 7.0388  avg_loss_step : 7.0388  learning_rate : 4.7328533e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:19.735560 - Iteration: 989  throughput_train : 263.415 seq/s mlm_loss : 7.0282  nsp_loss : 0.0000  total_loss : 7.0282  avg_loss_step : 7.0282  learning_rate : 4.5606932e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:21.687572 - Iteration: 990  throughput_train : 262.337 seq/s mlm_loss : 6.9032  nsp_loss : 0.0000  total_loss : 6.9032  avg_loss_step : 6.9032  learning_rate : 4.381774e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:23.632230 - Iteration: 991  throughput_train : 263.328 seq/s mlm_loss : 6.9428  nsp_loss : 0.0000  total_loss : 6.9428  avg_loss_step : 6.9428  learning_rate : 4.195231e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:25.575963 - Iteration: 992  throughput_train : 263.451 seq/s mlm_loss : 6.9965  nsp_loss : 0.0000  total_loss : 6.9965  avg_loss_step : 6.9965  learning_rate : 3.999986e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:27.521152 - Iteration: 993  throughput_train : 263.253 seq/s mlm_loss : 6.9281  nsp_loss : 0.0000  total_loss : 6.9281  avg_loss_step : 6.9281  learning_rate : 3.794721e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:29.466436 - Iteration: 994  throughput_train : 263.243 seq/s mlm_loss : 6.8886  nsp_loss : 0.0000  total_loss : 6.8886  avg_loss_step : 6.8886  learning_rate : 3.577699e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:31.412530 - Iteration: 995  throughput_train : 263.134 seq/s mlm_loss : 6.9287  nsp_loss : 0.0000  total_loss : 6.9287  avg_loss_step : 6.9287  learning_rate : 3.3466327e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:33.367772 - Iteration: 996  throughput_train : 261.900 seq/s mlm_loss : 6.9638  nsp_loss : 0.0000  total_loss : 6.9638  avg_loss_step : 6.9638  learning_rate : 3.098382e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:35.317308 - Iteration: 997  throughput_train : 262.673 seq/s mlm_loss : 6.9143  nsp_loss : 0.0000  total_loss : 6.9143  avg_loss_step : 6.9143  learning_rate : 2.8284087e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:37.266406 - Iteration: 998  throughput_train : 262.731 seq/s mlm_loss : 6.9291  nsp_loss : 0.0000  total_loss : 6.9291  avg_loss_step : 6.9291  learning_rate : 2.5298059e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:39.209309 - Iteration: 999  throughput_train : 263.564 seq/s mlm_loss : 6.7641  nsp_loss : 0.0000  total_loss : 6.7641  avg_loss_step : 6.7641  learning_rate : 2.1908761e-05  loss_scaler : 32768 
DLL 2020-11-27 03:34:41.159398 - Iteration: 1000  throughput_train : 262.595 seq/s mlm_loss : 6.8897  nsp_loss : 0.0000  total_loss : 6.8897  avg_loss_step : 6.8897  learning_rate : 1.7888427e-05  loss_scaler : 32768 
INFO:tensorflow:Saving checkpoints for 1000 into /cluster/home/andreku/norbert/model/model.ckpt.
I1127 03:35:07.889332 47493880456384 basic_session_run_hooks.py:606] Saving checkpoints for 1000 into /cluster/home/andreku/norbert/model/model.ckpt.
INFO:tensorflow:Loss for final step: 6.968466.
I1127 03:35:08.175190 47510213129408 estimator.py:371] Loss for final step: 6.968466.
INFO:tensorflow:Loss for final step: 6.9535007.
I1127 03:35:08.179248 47379959473344 estimator.py:371] Loss for final step: 6.9535007.
INFO:tensorflow:Loss for final step: 6.982458.
I1127 03:35:08.204637 47913607222464 estimator.py:371] Loss for final step: 6.982458.
Skipping time record for  1000  due to checkpoint-saving/warmup overhead
DLL 2020-11-27 03:35:28.905232 - Iteration: 1001  throughput_train : 10.724 seq/s mlm_loss : 6.9356  nsp_loss : 0.0000  total_loss : 6.9356  avg_loss_step : 6.9356  learning_rate : 1.2648652e-05  loss_scaler : 32768 
INFO:tensorflow:Loss for final step: 6.9356194.
I1127 03:35:29.232800 47493880456384 estimator.py:371] Loss for final step: 6.9356194.
INFO:tensorflow:-----------------------------
I1127 03:35:29.233134 47493880456384 run_pretraining.py:642] -----------------------------
INFO:tensorflow:Total Training Time = 2250.64 for Sentences = 512000
I1127 03:35:29.233198 47493880456384 run_pretraining.py:644] Total Training Time = 2250.64 for Sentences = 512000
INFO:tensorflow:Total Training Time W/O Overhead = 1934.17 for Sentences = 491520
I1127 03:35:29.233260 47493880456384 run_pretraining.py:646] Total Training Time W/O Overhead = 1934.17 for Sentences = 491520
INFO:tensorflow:Throughput Average (sentences/sec) with overhead = 227.49
I1127 03:35:29.233312 47493880456384 run_pretraining.py:647] Throughput Average (sentences/sec) with overhead = 227.49
INFO:tensorflow:Throughput Average (sentences/sec) = 254.12
I1127 03:35:29.233363 47493880456384 run_pretraining.py:648] Throughput Average (sentences/sec) = 254.12
DLL 2020-11-27 03:35:29.233412 -  throughput_train : 254.125 seq/s
INFO:tensorflow:-----------------------------
I1127 03:35:29.233526 47493880456384 run_pretraining.py:650] -----------------------------
Training BERT finished.
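The throughput figures in the summary above are simple ratios of the reported sentence counts to the reported wall-clock times. As a sanity check (not part of the original run script), the arithmetic can be reproduced from the logged values alone:

```python
# Sanity check of the throughput summary printed by run_pretraining.py.
# All four input numbers are taken verbatim from the log above; nothing
# here is measured independently.

total_time_s = 2250.64        # "Total Training Time" (includes overhead)
total_sentences = 512000      # 1000 iterations before overhead exclusion

clean_time_s = 1934.17        # "Total Training Time W/O Overhead"
clean_sentences = 491520      # sentences counted after dropping warmup/
                              # checkpoint-saving iterations

throughput_with_overhead = total_sentences / total_time_s
throughput_clean = clean_sentences / clean_time_s

print(f"with overhead: {throughput_with_overhead:.2f} seq/s")  # matches 227.49
print(f"w/o overhead:  {throughput_clean:.2f} seq/s")          # matches 254.12
```

The gap between the two numbers (227.49 vs. 254.12 seq/s) comes from the ~316 seconds of checkpoint-saving and warmup overhead that the log explicitly excludes ("Skipping time record for 1000 due to checkpoint-saving/warmup overhead").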

Task and CPU usage stats:
       JobID    JobName  AllocCPUS   NTasks     MinCPU MinCPUTask     AveCPU    Elapsed ExitCode 
------------ ---------- ---------- -------- ---------- ---------- ---------- ---------- -------- 
1581003            BERT         24                                             00:46:27      0:0 
1581003.bat+      batch         24        1   03:42:39          0   03:42:39   00:46:27      0:0 
1581003.ext+     extern         24        1   00:00:00          0   00:00:00   00:46:27      0:0 

Memory usage stats:
       JobID     MaxRSS MaxRSSTask     AveRSS MaxPages   MaxPagesTask   AvePages 
------------ ---------- ---------- ---------- -------- -------------- ---------- 
1581003                                                                          
1581003.bat+  17655412K          0  17655412K     9049              0       9049 
1581003.ext+          0          0          0        0              0          0 

Disk usage stats:
       JobID  MaxDiskRead MaxDiskReadTask    AveDiskRead MaxDiskWrite MaxDiskWriteTask   AveDiskWrite 
------------ ------------ --------------- -------------- ------------ ---------------- -------------- 
1581003                                                                                               
1581003.bat+        9.60M               0          9.60M        0.07M                0          0.07M 
1581003.ext+        0.00M               0          0.00M            0                0              0 

Job 1581003 completed at Fri Nov 27 03:35:35 CET 2020