Data is not getting loaded into the dataloader

컴순이 · March 28, 2023

I am training a NeMo model called Citrinet, and the model test step in train.py raises an error.
It looks like the test data is not being loaded.

Attempt 1

Error executing job with overrides: []
Traceback (most recent call last):
  File "train.py", line 84, in main
    test_trainer.test(asr_model)
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 581, in test
    results = self.run(model)
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 758, in _run
    self.dispatch()
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 795, in dispatch
    self.accelerator.start_evaluating(self)
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 99, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 148, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 806, in run_stage
    return self.run_evaluate()
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1049, in run_evaluate
    eval_loop_results = self.run_evaluation()
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 925, in run_evaluation
    dataloaders, max_batches = self.evaluation_loop.get_evaluation_dataloaders()
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 56, in get_evaluation_dataloaders
    self.trainer.reset_test_dataloader(model)
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 420, in reset_test_dataloader
    self.num_test_batches, self.test_dataloaders = self._reset_eval_dataloader(model, 'test')
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py", line 370, in _reset_eval_dataloader
    num_batches = len(dataloader) if has_len(dataloader) else float('inf')
  File "/home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py", line 33, in has_len
    raise ValueError('`Dataloader` returned 0 length. Please make sure that it returns at least 1 batch')
ValueError: `Dataloader` returned 0 length. Please make sure that it returns at least 1 batch

Next, I reran only the test step, loading the model with EncDecCTCModelBPE.load_from_checkpoint( .ckpt).

Attempt 2

[NeMo W 2023-03-24 16:05:09 modelPT:151] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config :
    manifest_filepath: ../data/manifest/test.json
    sample_rate: 16000
    batch_size: 1
    shuffle: false

[NeMo I 2023-03-24 16:05:09 features:252] PADDING: 16
[NeMo I 2023-03-24 16:05:09 features:269] STFT using torch
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
O2
native
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2023-03-24 16:05:12 nemo_logging:349] /home/helloubuntu/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:354: UserWarning: One of given dataloaders is None and it will be skipped.
      rank_zero_warn("One of given dataloaders is None and it will be skipped.")

That is all the clues I need.

Cause 1
The wrong test manifest file had been configured.
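
A quick sanity check on the manifest can catch this before the trainer does. The sketch below is not from my train.py. It assumes the usual NeMo ASR manifest layout (one JSON object per line with `audio_filepath`, `duration`, and `text` keys) and that audio paths resolve from the current working directory. `check_manifest` is a hypothetical helper name.

```python
import json
import os
from typing import List, Tuple


def check_manifest(manifest_path: str, check_audio: bool = True) -> Tuple[int, List[str]]:
    """Return (number of usable entries, list of problems found).

    Assumption: one JSON object per line with the keys
    audio_filepath, duration and text.
    """
    if not os.path.isfile(manifest_path):
        return 0, [f"manifest not found: {manifest_path}"]

    required = ("audio_filepath", "duration", "text")
    usable = 0
    problems: List[str] = []

    with open(manifest_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(entry, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            missing = [key for key in required if key not in entry]
            if missing:
                problems.append(f"line {lineno}: missing keys {missing}")
                continue
            if check_audio and not os.path.isfile(entry["audio_filepath"]):
                problems.append(f"line {lineno}: audio file not found: {entry['audio_filepath']}")
                continue
            usable += 1

    return usable, problems


if __name__ == "__main__":
    # Path taken from the test config printed in the log above.
    count, issues = check_manifest("../data/manifest/test.json")
    print(f"usable entries: {count}")
    for issue in issues:
        print(issue)
```

If the usable count comes back as 0, a dataloader with 0 length is the expected outcome, which would match the ValueError from attempt 1.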

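Cause 2 (suspected)
I fixed the test manifest file, but the change was not picked up. My guess is that because the model was loaded from a checkpoint, the config had already been saved inside it.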

Fix
Following the modelPT warning,
I manually loaded the test manifest data with self.setup_multiple_test_data(test_data_config=None) inside experiments/nemo/core/classes/modelPT.py.
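
For reference, the same thing can probably be done from the script without touching modelPT.py, by calling the `setup_test_data()` method that the warning mentions on the loaded model. This is only a sketch and not what I actually ran. `build_test_config` and `attach_test_data` are hypothetical helper names, the config values are copied from the "Test config" printed in the log, and I am assuming the method accepts a plain dict.

```python
from typing import Any, Dict


def build_test_config(manifest_filepath: str,
                      sample_rate: int = 16000,
                      batch_size: int = 1) -> Dict[str, Any]:
    """Build a test data config mirroring the one printed in the NeMo warning."""
    return {
        "manifest_filepath": manifest_filepath,
        "sample_rate": sample_rate,
        "batch_size": batch_size,
        "shuffle": False,
    }


def attach_test_data(model: Any, manifest_filepath: str) -> Dict[str, Any]:
    """Point an already-loaded model at the corrected test manifest.

    Assumption: `model` exposes setup_test_data(test_data_config=...),
    as the ModelPT warning in the log suggests.
    """
    cfg = build_test_config(manifest_filepath)
    model.setup_test_data(test_data_config=cfg)
    return cfg


# Intended usage (needs NeMo installed; the checkpoint path is a placeholder):
#
#   from nemo.collections.asr.models import EncDecCTCModelBPE
#   asr_model = EncDecCTCModelBPE.load_from_checkpoint("path/to/your.ckpt")
#   attach_test_data(asr_model, "../data/manifest/test.json")
#   test_trainer.test(asr_model)
```

Doing it this way overrides whatever test config was stored in the checkpoint, so the edited manifest is the one actually used.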

Result
It worked.
