Channel: Active questions tagged ubuntu - Stack Overflow

RuntimeError: CUDA error: no kernel image is available for execution on the device (rastervision)


Hi, I am trying to run a Raster Vision pipeline on an NVIDIA GeForce RTX 3050 Ti Laptop GPU.

  • Ubuntu 22.04
  • PyTorch: 1.12.0+cu116
  • CUDA: 12
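The versions listed above are not necessarily what the pipeline sees at runtime (the training log further down reports a different PyTorch build). A minimal sketch for printing the values that matter for this error, run with the interpreter inside the container; `collect_env` is a hypothetical helper, while the `torch` attributes it calls are standard PyTorch APIs.

```python
import platform


def collect_env() -> dict:
    """Gather version info relevant to 'no kernel image' errors.

    Hypothetical helper: run it inside the container, because the container's
    PyTorch build may differ from the one installed elsewhere.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed for this interpreter
        return info
    info["torch"] = torch.__version__        # wheel tag, e.g. "...+cuXYZ"
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)  # (major, minor)
        info["arch_list"] = torch.cuda.get_arch_list()  # targets compiled into the build
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```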

But when I run the Docker container like this:

sudo docker run --rm --runtime=nvidia --gpus all -it \
    -v ${RV_QUICKSTART_CODE_DIR}:/opt/src/code \
    -v ${RV_QUICKSTART_OUT_DIR}:/opt/data/output \
    quay.io/azavea/raster-vision:pytorch-0.20 /bin/bash

The model does not train and outputs this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

PS: running nvidia-smi prints the GPU's details, so the GPU is recognized.
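nvidia-smi only queries the driver, so it lists the GPU regardless of which GPU architectures the installed PyTorch binary was compiled for. "No kernel image is available" generally means the build contains neither native code (SASS) that runs on the device's compute capability nor PTX that the driver could JIT-compile for it; the RTX 3050 Ti is an Ampere GPU with compute capability 8.6. The sketch below encodes that rule, assuming architecture strings in the `sm_XY` / `compute_XY` form returned by `torch.cuda.get_arch_list()`. `parse_arch` and `can_run_on` are hypothetical helpers, and the example lists are illustrative, not the exact contents of any particular wheel.

```python
from typing import Iterable, Tuple


def parse_arch(entry: str) -> Tuple[str, Tuple[int, int]]:
    """Split "sm_86" into ("sm", (8, 6)) and "compute_70" into ("compute", (7, 0))."""
    kind, rest = entry.split("_", 1)
    digits = "".join(ch for ch in rest if ch.isdigit())  # drop suffixes such as "a"
    return kind, (int(digits[:-1]), int(digits[-1]))


def can_run_on(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Rough check that a build's compiled targets cover a given device."""
    major, minor = capability
    for entry in arch_list:
        kind, (a_major, a_minor) = parse_arch(entry)
        # SASS built for sm_XY runs on devices with the same major version and
        # an equal or higher minor version (e.g. sm_80 code on an 8.6 GPU).
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX for compute_XY can be JIT-compiled by the driver for any device >= X.Y.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


rtx_3050_ti = (8, 6)
print(can_run_on(rtx_3050_ti, ["sm_37", "sm_50", "sm_60", "sm_70"]))  # False
print(can_run_on(rtx_3050_ti, ["sm_80", "sm_86"]))                    # True
```

Inside the container, the two inputs would come from `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`.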

This is the output I get:

`Skipping 'analyze' command...
python -m rastervision.pipeline.cli run_command /opt/data/output/pipeline-config.json train
Running train command...
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Building datasets ...
2023-03-09 08:53:29:rastervision.core.data.raster_source.rasterio_source: WARNING - Raster block size (2, 650) is too non-square. This can slow down reading. Consider re-tiling using GDAL.
2023-03-09 08:53:29:rastervision.core.data.raster_source.rasterio_source: WARNING - Raster block size (2, 650) is too non-square. This can slow down reading. Consider re-tiling using GDAL.
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Physical CPUs: 12
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Logical CPUs: 16
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Total memory:  15.30 GB
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Size of /opt/data volume:  445.44 GB
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Size of / volume:  445.44 GB
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Python version: 3.9.16 (main, Jan 11 2023, 16:05:54) [GCC 11.2.0]
/bin/sh: 1: nvcc: not found
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - 
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Thu Mar  9 08:53:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P3    14W /  30W |    262MiB /  4096MiB |      7%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Devices:
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]
0, NVIDIA GeForce RTX 3050 Ti Laptop GPU, 525.89.02, 4096 MiB, 262 MiB, 3639 MiB
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - PyTorch version: 1.12.1+cu102
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - CUDA available: True
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - CUDA version: 10.2
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - CUDNN version: 7605
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Number of CUDA devices: 1
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Active CUDA Device: GPU 0
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - model=SemanticSegmentationModelConfig(backbone=<Backbone.resnet50: 'resnet50'>, pretrained=True, init_weights=None, load_strict=True, external_def=None) solver=SolverConfig(lr=0.0001, num_epochs=1, test_num_epochs=2, test_batch_sz=4, overfit_num_steps=1, sync_interval=1, batch_sz=2, one_cycle=True, multi_stage=[], class_loss_weights=None, ignore_class_index=None, external_loss_def=None) data=SemanticSegmentationGeoDataConfig(scene_dataset='<1 train_scenes, 1 validation_scenes, 0 test_scenes>', window_opts="method=<GeoDataWindowMethod.random: 'random'> size=300 stride=None padding=None pad_direction='end' size_lims=(300, 301) h_lims=None w_lims=None max_windows=10 max_sample_attempts=100 efficient_aoi_sampling=True") predict_mode=False test_mode=False overfit_mode=False eval_train=False save_model_bundle=True log_tensorboard=True run_tensorboard=False output_uri='/opt/data/output/train'
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Using device: cuda
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - train_ds: 10 items
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - valid_ds: 10 items
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - test_ds: 0 items
2023-03-09 08:53:29:rastervision.pytorch_learner.learner: INFO - Plotting sample training batch.
2023-03-09 08:53:30:rastervision.pytorch_learner.learner: INFO - Plotting sample validation batch.
2023-03-09 08:53:31:rastervision.pytorch_learner.learner: INFO - epoch: 0
Training:   0%|                                                                   | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/src/rastervision_pipeline/rastervision/pipeline/cli.py", line 251, in <module>
    _main()
  File "/opt/src/rastervision_pipeline/rastervision/pipeline/cli.py", line 247, in _main
    main()
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/src/rastervision_pipeline/rastervision/pipeline/cli.py", line 236, in run_command
    _run_command(
  File "/opt/src/rastervision_pipeline/rastervision/pipeline/cli.py", line 218, in _run_command
    command_fn()
  File "/opt/src/rastervision_core/rastervision/core/rv_pipeline/rv_pipeline.py", line 154, in train
    backend.train(source_bundle_uri=self.config.source_bundle_uri)
  File "/opt/src/rastervision_pytorch_backend/rastervision/pytorch_backend/pytorch_learner_backend.py", line 120, in train
    learner.main()
  File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/learner.py", line 267, in main
    self.train()
  File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/learner.py", line 1265, in train
    train_metrics = self.train_epoch(
  File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/learner.py", line 1188, in train_epoch
    output = self.train_step(batch, batch_ind)
  File "/opt/src/rastervision_pytorch_learner/rastervision/pytorch_learner/semantic_segmentation_learner.py", line 26, in train_step
    out = self.post_forward(self.model(x))
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torchvision/models/segmentation/_utils.py", line 23, in forward
    features = self.backbone(x)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py", line 69, in forward
    x = module(x)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/batchnorm.py", line 148, in forward
    self.num_batches_tracked.add_(1)  # type: ignore[has-type]
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
make: *** [/opt/data/output/Makefile:6: 0] Error 1`
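The log shows that the PyTorch inside the container is `1.12.1+cu102` (CUDA version 10.2), not the `1.12.0+cu116` listed in the question. A small sketch, with hypothetical helper names, that reads the CUDA version out of such a wheel tag and compares it with CUDA 11.0, the first toolkit release able to generate native code for Ampere (compute capability 8.x) GPUs such as the RTX 3050 Ti. It assumes the `+cuXYZ` suffix convention of the official wheels, where the last digit is the minor version.

```python
import re
from typing import Optional, Tuple

# CUDA 11.0 was the first toolkit release that could target Ampere (8.x) GPUs.
FIRST_AMPERE_CUDA = (11, 0)


def cuda_from_torch_version(version: str) -> Optional[Tuple[int, int]]:
    """Return the CUDA version encoded in a PyTorch wheel tag, if any.

    Hypothetical helper: "+cu102" -> (10, 2), "+cu116" -> (11, 6).
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None  # e.g. a CPU-only build or a build without a "+cuXYZ" tag
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def predates_ampere(version: str) -> Optional[bool]:
    """True if the wheel was built against a CUDA release older than 11.0."""
    cuda = cuda_from_torch_version(version)
    if cuda is None:
        return None  # unknown: no CUDA tag to inspect
    return cuda < FIRST_AMPERE_CUDA


for tag in ("1.12.0+cu116", "1.12.1+cu102"):
    print(tag, cuda_from_torch_version(tag), predates_ampere(tag))
```

A build that predates Ampere can only reach an 8.6 GPU through embedded PTX, so this check is a quick heuristic rather than a definitive answer; the compiled architecture list is the authoritative source.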
