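I have used the YOLO series for a year and a half as an image-processing developer, but I only ever wrote things up on my company's internal GitLab and never on my own Git or blog. Partly as penance for that, here is a write-up.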
git: https://github.com/RHEEJEONGJIN/velog_yolov8_tutorial
pip install ultralytics
Running pip install ultralytics also installs the other packages that YOLOv8 needs.
from ultralytics import YOLO
import cv2
import os
import urllib.request
To start, import the cv2 and ultralytics packages, plus os and urllib for the download step.
# download image
os.makedirs('data', exist_ok=True)
urllib.request.urlretrieve("https://ultralytics.com/images/bus.jpg", 'data/bus.jpg')
Download the sample image from ultralytics.
The urllib package can handle the download.
urllib.request.urlretrieve("URL", 'file name to save as')
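If you rerun the notebook often, it can be handy to skip the download when the file is already there. Here is a minimal sketch of that idea; `download_if_missing` is a helper name I made up for illustration, not something ultralytics provides.

```python
import os
import urllib.request


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists; return dest."""
    parent = os.path.dirname(dest)
    if parent:
        # Create the target folder (e.g. 'data') when it is missing.
        os.makedirs(parent, exist_ok=True)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage with the image from this post:
# download_if_missing("https://ultralytics.com/images/bus.jpg", "data/bus.jpg")
```

The existence check only looks at the path, so a partially downloaded file would still count as present.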
# load model
model = YOLO("./pretrained/yolov8s.pt")
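The biggest change from YOLOv5 to YOLOv8 is that it is now distributed as a package.
You can still git clone the ultralytics repository and work the old way,
but to follow that trend I will skip git clone and use only the pip package, which is easier to maintain.
Customizing requires two things: a dataset yaml file and a cfg yaml file.
Both files are on the ultralytics GitHub, but they also ship inside the package, so I usually search for them inside the installed package and copy them into a folder of my own.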
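Searching inside the installed package can be done from Python as well. The sketch below is one way to do it with only the standard library; `find_package_files` is a hypothetical helper name, and the exact location of the yaml files inside ultralytics depends on the installed version, so I search by file name instead of hard-coding a path.

```python
import importlib.util
from pathlib import Path


def find_package_files(package_name, pattern):
    """Return sorted paths under an installed package that match a glob pattern."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Package is not installed, or it is a single-file module.
        return []
    hits = []
    for root in spec.submodule_search_locations:
        hits.extend(Path(root).rglob(pattern))
    return sorted(hits)


# Assumed usage for this post (requires ultralytics to be installed):
# for path in find_package_files("ultralytics", "coco128.yaml"):
#     print(path)
```

Once the path is printed, `shutil.copy(path, "dataset/coco128.yaml")` would copy it into your own folder, assuming the `dataset` folder already exists.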
coco128.yaml
# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: yolo train data=coco128.yaml
# parent
# ├── yolov5
# └── datasets
#     └── coco128  ← downloads here (7 MB)
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128 # dataset root dir
train: images/train2017 # train images (relative to 'path') 128 images
val: images/train2017 # val images (relative to 'path') 128 images
test: # test images (optional)
# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush
# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip
default.yaml
# Ultralytics YOLO 🚀, GPL-3.0 license
# Default training settings and hyperparameters for medium-augmentation COCO training
task: detect # inference task, i.e. detect, segment, classify
mode: train # YOLO mode, i.e. train, val, predict, export
# Train settings -------------------------------------------------------------------------------------------------------
model: ./pretrained/yolov8s.pt # path to model file, i.e. yolov8n.pt, yolov8n.yaml
data: ./dataset/coco128.yaml # path to data file, i.e. coco128.yaml
epochs: 1 # number of epochs to train for
patience: 50 # epochs to wait for no observable improvement for early stopping of training
batch: 16 # number of images per batch (-1 for AutoBatch)
imgsz: 640 # size of input images as integer or w,h
save: True # save train checkpoints and predict results
save_period: -1 # Save checkpoint every x epochs (disabled if < 1)
cache: False # True/ram, disk or False. Use cache for data loading
device: 0 # device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8 # number of worker threads for data loading (per RANK if DDP)
project: runs/custom # project name
name: rhee # experiment name
exist_ok: True # whether to overwrite existing experiment
pretrained: False # whether to use a pretrained model
optimizer: SGD # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
verbose: True # whether to print verbose output
seed: 0 # random seed for reproducibility
deterministic: True # whether to enable deterministic mode
single_cls: False # train multi-class data as single-class
image_weights: False # use weighted image selection for training
rect: False # support rectangular training if mode='train', support rectangular evaluation if mode='val'
cos_lr: False # use cosine learning rate scheduler
close_mosaic: 10 # disable mosaic augmentation for final 10 epochs
resume: False # resume training from last checkpoint
min_memory: False # minimize memory footprint loss function, choices=[False, True, <roll_out_thr>]
# Segmentation
overlap_mask: True # masks should overlap during training (segment train only)
mask_ratio: 4 # mask downsample ratio (segment train only)
# Classification
dropout: 0.0 # use dropout regularization (classify train only)
# Val/Test settings ----------------------------------------------------------------------------------------------------
val: True # validate/test during training
split: val # dataset split to use for validation, i.e. 'val', 'test' or 'train'
save_json: False # save results to JSON file
save_hybrid: False # save hybrid version of labels (labels + additional predictions)
conf: # object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7 # intersection over union (IoU) threshold for NMS
max_det: 300 # maximum number of detections per image
half: False # use half precision (FP16)
dnn: False # use OpenCV DNN for ONNX inference
plots: True # save plots during train/val
# Prediction settings --------------------------------------------------------------------------------------------------
source: # source directory for images or videos
show: False # show results if possible
save_txt: False # save results as .txt file
save_conf: False # save results with confidence scores
save_crop: False # save cropped images with results
hide_labels: False # hide labels
hide_conf: False # hide confidence scores
vid_stride: 1 # video frame-rate stride
line_thickness: 3 # bounding box thickness (pixels)
visualize: False # visualize model features
augment: False # apply image augmentation to prediction sources
agnostic_nms: False # class-agnostic NMS
classes: # filter results by class, i.e. class=0, or class=[0,2,3]
retina_masks: False # use high-resolution segmentation masks
boxes: True # Show boxes in segmentation predictions
# Export settings ------------------------------------------------------------------------------------------------------
format: torchscript # format to export to
keras: False # use Keras
optimize: False # TorchScript: optimize for mobile
int8: False # CoreML/TF INT8 quantization
dynamic: False # ONNX/TF/TensorRT: dynamic axes
simplify: False # ONNX: simplify model
opset: # ONNX: opset version (optional)
workspace: 4 # TensorRT: workspace size (GB)
nms: False # CoreML: add NMS
# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.01 # initial learning rate (i.e. SGD=1E-2, Adam=1E-3)
lrf: 0.01 # final learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 7.5 # box loss gain
cls: 0.5 # cls loss gain (scale with pixels)
dfl: 1.5 # dfl loss gain
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
label_smoothing: 0.0 # label smoothing (fraction)
nbs: 64 # nominal batch size
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.1 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.0 # image mixup (probability)
copy_paste: 0.0 # segment copy-paste (probability)
# Custom config.yaml ---------------------------------------------------------------------------------------------------
cfg: # for overriding defaults.yaml
# Debug, do not modify -------------------------------------------------------------------------------------------------
v5loader: False # use legacy YOLOv5 dataloader
# Tracker settings ------------------------------------------------------------------------------------------------------
tracker: botsort.yaml # tracker type, ['botsort.yaml', 'bytetrack.yaml']
model.train(cfg="cfg/custom.yaml")
I rename the default.yaml file to custom.yaml and use that.
model: pretrained/yolov8s.pt # path to model file, i.e. yolov8n.pt, yolov8n.yaml
data: dataset/coco128.yaml # path to data file, i.e. coco128.yaml
epochs: 1 # number of epochs to train for
patience: 50 # epochs to wait for no observable improvement for early stopping of training
batch: 16 # number of images per batch (-1 for AutoBatch)
imgsz: 640 # size of input images as integer or w,h
save: True # save train checkpoints and predict results
save_period: -1 # Save checkpoint every x epochs (disabled if < 1)
cache: False # True/ram, disk or False. Use cache for data loading
device: 0 # device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8 # number of worker threads for data loading (per RANK if DDP)
project: runs/custom # project name
name: rhee # experiment name
exist_ok: True # whether to overwrite existing experiment
pretrained: True # whether to use a pretrained model
optimizer: SGD # optimizer to use, choices=['SGD', 'Adam', 'AdamW', 'RMSProp']
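I tune these parameters and then train.
With exist_ok set to True, a new experiment folder is not created on every run; the results keep being updated under the single name I specified. I set it to True to save disk space.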
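To make the effect of exist_ok concrete, here is a small sketch that imitates how the output folder gets chosen. This is my own illustration, not the library's code: as far as I can tell, ultralytics appends a number to the name (rhee, rhee2, rhee3, ...) when exist_ok is False, and `resolve_run_dir` is a made-up name that mimics that behavior.

```python
from pathlib import Path


def resolve_run_dir(project, name, exist_ok=False):
    """Illustrative only: pick the output directory for an experiment."""
    base = Path(project) / name
    if exist_ok or not base.exists():
        # Reuse the same folder, so earlier results get overwritten.
        return base
    # Otherwise look for the first free numbered folder: name2, name3, ...
    n = 2
    while Path(f"{base}{n}").exists():
        n += 1
    return Path(f"{base}{n}")


# With the settings from custom.yaml:
# resolve_run_dir("runs/custom", "rhee", exist_ok=True)
```

With exist_ok True there is only ever one folder to clean up, at the cost of losing the results of earlier runs.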
I rarely use the validation code, so I will only show it here. Validation already runs as part of training, so I am not sure it needs its own step.
# validation
metrics = model.val()
# load image
img = cv2.imread("data/bus.jpg", cv2.IMREAD_COLOR)
Load the image first, then run inference.
results = model(img)
print(results)
Inference is very simple.
Just pass the image as the source to the trained model.
boxes = results[0].boxes
box = boxes[0] # returns one box
print(box.xyxy)
print(box.conf)
From here it comes down to working with each object's xyxy and conf in results, the model's output.
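One way to handle xyxy and conf is to turn the output into plain Python records and apply a confidence threshold. The sketch below works on plain lists so it runs without a model; `to_records` and `min_conf` are names I chose for illustration, and the demo numbers are made up, not real detections.

```python
def to_records(xyxy, conf, cls, names, min_conf=0.25):
    """Combine parallel lists into dicts, dropping low-confidence boxes."""
    records = []
    for box, score, class_id in zip(xyxy, conf, cls):
        if score < min_conf:
            continue
        records.append({
            "xyxy": [float(v) for v in box],
            "conf": float(score),
            "name": names[int(class_id)],
        })
    return records


# Assumed hookup with the results object from this post:
# b = results[0].boxes
# records = to_records(b.xyxy.tolist(), b.conf.tolist(), b.cls.tolist(), results[0].names)

# Made-up values, for illustration only.
demo = to_records(
    xyxy=[[10, 20, 110, 220], [50, 60, 90, 100]],
    conf=[0.91, 0.12],
    cls=[0.0, 5.0],
    names={0: "person", 5: "bus"},
)
print(demo)
```

Only the first box survives here, because the second one falls below min_conf.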
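I currently work mostly on person detection, and I am in the middle of reducing false positives and missed detections for people.
One idea I am trying for the false positives: adjust the per-class scores in the output, and if an object scores about the same for person as for some other class, do not report it.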
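A rough sketch of this idea follows. It is an approximation under my own assumptions, not the method I run in production: as far as I know, the results object only exposes the top class and its score per box, so instead of comparing per-class scores directly, this version drops a person box when a box of another class overlaps it heavily with a similar confidence. The names `drop_ambiguous`, `margin` and `iou_thr`, and all the numbers, are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_ambiguous(dets, target=0, margin=0.1, iou_thr=0.7):
    """Remove target-class boxes that a similar-scoring box of another class overlaps."""
    kept = []
    for d in dets:
        ambiguous = d["cls"] == target and any(
            o["cls"] != target
            and iou(d["xyxy"], o["xyxy"]) >= iou_thr
            and abs(d["conf"] - o["conf"]) <= margin
            for o in dets
        )
        if not ambiguous:
            kept.append(d)
    return kept


# Made-up detections: class 0 is person, class 58 is potted plant.
dets = [
    {"xyxy": [0, 0, 100, 200], "conf": 0.55, "cls": 0},
    {"xyxy": [2, 0, 100, 198], "conf": 0.50, "cls": 58},
    {"xyxy": [300, 0, 400, 200], "conf": 0.90, "cls": 0},
]
kept = drop_ambiguous(dets)
print([(d["cls"], d["conf"]) for d in kept])
```

The first person box is removed because the potted plant box covers almost the same area with a similar score, while the confident person box on the right is kept. This only works if overlapping boxes of different classes survive NMS, so it assumes agnostic_nms stays False.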
I will cover data customizing in the next post.