'Dive into Deep Learning/D2L Computer Vision'에 해당되는 글 13건

2023.08.19 D2L - 14.2. Fine-Tuning
2023.08.19 D2L - 14.1. Image Augmentation
2023.08.18 D2L - 14. Computer Vision

Dive into Deep Learning/D2L Computer Vision

D2L - 14.2. Fine-Tuning

2023. 8. 19. 01:59 | Posted by 솔웅

https://d2l.ai/chapter_computer-vision/fine-tuning.html

14.2. Fine-Tuning — Dive into Deep Learning 1.0.0 documentation

d2l.ai

14.2. Fine-Tuning

In earlier chapters, we discussed how to train models on the Fashion-MNIST training dataset with only 60000 images. We also described ImageNet, the most widely used large-scale image dataset in academia, which has more than 10 million images and 1000 objects. However, the size of the dataset that we usually encounter is between those of the two datasets.

이전 장에서 60000개의 이미지만으로 Fashion-MNIST 교육 데이터 세트에서 모델을 교육하는 방법에 대해 논의했습니다. 또한 학계에서 가장 널리 사용되는 대규모 이미지 데이터 세트인 ImageNet에 대해서도 설명했습니다. 이 데이터 세트에는 천만 개 이상의 이미지와 1000개 개체가 있습니다. 그러나 우리가 일반적으로 접하는 데이터 세트의 크기는 두 데이터 세트의 중간 크기입니다.

Suppose that we want to recognize different types of chairs from images, and then recommend purchase links to users. One possible method is to first identify 100 common chairs, take 1000 images of different angles for each chair, and then train a classification model on the collected image dataset. Although this chair dataset may be larger than the Fashion-MNIST dataset, the number of examples is still less than one-tenth of that in ImageNet. This may lead to overfitting of complicated models that are suitable for ImageNet on this chair dataset. Besides, due to the limited amount of training examples, the accuracy of the trained model may not meet practical requirements.

이미지에서 다양한 유형의 의자를 인식하고 사용자에게 구매 링크를 추천하고 싶다고 가정합니다. 한 가지 가능한 방법은 먼저 100개의 일반적인 의자를 식별하고 각 의자에 대해 서로 다른 각도의 이미지 1000개를 가져온 다음 수집된 이미지 데이터 세트에서 분류 모델을 훈련하는 것입니다. 이 의자 데이터셋은 Fashion-MNIST 데이터셋보다 클 수 있지만 예제의 수는 여전히 ImageNet의 1/10 미만입니다. 이로 인해 이 의자 데이터 세트에서 ImageNet에 적합한 복잡한 모델이 과대적합될 수 있습니다. 게다가 제한된 양의 학습 예제로 인해 학습된 모델의 정확도가 실제 요구 사항을 충족하지 못할 수 있습니다.

In order to address the above problems, an obvious solution is to collect more data. However, collecting and labeling data can take a lot of time and money. For example, in order to collect the ImageNet dataset, researchers have spent millions of dollars from research funding. Although the current data collection cost has been significantly reduced, this cost still cannot be ignored.

위의 문제를 해결하기 위한 확실한 해결책은 더 많은 데이터를 수집하는 것입니다. 그러나 데이터를 수집하고 레이블을 지정하는 데는 많은 시간과 비용이 소요될 수 있습니다. 예를 들어 ImageNet 데이터 세트를 수집하기 위해 연구자들은 연구 자금에서 수백만 달러를 지출했습니다. 현재 데이터 수집 비용이 크게 줄었지만 이 비용은 여전히 무시할 수 없습니다.

Another solution is to apply transfer learning to transfer the knowledge learned from the source dataset to the target dataset. For example, although most of the images in the ImageNet dataset have nothing to do with chairs, the model trained on this dataset may extract more general image features, which can help identify edges, textures, shapes, and object composition. These similar features may also be effective for recognizing chairs.

또 다른 해결책은 전이 학습(transfer learning)을 적용하여 원본 데이터 세트에서 학습한 지식을 대상 데이터 세트로 이전하는 것입니다. 예를 들어 ImageNet 데이터세트의 대부분의 이미지는 의자와 관련이 없지만 이 데이터세트에서 훈련된 모델은 가장자리, 질감, 모양 및 개체 구성을 식별하는 데 도움이 될 수 있는 보다 일반적인 이미지 기능을 추출할 수 있습니다. 이러한 유사한 기능은 의자를 인식하는 데에도 효과적일 수 있습니다.

Transfer Learning (전이 학습) 이란?

'transfer learning(전이 학습)'은 하나의 작업에서 학습한 모델의 지식을 다른 관련 작업으로 전달하여 학습하는 기술을 말합니다. 기존에 학습된 모델의 일부 또는 전체 네트워크 구조와 가중치를 새로운 작업에 재사용하는 것을 의미합니다. 이를 통해 새로운 작업에 대한 학습 데이터가 부족한 경우에도 모델의 성능을 향상시킬 수 있습니다.

전이 학습은 다음과 같은 장점을 가지고 있습니다:

데이터 부족 문제 해결: 새로운 작업에 충분한 양의 학습 데이터가 없을 때, 기존 작업에서 학습한 모델을 전이하여 성능을 향상시킬 수 있습니다.
학습 시간 단축: 기존에 학습된 모델의 가중치를 초기화로 사용하면 초기 모델의 성능은 높아지며, 새로운 작업에 대한 학습 시간을 단축할 수 있습니다.
일반화 능력 향상: 기존 작업에서 학습한 모델은 이미 다양한 특징을 학습했으므로 이를 활용하여 새로운 작업에서의 성능을 향상시킬 수 있습니다.

전이 학습은 이미지 분류, 객체 검출, 자연어 처리 등 다양한 분야에서 활용되며, 사전 학습된 모델을 가져와서 적절한 수정 및 조정을 통해 새로운 작업에 맞게 재사용하는 것이 일반적인 접근 방식입니다.

14.2.1. Steps

In this section, we will introduce a common technique in transfer learning: fine-tuning. As shown in Fig. 14.2.1, fine-tuning consists of the following four steps:

이 섹션에서는 전이 학습의 일반적인 기술인 미세 조정을 소개합니다. 그림 14.2.1과 같이 미세 조정은 다음 네 단계로 구성됩니다.

When target datasets are much smaller than source datasets, fine-tuning helps to improve models’ generalization ability.

대상 데이터 세트가 소스 데이터 세트보다 훨씬 작을 때 미세 조정하면 모델의 일반화 능력을 향상시키는 데 도움이 됩니다.

14.2.2. Hot Dog Recognition

Let’s demonstrate fine-tuning via a concrete case: hot dog recognition. We will fine-tune a ResNet model on a small dataset, which was pretrained on the ImageNet dataset. This small dataset consists of thousands of images with and without hot dogs. We will use the fine-tuned model to recognize hot dogs from images.

핫도그 인식이라는 구체적인 사례를 통해 미세 조정을 시연해 보겠습니다. ImageNet 데이터 세트에서 사전 훈련된 작은 데이터 세트에서 ResNet 모델을 미세 조정합니다. 이 작은 데이터 세트는 핫도그가 있거나 없는 수천 개의 이미지로 구성됩니다. 미세 조정된 모델을 사용하여 이미지에서 핫도그를 인식합니다.

%matplotlib inline
import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l

이 코드는 딥러닝 모델을 구축하고 학습하기 위해 필요한 라이브러리와 설정을 불러오는 부분입니다. 각 줄의 역할을 살펴보겠습니다:

%matplotlib inline: 이 코드는 주피터 노트북 상에서 그래프를 인라인으로 표시하도록 설정하는 매직 명령어입니다. 따라서 그래프를 그릴 때 별도의 창이 아니라 노트북 셀 내에서 바로 보여집니다.
import os: os 모듈은 운영체제와 상호 작용하는 함수를 제공하는 파이썬 내장 모듈입니다. 파일 경로 조작, 환경 변수 설정 등에 사용될 수 있습니다.
import torch: PyTorch 라이브러리를 불러옵니다. 딥러닝 모델을 구축하고 학습하기 위한 핵심 라이브러리입니다.
import torchvision: PyTorch에서 이미지 및 비디오 데이터셋, 변환 등을 처리하기 위한 도구를 제공하는 라이브러리입니다.
from torch import nn: PyTorch의 nn 모듈을 불러옵니다. 이 모듈은 신경망 계층을 정의하고 관리하는 데 사용됩니다.
from d2l import torch as d2l: d2l 모듈에서 torch 모듈을 가져와 d2l이라는 이름으로 사용합니다. 이것은 Dive into Deep Learning (D2L) 프로젝트의 파이썬 유틸리티 모듈로, 딥러닝 학습을 위한 함수 및 도우미 기능을 제공합니다.

이 코드 블록은 딥러닝 모델을 구축하고 학습하는 데 필요한 주요 라이브러리와 설정을 불러오는 역할을 합니다. 해당 코드 이후에 딥러닝 모델을 정의하고 데이터를 불러와 학습하는 등의 작업을 진행할 수 있습니다.

14.2.2.1. Reading the Dataset

The hot dog dataset we use was taken from online images. This dataset consists of 1400 positive-class images containing hot dogs, and as many negative-class images containing other foods. 1000 images of both classes are used for training and the rest are for testing.

우리가 사용하는 핫도그 데이터 세트는 온라인 이미지에서 가져왔습니다. 이 데이터 세트는 핫도그를 포함하는 1400개의 포지티브 클래스 이미지와 다른 음식을 포함하는 많은 네거티브 클래스 이미지로 구성됩니다. 두 클래스의 1000개 이미지는 훈련에 사용되고 나머지는 테스트에 사용됩니다.

After unzipping the downloaded dataset, we obtain two folders hotdog/train and hotdog/test. Both folders have hotdog and not-hotdog subfolders, either of which contains images of the corresponding class.

다운로드한 데이터 세트의 압축을 푼 후 hotdog/train 및 hotdog/test 폴더 두 개를 얻습니다. 두 폴더 모두 핫도그 하위 폴더와 핫도그가 아닌 하위 폴더가 있으며 둘 중 하나에는 해당 클래스의 이미지가 포함되어 있습니다.

#@save
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                         'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')

이 코드는 D2L (Dive into Deep Learning) 라이브러리의 데이터 허브에 새로운 데이터셋을 등록하고 데이터셋을 다운로드하고 압축을 해제하는 역할을 합니다. 각 줄의 역할을 살펴보겠습니다:

#@save: 이 주석은 코드 블록이 문서화를 위한 주석임을 나타냅니다. D2L에서는 이러한 주석을 사용하여 문서를 생성하는 데 활용합니다.
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip', 'fba480ffa8aa7e0febbb511d181409f899b9baa5'): 이 줄은 데이터 허브에 'hotdog'라는 이름으로 새로운 데이터셋을 등록하는 역할을 합니다. 등록된 데이터셋에 대한 정보 (URL과 해시)를 제공합니다. 이 정보를 통해 데이터셋을 다운로드하고 검증할 수 있습니다.
data_dir = d2l.download_extract('hotdog'): 이 코드는 등록한 'hotdog' 데이터셋을 다운로드하고 압축을 해제하여 로컬 디렉토리에 저장하는 역할을 합니다. 데이터셋의 다운로드 URL과 해시를 사용하여 데이터셋을 검증합니다. data_dir 변수에는 데이터셋이 저장된 로컬 디렉토리 경로가 저장됩니다.

이 코드 블록은 D2L 라이브러리의 데이터 허브에 새로운 데이터셋을 등록하고 해당 데이터셋을 다운로드하고 압축을 해제하여 사용할 수 있도록 준비하는 역할을 합니다.

Print number of images in the data_dir.

# Display the first 5 images
for i in range(5):
    img = mpimg.imread(image_files[i])
    plt.imshow(img)
    plt.axis('off')  # Turn off axis labels
    plt.show()

Number of images: 2800

We create two instances to read all the image files in the training and testing datasets, respectively.

교육 및 테스트 데이터 세트의 모든 이미지 파일을 읽기 위해 각각 두 개의 인스턴스를 만듭니다.

train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))

이 코드는 'hotdog' 데이터셋의 학습 및 테스트 이미지들을 로드하는 역할을 합니다. 이해하기 쉽게 코드를 분석해보겠습니다:

#@save: 이 주석은 코드 블록이 문서화를 위한 주석임을 나타냅니다.
train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train')): 이 줄은 학습 이미지들이 포함된 디렉토리에서 이미지를 로드하여 train_imgs에 저장합니다. ImageFolder는 디렉토리 구조를 기반으로 이미지 데이터셋을 생성합니다. 학습 이미지들은 'train' 서브디렉토리에 저장되어 있습니다.
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test')): 이 줄은 테스트 이미지들을 로드하여 test_imgs에 저장합니다. 테스트 이미지들은 'test' 서브디렉토리에 저장되어 있습니다.

이렇게 코드를 실행하면 'hotdog' 데이터셋의 학습 이미지와 테스트 이미지를 train_imgs와 test_imgs에 로드하게 됩니다. 'ImageFolder' 클래스는 데이터셋을 쉽게 관리하고 처리할 수 있도록 도와주는 유용한 도구입니다.

The first 8 positive examples and the last 8 negative images are shown below. As you can see, the images vary in size and aspect ratio.

처음 8개의 긍정적인 예와 마지막 8개의 부정적인 이미지가 아래에 나와 있습니다. 보시다시피 이미지의 크기와 종횡비가 다릅니다.

hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4);

이 코드는 d2l (Dive into Deep Learning) 라이브러리의 d2l.show_images 함수를 사용하여 이미지 그리드를 표시하는 코드입니다. 이 함수는 이미지 목록을 격자 레이아웃으로 표시하는 데 사용됩니다.

코드의 각 부분이 하는 일은 다음과 같습니다.

hotdogs = [train_imgs[i][0] for i in range(8)]: 이 줄은 train_imgs 데이터셋에서 처음 8개의 이미지를 담은 hotdogs 리스트를 생성합니다. hotdogs의 각 요소는 이미지를 나타내는 텐서입니다.
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]: 이 줄은 train_imgs 데이터셋에서 마지막 8개의 이미지를 담은 not_hotdogs 리스트를 생성합니다. 마찬가지로, not_hotdogs의 각 요소는 이미지를 나타내는 텐서입니다.
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4);: 이 줄은 d2l.show_images 함수를 사용하여 이미지 그리드를 표시합니다. 표시할 이미지로 hotdogs와 not_hotdogs 리스트를 연결합니다. 2와 8 인자는 격자가 2행 8열을 가지도록 지정합니다. scale=1.4 인자는 표시되는 이미지의 크기를 조정하여 더 나은 가시성을 확보합니다.

요약하면, 이 코드는 d2l.show_images 함수를 사용하여 그리드 형태로 16개의 이미지를 표시합니다. 처음 8개 이미지는 핫도그이고, 마지막 8개 이미지는 핫도그가 아닌 이미지입니다.

During training, we first crop a random area of random size and random aspect ratio from the image, and then scale this area to a 224×224 input image. During testing, we scale both the height and width of an image to 256 pixels, and then crop a central 224×224 area as input. In addition, for the three RGB (red, green, and blue) color channels we standardize their values channel by channel. Concretely, the mean value of a channel is subtracted from each value of that channel and then the result is divided by the standard deviation of that channel.

학습하는 동안 먼저 이미지에서 임의의 크기와 가로 세로 비율의 임의 영역을 자른 다음 이 영역을 224×224 입력 이미지로 조정합니다. 테스트하는 동안 이미지의 높이와 너비를 모두 256픽셀로 조정한 다음 중앙 224×224 영역을 입력으로 자릅니다. 또한 3개의 RGB(빨간색, 녹색 및 파란색) 색상 채널의 경우 해당 값을 채널별로 표준화합니다. 구체적으로, 채널의 평균값을 해당 채널의 각 값에서 뺀 다음 결과를 해당 채널의 표준 편차로 나눕니다.

# Specify the means and standard deviations of the three RGB channels to
# standardize each channel
normalize = torchvision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])

test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize([256, 256]),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])

이 코드는 이미지 데이터 전처리를 위한 변환들을 정의하는 부분입니다. 데이터셋을 학습 및 테스트용으로 나눈 후에, 각 데이터셋의 이미지에 대해 수행할 전처리 작업을 설정하는데 사용됩니다. 코드의 각 줄이 하는 역할을 살펴보겠습니다.

normalize = torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]): 이 줄은 이미지 정규화를 위한 변환을 정의합니다. 정규화는 각 채널의 평균과 표준 편차를 사용하여 RGB 이미지를 정규화합니다. 이러한 정규화는 모델의 학습 안정성과 성능을 향상시키는 데 도움이 됩니다.
train_augs = torchvision.transforms.Compose([...]): 이 줄은 학습 데이터에 적용할 전처리 과정을 정의합니다. torchvision.transforms.Compose 함수는 여러 개의 전처리 함수를 하나로 연결하여 적용할 수 있게 해줍니다. 학습 데이터에 적용되는 변환은 다음과 같습니다:
- RandomResizedCrop(224): 이미지를 임의의 크기로 자르고 224x224 크기로 크롭합니다.
- RandomHorizontalFlip(): 이미지를 수평으로 무작위로 뒤집습니다.
- ToTensor(): 이미지를 텐서 형식으로 변환합니다.
- normalize: 앞서 정의한 정규화를 적용합니다.
test_augs = torchvision.transforms.Compose([...]): 이 줄은 테스트 데이터에 적용할 전처리 과정을 정의합니다. 테스트 데이터에 적용되는 변환은 다음과 같습니다:
- Resize([256, 256]): 이미지 크기를 256x256으로 조정합니다.
- CenterCrop(224): 이미지를 중앙을 기준으로 224x224 크기로 크롭합니다.
- ToTensor(): 이미지를 텐서 형식으로 변환합니다.
- normalize: 정규화를 적용합니다.

이렇게 정의된 train_augs와 test_augs는 학습 및 테스트 데이터셋에서 이미지에 적용될 전처리 변환들을 구성합니다.

14.2.2.2. Defining and Initializing the Model

We use ResNet-18, which was pretrained on the ImageNet dataset, as the source model. Here, we specify pretrained=True to automatically download the pretrained model parameters. If this model is used for the first time, Internet connection is required for download.

ImageNet 데이터 세트에서 사전 훈련된 ResNet-18을 소스 모델로 사용합니다. 여기에서 pretrained=True를 지정하여 사전 훈련된 모델 매개변수를 자동으로 다운로드합니다. 이 모델을 처음 사용하는 경우 다운로드를 위해 인터넷 연결이 필요합니다.

pretrained_net = torchvision.models.resnet18(pretrained=True)

이 코드는 torchvision 라이브러리의 resnet18 모델을 미리 학습된 가중치와 함께 불러오는 부분입니다.

torchvision.models.resnet18(pretrained=True)은 ResNet-18 아키텍처를 가져와서 미리 학습된 가중치를 사용하여 초기화한 모델을 생성합니다. ResNet-18은 18개의 층으로 구성된 딥 컨볼루션 신경망 아키텍처로, 이미지 분류와 같은 컴퓨터 비전 작업에서 널리 사용됩니다.

pretrained=True 매개변수는 미리 학습된 가중치를 사용하겠다는 의미입니다. 이렇게 하면 모델이 ImageNet 데이터셋에서 사전 학습된 가중치를 가져옵니다. ImageNet은 대규모 이미지 데이터셋으로, 다양한 카테고리에 속하는 이미지로 구성되어 있습니다. 이러한 사전 학습된 가중치는 초기화 단계에서 모델의 성능을 향상시키는 데 도움이 됩니다.

따라서 pretrained_net은 미리 학습된 ResNet-18 모델을 나타냅니다. 이 모델은 이미지 분류 작업에 사용될 수 있으며, 이미지 데이터를 입력으로 받아 각 클래스에 대한 확률 분포를 출력하는 역할을 할 수 있습니다.

사전 훈련된 소스 모델 인스턴스에는 여러 피처 레이어와 출력 레이어 fc가 포함되어 있습니다. 이 분할의 주요 목적은 출력 레이어를 제외한 모든 레이어의 모델 매개변수를 미세 조정하는 것입니다. 소스 모델의 멤버 변수 fc는 다음과 같습니다.

pretrained_net.fc

pretrained_net.fc는 미리 학습된 ResNet-18 모델의 마지막 fully connected (FC) 레이어를 나타냅니다.

ResNet-18 아키텍처의 마지막에는 fully connected 레이어가 존재하며, 이 레이어는 모델이 ImageNet 데이터셋과 같은 대규모 이미지 분류 작업에서 사용될 때 출력 클래스 수에 해당하는 뉴런을 가지고 있습니다. 이 fully connected 레이어는 이미지 특성을 활용하여 각 클래스에 대한 확률 분포를 생성합니다.

pretrained_net.fc를 호출하면 ResNet-18 모델의 마지막 fully connected 레이어가 반환됩니다. 이 레이어는 모델의 출력 역할을 하며, 이 레이어를 통과한 출력은 클래스 확률 분포를 나타냅니다. 이 fully connected 레이어를 변경하거나 새로운 데이터셋에 맞게 fine-tuning하는 등의 작업을 수행할 수 있습니다.

As a fully connected layer, it transforms ResNet’s final global average pooling outputs into 1000 class outputs of the ImageNet dataset. We then construct a new neural network as the target model. It is defined in the same way as the pretrained source model except that its number of outputs in the final layer is set to the number of classes in the target dataset (rather than 1000).

완전히 연결된 계층으로서 ResNet의 최종 글로벌 평균 풀링 출력을 ImageNet 데이터 세트의 1000개 클래스 출력으로 변환합니다. 그런 다음 대상 모델로 새로운 신경망을 구성합니다. 최종 계층의 출력 수가 대상 데이터 세트의 클래스 수(1000이 아닌)로 설정된다는 점을 제외하면 사전 훈련된 소스 모델과 동일한 방식으로 정의됩니다.

In the code below, the model parameters before the output layer of the target model instance finetune_net are initialized to model parameters of the corresponding layers from the source model. Since these model parameters were obtained via pretraining on ImageNet, they are effective. Therefore, we can only use a small learning rate to fine-tune such pretrained parameters. In contrast, model parameters in the output layer are randomly initialized and generally require a larger learning rate to be learned from scratch. Letting the base learning rate be η, a learning rate of 10η will be used to iterate the model parameters in the output layer.

η - eta

아래 코드에서 대상 모델 인스턴스 finetune_net의 출력 레이어 앞의 모델 매개변수는 소스 모델에서 해당 레이어의 모델 매개변수로 초기화됩니다. 이러한 모델 매개변수는 ImageNet에서 사전 학습을 통해 얻었기 때문에 효과적입니다. 따라서 사전 훈련된 매개변수를 미세 조정하기 위해 작은 학습 속도만 사용할 수 있습니다. 반대로 출력 계층의 모델 매개변수는 무작위로 초기화되며 일반적으로 처음부터 학습하려면 더 큰 학습률이 필요합니다. 기본 학습 속도를 η로 하면 출력 레이어에서 모델 매개변수를 반복하는 데 10η의 학습 속도가 사용됩니다.

finetune_net = torchvision.models.resnet18(pretrained=True)
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2)
nn.init.xavier_uniform_(finetune_net.fc.weight);

위 코드는 미리 학습된 ResNet-18 아키텍처를 가져와서, 해당 아키텍처의 마지막 fully connected 레이어를 새로운 레이어로 교체하는 과정을 나타냅니다.

finetune_net = torchvision.models.resnet18(pretrained=True): 미리 학습된 ResNet-18 모델을 가져옵니다. 이 모델은 ImageNet 데이터셋에서 학습되어 다양한 이미지 분류 작업에 사용할 수 있는 기본 아키텍처와 가중치를 가지고 있습니다.
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2): 기존의 마지막 fully connected 레이어를 새로운 선형 레이어로 교체합니다. 이 때, finetune_net.fc.in_features는 기존 fully connected 레이어의 입력 feature 수를 나타냅니다. 여기서는 이를 유지한 채로 출력 클래스 수를 2로 설정하여 이진 분류 작업에 적합하도록 모델을 수정합니다.
nn.init.xavier_uniform_(finetune_net.fc.weight): 새로운 fully connected 레이어의 가중치를 Xavier 초기화 방법을 사용하여 초기화합니다. 이 초기화 방법은 신경망 가중치 초기화에 일반적으로 사용되는 방법 중 하나로, 초기 가중치를 효과적으로 설정해 네트워크의 학습을 원활하게 만듭니다.

이러한 수정 작업을 통해 ResNet-18 모델을 기존 분류 작업에서 새로운 이진 분류 작업에 맞게 fine-tuning할 수 있습니다. 새로운 fully connected 레이어는 2개의 클래스에 대한 확률 분포를 출력하도록 설정되어 이진 분류 작업을 수행할 수 있게 됩니다.

14.2.2.3. Fine-Tuning the Model

First, we define a training function train_fine_tuning that uses fine-tuning so it can be called multiple times.

먼저 여러 번 호출할 수 있도록 미세 조정을 사용하는 훈련 함수 train_fine_tuning을 정의합니다.

# If `param_group=True`, the model parameters in the output layer will be
# updated using a learning rate ten times greater
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    devices = d2l.try_all_gpus()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        params_1x = [param for name, param in net.named_parameters()
             if name not in ["fc.weight", "fc.bias"]]
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
                   devices)

위 코드는 fine-tuning을 통해 사전 훈련된 신경망 모델을 새로운 작업에 맞게 학습시키는 과정을 정의한 함수를 나타냅니다.

train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5, param_group=True): fine-tuning을 수행하기 위한 함수를 정의합니다. net은 모델, learning_rate는 학습률, batch_size는 배치 크기, num_epochs는 학습 에포크 수를 나타냅니다. param_group는 파라미터 그룹화 여부를 결정하는 매개변수로, True로 설정하면 출력 레이어의 가중치에 더 높은 학습률을 적용하는 방식을 사용합니다.
train_iter와 test_iter: 학습 및 테스트 데이터를 DataLoader로 불러오는 과정을 나타냅니다. ImageFolder를 사용하여 데이터셋을 생성하고, transform을 통해 이미지 변환 작업을 수행합니다.
devices = d2l.try_all_gpus(): 사용 가능한 GPU 디바이스를 모두 가져오는 과정입니다.
loss = nn.CrossEntropyLoss(reduction="none"): 손실 함수를 교차 엔트로피 손실로 정의합니다. reduction을 "none"으로 설정하면 배치 내 각 샘플에 대한 손실 값을 구할 수 있습니다.
if param_group:: param_group가 True인 경우, 모델 파라미터를 그룹화하여 학습률을 설정하는 방식을 사용합니다. 이 때, params_1x는 출력 레이어를 제외한 모든 파라미터를 가져오는 리스트입니다. 출력 레이어의 파라미터는 학습률을 10배로 높여서 적용합니다.
trainer = torch.optim.SGD(...): SGD (stochastic gradient descent, 확률적 경사 하강법) 최적화 알고리즘을 사용하여 학습을 수행하는 옵티마이저를 정의합니다. params_1x와 출력 레이어의 파라미터에 서로 다른 학습률을 적용하도록 설정합니다.
d2l.train_ch13(...): 정의한 학습 함수를 호출하여 fine-tuning 학습을 수행합니다. 학습과 테스트 데이터셋, 손실 함수, 옵티마이저 등을 인자로 전달하여 모델을 학습시킵니다.

We set the base learning rate to a small value in order to fine-tune the model parameters obtained via pretraining. Based on the previous settings, we will train the output layer parameters of the target model from scratch using a learning rate ten times greater.

사전 학습을 통해 얻은 모델 매개 변수를 미세 조정하기 위해 기본 학습 속도를 작은 값으로 설정했습니다. 이전 설정을 기반으로 10배 더 큰 학습률을 사용하여 대상 모델의 출력 계층 매개변수를 처음부터 훈련합니다.

train_fine_tuning(finetune_net, 5e-5)

위 코드는 fine-tuning을 통해 사전 훈련된 ResNet 모델인 finetune_net을 학습하는 과정을 나타냅니다.

train_fine_tuning(finetune_net, 5e-5): train_fine_tuning 함수를 호출하여 finetune_net 모델을 학습합니다. 5e-5는 학습률을 나타내며, 여기서는 상대적으로 작은 학습률로 설정되어 있습니다. 따라서 finetune_net 모델을 학습하면서 학습률을 조절하여 새로운 작업에 맞게 가중치를 조정하게 됩니다.

For comparison, we define an identical model, but initialize all of its model parameters to random values. Since the entire model needs to be trained from scratch, we can use a larger learning rate.

비교를 위해 동일한 모델을 정의하지만 모든 모델 매개변수를 임의의 값으로 초기화합니다. 전체 모델을 처음부터 훈련해야 하므로 더 큰 학습률을 사용할 수 있습니다.

scratch_net = torchvision.models.resnet18()
scratch_net.fc = nn.Linear(scratch_net.fc.in_features, 2)
train_fine_tuning(scratch_net, 5e-4, param_group=False)

위 코드는 "from scratch"로 새로운 ResNet 모델을 생성하고 학습하는 과정을 나타냅니다.

scratch_net = torchvision.models.resnet18(): 새로운 ResNet 모델을 생성합니다. 이 모델은 사전 훈련된 가중치 없이 초기화된 상태입니다.
scratch_net.fc = nn.Linear(scratch_net.fc.in_features, 2): 생성한 새로운 ResNet 모델의 Fully Connected (FC) 레이어를 수정합니다. 입력 특성 개수에 따라 출력 특성 개수가 2인 FC 레이어로 변경됩니다. 이것은 분류 문제에서 두 개의 클래스를 구분하는 데 사용됩니다.
train_fine_tuning(scratch_net, 5e-4, param_group=False): train_fine_tuning 함수를 호출하여 생성한 새로운 ResNet 모델 scratch_net을 학습합니다. 5e-4는 학습률을 나타내며, param_group=False로 설정되어 FC 레이어에 해당하는 가중치만 학습되며, 사전 훈련된 모델의 가중치는 고정됩니다. 이렇게 하면 새로운 작업에 맞게 FC 레이어만 학습됩니다.

As we can see, the fine-tuned model tends to perform better for the same epoch because its initial parameter values are more effective.

보시다시피 미세 조정된 모델은 초기 매개변수 값이 더 효과적이기 때문에 동일한 에포크에 대해 더 나은 성능을 보이는 경향이 있습니다.

14.2.3. Summary

Transfer learning transfers knowledge learned from the source dataset to the target dataset. Fine-tuning is a common technique for transfer learning.
전이 학습은 원본 데이터 세트에서 학습한 지식을 대상 데이터 세트로 이전합니다. 미세 조정은 전이 학습을 위한 일반적인 기술입니다.
The target model copies all model designs with their parameters from the source model except the output layer, and fine-tunes these parameters based on the target dataset. In contrast, the output layer of the target model needs to be trained from scratch.
대상 모델은 출력 레이어를 제외한 소스 모델의 매개변수가 있는 모든 모델 디자인을 복사하고 대상 데이터 세트를 기반으로 이러한 매개변수를 미세 조정합니다. 반대로 대상 모델의 출력 계층은 처음부터 학습해야 합니다.
Generally, fine-tuning parameters uses a smaller learning rate, while training the output layer from scratch can use a larger learning rate.
일반적으로 미세 조정 매개변수는 더 작은 학습률을 사용하는 반면 출력 계층을 처음부터 훈련하면 더 큰 학습률을 사용할 수 있습니다.

14.2.4. Exercises

Keep increasing the learning rate of finetune_net. How does the accuracy of the model change?
Further adjust hyperparameters of finetune_net and scratch_net in the comparative experiment. Do they still differ in accuracy?
Set the parameters before the output layer of finetune_net to those of the source model and do not update them during training. How does the accuracy of the model change? You can use the following code.

for param in finetune_net.parameters():
    param.requires_grad = False

4. In fact, there is a “hotdog” class in the ImageNet dataset. Its corresponding weight parameter in the output layer can be obtained via the following code. How can we leverage this weight parameter?

weight = pretrained_net.fc.weight
hotdog_w = torch.split(weight.data, 1, dim=0)[934]
hotdog_w.shape

torch.Size([1, 512])

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

D2L - 14.10. Transposed Convolution (1)	2023.08.21
D2L - 14.9. Semantic Segmentation and the Dataset (0)	2023.08.20
D2L - 14.8. Region-based CNNs (R-CNNs) (0)	2023.08.20
D2L - 14.7. Single Shot Multibox Detection (0)	2023.08.19
D2L - 14.6. The Object Detection Dataset (0)	2023.08.19
D2L - 14.5. Multiscale Object Detection (0)	2023.08.19
D2L - 14.4. Anchor Boxes (0)	2023.08.19
D2L - 14.3. Object Detection and Bounding Boxes (0)	2023.08.19
D2L - 14.1. Image Augmentation (0)	2023.08.19
D2L - 14. Computer Vision (0)	2023.08.18

Dive into Deep Learning/D2L Computer Vision

D2L - 14.1. Image Augmentation

2023. 8. 19. 00:31 | Posted by 솔웅

https://d2l.ai/chapter_computer-vision/image-augmentation.html

14.1. Image Augmentation — Dive into Deep Learning 1.0.0 documentation

d2l.ai

14.1. Image Augmentation

In Section 8.1, we mentioned that large datasets are a prerequisite for the success of deep neural networks in various applications. Image augmentation generates similar but distinct training examples after a series of random changes to the training images, thereby expanding the size of the training set. Alternatively, image augmentation can be motivated by the fact that random tweaks of training examples allow models to rely less on certain attributes, thereby improving their generalization ability. For example, we can crop an image in different ways to make the object of interest appear in different positions, thereby reducing the dependence of a model on the position of the object. We can also adjust factors such as brightness and color to reduce a model’s sensitivity to color. It is probably true that image augmentation was indispensable for the success of AlexNet at that time. In this section we will discuss this widely used technique in computer vision.

섹션 8.1에서 대규모 데이터 세트가 다양한 애플리케이션에서 심층 신경망의 성공을 위한 전제 조건이라고 언급했습니다. 이미지 확대는 트레이닝 이미지에 대한 일련의 무작위 변경 후에 유사하지만 별개의 트레이닝 예제를 생성하여 트레이닝 세트의 크기를 확장합니다. 또는 훈련 예제를 임의로 조정하면 모델이 특정 속성에 덜 의존하게 되므로 일반화 능력이 향상된다는 사실에 의해 이미지 확대가 동기가 될 수 있습니다. 예를 들어 관심 대상이 다른 위치에 나타나도록 다양한 방법으로 이미지를 잘라 대상 위치에 대한 모델의 의존성을 줄일 수 있습니다. 밝기 및 색상과 같은 요소를 조정하여 색상에 대한 모델의 민감도를 줄일 수도 있습니다. 당시 AlexNet의 성공을 위해 이미지 확대가 필수 불가결한 것은 아마도 사실일 것입니다. 이 섹션에서는 컴퓨터 비전에서 널리 사용되는 이 기술에 대해 설명합니다.

Image Augmentation이란?

Image augmentation refers to the process of applying various transformations to images in order to create new variations of the original images. This technique is commonly used in machine learning and computer vision tasks, especially in training deep learning models for image recognition, object detection, and other tasks. The goal of image augmentation is to increase the diversity and variability of the training dataset, which can lead to improved model generalization and performance on unseen data.

이미지 증강은 원본 이미지에 다양한 변환을 적용하여 새로운 변형된 이미지를 생성하는 과정을 말합니다. 이 기술은 주로 기계 학습과 컴퓨터 비전 작업에서 사용되며, 특히 이미지 인식, 물체 감지 및 기타 작업에 대한 딥 러닝 모델을 훈련할 때 많이 활용됩니다. 이미지 증강의 목표는 훈련 데이터셋의 다양성과 가변성을 증가시켜 모델의 일반화 성능과 새로운 데이터에서의 성능을 향상시키는 것입니다.

Image augmentation techniques involve making small, controlled changes to the images while preserving their semantic content. Some common image augmentation techniques include:

이미지 증강 기법은 이미지의 시맨틱 콘텐츠를 보존하면서 작은 제어된 변화를 가하는 것을 포함합니다. 일반적인 이미지 증강 기술로는 다음과 같은 것들이 있습니다:

Horizontal and Vertical Flips: Mirroring the image horizontally or vertically to create variations.
수평 및 수직 뒤집기: 이미지를 수평 또는 수직으로 뒤집어서 변형을 만듭니다.
Rotation: Rotating the image by a certain degree.
회전: 이미지를 일정한 각도로 회전합니다.
Zoom: Enlarging or shrinking the image slightly.
확대 및 축소: 이미지를 약간 확대하거나 축소합니다.
Brightness and Contrast Adjustment: Changing the brightness and contrast levels of the image.
밝기 및 대비 조절: 이미지의 밝기와 대비 수준을 조정합니다.
Color Jittering: Applying small color changes to the image.
색상 조절: 이미지에 작은 색상 변화를 적용합니다.
Noise Addition: Adding small amounts of noise to the image.
노이즈 추가: 이미지에 작은 양의 노이즈를 추가합니다.
Random Cropping: Cropping a random portion of the image.
임의의 자르기: 이미지의 임의 부분을 자릅니다.
Elastic Transformations: Applying elastic deformations to the image.
탄성 변형: 이미지에 탄성 변형을 적용합니다.

These transformations help the model learn to be invariant to small changes in the input data, making it more robust and accurate when applied to real-world data.

이러한 변환은 모델이 입력 데이터의 작은 변화에 불변하게 학습하도록 도와줍니다. 이로써 모델은 실제 세계 데이터에 적용할 때 더 견고하고 정확하게 작동할 수 있게 됩니다.

In the context of deep learning and neural networks, image augmentation is often used during the training process to prevent overfitting, improve model generalization, and enhance the model's ability to handle various conditions and scenarios.

딥 러닝과 신경망의 맥락에서 이미지 증강image augmentation은 종종 훈련 과정 중에 사용되어 과적합을 방지하고 모델의 일반화 성능을 개선하며 다양한 조건과 시나리오를 처리할 수 있는 능력을 향상시킵니다.

%matplotlib inline
import torch
import torchvision
from torch import nn
from d2l import torch as d2l

14.1.1. Common Image Augmentation Methods

In our investigation of common image augmentation methods, we will use the following 400×500 image an example.

일반적인 이미지 확대 방법을 조사할 때 다음 400×500 이미지를 예로 사용합니다.

d2l.set_figsize()
img = d2l.Image.open('../img/cat1.jpg')
d2l.plt.imshow(img);

위 코드는 주피터 노트북에서 사용되는 코드로, 주어진 이미지 파일을 열어서 화면에 보여주는 예제입니다. 코드를 한 줄씩 살펴보겠습니다:

%matplotlib inline: 이 라인은 주피터 노트북에서 그래프나 이미지 등을 인라인으로 표시하기 위한 명령입니다. 이를 통해 그래프나 이미지를 코드 셀 아래에 바로 표시할 수 있습니다.
import torch: 파이토치 라이브러리를 가져옵니다.
import torchvision: 파이토치의 torchvision 패키지를 가져옵니다. torchvision은 이미지 데이터셋 및 변환을 다루는 라이브러리입니다.
from torch import nn: 파이토치의 nn 모듈에서 nn을 가져옵니다. nn은 신경망 모델을 구축하는 데 사용되는 모듈들을 포함하고 있습니다.
from d2l import torch as d2l: d2l 라이브러리에서 torch 모듈을 가져와 d2l이라는 이름으로 사용합니다.
d2l.set_figsize(): 그래프의 크기를 설정하는 d2l 라이브러리의 함수를 호출합니다.
img = d2l.Image.open('../img/cat1.jpg'): d2l 라이브러리의 Image 모듈을 사용하여 '../img/cat1.jpg' 경로에 있는 이미지 파일을 엽니다. img 변수에 이미지 객체가 저장됩니다.
d2l.plt.imshow(img): d2l 라이브러리의 plt 모듈을 사용하여 이미지를 표시합니다. imshow 함수를 호출하여 img를 표시하고, ;를 사용하여 출력 결과를 숨깁니다.

위 코드는 이미지 파일 '../img/cat1.jpg'를 열어서 주피터 노트북에서 화면에 표시하는 예제입니다. 이를 실행하면 해당 경로의 이미지가 출력되게 됩니다.

Most image augmentation methods have a certain degree of randomness. To make it easier for us to observe the effect of image augmentation, next we define an auxiliary function apply. This function runs the image augmentation method aug multiple times on the input image img and shows all the results.

대부분의 이미지 확대 image augmentation 방법에는 어느 정도의 무작위성이 있습니다. 이미지 확대 image augmentation 효과를 더 쉽게 관찰할 수 있도록 다음으로 보조 함수 적용을 정의합니다. 이 함수는 입력 이미지 img에서 이미지 확대 image augmentation 방법 aug를 여러 번 실행하고 모든 결과를 표시합니다.

def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):
    Y = [aug(img) for _ in range(num_rows * num_cols)]
    d2l.show_images(Y, num_rows, num_cols, scale=scale)

위 코드는 이미지 변환 (Augmentation) 함수를 적용하여 변환된 이미지들을 그리드 형태로 표시하는 함수를 정의한 것입니다. 코드를 한 줄씩 살펴보겠습니다:

def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):: apply라는 함수를 정의합니다. 이 함수는 세 개의 인자를 받습니다. 첫 번째 인자 img는 입력 이미지입니다. 두 번째 인자 aug는 이미지 변환 함수를 나타냅니다. num_rows와 num_cols는 그리드에 표시할 행과 열의 개수를 지정하며, scale은 이미지의 크기를 조절하는 인자입니다.
Y = [aug(img) for _ in range(num_rows * num_cols)]: aug 함수를 num_rows * num_cols번 반복하여 변환된 이미지들을 리스트 Y에 저장합니다. 각 반복마다 aug(img)를 호출하여 이미지를 변환합니다.
d2l.show_images(Y, num_rows, num_cols, scale=scale): 변환된 이미지 리스트 Y를 num_rows 행과 num_cols 열로 나타내는 그리드 형태로 표시합니다. 이미지의 크기는 scale 인자를 사용하여 조절합니다.

즉, 위 함수 apply는 입력 이미지를 주어진 변환 함수로 여러 번 변환하여 그리드 형태로 출력하는 역할을 수행합니다. 이를 통해 이미지 변환의 효과를 시각적으로 확인할 수 있습니다.

14.1.1.1. Flipping and Cropping

Flipping the image left and right usually does not change the category of the object. This is one of the earliest and most widely used methods of image augmentation. Next, we use the transforms module to create the RandomHorizontalFlip instance, which flips an image left and right with a 50% chance.

이미지를 좌우로 뒤집는 것은 일반적으로 개체의 범주를 변경하지 않습니다. 이것은 이미지 확대의 가장 초기이자 가장 널리 사용되는 방법 중 하나입니다. 다음으로 변환transforms 모듈을 사용하여 이미지를 50%의 확률로 좌우로 뒤집는 RandomHorizontalFlip 인스턴스를 만듭니다.

apply(img, torchvision.transforms.RandomHorizontalFlip())

위 코드는 이미지 변환 함수를 적용하여 입력 이미지를 수평으로 뒤집은 후 변환된 이미지들을 그리드 형태로 표시하는 작업을 수행합니다. 코드를 한 줄씩 살펴보겠습니다:

apply(img, torchvision.transforms.RandomHorizontalFlip()): apply 함수를 호출하여 입력 이미지 img에 torchvision.transforms.RandomHorizontalFlip() 변환 함수를 적용합니다. 이 변환 함수는 입력 이미지를 무작위로 수평으로 뒤집습니다. 즉, 이미지의 왼쪽과 오른쪽이 뒤바뀝니다.

위 코드는 apply 함수를 사용하여 입력 이미지를 수평으로 뒤집은 후 변환된 이미지들을 그리드 형태로 표시하는 작업을 수행합니다. 이를 통해 이미지 뒤집기 변환의 효과를 시각적으로 확인할 수 있습니다.

Flipping up and down is not as common as flipping left and right. But at least for this example image, flipping up and down does not hinder recognition. Next, we create a RandomVerticalFlip instance to flip an image up and down with a 50% chance.

위아래로 뒤집는 것은 좌우로 뒤집는 것만큼 일반적이지 않습니다. 그러나 적어도 이 예제 이미지의 경우 위아래로 뒤집는 것이 인식을 방해하지 않습니다. 다음으로 RandomVerticalFlip 인스턴스를 생성하여 50% 확률로 이미지를 위아래로 뒤집습니다.

apply(img, torchvision.transforms.RandomVerticalFlip())

위 코드는 입력 이미지에 수직으로 무작위로 뒤집기 변환을 적용하고, 이를 그리드 형태로 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

apply(img, torchvision.transforms.RandomVerticalFlip()): apply 함수를 호출하여 입력 이미지 img에 torchvision.transforms.RandomVerticalFlip() 변환 함수를 적용합니다. 이 변환 함수는 입력 이미지를 무작위로 수직으로 뒤집습니다. 즉, 이미지의 위쪽과 아래쪽이 뒤바뀝니다.

이 코드는 입력 이미지를 수직으로 뒤집은 후 변환된 이미지들을 그리드 형태로 표시합니다. 이를 통해 이미지 뒤집기 변환의 효과를 시각적으로 확인할 수 있습니다.

In the example image we used, the cat is in the middle of the image, but this may not be the case in general. In Section 7.5, we explained that the pooling layer can reduce the sensitivity of a convolutional layer to the target position. In addition, we can also randomly crop the image to make objects appear in different positions in the image at different scales, which can also reduce the sensitivity of a model to the target position.

우리가 사용한 예제 이미지에서 고양이는 이미지 중앙에 있지만 일반적으로 그렇지 않을 수 있습니다. 7.5절에서 풀링 레이어가 목표 위치에 대한 컨볼루션 레이어의 민감도를 감소시킬 수 있다고 설명했습니다. 또한 이미지를 무작위로 잘라 개체가 이미지의 다른 위치에 다른 배율로 나타나도록 할 수도 있습니다. 이렇게 하면 대상 위치에 대한 모델의 민감도를 줄일 수도 있습니다.

In the code below, we randomly crop an area with an area of 10%∼100% of the original area each time, and the ratio of width to height of this area is randomly selected from 0.5∼2. Then, the width and height of the region are both scaled to 200 pixels. Unless otherwise specified, the random number between a and b in this section refers to a continuous value obtained by random and uniform sampling from the interval [a,b].

아래 코드에서는 매번 원래 영역의 10%~100% 영역을 무작위로 자르고 이 영역의 너비와 높이의 비율을 0.5~2 중에서 무작위로 선택합니다. 그런 다음 영역의 너비와 높이가 모두 200픽셀로 조정됩니다. 달리 명시하지 않는 한, 이 섹션에서 a와 b 사이의 난수는 간격 [a,b]에서 무작위로 균일하게 샘플링하여 얻은 연속 값을 나타냅니다.

shape_aug = torchvision.transforms.RandomResizedCrop(
    (200, 200), scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)

위 코드는 입력 이미지에 무작위 크기와 종횡비로 자르기(resize crop) 변환을 적용하고, 이를 그리드 형태로 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

shape_aug = torchvision.transforms.RandomResizedCrop((200, 200), scale=(0.1, 1), ratio=(0.5, 2)): torchvision.transforms.RandomResizedCrop() 변환 함수를 사용하여 shape_aug라는 변수에 무작위 크기로 자르기 변환을 생성합니다. 이 함수는 입력 이미지를 주어진 크기로 자르고, 무작위로 크기와 종횡비를 변화시킵니다. 여기서 (200, 200)은 자르고자 하는 크기를 의미하며, scale=(0.1, 1)은 무작위로 선택할 크기의 범위를 나타내고, ratio=(0.5, 2)는 무작위로 선택할 종횡비의 범위를 나타냅니다.
apply(img, shape_aug): apply 함수를 호출하여 입력 이미지 img에 shape_aug로 정의된 변환 함수를 적용합니다. 이를 통해 입력 이미지에 무작위 크기와 종횡비로 자르기 변환을 적용한 이미지들을 그리드 형태로 표시합니다.

이 코드는 이미지를 무작위 크기와 종횡비로 자르는 변환을 적용하여 이미지의 다양한 부분을 추출하고 시각화하는 역할을 합니다.

14.1.1.2. Changing Colors

Another augmentation method is changing colors. We can change four aspects of the image color: brightness, contrast, saturation, and hue. In the example below, we randomly change the brightness of the image to a value between 50% (1−0.5) and 150% (1+0.5) of the original image.

또 다른 증강 방법은 색상을 변경하는 것입니다. 이미지 색상의 네 가지 측면인 밝기, 대비, 채도 및 색조를 변경할 수 있습니다. 아래 예에서는 이미지의 밝기를 원본 이미지의 50%(1−0.5)와 150%(1+0.5) 사이의 값으로 무작위로 변경합니다.

apply(img, torchvision.transforms.ColorJitter(
    brightness=0.5, contrast=0, saturation=0, hue=0))

위 코드는 입력 이미지에 색상 조정 변환을 적용하고, 이를 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

torchvision.transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0): torchvision.transforms.ColorJitter() 변환 함수를 사용하여 색상 조정 변환을 생성합니다. 이 함수는 입력 이미지의 색상을 조정하여 다양한 효과를 줄 수 있습니다. 여기서 brightness, contrast, saturation, hue 매개변수를 조정하여 각각 밝기, 대비, 채도, 색조를 조정할 수 있습니다. 각 매개변수의 값이 0보다 크면 변환이 적용됩니다.
apply(img, torchvision.transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0)): apply 함수를 호출하여 입력 이미지 img에 색상 조정 변환 함수를 적용합니다. 이를 통해 입력 이미지에 다양한 색상 조정 효과를 적용한 이미지들을 그리드 형태로 표시합니다.

이 코드는 이미지의 색상을 다양하게 변화시켜 시각화하는 역할을 합니다. 여기서는 밝기를 조정하고 대비, 채도, 색조는 조정하지 않습니다.

Similarly, we can randomly change the hue of the image.

마찬가지로 이미지의 색조를 무작위로 변경할 수 있습니다.

apply(img, torchvision.transforms.ColorJitter(
    brightness=0, contrast=0, saturation=0, hue=0.5))

위 코드는 입력 이미지에 대한 torchvision.transforms.ColorJitter() 변환을 사용하여 색상 조정 변환을 적용하고, 이를 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5): torchvision.transforms.ColorJitter() 변환 함수를 사용하여 색상 조정 변환을 생성합니다. 여기서 brightness, contrast, saturation 매개변수를 0으로 설정하고, hue 매개변수를 0.5로 설정하여 색조에 대한 조정만 적용될 수 있도록 합니다. hue 값이 0.5이므로 이미지의 색조가 최대 ±0.5만큼 변경됩니다.
apply(img, torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5)): apply 함수를 호출하여 입력 이미지 img에 색상 조정 변환 함수를 적용합니다. 이를 통해 입력 이미지에 다양한 색조 조정 효과를 적용한 이미지들을 그리드 형태로 표시합니다.

이 코드는 이미지의 색조를 다양하게 변화시켜 시각화하는 역할을 합니다.

We can also create a RandomColorJitter instance and set how to randomly change the brightness, contrast, saturation, and hue of the image at the same time.

또한 RandomColorJitter 인스턴스를 생성하고 동시에 이미지의 밝기, 대비, 채도 및 색조를 무작위로 변경하는 방법을 설정할 수 있습니다.

color_aug = torchvision.transforms.ColorJitter(
    brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
apply(img, color_aug)

위 코드는 torchvision.transforms.ColorJitter() 변환을 사용하여 입력 이미지에 다양한 색상 조정을 적용하고, 이를 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5): torchvision.transforms.ColorJitter() 변환 함수를 사용하여 색상 조정 변환을 생성합니다. 여기서 brightness, contrast, saturation 매개변수를 0.5로 설정하고, hue 매개변수를 0.5로 설정하여 밝기, 대비, 채도, 색조에 대한 조정을 적용할 수 있도록 합니다. 이렇게 설정하면 입력 이미지의 색상이 다양하게 변화됩니다.
apply(img, color_aug): apply 함수를 호출하여 입력 이미지 img에 색상 조정 변환 함수 color_aug를 적용합니다. 이를 통해 입력 이미지에 다양한 색상 조정 효과를 적용한 이미지들을 그리드 형태로 표시합니다.

이 코드는 입력 이미지의 밝기, 대비, 채도, 색조를 다양하게 변화시켜 시각화하는 역할을 합니다.

14.1.1.3. Combining Multiple Image Augmentation Methods

In practice, we will combine multiple image augmentation methods. For example, we can combine the different image augmentation methods defined above and apply them to each image via a Compose instance.

실제로는 여러 이미지 확대 방법을 결합합니다. 예를 들어 위에서 정의한 다양한 이미지 확대 방법을 결합하고 Compose 인스턴스를 통해 각 이미지에 적용할 수 있습니다.

all_images = torchvision.datasets.CIFAR10(train=True, root="../data",
                                          download=True)
d2l.show_images([all_images[i][0] for i in range(32)], 4, 8, scale=0.8);

위 코드는 CIFAR-10 데이터셋에서 이미지를 가져와 시각화하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

torchvision.datasets.CIFAR10(train=True, root="../data", download=True): CIFAR-10 데이터셋을 로드합니다. train 매개변수를 True로 설정하여 학습 데이터셋을 로드하며, root 매개변수에 데이터의 저장 경로를 지정합니다. download 매개변수를 True로 설정하면 데이터셋을 다운로드합니다.
[all_images[i][0] for i in range(32)]: CIFAR-10 데이터셋에서 처음부터 32개의 이미지를 가져와 리스트로 만듭니다. 각 이미지 데이터는 (이미지, 레이블) 형태로 저장되어 있으며, 여기서는 이미지 데이터만을 선택합니다.
d2l.show_images([...], 4, 8, scale=0.8): d2l.show_images() 함수를 사용하여 이미지 리스트를 그리드 형태로 시각화합니다. 4는 그리드의 행 수, 8은 그리드의 열 수를 나타냅니다. scale 매개변수는 이미지 크기의 스케일을 조절하는 역할을 합니다.

이 코드는 CIFAR-10 데이터셋에서 가져온 처음 32개의 이미지를 4x8 그리드로 시각화하여 보여줍니다.

In order to obtain definitive results during prediction, we usually only apply image augmentation to training examples, and do not use image augmentation with random operations during prediction. Here we only use the simplest random left-right flipping method. In addition, we use a ToTensor instance to convert a minibatch of images into the format required by the deep learning framework, i.e., 32-bit floating point numbers between 0 and 1 with the shape of (batch size, number of channels, height, width).

예측 중에 결정적인 결과를 얻기 위해 일반적으로 훈련 예제에만 이미지 확대를 적용하고 예측 중에 임의 작업으로 이미지 확대를 사용하지 않습니다. 여기서는 가장 간단한 임의의 좌우 뒤집기 방법만 사용합니다. 또한 ToTensor 인스턴스를 사용하여 이미지의 미니 배치를 딥 러닝 프레임워크에 필요한 형식, 즉 (배치 크기, 채널 수, 높이, 너비).

train_augs = torchvision.transforms.Compose([
     torchvision.transforms.RandomHorizontalFlip(),
     torchvision.transforms.ToTensor()])

test_augs = torchvision.transforms.Compose([
     torchvision.transforms.ToTensor()])

위 코드는 데이터 증강을 위한 데이터 변환 함수들을 정의하는 작업을 수행합니다. 코드를 단계별로 설명하겠습니다:

train_augs: 학습 데이터에 적용할 데이터 증강 변환 함수들을 조합한 객체를 생성합니다. torchvision.transforms.Compose() 함수를 사용하여 여러 변환 함수들을 순차적으로 적용할 수 있습니다.
- torchvision.transforms.RandomHorizontalFlip(): 무작위로 이미지를 수평으로 뒤집는 데이터 증강을 수행합니다. 이는 이미지를 좌우로 반전시키는 효과를 줍니다.
- torchvision.transforms.ToTensor(): 이미지를 텐서 형태로 변환합니다. 이 작업은 이미지 데이터를 신경망에 입력하기 위해 필요한 전처리 단계 중 하나입니다.
test_augs: 테스트 데이터에 적용할 데이터 변환 함수들을 조합한 객체를 생성합니다. 여기서는 테스트 데이터에는 이미지 뒤집기 등의 변환을 적용하지 않고, 단순히 이미지를 텐서로 변환하는 ToTensor() 변환만 적용합니다.

이렇게 정의된 train_augs와 test_augs 객체는 학습 및 테스트 데이터에 데이터 증강 및 전처리를 적용할 때 사용됩니다. 학습 데이터에는 수평 뒤집기와 이미지를 텐서로 변환하는 작업이 수행되며, 테스트 데이터에는 이미지를 텐서로 변환하는 작업만 수행됩니다.

Next, we define an auxiliary function to facilitate reading the image and applying image augmentation. The transform argument provided by PyTorch’s dataset applies augmentation to transform the images. For a detailed introduction to DataLoader, please refer to Section 4.2.

다음으로 이미지 읽기 및 이미지 확대 적용을 용이하게 하는 보조 기능을 정의합니다. PyTorch의 데이터 세트에서 제공하는 변환 인수는 증강을 적용하여 이미지를 변환합니다. DataLoader에 대한 자세한 소개는 섹션 4.2를 참조하십시오.

def load_cifar10(is_train, augs, batch_size):
    dataset = torchvision.datasets.CIFAR10(root="../data", train=is_train,
                                           transform=augs, download=True)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                    shuffle=is_train, num_workers=d2l.get_dataloader_workers())
    return dataloader

위 코드는 CIFAR-10 데이터셋을 로드하는 함수인 load_cifar10를 정의하는 부분입니다. 코드를 단계별로 설명하겠습니다:

is_train: 학습 데이터인지 테스트 데이터인지를 나타내는 인자입니다. True이면 학습 데이터를, False이면 테스트 데이터를 로드합니다.
augs: 데이터 증강 및 전처리를 적용하기 위한 데이터 변환 함수를 받는 인자입니다. 앞서 정의한 train_augs 또는 test_augs와 같은 변환 함수 객체를 여기에 전달합니다.
batch_size: 배치 크기를 나타내는 인자입니다. 데이터를 미니배치로 나눠서 로드할 때 한 번에 로드되는 데이터 샘플의 개수입니다.
torchvision.datasets.CIFAR10(): CIFAR-10 데이터셋을 로드하는 함수입니다. 해당 데이터셋은 지정된 경로에서 이미 다운로드된 경우 로드하며, 그렇지 않은 경우 다운로드합니다. 데이터셋은 CIFAR-10의 학습 데이터 또는 테스트 데이터 중 하나를 선택하여 로드합니다.
transform: 데이터 변환 함수 객체를 전달합니다. 이를 통해 데이터 증강 및 전처리 작업을 수행합니다.
torch.utils.data.DataLoader(): 데이터셋을 미니배치 형태로 로드하기 위한 데이터 로더 객체를 생성합니다. 배치 크기와 데이터 순서를 지정하여 데이터를 로드하며, 필요한 경우 데이터 로드를 병렬화합니다.
d2l.get_dataloader_workers(): 데이터 로더를 병렬화할 때 사용할 워커의 수를 반환하는 함수입니다. 데이터 로드 성능을 향상시키기 위해 병렬 처리에 사용됩니다.
반환값: 로드된 데이터셋을 배치 단위로 로드하는 데이터 로더 객체를 반환합니다. 이를 통해 학습 또는 테스트 데이터를 미니배치 형태로 반복해서 사용할 수 있습니다.

14.1.2.1. Multi-GPU Training

#@save
def train_batch_ch13(net, X, y, loss, trainer, devices):
    """Train for a minibatch with multiple GPUs (defined in Chapter 13)."""
    if isinstance(X, list):
        # Required for BERT fine-tuning (to be covered later)
        X = [x.to(devices[0]) for x in X]
    else:
        X = X.to(devices[0])
    y = y.to(devices[0])
    net.train()
    trainer.zero_grad()
    pred = net(X)
    l = loss(pred, y)
    l.sum().backward()
    trainer.step()
    train_loss_sum = l.sum()
    train_acc_sum = d2l.accuracy(pred, y)
    return train_loss_sum, train_acc_sum

#@save
def train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
               devices=d2l.try_all_gpus()):
    """Train a model with multiple GPUs (defined in Chapter 13)."""
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            l, acc = train_batch_ch13(
                net, features, labels, loss, trainer, devices)
            metric.add(l, acc, labels.shape[0], labels.numel())
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[2], metric[1] / metric[3],
                              None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {metric[0] / metric[2]:.3f}, train acc '
          f'{metric[1] / metric[3]:.3f}, test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec on '
          f'{str(devices)}')

위 코드는 다중 GPU를 활용하여 모델을 학습하는 함수인 train_ch13와 해당 함수 내에서 호출되는 train_batch_ch13 함수를 정의하는 부분입니다. 코드를 단계별로 설명하겠습니다:

train_batch_ch13(net, X, y, loss, trainer, devices): 다중 GPU로 미니배치 학습을 수행하는 함수입니다. 인자로는 모델 net, 입력 데이터 X, 레이블 데이터 y, 손실 함수 loss, 옵티마이저 trainer, 그리고 사용할 GPU 디바이스들의 리스트 devices가 전달됩니다. 이 함수는 미니배치를 처리하고, 손실 계산, 역전파, 가중치 갱신 등의 학습 과정을 수행합니다.
train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices=d2l.try_all_gpus()): 다중 GPU로 모델을 학습하는 함수입니다. 인자로는 모델 net, 학습 데이터 로더 train_iter, 테스트 데이터 로더 test_iter, 손실 함수 loss, 옵티마이저 trainer, 학습 에포크 수 num_epochs, 그리고 사용할 GPU 디바이스들의 리스트 devices가 전달됩니다. 이 함수는 각 에포크마다 학습과 평가 과정을 반복하여 모델을 학습하고 성능을 평가합니다.
nn.DataParallel(net, device_ids=devices).to(devices[0]): DataParallel을 사용하여 모델 net을 여러 GPU에 병렬화하여 할당하는 과정입니다. 병렬화된 모델은 devices[0]에 할당됩니다.
d2l.Accumulator(4): 성능 메트릭을 누적하기 위한 Accumulator 객체를 생성합니다. 누적할 값은 손실, 정확도, 데이터 샘플 수, 예측 수로 총 4개입니다.
timer.start() 및 timer.stop(): 각 미니배치 학습 시간을 측정하기 위한 타이머를 시작하고 중지합니다.
d2l.evaluate_accuracy_gpu(net, test_iter): 테스트 데이터에 대한 정확도를 GPU를 이용하여 평가하는 함수입니다.
animator.add(...): 학습 과정 및 성능 변화를 시각화하는 Animator 객체에 정보를 추가합니다.
마지막에는 각 에포크마다 학습 및 평가 결과를 출력하고, 처리한 예제 수에 대한 처리 속도도 출력합니다.

이러한 함수를 사용하면 다중 GPU를 활용하여 모델을 학습하고 성능을 평가할 수 있습니다.

Now we can define the train_with_data_aug function to train the model with image augmentation. This function gets all available GPUs, uses Adam as the optimization algorithm, applies image augmentation to the training dataset, and finally calls the train_ch13 function just defined to train and evaluate the model.

이제 train_with_data_aug 함수를 정의하여 이미지 확대image augmentation로 모델을 훈련할 수 있습니다. 이 함수는 사용 가능한 모든 GPU를 가져오고 Adam을 최적화 알고리즘으로 사용하며 훈련 데이터 세트에 이미지 증대를 적용하고 마지막으로 방금 정의한 train_ch13 함수를 호출하여 모델을 훈련하고 평가합니다.

batch_size, devices, net = 256, d2l.try_all_gpus(), d2l.resnet18(10, 3)
net.apply(d2l.init_cnn)

def train_with_data_aug(train_augs, test_augs, net, lr=0.001):
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
    loss = nn.CrossEntropyLoss(reduction="none")
    trainer = torch.optim.Adam(net.parameters(), lr=lr)
    net(next(iter(train_iter))[0])
    train_ch13(net, train_iter, test_iter, loss, trainer, 10, devices)

위 코드는 데이터 증강을 적용하여 ResNet-18 모델을 학습하는 과정을 정의하는 부분입니다. 코드를 단계별로 설명하겠습니다:

batch_size, devices, net: 미니배치 크기 batch_size, 사용 가능한 GPU 디바이스 리스트 devices, 그리고 d2l.resnet18 함수를 사용하여 클래스 수 10개, 입력 채널 3개인 ResNet-18 모델 net을 생성합니다.
net.apply(d2l.init_cnn): 모델의 가중치 초기화를 위해 d2l.init_cnn 함수를 적용합니다. 이 함수는 모델 내부의 합성곱 레이어의 가중치를 초기화합니다.
train_with_data_aug(train_augs, test_augs, net, lr=0.001): 데이터 증강을 적용하여 모델을 학습하는 함수입니다. 인자로는 학습 데이터 증강 변환 train_augs, 테스트 데이터 증강 변환 test_augs, 모델 net, 학습률 lr이 전달됩니다. 이 함수는 데이터 로더를 생성하고, 손실 함수, 옵티마이저 등을 설정하여 모델을 학습합니다.
train_iter = load_cifar10(True, train_augs, batch_size): 학습 데이터 로더를 생성합니다. load_cifar10 함수를 사용하여 CIFAR-10 학습 데이터에 데이터 증강을 적용한 후 미니배치 크기 batch_size로 분할합니다.
test_iter = load_cifar10(False, test_augs, batch_size): 테스트 데이터 로더를 생성합니다. 학습 데이터와 마찬가지로 테스트 데이터에 데이터 증강을 적용하고 미니배치 크기 batch_size로 분할합니다.
loss = nn.CrossEntropyLoss(reduction="none"): 손실 함수로 교차 엔트로피 손실 함수를 사용하며, reduction 인자를 "none"으로 설정하여 각 샘플에 대한 손실 값을 반환합니다.
trainer = torch.optim.Adam(net.parameters(), lr=lr): Adam 옵티마이저를 생성하고, 모델의 파라미터를 최적화 대상으로 설정합니다. 학습률은 lr로 설정됩니다.
net(next(iter(train_iter))[0]): 모델의 가중치를 초기화하고, 모델의 출력을 계산하여 GPU 메모리에 모델을 적재합니다.
train_ch13(net, train_iter, test_iter, loss, trainer, 10, devices): 다중 GPU를 활용하여 모델을 학습하는 train_ch13 함수를 호출하여 모델을 학습하고 성능을 평가합니다. 학습과 평가는 각각 10 에포크 동안 수행됩니다.

이러한 과정을 통해 데이터 증강을 적용하고, ResNet-18 모델을 학습하며 성능을 평가하는 과정이 구현되어 있습니다.

Let’s train the model using image augmentation based on random left-right flipping.

무작위 좌우 반전을 기반으로 한 이미지 확대를 사용하여 모델을 훈련시켜 봅시다.

train_with_data_aug(train_augs, test_augs, net)

14.1.3. Summary

Image augmentation generates random images based on existing training data to improve the generalization ability of models.

이미지 증대는 모델의 일반화 능력을 향상시키기 위해 기존 학습 데이터를 기반으로 임의의 이미지를 생성합니다.
In order to obtain definitive results during prediction, we usually only apply image augmentation to training examples, and do not use image augmentation with random operations during prediction.

예측 중에 결정적인 결과를 얻기 위해 일반적으로 훈련 예제에만 이미지 확대를 적용하고 예측 중에 임의 작업으로 이미지 확대를 사용하지 않습니다.
Deep learning frameworks provide many different image augmentation methods, which can be applied simultaneously.

딥 러닝 프레임워크는 동시에 적용할 수 있는 다양한 이미지 확대 image augmentation 방법을 제공합니다.

14.1.4. Exercises

Train the model without using image augmentation: train_with_data_aug(test_augs, test_augs). Compare training and testing accuracy when using and not using image augmentation. Can this comparative experiment support the argument that image augmentation can mitigate overfitting? Why?
Combine multiple different image augmentation methods in model training on the CIFAR-10 dataset. Does it improve test accuracy?
Refer to the online documentation of the deep learning framework. What other image augmentation methods does it also provide?

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

D2L - 14.10. Transposed Convolution (1)	2023.08.21
D2L - 14.9. Semantic Segmentation and the Dataset (0)	2023.08.20
D2L - 14.8. Region-based CNNs (R-CNNs) (0)	2023.08.20
D2L - 14.7. Single Shot Multibox Detection (0)	2023.08.19
D2L - 14.6. The Object Detection Dataset (0)	2023.08.19
D2L - 14.5. Multiscale Object Detection (0)	2023.08.19
D2L - 14.4. Anchor Boxes (0)	2023.08.19
D2L - 14.3. Object Detection and Bounding Boxes (0)	2023.08.19
D2L - 14.2. Fine-Tuning (0)	2023.08.19
D2L - 14. Computer Vision (0)	2023.08.18

Dive into Deep Learning/D2L Computer Vision

D2L - 14. Computer Vision

2023. 8. 18. 22:02 | Posted by 솔웅

https://d2l.ai/chapter_computer-vision/index.html

14. Computer Vision — Dive into Deep Learning 1.0.0 documentation

d2l.ai

14. Computer Vision

Whether it is medical diagnosis, self-driving vehicles, camera monitoring, or smart filters, many applications in the field of computer vision are closely related to our current and future lives. In recent years, deep learning has been the transformative power for advancing the performance of computer vision systems. It can be said that the most advanced computer vision applications are almost inseparable from deep learning. In view of this, this chapter will focus on the field of computer vision, and investigate methods and applications that have recently been influential in academia and industry.

의료 진단, 자율 주행 차량, 카메라 모니터링 또는 스마트 필터 등 컴퓨터 비전 분야의 많은 응용 프로그램은 현재와 미래의 삶과 밀접한 관련이 있습니다. 최근 몇 년 동안 딥 러닝은 컴퓨터 비전 시스템의 성능을 향상시키는 변혁적인 힘이었습니다. 가장 진보된 컴퓨터 비전 응용 프로그램은 딥 러닝과 거의 분리할 수 없다고 말할 수 있습니다. 이를 고려하여 이 장에서는 컴퓨터 비전 분야에 초점을 맞추고 최근 학계와 산업계에 영향을 미치고 있는 방법과 응용을 조사합니다.

In Section 7 and Section 8, we studied various convolutional neural networks that are commonly used in computer vision, and applied them to simple image classification tasks. At the beginning of this chapter, we will describe two methods that may improve model generalization, namely image augmentation and fine-tuning, and apply them to image classification. Since deep neural networks can effectively represent images in multiple levels, such layerwise representations have been successfully used in various computer vision tasks such as object detection, semantic segmentation, and style transfer. Following the key idea of leveraging layerwise representations in computer vision, we will begin with major components and techniques for object detection. Next, we will show how to use fully convolutional networks for semantic segmentation of images. Then we will explain how to use style transfer techniques to generate images like the cover of this book. In the end, we conclude this chapter by applying the materials of this chapter and several previous chapters on two popular computer vision benchmark datasets.

7절과 8절에서는 컴퓨터 비전에서 흔히 사용되는 다양한 컨볼루션 신경망을 연구하고 이를 간단한 이미지 분류 작업에 적용하였다. 이 장의 시작 부분에서 모델 일반화를 개선할 수 있는 두 가지 방법, 즉 이미지 확대 및 미세 조정을 설명하고 이미지 분류에 적용할 것입니다. 심층 신경망은 여러 수준에서 이미지를 효과적으로 표현할 수 있기 때문에 이러한 레이어별 표현은 개체 감지, 의미론적 분할 및 스타일 전송과 같은 다양한 컴퓨터 비전 작업에서 성공적으로 사용되었습니다. 컴퓨터 비전에서 계층별 표현을 활용하는 핵심 아이디어에 따라 물체 감지를 위한 주요 구성 요소와 기술부터 시작하겠습니다. 다음으로 이미지의 시맨틱 분할을 위해 완전 컨벌루션 네트워크를 사용하는 방법을 보여줍니다. 그런 다음 스타일 변환 기술을 사용하여 이 책의 표지와 같은 이미지를 생성하는 방법을 설명합니다. 마지막으로 이 장의 자료와 두 가지 인기 있는 컴퓨터 비전 벤치마크 데이터 세트에 대한 이전 장의 여러 장을 적용하여 이 장을 마무리합니다.

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

D2L - 14.10. Transposed Convolution (1)	2023.08.21
D2L - 14.9. Semantic Segmentation and the Dataset (0)	2023.08.20
D2L - 14.8. Region-based CNNs (R-CNNs) (0)	2023.08.20
D2L - 14.7. Single Shot Multibox Detection (0)	2023.08.19
D2L - 14.6. The Object Detection Dataset (0)	2023.08.19
D2L - 14.5. Multiscale Object Detection (0)	2023.08.19
D2L - 14.4. Anchor Boxes (0)	2023.08.19
D2L - 14.3. Object Detection and Bounding Boxes (0)	2023.08.19
D2L - 14.2. Fine-Tuning (0)	2023.08.19
D2L - 14.1. Image Augmentation (0)	2023.08.19

1 2

IT 기술 따라잡기

공지사항

최근에 올라온 글

최근에 달린 댓글

최근에 받은 트랙백

글 보관함

카테고리

'Dive into Deep Learning/D2L Computer Vision'에 해당되는 글 13건

D2L - 14.2. Fine-Tuning

14.2. Fine-Tuning

14.2.1. Steps

14.2.2. Hot Dog Recognition

14.2.2.1. Reading the Dataset

14.2.2.2. Defining and Initializing the Model

14.2.2.3. Fine-Tuning the Model

14.2.3. Summary

14.2.4. Exercises

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

D2L - 14.1. Image Augmentation

14.1. Image Augmentation

14.1.1. Common Image Augmentation Methods

14.1.1.1. Flipping and Cropping

14.1.1.2. Changing Colors

14.1.1.3. Combining Multiple Image Augmentation Methods

14.1.2.1. Multi-GPU Training

14.1.3. Summary

14.1.4. Exercises

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

D2L - 14. Computer Vision

14. Computer Vision

'Dive into Deep Learning > D2L Computer Vision' 카테고리의 다른 글

티스토리툴바