D2L - 19.2. Hyperparameter Optimization API

2023. 9. 10. 01:40 | Posted by 솔웅

https://d2l.ai/chapter_hyperparameter-optimization/hyperopt-api.html

19.2. Hyperparameter Optimization API

Before we dive into the methodology, we will first discuss a basic code structure that allows us to efficiently implement various HPO algorithms. In general, all HPO algorithms considered here need to implement two decision making primitives, searching and scheduling. First, they need to sample new hyperparameter configurations, which often involves some kind of search over the configuration space. Second, for each configuration, an HPO algorithm needs to schedule its evaluation and decide how many resources to allocate for it. Once we start to evaluate a configuration, we will refer to it as a trial. We map these decisions to two classes, HPOSearcher and HPOScheduler. On top of that, we also provide a HPOTuner class that executes the optimization process.

This concept of scheduler and searcher is also implemented in popular HPO libraries, such as Syne Tune (Salinas et al., 2022), Ray Tune (Liaw et al., 2018) or Optuna (Akiba et al., 2019).

import time
from scipy import stats
from d2l import torch as d2l

The code above imports the Python modules and libraries used in this section:

import time: Imports Python's built-in time module, which provides time-related functions. Here it is used to measure how long each trial takes.
from scipy import stats: Imports the stats module from SciPy, a Python library for scientific and engineering computation. The stats module provides probability distributions and statistical functions, which are used below to define the search space.
from d2l import torch as d2l: Imports the PyTorch version of the d2l library that accompanies "Dive into Deep Learning" and binds it to the name d2l. The library provides the utility classes and functions used in the book's examples.

These imports make the required modules available to the code that follows.

What Is a Hyperparameter Optimization API? (GPT 3.5 answer)

The Hyperparameter Optimization (HPO) API refers to a programming interface or toolset that allows developers and data scientists to perform hyperparameter optimization efficiently and effectively. This API typically includes functions, classes, and methods for configuring, running, and managing the hyperparameter optimization process.

Here are some key components and functionalities often found in an HPO API:

Hyperparameter Configuration: The API allows users to define the hyperparameters they want to optimize and specify their respective search spaces. This involves specifying the range of values or distribution from which each hyperparameter should be sampled during optimization.
Objective Function: Users can define the objective function (also known as the loss or evaluation function) that quantifies the performance of a machine learning model with a given set of hyperparameters. The API provides a way to evaluate the model's performance using this function.
Optimization Algorithms: The API offers a selection of optimization algorithms such as random search, Bayesian optimization, genetic algorithms, or more advanced techniques. Users can choose the algorithm that best suits their optimization problem.
Concurrency and Parallelism: Some HPO APIs provide options for running multiple hyperparameter optimization trials concurrently or in parallel. This can significantly speed up the optimization process, especially when optimizing computationally expensive models.
Early Stopping and Resource Management: Some HPO APIs support early stopping criteria based on the performance of ongoing trials. They can also manage resources, ensuring that the optimization process respects hardware constraints.
Results Tracking: The API may include functions for tracking and storing the results of each optimization trial, including hyperparameters, evaluation metrics, and other relevant information.
Visualization and Reporting: It often includes tools for visualizing the progress of hyperparameter optimization, such as plots or graphs showing how the objective function changes over time.
Integration with Machine Learning Frameworks: Many HPO APIs are designed to integrate with popular machine learning libraries and frameworks like TensorFlow, PyTorch, scikit-learn, and others.
Customization: Advanced users can customize the optimization process by specifying custom algorithms, resource management policies, or search strategies.
Hyperparameter Importance Analysis: Some APIs offer functionality for analyzing the importance of different hyperparameters in the final model's performance.

Overall, the HPO API simplifies the process of hyperparameter tuning, allowing users to find good hyperparameter settings for their machine learning models more efficiently. It can save time and computational resources and help improve model performance. Popular examples of HPO APIs include Optuna, Hyperopt, and Ray Tune, among others.

19.2.1. Searcher

Below we define a base class for searchers, which provides a new candidate configuration through the sample_configuration function. A simple way to implement this function would be to sample configurations uniformly at random, as we did for random search in Section 19.1. More sophisticated algorithms, such as Bayesian optimization, will make these decisions based on the performance of previous trials. As a result, these algorithms are able to sample more promising candidates over time. We add the update function in order to update the history of previous trials, which can then be exploited to improve our sampling distribution.

class HPOSearcher(d2l.HyperParameters):  #@save
    def sample_configuration(self) -> dict:
        raise NotImplementedError

    def update(self, config: dict, error: float, additional_info=None):
        pass

The code above defines HPOSearcher, the base class for searchers:

class HPOSearcher(d2l.HyperParameters): Defines the HPOSearcher class, which inherits from d2l.HyperParameters.
sample_configuration: Returns the next hyperparameter configuration to try, as a dictionary. The base class only raises NotImplementedError, so every subclass must implement this method.
update(self, config: dict, error: float, additional_info=None): Called after a configuration has been evaluated, with the configuration (config), its validation error (error), and optional extra information (additional_info). The base implementation does nothing (pass), so a subclass overrides it only if it wants to learn from previous trials.

HPOSearcher defines the interface that every searcher provides. A concrete searcher inherits from this class, implements sample_configuration, and overrides update if its sampling decisions depend on past results.
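As a rough illustration of what update can be used for, the sketch below keeps a history of evaluated configurations and looks up the best one. It runs without d2l: SearcherBase is a stand-in for HPOSearcher, and HistorySearcher, history, and best are hypothetical names introduced only for this example.

```python
import random


class SearcherBase:
    """Stand-in for HPOSearcher so that the sketch runs without d2l."""

    def sample_configuration(self) -> dict:
        raise NotImplementedError

    def update(self, config: dict, error: float, additional_info=None):
        pass


class HistorySearcher(SearcherBase):
    """Hypothetical searcher that records every (config, error) pair."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.history = []

    def sample_configuration(self) -> dict:
        # Placeholder sampling rule: a learning rate drawn uniformly from [0, 1).
        return {"learning_rate": self.rng.random()}

    def update(self, config: dict, error: float, additional_info=None):
        # Store the result so that later sampling decisions could use it.
        self.history.append((config, error))

    def best(self):
        # Return the (config, error) pair with the smallest error seen so far.
        return min(self.history, key=lambda pair: pair[1])


searcher = HistorySearcher()
for error in [0.4, 0.2, 0.3]:  # made-up validation errors
    config = searcher.sample_configuration()
    searcher.update(config, error)
print(searcher.best())
```

A model-based method such as Bayesian optimization would use such a history inside sample_configuration; this sketch only stores it.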

The following code shows how to implement our random search optimizer from the previous section in this API. As a slight extension, we allow the user to prescribe the first configuration to be evaluated via initial_config, while subsequent ones are drawn at random.

class RandomSearcher(HPOSearcher):  #@save
    def __init__(self, config_space: dict, initial_config=None):
        self.save_hyperparameters()

    def sample_configuration(self) -> dict:
        if self.initial_config is not None:
            result = self.initial_config
            self.initial_config = None
        else:
            result = {
                name: domain.rvs()
                for name, domain in self.config_space.items()
            }
        return result

The code above defines RandomSearcher, which performs random search:

class RandomSearcher(HPOSearcher): Defines the RandomSearcher class, which inherits from HPOSearcher.
def __init__(self, config_space: dict, initial_config=None): The constructor takes two parameters.
- config_space: A dictionary describing the search space. It maps each hyperparameter name to a probability distribution object.
- initial_config: A dictionary holding the first configuration to evaluate. It defaults to None.
self.save_hyperparameters(): Stores the constructor arguments as attributes of the instance (self.config_space and self.initial_config).
def sample_configuration(self) -> dict: Returns a hyperparameter configuration as a dictionary.
- If initial_config is set, it returns that configuration and resets initial_config to None, so the initial configuration is used only once.
- Otherwise, it draws a random value from the distribution (domain) of each hyperparameter in config_space and collects the values in a dictionary.
- Finally, it returns the resulting configuration.

RandomSearcher implements a simple strategy that explores the search space at random. It does not override update, so the results of earlier trials have no influence on the configurations it samples.
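The "initial configuration first, random afterwards" behavior can be checked with a small standalone sketch. It assumes nothing beyond the standard library: UniformDomain is a hypothetical stand-in for a scipy.stats distribution (it only offers rvs), and SimpleRandomSearcher repeats the control flow of RandomSearcher without the d2l base class.

```python
import random


class UniformDomain:
    """Hypothetical stand-in for a scipy.stats distribution: only offers rvs()."""

    def __init__(self, low: float, high: float, rng: random.Random):
        self.low, self.high, self.rng = low, high, rng

    def rvs(self) -> float:
        return self.rng.uniform(self.low, self.high)


class SimpleRandomSearcher:
    """Same control flow as RandomSearcher, without the d2l base class."""

    def __init__(self, config_space: dict, initial_config=None):
        self.config_space = config_space
        self.initial_config = initial_config

    def sample_configuration(self) -> dict:
        if self.initial_config is not None:
            # Hand out the user-supplied configuration exactly once.
            result = self.initial_config
            self.initial_config = None
        else:
            result = {
                name: domain.rvs()
                for name, domain in self.config_space.items()
            }
        return result


rng = random.Random(0)
space = {"learning_rate": UniformDomain(0.01, 1.0, rng)}
searcher = SimpleRandomSearcher(space, initial_config={"learning_rate": 0.1})
first = searcher.sample_configuration()   # the prescribed configuration
second = searcher.sample_configuration()  # a random draw from the space
print(first, second)
```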

19.2.2. Scheduler

Beyond sampling configurations for new trials, we also need to decide when and for how long to run a trial. In practice, all these decisions are done by the HPOScheduler, which delegates the choice of new configurations to a HPOSearcher. The suggest method is called whenever some resource for training becomes available. Apart from invoking sample_configuration of a searcher, it may also decide upon parameters like max_epochs (i.e., how long to train the model for). The update method is called whenever a trial returns a new observation.

class HPOScheduler(d2l.HyperParameters):  #@save
    def suggest(self) -> dict:
        raise NotImplementedError

    def update(self, config: dict, error: float, info=None):
        raise NotImplementedError

The code above defines HPOScheduler, the base class for schedulers. Concrete search and scheduling strategies are implemented as subclasses:

class HPOScheduler(d2l.HyperParameters): Defines the HPOScheduler class, which inherits from d2l.HyperParameters.
def suggest(self) -> dict: Returns the next hyperparameter configuration to try. The base class raises NotImplementedError, so subclasses must implement it.
def update(self, config: dict, error: float, info=None): Called after a configuration has been evaluated, with its validation error and optional extra information. The base class raises NotImplementedError here as well, so, unlike HPOSearcher.update, every subclass must implement it.

HPOScheduler serves as the base class for different search and scheduling strategies. Each strategy provides its own suggest and update methods in a subclass.
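The text notes that a scheduler may also decide on parameters such as max_epochs. The following is a minimal sketch of that idea, not part of the book's API: FixedEpochScheduler is a hypothetical scheduler that attaches a fixed epoch budget to every suggestion, and ConstantSearcher is a stand-in searcher used only to keep the example self-contained.

```python
class ConstantSearcher:
    """Stand-in searcher that always proposes the same configuration."""

    def sample_configuration(self) -> dict:
        return {"learning_rate": 0.1, "batch_size": 128}

    def update(self, config: dict, error: float, additional_info=None):
        pass


class FixedEpochScheduler:
    """Hypothetical scheduler: delegates the search and adds a training budget."""

    def __init__(self, searcher, max_epochs: int):
        self.searcher = searcher
        self.max_epochs = max_epochs

    def suggest(self) -> dict:
        # Copy the searcher's proposal so that it is not modified in place.
        config = dict(self.searcher.sample_configuration())
        config["max_epochs"] = self.max_epochs
        return config

    def update(self, config: dict, error: float, info=None):
        # Strip the scheduler-owned key before reporting back to the searcher.
        searcher_config = {k: v for k, v in config.items() if k != "max_epochs"}
        self.searcher.update(searcher_config, error, additional_info=info)


scheduler = FixedEpochScheduler(ConstantSearcher(), max_epochs=5)
suggestion = scheduler.suggest()
print(suggestion)
```

Because the tuner calls the objective with the configuration unpacked as keyword arguments, a key such as max_epochs only works if the objective function accepts a parameter of that name.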

To implement random search, but also other HPO algorithms, we only need a basic scheduler that schedules a new configuration every time new resources become available.

class BasicScheduler(HPOScheduler):  #@save
    def __init__(self, searcher: HPOSearcher):
        self.save_hyperparameters()

    def suggest(self) -> dict:
        return self.searcher.sample_configuration()

    def update(self, config: dict, error: float, info=None):
        self.searcher.update(config, error, additional_info=info)

The code above defines BasicScheduler, a minimal scheduler that forwards both suggesting and updating to its searcher:

class BasicScheduler(HPOScheduler): Defines the BasicScheduler class, which inherits from HPOScheduler.
def __init__(self, searcher: HPOSearcher): The constructor takes the searcher that proposes configurations.
- searcher: An instance of HPOSearcher (or of one of its subclasses).
self.save_hyperparameters(): Stores the constructor argument as the attribute self.searcher.
def suggest(self) -> dict: Proposes the next configuration to try.
- self.searcher.sample_configuration(): Calls the searcher's sample_configuration method and returns its result.
def update(self, config: dict, error: float, info=None): Reports the outcome of a trial.
- self.searcher.update(config, error, additional_info=info): Passes the configuration (config), its validation error (error), and any extra information (info) on to the searcher's update method.

BasicScheduler adds no scheduling logic of its own: which configuration is tried next is decided entirely by the searcher. This is sufficient for basic hyperparameter search.

19.2.3. Tuner

Finally, we need a component that runs the scheduler/searcher and does some book-keeping of the results. The following code implements a sequential execution of the HPO trials that evaluates one training job after the next and will serve as a basic example. We will later use Syne Tune for more scalable distributed HPO cases.

class HPOTuner(d2l.HyperParameters):  #@save
    def __init__(self, scheduler: HPOScheduler, objective: callable):
        self.save_hyperparameters()
        # Bookkeeping results for plotting
        self.incumbent = None
        self.incumbent_error = None
        self.incumbent_trajectory = []
        self.cumulative_runtime = []
        self.current_runtime = 0
        self.records = []

    def run(self, number_of_trials):
        for i in range(number_of_trials):
            start_time = time.time()
            config = self.scheduler.suggest()
            print(f"Trial {i}: config = {config}")
            error = self.objective(**config)
            error = float(error.cpu().detach().numpy())
            self.scheduler.update(config, error)
            runtime = time.time() - start_time
            self.bookkeeping(config, error, runtime)
            print(f"    error = {error}, runtime = {runtime}")

The code above defines HPOTuner, which runs the search with a given scheduler and objective function and records the results:

class HPOTuner(d2l.HyperParameters): Defines the HPOTuner class, which inherits from d2l.HyperParameters.
def __init__(self, scheduler: HPOScheduler, objective: callable): The constructor takes two parameters.
- scheduler: An HPOScheduler object, which controls the search and the scheduling.
- objective: A callable that receives a hyperparameter configuration as keyword arguments and returns the validation error.
self.save_hyperparameters(): Stores the constructor arguments as self.scheduler and self.objective.
self.incumbent, self.incumbent_error, self.incumbent_trajectory, self.cumulative_runtime, self.current_runtime, self.records: Instance variables that hold the results of the search. They are used later for analysis and plotting.
def run(self, number_of_trials): Runs number_of_trials trials, one after another. Each trial does the following:
- Records the current time as start_time.
- Asks the scheduler for the next configuration (config).
- Calls the objective function to compute the validation error (error) for that configuration.
- Converts the validation error, which the objective returns as a tensor, to a Python float.
- Passes the result to the scheduler via update.
- Computes the runtime of the trial.
- Calls bookkeeping to record the result.
- Prints the configuration, validation error, and runtime of the trial.

HPOTuner ties the scheduler and the objective function together and keeps the results in instance variables for later analysis and plotting.
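The sequential loop can be tried without training a network. The sketch below is a simplified stand-in for HPOTuner.run under stated assumptions: toy_objective is a hypothetical quadratic that plays the role of the validation error, run_sequential is a made-up helper name, and the scheduler's update call is omitted because plain random sampling does not need it.

```python
import random
import time


def toy_objective(x: float) -> float:
    """Hypothetical stand-in for a validation error, minimized at x = 0.3."""
    return (x - 0.3) ** 2


def run_sequential(objective, sample, number_of_trials: int) -> list:
    """Evaluate one configuration after another and record each result."""
    records = []
    for _ in range(number_of_trials):
        start_time = time.perf_counter()
        config = sample()                    # decision step (suggest)
        error = float(objective(**config))   # evaluation step
        runtime = time.perf_counter() - start_time
        records.append({"config": config, "error": error, "runtime": runtime})
    return records


rng = random.Random(0)
records = run_sequential(
    toy_objective, lambda: {"x": rng.random()}, number_of_trials=5
)
best = min(records, key=lambda record: record["error"])
print(best["config"], best["error"])
```

As in HPOTuner.run, the measured runtime covers both the decision and the evaluation, and the configuration is passed to the objective as keyword arguments.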

19.2.4. Bookkeeping the Performance of HPO Algorithms

With any HPO algorithm, we are mostly interested in the best performing configuration (called incumbent) and its validation error after a given wall-clock time. This is why we track runtime per iteration, which includes both the time to run an evaluation (call of objective) and the time to make a decision (call of scheduler.suggest). In the sequel, we will plot cumulative_runtime against incumbent_trajectory in order to visualize the any-time performance of the HPO algorithm defined in terms of scheduler (and searcher). This allows us to quantify not only how well the configuration found by an optimizer works, but also how quickly an optimizer is able to find it.

@d2l.add_to_class(HPOTuner)  #@save
def bookkeeping(self, config: dict, error: float, runtime: float):
    self.records.append({"config": config, "error": error, "runtime": runtime})
    # Check if the last hyperparameter configuration performs better
    # than the incumbent
    if self.incumbent is None or self.incumbent_error > error:
        self.incumbent = config
        self.incumbent_error = error
    # Add current best observed performance to the optimization trajectory
    self.incumbent_trajectory.append(self.incumbent_error)
    # Update runtime
    self.current_runtime += runtime
    self.cumulative_runtime.append(self.current_runtime)

The code above adds a new method, bookkeeping, to the HPOTuner class. It records the result of each trial and keeps track of the best configuration and validation error found so far:

@d2l.add_to_class(HPOTuner): A decorator that registers the function below as a method of the HPOTuner class.
def bookkeeping(self, config: dict, error: float, runtime: float): Defines the bookkeeping method, which takes three parameters.
- config: A dictionary holding the hyperparameter configuration that was just evaluated.
- error: The validation error of that configuration, as a float.
- runtime: The runtime of that trial, as a float.
self.records.append({"config": config, "error": error, "runtime": runtime}): Bundles the configuration, validation error, and runtime into a dictionary and appends it to the records list, so the result of every trial is kept.
if self.incumbent is None or self.incumbent_error > error: If there is no incumbent yet, or if the new validation error is smaller than the incumbent's error (incumbent_error), the new configuration becomes the incumbent.
self.incumbent_trajectory.append(self.incumbent_error): Appends the current best validation error to incumbent_trajectory, which tracks how the best error evolves.
self.current_runtime += runtime: Adds the runtime of the trial to the total runtime.
self.cumulative_runtime.append(self.current_runtime): Appends the total runtime to the cumulative_runtime list, which tracks the elapsed time after each trial.

The bookkeeping method thus records everything needed to follow the progress of the search and to analyze its results afterwards.
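To make the two tracked lists concrete, here is a small standalone sketch of the same update rule applied to made-up numbers. incumbent_curve is a hypothetical helper written for this illustration and is not part of d2l.

```python
def incumbent_curve(errors, runtimes):
    """Return (cumulative_runtime, incumbent_trajectory) for a sequence of trials."""
    cumulative_runtime, incumbent_trajectory = [], []
    best_error, elapsed = None, 0.0
    for error, runtime in zip(errors, runtimes):
        # The incumbent changes only when a strictly smaller error is observed.
        if best_error is None or error < best_error:
            best_error = error
        elapsed += runtime
        incumbent_trajectory.append(best_error)
        cumulative_runtime.append(elapsed)
    return cumulative_runtime, incumbent_trajectory


# Made-up errors and runtimes, for illustration only.
times, curve = incumbent_curve(
    errors=[0.5, 0.7, 0.3, 0.4], runtimes=[1.0, 2.0, 1.0, 3.0]
)
print(times)  # [1.0, 3.0, 4.0, 7.0]
print(curve)  # [0.5, 0.5, 0.3, 0.3]
```

The second and fourth trials are worse than the incumbent, so the trajectory stays flat there while the cumulative runtime keeps growing.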

19.2.5. Example: Optimizing the Hyperparameters of a Convolutional Neural Network

We now use our new implementation of random search to optimize the batch size and learning rate of the LeNet convolutional neural network from Section 7.6. We begin by defining the objective function, which will once more be validation error.

def hpo_objective_lenet(learning_rate, batch_size, max_epochs=10):  #@save
    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=max_epochs, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    trainer.fit(model=model, data=data)
    validation_error = trainer.validation_error()
    return validation_error

The code above defines hpo_objective_lenet, the objective function for the search. It trains a LeNet model on the Fashion-MNIST dataset and returns the validation error:

def hpo_objective_lenet(learning_rate, batch_size, max_epochs=10): Defines the function, which takes two hyperparameters and one optional parameter.
- learning_rate: The learning rate, a float.
- batch_size: The minibatch size, an integer.
- max_epochs: The maximum number of epochs, an integer. It defaults to 10.
model = d2l.LeNet(lr=learning_rate, num_classes=10): Creates a LeNet model with the given learning rate and 10 output classes.
trainer = d2l.HPOTrainer(max_epochs=max_epochs, num_gpus=1): Creates the trainer used for hyperparameter tuning, with the given maximum number of epochs and one GPU.
data = d2l.FashionMNIST(batch_size=batch_size): Creates the Fashion-MNIST data object with the given minibatch size.
model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn): Initializes the model weights with d2l.init_cnn, using the inputs of the first training minibatch as example data.
trainer.fit(model=model, data=data): Trains the model on the data.
validation_error = trainer.validation_error(): Evaluates the trained model on the validation data and computes the validation error.
return validation_error: Returns the validation error.

For a given configuration (learning_rate, batch_size, and optionally max_epochs), the function trains LeNet and returns its validation error. The tuner calls it once per trial, and the search tries to minimize the value it returns.

We also need to define the configuration space. Moreover, the first configuration to be evaluated is the default setting used in Section 7.6.

config_space = {
    "learning_rate": stats.loguniform(1e-2, 1),
    "batch_size": stats.randint(32, 256),
}
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}

The code above defines the search space (config_space) and the initial configuration (initial_config):

config_space: A dictionary that maps each hyperparameter name to the probability distribution it is sampled from.
- "learning_rate": The learning rate is drawn from a log-uniform distribution (stats.loguniform) between 1e-2 and 1.
- "batch_size": The minibatch size is drawn uniformly from the integers 32 to 255. stats.randint excludes its upper bound, so 256 itself is never sampled.
initial_config: A dictionary that maps each hyperparameter name to its initial value.
- "learning_rate": Sets the initial learning rate to 0.1.
- "batch_size": Sets the initial minibatch size to 128.

With these definitions, the search covers two hyperparameters, learning_rate and batch_size. The first trial uses the values in initial_config, and later trials sample both hyperparameters at random from config_space.
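To make the two sampling rules concrete, the sketch below reproduces them with the standard library only. It is not how SciPy implements these distributions: sample_loguniform and sample_randint are hypothetical helpers that merely follow the same definitions (uniform on a log scale, and integers with an excluded upper bound).

```python
import math
import random

rng = random.Random(0)


def sample_loguniform(low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_randint(low: int, high: int) -> int:
    """Draw an integer from low to high - 1 (the upper bound is excluded)."""
    return rng.randrange(low, high)


learning_rates = [sample_loguniform(1e-2, 1.0) for _ in range(1000)]
batch_sizes = [sample_randint(32, 256) for _ in range(1000)]

# On a log scale, about half of the draws should fall below the geometric
# midpoint sqrt(1e-2 * 1) = 0.1.
fraction_below = sum(lr < 0.1 for lr in learning_rates) / len(learning_rates)
print(fraction_below, min(batch_sizes), max(batch_sizes))
```

Sampling the learning rate on a log scale gives each order of magnitude (0.01 to 0.1, and 0.1 to 1) roughly the same share of trials, which a plain uniform distribution on [0.01, 1] would not.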

Now we can start our random search:

searcher = RandomSearcher(config_space, initial_config=initial_config)
scheduler = BasicScheduler(searcher=searcher)
tuner = HPOTuner(scheduler=scheduler, objective=hpo_objective_lenet)
tuner.run(number_of_trials=5)

The code above sets up and runs the tuning process in the following steps:

searcher = RandomSearcher(config_space, initial_config=initial_config): Creates the searcher. It samples configurations at random from config_space, except for the first trial, which uses initial_config.
scheduler = BasicScheduler(searcher=searcher): Creates the scheduler. It relies on the searcher to propose the next configuration to try.
tuner = HPOTuner(scheduler=scheduler, objective=hpo_objective_lenet): Creates the tuner from the scheduler and the objective function. The objective function returns the metric to be minimized, here the validation error.
tuner.run(number_of_trials=5): Runs the search for the number of trials given by number_of_trials (here 5). Each trial calls the objective function once with a new configuration.

Running this process evaluates several hyperparameter combinations in turn. The tuner object stores every result, and tuner.incumbent holds the best configuration among those tried.

 error = 0.9000097513198853, runtime = 62.85189199447632

==> This takes a long time because several configurations are each trained in turn. Below is the result of running it on Colab. It took 11 minutes.

Below we plot the optimization trajectory of the incumbent to get the any-time performance of random search:

board = d2l.ProgressBoard(xlabel="time", ylabel="error")
for time_stamp, error in zip(
    tuner.cumulative_runtime, tuner.incumbent_trajectory
):
    board.draw(time_stamp, error, "random search", every_n=1)

The code above plots how the validation error of the incumbent changes over the course of the search:

board = d2l.ProgressBoard(xlabel="time", ylabel="error"): Creates a d2l.ProgressBoard object for the plot. The x-axis shows time and the y-axis shows the validation error.
for time_stamp, error in zip(tuner.cumulative_runtime, tuner.incumbent_trajectory): Iterates over pairs of cumulative runtime (time_stamp) and best validation error so far (error).
board.draw(time_stamp, error, "random search", every_n=1): Adds a point to the plot under the label "random search". every_n=1 means that every data point is drawn. Larger values reduce the number of points shown.

The resulting curve shows the best validation error found as a function of elapsed time, which makes it easy to see how quickly the search makes progress.

19.2.6. Comparing HPO Algorithms

Just as with training algorithms or model architectures, it is important to understand how to best compare different HPO algorithms. Each HPO run depends on two major sources of randomness: the random effects of the training process, such as random weight initialization or mini-batch ordering, and the intrinsic randomness of the HPO algorithm itself, such as the random sampling of random search. Hence, when comparing different algorithms, it is crucial to run each experiment several times and report statistics, such as mean or median, across a population of multiple repetitions of an algorithm based on different seeds of the random number generator.

To illustrate this, we compare random search (see Section 19.1.2) and Bayesian optimization (Snoek et al., 2012) on tuning the hyperparameters of a feed-forward neural network. Each algorithm was evaluated 50 times with a different random seed. The solid line indicates the average performance of the incumbent across these 50 repetitions and the dashed line the standard deviation. We can see that random search and Bayesian optimization perform roughly the same up to ~1000 seconds, but Bayesian optimization can make use of past observations to identify better configurations and thus quickly outperforms random search afterwards.

Fig. 19.2.1  Example any-time performance plot to compare two algorithms A and B.
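The aggregation across seeds can be sketched on a toy problem. The example below makes several simplifying assumptions: toy_objective is a hypothetical stand-in for a validation error, the curves are aligned by trial index rather than by wall-clock time as in the figure, and only 10 seeds are used to keep it fast.

```python
import random
import statistics


def toy_objective(x: float) -> float:
    """Hypothetical stand-in for a validation error."""
    return (x - 0.3) ** 2


def random_search_trajectory(seed: int, number_of_trials: int) -> list:
    """Incumbent error after each trial for one seeded run of random search."""
    rng = random.Random(seed)
    best, trajectory = float("inf"), []
    for _ in range(number_of_trials):
        best = min(best, toy_objective(rng.random()))
        trajectory.append(best)
    return trajectory


# Repeat the experiment with different seeds, then aggregate per trial index.
runs = [random_search_trajectory(seed, number_of_trials=20) for seed in range(10)]
mean_curve = [statistics.mean(step) for step in zip(*runs)]
std_curve = [statistics.stdev(step) for step in zip(*runs)]
print(mean_curve[-1], std_curve[-1])
```

Plotting mean_curve with a band of one standard deviation around it gives the kind of any-time performance plot described above, and two algorithms can be compared by overlaying their curves.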

19.2.7. Summary

This section laid out a simple, yet flexible interface to implement the various HPO algorithms that we will look at in this chapter. Similar interfaces can be found in popular open-source HPO frameworks. We also looked at how we can compare HPO algorithms, and at potential pitfalls one needs to be aware of.

19.2.8. Exercises

1. The goal of this exercise is to implement the objective function for a slightly more challenging HPO problem, and to run more realistic experiments. We will use the two hidden layer MLP DropoutMLP implemented in Section 5.6.
    1. Code up the objective function, which should depend on all hyperparameters of the model and batch_size. Use max_epochs=50. GPUs do not help here, so num_gpus=0. Hint: Modify hpo_objective_lenet.
    2. Choose a sensible search space, where num_hiddens_1, num_hiddens_2 are integers in [8,1024], and dropout values lie in [0,0.95], while batch_size lies in [16,384]. Provide code for config_space, using sensible distributions from scipy.stats.
    3. Run random search on this example with number_of_trials=20 and plot the results. Make sure to first evaluate the default configuration of Section 5.6, which is initial_config = {'num_hiddens_1': 256, 'num_hiddens_2': 256, 'dropout_1': 0.5, 'dropout_2': 0.5, 'lr': 0.1, 'batch_size': 256}.
2. In this exercise, you will implement a new searcher (subclass of HPOSearcher) which makes decisions based on past data. It depends on parameters probab_local, num_init_random. Its sample_configuration method works as follows. For the first num_init_random calls, do the same as RandomSearcher.sample_configuration. Otherwise, with probability 1 - probab_local, do the same as RandomSearcher.sample_configuration. Otherwise, pick the configuration which attained the smallest validation error so far, select one of its hyperparameters at random, and sample its value randomly like in RandomSearcher.sample_configuration, but leave all other values the same. Return this configuration, which is identical to the best configuration so far, except in this one hyperparameter.
    1. Code up this new LocalSearcher. Hint: Your searcher requires config_space as argument at construction. Feel free to use a member of type RandomSearcher. You will also have to implement the update method.
    2. Re-run the experiment from the previous exercise, but using your new searcher instead of RandomSearcher. Experiment with different values for probab_local, num_init_random. However, note that a proper comparison between different HPO methods requires repeating experiments several times, and ideally considering a number of benchmark tasks.

D2L - 19.1. What Is Hyperparameter Optimization?

2023. 9. 10. 01:23 | Posted by 솔웅

https://d2l.ai/chapter_hyperparameter-optimization/hyperopt-intro.html

19.1. What Is Hyperparameter Optimization?

As we have seen in the previous chapters, deep neural networks come with a large number of parameters or weights that are learned during training. On top of these, every neural network has additional hyperparameters that need to be configured by the user. For example, to ensure that stochastic gradient descent converges to a local optimum of the training loss (see Section 12), we have to adjust the learning rate and batch size. To avoid overfitting on training datasets, we might have to set regularization parameters, such as weight decay (see Section 3.7) or dropout (see Section 5.6). We can define the capacity and inductive bias of the model by setting the number of layers and number of units or filters per layer (i.e., the effective number of weights).

이전 장에서 살펴본 것처럼 심층 신경망에는 훈련 중에 학습되는 수많은 매개변수 또는 가중치가 포함됩니다. 게다가 모든 신경망에는 사용자가 구성해야 하는 추가 하이퍼파라미터가 있습니다. 예를 들어 확률적 경사하강법이 훈련 손실의 국소 최적값으로 수렴되도록 하려면(섹션 12 참조) 학습 속도와 배치 크기를 조정해야 합니다. 훈련 데이터세트에 대한 과적합을 방지하려면 가중치 감소(섹션 3.7 참조) 또는 드롭아웃(섹션 5.6 참조)과 같은 정규화 매개변수를 설정해야 할 수도 있습니다. 레이어 수와 레이어당 단위 또는 필터 수(즉, 유효 가중치 수)를 설정하여 모델의 용량과 유도 편향을 정의할 수 있습니다.

Unfortunately, we cannot simply adjust these hyperparameters by minimizing the training loss, because this would lead to overfitting on the training data. For example, setting regularization parameters, such as dropout or weight decay to zero leads to a small training loss, but might hurt the generalization performance.

불행하게도 훈련 손실을 최소화함으로써 이러한 하이퍼파라미터를 간단히 조정할 수는 없습니다. 왜냐하면 그렇게 하면 훈련 데이터에 과적합이 발생할 수 있기 때문입니다. 예를 들어, 드롭아웃이나 가중치 감소와 같은 정규화 매개변수를 0으로 설정하면 훈련 손실이 약간 발생하지만 일반화 성능이 저하될 수 있습니다.

Fig. 19.1.1  Typical workflow in machine learning that consists of training the model multiple times with different hyperparameters.

Without a different form of automation, hyperparameters have to be set manually in a trial-and-error fashion, in what amounts to a time-consuming and difficult part of machine learning workflows. For example, consider training a ResNet (see Section 8.6) on CIFAR-10, which requires more than 2 hours on an Amazon Elastic Compute Cloud (EC2) g4dn.xlarge instance. Even trying just ten hyperparameter configurations in sequence would already take roughly one day. To make matters worse, hyperparameters are usually not directly transferable across architectures and datasets (Bardenet et al., 2013, Feurer et al., 2022, Wistuba et al., 2018), and need to be re-optimized for every new task. Also, for most hyperparameters there are no rules of thumb, and expert knowledge is required to find sensible values.

다른 형태의 자동화가 없으면 하이퍼파라미터는 시행착오 방식으로 수동으로 설정해야 하므로 기계 학습 워크플로에서 시간이 많이 걸리고 어려운 부분이 됩니다. 예를 들어, Amazon Elastic Cloud Compute(EC2) g4dn.xlarge 인스턴스에서 2시간 이상 필요한 CIFAR-10의 ResNet(섹션 8.6 참조) 교육을 고려해 보세요. 10개의 하이퍼파라미터 구성을 순차적으로 시도하는 것만으로도 이미 대략 하루가 걸릴 것입니다. 설상가상으로 하이퍼파라미터는 일반적으로 아키텍처와 데이터 세트 간에 직접 전송할 수 없으며(Bardenet et al., 2013, Feurer et al., 2022, Wistuba et al., 2018) 모든 새로운 작업에 대해 다시 최적화해야 합니다. 또한 대부분의 하이퍼파라미터에는 경험 법칙이 없으며, 합리적인 값을 찾기 위해서는 전문 지식이 필요합니다.

Hyperparameter optimization (HPO) algorithms are designed to tackle this problem in a principled and automated fashion (Feurer and Hutter, 2018), by framing it as a global optimization problem. The default objective is the error on a hold-out validation dataset, but could in principle be any other business metric. It can be combined with or constrained by secondary objectives, such as training time, inference time, or model complexity.

하이퍼파라미터 최적화(HPO) 알고리즘은 이 문제를 전역 최적화 문제로 구성하여 원칙적이고 자동화된 방식으로(Feurer and Hutter, 2018) 해결하도록 설계되었습니다. 기본 목표는 홀드아웃 검증 데이터 세트의 오류이지만 원칙적으로 다른 비즈니스 지표일 수도 있습니다. 이는 훈련 시간, 추론 시간 또는 모델 복잡성과 같은 2차 목표와 결합되거나 제한될 수 있습니다.

Recently, hyperparameter optimization has been extended to neural architecture search (NAS) (Elsken et al., 2018, Wistuba et al., 2019), where the goal is to find entirely new neural network architectures. Compared to classical HPO, NAS is even more expensive in terms of computation and requires additional effort to remain feasible in practice. Both HPO and NAS can be considered sub-fields of AutoML (Hutter et al., 2019), which aims to automate the entire ML pipeline.

최근 하이퍼파라미터 최적화는 완전히 새로운 신경망 아키텍처를 찾는 것이 목표인 신경 아키텍처 검색(NAS)(Elsken et al., 2018, Wistuba et al., 2019)으로 확장되었습니다. 기존 HPO에 비해 NAS는 계산 측면에서 훨씬 더 비싸며 실제로 실행 가능성을 유지하려면 추가 노력이 필요합니다. HPO와 NAS는 모두 전체 ML 파이프라인 자동화를 목표로 하는 AutoML(Hutter et al., 2019)의 하위 분야로 간주될 수 있습니다.

In this section we will introduce HPO and show how we can automatically find the best hyperparameters of the logistic regression example introduced in Section 4.5.

이 섹션에서는 HPO를 소개하고 섹션 4.5에 소개된 로지스틱 회귀 예제의 최상의 하이퍼파라미터를 자동으로 찾는 방법을 보여줍니다.

19.1.1. The Optimization Problem

We will start with a simple toy problem: searching for the learning rate of the multi-class logistic regression model SoftmaxRegression from Section 4.5 to minimize the validation error on the Fashion MNIST dataset. While other hyperparameters like batch size or number of epochs are also worth tuning, we focus on learning rate alone for simplicity.

간단한 장난감 문제부터 시작하겠습니다. Fashion MNIST 데이터세트의 검증 오류를 최소화하기 위해 섹션 4.5에서 다중 클래스 로지스틱 회귀 모델 SoftmaxRegression의 학습률을 검색하는 것입니다. 배치 크기나 에포크 수와 같은 다른 하이퍼파라미터도 조정할 가치가 있지만 단순화를 위해 학습 속도에만 중점을 둡니다.

import numpy as np
import torch
from scipy import stats
from torch import nn
from d2l import torch as d2l

Before we can run HPO, we first need to define two ingredients: the objective function and the configuration space.

HPO를 실행하기 전에 먼저 목적 함수와 구성 공간이라는 두 가지 구성 요소를 정의해야 합니다.

19.1.1.1. The Objective Function

The performance of a learning algorithm can be seen as a function f:X→ℝ that maps from the hyperparameter space x∈X to the validation loss. For every evaluation of f(x), we have to train and validate our machine learning model, which can be time and compute intensive in the case of deep neural networks trained on large datasets. Given our criterion f(x), our goal is to find x⋆ ∈ argmin_{x∈X} f(x).

학습 알고리즘의 성능은 하이퍼파라미터 공간 x∈D에서 검증 손실로 매핑되는 함수 f:X→ℝ로 볼 수 있습니다. f(x)를 평가할 때마다 기계 학습 모델을 훈련하고 검증해야 하는데, 이는 대규모 데이터 세트에 대해 훈련된 심층 신경망의 경우 시간과 계산 집약적일 수 있습니다. 기준 f(x)가 주어지면 우리의 목표는 x⋆∈argminx∈Xf(x)를 찾는 것입니다.

There is no simple way to compute gradients of f with respect to x, because it would require propagating the gradient through the entire training process. While there is recent work (Franceschi et al., 2017, Maclaurin et al., 2015) to drive HPO by approximate “hypergradients”, none of the existing approaches is competitive with the state of the art yet, and we will not discuss them here. Furthermore, the computational burden of evaluating f requires HPO algorithms to approach the global optimum with as few samples as possible.

x에 대한 f의 기울기를 계산하는 간단한 방법은 없습니다. 왜냐하면 전체 훈련 과정을 통해 기울기를 전파해야 하기 때문입니다. 대략적인 "hypergradients"를 통해 HPO를 구동하는 최근 연구(Franceschi et al., 2017, Maclaurin et al., 2015)가 있지만 기존 접근 방식 중 어느 것도 아직 최첨단 기술과 경쟁할 수 없습니다. 여기서는 논의하지 마세요. 더욱이, f를 평가하는 계산 부담으로 인해 HPO 알고리즘은 가능한 적은 샘플을 사용하여 전역 최적에 접근해야 합니다.

The training of neural networks is stochastic (e.g., weights are randomly initialized, mini-batches are randomly sampled), so our observations will be noisy: y = f(x)+ϵ, where we usually assume that the observation noise ϵ∼N(0,σ^2) is Gaussian distributed.

신경망의 훈련은 확률론적입니다(예: 가중치는 무작위로 초기화되고, 미니 배치는 무작위로 샘플링됩니다). 따라서 우리의 관측값은 시끄러울 것입니다: y∼f(x)+ϵ, 여기서 우리는 일반적으로 ϵ∼N( 0,σ) 관측 잡음은 가우스 분포입니다.
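
To make the noise model concrete, here is a minimal sketch that uses only the Python standard library. The objective `f`, the noise level, and the helper `noisy_eval` are hypothetical stand-ins for a real training run; the point is only that averaging repeated evaluations of the same configuration reduces the spread of the observed value.

```python
import random
import statistics

def f(x):
    # Hypothetical noise-free objective with its minimum at x = 0.1
    return (x - 0.1) ** 2

def noisy_eval(x, sigma, rng):
    # One observation y = f(x) + eps, with eps drawn from N(0, sigma^2)
    return f(x) + rng.gauss(0.0, sigma)

rng = random.Random(0)

# 200 single evaluations versus 200 averages over 10 repeated evaluations
single = [noisy_eval(0.3, 0.05, rng) for _ in range(200)]
averaged = [
    statistics.mean(noisy_eval(0.3, 0.05, rng) for _ in range(10))
    for _ in range(200)
]

spread_single = statistics.stdev(single)
spread_averaged = statistics.stdev(averaged)
print(spread_single, spread_averaged)
```

Averaging costs ten times as many evaluations, which is why HPO methods usually accept noisy observations instead of repeating every trial.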

Faced with all these challenges, we usually try to identify a small set of well performing hyperparameter configurations quickly, instead of hitting the global optimum exactly. However, due to the large computational demands of most neural network models, even this can take days or weeks of compute. We will explore in Section 19.4 how we can speed up the optimization process by either distributing the search or using cheaper-to-evaluate approximations of the objective function.

이러한 모든 문제에 직면했을 때 우리는 일반적으로 전체 최적 상태에 정확히 도달하는 대신 성능이 좋은 소수의 하이퍼파라미터 구성 세트를 신속하게 식별하려고 노력합니다. 그러나 대부분의 신경망 모델의 컴퓨팅 요구량이 많기 때문에 이 작업에도 며칠 또는 몇 주가 걸릴 수 있습니다. 우리는 섹션 19.4에서 검색을 분산하거나 목적 함수의 평가 비용이 더 저렴한 근사치를 사용하여 최적화 프로세스의 속도를 높일 수 있는 방법을 탐색할 것입니다.

We begin with a method for computing the validation error of a model.

모델의 검증 오류를 계산하는 방법부터 시작합니다.

class HPOTrainer(d2l.Trainer):  #@save
    def validation_error(self):
        self.model.eval()
        accuracy = 0
        val_batch_idx = 0
        for batch in self.val_dataloader:
            with torch.no_grad():
                x, y = self.prepare_batch(batch)
                y_hat = self.model(x)
                accuracy += self.model.accuracy(y_hat, y)
            val_batch_idx += 1
        return 1 -  accuracy / val_batch_idx

The code above defines the HPOTrainer class used for hyperparameter optimization (HPO). Each part works as follows:

class HPOTrainer(d2l.Trainer):: Defines the HPOTrainer class, which inherits from d2l.Trainer and adds a method for computing the validation error.
def validation_error(self):: Defines a method that evaluates the model on the validation dataset and returns the validation error.
self.model.eval(): Puts the model into evaluation mode, which changes the behavior of layers such as dropout and batch normalization. It does not disable gradient computation; torch.no_grad() below does that.
accuracy = 0: Initializes the accumulator for the accuracies of all validation batches.
val_batch_idx = 0: Initializes the counter of validation batches.
for batch in self.val_dataloader:: Iterates over the batches of the validation dataset. self.val_dataloader is the data loader for the validation set.
with torch.no_grad():: Opens a torch.no_grad() context, inside which no gradients are tracked while the model is evaluated.
x, y = self.prepare_batch(batch): Prepares the current batch and extracts the inputs x and the labels y.
y_hat = self.model(x): Computes the model predictions for the inputs.
accuracy += self.model.accuracy(y_hat, y): Computes the accuracy on the current batch from the predictions and the true labels, and adds it to the accumulator.
val_batch_idx += 1: Increments the batch counter.
return 1 - accuracy / val_batch_idx: Averages the per-batch accuracies and subtracts the result from 1 to obtain the validation error. This is the quantity that HPO minimizes.

In short, HPOTrainer evaluates the model on the validation dataset and returns its validation error, which serves as the performance measure during hyperparameter optimization.
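
One detail of validation_error is worth a small illustration: it averages per-batch accuracies rather than counting correct predictions over the whole set. The sketch below uses plain Python lists of (prediction, label) pairs as a hypothetical stand-in for the data loader, and assumes that the accuracy call returns the mean accuracy of a batch. The two ways of computing the error agree when all batches have the same size and can differ when the last batch is smaller.

```python
def error_from_batch_means(batches):
    # Same structure as validation_error: mean of per-batch accuracies, then 1 - mean
    accs = [sum(p == t for p, t in batch) / len(batch) for batch in batches]
    return 1 - sum(accs) / len(accs)

def error_exact(batches):
    # Exact error: wrong predictions divided by the total number of examples
    correct = sum(p == t for batch in batches for p, t in batch)
    total = sum(len(batch) for batch in batches)
    return 1 - correct / total

# Each batch is a list of (prediction, label) pairs
equal_batches = [[(1, 1), (0, 1)], [(1, 1), (1, 1)]]
unequal_batches = [[(1, 1), (0, 1), (0, 1), (0, 1)], [(1, 1)]]

print(error_from_batch_means(equal_batches), error_exact(equal_batches))
print(error_from_batch_means(unequal_batches), error_exact(unequal_batches))
```

When the dataset size is a multiple of the batch size, all batches are equally large and the two values coincide.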

We optimize validation error with respect to the hyperparameter configuration config, consisting of the learning_rate. For each evaluation, we train our model for max_epochs epochs, then compute and return its validation error:

learning_rate로 구성된 하이퍼파라미터 구성 config에 대한 검증 오류를 최적화합니다. 각 평가에 대해 max_epochs epoch에 대한 모델을 훈련한 다음 검증 오류를 계산하고 반환합니다.

def hpo_objective_softmax_classification(config, max_epochs=8):
    learning_rate = config["learning_rate"]
    trainer = d2l.HPOTrainer(max_epochs=max_epochs)
    data = d2l.FashionMNIST(batch_size=16)
    model = d2l.SoftmaxRegression(num_outputs=10, lr=learning_rate)
    trainer.fit(model=model, data=data)
    return trainer.validation_error().detach().numpy()

The code above defines hpo_objective_softmax_classification, the objective function for hyperparameter optimization (HPO). It trains a softmax classification model with the given hyperparameter configuration and returns the validation error. Each part works as follows:

def hpo_objective_softmax_classification(config, max_epochs=8):: Defines the objective function. It takes two arguments: config is the hyperparameter configuration, and max_epochs is the number of training epochs, with a default of 8.
learning_rate = config["learning_rate"]: Reads the learning rate from the configuration. This is the value that HPO tunes.
trainer = d2l.HPOTrainer(max_epochs=max_epochs): Creates a d2l.HPOTrainer that trains for max_epochs epochs.
data = d2l.FashionMNIST(batch_size=16): Loads the Fashion MNIST dataset with a batch size of 16. It provides both the training and the validation data.
model = d2l.SoftmaxRegression(num_outputs=10, lr=learning_rate): Creates a softmax regression model. num_outputs is the number of output classes and lr is the learning rate.
trainer.fit(model=model, data=data): Trains the model on the data for max_epochs epochs.
return trainer.validation_error().detach().numpy(): Computes the validation error of the trained model with trainer.validation_error(), converts it to a NumPy array, and returns it.

Because the function maps a hyperparameter configuration to a validation error, an HPO algorithm (e.g., Bayesian optimization) can call it to search for the best configuration.

19.1.1.2. The Configuration Space

Along with the objective function f(x), we also need to define the feasible set x∈X to optimize over, known as configuration space or search space. For our logistic regression example, we will use:

목적 함수 objective function f(x)와 함께 구성 공간 configuration space 또는 검색 공간 search space 으로 알려진 최적화를 위한 실행 가능한 집합 x∈X도 정의해야 합니다. 로지스틱 회귀 예제에서는 다음을 사용합니다.

config_space = {"learning_rate": stats.loguniform(1e-4, 1)}

The code above defines the search space for hyperparameter optimization. Each part works as follows:

config_space = {"learning_rate": stats.loguniform(1e-4, 1)}: Defines a dictionary named config_space with one entry per hyperparameter.
"learning_rate": The key is the name of the hyperparameter and is used to identify it.
stats.loguniform(1e-4, 1): Specifies the distribution of "learning_rate". stats.loguniform creates a distribution that is uniform on a logarithmic scale, here between 1e-4 and 1, so every sampled learning rate lies in that interval.

This dictionary tells the HPO algorithm which hyperparameters to search over. Since the learning rate is given a log-uniform distribution, the search covers all orders of magnitude between 1e-4 and 1 evenly.

Here we use the loguniform object from SciPy, which represents a uniform distribution between -4 and 0 in base-10 logarithmic space, that is, over values between 10^-4 and 1. This object allows us to sample random variables from this distribution.

여기서는 로그 공간에서 -4와 -1 사이의 균일 분포를 나타내는 SciPy의 loguniform 객체를 사용합니다. 이 객체를 사용하면 이 분포에서 무작위 변수를 샘플링할 수 있습니다.
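
As a quick check of what log-uniform sampling means, the sketch below draws samples with the standard library only. `sample_loguniform` is a hypothetical helper used here in place of `stats.loguniform(1e-4, 1).rvs()`: it samples uniformly in log10 space and maps the result back, so each decade between 10^-4 and 1 should receive roughly a quarter of the samples.

```python
import math
import random

def sample_loguniform(low, high, rng):
    # Draw uniformly between log10(low) and log10(high), then exponentiate
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(0)
samples = [sample_loguniform(1e-4, 1.0, rng) for _ in range(10000)]

# Count the samples falling into each decade [10^k, 10^(k+1)) for k = -4, ..., -1
per_decade = [
    sum(10 ** k <= s < 10 ** (k + 1) for s in samples)
    for k in range(-4, 0)
]
print(per_decade)
```

A plain uniform distribution on [1e-4, 1] would instead place about 90% of the samples in the last decade, which is why learning rates are usually sampled on a log scale.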

Each hyperparameter has a data type, such as float for learning_rate, as well as a closed bounded range (i.e., lower and upper bounds). We usually assign a prior distribution (e.g., uniform or log-uniform) to each hyperparameter to sample from. Some positive parameters, such as learning_rate, are best represented on a logarithmic scale as optimal values can differ by several orders of magnitude, while others, such as momentum, are best represented on a linear scale.

각 하이퍼파라미터에는 learning_rate의 float와 같은 데이터 유형과 닫힌 경계 범위(예: 하한 및 상한)가 있습니다. 우리는 일반적으로 샘플링할 각 하이퍼파라미터에 사전 분포(예: 균일 또는 로그 균일)를 할당합니다. learning_rate와 같은 일부 양수 매개변수는 최적의 값이 여러 차수만큼 다를 수 있으므로 로그 척도로 가장 잘 표현되는 반면, 모멘텀과 같은 다른 매개변수는 선형 척도로 제공됩니다.

Below we show a simple example of a configuration space consisting of typical hyperparameters of a multi-layer perceptron including their type and standard ranges.

아래에서는 유형 및 표준 범위를 포함하여 다층 퍼셉트론의 일반적인 하이퍼 매개변수로 구성된 구성 공간의 간단한 예를 보여줍니다.

Table 19.1.1 Example configuration space of multi-layer perceptron

Name	Type	Hyperparameter Ranges	log-scale
learning rate	float	[10^-6, 10^-1]	yes
batch size	integer	[8, 256]	yes
momentum	float	[0, 0.99]	no
activation function	categorical	{tanh, relu}	-
number of units	integer	[32, 1024]	yes
number of layers	integer	[1, 6]	no
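
The ranges in this table can be turned into a sampler in a few lines. The following is a minimal sketch using only the standard library; the function names and dictionary keys are hypothetical, and rounding a log-uniform draw is just one simple way to obtain integers on a log scale.

```python
import math
import random

def log_uniform(low, high, rng):
    # Uniform in log space between low and high
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def sample_mlp_config(rng):
    # One random configuration following the ranges and scales of the table
    return {
        "learning_rate": log_uniform(1e-6, 1e-1, rng),     # float, log scale
        "batch_size": round(log_uniform(8, 256, rng)),     # integer, log scale
        "momentum": rng.uniform(0.0, 0.99),                # float, linear scale
        "activation": rng.choice(["tanh", "relu"]),        # categorical
        "num_units": round(log_uniform(32, 1024, rng)),    # integer, log scale
        "num_layers": rng.randint(1, 6),                   # integer, linear scale
    }

rng = random.Random(0)
configs = [sample_mlp_config(rng) for _ in range(1000)]
print(configs[0])
```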

In general, the structure of the configuration space X can be complex and it can be quite different from ℝ^d. In practice, some hyperparameters may depend on the value of others. For example, assume we try to tune the number of layers for a multi-layer perceptron, and for each layer the number of units. The number of units of the l-th layer is relevant only if the network has at least l+1 layers. These advanced HPO problems are beyond the scope of this chapter. We refer the interested reader to (Baptista and Poloczek, 2018, Hutter et al., 2011, Jenatton et al., 2017).

일반적으로 구성 공간 X의 구조는 복잡할 수 있으며 ℝ**d와 상당히 다를 수 있습니다. 실제로 일부 하이퍼파라미터는 다른 하이퍼파라미터의 값에 따라 달라질 수 있습니다. 예를 들어, 다층 퍼셉트론의 레이어 수와 각 레이어의 단위 수를 조정하려고 한다고 가정합니다. l번째 레이어의 단위 수는 네트워크에 l+1개 이상의 레이어가 있는 경우에만 관련이 있습니다. 이러한 고급 HPO 문제는 이 장의 범위를 벗어납니다. 관심 있는 독자에게는 (Baptista and Poloczek, 2018, Hutter et al., 2011, Jenatton et al., 2017)을 참조하시기 바랍니다.

The configuration space plays an important role for hyperparameter optimization, since no algorithms can find something that is not included in the configuration space. On the other hand, if the ranges are too large, the computation budget to find well performing configurations might become infeasible.

구성 공간은 구성 공간에 포함되지 않은 것을 어떤 알고리즘도 찾을 수 없기 때문에 하이퍼파라미터 최적화에 중요한 역할을 합니다. 반면에 범위가 너무 크면 성능이 좋은 구성을 찾기 위한 계산 예산이 실행 불가능해질 수 있습니다.

19.1.2. Random Search

Random search is the first hyperparameter optimization algorithm we will consider. The main idea of random search is to independently sample from the configuration space until a predefined budget (e.g., a maximum number of iterations) is exhausted, and to return the best observed configuration. All evaluations can be executed independently in parallel (see Section 19.3), but here we use a sequential loop for simplicity.

무작위 검색은 우리가 고려할 첫 번째 하이퍼파라미터 최적화 알고리즘입니다. 무작위 검색의 주요 아이디어는 미리 정의된 예산(예: 최대 반복 횟수)이 소진될 때까지 구성 공간에서 독립적으로 샘플링하고 가장 잘 관찰된 구성을 반환하는 것입니다. 모든 평가는 독립적으로 병렬로 실행될 수 있지만(19.3절 참조) 여기서는 단순화를 위해 순차 루프를 사용합니다.

errors, values = [], []
num_iterations = 5

for i in range(num_iterations):
    learning_rate = config_space["learning_rate"].rvs()
    print(f"Trial {i}: learning_rate = {learning_rate}")
    y = hpo_objective_softmax_classification({"learning_rate": learning_rate})
    print(f"    validation_error = {y}")
    values.append(learning_rate)
    errors.append(y)

The code above runs random search in a loop and records the hyperparameter value and validation error of every trial. Each part works as follows:

errors, values = [], []: Initializes two empty lists, errors for the validation errors and values for the sampled hyperparameter values.
num_iterations = 5: Sets the number of trials, here 5.
for i in range(num_iterations):: Starts a loop that runs num_iterations times.
learning_rate = config_space["learning_rate"].rvs(): Draws a random value for "learning_rate" from the configuration space and stores it in learning_rate.
print(f"Trial {i}: learning_rate = {learning_rate}"): Prints which value was sampled in the current trial.
y = hpo_objective_softmax_classification({"learning_rate": learning_rate}): Calls the objective function with the sampled value and stores the resulting validation error in y.
print(f" validation_error = {y}"): Prints the validation error of the current trial.
values.append(learning_rate): Appends the sampled learning rate to values.
errors.append(y): Appends the validation error to errors.

Each iteration therefore evaluates one randomly sampled learning rate and records the result, so that the best configuration can be picked afterwards.

 validation_error = 0.17070001363754272

The best learning rate is then simply the one with the lowest validation error.

가장 좋은 학습률은 검증 오류가 가장 낮은 학습률입니다.

best_idx = np.argmin(errors)
print(f"optimal learning rate = {values[best_idx]}")

The code above selects the best learning rate from the recorded results. Each part works as follows:

best_idx = np.argmin(errors): Finds the index of the smallest validation error in errors. np.argmin() returns the index of the minimum element of an array.
print(f"optimal learning rate = {values[best_idx]}"): Looks up the learning rate at index best_idx in values and prints it. This is the learning rate with the lowest observed validation error.

In other words, the code picks the learning rate with the smallest validation error among all trials. This value can then be used to train the final model.

optimal learning rate = 0.09844872561810249
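
The loop and the argmin step above can be wrapped into one reusable function. The sketch below is self-contained and replaces the expensive training run with a hypothetical `toy_objective` that is smallest near a learning rate of 0.1; `random_search` and `sample_config` are likewise illustrative names, not part of d2l.

```python
import math
import random

def random_search(objective, sample_config, num_iterations, seed=0):
    # Sample configurations independently and keep track of the best one
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    history = []
    for _ in range(num_iterations):
        config = sample_config(rng)
        error = objective(config)
        history.append((config, error))
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error, history

def toy_objective(config):
    # Hypothetical stand-in for a validation error, minimized at learning_rate = 0.1
    return (math.log10(config["learning_rate"]) + 1.0) ** 2

def sample_config(rng):
    # Log-uniform learning rate between 1e-4 and 1
    return {"learning_rate": 10 ** rng.uniform(-4, 0)}

best_config, best_error, history = random_search(toy_objective, sample_config, 50)
print(best_config, best_error)
```

To use it with the real objective, `toy_objective` would be swapped for a function such as hpo_objective_softmax_classification, at the cost of one full training run per iteration.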

Due to its simplicity and generality, random search is one of the most frequently used HPO algorithms. It does not require any sophisticated implementation and can be applied to any configuration space as long as we can define some probability distribution for each hyperparameter.

단순성과 일반성으로 인해 무작위 검색은 가장 자주 사용되는 HPO 알고리즘 중 하나입니다. 정교한 구현이 필요하지 않으며 각 하이퍼파라미터에 대한 확률 분포를 정의할 수 있는 한 모든 구성 공간에 적용할 수 있습니다.

Unfortunately, random search also comes with a few shortcomings. First, it does not adapt the sampling distribution based on the observations it has collected so far. Hence, it is as likely to sample a poorly performing configuration as a better performing one. Second, the same amount of resources is spent on all configurations, even though some may show poor initial performance and are less likely to outperform previously seen configurations.

불행하게도 무작위 검색에는 몇 가지 단점도 있습니다. 첫째, 지금까지 수집한 이전 관측치를 기반으로 샘플링 분포를 조정하지 않습니다. 따라서 성능이 더 좋은 구성보다 성능이 낮은 구성을 샘플링할 가능성이 동일합니다. 둘째, 일부 구성은 초기 성능이 좋지 않고 이전 구성보다 성능이 떨어질 가능성이 있더라도 모든 구성에 동일한 양의 리소스가 사용됩니다.

In the next sections we will look at more sample-efficient hyperparameter optimization algorithms that overcome the shortcomings of random search by using a model to guide the search. We will also look at algorithms that automatically stop the evaluation process of poorly performing configurations to speed up the optimization process.

다음 섹션에서는 검색을 안내하는 모델을 사용하여 무작위 검색의 단점을 극복하는 더 효율적인 하이퍼파라미터 최적화 알고리즘 샘플을 살펴보겠습니다. 또한 최적화 프로세스 속도를 높이기 위해 성능이 낮은 구성의 평가 프로세스를 자동으로 중지하는 알고리즘도 살펴보겠습니다.

19.1.3. Summary

In this section we introduced hyperparameter optimization (HPO) and how we can phrase it as a global optimization by defining a configuration space and an objective function. We also implemented our first HPO algorithm, random search, and applied it on a simple softmax classification problem.

이 섹션에서는 하이퍼파라미터 최적화(HPO)를 소개하고 구성 공간과 목적 함수를 정의하여 이를 전역 최적화로 표현하는 방법을 소개했습니다. 또한 첫 번째 HPO 알고리즘인 무작위 검색을 구현하고 이를 간단한 소프트맥스 분류 문제에 적용했습니다.

While random search is very simple, it is a better alternative to grid search, which simply evaluates a fixed set of hyperparameters. Random search somewhat mitigates the curse of dimensionality (Bellman, 1966), and can be far more efficient than grid search if the criterion most strongly depends on a small subset of the hyperparameters.

무작위 검색은 매우 간단하지만, 단순히 고정된 하이퍼파라미터 세트를 평가하는 그리드 검색보다 더 나은 대안입니다. 무작위 검색은 차원의 저주를 어느 정도 완화하며(Bellman, 1966), 기준이 하이퍼 매개변수의 작은 하위 집합에 가장 크게 의존하는 경우 그리드 검색보다 훨씬 더 효율적일 수 있습니다.
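
The advantage over grid search can be seen by counting how many distinct values each method tries for a single hyperparameter under the same budget. The sketch below is a small illustration with two hyperparameters on [0, 1] and a budget of nine evaluations; the setup is hypothetical and only meant to show the counting argument.

```python
import itertools
import random

# Grid search: 3 values per hyperparameter, 3 x 3 = 9 configurations
points_per_axis = 3
grid_axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
grid = list(itertools.product(grid_axis, repeat=2))

# Random search with the same budget of 9 configurations
rng = random.Random(0)
rand = [(rng.random(), rng.random()) for _ in range(len(grid))]

# Number of distinct values tried for the first hyperparameter
distinct_grid = len({x for x, _ in grid})
distinct_rand = len({x for x, _ in rand})
print(distinct_grid, distinct_rand)
```

If the criterion depends mostly on the first hyperparameter, the grid effectively explores only three settings of it, while random search explores nine.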

19.1.4. Exercises

1. In this chapter, we optimize the validation error of a model after training on a disjoint training set. For simplicity, our code uses Trainer.val_dataloader, which maps to a loader around FashionMNIST.val.
    1. Convince yourself (by looking at the code) that this means we use the original FashionMNIST training set (60000 examples) for training, and the original test set (10000 examples) for validation.
    2. Why could this practice be problematic? Hint: Re-read Section 3.6, especially about model selection.
    3. What should we have done instead?
2. We stated above that hyperparameter optimization by gradient descent is very hard to do. Consider a small problem, such as training a two-layer perceptron on the FashionMNIST dataset (Section 5.2) with a batch size of 256. We would like to tune the learning rate of SGD in order to minimize a validation metric after one epoch of training.
    1. Why can't we use validation error for this purpose? What metric on the validation set would you use?
    2. Sketch (roughly) the computational graph of the validation metric after training for one epoch. You may assume that initial weights and hyperparameters (such as learning rate) are input nodes to this graph. Hint: Re-read about computational graphs in Section 5.3.
    3. Give a rough estimate of the number of floating point values you need to store during a forward pass on this graph. Hint: FashionMNIST has 60000 cases. Assume the required memory is dominated by the activations after each layer, and look up the layer widths in Section 5.2.
    4. Apart from the sheer amount of compute and storage required, what other issues would gradient-based hyperparameter optimization run into? Hint: Re-read about vanishing and exploding gradients in Section 5.4.
    5. Advanced: Read (Maclaurin et al., 2015) for an elegant (yet still somewhat impractical) approach to gradient-based HPO.
3. Grid search is another HPO baseline, where we define an equi-spaced grid for each hyperparameter, then iterate over the (combinatorial) Cartesian product in order to suggest configurations.
    1. We stated above that random search can be much more efficient than grid search for HPO on a sizable number of hyperparameters, if the criterion most strongly depends on a small subset of the hyperparameters. Why is this? Hint: Read (Bergstra et al., 2011).

D2L - 19. Hyperparameter Optimization

2023. 9. 10. 01:04 | Posted by 솔웅

https://d2l.ai/chapter_hyperparameter-optimization/index.html

19. Hyperparameter Optimization

Aaron Klein (Amazon), Matthias Seeger (Amazon), and Cedric Archambeau (Amazon)

The performance of every machine learning model depends on its hyperparameters. They control the learning algorithm or the structure of the underlying statistical model. However, there is no general way to choose hyperparameters in practice. Instead, hyperparameters are often set in a trial-and-error manner or sometimes left to their default values by practitioners, leading to suboptimal generalization.

모든 기계 학습 모델의 성능은 하이퍼파라미터에 따라 달라집니다. 학습 알고리즘이나 기본 통계 모델의 구조를 제어합니다. 그러나 실제로 하이퍼파라미터를 선택하는 일반적인 방법은 없습니다. 대신 초매개변수는 종종 시행착오 방식으로 설정되거나 때로는 실무자가 기본값으로 남겨두어 차선책 일반화로 이어집니다.

Hyperparameter optimization provides a systematic approach to this problem, by casting it as an optimization problem: a good set of hyperparameters should (at least) minimize a validation error. Compared to most other optimization problems arising in machine learning, hyperparameter optimization is a nested one, where each iteration requires training and validating a machine learning model.

하이퍼파라미터 최적화는 이 문제를 최적화 문제로 캐스팅하여 이 문제에 대한 체계적인 접근 방식을 제공합니다. 좋은 하이퍼파라미터 세트는 (적어도) 유효성 검사 오류를 최소화해야 합니다. 기계 학습에서 발생하는 대부분의 다른 최적화 문제와 비교할 때 하이퍼파라미터 최적화는 중첩된 문제로, 각 반복에는 기계 학습 모델을 훈련하고 검증해야 합니다.

In this chapter, we will first introduce the basics of hyperparameter optimization. We will also present some recent advancements that improve the overall efficiency of hyperparameter optimization by exploiting cheap-to-evaluate proxies of the original objective function. At the end of this chapter, you should be able to apply state-of-the-art hyperparameter optimization techniques to optimize the hyperparameters of your own machine learning algorithm.

이번 장에서는 먼저 하이퍼파라미터 최적화의 기본 사항을 소개합니다. 또한 원래 목적 함수의 평가하기 쉬운 프록시를 활용하여 하이퍼파라미터 최적화의 전반적인 효율성을 향상시키는 몇 가지 최근 발전 사항을 제시할 것입니다. 이 장을 마치면 최첨단 하이퍼파라미터 최적화 기술을 적용하여 자체 머신러닝 알고리즘의 하이퍼파라미터를 최적화할 수 있게 됩니다.

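What is Hyperparameter Optimization? (GPT answer)

Hyperparameter optimization (HPO) is the process of choosing a model's hyperparameters so as to maximize the performance of a machine learning model. Hyperparameters are not learned by the model itself; they are settings that are fixed when the model is configured or trained. Choosing them well improves model performance and helps to avoid problems such as overfitting or underfitting.

Hyperparameter optimization proceeds in the following steps:

Define the hyperparameter space: Decide which hyperparameters to tune and define a valid range or set of possible values for each, for example the learning rate, the batch size, the number of hidden layers, or the regularization weight.
Define the objective function: Define an objective function (or loss function) that measures the performance of the model for a given hyperparameter setting.
Choose an optimization algorithm: Select the algorithm that will carry out the search. Common choices include random search, Bayesian optimization, and genetic algorithms.
Run the optimization: Use the chosen algorithm to find the hyperparameter combination that maximizes (or minimizes) the objective function. This requires evaluating the objective function many times.
Obtain the best setting: When the optimization finishes, train the model with the best hyperparameter setting found and check its performance.

Hyperparameter optimization plays an important role in improving model quality and in preventing problems such as overfitting. Note, however, that the optimization itself can be computationally expensive, and that it becomes challenging when the hyperparameter space is large or the objective function is discontinuous or non-differentiable. The optimization algorithm and the hyperparameter space should therefore be chosen with care.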

D2L - 18.3. Gaussian Process Inference

2023. 9. 10. 00:59 | Posted by 솔웅

https://d2l.ai/chapter_gaussian-processes/gp-inference.html

18.3. Gaussian Process Inference

In this section, we will show how to perform posterior inference and make predictions using the GP priors we introduced in the last section. We will start with regression, where we can perform inference in closed form. This is a “GPs in a nutshell” section to quickly get up and running with Gaussian processes in practice. We’ll start coding all the basic operations from scratch, and then introduce GPyTorch, which will make working with state-of-the-art Gaussian processes and integration with deep neural networks much more convenient. We will consider these more advanced topics in depth in the next section. In that section, we will also consider settings where approximate inference is required — classification, point processes, or any non-Gaussian likelihoods.

이 섹션에서는 지난 섹션에서 소개한 GP priors 을 사용하여 사후 추론을 수행하고 예측하는 방법을 보여줍니다. 닫힌 형식으로 추론을 수행할 수 있는 회귀부터 시작하겠습니다. 이것은 실제로 가우스 프로세스를 빠르게 시작하고 실행하기 위한 "간단한 GP" 섹션입니다. 모든 기본 작업을 처음부터 코딩하기 시작한 다음 GPyTorch를 소개합니다. 이를 통해 최첨단 가우스 프로세스 작업 및 심층 신경망과의 통합이 훨씬 더 편리해집니다. 다음 섹션에서는 이러한 고급 주제를 심층적으로 고려할 것입니다. 해당 섹션에서는 분류, 포인트 프로세스 또는 비가우시안 가능성 등 대략적인 추론이 필요한 설정도 고려할 것입니다.

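What is Gaussian Process Inference?

Gaussian process inference is a powerful tool in machine learning and statistical modeling. A Gaussian process (GP) is a probabilistic model that defines a distribution over functions. It is well suited to regression and classification problems, and it makes predictions in terms of the mean and variance of a probability distribution.

The main concepts of Gaussian process inference are:

Process: A Gaussian process can be thought of as a model of a stochastic process, that is, a probabilistic model of the relationship between inputs and outputs.
Probability distribution: A GP defines a probability distribution for every input value, and any finite collection of function values is assumed to follow a joint Gaussian distribution. The GP is therefore characterized by a mean function and a covariance function (kernel).
Kernel: The kernel function is a central part of a GP. It measures the similarity between input values and thereby determines the correlation between the corresponding outputs. Various kernels can be used, such as the RBF (radial basis function) kernel or neural network kernels.
Inference: A GP is used to infer the output for a given input. Based on the existing observations, it computes a mean and a variance, which give the prediction and its uncertainty. This is useful in both regression and classification.
Hyperparameters: The kernel function of a GP has hyperparameters. They are tuned while the model is fitted in order to improve the fit to the data.
Probabilistic prediction: Because a GP is a probabilistic model, it provides the uncertainty of each prediction, which indicates how much the prediction can be trusted.

Gaussian process inference is used mainly for regression, where it produces a predictive distribution that accounts for model uncertainty. It is also used in applications such as hyperparameter optimization, sampling of random functions, and reasoning about uncertainty in data.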

18.3.1. Posterior Inference for Regression

An observation model relates the function we want to learn, f(x), to our observations y(x), both indexed by some input x. In classification, x could be the pixels of an image, and y could be the associated class label. In regression, y typically represents a continuous output, such as a land surface temperature, a sea-level, a CO2 concentration, etc.

관측 모델은 우리가 학습하려는 함수 f(x)를 관측값 y(x)에 연결합니다. 둘 다 일부 입력 x에 의해 인덱싱됩니다. 분류에서 x는 이미지의 픽셀이 될 수 있고 y는 관련 클래스 레이블이 될 수 있습니다. 회귀 분석에서 y는 일반적으로 지표면 온도, 해수면, CO2 농도 등과 같은 연속 출력을 나타냅니다.

In regression, we often assume the outputs are given by a latent noise-free function f(x) plus i.i.d. Gaussian noise ϵ(x):

$$y(x) = f(x) + \epsilon(x), \tag{18.3.1}$$

with ϵ(x)∼N(0,σ**2). Let y=y(X)=(y(x1),…,y(xn))**⊤ be a vector of our training observations, and f=(f(x1),…,f(xn))**⊤ be a vector of the latent noise-free function values, queried at the training inputs X=x1,…,xn.

We will assume f(x)∼GP(m,k), which means that any collection of function values f has a joint multivariate Gaussian distribution, with mean vector μi=m(xi) and covariance matrix Kij=k(xi,xj). The RBF kernel k(xi,xj)=a**2 exp(−||xi−xj||**2 / (2ℓ**2)) would be a standard choice of covariance function. For notational simplicity, we will assume the mean function m(x)=0; our derivations can easily be generalized later on.
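As a minimal sketch of the covariance matrix Kij=k(xi,xj), the snippet below builds an RBF kernel matrix for a handful of one-dimensional inputs. The helper `rbf_kernel` and its `amplitude` and `lengthscale` arguments are hypothetical names introduced only for this illustration; it is not the `d2l.rbfkernel` function used later in this post.

```python
import numpy as np

def rbf_kernel(x1, x2, amplitude=1.0, lengthscale=1.0):
    """Illustrative helper (an assumption, not d2l.rbfkernel):
    k(x, x') = a^2 * exp(-|x - x'|^2 / (2 * l^2)) for 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2  # pairwise squared distances
    return amplitude ** 2 * np.exp(-0.5 * sq_dist / lengthscale ** 2)

x = np.linspace(0, 5, 6)  # six 1-D inputs
K = rbf_kernel(x, x, amplitude=2.0, lengthscale=0.5)

print(K.shape)                      # one entry per pair of inputs
print(np.allclose(K, K.T))          # a covariance matrix is symmetric
print(np.linalg.eigvalsh(K).min())  # smallest eigenvalue of the kernel matrix
```

The diagonal of K equals a**2, the prior variance of each function value, and the off-diagonal entries decay as the inputs move apart at a rate set by ℓ.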

Suppose we want to make predictions at a set of inputs

$$X_* = x_{*1}, x_{*2}, \dots, x_{*m}.$$

Then we want to find p(f∗|y,X). In the regression setting, we can conveniently find this distribution by using Gaussian identities, after finding the joint distribution over f∗=f(X∗) and y.

If we evaluate equation (18.3.1) at the training inputs X, we have y=f+ϵ. By the definition of a Gaussian process (see last section), f∼N(0,K(X,X)) where K(X,X) is an n×n matrix formed by evaluating our covariance function (aka kernel) at all possible pairs of inputs xi,xj∈X. ϵ is simply a vector comprised of iid samples from N(0,σ**2) and thus has distribution N(0,σ**2I). y is therefore a sum of two independent multivariate Gaussian variables, and thus has distribution N(0,K(X,X)+σ**2I). One can also show that cov(f∗,y)=cov(y,f∗)**⊤=K(X∗,X) where K(X∗,X) is an m×n matrix formed by evaluating the kernel at all pairs of test and training inputs.

훈련 입력 X에서 방정식 (18.3.1)을 평가하면 y=f+ϵ가 됩니다. 가우스 프로세스(마지막 섹션 참조)의 정의에 따르면 f∼N(0,K(X,X)) 여기서 K(X,X)는 공분산 함수(일명 커널)를 평가하여 형성된 n×n 행렬입니다. 가능한 모든 입력 쌍 xi,xj∈X. ϵ는 단순히 N(0,σ**2)의 iid 샘플로 구성된 벡터이므로 분포 N(0,σ**2I)를 갖습니다. 따라서 y는 두 개의 독립적인 다변량 가우스 변수의 합이므로 분포 N(0,K(X,X)+σ**2I)를 갖습니다. cov(f*,y)=cov(y,f*)**⊤=K(X*,X) 여기서 K(X*,X)는 커널을 평가하여 형성된 m×n 행렬임을 보여줄 수도 있습니다. 모든 테스트 및 훈련 입력 쌍에서.

We can then use standard Gaussian identities to find the conditional distribution from the joint distribution (see, e.g., Bishop Chapter 2), f∗|y,X,X∗∼N(m∗,S∗), where m∗=K(X∗,X)[K(X,X)+σ**2 I]**−1 y, and S∗=K(X∗,X∗)−K(X∗,X)[K(X,X)+σ**2 I]**−1 K(X,X∗).
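The conditional mean and covariance above translate almost line by line into NumPy. The following is a minimal sketch under stated assumptions: a unit-amplitude RBF kernel, a zero mean function, toy data from sin(x), and hypothetical helper names (`rbf_kernel`, `gp_posterior`). It solves linear systems with `np.linalg.solve` rather than forming an explicit inverse.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_posterior(train_x, train_y, test_x, lengthscale, noise_std):
    """Mean and covariance of f_* | y, X, X_* for a zero-mean GP prior."""
    K_xx = rbf_kernel(train_x, train_x, lengthscale)  # K(X, X),   n x n
    K_sx = rbf_kernel(test_x, train_x, lengthscale)   # K(X_*, X), m x n
    K_ss = rbf_kernel(test_x, test_x, lengthscale)    # K(X_*, X_*), m x m
    A = K_xx + noise_std ** 2 * np.eye(len(train_x))  # K(X, X) + sigma^2 I
    mean = K_sx @ np.linalg.solve(A, train_y)         # m_*
    cov = K_ss - K_sx @ np.linalg.solve(A, K_sx.T)    # S_*
    return mean, cov

train_x = np.linspace(0, 5, 8)
train_y = np.sin(train_x)  # toy targets, chosen only for illustration
test_x = np.linspace(0, 5, 25)
post_mean, post_cov = gp_posterior(train_x, train_y, test_x,
                                   lengthscale=1.0, noise_std=0.1)
print(post_mean.shape, post_cov.shape)
```

The diagonal of `post_cov` gives the per-point predictive variance that is typically reported instead of the full matrix.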

Typically, we do not need to make use of the full predictive covariance matrix S, and instead use the diagonal of S for uncertainty about each prediction. Often for this reason we write the predictive distribution for a single test point x∗, rather than a collection of test points.

일반적으로 전체 예측 공분산 행렬 S를 사용할 필요가 없으며 대신 각 예측에 대한 불확실성을 위해 S의 대각선을 사용합니다. 이러한 이유로 우리는 테스트 포인트 모음이 아닌 단일 테스트 포인트 x*에 대한 예측 분포를 작성하는 경우가 많습니다.

The kernel matrix has parameters θ that we also wish to estimate, such as the amplitude 'a' and lengthscale ℓ of the RBF kernel above. For these purposes we use the marginal likelihood, p(y|θ,X), which we already derived in working out the marginal distributions to find the joint distribution over y,f∗. As we will see, the marginal likelihood compartmentalizes into model fit and model complexity terms, and automatically encodes a notion of Occam’s razor for learning hyperparameters. For a full discussion, see MacKay Ch. 28 (MacKay, 2003), and Rasmussen and Williams Ch. 5 (Rasmussen and Williams, 2006).

import math
import os
import gpytorch
import matplotlib.pyplot as plt
import numpy as np
import torch
from scipy import optimize
from scipy.spatial import distance_matrix
from d2l import torch as d2l

d2l.set_figsize()

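The code above imports several Python libraries and modules and sets up the environment. Each part does the following:

import math: imports math, the standard-library module for mathematical functions and constants.
import os: imports os, the standard-library module for interacting with the operating system.
import gpytorch: imports gpytorch, a library for building and manipulating Gaussian process models. Gaussian processes are used for probabilistic regression and classification.
import matplotlib.pyplot as plt: imports pyplot, the plotting submodule of the Matplotlib visualization library.
import numpy as np: imports NumPy, which provides multidimensional arrays and mathematical functions.
import torch: imports the PyTorch deep learning library, which is used to implement neural networks and tensor operations.
from scipy import optimize: imports the optimize module of SciPy, a library for scientific and engineering computation. This module is used to solve optimization problems.
from scipy.spatial import distance_matrix: imports the distance_matrix function from SciPy, which computes the matrix of distances between points.
from d2l import torch as d2l: imports the torch submodule of the d2l package under the name d2l. The d2l package provides the utility functions that accompany the book Dive into Deep Learning.
d2l.set_figsize(): calls the set_figsize function of the d2l (Dive into Deep Learning) library to set the figure size used for plots.

This code only imports libraries and configures the environment in preparation for analysis and visualization. It does not process data or build a model itself; that work happens in the code blocks that follow.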

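What is Posterior Inference for Regression? (GPT answer)

Posterior inference for regression is a concept widely used in machine learning and statistics. It is the process of estimating the uncertainty of a predictive model in a regression problem and modeling the distribution of the output for a given input. Posterior inference is associated with probabilistic models such as Gaussian processes (GPs) and with Bayesian regression. The main concepts are as follows:

Posterior distribution: The central concept in posterior inference. It is the probability distribution over the unknown quantities (the model parameters or, for a GP, the latent function values) after it has been updated with the information in the observed data.
Bayesian inference: Posterior inference is a form of Bayesian inference, which computes the posterior distribution and uses it to estimate the uncertainty of the model.
Posterior mean: The mean of the posterior distribution provides a point estimate of the unknown quantities and is used as the model's prediction.
Posterior variance: The variance of the posterior distribution represents the remaining uncertainty. A high posterior variance means the unknown quantities are poorly determined, which carries over into the uncertainty of the predictions.
Posterior prediction: The posterior distribution is used to predict the distribution of the output at a given input. This is what allows the model's uncertainty to be taken into account.
Hyperparameter optimization: Within this framework the model's hyperparameters can be tuned, for example by maximizing the marginal likelihood, which can improve the model's performance.

In regression problems, posterior inference is mainly used in Bayesian regression models, Gaussian process regression, and Bayesian optimization. Accounting for predictive uncertainty in this way makes the predictions more trustworthy.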

18.3.2. Equations for Making Predictions and Learning Kernel Hyperparameters in GP Regression

We list here the equations you will use for learning hyperparameters and making predictions in Gaussian process regression. Again, we assume a vector of regression targets y, indexed by inputs X={x1,…,xn}, and we wish to make a prediction at a test input x∗. We assume i.i.d. additive zero-mean Gaussian noise with variance σ**2. We use a Gaussian process prior f(x)∼GP(m,k) for the latent noise-free function, with mean function m and kernel function k. The kernel itself has parameters θ that we want to learn. For example, if we use an RBF kernel, k(xi,xj)=a**2 exp(−||xi−xj||**2 / (2ℓ**2)), we want to learn θ={a**2,ℓ**2}. Let K(X,X) represent an n×n matrix corresponding to evaluating the kernel for all possible pairs of n training inputs. Let K(x∗,X) represent a 1×n vector formed by evaluating k(x∗,xi), i=1,…,n. Let μ be a mean vector formed by evaluating the mean function m(x) at every training point x.

Typically in working with Gaussian processes, we follow a two-step procedure. 1. Learn kernel hyperparameters θ^ by maximizing the marginal likelihood with respect to these hyperparameters. 2. Use the predictive mean as a point predictor, and 2 times the predictive standard deviation to form a 95% credible set, conditioning on these learned hyperparameters θ^.

일반적으로 가우스 프로세스를 사용하여 작업할 때 우리는 2단계 절차를 따릅니다. 1. 이러한 하이퍼파라미터에 대한 한계 가능성을 최대화하여 커널 하이퍼파라미터 θ^를 학습합니다. 2. 예측 평균을 점 예측 변수로 사용하고 예측 표준 편차의 2배를 사용하여 학습된 하이퍼파라미터 θ^를 조건으로 하여 95% 신뢰할 수 있는 세트를 형성합니다.

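The log marginal likelihood is simply a log Gaussian density, which has the form:

$$\log p(\mathbf{y} \mid \theta, X) = -\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^{\top}[K_{\theta}(X,X) + \sigma^2 I]^{-1}(\mathbf{y}-\boldsymbol{\mu}) - \frac{1}{2}\log|K_{\theta}(X,X) + \sigma^2 I| - \frac{n}{2}\log(2\pi)$$

The predictive distribution has the form:

$$p(f_* \mid x_*, \mathbf{y}, \theta) = \mathcal{N}(a_*, v_*)$$

$$a_* = k_{\theta}(x_*,X)[K_{\theta}(X,X)+\sigma^2 I]^{-1}(\mathbf{y}-\boldsymbol{\mu}) + m(x_*)$$

$$v_* = k_{\theta}(x_*,x_*) - k_{\theta}(x_*,X)[K_{\theta}(X,X)+\sigma^2 I]^{-1}k_{\theta}(X,x_*)$$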
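Since the log marginal likelihood is a log Gaussian density, it can be evaluated with a single Cholesky factorization, which yields both the linear solve and the log determinant. The sketch below assumes a zero mean function, a unit-amplitude RBF kernel, and synthetic data; `rbf_kernel` and `log_marginal_likelihood` are hypothetical helper names, not functions from d2l or GPyTorch.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def log_marginal_likelihood(train_x, train_y, lengthscale, noise_std):
    """log N(y | 0, K + sigma^2 I), computed from a Cholesky factor."""
    n = len(train_x)
    A = rbf_kernel(train_x, train_x, lengthscale) + noise_std ** 2 * np.eye(n)
    L = np.linalg.cholesky(A)                              # A = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))  # A^{-1} y
    fit = -0.5 * train_y @ alpha                           # -1/2 y^T A^{-1} y
    complexity = -np.sum(np.log(np.diag(L)))               # -1/2 log|A|
    const = -0.5 * n * np.log(2 * np.pi)
    return fit + complexity + const

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible
x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.5 * np.sin(4 * x) + 0.25 * rng.standard_normal(20)
mll = log_marginal_likelihood(x, y, lengthscale=0.3, noise_std=0.25)
print(mll)
```

Summing the logs of the Cholesky diagonal avoids computing the determinant itself, which can underflow or overflow for larger n.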

18.3.3. Interpreting Equations for Learning and Predictions

There are some key points to note about the predictive distributions for Gaussian processes:

가우스 프로세스의 예측 분포에 대해 주목해야 할 몇 가지 핵심 사항이 있습니다.

Despite the flexibility of the model class, it is possible to do exact Bayesian inference for GP regression in closed form. Aside from learning the kernel hyperparameters, there is no training. We can write down exactly what equations we want to use to make predictions. Gaussian processes are relatively exceptional in this respect, and it has greatly contributed to their convenience, versatility, and continued popularity.

모델 클래스의 유연성에도 불구하고 GP 회귀에 대한 정확한 베이지안 추론을 닫힌 형식으로 수행하는 것이 가능합니다. 커널 하이퍼파라미터를 학습하는 것 외에는 교육이 없습니다. 예측을 하기 위해 어떤 방정식을 사용하고 싶은지 정확하게 적을 수 있습니다. 가우스 프로세스는 이 점에서 상대적으로 예외적이며 편의성, 다양성 및 지속적인 인기에 크게 기여했습니다.

The predictive mean a∗ is a linear combination of the training targets y, weighted by the kernel kθ(x∗,X)[Kθ(X,X)+σ**2 I]**−1. As we will see, the kernel (and its hyperparameters) thus plays a crucial role in the generalization properties of the model.

The predictive mean explicitly depends on the target values y but the predictive variance does not. The predictive uncertainty instead grows as the test input x∗ moves away from the target locations X, as governed by the kernel function. However, uncertainty will implicitly depend on the values of the targets y through the kernel hyperparameters θ, which are learned from the data.

예측 평균은 명시적으로 목표 값 y에 따라 달라지지만 예측 분산은 그렇지 않습니다. 대신 커널 함수에 따라 테스트 입력 x*가 목표 위치 X에서 멀어짐에 따라 예측 불확실성이 커집니다. 그러나 불확실성은 데이터에서 학습된 커널 하이퍼파라미터 θ를 통해 목표 y의 값에 암묵적으로 의존합니다.
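The point that the predictive variance does not depend on y can be seen directly in code: a function that computes it needs only the input locations and the hyperparameters. This is a minimal sketch assuming a unit-amplitude RBF kernel with fixed hyperparameters; `rbf_kernel` and `posterior_variance` are hypothetical helper names.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def posterior_variance(train_x, test_x, lengthscale, noise_std):
    """Variance of f(x_*) at each test input; note that no targets y are passed in."""
    A = rbf_kernel(train_x, train_x, lengthscale) \
        + noise_std ** 2 * np.eye(len(train_x))
    K_sx = rbf_kernel(test_x, train_x, lengthscale)
    # diag(K_sx A^{-1} K_sx^T), computed without forming the full m x m matrix
    reduction = np.sum(K_sx * np.linalg.solve(A, K_sx.T).T, axis=1)
    return 1.0 - reduction  # k(x_*, x_*) = 1 for a unit-amplitude kernel

train_x = np.linspace(0, 5, 30)
test_x = np.array([2.5, 5.5, 10.0])  # inside, just outside, far from the data
var = posterior_variance(train_x, test_x, lengthscale=0.5, noise_std=0.25)
print(var)
```

With the hyperparameters held fixed, the variance is smallest inside the training range and returns to the prior variance far away from it, whatever values the targets take.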

The marginal likelihood compartmentalizes into model fit and model complexity (log determinant) terms. The marginal likelihood tends to select for hyperparameters that provide the simplest fits that are still consistent with the data.

한계 우도는 모델 적합성과 모델 복잡성(로그 결정 요인) 항으로 구분됩니다. 한계 우도는 데이터와 여전히 일치하는 가장 단순한 적합치를 제공하는 초매개변수를 선택하는 경향이 있습니다.
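To make the split into model fit and model complexity terms concrete, the sketch below evaluates both terms separately for a few length-scales. It assumes a zero mean function, a unit-amplitude RBF kernel, and noise-free toy targets chosen only for illustration; `rbf_kernel` and `mll_terms` are hypothetical helper names.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def mll_terms(x, y, lengthscale, noise_std):
    """Return the (fit, complexity, constant) terms of the log marginal likelihood."""
    n = len(x)
    A = rbf_kernel(x, x, lengthscale) + noise_std ** 2 * np.eye(n)
    fit = -0.5 * y @ np.linalg.solve(A, y)  # data-fit term
    _, logdet = np.linalg.slogdet(A)        # stable log determinant
    complexity = -0.5 * logdet              # complexity (log determinant) term
    const = -0.5 * n * np.log(2 * np.pi)
    return fit, complexity, const

x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.5 * np.sin(4 * x)  # toy targets
terms = {ls: mll_terms(x, y, ls, noise_std=0.25) for ls in (0.05, 0.3, 2.0)}
for ls, (fit, complexity, const) in terms.items():
    print(f"ls={ls}: fit={fit:.2f}, complexity={complexity:.2f}, "
          f"total={fit + complexity + const:.2f}")
```

Very short length-scales describe highly flexible functions and are penalized by the complexity term, while very long length-scales are penalized less there but have to pay in the fit term when the data vary quickly. The marginal likelihood balances the two.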

The key computational bottlenecks come from solving a linear system and computing a log determinant over an n×n symmetric positive definite matrix K(X,X) for n training points. Naively, these operations each incur O(n**3) computations, typically starting with a Cholesky decomposition, and storing the entries of the kernel (covariance) matrix requires O(n**2) memory. Historically, these bottlenecks have limited GPs to problems with fewer than about 10,000 training points, and have given GPs a reputation for “being slow” that has been inaccurate now for almost a decade. In advanced topics, we will discuss how GPs can be scaled to problems with millions of points.

For popular choices of kernel functions, K(X,X) is often close to singular, which can cause numerical issues when performing Cholesky decompositions or other operations intended to solve linear systems. Fortunately, in regression we are often working with Kθ(X,X)+σ**2 I, such that the noise variance σ**2 gets added to the diagonal of K(X,X), significantly improving its conditioning. If the noise variance is small, or we are doing noise free regression, it is common practice to add a small amount of “jitter” to the diagonal, on the order of 10**−6, to improve conditioning.

널리 사용되는 커널 함수 선택의 경우 K(X,X)는 종종 특이값에 가깝습니다. 이는 Cholesky 분해 또는 선형 시스템을 풀기 위한 기타 연산을 수행할 때 수치 문제를 일으킬 수 있습니다. 다행스럽게도 회귀 분석에서는 종종 Kθ(X,X)+σ**2 I로 작업하여 잡음 분산 σ**2가 K(X,X)의 대각선에 추가되어 조건이 크게 향상됩니다. 노이즈 분산이 작거나 노이즈 없는 회귀를 수행하는 경우 컨디셔닝을 개선하기 위해 대각선에 10**−6 정도의 소량의 "지터"를 추가하는 것이 일반적입니다.
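The effect of jitter on conditioning can be checked numerically. The sketch below assumes a unit-amplitude RBF kernel on 50 closely spaced inputs and a jitter of 1e-6; `rbf_kernel` is a hypothetical helper written for this illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

x = np.linspace(0, 5, 50)
K = rbf_kernel(x, x, lengthscale=1.0)  # smooth kernel, closely spaced inputs

jitter = 1e-6
K_jittered = K + jitter * np.eye(len(x))  # add jitter to the diagonal

cond_raw = np.linalg.cond(K)              # condition number before jitter
cond_jittered = np.linalg.cond(K_jittered)  # condition number after jitter
print(f"{cond_raw:.3e} -> {cond_jittered:.3e}")

# With jitter on the diagonal the Cholesky factorization goes through.
L = np.linalg.cholesky(K_jittered)
```

The jitter bounds the smallest eigenvalue away from zero, so the condition number drops by many orders of magnitude while the matrix entries change by at most 1e-6.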

18.3.4. Worked Example from Scratch

Let’s create some regression data, and then fit the data with a GP, implementing every step from scratch. We’ll sample data from

$$y(x) = \sin(x) + \frac{1}{2}\sin(4x) + \epsilon,$$

with ϵ∼N(0,σ**2). The noise free function we wish to find is f(x)=sin(x)+1/2 sin(4x). We’ll start by using a noise standard deviation σ=0.25.

def data_maker1(x, sig):
    return np.sin(x) + 0.5 * np.sin(4 * x) + np.random.randn(x.shape[0]) * sig

sig = 0.25
train_x, test_x = np.linspace(0, 5, 50), np.linspace(0, 5, 500)
train_y, test_y = data_maker1(train_x, sig=sig), data_maker1(test_x, sig=0.)

d2l.plt.scatter(train_x, train_y)
d2l.plt.plot(test_x, test_y)
d2l.plt.xlabel("x", fontsize=20)
d2l.plt.ylabel("Observations y", fontsize=20)
d2l.plt.show()

The code above generates the data and visualizes it. Each part does the following:

def data_maker1(x, sig): defines a function named data_maker1 that takes two arguments. x is the input data and sig is the standard deviation of the noise.
return np.sin(x) + 0.5 * np.sin(4 * x) + np.random.randn(x.shape[0]) * sig: generates the observations for the inputs x. Each observation is the sum of a sine function and half of a sine function with four times the frequency, plus noise drawn from a normal distribution with mean 0 and standard deviation sig.
sig = 0.25: sets the noise standard deviation used to generate the training data to 0.25.
train_x, test_x = np.linspace(0, 5, 50), np.linspace(0, 5, 500): creates the x values for the training and test data. np.linspace produces 50 and 500 evenly spaced points between 0 and 5, assigned to train_x and test_x respectively.
train_y, test_y = data_maker1(train_x, sig=sig), data_maker1(test_x, sig=0.): uses data_maker1 to generate the observations train_y and test_y. The training data use the noise level sig, while the test data are generated without noise.
d2l.plt.scatter(train_x, train_y): shows the training data as a scatter plot.
d2l.plt.plot(test_x, test_y): shows the noise-free test data as a line plot.
d2l.plt.xlabel("x", fontsize=20) and d2l.plt.ylabel("Observations y", fontsize=20): label the x and y axes and set the font size.
d2l.plt.show(): displays the figure.

In short, the code creates synthetic data from sine functions with added noise and plots the training and test data so that their distribution can be inspected.

Here we see the noisy observations as circles, and the noise-free function in blue that we wish to find.

여기서는 잡음이 있는 관측값을 원으로 표시하고, 잡음이 없는 함수는 파란색으로 표시합니다.

Now, let’s specify a GP prior over the latent noise-free function, f(x)∼GP(m,k). We’ll use a mean function m(x)=0, and an RBF covariance function (kernel)

$$k(x_i, x_j) = \exp\left(-\frac{1}{2\ell^2}\|x_i - x_j\|^2\right).$$

mean = np.zeros(test_x.shape[0])
cov = d2l.rbfkernel(test_x, test_x, ls=0.2)

The code above computes the mean vector and covariance matrix that define the distribution of the Gaussian process prior at the test inputs. Each part does the following:

mean = np.zeros(test_x.shape[0]): creates the variable mean as a zero vector whose length equals the number of test points, test_x.shape[0]. This vector is the mean of the GP prior.
cov = d2l.rbfkernel(test_x, test_x, ls=0.2): uses d2l.rbfkernel to compute the covariance matrix cov between the test points with an RBF (radial basis function) kernel. The ls argument is the length-scale of the kernel. The kernel describes how strongly the function values at different inputs are correlated.

As a result, mean holds the prior mean at the test points and cov holds the prior covariance between them. These are used to build the Gaussian process model and make predictions.

We have started with a length-scale of 0.2. Before we fit the data, it is important to consider whether we have specified a reasonable prior. Let’s visualize some sample functions from this prior, as well as the 95% credible set (we believe there’s a 95% chance that the true function is within this region).

우리는 길이 척도 0.2로 시작했습니다. 데이터를 피팅하기 전에 합리적인 사전 설정을 지정했는지 고려하는 것이 중요합니다. 이전의 일부 샘플 함수와 95% 신뢰할 수 있는 집합을 시각화해 보겠습니다(우리는 실제 함수가 이 영역 내에 있을 확률이 95%라고 믿습니다).

prior_samples = np.random.multivariate_normal(mean=mean, cov=cov, size=5)
d2l.plt.plot(test_x, prior_samples.T, color='black', alpha=0.5)
d2l.plt.plot(test_x, mean, linewidth=2.)
# The 95% band is the mean +/- 2 standard deviations, so take the square
# root of the variances on the diagonal of the covariance matrix.
prior_std = np.sqrt(np.diag(cov))
d2l.plt.fill_between(test_x, mean - 2 * prior_std, mean + 2 * prior_std,
                 alpha=0.25)
d2l.plt.show()

The code above visualizes the Gaussian process prior. Each part does the following:

prior_samples = np.random.multivariate_normal(mean=mean, cov=cov, size=5): draws random samples from the GP prior with np.random.multivariate_normal. The samples are generated from the mean vector and the cov covariance matrix, and size=5 produces five of them.
d2l.plt.plot(test_x, prior_samples.T, color='black', alpha=0.5): plots each prior sample as a line against test_x, in black with transparency alpha=0.5.
d2l.plt.plot(test_x, mean, linewidth=2.): plots the prior mean against test_x with line width 2.
prior_std = np.sqrt(np.diag(cov)): takes the square root of the diagonal of cov to obtain the prior standard deviation at each test point. Because the diagonal of this kernel matrix is 1, the standard deviation and the variance coincide here, but the credible band is defined in terms of the standard deviation.
d2l.plt.fill_between(test_x, mean - 2 * prior_std, mean + 2 * prior_std, alpha=0.25): shades the region between the mean minus and plus two standard deviations. This region is the 95% credible set, drawn with transparency alpha=0.25.
d2l.plt.show(): displays the figure.

The plot shows what the prior believes before seeing any data: the sample functions, the mean, and the credible set together indicate which kinds of functions the model considers plausible and how uncertain it is.

Do these samples look reasonable? Are the high-level properties of the functions aligned with the type of data we are trying to model?

이 샘플이 합리적으로 보입니까? 함수의 상위 수준 속성이 우리가 모델링하려는 데이터 유형과 일치합니까?

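Now let’s form the mean and variance of the posterior predictive distribution at any arbitrary test point x∗, where x denotes the training inputs:

$$\bar{f}_* = K(x, x_*)^{\top}(K(x,x) + \sigma^2 I)^{-1}\mathbf{y}$$

$$V_* = K(x_*, x_*) - K(x, x_*)^{\top}(K(x,x)+\sigma^2 I)^{-1}K(x, x_*)$$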

Before we make predictions, we should learn our kernel hyperparameters θ and noise variance σ**2. In order to learn these parameters, we will maximize the marginal likelihood with respect to them.

Our prior functions looked too quickly varying compared to the data we are fitting, so let’s guess a larger length-scale of 0.4. We’ll also guess a noise standard deviation of 0.5. These are simply hyperparameter initializations; we will learn these parameters from the marginal likelihood.

ell_est = 0.4
post_sig_est = 0.5

def neg_MLL(pars):
    K = d2l.rbfkernel(train_x, train_x, ls=pars[0])
    kernel_term = -0.5 * train_y @ \
        np.linalg.inv(K + pars[1] ** 2 * np.eye(train_x.shape[0])) @ train_y
    logdet = -0.5 * np.log(np.linalg.det(K + pars[1] ** 2 * \
                                         np.eye(train_x.shape[0])))
    const = -train_x.shape[0] / 2. * np.log(2 * np.pi)

    return -(kernel_term + logdet + const)


learned_hypers = optimize.minimize(neg_MLL, x0=np.array([ell_est,post_sig_est]),
                                   bounds=((0.01, 10.), (0.01, 10.)))
ell = learned_hypers.x[0]
post_sig_est = learned_hypers.x[1]

The code above optimizes the hyperparameters of the Gaussian process model (the length-scale and the noise level). Each part does the following:

ell_est = 0.4 and post_sig_est = 0.5: set the initial guesses. ell_est is the length-scale and post_sig_est is the noise standard deviation.
def neg_MLL(pars): defines the function neg_MLL, which computes the negative log marginal likelihood of the GP model. It takes the hyperparameters pars as input.
K = d2l.rbfkernel(train_x, train_x, ls=pars[0]): computes the RBF kernel matrix K between the training inputs train_x, using pars[0] as the length-scale.
kernel_term = -0.5 * train_y @ np.linalg.inv(K + pars[1] ** 2 * np.eye(train_x.shape[0])) @ train_y: computes the data-fit term from the observations train_y, the kernel matrix K, and the noise variance pars[1] ** 2.
logdet = -0.5 * np.log(np.linalg.det(K + pars[1] ** 2 * np.eye(train_x.shape[0]))): computes the log determinant term, which acts as the model complexity penalty. It also depends on K and the noise variance.
const = -train_x.shape[0] / 2. * np.log(2 * np.pi): computes the normalization constant of the Gaussian density. It depends only on the number of data points, not on the hyperparameters.
return -(kernel_term + logdet + const): returns the negative log marginal likelihood, the objective to be minimized over the hyperparameters.
learned_hypers = optimize.minimize(neg_MLL, x0=np.array([ell_est,post_sig_est]), bounds=((0.01, 10.), (0.01, 10.))): minimizes neg_MLL with optimize.minimize to learn the hyperparameters, starting from ell_est and post_sig_est. The bounds argument restricts the search range of each hyperparameter.
ell = learned_hypers.x[0] and post_sig_est = learned_hypers.x[1]: extract the optimized values. The learned length-scale is stored in ell and the learned noise standard deviation in post_sig_est.

Optimizing the hyperparameters in this way adapts the model to the data. The learned values control both the complexity of the model and the accuracy of its predictions.

In this instance, we learn a length-scale of 0.299, and a noise standard deviation of 0.24. Note that the learned noise is extremely close to the true noise, which helps indicate that our GP is very well-specified for this problem.

In general, it is crucial to put careful thought into selecting the kernel and initializing the hyperparameters. While marginal likelihood optimization can be relatively robust to initialization, it is not immune to poor initializations. Try running the above script with a variety of initializations and see what results you find.

Now, let’s make predictions with these learned hypers.

이제 이러한 학습된 하이퍼를 사용하여 예측을 해보겠습니다.

K_x_xstar = d2l.rbfkernel(train_x, test_x, ls=ell)
K_x_x = d2l.rbfkernel(train_x, train_x, ls=ell)
K_xstar_xstar = d2l.rbfkernel(test_x, test_x, ls=ell)

post_mean = K_x_xstar.T @ np.linalg.inv((K_x_x + \
                post_sig_est ** 2 * np.eye(train_x.shape[0]))) @ train_y
post_cov = K_xstar_xstar - K_x_xstar.T @ np.linalg.inv((K_x_x + \
                post_sig_est ** 2 * np.eye(train_x.shape[0]))) @ K_x_xstar

lw_bd = post_mean - 2 * np.sqrt(np.diag(post_cov))
up_bd = post_mean + 2 * np.sqrt(np.diag(post_cov))

d2l.plt.scatter(train_x, train_y)
d2l.plt.plot(test_x, test_y, linewidth=2.)
d2l.plt.plot(test_x, post_mean, linewidth=2.)
d2l.plt.fill_between(test_x, lw_bd, up_bd, alpha=0.25)
d2l.plt.legend(['Observed Data', 'True Function', 'Predictive Mean', '95% Set on True Func'])
d2l.plt.show()

The code above makes predictions with the Gaussian process model and visualizes the result. Each part does the following:

K_x_xstar = d2l.rbfkernel(train_x, test_x, ls=ell): computes the kernel matrix K_x_xstar between the training inputs train_x and the test inputs test_x, which describes the correlation between training and test points.
K_x_x = d2l.rbfkernel(train_x, train_x, ls=ell): computes the kernel matrix K_x_x between the training inputs, which describes the correlation among the training points.
K_xstar_xstar = d2l.rbfkernel(test_x, test_x, ls=ell): computes the kernel matrix K_xstar_xstar between the test inputs, which describes the correlation among the test points.
post_mean = K_x_xstar.T @ np.linalg.inv((K_x_x + post_sig_est ** 2 * np.eye(train_x.shape[0]))) @ train_y: computes the predictive mean from the correlation between test and training points and the training observations train_y.
post_cov = K_xstar_xstar - K_x_xstar.T @ np.linalg.inv((K_x_x + post_sig_est ** 2 * np.eye(train_x.shape[0]))) @ K_x_xstar: computes the predictive covariance, which starts from the prior covariance between the test points and subtracts the information gained from the training data. It represents the uncertainty of the model.
lw_bd = post_mean - 2 * np.sqrt(np.diag(post_cov)) and up_bd = post_mean + 2 * np.sqrt(np.diag(post_cov)): compute the 95% credible set from the predictive covariance. lw_bd is its lower bound and up_bd is its upper bound.
d2l.plt.scatter(train_x, train_y): shows the training data as a scatter plot.
d2l.plt.plot(test_x, test_y, linewidth=2.): draws the true function at the test inputs.
d2l.plt.plot(test_x, post_mean, linewidth=2.): draws the predictive mean.
d2l.plt.fill_between(test_x, lw_bd, up_bd, alpha=0.25): shades the region between lw_bd and up_bd, the 95% credible set.
d2l.plt.legend(['Observed Data', 'True Function', 'Predictive Mean', '95% Set on True Func']): adds a legend to the figure.
d2l.plt.show(): displays the figure.

Together, the code computes the predictive mean and credible set of the GP and plots them with the training data and the true function so that the predictions can be assessed.

We see the posterior mean in orange almost perfectly matches the true noise free function! Note that the 95% credible set we are showing is for the latent noise free (true) function, and not the data points. We see that this credible set entirely contains the true function, and does not seem overly wide or narrow. We would not want nor expect it to contain the data points. If we wish to have a credible set for the observations, we should compute

주황색의 사후 평균이 실제 노이즈 없는 기능과 거의 완벽하게 일치하는 것을 볼 수 있습니다! 우리가 보여주고 있는 95% 신뢰할 수 있는 세트는 데이터 포인트가 아닌 잠재 잡음 없는(true) 기능에 대한 것입니다. 우리는 이 신뢰할 수 있는 집합이 진정한 기능을 완전히 포함하고 있으며 지나치게 넓거나 좁아 보이지 않는다는 것을 알 수 있습니다. 우리는 데이터 포인트가 포함되는 것을 원하지도 기대하지도 않습니다. 관측값에 대해 신뢰할 수 있는 세트를 갖고 싶다면 다음을 계산해야 합니다.

lw_bd_observed = post_mean - 2 * np.sqrt(np.diag(post_cov) + post_sig_est ** 2)
up_bd_observed = post_mean + 2 * np.sqrt(np.diag(post_cov) + post_sig_est ** 2)

The code above computes the 95% credible set for the observed data points rather than for the latent function. Each part does the following:

lw_bd_observed = post_mean - 2 * np.sqrt(np.diag(post_cov) + post_sig_est ** 2): computes the lower bound. The noise variance post_sig_est ** 2 is added to the diagonal of the predictive covariance post_cov, and two times the square root of that sum is subtracted from the predictive mean post_mean.
up_bd_observed = post_mean + 2 * np.sqrt(np.diag(post_cov) + post_sig_est ** 2): computes the upper bound in the same way, adding two standard deviations to the predictive mean instead of subtracting them.

These bounds describe the credible set for noisy observations. The band reflects both the uncertainty about the function and the observation noise, so about 95% of the observed data points are expected to fall inside it.

There are two sources of uncertainty, epistemic uncertainty, representing reducible uncertainty, and aleatoric or irreducible uncertainty. The epistemic uncertainty here represents uncertainty about the true values of the noise free function. This uncertainty should grow as we move away from the data points, as away from the data there are a greater variety of function values consistent with our data. As we observe more and more data, our beliefs about the true function become more confident, and the epistemic uncertainty disappears. The aleatoric uncertainty in this instance is the observation noise, since the data are given to us with this noise, and it cannot be reduced.

불확실성에는 두 가지 원인이 있는데, 환원 가능한 불확실성을 나타내는 인식론적 불확실성과 우발적 또는 환원 불가능한 불확실성이 있습니다. 여기서 인식론적 불확실성은 잡음 없는 함수의 실제 값에 대한 불확실성을 나타냅니다. 이러한 불확실성은 데이터 포인트에서 멀어질수록 커집니다. 데이터에서 멀어지면 데이터와 일치하는 더 다양한 함수 값이 있기 때문입니다. 점점 더 많은 데이터를 관찰할수록 실제 함수에 대한 우리의 믿음은 더욱 확신을 갖게 되고 인식론적 불확실성은 사라집니다. 이 경우의 우연적 불확실성은 관찰 잡음입니다. 왜냐하면 데이터가 이 잡음과 함께 우리에게 제공되고 이를 줄일 수 없기 때문입니다.

The epistemic uncertainty in the data is captured by variance of the latent noise free function np.diag(post_cov). The aleatoric uncertainty is captured by the noise variance post_sig_est**2.

데이터의 인식론적 불확실성은 잠재 잡음 없는 함수 np.diag(post_cov)의 분산으로 포착됩니다. 우연적 불확실성은 post_sig_est**2 잡음 분산으로 포착됩니다.
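The distinction between the two sources of uncertainty can be illustrated numerically: with fixed hyperparameters, the variance of the latent function shrinks as training points are added, while the noise variance stays the same. This is a minimal sketch assuming a unit-amplitude RBF kernel with length-scale 0.3 and noise standard deviation 0.25; `rbf_kernel` and `posterior_variance` are hypothetical helper names.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Illustrative unit-amplitude RBF kernel for 1-D inputs (an assumption)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

def posterior_variance(train_x, test_x, lengthscale, noise_std):
    """Variance of the latent function at each test input (epistemic part)."""
    A = rbf_kernel(train_x, train_x, lengthscale) \
        + noise_std ** 2 * np.eye(len(train_x))
    K_sx = rbf_kernel(test_x, train_x, lengthscale)
    return 1.0 - np.sum(K_sx * np.linalg.solve(A, K_sx.T).T, axis=1)

noise_std = 0.25
test_x = np.linspace(0, 5, 100)

# Average epistemic variance for increasingly dense training sets on [0, 5].
epistemic = {
    n: posterior_variance(np.linspace(0, 5, n), test_x, 0.3, noise_std).mean()
    for n in (10, 50, 200)
}
aleatoric = noise_std ** 2  # observation noise variance, unaffected by n

for n, v in epistemic.items():
    print(f"n={n}: epistemic={v:.4f}, aleatoric={aleatoric:.4f}")
```

The variance of a new observation is the sum of the two terms, which is why the credible set for observations adds post_sig_est**2 to np.diag(post_cov).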

Unfortunately, people are often careless about how they represent uncertainty, with many papers showing error bars that are completely undefined, no clear sense of whether we are visualizing epistemic or aleatoric uncertainty or both, and confusing noise variances with noise standard deviations, standard deviations with standard errors, confidence intervals with credible sets, and so on. Without being precise about what the uncertainty represents, it is essentially meaningless.

In the spirit of paying close attention to what our uncertainty represents, it is crucial to note that we are taking two times the square root of our variance estimate for the noise free function. Since our predictive distribution is Gaussian, this quantity enables us to form a 95% credible set, representing our beliefs about the interval which is 95% likely to contain the ground truth function. The noise variance is living on a completely different scale, and is much less interpretable.
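The link between "two standard deviations" and "95%" follows from the Gaussian cumulative distribution function and can be checked with the standard library alone. The helper name `gaussian_coverage` is hypothetical.

```python
import math

def gaussian_coverage(num_std):
    """Probability that a Gaussian variable lies within num_std standard
    deviations of its mean: P(|z| <= k) = erf(k / sqrt(2))."""
    return math.erf(num_std / math.sqrt(2.0))

coverage_2sd = gaussian_coverage(2.0)      # mass within +/- 2 standard deviations
coverage_196sd = gaussian_coverage(1.96)   # mass within +/- 1.96 standard deviations
print(f"{coverage_2sd:.4f} {coverage_196sd:.4f}")
```

Two standard deviations cover slightly more than 95% of the probability mass; exactly 95% corresponds to about 1.96 standard deviations, so the factor of 2 is a convenient rounding.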

Finally, let’s take a look at 20 posterior samples. These samples tell us what types of functions we believe might fit our data, a posteriori.

post_samples = np.random.multivariate_normal(post_mean, post_cov, size=20)
d2l.plt.scatter(train_x, train_y)
d2l.plt.plot(test_x, test_y, linewidth=2.)
d2l.plt.plot(test_x, post_mean, linewidth=2.)
d2l.plt.plot(test_x, post_samples.T, color='gray', alpha=0.25)
d2l.plt.fill_between(test_x, lw_bd, up_bd, alpha=0.25)
plt.legend(['Observed Data', 'True Function', 'Predictive Mean', 'Posterior Samples'])
d2l.plt.show()

The code above visualizes the predictions of the Gaussian process model together with samples from the posterior. Each part does the following:

post_samples = np.random.multivariate_normal(post_mean, post_cov, size=20): draws 20 posterior samples based on post_mean and post_cov. The samples are random draws from the predictive distribution of the GP and reflect the uncertainty of the model.
d2l.plt.scatter(train_x, train_y): shows the training data as a scatter plot.
d2l.plt.plot(test_x, test_y, linewidth=2.): draws the true function at the test inputs.
d2l.plt.plot(test_x, post_mean, linewidth=2.): draws the predictive mean.
d2l.plt.plot(test_x, post_samples.T, color='gray', alpha=0.25): draws the posterior samples in gray with transparency alpha=0.25, which visualizes the uncertainty of the model.
d2l.plt.fill_between(test_x, lw_bd, up_bd, alpha=0.25): shades the region between lw_bd and up_bd, the 95% credible set.
plt.legend(['Observed Data', 'True Function', 'Predictive Mean', 'Posterior Samples']): adds a legend with entries for the observed data, the true function, the predictive mean, and the posterior samples.
d2l.plt.show(): displays the figure.

The figure shows the training data, the true function, the predictive mean, the posterior samples, and the credible set together, which helps in assessing both the predictions and their uncertainty.

In basic regression applications, it is most common to use the posterior predictive mean and standard deviation as a point predictor and metric for uncertainty, respectively. In more advanced applications, such as Bayesian optimization with Monte Carlo acquisition functions, or Gaussian processes for model-based RL, it is often necessary to take posterior samples. However, even if not strictly required in the basic applications, these samples give us more intuition about the fit we have for the data, and are often useful to include in visualizations.
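As a rough illustration of the three quantities mentioned here (predictive mean, predictive standard deviation, and posterior samples), the following sketch computes them with NumPy only. The toy data, the fixed hyperparameters, and the helper rbf_kernel are assumptions made for this example; they are not the dataset or the learned values used in this section.

```python
import numpy as np

def rbf_kernel(x1, x2, ls=0.5, amp=1.0):
    """RBF kernel matrix between two 1-D input arrays (illustrative helper)."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return amp ** 2 * np.exp(-0.5 * sq_dist / ls ** 2)

rng = np.random.default_rng(0)

# Hypothetical toy data: noisy observations of sin(x)
train_x = np.linspace(0, 5, 20)
train_y = np.sin(train_x) + 0.1 * rng.standard_normal(train_x.shape)
test_x = np.linspace(0, 5, 100)
noise_var = 0.1 ** 2  # assumed known here, not learned

# Blocks of the joint Gaussian over training and test function values
K = rbf_kernel(train_x, train_x) + noise_var * np.eye(len(train_x))
K_s = rbf_kernel(test_x, train_x)
K_ss = rbf_kernel(test_x, test_x)

# Posterior over the latent function at the test inputs
post_mean = K_s @ np.linalg.solve(K, train_y)
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)

# Point predictor and uncertainty metric
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))

# Posterior samples; a small jitter keeps the covariance numerically PSD
jitter = 1e-6 * np.eye(len(test_x))
post_samples = rng.multivariate_normal(post_mean, post_cov + jitter, size=20)
```

Each row of post_samples is one plausible function evaluated at test_x, whereas post_mean and post_std summarize the same distribution pointwise.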

18.3.5. Making Life Easy with GPyTorch

As we have seen, it is actually pretty easy to implement basic Gaussian process regression entirely from scratch. However, as soon as we want to explore a variety of kernel choices, consider approximate inference (which is needed even for classification), combine GPs with neural networks, or even have a dataset larger than about 10,000 points, then an implementation from scratch becomes unwieldy and cumbersome. Some of the most effective methods for scalable GP inference, such as SKI (also known as KISS-GP), can require hundreds of lines of code implementing advanced numerical linear algebra routines.

우리가 본 것처럼 기본 가우스 프로세스 회귀를 처음부터 완전히 구현하는 것은 실제로 매우 쉽습니다. 그러나 다양한 커널 선택을 탐색하고, 대략적인 추론(분류에도 필요함)을 고려하고, GP를 신경망과 결합하거나, 심지어 약 10,000포인트보다 큰 데이터 세트를 갖고자 하는 경우, 처음부터 다시 구현해야 합니다. 다루기 힘들고 번거롭다. SKI(KISS-GP라고도 함)와 같은 확장 가능한 GP 추론을 위한 가장 효과적인 방법 중 일부에는 고급 수치 선형 대수 루틴을 구현하는 수백 줄의 코드가 필요할 수 있습니다.

In these cases, the GPyTorch library will make our lives a lot easier. We’ll be discussing GPyTorch more in future notebooks on Gaussian process numerics, and advanced methods. The GPyTorch library contains many examples. To get a feel for the package, we will walk through the simple regression example, showing how it can be adapted to reproduce our above results using GPyTorch. This may seem like a lot of code to simply reproduce the basic regression above, and in a sense, it is. But we can immediately use a variety of kernels, scalable inference techniques, and approximate inference, by only changing a few lines of code from below, instead of writing potentially thousands of lines of new code.

이러한 경우 GPyTorch 라이브러리는 우리 삶을 훨씬 쉽게 만들어 줄 것입니다. 우리는 가우스 프로세스 수치 및 고급 방법에 대한 향후 노트북에서 GPyTorch에 대해 더 많이 논의할 것입니다. GPyTorch 라이브러리에는 많은 예제가 포함되어 있습니다. 패키지에 대한 느낌을 얻기 위해 간단한 회귀 예제를 살펴보고 GPyTorch를 사용하여 위 결과를 재현하도록 어떻게 적용할 수 있는지 보여드리겠습니다. 이는 위의 기본 회귀를 간단히 재현하기에는 많은 코드처럼 보일 수 있으며 어떤 의미에서는 그렇습니다. 그러나 잠재적으로 수천 줄의 새로운 코드를 작성하는 대신 아래에서 몇 줄의 코드만 변경하면 다양한 커널, 확장 가능한 추론 기술 및 대략적인 추론을 즉시 사용할 수 있습니다.

# First let's convert our data into tensors for use with PyTorch
train_x = torch.tensor(train_x)
train_y = torch.tensor(train_y)
test_y = torch.tensor(test_y)

# We are using exact GP inference with a zero mean and RBF kernel
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

The code above defines an exact Gaussian process (GP) model with PyTorch and GPyTorch. Each part does the following:

Converting the data to PyTorch tensors:
- train_x = torch.tensor(train_x): converts the training inputs train_x to a PyTorch tensor.
- train_y = torch.tensor(train_y): converts the training targets train_y to a PyTorch tensor.
- test_y = torch.tensor(test_y): converts the test targets test_y to a PyTorch tensor. (Note that test_x is not converted here; it is converted later, just before prediction.)
Defining the exact GP model:
- class ExactGPModel(gpytorch.models.ExactGP):: defines a new GP model class, ExactGPModel, that inherits from ExactGP. This class performs exact GP inference with GPyTorch.
- def __init__(self, train_x, train_y, likelihood):: the constructor takes the training inputs train_x, the training targets train_y, and a likelihood.
- super(ExactGPModel, self).__init__(train_x, train_y, likelihood): calls the constructor of the parent class ExactGP to initialize the model.
- self.mean_module = gpytorch.means.ZeroMean(): sets the mean function of the GP to zero.
- self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()): sets the covariance function to an RBF kernel wrapped in a ScaleKernel. The RBF kernel models the correlation between function values at different inputs, and the ScaleKernel multiplies it by a learnable output scale.
The forward method:
- def forward(self, x):: takes inputs x, evaluates the mean and covariance functions at x, and returns them as a MultivariateNormal distribution over the function values at x.

In short, this code defines an exact GP model and sets its mean and covariance functions. The model can then be used for Gaussian process inference.

This code block puts the data in the right format for GPyTorch, and specifies that we are using exact inference, as well as the mean function (zero) and kernel function (RBF) that we want to use. We can use any other kernel very easily, for instance gpytorch.kernels.MaternKernel or gpytorch.kernels.SpectralMixtureKernel. So far, we have only discussed exact inference, where it is possible to infer a predictive distribution without making any approximations. For Gaussian processes, we can only perform exact inference when we have a Gaussian likelihood; more specifically, when we assume that our observations are generated as a noise-free function represented by a Gaussian process, plus Gaussian noise. In future notebooks, we will consider other settings, such as classification, where we cannot make these assumptions.

# Initialize Gaussian likelihood
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
training_iter = 50
# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer, includes GaussianLikelihood parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# Set our loss as the negative log GP marginal likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

The code above sets up training for the Gaussian process model and prepares the optimization of its hyperparameters. Each part does the following:

Initializing the Gaussian likelihood:
- likelihood = gpytorch.likelihoods.GaussianLikelihood(): creates the Gaussian likelihood object. It defines the likelihood of the GP model and adds the observation noise level to the model.
Initializing the exact GP model and the number of training iterations:
- model = ExactGPModel(train_x, train_y, likelihood): creates the GP model from the ExactGPModel class defined earlier, passing in the training data train_x and train_y and the Gaussian likelihood created above.
- training_iter = 50: sets the number of training iterations to 50.
Putting the model and likelihood in training mode:
- model.train(): puts the GP model in training mode, the mode used while its hyperparameters are being optimized.
- likelihood.train(): puts the Gaussian likelihood in training mode as well.
Setting up the optimizer:
- optimizer = torch.optim.Adam(model.parameters(), lr=0.1): creates the Adam optimizer. model.parameters() hands it the parameters to optimize (as the code comment notes, these include the parameters of the Gaussian likelihood), and the learning rate is set to lr=0.1.
Setting up the objective:
- mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model): creates the exact GP marginal log likelihood. Training maximizes this quantity, so the loss minimized in the training loop is its negative.

With this setup in place, the training loop can optimize the hyperparameters of the GP model.

Here, we explicitly specify the likelihood we want to use (Gaussian), the objective we will use for training kernel hyperparameters (here, the marginal likelihood), and the procedure we want to use for optimizing that objective (in this case, Adam). We note that while we are using Adam, which is a “stochastic” optimizer, in this case, it is full-batch Adam. Because the marginal likelihood does not factorize over data instances, we cannot use an optimizer over “mini-batches” of data and be guaranteed convergence. Other optimizers, such as L-BFGS, are also supported by GPyTorch. Unlike in standard deep learning, doing a good job of optimizing the marginal likelihood corresponds strongly with good generalization, which often inclines us towards powerful optimizers like L-BFGS, assuming they are not prohibitively expensive.
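To make the remark about factorization concrete, here is a minimal NumPy sketch of the negative log marginal likelihood for a zero-mean GP with an RBF kernel. The function name, the toy data, and the hyperparameter values are assumptions for illustration only. The sketch compares the objective on the full dataset with the sum of the objectives on two halves treated as independent; because the kernel couples all points, the two numbers differ.

```python
import numpy as np

def neg_log_marginal_likelihood(x, y, ls, noise_std, amp=1.0):
    """Negative log marginal likelihood of a zero-mean GP with RBF kernel."""
    n = len(x)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K = amp ** 2 * np.exp(-0.5 * sq_dist / ls ** 2) + noise_std ** 2 * np.eye(n)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    data_fit = 0.5 * y @ alpha
    complexity = np.sum(np.log(np.diag(L)))         # equals 0.5 * log det K
    return data_fit + complexity + 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 30)                           # hypothetical toy inputs
y = np.sin(x) + 0.2 * rng.standard_normal(30)       # hypothetical noisy targets

# Objective on the full batch
full = neg_log_marginal_likelihood(x, y, ls=1.0, noise_std=0.2)

# Sum of the objectives on two "mini-batches" treated as independent
halves = (neg_log_marginal_likelihood(x[:15], y[:15], ls=1.0, noise_std=0.2)
          + neg_log_marginal_likelihood(x[15:], y[15:], ls=1.0, noise_std=0.2))

print(f"full batch: {full:.3f}, sum over halves: {halves:.3f}")
```

In a model whose likelihood factorizes over data points, the two printed values would coincide; here the off-diagonal kernel blocks between the halves make them differ, which is why mini-batch gradients are not unbiased estimates of the full gradient.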

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    if i % 10 == 0:
        print(f'Iter {i+1:d}/{training_iter:d} - Loss: {loss.item():.3f} '
              f'squared lengthscale: '
              f'{model.covar_module.base_kernel.lengthscale.item():.3f} '
              f'noise variance: {model.likelihood.noise.item():.3f}')
    optimizer.step()

The code above implements the training loop for the GP model. Each part does the following:

for i in range(training_iter):: repeats the training step training_iter times.
optimizer.zero_grad(): clears the gradients left over from the previous iteration so that fresh gradients can be computed.
output = model(train_x): evaluates the GP model at the training inputs train_x and returns the resulting distribution.
loss = -mll(output, train_y): computes the loss as the negative marginal log likelihood of the training targets train_y under the model output. This is the quantity that the optimizer minimizes.
loss.backward(): backpropagates the loss to compute its gradients with respect to the model's parameters. This call does not change the parameters; the update happens in optimizer.step().
if i % 10 == 0:: every 10 iterations, prints the current iteration number, the loss, the kernel lengthscale, and the noise variance.
optimizer.step(): uses the gradients computed by loss.backward() to update the parameters so as to reduce the loss.

Repeating these steps optimizes the hyperparameters of the GP model so that it explains the training data better. As training proceeds, the loss generally decreases.

Iter 1/50 - Loss: 1.000 squared lengthscale: 0.693 noise variance: 0.693
Iter 11/50 - Loss: 0.711 squared lengthscale: 0.490 noise variance: 0.312
Iter 21/50 - Loss: 0.451 squared lengthscale: 0.506 noise variance: 0.127
Iter 31/50 - Loss: 0.330 squared lengthscale: 0.485 noise variance: 0.055
Iter 41/50 - Loss: 0.344 squared lengthscale: 0.472 noise variance: 0.038

Here we actually run the optimization procedure, outputting the values of the loss every 10 iterations.

여기에서는 실제로 최적화 절차를 실행하여 10번 반복마다 손실 값을 출력합니다.

# Get into evaluation (predictive posterior) mode
test_x = torch.tensor(test_x)
model.eval()
likelihood.eval()
observed_pred = likelihood(model(test_x))

The code above switches the GP model to evaluation mode and makes predictions at the test inputs. Each part does the following:

test_x = torch.tensor(test_x): converts the test inputs test_x to a PyTorch tensor. These are inputs the model was not trained on and for which it will make predictions.
model.eval(): puts the GP model in evaluation mode. In this mode, calling the model returns the posterior predictive distribution conditioned on the training data. This call does not by itself disable gradient computation; that is what torch.no_grad() is for.
likelihood.eval(): puts the Gaussian likelihood in evaluation mode as well. Here the likelihood is used to add observation noise to the predictive distribution.
observed_pred = likelihood(model(test_x)): makes predictions at test_x. The GP model first returns the predictive distribution of the latent function, and the Gaussian likelihood then turns it into a predictive distribution that accounts for observation noise. The result is stored in observed_pred.

The resulting predictive distribution can be used to evaluate the model or to visualize its predictions.

The above codeblock enables us to make predictions on our test inputs.

위의 코드 블록을 사용하면 테스트 입력에 대해 예측할 수 있습니다.

with torch.no_grad():
    # Initialize plot
    f, ax = d2l.plt.subplots(1, 1, figsize=(4, 3))
    # Get upper and lower bounds for 95% credible set (in this case, in
    # observation space)
    lower, upper = observed_pred.confidence_region()
    # Label each artist directly so the legend does not depend on draw order
    ax.scatter(train_x.numpy(), train_y.numpy(), label='Observed Data')
    ax.plot(test_x.numpy(), test_y.numpy(), linewidth=2.,
            label='True Function')
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), linewidth=2.,
            label='Predictive Mean')
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.25,
                    label='95% Credible Set')
    ax.set_ylim([-1.5, 1.5])
    ax.legend()

The code above plots the predictions of the GP model. Each part does the following:

with torch.no_grad():: opens a torch.no_grad() context, which disables PyTorch's gradient tracking. Inside it the model is used for inference only.
f, ax = d2l.plt.subplots(1, 1, figsize=(4, 3)): creates a figure of size 4x3 and returns the single axis to draw on.
lower, upper = observed_pred.confidence_region(): returns the lower and upper bounds of the 95% credible set of the predictive distribution stored in observed_pred, computed as the mean minus and plus two standard deviations.
ax.scatter(train_x.numpy(), train_y.numpy()): plots the training data (inputs train_x and targets train_y) as a scatter plot.
ax.plot(test_x.numpy(), test_y.numpy(), linewidth=2.): plots the true function evaluated at the test inputs.
ax.plot(test_x.numpy(), observed_pred.mean.numpy(), linewidth=2.): plots the predictive mean of the GP model.
ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.25): shades the 95% credible set between lower and upper. alpha=0.25 sets the transparency of the shaded region.
ax.set_ylim([-1.5, 1.5]): restricts the y-axis to the range from -1.5 to 1.5.
ax.legend(...): adds a legend with the entries 'True Function', 'Predictive Mean', 'Observed Data', and '95% Credible Set'.

The plot shows the model's predictions together with the observed data, which makes it easy to check visually how well the model explains the data.

Finally, we plot the fit.

마지막으로 피팅을 플롯합니다.

We see the fits are virtually identical. A few things to note: GPyTorch is working with squared length-scales and squared observation noise (that is, the noise variance). For example, our learned noise standard deviation in the from-scratch code is about 0.283. The noise variance found by GPyTorch is about 0.08 ≈ 0.283**2. In the GPyTorch plot, we also show the credible set in the observation space rather than the latent function space, to demonstrate that they indeed cover the observed datapoints.
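The difference between a credible set in latent function space and one in observation space can be sketched in a few lines. The noise standard deviation 0.283 is the value quoted above; the latent variances are made-up numbers used only for illustration.

```python
import numpy as np

noise_std = 0.283            # noise standard deviation quoted in the text
noise_var = noise_std ** 2   # the corresponding noise variance

# Hypothetical predictive variances of the latent function at three inputs
latent_var = np.array([0.01, 0.05, 0.20])

# Half-widths of the (approximately) 95% credible sets: two standard deviations
latent_half_width = 2 * np.sqrt(latent_var)                 # latent function space
observed_half_width = 2 * np.sqrt(latent_var + noise_var)   # observation space

print(f"noise variance: {noise_var:.4f}")
print("latent:  ", latent_half_width)
print("observed:", observed_half_width)
```

The observation-space set is always wider because it adds the noise variance to the latent variance, which is why it can cover noisy data points that fall outside the latent-space set.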

18.3.6. Summary

We can combine a Gaussian process prior with data to form a posterior, which we use to make predictions. We can also form a marginal likelihood, which is useful for automatic learning of kernel hyperparameters, which control properties such as the rate of variation of the Gaussian process. The mechanics of forming the posterior and learning kernel hyperparameters for regression are simple, involving about a dozen lines of code. This notebook is a good reference for any reader wanting to quickly get “up and running” with Gaussian processes. We also introduced the GPyTorch library. Although the GPyTorch code for basic regression is relatively long, it can be trivially modified for other kernel functions, or more advanced functionality we will discuss in future notebooks, such as scalable inference, or non-Gaussian likelihoods for classification.

Gaussian process prior 를 데이터와 결합하여 posterior 를 형성하고 이를 예측에 사용할 수 있습니다. 또한 가우스 프로세스의 변동률과 같은 속성을 제어하는 커널 하이퍼파라미터의 자동 학습에 유용한 주변 우도를 형성할 수도 있습니다. 회귀를 위한 사후 및 학습 커널 하이퍼파라미터를 형성하는 메커니즘은 약 12줄의 코드를 포함하여 간단합니다. 이 노트북은 가우스 프로세스를 신속하게 "시작하고 실행"하려는 모든 독자에게 좋은 참고 자료입니다. GPyTorch 라이브러리도 소개했습니다. 기본 회귀를 위한 GPyTorch 코드는 상대적으로 길지만 다른 커널 기능이나 확장 가능한 추론 또는 분류를 위한 비가우시안 가능성과 같은 향후 노트북에서 논의할 고급 기능을 위해 쉽게 수정할 수 있습니다.

18.3.7. Exercises

1. We have emphasized the importance of learning kernel hyperparameters, and the effect of hyperparameters and kernels on the generalization properties of Gaussian processes. Try skipping the step where we learn hypers, and instead guess a variety of length-scales and noise variances, and check their effect on predictions. What happens when you use a large length-scale? A small length-scale? A large noise variance? A small noise variance?

2. We have said that the marginal likelihood is not a convex objective, but that hyperparameters like length-scale and noise variance can be reliably estimated in GP regression. This is generally true; in fact, the marginal likelihood is much better at learning length-scale hyperparameters than conventional approaches in spatial statistics, which involve fitting empirical autocorrelation functions (“covariograms”). Arguably, the biggest contribution from machine learning to Gaussian process research, at least before recent work on scalable inference, was the introduction of the marginal likelihood for hyperparameter learning.

However, different pairings of even these parameters provide interpretably different plausible explanations for many datasets, leading to local optima in our objective. If we use a large length-scale, then we assume the true underlying function is slowly varying. If the observed data are varying significantly, then the only way we can plausibly have a large length-scale is with a large noise variance. If we use a small length-scale, on the other hand, our fit will be very sensitive to the variations in the data, leaving little room to explain variations with noise (aleatoric uncertainty).

Try seeing if you can find these local optima: initialize with a very large length-scale and large noise, and with a small length-scale and small noise. Do you converge to different solutions?

3. We have said that a fundamental advantage of Bayesian methods is in naturally representing epistemic uncertainty. In the above example, we cannot fully see the effects of epistemic uncertainty. Try instead to predict with test_x = np.linspace(0, 10, 1000). What happens to the 95% credible set as your predictions move beyond the data? Does it cover the true function in that interval? What happens if you only visualize aleatoric uncertainty in that region?

4. Try running the above example, but instead with 10,000, 20,000 and 40,000 training points, and measure the runtimes. How does the training time scale? Alternatively, how do the runtimes scale with the number of test points? Is it different for the predictive mean and the predictive variance? Answer this question both by theoretically working out the training and testing time complexities, and by running the code above with a different number of points.

5. Try running the GPyTorch example with different covariance functions, such as the Matern kernel. How do the results change? How about the spectral mixture kernel, found in the GPyTorch library? Is the marginal likelihood easier to optimize for some kernels than for others? Are some more valuable for long-range versus short-range predictions?

6. In our GPyTorch example, we plotted the predictive distribution including observation noise, while in our “from scratch” example, we only included epistemic uncertainty. Re-do the GPyTorch example, but this time only plotting epistemic uncertainty, and compare to the from-scratch results. Do the predictive distributions now look the same? (They should.)

D2L - 18.2. Gaussian Process Priors

2023. 9. 9. 11:31 | Posted by 솔웅

18.2. Gaussian Process Priors — Dive into Deep Learning 1.0.3 documentation (d2l.ai)

18.2. Gaussian Process Priors

Understanding Gaussian processes (GPs) is important for reasoning about model construction and generalization, and for achieving state-of-the-art performance in a variety of applications, including active learning, and hyperparameter tuning in deep learning. GPs are everywhere, and it is in our interests to know what they are and how we can use them.

가우시안 프로세스(GP)를 이해하는 것은 모델 구성 및 일반화에 대한 추론과 능동 학습, 딥 러닝의 하이퍼파라미터 조정을 포함한 다양한 애플리케이션에서 최첨단 성능을 달성하는 데 중요합니다. GP는 어디에나 있으며, GP가 무엇인지, 어떻게 사용할 수 있는지 아는 것이 우리의 이익입니다.

In this section, we introduce Gaussian process priors over functions. In the next notebook, we show how to use these priors to do posterior inference and make predictions. The next section can be viewed as “GPs in a nutshell”, quickly giving what you need to apply Gaussian processes in practice.

이 섹션에서는 함수에 대한 Gaussian process priors를 소개를 합니다. 다음 노트북에서는 이러한 priors를 사용하여 사후 추론을 수행하고 예측하는 방법을 보여줍니다. 다음 섹션은 실제로 가우스 프로세스를 적용하는 데 필요한 내용을 빠르게 제공하는 "간단한 GP"로 볼 수 있습니다.

import numpy as np
from scipy.spatial import distance_matrix
from d2l import torch as d2l

d2l.set_figsize()

The code above imports the libraries used in this section and sets the default figure size. Each part does the following:

import numpy as np: imports the NumPy library under the alias np. NumPy provides multidimensional arrays and a wide range of numerical routines.
from scipy.spatial import distance_matrix: imports the distance_matrix function from SciPy, which computes the matrix of pairwise distances between points.
from d2l import torch as d2l: imports the PyTorch version of the d2l package under the alias d2l. d2l is the companion library of Dive into Deep Learning (D2L) and provides utilities used throughout the book, including the plotting helpers used here.
d2l.set_figsize(): calls the set_figsize function of the d2l library to set the figure size for the plots that follow.

This code only imports libraries and configures plotting; no actual computation happens yet.

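What is a Gaussian process prior? (GPT answer)

A Gaussian process prior is one of the central concepts in Gaussian process (GP) models. A GP is a probabilistic modeling tool for regression and classification problems that models the relationships between data points.

A Gaussian process prior expresses our prior knowledge or assumptions about the function we want to model. It determines what kinds of functions the GP model considers plausible through its mean and covariance structure. A well-chosen prior helps the model fit the data well.

The main components of a Gaussian process prior are:

Mean function: the mean function specifies where the modeled functions are centered. It can have parameters of its own that are adjusted to the data, and the model's predictions are centered around it.
Covariance function (kernel): the covariance function models the relationship between data points. It measures the similarity between two inputs, typically as a function of their distance, and thereby how strongly the corresponding function values are correlated. Commonly used kernels include the RBF (radial basis function) kernel, the linear kernel, and the polynomial kernel.
Hyperparameters: the mean function and the covariance function have hyperparameters, which are used to adapt the GP model to the data. They are tuned through an optimization procedure.

How to choose a Gaussian process prior depends on the problem being modeled and often requires experience and domain knowledge. A suitable prior improves the model's performance and lets it represent uncertainty about the data more accurately.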

18.2.1. Definition

A Gaussian process is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution. If a function f(x) is a Gaussian process, with mean function m(x) and covariance function or kernel k(x,x′), f(x)∼GP(m,k), then any collection of function values queried at any collection of input points x (times, spatial locations, image pixels, etc.), has a joint multivariate Gaussian distribution with mean vector μ and covariance matrix K: f(x1),…,f(xn)∼N(μ,K), where μi=E[f(xi)]=m(xi) and Kij= Cov(f(xi),f(xj)) = k(xi,xj).

가우스 프로세스는 임의 변수의 모음으로 정의되며, 임의의 유한한 수는 공동 가우스 분포를 갖습니다. 함수 f(x)가 평균 함수 m(x)와 공분산 함수 또는 커널 k(x,x′), f(x)∼GP(m,k)를 갖는 가우스 프로세스인 경우 함수 값의 모음 임의의 입력 점 x(시간, 공간 위치, 이미지 픽셀 등) 모음에서 쿼리된 값은 평균 벡터 μ 및 공분산 행렬 K를 갖는 결합 다변량 가우스 분포를 갖습니다: f(x1),…,f(xn)∼N( μ,K), 여기서 μi=E[f(xi)]=m(xi)이고 Kij= Cov(f(xi),f(xj)) = k(xi,xj)입니다.

This definition may seem abstract and inaccessible, but Gaussian processes are in fact very simple objects.

이 정의는 추상적이고 접근하기 어려운 것처럼 보일 수 있지만 가우스 프로세스는 실제로 매우 간단한 개체입니다.

Any function

f(x) = w⊤ϕ(x) = ⟨w, ϕ(x)⟩, (18.2.1)

with w drawn from a Gaussian (normal) distribution, and ϕ being any vector of basis functions, for example ϕ(x)=(1,x,x**2,...,x**d)⊤, is a Gaussian process. Moreover, any Gaussian process f(x) can be expressed in the form of equation (18.2.1). Let’s consider a few concrete examples, to begin getting acquainted with Gaussian processes, after which we can appreciate how simple and useful they really are.

18.2.2. A Simple Gaussian Process

Suppose f(x)=w0+w1x, and w0,w1∼N(0,1), with w0,w1,x all in one dimension. We can equivalently write this function as the inner product f(x)=(w0,w1)(1,x)⊤. In (18.2.1) above, w=(w0,w1)⊤ and ϕ(x)=(1,x)⊤.

For any x, f(x) is a sum of two Gaussian random variables. Since Gaussians are closed under addition, f(x) is also a Gaussian random variable for any x. In fact, we can compute for any particular x that f(x) is N(0, 1+x**2). Similarly, the joint distribution for any collection of function values, (f(x1),…,f(xn)), for any collection of inputs x1,…,xn, is a multivariate Gaussian distribution. Therefore f(x) is a Gaussian process.
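The claim that f(x) = w0 + w1x with w0, w1 ∼ N(0,1) has mean 0 and variance 1+x**2 can be checked numerically. The following Monte Carlo sketch is only an illustration; the number of draws and the evaluation points are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

# Each row holds one draw of (w0, w1), both from N(0, 1)
w = rng.standard_normal((n_draws, 2))

# Evaluate f(x) = w0 + w1 * x at a few inputs for every draw
x = np.array([-2.0, 0.0, 1.0, 3.0])
f = w[:, [0]] + w[:, [1]] * x        # shape (n_draws, 4)

empirical_mean = f.mean(axis=0)
empirical_var = f.var(axis=0)
theoretical_var = 1 + x ** 2          # Var[w0] + x**2 * Var[w1]

print("empirical mean:     ", empirical_mean)
print("empirical variance: ", empirical_var)
print("theoretical variance:", theoretical_var)
```

The variance follows because w0 and w1 are independent, so Var[f(x)] = Var[w0] + x**2 Var[w1] = 1 + x**2.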

In short, f(x) is a random function, or a distribution over functions. We can gain some insights into this distribution by repeatedly sampling values for w0,w1, and visualizing the corresponding functions f(x), which are straight lines with different slopes and intercepts, as follows:

def lin_func(x, n_sample):
    preds = np.zeros((n_sample, x.shape[0]))
    for ii in range(n_sample):
        w = np.random.normal(0, 1, 2)
        y = w[0] + w[1] * x
        preds[ii, :] = y
    return preds

x_points = np.linspace(-5, 5, 50)
outs = lin_func(x_points, 10)
lw_bd = -2 * np.sqrt((1 + x_points ** 2))
up_bd = 2 * np.sqrt((1 + x_points ** 2))

d2l.plt.fill_between(x_points, lw_bd, up_bd, alpha=0.25)
d2l.plt.plot(x_points, np.zeros(len(x_points)), linewidth=4, color='black')
d2l.plt.plot(x_points, outs.T)
d2l.plt.xlabel("x", fontsize=20)
d2l.plt.ylabel("f(x)", fontsize=20)
d2l.plt.show()

The code above samples random straight-line functions and plots them. Each part does the following:

def lin_func(x, n_sample):: defines the function lin_func, which takes two arguments: the inputs x and the number of sample functions n_sample.
preds = np.zeros((n_sample, x.shape[0])): creates the array preds that will hold the results, with n_sample rows and as many columns as there are entries in x.
for ii in range(n_sample):: starts a loop that runs n_sample times.
w = np.random.normal(0, 1, 2): draws the random weights w from a normal distribution with mean 0 and standard deviation 1. w has length 2; its first element is the intercept and its second element is the slope.
y = w[0] + w[1] * x: evaluates the straight line w[0] + w[1] * x at the inputs x.
preds[ii, :] = y: stores the function values y in row ii of preds.
return preds: returns the preds array.
x_points = np.linspace(-5, 5, 50): creates 50 evenly spaced inputs between -5 and 5.
outs = lin_func(x_points, 10): calls lin_func on x_points with n_sample set to 10, which produces 10 random sample functions.
lw_bd = -2 * np.sqrt((1 + x_points ** 2)) and up_bd = 2 * np.sqrt((1 + x_points ** 2)): compute the lower bound (lw_bd) and upper bound (up_bd) of the shaded region. Since f(x) has variance 1 + x**2, these bounds are two standard deviations below and above the mean.
d2l.plt.fill_between(x_points, lw_bd, up_bd, alpha=0.25): shades the region between lw_bd and up_bd. The alpha parameter sets the transparency of the region.
d2l.plt.plot(x_points, np.zeros(len(x_points)), linewidth=4, color='black'): draws a thick black line at zero, which is the mean of f(x).
d2l.plt.plot(x_points, outs.T): plots the sampled functions stored in outs, each in a different color.
d2l.plt.xlabel("x", fontsize=20) and d2l.plt.ylabel("f(x)", fontsize=20): label the x-axis and y-axis and set the font size.
d2l.plt.show(): displays the figure.

The resulting figure shows the sampled straight lines together with the mean and the two-standard-deviation region of the distribution over functions.

If w0 and w1 are instead drawn from N(0, a**2), how do you imagine varying 'a' affects the distribution over functions?

18.2.3. From Weight Space to Function Space

In the plot above, we saw how a distribution over parameters in a model induces a distribution over functions. While we often have ideas about the functions we want to model — whether they’re smooth, periodic, quickly varying, etc. — it is relatively tedious to reason about the parameters, which are largely uninterpretable. Fortunately, Gaussian processes provide an easy mechanism to reason directly about functions. Since a Gaussian distribution is entirely defined by its first two moments, its mean and covariance matrix, a Gaussian process by extension is defined by its mean function and covariance function.

위의 플롯에서 모델의 매개변수에 대한 분포가 함수에 대한 분포를 유도하는 방법을 확인했습니다. 우리는 모델링하려는 함수에 대한 아이디어(매끄럽거나 주기적이거나 빠르게 변화하는 등)에 대해 종종 아이디어를 갖고 있지만 대체로 해석할 수 없는 매개변수에 대해 추론하는 것은 상대적으로 지루합니다. 다행스럽게도 가우스 프로세스는 함수에 대해 직접적으로 추론할 수 있는 쉬운 메커니즘을 제공합니다. 가우스 분포는 처음 두 모멘트인 평균과 공분산 행렬로 완전히 정의되므로 확장에 따른 가우스 프로세스는 평균 함수와 공분산 함수로 정의됩니다.

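In the above example, the mean function is m(x) = E[f(x)] = E[w0 + w1x] = E[w0] + E[w1]x = 0 + 0 = 0.

Similarly, the covariance function is k(x,x′) = Cov(f(x),f(x′)) = E[f(x)f(x′)] − E[f(x)]E[f(x′)] = E[w0**2 + w0w1x′ + w1w0x + w1**2 xx′] = 1 + xx′.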

Our distribution over functions can now be directly specified and sampled from, without needing to sample from the distribution over parameters. For example, to draw from f(x), we can simply form our multivariate Gaussian distribution associated with any collection of x we want to query, and sample from it directly. We will begin to see just how advantageous this formulation will be.

이제 매개변수에 대한 분포에서 샘플링할 필요 없이 함수에 대한 분포를 직접 지정하고 샘플링할 수 있습니다. 예를 들어 f(x)에서 추출하려면 쿼리하려는 x 컬렉션과 관련된 다변량 가우스 분포를 간단히 구성하고 여기에서 직접 샘플링하면 됩니다. 우리는 이 공식이 얼마나 유리한지 살펴보기 시작할 것입니다.
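As a sketch of sampling directly in function space, consider again the straight-line model f(x) = w0 + w1x with w0, w1 ∼ N(0,1). Its mean function is 0 and its covariance function is k(x,x′) = 1 + xx′, so we can build the covariance matrix on a grid and draw from the multivariate Gaussian without ever sampling weights. The grid and the jitter value are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)

# Mean vector and covariance matrix implied by m(x) = 0, k(x, x') = 1 + x x'
mean = np.zeros_like(x)
K = 1 + np.outer(x, x)

# K has rank 2, so a tiny jitter is added for numerical stability
jitter = 1e-8 * np.eye(len(x))
samples = rng.multivariate_normal(mean, K + jitter, size=10)

# Every draw should be (numerically) a straight line: second differences ~ 0
max_curvature = np.abs(np.diff(samples, n=2, axis=1)).max()
print(f"largest second difference across samples: {max_curvature:.2e}")
```

Each row of samples is a function evaluated on the grid. The rows are straight lines up to the jitter, just like the lines obtained by sampling w0 and w1 explicitly.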

First, we note that essentially the same derivation for the simple straight line model above can be applied to find the mean and covariance function for any model of the form f(x)=w**⊤ ϕ(x), with w∼N(u,S). In this case, the mean function m(x)=u**⊤ ϕ(x), and the covariance function k(x,x′)=ϕ(x)**⊤ Sϕ(x′). Since ϕ(x) can represent a vector of any non-linear basis functions, we are considering a very general model class, including models with an even an infinite number of parameters.

먼저, 위의 단순 직선 모델에 대해 본질적으로 동일한 파생을 적용하여 'w∼N(u,S)와 함께 f(x)=w**⊤ ф(x) 형식의 모든 모델에 대한 평균 및 공분산 함수를 찾을 수 있습니다.. 이 경우 평균 함수 m(x)=u**⊤ ф(x)이고, 공분산 함수 k(x,x′)=ψ(x)**⊤ Sψ(x′)입니다. ф(x)는 모든 비선형 기저 함수의 벡터를 나타낼 수 있으므로 우리는 무한한 수의 매개변수를 가진 모델을 포함하여 매우 일반적인 모델 클래스를 고려하고 있습니다.
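As a quick check of the formulas m(x) = u^⊤ϕ(x) and k(x,x′) = ϕ(x)^⊤Sϕ(x′), here is a minimal sketch. It assumes the straight-line basis ϕ(x) = (1, x)^⊤ with u = 0 and S = I; these choices, and the helper names `features` and `gp_moments`, are illustrative only. It builds the mean vector and covariance matrix in function space, then compares the covariance with an empirical estimate obtained by sampling the weights.

```python
import numpy as np

def features(x):
    """Basis functions phi(x) = (1, x) for a straight-line model (illustrative choice)."""
    return np.stack([np.ones_like(x), x], axis=1)  # shape (n, 2)

def gp_moments(x, u, S):
    """Mean vector u^T phi(x_i) and covariance matrix phi(x_i)^T S phi(x_j)."""
    Phi = features(x)
    return Phi @ u, Phi @ S @ Phi.T

x = np.linspace(-2.0, 2.0, 5)
u = np.zeros(2)   # assumed prior mean of the weights
S = np.eye(2)     # assumed prior covariance of the weights
mean, K = gp_moments(x, u, S)

# Weight-space check: sample w ~ N(u, S), evaluate f = Phi w,
# and estimate the covariance of the function values empirically.
rng = np.random.default_rng(0)
W = rng.multivariate_normal(u, S, size=200_000)  # (num_samples, 2)
F = W @ features(x).T                            # (num_samples, n)
K_empirical = np.cov(F, rowvar=False)
max_gap = np.abs(K - K_empirical).max()
print("largest gap between analytic and empirical covariance:", max_gap)
```

With these assumed settings the analytic covariance reduces to 1 + x x′, and the sampled estimate should agree with it up to Monte Carlo error.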

18.2.4. The Radial Basis Function (RBF) Kernel

The radial basis function (RBF) kernel is the most popular covariance function for Gaussian processes, and for kernel machines in general. This kernel has the form $k_{\mathrm{RBF}}(x,x') = a^2 \exp\left(-\frac{1}{2\ell^2}\|x-x'\|^2\right)$, where $a$ is an amplitude parameter and $\ell$ is a length-scale hyperparameter.

RBF(방사형 기저 함수) 커널은 가우스 프로세스 및 일반적으로 커널 머신에 가장 널리 사용되는 공분산 함수입니다. 이 커널은 kRBF(x,x′)=a**2 exp⁡(−1/2ℓ**2||x−x′||2) 형식을 갖습니다. 여기서 'a'는 진폭 매개변수이고 ℓ는 길이 척도 하이퍼 매개변수입니다.

Radial Basis Function (RBF) Kernel 이란? (GPT3.5 의 답변)

The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, is a commonly used kernel function in machine learning and statistical modeling. It is particularly popular in the context of Support Vector Machines (SVMs) and Gaussian Processes (GPs).

라디얼 베이시스 함수(Radial Basis Function, RBF) 커널은 기계 학습 및 통계 모델링에서 일반적으로 사용되는 커널 함수로, 특히 서포트 벡터 머신(SVM)과 가우시안 프로세스(GP)의 맥락에서 널리 사용됩니다.

The RBF kernel is defined as: RBF 커널은 다음과 같이 정의됩니다.

K(x, x') = exp(-γ * ||x - x'||^2)

Here, K represents the kernel function, x and x' are data points, ||x - x'|| represents the Euclidean distance between these points, and γ (gamma) is a hyperparameter that controls the kernel's shape. The RBF kernel assigns a similarity or correlation score to pairs of data points based on their distance.

여기서 K는 커널 함수를 나타내며, x와 x'은 데이터 포인트이고, ||x - x'||은 이러한 포인트 간의 유클리드 거리를 나타내며, γ(gamma)는 커널의 모양을 제어하는 하이퍼파라미터입니다. RBF 커널은 데이터 포인트 쌍에 대해 거리를 기반으로 유사성 또는 상관 점수를 할당합니다.

Key characteristics of the RBF kernel include:

RBF 커널의 주요 특징은 다음과 같습니다.

Decay with Distance: As the distance between data points increases, the similarity score assigned by the RBF kernel decreases exponentially. This means that nearby points receive higher similarity scores, while distant points receive lower scores.

거리에 따른 감쇠: 데이터 포인트 간 거리가 증가함에 따라 RBF 커널이 할당하는 유사성 점수가 지수적으로 감소합니다. 이것은 가까운 포인트가 더 높은 유사성 점수를 받는 반면 먼 포인트가 더 낮은 점수를 받는다는 것을 의미합니다.
Smoothness: The RBF kernel produces smooth and continuous similarity scores, which makes it suitable for capturing complex patterns and relationships in the data.

부드러움: RBF 커널은 부드럽고 연속적인 유사성 점수를 생성하며, 이로써 데이터의 복잡한 패턴과 관계를 캡처하는 데 적합합니다.
Universal Approximator: The RBF kernel is a universal approximator, meaning it can approximate any continuous function with sufficient data and proper tuning of its hyperparameters.

범용 근사자(Universal Approximator): RBF 커널은 범용 근사자로, 충분한 데이터와 하이퍼파라미터의 적절한 조정을 통해 어떤 연속 함수든 근사화할 수 있습니다.
Hyperparameter γ: The hyperparameter γ controls the "width" of the kernel. Smaller values of γ result in a broader kernel that assigns higher similarity scores to a wider range of data points. Larger values of γ result in a narrower kernel that focuses on closer neighbors.

하이퍼파라미터 γ: 하이퍼파라미터 γ는 커널의 "폭"을 제어합니다. 작은 γ 값은 더 넓은 커널을 생성하여 더 넓은 범위의 데이터 포인트에 높은 유사성 점수를 할당합니다. 큰 γ 값은 더 좁은 커널을 생성하여 더 가까운 이웃에 중점을 둡니다.

The RBF kernel is often used in various machine learning tasks, such as SVM classification, GP regression, clustering, and dimensionality reduction. It is valued for its ability to capture complex, non-linear relationships in the data, but it may also be sensitive to the choice of the γ hyperparameter, which requires careful tuning for optimal model performance.

RBF 커널은 SVM 분류, GP 회귀, 클러스터링 및 차원 축소와 같은 다양한 기계 학습 작업에서 사용됩니다. 이는 데이터의 복잡한 비선형 관계를 캡처할 수 있는 능력으로 인해 중요하며, 최적의 모델 성능을 위해 γ 하이퍼파라미터 선택에 주의 깊게 조정해야 할 수도 있습니다.
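The γ form of the kernel and the length-scale form used in the D2L text are the same function: with amplitude a = 1, setting γ = 1/(2ℓ²) makes them coincide. The sketch below checks this numerically for 1-D inputs; the function names `rbf_lengthscale` and `rbf_gamma` are made up for this illustration.

```python
import numpy as np

def rbf_lengthscale(x1, x2, a=1.0, ls=1.0):
    """k(x, x') = a^2 exp(-(x - x')^2 / (2 ls^2)) for 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return a ** 2 * np.exp(-sq_dist / (2.0 * ls ** 2))

def rbf_gamma(x1, x2, gamma=0.5):
    """K(x, x') = exp(-gamma (x - x')^2) for 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-gamma * sq_dist)

x = np.linspace(0.0, 5.0, 6)
ls = 2.0
gamma = 1.0 / (2.0 * ls ** 2)  # conversion between the two parameterizations

K_ls = rbf_lengthscale(x, x, a=1.0, ls=ls)
K_gamma = rbf_gamma(x, x, gamma=gamma)
same = np.allclose(K_ls, K_gamma)
print("kernels agree:", same)
```

A small γ therefore corresponds to a large length-scale ℓ (a broad kernel), and a large γ to a small ℓ.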

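Let’s derive this kernel starting from weight space. Consider the function

$$f(x) = \sum_{i=1}^{J} w_i \phi_i(x), \quad w_i \sim \mathcal{N}\left(0, \frac{\sigma^2}{J}\right), \quad \phi_i(x) = \exp\left(-\frac{(x-c_i)^2}{2\ell^2}\right).$$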

가중치 공간에서 시작하여 이 커널을 유도해 보겠습니다. 기능을 고려하십시오

f(x) is a sum of radial basis functions, with width ℓ, centred at the points ci, as shown in the following figure.

f(x)는 다음 그림과 같이 점 ci를 중심으로 너비가 ℓ인 방사형 기저 함수의 합입니다.

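We can recognize $f(x)$ as having the form $w^\top \phi(x)$, where $w = (w_1,\dots,w_J)^\top$ and $\phi(x)$ is a vector containing each of the radial basis functions. With independent weights of variance $\sigma^2/J$, the covariance function of this Gaussian process is then

$$k(x,x') = \frac{\sigma^2}{J} \sum_{i=1}^{J} \phi_i(x)\phi_i(x').$$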

f(x)는 w**⊤ ф(x) 형식을 갖는 것으로 인식할 수 있습니다. 여기서 w=(w1,…,wJ)**⊤이고 ф(x)는 각 방사형 기저 함수를 포함하는 벡터입니다. 이 가우스 프로세스의 공분산 함수는 다음과 같습니다.

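Now let’s consider what happens as we take the number of parameters (and basis functions) to infinity. Let $c_J = \log J$, $c_1 = -\log J$, and $c_{i+1} - c_i = \Delta c = 2\frac{\log J}{J}$, and let $J \to \infty$. The covariance function becomes the Riemann sum:

$$k(x,x') = \lim_{J\to\infty} \frac{\sigma^2}{J}\sum_{i=1}^{J}\phi_i(x)\phi_i(x') = \sigma^2\int_{c_0}^{c_\infty} \phi_c(x)\phi_c(x')\,dc.$$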

이제 매개변수(및 기본 함수)의 수를 무한대로 가져가면 어떤 일이 발생하는지 생각해 보겠습니다. cJ=log⁡J, c1=−log⁡ J, 그리고 ci+1−ci=Δc=2 log⁡J/J, 그리고 J→라고 하자. 공분산 함수는 리만 합계가 됩니다.

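By setting $c_0 = -\infty$ and $c_\infty = \infty$, we spread the infinitely many basis functions across the whole real line, each a distance $\Delta c \to 0$ apart:

$$k(x,x') = \sigma^2\int_{-\infty}^{\infty} \exp\left(-\frac{(x-c)^2}{2\ell^2}\right)\exp\left(-\frac{(x'-c)^2}{2\ell^2}\right)dc = \sqrt{\pi}\,\ell\,\sigma^2 \exp\left(-\frac{(x-x')^2}{2(\sqrt{2}\ell)^2}\right) \propto k_{\mathrm{RBF}}(x,x').$$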

c0=−무한대와 c무한대=무를 설정함으로써 우리는 무한히 많은 기저 함수를 실제 선 전체에 걸쳐 각각 Δc→0 거리만큼 분산시킵니다.
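The key step in this derivation is that integrating the product of two Gaussian bumps of width ℓ over all centres c gives √π·ℓ·exp(−(x−x′)²/(2(√2ℓ)²)), which is an RBF kernel with length-scale √2ℓ. The sketch below checks that identity numerically with a plain Riemann sum over a dense grid of centres. It sets σ² = 1, and the grid limits and helper names are assumptions chosen for illustration.

```python
import numpy as np

def basis_overlap(x, x_prime, ls, c_max=50.0, num_centres=200_001):
    """Riemann-sum approximation of the integral over c of phi_c(x) * phi_c(x')."""
    c = np.linspace(-c_max, c_max, num_centres)  # densely spaced centres
    dc = c[1] - c[0]                             # spacing between centres
    phi_x = np.exp(-(x - c) ** 2 / (2.0 * ls ** 2))
    phi_xp = np.exp(-(x_prime - c) ** 2 / (2.0 * ls ** 2))
    return np.sum(phi_x * phi_xp) * dc

def closed_form(x, x_prime, ls):
    """sqrt(pi) * ls * exp(-(x - x')^2 / (2 (sqrt(2) ls)^2))."""
    return np.sqrt(np.pi) * ls * np.exp(
        -(x - x_prime) ** 2 / (2.0 * (np.sqrt(2.0) * ls) ** 2))

ls = 1.5
pairs = [(0.0, 0.0), (0.3, 1.0), (-2.0, 1.0)]
numeric = np.array([basis_overlap(x, xp, ls) for x, xp in pairs])
exact = np.array([closed_form(x, xp, ls) for x, xp in pairs])
print(np.column_stack([numeric, exact]))
```

The two columns should match closely, which shows that only a finite computation (evaluating the kernel) is needed to work with infinitely many basis functions.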

It is worth taking a moment to absorb what we have done here. By moving into the function space representation, we have derived how to represent a model with an infinite number of parameters, using a finite amount of computation. A Gaussian process with an RBF kernel is a universal approximator, capable of representing any continuous function to arbitrary precision. We can intuitively see why from the above derivation. We can collapse each radial basis function to a point mass taking ℓ→0, and give each point mass any height we wish.

여기서 우리가 한 일을 잠시 흡수해 볼 가치가 있습니다. 함수 공간 표현으로 이동하여 유한한 양의 계산을 사용하여 무한한 수의 매개변수가 있는 모델을 표현하는 방법을 도출했습니다. RBF 커널을 사용하는 가우스 프로세스는 모든 연속 함수를 임의의 정밀도로 표현할 수 있는 범용 근사기입니다. 위의 도출을 통해 그 이유를 직관적으로 알 수 있습니다. 각 방사형 기저 함수를 ℓ→0을 취하는 점 질량으로 축소하고 각 점 질량에 원하는 높이를 부여할 수 있습니다.

So a Gaussian process with an RBF kernel is a model with an infinite number of parameters and much more flexibility than any finite neural network. Perhaps all the fuss about overparametrized neural networks is misplaced. As we will see, GPs with RBF kernels do not overfit, and in fact provide especially compelling generalization performance on small datasets. Moreover, the examples in (Zhang et al., 2021), such as the ability to fit images with random labels perfectly but still generalize well on structured problems, can be perfectly reproduced using Gaussian processes (Wilson and Izmailov, 2020). Neural networks are not as distinct as we make them out to be.

따라서 RBF 커널을 사용하는 가우스 프로세스는 무한한 수의 매개 변수와 유한 신경망보다 훨씬 더 많은 유연성을 갖춘 모델입니다. 아마도 과도하게 매개변수화된 신경망에 대한 모든 소란은 잘못된 것일 수도 있습니다. 앞으로 살펴보겠지만 RBF 커널을 사용하는 GP는 과적합되지 않으며 실제로 작은 데이터 세트에서 특히 강력한 일반화 성능을 제공합니다. 더욱이 (Zhang et al., 2021)의 예에서는 임의의 레이블이 있는 이미지를 완벽하게 맞추면서도 구조화된 문제에 대해 여전히 잘 일반화하는 기능과 같은 것입니다(가우시안 프로세스를 사용하여 완벽하게 재현 가능)(Wilson and Izmailov, 2020). . 신경망은 우리가 생각하는 것만큼 뚜렷하지 않습니다.

We can build further intuition about Gaussian processes with RBF kernels, and hyperparameters such as length-scale, by sampling directly from the distribution over functions. As before, this involves a simple procedure:

함수에 대한 분포에서 직접 샘플링함으로써 RBF 커널과 길이 척도와 같은 하이퍼파라미터를 사용하는 가우스 프로세스에 대한 추가 직관을 구축할 수 있습니다. 이전과 마찬가지로 여기에는 간단한 절차가 포함됩니다.

Choose the input x points we want to query the GP: x1,…,xn.

GP에 쿼리하려는 입력 x 포인트(x1,…,xn)를 선택합니다.
Evaluate m(xi), i=1,…,n, and k(xi,xj) for i,j=1,…,n to respectively form the mean vector and covariance matrix μ and K, where (f(x1),…,f(xn))∼N(μ,K).

i,j=1,…,n에 대해 m(xi), i=1,…,n 및 k(xi,xj)를 계산하여 각각 평균 벡터와 공분산 행렬 μ 및 K를 형성합니다. 여기서 (f(x1) ,…,f(xn))∼N(μ,K).
Sample from this multivariate Gaussian distribution to obtain the sample function values.

이 다변량 가우스 분포에서 샘플링하여 샘플 함수 값을 얻습니다.
Sample more times to visualize more sample functions queried at those points.

더 많은 횟수를 샘플링하여 해당 지점에서 쿼리된 더 많은 샘플 함수를 시각화합니다.

We illustrate this process in the figure below.

아래 그림에서는 이 프로세스를 설명합니다.

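import numpy as np
from scipy.spatial import distance_matrix
from d2l import torch as d2l  # assumes the d2l package is installed

def rbfkernel(x1, x2, ls=4.):  #@save
    """RBF kernel with unit amplitude and length-scale ls for 1-D inputs."""
    # Pairwise Euclidean distances between the points in x1 and x2
    dist = distance_matrix(np.expand_dims(x1, 1), np.expand_dims(x2, 1))
    # exp(-||x - x'||^2 / (2 ls^2)): the length-scale enters squared
    return np.exp(-(dist ** 2) / (2 * ls ** 2))

x_points = np.linspace(0, 5, 50)
meanvec = np.zeros(len(x_points))  # zero mean function
covmat = rbfkernel(x_points, x_points, 1)  # length-scale 1

# Draw five sample functions from the GP prior and plot them
prior_samples = np.random.multivariate_normal(meanvec, covmat, size=5)
d2l.plt.plot(x_points, prior_samples.T, alpha=0.5)
d2l.plt.show()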

The code above is a Python program that uses the radial basis function (RBF) kernel to visualize the prior distribution of a Gaussian process. Line by line:

def rbfkernel(x1, x2, ls=4.): defines the function rbfkernel. It takes two inputs, x1 and x2, plus an additional parameter ls, the length-scale of the kernel, with a default value of 4.
dist = distance_matrix(np.expand_dims(x1, 1), np.expand_dims(x2, 1)): computes the distance matrix dist between the inputs x1 and x2. distance_matrix returns the distance for every pair of points drawn from x1 and x2.
The return statement evaluates the RBF kernel. The kernel has a Gaussian shape: it passes the squared distance between two data points through a decaying exponential and returns the result. ls is the length-scale, which sets the width of the kernel.
x_points = np.linspace(0, 5, 50): creates the array x_points with 50 equally spaced values between 0 and 5. These are the inputs at which the GP is queried.
meanvec = np.zeros(len(x_points)): creates meanvec, a zero vector of the same length as x_points. It serves as the mean vector of the Gaussian process.
covmat = rbfkernel(x_points, x_points, 1): calls rbfkernel to compute the covariance matrix covmat for x_points. The matrix is built with the RBF kernel, with ls set to 1.
prior_samples = np.random.multivariate_normal(meanvec, covmat, size=5): draws 5 random samples (prior_samples) from the multivariate Gaussian defined by meanvec and covmat. Each sample is a draw from the Gaussian process prior.
d2l.plt.plot(x_points, prior_samples.T, alpha=0.5): plots prior_samples. Each random sample is drawn as a curve over x_points, and the alpha parameter sets the transparency.
d2l.plt.show(): displays the figure.

This code is a simple example that uses the RBF kernel to visualize the prior of a Gaussian process, which helps build intuition for how Gaussian processes behave.

18.2.5. The Neural Network Kernel

Research on Gaussian processes in machine learning was triggered by research on neural networks. Radford Neal was pursuing ever larger Bayesian neural networks, ultimately showing in 1994 (later published in 1996, as it was one of the most infamous NeurIPS rejections) that such networks with an infinite number of hidden units become Gaussian processes with particular kernel functions (Neal, 1996). Interest in this derivation has re-surfaced, with ideas like the neural tangent kernel being used to investigate the generalization properties of neural networks (Matthews et al., 2018) (Novak et al., 2018). We can derive the neural network kernel as follows.

기계 학습의 가우스 프로세스에 대한 연구는 신경망에 대한 연구에서 시작되었습니다. Radford Neal은 훨씬 더 큰 베이지안 신경망을 추구했으며, 궁극적으로 1994년에(나중에 가장 악명 높은 NeurIPS 거부 중 하나인 1996년에 출판됨) 무한한 수의 숨겨진 단위를 가진 그러한 네트워크가 특정 커널 기능을 가진 가우스 프로세스가 된다는 것을 보여주었습니다(Neal , 1996). 신경망의 일반화 속성을 조사하는 데 신경 접선 커널과 같은 아이디어가 사용되면서 이 파생에 대한 관심이 다시 표면화되었습니다(Matthews et al., 2018)(Novak et al., 2018). 신경망 커널은 다음과 같이 유도할 수 있습니다.

Consider a neural network function $f(x)$ with one hidden layer:

$$f(x) = b + \sum_{i=1}^{J} v_i h(x; u_i).$$

하나의 은닉층이 있는 신경망 함수 f(x)를 생각해 보세요.

$b$ is a bias, $v_i$ are the hidden-to-output weights, $h$ is any bounded hidden unit transfer function, $u_i$ are the input-to-hidden weights, and $J$ is the number of hidden units. Let $b$ and $v_i$ be independent with zero mean and variances $\sigma_b^2$ and $\sigma_v^2/J$, respectively, and let the $u_i$ be independent and identically distributed. We can then use the central limit theorem to show that any collection of function values $f(x_1),\dots,f(x_n)$ has a joint multivariate Gaussian distribution.

b는 편향, vi는 출력 가중치에 대한 은닉, ℎ는 경계가 있는 숨겨진 단위 전달 함수, ui는 숨겨진 가중치에 대한 입력, J는 숨겨진 단위의 수입니다. b와 vi가 평균이 0이고 분산이 각각 σ**2 b 및 σ**2v/J인 독립이고 ui가 독립적인 동일한 분포를 갖는다고 가정합니다. 그런 다음 중심 극한 정리를 사용하여 함수 값 f(x1),…,f(xn)의 집합이 결합 다변량 가우스 분포를 가짐을 보여줄 수 있습니다.

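The mean and covariance function of the corresponding Gaussian process are:

$$m(x) = E[f(x)] = 0$$

$$k(x,x') = \mathrm{cov}[f(x), f(x')] = E[f(x)f(x')] = \sigma_b^2 + \frac{1}{J}\sum_{i=1}^{J}\sigma_v^2\, E[h(x;u_i)\,h(x';u_i)]$$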

해당 가우스 프로세스의 평균 및 공분산 함수는 다음과 같습니다.

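In some cases, we can essentially evaluate this covariance function in closed form. Let $h(x;u) = \mathrm{erf}\left(u_0 + \sum_{j=1}^{p} u_j x_j\right)$, where $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt$, and $u \sim \mathcal{N}(0, \Sigma)$. Then

$$k(x,x') = \frac{2}{\pi}\sin^{-1}\left(\frac{2\tilde{x}^\top \Sigma \tilde{x}'}{\sqrt{(1 + 2\tilde{x}^\top\Sigma\tilde{x})(1 + 2\tilde{x}'^\top\Sigma\tilde{x}')}}\right),$$

where $\tilde{x} = (1, x^\top)^\top$ is the input augmented with a constant entry for the bias weight $u_0$.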

The RBF kernel is stationary, meaning that it is translation invariant, and therefore can be written as a function of T=x−x′. Intuitively, stationarity means that the high-level properties of the function, such as rate of variation, do not change as we move in input space. The neural network kernel, however, is non-stationary. Below, we show sample functions from a Gaussian process with this kernel. We can see that the function looks qualitatively different near the origin.

RBF 커널은 고정적입니다. 즉, 변환 불변이므로 T=x−x′의 함수로 작성할 수 있습니다. 직관적으로 정상성은 입력 공간에서 이동할 때 변동률과 같은 함수의 상위 수준 속성이 변경되지 않음을 의미합니다. 그러나 신경망 커널은 고정되어 있지 않습니다. 아래에서는 이 커널을 사용한 가우스 프로세스의 샘플 함수를 보여줍니다. 함수가 원점 근처에서 질적으로 다르게 보이는 것을 볼 수 있습니다.
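Stationarity can be checked numerically: a stationary kernel gives the same value for any two pairs of inputs separated by the same offset. The sketch below compares the RBF kernel with an arcsine-form neural network kernel, (2/π)·arcsin(2x̃^⊤Σx̃′ / √((1+2x̃^⊤Σx̃)(1+2x̃′^⊤Σx̃′))) with x̃ = (1, x), following Williams (1998). The choice Σ = I and the function names `nn_kernel` and `rbf_kernel` are assumptions made for this illustration.

```python
import numpy as np

def nn_kernel(x1, x2, Sigma=None):
    """Arcsine (erf-network) kernel for 1-D inputs, using augmented inputs (1, x)."""
    X1 = np.stack([np.ones_like(x1), x1], axis=1)
    X2 = np.stack([np.ones_like(x2), x2], axis=1)
    if Sigma is None:
        Sigma = np.eye(2)  # assumed covariance of the input-to-hidden weights
    cross = 2.0 * X1 @ Sigma @ X2.T
    norm1 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", X1, Sigma, X1)
    norm2 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", X2, Sigma, X2)
    return (2.0 / np.pi) * np.arcsin(cross / np.sqrt(np.outer(norm1, norm2)))

def rbf_kernel(x1, x2, ls=1.0):
    """Unit-amplitude RBF kernel for 1-D inputs."""
    return np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2.0 * ls ** 2))

a = np.array([0.0, 5.0])
b = a + 1.0  # the same offset applied at two different locations

nn_vals = np.diag(nn_kernel(a, b))    # k(0, 1) and k(5, 6)
rbf_vals = np.diag(rbf_kernel(a, b))  # depends only on x - x'
print("neural network kernel:", nn_vals)
print("RBF kernel:           ", rbf_vals)
```

The two RBF values are identical, while the neural network kernel gives clearly different values near the origin and away from it, which is what non-stationarity means here.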

18.2.6. Summary

The first step in performing Bayesian inference involves specifying a prior. Gaussian processes can be used to specify a whole prior over functions. Starting from a traditional “weight space” view of modelling, we can induce a prior over functions by starting with the functional form of a model, and introducing a distribution over its parameters. We can alternatively specify a prior distribution directly in function space, with properties controlled by a kernel. The function-space approach has many advantages. We can build models that actually correspond to an infinite number of parameters, but use a finite amount of computation! Moreover, while these models have a great amount of flexibility, they also make strong assumptions about what types of functions are a priori likely, leading to relatively good generalization on small datasets.

베이지안 추론을 수행하는 첫 번째 단계는 사전 지정을 포함합니다. 가우스 프로세스를 사용하여 함수보다 전체 prior 을 지정할 수 있습니다. 모델링의 전통적인 "가중치 공간" 관점에서 시작하여 모델의 기능적 형태로 시작하고 해당 매개변수에 대한 분포를 도입함으로써 기능에 대한 사전 예측을 유도할 수 있습니다. 또는 커널에 의해 제어되는 속성을 사용하여 함수 공간에서 직접 사전 분포를 지정할 수도 있습니다. 기능 공간 접근 방식에는 많은 장점이 있습니다. 실제로 무한한 수의 매개변수에 해당하는 모델을 구축할 수 있지만 계산량은 한정되어 있습니다. 더욱이 이러한 모델은 상당한 유연성을 갖고 있지만 어떤 유형의 함수가 선험적으로 발생할 가능성이 있는지에 대한 강력한 가정을 만들어 소규모 데이터 세트에 대해 상대적으로 좋은 일반화를 이끌어냅니다.

The assumptions of models in function space are intuitively controlled by kernels, which often encode higher level properties of functions, such as smoothness and periodicity. Many kernels are stationary, meaning that they are translation invariant. Functions drawn from a Gaussian process with a stationary kernel have roughly the same high-level properties (such as rate of variation) regardless of where we look in the input space.

함수 공간에서 모델의 가정은 커널에 의해 직관적으로 제어되며, 커널은 부드러움 및 주기성과 같은 함수의 더 높은 수준 속성을 인코딩하는 경우가 많습니다. 많은 커널은 고정되어 있습니다. 즉, 변환 불변성을 의미합니다. 고정 커널을 사용하는 가우스 프로세스에서 도출된 함수는 입력 공간에서 보는 위치에 관계없이 대략 동일한 높은 수준의 속성(예: 변동률)을 갖습니다.

Gaussian processes are a relatively general model class, containing many examples of models we are already familiar with, including polynomials, Fourier series, and so on, as long as we have a Gaussian prior over the parameters. They also include neural networks with an infinite number of parameters, even without Gaussian distributions over the parameters. This connection, discovered by Radford Neal, triggered machine learning researchers to move away from neural networks, and towards Gaussian processes.

가우스 프로세스는 매개변수에 대한 가우스 사전이 있는 한 다항식, 푸리에 급수 등을 포함하여 우리에게 이미 익숙한 모델의 많은 예를 포함하는 비교적 일반적인 모델 클래스입니다. 또한 매개변수에 대한 가우스 분포가 없더라도 매개변수 수가 무한한 신경망도 포함됩니다. Radford Neal이 발견한 이 연결은 기계 학습 연구자들이 신경망에서 벗어나 가우스 프로세스로 이동하도록 촉발했습니다.

18.2.7. Exercises


D2L - 18.1. Introduction to Gaussian Processes

2023. 9. 9. 10:48 | Posted by 솔웅

18.1. Introduction to Gaussian Processes — Dive into Deep Learning 1.0.3 documentation (d2l.ai)


18.1. Introduction to Gaussian Processes

In many cases, machine learning amounts to estimating parameters from data. These parameters are often numerous and relatively uninterpretable — such as the weights of a neural network. Gaussian processes, by contrast, provide a mechanism for directly reasoning about the high-level properties of functions that could fit our data. For example, we may have a sense of whether these functions are quickly varying, periodic, involve conditional independencies, or translation invariance. Gaussian processes enable us to easily incorporate these properties into our model, by directly specifying a Gaussian distribution over the function values that could fit our data.

많은 경우 머신러닝은 데이터에서 매개변수를 추정하는 것과 같습니다. 이러한 매개변수는 신경망의 가중치와 같이 수가 많고 상대적으로 해석하기 어려운 경우가 많습니다. 대조적으로 가우스 프로세스는 데이터에 적합할 수 있는 함수의 상위 수준 속성을 직접 추론하기 위한 메커니즘을 제공합니다. 예를 들어, 우리는 이러한 함수가 빠르게 변화하는지, 주기적인지, 조건부 독립성을 포함하는지, 변환 불변성을 포함하는지 여부를 알 수 있습니다. 가우스 프로세스를 사용하면 데이터에 맞는 함수 값에 대한 가우스 분포를 직접 지정하여 이러한 속성을 모델에 쉽게 통합할 수 있습니다.

Let’s get a feel for how Gaussian processes operate, by starting with some examples.

몇 가지 예를 통해 가우스 프로세스가 어떻게 작동하는지 살펴보겠습니다.

Suppose we observe the following dataset, of regression targets (outputs), y, indexed by inputs, x. As an example, the targets could be changes in carbon dioxide concentrations, and the inputs could be the times at which these targets have been recorded. What are some features of the data? How quickly does it seem to vary? Do we have data points collected at regular intervals, or are there missing inputs? How would you imagine filling in the missing regions, or forecasting up until x=25?

입력 x로 인덱싱된 회귀 목표(출력) y의 다음 데이터 세트를 관찰한다고 가정합니다. 예를 들어 목표는 이산화탄소 농도의 변화일 수 있고 입력은 이러한 목표가 기록된 시간일 수 있습니다. 데이터의 일부 기능은 무엇입니까? 얼마나 빨리 변하는 것 같나요? 정기적으로 수집된 데이터 포인트가 있습니까, 아니면 누락된 입력이 있습니까? 누락된 영역을 채우거나 x=25까지 예측하는 것을 어떻게 상상하시겠습니까?

In order to fit the data with a Gaussian process, we start by specifying a prior distribution over what types of functions we might believe to be reasonable. Here we show several sample functions from a Gaussian process. Does this prior look reasonable? Note that here we are not looking for functions that fit our dataset, but are instead specifying reasonable high-level properties of the solutions, such as how quickly they vary with inputs. Note that we will see code for reproducing all of the plots in this notebook, in the next notebooks on priors and inference.

데이터를 가우스 프로세스에 맞추려면 합리적이라고 믿을 수 있는 함수 유형에 대한 사전 분포를 지정하는 것부터 시작합니다. 여기서는 가우스 프로세스의 몇 가지 샘플 함수를 보여줍니다. 이 사전 조치가 합리적으로 보입니까? 여기서는 데이터 세트에 맞는 함수를 찾는 것이 아니라 입력에 따라 얼마나 빨리 변하는 지와 같은 솔루션의 합리적인 상위 수준 속성을 지정하기 위한 것입니다. 사전 및 추론에 대한 다음 노트북에서 이 노트북의 모든 플롯을 재현하기 위한 코드를 볼 수 있습니다.

Fig. 18.1.2  Sample prior functions that we may want to represent with our model.

Once we condition on data, we can use this prior to infer a posterior distribution over functions that could fit the data. Here we show sample posterior functions.

데이터에 조건을 적용하면 데이터에 적합한 함수에 대한 사후 분포를 추론하기 전에 이를 사용할 수 있습니다. 여기서는 샘플 사후 함수를 보여줍니다.

Fig. 18.1.3  Sample posterior functions, once we have observed the data.

We see that each of these functions is entirely consistent with our data, perfectly running through each observation. In order to use these posterior samples to make predictions, we can average the values of every possible sample function from the posterior, to create the curve below, in thick blue. Note that we do not actually have to take an infinite number of samples to compute this expectation; as we will see later, we can compute the expectation in closed form.

우리는 이러한 각 함수가 데이터와 완전히 일치하며 각 관찰을 통해 완벽하게 실행된다는 것을 알 수 있습니다. 예측을 위해 이러한 사후 샘플을 사용하기 위해 사후에서 가능한 모든 샘플 함수의 값을 평균화하여 아래 곡선을 두꺼운 파란색으로 만들 수 있습니다. 이 기대치를 계산하기 위해 실제로 무한한 수의 샘플을 취할 필요는 없습니다. 나중에 살펴보겠지만 닫힌 형식으로 기대값을 계산할 수 있습니다.

Fig. 18.1.4  Posterior samples, alongside posterior mean, which can be used for point predictions, in blue.

We may also want a representation of uncertainty, so we know how confident we should be in our predictions. Intuitively, we should have more uncertainty where there is more variability in the sample posterior functions, as this tells us there are many more possible values the true function could take. This type of uncertainty is called epistemic uncertainty, which is the reducible uncertainty associated with lack of information. As we acquire more data, this type of uncertainty disappears, as there will be increasingly fewer solutions consistent with what we observe. As with the posterior mean, we can compute the posterior variance (the variability of these functions in the posterior) in closed form. With shading, we show two times the posterior standard deviation on either side of the mean, creating a credible interval that has approximately a 95% probability of containing the true value of the function for any input x.

우리는 불확실성을 표현하기를 원할 수도 있으므로 예측에 얼마나 자신감을 가져야 하는지 알 수 있습니다. 직관적으로, 표본 사후 함수에 변동성이 더 많을수록 불확실성이 커집니다. 이는 실제 함수가 취할 수 있는 가능한 값이 더 많다는 것을 의미하기 때문입니다. 이러한 유형의 불확실성을 인식론적 불확실성이라고 하며, 이는 정보 부족과 관련된 환원 가능한 불확실성입니다. 더 많은 데이터를 수집할수록 우리가 관찰한 것과 일치하는 솔루션이 점점 줄어들기 때문에 이러한 유형의 불확실성은 사라집니다. 사후 평균과 마찬가지로 사후 분산(후방에서 이러한 함수의 변동성)을 닫힌 형식으로 계산할 수 있습니다. 음영을 사용하면 평균의 양쪽에 사후 표준 편차의 두 배를 표시하여 입력 x에 대한 함수의 실제 값을 포함할 확률이 95%인 신뢰할 수 있는 구간을 만듭니다.
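The link between "two standard deviations" and "95%" comes from the Gaussian distribution: the probability mass within k standard deviations of the mean is erf(k/√2). The short sketch below evaluates this for k = 2 and for the more exact k = 1.96; the function name `central_coverage` is made up for this illustration.

```python
import math

def central_coverage(num_std):
    """P(|Z| <= num_std) for a standard normal random variable Z."""
    return math.erf(num_std / math.sqrt(2.0))

coverage_2sd = central_coverage(2.0)
coverage_196 = central_coverage(1.96)
print(f"within 2.00 standard deviations: {coverage_2sd:.4f}")
print(f"within 1.96 standard deviations: {coverage_196:.4f}")
```

So a band of two standard deviations slightly over-covers a 95% interval, which is why it is a convenient rule of thumb.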

Fig. 18.1.5  Posterior samples, including 95% credible set.

The plot looks somewhat cleaner if we remove the posterior samples, simply visualizing the data, posterior mean, and 95% credible set. Notice how the uncertainty grows away from the data, a property of epistemic uncertainty.

사후 샘플을 제거하고 데이터, 사후 평균 및 95% 신뢰 세트를 시각화하면 플롯이 다소 더 깨끗해 보입니다. 인식론적 불확실성의 속성인 불확실성이 데이터에서 어떻게 멀어지는지 주목하세요.

Fig. 18.1.6  Point predictions, and credible set.

The properties of the Gaussian process that we used to fit the data are strongly controlled by what’s called a covariance function, also known as a kernel. The covariance function we used is called the RBF (Radial Basis Function) kernel, which has the form

$$k_{\mathrm{RBF}}(x,x') = \mathrm{Cov}(f(x), f(x')) = a^2 \exp\left(-\frac{1}{2\ell^2}\|x-x'\|^2\right)$$

데이터를 맞추는 데 사용한 가우스 프로세스의 속성은 커널이라고도 알려진 공분산 함수에 의해 강력하게 제어됩니다. 우리가 사용한 공분산 함수는 RBF(방사형 기초 함수) 커널이라고 하며 다음과 같은 형식을 갖습니다.

공분산 함수 (Covariance function) 란?

Covariance function (also known as a kernel function in machine learning) is a mathematical function used in statistics and machine learning to quantify the degree to which two random variables (or data points) vary together. It provides a measure of how correlated or related two variables are to each other.

공분산 함수 (머신 러닝에서는 커널 함수로도 알려져 있음)는 통계 및 머신 러닝에서 두 개의 랜덤 변수(또는 데이터 포인트)가 함께 얼마나 변하는지를 측정하는 수학적 함수입니다. 이것은 두 변수가 서로 어떻게 상관되거나 관련되어 있는지를 측정하는 데 사용됩니다.

In the context of statistics and Gaussian processes, the covariance function specifies the relationship between different points in a dataset. It defines how the value of one data point covaries or varies with the value of another data point. This information is crucial for modeling and understanding the underlying patterns and relationships within the data.

Covariance functions are often used in:

통계 및 가우시안 프로세스의 맥락에서 공분산 함수는 데이터 집합의 서로 다른 지점 간의 관계를 지정합니다. 이것은 하나의 데이터 포인트의 값이 다른 데이터 포인트의 값과 어떻게 공분산하거나 변하는지를 정의합니다. 이 정보는 데이터 내부의 기본적인 패턴과 관계를 모델링하고 이해하는 데 중요합니다.

공분산 함수는 다음과 같은 분야에서 사용됩니다.

Gaussian Processes: In Gaussian processes, the covariance function (kernel) is a fundamental component. It determines how the values of the function being modeled vary across different input points. Common covariance functions include the Radial Basis Function (RBF) kernel, Matérn kernel, and many others.

가우시안 프로세스: 가우시안 프로세스에서 공분산 함수(커널)는 기본 구성 요소입니다. 이것은 모델링하려는 함수의 값이 다른 입력 지점에서 어떻게 변하는지를 결정합니다. 일반적인 공분산 함수에는 방사형 기저 함수(RBF) 커널, Matérn 커널 및 기타 여러 가지가 있습니다.
Spatial Statistics: In geostatistics and spatial statistics, covariance functions are used to model spatial dependencies in data, such as the correlation between temperatures at different locations on a map.

공간 통계: 지오통계 및 공간 통계에서 공분산 함수는 데이터 내에서 공간적 종속성(지도상의 다른 위치에서 온도 간의 상관 관계와 같은)을 모델링하는 데 사용됩니다.
Machine Learning: In machine learning, covariance functions are used in various algorithms, especially in kernel methods, including Support Vector Machines (SVMs) and Gaussian Process Regression (GPR).

머신 러닝: 머신 러닝에서 공분산 함수는 서포트 벡터 머신(SVM) 및 가우시안 프로세스 회귀(GPR)를 포함한 다양한 알고리즘에서 사용됩니다.

The choice of covariance function or kernel function has a significant impact on the performance of a model. Different covariance functions capture different types of relationships in the data, such as smoothness, periodicity, or linearity. Adjusting the parameters of the covariance function allows for fine-tuning the model's behavior.

공분산 함수 또는 커널 함수의 선택은 모델의 성능에 큰 영향을 미칩니다. 서로 다른 공분산 함수는 데이터에서 서로 다른 유형의 관계(부드러움, 주기성, 직선성 등)를 포착하며 공분산 함수의 매개변수를 조정하면 모델의 동작을 세밀하게 조정할 수 있습니다.

The hyperparameters of this kernel are interpretable. The amplitude parameter 'a' controls the vertical scale over which the function is varying, and the length-scale parameter ℓ controls the rate of variation (the wiggliness) of the function. Larger 'a' means larger function values, and larger ℓ means more slowly varying functions. Let’s see what happens to our sample prior and posterior functions as we vary 'a' and ℓ.

이 커널의 하이퍼파라미터는 해석 가능합니다. 진폭 매개변수 'a'는 함수가 변하는 수직 스케일을 제어하고, 길이 스케일 매개변수 ℓ는 함수의 변동률(흔들림)을 제어합니다. 'a'가 클수록 함수 값이 크고, ℓ가 클수록 함수가 천천히 변한다는 의미입니다. 'a'와 ℓ를 변경하면 샘플 사전 및 사후 함수에 어떤 일이 발생하는지 살펴보겠습니다.

The length-scale has a particularly pronounced effect on the predictions and uncertainty of a GP. At $\|x-x'\| = \ell$, the covariance between a pair of function values is $a^2\exp(-0.5)$. At distances larger than $\ell$, the correlation between function values decays quickly toward zero. This means that if we want to make a prediction at a point $x_*$, then function values with inputs $x$ such that $\|x-x_*\| > \ell$ will not have a strong effect on our predictions.

길이 척도는 GP의 예측과 불확실성에 특히 뚜렷한 영향을 미칩니다. ||x−x′||=ℓ 에서 함수 값 쌍 사이의 공분산은 a**2exp⁡(−0.5)입니다. ℓ 보다 큰 거리에서는 함수 값의 상관관계가 거의 없어집니다. 이는 x* 지점에서 예측을 수행하려는 경우 ||x−x′||>ℓ와 같은 입력 x가 있는 함수 값이 예측에 큰 영향을 미치지 않음을 의미합니다.

Let’s see how changing the length-scale affects sample prior and posterior functions, and credible sets. The above fits use a length-scale of 2. Let’s now consider ℓ=0.1, 0.5, 2, 5, 10. A length-scale of 0.1 is very small relative to the range of the input domain we are considering, 25. For example, the values of the function at x=5 and x=10 will have essentially no correlation at such a length-scale. On the other hand, for a length-scale of 10, the function values at these inputs will be highly correlated. Note that the vertical scale changes in the following figures.

길이 척도를 변경하면 샘플 사전 및 사후 함수와 신뢰할 수 있는 세트에 어떤 영향을 미치는지 살펴보겠습니다. 위의 피팅은 길이 척도 2를 사용합니다. 이제 ℓ=0.1,0.5,2,5,10을 고려해 보겠습니다. 0.1의 길이 척도는 우리가 고려하고 있는 입력 영역의 범위인 25에 비해 매우 작습니다. 예를 들어 x=5와 x=10의 함수 값은 이러한 길이 척도에서 본질적으로 상관 관계가 없습니다. . 반면, 길이 척도가 10인 경우 이러한 입력의 함수 값은 높은 상관 관계를 갖습니다. 다음 그림에서는 수직 스케일이 변경됩니다.
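These statements about length-scale can be checked directly from the RBF formula a² exp(−d²/(2ℓ²)). The sketch below evaluates the correlation at a distance of exactly ℓ, and then the correlation between the function values at x=5 and x=10 for each of the length-scales listed above. It assumes unit amplitude, and the helper name `rbf_cov` is made up for this illustration.

```python
import numpy as np

def rbf_cov(dist, a=1.0, ls=1.0):
    """RBF covariance a^2 exp(-dist^2 / (2 ls^2)) as a function of distance."""
    return a ** 2 * np.exp(-np.asarray(dist) ** 2 / (2.0 * ls ** 2))

# At a distance equal to the length-scale, the correlation is exp(-0.5).
corr_at_ls = rbf_cov(2.0, a=1.0, ls=2.0)
print(f"correlation at distance ls: {corr_at_ls:.4f}")

# Correlation between f(5) and f(10) for each length-scale in the text.
lengthscales = np.array([0.1, 0.5, 2.0, 5.0, 10.0])
corr_5_10 = rbf_cov(abs(5.0 - 10.0), a=1.0, ls=lengthscales)
for ls, c in zip(lengthscales, corr_5_10):
    print(f"ls = {ls:>4}: corr(f(5), f(10)) = {c:.3g}")
```

For ℓ = 0.1 the correlation is numerically zero, while for ℓ = 10 it is close to 0.88, which matches the qualitative description in the text.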

Notice that as the length-scale increases, the ‘wiggliness’ of the functions decreases, and our uncertainty decreases. If the length-scale is small, the uncertainty will quickly increase as we move away from the data, as the datapoints become less informative about the function values.

길이 척도가 증가함에 따라 함수의 '흔들림'이 감소하고 불확실성이 감소합니다. 길이 척도가 작으면 데이터 포인트가 함수 값에 대한 정보를 덜 제공하므로 데이터에서 멀어질수록 불확실성이 빠르게 증가합니다.

Now, let’s vary the amplitude parameter, holding the length-scale fixed at 2. Note the vertical scale is held fixed for the prior samples, and varies for the posterior samples, so you can clearly see both the increasing scale of the function, and the fits to the data.

이제 길이 스케일을 2로 고정하여 진폭 매개변수를 변경해 보겠습니다. 수직 스케일은 이전 샘플에 대해 고정되어 있고 사후 샘플에 대해 달라집니다. 따라서 함수의 스케일 증가와 데이터에 적합합니다.

We see the amplitude parameter affects the scale of the function, but not the rate of variation. At this point, we also have the sense that the generalization performance of our procedure will depend on having reasonable values for these hyperparameters. Values of ℓ=2 and a=1 appeared to provide reasonable fits, while some of the other values did not. Fortunately, there is a robust and automatic way to specify these hyperparameters, using what is called the marginal likelihood, which we will return to in the notebook on inference.

진폭 매개변수는 함수의 규모에 영향을 주지만 변동률에는 영향을 미치지 않습니다. 이 시점에서 우리는 프로시저의 일반화 성능이 이러한 하이퍼파라미터에 대한 합리적인 값을 갖는지에 달려 있다는 것을 알게 되었습니다. ℓ=2 및 a=1 값은 합리적인 적합치를 제공하는 것으로 보였지만 일부 다른 값은 그렇지 않았습니다. 다행스럽게도 한계 우도(Marginal Likelihood)를 사용하여 이러한 하이퍼파라미터를 지정하는 강력하고 자동적인 방법이 있습니다. 이 방법은 추론에 대한 노트북에서 다시 설명하겠습니다.
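One way to see that the amplitude changes the scale but not the rate of variation is to normalize the covariance matrix into a correlation matrix: the amplitude cancels out. The sketch below does this for two amplitudes at a fixed length-scale of 2. The input grid and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(x1, x2, a=1.0, ls=2.0):
    """RBF kernel a^2 exp(-(x - x')^2 / (2 ls^2)) for 1-D inputs."""
    return a ** 2 * np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2.0 * ls ** 2))

def to_correlation(K):
    """Divide each covariance entry by the product of the two standard deviations."""
    std = np.sqrt(np.diag(K))
    return K / np.outer(std, std)

x = np.linspace(0.0, 25.0, 6)
K_small = rbf_kernel(x, x, a=1.0)
K_large = rbf_kernel(x, x, a=4.0)

print("prior variance with a=1:", np.diag(K_small)[0])
print("prior variance with a=4:", np.diag(K_large)[0])
print("same correlation structure:",
      np.allclose(to_correlation(K_small), to_correlation(K_large)))
```

The prior variance of each function value is a², while the correlation between function values is set by the length-scale alone.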

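So what is a GP, really? As we said at the start, a GP simply says that any collection of function values $f(x_1),\dots,f(x_n)$, indexed by any collection of inputs $x_1,\dots,x_n$, has a joint multivariate Gaussian distribution. The mean vector $\mu$ of this distribution is given by a mean function, which is typically taken to be a constant or zero. The covariance matrix of this distribution is given by the kernel evaluated at all pairs of the inputs $x$.

$$\begin{bmatrix} f(x) \\ f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix} \sim \mathcal{N}\left(\mu, \begin{bmatrix} k(x,x) & k(x,x_1) & \dots & k(x,x_n) \\ k(x_1,x) & k(x_1,x_1) & \dots & k(x_1,x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n,x) & k(x_n,x_1) & \dots & k(x_n,x_n) \end{bmatrix}\right) \tag{18.1.2}$$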

그렇다면 GP란 무엇일까요? 우리가 시작했을 때, GP는 단순히 입력 x1,…,xn의 컬렉션에 의해 인덱스된 함수 값 f(x1),…,f(xn)의 컬렉션이 결합 다변량 가우스 분포를 갖는다고 말합니다. 이 분포의 평균 벡터 μ는 평균 함수로 제공되며 일반적으로 상수 또는 0으로 간주됩니다. 이 분포의 공분산 행렬은 모든 입력 x 쌍에서 평가된 커널에 의해 제공됩니다.

Equation (18.1.2) specifies a GP prior. We can compute the conditional distribution of f(x) for any x given f(x1),…,f(xn), the function values we have observed. This conditional distribution is called the posterior, and it is what we use to make predictions.

식 (18.1.2)은 GP 사전을 지정합니다. 우리가 관찰한 함수 값 f(x1),…,f(xn)이 주어지면 임의의 x에 대해 f(x)의 조건부 분포를 계산할 수 있습니다. 이 조건부 분포를 사후 분포라고 하며 예측을 위해 사용합니다.

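In particular,

$$f(x) \mid f(x_1),\dots,f(x_n) \sim \mathcal{N}(m, s^2)$$

$$m = k(x, x_{1:n})\, k(x_{1:n}, x_{1:n})^{-1} f(x_{1:n})$$

$$s^2 = k(x,x) - k(x, x_{1:n})\, k(x_{1:n}, x_{1:n})^{-1} k(x, x_{1:n})^\top$$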

where $k(x,x_{1:n})$ is a $1 \times n$ vector formed by evaluating $k(x,x_i)$ for $i=1,\dots,n$, and $k(x_{1:n},x_{1:n})$ is an $n \times n$ matrix formed by evaluating $k(x_i,x_j)$ for $i,j=1,\dots,n$. $m$ is what we can use as a point predictor for any $x$, and $s^2$ is what we use for uncertainty: if we want to create an interval with a 95% probability that $f(x)$ is in the interval, we would use $m \pm 2s$. The predictive means and uncertainties for all the above figures were created using these equations. The observed data points were given by $f(x_1),\dots,f(x_n)$, and we chose a fine-grained set of $x$ points at which to make predictions.

여기서 k(x,x1:n)은 i=1,…,n에 대해 k(x,xi)를 평가하여 형성된 1×n 벡터이고 k(x1:n,x1:n)은 형성된 n×n 행렬입니다. i,j=1,…,n에 대해 k(xi,xj)를 평가합니다. m은 임의의 x에 대한 점 예측자로 사용할 수 있는 것이고, s**2는 불확실성에 사용하는 것입니다. f(x)가 구간에 있을 확률이 95%인 구간을 생성하려면 다음과 같이 합니다. m±2초를 사용하세요. 위의 모든 수치에 대한 예측 평균과 불확실성은 이러한 방정식을 사용하여 생성되었습니다. 관찰된 데이터 포인트는 f(x1),…,f(xn)으로 제공되었으며 예측을 위해 세밀한 x 포인트 세트를 선택했습니다.
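The predictive mean m and variance s² described here can be sketched in a few lines of NumPy. The example below assumes a zero prior mean, an RBF kernel with a = 1 and ℓ = 2, noise-free observations, and a tiny diagonal jitter for numerical stability. The data values and the function name `gp_posterior` are made up for illustration and are not the dataset from the figures.

```python
import numpy as np

def rbf_kernel(x1, x2, a=1.0, ls=2.0):
    """RBF kernel a^2 exp(-(x - x')^2 / (2 ls^2)) for 1-D inputs."""
    return a ** 2 * np.exp(-(x1[:, None] - x2[None, :]) ** 2 / (2.0 * ls ** 2))

def gp_posterior(x_query, x_obs, f_obs, kernel, jitter=1e-10):
    """Posterior mean m and variance s^2 at x_query given noise-free f_obs."""
    K_oo = kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_qo = kernel(x_query, x_obs)
    # Solve linear systems rather than forming the explicit inverse.
    m = K_qo @ np.linalg.solve(K_oo, f_obs)
    v = np.linalg.solve(K_oo, K_qo.T)
    s2 = np.diag(kernel(x_query, x_query)) - np.sum(K_qo * v.T, axis=1)
    return m, np.maximum(s2, 0.0)  # clip tiny negative round-off

x_obs = np.array([1.0, 4.0, 9.0, 16.0])  # made-up inputs
f_obs = np.sin(x_obs / 3.0)              # made-up noise-free observations
x_query = np.linspace(0.0, 25.0, 101)    # fine grid of prediction points

m, s2 = gp_posterior(x_query, x_obs, f_obs, rbf_kernel)
lower, upper = m - 2.0 * np.sqrt(s2), m + 2.0 * np.sqrt(s2)  # approx. 95% band
```

At the observed inputs the mean reproduces the observations and the variance collapses to roughly zero, while far from the data the variance returns to the prior value a².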

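Let’s suppose we observe a single datapoint, $f(x_1)$, and we want to determine the value of $f(x)$ at some $x$. Because $f(x)$ is described by a Gaussian process, we know the joint distribution over $(f(x), f(x_1))$ is Gaussian:

$$\begin{bmatrix} f(x) \\ f(x_1) \end{bmatrix} \sim \mathcal{N}\left(\mu, \begin{bmatrix} k(x,x) & k(x,x_1) \\ k(x_1,x) & k(x_1,x_1) \end{bmatrix}\right)$$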

단일 데이터 포인트 f(x1)를 관찰하고 일부 x에서 f(x)의 값을 결정한다고 가정해 보겠습니다. f(x)는 가우스 프로세스로 설명되므로 (f(x),f(x1))에 대한 결합 분포가 가우스임을 알 수 있습니다.

The off-diagonal expression k(x,x1)=k(x1,x) tells us how correlated the function values will be — how strongly determined f(x) will be from f(x1). We have seen already that if we use a large length-scale, relative to the distance between x and x1, ||x−x1||, then the function values will be highly correlated. We can visualize the process of determining f(x) from f(x1) both in the space of functions, and in the joint distribution over f(x1),f(x). Let’s initially consider an x such that k(x,x1)=0.9, and k(x,x)=1, meaning that the value of f(x) is moderately correlated with the value of f(x1). In the joint distribution, the contours of constant probability will be relatively narrow ellipses.

비대각선 표현식 k(x,x1)=k(x1,x)는 함수 값의 상관 관계, 즉 f(x)가 f(x1)에서 얼마나 강력하게 결정되는지 알려줍니다. 우리는 x와 x1 사이의 거리(||x−x1||)에 상대적으로 큰 길이 척도를 사용하면 함수 값의 상관 관계가 높다는 것을 이미 확인했습니다. 함수 공간과 f(x1),f(x)에 대한 결합 분포 모두에서 f(x1)에서 f(x)를 결정하는 프로세스를 시각화할 수 있습니다. 처음에 k(x,x1)=0.9, k(x,x)=1인 x를 고려해 보겠습니다. 이는 f(x) 값이 f(x1) 값과 중간 정도의 상관 관계가 있음을 의미합니다. 결합 분포에서 일정한 확률의 윤곽은 상대적으로 좁은 타원이 됩니다.

Suppose we observe f(x1)=1.2. To condition on this value of f(x1), we can draw a horizontal line at 1.2 on our plot of the density, and see that the value of f(x) is mostly constrained to [0.64,1.52]. We have also drawn this plot in function space, showing the observed point f(x1) in orange, and 1 standard deviation of the Gaussian process predictive distribution for f(x) in blue, about the mean value of 1.08.

f(x1)=1.2를 관찰한다고 가정합니다. 이 f(x1) 값을 조건으로 하기 위해 밀도 플롯에서 1.2에 수평선을 그릴 수 있으며 f(x) 값이 대부분 [0.64,1.52]로 제한되어 있음을 알 수 있습니다. 우리는 또한 이 플롯을 함수 공간에 그려서 관찰된 점 f(x1)을 주황색으로 표시하고 f(x)에 대한 가우스 프로세스 예측 분포의 1 표준 편차를 파란색으로 표시했습니다(평균값 약 1.08).

Now suppose we have a stronger correlation, k(x,x1)=0.95. Now the ellipses have narrowed further, and the value of f(x) is even more strongly determined by f(x1). Drawing a horizontal line at 1.2, we see the contours for f(x) support values mostly within [0.83,1.45]. Again, we also show the plot in function space, with one standard deviation about the mean predictive value of 1.14.

이제 k(x,x1)=0.95라는 더 강한 상관관계가 있다고 가정합니다. 이제 타원은 더 좁아졌고 f(x)의 값은 f(x1)에 의해 훨씬 더 강력하게 결정됩니다. 1.2에서 수평선을 그리면 f(x) 지원 값의 윤곽이 대부분 [0.83,1.45] 내에 있는 것을 볼 수 있습니다. 다시 말하지만, 평균 예측 값 1.14에 대한 표준 편차가 1인 함수 공간의 플롯도 표시됩니다.

We see that the posterior mean predictor of our Gaussian process is closer to 1.2, because there is now a stronger correlation. We also see that our uncertainty (the error bars) has somewhat decreased. Despite the strong correlation between these function values, our uncertainty is still rightly quite large, because we have only observed a single data point!

이제 더 강한 상관관계가 있기 때문에 가우스 프로세스의 사후 평균 예측 변수가 1.2에 더 가깝다는 것을 알 수 있습니다. 또한 불확실성(오차 막대)이 다소 감소한 것을 볼 수 있습니다. 이러한 함수 값 사이의 강한 상관 관계에도 불구하고 우리는 단 하나의 데이터 포인트만 관찰했기 때문에 불확실성은 여전히 상당히 큽니다!
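
The intervals quoted above can be checked by hand. Assuming a zero-mean prior with k(x,x)=k(x1,x1)=1 (an assumption, but one consistent with the quoted means 1.08 and 1.14), conditioning a bivariate Gaussian on f(x1) gives mean k(x,x1)·f(x1) and variance 1 − k(x,x1)². The helper name `condition_on_one_point` is made up for this sketch.

```python
import numpy as np

def condition_on_one_point(k_cross, f_x1, k_xx=1.0, k_x1x1=1.0):
    """Mean and standard deviation of f(x) | f(x1), zero-mean bivariate Gaussian."""
    mean = k_cross / k_x1x1 * f_x1
    var = k_xx - k_cross**2 / k_x1x1
    return mean, np.sqrt(var)

for k_cross in (0.9, 0.95):
    mean, std = condition_on_one_point(k_cross, f_x1=1.2)
    print(f"k={k_cross}: mean={mean:.2f}, "
          f"1-sd interval=[{mean - std:.2f}, {mean + std:.2f}]")
```

Under these assumptions the one-standard-deviation intervals come out as [0.64, 1.52] and [0.83, 1.45], which matches the ranges read off the plots.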

This procedure can give us a posterior on f(x) for any x, for any number of points we have observed. Suppose we observe f(x1),f(x2). We now visualize the posterior for f(x) at a particular x=x′ in function space. The exact distribution for f(x) is given by the above equations. f(x) is Gaussian distributed, with mean

이 절차는 우리가 관찰한 임의의 수의 점에 대해 임의의 x에 대한 f(x)의 사후값을 제공할 수 있습니다. f(x1),f(x2)를 관찰한다고 가정합니다. 이제 함수 공간의 특정 x=x′에서 f(x)의 사후를 시각화합니다. f(x)의 정확한 분포는 위 방정식으로 제공됩니다. f(x)는 평균을 갖는 가우스 분포입니다.

and variance 및 분산

In this introductory notebook, we have been considering noise-free observations. As we will see, it is easy to include observation noise. If we assume that the data are generated from a latent noise-free function f(x) plus i.i.d. Gaussian noise ϵ(x)∼N(0,σ²) with variance σ², then our covariance function simply becomes k(xi,xj)→k(xi,xj)+δijσ², where δij=1 if i=j and 0 otherwise.

이 소개 노트에서는 잡음 없는 관찰을 고려했습니다. 앞으로 살펴보겠지만 관찰 노이즈를 포함하는 것은 쉽습니다. 데이터가 잠재 잡음 없는 함수 f(x)와 분산 σ**2를 갖는 iid 가우스 잡음 ϵ(x)∼N(0,σ**2)에서 생성된다고 가정하면 공분산 함수는 간단히 k가 됩니다. (xi,xj)→k(xi,xj)+δijσ**2, 여기서 i=j이면 δif=1이고 그렇지 않으면 0입니다.
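
The predictive mean and variance discussed above, together with the noise term, can be sketched in a few lines of NumPy. This assumes a zero-mean prior and an RBF kernel with unit amplitude, for which the standard conditioning formulas are m = k(x,X)[K+σ²I]⁻¹f and v = k(x,x) − k(x,X)[K+σ²I]⁻¹k(X,x). The function names, the input locations and the observed values are all hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """RBF kernel matrix between 1-D input arrays a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * diff**2 / length_scale**2)

def gp_posterior(x_query, x_obs, f_obs, noise_var=0.0, length_scale=1.0):
    """Predictive mean and variance of a zero-mean GP at x_query."""
    # Observation noise only adds sigma^2 to the diagonal of the kernel matrix.
    K = rbf(x_obs, x_obs, length_scale) + noise_var * np.eye(len(x_obs))
    k_star = rbf(x_query, x_obs, length_scale)
    # Solve linear systems instead of forming the explicit inverse.
    mean = k_star @ np.linalg.solve(K, f_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, var

x_obs = np.array([0.0, 1.0])   # hypothetical inputs x1, x2
f_obs = np.array([1.2, 1.4])   # hypothetical observed values
x_query = np.array([0.5])

for noise_var in (0.0, 0.1):
    mean, var = gp_posterior(x_query, x_obs, f_obs, noise_var)
    print(f"noise_var={noise_var}: mean={mean[0]:.3f}, var={var[0]:.3f}")
```

With `noise_var=0.0` the posterior passes exactly through the observed points; a positive `noise_var` leaves some uncertainty even at the observed inputs.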

We have already started getting some intuition about how we can use a Gaussian process to specify a prior and posterior over solutions, and how the kernel function affects the properties of these solutions. In the following notebooks, we will precisely show how to specify a Gaussian process prior, introduce and derive various kernel functions, and then go through the mechanics of how to automatically learn kernel hyperparameters, and form a Gaussian process posterior to make predictions. While it takes time and practice to get used to concepts such as “distributions over functions”, the actual mechanics of finding the GP predictive equations are quite simple, which makes it easy to get the practice needed to form an intuitive understanding of these concepts.

우리는 이미 가우스 프로세스를 사용하여 솔루션에 대한 사전 및 사후를 지정하는 방법과 커널 기능이 이러한 솔루션의 속성에 어떻게 영향을 미치는지에 대한 직관을 얻기 시작했습니다. 다음 노트에서는 사전에 가우스 프로세스를 지정하는 방법, 다양한 커널 함수를 소개 및 도출하는 방법, 그리고 커널 하이퍼파라미터를 자동으로 학습하는 방법, 사후에 가우스 프로세스를 형성하여 예측하는 방법에 대한 메커니즘을 자세히 보여줍니다. "함수에 대한 분포"와 같은 개념에 익숙해지려면 시간과 연습이 필요하지만 GP 예측 방정식을 찾는 실제 메커니즘은 실제로 매우 간단하므로 이러한 개념을 직관적으로 이해하기 위한 연습을 쉽게 할 수 있습니다.

18.1.1. Summary

In typical machine learning, we specify a function with some free parameters (such as a neural network and its weights), and we focus on estimating those parameters, which may not be interpretable. With a Gaussian process, we instead reason about distributions over functions directly, which enables us to reason about the high-level properties of the solutions. These properties are controlled by a covariance function (kernel), which often has a few highly interpretable hyperparameters. These hyperparameters include the length-scale, which controls how rapidly the functions vary (how wiggly they are). Another hyperparameter is the amplitude, which controls the vertical scale over which our functions are varying. Representing many different functions that can fit the data, and combining them all together into a predictive distribution, is a distinctive feature of Bayesian methods. Because there is a greater amount of variability between possible solutions far away from the data, our uncertainty intuitively grows as we move away from the data.

일반적인 기계 학습에서는 일부 자유 매개변수(예: 신경망 및 해당 가중치)를 사용하여 함수를 지정하고 해석할 수 없는 이러한 매개변수를 추정하는 데 중점을 둡니다. 가우스 프로세스를 사용하면 함수에 대한 분포를 직접 추론할 수 있으므로 솔루션의 상위 수준 속성을 추론할 수 있습니다. 이러한 속성은 해석 가능한 몇 가지 하이퍼 매개변수가 있는 공분산 함수(커널)에 의해 제어됩니다. 이러한 하이퍼파라미터에는 함수의 속도(얼마나 흔들리는지)를 제어하는 길이 척도가 포함됩니다. 또 다른 하이퍼파라미터는 진폭으로, 함수가 변하는 수직 스케일을 제어합니다. 데이터를 맞출 수 있는 다양한 함수를 표현하고 이를 모두 예측 분포로 결합하는 것은 베이지안 방법의 독특한 특징입니다. 데이터에서 멀리 떨어져 있는 가능한 솔루션 간에는 더 큰 변동성이 있기 때문에 데이터에서 이동할 때 불확실성이 직관적으로 커집니다.

A Gaussian process represents a distribution over functions by specifying a multivariate normal (Gaussian) distribution over all possible function values. It is possible to easily manipulate Gaussian distributions to find the distribution of one function value based on the values of any set of other values. In other words, if we observe a set of points, then we can condition on these points and infer a distribution over what the value of the function might look like at any other input. How we model the correlations between these points is determined by the covariance function and is what defines the generalization properties of the Gaussian process. While it takes time to get used to Gaussian processes, they are easy to work with, have many applications, and help us understand and develop other model classes, like neural networks.

가우스 프로세스는 가능한 모든 함수 값에 대해 다변량 정규(가우스) 분포를 지정하여 함수에 대한 분포를 나타냅니다. 가우스 분포를 쉽게 조작하여 다른 값 세트의 값을 기반으로 한 함수 값의 분포를 찾는 것이 가능합니다. 즉, 일련의 점을 관찰하면 이러한 점을 조건으로 하고 다른 입력에서 함수 값이 어떻게 보일지에 대한 분포를 추론할 수 있습니다. 이러한 점 간의 상관 관계를 모델링하는 방법은 공분산 함수에 의해 결정되며 이것이 가우스 프로세스의 일반화 속성을 정의합니다. 가우스 프로세스에 익숙해지는 데는 시간이 걸리지만 작업하기 쉽고 응용 프로그램이 많으며 신경망과 같은 다른 모델 클래스를 이해하고 개발하는 데 도움이 됩니다.

18.1.2. Exercises

1. What is the difference between epistemic uncertainty versus observation uncertainty?
2. Besides rate of variation and amplitude, what other properties of functions might we want to consider, and what would be real-world examples of functions that have those properties?
3. The RBF covariance function we considered says that covariances (and correlations) between observations decrease with their distance in the input space (times, spatial locations, etc.). Is this a reasonable assumption? Why or why not?
4. Is a sum of two Gaussian variables Gaussian? Is a product of two Gaussian variables Gaussian? If (a,b) have a joint Gaussian distribution, is a|b (a given b) Gaussian? Is a Gaussian?

6. Do you think increasing our estimate of observation noise would increase or decrease our estimate of the length-scale of the ground truth function?

7. As we move away from the data, suppose the uncertainty in our predictive distribution increases to a point, then stops increasing. Why might that happen?


D2L - 18. Gaussian Processes

2023. 9. 9. 04:15 | Posted by 솔웅

https://d2l.ai/chapter_gaussian-processes/index.html


18. Gaussian Processes

Andrew Gordon Wilson (New York University and Amazon)

Gaussian processes (GPs) are ubiquitous. You have already encountered many examples of GPs without realizing it. Any model that is linear in its parameters with a Gaussian distribution over the parameters is a Gaussian process. This class spans discrete models, including random walks and autoregressive processes, as well as continuous models, including Bayesian linear regression models, polynomials, Fourier series, radial basis functions, and even neural networks with an infinite number of hidden units. There is a running joke that “everything is a special case of a Gaussian process”.

가우스 프로세스(GP)는 어디에나 있습니다. 당신은 이미 깨닫지 못한 채 GP의 많은 예를 접했습니다. 매개변수에 대한 가우스 분포를 갖는 매개변수가 선형인 모든 모델은 가우스 프로세스입니다. 이 클래스는 베이지안 선형 회귀 모델, 다항식, 푸리에 급수, 방사형 기저 함수 및 무한한 수의 숨겨진 단위가 있는 신경망을 포함한 연속 모델뿐만 아니라 랜덤 워크 및 자동 회귀 프로세스를 포함한 이산 모델을 포괄합니다. "모든 것은 가우스 과정의 특별한 경우이다"라는 농담이 있습니다.

Learning about Gaussian processes is important for three reasons: (1) they provide a function space perspective of modelling, which makes understanding a variety of model classes, including deep neural networks, much more approachable; (2) they have an extraordinary range of applications where they are state-of-the-art, including active learning, hyperparameter learning, auto-ML, and spatiotemporal regression; (3) over the last few years, algorithmic advances have made Gaussian processes increasingly scalable and relevant, harmonizing with deep learning through frameworks such as GPyTorch (Gardner et al., 2018). Indeed, GPs and deep neural networks are not competing approaches, but highly complementary, and can be combined to great effect. These algorithmic advances are not just relevant to Gaussian processes, but provide a foundation in numerical methods that is broadly useful in deep learning.

가우스 프로세스에 대해 배우는 것은 세 가지 이유로 중요합니다. (1) 모델링의 함수 공간 관점을 제공하여 심층 신경망을 포함한 다양한 모델 클래스를 훨씬 더 쉽게 이해할 수 있습니다. (2) 능동 학습, 하이퍼파라미터 학습, 자동 ML 및 시공간 회귀를 포함하여 최첨단 애플리케이션에 적용할 수 있는 범위가 넓습니다. (3) 지난 몇 년 동안 알고리즘의 발전으로 인해 GPyTorch(Gardner et al., 2018)와 같은 프레임워크를 통해 딥 러닝과 조화를 이루면서 가우스 프로세스의 확장성과 관련성이 점점 더 높아졌습니다. 실제로 GP와 심층 신경망은 경쟁적인 접근 방식이 아니라 매우 상호보완적이며 결합하여 큰 효과를 낼 수 있습니다. 이러한 알고리즘의 발전은 가우스 프로세스에만 관련된 것이 아니라 딥 러닝에서 광범위하게 유용한 수치적 방법의 기초를 제공합니다.

In this chapter, we introduce Gaussian processes. In the introductory notebook, we start by reasoning intuitively about what Gaussian processes are and how they directly model functions. In the priors notebook, we focus on how to specify Gaussian process priors. We directly connect the traditional weight-space approach to modelling to function space, which will help us reason about constructing and understanding machine learning models, including deep neural networks. We then introduce popular covariance functions, also known as kernels, which control the generalization properties of a Gaussian process. A GP with a given kernel defines a prior over functions. In the inference notebook, we will show how to use data to infer a posterior, in order to make predictions. This notebook contains from-scratch code for making predictions with a Gaussian process, as well as an introduction to GPyTorch. In upcoming notebooks, we will introduce the numerics behind Gaussian processes, which is useful for scaling Gaussian processes but also a powerful general foundation for deep learning, and advanced use-cases such as hyperparameter tuning in deep learning. Our examples will make use of GPyTorch, which makes Gaussian processes scale, and is closely integrated with deep learning functionality and PyTorch.

이번 장에서는 가우스 프로세스를 소개합니다. 입문 노트에서는 가우스 프로세스가 무엇인지, 그리고 이것이 기능을 직접 모델링하는 방법에 대해 직관적으로 추론하는 것부터 시작합니다. 사전 노트북에서는 가우스 프로세스 사전을 지정하는 방법에 중점을 둡니다. 우리는 모델링에 대한 전통적인 가중치 공간 접근 방식을 함수 공간에 직접 연결합니다. 이는 심층 신경망을 포함한 기계 학습 모델을 구성하고 이해하는 데 도움이 될 것입니다. 그런 다음 가우스 프로세스의 일반화 속성을 제어하는 커널이라고도 알려진 널리 사용되는 공분산 함수를 소개합니다. 주어진 커널을 가진 GP는 기능에 대한 사전 정의를 정의합니다. 추론 노트북에서는 예측을 위해 데이터를 사용하여 사후 추론을 수행하는 방법을 보여줍니다. 이 노트북에는 GPyTorch에 대한 소개뿐만 아니라 가우스 프로세스로 예측을 수행하기 위한 처음부터 시작하는 코드가 포함되어 있습니다. 다가오는 노트북에서는 가우스 프로세스를 확장하는 데 유용할 뿐만 아니라 딥 러닝을 위한 강력한 일반 기반이자 딥 러닝의 하이퍼파라미터 튜닝과 같은 고급 사용 사례인 가우스 프로세스 이면의 수치를 소개할 것입니다. 우리의 예에서는 가우시안 프로세스를 확장하고 딥 러닝 기능 및 PyTorch와 긴밀하게 통합되는 GPyTorch를 사용합니다.

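What Are Gaussian Processes?

A Gaussian process (GP) is a powerful tool in probabilistic modeling and machine learning. It places a probability distribution over functions and is used to make predictions with uncertainty, mainly in regression and Bayesian optimization.

A Gaussian process is defined as a collection of random function values, any finite subset of which follows a joint Gaussian distribution. Because the input can be any point in the domain, a GP describes the function at infinitely many inputs at once, yet conditioning on observed data only ever involves finitely many of them. Gaussian processes have the following characteristics:

Probabilistic model: A Gaussian process models a probability distribution over functions, and each function value is treated as a random variable. This makes it possible to produce predictions that include uncertainty.
Parameters of the distribution: A Gaussian process is specified by a mean function (commonly taken to be 0) and a covariance function (kernel). These functions determine which kinds of functions the GP favors and how well it generalizes.
Smoothness and flexibility: With commonly used kernels such as the RBF kernel, a Gaussian process models continuous, smooth functions of the input, and adding an observation-noise term lets it fit noisy data. These properties are especially useful in regression.
Probabilistic prediction: A Gaussian process returns a predictive distribution rather than a single value, so the uncertainty of a prediction can be measured and taken into account when making decisions.

Gaussian processes are applied to regression, classification, optimization, simulation and more, and are also used for Bayesian hyperparameter optimization and function approximation. They work particularly well on small datasets; exact inference scales cubically with the number of data points, so large datasets rely on approximate or scalable methods such as those implemented in GPyTorch.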


D2L - 17.3. Q-Learning

2023. 9. 5. 12:45 | Posted by 솔웅

https://d2l.ai/chapter_reinforcement-learning/qlearning.html


17.3. Q-Learning

In the previous section, we discussed the Value Iteration algorithm which requires accessing the complete Markov decision process (MDP), e.g., the transition and reward functions. In this section, we will look at Q-Learning (Watkins and Dayan, 1992) which is an algorithm to learn the value function without necessarily knowing the MDP. This algorithm embodies the central idea behind reinforcement learning: it will enable the robot to obtain its own data.

이전 섹션에서는 전체 마르코프 결정 프로세스(MDP)(예: 전환 및 보상 기능)에 액세스해야 하는 Value Iteration 알고리즘에 대해 논의했습니다. 이번 절에서는 MDP를 알지 못하는 상태에서 가치함수를 학습하는 알고리즘인 Q-Learning(Watkins and Dayan, 1992)을 살펴보겠습니다. 이 알고리즘은 강화 학습의 핵심 아이디어를 구현합니다. 이를 통해 로봇은 자체 데이터를 얻을 수 있습니다.

What Is Q-Learning?

Q-Learning is one of the algorithms used in reinforcement learning. It lets an agent learn the optimal behavior while taking future rewards into account.

The main idea of Q-Learning is to learn a 'Q-value', or action-value, function. The Q-value of a state and an action is the expected sum of discounted rewards obtained by choosing that action in that state and acting optimally afterwards. The Q-value is learned with the following update rule:

Q(s, a) ← (1 - α) * Q(s, a) + α * (R + γ * max_a' Q(s', a'))

where:

Q(s, a): the Q-value of choosing action a in state s.
α (alpha): the learning rate, which determines how the current value and the new target are weighted when the Q-value is updated.
R: the reward obtained by performing the action in the current state.
γ (gamma): the discount factor, which determines how much future rewards are worth relative to immediate rewards.
max_a' Q(s', a'): the largest Q-value over all actions a' available in the next state s'.

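The main characteristics of the Q-Learning algorithm are:

Model-free: It learns directly from experience, without prior knowledge of the environment's transition and reward functions.
Off-policy: The policy used to collect data (for example, an ε-greedy exploration policy) can differ from the greedy policy whose value is being learned.
Balance between exploration and exploitation: Exploration lets the agent discover better actions, while exploitation makes use of the knowledge learned so far.

Q-Learning is one of the classic reinforcement learning algorithms and is used in a wide range of applications, especially games and robot control. It is also a good way to understand the core concepts of reinforcement learning.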

17.3.1. The Q-Learning Algorithm

Recall that value iteration for the action-value function in Value Iteration corresponds to the update

Value Iteration에서 action-value function에 대한 value iteration은 업데이트에 해당한다는 점을 기억하세요.

As we discussed, implementing this algorithm requires knowing the MDP, specifically the transition function P(s′∣s,a). The key idea behind Q-Learning is to replace the summation over all s′∈S in the above expression by a summation over the states visited by the robot. This allows us to sidestep the need to know the transition function.

논의한 대로 이 알고리즘을 구현하려면 MDP, 특히 전이 함수 P(s′∣s,a)를 알아야 합니다. Q-Learning의 핵심 아이디어는 위 표현식의 모든 s′∈S에 대한 합산을 로봇이 방문한 상태에 대한 합산으로 대체하는 것입니다. 이를 통해 우리는 전환 함수를 알아야 할 필요성을 뒤집을 수 있습니다.
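
For reference, the value-iteration backup for the action-value function has the standard form Q_{k+1}(s,a) = r(s,a) + γ Σ_{s′} P(s′∣s,a) max_{a′} Q_k(s′,a′); it is written out here as an assumption, since the equation is not reproduced above. The sketch below contrasts this exact backup with an average over sampled next states. The two-state MDP, its numbers and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95
# Hypothetical MDP with 2 states and 2 actions: P[s, a, s'] and r[s, a].
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[0.0, 1.0], [0.5, 0.0]])
Q = np.array([[1.0, 2.0], [0.0, 3.0]])  # some current estimate Q_k

s, a = 0, 0
# Exact backup: needs the transition function P.
exact = r[s, a] + gamma * P[s, a] @ Q.max(axis=1)

# Sample-based backup: only needs next states observed by acting.
# (P is used here solely to simulate the environment's response.)
next_states = rng.choice(2, size=20000, p=P[s, a])
sampled = r[s, a] + gamma * Q[next_states].max(axis=1).mean()

print(exact, sampled)
```

With enough visits the sampled average approaches the exact backup, which is why the robot's own data can stand in for P(s′∣s,a).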

17.3.2. An Optimization Problem Underlying Q-Learning

Let us imagine that the robot uses a policy πe(a∣s) to take actions. Just like the previous chapter, it collects a dataset of n trajectories of T timesteps each, $\{(s_t^i, a_t^i)_{t=0,\ldots,T-1}\}_{i=1,\ldots,n}$. Recall that value iteration is really a set of constraints that ties together the action-values $Q^*(s,a)$ of different states and actions. We can implement an approximate version of value iteration using the data that the robot has collected using πe as

로봇이 정책 πe(a∣s)를 사용하여 조치를 취한다고 가정해 보겠습니다. 이전 장과 마찬가지로 각 {(s**i t,a**i t)t=0,…,T−1}i=1,…,n T 시간 단계의 n 궤적 데이터 세트를 수집합니다. value iteration은 실제로 서로 다른 상태와 동작의 동작 값 Q*(s,a)를 서로 연결하는 제약 조건 집합이라는 점을 기억하세요. 로봇이 πe를 사용하여 수집한 데이터를 사용하여 대략적인 value iteration 버전을 구현할 수 있습니다.

Let us first observe the similarities and differences between this expression and value iteration above. If the robot’s policy πe were equal to the optimal policy π*, and if it collected an infinite amount of data, then this optimization problem would be identical to the optimization problem underlying value iteration. But while value iteration requires us to know P(s′∣s,a), the optimization objective does not have this term. We have not cheated: as the robot uses the policy πe to take an action $a_t^i$ at state $s_t^i$, the next state $s_{t+1}^i$ is a sample drawn from the transition function. So the optimization objective also has access to the transition function, but implicitly in terms of the data collected by the robot.

먼저 이 표현과 위의 값 반복 간의 유사점과 차이점을 살펴보겠습니다. 로봇의 정책 πe가 최적 정책 π*와 같고 무한한 양의 데이터를 수집했다면 이 최적화 문제는 가치 반복의 기본이 되는 최적화 문제와 동일할 것입니다. 그러나 값 반복을 위해서는 P(s′∣s,a)를 알아야 하지만 최적화 목적에는 이 항이 없습니다. 우리는 속이지 않았습니다. 로봇이 상태 s**i t에서 a**i t 작업을 수행하기 위해 정책 πe를 사용하므로 다음 상태 s**i t+1은 전이 함수에서 가져온 샘플입니다. 따라서 최적화 목표도 전환 기능에 액세스할 수 있지만 암시적으로 로봇이 수집한 데이터 측면에서 접근할 수 있습니다.

The variables of our optimization problem are Q(s,a) for all s∈S and a∈A. We can minimize the objective using gradient descent. For every pair $(s_t^i, a_t^i)$ in our dataset, we can write

최적화 문제의 변수는 모든 s∈S 및 a∈A에 대한 Q(s,a)입니다. 경사 하강을 사용하여 목표를 최소화할 수 있습니다. 데이터 세트의 모든 쌍(s**i t,a**i t)에 대해 다음과 같이 쓸 수 있습니다.

where α is the learning rate. Typically in real problems, when the robot reaches the goal location, the trajectories end. The value of such a terminal state is zero because the robot does not take any further actions beyond this state. We should modify our update to handle such states as

여기서 α는 학습률입니다. 일반적으로 실제 문제에서는 로봇이 목표 위치에 도달하면 궤도가 종료됩니다. 로봇이 이 상태를 넘어서는 추가 작업을 수행하지 않기 때문에 이러한 최종 상태의 값은 0입니다. 다음과 같은 상태를 처리하려면 업데이트를 수정해야 합니다.

where $\mathbb{1}_{s_{t+1}^i \text{ is terminal}}$ is an indicator variable that is one if $s_{t+1}^i$ is a terminal state and zero otherwise. The value of state-action tuples (s,a) that are not a part of the dataset is set to −∞. This algorithm is known as Q-Learning.

여기서 𝟙 s**i t+1 is 터미널은 s**i t+1이 터미널 상태이면 1이고 그렇지 않으면 0인 표시 변수입니다. 데이터세트의 일부가 아닌 상태-동작 튜플(s,a)의 값은 −무효로 설정됩니다. 이 알고리즘은 Q-Learning으로 알려져 있습니다.
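
The update equations are not reproduced above. In the standard tabular form (stated here as an assumption), a gradient step with learning rate α on the squared error of a single transition gives Q(s,a) ← (1−α) Q(s,a) + α (r + γ (1 − 𝟙[s′ is terminal]) max_{a′} Q(s′,a′)). A minimal sketch of that step follows; the function name `q_update` and the small Q table are hypothetical.

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, terminal, alpha=0.9, gamma=0.95):
    """One tabular Q-learning step on a single transition (updates Q in place)."""
    # A terminal next state contributes no future value.
    bootstrap = 0.0 if terminal else np.max(Q[s_next])
    target = reward + gamma * bootstrap
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q

Q = np.zeros((3, 2))   # hypothetical table: 3 states, 2 actions
Q[2] = [5.0, 7.0]      # pretend state 2 already has estimates
q_update(Q, s=1, a=0, reward=0.0, s_next=2, terminal=False)
q_update(Q, s=0, a=1, reward=1.0, s_next=2, terminal=True)
print(Q)
```

In the second call the estimates stored for state 2 are ignored, because the indicator removes the bootstrap term when the next state is terminal.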

Given the solution of these updates Q^, which is an approximation of the optimal value function Q*, we can obtain the optimal deterministic policy corresponding to this value function easily using

최적의 가치 함수 Q*의 근사치인 이러한 업데이트 Q^의 솔루션이 주어지면 다음을 사용하여 이 가치 함수에 해당하는 최적의 결정론적 정책을 쉽게 얻을 수 있습니다.

There can be situations when there are multiple deterministic policies that correspond to the same optimal value function; such ties can be broken arbitrarily because they have the same value function.

동일한 최적 가치 함수에 해당하는 여러 결정론적 정책이 있는 상황이 있을 수 있습니다. 이러한 관계는 동일한 가치 함수를 갖기 때문에 임의로 끊어질 수 있습니다.
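
The policy extraction described above is usually written π^(s) = argmax_a Q^(s,a). A minimal sketch of it, with ties broken uniformly at random, might look as follows; the function name `greedy_policy` and the example table are hypothetical.

```python
import numpy as np

def greedy_policy(Q, rng=None):
    """Return one greedy action per state; ties are broken uniformly at random."""
    rng = np.random.default_rng() if rng is None else rng
    policy = np.empty(Q.shape[0], dtype=int)
    for s, q_row in enumerate(Q):
        # Indices of all actions whose value equals the row maximum.
        best = np.flatnonzero(np.isclose(q_row, q_row.max()))
        policy[s] = rng.choice(best)
    return policy

Q_hat = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.5, 0.0]])  # hypothetical estimate; state 1 has a tie
print(greedy_policy(Q_hat, np.random.default_rng(0)))
```

Note that `np.argmax` alone would always return the first of several tied actions, which is also a valid way to break the tie.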

17.3.3. Exploration in Q-Learning

The policy πe used by the robot to collect data is critical to ensure that Q-Learning works well. After all, we have replaced the expectation over s′ under the transition function P(s′∣s,a) with an average over the data collected by the robot. If the policy πe does not reach diverse parts of the state-action space, then it is easy to imagine our estimate Q^ will be a poor approximation of the optimal Q*. It is also important to note that in such a situation, the estimate of Q* at all states s∈S will be bad, not just the ones visited by πe. This is because the Q-Learning objective (or value iteration) is a constraint that ties together the value of all state-action pairs. It is therefore critical to pick the correct policy πe to collect data.

Q-Learning이 제대로 작동하려면 로봇이 데이터 πe를 수집하는 데 사용하는 정책이 중요합니다. 결국 우리는 로봇이 수집한 데이터를 사용하여 전이 함수 P(s′∣s,a)를 사용하여 s′에 대한 기대값을 대체했습니다. 정책 πe가 상태-행동 공간의 다양한 부분에 도달하지 못한다면 우리의 추정치 Q^가 최적 Q*에 대한 잘못된 근사치일 것이라고 상상하기 쉽습니다. 그러한 상황에서는 πe가 방문한 상태뿐만 아니라 모든 상태 s∈S에서 Q*의 추정값이 나쁠 것이라는 점에 유의하는 것도 중요합니다. 이는 Q-Learning 목표(또는 값 반복)가 모든 상태-작업 쌍의 값을 하나로 묶는 제약 조건이기 때문입니다. 따라서 데이터를 수집하기 위해 올바른 정책을 선택하는 것이 중요합니다.

We can mitigate this concern by picking a completely random policy πe that samples actions uniformly randomly from A. Such a policy would visit all states, but it will take a large number of trajectories before it does so.

우리는 A에서 균일하게 무작위로 작업을 샘플링하는 완전히 무작위적인 정책 πe를 선택하여 이러한 우려를 완화할 수 있습니다. 이러한 정책은 모든 상태를 방문하지만 그렇게 하기 전에 많은 수의 궤적을 필요로 합니다.

We thus arrive at the second key idea in Q-Learning, namely exploration. Typical implementations of Q-Learning tie together the current estimate of Q and the policy πe to set

where ε is called the “exploration parameter” and is chosen by the user. The policy πe is called an exploration policy. This particular πe is called an ε-greedy exploration policy because it chooses the optimal action (under the current estimate Q^) with probability 1−ε but explores randomly with the remainder probability ε. We can also use the so-called softmax exploration policy

여기서 ε는 "탐색 매개변수"라고 하며 사용자가 선택합니다. 정책 πe를 탐사 정책이라고 합니다. 이 특정 πe는 확률 1−ε로 최적의 행동(현재 추정치 Q^ 하에서)을 선택하지만 나머지 확률 ε으로 무작위로 탐색하기 때문에 ε-탐욕 탐색 정책이라고 합니다. 소위 소프트맥스 탐색 정책을 사용할 수도 있습니다.

where the hyper-parameter T is called temperature. A large value of ε in ε-greedy policy functions similarly to a large value of temperature T for the softmax policy.

여기서 초매개변수 T를 온도라고 합니다. ε-탐욕 정책에서 ε의 큰 값은 소프트맥스 정책의 온도 T의 큰 값과 유사하게 기능합니다.
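
The two exploration policies can be compared numerically. In their usual form (stated here as an assumption, since the equations are not reproduced above), ε-greedy picks argmax_a Q^(s,a) with probability 1−ε and a uniformly random action with probability ε, while softmax exploration picks action a with probability proportional to exp(Q^(s,a)/T). The function names and the Q-values below are hypothetical.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities: greedy w.p. 1 - epsilon, uniform w.p. epsilon."""
    probs = np.full(len(q_row), epsilon / len(q_row))
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs

def softmax_probs(q_row, temperature):
    """Action probabilities proportional to exp(Q / temperature)."""
    z = (q_row - np.max(q_row)) / temperature  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

q_row = np.array([1.0, 2.0, 0.5, 0.0])  # hypothetical Q-hat(s, .) for one state
for eps, T in ((0.1, 0.1), (0.9, 10.0)):
    print(epsilon_greedy_probs(q_row, eps), softmax_probs(q_row, T))
```

Small ε and small T both concentrate the probability on the greedy action; large ε and large T both push the policy toward uniform random exploration.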

It is important to note that when we pick an exploration policy that depends upon the current estimate of the action-value function Q^, we need to re-solve the optimization problem periodically. Typical implementations of Q-Learning make one mini-batch update using a few state-action pairs in the collected dataset (typically the ones collected from the previous timestep of the robot) after taking every action using πe.

행동-가치 함수 Q^의 현재 추정에 의존하는 탐색을 선택할 때 최적화 문제를 주기적으로 해결해야 한다는 점에 유의하는 것이 중요합니다. Q-Learning의 일반적인 구현은 πe를 사용하여 모든 작업을 수행한 후 수집된 데이터 세트(일반적으로 로봇의 이전 단계에서 수집된 데이터)의 몇 가지 상태-작업 쌍을 사용하여 하나의 미니 배치 업데이트를 수행합니다.

17.3.4. The “Self-correcting” Property of Q-Learning

The dataset collected by the robot during Q-Learning grows with time. Both the exploration policy πe and the estimate Q^ evolve as the robot collects more data. This gives us a key insight into why Q-Learning works well. Consider a state s: if a particular action a has a large value under the current estimate Q^(s,a), then both the ε-greedy and the softmax exploration policies have a larger probability of picking this action. If this action actually is not the ideal action, then the future states that arise from this action will have poor rewards. The next update of the Q-Learning objective will therefore reduce the value Q^(s,a), which will reduce the probability of picking this action the next time the robot visits state s. Bad actions, e.g., ones whose value is overestimated in Q^(s,a), are explored by the robot but their value is corrected in the next update of the Q-Learning objective. Good actions, e.g., ones whose value Q^(s,a) is large, are explored more often by the robot and thereby reinforced. This property can be used to show that Q-Learning can converge to the optimal policy even if it begins with a random policy πe (Watkins and Dayan, 1992).

Q-Learning 중에 로봇이 수집한 데이터 세트는 시간이 지남에 따라 증가합니다. 탐색 정책 πe와 추정치 Q^는 모두 로봇이 더 많은 데이터를 수집함에 따라 진화합니다. 이는 Q-Learning이 왜 잘 작동하는지에 대한 중요한 통찰력을 제공합니다. 상태 s를 고려하십시오. 특정 작업 a가 현재 추정치 Q^(s,a)보다 큰 값을 갖는 경우 ε-탐욕 및 소프트맥스 탐색 정책 모두 이 작업을 선택할 확률이 더 높습니다. 만약 이 행동이 실제로 이상적인 행동이 아니라면, 이 행동에서 발생하는 미래 상태는 낮은 보상을 받게 될 것입니다. 따라서 Q-Learning 목표의 다음 업데이트는 Q^(s,a) 값을 줄여 로봇이 다음에 상태 s를 방문할 때 이 작업을 선택할 확률을 줄입니다. 나쁜 행동, 예를 들어 Q^(s,a)에서 값이 과대평가된 행동은 로봇에 의해 탐색되지만 그 값은 Q-Learning 목표의 다음 업데이트에서 정확합니다. 예를 들어 Q^(s,a) 값이 큰 좋은 행동은 로봇에 의해 더 자주 탐색되어 강화됩니다. 이 속성은 Q-Learning이 무작위 정책 πe로 시작하더라도 최적의 정책으로 수렴할 수 있음을 보여주는 데 사용될 수 있습니다(Watkins and Dayan, 1992).

This ability to not only collect new data but also collect the right kind of data is the central feature of reinforcement learning algorithms, and this is what distinguishes them from supervised learning. Q-Learning, using deep neural networks (which we will see in the DQN chapter later), is responsible for the resurgence of reinforcement learning (Mnih et al., 2013).

새로운 데이터를 수집할 뿐만 아니라 올바른 종류의 데이터를 수집하는 이러한 능력은 강화 학습 알고리즘의 핵심 기능이며 지도 학습과 구별됩니다. 심층 신경망(나중에 DQN 장에서 볼 예정)을 사용하는 Q-Learning은 강화 학습의 부활을 담당합니다(Mnih et al., 2013).

17.3.5. Implementation of Q-Learning

We now show how to implement Q-Learning on FrozenLake from OpenAI Gym. Note that this is the same setup as in the Value Iteration experiment.

이제 Open AI Gym의 FrozenLake에서 Q-Learning을 구현하는 방법을 보여줍니다. 이는 Value Iteration 실험에서 고려한 것과 동일한 설정입니다.

%matplotlib inline
import random
import numpy as np
from d2l import torch as d2l

seed = 0  # Random number generator seed
gamma = 0.95  # Discount factor
num_iters = 256  # Number of iterations
alpha = 0.9  # Learning rate
epsilon = 0.9  # Epsilon in the epsilon-greedy algorithm
random.seed(seed)  # Set the random seed
np.random.seed(seed)

# Now set up the environment
env_info = d2l.make_env('FrozenLake-v1', seed=seed)

This code sets up the environment for a reinforcement learning problem. Specifically, it prepares the FrozenLake-v1 environment in which the Q-Learning algorithm will run. The main elements are:

%matplotlib inline: Tells Jupyter Notebook to display graphs and plots inline.
import random: Imports Python's random module, which provides functions for random number generation.
import numpy as np: Imports NumPy, a Python library for scientific computing that provides multidimensional arrays. It is mainly used for matrix operations and numerical computation.
from d2l import torch as d2l: Imports the torch module from the d2l package under the alias d2l. This package provides the code and utility functions of the "Dive into Deep Learning" book.
seed = 0: Sets the random seed to 0. Fixing the seed makes the sequence of random numbers repeatable, which makes the experiment reproducible.
gamma = 0.95: Sets the discount factor, which determines how much future rewards are worth relative to immediate rewards.
num_iters = 256: Sets the number of Q-Learning iterations. In the implementation below, each iteration is one episode.
alpha = 0.9: Sets the learning rate, which determines how the current value and the new target are weighted when a Q-value is updated.
epsilon = 0.9: Sets epsilon for the epsilon-greedy algorithm. It is the probability of taking a random action and so controls the balance between exploration and exploitation.
random.seed(seed), np.random.seed(seed): Seed both random number generators so that the results are reproducible.
env_info = d2l.make_env('FrozenLake-v1', seed=seed): Creates the 'FrozenLake-v1' environment with the given seed. FrozenLake-v1 is a reinforcement learning environment in which an agent has to move across a frozen lake to a goal location.

Q-Learning can now be run in this environment.

In the FrozenLake environment, the robot moves on a 4×4 grid (these are the states) with actions that are “up” (↑), “down” (↓), “left” (←), and “right” (→). The environment contains a number of hole (H) cells and frozen (F) cells as well as a goal cell (G), all of which are unknown to the robot. To keep the problem simple, we assume the robot has reliable actions, i.e. P(s′∣s,a)=1 for all s∈S,a∈A. If the robot reaches the goal, the trial ends and the robot receives a reward of 1 irrespective of the action; the reward at any other state is 0 for all actions. The objective of the robot is to learn a policy that reaches the goal location (G) from a given start location (S) (this is s0) to maximize the return.

FrozenLake 환경에서 로봇은 "위"(↑), "아래"(→), "왼쪽"(←), "오른쪽"( →). 환경에는 다수의 구멍(H) 세포와 동결(F) 세포 및 목표 세포(G)가 포함되어 있으며, 이들 모두는 로봇에 알려지지 않습니다. 문제를 단순하게 유지하기 위해 로봇이 신뢰할 수 있는 동작을 한다고 가정합니다. 즉, 모든 s∈S,a∈A에 대해 P(s′∣s,a)=1입니다. 로봇이 목표에 도달하면 시험이 종료되고 로봇은 행동에 관계없이 1의 보상을 받습니다. 다른 상태에서의 보상은 모든 행동에 대해 0입니다. 로봇의 목적은 주어진 시작 위치(S)(이것은 s0)에서 목표 위치(G)에 도달하는 정책을 학습하여 수익을 극대화하는 것입니다.

We first implement ε-greedy method as follows:

def e_greedy(env, Q, s, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return env.action_space.sample()
    return np.argmax(Q[s, :])

This function implements the epsilon-greedy algorithm, which is used in reinforcement learning to balance exploration and exploitation. It chooses a random action with probability epsilon and otherwise the best action under the current estimate.

The arguments are:

env: The reinforcement learning environment, in which the agent chooses actions and receives rewards.
Q: The Q-value function, an array holding the value of each state and action. Q[s, a] is the value of choosing action a in state s.
s: The current state.
epsilon: A probability in the range [0, 1] that determines how often a random action is chosen.

The function works as follows:

random.random() < epsilon: Draws a random number in [0, 1) and checks whether it is smaller than epsilon. If it is, a random action is sampled. This is exploration: the agent picks an action at random to gain new experience.
Otherwise, the action with the highest Q-value in the current state s is chosen. This is exploitation: the agent uses what it has learned to pick the action that currently looks best.

The function therefore switches between exploration and exploitation according to the probability epsilon.

We are now ready to implement Q-learning:

이제 Q-learning을 구현할 준비가 되었습니다.

def q_learning(env_info, gamma, num_iters, alpha, epsilon):
    env_desc = env_info['desc']  # 2D array specifying what each grid item means
    env = env_info['env']  # The environment object (provides reset and step)
    num_states = env_info['num_states']
    num_actions = env_info['num_actions']

    Q  = np.zeros((num_states, num_actions))
    V  = np.zeros((num_iters + 1, num_states))
    pi = np.zeros((num_iters + 1, num_states))

    for k in range(1, num_iters + 1):
        # Reset environment
        state, done = env.reset(), False
        while not done:
            # Select an action for the current state and act on it in env
            action = e_greedy(env, Q, state, epsilon)
            next_state, reward, done, _ = env.step(action)

            # Q-update:
            y = reward + gamma * np.max(Q[next_state,:])
            Q[state, action] = Q[state, action] + alpha * (y - Q[state, action])

            # Move to the next state
            state = next_state
        # Record max value and max action for visualization purpose only
        for s in range(num_states):
            V[k,s]  = np.max(Q[s,:])
            pi[k,s] = np.argmax(Q[s,:])
    d2l.show_Q_function_progress(env_desc, V[:-1], pi[:-1])

q_learning(env_info=env_info, gamma=gamma, num_iters=num_iters, alpha=alpha, epsilon=epsilon)

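This function implements the Q-Learning algorithm. Q-Learning is a model-free reinforcement learning algorithm built on the same idea as value iteration, and it is one way for an agent to learn the optimal behavior.

The arguments are:

env_info: Information about the reinforcement learning environment, including the environment object and its layout.
gamma: The discount factor, which controls how much future rewards count relative to immediate ones.
num_iters: The number of iterations (episodes) to run.
alpha: The learning rate, which determines how large a correction is applied when a Q-value is updated.
epsilon: The probability of random exploration in the epsilon-greedy algorithm.

The function works as follows:

The Q-value function (Q), the value function (V), and the policy (pi) are initialized to 0.
The following steps are repeated num_iters times:
- The environment is reset and the start state (state) is obtained.
- The agent chooses an action in the current state with the epsilon-greedy algorithm: a random exploratory action with probability epsilon, otherwise the best action under the learned Q-values.
- The chosen action is applied to the environment, which returns the next state (next_state), the reward (reward), and whether the episode has ended (done).
- The Q-value is updated. The core of Q-Learning is an update derived from the Bellman equation: the target y is the reward plus the discounted maximum Q-value of the next state (gamma * np.max(Q[next_state,:])), and Q[state, action] is moved toward y by a fraction alpha of the difference.
- The agent moves to the next state, and these steps repeat until the episode ends.
After each episode, the value function (V) and the policy (pi) are recorded for visualization.

By updating the Q-values repeatedly in this way, the agent learns a policy that picks the best action in each state.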

This result shows that Q-learning can find the optimal solution for this problem after roughly 250 iterations. However, when we compare this result with the Value Iteration algorithm’s result (see Implementation of Value Iteration), we can see that the Value Iteration algorithm needs far fewer iterations to find the optimal solution for this problem. This happens because the Value Iteration algorithm has access to the full MDP whereas Q-learning does not.

이 결과는 Q-learning이 대략 250번의 반복 후에 이 문제에 대한 최적의 솔루션을 찾을 수 있음을 보여줍니다. 그러나 이 결과를 Value Iteration 알고리즘의 결과(Value Iteration 구현 참조)와 비교하면 Value Iteration 알고리즘이 이 문제에 대한 최적의 솔루션을 찾기 위해 훨씬 더 적은 반복이 필요하다는 것을 알 수 있습니다. 이는 Value Iteration 알고리즘이 전체 MDP에 액세스할 수 있는 반면 Q-learning은 액세스할 수 없기 때문에 발생합니다.
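
To experiment with the same update loop without installing Gym or d2l, the following is a minimal, dependency-light sketch on a made-up environment: a one-dimensional corridor of five cells where the robot starts at cell 0 and the goal is the last cell. The environment, the `step` function and `q_learning_corridor` are all hypothetical and only mirror the structure of the loop above; they are not part of the book's code.

```python
import random
import numpy as np

def step(state, action, goal=4):
    """Hypothetical corridor: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == goal
    return next_state, float(done), done  # reward 1 only on reaching the goal

def q_learning_corridor(num_episodes=200, alpha=0.9, gamma=0.95,
                        epsilon=0.9, num_states=5, seed=0):
    rng = random.Random(seed)
    Q = np.zeros((num_states, 2))
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = step(state, action, goal=num_states - 1)
            # Q[next_state] is never updated for the goal, so it stays zero there
            target = reward + gamma * np.max(Q[next_state])
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

Q = q_learning_corridor()
print(np.argmax(Q[:-1], axis=1))  # greedy action in each non-goal state
```

Because the transitions are deterministic, the learned greedy policy should move right in every non-goal cell, and the values decay by a factor of gamma for each extra step away from the goal.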

17.3.6. Summary

Q-learning is one of the most fundamental reinforcement-learning algorithms. It has been at the epicenter of the recent success of reinforcement learning, most notably in learning to play video games (Mnih et al., 2013). Implementing Q-learning does not require that we know the Markov decision process (MDP), e.g., the transition and reward functions, completely.

Q-러닝은 가장 기본적인 강화학습 알고리즘 중 하나입니다. 이는 최근 강화 학습 성공의 진원지였으며, 특히 비디오 게임 학습에서 가장 두드러졌습니다(Mnih et al., 2013). Q-러닝을 구현하기 위해 MDP(Markov Decision Process)(예: 전환 및 보상 기능)를 완전히 알 필요는 없습니다.


D2L - 17.2. Value Iteration

2023. 9. 5. 12:16 | Posted by 솔웅

https://d2l.ai/chapter_reinforcement-learning/value-iter.html

17.2. Value Iteration — Dive into Deep Learning 1.0.3 documentation

d2l.ai

17.2. Value Iteration

In this section we will discuss how to pick the best action for the robot at each state to maximize the return of the trajectory. We will describe an algorithm called Value Iteration and implement it for a simulated robot that travels over a frozen lake.

이 섹션에서는 Return of the trajectory (궤적 반환)을 최대화하기 위해 각 state에서 robot에 대한 최상의 action을 선택하는 방법을 설명할 것입니다. Value Iteration이라는 알고리즘을 설명하고 이를 얼어붙은 호수 위를 이동하는 시뮬레이션 로봇에 대해 구현해 보겠습니다.

What is Value Iteration?

Value Iteration is one of the dynamic programming algorithms used in reinforcement learning. It is used to compute (in practice, to approximate) the optimal value function of a Markov decision process (MDP).

In an MDP, the value function measures the value of each state, and the optimal value function is the key to finding the best policy. Value Iteration proceeds in the following steps:

Initialization: Initialize the value function, usually by setting the value of every state to zero or to arbitrary values.
Iterative Update: Repeatedly update the current value function using the Bellman optimality equation.
- State Value Update: For every state, update its value from the rewards and the values of the states reachable from it, taking the maximum over actions.
- Action Value Update: Alternatively, update a value function over state-action pairs, which gives the value of every possible action in every state.
Convergence: Repeat the update until the value function converges. Typically the algorithm stops when the difference between two successive value functions falls below a threshold.
Policy Extraction: Extract the optimal policy from the optimal value function. The optimal policy tells the agent which action to select in each state.

Value Iteration is effective for finding the optimal policy and the optimal value function. However, because it updates the value of every state-action pair, it can be computationally expensive when the state and action spaces are large.

What is the return of a trajectory?

In reinforcement learning, a trajectory is the sequence of states, actions, and rewards that the agent experiences while interacting with the environment: τ=(s0,a0,r0,s1,a1,r1,…). Each step records the state the robot was in, the action it took there, and the reward it received.

The return of a trajectory is the total discounted reward collected along it, $R(\tau) = \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)$, where γ is the discount factor. A reward received t steps in the future is weighted by γ^t, so with γ<1 earlier rewards count more than later ones.

For example, if reaching the goal yields a positive reward, a trajectory that reaches the goal in a few steps has a larger return than one that reaches the same goal after a long detour, because the goal reward is discounted less.

In summary, "maximizing the return of the trajectory" means choosing actions so that the discounted sum of rewards along the resulting trajectory is, on average, as large as possible.
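In the reinforcement-learning setting of this chapter, the return of a trajectory is the γ-discounted sum of its rewards, R(τ)=∑ γ^t r(st,at), as defined in Section 17.2.2 below. A minimal sketch for a finite reward sequence follows; the helper name `discounted_return` and the example rewards are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for a finite sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# A three-step trajectory that is rewarded only at the last step
rewards = [0.0, 0.0, 1.0]
G = discounted_return(rewards, gamma=0.9)  # 0 + 0.9 * 0 + 0.9**2 * 1
```

The same reward arriving one step later would be multiplied by an extra factor of γ, which is why shorter paths to a reward have a larger return.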

17.2.1. Stochastic Policy 확률적 정책

A stochastic policy denoted as π(a∣s) (policy for short) is a conditional distribution over the actions a∈A given the state s∈S, π(a∣s)≡P(a∣s). As an example, suppose the robot has four actions A={go left, go down, go right, go up}. The policy at a state s∈S for such a set of actions A is a categorical distribution where the probabilities of the four actions could be [0.4,0.2,0.1,0.3]; at some other state s′∈S the probabilities π(a∣s′) of the same four actions could be [0.1,0.1,0.2,0.6]. Note that we should have ∑a π(a∣s)=1 for any state s. A deterministic policy is a special case of a stochastic policy in that the distribution π(a∣s) only gives non-zero probability to one particular action, e.g., [1,0,0,0] for our example with four actions.

π(a∣s)(간단히 정책)로 표시되는 확률론적 정책은 상태 s∈S, π(a∣s)=P(a∣s)가 주어지면 동작 a∈A에 대한 조건부 분포입니다. 예를 들어, 로봇에 4가지 동작 A= {왼쪽으로 이동, 아래로 이동, 오른쪽으로 이동, 위로 이동}이 있는 경우입니다. 그러한 일련의 행동 A에 대한 상태 s∈S의 정책은 네 가지 행동의 확률이 [0.4,0.2,0.1,0.3]이 될 수 있는 범주형 분포입니다. 다른 상태 s′∈S에서 동일한 네 가지 행동의 확률 π(a∣s′)는 [0.1,0.1,0.2,0.6]이 될 수 있습니다. 모든 상태 s에 대해 ∑a π(a∣s)=1이 있어야 합니다. 결정론적 정책은 분포 π(a∣s)가 하나의 특정 작업에만 0이 아닌 확률을 제공한다는 점에서 확률론적 정책의 특별한 경우입니다(예: 4개 작업이 있는 예시의 경우 [1,0,0,0]).

Stochastic Policy란?

In the context of reinforcement learning, a "stochastic policy" refers to a strategy or a set of rules that an agent uses to make decisions in an environment. What makes it "stochastic" is that it introduces an element of randomness or probability into the decision-making process.

Reinforcement learning(강화 학습)의 맥락에서 '확률적 정책'은 에이전트가 환경에서 의사 결정을 내리는 데 사용하는 전략이나 규칙을 가리킵니다. 이것이 '확률적'으로 만드는 것은 의사 결정 프로세스에 확률 또는 무작위성 요소를 도입한다는 것입니다.

In a stochastic policy, the agent doesn't make deterministic choices, where every action is known in advance. Instead, it selects actions based on a probability distribution. This means that even in the same state, the agent might choose different actions in different episodes or trials, reflecting the inherent uncertainty in the environment or the agent's own decision-making process.

확률적 정책에서 에이전트는 결정을 내릴 때 미리 모든 행동이 알려져 있는 결정적 선택을 하지 않습니다. 대신, 확률 분포를 기반으로 행동을 선택합니다. 이것은 동일한 상태에서라도 에이전트가 다른 에피소드나 시도에서 다른 행동을 선택할 수 있음을 의미하며, 이는 환경의 불확실성이나 에이전트 자체의 의사 결정 프로세스에 내재된 불확실성을 반영합니다.

Stochastic policies are often represented as probability distributions over possible actions. The agent samples from this distribution to select an action. The degree of randomness or variability in the policy can vary depending on the specific problem and learning algorithm. Stochastic policies can be advantageous in situations where variability in actions is desirable to explore the environment more effectively or handle uncertainties.

확률적 정책은 종종 가능한 행동에 대한 확률 분포로 표현됩니다. 에이전트는 이 분포에서 샘플링하여 행동을 선택합니다. 정책의 무작위성이나 변동성의 정도는 구체적인 문제 및 학습 알고리즘에 따라 다를 수 있습니다. 환경을 더 효과적으로 탐색하거나 불확실성을 다루기 위해 행동의 변동성이 바람직한 상황에서 확률적 정책을 사용하는 것이 유리할 수 있습니다.

In contrast, a "deterministic policy" would map each state directly to a specific action without any randomness, meaning that in the same state, the agent would always choose the same action.

반면, '결정적 정책'은 각 상태를 특정한 행동에 직접 매핑하며 어떠한 무작위성도 없이 특정한 행동을 선택한다는 것을 의미합니다.

Stochastic policies are commonly used in reinforcement learning algorithms, such as policy gradient methods, to enable exploration and adaptability in dynamic environments.

확률적 정책은 주로 정책 그래디언트 방법 등의 강화 학습 알고리즘에서 사용되며, 동적 환경에서의 탐색과 적응을 가능하게 하기 위해 활용됩니다.
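Sampling from a stochastic policy can be sketched with the four-action example from Section 17.2.1. The probabilities [0.4, 0.2, 0.1, 0.3] and [1, 0, 0, 0] come from the text; the variable names and the sample sizes are assumptions made for the illustration.

```python
import numpy as np

actions = ["go left", "go down", "go right", "go up"]
pi_s = np.array([0.4, 0.2, 0.1, 0.3])    # stochastic policy at some state s
pi_det = np.array([1.0, 0.0, 0.0, 0.0])  # deterministic special case

rng = np.random.default_rng(0)

# Draw many actions from pi(a | s); each draw is an index into `actions`
samples = rng.choice(len(actions), size=10_000, p=pi_s)
empirical = np.bincount(samples, minlength=len(actions)) / samples.size

# A deterministic policy always returns the same action
det_samples = rng.choice(len(actions), size=100, p=pi_det)
```

With enough samples the empirical action frequencies approach `pi_s`, while every draw from `pi_det` is "go left".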

17.2.2. Value Function

Imagine now that the robot starts at a state s0 and at each time instant, it first samples an action from the policy at∼π(st) and takes this action to result in the next state st+1. The trajectory τ=(s0,a0,r0,s1,a1,r1,…) can be different depending upon which particular action at is sampled at intermediate instants. We define the average return $R(\tau) = \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)$ of all such trajectories

$$V^\pi(s_0) = E_{a_t \sim \pi(s_t)}\big[R(\tau)\big] = E_{a_t \sim \pi(s_t)}\Big[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\Big], \qquad (17.2.1)$$

where $s_{t+1} \sim P(s_{t+1} \mid s_t, a_t)$ is the next state of the robot and $r(s_t, a_t)$ is the instantaneous reward obtained by taking action $a_t$ in state $s_t$ at time t. This is called the “value function” for the policy π. In simple words, the value of a state s0 for a policy π, denoted by $V^\pi(s_0)$, is the expected γ-discounted return obtained by the robot if it begins at state s0 and takes actions from the policy π at each time instant.

We next break down the trajectory into two stages: (i) the first stage, which corresponds to s0→s1 upon taking the action a0, and (ii) a second stage, which is the trajectory τ′=(s1,a1,r1,…) thereafter. The key idea behind all algorithms in reinforcement learning is that the value of state s0 can be written as the average reward obtained in the first stage plus the value function averaged over all possible next states s1. This is quite intuitive and arises from our Markov assumption: the average return from the current state is the sum of the average return from the next state and the average reward of going to the next state. Mathematically, we write the two stages as

$$V^\pi(s_0) = r(s_0, a_0) + \gamma\, E_{a_0 \sim \pi(s_0)}\Big[ E_{s_1 \sim P(s_1 \mid s_0, a_0)}\big[ V^\pi(s_1) \big] \Big]. \qquad (17.2.2)$$

다음으로 궤도를 두 단계로 나눕니다. (i) 조치 a0을 취했을 때 s0→s1에 해당하는 첫 번째 단계, (ii) 궤도 τ'=(s1,a1,r1,…)인 두 번째 단계 - 이러한 상황이 계속 이어짐 . 강화 학습의 모든 알고리즘 뒤에 있는 핵심 아이디어는 상태 s0의 값이 첫 번째 단계에서 얻은 평균 보상으로 기록될 수 있고 가치 함수가 가능한 모든 다음 상태 s1에 대해 평균을 낼 수 있다는 것입니다. 이는 매우 직관적이며 Markov 가정에서 비롯됩니다. 현재 상태의 평균 수익은 다음 상태의 평균 수익과 다음 상태로 이동하는 평균 보상의 합입니다. 수학적으로 우리는 두 단계를 다음과 같이 씁니다.

This decomposition is very powerful: it is the foundation of the principle of dynamic programming upon which all reinforcement learning algorithms are based. Notice that the second stage gets two expectations, one over the choices of the action a0 taken in the first stage using the stochastic policy and another over the possible states s1 obtained from the chosen action. We can write (17.2.2) using the transition probabilities in the Markov decision process (MDP) as

$$V^\pi(s) = \sum_{a \in A} \pi(a \mid s) \Big[ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^\pi(s') \Big]; \quad \text{for all } s \in S. \qquad (17.2.3)$$

이러한 분해는 매우 강력합니다. 이는 모든 강화 학습 알고리즘의 기반이 되는 동적 프로그래밍 원리의 기초입니다. 두 번째 단계에서는 확률론적 정책을 사용하여 첫 번째 단계에서 취한 행동 a0의 선택에 대한 기대와 선택한 행동에서 얻은 가능한 상태 s1에 대한 기대라는 두 가지 기대를 얻습니다. 마르코프 결정 과정(MDP)의 전환 확률을 사용하여 (17.2.2)를 다음과 같이 작성할 수 있습니다.

An important thing to notice here is that the above identity holds for all states s∈S because we can think of any trajectory that begins at that state and break down the trajectory into two stages.

여기서 주목해야 할 중요한 점은 위의 항등식이 모든 상태 s∈S에 대해 적용된다는 것입니다. 왜냐하면 우리는 해당 상태에서 시작하는 모든 궤적을 생각하고 궤적을 두 단계로 나눌 수 있기 때문입니다.
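Because the two-stage identity holds at every state simultaneously, for a small MDP it can be read as a linear system in the unknown values and solved directly. The sketch below does this for a made-up 2-state, 2-action MDP; the arrays `P`, `r`, and `pi` are illustrative assumptions, not taken from the text.

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP: P[s, a, s2] transition probabilities, r[s, a] rewards
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])
# Hypothetical stochastic policy pi[s, a]; each row sums to 1
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])

r_pi = (pi * r).sum(axis=1)            # expected one-step reward per state
P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi

# Solve V = r_pi + gamma * P_pi @ V, i.e. (I - gamma * P_pi) V = r_pi
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Right-hand side of the identity, evaluated at the solved V for all states
rhs = (pi * (r + gamma * P @ V)).sum(axis=1)
```

If the identity holds, `rhs` reproduces `V` at every state, which is what "holds for all states s∈S" means in practice.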

Value Function이란?

In the context of reinforcement learning, a "value function" is a fundamental concept used to estimate the expected cumulative rewards an agent can obtain when following a particular policy in an environment. Value functions are essential for making decisions and learning in reinforcement learning tasks.

강화 학습의 맥락에서 '가치 함수(Value Function)'는 특정 환경에서 특정 정책을 따를 때 에이전트가 기대하는 누적 보상을 추정하는 데 사용되는 기본 개념입니다. 가치 함수는 강화 학습 작업에서의 의사 결정 및 학습에 필수적입니다.

There are two main types of value functions:

주로 두 가지 유형의 가치 함수가 있습니다:

State-Value Function (V(s)): This function estimates the expected cumulative reward an agent can achieve when starting from a particular state and following a specific policy. In other words, V(s) quantifies the long-term desirability of being in a given state while following the chosen policy.

상태 가치 함수 (V(s)): 이 함수는 특정 상태에서 시작하여 특정 정책을 따를 때 에이전트가 기대하는 누적 보상을 추정합니다. 다시 말해, V(s)는 선택한 정책을 따를 때 주어진 상태에 있을 때의 장기적인 바람직함을 측정합니다.
Action-Value Function (Q(s, a)): This function estimates the expected cumulative reward an agent can obtain by taking a particular action (a) in a specific state (s) and then following a specific policy. Q(s, a) measures the long-term desirability of taking a particular action in a given state while following the chosen policy.

행동 가치 함수 (Q(s, a)): 이 함수는 특정 상태 (s)에서 특정 행동 (a)을 취한 다음 특정 정책을 따를 때 에이전트가 기대하는 누적 보상을 추정합니다. Q(s, a)는 주어진 상태에서 특정 행동을 취할 때 선택한 정책을 따를 때의 장기적인 바람직함을 측정합니다.

Value functions are crucial for various reinforcement learning algorithms, such as Q-learning and policy gradient methods. They serve as a foundation for evaluating and comparing different policies. By iteratively updating these value functions, an agent can learn to make better decisions and maximize its cumulative rewards in the environment.

가치 함수는 Q-러닝(Q-learning) 및 정책 그래디언트 방법과 같은 다양한 강화 학습 알고리즘에서 중요합니다. 이들은 서로 다른 정책을 평가하고 비교하는 데 기초를 제공합니다. 이러한 가치 함수를 반복적으로 업데이트함으로써 에이전트는 더 나은 결정을 내리고 환경에서 누적 보상을 최대화하는 방법을 학습할 수 있습니다.

The ultimate goal of reinforcement learning is often to find an optimal policy, which is a policy that maximizes the expected cumulative reward. Value functions are essential tools for achieving this goal because they help assess and improve policies based on their expected performance.

강화 학습의 궁극적인 목표는 종종 기대 누적 보상을 최대화하는 최적 정책을 찾는 것입니다. 가치 함수는 이 목표를 달성하기 위한 중요한 도구입니다.

17.2.3. Action-Value Function

In implementations, it is often useful to maintain the “action value” function, a quantity closely related to the value function. It is defined to be the average return of a trajectory that begins at s0 but when the action of the first stage is fixed to be a0,

$$Q^\pi(s_0, a_0) = r(s_0, a_0) + E_{a_t \sim \pi(s_t)}\Big[\sum_{t=1}^{\infty} \gamma^t r(s_t, a_t)\Big], \qquad (17.2.4)$$

구현에서는 가치 함수와 밀접하게 관련된 수량인 "액션 가치" 함수라는 수량을 유지하는 것이 유용한 경우가 많습니다. 이는 s0에서 시작하지만 첫 번째 단계의 동작이 다음과 같이 고정된 궤적의 평균 반환으로 정의됩니다.

Note that the summation inside the expectation is from t=1,…,∞ because the reward of the first stage is fixed in this case. We can again break down the trajectory into two parts and write

$$Q^\pi(s, a) = r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \sum_{a' \in A} \pi(a' \mid s')\, Q^\pi(s', a'); \quad \text{for all } s \in S, a \in A. \qquad (17.2.5)$$

This version is the analog of (17.2.3) for the action value function.

이 버전은 동작 값 함수에 대한 (17.2.3)과 유사합니다.
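The relationship between the two functions can be checked numerically: the action value is the first-stage reward plus the discounted average value of the next state, and averaging the action values over the policy gives back the state value. A minimal sketch, again on a made-up 2-state, 2-action MDP (`P`, `r`, `pi` are assumptions for illustration):

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP and policy: P[s, a, s2], r[s, a], pi[s, a]
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])
pi = np.array([[0.5, 0.5],
               [0.3, 0.7]])

# State values of pi, obtained by solving the linear recursion directly
P_pi = np.einsum("sa,sat->st", pi, P)
V = np.linalg.solve(np.eye(2) - gamma * P_pi, (pi * r).sum(axis=1))

# Q(s, a): fix the first action to a, then follow pi afterwards
Q = r + gamma * P @ V

# Averaging Q over the policy's action probabilities recovers V
V_from_Q = (pi * Q).sum(axis=1)

# Right-hand side of the action-value recursion, evaluated at Q
Q_rhs = r + gamma * P @ (pi * Q).sum(axis=1)
```

Keeping a table of Q-values is convenient in implementations because the best action in a state can be read off with a single `argmax` over a row, without consulting the transition model.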

Action Value Function이란?

In the context of reinforcement learning, the "Action-Value Function," often denoted as Q(s, a), represents the expected cumulative reward an agent can obtain by taking a specific action (a) in a particular state (s) and then following a specific policy. It quantifies the long-term desirability of taking a particular action in a given state while adhering to the chosen policy.

강화 학습의 맥락에서 '행동 가치 함수(Action-Value Function)'는 특정 상태 (s)에서 특정 행동 (a)을 취한 다음 특정 정책을 따를 때 에이전트가 기대하는 누적 보상을 나타냅니다. 이는 선택한 정책을 따를 때 주어진 상태에서 특정 행동을 취하는 것의 장기적인 바람직함을 측정합니다.

Here's a breakdown of the components of the Action-Value Function (Q(s, a)):

행동 가치 함수 (Q(s, a))의 구성 요소를 살펴보겠습니다:

s (state): This is the current situation or configuration of the environment that the agent perceives.

s (상태): 이것은 에이전트가 지각하는 환경의 현재 상황 또는 구성입니다.
a (action): This is the specific action that the agent takes in the current state.

a (행동): 이것은 에이전트가 현재 상태에서 취하는 구체적인 행동입니다.
Q(s, a): This is the Action-Value Function, which provides an estimate of the expected cumulative reward starting from state s, taking action a, and then following a specific policy.

Q(s, a): 이것은 행동 가치 함수로, 상태 s에서 행동 a를 취하고 특정 정책을 따를 때 기대하는 누적 보상을 추정합니다.

The Action-Value Function is fundamental in reinforcement learning because it helps the agent evaluate and compare different actions in a given state. By computing the Q-values for each action in each state and following a specific policy, an agent can make informed decisions to maximize its cumulative rewards over time.

행동 가치 함수는 강화 학습에서 기본적이며, 에이전트가 주어진 상태에서 다른 행동을 평가하고 비교하는 데 도움을 줍니다. 각 상태에서 각 행동의 Q-값을 계산하고 특정 정책을 따를 때, 에이전트는 시간이 지남에 따라 누적 보상을 최대화하기 위한 정보에 기반하여 결정을 내릴 수 있습니다.

One of the key algorithms that uses the Action-Value Function is Q-learning, which aims to learn the optimal action-value function (Q-function) that maximizes the expected cumulative reward. This function is often used to guide the agent's behavior in an environment, allowing it to learn to make better decisions over time.

행동 가치 함수를 사용하는 주요 알고리즘 중 하나는 Q-러닝(Q-learning)입니다. 이 알고리즘은 기대 누적 보상을 최대화하는 행동 가치 함수 (Q-함수)를 학습하는 것을 목표로 하며, 이 함수는 환경에서 에이전트의 행동을 안내하는 데 자주 사용됩니다. 에이전트는 시간이 지남에 따라 더 나은 결정을 내리도록 학습하게 됩니다.

17.2.4. Optimal Stochastic Policy

Both the value function and the action-value function depend upon the policy that the robot chooses. We will next think of the “optimal policy” that achieves the maximal average return

$$\pi^* = \underset{\pi}{\mathrm{argmax}}\; V^\pi(s_0).$$

가치 함수와 행동-가치 함수는 모두 로봇이 선택하는 정책에 따라 달라집니다. 다음으로 최대 평균수익률을 달성하는 '최적정책'에 대해 생각해 보겠습니다.

Of all possible stochastic policies that the robot could have taken, the optimal policy π* achieves the largest average discounted return for trajectories starting from state s0. Let us denote the value function and the action-value function of the optimal policy as V*≡Vπ* and Q*≡Qπ*.

로봇이 취할 수 있는 모든 가능한 확률론적 정책 중에서 최적 정책 π*는 상태 s0에서 시작하는 궤도에 대해 가장 큰 평균 할인 수익을 달성합니다. 최적 정책의 가치함수와 행동-가치함수를 V*=Vπ* 및 Q*=Qπ*로 표시하겠습니다.

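Let us observe that for a deterministic policy there is only one action that is possible under the policy at any given state. This gives us

$$\pi^*(s) = \underset{a \in A}{\mathrm{argmax}} \Big[ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big].$$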

특정 상태의 정책에 따라 가능한 작업이 하나만 있는 결정론적 정책에 대해 살펴보겠습니다. 이것이 우리에게 이것을 줍니다.

A good mnemonic to remember this is that the optimal action at state s (for a deterministic policy) is the one that maximizes the sum of reward r(s,a) from the first stage and the average return of the trajectories starting from the next state s′, averaged over all possible next states s′ from the second stage.

이것을 기억하기 위한 좋은 기억법은 상태 s(결정론적 정책에 대한)에서의 최적의 행동은 첫 번째 단계의 보상 r(s,a)의 합과 다음 단계부터 시작하는 궤적의 평균 반환을 최대화하는 행동이라는 것입니다. 상태 s'는 두 번째 단계에서 가능한 모든 다음 상태 s'에 대해 평균을 냅니다.
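The mnemonic translates into a one-line computation: for each state, score every action by its first-stage reward plus the discounted average value of the next state, and take the best one. A minimal sketch, assuming a made-up MDP and a made-up value vector `V` standing in for V* (the function name `greedy_policy` is also hypothetical):

```python
import numpy as np

def greedy_policy(P, r, V, gamma):
    """Pick, in every state, the action with the best one-step lookahead.

    P[s, a, s2] are transition probabilities, r[s, a] rewards,
    and V[s2] the values assigned to next states.
    """
    lookahead = r + gamma * P @ V  # shape (num_states, num_actions)
    return lookahead.argmax(axis=1)

# Hypothetical 2-state, 2-action MDP
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])
V = np.array([0.0, 10.0])  # illustrative values, not a computed V*

policy = greedy_policy(P, r, V, gamma=0.9)
```

In state 1 the action with the larger immediate reward (action 0, reward 2) loses to action 1, because action 1 keeps the robot in the high-value state: 0 + 0.9 × 10 = 9 versus 2 + 0.9 × 5 = 6.5.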

Optimal Stochastic Policy란?

An "optimal stochastic policy" in reinforcement learning refers to a strategy or set of rules that an agent follows in an environment to maximize its expected cumulative rewards while incorporating a certain level of randomness or probability into its decision-making process.

강화 학습에서 '최적 확률적 정책(Optimal Stochastic Policy)'은 에이전트가 환경에서 따르는 전략 또는 규칙을 나타내며, 기대 누적 보상을 극대화하기 위해 의사 결정 프로세스에 일정 수준의 무작위성 또는 확률성을 통합하는 것을 의미합니다.

Here's a breakdown of the key components of an optimal stochastic policy:

다음은 최적 확률적 정책의 주요 구성 요소를 설명한 것입니다:

Optimal: The policy is considered "optimal" when it achieves the highest expected cumulative reward over the long run. In other words, it is the best strategy among all possible policies the agent can follow in the given environment.

최적: 정책은 장기적으로 기대 누적 보상을 최대화하는 경우에 "최적"으로 간주됩니다. 다시 말해, 주어진 환경에서 에이전트가 따를 수 있는 모든 정책 중에서 가장 우수한 전략입니다.
Stochastic: Unlike a deterministic policy that prescribes a single action for each state, an optimal stochastic policy introduces randomness or uncertainty into its actions. Instead of always selecting the same action in a given state, the agent chooses actions based on probability distributions. This stochasticity allows the agent to explore the environment effectively and adapt to uncertainties.

확률적: 특정 상태에 대해 항상 동일한 행동을 규정하는 결정적 정책과 달리 최적 확률적 정책은 행동에 무작위성 또는 불확실성을 도입합니다. 주어진 상태에서 항상 동일한 행동을 선택하는 대신, 에이전트는 확률 분포를 기반으로 행동을 선택합니다. 이러한 확률성은 에이전트가 환경을 효과적으로 탐색하고 불확실성에 적응하는 데 도움이 됩니다.
Expected Cumulative Rewards: The primary objective of an optimal stochastic policy is to maximize the expected cumulative rewards or the expected return. This means that the policy aims to make decisions that, on average, lead to the highest possible total rewards over time.

기대 누적 보상: 최적 확률적 정책의 주요 목표는 기대 누적 보상 또는 기대 수익을 극대화하는 것입니다. 이는 정책이 평균적으로 시간이 지남에 따라 가능한 최대 총 보상을 얻도록 결정을 내리려는 것을 의미합니다.

An optimal stochastic policy is often challenging to find and depends on the specific problem and the agent's understanding of the environment. It's typically determined using reinforcement learning algorithms that aim to estimate or approximate the optimal policy. These algorithms iteratively adjust the policy to find the best balance between exploration (trying out different actions) and exploitation (choosing actions that are known to yield high rewards) while considering the inherent uncertainty in the environment.

최적 확률적 정책을 찾는 것은 종종 어렵고 구체적인 문제 및 에이전트의 환경 이해에 따라 다릅니다. 이러한 정책을 추정하거나 근사하는 강화 학습 알고리즘을 사용하여 일반적으로 결정됩니다. 이러한 알고리즘은 환경의 내재적 불확실성을 고려하면서 탐색(다른 행동을 시도)과 활용(고 보상을 얻을 것으로 알려진 행동 선택) 사이의 최적 균형을 찾기 위해 정책을 반복적으로 조정합니다.

In summary, an optimal stochastic policy is a strategy that, over the long term, maximizes an agent's expected cumulative rewards while allowing for probabilistic decision-making, making it well-suited for reinforcement learning problems where exploration and adaptability are critical.

요약하면, 최적 확률적 정책은 기대 누적 보상을 최대화하면서 확률적 의사 결정을 허용하는 전략으로, 탐색과 적응이 중요한 강화 학습 문제에 적합합니다.

17.2.5. Principle of Dynamic Programming

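Our development in the previous section in (17.2.2) or (17.2.5) can be turned into an algorithm to compute the optimal value function V* or the action-value function Q*, respectively. Observe that

$$V^*(s) = \sum_{a \in A} \pi^*(a \mid s) \Big[ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big]; \quad \text{for all } s \in S.$$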

이전 섹션 (17.2.2) 또는 (17.2.5)의 개발은 각각 최적 가치 함수 V* 또는 행동-가치 함수 Q*를 계산하는 알고리즘으로 바뀔 수 있습니다. 그것을 관찰하십시오

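For a deterministic optimal policy π*, since there is only one action that can be taken at state s, we can also write

$$V^*(s) = \max_{a \in A} \Big\{ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big\}$$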

결정론적 최적 정책 π*의 경우 상태 s에서 취할 수 있는 행동은 단 하나뿐이므로 다음과 같이 쓸 수도 있습니다.

for all states s∈S. This identity is called the “principle of dynamic programming” (Bellman, 1952, Bellman, 1957). It was formulated by Richard Bellman in 1950s and we can remember it as “the remainder of an optimal trajectory is also optimal”.

모든 상태 s∈S에 대해. 이러한 정체성을 "동적 프로그래밍의 원리"라고 합니다(Bellman, 1952, Bellman, 1957). 이는 1950년대 Richard Bellman에 의해 공식화되었으며 "최적 궤적의 나머지 부분도 최적"이라고 기억할 수 있습니다.

Principle of Dynamic Programming 이란?

The "Principle of Dynamic Programming" is a fundamental concept in the field of reinforcement learning and optimization, particularly in the context of solving problems with overlapping subproblems, such as many dynamic programming and reinforcement learning problems. It was first introduced by Richard Bellman in the mid-20th century.

'동적 프로그래밍 원리(Principle of Dynamic Programming)'는 강화 학습과 최적화 분야에서의 기본 개념으로, 많은 동적 프로그래밍과 강화 학습 문제와 같이 하위 문제가 겹치는 문제를 해결하기 위한 것입니다. 이 원리는 리처드 벨먼(Richard Bellman)에 의해 20세기 중반에 처음 소개되었습니다.

Dynamic programming is a mathematical optimization technique used to solve problems by breaking them down into smaller, overlapping subproblems and solving each subproblem only once, storing the results to avoid redundant computations. The Principle of Dynamic Programming consists of the following key components:

동적 프로그래밍은 문제를 더 작고 겹치는 하위 문제로 분해하고 각 하위 문제를 한 번만 해결하여 결과를 중복 계산하지 않도록 하는 수학적 최적화 기술입니다. 동적 프로그래밍의 원리는 다음과 같은 주요 구성 요소로 구성됩니다:

Optimality Principle: This principle states that an optimal solution to a larger problem can be constructed from optimal solutions to its smaller subproblems. In other words, it suggests that when solving a complex problem, you can build the solution incrementally by solving smaller subproblems optimally and then combining their solutions.

최적성 원리: 이 원리는 큰 문제의 최적 해결책을 더 작은 하위 문제의 최적 해결책으로 구성할 수 있다고 말합니다. 다시 말해, 복잡한 문제를 해결할 때, 더 작은 하위 문제를 최적으로 해결하고 그들의 해결책을 결합하여 솔루션을 점진적으로 구성할 수 있다는 것을 제안합니다.
Overlapping Subproblems: Dynamic programming is particularly useful when the problem can be divided into subproblems that are reused multiple times during the solution process. By solving these subproblems once and storing their solutions, you can avoid redundant calculations.

겹치는 하위 문제: 동적 프로그래밍은 문제를 해결하는 동안 여러 번 재사용되는 하위 문제로 나눌 수 있는 경우 특히 유용합니다. 이러한 하위 문제를 한 번 해결하고 그 결과를 저장하여 중복 계산을 피할 수 있습니다.
Memoization: To implement dynamic programming efficiently, you often use memoization, which involves storing the results of solved subproblems in a table or cache. When a subproblem is encountered again, you can simply look up its solution in the table instead of recomputing it.

메모이제이션: 동적 프로그래밍을 효율적으로 구현하기 위해 해결된 하위 문제의 결과를 테이블이나 캐시에 저장하는 메모이제이션을 종종 사용합니다. 하위 문제가 다시 발견되면 결과를 다시 계산하는 대신 테이블에서 해결책을 찾을 수 있습니다.
Bottom-Up or Top-Down Approach: Dynamic programming can be approached in two main ways: bottom-up and top-down. In the bottom-up approach, you start by solving the smallest subproblems and progressively build up to the larger problem. In the top-down approach (commonly implemented with memoization), you start with the larger problem and recursively solve smaller subproblems, storing their results along the way.

Dynamic programming is used to solve a wide range of optimization problems, such as finding the shortest path in a graph (e.g., Dijkstra's algorithm), solving the knapsack problem, and in reinforcement learning for finding optimal policies and value functions. It's a powerful technique for tackling complex problems by efficiently reusing solutions to subproblems, and it forms the foundation for many algorithms in computer science and operations research.

동적 프로그래밍은 그래프에서 최단 경로를 찾는 문제(예: Dijkstra 알고리즘), 배낭 문제를 해결하는 문제 및 강화 학습에서 최적 정책과 가치 함수를 찾는 데 사용됩니다. 이것은 하위 문제의 솔루션을 효율적으로 재사용함으로써 복잡한 문제에 접근하는 강력한 기술이며, 컴퓨터 과학과 운영 연구의 많은 알고리즘의 기초를 형성합니다.
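The difference between the top-down (memoized) and bottom-up styles can be seen on a standard toy problem. Fibonacci numbers are not part of the original text; they are used here only as a small illustration of overlapping subproblems.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_top_down(n):
    """Top-down: recurse from the large problem, caching each subproblem."""
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

def fib_bottom_up(n):
    """Bottom-up: start from the smallest subproblems and build upwards."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev

a = fib_top_down(30)
b = fib_bottom_up(30)
```

Value Iteration, discussed next in the chapter, is in the bottom-up spirit: it repeatedly sweeps over all states and reuses the previous sweep's values instead of recursing.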

Bellman Equation 이란?

The Bellman Equation is a fundamental concept in the field of dynamic programming and reinforcement learning. It is named after the mathematician and computer scientist Richard Bellman. The Bellman Equation is a recursive equation used to express the total expected reward (or value) of being in a particular state of a system and taking a specific action, considering all possible future states and actions.

벨만 방정식(Bellman Equation)**은 동적 프로그래밍과 강화 학습 분야에서의 기본 개념입니다. 이 방정식은 수학자이자 컴퓨터 과학자인 리처드 벨만(Richard Bellman)의 이름에서 따왔습니다. 벨만 방정식은 특정 상태(state)에 있을 때 특정 동작(action)을 취하고 모든 가능한 미래 상태와 동작을 고려하여 예상 보상(또는 가치)의 총합을 표현하는 재귀적인 방정식입니다.

In its simplest form, the Bellman Equation is expressed as:

$$V(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big]$$

Where:

$V(s)$ represents the value of being in state $s$, which is the expected cumulative reward from that state onward.
$a$ represents the action taken in state $s$.
$R(s, a)$ is the immediate reward obtained after taking action $a$ in state $s$.
$\gamma$ is the discount factor, which scales the importance of future rewards relative to immediate rewards.
$\sum_{s'}$ represents a sum over all possible future states $s'$ that could be reached from state $s$ by taking action $a$.
$P(s' \mid s, a)$ is the probability of transitioning from state $s$ to state $s'$ when taking action $a$.
$V(s')$ is the value of the next state $s'$.

The Bellman Equation provides a way to compute the value of each state in a sequential decision-making process, such as a Markov Decision Process (MDP). Solving the Bellman Equation is essential for finding the optimal policy, which is a mapping of states to actions that maximizes the expected cumulative reward over time.

벨만 방정식은 순차적 의사 결정 문제, 예를 들어 마르코프 결정 과정(Markov Decision Process, MDP)에서 각 상태의 가치를 계산하는 방법을 제공합니다. 벨만 방정식을 해결하는 것은 최적 정책을 찾는 데 필수적이며, 이 정책은 시간에 따른 누적 보상을 최대화하는 상태에서 동작으로의 매핑입니다.

The Bellman Equation is a key concept in reinforcement learning, where agents learn to make decisions by iteratively updating their value estimates based on observed rewards and transitions. It is a cornerstone of algorithms like Q-learning and the Bellman equation-based methods, which are widely used in artificial intelligence and robotics for solving sequential decision-making problems.

벨만 방정식은 강화 학습에서 중요한 개념으로, 에이전트는 관찰된 보상과 전이에 기반하여 가치 추정치를 반복적으로 업데이트하는 방식으로 의사 결정을 내리는 방법을 학습합니다. 이 방정식은 Q-러닝(Q-learning)과 벨만 방정식 기반 방법과 같은 알고리즘에서 핵심 개념으로 사용되며, 순차적 의사 결정 문제를 해결하기 위해 인공 지능과 로봇 공학 분야에서 널리 사용됩니다.
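As a small worked instance of a single Bellman backup, V(s) = max over a of [R(s,a) + γ Σ P(s'|s,a) V(s')], consider a state s with two actions and γ = 0.9. The numbers are made up for illustration: action a1 gives reward 1 and leads with certainty to a state of value 10, while action a2 gives reward 2 and leads with certainty to a state of value 5.

$$V(s) = \max\{\, 1 + 0.9 \cdot 10,\;\; 2 + 0.9 \cdot 5 \,\} = \max\{\, 10,\; 6.5 \,\} = 10$$

The larger immediate reward of a2 does not compensate for the lower value of the state it leads to, so the backup selects a1.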

17.2.6. Value Iteration

We can turn the principle of dynamic programming into an algorithm for finding the optimal value function called value iteration. The key idea behind value iteration is to think of this identity as a set of constraints that tie together V*(s) at different states s∈S. We initialize the value function to some arbitrary values V0(s) for all states s∈S. At the kth iteration, the Value Iteration algorithm updates the value function as

$$V_{k+1}(s) = \max_{a \in A} \Big\{ r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V_k(s') \Big\}; \quad \text{for all } s \in S.$$

동적 프로그래밍의 원리를 가치 반복이라는 최적의 가치 함수를 찾는 알고리즘으로 바꿀 수 있습니다. Value Iteration의 핵심 아이디어는 이 동일성을 서로 다른 상태 s∈S에서 V*(s)를 함께 묶는 제약 조건 집합으로 생각하는 것입니다. 우리는 모든 상태 s∈S에 대해 임의의 값 V0(s)로 가치 함수를 초기화합니다. k번째 반복에서 Value Iteration 알고리즘은 값 함수를 다음과 같이 업데이트합니다.

It turns out that as k→∞ the value function estimated by the Value Iteration algorithm converges to the optimal value function irrespective of the initialization V0,

$$V^*(s) = \lim_{k \to \infty} V_k(s); \quad \text{for all } s \in S.$$

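The same Value Iteration algorithm can be equivalently written using the action-value function as

$$Q_{k+1}(s, a) = r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \max_{a' \in A} Q_k(s', a'); \quad \text{for all } s \in S, a \in A.$$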

동일한 Value Iteration 알고리즘은 다음과 같이 action-value 함수를 사용하여 동등하게 작성할 수 있습니다.

In this case we initialize Q0(s,a) to some arbitrary values for all s∈S and a∈A. Again we have Q*(s,a)=lim k→∞ Qk(s,a) for all s∈S and a∈A.
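Both forms of the update can be sketched in NumPy and compared. This is a minimal illustration on a made-up 2-state, 2-action MDP, not the frozen-lake implementation from the chapter; the function names, the arrays `P` and `r`, and the fixed iteration count are assumptions.

```python
import numpy as np

def value_iteration(P, r, gamma, num_iters=500):
    """Iterate V_{k+1}(s) = max_a [ r(s,a) + gamma * sum_s2 P(s2|s,a) V_k(s2) ]."""
    V = np.zeros(r.shape[0])       # arbitrary initialization V_0
    for _ in range(num_iters):
        Q = r + gamma * P @ V      # one-step lookahead for every (s, a)
        V = Q.max(axis=1)          # best action per state
    return V

def q_value_iteration(P, r, gamma, num_iters=500):
    """Same algorithm, written on the action-value function."""
    Q = np.zeros_like(r)           # arbitrary initialization Q_0
    for _ in range(num_iters):
        Q = r + gamma * P @ Q.max(axis=1)
    return Q

# Hypothetical MDP: P[s, a, s2] transition probabilities, r[s, a] rewards
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9

V_star = value_iteration(P, r, gamma)
Q_star = q_value_iteration(P, r, gamma)
pi_star = Q_star.argmax(axis=1)    # greedy (deterministic) policy
```

The two runs should agree in the sense that the largest Q-value in each state equals that state's value, and the converged V should be a fixed point of the update. A practical implementation would usually stop when successive iterates differ by less than a tolerance rather than after a fixed number of sweeps.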

Value Iteration 이란?

"Value Iteration" is an iterative algorithm used in reinforcement learning and dynamic programming to find the optimal value function for a Markov decision process (MDP). It's a fundamental method for solving problems where an agent makes decisions in an uncertain environment over a sequence of discrete time steps.

'가치 반복(Value Iteration)'은 강화 학습과 동적 프로그래밍에서 사용되는 반복 알고리즘으로, 마르코프 결정 프로세스(MDP)에 대한 최적 가치 함수를 찾는 데 사용됩니다. 이 알고리즘은 에이전트가 불확실한 환경에서 일련의 이산 시간 단계 동안 결정을 내리는 문제를 해결하는 데 사용되는 기본적인 방법입니다.

Here's an overview of the key concepts related to Value Iteration:

가치 반복과 관련된 주요 개념을 살펴보겠습니다:

Markov Decision Process (MDP): Value Iteration is typically applied to problems modeled as MDPs. An MDP consists of a set of states, a set of actions, transition probabilities, rewards, and a discount factor. It is a mathematical framework for representing decision-making problems under uncertainty.

Value Function (V(s)): The goal of Value Iteration is to compute the value function V(s), which represents the expected cumulative reward an agent can achieve starting from each state s while following an optimal policy. V(s) quantifies the desirability of being in a particular state.

Bellman Equation: Value Iteration relies on the Bellman equation, which relates the value of a state to the values of the states reachable from it. The Bellman equation for state s is written as V(s) = max_a [R(s, a) + γ * Σ_{s'} P(s' | s, a) * V(s')], where the maximum is taken over actions a and the sum over successor states s'. Here R(s, a) is the immediate reward of taking action a in state s, γ (gamma) is the discount factor, P(s' | s, a) is the probability of reaching state s' from state s by taking action a, and V(s') is the value of state s'.

Algorithm: Value Iteration starts with an initial estimate of the value function and iteratively updates it using the Bellman equation until it converges to the optimal value function. In each iteration, it recomputes the value of every state from the current estimates of its successor states' values. The process continues until the values no longer change significantly.

Policy Extraction: Once the optimal value function is found, a corresponding optimal policy can be extracted by selecting, in each state, the action that maximizes the right-hand side of the Bellman equation.

Value Iteration is a powerful method for finding the optimal policy in MDPs, where the agent aims to maximize its expected cumulative reward over time. It is known for its simplicity and efficiency when the problem's state and action spaces are not too large. It may not be suitable for problems with very large state spaces, however: with dense transition probabilities, each sweep over the states costs on the order of |S|² · |A| operations.
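The concepts above (Bellman backup, iterating until the values stop changing, and greedy policy extraction) can be sketched in a few lines of NumPy. This is a minimal illustration, not the D2L implementation: the array layout `P[s, a, s']` and `R[s, a]`, the tolerance `tol`, and the two-state MDP are all assumptions made up for this example.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Value iteration on a dense MDP.

    P[s, a, s'] holds transition probabilities and R[s, a] immediate rewards
    (a hypothetical layout chosen for this sketch).
    Returns the value function and a greedy policy.
    """
    num_states = P.shape[0]
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Policy extraction: pick the action that maximizes the backup in each state.
    policy = (R + gamma * (P @ V)).argmax(axis=1)
    return V, policy

# Hypothetical 2-state, 2-action MDP.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1 is absorbing under both actions
])
R = np.array([
    [0.0, 1.0],  # reward 1 for moving from state 0 to state 1
    [0.0, 0.0],
])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In this toy problem the only reward comes from leaving state 0, so the greedy policy chooses action 1 there and the value of state 0 is 1.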

17.2.7. Policy Evaluation

Value Iteration enables us to compute the optimal value function, i.e., V^{π*} of the optimal deterministic policy π*. We can also use similar iterative updates to compute the value function associated with any other, potentially stochastic, policy π. We again initialize V^π_0(s) to some arbitrary values for all states s∈S and at the kth iteration, perform the updates

V^π_{k+1}(s) = ∑_a π(a∣s) [ r(s,a) + γ ∑_{s′∈S} P(s′∣s,a) V^π_k(s′) ], for all s∈S.

This algorithm is known as policy evaluation and is useful to compute the value function given the policy. Again, it turns out that as k→∞ these updates converge to the correct value function irrespective of the initialization V_0,

V^π(s) = lim_{k→∞} V^π_k(s) for all states s∈S.

The algorithm for computing the action-value function Q^π(s,a) of a policy π is analogous.
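As a rough illustration of policy evaluation for a stochastic policy, the sketch below repeats the expectation backup a fixed number of times. The array layout (`P[s, a, s']`, `R[s, a]`, `pi[s, a]`), the `V_init` parameter, and the toy MDP are assumptions introduced for this example only.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma, V_init=None, num_iters=1000):
    """Iterative policy evaluation.

    pi[s, a] is the probability of taking action a in state s.
    V_init is an optional starting guess (hypothetical parameter for this sketch).
    """
    num_states = P.shape[0]
    V = np.zeros(num_states) if V_init is None else np.array(V_init, dtype=float)
    for _ in range(num_iters):
        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        # Average over actions with the policy's probabilities instead of taking a max.
        V = (pi * Q).sum(axis=1)
    return V

# Hypothetical 2-state, 2-action MDP: state 1 is absorbing with zero reward.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
R = np.array([
    [0.0, 1.0],
    [0.0, 0.0],
])
pi = np.full((2, 2), 0.5)  # uniformly random policy

V_a = policy_evaluation(P, R, pi, gamma=0.9)
V_b = policy_evaluation(P, R, pi, gamma=0.9, V_init=[100.0, -50.0])
print(V_a, V_b)
```

Both runs end at the same values (about 0.909 for state 0 and 0 for state 1), which illustrates that the limit does not depend on the initialization.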

What is Policy Evaluation?

"Policy Evaluation" is a fundamental step in reinforcement learning, particularly in solving Markov decision processes (MDPs). It is the process of determining how good a given policy is by estimating the expected cumulative reward an agent can achieve when following that policy in a given environment.

Here's a breakdown of what Policy Evaluation involves:

Policy: A policy in reinforcement learning is a strategy that defines the agent's behavior in an environment. It specifies the mapping from states to actions, indicating what action to take in each state.

Value Function: Policy Evaluation aims to compute the value function associated with a given policy. The value function, denoted V(s), represents the expected cumulative reward an agent can obtain when starting from a particular state s and following the specified policy.

Bellman Expectation Equation: Policy Evaluation relies on the Bellman Expectation Equation, which expresses the value of a state in terms of the values of its successor states. For a deterministic policy π it is typically written as V(s) = R(s, π(s)) + γ * Σ_{s'} P(s' | s, π(s)) * V(s'), where R(s, π(s)) is the immediate reward for taking the policy's action in state s, γ (gamma) is the discount factor, P(s' | s, π(s)) is the probability of reaching state s' from state s when following the policy, and V(s') is the value of state s'.

Iterative Process: Policy Evaluation is often an iterative process. It starts with an initial estimate of the value function and repeatedly updates the values of states using the Bellman Expectation Equation until the values converge or change very little between iterations.

Convergence: Policy Evaluation continues until the value function estimates stabilize, meaning that further iterations do not significantly change the values. This indicates that the estimates have converged to the true value function for the given policy.
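For a fixed deterministic policy, the Bellman Expectation Equation is linear in V, so on a small MDP it can also be solved directly rather than iterated. The sketch below compares both routes. The arrays `P`, `R`, and `policy` describe a made-up two-state problem and are assumptions of this example.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[s, a, s'] and R[s, a].
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
R = np.array([
    [0.0, 1.0],
    [0.0, 0.0],
])
gamma = 0.9
policy = np.array([1, 0])  # deterministic policy: action index chosen in each state

states = np.arange(P.shape[0])
P_pi = P[states, policy]  # P_pi[s, s'] = P(s' | s, policy[s])
R_pi = R[states, policy]  # R_pi[s]     = R(s, policy[s])

# Direct solution of V = R_pi + gamma * P_pi @ V, i.e. (I - gamma * P_pi) V = R_pi.
V_exact = np.linalg.solve(np.eye(len(states)) - gamma * P_pi, R_pi)

# Iterative solution using the same equation as an update rule.
V_iter = np.zeros(len(states))
for _ in range(500):
    V_iter = R_pi + gamma * P_pi @ V_iter

print(V_exact, V_iter)
```

The direct solve is convenient for checking an iterative implementation on small problems. For large state spaces the iterative update is usually the practical choice.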

Policy Evaluation serves as a crucial component in many reinforcement learning algorithms. Policy Iteration, for example, alternates between evaluating the current policy and improving it. By accurately estimating the value of each state under a policy, it helps the agent make informed decisions and find the optimal policy that maximizes the expected cumulative reward in the environment.

17.2.8. Implementation of Value Iteration

We next show how to implement Value Iteration for a navigation problem called FrozenLake from OpenAI Gym. We first need to set up the environment as shown in the following code.

%matplotlib inline
import random
import numpy as np
from d2l import torch as d2l

seed = 0  # Random number generator seed
gamma = 0.95  # Discount factor
num_iters = 10  # Number of iterations
random.seed(seed)  # Set the random seed to ensure results can be reproduced
np.random.seed(seed)

# Now set up the environment
env_info = d2l.make_env('FrozenLake-v1', seed=seed)

This code sets up the reinforcement learning problem and initializes the environment. Line by line:

%matplotlib inline: A Jupyter notebook magic command that renders plots inline in the notebook output.
import random: Imports Python's built-in random module, which provides functions for random number generation.
import numpy as np: Imports NumPy under the alias np. NumPy is a scientific computing library for Python built around multidimensional arrays.
from d2l import torch as d2l: Imports the torch module of the d2l library under the alias d2l. The library provides the examples and utility functions for "Dive into Deep Learning".
seed = 0: The seed for the random number generators. The seed determines the generator's initial state, so the same seed yields the same sequence of random numbers and the results can be reproduced.
gamma = 0.95: The discount factor, a real number between 0 and 1 that scales down future rewards relative to immediate ones.
num_iters = 10: The number of Value Iteration sweeps to run, here 10. Each sweep updates every state once; these are not episodes of interaction with the environment.
random.seed(seed): Seeds Python's random number generator for reproducibility.
np.random.seed(seed): Seeds NumPy's random number generator, also for reproducibility.
env_info = d2l.make_env('FrozenLake-v1', seed=seed): Uses the d2l helper make_env to create the "FrozenLake-v1" environment with the seed set above. FrozenLake is one of the environments provided by OpenAI Gym, in which an agent must cross a frozen lake.

The environment is now set up and can be used to train and evaluate the agent.

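In the FrozenLake environment, the robot moves on a 4×4 grid (the cells are the states) using the actions "up" (↑), "down" (↓), "left" (←), and "right" (→). The environment contains a number of hole (H) cells and frozen (F) cells as well as a goal cell (G), all of which are unknown to the robot. To keep the problem simple, we assume the robot has reliable actions: for every s∈S and a∈A there is a single next state s′ with P(s′∣s,a)=1. If the robot reaches the goal, the trial ends and the robot receives a reward of 1 irrespective of the action. The reward at any other state is 0 for all actions. The objective of the robot is to learn a policy that reaches the goal location (G) from a given start location (S) (this is s_0) so as to maximize the return.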
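To make the "reliable actions" assumption concrete, the following sketch writes a deterministic transition function for a 4×4 grid. The action encoding (0 = left, 1 = down, 2 = right, 3 = up) and the helper names `next_state` and `transition_prob` are assumptions of this example; holes and the goal are deliberately not modeled.

```python
GRID = 4

# Assumed action encoding for this sketch: 0=left, 1=down, 2=right, 3=up.
# Each entry is (row delta, column delta).
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}

def next_state(s, a, size=GRID):
    """Deterministic move on a size x size grid; moves off the edge leave the state unchanged."""
    row, col = divmod(s, size)
    d_row, d_col = MOVES[a]
    row = min(max(row + d_row, 0), size - 1)
    col = min(max(col + d_col, 0), size - 1)
    return row * size + col

def transition_prob(s_next, s, a):
    """P(s' | s, a) is 1 for the single reachable next state and 0 otherwise."""
    return 1.0 if s_next == next_state(s, a) else 0.0

print(next_state(0, 2))  # move right from the top-left corner
print(sum(transition_prob(s_next, 5, 3) for s_next in range(GRID * GRID)))
```

States are numbered row by row, so state 0 is the top-left cell and state 15 the bottom-right cell.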

The following function implements Value Iteration, where env_info contains MDP and environment related information, gamma is the discount factor, and num_iters is the number of iterations to run:

def value_iteration(env_info, gamma, num_iters):
    env_desc = env_info['desc']  # 2D array shows what each item means
    prob_idx = env_info['trans_prob_idx']
    nextstate_idx = env_info['nextstate_idx']
    reward_idx = env_info['reward_idx']
    num_states = env_info['num_states']
    num_actions = env_info['num_actions']
    mdp = env_info['mdp']

    V  = np.zeros((num_iters + 1, num_states))
    Q  = np.zeros((num_iters + 1, num_states, num_actions))
    pi = np.zeros((num_iters + 1, num_states))

    for k in range(1, num_iters + 1):
        for s in range(num_states):
            for a in range(num_actions):
                # Calculate \sum_{s'} p(s'\mid s,a) [r + \gamma v_k(s')]
                for pxrds in mdp[(s,a)]:
                    # mdp(s,a): [(p1,next1,r1,d1),(p2,next2,r2,d2),..]
                    pr = pxrds[prob_idx]  # p(s'\mid s,a)
                    nextstate = pxrds[nextstate_idx]  # Next state
                    reward = pxrds[reward_idx]  # Reward
                    Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate])
            # Record max value and max action
            V[k,s] = np.max(Q[k,s,:])
            pi[k,s] = np.argmax(Q[k,s,:])
    d2l.show_value_function_progress(env_desc, V[:-1], pi[:-1])

value_iteration(env_info=env_info, gamma=gamma, num_iters=num_iters)

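This code uses the Value Iteration algorithm to find the optimal value function and the optimal policy for the environment. Value Iteration is a dynamic programming technique: it repeatedly updates the value function and reads off a greedy policy from it.

The main points are as follows.

env_desc, prob_idx, nextstate_idx, reward_idx, num_states, num_actions, mdp: These variables hold information about the environment.
- env_desc: A 2D array describing the layout of the environment; each element gives the type of the corresponding cell.
- prob_idx, nextstate_idx, reward_idx: The positions of the transition probability, next state, and reward within each transition tuple.
- num_states: The number of states.
- num_actions: The number of actions.
- mdp: A dictionary that maps each (state, action) pair to its list of possible transitions.
V, Q, pi: The value function, the action-value function, and the policy. Each array has one slice per iteration, so the progress of the algorithm can be inspected afterwards.
for k in range(1, num_iters + 1):: Runs the algorithm for the specified number of iterations (num_iters).
for s in range(num_states):: Loops over all states.
for a in range(num_actions):: Loops over all possible actions.
for pxrds in mdp[(s,a)]:: Loops over the transitions stored in mdp for this state and action. Each entry holds the transition probability, next state, reward, and done flag.
pr, nextstate, reward: The transition probability, next state, and reward read from the current entry.
Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate]): Accumulates the action-value Q. This is the Bellman backup, using the values from the previous iteration.
V[k,s] = np.max(Q[k,s,:]): Sets the value of the state to the largest action-value.
pi[k,s] = np.argmax(Q[k,s,:]): Records the action that attains that maximum as the policy for the state.
d2l.show_value_function_progress(env_desc, V[:-1], pi[:-1]): Visualizes how the value function and policy evolve over the iterations. Note that the slices V[:-1] and pi[:-1] leave out the final iteration.
value_iteration(env_info=env_info, gamma=gamma, num_iters=num_iters): Calls the function to run Value Iteration.

In this way Value Iteration finds the optimal value function and the optimal policy for the given environment. The optimal policy tells the agent which action to choose in each state.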
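The inner loop of the function can be isolated as a single Bellman backup over the list of transition tuples. The sketch below uses a hand-written `mdp` dictionary in the same (probability, next state, reward, done) layout. The index constants and the numbers are invented for illustration, whereas the real code reads the indices from `env_info`.

```python
gamma = 0.95

# Hypothetical tuple layout; the d2l helper supplies the real indices via env_info.
PROB, NEXT, REWARD, DONE = 0, 1, 2, 3

# mdp[(s, a)] is a list of (probability, next_state, reward, done) tuples.
mdp = {
    (0, 0): [(1.0, 0, 0.0, False)],
    (0, 1): [(0.8, 1, 1.0, True), (0.2, 0, 0.0, False)],
}
V_prev = [0.5, 2.0]  # value estimates from the previous iteration

def backup(mdp, V_prev, s, a, gamma):
    """Q(s, a) = sum over transitions of p * (r + gamma * V_prev[s'])."""
    return sum(
        t[PROB] * (t[REWARD] + gamma * V_prev[t[NEXT]])
        for t in mdp[(s, a)]
    )

q_values = [backup(mdp, V_prev, 0, a, gamma) for a in (0, 1)]
best_value = max(q_values)
best_action = q_values.index(best_value)
print(q_values, best_value, best_action)
```

Like the function above, this sketch does not use the done flag, so it bootstraps from the next state's value even on terminating transitions.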

The above pictures show the policy (the arrow indicates the action) and the value function (the change in color shows how the value function changes over time, from the initial value shown in dark colors to the optimal value shown in light colors). As we see, Value Iteration finds the optimal value function after 10 iterations and the goal state (G) can be reached starting from any state as long as it is not an H cell. Another interesting aspect of the implementation is that in addition to finding the optimal value function, we also automatically found the optimal policy π* corresponding to this value function.

17.2.9. Summary

The main idea behind the Value Iteration algorithm is to use the principle of dynamic programming to find the optimal average return obtained from a given state. Note that implementing the Value Iteration algorithm requires that we know the Markov decision process (MDP), e.g., the transition and reward functions, completely.


Dive into Deep Learning/D2L Reinforcement Learning

D2L - 17.1. Markov Decision Process (MDP)

2023. 9. 5. 11:30 | Posted by 솔웅

https://d2l.ai/chapter_reinforcement-learning/mdp.html

17.1. Markov Decision Process (MDP) — Dive into Deep Learning 1.0.3 documentation

d2l.ai

17.1. Markov Decision Process (MDP)

In this section, we will discuss how to formulate reinforcement learning problems using Markov decision processes (MDPs) and describe various components of MDPs in detail.

17.1.1. Definition of an MDP

A Markov decision process (MDP) (Bellman, 1957) is a model for how the state of a system evolves as different actions are applied to the system. A few different quantities come together to form an MDP.

Fig. 17.1.1 A simple gridworld navigation task where the robot not only has to find its way to the goal location (shown as a green house) but also has to avoid trap locations (shown as red cross signs).

Let S be the set of states in the MDP. As a concrete example, see Fig. 17.1.1 for a robot that is navigating a gridworld. In this case, S corresponds to the set of locations that the robot can be at any given timestep.

Let A be the set of actions that the robot can take at each state, e.g., “go forward”, “turn right”, “turn left”, “stay at the same location”, etc. Actions can change the current state of the robot to some other state within the set S.

It may happen that we do not know how the robot moves exactly but only know it up to some approximation. We model this situation in reinforcement learning as follows: if the robot takes an action “go forward”, there might be a small probability that it stays at the current state, another small probability that it “turns left”, etc. Mathematically, this amounts to defining a “transition function” T:S×A×S→[0,1] such that T(s,a,s′)=P(s′∣s,a) using the conditional probability of reaching a state s′ given that the robot was at state s and took an action a. The transition function is a probability distribution and we therefore have ∑_{s′∈S} T(s,a,s′)=1 for all s∈S and a∈A, i.e., the robot has to go to some state if it takes an action.

We now construct a notion of which actions are useful and which ones are not, using the concept of a “reward” r:S×A→ℝ. We say that the robot gets a reward r(s,a) if the robot takes an action a at state s. If the reward r(s,a) is large, this indicates that taking the action a at state s is more useful to achieving the goal of the robot, i.e., going to the green house. If the reward r(s,a) is small, then action a is less useful to achieving this goal. It is important to note that the reward is designed by the user (the person who creates the reinforcement learning algorithm) with the goal in mind.
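The four ingredients above can be written down as arrays for a very small example. This is only a sketch: the three-state corridor, the probabilities, and the reward values are invented for illustration and do not come from the figure.

```python
import numpy as np

num_states, num_actions = 3, 2  # hypothetical corridor: states 0, 1, 2; actions 0="go forward", 1="stay"

# T[s, a, s'] = P(s' | s, a)
T = np.zeros((num_states, num_actions, num_states))
T[0, 0] = [0.1, 0.9, 0.0]  # "go forward" usually succeeds, but sometimes the robot stays put
T[1, 0] = [0.0, 0.1, 0.9]
T[2, 0] = [0.0, 0.0, 1.0]  # state 2 is the end of the corridor
for s in range(num_states):
    T[s, 1, s] = 1.0       # "stay" always keeps the current state

# r[s, a]: reward designed by the user with the goal (state 2) in mind.
r = np.zeros((num_states, num_actions))
r[1, 0] = 1.0  # moving forward from state 1 heads into the goal

# Every (s, a) pair must define a probability distribution over next states.
assert np.allclose(T.sum(axis=2), 1.0)

# Sampling one transition from state 0 under "go forward".
rng = np.random.default_rng(0)
s_next = rng.choice(num_states, p=T[0, 0])
print(s_next)
```

The normalization check corresponds to the requirement that the robot has to go to some state whenever it takes an action.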

17.1.2. Return and Discount Factor

The different components above together form a Markov decision process (MDP)

MDP: (S, A, T, r).

Let’s now consider the situation when the robot starts at a particular state s_0∈S and continues taking actions to result in a trajectory

τ = (s_0, a_0, r_0, s_1, a_1, r_1, s_2, a_2, r_2, …).

At each time step t the robot is at a state s_t and takes an action a_t which results in a reward r_t=r(s_t,a_t). The return of a trajectory is the total reward obtained by the robot along such a trajectory

R(τ) = r_0 + r_1 + r_2 + ⋯.

The goal in reinforcement learning is to find a trajectory that has the largest return.

Think of the situation when the robot continues to travel in the gridworld without ever reaching the goal location. The sequence of states and actions in a trajectory can be infinitely long in this case and the return of any such infinitely long trajectory will be infinite. In order to keep the reinforcement learning formulation meaningful even for such trajectories, we introduce the notion of a discount factor γ<1. We write the discounted return as

R(τ) = r_0 + γ r_1 + γ^2 r_2 + ⋯ = ∑_{t=0}^{∞} γ^t r_t.

Note that if γ is very small, the rewards earned by the robot in the far future, say t=1000, are heavily discounted by the factor γ^1000. This encourages the robot to select short trajectories that achieve its goal, namely that of going to the green house in the gridworld example (see Fig. 17.1.1). For large values of the discount factor, say γ=0.99, the robot is encouraged to explore and then find the best trajectory to go to the goal location.
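The effect of the discount factor is easy to check numerically. The sketch below computes the discounted return of a finite reward sequence. The function name `discounted_return` and the reward sequences are made up for this example.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last reward backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# A reward of 1 that arrives two steps in the future.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
print(discounted_return([0.0, 0.0, 1.0], gamma=1.0))

# With gamma < 1 a long stream of unit rewards stays bounded by 1 / (1 - gamma).
long_run = discounted_return([1.0] * 10_000, gamma=0.9)
print(long_run)

# How strongly a reward at t=1000 is discounted for two choices of gamma.
print(0.99 ** 1000, 0.5 ** 1000)
```

With γ=0.5 the delayed reward is worth a quarter of its undiscounted value, and the long stream of unit rewards approaches but does not exceed 1/(1−γ)=10 when γ=0.9.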

17.1.3. Discussion of the Markov Assumption

Let us think of a new robot where the state s_t is the location as above but the action a_t is the acceleration that the robot applies to its wheels instead of an abstract command like “go forward”. If this robot has some non-zero velocity at state s_t, then the next location s_{t+1} is a function of the past location s_t, the acceleration a_t, and also the velocity of the robot at time t, which is proportional to s_t − s_{t−1}. This indicates that we should have

s_{t+1} = some function(s_t, a_t, s_{t−1});

the “some function” in our case would be Newton’s law of motion. This is quite different from our transition function that simply depends upon s_t and a_t.

Markov systems are all systems where the next state s_{t+1} is only a function of the current state s_t and the action a_t taken at the current state. In Markov systems, the next state does not depend on which actions were taken in the past or the states that the robot was at in the past. For example, the new robot that has acceleration as the action above is not Markovian because the next location s_{t+1} depends upon the previous state s_{t−1} through the velocity. It may seem that the Markovian nature of a system is a restrictive assumption, but it is not so. Markov Decision Processes are still capable of modeling a very large class of real systems. For example, for our new robot, if we choose our state s_t to be the tuple (location, velocity), then the system is Markovian because its next state (location_{t+1}, velocity_{t+1}) depends only upon the current state (location_t, velocity_t) and the action a_t at the current state.
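The state-augmentation argument can be illustrated with a toy simulation. The update rule below is a simple discretization with a unit time step, chosen for this sketch. The function name `step` and the numbers are assumptions, not part of the original text.

```python
def step(state, acceleration, dt=1.0):
    """One transition of the augmented state (location, velocity).

    The next state depends only on the current state and the action,
    which is exactly the Markov property.
    """
    location, velocity = state
    new_location = location + velocity * dt
    new_velocity = velocity + acceleration * dt
    return (new_location, new_velocity)

# Two robots at the same location but with different velocities.
moving_right = step((0.0, 1.0), acceleration=0.5)
moving_left = step((0.0, -1.0), acceleration=0.5)
print(moving_right, moving_left)
```

Both robots start at location 0 and apply the same action, yet they end up at different locations. The location alone therefore cannot serve as a Markov state, while the pair (location, velocity) can.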

17.1.4. Summary

The reinforcement learning problem is typically modeled using Markov Decision Processes. A Markov decision process (MDP) is defined by a tuple of four entities (S,A,T,r) where S is the state space, A is the action space, T is the transition function that encodes the transition probabilities of the MDP, and r is the immediate reward obtained by taking an action at a particular state.

17.1.5. Exercises

1. Suppose that we want to design an MDP to model the MountainCar problem.
    1. What would be the set of states?
    2. What would be the set of actions?
    3. What would be the possible reward functions?
2. How would you design an MDP for an Atari game like Pong?
