
Openai cookbook : Azure OpenAI - How to fine-tune GPT-3 with Azure OpenAI (2023.03.19)



To use Azure OpenAI, you first need to be granted access to the service by Microsoft.

I applied for access for study purposes, and my request was denied.

If you actually intend to build a product with Azure OpenAI, it is worth submitting a request.

I have summarized the application process in the post below.

https://coronasdk.tistory.com/1304

 

Prerequisites for using Azure OpenAI - access request denied (coronasdk.tistory.com)

 

For now, I cannot do the hands-on practice myself, so I will study the material in the Cookbook.

 

https://github.com/openai/openai-cookbook/blob/main/examples/azure/finetuning.ipynb

 


 

Azure Fine tuning example

In this example we'll try to go over all operations that can be done using the Azure endpoints and their differences (if any) from the OpenAI endpoints.

 



This example focuses on finetuning but also touches on the majority of operations that are available using the API. This example is meant to be a quick way of showing simple operations and is not meant as a finetune model adaptation tutorial.

 


 

import openai
from openai import cli
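
A quick note of my own (not in the cookbook): this walkthrough targets the legacy pre-1.0 openai Python package, which is where openai.FineTune, openai.Deployment and openai.cli live. A minimal sanity check, assuming that package is installed:

# Sanity check (assumption: the legacy 0.x "openai" package, e.g. installed
# with `pip install "openai<1.0"`; the 1.x SDK reorganized these interfaces).
from importlib.metadata import version
print(version("openai"))            # e.g. '0.27.x'
print(hasattr(openai, "FineTune"))  # True on the 0.x package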

 

Setup

For the following sections to work properly we first have to set up a few things. Let's start with the api_base and api_version. To find your api_base go to https://portal.azure.com, find your resource and then under "Resource Management" -> "Keys and Endpoints" look for the "Endpoint" value.

 


 

openai.api_version = '2022-12-01'
openai.api_base = '' # Please add your endpoint here
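
For reference (my note, not something from the cookbook), the "Endpoint" value of an Azure OpenAI resource typically has the form below; replace the placeholder with your own resource name from the portal.

# Typical endpoint format (assumption; <your-resource-name> is a placeholder).
# openai.api_base = 'https://<your-resource-name>.openai.azure.com/'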

 

We next have to set up the api_type and api_key. We can either get the key from the portal or we can get it through Microsoft Active Directory Authentication. Depending on this, the api_type is either azure or azure_ad.

 


 

Setup: Portal

Let's first look at getting the key from the portal. Go to https://portal.azure.com, find your resource and then under "Resource Management" -> "Keys and Endpoints" look for one of the "Keys" values.

 


 

openai.api_type = 'azure'
openai.api_key = ''  # Please add your api key here
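
A common alternative (my addition, not part of the cookbook) is to read the key from an environment variable instead of pasting it into the notebook. A minimal sketch, assuming a variable named AZURE_OPENAI_API_KEY has been set:

# Sketch: load the key from the environment (AZURE_OPENAI_API_KEY is an
# assumed variable name, not something the cookbook defines).
import os
openai.api_type = 'azure'
openai.api_key = os.environ.get('AZURE_OPENAI_API_KEY', '')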

(Optional) Setup: Microsoft Active Directory Authentication

Let's now see how we can get a key via Microsoft Active Directory Authentication. Uncomment the following code if you want to use Active Directory Authentication instead of keys from the portal.

 


 

# from azure.identity import DefaultAzureCredential

# default_credential = DefaultAzureCredential()
# token = default_credential.get_token("https://cognitiveservices.azure.com/.default")

# openai.api_type = 'azure_ad'
# openai.api_key = token.token
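
One thing to keep in mind (my note, not from the cookbook): Azure AD access tokens expire after a while, so for long-running work you may want a small helper that re-fetches the token and updates openai.api_key. A rough sketch, kept commented out like the code above and assuming the azure-identity package:

# Sketch (assumption: azure-identity is installed; tokens returned by
# get_token() expire, so refresh before long-running calls).
# from azure.identity import DefaultAzureCredential
#
# def refresh_openai_token():
#     credential = DefaultAzureCredential()
#     token = credential.get_token("https://cognitiveservices.azure.com/.default")
#     openai.api_type = 'azure_ad'
#     openai.api_key = token.token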

 

Files

In the next section we will focus on the files operations: importing, listing, retrieving, deleting. For this we need to create 2 temporary files with some sample data. For the sake of simplicity, we will use the same data for training and validation.

 


 

import shutil
import json

training_file_name = 'training.jsonl'
validation_file_name = 'validation.jsonl'

sample_data = [{"prompt": "When I go to the store, I want an", "completion": "apple."},
    {"prompt": "When I go to work, I want a", "completion": "coffee."},
    {"prompt": "When I go home, I want a", "completion": "soda."}]

print(f'Generating the training file: {training_file_name}')
with open(training_file_name, 'w') as training_file:
    for entry in sample_data:
        json.dump(entry, training_file)
        training_file.write('\n')

print(f'Copying the training file to the validation file')
shutil.copy(training_file_name, validation_file_name)
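
To double-check that the JSONL files came out in the expected prompt/completion format, here is a small read-back snippet of my own (not in the cookbook); it just prints each JSON line of the training file.

# Quick format check: one {"prompt": ..., "completion": ...} object per line.
with open(training_file_name) as f:
    for line in f:
        print(json.loads(line))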

 

Files: Listing

List all of the uploaded files and check for the ones that are named "training.jsonl" or "validation.jsonl"

 


 

print('Checking for existing uploaded files.')
results = []
files = openai.File.list().data
print(f'Found {len(files)} total uploaded files in the subscription.')
for item in files:
    if item["filename"] in [training_file_name, validation_file_name]:
        results.append(item["id"])
print(f'Found {len(results)} already uploaded files that match our names.')

 

Files: Deleting

Let's now delete those found files (if any) since we're going to be re-uploading them next.

 


 

print(f'Deleting already uploaded files...')
for id in results:
    openai.File.delete(sid = id)

 

Files: Importing & Retrieving

Now, let's import our two files ('training.jsonl' and 'validation.jsonl') and keep those IDs since we're going to use them later for finetuning.

 



For this operation we are going to use the cli wrapper, which does a few more checks before uploading and also shows progress. In addition, after uploading we're going to check the status of our import until it has succeeded (or failed if something went wrong).

 


 

import time

def check_status(training_id, validation_id):
    train_status = openai.File.retrieve(training_id)["status"]
    valid_status = openai.File.retrieve(validation_id)["status"]
    print(f'Status (training_file | validation_file): {train_status} | {valid_status}')
    return (train_status, valid_status)

#importing our two files
training_id = cli.FineTune._get_or_upload(training_file_name, True)
validation_id = cli.FineTune._get_or_upload(validation_file_name, True)

#checking the status of the imports
(train_status, valid_status) = check_status(training_id, validation_id)

while train_status not in ["succeeded", "failed"] or valid_status not in ["succeeded", "failed"]:
    time.sleep(1)
    (train_status, valid_status) = check_status(training_id, validation_id)

 

Files: Downloading

Now let's download one of the files, the training file for example, to check that everything was in order during importing and all bits are there.

 


 

print(f'Downloading training file: {training_id}')
result = openai.File.download(training_id)
print(result.decode('utf-8'))

 

Finetune

In this section we are going to use the two training and validation files that we imported in the previous section, to train a finetune model.

 


 

 

Finetune: Adapt

First let's create the finetune adaptation job.


 

create_args = {
    "training_file": training_id,
    "validation_file": validation_id,
    "model": "babbage",
    "compute_classification_metrics": True,
    "classification_n_classes": 3,
    "n_epochs": 20,
    "batch_size": 3,
    "learning_rate_multiplier": 0.3
}
resp = openai.FineTune.create(**create_args)
job_id = resp["id"]
status = resp["status"]

print(f'Fine-tuning model with jobID: {job_id}.')
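
A brief note of my own on these hyperparameters: compute_classification_metrics and classification_n_classes=3 line up with the three distinct completion labels in the sample data ("apple.", "coffee.", "soda."), and babbage is the base model being adapted. As a sanity check you could derive the class count directly from the data:

# The class count should match the number of distinct completion labels.
n_classes = len({entry["completion"] for entry in sample_data})
print(n_classes)  # 3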

 

Finetune: Streaming

While the job runs, we can subscribe to the streaming events to check the progress of the operation.


 

 

import signal
import datetime

def signal_handler(sig, frame):
    status = openai.FineTune.retrieve(job_id).status
    print(f"Stream interrupted. Job is still {status}.")
    return

print(f'Streaming events for the fine-tuning job: {job_id}')
signal.signal(signal.SIGINT, signal_handler)

events = openai.FineTune.stream_events(job_id)
try:
    for event in events:
        print(f'{datetime.datetime.fromtimestamp(event["created_at"])} {event["message"]}')

except Exception:
    print("Stream interrupted (client disconnected).")

 

Finetune: Listing and Retrieving

Now let's check that our operation was successful and in addition we can look at all of the finetuning operations using a list operation.


 

status = openai.FineTune.retrieve(id=job_id)["status"]
if status not in ["succeeded", "failed"]:
    print(f'Job not in terminal status: {status}. Waiting.')
    while status not in ["succeeded", "failed"]:
        time.sleep(2)
        status = openai.FineTune.retrieve(id=job_id)["status"]
        print(f'Status: {status}')
else:
    print(f'Finetune job {job_id} finished with status: {status}')

print('Checking other finetune jobs in the subscription.')
result = openai.FineTune.list()
print(f'Found {len(result.data)} finetune jobs.')

 

Finetune: Deleting

Finally we can delete our finetune job.
WARNING: Please skip this step if you want to continue with the next section as the finetune model is needed. (The delete code is commented out by default)

 


 

# openai.FineTune.delete(sid=job_id)

 

Deployments

In this section we are going to create a deployment using the finetune model that we just adapted and then use the deployment to create a simple completion operation.

 


 

Deployments: Create

Let's create a deployment using the fine-tune model.


 

# First, let's get the model of the previous job:
result = openai.FineTune.retrieve(id=job_id)
if result["status"] == 'succeeded':
    model = result["fine_tuned_model"]

# Now let's create the deployment
print(f'Creating a new deployment with model: {model}')
result = openai.Deployment.create(model=model, scale_settings={"scale_type":"standard"})
deployment_id = result["id"]
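
One caveat (my note, not from the cookbook): the snippet above assumes the fine-tune job actually reached "succeeded"; if it ended as "failed", model is never assigned and the Deployment.create call raises a NameError. A slightly more defensive sketch:

# Defensive variant (sketch): stop early if the fine-tune job did not succeed.
result = openai.FineTune.retrieve(id=job_id)
if result["status"] != 'succeeded':
    raise RuntimeError(f'Fine-tune job {job_id} ended with status: {result["status"]}')
model = result["fine_tuned_model"]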

 

Deployments: Retrieving

Now let's check the status of the newly created deployment


 

print(f'Checking for deployment status.')
resp = openai.Deployment.retrieve(id=deployment_id)
status = resp["status"]
print(f'Deployment {deployment_id} is with status: {status}')

Deployments: Listing

Now because creating a new deployment takes a long time, let's look in the subscription for an already finished deployment that succeeded.


 

print('While deployment running, selecting a completed one.')
deployment_id = None
result = openai.Deployment.list()
for deployment in result.data:
    if deployment["status"] == "succeeded":
        deployment_id = deployment["id"]
        break

if not deployment_id:
    print('No deployment with status: succeeded found.')
else:
    print(f'Found a successful deployment with id: {deployment_id}.')

 

Completions

Now let's send a sample completion to the deployment.


 

print('Sending a test completion job')
start_phrase = 'When I go home, I want a'
response = openai.Completion.create(deployment_id=deployment_id, prompt=start_phrase, temperature=0, stop=".")
text = response['choices'][0]['text'].replace('\n', '').replace(' .', '.').strip()
print(f'"{start_phrase} {text}."')
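
For comparison (my note, not from the cookbook): against the regular OpenAI endpoints you would reference the fine-tuned model by name rather than by an Azure deployment, so with the same 0.x SDK the call would look roughly like the commented line below.

# Rough non-Azure equivalent (assumption: api_type/api_base/api_key configured
# for the standard OpenAI endpoint instead of Azure).
# response = openai.Completion.create(model=model, prompt=start_phrase, temperature=0, stop=".")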

 

Deployments: Delete

Finally let's delete the deployment


 

print(f'Deleting deployment: {deployment_id}')
openai.Deployment.delete(sid=deployment_id)

 
