Image sources are the linked pages or AIFFEL course materials.
Linting
: With lint rules configured, code that violates the style guide can be blocked from merging into the codebase.
Testing
: Becomes essential as the amount of code grows.
Validation
: Deploying without validation can cause many problems later (e.g., security issues).
Docs
: Descriptions for every parameter are a must.
Build
Package
Registry
Deploy
lint-and-test
deploy-to-artifact-repository
Vertex AI
$ git clone https://github.com/hayannn/mlops-quicklab-cicd.git
Dotfiles (files starting with `.`) are hidden by default.
$ ls -al
$ tree -a -L 2 .
lint-and-test
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -q -r requirements.txt
          pip install -q black isort mypy pytest pytest-cov
      - name: Run Super-Linter with Black
        uses: super-linter/super-linter@v6.4.1
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VALIDATE_PYTHON_BLACK: true
          PYTHON_BLACK_CONFIG: "--check --diff"
      - name: Train model for testing
        run: python trainer.py
      - name: Run tests with Pytest
        run: PYTHONPATH=$(pwd) pytest tests/
Super-Linter?
- https://github.com/super-linter/super-linter
- Super-Linter bundles many linters into a single, powerful action.
- In this project it is pulled in to run Black.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

input_size = 784
hidden_size = 128
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_dataset = datasets.MNIST(
    root="data", train=True, transform=transforms.ToTensor(), download=True
)
test_dataset = datasets.MNIST(root="data", train=False, transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)


class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out


model = NeuralNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)


def train():
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.reshape(-1, 28 * 28).to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 100 == 0:
                print(
                    f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}"
                )


def save_model():
    model_path = "model.pth"
    torch.save(model.state_dict(), model_path)
    print(f"Model saved to {model_path}")


if __name__ == "__main__":
    train()
    save_model()
Try running the code
$ conda activate quicklab-modu
$ pip3 install -r requirements.txt
$ rm -rf model.pth
lint-and-test
: Configure the repository so that a push can only go through after lint-and-test has run successfully.
Test
- Set up a GitHub Actions Ruleset (with Enforcement set to Active)
- Create a file named a and commit it
- Attempt to push ➡️ it should be rejected!
- Revert to the previous commit
$ git switch -c feat/update-nueron-size
Change the input size in trainer.py (784 → 812) and save. Since MNIST images are flattened to 28 × 28 = 784, this change should make the CI training/test step fail.
add & commit
$ git add .
$ git commit -m "feat: change input size from 784 to 812"
$ git push origin feat/update-nueron-size
$ gh auth login
$ gh pr create --title "$(git rev-parse --abbrev-ref HEAD)"
Normally Merge would be enabled right away, but because of the Ruleset above it is not immediately available.
For now, close the PR to cancel it, and delete the branch as well.
In the CLI too, switch back to the main branch and delete the branch you created.
$ git checkout main
$ git branch -D feat/update-nueron-size
google-github-actions
: lets workflows use GCP without any extra setup.
deploy-to-artifact-repository:
  runs-on: ubuntu-latest
  needs: lint-and-test
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
    - name: Get short SHA
      id: slug
      # ::set-output is deprecated; write to $GITHUB_OUTPUT instead
      run: echo "sha=$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"
    - id: auth
      uses: google-github-actions/auth@v2
      with:
        credentials_json: "${{ secrets.GCP_SA_KEY }}"
    - name: Set up Cloud SDK
      uses: google-github-actions/setup-gcloud@v2
    - name: Build and push Docker image to Artifact Registry
      run: |
        gcloud auth configure-docker $REGION-docker.pkg.dev
        docker build -t $REGION-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REPO_NAME/$CUSTOM_IMAGE:${{ steps.slug.outputs.sha }} .
        docker push $REGION-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REPO_NAME/$CUSTOM_IMAGE:${{ steps.slug.outputs.sha }}
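The image URI the workflow builds and pushes follows Artifact Registry's Docker naming scheme. A quick sketch of how the pieces assemble (all values below are placeholders, not this project's real settings):

```python
def image_uri(region: str, project_id: str, repo: str, image: str, tag: str) -> str:
    # Artifact Registry Docker images are named:
    # {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO}/{IMAGE}:{TAG}
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"

# placeholder values for illustration
print(image_uri("asia-northeast3", "my-project", "mlops-cicd", "trainer", "4f2a1c9"))
# asia-northeast3-docker.pkg.dev/my-project/mlops-cicd/trainer:4f2a1c9
```

Using the short commit SHA as the tag means every push produces a traceable, immutable image version.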
⭐️ Select the role ⭐️
Create a key
GCP_SA_KEY
: the service account key (JSON) created above
GCP_PROJECT_ID
: the GCP project ID
GCP_BUCKET_NAME
: create a bucket in GCP Storage and put its name here
GCP_ARTIFACT_REPO_NAME
: the name of the repository you created (format: Docker, region: asia-northeast3 (Seoul))
deploy-to-artifact-repository
deploy-to-vertex-ai
Free disk space
⭐️
deploy-to-artifact-repository:
  runs-on: ubuntu-latest
  needs: lint-and-test
  steps:
    - name: Checkout code
      uses: actions/checkout@v3
    - name: Get short SHA
      id: slug
      # ::set-output is deprecated; write to $GITHUB_OUTPUT instead
      run: echo "sha=$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"
    - id: auth
      uses: google-github-actions/auth@v2
      with:
        credentials_json: "${{ secrets.GCP_SA_KEY }}"
    - name: Set up Cloud SDK
      uses: google-github-actions/setup-gcloud@v2
    - name: Build and push Docker image to Artifact Registry
      run: |
        gcloud auth configure-docker $REGION-docker.pkg.dev
        docker build -t $REGION-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REPO_NAME/$CUSTOM_IMAGE:${{ steps.slug.outputs.sha }} .
        docker push $REGION-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REPO_NAME/$CUSTOM_IMAGE:${{ steps.slug.outputs.sha }}
deploy-to-vertex-ai:
  runs-on: ubuntu-latest
  needs: deploy-to-artifact-repository
  steps:
    - name: Free disk space
      run: |
        sudo docker rmi $(docker image ls -aq) >/dev/null 2>&1 || true
        sudo rm -rf \
          /usr/share/dotnet /usr/local/lib/android /opt/ghc \
          /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup \
          /usr/lib/jvm || true
        sudo apt install aptitude -y >/dev/null 2>&1
        sudo aptitude purge aria2 ansible azure-cli shellcheck rpm xorriso zsync \
          esl-erlang firefox gfortran-8 gfortran-9 google-chrome-stable \
          google-cloud-sdk imagemagick \
          libmagickcore-dev libmagickwand-dev libmagic-dev ant ant-optional kubectl \
          mercurial apt-transport-https mono-complete libmysqlclient \
          unixodbc-dev yarn chrpath libssl-dev libxft-dev \
          libfreetype6 libfreetype6-dev libfontconfig1 libfontconfig1-dev \
          snmp pollinate libpq-dev postgresql-client powershell ruby-full \
          sphinxsearch subversion mongodb-org azure-cli microsoft-edge-stable \
          -y -f >/dev/null 2>&1
        sudo aptitude purge google-cloud-sdk -f -y >/dev/null 2>&1
        sudo aptitude purge microsoft-edge-stable -f -y >/dev/null 2>&1 || true
        sudo apt purge microsoft-edge-stable -f -y >/dev/null 2>&1 || true
        sudo aptitude purge '~n ^mysql' -f -y >/dev/null 2>&1
        sudo aptitude purge '~n ^php' -f -y >/dev/null 2>&1
        sudo aptitude purge '~n ^dotnet' -f -y >/dev/null 2>&1
        sudo apt-get autoremove -y >/dev/null 2>&1
        sudo apt-get autoclean -y >/dev/null 2>&1
        sudo rm -rf ${GITHUB_WORKSPACE}/.git
    - id: auth
      uses: google-github-actions/auth@v2
      with:
        credentials_json: "${{ secrets.GCP_SA_KEY }}"
    - name: Set up Cloud SDK
      uses: google-github-actions/setup-gcloud@v2
    - name: Submit Vertex AI training job
      run: |
        gcloud auth configure-docker $REGION-docker.pkg.dev
        gcloud ai custom-jobs create \
          --project ${{ env.PROJECT_ID }} \
          --region=${{ env.REGION }} \
          --display-name=mnist-training-job \
          --worker-pool-spec=machine-type=n1-standard-4,executor-image-uri=asia-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13.py310:latest,output-image-uri=$REGION-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REPO_NAME/$JOB_IMAGE,local-package-path=.,python-module=trainer \
          --args="--model-dir=gs://${{ secrets.GCP_BUCKET_NAME }}/models"
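The job forwards --model-dir=gs://<bucket>/models to the training module. The trainer.py shown earlier hard-codes a local path, so here is a hypothetical argparse sketch of how such a flag could be consumed (the parser, default value, and help text are assumptions, not the project's actual code):

```python
import argparse


def parse_args(argv=None):
    # Vertex AI forwards --model-dir=gs://<bucket>/models to the python module;
    # the "." default is an assumption for local runs.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        default=".",
        help="GCS or local directory where the checkpoint is written",
    )
    return parser.parse_args(argv)


args = parse_args(["--model-dir=gs://my-bucket/models"])
print(args.model_dir)  # gs://my-bucket/models
```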
Note: use a T4 GPU!
Install the required libraries
!pip install datasets torchserve torch-model-archiver torch-workflow-archiver nvgpu
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import seaborn as sns
from torch.utils.data import DataLoader
from transformers import BatchEncoding, BertTokenizer, BertForSequenceClassification, AdamW
from sklearn.metrics import confusion_matrix
from datasets import load_dataset
from tqdm import tqdm
from typing import TypedDict
import matplotlib.pyplot as plt

dataset = load_dataset("ag_news")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)
optimizer = AdamW(model.parameters(), lr=5e-5)
criterion = torch.nn.CrossEntropyLoss()


class DatasetItem(TypedDict):
    text: str
    label: str


def preprocess_data(dataset_item: DatasetItem) -> dict[str, torch.Tensor]:
    return tokenizer(dataset_item["text"], truncation=True, padding="max_length", return_tensors="pt")


train_dataset = dataset["train"].select(range(1200)).map(preprocess_data, batched=True)
test_dataset = dataset["test"].select(range(800)).map(preprocess_data, batched=True)
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

num_epochs = 3
losses: list[float] = []
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch in tqdm(train_loader, desc=f"Epoch {epoch + 1}"):
        inputs = {key: batch[key].to(device) for key in batch}
        labels = inputs.pop("label")
        outputs = model(**inputs, labels=labels)
        loss = outputs.loss
        total_loss += loss.item()
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    average_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch + 1}, Average Loss: {average_loss}")

model.eval()
correct = 0
total = 0
with torch.no_grad():
    for batch in tqdm(test_loader, desc="Evaluating"):
        inputs = {key: batch[key].to(device) for key in batch}
        labels = inputs.pop("label")
        outputs = model(**inputs, labels=labels)
        logits = outputs.logits
        predicted_labels = torch.argmax(logits, dim=1)
        correct += (predicted_labels == labels).sum().item()
        total += labels.size(0)

accuracy = correct / total
print("")
print(f"Test Accuracy: {accuracy * 100:.2f}%")

test_input = "[Official] 'Legendary Coach Resigns → Appoints New Commander' Suwon Completes Coaching Staff... Scout Bae Ki-jong Joins + Coach Shin Hwa-yong Remains"
test_input_processed = tokenizer(test_input, truncation=True, padding="max_length", return_tensors="pt").to(device)
logits = model(**test_input_processed).logits
print(logits)
predicted_labels = torch.argmax(logits, dim=1)
labeling_mapper = ["world", "sports", "business", "sci/tech"]
print(labeling_mapper[predicted_labels[0]])

# Save a model checkpoint
model_save_path = "model.pth"
torch.save(model.state_dict(), model_save_path)
%%writefile handler.py
import json
import logging
import torch
from ts.context import Context
from ts.torch_handler.base_handler import BaseHandler
from transformers import BatchEncoding, BertTokenizer, BertForSequenceClassification

logging.basicConfig(level=logging.INFO)


class ModelHandler(BaseHandler):
    def __init__(self):
        self.initialized = False
        self.tokenizer = None
        self.model = None

    def initialize(self, context: Context):
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.initialized = True
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        self.model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)
        model_path = model_dir + "/model.pth"
        self.model.load_state_dict(torch.load(model_path))
        self.model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
        self.model.eval()

    def preprocess(self, texts: list[str]) -> BatchEncoding:
        logging.info("preprocess: %s", texts)
        inputs = self.tokenizer(texts, truncation=True, padding=True, max_length=512, return_tensors="pt")
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        return inputs.to(device)

    def inference(self, input_batch: BatchEncoding) -> torch.Tensor:
        with torch.no_grad():
            outputs = self.model(**input_batch)
        logging.info("inference: %s", outputs)
        return outputs.logits

    def postprocess(self, inference_output: torch.Tensor) -> list[dict[str, float]]:
        logging.info("postprocess: %s", inference_output)
        probabilities = torch.nn.functional.softmax(inference_output, dim=1)
        return [{"label": int(torch.argmax(prob)), "probability": float(prob.max())} for prob in probabilities]
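postprocess turns each row of logits into a label index plus its softmax probability. The same arithmetic in plain Python (a torch-free sketch, for illustration only):

```python
import math


def postprocess_row(logits):
    # softmax over one row of logits, then pick the argmax
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return {"label": label, "probability": probs[label]}


print(postprocess_row([0.1, 3.2, -1.0, 0.5]))  # label 1 dominates
```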
!wget https://raw.githubusercontent.com/microsoft/SDNet/master/bert_vocab_files/bert-base-uncased-vocab.txt \
-O bert-base-uncased-vocab.txt
!mkdir -p model-store
!torch-model-archiver \
--model-name model \
--version 1.0 \
--serialized-file model.pth \
--handler ./handler.py \
--extra-files "bert-base-uncased-vocab.txt" \
--export-path model-store \
-f
- When connecting to and using a custom GCE VM:
# from google.colab import auth
# auth.authenticate_user()
Use the credentials created in 5-5, renamed to credentials.json.
PROJECT_ID = "gde-project-aicloud"  # @param {type: "string"}
LOCATION = "asia-northeast3"  # @param {type: "string"}
BUCKET_NAME = "mlops-quicklab"  # @param {type: "string"}
MODEL_FILE_NAME = "model.mar"  # @param {type: "string"}
# assumes the client libraries and credentials set up above, e.g.:
# from google.cloud import aiplatform, storage
# from google.oauth2.service_account import Credentials
# credentials = Credentials.from_service_account_file("credentials.json")
storage_client = storage.Client(credentials=credentials)
bucket = storage_client.bucket(BUCKET_NAME)


def upload_blob(source_file_name: str, destination_blob_name: str) -> None:
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")


upload_blob("model-store/model.mar", f"models/{MODEL_FILE_NAME}")

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
    credentials=credentials,
)
model_path = f"gs://{BUCKET_NAME}/models"
registry_model = aiplatform.Model.upload(
    display_name="AG News Classification",
    artifact_uri=model_path,
    serving_container_image_uri="asia-northeast3-docker.pkg.dev/gde-project-aicloud/mlops-quicklab/trainer:1.0.3",
    is_default_version=True,
    version_aliases=["v1"],
    version_description="A news category classification model",
    serving_container_predict_route="/predictions/model",
    serving_container_health_route="/ping",
)
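Note that artifact_uri points at the directory containing model.mar (models/), not at the .mar file itself. A tiny sketch of the URI construction (bucket and prefix here are placeholders):

```python
def gcs_dir_uri(bucket: str, prefix: str) -> str:
    # Vertex AI's artifact_uri expects the *directory* holding model.mar,
    # not the path to the file
    return f"gs://{bucket}/{prefix}"

print(gcs_dir_uri("mlops-quicklab", "models"))  # gs://mlops-quicklab/models
```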
Move to mlops-quicklab/vertexai/predictions
$ docker build . \
    -t {enter the actual value}/trainer:1.0.0
$ docker images | grep trainer
$ brew install google-cloud-sdk
$ gcloud auth login
$ gcloud config set project {PROJECT_ID}
$ gcloud auth configure-docker asia-northeast3-docker.pkg.dev
$ docker push asia-northeast3-docker.pkg.dev/mlops-quicklab-449207/mlops-cicd/trainer:1.0.0
model_path = f"gs://{BUCKET_NAME}/models"
registry_model = aiplatform.Model.upload(
    display_name="AG News Classification",
    artifact_uri=model_path,
    serving_container_image_uri="{paste the copied image URI}/trainer:1.0.0",
    is_default_version=True,
    version_aliases=["v1"],
    version_description="A news category classification model",
    serving_container_predict_route="/predictions/model",
    serving_container_health_route="/ping",
)
DEPLOY_COMPUTE = "n1-standard-2"
DEPLOY_ACCELERATOR = "NVIDIA_TESLA_T4"
endpoint = aiplatform.Endpoint.create(
    display_name="ag-news-category-classification",
    project=PROJECT_ID,
    location=LOCATION,
)
Online prediction
: in this section of the console you can check the endpoint (with the region set to Seoul)
deployment = registry_model.deploy(
    endpoint=endpoint,
    machine_type=DEPLOY_COMPUTE,
    min_replica_count=1,
    max_replica_count=1,
    accelerator_type=DEPLOY_ACCELERATOR,
    accelerator_count=1,
    traffic_percentage=100,
    sync=True,
)
endpoint.predict(instances=[
    "OpenAI releases AI video generator Sora to all customers"
])
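Assuming the handler's postprocess format above ({"label": int, "probability": float} per instance), the response predictions can be mapped back to AG News category names:

```python
labeling_mapper = ["world", "sports", "business", "sci/tech"]


def category_of(prediction: dict) -> str:
    # prediction is one element of response.predictions
    return labeling_mapper[prediction["label"]]


print(category_of({"label": 3, "probability": 0.97}))  # sci/tech
```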
endpoint.undeploy_all()
endpoint.delete()
registry_model.delete()