Comparing Label Detection - AWS Rekognition vs Google Cloud Vision API

아홉번째태양 · May 25, 2023

Google Cloud Rekognition Vision API aws image label

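AWS and Google Cloud each offer an API that returns words describing an image: the objects in it, what is happening in it, or anything else that characterizes it. Our service happened to need this feature, so here is a brief comparison of the two.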

1. Features

Common features (AWS / Google)

Label Detection / Detect Labels
Image Properties / Detect Image Properties
Image Moderation / Detect Explicit Content
Facial Analysis / Detect Faces
Text in Image / Detect Text in Images (OCR)
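When reading the two API references side by side, it can help to know which entry point each shared feature uses. The mapping below is a sketch based on the Rekognition operation names and the Vision API `Feature.type` values as the public references describe them; verify against the current docs before relying on it. One difference worth noting: Rekognition has no separate image-properties operation; image properties are requested through `DetectLabels` and its `Features` parameter.

```javascript
// Sketch: where each shared feature lives in the two APIs.
// aws:    Rekognition operation name (image properties ride on DetectLabels' Features parameter)
// google: value for `type` in a Vision API annotateImage `features` entry
const COMMON_FEATURES = {
  labelDetection:  { aws: 'DetectLabels',           google: 'LABEL_DETECTION' },
  imageProperties: { aws: 'DetectLabels',           google: 'IMAGE_PROPERTIES' },
  moderation:      { aws: 'DetectModerationLabels', google: 'SAFE_SEARCH_DETECTION' },
  facialAnalysis:  { aws: 'DetectFaces',            google: 'FACE_DETECTION' },
  textInImage:     { aws: 'DetectText',             google: 'TEXT_DETECTION' },
};

// Print a small side-by-side table of the mapping
for (const [feature, { aws, google }] of Object.entries(COMMON_FEATURES)) {
  console.log(`${feature.padEnd(16)} AWS: ${aws.padEnd(24)} Google: ${google}`);
}
```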

AWS Rekognition only

Celebrity Recognition
Face Comparison
PPE Detection
Custom Labels

Google Cloud Vision API only

Detect Text in Files
Detect Crop Hints
Detect Landmarks
Detect Logos
Detect Multiple Objects
Detect Web Entities and Pages
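On the Google side, most of these extra features are not separate endpoints: each is just another `type` entry in the same `annotateImage` request (Detect Text in Files, which handles PDF/TIFF, is the exception and goes through the client's separate file-annotation methods such as `asyncBatchAnnotateFiles`). The hypothetical helper `buildVisionRequest` below only builds the request object; the response field names in the comments are as I understand the `@google-cloud/vision` Node.js client, so treat them as assumptions to check.

```javascript
// Sketch: one annotateImage request asking for several Google-only features at once.
// Each feature is just another { type } entry; the response carries one field per feature.
function buildVisionRequest(imageBytes, featureTypes) {
  return {
    image: { content: imageBytes },
    features: featureTypes.map((type) => ({ type })),
  };
}

// Feature.type values for the image-based Google-only features listed above
const GOOGLE_ONLY_TYPES = [
  'CROP_HINTS',          // -> result.cropHintsAnnotation
  'LANDMARK_DETECTION',  // -> result.landmarkAnnotations
  'LOGO_DETECTION',      // -> result.logoAnnotations
  'OBJECT_LOCALIZATION', // -> result.localizedObjectAnnotations (Detect Multiple Objects)
  'WEB_DETECTION',       // -> result.webDetection
];

// Placeholder bytes; a real call would use fs.readFileSync on an actual image
const request = buildVisionRequest(Buffer.from('placeholder-bytes'), GOOGLE_ONLY_TYPES);
console.log(request.features);
// With a real image and client:
//   const [result] = await googleClient.annotateImage(request);
```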

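Of course, this comparison is strictly a one-to-one comparison of AWS Rekognition and Google Cloud Vision API. Google Cloud has recently been supporting migration to Vertex AI-based services, and with those, PPE detection and custom labeling through fine-tuning also become possible.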

2. Pricing

[Pricing table image: AWS Rekognition]
[Pricing table image: Google Cloud Vision API]

At 1 million images, AWS is about 1.25x cheaper, and beyond that volume the AWS rate drops further.
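Both services bill in volume tiers, so the ratio between them depends on how many images you process. The hypothetical `tieredCost` function below sketches that arithmetic. The tier prices are PLACEHOLDERS, not actual AWS or Google prices; they were picked only so that the 1M-image ratio matches the 1.25x figure above. Substitute the real numbers for your region from the pricing pages.

```javascript
// Sketch: total cost for `imageCount` images under tiered pricing quoted per 1,000 images.
// Each tier covers images up to `upTo` (cumulative count); use Infinity for the last tier.
// NOTE: the tier values below are PLACEHOLDERS, not real AWS or Google prices.
function tieredCost(imageCount, tiers) {
  let remaining = imageCount;
  let tierStart = 0;
  let cost = 0;
  for (const { upTo, pricePer1000 } of tiers) {
    if (remaining <= 0) break;
    const billedHere = Math.min(remaining, upTo - tierStart); // images billed in this tier
    cost += (billedHere / 1000) * pricePer1000;
    remaining -= billedHere;
    tierStart = upTo;
  }
  return cost;
}

// Hypothetical price sheets (placeholders only)
const providerA = [
  { upTo: 1_000_000, pricePer1000: 1.0 },
  { upTo: Infinity, pricePer1000: 0.8 },
];
const providerB = [
  { upTo: 5_000_000, pricePer1000: 1.25 },
  { upTo: Infinity, pricePer1000: 1.0 },
];

for (const n of [1_000_000, 3_000_000]) {
  const a = tieredCost(n, providerA);
  const b = tieredCost(n, providerB);
  console.log(`${n} images: A=$${a.toFixed(2)} B=$${b.toFixed(2)} B/A=${(b / a).toFixed(2)}`);
}
```

With these placeholder sheets the ratio grows past 1M images because provider A's second tier is cheaper, which is the shape of the claim above rather than its actual numbers.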

However, if we factor in the possibility of needing Custom Labels later, the relevant pricing is:

[Pricing table image: AWS Rekognition with Custom Labels]
[Pricing table image: Google Cloud Vertex AI Data Labeling]

The two bill by different criteria, so this will need to be reevaluated later against our actual usage plans.

3. Comparing labeling results

I wrote the code below to compare how each API labels the same photo.

const { Rekognition } = require('@aws-sdk/client-rekognition');
const vision = require('@google-cloud/vision'); // renamed: `visionAPI` was declared twice
const fs = require('fs');
const key = require('./key.json'); // Google service-account key file

// AWS Rekognition: detect labels in an image stored in S3
const detectWithRekognition = async () => {
  const awsClient = new Rekognition({
    region: 'ap-northeast-2',
    // No hard-coded keys: credentials are resolved from the default provider chain
    // (environment variables, ~/.aws/credentials, IAM role, ...).
  });

  const params = {
    Image: {
      S3Object: {
        Bucket: 'my-bucket',
        Name: 'sample2.jpg',
      },
    },
    MaxLabels: 20, // return at most 20 labels
  };
  const response = await awsClient.detectLabels(params);
  console.log('AWS Rekognition: ');
  // Confidence is a percentage (0-100)
  response.Labels.forEach((label) => console.log(label.Name, label.Confidence));
};

// Google Cloud Vision API: detect labels in a local image file
const detectWithVision = async () => {
  const googleClient = new vision.ImageAnnotatorClient({
    credentials: key,
  });

  const fileName = './resources/sample2.jpg';
  const image = fs.readFileSync(fileName);
  const request = {
    features: [
      { type: 'LABEL_DETECTION' },
    ],
    image: { content: image },
  };

  const [result] = await googleClient.annotateImage(request);
  const labels = result.labelAnnotations;
  console.log('Google Cloud Vision API: ');
  // score is in the range 0-1
  labels.forEach((label) => console.log(label.description, label.score));
};

// Run one after the other so the two outputs do not interleave
(async () => {
  await detectWithRekognition();
  await detectWithVision();
})().catch(console.error);
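In the code above, Rekognition reads the image from S3 while Vision reads a local copy. If you want both services to receive exactly the same input, one option is to send raw bytes to both: Rekognition's `Image.Bytes` and Vision's `image.content`. The hypothetical helper `buildLabelRequests` below builds both request objects from one buffer. It also sets `maxResults` on the Vision feature entry, which should let Vision return more than its default number of labels; treat that as an assumption to verify in the Vision `Feature` reference. Note that Rekognition accepts smaller images as raw bytes than as S3 objects.

```javascript
// Sketch: build matching label-detection requests for both services from the same bytes.
// Assumption: `maxResults` on the Vision feature entry raises its label cap (verify in docs).
function buildLabelRequests(imageBytes, maxLabels = 20) {
  return {
    // Pass to Rekognition#detectLabels (AWS SDK for JavaScript v3)
    rekognitionParams: {
      Image: { Bytes: imageBytes }, // raw bytes instead of S3Object
      MaxLabels: maxLabels,
    },
    // Pass to ImageAnnotatorClient#annotateImage (@google-cloud/vision)
    visionRequest: {
      image: { content: imageBytes },
      features: [{ type: 'LABEL_DETECTION', maxResults: maxLabels }],
    },
  };
}

// Usage with the clients from the example above:
//   const bytes = require('fs').readFileSync('./resources/sample2.jpg');
//   const { rekognitionParams, visionRequest } = buildLabelRequests(bytes);
//   await awsClient.detectLabels(rekognitionParams);
//   await googleClient.annotateImage(visionRequest);
const demo = buildLabelRequests(Buffer.from([0xff, 0xd8, 0xff])); // placeholder bytes
console.log(demo.rekognitionParams.MaxLabels, demo.visionRequest.features);
```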

sample1.jpg

AWS Rekognition

Architecture 99.93106079101562
Building 99.93106079101562
Furniture 99.93106079101562
Indoors 99.93106079101562
Living Room 99.93106079101562
Room 99.93106079101562
Interior Design 99.65315246582031
Home Decor 98.14984130859375
Couch 96.9075698852539
Floor 96.1122055053711
Table 91.54740142822266
Desk 87.91352844238281
Chair 75.44424438476562
Reception 57.74221420288086
Reception Room 57.74221420288086
Waiting Room 57.74221420288086
Corner 57.68584442138672
Ceiling Light 57.25545883178711
Flooring 57.112632751464844
Foyer 56.165767669677734

Google Cloud Vision API

Building 0.9371646642684937
Table 0.8660224676132202
Couch 0.8498106002807617
House 0.844746470451355
Hall 0.8173579573631287
Chair 0.8140931129455566
Flooring 0.8104476928710938
Floor 0.8100553154945374
Fixture 0.798689603805542
Ceiling 0.7755923271179199

sample2.jpg

AWS Rekognition

Indoors 99.99840545654297
Interior Design 99.99840545654297
Kitchen 99.36380767822266
Cabinet 82.49337005615234
Furniture 82.49337005615234
Flower 66.68701934814453
Plant 66.68701934814453
Device 63.486080169677734
Candle 62.137245178222656
White Board 61.780555725097656
Closet 57.686668395996094
Cupboard 57.686668395996094
Sink 57.3814811706543
Sink Faucet 57.3814811706543
Shelf 56.95436096191406
Sideboard 56.328617095947266
Appliance 56.00457000732422
Electrical Device 56.00457000732422
Cooktop 55.871124267578125
Flower Arrangement 55.181182861328125

Google Cloud Vision API

Cabinetry 0.9600701928138733
Countertop 0.9595826864242554
Property 0.9428776502609253
Building 0.9259463548660278
Kitchen 0.9065743684768677
Tap 0.9011318683624268
Sink 0.8935719728469849
Kitchen sink 0.8755253553390503
Lighting 0.8705979585647583
Plant 0.8593055009841919

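I couldn't find a way to increase the number of labels Google Cloud Vision API returns (it gave 10 per image here; its request `Feature` object does have a `maxResults` field, which was not tried in this comparison). Even so, after comparing a few more photos, my impression was that AWS Rekognition returns fewer redundant labels and is somewhat more specific.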
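Because Rekognition reports `Confidence` as a percentage (0-100) and Vision reports `score` in 0-1, the raw numbers above are not directly comparable. The hypothetical `compareLabels` helper below puts both on a 0-1 scale and splits the labels into shared and provider-only sets. It matches names only by case-insensitive exact string, so near-synonyms such as "Sink" and "Kitchen sink" are not treated as the same label. The sample data is a subset of the sample1.jpg output above, rounded.

```javascript
// Sketch: compare two label lists on a common 0-1 scale.
// Labels are [name, value] pairs; `scale` is 100 for Rekognition, 1 for Vision.
function toScoreMap(labels, scale) {
  const map = new Map();
  for (const [name, value] of labels) {
    map.set(name.toLowerCase(), value / scale); // case-insensitive exact-name key
  }
  return map;
}

function compareLabels(awsLabels, googleLabels) {
  const aws = toScoreMap(awsLabels, 100);
  const google = toScoreMap(googleLabels, 1);
  const shared = [];
  for (const [name, awsScore] of aws) {
    if (google.has(name)) shared.push({ name, aws: awsScore, google: google.get(name) });
  }
  const awsOnly = [...aws.keys()].filter((n) => !google.has(n));
  const googleOnly = [...google.keys()].filter((n) => !aws.has(n));
  return { shared, awsOnly, googleOnly };
}

// A few entries from the sample1.jpg results above (values rounded)
const awsSample1 = [['Building', 99.93], ['Couch', 96.91], ['Floor', 96.11],
  ['Table', 91.55], ['Desk', 87.91], ['Chair', 75.44]];
const googleSample1 = [['Building', 0.937], ['Table', 0.866], ['Couch', 0.850],
  ['House', 0.845], ['Chair', 0.814]];

const result = compareLabels(awsSample1, googleSample1);
console.log(result.shared.map((l) => l.name)); // → [ 'building', 'couch', 'table', 'chair' ]
console.log(result.awsOnly, result.googleOnly);
```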

Conclusion

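Different posts report slightly different results on recognition accuracy, but in every source the difference was small enough to ignore. So, given that we only need simple labeling for now, price is hard to ignore, and since we already use AWS S3 buckets for storage, we decided to go with AWS Rekognition.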

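That said, the labeling results are not entirely satisfactory, so once other work is wrapped up, we will likely need to build a dataset mapping images to labels and use the Custom Labels service.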

References
https://cloud.google.com/vision/docs/labels
https://cloud.google.com/vision-ai/docs/object-detector-model
https://cloud.google.com/vision/pricing/
https://cloud.google.com/vertex-ai/pricing#labeling
https://docs.aws.amazon.com/rekognition/latest/dg/labels-detect-labels-image.html
https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/what-is.html
https://aws.amazon.com/rekognition/pricing/