TPC-C on MySQL && RocksDB

ewillwin · October 12, 2022

Database Project

List five transactions in TPC-C and briefly explain the business logic each transaction carries out.

There are five transactions: New-Order, Payment, Order-Status, Delivery, and Stock-Level.
New-Order transaction: orders an average of 10 items from a warehouse -> inserts the order -> updates the stock level of each ordered item
Payment transaction: processes a customer's payment -> updates the customer's balance and related data
Order-Status transaction: returns the status of the customer's most recent order
Delivery transaction: processes 10 pending orders, one for each district (each order has about 10 items on average)
Stock-Level transaction: examines the stock quantity of the items ordered in a district's last 20 orders and counts those below a threshold
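The New-Order business logic above (insert the order, then update each item's stock) can be sketched on a hypothetical in-memory model. `Warehouse` and `new_order` are illustrative names, not part of any TPC-C kit, and the restock rule in the `else` branch (add 91 units when fewer than 10 would remain) follows my reading of the TPC-C specification rather than anything in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    # Hypothetical in-memory stand-in for the STOCK and ORDER tables.
    stock: dict                                   # item_id -> stock quantity
    orders: list = field(default_factory=list)


def new_order(wh: Warehouse, lines: list) -> int:
    """Insert an order, then update the stock level of every ordered item.

    `lines` holds (item_id, quantity) pairs; TPC-C orders average 10 of them.
    Returns the new order's id (its position in wh.orders).
    """
    order_id = len(wh.orders)
    wh.orders.append(list(lines))                 # insert the order
    for item_id, qty in lines:
        current = wh.stock[item_id]
        if current >= qty + 10:                   # enough stock left over
            wh.stock[item_id] = current - qty
        else:                                     # assumed restock rule: add 91 units
            wh.stock[item_id] = current - qty + 91
    return order_id


wh = Warehouse(stock={1: 50, 2: 12})
print(new_order(wh, [(1, 5), (2, 5)]), wh.stock)  # prints: 0 {1: 45, 2: 98}
```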

Briefly explain at least two tools you can use to monitor system statistics (e.g., I/O, CPU, memory, etc.) and list what metrics you can get from each tool

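mpstat or htop can be used to monitor CPU status.
In mpstat, %usr is the percentage of CPU utilization that occurred while executing at the user (application) level, and %sys is the percentage that occurred while executing at the system (kernel) level. %iowait is the percentage of time the CPU was idle while the system had an outstanding disk I/O request, and %idle is the percentage of time the CPU was idle and the system had no outstanding disk I/O request.
vmstat can be used to monitor memory status. Its output has procs, memory, swap, io, system, and cpu columns.
iostat can be used to monitor I/O status. It reports per-device metrics such as r/s and w/s (read and write requests completed per second).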
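The mpstat percentages above are derived from cumulative CPU counters. A minimal sketch, assuming Linux's `/proc/stat` layout (aggregate `cpu` line with user, nice, system, idle, iowait, irq, softirq, steal first), computes approximate %usr/%sys/%iowait/%idle from two samples. It is only an approximation of mpstat: for example, the kernel's user counter also includes guest time, which mpstat reports separately.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()


def cpu_percentages(before: str, after: str) -> dict:
    """Approximate mpstat's %usr, %sys, %iowait and %idle between two samples."""
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    delta = {name: y - x for name, x, y in zip(FIELDS, b, a)}
    total = sum(delta.values()) or 1              # avoid dividing by zero
    return {
        "%usr": 100 * delta["user"] / total,
        "%sys": 100 * delta["system"] / total,
        "%iowait": 100 * delta["iowait"] / total,
        "%idle": 100 * delta["idle"] / total,
    }


# Hypothetical samples; on a real machine, call read_cpu_line() twice with a sleep in between.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  200 0 100 1500 200 0 0 0 0 0"
print(cpu_percentages(before, after))
# prints: {'%usr': 10.0, '%sys': 5.0, '%iowait': 15.0, '%idle': 70.0}
```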

Interpret the meaning of each metric (trx, 95%, 99%, TpmC) in the TPC-C experimental result below.

10, trx: 493, 95%: 764.362, 99%:1011.230, ...
20, ...
<TpmC>
  3634.300 TpmC

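The number of transactions and the response times are printed every 10 seconds. trx: 493 means that 493 transactions were processed during that 10-second interval. 95%: 764.362 means that the 95th-percentile response time in that interval was 764.362, i.e., 95% of the transactions completed within 764.362; 99%: 1011.230 is the corresponding 99th-percentile response time. Finally, 3634.300 TpmC means that 3634.300 New-Order transactions were completed per minute.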
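To make the metrics concrete, here is a sketch of how a percentile response time and TpmC could be computed from raw measurements. It uses the nearest-rank percentile definition, which is an assumption: the benchmark tool may compute its 95%/99% columns differently (for example from histogram buckets). The response times below are hypothetical.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)      # 1-based rank
    return ordered[max(rank, 1) - 1]


def tpmc(new_order_count: int, elapsed_seconds: float) -> float:
    """New-Order transactions completed per minute."""
    return new_order_count * 60 / elapsed_seconds


response_times = list(range(1, 101))               # hypothetical: 1..100
print(percentile(response_times, 95), percentile(response_times, 99))  # prints: 95 99
```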

Describe how TPC-C throughput changes as the MySQL's buffer size increases from 10% to 50% of the DB size. Explain the expected result and why.

If we increase the buffer pool size to 10%, 20%, 30%, 40%, and 50% of the DB size and observe how TpmC and the buffer pool hit rate change, both TpmC and the hit rate should increase as the buffer pool grows. A larger buffer pool can cache more of the data, so the hit rate rises, fewer reads have to go to disk, and transactions per minute increase accordingly.
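The expected trend can be illustrated by replaying one page-access trace through caches of 10% to 50% of the "DB". This is a sketch under loud assumptions: the trace is synthetic (1000 pages, skewed toward low page numbers), and the cache is a plain LRU rather than InnoDB's midpoint LRU. For plain LRU the hit rate on a fixed trace can never drop as the cache grows, because a smaller LRU cache always holds a subset of a larger one.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace: list, capacity: int) -> float:
    """Replay a page-access trace through a plain LRU cache of `capacity` pages."""
    cache = OrderedDict()
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)            # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)      # evict the least recently used page
            cache[page] = None
    return hits / len(trace)


# Hypothetical skewed workload: 1000 pages, low-numbered pages are hotter.
rng = random.Random(42)
db_pages = 1000
weights = [1 / (i + 1) for i in range(db_pages)]
trace = rng.choices(range(db_pages), weights=weights, k=50_000)

for pct in (10, 20, 30, 40, 50):
    capacity = db_pages * pct // 100
    print(f"buffer = {pct}% of DB: hit rate {lru_hit_rate(trace, capacity):.3f}")
```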

Describe how TPC-C throughput varies with the LRU scan depth size. Explain the expected result and why.

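As the LRU scan depth changes, the proportions of step 1, step 2, and step 3 in the buffer miss scenario change, which affects how the buffer manager operates. Step 1 searches the free list and returns a free block if one exists. Step 2 scans the LRU list from its tail up to the scan depth looking for a clean page, and returns it if one is found. Step 3 flushes the dirty page at the LRU tail and inserts the freed frame into the free list. A deeper scan gives step 2 more chances to find a clean page and avoid the flush in step 3, at the cost of scanning more of the list on each miss.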
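The three-step buffer miss scenario can be simulated to see how the step mix shifts with scan depth. This is a minimal sketch of the simplified model described here, not InnoDB's implementation (in MySQL the related setting is `innodb_lru_scan_depth`, which is used by the page cleaner threads). The access pattern (uniform over 1000 pages), the 100-frame buffer, the 30% write ratio, and the helper names `get_free_frame` and `simulate` are all hypothetical.

```python
import random
from collections import Counter


def get_free_frame(free_list, lru, dirty, scan_depth, steps):
    """Return a frame for a missed page using the three-step scenario.

    lru holds frame ids from head (index 0, MRU) to tail (last, LRU).
    """
    if free_list:                                       # step 1: free list
        steps["step1"] += 1
        return free_list.pop()
    stop = max(len(lru) - 1 - scan_depth, -1)
    for i in range(len(lru) - 1, stop, -1):             # step 2: scan from the tail
        if not dirty[lru[i]]:
            steps["step2"] += 1
            return lru.pop(i)                           # clean page: reuse its frame
    frame = lru.pop()                                   # step 3: flush the LRU tail
    dirty[frame] = False                                # (page written back to disk)
    free_list.append(frame)
    steps["step3"] += 1
    return free_list.pop()


def simulate(scan_depth, n_frames=100, n_pages=1000, n_accesses=20_000,
             write_ratio=0.3, seed=1):
    rng = random.Random(seed)
    free_list = list(range(n_frames))
    lru, dirty = [], {}
    frame_of, page_of = {}, {}                          # page -> frame, frame -> page
    steps = Counter()
    for _ in range(n_accesses):
        page = rng.randrange(n_pages)
        if page in frame_of:                            # buffer hit
            frame = frame_of[page]
            lru.remove(frame)
        else:                                           # buffer miss
            frame = get_free_frame(free_list, lru, dirty, scan_depth, steps)
            old_page = page_of.pop(frame, None)
            if old_page is not None:
                del frame_of[old_page]
            frame_of[page], page_of[frame] = frame, page
            dirty[frame] = False
        lru.insert(0, frame)                            # move to the LRU head
        if rng.random() < write_ratio:
            dirty[frame] = True
    return steps


for depth in (0, 4, 16, 64):
    print(depth, dict(simulate(depth)))
```

With depth 0 every miss after warm-up falls through to step 3; larger depths shift misses toward step 2.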

List at least two differences between B+Tree and Log-Structured Merge Tree

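B+Tree performs in-place updates, whereas LSM Tree performs out-of-place updates.
B+Tree is optimized for reads (each record is stored in one place, holding only its most recent version), whereas LSM Tree is faster for writes (it uses a logfile and a memtable to turn random writes into sequential writes).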

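[OLTP vs OLAP]
OLTP is a random read/write workload. It contains all of SELECT, INSERT, UPDATE, and DELETE (SIUD) and uses index scans.
OLAP consists of large amounts of sequential access. It is mostly SELECT (S) and uses full table scans.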

[Describe at least two tools for monitoring system statistics (e.g., I/O, CPU, memory, etc.) and the metrics each tool provides]
mpstat or htop can be used to monitor CPU status. mpstat reports items such as %usr and %sys.
vmstat can be used to monitor memory status. It has procs, memory, swap, io, and other columns.
iostat can be used to monitor I/O status. It reports IOPS (e.g., tps, r/s, w/s) and CPU utilization.

[Interpret each metric in the TPC-C result below]
10, trx: 493, 95%: 764.362, 99%:1011.230, ...
20, ...

3634.300 TpmC
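The number of transactions and the response times are printed every 10 seconds.
trx: 493 -> 493 transactions were processed during the 10-second interval
95%: 764.362 -> the 95th-percentile response time in that interval was 764.362
99%: 1011.230 -> the 99th-percentile response time in that interval was 1011.230
3634.300 TpmC -> 3634.300 New-Order transactions were completed per minute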

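[Explain the buffer pool LRU algorithm]
The LRU list is managed as two sublists: a new (young) sublist at the head and an old sublist at the tail. By default, the old sublist takes about 3/8 of the list.
When a page is read into the buffer pool, it is inserted at the midpoint (the head of the old sublist). Accessing a page in the old sublist makes it young and moves it to the head of the new sublist. Pages that are not accessed gradually move toward the tail of the list and are eventually evicted.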
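The midpoint insertion policy can be sketched with two ordered dictionaries. This is a simplified illustration, not InnoDB's code: it ignores `innodb_old_blocks_time` (the delay before an old page may be made young), only rebalances in one direction, and the class name `MidpointLRU` is hypothetical. The usage shows the point of the design: a one-off scan only churns the old sublist, so pages that were re-accessed survive.

```python
from collections import OrderedDict


class MidpointLRU:
    """Simplified midpoint-insertion LRU in the spirit of the InnoDB buffer pool.

    In both OrderedDicts the first item is the sublist's tail side and the
    last item its head side.
    """

    def __init__(self, capacity: int, old_fraction: float = 3 / 8):
        self.capacity = capacity
        self.old_target = max(1, int(capacity * old_fraction))
        self.new = OrderedDict()   # young sublist (head of the whole list)
        self.old = OrderedDict()   # old sublist (tail of the whole list)

    def access(self, page) -> bool:
        """Touch `page`; return True on a buffer hit."""
        if page in self.new:
            self.new.move_to_end(page)          # move to the head of the new sublist
            return True
        if page in self.old:
            del self.old[page]                  # an accessed old page becomes young
            self.new[page] = None
            self._rebalance()
            return True
        if len(self.new) + len(self.old) >= self.capacity:
            victim = self.old if self.old else self.new
            victim.popitem(last=False)          # evict from the tail
        self.old[page] = None                   # insert at the midpoint
        self._rebalance()
        return False

    def _rebalance(self) -> None:
        # Demote the new sublist's tail to the head of the old sublist while the
        # old sublist is below target and the new sublist holds more than its share.
        while (len(self.old) < self.old_target
               and len(self.new) > self.capacity - self.old_target):
            page, _ = self.new.popitem(last=False)
            self.old[page] = None


buf = MidpointLRU(capacity=8)
for p in (1, 2, 1, 2):          # pages 1 and 2 are re-accessed, so they become young
    buf.access(p)
for p in range(100, 120):       # one-off scan of 20 pages
    buf.access(p)
print(buf.access(1), buf.access(2))   # prints: True True
```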

[Which tool shows the buffer pool hit rate metric?]
SHOW ENGINE INNODB STATUS; shows the state of the InnoDB engine, including the buffer pool hit rate.
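Two ways to turn this into a number are sketched below. The first parses the hit-rate line, assuming it looks like `Buffer pool hit rate 995 / 1000` in the BUFFER POOL AND MEMORY section. The second computes the hit rate from the `Innodb_buffer_pool_read_requests` and `Innodb_buffer_pool_reads` status counters (visible via `SHOW GLOBAL STATUS`); treating `1 - reads / read_requests` as the hit rate is a common convention, not something these notes specify. The sample values are hypothetical.

```python
import re
from typing import Optional


def hit_rate_from_counters(read_requests: int, disk_reads: int) -> float:
    """1 - (pages read from disk / logical read requests)."""
    if read_requests == 0:
        return 1.0
    return 1 - disk_reads / read_requests


def hit_rate_from_status(status_text: str) -> Optional[float]:
    """Extract 'Buffer pool hit rate N / 1000' from SHOW ENGINE INNODB STATUS text."""
    m = re.search(r"Buffer pool hit rate (\d+) / (\d+)", status_text)
    if m is None:
        return None        # e.g. no page gets since the last printout
    return int(m.group(1)) / int(m.group(2))


sample = "Buffer pool hit rate 995 / 1000, young-making rate 0 / 1000 not 0 / 1000"
print(hit_rate_from_status(sample))                  # prints: 0.995
print(hit_rate_from_counters(1_000_000, 5_000))
```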

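[Briefly explain the buffer miss scenario]
The database buffer consists of a free list, an LRU list, and a flush list.
step 1 -> look for a free buffer frame in the free list; if one exists, return it
step 2 -> scan the LRU list from the tail up to the scan depth looking for a clean page frame; if one is found, free it and return it
step 3 -> flush the dirty page at the LRU tail and insert the freed frame into the free list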

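[Explain RocksDB's components and how it works]
Memtable -> in-memory data structure
Logfile -> file written sequentially on disk
SSTable -> file of key-value pairs sorted by key

When a write request arrives, it is appended sequentially to the logfile and inserted into the memtable.
When the memtable becomes full, it is flushed to an SSTable on disk, and the logfile associated with that memtable is deleted.

The indexes of SSTables are always loaded in memory.
A read accesses the memtable first and then the SSTable indexes.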
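The write and read paths above can be sketched as a toy LSM store. This is an illustration under simplifying assumptions, not RocksDB's design: RocksDB's real memtable (a skiplist by default) and SST file format are far more elaborate, and the class name `ToyLSM` and the `memtable_limit` parameter are hypothetical. The usage also shows an out-of-place update: the old value of `"a"` stays in the first SSTable, and reads find the newer copy first.

```python
import bisect


class ToyLSM:
    """Tiny LSM sketch: logfile (WAL) + memtable + sorted, immutable SSTables."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable_limit = memtable_limit
        self.wal = []            # stands in for the sequential on-disk logfile
        self.memtable = {}       # a dict is enough for the sketch
        self.sstables = []       # (sorted keys, values) pairs, oldest first

    def put(self, key, value) -> None:
        self.wal.append((key, value))        # sequential log append
        self.memtable[key] = value           # in-memory insert
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        items = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable.clear()
        self.wal.clear()                     # log for the flushed memtable is dropped

    def get(self, key):
        if key in self.memtable:             # memtable first
            return self.memtable[key]
        for keys, values in reversed(self.sstables):   # then SSTables, newest first
            i = bisect.bisect_left(keys, key)          # binary search plays the index
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


db = ToyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")        # memtable full, flushed to the first SSTable
db.put("a", "3")        # out-of-place update: the old "a" stays in the SSTable
print(db.get("a"), db.get("b"), db.get("z"))   # prints: 3 2 None
```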

[List two differences between leveled compaction and universal compaction]
What is compaction? -> it removes multiple copies of the same key and merges SST files into larger SST files
Leveled compaction
Files on disk are organized into multiple levels. L0 stores the files flushed from the memtable. Each level except L0 is a single sorted run of data, and within each such level the data is range-partitioned into multiple SST files.
Every level other than L0 has a target size, and the goal of compaction is to keep each level's size below its target.
Compaction is triggered when the number of files in L0 exceeds a certain number, and the L0 files are merged into L1. Because L0 files usually have overlapping key ranges, all L0 files must be selected for the compaction.
The data in the database is organized into multiple levels
recent data -> L0, oldest data -> Lmax
L0: overlapping keys, sorted by flush time
L1 ~ Lmax: non-overlapping keys, sorted by key
Universal compaction
Under write-heavy workloads, leveled compaction becomes a bottleneck.
Universal compaction was designed to reduce write amplification.
In universal compaction, all files are kept in L0, ordered by time.
Universal compaction temporarily increases space (size) amplification.
1. pick a few files that are adjacent in time
2. merge the picked files
3. replace them with the new file in L0
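The leveled-compaction trigger logic can be sketched as a function that picks the next level to compact. It is loosely modeled on RocksDB's `level0_file_num_compaction_trigger`, `max_bytes_for_level_base`, and `max_bytes_for_level_multiplier` options, but the default values here (4 files, 256 MB, x10) and the simple size/target score are assumptions; RocksDB's real scoring has more cases (for example dynamic level sizing).

```python
def pick_compaction(l0_files, level_sizes_mb, l0_trigger=4,
                    base_target_mb=256, multiplier=10):
    """Return the level to compact next (0 means L0 -> L1), or None.

    level_sizes_mb[i] is the size of L(i+1); the target size of L(i+1) is
    base_target_mb * multiplier**i. All defaults are assumptions.
    """
    if l0_files >= l0_trigger:
        return 0                                  # merge all L0 files into L1
    best_level, best_score = None, 1.0
    for i, size in enumerate(level_sizes_mb):
        score = size / (base_target_mb * multiplier ** i)
        if score > best_score:                    # level is over its target size
            best_level, best_score = i + 1, score
    return best_level


print(pick_compaction(5, [100, 1000]))    # prints: 0
print(pick_compaction(1, [100, 3000]))    # prints: 2
print(pick_compaction(1, [100, 1000]))    # prints: None
```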
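The three universal-compaction steps above, and the definition of compaction as removing multiple copies of the same key, can be sketched on in-memory sorted runs. Which files to pick is passed in explicitly here; RocksDB decides that with its own size-ratio rules, which this sketch does not model, and deletions (tombstones) are ignored. Function names are hypothetical.

```python
def merge_runs(runs):
    """Merge sorted runs given oldest first; for duplicate keys the newest value wins."""
    merged = {}
    for run in runs:
        for key, value in run:
            merged[key] = value           # a newer copy overwrites an older one
    return sorted(merged.items())


def universal_compact(l0, start, count):
    """l0 is ordered oldest -> newest; compact `count` adjacent files from `start`."""
    picked = l0[start:start + count]                       # 1. pick adjacent files
    new_file = merge_runs(picked)                          # 2. merge them
    return l0[:start] + [new_file] + l0[start + count:]    # 3. replace them in L0


l0 = [[("a", 1), ("b", 1)], [("b", 2), ("c", 2)], [("a", 3)], [("d", 4)]]
print(universal_compact(l0, 0, 3))
# prints: [[('a', 3), ('b', 2), ('c', 2)], [('d', 4)]]
```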

[Write amplification and space amplification]
write amplification = number of physical writes to flash memory / number of logical writes from the host
space amplification = size of the database / size of the data in the database
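A worked example of the two ratios, with hypothetical numbers measured as amounts of data written rather than write counts: the host writes 10 GB, flushes and compactions cause 30 GB of device writes, and a 15 GB database holds 10 GB of live data.

```python
def write_amplification(physical_writes_gb: float, logical_writes_gb: float) -> float:
    """Data physically written to flash / data the host asked to write."""
    return physical_writes_gb / logical_writes_gb


def space_amplification(db_size_gb: float, data_size_gb: float) -> float:
    """On-disk database size / size of the live data it holds."""
    return db_size_gb / data_size_gb


print(write_amplification(30, 10), space_amplification(15, 10))   # prints: 3.0 1.5
```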

[RocksDB benchmark results]
micros/op -> time taken to process one operation, in microseconds
ops/sec -> number of operations processed per second
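For a single-threaded run the two metrics are reciprocals of each other, as the sketch below shows. This is an assumption limited to one thread; with several benchmark threads the reported numbers do not necessarily convert this simply.

```python
def ops_per_sec(micros_per_op: float) -> float:
    """Assumes a single thread: throughput is the reciprocal of per-operation latency."""
    return 1_000_000 / micros_per_op


print(ops_per_sec(2.5))   # prints: 400000.0
```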

ewillwin

💼 Software Engineer @ LG Electronics | 🎓 SungKyunKwan Univ. CSE
