C++ Vector 재대로 구현하기

JunTak Lee·2024년 5월 5일

C++

목록 보기

3/5

C++에서 vector라는 것은 하나의 대명사로써 가변배열에 가까운 형태로 많이 통용된다
그렇기에 굉장히 많은 vector implementation이 존재하고, 각자의 장단점이 존재한다
그럼에도 많은 사람들이 컴파일러에서 제공해주는 vector implementation을 활용한다
그 이유는 여러가지가 존재하겠지만, 개인적인 생각으로는 다음과 같은 이유들이 있다고 본다

가장 대중적이다
Standard specification을 가장 잘 반영하고 있다
버그가 상당히 적으며 ~~사실상 없으며~~ 잘 유지보수된다
위 조건을 만족하는 implementation 중 최선의 결과라고 볼 수 있다

사실 개인이 STL을 직접 구현한다는 것은 상당히 고된 일이다
따라서 모두 구현한다기보다 자신에게 필요한 기능 일부만 구현하고 나머지는 생략하기 마련이다
혹은 추가적인 제약을 두어 더 빠르거나 더 심플한 구현을 만들기 마련이다
여기까지는 C++의 모토를 굉장히 잘 이해하고 실행하는 행위이다

문제는 인터넷을 뒤져보면 이상한 ~~이뭐병스런~~ vector 구현체가 굉장히 많이 보인다
이게 제약을 두었다거나 단순화했다기 보다는 C++이란 언어를 이해못하는것 같아보인다
그래서 보다가 답답해서 그냥 내 손으로 직접 구현하기로 마음먹었다

맘같아선 STL의 전반적인 부분까지 모두 구현하고 싶지만, 그랬다가는 몇 주는 소요될 것이다
따라서 iterator나 allocator 같은 부분들은 모두 제외하고 구현하기로 마음먹었다
이 글에서 말하고 싶은바는 이런 화려한 기술이 아니라, 구현부 그 자체이기 때문이다

Allocator

std::allocator를 유심히 본다면 흥미로운 부분들이 많이 보인다
단순히 메모리 할당만 하는 것이 아니라 메모리 접근, 할당, 초기화를 모두 담당하기 때문이다
이 부분을 많은 한국 블로그들이 놓치고 있던데, 개인적으로 굉장히 중요한 부분이라고 본다
특정 블로그를 저격하고 싶은 마음은 없어 학원이 운영하는 블로그를 가져와보았다
https://oceancoding.blogspot.com/2020/12/c-custom-stdvector.html

설명도 자세하고 iterator까지 근사하게 구현해놓았지만 정작 중요한 부분을 놓치고 있다
언뜻 보면 잘모를수도 있겠지만, 실행해본다면 상당히 큰 문제가 바로 보인다
https://godbolt.org/z/157TsnrjM

위 사이트를 들어가보면 알겠지만, 뭐 대단한걸 한 것도 아니다
그냥 단순히 l-value push_back과 r-value push_back을 한번씩 호출한거다
그리고 중간에 capacity를 늘리는 과정이 혼동될 수 있으니 reserve로 먼저 2만큼 확보한거다

std::vector<foo> std_vec;
std_vec.reserve(2);
std_vec.push_back({});
std_vec.push_back(f);

Standard implementation은 다음과 같은 실행을 가진다

constructor called
move constructor called
destructor called
copy constructor called
destructor called
destructor called

r-value construct (foo::foo())
move constructed object (foo::foo(foo&&))
r-value destruct (1번에서 생성한 객체)
l-value copy construct (foo::foo(foo const&))
destruct first element (2번에서 move한 객체)
destruct second element (5번에서 move한 객체)

그렇다면 학원측의 implementation은 어떻게 동작할까
뭔가 더 길어졌는데 하나씩 따라가보자

constructor called
constructor called
constructor called
destructor called
constructor called
assgin operator called
destructor called
assgin operator called
destructor called
destructor called

reserve 1 capacity on vector construct (ocs::vector())
reserve first space (foo::foo())
reserve second space (foo::foo())
destruct first element in previous array (1번에서 생성한 객체)
r-value construct (foo::foo())
l-value assign (foo::operator=(foo const&), 5번에서 생성한 객체)
destruct r-value object (5번에서 생성한 객체)
l-value assign (foo::operator=(foo const&))
destruct first element (2번에서 생성, 6번에서 assign한 객체)
destruct second element (3번에서 생성, 8번에서 assign한 객체)

아마 이 둘의 가장 큰 차이라고 한다면 바로 object의 초기화 시점일 것이다
여기서 말하는 초기화란, constructor의 호출 시점을 말하는 것이다
standard 버전은 array가 생성되었다고해서 바로 초기화되지 않는다
그리고 생성되어야하는 시점(push_back)에서 초기화됨을 확인할 수 있다
그렇기에 assign operator도 호출되지 않는다

사실 이것만해도 성능면에서 굉장히 큰 문제지만, 아직 더 큰 문제가 남아있다
위와 같은 implementation은 default constructor가 존재하지 않으면 사용할 수 없다
그 이유는 new operator가 memory allocation + constructor invocation이기 때문이다
https://godbolt.org/z/Y6rYqWoh6

아예 해결방법이 없냐고하면 그렇지 않다
위 예시에서 뻔질나게 사용한 memcpy와 같은 c-style의 함수로 대체하면된다
예를들어 new 대신에 malloc을 사용하면 배열 초기화와 동시에 객체가 초기화되는 문제를 해결할 수 있다
하지만 이러한 해결방법은 c-style이라는 한계를 극복하지 못한다

C++ Style Allocation

C++ 스타일의 allocation을 공부하려면 std::allocator를 살펴보는 것이 가장 좋다
그런데 이 std::allocator를 살펴보면 new에서 다소 생소한 문법이 보인다
이 부분이 많은 사람들이 놓치고 있는 부분이지 않을까 싶어 설명하려 한다

대부분의 학교나 학원에서 new를 memory allocation operator로 설명하곤한다
이건 반은 맞고 반은 틀린 말이다
cppreference에 가보면 placement new라는게 등장한다
https://en.cppreference.com/w/cpp/language/new

읽어보면 new operator에 추가적인 parameter를 전달해서 다른 행동을 한다고 적혀있다
여기서 다른 행동이 무엇인지를 확인하려면 new header를 살펴봐야한다
https://en.cppreference.com/w/cpp/memory/new/operator_new

이 글에서 저걸 다 설명할 생각은 없고 ~~사실 뭔지 모른다~~ 9,10번에 주목하면 된다
non-allocating placement allocation functions이라는 긴 이름을 가지고 있는 이 녀석은
이름에서 알 수 있듯 memory allocation 대신 placement를 위한 녀석이다
이걸 설명하기 위해서 아주 간단한 예시를 만들었다
foo가 allocation를 위한 new, foo2가 placement를 위한 new 이다
https://godbolt.org/z/hcbvGjhMx

#include <print>
#include <string_view>
#include <new>

struct bar {
    __attribute__((noinline)) 
    bar() { std::print("bar constructor called"); }
};

auto foo() -> decltype(auto) {
    return new bar;
}

auto foo2(bar *ptr) -> decltype(auto) {
    return new(ptr) bar;
}

우선 C++을 공부했다면 누구나 다 아는, allocation을 위한 new 부터 살펴보자

call    operator new(unsigned long)@PLT
mov     rbx, rax
mov     rdi, rax
call    bar::bar() [base object constructor]

이 경우 assembly를 살펴보면, operator new를 호출하고 constructor를 호출한다
즉 우리가 흔히 아는 memory allocation 후 constructor 호출이다
그렇다면 placement new는 어떨까

foo2(bar*):                           
        push    rbx
        mov     rbx, rdi
        call    bar::bar() [base object constructor]
        mov     rax, rbx
        pop     rbx
        ret

차이를 위해 optimization을 넣었더니 operator new가 호출되지도 않고 날라가버렸다
그 이유는 아무일도 일어나지 않기에 호출될 이유가 없기 때문이다
따라서 전달받은 위치에서 생성자만 호출할 뿐이다
이렇기에 우리는 할당받았지만 초기화되지 않은 메모리에서 생성자를 따로 호출할 수 있다

Implementing allocator

지금까지의 내용으로 C++에서의 allocation을 어떻게 해야하는지 알 수 있다
이걸로 allocator를 만들수도 있겠지만, 그러기엔 allocation이란 프로세스가 너무 복잡해보였다
C++ Allocator 요구사항을 처음 읽어보는 입장에서 이건 투머치였다
그래서 ~~예외처리조차 하지 않는~~ 최소한의 내용만으로 구현하기로 마음먹었다

우선 가장 기본적으로 메모리를 할당 받고 해제할 수 있어야한다
당연한 소리지만, 메모리를 할당 받지 않고는 접근조차 허가되지 않은 행동이기 때문이다
원하는 만큼의 메모리를 할당 받거나 해제하기 위해 다음과 같이 작성했다

template <typename T>
auto vector<T>::__allocate(size_t capacity) -> pointer_type {
	return static_cast<value_type*>(::operator new(capacity * sizeof(value_type)));
}

template <typename T>
auto vector<T>::__deallocate(value_type *const pos) -> void {
	::operator delete(const_cast<void*>(static_cast<const volatile void*>(pos)));
}

여기서 콜론 두개를 앞에 붙이는 이유는 global 버전을 호출하기 위해서이다
당연하게도 operator도 함수 취급이니 overloading이 가능하다
따라서 global 버전의 operator new(delete)를 사용하기 위해서는 콜론 두개를 붙여주어야한다

이렇게 메모리를 할당 받았다면 해당 위치에 객체를 생성해야 할 것이다
사실 이 부분을 STL에서는 forwarding reference로 각 상황에 맞는 constructor를 호출한다
그렇기에 allocator에서는 저 arguments를 forwarding만 해주면 된다

template< class T, class... Args >
constexpr T* construct_at( T* p, Args&&... args );

하지만 목표가 STL을 사용하지 않고 구현하는것이므로, std::forward를 사용할 수 없다
따라서 ~~사실상 동일한~~ forward를 빠르게 구현하고 넘어가자

template <typename T>
struct remove_reference {
    typedef T type;
};

template <typename T>
struct remove_reference<T&> : remove_reference<T>
{};

template <typename T>
struct remove_reference<T&&> : remove_reference<T>
{};

template <typename T>
constexpr T&& forward(typename remove_reference<T>::type &arg) noexcept {
    return static_cast<T&&>(arg);
};

template <typename T>
constexpr T&& forward(typename remove_reference<T>::type &&arg) noexcept {
    return static_cast<T&&>(arg);
};

forward를 구현하였으니 construct_at은 심플하게 구현이 가능해진다

template <typename T>
template <typename... Args>
void vector<T>::__construct(const_pointer_type pos, Args&&... args) {
    ::new (const_cast<void*>(static_cast<const volatile void*>(pos))) value_type(forward<Args>(args)...);
}

construct를 했으니 반대로 destruct도 해야할 것이다
사실 destruct의 경우 굉장히 간단한데, 그냥 destructor만 호출해주면 끝이기 때문이다
~~심지어 얘는 overloading도 없다..!~~

template <typename T>
auto vector<T>::__destruct(value_type *pos) -> void {
    pos->~value_type();
}

Modifiers

Allocation을 힘들게 만들었는데 정작 element를 넣거나 빼지 못한다면 무슨 소용이 있을까
자료구조라하면 element를 저장하기 위한 것이므로, 저장하거나 삭제할 수 있어야 할 것이다
이를 위해서 Standard에서는 다양한 인터페이스를 소개한다 ~~심지어 추가되고 있다~~
그런데 Allocation에서 힘을 너무 많이 썼더니 살짝 귀찮아졌다
그래서 머리속에 떠오르는 것만 만들었다

push_back(value_type const &value)
push_back(value_type &&value)
pop_back()
template <typename... Args> emplace_back(Args&&... args)

insert(value_type *const pos, value_type const &value)
insert(value_type *const pos, value_type const *begin, value_type const *end)

erase(value_type *const pos)
erase(value_type *const begin, value_type *const end)

resize(size_type size)
resize(size_type size, value_type const& value)
swap(vector& other)
clear()

다 적고 보니 상당히 허전한데 나머지는 각 구현에서 조금씩 바꿔서 만들면되기에 생략한다
추가적으로 swap과 clear도 상당히 단순하고 명백한 부분이라 생각해 생략했다

Push Back

vector의 꽃이라고도 할 수 있는 push back이 가장 먼저 떠올라서 만들었다
기본적인 동작이라하면 capacity가 size 보다 큰 경우와 그외 2가지가 존재할 것이다
우선 capacity가 size 보다 큰 경우에는 그냥 마지막에 하나 construct 해주면 된다
그리고 그외의 경우에는 capacity를 n배로 늘려주고 마지막에 하나 construct 해주면 된다

template <typename T>
void vector<T>::push_back(const value_type &value) {
    if (__size < __capacity) {
    	// capacity 넘지 않은 경우
        // 마지막에 새로운 element를 construct만 해주면 끝
        __construct(__arr + __size++, value);
    }
    else {
    	// n배 늘린 새로운 메모리 주소
        size_type new_cap = __capacity == 0 ? 1 : __capacity * 2;
        pointer_type new_arr = __allocate(new_cap);
        pointer_type dst = new_arr + __size;
        
        // 새로운 element 추가
        __construct(dst, value); dst -= 1;
        
        // 기존 element들 move
        __move_construct_backward(dst, end() - 1, begin() - 1);
        
        // 기존 배열 destruct
        __destruct_range(begin(), end());
        __deallocate(begin());
        
        __arr      = new_arr;
        __capacity = new_cap;
        ++__size;
    }
}

이때 주의할점은 크게 3가지 정도인것 같다

vector의 size가 0이 될 수 있다는 것
새로운 위치로 옮기는 경우 move를 써야 효울적이라는 것
기존의 element들은 destructor를 호출해줘야 한다는 것

사실 size가 0이 아닌 implementation도 가능은하다
다만 이 경우 빈 vector 생성시 heap allocation이 필연적이어서 추천하지는 않는다
차라리 매번 capacity가 0인지 검사하고 0이라면 1로 만드는 편이 더 낫다고 본다
그리고 capacity의 경우 굳이 2배를 해줄 필요는 없다
실제로 최신 언어들은 2배 대신 다른 숫자를 선택하는 편이다
나는 그냥 clang, gcc와 똑같이하려고 2배로 구현했다

잘보면 l-value const reference 버전만 있고 r-value reference 버전이 안보인다
이유는 둘이 거의 똑같고 construct 호출시만 다르기 때문에 중복같아보여서 적지않았다

Pop Back

vector에서 push_back 만큼이나 효율적인 동작이 pop_back이다
사실 capacity에 관여하지 않기에 어쩌면 더 효율적이라고 볼 수도 있을 것이다
그렇기에 구현도 굉장히 짧은 편이다
back element의 destructor를 호출하고 size를 줄이면 끝이다

template <typename T>
void vector<T>::pop_back() {
    __destruct(__arr + --__size);
}

Emplace Back

push_back은 굉장히 효율적이지만, 값이 항상 복사 혹은 이동 된다는 단점이 있다
예를 들어 다음과 같은 코드를 생각해보자

struct foo {
	foo(int) {}
}

std::vector<foo> vec;
vec.push_back({32});

위 경우에서 constructor 호출 후, move constructor가 한번더 호출된다
하지만 push_back하는 시점에서 값을 생성하기에 move constructor가 호출될 이유는 없다
즉, 불필요한 함수 호출이 발생된다는 것이다
이것을 해결하기 위해 emplace_back이라는 인터페이스가 존재한다
push_back과 거의 동일하지만 construct에서만 약간 다른 형태로 구현해주었다

template <typename T>
template <typename... Args>
auto vector<T>::emplace_back(Args&&... args) -> reference_type {
    if (__size < __capacity) {
        __construct(__arr + __size++, forward<Args>(args)...);
    }
    else {
        size_type new_cap = __capacity == 0 ? 1 : __capacity * 2;
        pointer_type new_arr = __allocate(new_cap);
        
        pointer_type dst = new_arr + __size;
        __construct(dst, forward<Args>(args)...); dst -= 1;
        __move_construct_backward(dst, end() - 1, begin() - 1);
        
        __destruct_range(begin(), end());
        __deallocate(begin());
        
        __arr      = new_arr;
        __capacity = new_cap;
        ++__size;
    }
    
    return back();
}

Insert

insert는 vector에서 피해야할 operation 중 하나라고 생각한다
하지만 꼭 필요한 경우도 있으니 구현은 해야한다
다만 느리다는 것은 그만큼 더 많은 operation이 필요하고, 더 해야할 부분이 많다는 뜻이기도 하다

우선 그나마 쉬운 한개짜리 insert 부터 구현해보자
여기서 깨알같은 최적화를 위해 만약 insert하고자하는 위치가 end라면, 그냥 push_back으로 바꿔준다
그렇지 않은 경우 바로 뒤에서 설명할 range 기반 insert로 넘겨주면 된다
clang의 구현을 보니 한개짜리도 다 직접 구현해놨던데 귀찮아서 그냥 이런식으로 돌려버렸다

template <typename T>
void vector<T>::insert(const_pointer_type pos, value_type const &value) {
    if (pos == end()) {
        push_back(value);
    }
    else {
        pointer_type begin = const_cast<pointer_type>(&value);
        pointer_type end = begin + 1;
        
        insert(pos, begin, end);
    }
}

range 기반 입력이 들어온 경우에는 조금 귀찮아진다
우선 capacity도 신경써야하고, 앞 부분과 뒷 부분을 모두 옮겨줘야 하기 때문이다
optimal한 결과를 만들어낼려면 이것도 case by case로 만들어야할 것이다
하지만 귀찮은 관계로 iter 기반 insert를 제외하고는 깔끔하게 포기했다

template <typename T>
void vector<T>::insert(const_pointer_type pos,
                       pointer_type begin, pointer_type end) {
    size_type new_size = size() + size_type(end - begin);
    
    if (new_size > capacity()) {
        size_type new_cap = new_size < capacity() * 2 ? capacity() * 2 : new_size;
        pointer_type new_arr = __allocate(new_cap);
        
        pointer_type dst = new_arr;
        pointer_type old_begin = this->begin();
        pointer_type old_end = this->end();
        size_type diff = size_type(pos - old_begin);
        
        __move_construct_range(dst, old_begin, old_begin + diff); dst += diff;
        __copy_construct_range(dst, begin, end); dst += size_type(end - begin);
        __move_construct_range(dst, old_begin + diff, old_end);
        
        __destruct_range(old_begin, old_end);
        __deallocate(__arr);
        
        __arr      = new_arr;
        __capacity = new_cap;
    }
    else {
        pointer_type loc = this->begin() + (pos - this->begin());
        
        __move_backward(this->begin() + new_size - 1, this->end() - 1, loc - 1);
        __copy_construct_range(loc, begin, end);
    }
    
    __size = new_size;
}

Erase

destructor 부분에서 보았듯이 destruct가 construct보단 쉽다
해당 element의 destructor 호출해주고 뒤 element들을 앞으로 당겨주기만 하면 된다
reallocation이 일어날 일도 없으니 굳이 분기를 나눌 필요도 없다

template <typename T>
void vector<T>::erase(const_pointer_type pos) {
    pointer_type loc = begin() + (pos - begin());
    
    __destruct(loc);
    __move_range(loc, loc + 1, end());
    --__size;
}

template <typename T>
void vector<T>::erase(const_pointer_type begin, const_pointer_type end) {
    __destruct_range(begin, end);
    __move_range(begin, end, this->end());
    __size -= (size_type)(end - begin);
}

Resize

Resize의 경우 4가지 경우를 생각해야 된다

새로운 size가 기존 size와 동일한 경우
새로운 size가 기존 capacity 보다 큰 경우
새로운 size가 기존 capacity 보단 작지만, 기존 size 보단 큰 경우
새로운 size가 기존 size 보다 작은 경우

첫번째는 아무것도 안하고 그냥 리턴하면 된다
두번째는 당연하겠지만 allocation을 새로하고, 새로운 size까지 생성자를 호출해줘야한다
세번째는 allocation은 필요없지만, 새로운 size까지 생성자를 호출해줘야한다
네번째는 그냥 새로운 size까지 destructor 호출해주면 된다
뭔가 말이 복잡한데 코드는 생각보다 심플하다

template <typename T>
void vector<T>::resize(size_type size) {
    if (size == __size) {
        return;
    }
    
    if (size > __size) {
        if (size > __capacity) {
            auto new_cap = __capacity * 2 > size ? __capacity * 2 : size;
            auto dst = __allocate(new_cap);
            
            __move_construct_range(dst, begin(), end());
            __construct_range(dst + __size, dst + size);
            
            __destruct_range(begin(), end());
            __deallocate(begin());
            
            __arr = dst;
            __capacity = new_cap;
        }
        else {
            __construct_range(end(), begin() + size);
        }
    }
    else {
        __destruct_range(begin() + size, end());
    }
    
    __size = size;
}

다만 resize의 경우 두,세번째 케이스에서 생성자를 호출하기 때문에 또 다른 인터페이스가 필요하다
위 코드는 기본생성자만 호출할 수 있기에 복사생성자를 호출할 수 있게끔 해줘야 한다

template <typename T>
void vector<T>::resize(size_type size, value_type const& value) {
    if (size == __size) {
        return;
    }
    
    if (size > __size) {
        if (size > __capacity) {
            auto new_cap = __capacity * 2 > size ? __capacity * 2 : size;
            auto dst = __allocate(new_cap);
            
            __move_construct_range(dst, begin(), end());
            __construct_range(dst + __size, dst + size, value);
            
            __destruct_range(begin(), end());
            __deallocate(begin());
            
            __arr = dst;
            __capacity = new_cap;
        }
        else {
            __construct_range(end(), begin() + size, value);
        }
    }
    else {
        __destruct_range(begin() + size, end());
    }
    
    __size = size;
}

Performance

C++ 좋아하는 사람치고 성능 안좋아하는 사람은 못봤다
그렇기에 이렇게까지 만들어놓고 성능 측정을 안한다면 너무 아쉬울 것이다
성능비교를 뭐랑 할까 고민하다가 레퍼런스로 많이 참고했던 libc++ vector를 선택했다
물론 패배할껄 알지만, 그래도 얼마나 근접했는지를 확인해보고 싶었기 때문이다

측정 방법은 동일한 operation을 동일한 상태에서 동일한 iteration 만큼 수행하고 이를 측정하는 것이다
여기서 시간도 측정했지만, 그보다 중요하게 생각한 부분은 어떤 함수가 몇번이나 호출되었는가이다
따라서 시간 측정에 chrono library를 사용하였는데 그다지 문제된다고 생각하지 않는다
인텔 cpu였다면 rdtsc를 사용했겠지만 m3 mac으로 기변한 지금 불가능해서 그런것도 있다

우선 벤치마킹에 사용한 data type은 다음과 같다
정말 단순하게 호출된 함수의 count를 1씩 올려주는게 전부다

std::size_t constructor_cnt;
std::size_t destructor_cnt;
std::size_t copy_cnt;
std::size_t move_cnt;
std::size_t assign_cnt;
std::size_t move_assign_cnt;

struct foo {
    int value;
    long dummy[12] = { 0, };
    
    foo() : foo(42) {}
    foo(int n) : value(n) { ++constructor_cnt; }
    foo(foo const& o) : value(o.value) { ++copy_cnt; }
    foo(foo&& o) noexcept : value(o.value) { ++move_cnt; }
    ~foo() { ++destructor_cnt; }
    
    foo &operator=(foo const& o) { value = o.value; ++assign_cnt; return *this; }
    foo &operator=(foo&& o) { value = o.value; ++move_assign_cnt; return *this; }
    
    bool operator==(foo const &other) { return value == other.value; }
    bool operator!=(foo const &other) { return value != other.value; }
};

벤치마킹에 사용한 코드는 다음과 같다(출력부 제외)

#include <iostream>
#include <vector>
#include <chrono>
#include <array>
#include <tuple>
#include <string>

template <typename T>
using benchmark_t = std::tuple<std::string, int, void(*)(T&, int)>;

template <typename T>
std::array benchmark = {
	...
}

template <typename T>
auto benchmark_impl(T& vec) {
    for (auto& benchmark_func : benchmark<T>) {
        constructor_cnt = 0;
        copy_cnt        = 0;
        move_cnt        = 0;
        destructor_cnt  = 0;
        assign_cnt      = 0;
        move_assign_cnt = 0;
        
        auto target_name = std::get<0>(benchmark_func);
        auto target_iter = std::get<1>(benchmark_func);
        auto target_func = std::get<2>(benchmark_func);
        
        auto start = std::chrono::high_resolution_clock::now();
        
        for (int i = 0; i < target_iter; ++i) {
            target_func(vec, i);
        }
        
        auto end = std::chrono::high_resolution_clock::now();
        auto diff = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        
        if (target_name.empty()) {
            continue;
        }
    }
}

auto main() -> int {
    vector<foo> vec;
    vector<foo_debug> vec_debug;
    {
        debug(vec_debug);
        benchmark_impl(vec);
    }
    
    std::vector<foo> std_vec;
    std::vector<foo_debug> std_vec_debug;
    {
        debug(std_vec_debug);
        benchmark_impl(std_vec);
    }
}

벤치마크한 함수와 iteration count는 다음과 같다

benchmark_t<T>{ {}, 1000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.push_back(cnt); }},
        
benchmark_t<T>{ "reserve", 10000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.reserve((std::size_t)cnt); }},
    
benchmark_t<T>{ "resize", 100,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.resize((std::size_t)cnt * 10000); }},
    
benchmark_t<T>{ "resize_args", 100,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.resize((std::size_t)cnt * 1000, {32}); }},
    
benchmark_t<T>{ "push_back", 1000000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.push_back({cnt}); }},
    
benchmark_t<T>{ "insert", 1000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.insert(vec.begin() + cnt * 2, {42}); }},
    
benchmark_t<T>{ "modify", 1000000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec[(std::size_t) + 1000].value = cnt * 200; }},
    
benchmark_t<T>{ "erase", 1000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.erase(vec.begin()); }},
    
benchmark_t<T>{ "emplace_back", 1000000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.emplace_back(cnt); }},
    
benchmark_t<T>{ "at", 100000,
	[](auto& vec, [[maybe_unused]] int cnt) -> void { vec.at((std::size_t)cnt * 3); }}

reserve	Custom implementation	Libc++ implementation
chrono	19140	17788
constructor	0	0
copy	0	0
move	8975000	8975000
destructor	8975000	8975000
assign	0	0
move assign	0	0

resize	Custom implementation	Libc++ implementation
chrono	24318	14394
constructor	990000	990000
copy	0	0
move	1200000	1200000
destructor	1201000	1201000
assign	0	0
move assign	0	0

resize_args	Custom implementation	Libc++ implementation
chrono	395	196
constructor	100	100
copy	99000	99000
move	0	0
destructor	990100	990100
assign	0	0
move assign	0	0

push_back	Custom implementation	Libc++ implementation
chrono	8054	3950
constructor	1000000	1000000
copy	0	0
move	1000000	1000000
destructor	1000000	1000000
assign	0	0
move assign	0	0

emplace_back	Custom implementation	Libc++ implementation
chrono	18794	19583
constructor	1000000	1000000
copy	0	0
move	1279872	1279872
destructor	1279872	1279872
assign	0	0
move assign	0	0

insert	Custom implementation	Libc++ implementation
chrono	1327495	1359554
constructor	1000	1000
copy	1000	0
move	0	1000
destructor	1000	1000
assign	0	1000
move assign	1098500500	1098499500

유일하게 호출 횟수가 차이나는 곳이 insert인데, 이유는 exception과 관련 되어있다
예전에 이 이유를 설명하는 stack overflow 글을 봤었는데 다시보니 기억이 나질 않는다..
어찌되었건 standard에 부합하는 구현체를 만드려면 libc++과 같이 해야한다
단순히 객체를 복사하는 것만으로는 해결되지 않는다

erase	Custom implementation	Libc++ implementation
chrono	1281392	1278906
constructor	0	0
copy	0	0
move	0	0
destructor	1000	10000
assign	0	0
move assign	1099499500	1099499500

at	Custom implementation	Libc++ implementation
chrono	89	89
constructor	0	0
copy	0	0
move	0	0
destructor	0	0
assign	0	0
move assign	0	0

재대로 구현한 경우라면, libc++의 구현이 더 빠른걸 확인할 수 있다
내 구현체가 더 빠른 경우는 높은 확률로 내가 뭘 빼뜨리고 구현해서이다
이렇듯 컴파일러가 작성한 STL은 Standard를 준수하는 최상의 결과물이라고 할 수 있다

다만 속도에서의 차이가 심한 경우는 뭔가 문제가 있는 경우인데..
이건 나중에 시간나면 다시 확인해봐야 할듯 하다

코딩을 하다보면 아무생각없이 library를 사용하기 마련이다
그 안이 어떻게 구현되었을까를 한번쯤은 고민해보고 찾아봐도 좋은 공부가 된다고 생각한다
왜냐하면 이러한 구현들은 해당 언어에 대한 높은 이해도가 동반되어야하기 때문이다
물론 언어가 도구라는 것에는 동의하는 편이다
다만 도구도 용도를 알고 능숙하게 다루는 것과 그렇지 못한것은 분명 다르다고 생각한다

이걸 직접 구현해본다면 STL 구현하시는 분들이 얼마나 대단한지 느낄 수 있을 것이다
Standard 문서를 읽으면서 requirements를 모두 충족시키는 구현을 만들기란 쉽지 않다
그런데 이러한 구현이 성능까지 최상인 결과물이라면 어떨까
위에서도 언급했지만 동일한 구현에서도 성능차이가 나는것을 확인할 수 있다
이러한 부분이 바로 mirco-optimization이 아닐까 싶다
이러한 optimization이 필요하냐 안하냐 논하기에 앞서 한가지를 꼭 기억해야한다
우리가 아무 고민없이 최상의 결과물을 누릴 수 있다는 사실이다
~~STL 구현하시는 분들께 마음속으로라도 감사합니다 삼창하고 지나가자~~

이번에도 하루면 작성할 수 있겠지라는 가벼운 마음을 시작했던 글이 5일이나 걸렸다
구현까지 생각하면 일주일정도 걸린것 같다
이번 글도 점차 시간에 쫓기고, 힘이 부쳐서 급하게 마무리한 감이있다
그래도 덕분에 나름 동작하는 vector 구현체를 만들 수 있었다
이런 짓거리를 나중에 또 할일이 있을까 싶지만은, 그래도 알아서 나쁠껀 없다고 본다
애초에 이런 공부가 C++과 친해지는 한걸음이라고 생각한다

원래는 깃헙에 올릴생각이 없었는데 이 썩을 velog가 토글 미지원이라 그냥 깃헙에 올렸다
https://github.com/ross1573/mini_vector

JunTak Lee

이전 포스트

C++ STL Container 성능 비교

다음 포스트