[Dreamhack] Heap Allocator Exploit: ptmalloc2 allocator

Merry Berry·2024년 9월 14일

heap ptmalloc pwnable

Pwnable&Reversing

목록 보기

5/9

https://dreamhack.io/lecture/courses/16
GLIBC 2.23 기준으로 정리한 글이다.

Chunk

Chunk Structure

힙 할당 단위

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

청크는 header(prev_size, size)와 mem(fd부터)으로 나뉜다. 실제 데이터는 mem부터 저장된다. 따라서 fd, bk 포인터는 청크가 해제(free)되었을 때만 사용된다.

prev_size: 인접한 앞에 있는(낮은 주소) 청크가 free일 경우(prev_inuse=0일 경우)의 크기. 병합 목적으로 사용된다.

tcache가 도입된 glibc 2.26이상부터는 이 값이 설정되지 않을 수 있으니 주의
size: 현재 청크의 크기(header+mem). LSB 3비트는 flag로 사용된다.
- PREV_INUSE(0th bit): 인접한 앞선 청크가 할당된 청크일 경우 1, free일 경우 0. 처음 할당된 청크일 경우 1
- IS_MMAPPED(1st bit): 해당 청크가 mmap()으로 할당됐을 때 1
  
  Chunks allocated via mmap, which have the second-lowest-order
  bit M (IS_MMAPPED) set in their size fields. Because they are
  allocated one-by-one, each must contain its own trailing size
  field. If the M bit is set, the other bits are ignored
  (because mmapped chunks are neither in an arena, nor adjacent
  to a freed chunk). The M bit is also used for chunks which
  originally came from a dumped heap via malloc_set_state in
  hooks.c. 링크
- NON_MAIN_ARENA(2nd bit): 해당 청크가 main_arena에서 관리되지 않을 때 1
fd: 해제된 청크에서만 사용. 동일한 bin의 다음 청크를 가리키는 포인터
bk: 해제된 청크에서만 사용. 동일한 bin의 이전 청크를 가리키는 포인터
fd_nextsize: large bin에서 사용하는 포인터. 현재 청크보다 크기가 작은 청크를 가리키는 포인터
bk_nextsize: large bin에서 사용하는 포인터. 현재 청크보다 크기가 큰 청크를 가리키는 포인터

Type of Chunk

Allocate Chunk: malloc()등을 통해 할당된 청크
Free Chunk: free()로 해제된 청크
Top Chunk(Wildness Chunk): 초기에 0x21000만큼의 크기를 가지며, malloc() 호출 시 free한 청크가 없다면 해당 청크에서 분리해서 할당한다. 만약 top chunk와 인접한 청크가 해제될 떼, 해당 청크의 크기가 fastbin에 포함되지 않으면 top chunk에 합쳐진다.

Bin

해제된 청크가 모여 있는 (단일/이중)연결 리스트. 여러 개의 bin이 존재하며, 각 bin마다 리스트에 포함되는 청크의 크기가 정해져 있다.
Fastbin, Unsorted bin, Smallbin, Largebin이 있다.

Fastbin
- 0x10(16) ~ 0x40(64) 바이트(32bit)
- 0x20(32) ~ 0x80(128) 바이트(64bit)
Smallbin
- < 0x200(512) 바이트(32bit)
- < 0x400(1024) 바이트(64bit)
Largebin
- >= 0x200(512) 바이트(32bit)
- >= 0x400(1024) 바이트(64bit)

Fastbin

LIFO, 단일 연결 리스트, bin들 중 할당 해제 속도가 가장 빠르다.
fastbin[0]:32byte, fastbin[1]:48byte, ..., fastbin[9]:178byte가 존재하지만, 리눅스는 이 중 7개(fastbin[0]:32byte~fastbin[6]:128byte)만을 사용한다. (64bit 기준)
병합 과정이 없다. 따라서 해제된 청크의 다음 청크의 prev_inuse가 0으로 설정되지 않는다.

실제 glibc 2.23 코드에 의하면64bit 기준 get_max_fast() 매크로의 리턴 값은 128byte이다.

Unsorted bin

FIFO, 원형 이중 연결 리스트, 1개의 bin만 존재한다. 인접한 청크는 병합된다.
다음과 같은 조건일 경우 청크가 unsorted bin에 포함된다.

smallbin, largebin 크기의 청크가 해제됐을 때

해당 크기의 청크는 바로 그 크기에 맞는 bin에 들어가지 않고, 먼저 unsorted bin에 들어간다.
fastbin에서 청크가 병합됐을 때

fastbin은 기본적으로 병합을 하지 않지만, 큰 크기의 size를 요청받는 등 특정 상황에서는 병합을 수행한다. ( malloc_consolidate())
요청한 크기의 best fit인 청크의 last remainder

만약 bin에 있는 청크보다 작은 크기를 요청하면, 그 청크를 split하여 제공한다. 이때 분리된 나머지 청크를 Last Remainder Chunk라고 한다.

이 bin에 있는 해제된 청크를 재사용하기 위해서는 해당 청크의 크기보다 작거나 같은 크기를 요청해야 한다.
만약 요청한 크기가 bin에 있는 청크의 크기보다 크다면, 해당 bin 내의 청크들은 각각 smallbin, largebin으로 옮겨진다.

Smallbin

FIFO, 원형 이중 연결 리스트이다.
smallbin[0]:32byte, smallbin[1]:48byte, ..., smallbin[61]:1008byte 총 62개의 smallbin이 존재한다.
해당 bin에서는 병합이 발생한다. 따라서 인접한 다음 청크의 prev_inuse가 0으로 설정될 수도 있다. 그러나 이때 병합이 발생해 하나의 청크로 합쳐지므로, 사실상 두 개의 청크가 인접해 있을 수 없다.

Largebin

FIFO, 이중 연결 리스트이다.
glibc 2.23 기준으로 512byte 이상, glibc 2.27 기준 1024byte 이상의 청크가 저장되며, 총 63개의 largebin이 존재한다. 각 largebin은 일정 범위의 크기인 청크를 저장한다.

glibc 2.23 기준 largebin[0]: 512<=sz<576, largebin[1]: 576<=sz<640, ...
glibc 2.27 기준 largebin[0]:1024<=sz<1088, ..., largebin[32]:3072<=sz<3584, ...

largebin 내의 청크들은 크기 기준으로 내림차순으로 정렬되며 이때 fd_nextsize, bk_nextsize 포인터를 사용한다. 링크
smallbin과 마찬가지로 병합이 발생한다.

bin members

이 다음에 기술될 malloc_state 구조체는 bin을 관리하는데, 해당 구조체 내에는 fastbinsY, bins 멤버가 있다. 이들은 모두 struct malloc_chunk *의 배열이다.
fastbinsY는 길이가 NFASTBINS(=10)인 배열이지만, 인덱스 6(7개)까지 사용하고 있다. 각 배열의 값은 각각의 bin에서 가장 앞에 있는 청크의 주소다. 만약 크기가 0x20인 두 청크 0x602000, 0x602020이 순서대로 할당 해제되면, 다음과 같은 구조가 형성된다.

// fastbinsY[0]: 0x602020 -> 0x602000
fastbinsY[0] = 0x602020
*((struct malloc_chunk*)0x602020).fd = 0x602000
*((struct malloc_chunk*)0x602000).fd = 0

fastbin은 LIFO 구조로, 가장 마지막에 들어온 청크는 bin의 fd에 연결된다.

bins는 길이가 NBINS(128)*2-2인 배열이고, unsorted bin, smallbin, largebin이 포함돼 있다. 참고로 각 배열의 원소는 8byte이지만, 위에서 설명한 bin 하나가 실제로는 두 개의 연속된 원소(16byte, 각각 fd, bk)를 사용한다. unsorted bin은 (fd=bin[0], bk=bin[1])을, smallbin은 (fd=bin[2], bk=bin[3])~(fd=bin[124], bk=bin[125]), largebin은 (fd=bin[126], bk=bin[127])~을 사용한다.
예를 들어, 크기가 각각 0x90, 0x110, 0x130인 3개의 청크 A, B, C가 순서대로 unsorted bin에 들어갔다고 가정할 때 다음과 같은 구조가 형성된다.

// bin[0] = A; A->bk = B; B->bk = C; C->bk = &bins[0] - 0x10
// bin[1] = C; C->fd = B; B->fd = A; A->fd = &bins[0] - 0x10
*((struct malloc_chunk*)(&bins[0] - 0x10)).fd = C  //bins[0] = C
*((struct malloc_chunk*)(&bins[0] - 0x10)).bk = A  //bins[1] = A
*((struct malloc_chunk*)A).fd = &bins[0] - 0x10
*((struct malloc_chunk*)A).bk = B
*((struct malloc_chunk*)B).fd = A
*((struct malloc_chunk*)B).bk = C
*((struct malloc_chunk*)C).fd = B
*((struct malloc_chunk*)C).bk = &bins[0] - 0x10

fastbin을 제외한 나머지 bin은 모두 FIFO 구조로, 가장 마지막에 들어온 청크는 원래 있던 마지막 청크의 bk에 연결된다.

이때 주의할 점은, fastbinsY, bins 모두 접근 시 기존 주소값에 0x10을 뺀 값으로 접근한다는 것이다. 예를 들어 unsorted bin(bins[0], bins[1])에 값을 쓰기 위해 (struct malloc_chunk *)(&bins[0] - 0x10)으로 접근한다. 그 이유는 struct malloc_chunkd의 fd, bk 멤버를 활용하기 위함인데, 특정 주소에서 0x10만큼 빼야 두 멤버들을 통해 원하는 주소에 값을 읽거나 쓸 수 있기 때문이다. 즉, *((struct malloc_chunk *)(&bins[0] - 0x10)).fd, *((struct malloc_chunk *)(&bins[0] - 0x10)).bk로 접근하여 bins[0], bins[1]에 값을 쓸 수 있다. 그리고 각 청크에서 fd, bk에 bin의 주소가 들어가야 한다면, 원래 bin의 주소 값에 0x10을 뺀 값이 저장된다.

실제로 코드에서 자주 사용되는 bin_at(m, i) 매크로 함수는 인덱스 i에 해당하는 bin의 주소 &bins[(i-1)*2]에서 0x10을 뺀 값 &bins[(i-1)*2]-0x10을 리턴한다.

largebin은 다른 bin들과 달리 내부에서 크기 순으로 내림차순 정렬한다. 예를 들어, 6개의 청크[크기] A(0x1000), B(0x1020), C(0x1040), D(0x1000), E(0x1020), F(0x1040)가 순서대로 largebin에 들어간다고 하자. 그럼 다음과 같은 구조가 형성된다.

가장 큰 크기의 청크는 bin의 fd에 연결된다.
크기 순으로 정렬되지만, 같은 크기일 경우 가장 마지막에 들어온 청크가 기존에 있던 청크의fd에 연결된다.

Arena

bin(fastbin, unsorted bin, smallbin, largebin)들을 관리하는 객체로, 각 스레드가 접근할 때마다 lock을 건다.
최대 64개의 arena를 생성할 수 있지만, 스레드의 수가 많을 경우 병목 현상이 생긴다. 따라서 glibc 2.26부터 tcache를 도입했다.
스레드들은 각각의 thread_arena를 갖는데, 이 중 메인 스레드가 관리하는 arena를 main_arena라고 하며, 전역 변수로 선언되어 있다.
만약 top chunk보다 큰 크기를 요청할 경우, mmap()으로 할당된 페이지에 청크가 할당되며 이는 main_arena에서 관리하지 않는다. 이때 해당 청크의 NON_MAIN_ARENA 값이 1이 된다.

static struct malloc_state main_arena =
{
  .mutex = _LIBC_LOCK_INITIALIZER,
  .next = &main_arena,
  .attached_threads = 1
};

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

malloc() Routine Analysis

__libc_malloc()

malloc()호출 시 __libc_malloc()이 호출된다.

#define arena_get(ptr, size) do {            \
      ptr = thread_arena;                    \
      arena_lock (ptr, size);                \
  } while (0)
#define arena_lock(ptr, size) do {           \
      if (ptr && !arena_is_corrupt (ptr))    \
        (void) mutex_lock (&ptr->mutex);     \
      else                                   \
        ptr = arena_get2 ((size), NULL);     \
  } while (0)

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);

static void *
malloc_hook_ini (size_t sz, const void *caller)
{
  __malloc_hook = NULL;
  ptmalloc_init ();
  return __libc_malloc (sz);
}

__malloc_hook의 초기 값은 malloc_hook_ini()이다. malloc_hook_ini()는 __malloc_hook를 NULL로 설정하고, ptmalloc_init()을 호출한 뒤 __libc_malloc()을 호출한다. ptmalloc_init()에서는 thread_arena = &main_arena, __malloc_initialized = 1을 실행한다(그 외에 환경 변수에 따라 이것저것 설정하는 것으로 보인다).
다시 __libc_malloc()이 호출되면 hook이 NULL이므로 arena_get() 매크로를 통해 ar_ptr에 &main_arena 값을 설정한다. 이후 ar_ptr, bytes를 인자로 _int_malloc()을 호출한다.

만약 새 스레드를 생성했다면, 바로 arena_get() 매크로를 실행한다. 이때 thread_arena는 각 스레드의 TLS에 존재한다. 그런데 메인 스레드의 경우 ptmalloc_init()에서 메인 스레드의 특정 TLS 영역(thread_arena)을 &main_arena의 값으로 설정했으므로 해당 매크로 함수를 실행했을 때 NULL 값이 아니다. 그러나 새로운 스레드가 이 매크로 함수를 실행했을 때에는 ptmalloc_init()을 호출하지 않았으므로 thread_arena의 값이 NULL이므로, arena_get2()함수가 실행된다.

_int_malloc()

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */
#define MINSIZE                                                    \ 
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

/*
   Check if a request is so large that it would wrap around zero when
   padded and aligned. To simplify some other code, the bound is made
   low enough so that adding MINSIZE will also not wrap around zero.
 */
#define REQUEST_OUT_OF_RANGE(req)                                  \
  ((unsigned long) (req) >=                                        \
   (unsigned long) (INTERNAL_SIZE_T) (-2 * MINSIZE))

/* pad request bytes into a usable size -- internal version */
#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/*  Same, except also perform argument check */
#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                               \
      __set_errno (ENOMEM);						                            \
      return 0;								                                    \
    }									                                            \
  (sz) = request2size (req);

static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
  checked_request2size(bytes, nb);

_int_malloc()에서는 checked_request2size 매크로를 이용해, 인자 bytes가 헤더의 크기를 포함하며 0x10 단위로 정렬되도록 조정한 후 nb에 저장한다.

/* There are no usable arenas.  Fall back to sysmalloc to get a chunk from mmap.  */
if (__glibc_unlikely (av == NULL))
  {
  void *p = sysmalloc (nb, av);
  if (p != NULL)
  alloc_perturb (p, bytes);
  return p;
  }

만약 thread_arena가 없다면 mmap()을 통해 청크를 할당한다.

_int_malloc(): malloc_init_state()

_int_malloc()을 처음 호출하면 무조건 malloc_consolidate()가 호출된다.

nb가 smallbin에 있다면 여기에서 호출
nb가 largebin에 있다면 여기에서 호출

/* Maximum size of memory handled in fastbins.  */
static INTERNAL_SIZE_T global_max_fast;
#define get_max_fast() global_max_fast

static void malloc_consolidate(mstate av)
{
  if (get_max_fast () != 0) {
    ...
  }
  else {
    malloc_init_state(av);
    check_malloc_state(av);
  }

이때 첫 호출이므로 global_max_fast의 값은 0이다. 따라서 malloc_init_state()를 호출한다.

#ifndef DEFAULT_MXFAST
#define DEFAULT_MXFAST     (64 * SIZE_SZ / 4)
#endif

#define set_max_fast(s)                              \
  global_max_fast = (((s) == 0)                      \
                     ? SMALLBIN_WIDTH : ((s + SIZE_SZ) & ~MALLOC_ALIGN_MASK))

static void
malloc_init_state (mstate av)
{
  int i;
  mbinptr bin;

  /* Establish circular links for normal bins */
  for (i = 1; i < NBINS; ++i)
    {
      bin = bin_at (av, i);
      bin->fd = bin->bk = bin;
    }

#if MORECORE_CONTIGUOUS
  if (av != &main_arena)
#endif
  set_noncontiguous (av);
  if (av == &main_arena)
    set_max_fast (DEFAULT_MXFAST);
  av->flags |= FASTCHUNKS_BIT;

  av->top = initial_top (av);
}

malloc_init_state()는 malloc_struct 구조체의 bins를 초기화한다. 그리고 global_max_fast 값을 128로 설정하고, main_arena일 경우 flag에 FASTCHUNKS_BIT를 활성화하며, top 멤버의 값을 &bin[0]-0x10으로 설정한다.

_int_malloc(): Fastbin

  /*
     If the size qualifies as a fastbin, first check corresponding bin.
     This code is safe to execute even if av is not yet initialized, so we
     can try it without checking, which saves some time on this fast path.
   */

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))

__int_malloc()은 먼저 fastbin에 제공하기 적절한 청크가 있는지 확인한다. 그 전에 nb가 fastbin의 범위에 있는지 확인하는데, 이때 global_max_fast 값을 조회한다.

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz)                                  \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])

idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp = *fb;

nb에 맞는 fastbinsY의 인덱스 값을 가져와 idx에 저장한다. 그리고 fastbinsY[idx]의 값을 pp에 저장한다.

do
{
  victim = pp;
  if (victim == NULL)
    break;
}
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) != victim);

catomic_compare_and_exchange_val_acq (fb, victim->fd, victim): *fb==victim을 검증한 후 *fb=victim->fd를 수행하며, victim을 리턴한다. 즉 fb=victim->(victim->fd) 리스트에서 victim을 빼내어 fb=victim->fd로 만든다. ch4rli3kop

fb 리스트에서 가장 앞에 있는 malloc_chunk 구조체의 주소를 victim에 저장한다.

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))

if (victim != 0)
{
  if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
    {
      errstr = "malloc(): memory corruption (fast)";
    errout:
      malloc_printerr (check_action, errstr, chunk2mem (victim), av);
      return NULL;
    }
  check_remalloced_chunk (av, victim, nb);
  void *p = chunk2mem (victim);
  alloc_perturb (p, bytes);
  return p;
}

앞선 fastbin fb에서 가져온 victim의 사이즈에 따른 인덱스 값이 nb에 따른 인덱스 idx와 같은지 검증한다. 다르면 에러를 유벌한다.
이후 check_remalloced_chunk 매크로가 사용되는데, 이는 do_check_remalloced_chunk(), do_check_inuse_chunk(), do_check_free_chunk()를 호출하며 victim의 flag를 초기화하고, 필드 값들이 올바르게 설정됐는지 검증한다. 이후 victim에 헤더의 크기 0x10을 더한 값을 리턴한다.

_int_malloc(): Smallbin

#define NBINS             128
#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)

/*
   If a small request, check regular bin.  Since these "smallbins"
   hold one size each, no searching within bins is necessary.
   (For a large request, we need to wait until unsorted chunks are
   processed to find best fit. But for small ones, fits are exact
   anyway, so we can check now, which is faster.)
 */

if (in_smallbin_range (nb))
  {

_int_malloc()은 fastbin 다음으로 smallbin을 조회한다. 그 전에 nb가 smallbin의 범위에 해당하는지 확인한다. 참고로 MIN_LARGE_SIZE는 1024로 계산된다.

#define smallbin_index(sz)                                                       \
  ((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))  \
   + SMALLBIN_CORRECTION)

/* addressing -- note that bin_at(0) does not exist */
#define bin_at(m, i)                                                             \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))                              \
             - offsetof (struct malloc_chunk, fd))

typedef struct malloc_chunk *mbinptr;

idx = smallbin_index (nb);
mbinptr bin = bin_at (av, idx);

nb에 해당하는 smallbin의 인덱스를 idx에 저장한 후, &bins[(idx-1)*2]-0x10을 bin에 저장한다.

/* Reminders about list directionality within bins */
#define first(b)     ((b)->fd)
#define last(b)      ((b)->bk)

if ((victim = last (bin)) != bin)
  {
    if (victim == 0) /* initialization check */
      malloc_consolidate (av);

bins[(idx-1)*2 +1] (==last(bin))의 값을 victim에 저장한 후 bin이 아닌지 확인한다. malloc_consolidate()->malloc_init_state()가 호출된 적이 있다면 False가 된다. 첫 호출에서는 해당 조건이 True가 되므로, victim이 0인지 검사한 후 malloc_consolidate()를 호출하게 된다.

만약 bins[(idx-1)*2 +1]의 값이 bin이 아니지만 0도 아니라는 것은 해당 bin에 청크가 연결되어 있다는 의미다.

else
  {
  bck = victim->bk;
  if (__glibc_unlikely (bck->fd != victim))
    {
    errstr = "malloc(): smallbin double linked list corrupted";
    goto errout;
    }

만약 bin[(idx-1)*2 + 1]의 값이 bin도 아니고 0도 아닌 어느 청크의 주소 값이라면, 해당 bin에 청크가 연결된 것으로 간주하고 else 구문을 통과한다. else 블럭 내에서는 victim->bk의 값을 bk에 저장한 후, bk->fd와 victim이 동일한 지를 검증한다. 만약 이 둘이 다르다면 연결 리스트 상태가 corrupted된 것이므로 에러를 유발한다.

#define set_inuse_bit_at_offset(p, s)                          \
  (((mchunkptr) (((char *) (p)) + (s)))->size |= PREV_INUSE)

set_inuse_bit_at_offset (victim, nb);
bin->bk = bck;
bck->fd = bin;

set_inuse_bit_at_offset을 이용해 인접한 다음 청크의 flag에 PREV_INUSE를 활성화한다. 또한 bin--victim--bck를 bin--bck로 연결한다.

if (av != &main_arena)
  victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;

만약 인자로 받은 av가 main_arena가 아니라면, 해당 청크에 NON_MAIN_ARENA flag를 활성화한다. 이후 해당 청크의 설정 값들을 검증한 후(check_malloced_chunk()), victim에 0x10을 더한 값을 리턴한다.

_int_malloc(): Fastbin

idx = largebin_index (nb);

...

/*
   If a large request, scan through the chunks of current bin in
   sorted order to find smallest that fits.  Use the skip list for this.
 */
w
if (!in_smallbin_range (nb))
  {
    bin = bin_at (av, idx);

nb가 smallbin에 해당하지 않는지, 즉 largebin에 해당하는지 검사한다. 그리고 nb에 맞는 bin의 주소를 bin에 저장한다.

/* skip scan if empty or largest chunk is too small */
if ((victim = first (bin)) != bin &&
    (unsigned long) (victim->size) >= (unsigned long) (nb))
  {

먼저 (victim = first (bin)) != bin으로 bin이 비어있는지 확인한다. 그렇지 않다면 (unsigned long) (victim->size) >= (unsigned long) (nb)으로 largebin에서 가장 큰 청크의 크기를 확인한다. 두 조건을 만족하면 if문 내의 코드를 실행한다.

victim = victim->bk_nextsize;

if문을 통과할 당시 victim은 largebin에서 가장 큰 청크 포인터이다. 따라서 victim->bk_nextsize는 largebin에서 가장 작은 청크 포인터를 의미한다.

while (((unsigned long) (size = chunksize (victim)) <
        (unsigned long) (nb)))
  victim = victim->bk_nextsize;

가장 작은 청크부터 탐색하되, nb와 같거나 그보다 큰 청크일 경우 탐색을 중지한다. 즉 best fit 청크를 찾는 것이다.

/* Take a chunk off a bin list */
#define unlink(AV, P, BK, FD) {							\
    FD = P->fd;									\
    BK = P->bk;								      	\
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))			\
      malloc_printerr (check_action, "corrupted double-linked list", P, AV);	\
    else {									\
        FD->bk = BK;								\
        BK->fd = FD;								\
        if (!in_smallbin_range (P->size)					\
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {			\
	    if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)		\
		|| __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))	\
	      malloc_printerr (check_action,					\
			       "corrupted double-linked list (not small)",	\
			       P, AV);						\
            if (FD->fd_nextsize == NULL) {					\
                if (P->fd_nextsize == P)					\
                  FD->fd_nextsize = FD->bk_nextsize = FD;			\
                else {								\
                    FD->fd_nextsize = P->fd_nextsize;				\
                    FD->bk_nextsize = P->bk_nextsize;				\
                    P->fd_nextsize->bk_nextsize = FD;				\
                    P->bk_nextsize->fd_nextsize = FD;				\
                  }								\
              } else {								\
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;			\
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;			\
              }									\
          }									\
      }										\
}

/* Avoid removing the first entry for a size so that the skip
   list does not have to be rerouted.  */
if (victim != last (bin) && victim->size == victim->fd->size)
  victim = victim->fd;

remainder_size = size - nb;
unlink (av, victim, bck, fwd);

best fit으로 찾은 청크가 bin->bk가 아니고(victim->fd에 다른 청크가 있다는 의미(bin--??--victim)), victim->fd 청크의 크기가 victim과 동일하다면, 그 앞의 청크를 victim으로 설정한다.
그리고 victim의 크기에서 nb를 뺀 값을 remainder_size에 저장한 후, unlink 매크로를 이용하여 victim을 largebin에서 떼낸다. 또한 bck, fwd에 각각 victim->bk, victim->fd를 저장한다.

 /* Exhaust */
if (remainder_size < MINSIZE)
{
  set_inuse_bit_at_offset (victim, size);
  if (av != &main_arena)
    victim->size |= NON_MAIN_ARENA;
}

위에서 계산한 remainder_size가 청크의 최소 크기인 MINSIZE보다 작다면, 자르지 않고 victim을 통째로 제공한다. 이때 필요하다면 NON_MAIN_ARENA flag를 활성화한다.

/* The otherwise unindexable 1-bin is used to hold unsorted chunks. */
#define unsorted_chunks(M)          (bin_at (M, 1))
/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr) (((char *) (p)) + (s)))

/* Split */
else
{
  remainder = chunk_at_offset (victim, nb);
  /* We cannot assume the unsorted list is empty and therefore
     have to perform a complete insert here.  */
  bck = unsorted_chunks (av);
  fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
    {
      errstr = "malloc(): corrupted unsorted chunks";
      goto errout;
    }

만약 reaminder_size가 청크의 최소 크기보다 같거나 크다면 split한다. remainder에 victim에서 nb만큼 자르고 남은 청크의 포인터를 저장한다. 그리고 bck에는 unsorted bin을, fwd에는 unsorted bin의 가장 첫 번째 청크의 포인터를 저장한다. 만약 fwd->bk(==bck->fd->bk)와 bck가 다르다면 에러를 유발한다.

remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;
if (!in_smallbin_range (remainder_size))
	{
	remainder->fd_nextsize = NULL;
	remainder->bk_nextsize = NULL;
	}

remainder을 unsorted bin에 연결한다. 그리고 remainder_size가 smallbin에 해당한다면 remainder 청크의 fd_nextsize, bk_nextsize를 NULL로 설정한다.

/* Set size/use field */
#define set_head(p, s)       ((p)->size = (s))
/* Set size at footer (only when chunk is not in use) */
#define set_foot(p, s)       (((mchunkptr) ((char *) (p) + (s)))->prev_size = (s))

set_head (victim, nb | PREV_INUSE |
    (av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}

victim, remainder 청크에 크기와 flag를 설정한다. 그리고 remainder의 다음 인접한 청크의 prev_size를 remainder_size로 설정한다.

remainder 청크의 flag에 PREV_INUSE를 설정한 이유는, 인접한 이전 청크 victim을 제공할 것이기 때문이다.
victim 청크의 flag에 PREV_INUSE를 설정한 이유는 앞 청크가 바로 병합되어, victim 앞 청크가 사용 중이거나 victim 그 자체가 앞 청크이기 때문인 것으로 추측된다. 자세한 것은 분석해봐야 알 수 있을 듯.
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
청크의 설정이 올바른지 확인한 후 victim + 0x10을 리턴한다.