Map의 Key가 두 객체의 조합이라면?

후니팍·2023년 3월 23일

고민 배경

코드를 작성하다 보면 Map을 이용하는 경우가 자주 있습니다. Map의 key가 두 객체의 조합이라면 key를 어떻게 두어야 할지 고민이었습니다. 2중 Map을 이용하는 방법, 두 객체의 toString()을 조합하는 방법 등 여러 방법을 떠올려 보았는데, 어떤 방법이 성능이 좋고 어떤 방법이 이상적일지 정리해보고자 글을 작성하게 되었습니다.

성능 실험

실험 방법

key를 평면의 좌표라고 설정하고 value를 아무 문자열로 설정한 HashMap을 이용하여 실험했습니다. key를 어떻게 설정하는지에 따라 속도가 어떻게 달라지는지 확인해 보았습니다. x와 y 좌표 각각2_000개 총 4_000_000개를 이용해 실험했고, 아래와 같은 4가지 방법을 이용해 실험했습니다.

Map<Integer, Map<Integer, String>> 방식
새로운 객체 Position을 만들어 Map<Position, String> 형태로 저장하는 방식
x와 y의 toString()을 연결한 것을 key로 설정하는 방식
x와 y의 hashCode()를 조합하여 key를 생성하는 방식

1. 이중 Map 방식

public class KeyTest {

    private final Map<Integer, Map<Integer, String>> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            Map<Integer, String> innerMap = new HashMap<>();
            for (int j = 0; j < 2_000; j++) {
                innerMap.put(j, "디투입니다.");
            }
            testMap.put(i, innerMap);
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
    }
}

시간 차이가 가장 적었던 방법입니다. Integer의 연산 속도가 빠른 편이라서, 이 때문에 속도가 빠르게 나온건지 궁금했습니다. 비슷한 방법으로 key를 String으로 두어 실험을 했는데요. 속도 차이는 크지 않았습니다.

public class KeyTest {

    private final Map<String, Map<String, String>> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            Map<String, String> innerMap = new HashMap<>();
            for (int j = 0; j < 2_000; j++) {
                innerMap.put(String.valueOf(j), "디투입니다.");
            }
            testMap.put(String.valueOf(i), innerMap);
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
    }
}

2. 새로운 객체를 만들어 그것을 key로 설정한 방식

public class KeyTest {

    static class Position {
        final int x;
        final int y;

        Position(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(final Object o) {
            if (this == o) return true;
            if (!(o instanceof Position)) return false;
            Position position = (Position) o;
            return x == position.x && y == position.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    private final Map<Position, String> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            for (int j = 0; j < 2_000; j++) {
                testMap.put(new Position(i, j), "디투입니다.");
            }
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
    }
}

시간 차이는 2207로 빠르지 않았습니다. 보통 x와 y를 같으면 같은 좌표로 보기때문에 equals()와 hashCode()를 오버라이드했는데요, 이 오버라이드 때문에 속도가 느려진건지 알아보고자 오버라이드를 제거한 상태에서도 실험해보았습니다.

public class KeyTest {

    static class Position {
        final int x;
        final int y;

        Position(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    private final Map<Position, String> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            for (int j = 0; j < 2_000; j++) {
                testMap.put(new Position(i, j), "디투입니다.");
            }
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
    }
}

시간 차이가 줄긴 했지만, 이중 Map정도로 시간이 줄지는 않았습니다.

3. 두 객체의 `toString()` 메소드를 결합하여 새로운 문자열을 생성하는 방식

public class KeyTest {

    private final Map<String, String> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            for (int j = 0; j < 2_000; j++) {
                testMap.put(i + " " + j, "디투입니다.");
            }
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
    }
}

String의 연결 연산이 느린 연산으로 알고 있었는데, 객체를 생성하는 것보다 속도가 빨랐습니다. 이중 Map을 이용한 것보다는 느렸지만 좋은 성능을 보이고 있었습니다.

4. 두 객체의 hashCode를 연산하는 방식

public class KeyTest {

    private final Map<Integer, String> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            for (int j = 0; j < 2_000; j++) {
                testMap.put((String.format("%04d", i).hashCode() << 16) + String.format("%04d", j).hashCode(), "디투입니다.");
            }
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
        System.out.println(testMap.size());
    }
}

String으로 바꾸어 hashcode를 구한 후, 비트 연산으로 조합해주었습니다. 연산이 많다보니 오랜 시간이 걸렸습니다. 또, Integer의 hashcode를 사용하지 않고 굳이 String으로 바꾸어 hashcode를 생성했는데요. Integer의 hashcode는 숫자값 그 자체가 되기 때문에 hashcode다운 hashcode를 조합하기 위함이었습니다. 만약 Integer의 조합이 아니라 String이나 객체의 조합이었다면 hashcode다운 hashcode가 나오기 때문에 쓸 수 있겠죠?

Integer에서는 hashcode를 사용하지 않는 것이 더 빠른 연산을 만들어준다고 생각해 Integer 자체의 bit를 옮기고 조합하여 key를 만들었습니다. bit를 11칸 옮겨 key의 무결성을 보장해주었습니다. (숫자가 커지면 커질수록 더 많이 옮겨야 합니다. 보통 max로 잡고 16칸 옮기죠.)

public class KeyTest {

    private final Map<Integer, String> testMap = new HashMap<>();

    @Test
    void test() {
        long beforeTime = System.currentTimeMillis();
        for (int i = 0; i < 2_000; i++) {
            for (int j = 0; j < 2_000; j++) {
                testMap.put((i << 11) + j, "디투입니다.");
            }
        }
        long afterTime = System.currentTimeMillis();
        long diffTime = afterTime - beforeTime;
        System.out.println("시간 차이: " + diffTime);
        System.out.println("Map 사이즈: " + testMap.size());
    }
}

위의 코드에서 bit를 맞추지 못하고 testMap.put((i << 10) + j, "디투입니다.");처럼 비트를 겹차게 하거나 연산을 잘못하게 되면 key의 무결성을 보장하지 못합니다.

또 hashcode의 조합 방식에 따라 속도 차이가 나게 됩니다. 개발자는 가장 빠른 연산을 찾는데 시간이 걸리게 되겠죠. 최고의 연산을 찾으면 좋지만 개발 시간이 중요한 경우도 있기 때문에 bit를 옮기는 방법이나 XOR 연산 방법을 많이 사용하는 것 같습니다.
하지만 hashcode를 이용한 모든 방법 다 key가 많아지게 되면 중복이 생길 가능성이 높아집니다. 이런 경우 hashcode를 이용하면 안될 것 같습니다.

결과

방식	속도	무결성 보장
이중 Map	216ms	O
key를 위한 객체 생성	1638ms	O
`toString()` 연결 연산	692ms	O
`hashCode` 연산	개발자 역량에 따라 다름	X

결론

일반적인 Map을 만드는 경우라면 속도보다 가독성을 중요시하기 때문에 Position과 같은 새로운 객체를 생성하여 key로 두고 있다고 생각합니다. 저도 이 부분은 동의하고 있습니다.

하지만 캐싱을 이용하고 있는 경우에는 어떻게 해야할까요? 아래와 같이 말이죠.

    private static final int TOTAL_SQUARE_COUNT = 8 * 8;
    private static final Map<SquareKey, Square> cache = new HashMap<>(TOTAL_SQUARE_COUNT);

    private final File file;
    private final Rank rank;

    private Square(final File file, final Rank rank) {
        this.file = file;
        this.rank = rank;
    }

    public static Square of(final File file, final Rank rank) {
        final SquareKey key = new SquareKey(file, rank);
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        final Square square = new Square(file, rank);
        cache.put(key, square);
        return square;
    }

속도의 이점을 위해 캐싱을 사용하는데 굳이 새로운 객체를 생성해야 할까요? 아래의 코드와 같이 toString()을 이용하여 캐싱하는 것이 속도면에서 더 이점이지 않을까요?

    private static final int TOTAL_SQUARE_COUNT = 8 * 8;
    private static final Map<SquareKey, Square> cache = new HashMap<>(TOTAL_SQUARE_COUNT);
    
    private final File file;
    private final Rank rank;

    private Square(File file, Rank rank) {
        this.file = file;
        this.rank = rank;
    }

    public static Square of(final File file, final Rank rank) {
        final String cardKey = String.valueOf(List.of(file.getValue(), rank.getValue()));
        if (cache.containsKey(cardKey)) {
            return cache.get(cardKey);
        }
        Square square = new Square(file, rank);
        cache.put(cardKey, square);
        return square;
    }

이번에 글을 쓰게된 이유도 캐싱때문인데요. 어떤 것이 가장 좋다! 라고 말하지는 못할 것 같습니다. 속도의 이점을 들고가기 위해서 2중 Map을 쓰는 것이 가장 좋아보이지만 코드가 깔끔해지기 위해서 지양해야하는 부분이죠. 검색을 할 때 불편하기도 하고요.
저는 속도도 어느정도 보장해주고 가독성도 크게 해치지 않는 toString()을 사용했는데요. 체스 게임에서는 이 방법이 가장 적합하다고 판단해 toString()을 이용하여 key를 정하는 방식을 사용했습니다.