Second Goal Achieved

NamHyun Gu · October 15, 2020

Lexer Side Project compiler parser

dot-mds Development Log


In the previous post, "First Goal Achieved", I implemented a Lexer, a Parser, and a Compiler, and managed to bind values from JSON data into an mdx file.

The source code is available on GitHub.

How It Works

Since then I have finished the feature I set as my next goal, including content from another file. The finished feature works as follows.

// hello.mdx
${include "header"}
World

// header.mdx
# Hello

> dot-mdx hello.mdx output.mdx

// output.mdx
# Hello
World

The Lexer turns hello.mdx into the following tokens.

[
	<Token>{ type: TokenType.L_Block, line: 1 },
	<Token>{ type: TokenType.Keyword, value: "include", line: 1 },
	<Token>{ type: TokenType.Literal, value: "header", line: 1 },
	<Token>{ type: TokenType.R_Block, line: 1 },
	<Token>{ type: TokenType.NewLine, line: 1 },
	<Token>{ type: TokenType.Text, value: "World", line: 1 },
	<Token>{ type: TokenType.EOS },
]

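This feature adds two new token types, Keyword and Literal.
A Keyword token currently matches only the include keyword.
A Literal token is produced only when the value is wrapped in double quotes on both sides, as in "header".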
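As a rough illustration of the Keyword and Literal rules, the sketch below tokenizes only the text between ${ and }. The token shapes follow the listing above, but lexBlockBody and KEYWORDS are hypothetical names, and this is not dot-mdx's actual Lexer.

```typescript
// Hypothetical token shapes modeled on the token list above.
const TokenType = {
  Keyword: "Keyword",
  Identifier: "Identifier",
  Literal: "Literal",
} as const;
type TokenType = (typeof TokenType)[keyof typeof TokenType];

interface Token {
  type: TokenType;
  value: string;
  line: number;
}

// `include` is the only keyword so far.
const KEYWORDS = new Set(["include"]);

// Tokenizes the text inside a block, e.g. `include "header"`.
function lexBlockBody(body: string, line: number): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < body.length) {
    const ch = body[i];
    if (/\s/.test(ch)) {
      i++; // skip whitespace between words
      continue;
    }
    if (ch === '"') {
      // A Literal must be closed by a second double quote.
      const end = body.indexOf('"', i + 1);
      if (end === -1) {
        throw new Error(`Unterminated literal (line: ${line})`);
      }
      tokens.push({ type: TokenType.Literal, value: body.slice(i + 1, end), line });
      i = end + 1;
      continue;
    }
    // Otherwise read a bare word: a Keyword if listed, else an Identifier.
    let j = i;
    while (j < body.length && !/[\s"]/.test(body[j])) j++;
    const word = body.slice(i, j);
    tokens.push({
      type: KEYWORDS.has(word) ? TokenType.Keyword : TokenType.Identifier,
      value: word,
      line,
    });
    i = j;
  }
  return tokens;
}
```

For example, lexBlockBody('include "header"', 1) yields a Keyword token with value include followed by a Literal token with value header. Every branch of the loop consumes at least one character, so input such as a bare newline cannot stall it.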

The Parser then builds the following AST from the token list produced by the Lexer.

Document([
	BlockStatement([
		IncludeDeclaration(
			Literal("header")
		)
	]),
	LineSeparator(),
	DocumentText("World")
])

Previously, a Block could contain only an Identifier, so a Block holding an Identifier was converted into an IdentifierDeclaration.
With support for the include keyword, BlockStatement and IncludeDeclaration were added, and the old IdentifierDeclaration was renamed to Identifier.
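To make the tree above concrete, here is a sketch of the node classes as plain TypeScript classes. The post only shows node names, so the fields (value, name, source, body, children) and the toString format are assumptions. The base class is called AstNode and the root MdxDocument here only to avoid clashing with the DOM's global Node and Document types.

```typescript
// Base class for every AST node (hypothetical name).
class AstNode {}

class Literal extends AstNode {
  readonly value: string;
  constructor(value: string) { super(); this.value = value; }
  toString(): string { return `Literal(${JSON.stringify(this.value)})`; }
}

// Formerly IdentifierDeclaration.
class Identifier extends AstNode {
  readonly name: string;
  constructor(name: string) { super(); this.name = name; }
  toString(): string { return `Identifier(${this.name})`; }
}

// ${include "header"} -> IncludeDeclaration(Literal("header"))
class IncludeDeclaration extends AstNode {
  readonly source: Literal;
  constructor(source: Literal) { super(); this.source = source; }
  toString(): string { return `IncludeDeclaration(${this.source})`; }
}

// Wraps whatever was parsed between ${ and }.
class BlockStatement extends AstNode {
  readonly body: AstNode[];
  constructor(body: AstNode[]) { super(); this.body = body; }
  toString(): string { return `BlockStatement([${this.body.join(", ")}])`; }
}

class LineSeparator extends AstNode {
  toString(): string { return "LineSeparator()"; }
}

class DocumentText extends AstNode {
  readonly text: string;
  constructor(text: string) { super(); this.text = text; }
  toString(): string { return `DocumentText(${JSON.stringify(this.text)})`; }
}

// Root node; printed as Document to match the notation in the post.
class MdxDocument extends AstNode {
  readonly children: AstNode[];
  constructor(children: AstNode[]) { super(); this.children = children; }
  toString(): string { return `Document([${this.children.join(", ")}])`; }
}

// The AST for hello.mdx.
const tree = new MdxDocument([
  new BlockStatement([new IncludeDeclaration(new Literal("header"))]),
  new LineSeparator(),
  new DocumentText("World"),
]);
// tree.toString() gives:
// Document([BlockStatement([IncludeDeclaration(Literal("header"))]), LineSeparator(), DocumentText("World")])
```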

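Now that BlockStatement exists, I changed the parser so that a block and its contents are turned into nodes by the functions below.
While implementing this feature, I also made the parser raise a parse error when a block is not properly closed.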

private parseBlock(): Node | undefined {
  const blockToken = this.tokenStream.peek(); // the L_Block token
  this.tokenStream.advance();

  // Collect every token up to the matching R_Block.
  const blockBody: Array<Token> = [];
  while (this.tokenStream.peek().type !== TokenType.R_Block) {
    const token = this.tokenStream.peek();
    if (token.type === TokenType.EOS) {
      // Reached the end of input without a closing brace.
      throw new Error(
        `Unexpected end of input, expected '}' to close the block (line: ${blockToken.line})`
      );
    }
    blockBody.push(token);
    this.tokenStream.advance();
  }
  this.tokenStream.advance(); // consume R_Block

  // Parse the block contents with a separate stream.
  const body: Array<Node> = [];
  if (blockBody.length > 0) {
    const bodyStream = new TokenStream(blockBody);
    const expr = this.parseInBlockExpr(bodyStream);
    if (expr) {
      body.push(expr);
    }
  }
  return new BlockStatement(body);
}

private parseInBlockExpr(tokenStream: TokenStream): Node | undefined {
  switch (tokenStream.peek().type) {
    case TokenType.Keyword:
      return this.parseKeyword(tokenStream);
    case TokenType.Identifier:
      return this.parseIdentifier(tokenStream);
    case TokenType.Literal:
      return this.parseLiteral(tokenStream);
  }
  // Any other token inside a block is ignored for now.
  return undefined;
}
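parseBlock relies on a TokenStream with peek() and advance(), which the post does not show. Below is a minimal, self-contained sketch of such a stream together with the block-collecting loop from parseBlock. TokenStream's internals and the collectBlockBody name are assumptions, not dot-mdx's code.

```typescript
// Minimal token types for this sketch (hypothetical; modeled on the post).
const TokenType = {
  L_Block: "L_Block",
  R_Block: "R_Block",
  Keyword: "Keyword",
  Literal: "Literal",
  EOS: "EOS",
} as const;
type TokenType = (typeof TokenType)[keyof typeof TokenType];

interface Token {
  type: TokenType;
  value?: string;
  line?: number;
}

// A hypothetical TokenStream with the peek/advance interface used above.
class TokenStream {
  private readonly tokens: Token[];
  private pos = 0;

  constructor(tokens: Token[]) {
    // Make sure the stream ends with EOS so peek() never runs off the end.
    const last = tokens[tokens.length - 1];
    this.tokens =
      last && last.type === TokenType.EOS ? tokens : [...tokens, { type: TokenType.EOS }];
  }

  peek(): Token {
    return this.tokens[this.pos];
  }

  advance(): void {
    // Stay on EOS instead of moving past the end.
    if (this.pos < this.tokens.length - 1) this.pos++;
  }
}

// Same loop as parseBlock: gather tokens until R_Block, fail on EOS.
function collectBlockBody(stream: TokenStream): Token[] {
  const open = stream.peek(); // the L_Block token
  stream.advance();
  const body: Token[] = [];
  while (stream.peek().type !== TokenType.R_Block) {
    const token = stream.peek();
    if (token.type === TokenType.EOS) {
      throw new Error(`Unclosed block (line: ${open.line})`);
    }
    body.push(token);
    stream.advance();
  }
  stream.advance(); // consume R_Block
  return body;
}
```

One design choice worth noting: the constructor appends an EOS token when one is missing. parseBlock builds new TokenStream(blockBody) from the block contents, which contain no EOS, so a stream like this keeps peek() from returning undefined if a parse function reads past the last token.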

Finally, the Compiler handles an IncludeDeclaration as follows.

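Because this feature requires the compiler to access files, I took the simple route of adding a ServiceLocator and fetching a SourceLoader from it.
The SourceLoader holds the directory path of the file where compilation started, and reads a file by appending the value of the Literal inside the IncludeDeclaration to that directory path.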
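The post does not show ServiceLocator or SourceLoader, so the following is only a guess at their shape under Node.js. Assumptions: the registry is a Map keyed by strings such as "source-loader", the loader stores the entry file's directory, and an include name maps to a file named <name>.mdx in that directory. The post's example pairs include "header" with header.mdx, but the exact path rule is not stated.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical SourceLoader: remembers the directory of the entry file
// and resolves include names against it.
class SourceLoader {
  private readonly baseDir: string;

  constructor(entryFile: string) {
    this.baseDir = path.dirname(path.resolve(entryFile));
  }

  // Assumption: `include "header"` maps to `<baseDir>/header.mdx`.
  load(name: string): string {
    const file = path.join(this.baseDir, `${name}.mdx`);
    return fs.readFileSync(file, "utf8");
  }
}

// A bare-bones service locator: a string-keyed registry of shared objects.
class ServiceLocator {
  private readonly services = new Map<string, unknown>();

  register(key: string, service: unknown): void {
    this.services.set(key, service);
  }

  // Returns undefined when nothing is registered under `key`.
  get<T>(key: string): T | undefined {
    return this.services.get(key) as T | undefined;
  }
}

// Hypothetical module-level instance, like the `services` used by the compiler.
const services = new ServiceLocator();
```

Usage would look like services.register("source-loader", new SourceLoader("hello.mdx")) before compiling, after which the compiler can fetch the loader by key. A service locator keeps the compiler's signature unchanged, at the cost of a hidden global dependency.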

After reading the file, it runs the compile function on the file's contents and appends the result to the current compiler's output.

private transformIncludeDeclaration(node: IncludeDeclaration) {
  const sourceLoader: SourceLoader = services.get("source-loader");
  if (sourceLoader) {
    const source = node.source.value; // value of the Literal node, e.g. "header"
    // Compile the included file with the same data, then append its output.
    const result = compile(
      sourceLoader.load(source),
      JSON.stringify(this.data)
    );
    this.append(result);
  }
}
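Because transformIncludeDeclaration calls compile again on the included source with the same data, includes nest naturally. The sketch below reproduces that recursion in memory, skipping the Lexer and Parser by working on already-parsed nodes; MdxNode, Loader, and compileNodes are hypothetical names. It also adds a guard against circular includes, which the post does not mention: judging from the code shown, two files that include each other would otherwise recurse until the stack overflows.

```typescript
// Hypothetical, already-parsed node shapes (Lexer and Parser are skipped).
type MdxNode =
  | { kind: "Include"; source: string }
  | { kind: "Identifier"; name: string }
  | { kind: "Text"; text: string }
  | { kind: "LineSeparator" };

// Stands in for SourceLoader.load followed by parsing.
type Loader = (name: string) => MdxNode[];

// Compiles nodes to text. `stack` holds the include names currently being
// compiled; seed it with the entry file's name to catch self-includes early.
function compileNodes(
  nodes: MdxNode[],
  data: Record<string, string>,
  load: Loader,
  stack: string[] = []
): string {
  let out = "";
  for (const node of nodes) {
    switch (node.kind) {
      case "Text":
        out += node.text;
        break;
      case "LineSeparator":
        out += "\n";
        break;
      case "Identifier":
        // Bind a value from the data; an empty string when missing (assumption).
        out += data[node.name] ?? "";
        break;
      case "Include":
        // Extra guard not in the post: refuse to re-enter a file already on the stack.
        if (stack.includes(node.source)) {
          throw new Error(`Circular include: ${[...stack, node.source].join(" -> ")}`);
        }
        // Same idea as transformIncludeDeclaration: compile the included
        // file with the same data and append the result.
        out += compileNodes(load(node.source), data, load, [...stack, node.source]);
        break;
    }
  }
  return out;
}
```

With hello.mdx represented as an Include of "header", a LineSeparator, and the Text "World", and header.mdx as the Text "# Hello", compileNodes returns "# Hello\nWorld", the same as output.mdx in the example at the top of the post.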

Next Goals

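Writing the Lexer and Parser myself taught me a lot, including how to work with regular expressions.
But there were problems: the Lexer could not handle a line containing only '\n' and fell into an infinite loop, and the code seems to get harder to read as more rules are added.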

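There are several libraries that help with parsing by letting you write a grammar, such as the context-free grammars (CFGs) I learned about in my programming languages course. My goal is to pick one of them and rework the code so it is easier to read.
Also, when implementing this feature I hastily used a ServiceLocator to load the source files, so I want to think about whether there is a better approach.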
