[NeurIPS '21] Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation

minha · February 7, 2022

Drug discovery


1. Introduction

1-1. Problems with existing drug design and the novelty of their method

Existing methods generate unrealistic, inappropriate structures (a drug-likeness problem). ==> Build molecules as a combination of appropriate 'fragments' (chemically realistic & pharmacochemically acceptable).
Existing methods do not guarantee therapeutic potential. ==> Instead of cLogP or QED, use the more straightforward docking simulation score as the optimization objective. (BUT) this makes optimization harder. (1)
Existing methods only generate molecules similar to those in the training dataset and cannot generate truly different de novo molecules. (2)

==> For reasons (1) and (2), 'better exploration' for RL agents is needed.
prioritized experience replay (PER): 'sampling experiences that can give more information to the RL agent and thus have more surprisal - defined by the temporal-difference(TD) error - in higher probability'
In this method, priority = sample novelty.
The idea is to give higher priority to paths that have not been visited yet. How, then, is an unvisited path identified? An experience with a large predictive error (PER(PE)) or a large uncertainty in the auxiliary reward (PER(BU)) is treated as more novel, i.e., less visited.
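The sampling idea above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `alpha` exponent (borrowed from the standard PER formulation), and the toy novelty scores are assumptions, not the paper's implementation. In the paper the novelty score would come from a predictive error (PER(PE)) or a reward uncertainty (PER(BU)); here it is just a given number per experience.

```python
import random

def sampling_probabilities(novelty, alpha=1.0, eps=1e-6):
    """Turn per-experience novelty scores into sampling probabilities.

    alpha and eps are hypothetical knobs: alpha sharpens or flattens the
    distribution, eps keeps zero-novelty experiences sampleable.
    """
    priorities = [(n + eps) ** alpha for n in novelty]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_batch(buffer, novelty, batch_size, rng=None):
    """Draw a replay batch where more novel experiences are picked more often."""
    rng = rng or random.Random(0)
    probs = sampling_probabilities(novelty)
    return rng.choices(buffer, weights=probs, k=batch_size)

# Toy replay buffer: the third experience is the least visited (most novel).
buffer = ["exp_a", "exp_b", "exp_c"]
novelty = [0.1, 0.1, 0.8]

probs = sampling_probabilities(novelty)
batch = sample_batch(buffer, novelty, batch_size=4)
```

With these toy scores, `exp_c` receives the largest sampling probability, so it appears in RL updates more often than the two familiar experiences.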

SMILES-based generation (X) -> cannot be used for scaffold-based tasks (?) VS atom-based generation (X) -> unrealistic generated molecules VS motif-based generation (O)
-> Difference from the method of Maziarz et al. -> motif-adding action + 'connectivity-preserving fragmentation' (?) (CONTRADICTS WITH 'our method is free to explore various connectivity')
VAE-based generative method (X) -> low diversity of generated molecules VS RL-based generative method (O) -> more explorative power
FOR EFFICIENT EXPLORATION IN RL
curiosity (X): based on the difference between the true and the estimated reward of the current state -> sometimes fails on complex problems VS sampling experiences (O) -> PER

3. Methods

Fragmentation

Any arbitrary molecule in the dataset is split into connectivity-preserving fragments.

GNN embedding

Parent molecule: every atom and every attachment site is a node. The node type of each attachment site is explicitly set to 'attachment site', and the indices of all attachment sites are stored.
=> undirected graph G with a node feature matrix, an adjacency matrix, and an attachment site masking vector
Fragment: embedded as a graph in the same way as the parent molecule.
☞ 1) A GCN-based model first produces the node embedding H / 2) the node embeddings are sum-aggregated to produce the graph embedding $h_{g}$
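The two embedding steps can be sketched as follows. This is a simplified, assumption-laden sketch: it uses a single un-normalized graph-convolution step ReLU((A + I) X W), a toy three-node graph, and an identity weight matrix, none of which is taken from the paper. The helper names (`gcn_layer`, `sum_readout`) are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One simplified graph-convolution step: ReLU((A + I) X W).

    Degree normalization is omitted to keep the sketch short.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    h = matmul(matmul(a_hat, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

def sum_readout(node_emb):
    """Sum-aggregate node embeddings into one graph embedding h_g."""
    return [sum(col) for col in zip(*node_emb)]

# Toy graph: node 0 = C, node 1 = O, node 2 = attachment site bonded to C.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
feats = [[1.0, 0.0, 0.0],    # one-hot node type: C
         [0.0, 1.0, 0.0],    # one-hot node type: O
         [0.0, 0.0, 1.0]]    # one-hot node type: attachment site
weight = [[1.0, 0.0, 0.0],   # hypothetical weights (identity for readability)
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
att_mask = [0, 0, 1]         # attachment site masking vector

H = gcn_layer(adj, feats, weight)                      # node embeddings
h_g = sum_readout(H)                                   # graph embedding
h_att = [row for row, m in zip(H, att_mask) if m == 1] # attachment-site embeddings
```

The masking vector is what lets the later policy networks pick out $h_{att}$ from H without re-deriving which nodes are attachment sites.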

Generation

Action 1: Which attachment site of the parent molecule should the fragment be attached to? ==> The multiplicative interaction (MI) of the parent molecule's graph embedding ($h_{g}$) and the node embeddings of the parent molecule's attachment sites ($h_{att}$) is used as the input to policy network 1.
Action 2: Which fragment should be attached? ==> The MI of the MI by-product (?) from Action 1 and the ECFP (Extended Connectivity Fingerprint) of the candidate fragments' graph embedding ($h_{g_{cand}}$) is used as the input to policy network 2.
Action 3: Which attachment site of the fragment should be used for the connection? ==> The MI of the MI by-product (?) from Action 2 and the node embeddings ($U_{cand}$) of the fragment selected in Action 2 is used as the input to policy network 3.
(
Extended-Connectivity Fingerprints (ECFPs) are circular topological fingerprints designed for molecular characterization, similarity searching, and structure-activity modeling
Source: https://docs.chemaxon.com/display/docs/extended-connectivity-fingerprint-ecfp.md
)
policy network: 3 FC layers with ReLU + softmax -> output: the probability of each action
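A policy head of this shape might look like the sketch below, shown for Action 1 only. Several things here are assumptions rather than details from the paper: the element-wise product is used as a simple stand-in for the MI layer (the paper's actual MI form may differ), the layer sizes and random initialization are arbitrary, and ReLU is applied after the first two layers only. All function names are hypothetical.

```python
import math
import random

def linear(x, w, b):
    """Fully connected layer: w has shape [n_in][n_out]."""
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def init_layer(n_in, n_out, rng):
    """Hypothetical random initialization, for illustration only."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out

def policy_probs(candidates, layers):
    """Score each candidate with a 3-layer MLP, then softmax over candidates."""
    logits = []
    for x in candidates:
        h = x
        for i, (w, b) in enumerate(layers):
            h = linear(h, w, b)
            if i < len(layers) - 1:
                h = relu(h)
        logits.append(h[0])
    return softmax(logits)

rng = random.Random(0)
layers = [init_layer(3, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]

h_g = [3.0, 2.0, 2.0]                          # toy graph embedding
h_att = [[1.0, 0.0, 1.0], [0.5, 1.0, 0.0]]     # two toy attachment sites
# Stand-in for MI: element-wise product of h_g with each site embedding.
mi_inputs = [[g * a for g, a in zip(h_g, site)] for site in h_att]

probs = policy_probs(mi_inputs, layers)        # one probability per site
```

Policy networks 2 and 3 would follow the same pattern, with candidate fragments and the selected fragment's attachment sites as the candidate sets.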

Optimization objective - Explorative RL

Because optimizing the docking score is harder, the Soft Actor-Critic (SAC) RL algorithm, which has greater explorative power, is used.
To encourage exploration, novel experiences are sampled during RL updates. ==> PER
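For reference, the explorative character of SAC comes from its maximum-entropy objective. The form below is the general SAC objective from the RL literature, not a formula copied from this paper; $\alpha$ (temperature) and $\mathcal{H}$ (policy entropy) are the usual symbols and are not defined in the notes above.

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
$$

The entropy term rewards the policy for keeping its action distribution spread out, which is presumably why it pairs well with a hard-to-optimize reward such as a docking score.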
