[Research] Building a PDB Viewer - ①

양현지 · May 6, 2023

REST API python

1-1 Extracting PDB IDs and building a local DB

1. PDB ID list

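: Wrote a Python program that fetches every current PDB ID (from the wwPDB index file) and queries the PDBe REST API for each one to extract the list of protein-peptide complexes.

Initial version
: Without a thread pool, the script wrote to and flushed the output text file each time a small polypeptide was detected. Fetching and processing the list of roughly 400,000 PDB IDs took a long time (about one PDB ID per second), and the run frequently terminated abnormally.

Revised version
: With a thread pool, throughput rose to roughly 400 PDB IDs per second, so the output.txt file could be generated for all 400,000 PDB IDs within a day.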
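As a rough sanity check on these figures, the arithmetic can be worked out directly. The sketch below only uses the round numbers quoted above (about 400,000 IDs, about 1 ID per second before and about 400 IDs per second after); the function name is illustrative.

```python
# Back-of-the-envelope runtime estimate using the round figures quoted in the text.
TOTAL_IDS = 400_000  # approximate number of PDB IDs


def estimated_seconds(total_ids: int, ids_per_second: float) -> float:
    """Return the wall-clock time needed at a constant processing rate."""
    return total_ids / ids_per_second


sequential = estimated_seconds(TOTAL_IDS, 1)   # initial version, no thread pool
pooled = estimated_seconds(TOTAL_IDS, 400)     # revised version, thread pool

print(f"sequential: {sequential / 86_400:.1f} days")
print(f"thread pool: {pooled / 60:.1f} minutes")
```

At a constant rate, the sequential run would need several days, which is consistent with the abnormal terminations becoming a practical problem. The measured rate most likely fluctuated during the run, so these numbers are only an order-of-magnitude guide.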

```python
# STEP 1: fetch the list of all current PDB IDs
# STEP 2: for each PDB ID, read its JSON record and keep entries that meet the criteria
# STEP 3: print and save the matching entries (small_polypeptide_pdbs.txt)
import concurrent.futures
import threading
import urllib.request

import requests

OUTPUT_PATH = 'small_polypeptide_pdbs.txt'
write_lock = threading.Lock()  # serializes output from the worker threads

# STEP 1: read the PDB ID list
pdb_ids = []
url = 'ftp://ftp.wwpdb.org/pub/pdb/derived_data/index/entries.idx'
save_path = 'pdb_ids.txt'
urllib.request.urlretrieve(url, save_path)
with open(save_path) as f:
    for i, line in enumerate(f):
        if i >= 3:  # skip the leading lines of the index file
            pdb_ids.append(line[:4])


# STEP 2: keep PDB IDs with two or more polypeptide(L) entities,
# at least one of which has length <= 50
def get_pdb_info(pdb_id):
    pdb_id = pdb_id.lower()
    url = f'https://www.ebi.ac.uk/pdbe/api/pdb/entry/molecules/{pdb_id}'
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        pdb_info = response.json()[pdb_id]
    except (requests.RequestException, ValueError, KeyError) as exc:
        # A failed request should not take down the whole run
        print(f"PDB ID: {pdb_id}, skipped ({exc})")
        return
    # (entity_id, length) for every polypeptide(L) entity in the entry
    entries = [(m['entity_id'], m['length']) for m in pdb_info
               if m['molecule_type'] == 'polypeptide(L)']
    lengths = [length for _, length in entries]
    if len(entries) < 2 or min(lengths) > 50:
        return
    # STEP 3: report the hit and append it to the output file
    record = (f"{pdb_id}\t{len(entries)}\t{max(lengths)}\t{min(lengths)}\t"
              + ''.join(f"{entity_id}:{length}, " for entity_id, length in entries))
    with write_lock:
        print(f"PDB ID: {pdb_id}, Polypeptide Entry Count: {len(entries)}, "
              f"Max Length: {max(lengths)}, Min Length: {min(lengths)}")
        with open(OUTPUT_PATH, 'a') as f:
            f.write(record + '\n')


with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(get_pdb_info, pdb_id) for pdb_id in pdb_ids]
```
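The selection rule in STEP 2 can also be isolated into a pure function, which makes it possible to check the criteria without any network access. This is a minimal sketch, not the author's code: `summarize_polypeptides` and the `sample` payload are hypothetical, and the payload only mimics the three fields the script reads (`molecule_type`, `entity_id`, `length`).

```python
from typing import Optional


def summarize_polypeptides(molecules: list, max_peptide_length: int = 50) -> Optional[dict]:
    """Return a summary if the entry looks like a protein-peptide complex, else None.

    An entry qualifies when it has at least two polypeptide(L) entities and
    at least one of them is no longer than max_peptide_length residues.
    """
    chains = [
        (m["entity_id"], m["length"])
        for m in molecules
        if m.get("molecule_type") == "polypeptide(L)"
    ]
    lengths = [length for _, length in chains]
    if len(chains) < 2 or min(lengths) > max_peptide_length:
        return None
    return {
        "count": len(chains),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "entities": chains,
    }


# Hypothetical payload shaped like the fields the script reads; not real PDBe data.
sample = [
    {"entity_id": 1, "molecule_type": "polypeptide(L)", "length": 214},
    {"entity_id": 2, "molecule_type": "polypeptide(L)", "length": 10},
    {"entity_id": 3, "molecule_type": "water"},
]
print(summarize_polypeptides(sample))
```

Keeping the filter separate from the HTTP call means the threshold of 50 residues can be changed or tested in one place.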

2. Installing BLAST Search

(blastn: 2.13.0+)

: Installed BLAST in order to run BLAST searches on the PDB IDs.
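One way to confirm the installation from Python is to look the tools up on PATH and ask them for their version. This is a small sketch; the helper name `blast_version` is made up for illustration, and it assumes the BLAST+ command-line tools (`blastp`, `makeblastdb`) are the ones that were installed.

```python
import shutil
import subprocess
from typing import Optional


def blast_version(tool: str = "blastp") -> Optional[str]:
    """Return the first line of `<tool> -version`, or None if the tool is not on PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "-version"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0] if result.stdout else None


for tool in ("blastp", "makeblastdb"):
    print(tool, "->", blast_version(tool) or "not found on PATH")
```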

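3. PDB to FASTA conversion

Converted each pdb file in the peptide directory and each pdb file in the protein directory to a fasta file.
< pdb => fasta >

Previous approach: pdb_tofasta on its own
(contents of 1MCJ.P.fasta)

>|P
H
=> the pdb id is missing from the fasta file (only the chain id and the amino acid information are output)

Revised approach: wrote and ran a Python script
(contents of 5G46.C.fasta)

>5G46|C
KILHRLLQDS
=> includes pdb id | chain id and the amino acid sequence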

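```python
import os
import subprocess

# Collect every pdb file in the current directory (expected name: <pdb_id>.<chain>.pdb)
pdb_files = [f for f in os.listdir('.') if f.endswith('.pdb')]
for pdb_file in pdb_files:
    # Extract the pdb id from the file name
    pdb_id = pdb_file.split('.')[0]
    # Extract the chain id from the file name
    chain = pdb_file.split('.')[1]
    # Build the fasta file name
    fasta_file = f"{pdb_id}.{chain}.fasta"
    # Build the fasta header
    header = f">{pdb_id}|{chain}"
    # Convert the pdb file with pdb_tofasta and drop its own header lines
    # (same effect as piping through sed '/^>/d')
    result = subprocess.run(['pdb_tofasta', '-multi', pdb_file],
                            capture_output=True, text=True, check=True)
    sequence_lines = [line for line in result.stdout.splitlines()
                      if not line.startswith('>')]
    # Write the header first, then the sequence; 'w' avoids duplicated
    # content when the script is run more than once
    with open(fasta_file, 'w') as f:
        f.write(header + '\n' + '\n'.join(sequence_lines) + '\n')
```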
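For comparison, the same header format can be produced without calling pdb_tofasta at all, by reading the ATOM records directly. This is a different technique from the script above and only a sketch: it assumes standard fixed-column PDB ATOM records, ignores HETATM records, and maps any residue name outside the 20 standard amino acids to `X`. The function name `pdb_to_fasta` is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_fasta(pdb_text: str, pdb_id: str, chain: str) -> str:
    """Build a one-record FASTA string from the ATOM records of one chain."""
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        # Only fixed-column ATOM records long enough to hold the residue fields
        if line[:6] != "ATOM  " or len(line) < 27:
            continue
        if line[21] != chain:  # column 22: chain identifier
            continue
        key = line[22:27]  # columns 23-27: residue number plus insertion code
        if key in seen:  # one letter per residue, not per atom
            continue
        seen.add(key)
        residues.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return f">{pdb_id}|{chain}\n{''.join(residues)}\n"
```

With the file naming used here, a call such as `pdb_to_fasta(open('5G46.C.pdb').read(), '5G46', 'C')` would return the header line followed by the sequence of chain C.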

(Figure: example of a converted fasta file)

The converted fasta files are then merged into a single fasta file (merged.fasta).

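for file in *.fasta; do
    # skip the output file itself when the loop is run again
    [ "$file" = "merged.fasta" ] && continue
    pdbid=$(basename "$file" .fasta)
    sed "s/^>/>${pdbid}_/" "$file" >> merged.fasta
done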
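The same merge can be sketched in Python, which keeps the whole pipeline in one language. This is an illustrative equivalent of the shell loop, not the author's code; `merge_fasta` is a hypothetical name. It prefixes every header with the file name minus `.fasta` and rewrites merged.fasta from scratch on each call.

```python
from pathlib import Path


def merge_fasta(directory: str, merged_name: str = "merged.fasta") -> int:
    """Concatenate every *.fasta file in directory, prefixing headers with the file stem.

    Returns the number of files merged.
    """
    folder = Path(directory)
    merged_path = folder / merged_name
    # List the inputs before opening the output, and never merge the output into itself
    sources = sorted(p for p in folder.glob("*.fasta") if p.name != merged_name)
    with merged_path.open("w") as out:
        for path in sources:
            prefix = path.stem  # e.g. "5G46.C" for 5G46.C.fasta
            for line in path.read_text().splitlines():
                if line.startswith(">"):
                    line = f">{prefix}_{line[1:]}"
                out.write(line + "\n")
    return len(sources)
```

Calling `merge_fasta('.')` in the directory that holds the converted files produces headers of the form `>5G46.C_5G46|C`, the same as the `sed` substitution.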

4. Building the local DB

Build the local DB for BLAST

makeblastdb -in merged.fasta -dbtype prot -title peptide_blast_db -out peptide_blast_db
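Once the database exists, a query against it might look like the sketch below. The post itself stops at building the DB, so everything here is an assumption: the file names `query.fasta` and `hits.tsv` are hypothetical, `-task blastp-short` is chosen because the queries are short peptides, and the column list is the default layout of BLAST's tabular output (`-outfmt 6`).

```python
from typing import Dict, List

# Default columns of BLAST tabular output (-outfmt 6)
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query: str, db: str = "peptide_blast_db", out: str = "hits.tsv") -> List[str]:
    """Build a blastp command line for a short peptide query against the local DB."""
    return [
        "blastp", "-task", "blastp-short",
        "-query", query, "-db", db,
        "-outfmt", "6", "-out", out,
    ]


def parse_outfmt6(text: str) -> List[Dict[str, str]]:
    """Turn tabular BLAST output into one dict per hit (all values kept as strings)."""
    return [
        dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        for line in text.splitlines()
        if line.strip()
    ]


print(" ".join(blastp_command("query.fasta")))
```

The command list can be passed to `subprocess.run(..., check=True)`, and the contents of the output file can then be handed to `parse_outfmt6`.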

The next post covers how to merge the ligand chain id and the receptor chain id to generate a pdb file, and how to build a web page that visualizes the molecular structure in that pdb file in 3D.
