Decision Diffuser code review

이두현 · March 17, 2024

conda env: decdiff

kwargs passed to code/scripts/train.py:

{'RUN.prefix': 'diffuser/default_inv/predict_epsilon_200_1000000.0/dropout_0.25/hopper-medium-expert-v2/100', 'seed': 100, 'returns_condition': True, 'predict_epsilon': True, 'n_diffusion_steps': 200, 'condition_dropout': 0.25, 'diffusion': 'models.GaussianInvDynDiffusion', 'n_train_steps': 1000000.0, 'dataset': 'hopper-medium-expert-v2', 'returns_scale': 400.0, 'RUN.job_counter': 1, 'RUN.job_name': 'predict_epsilon_200_1000000.0/dropout_0.25/hopper-medium-expert-v2/100'}

code/scripts/train.py

dataset_config = utils.Config(
        Config.loader,
        savepath='dataset_config.pkl',
        env=Config.dataset,
        horizon=Config.horizon,
        normalizer=Config.normalizer,
        preprocess_fns=Config.preprocess_fns,
        use_padding=Config.use_padding,
        max_path_length=Config.max_path_length,
        include_returns=Config.include_returns,
        returns_scale=Config.returns_scale,
        discount=Config.discount,
        termination_penalty=Config.termination_penalty,
    )
  • The utils.Config class stores the loader class together with its settings in a dictionary
[utils/config ] Config: <class 'diffuser.datasets.sequence.SequenceDataset'>
    discount: 0.99
    env: hopper-medium-expert-v2
    horizon: 100
    include_returns: True
    max_path_length: 1000
    normalizer: CDFNormalizer
    preprocess_fns: []
    returns_scale: 400.0
    termination_penalty: -100
    use_padding: True

[ utils/config ] Saved config to: dataset_config.pkl
  • Log after creating dataset_config

scripts/train.py

render_config = utils.Config(
        Config.renderer,
        savepath='render_config.pkl',
        env=Config.dataset,
    )
[ utils/config ] Imported diffuser.utils:MuJoCoRenderer

[utils/config ] Config: <class 'diffuser.utils.rendering.MuJoCoRenderer'>
    env: hopper-medium-expert-v2

[ utils/config ] Saved config to: render_config.pkl
  • Log after creating render_config

scripts/train.py

dataset = dataset_config()
renderer = render_config()
  • The () call invokes the __call__ method of the Config class in utils.config,

diffuser/utils/config.py

def __call__(self, *args, **kwargs):
        instance = self._class(*args, **kwargs, **self._dict)
        if self._device:
            instance = instance.to(self._device)
        return instance
  • which instantiates the stored class like this.
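To make the pattern concrete, here is a minimal sketch of the same idea (SimpleConfig is a hypothetical stand-in, not the repo class):

# Minimal sketch of the utils.Config pattern (SimpleConfig is hypothetical).
class SimpleConfig:
    def __init__(self, _class, **kwargs):
        self._class = _class   # class to instantiate later
        self._dict = kwargs    # constructor kwargs recorded now

    def __call__(self, *args, **kwargs):
        # merge the recorded kwargs with call-time args and instantiate
        return self._class(*args, **kwargs, **self._dict)

cfg = SimpleConfig(dict, a=1, b=2)  # record now...
instance = cfg()                    # ...instantiate later -> {'a': 1, 'b': 2}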

How is the dataset loaded?

diffuser/datasets/sequence.py

class SequenceDataset(torch.utils.data.Dataset):

    def __init__(self, env='hopper-medium-replay', horizon=64,
        normalizer='LimitsNormalizer', preprocess_fns=[], max_path_length=1000,
        max_n_episodes=10000, termination_penalty=0, use_padding=True, discount=0.99, returns_scale=1000, include_returns=False):

        self.preprocess_fn = get_preprocess_fn(preprocess_fns, env)
        self.env = env = load_environment(env)
        self.returns_scale = returns_scale
        self.horizon = horizon
        self.max_path_length = max_path_length
        self.discount = discount
        self.discounts = self.discount ** np.arange(self.max_path_length)[:, None]
        self.use_padding = use_padding
        self.include_returns = include_returns
        itr = sequence_dataset(env, self.preprocess_fn)
  • This is where the episode dictionaries come from
  • env='hopper-medium-expert-v2', horizon=100, normalizer=CDFNormalizer, preprocess_fns=[], termination_penalty=-100, returns_scale=400, include_returns=True
  • Since preprocess_fns = [], self.preprocess_fn returns its input unchanged
  • self.env is an OfflineHopperEnv instance (via load_environment)
  • self.discounts is a [self.max_path_length, 1] numpy array of discount factors discount**t (see the check after this list)
  • Stepping over the sequence_dataset line in ipdb jumps straight past it: it is a generator, so nothing runs until it is iterated
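A quick numpy check of the discounts construction (values taken from the config above):

import numpy as np

discount = 0.99
max_path_length = 1000
# (max_path_length, 1) column of discount**t, later multiplied against per-step rewards
discounts = discount ** np.arange(max_path_length)[:, None]
print(discounts.shape)        # (1000, 1)
print(discounts[:3].ravel())  # [1.     0.99   0.9801]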

Continuing (diffuser/datasets/sequence.py)

        fields = ReplayBuffer(max_n_episodes, max_path_length, termination_penalty)
        for i, episode in enumerate(itr):
            fields.add_path(episode)
        fields.finalize()

diffuser/datasets/buffer.py

class ReplayBuffer:

    def __init__(self, max_n_episodes, max_path_length, termination_penalty):
        self._dict = {
            'path_lengths': np.zeros(max_n_episodes, dtype=np.int32),
        }
        self._count = 0
        self.max_n_episodes = max_n_episodes
        self.max_path_length = max_path_length
        self.termination_penalty = termination_penalty
  • ReplayBuffer looks like this

diffuser/datasets/sequence.py

        fields = ReplayBuffer(max_n_episodes, max_path_length, termination_penalty)
        for i, episode in enumerate(itr):  # ← focus of this step
            fields.add_path(episode)
        fields.finalize()

This is where sequence_dataset actually runs.

diffuser/datasets/d4rl.py

def sequence_dataset(env, preprocess_fn):
    """
    Returns an iterator through trajectories.
    Args:
        env: An OfflineEnv object.
        dataset: An optional dataset to pass in for processing. If None,
            the dataset will default to env.get_dataset()
        **kwargs: Arguments to pass to env.get_dataset().
    Returns:
        An iterator through dictionaries with keys:
            observations
            actions
            rewards
            terminals
    """
    dataset = get_dataset(env)
    dataset = preprocess_fn(dataset)

    N = dataset['rewards'].shape[0]
    data_ = collections.defaultdict(list)

    # The newer version of the dataset adds an explicit
    # timeouts field. Keep old method for backwards compatability.
    use_timeouts = 'timeouts' in dataset

    episode_step = 0
    for i in range(N):
        done_bool = bool(dataset['terminals'][i])
        if use_timeouts:
            final_timestep = dataset['timeouts'][i]
        else:
            final_timestep = (episode_step == env._max_episode_steps - 1)

        for k in dataset:
            if 'metadata' in k: continue
            data_[k].append(dataset[k][i])

        if done_bool or final_timestep:
            episode_step = 0
            episode_data = {}
            for k in data_:
                episode_data[k] = np.array(data_[k])
            if 'maze2d' in env.name:
                episode_data = process_maze2d_episode(episode_data)
            yield episode_data
            data_ = collections.defaultdict(list)

        episode_step += 1
  • Note: interrupting get_dataset mid-download causes errors on later runs
    • /home/doolee13/.d4rl/datasets/hopper_medium_expert-v2.hdf5
    • Deleting the file at this path lets it load cleanly again
  • dataset['rewards'].shape → (1999906,)
  • use_timeouts = True
  • In for k in dataset, k ranges over the keys (actions, infos/action_log_probs, infos/qpos, infos/qvel, next_observations, observations, rewards, terminals, timeouts)
  • Each step's values are appended to data_
  • When terminals or timeouts is true, the data_ gathered so far is copied into episode_data and yielded (one episode has ended)
  • If use_timeouts is False, episode_step is counted manually to enforce the timeout rule
  • dataset holds (1999906, field_dim) elements per key
  • Each yield produces, per key, an array of shape (episode_len, field_dim) (numpy arrays built from the lists); see the toy check after this list
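A toy reproduction of the splitting logic above with a fake 5-step dataset (hypothetical values, just to see the episode boundaries):

import collections
import numpy as np

# terminal at index 2, timeout at index 4 -> two episodes of length 3 and 2
dataset = {
    'observations': np.arange(10, dtype=np.float32).reshape(5, 2),
    'rewards':      np.ones(5, dtype=np.float32),
    'terminals':    np.array([0, 0, 1, 0, 0], dtype=bool),
    'timeouts':     np.array([0, 0, 0, 0, 1], dtype=bool),
}
data_ = collections.defaultdict(list)
for i in range(5):
    for k in dataset:
        data_[k].append(dataset[k][i])
    if dataset['terminals'][i] or dataset['timeouts'][i]:
        print({k: np.array(v).shape for k, v in data_.items()})
        data_ = collections.defaultdict(list)
# first print: observations (3, 2), rewards (3,), ...; second: observations (2, 2), ...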

diffuser/datasets/sequence.py

        fields = ReplayBuffer(max_n_episodes, max_path_length, termination_penalty)
        for i, episode in enumerate(itr):
            fields.add_path(episode)  # ← focus of this step
        fields.finalize()
  • episode keys: 'actions', 'infos/action_log_probs', 'infos/qpos', 'infos/qvel', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts'
  • Example shapes: actions → [episode_len, 3]
  • observations → [episode_len, 11]
  • rewards → [episode_len,]

diffuser/datasets/buffer.py

def add_path(self, path):
        path_length = len(path['observations'])
        assert path_length <= self.max_path_length

        if path['terminals'].any():
            assert (path['terminals'][-1] == True) and (not path['terminals'][:-1].any())

        ## if first path added, set keys based on contents
        self._add_keys(path)

        ## add tracked keys in path
        for key in self.keys:
            array = atleast_2d(path[key])
            if key not in self._dict: self._allocate(key, array)
            self._dict[key][self._count, :path_length] = array

        ## penalize early termination
        if path['terminals'].any() and self.termination_penalty is not None:
            assert not path['timeouts'].any(), 'Penalized a timeout episode for early termination'
            self._dict['rewards'][self._count, path_length - 1] += self.termination_penalty

        ## record path length
        self._dict['path_lengths'][self._count] = path_length

        ## increment path counter
        self._count += 1
  • Example path_length: 470
  • The path['terminals'].any() block asserts that the episode was cut exactly at its terminal step
  • _add_keys sets self.keys as a list of the first episode's (path's) keys; on later calls it just returns
    • ['actions', 'infos/action_log_probs', 'infos/qpos', 'infos/qvel', 'next_observations', 'observations', 'rewards', 'terminals', 'timeouts']
  • atleast_2d expands 1-D arrays to 2-D
  • For an unseen key, self._allocate creates self._dict[key] with shape (max_n_episodes, max_path_length, dim); a sketch follows this list
  • The current array is written into self._dict[key] at episode self._count, up to :path_length
  • termination_penalty is -100 and is added to the reward at the final step (in the early-termination case)
  • 'path_lengths' is a (max_n_episodes,) numpy array, and path_length is recorded there
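_allocate itself is not shown above; a hedged sketch of what it presumably does, based on the shape described (check buffer.py for the real code):

    def _allocate(self, key, array):
        # pre-allocate a (max_n_episodes, max_path_length, dim) array for a new key
        dim = array.shape[-1]
        shape = (self.max_n_episodes, self.max_path_length, dim)
        self._dict[key] = np.zeros(shape, dtype=np.float32)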

fields.finalize() explained

diffuser/datasets/buffer.py

def _add_attributes(self):
        '''
            can access fields with `buffer.observations`
            instead of `buffer['observations']`
        '''
        for key, val in self._dict.items():
            setattr(self, key, val)

def finalize(self):
        ## remove extra slots
        for key in self.keys + ['path_lengths']:
            self._dict[key] = self._dict[key][:self._count]
        self._add_attributes()
  • _add_attributes exists so fields can be accessed as buffer.key instead of buffer['key']
  • finalize trims every key in _dict (plus 'path_lengths') down to self._count entries; self._count is the total episode count, so the huge arrays pre-allocated at max_n_episodes are cut to size

diffuser/datasets/sequence.py, continuing

        self.normalizer = DatasetNormalizer(fields, normalizer, path_lengths=fields['path_lengths'])  # ← focus
        self.indices = self.make_indices(fields.path_lengths, horizon)

        self.observation_dim = fields.observations.shape[-1]
        self.action_dim = fields.actions.shape[-1]
        self.fields = fields
        self.n_episodes = fields.n_episodes
        self.path_lengths = fields.path_lengths
        self.normalize()

diffuser/datasets/normalization.py

class DatasetNormalizer:

    def __init__(self, dataset, normalizer, path_lengths=None):
        dataset = flatten(dataset, path_lengths)

        self.observation_dim = dataset['observations'].shape[1]
        self.action_dim = dataset['actions'].shape[1]

        if type(normalizer) == str:
            normalizer = eval(normalizer)

        self.normalizers = {}
        for key, val in dataset.items():
            try:
                self.normalizers[key] = normalizer(val)
            except:
                print(f'[ utils/normalization ] Skipping {key} | {normalizer}')
            # key: normalizer(val)
            # for key, val in dataset.items()
  • Before flatten, dataset['observations']: (3213, 1000, 11)
    • After, each episode is cut to its path length and concatenated into one long array: (sum(path_lengths), 11)
  • observation_dim: 11, action_dim: 3
  • normalizer holds the CDFNormalizer class (defined in the same file)
  • self.normalizers[key] gets a normalizer initialized with val
    • During init, the normalizer precomputes the min and max for each dimension of the key

diffuser/datasets/sequence.py, continuing

        self.normalizer = DatasetNormalizer(fields, normalizer, path_lengths=fields['path_lengths'])
        self.indices = self.make_indices(fields.path_lengths, horizon)  # ← focus

        self.observation_dim = fields.observations.shape[-1]
        self.action_dim = fields.actions.shape[-1]
        self.fields = fields
        self.n_episodes = fields.n_episodes
        self.path_lengths = fields.path_lengths
        self.normalize()  # ← focus
  • self.indices holds sampling indices of the form (path_ind, start, end), where start..end spans horizon steps
    • so each episode contributes several (start, end) windows; a sketch of make_indices follows
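make_indices is not shown here; a hedged sketch consistent with the description above (the real method also handles use_padding; see sequence.py):

def make_indices(self, path_lengths, horizon):
    # one (path_ind, start, end) triple per horizon-length window that fits in an episode
    indices = []
    for path_ind, path_length in enumerate(path_lengths):
        max_start = path_length - horizon
        for start in range(max_start + 1):
            indices.append((path_ind, start, start + horizon))
    return np.array(indices)

The normalize() called above: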
def normalize(self, keys=['observations', 'actions']):
        '''
            normalize fields that will be predicted by the diffusion model
        '''
        for key in keys:
            array = self.fields[key].reshape(self.n_episodes*self.max_path_length, -1)
            normed = self.normalizer(array, key)
            self.fields[f'normed_{key}'] = normed.reshape(self.n_episodes, self.max_path_length, -1)
  • self.fields[key] is (self.n_episodes, self.max_path_length, key_dim); the first two dimensions are flattened before handing it to the normalizer
  • self.normalizer is the CDFNormalizer-backed DatasetNormalizer from above; calling self.normalizer(array, key) invokes __call__, which internally calls self.normalize and returns the normalized values

Log of the resulting fields

Dim [0] is the number of episodes; dim [1] is the maximum possible length per episode

actions: (3213, 1000, 3)
infos/action_log_probs: (3213, 1000, 1)
infos/qpos: (3213, 1000, 6)
infos/qvel: (3213, 1000, 6)
next_observations: (3213, 1000, 11)
observations: (3213, 1000, 11)
rewards: (3213, 1000, 1)
terminals: (3213, 1000, 1)
timeouts: (3213, 1000, 1)
normed_observations: (3213, 1000, 11)
normed_actions: (3213, 1000, 3)

scripts/train.py

dataset = dataset_config()
renderer = render_config()  # ← focus of this step

if Config.diffusion == 'models.GaussianInvDynDiffusion':
        model_config = utils.Config(
            Config.model,
            savepath='model_config.pkl',
            horizon=Config.horizon,
            transition_dim=observation_dim,
            cond_dim=observation_dim,
            dim_mults=Config.dim_mults,
            returns_condition=Config.returns_condition,
            dim=Config.dim,
            condition_dropout=Config.condition_dropout,
            calc_energy=Config.calc_energy,
            device=Config.device,
        )

        diffusion_config = utils.Config(
            Config.diffusion,
            savepath='diffusion_config.pkl',
            horizon=Config.horizon,
            observation_dim=observation_dim,
            action_dim=action_dim,
            n_timesteps=Config.n_diffusion_steps,
            loss_type=Config.loss_type,
            clip_denoised=Config.clip_denoised,
            predict_epsilon=Config.predict_epsilon,
            hidden_dim=Config.hidden_dim,
            ar_inv=Config.ar_inv,
            train_only_inv=Config.train_only_inv,
            ## loss weighting
            action_weight=Config.action_weight,
            loss_weights=Config.loss_weights,
            loss_discount=Config.loss_discount,
            returns_condition=Config.returns_condition,
            condition_guidance_w=Config.condition_guidance_w,
            device=Config.device,
        )
  • The renderer is loaded the same way
  • Config.diffusion is loaded as above

model log

Config: <class 'diffuser.models.temporal.TemporalUnet'>

calc_energy: False
cond_dim: 11
condition_dropout: 0.25
dim: 128
dim_mults: (1, 4, 8)
horizon: 100
returns_condition: True
transition_dim: 11

diffusion log

<class 'diffuser.models.diffusion.GaussianInvDynDiffusion'>

action_dim: 3
action_weight: 10
ar_inv: False
clip_denoised: True
condition_guidance_w: 1.2
hidden_dim: 256
horizon: 100
loss_discount: 1
loss_type: l2
loss_weights: None
n_timesteps: 200
observation_dim: 11
predict_epsilon: True
returns_condition: True
train_only_inv: False

trainer log

Config: <class 'diffuser.utils.training.Trainer'>

bucket: /home/aajay/weights/
ema_decay: 0.995
gradient_accumulate_every: 2
label_freq: 200000
log_freq: 1000
n_reference: 8
sample_freq: 10000
save_checkpoints: False
save_freq: 10000
save_parallel: False
train_batch_size: 32
train_device: cuda
train_lr: 0.0002

    model = model_config()  # ← focus of this step

    diffusion = diffusion_config(model)

    trainer = trainer_config(diffusion, dataset, renderer)

diffuser/models/temporal.py

class TemporalUnet(nn.Module):

    def __init__(
        self,
        horizon,
        transition_dim,
        cond_dim,
        dim=128,
        dim_mults=(1, 2, 4, 8),
        returns_condition=False,
        condition_dropout=0.1,
        calc_energy=False,
        kernel_size=5,
    ):
        super().__init__()

        dims = [transition_dim, *map(lambda m: dim * m, dim_mults)]
        in_out = list(zip(dims[:-1], dims[1:]))
        print(f'[ models/temporal ] Channel dimensions: {in_out}')

        if calc_energy:
            mish = False
            act_fn = nn.SiLU()
        else:
            mish = True
            act_fn = nn.Mish()
  • dim_mults is given as (1, 4, 8), and transition_dim: 11 (same as obs_dim)
  • dims: [11, 128, 512, 1024] (since dim is 128)
  • in_out: [(11, 128), (128, 512), (512, 1024)]
  • calc_energy is False, so the else branch runs; Mish is used as the activation

Continuing

        self.time_dim = dim
        self.returns_dim = dim

        self.time_mlp = nn.Sequential(
            SinusoidalPosEmb(dim),
            nn.Linear(dim, dim * 4),
            act_fn,
            nn.Linear(dim * 4, dim),
        )

        # ----------- reference --------------
class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb
        # ----------- reference --------------

        self.returns_condition = returns_condition
        self.condition_dropout = condition_dropout
        self.calc_energy = calc_energy

        if self.returns_condition:
            self.returns_mlp = nn.Sequential(
                        nn.Linear(1, dim),
                        act_fn,
                        nn.Linear(dim, dim * 4),
                        act_fn,
                        nn.Linear(dim * 4, dim),
                    )
            self.mask_dist = Bernoulli(probs=1-self.condition_dropout)
            embed_dim = 2*dim
        else:
            embed_dim = dim
  • time_dim and returns_dim are both 128
  • In the line x[:, None] * emb[None, :], x is the (batch,) tensor of timesteps and becomes (batch, 1)
  • emb becomes (1, half_dim), so the product broadcasts to (batch, half_dim)
  • torch.cat((emb.sin(), emb.cos()), dim=-1) applies sin and cos element-wise and concatenates along -1, so the halved dimension returns to dim
  • embed_dim = 256 (a shape check follows)
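A quick shape check, reusing the SinusoidalPosEmb class shown above:

import torch

emb_layer = SinusoidalPosEmb(dim=128)
t = torch.randint(0, 200, (32,)).float()  # a batch of diffusion timesteps
print(emb_layer(t).shape)                 # torch.Size([32, 128])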

Before continuing

class Conv1dBlock(nn.Module):
    '''
        Conv1d --> GroupNorm --> Mish
    '''

    def __init__(self, inp_channels, out_channels, kernel_size, mish=True, n_groups=8):
        super().__init__()

        if mish:
            act_fn = nn.Mish()
        else:
            act_fn = nn.SiLU()

        self.block = nn.Sequential(
            nn.Conv1d(inp_channels, out_channels, kernel_size, padding=kernel_size // 2),
            Rearrange('batch channels horizon -> batch channels 1 horizon'),
            nn.GroupNorm(n_groups, out_channels),
            Rearrange('batch channels 1 horizon -> batch channels horizon'),
            act_fn,
        )

    def forward(self, x):
        return self.block(x)

class ResidualTemporalBlock(nn.Module):

    def __init__(self, inp_channels, out_channels, embed_dim, horizon, kernel_size=5, mish=True):
        super().__init__()

        self.blocks = nn.ModuleList([
            Conv1dBlock(inp_channels, out_channels, kernel_size, mish),
            Conv1dBlock(out_channels, out_channels, kernel_size, mish),
        ])

        if mish:
            act_fn = nn.Mish()
        else:
            act_fn = nn.SiLU()

        self.time_mlp = nn.Sequential(
            act_fn,
            nn.Linear(embed_dim, out_channels),
            Rearrange('batch t -> batch t 1'),
        )

        self.residual_conv = nn.Conv1d(inp_channels, out_channels, 1) \
            if inp_channels != out_channels else nn.Identity()

    def forward(self, x, t):
        '''
            x : [ batch_size x inp_channels x horizon ]
            t : [ batch_size x embed_dim ]
            returns:
            out : [ batch_size x out_channels x horizon ]
        '''
        out = self.blocks[0](x) + self.time_mlp(t)
        out = self.blocks[1](out)

        return out + self.residual_conv(x)
  • blocks output dim: [batch, out_channels, horizon]
  • self.time_mlp output dim: [batch, out_channels, 1] (one scalar per channel, broadcast across the time axis)
        self.downs = nn.ModuleList([])
        self.ups = nn.ModuleList([])
        num_resolutions = len(in_out)

        print(in_out)
        for ind, (dim_in, dim_out) in enumerate(in_out):
            is_last = ind >= (num_resolutions - 1)

            self.downs.append(nn.ModuleList([
                ResidualTemporalBlock(dim_in, dim_out, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish),
                ResidualTemporalBlock(dim_out, dim_out, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish),
                Downsample1d(dim_out) if not is_last else nn.Identity()
            ]))

            if not is_last:
                horizon = horizon // 2

        mid_dim = dims[-1]
        self.mid_block1 = ResidualTemporalBlock(mid_dim, mid_dim, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish)
        self.mid_block2 = ResidualTemporalBlock(mid_dim, mid_dim, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish)

        for ind, (dim_in, dim_out) in enumerate(reversed(in_out[1:])):
            is_last = ind >= (num_resolutions - 1)

            self.ups.append(nn.ModuleList([
                ResidualTemporalBlock(dim_out * 2, dim_in, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish),
                ResidualTemporalBlock(dim_in, dim_in, embed_dim=embed_dim, horizon=horizon, kernel_size=kernel_size, mish=mish),
                Upsample1d(dim_in) if not is_last else nn.Identity()
            ]))

            if not is_last:
                horizon = horizon * 2

        self.final_conv = nn.Sequential(
            Conv1dBlock(dim, dim, kernel_size=kernel_size, mish=mish),
            nn.Conv1d(dim, transition_dim, 1),
        )
  • The model is a U-Net that predicts epsilon(x, t)
    • input dim: [batch, horizon, transition_dim]
    • output dim: [batch, horizon, transition_dim] (same shape as the input)

Back in train.py, model = TemporalUnet(...) has finished init; now the diffusion init:

class GaussianInvDynDiffusion(nn.Module):
    def __init__(self, model, horizon, observation_dim, action_dim, n_timesteps=1000,
        loss_type='l1', clip_denoised=False, predict_epsilon=True, hidden_dim=256,
        action_weight=1.0, loss_discount=1.0, loss_weights=None, returns_condition=False,
        condition_guidance_w=0.1, ar_inv=False, train_only_inv=False):
        super().__init__()
        self.horizon = horizon
        self.observation_dim = observation_dim
        self.action_dim = action_dim
        self.transition_dim = observation_dim + action_dim
        self.model = model
        self.ar_inv = ar_inv
        self.train_only_inv = train_only_inv
        if self.ar_inv:
            self.inv_model = ARInvModel(hidden_dim=hidden_dim, observation_dim=observation_dim, action_dim=action_dim)
        else:
            self.inv_model = nn.Sequential(
                nn.Linear(2 * self.observation_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, self.action_dim),
            )
        self.returns_condition = returns_condition
        self.condition_guidance_w = condition_guidance_w
  • self.train_only_inv : False
  • returns_condition : True, condition_guidance_w : 1.2

Continuing

        betas = cosine_beta_schedule(n_timesteps)
        alphas = 1. - betas
        alphas_cumprod = torch.cumprod(alphas, axis=0)
        alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
        self.n_timesteps = int(n_timesteps)
        self.clip_denoised = clip_denoised
        self.predict_epsilon = predict_epsilon

        self.register_buffer('betas', betas)
        self.register_buffer('alphas_cumprod', alphas_cumprod)
        self.register_buffer('alphas_cumprod_prev', alphas_cumprod_prev)

        # calculations for diffusion q(x_t | x_{t-1}) and others
        self.register_buffer('sqrt_alphas_cumprod', torch.sqrt(alphas_cumprod))
        self.register_buffer('sqrt_one_minus_alphas_cumprod', torch.sqrt(1. - alphas_cumprod))
        self.register_buffer('log_one_minus_alphas_cumprod', torch.log(1. - alphas_cumprod))
        self.register_buffer('sqrt_recip_alphas_cumprod', torch.sqrt(1. / alphas_cumprod))
        self.register_buffer('sqrt_recipm1_alphas_cumprod', torch.sqrt(1. / alphas_cumprod - 1))

        # calculations for posterior q(x_{t-1} | x_t, x_0)
        posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod)
        self.register_buffer('posterior_variance', posterior_variance)

        ## log calculation clipped because the posterior variance
        ## is 0 at the beginning of the diffusion chain
        self.register_buffer('posterior_log_variance_clipped',
            torch.log(torch.clamp(posterior_variance, min=1e-20)))
        self.register_buffer('posterior_mean_coef1',
            betas * np.sqrt(alphas_cumprod_prev) / (1. - alphas_cumprod))
        self.register_buffer('posterior_mean_coef2',
            (1. - alphas_cumprod_prev) * np.sqrt(alphas) / (1. - alphas_cumprod))

        loss_weights = self.get_loss_weights(loss_discount)  # ← focus
        self.loss_fn = Losses['state_l2'](loss_weights)
  • betas follows the schedule from https://openreview.net/forum?id=-NEXDKk8gZ (the cosine schedule): a tensor of size [n_timesteps] is returned; a sketch follows this list
  • cumprod computes alpha_bar
  • cumprod_prev gives 1 for the first step and the value one step earlier for the rest
  • Why register_buffer? These values need no training, but should automatically move to the same device as the model
  • The quantities needed for the p and q computations are registered as buffers
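For reference, the standard cosine beta schedule from that paper (a sketch; the repo's helper should match up to minor details):

import numpy as np
import torch

def cosine_beta_schedule(timesteps, s=0.008, dtype=torch.float32):
    # alpha_bar follows a squared-cosine curve; betas come from its successive ratios
    steps = timesteps + 1
    x = np.linspace(0, steps, steps)
    alphas_cumprod = np.cos(((x / steps) + s) / (1 + s) * np.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return torch.tensor(np.clip(betas, 0, 0.999), dtype=dtype)

Next, get_loss_weights from the focus line above: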
def get_loss_weights(self, discount):
        '''
            sets loss coefficients for trajectory

            action_weight   : float
                coefficient on first action loss
            discount   : float
                multiplies t^th timestep of trajectory loss by discount**t
            weights_dict    : dict
                { i: c } multiplies dimension i of observation loss by c
        '''
        self.action_weight = 1
        dim_weights = torch.ones(self.observation_dim, dtype=torch.float32)

        ## decay loss with trajectory timestep: discount**t
        discounts = discount ** torch.arange(self.horizon, dtype=torch.float)
        discounts = discounts / discounts.mean()
        loss_weights = torch.einsum('h,t->ht', discounts, dim_weights)
        # Cause things are conditioned on t=0
        if self.predict_epsilon:
            loss_weights[0, :] = 0

        return loss_weights
  • discount: 1, self.horizon: 100
  • discounts shape: [100], dim_weights: [11]
  • loss_weights is [100, 11]; element [i][j] is discounts[i] * dim_weights[j]
    • here it is effectively a [100, 11] tensor of all ones
  • Because everything is conditioned on t=0, loss_weights[0] is set entirely to 0

Back to this line:

 self.loss_fn = Losses['state_l2'](loss_weights)

This was the declaration; the loss classes behind it:

class WeightedStateLoss(nn.Module):

    def __init__(self, weights):
        super().__init__()
        self.register_buffer('weights', weights)

    def forward(self, pred, targ):
        '''
            pred, targ : tensor
                [ batch_size x horizon x transition_dim ]
        '''
        loss = self._loss(pred, targ)
        weighted_loss = (loss * self.weights).mean()
        return weighted_loss, {'a0_loss': weighted_loss}

class WeightedStateL2(WeightedStateLoss):

    def _loss(self, pred, targ):
        return F.mse_loss(pred, targ, reduction='none')
  • Since the self.weights passed here are all 0 at row [0] and 1 afterwards, this is an L2 loss that assigns no loss at horizon step 0

The trainer loading step in scripts/train.py

utils/training.py

class Trainer(object):
    def __init__(
        self,
        diffusion_model,
        dataset,
        renderer,
        ema_decay=0.995,
        train_batch_size=32,
        train_lr=2e-5,
        gradient_accumulate_every=2,
        step_start_ema=2000,
        update_ema_every=10,
        log_freq=100,
        sample_freq=1000,
        save_freq=1000,
        label_freq=100000,
        save_parallel=False,
        n_reference=8,
        bucket=None,
        train_device='cuda',
        save_checkpoints=False,
    ):
        super().__init__()
        self.model = diffusion_model
        self.ema = EMA(ema_decay)
        self.ema_model = copy.deepcopy(self.model)
        self.update_ema_every = update_ema_every
        self.save_checkpoints = save_checkpoints

        self.step_start_ema = step_start_ema
        self.log_freq = log_freq
        self.sample_freq = sample_freq
        self.save_freq = save_freq
        self.label_freq = label_freq
        self.save_parallel = save_parallel

        self.batch_size = train_batch_size
        self.gradient_accumulate_every = gradient_accumulate_every

        self.dataset = dataset

        self.dataloader = cycle(torch.utils.data.DataLoader(
            self.dataset, batch_size=train_batch_size, num_workers=0, shuffle=True, pin_memory=True
        ))
        self.dataloader_vis = cycle(torch.utils.data.DataLoader(
            self.dataset, batch_size=1, num_workers=0, shuffle=True, pin_memory=True
        ))
        self.renderer = renderer
        self.optimizer = torch.optim.Adam(diffusion_model.parameters(), lr=train_lr)

        self.bucket = bucket
        self.n_reference = n_reference

        self.reset_parameters()
        self.step = 0

        self.device = train_device
  • ema is a copied model used for soft (exponential moving average) updates
  • cycle will yield data one batch at a time, forever
  • reset_parameters() synchronizes the EMA model's weights with the online model; a sketch of both helpers follows
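Hedged sketches of the two helpers mentioned above (the repo's EMA class and cycle in utils/training.py should look essentially like this):

import torch

def cycle(dl):
    # wrap a DataLoader into an infinite batch generator
    while True:
        for data in dl:
            yield data

@torch.no_grad()
def ema_update(ema_model, model, beta=0.995):
    # soft update: ema_param <- beta * ema_param + (1 - beta) * online_param
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(beta).add_(p, alpha=1 - beta)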

scripts/train.py

# after initializing model, diffusion, and trainer
    batch = utils.batchify(dataset[0], Config.device)
    loss, _ = diffusion.loss(*batch)
    loss.backward()
  • A quick test that the dimensions line up
  • Indexing dataset[0] calls SequenceDataset.__getitem__ in diffuser/datasets/sequence.py
def __getitem__(self, idx, eps=1e-4):
        path_ind, start, end = self.indices[idx]

        observations = self.fields.normed_observations[path_ind, start:end]
        actions = self.fields.normed_actions[path_ind, start:end]

        conditions = self.get_conditions(observations)
        trajectories = np.concatenate([actions, observations], axis=-1)

        if self.include_returns:
            rewards = self.fields.rewards[path_ind, start:]
            discounts = self.discounts[:len(rewards)]
            returns = (discounts * rewards).sum()
            returns = np.array([returns/self.returns_scale], dtype=np.float32)
            batch = RewardBatch(trajectories, conditions, returns)
        else:
            batch = Batch(trajectories, conditions)

        return batch
  • observations: [100, 11] (horizon_len, obs_dim)
  • get_conditions returns {0: observations[0]}
  • trajectories.shape: [100, 14]
  • All rewards from start to the end of the episode are summed with discounting, i.e. the discounted return-to-go from start
  • It is then scaled and put into the RewardBatch named tuple; a numeric check follows
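A tiny numeric check of that return computation (toy reward values; returns_scale = 400 as in the config):

import numpy as np

rewards = np.ones((5, 1), dtype=np.float32)           # stands in for rewards[start:]
discounts = 0.99 ** np.arange(1000)[:, None]          # same construction as self.discounts
returns = (discounts[:len(rewards)] * rewards).sum()  # 1 + 0.99 + ... ≈ 4.901
returns = np.array([returns / 400.0], dtype=np.float32)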

utils/arrays.py

def batchify(batch, device): 
  • Passing through this converts the numpy arrays into tensors on the device, ready as model input

scripts/train.py

batch = utils.batchify(dataset[0], Config.device)
loss, _ = diffusion.loss(*batch)
  • What the batch holds:
  • batch.trajectories: [1, 100, 14]
  • batch.conditions: {0: [1, 11]}
  • batch.returns: [1, 1]

diffuser/models/diffusion.py

def loss(self, x, cond, returns=None):
            batch_size = len(x)
            t = torch.randint(0, self.n_timesteps, (batch_size,), device=x.device).long()
            diffuse_loss, info = self.p_losses(x[:, :, self.action_dim:], cond, t, returns)  # ← focus
            # Calculating inv loss
            x_t = x[:, :-1, self.action_dim:]
            a_t = x[:, :-1, :self.action_dim]
            x_t_1 = x[:, 1:, self.action_dim:]
            x_comb_t = torch.cat([x_t, x_t_1], dim=-1)
            x_comb_t = x_comb_t.reshape(-1, 2 * self.observation_dim)
            a_t = a_t.reshape(-1, self.action_dim)
            if self.ar_inv:
                inv_loss = self.inv_model.calc_loss(x_comb_t, a_t)
            else:
                pred_a_t = self.inv_model(x_comb_t)
                inv_loss = F.mse_loss(pred_a_t, a_t)

            loss = (1 / 2) * (diffuse_loss + inv_loss)

        return loss, info
  • n_timesteps is the diffuser's T
  • A value in 0..T-1 is drawn for each batch element and assigned to t
def p_losses(self, x_start, cond, t, returns=None):
        noise = torch.randn_like(x_start)

        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)  # ← focus
        x_noisy = apply_conditioning(x_noisy, cond, 0)

        x_recon = self.model(x_noisy, cond, t, returns)

        if not self.predict_epsilon:
            x_recon = apply_conditioning(x_recon, cond, 0)

        assert noise.shape == x_recon.shape

        if self.predict_epsilon:
            loss, info = self.loss_fn(x_recon, noise)
        else:
            loss, info = self.loss_fn(x_recon, x_start)

        return loss, info
  • x_start: [1, 100, 11] (batch, horizon, obs_dim); noise has the same shape
def q_sample(self, x_start, t, noise=None):
        if noise is None:
            noise = torch.randn_like(x_start)

        sample = (
            extract(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start +
            extract(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise
        )

        return sample

(q_sample implements the closed-form forward process: x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε.)

def extract(a, t, x_shape):
    b, *_ = t.shape
    out = a.gather(-1, t)
    return out.reshape(b, *((1,) * (len(x_shape) - 1)))
  • a is sqrt_alphas_cumprod, with size (200) → one entry for each t in (0, n_timesteps)
  • The gather line returns (batch,) values into out
  • x_shape is (batch, horizon, obs_dim), so ones are appended for every dim except batch
    • e.g. the result has shape (b, 1, 1); a usage check follows
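A quick usage check with the extract shown above (toy values):

import torch

a = torch.linspace(0, 1, 200)   # stands in for sqrt_alphas_cumprod, one entry per step
t = torch.tensor([0, 10, 199])  # a batch of 3 sampled timesteps
out = extract(a, t, (3, 100, 11))
print(out.shape)                # torch.Size([3, 1, 1]) -> broadcasts against x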

p_losses, continuing

def p_losses(self, x_start, cond, t, returns=None):
        noise = torch.randn_like(x_start)

        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
        x_noisy = apply_conditioning(x_noisy, cond, 0)  # ← focus

        x_recon = self.model(x_noisy, cond, t, returns)

        if not self.predict_epsilon:
            x_recon = apply_conditioning(x_recon, cond, 0)

        assert noise.shape == x_recon.shape

        if self.predict_epsilon:
            loss, info = self.loss_fn(x_recon, noise)
        else:
            loss, info = self.loss_fn(x_recon, x_start)

        return loss, info
# models/helpers.py
def apply_conditioning(x, conditions, action_dim):
    for t, val in conditions.items():
        x[:, t, action_dim:] = val.clone()
    return x
  • conditions is the dictionary {0: obs at (env) timestep 0}, so those values are copied into x at t=0

Since self.model is the TemporalUnet, let's walk through its forward pass.

models/temporal.py

def forward(self, x, cond, time, returns=None, use_dropout=True, force_dropout=False):
        '''
            x : [ batch x horizon x transition ]
            returns : [batch x horizon]
        '''
        if self.calc_energy:
            x_inp = x

        x = einops.rearrange(x, 'b h t -> b t h')

        t = self.time_mlp(time)

        if self.returns_condition:
            assert returns is not None
            returns_embed = self.returns_mlp(returns)
            if use_dropout:
                mask = self.mask_dist.sample(sample_shape=(returns_embed.size(0), 1)).to(returns_embed.device)
                returns_embed = mask*returns_embed
            if force_dropout:
                returns_embed = 0*returns_embed
            t = torch.cat([t, returns_embed], dim=-1)
  • time: the diffusion timesteps randomly sampled from 0..T
  • x is reshaped from [batch, 100, 11] to [batch, 11, 100]
  • t → [batch, dim] (= 128)
  • self.returns_condition is True, so the return-to-go is embedded, as in Decision Transformer
    • returns_embed: [1, 128]
  • The return condition is kept or zeroed by a Bernoulli draw (this appears to be the "zero token" the paper mentions); see the snippet after this list
  • mask shape: [batch, 1]
  • The return condition is concatenated onto the time embedding, which finally enters as [batch, 2*dim]
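What the dropout mask does, in isolation (condition_dropout = 0.25 as configured): zeroing the return embedding for a random subset of the batch is what trains the unconditional model needed for classifier-free guidance at sampling time.

import torch
from torch.distributions import Bernoulli

mask_dist = Bernoulli(probs=1 - 0.25)          # keep probability 0.75
returns_embed = torch.randn(32, 128)
mask = mask_dist.sample(sample_shape=(32, 1))  # (batch, 1), entries in {0., 1.}
returns_embed = mask * returns_embed           # ~25% of rows zeroed out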

Continuing

        h = []

        for resnet, resnet2, downsample in self.downs:
            x = resnet(x, t)
            x = resnet2(x, t)
            h.append(x)
            x = downsample(x)

        x = self.mid_block1(x, t)
        x = self.mid_block2(x, t)
  • x initially: [1, 11, 100]
  • after the first resnet/resnet2 pair: [1, 128, 100]
    • second: [1, 512, 50]
      • third: [1, 1024, 25]
  • after downsample: [1, 128, 50]
    • second: [1, 512, 25]
      • no downsample the third time
  • dimensions are unchanged after the two mid blocks
        for resnet, resnet2, upsample in self.ups:
            x = torch.cat((x, h.pop()), dim=1)
            x = resnet(x, t)
            x = resnet2(x, t)
            x = upsample(x)

        x = self.final_conv(x)

        x = einops.rearrange(x, 'b t h -> b h t')

        if self.calc_energy:
            # Energy function
            energy = ((x - x_inp)**2).mean()
            grad = torch.autograd.grad(outputs=energy, inputs=x_inp, create_graph=True)
            return grad[0]
        else:
            return x
  • x here: [1, 1024, 25]
  • first torch.cat: [1, 2048, 25]
    • second: [1, 1024, 50]
  • first resnet: [1, 512, 25]
    • second: [1, 128, 50]
  • first upsample: [1, 512, 50]
    • second: [1, 128, 100]
  • self.ups runs twice
  • entering final_conv: [1, 128, 100] → [1, 11, 100]
  • then rearranged back to [1, 100, 11]

Back again

def p_losses(self, x_start, cond, t, returns=None):
        noise = torch.randn_like(x_start)

        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
        x_noisy = apply_conditioning(x_noisy, cond, 0)

        x_recon = self.model(x_noisy, cond, t, returns)

        if not self.predict_epsilon:
            x_recon = apply_conditioning(x_recon, cond, 0)

        assert noise.shape == x_recon.shape

        if self.predict_epsilon:
            loss, info = self.loss_fn(x_recon, noise)  # ← focus
        else:
            loss, info = self.loss_fn(x_recon, x_start)

        return loss, info
  • Remember that loss_fn is the weighted L2 loss that assigns no loss at (episode) timestep 0!

models/diffusion.py

def loss(self, x, cond, returns=None):
        if self.train_only_inv:
            # Calculating inv loss
            x_t = x[:, :-1, self.action_dim:]
            a_t = x[:, :-1, :self.action_dim]
            x_t_1 = x[:, 1:, self.action_dim:]
            x_comb_t = torch.cat([x_t, x_t_1], dim=-1)
            x_comb_t = x_comb_t.reshape(-1, 2 * self.observation_dim)
            a_t = a_t.reshape(-1, self.action_dim)
            if self.ar_inv:
                loss = self.inv_model.calc_loss(x_comb_t, a_t)
                info = {'a0_loss':loss}
            else:
                pred_a_t = self.inv_model(x_comb_t)
                loss = F.mse_loss(pred_a_t, a_t)
                info = {'a0_loss': loss}
        else:
            batch_size = len(x)
            t = torch.randint(0, self.n_timesteps, (batch_size,), device=x.device).long()
            diffuse_loss, info = self.p_losses(x[:, :, self.action_dim:], cond, t, returns)
            # Calculating inv loss
            x_t = x[:, :-1, self.action_dim:]
            a_t = x[:, :-1, :self.action_dim]
            x_t_1 = x[:, 1:, self.action_dim:]
            x_comb_t = torch.cat([x_t, x_t_1], dim=-1)
            x_comb_t = x_comb_t.reshape(-1, 2 * self.observation_dim)
            a_t = a_t.reshape(-1, self.action_dim)
            if self.ar_inv:
                inv_loss = self.inv_model.calc_loss(x_comb_t, a_t)
            else:
                pred_a_t = self.inv_model(x_comb_t)
                inv_loss = F.mse_loss(pred_a_t, a_t)

            loss = (1 / 2) * (diffuse_loss + inv_loss)

        return loss, info
  • With the diffusion loss computed, the inverse-dynamics loss over actions is calculated
  • x_t: [1, 99, 11], a_t: [1, 99, 3]
  • x_comb_t: [1, 99, 22]
  • Both are reshaped to [batch * (horizon-1), dim]; see the pairing sketch below
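How the consecutive state pairs line up, spelled out on a toy tensor (shapes as above):

import torch

x = torch.randn(1, 100, 14)  # (batch, horizon, action_dim + obs_dim), action_dim = 3
x_t   = x[:, :-1, 3:]        # s_0 .. s_98 -> (1, 99, 11)
x_t_1 = x[:, 1:,  3:]        # s_1 .. s_99 -> (1, 99, 11)
a_t   = x[:, :-1, :3]        # a_0 .. a_98 -> (1, 99, 3)
x_comb_t = torch.cat([x_t, x_t_1], dim=-1).reshape(-1, 22)  # each (s_t, s_{t+1}) predicts a_t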

Now, how does trainer.train in scripts/train.py work?

It proceeds almost identically to the dimension test above.


The eval part

The kwargs passed in:

{'RUN.prefix': 'diffuser/default_inv/predict_epsilon_200_1000000.0/dropout_0.25/hopper-medium-expert-v2/100', 'seed': 100, 'returns_condition': True, 'predict_epsilon': True, 'n_diffusion_steps': 200, 'condition_dropout': 0.25, 'diffusion': 'models.GaussianInvDynDiffusion', 'n_train_steps': 1000000.0, 'dataset': 'hopper-medium-expert-v2', 'returns_scale': 400.0, 'RUN.job_counter': 1, 'RUN.job_name': 'predict_epsilon_200_1000000.0/dropout_0.25/hopper-medium-expert-v2/100'}

dataset and config are initialized identically

Config: <class 'diffuser.datasets.sequence.SequenceDataset'>
env: hopper-medium-expert-v2
horizon: 100
include_returns: True
max_path_length: 1000
normalizer: CDFNormalizer
preprocess_fns: []
returns_scale: 400.0
use_padding: True

diffuser config

Config: <class 'diffuser.models.diffusion.GaussianInvDynDiffusion'>
action_dim: 3
action_weight: 10
clip_denoised: True
condition_guidance_w: 1.2
hidden_dim: 256
horizon: 100
loss_discount: 1
loss_type: l2
loss_weights: None
n_timesteps: 200
observation_dim: 11
predict_epsilon: True
returns_condition: True

trainer config

Config: <class 'diffuser.utils.training.Trainer'>
bucket: /home/aajay/weights/
ema_decay: 0.995
gradient_accumulate_every: 2
label_freq: 200000
log_freq: 1000
n_reference: 8
sample_freq: 10000
save_freq: 10000
save_parallel: False
train_batch_size: 32
train_device: cuda
train_lr: 0.0002

model, diffusion, and trainer load the same way

evaluate_inv_parallel.py

    model = model_config()
    diffusion = diffusion_config(model)
    trainer = trainer_config(diffusion, dataset, renderer)
    logger.print(utils.report_parameters(model), color='green')
    trainer.step = state_dict['step']
    trainer.model.load_state_dict(state_dict['model'])
    trainer.ema_model.load_state_dict(state_dict['ema'])

    num_eval = 10
    device = Config.device

    env_list = [gym.make(Config.dataset) for _ in range(num_eval)]
    dones = [0 for _ in range(num_eval)]
    episode_rewards = [0 for _ in range(num_eval)]

    assert trainer.ema_model.condition_guidance_w == Config.condition_guidance_w
    returns = to_device(Config.test_ret * torch.ones(num_eval, 1), device)

    t = 0
    obs_list = [env.reset()[None] for env in env_list]
    obs = np.concatenate(obs_list, axis=0)
    recorded_obs = [deepcopy(obs[:, None])]
  • Each element of env_list is an OfflineHopperEnv instance
  • dones and episode_rewards have the same length
  • condition_guidance_w: 1.2
  • obs_list[0] shape: (1, 11)
  • obs shape: (10, 11): (num_eval, obs_dim)
  • recorded_obs: a list holding a single (10, 1, 11) element
    while sum(dones) < num_eval:
        obs = dataset.normalizer.normalize(obs, 'observations')
        conditions = {0: to_torch(obs, device=device)}
        samples = trainer.ema_model.conditional_sample(conditions, returns=returns)  # ← focus
        obs_comb = torch.cat([samples[:, 0, :], samples[:, 1, :]], dim=-1)
        obs_comb = obs_comb.reshape(-1, 2*observation_dim)
        action = trainer.ema_model.inv_model(obs_comb)

        samples = to_np(samples)
        action = to_np(action)

        action = dataset.normalizer.unnormalize(action, 'actions')

        if t == 0:
            normed_observations = samples[:, :, :]
            observations = dataset.normalizer.unnormalize(normed_observations, 'observations')
            savepath = os.path.join('images', 'sample-planned.png')
            renderer.composite(savepath, observations)
  • Shapes are preserved through normalize
  • to_torch converts dicts/tensors/numpy to the right dtype and moves them to the device
  • conditions: {0: the first obs for each batch element}

diffuser/models/diffusion.py

@torch.no_grad()
    def conditional_sample(self, cond, returns=None, horizon=None, *args, **kwargs):
        '''
            conditions : [ (time, state), ... ]
        '''
        device = self.betas.device
        batch_size = len(cond[0])
        horizon = horizon or self.horizon
        shape = (batch_size, horizon, self.observation_dim)

        return self.p_sample_loop(shape, cond, returns, *args, **kwargs)
  • cond[0] is entry 0 of the dictionary, holding each batch element's first obs
  • shape: (10, 100, 11)
@torch.no_grad()
    def p_sample_loop(self, shape, cond, returns=None, verbose=True, return_diffusion=False):
        device = self.betas.device

        batch_size = shape[0]
        x = 0.5*torch.randn(shape, device=device)
        x = apply_conditioning(x, cond, 0)

        if return_diffusion: diffusion = [x] # false

        progress = utils.Progress(self.n_timesteps) if verbose else utils.Silent()
        for i in reversed(range(0, self.n_timesteps)):
            timesteps = torch.full((batch_size,), i, device=device, dtype=torch.long)
            x = self.p_sample(x, cond, timesteps, returns)  # ← focus
            x = apply_conditioning(x, cond, 0)

            progress.update({'t': i})

            if return_diffusion: diffusion.append(x)

        progress.close()

        if return_diffusion:
            return x, torch.stack(diffusion, dim=1)
        else:
            return x
  • x shape: (10, 100, 11)
  • apply_conditioning: writes the t=0 obs for every batch element
  • timesteps holds each batch element's current diffusion timestep, [batch]
  • Repeating this T times returns x of shape (10, 100, 11)
@torch.no_grad()
    def p_sample(self, x, cond, t, returns=None):
        b, *_, device = *x.shape, x.device
        model_mean, _, model_log_variance = self.p_mean_variance(x=x, cond=cond, t=t, returns=returns)  # ← focus
        noise = 0.5*torch.randn_like(x)
        # no noise when t == 0
        nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))
        return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise
  • x and noise shape: [10, 100, 11]
  • nonzero_mask shape: [10, 1, 1] (so no noise is added at t = 0)
def p_mean_variance(self, x, cond, t, returns=None):
        if self.returns_condition:
            # epsilon could be epsilon or x0 itself
            epsilon_cond = self.model(x, cond, t, returns, use_dropout=False)
            epsilon_uncond = self.model(x, cond, t, returns, force_dropout=True)
            epsilon = epsilon_uncond + self.condition_guidance_w*(epsilon_cond - epsilon_uncond)
        else:
            epsilon = self.model(x, cond, t)

        t = t.detach().to(torch.int64)
        x_recon = self.predict_start_from_noise(x, t=t, noise=epsilon)

        if self.clip_denoised:
            x_recon.clamp_(-1., 1.)
        else:
            assert RuntimeError()

        model_mean, posterior_variance, posterior_log_variance = self.q_posterior(
                x_start=x_recon, x_t=x, t=t)
        return model_mean, posterior_variance, posterior_log_variance
  • Planning uses the perturbed noise exactly as described in the paper:

ε̂ = ε_θ(x_t, ∅) + w · (ε_θ(x_t, y) − ε_θ(x_t, ∅))

  • If use_dropout is False, the return condition is always included
  • If force_dropout is True, the condition is always dropped
  • The t passed in is a [batch_size] tensor holding the current diffusion timestep (in reverse order)
  • predict_start_from_noise corresponds to:

x̂_0 = √(1/ᾱ_t) · x_t − √(1/ᾱ_t − 1) · ε

def q_posterior(self, x_start, x_t, t):
        posterior_mean = (
            extract(self.posterior_mean_coef1, t, x_t.shape) * x_start +
            extract(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )
        posterior_variance = extract(self.posterior_variance, t, x_t.shape)
        posterior_log_variance_clipped = extract(self.posterior_log_variance_clipped, t, x_t.shape)
        return posterior_mean, posterior_variance, posterior_log_variance_clipped

q_posterior corresponds to the DDPM posterior:

μ̃_t(x_t, x_0) = (√(ᾱ_{t−1}) β_t / (1 − ᾱ_t)) · x_0 + (√(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · x_t,   β̃_t = ((1 − ᾱ_{t−1}) / (1 − ᾱ_t)) β_t

  • posterior_mean: [10, 100, 11]
  • posterior_variance: [10, 1, 1]

evaluate_inv_parallel.py

 while sum(dones) <  num_eval:
        obs = dataset.normalizer.normalize(obs, 'observations')
        conditions = {0: to_torch(obs, device=device)}
        samples = trainer.ema_model.conditional_sample(conditions, returns=returns)
        obs_comb = torch.cat([samples[:, 0, :], samples[:, 1, :]], dim=-1)
        obs_comb = obs_comb.reshape(-1, 2*observation_dim)
        action = trainer.ema_model.inv_model(obs_comb)

        samples = to_np(samples)
        action = to_np(action)

        action = dataset.normalizer.unnormalize(action, 'actions')

        if t == 0:
            normed_observations = samples[:, :, :]
            observations = dataset.normalizer.unnormalize(normed_observations, 'observations')
            savepath = os.path.join('images', 'sample-planned.png')
            renderer.composite(savepath, observations)
  • samples : [10, 100, 11]
  • obs_comb: [10, 22]