On AMD platform, torch.profiler's schedule(wait=x, warmup=y, active=z) option counts GPU kernel time (y+z) times rather than z times #102141
Labels
module: rocm
AMD GPU support for Pytorch
oncall: profiler
profiler-related issues (cpu, gpu, kineto)
🐛 Describe the bug
When I used torch.profiler on an AMD platform to profile GPT model inference, I found the raw data was unreasonable: the total "self CUDA" time of the CPU operations was far less than the total "self CUDA" time of the GPU kernels.
After experimenting and analysing the raw data, I found that the GPU kernels' "self CUDA" time is accumulated over both the warmup and active steps (y + z steps), rather than over the active steps alone.
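For reference, the intended semantics of `torch.profiler.schedule` are that only the `active` steps are recorded, so kernel time should be counted z times. A minimal sketch of that step state machine, assuming the documented wait/warmup/active behaviour (the `Action` enum and `schedule` function below are illustrative reimplementations, not the torch API):

```python
from enum import Enum

class Action(Enum):
    """Illustrative stand-in for torch.profiler.ProfilerAction."""
    NONE = 0
    WARMUP = 1
    RECORD = 2
    RECORD_AND_SAVE = 3

def schedule(wait: int, warmup: int, active: int):
    """Return a function mapping a step number to the expected profiler action."""
    def fn(step: int) -> Action:
        cycle = step % (wait + warmup + active)  # cycles repeat indefinitely
        if cycle < wait:
            return Action.NONE
        if cycle < wait + warmup:
            return Action.WARMUP
        if cycle < wait + warmup + active - 1:
            return Action.RECORD
        return Action.RECORD_AND_SAVE  # last active step also saves the trace
    return fn

# Same parameters as the repro script: wait=1, warmup=1, active=2, 4 steps total.
fn = schedule(wait=1, warmup=1, active=2)
actions = [fn(step) for step in range(4)]

# Only RECORD / RECORD_AND_SAVE steps should contribute to "self CUDA" totals.
recorded = [s for s, a in enumerate(actions)
            if a in (Action.RECORD, Action.RECORD_AND_SAVE)]
print(recorded)  # [2, 3] — exactly the z=2 active steps, not the y+z=3 warmup+active steps
```

If the reported totals instead correspond to y + z steps, the warmup steps' kernel time is leaking into the aggregation, which matches the numbers observed above.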
Python script:

```python
with torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler(dir_name='./logs'),
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    with_modules=True,
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
) as prof:
    with torch.no_grad():
        for i in range(4):
            logits = model.generate(**input_ids, do_sample=True, num_beams=1,
                                    min_length=12, max_new_tokens=12,
                                    pad_token_id=50256)
            prof.step()

print(prof.key_averages(group_by_input_shape=True).table(
    row_limit=1000000, sort_by='self_cuda_time_total'))
```
Versions
model: GPT-J-6B with FP16 inference
H/W: 8× MI100
S/W: PyTorch 1.13.0, ROCm 5.4, Transformers 4.29.2
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb @dzhulgakov @davidberard98