
On AMD platform, torch.profiler schedule(wait=x,warmup=y,active=z) option counts GPU kernel time (y+z) times rather than z times #102141

Closed
WeiMa01 opened this issue May 24, 2023 · 2 comments
Assignees
Labels
module: rocm AMD GPU support for Pytorch oncall: profiler profiler-related issues (cpu, gpu, kineto)

Comments

@WeiMa01

WeiMa01 commented May 24, 2023

🐛 Describe the bug

When I use torch.profiler on an AMD platform to profile GPT model inference, the raw data looks unreasonable: the total "self CUDA" time of the CPU operations is far less than the total "self CUDA" time of the GPU kernels.
After experimenting and analysing the raw data, I found that the GPU kernels' "self CUDA" time is accumulated over both the warmup and active steps, i.e. counted (y+z) times instead of the expected z times.
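To make the expected behaviour concrete, here is a minimal, torch-free sketch of the phase logic behind `torch.profiler.schedule` (simplified: single cycle, no `repeat`/`skip_first`). With `wait=1, warmup=1, active=2`, only the two RECORD steps should contribute kernel time; the bug described above also counts the WARMUP step:

```python
from enum import Enum

class ProfilerAction(Enum):
    """Simplified stand-in for torch.profiler.ProfilerAction."""
    NONE = 0
    WARMUP = 1
    RECORD = 2

def schedule(wait, warmup, active):
    """Illustrative re-implementation of the wait/warmup/active phases
    of torch.profiler.schedule (single cycle only)."""
    def fn(step):
        if step < wait:
            return ProfilerAction.NONE      # idle: nothing collected
        if step < wait + warmup:
            return ProfilerAction.WARMUP    # tracing on, results discarded
        if step < wait + warmup + active:
            return ProfilerAction.RECORD    # results kept
        return ProfilerAction.NONE
    return fn

sched = schedule(wait=1, warmup=1, active=2)
actions = [sched(i) for i in range(4)]   # one action per prof.step()
recorded = sum(a is ProfilerAction.RECORD for a in actions)
print(recorded)  # 2 -- kernel time should be counted z=2 times, not y+z=3
```

Under this schedule, step 0 is idle, step 1 is warmup, and steps 2–3 are the only ones whose kernel times should appear in the report.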

Python script:

```python
with torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler(dir_name='./logs'),
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    with_modules=True,
    record_shapes=True,
    profile_memory=True,
    with_stack=True,
) as prof:
    with torch.no_grad():
        for i in range(4):
            logits = model.generate(**input_ids, do_sample=True, num_beams=1,
                                    min_length=12, max_new_tokens=12,
                                    pad_token_id=50256)
            prof.step()
print(prof.key_averages(group_by_input_shape=True).table(
    row_limit=1000000, sort_by='self_cuda_time_total'))
```

Versions

Model: GPT-J-6B, FP16 inference
H/W: 8× MI100
S/W: PyTorch 1.13.0, ROCm 5.4, Transformers 4.29.2

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @robieta @chaekit @aaronenyeshi @ngimel @nbcsm @guotuofeng @guyang3532 @gaoteng-git @tiffzhaofb @dzhulgakov @davidberard98

@soulitzer soulitzer added module: rocm AMD GPU support for Pytorch oncall: profiler profiler-related issues (cpu, gpu, kineto) labels May 25, 2023
@mwootton
Contributor

@WeiMa01
Good catch, you have it exactly right. Fix is here:
pytorch/kineto#702

This was merged into Kineto, and PyTorch 2.0.0 updated the Kineto submodule commit, so this is fixed in 2.0.
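If it's unclear whether a given installation carries the fix, a quick check against the installed version string helps (per the comment above, the fix shipped with PyTorch 2.0.0; pass `torch.__version__` in a real environment — shown here with literal strings so the sketch runs without torch):

```python
import re

def has_profiler_fix(version: str) -> bool:
    """Return True if the PyTorch version is 2.0 or newer, the first
    release that bundles the updated Kineto submodule with the fix."""
    m = re.match(r'(\d+)\.(\d+)', version)
    if m is None:
        raise ValueError(f"unparseable version string: {version!r}")
    return (int(m.group(1)), int(m.group(2))) >= (2, 0)

print(has_profiler_fix("1.13.0"))  # False -- affected by the bug
print(has_profiler_fix("2.0.0"))   # True  -- contains the fix
```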

@jeffdaily
Collaborator

Since this issue was fixed by the time of the PyTorch 2 release, please use a newer PyTorch version that has the fix; we can't patch older releases. Closing, since this issue is known and fixed in the latest PyTorch. Please reopen if you are still having issues.
