
[Inductor] [CPU] performance regression with TORCHINDUCTOR_FREEZING=1 #104952

Status: Closed

ESI-SYD opened this issue Jul 11, 2023 · 9 comments

Labels: oncall: cpu inductor (CPU Inductor issues for Intel team to triage), oncall: pt2, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


ESI-SYD commented Jul 11, 2023

🐛 Describe the bug

There are 6 performance regressions from #93531 (comment):

2023-07-09 nightly vs. 2023-07-06 nightly, and their comparison (Result Comp):

| model | batch_size (07-09) | speedup (07-09) | inductor (07-09) | eager (07-09) | batch_size (07-06) | speedup (07-06) | inductor (07-06) | eager (07-06) | speedup ratio | eager ratio | inductor ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Background_Matting | 1 | 0.782193 | 0.350721461 | 0.274331872 | 1 | 1.00737 | 0.279395653 | 0.281454799 | 0.78 | 1.03 | 0.8 |
| doctr_det_predictor | 1 | 1.090279 | 0.148405348 | 0.161803234 | 1 | 1.713053 | 0.095406578 | 0.163436525 | 0.64 | 1.01 | 0.64 |
| functorch_dp_cifar10 | 64 | 0.622732 | 0.009190167 | 0.005723011 | 64 | 1.008348 | 0.005596095 | 0.005642811 | 0.62 | 0.99 | 0.61 |
| gmlp_s16_224 | 128 | 1.068468 | 0.658434753 | 0.703516464 | 128 | 1.227975 | 0.587295424 | 0.721184098 | 0.87 | 1.03 | 0.89 |
| resmlp_12_224 | 128 | 0.749039 | 0.415152565 | 0.310965462 | 128 | 1.237528 | 0.259625741 | 0.321294124 | 0.61 | 1.03 | 0.63 |
| tnt_s_patch16_224 | 1 | 1.173545 | 0.094649226 | 0.111075126 | 1 | 1.367958 | 0.081854891 | 0.111974053 | 0.86 | 1.01 | 0.86 |
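
For clarity, the ratio columns can be read as follows (my interpretation of the dashboard output, not stated explicitly above): the speedup ratio compares the new speedup against the old one, while the eager and inductor ratios compare old absolute time against new, so values below 1 indicate a slowdown. A small sketch of that arithmetic for the Background_Matting row:

```python
# Sketch of how the ratio columns above appear to be derived (assumption: the
# "inductor"/"eager" columns are absolute times, and speedup = eager / inductor).
def regression_ratios(new, old):
    return {
        "speedup ratio": new["speedup"] / old["speedup"],     # < 1: the Inductor speedup dropped
        "eager ratio": old["eager"] / new["eager"],            # stability of the eager baseline
        "inductor ratio": old["inductor"] / new["inductor"],   # < 1: Inductor itself got slower
    }

# Background_Matting row: prints roughly 0.78, 1.03, 0.80, matching the table.
print(regression_ratios(
    new={"speedup": 0.782193, "inductor": 0.350721461, "eager": 0.274331872},  # 2023-07-09
    old={"speedup": 1.00737,  "inductor": 0.279395653, "eager": 0.281454799},  # 2023-07-06
))
```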

SW information:

| SW | Nightly commit | Main commit |
| --- | --- | --- |
| Pytorch | 9b5a84f | dd6c38c |
| Torchbench | / | 8526eabb |
| torchaudio | a233cc1 | 1e117f5 |
| torchtext | 90ea46c | 8546bbb |
| torchvision | 2ab2f74 | 657027f |
| torchdata | 9ed0325 | 901b483 |
| dynamo_benchmarks | 6226b7d | / |

Versions

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"

export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
```

```bash
python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/torchbench.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only Background_Matting --cold_start_latency

python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/torchbench.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only doctr_det_predictor --cold_start_latency

python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/torchbench.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only functorch_dp_cifar10 --cold_start_latency

python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/torchbench.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only gmlp_s16_224 --cold_start_latency

python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/timm_models.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only resmlp_12_224 --cold_start_latency

python -m torch.backends.xeon.run_cpu --core_list 0 --ncores_per_instance 1 benchmarks/dynamo/timm_models.py --inference --performance --float32 -dcpu -n50 --inductor --no-skip --dashboard --only tnt_s_patch16_224 --cold_start_latency
```
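
For anyone who wants to poke at one of these models without the dashboard harness, here is a minimal sketch. It assumes timm is installed and that setting torch._inductor.config.freezing programmatically is equivalent to TORCHINDUCTOR_FREEZING=1 (my understanding of how the env var is consumed); timings from such a micro-benchmark will not match the dashboard numbers exactly.

```python
# Minimal FP32 CPU sketch comparing Inductor with and without freezing.
# Assumes `timm` is installed; model and batch size taken from the table above.
import time
import torch
import timm

def bench(freezing: bool, iters: int = 50) -> float:
    torch._dynamo.reset()                        # clear caches so each config recompiles
    torch._inductor.config.freezing = freezing   # programmatic counterpart of TORCHINDUCTOR_FREEZING
    model = timm.create_model("resmlp_12_224", pretrained=False).eval()
    x = torch.randn(128, 3, 224, 224)            # batch_size=128 as in the report
    compiled = torch.compile(model)
    with torch.no_grad():
        compiled(x)                              # warm-up / trigger compilation
        start = time.perf_counter()
        for _ in range(iters):
            compiled(x)
    return (time.perf_counter() - start) / iters

print("freezing=0:", bench(False))
print("freezing=1:", bench(True))
```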

cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519

XiaobingSuper self-assigned this Jul 12, 2023
shunting314 added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jul 26, 2023
penguinwu added the oncall: cpu inductor label (CPU Inductor issues for Intel team to triage) and removed the module: cpu inductor label Dec 2, 2023
@chuanqi129 (Collaborator) commented:

I have double-checked against the latest test results; the 4 models below still show regressions. cc @zxd1997066 to help find the guilty commit for each of these 4 models. These are FP32, static-shape, default-wrapper tests; the first three models are tested with multiple threads, while the last one, tnt_s_patch16_224, is single-thread.

2023-07-09 nightly vs. 2023-07-06 nightly, and their comparison (Result Comp):

| model | batch_size (07-09) | speedup (07-09) | inductor (07-09) | eager (07-09) | batch_size (07-06) | speedup (07-06) | inductor (07-06) | eager (07-06) | speedup ratio | eager ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Background_Matting | 1 | 0.782193 | 0.350721461 | 0.274331872 | 1 | 1.00737 | 0.279395653 | 0.281454799 | 0.78 | 1.03 |
| functorch_dp_cifar10 | 64 | 0.622732 | 0.009190167 | 0.005723011 | 64 | 1.008348 | 0.005596095 | 0.005642811 | 0.62 | 0.99 |
| resmlp_12_224 | 128 | 0.749039 | 0.415152565 | 0.310965462 | 128 | 1.237528 | 0.259625741 | 0.321294124 | 0.61 | 1.03 |
| tnt_s_patch16_224 | 1 | 1.173545 | 0.094649226 | 0.111075126 | 1 | 1.367958 | 0.081854891 | 0.111974053 | 0.86 | 1.01 |

@leslie-fang-intel (Collaborator) commented:

@zxd1997066 Please help to find the guilty commit for each regression, so we can take a look.

leslie-fang-intel removed their assignment Dec 25, 2023
@zxd1997066 (Contributor) commented:

I cannot reproduce the 2023-07-06 nightly results on my side for these 4 models.

> I have double-checked against the latest test results; the 4 models below still show regressions. cc @zxd1997066 to help find the guilty commit for each of these 4 models. These are FP32, static-shape, default-wrapper tests; the first three models are tested with multiple threads, while the last one, tnt_s_patch16_224, is single-thread.
>
> | model | batch_size (07-09) | speedup (07-09) | inductor (07-09) | eager (07-09) | batch_size (07-06) | speedup (07-06) | inductor (07-06) | eager (07-06) | speedup ratio | eager ratio |
> | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
> | Background_Matting | 1 | 0.782193 | 0.350721461 | 0.274331872 | 1 | 1.00737 | 0.279395653 | 0.281454799 | 0.78 | 1.03 |
> | functorch_dp_cifar10 | 64 | 0.622732 | 0.009190167 | 0.005723011 | 64 | 1.008348 | 0.005596095 | 0.005642811 | 0.62 | 0.99 |
> | resmlp_12_224 | 128 | 0.749039 | 0.415152565 | 0.310965462 | 128 | 1.237528 | 0.259625741 | 0.321294124 | 0.61 | 1.03 |
> | tnt_s_patch16_224 | 1 | 1.173545 | 0.094649226 | 0.111075126 | 1 | 1.367958 | 0.081854891 | 0.111974053 | 0.86 | 1.01 |

@leslie-fang-intel (Collaborator) commented:

@zxd1997066 @chuanqi129 will check the performance data from before the regression.

@zxd1997066 (Contributor) commented:

Update: verified on the 2023-07-06 nightly (13763f5) for Background_Matting.

Without freezing (good):

```
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,Background_Matting,1,0.997587,279.893412,43.882071,0.987273,478.472192,484.640358,183,1,0,0
```

With freezing (bad):

```
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cpu,Background_Matting,1,0.787942,356.164509,46.301095,0.985237,473.252659,480.344064,183,1,0,0
```

For commits before the 7/6 nightly, running with TORCHINDUCTOR_FREEZING=1 crashes:
[crash screenshot]

But with the 2024-01-29 nightly (890d8e6), performance is bad both with and without freezing:

```
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles
cpu,Background_Matting,1,0.854602,326.251965,15.495688,0.982960,477.204070,485.476762,183,1,0,0,0,0
```
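
Reading the abs_latency column (in milliseconds, if I read the harness output correctly): on the 2023-07-06 nightly, freezing gives 356.16 ms vs. 279.89 ms without, i.e. roughly a 27% slowdown (356.16 / 279.89 ≈ 1.27), consistent with the speedup dropping from ~1.00 to ~0.79. The single row pasted for the 2024-01-29 nightly shows 326.25 ms (speedup ≈ 0.85), still well above the July baseline.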

@leslie-fang-intel (Collaborator) commented:

> Update: verified on the 2023-07-06 nightly (13763f5) for Background_Matting.
>
> Without freezing (good):
> dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
> cpu,Background_Matting,1,0.997587,279.893412,43.882071,0.987273,478.472192,484.640358,183,1,0,0
>
> With freezing (bad):
> dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
> cpu,Background_Matting,1,0.787942,356.164509,46.301095,0.985237,473.252659,480.344064,183,1,0,0

Thanks @chuanqi129 @zxd1997066. For Background_Matting, do you mean the regression is due to a change in our testing semantics (TORCHINDUCTOR_FREEZING=1 vs. TORCHINDUCTOR_FREEZING=0) rather than a code check-in?

@zxd1997066 (Contributor) commented:

It is hard to say, since it is a very early report. But per my verification, TORCHINDUCTOR_FREEZING=1 and TORCHINDUCTOR_FREEZING=0 do make a difference on the same commit (13763f5).

BTW, when using TORCHINDUCTOR_FREEZING=0, tnt_s_patch16_224, functorch_dp_cifar10, and Background_Matting show performance regressions with the latest PyTorch. tnt_s_patch16_224 and functorch_dp_cifar10 share the same suspected guilty commit (7e098f9), while Background_Matting's guilty commit is 7c97c94. I will submit separate issues for them.

@zxd1997066 (Contributor) commented:

- tnt_s_patch16_224 and functorch_dp_cifar10 regression: #119178
- Background_Matting regression: #119181
- resmlp_12_224 has no regression when using TORCHINDUCTOR_FREEZING=0
- the gap between TORCHINDUCTOR_FREEZING=1 and TORCHINDUCTOR_FREEZING=0: #119183

@leslie-fang-intel (Collaborator) commented:

Closing this issue, as it is now tracked in the new issues above, grouped by guilty commit.
