
PyTorch hangs at import when used together with TensorFlow #102360

Open
albertz opened this issue May 26, 2023 · 11 comments
Labels
topic: binaries · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

albertz (Contributor) commented May 26, 2023

🐛 Describe the bug

Code:

import tensorflow
import torch

In some cases, this hangs at the import torch.

Importing them the other way around, or importing only torch, does not hang. However, I'm reporting it here because the stacktrace still looks suspicious.

Specifically, on my system (Ubuntu 22.04, using the distribution Python 3.10), I have TensorFlow 2.12 and PyTorch 2.0.1. The same also happens with Python 3.11.
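
For completeness, the order that does not hang on my setup (same wheels, just the two imports swapped):

import torch        # importing torch first does not hang here
import tensorflow   # importing TensorFlow afterwards works fine

print(torch.__version__, tensorflow.__version__)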

The stacktrace of the hang:

0x00007ffff7d51992 in __GI___libc_read (fd=0, buf=0x7fffffffa6f7, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:26                                              
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.                                                                                                           
(gdb) bt                                                                                
#0  0x00007ffff7d51992 in __GI___libc_read (fd=0, buf=0x7fffffffa6f7, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:26                                                         
#1  0x00007ffff43af518 in std::random_device::_M_getval() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fff5d55a6ef in _GLOBAL__sub_I_IpcFabricConfigClient.cpp () from /u/zeyer/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=3, argv=argv@entry=0x7fffffffd1c8, env=env@entry=0x555557198d40) at ./elf/dl-init.c:70
#4  0x00007ffff7fc9568 in call_init (env=0x555557198d40, argv=0x7fffffffd1c8, argc=3, l=<optimized out>) at ./elf/dl-init.c:33
...

The fd=0, coming from std::random_device::_M_getval(), looks very suspicious to me. It looks like the std::random_device is not properly initialized? Code here and here.

In other cases, I have also seen the error "random_device could not be read". This seems to be closely related; maybe it got a different uninitialized _M_fd value.

I also reported this here: rwth-i6/returnn#1339

Some related issues:
https://discuss.pytorch.org/t/random-device-could-not-be-read/138697 (very related)
JohnSnowLabs/spark-nlp#5943
https://discuss.tensorflow.org/t/tensorflow-linux-wheels-are-being-upgraded-to-manylinux2014/8339
h2oai/datatable#2453
robjinman/pro_office_calc#5
boostorg/fiber#249
microsoft/LightGBM#1516

Versions

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: 15.0.7
CMake version: version 3.26.3
Libc version: glibc-2.35

Python version: 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 980
Nvidia driver version: 530.41.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
CPU family: 6
Model: 158
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
CPU(s) scaling MHz: 42%
CPU max MHz: 4100.0000
CPU min MHz: 800.0000
BogoMIPS: 6000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1.5 MiB (6 instances)
L3 cache: 9 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-5
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled

Versions of relevant libraries:
[pip3] flake8==4.0.1
[pip3] numpy==1.23.5
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchdata==0.6.1
[pip3] triton==2.0.0
[conda] Could not collect

albertz (Contributor, Author) commented May 26, 2023

Interestingly, using TensorFlow 2.10 does not seem to cause the hang in PyTorch; at least I don't get the hang then. However, I don't have a proper CUDA environment set up for it, so TF fails to load some CUDA libs, which might also influence the behavior. Or maybe TF 2.12 also behaves a bit differently w.r.t. the CUDA libs and loads them more lazily. I'm not sure.

$ python3.10 -c "import tensorflow; import torch"
2023-05-26 11:11:20.630355: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-26 11:11:20.726512: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cudnn-10.1-v7.6/lib64:/usr/local/cudnn-9.1-v7.1/lib64:/usr/local/cudnn-8.0-v7.0/lib64:/usr/local/cudnn-8.0-v6.0/lib64:/usr/local/cudnn-8.0-v5.1/lib64:/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/lib:/usr/local/cuda-6.5/lib64:/usr/lib/atlas-base:/usr/local/cuda-7.5/lib64
2023-05-26 11:11:20.726534: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-05-26 11:11:20.747490: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-05-26 11:11:22.222958: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cudnn-10.1-v7.6/lib64:/usr/local/cudnn-9.1-v7.1/lib64:/usr/local/cudnn-8.0-v7.0/lib64:/usr/local/cudnn-8.0-v6.0/lib64:/usr/local/cudnn-8.0-v5.1/lib64:/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/lib:/usr/local/cuda-6.5/lib64:/usr/lib/atlas-base:/usr/local/cuda-7.5/lib64
2023-05-26 11:11:22.223032: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cudnn-10.1-v7.6/lib64:/usr/local/cudnn-9.1-v7.1/lib64:/usr/local/cudnn-8.0-v7.0/lib64:/usr/local/cudnn-8.0-v6.0/lib64:/usr/local/cudnn-8.0-v5.1/lib64:/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda-10.1/extras/CUPTI/lib64:/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/lib:/usr/local/cuda-6.5/lib64:/usr/lib/atlas-base:/usr/local/cuda-7.5/lib64
2023-05-26 11:11:22.223042: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Why did I try TF 2.10? Because that is what we use in our GitHub CI, and it works there.

But in addition to that, I have read here that TF changed something in recent versions, and they mention:

Q2. What kinds of breakages during the build process are most likely related to these changes?
RuntimeError: random_device could not be read

So it mentions exactly the error I saw (though I saw this error in PyTorch...).

ezyang added the "triaged" and "topic: binaries" labels on May 27, 2023
ezyang (Contributor) commented May 27, 2023

This sort of problem tends to be quite difficult to diagnose, but one thing you could try is building PyTorch and TF from source with the same compiler toolchain.

5cat commented Jul 5, 2023

Going back to torch < 2 fixes this issue for me.

iXce commented Jul 13, 2023

This doesn't seem to reproduce with nightlies (e.g. 2.1.0.dev20230712+cpu). Is it something that could be solved by a rebuild in a patch version with minor toolchain changes?

yyyuhan commented Aug 21, 2023

My friend and I encountered the same issue here: the process became unresponsive, waiting on a blocking 4-byte read from standard input (fd=0).

System Info:
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.0-79-generic

Temporary Solution:
Downgrading torch to version < 2.0.0.

Unfortunately, the import is entangled with my project, so I'm not able to provide a script that reproduces the error. Here are some other findings from debugging the script with GDB:

  1. Observing the internal variables of the current object
    By casting the address of the relevant pointer within _M_getval to the intended type, I got the following values:

(gdb) p *(('std::random_device'*)0x7f68766c8fe0)
$1 = {{{_M_file = 0x75ad8b0, _M_func = 0x0, _M_fd = 0}, _M_mt = {static state_size = 624, _M_x = {123394224,
        0 <repeats 623 times>}, _M_p = 0}}}

Then I took a look at what _M_file points to:

(gdb) p *((FILE*)0x75ad8b0)
$3 = {_flags = -72539000, _IO_read_ptr = 0x0, _IO_read_end = 0x0, _IO_read_base = 0x0, _IO_write_base = 0x0, _IO_write_ptr = 0x0, _IO_write_end = 0x0, _IO_buf_base = 0x0, _IO_buf_end = 0x0, _IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0, _markers = 0x0, _chain = 0x7f6a0c6575c0 <_IO_2_1_stderr_>, _fileno = 4, _flags2 = 128, _old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\000', _shortbuf = "", _lock = 0x75ad990, _offset = -1, __pad1 = 0x0, __pad2 = 0x75ad9a0, __pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = 0, _unused2 = '\000' <repeats 19 times>}

The _fileno = 4 here points to /dev/urandom, which is the correct target file to read from:

lsof -p 362467 -a -d 4
COMMAND    PID              USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python  362467 NAME    4r   CHR    1,9      0t0   10 /dev/urandom

However, execution went down a different path and ended up reading from fd=0 instead.

  2. Investigating the sequence of execution
    To check how the file descriptors get initialized, I set breakpoints at random_device::_M_init and single-stepped through it. I observed that the code entered the branch where the macro USE_POSIX_FILE_IO is not defined. Here is the section of code that I'm referring to: https://github.com/gcc-mirror/gcc/blob/d77c280454cfba48ef38357145cecdabc8c1b05c/libstdc%2B%2B-v3/src/c%2B%2B11/random.cc#L489C21-L489C21

I was unable to inspect the preprocessor macro itself, as it had been optimized out during compilation. However, POSIX file I/O is fully supported system-wide.

Yet when stepping through random_device::_M_getval, execution took the other branch of USE_POSIX_FILE_IO:
https://github.com/gcc-mirror/gcc/blob/d77c280454cfba48ef38357145cecdabc8c1b05c/libstdc%2B%2B-v3/src/c%2B%2B11/random.cc#L596-L608

This occurrence is quite perplexing to me; within the same process, different branches of the macro definition were activated. If anyone possesses insights into this situation, I would greatly appreciate any input. Thank you.

DaveyBiggers commented (quoting @yyyuhan):

This occurrence is quite perplexing to me; within the same process, different branches of the macro definition were activated. If anyone possesses insights into this situation, I would greatly appreciate any input. Thank you.

Does this mean we're dealing with two conflicting versions of the lib, one compiled with USE_POSIX_FILE_IO, and one without? Not really sure how this could arise.

I can reproduce this, by the way, just by doing:

import decord
import torch

decord==0.6.0
torch==2.0.1+cu117

I've tried this with Python 3.8.16 and 3.11.4.

If I import torch first, and then decord, the hang doesn't happen.

yyyuhan commented Sep 6, 2023

@DaveyBiggers Dave is correct -- thank you for the clue!

With the 'import' example Dave shared above, I confirmed that random_device::_M_init and random_device::_M_getval were resolved to references in two different dynamic libraries:

(gdb) i symbol 0x7fff55a8b1b0
std::random_device::_M_init(std::string const&) in section .text of /home/xxx/miniconda3/envs/torch2/lib/python3.9/site-packages/decord/libdecord.so
(gdb) i symbol 0x7ffff4cc2160
std::random_device::_M_getval() in section .text of /home/xxx/miniconda3/envs/torch2/bin/../lib/libstdc++.so.6

This wouldn't happen at compile time because of C++'s one-definition rule (ODR), but it can cause trouble during dynamic linking, as we see in this example.
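
A lighter-weight first check than gdb, if useful (a Linux-only sketch; note it only lists separately mapped libstdc++ copies, so libstdc++ symbols statically linked into another .so such as libdecord.so will not show up):

# List every libstdc++ mapped into the current process after the first import.
import decord  # noqa: F401  (or tensorflow, tensorboard_logger, ...)

paths = set()
with open("/proc/self/maps") as maps:  # Linux-only
    for line in maps:
        parts = line.split()
        if len(parts) >= 6 and "libstdc++" in parts[-1]:
            paths.add(parts[-1])
print(paths)  # e.g. the conda env's libstdc++.so.6 vs. the system one

import torch  # noqa: E402,F401  (the import where the hang was observed)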

The solutions (other than downgrading torch) are:

  1. Set the environment variable LD_PRELOAD to the desired dynamic lib (see the sketch after this list):
    LD_PRELOAD=/home/xxx/miniconda3/envs/torch2/bin/../lib/libstdc++.so python your_script.py

  2. Change the order of packages to import, i.e. import torch first

  3. (Not verified yet) Would the torch team consider setting the rpath flag when compiling the libtorch.so?
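
A minimal sketch of solution 1 in script form (untested here; the libstdc++ path below is a placeholder, adjust it to your environment). LD_PRELOAD is only honored by the dynamic loader at process start, so the script re-execs the interpreter with the variable set before any heavy imports happen:

import os
import sys

WANTED_LIBSTDCXX = "/lib/x86_64-linux-gnu/libstdc++.so.6"  # placeholder path

# Re-exec the interpreter with LD_PRELOAD set, since the loader only reads it at startup.
if os.environ.get("LD_PRELOAD") != WANTED_LIBSTDCXX:
    env = dict(os.environ, LD_PRELOAD=WANTED_LIBSTDCXX)
    os.execve(sys.executable, [sys.executable] + sys.argv, env)

import tensorflow  # noqa: E402,F401
import torch  # noqa: E402,F401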

Cheers.

albertz (Contributor, Author) commented Sep 6, 2023

In the TF discussion I linked above (here), they don't mention USE_POSIX_FILE_IO, but they do mention that they use the new libstdc++ ABI, i.e. _GLIBCXX_USE_CXX11_ABI. I wonder if that is related as well.
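
For reference, both libraries expose their C++ ABI setting at runtime; a small check of my own (importing torch first to sidestep the hang) could look like:

import torch
import tensorflow as tf

print("torch built with _GLIBCXX_USE_CXX11_ABI:", torch.compiled_with_cxx11_abi())
print("tensorflow CXX11_ABI_FLAG:", tf.sysconfig.CXX11_ABI_FLAG)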

ChimesZ commented Nov 1, 2023

Recently I encountered a similar issue when using both tensorboard and torch.

device = torch.device(opt.device if torch.cuda.is_available() else 'cpu')

if torch.cuda.is_available():
  model_t = model_t.to(device)
  model_t = nn.DataParallel(model_t, device_ids=opt.device_id)
  model_s = model_s.to(device)
  model_s = nn.DataParallel(model_s, device_ids=opt.device_id)
  # criterion = criterion.to(device)
  cudnn.benchmark = True

This code raised the following error in the terminal:

torch._C._cuda_init()
RuntimeError: random_device could not be read

Magically, changing the import order solved the problem, from

import tensorboard_logger as tb_logger
import torch 

to

import torch
import tensorboard_logger as tb_logger

I don't know much about the mechanism behind this solution, but I guess it is related to the discussion above.
Hope this message helps somebody; any explanation for this problem is welcome :)

sid-kap commented Nov 1, 2023

@ChimesZ Have you tried @yyyuhan's LD_PRELOAD solution?

For me, setting LD_PRELOAD="/lib/x86_64-linux-gnu/libstdc++.so.6" before starting python fixed it.

shiva500 commented
My issue got resolved after downgrading to a PyTorch version below 2. Please refer to the link below for the compatibility matrix:
https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix

[Screenshot: PyTorch release compatibility matrix]
