Intermittent matter devices unavailable - see liveness timeout on one peer #100966
Comments
Hey there @home-assistant/matter, mind taking a look at this issue as it has been labeled with an integration (matter)? (message by CodeOwnersMention)
After more investigation, it seems the three Tuo devices (two buttons, one contact sensor) may have firmware issues that cause these subscription problems. Or could it be an HA Matter issue?
Based on my own investigation, a number of Thread devices have these issues, for example the Nanoleaf Essentials Matter products. They just seem to drop off briefly and then come back (maybe the device itself restarts?). Because we publish every update in real time, you can actually see these quick dropouts in the HA log. We could be a bit more forgiving and only mark a device as unavailable after x seconds/minutes of disconnection (like Apple does), but then we would be hiding the underlying issue, which is maybe not the best idea in the current beta state of Matter.

Note that HA uses the exact same underlying Matter SDK as Google and Apple, so chances are very small that this is an HA-specific issue. What can make a difference is the host that runs HA and the networking gear. Over the past months we have been busy exploring issues with networking gear (or even Linux kernels) that is not entirely prepared for the IPv6 mDNS/multicast packets used by the Matter protocol.

The route we test the most is Home Assistant OS running the official Matter add-on. We provide an experimental Docker image for running the Matter server yourself, but we have seen cases where the underlying Linux host had issues in its network implementation, causing routing problems with Matter packets. Yes, it can be that low level at this point.

In any case, it is not as easy as it may seem, and Thread-based Matter devices in particular are still a work in progress. Not only for us, but for the entire market. It improves a lot month by month, but there are still some foundational connectivity issues left.
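The "more forgiving" option mentioned above (only reporting a device as unavailable after a period of disconnection) can be illustrated with a minimal sketch. The `AvailabilityDebouncer` class, its method names, and the 30-second grace period are all hypothetical and chosen for illustration; this is not how Home Assistant, the Matter server, or Apple implement availability.

```python
import time
from typing import Callable, Optional


class AvailabilityDebouncer:
    """Hypothetical helper: hide disconnects shorter than a grace period."""

    def __init__(
        self,
        grace_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._grace = grace_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._offline_since: Optional[float] = None

    def on_connected(self) -> None:
        # Any successful (re)subscription clears the pending outage.
        self._offline_since = None

    def on_disconnected(self) -> None:
        # Remember only the start of the outage; repeated events do not reset it.
        if self._offline_since is None:
            self._offline_since = self._clock()

    @property
    def available(self) -> bool:
        if self._offline_since is None:
            return True
        # Still reported as available while the outage is shorter than the grace period.
        return (self._clock() - self._offline_since) < self._grace


# Usage with a fake clock (seconds), assuming a 30 s grace period.
now = [0.0]
node = AvailabilityDebouncer(grace_seconds=30, clock=lambda: now[0])
node.on_disconnected()
now[0] = 10.0
print(node.available)  # short dropout is hidden
now[0] = 31.0
print(node.available)  # outage exceeded the grace period
```

The trade-off is the one described above: short dropouts no longer show up as unavailable entities, but they also become harder to notice while debugging.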
That is quite possible; we saw the same sort of issues with Eve's early beta firmware for Matter support. They stabilized it with every firmware update, and I now have a few test devices here that have been connected over Thread for weeks without a single dropout. At the same time, I have a Nanoleaf Essentials bulb here. When I connect that bulb to my Thread network, the entire network somehow becomes unstable and even the otherwise stable Eve devices become unavailable. The point I'm trying to make is that the issue can be caused by many factors.

What could help is to slim down the network first. Try one device at a time until you find the cause. Also, you have disabled IGMP snooping. In some cases it might actually help to enable it, to reduce congestion from multicast traffic flooding everywhere. Try toggling between enabled and disabled to see if there's a difference.
OK, I have a bit more feedback here. I removed all the Tuo devices (contact sensor, buttons), and the HA logs cleared up. I re-enabled IGMP snooping on my 3 switches. However, I then saw frequent mDNS query timeout messages in the HA logs. I disabled IGMP snooping everywhere, rebooted the HAOS Proxmox 8 host, rebooted the two Apple TVs, and now the HA logs are quiet again. There are only a couple of one-off re-subscription events for an Eve contact sensor.
OK, thanks for the additional info. So in your case you actually had to disable IGMP snooping, which means it's probably not dealing well with the IPv6 multicast packets (just a random guess on my part). A couple of one-off re-subscription events are OK: the device is IP based, so it can suffer from an occasional drop or just an IP change, and the Matter server should re-subscribe very quickly when that happens. The problem is when you see disconnects very often (paired with mDNS discovery timeouts in the server logs); that means the device can no longer be reached and/or discovered.

So, is your original issue now resolved?
My core switch is a QNAP QSW-M2116P-2T2S with firmware 2.0.0.22052. I'm procuring a new core switch, a TP-Link TL-SG3428X-M2. I'll see if that makes any difference. My HAOS is also a VM on Proxmox 8 with Linux 6.2.16-15-pve. The Linux bridge in Proxmox does NOT have "VLAN aware" ticked, and I have no special sysctl interface settings.
Can we close this one?
@marcelveldt Yup. Likely tied to buggy QNAP firmware. Using Netgear M4300 and TP-Link Jetstream switches with zero issues.
The problem
I'm seeing occasional issues with some Matter devices being unavailable for a period of time, after which they come back online. In the Matter logs I consistently see this issue with peer "F":
chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xda9e7e45, Peer = 01:000000000000000F
My home network consists of two ATV 4Ks, tvOS 17, and IGMP snooping is disabled on all switches/routers. All Matter devices are Thread based. I'm not sure what peer device "F" is. Both ATVs are hardwired to the network.
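To help work out which device peer "F" is, the quoted log line can be parsed into its numeric parts. The sketch below assumes, based only on the single line shown above, that the `Peer` field is a hexadecimal fabric index followed by a 16-digit hexadecimal node ID; the function and type names are hypothetical. Under that assumption, `000000000000000F` is node ID 15 in decimal, which may be easier to match against a node list.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the one log line quoted in this issue:
#   "Peer = <fabric index, hex>:<node ID, 16 hex digits>"
LIVENESS_RE = re.compile(
    r"Subscription Liveness timeout with SubscriptionID = 0x(?P<sub>[0-9a-fA-F]+), "
    r"Peer = (?P<fabric>[0-9a-fA-F]+):(?P<node>[0-9a-fA-F]{16})"
)


class LivenessTimeout(NamedTuple):
    subscription_id: int
    fabric_index: int
    node_id: int


def parse_liveness_timeout(line: str) -> Optional[LivenessTimeout]:
    """Return the parsed fields, or None if the line is not a liveness timeout."""
    match = LIVENESS_RE.search(line)
    if match is None:
        return None
    return LivenessTimeout(
        subscription_id=int(match["sub"], 16),
        fabric_index=int(match["fabric"], 16),
        node_id=int(match["node"], 16),
    )


line = (
    "chip.DMG[126] ERROR Subscription Liveness timeout with "
    "SubscriptionID = 0xda9e7e45, Peer = 01:000000000000000F"
)
event = parse_liveness_timeout(line)
print(event.fabric_index, event.node_id)  # prints: 1 15
```

Running this over a whole log file and counting events per `node_id` would show whether the timeouts really are concentrated on a single device.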
What version of Home Assistant Core has the issue?
core-2023.9.3
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Matter
Link to integration documentation on our website
https://www.home-assistant.io/integrations/matter
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Additional information
No response