Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent matter devices unavailable - see liveness timeout on one peer #100966

Closed
Coder84619 opened this issue Sep 26, 2023 · 9 comments
Closed

Comments

@Coder84619
Copy link

The problem

I'm seeing occasion issues with some Matter devices being unavailable for a period of time, then they come back online. In the Matter logs I consistently see this issue with peer "F":

chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xda9e7e45, Peer = 01:000000000000000F

My home network consists of two ATV 4Ks, tvOS 17, and IGMP snooping is disabled on all switches/routers. All Matter devices are Thread based. I'm not sure what peer device "F" is. Both ATVs are hardwired to the network.

What version of Home Assistant Core has the issue?

core-2023.9.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Matter

Link to integration documentation on our website

https://www.home-assistant.io/integrations/matter

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-09-26 07:15:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:48 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/0000000000000011 = b'\x150\x03\x10|\xe7\xcc$Y\x97\xfcq\xe5\xb9\xfa\xd1iJ[\x0e0\x04 \x95\x15\xb8\x8dUV\xdf\x0e\xfe\xa5\xb8\xdc-\xb6\xd6l;\xb9E\xea\xf3\xf1}\xbdi\xa7!\x1f\x89\xbb\xb9\xca0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 07:15:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:48 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/fOfMJFmX/HHlufrRaUpbDg== = b'\x15$\x01\x01$\x02\x11\x18'
2023-09-26 07:15:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:48 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0x20d3fd43, Peer = 01:000000000000000F
2023-09-26 07:15:48 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 07:15:48 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 07:15:48 core-matter-server matter_server.server.device_controller.[node 17][126] INFO Re-Subscription succeeded
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/q/tXnhTyDsojTe4abqs/MA==
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\x7fN\xa60i=\xd0\xf0\xf2mc\xcac\xba\xcf\x7f0\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/f06mMGk90PDybWPKY7rPfw== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 07:15:49 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:15:50 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 07:15:50 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 07:47:54 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0x05ebc966, Peer = 01:000000000000000F
2023-09-26 07:47:54 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/f06mMGk90PDybWPKY7rPfw==
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\xac_!l8\x1f\x1c:_.\xdef \x9c>^0\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/rF8hbDgfHDpfLt5mIJw+Xg== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 07:47:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 07:47:55 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 07:47:55 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 08:10:17 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xdb18bde8, Peer = 01:000000000000000F
2023-09-26 08:10:17 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/rF8hbDgfHDpfLt5mIJw+Xg==
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\x18\x14\xd3\xbc4~\xeb\xad\xf9\xac\xe2\x92\xb9\x96\xbd\xc20\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/GBTTvDR+6635rOKSuZa9wg== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 08:10:19 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 08:10:20 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 08:10:20 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 08:10:49 core-matter-server chip.EM[126] ERROR Failed to Send CHIP MessageCounter:119075810 on exchange 15294r sendCount: 4 max retries: 4
2023-09-26 09:38:24 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xa8010d77, Peer = 01:000000000000000F
2023-09-26 09:38:24 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/GBTTvDR+6635rOKSuZa9wg==
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b"\x150\x03\x10\xfe\xa5\xa9[j\xd9'eV\xba\xf4-\xb6\x0b_\xb10\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18"
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s//qWpW2rZJ2VWuvQttgtfsQ== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 09:38:26 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 09:38:27 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 09:38:27 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 10:34:37 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xa899df97, Peer = 01:000000000000000F
2023-09-26 10:34:37 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s//qWpW2rZJ2VWuvQttgtfsQ==
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\xb0\x8e\xcf\xba\nr7\xde$!S\xe5g \x1e@0\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/sI7PugpyN94kIVPlZyAeQA== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 10:34:38 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 10:34:39 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 10:34:39 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 10:35:00 core-matter-server chip.EM[126] ERROR Failed to Send CHIP MessageCounter:230028055 on exchange 53812r sendCount: 4 max retries: 4
2023-09-26 11:12:43 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0x272cbe51, Peer = 01:000000000000000F
2023-09-26 11:12:43 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/sI7PugpyN94kIVPlZyAeQA==
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\xcbpk\xcfYZ\x02\xfdh\xc2\x90X\x84\x1f\xb0\xb60\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/y3Brz1laAv1owpBYhB+wtg== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 11:12:48 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:12:49 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 11:12:49 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 11:38:55 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0x0208293a, Peer = 01:000000000000000F
2023-09-26 11:38:55 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/y3Brz1laAv1owpBYhB+wtg==
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10^\x08Y\xff\xf0\xd8\xc1\t\x99\x8b\x9f\xe6\xb3\xf3\xc1\x930\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/XghZ//DYwQmZi5/ms/PBkw== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 11:38:55 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 11:38:56 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 11:38:56 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 12:39:00 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xd65a926e, Peer = 01:000000000000000F
2023-09-26 12:39:00 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/XghZ//DYwQmZi5/ms/PBkw==
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\xf9J}\xc5b\xe9\x9bVc\xe7\xce\x10l\x8fQY0\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/+Up9xWLpm1Zj584QbI9RWQ== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 12:39:01 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 12:39:02 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 12:39:02 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 13:18:06 core-matter-server chip.DMG[126] ERROR Subscription Liveness timeout with SubscriptionID = 0xda9e7e45, Peer = 01:000000000000000F
2023-09-26 13:18:06 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO DeleteSdkKey: g/s/+Up9xWLpm1Zj584QbI9RWQ==
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO SetSdkKey: f/1/s/000000000000000F = b'\x150\x03\x10\xda0\x8c&\xb0d\xbd-\xb4\xf7=\x92wv\x15\xb10\x04 \xecaU3L7E\xb6\xce\xa1\xcd\x81\x18\xf5\xa3!v\x83\x91l\n\xfb\x13/ME\x1c_\x1a\xe3\xdbZ0\x05\x0c\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x18'
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO SetSdkKey: g/s/2jCMJrBkvS209z2Sd3YVsQ== = b'\x15$\x01\x01$\x02\x0f\x18'
2023-09-26 13:18:07 core-matter-server PersistentStorage[126] INFO Committing...
2023-09-26 13:18:08 core-matter-server root[126] INFO Re-subscription succeeded!
2023-09-26 13:18:08 core-matter-server matter_server.server.device_controller.[node 15][126] INFO Re-Subscription succeeded
2023-09-26 13:18:35 core-matter-server chip.EM[126] ERROR Failed to Send CHIP MessageCounter:209167867 on exchange 27571r sendCount: 4 max retries: 4

Additional information

No response

@home-assistant
Copy link

Hey there @home-assistant/matter, mind taking a look at this issue as it has been labeled with an integration (matter) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of matter can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign matter Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


matter documentation
matter source
(message by IssueLinks)

@Coder84619
Copy link
Author

After more investigation, it seems the three Tuo brand devices (two buttons, one contact sensor) have some firmware issues causing these subscription issues? Or maybe is it a HA Matter issue?

@marcelveldt
Copy link
Member

Based on my own investigation a number of thread devices have these issues, for example Nanoleaf Matter Essential products. That just seem to dropoff quickly and then come back (maybe the device itself restarts?)

Because we are publishing every update in realtime you can actually notice these quick dropouts in the HA log.
I'm pretty sure they also exist in every other ecosystem but they just do not refresh so quickly as we do.

We could also be a bit more forgiving and only mark the device as unavailable after x seconds/minutes of disconnection (like Apple does) but then we're hiding the underlying issue, which is maybe not the best idea in this current beta state of Matter.

Note that HA uses the exact same underlying Matter SDK as Google and Apple under the hood so chances are very small this is a HA specific issue. What can make a difference is the host that runs HA and networking gear. In the past months we've been busy exploring issues with networking gear (or even Linux kernels) not entirely prepared for the IPV6 mdns/multicast packets from the Matter protocol.

The route that we test the most is Home Assistant OS, running the official Matter add-on. We provide an experimental docker image for running the Matter server yourself but we've seen cases where the underlying Linux host had some issues in the network implementation, causing routing issues with matter packets. Yes, it can be that low level at this point.

In any way, it's not as easy as it may seem and especially Thread based Matter devices are still a work in progress. Not only for us, but the entire market. Month by month it improves a lot but there are still some foundational connectivity issues left.

@marcelveldt
Copy link
Member

After more investigation, it seems the three Tuo brand devices (two buttons, one contact sensor) have some firmware issues causing these subscription issues? Or maybe is it a HA Matter issue?

That is really possible, we've seen the same sort of issues with Eve with their early beta firmwares for Matter support. They stabilized it with every firmware update and I now have a few test devices here that are connected for weeks over thread without a single dropout.

At the same time I have a Nanoleaf Essentials bulb here. When I connect that bulb to my Thread network, the entire network becomes unstable somehow and even the otherwise stable Eve devices become unavailable.

So point I'm trying to make here is that the issue can be caused by many factors. What could help is just slim down the network first. Try one device at a time until you find the cause.

Also, you have disabled IGMP snooping. In some cases it might actually help to actually enable it to reduce congestion of multicast traffic flooding everywhere. Try toggle between enabled and disabled if there's a difference.

@Coder84619
Copy link
Author

Coder84619 commented Sep 29, 2023

Ok I have a bit more feedback here. I removed all the Tuo devices (contact sensor, buttons), and the HA logs cleared up. I re-enabled IGMP snooping on my 3 switches. However, I then saw frequent mDNS query timeout messages in the HA logs. I disabled IGMP everywhere, rebooted the HOAS Proxmox 8 host, rebooted the two Apple TVs, and now the HA logs are quiet again. Only a couple of one-off re-subscription events for an Eve contact sensor.

@marcelveldt
Copy link
Member

OK, thanks for the additional info. So in your case you actually had to disable IGMP snooping which means its probably not dealing well with the IPv6 multicast packets (just a random guess from me here).

A couple of one-off re-subscription events are OK, the device is IP based so will suffer from an accidental drop or just an IP change, the matter server should re-subscribe very fast in case that happens.

Problem is if you see disconnects very often (paired with the mdns discovery timeouts in the server logs), that means the device can no longer be reached and/or discovered.

So, your original issue is now resolved ?
Maybe you can mention the name/model of your switches so others who read this can try your solution as well.

@Coder84619
Copy link
Author

My core switch is QNAP QSW-M2116P-2T2S with firmware 2.0.0.22052. I'm procuring a new core switch, TP-Link TL-SG3428X-M2. I'll see if that makes any difference. My HAOS is also a VM on Proxmox 8 with Linux 6.2.16-15-pve. The Linux bridge in Proxmox does NOT have "VLAN aware" ticked, and I have no special sysctl interface settings.

@marcelveldt
Copy link
Member

Can we close this one ?

@Coder84619
Copy link
Author

Coder84619 commented Nov 2, 2023

@marcelveldt Yup. Likely tied to buggy QNAP firmware. Using Netgear M4300 and TP-Link Jetstream switches with zero issues.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants