Socket Becomes Unresponsive in Local Connectivity Extension After Lock Screen

I’m developing an app designed for hospital environments, where public internet access may not be available. The app includes two components: the main app and a Local Connectivity Extension. Both rely on persistent TCP socket connections to communicate with a local server.

We’re observing a recurring issue where the extension’s socket becomes unresponsive every 1–3 hours, but only when the device is on the lock screen, even if the main app remains in the foreground. When the screen is not locked, the connection is stable and no disconnections occur.

❗ Issue Details:

• What’s going on: The extension sends a keep-alive ping packet every second, and the server replies with a pong and a system time packet.

• The bug: The server stops receiving keep-alive packets from the extension.

 • On the server, we see a gap of about 30 seconds during which no packets were received from the extension. This was confirmed via server logs and Wireshark.

 • On the extension, our logs show no gap in sending packets. From its perspective, all packets were sent with no error.

 • Because no packets are being received by the server, no packets are sent back to the extension. Eventually the server closes the connection due to the keep-alive timeout.

 • FYI we log when the NEAppPushProvider subclass sleeps and it did NOT go to sleep while we were debugging.

🧾 Example Logs:

Extension log:

2025-03-24 18:34:48.808 sendKeepAliveRequest()

2025-03-24 18:34:49.717 sendKeepAliveRequest()

2025-03-24 18:34:50.692 sendKeepAliveRequest()

... // continuous sending of the ping packet to the server, no problems here

2025-03-24 18:35:55.063 sendKeepAliveRequest()

2025-03-24 18:35:55.063 keepAliveTimer IS TIME OUT... in CoreService. // this is triggered because we did not receive any packets from the server

Server log:

2025-03-24 18:34:16.298 No keep-alive received for 16 seconds... connection ID=95b3... // this shows that no packets were being received from the extension ...

2025-03-24 18:34:30.298 Connection timed out on keep-alive. connection ID=95b3... // eventually closes due to no packets being received

2025-03-24 18:34:30.298 Remote Subsystem Disconnected {name=iPhone|Replica-Ext|...}

✅ Observations:

•	The extension process continues running and logging keep-alive attempts.

•	However, network traffic stops reaching the server, and no inbound packets are received by the extension.

•	It looks like the socket becomes silently suspended or frozen, without being properly closed or throwing an error.

❓Questions:

•	Do you know why this might happen within a Local Connectivity Extension, especially when the main app is in the foreground and the device is locked?

•	Is there any known system behavior that might cause the socket to be suspended or blocked in this way after running for a few hours?

Any insights or recommendations would be greatly appreciated.

Thank you!

Correction for the original post:

On the extension, from our logs there was no gap in sending packets. From its perspective, all packets were sent with no error.

This is wrong. There is a gap in the sendKeepAlive messages from the extension to the server, and this is what we want to understand.

2025-03-24 18:34:50.692 sendKeepAliveRequest()

// 65 seconds gap

2025-03-24 18:35:55.063 sendKeepAliveRequest()
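
One way to find gaps like this in a long log without reading it by eye is to parse the timestamps and flag any interval above a threshold. The following is a minimal sketch, assuming the timestamp format shown in the logs above; the function name `gaps(in:longerThan:)` and the threshold are hypothetical choices, not part of the app described here.

```swift
import Foundation

/// Returns every interval between consecutive timestamps that exceeds `threshold`.
/// Timestamps that fail to parse are skipped.
func gaps(in timestamps: [String],
          longerThan threshold: TimeInterval) -> [(after: String, seconds: TimeInterval)] {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    formatter.dateFormat = "yyyy-MM-dd HH:mm:ss.SSS"

    // Pair each original string with its parsed date.
    let parsed = timestamps.compactMap { text in
        formatter.date(from: text).map { (text: text, date: $0) }
    }

    var result: [(after: String, seconds: TimeInterval)] = []
    for (previous, current) in zip(parsed, parsed.dropFirst()) {
        let delta = current.date.timeIntervalSince(previous.date)
        if delta > threshold {
            result.append((after: previous.text, seconds: delta))
        }
    }
    return result
}

// Usage with the extension log lines quoted above and a 5-second threshold:
let found = gaps(in: ["2025-03-24 18:34:49.717",
                      "2025-03-24 18:34:50.692",
                      "2025-03-24 18:35:55.063"],
                 longerThan: 5)
for gap in found {
    print("gap of \(gap.seconds) s after \(gap.after)")
}
```

With the two timestamps quoted in the correction, this reports a single gap of roughly 64 seconds after the 18:34:50.692 entry.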

I’m developing an app designed for hospital environments, where public internet access may not be available. The app includes two components: the main app and a Local Connectivity Extension. Both rely on persistent TCP socket connections to communicate with a local server.

A few questions and speculations here:

  • What network(s) have you reproduced this on? Are you seeing this on multiple networks (particularly controlled networks with very simple configurations) or on a specific network?

  • Is your server located on the same local network as the clients or is it on a remote network or otherwise isolated from the clients?

  • Is there an intermediate NAT server involved?

  • What happens when the client identifies the problem and reconnects? Is it able to reconnect immediately or is there some delay/issue?

This is just a guess, but one issue I've seen is that poorly behaved NAT servers will basically cause EXACTLY the behavior you're seeing. What they actually do is terminate the outer connection (from the server to the NAT router) but do NOT notify or otherwise disturb the inner connection (local device to NAT router). That means the server "knows" about the failure immediately but the client won't find out until it sends data and times out (or misses some expected transmission).
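
The "misses some expected transmission" case can be sketched as a small receive watchdog: the client records when the last expected message arrived and treats the connection as dead once too much time has passed. This is only an illustration; `ReceiveWatchdog` is a hypothetical type and the 30-second timeout is an assumed value, not something taken from the app in this thread.

```swift
import Foundation

/// Tracks the arrival time of the last expected message (for example a pong)
/// and reports whether the peer has been silent for longer than `timeout`.
struct ReceiveWatchdog {
    let timeout: TimeInterval
    private(set) var lastReceived: Date

    init(timeout: TimeInterval, start: Date = Date()) {
        self.timeout = timeout
        self.lastReceived = start
    }

    /// Call whenever an expected message arrives.
    mutating func noteReceived(at date: Date = Date()) {
        lastReceived = date
    }

    /// True when nothing has arrived for longer than `timeout`.
    func isTimedOut(at date: Date = Date()) -> Bool {
        date.timeIntervalSince(lastReceived) > timeout
    }
}

// Usage: check the watchdog from the same timer that sends keep-alives.
var watchdog = ReceiveWatchdog(timeout: 30) // assumed timeout
watchdog.noteReceived()
if watchdog.isTimedOut() {
    print("no data from server, tearing down and reconnecting")
}
```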

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the response. Here are my answers to the questions below:

❓1. What network(s) have you reproduced this on?

•	The clients connect to Access Points which provide an address on our internal network with outside internet access.
•	The server runs on our internal network, but on a network without internet access.
•	The network uses a bridge to connect these two address spaces.
•	We do not use DNS hostnames; we strictly utilize direct IP addressing and rely on the network bridge.
•	These two address spaces are internal to a single building.
•	We have only used the single internal network, with variations on the Access Point the phone connects through:
 •	An Access Point managed by our IT department
 •	An Access Point managed by the engineering group

Note: Both provide addresses on our ‘Production’ network.

❓2. Is your server located on the same local network as the clients or is it on a remote network or otherwise isolated from the clients?

•	The server is on a separate internal network, but within the same physical building.
•	A network bridge connects the client and server subnets.
•	Clients use direct IP addressing to reach the server (no DNS resolution).

❓3. Is there an intermediate NAT server involved?

•	No.
•	Our network design does not use NAT between clients and the server.
•	All communication is over direct internal IPs, routed via the internal bridge.

❓4. What happens when the client identifies the problem and reconnects? Is it able to reconnect immediately or is there some delay/issue?

The client detects the disconnection sometime between 30 seconds and 1 minute, which is the length of the gap with seemingly no activity. (No logs from any services occurred.) Once the connection loss was detected, the extension attempted to reconnect and succeeded within the expected time frame.

One more question: the extension uses CFStream to establish a TCP connection with the server, primarily for receiving real-time notifications. To keep the connection stable, it sends a keep-alive packet every second.

Since .voIP is deprecated, which StreamNetworkServiceTypeValue should be used in this scenario?

Available options:

@available(iOS 4.0, *) public static let voIP: StreamNetworkServiceTypeValue // ⚠️ Deprecated

@available(iOS 5.0, *) public static let video: StreamNetworkServiceTypeValue

@available(iOS 5.0, *) public static let background: StreamNetworkServiceTypeValue

@available(iOS 5.0, *) public static let voice: StreamNetworkServiceTypeValue

@available(iOS 10.0, *) public static let callSignaling: StreamNetworkServiceTypeValue

The goal is to ensure the extension’s TCP connection remains active and reliable in the background. Which value best supports that behavior?

Accepted Answer

The client detects the disconnection sometime between 30 seconds and 1 minute, which is the length of the gap with seemingly no activity. (No logs from any services occurred.)

So, the first thing you need to be CERTAIN of here is that this is all happening inside the same extension provider instance. From past experience, I've seen many cases which looked like "the connection is disconnecting" when the actual issue was "the device is dropping on/off Wi-Fi". My preferred technique here is to have your logging library include the current pid (process ID) in every log message, typically as the first value in the log message. If everything is formatted properly, it's easy to filter out when scanning an extended log, but any change still tends to "jump out" when you're scanning a log.

Based on the time delay you're describing, I suspect this is actually a Wi-Fi issue, though it's possible that CFStream itself is the issue.
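
As a rough illustration of the pid-first logging technique, here is a minimal sketch. `pidTaggedLine` is a hypothetical helper name, and the timestamp layout is assumed to match the logs earlier in the thread; a real logging library would be adapted rather than replaced.

```swift
import Foundation

/// Builds a log line whose first field is the current process ID, so that a
/// relaunch of the extension shows up as a change in the leading value.
func pidTaggedLine(_ message: String, date: Date = Date()) -> String {
    let pid = ProcessInfo.processInfo.processIdentifier
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.dateFormat = "yyyy-MM-dd HH:mm:ss.SSS"
    return "[\(pid)] \(formatter.string(from: date)) \(message)"
}

// Usage:
print(pidTaggedLine("sendKeepAliveRequest()"))
```

If the number in brackets changes between two lines, the second line came from a different process, which would mean the provider was relaunched rather than the socket stalling inside one instance.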

The extension uses CFStream to establish a TCP connection with the server, primarily for receiving real-time notifications. To keep the connection stable, it sends a keep-alive packet every second.

You should not be using CFStream for networking at ALL. CFSocketStream has been deprecated since 2021, but we've been telling people NOT to use CFStream for networking ever since we introduced the Network framework in iOS 12. At this point, I can't really have ANY confidence in how it will behave, particularly in a specialized context like a connectivity extension.

Note that we specifically document that the Network framework should be used by network extensions in the document "In-Provider Networking". As a side comment, the deprecations on the rest of that page are a side effect of how the binding between Swift and the Network framework has evolved over time, NOT a "real" issue with its actual implementation.
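
For readers moving off CFStream, a minimal sketch of the equivalent TCP connection with the Network framework might look like the following. The helper names, the address, the port, and the queue label are all placeholders invented for illustration; the thread does not give the real values, and this is not presented as a fix for the issue above.

```swift
import Foundation
import Network

/// Creates a TCP connection that logs state and path changes.
/// Returns nil if the port cannot be represented as an NWEndpoint.Port.
func makeLoggedConnection(host: String, port: UInt16) -> NWConnection? {
    guard let nwPort = NWEndpoint.Port(rawValue: port) else { return nil }
    let connection = NWConnection(host: NWEndpoint.Host(host), port: nwPort, using: .tcp)

    // State changes surface failures that a stalled socket might otherwise hide.
    connection.stateUpdateHandler = { state in
        switch state {
        case .waiting(let error): print("waiting: \(error)")
        case .failed(let error): print("failed: \(error)")
        default: print("state: \(state)")
        }
    }
    // Path updates show when the underlying interface changes or drops.
    connection.pathUpdateHandler = { path in
        print("path: \(path.status), Wi-Fi: \(path.usesInterfaceType(.wifi))")
    }
    return connection
}

/// Sends one keep-alive payload and logs any error reported by the stack.
func sendKeepAlive(_ payload: Data, over connection: NWConnection) {
    connection.send(content: payload, completion: .contentProcessed { error in
        if let error = error { print("keep-alive send failed: \(error)") }
    })
}

// Usage (address and port are made up):
// let connection = makeLoggedConnection(host: "192.0.2.10", port: 9000)
// connection?.start(queue: DispatchQueue(label: "example.keepalive"))
```

Logging the path alongside the connection state is a deliberate choice here: it gives a direct record of Wi-Fi drops, which is the suspected cause mentioned above.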

The goal is to ensure the extension’s TCP connection remains active and reliable in the background. Which value best supports that behavior?

These values primarily affect packet QoS, which is unlikely to matter in a LAN setting where the network is unlikely to ever be close to saturation. However, in the Network framework "nw_service_class_best_effort" or "nw_service_class_signaling" are what I'd probably use.
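
In Swift, the service class is set on `NWParameters` before the connection is created. The sketch below assumes the signaling class is the one chosen; `makeSignalingParameters` is a hypothetical helper name.

```swift
import Network

/// Builds plain TCP parameters (no TLS) marked with the signaling service class.
/// `.signaling` is the Swift counterpart of nw_service_class_signaling;
/// `.bestEffort` is the default and corresponds to nw_service_class_best_effort.
func makeSignalingParameters() -> NWParameters {
    let tcpOptions = NWProtocolTCP.Options()
    let parameters = NWParameters(tls: nil, tcp: tcpOptions)
    parameters.serviceClass = .signaling
    return parameters
}

// Usage: pass the result as the `using:` argument when creating an NWConnection.
let parameters = makeSignalingParameters()
print(parameters.serviceClass)
```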

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
