Z-Push Loop detection

From Zarafa wiki

With Z-Push 1.x, loops could occur on several occasions. They drained server performance and could only be resolved manually. Z-Push 2.x addresses these cases with several countermeasures, collectively called "loop detection". This page explains how they work and the reasoning behind them.

When Z-Push ignores an object, you can see the reason with z-push-admin. Run:

 z-push-admin.php -a list -u username

There are three general cases:

1. the object is broken for whatever reason (Z-Push detects a semantic error, such as an appointment whose end date is before its start date)
2. there is a connection issue between the server and the mobile (bad reception, network instabilities etc.)
3. the mobile cannot (for whatever reason) process the received data

In general, connection problems are impossible to avoid because of the nature of mobile communication. Entering a tunnel or losing connectivity is a big issue, but generally a temporary one.

Case 1 is stated clearly in z-push-admin. In these cases either the object itself is broken, or there is a bug in Z-Push that detects an error where there is none. A resync generally does NOT help here, as the object will remain broken. The object has to be fixed before the resync (e.g. with zarafa-fsck).
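To make case 1 concrete, here is a minimal sketch in Python (not Z-Push's own PHP code) of the kind of semantic check described: an appointment whose end date is before its start date. The `Appointment` type and the `find_semantic_error` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Appointment:
    # Hypothetical minimal appointment; real objects carry many more fields.
    subject: str
    start: datetime
    end: datetime


def find_semantic_error(appt: Appointment) -> Optional[str]:
    """Return a reason string if the appointment is semantically broken, else None."""
    if appt.end < appt.start:
        return "end date is before start date"
    return None
```

An object that fails such a check fails it again after every resync, which is why the object itself has to be repaired first.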

Cases 2 and 3 look the same to Z-Push, because in both the mobile simply repeats its last request. This is when the so-called "loop detection" of Z-Push kicks in. Z-Push sends the same data again and increments an internal counter (sent to the mobile 2, 3, 4 times) to keep track.
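The counting idea can be pictured with a small Python model. This is a simplified sketch, not the logic in Z-Push's loopdetection.php. The `RepeatCounter` class and the notion of a "request fingerprint" (any value that is identical when the mobile repeats a request) are assumptions made for illustration.

```python
class RepeatCounter:
    """Counts how often the same request arrives in a row (illustrative only)."""

    def __init__(self) -> None:
        self.last_request = None  # fingerprint of the previous request
        self.times_sent = 0       # how many times the same data has been sent

    def register(self, request_fingerprint: str) -> int:
        """Record a request and return how often its data has now been sent."""
        if request_fingerprint == self.last_request:
            self.times_sent += 1  # same request again: possible loop
        else:
            self.last_request = request_fingerprint
            self.times_sent = 1   # different request: the mobile made progress
        return self.times_sent
```

A return value above 1 means the mobile asked for the same data again. The model cannot tell whether the cause was a dropped connection (case 2) or a mobile that cannot process the data (case 3).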

The first countermeasure of loop detection is to decrease the amount of data (fewer items) sent per request. Normally the mobile requests up to 100 objects (appointments, emails) per request (Android generally requests 5, iOS 25 or 50, and Windows Mobile 100). With loop detection, Z-Push replies with only 1 object per request. The amount of data is much smaller and the probability of a successful transmission is higher. The downside is that synchronizing the same 100 objects then takes 100 requests, which could temporarily increase the load on your system.
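The trade-off is simple arithmetic, sketched here in Python. The function name `requests_needed` is hypothetical. The window sizes are the ones quoted above.

```python
import math


def requests_needed(total_objects: int, objects_per_request: int) -> int:
    """Number of requests required to transfer all objects at a given window size."""
    if objects_per_request < 1:
        raise ValueError("objects_per_request must be at least 1")
    return math.ceil(total_objects / objects_per_request)


# Requests needed to synchronize 100 objects at different window sizes.
for window in (100, 50, 25, 5, 1):
    print(window, requests_needed(100, window))
```

For 100 objects this gives 1 request at a window of 100, 4 at a window of 25, 20 at a window of 5, and 100 when loop detection reduces the window to a single object.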

The next step of loop detection kicks in if the mobile re-requests the same (single) object over and over again. This generally indicates case 3, but could also mean that the network is really (!) compromised. In this case, Z-Push tries to send the single object 2 more times to the mobile. On the third retry the object is ignored and added to the objects which "need attention" in the z-push-admin list. The reason string then states something like "object was causing loop".
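The retry-then-ignore behaviour can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not Z-Push's implementation. The class `SingleObjectRetry`, its `max_loopcount` parameter and the return values "resend" and "ignore" are hypothetical. Only the threshold of 3 and the reason string come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class SingleObjectRetry:
    max_loopcount: int = 3  # assumed threshold: ignore on the third re-request
    loopcount: int = 0
    ignored: list = field(default_factory=list)

    def on_rerequest(self, object_id: str) -> str:
        """Handle one more re-request of the same single object."""
        self.loopcount += 1
        if self.loopcount >= self.max_loopcount:
            # Give up: record the object so an administrator can inspect it.
            self.ignored.append((object_id, "object was causing loop"))
            self.loopcount = 0
            return "ignore"
        return "resend"
```

With the default threshold, the first two re-requests return "resend" and the third returns "ignore". A larger `max_loopcount` gives a mobile on a bad connection more chances, but repeats more useless transfers when the mobile can never process the object.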

We have seen that the last case happens more often in locations with very bad and unstable mobile reception. Unfortunately, there is not much we can do about it. Basically the only option is to increase the number of allowed retries, so that the server sends the same object e.g. 10 times before ignoring it.

This is easy to patch in the code. The change is small and has no other side effects. In the file z-push/lib/core/loopdetection.php, around line 732 ("case 3.3.1 - we got our broken item"), increase the number 3 to e.g. 5 in the line:

   if ($current['loopcount'] >= 3 && isset($current['potential'])) {

We are currently discussing increasing this value by default. A value of 5 is a good start, but you can also set a higher value.

In general, a value such as 10 is already too high: if synchronization does not work after e.g. 5 tries, then case 3 is probably happening (the mobile does not understand the data). In that scenario you could set the value to 10000 and the synchronization would still not succeed. It would just generate load and block other data from being synchronized, because the issue lies in the mobile itself or the object is broken in a way that Z-Push cannot detect (yet).
