Discussion:
[Xmltv-devel] Add retry capability to XMLTV::Get_nice and tv_grab_huro
Alistair Grant
2015-01-06 07:33:33 UTC
Hi,

I have modified Get_nice to optionally retry failed gets.

Prior to enabling retries I had over half the updates
using tv_grab_huro (in the Czech Republic) failing completely, and the
remainder were incomplete due to what were considered non-fatal
failures (but it did mean that data for some channels was not
retrieved).

Comments in the code indicate that retries were not implemented
because failures were so rare; unfortunately, that has not been my
experience.

Given that the overhead of supporting retries is minimal and the cost
of re-running tv_grab_xxx is much higher (in my case, 30+ gets for a
re-run versus a few extra gets with retries), it seems a logical
feature to add.

Note that the modification does not change existing behaviour by
default; retries have to be explicitly enabled.
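
For illustration, enabling them in a grabber looks something like this
(a sketch only; $RetryCodes is the configurable list of status codes to
retry, while the retry-count name and exact data structures shown here
are illustrative and may change before submission):

    use XMLTV::Get_nice qw(get_nice_tree);

    # Opt in to retries; without these lines behaviour is exactly as before.
    $XMLTV::Get_nice::Retries    = 3;                  # illustrative name: max attempts per URL
    $XMLTV::Get_nice::RetryCodes = [ 500, 502, 503 ];  # codes worth retrying (form illustrative)

    my $tree = get_nice_tree($url);   # $url = a page URL; failed gets retried on the codes above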

I've run the tests in http://wiki.xmltv.org/index.php/XmltvValidation
with mostly positive results. I've seen some errors that were already
occurring before my changes (according to
http://www.crustynet.org.uk/~xmltv-tester/squeeze/nightly/) and some
that are due to my environment, e.g. not substituting $Id$ as part of
the build process.

Before spending more time on tracking down and either fixing or
documenting the errors, I'd like to get some confidence that the
changes will be accepted back into the main repository.

The code is currently in my github repository on the huro_retry branch
(https://github.com/akgrant43/XMLTV/tree/huro_retry). If you would
like to take a look, only two files have been modified:

* grab/Get_nice.pm
* grab/huro/tv_grab_huro.in

README has been modified to add a pointer back to the original code so
people know it is a fork; this change won't be part of the final
submission.

Please let me know what you think.


Thanks very much,
Alistair
akgrant0710 on SourceForge
Robert Eden
2015-01-08 06:30:11 UTC
I'm currently at CES, so it's difficult for me to work on it this week; if someone else wants to, great.

My only concern is to make sure the get_nice changes don't break other grabbers that use it.

Robert
Alistair Grant
2015-01-08 08:36:13 UTC
Hi Robert,
Post by Robert Eden
I'm currently at CES, so it's difficult for me to work on it this week; if someone else wants to, great.
My only concern is to make sure the get_nice changes don't break other grabbers that use it.
Thanks for your reply. I'll work on running test_grabbers and report
back with the results - it will probably be next week before I have an
update anyway.

Thanks again,
Alistair
h***@gmail.com
2015-01-11 16:14:45 UTC
Post by Alistair Grant
I have modified Get_nice to optionally retry failed gets.
Prior to enabling retries I had over half the updates
using tv_grab_huro (in the Czech Republic) failing completely, and the
remainder were incomplete due to what were considered non-fatal
failures (but it did mean that data for some channels was not
retrieved).
Comments in the code indicate that retries were not implemented
because failures were so rare; unfortunately, that has not been my
experience.
Given that the overhead of supporting retries is minimal and the cost
of re-running tv_grab_xxx is much higher (in my case, 30+ gets for a
re-run versus a few extra gets with retries), it seems a logical
feature to add.
Note that the modification does not change existing behaviour by
default; retries have to be explicitly enabled.
[...]


Hi,

While I agree that some supposedly permanent errors sometimes turn out to be transient (e.g. 404, 408, 500), most errors (4xx, 5xx) *are* permanent and should be considered fatal.

I don't think you should be doing a blanket retry for every possible 4xx/5xx code.

What errors are you actually getting? (cf. "$r->status_line")

Geoff
Alistair Grant
2015-01-11 19:39:28 UTC
Post by h***@gmail.com
Post by Alistair Grant
I have modified Get_nice to optionally retry failed gets.
Prior to enabling retries I had over half the updates
using tv_grab_huro (in the Czech Republic) failing completely, and the
remainder were incomplete due to what were considered non-fatal
failures (but it did mean that data for some channels was not
retrieved).
Comments in the code indicate that retries were not implemented
because failures were so rare; unfortunately, that has not been my
experience.
Given that the overhead of supporting retries is minimal and the cost
of re-running tv_grab_xxx is much higher (in my case, 30+ gets for a
re-run versus a few extra gets with retries), it seems a logical
feature to add.
Note that the modification does not change existing behaviour by
default; retries have to be explicitly enabled.
[...]
Hi,
While I agree that some supposedly permanent errors sometimes turn out to be transient (e.g. 404, 408, 500), most errors (4xx, 5xx) *are* permanent and should be considered fatal.
I don't think you should be doing a blanket retry for every possible 4xx/5xx code.
What errors are you actually getting? (cf. "$r->status_line")
Geoff
The error I'm getting is:

500 Can't connect to www.port.cz:80 (Connection timed out)

I'm happy to limit the retry code to the list of status values you
provided above. I'll also add a backoff timer.

Thanks,
Alistair
h***@gmail.com
2015-01-12 08:31:52 UTC
Post by Alistair Grant
Post by h***@gmail.com
While I agree that some supposedly permanent errors sometimes turn out to be transient (e.g. 404, 408, 500), most errors (4xx, 5xx) *are* permanent and should be considered fatal.
I don't think you should be doing a blanket retry for every possible 4xx/5xx code.
What errors are you actually getting? (cf. "$r->status_line")
500 Can't connect to www.port.cz:80 (Connection timed out)
I'm happy to limit the retry code to the list of status values you
provided above. I'll also add a backoff timer.
Thanks,
Alistair
Yes, "connection timeout" is always a tricky one; you don't know if it's a genuine problem at the website end or for how long it will last (if it's transient).

In most cases though, a 404 or a 500 *is* permanent and fatal; in these cases there is simply no point in trying again.

I don't think we should be allowing retries for *all* grabbers just to cater for the spurious errors experienced by one or two.

In version 0.005065 I exposed the LWP response object (as $Response) so the grabber could check the reply code and other headers if that grabber needed to. This allows the calling grabber to set

$Get_nice::FailOnError = 0; # prevent failure on GET error

and then check $Get_nice::Response->code and decide what to do with the error. So in effect the retry loop becomes part of tv_grab_huro rather than of get_nice. This way each grabber can make its own decisions about what to do based on the specific issues with its particular website source.
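
Roughly this sort of thing in the grabber (a sketch only, untested; it
assumes get_nice returns undef on a failed GET once $FailOnError is
off, and the retryable codes and backoff are whatever suits that
grabber's source; fully qualified names used for the variables above):

    use XMLTV::Get_nice qw(get_nice);

    $XMLTV::Get_nice::FailOnError = 0;   # prevent failure on GET error

    sub get_nice_retry {
        my ($url, $max_tries) = @_;
        $max_tries ||= 3;
        for my $attempt (1 .. $max_tries) {
            my $content = get_nice($url);
            return $content if defined $content;
            my $code = $XMLTV::Get_nice::Response->code;
            # only retry codes this grabber believes are transient
            last unless grep { $code == $_ } (500, 503);
            sleep(5 * $attempt);         # crude backoff
        }
        return undef;                    # caller decides whether this is fatal
    }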

Cheers,
Geoff
Alistair Grant
2015-01-13 09:13:35 UTC
Post by h***@gmail.com
Post by Alistair Grant
Post by h***@gmail.com
While I agree that some supposedly permanent errors sometimes turn out to be transient (e.g. 404, 408, 500), most errors (4xx, 5xx) *are* permanent and should be considered fatal.
I don't think you should be doing a blanket retry for every possible 4xx/5xx code.
What errors are you actually getting? (cf. "$r->status_line")
500 Can't connect to www.port.cz:80 (Connection timed out)
I'm happy to limit the retry code to the list of status values you
provided above. I'll also add a backoff timer.
Thanks,
Alistair
Yes, "connection timeout" is always a tricky one; you don't know if it's a genuine problem at the website end or for how long it will last (if it's transient).
In most cases though, a 404 or a 500 *is* permanent and fatal; in these cases there is simply no point in trying again.
I don't think we should be allowing retries for *all* grabbers just to cater for the spurious errors experienced by one or two.
In version 0.005065 I exposed the LWP response object (as $Response) so the grabber could check the reply code and other headers if that grabber needed to. This allows the calling grabber to set
$Get_nice::FailOnError = 0; # prevent failure on GET error
and then check $Get_nice::Response->code and decide what to do with the error. So in effect the retry loop becomes part of tv_grab_huro rather than of get_nice. This way each grabber can make its own decisions about what to do based on the specific issues with its particular website source.
Hi Geoff,

The way it's implemented allows each grabber to make its own decision.
The grabber requests retries, and then decides what to do if there is
still a problem after the maximum number of retries has been
exceeded. Just to reiterate, the default behaviour is unchanged, i.e.
we're not forcing all grabbers to retry, and if retries are enabled,
the behaviour after the final retry is also unchanged, i.e. it is
defined by $FailOnError.

If the grabbers are required to implement the retry functionality it
will need to be done for each of the entry points that the grabber
uses, i.e. one or more of get_nice, get_nice_tree, get_nice_xml and
get_nice_json, or a dispatch table will have to be maintained, instead
of just implementing it once in get_nice_aux.

If you're really determined for this to be in the grabber I can move
it, but I still think Get_nice.pm is the best place.

Thanks again for your feedback,
Alistair
h***@gmail.com
2015-01-13 13:00:31 UTC
Post by Alistair Grant
Hi Geoff,
The way it's implemented allows each grabber to make its own decision.
The grabber requests retries, and then decides what to do if there is
still a problem after the maximum number of retries has been
exceeded. Just to reiterate, the default behaviour is unchanged, i.e.
we're not forcing all grabbers to retry, and if retries are enabled,
the behaviour after the final retry is also unchanged, i.e. it is
defined by $FailOnError.
If the grabbers are required to implement the retry functionality it
will need to be done for each of the entry points that the grabber
uses, i.e. one or more of get_nice, get_nice_tree, get_nice_xml and
get_nice_json, or a dispatch table will have to be maintained, instead
of just implementing it once in get_nice_aux.
If you're really determined for this to be in the grabber I can move
it, but I still think Get_nice.pm is the best place.
Thanks again for your feedback,
Alistair
Hi Alistair,

Shouldn't be too onerous; it's only a simple subroutine in the grabber which acts as a wrapper around get_nice.

I agree get_nice_aux is the cleanest place to put it, but my concern arises around the use cases for it. A 404 for one grabber may be worth retrying (the source is flaky) but for another may be pointless (the source can be trusted). So you then start to need to parameterise which codes are 'retry-able' (i.e. on a per-grabber basis).

Plus it wouldn't fix all scenarios, so some situations will *still* need grabber-specific code.

E.g. In one grabber I worked on, the web pages frequently returned code 200 but the content was empty! In another it returned a 304 (Not Modified) even though you'd specified no caching. Another required checking certain response headers before you could be sure the data were valid.
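
To give a flavour of the per-grabber checks I mean (illustrative only;
the Content-Type test here is made up):

    use XMLTV::Get_nice qw(get_nice);

    my $content = get_nice($url);        # $url = one of the grabber's pages
    my $r = $XMLTV::Get_nice::Response;
    die "no usable data from $url\n"
        if !defined $content || $content eq ''                  # 200 but empty body
        || $r->code == 304                                      # "Not Modified" despite no caching
        || ($r->header('Content-Type') || '') !~ m{^text/html}; # illustrative header sanity check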

But I'm not judge and jury; let's see what others think.

Cheers,
Geoff
Alistair Grant
2015-01-13 13:18:14 UTC
Post by h***@gmail.com
Post by Alistair Grant
Hi Geoff,
The way it's implemented allows each grabber to make its own decision.
The grabber requests retries, and then decides what to do if there is
still a problem after the maximum number of retries has been
exceeded. Just to reiterate, the default behaviour is unchanged, i.e.
we're not forcing all grabbers to retry, and if retries are enabled,
the behaviour after the final retry is also unchanged, i.e. it is
defined by $FailOnError.
If the grabbers are required to implement the retry functionality it
will need to be done for each of the entry points that the grabber
uses, i.e. one or more of get_nice, get_nice_tree, get_nice_xml and
get_nice_json, or a dispatch table will have to be maintained, instead
of just implementing it once in get_nice_aux.
If you're really determined for this to be in the grabber I can move
it, but I still think Get_nice.pm is the best place.
Thanks again for your feedback,
Alistair
Hi Alistair,
Shouldn't be too onerous; it's only a simple subroutine in the grabber which acts as a wrapper around get_nice.
I agree get_nice_aux is the cleanest place to put it, but my concern arises around the use cases for it. A 404 for one grabber may be worth retrying (the source is flaky) but for another may be pointless (the source can be trusted). So you then start to need to parameterise which codes are 'retry-able' (i.e. on a per-grabber basis).
This has already been made configurable - the grabber can set the list
of codes to retry in $XMLTV::Get_nice::RetryCodes.
Post by h***@gmail.com
Plus it wouldn't fix all scenarios, so some situations will *still* need grabber-specific code.
E.g. In one grabber I worked on, the web pages frequently returned code 200 but the content was empty! In another it returned a 304 (Not Modified) even though you'd specified no caching. Another required checking certain response headers before you could be sure the data were valid.
Thanks for the explanation; it helps me understand where you're
coming from. Some of these, e.g. checking headers, are grabber-specific,
I agree. However, I think retries are a much more common
requirement (wget provides them, for example), and supporting them in
Get_nice doesn't preclude the grabbers performing additional checks.
Post by h***@gmail.com
But I'm not judge and jury; let's see what others think.
:-)

Thanks again,
Alistair
