Mariano Cosentino
2010-06-02 13:41:35 UTC
Hi everyone,
the subject of this email might not be quite right, but
I find it hard to label this correctly.
In Argentina, outside prime time, every weekday has
almost exactly the same programming (soap operas, talk shows, old
reruns, etc.), and most of the time the description is just the
generic one for the show rather than one specific to the episode. Also,
the movie channels constantly repeat the same movies several times a
week (even several times a day).
As I look at the way TV_GRAB_AR works, I can't help but
notice that it retrieves the same information over and over again,
placing a useless load on the providers, and I really want to
avoid bothering them too much.
To the point: I recently rewrote TV_GRAB_AR and
included a really simple cache that checks the provider's programID
and skips downloading program descriptions that have already been
downloaded during that run. This has greatly reduced the number of
program descriptions I need to download. But I feel we should go a
step further and use a persistent cache, one we can check in
subsequent runs so we never have to re-download the same data
again.
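To make the persistent-cache idea concrete, here is a minimal Python sketch. It assumes the provider's programID is a stable key and that descriptions are plain text. The class name, the SQLite table layout, and the `fetch` callback (standing in for the real download) are all hypothetical and are not how TV_GRAB_AR itself is written.

```python
import sqlite3
from typing import Callable


class DescriptionCache:
    """Hypothetical persistent cache of descriptions, keyed by programID."""

    def __init__(self, path: str = "tv_grab_ar_cache.db") -> None:
        # An on-disk SQLite file survives between runs, unlike an
        # in-memory dict that is lost when the grabber exits.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS descriptions ("
            " program_id TEXT PRIMARY KEY,"
            " description TEXT NOT NULL)"
        )
        self.conn.commit()

    def get(self, program_id: str, fetch: Callable[[str], str]) -> str:
        """Return the cached description; call fetch() only on a cache miss."""
        row = self.conn.execute(
            "SELECT description FROM descriptions WHERE program_id = ?",
            (program_id,),
        ).fetchone()
        if row is not None:
            return row[0]  # hit: no request to the provider
        description = fetch(program_id)  # miss: download once, then store
        self.conn.execute(
            "INSERT INTO descriptions (program_id, description) VALUES (?, ?)",
            (program_id, description),
        )
        self.conn.commit()
        return description

    def close(self) -> None:
        self.conn.close()
```

The per-run cache described above corresponds to the in-memory case. Pointing `path` at a file is what lets a later run skip programIDs it has already seen. A real version would probably also need some way to expire entries if a provider ever changes a description.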
Besides reducing the workload, this should also help with data
enrichment, as the cache database can be matched against the other
available sources to get better information.
Now, before I start working on it, I wanted to know what
you think about it, whether anyone has tried this and/or sees any
issues with it, or, even better, whether someone has already done it
(so I can be a really lazy guy and just adapt it to our needs).
Best Regards, Marianok