Discussion:
[Xmltv-devel] tv_grab_uk_rt Performance
Adam Sutton
2012-01-17 15:37:26 UTC
Hi,

Apologies if this would be better in the users list, but it feels like a
devel question.

I've just entered the world of DVB and thus started using xmltv,
specifically tv_grab_uk_rt. I was surprised by how long it takes to
process the channel listings. For approx 60 channels it takes about
15 min on my laptop, and on my DVB box it's more like 60 min! And that's
with 100% CPU load.

I've done some searching on the web and most responses seem to be "that's
how long it takes!". However, that doesn't seem right to me for some basic
data processing (apologies if I'm vastly oversimplifying), so I thought I'd
write my own script, using some of the hosted xmltv data used by
tv_grab_uk_rt, to see if I could find where the bottlenecks are.

So far I have something which I believe is outputting roughly the right
things. (I'm currently using python-xmltv, which doesn't format the XML, so
the output is difficult to compare; however, in an earlier version I
hand-crafted the XML to match the format output by tv_grab_uk_rt.) This
script takes approx 15s on my laptop (not tried on the DVB box yet).

It doesn't include all the string processing, so my guess is that this
processing must be part of the reason things take so long. So far I have
only included the utf8_fixups mappings, basic programme title mappings
(key 5) and genre mappings.

However, for a basic EPG display this is more than good enough (at least
for now). I just wondered what people's thoughts were on this, and whether
tv_grab_uk_rt has a mode that can be enabled to achieve similar
performance.
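
For reference, the two simplest kinds of string processing mentioned above can be sketched in Python: a table of mis-encoded sequences (UTF-8 bytes wrongly decoded as cp1252) mapped to their intended characters, plus an exact-match title table applied with a single dictionary lookup. The mapping entries and function names are invented for illustration; this is not the format of tv_grab_uk_rt's own data files.

```python
# Invented example entries; tv_grab_uk_rt keeps its real mappings elsewhere.
UTF8_FIXUPS = {
    "\u00e2\u20ac\u2122": "\u2019",  # "â€™" -> "’" (UTF-8 bytes read as cp1252)
    "\u00c3\u00a9": "\u00e9",        # "Ã©" -> "é"
}
TITLE_MAP = {
    "Dr Who": "Doctor Who",          # hypothetical title normalisation
}

def fix_text(text: str) -> str:
    """Replace every known mis-encoded sequence with the intended character."""
    for bad, good in UTF8_FIXUPS.items():
        text = text.replace(bad, good)
    return text

def fix_title(title: str) -> str:
    """Apply the encoding fixups, then an exact-match title replacement."""
    title = fix_text(title)
    return TITLE_MAP.get(title, title)

print(fix_title("Dr Who"), fix_text("It\u00e2\u20ac\u2122s"))
```

Both steps are cheap per programme (a handful of substring replacements and one hash lookup), which fits the later finding in this thread that text processing was not the bottleneck.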

Regards
Adam
Karl Dietz
2012-01-17 21:10:54 UTC
Hi Adam,
Post by Adam Sutton
I've just entered the world of DVB and thus started using xmltv,
That sounds the wrong way around. Going to digital TV you got some
options for guide data via your DVB signal that you did not have on
an analogue feed. (DVB-EIT, MHEG, etc.)
If you only want to look at the guide that might be enough for you.

...
Post by Adam Sutton
However this doesn't seem right to me for some basic data processing
(apologies if I'm vastly oversimplifying), so I thought I'd write my
own script, using some of the hosted xmltv data used by tv_grab_uk_rt,
to see if I could find where the bottlenecks are.
Analysis of performance bottlenecks in _uk_rt (or other grabbers) are
always appreciated.
Post by Adam Sutton
So far I have something which I believe is outputting roughly the right
things (I'm currently using python-xmltv, which doesn't format the XML, so
it's difficult to compare; however, in an earlier version I hand-crafted the
XML to match the format output by tv_grab_uk_rt). This script takes
approx 15s on my laptop (not tried on DVB box).
You can run your files through tv_sort to unify the formatting.
Post by Adam Sutton
It doesn't include all the string processing, so my guess is that must
be part of the reason things take so long. So far I have only included
the utf8_fixups mappings, basic prog title mappings (key 5) and genre
mappings.
However for a basic EPG display this is more than good enough (at least
for now). I just wondered what people's thoughts were on this and whether
there is a mode that the mentioned script can be put into to create a
similar performance.
The value of _uk_rt lies in the unification of the guide for
consumption by machines.
Likely everyone is happy with it being "fast enough" as it runs in a
daily batch job anyway.

Another option would be to port the fixes upstream, now that the feed
generation is done by MetaBroadcast. But you really need to talk to
Nick/MetaBroadcast about current developments.

If you want a fast and resource-conserving grabber, you'll always have
to fiddle on the provider side. Here's an example of a simple and
efficient API (OZTivo's extension of the SweDB API):
http://www.oztivo.net/twiki/bin/view/TVGuide/StaticXMLGuideAPI

Regards,
Karl
John Veness
2012-01-18 09:41:43 UTC
Post by Adam Sutton
I've just entered the world of DVB and thus started using xmltv,
specifically tv_grab_uk_rt. I was surprised by how long this took to
process the channel listing. For approx 60 channels on my laptop it
takes about 15min, on my DVB box that's more like 60min! And that's with
100% CPU load.
If another data point is required, on my DVB-T (i.e. Freeview) setup,
tv_grab_uk_rt takes about 20 minutes. I'm sure a year or two ago it was
only 3 or 4 minutes, but I cannot remember version numbers etc. It would
be interesting to find where the bottlenecks are.

Cheers,

John
--
John Veness, MythTV user, UK, DVB-T
Nick Morrott
2012-01-18 10:16:41 UTC
Post by John Veness
Post by Adam Sutton
I've just entered the world of DVB and thus started using xmltv,
specifically tv_grab_uk_rt. I was surprised by how long this took to
process the channel listing. For approx 60 channels on my laptop it
takes about 15min, on my DVB box that's more like 60min! And that's with
100% CPU load.
If another data point is required, on my DVB-T (i.e. Freeview) setup,
tv_grab_uk_rt takes about 20 minutes. I'm sure a year or two ago it was
only 3 or 4 minutes, but I cannot remember version numbers etc. It would
be interesting to find where the bottlenecks are.
I'll have a look with Devel::NYTProf, unless someone else beats me to
it and posts the results. On my slow MythTV backend built back in 2006
(AMD 3800), it takes 15 minutes to grab/process data for the 80
channels I have configured. This is with debugging enabled (which
slows it down further).

With MetaBroadcast having taken over the Radio Times feed and being
very proactive in listening to the community, it may well be possible
to push some of the processing upstream and even provide the data
natively in XMLTV format. The new feed *should* be UTF-8 clean, so
that is one part of the grabber which could be removed going forward.
Note that there is still a lot of processing of the data to make it as
consistent and rich as possible, which is where the slowdowns will
likely be. The new feed has its own ideas about what a programme's
*correct* title is, and I'm posting regular updates to try and keep
things consistent across the board.

@Adam - If you post your code online we can compare it to the current
guts of uk_rt - it may help to spot where bottlenecks exist. What spec
is your DVB box? Why aren't you using EIT listings instead of XMLTV?

Cheers,
Nick
Adam Sutton
2012-01-18 10:39:19 UTC
Firstly, apologies. You're missing an important post I sent with details
of my investigation into where the bottlenecks are. I stupidly kept
replying from my default Gmail account, though I signed up with an alias,
so the replies are being held for moderator approval.

Anyway to summarise what I've found (hopefully the full reply will be
released later on):

The bottleneck is all in the time processing, which I must say surprised
me. It's not in the text processing at all, though I agree with some
comments regarding the fact that some of the processing may be redundant
these days.

The problem is that the time processing library is entirely string-based
and therefore hugely inefficient. To give an example, on my laptop (the PVR
machine is at home) I get the following times for a selection of 10
channels:

_uk_rt (unmodded) = ~150s
_uk_rt (with times converted to Unix timestamps) = ~5s

A 30x speedup.
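
The idea behind that speedup is to parse each timestamp once and then work with plain integers. Below is a minimal Python sketch of the approach; the helper name `xmltv_to_epoch` is hypothetical, and this is neither the modified _uk_rt nor Adam's script. It turns an XMLTV-style `YYYYMMDDhhmmss +HHMM` string into Unix seconds, after which comparisons and durations are simple integer arithmetic.

```python
from datetime import datetime, timedelta, timezone

def xmltv_to_epoch(stamp: str) -> int:
    """Parse 'YYYYMMDDhhmmss +HHMM' into integer Unix seconds."""
    digits, _, offset = stamp.partition(" ")
    naive = datetime.strptime(digits, "%Y%m%d%H%M%S")
    if offset:
        sign = -1 if offset[0] == "-" else 1
        minutes = sign * (int(offset[1:3]) * 60 + int(offset[3:5]))
    else:
        minutes = 0  # assumption for this sketch: no offset means UTC
    aware = naive.replace(tzinfo=timezone(timedelta(minutes=minutes)))
    return int(aware.timestamp())

# Parse each programme's start/stop once...
start = xmltv_to_epoch("20120117200000 +0000")
stop = xmltv_to_epoch("20120117213000 +0000")
# ...then everything downstream is cheap integer arithmetic.
duration_minutes = (stop - start) // 60
print(duration_minutes)
```

Because the offset is folded in at parse time, `20120601200000 +0100` and `20120601190000 +0000` map to the same integer, so sorting and overlap checks no longer need any string handling.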

For my PVR machine this should equate to a run of approx 2-3min for the 60
or so channels I have. Which is more than acceptable.

I have spoken briefly to Robb about this and he pointed out that I'm
probably not handling the DST issues. He was correct: I'd left this out
initially, especially given that DST only really becomes an issue for that
one nasty hour a year when the clocks go back (going forwards is easy to
deal with, as there is no ambiguity). But this is a solvable problem.

I've just started looking into this using some data I managed to grab from
archive.org covering the two change dates (a few years back, mind). And
interestingly, the library in use by xmltv gets the times wrong anyway! When
the clocks go forward I believe it's correct, but when they go back it
gets all muddled up.

I'm going to try and spend some time making my _uk_rt mod DST aware, for
the time being I'm focusing on mods directly in that code, rather than
creating a replacement time processing lib that can be used by the other
tv_grab routines. But that shouldn't be too tricky.

My intention will be to provide a patch once I actually have something that
works properly and includes all time checks etc... I'm happy to also
provide my own python implementation, but to be honest I'd personally
rather use the xmltv scripts (with mods) since these are well used/tested
etc... However I may do some more work on my python scripts as it could
form a useful testing ground (as I better understand the code) and a useful
comparison.

I'm not using the on-air guide since I only seem to be getting the now/next
info, plus I believe EIT only covers 7 days anyway? I'd rather have the full
2 weeks if possible.

My PVR box is an old 800MHz CPU (I forget the model), so it's pretty
slow, but more than enough to run tvheadend for the DVB streaming.

I'll try to provide some code later this week.

Regards
Adam
_______________________________________________
xmltv-devel mailing list
https://lists.sourceforge.net/lists/listinfo/xmltv-devel
Karl Dietz
2012-01-19 08:50:37 UTC
Post by Adam Sutton
I have spoken briefly to Robb about this and he has pointed out that I'm
probably not handling the DST issues, and he was correct I've left this
out initially (especially given the fact that DST only really becomes an
issue for that 1 nasty hour a year when the time jumps backwards,
forwards is easy to deal with due to no ambiguities). But this is
a solvable problem.
I've just started looking into this using some data I managed to grab
from archive.org covering the two change dates (a
few years back mind). And interestingly the library in use by xmltv gets
the times wrong anyway! When the time goes forward I believe it's
correct, but when the time goes back it gets all muddled up.
We are using the DateTime modules on our NonameTV sites for time
handling. I try to fix up the guide around the DST switch, but the
upstream data is ambiguous more often than not.

I usually tell the module "here comes Europe/Berlin local time" then do
math in UTC and convert back to local time with explicit offset for
output.
I've considered porting the whole of XMLTV over to DateTime but currently
lack the time for such a big project.
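
The recipe described above (treat the input as local time, do the arithmetic in UTC, convert back with an explicit offset) can be illustrated without a time zone database, because the EU rules, which the UK also follows, switch clocks at 01:00 UTC on the last Sundays of March and October. This Python is a hedged sketch under those assumptions, not the DateTime API: the helper names are hypothetical, Europe/Berlin is assumed to be UTC+1 outside summer time, and an ambiguous autumn wall time is resolved to the earlier (summer-time) reading.

```python
from datetime import date, datetime, timedelta, timezone

STANDARD = timedelta(hours=1)  # assumption: Europe/Berlin is UTC+1 outside summer time
SUMMER = STANDARD + timedelta(hours=1)

def last_sunday(year: int, month: int) -> date:
    """Last Sunday of a 31-day month (March or October)."""
    d = date(year, month, 31)
    return d - timedelta(days=(d.weekday() + 1) % 7)  # weekday(): Monday=0 ... Sunday=6

def in_summer_time(utc: datetime) -> bool:
    """EU rule: summer time runs from 01:00 UTC on the last Sunday of March
    to 01:00 UTC on the last Sunday of October."""
    s, e = last_sunday(utc.year, 3), last_sunday(utc.year, 10)
    start = datetime(s.year, s.month, s.day, 1, tzinfo=timezone.utc)
    end = datetime(e.year, e.month, e.day, 1, tzinfo=timezone.utc)
    return start <= utc < end

def local_to_utc(local: datetime) -> datetime:
    """Naive local wall time -> aware UTC; an ambiguous time takes the summer reading."""
    for offset in (SUMMER, STANDARD):
        utc = (local - offset).replace(tzinfo=timezone.utc)
        # Accept this reading only if the rule agrees the offset was in force then.
        if in_summer_time(utc) == (offset == SUMMER):
            return utc
    raise ValueError(f"{local} does not exist (skipped by the spring changeover)")

def utc_to_local(utc: datetime) -> datetime:
    """Aware UTC -> aware local time carrying the offset in force at that instant."""
    return utc.astimezone(timezone(SUMMER if in_summer_time(utc) else STANDARD))

start = local_to_utc(datetime(2011, 10, 30, 1, 30))  # ambiguous: summer reading used
stop = start + timedelta(hours=2)                    # arithmetic happens in UTC
for t in (start, stop):
    print(utc_to_local(t).strftime("%Y%m%d%H%M%S %z"))
```

Here a two-hour programme starting at 01:30 local on 30 October 2011 is printed as starting at `01:30 +0200` and ending at `02:30 +0100`. The explicit offsets carry the information that a purely local representation loses.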
Post by Adam Sutton
I'm going to try and spend some time making my _uk_rt mod DST aware, for
the time being I'm focusing on mods directly in that code, rather than
creating a replacement time processing lib that can be used by the other
tv_grab routines. But that shouldn't be too tricky.
May I suggest looking at DateTime? It's working well for advanced stuff
like creating sets of time spans to cut and merge time-sharing channels
(see https://github.com/dekarl/nonametv/blob/master/lib/NonameTV/Importer/Combiner.pm#L439).
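
The time-span technique can be illustrated in Python with plain integer intervals. A time-sharing channel (for example, one service by day and another in the evening on the same slot) is built by clipping each source schedule to the spans it owns and merging the results. This is a hypothetical sketch of the concept only, not a port of Combiner.pm; the schedules are invented.

```python
from typing import List, NamedTuple, Tuple

class Prog(NamedTuple):
    start: int  # seconds; any monotonic unit works
    stop: int
    title: str

Span = Tuple[int, int]

def clip_to_spans(progs: List[Prog], spans: List[Span]) -> List[Prog]:
    """Keep only the parts of each programme that fall inside one of the spans."""
    out = []
    for p in progs:
        for span_start, span_stop in spans:
            start, stop = max(p.start, span_start), min(p.stop, span_stop)
            if start < stop:  # non-empty overlap
                out.append(Prog(start, stop, p.title))
    return out

def combine(*sources: Tuple[List[Prog], List[Span]]) -> List[Prog]:
    """Merge several (programmes, owned spans) pairs into one sorted schedule."""
    merged: List[Prog] = []
    for progs, spans in sources:
        merged.extend(clip_to_spans(progs, spans))
    return sorted(merged)

H = 3600  # one hour in seconds
day = [Prog(5 * H, 7 * H, "Morning News"), Prog(17 * H, 20 * H, "Film")]
evening = [Prog(18 * H, 21 * H, "Drama"), Prog(21 * H, 23 * H, "Late Show")]
combined = combine((day, [(6 * H, 19 * H)]), (evening, [(19 * H, 30 * H)]))
for p in combined:
    print(p.start // H, p.stop // H, p.title)
```

In this sketch, programmes that overrun a span are simply cut at the span boundary. A real combiner might instead drop or flag such fragments.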
Post by Adam Sutton
My intention will be to provide a patch once I actually have something
that works properly and includes all time checks etc... I'm happy to
also provide my own python implementation, but to be honest I'd
personally rather use the xmltv scripts (with mods) since these are well
used/tested etc... However I may do some more work on my python scripts
as it could form a useful testing ground (as I better understand the
code) and a useful comparison.
I'm not using the on-air guide since I only seem to be getting the
now/next info, plus I believe EIT is only 7 days anyway? I'd rather have
the full 2weeks if possible.
We have up to 4 weeks of DVB-EIT on satellite/cable in Germany. I don't
know how far into the future the Freesat/Freeview MHEG guide goes.

Regards,
Karl

PS: EIT is explicit about the time ;)
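
To show what "explicit" means here: in DVB SI (ETSI EN 300 468), an EIT event's `start_time` is 40 bits. These are a 16-bit Modified Julian Date followed by hh:mm:ss as six BCD digits, all in UTC, so no local-time or DST interpretation is involved. Below is a minimal Python decoder sketch; the function names are hypothetical, and the example bytes `C0 79 12 45 00` decode to 1993-10-13 12:45:00 UTC.

```python
from datetime import datetime, timedelta, timezone

MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # Modified Julian Date day 0

def bcd(byte: int) -> int:
    """Decode one byte holding two BCD digits, e.g. 0x45 -> 45."""
    return (byte >> 4) * 10 + (byte & 0x0F)

def eit_start_time(raw: bytes) -> datetime:
    """Decode a 5-byte EIT start_time (16-bit MJD + hh mm ss in BCD), always UTC."""
    if len(raw) != 5:
        raise ValueError("start_time must be exactly 5 bytes")
    mjd = int.from_bytes(raw[:2], "big")
    hh, mm, ss = (bcd(b) for b in raw[2:])
    return MJD_EPOCH + timedelta(days=mjd, hours=hh, minutes=mm, seconds=ss)

print(eit_start_time(bytes.fromhex("C079124500")).isoformat())
```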
Adam Sutton
2012-01-19 13:41:57 UTC
I'm going to have to move onto other things for a bit. I did have a play
with using DateTime within the existing perl code, but not in a nice
generic (library) fashion that might be usable throughout XMLTV.

Possibly someone with more Perl experience would find it easier.
I will try and have another go at some point, but I need to move onto other
problems.

I have finished playing with my Python script, though. I tried experimenting
with some output caching, but it actually wasn't very efficient, and it
was quicker to just reprocess the files each time, especially given that the
absolute times are quite good anyway.

I've run it on my laptop and it creates the 62 Freesat channels in around
90s, 60s of which is data fetching. This compares to around 15 min with the
original script. So, assuming a roughly equal fetch time, the processing is
about 30x faster.

My script still doesn't include ALL the title processing in the original,
but it does have most and as previously noted this is not a bottleneck in
my opinion. Adding the rest isn't a big job, I just don't need it at the
moment. I added as much as I felt necessary to allow me to do a direct file
diff.

With regards to the DST handling, this is also handled manually within the
script and probably isn't as generic as it could be (in fact I've just
realised I think I've ignored explicit timezone info in the entries).
However my testing on some old data from the internet archive showed it did
a better job than the original script which still appears to get confused
at the DST switchover (DST -> non-DST).

The output is hand-generated XML, rather than using the python xmltv output
module, since this doesn't format and again makes comparison difficult
(couldn't seem to get tv_sort working?).

I've put the script in my googlecode svn repo, so feel free to take a look:
http://adamsutton.googlecode.com/svn/xbmc-pvr/trunk/xmltv/tv_grab_uk_rt_aps.py

The script will use the existing tv_grab_uk_rt configuration file. However
so that it doesn't interfere with any of the cached data I've implemented
my own caching under ~/.xmltv/cache2.

You can run the script as:

python tv_grab_uk_rt_aps.py > listings.xml

--debug will enable some extra debug (though not as informative as the
original)
--help will show other options

I'm not proposing this as a replacement for the existing script in any way,
since the project is obviously written in perl not python. I just thought
it might serve as useful demonstration of an alternative, more efficient,
approach.

Regards
Adam

P.S.
I have hacked it quite a bit since I ran the comparison tests against older
data, so I won't put my hand on heart and swear it still gets all the DST
issues right ;)

I wasn't tracking it in SVN at the time, doh!

P.P.S.
There are definitely still other general faults in the script, as it's just
a toy example.
Post by Karl Dietz
Post by Adam Sutton
I have spoken briefly to Robb about this and he has pointed out that I'm
probably not handling the DST issues, and he was correct I've left this
out initially (especially given the fact that DST only really becomes an
issue for that 1 nasty hour a year when the time jumps backwards,
forwards is easy to deal with due to no ambiguities). But this is
a solvable problem.
I've just started looking into this using some data I managed to grab
from archive.org covering the two change dates (a
few years back mind). And interestingly the library in use by xmltv gets
the times wrong anyway! When the time goes forward I believe it's
correct, but when the time goes back it gets all muddled up.
We are using the DateTime modules on our NonameTV sites for time
handling. I try to fix up the guide around the DST switch, but the
upstream data is ambiguous more often than not.
I usually tell the module "here comes Europe/Berlin local time" then do
math in UTC and convert back to local time with explicit offset for
output.
I've considered porting the whole of XMLTV over to DateTime but currently
lack the time for such a big project.
Post by Adam Sutton
I'm going to try and spend some time making my _uk_rt mod DST aware, for
the time being I'm focusing on mods directly in that code, rather than
creating a replacement time processing lib that can be used by the other
tv_grab routines. But that shouldn't be too tricky.
May I suggest looking at DateTime? It's working well for advanced stuff
like creating sets of time spans to cut and merge time-sharing channels
(see https://github.com/dekarl/nonametv/blob/master/lib/NonameTV/Importer/Combiner.pm#L439).
Yeah, I started using DateTime for a "rough" reworking. Unfortunately my
Perl is VERY rusty, and while my implementation was much faster than the
original, it's nowhere near as quick as my from-scratch Python
implementation.
I didn't see any obvious automatic timezone calculation in DateTime;
however, the rules for determining DST are relatively simple if you know
whether or not they apply for a given timezone, which we do for the Radio
Times, since all times are given in UK local time.
Having local time in the source data is always going to create
ambiguities; I've had this problem in the past with other things. Without
comparison to other info (i.e. the preceding and following programmes) it's
simply not possible to be 100% accurate in determining the correct
time when moving from DST to non-DST (due to the nasty repeating hour).
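
One way to use the neighbouring programmes: listings come in broadcast order, so when a UK local start time is ambiguous, pick the earliest UTC reading that is not before the previous programme's start. The Python below is a hypothetical sketch of that heuristic, with the October 2011 changeover hard-coded as an assumption. It is neither Adam's script nor what tv_grab_uk_rt does.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption for this sketch: BST ended at 01:00 UTC on 30 October 2011.
CHANGEOVER = datetime(2011, 10, 30, 1, 0, tzinfo=timezone.utc)

def utc_readings(local: datetime) -> List[datetime]:
    """Every UTC instant a naive UK wall time can denote around that changeover."""
    readings = []
    as_bst = (local - timedelta(hours=1)).replace(tzinfo=timezone.utc)
    if as_bst < CHANGEOVER:      # BST applies only before the switch
        readings.append(as_bst)
    as_gmt = local.replace(tzinfo=timezone.utc)
    if as_gmt >= CHANGEOVER:     # GMT applies from the switch onwards
        readings.append(as_gmt)
    return readings

def resolve_in_order(local_starts: List[datetime]) -> List[datetime]:
    """Pick, for each start, the earliest reading not before the previous start."""
    resolved: List[datetime] = []
    for local in local_starts:
        options = utc_readings(local)
        if resolved:
            # Fall back to all readings if the listing is out of order.
            options = [t for t in options if t >= resolved[-1]] or options
        resolved.append(options[0])
    return resolved

starts = [datetime(2011, 10, 30, h, m)
          for h, m in [(0, 45), (1, 15), (1, 45), (1, 15), (1, 45), (2, 30)]]
for local, utc in zip(starts, resolve_in_order(starts)):
    print(local.strftime("%H:%M local"), "->", utc.strftime("%d %H:%M UTC"))
```

With these starts, the second 01:15 and 01:45 are placed in the GMT pass of the repeated hour. The heuristic still fails if a listing is out of order or a programme around the switch is missing, which matches the point that 100% accuracy is not possible.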
I've noted that even the existing _uk_rt script does not actually handle
the DST changeovers properly, whereas my implementation does. It tends to
output the end time with the same UTC offset as the start, even if the end
is no longer subject to DST. I.e. it might output 0205 +0100 when strictly
it's 0105 +0000, but both represent the same UTC instant, so the time
itself is at least correct.
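
The difference between the two outputs comes down to choosing the UTC offset per instant, rather than reusing the start's offset for the end time. The Python below hard-codes the October 2011 UK changeover as an assumption and uses a hypothetical `xmltv_uk` helper. It produces both representations of the same instant.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: BST ended at 01:00 UTC on 30 October 2011.
CHANGEOVER = datetime(2011, 10, 30, 1, 0, tzinfo=timezone.utc)
BST = timezone(timedelta(hours=1))
GMT = timezone.utc

def xmltv_uk(utc: datetime) -> str:
    """Format a UTC instant as XMLTV time with the UK offset in force at that instant."""
    zone = BST if utc < CHANGEOVER else GMT
    return utc.astimezone(zone).strftime("%Y%m%d%H%M%S %z")

start = datetime(2011, 10, 30, 0, 35, tzinfo=timezone.utc)  # 01:35 BST
stop = start + timedelta(minutes=30)                        # after the switch

start_offset_reused = stop.astimezone(BST).strftime("%Y%m%d%H%M%S %z")
print(xmltv_uk(start), xmltv_uk(stop), start_offset_reused)
```

Here `xmltv_uk(stop)` gives `20111030010500 +0000`, while reusing the start's offset gives `20111030020500 +0100`. Both denote 01:05 UTC.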
Post by Karl Dietz
Post by Adam Sutton
My intention will be to provide a patch once I actually have something
that works properly and includes all time checks etc... I'm happy to
also provide my own python implementation, but to be honest I'd
personally rather use the xmltv scripts (with mods) since these are well
used/tested etc... However I may do some more work on my python scripts
as it could form a useful testing ground (as I better understand the
code) and a useful comparison.
I'm not using the on-air guide since I only seem to be getting the
now/next info, plus I believe EIT is only 7 days anyway? I'd rather have
the full 2weeks if possible.
We have up to 4 weeks of DVB-EIT on satellite/cable in Germany. I don't
know how far into the future the Freesat/Freeview MHEG guide goes.
I had been told there was a 7-day EPG on the air, but it wasn't
automatically detected by tvheadend. And in the grand scheme of things
(i.e. getting a working replacement for my existing standalone satellite
PVR that won't get me in trouble with the wife), EPG performance is low
down the list.
However, I thought I could quickly contribute something useful in this area.
Post by Karl Dietz
PS: EIT is explicit about the time ;)
No idea :(
Nick Morrott
2012-02-08 16:54:31 UTC
Post by Nick Morrott
Post by John Veness
Post by Adam Sutton
I've just entered the world of DVB and thus started using xmltv,
specifically tv_grab_uk_rt. I was surprised by how long this took to
process the channel listing. For approx 60 channels on my laptop it
takes about 15min, on my DVB box that's more like 60min! And that's with
100% CPU load.
If another data point is required, on my DVB-T (i.e. Freeview) setup,
tv_grab_uk_rt takes about 20 minutes. I'm sure a year or two ago it was
only 3 or 4 minutes, but I cannot remember version numbers etc. It would
be interesting to find where the bottlenecks are.
I'll have a look with Devel::NYTProf, unless someone else beats me to
it and posts the results. On my slow MythTV backend built back in 2006
(AMD 3800), it takes 15 minutes to grab/process data for the 80
channels I have configured. This is with debugging enabled (which
slows it down further).
Profiling revealed how (very) slow the various Date::Manip calls are.
Everything else (including the title processing and UTF-8 fixup
routines) was not causing any problems.

I have replaced all of the DM-based time handling routines in uk_rt
with DateTime versions and have eliminated all but one call to
XMLTV::DST::utc_offset - and that functionality can probably be
replaced quite easily. The completion time for my 78 channels has
since reduced from 15 minutes to 2 minutes on a 2006-vintage machine.
I still need to fully test the timezone and DST handling with
real-world data, but the script currently creates the exact same XML
output as my production server does running the slower
Date::Manip-based code, so it's looking hopeful.

I have also moved the existing UTF-8 fixup handling to a configuration
option - the source data should be clean now since the MB takeover of
the feed, but I'll leave it in there for now "just in case". I may
well need to add back some of the UTF-8 handling to make
quotes/apostrophes consistent.

I'll prob commit these DateTime updates to CVS tomorrow, so other
uk_rt users can test it. Apologies for not having investigated this
sooner.

Cheers,
Nick
