[Xmltv-devel] XMLTV Data Structure
Adam Sutton
2012-06-13 10:41:51 UTC
All,

Firstly, apologies if this makes it to the list twice; I realised I had
posted from the wrong email address.

I've recently been introduced to the world of DVB. I decided to replace my
Sky subscription with a home brew DVB solution, due to a) I never watch
anything but the FTA channels and b) XBMC finally got proper PVR support.

I've subsequently started doing lots of work on handling EPG data and
improving the capabilities within tvheadend (my DVB backend of choice). My
original involvement was to point out some of the horrible inefficiencies
in the tv_grab_uk_rt script, which I was able to demonstrate could be made
about 30-60 times faster. I think some of these suggestions have made it
into the script, though I've not looked as I haven't used it for a while.

Having already written my own proof of concept scraper script (to compare
performance) it was a logical step to continue this and generally improve
it and make something of it. As a result I learned that the _uk_rt script
got its data from a service now provided by metabroadcast.com. And this in
turn provided a much more structured API. So I started to integrate
directly with that. Again I've seen mentions of the _uk_rt script being
updated in a similar way on the MB forum.

I managed to get a script working that would grab from MB and output in
XMLTV compatible format, however I found this quite limiting as much of the
really useful information (about the underlying structure, links etc..) was
lost. So I added some additional fields which I imaginatively called eXMLTV
:) And at the same time I created my own XML format that was completely
different and allowed me to properly replicate the more structured data.

In turn my updates to tvheadend focus on using this more structured
approach to allow more complex tasks to be easily achieved.

Where I'm going with this (rather long winded email) is that I now have a
format which is completely incompatible with XMLTV but that I believe
provides a much better basis for EPG data (sorry if that sounds really
arrogant). And I'm wondering if there is any likelihood of working together
to improve XMLTV.

Generally speaking I'm not interested in creating a competitor to XMLTV, I
just don't think I've got what it takes. You already have the market
cornered so to speak and currently my script is only useful for UK freesat
data (and currently even that requires a key from MB). However I think that
where the upstream data is in a more structured format (I'm assuming other
such sources exist) it seems sensible to try and use this data.

Currently my changes in TVH still work with XMLTV; they simply result in a
less useful database (since the full structure isn't available). I imagine
this is unlikely to go away, as most people simply won't have access to the
richer data. But where possible it can still be "massaged".

With regard to how my data is actually structured, I don't currently have
a DTD etc., but I've uploaded a quick sample at
https://github.com/downloads/adamsutton/PyEPG/epg.xml.gz and, if you're
interested in the code, it's at https://github.com/adamsutton/PyEPG

Anyway I look forward to any input :)

Regards
Adam
Karl Dietz
2012-07-15 09:33:38 UTC
Hi Adam,

nice to see you over here, too :)

On 13.06.2012 12:41, Adam Sutton wrote:
...
Post by Adam Sutton
Where I'm going with this (rather long winded email) is that I now have
a format which is completely incompatible with XMLTV but that I believe
provides a much better basis for EPG data (sorry if that sounds really
arrogant). And I'm wondering if there is any likelihood on working
together to improve XMLTV.
I do think there is a place for "XMLTV schema - the next generation",
likely modeled to support use cases that are also supported by
TV-Anytime.
But to get the available upstream data into such a complex schema you
have to match transmitted programs to a shared database which does not
yet exist with a license usable for free. (Insert rant about missing
TVBrainz here :)
Post by Adam Sutton
Generally speaking I'm not interested in creating a competitor to XMLTV,
I just don't think I've got what it takes. You already have the market
cornered so to speak and currently my script is only useful for UK
freesat data (and currently even that requires a key from MB). However I
think that where the upstream data is in a more structured format (I'm
assuming other such sources exist) it seems sensible to try and use this
data.
(talking about german and nordic stations now)
The upstream data that I've seen is mostly created for human
consumption. Sometimes a proper schema exists but is not really used,
but most of the time the schema is lacking for machine consumption.
Post by Adam Sutton
With regards to how my data is actually structured, I don't currently
have a DTD etc... but I've uploaded a quick sample
@https://github.com/downloads/adamsutton/PyEPG/epg.xml.gz. And if you're
I have taken a quick look and saw nothing that can't be done with the
current XMLTV schema. (and creative use of the episode-num for
series_id and program_id) Did I overlook something or is the difference
in the database normalization?
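
A minimal sketch of the episode-num approach mentioned above, assuming the
identifiers are carried in extra episode-num elements whose "system"
attribute names them. The system values "series_id" and "program_id", the
helper names, and the sample values are hypothetical, not something XMLTV
or PyEPG defines:

```python
import xml.etree.ElementTree as ET


def programme_with_ids(channel, start, stop, title, series_id, program_id):
    """Build an XMLTV-style <programme> carrying two identifier fields."""
    prog = ET.Element(
        "programme", {"start": start, "stop": stop, "channel": channel}
    )
    ET.SubElement(prog, "title").text = title
    # Hypothetical "system" names: consumers that do not know them
    # would have to ignore these elements.
    for system, value in (("series_id", series_id), ("program_id", program_id)):
        num = ET.SubElement(prog, "episode-num", {"system": system})
        num.text = value
    return prog


def read_ids(prog):
    """Return a {system: value} mapping for every episode-num element."""
    return {e.get("system"): e.text for e in prog.findall("episode-num")}


prog = programme_with_ids(
    "bbc1.example", "20120613104100 +0000", "20120613111100 +0000",
    "Example Show", "s-100", "p-4242",
)
print(ET.tostring(prog, encoding="unicode"))
print(read_ids(prog))
```

The identifiers survive a round trip, but any link between programmes only
exists by convention between the writer and the reader of the file.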

Regards,
Karl
Adam Sutton
2012-07-15 13:45:45 UTC
Hi Karl,

Where else have you seen me?
Post by Karl Dietz
Hi Adam,
nice to see you over here, too :)
...
Post by Adam Sutton
Where I'm going with this (rather long winded email) is that I now have
a format which is completely incompatible with XMLTV but that I believe
provides a much better basis for EPG data (sorry if that sounds really
arrogant). And I'm wondering if there is any likelihood on working
together to improve XMLTV.
I do think there is a place for "XMLTV schema - the next generation",
likely modeled to support use cases that are also supported by
TV-Anytime.
But to get the available upstream data into such a complex schema you
have to match transmitted programs to a shared database which does not
yet exist with a license usable for free. (Insert rant about missing
TVBrainz here :)
That's not true, at least not completely. Take for example the data feed
used by the UK radio times script. This feed is currently provided by the
Atlas system (metabroadcast.com) indirectly, via the old (soon to be
deprecated) CSV like format previously provided by the Radio Times.

Strictly this service is not yet free, at least via the Atlas API (though
PA, the upstream provider, do give permission to create the existing RT
XMLTV feed). But metabroadcast are working hard to change this and it will
be available free (to a certain extent, images etc may be restricted). This
is the feed that I currently use.

It's a fully hierarchical model as discussed. And I'm aware that other
providers do have access to such sources, even if they're not yet free.
Post by Karl Dietz
Post by Adam Sutton
Generally speaking I'm not interested in creating a competitor to XMLTV,
I just don't think I've got what it takes. You already have the market
cornered so to speak and currently my script is only useful for UK
freesat data (and currently even that requires a key from MB). However I
think that where the upstream data is in a more structured format (I'm
assuming other such sources exist) it seems sensible to try and use this
data.
(talking about german and nordic stations now)
The upstream data that I've seen is mostly created for human
consumption. Sometimes a proper schema exists but is not really used,
but most of the time the schema is lacking for machine consumption.
Post by Adam Sutton
With regards to how my data is actually structured, I don't currently
have a DTD etc... but I've uploaded a quick sample
@https://github.com/downloads/adamsutton/PyEPG/epg.xml.gz. And if you're
I have taken a quick look and saw nothing that can't be done with the
current XMLTV schema. (and creative use of the episode-num for
series_id and program_id) Did I overlook something or is the difference
in the database normalization?
Data normalization is beneficial: it allows greater detail without lots of
duplication. And yes, of course you can use a "free for all" field to
describe anything you like. The point is that having a more structured way
to describe the data in an expressive form, as I've suggested, would
be preferable. Note: my early attempts to use the Atlas feed used an
XMLTV-like format (with some minor tweaks) to make it easy to feed into
TVH, which already had XMLTV support. However, as I'm now the maintainer of
the TVH EPG (or at least will be in the coming weeks), I've already
transitioned it to a hierarchical model as described.
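
To illustrate the normalization point, here is a rough sketch of a
hierarchical model in which descriptive data is stored once and each
airing only refers to it. The class and field names (Series, Episode,
Broadcast, Guide) are hypothetical and are not taken from PyEPG or
tvheadend:

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    id: str
    title: str


@dataclass
class Episode:
    id: str
    series_id: str  # link up to the owning series
    title: str
    summary: str    # stored once, however often the episode airs


@dataclass
class Broadcast:
    episode_id: str  # link to the episode being shown
    channel: str
    start: str


@dataclass
class Guide:
    series: dict = field(default_factory=dict)
    episodes: dict = field(default_factory=dict)
    broadcasts: list = field(default_factory=list)

    def airings_of_series(self, series_id):
        """Follow broadcast -> episode -> series links."""
        return [
            b for b in self.broadcasts
            if self.episodes[b.episode_id].series_id == series_id
        ]


guide = Guide()
guide.series["s1"] = Series("s1", "Example Show")
guide.episodes["e1"] = Episode("e1", "s1", "Pilot", "Synopsis stored once")
guide.broadcasts.append(Broadcast("e1", "chan-a", "2012-06-13T20:00"))
guide.broadcasts.append(Broadcast("e1", "chan-b", "2012-06-14T01:00"))
print(guide.airings_of_series("s1"))
```

In a flat format each airing would repeat the title and synopsis; here a
repeat costs one small Broadcast record, and "find every airing of this
series" becomes a lookup rather than a title match.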

Feeds that simply cannot/do not provide such a model can easily be inserted
into the more expressive model with certain links either removed or implied
as best as possible. And that's the key point: XMLTV does not allow me to
express the data that I have (without either adding fields, which is
non-compliant, and/or using the get-out-of-jail-free card that is an
uncontrolled field, namely episode-num).
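
As a rough sketch of inserting a flat feed into a more expressive model
with links "implied as best as possible": the function below derives a
series key from the title and an episode key from the sub-title, and
leaves the episode link empty when there is nothing to go on. The store
layout, the key scheme and the record fields are all assumptions made for
illustration, and title matching is only a heuristic that can wrongly
merge unrelated programmes sharing a name:

```python
def insert_flat(store, rec):
    """Insert one flat programme record, implying series/episode links."""
    # Implied series link: programmes with the same title share a series.
    series_key = "implied:" + rec["title"].strip().lower()
    store["series"].setdefault(series_key, {"title": rec["title"]})

    # Implied episode link: only possible when a sub-title is present.
    episode_key = None
    sub_title = rec.get("sub_title")
    if sub_title:
        episode_key = series_key + "/" + sub_title.strip().lower()
        store["episodes"].setdefault(
            episode_key, {"series": series_key, "title": sub_title}
        )

    # Every record becomes a broadcast; a missing link is simply None.
    store["broadcasts"].append({
        "series": series_key,
        "episode": episode_key,
        "channel": rec["channel"],
        "start": rec["start"],
    })
    return store


store = {"series": {}, "episodes": {}, "broadcasts": []}
insert_flat(store, {"title": "Example Show", "sub_title": "Pilot",
                    "channel": "chan-a", "start": "20120613200000 +0000"})
insert_flat(store, {"title": "Example Show", "sub_title": "Pilot",
                    "channel": "chan-b", "start": "20120614010000 +0000"})
insert_flat(store, {"title": "News",
                    "channel": "chan-a", "start": "20120613220000 +0000"})
print(len(store["series"]), len(store["episodes"]), len(store["broadcasts"]))
```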

I don't expect anything to happen any time soon. I was just suggesting what
I think is a superior model (I would say that ;) ), one that is more like
what many of the original broadcasters and large EPG aggregators use, even
if that data isn't currently freely available.

Regards
Adam
