Post by h***@gmail.com
Revised proposals for discussion.
Thanks, Geoff, for the revision.
For my taste, it spends too much space explaining what "system" is and
what it isn't, but doesn't sufficiently nail down what we *do* expect.
Most importantly, the concept of stability is missing or not
sufficiently clear, yet it's central to the idea of a CID.
I like that you're predefining some well-known sources.
I incorporated some of your proposal B into my earlier definition, so
here's proposal C:
<!ELEMENT programme (title+, sub-title*, desc*, credits?, date?,
category*, keyword*, language?, orig-language?, length?, icon*, url*,
country*, episode-num*, video?, audio?, previously-shown?, premiere?,
last-chance?, new?, subtitles*, rating*, star-rating*, review*, cid*)>
<!ELEMENT cid (#PCDATA)>
<!ATTLIST cid system CDATA #REQUIRED>
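To illustrate how a downstream application might read these declarations, here is a minimal Python sketch using the standard library's ElementTree. It assumes the `system` attribute is always present and treats each CID as a `(system, value)` pair. The channel id `example.channel` and the start times are hypothetical; the CID values are the ones from the "Common systems" examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMLTV fragment using the proposed <cid> element.
SAMPLE = """<tv>
  <programme start="20240103201500 +0000" channel="example.channel">
    <title>James Bond 007 - Die Another Day</title>
    <cid system="IMDb">tt0246460</cid>
  </programme>
  <programme start="20240102220000 +0000" channel="example.channel">
    <title>Die Hard</title>
    <cid system="themoviedb">1571</cid>
  </programme>
</tv>"""


def cid_keys(programme):
    """Return the (system, value) pairs of all <cid> children of a <programme>."""
    keys = []
    for cid in programme.findall("cid"):
        system = cid.get("system")
        value = (cid.text or "").strip()
        if system is None or not value:
            # A value without its system is not usable, so skip it.
            continue
        keys.append((system, value))
    return keys


for prog in ET.fromstring(SAMPLE).findall("programme"):
    print(prog.findtext("title"), cid_keys(prog))
```

Keeping the system and value together as one key means the value alone is never used for lookups.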
<!-- CID : Content Identifier
This is an identifier that uniquely identifies some 'content' among
all the programmes for this grabber. A CID may refer to a film, an
episode of a series, or, e.g., a news or sports broadcast.
The CID is an opaque string, and only valid within a "system" that you
specify. The system is generally your grabber source, so the CIDs only
need to be unique for your grabber. See below for further explanation.
If the video content is the same, the CID should be the same, even if
it is broadcast at a different time. If the video content is different,
the CID must be different.
1. Unique
There must never ever be 2 different pieces of content with the same CID.
This means that a CID can only refer to an episode, not a whole series
or other grouping, because every episode would have the same ID, but
different content, and that's not unique anymore. It also means that an
episode number like "209" cannot be a CID, because several series have
an episode numbered "209", but the content is different, so again the
number is not unique.
'Content' refers to what the end user actually wants to see, e.g. the
movie or news. Non-primary content like advertising should be ignored
and is not considered different content. OTOH, each day's 8PM evening
news must get a different CID, because it's new for the user.
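A grabber author might check the uniqueness rule against a sample of their own output, as in the minimal Python sketch below. It assumes a hypothetical content fingerprint supplied by the caller (for example, something derived from the source's own notion of "same video"), and flags every `(system, value)` key that was seen with more than one distinct content. The airings are made-up data modelled on the episode "209" case.

```python
from collections import defaultdict


def non_unique_cids(airings):
    """Return the CID keys that were seen with more than one distinct content.

    `airings` is an iterable of (system, value, content) tuples; `content` is a
    hypothetical fingerprint of what the user actually sees.
    """
    contents = defaultdict(set)
    for system, value, content in airings:
        contents[(system, value)].add(content)
    return {key for key, seen in contents.items() if len(seen) > 1}


# A grabber wrongly using the episode number as CID (made-up data).
airings = [
    ("tvdata.com", "209", ("Series A", "Episode 209 of A")),
    ("tvdata.com", "209", ("Series B", "Episode 209 of B")),
]
print(non_unique_cids(airings))  # {('tvdata.com', '209')}
```

An empty result only means no violation was seen in this sample; it cannot prove the IDs are unique.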
2. Stable
There must never be the same content with 2 different IDs.
If the video content is the same, the ID should be the same, even if
it is broadcast at a different time.
I.e. if the same video is broadcast again later that night, the next
day or in the coming weeks, the ID MUST be the same. If the same video
is broadcast again 3 years later, it SHOULD have the same ID as 3 years
before.
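Stability is the mirror image of uniqueness: within one system, the same content must not receive two CID values. The minimal Python sketch below assumes the same hypothetical caller-supplied content fingerprint (here simply the title) and reports each `(system, content)` pair that got more than one value. The per-airing IDs are made up to mimic a web-link programme ID.

```python
from collections import defaultdict


def unstable_contents(airings):
    """Return the (system, content) pairs that received more than one CID value.

    `airings` is an iterable of (system, value, content) tuples; `content` is a
    hypothetical fingerprint of the video supplied by the caller.
    """
    values = defaultdict(set)
    for system, value, content in airings:
        values[(system, content)].add(value)
    return {key for key, seen in values.items() if len(seen) > 1}


# A per-airing web link ID used as CID: one film, two IDs (made-up data).
airings = [
    ("tvdata.com", "48213", "Die Hard"),
    ("tvdata.com", "48977", "Die Hard"),
]
print(unstable_contents(airings))  # {('tvdata.com', 'Die Hard')}
```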
It is unlikely that a grabber will have enough information available
to create its own CID; it is therefore expected that only CIDs from the
data source will be used.
For example, an IMDb ID will be a good and probably very stable CID. A
programme ID used in a web link on the source website is probably
different for every airing, so it's not 'stable', and it's NOT
acceptable as a CID. If your source provides an ID for the movie, and
it stays the same for all airings of that movie, it can be used as a CID.
If any of the above criteria are not met by your IDs, you MUST NOT use
the <cid> tag.
Use cases
CIDs can be used for e.g.:
* database key, to create a normalized database where each film has one
entry, and the user can see all airings of that film
* duplicate detection ("I have already recorded/seen this") while
creating recording schedules for PVRs and similar devices.
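Both use cases can be sketched in a few lines of Python, assuming the airings have already been reduced to `(system, value, start)` tuples with comparable `datetime` start times. `group_airings` builds the "one entry per film" index, and `to_record` is a hypothetical PVR helper that picks the earliest airing of each CID not yet recorded. Scheduling only the earliest airing is an illustrative policy, not part of the proposal.

```python
from collections import defaultdict
from datetime import datetime


def group_airings(airings):
    """Map each (system, value) CID key to the start times of its airings."""
    index = defaultdict(list)
    for system, value, start in airings:
        index[(system, value)].append(start)
    return dict(index)


def to_record(airings, already_recorded):
    """Pick the earliest airing of each CID key that is not yet recorded."""
    chosen = {}
    for system, value, start in sorted(airings, key=lambda a: a[2]):
        key = (system, value)
        if key in already_recorded or key in chosen:
            continue  # duplicate: recorded before, or an earlier airing is chosen
        chosen[key] = start
    return chosen


airings = [
    ("themoviedb", "1571", datetime(2024, 1, 5, 20, 15)),
    ("themoviedb", "1571", datetime(2024, 1, 2, 22, 0)),
    ("IMDb", "tt0246460", datetime(2024, 1, 3, 20, 15)),
]
print(group_airings(airings))
print(to_record(airings, already_recorded={("IMDb", "tt0246460")}))
```

Both functions silently do the wrong thing if the CIDs are not unique and stable, which is exactly the failure list that follows.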
If the above criteria of uniqueness and stability are not met, then the
downstream application will run into serious problems:
* showing the wrong title and description
* recording the wrong shows, e.g. recording the documentary called
"Titanic" instead of the movie
* massive duplication and waste of disk space by re-recording shows
* not recording shows that should be recorded
system attribute
The "system" attribute defines the source of the CID. A "system" could
be the data source's own ID scheme, or a third-party database. This
enables the downstream application to uniquely identify content across
multiple grabbers that might otherwise use the same CID value.
The combination of CID "system" + CID value MUST be globally unique and
refer to one, and only one, content. But the same CID value may refer
to different content, provided that the "system" attribute is
different.
CIDs guarantee their properties only within the "system", so grabbers
only need to ensure that the CID is unique and stable within their
system.
It is suggested you make the "system" the internet domain name of the
data source, e.g. system="tvdata.com". If you use a third party
database, check the predefined systems below.
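A grabber could derive the suggested domain-name "system" from the source URL it already uses. The Python sketch below is a hypothetical helper built on `urllib.parse.urlsplit`; dropping a leading "www." is an assumption made here so that grabbers configured with slightly different URLs for the same source end up with the same system value.

```python
from urllib.parse import urlsplit


def system_from_url(source_url):
    """Derive a "system" value from the data source's URL (hypothetical helper)."""
    host = urlsplit(source_url).hostname  # lowercased, without port
    if not host:
        raise ValueError(f"no host name in {source_url!r}")
    # Assumption: drop a leading "www." so that grabbers configured with
    # "www.tvdata.com" and "tvdata.com" end up with the same system.
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(system_from_url("https://www.tvdata.com/listings"))  # tvdata.com
```

Predefined third-party systems such as "IMDb" keep their fixed spelling and are not derived this way.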
A CID SHOULD be consistent across multiple grabbers using the same data
source. When adding a new grabber, you MUST check whether a <cid>
"system" has already been defined for your data source in another
grabber and SHOULD adopt its syntax. Conversely, you MUST NOT reuse
another grabber's "system" value unless both grabbers' IDs refer to
exactly the same content, in which case they must use exactly the same
CID value for the same content.
Common systems
Certain "systems" are predefined, and any grabber adding a <cid> for one
of these particular data sources MUST use the following syntax:
IMDb
system: IMDb value: tt0246460
Please note the exact upper/lowercase spelling
e.g. <cid system="IMDb">tt0246460</cid>
<title>James Bond 007 - Die Another Day</title>
themoviedb
system: themoviedb value: 1571
e.g. <cid system="themoviedb">1571</cid>
<title>Die Hard</title>
thetvdb
system: thetvdb value: seriesid + "/" + id
TODO seriesid necessary or id alone sufficient?
e.g. <cid system="thetvdb">71470/46568</cid>
<title>Star Trek: TNG</title><sub-title>Deja Q</sub-title>
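A consumer or a grabber's own test suite might validate values for the predefined systems listed so far. The Python sketch below infers the value patterns from the examples above (IMDb "tt" plus digits, themoviedb digits, thetvdb seriesid "/" id as currently written), which is an assumption rather than a published grammar. It also treats the system name as case-sensitive, so "imdb" is rejected.

```python
import re

# Value syntax per predefined system, inferred from the examples above.
VALUE_PATTERNS = {
    "IMDb": re.compile(r"tt\d+"),
    "themoviedb": re.compile(r"\d+"),
    "thetvdb": re.compile(r"\d+/\d+"),  # seriesid + "/" + id
}


def check_predefined(system, value):
    """Return False if a predefined system name or its value syntax is wrong."""
    for name, pattern in VALUE_PATTERNS.items():
        if system == name:
            return pattern.fullmatch(value) is not None
        if system.lower() == name.lower():
            # e.g. "imdb" instead of "IMDb": the system name is case-sensitive.
            return False
    return True  # not a predefined system; only the general rules apply


print(check_predefined("IMDb", "tt0246460"))  # True
print(check_predefined("imdb", "tt0246460"))  # False
```

If the thetvdb TODO is resolved in favour of the id alone, only that one pattern would need to change.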
Atlas
http://atlas.metabroadcast.com/
system: Atlas value: 8hmr
e.g. <cid system="Atlas">b00779vr</cid>
<title>Star Trek</title><sub-title>The Immunity Syndrome</sub-title>