
Building a media info scraper and MKV tagger

XBMC uses scrapers to look up information about movies and TV shows online, so that the media center can show more than bare file names for the items in the library. Take the following image for example:

Detailed movie information provided by themoviedb.org via XBMC's scraper

Without the scraper you would only see the file name, and even that would probably look weird (MakeMKV, for instance, saves files with names full of underscores). So by downloading metadata from the internets you get a more professional-looking library, and you can even start browsing through your media using that metadata (listing all action movies and so on).

In music files you generally add all this info to the ID3 tag (for MP3s, at least), and can even see that metadata in Windows Explorer. Movie files don’t enjoy the same level of support. MP4 files can carry metadata in the “XMP” system, and MKV has its own tagging specification, usually edited through XML files, that doesn’t seem to be finished yet. XBMC’s scraper doesn’t save any of the downloaded information to the files either, meaning the data can’t really be seen anywhere else.

As a small project aimed at teaching myself some Python I’ve decided to code a scraper for both themoviedb.org and thetvdb.com. Here I get to learn how to best handle both JSON and XML, and how to handle web requests. I then plan on making a tagger tool that tags an MKV file with the downloaded metadata. This teaches me how to call other programs from Python, and how to build XML files.
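A minimal sketch of how the two halves could fit together, under stated assumptions: the JSON sample and its field names (title, release_date, overview, genres) are illustrative rather than a guaranteed match for what themoviedb.org returns, all function names are hypothetical, and the tagging step assumes MKVToolNix’s mkvpropedit is installed and on the PATH. This is not the project’s actual code.

```python
import json
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET

# Illustrative response body. The field names are assumptions, not a
# guaranteed match for the real themoviedb.org API.
SAMPLE_JSON = """{"title": "Example Movie", "release_date": "2010-07-16",
"overview": "A short plot summary.",
"genres": [{"name": "Action"}, {"name": "Thriller"}]}"""


def movie_to_tags(movie):
    """Map a decoded movie record to (Matroska tag name, value) pairs."""
    pairs = [
        ("TITLE", movie.get("title")),
        ("DATE_RELEASED", movie.get("release_date")),
        ("SUMMARY", movie.get("overview")),
    ]
    pairs += [("GENRE", genre["name"]) for genre in movie.get("genres", [])]
    # Drop fields that were missing or empty in the response.
    return [(name, value) for name, value in pairs if value]


def build_tags_xml(pairs, target_type_value=50):
    """Build a Matroska tags document; 50 is the movie/episode level."""
    root = ET.Element("Tags")
    tag = ET.SubElement(root, "Tag")
    targets = ET.SubElement(tag, "Targets")
    ET.SubElement(targets, "TargetTypeValue").text = str(target_type_value)
    for name, value in pairs:
        simple = ET.SubElement(tag, "Simple")
        ET.SubElement(simple, "Name").text = name
        ET.SubElement(simple, "String").text = value
    return ET.tostring(root, encoding="unicode")


def mkvpropedit_command(mkv_path, xml_path):
    """Argument list that replaces all tags in the file with those in xml_path."""
    return ["mkvpropedit", str(mkv_path), "--tags", "all:" + str(xml_path)]


def tag_file(mkv_path, xml_text):
    """Write the XML to a temporary file and hand it to mkvpropedit."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".xml", delete=False, encoding="utf-8"
    ) as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n' + xml_text)
        xml_path = handle.name
    try:
        # check=True raises CalledProcessError if mkvpropedit fails.
        subprocess.run(mkvpropedit_command(mkv_path, xml_path), check=True)
    finally:
        os.remove(xml_path)


movie = json.loads(SAMPLE_JSON)
tags_xml = build_tags_xml(movie_to_tags(movie))
command = mkvpropedit_command("movie.mkv", "tags.xml")
```

Calling tag_file("movie.mkv", tags_xml) would then write the tags. It is left uncalled here because it modifies the file in place and needs mkvpropedit to be present.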

The goal is to add metadata to MKV files, but without media players that can actually read said metadata there won’t be much of a point to the project. Therefore I plan on looking into possible ways of having XBMC “scrape” metadata from the files themselves, either via an addon or by modifying the source code.
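Reading the metadata back is the other half of that idea. The sketch below only covers the parsing step, assuming the tags have already been exported from the file as XML in the Tags/Tag/Simple layout; the sample document and the function name read_simple_tags are hypothetical, and how XBMC itself would hook into this is left open.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tags document in the Tags/Tag/Simple layout.
SAMPLE_TAGS = """<Tags>
  <Tag>
    <Targets><TargetTypeValue>50</TargetTypeValue></Targets>
    <Simple><Name>TITLE</Name><String>Example Movie</String></Simple>
    <Simple><Name>GENRE</Name><String>Action</String></Simple>
    <Simple><Name>GENRE</Name><String>Thriller</String></Simple>
  </Tag>
</Tags>"""


def read_simple_tags(xml_text):
    """Collect every Name/String pair into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for simple in root.iter("Simple"):
        name = simple.findtext("Name")
        value = simple.findtext("String")
        if name and value:
            # A name such as GENRE may appear several times, so keep a list.
            found[name].append(value)
    return dict(found)


metadata = read_simple_tags(SAMPLE_TAGS)
```

Keeping a list per tag name means repeated entries such as multiple genres survive the round trip, which is what browsing by genre would need.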

The project’s home is on GitHub. For my own benefit I will follow up with short articles detailing design choices, interesting Python tricks and so on.