Wdtvscraper - a media scraper for Linux users

I’m making a media scraper because I couldn’t find one that works on Linux. It’s working fine for me now, but I want to see how it works in the wild, so I need some help testing. If you are interested, you can get it three ways:

Download source from git:

git clone git://github.com/rig9919/wdtvscraper.git

Download source tarball:


Download latest debian binary package:


You can read how to use it with:

wdtvscraper -h

If retrieving from git or the tarball, please note that ImageMagick and the Python Imaging Library (PIL) are required, as well as Python itself. It is tested with Python 2.7.2.
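Because neither ImageMagick nor PIL is bundled, a startup check can report a missing tool before any scraping starts. The sketch below covers the ImageMagick side only. It assumes the scraper shells out to ImageMagick’s `convert` command (the thread does not say which ImageMagick program it calls), and `find_missing_tools` is a hypothetical helper name, not part of wdtvscraper:

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7 fallback

# Assumption: "convert" is the ImageMagick executable the scraper relies on.
REQUIRED_TOOLS = {"convert": "ImageMagick"}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the package names whose executables are not found on PATH."""
    return sorted(pkg for exe, pkg in tools.items() if which(exe) is None)
```

If the returned list is non-empty, the caller could pass a message naming the missing packages to `sys.exit()` instead of failing halfway through a scan.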

This does not require moviesheets, Thumbgen, or anything like that. It creates files that are compatible with the WDTV Live and Hub devices out of the box.

Please report any problems at https://github.com/rig9919/wdtvscraper/issues or tell me in this thread.

Once I work out all the bugs, I plan to work on TV episode/series support.

UPDATE 3-3-13

TV series are supported. Use the -t option.

UPDATE 3-11-13

Debian packages are available from https://sourceforge.net/projects/wdtvscraper/files/

UPDATE 4-2-13

Confirmed it does not work with python3.
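Since Python 3 is known not to work, the scraper could refuse to start under it with a clear message rather than an obscure failure. A minimal sketch; `python_version_error` is a hypothetical helper, not part of wdtvscraper:

```python
import sys


def python_version_error(version_info=sys.version_info):
    """Return an error message if running under Python 3, else None.

    Only Python 3 is rejected, since that is the one version confirmed
    not to work; Python 2.7 is the tested version.
    """
    major, minor = version_info[0], version_info[1]
    if major >= 3:
        return ("wdtvscraper does not work with Python 3 (found %d.%d); "
                "run it with Python 2.7 instead." % (major, minor))
    return None
```

One caveat: if `scraper.py` itself contains Python 2-only syntax, Python 3 fails to compile the file before this check can run, so the guard would need to live in a small launcher whose syntax is valid in both versions.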

Great idea. Wish there was a media scraper for Mac OS X.

Awesome. I’ll give it a try later this week (I start my vacation tomorrow).

Xtreme56 wrote:

Great idea. Wish there was a media scraper for Mac OS X.

As far as I know, this project will work on Mac OS X if you have Python installed. I can’t say for sure, though, because I have no access to a Mac. I would need someone to try running it on Mac OS X to find out what happens.

Well, not so good for me. I’m guessing there’s no dependency check – I’m missing … something.

I used the tarball.

/home/tony/Downloads/wdtvscraper-master> ./scraper.py 
Traceback (most recent call last):
  File "./scraper.py", line 6, in <module>
    from local_video import LocalVideo
  File "/home/tony/Downloads/wdtvscraper-master/local_video.py", line 2, in <module>
    from PIL import Image
ImportError: No module named PIL
tony@mars Wed Dec 12 07:05 PM  
/home/tony/Downloads/wdtvscraper-master> python scraper.py -h
Traceback (most recent call last):
  File "scraper.py", line 6, in <module>
    from local_video import LocalVideo
  File "/home/tony/Downloads/wdtvscraper-master/local_video.py", line 2, in <module>
    from PIL import Image
ImportError: No module named PIL

I’m running Fedora 17, Python 2.7.3.

PIL is the module used to preview the posters. I thought it was part of the standard Python library, which is why I didn’t mention it. Check your distro’s repository for PIL and install it. Thanks for the feedback.

That did it…

/home/tony/Downloads/wdtvscraper-master> sudo yum install python-imaging
Resolving Dependencies
--> Running transaction check
---> Package python-imaging.i686 0:1.1.7-6.fc17 will be installed
--> Finished Dependency Resolution
 python-imaging i686 1.1.7-6.fc17 fedora 407 k

  python-imaging.i686 0:1.1.7-6.fc17


I still think an error handler would be a good idea in this case. I don’t think it’s necessary to expect your script to SOLVE the dependency, but it should at least error out gracefully and in a more meaningful way. Lots of Linux/Mac users (like me) won’t have a clue what that error up there meant, and will just give up.

Maybe it could say “This script requires the Python Imaging Library. Please install it and try again.”
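Following that suggestion, the PIL import could be wrapped so a missing module produces a readable message instead of an `ImportError` traceback. A minimal sketch; `require_module` is a hypothetical helper name, and using it in place of the `from PIL import Image` line in `local_video.py` (the line shown in the traceback above) is an assumption about where it would go:

```python
import importlib
import sys


def require_module(module_name, friendly_name):
    """Import module_name, or exit with a readable message instead of a traceback."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit("This script requires %s. Please install it and try again."
                 % friendly_name)
```

For example, `Image = require_module("PIL.Image", "the Python Imaging Library (PIL)")`. On Fedora the package that provides it is `python-imaging`, as the yum transcript shows.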

Couple of notes after the quick test:

  • I’m kind of a stickler for properly formatted XML… The XML generated here is all on a single line, just how the WD does it, and that’s always driven me nuts. :)
  • Might want to put a DTD declaration on the first line of the XML.
  • Might want to allow an “options” file, so that the user can set preferences such as language and country code.
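For the first two points, Python’s standard library can re-indent a generated document and write an XML declaration (`<?xml version="1.0" ...?>`, the line that normally opens an XML file; a DTD would be referenced separately with a `<!DOCTYPE>` line). A sketch, assuming a `<details>` root and child tag names chosen purely for illustration, and assuming the WDTV firmware tolerates whitespace between elements, which nobody in the thread has confirmed:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def build_details_xml(fields):
    """Build an indented metadata document from (tag, text) pairs.

    The <details> root and the child tag names are illustrative
    assumptions, not a confirmed description of the WDTV format.
    """
    root = ET.Element("details")
    for tag, text in fields:
        ET.SubElement(root, tag).text = text
    flat = ET.tostring(root, encoding="utf-8")  # everything on one line
    # toprettyxml() re-indents the tree and writes an XML declaration first.
    return minidom.parseString(flat).toprettyxml(indent="  ", encoding="utf-8")
```

The result is UTF-8 bytes that can be written straight to the `.xml` file with the file opened in binary mode.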

Otherwise, so far: nice and clean and quick!

I agree about the dependency issue. I think I’ll put the check in the install scripts when I make them, like other Linux programs do. I will look into the other things you mentioned as well.
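For the options-file suggestion, Python’s `ConfigParser` reads simple INI-style files. A sketch, assuming a hypothetical file such as `~/.wdtvscraper.conf` with a `[wdtvscraper]` section; the option names and defaults here are illustrative, not wdtvscraper’s actual settings:

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import SafeConfigParser as ConfigParser  # Python 2.7

# Hypothetical defaults; the real scraper's option names may differ.
DEFAULTS = {"language": "en", "country": "us"}


def load_options(path, section="wdtvscraper"):
    """Read user preferences from an INI-style file, falling back to DEFAULTS."""
    parser = ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    options = dict(DEFAULTS)
    if parser.has_section(section):
        for key in DEFAULTS:
            if parser.has_option(section, key):
                options[key] = parser.get(section, key)
    return options
```

A file containing a `[wdtvscraper]` header followed by `language = de` would override only the language and keep the default country; command-line flags could then override whatever the file sets.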

It’s usable if you take the time to repeat the search often. I get a lot of HTTP timeout or bad gateway errors. It works about 1 in 3 times for TV. Movies work a lot better, partly because you can use the -i option.

tabe@bofx ~/Desktop/wdtvscraper-master $ ./scraper.py -t /backup/werk/
Found series: The Wire
Did not save poster: The Wire/aaaa-series-cover.metathumb already exists
Found episode: S01E06 - The Wire == The Wire
Traceback (most recent call last):
  File "./scraper.py", line 243, in <module>
    main()
  File "./scraper.py", line 71, in main
    process_tv(args.tv_path, args.quiet, args.debug)
  File "./scraper.py", line 205, in process_tv
    episode = LocalEpisode(f, series.seriesname)
  File "/home/tabe/Desktop/wdtvscraper-master/tv_series.py", line 79, in __init__
    super(LocalEpisode, self).__init__(series_name)
  File "/home/tabe/Desktop/wdtvscraper-master/tv_series.py", line 15, in __init__
    self.series_data = self.__get_series_info(match.tvdbId)
  File "/home/tabe/Desktop/wdtvscraper-master/tv_series.py", line 46, in __get_series_info
    return longsearch.searchForLongSeries(tvdbId)
  File "/home/tabe/Desktop/wdtvscraper-master/pytvdb/longsearch.py", line 325, in searchForLongSeries
    series.updatedTime = getServerTime()
  File "/home/tabe/Desktop/wdtvscraper-master/pytvdb/longsearch.py", line 288, in getServerTime
    response = httphelper.doGetRequest(mirrorUrl, mirrorPort, serverTimePath, serverTimeParams)
  File "/home/tabe/Desktop/wdtvscraper-master/pytvdb/httphelper.py", line 15, in doGetRequest
    response = urllib2.urlopen(request)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 504: Gateway Time-out
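The traceback shows a single 504 from the TVDB mirror propagating from `httphelper.doGetRequest` all the way up to `main()`. Gateway errors like this are often transient, so retrying a few times with a growing delay is one common mitigation. A sketch of a retry wrapper that a function like `doGetRequest` could call instead of a single `urllib2.urlopen`; the helper name, attempt count, and delays are assumptions, and `opener` is a parameter only so the function can be tested without the network:

```python
import socket
import time

try:  # Python 2.7, which wdtvscraper targets
    from urllib2 import urlopen, HTTPError, URLError
except ImportError:  # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError

# Gateway/availability errors that usually clear up if you simply try again.
RETRYABLE_CODES = (502, 503, 504)


def get_with_retry(url, attempts=3, delay=2.0, opener=urlopen):
    """Open url, retrying transient failures with exponential backoff.

    Sleeps delay, 2*delay, 4*delay, ... between attempts and re-raises the
    last error once all attempts are used up. Client errors such as 404
    are re-raised immediately because retrying will not fix them.
    """
    for attempt in range(attempts):
        try:
            return opener(url)
        except HTTPError as err:  # must come before URLError, its base class
            if err.code not in RETRYABLE_CODES or attempt == attempts - 1:
                raise
        except (URLError, socket.timeout):
            if attempt == attempts - 1:
                raise
        time.sleep(delay * (2 ** attempt))
```

With the defaults it waits 2 and then 4 seconds before the second and third attempts, so a briefly overloaded server is not hit with immediate repeat requests.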


I thought there was a Mac alternative for Media Scraper based around Thumbgen?

Don’t quote me … but here’s a link:


And a Linux alternative as well, also based around Thumbgen:


Complaining here really helped, because these HTTP timeout errors are gone now :)

There is a problem with TV Media metafiles.



Maybe because episode_number is not in the XML?

Please keep supporting this.

I’m sorry for not posting any updates over the last couple of months. Since then I have added support for TV series with the -t option. If you already knew that, read on to see the latest changes to TV scanning.

In the past few days I have made some changes to include series_number and episode_number in the metadata files. This lets episodes appear in order when sorted alphabetically.

While testing that functionality, I changed the way TV episode titles are named and liked it so much that I kept it. Episodes are now prefixed with a season-and-episode identifier such as “402: The Change”, which is the episode titled “The Change”, season 4, episode 2 of the show. Right now this can’t be changed, but I am thinking of making it an option. I don’t include the show’s name because I think it looks too ugly.
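The naming scheme fits in a one-line format string. A sketch, assuming the episode number is zero-padded to two digits (as “402” for season 4, episode 2 suggests); `format_episode_title` is a hypothetical name:

```python
def format_episode_title(season, episode, title):
    """Prefix a title with a season/episode code: (4, 2, "X") -> "402: X".

    Assumes two-digit zero padding for the episode number only.
    """
    return "%d%02d: %s" % (season, episode, title)
```

One consequence when these titles are sorted as plain strings: a season 10 code such as 1001 sorts before 402, because the comparison goes character by character.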

I have also found that thetvdb.com sometimes has episode screenshots that are too large for the WDTV to display. For example, Farscape S03E01 has a 73K episode image, which now gets resized to about 30K and displays fine on the device.
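The thread doesn’t say what size limit the WDTV has or how the scraper picks the new size. One simple heuristic, sketched here as an assumption rather than wdtvscraper’s method: JPEG file size grows roughly with pixel count, so scaling each side by the square root of (target size / current size) aims at the target byte count. `scaled_size` and `max_bytes` are hypothetical names:

```python
import math


def scaled_size(width, height, current_bytes, max_bytes):
    """Estimate image dimensions that bring the file under max_bytes.

    Heuristic: file size is roughly proportional to pixel count, so each
    side is scaled by sqrt(max_bytes / current_bytes). Returns the original
    size unchanged if the file is already small enough.
    """
    if current_bytes <= max_bytes:
        return width, height
    factor = math.sqrt(float(max_bytes) / current_bytes)
    return max(1, int(width * factor)), max(1, int(height * factor))
```

For 73K down to 30K the factor is about 0.64. The returned size could be passed to PIL’s `Image.resize()` before saving; because the estimate is approximate, a real implementation would check the saved file and shrink again if it is still too large.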

TV series are also scanned a bit differently now. Instead of specifying one all-encompassing TV directory as the target, you now specify the directory of a single TV show. For example, instead of this:

./scraper.py -t /path/to/videos/tv

you now use a TV show’s directory as the target, like this:

./scraper.py -t /path/to/videos/tv/farscape
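Since each show is now passed individually, scanning a whole library means looping over the show directories. A sketch, assuming every immediate subdirectory of the TV folder is one show and that the scraper is invoked as `./scraper.py` (the Debian package installs it as `wdtvscraper`, which can be passed instead); `scrape_commands` is a hypothetical helper:

```python
import os


def scrape_commands(tv_root, scraper=("./scraper.py",), force=False):
    """Build one "<scraper> -t <show dir>" command per show directory under tv_root."""
    commands = []
    for name in sorted(os.listdir(tv_root)):
        show_dir = os.path.join(tv_root, name)
        if os.path.isdir(show_dir):  # skip stray files in the TV folder
            cmd = list(scraper) + ["-t", show_dir]
            if force:
                cmd.append("--force-overwrite")
            commands.append(cmd)
    return commands
```

Each command could then be run with `subprocess.call(cmd)`; building the argument lists separately from running them makes it easy to print and check them first.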

Yes, it takes a little more work to scan your whole library, but it cuts down on scan time, and once a show’s directory has been scraped, it usually doesn’t need to be scanned again.

Lastly, I added a --force-overwrite option, which makes the scraper overwrite any existing .metathumb and .xml files it encounters during a scan.
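The rule this option implies fits in one line. A sketch of that decision, assuming (as the “already exists” message in the earlier transcript suggests) that existing files are skipped by default; `should_write` is a hypothetical name:

```python
import os


def should_write(path, force_overwrite=False):
    """Decide whether to (re)write a .metathumb or .xml file.

    Existing files are left alone unless force_overwrite is set.
    """
    return force_overwrite or not os.path.exists(path)
```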

If you’ve already scanned your shows I highly recommend you do it again with the --force-overwrite option.

And again, if you have any questions or comments, post here or at https://github.com/rig9919/wdtvscraper/issues. I apologize for not answering questions earlier; I thought I had it set up to send me an email for every response to this thread, but apparently I didn’t. I will bookmark this thread so I can check it this time.

Latest fixes:

1.1.16: posters that are too large for the WDTV are resized
1.1.21: language is no longer ignored when scraping TV
1.1.25: series data is no longer retrieved during episode retrieval
1.1.26: the --force-overwrite option is no longer ignored when scraping movies

The changes in 1.1.25 provide a big speed increase.
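The earlier traceback shows `LocalEpisode.__init__` calling the series lookup (`__get_series_info`) from its constructor, which suggests each episode triggered its own network request for the same series. The thread doesn’t say how 1.1.25 removed that; one common approach is to fetch each series once and reuse it. A sketch, where `fetch` stands in for whatever function performs the lookup and `SeriesCache` is a hypothetical name:

```python
class SeriesCache(object):
    """Fetch each series' data once and reuse it for every episode.

    fetch is any callable that takes a TVDB series id and returns the
    series data; results are stored in memory for the rest of the run.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._data = {}

    def get(self, tvdb_id):
        if tvdb_id not in self._data:
            self._data[tvdb_id] = self._fetch(tvdb_id)  # network hit only once
        return self._data[tvdb_id]
```

A show with dozens of episodes then costs one series request instead of one per episode, which is consistent with the speedup described above.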

Debian packages are now available at https://sourceforge.net/projects/wdtvscraper/files/

To Mac Users:

Did you try HubFlow?

Here are the main updates in the past few weeks:

- added the ability to choose the movie poster and TV series poster
- fixed a bug with TV series that have non-ASCII characters
- improved multiple-language support by returning TV series search results in the language specified
- confirmed it does not work with Python 3
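The thread doesn’t say what caused the non-ASCII bug. One frequent source in Python 2 code is building a search URL from a `unicode` series name without encoding it first, since Python 2’s `urllib.quote` can fail on non-ASCII `unicode` input. A sketch of encoding the name as UTF-8 before percent-encoding it; `search_query_value` is a hypothetical helper, and whether TheTVDB expects UTF-8 here is an assumption:

```python
try:
    from urllib import quote  # Python 2.7
except ImportError:
    from urllib.parse import quote  # Python 3


def search_query_value(series_name):
    """Percent-encode a series name as UTF-8 for use in a search URL."""
    if not isinstance(series_name, bytes):
        # unicode text -> UTF-8 bytes, so quote() sees only byte values
        series_name = series_name.encode("utf-8")
    return quote(series_name)
```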

The latest package is here: http://sourceforge.net/projects/wdtvscraper/files/wdtvscraper_1.2.17-1_all.deb/download
