Compare commits

12 Commits

Author SHA1 Message Date
Ricardo Garcia
7ab2043c9c Bump version number 2010-10-31 11:23:52 +01:00
Ricardo Garcia
3e703dd1cd Remove generator and webpage template, moved to wiki 2010-10-31 11:23:52 +01:00
Ricardo Garcia
cc10940385 Fix very wrong code for setting the language
It turned out that, despite the program working without apparent errors,
the code for setting the language was completely wrong. First, it didn't
run unless some form of authentication was performed. Second, I
mistyped _LANG_URL as _LOGIN_URL, so the language was not being set at
all! Amazing it still worked.
2010-10-31 11:23:48 +01:00
Ricardo Garcia
5121ef2071 Fix wrong indentation 2010-10-31 11:23:48 +01:00
Ricardo Garcia
fd20984889 Bump version number 2010-10-31 11:23:48 +01:00
Ricardo Garcia
111ae3695c Document new -w option 2010-10-31 11:23:48 +01:00
Ricardo Garcia
0beeff4b3e Add the -w or --no-overwrites option 2010-10-31 11:23:48 +01:00
Ricardo Garcia
64a6f26c5d Put Danny Colligan as an author in the script itself 2010-10-31 11:23:48 +01:00
Ricardo Garcia
a9633f1457 Use quote_plus instead of manually replacing spaces by plus signs 2010-10-31 11:23:48 +01:00
Ricardo Garcia
a20e4c2f96 Improve documentation of new features in webpage 2010-10-31 11:23:47 +01:00
Ricardo Garcia
d1536018a8 Include Danny Colligan in credits 2010-10-31 11:23:47 +01:00
Ricardo Garcia
25af2bce3a Include Danny Colligan's YouTube search InfoExtractor 2010-10-31 11:23:47 +01:00
3 changed files with 101 additions and 236 deletions
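
Commit a9633f1457 replaces manual space-to-plus substitution with quote_plus. The following is a minimal sketch of why that matters, written for Python 3, where the function lives in urllib.parse (the script in this diff targets Python 2 and calls urllib.quote_plus). The helper manual_escape and the sample query are illustrative assumptions, not code from the repository.

```python
from urllib.parse import quote_plus

def manual_escape(query):
    # Hypothetical stand-in for the old approach: only spaces are handled.
    return query.replace(' ', '+')

query = 'rock & roll'

# The ampersand survives untouched and would be read as a parameter separator.
print(manual_escape(query))

# quote_plus percent-encodes reserved characters as well as turning spaces into '+'.
print(quote_plus(query))
```

With manual replacement, any reserved character in the search terms leaks into the URL unescaped; quote_plus covers those cases in one call.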

@@ -1,15 +0,0 @@
#!/usr/bin/env python
import hashlib
import subprocess
template = file('index.html.in', 'r').read()
version = subprocess.Popen(['./youtube-dl', '--version'], stdout=subprocess.PIPE).communicate()[0].strip()
data = file('youtube-dl', 'rb').read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
file('index.html', 'w').write(template)
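
The removed generator relies on Python 2's file() builtin. As a rough sketch only, the same placeholder substitution could look like this in Python 3; the function name fill_template and the in-memory sample inputs are assumptions made for illustration, standing in for index.html.in and the youtube-dl script.

```python
import hashlib

def fill_template(template, version, data):
    """Substitute the version and checksum placeholders into the template text."""
    replacements = {
        '@PROGRAM_VERSION@': version,
        '@PROGRAM_MD5SUM@': hashlib.md5(data).hexdigest(),
        '@PROGRAM_SHA1SUM@': hashlib.sha1(data).hexdigest(),
        '@PROGRAM_SHA256SUM@': hashlib.sha256(data).hexdigest(),
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template

# In-memory stand-ins for the template file and the program bytes (assumed values).
page = fill_template('v@PROGRAM_VERSION@ md5=@PROGRAM_MD5SUM@', '2009.03.03', b'')
print(page)
```

Keeping the substitution in a function that takes the template text and program bytes as arguments separates it from file handling, so it can be exercised without touching the disk.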

@@ -1,214 +0,0 @@
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
<title>youtube-dl: Download videos from YouTube.com</title>
<style type="text/css"><!--
body {
font-family: sans-serif;
font-size: small;
}
h1 {
text-align: center;
text-decoration: underline;
color: #006699;
}
h2 {
color: #006699;
}
p {
text-align: justify;
margin-left: 5%;
margin-right: 5%;
}
ul {
margin-left: 5%;
margin-right: 5%;
list-style-type: square;
}
li {
margin-bottom: 0.5ex;
}
.smallnote {
font-size: x-small;
text-align: center;
}
--></style>
</head>
<body>
<h1>youtube-dl: Download videos from YouTube.com</h1>
<p class="smallnote">(and more...)</p>
<h2>What is it?</h2>
<p><em>youtube-dl</em> is a small command-line program to download videos
from YouTube.com. It requires the <a href="http://www.python.org/">Python
interpreter</a>, version 2.4 or later, and it's not platform specific.
It should work on your Unix box, on Windows or on Mac OS X. The latest version
is <strong>@PROGRAM_VERSION@</strong>. It's released to the public domain,
which means you can modify it, redistribute it or use it however you like.</p>
<p>I'll try to keep it updated if YouTube.com changes the way you access
their videos. After all, it's a simple and short program. However, I can't
guarantee anything. If you detect it stops working, check for new versions
and/or inform me about the problem, indicating the program version you
are using. If the program stops working and I can't solve the problem but
you have a solution, I'd like to know it. If that happens and you feel you
can maintain the program yourself, tell me. My contact information is
at <a href="http://freshmeat.net/~rg3/">freshmeat.net</a>.</p>
<p>Thanks for all the feedback received so far. I'm glad people find my
program useful.</p>
<h2>Usage instructions</h2>
<p>In Windows, once you have installed the Python interpreter, save the
program with the <em>.py</em> extension and put it somewhere in the PATH.
Try to follow the
<a href="http://rg03.wordpress.com/youtube-dl-under-windows-xp/">guide to
install youtube-dl under Windows XP</a>.</p>
<p>In Unix, download it, give it execution permission and copy it to one
of the PATH directories (typically, <em>/usr/local/bin</em>).</p>
<p>After that, you should be able to call it from the command line as
<em>youtube-dl</em> or <em>youtube-dl.py</em>. I will use <em>youtube-dl</em>
in the following examples. Usage instructions are easy. Use <em>youtube-dl</em>
followed by a video URL or identifier. Example: <em>youtube-dl
"http://www.youtube.com/watch?v=foobar"</em>. The video will be saved
to the file <em>foobar.flv</em> in that example. As YouTube.com
videos are in Flash Video format, their extension should be <em>flv</em>.
In Linux and other unices, video players using a recent version of
<em>ffmpeg</em> can play them. That includes MPlayer, VLC, etc. Those two
work under Windows and other platforms, but you could also get a
specific FLV player of your taste.</p>
<p>If you try to run the program and you receive an error message containing the
keyword <em>SyntaxError</em> near the end, it means your Python interpreter
is too old.</p>
<h2>More usage tips</h2>
<ul>
<li>You can change the file name of the video using the -o option, like in
<em>youtube-dl -o vid.flv "http://www.youtube.com/watch?v=foobar"</em>.
Read the <a href="#otpl">Output template</a> section for more details on
this.</li>
<li>Some videos require an account to be downloaded, mostly because they're
flagged as mature content. You can pass the program a username and password
for a YouTube.com account with the -u and -p options, like <em>youtube-dl
-u myusername -p mypassword "http://www.youtube.com/watch?v=foobar"</em>.</li>
<li>The account data can also be read from the user .netrc file by indicating
the -n or --netrc option. The machine name is <em>youtube</em> in that
case.</li>
<li>The <em>simulate mode</em> (activated with -s or --simulate) can be used
to just get the real video URL and use it with a download manager if you
prefer that option.</li>
<li>The <em>quiet mode</em> (activated with -q or --quiet) can be used to
suppress all output messages. This allows, in systems featuring /dev/stdout
and other similar special files, outputting the video data to standard output
in order to pipe it to another program without interferences.</li>
<li>The program can be told to simply print the final video URL to standard
output using the -g or --get-url option.</li>
<li>In a similar line, the -e or --get-title option tells the program to print
the video title.</li>
<li>The default filename is <em>video_id.flv</em>. But you can also use the
video title in the filename with the -t or --title option, or preserve the
literal title in the filename with the -l or --literal option.</li>
<li>You can make the program append <em>&amp;fmt=something</em> to the URL
by using the -f or --format option. This makes it possible to download high
quality versions of the videos when available.</li>
<li>The -b or --best-quality option is an alias for -f 18.</li>
<li>The -m or --mobile-version option is an alias for -f 17.</li>
<li>Normally, the program will stop on the first error, but you can tell it
to attempt to download every video with the -i or --ignore-errors option.</li>
<li>The -a or --batch-file option lets you specify a file to read URLs from.
The file must contain one URL per line.</li>
<li><em>youtube-dl</em> honors the <em>http_proxy</em> environment variable
if you want to use a proxy. Set it to something like
<em>http://proxy.example.com:8080</em>, and do not leave the <em>http://</em>
prefix out.</li>
<li>You can get the program version by calling it as <em>youtube-dl
-v</em> or <em>youtube-dl --version</em>.</li>
<li>For usage instructions, use <em>youtube-dl -h</em> or <em>youtube-dl
--help</em>.</li>
<li>You can cancel the program at any time by pressing Ctrl+C. It may print
some error lines saying something about <em>KeyboardInterrupt</em>.
That's ok.</li>
</ul>
<h2 id="otpl">Download it</h2>
<p>Note that if you directly click on these hyperlinks, your web browser will
most likely display the program contents. It's usually better to
right-click on it and choose the appropriate option, normally called <em>Save
Target As</em> or <em>Save Link As</em>, depending on the web browser you
are using.</p>
<p><a href="youtube-dl">@PROGRAM_VERSION@</a></p>
<ul>
<li><strong>MD5</strong>: @PROGRAM_MD5SUM@</li>
<li><strong>SHA1</strong>: @PROGRAM_SHA1SUM@</li>
<li><strong>SHA256</strong>: @PROGRAM_SHA256SUM@</li>
</ul>
<h2>Output template</h2>
<p>The -o option allows users to indicate a template for the output file names.
The basic usage is not to set any template arguments when downloading a single
file, like in <em>youtube-dl -o funny_video.flv 'http://some/video'</em>.
However, it may contain special sequences that will be replaced when
downloading each video. The special sequences have the format
<strong>%(NAME)s</strong>. To clarify, that's a percent symbol followed by a
name in parentheses, followed by a lowercase S. Allowed names are:</p>
<ul>
<li><em>id</em>: The sequence will be replaced by the video identifier.</li>
<li><em>url</em>: The sequence will be replaced by the video URL.</li>
<li><em>uploader</em>: The sequence will be replaced by the nickname of the
person who uploaded the video.</li>
<li><em>title</em>: The sequence will be replaced by the literal video
title.</li>
<li><em>stitle</em>: The sequence will be replaced by a simplified video
title, restricted to alphanumeric characters and dashes.</li>
<li><em>ext</em>: The sequence will be replaced by the appropriate
extension (like <em>flv</em> or <em>mp4</em>).</li>
</ul>
<p>As you may have guessed, the default template is <em>%(id)s.%(ext)s</em>.
When some command line options are used, it's replaced by other templates like
<em>%(title)s-%(id)s.%(ext)s</em>. You can specify your own.</p>
<h2>Authors</h2>
<ul>
<li>Ricardo Garcia Gonzalez: program core, YouTube.com InfoExtractor,
metacafe.com InfoExtractor and YouTube playlist InfoExtractor.</li>
<li>Many other people contributing patches, code, ideas and kind messages. Too
many to be listed here. You know who you are. Thank you very much.</li>
</ul>
<p class="smallnote">Copyright &copy; 2006-2007 Ricardo Garcia Gonzalez</p>
</body>
</html>
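
The %(NAME)s sequences described in the removed page are Python's dictionary-based string formatting. Below is a minimal sketch, assuming a hypothetical info dictionary that uses the documented field names; the values are made up, and this is not the downloader's actual code path.

```python
# Hypothetical metadata for one video, keyed by the documented template names.
info = {
    'id': 'foobar',
    'url': 'http://www.youtube.com/watch?v=foobar',
    'uploader': 'someuser',
    'title': 'A funny video!',
    'stitle': 'A-funny-video',
    'ext': 'flv',
}

default_template = '%(id)s.%(ext)s'
title_template = '%(stitle)s-%(id)s.%(ext)s'

print(default_template % info)
print(title_template % info)

# An unknown name raises KeyError, one of the exceptions the downloader
# reports as an invalid output template.
try:
    '%(nosuchfield)s.%(ext)s' % info
except KeyError as err:
    print('invalid output template:', err)
```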

@@ -1,6 +1,7 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Ricardo Garcia Gonzalez
# Author: Danny Colligan
# License: Public domain code
import htmlentitydefs
import httplib
@@ -88,6 +89,7 @@ class FileDownloader(object):
outtmpl: Template for output names.
ignoreerrors: Do not stop on download errors.
ratelimit: Download speed limit, in bytes/sec.
nooverwrites: Prevent overwriting files.
"""
_params = None
@@ -142,7 +144,7 @@ class FileDownloader(object):
return '--:--'
return '%02d:%02d' % (eta_mins, eta_secs)
-@staticmethod
+@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
@@ -285,6 +287,9 @@ class FileDownloader(object):
except (ValueError, KeyError), err:
retcode = self.trouble('ERROR: invalid output template or system charset: %s' % str(err))
continue
if self._params['nooverwrites'] and os.path.exists(filename):
self.to_stderr('WARNING: file exists: %s; skipping' % filename)
continue
try:
self.pmkdir(filename)
except (OSError, IOError), err:
@@ -488,12 +493,8 @@ class YoutubeIE(InfoExtractor):
self.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
return
# No authentication to be performed
if username is None:
return
# Set language
-request = urllib2.Request(self._LOGIN_URL, None, std_headers)
+request = urllib2.Request(self._LANG_URL, None, std_headers)
try:
self.report_lang()
urllib2.urlopen(request).read()
@@ -501,6 +502,10 @@ class YoutubeIE(InfoExtractor):
self.to_stderr(u'WARNING: unable to set language: %s' % str(err))
return
# No authentication to be performed
if username is None:
return
# Log in
login_form = {
'current_form': 'loginForm',
@@ -721,6 +726,90 @@ class MetacafeIE(InfoExtractor):
'ext': video_extension.decode('utf-8'),
}]
class YoutubeSearchIE(InfoExtractor):
"""Information Extractor for YouTube search queries."""
_VALID_QUERY = r'ytsearch(\d+|all)?:[\s\S]+'
_TEMPLATE_URL = 'http://www.youtube.com/results?search_query=%s&page=%s&gl=US&hl=en'
_VIDEO_INDICATOR = r'href="/watch\?v=.+?"'
_MORE_PAGES_INDICATOR = r'>Next</a>'
_youtube_ie = None
def __init__(self, youtube_ie, downloader=None):
InfoExtractor.__init__(self, downloader)
self._youtube_ie = youtube_ie
@staticmethod
def suitable(url):
return (re.match(YoutubeSearchIE._VALID_QUERY, url) is not None)
def report_download_page(self, query, pagenum):
"""Report attempt to download search results page with given number."""
self.to_stdout(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
def _real_initialize(self):
self._youtube_ie.initialize()
def _real_extract(self, query):
mobj = re.match(self._VALID_QUERY, query)
if mobj is None:
self.to_stderr(u'ERROR: invalid search query "%s"' % query)
return [None]
prefix, query = query.split(':')
prefix = prefix[8:]
if prefix == '':
return self._download_n_results(query, 1)
elif prefix == 'all':
return self._download_n_results(query, -1)
else:
try:
n = int(prefix)
if n <= 0:
self.to_stderr(u'ERROR: invalid download number %s for query "%s"' % (n, query))
return [None]
return self._download_n_results(query, n)
except ValueError: # parsing prefix as int fails
return self._download_n_results(query, 1)
def _download_n_results(self, query, n):
"""Downloads a specified number of results for a query"""
video_ids = []
already_seen = set()
pagenum = 1
while True:
self.report_download_page(query, pagenum)
result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
request = urllib2.Request(result_url, None, std_headers)
try:
page = urllib2.urlopen(request).read()
except (urllib2.URLError, httplib.HTTPException, socket.error), err:
self.to_stderr(u'ERROR: unable to download webpage: %s' % str(err))
return [None]
# Extract video identifiers
for mobj in re.finditer(self._VIDEO_INDICATOR, page):
video_id = page[mobj.span()[0]:mobj.span()[1]].split('=')[2][:-1]
if video_id not in already_seen:
video_ids.append(video_id)
already_seen.add(video_id)
if len(video_ids) == n:
# Specified n videos reached
information = []
for id in video_ids:
information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
return information
if self._MORE_PAGES_INDICATOR not in page:
information = []
for id in video_ids:
information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
return information
pagenum = pagenum + 1
class YoutubePlaylistIE(InfoExtractor):
"""Information Extractor for YouTube playlists."""
@@ -852,7 +941,7 @@ if __name__ == '__main__':
# Parse command line
parser = optparse.OptionParser(
usage='Usage: %prog [options] url...',
-version='2009.01.31',
+version='2009.03.03',
conflict_handler='resolve',
)
parser.add_option('-h', '--help',
@@ -891,6 +980,8 @@ if __name__ == '__main__':
dest='ratelimit', metavar='L', help='download rate limit (e.g. 50k or 44.6m)')
parser.add_option('-a', '--batch-file',
dest='batchfile', metavar='F', help='file containing URLs to download')
parser.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
(opts, args) = parser.parse_args()
# Batch file verification
@@ -925,6 +1016,7 @@ if __name__ == '__main__':
youtube_ie = YoutubeIE()
metacafe_ie = MetacafeIE(youtube_ie)
youtube_pl_ie = YoutubePlaylistIE(youtube_ie)
youtube_search_ie = YoutubeSearchIE(youtube_ie)
# File downloader
charset = locale.getdefaultlocale()[1]
@@ -945,7 +1037,9 @@ if __name__ == '__main__':
or u'%(id)s.%(ext)s'),
'ignoreerrors': opts.ignoreerrors,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
})
fd.add_info_extractor(youtube_search_ie)
fd.add_info_extractor(youtube_pl_ie)
fd.add_info_extractor(metacafe_ie)
fd.add_info_extractor(youtube_ie)
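
The YoutubeSearchIE added in this diff accepts queries of the form ytsearch:TERMS, ytsearchN:TERMS and ytsearchall:TERMS. The prefix handling can be sketched on its own as follows, in Python 3. The function parse_search_query is hypothetical and not part of the script. Unlike the query.split(':') call in the diff, this sketch splits on the first colon only, so search terms that themselves contain a colon are kept intact.

```python
import re

# Same pattern as _VALID_QUERY in the diff.
VALID_QUERY = r'ytsearch(\d+|all)?:[\s\S]+'

def parse_search_query(query):
    """Return (count, terms), with count -1 meaning 'all', or None if invalid."""
    if re.match(VALID_QUERY, query) is None:
        return None
    # Split on the first colon only; the remainder is the search terms.
    prefix, terms = query.split(':', 1)
    prefix = prefix[len('ytsearch'):]
    if prefix == '':
        return (1, terms)       # bare "ytsearch:" downloads a single result
    if prefix == 'all':
        return (-1, terms)      # -1 stands for "every result"
    count = int(prefix)         # the regex guarantees digits here
    if count <= 0:
        return None
    return (count, terms)

print(parse_search_query('ytsearch3:python tutorial'))
print(parse_search_query('ytsearchall:cats'))
```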