Bump version number

Remove generator and webpage template, moved to wiki
Fix very wrong code for setting the language
3 changed files with 101 additions and 236 deletions
--- a/15
+++ b/15
@@ -1,15 +0,0 @@
-#!/usr/bin/env python
-import hashlib
-import subprocess
-
-template = file('index.html.in', 'r').read()
-version = subprocess.Popen(['./youtube-dl', '--version'], stdout=subprocess.PIPE).communicate()[0].strip()
-data = file('youtube-dl', 'rb').read()
-md5sum = hashlib.md5(data).hexdigest()
-sha1sum = hashlib.sha1(data).hexdigest()
-sha256sum = hashlib.sha256(data).hexdigest()
-template = template.replace('@PROGRAM_VERSION@', version)
-template = template.replace('@PROGRAM_MD5SUM@', md5sum)
-template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
-template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
-file('index.html', 'w').write(template)
--- a/index.html.in
+++ b/index.html.in
@@ -1,214 +0,0 @@
-<!DOCTYPE html 
-     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
-     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
-<head>
-	<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
-	<title>youtube-dl: Download videos from YouTube.com</title>
-	<style type="text/css"><!--
-		body {
-			font-family: sans-serif;
-			font-size: small;
-		}
-		h1 {
-			text-align: center;
-			text-decoration: underline;
-			color: #006699;
-		}
-		h2 {
-			color: #006699;
-		}
-		p {
-			text-align: justify;
-			margin-left: 5%;
-			margin-right: 5%;
-		}
-		ul {
-			margin-left: 5%;
-			margin-right: 5%;
-			list-style-type: square;
-		}
-		li {
-			margin-bottom: 0.5ex;
-		}
-		.smallnote {
-			font-size: x-small;
-			text-align: center;
-		}
-		--></style>
-</head>
-<body>
-<h1>youtube-dl: Download videos from YouTube.com</h1>
-
-<p class="smallnote">(and more...)</p>
-
-<h2>What is it?</h2>
-
-<p><em>youtube-dl</em> is a small command-line program to download videos
-from YouTube.com. It requires the <a href="http://www.python.org/">Python
-interpreter</a>, version 2.4 or later, and it's not platform specific.
-It should work in your Unix box, in Windows or in Mac OS X. The latest version
-is <strong>@PROGRAM_VERSION@</strong>. It's released to the public domain,
-which means you can modify it, redistribute it or use it however you like.</p>
-
-<p>I'll try to keep it updated if YouTube.com changes the way you access
-their videos. After all, it's a simple and short program. However, I can't
-guarantee anything. If you detect it stops working, check for new versions
-and/or inform me about the problem, indicating the program version you
-are using. If the program stops working and I can't solve the problem but
-you have a solution, I'd like to know it. If that happens and you feel you
-can maintain the program yourself, tell me. My contact information is
-at <a href="http://freshmeat.net/~rg3/">freshmeat.net</a>.</p>
-
-<p>Thanks for all the feedback received so far. I'm glad people find my
-program useful.</p>
-
-<h2>Usage instructions</h2>
-
-<p>In Windows, once you have installed the Python interpreter, save the
-program with the <em>.py</em> extension and put it somewhere in the PATH.
-Try to follow the
-<a href="http://rg03.wordpress.com/youtube-dl-under-windows-xp/">guide to
-install youtube-dl under Windows XP</a>.</p>
-
-<p>In Unix, download it, give it execution permission and copy it to one
-of the PATH directories (typically, <em>/usr/local/bin</em>).</p>
-
-<p>After that, you should be able to call it from the command line as
-<em>youtube-dl</em> or <em>youtube-dl.py</em>. I will use <em>youtube-dl</em>
-in the following examples. Usage instructions are easy. Use <em>youtube-dl</em>
-followed by a video URL or identifier. Example: <em>youtube-dl
-"http://www.youtube.com/watch?v=foobar"</em>. The video will be saved
-to the file <em>foobar.flv</em> in that example. As YouTube.com
-videos are in Flash Video format, their extension should be <em>flv</em>.
-In Linux and other unices, video players using a recent version of
-<em>ffmpeg</em> can play them. That includes MPlayer, VLC, etc. Those two
-work under Windows and other platforms, but you could also get a
-specific FLV player of your taste.</p>
-
-<p>If you try to run the program and you receive an error message containing the
-keyword <em>SyntaxError</em> near the end, it means your Python interpreter
-is too old.</p>
-
-<h2>More usage tips</h2>
-
-<ul>
-
-<li>You can change the file name of the video using the -o option, like in
-<em>youtube-dl -o vid.flv "http://www.youtube.com/watch?v=foobar"</em>.
-Read the <a href="#otpl">Output template</a> section for more details on
-this.</li>
-
-<li>Some videos require an account to be downloaded, mostly because they're
-flagged as mature content. You can pass the program a username and password
-for a YouTube.com account with the -u and -p options, like <em>youtube-dl
-u myusername -p mypassword "http://www.youtube.com/watch?v=foobar"</em>.</li>
-
-<li>The account data can also be read from the user .netrc file by indicating
-the -n or --netrc option. The machine name is <em>youtube</em> in that
-case.</li>
-
-<li>The <em>simulate mode</em> (activated with -s or --simulate) can be used
-to just get the real video URL and use it with a download manager if you
-prefer that option.</li>
-
-<li>The <em>quiet mode</em> (activated with -q or --quiet) can be used to
-supress all output messages. This allows, in systems featuring /dev/stdout
-and other similar special files, outputting the video data to standard output
-in order to pipe it to another program without interferences.</li>
-
-<li>The program can be told to simply print the final video URL to standard
-output using the -g or --get-url option.</li>
-
-<li>In a similar line, the -e or --get-title option tells the program to print
-the video title.</li>
-
-<li>The default filename is <em>video_id.flv</em>. But you can also use the
-video title in the filename with the -t or --title option, or preserve the
-literal title in the filename with the -l or --literal option.</li>
-
-<li>You can make the program append <em>&amp;fmt=something</em> to the URL
-by using the -f or --format option. This makes it possible to download high
-quality versions of the videos when available.</li>
-
-<li>The -b or --best-quality option is an alias for -f 18.</li>
-
-<li>The -m or --mobile-version option is an alias for -f 17.</li>
-
-<li>Normally, the program will stop on the first error, but you can tell it
-to attempt to download every video with the -i or --ignore-errors option.</li>
-
-<li>The -a or --batch-file option lets you specify a file to read URLs from.
-The file must contain one URL per line.</li>
-
-<li><em>youtube-dl</em> honors the <em>http_proxy</em> environment variable
-if you want to use a proxy. Set it to something like
-<em>http://proxy.example.com:8080</em>, and do not leave the <em>http://</em>
-prefix out.</li>
-
-<li>You can get the program version by calling it as <em>youtube-dl
-v</em> or <em>youtube-dl --version</em>.</li>
-
-<li>For usage instructions, use <em>youtube-dl -h</em> or <em>youtube-dl
--help.</em></li>
-
-<li>You can cancel the program at any time pressing Ctrl+C. It may print
-some error lines saying something about <em>KeyboardInterrupt</em>.
-That's ok.</li>
-
-</ul>
-
-<h2 id="otpl">Download it</h2>
-
-<p>Note that if you directly click on these hyperlinks, your web browser will
-most likely display the program contents. It's usually better to
-right-click on it and choose the appropriate option, normally called <em>Save
-Target As</em> or <em>Save Link As</em>, depending on the web browser you
-are using.</p>
-
-<p><a href="youtube-dl">@PROGRAM_VERSION@</a></p>
-<ul>
-        <li><strong>MD5</strong>: @PROGRAM_MD5SUM@</li>
-        <li><strong>SHA1</strong>: @PROGRAM_SHA1SUM@</li>
-        <li><strong>SHA256</strong>: @PROGRAM_SHA256SUM@</li>
-</ul>
-
-<h2>Output template</h2>
-
-<p>The -o option allows users to indicate a template for the output file names.
-The basic usage is not to set any template arguments when downloading a single
-file, like in <em>youtube-dl -o funny_video.flv 'http://some/video'</em>.
-However, it may contain special sequences that will be replaced when
-downloading each video. The special sequences have the format
-<strong>%(NAME)s</strong>. To clarify, that's a percent symbol followed by a
-name in parenthesis, followed by a lowercase S. Allowed names are:</p>
-
-<ul>
-<li><em>id</em>: The sequence will be replaced by the video identifier.</li>
-<li><em>url</em>: The sequence will be replaced by the video URL.</li>
-<li><em>uploader</em>: The sequence will be replaced by the nickname of the
-person who uploaded the video.</li>
-<li><em>title</em>: The sequence will be replaced by the literal video
-title.</li>
-<li><em>stitle</em>: The sequence will be replaced by a simplified video
-title, restricted to alphanumeric characters and dashes.</li>
-<li><em>ext</em>: The sequence will be replaced by the appropriate
-extension (like <em>flv</em> or <em>mp4</em>).</li>
-</ul>
-
-<p>As you may have guessed, the default template is <em>%(id)s.%(ext)s</em>.
-When some command line options are used, it's replaced by other templates like
-<em>%(title)s-%(id)s.%(ext)s</em>. You can specify your own.</p>
-
-<h2>Authors</h2>
-
-<ul>
-<li>Ricardo Garcia Gonzalez: program core, YouTube.com InfoExtractor,
-metacafe.com InfoExtractor and YouTube playlist InfoExtractor.</li>
-<li>Many other people contributing patches, code, ideas and kind messages. Too
-many to be listed here. You know who you are. Thank you very much.</li>
-</ul>
-
-<p class="smallnote">Copyright &copy; 2006-2007 Ricardo Garcia Gonzalez</p>
-</body>
-</html>
--- a/108
+++ b/108
@@ -1,6 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 # Author: Ricardo Garcia Gonzalez
+# Author: Danny Colligan
 # License: Public domain code
 import htmlentitydefs
 import httplib
@@ -88,6 +89,7 @@ class FileDownloader(object):
 	outtmpl:	Template for output names.
 	ignoreerrors:	Do not stop on download errors.
 	ratelimit:	Download speed limit, in bytes/sec.
+	nooverwrites:	Prevent overwriting files.
 	"""

 	_params = None
@@ -142,7 +144,7 @@ class FileDownloader(object):
 			return '--:--'
 		return '%02d:%02d' % (eta_mins, eta_secs)

- 	@staticmethod
+	@staticmethod
 	def calc_speed(start, now, bytes):
 		dif = now - start
 		if bytes == 0 or dif < 0.001: # One millisecond
@@ -285,6 +287,9 @@ class FileDownloader(object):
 					except (ValueError, KeyError), err:
 						retcode = self.trouble('ERROR: invalid output template or system charset: %s' % str(err))
 						continue
+					if self._params['nooverwrites'] and os.path.exists(filename):
+						self.to_stderr('WARNING: file exists: %s; skipping' % filename)
+						continue
 					try:
 						self.pmkdir(filename)
 					except (OSError, IOError), err:
@@ -488,12 +493,8 @@ class YoutubeIE(InfoExtractor):
 				self.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
 				return

-		# No authentication to be performed
-		if username is None:
-			return
-
 		# Set language
-		request = urllib2.Request(self._LOGIN_URL, None, std_headers)
+		request = urllib2.Request(self._LANG_URL, None, std_headers)
 		try:
 			self.report_lang()
 			urllib2.urlopen(request).read()
@@ -501,6 +502,10 @@ class YoutubeIE(InfoExtractor):
 			self.to_stderr(u'WARNING: unable to set language: %s' % str(err))
 			return

+		# No authentication to be performed
+		if username is None:
+			return
+
 		# Log in
 		login_form = {
 				'current_form': 'loginForm',
@@ -721,6 +726,90 @@ class MetacafeIE(InfoExtractor):
 			'ext':		video_extension.decode('utf-8'),
 			}]

+
+class YoutubeSearchIE(InfoExtractor):
+	"""Information Extractor for YouTube search queries."""
+	_VALID_QUERY = r'ytsearch(\d+|all)?:[\s\S]+'
+	_TEMPLATE_URL = 'http://www.youtube.com/results?search_query=%s&page=%s&gl=US&hl=en'
+	_VIDEO_INDICATOR = r'href="/watch\?v=.+?"'
+	_MORE_PAGES_INDICATOR = r'>Next</a>'
+	_youtube_ie = None
+
+	def __init__(self, youtube_ie, downloader=None): 
+		InfoExtractor.__init__(self, downloader)
+		self._youtube_ie = youtube_ie
+	
+	@staticmethod
+	def suitable(url):
+		return (re.match(YoutubeSearchIE._VALID_QUERY, url) is not None)
+
+	def report_download_page(self, query, pagenum):
+		"""Report attempt to download playlist page with given number."""
+		self.to_stdout(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
+
+	def _real_initialize(self):
+		self._youtube_ie.initialize()
+	
+	def _real_extract(self, query):
+		mobj = re.match(self._VALID_QUERY, query)
+		if mobj is None:
+			self.to_stderr(u'ERROR: invalid search query "%s"' % query)
+			return [None]
+
+		prefix, query = query.split(':')
+		prefix = prefix[8:]
+		if prefix == '': 
+			return self._download_n_results(query, 1)
+		elif prefix == 'all': 
+			return self._download_n_results(query, -1)
+		else: 
+			try:
+				n = int(prefix)
+				if n <= 0:
+					self.to_stderr(u'ERROR: invalid download number %s for query "%s"' % (n, query))
+					return [None]
+				return self._download_n_results(query, n)
+			except ValueError: # parsing prefix as int fails
+				return self._download_n_results(query, 1)
+
+	def _download_n_results(self, query, n):
+		"""Downloads a specified number of results for a query"""
+
+		video_ids = []
+		already_seen = set()
+		pagenum = 1
+
+		while True:
+			self.report_download_page(query, pagenum)
+			result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
+			request = urllib2.Request(result_url, None, std_headers)
+			try:
+				page = urllib2.urlopen(request).read()
+			except (urllib2.URLError, httplib.HTTPException, socket.error), err:
+				self.to_stderr(u'ERROR: unable to download webpage: %s' % str(err))
+				return [None]
+
+			# Extract video identifiers
+			for mobj in re.finditer(self._VIDEO_INDICATOR, page):
+				video_id = page[mobj.span()[0]:mobj.span()[1]].split('=')[2][:-1]
+				if video_id not in already_seen:
+					video_ids.append(video_id)
+					already_seen.add(video_id)
+					if len(video_ids) == n:
+						# Specified n videos reached
+						information = []
+						for id in video_ids:
+							information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
+						return information
+
+			if self._MORE_PAGES_INDICATOR not in page:
+				information = []
+				for id in video_ids:
+					information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
+				return information
+
+			pagenum = pagenum + 1
+
 class YoutubePlaylistIE(InfoExtractor):
 	"""Information Extractor for YouTube playlists."""

@@ -852,7 +941,7 @@ if __name__ == '__main__':
 		# Parse command line
 		parser = optparse.OptionParser(
 				usage='Usage: %prog [options] url...',
-				version='2009.01.31',
+				version='2009.03.03',
 				conflict_handler='resolve',
 				)
 		parser.add_option('-h', '--help',
@@ -891,6 +980,8 @@ if __name__ == '__main__':
 				dest='ratelimit', metavar='L', help='download rate limit (e.g. 50k or 44.6m)')
 		parser.add_option('-a', '--batch-file',
 				dest='batchfile', metavar='F', help='file containing URLs to download')
+		parser.add_option('-w', '--no-overwrites',
+				action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
 		(opts, args) = parser.parse_args()

 		# Batch file verification
@@ -925,6 +1016,7 @@ if __name__ == '__main__':
 		youtube_ie = YoutubeIE()
 		metacafe_ie = MetacafeIE(youtube_ie)
 		youtube_pl_ie = YoutubePlaylistIE(youtube_ie)
+		youtube_search_ie = YoutubeSearchIE(youtube_ie)

 		# File downloader
 		charset = locale.getdefaultlocale()[1]
@@ -945,7 +1037,9 @@ if __name__ == '__main__':
 				or u'%(id)s.%(ext)s'),
 			'ignoreerrors': opts.ignoreerrors,
 			'ratelimit': opts.ratelimit,
+			'nooverwrites': opts.nooverwrites,
 			})
+		fd.add_info_extractor(youtube_search_ie)
 		fd.add_info_extractor(youtube_pl_ie)
 		fd.add_info_extractor(metacafe_ie)
 		fd.add_info_extractor(youtube_ie)
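
Note on the `ytsearch` prefix handling added in `YoutubeSearchIE._real_extract`: the diff splits the query with `query.split(':')` and unpacks the result into two names, so search terms that themselves contain a colon would fail to unpack. The sketch below shows one possible way to parse the prefix, splitting on the first colon only. It is written for Python 3, and `parse_search_query` is a hypothetical helper, not a function in the script. Only the regular expression is taken from the diff.

```python
import re

# Same pattern as _VALID_QUERY in the diff above.
VALID_QUERY = r'ytsearch(\d+|all)?:[\s\S]+'


def parse_search_query(query):
    """Return (count, terms) for a ytsearch query; count is -1 for 'all'.

    Hypothetical helper for illustration, not part of youtube-dl.
    """
    if re.match(VALID_QUERY, query) is None:
        raise ValueError('invalid search query %r' % query)
    # Split on the first colon only, so the search terms may contain ':'.
    prefix, terms = query.split(':', 1)
    prefix = prefix[len('ytsearch'):]
    if prefix == '':
        return 1, terms       # bare "ytsearch:" means one result
    if prefix == 'all':
        return -1, terms      # -1 stands for "every result"
    count = int(prefix)       # the regex guarantees digits here
    if count <= 0:
        raise ValueError('invalid download number %d' % count)
    return count, terms
```

For example, `parse_search_query('ytsearch3:a:b')` yields a count of 3 and the terms `a:b`, while `ytsearch0:x` is rejected, matching the `n <= 0` check in the diff.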
Author	SHA1	Message	Date
Ricardo Garcia	7ab2043c9c	Bump version number	2010-10-31 11:23:52 +01:00
Ricardo Garcia	3e703dd1cd	Remove generator and webpage template, moved to wiki	2010-10-31 11:23:52 +01:00
Ricardo Garcia	cc10940385	Fix very wrong code for setting the language. It turned out that, despite the program working without apparent errors, the code for setting the language was completely wrong. First, it didn't run unless some form of authentication was performed. Second, I mistyped _LANG_URL as _LOGIN_URL, so the language was not being set at all! Amazing it still worked.	2010-10-31 11:23:48 +01:00
Ricardo Garcia	5121ef2071	Fix wrong indentation	2010-10-31 11:23:48 +01:00
Ricardo Garcia	fd20984889	Bump version number	2010-10-31 11:23:48 +01:00
Ricardo Garcia	111ae3695c	Document new -w option	2010-10-31 11:23:48 +01:00
Ricardo Garcia	0beeff4b3e	Add the -w or --no-overwrites option	2010-10-31 11:23:48 +01:00
Ricardo Garcia	64a6f26c5d	Put Danny Colligan as an author in the script itself	2010-10-31 11:23:48 +01:00
Ricardo Garcia	a9633f1457	Use quote_plus instead of manually replacing spaces by plus signs	2010-10-31 11:23:48 +01:00
Ricardo Garcia	a20e4c2f96	Improve documentation of new features in webpage	2010-10-31 11:23:47 +01:00
Ricardo Garcia	d1536018a8	Include Danny Colligan in credits	2010-10-31 11:23:47 +01:00
Ricardo Garcia	25af2bce3a	Include Danny Colligan's YouTube search InfoExtractor	2010-10-31 11:23:47 +01:00
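
Note on commit cc10940385: the fix moves the "no authentication" early return below the language request, so the language is set whether or not credentials are given. A minimal Python 3 sketch of that ordering follows. `initialize`, `set_language` and `log_in` are hypothetical stand-ins for the requests made inside `YoutubeIE`, and the sketch leaves out the error handling shown in the diff.

```python
def initialize(username, set_language, log_in):
    """Set the language first, then log in only when credentials exist.

    set_language and log_in are hypothetical callables used for
    illustration; they are not names from the script.
    """
    set_language()        # always runs, with or without credentials
    if username is None:  # no authentication to be performed
        return
    log_in(username)


# Record which steps run when no username is supplied.
calls = []
initialize(None, lambda: calls.append('lang'),
           lambda user: calls.append('login'))
```

With the old ordering the early return came first, so an anonymous run skipped the language step as well.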