Compare commits
31 Commits
2017.04.28
...
2017.05.01
| Author | SHA1 | Date | |
|---|---|---|---|
| | e0c1e9a98c | | |
| | 086041e2f8 | | |
| | 74da856544 | | |
| | 9edf47df7b | | |
| | 238cec17ae | | |
| | 50534b7158 | | |
| | 9cd4209724 | | |
| | 33a81c2c6f | | |
| | deef31955b | | |
| | 9dac2cec2d | | |
| | 6ec371cd9e | | |
| | 13081db1f5 | | |
| | b07ea5eaec | | |
| | 5599253009 | | |
| | 98ce1a3fd3 | | |
| | ba5c3caf88 | | |
| | b5c39537be | | |
| | 1c7c76e4fb | | |
| | 557194591a | | |
| | 27e70a8f6c | | |
| | a4c81e4968 | | |
| | 7986c3abcd | | |
| | a1ebfd4494 | | |
| | d19093bd50 | | |
| | 24eb7c2578 | | |
| | e7db6759e4 | | |
| | b364c87c42 | | |
| | 9222d94510 | | |
| | edd9221cd2 | | |
| | c95e2b5911 | | |
| | 76c1951036 | | |
6
.github/ISSUE_TEMPLATE.md
vendored
@@ -6,8 +6,8 @@
 
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.28*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.28**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.05.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.05.01**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2017.04.28
+[debug] youtube-dl version 2017.05.01
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
46
ChangeLog
@@ -1,3 +1,31 @@
+version 2017.05.01
+
+Core
++ [extractor/common] Extract view count from JSON-LD
+* [utils] Improve unified_timestamp
++ [utils] Add video/mp2t to mimetype2ext
+* [downloader/external] Properly handle live stream downloading cancellation
+  (#8932)
++ [utils] Add support for unicode whitespace in clean_html on python 2 (#12906)
+
+Extractors
+* [infoq] Make audio format extraction non fatal (#12938)
+* [brightcove] Allow whitespace around attribute names in embedded code
++ [zaq1] Add support for zaq1.pl (#12693)
++ [xvideos] Extract duration (#12828)
+* [vevo] Fix extraction (#12879)
++ [noovo] Add support for noovo.ca (#12792)
++ [washingtonpost] Add support for embeds (#12699)
+* [yandexmusic:playlist] Fix extraction for python 3 (#12888)
+* [anvato] Improve extraction (#12913)
+    * Promote to regular shortcut based extractor
+    * Add mcp to access key mapping table
+    * Add support for embeds extraction
+    * Add support for anvato embeds in generic extractor
+* [xtube] Fix extraction for older FLV videos (#12734)
+* [tvplayer] Fix extraction (#12908)
+
+
 version 2017.04.28
 
 Core
@@ -24,19 +52,19 @@ Core
 * [YoutubeDL] Fix output template for missing timestamp (#12796)
 * [socks] Handle cases where credentials are required but missing
 * [extractor/common] Improve HLS extraction (#12211)
-    - Extract m3u8 parsing to separate method
-    - Improve rendition groups extraction
-    - Build stream name according stream GROUP-ID
-    - Ignore reference to AUDIO group without URI when stream has no CODECS
-    - Use float for scaled tbr in _parse_m3u8_formats
+    * Extract m3u8 parsing to separate method
+    * Improve rendition groups extraction
+    * Build stream name according stream GROUP-ID
+    * Ignore reference to AUDIO group without URI when stream has no CODECS
+    * Use float for scaled tbr in _parse_m3u8_formats
 * [utils] Add support for TTML styles in dfxp2srt
 * [downloader/hls] No need to download keys for fragments that have been
   already downloaded
 * [downloader/fragment] Improve fragment downloading
-    - Resume immediately
-    - Don't concatenate fragments and decrypt them on every resume
-    - Optimize disk storage usage, don't store intermediate fragments on disk
-    - Store bookkeeping download state file
+    * Resume immediately
+    * Don't concatenate fragments and decrypt them on every resume
+    * Optimize disk storage usage, don't store intermediate fragments on disk
+    * Store bookkeeping download state file
 + [extractor/common] Add support for multiple getters in try_get
 + [extractor/common] Add support for video of WebPage context in _json_ld
   (#12778)
@@ -45,6 +45,7 @@
 - **anderetijden**: npo.nl and ntr.nl
 - **AnimeOnDemand**
 - **anitube.se**
+- **Anvato**
 - **AnySex**
 - **Aparat**
 - **AppleConnect**
@@ -529,6 +530,7 @@
 - **NJPWWorld**: 新日本プロレスワールド
 - **NobelPrize**
 - **Noco**
+- **Noovo**
 - **Normalboots**
 - **NosVideo**
 - **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
@@ -1013,6 +1015,7 @@
 - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
 - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
 - **Zapiks**
+- **Zaq1**
 - **ZDF**
 - **ZDFChannel**
 - **zingmp3**: mp3.zing.vn
@@ -338,6 +338,7 @@ class TestUtil(unittest.TestCase):
|
||||
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
|
||||
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
|
||||
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
|
||||
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
|
||||
|
||||
def test_determine_ext(self):
|
||||
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
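Note on the new `unified_timestamp` test case in this hunk: the input `'2017-03-30T17:52:41Q'` carries a stray trailing letter, and the expected value corresponds to reading the rest as UTC. The following is a minimal standalone sketch of that idea, not the library's implementation; `lenient_timestamp` is a hypothetical name, and it only handles this one `YYYY-MM-DDTHH:MM:SS` layout.

```python
import calendar
import datetime
import re


def lenient_timestamp(date_str):
    """Parse 'YYYY-MM-DDTHH:MM:SS' plus an optional stray letter, as UTC."""
    # Drop a single trailing letter such as 'Z' or the non-standard 'Q'.
    cleaned = re.sub(r'[A-Za-z]$', '', date_str.strip())
    try:
        dt = datetime.datetime.strptime(cleaned, '%Y-%m-%dT%H:%M:%S')
    except ValueError:
        # Unknown layouts yield None instead of raising.
        return None
    # timegm interprets the struct_time as UTC (unlike time.mktime).
    return calendar.timegm(dt.timetuple())
```

Under these assumptions, `lenient_timestamp('2017-03-30T17:52:41Q')` gives the same epoch value the test above expects.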
|
||||
@@ -899,6 +900,7 @@ class TestUtil(unittest.TestCase):
|
||||
def test_clean_html(self):
|
||||
self.assertEqual(clean_html('a:\nb'), 'a: b')
|
||||
self.assertEqual(clean_html('a:\n "b"'), 'a: "b"')
|
||||
self.assertEqual(clean_html('a<br>\xa0b'), 'a\nb')
|
||||
|
||||
def test_intlist_to_bytes(self):
|
||||
self.assertEqual(
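Note on the new `clean_html` assertion in this hunk: a non-breaking space (`\xa0`) after `<br>` should be swallowed like ordinary whitespace. Below is a simplified sketch of the idea, assuming Python 3, where `\s` already matches Unicode whitespace for `str` patterns (the explicit `re.UNICODE` flag is what matters on Python 2). `clean_html_sketch` is a hypothetical, reduced stand-in and does not reproduce the full behaviour of the real function.

```python
import re


def clean_html_sketch(html):
    # Literal newlines carry no meaning in HTML, so treat them as spaces.
    html = html.replace('\n', ' ')
    # Turn <br> into a newline and absorb surrounding whitespace,
    # including Unicode whitespace such as U+00A0.
    html = re.sub(r'\s*<\s*br\s*/?\s*>\s*', '\n', html, flags=re.UNICODE)
    # Strip any remaining tags.
    html = re.sub(r'<[^>]*>', '', html)
    return html.strip()
```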
|
||||
|
||||
@@ -29,7 +29,17 @@ class ExternalFD(FileDownloader):
|
||||
self.report_destination(filename)
|
||||
tmpfilename = self.temp_name(filename)
|
||||
|
||||
retval = self._call_downloader(tmpfilename, info_dict)
|
||||
try:
|
||||
retval = self._call_downloader(tmpfilename, info_dict)
|
||||
except KeyboardInterrupt:
|
||||
if not info_dict.get('is_live'):
|
||||
raise
|
||||
# Live stream downloading cancellation should be considered as
|
||||
# correct and expected termination thus all postprocessing
|
||||
# should take place
|
||||
retval = 0
|
||||
self.to_screen('[%s] Interrupted by user' % self.get_basename())
|
||||
|
||||
if retval == 0:
|
||||
fsize = os.path.getsize(encodeFilename(tmpfilename))
|
||||
self.to_screen('\r[%s] Downloaded %s bytes' % (self.get_basename(), fsize))
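The control flow added in this hunk can be isolated as follows. This is a small sketch with hypothetical names (`run_download`, `call_downloader`), not the `ExternalFD` class itself: interrupting a live stream counts as a normal finish so that post-processing still runs, while interrupting anything else propagates as before.

```python
def run_download(call_downloader, info_dict):
    """Return the downloader's exit status, treating Ctrl+C on live streams as success."""
    try:
        retval = call_downloader()
    except KeyboardInterrupt:
        if not info_dict.get('is_live'):
            # Ordinary downloads: the interruption is a real abort.
            raise
        # Live streams never end on their own, so a user interrupt is the
        # expected way to stop; report success so later steps still run.
        retval = 0
    return retval
```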
|
||||
|
||||
@@ -49,7 +49,7 @@ class FragmentFD(FileDownloader):
|
||||
index: 0-based index of current fragment among all fragments
|
||||
fragment_count:
|
||||
Total count of fragments
|
||||
|
||||
|
||||
This feature is experimental and file format may change in future.
|
||||
"""
|
||||
|
||||
|
||||
@@ -5,6 +5,7 @@ import base64
|
||||
import hashlib
|
||||
import json
|
||||
import random
|
||||
import re
|
||||
import time
|
||||
|
||||
from .common import InfoExtractor
|
||||
@@ -16,6 +17,7 @@ from ..utils import (
|
||||
intlist_to_bytes,
|
||||
int_or_none,
|
||||
strip_jsonp,
|
||||
unescapeHTML,
|
||||
)
|
||||
|
||||
|
||||
@@ -26,6 +28,8 @@ def md5_text(s):
|
||||
|
||||
|
||||
class AnvatoIE(InfoExtractor):
|
||||
_VALID_URL = r'anvato:(?P<access_key_or_mcp>[^:]+):(?P<id>\d+)'
|
||||
|
||||
# Copied from anvplayer.min.js
|
||||
_ANVACK_TABLE = {
|
||||
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
|
||||
@@ -114,6 +118,22 @@ class AnvatoIE(InfoExtractor):
|
||||
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
|
||||
}
|
||||
|
||||
_MCP_TO_ACCESS_KEY_TABLE = {
|
||||
'qa': 'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922',
|
||||
'lin': 'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749',
|
||||
'univison': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
|
||||
'uni': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
|
||||
'dev': 'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a',
|
||||
'sps': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
|
||||
'spsstg': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
|
||||
'anv': 'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3',
|
||||
'gray': 'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900',
|
||||
'hearst': 'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99',
|
||||
'cbs': 'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe',
|
||||
'telemundo': 'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582'
|
||||
}
|
||||
|
||||
_ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'
|
||||
_AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
|
||||
|
||||
def __init__(self, *args, **kwargs):
|
||||
@@ -217,9 +237,42 @@ class AnvatoIE(InfoExtractor):
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def _extract_urls(ie, webpage, video_id):
|
||||
entries = []
|
||||
for mobj in re.finditer(AnvatoIE._ANVP_RE, webpage):
|
||||
anvplayer_data = ie._parse_json(
|
||||
mobj.group('anvp'), video_id, transform_source=unescapeHTML,
|
||||
fatal=False)
|
||||
if not anvplayer_data:
|
||||
continue
|
||||
video = anvplayer_data.get('video')
|
||||
if not isinstance(video, compat_str) or not video.isdigit():
|
||||
continue
|
||||
access_key = anvplayer_data.get('accessKey')
|
||||
if not access_key:
|
||||
mcp = anvplayer_data.get('mcp')
|
||||
if mcp:
|
||||
access_key = AnvatoIE._MCP_TO_ACCESS_KEY_TABLE.get(
|
||||
mcp.lower())
|
||||
if not access_key:
|
||||
continue
|
||||
entries.append(ie.url_result(
|
||||
'anvato:%s:%s' % (access_key, video), ie=AnvatoIE.ie_key(),
|
||||
video_id=video))
|
||||
return entries
|
||||
|
||||
def _extract_anvato_videos(self, webpage, video_id):
|
||||
anvplayer_data = self._parse_json(self._html_search_regex(
|
||||
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
|
||||
'Anvato player data'), video_id)
|
||||
anvplayer_data = self._parse_json(
|
||||
self._html_search_regex(
|
||||
self._ANVP_RE, webpage, 'Anvato player data', group='anvp'),
|
||||
video_id)
|
||||
return self._get_anvato_videos(
|
||||
anvplayer_data['accessKey'], anvplayer_data['video'])
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
access_key, video_id = mobj.group('access_key_or_mcp', 'id')
|
||||
if access_key not in self._ANVACK_TABLE:
|
||||
access_key = self._MCP_TO_ACCESS_KEY_TABLE[access_key]
|
||||
return self._get_anvato_videos(access_key, video_id)
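The embed detection added in this file can be sketched outside the extractor framework. The regular expression is the `_ANVP_RE` from the hunk above; everything else is an assumption for illustration: `extract_anvato_urls` is a hypothetical function, `html.unescape` and `json.loads` stand in for the framework helpers, and the one-entry `MCP_TO_ACCESS_KEY` table is made up rather than copied from the real mapping.

```python
import html
import json
import re

ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'

# Hypothetical mapping, for illustration only.
MCP_TO_ACCESS_KEY = {'demo': 'demo_access_key'}


def extract_anvato_urls(webpage):
    urls = []
    for mobj in re.finditer(ANVP_RE, webpage):
        try:
            # The attribute value is HTML-escaped JSON.
            data = json.loads(html.unescape(mobj.group('anvp')))
        except ValueError:
            continue  # malformed player data is skipped, not fatal
        if not isinstance(data, dict):
            continue
        video = data.get('video')
        if not isinstance(video, str) or not video.isdigit():
            continue
        access_key = data.get('accessKey')
        if not access_key:
            # Fall back to the MCP name when no access key is embedded.
            mcp = data.get('mcp')
            if isinstance(mcp, str):
                access_key = MCP_TO_ACCESS_KEY.get(mcp.lower())
        if not access_key:
            continue
        urls.append('anvato:%s:%s' % (access_key, video))
    return urls
```

The resulting `anvato:<key>:<id>` strings have the shape that `_VALID_URL` in this file accepts.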
|
||||
|
||||
@@ -70,7 +70,8 @@ class AppleTrailersIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://trailers.apple.com/trailers/magnolia/blackthorn/',
|
||||
'info_dict': {
|
||||
'id': 'blackthorn',
|
||||
'id': '4489',
|
||||
'title': 'Blackthorn',
|
||||
},
|
||||
'playlist_mincount': 2,
|
||||
'expected_warnings': ['Unable to download JSON metadata'],
|
||||
@@ -261,7 +262,7 @@ class AppleTrailersSectionIE(InfoExtractor):
|
||||
'title': 'Most Popular',
|
||||
'id': 'mostpopular',
|
||||
},
|
||||
'playlist_mincount': 80,
|
||||
'playlist_mincount': 30,
|
||||
}, {
|
||||
'url': 'http://trailers.apple.com/#section=moviestudios',
|
||||
'info_dict': {
|
||||
|
||||
@@ -522,7 +522,7 @@ class BrightcoveNewIE(InfoExtractor):
|
||||
# [2] looks like:
|
||||
for video, script_tag, account_id, player_id, embed in re.findall(
|
||||
r'''(?isx)
|
||||
(<video\s+[^>]*data-video-id=['"]?[^>]+>)
|
||||
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
|
||||
(?:.*?
|
||||
(<script[^>]+
|
||||
src=["\'](?:https?:)?//players\.brightcove\.net/
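The effect of the changed `<video>` pattern can be checked in isolation. The sketch below compares only the first line of the old and new expressions (the rest of the multi-part regex is left out), and the sample tag is invented for illustration.

```python
import re

# First alternative of the pattern before and after the change.
OLD_VIDEO_RE = r'(?is)<video\s+[^>]*data-video-id=[\'"]?[^>]+>'
NEW_VIDEO_RE = r'(?is)<video\s+[^>]*\bdata-video-id\s*=\s*[\'"]?[^>]+>'


def has_video_tag(pattern, webpage):
    """Return True if the pattern finds a Brightcove-style <video> tag."""
    return re.search(pattern, webpage) is not None
```

With a tag written as `data-video-id = "..."` (spaces around `=`), only the new pattern matches; both match the compact `data-video-id="..."` form.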
|
||||
|
||||
@@ -96,6 +96,7 @@ class CBCIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
|
||||
'id': 'dog-indoor-exercise-winter-1.3928238',
|
||||
'description': 'md5:c18552e41726ee95bd75210d1ca9194c',
|
||||
},
|
||||
'playlist_mincount': 6,
|
||||
}]
|
||||
@@ -165,12 +166,11 @@ class CBCPlayerIE(InfoExtractor):
|
||||
'uploader': 'CBCC-NEW',
|
||||
},
|
||||
}, {
|
||||
# available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
|
||||
'url': 'http://www.cbc.ca/player/play/2164402062',
|
||||
'md5': '17a61eb813539abea40618d6323a7f82',
|
||||
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
|
||||
'info_dict': {
|
||||
'id': '2164402062',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Cancer survivor four times over',
|
||||
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
|
||||
'timestamp': 1320410746,
|
||||
|
||||
@@ -990,6 +990,7 @@ class InfoExtractor(object):
|
||||
'tbr': int_or_none(e.get('bitrate')),
|
||||
'width': int_or_none(e.get('width')),
|
||||
'height': int_or_none(e.get('height')),
|
||||
'view_count': int_or_none(e.get('interactionCount')),
|
||||
})
|
||||
|
||||
for e in json_ld:
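The added `view_count` line reads the schema.org `interactionCount` property. A rough standalone sketch follows, assuming the JSON-LD payload is a single object or a list of objects; `view_count_from_json_ld` is a hypothetical helper and `int_or_none` is re-implemented here in simplified form.

```python
import json


def int_or_none(value):
    # Simplified stand-in: convert to int, or give None on failure.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def view_count_from_json_ld(json_ld_text):
    data = json.loads(json_ld_text)
    if isinstance(data, dict):
        data = [data]
    for entry in data:
        if isinstance(entry, dict) and entry.get('@type') == 'VideoObject':
            return int_or_none(entry.get('interactionCount'))
    return None
```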
|
||||
|
||||
@@ -41,6 +41,7 @@ from .alphaporno import AlphaPornoIE
|
||||
from .amcnetworks import AMCNetworksIE
|
||||
from .animeondemand import AnimeOnDemandIE
|
||||
from .anitube import AnitubeIE
|
||||
from .anvato import AnvatoIE
|
||||
from .anysex import AnySexIE
|
||||
from .aol import AolIE
|
||||
from .allocine import AllocineIE
|
||||
@@ -662,6 +663,7 @@ from .nintendo import NintendoIE
|
||||
from .njpwworld import NJPWWorldIE
|
||||
from .nobelprize import NobelPrizeIE
|
||||
from .noco import NocoIE
|
||||
from .noovo import NoovoIE
|
||||
from .normalboots import NormalbootsIE
|
||||
from .nosvideo import NosVideoIE
|
||||
from .nova import NovaIE
|
||||
@@ -1298,5 +1300,6 @@ from .youtube import (
|
||||
YoutubeWatchLaterIE,
|
||||
)
|
||||
from .zapiks import ZapiksIE
|
||||
from .zaq1 import Zaq1IE
|
||||
from .zdf import ZDFIE, ZDFChannelIE
|
||||
from .zingmp3 import ZingMp3IE
|
||||
|
||||
@@ -86,6 +86,8 @@ from .openload import OpenloadIE
|
||||
from .videopress import VideoPressIE
|
||||
from .rutube import RutubeIE
|
||||
from .limelight import LimelightBaseIE
|
||||
from .anvato import AnvatoIE
|
||||
from .washingtonpost import WashingtonPostIE
|
||||
|
||||
|
||||
class GenericIE(InfoExtractor):
|
||||
@@ -1427,6 +1429,22 @@ class GenericIE(InfoExtractor):
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
{
|
||||
# Brightcove embed with whitespace around attribute names
|
||||
'url': 'http://www.stack.com/video/3167554373001/learn-to-hit-open-three-pointers-with-damian-lillard-s-baseline-drift-drill',
|
||||
'info_dict': {
|
||||
'id': '3167554373001',
|
||||
'ext': 'mp4',
|
||||
'title': "Learn to Hit Open Three-Pointers With Damian Lillard's Baseline Drift Drill",
|
||||
'description': 'md5:57bacb0e0f29349de4972bfda3191713',
|
||||
'uploader_id': '1079349493',
|
||||
'upload_date': '20140207',
|
||||
'timestamp': 1391810548,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
},
|
||||
# Another form of arte.tv embed
|
||||
{
|
||||
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
|
||||
@@ -1677,6 +1695,29 @@ class GenericIE(InfoExtractor):
|
||||
},
|
||||
'playlist_mincount': 5,
|
||||
},
|
||||
{
|
||||
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
|
||||
'info_dict': {
|
||||
'id': 'standoff-with-walnut-creek-murder-suspect-ends-with-arrest',
|
||||
'title': 'Standoff with Walnut Creek murder suspect ends',
|
||||
'description': 'md5:3ccc48a60fc9441eeccfc9c469ebf788',
|
||||
},
|
||||
'playlist_mincount': 4,
|
||||
},
|
||||
{
|
||||
# WashingtonPost embed
|
||||
'url': 'http://www.vanityfair.com/hollywood/2017/04/donald-trump-tv-pitches',
|
||||
'info_dict': {
|
||||
'id': '8caf6e88-d0ec-11e5-90d3-34c2c42653ac',
|
||||
'ext': 'mp4',
|
||||
'title': "No one has seen the drama series based on Trump's life \u2014 until now",
|
||||
'description': 'Donald Trump wanted a weekly TV drama based on his life. It never aired. But The Washington Post recently obtained a scene from the pilot script — and enlisted actors.',
|
||||
'timestamp': 1455216756,
|
||||
'uploader': 'The Washington Post',
|
||||
'upload_date': '20160211',
|
||||
},
|
||||
'add_ie': [WashingtonPostIE.ie_key()],
|
||||
},
|
||||
# {
|
||||
# # TODO: find another test
|
||||
# # http://schema.org/VideoObject
|
||||
@@ -2537,6 +2578,12 @@ class GenericIE(InfoExtractor):
|
||||
'limelight:media:%s' % mobj.group('id'),
|
||||
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
|
||||
|
||||
# Look for Anvato embeds
|
||||
anvato_urls = AnvatoIE._extract_urls(self, webpage, video_id)
|
||||
if anvato_urls:
|
||||
return self.playlist_result(
|
||||
anvato_urls, video_id, video_title, video_description)
|
||||
|
||||
# Look for AdobeTVVideo embeds
|
||||
mobj = re.search(
|
||||
r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
|
||||
@@ -2654,6 +2701,12 @@ class GenericIE(InfoExtractor):
|
||||
return self.playlist_from_matches(
|
||||
rutube_urls, ie=RutubeIE.ie_key())
|
||||
|
||||
# Look for WashingtonPost embeds
|
||||
wapo_urls = WashingtonPostIE._extract_urls(webpage)
|
||||
if wapo_urls:
|
||||
return self.playlist_from_matches(
|
||||
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
|
||||
|
||||
# Looking for http://schema.org/VideoObject
|
||||
json_ld = self._search_json_ld(
|
||||
webpage, video_id, default={}, expected_type='VideoObject')
|
||||
|
||||
@@ -87,8 +87,8 @@ class InfoQIE(BokeCCBaseIE):
|
||||
|
||||
def _extract_http_audio(self, webpage, video_id):
|
||||
fields = self._hidden_inputs(webpage)
|
||||
http_audio_url = fields['filename']
|
||||
if http_audio_url is None:
|
||||
http_audio_url = fields.get('filename')
|
||||
if not http_audio_url:
|
||||
return []
|
||||
|
||||
cookies_header = {'Cookie': self._extract_cookies(webpage)}
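Why the lookup change above makes audio extraction non-fatal: indexing with `fields['filename']` raises `KeyError` when the hidden input is absent, so the following `is None` check could never take effect. A tiny sketch of the difference, with a hypothetical `audio_urls` helper in place of the extractor method:

```python
def audio_urls(fields):
    """Return a list with the audio URL, or an empty list when it is missing."""
    # .get() yields None for a missing key instead of raising KeyError.
    http_audio_url = fields.get('filename')
    # "not" also covers an empty string, which "is None" would let through.
    if not http_audio_url:
        return []
    return [http_audio_url]
```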
|
||||
|
||||
97
youtube_dl/extractor/noovo.py
Normal file
@@ -0,0 +1,97 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .brightcove import BrightcoveNewIE
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
smuggle_url,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
class NoovoIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:[^/]+\.)?noovo\.ca/videos/(?P<id>[^/]+/[^/?#&]+)'
|
||||
_TESTS = [{
|
||||
# clip
|
||||
'url': 'http://noovo.ca/videos/rpm-plus/chrysler-imperial',
|
||||
'info_dict': {
|
||||
'id': '5386045029001',
|
||||
'ext': 'mp4',
|
||||
'title': 'Chrysler Imperial',
|
||||
'description': 'md5:de3c898d1eb810f3e6243e08c8b4a056',
|
||||
'timestamp': 1491399228,
|
||||
'upload_date': '20170405',
|
||||
'uploader_id': '618566855001',
|
||||
'creator': 'vtele',
|
||||
'view_count': int,
|
||||
'series': 'RPM+',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}, {
|
||||
# episode
|
||||
'url': 'http://noovo.ca/videos/l-amour-est-dans-le-pre/episode-13-8',
|
||||
'info_dict': {
|
||||
'id': '5395865725001',
|
||||
'title': 'Épisode 13 : Les retrouvailles',
|
||||
'description': 'md5:336d5ebc5436534e61d16e63ddfca327',
|
||||
'ext': 'mp4',
|
||||
'timestamp': 1492019320,
|
||||
'upload_date': '20170412',
|
||||
'uploader_id': '618566855001',
|
||||
'creator': 'vtele',
|
||||
'view_count': int,
|
||||
'series': "L'amour est dans le pré",
|
||||
'season_number': 5,
|
||||
'episode': 'Épisode 13',
|
||||
'episode_number': 13,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
}]
|
||||
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/618566855001/default_default/index.html?videoId=%s'
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
|
||||
data = self._download_json(
|
||||
'http://api.noovo.ca/api/v1/pages/single-episode/%s' % video_id,
|
||||
video_id)['data']
|
||||
|
||||
content = try_get(data, lambda x: x['contents'][0])
|
||||
|
||||
brightcove_id = data.get('brightcoveId') or content['brightcoveId']
|
||||
|
||||
series = try_get(
|
||||
data, (
|
||||
lambda x: x['show']['title'],
|
||||
lambda x: x['season']['show']['title']),
|
||||
compat_str)
|
||||
|
||||
episode = None
|
||||
og = data.get('og')
|
||||
if isinstance(og, dict) and og.get('type') == 'video.episode':
|
||||
episode = og.get('title')
|
||||
|
||||
video = content or data
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'ie_key': BrightcoveNewIE.ie_key(),
|
||||
'url': smuggle_url(
|
||||
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
|
||||
{'geo_countries': ['CA']}),
|
||||
'id': brightcove_id,
|
||||
'title': video.get('title'),
|
||||
'creator': video.get('source'),
|
||||
'view_count': int_or_none(video.get('viewsCount')),
|
||||
'series': series,
|
||||
'season_number': int_or_none(try_get(
|
||||
data, lambda x: x['season']['seasonNumber'])),
|
||||
'episode': episode,
|
||||
'episode_number': int_or_none(data.get('episodeNumber')),
|
||||
}
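The `series` lookup in this extractor passes a tuple of getters to `try_get`. A minimal sketch of how such a helper can behave is given below; it is written from the call site above and is an assumption, not a copy of the library function.

```python
def try_get(src, getters, expected_type=None):
    """Return the first getter result that succeeds and has the expected type."""
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue  # this path does not exist; try the next getter
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None
```

Usage in the spirit of the extractor: `try_get(data, (lambda x: x['show']['title'], lambda x: x['season']['show']['title']), str)` returns whichever title is present.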
|
||||
@@ -2,9 +2,13 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_HTTPError
|
||||
from ..compat import (
|
||||
compat_HTTPError,
|
||||
compat_str,
|
||||
)
|
||||
from ..utils import (
|
||||
extract_attributes,
|
||||
try_get,
|
||||
urlencode_postdata,
|
||||
ExtractorError,
|
||||
)
|
||||
@@ -34,25 +38,32 @@ class TVPlayerIE(InfoExtractor):
|
||||
webpage, 'channel element'))
|
||||
title = current_channel['data-name']
|
||||
|
||||
resource_id = self._search_regex(
|
||||
r'resourceId\s*=\s*"(\d+)"', webpage, 'resource id')
|
||||
platform = self._search_regex(
|
||||
r'platform\s*=\s*"([^"]+)"', webpage, 'platform')
|
||||
resource_id = current_channel['data-id']
|
||||
|
||||
token = self._search_regex(
|
||||
r'token\s*=\s*"([^"]+)"', webpage, 'token', default='null')
|
||||
validate = self._search_regex(
|
||||
r'validate\s*=\s*"([^"]+)"', webpage, 'validate', default='null')
|
||||
r'data-token=(["\'])(?P<token>(?!\1).+)\1', webpage,
|
||||
'token', group='token')
|
||||
|
||||
context = self._download_json(
|
||||
'https://tvplayer.com/watch/context', display_id,
|
||||
'Downloading JSON context', query={
|
||||
'resource': resource_id,
|
||||
'nonce': token,
|
||||
})
|
||||
|
||||
validate = context['validate']
|
||||
platform = try_get(
|
||||
context, lambda x: x['platform']['key'], compat_str) or 'firefox'
|
||||
|
||||
try:
|
||||
response = self._download_json(
|
||||
'http://api.tvplayer.com/api/v2/stream/live',
|
||||
resource_id, headers={
|
||||
display_id, 'Downloading JSON stream', headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
|
||||
}, data=urlencode_postdata({
|
||||
'id': resource_id,
|
||||
'service': 1,
|
||||
'platform': platform,
|
||||
'id': resource_id,
|
||||
'token': token,
|
||||
'validate': validate,
|
||||
}))['tvplayer']['response']
|
||||
except ExtractorError as e:
|
||||
@@ -63,7 +74,7 @@ class TVPlayerIE(InfoExtractor):
|
||||
'%s said: %s' % (self.IE_NAME, response['error']), expected=True)
|
||||
raise
|
||||
|
||||
formats = self._extract_m3u8_formats(response['stream'], resource_id, 'mp4')
|
||||
formats = self._extract_m3u8_formats(response['stream'], display_id, 'mp4')
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
import json
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
@@ -11,7 +12,6 @@ from ..compat import (
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
sanitized_Request,
|
||||
parse_iso8601,
|
||||
)
|
||||
|
||||
@@ -154,19 +154,24 @@ class VevoIE(VevoBaseIE):
|
||||
}
|
||||
|
||||
def _initialize_api(self, video_id):
|
||||
req = sanitized_Request(
|
||||
'http://www.vevo.com/auth', data=b'')
|
||||
webpage = self._download_webpage(
|
||||
req, None,
|
||||
'https://accounts.vevo.com/token', None,
|
||||
note='Retrieving oauth token',
|
||||
errnote='Unable to retrieve oauth token')
|
||||
errnote='Unable to retrieve oauth token',
|
||||
data=json.dumps({
|
||||
'client_id': 'SPupX1tvqFEopQ1YS6SS',
|
||||
'grant_type': 'urn:vevo:params:oauth:grant-type:anonymous',
|
||||
}).encode('utf-8'),
|
||||
headers={
|
||||
'Content-Type': 'application/json',
|
||||
})
|
||||
|
||||
if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
|
||||
self.raise_geo_restricted(
|
||||
'%s said: This page is currently unavailable in your region' % self.IE_NAME)
|
||||
|
||||
auth_info = self._parse_json(webpage, video_id)
|
||||
self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
|
||||
self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['legacy_token']
|
||||
|
||||
def _call_api(self, path, *args, **kwargs):
|
||||
try:
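The new token request in `_initialize_api` is a JSON POST. The sketch below only builds an equivalent request object with the standard library and does not send it; the URL, `client_id`, and `grant_type` are taken from the hunk above, while `build_token_request` is a hypothetical name.

```python
import json
import urllib.request


def build_token_request():
    body = json.dumps({
        'client_id': 'SPupX1tvqFEopQ1YS6SS',
        'grant_type': 'urn:vevo:params:oauth:grant-type:anonymous',
    }).encode('utf-8')
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(
        'https://accounts.vevo.com/token', data=body,
        headers={'Content-Type': 'application/json'})
```

Per the last changed line of the hunk, the extractor then reads `legacy_token` rather than `access_token` from the JSON response.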
|
||||
|
||||
@@ -13,6 +13,7 @@ from ..utils import (
|
||||
class WashingtonPostIE(InfoExtractor):
|
||||
IE_NAME = 'washingtonpost'
|
||||
_VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
|
||||
_EMBED_URL = r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
|
||||
_TEST = {
|
||||
'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
|
||||
'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
|
||||
@@ -27,6 +28,11 @@ class WashingtonPostIE(InfoExtractor):
|
||||
},
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def _extract_urls(cls, webpage):
|
||||
return re.findall(
|
||||
r'<iframe[^>]+\bsrc=["\'](%s)' % cls._EMBED_URL, webpage)
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
video_data = self._download_json(
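The `_extract_urls` classmethod added above can be exercised as a plain function. The sketch reuses the `_EMBED_URL` pattern from this hunk; the sample iframe markup in the usage note is invented.

```python
import re

EMBED_URL = (
    r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/'
    r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}')


def extract_embed_urls(webpage):
    # The only capturing group wraps the embed URL, so findall
    # returns a flat list of URL strings.
    return re.findall(r'<iframe[^>]+\bsrc=["\'](%s)' % EMBED_URL, webpage)
```

For a page containing `<iframe src="https://www.washingtonpost.com/video/c/embed/8caf6e88-d0ec-11e5-90d3-34c2c42653ac">`, the function returns that URL in a one-element list.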
|
||||
|
||||
@@ -6,6 +6,7 @@ import re
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
orderedSet,
|
||||
parse_duration,
|
||||
sanitized_Request,
|
||||
@@ -37,6 +38,22 @@ class XTubeIE(InfoExtractor):
|
||||
'comment_count': int,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}, {
|
||||
# FLV videos with duplicated formats
|
||||
'url': 'http://www.xtube.com/video-watch/A-Super-Run-Part-1-YT-9299752',
|
||||
'md5': 'a406963eb349dd43692ec54631efd88b',
|
||||
'info_dict': {
|
||||
'id': '9299752',
|
||||
'display_id': 'A-Super-Run-Part-1-YT',
|
||||
'ext': 'flv',
|
||||
'title': 'A Super Run - Part 1 (YT)',
|
||||
'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
|
||||
'uploader': 'tshirtguy59',
|
||||
'duration': 579,
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
'age_limit': 18,
|
||||
},
|
||||
}, {
|
||||
# new URL schema
|
||||
'url': 'http://www.xtube.com/video-watch/strange-erotica-625837',
|
||||
@@ -68,8 +85,9 @@ class XTubeIE(InfoExtractor):
|
||||
})
|
||||
|
||||
sources = self._parse_json(self._search_regex(
|
||||
r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
|
||||
webpage, 'sources', group='sources'), video_id)
|
||||
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
|
||||
webpage, 'sources', group='sources'), video_id,
|
||||
transform_source=js_to_json)
|
||||
|
||||
formats = []
|
||||
for format_id, format_url in sources.items():
|
||||
@@ -78,6 +96,7 @@ class XTubeIE(InfoExtractor):
|
||||
'format_id': format_id,
|
||||
'height': int_or_none(format_id),
|
||||
})
|
||||
self._remove_duplicate_formats(formats)
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = self._search_regex(
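The added `_remove_duplicate_formats` call addresses pages that list the same FLV URL under several format ids. One way such deduplication can work is sketched below, under the assumption that formats are compared by URL and that the first occurrence wins; `remove_duplicate_formats` here is an illustrative function, not the framework method.

```python
def remove_duplicate_formats(formats):
    """Drop later entries whose 'url' was already seen, editing the list in place."""
    seen_urls = set()
    unique = []
    for fmt in formats:
        if fmt['url'] not in seen_urls:
            seen_urls.add(fmt['url'])
            unique.append(fmt)
    formats[:] = unique
```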
|
||||
|
||||
@@ -6,8 +6,10 @@ from .common import InfoExtractor
|
||||
from ..compat import compat_urllib_parse_unquote
|
||||
from ..utils import (
|
||||
clean_html,
|
||||
ExtractorError,
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
parse_duration,
|
||||
)
|
||||
|
||||
|
||||
@@ -20,6 +22,7 @@ class XVideosIE(InfoExtractor):
|
||||
'id': '4588838',
|
||||
'ext': 'mp4',
|
||||
'title': 'Biker Takes his Girl',
|
||||
'duration': 108,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}
|
||||
@@ -36,6 +39,11 @@ class XVideosIE(InfoExtractor):
|
||||
r'<title>(.*?)\s+-\s+XVID', webpage, 'title')
|
||||
video_thumbnail = self._search_regex(
|
||||
r'url_bigthumb=(.+?)&', webpage, 'thumbnail', fatal=False)
|
||||
video_duration = int_or_none(self._og_search_property(
|
||||
'duration', webpage, default=None)) or parse_duration(
|
||||
self._search_regex(
|
||||
r'<span[^>]+class=["\']duration["\'][^>]*>.*?(\d[^<]+)',
|
||||
webpage, 'duration', fatal=False))
|
||||
|
||||
formats = []
|
||||
|
||||
@@ -67,6 +75,7 @@ class XVideosIE(InfoExtractor):
|
||||
'id': video_id,
|
||||
'formats': formats,
|
||||
'title': video_title,
|
||||
'duration': video_duration,
|
||||
'thumbnail': video_thumbnail,
|
||||
'age_limit': 18,
|
||||
}
|
||||
|
||||
@@ -234,7 +234,8 @@ class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
|
||||
'overembed': 'false',
|
||||
})['playlist']
|
||||
|
||||
tracks, track_ids = playlist['tracks'], map(compat_str, playlist['trackIds'])
|
||||
tracks = playlist['tracks']
|
||||
track_ids = [compat_str(track_id) for track_id in playlist['trackIds']]
|
||||
|
||||
# tracks dictionary shipped with playlist.jsx API is limited to 150 tracks,
|
||||
# missing tracks should be retrieved manually.
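Background for the Python 3 fix above: on Python 3, `map()` returns a one-shot iterator, so code that walks `track_ids` more than once sees nothing on the second pass. A list comprehension produces a real list. The sketch below demonstrates the difference; `consume_twice` is a hypothetical helper.

```python
def consume_twice(ids):
    """Iterate over ids two times and return both results."""
    first_pass = list(ids)
    second_pass = list(ids)
    return first_pass, second_pass


track_ids = [101, 102, 103]
lazy = map(str, track_ids)                        # iterator on Python 3
eager = [str(track_id) for track_id in track_ids]  # list on Python 2 and 3
```

Passing `lazy` to `consume_twice` gives an empty second result, while `eager` yields the same values on both passes.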
|
||||
|
||||
101
youtube_dl/extractor/zaq1.py
Normal file
@@ -0,0 +1,101 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_timestamp,
+)
+
+
+class Zaq1IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?zaq1\.pl/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://zaq1.pl/video/xev0e',
+        'md5': '24a5eb3f052e604ae597c4d0d19b351e',
+        'info_dict': {
+            'id': 'xev0e',
+            'title': 'DJ NA WESELE. TANIEC Z FIGURAMI.węgrów/sokołów podlaski/siedlce/mińsk mazowiecki/warszawa',
+            'description': 'www.facebook.com/weseledjKontakt: 728 448 199 / 505 419 147',
+            'ext': 'mp4',
+            'duration': 511,
+            'timestamp': 1490896361,
+            'uploader': 'Anonim',
+            'upload_date': '20170330',
+            'view_count': int,
+        }
+    }, {
+        # malformed JSON-LD
+        'url': 'http://zaq1.pl/video/x81vn',
+        'info_dict': {
+            'id': 'x81vn',
+            'title': 'SEKRETNE ŻYCIE WALTERA MITTY',
+            'ext': 'mp4',
+            'duration': 6234,
+            'timestamp': 1493494860,
+            'uploader': 'Anonim',
+            'upload_date': '20170429',
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to parse JSON'],
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_url = self._search_regex(
+            r'data-video-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'video url', group='url')
+
+        info = self._search_json_ld(webpage, video_id, fatal=False)
+
+        def extract_data(field, name, fatal=False):
+            return self._search_regex(
+                r'data-%s=(["\'])(?P<field>(?:(?!\1).)+)\1' % field,
+                webpage, field, fatal=fatal, group='field')
+
+        if not info.get('title'):
+            info['title'] = extract_data('file-name', 'title', fatal=True)
+
+        if not info.get('duration'):
+            info['duration'] = int_or_none(extract_data('duration', 'duration'))
+
+        if not info.get('thumbnail'):
+            info['thumbnail'] = extract_data('photo-url', 'thumbnail')
+
+        if not info.get('timestamp'):
+            info['timestamp'] = unified_timestamp(self._html_search_meta(
+                'uploadDate', webpage, 'timestamp'))
+
+        if not info.get('interactionCount'):
+            info['view_count'] = int_or_none(self._html_search_meta(
+                'interactionCount', webpage, 'view count'))
+
+        uploader = self._html_search_regex(
+            r'Wideo dodał:\s*<a[^>]*>([^<]+)</a>', webpage, 'uploader',
+            fatal=False)
+
+        width = int_or_none(self._html_search_meta(
+            'width', webpage, fatal=False))
+        height = int_or_none(self._html_search_meta(
+            'height', webpage, fatal=False))
+
+        info.update({
+            'id': video_id,
+            'formats': [{
+                'url': video_url,
+                'width': width,
+                'height': height,
+                'http_headers': {
+                    'Referer': url,
+                },
+            }],
+            'uploader': uploader,
+        })
+
+        return info
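The `data-...` patterns in this extractor use a backreference so the closing quote must match the opening one, which lets a value contain the other quote character. The following is a standalone sketch of that idea, not part of the extractor: `extract_data_attr` and the sample markup are hypothetical, and plain `re.search` stands in for `self._search_regex`.

```python
import re


def extract_data_attr(html, field):
    """Return the value of a data-<field> attribute, or None if absent.

    Hypothetical helper for illustration only.
    """
    # (["\']) captures the opening quote; (?:(?!\1).)+ consumes characters
    # until that same quote appears again; \1 matches the closing quote.
    pattern = r'data-%s=(["\'])(?P<value>(?:(?!\1).)+)\1' % re.escape(field)
    m = re.search(pattern, html)
    return m.group('value') if m else None


# Made-up markup resembling what the extractor looks for.
sample_html = (
    '<div data-video-url="http://example.com/v.mp4" '
    'data-file-name=\'Some "quoted" title\'></div>'
)
video_url = extract_data_attr(sample_html, 'video-url')
file_name = extract_data_attr(sample_html, 'file-name')
missing = extract_data_attr(sample_html, 'duration')
```

Because the value must be at least one character long, an empty attribute such as `data-duration=""` is treated the same as a missing one.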
@@ -421,8 +421,8 @@ def clean_html(html):
 
     # Newline vs <br />
     html = html.replace('\n', ' ')
-    html = re.sub(r'\s*<\s*br\s*/?\s*>\s*', '\n', html)
-    html = re.sub(r'<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
+    html = re.sub(r'(?u)\s*<\s*br\s*/?\s*>\s*', '\n', html)
+    html = re.sub(r'(?u)<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
     # Strip html tags
     html = re.sub('<.*?>', '', html)
     # Replace html entities
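On the `(?u)` prefix: on Python 2, `\s` matches only ASCII whitespace unless the `re.UNICODE` flag is set, so a non-breaking space next to a `<br>` would survive the substitution. On Python 3, `str` patterns are Unicode-aware by default and the flag is redundant but harmless. Below is a minimal sketch of the first substitution in isolation; `br_to_newline` and the sample string are assumptions made for illustration.

```python
import re


def br_to_newline(html):
    # Same pattern as the first re.sub in the hunk; (?u) makes \s cover
    # Unicode whitespace such as U+00A0 on Python 2 as well.
    return re.sub(r'(?u)\s*<\s*br\s*/?\s*>\s*', '\n', html)


# A <br /> surrounded by non-breaking spaces (U+00A0).
sample = u'first\u00a0<br />\u00a0second'
result = br_to_newline(sample)
```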
@@ -1194,6 +1194,11 @@ def unified_timestamp(date_str, day_first=True):
     # Remove AM/PM + timezone
     date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)

+    # Remove unrecognized timezones from ISO 8601 alike timestamps
+    m = re.search(r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$', date_str)
+    if m:
+        date_str = date_str[:-len(m.group('tz'))]
+
     for expression in date_formats(day_first):
         try:
             dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
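The added lines strip a trailing alphabetic timezone suffix that none of the `strptime` formats would accept. A standalone sketch of just that step follows; `strip_unrecognized_tz` is a hypothetical wrapper, and in the real function recognized offsets are presumably handled before this point, which the sketch does not attempt.

```python
import re


def strip_unrecognized_tz(date_str):
    # Hypothetical wrapper around the logic added in the hunk: if the string
    # ends in a time followed by capital letters, drop those letters.
    m = re.search(r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$', date_str)
    if m:
        date_str = date_str[:-len(m.group('tz'))]
    return date_str


# Made-up inputs for illustration.
with_suffix = strip_unrecognized_tz('2017-04-29T19:41:00CEST')
with_fraction = strip_unrecognized_tz('2017-04-29 19:41:00.123 UTC')
untouched = strip_unrecognized_tz('2017-04-29 19:41:00')
```

Note that the suffix is discarded rather than interpreted, so the resulting timestamp is treated as if no timezone had been given.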
@@ -2273,10 +2278,8 @@ def mimetype2ext(mt):
     return {
         '3gpp': '3gp',
         'smptett+xml': 'tt',
-        'srt': 'srt',
         'ttaf+xml': 'dfxp',
         'ttml+xml': 'ttml',
-        'vtt': 'vtt',
         'x-flv': 'flv',
         'x-mp4-fragmented': 'mp4',
         'x-ms-wmv': 'wmv',
@@ -2284,11 +2287,11 @@ def mimetype2ext(mt):
         'x-mpegurl': 'm3u8',
         'vnd.apple.mpegurl': 'm3u8',
         'dash+xml': 'mpd',
-        'f4m': 'f4m',
         'f4m+xml': 'f4m',
         'hds+xml': 'f4m',
         'vnd.ms-sstr+xml': 'ism',
         'quicktime': 'mov',
+        'mp2t': 'ts',
     }.get(res, res)


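Entries that map a subtype to itself are redundant because the lookup ends in `.get(res, res)`, which falls back to the subtype when no mapping exists. The sketch below shows that fallback with a trimmed-down table; `subtype_to_ext` and its parsing of the MIME string are assumptions for illustration, not the actual body of `mimetype2ext`.

```python
def subtype_to_ext(mimetype):
    # Hypothetical stand-in: take the part after the last '/', drop any
    # parameters, and normalise case before the lookup.
    subtype = mimetype.rpartition('/')[2].split(';')[0].strip().lower()
    known = {
        'x-flv': 'flv',
        'quicktime': 'mov',
        'mp2t': 'ts',
    }
    # Unknown subtypes are returned unchanged, so identity entries such as
    # 'vtt': 'vtt' add nothing.
    return known.get(subtype, subtype)


mapped = subtype_to_ext('video/MP2T')
passthrough = subtype_to_ext('text/vtt; charset=utf-8')
```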
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2017.04.28'
+__version__ = '2017.05.01'