release 2017.05.01

[ChangeLog] Actualize
[infoq] Make audio format extraction non fatal (closes #12938 )
2017-05-01 01:39:52 +07:00 · 2017-05-01 01:34:51 +07:00 · 2017-05-01 01:23:05 +07:00 · 2017-05-01 01:03:47 +07:00 · 2017-04-30 22:04:21 +07:00 · 2017-04-30 22:04:01 +07:00
24 changed files with 459 additions and 53 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@

 ---

-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.28*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.28**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.05.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.05.01**

 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2017.04.28
+[debug] youtube-dl version 2017.05.01
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/46
+++ b/46
@@ -1,3 +1,31 @@
+version 2017.05.01
+
+Core
+ [extractor/common] Extract view count from JSON-LD
+* [utils] Improve unified_timestamp
+ [utils] Add video/mp2t to mimetype2ext
+* [downloader/external] Properly handle live stream downloading cancellation
+  (#8932)
+ [utils] Add support for unicode whitespace in clean_html on python 2 (#12906)
+
+Extractors
+* [infoq] Make audio format extraction non fatal (#12938)
+* [brightcove] Allow whitespace around attribute names in embedded code
+ [zaq1] Add support for zaq1.pl (#12693)
+ [xvideos] Extract duration (#12828)
+* [vevo] Fix extraction (#12879)
+ [noovo] Add support for noovo.ca (#12792)
+ [washingtonpost] Add support for embeds (#12699)
+* [yandexmusic:playlist] Fix extraction for python 3 (#12888)
+* [anvato] Improve extraction (#12913)
+    * Promote to regular shortcut based extractor
+    * Add mcp to access key mapping table
+    * Add support for embeds extraction
+    * Add support for anvato embeds in generic extractor
+* [xtube] Fix extraction for older FLV videos (#12734)
+* [tvplayer] Fix extraction (#12908)
+
+
 version 2017.04.28

 Core
@@ -24,19 +52,19 @@ Core
 * [YoutubeDL] Fix output template for missing timestamp (#12796)
 * [socks] Handle cases where credentials are required but missing
 * [extractor/common] Improve HLS extraction (#12211)
-    - Extract m3u8 parsing to separate method
-    - Improve rendition groups extraction
-    - Build stream name according stream GROUP-ID
-    - Ignore reference to AUDIO group without URI when stream has no CODECS
-    - Use float for scaled tbr in _parse_m3u8_formats
+    * Extract m3u8 parsing to separate method
+    * Improve rendition groups extraction
+    * Build stream name according stream GROUP-ID
+    * Ignore reference to AUDIO group without URI when stream has no CODECS
+    * Use float for scaled tbr in _parse_m3u8_formats
 * [utils] Add support for TTML styles in dfxp2srt
 * [downloader/hls] No need to download keys for fragments that have been
  already downloaded
 * [downloader/fragment] Improve fragment downloading
-    - Resume immediately
-    - Don't concatenate fragments and decrypt them on every resume
-    - Optimize disk storage usage, don't store intermediate fragments on disk
-    - Store bookkeeping download state file
+    * Resume immediately
+    * Don't concatenate fragments and decrypt them on every resume
+    * Optimize disk storage usage, don't store intermediate fragments on disk
+    * Store bookkeeping download state file
 + [extractor/common] Add support for multiple getters in try_get
 + [extractor/common] Add support for video of WebPage context in _json_ld
  (#12778)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -45,6 +45,7 @@
 - **anderetijden**: npo.nl and ntr.nl
 - **AnimeOnDemand**
 - **anitube.se**
+ - **Anvato**
 - **AnySex**
 - **Aparat**
 - **AppleConnect**
@@ -529,6 +530,7 @@
 - **NJPWWorld**: 新日本プロレスワールド
 - **NobelPrize**
 - **Noco**
+ - **Noovo**
 - **Normalboots**
 - **NosVideo**
 - **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
@@ -1013,6 +1015,7 @@
 - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
 - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
 - **Zapiks**
+ - **Zaq1**
 - **ZDF**
 - **ZDFChannel**
 - **zingmp3**: mp3.zing.vn
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -338,6 +338,7 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
        self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
        self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
+        self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)

    def test_determine_ext(self):
        self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -899,6 +900,7 @@ class TestUtil(unittest.TestCase):
    def test_clean_html(self):
        self.assertEqual(clean_html('a:\nb'), 'a: b')
        self.assertEqual(clean_html('a:\n   "b"'), 'a:    "b"')
+        self.assertEqual(clean_html('a<br>\xa0b'), 'a\nb')

    def test_intlist_to_bytes(self):
        self.assertEqual(
--- a/youtube_dl/downloader/external.py
+++ b/youtube_dl/downloader/external.py
@@ -29,7 +29,17 @@ class ExternalFD(FileDownloader):
        self.report_destination(filename)
        tmpfilename = self.temp_name(filename)

-        retval = self._call_downloader(tmpfilename, info_dict)
+        try:
+            retval = self._call_downloader(tmpfilename, info_dict)
+        except KeyboardInterrupt:
+            if not info_dict.get('is_live'):
+                raise
+            # Live stream downloading cancellation should be considered as
+            # correct and expected termination thus all postprocessing
+            # should take place
+            retval = 0
+            self.to_screen('[%s] Interrupted by user' % self.get_basename())
+
        if retval == 0:
            fsize = os.path.getsize(encodeFilename(tmpfilename))
            self.to_screen('\r[%s] Downloaded %s bytes' % (self.get_basename(), fsize))
--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py
@@ -49,7 +49,7 @@ class FragmentFD(FileDownloader):
                index:  0-based index of current fragment among all fragments
            fragment_count:
                Total count of fragments
-                
+
    This feature is experimental and file format may change in future.
    """

--- a/youtube_dl/extractor/anvato.py
+++ b/youtube_dl/extractor/anvato.py
@@ -5,6 +5,7 @@ import base64
 import hashlib
 import json
 import random
+import re
 import time

 from .common import InfoExtractor
@@ -16,6 +17,7 @@ from ..utils import (
    intlist_to_bytes,
    int_or_none,
    strip_jsonp,
+    unescapeHTML,
 )


@@ -26,6 +28,8 @@ def md5_text(s):


 class AnvatoIE(InfoExtractor):
+    _VALID_URL = r'anvato:(?P<access_key_or_mcp>[^:]+):(?P<id>\d+)'
+
    # Copied from anvplayer.min.js
    _ANVACK_TABLE = {
        'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
@@ -114,6 +118,22 @@ class AnvatoIE(InfoExtractor):
        'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
    }

+    _MCP_TO_ACCESS_KEY_TABLE = {
+        'qa': 'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922',
+        'lin': 'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749',
+        'univison': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
+        'uni': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
+        'dev': 'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a',
+        'sps': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
+        'spsstg': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
+        'anv': 'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3',
+        'gray': 'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900',
+        'hearst': 'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99',
+        'cbs': 'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe',
+        'telemundo': 'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582'
+    }
+
+    _ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'
    _AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'

    def __init__(self, *args, **kwargs):
@@ -217,9 +237,42 @@ class AnvatoIE(InfoExtractor):
            'subtitles': subtitles,
        }

+    @staticmethod
+    def _extract_urls(ie, webpage, video_id):
+        entries = []
+        for mobj in re.finditer(AnvatoIE._ANVP_RE, webpage):
+            anvplayer_data = ie._parse_json(
+                mobj.group('anvp'), video_id, transform_source=unescapeHTML,
+                fatal=False)
+            if not anvplayer_data:
+                continue
+            video = anvplayer_data.get('video')
+            if not isinstance(video, compat_str) or not video.isdigit():
+                continue
+            access_key = anvplayer_data.get('accessKey')
+            if not access_key:
+                mcp = anvplayer_data.get('mcp')
+                if mcp:
+                    access_key = AnvatoIE._MCP_TO_ACCESS_KEY_TABLE.get(
+                        mcp.lower())
+            if not access_key:
+                continue
+            entries.append(ie.url_result(
+                'anvato:%s:%s' % (access_key, video), ie=AnvatoIE.ie_key(),
+                video_id=video))
+        return entries
+
    def _extract_anvato_videos(self, webpage, video_id):
-        anvplayer_data = self._parse_json(self._html_search_regex(
-            r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
-            'Anvato player data'), video_id)
+        anvplayer_data = self._parse_json(
+            self._html_search_regex(
+                self._ANVP_RE, webpage, 'Anvato player data', group='anvp'),
+            video_id)
        return self._get_anvato_videos(
            anvplayer_data['accessKey'], anvplayer_data['video'])
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        access_key, video_id = mobj.group('access_key_or_mcp', 'id')
+        if access_key not in self._ANVACK_TABLE:
+            access_key = self._MCP_TO_ACCESS_KEY_TABLE[access_key]
+        return self._get_anvato_videos(access_key, video_id)
--- a/youtube_dl/extractor/appletrailers.py
+++ b/youtube_dl/extractor/appletrailers.py
@@ -70,7 +70,8 @@ class AppleTrailersIE(InfoExtractor):
    }, {
        'url': 'http://trailers.apple.com/trailers/magnolia/blackthorn/',
        'info_dict': {
-            'id': 'blackthorn',
+            'id': '4489',
+            'title': 'Blackthorn',
        },
        'playlist_mincount': 2,
        'expected_warnings': ['Unable to download JSON metadata'],
@@ -261,7 +262,7 @@ class AppleTrailersSectionIE(InfoExtractor):
            'title': 'Most Popular',
            'id': 'mostpopular',
        },
-        'playlist_mincount': 80,
+        'playlist_mincount': 30,
    }, {
        'url': 'http://trailers.apple.com/#section=moviestudios',
        'info_dict': {
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -522,7 +522,7 @@ class BrightcoveNewIE(InfoExtractor):
        # [2] looks like:
        for video, script_tag, account_id, player_id, embed in re.findall(
                r'''(?isx)
-                    (<video\s+[^>]*data-video-id=['"]?[^>]+>)
+                    (<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
                    (?:.*?
                        (<script[^>]+
                            src=["\'](?:https?:)?//players\.brightcove\.net/
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -96,6 +96,7 @@ class CBCIE(InfoExtractor):
        'info_dict': {
            'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
            'id': 'dog-indoor-exercise-winter-1.3928238',
+            'description': 'md5:c18552e41726ee95bd75210d1ca9194c',
        },
        'playlist_mincount': 6,
    }]
@@ -165,12 +166,11 @@ class CBCPlayerIE(InfoExtractor):
            'uploader': 'CBCC-NEW',
        },
    }, {
-        # available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
        'url': 'http://www.cbc.ca/player/play/2164402062',
-        'md5': '17a61eb813539abea40618d6323a7f82',
+        'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
        'info_dict': {
            'id': '2164402062',
-            'ext': 'flv',
+            'ext': 'mp4',
            'title': 'Cancer survivor four times over',
            'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
            'timestamp': 1320410746,
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -990,6 +990,7 @@ class InfoExtractor(object):
                'tbr': int_or_none(e.get('bitrate')),
                'width': int_or_none(e.get('width')),
                'height': int_or_none(e.get('height')),
+                'view_count': int_or_none(e.get('interactionCount')),
            })

        for e in json_ld:
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -41,6 +41,7 @@ from .alphaporno import AlphaPornoIE
 from .amcnetworks import AMCNetworksIE
 from .animeondemand import AnimeOnDemandIE
 from .anitube import AnitubeIE
+from .anvato import AnvatoIE
 from .anysex import AnySexIE
 from .aol import AolIE
 from .allocine import AllocineIE
@@ -662,6 +663,7 @@ from .nintendo import NintendoIE
 from .njpwworld import NJPWWorldIE
 from .nobelprize import NobelPrizeIE
 from .noco import NocoIE
+from .noovo import NoovoIE
 from .normalboots import NormalbootsIE
 from .nosvideo import NosVideoIE
 from .nova import NovaIE
@@ -1298,5 +1300,6 @@ from .youtube import (
    YoutubeWatchLaterIE,
 )
 from .zapiks import ZapiksIE
+from .zaq1 import Zaq1IE
 from .zdf import ZDFIE, ZDFChannelIE
 from .zingmp3 import ZingMp3IE
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -86,6 +86,8 @@ from .openload import OpenloadIE
 from .videopress import VideoPressIE
 from .rutube import RutubeIE
 from .limelight import LimelightBaseIE
+from .anvato import AnvatoIE
+from .washingtonpost import WashingtonPostIE


 class GenericIE(InfoExtractor):
@@ -1427,6 +1429,22 @@ class GenericIE(InfoExtractor):
                'skip_download': True,
            },
        },
+        {
+            # Brightcove embed with whitespace around attribute names
+            'url': 'http://www.stack.com/video/3167554373001/learn-to-hit-open-three-pointers-with-damian-lillard-s-baseline-drift-drill',
+            'info_dict': {
+                'id': '3167554373001',
+                'ext': 'mp4',
+                'title': "Learn to Hit Open Three-Pointers With Damian Lillard's Baseline Drift Drill",
+                'description': 'md5:57bacb0e0f29349de4972bfda3191713',
+                'uploader_id': '1079349493',
+                'upload_date': '20140207',
+                'timestamp': 1391810548,
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
        # Another form of arte.tv embed
        {
            'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
@@ -1677,6 +1695,29 @@ class GenericIE(InfoExtractor):
            },
            'playlist_mincount': 5,
        },
+        {
+            'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
+            'info_dict': {
+                'id': 'standoff-with-walnut-creek-murder-suspect-ends-with-arrest',
+                'title': 'Standoff with Walnut Creek murder suspect ends',
+                'description': 'md5:3ccc48a60fc9441eeccfc9c469ebf788',
+            },
+            'playlist_mincount': 4,
+        },
+        {
+            # WashingtonPost embed
+            'url': 'http://www.vanityfair.com/hollywood/2017/04/donald-trump-tv-pitches',
+            'info_dict': {
+                'id': '8caf6e88-d0ec-11e5-90d3-34c2c42653ac',
+                'ext': 'mp4',
+                'title': "No one has seen the drama series based on Trump's life \u2014 until now",
+                'description': 'Donald Trump wanted a weekly TV drama based on his life. It never aired. But The Washington Post recently obtained a scene from the pilot script — and enlisted actors.',
+                'timestamp': 1455216756,
+                'uploader': 'The Washington Post',
+                'upload_date': '20160211',
+            },
+            'add_ie': [WashingtonPostIE.ie_key()],
+        },
        # {
        #     # TODO: find another test
        #     # http://schema.org/VideoObject
@@ -2537,6 +2578,12 @@ class GenericIE(InfoExtractor):
                'limelight:media:%s' % mobj.group('id'),
                {'source_url': url}), 'LimelightMedia', mobj.group('id'))

+        # Look for Anvato embeds
+        anvato_urls = AnvatoIE._extract_urls(self, webpage, video_id)
+        if anvato_urls:
+            return self.playlist_result(
+                anvato_urls, video_id, video_title, video_description)
+
        # Look for AdobeTVVideo embeds
        mobj = re.search(
            r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
@@ -2654,6 +2701,12 @@ class GenericIE(InfoExtractor):
            return self.playlist_from_matches(
                rutube_urls, ie=RutubeIE.ie_key())

+        # Look for WashingtonPost embeds
+        wapo_urls = WashingtonPostIE._extract_urls(webpage)
+        if wapo_urls:
+            return self.playlist_from_matches(
+                wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
+
        # Looking for http://schema.org/VideoObject
        json_ld = self._search_json_ld(
            webpage, video_id, default={}, expected_type='VideoObject')
--- a/youtube_dl/extractor/infoq.py
+++ b/youtube_dl/extractor/infoq.py
@@ -87,8 +87,8 @@ class InfoQIE(BokeCCBaseIE):

    def _extract_http_audio(self, webpage, video_id):
        fields = self._hidden_inputs(webpage)
-        http_audio_url = fields['filename']
-        if http_audio_url is None:
+        http_audio_url = fields.get('filename')
+        if not http_audio_url:
            return []

        cookies_header = {'Cookie': self._extract_cookies(webpage)}
--- a/youtube_dl/extractor/noovo.py
+++ b/youtube_dl/extractor/noovo.py
@@ -0,0 +1,97 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .brightcove import BrightcoveNewIE
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    smuggle_url,
+    try_get,
+)
+
+
+class NoovoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:[^/]+\.)?noovo\.ca/videos/(?P<id>[^/]+/[^/?#&]+)'
+    _TESTS = [{
+        # clip
+        'url': 'http://noovo.ca/videos/rpm-plus/chrysler-imperial',
+        'info_dict': {
+            'id': '5386045029001',
+            'ext': 'mp4',
+            'title': 'Chrysler Imperial',
+            'description': 'md5:de3c898d1eb810f3e6243e08c8b4a056',
+            'timestamp': 1491399228,
+            'upload_date': '20170405',
+            'uploader_id': '618566855001',
+            'creator': 'vtele',
+            'view_count': int,
+            'series': 'RPM+',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # episode
+        'url': 'http://noovo.ca/videos/l-amour-est-dans-le-pre/episode-13-8',
+        'info_dict': {
+            'id': '5395865725001',
+            'title': 'Épisode 13 : Les retrouvailles',
+            'description': 'md5:336d5ebc5436534e61d16e63ddfca327',
+            'ext': 'mp4',
+            'timestamp': 1492019320,
+            'upload_date': '20170412',
+            'uploader_id': '618566855001',
+            'creator': 'vtele',
+            'view_count': int,
+            'series': "L'amour est dans le pré",
+            'season_number': 5,
+            'episode': 'Épisode 13',
+            'episode_number': 13,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/618566855001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        data = self._download_json(
+            'http://api.noovo.ca/api/v1/pages/single-episode/%s' % video_id,
+            video_id)['data']
+
+        content = try_get(data, lambda x: x['contents'][0])
+
+        brightcove_id = data.get('brightcoveId') or content['brightcoveId']
+
+        series = try_get(
+            data, (
+                lambda x: x['show']['title'],
+                lambda x: x['season']['show']['title']),
+            compat_str)
+
+        episode = None
+        og = data.get('og')
+        if isinstance(og, dict) and og.get('type') == 'video.episode':
+            episode = og.get('title')
+
+        video = content or data
+
+        return {
+            '_type': 'url_transparent',
+            'ie_key': BrightcoveNewIE.ie_key(),
+            'url': smuggle_url(
+                self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+                {'geo_countries': ['CA']}),
+            'id': brightcove_id,
+            'title': video.get('title'),
+            'creator': video.get('source'),
+            'view_count': int_or_none(video.get('viewsCount')),
+            'series': series,
+            'season_number': int_or_none(try_get(
+                data, lambda x: x['season']['seasonNumber'])),
+            'episode': episode,
+            'episode_number': int_or_none(data.get('episodeNumber')),
+        }
--- a/youtube_dl/extractor/tvplayer.py
+++ b/youtube_dl/extractor/tvplayer.py
@@ -2,9 +2,13 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..compat import compat_HTTPError
+from ..compat import (
+    compat_HTTPError,
+    compat_str,
+)
 from ..utils import (
    extract_attributes,
+    try_get,
    urlencode_postdata,
    ExtractorError,
 )
@@ -34,25 +38,32 @@ class TVPlayerIE(InfoExtractor):
            webpage, 'channel element'))
        title = current_channel['data-name']

-        resource_id = self._search_regex(
-            r'resourceId\s*=\s*"(\d+)"', webpage, 'resource id')
-        platform = self._search_regex(
-            r'platform\s*=\s*"([^"]+)"', webpage, 'platform')
+        resource_id = current_channel['data-id']
+
        token = self._search_regex(
-            r'token\s*=\s*"([^"]+)"', webpage, 'token', default='null')
-        validate = self._search_regex(
-            r'validate\s*=\s*"([^"]+)"', webpage, 'validate', default='null')
+            r'data-token=(["\'])(?P<token>(?!\1).+)\1', webpage,
+            'token', group='token')
+
+        context = self._download_json(
+            'https://tvplayer.com/watch/context', display_id,
+            'Downloading JSON context', query={
+                'resource': resource_id,
+                'nonce': token,
+            })
+
+        validate = context['validate']
+        platform = try_get(
+            context, lambda x: x['platform']['key'], compat_str) or 'firefox'

        try:
            response = self._download_json(
                'http://api.tvplayer.com/api/v2/stream/live',
-                resource_id, headers={
+                display_id, 'Downloading JSON stream', headers={
                    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
                }, data=urlencode_postdata({
+                    'id': resource_id,
                    'service': 1,
                    'platform': platform,
-                    'id': resource_id,
-                    'token': token,
                    'validate': validate,
                }))['tvplayer']['response']
        except ExtractorError as e:
@@ -63,7 +74,7 @@ class TVPlayerIE(InfoExtractor):
                    '%s said: %s' % (self.IE_NAME, response['error']), expected=True)
            raise

-        formats = self._extract_m3u8_formats(response['stream'], resource_id, 'mp4')
+        formats = self._extract_m3u8_formats(response['stream'], display_id, 'mp4')
        self._sort_formats(formats)

        return {
--- a/youtube_dl/extractor/vevo.py
+++ b/youtube_dl/extractor/vevo.py
@@ -1,6 +1,7 @@
 from __future__ import unicode_literals

 import re
+import json

 from .common import InfoExtractor
 from ..compat import (
@@ -11,7 +12,6 @@ from ..compat import (
 from ..utils import (
    ExtractorError,
    int_or_none,
-    sanitized_Request,
    parse_iso8601,
 )

@@ -154,19 +154,24 @@ class VevoIE(VevoBaseIE):
    }

    def _initialize_api(self, video_id):
-        req = sanitized_Request(
-            'http://www.vevo.com/auth', data=b'')
        webpage = self._download_webpage(
-            req, None,
+            'https://accounts.vevo.com/token', None,
            note='Retrieving oauth token',
-            errnote='Unable to retrieve oauth token')
+            errnote='Unable to retrieve oauth token',
+            data=json.dumps({
+                'client_id': 'SPupX1tvqFEopQ1YS6SS',
+                'grant_type': 'urn:vevo:params:oauth:grant-type:anonymous',
+            }).encode('utf-8'),
+            headers={
+                'Content-Type': 'application/json',
+            })

        if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
            self.raise_geo_restricted(
                '%s said: This page is currently unavailable in your region' % self.IE_NAME)

        auth_info = self._parse_json(webpage, video_id)
-        self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
+        self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['legacy_token']

    def _call_api(self, path, *args, **kwargs):
        try:
--- a/youtube_dl/extractor/washingtonpost.py
+++ b/youtube_dl/extractor/washingtonpost.py
@@ -13,6 +13,7 @@ from ..utils import (
 class WashingtonPostIE(InfoExtractor):
    IE_NAME = 'washingtonpost'
    _VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _EMBED_URL = r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
    _TEST = {
        'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
        'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
@@ -27,6 +28,11 @@ class WashingtonPostIE(InfoExtractor):
        },
    }

+    @classmethod
+    def _extract_urls(cls, webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\'](%s)' % cls._EMBED_URL, webpage)
+
    def _real_extract(self, url):
        video_id = self._match_id(url)
        video_data = self._download_json(
--- a/youtube_dl/extractor/xtube.py
+++ b/youtube_dl/extractor/xtube.py
@@ -6,6 +6,7 @@ import re
 from .common import InfoExtractor
 from ..utils import (
    int_or_none,
+    js_to_json,
    orderedSet,
    parse_duration,
    sanitized_Request,
@@ -37,6 +38,22 @@ class XTubeIE(InfoExtractor):
            'comment_count': int,
            'age_limit': 18,
        }
+    }, {
+        # FLV videos with duplicated formats
+        'url': 'http://www.xtube.com/video-watch/A-Super-Run-Part-1-YT-9299752',
+        'md5': 'a406963eb349dd43692ec54631efd88b',
+        'info_dict': {
+            'id': '9299752',
+            'display_id': 'A-Super-Run-Part-1-YT',
+            'ext': 'flv',
+            'title': 'A Super Run - Part 1 (YT)',
+            'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
+            'uploader': 'tshirtguy59',
+            'duration': 579,
+            'view_count': int,
+            'comment_count': int,
+            'age_limit': 18,
+        },
    }, {
        # new URL schema
        'url': 'http://www.xtube.com/video-watch/strange-erotica-625837',
@@ -68,8 +85,9 @@ class XTubeIE(InfoExtractor):
            })

        sources = self._parse_json(self._search_regex(
-            r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
-            webpage, 'sources', group='sources'), video_id)
+            r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
+            webpage, 'sources', group='sources'), video_id,
+            transform_source=js_to_json)

        formats = []
        for format_id, format_url in sources.items():
@@ -78,6 +96,7 @@ class XTubeIE(InfoExtractor):
                'format_id': format_id,
                'height': int_or_none(format_id),
            })
+        self._remove_duplicate_formats(formats)
        self._sort_formats(formats)

        title = self._search_regex(
--- a/youtube_dl/extractor/xvideos.py
+++ b/youtube_dl/extractor/xvideos.py
@@ -6,8 +6,10 @@ from .common import InfoExtractor
 from ..compat import compat_urllib_parse_unquote
 from ..utils import (
    clean_html,
-    ExtractorError,
    determine_ext,
+    ExtractorError,
+    int_or_none,
+    parse_duration,
 )


@@ -20,6 +22,7 @@ class XVideosIE(InfoExtractor):
            'id': '4588838',
            'ext': 'mp4',
            'title': 'Biker Takes his Girl',
+            'duration': 108,
            'age_limit': 18,
        }
    }
@@ -36,6 +39,11 @@ class XVideosIE(InfoExtractor):
            r'<title>(.*?)\s+-\s+XVID', webpage, 'title')
        video_thumbnail = self._search_regex(
            r'url_bigthumb=(.+?)&amp', webpage, 'thumbnail', fatal=False)
+        video_duration = int_or_none(self._og_search_property(
+            'duration', webpage, default=None)) or parse_duration(
+            self._search_regex(
+                r'<span[^>]+class=["\']duration["\'][^>]*>.*?(\d[^<]+)',
+                webpage, 'duration', fatal=False))

        formats = []

@@ -67,6 +75,7 @@ class XVideosIE(InfoExtractor):
            'id': video_id,
            'formats': formats,
            'title': video_title,
+            'duration': video_duration,
            'thumbnail': video_thumbnail,
            'age_limit': 18,
        }
--- a/youtube_dl/extractor/yandexmusic.py
+++ b/youtube_dl/extractor/yandexmusic.py
@@ -234,7 +234,8 @@ class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
                'overembed': 'false',
            })['playlist']

-        tracks, track_ids = playlist['tracks'], map(compat_str, playlist['trackIds'])
+        tracks = playlist['tracks']
+        track_ids = [compat_str(track_id) for track_id in playlist['trackIds']]

        # tracks dictionary shipped with playlist.jsx API is limited to 150 tracks,
        # missing tracks should be retrieved manually.
--- a/youtube_dl/extractor/zaq1.py
+++ b/youtube_dl/extractor/zaq1.py
@@ -0,0 +1,101 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_timestamp,
+)
+
+
+class Zaq1IE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?zaq1\.pl/video/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'http://zaq1.pl/video/xev0e',
+        'md5': '24a5eb3f052e604ae597c4d0d19b351e',
+        'info_dict': {
+            'id': 'xev0e',
+            'title': 'DJ NA WESELE. TANIEC Z FIGURAMI.węgrów/sokołów podlaski/siedlce/mińsk mazowiecki/warszawa',
+            'description': 'www.facebook.com/weseledjKontakt: 728 448 199 / 505 419 147',
+            'ext': 'mp4',
+            'duration': 511,
+            'timestamp': 1490896361,
+            'uploader': 'Anonim',
+            'upload_date': '20170330',
+            'view_count': int,
+        }
+    }, {
+        # malformed JSON-LD
+        'url': 'http://zaq1.pl/video/x81vn',
+        'info_dict': {
+            'id': 'x81vn',
+            'title': 'SEKRETNE ŻYCIE WALTERA MITTY',
+            'ext': 'mp4',
+            'duration': 6234,
+            'timestamp': 1493494860,
+            'uploader': 'Anonim',
+            'upload_date': '20170429',
+            'view_count': int,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to parse JSON'],
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        video_url = self._search_regex(
+            r'data-video-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'video url', group='url')
+
+        info = self._search_json_ld(webpage, video_id, fatal=False)
+
+        def extract_data(field, name, fatal=False):
+            return self._search_regex(
+                r'data-%s=(["\'])(?P<field>(?:(?!\1).)+)\1' % field,
+                webpage, field, fatal=fatal, group='field')
+
+        if not info.get('title'):
+            info['title'] = extract_data('file-name', 'title', fatal=True)
+
+        if not info.get('duration'):
+            info['duration'] = int_or_none(extract_data('duration', 'duration'))
+
+        if not info.get('thumbnail'):
+            info['thumbnail'] = extract_data('photo-url', 'thumbnail')
+
+        if not info.get('timestamp'):
+            info['timestamp'] = unified_timestamp(self._html_search_meta(
+                'uploadDate', webpage, 'timestamp'))
+
+        if not info.get('interactionCount'):
+            info['view_count'] = int_or_none(self._html_search_meta(
+                'interactionCount', webpage, 'view count'))
+
+        uploader = self._html_search_regex(
+            r'Wideo dodał:\s*<a[^>]*>([^<]+)</a>', webpage, 'uploader',
+            fatal=False)
+
+        width = int_or_none(self._html_search_meta(
+            'width', webpage, fatal=False))
+        height = int_or_none(self._html_search_meta(
+            'height', webpage, fatal=False))
+
+        info.update({
+            'id': video_id,
+            'formats': [{
+                'url': video_url,
+                'width': width,
+                'height': height,
+                'http_headers': {
+                    'Referer': url,
+                },
+            }],
+            'uploader': uploader,
+        })
+
+        return info
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -421,8 +421,8 @@ def clean_html(html):

    # Newline vs <br />
    html = html.replace('\n', ' ')
-    html = re.sub(r'\s*<\s*br\s*/?\s*>\s*', '\n', html)
-    html = re.sub(r'<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
+    html = re.sub(r'(?u)\s*<\s*br\s*/?\s*>\s*', '\n', html)
+    html = re.sub(r'(?u)<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
    # Strip html tags
    html = re.sub('<.*?>', '', html)
    # Replace html entities
@@ -1194,6 +1194,11 @@ def unified_timestamp(date_str, day_first=True):
    # Remove AM/PM + timezone
    date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)

+    # Remove unrecognized timezones from ISO 8601 alike timestamps
+    m = re.search(r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$', date_str)
+    if m:
+        date_str = date_str[:-len(m.group('tz'))]
+
    for expression in date_formats(day_first):
        try:
            dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
@@ -2273,10 +2278,8 @@ def mimetype2ext(mt):
    return {
        '3gpp': '3gp',
        'smptett+xml': 'tt',
-        'srt': 'srt',
        'ttaf+xml': 'dfxp',
        'ttml+xml': 'ttml',
-        'vtt': 'vtt',
        'x-flv': 'flv',
        'x-mp4-fragmented': 'mp4',
        'x-ms-wmv': 'wmv',
@@ -2284,11 +2287,11 @@ def mimetype2ext(mt):
        'x-mpegurl': 'm3u8',
        'vnd.apple.mpegurl': 'm3u8',
        'dash+xml': 'mpd',
-        'f4m': 'f4m',
        'f4m+xml': 'f4m',
        'hds+xml': 'f4m',
        'vnd.ms-sstr+xml': 'ism',
        'quicktime': 'mov',
+        'mp2t': 'ts',
    }.get(res, res)


--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2017.04.28'
+__version__ = '2017.05.01'
Author	SHA1	Message	Date
Sergey M․	e0c1e9a98c	release 2017.05.01	2017-05-01 01:39:52 +07:00
Sergey M․	086041e2f8	[ChangeLog] Actualize	2017-05-01 01:34:51 +07:00
Sergey M․	74da856544	[infoq] Make audio format extraction non fatal (closes #12938 )	2017-05-01 01:23:05 +07:00
Sergey M․	9edf47df7b	[brightcove] Allow whitespace around attribute names in embedded code	2017-05-01 01:03:47 +07:00
Sergey M․	238cec17ae	[extractor/anvato] PEP 8	2017-04-30 22:04:21 +07:00
Sergey M․	50534b7158	[downloader/fragment] PEP 8	2017-04-30 22:04:01 +07:00
Sergey M․	9cd4209724	[zaq1] Improve extraction (closes #12693 )	2017-04-30 21:46:05 +07:00
Sergey M․	33a81c2c6f	[extractor/common] Extract view count from JSON-LD	2017-04-30 21:45:59 +07:00
Sergey M․	deef31955b	[utils] Improve unified_timestamp Seen at http://zaq1.pl/video/xev0e	2017-04-30 21:45:53 +07:00
slocum	9dac2cec2d	[zaq1] Add new extractor	2017-04-30 21:45:47 +07:00
Sergey M․	6ec371cd9e	[xvideos] Extract og:duration (closes #12828 )	2017-04-30 18:14:01 +07:00
Sander	13081db1f5	[xvideos] Add video duration	2017-04-30 18:10:49 +07:00
Sergey M․	b07ea5eaec	[vevo] Modernize	2017-04-30 17:58:22 +07:00
gritstub	5599253009	[vevo] Fix extraction (config.token.key)	2017-04-30 17:56:10 +07:00
Remita Amine	98ce1a3fd3	[utils] add video/mp2t to mimetype2ext	2017-04-30 09:03:10 +01:00
Yen Chi Hsuan	ba5c3caf88	[washingtonpost] Fix invalid escape sequence on Python 3.6	2017-04-30 02:15:28 +08:00
Sergey M․	b5c39537be	[noovo] Improve extraction (closes #12792 )	2017-04-30 00:24:25 +07:00
Frederic Bournival	1c7c76e4fb	[noovo] Add extractor	2017-04-30 00:24:19 +07:00
John Hawkinson	557194591a	[washingtonpost] Add support for embeds (closes #12699 )	2017-04-29 23:07:26 +07:00
Yen Chi Hsuan	27e70a8f6c	Merge pull request #12869 from Tithen-Firion/cbc-update-tests [cbc] update test cases	2017-04-29 21:34:18 +08:00
Sergey M․	a4c81e4968	[yandexmusic:playlist] Fix extraction for python 3 (closes #12888 )	2017-04-29 20:23:26 +07:00
Sergey M․	7986c3abcd	[anvato] Improve extraction (closes #12913 ) * Promote to regular shortcut based extractor * Add mcp to access key mapping table * Add support for embeds extraction * Add support for anvato embeds in generic extractor	2017-04-29 19:49:04 +07:00
Yen Chi Hsuan	a1ebfd4494	Merge pull request #12854 from Tithen-Firion/appletrailer-test-fix [appletrailers] update test cases	2017-04-29 19:24:38 +08:00
Yen Chi Hsuan	d19093bd50	Merge pull request #12906 from Tithen-Firion/clean-html-fix [utils] Fix inconsistent output of clean_html	2017-04-29 15:58:45 +08:00
Yen Chi Hsuan	24eb7c2578	[xtube] Fix extraction with non-standard JSON 'sources' Closes #12734 Thanks @paulguy for the fix!	2017-04-29 15:55:08 +08:00
Sergey M․	e7db6759e4	[downloader/external] Properly handle live stream downloading cancellation (closes #8932 )	2017-04-29 04:33:35 +07:00
Sergey M․	b364c87c42	[tvplayer] Fix extraction (closes #12908 )	2017-04-29 03:46:08 +07:00
Tithen-Firion	9222d94510	[test_utils] Add one more clean_html test	2017-04-28 18:05:14 +02:00
Tithen-Firion	edd9221cd2	[utils] Fix inconsistent output of clean_html `\s` in Python 2.x doesn't match unicode whitespace characters by default	2017-04-28 17:34:27 +02:00
Tithen-Firion	c95e2b5911	[cbc] update test cases	2017-04-27 18:07:07 +02:00
Tithen-Firion	76c1951036	[appletrailers] update test cases	2017-04-27 10:04:21 +02:00