Compare commits

31 Commits

Author SHA1 Message Date
Sergey M․
e0c1e9a98c release 2017.05.01 2017-05-01 01:39:52 +07:00
Sergey M․
086041e2f8 [ChangeLog] Actualize 2017-05-01 01:34:51 +07:00
Sergey M․
74da856544 [infoq] Make audio format extraction non fatal (closes #12938) 2017-05-01 01:23:05 +07:00
Sergey M․
9edf47df7b [brightcove] Allow whitespace around attribute names in embedded code 2017-05-01 01:03:47 +07:00
Sergey M․
238cec17ae [extractor/anvato] PEP 8 2017-04-30 22:04:21 +07:00
Sergey M․
50534b7158 [downloader/fragment] PEP 8 2017-04-30 22:04:01 +07:00
Sergey M․
9cd4209724 [zaq1] Improve extraction (closes #12693) 2017-04-30 21:46:05 +07:00
Sergey M․
33a81c2c6f [extractor/common] Extract view count from JSON-LD 2017-04-30 21:45:59 +07:00
Sergey M․
deef31955b [utils] Improve unified_timestamp
Seen at http://zaq1.pl/video/xev0e
2017-04-30 21:45:53 +07:00
slocum
9dac2cec2d [zaq1] Add new extractor 2017-04-30 21:45:47 +07:00
Sergey M․
6ec371cd9e [xvideos] Extract og:duration (closes #12828) 2017-04-30 18:14:01 +07:00
Sander
13081db1f5 [xvideos] Add video duration 2017-04-30 18:10:49 +07:00
Sergey M․
b07ea5eaec [vevo] Modernize 2017-04-30 17:58:22 +07:00
gritstub
5599253009 [vevo] Fix extraction (config.token.key) 2017-04-30 17:56:10 +07:00
Remita Amine
98ce1a3fd3 [utils] add video/mp2t to mimetype2ext 2017-04-30 09:03:10 +01:00
Yen Chi Hsuan
ba5c3caf88 [washingtonpost] Fix invalid escape sequence on Python 3.6 2017-04-30 02:15:28 +08:00
Sergey M․
b5c39537be [noovo] Improve extraction (closes #12792) 2017-04-30 00:24:25 +07:00
Frederic Bournival
1c7c76e4fb [noovo] Add extractor 2017-04-30 00:24:19 +07:00
John Hawkinson
557194591a [washingtonpost] Add support for embeds (closes #12699) 2017-04-29 23:07:26 +07:00
Yen Chi Hsuan
27e70a8f6c Merge pull request #12869 from Tithen-Firion/cbc-update-tests
[cbc] update test cases
2017-04-29 21:34:18 +08:00
Sergey M․
a4c81e4968 [yandexmusic:playlist] Fix extraction for python 3 (closes #12888) 2017-04-29 20:23:26 +07:00
Sergey M․
7986c3abcd [anvato] Improve extraction (closes #12913)
* Promote to regular shortcut based extractor
* Add mcp to access key mapping table
* Add support for embeds extraction
* Add support for anvato embeds in generic extractor
2017-04-29 19:49:04 +07:00
Yen Chi Hsuan
a1ebfd4494 Merge pull request #12854 from Tithen-Firion/appletrailer-test-fix
[appletrailers] update test cases
2017-04-29 19:24:38 +08:00
Yen Chi Hsuan
d19093bd50 Merge pull request #12906 from Tithen-Firion/clean-html-fix
[utils] Fix inconsistent output of clean_html
2017-04-29 15:58:45 +08:00
Yen Chi Hsuan
24eb7c2578 [xtube] Fix extraction with non-standard JSON 'sources'
Closes #12734

Thanks @paulguy for the fix!
2017-04-29 15:55:08 +08:00
Sergey M․
e7db6759e4 [downloader/external] Properly handle live stream downloading cancellation (closes #8932) 2017-04-29 04:33:35 +07:00
Sergey M․
b364c87c42 [tvplayer] Fix extraction (closes #12908) 2017-04-29 03:46:08 +07:00
Tithen-Firion
9222d94510 [test_utils] Add one more clean_html test 2017-04-28 18:05:14 +02:00
Tithen-Firion
edd9221cd2 [utils] Fix inconsistent output of clean_html
`\s` in Python 2.x doesn't match unicode whitespace characters by
default
2017-04-28 17:34:27 +02:00
Tithen-Firion
c95e2b5911 [cbc] update test cases 2017-04-27 18:07:07 +02:00
Tithen-Firion
76c1951036 [appletrailers] update test cases 2017-04-27 10:04:21 +02:00
24 changed files with 459 additions and 53 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.28*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.28**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.05.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.05.01**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.04.28
[debug] youtube-dl version 2017.05.01
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,3 +1,31 @@
version 2017.05.01
Core
+ [extractor/common] Extract view count from JSON-LD
* [utils] Improve unified_timestamp
+ [utils] Add video/mp2t to mimetype2ext
* [downloader/external] Properly handle live stream downloading cancellation
(#8932)
+ [utils] Add support for unicode whitespace in clean_html on python 2 (#12906)
Extractors
* [infoq] Make audio format extraction non fatal (#12938)
* [brightcove] Allow whitespace around attribute names in embedded code
+ [zaq1] Add support for zaq1.pl (#12693)
+ [xvideos] Extract duration (#12828)
* [vevo] Fix extraction (#12879)
+ [noovo] Add support for noovo.ca (#12792)
+ [washingtonpost] Add support for embeds (#12699)
* [yandexmusic:playlist] Fix extraction for python 3 (#12888)
* [anvato] Improve extraction (#12913)
* Promote to regular shortcut based extractor
* Add mcp to access key mapping table
* Add support for embeds extraction
* Add support for anvato embeds in generic extractor
* [xtube] Fix extraction for older FLV videos (#12734)
* [tvplayer] Fix extraction (#12908)
version 2017.04.28
Core
@@ -24,19 +52,19 @@ Core
* [YoutubeDL] Fix output template for missing timestamp (#12796)
* [socks] Handle cases where credentials are required but missing
* [extractor/common] Improve HLS extraction (#12211)
- Extract m3u8 parsing to separate method
- Improve rendition groups extraction
- Build stream name according stream GROUP-ID
- Ignore reference to AUDIO group without URI when stream has no CODECS
- Use float for scaled tbr in _parse_m3u8_formats
* Extract m3u8 parsing to separate method
* Improve rendition groups extraction
* Build stream name according stream GROUP-ID
* Ignore reference to AUDIO group without URI when stream has no CODECS
* Use float for scaled tbr in _parse_m3u8_formats
* [utils] Add support for TTML styles in dfxp2srt
* [downloader/hls] No need to download keys for fragments that have been
already downloaded
* [downloader/fragment] Improve fragment downloading
- Resume immediately
- Don't concatenate fragments and decrypt them on every resume
- Optimize disk storage usage, don't store intermediate fragments on disk
- Store bookkeeping download state file
* Resume immediately
* Don't concatenate fragments and decrypt them on every resume
* Optimize disk storage usage, don't store intermediate fragments on disk
* Store bookkeeping download state file
+ [extractor/common] Add support for multiple getters in try_get
+ [extractor/common] Add support for video of WebPage context in _json_ld
(#12778)

View File

@@ -45,6 +45,7 @@
- **anderetijden**: npo.nl and ntr.nl
- **AnimeOnDemand**
- **anitube.se**
- **Anvato**
- **AnySex**
- **Aparat**
- **AppleConnect**
@@ -529,6 +530,7 @@
- **NJPWWorld**: 新日本プロレスワールド
- **NobelPrize**
- **Noco**
- **Noovo**
- **Normalboots**
- **NosVideo**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
@@ -1013,6 +1015,7 @@
- **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **Zapiks**
- **Zaq1**
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn

View File

@@ -338,6 +338,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('UNKNOWN DATE FORMAT'), None)
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@@ -899,6 +900,7 @@ class TestUtil(unittest.TestCase):
def test_clean_html(self):
self.assertEqual(clean_html('a:\nb'), 'a: b')
self.assertEqual(clean_html('a:\n "b"'), 'a: "b"')
self.assertEqual(clean_html('a<br>\xa0b'), 'a\nb')
def test_intlist_to_bytes(self):
self.assertEqual(

View File

@@ -29,7 +29,17 @@ class ExternalFD(FileDownloader):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
retval = self._call_downloader(tmpfilename, info_dict)
try:
retval = self._call_downloader(tmpfilename, info_dict)
except KeyboardInterrupt:
if not info_dict.get('is_live'):
raise
# Live stream downloading cancellation should be considered as
# correct and expected termination thus all postprocessing
# should take place
retval = 0
self.to_screen('[%s] Interrupted by user' % self.get_basename())
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('\r[%s] Downloaded %s bytes' % (self.get_basename(), fsize))
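
The hunk above treats Ctrl-C during a live stream as a normal end of download, so that post-processing still runs. A minimal sketch of that control flow, outside of youtube-dl, might look like this. `run_download` and `interrupted_downloader` are hypothetical names used only for illustration; only the `is_live` key and the "0 means success" convention come from the diff.

```python
def run_download(call_downloader, info_dict):
    """Return a downloader-style exit code, where 0 means success."""
    try:
        return call_downloader()
    except KeyboardInterrupt:
        if not info_dict.get('is_live'):
            # Regular downloads: cancellation still propagates to the caller.
            raise
        # Live streams have no natural end, so Ctrl-C counts as a clean stop
        # and the partial file can still be post-processed.
        return 0


def interrupted_downloader():
    # Stand-in for an external program being stopped by the user.
    raise KeyboardInterrupt


live_result = run_download(interrupted_downloader, {'is_live': True})

try:
    run_download(interrupted_downloader, {'is_live': False})
    vod_reraised = False
except KeyboardInterrupt:
    vod_reraised = True
```

Here `live_result` is the success code, while the non-live call re-raises the interrupt.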

View File

@@ -49,7 +49,7 @@ class FragmentFD(FileDownloader):
index: 0-based index of current fragment among all fragments
fragment_count:
Total count of fragments
This feature is experimental and file format may change in future.
"""

View File

@@ -5,6 +5,7 @@ import base64
import hashlib
import json
import random
import re
import time
from .common import InfoExtractor
@@ -16,6 +17,7 @@ from ..utils import (
intlist_to_bytes,
int_or_none,
strip_jsonp,
unescapeHTML,
)
@@ -26,6 +28,8 @@ def md5_text(s):
class AnvatoIE(InfoExtractor):
_VALID_URL = r'anvato:(?P<access_key_or_mcp>[^:]+):(?P<id>\d+)'
# Copied from anvplayer.min.js
_ANVACK_TABLE = {
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
@@ -114,6 +118,22 @@ class AnvatoIE(InfoExtractor):
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
}
_MCP_TO_ACCESS_KEY_TABLE = {
'qa': 'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922',
'lin': 'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749',
'univison': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
'uni': 'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa',
'dev': 'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a',
'sps': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
'spsstg': 'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336',
'anv': 'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3',
'gray': 'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900',
'hearst': 'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99',
'cbs': 'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe',
'telemundo': 'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582'
}
_ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'
_AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
def __init__(self, *args, **kwargs):
@@ -217,9 +237,42 @@ class AnvatoIE(InfoExtractor):
'subtitles': subtitles,
}
@staticmethod
def _extract_urls(ie, webpage, video_id):
entries = []
for mobj in re.finditer(AnvatoIE._ANVP_RE, webpage):
anvplayer_data = ie._parse_json(
mobj.group('anvp'), video_id, transform_source=unescapeHTML,
fatal=False)
if not anvplayer_data:
continue
video = anvplayer_data.get('video')
if not isinstance(video, compat_str) or not video.isdigit():
continue
access_key = anvplayer_data.get('accessKey')
if not access_key:
mcp = anvplayer_data.get('mcp')
if mcp:
access_key = AnvatoIE._MCP_TO_ACCESS_KEY_TABLE.get(
mcp.lower())
if not access_key:
continue
entries.append(ie.url_result(
'anvato:%s:%s' % (access_key, video), ie=AnvatoIE.ie_key(),
video_id=video))
return entries
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
anvplayer_data = self._parse_json(
self._html_search_regex(
self._ANVP_RE, webpage, 'Anvato player data', group='anvp'),
video_id)
return self._get_anvato_videos(
anvplayer_data['accessKey'], anvplayer_data['video'])
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id')
if access_key not in self._ANVACK_TABLE:
access_key = self._MCP_TO_ACCESS_KEY_TABLE[access_key]
return self._get_anvato_videos(access_key, video_id)
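
The new `_ANVP_RE` accepts either quote style around `data-anvp`, and `_extract_urls` falls back from `accessKey` to an `mcp` lookup. The sketch below replays that logic with the standard library only. It assumes `html.unescape` and `json.loads` as stand-ins for youtube-dl's `unescapeHTML` and `_parse_json`, and the table entry and markup are made up for the example.

```python
import html
import json
import re

# Pattern copied from the diff: the backreference \1 requires the closing
# quote to be the same character as the opening one.
ANVP_RE = r'<script[^>]+\bdata-anvp\s*=\s*(["\'])(?P<anvp>(?:(?!\1).)+)\1'

# Hypothetical one-entry table with a placeholder key.
MCP_TO_ACCESS_KEY = {'lin': 'example_access_key_for_lin'}


def find_anvato_urls(webpage):
    urls = []
    for mobj in re.finditer(ANVP_RE, webpage):
        try:
            data = json.loads(html.unescape(mobj.group('anvp')))
        except ValueError:
            continue  # comparable to fatal=False: skip unparsable data
        video = data.get('video')
        if not isinstance(video, str) or not video.isdigit():
            continue
        access_key = data.get('accessKey')
        if not access_key and data.get('mcp'):
            access_key = MCP_TO_ACCESS_KEY.get(data['mcp'].lower())
        if not access_key:
            continue
        urls.append('anvato:%s:%s' % (access_key, video))
    return urls


page = (
    '<script data-anvp=\'{"video": "123", "accessKey": "abc"}\'></script>'
    '<script data-anvp = "{&quot;video&quot;: &quot;456&quot;, '
    '&quot;mcp&quot;: &quot;LIN&quot;}"></script>'
    '<script data-anvp=\'{"video": "not-a-number", "accessKey": "abc"}\'></script>'
)
anvato_urls = find_anvato_urls(page)
```

The third script tag is skipped because its `video` value is not numeric.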

View File

@@ -70,7 +70,8 @@ class AppleTrailersIE(InfoExtractor):
}, {
'url': 'http://trailers.apple.com/trailers/magnolia/blackthorn/',
'info_dict': {
'id': 'blackthorn',
'id': '4489',
'title': 'Blackthorn',
},
'playlist_mincount': 2,
'expected_warnings': ['Unable to download JSON metadata'],
@@ -261,7 +262,7 @@ class AppleTrailersSectionIE(InfoExtractor):
'title': 'Most Popular',
'id': 'mostpopular',
},
'playlist_mincount': 80,
'playlist_mincount': 30,
}, {
'url': 'http://trailers.apple.com/#section=moviestudios',
'info_dict': {

View File

@@ -522,7 +522,7 @@ class BrightcoveNewIE(InfoExtractor):
# [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx)
(<video\s+[^>]*data-video-id=['"]?[^>]+>)
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*?
(<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
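
To see what the changed line buys, the sketch below compares only the first group of the old and new patterns against a tag that has spaces around `=`. The two patterns are cut down from the hunk above, and the sample markup is invented for the example rather than taken from the test page.

```python
import re

# First capture group only, copied from the removed and added lines.
OLD_VIDEO_RE = r'''(?isx)(<video\s+[^>]*data-video-id=['"]?[^>]+>)'''
NEW_VIDEO_RE = r'''(?isx)(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)'''

# Invented markup with whitespace around the attribute's equals sign.
tag = '<video class="player" data-video-id = "3167554373001"></video>'

old_match = re.search(OLD_VIDEO_RE, tag)
new_match = re.search(NEW_VIDEO_RE, tag)
```

Only the new pattern matches this tag. The added `\b` also keeps the attribute name from matching as the tail of a longer name.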

View File

@@ -96,6 +96,7 @@ class CBCIE(InfoExtractor):
'info_dict': {
'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
'id': 'dog-indoor-exercise-winter-1.3928238',
'description': 'md5:c18552e41726ee95bd75210d1ca9194c',
},
'playlist_mincount': 6,
}]
@@ -165,12 +166,11 @@ class CBCPlayerIE(InfoExtractor):
'uploader': 'CBCC-NEW',
},
}, {
# available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
'url': 'http://www.cbc.ca/player/play/2164402062',
'md5': '17a61eb813539abea40618d6323a7f82',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': {
'id': '2164402062',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,

View File

@@ -990,6 +990,7 @@ class InfoExtractor(object):
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
'view_count': int_or_none(e.get('interactionCount')),
})
for e in json_ld:
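
As a rough illustration of the added `view_count` line, the sketch below reads `interactionCount` from a small JSON-LD object. `int_or_none_sketch` is a simplified stand-in written for this example, not youtube-dl's `int_or_none`, and the JSON-LD document is made up.

```python
import json


def int_or_none_sketch(value):
    """Simplified stand-in: convert to int, or return None on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Invented JSON-LD payload; sites often publish the count as a string.
json_ld = json.loads('''
{
  "@context": "http://schema.org",
  "@type": "VideoObject",
  "name": "Example clip",
  "interactionCount": "1234"
}
''')

info = {
    'title': json_ld.get('name'),
    'view_count': int_or_none_sketch(json_ld.get('interactionCount')),
}
missing = int_or_none_sketch({}.get('interactionCount'))
```

A missing or non-numeric count ends up as `None` instead of raising.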

View File

@@ -41,6 +41,7 @@ from .alphaporno import AlphaPornoIE
from .amcnetworks import AMCNetworksIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anvato import AnvatoIE
from .anysex import AnySexIE
from .aol import AolIE
from .allocine import AllocineIE
@@ -662,6 +663,7 @@ from .nintendo import NintendoIE
from .njpwworld import NJPWWorldIE
from .nobelprize import NobelPrizeIE
from .noco import NocoIE
from .noovo import NoovoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
@@ -1298,5 +1300,6 @@ from .youtube import (
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE

View File

@@ -86,6 +86,8 @@ from .openload import OpenloadIE
from .videopress import VideoPressIE
from .rutube import RutubeIE
from .limelight import LimelightBaseIE
from .anvato import AnvatoIE
from .washingtonpost import WashingtonPostIE
class GenericIE(InfoExtractor):
@@ -1427,6 +1429,22 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
{
# Brightcove embed with whitespace around attribute names
'url': 'http://www.stack.com/video/3167554373001/learn-to-hit-open-three-pointers-with-damian-lillard-s-baseline-drift-drill',
'info_dict': {
'id': '3167554373001',
'ext': 'mp4',
'title': "Learn to Hit Open Three-Pointers With Damian Lillard's Baseline Drift Drill",
'description': 'md5:57bacb0e0f29349de4972bfda3191713',
'uploader_id': '1079349493',
'upload_date': '20140207',
'timestamp': 1391810548,
},
'params': {
'skip_download': True,
},
},
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
@@ -1677,6 +1695,29 @@ class GenericIE(InfoExtractor):
},
'playlist_mincount': 5,
},
{
'url': 'http://kron4.com/2017/04/28/standoff-with-walnut-creek-murder-suspect-ends-with-arrest/',
'info_dict': {
'id': 'standoff-with-walnut-creek-murder-suspect-ends-with-arrest',
'title': 'Standoff with Walnut Creek murder suspect ends',
'description': 'md5:3ccc48a60fc9441eeccfc9c469ebf788',
},
'playlist_mincount': 4,
},
{
# WashingtonPost embed
'url': 'http://www.vanityfair.com/hollywood/2017/04/donald-trump-tv-pitches',
'info_dict': {
'id': '8caf6e88-d0ec-11e5-90d3-34c2c42653ac',
'ext': 'mp4',
'title': "No one has seen the drama series based on Trump's life \u2014 until now",
'description': 'Donald Trump wanted a weekly TV drama based on his life. It never aired. But The Washington Post recently obtained a scene from the pilot script — and enlisted actors.',
'timestamp': 1455216756,
'uploader': 'The Washington Post',
'upload_date': '20160211',
},
'add_ie': [WashingtonPostIE.ie_key()],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2537,6 +2578,12 @@ class GenericIE(InfoExtractor):
'limelight:media:%s' % mobj.group('id'),
{'source_url': url}), 'LimelightMedia', mobj.group('id'))
# Look for Anvato embeds
anvato_urls = AnvatoIE._extract_urls(self, webpage, video_id)
if anvato_urls:
return self.playlist_result(
anvato_urls, video_id, video_title, video_description)
# Look for AdobeTVVideo embeds
mobj = re.search(
r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
@@ -2654,6 +2701,12 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
rutube_urls, ie=RutubeIE.ie_key())
# Look for WashingtonPost embeds
wapo_urls = WashingtonPostIE._extract_urls(webpage)
if wapo_urls:
return self.playlist_from_matches(
wapo_urls, video_id, video_title, ie=WashingtonPostIE.ie_key())
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')

View File

@@ -87,8 +87,8 @@ class InfoQIE(BokeCCBaseIE):
def _extract_http_audio(self, webpage, video_id):
fields = self._hidden_inputs(webpage)
http_audio_url = fields['filename']
if http_audio_url is None:
http_audio_url = fields.get('filename')
if not http_audio_url:
return []
cookies_header = {'Cookie': self._extract_cookies(webpage)}
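
The difference between the removed and added lines can be shown with plain dictionaries. In this sketch `old_audio_url` and `new_audio_url` are hypothetical helpers that return a bare list of URLs, which is a simplification of what the extractor method builds.

```python
def old_audio_url(fields):
    url = fields['filename']      # raises KeyError when the input is absent
    if url is None:
        return []
    return [url]


def new_audio_url(fields):
    url = fields.get('filename')  # None when the input is absent
    if not url:                   # also covers an empty string
        return []
    return [url]


try:
    old_audio_url({})
    old_failed = False
except KeyError:
    old_failed = True

new_result = new_audio_url({})
```

With the lookup made tolerant, a page without the hidden `filename` input yields no audio formats rather than aborting extraction.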

View File

@@ -0,0 +1,97 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
smuggle_url,
try_get,
)
class NoovoIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?noovo\.ca/videos/(?P<id>[^/]+/[^/?#&]+)'
_TESTS = [{
# clip
'url': 'http://noovo.ca/videos/rpm-plus/chrysler-imperial',
'info_dict': {
'id': '5386045029001',
'ext': 'mp4',
'title': 'Chrysler Imperial',
'description': 'md5:de3c898d1eb810f3e6243e08c8b4a056',
'timestamp': 1491399228,
'upload_date': '20170405',
'uploader_id': '618566855001',
'creator': 'vtele',
'view_count': int,
'series': 'RPM+',
},
'params': {
'skip_download': True,
},
}, {
# episode
'url': 'http://noovo.ca/videos/l-amour-est-dans-le-pre/episode-13-8',
'info_dict': {
'id': '5395865725001',
'title': 'Épisode 13 : Les retrouvailles',
'description': 'md5:336d5ebc5436534e61d16e63ddfca327',
'ext': 'mp4',
'timestamp': 1492019320,
'upload_date': '20170412',
'uploader_id': '618566855001',
'creator': 'vtele',
'view_count': int,
'series': "L'amour est dans le pré",
'season_number': 5,
'episode': 'Épisode 13',
'episode_number': 13,
},
'params': {
'skip_download': True,
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/618566855001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://api.noovo.ca/api/v1/pages/single-episode/%s' % video_id,
video_id)['data']
content = try_get(data, lambda x: x['contents'][0])
brightcove_id = data.get('brightcoveId') or content['brightcoveId']
series = try_get(
data, (
lambda x: x['show']['title'],
lambda x: x['season']['show']['title']),
compat_str)
episode = None
og = data.get('og')
if isinstance(og, dict) and og.get('type') == 'video.episode':
episode = og.get('title')
video = content or data
return {
'_type': 'url_transparent',
'ie_key': BrightcoveNewIE.ie_key(),
'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': ['CA']}),
'id': brightcove_id,
'title': video.get('title'),
'creator': video.get('source'),
'view_count': int_or_none(video.get('viewsCount')),
'series': series,
'season_number': int_or_none(try_get(
data, lambda x: x['season']['seasonNumber'])),
'episode': episode,
'episode_number': int_or_none(data.get('episodeNumber')),
}
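
The extractor above passes two getters to `try_get` so that the series title is found whichever shape the API response has. The following sketch shows how such a multi-getter lookup can behave. `try_get_sketch` is a simplified reimplementation written for this example, and the sample dictionaries are assumptions about the response shape, not captured API data.

```python
def try_get_sketch(src, getters, expected_type=None):
    """Return the first getter result that succeeds and has the right type."""
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for getter in getters:
        try:
            value = getter(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


title_getters = (
    lambda x: x['show']['title'],
    lambda x: x['season']['show']['title'],
)

# Assumed response shapes for a clip and for an episode.
clip = {'show': {'title': 'RPM+'}}
episode = {'season': {'show': {'title': 'Another show'}, 'seasonNumber': 5}}

clip_series = try_get_sketch(clip, title_getters, str)
episode_series = try_get_sketch(episode, title_getters, str)
no_series = try_get_sketch({'show': None}, title_getters, str)
```

Each failed lookup is swallowed and the next getter is tried, so no chain of `if` checks is needed at the call site.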

View File

@@ -2,9 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
extract_attributes,
try_get,
urlencode_postdata,
ExtractorError,
)
@@ -34,25 +38,32 @@ class TVPlayerIE(InfoExtractor):
webpage, 'channel element'))
title = current_channel['data-name']
resource_id = self._search_regex(
r'resourceId\s*=\s*"(\d+)"', webpage, 'resource id')
platform = self._search_regex(
r'platform\s*=\s*"([^"]+)"', webpage, 'platform')
resource_id = current_channel['data-id']
token = self._search_regex(
r'token\s*=\s*"([^"]+)"', webpage, 'token', default='null')
validate = self._search_regex(
r'validate\s*=\s*"([^"]+)"', webpage, 'validate', default='null')
r'data-token=(["\'])(?P<token>(?!\1).+)\1', webpage,
'token', group='token')
context = self._download_json(
'https://tvplayer.com/watch/context', display_id,
'Downloading JSON context', query={
'resource': resource_id,
'nonce': token,
})
validate = context['validate']
platform = try_get(
context, lambda x: x['platform']['key'], compat_str) or 'firefox'
try:
response = self._download_json(
'http://api.tvplayer.com/api/v2/stream/live',
resource_id, headers={
display_id, 'Downloading JSON stream', headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
}, data=urlencode_postdata({
'id': resource_id,
'service': 1,
'platform': platform,
'id': resource_id,
'token': token,
'validate': validate,
}))['tvplayer']['response']
except ExtractorError as e:
@@ -63,7 +74,7 @@ class TVPlayerIE(InfoExtractor):
'%s said: %s' % (self.IE_NAME, response['error']), expected=True)
raise
formats = self._extract_m3u8_formats(response['stream'], resource_id, 'mp4')
formats = self._extract_m3u8_formats(response['stream'], display_id, 'mp4')
self._sort_formats(formats)
return {

View File

@@ -1,6 +1,7 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import (
@@ -11,7 +12,6 @@ from ..compat import (
from ..utils import (
ExtractorError,
int_or_none,
sanitized_Request,
parse_iso8601,
)
@@ -154,19 +154,24 @@ class VevoIE(VevoBaseIE):
}
def _initialize_api(self, video_id):
req = sanitized_Request(
'http://www.vevo.com/auth', data=b'')
webpage = self._download_webpage(
req, None,
'https://accounts.vevo.com/token', None,
note='Retrieving oauth token',
errnote='Unable to retrieve oauth token')
errnote='Unable to retrieve oauth token',
data=json.dumps({
'client_id': 'SPupX1tvqFEopQ1YS6SS',
'grant_type': 'urn:vevo:params:oauth:grant-type:anonymous',
}).encode('utf-8'),
headers={
'Content-Type': 'application/json',
})
if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
self.raise_geo_restricted(
'%s said: This page is currently unavailable in your region' % self.IE_NAME)
auth_info = self._parse_json(webpage, video_id)
self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['legacy_token']
def _call_api(self, path, *args, **kwargs):
try:

View File

@@ -13,6 +13,7 @@ from ..utils import (
class WashingtonPostIE(InfoExtractor):
IE_NAME = 'washingtonpost'
_VALID_URL = r'(?:washingtonpost:|https?://(?:www\.)?washingtonpost\.com/video/(?:[^/]+/)*)(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_EMBED_URL = r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
_TEST = {
'url': 'https://www.washingtonpost.com/video/c/video/480ba4ee-1ec7-11e6-82c2-a7dcb313287d',
'md5': '6f537e1334b714eb15f9563bd4b9cdfa',
@@ -27,6 +28,11 @@ class WashingtonPostIE(InfoExtractor):
},
}
@classmethod
def _extract_urls(cls, webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\'](%s)' % cls._EMBED_URL, webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
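
A quick way to check the new `_EMBED_URL` pattern is to run the same `re.findall` call on a small page. The sketch below copies the regular expression from the diff. The surrounding HTML is invented, and the video id is borrowed from the generic extractor test case elsewhere in this diff.

```python
import re

# Copied from the added _EMBED_URL line.
EMBED_URL = (
    r'https?://(?:www\.)?washingtonpost\.com/video/c/embed/'
    r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}')


def extract_embed_urls(webpage):
    # One capture group, so findall returns a flat list of URL strings.
    return re.findall(r'<iframe[^>]+\bsrc=["\'](%s)' % EMBED_URL, webpage)


page = (
    '<p>intro</p>'
    '<iframe width="480" src="https://www.washingtonpost.com/video/c/embed/'
    '8caf6e88-d0ec-11e5-90d3-34c2c42653ac" frameborder="0"></iframe>'
    '<iframe src="https://example.com/other"></iframe>'
)
embed_urls = extract_embed_urls(page)
```

Unrelated iframes are ignored because the host and path are part of the pattern.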

View File

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
orderedSet,
parse_duration,
sanitized_Request,
@@ -37,6 +38,22 @@ class XTubeIE(InfoExtractor):
'comment_count': int,
'age_limit': 18,
}
}, {
# FLV videos with duplicated formats
'url': 'http://www.xtube.com/video-watch/A-Super-Run-Part-1-YT-9299752',
'md5': 'a406963eb349dd43692ec54631efd88b',
'info_dict': {
'id': '9299752',
'display_id': 'A-Super-Run-Part-1-YT',
'ext': 'flv',
'title': 'A Super Run - Part 1 (YT)',
'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
'uploader': 'tshirtguy59',
'duration': 579,
'view_count': int,
'comment_count': int,
'age_limit': 18,
},
}, {
# new URL schema
'url': 'http://www.xtube.com/video-watch/strange-erotica-625837',
@@ -68,8 +85,9 @@ class XTubeIE(InfoExtractor):
})
sources = self._parse_json(self._search_regex(
r'(["\'])sources\1\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id)
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
webpage, 'sources', group='sources'), video_id,
transform_source=js_to_json)
formats = []
for format_id, format_url in sources.items():
@@ -78,6 +96,7 @@ class XTubeIE(InfoExtractor):
'format_id': format_id,
'height': int_or_none(format_id),
})
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
title = self._search_regex(

View File

@@ -6,8 +6,10 @@ from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
clean_html,
ExtractorError,
determine_ext,
ExtractorError,
int_or_none,
parse_duration,
)
@@ -20,6 +22,7 @@ class XVideosIE(InfoExtractor):
'id': '4588838',
'ext': 'mp4',
'title': 'Biker Takes his Girl',
'duration': 108,
'age_limit': 18,
}
}
@@ -36,6 +39,11 @@ class XVideosIE(InfoExtractor):
r'<title>(.*?)\s+-\s+XVID', webpage, 'title')
video_thumbnail = self._search_regex(
r'url_bigthumb=(.+?)&amp', webpage, 'thumbnail', fatal=False)
video_duration = int_or_none(self._og_search_property(
'duration', webpage, default=None)) or parse_duration(
self._search_regex(
r'<span[^>]+class=["\']duration["\'][^>]*>.*?(\d[^<]+)',
webpage, 'duration', fatal=False))
formats = []
@@ -67,6 +75,7 @@ class XVideosIE(InfoExtractor):
'id': video_id,
'formats': formats,
'title': video_title,
'duration': video_duration,
'thumbnail': video_thumbnail,
'age_limit': 18,
}

View File

@@ -234,7 +234,8 @@ class YandexMusicPlaylistIE(YandexMusicPlaylistBaseIE):
'overembed': 'false',
})['playlist']
tracks, track_ids = playlist['tracks'], map(compat_str, playlist['trackIds'])
tracks = playlist['tracks']
track_ids = [compat_str(track_id) for track_id in playlist['trackIds']]
# tracks dictionary shipped with playlist.jsx API is limited to 150 tracks,
# missing tracks should be retrieved manually.
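
The reason this one-line change matters on Python 3 is that `map` returns a lazy iterator there instead of a list. The sketch below shows the two symptoms this can cause. The sample playlist is made up, and `str` stands in for `compat_str`.

```python
playlist = {'trackIds': [101, 102, 103]}

# Python 3: map() yields a one-shot iterator with no len().
lazy_ids = map(str, playlist['trackIds'])
try:
    len(lazy_ids)
    map_has_len = True
except TypeError:
    map_has_len = False

first_pass = list(lazy_ids)
second_pass = list(lazy_ids)  # already exhausted

# The replacement from the diff: a real list, reusable and sized.
track_ids = [str(track_id) for track_id in playlist['trackIds']]
```

Any later code that measures, indexes, or re-iterates the ids works with the list form on both Python 2 and 3.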

View File

@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_timestamp,
)
class Zaq1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?zaq1\.pl/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://zaq1.pl/video/xev0e',
'md5': '24a5eb3f052e604ae597c4d0d19b351e',
'info_dict': {
'id': 'xev0e',
'title': 'DJ NA WESELE. TANIEC Z FIGURAMI.węgrów/sokołów podlaski/siedlce/mińsk mazowiecki/warszawa',
'description': 'www.facebook.com/weseledjKontakt: 728 448 199 / 505 419 147',
'ext': 'mp4',
'duration': 511,
'timestamp': 1490896361,
'uploader': 'Anonim',
'upload_date': '20170330',
'view_count': int,
}
}, {
# malformed JSON-LD
'url': 'http://zaq1.pl/video/x81vn',
'info_dict': {
'id': 'x81vn',
'title': 'SEKRETNE ŻYCIE WALTERA MITTY',
'ext': 'mp4',
'duration': 6234,
'timestamp': 1493494860,
'uploader': 'Anonim',
'upload_date': '20170429',
'view_count': int,
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Failed to parse JSON'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'data-video-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'video url', group='url')
info = self._search_json_ld(webpage, video_id, fatal=False)
def extract_data(field, name, fatal=False):
return self._search_regex(
r'data-%s=(["\'])(?P<field>(?:(?!\1).)+)\1' % field,
webpage, field, fatal=fatal, group='field')
if not info.get('title'):
info['title'] = extract_data('file-name', 'title', fatal=True)
if not info.get('duration'):
info['duration'] = int_or_none(extract_data('duration', 'duration'))
if not info.get('thumbnail'):
info['thumbnail'] = extract_data('photo-url', 'thumbnail')
if not info.get('timestamp'):
info['timestamp'] = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp'))
if not info.get('interactionCount'):
info['view_count'] = int_or_none(self._html_search_meta(
'interactionCount', webpage, 'view count'))
uploader = self._html_search_regex(
r'Wideo dodał:\s*<a[^>]*>([^<]+)</a>', webpage, 'uploader',
fatal=False)
width = int_or_none(self._html_search_meta(
'width', webpage, fatal=False))
height = int_or_none(self._html_search_meta(
'height', webpage, fatal=False))
info.update({
'id': video_id,
'formats': [{
'url': video_url,
'width': width,
'height': height,
'http_headers': {
'Referer': url,
},
}],
'uploader': uploader,
})
return info

View File

@@ -421,8 +421,8 @@ def clean_html(html):
# Newline vs <br />
html = html.replace('\n', ' ')
html = re.sub(r'\s*<\s*br\s*/?\s*>\s*', '\n', html)
html = re.sub(r'<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
html = re.sub(r'(?u)\s*<\s*br\s*/?\s*>\s*', '\n', html)
html = re.sub(r'(?u)<\s*/\s*p\s*>\s*<\s*p[^>]*>', '\n', html)
# Strip html tags
html = re.sub('<.*?>', '', html)
# Replace html entities
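
The `(?u)` flag matters on Python 2, where `\s` is ASCII-only by default. On Python 3 the Unicode behaviour is already the default for `str` patterns, so this sketch uses `re.ASCII` to approximate the old Python 2 result. `replace_br` is a hypothetical helper covering only the `<br>` substitution, not the whole of `clean_html`.

```python
import re

BR_RE = r'\s*<\s*br\s*/?\s*>\s*'


def replace_br(text, flags=0):
    return re.sub(BR_RE, '\n', text, flags=flags)


sample = 'a<br>\xa0b'  # no-break space right after the tag

unicode_aware = replace_br(sample)          # Python 3 default for str
ascii_only = replace_br(sample, re.ASCII)   # approximates Python 2 default
```

With Unicode-aware matching the no-break space is consumed together with the tag. In ASCII-only mode it survives, which is the inconsistency the commit message describes.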
@@ -1194,6 +1194,11 @@ def unified_timestamp(date_str, day_first=True):
# Remove AM/PM + timezone
date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
# Remove unrecognized timezones from ISO 8601 alike timestamps
m = re.search(r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$', date_str)
if m:
date_str = date_str[:-len(m.group('tz'))]
for expression in date_formats(day_first):
try:
dt = datetime.datetime.strptime(date_str, expression) - timezone + datetime.timedelta(hours=pm_delta)
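
The added lines drop a trailing, unrecognized timezone marker before the date formats are tried. The sketch below isolates that step using the regular expression from the hunk above. `strip_unknown_tz` and `to_timestamp` are hypothetical names, and the single `strptime` format is chosen for the sketch, whereas the real function tries a list of formats.

```python
import calendar
import datetime
import re

# Copied from the added lines: letters directly after the time, at the end.
TZ_SUFFIX_RE = r'\d{1,2}:\d{1,2}(?:\.\d+)?(?P<tz>\s*[A-Z]+)$'


def strip_unknown_tz(date_str):
    m = re.search(TZ_SUFFIX_RE, date_str)
    if m:
        date_str = date_str[:-len(m.group('tz'))]
    return date_str


def to_timestamp(date_str):
    cleaned = strip_unknown_tz(date_str)
    dt = datetime.datetime.strptime(cleaned, '%Y-%m-%dT%H:%M:%S')
    return calendar.timegm(dt.timetuple())  # interpret as UTC


cleaned = strip_unknown_tz('2017-03-30T17:52:41Q')
timestamp = to_timestamp('2017-03-30T17:52:41Q')
```

The input and the resulting value are the ones used by the new `unified_timestamp` test in this diff.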
@@ -2273,10 +2278,8 @@ def mimetype2ext(mt):
return {
'3gpp': '3gp',
'smptett+xml': 'tt',
'srt': 'srt',
'ttaf+xml': 'dfxp',
'ttml+xml': 'ttml',
'vtt': 'vtt',
'x-flv': 'flv',
'x-mp4-fragmented': 'mp4',
'x-ms-wmv': 'wmv',
@@ -2284,11 +2287,11 @@ def mimetype2ext(mt):
'x-mpegurl': 'm3u8',
'vnd.apple.mpegurl': 'm3u8',
'dash+xml': 'mpd',
'f4m': 'f4m',
'f4m+xml': 'f4m',
'hds+xml': 'f4m',
'vnd.ms-sstr+xml': 'ism',
'quicktime': 'mov',
'mp2t': 'ts',
}.get(res, res)

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.04.28'
__version__ = '2017.05.01'