Compare commits

..

53 Commits

Author SHA1 Message Date
Sergey M․
3a0ceb32e2 release 2018.03.10 2018-03-10 04:45:57 +07:00
Sergey M․
7dee417127 [ChangeLog] Actualize
[ci skip]
2018-03-10 04:44:46 +07:00
Sergey M․
5b1d158834 [raywenderlich] Extract videos in order 2018-03-10 04:31:51 +07:00
Eitan Postavsky
a7298f3e99 [pornhub] Don't override session cookies (closes #15697) 2018-03-09 23:57:32 +07:00
Sergey M․
5d49d879cc [raywenderlich] Add extractor (#15251) 2018-03-09 23:27:44 +07:00
Sergey M․
b5434b5c31 [nexx] Fix typo 2018-03-08 03:25:04 +07:00
Sergey M․
690404a6f8 [funk] Fix extraction and rework extractors (closes #15792) 2018-03-08 03:17:46 +07:00
Sergey M․
d91dd0ce19 [nexx] Restore reverse engineered approach 2018-03-08 03:16:21 +07:00
kayb94
6202f08e1b [heise] Add support for kaltura embeds (closes #14961) 2018-03-06 23:10:01 +07:00
Sergey M․
574e9db2b0 [tvnow] Extract series metadata (closes #15774) 2018-03-06 23:06:00 +07:00
Toni Viemerö
2e25f80d5d [ruutu] Continue formats extraction on NOT-USED URLs 2018-03-06 02:01:04 +07:00
Sergey M․
64f34528df [vrtnu] Use redirect URL for building video JSON URL (closes #15767, closes #15769) 2018-03-05 22:57:19 +07:00
Sergey M․
26ad6bcdfc [vimeo] Modernize login code and improve error messaging 2018-03-05 22:45:47 +07:00
Sergey M․
81dc74966a [archiveorg] Fix extraction (closes #15770, closes #15772) 2018-03-05 22:30:32 +07:00
Sergey M․
d53b6764d0 [hidive] Remove proxy from params 2018-03-04 23:23:30 +07:00
Sergey M․
62f49dd3b9 [hidive] Add extractor (closes #15494) 2018-03-04 17:46:36 +07:00
Sergey M․
f9f10268c1 [afreecatv] Detect deleted videos 2018-03-04 03:13:45 +07:00
Sergey M․
f241a97312 [afreecatv] Fix extraction (closes #15755) 2018-03-04 03:01:58 +07:00
Sergey M․
86c8cfc555 [vice] Fix extraction and rework extractors (closes #11101, closes #13019, closes #13622, closes #13778) 2018-03-03 23:08:43 +07:00
Sergey M․
c01db237b5 [vidzi] Add support for vidzi.si (closes #15751) 2018-03-03 20:16:55 +07:00
Sergey M․
0093c77032 [downloader/hls] Skip uplynk ad fragments (closes #15748) 2018-03-03 20:00:25 +07:00
Sergey M․
5616caf852 [npo] Fix typo 2018-03-03 01:47:09 +07:00
Sergey M․
05a7ffb126 release 2018.03.03 2018-03-03 01:37:01 +07:00
Sergey M․
28f21c9501 [ChangeLog] Actualize
[ci skip]
2018-03-03 01:32:21 +07:00
Sergey M․
4c780fbd0a [yapfiles] Add extractor (closes #15726, refs #11085) 2018-03-03 01:24:36 +07:00
Sergey M․
7773a92800 [spankbang] Fix formats extraction (closes #15727) 2018-03-02 23:39:20 +07:00
Sergey M․
b871d7e954 [utils] Add parse_resolution 2018-03-02 23:39:04 +07:00
Remita Amine
44dc11db61 [adn] fix format extraction(#15716) 2018-02-28 19:41:30 +01:00
Sergey M․
949faa15e8 [toggle] Extract DASH and ISM formats (closes #15721) 2018-02-28 22:55:09 +07:00
Sergey M․
0c3e5f4921 Revert "Respect --prefer-insecure while updating (closes #15497)"
This reverts commit 7d2b4aa047.
2018-02-27 22:30:08 +07:00
Sergey M․
266fbd6b73 [nickelodeon] Add support for nickelodeon.com.tr (closes #15706) 2018-02-26 22:10:44 +07:00
Sergey M․
d1b6187012 [npo] Validate and filter format URLs (closes #15709) 2018-02-26 21:50:51 +07:00
Sergey M․
6ab35f5e16 release 2018.02.26 2018-02-26 04:23:38 +07:00
Sergey M․
32ae31847f [ChangeLog] Actualize 2018-02-26 04:19:04 +07:00
Sergey M․
abe8766c35 [udemy] Use custom User-Agent (closes #15571) 2018-02-26 04:12:53 +07:00
Sergey M․
eaa3172672 release 2018.02.25 2018-02-25 20:38:10 +07:00
Sergey M․
797c9284d6 [ChangeLog] Actualize 2018-02-25 20:35:52 +07:00
Sergey M․
8c73ef37b6 [vidlii] Add extractor (closes #14472, closes #14512, closes #14779) 2018-02-25 20:28:40 +07:00
Andrew Udvare
b5cbe3d652 [postprocessor/embedthumbnail] Skip embedding when there aren't any thumbnails 2018-02-25 19:33:13 +07:00
Sergey M․
ece12e6348 [streamango] Skip dead test 2018-02-25 18:36:25 +07:00
Sergey M․
ff274e3c16 [streamango] Capture and output error messages 2018-02-25 18:34:52 +07:00
Sergey M․
c106237d56 [streamango] Fix formats extraction, improve and simplify (closes #14256) 2018-02-25 18:27:23 +07:00
gfabiano
6e72ea4775 [streamango] Fix extraction (closes #14160) 2018-02-25 18:26:48 +07:00
Sergey M․
d6a0350253 [ard] Remove dead tests 2018-02-25 17:41:12 +07:00
Wandang
ad29ef043e [ard] Add alive tests 2018-02-25 17:38:07 +07:00
Sergey M․
f01df14c4f [telequebec:emission] Extend _VALID_URL 2018-02-25 17:05:39 +07:00
Sergey M․
9306b0c8d9 [telequebec] Add support for emissions and refactor (closes #14649, closes #14655) 2018-02-25 16:54:12 +07:00
Sergey M․
f4b7427279 [extractor/common] Improve jwplayer subtitles extraction (closes #15695) 2018-02-25 00:59:29 +07:00
Sergey M․
300148b48a [telequebec:live] Add extractor (closes #15688) 2018-02-24 06:17:29 +07:00
Wandang
2d17c63140 [abcnews] Update tests 2018-02-24 05:17:21 +07:00
Sergey M․
f2908d072e [mailru:music] Add extractor (closes #15618) 2018-02-24 04:52:55 +07:00
Remita Amine
5e7841932c [aenetworks] switch to akamai hls formats(closes #15612) 2018-02-23 08:23:55 +01:00
Sergey M․
870f3bfc63 [ytsearch] Fix flat title extraction (closes #11260, closes #15681) 2018-02-23 03:43:42 +07:00
45 changed files with 1477 additions and 332 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.02.22*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.02.22**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.03.10*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.03.10**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.02.22
[debug] youtube-dl version 2018.03.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -1,3 +1,66 @@
version 2018.03.10
Core
* [downloader/hls] Skip uplynk ad fragments (#15748)
Extractors
* [pornhub] Don't override session cookies (#15697)
+ [raywenderlich] Add support for videos.raywenderlich.com (#15251)
* [funk] Fix extraction and rework extractors (#15792)
* [nexx] Restore reverse engineered approach
+ [heise] Add support for kaltura embeds (#14961, #15728)
+ [tvnow] Extract series metadata (#15774)
* [ruutu] Continue formats extraction on NOT-USED URLs (#15775)
* [vrtnu] Use redirect URL for building video JSON URL (#15767, #15769)
* [vimeo] Modernize login code and improve error messaging
* [archiveorg] Fix extraction (#15770, #15772)
+ [hidive] Add support for hidive.com (#15494)
* [afreecatv] Detect deleted videos
* [afreecatv] Fix extraction (#15755)
* [vice] Fix extraction and rework extractors (#11101, #13019, #13622, #13778)
+ [vidzi] Add support for vidzi.si (#15751)
* [npo] Fix typo
version 2018.03.03
Core
+ [utils] Add parse_resolution
Revert respect --prefer-insecure while updating
Extractors
+ [yapfiles] Add support for yapfiles.ru (#15726, #11085)
* [spankbang] Fix formats extraction (#15727)
* [adn] Fix extraction (#15716)
+ [toggle] Extract DASH and ISM formats (#15721)
+ [nickelodeon] Add support for nickelodeon.com.tr (#15706)
* [npo] Validate and filter format URLs (#15709)
version 2018.02.26
Extractors
* [udemy] Use custom User-Agent (#15571)
version 2018.02.25
Core
* [postprocessor/embedthumbnail] Skip embedding when there aren't any
thumbnails (#12573)
* [extractor/common] Improve jwplayer subtitles extraction (#15695)
Extractors
+ [vidlii] Add support for vidlii.com (#14472, #14512, #14779)
+ [streamango] Capture and output error messages
* [streamango] Fix extraction (#14160, #14256)
+ [telequebec] Add support for emissions (#14649, #14655)
+ [telequebec:live] Add support for live streams (#15688)
+ [mailru:music] Add support for mail.ru/music (#15618)
* [aenetworks] Switch to akamai HLS formats (#15612)
* [ytsearch] Fix flat title extraction (#11260, #15681)
version 2018.02.22
Core

View File

@@ -310,7 +310,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--encoding ENCODING Force the specified encoding (experimental)
--no-check-certificate Suppress HTTPS certificate validation
--prefer-insecure Use an unencrypted connection to retrieve
information whenever possible
information about the video. (Currently
supported only for YouTube)
--user-agent UA Specify a custom user agent
--referer URL Specify a custom referer, use if the video
access is restricted to one domain

View File

@@ -298,7 +298,8 @@
- **freespeech.org**
- **FreshLive**
- **Funimation**
- **Funk**
- **FunkChannel**
- **FunkMix**
- **FunnyOrDie**
- **Fusion**
- **Fux**
@@ -336,6 +337,7 @@
- **HentaiStigma**
- **hetklokhuis**
- **hgtv.com:show**
- **HiDive**
- **HistoricFilms**
- **history:topic**: History.com Topic
- **hitbox**
@@ -440,6 +442,8 @@
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
@@ -672,6 +676,7 @@
- **RaiPlay**
- **RaiPlayLive**
- **RaiPlayPlaylist**
- **RayWenderlich**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBullTV**
@@ -820,6 +825,8 @@
- **Telegraaf**
- **TeleMB**
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleTask**
- **Telewebion**
- **TF1**
@@ -930,7 +937,6 @@
- **vice**
- **vice:article**
- **vice:show**
- **Viceland**
- **Vidbit**
- **Viddler**
- **Videa**
@@ -946,6 +952,7 @@
- **VideoPress**
- **videoweed**: VideoWeed
- **Vidio**
- **VidLii**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
@@ -1050,6 +1057,7 @@
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек
- **YapFiles**
- **YesJapan**
- **yinyuetai:video**: 音悦Tai
- **Ynet**

View File

@@ -53,6 +53,7 @@ from youtube_dl.utils import (
parse_filesize,
parse_count,
parse_iso8601,
parse_resolution,
pkcs1pad,
read_batch_urls,
sanitize_filename,
@@ -982,6 +983,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_count('1.1kk '), 1100000)
self.assertEqual(parse_count('1.1kk views'), 1100000)
def test_parse_resolution(self):
self.assertEqual(parse_resolution(None), {})
self.assertEqual(parse_resolution(''), {})
self.assertEqual(parse_resolution('1920x1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('1920×1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('1920 x 1080'), {'width': 1920, 'height': 1080})
self.assertEqual(parse_resolution('720p'), {'height': 720})
self.assertEqual(parse_resolution('4k'), {'height': 2160})
self.assertEqual(parse_resolution('8K'), {'height': 4320})
def test_version_tuple(self):
self.assertEqual(version_tuple('1'), (1,))
self.assertEqual(version_tuple('10.23.344'), (10, 23, 344))
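
The tests above pin down the behavior of the new parse_resolution helper. A minimal sketch that satisfies exactly these cases follows; it is an assumption-based reconstruction, not necessarily the exact implementation that landed in youtube_dl/utils.py.

import re


def parse_resolution(s):
    # Assumed sketch: handle WxH pairs, NNNp labels, and 4k/8K shorthands,
    # matching the assertions in test_parse_resolution above.
    if s is None:
        return {}

    mobj = re.search(r'\b(?P<w>\d+)\s*[xX×]\s*(?P<h>\d+)\b', s)
    if mobj:
        return {
            'width': int(mobj.group('w')),
            'height': int(mobj.group('h')),
        }

    mobj = re.search(r'\b(\d+)[pP]\b', s)
    if mobj:
        return {'height': int(mobj.group(1))}

    mobj = re.search(r'\b([48])[kK]\b', s)
    if mobj:
        return {'height': int(mobj.group(1)) * 540}

    return {}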

View File

@@ -438,7 +438,7 @@ def _real_main(argv=None):
with YoutubeDL(ydl_opts) as ydl:
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, ydl._opener, opts.prefer_insecure)
update_self(ydl.to_screen, opts.verbose, ydl._opener)
# Remove cache dir
if opts.rm_cachedir:

View File

@@ -75,8 +75,9 @@ class HlsFD(FragmentFD):
fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
def anvato_ad(s):
return s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s
def is_ad_fragment(s):
return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))
media_frags = 0
ad_frags = 0
@@ -86,7 +87,7 @@ class HlsFD(FragmentFD):
if not line:
continue
if line.startswith('#'):
if anvato_ad(line):
if is_ad_fragment(line):
ad_frags += 1
ad_frag_next = True
continue
@@ -195,7 +196,7 @@ class HlsFD(FragmentFD):
'start': sub_range_start,
'end': sub_range_start + int(splitted_byte_range[0]),
}
elif anvato_ad(line):
elif is_ad_fragment(line):
ad_frag_next = True
self._finish_frag_download(ctx)
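
As a quick sanity check of the renamed predicate, here is how it classifies a few hypothetical manifest comment lines; the marker payloads below are made up for illustration and only the predicate itself mirrors the code above.

def is_ad_fragment(s):
    # Same check as in the diff: Anvato markers tagged type=ad,
    # Uplynk segment markers ending in ',ad'.
    return (s.startswith('#ANVATO-SEGMENT-INFO') and 'type=ad' in s or
            s.startswith('#UPLYNK-SEGMENT') and s.endswith(',ad'))

# Hypothetical manifest lines, for illustration only.
assert is_ad_fragment('#ANVATO-SEGMENT-INFO: type=ad,duration=30')
assert is_ad_fragment('#UPLYNK-SEGMENT: 0123456789abcdef,ad')
assert not is_ad_fragment('#UPLYNK-SEGMENT: 0123456789abcdef,segment')
assert not is_ad_fragment('#EXTINF:6.000,')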

View File

@@ -66,7 +66,7 @@ class AbcNewsIE(InfoExtractor):
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
'info_dict': {
'id': '10498713',
'id': '10505354',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
@@ -79,7 +79,7 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '39125818',
'id': '38897857',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',

View File

@@ -51,7 +51,7 @@ class ADNIE(InfoExtractor):
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
bytes_to_intlist(b'\xc8\x6e\x06\xbc\xbe\xc6\x49\xf5\x88\x0d\xc8\x47\xc4\x27\x0c\x60'),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
@@ -107,15 +107,18 @@ class ADNIE(InfoExtractor):
options = player_config.get('options') or {}
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
sub_path = player_config.get('subtitles')
error = None
if not links:
links_url = player_config['linksurl']
links_url = player_config.get('linksurl') or options['videoUrl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles')
error = links_data.get('error')
title = metas.get('title') or video_info['title']
formats = []
for format_id, qualities in links.items():
@@ -146,7 +149,7 @@ class ADNIE(InfoExtractor):
'description': strip_or_none(metas.get('summary') or video_info.get('resume')),
'thumbnail': video_info.get('image'),
'formats': formats,
'subtitles': self.extract_subtitles(player_config.get('subtitles'), video_id),
'subtitles': self.extract_subtitles(sub_path, video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'),
'series': video_info.get('playlistTitle'),
}

View File

@@ -122,7 +122,8 @@ class AENetworksIE(AENetworksBaseIE):
query = {
'mbr': 'true',
'assetTypes': 'high_video_s3'
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak',
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(

View File

@@ -177,6 +177,10 @@ class AfreecaTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
if re.search(r'alert\(["\']This video has been deleted', webpage):
raise ExtractorError(
'Video %s has been deleted' % video_id, expected=True)
station_id = self._search_regex(
r'nStationNo\s*=\s*(\d+)', webpage, 'station')
bbs_id = self._search_regex(
@@ -200,10 +204,10 @@ class AfreecaTVIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, flag), expected=True)
video_element = video_xml.findall(compat_xpath('./track/video'))[1]
video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
if video_element is None or video_element.text is None:
raise ExtractorError('Specified AfreecaTV video does not exist',
expected=True)
raise ExtractorError(
'Video %s video does not exist' % video_id, expected=True)
video_url = video_element.text.strip()

View File

@@ -41,7 +41,7 @@ class ArchiveOrgIE(InfoExtractor):
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\);",
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)

View File

@@ -24,57 +24,30 @@ class ARDMediathekIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
# available till 26.07.2022
'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
'info_dict': {
'id': '29582122',
'id': '44726822',
'ext': 'mp4',
'title': 'Ich liebe das Leben trotzdem',
'description': 'md5:45e4c225c72b27993314b31a84a5261c',
'duration': 4557,
'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
'duration': 1740,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
'info_dict': {
'id': '29522730',
'ext': 'mp4',
'title': 'Tatort: Scheinwelten - Hörfassung (Video tgl. ab 20 Uhr)',
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
'skip': 'HTTP Error 404: Not Found',
}
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
'md5': '219d94d8980b4f538c7fcb0865eb7f2c',
'info_dict': {
'id': '28488308',
'ext': 'mp3',
'title': 'Tod eines Fußballers',
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
'skip': 'HTTP Error 404: Not Found',
'only_matching': True,
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
'only_matching': True,
}]
def _extract_media_info(self, media_info_url, webpage, video_id):
@@ -252,20 +225,23 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
'info_dict': {
'display_id': 'die-story-im-ersten-mission-unter-falscher-flagge',
'id': '100',
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
'id': '102',
'ext': 'mp4',
'duration': 2600,
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'duration': 4435.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'HTTP Error 404: Not Found',
}
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)

View File

@@ -246,7 +246,7 @@ class VrtNUIE(GigyaBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
webpage, urlh = self._download_webpage_handle(url, display_id)
title = self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
@@ -276,7 +276,7 @@ class VrtNUIE(GigyaBaseIE):
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = url.split('?')[0].split('#')[0].strip('/')
clean_url = urlh.geturl().split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:

View File

@@ -2353,7 +2353,10 @@ class InfoExtractor(object):
for track in tracks:
if not isinstance(track, dict):
continue
if track.get('kind') != 'captions':
track_kind = track.get('kind')
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:

View File

@@ -385,7 +385,10 @@ from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE
from .funimation import FunimationIE
from .funk import FunkIE
from .funk import (
FunkMixIE,
FunkChannelIE,
)
from .funnyordie import FunnyOrDieIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
@@ -429,6 +432,7 @@ from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .hgtv import HGTVComShowIE
from .hidive import HiDiveIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hitrecord import HitRecordIE
@@ -566,7 +570,11 @@ from .lynda import (
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
@@ -867,6 +875,7 @@ from .rai import (
RaiPlayPlaylistIE,
RaiIE,
)
from .raywenderlich import RayWenderlichIE
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import RedBullTVIE
@@ -1045,7 +1054,11 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import TeleQuebecIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
@@ -1202,7 +1215,6 @@ from .vice import (
ViceArticleIE,
ViceShowIE,
)
from .viceland import VicelandIE
from .vidbit import VidbitIE
from .viddler import ViddlerIE
from .videa import VideaIE
@@ -1217,6 +1229,7 @@ from .videomore import (
from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE
from .vidio import VidioIE
from .vidlii import VidLiiIE
from .vidme import (
VidmeIE,
VidmeUserIE,
@@ -1360,6 +1373,7 @@ from .yandexmusic import (
YandexMusicPlaylistIE,
)
from .yandexdisk import YandexDiskIE
from .yapfiles import YapFilesIE
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE

View File

@@ -1,43 +1,102 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import extract_attributes
from ..utils import int_or_none
class FunkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funk\.net/(?:mix|channel)/(?:[^/]+/)*(?P<id>[^?/#]+)'
class FunkBaseIE(InfoExtractor):
def _make_url_result(self, video):
return {
'_type': 'url_transparent',
'url': 'nexx:741:%s' % video['sourceId'],
'ie_key': NexxIE.ie_key(),
'id': video['sourceId'],
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'season_number': int_or_none(video.get('seasonNr')),
'episode_number': int_or_none(video.get('episodeNr')),
}
class FunkMixIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/0/59d517e741dca10001252574/',
'md5': '4d40974481fa3475f8bccfd20c5361f8',
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
'md5': '8edf617c2f2b7c9847dfda313f199009',
'info_dict': {
'id': '716599',
'id': '123748',
'ext': 'mp4',
'title': 'Neue Rechte Welle',
'description': 'md5:a30a53f740ffb6bfd535314c2cc5fb69',
'timestamp': 1501337639,
'upload_date': '20170729',
'title': '"Die realste Kifferdoku aller Zeiten"',
'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
'timestamp': 1490274721,
'upload_date': '20170323',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mix_id = mobj.group('id')
alias = mobj.group('alias')
lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbC12Mi4wIiwic2NvcGUiOiJzdGF0aWMtY29udGVudC1hcGksY3VyYXRpb24tc2VydmljZSxzZWFyY2gtYXBpIn0.SGCC1IXHLtZYoo8PvRKlU2gXH1su8YSu47sB3S4iXBI',
'Referer': url,
}, query={
'size': 100,
})['result']['lists']
metas = next(
l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next(
meta['videoDataDelegate']
for meta in metas if meta.get('alias') == alias)
return self._make_url_result(video)
class FunkChannelIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
'info_dict': {
'id': '1155821',
'ext': 'mp4',
'title': 'Die LUSTIGSTEN INSTRUMENTE aus dem Internet - Teil 2',
'description': 'md5:a691d0413ef4835588c5b03ded670c1f',
'timestamp': 1514507395,
'upload_date': '20171229',
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/0/59d52049999264000182e79d/',
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
alias = mobj.group('alias')
webpage = self._download_webpage(url, video_id)
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}, query={
'channelId': channel_id,
'size': 100,
})['result']
domain_id = NexxIE._extract_domain_id(webpage) or '741'
nexx_id = extract_attributes(self._search_regex(
r'(<div[^>]id=["\']mediaplayer-funk[^>]+>)',
webpage, 'media player'))['data-id']
video = next(r for r in results if r.get('alias') == alias)
return self.url_result(
'nexx:%s:%s' % (domain_id, nexx_id), ie=NexxIE.ie_key(),
video_id=nexx_id)
return self._make_url_result(video)

View File

@@ -102,6 +102,8 @@ from .channel9 import Channel9IE
from .vshare import VShareIE
from .mediasite import MediasiteIE
from .springboardplatform import SpringboardPlatformIE
from .yapfiles import YapFilesIE
from .vice import ViceIE
class GenericIE(InfoExtractor):
@@ -1970,6 +1972,18 @@ class GenericIE(InfoExtractor):
'params': {
'skip_download': True,
},
},
{
'url': 'https://www.yapfiles.ru/show/1872528/690b05d3054d2dbe1e69523aa21bb3b1.mp4.html',
'info_dict': {
'id': 'vMDE4NzI1Mjgt690b',
'ext': 'mp4',
'title': 'Котята',
},
'add_ie': [YapFilesIE.ie_key()],
'params': {
'skip_download': True,
},
}
# {
# # TODO: find another test
@@ -2947,6 +2961,16 @@ class GenericIE(InfoExtractor):
springboardplatform_urls, video_id, video_title,
ie=SpringboardPlatformIE.ie_key())
yapfiles_urls = YapFilesIE._extract_urls(webpage)
if yapfiles_urls:
return self.playlist_from_matches(
yapfiles_urls, video_id, video_title, ie=YapFilesIE.ie_key())
vice_urls = ViceIE._extract_urls(webpage)
if vice_urls:
return self.playlist_from_matches(
vice_urls, video_id, video_title, ie=ViceIE.ie_key())
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():

View File

@@ -2,11 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .kaltura import KalturaIE
from .youtube import YoutubeIE
from ..utils import (
determine_ext,
int_or_none,
parse_iso8601,
smuggle_url,
xpath_text,
)
@@ -42,6 +44,19 @@ class HeiseIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.heise.de/video/artikel/nachgehakt-Wie-sichert-das-c-t-Tool-Restric-tor-Windows-10-ab-3700244.html',
'md5': '4b58058b46625bdbd841fc2804df95fc',
'info_dict': {
'id': '1_ntrmio2s',
'timestamp': 1512470717,
'upload_date': '20171205',
'ext': 'mp4',
'title': 'ct10 nachgehakt hos restrictor',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.heise.de/ct/artikel/c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2403911.html',
'only_matching': True,
@@ -67,9 +82,14 @@ class HeiseIE(InfoExtractor):
if yt_urls:
return self.playlist_from_matches(yt_urls, video_id, title, ie=YoutubeIE.ie_key())
kaltura_url = KalturaIE._extract_url(webpage)
if kaltura_url:
return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
container_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
webpage, 'container ID')
sequenz_id = self._search_regex(
r'<div class="videoplayerjw"[^>]+data-sequenz="([0-9]+)"',
webpage, 'sequenz ID')

View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
urlencode_postdata,
)
class HiDiveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hidive\.com/stream/(?P<title>[^/]+)/(?P<key>[^/?#&]+)'
# Using X-Forwarded-For results in 403 HTTP error for HLS fragments,
# so disabling geo bypass completely
_GEO_BYPASS = False
_TESTS = [{
'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001',
'info_dict': {
'id': 'the-comic-artist-and-his-assistants/s01e001',
'ext': 'mp4',
'title': 'the-comic-artist-and-his-assistants/s01e001',
'series': 'the-comic-artist-and-his-assistants',
'season_number': 1,
'episode_number': 1,
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title, key = mobj.group('title', 'key')
video_id = '%s/%s' % (title, key)
settings = self._download_json(
'https://www.hidive.com/play/settings', video_id,
data=urlencode_postdata({
'Title': title,
'Key': key,
}))
restriction = settings.get('restrictionReason')
if restriction == 'RegionRestricted':
self.raise_geo_restricted()
if restriction and restriction != 'None':
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, restriction), expected=True)
formats = []
subtitles = {}
for rendition_id, rendition in settings['renditions'].items():
bitrates = rendition.get('bitrates')
if not isinstance(bitrates, dict):
continue
m3u8_url = bitrates.get('hls')
if not isinstance(m3u8_url, compat_str):
continue
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='%s-hls' % rendition_id, fatal=False))
cc_files = rendition.get('ccFiles')
if not isinstance(cc_files, list):
continue
for cc_file in cc_files:
if not isinstance(cc_file, list) or len(cc_file) < 3:
continue
cc_lang = cc_file[0]
cc_url = cc_file[2]
if not isinstance(cc_lang, compat_str) or not isinstance(
cc_url, compat_str):
continue
subtitles.setdefault(cc_lang, []).append({
'url': cc_url,
})
season_number = int_or_none(self._search_regex(
r's(\d+)', key, 'season number', default=None))
episode_number = int_or_none(self._search_regex(
r'e(\d+)', key, 'episode number', default=None))
return {
'id': video_id,
'title': video_id,
'subtitles': subtitles,
'formats': formats,
'series': title,
'season_number': season_number,
'episode_number': episode_number,
}

View File

@@ -1,12 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
parse_duration,
remove_end,
try_get,
)
@@ -157,3 +162,153 @@ class MailRuIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}
class MailRuMusicSearchBaseIE(InfoExtractor):
def _search(self, query, url, audio_id, limit=100, offset=0):
search = self._download_json(
'https://my.mail.ru/cgi-bin/my/ajax', audio_id,
'Downloading songs JSON page %d' % (offset // limit + 1),
headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
}, query={
'xemail': '',
'ajax_call': '1',
'func_name': 'music.search',
'mna': '',
'mnb': '',
'arg_query': query,
'arg_extended': '1',
'arg_search_params': json.dumps({
'music': {
'limit': limit,
'offset': offset,
},
}),
'arg_limit': limit,
'arg_offset': offset,
})
return next(e for e in search if isinstance(e, dict))
@staticmethod
def _extract_track(t, fatal=True):
audio_url = t['URL'] if fatal else t.get('URL')
if not audio_url:
return
audio_id = t['File'] if fatal else t.get('File')
if not audio_id:
return
thumbnail = t.get('AlbumCoverURL') or t.get('FiledAlbumCover')
uploader = t.get('OwnerName') or t.get('OwnerName_Text_HTML')
uploader_id = t.get('UploaderID')
duration = int_or_none(t.get('DurationInSeconds')) or parse_duration(
t.get('Duration') or t.get('DurationStr'))
view_count = int_or_none(t.get('PlayCount') or t.get('PlayCount_hr'))
track = t.get('Name') or t.get('Name_Text_HTML')
artist = t.get('Author') or t.get('Author_Text_HTML')
if track:
title = '%s - %s' % (artist, track) if artist else track
else:
title = audio_id
return {
'extractor_key': MailRuMusicIE.ie_key(),
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'view_count': view_count,
'vcodec': 'none',
'abr': int_or_none(t.get('BitRate')),
'track': track,
'artist': artist,
'album': t.get('Album'),
'url': audio_url,
}
class MailRuMusicIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
'md5': '0f8c22ef8c5d665b13ac709e63025610',
'info_dict': {
'id': '4e31f7125d0dfaef505d947642366893',
'ext': 'mp3',
'title': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017 - М8Л8ТХ',
'uploader': 'Игорь Мудрый',
'uploader_id': '1459196328',
'duration': 280,
'view_count': int,
'vcodec': 'none',
'abr': 320,
'track': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017',
'artist': 'М8Л8ТХ',
},
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
title = self._og_search_title(webpage)
music_data = self._search(title, url, audio_id)['MusicData']
t = next(t for t in music_data if t.get('File') == audio_id)
info = self._extract_track(t)
info['title'] = title
return info
class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music:search'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/search/black%20shadow',
'info_dict': {
'id': 'black shadow',
},
'playlist_mincount': 532,
}]
def _real_extract(self, url):
query = compat_urllib_parse_unquote(self._match_id(url))
entries = []
LIMIT = 100
offset = 0
for _ in itertools.count(1):
search = self._search(query, url, query, LIMIT, offset)
music_data = search.get('MusicData')
if not music_data or not isinstance(music_data, list):
break
for t in music_data:
track = self._extract_track(t, fatal=False)
if track:
entries.append(track)
total = try_get(
search, lambda x: x['Results']['music']['Total'], int)
if total is not None:
if offset > total:
break
offset += LIMIT
return self.playlist_result(entries, query)

View File

@@ -1,22 +1,27 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import random
import re
import time
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
try_get,
urlencode_postdata,
)
class NexxIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://api\.nexx(?:\.cloud|cdn\.com)/v3/\d+/videos/byid/|
nexx:(?:\d+:)?|
https?://api\.nexx(?:\.cloud|cdn\.com)/v3/(?P<domain_id>\d+)/videos/byid/|
nexx:(?:(?P<domain_id_s>\d+):)?|
https?://arc\.nexx\.cloud/api/video/
)
(?P<id>\d+)
@@ -57,6 +62,21 @@ class NexxIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
# does not work via arc
'url': 'nexx:741:1269984',
'md5': 'c714b5b238b2958dc8d5642addba6886',
'info_dict': {
'id': '1269984',
'ext': 'mp4',
'title': '1 TAG ohne KLO... wortwörtlich! 😑',
'alt_title': '1 TAG ohne KLO... wortwörtlich! 😑',
'description': 'md5:4604539793c49eda9443ab5c5b1d612f',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 607,
'timestamp': 1518614955,
'upload_date': '20180214',
},
}, {
'url': 'https://api.nexxcdn.com/v3/748/videos/byid/128907',
'only_matching': True,
@@ -103,12 +123,99 @@ class NexxIE(InfoExtractor):
def _extract_url(webpage):
return NexxIE._extract_urls(webpage)[0]
def _real_extract(self, url):
video_id = self._match_id(url)
def _handle_error(self, response):
status = int_or_none(try_get(
response, lambda x: x['metadata']['status']) or 200)
if 200 <= status < 300:
return
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['metadata']['errorhint']),
expected=True)
video = self._download_json(
def _call_api(self, domain_id, path, video_id, data=None, headers={}):
headers['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
result = self._download_json(
'https://api.nexx.cloud/v3/%s/%s' % (domain_id, path), video_id,
'Downloading %s JSON' % path, data=urlencode_postdata(data),
headers=headers)
self._handle_error(result)
return result['result']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
domain_id = mobj.group('domain_id') or mobj.group('domain_id_s')
video_id = mobj.group('id')
video = None
response = self._download_json(
'https://arc.nexx.cloud/api/video/%s.json' % video_id,
video_id)['result']
video_id, fatal=False)
if response and isinstance(response, dict):
result = response.get('result')
if result and isinstance(result, dict):
video = result
# not all videos work via arc, e.g. nexx:741:1269984
if not video:
# Reverse engineered from JS code (see getDeviceID function)
device_id = '%d:%d:%d%d' % (
random.randint(1, 4), int(time.time()),
random.randint(1e4, 99999), random.randint(1, 9))
result = self._call_api(domain_id, 'session/init', video_id, data={
'nxp_devh': device_id,
'nxp_userh': '',
'precid': '0',
'playlicense': '0',
'screenx': '1920',
'screeny': '1080',
'playerversion': '6.0.00',
'gateway': 'html5',
'adGateway': '',
'explicitlanguage': 'en-US',
'addTextTemplates': '1',
'addDomainData': '1',
'addAdModel': '1',
}, headers={
'X-Request-Enable-Auth-Fallback': '1',
})
cid = result['general']['cid']
# As described in [1] X-Request-Token generation algorithm is
# as follows:
# md5( operation + domain_id + domain_secret )
# where domain_secret is a static value that will be given by nexx.tv
# as per [1]. Here is how this "secret" is generated (reversed
# from _play.api.init function, search for clienttoken). So it's
# actually not static and not that much of a secret.
# 1. https://nexxtvstorage.blob.core.windows.net/files/201610/27.pdf
secret = result['device']['clienttoken'][int(device_id[0]):]
secret = secret[0:len(secret) - int(device_id[-1])]
op = 'byid'
# Reversed from JS code for _play.api.call function (search for
# X-Request-Token)
request_token = hashlib.md5(
''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
video = self._call_api(
domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
'addInteractionOptions': '1',
'addStatusDetails': '1',
'addStreamDetails': '1',
'addCaptions': '1',
'addScenes': '1',
'addHotSpots': '1',
'addBumpers': '1',
'captionFormat': 'data',
}, headers={
'X-Request-CID': cid,
'X-Request-Token': request_token,
})
general = video['general']
title = general['title']
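
Putting the comments above together, the X-Request-Token derivation reduces to a few lines. A standalone sketch with hypothetical inputs (domain_id, device_id and clienttoken below are invented; only the slicing and hashing mirror the code above):

import hashlib

domain_id = '741'                                  # hypothetical
device_id = '3:1520000000:123451'                  # hypothetical '%d:%d:%d%d' value
clienttoken = '0123456789abcdef0123456789abcdef'   # hypothetical session/init field

# Drop int(device_id[0]) characters from the front and int(device_id[-1])
# characters from the end of clienttoken to obtain the per-session "secret".
secret = clienttoken[int(device_id[0]):]
secret = secret[:len(secret) - int(device_id[-1])]

op = 'byid'
request_token = hashlib.md5(
    ''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
print(request_token)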

View File

@@ -198,7 +198,7 @@ class NickNightIE(NickDeIE):
class NickRuIE(MTVServicesInfoExtractor):
IE_NAME = 'nickelodeonru'
_VALID_URL = r'https?://(?:www\.)nickelodeon\.(?:ru|fr|es|pt|ro|hu)/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)nickelodeon\.(?:ru|fr|es|pt|ro|hu|com\.tr)/[^/]+/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nickelodeon.ru/shows/henrydanger/videos/episodes/3-sezon-15-seriya-licenziya-na-polyot/pmomfb#playlist/7airc6',
'only_matching': True,
@@ -220,6 +220,9 @@ class NickRuIE(MTVServicesInfoExtractor):
}, {
'url': 'http://www.nickelodeon.hu/musorok/spongyabob-kockanadrag/videok/episodes/buborekfujas-az-elszakadt-nadrag/q57iob#playlist/k6te4y',
'only_matching': True,
}, {
'url': 'http://www.nickelodeon.com.tr/programlar/sunger-bob/videolar/kayip-yatak/mgqbjy',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -195,6 +195,10 @@ class NPOIE(NPOBaseIE):
formats = []
urls = set()
def is_legal_url(format_url):
return format_url and format_url not in urls and re.match(
r'^(?:https?:)?//', format_url)
QUALITY_LABELS = ('Laag', 'Normaal', 'Hoog')
QUALITY_FORMATS = ('adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std')
@@ -208,7 +212,7 @@ class NPOIE(NPOBaseIE):
})['items'][0]
for num, item in enumerate(items):
item_url = item.get('url')
if not item_url or item_url in urls:
if not is_legal_url(item_url):
continue
urls.add(item_url)
format_id = self._search_regex(
@@ -229,7 +233,7 @@ class NPOIE(NPOBaseIE):
quality = quality_from_format_id(format_id)
f_id = format_id
else:
quality, f_id = None
quality, f_id = [None] * 2
formats.append({
'url': format_url,
'format_id': f_id,
@@ -279,7 +283,7 @@ class NPOIE(NPOBaseIE):
if not is_live:
for num, stream in enumerate(metadata.get('streams', [])):
stream_url = stream.get('url')
if not stream_url or stream_url in urls:
if not is_legal_url(stream_url):
continue
urls.add(stream_url)
# smooth streaming is not supported
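
The new is_legal_url filter rejects duplicates and anything that is not an HTTP(S) or protocol-relative URL. A small illustration with hypothetical URLs (only the predicate mirrors the code above):

import re

urls = set()

def is_legal_url(format_url):
    return format_url and format_url not in urls and re.match(
        r'^(?:https?:)?//', format_url)

assert is_legal_url('https://example.com/stream.m3u8')
assert is_legal_url('//example.com/stream.ism')              # protocol-relative is accepted
assert not is_legal_url('rtmp://example.com/live')           # non-HTTP(S) scheme is dropped
assert not is_legal_url(None)
urls.add('https://example.com/stream.m3u8')
assert not is_legal_url('https://example.com/stream.m3u8')   # already seen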

View File

@@ -114,13 +114,14 @@ class PornHubIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
self._set_cookie('pornhub.com', 'age_verified', '1')
def dl_webpage(platform):
self._set_cookie('pornhub.com', 'platform', platform)
return self._download_webpage(
'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id,
video_id, headers={
'Cookie': 'age_verified=1; platform=%s' % platform,
})
video_id)
webpage = dl_webpage('pc')

View File

@@ -0,0 +1,103 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .vimeo import VimeoIE
from ..utils import (
extract_attributes,
ExtractorError,
orderedSet,
smuggle_url,
unsmuggle_url,
urljoin,
)
class RayWenderlichIE(InfoExtractor):
_VALID_URL = r'https?://videos\.raywenderlich\.com/courses/(?P<course_id>[^/]+)/lessons/(?P<id>\d+)'
_TESTS = [{
'url': 'https://videos.raywenderlich.com/courses/105-testing-in-ios/lessons/1',
'info_dict': {
'id': '248377018',
'ext': 'mp4',
'title': 'Testing In iOS Episode 1: Introduction',
'duration': 133,
'uploader': 'Ray Wenderlich',
'uploader_id': 'user3304672',
},
'params': {
'noplaylist': True,
'skip_download': True,
},
'add_ie': [VimeoIE.ie_key()],
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, {
'url': 'https://videos.raywenderlich.com/courses/105-testing-in-ios/lessons/1',
'info_dict': {
'title': 'Testing in iOS',
'id': '105-testing-in-ios',
},
'params': {
'noplaylist': False,
},
'playlist_count': 29,
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
course_id, lesson_id = mobj.group('course_id', 'id')
video_id = '%s/%s' % (course_id, lesson_id)
webpage = self._download_webpage(url, video_id)
no_playlist = self._downloader.params.get('noplaylist')
if no_playlist or smuggled_data.get('force_video', False):
if no_playlist:
self.to_screen(
'Downloading just video %s because of --no-playlist'
% video_id)
if '>Subscribe to unlock' in webpage:
raise ExtractorError(
'This content is only available for subscribers',
expected=True)
vimeo_id = self._search_regex(
r'data-vimeo-id=["\'](\d+)', webpage, 'video id')
return self.url_result(
VimeoIE._smuggle_referrer(
'https://player.vimeo.com/video/%s' % vimeo_id, url),
ie=VimeoIE.ie_key(), video_id=vimeo_id)
self.to_screen(
'Downloading playlist %s - add --no-playlist to just download video'
% course_id)
lesson_ids = set((lesson_id, ))
for lesson in re.findall(
r'(<a[^>]+\bclass=["\']lesson-link[^>]+>)', webpage):
attrs = extract_attributes(lesson)
if not attrs:
continue
lesson_url = attrs.get('href')
if not lesson_url:
continue
lesson_id = self._search_regex(
r'/lessons/(\d+)', lesson_url, 'lesson id', default=None)
if not lesson_id:
continue
lesson_ids.add(lesson_id)
entries = []
for lesson_id in sorted(lesson_ids):
entries.append(self.url_result(
smuggle_url(urljoin(url, lesson_id), {'force_video': True}),
ie=RayWenderlichIE.ie_key()))
title = self._search_regex(
r'class=["\']course-title[^>]+>([^<]+)', webpage, 'course title',
default=None)
return self.playlist_result(entries, course_id, title)

View File

@@ -53,6 +53,12 @@ class RuutuIE(InfoExtractor):
'age_limit': 0,
},
},
# Episode where <SourceFile> is "NOT-USED", but has other
# downloadable sources available.
{
'url': 'http://www.ruutu.fi/video/3193728',
'only_matching': True,
},
]
def _real_extract(self, url):
@@ -72,7 +78,7 @@ class RuutuIE(InfoExtractor):
video_url = child.text
if (not video_url or video_url in processed_urls or
any(p in video_url for p in ('NOT_USED', 'NOT-USED'))):
return
continue
processed_urls.append(video_url)
ext = determine_ext(video_url)
if ext == 'm3u8':

View File

@@ -3,7 +3,12 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
parse_duration,
parse_resolution,
str_to_int,
)
class SpankBangIE(InfoExtractor):
@@ -15,7 +20,7 @@ class SpankBangIE(InfoExtractor):
'id': '3vvn',
'ext': 'mp4',
'title': 'fantasy solo',
'description': 'Watch fantasy solo free HD porn video - 05 minutes - Babe,Masturbation,Solo,Toy - dillion harper masturbates on a bed free adult movies sexy clips.',
'description': 'dillion harper masturbates on a bed',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'silly2587',
'age_limit': 18,
@@ -32,36 +37,49 @@ class SpankBangIE(InfoExtractor):
# mobile page
'url': 'http://m.spankbang.com/1o2de/video/can+t+remember+her+name',
'only_matching': True,
}, {
# 4k
'url': 'https://spankbang.com/1vwqx/video/jade+kush+solo+4k',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(url, video_id, headers={
'Cookie': 'country=US'
})
if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
raise ExtractorError(
'Video %s is not available' % video_id, expected=True)
stream_key = self._html_search_regex(
r'''var\s+stream_key\s*=\s*['"](.+?)['"]''',
webpage, 'stream key')
formats = [{
'url': 'http://spankbang.com/_%s/%s/title/%sp__mp4' % (video_id, stream_key, height),
'ext': 'mp4',
'format_id': '%sp' % height,
'height': int(height),
} for height in re.findall(r'<(?:span|li|p)[^>]+[qb]_(\d+)p', webpage)]
self._check_formats(formats, video_id)
formats = []
for mobj in re.finditer(
r'stream_url_(?P<id>[^\s=]+)\s*=\s*(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage):
format_id, format_url = mobj.group('id', 'url')
f = parse_resolution(format_id)
f.update({
'url': format_url,
'format_id': format_id,
})
formats.append(f)
self._sort_formats(formats)
title = self._html_search_regex(
r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
description = self._og_search_description(webpage)
description = self._search_regex(
r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)',
webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)',
webpage, 'uploader', default=None)
duration = parse_duration(self._search_regex(
r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)',
webpage, 'duration', fatal=False))
view_count = str_to_int(self._search_regex(
r'([\d,.]+)\s+plays', webpage, 'view count', fatal=False))
age_limit = self._rta_search(webpage)
@@ -71,6 +89,8 @@ class SpankBangIE(InfoExtractor):
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'view_count': view_count,
'formats': formats,
'age_limit': age_limit,
}

View File

@@ -4,8 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
js_to_json,
)
@@ -32,12 +34,34 @@ class StreamangoIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'gone',
}, {
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
'only_matching': True,
}]
def _real_extract(self, url):
def decrypt_src(encoded, val):
ALPHABET = '=/+9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA'
encoded = re.sub(r'[^A-Za-z0-9+/=]', '', encoded)
decoded = ''
sm = [None] * 4
i = 0
str_len = len(encoded)
while i < str_len:
for j in range(4):
sm[j % 4] = ALPHABET.index(encoded[i])
i += 1
char_code = ((sm[0] << 0x2) | (sm[1] >> 0x4)) ^ val
decoded += compat_chr(char_code)
if sm[2] != 0x40:
char_code = ((sm[1] & 0xf) << 0x4) | (sm[2] >> 0x2)
decoded += compat_chr(char_code)
if sm[3] != 0x40:
char_code = ((sm[2] & 0x3) << 0x6) | sm[3]
decoded += compat_chr(char_code)
return decoded
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -46,13 +70,26 @@ class StreamangoIE(InfoExtractor):
formats = []
for format_ in re.findall(r'({[^}]*\bsrc\s*:\s*[^}]*})', webpage):
video = self._parse_json(
format_, video_id, transform_source=js_to_json, fatal=False)
if not video:
mobj = re.search(r'(src\s*:\s*[^(]+\(([^)]*)\)[\s,]*)', format_)
if mobj is None:
continue
src = video.get('src')
format_ = format_.replace(mobj.group(0), '')
video = self._parse_json(
format_, video_id, transform_source=js_to_json,
fatal=False) or {}
mobj = re.search(
r'([\'"])(?P<src>(?:(?!\1).)+)\1\s*,\s*(?P<val>\d+)',
mobj.group(1))
if mobj is None:
continue
src = decrypt_src(mobj.group('src'), int_or_none(mobj.group('val')))
if not src:
continue
ext = determine_ext(src, default_ext=None)
if video.get('type') == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
@@ -65,6 +102,16 @@ class StreamangoIE(InfoExtractor):
'height': int_or_none(video.get('height')),
'tbr': int_or_none(video.get('bitrate')),
})
if not formats:
error = self._search_regex(
r'<p[^>]+\bclass=["\']lead[^>]+>(.+?)</p>', webpage,
'error', default=None)
if not error and '>Sorry' in webpage:
error = 'Video %s is not available' % video_id
if error:
raise ExtractorError(error, expected=True)
self._sort_formats(formats)
return {

View File

@@ -10,19 +10,33 @@ from ..utils import (
)
class TeleQuebecIE(InfoExtractor):
class TeleQuebecBaseIE(InfoExtractor):
@staticmethod
def _limelight_result(media_id):
return {
'_type': 'url_transparent',
'url': smuggle_url(
'limelight:media:' + media_id, {'geo_countries': ['CA']}),
'ie_key': 'LimelightMedia',
}
class TeleQuebecIE(TeleQuebecBaseIE):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
_TESTS = [{
'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
'md5': 'fe95a0957e5707b1b01f5013e725c90f',
# available till 01.01.2023
'url': 'http://zonevideo.telequebec.tv/media/37578/un-petit-choc-et-puis-repart/un-chef-a-la-cabane',
'info_dict': {
'id': '20984',
'id': '577116881b4b439084e6b1cf4ef8b1b3',
'ext': 'mp4',
'title': 'Le couronnement de New York',
'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
'upload_date': '20170201',
'timestamp': 1485972222,
}
'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
'upload_date': '20180222',
'timestamp': 1519326631,
},
'params': {
'skip_download': True,
},
}, {
# no description
'url': 'http://zonevideo.telequebec.tv/media/30261',
@@ -31,19 +45,107 @@ class TeleQuebecIE(InfoExtractor):
def _real_extract(self, url):
media_id = self._match_id(url)
media_data = self._download_json(
'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
media_id)['media']
return {
'_type': 'url_transparent',
'id': media_id,
'url': smuggle_url(
'limelight:media:' + media_data['streamInfo']['sourceId'],
{'geo_countries': ['CA']}),
'title': media_data['title'],
info = self._limelight_result(media_data['streamInfo']['sourceId'])
info.update({
'title': media_data.get('title'),
'description': try_get(
media_data, lambda x: x['descriptions'][0]['text'], compat_str),
'duration': int_or_none(
media_data.get('durationInMilliseconds'), 1000),
'ie_key': 'LimelightMedia',
})
return info
class TeleQuebecEmissionIE(TeleQuebecBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
[^/]+\.telequebec\.tv/emissions/|
(?:www\.)?telequebec\.tv/
)
(?P<id>[^?#&]+)
'''
_TESTS = [{
'url': 'http://lindicemcsween.telequebec.tv/emissions/100430013/des-soins-esthetiques-a-377-d-interets-annuels-ca-vous-tente',
'info_dict': {
'id': '66648a6aef914fe3badda25e81a4d50a',
'ext': 'mp4',
'title': "Des soins esthétiques à 377 % d'intérêts annuels, ça vous tente?",
'description': 'md5:369e0d55d0083f1fc9b71ffb640ea014',
'upload_date': '20171024',
'timestamp': 1508862118,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://bancpublic.telequebec.tv/emissions/emission-49/31986/jeunes-meres-sous-pression',
'only_matching': True,
}, {
'url': 'http://www.telequebec.tv/masha-et-michka/epi059masha-et-michka-3-053-078',
'only_matching': True,
}, {
'url': 'http://www.telequebec.tv/documentaire/bebes-sur-mesure/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
media_id = self._search_regex(
r'mediaUID\s*:\s*["\'][Ll]imelight_(?P<id>[a-z0-9]{32})', webpage,
'limelight id')
info = self._limelight_result(media_id)
info.update({
'title': self._og_search_title(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
})
return info
class TeleQuebecLiveIE(InfoExtractor):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/(?P<id>endirect)'
_TEST = {
'url': 'http://zonevideo.telequebec.tv/endirect/',
'info_dict': {
'id': 'endirect',
'ext': 'mp4',
'title': 're:^Télé-Québec - En direct [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
m3u8_url = None
webpage = self._download_webpage(
'https://player.telequebec.tv/Tq_VideoPlayer.js', video_id,
fatal=False)
if webpage:
m3u8_url = self._search_regex(
r'm3U8Url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'm3u8 url', default=None, group='url')
if not m3u8_url:
m3u8_url = 'https://teleqmmd.mmdlive.lldns.net/teleqmmd/f386e3b206814e1f8c8c1c71c0f8e748/manifest.m3u8'
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls')
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title('Télé-Québec - En direct'),
'is_live': True,
'formats': formats,
}

View File

@@ -132,7 +132,7 @@ class ToggleIE(InfoExtractor):
formats = []
for video_file in info.get('Files', []):
video_url, vid_format = video_file.get('URL'), video_file.get('Format')
if not video_url or not vid_format:
if not video_url or video_url == 'NA' or not vid_format:
continue
ext = determine_ext(video_url)
vid_format = vid_format.replace(' ', '')
@@ -143,6 +143,18 @@ class ToggleIE(InfoExtractor):
note='Downloading %s m3u8 information' % vid_format,
errnote='Failed to download %s m3u8 information' % vid_format,
fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id=vid_format,
note='Downloading %s MPD manifest' % vid_format,
errnote='Failed to download %s MPD manifest' % vid_format,
fatal=False))
elif ext == 'ism':
formats.extend(self._extract_ism_formats(
video_url, video_id, ism_id=vid_format,
note='Downloading %s ISM manifest' % vid_format,
errnote='Failed to download %s ISM manifest' % vid_format,
fatal=False))
elif ext in ('mp4', 'wvm'):
# wvm are drm-protected files
formats.append({


@@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
parse_duration,
update_url_query,
@@ -16,8 +17,9 @@ from ..utils import (
class TVNowBaseIE(InfoExtractor):
_VIDEO_FIELDS = (
'id', 'title', 'free', 'geoblocked', 'articleLong', 'articleShort',
'broadcastStartDate', 'isDrm', 'duration', 'manifest.dashclear',
'format.defaultImage169Format', 'format.defaultImage169Logo')
'broadcastStartDate', 'isDrm', 'duration', 'season', 'episode',
'manifest.dashclear', 'format.title', 'format.defaultImage169Format',
'format.defaultImage169Logo')
def _call_api(self, path, video_id, query):
return self._download_json(
@@ -66,6 +68,10 @@ class TVNowBaseIE(InfoExtractor):
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'series': f.get('title'),
'season_number': int_or_none(info.get('season')),
'episode_number': int_or_none(info.get('episode')),
'episode': title,
'formats': formats,
}
@@ -74,18 +80,21 @@ class TVNowIE(TVNowBaseIE):
_VALID_URL = r'https?://(?:www\.)?tvnow\.(?:de|at|ch)/(?:rtl(?:2|plus)?|nitro|superrtl|ntv|vox)/(?P<show_id>[^/]+)/(?:(?:list/[^/]+|jahr/\d{4}/\d{1,2})/)?(?P<id>[^/]+)/(?:player|preview)'
_TESTS = [{
# rtl
'url': 'https://www.tvnow.de/rtl/alarm-fuer-cobra-11/freier-fall/player?return=/rtl',
'url': 'https://www.tvnow.de/rtl2/grip-das-motormagazin/der-neue-porsche-911-gt-3/player',
'info_dict': {
'id': '385314',
'display_id': 'alarm-fuer-cobra-11/freier-fall',
'id': '331082',
'display_id': 'grip-das-motormagazin/der-neue-porsche-911-gt-3',
'ext': 'mp4',
'title': 'Freier Fall',
'description': 'md5:8c2d8f727261adf7e0dc18366124ca02',
'title': 'Der neue Porsche 911 GT 3',
'description': 'md5:6143220c661f9b0aae73b245e5d898bb',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1512677700,
'upload_date': '20171207',
'duration': 2862.0,
'timestamp': 1495994400,
'upload_date': '20170528',
'duration': 5283,
'series': 'GRIP - Das Motormagazin',
'season_number': 14,
'episode_number': 405,
'episode': 'Der neue Porsche 911 GT 3',
},
}, {
# rtl2


@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_kwargs,
compat_str,
compat_urllib_request,
compat_urlparse,
@@ -114,6 +115,11 @@ class UdemyIE(InfoExtractor):
error_str += ' - %s' % error_data.get('formErrors')
raise ExtractorError(error_str, expected=True)
def _download_webpage(self, *args, **kwargs):
kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
return super(UdemyIE, self)._download_webpage(
*args, **compat_kwargs(kwargs))
def _download_json(self, url_or_request, *args, **kwargs):
headers = {
'X-Udemy-Snail-Case': 'true',


@@ -5,113 +5,52 @@ import re
import time
import hashlib
import json
import random
from .adobepass import AdobePassIE
from .youtube import YoutubeIE
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
parse_age_limit,
str_or_none,
parse_duration,
ExtractorError,
extract_attributes,
try_get,
)
class ViceBaseIE(AdobePassIE):
def _extract_preplay_video(self, url, locale, webpage):
watch_hub_data = extract_attributes(self._search_regex(
r'(?s)(<watch-hub\s*.+?</watch-hub>)', webpage, 'watch hub'))
video_id = watch_hub_data['vms-id']
title = watch_hub_data['video-title']
query = {}
is_locked = watch_hub_data.get('video-locked') == '1'
if is_locked:
resource = self._get_mvpd_resource(
'VICELAND', title, video_id,
watch_hub_data.get('video-rating'))
query['tvetoken'] = self._extract_mvpd_auth(
url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
exp = int(time.time()) + 14400
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
})
try:
host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json(
'https://%s.com/%s/preplay/%s' % (host, locale, video_id),
video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
error = json.loads(e.cause.read().decode())
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error['details']), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body') or base.get('display_body'),
'thumbnail': watch_hub_data.get('cover-image') or watch_hub_data.get('thumbnail'),
'duration': int_or_none(video_data.get('video_duration')) or parse_duration(watch_hub_data.get('video-duration')),
'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or watch_hub_data.get('show-title'),
'episode_number': int_or_none(episode.get('episode_number') or watch_hub_data.get('episode')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(watch_hub_data.get('season')),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or watch_hub_data.get('channel-title'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
class ViceIE(ViceBaseIE):
class ViceIE(AdobePassIE):
IE_NAME = 'vice'
_VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:(?P<locale>[^/]+)/)?videos?/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
'md5': '7d3ae2f9ba5f196cdd9f9efd43657ac2',
'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
'info_dict': {
'id': 'N2bzkydjraWDGwnt8jAttCF6Y0PDv4Zj',
'ext': 'flv',
'title': 'Monkey Labs of Holland',
'description': 'md5:92b3c7dcbfe477f772dd4afa496c9149',
'id': '5e647f0125e145c9aef2069412c0cbde',
'ext': 'mp4',
'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1489664942,
'upload_date': '20170316',
'age_limit': 14,
},
'add_ie': ['Ooyala'],
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
}, {
# geo restricted to US
'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
'info_dict': {
'id': '5816510690b70e6c5fd39a56',
'id': '930c0ad1f47141cc955087eecaddb0e2',
'ext': 'mp4',
'uploader': 'Waypoint',
'uploader': 'waypoint',
'title': 'The Signal From Tölva',
'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
'uploader_id': '57f7d621e05ca860fa9ccaf9',
@@ -139,27 +78,131 @@ class ViceIE(ViceBaseIE):
'params': {
# AES-encrypted m3u8
'skip_download': True,
'proxy': '127.0.0.1:8118',
},
'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
'only_matching': True,
}, {
'url': 'https://video.vice.com/en_us/embed/57f41d3556a0a80f54726060',
'only_matching': True,
}, {
'url': 'https://vms.vice.com/en_us/video/preplay/58c69e38a55424f1227dc3f7',
'only_matching': True,
}, {
'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1',
'only_matching': True,
}]
_PREPLAY_HOST = 'video.vice'
_PREPLAY_HOST = 'vms.vice'
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
webpage)
@staticmethod
def _extract_url(webpage):
urls = ViceIE._extract_urls(webpage)
return urls[0] if urls else None
def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups()
webpage, urlh = self._download_webpage_handle(url, video_id)
embed_code = self._search_regex(
r'embedCode=([^&\'"]+)', webpage,
'ooyala embed code', default=None)
if embed_code:
return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
youtube_id = self._search_regex(
r'data-youtube-id="([^"]+)"', webpage, 'youtube id', default=None)
if youtube_id:
return self.url_result(youtube_id, 'Youtube')
return self._extract_preplay_video(urlh.geturl(), locale, webpage)
webpage = self._download_webpage(
'https://video.vice.com/%s/embed/%s' % (locale, video_id),
video_id)
video = self._parse_json(
self._search_regex(
r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
'app state'), video_id)['video']
video_id = video.get('vms_id') or video.get('id') or video_id
title = video['title']
is_locked = video.get('locked')
rating = video.get('rating')
thumbnail = video.get('thumbnail_url')
duration = int_or_none(video.get('duration'))
series = try_get(
video, lambda x: x['episode']['season']['show']['title'],
compat_str)
episode_number = try_get(
video, lambda x: x['episode']['episode_number'])
season_number = try_get(
video, lambda x: x['episode']['season']['season_number'])
uploader = None
query = {}
if is_locked:
resource = self._get_mvpd_resource(
'VICELAND', title, video_id, rating)
query['tvetoken'] = self._extract_mvpd_auth(
url, video_id, 'VICELAND', resource)
# signature generation algorithm is reverse engineered from signatureGenerator in
# webpack:///../shared/~/vice-player/dist/js/vice-player.js in
# https://www.viceland.com/assets/common/js/web.vendor.bundle.js
# new JS is located here https://vice-web-statics-cdn.vice.com/vice-player/player-embed.js
exp = int(time.time()) + 1440
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
'_ad_blocked': None,
'_ad_unit': '',
'_debug': '',
'platform': 'desktop',
'rn': random.randint(10000, 100000),
'fbprebidtoken': '',
})
try:
host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json(
'https://%s.com/%s/video/preplay/%s' % (host, locale, video_id),
video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401):
error = json.loads(e.cause.read().decode())
error_message = error.get('error_description') or error['details']
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error_message), expected=True)
raise
video_data = preplay['video']
base = video_data['base']
uplynk_preplay_url = preplay['preplayURL']
episode = video_data.get('episode', {})
channel = video_data.get('channel', {})
subtitles = {}
cc_url = preplay.get('ccURL')
if cc_url:
subtitles['en'] = [{
'url': cc_url,
}]
return {
'_type': 'url_transparent',
'url': uplynk_preplay_url,
'id': video_id,
'title': title,
'description': base.get('body') or base.get('display_body'),
'thumbnail': thumbnail,
'duration': int_or_none(video_data.get('video_duration')) or duration,
'timestamp': int_or_none(video_data.get('created_at'), 1000),
'age_limit': parse_age_limit(video_data.get('video_rating')),
'series': video_data.get('show_title') or series,
'episode_number': int_or_none(episode.get('episode_number') or episode_number),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
'season_number': int_or_none(season_number),
'season_id': str_or_none(episode.get('season_id')),
'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
'ie_key': 'UplynkPreplay',
}
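Note: a standalone sketch of the preplay request signature used in the reworked extractor above (the id is a placeholder taken from the test URLs; the expiry window and SHA-512 scheme mirror the code):

import hashlib
import time

video_id = '58c69e38a55424f1227dc3f7'  # placeholder; the real id comes from PREFETCH_DATA
exp = int(time.time()) + 1440
query = {
    'exp': exp,
    'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
    'platform': 'desktop',
}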
class ViceShowIE(InfoExtractor):
@@ -203,14 +246,15 @@ class ViceArticleIE(InfoExtractor):
_TESTS = [{
'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
'info_dict': {
'id': '58dc0a3dee202d2a0ccfcbd8',
'id': '41eae2a47b174a1398357cec55f1f6fc',
'ext': 'mp4',
'title': 'Mormon War on Porn ',
'description': 'md5:ad396a2481e7f8afb5ed486878421090',
'uploader': 'VICE',
'uploader_id': '57a204088cb727dec794c693',
'timestamp': 1489160690,
'upload_date': '20170310',
'description': 'md5:6394a8398506581d0346b9ab89093fef',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1491883129,
'upload_date': '20170411',
'age_limit': 17,
},
'params': {
# AES-encrypted m3u8
@@ -219,17 +263,35 @@ class ViceArticleIE(InfoExtractor):
'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
'info_dict': {
'id': '3jstaBeXgAs',
'ext': 'mp4',
'title': 'How to Hack a Car: Phreaked Out (Episode 2)',
'description': 'md5:ee95453f7ff495db8efe14ae8bf56f30',
'uploader_id': 'MotherboardTV',
'uploader': 'Motherboard',
'uploader_id': 'MotherboardTV',
'upload_date': '20140529',
},
'add_ie': ['Youtube'],
}, {
'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'info_dict': {
'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
'ext': 'mp4',
'title': "Making The World's First Male Sex Doll",
'description': 'md5:916078ef0e032d76343116208b6cc2c4',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1476919911,
'upload_date': '20161019',
'age_limit': 17,
},
'params': {
'skip_download': True,
},
'add_ie': [ViceIE.ie_key()],
}, {
'url': 'https://www.vice.com/en_us/article/cowboy-capitalists-part-1',
'only_matching': True,
@@ -244,8 +306,8 @@ class ViceArticleIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
prefetch_data = self._parse_json(self._search_regex(
r'window\.__PREFETCH_DATA\s*=\s*({.*});',
webpage, 'prefetch data'), display_id)
r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
webpage, 'app state'), display_id)['pageData']
body = prefetch_data['body']
def _url_res(video_url, ie_key):
@@ -256,6 +318,10 @@ class ViceArticleIE(InfoExtractor):
'ie_key': ie_key,
}
vice_url = ViceIE._extract_url(webpage)
if vice_url:
return _url_res(vice_url, ViceIE.ie_key())
embed_code = self._search_regex(
r'embedCode=([^&\'"]+)', body,
'ooyala embed code', default=None)


@@ -1,38 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .vice import ViceBaseIE
class VicelandIE(ViceBaseIE):
_VALID_URL = r'https?://(?:www\.)?viceland\.com/(?P<locale>[^/]+)/video/[^/]+/(?P<id>[a-f0-9]+)'
_TEST = {
'url': 'https://www.viceland.com/en_us/video/trapped/588a70d0dba8a16007de7316',
'info_dict': {
'id': '588a70d0dba8a16007de7316',
'ext': 'mp4',
'title': 'TRAPPED (Series Trailer)',
'description': 'md5:7a8e95c2b6cd86461502a2845e581ccf',
'age_limit': 14,
'timestamp': 1485474122,
'upload_date': '20170126',
'uploader_id': '57a204098cb727dec794c6a3',
'uploader': 'Viceland',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['UplynkPreplay'],
'skip': '404',
}
_PREPLAY_HOST = 'www.viceland'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
locale = mobj.group('locale')
webpage = self._download_webpage(url, video_id)
return self._extract_preplay_video(url, locale, webpage)


@@ -0,0 +1,125 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
get_element_by_id,
int_or_none,
strip_or_none,
unified_strdate,
urljoin,
)
class VidLiiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidlii\.com/(?:watch|embed)\?.*?\bv=(?P<id>[0-9A-Za-z_-]{11})'
_TESTS = [{
'url': 'https://www.vidlii.com/watch?v=tJluaH4BJ3v',
'md5': '9bf7d1e005dfa909b6efb0a1ff5175e2',
'info_dict': {
'id': 'tJluaH4BJ3v',
'ext': 'mp4',
'title': 'Vidlii is against me',
'description': 'md5:fa3f119287a2bfb922623b52b1856145',
'thumbnail': 're:https://.*.jpg',
'uploader': 'APPle5auc31995',
'uploader_url': 'https://www.vidlii.com/user/APPle5auc31995',
'upload_date': '20171107',
'duration': 212,
'view_count': int,
'comment_count': int,
'average_rating': float,
'categories': ['News & Politics'],
'tags': ['Vidlii', 'Jan', 'Videogames'],
}
}, {
'url': 'https://www.vidlii.com/embed?v=tJluaH4BJ3v&a=0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.vidlii.com/watch?v=%s' % video_id, video_id)
video_url = self._search_regex(
r'src\s*:\s*(["\'])(?P<url>(?:https?://)?(?:(?!\1).)+)\1', webpage,
'video url', group='url')
title = self._search_regex(
(r'<h1>([^<]+)</h1>', r'<title>([^<]+) - VidLii<'), webpage,
'title')
description = self._html_search_meta(
('description', 'twitter:description'), webpage,
default=None) or strip_or_none(
get_element_by_id('des_text', webpage))
thumbnail = self._html_search_meta(
'twitter:image', webpage, default=None)
if not thumbnail:
thumbnail_path = self._search_regex(
r'img\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
if thumbnail_path:
thumbnail = urljoin(url, thumbnail_path)
uploader = self._search_regex(
r'<div[^>]+class=["\']wt_person[^>]+>\s*<a[^>]+\bhref=["\']/user/[^>]+>([^<]+)',
webpage, 'uploader', fatal=False)
uploader_url = 'https://www.vidlii.com/user/%s' % uploader if uploader else None
upload_date = unified_strdate(self._html_search_meta(
'datePublished', webpage, default=None) or self._search_regex(
r'<date>([^<]+)', webpage, 'upload date', fatal=False))
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'duration',
default=None) or self._search_regex(
r'duration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
view_count = int_or_none(self._search_regex(
(r'<strong>(\d+)</strong> views',
r'Views\s*:\s*<strong>(\d+)</strong>'),
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
(r'<span[^>]+id=["\']cmt_num[^>]+>(\d+)',
r'Comments\s*:\s*<strong>(\d+)'),
webpage, 'comment count', fatal=False))
average_rating = float_or_none(self._search_regex(
r'rating\s*:\s*([\d.]+)', webpage, 'average rating', fatal=False))
category = self._html_search_regex(
r'<div>Category\s*:\s*</div>\s*<div>\s*<a[^>]+>([^<]+)', webpage,
'category', fatal=False)
categories = [category] if category else None
tags = [
strip_or_none(tag)
for tag in re.findall(
r'<a[^>]+\bhref=["\']/results\?.*?q=[^>]*>([^<]+)',
webpage) if strip_or_none(tag)
] or None
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_url': uploader_url,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'average_rating': average_rating,
'categories': categories,
'tags': tags,
}


@@ -13,7 +13,7 @@ from ..utils import (
class VidziIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidzi\.(?:tv|cc)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_VALID_URL = r'https?://(?:www\.)?vidzi\.(?:tv|cc|si)/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
_TESTS = [{
'url': 'http://vidzi.tv/cghql9yq6emu.html',
'md5': '4f16c71ca0c8c8635ab6932b5f3f1660',
@@ -32,6 +32,9 @@ class VidziIE(InfoExtractor):
}, {
'url': 'http://vidzi.cc/cghql9yq6emu.html',
'only_matching': True,
}, {
'url': 'https://vidzi.si/rph9gztxj1et.html',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -41,21 +41,30 @@ class VimeoBaseInfoExtractor(InfoExtractor):
if self._LOGIN_REQUIRED:
raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
return
self.report_login()
webpage = self._download_webpage(self._LOGIN_URL, None, False)
webpage = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
token, vuid = self._extract_xsrft_and_vuid(webpage)
data = urlencode_postdata({
data = {
'action': 'login',
'email': username,
'password': password,
'service': 'vimeo',
'token': token,
})
login_request = sanitized_Request(self._LOGIN_URL, data)
login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
login_request.add_header('Referer', self._LOGIN_URL)
}
self._set_vimeo_cookie('vuid', vuid)
self._download_webpage(login_request, None, False, 'Wrong login info')
try:
self._download_webpage(
self._LOGIN_URL, None, 'Logging in',
data=urlencode_postdata(data), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Referer': self._LOGIN_URL,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 418:
raise ExtractorError(
'Unable to log in: bad username or password',
expected=True)
raise ExtractorError('Unable to log in')
def _verify_video_password(self, url, video_id, webpage):
password = self._downloader.params.get('videopassword')


@@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
qualities,
unescapeHTML,
)
class YapFilesIE(InfoExtractor):
_YAPFILES_URL = r'//(?:(?:www|api)\.)?yapfiles\.ru/get_player/*\?.*?\bv=(?P<id>\w+)'
_VALID_URL = r'https?:%s' % _YAPFILES_URL
_TESTS = [{
# with hd
'url': 'http://www.yapfiles.ru/get_player/?v=vMDE1NjcyNDUt0413',
'md5': '2db19e2bfa2450568868548a1aa1956c',
'info_dict': {
'id': 'vMDE1NjcyNDUt0413',
'ext': 'mp4',
'title': 'Самый худший пароль WIFI',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 72,
},
}, {
# without hd
'url': 'https://api.yapfiles.ru/get_player/?uid=video_player_1872528&plroll=1&adv=1&v=vMDE4NzI1Mjgt690b',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [unescapeHTML(mobj.group('url')) for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.*?)\1'
% YapFilesIE._YAPFILES_URL, webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, fatal=False)
player_url = None
query = {}
if webpage:
player_url = self._search_regex(
r'player\.init\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'player url', default=None, group='url')
if not player_url:
player_url = 'http://api.yapfiles.ru/load/%s/' % video_id
query = {
'md5': 'ded5f369be61b8ae5f88e2eeb2f3caff',
'type': 'json',
'ref': url,
}
player = self._download_json(
player_url, video_id, query=query)['player']
playlist_url = player['playlist']
title = player['title']
thumbnail = player.get('poster')
if title == 'Ролик удален' or 'deleted.jpg' in (thumbnail or ''):
raise ExtractorError(
'Video %s has been removed' % video_id, expected=True)
playlist = self._download_json(
playlist_url, video_id)['player']['main']
hd_height = int_or_none(player.get('hd'))
QUALITIES = ('sd', 'hd')
quality_key = qualities(QUALITIES)
formats = []
for format_id in QUALITIES:
is_hd = format_id == 'hd'
format_url = playlist.get(
'file%s' % ('_hd' if is_hd else ''))
if not format_url or not isinstance(format_url, compat_str):
continue
formats.append({
'url': format_url,
'format_id': format_id,
'quality': quality_key(format_id),
'height': hd_height if is_hd else None,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': int_or_none(player.get('length')),
'formats': formats,
}
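Note: the format loop above relies on youtube-dl's qualities() helper, which maps a preference tuple to sortable indices — a small sketch, assuming the usual youtube_dl.utils import path:

from youtube_dl.utils import qualities

quality_key = qualities(('sd', 'hd'))
quality_key('sd')  # 0
quality_key('hd')  # 1 (preferred by _sort_formats)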


@@ -2583,7 +2583,11 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
}]
class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
class YoutubeSearchIE(SearchInfoExtractor, YoutubeSearchBaseInfoExtractor):
IE_DESC = 'YouTube.com searches'
# there doesn't appear to be a real limit, for example if you search for
# 'python' you get more than 8.000.000 results
@@ -2617,8 +2621,7 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
raise ExtractorError(
'[youtube] No video results', expected=True)
new_videos = self._ids_to_results(orderedSet(re.findall(
r'href="/watch\?v=(.{11})', html_content)))
new_videos = list(self._process_page(html_content))
videos += new_videos
if not new_videos or len(videos) > limit:
break
@@ -2641,11 +2644,10 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
_EXTRA_QUERY_ARGS = {'search_sort': 'video_date_uploaded'}
class YoutubeSearchURLIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeSearchURLIE(YoutubeSearchBaseInfoExtractor):
IE_DESC = 'YouTube.com search URLs'
IE_NAME = 'youtube:search_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,


@@ -534,7 +534,7 @@ def parseOpts(overrideArguments=None):
workarounds.add_option(
'--prefer-insecure',
'--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information whenever possible')
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
workarounds.add_option(
'--user-agent',
metavar='UA', dest='user_agent',


@@ -31,7 +31,8 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
temp_filename = prepend_extension(filename, 'temp')
if not info.get('thumbnails'):
raise EmbedThumbnailPPError('Thumbnail was not found. Nothing to do.')
self._downloader.to_screen('[embedthumbnail] There aren\'t any thumbnails to embed')
return [], info
thumbnail_filename = info['thumbnails'][-1]['filename']


@@ -28,10 +28,10 @@ def rsa_verify(message, signature, key):
return expected == signature
def update_self(to_screen, verbose, opener, prefer_insecure=False):
def update_self(to_screen, verbose, opener):
"""Update the program file with the latest version from the repository"""
UPDATE_URL = '//rg3.github.io/youtube-dl/update/'
UPDATE_URL = 'https://rg3.github.io/youtube-dl/update/'
VERSION_URL = UPDATE_URL + 'LATEST_VERSION'
JSON_URL = UPDATE_URL + 'versions.json'
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
@@ -40,13 +40,9 @@ def update_self(to_screen, verbose, opener, prefer_insecure=False):
to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
def guess_scheme(url, insecure=False):
return 'http%s:%s' % ('' if insecure is True else 's', url)
# Check if there is a new version
try:
newversion = opener.open(guess_scheme(
VERSION_URL, prefer_insecure)).read().decode('utf-8').strip()
newversion = opener.open(VERSION_URL).read().decode('utf-8').strip()
except Exception:
if verbose:
to_screen(encode_compat_str(traceback.format_exc()))
@@ -58,8 +54,7 @@ def update_self(to_screen, verbose, opener, prefer_insecure=False):
# Download and check versions info
try:
versions_info = opener.open(guess_scheme(
JSON_URL, prefer_insecure)).read().decode('utf-8')
versions_info = opener.open(JSON_URL).read().decode('utf-8')
versions_info = json.loads(versions_info)
except Exception:
if verbose:


@@ -1689,6 +1689,28 @@ def parse_count(s):
return lookup_unit_table(_UNIT_TABLE, s)
def parse_resolution(s):
if s is None:
return {}
mobj = re.search(r'\b(?P<w>\d+)\s*[xX×]\s*(?P<h>\d+)\b', s)
if mobj:
return {
'width': int(mobj.group('w')),
'height': int(mobj.group('h')),
}
mobj = re.search(r'\b(\d+)[pPiI]\b', s)
if mobj:
return {'height': int(mobj.group(1))}
mobj = re.search(r'\b([48])[kK]\b', s)
if mobj:
return {'height': int(mobj.group(1)) * 540}
return {}
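Note: hedged examples of what the new parse_resolution() helper returns for typical inputs, following directly from the regexes above:

from youtube_dl.utils import parse_resolution

parse_resolution('1920x1080')  # {'width': 1920, 'height': 1080}
parse_resolution('720p')       # {'height': 720}
parse_resolution('4k')         # {'height': 2160}
parse_resolution(None)         # {}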
def month_by_name(name, lang='en'):
""" Return the number of a month by (locale-independently) English name """


@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2018.02.22'
__version__ = '2018.03.10'