Compare commits

..

40 Commits

Author SHA1 Message Date
Sergey M․
537191826f release 2017.06.05 2017-06-05 00:48:07 +07:00
Sergey M․
130880ba48 [ChangeLog] Actualize 2017-06-05 00:43:38 +07:00
Sergey M․
f8ba3fda4d Credit @jktjkt for dvtv formats (#13063) 2017-06-05 00:38:44 +07:00
Sergey M․
e1b90cc3db Credit @mikf for beam:vod (#13032) 2017-06-05 00:35:41 +07:00
Sergey M․
43e6579558 Credit @adamvoss for bandcamp:weekly (#12758) 2017-06-04 23:22:19 +07:00
Sergey M․
6d923aab35 [bandcamp:weekly] Improve and extract more metadata (closes #12758) 2017-06-04 23:21:30 +07:00
Adam Voss
62bafabc09 [bandcamp:weekly] Add extractor 2017-06-04 23:21:07 +07:00
Sergey M․
9edcdac90c [pornhub:uservideos] Add missing raise 2017-06-04 20:39:55 +07:00
Sergey M․
cd138d8bd4 [pornhub:playlist] Fix extraction (closes #13281) 2017-06-04 15:54:19 +07:00
Sergey M․
cd750b731c [godtv] Remove extractor (closes #13175) 2017-06-03 22:08:12 +07:00
CeruleanSky
4bede0d8f5 [YoutubeDL] Don't emit ANSI escape codes on Windows 2017-06-03 19:14:23 +07:00
Sergey M․
f129c3f349 [safari] Fix typo (closes #13252) 2017-06-02 01:03:51 +07:00
Sergey M․
39d4c1be4d [youtube] Improve chapters extraction (closes #13247) 2017-06-01 23:29:45 +07:00
Sergey M․
f7a747ce59 [1tv] Lower preference for http formats (closes #13246) 2017-06-01 22:19:52 +07:00
Sergey M․
4489d41816 [francetv] Relax _VALID_URL 2017-06-01 00:15:15 +07:00
Sergey M․
87b5184a0d [drbonanza] Fix extraction (closes #13231) 2017-05-31 23:56:32 +07:00
Remita Amine
c56ad5c975 [packtpub] Fix authentication(closes #13240) 2017-05-31 15:44:29 +01:00
Sergey M․
6b7ce85cdc [README.md] Mention http_dash_segments protocol 2017-05-30 23:50:48 +07:00
Yen Chi Hsuan
d10d0e3cf8 [README.md] Add an example for how to use .netrc on Windows
That's a Python bug: http://bugs.python.org/issue28334
Most likely it will be fixed in Python 3.7: https://github.com/python/cpython/pull/123
2017-05-29 14:58:07 +08:00
Sergey M․
941ea38ef5 release 2017.05.29 2017-05-29 00:42:18 +07:00
Sergey M․
99bea8d298 [ChangeLog] Actualize 2017-05-29 00:33:56 +07:00
Yen Chi Hsuan
a49eccdfa7 [youtube] Parse player_url if format URLs are encrypted or DASH MPDs are requested
Fixes #13211
2017-05-28 20:20:20 +08:00
Sergey M․
a846173d93 [xhamster] Simplify (closes #13216) 2017-05-28 07:55:56 +07:00
fiocfun
78e210dea5 [xhamster] Fix author and like/dislike count extraction 2017-05-28 07:55:07 +07:00
Sergey M․
8555204274 [xhamster] Extract categories (closes #11728) 2017-05-28 07:50:15 +07:00
Sergey M․
164fcbfeb7 [abcnews] Improve and remove duplicate test (closes #12851) 2017-05-28 07:06:56 +07:00
Tithen-Firion
bc22df29c4 [abcnews] Add support for embed URLs 2017-05-28 07:06:29 +07:00
Sergey M․
7e688d2f6a [gaskrank] Improve (closes #12493) 2017-05-28 06:47:38 +07:00
motophil
5a6d1da442 [gaskrank] Fix extraction 2017-05-28 06:47:30 +07:00
Sergey M․
703751add4 [medialaan] PEP 8 (closes #12774) 2017-05-28 06:27:57 +07:00
midas02
4050be78e5 [medialaan] Fix videos with missing videoUrl
A rough trick to get around the two different json styles medialaan seems to be using.
Fix for these example videos:
https://vtmkzoom.be/video?aid=45724
https://vtmkzoom.be/video?aid=45425
2017-05-28 06:27:52 +07:00
Sergey M․
4d9fc40100 [dvtv] Improve and fix playlists support (closes #13063) 2017-05-28 06:19:54 +07:00
Jan Kundrát
765522345f [dvtv] Parse adaptive formats as well
The old code hit an error when it attempted to parse the string
"adaptive" for video height. Actually parsing the returned playlists is
a good idea because it adds more output formats, including some
audio-only-ones.
2017-05-28 06:19:46 +07:00
Sergey M․
6bceb36b99 [beam] Improve and add support for mixer.com (closes #13032) 2017-05-28 05:43:04 +07:00
Mike Fährmann
1e0d65f0bd [beam:vod] Add extractor 2017-05-28 05:42:23 +07:00
Sergey M․
03327bc9a6 [cbsinteractive] Relax _VALID_URL (closes #13213) 2017-05-27 22:37:24 +07:00
Yen Chi Hsuan
b407d8533d [utils] Drop an compatibility wrapper for Python < 2.6
addinfourl.getcode is added since Python 2.6a1. As youtube-dl now
requires 2.6+, this is no longer necessary.

See 9b0d46db11
2017-05-27 23:05:02 +08:00
Remita Amine
20e2c9de04 [adn] fix formats extraction 2017-05-26 20:00:44 +01:00
Yen Chi Hsuan
d16c0121b9 [youku] Extract more metadata (closes #10433) 2017-05-27 00:08:37 +08:00
Sergey M․
7f4c3a7439 [cbsnews] Fix extraction (closes #13205) 2017-05-26 22:42:27 +07:00
29 changed files with 602 additions and 353 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.05.26*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.05.26**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.05*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.05**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.05.26
[debug] youtube-dl version 2017.06.05
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -217,3 +217,6 @@ Marvin Ewald
Frédéric Bournival
Timendum
gritstub
Adam Voss
Mike Fährmann
Jan Kundrát

View File

@@ -1,3 +1,39 @@
version 2017.06.05
Core
* [YoutubeDL] Don't emit ANSI escape codes on Windows (#13270)
Extractors
+ [bandcamp:weekly] Add support for bandcamp weekly (#12758)
* [pornhub:playlist] Fix extraction (#13281)
- [godtv] Remove extractor (#13175)
* [safari] Fix typo (#13252)
* [youtube] Improve chapters extraction (#13247)
* [1tv] Lower preference for HTTP formats (#13246)
* [francetv] Relax URL regular expression
* [drbonanza] Fix extraction (#13231)
* [packtpub] Fix authentication (#13240)
version 2017.05.29
Extractors
* [youtube] Fix DASH MPD extraction for videos with non-encrypted format URLs
(#13211)
* [xhamster] Fix uploader and like/dislike count extraction (#13216))
+ [xhamster] Extract categories (#11728)
+ [abcnews] Add support for embed URLs (#12851)
* [gaskrank] Fix extraction (#12493)
* [medialaan] Fix videos with missing videoUrl (#12774)
* [dvtv] Fix playlist support
+ [dvtv] Add support for DASH and HLS formats (#3063)
+ [beam:vod] Add support for beam.pro/mixer.com VODs (#13032))
* [cbsinteractive] Relax URL regular expression (#13213)
* [adn] Fix formats extraction
+ [youku] Extract more metadata (#10433)
* [cbsnews] Fix extraction (#13205)
version 2017.05.26
Core

View File

@@ -474,7 +474,10 @@ machine twitch login my_twitch_account_name password my_twitch_password
```
To activate authentication with the `.netrc` file you should pass `--netrc` to youtube-dl or place it in the [configuration file](#configuration).
On Windows you may also need to setup the `%HOME%` environment variable manually.
On Windows you may also need to setup the `%HOME%` environment variable manually. For example:
```
set HOME=%USERPROFILE%
```
# OUTPUT TEMPLATE
@@ -649,7 +652,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `acodec`: Name of the audio codec in use
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `m3u8`, or `m3u8_native`)
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.

View File

@@ -87,13 +87,13 @@
- **bambuser:channel**
- **Bandcamp**
- **Bandcamp:album**
- **Bandcamp:weekly**
- **bangumi.bilibili.com**: BiliBili番剧
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer
- **bbc.co.uk:article**: BBC articles
- **bbc.co.uk:iplayer:playlist**
- **bbc.co.uk:playlist**
- **Beam:live**
- **Beatport**
- **Beeg**
- **BehindKink**
@@ -311,7 +311,6 @@
- **Go**
- **Go90**
- **GodTube**
- **GodTV**
- **Golem**
- **GoogleDrive**
- **Goshgay**
@@ -453,6 +452,8 @@
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **Mixer:live**
- **Mixer:vod**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net

View File

@@ -254,6 +254,13 @@ class TestYoutubeChapters(unittest.TestCase):
'title': '3 - Из серпов луны...[Iz serpov luny]',
}]
),
(
# https://www.youtube.com/watch?v=xZW70zEasOk
# time point more than duration
'''● LCS Spring finals: Saturday and Sunday from <a href="#" onclick="yt.www.watch.player.seekTo(13*60+30);return false;">13:30</a> outside the venue! <br />● PAX East: Fri, Sat & Sun - more info in tomorrows video on the main channel!''',
283,
[]
),
]
def test_youtube_chapters(self):

View File

@@ -498,24 +498,25 @@ class YoutubeDL(object):
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
return
if compat_os_name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
if compat_os_name == 'nt':
if ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
elif 'TERM' in os.environ:
self._write_string('\033]0;%s\007' % message, self._screen_file)
def save_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Save the title on stack
self._write_string('\033[22;0t', self._screen_file)
def restore_console_title(self):
if not self.params.get('consoletitle', False):
return
if 'TERM' in os.environ:
if compat_os_name != 'nt' and 'TERM' in os.environ:
# Restore the title from stack
self._write_string('\033[23;0t', self._screen_file)

View File

@@ -12,7 +12,15 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_VALID_URL = r'''(?x)
https?://
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
)
(?P<id>\d+)
'''
_TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@@ -29,6 +37,9 @@ class AbcNewsVideoIE(AMPIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abcnews.go.com/video/embed?id=46979033',
'only_matching': True,
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,

View File

@@ -15,6 +15,7 @@ from ..utils import (
intlist_to_bytes,
srt_subtitles_timecode,
strip_or_none,
urljoin,
)
@@ -31,25 +32,28 @@ class ADNIE(InfoExtractor):
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
}
}
_BASE_URL = 'http://animedigitalnetwork.fr'
def _get_subtitles(self, sub_path, video_id):
if not sub_path:
return None
enc_subtitles = self._download_webpage(
'http://animedigitalnetwork.fr/' + sub_path,
video_id, fatal=False)
urljoin(self._BASE_URL, sub_path),
video_id, fatal=False, headers={
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0',
})
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(base64.b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\nd\xaf\xd2J\xd0\xfc\xe1\xfc\xdf\xb61\xe8\xe1\xf0\xcc'),
bytes_to_intlist(b'\x1b\xe0\x29\x61\x38\x94\x24\x00\x12\xbd\xc5\x80\xac\xce\xbe\xb0'),
bytes_to_intlist(base64.b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
dec_subtitles[:-compat_ord(dec_subtitles[-1])],
dec_subtitles[:-compat_ord(dec_subtitles[-1])].decode(),
None, fatal=False)
if not subtitles_json:
return None
@@ -103,9 +107,16 @@ class ADNIE(InfoExtractor):
metas = options.get('metas') or {}
title = metas.get('title') or video_info['title']
links = player_config.get('links') or {}
if not links:
links_url = player_config['linksurl']
links_data = self._download_json(urljoin(
self._BASE_URL, links_url), video_id)
links = links_data.get('links') or {}
formats = []
for format_id, qualities in links.items():
if not isinstance(qualities, dict):
continue
for load_balancer_url in qualities.values():
load_balancer_data = self._download_json(
load_balancer_url, video_id, fatal=False) or {}

View File

@@ -14,14 +14,16 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
KNOWN_EXTENSIONS,
parse_filesize,
unescapeHTML,
update_url_query,
unified_strdate,
)
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>.*)'
_VALID_URL = r'https?://.*?\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_TESTS = [{
'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
@@ -155,7 +157,7 @@ class BandcampIE(InfoExtractor):
class BandcampAlbumIE(InfoExtractor):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^?#]+)|/?(?:$|[?#]))'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -222,6 +224,12 @@ class BandcampAlbumIE(InfoExtractor):
'playlist_count': 2,
}]
@classmethod
def suitable(cls, url):
return (False
if BandcampWeeklyIE.suitable(url) or BandcampIE.suitable(url)
else super(BandcampAlbumIE, cls).suitable(url))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader_id = mobj.group('subdomain')
@@ -250,3 +258,92 @@ class BandcampAlbumIE(InfoExtractor):
'title': title,
'entries': entries,
}
class BandcampWeeklyIE(InfoExtractor):
IE_NAME = 'Bandcamp:weekly'
_VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
_TESTS = [{
'url': 'https://bandcamp.com/?show=224',
'md5': 'b00df799c733cf7e0c567ed187dea0fd',
'info_dict': {
'id': '224',
'ext': 'opus',
'title': 'BC Weekly April 4th 2017 - Magic Moments',
'description': 'md5:5d48150916e8e02d030623a48512c874',
'duration': 5829.77,
'release_date': '20170404',
'series': 'Bandcamp Weekly',
'episode': 'Magic Moments',
'episode_number': 208,
'episode_id': '224',
}
}, {
'url': 'https://bandcamp.com/?blah/blah@&show=228',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
show = blob['bcw_show']
# This is desired because any invalid show id redirects to `bandcamp.com`
# which happens to expose the latest Bandcamp Weekly episode.
show_id = int_or_none(show.get('show_id')) or int_or_none(video_id)
formats = []
for format_id, format_url in show['audio_stream'].items():
if not isinstance(format_url, compat_str):
continue
for known_ext in KNOWN_EXTENSIONS:
if known_ext in format_id:
ext = known_ext
break
else:
ext = None
formats.append({
'format_id': format_id,
'url': format_url,
'ext': ext,
'vcodec': 'none',
})
self._sort_formats(formats)
title = show.get('audio_title') or 'Bandcamp Weekly'
subtitle = show.get('subtitle')
if subtitle:
title += ' - %s' % subtitle
episode_number = None
seq = blob.get('bcw_seq')
if seq and isinstance(seq, list):
try:
episode_number = next(
int_or_none(e.get('episode_number'))
for e in seq
if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
except StopIteration:
pass
return {
'id': video_id,
'title': title,
'description': show.get('desc') or show.get('short_desc'),
'duration': float_or_none(show.get('audio_duration')),
'is_live': False,
'release_date': unified_strdate(show.get('published_date')),
'series': 'Bandcamp Weekly',
'episode': show.get('subtitle'),
'episode_number': episode_number,
'episode_id': compat_str(video_id),
'formats': formats
}

View File

@@ -6,18 +6,33 @@ from ..utils import (
ExtractorError,
clean_html,
compat_str,
float_or_none,
int_or_none,
parse_iso8601,
try_get,
urljoin,
)
class BeamProLiveIE(InfoExtractor):
IE_NAME = 'Beam:live'
_VALID_URL = r'https?://(?:\w+\.)?beam\.pro/(?P<id>[^/?#&]+)'
class BeamProBaseIE(InfoExtractor):
_API_BASE = 'https://mixer.com/api/v1'
_RATINGS = {'family': 0, 'teen': 13, '18+': 18}
def _extract_channel_info(self, chan):
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
}
class BeamProLiveIE(BeamProBaseIE):
IE_NAME = 'Mixer:live'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.beam.pro/niterhayven',
'url': 'http://mixer.com/niterhayven',
'info_dict': {
'id': '261562',
'ext': 'mp4',
@@ -38,11 +53,17 @@ class BeamProLiveIE(InfoExtractor):
},
}
_MANIFEST_URL_TEMPLATE = '%s/channels/%%s/manifest.%%s' % BeamProBaseIE._API_BASE
@classmethod
def suitable(cls, url):
return False if BeamProVodIE.suitable(url) else super(BeamProLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = self._match_id(url)
chan = self._download_json(
'https://beam.pro/api/v1/channels/%s' % channel_name, channel_name)
'%s/channels/%s' % (self._API_BASE, channel_name), channel_name)
if chan.get('online') is False:
raise ExtractorError(
@@ -50,24 +71,118 @@ class BeamProLiveIE(InfoExtractor):
channel_id = chan['id']
def manifest_url(kind):
return self._MANIFEST_URL_TEMPLATE % (channel_id, kind)
formats = self._extract_m3u8_formats(
'https://beam.pro/api/v1/channels/%s/manifest.m3u8' % channel_id,
channel_name, ext='mp4', m3u8_id='hls', fatal=False)
manifest_url('m3u8'), channel_name, ext='mp4', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_smil_formats(
manifest_url('smil'), channel_name, fatal=False))
self._sort_formats(formats)
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
info = {
'id': compat_str(chan.get('id') or channel_name),
'title': self._live_title(chan.get('name') or channel_name),
'description': clean_html(chan.get('description')),
'thumbnail': try_get(chan, lambda x: x['thumbnail']['url'], compat_str),
'thumbnail': try_get(
chan, lambda x: x['thumbnail']['url'], compat_str),
'timestamp': parse_iso8601(chan.get('updatedAt')),
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
'is_live': True,
'view_count': int_or_none(chan.get('viewersTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(chan))
return info
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\d+)'
_TEST = {
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
'id': '2259830',
'ext': 'mp4',
'title': 'willow8714\'s Channel',
'duration': 6828.15,
'thumbnail': r're:https://.*source\.png$',
'timestamp': 1494046474,
'upload_date': '20170506',
'uploader': 'willow8714',
'uploader_id': '6085379',
'age_limit': 13,
'view_count': int,
},
'params': {
'skip_download': True,
},
}
@staticmethod
def _extract_format(vod, vod_type):
if not vod.get('baseUrl'):
return []
if vod_type == 'hls':
filename, protocol = 'manifest.m3u8', 'm3u8_native'
elif vod_type == 'raw':
filename, protocol = 'source.mp4', 'https'
else:
assert False
data = vod.get('data') if isinstance(vod.get('data'), dict) else {}
format_id = [vod_type]
if isinstance(data.get('Height'), compat_str):
format_id.append('%sp' % data['Height'])
return [{
'url': urljoin(vod['baseUrl'], filename),
'format_id': '-'.join(format_id),
'ext': 'mp4',
'protocol': protocol,
'width': int_or_none(data.get('Width')),
'height': int_or_none(data.get('Height')),
'fps': int_or_none(data.get('Fps')),
'tbr': int_or_none(data.get('Bitrate'), 1000),
}]
def _real_extract(self, url):
vod_id = self._match_id(url)
vod_info = self._download_json(
'%s/recordings/%s' % (self._API_BASE, vod_id), vod_id)
state = vod_info.get('state')
if state != 'AVAILABLE':
raise ExtractorError(
'VOD %s is not available (state: %s)' % (vod_id, state),
expected=True)
formats = []
thumbnail_url = None
for vod in vod_info['vods']:
vod_type = vod.get('format')
if vod_type in ('hls', 'raw'):
formats.extend(self._extract_format(vod, vod_type))
elif vod_type == 'thumbnail':
thumbnail_url = urljoin(vod.get('baseUrl'), 'source.png')
self._sort_formats(formats)
info = {
'id': vod_id,
'title': vod_info.get('name') or vod_id,
'duration': float_or_none(vod_info.get('duration')),
'thumbnail': thumbnail_url,
'timestamp': parse_iso8601(vod_info.get('createdAt')),
'view_count': int_or_none(vod_info.get('viewsTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(vod_info.get('channel') or {}))
return info

View File

@@ -8,7 +8,7 @@ from ..utils import int_or_none
class CBSInteractiveIE(CBSIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video(?:/share)?)/(?P<id>[^/?]+)'
_TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'info_dict': {
@@ -60,6 +60,9 @@ class CBSInteractiveIE(CBSIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.zdnet.com/video/huawei-matebook-x-video/',
'only_matching': True,
}]
MPX_ACCOUNTS = {

View File

@@ -61,11 +61,17 @@ class CBSNewsIE(CBSIE):
video_info = self._parse_json(self._html_search_regex(
r'(?:<ul class="media-list items" id="media-related-items"><li data-video-info|<div id="cbsNewsVideoPlayer" data-video-player-options)=\'({.+?})\'',
webpage, 'video JSON info'), video_id)
webpage, 'video JSON info', default='{}'), video_id, fatal=False)
item = video_info['item'] if 'item' in video_info else video_info
guid = item['mpxRefId']
return self._extract_video_info(guid, 'cbsnews')
if video_info:
item = video_info['item'] if 'item' in video_info else video_info
else:
state = self._parse_json(self._search_regex(
r'data-cbsvideoui-options=(["\'])(?P<json>{.+?})\1', webpage,
'playlist JSON info', group='json'), video_id)['state']
item = state['playlist'][state['pid']]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
class CBSNewsLiveVideoIE(InfoExtractor):

View File

@@ -1,135 +1,59 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
js_to_json,
parse_duration,
unescapeHTML,
)
class DRBonanzaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/bonanza/(?:[^/]+/)+(?:[^/])+?(?:assetId=(?P<id>\d+))?(?:[#&]|$)'
_TESTS = [{
'url': 'http://www.dr.dk/bonanza/serie/portraetter/Talkshowet.htm?assetId=65517',
_VALID_URL = r'https?://(?:www\.)?dr\.dk/bonanza/[^/]+/\d+/[^/]+/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'http://www.dr.dk/bonanza/serie/154/matador/40312/matador---0824-komme-fremmede-',
'info_dict': {
'id': '65517',
'id': '40312',
'display_id': 'matador---0824-komme-fremmede-',
'ext': 'mp4',
'title': 'Talkshowet - Leonard Cohen',
'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca',
'title': 'MATADOR - 08:24. "Komme fremmede".',
'description': 'md5:77b4c1ac4d4c1b9d610ab4395212ff84',
'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1295537932,
'upload_date': '20110120',
'duration': 3664,
'duration': 4613,
},
'params': {
'skip_download': True, # requires rtmp
},
}, {
'url': 'http://www.dr.dk/bonanza/radio/serie/sport/fodbold.htm?assetId=59410',
'md5': '6dfe039417e76795fb783c52da3de11d',
'info_dict': {
'id': '59410',
'ext': 'mp3',
'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission',
'description': 'md5:501e5a195749480552e214fbbed16c4e',
'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
'timestamp': 1223274900,
'upload_date': '20081006',
'duration': 7369,
},
}]
}
def _real_extract(self, url):
url_id = self._match_id(url)
webpage = self._download_webpage(url, url_id)
mobj = re.match(self._VALID_URL, url)
video_id, display_id = mobj.group('id', 'display_id')
if url_id:
info = json.loads(self._html_search_regex(r'({.*?%s.*})' % url_id, webpage, 'json'))
else:
# Just fetch the first video on that page
info = json.loads(self._html_search_regex(r'bonanzaFunctions.newPlaylist\(({.*})\)', webpage, 'json'))
webpage = self._download_webpage(url, display_id)
asset_id = str(info['AssetId'])
title = info['Title'].rstrip(' \'\"-,.:;!?')
duration = int_or_none(info.get('Duration'), scale=1000)
# First published online. "FirstPublished" contains the date for original airing.
timestamp = parse_iso8601(
re.sub(r'\.\d+$', '', info['Created']))
info = self._parse_html5_media_entries(
url, webpage, display_id, m3u8_id='hls',
m3u8_entry_protocol='m3u8_native')[0]
self._sort_formats(info['formats'])
def parse_filename_info(url):
match = re.search(r'/\d+_(?P<width>\d+)x(?P<height>\d+)x(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'width': int(match.group('width')),
'height': int(match.group('height')),
'vbr': int(match.group('bitrate')),
'ext': match.group('ext')
}
match = re.search(r'/\d+_(?P<bitrate>\d+)K\.(?P<ext>\w+)$', url)
if match:
return {
'vbr': int(match.group('bitrate')),
'ext': match.group(2)
}
return {}
asset = self._parse_json(
self._search_regex(
r'(?s)currentAsset\s*=\s*({.+?})\s*</script', webpage, 'asset'),
display_id, transform_source=js_to_json)
video_types = ['VideoHigh', 'VideoMid', 'VideoLow']
preferencemap = {
'VideoHigh': -1,
'VideoMid': -2,
'VideoLow': -3,
'Audio': -4,
}
title = unescapeHTML(asset['AssetTitle']).strip()
formats = []
for file in info['Files']:
if info['Type'] == 'Video':
if file['Type'] in video_types:
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'].replace('Video', ''),
'preference': preferencemap.get(file['Type'], -10),
})
if format['url'].startswith('rtmp'):
rtmp_url = format['url']
format['rtmp_live'] = True # --resume does not work
if '/bonanza/' in rtmp_url:
format['play_path'] = rtmp_url.split('/bonanza/')[1]
formats.append(format)
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
elif info['Type'] == 'Audio':
if file['Type'] == 'Audio':
format = parse_filename_info(file['Location'])
format.update({
'url': file['Location'],
'format_id': file['Type'],
'vcodec': 'none',
})
formats.append(format)
elif file['Type'] == 'Thumb':
thumbnail = file['Location']
def extract(field):
return self._search_regex(
r'<div[^>]+>\s*<p>%s:<p>\s*</div>\s*<div[^>]+>\s*<p>([^<]+)</p>' % field,
webpage, field, default=None)
description = '%s\n%s\n%s\n' % (
info['Description'], info['Actors'], info['Colophon'])
self._sort_formats(formats)
display_id = re.sub(r'[^\w\d-]', '', re.sub(r' ', '-', title.lower())) + '-' + asset_id
display_id = re.sub(r'-+', '-', display_id)
return {
'id': asset_id,
info.update({
'id': asset.get('AssetId') or video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
}
'description': extract('Programinfo'),
'duration': parse_duration(extract('Tid')),
'thumbnail': asset.get('AssetImageUrl'),
})
return info

View File

@@ -5,9 +5,12 @@ import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
unescapeHTML,
determine_ext,
ExtractorError,
int_or_none,
js_to_json,
mimetype2ext,
unescapeHTML,
)
@@ -24,14 +27,7 @@ class DVTVIE(InfoExtractor):
'id': 'dc0768de855511e49e4b0025900fea04',
'ext': 'mp4',
'title': 'Vondra o Českém století: Při pohledu na Havla mi bylo trapně',
}
}, {
'url': 'http://video.aktualne.cz/dvtv/stropnicky-policie-vrbetice-preventivne-nekontrolovala/r~82ed4322849211e4a10c0025900fea04/',
'md5': '6388f1941b48537dbd28791f712af8bf',
'info_dict': {
'id': '72c02230849211e49f60002590604f2e',
'ext': 'mp4',
'title': 'Stropnický: Policie Vrbětice preventivně nekontrolovala',
'duration': 1484,
}
}, {
'url': 'http://video.aktualne.cz/dvtv/dvtv-16-12-2014-utok-talibanu-boj-o-kliniku-uprchlici/r~973eb3bc854e11e498be002590604f2e/',
@@ -44,55 +40,100 @@ class DVTVIE(InfoExtractor):
'info_dict': {
'id': 'b0b40906854d11e4bdad0025900fea04',
'ext': 'mp4',
'title': 'Drtinová Veselovský TV 16. 12. 2014: Témata dne'
'title': 'Drtinová Veselovský TV 16. 12. 2014: Témata dne',
'description': 'md5:0916925dea8e30fe84222582280b47a0',
'timestamp': 1418760010,
'upload_date': '20141216',
}
}, {
'md5': '5f7652a08b05009c1292317b449ffea2',
'info_dict': {
'id': '420ad9ec854a11e4bdad0025900fea04',
'ext': 'mp4',
'title': 'Školní masakr možná změní boj s Talibanem, říká novinářka'
'title': 'Školní masakr možná změní boj s Talibanem, říká novinářka',
'description': 'md5:ff2f9f6de73c73d7cef4f756c1c1af42',
'timestamp': 1418760010,
'upload_date': '20141216',
}
}, {
'md5': '498eb9dfa97169f409126c617e2a3d64',
'info_dict': {
'id': '95d35580846a11e4b6d20025900fea04',
'ext': 'mp4',
'title': 'Boj o kliniku: Veřejný zájem, nebo právo na majetek?'
'title': 'Boj o kliniku: Veřejný zájem, nebo právo na majetek?',
'description': 'md5:889fe610a70fee5511dc3326a089188e',
'timestamp': 1418760010,
'upload_date': '20141216',
}
}, {
'md5': 'b8dc6b744844032dab6ba3781a7274b9',
'info_dict': {
'id': '6fe14d66853511e4833a0025900fea04',
'ext': 'mp4',
'title': 'Pánek: Odmítání syrských uprchlíků je ostudou české vlády'
'title': 'Pánek: Odmítání syrských uprchlíků je ostudou české vlády',
'description': 'md5:544f86de6d20c4815bea11bf2ac3004f',
'timestamp': 1418760010,
'upload_date': '20141216',
}
}],
}, {
'url': 'https://video.aktualne.cz/dvtv/zeman-si-jen-leci-mindraky-sobotku-nenavidi-a-babis-se-mu-te/r~960cdb3a365a11e7a83b0025900fea04/',
'md5': 'f8efe9656017da948369aa099788c8ea',
'info_dict': {
'id': '3c496fec365911e7a6500025900fea04',
'ext': 'mp4',
'title': 'Zeman si jen léčí mindráky, Sobotku nenávidí a Babiš se mu teď hodí, tvrdí Kmenta',
'duration': 1103,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/',
'only_matching': True,
}]
def _parse_video_metadata(self, js, video_id):
metadata = self._parse_json(js, video_id, transform_source=js_to_json)
data = self._parse_json(js, video_id, transform_source=js_to_json)
title = unescapeHTML(data['title'])
formats = []
for video in metadata['sources']:
ext = video['type'][6:]
formats.append({
'url': video['file'],
'ext': ext,
'format_id': '%s-%s' % (ext, video['label']),
'height': int(video['label'].rstrip('p')),
'fps': 25,
})
for video in data['sources']:
video_url = video.get('file')
if not video_url:
continue
video_type = video.get('type')
ext = determine_ext(video_url, mimetype2ext(video_type))
if video_type == 'application/vnd.apple.mpegurl' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif video_type == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
else:
label = video.get('label')
height = self._search_regex(
r'^(\d+)[pP]', label or '', 'height', default=None)
format_id = ['http']
for f in (ext, label):
if f:
format_id.append(f)
formats.append({
'url': video_url,
'format_id': '-'.join(format_id),
'height': int_or_none(height),
})
self._sort_formats(formats)
return {
'id': metadata['mediaid'],
'title': unescapeHTML(metadata['title']),
'thumbnail': self._proto_relative_url(metadata['image'], 'http:'),
'id': data.get('mediaid') or video_id,
'title': title,
'description': data.get('description'),
'thumbnail': data.get('image'),
'duration': int_or_none(data.get('duration')),
'timestamp': int_or_none(data.get('pubtime')),
'formats': formats
}
@@ -103,7 +144,7 @@ class DVTVIE(InfoExtractor):
# single video
item = self._search_regex(
r"(?s)embedData[0-9a-f]{32}\['asset'\]\s*=\s*(\{.+?\});",
r'(?s)embedData[0-9a-f]{32}\[["\']asset["\']\]\s*=\s*(\{.+?\});',
webpage, 'video', default=None, fatal=False)
if item:
@@ -113,6 +154,8 @@ class DVTVIE(InfoExtractor):
items = re.findall(
r"(?s)BBX\.context\.assets\['[0-9a-f]{32}'\]\.push\(({.+?})\);",
webpage)
if not items:
items = re.findall(r'(?s)var\s+asset\s*=\s*({.+?});\n', webpage)
if items:
return {

View File

@@ -90,7 +90,7 @@ from .azmedien import (
)
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
@@ -98,7 +98,10 @@ from .bbc import (
BBCCoUkPlaylistIE,
BBCIE,
)
from .beampro import BeamProLiveIE
from .beampro import (
BeamProLiveIE,
BeamProVodIE,
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .bellmedia import BellMediaIE
@@ -389,7 +392,6 @@ from .globo import (
from .go import GoIE
from .go90 import Go90IE
from .godtube import GodTubeIE
from .godtv import GodTVIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE

View File

@@ -102,6 +102,8 @@ class FirstTVIE(InfoExtractor):
'format_id': f.get('name'),
'tbr': tbr,
'source_preference': quality(f.get('name')),
# quality metadata of http formats may be incorrect
'preference': -1,
})
# m3u8 URL format is reverse engineered from [1] (search for
# master.m3u8). dashEdges (that is currently balancer-vod.1tv.ru)

View File

@@ -112,7 +112,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
class FranceTVIE(FranceTVBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)+(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:(?:www\.)?france\.tv|mobile\.france\.tv)/(?:[^/]+/)*(?P<id>[^/]+)\.html'
_TESTS = [{
'url': 'https://www.france.tv/france-2/13h15-le-dimanche/140921-les-mysteres-de-jesus.html',
@@ -157,6 +157,9 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
}, {
'url': 'https://mobile.france.tv/france-5/c-dans-l-air/137347-emission-du-vendredi-12-mai-2017.html',
'only_matching': True,
}, {
'url': 'https://www.france.tv/142749-rouge-sang.html',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -6,62 +6,52 @@ from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
js_to_json,
unified_strdate,
)
class GaskrankIE(InfoExtractor):
"""InfoExtractor for gaskrank.tv"""
_VALID_URL = r'https?://(?:www\.)?gaskrank\.tv/tv/(?P<categories>[^/]+)/(?P<id>[^/]+)\.html?'
_TESTS = [
{
'url': 'http://www.gaskrank.tv/tv/motorrad-fun/strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden.htm',
'md5': '1ae88dbac97887d85ebd1157a95fc4f9',
'info_dict': {
'id': '201601/26955',
'ext': 'mp4',
'title': 'Strike! Einparken können nur Männer - Flurschaden hält sich in Grenzen *lol*',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['motorrad-fun'],
'display_id': 'strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden',
'uploader_id': 'Bikefun',
'upload_date': '20170110',
'uploader_url': None,
}
},
{
'url': 'http://www.gaskrank.tv/tv/racing/isle-of-man-tt-2011-michael-du-15920.htm',
'md5': 'c33ee32c711bc6c8224bfcbe62b23095',
'info_dict': {
'id': '201106/15920',
'ext': 'mp4',
'title': 'Isle of Man - Michael Dunlop vs Guy Martin - schwindelig kucken',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['racing'],
'display_id': 'isle-of-man-tt-2011-michael-du-15920',
'uploader_id': 'IOM',
'upload_date': '20160506',
'uploader_url': 'www.iomtt.com',
}
_VALID_URL = r'https?://(?:www\.)?gaskrank\.tv/tv/(?P<categories>[^/]+)/(?P<id>[^/]+)\.htm'
_TESTS = [{
'url': 'http://www.gaskrank.tv/tv/motorrad-fun/strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden.htm',
'md5': '1ae88dbac97887d85ebd1157a95fc4f9',
'info_dict': {
'id': '201601/26955',
'ext': 'mp4',
'title': 'Strike! Einparken können nur Männer - Flurschaden hält sich in Grenzen *lol*',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['motorrad-fun'],
'display_id': 'strike-einparken-durch-anfaenger-crash-mit-groesserem-flurschaden',
'uploader_id': 'Bikefun',
'upload_date': '20170110',
'uploader_url': None,
}
]
}, {
'url': 'http://www.gaskrank.tv/tv/racing/isle-of-man-tt-2011-michael-du-15920.htm',
'md5': 'c33ee32c711bc6c8224bfcbe62b23095',
'info_dict': {
'id': '201106/15920',
'ext': 'mp4',
'title': 'Isle of Man - Michael Dunlop vs Guy Martin - schwindelig kucken',
'thumbnail': r're:^https?://.*\.jpg$',
'categories': ['racing'],
'display_id': 'isle-of-man-tt-2011-michael-du-15920',
'uploader_id': 'IOM',
'upload_date': '20170523',
'uploader_url': 'www.iomtt.com',
}
}]
def _real_extract(self, url):
"""extract information from gaskrank.tv"""
def fix_json(code):
"""Removes trailing comma in json: {{},} --> {{}}"""
return re.sub(r',\s*}', r'}', js_to_json(code))
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._og_search_title(
webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True)
categories = [re.match(self._VALID_URL, url).group('categories')]
title = self._search_regex(
r'movieName\s*:\s*\'([^\']*)\'',
webpage, 'title')
thumbnail = self._search_regex(
r'poster\s*:\s*\'([^\']*)\'',
webpage, 'thumbnail', default=None)
mobj = re.search(
r'Video von:\s*(?P<uploader_id>[^|]*?)\s*\|\s*vom:\s*(?P<upload_date>[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9])',
@@ -89,29 +79,14 @@ class GaskrankIE(InfoExtractor):
if average_rating:
average_rating = float_or_none(average_rating.replace(',', '.'))
playlist = self._parse_json(
self._search_regex(
r'playlist\s*:\s*\[([^\]]*)\]',
webpage, 'playlist', default='{}'),
display_id, transform_source=fix_json, fatal=False)
video_id = self._search_regex(
r'https?://movies\.gaskrank\.tv/([^-]*?)(-[^\.]*)?\.mp4',
playlist.get('0').get('src'), 'video id')
webpage, 'video id', default=display_id)
formats = []
for key in playlist:
formats.append({
'url': playlist[key]['src'],
'format_id': key,
'quality': playlist[key].get('quality')})
self._sort_formats(formats, field_preference=['format_id'])
return {
entry = self._parse_html5_media_entries(url, webpage, video_id)[0]
entry.update({
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'categories': categories,
'display_id': display_id,
'uploader_id': uploader_id,
@@ -120,4 +95,7 @@ class GaskrankIE(InfoExtractor):
'tags': tags,
'view_count': view_count,
'average_rating': average_rating,
}
})
self._sort_formats(entry['formats'])
return entry

View File

@@ -1,66 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import js_to_json
class GodTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?god\.tv(?:/[^/]+)*/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://god.tv/jesus-image/video/jesus-conference-2016/randy-needham',
'info_dict': {
'id': 'lpd3g2MzE6D1g8zFAKz8AGpxWcpu6o_3',
'ext': 'mp4',
'title': 'Randy Needham',
'duration': 3615.08,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://god.tv/playlist/bible-study',
'info_dict': {
'id': 'bible-study',
},
'playlist_mincount': 37,
}, {
'url': 'http://god.tv/node/15097',
'only_matching': True,
}, {
'url': 'http://god.tv/live/africa',
'only_matching': True,
}, {
'url': 'http://god.tv/liveevents',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'settings', default='{}'),
display_id, transform_source=js_to_json, fatal=False)
ooyala_id = None
if settings:
playlist = settings.get('playlist')
if playlist and isinstance(playlist, list):
entries = [
OoyalaIE._build_url_result(video['content_id'])
for video in playlist if video.get('content_id')]
if entries:
return self.playlist_result(entries, display_id)
ooyala_id = settings.get('ooyala', {}).get('content_id')
if not ooyala_id:
ooyala_id = self._search_regex(
r'["\']content_id["\']\s*:\s*(["\'])(?P<id>[\w-]+)\1',
webpage, 'ooyala id', group='id')
return OoyalaIE._build_url_result(ooyala_id)

View File

@@ -17,7 +17,7 @@ from ..utils import (
class MedialaanIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:www\.|nieuws\.)?
(?:
(?P<site_id>vtm|q2|vtmkzoom)\.be/
(?:
@@ -85,6 +85,22 @@ class MedialaanIE(InfoExtractor):
# clip
'url': 'http://vtmkzoom.be/k3-dansstudio/een-nieuw-seizoen-van-k3-dansstudio',
'only_matching': True,
}, {
# http/s redirect
'url': 'https://vtmkzoom.be/video?aid=45724',
'info_dict': {
'id': '257136373657000',
'ext': 'mp4',
'title': 'K3 Dansstudio Ushuaia afl.6',
},
'params': {
'skip_download': True,
},
'skip': 'Requires account credentials',
}, {
# nieuws.vtm.be
'url': 'https://nieuws.vtm.be/stadion/stadion/genk-nog-moeilijk-programma',
'only_matching': True,
}]
def _real_initialize(self):
@@ -146,6 +162,8 @@ class MedialaanIE(InfoExtractor):
video_id, transform_source=lambda s: '[%s]' % s, fatal=False)
if player:
video = player[-1]
if video['videoUrl'] in ('http', 'https'):
return self.url_result(video['url'], MedialaanIE.ie_key())
info = {
'id': video_id,
'url': video['videoUrl'],

View File

@@ -1,5 +1,6 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -14,7 +15,6 @@ from ..utils import (
strip_or_none,
unified_timestamp,
urljoin,
urlencode_postdata,
)
@@ -45,22 +45,15 @@ class PacktPubIE(PacktPubBaseIE):
(username, password) = self._get_login_info()
if username is None:
return
webpage = self._download_webpage(self._PACKT_BASE, None)
login_form = self._form_hidden_inputs(
'packt-user-login-form', webpage)
login_form.update({
'email': username,
'password': password,
})
self._download_webpage(
self._PACKT_BASE, None, 'Logging in as %s' % username,
data=urlencode_postdata(login_form))
try:
self._TOKEN = self._download_json(
'%s/users/tokens/sessions' % self._MAPT_REST, None,
'Downloading Authorization Token')['data']['token']
self._MAPT_REST + '/users/tokens', None,
'Downloading Authorization Token', data=json.dumps({
'email': username,
'password': password,
}).encode())['data']['access']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 404):
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 404):
message = self._parse_json(e.cause.read().decode(), None)['message']
raise ExtractorError(message, expected=True)
raise
@@ -83,7 +76,7 @@ class PacktPubIE(PacktPubBaseIE):
headers = {}
if self._TOKEN:
headers['Authorization'] = self._TOKEN
headers['Authorization'] = 'Bearer ' + self._TOKEN
video = self._download_json(
'%s/users/me/products/%s/chapters/%s/sections/%s'
% (self._MAPT_REST, course_id, chapter_id, video_id), video_id,

View File

@@ -252,11 +252,14 @@ class PornHubPlaylistBaseIE(InfoExtractor):
playlist = self._parse_json(
self._search_regex(
r'playlistObject\s*=\s*({.+?});', webpage, 'playlist'),
playlist_id)
r'(?:playlistObject|PLAYLIST_VIEW)\s*=\s*({.+?});', webpage,
'playlist', default='{}'),
playlist_id, fatal=False)
title = playlist.get('title') or self._search_regex(
r'>Videos\s+in\s+(.+?)\s+[Pp]laylist<', webpage, 'title', fatal=False)
return self.playlist_result(
entries, playlist_id, playlist.get('title'), playlist.get('description'))
entries, playlist_id, title, playlist.get('description'))
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
@@ -296,6 +299,7 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage)
if not page_entries:
break

View File

@@ -67,7 +67,7 @@ class SafariBaseIE(InfoExtractor):
'Login failed; make sure your credentials are correct and try again.',
expected=True)
SafariBaseIE.LOGGED_IN = True
self.LOGGED_IN = True
self.to_screen('Login successful')

View File

@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
dict_get,
ExtractorError,
int_or_none,
@@ -25,6 +26,7 @@ class XHamsterIE(InfoExtractor):
'uploader': 'Ruseful2011',
'duration': 893,
'age_limit': 18,
'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy'],
},
}, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
@@ -36,6 +38,7 @@ class XHamsterIE(InfoExtractor):
'uploader': 'jojo747400',
'duration': 200,
'age_limit': 18,
'categories': ['Britney Spears', 'Celebrities', 'HD Videos', 'Sexy', 'Sexy Booty'],
},
'params': {
'skip_download': True,
@@ -51,6 +54,7 @@ class XHamsterIE(InfoExtractor):
'uploader': 'parejafree',
'duration': 72,
'age_limit': 18,
'categories': ['Amateur', 'Blowjobs'],
},
'params': {
'skip_download': True,
@@ -104,7 +108,7 @@ class XHamsterIE(InfoExtractor):
webpage, 'upload date', fatal=False))
uploader = self._html_search_regex(
r'<span[^>]+itemprop=["\']author[^>]+><a[^>]+href=["\'].+?xhamster\.com/user/[^>]+>(?P<uploader>.+?)</a>',
r'<span[^>]+itemprop=["\']author[^>]+><a[^>]+><span[^>]+>([^<]+)',
webpage, 'uploader', default='anonymous')
thumbnail = self._search_regex(
@@ -120,7 +124,7 @@ class XHamsterIE(InfoExtractor):
r'content=["\']User(?:View|Play)s:(\d+)',
webpage, 'view count', fatal=False))
mobj = re.search(r"hint='(?P<likecount>\d+) Likes / (?P<dislikecount>\d+) Dislikes'", webpage)
mobj = re.search(r'hint=[\'"](?P<likecount>\d+) Likes / (?P<dislikecount>\d+) Dislikes', webpage)
(like_count, dislike_count) = (mobj.group('likecount'), mobj.group('dislikecount')) if mobj else (None, None)
mobj = re.search(r'</label>Comments \((?P<commentcount>\d+)\)</div>', webpage)
@@ -152,6 +156,12 @@ class XHamsterIE(InfoExtractor):
self._sort_formats(formats)
categories_html = self._search_regex(
r'(?s)<table.+?(<span>Categories:.+?)</table>', webpage,
'categories', default=None)
categories = [clean_html(category) for category in re.findall(
r'<a[^>]+>(.+?)</a>', categories_html)] if categories_html else None
return {
'id': video_id,
'title': title,
@@ -165,6 +175,7 @@ class XHamsterIE(InfoExtractor):
'dislike_count': int_or_none(dislike_count),
'comment_count': int_or_none(comment_count),
'age_limit': age_limit,
'categories': categories,
'formats': formats,
}

View File

@@ -12,6 +12,7 @@ from ..utils import (
ExtractorError,
get_element_by_class,
js_to_json,
str_or_none,
strip_jsonp,
urljoin,
)
@@ -36,6 +37,12 @@ class YoukuIE(InfoExtractor):
'id': 'XMTc1ODE5Njcy',
'title': '★Smile﹗♡ Git Fresh -Booty Music舞蹈.',
'ext': 'mp4',
'duration': 74.73,
'thumbnail': r're:^https?://.*',
'uploader': '。躲猫猫、',
'uploader_id': '36017967',
'uploader_url': 'http://i.youku.com/u/UMTQ0MDcxODY4',
'tags': list,
}
}, {
'url': 'http://player.youku.com/player.php/sid/XNDgyMDQ2NTQw/v.swf',
@@ -46,6 +53,12 @@ class YoukuIE(InfoExtractor):
'id': 'XODgxNjg1Mzk2',
'ext': 'mp4',
'title': '武媚娘传奇 85',
'duration': 1999.61,
'thumbnail': r're:^https?://.*',
'uploader': '疯狂豆花',
'uploader_id': '62583473',
'uploader_url': 'http://i.youku.com/u/UMjUwMzMzODky',
'tags': list,
},
}, {
'url': 'http://v.youku.com/v_show/id_XMTI1OTczNDM5Mg==.html',
@@ -53,6 +66,12 @@ class YoukuIE(InfoExtractor):
'id': 'XMTI1OTczNDM5Mg',
'ext': 'mp4',
'title': '花千骨 04',
'duration': 2363,
'thumbnail': r're:^https?://.*',
'uploader': '放剧场-花千骨',
'uploader_id': '772849359',
'uploader_url': 'http://i.youku.com/u/UMzA5MTM5NzQzNg==',
'tags': list,
},
}, {
'url': 'http://v.youku.com/v_show/id_XNjA1NzA2Njgw.html',
@@ -61,6 +80,12 @@ class YoukuIE(InfoExtractor):
'id': 'XNjA1NzA2Njgw',
'ext': 'mp4',
'title': '邢義田复旦讲座之想象中的胡人—从“左衽孔子”说起',
'duration': 7264.5,
'thumbnail': r're:^https?://.*',
'uploader': 'FoxJin1006',
'uploader_id': '322014285',
'uploader_url': 'http://i.youku.com/u/UMTI4ODA1NzE0MA==',
'tags': list,
},
'params': {
'videopassword': '100600',
@@ -72,6 +97,12 @@ class YoukuIE(InfoExtractor):
'id': 'XOTUxMzg4NDMy',
'ext': 'mp4',
'title': '我的世界☆明月庄主☆车震猎杀☆杀人艺术Minecraft',
'duration': 702.08,
'thumbnail': r're:^https?://.*',
'uploader': '明月庄主moon',
'uploader_id': '38465621',
'uploader_url': 'http://i.youku.com/u/UMTUzODYyNDg0',
'tags': list,
},
}, {
'url': 'http://video.tudou.com/v/XMjIyNzAzMTQ4NA==.html?f=46177805',
@@ -79,6 +110,12 @@ class YoukuIE(InfoExtractor):
'id': 'XMjIyNzAzMTQ4NA',
'ext': 'mp4',
'title': '卡马乔国足开大脚长传冲吊集锦',
'duration': 289,
'thumbnail': r're:^https?://.*',
'uploader': '阿卜杜拉之星',
'uploader_id': '2382249',
'uploader_url': 'http://i.youku.com/u/UOTUyODk5Ng==',
'tags': list,
},
}, {
'url': 'http://video.tudou.com/v/XMjE4ODI3OTg2MA==.html',
@@ -154,7 +191,8 @@ class YoukuIE(InfoExtractor):
raise ExtractorError(msg)
# get video title
title = data['video']['title']
video_data = data['video']
title = video_data['title']
formats = [{
'url': stream['m3u8_url'],
@@ -171,6 +209,12 @@ class YoukuIE(InfoExtractor):
'id': video_id,
'title': title,
'formats': formats,
'duration': video_data.get('seconds'),
'thumbnail': video_data.get('logo'),
'uploader': video_data.get('username'),
'uploader_id': str_or_none(video_data.get('userid')),
'uploader_url': data.get('uploader', {}).get('homepage'),
'tags': video_data.get('tags'),
}

View File

@@ -1353,10 +1353,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
start_time = parse_duration(time_point)
if start_time is None:
continue
if start_time > duration:
break
end_time = (duration if next_num == len(chapter_lines)
else parse_duration(chapter_lines[next_num][1]))
if end_time is None:
continue
if end_time > duration:
end_time = duration
if start_time > end_time:
break
chapter_title = re.sub(
r'<a[^>]+>[^<]+</a>', '', chapter_line).strip(' \t-')
chapter_title = re.sub(r'\s+', ' ', chapter_title)
@@ -1715,12 +1721,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
format_id = url_data['itag'][0]
url = url_data['url'][0]
if 'sig' in url_data:
url += '&signature=' + url_data['sig'][0]
elif 's' in url_data:
encrypted_sig = url_data['s'][0]
if 's' in url_data or self._downloader.params.get('youtube_include_dash_manifest', True):
ASSETS_RE = r'"assets":.+?"js":\s*("[^"]+")'
jsplayer_url_json = self._search_regex(
ASSETS_RE,
embed_webpage if age_gate else video_webpage,
@@ -1741,6 +1743,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_webpage, 'age gate player URL')
player_url = json.loads(player_url_json)
if 'sig' in url_data:
url += '&signature=' + url_data['sig'][0]
elif 's' in url_data:
encrypted_sig = url_data['s'][0]
if self._downloader.params.get('verbose'):
if player_url is None:
player_version = 'unknown'

View File

@@ -932,14 +932,6 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
except zlib.error:
return zlib.decompress(data)
@staticmethod
def addinfourl_wrapper(stream, headers, url, code):
if hasattr(compat_urllib_request.addinfourl, 'getcode'):
return compat_urllib_request.addinfourl(stream, headers, url, code)
ret = compat_urllib_request.addinfourl(stream, headers, url)
ret.code = code
return ret
def http_request(self, req):
# According to RFC 3986, URLs can not contain non-ASCII characters, however this is not
# always respected by websites, some tend to give out URLs with non percent-encoded
@@ -991,13 +983,13 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
break
else:
raise original_ioerror
resp = self.addinfourl_wrapper(uncompressed, old_resp.headers, old_resp.url, old_resp.code)
resp = compat_urllib_request.addinfourl(uncompressed, old_resp.headers, old_resp.url, old_resp.code)
resp.msg = old_resp.msg
del resp.headers['Content-encoding']
# deflate
if resp.headers.get('Content-encoding', '') == 'deflate':
gz = io.BytesIO(self.deflate(resp.read()))
resp = self.addinfourl_wrapper(gz, old_resp.headers, old_resp.url, old_resp.code)
resp = compat_urllib_request.addinfourl(gz, old_resp.headers, old_resp.url, old_resp.code)
resp.msg = old_resp.msg
del resp.headers['Content-encoding']
# Percent-encode redirect URL of Location HTTP header to satisfy RFC 3986 (see

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.05.26'
__version__ = '2017.06.05'