How to webm (or mp4) for 4chan
If you just care about the webm software, skip down to the webm for 4chan section at the end.
If you want to learn how to webm, read on.
Contents:
- Quick Reference
- Codecs and Containers
- webm vs mp4
- Size and Length Limits
- yt-dlp
- ffmpeg
- Audio
- Making Precisely Sized webms
- Resolution
- Subtitles
- webm for 4chan
Quick Reference
tl;dr here’s what you do up front
- Use vp9 video
- Use opus audio
  - 192k for music
  - 96k for everything else
  - 56k mono if you need to save space
- Pick a resolution, use lookup table for ballpark
- Hopefully you’re using a tool that can calculate the target size, but if not:
  - Calculate the video bitrate using the method described below
  - Or use the lookup table:
| Duration | Resolution | Bitrate (6 MiB) | Bitrate (4 MiB w/ audio) | Bitrate (4 MiB no audio) |
|---|---|---|---|---|
| < 30 seconds | 1920x1080 | 1500k | 990k | 1090k |
| 0:30 - 0:45 | 1600x900 | 1000k | 630k | 725k |
| 0:45 - 1:15 | 1440x810 | 550k | 335k | 435k |
| 1:15 - 2:00 | 1280x720 | 310k | 175k | 270k |
| 2:00 - 2:30 | 1024x576 | 230k | 120k | - |
| 2:30 - 3:00 | 960x540 | 175k | 80k | - |
| 3:00 - 4:00 | 854x480 | 100k | 35k | - |
| 4:00 - 5:00 | 736x414 | 60k | 8k | - |
| 5:00 - 6:00 | 640x360 | 35k | - | - |
| 6:00 - 6:40 | 480x270 | 20k | - | - |
Codecs and Containers
webm is a container format that is a modified form of .mkv (Matroska).
That means there is not just one type of webm: it can be any combination of a supported video track and a supported audio track.
For the purpose of 4chan, you have vp8 or vp9 for video, vorbis or opus for audio.
Keep in mind that webm can also contain av1 but this isn’t allowed on 4chan as of the time of this writing.
webm also supports embedded vtt subtitles, but 4chan doesn’t support those either.
Metadata is also stripped by 4chan, so don’t bother.
If you come across a webm (or any container) and you want to know what’s in it, use ffprobe.
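For example, this one-liner lists the codec of each stream (the output flags here are just one way to do it, mirroring the ffprobe invocations used later in this guide):

ffprobe -v error -show_entries stream=codec_name,codec_type -of csv=p=0 input.webm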
Basically you want to use vp9 + opus in all circumstances. vp8 and vorbis are legacy codecs that perform objectively worse.
webm vs mp4
4chan supports only h264 video with aac audio for mp4, so for the purposes of comparison, I’m talking about vp9+opus webm vs h264+aac mp4.
Side note: If you mix vp9+aac or h264+opus, you’re gonna create an mkv that can’t be posted to 4chan.
- Contrary to popular belief, vp9 is not objectively superior to h264 in all circumstances.
- Generally speaking, opus is better than aac. So if you’re going for music, like a static image with high quality audio, use webm.
- h264 encoding is much faster than vp9
- h264 better preserves film grain and fine detail at an adequate bitrate, meaning that mp4 is better for short high quality clips of live action movies and the like
- vp9 tends to smooth away film grain, but that works in its favor for digital animation, where there are large blocks of solid color
- vp9 is better at really low bitrates, meaning that webm really shines when you have a long video that is super crunched
With all that said, the strategies for encoding mp4 are substantially the same as webm, so you can apply the same knowledge regardless of which you choose.
Generally speaking, I recommend webm by default and mp4 if your clip is very short or you know what you’re doing.
Regarding YouTube
- Generally speaking, YouTube videos have h264, vp9, or av1 video streams, and audio that is opus or aac-lc/mp4a.40.2.
- Livestreams are h264+aac, and after the stream ends, they are re-encoded to vp9+opus. Apparently YouTube determined that the space savings from re-encoding to vp9 were worth it for playback, which lends weight to the argument that webm is a good default.
- Short videos are often encoded to av1+opus, so if you download a webm from YouTube and it’s under the 4chan size limit, don’t be surprised if you can’t post it.
Size and Length Limits
4chan limits:
- /wsg/: 400 seconds, 6 MiB
- /gif/: 300 seconds, 4 MiB
- All non-sound boards: 120 seconds, 4 MiB
Base 1024 vs Base 1000 Bytes
This is a whole can of worms, but it’s important to understand: 4chan’s size limits use the traditional 1024-based definition of megabytes (sometimes called mebibytes, with MiB, to differentiate them from base-1000 MB).
For clarity, I will use Gi, Mi, Ki for base 1024 bytes and G, M, K for base 1000 bytes, because I need to refer to both.
Some systems display local file sizes in MB. So if you see that your file is 6.29 MB, it’s fine, because 6.29 MB is 6,290,000 bytes, just under the real limit of 6 * 1024 * 1024 = 6291456 bytes. This also explains why a file displayed as 6 MB will show as 5.72 when uploaded to 4chan: 6,000,000 / 1,048,576 ≈ 5.72 MiB.
When specifying bitrates in ffmpeg, the prefix `k` means kilobits, so keep in mind that you’re using the base-1000 definition of the term, and bits rather than bytes, so divide by 8 to get bytes.
Side note: webm-for-4chan prints the final webm file size in KiB; you want a file under 6144 (6 MiB) or 4096 (4 MiB).
yt-dlp
yt-dlp is pretty much the best way to download any video. It supports thousands of sites, not just YouTube.
Most of the time, you just need to provide the URL and it will download the best available video by default. There are a couple handy things to know:
- Download an upcoming or ongoing livestream:
yt-dlp --wait-for-video 30 --live-from-start
- Download the auto subtitles:
yt-dlp --write-auto-sub --skip-download
- Download only the section from 10:00 to the end of the video:
yt-dlp --download-sections "*10:00-inf"
ffmpeg
Every solution out there is some kind of ffmpeg wrapper, so even if you use a GUI, the concepts discussed below still apply. The raw commands are just presented differently.
CRF vs Average Bitrate
For vp9 (and h264) encoding you have 2 methods: constant quality (CRF, constant rate factor) and average bitrate. For your typical non-4chan use case you probably want CRF, but it produces a variable bitrate that makes the file size unpredictable. You will get consistent quality, but not the best possible quality, because at the end of the day, it all boils down to video bitrate. CRF is just a way of telling ffmpeg that you don’t care about the specific bitrate or size, just make the perceived quality consistent. If you use CRF you’ll just end up making webms that are too big or too small.
The way to go is average bitrate, specifically two-pass encoding. On the first pass, ffmpeg builds a profile of the whole video that allows it to maximize compression on the second pass. Using an average video bitrate (`-b:v`), the encoder produces a file whose bitrate varies at any one point in time but averages out to the specified rate over the whole length of the clip. Basically, it “saves” bits during sections that are highly compressible (black screens, no motion) and “spends” the saved bits when needed. This isn’t unique to vp9; this is how two-pass encoding works for any codec.
Calculating Video Size
Ignoring audio, your output video size is easily calculated:
bitrate = size_limit / duration
where duration is in seconds.
Make sure your bitrate and size limit are the same units. For instance if you want to target a 6MiB
file, that’s 6 * 1024 * 1024 = 6291456
bytes.
Represented as kilobits, that’s 6291456 / 1024 * 8 = 49152
.
Then, divide that by the duration and there’s your target bitrate in kbps.
So for example, a 30 second
video is 49152 / 30 = 1638.4
Then run ffmpeg two-pass encoding:
ffmpeg -i in.mp4 -c:v libvpx-vp9 -b:v 1638k -pass 1 -an -f null /dev/null
ffmpeg -i in.mp4 -c:v libvpx-vp9 -b:v 1638k -pass 2 -an output.webm
- `-i` specifies the input file
- `-c:v` specifies the video codec
- `-b:v` specifies the video bitrate
- `-an` means no audio
- `-f null /dev/null` on pass 1 means no output video
  - Use `NUL` instead of `/dev/null` if you’re using Windows
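If you’d rather script the arithmetic, here’s a minimal Python sketch of the calculation above (the function name is mine, just for illustration):

def target_bitrate_kbps(size_limit_bytes, duration_s):
    # bytes -> kilobits, then spread evenly over the duration
    kilobits = size_limit_bytes * 8 / 1024
    return kilobits / duration_s

print(target_bitrate_kbps(6 * 1024 * 1024, 30))  # 1638.4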
Preventing Size Overshoot
Even though we can calculate the bitrate down to the exact byte, it’s a target and ffmpeg can produce a file that’s a little bigger or smaller than the target.
The longer the video, the more ffmpeg is likely to overshoot on size.
In general, round down to the nearest integer in kbps, and for clips over 2 or 3 minutes, reduce the target by another 2-4 kbps.
It also appears that the h264 encoder is much worse about overshooting than vp9, so you may want to build in additional margin when making mp4.
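Expressed as a sketch, the margin rule might look like this (the exact numbers are just my reading of the advice above):

def safe_bitrate_kbps(exact_kbps, duration_s):
    # Round down to an integer, then shave a few more kbps off long clips
    margin = 4 if duration_s > 150 else 0  # extra margin past roughly 2.5 minutes
    return int(exact_kbps) - margin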
Deadline and Multithreading
libvpx-vp9 has a couple of options that affect quality and encoding speed.
- Speed up encoding a little by enabling row-based multithreading (`-row-mt 1`)
- The deadline option changes the encode quality:
  - The default deadline is `good`, and I recommend using it most of the time
  - If you don’t care about waiting, use `-deadline best`. It will take significantly longer to encode, but the quality will be somewhat higher.
  - Get a significant speed-up at the cost of quality with `-deadline realtime`
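For example, a first pass with both options enabled might look like this:

ffmpeg -i in.mp4 -c:v libvpx-vp9 -b:v 1638k -row-mt 1 -deadline good -pass 1 -an -f null /dev/null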
Clipping
You can directly make a webm of any time slice using seeking:
ffmpeg -ss 1:00:04.25 -t 45 -i input.mp4 output.webm
- `-ss` will seek to 1 hour, 0 minutes, 4 seconds, and 250 ms.
- `-t` will take 45 seconds starting from the `-ss` time.
- Use `-to` instead of `-t` to specify an absolute end timestamp.
This is precise when explicitly re-encoding, but keep in mind that if you just copy the streams (`-c:v copy -c:a copy`), you can end up with a broken video at the edges due to missing keyframes.
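For reference, a copy-mode clip (fast and lossless, but it only cuts cleanly on keyframes) looks something like this:

ffmpeg -ss 1:00:04.25 -t 45 -i input.mp4 -c:v copy -c:a copy output.mp4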
Filters
Filters are really handy. You can add filters using `-vf` (video filter) or `-af` (audio filter).
See the official Filter Guide for more info.
A few useful video filters to know:
- blackframe Use this to find the timestamp where the real video begins if it starts with black frames or has a fade in from black.
- crop crops the video to whatever you want.
- cropdetect automatically finds the edges of a letterboxed video. Run this as a first pass and use the result with crop, as shown in the example after this list.
- minterpolate intelligently reduces the framerate of the video using motion interpolation.
- scale resizes the video.
- subtitles will burn-in subtitles from file.
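For example, a typical two-step letterbox removal (the crop values in the second command are illustrative; use whatever cropdetect actually prints):

ffmpeg -i input.mp4 -vf cropdetect -f null /dev/null
ffmpeg -i input.mp4 -vf crop=1920:800:0:140 output.webm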
Complex Filters
You can get crazy with `-filter_complex` by building an entire filter graph.
One use case for this is to take multiple segments of a video and concatenate them together in one operation. For more information, see this project.
I’m not going to cover complex filters here, but read this and this and this if you’re interested.
Multiple Audio Tracks
How do we handle containers with multiple audio tracks? First, you have to figure out the index of the track by either inspecting it in a video player or with ffprobe:
ffprobe -v error -show_entries stream=index:stream_tags=language -select_streams a -of csv=p=0 input.mkv
This will show you something like this:
1,eng
2,jpn
One minor quirk is that the index shown is not the index you need to specify in ffmpeg. To get the real index, start with 0 and count up from there.
Then, in ffmpeg you can use `-map` to specify the audio track:
- `-map 0:a:0` would select the first track (English)
- `-map 0:a:1` would select the second track (Japanese)
Audio and Video Sync
Sometimes an encode can get out of sync due to variable frame rate (vfr), especially when changing the framerate. In general, you want vfr because it efficiently decides how many frames are needed to represent a section of video.
Use `-async 1 -vsync 2` when encoding.
- `-async 1` will prevent audio desync
- `-vsync 2` selects the vfr video sync method

(Recent ffmpeg versions deprecate these flags in favor of `-fps_mode vfr` and the `aresample` audio filter, but the old spellings still work.)
Audio
Calculating the video bitrate is not all there is to it. You have to factor in the size of the audio as well.
For webm the audio codec is libopus.
If you want to encode only the audio portion, the ffmpeg command is:
ffmpeg -i input.mp4 -vn -c:a libopus -b:a 128k output.ogg
- `-i` specifies the input file
- `-vn` specifies no video
- `-c:a` specifies the audio codec
- `-b:a` specifies the audio bitrate
Generally, you can estimate the audio size using the audio bitrate and duration. However, the actual size can vary depending on how well it’s compressed. In order to get the most accurate size, render the audio using the command above and get the size of that file. Then, subtract that from the total file size limit to get the video size limit. Rendering audio is extremely fast, so this step doesn’t take long at all compared to video encoding.
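As a minimal Python sketch of that bookkeeping (the file name and function are mine, just for illustration):

import os

def video_bitrate_kbps(audio_file, duration_s, size_limit=6 * 1024 * 1024):
    # Subtract the real audio size from the budget, then convert the
    # remaining bytes to a video bitrate in kbps
    video_budget = size_limit - os.path.getsize(audio_file)
    return video_budget * 8 / 1024 / duration_s

print(video_bitrate_kbps('output.ogg', 120))  # a 2 minute clip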
Audio Bitrates
Choosing the right audio bitrate isn’t as straightforward as video. In general you can go with 96k by default, but there are other considerations.
Since video is usually more important than audio for perception of quality, you’ll want to use the lowest audio bitrate you can get away with. This table will give you a good idea of how much space your audio takes up:
| Bitrate | Size at 3 minutes | Comments |
|---|---|---|
| 320k | 7 MiB | Not practical except for music under 2:30 |
| 256k | 5.6 MiB | Upper practical limit for music webms |
| 192k | 4.2 MiB | Good for most high quality music |
| 160k | 3.5 MiB | Good for 5.1 surround sound |
| 128k | 2.8 MiB | Good for high quality stereo tracks |
| 112k | 2.5 MiB | If you need a little more than 96k |
| 96k | 2.1 MiB | Good default |
| 80k | 1.8 MiB | You can probably get away with this most of the time |
| 64k | 1.4 MiB | High quality mono track |
| 56k | 1.3 MiB | Decent quality for most mono applications |
| 48k | 1.1 MiB | Lower quality but you can usually get away with it |
| 32k | 721 KiB | Noticeably degraded, not recommended |
Of course for short webms this doesn’t matter as much, but the bitrate adds up for long duration webms. Keep in mind that if you’re getting your stuff from YouTube, that’s usually at 128k so don’t bother with higher bitrate.
Stereo and Mono Mixdown
It’s important to understand that the bitrate is the total bitrate across all channels, meaning that you can save on total bitrate by reducing the number of channels. This is especially pronounced for 5.1 surround sound, so if you’re converting a movie clip or something, it’s a good idea to mixdown to stereo.
In ffmpeg, this is `-ac 2` (2 audio channels) or `-ac 1` (1 audio channel).
We can take this even further by analyzing the similarity of the two stereo channels.
In Python, this can be done with the cosine similarity function from the scikit-learn package.
import scipy.io.wavfile as wavfile
from sklearn.metrics.pairwise import cosine_similarity

# temp.wav is the decoded audio track, extracted with something like:
# ffmpeg -i input.mp4 -vn temp.wav
rate, data = wavfile.read('temp.wav')
num_channels = 1 if data.ndim == 1 else data.shape[1]
if num_channels == 1:
    print('Mono audio detected.')
    cosim = 1
elif num_channels == 2:
    left = data[:, 0]
    right = data[:, 1]
    cosim = cosine_similarity(left.reshape(1, -1), right.reshape(1, -1))[0][0]
    print('Channel cosine similarity: {:.4f}%'.format(cosim * 100))
Basically, if both channels are highly similar, we can go ahead and mix down to mono and cut the bitrate in half. For short clips it’s usually not worth it, but as you can see from the table, a 56k mono track has significant space savings over a 96k stereo track at 3 minutes, freeing up almost 1 MiB to spend on video bitrate.
Music webms
Making a music webm is one of the easier things to do with ffmpeg. By music webm, I mean a webm with a static image and a song.
ffmpeg -i cover.jpg -i input.mp3 out.webm
However, this will use the default audio bitrate of 96k. We can be more explicit by setting the `-b:a` audio bitrate:
ffmpeg -i cover.jpg -i input.mp3 -b:a 192k out.webm
Since the webm size is pretty much all music, finding the audio bitrate is just a function of the song duration:
bitrate_kbps = size_limit / duration
where `duration` is in seconds and `size_limit` is either `49152` (kilobits) for /wsg/ or `32768` for /gif/.
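As a quick Python sketch (this ignores the tiny contribution of the static image and container overhead, so round down a little):

def music_bitrate_kbps(duration_s, board='wsg'):
    # 6 MiB and 4 MiB limits expressed in kilobits, as derived earlier
    size_limit = 49152 if board == 'wsg' else 32768
    return size_limit / duration_s

print(music_bitrate_kbps(212))  # ~231k for a 3:32 song on /wsg/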
We can take this one step further by messing with the keyframes:
ffmpeg -framerate 1 -loop 1 -i cover.jpg -i input.mp3 -c:a libopus -b:a 128k -c:v libvpx-vp9 -g 212 -t 0:03:32 out.webm
- `-framerate 1` sets the frame rate to 1 fps. You don’t need a 24 fps static image, do you?
- `-g 212` sets the group-of-pictures interval to match the duration of the song, effectively making a video that only has one keyframe.
- Note that 212 seconds = 3:32, the duration of the song.
Doing it with an animated gif is a little more tricky:
ffmpeg -ignore_loop 0 -i dancing_baby.gif -i input.mp3 -c:a libopus -b:a 128k -c:v libvpx-vp9 -b:v 108k -t 0:03:32 out.webm
- `-ignore_loop 0` will cause the gif to loop continually
- `-t 0:03:32` limits the video duration. You want this to match the song duration, otherwise you’ll keep looping the gif after the song ends.
- `-b:v` sets the target video bitrate, ensuring that the video stays under the size limit
Making Precisely Sized webms
Using all the above knowledge, we now know how to make a webm using 3 passes:
- Render the audio:
ffmpeg -i input.mp4 -vn -c:a libopus -b:a 96k output.ogg
- Get the size of the audio file in bytes
- Look up the size limit in bytes (`6 * 1024 * 1024` or `4 * 1024 * 1024`)
- `video_size = size_limit - audio_size`
- Get the total `duration` of the video in seconds
- `bitrate = video_size / duration` is your video bitrate in bytes per second
- `bitrate_kbps = bitrate * 8 / 1024` is your video bitrate in kbps. For this example, let’s say it’s 715.
- Run the ffmpeg 1st pass:
ffmpeg -i input.mp4 -c:v libvpx-vp9 -b:v 715k -async 1 -vsync 2 -pass 1 -an -f null /dev/null
  - Use `NUL` instead of `/dev/null` if you’re using Windows
  - Note that on the first pass we use `-an` because ffmpeg is doing video stuff only
- Run the ffmpeg 2nd pass:
ffmpeg -i input.mp4 -c:v libvpx-vp9 -b:v 715k -async 1 -vsync 2 -pass 2 -c:a libopus -b:a 96k out.webm
That’ll produce a webm that’s very close to the size you expect.
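If you want to automate the whole thing, here’s a minimal Python sketch of the pipeline (file names are illustrative, and error handling is omitted):

import os
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

def duration_seconds(path):
    # ffprobe prints the container duration as a bare number
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-show_entries', 'format=duration',
         '-of', 'csv=p=0', path],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

src = 'input.mp4'
size_limit = 6 * 1024 * 1024  # /wsg/

# Render the audio first and measure it
run(['ffmpeg', '-y', '-i', src, '-vn', '-c:a', 'libopus', '-b:a', '96k', 'audio.ogg'])
video_budget = size_limit - os.path.getsize('audio.ogg')
bv = '{}k'.format(int(video_budget * 8 / 1024 / duration_seconds(src)))

# Two-pass vp9 encode at the computed average bitrate
run(['ffmpeg', '-y', '-i', src, '-c:v', 'libvpx-vp9', '-b:v', bv,
     '-pass', '1', '-an', '-f', 'null', os.devnull])
run(['ffmpeg', '-y', '-i', src, '-c:v', 'libvpx-vp9', '-b:v', bv,
     '-pass', '2', '-c:a', 'libopus', '-b:a', '96k', 'out.webm'])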
Resolution
In ffmpeg, changing the resolution is done using the scale video filter.
ffmpeg -i input.mp4 -vf scale=1280:-1 output.webm
- `-vf` applies a video filter, `scale` in this case
- `-1` scales the height dimension automatically, preserving the aspect ratio
But we can generalize this to work for any input size:
-vf "scale='min(1280,iw)':'min(1280,ih)':force_original_aspect_ratio=decrease"
- `min(1280,iw)` and `min(1280,ih)` limit the size in each dimension to 1280 or the input size (`iw` = input width, `ih` = input height), whichever is smaller.
- So for an input already below 1280 in both dimensions, this does nothing. And it applies equally to horizontally and vertically oriented videos.
Obviously you don’t want to keep the original resolution most of the time, but the trick is figuring out the right resolution. The table below is a good start:
| Duration | Size |
|---|---|
| < 30 seconds | 1920 x 1080 |
| 0:30 - 0:45 | 1600 x 900 |
| 0:45 - 1:15 | 1440 x 810 |
| 1:15 - 2:00 | 1280 x 720 |
| 2:00 - 2:30 | 1024 x 576 |
| 2:30 - 3:00 | 960 x 540 |
| 3:00 - 4:00 | 854 x 480 |
| 4:00 - 4:45 | 736 x 414 |
| 4:45 - 5:30 | 640 x 360 |
| 5:30 - 6:40 | 480 x 270 |
Let’s use math to do better than a lookup table. Resolution is better determined as a function of file size and duration, and it’s better defined by the total number of pixels rather than assuming the input is 16:9 1080p, because we need a solution that scales well for 3:4 or square videos or whatever.
So really, you need to account for the size of the audio just like when determining the video bitrate. Conveniently, the target video bitrate is exactly what we need because the bitrate itself was determined as a function of file size and duration.
import math

total_pixels = width * height

# Calculate the resolution scale factor using a logarithmic curve fit:
# y = a * ln(x / b), where x is the target video bitrate in kbps
a = 2.311e-01
b = 3.547e+01
scale_factor = a * math.log(target_bitrate / b)

scaled_pixels = total_pixels * scale_factor
scaled_height = scaled_pixels / width
scaled_width = scaled_pixels / height
calculated_resolution = max(scaled_height, scaled_width)
This is a curve fit that lines up with the table above. The difference between this and the lookup table is that now this scales with the target bitrate, so the audio size is a factor and can change the target resolution.
But we still have a problem. The original table has a baked-in assumption: The input is 1080p. To correct for this, we have to scale the input to 1080p so that the curve gives us a sane scale factor.
# Scale the source resolution to 1080p to match the calibrated resolution curve
def scale_to_1080(width, height):
    min_dimension = min(width, height)
    scale_factor = 1080 / min_dimension
    return [width * scale_factor, height * scale_factor]

width, height = scale_to_1080(raw_width, raw_height)
And yes, this is still valid for any resolution, because what’s being calculated is a resolution limit. All `scale_to_1080` does is align the resolution with the baked-in assumption in the curve fit, which makes sure that smaller inputs don’t get reduced in size too much.
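To actually apply the result, one option (my own sketch, not necessarily what webm-for-4chan does) is to round down to an even number and feed it into the min() scale filter shown earlier:

# vp9 prefers even dimensions, so round down to a multiple of 2
limit = int(calculated_resolution) // 2 * 2
scale_filter = "scale='min({0},iw)':'min({0},ih)':force_original_aspect_ratio=decrease".format(limit)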
This method is far from perfect, but it gets the resolution in the right ballpark most of the time. In practice, there are other considerations, like the amount of motion in the clip, the number of colors, the number of scene changes, and more. Getting the resolution right is more of a subjective process, so it’s difficult to come up with a purely mathematical solution that works unless we resort to complex solutions like image analysis.
Subtitles
Subtitle burn-in is pretty easy in ffmpeg using the subtitles filter. You can specify external or embedded subs.
- For internal subs: `-vf subtitles=input.mkv:si=1`, where `si` is the subtitle index.
- For external subs: `-vf subtitles=subs.ass`
You can identify embedded subtitles using ffprobe:
ffprobe -v error -select_streams s -of csv=p=0 -show_entries stream=index:stream_tags=language input.mkv
Which will output something like this:
3,jpn
4,eng
And just like the audio index, the number isn’t the index you want to specify in ffmpeg. Count from 0 from the top.
If you want to export the embedded subs to file:
ffmpeg -i input.mkv -map 0:s:1 subs.ass
- where `0:s:1` would specify the English subtitles and `0:s:0` would specify the Japanese subtitles, as listed above.
You can download YouTube subtitles with yt-dlp:
yt-dlp --write-sub --sub-lang en --sub-format ttml
But ttml doesn’t really work for ffmpeg so you’ll have to convert to ass. This can be done with ttml2ssa.
Alternatively, use vtt subtitles:
yt-dlp --write-sub --sub-lang en --sub-format vtt
Sometimes, embedded subtitles aren’t usable by ffmpeg. In this case I recommend handbrake which is very good at recognizing obscure sub formats.
webm for 4chan
Everything I discussed above is implemented in my webm-for-4chan python script. So if you want to see it in action, use the script. It prints out the calculations and ffmpeg commands so that you can understand exactly what it’s doing. Use the `--dry_run` option if you just want to see the calculations and commands without rendering the webm.
For the most part, the script does everything by default. You only have to specify options if you need to do something special like modify the audio bitrate or burn-in subtitles, etc.