View on GitHub

Webm Guide

If you wish to make a webm from scratch, you must first invent ffmpeg. ‐ Carl Sagan

How to webm (or mp4) for 4chan

If you just care about the webm software:

If you want to learn how to webm, read on.
image.jpg

Contents:

Quick Reference

tl;dr here’s what you do up front

Duration Resolution Bitrate Bitrate Bitrate
    (6 MiB) (4 MiB w/ audio) (4 MiB no audio)
< 30 seconds 1920x1080 1500k 990k 1090k
0:30 - 0:45 1600x900 1000k 630k 725k
0:45 - 1:15 1440x810 550k 335k 435k
1:15 - 2:00 1280x720 310k 175k 270k
2:00 - 2:30 1024x576 230k 120k -
2:30 - 3:00 960x540 175k 80k -
3:00 - 4:00 854x480 100k 35k -
4:00 - 5:00 736x414 60k 8k -
5:00 - 6:00 640x360 35k - -
6:00 - 6:40 480x270 20k - -

Codecs and Containers

webm is a container format that is a modified form of .mkv
That means there is not just one type of webm. It can be any combination of a supported video + supported audio track.
For the purpose of 4chan, you have vp8 or vp9 for video, vorbis or opus for audio.
Keep in mind that webm can also contain av1 but this isn’t allowed on 4chan as of the time of this writing.
webm also supports embedded vtt subtitles, but this is also not supported by 4chan.
Metadata is also stripped by 4chan, so don’t bother.

If you come across a webm (or any container) and you want to know what’s in it, use ffprobe.

Basically you want to use vp9 + opus in all circumstances. vp8 and vorbis are legacy codecs that perform objectively worse.

webm vs mp4

4chan supports only h264 video with aac audio for mp4, so for the purposes of comparison, I’m talking about vp9+opus webm vs h264+aac mp4.
Side note: If you mix vp9+aac or h264+opus, you’re gonna create an mkv that can’t be posted to 4chan.

With all that said, the strategies for encoding mp4 are substantially the same as webm, so you can apply the same knowledge regardless of which you choose.
Generally speaking, I recommend webm by default and mp4 if your clip is very short or you know what you’re doing.

Regarding YouTube

Size and Length Limits

4chan limits:

Base 1024 vs Base 1000 Bytes

This is a whole can of worms but important to understand. 4chan’s size limits are the traditional 1024 based definition of Megabytes (Sometimes these are called Mebibytes with MiB to differentiate from base 1000 MB).

For clarity, I will use Gi, Mi, Ki for base 1024 bytes and G, M, K for base 1000 bytes because I need to refer to both.

Some systems display local file sizes in MB. So if you see that your file is 6.29 MB, it’s fine because the real limit is 6 * 1024 * 1024 = 6291456 bytes. This also explains why a file displayed as 6 MB will show as 5.72 when uploaded to 4chan.

When specifying bitrates in ffmpeg, the prefix k means Kilobits, so it’s important to keep in mind that you’re using the base 1000 definition of the term as well as bits so you have to divide by 8 to get bytes.

Side note: webm-for-4chan prints the final webm file size in KB and you want a file under 6144 or 4096

yt-dlp

yt-dlp is pretty much the best way to download any video. It supports thousands of sites, not just YouTube.

Most of the time, you just need to provide the URL and it will download the best available video by default. There are a couple handy things to know:

ffmpeg

image.jpg

Every solution out there is some kind of ffmpeg wrapper, so even if you use a GUI, the concepts discussed below still apply. The raw commands are just presented differently.

CRF vs Average Bitrate

For vp9 (and h.264) encoding you have 2 methods: constrained quality (CRF) and average bitrate. For your typical non-4chan use-case you probably want CRF, but this produces a variable bit-rate that makes the file size unpredictable. You will get a consistent quality, but not the best possible quality, because at the end of the day, it all boils down to video bitrate. CRF is just a way of telling ffmpeg that you don’t care about the specific bitrate or size, just make the perceived quality consistent. If you use CRF you’ll just end up making webms too big or too small.

The way to go is average bitrate, specifically two-pass encoding. On the first pass, ffmpeg builds a profile of the whole video that allows it to maximize compression on the next pass. Using average video bitrate (-b:v), the encoder will produce a file that has variable bitrate at any one point in time, but it averages out to the specified rate over the whole length of the clip. Basically it “saves” bits during sections that are highly compressible (black screens, no motion) and “spends” the saved bits when needed. This isn’t unique to vp9, this is how two-pass encoding works for any codec.

Calculating Video Size

Ignoring audio, your output video size is easily calculated:
bitrate = size_limit / duration where duration is in seconds.
Make sure your bitrate and size limit are the same units. For instance if you want to target a 6MiB file, that’s 6 * 1024 * 1024 = 6291456 bytes.
Represented as kilobits, that’s 6291456 / 1024 * 8 = 49152.
Then, divide that by the duration and there’s your target bitrate in kbps.

So for example, a 30 second video is 49152 / 30 = 1638.4

Then run ffmpeg two-pass encoding:
ffmpeg -i in.mp4 -c:v libvpx-vp9 -b:v 1638k -pass 1 -an -f null /dev/null
ffmpeg -i in.mp4 -c:v libvpx-vp9 -b:v 1638k -pass 2 -an output.webm

Preventing Size Overshoot

Even though we can calculate the bitrate down to the exact byte, it’s a target and ffmpeg can produce a file that’s a little bigger or smaller than the target.
The longer the video, the more ffmpeg is likely to overshoot on size.
In general, round down to the nearest integer in kbps, and for clips over 2 or 3 minutes, reduce the target by another 2-4 kbps.

It also appears that the h264 encoder is much worse about overshooting than vp9, so you may want to build in additional margin when making mp4.

Deadline and Multithreading

libvpx-vp9 has a couple options that affect quality and enconding speed.

Clipping

You can directly make a webm of any time slice using seeking:
ffmpeg -ss 1:00:04.25 -t 45 -i input.mp4 output.webm

This is precise when explicitly re-encoding, but keep in mind that if you just copy the streams (-c:v copy -c:a copy), that can result in a broken video at the edges due to missing keyframes.

Filters

filters are really handy. You can add filters using -vf (video filter) or -af (audio filter).
See the official Filter Guide for more info.
A few useful video filters to know:

Complex Filters

You can get crazy with -filter_complex by building an entire filter graph.
One use case for this is to take multiple segments of a video and concatenate them together in one operation. For more information, see this project.

I’m not going to cover complex filters here, but read this and this and this if you’re interested.

Multiple Audio Tracks

How do we handle containers with multiple audio tracks? First, you have to figure out the index of the track by either inspecting it in a video player or with ffprobe:
ffprobe -v error -show_entries stream=index:stream_tags=language -select_streams a -of csv=p=0 input.mkv
This will show you something like this:

1,eng
2,jpn

One minor quirk is that the index shown is not the index you need to specify in ffmpeg. To get the real index, start with 0 and count up from there.
Then, in ffmpeg you can use -map to specify the audio index.

Audio and Video Sync

Sometimes an encode can get out of sync due to variable frame rate (vfr) especially when changing the framerate. In general, you want vfr because it efficiently decides how many frames are needed to represent a section of video.

Use -async 1 -vsync 2 when encoding.

Audio

Calculating the video bitrate is not all there is to it. You have to factor in the size of the audio as well.
For webm the audio codec is libopus.

If you want to encode only the audio portion, the ffmpeg command is:
ffmpeg -i input.mp4 -vn -c:a libopus -b:a 128k output.ogg

Generally, you can estimate the audio size using the audio bitrate and duration. However, the actual size can vary depending on how well it’s compressed. In order to get the most accurate size, render the audio using the command above and get the size of that file. Then, subtract that from the total file size limit to get the video size limit. Rendering audio is extremely fast, so this step doesn’t take long at all compared to video encoding.

Audio Bitrates

Choosing the right audio bitrate isn’t as straightforward as video. In general you can go with 96k by default, but there are other considerations.

Since video is usually more important than audio for perception of quality, you’ll want to use the lowest audio bitrate you can get away with. This table will give you a good idea of how much space your audio takes up:

Bitrate Size at 3 minutes Comments
320k 7 MiB Not practical except for music under 2:30
256k 5.6 MiB Upper practical limit for music webms
192k 4.2 MiB Good for most high quality music
160k 3.5 MiB Good for 5.1 surround sound
128k 2.8 MiB Good for high quality stereo tracks
112k 2.5 MiB If you need a little more than 96k
96k 2.1 MiB Good default
80k 1.8 MiB You can probably get away with this most of the time
64k 1.4 MiB High quality mono track
56k 1.3 MiB Decent quality for most mono applications
48k 1.1 MiB Lower quality but you can usually get away with it
32k 721 KiB Noticably degraded, don’t recommend

Of course for short webms this doesn’t matter as much, but the bitrate adds up for long duration webms. Keep in mind that if you’re getting your stuff from YouTube, that’s usually at 128k so don’t bother with higher bitrate.

Stereo and Mono Mixdown

It’s important to understand that the bitrate is the total bitrate across all channels, meaning that you can save on total bitrate by reducing the number of channels. This is especially pronounced for 5.1 surround sound, so if you’re converting a movie clip or something, it’s a good idea to mixdown to stereo.
In ffmpeg, this is -ac 2 (2 audio channels) or -ac 1 (1 audio channel)

We can take this even further by analyzing the similarity of the 2 stereo tracks.
In python, this can be done with the cosine similarity function of the scikit-learn package.

import scipy.io.wavfile as wavfile
from sklearn.metrics.pairwise import cosine_similarity

rate, data = wavfile.read('temp.wav')
num_channels = data.shape[1]
if num_channels == 1:
    print('Mono audio detected.')
    cosim = 1
elif num_channels == 2:
    left = data[:, 0]
    right = data[:, 1]
    cosim = cosine_similarity(left.reshape(1, -1),right.reshape(1, -1))[0][0]
print('Channel cosine similarity: {:.4f}%'.format(cosim * 100))

Basically if both tracks are highly similar, we can go ahead and mixdown to mono and cut the bitrate in half. For short clips it’s usually not worth it to do this, but as you can see from the table, a 56k mono track has significant space savings over a 96k stereo track at 3 minutes, granting about 1 MB to be used toward the video bitrate.

Music webms

Making a music webm is one of the easier things to do with ffmpeg. By music webm, I mean a webm with a static image and a song.
ffmpeg -i cover.jpg -i input.mp3 out.webm

However, this will use the default audio bitrate of 96k. We can be more explicit by setting the -b:a audio bitrate:
ffmpeg -i cover.jpg -i input.mp3 -b:a 192k out.webm

Since the webm size is pretty much all music, finding the audio bitrate is just a function of the song duration:
bitrate_kbps = size_limit / duration
Where duration is seconds and size_limit is either 49152 for /wsg/ or 32768 for /gif/

We can take this one step further by messing with the keyframes:
ffmpeg -framerate 1 -loop 1 -i cover.jpg -i input.mp3 -c:a libopus -b:a 128k -c:v libvpx-vp9 -g 212 -t 0:03:32 out.webm

Doing it with an animated gif is a little more tricky:
ffmpeg -ignore_loop 0 -i dancing_baby.gif -i input.mp3 -c:a libopus -b:a 128k -c:v libvpx-vp9 -b:v 108k -t 0:03:32 out.webm

Making Precisely Sized webms

Using all the above knowledge, we now know how to make a webm using 3 passes:

That’ll produce a webm that’s very close to the size you expect.

Resolution

In ffmpeg, changing the resolution is done using the scale video filter.
ffmpeg -i input.mp4 -vf scale=1280:-1 output.webm

But we can generalize this statement to work for any sized input:
-vf scale='min(1280,iw)':'min(1280,ih):force_original_aspect_ratio=decrease'

Obviously you don’t want to keep the original resolution most of the time, but the trick is figuring out the right resolution. The table below is a good start:

Duration Size
< 30 seconds 1920 x 1080
0:30 - 0:45 1600 x 900
0:45 - 1:15 1440 x 810
1:15 - 2:00 1280 x 720
2:00 - 2:30 1024 x 576
2:30 - 3:00 960 x 540
3:00 - 4:00 854 x 480
4:00 - 4:45 736 x 414
4:00 - 5:30 640 x 360
5:30 - 6:40 480 x 270

Let’s use math to do better than a lookup table. Resolution is better determined as a function of file size and duration. And resolution is better defined by the total number of pixels rather than assuming that the input is 16:9 1080p. Because we need a solution that scales well for 3:4 or square videos or whatever.

So really, you need to account for the size of the audio just like when determining the video bitrate. Conveniently, the target video bitrate is exactly what we need because the bitrate itself was determined as a function of file size and duration.

total_pixels = width * height
# Factor in the total resolution of the image and the bit rate
x = target_bitrate * total_pixels
# Calculate the ideal resolution using logarithmic curve: y = a * ln(x/b)
a = 2.311e-01
b = 3.547e+01
scale_factor = a * math.log(target_bitrate/b)
scaled_pixels = total_pixels * scale_factor
scaled_height = scaled_pixels / width
scaled_width = scaled_pixels / height
calculated_resolution = max(scaled_height, scaled_width)

This is a curve fit that lines up with the table above. The difference between this and the lookup table is that now this scales with the target bitrate, so the audio size is a factor and can change the target resolution.

But we still have a problem. The original table has a baked-in assumption: The input is 1080p. To correct for this, we have to scale the input to 1080p so that the curve gives us a sane scale factor.

# Scales resolution sources to 1080p to match the calibrated resolution curve
def scale_to_1080(width, height):
    min_dimension = min(width, height)
    scale_factor = 1080 / min_dimension
    return [width * scale_factor, height * scale_factor]

width, height = scale_to_1080(raw_width, raw_height)

And yes, this is still valid for any resolution because what’s being calculated is a resolution limit. All scale_to_1080 does is align the resolution with the baked in assumption in the curve fit, which make sure that smaller inputs don’t get reduced in size too much.

This method is far from perfect, but it gets the resolution in the right ballpark most of the time. In practice, there are other considerations, like the amount of motion in the clip, the number of colors, number of scene changes, and more. Getting the resolution right is more of a subjective process, so it’s difficult to come up with a purely mathematical solution that works unless we result to complex solutions like image analysis.

Subtitles

Subtitle burn-in is pretty easy in ffmpeg using the subtitles filter. You can specify external or embedded subs.

You can identify embedded subtitles using ffprobe:
ffprobe -v error -select_streams s -of csv=p=0 -show_entries stream=index:stream_tags=language input.mkv
Which will output something like this:

3,jpn
4,eng

And just like the audio index, the number isn’t the index you want to specify in ffmpeg. Count from 0 from the top.

If you want to export the embedded subs to file: ffmpeg -i input.mkv -map 0:s:1 subs.ass

You can download YouTube subtitles with yt-dlp:
yt-dlp --write-sub --sub-lang en --sub-format ttml
But ttml doesn’t really work for ffmpeg so you’ll have to convert to ass. This can be done with ttml2ssa.
Alternatively, use vtt subtitles:
yt-dlp --write-sub --sub-lang en --sub-format vtt

Sometimes, embedded subtitles aren’t usable by ffmpeg. In this case I recommend handbrake which is very good at recognizing obscure sub formats.

webm for 4chan

image.jpg

Everything I discussed above is implemented in my webm-for-4chan python script. So if you want to see it in action, use the script. It prints out the calculations and ffmpeg commands so that you can understand exactly what it’s doing. Use the --dry_run option if you just want to see the calculations and commands without rendering the webm.

For the most part, the script does everything by default. You only have to specify options if you need to do something special like modify the audio bitrate or burn-in subtitles, etc.