wasm + ffmpeg: capturing video frames on the front end

Is it possible to process audio and video in a front-end page? For example: the user selects a video and can set any frame of it as the cover, without uploading the entire video to the backend for processing. After some exploration, I basically got this working; there is a complete demo: ffmpeg wasm video frame capture. It supports mp4/mov/mkv/avi and other files.

The basic idea is this: use a file input to let the user select a video file, read it as an ArrayBuffer, and pass it to the wasm build of ffmpeg for processing. The output RGB data is then drawn to a canvas, or converted to base64 and used as the src attribute of an img tag to form a picture. (A canvas can also grab frames by passing a video DOM element directly to drawImage, but the set of formats the video tag can play is fairly limited. This article focuses on the ffmpeg approach, because ffmpeg can do much more than this; frame capture is just one example.)

A fair question here: why use ffmpeg instead of writing this directly in JS? Because the mature multimedia-processing libraries are C libraries, ffmpeg being one of them, and it is open source; wasm can convert it into a format usable in a web page. There are few JS libraries for multimedia processing, the complexity of writing your own demultiplexer (demux) and video decoder can be imagined, and encoding/decoding directly in JS would also be slow. So we use what already exists.

The first step is compilation (if you are not interested in the compilation process, you can skip to step 2).

1. Compile ffmpeg to the wasm version

At first I thought this would be very difficult, but it turned out not to be that bad, because videoconverter.js had already done the conversion (it uses a compiled ffmpeg to transcode audio and video in the web page). The key is to disable some unneeded features when running configure, otherwise compilation reports syntax errors.

emsdk is used here to compile to wasm. Its installation tutorial explains the setup clearly: a script detects your system and downloads the matching prebuilt files. After installation there are several executables, including the emcc, em++ and emar commands. emcc is the C compiler, em++ is the C++ compiler, and emar packs multiple .o object files into one .a archive.
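For reference, a typical emsdk setup looks like the following (the repository URL and the "latest" version tag are assumptions on my part; follow the official tutorial for your platform):

```
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install latest     # downloads prebuilt toolchain binaries for your OS
./emsdk activate latest
source ./emsdk_env.sh      # puts emcc / em++ / emar on the PATH
emcc -v                    # verify the compiler runs
```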
First download the source code from ffmpeg's official website.

(1) configure

Unzip it, enter the directory, and then execute the following command:

```
emconfigure ./configure --cc="emcc" --enable-cross-compile --target-os=none --arch=x86_32 --cpu=generic \
    --disable-ffplay --disable-ffprobe --disable-asm --disable-doc --disable-devices --disable-pthreads \
    --disable-w32threads --disable-network --disable-hwaccels --disable-parsers --disable-bsfs \
    --disable-debug --disable-protocols --disable-indevs --disable-outdevs --enable-protocol=file
```

The usual role of configure is to generate a Makefile: the configure stage probes the build environment and parameters, then writes the resulting compile commands into the Makefile. The emconfigure wrapper in front mainly specifies the compiler as emcc, but by itself that is not enough, because ffmpeg has sub-modules and emconfigure cannot redirect all of their compilers to emcc. Fortunately, ffmpeg's configure lets you specify a custom compiler through the --cc parameter. On Mac the C compiler is normally /usr/bin/clang; here it is set to emcc.

The --disable options that follow turn off features that do not work under wasm. For example, --disable-asm disables the hand-written assembly code, whose syntax emcc cannot handle; without this flag the compiler reports syntax errors. --disable-hwaccels disables hardware decoding: some graphics cards can decode video directly instead of the application decoding it in software, and hardware decoding performs noticeably better, but it is unavailable here. Disabling it causes a warning later at runtime:

```
[swscaler @ 0x105c480] No accelerated colorspace conversion found from yuv420p to rgb24.
```

which does not affect the result. (Running configure reports a segmentation fault, but it has no effect on the subsequent steps.) Once the configure command finishes, the Makefile and related configuration files are generated.

(2) make

make is the stage that actually compiles. Execute:

```
emmake make
```

When run on Mac, you will find that at the end, when multiple .o files are assembled into .a archives, it fails:

```
AR libavdevice/libavdevice.a
fatal error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar: fatal error in /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib
```

To solve this, change the archiving command from ar to emar and remove the ranlib step by modifying the ffbuild/config.mak file:

```
# change ar to emar
- AR=ar
+ AR=emar
# remove ranlib
- RANLIB=ranlib
+ #RANLIB=ranlib
```

Then run make again. After compilation completes, an overall ffmpeg file is generated in the ffmpeg directory, and libavcodec.a and similar files are generated in the libavcodec and other subdirectories. These are the bitcode files we will use later; bitcode is an intermediate representation produced by the compiler. (At the very end, the strip -o ffmpeg ffmpeg_g command fails, but it does not matter: change the strip to cp ffmpeg_g ffmpeg.)

2. Use ffmpeg

ffmpeg is mainly organized into several lib directories:

- libavcodec: encoding and decoding
- libavformat: demultiplexing (demux) and multiplexing (mux)
- libswscale: image scaling and pixel format conversion

Take an mp4 file as an example. mp4 is a container format, so the libavformat API is used first to demultiplex it and find out where the audio and video sit inside the file. The video is generally encoded with h264 or similar, so libavcodec is needed to decode it into yuv-format images, which are finally converted to rgb format with the help of libswscale. A condensed sketch of this call sequence is shown below.
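This is only an illustration written against the current ffmpeg API, not the simple.c from the repo discussed next; the function name grab_first_frame and the trimmed error handling are mine:

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>

// Decode the first video frame of `path` into RGB24
// (sketch; most error handling and cleanup is elided).
int grab_first_frame(const char *path) {
    AVFormatContext *fmt = NULL;
    // 1. libavformat: open the container and locate the video stream
    if (avformat_open_input(&fmt, path, NULL, NULL) < 0) return -1;
    if (avformat_find_stream_info(fmt, NULL) < 0) return -1;
    int vs = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
    if (vs < 0) return -1;

    // 2. libavcodec: open a decoder (h264 etc.) for that stream
    const AVCodec *codec =
        avcodec_find_decoder(fmt->streams[vs]->codecpar->codec_id);
    AVCodecContext *dec = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(dec, fmt->streams[vs]->codecpar);
    if (avcodec_open2(dec, codec, NULL) < 0) return -1;

    AVPacket pkt;
    AVFrame *yuv = av_frame_alloc(), *rgb = av_frame_alloc();
    while (av_read_frame(fmt, &pkt) >= 0) {           // demux one packet
        if (pkt.stream_index == vs &&
            avcodec_send_packet(dec, &pkt) >= 0 &&
            avcodec_receive_frame(dec, yuv) >= 0) {   // got a yuv frame
            // 3. libswscale: convert yuv420p (etc.) to rgb24
            struct SwsContext *sws = sws_getContext(
                dec->width, dec->height, dec->pix_fmt,
                dec->width, dec->height, AV_PIX_FMT_RGB24,
                SWS_BILINEAR, NULL, NULL, NULL);
            rgb->format = AV_PIX_FMT_RGB24;
            rgb->width  = dec->width;
            rgb->height = dec->height;
            av_frame_get_buffer(rgb, 0);
            sws_scale(sws, (const uint8_t * const *)yuv->data,
                      yuv->linesize, 0, dec->height,
                      rgb->data, rgb->linesize);
            // rgb->data[0] now holds width*height*3 bytes of RGB pixels
            sws_freeContext(sws);
            av_packet_unref(&pkt);
            break;
        }
        av_packet_unref(&pkt);
    }
    av_frame_free(&yuv);
    av_frame_free(&rgb);
    avcodec_free_context(&dec);
    avformat_close_input(&fmt);
    return 0;
}
```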
From here there are two ways to use ffmpeg. The first is to compile the ffmpeg file obtained in the first step directly to wasm:

```
# copy to a .bc suffix, because emcc distinguishes file formats by extension
cp ffmpeg_g ffmpeg.bc
emcc ffmpeg.bc -o ffmpeg.html
```

This generates an ffmpeg.js and an ffmpeg.wasm. ffmpeg.js loads and compiles the wasm file and provides a global Module object through which JS can drive the ffmpeg APIs inside the wasm. With this, ffmpeg's API is called from JS via Module. But this way feels cumbersome: the data types of JS and C differ in many respects, the C API has to be called from JS constantly, and shuttling data back and forth is troublesome, because the frame-capture logic has to be assembled out of individual ffmpeg API calls.

So I use the second way: write the functionality in C first, and finally expose a single interface to JS, so that JS and wasm only need to communicate through one interface API rather than the frequent calls of the first way. The problem then turns into two steps:

- Step one: use C to write a program that calls ffmpeg to save a video frame as an image
- Step two: compile it to wasm and wire up the data interaction with JS

The implementation of step one mainly follows an ffmpeg tutorial. The code there is ready-made and can essentially be copied over; there are some small problems because the ffmpeg version it uses is slightly old, and some API parameters need to be adjusted. The code has been uploaded to github and can be seen at cfile/simple.c. Usage is described in the readme; compile it into an executable named simple with the following command:

```
gcc simple.c -lavutil -lavformat -lavcodec \
    `pkg-config --libs --cflags libavutil` \
    `pkg-config --libs --cflags libavformat` \
    `pkg-config --libs --cflags libavcodec` \
    `pkg-config --libs --cflags libswscale` -o simple
```

Then run it with the path of a video file:

```
./simple mountain.mp4
```

A picture in ppm format is generated in the current directory. This simple.c calls the ffmpeg API that reads the file from the hard disk by itself; it needs to be changed to read the file content from memory, i.e. from a buffer we fill ourselves, so that later the data can come from a buffer handed over by JS. That implementation can be seen in simple-from-memory.c, and its gist is sketched below. I won't analyze the C code in detail here: it is mostly a matter of calling the right APIs, which is simple once you know how to use them, although ffmpeg's online development documentation is rather sparse.
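The core trick for reading from memory is a custom read callback attached through an AVIOContext. Below is a minimal sketch of the idea under my own names (MemSource, mem_read, open_from_memory are illustrations, not the repo's actual code):

```c
#include <libavformat/avformat.h>
#include <libavutil/mem.h>
#include <string.h>

// The in-memory "file" handed over from JS.
typedef struct {
    uint8_t *base;   // start of the buffer
    size_t size;     // total size in bytes
    size_t pos;      // current read offset
} MemSource;

// Read callback invoked by the demuxer instead of disk reads.
static int mem_read(void *opaque, uint8_t *buf, int buf_size) {
    MemSource *m = opaque;
    size_t left = m->size - m->pos;
    if (left == 0) return AVERROR_EOF;
    int n = buf_size < (int)left ? buf_size : (int)left;
    memcpy(buf, m->base + m->pos, n);
    m->pos += n;
    return n;
}

AVFormatContext *open_from_memory(MemSource *src) {
    // ffmpeg requires the IO buffer to come from av_malloc
    int io_size = 32 * 1024;
    uint8_t *io_buf = av_malloc(io_size);
    AVIOContext *avio = avio_alloc_context(io_buf, io_size,
                                           0 /* read-only */, src,
                                           mem_read, NULL, NULL);
    // a real implementation should also supply a seek callback,
    // since mp4 demuxing often needs to seek (e.g. moov at the end)
    AVFormatContext *fmt = avformat_alloc_context();
    fmt->pb = avio;  // demux through our callbacks instead of a file
    if (avformat_open_input(&fmt, NULL, NULL, NULL) < 0) return NULL;
    return fmt;  // from here the pipeline is the same as before
}
```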
With that, step one is done; in step two the input is changed to come from JS and the output is returned to JS.

3. Interaction between JS and wasm

The wasm-specific implementation is in web.c (there is also a process.c that takes out some functions of simple.c). web.c exposes one function for JS to call; let's name it setFile:

```
EMSCRIPTEN_KEEPALIVE // this macro marks the function as an export
ImageData *setFile(uint8_t *buff, const int buffLength, int timestamp) {
    // process
    ...
    return result;
}
```

Three parameters need to be passed:

- buff: the raw video data (passed in through a JS ArrayBuffer)
- buffLength: the total size of the video buff, in bytes
- timestamp: the second of the video at which to capture a frame

After processing, it returns an ImageData data structure:

```
typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t *data;
} ImageData;
```

with three fields: the width and height of the picture, and the RGB data. After writing these C files, compile:

```
emcc web.c process.c ../lib/libavformat.bc ../lib/libavcodec.bc ../lib/libswscale.bc \
    ../lib/libswresample.bc ../lib/libavutil.bc \
    -Os -s WASM=1 -o index.html -s EXTRA_EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' \
    -s ALLOW_MEMORY_GROWTH=1 -s TOTAL_MEMORY=16777216
```

This links against the libavcodec.bc and other files generated in the first step. These libraries have a dependency order that must not be reversed: a library has to be listed after the libraries that depend on it.

Some of the parameters deserve explanation:

- -o index.html exports an html file, and index.js and index.wasm are exported at the same time. The latter two are what we mainly use; the generated index.html itself is useless.
- -s EXTRA_EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' exports the two runtime functions ccall and cwrap, which are used to call the setFile function written in C above.
- -s TOTAL_MEMORY=16777216 sets the total wasm memory to 16MB, which is also the default; the value must be a multiple of 64KB (the WebAssembly page size).
- -s ALLOW_MEMORY_GROWTH=1 grows the memory automatically when usage exceeds the total size.

After compiling, write a main.html, add an input[type=file] and other controls, and include the index.js generated above. It loads index.wasm and provides a global Module object for driving the wasm APIs, including the function exported at compile time, as shown in the code below.
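The following is a minimal sketch of that glue, using standard Emscripten runtime pieces (ccall, HEAPU8, _malloc). Only setFile and the ImageData layout come from the text above; the element ids, the 5-second timestamp, and the RGB-to-RGBA copy are assumptions of mine:

```html
<input type="file" id="file">
<canvas id="canvas"></canvas>
<script src="index.js"></script>
<script>
document.getElementById('file').onchange = function () {
  // (real code should also wait for Module.onRuntimeInitialized)
  var reader = new FileReader();
  reader.onload = function () {
    var bytes = new Uint8Array(reader.result);
    // copy the video bytes into wasm memory
    var buff = Module._malloc(bytes.length);
    Module.HEAPU8.set(bytes, buff);
    // call the exported C function; capture the frame at second 5
    var ptr = Module.ccall('setFile', 'number',
        ['number', 'number', 'number'], [buff, bytes.length, 5]);
    // read the returned ImageData struct: width, height, data pointer
    var width  = Module.HEAPU32[ptr >> 2];
    var height = Module.HEAPU32[(ptr >> 2) + 1];
    var rgb    = Module.HEAPU32[(ptr >> 2) + 2];
    // expand the RGB bytes to the RGBA layout the canvas expects
    var canvas = document.getElementById('canvas');
    canvas.width = width;
    canvas.height = height;
    var ctx = canvas.getContext('2d');
    var img = ctx.createImageData(width, height);
    for (var i = 0, j = 0; i < width * height; i++) {
      img.data[j++] = Module.HEAPU8[rgb + i * 3];
      img.data[j++] = Module.HEAPU8[rgb + i * 3 + 1];
      img.data[j++] = Module.HEAPU8[rgb + i * 3 + 2];
      img.data[j++] = 255; // opaque alpha
    }
    ctx.putImageData(img, 0, 0);
    Module._free(buff);
  };
  reader.readAsArrayBuffer(this.files[0]);
};
</script>
```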