FFV1 & RAWcooked

What we learned

 

 

Data is never RAW.
Data is always cooked in one way or another.

 

Jérôme Martinez

No Time to Wait 5, December 2021

FFV1

Lossless video compression format

Open source, patent free

Adopted by several archives

Frames are divided into slices, each with a checksum
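Why per-slice checksums help can be shown with a minimal sketch. It is not the FFV1 bitstream: the function names are hypothetical, the slices are plain byte ranges rather than rectangular regions of the picture, and `zlib.crc32` stands in for the CRC defined in the FFV1 specification. The point is only that damage can be located down to one slice instead of invalidating the whole frame.

```python
import zlib


def split_into_slices(frame: bytes, slice_count: int) -> list[bytes]:
    """Split a frame buffer into contiguous chunks of near-equal size."""
    size = max(1, -(-len(frame) // slice_count))  # ceiling division
    return [frame[i:i + size] for i in range(0, len(frame), size)]


def slice_checksums(slices: list[bytes]) -> list[int]:
    """Compute one checksum per slice (stand-in CRC, not the FFV1 one)."""
    return [zlib.crc32(s) for s in slices]


def damaged_slices(slices: list[bytes], expected: list[int]) -> list[int]:
    """Return the indexes of slices whose checksum no longer matches."""
    return [i for i, (s, crc) in enumerate(zip(slices, expected))
            if zlib.crc32(s) != crc]


frame = bytes(range(256)) * 64          # 16384 bytes of dummy "frame" data
slices = split_into_slices(frame, 16)   # 16 slices of 1024 bytes
checksums = slice_checksums(slices)

corrupted = list(slices)
corrupted[5] = b"\xff" + corrupted[5][1:]    # flip the first byte of slice 5
print(damaged_slices(corrupted, checksums))  # [5]
```

Only slice 5 is reported, so the other 15 slices of the frame can still be trusted.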

FFV1 compression

Example with 1 second at 24 fps 10-bit HD film on a 6-core (12-thread) Skylake-X CPU:

  • 24 DPX files (or in ZIP/TAR uncompressed): 189 MB
  • 1 compressed ZIP file: 175 MB in 10 seconds
  • 1 compressed LZMA2 file: 154 MB in 30 seconds
  • 1 FFV1/MKV Intra 16-slice file: 105 MB in 1.5 seconds
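The figures above can be turned into size savings and processing speed with a few lines of arithmetic. This is only a sketch over the numbers quoted on this slide; the variable and function names are illustrative.

```python
UNCOMPRESSED_MB = 189  # 24 DPX files, 1 second of 10-bit HD film

# name -> (compressed size in MB, processing time in seconds), from the slide
results = {
    "ZIP": (175, 10.0),
    "LZMA2": (154, 30.0),
    "FFV1/MKV": (105, 1.5),
}


def summarize(size_mb: float, seconds: float) -> tuple[float, float]:
    """Return (percent saved, MB of source processed per second)."""
    saved_percent = 100 * (1 - size_mb / UNCOMPRESSED_MB)
    throughput = UNCOMPRESSED_MB / seconds
    return saved_percent, throughput


for name, (size_mb, seconds) in results.items():
    saved, mb_per_s = summarize(size_mb, seconds)
    print(f"{name}: {saved:.0f}% smaller, {mb_per_s:.0f} MB/s")
```

With these inputs, FFV1 saves about 44% against about 7% for ZIP and 19% for LZMA2, while being the fastest of the three.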

FFV1 standardization

Mainly sponsored by the PREFORMA project (2015-2017)

 

Review of the main (FFmpeg) encoder/decoder

Review and improvement of the pre-existing FFV1 spec draft

Implementation checker

FFV1 standardization

It is a slow process

Different people involved, lack of time

Hobby / spare time for a lot of the people involved

 

IETF (the standardization body) was very supportive
(thank you!)

FFV1 standardization

It is very useful!

Some bugs discovered while reviewing the code

Some clarification for corner cases

Derek Buitenhuis wrote an FFV1 decoder "for fun" (Great! An extra point of view on the specs)

FFV1 standardization

FFV1 has been standardized since 2021 (IETF RFC 9043)

RFC 9043 extract

Big gap between the main sponsorship (2015-2017) and the standardization (2021)!

RAWcooked

  • Easy: just a short command line
    "rawcooked YourDirectoryName"
  • Store DPX/TIFF headers/footers in a specific Matroska attachment
  • Store other sidecar files as Matroska attachments
  • Output is a single Matroska/FFV1/FLAC file
  • Encoding is reversible (bit-by-bit to original files)
    "rawcooked YourMatroskaFileName.mkv"
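For batch use, the two command lines above can be wrapped in a small script. This is a sketch, not part of RAWcooked: the helper names are hypothetical, it assumes a `rawcooked` binary is available on the PATH, and placing `--check` before the path is an assumption.

```python
import shutil
import subprocess


def rawcooked_command(path: str, check: bool = False) -> list[str]:
    """Build the argument list for 'rawcooked <path>'.

    The same form is used for encoding (path is a directory) and for
    decoding (path is a .mkv file), as on the slide.
    """
    cmd = ["rawcooked"]
    if check:
        cmd.append("--check")  # assumption: the option goes before the path
    cmd.append(path)
    return cmd


def run_rawcooked(path: str, check: bool = False) -> subprocess.CompletedProcess:
    """Run rawcooked and raise if it is missing or exits with an error."""
    if shutil.which("rawcooked") is None:
        raise FileNotFoundError("rawcooked not found on PATH")
    return subprocess.run(rawcooked_command(path, check), check=True)
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.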

How was RAWcooked born?

Well... No Time To Wait?

Talk with Reto Kromer about a missing piece between archive constraints and good storage

Source DPX are often required (technical or legal constraints)

But not optimal for storage (thousands of files! Not compressed!)

Reto offered the sponsorship for a proof of concept

How did RAWcooked stay alive?

Well... No Time To Wait again?

Several other archives learned about the project and well, storage is costly...

These archives sponsor (with money and time) the improvements to RAWcooked

What if...

No initial sponsorship?

No further sponsorship?

 

--> Stronger together!

Sustainability for developers

One issue was that managers don't like to pay for something freely available

We wanted to keep the project open source

We chose an intermediate solution: open source with locked binaries

Advantages of open source are still there (you can fork if you don't like our work)

DPX and padding bits

DPX may have some padding bits

They are expected to be 0 (they only exist to align data on 32-bit boundaries)

FFmpeg (the encoding library) legitimately ignores them

We were also ignoring them when storing reversibility data (our focus was on the DPX header)

DPX and padding bits

If padding bits are not 0, the reversibility promise is broken

Without users doing a lot of tests, we would have missed that some scanners fill padding bits with something

 

--> Tests by users are important

--> Never bet on the value of an input byte: there will always be someone who decides to do something you didn't expect with it
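Non-zero padding can be detected before trusting that it is zero. The sketch below assumes 10-bit samples packed three per 32-bit word, with the two lowest bits used as padding (the "filled, method A" layout). Other DPX packings place padding elsewhere, and the byte order depends on the file. The function names are hypothetical, and parsing the DPX header to find the pixel data is out of scope.

```python
import struct

PADDING_MASK = 0b11  # assumption: the 2 low-order bits of each word are padding


def count_nonzero_padding(pixel_data: bytes, big_endian: bool = True) -> int:
    """Count the 32-bit words whose padding bits are not all zero."""
    fmt = ">I" if big_endian else "<I"
    usable = len(pixel_data) - len(pixel_data) % 4  # ignore a trailing partial word
    return sum(1 for (word,) in struct.iter_unpack(fmt, pixel_data[:usable])
               if word & PADDING_MASK)


def pack_rgb10(r: int, g: int, b: int, padding: int = 0) -> int:
    """Pack three 10-bit samples and 2 padding bits into one 32-bit word."""
    return (r << 22) | (g << 12) | (b << 2) | padding


clean = struct.pack(">2I", pack_rgb10(1023, 512, 0), pack_rgb10(1, 2, 3))
dirty = struct.pack(">2I", pack_rgb10(1023, 512, 0), pack_rgb10(1, 2, 3, padding=0b11))
print(count_nonzero_padding(clean), count_nonzero_padding(dirty))  # 0 1
```

A non-zero count means the padding bits carry data that must be stored somewhere if the encoding is to stay bit-by-bit reversible.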

FFmpeg constraints

By default, RAWcooked uses Matroska attachments for storing reversibility data

Usually this content is very small (a few KB of compressed DPX header), but sometimes (0.0001% of processed content?) it becomes big

... and we didn't test that

FFmpeg had a bug with attachments >= 1 GiB --> attachments were reduced to 1 GiB, breaking the reversibility

 

--> Testing the reversibility of ALL files is actually needed
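Reversibility can also be verified independently of the tool. One way is to decode the Matroska file into a separate directory and compare every restored file with its original by hash. The sketch below is a generic comparison, not how RAWcooked implements its own check; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_tree(root) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large DPX files are not loaded at once
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def is_reversible(original_dir, restored_dir) -> bool:
    """True only if both trees have the same files with identical content."""
    return hash_tree(original_dir) == hash_tree(restored_dir)
```

Comparing the full mapping also catches files that are missing or extra, not only files whose content differs.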

"rawcooked --check"

We added an automatic check in RAWcooked

It is 2x slower (the compressed data is decoded and the DPX files are read again)

But we were not conservative enough and our promise did not hold for 100% of created files

 

--> Speed should not be prioritized over checking that all is fine

--> The next version of RAWcooked may have the reversibility check enabled by default

FFmpeg constraints

FFmpeg implemented coherency checks on the input

One of these checks is that an attachment should not be >= 256 MiB

Well, it is legitimate (it avoids waiting for the network in case of false-positive probing of the format)

... But it breaks the playback of some RAWcooked files

 

--> Matroska attachments were helpful for a first implementation; we are now moving to appending the data at the end of the file (while keeping backward compatibility)
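The two FFmpeg limits mentioned in these slides (256 MiB and 1 GiB) can be expressed as a simple size check on the reversibility data. This is a sketch with hypothetical names. The thresholds come from the slides and were not checked against any particular FFmpeg version.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Thresholds as stated on the slides (assumed, not verified against FFmpeg)
ATTACHMENT_COHERENCY_LIMIT = 256 * MIB  # input check: breaks playback
ATTACHMENT_BUG_LIMIT = 1 * GIB          # older bug: breaks reversibility


def attachment_risks(size_bytes: int) -> list[str]:
    """List the known problems for a Matroska attachment of this size."""
    risks = []
    if size_bytes >= ATTACHMENT_COHERENCY_LIMIT:
        risks.append("playback: rejected by the FFmpeg input coherency check")
    if size_bytes >= ATTACHMENT_BUG_LIMIT:
        risks.append("reversibility: hit by the FFmpeg >= 1 GiB attachment bug")
    return risks


print(attachment_risks(4 * 1024))   # typical case, a few KB: []
print(len(attachment_risks(GIB)))   # both problems apply: 2
```

An empty list means the attachment size is below both thresholds.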

Stay in touch

MediaArea: https://mediaarea.net, @MediaArea_net

RAWcooked: https://MediaArea.net/RAWcooked

Jérôme Martinez: jerome@mediaarea.net

Slides: https://MediaArea.net/Events

License (except images): CC BY