Daala Codebase
17 Aug 2014
Contents
- include: Public API
- src: Main library code
- examples: Front-end tools
- tools: Ancillary tools (metrics, training, etc.)
- doc: What documentation there is
  - doc/coding_style.html: coding style guidelines
Build Targets
Two build systems:
- configure.ac, Makefile.am: autotools-based
- unix/makefile: basic GNU makefile
Three libraries:
- libdaalabase: common code between encoder and decoder
- libdaaladec: decoder-specific code
- libdaalaenc: encoder-specific code
Examples:
- encoder_example: encodes video (encapsulated in Ogg)
- dump_video: decodes to YUV4MPEG raw video
- player_example: simple SDL player
Front End
Three example programs:
- encoder_example: the only one that currently does anything
  - Reads .y4m, writes .ogg
  - Also reconstructs frames and writes a separate .y4m file for each one
- dump_video: meant to decode to .y4m
  - Can copy the example from libtheora and strip out unneeded things
- player_example: meant to be a simple SDL-based player
Metrics (requires --enable-dump-recons):
- tools/dump_{fastssim,psnr,psnrhvs,ssim}
- DAALA_ROOT=<build_dir> ./tools/rd_collect.sh <codec> *.y4m
- OUTPUT=<label> ./tools/rd_average.sh *.out
- IMAGE=prefix ./tools/rd_plot.sh *.out
EC2 instances available: https://github.com/tdaede/rd_tool.git (talk to Thomas)
Debugging
- --enable-assertions: turn on assertions
- --enable-logging: turn on logging
  - The OD_LOG_MODULES env variable controls what gets printed; see the top of logging.c for a list
  - Ex: OD_LOG_MODULES=motion-estimation:4
- --enable-encoder-check: decode after encoding and check that the reconstructed frame matches the encoder's
- --enable-accounting: collect/dump statistics on bit usage
- make check: run unit tests
- make clean ; make debug: produces an unoptimized debug build with assertions and logging enabled
Image Debugging
- od_state_dump_img(od_state *, od_img *, const char *tag)
  - Dumps a %08i%s%s.png using the frame number, tag, and suffix
  - Suffix set via the OD_DUMP_IMAGES_SUFFIX env variable (for parallel jobs)
- od_state_dump_yuv(od_state *, od_img *, const char *tag)
  - Like the above, but dumps a single-frame YUV4MPEG file: %08i%s-%s.y4m
- od_img_draw_point(od_img *img, int x, int y, const unsigned char ycbcr[3])
- od_img_draw_line(od_img *img, int x0, int y0, int x1, int y1, const unsigned char ycbcr[3])
- Configure with --enable-dump-images to enable
- See also --enable-dump-recons to dump only reconstructed frames
Coding Tools
Some coding tools can be enabled/disabled at compile time for testing purposes:
- Block size min/max
- Prefilter
- Intra prediction
- Haar DC
- PVQ (vs. scalar quantization)
- Chroma from Luma
- Activity masking, quantization matrices
Flags are in internal.h; changing them requires a recompile, and the resulting bitstream is not compatible.
Be careful! We are already finding cases where some combinations are broken and/or subtly wrong (e.g., encoding information twice).
Video Data
All video data is 8-bit Y'CbCr (possibly plus alpha).

  struct od_img_plane {
    unsigned char *data;
    unsigned char xdec;
    unsigned char ydec;
    int xstride;
    int ystride;
  };

  struct od_img {
    od_img_plane planes[OD_NPLANES_MAX];
    int nplanes;
    ogg_int32_t width;
    ogg_int32_t height;
  };
Video Data
- Full flexibility only on encoder input
  - The encoder copies data to an internal buffer
- Width/height padded to a multiple of 32
  - Crop rectangle in state.info.pic_{x, y, width, height}
- Start of rows aligned to a 16-byte boundary
  - Probably needs to be 32
- 32 pixels of padding on all sides: ystride > height
- xstride == 1
Objects
od_state (state.h):
    daala_info info;
    int ref_imgi[4];
    od_img ref_imgs[4];
    od_img io_imgs[2];
    ogg_int64_t cur_time;
    od_mv_grid_pt **mv_grid;
    int nhsb;
    int nvsb;
    unsigned char *bsize;

od_enc (encint.h):
    od_state state;
    od_adapt_ctx adapt;
    oggbyte_buffer obb;
    od_ec_enc ec;
    int packet_state;
    int quantizer[OD_NPLANES_MAX];
    od_mv_est_ctx *mvest;

od_dec (decint.h):
    od_state state;
    od_adapt_ctx adapt;
    oggbyte_buffer obb;
    od_ec_dec ec;
    int quantizer[OD_NPLANES_MAX];
    int packet_state;
Entropy Coder
Low-level encoding API (entenc.h):
- void od_ec_enc_bits(od_ec_enc *enc, ogg_uint32_t fl, unsigned ftb);
- void od_ec_encode_bool_q15(od_ec_enc *enc, int val, unsigned fz_q15);
- void od_ec_encode_bool(od_ec_enc *enc, int val, unsigned fz, unsigned ft);
- void od_ec_encode_cdf_q15(od_ec_enc *enc, int s, const ogg_uint16_t *cdf, int nsyms);
- void od_ec_encode_cdf_unscaled_dyadic(od_ec_enc *enc, int s, const ogg_uint16_t *cdf, int nsyms, unsigned ftb);
- void od_ec_encode_cdf(od_ec_enc *enc, int s, const ogg_uint16_t *cdf, int nsyms);
- void od_ec_encode_cdf_unscaled(od_ec_enc *enc, int s, const ogg_uint16_t *cdf, int nsyms);
- void od_ec_enc_uint(od_ec_enc *enc, ogg_uint32_t fl, ogg_uint32_t ft);
Other encoder functions:
- int od_ec_enc_tell(od_ec_enc *enc);
- ogg_uint32_t od_ec_enc_tell_frac(od_ec_enc *enc);
- void od_ec_enc_checkpoint(od_ec_enc *dst, const od_ec_enc *src);
- void od_ec_enc_rollback(od_ec_enc *dst, const od_ec_enc *src);
Entropy Decoder
Low-level decoding API (entdec.h):
- ogg_uint32_t od_ec_dec_bits(od_ec_dec *dec, unsigned ftb);
- int od_ec_decode_bool_q15(od_ec_dec *dec, unsigned fz);
- int od_ec_decode_bool(od_ec_dec *dec, unsigned fz, unsigned ft);
- int od_ec_decode_cdf_q15(od_ec_dec *dec, const ogg_uint16_t *cdf, int nsyms);
- int od_ec_decode_cdf_unscaled_dyadic(od_ec_dec *dec, const ogg_uint16_t *cdf, int nsyms, unsigned ftb);
- int od_ec_decode_cdf(od_ec_dec *dec, const ogg_uint16_t *cdf, int nsyms);
- int od_ec_decode_cdf_unscaled(od_ec_dec *dec, const ogg_uint16_t *cdf, int nsyms);
- ogg_uint32_t od_ec_dec_uint(od_ec_dec *dec, ogg_uint32_t ft);
Other decoder functions:
- int od_ec_dec_tell(od_ec_dec *dec);
- ogg_uint32_t od_ec_dec_tell_frac(od_ec_dec *dec);
Higher-level Entropy Coding
Basic adaptive CDF (generic_code.h):
- void od_encode_cdf_adapt(od_ec_enc *ec, int val, ogg_uint16_t *cdf, int n, int increment);
- int od_decode_cdf_adapt(od_ec_dec *ec, ogg_uint16_t *cdf, int n, int increment);
Generic coder (generic_code.h):
- Estimates the model for you
- Shape of the distribution is modeled via lookup tables with a decaying tail, and can be shared by many contexts
- A particular context is modeled via one parameter: the expected value (updated per coded symbol)
Laplace coder (laplace_code.h):
- Versions for a one-sided (exponential) distribution, a known max, and a vector with a known L1 norm
Motion Estimation
OBMC with adaptive partition sizes:
- https://people.xiph.org/~tterribe/notes/mc.pdf (doc/mc.tex)... ignore the stuff about CGI
- Staged subpel (currently)
  - Upsampled to hpel via od_state_upsample8()
  - qpel and 1/8th pel done via bilinear interpolation
Decoder (mc.c/mc.h):
- OBMC blending (incl. multiresolution blending)
- MV prediction
Encoder (mcenc.c):
- SAD used for all decisions (no SATD yet)
- (Non-overlapped) block matching for all block sizes (fullpel)
- RDO for block-size decisions (Balmelli 2001)
- Real OBMC for costing, (badly) faked rate estimates
- Refine MVs via iterated dynamic programming
- Subpel via diamond search during DP
- MV resolution chosen on a per-frame basis
Intra Prediction
- Existing prediction is done after the transform (frequency domain)
- Currently disabled by default (OD_DISABLE_INTRA)
  - Replaced by Haar DC over the whole superblock (OD_DISABLE_HAAR_DC)
- Code in intra.c, trained tables in intradata*
New hotness: Intra Paint
- Performs prediction prior to the transform, like MC
- Can predict clean edges
- Decouples prediction block sizes from transform block sizes
- Easier to integrate with encoder MC decisions
- Status/integration plans? (Jean-Marc)
Block Sizes
- Transform block sizes supported: 4x4, 8x8, 16x16
- Blocks organized into 32x32 superblocks
- Larger block sizes planned via TF
  - Last attempt did not show improvements: https://review.xiph.org/65/
  - Enough has changed that it's time to try again
- Psychovisual block size decisions
  - Encoder estimates visibility of ringing artifacts
  - No RDO (but a bias towards larger blocks at low rates)
- Code in block_size*
Transforms
- OD_COEFF_SHIFT (4): amount to shift 8-bit coefficients up before the transform (non-lossless only)
Lapping (filter.h/filter.c):
- 4-point, 8-point, and 16-point filters
- od_apply_filter_rows()/od_apply_filter_cols() decide which filters to apply based on block sizes
- Currently 4:4:4 or 4:2:0 only... need help to support 4:2:2
DCTs (dct.h/dct.c):
- 4x4, 8x8, 16x16
- Orthonormal scaling (e.g., DC scale == sqrt(1/n))
- Reversible (bit-exact, both directions)
TF: trade off time-frequency resolution (tf.h/tf.c):
- OD_HAAR_KERNEL: if you need a Haar transform for something, use this
- od_tf_up_hv_lp(): increase frequency resolution in the horizontal and vertical directions, then low-pass (used by CfL)
- od_tf_up_hv(): increase horizontal and vertical frequency resolution
- od_tf_down_hv(): decrease horizontal and vertical frequency resolution (increase time resolution)
- od_tf_filter_2d()/od_tf_filter_inv_2d(): second-stage TF correction from Demo 3
PVQ
- Documentation: doc/video_pvq.lyx, doc/theoretical_results.lyx
  - https://people.xiph.org/~tterribe/daala/pvq201404.pdf
- Code: pvq.h/pvq.c, pvq_encoder.c, pvq_decoder.c
- Scan order and band partitioning in partition.h/partition.c
- Bits actually coded with the Laplace coder
Basic Encoding Process
- Copy/pad the image (Y'CbCr pixels)
- Prefilter across block boundaries
- Transform blocks
- Construct intra predictors and pick an intra mode
- Quantize + encode coefficients (PVQ)
- Inverse transform
- Postfilter across block boundaries
- Dump images / PSNR
Will use whiteboard