Today I'd like to share my little "flac on steroids" project.
...obviously inspired by "sox on steroids" ;)
Let's have a closer look at it.
I was triggered by the recent benchmark announcements on Phoronix.
The promise: flac now delivers a 5% faster encoding and decoding
by introducing a faster CRC algorithm.
That sounds nice!
There hasn't been much development on flac lately. Great that somebody made the effort.
The main issue you'll face:
How to get the flac beast with that updated CRC algorithm on your machine!?!?
Bad luck for most of you. You have to wait.
You'd need a flac version newer than 1.3.2 installed to get that feature.
1.3.3 is not even released yet.
And once that's done, your OS maintainers will still need ages to pick it up.
For LMS users it'll take even longer.
So. 99.99% of you won't have the pleasure to enjoy the extra power for now.
Ok. What now? As usual: if you want bleeding edge stuff, there's no other way
than building the binary yourself. It's pretty straightforward though.
Some background on what affects the performance of the flac binary.
flac offers several options to seriously improve its performance - just from the code perspective!
E.g. flac can make use of sse, sse2, avx2. These CPU features only exist on x86 (Intel/AMD) platforms though!
Ever wondered why flac is that slow on a RPI?
Furthermore, flac can make use of C++ or assembler (nasm).
There are quite a few variables involved.
The usual issue: You just don't know how your flac was compiled and whether it makes use of any of these "turbos".
Bottom line: The way flac gets compiled - and that includes the target CPU architecture - can have a huge impact on its performance!
I consider compiling it yourself a pretty good idea!
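One thing that's easy to check on Linux is which of these instruction sets your CPU offers at all. The following is a small sketch: has_flag is a helper name made up for this example, and the list of flags checked is just a selection. Note that it only shows what the CPU supports, not what an installed flac binary actually uses.

```shell
#!/bin/sh
# has_flag FLAG "FLAG LIST": succeeds if FLAG is one of the
# space-separated words in FLAG LIST (hypothetical helper).
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On x86 Linux the kernel lists the CPU feature flags in /proc/cpuinfo.
# On other platforms this stays empty and every check reports "no".
CPU_FLAGS="$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)"

for f in sse sse2 avx avx2; do
    if has_flag "$f" "$CPU_FLAGS"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

If avx2 shows up as "no", a build with AVX2 routines enabled won't gain anything from them on that machine.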
I ran my own compiled flac on my Intel NUC with all performance options switched on.
Let's have a look at the benchmark.
I'm going to try to reproduce the promised "+5%" first.
BTW:
As the benchmark tool I'm now using "perf". It seems to be more reliable and more precise
than e.g. "time", which I used for benchmarking sox earlier.
Preps:
- I reinstalled the Ubuntu flac and libs (dynamically linked binary)
- I then downloaded the Ubuntu flac sources and compiled them statically
- I fetched the flac sources from git and compiled those statically as well
- I ran the encode and decode benchmarks
And here are the results:
Binary = /tmp/flac-1.3.2-ubu
Performance counter stats for '/tmp/flac-1.3.2-ubu --totally-silent --compression-level-5 -f -o /tmp/test16.flac.flac-1.3.2-ubu /tmp/test16.wav' (10 runs):
1031,998175 task-clock (msec) # 1,000 CPUs utilized ( +- 0,07% )
6 context-switches # 0,006 K/sec ( +- 16,01% )
1 cpu-migrations # 0,001 K/sec ( +- 36,85% )
192 page-faults # 0,186 K/sec ( +- 0,41% )
2.757.615.568 cycles # 2,672 GHz ( +- 0,07% )
5.792.144.336 instructions # 2,10 insn per cycle ( +- 0,03% )
423.397.735 branches # 410,270 M/sec ( +- 0,06% )
11.845.109 branch-misses # 2,80% of all branches ( +- 0,03% )
1,032314326 seconds time elapsed ( +- 0,07% )
Binary = /tmp/flac-1.3.2-ubu-static
Performance counter stats for '/tmp/flac-1.3.2-ubu-static --totally-silent --compression-level-5 -f -o /tmp/test16.flac.flac-1.3.2-ubu-static /tmp/test16.wav' (10 runs):
1046,480818 task-clock (msec) # 1,000 CPUs utilized ( +- 0,07% )
5 context-switches # 0,005 K/sec ( +- 14,30% )
0 cpu-migrations # 0,000 K/sec ( +- 44,72% )
184 page-faults # 0,176 K/sec ( +- 0,24% )
2.801.189.305 cycles # 2,677 GHz ( +- 0,07% )
4.776.156.386 instructions # 1,71 insn per cycle ( +- 0,03% )
403.541.845 branches # 385,618 M/sec ( +- 0,06% )
11.491.004 branch-misses # 2,85% of all branches ( +- 0,05% )
1,046770327 seconds time elapsed ( +- 0,07% )
Binary = /tmp/flac-git-static
Performance counter stats for '/tmp/flac-git-static --totally-silent --compression-level-5 -f -o /tmp/test16.flac.flac-git-static /tmp/test16.wav' (10 runs):
923,622729 task-clock (msec) # 1,000 CPUs utilized ( +- 0,09% )
4 context-switches # 0,005 K/sec ( +- 18,62% )
0 cpu-migrations # 0,001 K/sec ( +- 33,33% )
180 page-faults # 0,195 K/sec ( +- 0,21% )
2.472.003.020 cycles # 2,676 GHz ( +- 0,07% )
5.108.543.740 instructions # 2,07 insn per cycle ( +- 0,03% )
541.381.977 branches # 586,151 M/sec ( +- 0,05% )
11.537.502 branch-misses # 2,13% of all branches ( +- 0,03% )
0,923934894 seconds time elapsed
Result:
On the encode side, the flac built from git sources needs around 11% less run time than both Ubuntu versions (repo binary and self-compiled), neither of which has the CRC optimizations applied yet: about 924 ms task-clock versus 1032 ms and 1046 ms.
An 11% gain for the CRC-improved binary. Nice! More than expected.
Somehow the binary compiled from the Ubuntu sources is slightly slower than the dynamically linked Ubuntu version. Let's just accept that as it is. We made our case.
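As a quick sanity check of that percentage, the task-clock values from the perf output above can be fed into a small helper. This is only a sketch: the function name pct_less_time is made up for this example, and the decimal commas that perf printed (locale) have to be typed as decimal points.

```shell
#!/bin/sh
# pct_less_time OLD_MS NEW_MS: run-time reduction of NEW relative to OLD, in percent.
# LC_ALL=C makes awk read and print numbers with a decimal point.
pct_less_time() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# Encode task-clock (msec) taken from the perf runs above
pct_less_time 1031.998175 923.622729   # Ubuntu repo binary vs. git build -> 10.5
pct_less_time 1046.480818 923.622729   # Ubuntu static build vs. git build -> 11.7
```

Both values land close to the "around 11%" quoted above. The same helper works for the decode numbers.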
I then also did the decode test:
Binary = /tmp/flac-1.3.2-ubu
Performance counter stats for '/tmp/flac-1.3.2-ubu --totally-silent -d -f -o /tmp/test16.wav.flac-1.3.2-ubu /tmp/test16.flac' (10 runs):
566,553464 task-clock (msec) # 0,999 CPUs utilized ( +- 0,24% )
4 context-switches # 0,007 K/sec ( +- 15,09% )
0 cpu-migrations # 0,000 K/sec ( +- 66,67% )
128 page-faults # 0,225 K/sec ( +- 0,50% )
1.511.998.785 cycles # 2,669 GHz ( +- 0,16% )
3.580.347.563 instructions # 2,37 insn per cycle ( +- 0,07% )
214.363.822 branches # 378,365 M/sec ( +- 0,20% )
5.272.298 branch-misses # 2,46% of all branches ( +- 0,05% )
0,566851320 seconds time elapsed ( +- 0,24% )
Binary = /tmp/flac-1.3.2-ubu-static
Performance counter stats for '/tmp/flac-1.3.2-ubu-static --totally-silent -d -f -o /tmp/test16.wav.flac-1.3.2-ubu-static /tmp/test16.flac' (10 runs):
516,027060 task-clock (msec) # 0,999 CPUs utilized ( +- 0,97% )
3 context-switches # 0,006 K/sec ( +- 13,13% )
0 cpu-migrations # 0,000 K/sec ( +-100,00% )
119 page-faults # 0,231 K/sec ( +- 0,37% )
1.363.596.089 cycles # 2,642 GHz ( +- 0,15% )
3.378.787.107 instructions # 2,48 insn per cycle ( +- 0,08% )
213.400.313 branches # 413,545 M/sec ( +- 0,21% )
5.093.116 branch-misses # 2,39% of all branches ( +- 0,03% )
0,516293944 seconds time elapsed ( +- 0,97% )
Binary = /tmp/flac-git-static
Performance counter stats for '/tmp/flac-git-static --totally-silent -d -f -o /tmp/test16.wav.flac-git-static /tmp/test16.flac' (10 runs):
488,574913 task-clock (msec) # 0,999 CPUs utilized ( +- 0,37% )
2 context-switches # 0,005 K/sec ( +- 20,10% )
0 cpu-migrations # 0,000 K/sec
118 page-faults # 0,241 K/sec ( +- 0,31% )
1.297.780.573 cycles # 2,656 GHz ( +- 0,16% )
3.044.344.214 instructions # 2,35 insn per cycle ( +- 0,09% )
180.420.141 branches # 369,278 M/sec ( +- 0,24% )
5.077.955 branch-misses # 2,81% of all branches ( +- 0,16% )
0,488829035 seconds time elapsed ( +- 0,37% )
Result:
On the decode task, the new CRC-optimized flac from git sources came out about 14% ahead of the stock dynamically linked Ubuntu binary (about 489 ms versus 567 ms task-clock). A lot more than the folks over at flac promised.
There's "just" a 5% gain against the Ubuntu sources compiled with "-O3 -march=broadwell". Decoding and encoding seem to be affected differently on the two Ubuntu-based binaries. Honestly, I don't feel motivated to look deeper into it for now.
It won't add anything much of relevance to the actual story.
Bottom line. Well done flac designers! You lived up to your promises. Your efforts are highly appreciated.
Enjoy.
********************************************************************************************************
Benchmarking test procedure:
IF="/tmp/test.wav"
OF="/tmp/test.flac"
DURATION="$(soxi -d "$IF")"
BITRATE="$(soxi -b "$IF")"
SAMPLERATE="$(soxi -r "$IF")"
COMPRESSIONLEVEL="5"
# switch all cores to the "performance" governor (needs root)
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo "****************"
echo " DURATION:$DURATION"
echo " SAMPLERATE:$SAMPLERATE"
echo " BITRATE:$BITRATE"
echo " COMPRESSION:$COMPRESSIONLEVEL"
# remove output files from earlier runs
rm $OF.* 2>/dev/null
for i in flac-1.3.2-ubu flac-1.3.2-ubu-static flac-git-static ; do
BIN="/tmp/$i"
echo "****************************"
echo "Binary = $BIN"
# 10 encode runs per binary; perf prints the averaged counters
perf stat -r 10 -B $BIN --totally-silent --compression-level-$COMPRESSIONLEVEL -f -o $OF.$i $IF
sleep 3
sync
echo
done
*************************************************************************
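The procedure above only covers the encode run. A decode run could look roughly like the sketch below, using the same options that show up in the decode perf output ("-d" instead of the compression level). The function name bench_decode and the check that skips missing binaries are additions for this example; the input is assumed to be a FLAC file produced by one of the encode runs.

```shell
#!/bin/sh
IF="/tmp/test16.flac"   # assumed: FLAC file created by an earlier encode run
OF="/tmp/test16.wav"

# bench_decode DIR NAME...: run the decode benchmark for each binary DIR/NAME
bench_decode() {
    dir="$1"; shift
    for i in "$@"; do
        BIN="$dir/$i"
        if [ ! -x "$BIN" ]; then
            echo "skipping $BIN (not found)"
            continue
        fi
        echo "****************************"
        echo "Binary = $BIN"
        # 10 decode runs per binary; perf prints the averaged counters
        perf stat -r 10 -B "$BIN" --totally-silent -d -f -o "$OF.$i" "$IF"
        sleep 3
        sync
    done
}

bench_decode /tmp flac-1.3.2-ubu flac-1.3.2-ubu-static flac-git-static
```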
Compiling flac:
I'll show you now how to compile a static flac binary on Ubuntu or other Debian based systems. Open a terminal first.
I won't compile libogg support into the binary.
*************************************
sudo su
apt-get install build-essential libtool libtool-bin nasm
BASE=/tmp
cd $BASE
git clone https://git.xiph.org/flac.git
cd $BASE/flac
./autogen.sh
### gcc compiler settings:
### Find out your CPU specific parameter to use for your processor family and
### replace below "broadwell" entry accordingly e.g. "haswell"
export CFLAGS='-O3 -march=broadwell'
./configure --prefix=/usr --enable-static --disable-shared --disable-ogg --disable-doxygen-docs --disable-xmms-plugin
### You should now see listed in the configuration summary:
### SSE optimizations : ................... yes
### Asm optimizations : ................... yes
make
ls -l ./src/flac/flac
******************************************
Here we go. It's that easy.
Now you'll have a bleeding edge high performance standalone (static) flac binary at hand.
Note: It still says version 1.3.2 - just ignore it!
Copy it wherever you want it,
e.g. to your LMS installation:
cp ./src/flac/flac /usr/share/squeezeboxserver/Bin/x86_64-linux/
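The CFLAGS step above asks you to find the -march value for your processor family. One way to do that, sketched here, is to let gcc report what it would pick for the machine it runs on. parse_march is a helper name made up for this example; the gcc options used (-march=native, -Q --help=target) are standard gcc options. Passing -march=native directly is the shorter alternative when the binary is only meant to run on the build machine.

```shell
#!/bin/sh
# parse_march: read "gcc -Q --help=target" output on stdin and
# print the value shown next to "-march=" (hypothetical helper).
parse_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

if command -v gcc >/dev/null 2>&1; then
    # Ask gcc which architecture "native" resolves to on this machine
    MARCH="$(gcc -march=native -Q --help=target 2>/dev/null | parse_march)"
    # Fall back to "native" if nothing could be parsed
    echo "export CFLAGS='-O3 -march=${MARCH:-native}'"
fi
```

The printed line can then be used in place of the "broadwell" export before running ./configure.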