C++ matrix maths – library performance

Recently I have been look on the Ogre Matrix class which has a fairly un-optimized, but straightforward implementation, that you can see here.
I was wondering how it compares.

Of course somebody had a similar question in mind before. Martin Foot that is. While the discussion still applies today, I felt like the results could have changed since 2012 as libraries and compilers have moved on.

So I forked his code to update the libs to the latest versions and came up with the following results:

Library add (x86_64, SSSE3) mult (x86_64, SSSE3) add (armeabi-v7a, NEON) mult (armeabi-v7a, NEON)
Eigen3 17 ms 53 ms 173 ms 399 ms
GLM 50 ms 186 ms 232 ms 399 ms
Ogre 50 ms 184 ms 232 ms 399 ms
CML1 116 ms 348 ms 178 ms 489 ms

The used compiler was gcc with optimization level -O2.

As we can see Eigen3 just downgrades the rest on x86_64 – probably due its explicit vectorization. Notably, CLM1 is having some issues and even falls behind the naive implementations.
On ARM the results are more tight. With Eigen3 and CLM1 being about 25% faster at addition. However CML1 again has some issues with the mult test.

We end up with Eigen3 being the overall winner and GLM being second (Ogre does not count as it is not a Math library).

Also you should migrate away from CLM1 as the development focus shifted to CLM2 and the issues found above are probably not going to be resolved.

Switching Apache2 to php-fpm for performance

there are many articles on the internet telling you to switch from Apache & mod_php to nginx to get better performance.

However the main reason for performance improvement is not nginx itself but rather the way it integrates PHP.

Different ways to integrate PHP

Apache traditionally used mod_php to embed the PHP interpreter inside Apache HTTP request handler. This way it can directly interpret PHP scripts whereas with CGI it would have to start a new PHP interpreter process first – per request.

The drawback however is that the PHP interpreter is embedded in all request handlers – even those that just serve static files. This obviously blows up memory consumption which in turn can lower performance.

Nginx on the other hand uses the FCGI approach where a pool of PHP processes is started along the webserver using the FCGI process manager, FPM. The webserver then delegates individual requests using the FCGI protocol as needed.
This avoids the PHP interpreter startup costs as well as starting it without a need and is the reason nginx is faster then mod_php.

However since Apache 2.4 one can also use FCGI to integrate PHP and get virtually the same characteristics like nginx. Sticking with Apache saves you migrating all the .htaccess rules and means an easier setup for many webapps.

Furthermore since Apache 2.4.10 one can use mod_proxy_fcgi for a reverse-proxy configuration which further reduces the occupied PHP workers in the FPM pool for better performance.

Configuration on Ubuntu 16.04

Switching to FCGI on Ubuntu 16.04 is quite easy. The needed module are installed by default and just need to be enabled:

a2enmod proxy_fcgi && a2dismod php7.0

Then inside your-site.conf add

 # PHP-FPM
 <FilesMatch "\.php$">
     SetHandler "proxy:unix:/var/run/php/php7.0-fpm.sock|fcgi://localhost/"
 </FilesMatch>
 <Proxy "fcgi://localhost/">
 </Proxy>

this connects Apache in reverse proxy mode to the PHP-FPM pool using unix domain sockets for optimal performance. See the Apache Wiki for details.

Note that php-fpm by default only creates 5 PHP worker processes, which in turn limits the maximal simultaneous connections. You might want to raise this by adapting pm.max_children in /etc/php/7.0/fpm/pool.d/www.conf.

Typically you set this to RAM size / avg. process size. You can find out the latter via:

ps -ylC php-fpm7.0 --sort:rss

Performance Measurements

To measure the results I did a force reload of my single user Nextcloud instance and measured the Load time via Chrome developer tools:

Pagemod_phpmod_proxy_fcgi 
Files701 ms605 ms0.86
News1.77 s1.67 s0.94

as one can see depending on the amount of static/ dynamic files and internal/ external requests we can bring down the page load time by up to 15%.

Do not use Meson

Recently the Meson Build System gained some momentum. It is time to stop that.
Not that Meson is a bad piece of software – on the contrary, it is quite well designed.
Still it makes building C/C++ applications worse, by (quoting xkcd) basically creating  this:

It sets out to create a cross-platform, more readable and faster alternative to autotools. But there is already CMake that solves this.

You might say that CMake is ugly, but note that the CMake 2.x you might have tried is not the same CMake 3.x that is available today. Many patterns have improved and are now both more logical and more readable.

Nowadays the difference between Meson and CMake is just a matter of syntactic preference. The Meson authors seem to agree here.

The actual criterion for selecting a build system however should be tooling support and community spread. CMake easily wins here:

After the introduction of the server mode it got native support by QtCreator, CLion, Android Studio (NDK) and even Microsofts Visual Studio. Native means that you do not have to generate any intermediate project files, but the CMakeLists.txt is used directly by the IDE.

On the community spread side we got e.g. KDE, OpenCV, zlib, libpng, freetype and as of recently Boost. These projects using CMake not only guarantees that you can easily use them, but that you can also include them in your build via add_subdirectory such that they become part of your project. This is especially useful if you are cross-compiling – for instance to a Raspberry Pi.

On the other hand, reinventing a wheel that is tailored to the needs of a specific community (Gnome), means that it will fall behind and eventually die. This is what is currently happening to the Vala language that had a similar birth to Meson.

The meson devs might object that Meson generates build files that run faster on a Raspberry Pi. However if your cross compiling is working you do not need that. And honestly, that particular improvement could have been also achieved by providing a patch to the CMake Ninja generator..

Addendum 27-11-2018
Stumbled over another great guide to modern CMake.

Addendum 15-06-2018
A new guide for CMake called CGold can be found here, which is of comparable quality to the Meson docs.

Addendum 4-1-2018
Some comments (rightfully) note that Meson has generally a better documentation and avoids some of its pitfalls. However this is mostly due to Meson not being around long enough such that the way you do things in Meson changed. Neither did it see such a widespread use like CMake yet. (think of corner-cases)

But even if you argue that this is precisely the point why you should use Meson, I would argue that improving the existing documentation in CMake and adding more educational warnings is easier then writing something from scratch.

Migrating from owncloud 9.1 to nextcloud 11

First one should ask though: why? My main motivation was that many of the apps I use were easily available in the nextcloud store, while with owncloud I had to manually pull them from github.
Additionally some of the app authors migrated to nextcloud and did not provide further updates for owncloud.

Another reason is this:

the graphs above show the number of commits for owncloud and nextcloud. Owncloud has taken a very noticeable hit here after the fork – even though they deny it.

From the user perspective the lack of contribution is visible for instance in the admin interface where with nextcloud you get a nice log browser and system stats while with owncloud you do not. Furthermore the nextcloud android app handles Auto-Upload much better and generally seems more polished – I think one can expect nextcloud to advance faster in general.

Migrating

For migrating you can follow the excellent instructions of Jos Poortvliet.

In my case owncloud 9.1 was installed on Ubuntu in /var/www/owncloud and I put nextcloud 11 to /var/www/nextcloud. Then the following steps had to be applied:

  1. put owncloud in maintenance mode
    sudo -u www-data php occ maintenance:mode --on
  2. copy over the config.php
    cp /var/www/owncloud/config/config.php /var/www/nextcloud/config/
  3. adapt the path in config.php
    # from 
    'path' => '/var/www/owncloud/apps',
    # to
    'path' => '/var/www/nextcloud/apps',
  4. adapt the path in crontab
    sudo crontab -u www-data -e
  5. adapt the paths in the apache config
  6. run the upgrade script which takes care of the actual migration. Then disable the maintanance mode.
    sudo -u www-data php occ upgrade
    sudo -u www-data php occ maintenance:mode --off

and thats it.

Teatime and Sensors-Unity now available as snaps

After I now got even featured on OMG Ubuntu with both of my apps, I thought it would be a good idea to make them easier to install.

Those of you that were following my recent posts on creating snappy packages may already have guessed it. For everyone else the news today is: the teatime and sensors-unity utilities are now available as snaps, so you now can easily install them using the official Ubuntu Store or the command line as

sudo snap install sensors-unity
sudo snap install teatime

after this they will be available directly in the app launcher.

Note: sensors-unity additionally needs the hardware-observe permission which you currently can only give it using the command line as:

sudo snap connect sensors-unity:hardware-observe ubuntu-core:hardware-observe

Right now the only drawback is that both snaps include the full python3 and gtk3 runtimes and therefore weight around 80MB in size.
If you do not mind some extra steps for installation you can get them as 100KB debs from their PPAs: Teatime, Sensors-Unity.
However in the near future there will be a shared gnome-runtime snap which will mitigate the size issue.

OGRECave 1.10 release

The 1.10.0 release of the OGRECave fork was just created. This means that the code is considered stable enough for general usage and the current interfaces will be supported in subsequent patch releases (i.e. 1.10.1, 1.10.2 …).

SampleBrowser running GLES2 on desktop

This release represents more than 3 years of work from various contributors when compared to the previous 1.9 release. At the time of writing it contains all commits from the bitbucket version as well as many fork specific features and fixes.

If you are reading about the fork for the first time and wonder why it was created, see this blog post. For a comparison between the github and bitbucket version see this log.

For a general overview of the 1.10 features when compared to 1.9, see the OGRECave 1.10 release notes.

The highlights probably are:

  • upstream Python bindings as an component
  • improved GL3+/ GLES2 renderers
  • A new HLMS Component implementing physically based shading
  • SDL2 based input handling
  • Bites Component for rapid prototyping of applications
  • Emscripten platform support

For further information see the github page of the fork.

Odroid U3 in the Nextcloud Box

Until now I used a microSD card for storage of my Owncloud setup. The drawback of doing so is that microSD cards only allow for so many writes until they die and go in a read-only mode.

Therefore the Nextcloud box is an attractive upgrade allowing to use a more failure proof HDD while still keeping everything inside the same housing.

The housing

The first thing to note is that the housing is much larger than one might think from the photos. See the comparison photo with the Odroid housing. Actually this should not be a surprise as the 2.5″ HDD alone is larger than the Odroid board.

Continue reading Odroid U3 in the Nextcloud Box

WordPress, AMP and Ads

Delivering your content not only as HTML, but also using the AMP-HTML subset not only reduces the loading times for your readers, but also improves the score of your site in the Google results. On top of that your site will be proxied by the Google AMP Cache, lowering your server load.

If you are using WordPress, adding AMP support is as easy as installing the AMP-Plugin which is developed by Automattic, the company behind WordPress.

Doing so will likely get you more visitors, but unfortunately there is a drawback: the plugin does not support advertisements out of the box.

So if you – like me – rely on advertisements to cover the server costs, you have to apply some tweaks to get ads on AMP pages as well. This is what this post will be about.

Extending the AMP Plugin

Basically you have to modify your current theme. If you are using an off-the shelf theme, you should create a child-theme – otherwise just extend the functions.php of your custom theme.

AMP uses the amp-ad tag for displaying ads which requires a additional script to work. Currently it also works without adding a script, but it already generates a warning. Lets be safe here and add the required script to the list:


add_action( 'amp_post_template_data', 'xyz_amp_post_template_add_ad_script' );
function xyz_amp_post_template_add_ad_script( $data ) {
	$data['amp_component_scripts']['amp-ad'] = 'https://cdn.ampproject.org/v0/amp-ad-0.1.js';
	return $data;
}

Next we create a filter that injects the actual amp-ad tags into out content:


add_action( 'pre_amp_render_post', 'xyz_amp_add_custom_actions' );
function xyz_amp_add_custom_actions() {
    add_filter( 'the_content', 'xyz_amp_add_ad' );
}

function xyz_amp_add_ad( $content ) {
    // hack for skipping featured-image
    if ( false !== strpos( $content, 'wp-post-image')) {
        // as it unfortunately uses get_post too
        // fixed in AMP 0.4.1
        return $content;
    }

    return
    '<amp-ad width=300 height=250
         type="adsense"
         data-ad-client="$YOUR_ID"
         data-ad-slot="$YOUR_TOP_SLOT">
     </amp-ad>'
    .$content.
    '<amp-ad width=300 height=250
         type="adsense"
         data-ad-client="$YOUR_ID"
         data-ad-slot="$YOUR_BOTTOM_SLOT">
     </amp-ad>';
}

The snippet above assumes you are using google adsense. If you want to integrate a different Ad Network, look here for the specific syntax.

On OGRE versions

Currently one can choose between the following OGRE versions
1.9, 1.10, 2.0 and 2.1

However the versioning scheme has become completely arbitrary while still resembling semantic versioning.
As a consequence somebody even had to put a “What version to choose?” guide on the OGRE homepage.

Unfortunately the guide confuses more than it helps:

Continue reading On OGRE versions

Creating PyGTK app snaps with snapcraft

Snap is a new packaging format introduced by Ubuntu as an successor to dpkg aka debian package. It offers sandboxing and transactional updates and thus is a competitor to the flatpak format and resembles docker images.

As with every new technology the weakest point of working with snaps is the documentation. Your best bet so far is the snappy-playpen repository.

There are also some rough edges regarding desktop integration and python interoperability, so this is what the post will be about.

I will introduce some quircks that were needed to get teatime running, which is written in Python3 and uses Unity and GTK3 via GObject introspection.

The most important thing to be aware of is that snaps are similar to containers in that each snap has its own rootfs and only restricted access outside of it. This is basically what the sandboxing is about.
However a typical desktop application needs to know quite a lot about the outside world:

  • It must know which theme the user currently uses, and after that it also needs to access the theme files.
  • For saving anything it needs access to /home
  • If it should access the internet it needs system level access as well; like querying whether there actually is an active internet connection

To declare that we want to write to home, play back sound and use unity features we use the plugs keyword like

apps:
    teatime:
         # ...
         plugs: [unity7, home, pulseaudio]

However we must also tell our app to look for the according libraries inside its snap instead of the system paths. For this one must change quite a few environment variables manually. Fortunately Ubuntu provides wrapper scripts that take care of this for us. They are called desktop-launchers.

To use the launcher the configures the GTK3 environment we have to extend the teatime part like this:

apps:
    teatime:
        command: desktop-launch $SNAP/usr/share/teatime/teatime.py
        # ...
parts:
    teatime:
         # ...
         after: [desktop/gtk3]

The desktop-launch script takes care of telling PyGTK where the GI repository files are.

You can see the full snapcraft.yml here.

Update:

Before my fix, one had to use this rather lengthy startup command

env GI_TYPELIB_PATH=$SNAP/usr/lib/girepository-1.0:$SNAP/usr/lib/x86_64-linux-gnu/girepository-1.0 desktop-launch $SNAP/usr/share/teatime/teatime.py

which hard-coded the architecture.

End Update

After this teatime will start, but the paths still have to be fixed. Inside a snap “/” still refers to the system root, so all absolute paths must be prefixed with $SNAP.

Actually I think the design of flatpak is more elegant in this regard where “/” points to the local rootfs and one does not have to change absolute paths. To bring in the system parts flatpak uses bind mounts.

Conclusions

Once you get the hang of how snaps work, packaging becomes quite straightforward, however currently there are still some drawbacks

  • the snap package results in 120MB compared to a 12KB deb. This is actually a feature as shipping all the dependencies makes the snap installable on every linux distribution. However I hope that we can get this down by introducing shared frameworks (like GTK3, Python) that do not have to be included in the app package.
  • [Update: fixed] Due to another issue, your snap has only the C locale available and thus will not be localized.
  • [Update: fixed] Unity desktop notifications do not work. You will get a DBus exception at the corresponding call.
  • [Update: fixed] The shipped .desktop file is not hooked up with the system, so you can only launch the app via the command line.