Channel: Woboq

Verdigris: Qt without moc


Verdigris is a header-only library that can be used with Qt. It uses macros to create a QMetaObject that is binary compatible with Qt's own QMetaObject without requiring moc. In other words, you can use Verdigris macros in your Qt or QML application instead of some of the Qt macros and then you do not need to run moc.

A verdigris statue

TL;DR: Github repository - Tutorial

Introduction

CopperSpice is a fork of Qt 4. Its main raison d'être is to get rid of moc, which its authors consider harmful (wrongly, in my opinion). To do so, they replaced the user-friendly Qt macros with less friendly ones.

However, CopperSpice is a full fork of Qt, meaning its authors maintain the entire library and are recreating an ecosystem from scratch. Had they made it binary compatible with Qt, they could have removed moc without having to maintain the full Qt library. That is exactly what Verdigris does.

Another problem of CopperSpice compared to Qt is that it generates and registers the QMetaObject at run time, when the application is loaded. Verdigris instead uses constexpr to generate the QMetaObject at compile time. For this reason, binaries using CopperSpice are much bigger than binaries using Qt (with moc or Verdigris) and also take more time to load because of the massive number of relocations.

Previous work

Most of the groundwork is based on the code I wrote for my previous blog post: Can Qt's moc be replaced by C++ reflection?. In that post, I explored whether reflection could help replace moc while keeping the convenience of the current Qt macros. The goal was to affect source compatibility as little as possible.

CopperSpice decided to use different macros that are less convenient. The Verdigris macros are based on, and improve upon, the CopperSpice ones.

Differences between CopperSpice and Verdigris

                        Pure Qt                   Verdigris                        CopperSpice
Requires moc            Yes                       No                               No
Usage                   Convenient macros         Ugly macros                      Ugly macros
Qt compatibility        Obviously                 Yes                              No
MetaObject generation   Compile time (by moc)     Compile time (by the compiler)   Run time (at load time)
MetaObject location     Shared read-only memory   Shared read-only memory          Heap

Macros

Object declaration:
  • Qt: Q_OBJECT
  • Verdigris: W_OBJECT(MyClass) in the class, plus W_OBJECT_IMPL(MyClass) in the .cpp file
  • CopperSpice: CS_OBJECT(MyClass)

Public slot:
  • Qt: public slots: void mySlot(int x);
  • Verdigris: void mySlot(int x); W_SLOT(mySlot)
  • CopperSpice: CS_SLOT_1(Public, void mySlot(int x)) CS_SLOT_2(mySlot)

Signal:
  • Qt: signals: void mySignal(int x);
  • Verdigris: void mySignal(int x) W_SIGNAL(mySignal, x)
  • CopperSpice: CS_SIGNAL_1(Public, void mySignal(int x)) CS_SIGNAL_2(mySignal, x)

Property:
  • Qt: Q_PROPERTY(int myProperty WRITE setProp READ getProp NOTIFY propChanged)
  • Verdigris: W_PROPERTY(int, myProperty WRITE setProp READ getProp NOTIFY propChanged)
  • CopperSpice: CS_PROPERTY_READ(myProperty, getProp) CS_PROPERTY_WRITE(myProperty, setProp) CS_PROPERTY_NOTIFY(myProperty, propChanged)

Private slot:
  • Qt: private slots: void myPrivateSlot(int x);
  • Verdigris: void myPrivateSlot(int x); W_SLOT(myPrivateSlot, (int), W_Access::Private)
  • CopperSpice: CS_SLOT_1(Private, void myPrivateSlot(int x)) CS_SLOT_OVERLOAD(myPrivateSlot, (int))

The first difference in Verdigris is the W_OBJECT_IMPL macro that needs to be written in the .cpp file. This is one of the few points on which Verdigris is less convenient than CopperSpice, which does not need such a macro.

In CopperSpice, you cannot define a slot inline in the class definition. You do not have this restriction with Verdigris. (Update: I was told it is possible in CopperSpice by putting the body within the CS_SLOT_1 macro.)

Both CopperSpice and Verdigris support templated QObject classes and nested QObjects. Verdigris, however, cannot have function-local QObjects, because the static member staticMetaObject is required and local classes cannot have static members.

From an implementation point of view, CopperSpice macros use __LINE__ to build a unique identifier, which means that two macros cannot be placed on the same line. So you cannot declare several slots on one line, or declare properties or signals/slots from within another macro (which, ironically, is one of the "problems" they raised about Qt 4's moc). Verdigris's macros do not have this problem.

Tutorial

The best way to learn about how to use Verdigris is to read through the tutorial (conveniently brought to you through our Code Browser).

Benchmarks

All benchmarks were done with CopperSpice 1.2.2, Qt 5.6.0 or Qt 4.8.3, and GCC 6.1.

KitchenSink

I made the KitchenSink example from CopperSpice compile with CopperSpice, with Qt 5 and moc, and with Verdigris (patch). This table shows the time, in minutes:seconds, taken by make -j1:

                   Qt 5 (moc)   Verdigris   CopperSpice
Compilation time   1:57         1:26        16:43
Binary size        1.32 MB      1.36 MB     115 MB

I was surprised to see that Verdigris compiles faster than moc. With moc, the cost of compiling the generated code in a separate compilation unit is what makes it slower; including the generated file in the .cpp is a common way to speed up compilation, which was not done in this case. CopperSpice is probably so slow because each translation unit needs to re-run the code that generates the meta objects for all the included objects (including the headers from CsCore, CsGui, ...). Verdigris, by contrast, moves most of the slow-to-compile code into the W_OBJECT_IMPL macro in the .cpp file, where it is only parsed for the corresponding translation unit. Still, the tests take a very long time to compile; that is because they have many objects with lots of special methods and properties, and we are probably hitting non-linear algorithms within the compiler.

Library loading time

Any program that links to a C++ library has some overhead because of the relocations and the init section. This benchmark simply links an almost empty program with the libraries and measures the time it takes to run.

CopperSpice (CsCore, CsGui)          56 ms
Qt 4 (QtCore, QtGui)                 16 ms
Qt 5 (Qt5Core, Qt5Gui, Qt5Widgets)   17 ms

Loading CopperSpice is much slower because all the MetaObjects need to be created at load time, and because of the huge number of relocations.
Note: Since the program is empty and has no QObject of its own, neither moc nor Verdigris was used. Qt 5 and Qt 4 themselves were compiled with moc as usual.

Signals/Slots run-time performance

I built the Qt benchmarks that test connecting and emitting Qt signals. There is no difference between Verdigris and normal Qt, as expected, since the QMetaObject structure is the same and the moc-generated code is equivalent to the templated code.

CopperSpice is faster for signal emission because they inlined everything, including QMetaObject::activate. This is something we do not do in Qt because we want to maintain binary compatibility, and inlining everything means we cannot change much of the data structures used until the next major version of Qt. It also contributes largely to the code size, which is two orders of magnitude more than with Qt.

Implementations details

As said previously, most of the constexpr code was based on what I wrote for the previous research. I had to replace std::tuple with another data structure because std::tuple turned out to be way too slow to compile. GNU libstdc++'s implementation of std::tuple performs about 16 template recursions per parameter just to compute the noexcept clause of its copy constructor. That is quite limiting when the default template recursion limit is 256 (so at most 16 types in a tuple). The compiler also seems to have operations of quadratic complexity in the number of template parameters or instantiations. I therefore made my own binary tree structure that compiles in a reasonable time. Instead of having tuple<T1, T2, T3, T4, T5, ...> we have Node<Node<Node<Leaf<T1>,Leaf<T2>>,Node<Leaf<T3>,Leaf<T4>>>, ...>.
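A minimal sketch of such a type tree (simplified names for illustration, not Verdigris's actual code):

```cpp
// Leaf wraps a single type; Node joins two subtrees. A tree of N types only
// needs O(log N) template nesting depth, unlike a naive recursive tuple
// which recurses once per element.
template<typename T> struct Leaf {};
template<typename L, typename R> struct Node {};

// Count the types stored in a tree by recursing along its shape.
template<typename T> struct Count;
template<typename T> struct Count<Leaf<T>> { static constexpr int value = 1; };
template<typename L, typename R> struct Count<Node<L, R>> {
    static constexpr int value = Count<L>::value + Count<R>::value;
};

// The equivalent of tuple<int, char, long, float>, with nesting depth 2:
using Tree = Node<Node<Leaf<int>, Leaf<char>>, Node<Leaf<long>, Leaf<float>>>;
static_assert(Count<Tree>::value == 4, "four types in the tree");
```

Algorithms over the tree (counting, indexing, concatenating) recurse over the two halves, so the instantiation depth grows logarithmically rather than linearly with the number of types.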

Nonetheless, the whole thing relies on some pretty heavy template and macro tricks. While working on it I found and reported, or even fixed, compiler bugs in Clang [1], [2], [3] and GCC [1], [2], [3].

Conclusions

In conclusion, as CopperSpice had already shown, moc is not strictly necessary for Qt features. The trade-off is the simplicity of the macros: the alternative macros from CopperSpice and Verdigris are less convenient and force you to repeat yourself. Complex template code can also increase compilation time, even more than the overhead of moc.

On the other hand, we think the approach of CopperSpice was the wrong one. By forking Qt instead of being compatible with it, they gave up the whole Qt ecosystem, and the small CopperSpice team will never be able to keep up with the improvements made within Qt. (CopperSpice is a fork of Qt 4; it is way behind what Qt 5 now has.)

Verdigris is a header-only library consisting of two headers that can easily be imported into a project by anyone who has reasons not to use moc. You can get the files from the GitHub repository.


QIcon::fromTheme uses GTK+'s icon cache in Qt 5.7


When you pass a name to QIcon::fromTheme, Qt needs to look through the different folders in order to know which theme contains the given icon, and at which size. This can mean a lot of disk access just to find out whether a file exists. Applications such as KMail can have hundreds of icons for their menu actions.
In the KF5 port of KMail, 30% of its start-up time was spent loading those icons.

With Qt 5.7, the GTK+ icon cache is used to speed up the loading of icons.

Freedesktop icon themes

QIcon::fromTheme loads an icon as specified by the Freedesktop Icon Theme Specification and Icon Naming Specification, which define the icon themes used by Linux desktops. In summary, an icon theme consists of many folders containing the icons in PNG or SVG format. Each folder contains the icons for a particular size and type. The type might be one of "mimetypes", "actions", or "apps"; the sizes usually cover all the common ones. This amounts to quite a lot of directories per theme. On my computer, GNOME's default theme has 106 folders, the Oxygen theme has 166, Breeze has 56, and hicolor has 483.
A theme can also specify one or several fallback themes. Icons which cannot be found in a given theme are looked up in the fallback themes until an icon is found. The last-resort fallback theme is always "hicolor".

Icon names can contain dashes (-). The dash separates levels of specificity. For example, if the name to look up is "input-mouse-usb" and that icon does not exist in the theme, "input-mouse" will be looked up instead, and so on until an icon is found.
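The fallback can be sketched as a small loop (illustrative only; resolveIconName and iconExists are hypothetical names, not Qt API):

```cpp
#include <functional>
#include <string>

// Sketch of the dash-fallback lookup from the Icon Naming Specification:
// "input-mouse-usb" falls back to "input-mouse", then "input".
// iconExists stands in for the (expensive) per-theme file lookup; an empty
// return means no icon was found at any level of specificity.
std::string resolveIconName(std::string name,
                            const std::function<bool(const std::string&)>& iconExists)
{
    while (true) {
        if (iconExists(name))
            return name;
        const auto dash = name.rfind('-');
        if (dash == std::string::npos)
            return {};        // not even the most generic name exists
        name.erase(dash);     // drop the most specific component
    }
}
```

Each failed level of specificity means another round of file-existence checks across every directory of every theme in the fallback chain, which is exactly the cost the cache described below removes.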

An icon theme

The nature of these themes and the way icons are looked up implies that the icon engine needs to check many directories for the existence of files. This is especially true for names with many dashes, which might not exist at all. This implies a lot of stat() calls on the hard drive.

Icon Theme Cache

For the above reasons, desktop environments have been using caches to speed up icon loading. In KDE 4, KDE's KIconLoader used a shared-memory-mapped pixmap cache in which the loaded icon data was shared across processes (KPixmapCache). When an application requested an icon of a given size, the cache was queried to see whether that icon at that size was available. This gave good results at the time in KDE applications, but it cannot be used by pure Qt applications. When running KDE's Plasma desktop, the platform theme integration plugin can use KIconLoader to load icons, but that only works when running on Plasma. This kind of caching also does not work well with QIcon::fromTheme, because we need to know whether a QIcon is null or not, which forces a lookup regardless of the cache.

GTK+ also speeds up icon lookup using a cache. It has a tool (gtk-update-icon-cache) that generates an icon-theme.cache file for each theme. This file contains a hash table for fast lookup, telling whether an icon is in the theme and at which sizes. The icon cache is system-wide and usually generated by the distribution's package manager.

icon-theme.cache

Qt can now use GTK+ cache files

The question I asked myself was: how can the performance of QIcon::fromTheme be improved? It is clear that we need a cache. Should we try to use the KDE pixmap cache? I decided against it because that cache is, in my opinion, at the wrong level: we do not want to cache the icon data, we want to cache a way to get from the icon name to the right file. Since that cache also stores other assets, we might get cache misses more often than we should.

The GTK+ cache solves the right problem: it is system-wide, and most Linux distributions already generate it. So I thought we should just re-use the same cache.

I went ahead and "reverse engineered" the GTK+ icon cache format in order to implement it in Qt. Having GTK+'s source code available to browse clearly helped.

Does that mean that Qt now depends on GTK+?

No, the use of this cache is optional. If the cache is not found or is out of date, Qt ignores it and does the full lookups as before.

How do I make sure the cache is used?

Every time you install an application that adds or removes icons from one of the themes, the cache needs to be updated. Qt looks at the modification time of the directories; if any directory is newer than the cache, the cache is not used. You need to run gtk-update-icon-cache after installing new icons, but most distributions take care of this for you.
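The freshness test reduces to comparing timestamps; a sketch of the logic (a hypothetical helper, not Qt's actual code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// The cache is trustworthy only if it is at least as new as every theme
// directory it indexes; otherwise icons may have been added or removed since
// it was generated. Timestamps are plain integers here (e.g. seconds since
// the epoch, as a stat() call would report).
bool cacheIsFresh(std::int64_t cacheMTime,
                  const std::vector<std::int64_t>& themeDirMTimes)
{
    return std::all_of(themeDirMTimes.begin(), themeDirMTimes.end(),
                       [&](std::int64_t dirMTime) { return dirMTime <= cacheMTime; });
}
```

A single directory touched after the cache was generated is enough to fall back to the full lookup.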

How do I update the cache?

Run this command for every theme:

gtk-update-icon-cache /usr/share/icons/<theme-name> -i -t -f

The -i option excludes the image data from the cache (Qt does not make use of it).

Conclusion

Thank you for reading this far. In summary, Qt 5.7 will automatically use the GTK+ icon caches. Browse the source of QIconCacheGtkReader if you are interested.

QReadWriteLock gets faster in Qt 5.7


Already in Qt 5.0, QMutex was revamped to be fast. In the non-contended case, locking and unlocking is basically a single atomic instruction, and no memory is allocated, which makes it really lightweight. QReadWriteLock, however, did not get the same optimizations. Now, in Qt 5.7, it gets on par with QMutex.

QReadWriteLock

QReadWriteLock serves roughly the same purpose as QMutex: protecting a critical section. The difference is that QReadWriteLock offers two locking modes, read and write. This allows several readers to access the critical section at the same time and can therefore be more efficient than a QMutex. At least that was the intention. The problem is that, before Qt 5.7, QReadWriteLock's implementation was not as optimized as QMutex's: QReadWriteLock internally locked and unlocked a QMutex on every call (read or write). So QReadWriteLock was in fact slower than QMutex, unless the read section was held for a very long time under contention.

For example, the internals of QMetaType were using a QReadWriteLock for the QMetaType database. This makes sense because that database is read very often (every time a QVariant is created or operated on) and very seldom written (only when a new type is registered, the first time it is used). However, the QReadWriteLock read-locking was so slow that it took a significant amount of time in some QML applications that use lots of QVariants, for example with Qt3D.
It was even proposed to replace the QReadWriteLock with a QMutex within QMetaType, which would have saved 40% of the time of QVariant creation. This was not necessary, because I improved QReadWriteLock in Qt 5.7 to make it at least as fast as QMutex.

QMutex

QMutex itself is already quite efficient. I described the internals of QMutex in a previous article. Here is a reminder of the important aspects of QMutex:

  • sizeof(QMutex) == sizeof(void*), without any additional allocation.
  • The non-contended case is basically only an atomic operation for lock or unlock
  • In case we need to block, fallback to pthread or native locking primitives

QReadWriteLock in Qt 5.7

I optimized QReadWriteLock to bring it on par with QMutex, using the same implementation principle.

QReadWriteLock has only one member: a QAtomicPointer named d_ptr. Depending on the value of d_ptr, the lock is in one of the following states:

  • When d_ptr == 0x0 (all bits are 0): unlocked and non-recursive. No readers or writers are holding or waiting on it.
  • When d_ptr & 0x1 (the least significant bit is set): one or several readers currently hold the lock. No writers are waiting and the lock is non-recursive. The number of readers is (d_ptr >> 4) + 1.
  • When d_ptr == 0x2: the lock is held for write and nobody is waiting.
  • In any other case, when the two least significant bits are 0 but the remaining bits contain data, d_ptr points to a QReadWriteLockPrivate object, which means either that the lock is recursive, or that it is currently locked and threads are possibly waiting. The QReadWriteLockPrivate has a condition variable allowing it to block or wake threads.

In other words, the two least significant bits encode the state. When d_ptr is a pointer to a QReadWriteLockPrivate, those two bits are always 0, since pointers must be aligned to 32- or 64-bit addresses for the CPU to use them.

This table recaps the state depending on the two least significant bits:

  • 00: if d_ptr is entirely 0, the lock is unlocked; otherwise, d_ptr is a pointer to a QReadWriteLockPrivate.
  • 01: one or several readers currently hold the lock. The number of readers is (d_ptr >> 4) + 1.
  • 10: one writer holds the lock and nobody is waiting.

We therefore define a few constants to help us read the code.

enum {
    StateMask = 0x3,
    StateLockedForRead = 0x1,
    StateLockedForWrite = 0x2,
};
const auto dummyLockedForRead = reinterpret_cast<QReadWriteLockPrivate *>(quintptr(StateLockedForRead));
const auto dummyLockedForWrite = reinterpret_cast<QReadWriteLockPrivate *>(quintptr(StateLockedForWrite));
inline bool isUncontendedLocked(const QReadWriteLockPrivate *d)
{ return quintptr(d) & StateMask; }

Aside: The code assumes that the null pointer value is equal to binary 0, which is not guaranteed by the C++ standard, but holds true on every supported platform.
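The encoding arithmetic can be exercised without Qt; here is a sketch with hypothetical helpers mirroring the constants above (encodeReaders and decodeReaders are illustrative names, not Qt functions):

```cpp
#include <cstdint>

// The two low bits of d_ptr are a state tag; in the read-locked state, the
// bits from bit 4 upward store (readerCount - 1).
namespace sketch {
enum : std::uintptr_t {
    StateMask           = 0x3,
    StateLockedForRead  = 0x1,
    StateLockedForWrite = 0x2,
};

constexpr std::uintptr_t encodeReaders(unsigned readers)  // requires readers >= 1
{ return (std::uintptr_t(readers - 1) << 4) | StateLockedForRead; }

// Only meaningful when (d & StateMask) == StateLockedForRead.
constexpr unsigned decodeReaders(std::uintptr_t d)
{ return unsigned(d >> 4) + 1; }
}
```

Adding or removing a reader is then a single atomic add or subtract of 1 << 4, which is what the tryLockForRead and unlock code below does directly on the pointer value.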

lockForRead

The really fast case happens when there is no contention: if we can atomically swap d_ptr from 0 to StateLockedForRead, we have the lock and there is nothing else to do. If there already are readers, we need to atomically increase the reader count. If a writer already holds the lock, we need to block. In order to block, we assign a QReadWriteLockPrivate and wait on its condition variable. We call QReadWriteLockPrivate::allocate(), which pops an unused QReadWriteLockPrivate from a lock-free stack (or allocates a new one if the stack is empty). Indeed, we can never free any QReadWriteLockPrivate, as another thread might still hold a pointer to it and dereference it. So when we release a QReadWriteLockPrivate, we put it on a lock-free stack.
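Such a freelist can be sketched as a lock-free (Treiber) stack. This is illustrative and simpler than Qt's actual QFreeList, which hands out indexes into a pool and thereby also sidesteps the ABA problem a raw pointer stack has under concurrent pop/push:

```cpp
#include <atomic>

// release() pushes a no-longer-used private onto a lock-free stack;
// allocate() pops one, or heap-allocates when the stack is empty. Nodes are
// never freed, which is why a stale pointer held by another thread stays
// dereferenceable.
struct Private {
    Private* next = nullptr;
    // ... the real object holds the mutex, condition variable, counters ...
};

std::atomic<Private*> freeList{nullptr};

Private* allocate() {
    Private* head = freeList.load(std::memory_order_acquire);
    while (head && !freeList.compare_exchange_weak(head, head->next,
                                                   std::memory_order_acquire)) {
        // CAS failed: head was reloaded with the current top, try again
    }
    return head ? head : new Private;
}

void release(Private* p) {
    p->next = freeList.load(std::memory_order_relaxed);
    while (!freeList.compare_exchange_weak(p->next, p,
                                           std::memory_order_release)) {
        // CAS failed: p->next was reloaded with the current top, try again
    }
}
```

The key property for the locking code is the never-freed guarantee: a thread that loses a race and locks the mutex of an already-released private dereferences valid memory and simply retries, as the comments in tryLockForRead explain.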

lockForRead actually calls tryLockForRead(-1), passing -1 as the timeout means "wait forever until we get the lock".

Here is the slightly edited code. (original)

bool QReadWriteLock::tryLockForRead(int timeout)
{
    // Fast case: non contended:
    QReadWriteLockPrivate *d;
    if (d_ptr.testAndSetAcquire(nullptr, dummyLockedForRead, d))
        return true;

    while (true) {
        if (d == 0) {
            if (!d_ptr.testAndSetAcquire(nullptr, dummyLockedForRead, d))
                continue;
            return true;
        }

        if ((quintptr(d) & StateMask) == StateLockedForRead) {
            // locked for read, increase the counter
            const auto val = reinterpret_cast<QReadWriteLockPrivate *>(quintptr(d) + (1U<<4));
            if (!d_ptr.testAndSetAcquire(d, val, d))
                continue;
            return true;
        }

        if (d == dummyLockedForWrite) {
            if (!timeout)
                return false;

            // locked for write, assign a d_ptr and wait.
            auto val = QReadWriteLockPrivate::allocate();
            val->writerCount = 1;
            if (!d_ptr.testAndSetOrdered(d, val, d)) {
                val->writerCount = 0;
                val->release();
                continue;
            }
            d = val;
        }
        Q_ASSERT(!isUncontendedLocked(d));
        // d is an actual pointer;

        if (d->recursive)
            return d->recursiveLockForRead(timeout);

        QMutexLocker lock(&d->mutex);
        if (d != d_ptr.load()) {
            // d_ptr has changed: this QReadWriteLock was unlocked before we had
            // time to lock d->mutex.
            // We are holding a lock to a mutex within a QReadWriteLockPrivate
            // that is already released (or even is already re-used). That's ok
            // because the QFreeList never frees them.
            // Just unlock d->mutex (at the end of the scope) and retry.
            d = d_ptr.loadAcquire();
            continue;
        }
        return d->lockForRead(timeout);
    }
}

lockForWrite

Exactly the same principle as lockForRead, but we also block if there are readers holding the lock.

bool QReadWriteLock::tryLockForWrite(int timeout)
{
    // Fast case: non contended:
    QReadWriteLockPrivate *d;
    if (d_ptr.testAndSetAcquire(nullptr, dummyLockedForWrite, d))
        return true;

    while (true) {
        if (d == 0) {
            if (!d_ptr.testAndSetAcquire(d, dummyLockedForWrite, d))
                continue;
            return true;
        }

        if (isUncontendedLocked(d)) {
            if (!timeout)
                return false;

            // locked for either read or write, assign a d_ptr and wait.
            auto val = QReadWriteLockPrivate::allocate();
            if (d == dummyLockedForWrite)
                val->writerCount = 1;
            else
                val->readerCount = (quintptr(d) >> 4) + 1;
            if (!d_ptr.testAndSetOrdered(d, val, d)) {
                val->writerCount = val->readerCount = 0;
                val->release();
                continue;
            }
            d = val;
        }
        Q_ASSERT(!isUncontendedLocked(d));
        // d is an actual pointer;

        if (d->recursive)
            return d->recursiveLockForWrite(timeout);

        QMutexLocker lock(&d->mutex);
        if (d != d_ptr.load()) {
            // The mutex was unlocked before we had time to lock the mutex.
            // We are holding to a mutex within a QReadWriteLockPrivate that is already released
            // (or even is already re-used) but that's ok because the QFreeList never frees them.
            d = d_ptr.loadAcquire();
            continue;
        }
        return d->lockForWrite(timeout);
    }
}

unlock

The API has a single unlock for both read and write, so we do not know whether we are unlocking from reading or writing. Fortunately, we can tell from the state encoded in the lower bits. If we were locked for read, we need to decrease the reader count, or set the state to 0x0 if we were the last reader. If we were locked for write, we need to set the state to 0x0. If there is a QReadWriteLockPrivate, we need to update the data there and possibly wake up the blocked threads.

void QReadWriteLock::unlock()
{
    QReadWriteLockPrivate *d = d_ptr.load();
    while (true) {
        Q_ASSERT_X(d, "QReadWriteLock::unlock()", "Cannot unlock an unlocked lock");

        // Fast case: no contention: (no waiters, no other readers)
        if (quintptr(d) <= 2) { // 1 or 2 (StateLockedForRead or StateLockedForWrite)
            if (!d_ptr.testAndSetRelease(d, nullptr, d))
                continue;
            return;
        }

        if ((quintptr(d) & StateMask) == StateLockedForRead) {
            Q_ASSERT(quintptr(d) > (1U<<4)); //otherwise that would be the fast case
            // Just decrease the reader's count.
            auto val = reinterpret_cast<QReadWriteLockPrivate *>(quintptr(d) - (1U<<4));
            if (!d_ptr.testAndSetRelease(d, val, d))
                continue;
            return;
        }

        Q_ASSERT(!isUncontendedLocked(d));

        if (d->recursive) {
            d->recursiveUnlock();
            return;
        }

        QMutexLocker locker(&d->mutex);
        if (d->writerCount) {
            Q_ASSERT(d->writerCount == 1);
            Q_ASSERT(d->readerCount == 0);
            d->writerCount = 0;
        } else {
            Q_ASSERT(d->readerCount > 0);
            d->readerCount--;
            if (d->readerCount > 0)
                return;
        }

        if (d->waitingReaders || d->waitingWriters) {
            d->unlock();
        } else {
            Q_ASSERT(d_ptr.load() == d); // should not change when we still hold the mutex
            d_ptr.storeRelease(nullptr);
            d->release();
        }
        return;
    }
}

Benchmarks

Here is the benchmark that was run: https://codereview.qt-project.org/167113/. The benchmark was run with Qt 5.6.1 and GCC 6.1.1. What I call Qt 5.7 below is in fact Qt 5.6 plus the QReadWriteLock patch, so the comparison isolates this patch.

Uncontended

This benchmark compares different types of locks by having a single thread run a loop 1,000,000 times, locking and unlocking the mutex and doing nothing else.
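The shape of that loop, in standard C++ with std::mutex standing in for the Qt lock types (an illustrative reconstruction, not the benchmark's actual code):

```cpp
#include <chrono>
#include <mutex>

// One thread, lock/unlock in a tight loop, nothing else: this measures the
// pure uncontended cost of the locking primitive.
long long uncontendedLoopMs(int iterations)
{
    std::mutex mutex;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        mutex.lock();
        mutex.unlock();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```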

QReadWriteLock (Qt 5.6)    38 ms   ███████████████████
QReadWriteLock (Qt 5.7)    18 ms   █████████
QMutex                     16 ms   ████████
std::mutex                 18 ms   █████████
std::shared_timed_mutex    33 ms   ████████████████▌

Contended Reads

This benchmark runs as many threads as there are logical cores (4 in my case). Each thread locks and unlocks the same mutex 1,000,000 times. We do a small amount of work inside and outside the lock. If no other work was done at all and the threads were only locking and unlocking, we would put huge pressure on the mutex, but this would not be a fair benchmark. So this benchmark does a hash lookup inside the lock and a string allocation outside of it. The more work is done inside the lock, the more QMutex is disadvantaged compared to QReadWriteLock, because threads are blocked for a longer time.

QReadWriteLock (Qt 5.6)   812 ms   ████████████████████▍
QReadWriteLock (Qt 5.7)   285 ms   ███████▏
QMutex                    398 ms   ██████████
std::mutex                489 ms   ████████████▎
std::shared_timed_mutex   811 ms   ████████████████████▎

Futex Version

On platforms that have futexes, QMutex does not even need a QMutexPrivate; it uses the futex itself to hold the lock. We could do the same with QReadWriteLock. I made an implementation of QReadWriteLock using futexes (in fact, I made it before the generic version). But it is not in Qt 5.7 and is not yet merged in Qt; perhaps it will land in a future version if I find the motivation and time to get it merged.

Could we get even faster?

As always, nothing is perfect and there is still room for improvement. A flaw of this implementation is that all the readers still need to perform an atomic write to the same memory location (in order to increment the reader count). This causes cache contention when there are many reader threads. For cache performance, we would not want the readers to write to the same memory location. Such implementations are possible and would make the contended case faster, but they would take more memory and might be slower in the non-contended case.
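One way to avoid that shared write is to stripe the reader count across cache lines. A sketch of the idea (illustrative only, not a complete lock and not Qt code):

```cpp
#include <atomic>
#include <cstddef>

// Each reader slot lives on its own cache line, so concurrent readers never
// write to the same location. The price is memory (one line per slot) and a
// writer that must scan, and synchronize with, every slot.
constexpr std::size_t kSlots = 8;
struct alignas(64) ReaderSlot { std::atomic<int> readers{0}; };
ReaderSlot slots[kSlots];

// A real lock would pick the slot from a hash of the thread id.
void readLock(std::size_t slot)   { slots[slot].readers.fetch_add(1, std::memory_order_acquire); }
void readUnlock(std::size_t slot) { slots[slot].readers.fetch_sub(1, std::memory_order_release); }

// What a writer would have to check before taking the lock exclusively.
int totalReaders() {
    int total = 0;
    for (const auto& s : slots)
        total += s.readers.load(std::memory_order_acquire);
    return total;
}
```

This trades reader-side cache contention for writer-side cost and extra memory, which matches the trade-off described above.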

Conclusion

These benchmarks show the huge improvement of QReadWriteLock in Qt 5.7. The Qt classes compare favorably with their libstdc++ equivalents; std::shared_timed_mutex, which would be the standard equivalent of QReadWriteLock, is surprisingly slow. (I heard rumors that it might get better.)
QReadWriteLock is optimized for the usual Qt case of relatively low contention. It takes a very small amount of memory, which makes it a pretty decent implementation of a read-write lock.

In summary, you can now use QReadWriteLock whenever there are many reads and seldom writes. This applies only to non-recursive mutexes: recursive mutexes are always slower and should be avoided, not only because they are slower, but also because they are harder to reason about.

Two C++ tricks used in Verdigris implementation

A verdigris statue

I have just tagged version 1.0 of Verdigris. I am taking this opportunity to write an article about two C++ tricks used in its implementation.

Verdigris is a header-only C++ library which lets one use Qt without the need for moc. I wrote an introductory blog post about it two years ago and have since received several contributions on GitHub that extend the software.

Optionally Removing Parentheses in a Macro

This trick is used in the W_PROPERTY and W_OBJECT_IMPL macros. The first argument of W_PROPERTY is a type, typically: W_PROPERTY(QString, myProperty MEMBER m_myProperty).
But what happens when the type contains one or several commas, as in W_PROPERTY(QMap<QString, int>, myProperty MEMBER m_myProperty)? That is not valid: macro expansion does not consider templates, so the first argument would only extend up to the first comma. The solution is to put the type name in parentheses. The new problem is then how to remove the parentheses in the implementation of the macro.

Let's rephrase the problem with simplified macros. Imagine we want to do a macro similar to this:

// Naive implementation of a macro that declares a getter function
#define DECLARE_GETTER(TYPE, NAME) TYPE get_##NAME()

// Can be used like this
DECLARE_GETTER(QString, property1);   // line A
// OK: expands to "QString get_property1()"

// But this does not work:
DECLARE_GETTER(QMap<QString, int>, property2);
// ERROR: 3 arguments passed to the macro, but only 2 expected

// And this
DECLARE_GETTER((QMap<QString, int>), property3);  // line B
// ERROR: expands to "(QMap<QString, int>) get_property3()"
// Can we get rid of the parenthesis?

The question is: how can we implement DECLARE_GETTER so that both line A and line B produce the expected result? Can we get the macro to remove the parentheses?

Let's make a first attempt:

// REMOVE_PAREN will be our macro that removes the parenthesis
#define DECLARE_GETTER(TYPE, NAME) REMOVE_PAREN(TYPE) get_##NAME()

// Forward to REMOVE_PAREN_HELPER
#define REMOVE_PAREN(A) REMOVE_PAREN_HELPER A
#define REMOVE_PAREN_HELPER(...) __VA_ARGS__

DECLARE_GETTER((QMap<QString, int>), property1);
// OK: expands to "QMap<QString, int> get_property1()"
// This worked because "REMOVE_PAREN_HELPER (QMap<QString, int>)" was expanded to "QMap<QString, int>"

DECLARE_GETTER(QString, property2);
// ERROR: expands to "REMOVE_PAREN_HELPER QString get_property2()"
// There were no parentheses after REMOVE_PAREN_HELPER, so it was not expanded as a macro

We managed to remove the parentheses, but we broke the case where there are none. This leads to a sub-question: how do we remove a specific token if it is present? In this case, how do we remove "REMOVE_PAREN_HELPER" from the expansion?

// Same as before
#define DECLARE_GETTER(TYPE, NAME) REMOVE_PAREN(TYPE) get_##NAME()

// Macro that removes the first argument
#define TAIL(A, ...) __VA_ARGS__

// This time, we add a "_ ," in front of the arguments
#define REMOVE_PAREN_HELPER(...) _ , __VA_ARGS__

#define REMOVE_PAREN(A) REMOVE_PAREN2(REMOVE_PAREN_HELPER A)

#define REMOVE_PAREN2(...) TAIL(REMOVE_PAREN_HELPER_##__VA_ARGS__)
// The ## "glues" the first token of the arguments onto "REMOVE_PAREN_HELPER_".
// The first token is:
//  - "_" if REMOVE_PAREN_HELPER was expanded, in which case we get
//    "REMOVE_PAREN_HELPER__", which will be removed by the TAIL macro; or
//  - "REMOVE_PAREN_HELPER" if it was not expanded, in which case we now have
//    "REMOVE_PAREN_HELPER_REMOVE_PAREN_HELPER"

// So we define a macro so that it will be removed by the TAIL macro
#define REMOVE_PAREN_HELPER_REMOVE_PAREN_HELPER _,

The above code should give you an idea of how things work, but it is not working yet: we need to add a few layers of indirection so that all macro arguments get expanded.

Here is the real code from Verdigris:

#define W_MACRO_MSVC_EXPAND(...) __VA_ARGS__
#define W_MACRO_DELAY(X,...) W_MACRO_MSVC_EXPAND(X(__VA_ARGS__))
#define W_MACRO_DELAY2(X,...) W_MACRO_MSVC_EXPAND(X(__VA_ARGS__))
#define W_MACRO_TAIL(A, ...) __VA_ARGS__

#define W_MACRO_REMOVEPAREN(A) W_MACRO_DELAY(W_MACRO_REMOVEPAREN2, W_MACRO_REMOVEPAREN_HELPER A)
#define W_MACRO_REMOVEPAREN2(...) W_MACRO_DELAY2(W_MACRO_TAIL, W_MACRO_REMOVEPAREN_HELPER_##__VA_ARGS__)
#define W_MACRO_REMOVEPAREN_HELPER(...) _ , __VA_ARGS__
#define W_MACRO_REMOVEPAREN_HELPER_W_MACRO_REMOVEPAREN_HELPER ,

#define DECLARE_GETTER(TYPE, NAME) W_MACRO_REMOVEPAREN(TYPE) get_##NAME()

// And now it works as expected:
DECLARE_GETTER(QString, property1);
DECLARE_GETTER((QMap<QString, int>), property2);

Note that W_MACRO_MSVC_EXPAND is only there to work around an MSVC preprocessor bug.

Building a constexpr State in a Class from a Macro

Conceptually, this is what the macros do:

class Foo : public QObject {
   W_OBJECT(Foo) // init the state
   int xx();
   W_INVOKABLE(xx) // add things to the state
   int yy();
   W_INVOKABLE(yy) // add more things
};
W_OBJECT_IMPL(Foo); // Do something with the state

But what's the state? How do we represent it?
The idea is to have a static function (let's call it w_state) whose return value contains the state. Each W_INVOKABLE macro then expands to a new definition of that function. Of course, each definition needs to take a different argument type, so each one just declares a new overload. We do this with a w_number<N> class template, which inherits from w_number<N-1>. (This idea is basically inspired by CopperSpice's cs_counter, which its authors described in a talk at CppCon 2015.)

Here is a simplified expanded version:

template<int N> struct w_number : public w_number<N - 1> {
    static constexpr int value = N;
    static constexpr w_number<N-1> prev() { return {}; }
};
// Specialize for 0 to break the recursion.
template<> struct w_number<0> { static constexpr int value = 0; };


class Foo {
public:
    // init the state  (expanded from W_OBJECT)
    static constexpr tuple<> w_state(w_number<0>) { return {}; }

    int xx();

    // add &Foo::xx to the state by defining w_state(w_number<1>)
    static constexpr auto w_state(w_number<tuple_size<
                decltype(w_state(w_number<255>()))>::value + 1> n)
        -> decltype(tuple_cat(w_state(n.prev()), make_tuple(&Foo::xx)))
    { return tuple_cat(w_state(n.prev()), make_tuple(&Foo::xx)); }

    int yy();

    // add &Foo::yy to the state by defining w_state(w_number<2>)
    static constexpr auto w_state(w_number<tuple_size<
                decltype(w_state(w_number<255>()))>::value + 1> n)
        -> decltype(tuple_cat(w_state(n.prev()), make_tuple(&Foo::yy)))
    { return tuple_cat(w_state(n.prev()), make_tuple(&Foo::yy)); }
};

// Use that state
constexpr auto FooMetaObject = buildMetaObject(Foo::w_state(w_number<255>()));

This works pretty well. At the end of this simplified example, our state is a std::tuple containing &Foo::xx and &Foo::yy, from which we could build the meta object. (In the real implementation, the state is a bit more complicated and contains more data.)

This works because the call to w_state(w_number<255>()) that we use to get the size of the tuple refers to the previous declarations of w_state. Since the current function is not yet defined, the best match is the existing overload with the highest number.
Notice that we have to repeat the same expression in the decltype and in the return statement: we cannot use return type deduction because we need to call the function before the class is fully defined.

So far so good. However, I hit what I believe is a compiler bug when doing this with class templates (e.g., when Foo is itself a template). For this reason, instead of static functions, I used friend functions in Verdigris. The principle is exactly the same. It is not well known, but friend functions can be defined inline in the class while still being global functions. (I also used that fact in the implementation of Q_ENUM.)
One just needs to add an argument whose type relates to the current class. I use a pointer to a pointer to the class, rather than a plain pointer, to avoid implicit pointer conversions when the function is called with a derived class.

class Bar {
public:

    // init the state  (expanded from W_OBJECT)
    friend constexpr tuple<> w_state(Bar **, w_number<0>) { return {}; }

    int xx();

    friend constexpr auto w_state(Bar **t, w_number<tuple_size<decltype(w_state(
            static_cast<Bar**>(nullptr), w_number<255>()))>::value + 1> n)
        -> decltype(tuple_cat(w_state(t, n.prev()), make_tuple(&Bar::xx)))
    { return tuple_cat(w_state(t, n.prev()), make_tuple(&Bar::xx)); }

    int yy();

    friend constexpr auto w_state(Bar **t, w_number<tuple_size<decltype(w_state(
            static_cast<Bar**>(nullptr), w_number<255>()))>::value + 1> n)
        -> decltype(tuple_cat(w_state(t, n.prev()), make_tuple(&Bar::yy)))
    { return tuple_cat(w_state(t, n.prev()), make_tuple(&Bar::yy)); }
};

// Use that state
constexpr auto BarMetaObject = buildMetaObject(w_state(static_cast<Bar**>(nullptr), w_number<255>()));

Conclusion

Please let me know if you find Verdigris useful or want to help out, either here in the comments or by contributing directly on GitHub.

Integrating QML and Rust: Creating a QMetaObject at Compile Time


In this blog post, I would like to present a research project I have been working on: Trying to use QML from Rust, and in general, using a C++ library from Rust.

The project is a Rust crate which allows creating a QMetaObject at compile time from pure Rust code. It is available here: https://github.com/woboq/qmetaobject-rs

Qt and Rust

There are already numerous projects that attempt to integrate Qt and Rust; a great GUI toolkit should work with a great language.

As far back as 2014, the project cxx2rust tried to generate automatic bindings to C++, and in particular to Qt5. Its blog post explains all the problems. Another project that automatically generates C++ bindings for Qt is cpp_to_rust. I did not want to pursue this way of automatically creating bindings, because it cannot produce bindings that can be used from idiomatic Rust code without unsafe.

There is also qmlrs. The idea here is to manually develop a small C++ wrapper library that exposes extern "C" functions. A Rust crate with a good and safe API can then internally call these wrappers.
Similarly, the project qml-rust does approximately the same, but uses the DOtherSide bindings as the Qt wrapper library. The same bindings are used for the D and Nim bindings for QML.
These two projects concentrate only on QML, not QtWidgets or the rest of Qt. Since the API is then much smaller, this greatly simplifies the tedious work of creating the bindings manually. Both projects generate a QMetaObject at runtime from information given by Rust macros. Also, you cannot use arbitrary types for your properties or method arguments; you are limited to the built-in types they can convert.

Finally, there is Jos van den Oever's Rust Qt Binding Generator. To use this project, one has to write a JSON description of the interface one wants to expose, and the generator will then generate the Rust and C++ glue code so that you can easily call Rust from your Qt C++/QML application.
What I see as a problem is that you are still expected to write some C++ and add an additional step to your build system. That is perfectly fine if you want to add Rust to an existing C++ project, but not if you just want a GUI for a Rust application. Also, writing this JSON description feels a bit alien.

I started the qmetaobject crate mainly because I wanted to create the QMetaObject at Rust compile time. The QMetaObject is a data structure which contains all the information about a class deriving from QObject (or Q_GADGET), so the Qt runtime can connect signals with slots, or read and write properties. Normally, the QMetaObject is built at compile time from a C++ file generated by moc, Qt's meta object compiler.
I'm a fan of creating QMetaObjects: I contribute to Qt, and I also wrote moc-ng and Verdigris, which are all about creating the QMetaObject. Verdigris uses the power of C++ constexpr to create the QMetaObject at compile time, and I wanted to try using Rust to see if it could also be done at compile time.

The qmetaobject crate

The crate uses a custom derive macro to generate the QMetaObject. Custom derive works by adding an annotation in front of a Rust struct, such as #[derive(QObject)] or #[derive(QGadget)]. Upon seeing this annotation, the rustc compiler calls the function from the qmetaobject_impl crate which implements the custom derive. That function has the signature fn(input: TokenStream) -> TokenStream. It is called at compile time; it takes as input the source code of the struct it derives and generates more source code that is then compiled.
What we do in this custom derive macro is first parse the content of the struct and look for annotations. I used a set of macros such as qt_property!, qt_method!, and so on, similar to Qt's C++ macros. I could also have used custom attributes, but I chose macros as it seemed more natural coming from the Qt world (though perhaps this should be revisited).

Let's simply go over a dummy example of using the crate.

extern crate qmetaobject;
use qmetaobject::*; // For simplicity

// Deriving from QObject will automatically implement the QObject trait and
// generates QMetaObject through the custom derive macro.
// This is equivalent to add the Q_OBJECT in Qt code.
#[derive(QObject,Default)]
struct Greeter {
  // We need to specify a C++ base class. This is done by specifying a
  // QObject-like trait. Here we can specify other QObject-like traits such
  // as QAbstractListModel or QQmlExtensionPlugin.
  // The 'base' field is in fact a pointer to the C++ QObject.
  base : qt_base_class!(trait QObject),
  // We declare the 'name' property using the qt_property! macro.
  name : qt_property!(QString; NOTIFY name_changed),
  // We declare a signal. The custom derive will automatically create
  // a function of the same name that can be called to emit it.
  name_changed : qt_signal!(),
  // We can also declare invokable methods.
  compute_greetings : qt_method!(fn compute_greetings(&self, verb : String) -> QString {
      return (verb + " " + &self.name.to_string()).into()
  })
}

fn main() {
  // We then use qml_register_type as an equivalent to qmlRegisterType
  qml_register_type::<Greeter>(cstr!("Greeter"), 1, 0, cstr!("Greeter"));
  let mut engine = QmlEngine::new();
  engine.load_data(r#"
    import QtQuick 2.6; import QtQuick.Window 2.0; import Greeter 1.0;
    Window {
      visible: true;
      // We can instantiate our rust object here.
      Greeter { id: greeter; name: 'World'; }
      // and use it by accessing its property or method.
      Text { text: greeter.compute_greetings('hello'); }
    }"#.into());
  engine.exec();
}

In this example, we used qml_register_type to register the type with QML, but we can also set properties on the global context. Here is an example with a model, which also demonstrates QGadget:

// derive(QGadget) is the equivalent of Q_GADGET.
#[derive(QGadget,Clone,Default)]
struct Point {
  x: qt_property!(i32),
  y: qt_property!(i32),
}

#[derive(QObject, Default)]
struct Model {
  // Here the C++ class will derive from QAbstractListModel
  base: qt_base_class!(trait QAbstractListModel),
  data: Vec<Point>
}

// But we still need to implement the QAbstractListModel manually
impl QAbstractListModel for Model {
  fn row_count(&self) -> i32 {
    self.data.len() as i32
  }
  fn data(&self, index: QModelIndex, role:i32) -> QVariant {
    if role != USER_ROLE { return QVariant::default(); }
    // We use the QGadget::to_qvariant function
    self.data.get(index.row() as usize).map(|x|x.to_qvariant()).unwrap_or_default()
  }
  fn role_names(&self) -> std::collections::HashMap<i32, QByteArray> {
    vec![(USER_ROLE, QByteArray::from("value"))].into_iter().collect()
  }
}

fn main() {
  let mut model = Model { data: vec![ Point{x:1,y:2} , Point{x:3, y:4} ], ..Default::default() };
  let mut engine = QmlEngine::new();
  // Registers _model as a context property.
  engine.set_object_property("_model".into(), &mut model);
  engine.load_data(r#"
    import QtQuick 2.6; import QtQuick.Window 2.0;
    Window {
      visible: true;
      ListView {
        anchors.fill: parent;
        model: _model;  // We reference our Model object
        // And we can access the property or method of our gadget
        delegate: Text{ text: value.x + ','+value.y; } }
    }"#.into());
  engine.exec();
}

Other implemented features include the creation of Qt plugins such as QQmlExtensionPlugin without writing a line of C++, only using rust and cargo. (See the qmlextensionplugins example.)

QMetaObject generation

The QMetaObject consists of a set of tables in the data section of the binary: a table of strings and a table of integers. There is also a function pointer to code used to read/write the properties and call the methods.

The custom derive macro generates these tables as &'static [u8]. The moc-generated code contains a QByteArrayData built by C++ code, but since we don't want to use a C++ compiler to generate the QMetaObject, we have to lay out all the bytes of the QByteArrayData one by one. Another tricky part is the creation of the Qt binary JSON for the plugin metadata. Qt binary JSON is also an undocumented data structure which needs to be built byte by byte, respecting many invariants such as alignment and field order.

The code for the static_metacall is just an extern "C" fn. Then we can assemble all these pointers into a QMetaObject. Since we cannot create a const static structure containing pointers, this is implemented using the lazy_static! macro.

QObject Creation

Qt needs a QObject* pointer for our object, with virtual methods to get the QMetaObject. The same applies to QAbstractListModel or any other class we would like to inherit from, which has many virtual methods that we wish to override.

We therefore have to materialize an actual C++ object on the heap. This C++ counterpart is created by some of the C++ glue code. We store a pointer to this C++ counterpart in the field annotated with the qt_base_class! macro. The glue code instantiates a RustObject<QObject>: a class that inherits from QObject (or any other QObject derivative) and overrides the virtual methods to forward them to a callback in Rust, which is then able to call the right function on the Rust object.

One of the big problems is that in Rust, contrary to C++, objects can be moved in memory at will. This is an issue because the C++ object contains a pointer to the Rust object, so the Rust object somehow needs to be fixed in memory. This can be achieved by putting it into a Box or an Rc, but even then it is still possible to move the object in safe code. This problem is not entirely solved, but the interface takes the object by value and moves it to an immutable location. The object can then still be accessed safely through a QJSValue object.

Note that a QGadget does not need a C++ counterpart.

C++ Glue code

For this project, I need a bit of C++ glue code to create the C++ counterpart of my objects, and to access the C++ API of Qt types and the QML API. I am using the cpp! macro from the cpp crate. This macro allows embedding C++ code directly into Rust code, with very little boilerplate compared to manually creating callbacks and declaring extern "C" functions.
I even contributed a cpp_class macro, which allows wrapping C++ classes from Rust.

Should an API be missing, it is easy to add the missing wrapper function. Also, when we want to inherit from a class, we just need to imitate what is done for QAbstractListModel: override all the virtual functions we want to override and forward them to the functions from the trait.

Final Words

My main goal with this crate was to see whether we can integrate QML with idiomatic and safe Rust code, without requiring the developer to use C++ or any other alien tool. I also had performance in mind: I wanted to create the QMetaObject at compile time and limit the amount of conversions and heap allocations.
Although there are still some problems to solve and the exposed API is far from complete, this is already a good beginning.

You can get the metaobject crate at this URL: https://github.com/woboq/qmetaobject-rs
