Taking Out the Garbage

From the title, you might think this post is about household chores. Instead, I’m happy to announce that we may have a path to solving GJS’s “Tardy Sweep Problem”.

For more information about the problem, read The Infamous GNOME Shell Memory Leak by Georges Stavracas. This is going to be a more technical post than my previous post on the topic, which was more about the social effects of writing blog posts about memory leaks. So first I’ll recap what the problem is.

Garbage, garbage, everywhere

Buzz Lightyear meme with the caption "Garbage, Garbage Everywhere"At the root of the GNOME desktop is an object-oriented technology called GObject. GObjects are reference counted, but not garbage collected. As long as their reference count is nonzero, they are “alive”, and when their reference count drops to zero, they are deleted from memory.

GObject reference counting

Graphical user interfaces (such as a large part of GNOME Shell) typically involve lots of GObjects which all increase each other’s reference count. A diagram for a simple GUI window made with GTK might look like this:

A typical GUI would involve many more objects than this, but this is just for illustrating the problem.

Here, each box is an object in the C program.

Note that these references are all non-directional, meaning that they aren’t really implemented as arrows. In reality it looks more like a list of numbers; Window (1); Box (1); etc. Each object “knows” that it has one reference, but it knows nothing about which other objects own those references. This will become important later.

When the app closes, it drops its reference to the window. The window’s reference count becomes zero, so it is erased. As part of that, it drops all the references it owns, so the reference count of the upper box becomes zero as well, and so on down the tree. Everything is erased and all the memory is reclaimed. This all happens immediately. So far, so good.

Javascript objects

To write the same GUI in a Javascript program, we want each GObject in the underlying C code to have a corresponding Javascript object so that we can interact with the GUI from our Javascript code.

Javascript objects are garbage collected, and the garbage collector in the SpiderMonkey JS engine is a “tracing” garbage collector, meaning that on every garbage collection pass it starts out with objects in a “root set” that it knows are not garbage. It “traces” each of those objects, asking it which other objects it refers to, and keeps tracing each new object until it hits a dead end. Any objects that weren’t traced are considered garbage, and are deleted. (For more information, the Wikipedia article is informative: https://en.wikipedia.org/wiki/Tracing_garbage_collection)

We need to integrate the JS objects and their garbage collection scheme with the GObjects and their reference counting scheme. That looks like this:

The associations between the Javascript objects (“JS”) and the GObjects are bidirectional. That means, the JS object owns a reference to the GObject, meaning the reference count of every GObject in this diagram is 2. The GObject also “roots” the JS object (marks it as unable to be garbage collected) because the JS object may have some state set on it (for example, by writing button._alreadyClicked = false; in JS) that should not be lost while the object is still alive.

The JS objects can also refer to each other. For example, see the rightmost arrow from the window’s JS object to the button’s JS object. The JS code that created this GUI probably contained something like win._button = button;. These references are directional, because the JS engine needs to know which objects refer to which other objects, in order to implement the garbage collector.

Speaking of the garbage collector! The JS objects, unlike the GObjects, are cleaned up by garbage collection. So as long as a JS object is not “rooted” and no other JS object refers to it, the garbage collector will clean it up. None of the JS objects in the above graph can be garbage collected, because they are all rooted by the GObjects.

Toggle references and tardy sweeps

Two objects (G and JS) keeping each other alive equals a reference cycle, you might think. That’s right; as I described it above, neither object could ever get deleted, so that’s a memory leak right there. We prevent this with a feature called toggle references: when a GObject’s reference count drops to 1 we assume that the owner of the one remaining reference is the JS object, and so the GObject drops its reference to the JS object (“toggles down“). The JS object is then eligible for garbage collection if no other JS object refers to it.

(If this doesn’t make much sense, don’t worry. Toggle references are among the most difficult to comprehend code in the GJS codebase. It took me about two years after I became the maintainer of GJS to fully understand them. I hope that writing about them will demystify them for others a bit.)

When we close the window of this GUI, here is approximately what happens. The app drops its references to the GObjects and JS objects that comprise the window. The window’s reference count drops to 1, so it toggles down, dropping one direction of the association between GObject and JS object.

Unlike the GObject-only case where everything was destroyed immediately, that’s all that can happen for now! Everything remains in place until the next garbage collection, because at the top of the object tree is the window’s JS object. It is eligible to be collected because it’s not rooted and no other JS object refers to it.

Normally the JS garbage collector can collect a whole tree of objects at once. That’s why the JS engine needs to have all the information about the directionality of the references.

However, it won’t do that for this tree. The JS garbage collector doesn’t know about the GObjects. So unfortunately, it takes several passes of the garbage collector to get everything. After one garbage collection only the window is gone, and the situation looks like this:

Now, the outermost box’s JS object has nothing referring to it, so it will be collected on the next pass of the garbage collector:

And then it takes one more pass for the last objects to be collected:

The objects were not leaked, as such, but it took four garbage collection passes to get all of them. The problem we previously had, that Georges blogged about, was that the garbage collector didn’t realize that this was happening. In normal use of a Javascript engine, there are no GObjects that behave differently, so trees of objects don’t deconstruct layer by layer like this. So, there might be hours or days in between garbage collector passes, making it seem like that memory was leaked. (And often, other trees would build up in the intervening time between passes.)

Avoiding toggle references

To mitigate the problem Georges implemented two optimizations. First, the “avoid toggle references” patch, which was actually written by Giovanni Campagna several years ago but never finished, made it so that objects don’t start out using the toggle reference system. Instead, only the JS objects hold references to the GObjects. The JS object can get garbage collected whenever nothing else refers to it, and it will drop its reference to the GObject.

A problem then occurs when that wasn’t the last reference to the GObject, i.e. it’s being kept alive by some C code somewhere, and the GObject resurfaces again in JS, for example by being returned by a C function. In this case we recreate the JS object, assuming that it will be identical to the one that was already garbage collected. The only case where that assumption doesn’t hold, is when the JS code sets some state on one of the JS objects. For example, you execute something like myButton._tag = 'foo';. If myButton gets deleted and recreated, it won’t have a _tag property. So in the case where any custom state is set on a JS object, we switch it over to the toggle reference system once again.

In theory this should help, because toggle references cause the tardy sweep problem, so if fewer objects use toggle references, there should be fewer objects collected tardily. However, this didn’t solve the problem, because especially in GNOME Shell, most JS objects have some state on them. And, sadly, it made the toggle reference code even more complicated than it already was.

The Big Hammer

The second optimization Georges implemented was the affectionately nicknamed “Big Hammer”. It checks if any GObjects toggled down during a garbage collector pass, and if so, restart the garbage collector a few seconds after. This made CPU performance worse, but would at least make sure that all unused objects were deleted from memory within a reasonable time frame (under a minute, rather than a day.)

Combined with some other memory optimizations, this made GNOME 3.30 quite a lot less memory hungry than its predecessors.

An afternoon at Mozilla

Earlier this year, I had been talking on IRC to Ted Campbell and Steve Fink on the SpiderMonkey team at Mozilla for a while, about various ins and outs of being an external (i.e. not Firefox) user of SpiderMonkey’s JS engine API. Early September I found myself in Toronto, where Ted Campbell is based out of, and I paid a visit to the Mozilla office one afternoon.

I had lunch with Ted and Kannan Vijayan of the SpiderMonkey team where we discussed the current status of external SpiderMonkey API users. Afterwards, we made the plans which eventually became this GitHub repository of examples and best practices for using the SpiderMonkey JS engine outside of Firefox. We have both documentation and code examples there, and more on the way. This is still in progress, but it should be the beginning of a good resource for embedding the JS engine, and the end of all those out-of-date pages on MDN!

I also learned some good practices that I can put to use in GJS. For example, we should avoid using JS::PersistentRooted except as a last resort, because it roots objects by putting them in a giant linked list, which is then traced during garbage collection. It’s often possible to store the objects more efficiently than that, and trace them from some other object, or the context.

Ending the tardy sweeps

In the second half of the afternoon we talked about some of the problems that I had with SpiderMonkey that were specific to GJS. Of course, the tardy sweep problem was one of them.

For advice on that, Ted introduced me to Nika Layzell, an engineer on the Gecko team. We looked at the XPCOM cycle collector and I was surprised to learn that Gecko uses a scheme similar to toggle references for some things. However, rather than GJS sticking with toggle references, she suggested a solution that had the advantage of being much simpler.

In “Avoiding toggle references” above, I mentioned that the only thing standing in the way of removing toggle references, is custom state on the JS objects. If there is custom state, the objects can’t be destroyed and recreated as needed. In Gecko, custom state properties on DOM objects are called “expandos” or “expando properties” and are troublesome in a similar way that they are in GJS’s toggle references.

Nika’s solution is to separate the JS object from the expandos, putting the expandos on a separate JS object which has a different lifetime from the JS object that represents the GObject in the JS code. We can then make the outer JS objects into JS Proxies so that when you get or set an expando property on the JS object, it delegates transparently to the expando object.

Kind of like this:

In the “before” diagram, there is a reference cycle which we have to solve with toggle references, and in the “after” diagram, there is no reference cycle, so everything can simply be taken care of by the garbage collector.

In cases where an object doesn’t have any expando properties set on it, we don’t even need to have an expando object at all. It can be created on demand, just like the JS object. It’s also important to note that the expando objects can never be accessed directly from JS code; the GObject is the sole conduit by which they can be accessed.

Recasting our GUI from the beginning of the post with a tree of GUI elements where the top-level window has an expando property pointing to the bottom-level button, and where the window was just closed, gives us this:

Most of these GObjects don’t even need to have expando objects, or JS objects!

At first glance this might seem to be garbage-collectable all at once, but we have to remember that GObjects aren’t integrated with the garbage collector, because they can’t be traced, they can only have their reference counts decremented. And the JS engine doesn’t allow you to make new garbage in the middle of a garbage collector sweep. So a naive implementation would have to collect this in two passes, leaving the window’s expando object and the button for the second pass:

This would require an extra garbage collector pass for every expando property that referred to another GObject via its JS object. Still a lot better than the previous situation, but it would be nice if we could collect the whole thing at once.

We can’t walk the whole tree of GObjects in the garbage collector’s marking phase; remember, GObject references are nondirectional, so there’s no generic way to ask a GObject which other GObjects it references. What we can do is partially integrate with the marking phase so that when a GObject has only one reference left, we make it so that the JS object traces the expando object directly, instead of the GObject rooting the expando object. Think of it as a “toggle reference lite”. This would solve the above case, but there are still some more corner cases that would require more than one garbage collection pass. I’m still thinking about how best to solve this.

What’s next

All together, this should make the horrible toggle reference code in GJS a lot simpler, and improve performance as well.

I started writing the code for this last weekend. If you would like to help, please get in touch with me. You can help by writing code, figuring out the corner cases, or testing the code by running GNOME Shell with the branch of GJS where this is being implemented. Follow along at issue #217.

Additionally, since I am in Toronto again, I’ll be visiting the Mozilla office again this week, and hopefully more good things will come out of that!

Acknowledgements

Thanks to Ted Campbell, Nika Layzell, and Kannan Vijayan of Mozilla for making me feel welcome at Mozilla Toronto, and taking some time out of their workday to talk to me; and thanks to my employer Endless for letting me take some time out of my workday to go there.

Thank you to Ted Campbell and Georges Stavracas for reading and commenting on a draft version of this post.

The diagrams in this post were made with svgbob, a nifty tool; hat tip to Federico Mena Quintero.

Advertisements