process attach --name This Week's Program

This Week’s Program: Jan 8 - Jan 12

This week, I was working at the intersection of three areas:

  • Racket
  • GStreamer (through Racket FFI bindings)
  • Cocoa internals and the implementation of racket/gui

I realized that I am probably the only person in the world working in that space.

I also expanded my toolkit a bit, finally reaching for lldb. Admittedly, I had little success debugging this situation, but I feel like it’s a good tool to have close at hand.

The two challenges I was working through this week both involve GStreamer and Cocoa, and how Racket does some things to subvert or avoid the usual Cocoa routines.

osxvideosink

Here’s the first challenge, illustrated with a basic Overscan program:

#lang overscan

(broadcast (videotestsrc)
           (element-factory%-make "osxvideosink"))

This program connects a test video source to an osxvideosink, a GStreamer element that will draw a window preview of the video in macOS. This program, when run through the racket command line tool or REPL, works just fine.

The snag is the same program, but with the racket/gui framework loaded:

#lang overscan

(require racket/gui/base)

(broadcast (videotestsrc)
           (element-factory%-make "osxvideosink"))

This program crashes. Why does it crash? I pored over several bits of code, starting with the code for the osxvideosink itself:

https://github.com/GStreamer/gst-plugins-good/blob/master/sys/osxvideo/osxvideosink.m

Something I learned through a fair bit of random googling is that some programs are very persnickety about running on the “main thread” or “main runloop” of an application. The osxvideosink is one such program. It has a function that checks whether or not the program is running in the main runloop: gst_osx_videosink_check_main_run_loop

This function runs a line of code to do a quick check to see if the main runloop is running:

[[NSRunLoop mainRunLoop] currentMode]

In some experiments at the REPL with Racket’s ffi/unsafe/objc module, this expression appears to come back false.

I start up lldb and, in another shell, start up the Racket REPL. I attach lldb to the REPL with process attach --pid (followed by process continue to ensure that control returns to the REPL). I run the program and it crashes, as expected. Now within lldb I can type thread backtrace to see where this failed. Something in the Cocoa runloop is calling back into Racket and finding a big fat NULL.
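
For anyone following along, the attach step can be wrapped in a small shell helper. This is only a sketch under a few assumptions: the function name is made up for illustration, the REPL's process is assumed to be named exactly racket, and pgrep and lldb are assumed to be on the PATH.

```shell
# Hypothetical helper: attach lldb to the most recently started process
# named exactly "racket" (assumed to be the REPL from the other shell).
attach_lldb_to_racket() {
  # pgrep -n picks the newest match, -x requires an exact name match.
  pid=$(pgrep -n -x racket) || {
    echo "no process named racket found" >&2
    return 1
  }
  # lldb -p attaches to a running process by PID.
  lldb -p "$pid"
}

# Once attached, at the (lldb) prompt:
#   process continue     # hand control back to the REPL
#   ...run the crashing program in the REPL...
#   thread backtrace     # inspect the stack of the crashed thread
```

Calling attach_lldb_to_racket from a second shell does the same thing as typing process attach --pid with the PID by hand.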

At this point, I decide to dig into some Racket internals.

https://github.com/racket/gui/blob/master/gui-lib/mred/private/wx/cocoa/queue.rkt

This is the code that implements the Racket GUI in Cocoa. I don’t understand most of it, but what I am able to glean is that Racket doesn’t create a Cocoa application in the traditional sense. It instead kind of, sort of fakes an application using some runtime tricks.

So what looks like is happening is that the osxvideosink is unable to detect that it is running in the main runloop, tries to start its own NSApplication, and something goes very wrong when the two collide. Interesting and intellectually stimulating, with very little I can do about it.

glimagesink

Challenge number two involves this program:

#lang overscan

(require racket/gui/base)

(broadcast (videotestsrc)
           (element-factory%-make "glimagesink"))

Very similar to the previous example, this time with the glimagesink element. The program runs, opens a window, and I can see the first frame of the test video signal. But the window does not continue to animate; it’s just a frozen frame.

I run two different shells with some GStreamer logging turned on. The first like so:

GST_DEBUG='gl*:5' gst-launch-1.0 -e videotestsrc ! glimagesink

This turns on logging for every debug category whose name begins with “gl” up to the DEBUG level, and starts up gst-launch, a command line tool for testing out GStreamer pipelines. This time, the glimagesink performs as expected. In the second shell I run this command:

GST_DEBUG='gl*:5' racket -I overscan

This sets up logging the same way, but starts up an Overscan REPL where I run the above program. Looking closely at the output, I notice one very small distinction between the two. The working gst-launch log has a block like this each time a frame renders:

glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glwrappedcontext0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glwrappedcontext0> activate:0
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcolorbalance gstglcolorbalance.c:231:gst_gl_color_balance_before_transform:<glcolorbalance0> sync to 0:00:00.100000000

There are several calls to activate on these GL contexts. When I look at the logs from the Racket version, the two outliers, the activate calls for the glwrappedcontext, aren’t there.
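
A comparison like this can also be done by tallying the activate calls per context instead of eyeballing the logs. The following is a sketch, not something from the original debugging session: GST_DEBUG_FILE and GST_DEBUG_NO_COLOR are standard GStreamer environment variables, but the helper name and the log filenames are hypothetical.

```shell
# Hypothetical helper: tally gst_gl_context_activate calls per context name
# in a saved GStreamer debug log, printing "<context-name> <count>" per line.
count_gl_activations() {
  # Keep only the activate lines, extract the name between < and >,
  # then count how often each name occurs.
  grep 'gst_gl_context_activate' "$1" \
    | sed -E 's/.*<([^>]+)>.*/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Capturing the two logs (GST_DEBUG_FILE writes the debug output to a file,
# GST_DEBUG_NO_COLOR turns off the ANSI color codes):
#   GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=gst-launch.log GST_DEBUG='gl*:5' \
#     gst-launch-1.0 -e videotestsrc ! glimagesink
#   GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=overscan.log GST_DEBUG='gl*:5' \
#     racket -I overscan
#
# Comparing the tallies:
#   count_gl_activations gst-launch.log
#   count_gl_activations overscan.log
```

If the Racket run really is skipping the wrapped context, its tally would simply have no glwrappedcontext0 line, while the gst-launch tally would.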

Then, I repeat this with a different debugging configuration: GST_DEBUG='glcaopengllayer:7'. This sets up logging for the underlying Core Animation layer up to the TRACE level. On the working version I see frequent calls to this line for each frame:

-[GstGLCAOpenGLLayer drawInCGLContext:pixelFormat:forLayerTime:displayTime:]: CAOpenGLLayer drawing with cgl context 0x7fcecd063200

In the Racket version, this line is only displayed once! Digging deeper into the code, I think I’m facing an issue similar to the one with osxvideosink: something about this drawing operation is happening in a thread that is assumed to be running, but because of the Racket GUI implementation details, might not be!

Finally, I turn to the Racket community with a post on the mailing list: https://groups.google.com/d/msg/racket-users/wH_OU3haWgk/MDeRcnuNAAAJ

And who better to answer my incredibly esoteric question than Professor Matthew Flatt, one of the core members of the Racket team? He gives me this handy little function, call-atomically-in-run-loop, and expands my mind in the process. Running broadcast within this proc prevents the osxvideosink crash from happening, because I’m instructing Cocoa to run my broadcast function from the main runloop!

Now, my problem with glimagesink still persists, but I’ve narrowed down some of this odd behavior to this Cocoa function:

nextEventMatchingMask:untilDate:inMode:dequeue:

I think banging my head into this particular brick wall might actually get me closer to my goal of a working cross-platform(ish) solution.

— Mark