process attach --name This Week's Program
This Week’s Program: Jan 8 - Jan 12
This week, I was working in the intersection of three areas:
- Racket
- GStreamer (through Racket FFI bindings)
- Cocoa internals and the implementation of
racket/gui
I realized that I am probably the only person in the world working in that space.
I also expanded my toolkit a bit, finally reaching
for lldb
. Admittedly, I had little success
debugging this situation, but I feel like it’s a good tool to have
close at hand.
The two challenges I was working through this week both involve GStreamer and Cocoa, and how Racket does some things to subvert or avoid the usual Cocoa routines.
osxvideosink
Here’s the first challenge, illustrated with a basic Overscan program:
#lang overscan
(broadcast (videotestsrc)
(element-factory%-make "osxvideosink"))
This program connects a test video source to an osxvideosink, a GStreamer element that will draw a window preview of the video in macOS. This program, when run through the racket command line tool or REPL, works just fine.
The snag is the same program, but with the racket/gui
framework
loaded:
#lang overscan
(require racket/gui/base)
(broadcast (videotestsrc)
(element-factory%-make "osxvideosink"))
This program crashes. Why does it crash? I poured over several bits of code. The code for the osxvideosink itself:
https://github.com/GStreamer/gst-plugins-good/blob/master/sys/osxvideo/osxvideosink.m
Something I learned through a fair bit of random googling, is that
some programs are very persnickety about running on the “main thread”
or “main runloop” of an application. The osxvideosink is one such
program. There’s a function that checks whether or not the program is
running in the main
runloop:
gst_osx_videosink_check_main_run_loop
This function runs a line of code to do a quick check to see if the main runloop is running:
[[NSRunLoop mainRunLoop] currentMode]
Through some experiments with my repl and
Racket’s
ffi/unsafe/objc
module,
this appears to be false.
I start up lldb
and, in another shell, startup the Racket repl. I
attach lldb to the repl with process attach --pid
(with process
continue
to ensure that control returns to the repl). I run the
program and it crashes, as expected. Now within lldb
I can type
thread backtrace
, to see where this failed. Something in the Cocoa
runloop is calling back into Racket and finding a big fat NULL.
At this point, I decide to dig into some Racket internals.
https://github.com/racket/gui/blob/master/gui-lib/mred/private/wx/cocoa/queue.rkt
This is the code that implements the Racket Gui in Cocoa. I don’t understand most of it, but what I am able to glean is that Racket doesn’t create a Cocoa application in the traditional sense. It instead kind of, sort of fakes an application using some runtime tricks.
So what it looks like is happening is that the osxvideosink
is
unable to detect that it is running in the main runloop, tries to
start it’s own NSApplication
, and something goes very bad when the
two collide. Interesting and intellectually stimulating, with very
little I can do about it.
glimagesink
Challenge number two involves this program:
#lang overscan
(require racket/gui/sink)
(broadcast (videotestsrc)
(element-factory%-make "glimagesink"))
Very similar to the previous example, this time with
the glimagesink
element. The program runs, opens a
window, and I can see the first frame of the test video signal. But
the window does not continue to animate, it’s just a frozen frame.
I run two different shells with some GStreamer logging turned on. The first like so:
GST_DEBUG='gl*:5' gst-launch-1.0 -e videotestsrc ! glimagesink
This turns on logging for every element that begins with “gl” to the
DEBUG
level, and starts up gst-launch
— a command line tool for
testing out GStreamer pipelines. This time, the glimagesink performs
as expected. In the second shell I run this command:
GST_DEBUG='gl*:5' racket -I overscan
This sets up logging the same way, but starts up an Overscan repl
where I run the above program. I’m looking closely at the output and
notice this very small distinction between the two. The working
gst-launch
log has a block like this each time a frame renders:
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glwrappedcontext0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glwrappedcontext0> activate:0
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcontext gstglcontext.c:749:gst_gl_context_activate:<glcontextcocoa0> activate:1
glcolorbalance gstglcolorbalance.c:231:gst_gl_color_balance_before_transform:<glcolorbalance0> sync to 0:00:00.100000000
Several calls to activate
on these glcontexts. When I look at the
logs in the Racket implementation the two outliers, the activate calls
for the glwrappedcontext
aren’t there.
Then, I repeat this with a different debugging configuration:
GST_DEBUG='glcaopengllayer:7'
. This sets up logging for the
underlying CoreAnimation layer up to the LOG
level. On the working
version I see frequent calls to this line for each frame:
-[GstGLCAOpenGLLayer drawInCGLContext:pixelFormat:forLayerTime:displayTime:]: CAOpenGLLayer drawing with cgl context 0x7fcecd063200
In the Racket version, this line is only displayed once! Digging
deeper into the code, I think I’m facing a similar issue with my
osxvideosink
: something about this drawing operation is happening in a thread
that is assumed to be running, but because of the Racket GUI
implementation details, might not be!
Finally, I turn to the Racket community with a post on the mailing list: https://groups.google.com/d/msg/racket-users/wH_OU3haWgk/MDeRcnuNAAAJ
And who better to answer my incredibly esoteric question than
Professor Matthew Flatt, one of the
core members of the Racket team. He gives me this handy little
function: call-atomically-in-run-loop
and expands my mind in the process. Running broadcast
within this
proc prevents the crash from happening, because I’m instructing Cocoa
to run my broadcast
function from the main runloop!
Now, my problem with glimagesink
still persists, but I’ve narrowed
down some of this odd behavior
to this Cocoa function:
nextEventMatchingMask:untilDate:inMode:dequeue:
I think banging my head into this particular brick wall might actually get me closer to my goal of a working cross-platform(ish) solution.
— Mark