Pyodide - first impressions and basic profiling

At Creative Coding last week, I made ghostlines, a moody little sketch that draws an awful lot of vectors on a canvas for fun, computed from some live image manipulation of a camera feed. Unexpectedly, it was slower than a sloth, which was a good excuse to revisit WebAssembly (Wasm) for performance, and because why not.

I consulted this excellently curated list of Wasm-ready langs to pick out something with performant matrix manipulation capabilities, and decided to try out Pyodide, which is Python plus scientific computing libraries compiled to Wasm. In other words, it seems like a good way to do client-side matrix manipulation or stats without perishing from boredom watching JavaScript seize up.

Here are some first impressions and a quick benchmark on some basic image manipulation.

First thoughts

Pyodide’s setup was sleek and fantastic, even without comparing it to my precarious first brush with Wasm, which involved piecing together several dependency chains from hell to rebuild TinyGo on my decrepit machine. Pyodide’s examples worked out of the box for me, which is an exceedingly rare delight in modern web development.

Type conversion

It is not 100% magic, but it’s pretty close. You need to be aware of some type translation conventions between JavaScript and Pyodide.

JS objects passed to Pyodide (as with pyodide.globals.set) arrive wrapped in a JsProxy, while primitives such as numbers, strings, and booleans are converted straight to their Python equivalents. You can call to_py() on a JsProxy to have Pyodide attempt to convert it to a native Python object, and there’s even built-in support for JavaScript promise handling, which I didn’t try but which looks pretty slick.
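As a rough sketch of that conversion step on the Python side: the helper below calls to_py() when a value has one and passes plain Python values through untouched. Both to_native and FakeJsProxy are hypothetical names; FakeJsProxy is only a stand-in so the snippet runs outside Pyodide, where the real JsProxy would take its place.

```python
def to_native(value):
    """Return a native Python object for a value that may have come from JS."""
    # Assumption: proxied JS objects expose to_py(); plain Python values do not.
    if hasattr(value, "to_py"):
        return value.to_py()
    return value


class FakeJsProxy:
    """Hypothetical stand-in for a JsProxy, for running outside Pyodide."""

    def __init__(self, payload):
        self._payload = payload

    def to_py(self):
        return self._payload


pixels = to_native(FakeJsProxy([255, 0, 0, 255]))
print(pixels)          # [255, 0, 0, 255]
print(to_native(42))   # 42
```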

I’m also doing some things like casting JS values to Uint8, not because it’s particularly tricky for Python to convert an int32 to a uint8, but because I assume it will make it easier for the machine architecture underlying Wasm to handle the data more efficiently.
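Whether the narrower type actually speeds anything up is untested here; what can be shown is the difference in buffer size. A minimal sketch, assuming an 800x600 RGBA frame and comparing uint8 against int32 storage in numpy:

```python
import numpy as np

WIDTH, HEIGHT, CHANNELS = 800, 600, 4  # assumed RGBA frame size
n_values = WIDTH * HEIGHT * CHANNELS

# Same number of values, stored at 1 byte vs 4 bytes each.
uint8_bytes = np.zeros(n_values, dtype=np.uint8).nbytes
int32_bytes = np.zeros(n_values, dtype=np.int32).nbytes

print(uint8_bytes)                 # 1920000
print(int32_bytes)                 # 7680000
print(int32_bytes // uint8_bytes)  # 4
```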

Let’s benchmark!

I started doing some basic profiling, but immediately saw that repeated data was probably getting cached because all runs after the first run went about 100x faster. So I needed some randomized “image” data.

numpy refresher/trivia

I haven’t used numpy much and not in a pretty long time, so I had to learn a lot of basics along the way.

Randomized “image” data

I built a quick random RGBA image-data generator in both JS (because that’s where the image data will be coming from) with lodash (for my sanity), and in Python (to simplify Python-side code debugging, but not to be used in the end result). To strike a balance of sanity and realism, I used an image size of 800x600.

My first pass with vanilla Python list comprehensions was hilariously slow; to be %.1f precise, about 4.5 seconds.

import random

# random.randint includes both endpoints, so each channel is 0-255
[
  [random.randint(0, 255) for _ in range(3)] + [255]
  for _ in range(height * width)  # each pixel
]

Then I found np.random.randint and matrix concatenation, and it went down to 0.07 seconds. I’m not being precise here; I just need it to be not-slow enough for me to not-weep while iterating.

np.concatenate(
  ( # Generate rand RGB data (upper bound is exclusive, so 256 gives 0-255)
    np.random.randint(256, size=(height * width, 3)),
    # Alpha = 255
    255 * np.ones((height * width, 1)),
  ), axis=1,  # Concatenate them column-wise
)

In emulation of the way image data is shaped when you yoink it out of an HTML canvas context, the random “image” on the JS side was formatted as a 1D array of unsigned 8-bit integers, representing a flat, unchunked list of RGBA values.

new Uint8Array(_.flatten(_.map(_.range(height), () =>
  // each row
  _.flatten(_.map(_.range(width), () =>
    // For each pixel, give random RGB values with A=hi
    _.map(_.range(3), () => _.random(0,255)).concat(255)
  ))
)));
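On the Python side, a flat buffer in that layout can be viewed as rows of pixels with a single reshape. This is a sketch under assumptions, not the code used for the benchmarks: it fabricates its own random buffer in place of the Uint8Array coming from JS, and uses the same 800x600 frame size as above.

```python
import numpy as np

width, height = 800, 600

# Stand-in for the flat Uint8Array from JS: R, G, B, A, R, G, B, A, ...
flat = np.random.randint(0, 256, size=height * width * 4, dtype=np.uint8)
flat[3::4] = 255  # every fourth value is alpha; make it opaque

# View the same buffer as rows x columns x channels; no data is copied.
image = flat.reshape(height, width, 4)

print(image.shape)                    # (600, 800, 4)
print(np.shares_memory(flat, image))  # True
```

Because the reshape is a view, writing to image also changes flat, which is convenient if the result has to go back to a canvas in the same flat layout.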

Actually benchmarking now

The current canonical way to time code execution in JS is performance.now().
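For timing things on the Python side, the closest counterpart is time.perf_counter(). Here is a minimal sketch of a harness that prints its report in the same avg/max/min shape as the results below; bench is a hypothetical helper name, and this is not the code that produced those numbers.

```python
import time


def bench(label, fn, runs=10):
    """Time fn() `runs` times and report avg/max/min in milliseconds."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    stats = {
        "avg": sum(timings_ms) / len(timings_ms),
        "max": max(timings_ms),
        "min": min(timings_ms),
    }
    print(f'ran "{label}" {runs} times')
    for name, value in stats.items():
        print(f"{name}: {value:.2f} ms")
    return stats


stats = bench("sum a range", lambda: sum(range(100_000)))
```

Reporting max and min alongside the average makes a slow first run (or a suspiciously fast cached one) easy to spot.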

I figured it would almost definitely be faster to chuck the image directly into Python and do all computing there, but since I’m not very familiar with the architectures of either Pyodide or Wasm, I didn’t want to make any assumptions.

For example, I figured it would probably be faster to (1) strip the image to greyscale on the Python side, but I didn’t want to rule out the possibility that (2) the performance hit from stripping to greyscale on the JS side might somehow be outweighed by the performance gain from translating a quarter as much data as a full RGBA image.

(1) Benchmark: pass random image data from JS to Pyodide via await pyodide.globals.set, then use await pyodide.runPython to nicely ask Pyodide to instantiate my Image class and greyscale it:

ran "strip RGB in Py" 10 times
avg: 1.42 ms
max: 7.10 ms
min: 0.30 ms
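The Image class and its greyscale step aren't shown in this post, so the following is only a guess at the shape of that operation. It assumes a flat RGBA uint8 buffer and a plain average of the three colour channels; the real code may well use a different formula. to_greyscale is a hypothetical name.

```python
import numpy as np


def to_greyscale(flat_rgba, width, height):
    """Collapse a flat RGBA uint8 buffer to one grey value per pixel."""
    pixels = np.asarray(flat_rgba, dtype=np.uint8).reshape(height, width, 4)
    # Assumption: plain average of R, G and B; alpha is dropped.
    # mean() accumulates in float64, so the uint8 inputs cannot overflow.
    return pixels[..., :3].mean(axis=2).astype(np.uint8)


# Two pixels: pure white and pure red, both opaque.
sample = [255, 255, 255, 255, 255, 0, 0, 255]
print(to_greyscale(sample, width=2, height=1).tolist())  # [[255, 85]]
```

For an 800x600 frame this turns 1,920,000 values into 480,000, which is the fourfold reduction the second benchmark tries to exploit before the data crosses into Python.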

(2) Benchmark: strip the random image data to greyscale on the JS side, then pass it to Pyodide and instantiate an Image out of it.

ran "strip RGB in JS then pass to Py" 10 times
avg: 1767.25 ms
max: 2753.60 ms
min: 1239.00 ms

Update 10/14: I later found out that a good portion of my JS slowness was from foolish mishandling of intermediate data representations, rather than from JS actually being that bad at math, so the rest of this post will have to be rethought.