For whatever reason I’ve found myself doing a deep dive into AI Film recently. Part of the reason is that (my husband and collaborator) Michael Tippett has become obsessed with AI Film (he’s been working on an epically strange series called Maximum Perception), and that together we decided to do the Pacific Future: AI & Film event in October. Another part is that we went to LA Tech Week, where we met a ton of fascinating LA-based people whose main references for Gen AI production come from cinema; LA is, unsurprisingly, an extremely film-forward community. It’s also because much of the massive explosion of Generative AI content that is suddenly everywhere is being produced by people with backgrounds in film and visual FX who found themselves with a bit of time on their hands when the industry went on strike. It makes a lot of sense for Vancouver as well, as we have a big film community here and a super deep bench when it comes to Motion Graphics and Visual FX. But all of that aside, I find myself learning about a lot of stuff that is extremely new to me in terms of how moving images are produced within industry pipelines, and how all these positions and specializations relate to each other and work together to produce, say, A Dog’s Purpose.
Vince McCurley from the National Film Board of Canada (NFB) gave a great presentation during our event, describing how the NFB is using AI to work with its archive. (Fun fact: without AI it would take 30 years to index their existing archive.) The NFB is using machine learning to create detailed metadata for all their footage - everything from camera angle to colour, action, combinations of colour and object, and combinations of elements (e.g. two TVs). Once keywords have been attached to the files using ML, they use vector embeddings to group media according to similarity. Then they can search across those similarities, and also use an LLM to create detailed descriptions of what is happening in the media and search those as well.
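None of this is published as code as far as I know, but the embedding-and-grouping step is easy to sketch. Here is a minimal, hypothetical version - not the NFB’s actual pipeline - that embeds a folder of pre-extracted keyframes (the "keyframes" folder is made up) with an off-the-shelf CLIP model and then clusters them by visual similarity:

```python
# A minimal sketch of the "embed, then group by similarity" idea, not the NFB's pipeline.
# Assumes keyframes have already been extracted from the footage into keyframes/*.jpg.
from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Joint image/text embedding model (CLIP, via sentence-transformers).
model = SentenceTransformer("clip-ViT-B-32")

# Embed every keyframe into the same vector space.
frame_paths = sorted(Path("keyframes").glob("*.jpg"))
embeddings = model.encode(
    [Image.open(p) for p in frame_paths],
    normalize_embeddings=True,
)

# Group visually similar frames - roughly what "vector embeddings to group media" implies.
clusters = KMeans(n_clusters=20, n_init=10).fit_predict(embeddings)
for path, cluster in zip(frame_paths, clusters):
    print(cluster, path.name)
```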
But what got my attention, for today’s purposes anyway, is how you can search the NFB archive for things that match compositionally. Essentially this enables super easy match cutting. In their example they use a figure in a running pose, and when they search their vast archive they can find an array of other images from hugely different contexts that show a figure in that pose, or moving through that pose while doing something else, like fighting a forest fire.
The end result is that you can use these ML systems to locate and then stitch together a reel where a figure transforms from one context to another while its position within the frame stays smooth and consistent, going from a runner, to a firefighter, to two firefighters, to two politicians. This is hard to describe, so see what it looks like at 19:44 in the video.
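As a toy illustration of that stitching idea (again, not how the NFB actually builds these reels): if you already had one embedding vector per shot, you could greedily chain each shot to its closest compositional neighbour. The shot_embeddings.npy file below is a hypothetical stand-in for those per-shot embeddings.

```python
# Toy version of building a "transformation reel": starting from one shot,
# repeatedly hop to the most compositionally similar shot not yet used.
import numpy as np

emb = np.load("shot_embeddings.npy")                     # shape: (num_shots, dim), hypothetical file
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # normalize for cosine similarity

reel, remaining = [0], set(range(1, len(emb)))           # begin the chain at shot 0
while remaining:
    last = emb[reel[-1]]
    nxt = max(remaining, key=lambda i: float(emb[i] @ last))  # closest match to the last shot
    reel.append(nxt)
    remaining.remove(nxt)

print("cut order:", reel[:10])  # first few shots of the match-cut chain
```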
Then today I was looking at Jumper, a new video editing tool for Adobe Premiere Pro and Final Cut Pro that uses image and speech recognition so that you can locate footage by describing it, plus a “similar” button that finds shots that are compositionally similar to a given shot.
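I have no idea what Jumper actually runs under the hood, but “describe it and find it” maps neatly onto a joint text/image embedding space like CLIP’s. A rough, purely illustrative sketch of that kind of search, over another made-up folder of keyframes:

```python
# A rough guess at how text-to-shot search could work; Jumper's internals aren't
# public, so this only illustrates the general CLIP-style retrieval pattern.
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")
frames = sorted(Path("keyframes").glob("*.jpg"))                     # hypothetical keyframes
frame_emb = model.encode([Image.open(p) for p in frames], normalize_embeddings=True)

# Embed a natural-language description into the same space as the frames.
query_emb = model.encode(["a firefighter running through smoke"], normalize_embeddings=True)[0]

scores = frame_emb @ query_emb                # cosine similarity (already normalized)
for i in np.argsort(-scores)[:5]:             # top five matching frames
    print(f"{scores[i]:.2f}  {frames[i].name}")
```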
So in the past, famous match cutters and montage-ers - Kubrick, Eisenstein, Aronofsky, Hitchcock - would have developed the concept for their match cut, then created the shots, then matched the shots to create a symbolic/logical/compositional connection. (One example: in 2001: A Space Odyssey, a caveman throws a bone into the air and it becomes a spaceship in the next shot.)
But with something like Jumper you can search within your clips and have connections surfaced through the AI’s pattern detection, because - duh - that is what AIs do: recognize and replicate structures in text or images.
Is this basically a supersonic match cutting mech that allows an editor to surf in an easygoing, drifting, melancholic back-and-forth between images and clips that is as effortless as swinging in a hammock? Does this become a different kind of filmmaking, especially if the idea isn’t ultimately about creating narrative throughlines but other kinds of freestyling?
Or will the resulting creations bear the hallmarks of the Easy Button, going from sunflowers to moody lanterns to mandalas - everything we expect, because we once expected it? Whatever those associative dimensions are, this reflects a cultural feedback loop where AI suggests visual matches based on collective cultural data, which editors then reintroduce into culture. The Dreamachine, but with our eyes open?