Reports from the world of creativity & AI.
Greetings all! Apologies for sending this second edition out rather late – I started a new job this week, and my brain has been overloaded with new Notion pages, dashboards, figuring out the coffee machine, and trying act like a normal person after two years working from home.
Meanwhile, I’ve had to rewrite this newsletter at least a dozen times, as new things keep happening! Let’s crack on before anything more gets released, shall we?
This week on the DALL·Ery GALL·Ery
Everything you need to know about Midjourney 20 minute read
I’ve spent the last week or two experimenting with Midjourney, and come away very impressed – in my view, it’s still somewhat overlooked and underrated.
With its kooky Discord-based UI, and an early reputation for ornate yet alien-feeling generations, there are a lot of common misconceptions about this multi-talented AI tool.
On Twitter, for example, a surprising number of people didn’t realise it was possible to use MidJourney outside of a frantic public chatroom.
So check out this detailed rundown of how Midjourney works, what it’s really capable of, and how it compares with DALL·E, especially if you haven’t tried it out yourself yet. The answers may surprise you!
🤩 Interesting stuff this week
Pretty much one massive ongoing story...
Stable Diffusion onboards 15k+ creators
After two weeks of alpha-testing, the Stable Diffusion text-to-image AI has onboarded some 15,424 members to a chaotic test Discord server.
A few notable differences:
It’s currently free to try – and more or less unlimited, aside from a ‘one prompt per minute’ speed limit
Outputs have a CC0 ‘no rights reserved’ license, which means you can do anything you want with them – but so can anybody else!
Default outputs are 512px square, but you can simply request up to 1024px (though it sometimes interprets this to mean you want a ‘larger scene’, rather than a higher-res version of the same thing)
You can receive all your generations in one shareable ‘grid’ image (no more Gandr!)
Landscape and portrait images are both built in - hurrah!
‘Advanced’ options include setting the number of ‘steps’ and the ‘CFG scale’, which adjusts how intently the model attempts to satisfy the prompt (here’s what that looks like)
‘Seed’ selection, which tends to generate images with similar compositions (because they are born from identical ‘noise’)
Similar to Midjourney, there are the usual rules against NSFW and hateful content, but not DALL·E’s additional limits on public figures and topics such as health or politics
Further waves of beta testers will soon be invited - you can sign up for that here.
Stable Diffusion’s parent company, Stability AI, is an ambiguously-sized outfit headquartered in London (25 people says LinkedIn, 75 people says this article), and they’re progressing at a rapid clip – they plan to launch a DALL·E style ‘prosumer’ web interface for Stable Diffusion imminently, and say they have the tech to back it up:
“The webpage and Discord server we are making will have the stable diffusion bot running on our A100 cluster pod network. This network is the 10th largest super computer in the world. We have more raw compute power than NASA!” - shadow wanderer, Discord
Meanwhile, Stability has also released the raw model to researchers, allowing them to run Stable Diffusion locally, which apparently only requires a mid-range graphic card with 5GB of VRAM!
In one final development, some users already have access to the upcoming desktop web app, which is expected to replace the Discord interface imminently (indeed, prompting via Discord is already being wound down…) I think they’ve been discouraged from sharing too much publicly, but here are a few screenshots:
(Can you tell that I’ve had to rewrite this section a lot while the newsletter has been in my drafts?)
It’s exciting, and perhaps a little unnerving, that just a few short months after DALL·E 2’s seemingly ‘light years ahead’ reveal, we already have so many competitive services launching left, right and centre, and not only from the likes of Meta and Google.
And not to freak anyone out, but the founder is also carrying out a mysterious countdown on Twitter, ending on Monday. Ominous!
TechCrunch: This startup is setting a DALL-E 2-like AI free, consequences be damned
Mixed: Open-source rival for OpenAI’s DALL·E runs on your graphics card
Interdependence.fm: Podcast interview with founder Emad Mostaque
In this fascinating paper, researchers find a way to ‘teach’ an existing text-to-image model what a new object is with just a handful of images. Once instructed, the model can then create versions of X in different contexts, just as you’ve previously seen them do with Kermit or Homer Simpson.
The method works just as well for ‘styles’ as ‘subjects’, inferring the key qualities of supplied artwork, which can then be used in further prompts:
From my inexpert interpretation, this doesn’t actually involve retraining the model in the conventional sense, which is an extremely intensive process. Instead, this approach aims to reveal ‘what embeddings would an (object, style) like ‘X’ trigger, in the existing model?’ and then creates a made-up prompt ‘meta-word’ which activates the same embeddings when used in a prompt.
Plugged into an existing model, it offers plenty of choice possibilities:
style saving: if you’ve successfully generated a series of images with a unique ‘look’ using conventional methods, you could simply re-upload them as ‘exemplars of Style X’, and then be able to create images ‘in the style of Style X’ forevermore
much faster real-life-to-AI-asset speed: Let’s say a character, location or person X becomes well-known – you’d currently need to wait months or even years for images of them to appear in a model’s core training data. (DALL·E, for example is only trained on pre-2019 images.) But with a tool like this, just a handful of images from the present day are enough to refer to X in text prompts.
Or tweet one of the researchers with your own ‘X’ to give it a go!
On why Midjourney uses Discord: “We found very quickly that most people don’t know what they want. You say: “Here’s a machine you can imagine anything with it — what do you want?” And they go: “dog.” And you go “really?” and they go “pink dog.” So you give them a picture of a dog, and they go “okay” and then go do something else.
Whereas if you put them in a group, they’ll go “dog” and someone else will go “space dog” and someone else will go “Aztec space dog,” and then all of a sudden, people understand the possibilities, and you’re creating this augmented imagination — an environment where people can learn and play with this new capacity.”
On tweaking results: “Our most recent update made everything look much, much better, and you might think we did that by throwing in a lot of paintings [into the training data]. But we didn’t; we just used the user data based off what people liked making [with the model].”
On compute: “There’s never been a service for consumers where they’re using thousands of trillions of operations in the course of 15 minutes without thinking about it. Probably by a factor of 10, I’d say it’s more compute than anything your average consumer has touched.”
👀 More cool stuff to explore
🛠 New tool: quick image shrinker for un-cropping
Want to zoom out from a scene? Just drag your initial image into this tool to create a border of blank pixels, ready to upload into DALL·E.
🔎 New tool: what’s that art style?
Found an image you like the vibe of, but not sure what it’s called? (Same.Energy is great for this!) Head here, click the two play buttons along the side (next to Check GPU + Setup) then put the URL in the field next to Interrogate. Click the play button next to Interrogate and a few seconds later you’ll get a bunch of guesses as to the nearest known art style.
🧫 Creative: Volstof Institute for Interdimensional Research
Nice spooky alternate-reality vibes. I am awfully partial to a bit of science-gone-spooky...
↔️ Epic: An 8000x8000 DALL·E mural
Loved this from back in the early DALL·E 2 beta era: I also love the thematically-appropriate face-redactions, from back when DALL·E didn’t let you share photos of faces!
A browsable guide to artist styles you can use in Stable Diffusion, and, I suspect, other major AI art tools. Assembled by the same squad that brought you this massive spreadsheet and this massive Google Drive folder.
I’d anticipate that these will likely work in DALL·E too; one drawback of DALL·E charging $$ out of the gate is it precludes projects which require thousands of prompts to rigorously complete, so it’s likely to be passed over as the ‘research platform of choice’ for investigations of this ilk.
A mouthwatering demo of generating AI assets and filling gaps with prompts, all inside an image editor UX: if you know Paint Bucket, like Gradient Fill, and love Context-Aware Fill, now try Anything You Can Imagine Fill!
For reasons not fully understood, if you ask Craiyon to generate ‘a crungus’ it pictures a hell-demon from the astral plane? And it’s not a fluke - ask it for ‘crungus at the gym’ and it’s the same creature.
⏳ History: A timeline of AI text-to-image tools
A thorough look back at milestones on the way to our current situation. How far we’ve come. And still so far to go!
😭 Article: I Went Viral in the Bad Way
Journalist Charlie Warzel recently got dragged by the Illustration Twitter Mob for having the audacity to illustrate his own newsletter with an image from Midjourney rather than a stock photograph from Getty. THE HORROR! It’s fair to say he’s a lot more self-flagellating than I think is called for given the facts of the matter.
It is noteworthy that for the time being, creating ‘prominent’ content with these tools might inspire hundreds of people to angrily tweet at you, call for your utter humiliation, and demand you pen a grovelling apology. I’d speculate that this will last as long as AI-created images are sufficiently distinctive or openly credited as such, i.e: probably just a few more weeks at the current rate.
Prompt hacks for Stable Diffusion:
Wrapping (words) in increasing amounts of ((brackets)) possilbly reduces their (((intensity))) – @TomLikesRobots
Adding! exclamation marks!! To key terms!!! Increases their intensity!!!! – @KyrickYoung
Connecting adjectives with hyphens possibly prevents concept leakage, ensuring your ‘big-yellow-car’ is in fact a car that is yellow and big, not simply a picture featuring the concepts of ‘bigness’, ‘yellowness’, and ‘carness'’ – [can’t remember where i saw this, sorry!]
Anyone for tennis? 🤯
‘Facebook has a lot of fake news on it these days.’ ‘Mark Zuckerberg did a terrible job at testifying before Congress. It makes me concerned about our country.’ 'His company exploits people for money and he doesn't care.' ‘Since deleting Facebook, my life has been much better.’ - just some choice quotes from Facebook’s inexplicably clumsy, why-did-they-even-release-this conversational AI, BlenderBot.
See you (early) next week!