I use Stable Diffusion myself, but Dall-E 3 is a bit further ahead than the models I use when it comes to understanding the prompt. You can throw a long sentence at it and, as long as the filter has nothing against it, you will get what you ask for. With SD you have to do a lot more to get what you want: tags instead of a sentence, and inpainting, because getting two focal points right at the same time is difficult. (Then again, ComfyUI can go far with automation if you set it up properly.)

Then there's also how versatile the model is. Locally I regularly use different checkpoints or LoRAs for different styles or subjects; it's not an easy “one prompt box for everything” like I can find on Bing.

I seriously doubt that my 2080 could run a model like Dall-E 3. That’s why I say it’s not on the same level as Stable Diffusion: a lot happens under the hood on those servers.

By the way, with the Bing app you can also keep generating without those tokens, just a little slower. Every day I get 15 new tokens, although they don’t accumulate beyond that.
I can’t say anything about the Paint integration; it is still not available in my installation of Paint (Windows 11, Paint freshly downloaded from the Store).

But if there’s one thing I’ve learned about people, it’s that the perception of convenience is worth a lot, and they’re happy to pay more (or to pay at all) because they think it’s easy. Googling is too much effort for some.

Edit: Another reason for Microsoft to only offer it via servers: if they were to release it for local use, their filter would be gone within a day, and they don’t want any association with the images some people want to generate.

