By Bijan Bowen
Testing Flux.1-Dev Locally With an NVIDIA 3090 Ti & SwarmUI
When I first heard about the release of a new image model, I didn't think too much of it. Though I was hearing more and more about how impressed people were, it took a bit more of a push for me to decide to see what the hype was about. Perhaps I have low standards, but I have been happily using the same locally run model for the past 6 or so months (which, in the world of AI, is a very long time).
What finally drove me to check out this new model, dubbed Flux, was a Reddit post sharing casual-looking mirror selfies it had generated, which looked like they were taken in the mid-2010s. Being someone whose mind tends to quickly jump to potential 'outside the box' uses of technology, I found these extremely convincing images to be fascinating, impressive, exciting, and scary at the same time. I won't lie to you. The thought of people being catfished with these images was one of the first things that came to mind when I saw these realistic photos. I should probably find a better yardstick for judging an image model, but it is one of many ways to gauge how convincing a model's generations are.
Up until I tried Flux, my experience in terms of out-of-the-box model performance - that is, simply downloading a model and generating images from it, sans any after-effects or other modifications - was that it was pretty easy to tell if an 'attractive' image was AI-generated. It usually looked too "perfect," whether that be the facial symmetry of the person, the lighting of the image, or a combination of the two. What really got me excited was the realism these images seemed to convey. It was not immediately obvious that they were AI-generated, and the fascinating part (at least for me) was the model's ability to reproduce the "aesthetic" of the era it was depicting.
My first testing of Flux was related to this; I wanted to see how well it could generate photos that looked like they were taken a given number of years ago. Before testing, I had to get Flux working. I spent some time trying to install the model based on the Flux GitHub documentation, but my computer was having issues with certain packages and dependencies, so I followed an alternate install method: installing SwarmUI and downloading the Flux.1-Dev.safetensors and ae.safetensors files from the Hugging Face repo for Flux.1-dev. It is worth mentioning that you must agree to share your info to access the model on Hugging Face.
Once I had all the requirements installed, I placed the "Flux.1-Dev.safetensors" file into the SwarmUI "Models/unet" folder, and the "ae.safetensors" file into the "Models/VAE" folder. After that, a quick run of the ./launch-linux.sh command fired up SwarmUI and allowed me to select and load the Flux.1-Dev model. My first generations looked like blobs of crap, and after a bit of reading, I saw that others had run into the same thing and fixed it by setting the "cfg" to 1. This fixed the issue, and all was ready to go. My generation settings are as follows: 40 Steps, CFG Scale of 1, Euler sampler, and Normal scheduler.
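For anyone who would rather script the file placement than drag files around, here is a minimal sketch of the step described above. It only covers copying the two downloaded files into the folders named in this article; the function name, the dictionary names, and the settings keys are my own illustrative choices and are not SwarmUI parameter names or part of any SwarmUI API.

```python
import shutil
from pathlib import Path

# Where each downloaded file goes, relative to the SwarmUI folder
# (folder names are the ones mentioned in this article).
DESTINATIONS = {
    "Flux.1-Dev.safetensors": Path("Models") / "unet",
    "ae.safetensors": Path("Models") / "VAE",
}

# The generation settings from this article, kept here purely as a note.
# The keys are hypothetical labels, not SwarmUI parameter names.
GENERATION_SETTINGS = {
    "steps": 40,
    "cfg_scale": 1,
    "sampler": "Euler",
    "scheduler": "Normal",
}


def place_flux_files(download_dir, swarmui_dir):
    """Copy the downloaded model files into the SwarmUI model folders.

    Returns the list of paths the files were copied to.
    """
    download_dir = Path(download_dir)
    swarmui_dir = Path(swarmui_dir)
    placed = []
    for filename, subfolder in DESTINATIONS.items():
        source = download_dir / filename
        if not source.is_file():
            raise FileNotFoundError(f"Expected to find {source}")
        target_dir = swarmui_dir / subfolder
        # Create the folder if SwarmUI has not created it yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        placed.append(Path(shutil.copy2(source, target_dir / filename)))
    return placed
```

Calling something like `place_flux_files(Path.home() / "Downloads", Path.home() / "SwarmUI")` would do the copy, assuming those are the folders you actually used; both paths are examples only. The script copies rather than moves, so the original downloads stay where they are.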
The first few images I tested were funny things I wanted to see, which is me-speak for "nsfw". After I got that all out of my system, I began prompting it to generate images set in certain years. I will admit my prompting is rather lackluster, half because I don't feel like typing much, and half because I like to test the model's ability to generate from relatively mundane prompts. Given this, I asked it to "Generate a photo taken at a college party in (Year x)". My first attempt at this was 1985, and the image result can be seen below.
I am not a photography expert, nor was I around in the year 1985, but based on the depictions and imagery I have seen of that era, this photo really looks like it came straight from the mid-80s. This was a far cry from the usual AI "perfection" that I was used to seeing. Sure, it would still be easy to nitpick certain elements of the photo, but the point I am excited about here is that the main subject of the photo really looks "vintage". The way the light illuminates the head of hair, the slightly "old" tinge to the photo, is very impressive to me, based on what I have previously seen from these models. Following this, I turned back the clock 20 years, to a "college party in 1965".
I want to mention a few specifics of this image that impressed me. First, the hands were a big win for this model in general. Most of my generations accurately depicted the human hand as having 5 fingers, a previous area of struggle. Next, the woman on the right of the picture is facing away from the camera but still smiling, giving the impression that she is looking at someone off to the side. I was also impressed with the consistency of the glasses being held by the people, as well as the fluid contained within them. Finally, the background of the image has a sort of "fogginess" to it, something I have seen in real photos taken during this time period, and I found that to be an impressive grasp of the era's photographic style.
I tested many other prompts with Flux.1-Dev, and I have to say that I am impressed, albeit slightly concerned, by the model's ability to produce "convincing" results. I want to quickly touch upon the model's ability to generate legible English text. In my brief text-related testing, I found that Flux.1-Dev was able to generate shorter snippets (one sentence or so) of text in a coherent and legible manner. Even when it didn't generate proper syntax, its grasp of the English alphabet was impressive. Gone were the dream-like squiggles produced by previous models, replaced with near-perfect renderings of English letters.
I do find that a lot of people like to immediately claim they can tell an AI-generated image is fake, so I suppose I should clarify that when I say convincing, I am also incorporating comparisons with abilities of previously released models into my threshold of what makes an AI-generated image convincing.
With that said, the abilities demonstrated by Flux are very impressive. The model can, most of the time, generate accurate human anatomy. It also demonstrated the very impressive ability to produce shorter snippets of English text coherently and legibly. I still love my OpenDalleV1.1, but I must admit I may transition to using the new kid on the block, Flux.1-Dev, more and more.
You can view the video for this article on my YouTube Channel