Replies

  • Hi everyone, 

    I just wanted to add that with my RTX 2060 (the 2021 model with 12 GB of VRAM, which I paid $380 for, with no additional cooling required) I'm actually having a lot of fun with generative models, and some of the results are really amazing.

    After playing for a long time with Colab notebooks (and getting frustrated with their plans), I can finally run generative models locally, including Disco Diffusion!

    Of course the generation process is not as fast as with those GPU beasts Google provides, but it's still very bearable!

    For example, I just finished generating a 960 x 540 px image (700 steps) with Disco Diffusion and it took about 15 minutes.
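
    In case it helps anyone with a similar card, the relevant knobs in the Disco Diffusion notebook looked roughly like this for that run (variable names are from memory of the public DD notebook, so treat the exact names and defaults as approximate):

    ```python
    # Rough sketch of the Disco Diffusion notebook settings for the run above.
    # Variable names follow the public DD notebook from memory and may differ by version.
    width_height = [960, 540]     # output resolution in px
    steps = 700                   # diffusion steps; about 15 minutes on an RTX 2060 12GB
    n_batches = 1                 # one image per run to stay within 12 GB of VRAM
    clip_guidance_scale = 5000    # how strongly CLIP steers the image toward the prompt
    text_prompts = {0: ["The symphony of the last orgasm, trending on artstation"]}
    ```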

    I thought I would let you know the above because, from what I've read here, it looks like some of you think you need way more expensive hardware for these things to run, which is not necessarily true.

    That said, the following is a short list of some of the generative models I have experimented with, sorted by the quality of the results I could achieve:

    - Disco Diffusion
    - GLID-3
    - CompVis Latent Models
    - CLIP guided diffusion HQ 256x256
    - V-Diffusion-Pytorch
    - VQGAN-CLIP-APP
    - Big-Sleep
    - Deep-Daze (I think this was the most humble in terms of hardware requirements; I remember running it on my old GPU with just 2 GB of VRAM!)
    - Clip-Glass


    For the projects that output low-res results, you can always run an upscaling model on top, so that's not a problem for me!
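
    For upscaling I just shell out to one of the ESRGAN-family repos. A minimal sketch, assuming a local clone of Real-ESRGAN and that its inference_realesrgan.py script and flags still match that repo's README (treat the script name, flags and model name as assumptions on my part):

    ```python
    # Minimal sketch: batch-upscale a folder of low-res generations with Real-ESRGAN's
    # command-line script. Script name, flags and model name are taken from the
    # Real-ESRGAN README and may differ in newer versions of that repo.
    import subprocess
    from pathlib import Path

    LOW_RES_DIR = Path("outputs_256")    # where the 256x256 generations live (my own layout)
    UPSCALED_DIR = Path("outputs_1024")  # the 4x upscaled results go here
    UPSCALED_DIR.mkdir(exist_ok=True)

    for img in sorted(LOW_RES_DIR.glob("*.png")):
        subprocess.run([
            "python", "inference_realesrgan.py",
            "-n", "RealESRGAN_x4plus",   # pretrained 4x general-purpose model
            "-i", str(img),
            "-o", str(UPSCALED_DIR),
        ], check=True)
    ```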

    Another great thing I managed to do: I started using FFmpeg to stream the generated image as it was being updated (say, once every 2 seconds), interpolating the resulting frames (using FFmpeg filters) into a nice, smooth video, which is basically a real-time video of the neural network at work! You need to see it! (I was thinking I could start streaming this live on YouTube or Twitch at some point.)
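
    In case anyone wants to try something similar, here is a minimal sketch of the idea (my own rough approximation, not my exact pipeline): snapshot the progress image every couple of seconds, then let FFmpeg's minterpolate filter synthesize the in-between frames. The file names and snapshot count are made up for the example:

    ```python
    # Sketch: turn a periodically updated progress image into a smooth video.
    # 1) copy the progress image to a numbered frame every 2 seconds
    # 2) run FFmpeg with the minterpolate filter to interpolate up to 30 fps
    import os
    import shutil
    import subprocess
    import time

    PROGRESS_IMAGE = "progress.png"      # image the notebook keeps overwriting (assumed name)
    FRAME_PATTERN = "frames/%05d.png"    # numbered snapshots for FFmpeg's image2 demuxer
    NUM_SNAPSHOTS = 300                  # ~10 minutes of generation at one snapshot per 2 s

    os.makedirs("frames", exist_ok=True)
    for i in range(NUM_SNAPSHOTS):
        shutil.copy(PROGRESS_IMAGE, FRAME_PATTERN % i)
        time.sleep(2)

    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", "2",                         # play the snapshots back at 2 fps (sped up)
        "-i", FRAME_PATTERN,
        "-vf", "minterpolate=fps=30:mi_mode=mci",  # motion-compensated interpolation to 30 fps
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "timelapse.mp4",
    ], check=True)
    ```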

    That's all! I'll leave you with a few images generated with some of the models above; I hope you like them:

     

    [image] Disco Diffusion - "The symphony of the last orgasm, trending or artstation"

    [image] Disco Diffusion - "The symphony of the last orgasm, trending or artstation"

    [image] Disco Diffusion - "Tripping on Datura"

    [image] V-Diffusion-Pytorch (not sure) - "Looking into the void"

    [image] VQGAN-CLIP-APP - "Looking into the void"

    [image] CLIP-guided-diffusion-HQ256x256 - "Recursive Functions Become Alive"

    [image] CLIP-guided-diffusion-HQ256x256 - "A post apocalyptic Babel tower"

    [image] CLIP-guided-diffusion-HQ256x256 - "A knight at dawn"

    [image] CLIP-glass - "An underground party in a post apocalyptic city"

    [image] Disco Diffusion - "If you gaze for long into an abyss, the abyss gazes also into you, trending on artstation"

    [image] Disco Diffusion - "Gazing into the abyss"

    [image] Disco Diffusion - "An intricate set of mirror reflections"

    • Nice looking work. I went with the A6000 for the VRAM and the ability to do native 2K with nearly all DD models loaded. I don't know that I would ever use all the models at once, but it definitely gives me some headroom for what's available now and what might be coming. But yes, I did need a second mortgage for that system ;-) I don't know what Synthetik might want to do in this area - Visions of Chaos seems to have all the current models available in a UI that looks very efficient (I haven't actually used it yet). Definitely a ton of potential with these tools.

    • Cool, thanks for sharing.

      Do you have any comments on workflow enhancements you would like to see in an ideal world for working with generative image synthesis algorithms?

      • John - did you look at the Visions of Chaos interface shot I sent? It adapts to whichever model you are using (he has incorporated nearly 100). The main thing I like about it is that you don't see all the code for each cell. And you also don't need to know the syntax for text stuff, like keyframing an animation... Let me know if you want to see more examples of DD output, though I imagine you have seen plenty...

      • Not sure about the ideal workflow; there are many roads to try, but some of them are already being tested. For example, were you aware of the following two projects?

        1. Big Sleep Creator - https://github.com/enricoros/big-sleep-creator/ - "UI for human-in-the-loop controlled image synthesis"

        Planned features:

        - Generation of images based on text
        - Hyperparameters control
        - Scatter/select creation, with progressive refinement
        - Branch/continue generation of images (tree of creation)
        - Latent space editing
        - Latent space constraining for continuation

        2. DALL·E Flow - https://github.com/jina-ai/dalle-flow - "A Human-in-the-Loop workflow for creating HD images from text"

        To make it short, this makes the generation process more interactive by asking for user input at different generation steps.

        ---

        I think I saw other projects of this kind, but at the moment I can't remember them...

      • Hi John,

        I've toyed around with a number of the other tools out there, but more recently have spent quite some time experimenting with Dalle-2, MidJourney and some Stable Diffusion too. 

        Dalle is very "coherent" but less naturally "artistic" and, as you pointed out, is very restrictive/censored... not to mention the fixed square aspect ratio and the fact that while one can upload a source image, this can't be combined with prompt input. The one feature that I do appreciate, even if it is way too constrained, is the ability to "edit", i.e. to select an area of an image to modify. This would be far, far more interesting if it provided a full range of selection tools, instead of a crude, round, soft-edge eraser.

        I generally prefer the aesthetics of the MidJourney output, the variable aspect ratio and the reduced censorship, although this comes with far less "coherent" results. The new "Beta" model is a dramatic improvement, but one is still limited by the ephemeral influence of the input image, the difficulty in reproducing/tweaking results, the lack of an "edit" feature, and, one might say, the excessive influence of the default MJ aesthetic. Overall, while the results are quite impressive, and it is tons of fun to explore, I'd suggest that the artist's ability to exercise intent and direction is somewhat limited; prompt engineering does have an impact, but so much boils down to simply rolling the dice enough times to come up with attractive results.

        Stable Diffusion was finally released and shows a lot of promise, in particular given its open source orientation and the associated implications for censorship-avoidance. While I initially found it more difficult to generate results as satisfying as those I've been producing with MJ, I really appreciate the ability it provides to reliably and consistently reproduce results, and thus to tweak them using the various knobs and levers provided towards a desired output. This should allow the artist to exercise more intent and direction, particularly as the coherence of the models improves over time. Moreover, unlike either Dalle or MJ, Stable Diffusion is open source, which opens up a whole range of possibilities in terms of how it could be plugged into other tools and workflows, and the price/accessibility dynamics.
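
        To make the seed-control point concrete, here is roughly what that looks like with the Hugging Face diffusers wrapper (just one of several ways to run Stable Diffusion locally; the model ID and parameter values below are illustrative, not a recipe):

        ```python
        # Minimal sketch: fix the seed so a Stable Diffusion result can be reproduced
        # exactly and then tweaked one knob at a time (prompt, guidance, steps).
        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",   # illustrative model ID
            torch_dtype=torch.float16,
        ).to("cuda")

        prompt = "a post apocalyptic Babel tower, oil painting"
        generator = torch.Generator("cuda").manual_seed(1234)  # the seed is the reproducibility knob

        image = pipe(
            prompt,
            num_inference_steps=50,
            guidance_scale=7.5,
            generator=generator,
        ).images[0]
        image.save("babel_seed1234.png")
        ```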

        Combining the best of Dalle-2, MidJourney and Stable Diffusion, one "ideal" workflow would allow:

        • inputting images,
        • combining them with prompts to generate images with variable aspect ratios using a coherent model,
        • providing the ability to "edit" based on a comprehensive set of selection tools, and
        • providing seed control that allows tangible control when refining towards a desired output.

        With respect to Studio Artist, I love the power that it provides, am in awe of its capabilities, and will likely upgrade SA if/when the time comes, but I have also found it very difficult to wield this power. For example, I wasn't able to reproduce Charis Tsevis-style results as well as I'd have liked, and mostly ended up using it to generate hundreds/thousands of versions of input images. While this was fun and yielded some great results, it also produced a lot of throwaways along the way, created work separating the wheat from the chaff, and ultimately wasn't an interactive/iterative process. As a result, today I'm more likely to spend those hours playing with these new engines than with SA.

        That being said, I can't help but wonder how Studio Artist's capabilities could be leveraged to interact with one or more of these tools, e.g. generating images/selections/masks to be used as inputs to such services, then retrieving the results and incorporating them into SA's scripted/batched processing capabilities. I'd LOVE to see what you might come up with.

        Keep up the great work!! 

         

        • Thanks for your suggested workflow comments.

          I was wondering if we should set up a specific Group here on the user forum devoted to generative neural net synthesis, to discuss the different available options, different approaches to using Studio Artist to process the output, etc.

           

          We've been heavily researching all of the different approaches for generative neural net image synthesis here for a while now. I post images/animations every day on my art blog with different neural net AI approaches, in addition to the new EBM (energy based model) digital paint tech we are working on. I've recently been experimenting with CLIP-guided direct RGB generative image synthesis, which is kind of surprising me, to be honest, and making me question some underlying assumptions people have about this technology.

          I have tried Stable Diffusion, and while it is very cool, the current lack of 'control' is a huge limitation, I think, for digital artists. The prototype generative AI context we're currently working with here is based on the CompVis latent diffusion model. The cool thing about the way it is set up is that you aren't restricted to just working with text prompts. And you also have a lot of adjustable control over what happens.

          Working with a fixed text prompt for the system, I can totally change the content and appearance of the generative output by manipulating the adjustable controls for the algorithm. So systems that just let you enter text, with limited or no control over the internal parameters of the algorithm, are kind of missing the point in some respects, I think.

          A better approach, in my opinion, is to view the system as an image synthesis engine, and then let the user have full control over dialing in everything that system has to offer for controlling the image synthesis process, be it realistic image synthesis or abstract image synthesis.
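
          To give a concrete illustration of steering with more than a text prompt (this is not our prototype, just the publicly available latent diffusion tooling), an init image plus a strength knob already changes the character of the output a lot. A rough sketch with diffusers, where the parameter names follow recent releases and may differ slightly in older ones:

          ```python
          # Sketch: image-to-image latent diffusion, where an init image and a 'strength'
          # knob steer the result alongside the text prompt.
          import torch
          from PIL import Image
          from diffusers import StableDiffusionImg2ImgPipeline

          pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
              "CompVis/stable-diffusion-v1-4",   # illustrative model ID
              torch_dtype=torch.float16,
          ).to("cuda")

          init = Image.open("my_sketch.png").convert("RGB").resize((512, 512))

          image = pipe(
              prompt="an intricate set of mirror reflections, oil on canvas",
              image=init,
              strength=0.6,          # 0 = keep the init image, 1 = mostly ignore it
              guidance_scale=7.5,    # how strongly the text prompt is enforced
              num_inference_steps=50,
          ).images[0]
          image.save("steered.png")
          ```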

           

          It seems like everyone is overly focused on text prompt control of these algorithms.  And sure, text prompt control is cool.

          But think about all of the subtlety associated with any given image. If you tried to define that with words you could write a novel (violating the CLIP embedding token limit, by the way) and still not come close to getting at the subtlety that is right there in the image itself.

          There is a reason why the old expression 'an image is worth a thousand words' was even thought up. It is really getting at something very important about images and perception that text-only prompt systems are totally missing.
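
          As a side note on that token limit: the reference CLIP tokenizer caps prompts at 77 tokens, which you can see directly (a quick sketch using OpenAI's clip package; the truncate argument exists in recent versions of it):

          ```python
          # Quick check of CLIP's 77-token prompt limit using OpenAI's reference tokenizer.
          import clip

          short = "a knight at dawn, trending on artstation"
          novel = "a knight at dawn " * 100   # pretend this is your thousand-word description

          print(clip.tokenize(short).shape)   # torch.Size([1, 77]) -- padded to the context length

          try:
              clip.tokenize(novel)            # raises because the text exceeds 77 tokens
          except RuntimeError as err:
              print("too long:", err)

          tokens = clip.tokenize(novel, truncate=True)  # or silently cut the prompt off at 77 tokens
          ```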

          So I'm not saying never use text, just that it is one adjustable part of a much bigger adjustable system. And if you want real artistic control over manipulating what the system is doing, you need access to all of that.

           

          You mentioned censorship. I have no real interest in using OpenAI's Dall-e 2 because of that (although their pricing is also too high for what you are getting). You wonder: if the pencil were invented today, would people be having conversations about how it could be used to create harmful images, and how it should be restricted because of that?

          Let's consider Guernica, a work of art thought by art critics to be one of the most important anti-war paintings in history. If you tried to make it using Dalle-2, OpenAI would kick you off their system for violating their terms of use. It depicts war, violence, and animal mutilation.

          Stable Diffusion has 2 different censorship mechanisms built into it if you look at the actual code: one that checks your text prompt, another that checks the output image. Since the code is open source, you could remove or comment out that stuff. I worry a little bit about 'censorship' associated with the training of the 5B model (and the associated 'aesthetic' models derived from it) they used. I also worry a little that even though it is open source, you need to enter an authorization code to activate the model, which leaves a back door for them to turn it off or restrict it at a later time.

          I was experimenting with a different latent diffusion model that had censorship code like this in it, and it was barking at images of people doing yoga as being obscene. So sure, if you want to make something for kids to create cute pictures of animals, maybe a parental mode makes sense. But for artists it is ridiculous.

          • These technologies are here to stay so I can't help but think that a dedicated group devoted to generative neural net synthesis would be warranted.

            I fully agree with your analysis regarding the limits of using words/language, and the need to expose levers that provide sufficient artistic control.

            You mentioned a "prototype generative [...] based on the CompVis latent diffusion model"; would you be able to recommend a place where one could get the best sense of the potential here?

            I hesitate to even comment on the possible options going forward, since you understand these things far better than I ever will, and my input will be of limited value here, but I'll take a chance and do so anyway. 

            It seems to me that when considering a path forward, SA is faced with a choice to either:

            1. build in tools in such a way as to be/appear native to the application,
            2. integrate with such a tool via APIs, or
            3. integrate with various and sundry web services.

            In the case of #1, i.e. "SA now features generative neural net synthesis", you might start by integrating an open source option off the shelf, but you'd likely find yourself needing to maintain a fork going forward, the burden of which would only grow over time, which might, in turn, limit feature expansion. While I appreciate that this option is likely the easiest for some new end-users, my experience in the software development arena suggests that "small pieces loosely joined" has a lot of merit.

            In the case of #2, i.e. "You can now leverage generative neural net synthesis if you also install product X", you would only have to maintain the API interactions and not the code base itself. This would provide a basis for potentially including other tools over time, and would likely allow feature expansion to be dictated more by the strength of the communities around those tools than by the size of Synthetik's development team. This leverages the "small pieces loosely joined" approach, but at some cost (additional installation requirements, a little less "total control", etc.)

            This is similar in the case of #3, i.e. "You can now leverage your Dalle/MJ/[...] account to do X", but might involve a longer list of API interactions to maintain. This feels like the least "SA-ish" approach, but could perhaps result in a more generic approach to defining an interaction/API model.
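
            Just to make #2/#3 a bit more concrete, the interaction could be as thin as a single HTTP round trip. A rough sketch against an entirely hypothetical local endpoint and JSON schema (nothing here reflects a real SA or third-party API):

            ```python
            # Hypothetical sketch of option #2: SA (or a helper script) posts a prompt to a
            # locally running image-synthesis service and saves the result for further processing.
            # The endpoint, port and JSON fields are made up for illustration only.
            import base64
            import json
            from urllib import request

            payload = {
                "prompt": "an underground party in a post apocalyptic city",
                "width": 768,
                "height": 512,
                "seed": 1234,
            }

            req = request.Request(
                "http://localhost:7860/api/generate",   # hypothetical local service
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            with request.urlopen(req) as resp:
                result = json.load(resp)

            # Assume the service returns the image as base64; write it out for SA to pick up.
            with open("generated.png", "wb") as f:
                f.write(base64.b64decode(result["image_base64"]))
            ```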

            Anyway, food for thought.

        • All of your workflow suggestions are good, and do resonate with our thinking here about how to incorporate this technology into the Studio Artist workflow.

          And your analysis of various approaches we could take is also pretty spot on. I do worry a little bit about the potential for people to drain their bank accounts if we attach web-service-based approaches that get billed on cloud GPU usage to something like Gallery Show.

           

          One thing that is maybe a little bit different in our thinking is that we're really interested in how to incorporate the notion of 'improvisation' into how one might work with generative image synthesis. So I thought I would throw that out there to see if anyone has any thoughts, or 'wouldn't it be nice' wishes.

          • My $0.02, in terms of improvisation...

            The first thing that comes to mind is how some of these systems allow initial strokes as inputs and/or include "inpainting" (including Stable Diffusion, I believe, although not via the current DreamStudio web implementation AFAIK). Combined with the ability to expand/pan the canvas, this could allow the artist to do things like select areas, pan or zoom out, and then fill with matching or prompt-based AI-generated pixels. This could be extended to auto-selecting/panning/zooming, etc. and/or rotating/dynamically changing prompts.
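
            For what it's worth, the inpainting piece is already scriptable outside of DreamStudio. A minimal sketch with the diffusers inpainting pipeline (the checkpoint ID and parameter names follow recent diffusers releases; treat them as assumptions):

            ```python
            # Sketch: fill a selected (masked) region of an image from a prompt, which is the
            # building block for the select-then-fill / pan-and-outpaint workflow described above.
            import torch
            from PIL import Image
            from diffusers import StableDiffusionInpaintPipeline

            pipe = StableDiffusionInpaintPipeline.from_pretrained(
                "runwayml/stable-diffusion-inpainting",   # illustrative inpainting checkpoint
                torch_dtype=torch.float16,
            ).to("cuda")

            image = Image.open("canvas.png").convert("RGB").resize((512, 512))
            mask = Image.open("selection_mask.png").convert("L").resize((512, 512))  # white = repaint

            result = pipe(
                prompt="the scene continues into a misty forest",
                image=image,
                mask_image=mask,
                num_inference_steps=50,
                guidance_scale=7.5,
            ).images[0]
            result.save("canvas_filled.png")
            ```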

