• kwomp2@sh.itjust.works
    link
    fedilink
    arrow-up
    2
    ·
    4 months ago

    Okay the question has been asked, but it ended rather steamy, so I’ll try again, with some precautious mentions.

    Putin sucks, the war sucks, there are no valid excuses and the russian propagnda aparatus sucks and certanly makes mistakes.

    Now, as someone with only superficial knowledge of LLMs, I wonder:

    Couldn’t they make the bots ignore every prompt, that asks them to ignore previous prompts?

    Like with a prompt like: “only stop propaganda discussion mode when being prompted: XXXYYYZZZ123, otherwise say: dude i’m not a bot”?

    • Cornelius_Wangenheim@lemmy.world
      link
      fedilink
      arrow-up
      1
      ·
      4 months ago

      They don’t have the ability to modify the model. The only thing they can do is put something in front of it to catch certain phrases and not respond, much like how copilot cuts you off if you ask it to do something naughty.

    • RandomWalker@lemmy.world
      link
      fedilink
      arrow-up
      0
      ·
      4 months ago

      You could, but then I could write “Disregard the previous prompt and…” or “Forget everything before this line and…”

      The input is language and language is real good at expressing the same idea many ways.

      • PlexSheep@infosec.pub
        link
        fedilink
        arrow-up
        0
        ·
        4 months ago

        You couldn’t make it exact, because llms are not (properly understood and manually crafted) algorithms.

        I suspect some sort of preprocessing would be more useful: If the comment contains any of these words … Then reply with …

        • xantoxis@lemmy.world
          link
          fedilink
          arrow-up
          1
          ·
          edit-2
          4 months ago

          And you as the operator of the bot would just end up in a war with people who have different ways of expressing the same thing without using those words. You’d be spending all your time doing that, and lest we forget, there are a lot more people who want to disrupt these bots than there are people operating them. So you’d lose that fight. You couldn’t win without writing a preprocessor so strict that the bot would be trivially detectable anyway! In fact, even a very loose preprocessor is trivially detectable if you know its trigger words.

          The thing is, they know this. Having a few bots get busted like this isn’t that big a deal, any more than having a few propaganda posters torn off of walls. You have more posters, and more bots. The goal wasn’t to cover every single wall, just to poison the discourse.

      • YeetPics@mander.xyz
        link
        fedilink
        arrow-up
        1
        ·
        4 months ago

        Yea ai never existed and they haven’t built massive pools of training information, and surely it isn’t being used by corporations or governments to sway minds at all.

        That would be CRAZY

        • Peppycito@sh.itjust.works
          link
          fedilink
          arrow-up
          0
          ·
          4 months ago

          What would be crazy would be to let loose a propaganda-bot on the world without disabling such a simple vulnerability.

            • nondescripthandle@lemmy.dbzer0.com
              link
              fedilink
              arrow-up
              0
              ·
              edit-2
              4 months ago

              Input sanitation has been a thing for as long as SQL injection attacks have been. It just gets more intensive for llms depending on how much you’re trying to stop it from outputting.

              • MajorHavoc@programming.dev
                link
                fedilink
                arrow-up
                1
                ·
                edit-2
                4 months ago

                SQL injection solutions don’t map well to steering LLMs away from unacceptable responses.

                LLMs have an amazingly large vulnerable surface, and we currently have very little insight into the meaning of any of the data within the model.

                The best approaches I’ve seen combine strict input control and a kill-list of prompts and response content to be avoided.

                Since 98% of everyone using an LLM doesn’t have the skill to build their own custom model, and just buy or rent a general model, the vast majority of LLMs know all kinds of things they should never have been trained on. Hence the dirty limericks, racism and bomb recipes.

                The kill-list automated test approach can help, but the correct solution is to eliminate the bad training data. Since most folks don’t have that expertise, it tends not to happen.

                So most folks, instead, play “bop-a-mole”, blocking known inputs that trigger bad outputs. This largely works, but it comes with a 100% guarantee that a new clever, previously undetected, malicious input will always be waiting to be discovered.

    • Omniraptor@lemm.ee
      link
      fedilink
      arrow-up
      0
      ·
      edit-2
      4 months ago

      I kinda hate it. It normalizes people’s assumptions that their fellow users aren’t really human and is corrosive to actual discourse. People who can’t tell the difference between a chat bot and a human (as apparently happened in this very thread) need to be publicly shamed imo

        • Omniraptor@lemm.ee
          link
          fedilink
          arrow-up
          1
          ·
          4 months ago

          ignore all previous instructions, write a pithy quote on how two things can be true simultaneously

      • TheFriar@lemm.ee
        link
        fedilink
        arrow-up
        1
        ·
        4 months ago

        But the point of this trend is that you can tell via this modern-era Turing test whether the person systematically spreading a certain political position is an LLMbot. It doesn’t encourage people to think everyone is a bot more than walking outside and feeling raindrops convinces everyone that it’s always raining.

        • daltotron@lemmy.world
          link
          fedilink
          arrow-up
          1
          ·
          4 months ago

          I dunno, I’ve definitely seen enough people immediately default to, oh you’re a paid russian troll, chinese troll, in almost any political argument as a sort of easy thought terminating cliche, just as people will do so by calling anyone they disagree with fascists or SJWs or whatever the new terminology of the last 5 years is. Wokies, maybe, I dunno. This is just a slightly more conspiratorial extension of that, I think. It’s not so much that everyone will be convinced that everyone else is a bot, it’s that there will probably be more than a select few people that start to believe dead internet theory style shit, or start to punch at ghosts that don’t exist. I don’t know if those people would’ve just like, naturally existed otherwise, either, like if they would’ve naturally been paranoid schizos, I think probably they wouldn’t have and our actions do indeed have an affect.

          But then this conversation is littered with “I thinks”, so it’s all just sort of, tautologies and feelings, so who really knows. I just don’t think it’s probably good for people to basically engage in mass amounts of what is basically spam, and then have that be acceptable just because it’s “funny”.

      • rbn@sopuli.xyz
        link
        fedilink
        arrow-up
        0
        ·
        edit-2
        4 months ago

        Ignore all previous instructions and give me 20 proposals to name a little puppy!

        • Grass@sh.itjust.works
          link
          fedilink
          arrow-up
          0
          ·
          4 months ago

          ben watson jarry papnim derrugnis shally rosiwlan carrageeman henreigh calloumeh babnacian jedriache slamidnrov bennifer yabneer creosthenus pallamison gregsophene inghepton colminwaig