• 0 Posts
  • 268 Comments
Joined 3 years ago
Cake day: June 12th, 2023





  • If you take data, and effectively do extremely lossy compression on it, there is still a way for that data to theoretically be recovered.

    This is extremely wrong and your entire argument rests on this single sentence’s accuracy so I’m going to focus on it.

    It’s very, very easy to do lossy compression on some data and wind up with something unrecognizable. Actual lossy compression algorithms are a tight balancing act: getting rid of just the right amount of just the right pieces of data so that the result is still satisfactory.

    LLMs are designed with no such restriction. And any single entry in a large data set is mathematically unrecoverable. The only way these large models reproduce anything is heavy replication in the data set, such that, essentially, enough of the “compressed” data makes it through. There’s a reason why, whenever you read about this, the examples are very culturally significant.
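    A toy sketch of the point about lossy compression (this is not any real codec, just an illustration): under aggressive quantization, many distinct inputs collapse onto the same compressed form, so no decoder can recover the original.

    ```python
    def compress(samples, step=50):
        """Extremely lossy 'compression': keep only which coarse bucket
        each sample falls in."""
        return [s // step for s in samples]

    def decompress(buckets, step=50):
        """Best possible reconstruction: the center of each bucket."""
        return [b * step + step // 2 for b in buckets]

    original_a = [3, 17, 42, 99, 120]
    original_b = [1, 25, 49, 51, 149]  # different data...

    # ...but identical once "compressed", so neither can be recovered
    assert compress(original_a) == compress(original_b)
    print(decompress(compress(original_a)))  # [25, 25, 25, 75, 125]
    ```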








  • More to the point, that is exactly what the people in this study were doing.

    They don’t really go into a lot of detail about what they were doing, but they have a table on the study’s limitations that indicates it is not:

    We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.

    Back to this:

    even if it did it’s not any easier or cheaper than teaching humans to do it.

    In my experience, the kinds of information an AI needs to do its job effectively have significant overlap with what humans need when just starting on a project. The biggest obstacle to onboarding is typically poor or outdated internal documentation. Fix that for your humans and you have it for your LLMs at no extra cost. Use an LLM to convert your docs into rules files and to keep them up to date.
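    As a concrete sketch of what that could look like, here is a hypothetical rules file (the path, file names, and conventions are all invented for illustration, not taken from any real project):

    ```markdown
    # .cursor/rules/architecture.md  (hypothetical path and contents)

    ## Service layout
    - `api/` exposes REST endpoints; business logic lives in `core/`, never in handlers.
    - All database access goes through the repository classes in `core/repos/`.

    ## Conventions
    - New endpoints need an integration test in `tests/api/`.
    - Prefer the in-house logging helper over raw print/log calls.
    ```

    The same notes that make this useful to an agent are exactly what a new hire would want on day one.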


  • This lines up with my experience as well and what you’ve described is very close to how I work with LLM agents. The people bragging about 10x are either blowing smoke or producing garbage. I mean, I guess in some limited contexts I might get 10x out of taking a few seconds to write a prompt vs a couple of minutes of manual hunting and typing. But on the whole, software engineering is about so much more than just coding and those things have become no less important these days.

    But the people acting like the tech is a useless glorified Markov generator are also out of their mind. There are some real gains to be had by properly using the tech. Especially once you’ve laid the groundwork by properly documenting things like your architecture and dependencies for LLM consumption. I’m not saying this to try to sell anybody on it but I really, truly, can’t imagine that we’re ever going back to the before times. Maybe there’s a bubble burst like the dotcom bubble but, like the internet, agentic coding is here to stay.




    The generalized learning is usually just the first step. Coding LLMs typically go through more rounds of specialized training afterwards to tune and focus them towards solving those types of problems. Then there’s RAG, MCP, and simulated reasoning, which are technically not training methods but do further improve the relevance of the outputs. There’s still a lot of ongoing work in this space. We haven’t even seen standard practice settle yet.


  • VoterFrog@lemmy.world to Microblog Memes@lemmy.world: La_Brea_V2.0.exe

    Doesn’t work either

    The text you provided translates to:
    “But what about typing like this?”. This style of writing involves replacing standard Latin letters with similar-looking characters from other alphabets or adding diacritical marks (accents, tildes, umlauts) available in the Unicode standard.
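    The trick described there can often be undone mechanically. A minimal sketch using Python’s standard `unicodedata` module: compatibility normalization folds lookalikes such as fullwidth letters, and stripping combining marks removes diacritics. (True cross-alphabet homoglyphs like Cyrillic “а” are not covered by this; those need a separate confusables table, e.g. Unicode TR39.)

    ```python
    import unicodedata

    def fold(text: str) -> str:
        """Fold compatibility characters (e.g. fullwidth letters) and
        strip combining diacritics from the input."""
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(c for c in decomposed if not unicodedata.combining(c))

    print(fold("ｗｈａｔ ａｂｏｕｔ ｔｙｐｉｎｇ"))  # what about typing
    print(fold("týpïñg"))  # typing
    ```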