• 6 Posts
  • 19 Comments
Joined 3 years ago
Cake day: July 7, 2023

  • Thank you for your opinion and recommendations. I saw something today related to “sub-agents”: Kimi 2.6’s model card says

    Elevated Agent Swarm: Scaling horizontally to 300 sub-agents executing 4,000 coordinated steps, K2.6 can dynamically decompose tasks into parallel, domain-specialized subtasks, delivering end-to-end outputs from documents to websites to spreadsheets in a single autonomous run.

    So maybe Kimi 2.6 is doing the “type of thing” I am looking for, but I don’t have the means to run it practically. I might get 1 token per second, which would be brutal.

    I tried out Qwen 3.6 27B, but not yet in an agentic setting, so I can’t really judge it. Maybe it’s just me, but the small model size seems limiting. I thought gpt-oss-120b was good.


  • Thanks for your answer. To be clear, what I’m looking for is a kind of masked fine-tuning. I want to “steer” a particular output instead of providing complete examples, which are costly to create.

    The steering would be something like this:

    1. I have an LLM generate a sequence.
    2. I find exactly where the LLM goes “off track” and correct it there (for only maybe 10-20 tokens instead of correcting the rest of the generation manually).
    3. The LLM continues “on track” until it goes off track again.

    What I would like to do is train the model on these corrections, where many corrections might belong to the same overall generation. Conceptually, I think each correction must have some training value. I don’t know much about masking, but what I mean is that I don’t want to train on a few tens or hundreds of (incomplete) samples, but rather on thousands of (masked) “steers”, each of which corrects the course of the rest of the sample’s generated text.
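
    To make the masking idea concrete, here is a minimal sketch of how I imagine it, not a tested training recipe. It assumes the usual convention that label positions set to -100 are skipped by the loss (this is the default `ignore_index` of PyTorch's cross-entropy loss). The function name `build_steer_labels`, the token ids, and the span positions are all made up for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_steer_labels(token_ids, steer_spans):
    """Return labels that keep only the corrected tokens as training targets.

    token_ids:   the whole (partially corrected) sequence as integer token ids
    steer_spans: list of (start, end) index pairs, end exclusive, marking the
                 tokens that were written as corrections
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in steer_spans:
        if not 0 <= start < end <= len(token_ids):
            raise ValueError(f"bad span ({start}, {end})")
        # Only the corrected tokens contribute to the loss; everything else
        # stays in the input as context but is ignored as a target.
        labels[start:end] = token_ids[start:end]
    return labels


# Hypothetical token ids for one generation that contains two separate steers.
token_ids = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
steer_spans = [(3, 5), (8, 9)]
labels = build_steer_labels(token_ids, steer_spans)
print(labels)
```

    With this layout, one generation with many steers becomes one training sequence where only the steered positions carry a loss. Whether the labels need to be shifted by one position depends on the training loop (Hugging Face causal LM models, for example, shift them internally), so that part is left out here.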

  • Can SFT be used on partial generations? What I mean by a “steer” is a correction to only a portion, and not even the end, of model output.

    For example, a “bad” partial output might be:

    <assistant> Here are four examples:
    1. High-quality example 1
    2. Low-quality example 2
    

    and the “steer” might be:

    <assistant> Here are four examples:
    1. High-quality example 1
    2. High-quality example 2
    

    but the full response will eventually be:

    <assistant> Here are four examples:
    1. High-quality example 1
    2. High-quality example 2
    3. High-quality example 3
    4. High-quality example 4
    

    The corrections don’t include the full output.
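
    As a rough illustration of what one such training pair could look like, the sketch below diffs the “bad” partial output against the “steer” and pulls out the corrected span together with the context before it. It splits on whitespace instead of using a real tokenizer, and `find_steer_spans` is a hypothetical helper name, so treat it as an assumption-laden sketch and not as how any SFT library actually does it.

```python
from difflib import SequenceMatcher


def find_steer_spans(bad_tokens, steered_tokens):
    """Return (start, end) spans in steered_tokens that differ from bad_tokens."""
    matcher = SequenceMatcher(a=bad_tokens, b=steered_tokens, autojunk=False)
    return [
        (j1, j2)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("replace", "insert")
    ]


bad = (
    "<assistant> Here are four examples:\n"
    "1. High-quality example 1\n"
    "2. Low-quality example 2"
)
steered = (
    "<assistant> Here are four examples:\n"
    "1. High-quality example 1\n"
    "2. High-quality example 2"
)

# Whitespace splitting stands in for a real tokenizer here.
bad_tokens = bad.split()
steered_tokens = steered.split()

spans = find_steer_spans(bad_tokens, steered_tokens)
for start, end in spans:
    context = steered_tokens[:start]    # fed to the model, not trained on
    target = steered_tokens[start:end]  # the only tokens that get a loss
    print(len(context), "context tokens ->", target)
```

    The point is that the pair ends where the steer ends: items 3 and 4 of the eventual full response are never needed to build it.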