noumenon@lemmy.world to Machine Learning@programming.dev · English · 23 days ago

DeepSeek proposes a core transformer architecture change that modifies how residual connections work at scale

arxiv.org


mHC: Manifold-Constrained Hyper-Connections
Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
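The abstract does not spell out the manifold or the exact update rule, so the following is only a minimal sketch under two stated assumptions: (a) an HC-style layer keeps n parallel residual streams and updates them as `H_res @ streams + h_post * F(...)`, and (b) the "specific manifold" is the set of doubly stochastic matrices (non-negative, every row and column summing to 1), reached here with Sinkhorn-Knopp normalization. Both are illustrative assumptions, not the paper's confirmed method, and the function names `sinkhorn_knopp`, `matmul`, and `residual_step` are hypothetical. The point it shows: the identity matrix is itself doubly stochastic, and a product of doubly stochastic matrices stays doubly stochastic, so stacking many constrained layers cannot amplify the residual signal the way an unconstrained mixing matrix can.

```python
import math
import random


def sinkhorn_knopp(logits, iters=100):
    """Approximately project a square matrix onto the doubly stochastic set.

    Assumption for illustration: exp() makes every entry strictly positive,
    then alternating row and column normalization (Sinkhorn-Knopp) drives
    every row sum and every column sum toward 1.
    """
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        row_sums = [sum(row) for row in m]
        m = [[v / row_sums[i] for v in row] for i, row in enumerate(m)]  # rows -> 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # cols -> 1
    return m


def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def residual_step(h_res, streams, h_post, f_out):
    """Hypothetical HC-style update of n parallel residual streams.

    streams: n x d widened residual stream
    h_res:   n x n mixing matrix on the residual path
    h_post:  n weights writing the layer output back into each stream
    f_out:   d-dim layer output, standing in for F(h_pre @ streams)
    """
    mixed = matmul(h_res, streams)
    return [[mixed[i][j] + h_post[i] * f_out[j] for j in range(len(f_out))]
            for i in range(len(streams))]


if __name__ == "__main__":
    random.seed(0)
    n, depth = 4, 50
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Unconstrained case: a mixing matrix only 10% above identity, applied per layer.
    drift = [[1.1 * v for v in row] for row in eye]
    unconstrained = [row[:] for row in eye]
    constrained = [row[:] for row in eye]
    for _ in range(depth):
        unconstrained = matmul(drift, unconstrained)
        logits = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
        constrained = matmul(sinkhorn_knopp(logits), constrained)
    # Grows as 1.1**50 (over 100x): the residual path amplifies the signal.
    print("unconstrained row sum:", sum(unconstrained[0]))
    # Product of doubly stochastic matrices is doubly stochastic: row sum stays ~1.
    print("constrained row sum:  ", sum(constrained[0]))
```

Row sums of 1 make each output stream a convex combination of the input streams, and column sums of 1 preserve the total across streams. Because exp() keeps every entry positive, this particular projection only approaches the exact identity (via large diagonal logits) rather than reaching it; plain Python is used so the sketch runs without extra dependencies.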
