资讯详情

lld 链接器简述

lld 是 LLVM 一个子项目在开发中的一个子项目是链接器。 MacOS 其链接时间和优化效果远远超过苹果 ld64。

本文大纲如下: 1、Linker works 2、Traditional linker 3、lld:Atom based ??3.1、types ??3.2、Fast ??3.3、file model ???3.3.1、Object File ???3.3.2、Static Library (Archive) ???3.3.3、Dynamic Library (Shared Object) ??3.4、linking step ???3.4.1、Resolving ???3.4.2、Passes 4、feature 5、Darwin Linker ??5.1、Future Directions ???5.1.1、Sections ???5.1.2、Mach-o Object File Format ???5.1.3、New Object File Model ???5.1.4、Debug Info 6、Note

refer https://www.jianshu.com/p/ef3415255808

1、Linker works 链接器的任务

“Binds more abstract names to more concrete names, which permits programmers to write code using more abstract names”.

链接器需要将这些抽象的名称绑定到更具体的名称,以开发编程中使用的抽象名称。

In general concrete representations are higher performance but less flexible.

一般来说,更具体的表现会有更高的性能,但会导致更低的灵活性。

In practice a tool that glues together the outputs of separate compilation into a single output.

在实践中,链接器是将每个编译输出结果缝合到单个输出的工具。

A linker is in many ways a slightly more sophisticated version of cat. The linker must copy the necessary data from all the input objects, rearrange it and resolve fixups. It is worth bearing this in mind when complaining that the linker is slow, just doing a cat of all the object files can also take a significant amount of time, particularly when debug information is involved.

链接器有点像命令行。 cat 操作时,需要从输入的目标文件中复制所需组和修改所需的数据。这些工作是值得的,仅仅是所有输入的目标文件 cat 操作非常耗时,特别是当有调试信息时。

链接器就像一条装配线,每一步都使它更清晰,更接近机器代码。最后,需要明确定义每个符号的地址和重定位。

  • 阅读所有内容并进行局部转换
  • 要进行全局转换,需要有全局信息,比如垃圾回收
  • 生成 sections(具有相同权限的连续数据) ,如 PLT 和 GOT
  • 重组这些 sections 并分配相应的地址
  • 生成输出文件

Loading an object from a static library may introduce new undefined references as well as definitions, this may require more objects to be loaded to satisfy these references.

加载静态库 .o 文件可能会引入新的未定义引用和定义,需要加载更多 .o 满足这些引用的文件。

Shared libraries may also be “loaded” to provide symbol definitions and references, although loading just means that the shared library is added as a dependency of the output. The contents of the shared library beyond the symbol table aren’t needed.

动态库也可以加载以提供符号的定义和参考,尽管加载只意味着这些动态库依赖于输出。对于链接器,动态库只需要符号信息。

Finding all the content for a link is an iterative process that finishes when no more objects are loaded to resolve an undefined reference.

要链接的所有内容都是一个不断的过程,直到没有未定义的引用需要解决。

链接参数通常有固定的顺序要求,例如 file1.o lib1.a file2.o ,如果 file2.o 需要用到 lib1.a 里面的符号会报错,因为通常加载的目标文件不会再加载。解决方案是调整它们,或者在 file2.o 后面再加一个 lib1.a。

根据重定位信息,链接器使用有向图 section 现阶段没有地址分配,可以优化,比如垃圾回收。

一旦每个 section 地址分配,每个重定位信息和函数定义都会有相应的地址,重定位信息可以修改。

2、Traditional linker 传统的链接器

“section” based like traditional linkers which mostly just interlace sections from multiple object files into the output file.

以传统链接器为基础 section 完成的目标文件很多 sections 组合到输出文件中。

Traditional section based linking work well for simple linking, but their model makes advanced linking features difficult to implement. Features like dead code stripping, reordering functions for locality, and C coalescing require the linker to work at a finer grain.

传统基于 section 链接对于一般链接没有问题,但是它们的模型很难支持和实现一些链接特性。例如,删除无用代码,重新排列局部函数c 合并操作等。

3、lld: Atom based

An atom is an indivisible chunk of code or data. An atom has a set of attributes, such as: name, scope, content-type, alignment, etc. An atom also has a list of References. A Reference contains: a kind, an optional offset, an optional addend, and an optional target atom.

一个 atom 代码或数据是不可分割的。 atom 名称、范围、类型、对齐等属性很多。 atom 还有一系列的引用,包括类型、可选偏移、可选添加和可选目标 atom。

The Atom model allows the linker to use standard graph theory models for linking data structures. Each atom is a node, and each Reference is an edge. The feature of dead code stripping is implemented by following edges to mark all live atoms, and then delete the non-live atoms.

atom 该模型允许链接器使用标准的图形理论模型来链接数据结构。每个模型 atom 它是一个节点,每个引用都是一个边缘。去除无用代码的特性是根据边缘标记生存 atom、删除不存活的 atom 来实现的。

Typically each user written function or global variable is an atom. In addition, the compiler may emit other atoms, such as for literal c-strings or floating point constants, or for runtime data structures like dwarf unwind info or pointers to initializers.

一般来说,我们编写的函数或全局变量是一个 atom此外,编译器也可能生成其他 atom,如 C 的文本字符串、浮点数常量、运行时的数据结构(dwarf 栈信息等)、指针的初始化。

3.1、types

There are only four different types of atoms:

atom 只有 4 种类型

  • DefinedAtom 定义

95% of all atoms. This is a chunk of code or data 代码块或数据块

  • UndefinedAtom 未定义

This is a place holder in object files for a reference to some atom outside the translation unit.During core linking it is usually replaced by (coalesced into) another Atom.

这是某个目标文件里对于引用的占位符,它指向的是当前编译单元之外的一些 atom。经过主要的链接后,通常它就会被其他 atom 所替换了。

  • SharedLibraryAtom 动态库

If a required symbol name turns out to be defined in a dynamic shared library (and not some object file). A SharedLibraryAtom is the placeholder Atom used to represent that fact.It is similar to an UndefinedAtom, but it also tracks information about the associated shared library.

如果所需要的符号名是在动态库当中定义的。那么 SharedLibraryAtom 就是一个占位符的 atom。有点像上面的 UndefinedAtom,但它会跟踪相关的动态库的信息。

  • AbsoluteAtom 绝对

This is for embedded support where some stuff is implemented in ROM at some fixed address. This atom has no content. It is just an address that the Writer needs to fix up any references to point to.

这个是对嵌入式的支持,一些东西可能是在 ROM 中调整过的地址中实现的。这个 atom 没有内容信息,它只是一个地址,所以所有指向它的引用都需要被修正。

3.2、Fast

LLD only reads the information it needs from the input files. Copying section data to the output very late. Alternatively it could copy the full input contents into memory at read time. This would permit arbitrary modification of the section data, but would harm performance.

LLD 只从输入文件中读取它所需要的文件。很晚的时机才会去复制 section 的数据到输出。它可以选择直接复制所有的输入到内存当中,让它可以随意修改,但会对性能影响较大。

LLD tries to keep the amount of Target specific code to a minimum, this has the advantage of making it easy to port a new Target if it happens to look similar to other targets. It can make it more difficult if the target does something radically different though.

LLD 会尽量使特定平台的代码更少量,这会使它更容易去生成新的平台的代码。

The LLVM project has a rich ADT library that contains optimized containers and libraries that have accelerated LLD’s development and give it a performance advantage over default standard library types.

LLVM 有丰富的 ADT(抽象数据类型) 库,包含有很多优化过的容器,所以也可以用来优化 LLD。

As has been seen previously a linker will need to create hundreds of thousands of sections and symbols and in some cases tens of millions of relocations. This will end up with a lot of allocations of the same type, that can’t be released until the output is written (the last thing a linker does). A region based allocator where most allocations are just incrementing a pointer within a larger block can improve performance considerably. LLD has a separate memory region per type so iterating through a type doesn’t jump all over the address space.

一个链接器需要创建成百上千的 section 和符号,还有上万的重定位信息,这些会导致很多重复的对于相同类型的重定位信息,而且这些信息直到输出被写完才会被释放。内存分配于此会十分影响性能。LLD 为每个类型分配了独立的内存空间,所以遍历时不需要遍历所有的地址空间。

且 lld 的设计充分利用了并行的能力

3.3、file model

3.3.1、Object File

An object file is just a container of atoms. When linking an object file, a reader is instantiated which parses the object file and instantiates a set of atoms representing all content in the .o file. The linker adds all those atoms to a master graph.

目标文件就是 atom 的容器。当链接目标文件时,读取控制器会被实例化,然后实例化一系列的 atom 来代表所有在 .o 文件里的内容。链接器会把其所有的 atom 加入到主要的图中。

3.3.2、Static Library (Archive)

This is the traditional unix static archive which is just a collection of object files with a “table of contents”. When linking with a static library, by default nothing is added to the master graph of atoms. Instead, if after merging all atoms from object files into a master graph, if any “undefined” atoms are left remaining in the master graph, the linker reads the table of contents for each static library to see if any have the needed definitions. If so, the set of atoms from the specified object file in the static library is added to the master graph of atoms.

静态库就是一系列目标文件的集合。当链接静态库时,默认情况下不会把任何 atom 加到主要的图中。如果从目标文件中合并所有的 atom 到主要的图时,出现了未定义的 atom,然后链接器才会去读取静态库看是否有需要的定义。如果有,那么从静态库特定的目标文件中把其 atom 加入到主要的图中。

3.3.3、Dynamic Library (Shared Object)

Dynamic libraries are different than object files and static libraries in that they don’t directly add any content. Their purpose is to check at build time that the remaining undefined references can be resolved at runtime, and provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. The way this is modeled in the linker is that a dynamic library contributes no atoms to the initial graph of atoms. Instead, (like static libraries) if there are “undefined” atoms in the master graph of all atoms, then each dynamic library is checked to see if exports the required symbol. If so, a “shared library” atom is instantiated by the by the reader which the linker uses to replace the “undefined” atom.

动态库不太一样,因为它不直接添加任何内容。它的目的是在编译时检查那些未定义的引用是否能在运行时找到,然后提供运行时所需要的一系列的动态库。所以动态库不会生成任何的 atom 到初始的图中。而是如果主要的图中出现了未定义的 atom (与上面静态库情况一致),链接器才会去检查动态库有没有导出对应的符号。如果有,那么动态库 atom 会被实例化,然后用来替换未定义的 atom。

3.4、linking step

The overall steps in linking are:

  1. Command line processing 命令行参数处理
  2. Parsing input files 解析输入文件
  3. Resolving 符号查找
  4. Passes/Optimizations 优化
  5. Generate output file 生成输出文件

The Resolving and Passes steps are done purely on the master graph of atoms, so they have no notion of file formats such as mach-o or ELF.

3.4.1、Resolving

The resolving step takes all the atoms’ graphs from each object file and combines them into one master object graph. Unfortunately, it is not as simple as appending the atom list from each file into one big list. There are many cases where atoms need to be coalesced. That is, two or more atoms need to be coalesced into one atom. This is necessary to support: C language “tentative definitions”, C++ weak symbols for templates and inlines defined in headers, replacing undefined atoms with actual definition atoms, and for merging copies of constants like c-strings and floating point constants.

解析过程会把目标文件里的所有 atom 的图合并到主要的图中。但并非简单地把 atom 列表加到同一个大列表当中。会有很多的情况下 atom 需要被合并(把两个或以上的 atom 合并到一个 atom 当中)。这很重要,像 C 语言的未决议的定义,C++ 头文件中的弱符号模版和内联函数,将这些未定义的 atom 替换为确切的定义好的 atom,然后合并 C 字符串常量和浮点数常量。

The linker support coalescing by-name and by-content. By-name is used for tentative definitions and weak symbols. By-content is used for constant data that can be merged.

链接器支持以名字或内容来合并。以名字用在未决议的定义和弱符号。以内容用在常量数据。

When one atom has a reference (FixUp) to another atom, there is also a binding type: by-name, direct, or indirect. A Fixup contains a tagged union that if the binding type is by-name, the union field is a pointer to a c-string. If the binding type is direct, the union is a pointer to an Atom. If the binding type is indirect, the union is a index into a table of pointers to Atoms. Below is a graphical representation of the binding types:

当一个 atom 引用了另一个 atom,会绑定类型:名字、直接调用、间接调用。如果是名字绑定的话,引用会包含有标签的集合,这个集合是 C 字符串的指针。如果是直接调用,这个集合就是 atom 的指针。如果是间接调用,这个集合就是指向 atom 的指针表上的某个 index。

Input file Atoms contain only direct and by-name references. Direct references are used for atoms defined in the same object file for which the target atom is either unnamed or cannot change. For instance, calling a static function in a translation unit will result in a direct reference to the static functions's atom. Also the FDE (dwarf unwind info) for a function has a direct reference to its function. On the other hand references to global symbols (e.g. call to printf) use by-name binding in object files.

输入文件的 atom 只包含有直接调用和名字引用。直接引用用在同一目标文件中那些目标 atom 是未命名或不能改变的 atom。对于实例而言,在一个编译单元里调用一个静态函数会导致对静态函数的 atom 的直接引用。调试信息 PDE 也会对它对应的函数有直接引用。另外对全局符号(printf)的引用是使用名字绑定的。

The resolving process maintains some global linking “state”, including a “symbol table” which is a map from llvm::StringRef to lld::Atom*. With these data structures, the linker iterates all atoms in all input files. For each atom, it checks if the atom is named and has a global or hidden scope. If so, the atom is added to the symbol table map. If there already is a matching atom in that table, that means the current atom needs to be coalesced with the found atom, or it is a multiple definition error.

解析过程会保存有全局链接状态,包括符号表。链接器遍历所有的输入文件下的 atom,对于每个 atom,它会检查这个 atom 是否被命名了和是否是全局或隐藏的作用域。如果是,则这个 atom 会被加入到符号表中,如果符号表中已经有相关的 atom 了,那么意味着当前的 atom 需要被合并,或者这就是个重复定义的编译错误。

Dead code stripping (if requested) is done at the end of resolving. The linker does a simple mark-and-sweep. It starts with “root” atoms (like “main” in a main executable) and follows each references and marks each Atom that it visits as “live”. When done, all atoms not marked “live” are removed.

无用代码去除会在解析完成之后进行。链接器会进行简单的标记与删除操作。它从根出发(main),然后跟踪所有的引用而标记存活。结束后,没有标记的 atom 会被删除。

3.4.2、Passes

The Passes step is an open ended set of routines that each get a change to modify or enhance the current lld::File object. Some example Passes are: 这个步骤是开放的,让我们有机会去修改或优化输出文件

  • stub (PLT) generation 桩生成
  • GOT instantiation 全局表实例化
  • order_file optimization 二进制重排优化
  • branch island generation 跳转生成
  • branch shim generation
  • Objective-C optimizations (Darwin specific)
  • TLV instantiation (Darwin specific)
  • DTrace probe processing (Darwin specific)
  • compact unwind encoding (Darwin specific)

4、feature

Identical Code Folding

The ICF algorithm looks at sections that are identical, yet have different symbols. It forms equivalence classes, where each equivalence class contains identical functions. At the end of the ICF algorithm a single candidate is chosen and the rest are eliminated. ICF needs to take account of the connections between sections when deciding whether they are equivalent, this can be quite an expensive process when there are many sections. LLD has been able to parallelise parts of the ICF algorithm.

ICF 算法会查找相同的 section,但它们具有不同的符号。它形成等价的类,而每个等价的类中含有相同的函数。最后 ICF 会选择其中一个而删除其他的。ICF 需要通过 section 之间的连边来判断它们是否相等,由于 section 量很多,所以这个操作非常耗时。但 LLD 已经可以并行部分的 ICF 算法了。

5、Darwin Linker

The Darwin linker is based on "Atoms".

There are two atoms: main and an anonymous atom containing the c-string literal "hello world". The Atom "main" has two fixups. One is the call site for the call to printf, and the other is a fixup for the instruction that loads the address of the c-string literal.

5.1、Future Directions

5.1.1、Sections

The current use of sections in mach-o .o files over-constrains the linker. By default, the linker should preserve the section an atom is in. But since all sections must be contiguous in the output, that limits the ability of the linker to order atoms for locality. It would be helpful to enrich the object file with with reason something is in the section it is. For instance, is the section found at runtime? Or was the use of a section just a quick way to group some content together?

现有的 MachO 格式限制 atom 链接器的发挥,因为它需要以 section 的形式存储,且里面的内容要连贯。

5.1.2、Mach-o Object File Format

The messiest part of the linker is the mach-o parser. This is because mach-o is a traditional section and symbols based file format. The parser must infer atom boundaries using two approaches. The first is that some section types have well defined content which the linker can parse into atoms (e.g. eh_frame). The other approach is a naming convention (which the compiler follows) by which the linker breaks sections into atoms at any non-local (not starting with 'L') symbol. The processing the linker has to do parse mach-o .o files is a significant part of the link time.

链接器最凌乱的部分就是 MachO 的 parser 了。因为它需要用两种方式去推断 atom 的范围,第一种一些 section 的类型定义得很好,可以直接找到;另一种是命名惯例,链接器通过非局部符号来判断。

Given that the assembler writes object files once, whereas the linker reads them many times (during development), it would make sense to optimize the object file format to be something the linker can read/parse efficiently.

5.1.3、New Object File Model

LLVM has a nice model for its IR. There are three representations: the binary bit code file, the in-memory object model, and a textual representation. LLVM contains utility possible code for converting between these representations. The same model makes sense for atoms too. There should be three representations for atoms: binary file, in-memory, and textual. The Darwin linker already has an in-memory C++ object model for Atoms. All we need is a textual representation and binary file format.

atom 应该也需要有三种格式:二进制 bitcode、内存目标模型、文本表示。

5.1.4、Debug Info

Around 2005 when Apple switched from using STABS to using DWARF for debug information, we made a design decision to have the linker ignore DWARF in .o files. This improves linking performance because the linker is not copying tons of debug info. Instead, the linker adds "debug notes" into output binary that contain the paths of the original .o files. During development the Darwin debugger will notice the debug notes and the load the dwarf debug information from the original object files. For release builds, a tool named dsymutil is run on the program. It finds the debug notes and then the original object files, then reads, merges and optimizes all the dwarf debug information into one .dSYM file which can be loaded by the debugger if needed.

The current way DWARF is generated is that all debug information for all functions in a translation unit are merged and optimized into sections based on debug info kind. For instance the mapping of instructions to source line numbers for all functions is compressed and put in one section. This does not play well in an Atom based file format. One idea is to have the compiler emit some intermediate representation debug information (one which is partitioned per atom) into the Atom based file format. The linker could then have code to convert that intermediate debug into to final dwarf. This is still an open question.

DWARF 的紧凑格式让以 atom 单位的链接器非常难以处理。

6、Note

  • 怎么使用 lld ? Replace /usr/bin/ld with lld, or Pass -fuse-ld=lld to clang
  • LLD ELF Simplified Control Flow

参考&推荐阅读

Linker Design: 

what makes lld so fast?:

Inside the Linker:

lld: A Fast, Simple and Portable Linker:https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf

LLD from a user’s perspective:https://archive.fosdem.org/2017

标签: 连接器fit系列

锐单商城拥有海量元器件数据手册IC替代型号,打造 电子元器件IC百科大全!

锐单商城 - 一站式电子元器件采购平台