The OpenNET Project / Index page

[ новости /+++ | форум | теги | ]

Поиск:  Каталог документации

10. Technical Details

10.1. How They Work

insmod makes an init_module system call to load the LKM into kernel memory. Loading it is the easy part, though. How does the kernel know to use it? The answer is that the init_module system call invokes the LKM's initialization routine right after it loads the LKM. insmod passes to init_module the address of the subroutine in the LKM named init_module as its initialization routine.

(This is confusing -- every LKM has a subroutine named init_module, and the base kernel has a system call by that same name, which is accessible via a subroutine in the standard C library also named init_module).

The LKM author set up init_module to call a kernel function that registers the subroutines that the LKM contains. For example, a character device driver's init_module subroutine might call the kernel's register_chrdev subroutine, passing the major and minor number of the device it intends to drive and the address of its own "open" routine among the arguments. register_chrdev records in base kernel tables that when the kernel wants to open that particular device, it should call the open routine in our LKM.

But the astute reader will now ask how the LKM's init_module subroutine knew the address of the base kernel's register_chrdev subroutine. This is not a system call, but an ordinary subroutine bound into the base kernel. Calling it means branching to its address. So how does our LKM, which was not compiled anywhere near the base kernel, know that address? The answer to this is insmod relocation. insmod functions as a relocating linker/loader. The LKM object file contains an external reference to the symbol register_chrdev. insmod does a query_module system call to find out the addresses of various symbols that the existing kernel exports. register_chrdev is among these. query_module returns the address for which register_chrdev stands and insmod patches that into the LKM where the LKM refers to register_chrdev.

If you want to see the kind of information insmod can get from a query_module system call, look at the contents of /proc/ksyms.

Note that some LKMs call subroutines in other LKMs. They can do this because of the __ksymtab and .kstrtab sections in the LKM object files. These sections together list the external symbols within the LKM object file that are supposed to be accessible by other LKMs inserted in the future. insmod looks at __ksymtab and .kstrtab and tells the kernel to add those symbols to its exported kernel symbols table.

To see this for yourself, insert the LKM msdos.o and then notice in /proc/ksyms the symbol fat_add_cluster (which is the name of a subroutine in the fat.o LKM). Any subsequently inserted LKM can branch to fat_add_cluster, and in fact msdos.o does just that.

10.2. The .modinfo Section

An ELF object file consists of various named sections. Some of them are basic parts of an object file, for example the .text section contains executable code that a loader loads. But you can make up any section you want and have it used by special programs. For the purposes of Linux LKMs, there is the .modinfo section. An LKM doesn't have to have a section named .modinfo to work, but the macros you're supposed to use to code an LKM cause one to be generated, so they generally do.

To see the sections of an object file, including the .modinfo section if it exists, use the objdump program. For example:

To see all the sections in the object file for the msdos LKM:
objdump msdos.o --section-headers
To see the contents of the .modinfo section:
objdump msdos.o --full-contents --section=.modinfo

You can use the modinfo program to interpret the contents of the .modinfo section.

So what is in the .modinfo section and who uses it? insmod uses the .modinfo section for the following:

10.3. The __ksymtab And .kstrtab Sections

Two other sections you often find in an LKM object file are named __ksymtab and .kstrtab. Together, they list symbols in the LKM that should be accessible (exported) to other parts of the kernel. A symbol is just a text name for an address in the LKM. LKM A's object file can refer to an address in LKM B by name (say, getBinfo"). When you insert LKM A, after having inserted LKM B, insmod can insert into LKM A the actual address within LKM B where the data/subroutine named getBinfo is loaded.

See Section 10.1 for more mind-numbing details of symbol binding.

10.4. Ksymoops Symbols

insmod adds a bunch of exported symbols to the LKM as it loads it. These symbols are all intended to help ksymoops do its job. ksymoops is a program that interprets and "oops" display. And "oops" display is stuff that the Linux kernel displays when it detects an internal kernel error (and consequently terminates a process). This information contains a bunch of addresses in the kernel, in hexadecimal.

ksymoops looks at the hexadecimal addresses, looks them up in the kernel symbol table (which you see in /proc/ksyms, and translates the addresses in the oops message to symbolic addresses, which you might be able to look up in an assembler listing.

So lets say you have an LKM crash on you. The oops message contains the address of the instruction that choked, and what you want ksymoops to tell you is 1) in what LKM is that instruction, and 2) where is the instruction relative to an assembler listing of that LKM? Similar questions arise for the data addresses in the oops message.

To answer those questions, ksymoops must be able to get the loadpoints and lengths of the various sections of the LKM from the kernel symbol table.

Well, insmod knows those addresses, so it just creates symbols for them and includes them in the symbols it loads with the LKM.

In particular, those symbols are named (and you can see this for yourself by looking at /proc/ksyms):


name is the LKM name (as you would see in /proc/modules.

sectionname is the section name, e.g. .text (don't forget the leading period).

length is the length of the section, in decimal.

The value of the symbol is, of course, the address of the section.

Insmod also adds a pretty useful symbol that tells from what file the LKM was loaded. That symbol's name is


name is the LKM name, as above.

filespec is the file specification that was used to identify the file containing the LKM when it was loaded. Note that it isn't necessarily still under that name, and there are multiple file specifications that might have been used to refer to the same file. For example, ../dir1/mylkm.o and /lib/dir1/mylkm.o.

mtime is the modification time of that file, in the standard Unix representation (seconds since 1969), in hexadecimal.

version tells the kernel version level for which the LKM was built (same as in the .modinfo section). It is the value of the macro LINUX_VERSION_CODE in Linux's linux/version.h file. For example, 132101.

The value of this symbol is meaningless.

10.5. Other Symbols

insmod adds another symbol, similar to the ksymoops symbols. This one tells where the persistent data lives in the LKM, which rmmod needs to know in order to save the persistent data.


10.6. Memory Allocation For Loading

This section is about how Linux allocates memory in which to load an LKM. It is not about how an LKM dynamically allocates memory, which is the same as for any other part of the kernel.

The memory where an LKM resides is a little different from that where the base kernel resides. The base kernel is always loaded into one big contiguous area of real memory, whose real addresses are equal to is virtual addresses. That's possible because the base kernel is the first thing ever to get loaded (besides the loader) -- it has a wide open empty space in which to load. And since the Linux kernel is not pageable, it stays in it's homestead forever.

By the time you load an LKM, real memory is all fragmented -- you can't simply add the LKM to the end of the base kernel. But the LKM needs to be in contiguous virtual memory, so Linux uses vmalloc to allocate a contiguous area of virtual memory (in the kernel address space), which is probably not contiguous in real memory. But the memory is still not pageable. The LKM gets loaded into real page frames from the start, and stays in those real page frames until it gets unloaded.

Some CPUs can take advantage of the properties of the base kernel to effect faster access to base kernel memory. For example, on one machine, the entire base kernel is covered by one page table entry and consequently one entry in the translation lookaside buffer (TLB). Naturally, that TLB entry is virtually always present. For LKMs on this machine, there is a page table entry for each memory page into which the LKM is loaded. Much more often, the entry for a page is not in the TLB when the CPU goes to access it, which means a slower access.

This effect is probably trivial.

It is also said that PowerPC Linux does something with its address translation so that transferring between accessing base kernel memory to accessing LKM memory is costly. I don't know anything solid about that.

The base kernel contains within its prized contiguous domain a large expanse of reusable memory -- the kmalloc pool. In Linux 2.4, at the end of 2001, there was a proposal to make the module loader try first to get contiguous memory from that pool into which to load an LKM and only if a large enough space was not available, go to the vmalloc space. That function may be in some versions of Linux.

10.7. Linux internals

If you're interested in the internal workings of the Linux kernel with respect to LKMs, this section can get you started. You should not need to know any of this in order to develop, build, and use LKMs.

The code to handle LKMs is in the source files kernel/module.c in the Linux source tree.

The kernel module loader (see Section 5.3) lives in kernel/kmod.c.

(Ok, that wasn't much of a start, but at least I have a framework here for adding this information in the future).

Inferno Solutions
Hosting by

Закладки на сайте
Проследить за страницей
Created 1996-2024 by Maxim Chirkov
Добавить, Поддержать, Вебмастеру