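I first ran into this problem while debugging resourced: pressing Ctrl+Alt+Del is expected to bring up the dde-lock shutdown screen (one of our key protected paths), but under treeland the system simply rebooted. I later confirmed with 竹子 that this is a known issue, suspected to be some upper-layer DDE package not handling the key correctly.
It turned out not to be that. The problem is a little more complicated than it looks...
OK, let's begin.
1. Why did the system reboot?
The first difficulty with a reboot problem is:
- There is no interactive debugging environment. The system reboots right away and gives us no chance to look at what is happening inside it, so it is hard to know where to start.
But for those of us who have mastered (well, brag about mastering) all sorts of kernel debugging tricks, that is not a problem: we can debug in a virtual machine and put a breakpoint directly on the kernel's reboot path to look at the scene. An implicit premise of moving to a VM is that the problem is unlikely to be hardware related, so the VM and the physical machine behave the same.
Kernel breakpoint on kernel_restart():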
Thread 4 hit Breakpoint 1, kernel_restart (cmd=0x0 <fixed_percpu_data>) at kernel/reboot.c:267
267 {
(gdb) bt
#0 kernel_restart (cmd=0x0 <fixed_percpu_data>) at kernel/reboot.c:267
#1 0xffffffff811644b2 in __do_sys_reboot (magic1=-18751827, magic2=672274793, cmd=19088743, arg=0x0 <fixed_percpu_data>) at kernel/reboot.c:738
#2 0xffffffff8210246a in do_syscall_x64 (nr=<optimized out>, regs=0xffffc90000013f58) at arch/x86/entry/common.c:51
#3 do_syscall_64 (regs=0xffffc90000013f58, nr=<optimized out>) at arch/x86/entry/common.c:81
#4 0xffffffff82200134 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:121
#5 0x0000000000000000 in ?? ()
(gdb) p $lx_current()->comm
$1 = "systemd-shutdow"
(gdb) p $lx_current()->pid
$2 = 1
As we can see, the reboot goes through the regular reboot syscall, and the caller is systemd-shutdown with pid 1. This is the normal systemd reboot flow.
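The three integers in frame #1 are easier to read once decoded. As a small sketch (the helper names `is_valid_reboot_magic` and `reboot_cmd_name` are made up for this illustration; the constants come from `<linux/reboot.h>`), they can be checked like this:

```c
#include <linux/reboot.h>
#include <stdbool.h>

/* gdb prints the magics as signed ints; compare them as unsigned values.
 * Only the primary MAGIC2 is checked here, the kernel also accepts
 * LINUX_REBOOT_MAGIC2A/2B/2C. */
bool is_valid_reboot_magic(int magic1, int magic2)
{
    return (unsigned int)magic1 == LINUX_REBOOT_MAGIC1 &&
           (unsigned int)magic2 == LINUX_REBOOT_MAGIC2;
}

/* Map the `cmd` argument of reboot(2) to a readable name. */
const char *reboot_cmd_name(unsigned int cmd)
{
    switch (cmd) {
    case LINUX_REBOOT_CMD_RESTART:   return "RESTART";
    case LINUX_REBOOT_CMD_HALT:      return "HALT";
    case LINUX_REBOOT_CMD_POWER_OFF: return "POWER_OFF";
    case LINUX_REBOOT_CMD_CAD_ON:    return "CAD_ON";
    case LINUX_REBOOT_CMD_CAD_OFF:   return "CAD_OFF";
    case LINUX_REBOOT_CMD_KEXEC:     return "KEXEC";
    default:                         return "UNKNOWN";
    }
}
```

With the values from the backtrace, magic1=-18751827 is 0xfee1dead, magic2=672274793 is LINUX_REBOOT_MAGIC2, and cmd=19088743 is 0x01234567, i.e. LINUX_REBOOT_CMD_RESTART: an ordinary restart request.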
lx-dmesg confirms that this is the normal flow:
[ 151.537142] systemd[1]: Reached target reboot.target - System Reboot.
[ 151.537402] systemd[1]: Shutting down.
[ 151.595059] systemd[1]: Using hardware watchdog 'iTCO_wdt', version 2, device /dev/watchdog0
[ 151.595129] systemd[1]: Watchdog running with a timeout of 10min.
[ 151.595150] watchdog: watchdog0: watchdog did not stop!
[ 151.624666] systemd-shutdown[1]: Using hardware watchdog 'iTCO_wdt', version 2, device /dev/watchdog0
[ 151.624777] systemd-shutdown[1]: Watchdog running with a timeout of 10min.
[ 151.693069] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 151.861601] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 151.907134] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 151.919800] systemd-shutdown[1]: Unmounting file systems.
[ 151.923045] (sd-remount)[4143]: Remounting '/' read-only with options 'seclabel'.
[ 152.200373] EXT4-fs (sda1): re-mounted d60871a1-761a-4acf-a7f8-ec51b9b7a025 ro. Quota mode: none.
[ 152.271794] systemd-shutdown[1]: All filesystems unmounted.
[ 152.271800] systemd-shutdown[1]: Deactivating swaps.
[ 152.271862] systemd-shutdown[1]: All swaps deactivated.
[ 152.271865] systemd-shutdown[1]: Detaching loop devices.
[ 152.273272] systemd-shutdown[1]: All loop devices detached.
[ 152.273276] systemd-shutdown[1]: Stopping MD devices.
[ 152.273355] systemd-shutdown[1]: All MD devices stopped.
[ 152.273358] systemd-shutdown[1]: Detaching DM devices.
[ 152.273427] systemd-shutdown[1]: All DM devices detached.
[ 152.273431] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 152.274589] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 152.275270] systemd-shutdown[1]: Rebooting.
2. systemd's reboot path
A quick pass over the code around systemd's main loop (most of it removed, only the reboot path kept) gives a rough picture of the path:
int main(int argc, char *argv[]) {
……
r = invoke_main_loop(m,……)
……
/* Try to invoke the shutdown binary unless we already failed.
* If we failed above, we want to freeze after finishing cleanup. */
if (arg_runtime_scope == RUNTIME_SCOPE_SYSTEM &&
IN_SET(r, MANAGER_EXIT, MANAGER_REBOOT, MANAGER_POWEROFF, MANAGER_HALT, MANAGER_KEXEC)) {
r = become_shutdown(r, retval);
log_error_errno(r, "Failed to execute shutdown binary, %s: %m", getpid_cached() == 1 ? "freezing" : "quitting");
error_message = "Failed to execute shutdown binary";
}
……
if (getpid_cached() == 1) {
if (error_message)
manager_status_printf(NULL, STATUS_TYPE_EMERGENCY,
ANSI_HIGHLIGHT_RED "!!!!!!" ANSI_NORMAL,
"%s.", error_message);
freeze_or_exit_or_reboot();
}
……
}
static int become_shutdown(int objective, int retval) {
static const char* const table[_MANAGER_OBJECTIVE_MAX] = {
[MANAGER_EXIT] = "exit",
[MANAGER_REBOOT] = "reboot",
[MANAGER_POWEROFF] = "poweroff",
[MANAGER_HALT] = "halt",
[MANAGER_KEXEC] = "kexec",
};
...
execve(SYSTEMD_SHUTDOWN_BINARY_PATH, (char **) command_line, env_block);
return -errno;
}
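The reboot path:
- When the main loop exits, its return value is checked; if it is one of REBOOT/POWEROFF and so on, become_shutdown() is called.
- That function calls execve directly, so systemd turns into systemd-shutdown while keeping pid 1.
This matches the trace we got in the kernel: at the end, pid 1 switches from systemd to systemd-shutdown, which is dedicated to the shutdown work.
So the question becomes: why does systemd leave its main loop and enter the reboot/shutdown path?
3. Why does systemd enter the reboot path?
The code around systemd's main loop: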
static int invoke_main_loop(Manager *m, ...)
{
……
for (;;) {
int objective = manager_loop(m);
switch (objective) {
……
case MANAGER_REBOOT:
case MANAGER_POWEROFF:
case MANAGER_HALT:
case MANAGER_KEXEC: {
log_notice("Shutting down.");
*ret_retval = m->return_value;
*ret_fds = NULL;
*ret_switch_root_dir = *ret_switch_root_init = NULL;
return objective;
}
}
}
}
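As we can see, the objective still comes out of manager_loop, driven by the various events it handles.
Digging further, we find the following two event paths.
The first is the D-Bus path: systemd exposes a reboot method over D-Bus. This is presumably what the usual systemctl reboot uses.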
static int method_reboot(sd_bus_message *message, void *userdata, sd_bus_error *error) {
Manager *m = ASSERT_PTR(userdata);
int r;
assert(message);
r = mac_selinux_access_check(message, "reboot", error);
if (r < 0)
return r;
if (!MANAGER_IS_SYSTEM(m))
return sd_bus_error_setf(error, SD_BUS_ERROR_NOT_SUPPORTED,
"Reboot is only supported for system managers.");
m->objective = MANAGER_REBOOT;
return sd_bus_reply_method_return(message, NULL);
}
The second is the signal path. It is a bit more involved and can be considered a non-routine path:
static int manager_dispatch_signal_fd() {
n = read(m->signal_fd, &sfsi, sizeof(sfsi));
...
case SIGINT:
if (MANAGER_IS_SYSTEM(m))
manager_handle_ctrl_alt_del(m);
else
manager_start_special(m, SPECIAL_EXIT_TARGET, JOB_REPLACE_IRREVERSIBLY);
break;
……
}
We can see that:
- systemd handles its signals through a signalfd, so plain gdb will not stop on them.
- For SIGINT it enters manager_handle_ctrl_alt_del(), that is, the C-A-D handling.
Its code is:
static void manager_handle_ctrl_alt_del(Manager *m) {
/* If the user presses C-A-D more than
* 7 times within 2s, we reboot/shutdown immediately,
* unless it was disabled in system.conf */
if (ratelimit_below(&m->ctrl_alt_del_ratelimit) || m->cad_burst_action == EMERGENCY_ACTION_NONE)
manager_start_special(m, SPECIAL_CTRL_ALT_DEL_TARGET, JOB_REPLACE_IRREVERSIBLY);
else
emergency_action(m, m->cad_burst_action, EMERGENCY_ACTION_WARN, NULL, -1,
"Ctrl-Alt-Del was pressed more than 7 times within 2s");
}
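This matches our scenario: when Ctrl-Alt-Del is pressed, systemd enters the reboot flow.
We can confirm it simply by attaching gdb to systemd.
systemd receives a lot of signals (especially SIGCHLD, the child-process signal), so we adjust the breakpoint location:
- signo is read from signal_fd rather than passed as a function argument, so the breakpoint has to go inside manager_dispatch_signal_fd(), as a conditional breakpoint that filters on signo.
- With the source available, a file:line breakpoint is recommended, something like: b src/core/manager.c:2882 if sfsi->ssi_signo != 17 (17 is SIGCHLD), placed anywhere after the following line: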
n = read(m->signal_fd, &sfsi, sizeof(sfsi));
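I did not apt source and build the package, so I took a shortcut:
- Break at the start of manager_dispatch_signal_fd(), then step with n while checking info locals; once sfsi has been filled in, the current instruction address is the breakpoint location we want.
- Then: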
b *$rip if sfsi->ssi_signo != 17
Next, trigger the key event. For a VM, if you don't want to keep switching windows, virsh works too:
virsh send-key uos-v25-mutable KEY_LEFTCTRL KEY_LEFTALT KEY_DELETE
Then we can see the signal systemd received through the signalfd:
sfsi = {
ssi_signo = 2,
ssi_errno = 0,
ssi_code = 128,
ssi_pid = 0,
ssi_uid = 0,
ssi_fd = 0,
ssi_tid = 0,
ssi_band = 0,
ssi_overrun = 0,
ssi_trapno = 0,
ssi_status = 0,
ssi_int = 0,
ssi_ptr = 0,
ssi_utime = 0,
ssi_stime = 0,
ssi_addr = 0,
ssi_addr_lsb = 0,
__pad2 = 0,
ssi_syscall = 0,
ssi_call_addr = 0,
ssi_arch = 0,
__pad = '\000' <repeats 27 times>
}
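Two pieces of information are worth noting here:
- ssi_signo is 2, which is SIGINT.
- ssi_pid is 0. This is the sender pid, and pid 0 is the kernel's idle thread (swapper/0). Together with ssi_code = 128 (SI_KERNEL), it means the signal was generated by the kernel.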
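For contrast, here is a minimal sketch of what a user-sent signal looks like through a signalfd. The helper name `read_self_signal` is hypothetical; it blocks a signal, sends it to the current process with kill(), and reads it back. In that case ssi_code is SI_USER and ssi_pid is the sender's pid, unlike the SI_KERNEL / pid 0 pair in the dump above.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block `signo`, send it to ourselves, and read it back through a
 * signalfd. Returns 0 on success, -1 on failure. The signal is left
 * blocked afterwards. */
int read_self_signal(int signo, struct signalfd_siginfo *out)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, signo);
    /* A signal must be blocked to be delivered via signalfd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;

    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd < 0)
        return -1;

    if (kill(getpid(), signo) < 0) {
        close(fd);
        return -1;
    }

    /* The pending signal is consumed by this read(). */
    ssize_t n = read(fd, out, sizeof(*out));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

This is the same mechanism as the read(m->signal_fd, ...) call in manager_dispatch_signal_fd(), only with the process acting as its own sender.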
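There is also a lazy way to check whether this SIGINT is what triggers the reboot:
- Once gdb stops at the breakpoint, return from the function right away so the signal is never handled.
The result is as expected: the SIGINT is the cause.
OK, at this point we have confirmed:
- systemd entered the reboot flow because it received SIGINT.
- The signal came from the kernel.
The next questions are:
- Why does systemd receive SIGINT when C-A-D is pressed? And why only under treeland?
(This is where it gets a little more complicated.)
4. The kernel's SIGINT path
First, we need to get the kernel's signal sending path.
Since the VM kernel debugging environment is already set up, we just catch it with a breakpoint.
In the kernel gdb: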
(gdb) b send_signal_locked if t->pid == 1 && (sig==2 || siginfo->si_signo == 2)
Breakpoint 6 at 0xffffffff8113d840: file kernel/signal.c, line 1227.
(gdb) c
Continuing.
send_signal_locked() is the low-level core function for sending signals in the kernel; every signal goes through it:
int send_signal_locked(int sig, struct kernel_siginfo *info,
struct task_struct *t, enum pid_type type)
{
/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
bool force = false;
if (info == SEND_SIG_NOINFO) {
/* Force if sent from an ancestor pid namespace */
force = !task_pid_nr_ns(current, task_active_pid_ns(t));
} else if (info == SEND_SIG_PRIV) {
/* Don't ignore kernel generated signals */
force = true;
} else if (has_si_pid_and_uid(info)) {
/* SIGKILL and SIGSTOP is special or has ids */
struct user_namespace *t_user_ns;
rcu_read_lock();
t_user_ns = task_cred_xxx(t, user_ns);
if (current_user_ns() != t_user_ns) {
kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
info->si_uid = from_kuid_munged(t_user_ns, uid);
}
rcu_read_unlock();
/* A kernel generated signal? */
force = (info->si_code == SI_KERNEL);
/* From an ancestor pid namespace? */
if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
info->si_pid = 0;
force = true;
}
}
return __send_signal_locked(sig, info, t, type, force);
}
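The code explicitly distinguishes kernel-generated signals. In our case info is SEND_SIG_PRIV (the 0x1 seen in the backtrace below), so force is set to true; __send_signal_locked() then fills the queued siginfo with si_code = SI_KERNEL and si_pid = 0, which are later copied back to user space by read() on the signalfd.
Triggering C-A-D, we catch the following backtrace: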
(gdb) bt
#0 send_signal_locked (sig=2, info=0x1 <fixed_percpu_data+1>, t=0xffff8881003bd280, type=PIDTYPE_TGID) at kernel/signal.c:1227
#1 0xffffffff8113d9dd in do_send_sig_info (sig=sig@entry=2, info=info@entry=0x1 <fixed_percpu_data+1>, p=p@entry=0xffff8881003bd280, type=type@entry=PIDTYPE_TGID) at kernel/signal.c:1311
#2 0xffffffff8113e33f in group_send_sig_info (type=PIDTYPE_TGID, p=0xffff8881003bd280, info=0x1 <fixed_percpu_data+1>, sig=2) at kernel/signal.c:1461
#3 kill_pid_info (pid=0xffff8881002b1800, info=0x1 <fixed_percpu_data+1>, sig=2) at kernel/signal.c:1495
#4 kill_pid (pid=0xffff8881002b1800, sig=2, priv=<optimized out>) at kernel/signal.c:1935
#5 0xffffffff81ac111c in kbd_keycode (hw_raw=<optimized out>, down=1, keycode=<optimized out>) at drivers/tty/vt/keyboard.c:1524
#6 kbd_event (handle=<optimized out>, event_type=<optimized out>, event_code=<optimized out>, value=1) at drivers/tty/vt/keyboard.c:1543
#7 0xffffffff81cf5fea in input_to_handler (handle=handle@entry=0xffff888101018720, vals=vals@entry=0xffff888100f940c0, count=count@entry=3) at drivers/input/input.c:132
#8 0xffffffff81cf743f in input_pass_values (dev=dev@entry=0xffff88810107e000, vals=0xffff888100f940c0, count=3) at drivers/input/input.c:161
#9 0xffffffff81cf7535 in input_pass_values (count=<optimized out>, vals=<optimized out>, dev=0xffff88810107e000) at drivers/input/input.c:150
#10 input_event_dispose (dev=0xffff88810107e000, disposition=<optimized out>, type=0, code=0, value=0) at drivers/input/input.c:378
#11 0xffffffff81cfa1b3 in input_event (value=0, code=0, type=0, dev=0xffff88810107e000) at drivers/input/input.c:435
#12 input_event (dev=dev@entry=0xffff88810107e000, type=type@entry=0, code=code@entry=0, value=value@entry=0) at drivers/input/input.c:427
#13 0xffffffff81d01cfb in input_sync (dev=0xffff88810107e000) at ./include/linux/input.h:450
#14 atkbd_receive_byte (ps2dev=0xffff8881203f5800, data=<optimized out>) at drivers/input/keyboard/atkbd.c:562
#15 0xffffffff81cf5afe in ps2_interrupt (serio=<optimized out>, data=83 'S', flags=<optimized out>) at drivers/input/serio/libps2.c:613
#16 0xffffffff81cf0d27 in serio_interrupt (serio=serio@entry=0xffff8881010d2000, data=data@entry=83 'S', dfl=dfl@entry=0) at drivers/input/serio/serio.c:998
#17 0xffffffff81cf223f in i8042_interrupt (irq=1, dev_id=<optimized out>) at drivers/input/serio/i8042.c:610
#18 0xffffffff811d06a7 in __handle_irq_event_percpu (desc=desc@entry=0xffff888100269200) at kernel/irq/handle.c:158
#19 0xffffffff811d0898 in handle_irq_event_percpu (desc=0xffff888100269200) at kernel/irq/handle.c:193
#20 handle_irq_event (desc=desc@entry=0xffff888100269200) at kernel/irq/handle.c:210
#21 0xffffffff811d5f4b in handle_edge_irq (desc=0xffff888100269200) at kernel/irq/chip.c:831
#22 0xffffffff81041d5b in generic_handle_irq_desc (desc=0x2 <fixed_percpu_data+2>) at ./include/linux/irqdesc.h:161
#23 handle_irq (regs=<optimized out>, desc=0x2 <fixed_percpu_data+2>) at arch/x86/kernel/irq.c:238
#24 __common_interrupt (regs=<optimized out>, vector=38) at arch/x86/kernel/irq.c:257
#25 0xffffffff821054f1 in common_interrupt (regs=0xffffffff83403da8, error_code=<optimized out>) at arch/x86/kernel/irq.c:247
Backtrace stopped: Cannot access memory at address 0xffffc90000004010
(gdb)
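This is a typical keyboard event stack:
- It enters from the hard interrupt, goes through the i8042/serio/ps2 hardware layers into the input subsystem, and is then dispatched to the kbd handler registered by tty/vt.
- Inside the kbd handler, kill_pid is called and sends SIGINT to systemd.
So the signal really is sent by the kernel, straight from the tty kbd driver.
The next question is:
- Why does tty/vt-kbd send SIGINT?
5. The kernel's input/vt/kbd flow
Here we need to read a little code and get a rough idea of the kernel's input/tty/vt/kbd framework...
Leaving out a huge amount of detail, we can roughly understand it like this:
- The VT (virtual terminal) matters, because today's sessions all run on a VT. When we talk about switching tty, it is the VT that gets switched.
- A VT has different modes. Most typically, it works in completely different ways in tty (text) mode and in graphics mode.
A few words on the input subsystem:
- Every hardware event enters the input subsystem through input_event() and becomes an input event triple: type/code/value. For a typical key event, type is 1 (EV_KEY), code is the key code, and value 1/0 means press/release.
- Those are the event producers. The input subsystem registers multiple consumers, that is, different handlers.
- Typically, kbd is one such handler; it mainly serves tty mode.
- Another typical handler is evdev, which is what we see in user space as /dev/input/eventX. It passes input events to user space through device nodes.
- Graphical programs, xkb/libinput for example, read and process events through evdev and do not go through kbd at all.
- evdev also supports concurrent readers that do not affect each other (the implementation is quite interesting), so we can start evtest by hand to capture events. evdev even supports injecting events by writing to it; the mouse simulation I currently use for testing resourced takes this path.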
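To make the evdev side concrete, here is a minimal sketch of how a user-space consumer could recognise the Ctrl+Alt+Del chord from a stream of struct input_event records, such as those read from /dev/input/eventX. The names `cad_state` and `cad_feed` are hypothetical, and tracking left and right modifiers in one flag is a deliberate simplification; real compositors do this through libinput and xkb.

```c
#include <linux/input.h>
#include <stdbool.h>

/* Modifier state tracked across events (left/right keys share a flag). */
struct cad_state {
    bool ctrl;
    bool alt;
};

/* Feed one evdev event. Returns true when Delete is pressed while
 * both Ctrl and Alt are held. */
bool cad_feed(struct cad_state *st, const struct input_event *ev)
{
    if (ev->type != EV_KEY)
        return false;

    /* value: 1 = press, 2 = autorepeat, 0 = release */
    bool down = ev->value != 0;

    switch (ev->code) {
    case KEY_LEFTCTRL:
    case KEY_RIGHTCTRL:
        st->ctrl = down;
        return false;
    case KEY_LEFTALT:
    case KEY_RIGHTALT:
        st->alt = down;
        return false;
    case KEY_DELETE:
        return ev->value == 1 && st->ctrl && st->alt;
    default:
        return false;
    }
}
```

The point is that on this path the chord is only a policy decision in user space; nothing in the kernel reacts to it.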
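OK, the most important point so far is:
- kbd is the VT's default keyboard handler. It serves the tty, and in graphics mode it does basically nothing.
A typical example similar to C-A-D is switching tty with Ctrl-Alt-F6:
- On a tty, vt kbd catches the key combination at the bottom layer and calls set_console() and friends right in the kernel to complete the chvt.
- In a graphical environment (taking Wayland as the example), vt-kbd does not handle it. The keys have to go through the display server, via libinput, get combined into a chvt event in xkb, and then systemd-logind is asked over D-Bus to do the switch. Under Wayland, session and VT management, DRM master permission and so on are all taken over by systemd-logind, which is cleaner than under Xorg.
How does vt kbd switch between the two behaviors?
It provides ioctls that set its kbdmode:
- In tty mode it is initialized to VC_UNICODE.
- When going graphical it is switched to VC_OFF, which turns its processing path off. (The real implementation is of course far more complicated.)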
unsigned char kbdmode:3; /* one 3-bit value */
#define VC_XLATE 0 /* translate keycodes using keymap */
#define VC_MEDIUMRAW 1 /* medium raw (keycode) mode */
#define VC_RAW 2 /* raw (scancode) mode */
#define VC_UNICODE 3 /* Unicode mode */
#define VC_OFF 4 /* disabled mode */
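From user space the mode is read and written with the KDGKBMODE/KDSKBMODE ioctls, using the K_* constants from `<linux/kd.h>` rather than the kernel-internal VC_* values above (the numbering differs for the raw modes, but K_UNICODE is 3 and K_OFF is 4 in both). A minimal sketch, with hypothetical helper names `kbdmode_name` and `get_kbdmode`:

```c
#include <fcntl.h>
#include <linux/kd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Name of a user-space K_* keyboard mode. */
const char *kbdmode_name(int mode)
{
    switch (mode) {
    case K_RAW:       return "K_RAW";
    case K_XLATE:     return "K_XLATE";
    case K_MEDIUMRAW: return "K_MEDIUMRAW";
    case K_UNICODE:   return "K_UNICODE";
    case K_OFF:       return "K_OFF";
    default:          return "unknown";
    }
}

/* Read the keyboard mode of a VT device node such as /dev/tty1.
 * Returns 0 on success, -1 on error (errno is set by open/ioctl). */
int get_kbdmode(const char *tty_path, int *mode)
{
    int fd = open(tty_path, O_RDONLY | O_NOCTTY);
    if (fd < 0)
        return -1;

    int r = ioctl(fd, KDGKBMODE, mode);
    close(fd);
    return r < 0 ? -1 : 0;
}
```

Opening a VT node usually needs root. Run against the VT that hosts the graphical session, this gives the same answer as inspecting kbd_table in gdb, without a kernel debugger.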
OK, next let's look at the code.
6. The kernel's vt-kbd implementation
The entry point of kbd is:
static void kbd_event(struct input_handle *handle, unsigned int event_type,
unsigned int event_code, int value)
{
/* We are called with interrupts disabled, just take the lock */
spin_lock(&kbd_event_lock);
if (event_type == EV_MSC && event_code == MSC_RAW &&
kbd_is_hw_raw(handle->dev))
kbd_rawcode(value);
if (event_type == EV_KEY && event_code <= KEY_MAX)
kbd_keycode(event_code, value, kbd_is_hw_raw(handle->dev));
spin_unlock(&kbd_event_lock);
tasklet_schedule(&keyboard_tasklet);
do_poke_blanked_console = 1;
schedule_console_callback();
}
kbd_keycode is the core function; it handles all key events.
We skip ordinary key handling and focus on the combination events, that is, the fn-related handling:
static void kbd_keycode(unsigned int keycode, int down, bool hw_raw)
{
struct vc_data *vc = vc_cons[fg_console].d;
kbd = &kbd_table[vc->vc_num];
...
param.value = keysym;
rc = atomic_notifier_call_chain(&keyboard_notifier_list,
KBD_KEYSYM, &param);
...
if ((raw_mode || kbd->kbdmode == VC_OFF) && type != KT_SPEC && type != KT_SHIFT)
return;
(*k_handler[type])(vc, keysym & 0xff, !down);
}
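There are several key points in the function above:
- There is a global kbd_table, indexed by vc->vc_num, which yields the kbd associated with a VT. It has at most 63 entries (MAX_NR_CONSOLES).
- There are two notifier calls in the processing flow. The current ste powerkey event is one registered notifier.
- When running in VC_OFF mode (that is, a graphical desktop), only KT_SPEC/KT_SHIFT events are processed and everything else is skipped. That is why ordinary keys like abcd never reach the tty.
- The final handler call goes through a table.
kbd's k_handler is a table indexed by type: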
#define K_HANDLERS\
k_self, k_fn, k_spec, k_pad,\
k_dead, k_cons, k_cur, k_shift,\
k_meta, k_ascii, k_lock, k_lowercase,\
k_slock, k_dead2, k_brl, k_ignore
typedef void (k_handler_fn)(struct vc_data *vc, unsigned char value,
char up_flag);
static k_handler_fn K_HANDLERS;
static k_handler_fn *k_handler[16] = { K_HANDLERS };
We can see that:
- The k_spec we care about is in it, at index 2.
- There is also k_cons(): yes, that is the tty switching entry point in kbd.
static void k_cons(struct vc_data *vc, unsigned char value, char up_flag)
{
if (up_flag)
return;
set_console(value);
}
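k_spec can roughly be seen as the fn keys of the VT: small functions for specific events, such as enter, show, send-intr, hold, SAK...
So it is internally another table that maps the event id to its handler: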
#define FN_HANDLERS\
fn_null, fn_enter, fn_show_ptregs, fn_show_mem,\
fn_show_state, fn_send_intr, fn_lastcons, fn_caps_toggle,\
fn_num, fn_hold, fn_scroll_forw, fn_scroll_back,\
fn_boot_it, fn_caps_on, fn_compose, fn_SAK,\
fn_dec_console, fn_inc_console, fn_spawn_con, fn_bare_num
typedef void (fn_handler_fn)(struct vc_data *vc);
static fn_handler_fn FN_HANDLERS;
static fn_handler_fn *fn_handler[] = { FN_HANDLERS };
static void k_spec(struct vc_data *vc, unsigned char value, char up_flag)
{
if (up_flag)
return;
if (value >= ARRAY_SIZE(fn_handler))
return;
if ((kbd->kbdmode == VC_RAW ||
kbd->kbdmode == VC_MEDIUMRAW ||
kbd->kbdmode == VC_OFF) &&
value != KVAL(K_SAK))
return; /* SAK is allowed even in raw mode */
fn_handler[value](vc);
}
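A few points deserve special attention:
- When kbd->kbdmode is VC_OFF, everything except the SAK event is ignored. In other words, spec events do not fire in VC_OFF.
- Event 12: fn_boot_it
Here is the implementation of fn_boot_it: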
static void fn_boot_it(struct vc_data *vc)
{
ctrl_alt_del();
}
A familiar name:
/*
* This function gets called by ctrl-alt-del - ie the keyboard interrupt.
* As it's called within an interrupt, it may NOT sync: the only choice
* is whether to reboot at once, or just ignore the ctrl-alt-del.
*/
void ctrl_alt_del(void)
{
static DECLARE_WORK(cad_work, deferred_cad);
if (C_A_D)
schedule_work(&cad_work);
else
kill_cad_pid(SIGINT, 1);
}
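We can see that:
- When the C_A_D flag is cleared (it can be toggled with the reboot syscall's CAD_ON/CAD_OFF commands), the kernel does not reboot by itself; it sends SIGINT to process 1 (systemd) instead.
OK, by now we know:
- kbd has several kbdmodes.
- In tty mode, kbd has to handle all sorts of events, such as tty switching, tty-enter, C-A-D and so on.
- In graphics mode, it has to be switched to VC_OFF to turn this handling off. The graphical daemon handles the keys through evdev instead.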
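The gating in k_spec() can be replayed in user space as a small model. This is only a sketch: the VC_* values are copied from the kernel listing earlier in this article, the handler count of 20 is counted from FN_HANDLERS, and `k_spec_dispatches` is a made-up name. K_BOOT, K_SAK and KVAL come from `<linux/keyboard.h>`.

```c
#include <linux/keyboard.h>   /* K_BOOT, K_SAK, KVAL */
#include <stdbool.h>

/* Kernel-internal kbdmode values, copied from the listing above. */
enum { VC_XLATE, VC_MEDIUMRAW, VC_RAW, VC_UNICODE, VC_OFF };

/* Number of entries in FN_HANDLERS (fn_null .. fn_bare_num). */
#define FN_HANDLER_COUNT 20

/* User-space model of k_spec(): would fn_handler[value] be called? */
bool k_spec_dispatches(int kbdmode, unsigned char value, bool up_flag)
{
    if (up_flag)
        return false;
    if (value >= FN_HANDLER_COUNT)
        return false;
    /* In raw, medium-raw and off modes only SAK gets through. */
    if ((kbdmode == VC_RAW || kbdmode == VC_MEDIUMRAW ||
         kbdmode == VC_OFF) &&
        value != KVAL(K_SAK))
        return false;
    return true;
}
```

With KVAL(K_BOOT) (12, fn_boot_it) the model returns true for VC_UNICODE and false for VC_OFF, which is the whole difference between a reboot and a key that the kernel leaves alone.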
7. Checking kbdmode
In the VM gdb, break at the kernel's ctrl_alt_del():
(gdb) bt
#0 ctrl_alt_del () at kernel/reboot.c:800
#1 0xffffffff81ac111c in kbd_keycode (hw_raw=<optimized out>, down=1, keycode=<optimized out>) at drivers/tty/vt/keyboard.c:1524
#2 kbd_event (handle=<optimized out>, event_type=<optimized out>, event_code=<optimized out>, value=1) at drivers/tty/vt/keyboard.c:1543
#3 0xffffffff81cf5fea in input_to_handler (handle=handle@entry=0xffff8881011bcd20, vals=vals@entry=0xffff888120198a20, count=count@entry=3) at drivers/input/input.c:132
#4 0xffffffff81cf743f in input_pass_values (dev=dev@entry=0xffff8881203b7800, vals=0xffff888120198a20, count=3) at drivers/input/input.c:161
#5 0xffffffff81cf7535 in input_pass_values (count=<optimized out>, vals=<optimized out>, dev=0xffff8881203b7800) at drivers/input/input.c:150
#6 input_event_dispose (dev=0xffff8881203b7800, disposition=<optimized out>, type=0, code=0, value=0) at drivers/input/input.c:378
#7 0xffffffff81cfa1b3 in input_event (value=0, code=0, type=0, dev=0xffff8881203b7800) at drivers/input/input.c:435
#8 input_event (dev=dev@entry=0xffff8881203b7800, type=type@entry=0, code=code@entry=0, value=value@entry=0) at drivers/input/input.c:427
#9 0xffffffff81d01cfb in input_sync (dev=0xffff8881203b7800) at ./include/linux/input.h:450
#10 atkbd_receive_byte (ps2dev=0xffff8881203ad800, data=<optimized out>) at drivers/input/keyboard/atkbd.c:562
#11 0xffffffff81cf5afe in ps2_interrupt (serio=<optimized out>, data=83 'S', flags=<optimized out>) at drivers/input/serio/libps2.c:613
#12 0xffffffff81cf0d27 in serio_interrupt (serio=serio@entry=0xffff88810119e800, data=data@entry=83 'S', dfl=dfl@entry=0) at drivers/input/serio/serio.c:998
#13 0xffffffff81cf223f in i8042_interrupt (irq=1, dev_id=<optimized out>) at drivers/input/serio/i8042.c:610
#14 0xffffffff811d06a7 in __handle_irq_event_percpu (desc=desc@entry=0xffff8881000a4a00) at kernel/irq/handle.c:158
#15 0xffffffff811d0898 in handle_irq_event_percpu (desc=0xffff8881000a4a00) at kernel/irq/handle.c:193
#16 handle_irq_event (desc=desc@entry=0xffff8881000a4a00) at kernel/irq/handle.c:210
#17 0xffffffff811d5f4b in handle_edge_irq (desc=0xffff8881000a4a00) at kernel/irq/chip.c:831
#18 0xffffffff81041d5b in generic_handle_irq_desc (desc=0xffff88810027b800) at ./include/linux/irqdesc.h:161
#19 handle_irq (regs=0x26 <fixed_percpu_data+38>, desc=0xffff88810027b800) at arch/x86/kernel/irq.c:238
#20 __common_interrupt (regs=regs@entry=0xffffc90003bb3f58, vector=vector@entry=38) at arch/x86/kernel/irq.c:257
#21 0xffffffff821054b3 in common_interrupt (regs=0xffffc90003bb3f58, error_code=38) at arch/x86/kernel/irq.c:247
#22 0xffffffff822015e6 in asm_common_interrupt () at ./arch/x86/include/asm/idtentry.h:678
Check kbdmode:
(gdb) p vc
$3 = (struct vc_data *) 0xffff888100279000
(gdb) p vc->vc_num
$4 = 0
(gdb) p kbd_table[0]
$5 = {
lockstate = 0 '\000',
slockstate = 0 '\000',
ledmode = 0 '\000',
ledflagstate = 0 '\000',
default_ledflagstate = 0 '\000',
kbdmode = 3 '\003',
modeflags = 20 '\024'
}
We can see that:
- Under treeland, the associated kbd->kbdmode is 3, i.e. VC_UNICODE, not VC_OFF (4).
8. kbdmode under X11
Under X11:
$12 = (struct kbd_struct *) 0xffffffff844aaf00 <kbd_table>
(gdb) p *kbd
$13 = {
lockstate = 0 '\000',
slockstate = 0 '\000',
ledmode = 0 '\000',
ledflagstate = 0 '\000',
default_ledflagstate = 0 '\000',
kbdmode = 4 '\004',
modeflags = 20 '\024'
}
Here kbdmode is 4, which is what it should be.
We trace further into vt_do_kdskbmode, the kernel function that sets kbdmode, entered through ioctl:
static int vt_k_ioctl(struct tty_struct *tty, unsigned int cmd,
unsigned long arg, bool perm)
{
...
switch (cmd) {
...
case KDSKBMODE:
if (!perm)
return -EPERM;
ret = vt_do_kdskbmode(console, arg);
if (ret)
return ret;
tty_ldisc_flush(tty);
break;
...
}
}
/**
* vt_do_kdskbmode - set keyboard mode ioctl
* @console: the console to use
* @arg: the requested mode
*
* Update the keyboard mode bits while holding the correct locks.
* Return 0 for success or an error code.
*/
int vt_do_kdskbmode(unsigned int console, unsigned int arg)
{
struct kbd_struct *kb = &kbd_table[console];
int ret = 0;
unsigned long flags;
spin_lock_irqsave(&kbd_event_lock, flags);
switch(arg) {
case K_RAW:
kb->kbdmode = VC_RAW;
break;
case K_MEDIUMRAW:
kb->kbdmode = VC_MEDIUMRAW;
break;
case K_XLATE:
kb->kbdmode = VC_XLATE;
do_compute_shiftstate();
break;
case K_UNICODE:
kb->kbdmode = VC_UNICODE;
do_compute_shiftstate();
break;
case K_OFF:
kb->kbdmode = VC_OFF;
break;
default:
ret = -EINVAL;
}
spin_unlock_irqrestore(&kbd_event_lock, flags);
return ret;
}
During boot we can catch:
Thread 4 hit Breakpoint 16, vt_do_kdskbmode (console=0, arg=arg@entry=4) at drivers/tty/vt/keyboard.c:1841
1841 {
1: kbd_table = {{
lockstate = 0 '\000',
slockstate = 0 '\000',
ledmode = 0 '\000',
ledflagstate = 0 '\000',
default_ledflagstate = 0 '\000',
kbdmode = 3 '\003',
modeflags = 20 '\024'
} <repeats 63 times>}
2: $lx_current()->comm = "Xorg", '\000' <repeats 11 times>
(gdb) bt
#0 vt_do_kdskbmode (console=0, arg=arg@entry=4) at drivers/tty/vt/keyboard.c:1841
#1 0xffffffff81abc0d1 in vt_k_ioctl (perm=<optimized out>, arg=4, cmd=19269, tty=0xffff8881039b0800) at drivers/tty/vt/vt_ioctl.c:399
#2 vt_ioctl (tty=0xffff8881039b0800, cmd=19269, arg=4) at drivers/tty/vt/vt_ioctl.c:752
#3 0xffffffff81aabeea in tty_ioctl (file=0xffff888110f1e100, cmd=19269, arg=4) at drivers/tty/tty_io.c:2779
#4 0xffffffff814d5eb4 in vfs_ioctl (arg=4, cmd=<optimized out>, filp=0xffff888110f1e100) at fs/ioctl.c:51
#5 __do_sys_ioctl (arg=4, cmd=<optimized out>, fd=<optimized out>) at fs/ioctl.c:871
#6 __se_sys_ioctl (arg=4, cmd=<optimized out>, fd=<optimized out>) at fs/ioctl.c:857
#7 __x64_sys_ioctl (regs=<optimized out>) at fs/ioctl.c:857
#8 0xffffffff8210246a in do_syscall_x64 (nr=<optimized out>, regs=0xffffc90000de3f58) at arch/x86/entry/common.c:51
#9 do_syscall_64 (regs=0xffffc90000de3f58, nr=<optimized out>) at arch/x86/entry/common.c:81
#10 0xffffffff82200134 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:121
#11 0x00005560ca1427b0 in ?? ()
#12 0x0000000000000003 in fixed_percpu_data ()
#13 0x00005560ca131c20 in ?? ()
#14 0x00005560ca142c20 in ?? ()
#15 0x00005560ca142bc4 in ?? ()
#16 0x00005560ca107ba0 in ?? ()
#17 0x0000000000000246 in ?? ()
#18 0x00007f543970b6c8 in ?? ()
#19 0x0000000000000000 in ?? ()
(gdb)
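We can see that:
- Under X11 the setting is done by Xorg, which sets it to 4 (cmd=19269 is 0x4B45, KDSKBMODE; arg=4 is K_OFF).
Looking at the Xorg code, it sets this actively during initialization: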
void
xf86OpenConsole(void)
{
int i, ret;
struct vt_stat vts;
struct vt_mode VT;
const char *vcs[] = { "/dev/vc/%d", "/dev/tty%d", NULL };
if (serverGeneration == 1) {
linux_parse_vt_settings(FALSE);
SYSCALL(ret = ioctl(xf86Info.consoleFd, VT_GETSTATE, &vts));
/*
* now get the VT. This _must_ succeed, or else fail completely.
*/
if (!switch_to(xf86Info.vtno, "xf86OpenConsole"))
FatalError("xf86OpenConsole: Switching VT failed\n");
SYSCALL(ret = ioctl(xf86Info.consoleFd, VT_GETMODE, &VT));
OsSignal(SIGUSR1, xf86VTRequest);
VT.mode = VT_PROCESS;
VT.relsig = SIGUSR1;
VT.acqsig = SIGUSR1;
SYSCALL(ret = ioctl(xf86Info.consoleFd, VT_SETMODE, &VT));
SYSCALL(ret = ioctl(xf86Info.consoleFd, KDSETMODE, KD_GRAPHICS));
tcgetattr(xf86Info.consoleFd, &tty_attr);
SYSCALL(ioctl(xf86Info.consoleFd, KDGKBMODE, &tty_mode));
/* disable kernel special keys and buffering */
SYSCALL(ret = ioctl(xf86Info.consoleFd, KDSKBMODE, K_OFF));
nTty = tty_attr;
nTty.c_iflag = (IGNPAR | IGNBRK) & (~PARMRK) & (~ISTRIP);
nTty.c_oflag = 0;
nTty.c_cflag = CREAD | CS8;
nTty.c_lflag = 0;
nTty.c_cc[VTIME] = 0;
nTty.c_cc[VMIN] = 1;
cfsetispeed(&nTty, 9600);
cfsetospeed(&nTty, 9600);
tcsetattr(xf86Info.consoleFd, TCSANOW, &nTty);
}
else { /* serverGeneration != 1 */
if (!xf86Info.ShareVTs && xf86Info.autoVTSwitch) {
/* now get the VT */
if (!switch_to(xf86Info.vtno, "xf86OpenConsole"))
FatalError("xf86OpenConsole: Switching VT failed\n");
}
}
}
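We can see that:
- This path performs a lot of VT configuration through ioctl, including setting KDSKBMODE to K_OFF.
- Xorg's explicit comment matches our understanding: it disables the kernel tty side's special-key handling and buffering. Buffering refers to the tty input stream, that is, the printk/stdin side we are familiar with.
- Xorg also seems to support an attach mode, with a path that goes straight to switch_to.
- Xorg registers SIGUSR1, which the kernel later uses to notify it of VT state changes: acquire and release.
9. kbdmode under Wayland
Under GNOME/Wayland, the call comes from systemd-logind: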
(gdb)
Thread 1 hit Breakpoint 2, vt_do_kdskbmode (console=1, arg=arg@entry=4) at drivers/tty/vt/keyboard.c:1841
1841 {
1: $lx_current()->comm = "systemd-logind\000"
(gdb) c
Continuing.
systemd-logind exposes the related interface over D-Bus:
static int method_take_control(sd_bus_message *message, void *userdata, sd_bus_error *error) {
...
r = session_set_controller(s, sd_bus_message_get_sender(message), force, true);
return sd_bus_reply_method_return(message, NULL);
}
int session_set_controller(Session *s, const char *sender, bool force, bool prepare) {
...
/* When setting a session controller, we forcibly mute the VT and set
* it into graphics-mode. Applications can override that by changing
* VT state after calling TakeControl(). However, this serves as a good
* default and well-behaving controllers can now ignore VTs entirely.
* Note that we reset the VT on ReleaseControl() and if the controller
* exits.
* If logind crashes/restarts, we restore the controller during restart
* (without preparing the VT since the controller has probably overridden
* VT state by now) or reset the VT in case it crashed/exited, too. */
if (prepare) {
r = session_prepare_vt(s);
}
session_release_controller(s, true);
s->controller = TAKE_PTR(name);
session_save(s);
return 0;
}
static int session_prepare_vt(Session *s) {
...
vt = session_open_vt(s, /* reopen = */ false);
r = fchown(vt, s->user->user_record->uid, -1);
r = ioctl(vt, KDSKBMODE, K_OFF);
r = ioctl(vt, KDSETMODE, KD_GRAPHICS);
/* Oh, thanks to the VT layer, VT_AUTO does not work with KD_GRAPHICS.
* So we need a dummy handler here which just acknowledges *all* VT
* switch requests. */
mode.mode = VT_PROCESS;
mode.relsig = SIGRTMIN;
mode.acqsig = SIGRTMIN + 1;
r = ioctl(vt, VT_SETMODE, &mode);
return 0;
}
We can see that:
- logind exposes session/VT operations over D-Bus, especially those related to taking control.
- Likewise, it sets K_OFF, KD_GRAPHICS and so on...
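Stripped of error logging, the ioctl sequence in session_prepare_vt() can be sketched as a standalone helper. This is an illustration under assumptions, not logind's actual code: `vt_prepare_for_graphics` is a hypothetical name, the fd is assumed to be an already-open VT, and a real session controller must also restore the previous keyboard mode and KD_TEXT when it releases the VT, otherwise the console keyboard stays disabled.

```c
#include <errno.h>
#include <linux/kd.h>
#include <linux/vt.h>
#include <signal.h>
#include <sys/ioctl.h>

/* Put an already-open VT fd into "graphical session" state.
 * Returns 0 on success or a negative errno value. */
int vt_prepare_for_graphics(int vt_fd)
{
    struct vt_mode mode = {
        .mode = VT_PROCESS,        /* VT switches are negotiated */
        .relsig = SIGRTMIN,        /* signal sent on release request */
        .acqsig = SIGRTMIN + 1,    /* signal sent on acquire */
    };

    /* Turn off the kernel keyboard handler, including C-A-D. */
    if (ioctl(vt_fd, KDSKBMODE, K_OFF) < 0)
        return -errno;
    /* Stop the kernel from drawing the text console. */
    if (ioctl(vt_fd, KDSETMODE, KD_GRAPHICS) < 0)
        return -errno;
    if (ioctl(vt_fd, VT_SETMODE, &mode) < 0)
        return -errno;
    return 0;
}
```

The first ioctl is the one that is missing in the treeland case: without it, kbd stays in VC_UNICODE and still runs fn_boot_it.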
One more operation shows up when we switch tty:
$ virsh send-key debian12 KEY_LEFTCTRL KEY_LEFTALT KEY_F3
We can see that agetty sets the mode:
Thread 1 hit Breakpoint 2, vt_do_kdskbmode (console=2, arg=arg@entry=3) at drivers/tty/vt/keyboard.c:1841
1841 {
1: $lx_current()->comm = "(agetty)\000\000\000\000\000\000\000"
(gdb) c
Continuing.
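agetty is roughly the first program that runs by default after a tty is created; it prints "login:" and lets us type the user name. When we press Enter, control actually passes from agetty to the login program that asks for the password...
10. Verification and next steps
We are basically done here, but we can still verify the conclusion:
After logging in to treeland, directly modify the kbdmode of its associated kbd: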
Thread 1 hit Breakpoint 8, kbd_keycode (hw_raw=<optimized out>, down=1, keycode=36) at drivers/tty/vt/keyboard.c:1408
1408 tty = vc->port.tty;
(gdb) p vc
$12 = (struct vc_data *) 0xffff88810115b400
(gdb) p vc->vc_num
$13 = 1
(gdb) set kbd_table[1].kbdmode = 4
(gdb) c
Continuing.
Trigger again:
virsh send-key uos-v25-mutable KEY_LEFTCTRL KEY_LEFTALT KEY_DELETE
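Result: no reboot, and no dde-lock either. OK, as expected. dde-lock probably does not show up because no service in the DDE environment is listening for this event.
Next:
- treeland fix: ideally it should follow what Wayland compositors do and manage the session through the systemd interfaces, so that the VT is configured properly.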