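While reading up on this recently, I found several sources claiming that poll has no limit on the number of file descriptors (fds) because it is implemented on top of a linked list.
But when I read UNIX Network Programming, Volume 1, it describes the fd set passed to poll as an array of structures.
So some say linked list and others say array, and the two don't agree. I quickly opened the manual pages on my own machine to check.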
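First I ran man 2 select to look at select.
select takes fd_set arguments, and fd_set is a fixed-size structure whose capacity is capped by FD_SETSIZE (the 1024 that references usually quote).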
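Then I ran man 2 poll to look at poll.
The description states plainly that fds is an array of struct pollfd (The set of file descriptors to be monitored is specified in the fds argument, which is an array of structures of the following form). So the description in UNIX Network Programming, Volume 1 looks correct. But then why do so many sources say poll's fd set is based on a linked list? Could they all be wrong?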
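I hurried off to ask a guru.
Me: Guru, guru, there's something about Linux I don't understand.
Guru: Ah, go read the source code.
Me: What if I can't understand it?
Guru: If you never read it, you never will.
Me: Hmm, that actually makes a lot of sense.
So, with no other choice, I braced myself and dug into the Linux kernel source (version linux-5.6.12). Searching for sys_poll led me to /fs/select.c (the old sys_xxx functions have all become SYSCALL_DEFINE macro definitions, which took me a while to find). The code is as follows: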
SYSCALL_DEFINE3(poll, struct pollfd __user *, ufds, unsigned int, nfds,
        int, timeout_msecs)
{
    struct timespec64 end_time, *to = NULL;
    int ret;

    if (timeout_msecs >= 0) {
        to = &end_time;
        poll_select_set_timeout(to, timeout_msecs / MSEC_PER_SEC,
            NSEC_PER_MSEC * (timeout_msecs % MSEC_PER_SEC));
    }

    ret = do_sys_poll(ufds, nfds, to);

    if (ret == -ERESTARTNOHAND) {
        struct restart_block *restart_block;

        restart_block = &current->restart_block;
        restart_block->fn = do_restart_poll;
        restart_block->poll.ufds = ufds;
        restart_block->poll.nfds = nfds;

        if (timeout_msecs >= 0) {
            restart_block->poll.tv_sec = end_time.tv_sec;
            restart_block->poll.tv_nsec = end_time.tv_nsec;
            restart_block->poll.has_timeout = 1;
        } else
            restart_block->poll.has_timeout = 0;

        ret = -ERESTART_RESTARTBLOCK;
    }
    return ret;
}
The line that does the actual polling is ret = do_sys_poll(ufds, nfds, to), so I jumped into do_sys_poll:
static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
        struct timespec64 *end_time)
{
    struct poll_wqueues table;
    int err = -EFAULT, fdcount, len;
    /* Allocate small arguments on the stack to save memory and be
       faster - use long to make sure the buffer is aligned properly
       on 64 bit archs to avoid unaligned access */
    long stack_pps[POLL_STACK_ALLOC/sizeof(long)];
    struct poll_list *const head = (struct poll_list *)stack_pps;
    struct poll_list *walk = head;
    unsigned long todo = nfds;

    if (nfds > rlimit(RLIMIT_NOFILE))
        return -EINVAL;

    len = min_t(unsigned int, nfds, N_STACK_PPS);
    for (;;) {
        walk->next = NULL;
        walk->len = len;
        if (!len)
            break;

        if (copy_from_user(walk->entries, ufds + nfds-todo,
                    sizeof(struct pollfd) * walk->len))
            goto out_fds;

        todo -= walk->len;
        if (!todo)
            break;

        len = min(todo, POLLFD_PER_PAGE);
        walk = walk->next = kmalloc(struct_size(walk, entries, len),
                        GFP_KERNEL);
        if (!walk) {
            err = -ENOMEM;
            goto out_fds;
        }
    }

    poll_initwait(&table);
    fdcount = do_poll(head, &table, end_time);
    poll_freewait(&table);

    for (walk = head; walk; walk = walk->next) {
        struct pollfd *fds = walk->entries;
        int j;

        for (j = 0; j < walk->len; j++, ufds++)
            if (__put_user(fds[j].revents, &ufds->revents))
                goto out_fds;
    }

    err = fdcount;
out_fds:
    walk = head->next;
    while (walk) {
        struct poll_list *pos = walk;
        walk = walk->next;
        kfree(pos);
    }

    return err;
}
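I was stunned: there really is a linked list of struct poll_list nodes here, with head as its first node and walk as the cursor that moves along it. The list lives in kernel space. The first node sits in the on-stack buffer stack_pps, and further nodes are allocated with kmalloc as needed. Each node carries a next pointer, a len, and an entries array, so strictly speaking it is a linked list of small pollfd arrays. The pollfd entries we pass in from user space are copied into those nodes chunk by chunk with copy_from_user, the total number of descriptors to handle (nfds) is supplied by the caller, and what do_poll finally traverses is this kernel-space list starting at head.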
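So the book was right and the other sources were right too; the problem was my own understanding. I was on level one, thought I could see level five, and had really only seen level three, haha. In user space poll's fd set is indeed an array, but once it reaches kernel space it is neatly turned into a linked list. Allocating an array of pollfd structures and telling the kernel how many elements it has becomes the caller's responsibility, so the kernel no longer needs a fixed-size data type like fd_set. The count is not completely unbounded, though: the check at the top of do_sys_poll returns -EINVAL when nfds exceeds rlimit(RLIMIT_NOFILE).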
The kernel designers really are impressive.
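4 comments
So that's how it is. I was wondering why what I read seemed contradictory.
So can we understand it as a conversion that happens in kernel space?
Nice one, I also thought the sources were wrong.
Hahaha, I noticed this problem too. What a coincidence.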