Category Archives: Linux Kernel Data Structures

Linux Kernel Exploration ToolKit – Part 2 – Queue (FIFO)

A common programming pattern in any operating system kernel is producer and consumer. In this pattern, a producer creates data—say, networking packets to be processed  or  error messages to be read —while a consumer, in turn, reads, processes, or otherwise consumes the data. Often the easiest way to implement this pattern is with a queue.The producer pushes data onto the queue and the consumer pulls data off the queue.The consumer retrieves the data in the order it was enqueued.That is, the first data on the queue is the first data off the queue. For this reason, queues are also called FIFOs, short for first-in, first-out. See the below figure for an example of a standard queue.

The Linux kernel’s generic queue implementation is called kfifo and is implemented in <kernel/kfifo.c> and declared in <linux/kfifo.h>.

kfifo

Linux’s kfifo works like most other queue abstractions, providing two primary operations:enqueue (in) and dequeue (out).The kfifo object maintains two offsets into the queue: an in offset and an out offset.The in offset is the location in the queue to which the next enqueue will occur.The out offset is the location in the queue from which the next dequeue will occur.The out offset is always less than or equal to the in offset. It wouldn’t make sense for it to be greater; otherwise, you could dequeue data that had not yet been enqueued

The enqueue (in) operation copies data into the queue, starting at the in offset.When complete, the in offset is incremented by the amount of data enqueued.The dequeue (out) operation copies data out of the queue, starting from the out offset.When complete, the out offset is incremented by the amount of data dequeued.When the out offset is equal to the in offset, the queue is empty: No more data can be dequeued until more data is enqueued.When the in offset is equal to the length of the queue, no more data can be enqueued until the queue is reset.

kfifo declaration

  58struct __kfifo {
  59        unsigned int    in;
  60        unsigned int    out;
  61        unsigned int    mask;
  62        unsigned int    esize;
  63        void            *data;
  64};

Creating a Queue

To use a kfifo, you must first define and initialize it. as with most kernel objects, you can do this dynamically or statically.The most common method is dynamic.

 319/**
 320 * kfifo_alloc - dynamically allocates a new fifo buffer
 321 * @fifo: pointer to the fifo
 322 * @size: the number of elements in the fifo, this must be a power of 2
 323 * @gfp_mask: get_free_pages mask, passed to kmalloc()
 324 *
 325 * This macro dynamically allocates a new fifo buffer.
 326 *
 327 * The numer of elements will be rounded-up to a power of 2.
 328 * The fifo will be release with kfifo_free().
 329 * Return 0 if no error, otherwise an error code.
 330 */
 331#define kfifo_alloc(fifo, size, gfp_mask) \
 332__kfifo_int_must_check_helper( \
 333({ \
 334        typeof((fifo) + 1) __tmp = (fifo); \
 335        struct __kfifo *__kfifo = &__tmp->kfifo; \
 336        __is_kfifo_ptr(__tmp) ? \
 337        __kfifo_alloc(__kfifo, size, sizeof(*__tmp->type), gfp_mask) : \
 338        -EINVAL; \
 339}) \
 340)

What is gfp_mask ?

The low-level kernel memory allocation functions take a set of flags describing how that allocation is to be performed. Among other things, these GFP_ (“get free page”) flags control whether the allocation process can sleep and wait for memory, whether high memory can be used, and so on.  See this article for the full set.

If you want to allocate the buffer yourself, you can:

 354/**
 355 * kfifo_init - initialize a fifo using a preallocated buffer
 356 * @fifo: the fifo to assign the buffer
 357 * @buffer: the preallocated buffer to be used
 358 * @size: the size of the internal buffer, this have to be a power of 2
 359 *
 360 * This macro initialize a fifo using a preallocated buffer.
 361 *
 362 * The numer of elements will be rounded-up to a power of 2.
 363 * Return 0 if no error, otherwise an error code.
 364 */
 365#define kfifo_init(fifo, buffer, size) \
 366({ \
 367        typeof((fifo) + 1) __tmp = (fifo); \
 368        struct __kfifo *__kfifo = &__tmp->kfifo; \
 369        __is_kfifo_ptr(__tmp) ? \
 370        __kfifo_init(__kfifo, buffer, size, sizeof(*__tmp->type)) : \
 371        -EINVAL; \
 372})

Statically declaring a kfifo is simpler, but less common:

 124/**
 125 * DECLARE_KFIFO - macro to declare a fifo object
 126 * @fifo: name of the declared fifo
 127 * @type: type of the fifo elements
 128 * @size: the number of elements in the fifo, this must be a power of 2
 129 */
 130#define DECLARE_KFIFO(fifo, type, size) STRUCT_KFIFO(type, size) fifo
 131
 132/**
 133 * INIT_KFIFO - Initialize a fifo declared by DECLARE_KFIFO
 134 * @fifo: name of the declared fifo datatype
 135 */
 136#define INIT_KFIFO(fifo) \
 137(void)({ \
 138        typeof(&(fifo)) __tmp = &(fifo); \
 139        struct __kfifo *__kfifo = &__tmp->kfifo; \
 140        __kfifo->in = 0; \
 141        __kfifo->out = 0; \
 142        __kfifo->mask = __is_kfifo_ptr(__tmp) ? 0 : ARRAY_SIZE(__tmp->buf) - 1;\
 143        __kfifo->esize = sizeof(*__tmp->buf); \
 144        __kfifo->data = __is_kfifo_ptr(__tmp) ?  NULL : __tmp->buf; \
 145})

This creates a static kfifo named name with a queue of size bytes.As before, size must be a power of 2.

There are various functions  on  kfifo . please see here .

Mostly used functions are

Enqueuing Data

When your kfifo is created and initialized, enqueuing data into the queue is performed via the kfifo_in() function:

unsigned int kfifo_in(struct kfifo *fifo, const void *from, unsigned int len);

This function copies the len bytes starting at from into the queue represented by fifo. On success it returns the number of bytes enqueued. If less than len bytes are free in the queue, the function copies only up to the amount of available bytes.Thus the return value can be less than len or even zero, if nothing was copied.

Dequeuing Data

When you add data to a queue with kfifo_in(), you can remove it with kfifo_out():

unsigned int kfifo_out(struct kfifo *fifo, void *to, unsigned int len);

This function copies at most len bytes from the queue pointed at by fifo to the buffer pointed at by to. On success the function returns the number of bytes copied. If less than len bytes are in the queue, the function copies less than requested. When dequeued, data is no longer accessible from the queue.This is the normal usage of a queue, but if you want to “peek” at data within the queue without removing it, you can use kfifo_out_peek():

unsigned int kfifo_out_peek(struct kfifo *fifo, void *to, unsigned int len,unsigned offset);

This works the same as kfifo_out(), except that the out offset is not incremented,and thus the dequeued data is available to read on a subsequent call to kfifo_out().The parameter offset specifies an index into the queue; specify zero to read from the head of the queue, as kfifo_out() does.

Obtaining the Size of a Queue

To obtain the total size in bytes of the buffer used to store a kfifo’s queue, call kfifo_size():

static inline unsigned int kfifo_size(struct kfifo *fifo);

In another example of horrible kernel naming, use kfifo_len() to obtain the number of bytes enqueued in a kfifo:
static inline unsigned int kfifo_len(struct kfifo *fifo);
To find out the number of bytes available to write into a kfifo, call kfifo_avail():
static inline unsigned int kfifo_avail(struct kfifo *fifo);
Finally, kfifo_is_empty() and kfifo_is_full() return nonzero if the given kfifo is
empty or full, respectively, and zero if not:
static inline int kfifo_is_empty(struct kfifo *fifo);
static inline int kfifo_is_full(struct kfifo *fifo);

Resetting and Destroying the Queue

To reset a kfifo, jettisoning all the contents of the queue, call kfifo_reset():
static inline void kfifo_reset(struct kfifo *fifo);
To destroy a kfifo allocated with kfifo_alloc(), call kfifo_free():
void kfifo_free(struct kfifo *fifo);
If you created your kfifo with kfifo_init(), it is your responsibility to free the associated
buffer. How you do so depends on how you created it.

Example Queue Usage

With these interfaces under our belt, let’s take a look at a simple example of using a kfifo.
Assume we created a kfifo pointed at by fifo with a queue size of 8KB.We can now
enqueue data onto the queue. In this example, we enqueue simple integers. In your own
code, you will likely enqueue more complicated, task-specific structures. Using integers in
this example, let’s see exactly how the kfifo works:

unsigned int i;
/* enqueue [0, 32) to the kfifo named ‘fifo’ */
for (i = 0; i < 32; i++)
kfifo_in(fifo, &i; sizeof(i));
The kfifo named fifo now contains 0 through 31, inclusive.We can take a peek at the
first item in the queue and verify it is 0:
unsigned int val;
int ret;
ret = kfifo_out_peek(fifo, &val, sizeof(val), 0);
if (ret != sizeof(val))
return -EINVAL;

printk(KERN_INFO “%u\n”, val); /* should print 0 */
To dequeue and print all the items in the kfifo, we can use kfifo_out():
/* while there is data in the queue … */
while (kfifo_avail(fifo)) {
unsigned int val;
int ret;
/* … read it, one integer at a time */
ret = kfifo_out(fifo, &val, sizeof(val));
if (ret != sizeof(val))
return -EINVAL;
printk(KERN_INFO “%u\n”, val);
}
This prints 0 through 31, inclusive, and in that order. (If this code snippet printed the
numbers backward, from 31 to 0, we would have a stack, not a queue.)

References

1) Linux.Kernel.Development.3rd.Edition

/Suresh

 

Linux Kernel Exploration ToolKit – Part 1 – Linked lists

I thought to write about Linux kernel architecture  and various components in it. But before digging into that let me introduce you several built-in “code entities”  in the Linux kernel .Though it  is written in “C” it does not use libc,  it got its own entities . As with any large software project, the Linux kernel provides these “code entities”  to encourage code reuse. Kernel developers should use these data structures whenever possible and not “roll our own” solutions.

In the following sections, we cover the most useful of these generic data structures and other kernel code components

Linked List:

As you know in “C” , the size of the array is static and inserting a new element in the array ((not at the end :) ) is potentially expensive . Linked lists have their own strengths and weaknesses, but they happen to be strong where arrays are weak.

An array allocates memory for all its elements lumped together as one block of memory. In contrast, a linked list allocates space for each element separately in its own block of memory called a “linked list element” or “node”. The list gets is overall structure by using  pointers to connect all its nodes together like the links in a chain.The first element is often represented by a special pointer—called the head.

Types of Linked Lists:

Singly Linked List:

Each node contains two fields: a “data” field to store whatever element type the list holds for its client, and a “next” field which is a pointer used to link one node to the next node.

/* node in a  singly linked list */

struct  node {

void *data;

struct  node *next; /* pointer to the next element */

};

Doubly linked list

In some linked lists, each element also contains a pointer to the previous element.These lists are called doubly linked lists because they are linked both forward and backward.

/* a node in a doubly linked list */

struct node {

void *data; /* the payload */

struct list_element *next; /* pointer to the next element */

struct list_element *prev; /* pointer to the previous element */

};

Circular Linked Lists:

Normally, because the last element in a linked list has no next element, it is set to point to a special value, such as NULL, to indicate it is the last element in the list. In some linked lists, the last element does not point to a special value. Instead, it points back to the first value.This linked list is called a circular linked list because the list is cyclic. Circular linked  lists can come in both doubly and singly linked versions. In a circular doubly linked list, the first node’s “previous” pointer points at the last node.

Circular singly linked list:

Circular doubly linked list:

Linux Kernel Linked List implementation:

The Linux kernel’s linked list implementation is unique, it is fundamentally a circular doubly linked list. Using this type of linked list provides the greatest flexibility.

For example , we have a structure like as shown below

struct todo_tasks{

char *task_name;

unsigned int name_len;

short int status;

};

The common pattern for storing this structure in a linked list is to embed the list pointer in the structure as shown below

struct todo_tasks{

char *task_name;

unsigned int name_len;

short int status;

struct  todo_tasks  *next,*prev;

};

The Linux kernel approach is different. Instead of turning the structure into a linked list, the Linux approach is to embed a linked list node in the structure!

The kernel list code is declared in <linux/types.h> , These are implemented by a single include file <linux/list.h> . There is no separate “.c” file with any library of support routines. All of the code for handling linked lists is simple enough to be implemented using inline functions. Thus it is very easy to use this implementation in any other (GPLv2-licensed) project.

The list_head structure is shown below

struct list_head {

struct list_head *next,*prev;

};

The next pointer points to the next list node, and the prev pointer points to the previous

list node.Yet, seemingly, this is not particularly useful.

struct todo_tasks{

char *task_name;

unsigned int name_len;

short int status;

struct list_head  tasks_list

};

With this, tasks_list.next in todo_tasks points to the next element, and taks_list.prev in todo_tasks points to the previous. Now this is becoming useful.

The kernel provides a family of routines to manipulate linked lists. For example, the list_add() method adds a new node to an existing linked list.These methods , are generic and they accept only list_head structures.

So how to find the  address of the parent structure containing  the  kernel lists.?

Kernel uses the below code to do that.

 344/**
 345 * list_entry - get the struct for this entry
 346 * @ptr:        the &struct list_head pointer.
 347 * @type:       the type of the struct this is embedded in.
 348 * @member:     the name of the list_struct within the struct.
 349 */
 350#define list_entry(ptr, type, member) \
 351        container_of(ptr, type, member)

 684/**
 685 * container_of - cast a member of a structure out to the containing structure
 686 * @ptr:        the pointer to the member.
 687 * @type:       the type of the container struct this is embedded in.
 688 * @member:     the name of the member within the struct.
 689 *
 690 */
 691#define container_of(ptr, type, member) ({                      \
 692        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
 693        (type *)( (char *)__mptr - offsetof(type,member) );})

  20#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

Don’t get confused with the above the code snippets it is quite simple and a well-known technique. Given a pointer to struct list_head in a data structure, macro list_entry() simply computes the pointer of the data structure. To achieve this we need to figure out where in the data structure the list_head is (offset of list_head). Then, simply deduct the list_head’s offset from the actual pointer passed to the macro.

Now the question is how can we compute the offset of an element in a structure? Suppose in the above  data structure structtodo_tasks   to find the offset of element tasks_list in it, this is how you do it:

(unsigned long)(&((struct todo_tasks *)0)->tasks_list)

Take memory address 0, and cast it to whatever the type of structure you have– in this case struct todo_tasks. Then, take the address of the member you’re interested in. This gives the offset of the member within the structure. Since we already know the absolute memory address of this element for a particular instance of the structure (for example in stuct todo_tasks *task1, by way of task1->tasks.list) deducting this offset point us to the address of the structure itself. That’s all there is to it.

To get a better handle on this I suggest you play around with the below code.

#include <stdio.h>

#include <stdlib.h>

struct foobar{

unsigned int foo;

char bar;

char boo;

};

int main(int argc, char** argv){

struct foobar tmp;

printf(“address of &tmp is= %p\n\n”, &tmp);

printf(“address of tmp->foo= %p \t offset of tmp->foo= %lu\n”, &tmp.foo, (unsigned long) &((struct foobar *)0)->foo);

printf(“address of tmp->bar= %p \t offset of tmp->bar= %lu\n”, &tmp.bar, (unsigned long) &((struct foobar *)0)->bar);

printf(“address of tmp->boo= %p \t offset of tmp->boo= %lu\n\n”, &tmp.boo, (unsigned long) &((struct foobar *)0)->boo);

printf(“computed address of &tmp using:\n”);

printf(“\taddress and offset of tmp->foo= %p\n”,

(struct foobar *) (((char *) &tmp.foo) – ((unsigned long) &((struct foobar *)0)->foo)));

printf(“\taddress and offset of tmp->bar= %p\n”,

(struct foobar *) (((char *) &tmp.bar) – ((unsigned long) &((struct foobar *)0)->bar)));

printf(“\taddress and offset of tmp->boo= %p\n”,

(struct foobar *) (((char *) &tmp.boo) – ((unsigned long) &((struct foobar *)0)->boo)));

return 0;

}

Output from this code is:

address of &tmp is= 0xbff43dfc

address of tmp->foo= 0xbff43dfc          offset of tmp->foo= 0

address of tmp->bar= 0xbff43e00          offset of tmp->bar= 4

address of tmp->boo= 0xbff43e01          offset of tmp->boo= 5

computed address of &tmp using:

address and offset of tmp->foo= 0xbff43dfc

address and offset of tmp->bar= 0xbff43dfc

address and offset of tmp->boo= 0xbff43dfc

Couple of things to note here about kernel lists.

  1. You can put struct list_head anywhere in your structure.
  2. You can name struct list_head variable anything you wish.
  3. You can have multiple lists!

So for example, the declaration below is also a valid one:

struct todo_tasks{

char *task_name;

unsigned int name_len;

short int status;

int sub_tasks;

int subtasks_completed;

struct list_head completed_subtasks;/* list structure */

int subtasks_waiting;

struct list_head waiting_subtasks;    /* another list of same or different items! */

struct list_head todo_list;                   /* list of todo_tasks */

};

Here are some examples of such lists from the kernel:

http://lxr.linux.no/linux+v3.6/include/linux/fs.h#L692

http://lxr.linux.no/linux+v3.6/include/linux/fs.h#L782

 Manipulating Linked Lists:

The kernel provides a family of functions to manipulate linked lists.They all take pointers to one or more list_head structures.The functions are implemented as inline functions in generic C and can be found in <linux/list.h>.

Interestingly, all these functions are O(1).1 This means they execute in constant time, regardless of the size of the list or any other inputs. For example, it takes the same  amount of time to add or remove an entry to or from a list whether that list has 3 or 3,000 entries.

For example To add a node to a linked list:

list_add(struct list_head *new, struct list_head *head)

This function adds the new node to the given list immediately after the head node.Because the list is circular and generally has no concept of first or last nodes, you can pass any element for head. If you do pass the “last” element, however, this function can be used to implement a stack.

To add a node to the end of a linked list:

list_add_tail(struct list_head *new, struct list_head *head)

This function adds the new node to the given list immediately before the head node. As with list_add(), because the lists are circular, you can generally pass any element for head. This function can be used to implement a queue, however, if you pass the “first” element. 

I think the best way to get familiar with the list functions is to simply scan this file   and this kernel.org doc . There are some examples of adding,deleting,traversing through kernel lists here .

References

1) Linux.Kernel.Development.3rd.Edition

2)  Linux.Kernel.Primer.A.Top.Down.Approach.for.x86.and.PowerPC.Architectures

3) http://lwn.net/Articles/336255/

4) http://isis.poly.edu/kulesh/stuff/src/klist/

/Suresh