
From: Don jessup (djessup72, yahoo dot com)
Date: 2002.04.08 - 20.12 MDT


  All right, I made an attempt at a trace of the read system call.
  It's attached to this email.  Let me know what you think.

  I think I need to find a way to make it easier to read. 
  Any ideas?  

  By the way, these documents were created with StarOffice.
  It is pretty impressive.  There may not be a need to use
  MS Office ever again.

  don

   


SysReadDataFlow
The following is a high-level trace of the Linux read system call through user and kernel space. A written description of each step in the image is linked from this page.



[Image: StepsPathImage0.gif]


The linked image shows the path of the Linux read system call through kernel and user space.

  1. The user space program invokes the read system call through _libc_read.

  2. _libc_read makes the transition into kernel space by loading the EAX register with the appropriate system call number, in this case __NR_read (defined in asm/unistd.h), and then generating a software interrupt (int 0x80), which enters the kernel through the gate installed for vector 0x80.

  3. The system call routine in arch/i386/kernel/entry.S uses the system call table to look up the appropriate sys_ function based on the contents of the EAX register. In this case the sys_read function is invoked.

  4. sys_read, located in fs/read_write.c, looks up the file object using the fd parameter. Using the file object's function pointer for read, sys_read invokes the FS-dependent read call.

  5. The function pointer was initialized when the file was opened and is set differently depending on the type of FS and file (regular file, block file, char file, and so on). For ext2 and most other FSs, and even for block files, the read call is set to the generic_file_read function located in mm/filemap.c.

  6. generic_file_read checks whether the O_DIRECT flag has been set. If it has, the generic_file_direct_IO function is called; otherwise the do_generic_file_read function is executed. In this example we follow the path for a regular file without the O_DIRECT flag set.

  7. do_generic_file_read, located in mm/filemap.c, is a complex function that tries to optimize for sequential reads. It looks for the data in the page cache by calling __find_page_nolock.

  8. __find_page_nolock, located in mm/filemap.c, fails because in this example our data lives on disk and is not in any cache.

  9. __find_page_nolock returns NULL.

  10. do_generic_file_read calls page_cache_alloc.

  11. page_cache_alloc, located in include/linux/pagemap.h, is just a wrapper for alloc_page, which returns a newly allocated page frame.

  12. page_cache_alloc returns with the allocated page frame.

  13. do_generic_file_read adds the page to the page cache by calling __add_to_page_cache.

  14. __add_to_page_cache, located in mm/filemap.c, inserts the new page into the page cache.

  15. __add_to_page_cache returns.

  16. The readpage function pointer in the address space operations of the file’s inode (i_mapping->a_ops) is used to fill the new page. The function pointer is FS dependent; in the case of ext2 it points to the ext2_readpage function.

  17. ext2_readpage, located in fs/ext2/inode.c, is just a wrapper for block_read_full_page, located in fs/buffer.c.

  18. block_read_full_page locks the page and creates empty buffers for the page. The function proceeds to fill the empty buffers from the buffer cache by calling get_block for each buffer. In this case get_block fails because the data is located on disk. Next the buffers are locked and set_buffer_async_io is called. This routine just sets the bh->b_end_io function ptr to the end_buffer_io_async function. Next submit_bh is invoked.

  19. submit_bh, located in drivers/block/ll_rw_blk.c, just creates a new bio, initializes it from the buffer_head, sets the bio’s bi_end_io function ptr to end_bio_bh_io_sync, and then calls submit_bio.

  20. submit_bio, located in drivers/block/ll_rw_blk.c, just does some validity checks, updates some kernel statistics, and then invokes generic_make_request.

  21. generic_make_request, located in drivers/block/ll_rw_blk.c, gets the device’s request_queue and calls the device’s make_request_fn function ptr. This function ptr can be defined by the device driver, or the device can choose to use the generic function __make_request. LVM and MD are examples of drivers that define their own make_request function. In this example we assume the device being read from uses the default __make_request.

  22. The __make_request function, located in drivers/block/ll_rw_blk.c, must arrange to transfer the given block. It must grab the queue’s request lock before manipulating the request queue (NOTE: in 2.4 this was a global lock for all request queues; 2.5 has a lock for each request queue). __make_request allows clustered requests by delaying the actual I/O so that requests operating on adjacent blocks can be joined together. This is done by plugging the queue, which the function blk_plug_device accomplishes.

  23. blk_plug_device, located in drivers/block/ll_rw_blk.c, schedules the plug_tq task queue descriptor on the tq_disk task queue so that the device’s request_fn routine is activated later.

  24. After scheduling the task blk_plug_device returns.

  25. On return from blk_plug_device, the __make_request function allocates a new request for the request queue and adds the bio to the request. Then __make_request unlocks the queue and returns. Note: on subsequent calls to __make_request the kernel applies an “elevator” algorithm to the requests. This algorithm tries to keep the disk head moving in the same direction for as long as possible, which tends to minimize seek times while ensuring that all requests are eventually satisfied.

  26. The __make_request function returns 0 to generic_make_request. This causes generic_make_request to return.

  27. On return from generic_make_request, submit_bio returns 1.

  28. submit_bh function returns the result from submit_bio.

  29. On return from submit_bh, block_read_full_page repeats the submit_bh call for each remaining buffer. Once all buffers have been submitted, block_read_full_page returns 0.

  30. ext2_readpage, the function behind the readpage ptr, returns the result from block_read_full_page.

  31. On return from readpage, do_generic_file_read checks whether the page is up to date. If it is not, as in this case, it issues a readahead for this page. On return from the readahead, wait_on_page is called.

  32. wait_on_page, located in include/linux/pagemap.h, checks whether the page is locked and, if so, invokes __wait_on_page.

  33. __wait_on_page declares a wait queue entry, adds it to the page’s wait queue, sets the task state to TASK_UNINTERRUPTIBLE, and then invokes sync_page.

  34. sync_page, located in mm/filemap.c, invokes the sync_page function ptr of the page’s address space. This pointer was initialized to block_sync_page when the file was opened.

  35. block_sync_page, located in fs/buffer.c, requests that the tq_disk task queue be run by calling run_task_queue.

  36. run_task_queue sets up the tq_disk task queue to be run on the next schedule invocation.

  37. Return from run_task_queue.

  38. block_sync_page function returns 0.

  39. sync_page function returns 0.

  40. On return from sync_page, __wait_on_page goes to sleep by calling schedule. sync_page and schedule are called repeatedly until the page is unlocked. This eventually causes the tq_disk task queue to be run.

  41. The tq_disk task queue uses a function pointer to call generic_unplug_device. The tq_disk entry is initialized to call generic_unplug_device in blk_init_queue, which is called during initialization of the block driver; this is not shown in this trace.

  42. generic_unplug_device checks to make sure that the queue is not empty and calls the device driver specific function pointer request_fn. This starts the I/O.

  43. A typical driver request function does the following: it checks the validity of the request, then spawns a data transfer and returns immediately without ending the request. This frees up the CPU and allows further requests to be collected while the device is dealing with the current one. Once the device has completed the transfer it issues an interrupt, and the bottom half of the interrupt handler handles the I/O completion by calling end_that_request_first.

  44. end_that_request_first, located in drivers/block/ll_rw_blk.c, ends the I/O on the first buffer attached by calling bio_endio.

  45. bio_endio, located in fs/bio.c, marks the bio object up to date and calls the bio’s function ptr bi_end_io. This was originally initialized in submit_bh to end_bio_bh_io_sync.

  46. end_bio_bh_io_sync, located in drivers/block/ll_rw_blk.c, calls the buffer head’s function pointer b_end_io. This function pointer was set to end_buffer_io_async in block_read_full_page.

  47. end_buffer_io_async, located in fs/buffer.c, marks the buffers up to date, marks the page up to date, and unlocks the page.

  48. end_buffer_io_async returns

  49. end_bio_bh_io_sync releases the bio struct and returns 0.

  50. bio_endio returns 0.

  51. If another buffer_head remains to be transferred, end_that_request_first sets it up and returns 1; otherwise the request is finished and it returns 0.

  52. If there are more buffers, the request function grabs the next one and spawns another data transfer; when it is done with the request it calls end_that_request_last, which just returns the request to the request queue’s free list. Then the request function returns.

  53. generic_unplug_device returns, thus ending the task queue run.

  54. Eventually schedule is called again and control is returned to __wait_on_page function.

  55. __wait_on_page checks whether the page is unlocked, and in this case it is; this happened in end_buffer_io_async. __wait_on_page sets the state of the thread to TASK_RUNNING, removes itself from the wait queue, and returns.

  56. wait_on_page returns

  57. do_generic_file_read copies the page to user space by calling the actor function. The actor routine has to update the user buffer pointers and the remaining count. do_generic_file_read then updates the access time of the inode and returns.

  58. generic_file_read updates the return value and then returns the value.

  59. sys_read updates the directory, updates the file object’s statistics, and returns.

  60. The system_call function handles signals, possibly schedules, does a RESTORE_ALL, and executes iret. In this case iret returns program control from the software-generated interrupt gate to the user space program.

  61. _libc_read checks for errors and returns.

  62. User space returns from the read call with the amount of data it has read.
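
The function pointer dispatch in steps 4 and 5 and the page cache miss in steps 7 through 16 can be sketched in user space. The C below is a toy model, not kernel code: every toy_ name, the 8-byte page size, and the in-memory string standing in for the disk are assumptions made for illustration. The real path sleeps in wait_on_page until the block layer completes the I/O; here the fill is a synchronous memcpy, and a bad fd simply returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define TOY_PAGE_SIZE 8u  /* tiny "page" so a short string spans several pages */
#define TOY_NR_PAGES  4u
#define TOY_MAX_FD    4

struct toy_file;

/* Stand-in for file_operations: one read pointer per file type (step 5). */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    const char *disk;                        /* pretend backing store */
    size_t size;                             /* file length in bytes */
    size_t pos;                              /* current file position */
    char cache[TOY_NR_PAGES][TOY_PAGE_SIZE]; /* pretend page cache */
    int cached[TOY_NR_PAGES];                /* 1 if the page is filled */
    int disk_reads;                          /* number of cache misses */
};

static struct toy_file toy_files[TOY_MAX_FD];
static int toy_nr_open;

/* Plays the role of generic_file_read: look in the cache, fill on a miss,
 * then copy to the caller's buffer (steps 7 to 16 and 57, heavily simplified). */
static ssize_t toy_generic_file_read(struct toy_file *f, char *buf, size_t count)
{
    size_t done = 0;

    while (done < count && f->pos < f->size) {
        size_t idx = f->pos / TOY_PAGE_SIZE;
        size_t off = f->pos % TOY_PAGE_SIZE;
        size_t n = TOY_PAGE_SIZE - off;

        assert(idx < TOY_NR_PAGES);          /* guaranteed by toy_open() */
        if (!f->cached[idx]) {               /* cache miss: "read from disk" */
            size_t start = idx * TOY_PAGE_SIZE;
            size_t len = f->size - start;
            if (len > TOY_PAGE_SIZE)
                len = TOY_PAGE_SIZE;
            memcpy(f->cache[idx], f->disk + start, len);
            f->cached[idx] = 1;
            f->disk_reads++;
        }
        if (n > count - done)
            n = count - done;
        if (n > f->size - f->pos)
            n = f->size - f->pos;
        memcpy(buf + done, f->cache[idx] + off, n);  /* the "actor" copy */
        f->pos += n;
        done += n;
    }
    return (ssize_t)done;
}

static const struct toy_file_operations toy_regular_fops = {
    .read = toy_generic_file_read,
};

/* The read pointer is chosen at open time, as in step 5. */
int toy_open(const char *disk, size_t size)
{
    struct toy_file *f;

    if (toy_nr_open >= TOY_MAX_FD || size > (size_t)(TOY_NR_PAGES * TOY_PAGE_SIZE))
        return -1;
    f = &toy_files[toy_nr_open];
    memset(f, 0, sizeof(*f));
    f->f_op = &toy_regular_fops;
    f->disk = disk;
    f->size = size;
    return toy_nr_open++;
}

/* Plays the role of sys_read: fd -> file object -> f_op->read (step 4). */
ssize_t toy_sys_read(int fd, char *buf, size_t count)
{
    if (fd < 0 || fd >= toy_nr_open)
        return -1;
    return toy_files[fd].f_op->read(&toy_files[fd], buf, count);
}

/* Reports how many times the pretend disk was touched for this fd. */
int toy_disk_reads(int fd)
{
    return (fd < 0 || fd >= toy_nr_open) ? -1 : toy_files[fd].disk_reads;
}
```

Calling toy_open on an 18-byte string and then toy_sys_read for 5 bytes twice touches the pretend disk twice: the first read misses on page 0, and the second finishes page 0 from the cache and then misses on page 1.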



-- This is the antera mailing list. To unsubscribe, email majordomo, cryptofreak dot org with message body `unsubscribe antera'. Or, for more information, visit http://www.cryptofreak.org/.