Drupal page rendering process

From John VanDyk's drupal install.

A Walk Through Drupal's Page Serving Mechanism or Tiptoeing Sprightly Through the PHP

This is a commentary on the process Drupal goes through when serving a page. For convenience, we will choose the following URL, which asks Drupal to display the first node for us. (A node is a thing, usually a web page.)

A visual companion to this narration can be found on John's site. Before we start, let's dissect the URL. I'm running on an OS X machine, so the site I'm serving lives at /users/user/sites/. The drupal directory contains a checkout of the latest Drupal CVS tree. It looks like this:

So the URL above will be be requesting the root directory / of the Drupal site. Apache translates that into index.php. One variable/value pair is passed along with the request: the variable 'q' is set to the value 'node/1'.

So, let's pick up the show with the execution of index.php, which looks very simple and is only a few lines long.

Let's take a broad look at what happens during the execution of index.html. First, the includes/bootstrap.inc file is included, bringing in all the functions that are necessary to get Drupal's machinery up and running. There's a call to drupal_page_header(), which starts a timer, sets up caching, and notifies interested modules that the request is beginning." Next, the includes/common.inc file is included, giving access to a wide variety of utility functions such as path formatting functions, form generation and validation, etc. The call to fix_gpc_magic() is there to check on the status of "magic quotes" and to ensure that all escaped quotes enter Drupal's database consistently. Drupal then builds its navigation menu and sets the variable $status to the result of that operation. In the switch statement, Drupal checks for cases in which a Not Found or Access Denied message needs to be generated, and finally a call to drupal_page_footer(), which notifies all interested modules that the request is ending. Drupal closes up shop and the page is served. Simple, eh?

Let's delve a little more deeply into the process outlined above.

The first line of index.php includes the includes/bootstrap.inc file, but it also executes code towards the end of bootstrap.inc. First, it destroys any previous variable named $conf. Next, it calls conf_init(). This function allows Drupal to use site-specific configuration files if it finds them. It returns the name of the site-specific configuration file; if no site-specific configuration file is found, sets the variable $config equal to the string 'conf'. Next, it includes the named configuration file. Thus, in the default case it will include 'conf.php'. The code in conf_init would be easier to understand if the variable $file were instead called $potential_filename. Likewise $conf_filename would be a better choice than $config.

The selected configuration file (normally /includes/conf.php) is now parsed, setting the $db_url variable, the optional $db_prefix variable, the $base_url for the website, and the $languages array (default is "en"=>"english").

The database.inc file is now parsed, with the primary goal of initializing a connection to the database. If MySQL is being used, the database.mysql.inc files is brought in; if Postgres is being used, the pear database abstraction layer is used. Although the global variables $db_prefix, $db_type, and $db_url are set, the most useful result of parsing database.inc is a global variable called $active_db which contains the database connection handle.

Now that the database connection is set up, it's time to start a session by including the includes/session.inc file. Oddly, in this include file the executable code is located at the top of the file instead of the bottom. What the code does is to tell PHP to use Drupal's own session storage functions (located in this file) instead of the default PHP session code. A call to PHP's session_start() function thus calls Drupal's sess_open() and sess_read() functions. The sess_read function creates a global $user object and sets the $user->roles array appropriately. Since I am running as an anonymous user, the $user->roles array contains one entry, 1->anonymous user.

We have a database connection, a session has been set up...now it's time to get things set up for modules. The includes/module.inc file is included but no actual code is executed.

The last thing bootstrap.inc does is to set up the global variable $conf, an array of configuration options. It does this by calling the variable_init() function. If a per-site configuration file exists and has already populated the $conf variable, this populated array is passed in to variable_init(). Otherwise, the $conf variable is null and an empty array is passed in. In both cases, a populated array of name-value pairs is returned and assigned to the global $conf variable, where it will live for the duration of this request. It should be noted that name-value pairs in the per-site configuration file have precedence over name-value pairs retrieved from the "variable" table by variable_init().

We're done with bootstrap.inc! Now it's time to go back to index.php and call drupal_page_header(). This function has two responsibilities. First, it starts a timer if $conf['dev_timer'] is set; that is, if you are keeping track of page execution times. Second, if caching has been enabled it retrieves the cached page, calls module_invoke_all() for the 'init' and 'exit' hooks, and exits. If caching is not enabled or the page is not being served to an anonymous user (or several other special cases, like when feedback needs to be sent to a user), it simply exits and returns control to index.php.

Back at index.php, we find an include statement for common.php. This file is chock-full of miscellaneous utility goodness, all kept in one file for performance reasons. But in addition to putting all these utility functions into our namespace, common.php includes some files on its own. They include theme.inc, for theme support; pager.inc for paging through large datasets (it has nothing to do with calling your pager); and menu.inc. In menu.inc, many constants are defined that are used later by the menu system.

The next inclusion that common.inc makes is xmlrpc.inc, with all sorts of functions for dealing with XML-RPC calls. Although one would expect a quick check of whether or not this request is actually an XML-RPC call, no such check is done here. Instead, over 30 variable assignments are made, apparently so that if this request turns to actually be an XML-RPC call, they will be ready. An xmlrpc_init() function instead may help performance here?

A small tablesort.inc file is included as well, containing functions that help behind the scenes with sortable tables. Given the paucity of code here, a performance boost could be gained by moving these into common.inc itself.

The last include done by common.inc is file.inc, which contains common file handling functions. The constants FILE_DOWNLOADS_PUBLIC = 1 and FILE_DOWNLOADS_PRIVATE = 2 are set here, as well as the FILE_SEPARATOR, which is \ for Windows machines and / for all others.

Finally, with includes finished, common.inc sets PHP's error handler to the error_handler() function in the common.inc file. This error handler creates a watchdog entry to record the error and, if any error reporting is enabled via the error_reporting directive in PHP's configuration file (php.ini), it prints the error message to the screen. Drupal's error_handler() does not use the last parameter $variables, which is an array that points to the active symbol table at the point the error occurred. The comment "// set error handler:" at the end of common.inc is redundant, as it is readily apparent what the function call to set_error_handler() does.

The Content-Type header is now sent to the browser as a hard coded string: "Content-Type: text/html; charset=utf-8".

If you remember that the URL we are serving ends with /~vandyk/drupal/?q=node/1, you'll note that the variable q has been set. Drupal now parses this out and checks for any path aliasing for the value of q. If the value of q is a path alias, Drupal replaces the value of q with the actual path that the value of q is aliased to. This sleight-of-hand happens before any modules see the value of q. Cool.

Module initialization now happens via the module_init() function. This function runs require_once on the admin, filter, system, user and watchdog modules. The filter module defines FILTER_HTML* and FILTER_STYLE* constants while being included. Next, other modules are include_once'd via module_list(). In order to be loaded, a module must (1) be enabled (that is, the status column of the "system" database table must be set to 1), and (2) Drupal's throttle mechanism must determine whether or not the module is eligible for exclusion when load is high. First, it determines whether the module is eligible by looking at the throttle column of the "system" database table; then, if the module is eligible, it looks at $conf["throttle_level"] to see whether the load is high enough to exclude the module. Once all modules have been include_once'd and their names added to the $list local array, the array is sorted by module name and returned. The returned $list is discarded because the module_list() invocation is not part of an assignment (e.g., it is simply module_list() and not $module_list = module_list()). The strategy here is to keep the module list inside a static variable called $list inside the module_list() function. The next time module_list() is called, it will simply return its static variable $list rather than rebuilding the whole array. We see that as we follow the final objective of module_init(); that is, to send all modules the "init" callback.

To see how the callbacks work let's step through the init callback for the first module. First module_invoke_all() is called and passed the string enumerating which callback is to be called. This string could be anything; it is simply a symbol that call modules have agreed to abide by, by convention. In this case it is the string "init".

The module_invoke_all() function now steps through the list of modules it got from calling module_list(). The first one is "admin", so it calls module_invoke("admin","init"). The module_invoke() function simply puts the two together to get the name of the function it will call. In this case the name of the function to call is "admin_init". If a function by this name exists, the function is called and the returned result, if any, ends up in an array called $return which is returned after all modules have been invoked. The lesson learned here is that if you are writing a module and intend to return a value from a callback, you must return it as an array.

Back to common.inc. There is a check for suspicious input data. To find out whether or not the user has permission to bypass this check, user_access() is called. This retrieves the user's permissions and stashes them in a static variable called $perm. Whether or not a user has permission for a given action is determined by a simple substring search for the name of the permission (e.g., "bypass input data check") within the $perm string. Our $perm string, as an anonymous user, is currently "0access content, ". Why the 0 at the beginning of the string? Because $perm is initialized to 0 by user_access().

The actual check for suspicious input data is carried out by valid_input_data() which lives in common.inc. It simply goes through an array it's been handed (in this case the $_REQUEST array) and checks all keys and values for the following "evil" strings: javascript, expression, alert, dynsrc, datasrc, data, lowsrc, applet, script, object, style, embed, form, blink, meta, html, frame, iframe, layer, ilayer, head, frameset, xml. If any of these are matched watchdog records a warning and Drupal dies (in the PHP sense). I wondered why both the keys and values of the $_REQUEST array are examined. This seems very time-consuming. Also, would it die if my URL ended with "/?xml=true" or "/?format=xml"?

The next step in common.inc's executable code is a call to locale_init() to set up locale data. If the user is not an anonymous user and has a language preference set up, the two-character language key is returned; otherwise, the key of the single-entry global array $language is returned. In our case, that's "en".

The last gasp of common.inc is to call init_theme(). You'd think that for consistency this would be called theme_init() (of course, that would be a namespace clash with a callback of the same name). This finds out which themes are available, which the user has selected, and then include_once's the chosen theme. If the user's selected theme is not available, the value at $conf["theme_default"] is used. In our case, we are an anonymous user with no theme selected, so the default xtemplate theme is used. Thus, the file themes/xtemplate/xtemplate.theme is include_once'd. The inclusion of xtemplate.theme calls include_once("themes/xtemplate/xtemplate.inc", creates a new object called xtemplate as a global variable. Inside this object is an xtemplate object called "template" with lots of attributes. Then there is a nonfunctional line where SetNullBlock is called. A comment indicates that someone is aware that this doesn't work.

Now we're back to index.php! A call to fix_gpc_magic() is in order. The "gpc" stands for Get, Post, Cookie: the three places that unescaped quotes may be found. If deemed necessary by the status of the boolean magic_quotes_gpc directive in PHP's configuration file (php.ini), slashes will be stripped from $_GET, $_POST, $_COOKIE, and $_REQUEST arrays. It seems odd that the function is not called fix_gpc_magic_quotes, since it is the "magic quotes" that are being fixed, not the magic. In my distribution of PHP, the magic_quotes_gpc directive is set to "Off", so slashes do not need to be stripped.

The next step is to set up menus. I'm not sure why we're setting up menus for an anonymous user, but let's go ahead and follow the logic anyway. We jump to menu_execute_active_handler() in menu.inc. This sets up a $_menu array consisting of items, local tasks, path index, and visible arrays. Then the system realizes that we're not going to be building any menus for an anonymous user and bows out. The real meat of the node creation and formatting happens here, but is complex enough for a separate commentary. Back in index.php, the switch statement doesn't match either case and we approach the last call in the file, to drupal_page_footer in common.inc. This takes care of caching the page we've built if caching is enabled (it's not) and calls module_invoke_all() with the "exit" callback symbol.

Although you may think we're done, PHP's session handler still needs to tidy up. It calls sess_write() in session.inc to update the session database table, then sess_close() which simply returns 1.

We're done.