WordPress for the adventurous: WP_Query class

It’s not too far-fetched to see WordPress as a library. You write a post and publish it. Meanwhile, WordPress classifies it and puts it away on its shelves.

But how do you find your post again after WordPress shelved it away? You need someone to help you navigate this huge library and find what you’re looking for. You need a librarian!

That’s the job of the WP_Query class in a nutshell. It’s the librarian of the WordPress database. You talk to it when you want to search through the WordPress database. It’ll help you find the information that you need!

It does this without requiring that you know how WordPress stores that information. That means that you don’t have to know how to write MySQL queries. WP_Query takes care of all that for you. (Yay!) It’ll transform your search request into a safe MySQL query and process it for you.

But how does the WP_Query class do this? That’s what we’re going to look at today. We’re going to explore how it works and what it can do for you. This will help you leverage it to the maximum in the future.

What problem does WP_Query solve?

MySQL queries are a bit of a tricky beast. On one hand, they let you do pretty much anything that you want. On the other hand, in the hands of an inexperienced developer, they’re a serious security risk.

The truth is that a lot of us don’t write code that touches MySQL that often. Even with the tools that WordPress gives us, it’s always a bit of a risk. That’s why WordPress put the WP_Query class at our disposal.

Gone is the need to create MySQL queries! You can just fill an array with query arguments instead. WordPress takes care of all the MySQL messiness for you and just returns the posts that you want. That’s pretty sweet.
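Here’s a minimal sketch of what filling an array with query arguments can look like. The function name is hypothetical, and the code only does something useful once WordPress is loaded (in a theme or plugin, for example).

```php
<?php
/**
 * Hypothetical helper: returns up to five published posts from 2012.
 * Only call it once WordPress is loaded.
 */
function sketch_get_posts_from_2012()
{
    // The query arguments replace the MySQL query that you'd otherwise write.
    $args = array(
        'post_type'      => 'post',
        'post_status'    => 'publish',
        'year'           => 2012,
        'posts_per_page' => 5,
    );

    // WP_Query builds and runs the MySQL query for us.
    $query = new WP_Query($args);

    // The posts that matched the query arguments.
    return $query->posts;
}
```

Notice that there isn’t a single line of SQL in there.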

The life cycle of WP_Query

So what happens when WP_Query needs to fetch posts from the database? How does it turn an array of query arguments into an array of WP_Post objects? This isn’t a documented process so we’re left to ourselves to figure out what’s going on.

WP_Query diagram

To help us with that, we’re going to break down the process into distinct steps (shown above). These attempt to explain what’s going on inside WP_Query during that process. Let’s take a look at them.

Initializing WP_Query

Before WP_Query can do anything, it first needs to initialize itself. It does this using two methods: init and init_query_flags. init is the main method that WP_Query calls when it wants to initialize itself. It resets all its internal variables to their initial values.

init then calls init_query_flags. This is a secondary initialization method that focuses only on resetting the query flags. This just means setting them all to false.

So why does WP_Query do things this way? Why doesn’t it just do it in its constructor? It’s because WordPress lets you reuse the same WP_Query object to do as many queries as you want. Because of this, it needs these methods to live outside the constructor.

Extracting the WordPress query arguments

Once initialized, WP_Query needs a query to parse. To WP_Query, a query is an array of query arguments. It’s what you use instead of MySQL. It needs this array before it can do anything.

That said, it also accepts the query in the form of a string. It needs to follow the same format as URL query strings (e.g. year=2012&monthnum=12&day=12). If this happens, WP_Query will just convert the string to an array of query arguments using wp_parse_args.
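To make that conversion a bit more concrete, here’s a small sketch that runs without WordPress. sketch_parse_query_args is a hypothetical stand-in, not the real wp_parse_args. It relies on PHP’s parse_str to turn the query string into an array.

```php
<?php
// Simplified stand-in for what happens to a query string. This is not the core function.
function sketch_parse_query_args($query)
{
    // An array is already in the format that we want.
    if (is_array($query)) {
        return $query;
    }

    // parse_str() fills $args using a URL-style query string.
    parse_str((string) $query, $args);

    return $args;
}

$from_string = sketch_parse_query_args('year=2012&monthnum=12&day=12');
$from_array  = sketch_parse_query_args(array('year' => '2012', 'monthnum' => '12', 'day' => '12'));
```

Both calls end up with the same array of query arguments.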

These query arguments then get assigned to two internal variables: query and query_vars. There’s no difference between the two nowadays. In the past, query would store the string version of the WordPress query. Meanwhile, query_vars would contain the array version.

Converting the query arguments into query flags

Once WP_Query has an array of query arguments, it needs to convert them into something it can use. That’s the query flags that we discussed earlier. This conversion to query flags happens in the parse_query method.

The name of the method is a bit misleading. There’s no parsing that happens in this method anymore. The method name is just another artifact from when WP_Query used a string instead of an array.

So what happens in parse_query? Well first, it’ll validate all the inputs in the query_vars array. For example, if you set a year, it’ll ensure that it’s a non-negative integer and so on.

Once parse_query has validated the query arguments array, it can start the conversion process. That conversion process is nothing more than dozens of if, elseif and else statements. It uses those conditional statements to inspect all the validated query arguments. It’s during that inspection that it sets the appropriate query flags.
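A heavily simplified sketch of that “validate, then set flags” idea could look like this. sketch_parse_query is hypothetical and only handles two query arguments. The real method handles a lot more of them and its conditions are more involved.

```php
<?php
// Simplified stand-in for parse_query. It only looks at "p" and "year".
function sketch_parse_query(array $query_vars)
{
    // Validation: force numeric query arguments to be non-negative integers.
    $post_id = isset($query_vars['p']) ? abs((int) $query_vars['p']) : 0;
    $year    = isset($query_vars['year']) ? abs((int) $query_vars['year']) : 0;

    // All query flags start as false.
    $flags = array(
        'is_single' => false,
        'is_date'   => false,
        'is_year'   => false,
        'is_home'   => false,
    );

    // Conversion: inspect the validated query arguments and set the matching flags.
    if ($post_id > 0) {
        $flags['is_single'] = true;
    } elseif ($year > 0) {
        $flags['is_date'] = true;
        $flags['is_year'] = true;
    } else {
        $flags['is_home'] = true;
    }

    return $flags;
}

$year_flags  = sketch_parse_query(array('year' => '-2012'));
$empty_flags = sketch_parse_query(array());
```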

Turning query arguments into a MySQL query

Alright, so parse_query has converted query arguments into query flags. That said, these query flags won’t get posts for us. We still need a MySQL query to fetch them.

Creating this MySQL query is the main job of the get_posts method. The method is a lot like parse_query. It’s just a huge set of conditional statements.

SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits

These conditional statements have a single purpose. That’s to populate all the variables that make up the MySQL query (shown above). To achieve that goal, get_posts needs over 1,200 lines of code. (Wowzers!)
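To picture how those variables come together, here’s a small sketch that assembles the same template from an array of clauses. sketch_build_request is a hypothetical helper and wp_posts assumes the default table prefix. The whitespace cleanup at the end is only there to make the result easier to read.

```php
<?php
// Simplified stand-in for the final assembly step in get_posts. This is not core code.
function sketch_build_request(array $clauses, $posts_table = 'wp_posts')
{
    // Every clause is optional and defaults to an empty string.
    $defaults = array(
        'found_rows' => '',
        'distinct'   => '',
        'fields'     => $posts_table . '.*',
        'join'       => '',
        'where'      => '',
        'groupby'    => '',
        'orderby'    => '',
        'limits'     => '',
    );
    $c = array_merge($defaults, $clauses);

    $request = "SELECT {$c['found_rows']} {$c['distinct']} {$c['fields']} FROM {$posts_table} {$c['join']} WHERE 1=1 {$c['where']} {$c['groupby']} {$c['orderby']} {$c['limits']}";

    // Collapse the gaps left by the empty clauses (readability only).
    return trim(preg_replace('/\s+/', ' ', $request));
}

$request = sketch_build_request(array(
    'where'   => "AND wp_posts.post_type = 'post' AND wp_posts.post_status = 'publish'",
    'orderby' => 'ORDER BY wp_posts.post_date DESC',
    'limits'  => 'LIMIT 0, 10',
));
```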

Because of its size, it doesn’t make a lot of sense to go over get_posts in great detail. (It wouldn’t be that useful for us anyways.) That said, there are a few things going on that are worth highlighting.

Subqueries

Right now, our focus has been on the WP_Query class itself. That said, it’s worth mentioning that there are other WP_*_Query classes. get_posts uses a few of them to generate parts of the MySQL query.

The first one is WP_Meta_Query. This is the query that get_posts uses to handle custom field parameters. It uses it to generate part of the join, where and orderby variables.

The next one is WP_Date_Query. get_posts relies on it whenever you use date parameters in your query arguments. Using it, get_posts creates some of the SQL in the where variable.

The last one is WP_Tax_Query. get_posts uses it whenever you use the tax_query query parameter. It’ll use it to generate some of the SQL for the join and where variables. It’s also worth noting that all category and tag query parameters get merged into the tax_query argument. So they use WP_Tax_Query as well.
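As a rough illustration, here are query arguments that would involve all three of those classes. The color custom field and the news category slug are made up for the example.

```php
<?php
$args = array(
    'post_type'  => 'post',
    // Handled by WP_Meta_Query ("color" is a hypothetical custom field).
    'meta_query' => array(
        array('key' => 'color', 'value' => 'blue', 'compare' => '='),
    ),
    // Handled by WP_Date_Query.
    'date_query' => array(
        array('after' => '2012-01-01', 'inclusive' => true),
    ),
    // Handled by WP_Tax_Query ("news" is a hypothetical category slug).
    'tax_query'  => array(
        array('taxonomy' => 'category', 'field' => 'slug', 'terms' => array('news')),
    ),
);

// Only run the query when WordPress is loaded.
if (class_exists('WP_Query')) {
    $query = new WP_Query($args);
}
```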

Actions and Filters

get_posts has two action hooks that you can use: pre_get_posts and posts_selection. It also has countless filters that you can use. (Seriously, there are a lot!) These filters let you make changes to the MySQL query that get_posts is generating. And their large number means that you can make those changes with surgical precision.

But let’s not forget about those two actions that we mentioned at first. posts_selection doesn’t seem to get any use at all. The documentation says that it’s for caching plugins, but they don’t seem to use it now. (Maybe they did at some point?)

Meanwhile, pre_get_posts is the complete opposite. It’s a super useful action hook. So much so that we’re going to take a small break to look at it!

The pre_get_posts hook

pre_get_posts is one of the most powerful hooks in all of WordPress. It lets you change the WP_Query object before the get_posts method starts generating the MySQL query. It does that by passing you a reference to the WP_Query object.

This means that all the changes that you make are permanent. There’s no need to use a global variable or return value. You’re always modifying the WP_Query object that get_posts is going to use. (Scary!)

It also means that you can do some serious damage if you’re not careful. WordPress isn’t going to do any validation to see if the changes that you made are safe. It’ll just ensure that all the necessary query arguments are there using fill_query_vars.

A small pre_get_posts example

On the flip side, you can do some pretty cool stuff with this hook. A common example is hiding a post category. You can do that by adding the category ID to the category__not_in query argument.

function remove_uncategorized_category(WP_Query $query)
{
    $query->query_vars['category__not_in'][] = 1;
}
add_action('pre_get_posts', 'remove_uncategorized_category');

This hides the default Uncategorized category which has the ID of 1 on a default install. It’s worth noting that WP_Query always initializes the category__not_in query argument as an array. That’s why we don’t have to do any validation before adding the category ID to the array.

But there’s a problem with our function. Can you spot it? Our posts are also hidden in the WordPress administration panel. (Oops!) Let’s make a small change to fix that.

function remove_uncategorized_category(WP_Query $query)
{
    if (!is_admin()) {
        $query->query_vars['category__not_in'][] = 1;
    }
}
add_action('pre_get_posts', 'remove_uncategorized_category');

We just added an is_admin check. That way you can still see the posts when you’re in the WordPress administration panel. And that’s why you need to be careful when you use the pre_get_posts hook!

Generating WP_Post objects

The last step of the process is converting our database results into WP_Post objects. This happens twice at the tail end of the get_posts method. In both cases, the conversion uses this block of code:

$this->posts = array_map( 'get_post', $this->posts );

posts is a WP_Query internal variable. It contains the result of the MySQL query that get_posts generated. This result is an array containing either more arrays or stdClass objects. Neither of these are WP_Post objects.

That’s where array_map comes in. It’ll take the array inside posts and pass each element to get_post. get_post will then convert each of these array elements to a WP_Post object.

Once array_map finishes going through the array, it’ll only contain WP_Post objects. It’s a simple trick, but it highlights the power of PHP’s array functions. One small line of code to convert all your posts to WP_Post objects!

WP_Query and “The Loop”

Now that you understand how WP_Query fetches posts from the database, we have to look at another important aspect of the WP_Query class. And that’s its relationship with “The Loop”.

In fact, for a lot of people (and maybe you too!), “The Loop” is what they associate with WP_Query. They just can’t think of one without the other. After all, “The Loop” is pretty much the foundation of WordPress. It’s how most of us interact with posts. So how does WP_Query manage the loop?

The history of “The Loop”

Before we get into the inner workings of “The Loop”, let’s go over some of its history. The idea of a loop that goes through every WordPress post isn’t new. It’s been around since the time of b2.

Before the introduction of the WP_Query class, “The Loop” looked like this:

<?php if ($posts) : foreach ($posts as $post) : start_wp(); ?>

It was just a foreach loop that would go through the posts variable. That was a global variable that WordPress would use to store all the posts that it fetched from the database. It wasn’t any different from the posts variable that we saw earlier inside WP_Query.

But where did WordPress fetch posts before WP_Query? Well, WordPress would create and execute a MySQL query in wp-blog-header.php. It would then store the result from that query in posts. This would happen only once per page load. There was no way to run the WordPress query again afterwards.

Now, let’s go back to our foreach loop. As it loops through the posts variable, it creates a post variable in the global scope. start_wp would then use that global variable to display the current post.

There was a serious drawback to how “The Loop” worked before WP_Query. You couldn’t use “The Loop” more than once. That’s because WordPress stored everything in global variables.

The arrival of the WP_Query class changed that. Instead of relying on global variables, WordPress would store the query results in it. The query generation code also moved from wp-blog-header.php to WP_Query. These changes made it possible to use WP_Query to create more than one loop in your code.

How does WP_Query manage “The Loop”?

So that was the origin of the relationship between WP_Query and “The Loop”. We’ve also seen how WP_Query generates a MySQL query to fetch posts from the database. There’s only one piece of the puzzle left. It’s to look at what’s going on in WP_Query when it’s going through “The Loop”.

The internal variables

From WP_Query’s perspective, “The Loop” is just a bunch of internal variables. These variables are: current_post, in_the_loop, post and post_count. All that WP_Query does is manage them as it goes through “The Loop”.

Out of those four variables, three are important to the inner workings of “The Loop”. current_post stores the index value of the current post in the posts array. post is the WP_Post object at the current_post index. post_count tracks the total number of posts in the posts array.

Meanwhile, in_the_loop is just a flag that tracks whether WP_Query is in “The Loop”. WP_Query doesn’t even use it. It’s there to give you an easy way to find out what the status of “The Loop” is.

Checking if we have posts in “The Loop”

Before looping through the posts that WP_Query fetched, you need to know if it even fetched any. That’s part of the job of the have_posts method. It returns true or false depending on whether there are still posts in “The Loop” or not.

It replaces the if ($posts) from the old loop. Instead of that if statement, have_posts compares the current_post variable to the post_count variable. It’s looking to see what would happen if it incremented current_post. Would it still be smaller than post_count? If so, there’s at least one more post to loop through.

As we saw in the previous section, current_post is the index of the current post in the posts array. We don’t want current_post to be larger or equal to post_count. That would mean that current_post points to a non-existing array element.

Resetting “The Loop”

When that happens, have_posts does the other part of its job. It resets “The Loop” using the rewind_posts method. It’s a small method that changes some of the internal variables that manage “The Loop”.

The method resets the current_post index to -1. This tells WP_Query that the loop hasn’t started yet. It also changes the post variable so that it contains the post at the beginning of the posts array. That’s because the post stored in post needs to match the one that current_post points to.

Once rewind_posts finishes resetting “The Loop”, have_posts sets the in_the_loop flag to false. This is the last step in the reset process of “The Loop”.

Looping through the posts

In the old loop, you’d cycle through all the posts using foreach ($posts as $post). The new loop replaces that foreach loop with the the_post method. This is the method that does the actual looping part of “The Loop”.

Whenever you call the_post, it’ll always start by setting the in_the_loop flag to true. It’ll also check if current_post is set to -1. If it is, it’ll call the loop_start hook.

Once that’s done, the_post will call the next_post method. This is a small method that increments current_post index by one. It then fetches the post at that index in the posts array and sets it to the internal post variable. It finishes up by returning the post to the the_post method.
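To tie these methods together, here’s a stripped-down sketch of the loop bookkeeping that runs without WordPress. SketchLoop is a hypothetical class, not the real WP_Query. It uses plain strings instead of WP_Post objects and leaves out the hooks and the global variables. It also returns the post from the_post for convenience, which the real method doesn’t do.

```php
<?php
// Simplified stand-in for the loop internals of WP_Query. This is not core code.
class SketchLoop
{
    public $posts = array();
    public $post_count = 0;
    public $current_post = -1;
    public $in_the_loop = false;
    public $post = null;

    public function __construct(array $posts)
    {
        $this->posts = array_values($posts);
        $this->post_count = count($this->posts);
    }

    public function have_posts()
    {
        // Would the next index still point to an existing post?
        if ($this->current_post + 1 < $this->post_count) {
            return true;
        } elseif ($this->current_post + 1 === $this->post_count && $this->post_count > 0) {
            // We went through every post so we reset the loop.
            $this->rewind_posts();
        }

        $this->in_the_loop = false;

        return false;
    }

    public function rewind_posts()
    {
        $this->current_post = -1;

        if ($this->post_count > 0) {
            $this->post = $this->posts[0];
        }
    }

    public function next_post()
    {
        $this->current_post++;
        $this->post = $this->posts[$this->current_post];

        return $this->post;
    }

    public function the_post()
    {
        $this->in_the_loop = true;

        return $this->next_post();
    }
}

$loop = new SketchLoop(array('first', 'second', 'third'));
$seen = array();

while ($loop->have_posts()) {
    $seen[] = $loop->the_post();
}
```

Once the while loop ends, current_post is back at -1. That means you could go through the same loop again.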

Setting up the post data

Once it has a post, the_post has one last thing to do. It needs to set up all the data from the WP_Post object into global variables that WordPress will use. This is the job that the start_wp function handled in the old loop. Now, it’s the_post that handles it by calling the setup_postdata method.

The setup_postdata method does almost the same thing as the old start_wp function. The big difference is that you have to pass it a post to set up. start_wp would only use the post global variable from the foreach ($posts as $post).

So what does setup_postdata do? Well first, it needs to ensure that you passed it a WP_Post object. If you didn’t, setup_postdata will try to convert it into one. If that doesn’t work, the method doesn’t do anything. (Bummer.)

Once it has a WP_Post object, setup_postdata starts extracting global variables from it. These are:

  • id
  • authordata
  • currentday
  • currentmonth
  • page
  • pages
  • multipage
  • more
  • numpages

These global variables are remnants of some of the oldest code in core. Some, like id, are pretty easy to figure out, but others are not. That’s why we’re going to take a moment to go over them.

Pagination global variables

The primary use of these global variables is pagination. WordPress needs a lot of them to handle it. So which ones are they?

To begin, you have page which stores the current page number. Unlike the rest of the global variables, page comes from the query parameter with the same name. It doesn’t come from the WP_Post object. That’s logical since a post has no reason to store the current page number.

Next, you have pages which is an array that contains the content of the post, but split by page. setup_postdata creates the array by exploding the content using <!--nextpage--> as the delimiter. It then counts how many elements are in pages and stores that in numpages.

The last pagination global variable is multipage. It’s a flag that can be either true or false. multipage is set to true whenever numpages is greater than one. This alerts you that the post needs to use pagination.
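Here’s a small sketch of how those three values relate to each other. sketch_paginate_content is a hypothetical function that runs without WordPress. It skips the extra cleanup that core does around the tag.

```php
<?php
// Simplified stand-in for the pagination part of setup_postdata.
function sketch_paginate_content($content)
{
    // Split the post content on the page break tag.
    $pages    = explode('<!--nextpage-->', $content);
    $numpages = count($pages);

    return array(
        'pages'     => $pages,
        'numpages'  => $numpages,
        'multipage' => $numpages > 1,
    );
}

$paged  = sketch_paginate_content('Intro<!--nextpage-->Middle<!--nextpage-->End');
$single = sketch_paginate_content('Just one page');
```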

Controlling the more tag

The more global variable is the one that’s least documented and hardest to understand. It’s a flag that tells WordPress whether to respect the more tag or not. That’s the <!--more--> that you add to your post when you want to only have a teaser on the homepage.

By default, more has a value of 0. This tells WordPress to respect the more tag in your post. setup_postdata will change that value to 1 in specific scenarios. These are when:

  • is_page is true. (You’re viewing a page.)
  • is_single is true. (You’re viewing a post.)
  • is_feed is true. (You’re on a feed.)
  • numpages is greater than 1 and page is greater than 1. (You’re not on page 1 of a multipage post.)

If you think about it, those scenarios make sense. You only want to truncate the content for a teaser on the home page. You don’t want that when you’re viewing a post, a page or a feed.
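You can condense those scenarios into a few lines of code. This is a sketch based on the list above and not the actual core logic. sketch_more_flag is a hypothetical function that takes the conditional results as parameters so that it can run without WordPress.

```php
<?php
// Simplified stand-in for how setup_postdata picks the value of "more".
function sketch_more_flag(bool $is_page, bool $is_single, bool $is_feed, int $numpages, int $page): int
{
    // Viewing a page, a post or a feed: show the full content.
    if ($is_page || $is_single || $is_feed) {
        return 1;
    }

    // Past the first page of a multipage post: show the full content too.
    if ($numpages > 1 && $page > 1) {
        return 1;
    }

    // Everywhere else, respect the more tag.
    return 0;
}

$on_home   = sketch_more_flag(false, false, false, 1, 1);
$on_single = sketch_more_flag(false, true, false, 1, 1);
$on_page_2 = sketch_more_flag(false, false, false, 3, 2);
```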

What about the other global variables?

The rest of the global variables are pretty straightforward. You have id which is the ID of the post that setup_postdata is setting up. currentday and currentmonth are the day and month the author published that post. setup_postdata formats the two using mysql2date.

You also have authordata. It stores the result of the call to get_userdata. Unless there’s an error, this will always be the WP_User object of the post author.

Managing post comments

WP_Query doesn’t just manage posts. It also manages the comments of a post. WP_Query stores them in the comments internal variable as an array. And, in most cases, this array will stay empty. WP_Query doesn’t fetch comments by default.

It needs a comment feed

WP_Query itself will only fetch comments in a specific situation. That’s when the is_comment_feed flag is true. When that happens, get_posts will run a separate query to fetch the comments of a post. It’ll then store the result of that query in the comments variable.

There’s also the comment template

There’s one other situation where WordPress will fetch comments for WP_Query. That’s inside the comments_template function. This is the function that a theme calls when it wants to load the comments template for a post.

Like we mentioned earlier, WordPress doesn’t fetch the comments of a post by default. That means that, when you call comments_template, the comments array is still empty. That’s a bit of an issue when your job is to load the template to display these comments.

But don’t you worry, comments_template is on the case! It fetches all comments using a comment query. It then stores the result of the query inside WP_Query. This lets the comments template use the comment loop.

The comment loop

“Comment loop?”, you say. Why yes there’s also a comment loop! WP_Query is the class in charge of managing it.

The code for the comment loop is like a leaner version of the code for “The Loop”. It only uses three internal variables: comment, comment_count and current_comment. WP_Query uses them the same way it does with their post counterparts.

The methods are also the same as “The Loop”. You just replace “post” with “comment”. The result is that the comment loop looks the same as “The Loop”.

That said, the comment loop doesn’t see that much use. That’s because theme designers have the option to use another more convenient function. That’s wp_list_comments.

How does WP_Query handle nested loops?

As we’ve seen throughout this article, WP_Query uses a LOT of global variables. But what happens when you want to nest a loop inside another? How does it manage all these global variables? The trick is an object-oriented feature called “encapsulation”.

Encapsulation to the rescue!

That’s because, if you dig down, every WordPress query is an instance of WP_Query. That’s true even for the main WordPress query. WordPress stores it in the wp_the_query global variable.

With encapsulation, WordPress can ensure that every query result stays safe. Each WP_Query object will always store the result of their own query inside itself. And you can access it as long as that instance of WP_Query still exists.

And it gets better! Encapsulation also ensures each WP_Query object has its own loop. Like the query result, each WP_Query object stores the state of its own loop. You can go back to it at any time as long as PHP didn’t destroy that instance of the WP_Query object.

What’s really happening

With this in mind, let’s go back to our initial question, “How does WP_Query handle nested loops?” Well, the reality is that there isn’t any nesting happening per se. It was all a clever trick by encapsulation! (The rascal!)

The fact is that each WP_Query instance encapsulates its own loop. It never contains anything other than its own loop. So what does the WP_Query class do when you’re “nesting” loops?

Well, it still needs to manage all the global variables for the post and query. Those are the variables that WordPress functions use. If we don’t replace them, they won’t refer to the correct post or query.

For example, let’s take the have_posts function. It calls the WP_Query method from the instance stored in the wp_query global variable. That means that we need wp_query to always store the current query that we’re using.

So that’s what nesting loops comes down to. WordPress needs to replace global variables whenever you swap from one query to another. This isn’t as complicated as it sounds.

Managing global variables

The main global variable replacement scenario is for post global variables. So let’s say that you switch from one WP_Query instance to another. You need the_title to output the title of the current post in the query that you just swapped to and not the old one.

The reset_postdata method in the WP_Query class handles this scenario. It takes the post stored inside the post variable and restores its global variables. It does that by setting the post variable as the new post global variable. It then calls setup_postdata so that it can restore the rest of the global variables.

WordPress also offers the wp_reset_postdata function. This function also calls the reset_postdata method. It calls it on the WP_Query instance stored in the wp_query global variable.

wp_query is an important global variable for WordPress. This is where it stores what it considers to be the current query. All WordPress loop functions refer to it when they need access to a WP_Query instance. That’s why WordPress also needs a function to reset the wp_query global variable.

That’s the job of the wp_reset_query function. It replaces the current query with the main WordPress query. As we saw earlier, WordPress stores that query in the wp_the_query global variable.

The function itself just replaces the wp_query instance with the wp_the_query one. Once it does that, it calls wp_reset_postdata. This resets all the global variables so that they point to the current post in the main WordPress query.
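In practice, a secondary loop that cleans up after itself could look like the sketch below. The function name is hypothetical, and the code only does something useful once WordPress is loaded.

```php
<?php
/**
 * Hypothetical helper: collects the titles of up to three posts from a category.
 * Only call it once WordPress is loaded.
 */
function sketch_get_category_titles($category_id)
{
    // This WP_Query instance has its own query result and its own loop.
    $related = new WP_Query(array(
        'cat'            => (int) $category_id,
        'posts_per_page' => 3,
    ));

    $titles = array();

    while ($related->have_posts()) {
        // Replaces the post global variables with the ones for this query's post.
        $related->the_post();
        $titles[] = get_the_title();
    }

    // Restore the post global variables using the query stored in wp_query.
    wp_reset_postdata();

    return $titles;
}
```

Since we never replaced the wp_query global variable here, calling wp_reset_postdata is enough. There’s no need for wp_reset_query.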

Your personal WordPress librarian

The WP_Query class has now been around for over a decade. That’s a long time! And even after all this time, it’s still the preferred way to access posts stored in WordPress.

That’s because creating MySQL queries isn’t for everyone. So it’s handy to have a personal librarian to help you find posts. But the WP_Query class isn’t without its complexities.

There are a lot of global variables at play. And things can get messy when you try to nest queries. That’s why it’s a good idea to know how it works. But also know how it ties back to the inner workings of WordPress and “The Loop”.

Creative Commons License