PHP strings and how to format them

Strings have always been an integral part of the programming world. We use them so much because they do something that’s almost impossible to do with other data types. They allow our code to communicate with others.

The definition of “others” is quite wide in this context. Others can be as straightforward as someone using your WordPress plugin. But it can also be more abstract like a browser reading HTML code generated by PHP. They are both a third party who interacts with strings generated by your code.

Sometimes, these strings are simple and used as is. But more often than not, these strings need to be dynamic. They need to adapt to what’s going on inside your code.

This need for dynamic strings is why it’s important to have a good understanding of what you can do with strings in PHP. There are countless ways that you might want to alter a string. For example, you could have a string change based on certain conditions. Or you could insert one or more variables into it.

For this article, we’re going to focus on how you can format a string. There are a lot of built-in helper functions for strings in PHP. We’ll look at the ones that can help you achieve that goal.

The string data type

Strings have been around for a long time. They’re a fundamental concept in theoretical computer science. This is why they are a critical (if not necessary) component of any programming language.

But a string isn’t the same for everyone. For computer scientists, a string has a pure mathematical definition. It’s a finite sequence of symbols from a finite set called an alphabet. (Oh yeah, look at all those Wikipedia links!)

But this definition is way too formal for us developers. (We like things simple and devoid of mathematical terminology!) What we developers call a “string” is a specific type of string called a “string literal”. It’s a quoted sequence of characters inside your code.

Let’s take $wp = 'WordPress'; as an example. In that example, 'WordPress' would be the string literal. And WordPress would be the string value stored inside that string literal.

Escape sequences (maybe?)

Now, what happens if you wanted to store Don't hack core! inside a string literal? There’s a single quote inside your value. So you can’t write your string literal as 'Don't hack core!'.

PHP would throw an error if you did that. It would think that your string literal is 'Don'. And it wouldn’t be able to process the t hack core!' afterwards.

Instead, you need to use an escape character to escape the single quote in your string value. In PHP (and most programming languages), that escape character is the backslash. Using that escape character, our string literal would become 'Don\'t hack core!'.

This combination of \ and ' is what we call an escape sequence. These escape sequences let you represent special characters like a newline (represented by \n). Without escape sequences, these characters would be difficult or impossible for you to write.
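
Here's a minimal sketch of those escape sequences in action. The variable names are just for illustration. Note that the \n example uses double quotes, which the next section covers.

```php
<?php
// Escaping the single quote lets it live inside a single quoted literal.
$warning = 'Don\'t hack core!';

// Escaping the backslash itself gives you one literal backslash.
$path = 'C:\\wp';

// Inside double quotes, \n becomes a real newline character.
$two_lines = "Line one\nLine two";

echo $warning, PHP_EOL;   // Don't hack core!
echo $path, PHP_EOL;      // C:\wp
echo $two_lines, PHP_EOL; // printed on two separate lines
```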

Single quotes vs double quotes

This is a good time to talk about the difference between single quotes and double quotes. In PHP, the compiler doesn’t process string literals the same for both. A lot of that processing difference centers around escape sequences.

PHP will only process two escape sequences with single quotes. There’s \' for escaping a single quote and \\ for escaping a backslash. Outside those two, PHP processes everything else as is.

With double quotes, the PHP compiler will process every escape sequence. It’ll also expand every variable inside a string literal. This tends to be the main reason why developers use double quotes.
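
To make the difference concrete, here's a small sketch that writes the same text with both types of quotes. The name variable is only there for the demonstration.

```php
<?php
$name = 'WordPress';

// Single quotes: no expansion, and \n stays as a backslash followed by n.
$single = 'Hello $name\n';

// Double quotes: $name is expanded and \n becomes a newline character.
$double = "Hello $name\n";

var_dump($single); // 13 characters long
var_dump($double); // 16 characters long
```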

Variable expansion

So what is variable expansion? It’s when the PHP compiler processes a string literal and parses variables inside of it. PHP variable expansion has two types of syntax: simple and complex.

Simple syntax

$version = PHP_VERSION;
echo "Current PHP version: $version";

Above is an example of the simple syntax. Anytime PHP encounters a $, it’ll try to match the text after to a valid variable name. That said, it can’t expand variables like PHP_VERSION since they don’t use a $. That’s why our example assigned the value of the constant to the version variable.

While you might not be able to use constants, PHP simple syntax doesn’t limit you to just variable names either. You can use it to access array values. You can also use it to access object values. Here are two examples to show these off:

$version = array(PHP_VERSION, 'php' => PHP_VERSION);
echo "Current PHP version: $version[0]";
echo "Current PHP version: $version[php]";

The first one is an example using an array containing both a numeric index and a string key. The simple syntax supports both these types of arrays. But be aware, with the simple syntax, you can’t put the associative array key inside quotes.

$php = new stdClass();
$php->version = PHP_VERSION;
echo "Current PHP version: $php->version";

The second example uses a stdClass object. We created a version property and assigned it the value of the PHP_VERSION constant. We then echo the string literal containing $php->version.

In both examples, PHP will echo “Current PHP version: ” followed by the version number like it did in the first example. That said, it’s worth pointing out that you’re limited to a variable depth of 1. You can’t do $version[0][1], $global->php->version or any combination of those.

Complex syntax

This brings us to the complex syntax. Let’s say you’re running into the limitations of the simple syntax. You can just use the complex syntax to overcome them.

Now, the name “complex syntax” is a bit misleading. It isn’t more complicated to use than the simple syntax. In fact, it’s quite easy to use.

The name “complex syntax” just comes from the goal of the syntax. It allows you to use more complex variable expressions inside string literals. All that you need to do is wrap the variable expression in curly brackets.

echo "Current PHP version: {$GLOBALS['php']->version}";

So this is what the complex syntax looks like in practice. We’re using the $GLOBALS superglobal to access our php object from earlier. This example works because the PHP compiler processes everything inside the curly brackets as a PHP variable expression.
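
As a rough sketch, here's how the complex syntax handles the nested cases that the simple syntax can't. The site array and server object are made up for this example. Notice that, unlike with the simple syntax, you do put quotes around array keys here.

```php
<?php
// Hypothetical nested data, only for illustration.
$site = array(
    'php' => array('version' => PHP_VERSION),
);

$server = new stdClass();
$server->php = new stdClass();
$server->php->version = PHP_VERSION;

// Two levels deep, which the simple syntax can't reach.
$from_array  = "Current PHP version: {$site['php']['version']}";
$from_object = "Current PHP version: {$server->php->version}";

echo $from_array, PHP_EOL;
echo $from_object, PHP_EOL;
```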

There are still limitations with the complex syntax. You can’t use it with just a function inside. That said, it does work with object methods! Here’s an example:

class php {
    function get_version() {
        return PHP_VERSION;
    }
}

$php = new php();
echo "Current PHP version: {$php->get_version()}";

For this example, we created a class named php. Instead of giving it a property like before, we created the get_version method. That method returns the value of the PHP_VERSION constant.

We then instantiate a new php object and assign it to the php variable. We then call the get_version method inside the curly brackets. This will generate the same output as all the other examples.
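
So what can you do if you need the result of a plain function inside a string? One possible workaround, sketched below, is to store the result in a variable first. The label variable is an assumption made for the example.

```php
<?php
// A bare function call inside curly brackets isn't expanded.
// PHP keeps the whole thing as plain text.
$literal = "{strtoupper('php')}";

// Workaround: call the function first, then expand the variable.
$label = strtoupper('php');
$workaround = "Current {$label} version: " . PHP_VERSION;

echo $literal, PHP_EOL;    // {strtoupper('php')}
echo $workaround, PHP_EOL;
```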

Why not use double quotes all the time?

At this point, you might be thinking that double quotes are pretty sweet. Why isn’t everyone using them all the time!? Good question!

In terms of performance, there’s no difference between the two. It’s possible that there was a time when there was a difference. But today, the argument that one performs better than the other doesn’t hold any water.

That said, the general recommendation is still to use single quotes as much as you can. You should limit double quotes to when you need to escape or expand something inside a string. This recommendation is also part of the WordPress coding standard.

heredoc (and nowdoc)

Most of us are only familiar with defining string literals using quotes. But PHP has another lesser known mechanism for defining them. We call it heredoc.

Heredoc lets you do something that’s harder to do with a quoted string literal. It lets you define a block of text as a literal. Here’s a small example:

$version = PHP_VERSION;
echo <<<EOT
Current PHP version: $version
EOT;

<<< is the heredoc operator. It’s what tells PHP that the block of text below is a literal. EOT is the identifier that PHP will look for to close off the block of text.

It’s worth noting that you don’t have to use EOT as the identifier either. Any valid PHP label (letters, digits and underscores, not starting with a digit) will work as long as you use the same one at the end of your block of text. Before PHP 7.3, the closing identifier line could only contain the identifier followed by ;. Since PHP 7.3, PHP also lets you indent the closing identifier.

By default, PHP will process a heredoc like it would a double quoted string literal. It’ll process all valid escape sequences and expand variables when it can. To control how PHP will process a heredoc, you need to use quotes around the opening identifier.

$version = PHP_VERSION;
echo <<<"EOT"
Current PHP version: $version
EOT;

echo <<<'EOT'
Current PHP version: $version
EOT;

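Here's our previous example using quotes around the identifier. The first example, with double quotes, behaves the same as before. The second one, with single quotes, is what PHP calls a nowdoc. PHP treats it like a single quoted string literal, so it will echo "Current PHP version: $version" without expanding the variable.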
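
Where this syntax really pays off is with text that spans several lines. Here's a small sketch that stores a two-line block in a variable, once as a heredoc and once with a single quoted identifier (the form PHP's manual calls nowdoc). The "PHP report" text is made up for the example.

```php
<?php
$version = PHP_VERSION;

// Heredoc: behaves like a double quoted string, so $version is expanded.
$heredoc = <<<EOT
PHP report
Current PHP version: $version
EOT;

// Nowdoc: behaves like a single quoted string, so $version stays as text.
$nowdoc = <<<'EOT'
PHP report
Current PHP version: $version
EOT;

echo $heredoc, PHP_EOL;
echo $nowdoc, PHP_EOL;
```

The closing identifiers sit at the very start of their lines, which keeps the sketch valid on older PHP versions too.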

Using string literals to format strings

Now that we've seen what strings are, let's get back to the topic of formatting them. The most common and popular way of doing that is to use string literals. You can use variable expansion like we've seen in the examples so far.

Or you can use string operators to concatenate strings and variables together. In PHP, the concatenation operator is the dot (.). You can see an example of string concatenation below.

echo 'Current PHP version: ' . PHP_VERSION;

We concatenate Current PHP version: with the PHP_VERSION constant. You'll notice that we removed the version variable. String concatenation works fine with PHP constants. We don't need to store them in a variable for them to work.
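
Concatenation also lets you build a string up in several steps. Here's a short sketch using the .= operator, which appends to an existing string. The message variable is just for illustration.

```php
<?php
// Start with the same string as the example above.
$message = 'Current PHP version: ' . PHP_VERSION;

// .= appends to the existing string. The integer constant
// PHP_MAJOR_VERSION is converted to a string automatically.
$message .= ' (major version ' . PHP_MAJOR_VERSION . ')';

echo $message, PHP_EOL;
```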

Formatting strings using "printf"

PHP also has more advanced string formatting functions centered around printf. This is a class of functions used by programming languages since Fortran in the 1950s. That said, it didn't get popular until the 1970s with the C programming language. That's where the function name comes from.

So what does printf do? It lets you format strings according to a special templating language. This is a much more powerful way to insert variables into them.

In PHP, the printf function takes the string template as its first argument. Following that, there can be any number of arguments. printf will insert these arguments into the string according to the given string template.

The "printf" templating language

The goal of the printf templating language is to replace variable expansion. Instead of writing variables inside a string, you put placeholders inside it. printf will then replace them with the arguments that you passed to the function.

Now, if all it did was replace variable expansion, you'd wonder why you should use it in the first place. It wouldn't feel more useful than what you've been doing with variable expansion already. And, on top of that, variable expansion tends to feel more intuitive to developers.

But that's not all that the printf templating language does. It also gives you options to control how printf formats the placeholders. This is where the power and utility of printf comes from.

"printf" placeholder syntax

Here's the full definition of a printf placeholder:

%[parameter][flags][width][.precision][length]type

In total, there are six formatting options. All the ones inside square brackets are optional. Let's go over them and see how they work.

Required type option

A printf placeholder always starts with a %. This is the identifier that tells printf to check for a placeholder. It'll try to parse it and find out if it has any formatting options.

The other required element of a placeholder is the type option. It tells printf how to format the argument before inserting it into the string. There are quite a few type options at your disposal, but we won't go through them all. Here are the important ones that are worth remembering:

  • % is the only special type option. It doesn't count as a placeholder and ignores all other formatting options. What it does is tell printf that you want it to output a percentage sign.
  • d tells printf that the argument is an integer and that you want it to format it as a signed base 10 number.
  • s tells printf that the argument is a string and to leave it as is.

These three type options will cover 99% of your formatting needs. That's because, most of the time, you want to format either a string or a number. The other options are for more specific use cases.
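
Here's a quick sketch that uses all three type options in a single template. It uses sprintf, which returns the formatted string instead of printing it (more on that function later). The values passed to it are made up.

```php
<?php
// %d formats the integer, %% prints a literal percent sign
// and %s inserts the string as is.
$line = sprintf('%d%% of %s are strings or numbers', 99, 'placeholders');

echo $line, PHP_EOL; // 99% of placeholders are strings or numbers
```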

printf('Current %s version: %s', 'PHP', PHP_VERSION);

Here's a small printf example with two arguments and just the required options. The goal of the example is to show you how printf inserts arguments into placeholders. By default, it'll insert them in the same order as the arguments for the function call.

So, in our example, printf would replace the first %s with PHP. It would then replace the second %s with the value of the PHP_VERSION constant. The result would be the same as every other example so far.

printf('Current %s version: %d', 'PHP', PHP_VERSION);

Now, what if we changed the second %s to a %d? Well, printf would try to format the value from PHP_VERSION as a signed base 10 number. This would result in printf removing everything but the major version from the string. printf would output something like "Current PHP version: 7" or "Current PHP version: 8".

Parameter option

But what happens when you want to reuse an argument or replace them in a different order? That's the purpose of the parameter option. It lets you specify which argument you want printf to replace the placeholder with.

To use the parameter option, you need to add n$ to your placeholder. The n represents the position of the argument that you want to use. printf bases it on the call that you made to it. Since its first parameter is always the string template, n begins at 1 for the argument right after it. The $ is only there to tell printf that this number is the parameter option.

printf('Current %2$s version: %1$s', PHP_VERSION, 'PHP');

This is a modified version of our previous example. We made two changes to it. First, we added the parameter option to our placeholders. The two %s placeholders became %2$s and %1$s.

Using the parameter option, we changed the order that printf would replace our arguments. To counter this change, we also inverted the order of the arguments inside the function call. The result is that printf will still output the same text as before.

This brings us to an important point that you should be aware of. PHP will let you mix placeholders with and without the parameter option in the same string template, but the result gets hard to follow. So if you use it for one placeholder, it's best to use it for every placeholder in your string template.
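
The parameter option is also how you reuse an argument. Below is a small sketch that uses the first argument twice. It also shows a gotcha: inside double quotes, PHP would try to expand $s as a variable, so you have to escape the dollar sign. The strings themselves are made up.

```php
<?php
// %1$s appears twice, so 'PHP' is inserted twice.
$reused = sprintf('%1$s is %1$s, not %2$s', 'PHP', 'Perl');

// Same template in double quotes: each $ needs a backslash,
// otherwise PHP would look for a variable named $s.
$escaped = sprintf("%1\$s is %1\$s, not %2\$s", 'PHP', 'Perl');

echo $reused, PHP_EOL;  // PHP is PHP, not Perl
echo $escaped, PHP_EOL; // PHP is PHP, not Perl
```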

Flag option

Next up is the flag option. This isn't an option that you'll use as often. It lets you change some of the default printf behaviour. We'll go over the more useful options.

Let's start with the - flag. It tells printf that you want it to align the placeholder text to the left. This only applies when you use the width option. When you use the width option, printf will align the placeholder to the right by default.

There's also the + flag. You can use this flag when formatting numbers with printf. When you use that flag, printf will prepend positive numbers with a plus sign. By default, printf only adds a minus sign to negative numbers.

The last one is the 0 flag. You can use this one when you want printf to prepend 0s to a number. This flag only has an effect when you're also using the width option.

Width option

And this brings us to the width option! This option is there to enforce a minimum length for the placeholder. When an argument is shorter than the defined width option, printf will pad it with spaces. That said, printf won't truncate arguments that are longer than the width option.

Like we saw before, the width option ties into a few of the flag options. If you use the 0 flag, printf will pad numbers with 0s instead of spaces. The - flag lets you align the argument to the left with the padding added to the right.
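
Here's a short sketch showing the width option together with the flags that we've seen. The square brackets are only there so that you can see the padding.

```php
<?php
$right  = sprintf('[%5d]', 42);    // [   42]  padded with spaces, aligned right
$left   = sprintf('[%-5d]', 42);   // [42   ]  the - flag aligns to the left
$zeros  = sprintf('[%05d]', 42);   // [00042]  the 0 flag pads with 0s
$signed = sprintf('[%+d]', 42);    // [+42]    the + flag shows the plus sign
$long   = sprintf('[%2d]', 12345); // [12345]  longer values aren't truncated

echo implode(PHP_EOL, array($right, $left, $zeros, $signed, $long)), PHP_EOL;
```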

So far, we've only seen that you could pad your arguments with either spaces or 0s. But printf doesn't limit you to just those two. You can also specify the character used to pad your placeholder. To do that, you need to add ' followed by the character that you want to use.

An example using a credit card

$creditcard = '4012888888881881';

printf("Your credit card number: %'*16s", substr($creditcard, -4));

This small example above demonstrates how to use a custom character with your placeholder. We have a credit card number stored in the creditcard variable. printf will print out the last 4 digits of the credit card with the other characters replaced with *.

First, let's take a look at the %'*16s placeholder that the example is using. '* tells it that we want to use * instead of the usual space for padding. The 16s says that we're formatting a string and that its minimum size is 16 characters.

Now, let's jump to the printf function call. You'll notice that we're not passing it the creditcard variable right away. Instead, we use the substr function to extract the last four characters from it. That's what the -4 argument tells it to do. It tells substr that we want a 4 character long substring starting from the end.

So the result of this whole maneuver is that only 1881 makes it into the printf function call. It will then use that second argument to output Your credit card number: ************1881. And that's it for our credit card example!

Precision option

Our previous example highlights a shortcoming of the width option. We can't use it to enforce a maximum string length. Lucky for us, that's what the precision option does! It controls how big the placeholder can get.

That said, it doesn't always behave the same way. For strings, it does what we described earlier. It controls the maximum size of the placeholder. (For integers formatted with d, PHP just ignores it.)

But for floating point values (the f type option), it controls how precise a number is. This means how many digits appear after the decimal point. This is where the name of the option comes from.
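
Here's a minimal sketch of both behaviours. It uses the f type option, which formats a floating point number. The values are arbitrary.

```php
<?php
// With a string, precision is a maximum length.
$short = sprintf('%.3s', 'WordPress');  // Wor

// With a float, precision is the number of digits after the decimal point.
$rounded = sprintf('%.2f', 3.14159);    // 3.14

// Width and precision work together: 8 characters wide, 2 decimals.
$padded = sprintf('[%8.2f]', 3.14159);  // [    3.14]

echo $short, PHP_EOL, $rounded, PHP_EOL, $padded, PHP_EOL;
```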

Using precision with our credit card example

Let's go back to our credit card example. We're going to remove the use of substr to create our substring. Instead, we're going to use the precision option.

$creditcard = '4012888888881881';

printf("Your credit card number: %'*16.4s", $creditcard);

Our placeholder is now %'*16.4s. We added the .4 to the placeholder to tell printf to limit our string to a maximum of 4 characters. What's the output of printf now? It's Your credit card number: ************4012.

Oops, that's not the same output as earlier! What happened? The problem is that the precision option always keeps the beginning of the string and cuts off the rest. There's no way to do something like substr($creditcard, -4) with it.

So this doesn't look too good for us right now. That said, we can improve it so that the formatted string mirrors our credit card number. We just need to change our placeholder to %'*-16.4s.

We added the - flag to our placeholder. This tells printf to align the placeholder text to the left. This results in printf outputting Your credit card number: 4012************.

Now, the output matches our credit card number. That said, it's not as useful as what we were doing earlier with substr. This is a limitation that you have to keep in mind if you want to use the precision option.
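
If you do want the last digits, one option is to go back to substr and let the width option do the masking. Here's a rough sketch of that idea. mask_number is a hypothetical helper written for this article, not a PHP function. It builds the width from the length of the original string.

```php
<?php
// Hypothetical helper: keeps the last $visible characters and pads
// the rest of the original length with asterisks.
function mask_number($number, $visible = 4) {
    // For a 16 character number, this builds the template %'*16s.
    $template = "%'*" . strlen($number) . 's';

    return sprintf($template, substr($number, -$visible));
}

echo mask_number('4012888888881881'), PHP_EOL; // ************1881
echo mask_number('12345678', 2), PHP_EOL;      // ******78
```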

Length option

Last up is the length option. This one has a bit of a confusing name. You'd think it would be the option to control the length of the placeholder.

Except it's not. Like we saw already, that's the role of the width option! So what does the length option do!?

It lets you control the length of the argument before printf processes it. This is a subtle, but important difference. Most often, it comes into play when you start dealing with different sizes of variables (e.g. long or short integers).

printf can't process these different variable sizes the same way. You need to tell it when you're passing it arguments with non-standard sizes. That's what the length option does.

But this isn't something you'll deal with a lot in PHP. PHP variables always come in one size. There's no such thing as different integer or float sizes. Because of that, the length option isn't that useful for us in the PHP world.

PHP functions

So far, we've only used the printf function in our examples. That said, PHP has a few other functions that we can use. Let's take a look at them.

Standard printf functions

Let's start off with what we'll call the "standard printf functions". There are three of them: printf, fprintf and sprintf. The difference between each function is how they output their result.

printf outputs the result right away much like the print language construct. Meanwhile, fprintf will output its result into the resource that you pass to it. This is often a file pointer.

sprintf is the function that we tend to see and use the most. It returns its result as a string. It's the one that we can use to format strings instead of string literals.
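
Here's a small sketch that runs the same template through all three functions. To keep it self-contained, it uses an in-memory stream (php://memory) in place of a real file. That's a choice made for this example only.

```php
<?php
// sprintf returns the formatted string.
$line = sprintf('Current %s version: %s', 'PHP', PHP_VERSION);

// fprintf writes to a stream and returns the length of what it wrote.
$stream = fopen('php://memory', 'w+');
$bytes  = fprintf($stream, 'Current %s version: %s', 'PHP', PHP_VERSION);

// Read back what fprintf wrote to the stream.
rewind($stream);
$written = stream_get_contents($stream);
fclose($stream);

// printf outputs the formatted string right away.
printf("Current %s version: %s\n", 'PHP', PHP_VERSION);
```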

Array-based printf functions

We'll call the other set of functions "array-based printf functions". Like our "standard printf functions", there are also three of them. You have vprintf, vfprintf and vsprintf.

All three functions have almost the same name as their standard counterpart. The only difference is that they're all prefixed with a "v". That's because they all have the same change in behaviour.

"Standard printf functions" expect you to pass them arguments one at a time in the function call. In contrast, our "array-based printf functions" only take an array as an extra argument. This array should contain all the values that you want the function to use.

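As a quick sketch, here's vsprintf next to sprintf. Both calls use the same template, and the arguments array is just for illustration.

```php
<?php
// The values live in an array, for example because they came from elsewhere.
$arguments = array('PHP', PHP_VERSION);

// vsprintf takes the whole array as its second argument.
$from_array = vsprintf('Current %s version: %s', $arguments);

// sprintf takes the same values one at a time.
$one_by_one = sprintf('Current %s version: %s', 'PHP', PHP_VERSION);

var_dump($from_array === $one_by_one); // bool(true)
```
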
Scanning functions

Let's imagine a situation. You have a string or a file resource that someone formatted a specific way. You're looking to extract values from it.

This could be a job for a regular expression. But it's also a job that our last group of functions can do! They can parse a string or file resource using a printf string template. And, once done, they return the values from the template placeholders.

There are two functions that can do this. fscanf is the function that parses a file resource. Meanwhile, sscanf is the one that parses a string.

Possible return values

The way these functions return the parsed values is a bit unusual. You have two options. If you use one of the functions with only the required parameters, it'll return an array. That array will contain all the parsed values.

$return = sscanf('Your credit card number: ************4012', 'Your credit card number: %16s');

Here's how this works using our previous credit card example. We passed the string result that we got earlier to sscanf as the first argument. The second argument is the string template.

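You'll notice that we removed the '*. That's because sscanf can't process it. The custom padding character is a printf feature, and the template language that sscanf understands doesn't include it. This leaves us with %16s. This means that the return array will contain ************4012 at the 0 index.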

The other option is to pass optional arguments to the function. The function will assign the parsed values to those optional arguments. You can then use these optional arguments afterwards because the function takes them by reference.

sscanf('Your credit card number: ************4012', 'Your credit card number: %16s', $creditcard);

This is the previous example repurposed to use this alternate return method. Instead of $return =, we pass it the creditcard variable. sscanf will assign ************4012 to it as a string.
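
Both return styles work the same way with more than one placeholder. Here's a minimal sketch that parses a version number. The '7.4.33' string is a made-up input.

```php
<?php
// Return style: you get an array with one entry per placeholder.
$parts = sscanf('7.4.33', '%d.%d.%d');

// Reference style: the values land in the variables that you pass,
// and sscanf returns how many values it assigned.
$count = sscanf('7.4.33', '%d.%d.%d', $major, $minor, $patch);

var_dump($parts);                 // three integers: 7, 4 and 33
var_dump($count);                 // int(3)
var_dump($major, $minor, $patch); // int(7), int(4), int(33)
```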

Some thoughts on scanning functions

In practice, you won't see either of these functions used much. The reality is that these functions are a poor substitute for regular expressions. If you need to parse a string or a file, it's usually better to use a regular expression instead.

That said, for a lot of us, regular expressions are intimidating. So while these functions aren't as powerful, they are easier to use. This makes them a valid alternative for anyone who wouldn't use a regular expression. After all, if you know how to write a template string, you can use them to parse a string.
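
To give you a feel for the trade-off, here's a rough sketch that extracts the same value both ways. The regular expression is only one possible way to write it.

```php
<?php
$input = 'Your credit card number: ************4012';

// Template-based parsing with sscanf.
list($from_sscanf) = sscanf($input, 'Your credit card number: %16s');

// The same extraction with a regular expression:
// capture up to 16 non-whitespace characters after the label.
preg_match('/^Your credit card number: (\S{1,16})/', $input, $matches);
$from_regex = $matches[1];

var_dump($from_sscanf === $from_regex); // bool(true)
```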

Wrapping this up

So this is most likely more than you ever wanted to read about PHP strings and how to format them! It's no doubt a lot to take in. But, keep in mind, this knowledge transcends just PHP.

Because we, as developers, deal with strings all the time. And more often than not, you need them formatted a certain way. That's why it's worth investing some time to better understand how they work and how to format them.
