PREG_SPLIT_DELIM_CAPTURE can produce surprising results if the pattern has no parentheses.
<?php
$content = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" />amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';
$chars = preg_split('/<[^>]*[^\/]>/i', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($chars);
?>
Produces:
Array
(
[0] => Lorem ipsum dolor
[1] => sit <img src="test.png" />amet
[2] => consec
[3] => tet
[4] => uer
)
So the delimiters are missing from the result. If you want the delimiters returned as well, remember to wrap the pattern in parentheses.
<?php
$chars = preg_split('/(<[^>]*[^\/]>)/i', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($chars); //parentheses added
?>
Produces:
Array
(
[0] => <strong>
[1] => Lorem ipsum dolor
[2] => </strong>
[3] => sit <img src="test.png" />amet
[4] => <span class="test" style="color:red">
[5] => consec
[6] => <i>
[7] => tet
[8] => </i>
[9] => uer
[10] => </span>
[11] => .
)
preg_split
(PHP 4, PHP 5)
preg_split - Split a string by a regular expression
Description
Splits a string by a regular expression.
Parameters
- pattern
- The pattern to search for, as a string.
- subject
- The input string.
- limit
- If limit is specified, at most limit substrings are returned, with the rest of the string placed in the last substring. A limit of -1 means "no limit", which is useful when you only want to pass the flags parameter.
- flags
- flags can be any combination of the following flags (combined with the | operator):
- PREG_SPLIT_NO_EMPTY
- If this flag is set, only non-empty substrings are returned by preg_split().
- PREG_SPLIT_DELIM_CAPTURE
- If this flag is set, parenthesized expressions in the delimiter pattern are captured and returned as well.
- PREG_SPLIT_OFFSET_CAPTURE
- If this flag is set, the offset of each result is returned as well. Note that this changes the return value into an array where every element is itself an array, with the matched string at index 0 and its offset into subject at index 1.
Return Values
Returns an array containing the substrings of subject, split along the boundaries matched by pattern.
Changelog
| Version | Description |
|---|---|
| 4.3.0 | The PREG_SPLIT_OFFSET_CAPTURE flag was added. |
| 4.0.5 | The PREG_SPLIT_DELIM_CAPTURE flag was added. |
| 4.0.0 | The flags parameter was added. |
Examples
Example #1 preg_split() example: Splitting a search string
<?php
// split the phrase on commas and whitespace characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split("/[\s,]+/", "langage hypertexte, programmation");
?>
Example #2 Splitting a string into its characters
<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>
Example #3 Splitting a string and capturing the offsets
<?php
$str = 'langage hypertexte, programmation';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>
The above example will output:
Array
(
[0] => Array
(
[0] => langage
[1] => 0
)
[1] => Array
(
[0] => hypertexte,
[1] => 8
)
[2] => Array
(
[0] => programmation
[1] => 20
)
)
Notes
If you don't need the power of regular expressions, you can choose faster (albeit simpler) alternatives such as explode() or str_split().
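As a rough illustration of this note, the sketch below uses explode() and str_split() where a fixed delimiter or a per-character split is all that is needed. The sample strings are made up for the example.

```php
<?php
// Splitting on a fixed delimiter: explode() needs no regular expression.
$words = explode(' ', 'hypertext language programming');
print_r($words);

// Splitting into single characters (bytes), as Example #2 does with preg_split().
$chars = str_split('string');
print_r($chars);
```

Note that explode() splits on an exact string only, so runs of mixed whitespace and commas (as in Example #1) still call for preg_split().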
See Also
- spliti() - Split a string into an array by a case-insensitive regular expression
- split() - Split a string into an array by a regular expression
- implode() - Join array elements into a string
- preg_match() - Perform a regular expression match
- preg_match_all() - Perform a global regular expression match
- preg_replace() - Perform a regular expression search and replace
24-Oct-2009 10:26
06-Oct-2009 08:23
To split a camel-cased string using preg_split() with lookaheads and lookbehinds:
<?php
function splitCamelCase($str) {
return preg_split('/(?<=\\w)(?=[A-Z])/', $str);
}
?>
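A short usage sketch for the pattern above, with made-up input strings. It also shows a limitation worth knowing: because the split happens before every uppercase letter that follows a word character, runs of capitals are broken into single letters.

```php
<?php
// Same pattern as in the note above, written here in single quotes.
$pattern = '/(?<=\w)(?=[A-Z])/';

// Plain camel case splits into its words.
$parts = preg_split($pattern, 'splitCamelCase');
print_r($parts);   // split, Camel, Case

// An acronym is split letter by letter.
$acronym = preg_split($pattern, 'parseHTMLString');
print_r($acronym); // parse, H, T, M, L, String
```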
24-Sep-2009 09:34
If you want to use something like explode(PHP_EOL, $string) but for all combinations of \r and \n, try this one:
<?php
$text = "A\nB\rC\r\nD\r\rE\n\nF";
$texts = preg_split("/((\r(?!\n))|((?<!\r)\n)|(\r\n))/", $text);
?>
result:
array("A", "B", "C", "D", "", "E", "", "F");
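For comparison, here is a shorter pattern that gives the same result on this input. It is a sketch of an alternative, not the original poster's method: it relies only on alternation order, so that "\r\n" is tried first and counts as a single line break.

```php
<?php
$text = "A\nB\rC\r\nD\r\rE\n\nF";

// Try \r\n before the single characters so the pair is one delimiter.
$texts = preg_split('/\r\n|\r|\n/', $text);
print_r($texts);
```

As in the original, consecutive line breaks such as "\r\r" and "\n\n" produce empty strings in the result.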
01-Aug-2009 07:57
Extending m.timmermans's solution, you can use the following code as a search expression parser:
<?php
$search_expression = "apple bear \"Tom Cruise\" or 'Mickey Mouse' another word";
$words = preg_split("/[\s,]*\\\"([^\\\"]+)\\\"[\s,]*|" . "[\s,]*'([^']+)'[\s,]*|" . "[\s,]+/", $search_expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($words);
?>
The result will be:
Array
(
[0] => apple
[1] => bear
[2] => Tom Cruise
[3] => or
[4] => Mickey Mouse
[5] => another
[6] => word
)
1. Accepted delimiters: white spaces (space, tab, new line etc.) and commas.
2. You can use either single (') or double (") quotes for expressions that contain more than one word.
28-May-2009 04:36
Spacing out your CamelCase using preg_replace:
<?php
function spacify($camel, $glue = ' ') {
return preg_replace( '/([a-z0-9])([A-Z])/', "$1$glue$2", $camel );
}
echo spacify('CamelCaseWords'), "\n"; // 'Camel Case Words'
echo spacify('camelCaseWords'), "\n"; // 'camel Case Words'
?>
27-May-2009 10:11
Here's a helpful function to space out your CamelCase using preg_split:
<?php
function spacify($camel, $glue = ' ') {
return $camel[0] . substr(implode($glue, array_map('implode', array_chunk(preg_split('/([A-Z])/',
ucfirst($camel), -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE), 2))), 1);
}
echo spacify('CamelCaseWords'); // 'Camel Case Words'
echo spacify('camelCaseWords'); // 'camel Case Words'
?>
23-May-2009 02:56
If you need to turn a function's parameter list into a call argument list, without default values and reference markers, you can try this code:
<?php
$func_args = '$node, $op, $a3 = NULL, $form = array(), $a4 = NULL';
// collect every variable name ($...) up to the next comma, equals sign or space
preg_match_all('@(?<func_arg>\$[^,= ]+)@i', $func_args, $matches);
$call_arg = implode(',', $matches['func_arg']);
?>
Result: string = "$node,$op,$a3,$form,$a4"
27-Mar-2009 07:02
How to display a shortened text string with an ellipsis, cutting on word boundaries only:
<?php
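function truncate($string, $max = 70, $rep = '...') {
    // split the text into words on any run of whitespace
    $words = preg_split("/[\s]+/", $string);
    $newstring = '';
    $numwords = 0;
    foreach ($words as $word) {
        if ((strlen($newstring) + 1 + strlen($word)) < $max) {
            // join words with one space, without a leading space before the first word
            $newstring .= ($newstring === '' ? '' : ' ') . $word;
            ++$numwords;
        } else {
            break;
        }
    }
    // append the ellipsis only if some words were dropped
    if ($numwords < count($words)) {
        $newstring .= $rep;
    }
    return $newstring;
}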
?>
hope this helps someone! thanks for all the help from everyone else!!
17-Mar-2009 09:06
If the task is too complicated for preg_split, preg_match_all might come in handy, since preg_split is essentially a special case.
I wanted to split a string on a certain character (asterisk), but only if it wasn't escaped (by a preceding backslash). Thus, I should ensure an even number of backslashes before any asterisk meant as a splitter. Look-behind in a regular expression wouldn't work since the length of the preceding backslash sequence can't be fixed. So I turned to preg_match_all:
<?php
// split a string at unescaped asterisks
// where backslash is the escape character
$splitter = "/\\*((?:[^\\\\*]|\\\\.)*)/";
preg_match_all($splitter, "*$string", $aPieces, PREG_PATTERN_ORDER);
$aPieces = $aPieces[1];
// $aPieces now contains the exploded string
// and unescaping can be safely done on each piece
foreach ($aPieces as $idx=>$piece)
$aPieces[$idx] = preg_replace("/\\\\(.)/s", "$1", $piece);
?>
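The snippet above leaves $string undefined, so here is a self-contained sketch of the same approach. The function name splitOnUnescapedAsterisk and the sample input are made up for the illustration; the two patterns are the ones from the note, written in single quotes.

```php
<?php
// Hypothetical wrapper around the technique described above.
function splitOnUnescapedAsterisk($string) {
    // An asterisk, followed by any run of "non-backslash, non-asterisk"
    // characters or backslash-escaped characters.
    $splitter = '/\*((?:[^\\\\*]|\\\\.)*)/';
    // Prepend an asterisk so the first piece is matched like all the others.
    preg_match_all($splitter, '*' . $string, $matches, PREG_PATTERN_ORDER);

    $pieces = array();
    foreach ($matches[1] as $piece) {
        // Unescape: drop the backslash in front of each escaped character.
        $pieces[] = preg_replace('/\\\\(.)/s', '$1', $piece);
    }
    return $pieces;
}

// The first asterisk is escaped, the second one is a real separator.
$pieces = splitOnUnescapedAsterisk('one\*still one*two');
print_r($pieces);
```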
17-Jul-2008 08:17
<?php
$s = '<p>bleh blah</p><p style="one">one two three</p>';
$htmlbits = preg_split('/(<p( style="[-:a-z0-9 ]+")?>|<\/p>)/i', $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($htmlbits);
?>
Array
(
[0] =>
[1] => <p>
[2] => bleh blah
[3] => </p>
[4] =>
[5] => <p style="one">
[6] => style="one"
[7] => one two three
[8] => </p>
[9] =>
)
Two interesting points:
1. When using PREG_SPLIT_DELIM_CAPTURE, if you use more than one pair of parentheses, the result array can have members representing all pairs. See array indexes 5 and 6 to see two adjacent delimiter results in which the second is a subset match of the first.
2. If a parenthesised sub-expression is made optional by a following question mark (ex: '/abc (optional subregex)?/') some split delimiters may be captured in the result while others are not. See array indexes 1 and 2 to see an instance where the overall match succeeded and returned a delimiter while the optional sub-expression '( style="[-:a-z0-9 ]+")?' did not match, and did not return a delimiter. This means it's possible to have a result with an unpredictable number of delimiters in the result array.
This second aspect is true irrespective of the number of pairs of parentheses in the regex. This means: in a regular expression with a single optional parenthesised sub-expression, the overall expression can match without generating a corresponding delimiter in the result.
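If the extra, unpredictable entries are unwanted, one option (a sketch, not part of the original note) is to make the inner group non-capturing with (?: ... ). Only the outer parentheses then contribute delimiters, so every match adds exactly one entry.

```php
<?php
$s = '<p>bleh blah</p><p style="one">one two three</p>';

// Same pattern as above, but the optional style group no longer captures.
$htmlbits = preg_split('/(<p(?: style="[-:a-z0-9 ]+")?>|<\/p>)/i', $s, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($htmlbits);
```

The result alternates strictly between text pieces (possibly empty) and whole tags, which makes it easier to process by index.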
29-May-2008 10:56
For people who want to use the double quote to group words/fields, kind of like CSV does, you can use the following expression:
<?php
$keywords = preg_split( "/[\s,]*\\\"([^\\\"]+)\\\"[\s,]*|[\s,]+/", "textline with, commas and \"quoted text\" inserted", 0, PREG_SPLIT_DELIM_CAPTURE );
?>
Which will result in:
Array
(
[0] => textline
[1] => with
[2] => commas
[3] => and
[4] => quoted text
[5] => inserted
)
04-Sep-2007 08:29
I was having trouble getting the PREG_SPLIT_DELIM_CAPTURE flag to work because I missed reading the "parenthesized expression" in the documentation :-(
So the pattern should look like:
/(A)/
not just
/A/
and it works as described/expected.
23-Mar-2005 04:41
preg_split() behaves differently from perl's split() if the string ends with a delimiter. This perl snippet will print 5:
my @a = split(/ /, "a b c d e ");
print scalar @a;
The corresponding php code prints 6:
<?php print count(preg_split("/ /", "a b c d e ")); ?>
This is not necessarily a bug (nowhere does the documentation say that preg_split() behaves the same as perl's split()) but it might surprise perl programmers.
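A minimal sketch of one way to get the Perl count for this input, using the PREG_SPLIT_NO_EMPTY flag. This is not an exact equivalent: Perl drops only trailing empty fields, while PREG_SPLIT_NO_EMPTY drops every empty piece, including those at the start or in the middle.

```php
<?php
// Default behaviour: the trailing delimiter produces a final empty string.
$all = preg_split('/ /', 'a b c d e ');
echo count($all), "\n";      // 6

// With PREG_SPLIT_NO_EMPTY the empty piece is dropped.
$nonEmpty = preg_split('/ /', 'a b c d e ', -1, PREG_SPLIT_NO_EMPTY);
echo count($nonEmpty), "\n"; // 5
```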
25-Sep-2004 03:01
To clarify how the "limit" parameter interacts with the PREG_SPLIT_DELIM_CAPTURE option:
<?php
preg_split('/( )/', '1 2 3 4 5 6 7 8', 4, PREG_SPLIT_DELIM_CAPTURE);
?>
returns:
('1', ' ', '2', ' ', '3', ' ', '4 5 6 7 8')
So you actually get 7 array items, not 4: the limit counts only the split pieces, and the captured delimiters are returned on top of them.
29-May-2002 07:01
The above description for PREG_SPLIT_OFFSET_CAPTURE may be a bit confusing.
When the flag is or'd into the 'flags' parameter of preg_split, each match is returned in the form of a two-element array. For each of the two-element arrays, the first element is the matched string, while the second is the match's zero-based offset in the input string.
For example, if you called preg_split like this:
preg_split('/foo/', 'matchfoomatch', -1, PREG_SPLIT_OFFSET_CAPTURE);
it would return an array of the form:
Array(
[0] => Array([0] => "match", [1] => 0),
[1] => Array([0] => "match", [1] => 8)
)
Note that or'ing in PREG_SPLIT_DELIM_CAPTURE along with PREG_SPLIT_OFFSET_CAPTURE works as well.
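A small sketch of the two flags combined, reusing the 'matchfoomatch' input from above. The only change to the pattern is the pair of parentheses needed for delimiter capture.

```php
<?php
$parts = preg_split(
    '/(foo)/',
    'matchfoomatch',
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE
);
// Each element is array(string, offset); the delimiter 'foo' is included
// with its own offset.
print_r($parts);
```

The delimiter entry carries offset 5, between the two pieces at offsets 0 and 8.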
