Safe HTML Truncation
May 26th, 2008
Many times we programmers have to take some dynamic text and make an excerpt out of it. We do this for a number of reasons, the most common being to save space on the page. We just want to lure the reader in with 300 or so characters, then have them click through to see the rest of the story. Sure, it’s easy to just take the content and run PHP’s substr() on it. But what if that content is HTML? What would happen if you cut off the text at '<a href=index.php>Hello World'? The link would never be closed, so the rest of the page would become a link to index.php. And while we’re on the subject, how many characters is that string? PHP’s strlen() would say it’s 29 characters, when really only 11 of them are displayed. It occurred to me that I needed a safe HTML truncation function that I could use over and over again. As a matter of fact, you saw this function in action when you clicked into this post.
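To make the problem concrete, here is a minimal sketch using only the example string from the paragraph above. The variable names are just for illustration; the calls are PHP’s built-in strlen(), strip_tags() and substr().

```php
<?php
// The example string from above: an opening link tag that is never closed.
$html = '<a href=index.php>Hello World';

// strlen() counts every byte, markup included.
$rawLength = strlen($html);

// strip_tags() removes the markup, leaving only what the reader sees.
$visibleLength = strlen(strip_tags($html));

// A naive cut at 20 characters keeps the whole opening tag
// but only two visible characters, and never closes the link.
$naive = substr($html, 0, 20);

echo $rawLength . "\n";      // 29
echo $visibleLength . "\n";  // 11
echo $naive . "\n";          // <a href=index.php>He
```

So a plain substr() both miscounts the visible length and can leave a tag open, which is exactly what a safe truncation function has to deal with.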
First I will share with you the WordPress plugin I made for it. The plugin is simple: it just carries the function and makes it available in your templates. So even if you are not using WordPress, you can download it and extract the function.
Basically, the function takes three arguments: the content, the number of visible characters to cut off at, and the string to append (usually “…”). For all those who want to see the code, here it is:
function mk_html_substr($string, $length, $addstring = "")
{
    $addstring = " " . $addstring;
    if (strlen($string) > $length) {
        if (!empty($string) && $length > 0) {
            $isText = true;              // false while we are inside a tag
            $ret = "";
            $i = 0;                      // number of visible characters seen so far
            $lastSpacePosition = -1;
            $tagsArray = array();        // stack of currently open tags
            $tagsAtLastSpace = array();  // copy of that stack at the last space
            $currentTag = "";
            $stringLength = strlen($string);
            $noTagLength = strlen(strip_tags($string));

            // Parser loop
            for ($j = 0; $j < $stringLength; $j++) {
                $currentChar = substr($string, $j, 1);
                $ret .= $currentChar;
                if ($currentChar == "<") {
                    $isText = false;
                }
                if ($isText) {
                    // Memorize last space position and the tags open at that point
                    if ($currentChar == " ") {
                        $lastSpacePosition = $j;
                        $tagsAtLastSpace = $tagsArray;
                    }
                    $i++;
                } else {
                    $currentTag .= $currentChar;
                }
                // Greater than event
                if ($currentChar == ">") {
                    $isText = true;
                    // Opening tag handler (not self-closing, not a closing tag)
                    if ((strpos($currentTag, "<") !== false) && (strpos($currentTag, "/>") === false) && (strpos($currentTag, "</") === false)) {
                        if (strpos($currentTag, " ") !== false) {
                            // Tag has attribute(s): keep only the tag name
                            $currentTag = substr($currentTag, 1, strpos($currentTag, " ") - 1);
                        } else {
                            // Tag doesn't have attribute(s)
                            $currentTag = substr($currentTag, 1, -1);
                        }
                        array_push($tagsArray, $currentTag);
                    } else if (strpos($currentTag, "</") !== false) {
                        array_pop($tagsArray);
                    }
                    $currentTag = "";
                }
                if ($i >= $length) {
                    break;
                }
            }

            // Cut HTML string at last space position
            $wasCut = ($length < $noTagLength);
            if ($wasCut && $lastSpacePosition != -1) {
                $ret = substr($string, 0, $lastSpacePosition);
                // Only close the tags that were open at the cut point
                $tagsArray = $tagsAtLastSpace;
            }
            // If no space was seen, $ret already holds everything up to the cut

            // Close broken XHTML elements
            while (sizeof($tagsArray) != 0) {
                $aTag = array_pop($tagsArray);
                $ret .= "</" . $aTag . ">\n";
            }
        } else {
            $ret = "";
            $wasCut = true;
        }
        // Only add the string if visible text was actually cut
        if ($wasCut) {
            return $ret . $addstring;
        } else {
            return $ret;
        }
    } else {
        return $string;
    }
}
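One way to sanity-check the output of a truncation function like the one above is to confirm that every tag it opened is also closed. The helper below is a hypothetical sketch, not part of the plugin: the name tags_are_balanced() and its regular expression are my own assumptions, and it only understands simple, well-formed tags (no comments, no attribute values containing “>”).

```php
<?php
// Hypothetical helper (not part of the plugin): returns true when every
// opening tag in $html is closed again in the right order.
function tags_are_balanced($html)
{
    $stack = array();

    // Capture: optional leading slash, tag name, optional trailing slash.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>#', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[3] === '/') {
            // Self-closing tag such as <br />: nothing to track
            continue;
        }
        if ($m[1] === '/') {
            // Closing tag: it must match the most recently opened tag
            if (array_pop($stack) !== $name) {
                return false;
            }
        } else {
            // Opening tag: remember it until its closing tag shows up
            $stack[] = $name;
        }
    }

    // Anything left on the stack was never closed
    return count($stack) === 0;
}
```

For well-nested input, the result of a call such as mk_html_substr($content, 300, "...") should pass this check, while a plain substr() of the same content often will not.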
