The Ellipsize Function

I have a bit of code I would like to share. While looking on CPAN for something similar, I of course found something much more robust in HTML::Truncate. The use case is one you see all the time: a preview of an article's text, followed by an ellipsis and a link to the entire article.

 

The problem arises when people use HTML to format the article. Truncating the text requires that you first remove the HTML, because you never know which tags you will unbalance by chopping it up. For example, say you have a paragraph of 1000 characters and you only want to show 500: the cut removes the closing P tag, which leaves the HTML unbalanced. This is a simplified example, but you can imagine the effect of unbalanced HTML on your web page layout.
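To make the problem concrete, here is a minimal sketch in core Perl. The sample markup and the rough tag-counting regexes are my own illustration, not part of ellipsize(), and the counting is only good enough for this tiny example.

```perl
use strict;
use warnings;

# Illustrative input: a paragraph with a nested bold tag.
my $html = '<p>The quick <b>brown fox</b> jumps over the lazy dog.</p>';

# Naive truncation: cut the raw HTML at 20 characters.
my $cut = substr($html, 0, 20);

# Rough count of opening and closing tags left in the fragment.
my $open  = () = $cut =~ /<[a-z][^>]*>/gi;
my $close = () = $cut =~ m{</[a-z][^>]*>}gi;

print "$cut\n";
print "opening tags: $open, closing tags: $close\n";
```

The fragment keeps both opening tags but neither closing tag, so anything that follows it on the page would end up inside the unclosed P and B elements.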

 

So the easiest thing to do for your list pages and your RSS feeds is to strip out the HTML first, then shorten the text to a specific length. HTML::Truncate does a nice job of this and even deals with UTF-8 characters. My function does not do nearly as much, but I felt like putting it up here for criticism before I delete it permanently in favor of HTML::Truncate. I might also replace it with a procedural wrapper around HTML::Truncate, to make things easier for folks who don't want to write several lines of OO Perl and would prefer to just call ellipsize() like I have been doing in the past. Let me know what you think.
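A procedural wrapper of that kind might look roughly like the sketch below. The name truncate_html and its default ellipsis are hypothetical. The sketch assumes HTML::Truncate is installed from CPAN and uses only the new, chars, ellipsis and truncate methods shown in that module's synopsis. As I understand its documentation, HTML::Truncate keeps the markup and closes any open tags rather than stripping the HTML, so its output differs from the plain text that ellipsize() produces.

```perl
use strict;
use warnings;

# Hypothetical procedural wrapper around HTML::Truncate.
# $html     - the HTML to shorten
# $chars    - number of text characters to keep
# $ellipsis - optional marker appended after the cut (defaults to "...")
sub truncate_html {
    my ($html, $chars, $ellipsis) = @_;

    # Load the module only when the wrapper is actually called.
    require HTML::Truncate;

    my $ht = HTML::Truncate->new();
    $ht->chars($chars);
    $ht->ellipsis(defined $ellipsis ? $ellipsis : '...');

    return $ht->truncate($html);
}
```

A list page could then call truncate_html($article_html, 500) in one line instead of constructing and configuring the object each time.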

 

 

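use strict;
use warnings;
use HTML::Strip;

#
# ellipsize ($text, $truncate_length, $max_length)
#
# Strip HTML from $text and normalize its whitespace. If the result is
# longer than $max_length, cut it off at a word boundary at or before
# $truncate_length characters and add "...".
#
# Relies on strip_whitespace_scalar(), which is defined elsewhere.
#
sub ellipsize
{
    my ($text, $truncate_length, $max_length) = @_;

    # Remove the markup but leave entities such as &amp; undecoded.
    my $hs = HTML::Strip->new();
    $hs->set_decode_entities(0);
    my $clean_text = $hs->parse($text);
    $hs->eof;

    $clean_text = strip_whitespace_scalar($clean_text);

    if (length($clean_text) > $max_length) {
        $clean_text = substr($clean_text, 0, $truncate_length);
        # Drop the trailing partial word, including any punctuation in it.
        $clean_text =~ s/\s+\S*$//;
        $clean_text .= "...";
    }

    # Always return the stripped text, so short input comes back HTML-free too.
    return $clean_text;
}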
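To show the truncation step on its own, here is a small sketch that needs nothing beyond core Perl. The strip_whitespace_scalar below is only a guessed stand-in for the real helper, which is not shown on this page, and truncate_plain is a hypothetical name for the word-boundary logic applied to text that already has its HTML removed.

```perl
use strict;
use warnings;

# Guessed stand-in for the real helper: collapse runs of whitespace
# into single spaces and trim both ends.
sub strip_whitespace_scalar {
    my ($s) = @_;
    $s =~ s/\s+/ /g;
    $s =~ s/^ //;
    $s =~ s/ $//;
    return $s;
}

# Hypothetical helper: the word-boundary truncation on plain text.
sub truncate_plain {
    my ($clean, $truncate_length, $max_length) = @_;
    return $clean if length($clean) <= $max_length;

    my $short = substr($clean, 0, $truncate_length);
    $short =~ s/\s+\S*$//;    # drop the trailing partial word
    return $short . "...";
}

my $text = strip_whitespace_scalar("  The quick   brown fox\njumps over the lazy dog  ");
print truncate_plain($text, 22, 30), "\n";    # prints "The quick brown fox..."
```

The cut at 22 characters lands in the middle of "jumps", so the partial word and the space before it are dropped before the ellipsis is added.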



Comments

Posted 05/27/2011 by webmaster
Is this worth submitting to CPAN, or would it be better to just write up how to use HTML::Truncate, or better yet, to write a simplified wrapper for HTML::Truncate?
Posted 05/27/2011 by webmaster
Obviously the code for the strip_whitespace_scalar function is missing here. I will be posting that in a separate document.

