How to Create a Unicode-Aware Slugify Function in PHP

Build a Unicode-aware PHP slugify function that transliterates accented characters, handles special symbols, and produces clean URLs for multilingual content.


Introduction

When I build websites and content-driven platforms, one of the first things I pay attention to is the URL structure. Clean, human-readable URLs are not just a matter of aesthetics. They directly influence SEO, click-through rates, and how search engines understand page hierarchy. In most PHP projects, generating these URLs means transforming a title into a slug: a lowercase string where words are separated by dashes and special characters are removed.

On the surface, writing a slugify() function looks simple. However, I have seen many implementations that work perfectly for plain English titles but break as soon as they encounter international characters such as à, ä, ö, or ß, or Romanian diacritics like ă, â, î, ș, ț. When these characters are stripped incorrectly, the resulting slug becomes incomplete, inconsistent, and sometimes even misleading.

The SEO Impact of Poor Slug Handling

Search engines use URLs as one of many signals about a page's topic. A well-structured slug improves keyword relevance and reinforces the topic of the page. When accented characters are deleted instead of converted, important letters disappear. For example, the word "café" turning into "caf" removes semantic clarity. Over time, these small issues can accumulate across a site and weaken its overall SEO consistency.

A good slug should be readable by humans, indexable by search engines, and consistent across all languages your website supports.

From my experience, multilingual websites suffer the most when slug functions are not Unicode-aware. Titles written in French, German, Romanian, or other European languages often contain accented characters. If those characters are simply deleted instead of transliterated, you lose both readability and accuracy.

The Common Slugify Mistake

Below is a typical slugify function that I often encounter in PHP projects. It converts the string to lowercase, replaces spaces with dashes, removes unwanted characters, and trims duplicate dashes. At first glance, it seems correct.

function slugify($title) {
  $slug = strtolower($title);
  $slug = str_replace(' ', '-', $slug);
  $slug = preg_replace('/[^a-z0-9-]/', '', $slug);
  $slug = preg_replace('/-+/', '-', $slug);
  $slug = trim($slug, '-');
  return $slug;
}

The issue here is the regular expression that removes everything outside the range a-z0-9-. Characters like "é" or "ă" are not converted; they are simply removed. That means "Crème brûlée" becomes "crme-brle" instead of "creme-brulee". This is not just a cosmetic problem. It reduces keyword clarity and makes URLs look broken.

Unicode-Aware Slugify with iconv()

To solve this properly, I use PHP's iconv() function. iconv() converts strings between character encodings, and when //TRANSLIT is appended to the target encoding, it tries to replace characters that have no equivalent with close approximations. Instead of deleting accented characters, it converts them. That is exactly what we want when building SEO-friendly URLs.

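function slugify($title) {
  // //TRANSLIT output depends on the iconv library and, on glibc, the LC_CTYPE locale
  $slug = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
  if ($slug === false) {
    return ''; // conversion failed, e.g. on invalid UTF-8 input
  }
  $slug = strtolower($slug);
  $slug = str_replace(' ', '-', $slug);
  $slug = preg_replace('/[^a-z0-9-]/', '', $slug);
  $slug = preg_replace('/-+/', '-', $slug);
  $slug = trim($slug, '-');
  return $slug;
}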

With this approach, accented characters are converted to a readable ASCII form instead of being deleted, which noticeably improves slug quality for French, German, Romanian, and other Latin-script content.

Examples (the exact output depends on the iconv library and locale):

  • à la carte → a-la-carte
  • Crème brûlée → creme-brulee
  • straße → strasse
  • Înălțime → inaltime

Why Unicode Matters in URLs

Unicode support in URL slugs is about more than making links look good. When search engines crawl your site, they parse the URL structure for hints about content categories and keywords. A slug that represents the original text faithfully, with accented characters converted to their ASCII equivalents, keeps those keywords intact.

Consider a food blog with articles in multiple languages. A recipe titled "Crêpes aux Fraises" in French should produce a slug like "crepes-aux-fraises", not "crpes-aux-fraises". The semantic difference matters. Users reading the URL can instantly recognize what the page is about, which increases trust and click-through rates from search results.

Furthermore, proper Unicode handling ensures consistency when users bookmark or share your URLs. A truncated or corrupted slug looks unprofessional and may hurt your site's credibility. This is especially critical for international SEO strategies where your site must rank in multiple countries and languages.

How Transliteration Works Under the Hood

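The iconv() function with the ASCII//TRANSLIT flag converts the string character by character, mapping each character that has no ASCII equivalent to a close approximation. For most Western European accents, this means converting "é" to "e", "ö" to "o", and "ß" to "ss". The mapping tables come from the system's iconv library, not from PHP itself: on glibc they are taken from the current LC_CTYPE locale, and in the default "C" locale accented letters typically come out as "?".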
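Because glibc's //TRANSLIT consults the LC_CTYPE locale, a minimal sketch of running the conversion under a UTF-8 locale might look like this. The helper name translit_ascii() is hypothetical, and the locale names are assumptions that must exist on your server (check with `locale -a`). Note that setlocale() changes process-wide state, and the PHP manual warns that it is not thread-safe on threaded server APIs, so setting the locale once at bootstrap is often the safer choice.

```php
<?php
// Sketch: run iconv //TRANSLIT under a UTF-8 LC_CTYPE locale, then restore the old one.
// Assumption: glibc-style iconv; the locale names below are examples, not guaranteed to exist.
function translit_ascii(string $text, array $locales = ['en_US.UTF-8', 'C.UTF-8']): string|false
{
    $previous = setlocale(LC_CTYPE, '0'); // "0" queries the current locale without changing it

    // Tries each locale name in order and returns false if none exist;
    // in that case glibc typically emits '?' for accented letters.
    setlocale(LC_CTYPE, $locales);

    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous); // restore the process-wide setting
    }
    return $ascii;
}

var_dump(translit_ascii('Crème brûlée'));
```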

The TRANSLIT mode looks for a visually or phonetically similar replacement when no direct ASCII equivalent exists. The IGNORE flag tells iconv to silently drop any character it still cannot convert. Without it, iconv() emits an E_NOTICE and returns false as soon as it meets such a character. Even with it, some iconv implementations can still report a failure, so it is worth checking for a false return value.

It's worth noting that transliteration is language-dependent. Different languages have different rules: German treats "ä" as "ae", while Swedish might use just "a". The standard iconv behavior works well for most cases but may not be perfect for every language. For highly specialized requirements, you might explore the Transliterator class from PHP's intl extension or third-party packages that offer more granular control.
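As a rough illustration of the intl route, here is a sketch of a slugify variant built on Transliterator instead of iconv(). It assumes the intl extension is installed; the function name slugify_intl() is hypothetical, and 'Any-Latin; Latin-ASCII' is one common rule string rather than the only reasonable choice.

```php
<?php
// Sketch: slugify with the intl extension's Transliterator instead of iconv().
// Assumption: ext-intl is available; the rule string is one common choice.
function slugify_intl(string $title): string
{
    static $translit = null;
    // 'Any-Latin' romanizes non-Latin scripts; 'Latin-ASCII' folds accents (é → e, ß → ss).
    $translit ??= Transliterator::create('Any-Latin; Latin-ASCII');
    if ($translit === null) {
        throw new RuntimeException('Transliterator rules could not be created');
    }

    $ascii = $translit->transliterate($title);
    if ($ascii === false) {
        return '';
    }

    $slug = strtolower($ascii);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // any run of other characters becomes one dash
    return trim($slug, '-');
}
```

Called as slugify_intl('straße'), it should return strasse, because Latin-ASCII maps ß to ss. Unlike iconv() in most locales, the Any-Latin step also romanizes Cyrillic or Greek titles instead of reducing them to an empty slug.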

Performance Considerations

When I implement slugify functions in high-traffic applications, I think about performance. The iconv() function is fast, but calling it repeatedly on every page request can add up. For dynamic content, I cache slug values in the database or use a simple in-memory cache.

If you're generating slugs once during content creation or import, performance is not a concern. However, if you're generating them on-the-fly for every request, consider memoization or caching layers. Many frameworks offer caching out of the box; leverage them.
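For the on-the-fly case, a minimal per-request memoization sketch could wrap the iconv-based slugify() from this article in a static array. The wrapper name slugify_cached() is hypothetical, and this cache only lives for the current request; sharing slugs across requests still needs the database or a framework cache.

```php
<?php
// The iconv-based slugify() from this article, repeated so the block is self-contained.
function slugify(string $title): string
{
    $slug = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
    if ($slug === false) {
        return '';
    }
    $slug = strtolower($slug);
    $slug = str_replace(' ', '-', $slug);
    $slug = preg_replace('/[^a-z0-9-]/', '', $slug);
    $slug = preg_replace('/-+/', '-', $slug);
    return trim($slug, '-');
}

// Hypothetical wrapper: memoizes results for the lifetime of the current request only.
function slugify_cached(string $title): string
{
    static $cache = []; // title => slug
    return $cache[$title] ??= slugify($title);
}
```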

Another optimization is to avoid repeated regex operations. The function I showed performs two regex replacements. In most cases, this is negligible, but for batch processing of thousands of titles, you might optimize by combining operations or using a single pass where possible.
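One way to collapse the space replacement and both regex passes into a single preg_replace() is sketched below. The name slugify_single_pass() is hypothetical, and the behavior is not identical to the original: any run of characters outside a-z0-9 becomes one dash, so "don't" yields "don-t" instead of "dont", and tabs or newlines act as separators instead of being deleted. Check that difference against your existing URLs before switching.

```php
<?php
// Sketch: one regex pass replaces spaces, strips symbols, and collapses dashes together.
function slugify_single_pass(string $title): string
{
    $slug = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
    if ($slug === false) {
        return '';
    }
    // Every run of characters outside a-z0-9 (spaces, punctuation, dashes) becomes a single dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($slug));
    return trim($slug, '-');
}
```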

Common Pitfalls to Avoid

I've learned these lessons the hard way over years of building multilingual content platforms:

  • Assuming all servers have iconv enabled: Always check and provide fallbacks. Some hosting environments may have it disabled.
  • Not handling empty slugs: Titles that are entirely special characters might produce empty results. Validate and provide defaults.
  • Changing slug generation logic retroactively: If you change how you generate slugs, old URLs break. Use redirects or maintain legacy slug support.
  • Ignoring locale differences: The same character might transliterate differently in different locales. Document your assumptions.
  • Forgetting about special symbols: Emojis, currency signs, and other symbols still need handling. Test with diverse input.
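The first two pitfalls can be handled in the function itself. This sketch checks for iconv, falls back to intl's transliterator_transliterate() if that exists, and returns a default when the result is empty. The name slugify_safe() and the 'untitled' default are hypothetical; an ID-based slug may suit your routes better.

```php
<?php
// Sketch: fallbacks for a missing iconv extension, plus a default for empty slugs.
function slugify_safe(string $title, string $default = 'untitled'): string
{
    if (function_exists('iconv')) {
        $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
    } elseif (function_exists('transliterator_transliterate')) {
        $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $title);
    } else {
        $ascii = $title; // last resort: non-ASCII bytes are stripped below, like the naive version
    }
    if ($ascii === false) {
        $ascii = '';
    }

    $slug = strtolower($ascii);
    $slug = str_replace(' ', '-', $slug);
    $slug = preg_replace('/[^a-z0-9-]/', '', $slug);
    $slug = preg_replace('/-+/', '-', $slug);
    $slug = trim($slug, '-');

    // Titles made only of symbols or emoji (e.g. "!!!") would otherwise produce ''.
    return $slug !== '' ? $slug : $default;
}
```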

Testing Your Slugify Function

Good practice demands unit tests for slug generation. I always create a test suite that covers edge cases:

// Test cases
assert(slugify('Hello World') === 'hello-world');
assert(slugify('Café') === 'cafe');
assert(slugify('Ümlauts and Accents') === 'umlauts-and-accents');
assert(slugify('Multiple---Dashes') === 'multiple-dashes');
assert(slugify('---Leading and Trailing---') === 'leading-and-trailing');
assert(slugify('') === '');

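Testing ensures your implementation handles international content correctly and that future changes don't introduce regressions. I recommend running these tests in your CI/CD pipeline to catch issues early. Keep in mind that assert() is only evaluated when zend.assertions is enabled; php.ini-production sets it to -1, which skips assertions entirely, so in CI either enable it or move these cases into a PHPUnit test. The accented cases also depend on the server's iconv library and locale, so run them in an environment that matches production.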
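If enabling assertions in CI is not an option, a table-driven check that reports failures and sets a non-zero exit code works with any php.ini. This is a sketch: check_slugs() is a hypothetical helper, and only ASCII cases are listed because accented ones such as 'Café' additionally need a UTF-8 locale on glibc systems.

```php
<?php
// The iconv-based slugify() from this article, repeated so the block is self-contained.
function slugify(string $title): string
{
    $slug = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
    if ($slug === false) {
        return '';
    }
    $slug = strtolower($slug);
    $slug = str_replace(' ', '-', $slug);
    $slug = preg_replace('/[^a-z0-9-]/', '', $slug);
    $slug = preg_replace('/-+/', '-', $slug);
    return trim($slug, '-');
}

/** Returns one message per failing case; an empty array means everything passed. */
function check_slugs(array $cases): array
{
    $failures = [];
    foreach ($cases as $input => $expected) {
        $actual = slugify((string) $input); // numeric-string keys become ints, so cast back
        if ($actual !== $expected) {
            $failures[] = sprintf('%s: expected "%s", got "%s"', var_export($input, true), $expected, $actual);
        }
    }
    return $failures;
}

$failures = check_slugs([
    'Hello World' => 'hello-world',
    'Multiple---Dashes' => 'multiple-dashes',
    '---Leading and Trailing---' => 'leading-and-trailing',
    '' => '',
]);
if ($failures !== []) {
    fwrite(STDERR, implode("\n", $failures) . "\n");
    exit(1); // a non-zero exit code fails the CI job
}
```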

Integrating with Frameworks

Most modern PHP frameworks have built-in slug generation or provide libraries for it. CodeIgniter's URL helper offers url_title(), Laravel has Str::slug(), and Symfony provides the AsciiSlugger class in its String component. If you're using a framework, check its documentation first before rolling your own.

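However, understanding how slug generation works under the hood helps you debug issues and customize behavior when needed. Framework implementations follow the same basic steps shown here (transliterate, lowercase, replace separators, trim), although many rely on their own character maps or the intl extension rather than iconv(), so the fundamentals carry over directly.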

Why This Approach Improves SEO and User Experience

When I implement Unicode-aware slug generation, I notice immediate improvements in URL consistency. Search engines can better interpret keywords, and users are more likely to trust and click clean, readable links. A slug like inaltime clearly reflects the original Romanian word, whereas a truncated version would look unprofessional.


Server Requirements

Before using this function in production, I always verify that the iconv extension is enabled in the PHP environment. In most hosting setups, it is enabled by default. You can confirm this by running:

php -m | grep iconv

If the extension is missing, it can usually be enabled through your server configuration or hosting control panel.
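Since iconv being loaded does not guarantee that transliteration actually works (for example under the "C" locale on glibc), a runtime probe can be useful in a health check or deployment script. This is a sketch: iconv_translit_status() is a hypothetical name, and treating "é" → "e" as the pass condition is an assumption. ICONV_IMPL is the constant the iconv extension defines to name the underlying library.

```php
<?php
// Sketch: report whether iconv is loaded and whether //TRANSLIT produces usable ASCII.
function iconv_translit_status(): array
{
    if (!function_exists('iconv')) {
        return ['available' => false, 'impl' => null, 'translit_ok' => false];
    }
    $probe = iconv('UTF-8', 'ASCII//TRANSLIT', 'é');
    return [
        'available' => true,
        // Names the library PHP was built against, e.g. "glibc" or "libiconv".
        'impl' => defined('ICONV_IMPL') ? ICONV_IMPL : 'unknown',
        // GNU libiconv may return "'e" (the quote is stripped here); glibc in the C locale returns "?".
        'translit_ok' => is_string($probe) && preg_replace('/[^a-z]/', '', $probe) === 'e',
    ];
}

print_r(iconv_translit_status());
```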

FAQ

Q: Does iconv work on Windows servers?

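A: Yes, the iconv extension is available on Windows. However, the results of //TRANSLIT depend on the iconv library PHP was built against, which differs between platforms (the ICONV_IMPL constant reports which one is in use). Always test your implementation on your target platform.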

Q: What if my hosting provider doesn't have iconv enabled?

A: You can use the Transliterator class from the intl extension, or install a library such as cocur/slugify, which ships its own transliteration rules instead of relying on iconv.

Q: Should I store slugs in the database or generate them on demand?

A: For performance and consistency, store slugs. This ensures URLs remain stable even if your generation logic changes later. Generate them once at content creation time.

Q: How do I handle URLs that need to be unique?

A: Generate the base slug, then check the database for collisions. Append a number (e.g., "-2", "-3") to duplicates to ensure uniqueness without compromising readability.
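A minimal sketch of that suffix loop follows. The $isTaken callback is a hypothetical stand-in for your database lookup, and a UNIQUE index on the slug column is still advisable, because two requests can pick the same free slug at the same time.

```php
<?php
// Sketch: append -2, -3, ... to a base slug until it no longer collides.
function unique_slug(string $base, callable $isTaken): string
{
    if (!$isTaken($base)) {
        return $base;
    }
    for ($n = 2; ; $n++) {
        $candidate = $base . '-' . $n;
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
}

$existing = ['hello-world', 'hello-world-2'];
echo unique_slug('hello-world', fn (string $s): bool => in_array($s, $existing, true)), "\n"; // hello-world-3
```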

Final Thoughts

A reliable slugify function is a small but critical part of any SEO strategy. I treat URL generation as infrastructure, not as an afterthought. By using iconv() to handle transliteration properly, I ensure that my URLs remain clean, consistent, and search engine friendly across languages.

If you are building a PHP-based CMS, blog platform, or content-driven website, I strongly recommend implementing Unicode-aware slug generation from the start. It prevents long-term SEO issues and ensures your URLs remain readable and professional as your content grows.