What are Homoglyphs and Why Do They Matter?

A homoglyph is a character that looks identical or very similar to another character but has a different Unicode code point. For example, the Latin letter "a" (U+0061) and the Cyrillic letter "а" (U+0430) are indistinguishable to the human eye in most fonts, but software treats them as two distinct characters.
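The difference is easy to confirm in code. The following is a minimal Python sketch using only the standard library; the helper name `describe` is an illustrative assumption, not part of this tool.

```python
import unicodedata

latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

def describe(ch: str) -> str:
    """Return the code point in U+XXXX form plus the official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"

print(latin_a == cyrillic_a)  # False: the strings differ despite looking alike
print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
```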

Homoglyphs exist because Unicode includes characters from hundreds of writing systems worldwide. Several of these scripts, such as Latin, Greek, and Cyrillic, share common origins, so many of their letters have identical or near-identical shapes. While this linguistic diversity is valuable, it also creates security vulnerabilities.

Attackers exploit homoglyphs in phishing attacks by creating fake websites with URLs that look legitimate. For instance, "аpple.com" written with a Cyrillic "а" appears identical to "apple.com" but is a different domain and leads to a different website. Homoglyphs are also used to spoof usernames on social media, forge email addresses, and hide malicious code in software behind lookalike variable names.
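One way to expose a spoofed domain is to convert it to its ASCII (Punycode) form, which is what DNS actually resolves. The sketch below uses Python's built-in `idna` codec; the function name `to_ascii` is an illustrative assumption, and the spoofed string is the example domain from the paragraph above.

```python
# Example domain from the text: the first letter is Cyrillic U+0430, not Latin "a".
spoofed = "\u0430pple.com"
genuine = "apple.com"

def to_ascii(domain: str) -> bytes:
    """Encode a domain name to ASCII with Python's built-in IDNA codec."""
    return domain.encode("idna")

print(to_ascii(genuine))  # plain ASCII passes through unchanged
print(to_ascii(spoofed))  # begins with b"xn--", revealing a non-ASCII label
```

An `xn--` prefix on a label that looks like plain English is a strong hint that lookalike characters are present.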

Common homoglyph pairs include: Latin "o" vs Greek "ο" vs Cyrillic "о", digit "0" vs letter "O", and Latin "a" vs Cyrillic "а". This tool detects these suspicious character substitutions to help protect against security threats.
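A simple heuristic for spotting such substitutions is to check whether the letters in one word come from more than one script. The Python sketch below is not how this tool is implemented. It assumes that the first word of a letter's Unicode name (LATIN, GREEK, CYRILLIC, ...) is a good enough stand-in for its script, because the standard library does not expose the Unicode Script property directly. The function names are hypothetical.

```python
import unicodedata

def letter_scripts(text: str) -> set[str]:
    """Approximate the set of scripts used by the letters in `text`.

    Assumption: the first word of the Unicode character name
    (e.g. "CYRILLIC" in "CYRILLIC SMALL LETTER A") identifies the script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def looks_mixed(text: str) -> bool:
    """Flag text whose letters appear to come from more than one script."""
    return len(letter_scripts(text)) > 1

print(letter_scripts("o\u03bf\u043e"))  # Latin o, Greek omicron, Cyrillic o
print(looks_mixed("\u0430pple"))        # True: Cyrillic "а" mixed with Latin letters
print(looks_mixed("apple"))             # False: all Latin
```

This heuristic does not catch same-script lookalikes such as the digit "0" versus the letter "O"; those need an explicit table of confusable pairs.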

Tool description

A security tool that analyzes text to detect homoglyphs: characters that look similar but have different Unicode code points. Use it to find hidden Unicode characters, check for non-Latin characters, and detect non-ASCII characters that could be used maliciously. It also works as a phishing link analyzer and fake-website link checker: paste a suspicious URL to identify the lookalike characters that attackers use in phishing, domain spoofing, and other exploits.
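The core of a non-ASCII check can be sketched in a few lines of Python. This is an illustrative approximation of the idea, not the tool's actual implementation; the function name `find_non_ascii` and the tuple layout are assumptions.

```python
import unicodedata

def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for every character outside ASCII."""
    findings = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:  # ASCII covers U+0000 through U+007F
            code_point = f"U+{ord(ch):04X}"
            findings.append((i, code_point, unicodedata.name(ch, "UNNAMED")))
    return findings

# A zero-width space (U+200B) hidden inside an otherwise ordinary word.
print(find_non_ascii("pay\u200bpal"))  # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(find_non_ascii("paypal"))        # []
```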

Features

  • Real-time Analysis: Automatically analyzes text as you type to detect suspicious characters
  • Non-ASCII Detection: Quickly detects non-ASCII characters and hidden Unicode characters in seemingly normal text
  • Unicode Code Points: Shows the Unicode value (U+xxxx) for each detected character