homoglyph-detector

Solid

Byte-level Unicode homoglyph detection for identifying visually indistinguishable and zero-width character substitutions in code

AI & Automation 1,160 stars 71 forks Updated today MIT


Quality Score: 96/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Homoglyph Detector

Byte-level forensic analysis of code changes to detect Unicode homoglyph substitutions: characters that render identically, or nearly so, to ASCII in most editors and diff tools but have different codepoints, silently breaking string comparisons, dictionary lookups, and identifier resolution.

## Purpose

Homoglyph attacks (related to CVE-2021-42574 "Trojan Source"; the homoglyph variant is tracked separately as CVE-2021-42694) are among the stealthiest trojan techniques. A Cyrillic `р` (U+0440) is visually indistinguishable from a Latin `p` (U+0070) in most fonts, editors, and diff viewers, so visual review cannot be relied on to catch it. The substitution shows up reliably only at the byte or codepoint level. This skill pipes git diffs through `hexdump -C` and scans for multi-byte UTF-8 sequences where single-byte ASCII is expected, particularly in string literals used as dictionary keys, variable names, and identifiers.

## Capabilities

### Confusable Character Detection

Scans for these high-risk Unicode confusables (values in parentheses are the UTF-8 bytes in hex):

| Latin | Cyrillic | Greek | UTF-8 length (Latin vs confusable) |
|-------|----------|-------|------------------------------------|
| a (61) | а (D0 B0) | α (CE B1) | 1 vs 2 bytes |
| c (63) | с (D1 81) | (none) | 1 vs 2 bytes |
| e (65) | е (D0 B5) | ε (CE B5) | 1 vs 2 bytes |
| o (6F) | о (D0 BE) | ο (CE BF) | 1 vs 2 bytes |
| p (70) | р (D1 80) | ρ (CF 81) | 1 vs 2 bytes |
| x (78) | х (D1 85) | χ (CF 87) | 1 vs 2 bytes |
| y (79) | у (D1 83) | (none) | 1 vs 2 bytes |

### Zero-Width Character Detection

- U+200B: Zero-width space
- U+200C: Zero-width non-joiner
- U+200D: Zero-width joiner
- U+FEFF: Byte order mark (in non-BOM position)

### ...
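
As a rough illustration of the two detection passes described above (confusable letters and zero-width characters), here is a minimal Python sketch. It is not the skill's own implementation: the skill describes piping diffs through `hexdump -C`, whereas this sketch inspects decoded codepoints and reports their UTF-8 bytes, which should surface the same characters. The names `CONFUSABLES`, `ZERO_WIDTH`, `scan_line`, and `scan_diff` are hypothetical, and the character set is limited to the entries listed in the table and list above.

```python
import unicodedata

# Confusables from the table above, mapped to the ASCII letter they imitate.
# (Hypothetical structure; only the characters listed in the table.)
CONFUSABLES = {
    "\u0430": "a", "\u0441": "c", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0445": "x", "\u0443": "y",   # Cyrillic
    "\u03b1": "a", "\u03b5": "e", "\u03bf": "o",
    "\u03c1": "p", "\u03c7": "x",                   # Greek
}

# Zero-width characters from the list above.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def scan_line(line, lineno=1, is_file_start=False):
    """Return one finding per suspicious character in a single line."""
    findings = []
    for col, ch in enumerate(line, start=1):
        if ch in CONFUSABLES:
            kind = "confusable"
        elif ch in ZERO_WIDTH:
            # U+FEFF as the very first character of a file is a normal BOM.
            if ch == "\ufeff" and is_file_start and col == 1:
                continue
            kind = "zero-width"
        else:
            continue
        findings.append({
            "line": lineno,
            "col": col,
            "kind": kind,
            "codepoint": f"U+{ord(ch):04X}",
            # Same bytes a hexdump would show, e.g. "D1 80" for Cyrillic er.
            "utf8": ch.encode("utf-8").hex(" ").upper(),
            "name": unicodedata.name(ch, "UNKNOWN"),
            "looks_like": CONFUSABLES.get(ch),
        })
    return findings


def scan_diff(diff_text):
    """Scan only added lines ('+' prefix) of a unified diff.

    Line numbers refer to positions in the diff text, not the target file.
    """
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("+") and not line.startswith("+++"):
            findings.extend(scan_line(line[1:], lineno))
    return findings
```

In practice, `scan_diff` could be given the decoded output of `git diff`. Any finding inside a string literal or identifier is worth a manual look, since a legitimately non-English file will also trigger the confusable check.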

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Solid

icon-lookup

Workaround for Claude Code filtering BMP PUA Unicode (U+E000-U+F8FF). Supplementary PUA Nerd Font icons like 󰊤 󱃾 󰁹 (U+F0000+, e.g. nf-md-github, nf-md-kubernetes, nf-md-battery) can be written directly. BMP PUA icons (Powerline, Font Awesome, Devicons) require placeholder syntax like {{ U+E0A0 }} or {{ nf-fa-star }} (without spaces), which hooks auto-convert. Invoke when reading or writing Starship configs, tmux themes, shell prompts, or statuslines.

459 Updated yesterday
malob
AI & Automation Listed

code-pattern-matching

Search for code patterns in decompiled output using Weggli semantic matching. Use when finding vulnerable code constructs like unchecked memcpy, buffer operations, or specific function call patterns in pseudocode.

15 Updated 2 months ago
vulhunt-re
AI & Automation Listed

glyph

Math-glyph encoding — LLM-facing compression for SPEC.md ∧ spec-adjacent writes. Loaded by /sdd:spec, /sdd:build, /sdd:check. Triggers on any write to SPEC.md ∨ user says "math-glyph", "glyph", "compress this", "be brief".

4 Updated today
kborovik
Data & Documents Listed

pattern-detection

Detect patterns, anomalies, and trends in code and data. Use when identifying code smells, finding security vulnerabilities, or discovering recurring patterns. Handles regex patterns, AST analysis, and statistical anomaly detection.

335 Updated today
aiskillstore
Code & Development Listed

russian-text-quality

Use when reviewing or writing code that contains Russian-language UI strings, i18n locale files (ru.json, ru.yml, ru.po), Russian comments, or transliterated identifiers (polzovatel, tovar, kolichestvo). Triggers on mixed English-code + Russian-content projects. Does not cover typography, spelling, or ё/е.

1 Updated today
Axelendometrial4386