Rules of thumb before you paste anything
- If the pattern is for validation, prefer "good enough" over "technically correct". A fully RFC 5322-compliant email regex runs to thousands of characters and nobody maintains it.
- Always test in the exact engine you will run it in. PCRE, JavaScript, Python, and Go all disagree on Unicode handling, lookbehinds, and backreferences; Go's regexp package, for example, supports neither lookarounds nor backreferences.
- Comment your patterns. Future-you will thank you.
- For anything security-adjacent (validation of untrusted input), assume the regex is insufficient — validate with a parser too.
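One way to follow the "comment your patterns" advice in JavaScript, which has no free-spacing flag, is to assemble the pattern from named, commented pieces. This is only a sketch: the variable names are illustrative, and the groups are written as non-capturing, which is an assumption rather than something the patterns in this article require.

```javascript
// Each piece is a raw string, so backslashes do not need doubling.
const year = String.raw`\d{4}`;                  // four digits
const month = String.raw`(?:0[1-9]|1[0-2])`;     // 01 to 12
const day = String.raw`(?:0[1-9]|[12]\d|3[01])`; // 01 to 31

// Anchored so the whole string must be a date, not just part of it.
const isoDate = new RegExp(`^${year}-${month}-${day}$`);

console.log(isoDate.test("2024-02-29")); // true
console.log(isoDate.test("2024-13-01")); // false
```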
Email address — the pragmatic pattern
For signup forms and logging, this is all you need:
^[^\s@]+@[^\s@]+\.[^\s@]+$
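One or more characters that are neither whitespace nor @, then an @, then the same again on either side of a dot. Rejects spaces and multiple @ signs, and accepts almost every address you will see in practice. It is deliberately loose: it lets through oddities such as a@b..c and rejects rare valid forms (quoted local parts containing spaces, dotless domains like user@localhost). Do not try to validate TLDs in regex; that list changes regularly.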
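A minimal sketch of the pattern in use. The function name looksLikeEmail is hypothetical, and the sample addresses are made up for illustration.

```javascript
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Loose shape check only; it says nothing about whether the mailbox exists.
function looksLikeEmail(input) {
  return typeof input === "string" && EMAIL_RE.test(input);
}

console.log(looksLikeEmail("ada@example.com"));          // true
console.log(looksLikeEmail("ada@@example.com"));         // false
console.log(looksLikeEmail("ada lovelace@example.com")); // false
console.log(looksLikeEmail("ada@localhost"));            // false (no dot in the domain)
```

If the address matters, the usual next step is to send a confirmation message rather than tighten the pattern.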
URL — protocol optional, query string included
^(https?://)?([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?$
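Optional http/https, one or more dotted hostname segments, and an optional path that may carry a query string. The character class has no #, so fragments are rejected, as are ports and a query that follows the host directly (example.com?x=1). Because the pattern is anchored with ^ and $, it checks a whole string; drop the anchors if you want to pull URLs out of prose. Do not use it for parsing: use the URL constructor.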
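For the parsing side, a sketch built on the standard URL constructor might look like the following. The helper name parseHttpUrl and the shape of the returned object are assumptions made for this example.

```javascript
// Returns the parts of an http(s) URL, or null if the input does not parse.
function parseHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError on input it cannot parse
  } catch {
    return null;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return {
    host: url.hostname,
    path: url.pathname,
    query: Object.fromEntries(url.searchParams),
    fragment: url.hash,
  };
}

console.log(parseHttpUrl("https://example.com/a/b?x=1&y=2#top"));
console.log(parseHttpUrl("example.com")); // null
```

Unlike the regex, the constructor requires a scheme when no base URL is given, so a bare example.com returns null here.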
Phone number — country-specific is the only honest approach
International phone validation in one regex is a trap. The E.164 format (+ and 7–15 digits) is the safest loose check:
^\+[1-9]\d{6,14}$
For US-only forms:
^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$
Anything more ambitious — area-code ranges, carrier codes — is better handled by a library (libphonenumber).
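The two patterns can work together: accept the US form, then store the number in E.164. The sketch below assumes that workflow; usToE164 is a hypothetical helper, not part of any library.

```javascript
const E164_RE = /^\+[1-9]\d{6,14}$/;
const US_RE = /^(\+1[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}$/;

// Converts a US-formatted number to E.164, or returns null if it does not match.
function usToE164(input) {
  if (!US_RE.test(input)) return null;
  const digits = input.replace(/\D/g, ""); // keep digits only
  // A match has either 10 digits or 11 digits with a leading country code 1.
  const national = digits.length === 11 ? digits.slice(1) : digits;
  return "+1" + national;
}

console.log(usToE164("(555) 123-4567"));  // "+15551234567"
console.log(usToE164("+1 555-123-4567")); // "+15551234567"
console.log(usToE164("555-1234"));        // null
```

Note that the US pattern does not check that parentheses are balanced, so "(555 123-4567" also matches.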
Date — ISO 8601 only, please
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
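Year, hyphen, month 01–12, hyphen, day 01–31. Does not validate calendar edges (Feb 30 still passes). For strict validation, build a Date from the three parts and check that the year, month, and day read back unchanged; an impossible date rolls over into the next month. Note that Date.parse() returns only a timestamp, so you need new Date(...) before you can call .getUTCDate(), and date-only ISO strings are interpreted as UTC.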
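A sketch of that strict check follows. It adds a capture group around the year, which the pattern above does not have, and the name isRealIsoDate is hypothetical.

```javascript
const ISO_DATE_RE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// True only if the string has the right shape AND names a real calendar day.
function isRealIsoDate(input) {
  const m = ISO_DATE_RE.exec(input);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // setUTCFullYear avoids the two-digit-year remapping that Date.UTC applies.
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  // Impossible dates roll over (Feb 30 becomes Mar 1), so the parts stop matching.
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isRealIsoDate("2024-02-29")); // true (leap year)
console.log(isRealIsoDate("2023-02-29")); // false
console.log(isRealIsoDate("2024-02-30")); // false
```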
UUID, IPv4, IPv6, and friends
UUID v4: ^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$
IPv4: ^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
IPv6: there is no short, correct regex. Use a library. If you absolutely must, the complete patterns in circulation run to many hundreds of characters.
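A quick sketch of the UUID and IPv4 patterns in use. The i flag on the UUID pattern is an addition, on the assumption that uppercase UUIDs should also pass; the sample values are made up.

```javascript
const UUID_V4_RE =
  /^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$/i;
const IPV4_RE =
  /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

console.log(UUID_V4_RE.test("123e4567-e89b-42d3-a456-426614174000")); // true
console.log(UUID_V4_RE.test("123e4567-e89b-12d3-a456-426614174000")); // false (version 1)
console.log(IPV4_RE.test("192.168.0.1")); // true
console.log(IPV4_RE.test("256.1.1.1"));   // false
console.log(IPV4_RE.test("01.2.3.4"));    // true (leading zeros are allowed)
```

The last case is worth knowing about: some systems read a leading zero as octal, so reject it separately if that matters to you.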
Hex colour, HTML tag, strong password
Hex colour: ^#?([0-9a-f]{3}|[0-9a-f]{6})$
HTML tag (extract only): <([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>. This is not safe for parsing arbitrary HTML — nothing in regex is. Use a real parser.
Strong password (14+ chars with all four classes): ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s]).{14,}$
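The lookahead pattern only answers yes or no. If you need to tell the user what is missing, separate checks that roughly mirror it are easier to read. This is a sketch: passwordProblems and its messages are hypothetical.

```javascript
const STRONG_RE = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s]).{14,}$/;

// Returns a list of unmet requirements; an empty list means the password passes.
function passwordProblems(pw) {
  const problems = [];
  if (pw.length < 14) problems.push("shorter than 14 characters");
  if (!/[a-z]/.test(pw)) problems.push("no lowercase letter");
  if (!/[A-Z]/.test(pw)) problems.push("no uppercase letter");
  if (!/\d/.test(pw)) problems.push("no digit");
  // Same class as the lookahead: underscore is part of \w, so it is not a symbol here.
  if (!/[^\w\s]/.test(pw)) problems.push("no symbol");
  return problems;
}

console.log(passwordProblems("Correct-Horse-42-Staple")); // []
console.log(passwordProblems("Under_score_Only_123"));    // ["no symbol"]
```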
Three patterns you should never copy-paste
- Any RFC 5322 email regex over 100 characters. The "maximally correct" version misclassifies real addresses and passes edge cases nobody sends.
- HTML parsers in regex. Regex is not powerful enough to parse a context-free grammar, and HTML nests arbitrarily. Use cheerio or the DOMParser API to parse, and DOMPurify to sanitise.
- Credit-card validation with a regex that does not also run Luhn. Only one in ten digit strings passes the Luhn checksum, so a bare 16-digit pattern accepts at least nine invalid numbers for every one that could be real.
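For the last point, a minimal Luhn check might look like this. The function name passesLuhn is hypothetical, and the 13 to 19 digit range is an assumption; narrow it to the card types you actually accept.

```javascript
// Returns true if the input, ignoring spaces and hyphens, passes the Luhn checksum.
function passesLuhn(input) {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4111 1111 1111 1111")); // true (a widely used test number)
console.log(passesLuhn("4111 1111 1111 1112")); // false
```

Passing Luhn only means the number is well formed; whether the card exists is a question for the payment processor.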