Best Ways to Strip Punctuation From Strings in Python and JavaScript
Updated: March 27, 2026
TL;DR
Python: Use str.translate() with str.maketrans() for speed on ASCII text, re.sub() for flexible, Unicode-aware patterns, or a string.punctuation filter for simplicity. JavaScript: Use .replace() with regex /[^\w\s]/g for ASCII text, or Unicode property escapes such as /\p{P}/gu for Unicode-aware punctuation removal. Choose based on your performance needs and character scope.
Removing punctuation is a common task in NLP preprocessing, form validation, text analysis, and data cleaning. The "best" method depends on your data (ASCII vs. Unicode), performance requirements (speed vs. readability), and language features. This reference guide compares methods across Python and JavaScript, with relative performance rankings, a benchmarking script, and edge-case handling.
Python: Methods Ranked by Speed
1. str.translate() + str.maketrans() (Fastest)
import string
text = "Hello, world! How are you?"
translator = str.maketrans('', '', string.punctuation)
result = text.translate(translator)
print(result) # "Hello world How are you"
Performance: Fastest
Pros: Fastest pure-Python method, handles all ASCII punctuation
Cons: Limited to the characters in the translation table; string.punctuation is ASCII-only, so Unicode punctuation is not removed
Edge Cases:
# string.punctuation is ASCII-only, so the ellipsis character (…) is NOT removed
text = "café…hello"
result = text.translate(translator)
print(result) # "café…hello" (… and é are both preserved)
# Other Unicode punctuation is missed as well:
text = "Hello «world» and ‹greetings›" # French quotes
result = text.translate(translator)
print(result) # "Hello «world» and ‹greetings›" (not removed)
2. Regex with re.sub() (Most Flexible)
import re
text = "Hello, world! How are you?"
result = re.sub(r'[^\w\s]', '', text)
print(result) # "Hello world How are you"
Pattern Breakdown:
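- [^\w\s] = match anything that's NOT a word character (\w) or whitespace (\s)
- Word characters = letters, digits, and underscore (_). For str patterns in Python 3, \w is Unicode-aware by default, so accented and non-Latin letters also count; pass flags=re.ASCII to restrict it to a-z, A-Z, 0-9, and _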
Performance: Moderate (slower than translate(), but more control)
Pros: Unicode-aware, customizable patterns, removes punctuation and symbols in any script
Cons: Slower than translate(), keeps underscores (they count as word characters), overkill for simple ASCII
For Unicode Punctuation:
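# In Python 3, \w is Unicode-aware by default for str patterns,
# so flags=re.UNICODE is redundant (shown only for explicitness).
# Accented letters are kept; Unicode punctuation such as « » is removed.
text = "Hello, café! «World»"
result = re.sub(r'[^\w\s]', '', text, flags=re.UNICODE)
print(result) # "Hello café World"
# Remove ASCII punctuation only, leaving every non-ASCII character untouched:
result = re.sub(r'[^\w\s\u0080-\uFFFF]', '', text)
print(result) # "Hello café «World»" (« and » fall inside the kept range)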
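One consequence of the pattern breakdown is easy to miss: because the underscore counts as a word character, `[^\w\s]` leaves it in place, whereas `string.punctuation` includes it. A minimal sketch of one way to drop underscores as well, assuming they should be treated as punctuation in your data (the helper name `strip_punct_and_underscore` is illustrative, not part of any library):

```python
import re

# Matches any character that is neither a word character nor whitespace,
# plus the underscore itself (which \w would otherwise protect).
_PUNCT_OR_UNDERSCORE = re.compile(r"[^\w\s]|_")

def strip_punct_and_underscore(text: str) -> str:
    """Remove punctuation, symbols, and underscores; keep letters, digits, whitespace."""
    return _PUNCT_OR_UNDERSCORE.sub("", text)

print(re.sub(r"[^\w\s]", "", "snake_case, done!"))      # snake_case done
print(strip_punct_and_underscore("snake_case, done!"))  # snakecase done
```

Whether underscores should go depends on the data: identifiers and usernames often need them, while free text usually does not.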
3. Custom Set of Punctuation
import string
PUNCTUATION_TO_REMOVE = set(string.punctuation)
def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation"""
    return ''.join(char for char in text if char not in PUNCTUATION_TO_REMOVE)
text = "Hello, world! How are you?"
result = strip_punctuation(text)
print(result) # "Hello world How are you"
Performance: Slow
Pros: Readable, customizable, deterministic
Cons: Slowest of the first three methods, only handles ASCII
Customizable:
# Remove only specific punctuation
REMOVE_ONLY = {',', '!', '?'}
def strip_select(text: str) -> str:
    return ''.join(char for char in text if char not in REMOVE_ONLY)
text = "Hello, world! How are you?"
result = strip_select(text)
print(result) # "Hello world How are you"
# Keep some punctuation:
text = "Price: $19.99—amazing!"
result = strip_select(text)
print(result) # "Price: $19.99—amazing" (keeps $ and .)
4. Complex Unicode Handling (unicodedata)
import unicodedata
def remove_unicode_punctuation(text: str) -> str:
    """Remove Unicode category Punctuation (P*)"""
    return ''.join(
        char for char in text
        if unicodedata.category(char)[0] != 'P'
    )
text = "Hello, world! «Café»…"
result = remove_unicode_punctuation(text)
print(result) # "Hello world Café"
Unicode Categories:
- Pc = Connector punctuation (_)
- Pd = Dash punctuation (-, –, —)
- Pe = Close punctuation (), ], })
- Pf = Final punctuation (»)
- Pi = Initial punctuation («)
- Po = Other punctuation (!, ?, .)
- Ps = Open punctuation ((, [, {)
Pros: Handles any Unicode punctuation correctly, language-agnostic
Cons: Slowest method, overkill for ASCII, leaves symbols such as $ and + in place (they belong to the S* categories, not P*)
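If the per-character generator is too slow for your workload, one possible alternative is to precompute a translation table from the same Unicode categories and hand it to str.translate(). The following is a sketch rather than a benchmarked recommendation: the names `UNICODE_PUNCT_TABLE` and `strip_unicode_punctuation` are illustrative, and whether it is actually faster on your data is something to measure.

```python
import sys
import unicodedata

# Map every code point whose Unicode category starts with "P" to None,
# which tells str.translate() to delete it. Building the table scans the
# whole code space once, so it is done at import time, not per call.
UNICODE_PUNCT_TABLE = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("P")
)

def strip_unicode_punctuation(text: str) -> str:
    """Delete all characters in the Unicode punctuation categories (P*)."""
    return text.translate(UNICODE_PUNCT_TABLE)

print(strip_unicode_punctuation("Hello, world! «Café»…"))  # Hello world Café
print(strip_unicode_punctuation("Price: $19.99"))          # Price $1999
```

The second call shows the same caveat as the generator version: the dollar sign is a currency symbol (category Sc), so it survives.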
JavaScript: Methods Ranked by Speed
1. Regex with replace() (Standard)
const text = "Hello, world! How are you?";
const result = text.replace(/[^\w\s]/g, '');
console.log(result); // "Hello world How are you"
Pattern:
- /[^\w\s]/g = remove all non-word and non-space characters
- g flag = global (replace all occurrences)
Performance: Fast
Pros: Fast, concise, handles most ASCII text
Cons: In JavaScript, \w matches only A-Z, a-z, 0-9, and _, so accented and non-Latin letters are removed along with punctuation
For Unicode:
// The u flag does NOT make \w Unicode-aware; \w still matches only [A-Za-z0-9_]:
const text = "Hello, café! «World»";
const result = text.replace(/[^\w\s]/gu, '');
console.log(result); // "Hello caf World" (é is removed too)
// Keep Unicode letters and numbers, remove everything else:
const result2 = text.replace(/[^\p{Letter}\p{Number}\s]/gu, '');
console.log(result2); // "Hello café World"
2. replace() with Unicode Property Escapes (ES2018)
// Remove all Unicode punctuation
const text = "Hello, world! «Café»";
const result = text.replace(/\p{P}/gu, '');
console.log(result); // "Hello world Café"
Unicode Properties:
- \p{P} = any punctuation
- \p{Punctuation} = same as \p{P}
- \p{Pc} = connector punctuation
- \p{Pd} = dash punctuation
- \p{Po} = other punctuation
Pros: Precise, handles all Unicode punctuation
Cons: Requires a runtime with ES2018 Unicode property escape support, and the u flag on the regex
3. Custom Function (Readable)
function stripPunctuation(text) {
  const punctuation = /[^\w\s]/g;
  return text.replace(punctuation, '');
}
const text = "Hello, world! How are you?";
const result = stripPunctuation(text);
console.log(result); // "Hello world How are you"
4. Remove Specific Punctuation Only
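function removeSpecificPunctuation(text, charsToRemove = ",.!?") {
  // Escape characters that are special inside a character class (\ ] ^ -)
  const escaped = charsToRemove.replace(/[\\\]^-]/g, '\\$&');
  const pattern = new RegExp(`[${escaped}]`, 'g');
  return text.replace(pattern, '');
}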
const text = "Price: $19.99—amazing!";
const result = removeSpecificPunctuation(text, '!?');
console.log(result); // "Price: $19.99—amazing"
Performance Comparison Table
| Language | Method | Performance | Unicode Support | Best For |
|---|---|---|---|---|
| Python | translate() | Fastest | ASCII only | Speed-critical, ASCII data |
| Python | re.sub() | Moderate | Full Unicode | Flexibility, Unicode text |
| Python | Set membership filter | Slow | ASCII only | Readability, custom rules |
| Python | unicodedata | Slowest | Full Unicode | Precise Unicode handling |
| JavaScript | replace(/[^\w\s]/g) | Fast | ASCII only | General use with ASCII text |
| JavaScript | replace(/\p{P}/gu) | Moderate | Full Unicode | ES2018+ projects |
Practical Use Cases
Use Case 1: NLP Text Preprocessing (Remove All Punctuation)
# Python
import re
texts = ["Hello, world!", "Price: $19.99", "Hello…goodbye"]
cleaned = [re.sub(r'[^\w\s]', '', text) for text in texts]
# ["Hello world", "Price 1999", "Hellogoodbye"]
// JavaScript
const texts = ["Hello, world!", "Price: $19.99", "Hello…goodbye"];
const cleaned = texts.map(text => text.replace(/[^\w\s]/gu, ''));
// ["Hello world", "Price 1999", "Hellogoodbye"]
Use Case 2: Keep Apostrophes (Contractions)
import re
def keep_apostrophes(text):
    return re.sub(r"[^\w\s']", '', text)
text = "It's don't valid, isn't it?"
result = keep_apostrophes(text)
print(result) # "It's don't valid isn't it"
function keepApostrophes(text) {
  return text.replace(/[^\w\s']/gu, '');
}
const text = "It's don't valid, isn't it?";
const result = keepApostrophes(text);
console.log(result); // "It's don't valid isn't it"
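The ASCII apostrophe in the patterns above does not match the typographic apostrophe (’, U+2019) that word processors and web pages often produce, so contractions written that way would lose their apostrophe. A minimal Python sketch of one workaround, assuming it is acceptable to map both curly single quotes to ' before stripping (the helper name `keep_apostrophes_normalized` is hypothetical):

```python
import re

# Typographic single quotes mapped to the ASCII apostrophe.
# U+2019 is the usual "curly" apostrophe; U+2018 is its opening counterpart.
_APOSTROPHES = str.maketrans({"\u2019": "'", "\u2018": "'"})

def keep_apostrophes_normalized(text: str) -> str:
    """Normalize curly apostrophes to ', then strip all other punctuation."""
    normalized = text.translate(_APOSTROPHES)
    return re.sub(r"[^\w\s']", "", normalized)

print(keep_apostrophes_normalized("It\u2019s fine, isn\u2019t it?"))  # It's fine isn't it
```

Because U+2018 and U+2019 also serve as single quotation marks, quoted passages will keep their quote marks under this approach; drop U+2018 from the table if that matters for your text.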
Use Case 3: Remove Punctuation but Keep Hyphens (For Hyphenated Words)
import re
def keep_hyphens(text):
    return re.sub(r"[^\w\s-]", '', text)
text = "well-known, mother-in-law: important!"
result = keep_hyphens(text)
print(result) # "well-known mother-in-law important"
function keepHyphens(text) {
  return text.replace(/[^\w\s-]/gu, '');
}
const text = "well-known, mother-in-law: important!";
const result = keepHyphens(text);
console.log(result); // "well-known mother-in-law important"
Use Case 4: Remove Punctuation Except Periods (For Sentence Preservation)
import re
def keep_periods(text):
    return re.sub(r"[^\w\s.]", '', text)
text = "Hello, world! How are you? I'm fine."
result = keep_periods(text)
print(result) # "Hello world How are you Im fine." (the apostrophe is removed too; add ' to the class to keep it)
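Use Cases 2 through 4 share one shape: a negated character class with a few extra characters allowed. A hedged sketch of a single helper that generalizes them, assuming the characters to keep arrive as a plain string (the function name `strip_punctuation_except` and its `keep` parameter are illustrative):

```python
import re

def strip_punctuation_except(text: str, keep: str = "") -> str:
    """Remove every character that is not a word character, whitespace, or in `keep`."""
    # re.escape makes characters such as - ] ^ and \ safe inside the character class.
    pattern = re.compile(rf"[^\w\s{re.escape(keep)}]")
    return pattern.sub("", text)

print(strip_punctuation_except("well-known, mother-in-law: important!", keep="-"))
# well-known mother-in-law important
print(strip_punctuation_except("Hello, world! I'm fine.", keep=".'"))
# Hello world I'm fine.
```

If the same `keep` set is used on many strings, compiling the pattern once outside the function avoids repeating that work on every call.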
Edge Cases and Gotchas
Numbers in Punctuation
import re
import string
# string.punctuation doesn't include digits
print(string.punctuation)
# Output: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# \w includes digits, so [^\w\s] removes punctuation but keeps the digits
text = "Price: $19.99!"
result = re.sub(r"[^\w\s]", '', text)
print(result) # "Price 1999" (digits kept, decimal point lost)
result = re.sub(r"[^\d\s]", '', text)
print(result) # " 1999" (keeps ONLY digits and whitespace)
Emoji and Special Characters
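import re
import unicodedata
text = "Hello 👋 world! 🌍 How are you? 😊"
# Emoji are symbols (category So), not word characters, so [^\w\s] removes them too:
result = re.sub(r"[^\w\s]", '', text)
print(result) # "Hello  world  How are you " (emoji gone, surrounding spaces remain)
# flags=re.UNICODE changes nothing; it is already the default for str patterns in Python 3:
result = re.sub(r"[^\w\s]", '', text, flags=re.UNICODE)
print(result) # Same output as above
# The standard re module has no \p{...} property escapes, so a pattern like
# r"[^\p{L}\s]" raises re.error. Filter by Unicode category instead.
# Remove punctuation but keep emoji:
result = ''.join(
    c for c in text
    if unicodedata.category(c)[0] != 'P'
)
print(result) # "Hello 👋 world 🌍 How are you 😊"
# Remove punctuation AND emoji (category So), then collapse the leftover spaces:
result = ''.join(
    c for c in text
    if unicodedata.category(c)[0] != 'P' and unicodedata.category(c) != 'So'
)
print(' '.join(result.split())) # "Hello world How are you"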
RTL Text (Arabic, Hebrew)
# Python handles RTL correctly in regex
import re
text = "مرحبا, بالعالم! Hello, world!"
result = re.sub(r'[^\w\s]', '', text)
print(result) # "مرحبا بالعالم Hello world"
// In JavaScript, \w is ASCII-only even with the u flag, so use Unicode property escapes for RTL text
const text = "مرحبا, بالعالم! Hello, world!";
const result = text.replace(/[^\p{L}\p{N}\s]/gu, '');
console.log(result); // "مرحبا بالعالم Hello world"
Benchmarking Your Own Data
import timeit
import string
import re
text = "Hello, world! " * 100 # 1400 chars
# Build the table, pattern, and set once so that only the stripping itself is timed
translator = str.maketrans('', '', string.punctuation)
pattern = re.compile(r'[^\w\s]')
punctuation = set(string.punctuation)
# Method 1: translate
def m1():
    return text.translate(translator)
# Method 2: regex
def m2():
    return pattern.sub('', text)
# Method 3: set membership filter
def m3():
    return ''.join(c for c in text if c not in punctuation)
print("translate():", timeit.timeit(m1, number=10000))
print("re.sub():", timeit.timeit(m2, number=10000))
print("set filter:", timeit.timeit(m3, number=10000))
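Timing numbers are only comparable if the methods do the same job on your input. As a sanity check before benchmarking, the sketch below compares the outputs on an ASCII sample and a Unicode sample; the function names `via_translate`, `via_regex`, and `via_set` are illustrative, and the samples are assumptions to be replaced with your own data.

```python
import re
import string

_TABLE = str.maketrans("", "", string.punctuation)
_PUNCT = set(string.punctuation)

def via_translate(text: str) -> str:
    return text.translate(_TABLE)

def via_regex(text: str) -> str:
    return re.sub(r"[^\w\s]", "", text)

def via_set(text: str) -> str:
    return "".join(c for c in text if c not in _PUNCT)

ascii_sample = "Hello, world! How are you?"
unicode_sample = "Hello «world»…"

# On plain ASCII without underscores, all three methods agree.
print(via_translate(ascii_sample) == via_regex(ascii_sample) == via_set(ascii_sample))  # True
# On Unicode punctuation, only the regex removes « » and …
print(via_translate(unicode_sample))  # Hello «world»…
print(via_regex(unicode_sample))      # Hello world
```

If the outputs differ on your data, the faster method may simply be doing less work, so decide which behavior you need before comparing speed.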
const text = "Hello, world! ".repeat(100); // 1400 chars
console.time("regex");
for (let i = 0; i < 10000; i++) {
  text.replace(/[^\w\s]/g, '');
}
console.timeEnd("regex");
console.time("regex unicode");
for (let i = 0; i < 10000; i++) {
  text.replace(/[^\w\s]/gu, '');
}
console.timeEnd("regex unicode");
Recommendations
Use str.translate() if:
- Working with ASCII text only
- Performance is critical (processing millions of strings)
- Using Python with standard string.punctuation
Use re.sub() if:
- Need Unicode punctuation support
- Want flexible pattern matching
- Regex overhead is acceptable for your workload (measure it on your own data)
Use unicodedata if:
- Need precise control over punctuation categories
- Working with multilingual text requiring exact Unicode handling
JavaScript: Use .replace(/[^\w\s]/g, '') for:
- ASCII-only input (\w does not match accented or non-Latin letters, even with the u flag)
- Readable, concise code
JavaScript: Use /\p{P}/gu if:
- Unicode support is required and ES2018 Unicode property escapes are available
- Need to remove ONLY punctuation (preserve all letters, numbers, symbols)
Conclusion
Removing punctuation is simple in concept but nuanced in implementation. For ASCII-only text, Python's translate() is typically the fastest option. For Unicode, re.sub() is flexible and unicodedata is exact. In JavaScript, .replace() covers both cases, but Unicode text needs property escapes such as \p{P} because \w is ASCII-only. Choose based on your data type, performance needs, and the language features available. Test on your actual data, since benchmarks vary widely with string length and punctuation density.