Best Ways to Strip Punctuation From Strings in Python and JavaScript

Updated: March 27, 2026

#python #javascript #string-manipulation #recreated

TL;DR

Python: Use str.translate() with str.maketrans() and string.punctuation for speed on ASCII text, re.sub() for flexible patterns, or unicodedata for precise Unicode control. JavaScript: Use .replace() with the regex /[^\w\s]/g for ASCII text, or /\p{P}/gu for Unicode-aware punctuation removal. Choose based on your performance needs and character scope.

Removing punctuation is a common task in NLP preprocessing, form validation, text analysis, and data cleaning. The "best" method depends on your data (ASCII vs. Unicode), performance requirements (speed vs. readability), and language features. This reference guide compares methods across Python and JavaScript, with relative performance rankings, a benchmarking script, and edge-case handling.

Python: Methods Ranked by Speed

1. str.translate() + str.maketrans() (Fastest)

import string

text = "Hello, world! How are you?"
translator = str.maketrans('', '', string.punctuation)
result = text.translate(translator)
print(result)  # "Hello world How are you"

Performance: Fastest
Pros: Fastest pure Python method, handles all ASCII punctuation
Cons: Limited to predefined punctuation, doesn't remove Unicode punctuation

Edge Cases:

# string.punctuation is ASCII-only, so non-ASCII punctuation is left alone
text = "café…hello"
result = text.translate(translator)
print(result)  # "café…hello" (… is not in string.punctuation, é preserved)

# Other Unicode punctuation is missed too:
text = "Hello «world» and ‹greetings›"  # guillemets
result = text.translate(translator)
print(result)  # "Hello «world» and ‹greetings›" (not removed)
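
One way to keep the speed of translate() while covering Unicode punctuation is to build the deletion table from unicodedata instead of string.punctuation. The following is a minimal sketch under that assumption, not part of the method above. The names UNICODE_PUNCT_TABLE and strip_unicode_punctuation are hypothetical, and building the table scans every code point once, so it belongs at import time rather than inside a hot loop.

```python
import sys
import unicodedata

# Map every code point whose Unicode category starts with "P" to None.
# str.translate() deletes characters that map to None.
UNICODE_PUNCT_TABLE = {
    code_point: None
    for code_point in range(sys.maxunicode + 1)
    if unicodedata.category(chr(code_point)).startswith("P")
}

def strip_unicode_punctuation(text: str) -> str:
    """Delete all characters in Unicode punctuation categories (P*)."""
    return text.translate(UNICODE_PUNCT_TABLE)

print(strip_unicode_punctuation("Hello «world» and ‹greetings›…"))
# Hello world and greetings
```

Unlike string.punctuation, this table covers only the P* categories, so symbols such as $ and + stay in the text.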

2. Regex with re.sub() (Most Flexible)

import re

text = "Hello, world! How are you?"
result = re.sub(r'[^\w\s]', '', text)
print(result)  # "Hello world How are you"

Pattern Breakdown:

[^\w\s] = match anything that's NOT a word char (\w) or whitespace (\s)
Word chars = letters, digits, and underscore (_); for str patterns in Python 3 this includes non-ASCII letters such as é

Performance: Moderate (slower, but more control)
Pros: Unicode-aware, customizable patterns, removes punctuation and symbols in one pass
Cons: Slower than translate(), overkill for simple ASCII, keeps underscores because _ is a word character
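
Because \w includes the underscore, methods 1 and 2 do not produce identical output on the same input. The short comparison below is an illustrative sketch; the sample string and variable names are made up for this example.

```python
import re
import string

text = "snake_case, kebab-case!"

# string.punctuation contains "_", so translate() removes it.
via_translate = text.translate(str.maketrans("", "", string.punctuation))

# \w matches "_", so [^\w\s] leaves it in place.
via_regex = re.sub(r"[^\w\s]", "", text)

# Adding "_" as an alternative makes the regex remove it as well.
via_regex_no_underscore = re.sub(r"[^\w\s]|_", "", text)

print(via_translate)            # snakecase kebabcase
print(via_regex)                # snake_case kebabcase
print(via_regex_no_underscore)  # snakecase kebabcase
```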

For Unicode Punctuation:

# Remove Unicode punctuation too (accented letters are kept because they match \w)
text = "Hello, café! «World»"
result = re.sub(r'[^\w\s]', '', text, flags=re.UNICODE)  # flag is redundant: it is the default for str patterns in Python 3
print(result)  # "Hello café World"

# Remove only ASCII punctuation and keep every non-ASCII character in the BMP:
result = re.sub(r'[^\w\s\u0080-\uFFFF]', '', text)
print(result)  # "Hello café «World»" (« and » fall inside the kept range)
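
One caveat with [^\w\s] is that Python's \w does not match combining marks, so text stored in decomposed form (NFD) can lose its accents. The sketch below illustrates this with unicodedata.normalize; the variable names are only for this example.

```python
import re
import unicodedata

composed = "café!"                                   # é as one code point (U+00E9)
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# The combining accent is category Mn, which \w does not match, so it is removed.
print(re.sub(r"[^\w\s]", "", decomposed))  # cafe

# Normalizing to NFC first recombines the letter and its accent.
recomposed = unicodedata.normalize("NFC", decomposed)
print(re.sub(r"[^\w\s]", "", recomposed))  # café
```

For scripts whose marks have no precomposed form, a category-based filter (method 4) is likely the safer choice.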

3. Custom Set of Punctuation

import string

PUNCTUATION_TO_REMOVE = set(string.punctuation)

def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation"""
    return ''.join(char for char in text if char not in PUNCTUATION_TO_REMOVE)

text = "Hello, world! How are you?"
result = strip_punctuation(text)
print(result)  # "Hello world How are you"

Performance: Slow
Pros: Readable, customizable, deterministic
Cons: Slower than translate() and re.sub(), only handles ASCII

Customizable:

# Remove only specific punctuation
REMOVE_ONLY = {',', '!', '?'}

def strip_select(text: str) -> str:
    return ''.join(char for char in text if char not in REMOVE_ONLY)

text = "Hello, world! How are you?"
result = strip_select(text)
print(result)  # "Hello world How are you"

# Keep some punctuation:
text = "Price: $19.99—amazing!"
result = strip_select(text)
print(result)  # "Price: $19.99—amazing" (only ! is in REMOVE_ONLY, so :, $, and . are kept)

4. Complex Unicode Handling (unicodedata)

import unicodedata

def remove_unicode_punctuation(text: str) -> str:
    """Remove Unicode category Punctuation (P*)"""
    return ''.join(
        char for char in text
        if unicodedata.category(char)[0] != 'P'
    )

text = "Hello, world! «Café»…"
result = remove_unicode_punctuation(text)
print(result)  # "Hello world Café"

Unicode Categories:

Pc = Connector punctuation (_)
Pd = Dash punctuation (-, –, —)
Pe = Close punctuation (), ], })
Pf = Final quote punctuation (»)
Pi = Initial quote punctuation («)
Po = Other punctuation (!, ?, .)
Ps = Open punctuation ((, [, {)

Pros: Handles any Unicode punctuation correctly, language-agnostic
Cons: Slowest method, overkill for ASCII, leaves symbols such as $, +, and ~ in place (they belong to the S* categories, not P*)
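
To see which category a character belongs to, and therefore whether a P* filter will remove it, you can query unicodedata directly. This is a small illustrative sketch; the sample list is arbitrary.

```python
import unicodedata

samples = ["_", "-", "–", ")", "»", "«", "!", "(", "$", "+", "é"]

# Look up the two-letter general category of each sample character.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(repr(ch), cat)

# "$" is Sc and "+" is Sm, so a filter on P* leaves both in place.
```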

JavaScript: Methods Ranked by Speed

1. Regex with replace() (Standard)

const text = "Hello, world! How are you?";
const result = text.replace(/[^\w\s]/g, '');
console.log(result);  // "Hello world How are you"

Pattern:

/[^\w\s]/g = remove all non-word and non-space characters
g flag = global (replace all occurrences)

Performance: Fast
Pros: Fast, concise, handles most ASCII cases
Cons: \w is ASCII-only in JavaScript, so non-ASCII letters such as é are stripped along with punctuation

For Unicode:

// The u flag alone does not make \w Unicode-aware; \w still means [A-Za-z0-9_]:
const text = "Hello, café! «World»";
const result = text.replace(/[^\w\s]/gu, '');
console.log(result);  // "Hello caf World" (é is stripped)

// Keep Unicode letters and numbers, remove everything else:
const result2 = text.replace(/[^\p{Letter}\p{Number}\s]/gu, '');
console.log(result2);  // "Hello café World"

2. replace() with Unicode Property Escapes (ES2018)

// Remove all Unicode punctuation
const text = "Hello, world! «Café»";
const result = text.replace(/\p{P}/gu, '');
console.log(result);  // "Hello world Café"

Unicode Properties:

\p{P} = any punctuation
\p{Punctuation} = same as \p{P}
\p{Pc} = connector punctuation
\p{Pd} = dash punctuation
\p{Po} = other punctuation

Pros: Precise, handles all Unicode punctuation
Cons: Requires an engine with ES2018 support, leaves symbols such as $ and + in place

3. Custom Function (Readable)

function stripPunctuation(text) {
  const punctuation = /[^\w\s]/g;
  return text.replace(punctuation, '');
}

const text = "Hello, world! How are you?";
const result = stripPunctuation(text);
console.log(result);  // "Hello world How are you"

4. Remove Specific Punctuation Only

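function removeSpecificPunctuation(text, charsToRemove = ",.!?") {
  // Escape the characters that are special inside a character class: \ ] ^ -
  const escaped = charsToRemove.replace(/[\\\]^-]/g, '\\$&');
  const pattern = new RegExp(`[${escaped}]`, 'g');
  return text.replace(pattern, '');
}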

const text = "Price: $19.99—amazing!";
const result = removeSpecificPunctuation(text, '!?');
console.log(result);  // "Price: $19.99—amazing"

Performance Comparison Table

Language	Method	Performance	Unicode Support	Best For
Python	translate()	Fastest	ASCII only	Speed-critical, ASCII data
Python	re.sub()	Moderate	Full Unicode	Flexibility, Unicode text
Python	Set comprehension	Slow	ASCII only	Readability, custom rules
Python	unicodedata	Slowest	Full Unicode	Precise Unicode handling
JavaScript	replace(/[^\w\s]/g)	Fast	ASCII only (\w is ASCII)	ASCII text, general use
JavaScript	replace(/\p{P}/gu)	Moderate	Full Unicode	Unicode text, ES2018+ engines

Practical Use Cases

Use Case 1: NLP Text Preprocessing (Remove All Punctuation)

# Python
import re

texts = ["Hello, world!", "Price: $19.99", "Hello…goodbye"]
cleaned = [re.sub(r'[^\w\s]', '', text) for text in texts]
# ["Hello world", "Price 1999", "Hellogoodbye"]

// JavaScript
const texts = ["Hello, world!", "Price: $19.99", "Hello…goodbye"];
const cleaned = texts.map(text => text.replace(/[^\w\s]/gu, ''));
// ["Hello world", "Price 1999", "Hellogoodbye"]

Use Case 2: Keep Apostrophes (Contractions)

import re

def keep_apostrophes(text):
    return re.sub(r"[^\w\s']", '', text)

text = "It's valid, isn't it? Don't worry!"
result = keep_apostrophes(text)
print(result)  # "It's valid isn't it Don't worry"

function keepApostrophes(text) {
  return text.replace(/[^\w\s']/gu, '');
}

const text = "It's valid, isn't it? Don't worry!";
const result = keepApostrophes(text);
console.log(result);  // "It's valid isn't it Don't worry"

Use Case 3: Remove Punctuation but Keep Hyphens (For Hyphenated Words)

import re

def keep_hyphens(text):
    return re.sub(r"[^\w\s-]", '', text)

text = "well-known, mother-in-law: important!"
result = keep_hyphens(text)
print(result)  # "well-known mother-in-law important"

function keepHyphens(text) {
  return text.replace(/[^\w\s-]/gu, '');
}

const text = "well-known, mother-in-law: important!";
const result = keepHyphens(text);
console.log(result);  // "well-known mother-in-law important"

Use Case 4: Remove Punctuation Except Periods (For Sentence Preservation)

import re

def keep_periods(text):
    return re.sub(r"[^\w\s.]", '', text)

text = "Hello, world! How are you? I'm fine."
result = keep_periods(text)
print(result)  # "Hello world How are you Im fine." (the apostrophe is removed too; add ' to the class to keep it)
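
The use cases above differ only in which characters they keep, so they can be folded into one helper. This is a minimal sketch; strip_punctuation_except and its keep parameter are hypothetical names introduced for this example.

```python
import re

def strip_punctuation_except(text: str, keep: str = "") -> str:
    """Remove non-word, non-space characters except those listed in keep."""
    # re.escape() makes characters such as "-", "]" and "^" safe inside the class.
    pattern = re.compile(rf"[^\w\s{re.escape(keep)}]")
    return pattern.sub("", text)

print(strip_punctuation_except("It's well-known, isn't it?", keep="'-"))
# It's well-known isn't it
print(strip_punctuation_except("Hello, world! I'm fine.", keep=".'"))
# Hello world I'm fine.
```

Note that keep="'" preserves only the ASCII apostrophe; a typographic apostrophe (U+2019) would need to be listed separately.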

Edge Cases and Gotchas

Numbers in Punctuation

import re
import string

# string.punctuation doesn't include digits
print(string.punctuation)
# Output: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

# \w includes digits, so [^\w\s] removes punctuation but keeps numbers
text = "Price: $19.99!"
result = re.sub(r"[^\w\s]", '', text)
print(result)  # "Price 1999"  (digits kept, decimal point lost)

result = re.sub(r"[^\d\s]", '', text)
print(result)  # " 1999"  (keeps ONLY digits and whitespace)
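
If losing the decimal point is a problem, one option is to remove a period only when it is not between two digits. The pattern below is a sketch under that assumption; DECIMAL_SAFE and strip_keep_decimals are hypothetical names, and the rule does not cover forms such as ".5" or thousands separators.

```python
import re

# Remove a "." unless it sits between two digits; remove all other punctuation.
DECIMAL_SAFE = re.compile(r"(?<!\d)\.|\.(?!\d)|[^\w\s.]")

def strip_keep_decimals(text: str) -> str:
    """Strip punctuation but keep periods that act as decimal points."""
    return DECIMAL_SAFE.sub("", text)

print(strip_keep_decimals("Price: $19.99!"))      # Price 19.99
print(strip_keep_decimals("Version 2.0. Done."))  # Version 2.0 Done
```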

Emoji and Special Characters

import re

text = "Hello 👋 world! 🌍 How are you? 😊"

# [^\w\s] removes emoji too, because emoji are neither word characters nor whitespace:
result = re.sub(r"[^\w\s]", '', text)
print(result)  # "Hello  world  How are you " (emoji and punctuation removed)

# The Unicode flag changes nothing; it is already the default for str patterns:
result = re.sub(r"[^\w\s]", '', text, flags=re.UNICODE)
print(result)  # Same output as above

# \p{...} property escapes are not supported by Python's re module:
# re.sub(r"[^\w\s\p{L}]", '', text)  # raises re.error (bad escape \p)
# To remove only punctuation and emoji-style symbols, use unicodedata instead
import unicodedata
result = ''.join(
    c for c in text
    # P* = punctuation, So = "Symbol, other" (covers most emoji)
    if unicodedata.category(c)[0] != 'P' and unicodedata.category(c) != 'So'
)
print(result)  # "Hello  world  How are you "

RTL Text (Arabic, Hebrew)

# Python handles RTL correctly in regex
import re

text = "مرحبا, بالعالم! Hello, world!"
result = re.sub(r'[^\w\s]', '', text)
print(result)  # "مرحبا بالعالم Hello world"

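// In JavaScript, \w is ASCII-only even with the u flag, so use Unicode property escapes
// (\p{L} = letters, \p{M} = combining marks, \p{N} = numbers)
const text = "مرحبا, بالعالم! Hello, world!";
const result = text.replace(/[^\p{L}\p{M}\p{N}\s]/gu, '');
console.log(result);  // "مرحبا بالعالم Hello world"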

Benchmarking Your Own Data

import timeit
import string
import re

text = "Hello, world! " * 100  # 1400 chars

# Build the table, pattern, and set once so only the stripping itself is timed
TRANSLATOR = str.maketrans('', '', string.punctuation)
PATTERN = re.compile(r'[^\w\s]')
PUNCT_SET = set(string.punctuation)

# Method 1: translate
def m1():
    return text.translate(TRANSLATOR)

# Method 2: regex
def m2():
    return PATTERN.sub('', text)

# Method 3: set lookup in a generator expression
def m3():
    return ''.join(c for c in text if c not in PUNCT_SET)

print("translate():", timeit.timeit(m1, number=10000))
print("re.sub():", timeit.timeit(m2, number=10000))
print("set lookup:", timeit.timeit(m3, number=10000))

const text = "Hello, world! ".repeat(100);  // 1400 chars

console.time("regex");
for (let i = 0; i < 10000; i++) {
  text.replace(/[^\w\s]/g, '');
}
console.timeEnd("regex");

console.time("regex unicode");
for (let i = 0; i < 10000; i++) {
  text.replace(/[^\w\s]/gu, '');
}
console.timeEnd("regex unicode");

Recommendations

Use str.translate() if:

Working with ASCII text only
Performance is critical (processing millions of strings)
Using Python with standard string.punctuation

Use re.sub() if:

Need Unicode punctuation support
Want flexible pattern matching
Performance is acceptable (< 1 million strings)

Use unicodedata if:

Need precise control over punctuation categories
Working with multilingual text requiring exact Unicode handling
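
The Python recommendations above can be combined into a single function that takes the fast path for ASCII input and falls back to unicodedata otherwise. This is a sketch, not a definitive implementation: strip_punctuation_auto and ASCII_TABLE are hypothetical names, str.isascii() requires Python 3.7+, and the fallback removes both the P* and S* categories so that it matches what string.punctuation removes on ASCII text.

```python
import string
import unicodedata

# Precomputed deletion table for the ASCII fast path.
ASCII_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation_auto(text: str) -> str:
    """Pick a method based on whether the input is pure ASCII."""
    if text.isascii():
        # Fast path: translate() with string.punctuation.
        return text.translate(ASCII_TABLE)
    # Slow path: drop every character in a punctuation (P*) or symbol (S*) category.
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )

print(strip_punctuation_auto("Hello, world!"))     # Hello world
print(strip_punctuation_auto("«Café» costs $5…"))  # Café costs 5
```

Because the fallback removes the S* categories, it also strips emoji and currency signs; narrow the category check if those should be kept.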

JavaScript: Use .replace(/[^\w\s]/g, '') for:

ASCII-only input
Standard modern projects
Readable, concise code

JavaScript: Use /\p{P}/gu if:

The runtime supports ES2018 Unicode property escapes
Input contains non-ASCII letters that must be preserved
Need to remove ONLY punctuation (preserve all letters, numbers, symbols)

Conclusion

Removing punctuation is simple in concept but nuanced in implementation. For ASCII-only text, Python's translate() is typically the fastest option. For Unicode, re.sub() is flexible and unicodedata is the most precise. In JavaScript, .replace() covers both cases, provided you switch from \w to Unicode property escapes for non-ASCII text. Choose based on your data type, performance needs, and language features available. Test on your actual data, because benchmarks vary widely with string length and punctuation density.