JavaScript - Regex To Remove Special Characters But Also Keep Greek Characters
Solution 1:
Ranges in a character class are defined by character codes. Since A has char code 65 and z has char code 122, the following regex:
[A-z]
matches every Latin letter, but also every character whose code falls between the uppercase and lowercase letters, namely codes 91 through 96, which are the characters [\]^_`
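As a quick check of the point above, the following sketch lists every ASCII character that [A-z] matches but that is not a Latin letter. The variable name extras is only illustrative.

```javascript
// Collect every ASCII character matched by [A-z] that is not a Latin letter.
const extras = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    extras.push(ch);
  }
}
console.log(extras.join('')); // [\]^_`
```

This is why [a-zA-Z] is the safer way to write "any Latin letter".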
Now, for Greek letters, the character codes for the uppercase characters are 913-937 for alpha through omega, and the lowercase characters are 945-969 for alpha through omega (this includes both lowercase variants of sigma, namely ς (962) and σ (963)).
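The code ranges above can be inspected directly. This is a minimal sketch; the helper name range is an assumption made for illustration and is not part of the original answer.

```javascript
// Build a string from every character code in [from, to].
const range = (from, to) => {
  let out = '';
  for (let code = from; code <= to; code++) {
    out += String.fromCharCode(code);
  }
  return out;
};

const upper = range(913, 937); // uppercase alpha through omega (code 930 is unassigned)
const lower = range(945, 969); // lowercase alpha through omega, including both sigmas

console.log(upper);
console.log(lower);
console.log('ς'.charCodeAt(0), 'σ'.charCodeAt(0)); // 962 963
```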
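So, to match Latin letters, Greek letters, and Arabic numerals, use the following character class (put a ^ right after the opening bracket to match every character except these, which is what you want when removing special characters):
[a-zA-Z0-9α-ωΑ-Ω]
So, for Greek characters, ranges work just like they do for Latin letters.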
Edit: I tested this on a lorem ipsum text run through Google Translate, and it turns out the class above doesn't take accented letters into account. I checked the character codes of these accented letters, and they sit right before and right after the lowercase letters. So, the following regex also covers these accented letters:
[a-zA-Z0-9ά-ωΑ-ώ]
This expanded range now also includes άέήίΰ (char codes 940 through 944) and ϊϋόύώ (codes 970 through 974).
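To see the difference the expanded range makes, here is a small sketch that uses both character classes in negated form (with a leading ^) so that replace() strips everything outside them. The variable names are illustrative, and the accented letter is written as an escape to make its code explicit.

```javascript
// Negated versions of the two classes: they match what should be removed.
const plainOnly = /[^a-zA-Z0-9α-ωΑ-Ω]/g;
const withAccents = /[^a-zA-Z0-9ά-ωΑ-ώ]/g;

// "παράδειγμα", with ά (char code 940) written as \u03AC.
const word = 'παρ\u03ACδειγμα';

console.log(word.replace(plainOnly, ''));   // the accented ά is stripped
console.log(word.replace(withAccents, '')); // the word is left intact
```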
To also include whitespace (spaces, tabs, newlines), simply add \s to the character class:
[a-zA-Z0-9ά-ωΑ-ώ\s]
Edit: Apparently more Greek letters need to be included, namely those in the range [Ά-Ϋ] (char codes 902 through 939), which ends right before ά, so the new regex looks like this:
[a-zA-Z0-9Ά-ωΑ-ώ\s]
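Putting it together, a removal function might look like the sketch below. The function name stripSpecialChars and the sample string are hypothetical; the character class is the one above, negated with ^ so that everything outside it is deleted.

```javascript
// Hypothetical helper: remove everything except Latin letters, digits,
// Greek letters in the code range 902-974, and whitespace.
function stripSpecialChars(input) {
  return input.replace(/[^a-zA-Z0-9Ά-ωΑ-ώ\s]/g, '');
}

console.log(stripSpecialChars('Γειά σου, κόσμε! (123)')); // "Γειά σου κόσμε 123"
```

Note that this class does not cover polytonic letters from the Greek Extended block, and that the underscore is removed because \w is not used here.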
Solution 2:
Try adding the range of Greek characters like this:
/[^\w\sΆΈ-ϗἀ-῾]/gi
I created this pattern by looking at the Unicode blocks starting at 0370 (Greek and Coptic) and 1F00 (Greek Extended). I don't speak Greek, and a more restricted character set is likely more appropriate, but this seems to work:
"-ἄλφα-".replace(/[^\w\sΆΈ-ϗἀ-῾]/gi, ''); // "ἄλφα"
Solution 3:
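// Remove everything except Greek and Coptic characters (U+0370 to U+03FF), word characters, and whitespace.
var stringToReplace = "παράδειγμαs & /(";
var result = stringToReplace.replace(/[^\u0370-\u03FF\w\s]/g, "");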
Demo: http://jsfiddle.net/tuga/LKjYd/
The range 0370-03FF is the Unicode Greek and Coptic character block.
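One thing to keep in mind is that the 0370-03FF block does not contain polytonic letters such as ἄ, which live in the Greek Extended block mentioned in Solution 2. The sketch below compares the pattern from this solution with a variant that also allows 1F00-1FFF; the variant and the variable names are assumptions added for illustration.

```javascript
// Pattern from Solution 3 (Greek and Coptic block only).
const basicOnly = /[^\u0370-\u03FF\w\s]/g;
// Assumed variant that also keeps the Greek Extended block (U+1F00 to U+1FFF).
const withExtended = /[^\u0370-\u03FF\u1F00-\u1FFF\w\s]/g;

// "-ἄλφα-", with ἄ (U+1F04) written as an escape.
const polytonic = '-\u1F04λφα-';

console.log(polytonic.replace(basicOnly, ''));    // "λφα"
console.log(polytonic.replace(withExtended, '')); // "ἄλφα"
```

Both patterns use \w, so underscores are kept as well.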