http://www.pythonregex.com/
Python Regular Expression Testing Tool
^[a-zA-Z0-9_-]{0,50}$
C_1234-5678
Code:
>>> import re
>>> string = u"C_1234-5678"
>>> regex = re.compile("^[a-zA-Z0-9_-]{0,50}$")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x51150ec3caa25ba0>
>>> regex.match(string)
<_sre.SRE_Match object at 0x51150ec3caa25f90>
# List the groups found
>>> r.groups()
()
# List the named dictionary objects found
>>> r.groupdict()
{}
# Run findall
>>> regex.findall(string)
[u'C_1234-5678']
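The session above can be wrapped in a small check, sketched here for Python 3. The helper name is_valid_id is made up for illustration; the pattern is the one from the tool, unchanged.

```python
import re

# Pattern copied from the session above: letters, digits, underscore
# and hyphen only, between 0 and 50 characters in total.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{0,50}$")


def is_valid_id(value):
    """Hypothetical helper: True if value fits ID_PATTERN."""
    return ID_PATTERN.match(value) is not None


print(is_valid_id("C_1234-5678"))  # True
print(is_valid_id("C 1234"))       # False, a space is not in the class
print(is_valid_id(""))             # True, {0,50} also accepts the empty string
```

Note that the {0,50} quantifier lets the empty string through; if that is not wanted, {1,50} would be the obvious change.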
2
http://stackoverflow.com/questions/2525327/regex-for-a-za-z0-9-with-dashes-allowed-in-between-but-not-at-the-start-or-e
Update:
This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):
Solution:
r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'
Original Question (albeit, after 3 edits)
I'm using Python and I'm not trying to extract the value, but rather test to make sure it fits the pattern.
allowed values:
spam123-spam-eggs-eggs1
spam123-eggs123
spam1234
eggs123
Not allowed values:
eggs1-
-spam123
spam--spam
I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction, extracting the string value after the fact, but I only need to test the value so that I can disallow it. The value must also be at least 4 and at most 25 characters long, and no two dashes can be adjacent.
Here's what I've come up with after some experimentation with lookbehind, etc:
# Nothing here
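As a quick sanity check, the solution pattern can be run against the allowed and not-allowed values listed in the question. This is only a sketch for Python 3; the names SLUG_PATTERN and fits are made up for illustration.

```python
import re

# Solution pattern quoted above, unchanged. The lookahead enforces the
# 4 to 25 character length; the rest forbids leading, trailing and
# doubled dashes.
SLUG_PATTERN = re.compile(
    r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'
)

allowed = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam1234", "eggs123"]
not_allowed = ["eggs1-", "-spam123", "spam--spam"]


def fits(value):
    """Hypothetical helper: True if value fits SLUG_PATTERN."""
    return SLUG_PATTERN.match(value) is not None


for value in allowed + not_allowed:
    print(value, fits(value))
```

The four allowed values should print True and the three not-allowed values False.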
3 matching a hyphen in python re
http://stackoverflow.com/questions/8383213/python-regex-for-hyphenated-words
Try this:
re.findall(r'\w+(?:-\w+)+',text)
Here we consider a hyphenated word to be:
- a number of word chars
- followed by one or more repetitions of:
    - a single hyphen
    - followed by word chars
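A minimal sketch of that pattern in use, assuming Python 3 and the sample text from the question quoted below. The names HYPHENATED and find_hyphenated are made up for illustration.

```python
import re

# \w+        a run of word chars
# (?:-\w+)+  one or more times: a single hyphen followed by word chars
HYPHENATED = re.compile(r'\w+(?:-\w+)+')


def find_hyphenated(text):
    """Hypothetical helper: return all hyphenated words found in text."""
    return HYPHENATED.findall(text)


text = "one-hundered-and-three- some text foo-bar some--text"
print(find_hyphenated(text))  # ['one-hundered-and-three', 'foo-bar']
```

The trailing hyphen after "three" is dropped and "some--text" is skipped, because each hyphen must be followed directly by a word char.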
The question was:
I'm looking for a regex to match hyphenated words in python.
The closest I've managed to get is: '\w+-\w+[-\w+]*'
text = "one-hundered-and-three- some text foo-bar some--text"
hyphenated = re.findall(r'\w+-\w+[-\w+]*',text)
which returns the list ['one-hundered-and-three-', 'foo-bar'].
This is almost perfect except for the trailing hyphen after 'three'. I only want the additional hyphen if it is followed by a 'word', i.e. instead of '[-\w+]*' I need something like '(-\w+)*', which I thought would work but doesn't (it returns ['-three', '']). In other words, I need something that matches |word followed by hyphen followed by word, followed by hyphen_word zero or more times|.
The main problem in your own expression is the square brackets. They don't group the content together; they create a character class, which is something completely different. – stema Dec 5 '11 at 9:46
Thanks for the input, lazyr. I have considered the cases you point out, and they will not pose a problem. Thanks for the clarification, stema. I realised that the square brackets did not group the content, but they resulted in the closest match for what I was attempting to do. – user1081231 Dec 5 '11 at 11:55
I don't know what you plan to use this for, but have you considered cases where a trailing or prefixed hyphen is valid, like "nineteenth- and twentieth-century" or "investor-owned and -operated"? – lazyr Dec 5 '11 at 9:38
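stema's point about square brackets, and the odd ['-three', ''] result from the question, can both be reproduced with a short comparison. This is a sketch for Python 3; the variable names are made up for illustration.

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# Character class: [-\w+]* matches any run of hyphens, word chars or
# literal '+' signs, so a trailing hyphen is swallowed too.
with_class = re.findall(r'\w+-\w+[-\w+]*', text)

# Capturing group: when a pattern contains one group, findall returns
# what the group captured (its last repetition), not the whole match.
with_capture = re.findall(r'\w+-\w+(-\w+)*', text)

# Non-capturing group: same grouping, but findall returns whole matches.
with_non_capture = re.findall(r'\w+-\w+(?:-\w+)*', text)

print(with_class)        # ['one-hundered-and-three-', 'foo-bar']
print(with_capture)      # ['-three', '']
print(with_non_capture)  # ['one-hundered-and-three', 'foo-bar']
```

So '(-\w+)*' was matching the right text all along; only the way findall reports capturing groups hid it.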