Regex only if group sequence not followed by? - swift

Using Swift 4.2, I'm trying to match four-digit years such as 1980, 1989 or 2019, but only when the sequence is not followed by "p". What I've tried so far: "(?:[1-2]{1}[0,9]{1}[0-9]{1,2})\1(?![p])"

After plenty of testing I found a solution ... "(?:([1-2]{1}[0,9]{1}[0-9]{2})(?![p]))"

Let's consider this string :
let str = """
10 Hello 980 world,
1975 Hello 1980 world,
1985p Hello :1995 world,
2000 Hello 2005p world,
2010 Hello 2015 world,
2019 Hello 2020 world,
2999
"""
Let's declare this regex :
let pattern = "19[89]{1}[0-9]{1}(?![p])|20[01]{1}[0-9]{1}(?![p])"
let regex = try! NSRegularExpression(pattern: pattern)
Here are the different parts of the pattern :
19 matches the characters 19 literally,
[89]{1} matches a single character in the list 89, to limit the years to the 1980s and 90s,
[0-9]{1} one digit for the year,
(?![p]) negative lookahead, meaning: not followed by p,
| logical OR,
20 matches the characters 20 literally,
[01]{1} matches a single character in the list 01, to limit the years to the 2000s and 2010s,
[0-9]{1} one digit for the year,
(?![p]) negative lookahead, meaning: not followed by p.
Now, let's get the matches :
let matches = regex.matches(in: str,
                            range: NSRange(location: 0, length: str.utf16.count))
for match in matches {
    let range = match.range
    if let swiftRange = Range(range, in: str) {
        let name = str[swiftRange]
        print(name)
    }
}
Which prints in the console :
1980
1995
2000
2010
2015
2019
Bear in mind that this would still match things like 1990s, 1999a, 19999999, since you've only asked to not be followed by p.
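The caveat above can be checked with a short sketch. It is written in Python rather than Swift, the sample text is made up for illustration, and the stricter pattern is only one possible tightening (an assumption, not part of the original answer): it also rejects a preceding digit and any following letter or digit.

```python
import re

# Hypothetical sample text, chosen to exercise the edge cases mentioned above.
text = "1980 1985p 1990s 19999999 :1995 2015"

# Pattern from the answer: only a trailing "p" is rejected.
loose = re.compile(r"19[89][0-9](?!p)|20[01][0-9](?!p)")

# Stricter variant (assumption): no digit before, no letter or digit after.
strict = re.compile(r"(?<![0-9])(?:19[89]|20[01])[0-9](?![0-9A-Za-z])")

loose_matches = loose.findall(text)
strict_matches = strict.findall(text)

print(loose_matches)   # ['1980', '1990', '1999', '1995', '2015']
print(strict_matches)  # ['1980', '1995', '2015']
```

Whether the stricter form is appropriate depends on what may legitimately follow a year in your data.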

Related

Extracting Month and Year from a string with Python Regex

I have a string from which I want to extract month name and year with Python regex. The string looks like the following-
x='januray valo na Feb 2017 valo Jan-2015 anj 1900 puch Janu Feb Jan Mar 15 MMMay-85 anF 15'
My code should return the following-
['Feb 2017', 'Jan-2015', 'Mar 15', 'May-85']
I have tried-
re.findall('[Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec]{3}[\s-]\d{2,4}', x)
But the code is picking up anF 15 as well, i.e. I am getting the following output-
['Feb 2017', 'Jan-2015', 'Mar 15', 'May-85', 'anF 15']
How can I stop the code from picking up wrong combinations of the letters in Jan|Feb|...?
Use an alternation for the abbreviated month names. That is, use the following regex pattern:
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\s-]\d{2,4}
This says what you intend, namely to match one of 12 abbreviated month names, followed by a whitespace character or dash, then 2 to 4 digits.
import re
x = 'januray valo na Feb 2017 valo Jan-2015 anj 1900 puch Janu Feb Jan Mar 15 MMMay-85 anF 15'
# A raw string keeps \s and \d from being treated as string escapes.
results = re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\s-]\d{2,4}', x)
print(results)
['Feb 2017', 'Jan-2015', 'Mar 15', 'May-85']
The problem with your current pattern is that it is using a character class:
[Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec]{3}[\s-]\d{2,4}
This actually says to match three letters from the letters contained by the month names (plus pipe). Put another way, you are saying this:
[abceglnoprtuvyADFJMNOS|]{3}[\s-]\d{2,4}
You are using a character class here, [Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec]{3}, which matches any three characters from that set ({3} is the repetition count). To fix it, use a non-capturing group instead:
re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\s-]\d{2,4}', x)
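The difference between the character class and the alternation can be checked directly. This is a minimal sketch that reuses the sample string from the question; the variable names are illustrative only.

```python
import re

x = ('januray valo na Feb 2017 valo Jan-2015 anj 1900 puch '
     'Janu Feb Jan Mar 15 MMMay-85 anF 15')

tail = r'[\s-]\d{2,4}'

# Original attempt: a character class, i.e. "any 3 of these characters".
month_class = r'[Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec]{3}' + tail
# The same class with duplicates removed, as spelled out in the answer.
expanded_class = r'[abceglnoprtuvyADFJMNOS|]{3}' + tail
# The fix: an alternation inside a non-capturing group.
alternation = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)' + tail

class_hits = re.findall(month_class, x)
expanded_hits = re.findall(expanded_class, x)
alternation_hits = re.findall(alternation, x)

print(class_hits == expanded_hits)  # True: both classes behave identically
print(class_hits)        # includes the unwanted 'anF 15'
print(alternation_hits)  # ['Feb 2017', 'Jan-2015', 'Mar 15', 'May-85']
```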
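Another suggestion was /[a-z]{3}.?\d{4}/gi. Note that it accepts any three letters and requires exactly four digits, so it would also match anj 1900 and would miss Mar 15 and May-85.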

use regex to make sure something contains separate digits

I am trying to scrape locations of companies on websites. I have this function:
import re
x = ['174 WEST 4TH ST, NYC', 'All contents © Copyright 2018 Propela']
def is_location(text):
    """Does text contain digits, lowercase and uppercase letters"""
    # Note: '[a-z]*' also matches the empty string, so the lowercase check always passes.
    return all(re.search(pattern, text) for pattern in [r'\d{3,16}', r'[a-z]*', r'[A-Z]'])
# x[1]
# is_location(x[2])
print(list(filter(is_location, x)))
I want to use a regex that only matches if digits are mentioned twice, as in 174 WEST 4TH ST, NYC, where there is one group of digits (174) and then another, separate digit (4).
Is this possible?
You may use the following pattern to match two numbers occurring in separate words in the string:
\d+.*\s+.*\d+
Here is a sample code:
import re
line = "174 WEST 4TH ST, NYC"
res = re.search(r'\d+.*\s+.*\d+', line)
if res:
    print("found a match:", res.group())
else:
    print("no match")
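To tie this back to the question, the pattern can be dropped into the filter. This is a rough sketch only; the helper name has_two_separate_numbers is hypothetical, and the list is the one from the question.

```python
import re

# Two runs of digits separated by at least one whitespace character.
TWO_NUMBERS = re.compile(r'\d+.*\s+.*\d+')

def has_two_separate_numbers(text):
    """Return True if text contains two digit groups in separate words."""
    return TWO_NUMBERS.search(text) is not None

candidates = ['174 WEST 4TH ST, NYC', 'All contents © Copyright 2018 Propela']
locations = [text for text in candidates if has_two_separate_numbers(text)]

print(locations)  # ['174 WEST 4TH ST, NYC']
```

The copyright line is rejected because 2018 is the only number in it and there is no second digit group after a space.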

Matching regex pattern in Array of Strings

I have the following excerpt of a long String array:
Array[String] = Array("Tony Stark (USA) 16th October 2015", "Peter Comb (Canada) 21st September 2015")
I expect to have output as:
Array[String] = Array("Tony Stark", "Peter Comb")
Array[String] = Array("USA", "Canada")
Array[String] = Array("16th October 2015", "21st September 2015")
I have tried this:
"[.]+\\(([.]+)\\)[.]+"
But it is unable to parse. What could be the regex pattern to parse my RDD?
The issue with your regex is that inside the [], . is a literal . not a wildcard.
You're also missing capturing groups around the name and the date. The correct regex would be (.+)\\((.+)\\)(.+).
Calling the array a and the regex r (a Regex, e.g. built from the string with .r), this gives:
scala> a.map {case r(name, country,year) => (name, country, year)}
res4: Array[(String, String, String)] = Array(("Tony Stark ",USA," 16th October 2015"), ("Peter Comb ",Canada," 21st September 2015"))
Presumably you'd want to match the spaces as well so they don't get pulled out in the groups.
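One way to keep the surrounding spaces out of the groups is to match them explicitly between the groups. The sketch below shows the idea in Python rather than Scala; the pattern is an assumption about what "matching the spaces" could look like, not the answerer's code.

```python
import re

rows = ["Tony Stark (USA) 16th October 2015",
        "Peter Comb (Canada) 21st September 2015"]

# \s* outside the groups absorbs the spaces around the parentheses;
# the lazy (.+?) stops the name before that whitespace.
pattern = re.compile(r"(.+?)\s*\((.+?)\)\s*(.+)")

parsed = [pattern.fullmatch(row).groups() for row in rows]
# Transpose the list of (name, country, date) tuples into three lists.
names, countries, dates = (list(column) for column in zip(*parsed))

print(names)      # ['Tony Stark', 'Peter Comb']
print(countries)  # ['USA', 'Canada']
print(dates)      # ['16th October 2015', '21st September 2015']
```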
val rdd: Array[String] = Array ("Tony Stark (USA) 16th October 2015", "Peter Comb (Canada) 21st September 2015")
(0 to 2).map (i => rdd.map (_.split ("[\\)\\(]")).map (a=> a(i)))
Vector(Array("Tony Stark ", "Peter Comb "), Array(USA, Canada), Array(" 16th October 2015", " 21st September 2015"))
A final trim cleans up the whitespace:
(0 to 2).map (i => rdd.map (_.split ("[\\)\\(]")).map (a=> a(i).trim))
Vector(Array(Tony Stark, Peter Comb), Array(USA, Canada), Array(16th October 2015, 21st September 2015))
Now to the regex:
"[.]+\\(([.]+)\\)[.]+"
A character class containing a single character rarely makes sense: [a]+ is the same as a+. The dot is the exception. Inside a class it becomes a literal dot, because a wildcard inside a class would be pointless; outside the class you would simply write .+ .
Since your sample text contains no literal dots, let alone several in a row, I guess you simply meant .+
".+\\((.+)\\).+"
But regexes can be used in many ways: s.replace, s.matches, s.split and so on. Without knowing how you used it, I can't say more.

regex for Google script to extract hour and time

How do I extract the hour and minute from a string containing full date using the regex?
text= "February 28, 2016 at 03:14PM";
hour= text.replace (/s /g , "");
min= text.replace (/s /g , "");
The expected result (hour in 24-hour format) should be:
hour=15
min=14
(\d{2}):(\d{2})((P|A)M)
This matches two digits (first capturing group), a colon, and two more digits (second capturing group), followed by a third capturing group that holds AM or PM. In your programming language, compare that third group with AM or PM and add 12 to the hour for PM; 12 itself is the exception (12AM is hour 0, 12PM stays 12).
Here is a demo: https://regex101.com/r/rT7aG3/2
You did not specify a language, so here is an example in R:
library(stringi)
text <- "February 28, 2016 at 03:14PM"
ret <- stri_match_all_regex(text, "(\\d{2}):(\\d{2})((P|A)M)")
# Column 1 is the full match; columns 2 to 4 hold the first three capture groups.
hour <- as.integer(ret[[1]][1, 2]) %% 12  # maps 12 to 0 so that 12AM gives 0
minute <- as.integer(ret[[1]][1, 3])
if (ret[[1]][1, 4] == "PM") {
  hour <- hour + 12
}
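The same conversion can be sketched in Python. The function name to_24_hour is hypothetical, and the handling of 12AM and 12PM follows the usual 12-hour clock convention rather than anything stated in the question.

```python
import re

def to_24_hour(text):
    """Extract (hour, minute) in 24-hour form from text like '03:14PM'."""
    match = re.search(r"(\d{2}):(\d{2})([AP]M)", text)
    if match is None:
        return None
    hour = int(match.group(1)) % 12   # 12 becomes 0, everything else unchanged
    minute = int(match.group(2))
    if match.group(3) == "PM":
        hour += 12                    # 12PM -> 12, 03PM -> 15
    return hour, minute

print(to_24_hour("February 28, 2016 at 03:14PM"))  # (15, 14)
```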

RegEx / Regular expression for european dates (“j. F Y”)

I need a way to find dates using .RegularExpressionSearch in Swift 2.0, written in a European style like
"6. November 2015", "13. Februar 2015" or "24.Dezember2015" (the spaces are optional).
I tried a lot of expressions using a regex tester website, but I couldn't get a working solution. I appreciate any help!
You may use the following regex:
let regex = try NSRegularExpression(pattern: "\\d+\\.\\s*\\p{L}+\\s*\\d{4}", options: [])
The regex will match:
\d+ - 1 or more digits (you can limit with a limiting quantifier to 1 or 2: \d{1,2})
\. - literal dot
\s* - 0 or more whitespace
\p{L}+ - a sequence of letter-only symbols
\s* - 0 or more whitespace
\d{4} - exactly 4 digits.
Swift code demo:
func matchesForRegexInText(regex: String, text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}
let string = "6. November 2015 and 13. Februar 2015 or 24.Dezember2015 1. My christmas gift 2015 01.01.2015"
let matches = matchesForRegexInText("\\d+\\.\\s*\\p{L}+\\s*\\d{4}", text: string)
print(matches)
Results: ["6. November 2015", "13. Februar 2015", "24.Dezember2015"]
I think this regex will be useful for you:
(\d{1,2}\s*\.\s*[^\d]+\d{4})
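The two suggested patterns behave differently on the test string from the first answer. The sketch below compares them in Python rather than Swift. Python's built-in re module has no \p{L}, so the letter class is approximated with [^\W\d_] (an assumption of this sketch), and the day is limited to one or two digits as the first answer suggests.

```python
import re

text = ("6. November 2015 and 13. Februar 2015 or 24.Dezember2015 "
        "1. My christmas gift 2015 01.01.2015")

# Letters only between the dot and the year ([^\W\d_] stands in for \p{L}).
letters_only = re.compile(r"\d{1,2}\.\s*[^\W\d_]+\s*\d{4}")
# Any run of non-digits between the dot and the year (second suggestion).
any_non_digit = re.compile(r"\d{1,2}\s*\.\s*[^\d]+\d{4}")

strict_hits = letters_only.findall(text)
loose_hits = any_non_digit.findall(text)

print(strict_hits)  # ['6. November 2015', '13. Februar 2015', '24.Dezember2015']
print(loose_hits)   # the same three plus '1. My christmas gift 2015'
```

Because [^\d]+ also accepts spaces, the second pattern spans several words, which may or may not be what you want.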

Resources