Pattern Matching in Python with Regex

What is a regular expression?

In practice, most programming languages handle string parsing with regular expressions. In Python, a regular expression is a way of describing a text pattern so that strings can be matched against it.

The "re" module, which ships with every Python installation, provides regular expression support.

In Python, a regular expression search is typically written as:

match = re.search(pattern, string)

The re.search() method takes two arguments, a regular expression pattern and a string, and searches for that pattern within the string. If the pattern is found, search() returns a match object; otherwise it returns None. In other words, given a string, a regular expression lets you determine whether the string matches a given pattern and, optionally, collect the substrings that contain the relevant information.
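
As a minimal sketch of the two possible outcomes described above, the following snippet searches a sample string once for a pattern that is present and once for one that is absent. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The quick brown fox"

found = re.search(r"quick", text)    # the pattern occurs in the text
missing = re.search(r"slow", text)   # the pattern does not occur

print(found)    # a match object
print(missing)  # None

# A match object is truthy and None is falsy, so the result can be
# tested directly in an if-statement.
if found:
    print("found at", found.span())  # found at (4, 9)
```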

Regular expressions can be used to answer questions such as:

Is this string a valid URL?
Which users in /etc/passwd belong to a given group?
What are the date and time of all warning messages in a log file?
Which username and document were requested by the URL a visitor typed?
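
To make one of these questions concrete, here is a rough sketch that pulls the date and time out of the warning messages in a few log lines. The log layout ("YYYY-MM-DD HH:MM:SS LEVEL message") and the sample lines are assumptions made for this example, not a format defined anywhere in this article.

```python
import re

# Hypothetical log lines; the layout is an assumption for this sketch.
log_lines = [
    "2023-01-05 10:15:32 INFO Service started",
    "2023-01-05 10:17:02 WARNING Disk usage at 91%",
    "2023-01-05 10:20:45 ERROR Connection lost",
    "2023-01-06 08:01:10 WARNING Retrying connection",
]

# Group 1 captures the date and group 2 the time, but only on lines
# whose level is WARNING.
warning_re = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) WARNING\b")

warnings = []
for line in log_lines:
    m = warning_re.search(line)
    if m:
        warnings.append((m.group(1), m.group(2)))

print(warnings)
# [('2023-01-05', '10:17:02'), ('2023-01-06', '08:01:10')]
```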

Matching patterns

Regular expressions are a complex mini-language. They rely on special characters to match unknown strings, but let's start with literal characters such as letters, digits, and the space character, which always match themselves. Here is a basic example:

# The 're' module provides regular expression support
import re

search_string = "TutorialsPoint"
pattern = "Tutorials"
# re.match() only looks for the pattern at the beginning of the string
match = re.match(pattern, search_string)
# The if-statement tests whether the match succeeded
if match:
    print("regex matches:", match.group())
else:
    print("pattern not found")

Result

regex matches: Tutorials
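
The example above uses re.match(), which succeeds here only because "Tutorials" sits at the very start of the string. The short sketch below contrasts it with re.search() on the same string; the variable names are illustrative only.

```python
import re

search_string = "TutorialsPoint"

# re.match() only tries to match at position 0 of the string.
at_start = re.match("Point", search_string)
print(at_start)  # None, because "Point" is not at the start

# re.search() scans forward and can match anywhere in the string.
anywhere = re.search("Point", search_string)
print(anywhere.group(), anywhere.start())  # Point 9
```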

Matching a string

Python's "re" module has many functions. To test whether a particular regular expression matches a specific string, you can use re.search(). The match object it returns (re.Match in current Python 3 versions; older documentation calls it MatchObject) provides additional information, such as which part of the string matched.

Syntax

matchObject = re.search(pattern, input_string, flags=0)

Example

# The 're' module provides regular expression support
import re

# Use a regular expression to match a date string.
regex = r"([a-zA-Z]+) (\d+)"
match = re.search(regex, "Jan 2")
if match:
    # Prints "0, 5": the match covers the half-open range [0, 5),
    # which here is the whole string.
    print("Match at index %s, %s" % (match.start(), match.end()))
    # The groups contain the matched values:
    # match.group(0), same as match.group(), is the full match;
    # match.group(1), match.group(2), ... are the capture groups,
    # numbered from left to right.
    print("Full match: %s" % match.group(0))  # prints "Jan 2"
    print("Month: %s" % match.group(1))       # prints "Jan"
    print("Day: %s" % match.group(2))         # prints "2"
else:
    # re.search() returns None when the pattern does not match
    print("Pattern not found!")

Result

Match at index 0, 5
Full match: Jan 2
Month: Jan
Day: 2

Because re.search() stops after the first match, it is better suited to testing a regular expression than to extracting data.
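
To illustrate the difference, the sketch below runs the same date pattern over a string that contains several dates. re.search() reports only the first one, while re.finditer() yields a match object for every occurrence. The input string is made up for this example.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"
text = "Jan 2, Feb 14 and Mar 30"  # hypothetical input with several dates

# re.search() stops at the first match.
first = re.search(regex, text)
print(first.group(0))  # Jan 2

# re.finditer() keeps going and yields one match object per occurrence.
all_dates = [m.group(0) for m in re.finditer(regex, text)]
print(all_dates)  # ['Jan 2', 'Feb 14', 'Mar 30']
```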

Capturing groups

If the pattern contains two or more parenthesized groups, re.findall() returns a list of tuples instead of a list of strings. Each match is represented by one tuple, and each tuple holds the group(1), group(2), ... data for that match.

import re

regex = r'([\w\.-]+)@([\w\.-]+)'
text = 'hello john@hotmail.com, hello@Tutorialspoint.com, hello python@gmail.com'
# With two groups, findall() returns a list of (username, host) tuples
matches = re.findall(regex, text)
print(matches)
for username, host in matches:
    print("Username:", username)
    print("Host:", host)

Result

[('john', 'hotmail.com'), ('hello', 'Tutorialspoint.com'), ('python', 'gmail.com')]
Username: john
Host: hotmail.com
Username: hello
Host: Tutorialspoint.com
Username: python
Host: gmail.com
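
What re.findall() returns depends on how many groups the pattern has. The following sketch applies three variants of the e-mail pattern to the same made-up string, with zero, one, and two groups.

```python
import re

text = "hello john@hotmail.com, hello python@gmail.com"

# No groups: each item is the whole match.
whole = re.findall(r"[\w.-]+@[\w.-]+", text)
print(whole)  # ['john@hotmail.com', 'python@gmail.com']

# One group: each item is the text of that group.
users = re.findall(r"([\w.-]+)@[\w.-]+", text)
print(users)  # ['john', 'python']

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"([\w.-]+)@([\w.-]+)", text)
print(pairs)  # [('john', 'hotmail.com'), ('python', 'gmail.com')]
```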

Finding and replacing strings

Another common task is to find every occurrence of a pattern in a given string and replace it; re.sub(pattern, replacement, string) does exactly that. For example, to replace every e-mail domain in a string with a new one:

Code

# Required library
import re

# Given string
text = 'hello john@hotmail.com, hello@Tutorialspoint.com, hello python@gmail.com, Hello World!'
# Pattern to match
pattern = r'([\w\.-]+)@([\w\.-]+)'
# Replacement text: \1 stands for group(1) and \2 for group(2) of each match,
# so this keeps the username and swaps in a new domain
replace = r'\1@XYZ.com'
# re.sub(pattern, replacement, string) returns a new string with all replacements made
print(re.sub(pattern, replace, text))

Result

hello john@XYZ.com, hello@XYZ.com, hello python@XYZ.com, Hello World!
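
Two further options of re.sub() are sometimes useful: the replacement can be a function that receives each match object, and the count argument limits how many replacements are made. The sketch below shows both; the input string and the helper name lowercase_host are invented for this example.

```python
import re

text = "hello john@Hotmail.COM, hello python@GMail.com"  # hypothetical input
pattern = r"([\w.-]+)@([\w.-]+)"

def lowercase_host(match):
    # Keep the username (group 1) as-is and lower-case the host (group 2).
    return match.group(1) + "@" + match.group(2).lower()

# A function as the replacement is called once per match.
result = re.sub(pattern, lowercase_host, text)
print(result)  # hello john@hotmail.com, hello python@gmail.com

# count=1 replaces only the first occurrence.
first_only = re.sub(pattern, r"\1 at \2", text, count=1)
print(first_only)  # hello john at Hotmail.COM, hello python@GMail.com
```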

re option flags

As shown above, Python regular expressions accept various options that modify how pattern matching behaves. These options are passed as an extra, optional flags argument to search(), findall(), and similar functions, for example re.search(pattern, string, re.IGNORECASE).

IGNORECASE

As the name suggests, re.IGNORECASE makes the pattern case-insensitive, so that a pattern containing 'a' matches both 'a' and 'A' in the string.
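
A small sketch of the effect, using a made-up string:

```python
import re

text = "Python is fun"

# Without the flag, "python" does not match "Python".
strict = re.search("python", text)
print(strict)  # None

# With re.IGNORECASE, upper and lower case are treated as equal.
relaxed = re.search("python", text, re.IGNORECASE)
print(relaxed.group())  # Python
```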

DOTALL

re.DOTALL allows the dot (.) metacharacter to match every character, including the newline (\n).
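
The following sketch shows the same pattern with and without the flag on a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# By default, "." stops at the newline.
default = re.search(r"first.*", text).group()
print(repr(default))  # 'first line'

# With re.DOTALL, "." also matches the newline, so ".*" runs to the end.
dotall = re.search(r"first.*", text, re.DOTALL).group()
print(repr(dotall))  # 'first line\nsecond line'
```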

MULTILINE

re.MULTILINE allows ^ and $ to match at the start and end of each line of a string. Normally, however, ^ and $ match only at the start and end of the whole string ($ also matches just before a trailing newline).
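
As an illustration, the sketch below collects the first and last word of each line in a made-up three-line string, with and without the flag:

```python
import re

text = "alpha one\nbeta two\ngamma three"

# Without the flag, ^ matches only at the start of the whole string.
first_default = re.findall(r"^\w+", text)
print(first_default)  # ['alpha']

# With re.MULTILINE, ^ matches at the start of every line.
first_multi = re.findall(r"^\w+", text, re.MULTILINE)
print(first_multi)  # ['alpha', 'beta', 'gamma']

# The same applies to $ and the end of each line.
last_default = re.findall(r"\w+$", text)
last_multi = re.findall(r"\w+$", text, re.MULTILINE)
print(last_default)  # ['three']
print(last_multi)    # ['one', 'two', 'three']
```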