How to load data from a SQL database into pandas in Python?

Introduction

In real-world corporate settings, most data is not stored in text or Excel files. SQL-based relational databases such as Oracle, SQL Server, PostgreSQL and MySQL are widely used, and several alternative databases have also become quite popular.

The choice of database usually depends on an application's requirements for performance, data integrity and scalability.

How to do it...

In this example, we will create a SQLite database with Python's sqlite3 module. sqlite3 is part of the standard library, so it is normally installed along with Python and needs no separate installation; if you are unsure, run the snippet below to check. We will also import pandas.

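Loading data from SQL into a DataFrame is straightforward: pandas provides DataFrame.to_sql to write a DataFrame to a database table, and pd.read_sql_query (or the more general pd.read_sql) to run a query and return the result as a DataFrame.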
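A minimal sketch of the two pandas functions this recipe relies on, assuming an in-memory SQLite database (":memory:") so that no file is created. The table name demo and its columns are hypothetical and exist only for illustration.

```python
import sqlite3

import pandas as pd

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")

# Hypothetical example data, used only for this sketch.
demo = pd.DataFrame({"id": [1, 2, 3], "name": ["x", "y", "z"]})

# Write the DataFrame to a table named "demo", without the DataFrame index.
demo.to_sql("demo", conn, index=False)

# Read it back; query parameters are passed separately (sqlite3 uses "?" placeholders).
loaded = pd.read_sql_query("SELECT id, name FROM demo WHERE id >= ?", conn, params=(2,))
print(loaded)

conn.close()
```

Passing values through params rather than formatting them into the SQL string lets the database driver handle quoting.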

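import sqlite3
import pandas as pd
# sqlite3.version is the version of the sqlite3 module itself, not of the SQLite library.
# It is deprecated since Python 3.12 and removed in newer releases; use sqlite3.sqlite_version there.
print(sqlite3.version)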

Output

2.6.0
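If you are not sure whether your Python build includes sqlite3 (some minimal or custom builds omit it), a defensive check might look like the sketch below. It prints sqlite3.sqlite_version, the version of the underlying SQLite library, which is a different number from the module version printed above.

```python
try:
    import sqlite3
except ImportError:  # the module is missing from this Python build
    sqlite3 = None

if sqlite3 is None:
    print("sqlite3 is not available in this Python installation")
else:
    # Version string of the SQLite C library that Python is linked against.
    print(sqlite3.sqlite_version)
    # The same version as a tuple of integers, convenient for comparisons.
    print(sqlite3.sqlite_version_info >= (3, 0, 0))
```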


# connection object
conn = sqlite3.connect("example.db")
# customers data
customers = pd.DataFrame({
"customerID" : ["a1", "b1", "c1", "d1"]
, "firstName" : ["Person1", "Person2", "Person3", "Person4"]
, "state" : ["VIC", "NSW", "QLD", "WA"]
})
print(f"*** Customers info -\n{customers}")

Output

*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA

# orders data
orders = pd.DataFrame({
"customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"]
, "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})

print(f"*** orders info -\n{orders}")

Output

*** orders info -
  customerID    productName
0         a1      road bike
1         a1  mountain bike
2         a1         helmet
3         d1         gloves
4         c1      road bike
5         c1        glasses

# write to the db
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", conn, if_exists="replace", index=False)
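The if_exists="replace" argument drops and recreates the table on every run, which is why the recipe can be re-run safely; "append" adds rows to an existing table instead, and the default "fail" raises ValueError if the table already exists. The sketch below, assuming an in-memory database and a hypothetical table t, shows the three behaviours and lists the tables through SQLite's sqlite_master catalog.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"customerID": ["a1", "b1"]})  # hypothetical two-row table

df.to_sql("t", conn, index=False)                      # default if_exists="fail": creates the table
df.to_sql("t", conn, if_exists="append", index=False)  # adds two more rows
rows_after_append = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

df.to_sql("t", conn, if_exists="replace", index=False)  # drops and recreates the table
rows_after_replace = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

try:
    df.to_sql("t", conn, index=False)  # the table exists and if_exists defaults to "fail"
except ValueError as exc:
    print("fail:", exc)

# sqlite_master lists every table in the database file.
tables = [name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(rows_after_append, rows_after_replace, tables)
conn.close()
```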


# write a SQL query that counts the products ordered by each customer.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName
order by orders.customerID;
"""


# run the sql and load the result into a DataFrame.
result = pd.read_sql_query(q, con=conn)
print(result)
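To sanity-check the SQL, the same aggregation can be computed in pandas alone with DataFrame.merge and groupby. This is an alternative route shown only for comparison, not part of the original recipe; it rebuilds the two DataFrames so that the sketch runs on its own.

```python
import pandas as pd

customers = pd.DataFrame({
    "customerID": ["a1", "b1", "c1", "d1"],
    "firstName": ["Person1", "Person2", "Person3", "Person4"],
    "state": ["VIC", "NSW", "QLD", "WA"],
})
orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})

# Equivalent of "orders LEFT JOIN customers ON customerID".
joined = orders.merge(customers, on="customerID", how="left")

# Count order rows per customer, like COUNT(*) ... GROUP BY (groupby sorts the keys).
summary = (
    joined.groupby(["customerID", "firstName"])
    .size()
    .reset_index(name="productQuantity")
)
print(summary)
```

Customer b1 (Person2) has no orders, so it appears in neither result: the join starts from orders, and only customers that occur in orders are counted.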

Example

Putting it all together.

import sqlite3
import pandas as pd
print(sqlite3.version)  # deprecated since Python 3.12; sqlite3.sqlite_version gives the SQLite library version
# connection object
conn = sqlite3.connect("example.db")
# customers data
customers = pd.DataFrame({
"customerID" : ["a1", "b1", "c1", "d1"]
, "firstName" : ["Person1", "Person2", "Person3", "Person4"]
, "state" : ["VIC", "NSW", "QLD", "WA"]
})

print(f"*** Customers info -\n{customers}")

# orders data
orders = pd.DataFrame({
"customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"]
, "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})

print(f"*** orders info -\n{orders}")

# write to the db
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", conn, if_exists="replace", index=False)

# write a SQL query that counts the products ordered by each customer.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
on orders.customerID = customers.customerID
group by orders.customerID, customers.firstName
order by orders.customerID;
"""

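# run the sql and load the result into a DataFrame.
result = pd.read_sql_query(q, con=conn)
print(result)

# close the connection when finished.
conn.close()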

Output

2.6.0
*** Customers info -
  customerID firstName state
0         a1   Person1   VIC
1         b1   Person2   NSW
2         c1   Person3   QLD
3         d1   Person4    WA
*** orders info -
  customerID    productName
0         a1      road bike
1         a1  mountain bike
2         a1         helmet
3         d1         gloves
4         c1      road bike
5         c1        glasses
  customerID firstName productQuantity
0         a1   Person1               3
1         c1   Person3               2
2         d1   Person4               1
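One caveat when tidying up: using a sqlite3 connection in a with statement only commits or rolls back the current transaction; it does not close the connection. A minimal sketch that also closes it wraps the connection in contextlib.closing. It assumes an in-memory database so it runs on its own; replacing ":memory:" with "example.db" would target the file used above.

```python
import sqlite3
from contextlib import closing

import pandas as pd

orders = pd.DataFrame({
    "customerID": ["a1", "a1", "a1", "d1", "c1", "c1"],
    "productName": ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"],
})

# closing() calls conn.close() on exit; the inner "with conn" commits the transaction.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:
        orders.to_sql("orders", conn, if_exists="replace", index=False)
    counts = pd.read_sql_query(
        "SELECT customerID, COUNT(*) AS productQuantity "
        "FROM orders GROUP BY customerID ORDER BY customerID",
        conn,
    )

# The DataFrame stays usable after the connection has been closed.
print(counts)
```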