पायथन का उपयोग करके टेक्स्ट डेटा को आयामी वैक्टर में कैसे एम्बेड किया जा सकता है?

Tensorflow एक मशीन लर्निंग फ्रेमवर्क है जो Google द्वारा प्रदान किया जाता है। यह एक ओपन-सोर्स फ्रेमवर्क है जिसका उपयोग पायथन के संयोजन में एल्गोरिदम, गहन शिक्षण अनुप्रयोगों और बहुत कुछ को लागू करने के लिए किया जाता है। इसका उपयोग अनुसंधान और उत्पादन उद्देश्यों के लिए किया जाता है।

केरस को प्रोजेक्ट ONEIROS (ओपन-एंडेड न्यूरो-इलेक्ट्रॉनिक इंटेलिजेंट रोबोट ऑपरेटिंग सिस्टम) के लिए अनुसंधान के एक भाग के रूप में विकसित किया गया था। केरस एक डीप लर्निंग एपीआई है, जिसे पायथन में लिखा गया है। यह एक उच्च-स्तरीय एपीआई है जिसमें एक उत्पादक इंटरफ़ेस है जो मशीन सीखने की समस्याओं को हल करने में मदद करता है। यह Tensorflow ढांचे के शीर्ष पर चलता है। इसे त्वरित तरीके से प्रयोग में मदद करने के लिए बनाया गया था। यह आवश्यक सार तत्व और बिल्डिंग ब्लॉक्स प्रदान करता है जो मशीन लर्निंग सॉल्यूशंस को विकसित करने और इनकैप्सुलेट करने के लिए आवश्यक हैं।

केरस पहले से ही Tensorflow पैकेज में मौजूद है। इसे कोड की नीचे दी गई लाइन का उपयोग करके एक्सेस किया जा सकता है।

import tensorflow
from tensorflow import keras

केरस कार्यात्मक एपीआई ऐसे मॉडल बनाने में मदद करता है जो अनुक्रमिक एपीआई का उपयोग करके बनाए गए मॉडल की तुलना में अधिक लचीले होते हैं। कार्यात्मक एपीआई उन मॉडलों के साथ काम कर सकता है जिनमें गैर-रेखीय टोपोलॉजी है, परतों को साझा कर सकते हैं और कई इनपुट और आउटपुट के साथ काम कर सकते हैं। एक गहन शिक्षण मॉडल आमतौर पर एक निर्देशित चक्रीय ग्राफ (DAG) होता है जिसमें कई परतें होती हैं। कार्यात्मक एपीआई परतों का ग्राफ बनाने में मदद करता है।

हम नीचे दिए गए कोड को चलाने के लिए Google सहयोग का उपयोग कर रहे हैं। Google Colab या Colaboratory ब्राउज़र पर पायथन कोड चलाने में मदद करता है और इसके लिए शून्य कॉन्फ़िगरेशन और GPU (ग्राफ़िकल प्रोसेसिंग यूनिट) तक मुफ्त पहुंच की आवश्यकता होती है। जुपिटर नोटबुक के ऊपर कोलैबोरेटरी बनाई गई है। निम्नलिखित कोड स्निपेट है जिसमें हम शीर्षक में प्रत्येक शब्द को 64-आयामी वेक्टर में एम्बेड करेंगे -

उदाहरण

print("Number of unique issue tags")
num_tags = 12
print("Size of vocabulary while preprocessing text data")
num_words = 10000
print("Number of classes for predictions")
num_classes = 4
title_input = keras.Input(
   shape=(None,), name="title"
)
print("Variable length int sequence")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(
   shape=(num_tags,), name="tags"
)
print("Embed every word in the title to a 64-dimensional vector")
title_features = layers.Embedding(num_words, 64)(title_input)
print("Embed every word into a 64-dimensional vector")
body_features = layers.Embedding(num_words, 64)(body_input)
print("Reduce sequence of embedded words into single 128-dimensional vector")
title_features = layers.LSTM(128)(title_features)
print("Reduce sequence of embedded words into single 132-dimensional vector")
body_features = layers.LSTM(32)(body_features)
print("Merge available features into a single vector by concatenating it")
x = layers.concatenate([title_features, body_features, tags_input])
print("Use logistic regression to predict the features")
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)
print("Instantiate a model that predicts priority and class")
model = keras.Model(
   inputs=[title_input, body_input, tags_input],
   outputs=[priority_pred, department_pred],
)

कोड क्रेडिट - https://www.tensorflow.org/guide/keras/functional

आउटपुट

Number of unique issue tags
Size of vocabulary while preprocessing text data
Number of classes for predictions
Variable length int sequence
Embed every word in the title to a 64-dimensional vector
Embed every word into a 64-dimensional vector
Reduce sequence of embedded words into single 128-dimensional vector
Reduce sequence of embedded words into single 132-dimensional vector
Merge available features into a single vector by concatenating it
Use logistic regression to predict the features
Instantiate a model that predicts priority and class

स्पष्टीकरण

कार्यात्मक एपीआई का उपयोग कई इनपुट और आउटपुट के साथ काम करने के लिए किया जा सकता है।
यह क्रमिक API के साथ नहीं किया जा सकता है।