An Insight into 26 Big Data Analytic Techniques: Part 2

So far in my blogs on Big Data, I have introduced you to the various aspects of Big Data, what it actually means, and its dos and don'ts. In the previous blog we looked at some Big Data analytics techniques. This blog continues that list.

Pattern Recognition
Pattern recognition is a branch of machine learning that focuses on recognizing patterns and regularities in data, although in some cases it is treated as nearly synonymous with machine learning. Pattern recognition systems are in many cases trained on labeled "training" data (supervised learning), but when no labeled data is available, other algorithms can be used to discover previously unknown patterns (unsupervised learning).

Predictive Modeling
Predictive analytics comprises a variety of techniques that predict future outcomes based on historical and current data. In practice, predictive analytics can be applied to almost any discipline, from predicting the failure of jet engines based on the stream of data from several thousand sensors to predicting customers' next moves based on what they buy and even what they say on social media. Predictive analytics techniques are primarily based on statistical methods.

Regression Analysis
Regression analysis is a technique that takes independent variables and describes how they affect dependent variables. It can be very useful in social media analytics, for example in estimating the likelihood of finding love on an internet platform.
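To make the idea concrete, here is a minimal sketch of ordinary least squares with a single independent variable, written in plain Python. The data set (messages sent versus replies received) and the function name `fit_line` are invented for illustration and do not come from any real platform.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: messages sent per week (independent variable)
# and replies received per week (dependent variable).
messages = [1, 2, 3, 4, 5]
replies = [3, 5, 7, 9, 11]

slope, intercept = fit_line(messages, replies)
print(f"replies ~= {slope:.2f} * messages + {intercept:.2f}")
```

The slope estimates how much the dependent variable changes per unit change in the independent variable; real analyses would typically use several independent variables and check the quality of the fit.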

Sentiment Analysis
Sentiment analysis helps researchers determine the sentiments of speakers or writers with respect to a topic. It is being used to help:
Improve service at a hotel chain by analyzing guest comments.

Customize incentives and services to address what customers are really asking for.

Determine what consumers really think based on opinions from social media.
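As a rough illustration of the hotel-comment use case, the sketch below scores guest comments with a tiny word list. The lexicon, the sample comments, and the function name `sentiment_score` are all assumptions made for this example; production systems rely on much larger lexicons or trained models.

```python
# Hypothetical mini-lexicon; real systems use far richer resources.
POSITIVE = {"clean", "friendly", "great", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "slow"}


def sentiment_score(text):
    """Count positive words minus negative words in a comment."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


# Invented guest comments for illustration.
reviews = [
    "The staff were friendly and the room was clean.",
    "Noisy room, slow and rude service!",
]

scores = [sentiment_score(r) for r in reviews]
for review, score in zip(reviews, scores):
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{label:8} ({score:+d}): {review}")
```

A word-count approach like this ignores negation and sarcasm, which is one reason it should be read as a starting point only.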

Signal Processing
Signal processing is an enabling technology that encompasses the fundamental theory, applications, algorithms, and implementations of processing or transferring information contained in many different physical, symbolic, or abstract formats, broadly designated as signals. It uses mathematical, statistical, computational, heuristic, and linguistic representations, formalisms, and techniques for representation, modeling, analysis, synthesis, discovery, recovery, sensing, acquisition, extraction, learning, security, or forensics. Sample applications include modeling for time series analysis, or implementing data fusion to determine a more precise reading by combining data from a set of less precise data sources (i.e., extracting the signal from the noise).
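One simple form of data fusion is inverse-variance weighting, sketched below. It assumes the sensors are unbiased and their errors are independent; the readings, the variances, and the function name `fuse` are hypothetical.

```python
def fuse(readings, variances):
    """Combine noisy readings by weighting each with 1 / variance.

    Assumes unbiased sensors with independent errors.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, readings)) / total
    fused_variance = 1.0 / total  # variance of the combined estimate
    return estimate, fused_variance


# Hypothetical temperature readings from three sensors of differing quality.
readings = [10.2, 9.8, 10.5]
variances = [0.4, 0.1, 0.9]

estimate, fused_variance = fuse(readings, variances)
print(f"fused estimate: {estimate:.3f}, variance: {fused_variance:.4f}")
```

Under these assumptions the fused variance is smaller than the variance of even the best single sensor, which is the sense in which combining less precise sources yields a more precise reading.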

Spatial Analysis
Spatial analysis is the process by which we turn raw data into useful information. It examines the locations, attributes, and relationships of features in spatial data through overlay and other analytical techniques in order to address a question or gain useful knowledge. In doing so, spatial analysis extracts or creates new information from spatial data.

Statistics
In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modelling or hypothesis testing task. Statistical techniques are also used to reduce the likelihood of Type I errors (“false positives”) and Type II errors (“false negatives”). An example of an application is A/B testing to determine what types of marketing material will most increase revenue.
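An A/B test on conversion rates is often evaluated with a two-proportion z-test. The sketch below uses only the standard library; the visitor and conversion counts are invented, and `two_proportion_z_test` is a name chosen for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: variant A converts 200 of 5000, variant B 260 of 5000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Choosing a significance threshold before the test caps the rate of Type I errors, while a larger sample size lowers the chance of a Type II error.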

Supervised Learning
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
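A nearest-neighbor classifier is one of the simplest ways to see the input/output pairs at work. The sketch below is illustrative only; the training pairs, labels, and the function name `predict_1nn` are assumptions.

```python
def predict_1nn(training, x):
    """Return the label of the training example closest to x."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(training, key=lambda example: squared_distance(example[0], x))
    return nearest[1]


# Hypothetical labeled examples: (input vector, desired output value).
training = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.2), "large"),
    ((4.8, 5.1), "large"),
]

# The inferred function maps new, unseen inputs to labels.
print(predict_1nn(training, (1.1, 0.9)))
print(predict_1nn(training, (5.0, 5.0)))
```

Here the "inferred function" is simply a lookup of the closest labeled example; other algorithms fit explicit parameters instead.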

Social Network Analysis
Social network analysis is a technique that was first used in the telecommunications industry, and then quickly adopted by sociologists to study interpersonal relationships. It is now being applied to analyze the relationships between people in many fields and commercial activities. Nodes represent individuals within a network, while ties represent the relationships between the individuals.
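The node-and-tie picture can be sketched with degree centrality, a basic measure of how connected each individual is. The names and ties below are made up, and `degree_centrality` is written here from scratch rather than taken from a library.

```python
from collections import defaultdict


def degree_centrality(ties):
    """Share of the other nodes that each node is directly tied to."""
    neighbors = defaultdict(set)
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical undirected ties between four individuals.
ties = [("Ana", "Ben"), ("Ana", "Chi"), ("Ana", "Dev"), ("Ben", "Chi")]

centrality = degree_centrality(ties)
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

In this toy network Ana is tied to everyone else, so she has the highest centrality.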

Simulation
Simulation models the behavior of complex systems and is often used for forecasting, prediction, and scenario planning. Monte Carlo simulations, for example, are a class of algorithms that rely on repeated random sampling, i.e., running thousands of simulations, each based on different assumptions. The result is a histogram that gives a probability distribution of outcomes. One application is assessing the likelihood of meeting financial targets given uncertainties about the success of various initiatives.
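The financial-target example can be sketched as follows. The initiatives, their success probabilities and payoffs, the target, and the function name are all hypothetical; the random seed is fixed only so that runs are repeatable.

```python
import random


def probability_of_meeting_target(initiatives, target, trials=10_000, seed=0):
    """Estimate P(total payoff >= target) by repeated random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each initiative independently succeeds with its own probability.
        revenue = sum(
            payoff for p_success, payoff in initiatives if rng.random() < p_success
        )
        if revenue >= target:
            hits += 1
    return hits / trials


# Hypothetical initiatives: (probability of success, payoff if it succeeds).
initiatives = [(0.8, 50), (0.5, 100), (0.3, 200)]

prob = probability_of_meeting_target(initiatives, target=150)
print(f"Estimated probability of meeting the target: {prob:.3f}")
```

For this small case the exact answer can be worked out by hand (0.3 + 0.7 × 0.8 × 0.5 = 0.58), which gives a useful check on the estimate; simulation becomes valuable when the number of uncertain inputs makes such enumeration impractical.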

Time Series Analysis
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series data often arise when monitoring industrial processes or tracking corporate business metrics. Time series analysis recognizes that data points taken over time may have an internal structure (such as autocorrelation, trend, or seasonal variation) that should be accounted for. Examples of time series data include the hourly value of a stock market index or the number of patients diagnosed with a given condition every day.
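Autocorrelation, one of the internal structures mentioned above, can be computed directly. The sketch below uses an invented series with a repeating pattern of period 4; `autocorrelation` is a plain-Python helper written for this example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical metric with a seasonal pattern that repeats every 4 steps.
series = [10, 20, 30, 20] * 6

for lag in (1, 2, 4):
    print(f"lag {lag}: {autocorrelation(series, lag):+.2f}")
```

A strong positive value at lag 4 and a strong negative value at lag 2 reflect the seasonal cycle, which a model that treats the points as independent would miss.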

Unsupervised Learning
Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution – this distinguishes unsupervised learning from supervised learning and reinforcement learning.

Unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data.
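Clustering is a common example of finding hidden structure without labels. The sketch below runs a bare-bones one-dimensional k-means; the data points, the starting centers, and the function name `kmeans_1d` are assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Very small k-means for 1-D data with the given starting centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled measurements that form two natural groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]

centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied at any point; the grouping emerges from the distances between the data points alone.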

Visualization
Data visualization is the preparation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.

Conclusion

Big Data analytics has been one of the most important breakthroughs in the information technology industry. Big Data has shown its importance in almost every sector and in every department within those industries. Hardly any aspect of life, including our personal lives, has been left unaffected by Big Data. Hence we need Big Data analytics to manage these huge amounts of data efficiently.

As said before, this list is not exhaustive. Researchers are still experimenting with new ways of analyzing these huge amounts of data, which come in a variety of forms and are generated at an ever-increasing speed, to derive value for specific uses.
