Data mining is the process of extracting hidden knowledge from large volumes of raw data stored in databases or in files such as Excel sheets. It enables organizations to uncover and understand hidden patterns in vast databases using their current reporting capabilities; these patterns are then built into data mining models and applied to predict individual behavior and performance, often with high accuracy. The methods used lie at the intersection of artificial intelligence, machine learning, statistics, database systems and business intelligence. Data mining is about solving problems by analyzing data already present in databases.
Data mining comprises five basic elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data with application software.
- Present the data in a useful format, such as a graph or table.
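The five elements above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the transaction records are made up, and an in-memory SQLite table stands in for the data warehouse and multidimensional database that the text describes.

```python
import sqlite3

# Hypothetical raw transaction records, made up for this illustration.
RAW_ROWS = [
    {"customer": "c1", "product": "book", "amount": "12"},
    {"customer": "c2", "product": "book", "amount": "8"},
    {"customer": "c1", "product": "pen", "amount": "3"},
    {"customer": "c3", "product": "pen", "amount": "n/a"},  # unusable record
]

def extract():
    """Extract: read the raw records from the source."""
    return list(RAW_ROWS)

def transform(rows):
    """Transform: drop records with a non-numeric amount and convert types."""
    clean = []
    for row in rows:
        if row["amount"].isdigit():
            clean.append((row["customer"], row["product"], int(row["amount"])))
    return clean

def load(conn, rows):
    """Load and store: write the cleaned rows into a table.
    SQLite is an assumed stand-in for a real data warehouse."""
    conn.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def analyze(conn):
    """Access and analyze: total amount sold per product."""
    cur = conn.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    )
    return cur.fetchall()

def present(result):
    """Present: format the result as a small text table."""
    lines = [f"{'product':<10}{'total':>8}"]
    for product, total in result:
        lines.append(f"{product:<10}{total:>8}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
summary = analyze(conn)
print(present(summary))
```

In a real deployment each step would be a separate system; the sketch only shows the order in which the data moves through them.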
The purpose of a data mining effort is usually to create either a descriptive model, which concisely presents the main characteristics of a data set, or a predictive model, which allows the data miner to predict an unknown value of a specific variable.
Classification in data mining requires two steps: a model is first constructed from the training data set, and that model is then used to assign a class label to an unknown tuple.
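The two steps can be illustrated with a very small classifier. The sketch below assumes a nearest-centroid model, which is just one simple choice and is not named in the text; the attributes (hours studied, attendance rate) and all tuples are hypothetical.

```python
from collections import defaultdict
import math

def build_model(training_tuples):
    """Step 1: learn one centroid (mean attribute vector) per class label."""
    grouped = defaultdict(list)
    for attributes, label in training_tuples:
        grouped[label].append(attributes)
    model = {}
    for label, rows in grouped.items():
        model[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return model

def classify(model, attributes):
    """Step 2: assign the class label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], attributes))

def accuracy(model, labelled_tuples):
    """Fraction of labelled tuples that the model classifies correctly."""
    correct = sum(classify(model, a) == y for a, y in labelled_tuples)
    return correct / len(labelled_tuples)

# Hypothetical tuples: (hours studied per week, attendance rate) -> outcome.
training = [((2.0, 0.50), "fail"), ((3.0, 0.60), "fail"),
            ((8.0, 0.90), "pass"), ((9.0, 0.95), "pass")]
held_out = [((1.0, 0.40), "fail"), ((10.0, 0.90), "pass")]

model = build_model(training)
print(accuracy(model, held_out))     # check the model before relying on it
print(classify(model, (7.5, 0.85)))  # classify an unknown tuple
```

Checking accuracy on tuples that were not used for training gives a rough indication of whether the model is good enough to apply to unknown tuples.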
Methodologies of Data Mining
Neural Network
An artificial neural network, often just called a neural network, is a mathematical model inspired by biological neural networks. It consists of interconnected groups of artificial neurons that process information using a connectionist approach to computation. Neural networks are mainly used to model complex relationships between inputs and outputs or to find patterns in data. The inspiration for neural networks came from the examination of central nervous systems. In an artificial neural network, simple artificial nodes, called ‘neurons’, ‘neurodes’, ‘processing elements’ or ‘units’, are connected together to form a network that mimics a biological neural network. In modern implementations, the biologically inspired approach has been largely abandoned in favor of a more practical approach based on statistics and signal processing. Some systems use neural networks as components of larger systems that combine adaptive and non-adaptive elements. Neural network software typically also lets the user select the network topology, performance parameters, learning rule and stopping criteria.
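As a minimal sketch of these ideas, the code below trains a single artificial neuron with the classic perceptron learning rule. The topology (one unit with two inputs), the learning rate, the step activation and the stopping criterion (an epoch with no errors, or a maximum number of epochs) are all assumptions chosen for illustration; real networks have many units and use other learning rules.

```python
def step(x):
    """Step activation: the unit fires (1) when its input is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    """Output of one artificial neuron for the given inputs."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Perceptron learning rule with a simple stopping criterion."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            delta = target - predict(weights, bias, inputs)
            if delta != 0:
                errors += 1
                # Move the weights toward the correct output.
                weights = [w + learning_rate * delta * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * delta
        if errors == 0:  # stopping criterion: a full pass with no mistakes
            return weights, bias, epoch
    return weights, bias, max_epochs

# Logical AND is linearly separable, so a single unit can learn it.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, epochs = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])  # [0, 0, 0, 1]
```

A single unit like this can only separate classes with a straight line; modelling more complex input-output relationships requires connecting many units into layers.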
Decision Trees
A decision tree is a flowchart-like structure in which each interior node denotes a test on an attribute value, each branch denotes an outcome of the test, and the leaves denote the distribution of classes. It is a predictive model used for classification. A decision tree partitions the input space into cells, each representing one class: each interior node tests the value of some input variable, and its branches are labelled with the possible results of the test. From a business perspective, decision trees can be viewed as creating a segmentation of the original data. It is a favored technique for building understandable models because of its tree structure and its ability to easily generate rules.
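The structure described above, and the way rules fall out of it, can be sketched with a small hand-built tree. The attributes, outcomes and class labels are hypothetical, the tree is written by hand rather than learned from data, and each leaf holds a single class label instead of a class distribution to keep the example short.

```python
# Interior node: (attribute, {outcome: subtree}). Leaf: a class label string.
# This tree is a made-up example, not one induced from real data.
TREE = ("attendance", {
    "high": "pass",
    "low": ("assignments_done", {
        "yes": "pass",
        "no": "fail",
    }),
})

def classify(tree, record):
    """Follow the branch matching each test until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree

def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF ... THEN rule."""
    if isinstance(tree, str):
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {tree}"]
    attribute, branches = tree
    rules = []
    for outcome, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, outcome),)))
    return rules

print(classify(TREE, {"attendance": "low", "assignments_done": "no"}))
for rule in extract_rules(TREE):
    print(rule)
```

Each rule corresponds to one cell of the partitioned input space, which is why a tree can be read directly as a segmentation of the data.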
Genetic Algorithm
A genetic algorithm is a problem-solving strategy that searches for an optimal or near-optimal solution. Genetic algorithms work on the principle of selecting and evolving a population of candidate solutions to a given problem. They are search algorithms based on the concepts of natural selection and genetics, and they form a subset of a much larger branch of computation known as evolutionary computation.
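The select-and-evolve principle can be sketched on a toy problem. The code below assumes a bit-string encoding and the "OneMax" objective (maximize the number of 1 bits); the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are illustrative choices, not ones prescribed by the text.

```python
import random

def fitness(bits):
    """Toy objective (OneMax): count the 1 bits in the candidate."""
    return sum(bits)

def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly drawn candidates."""
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(n_bits=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the best candidate survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(fitness(best), history[0])
```

Because the best candidate is always carried over, the best fitness never decreases from one generation to the next, although reaching the true optimum is not guaranteed.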
Educational Data Mining Methods
Educational data mining (EDM) draws on many powerful methods, some of which, such as clustering and prediction, are widely acknowledged as universal data mining techniques.
- Clustering. The clustering approach is applied to group similar instances so that the resulting clusters are clearly distinct from one another. Once a set of clusters has been determined, new instances are classified by determining the closest cluster.
- Prediction. The main process in prediction is inferring a single aspect of the data from some combination of other aspects. The main types of prediction methods are classification, regression and density estimation.
- Relationship mining. It is used to identify relationships among variables and encode them in rules for later application. Relationship mining is further divided into four broad categories: association rule mining, sequential pattern mining, correlation mining, and causal data mining.
- Classification approach. It is a supervised learning approach. Being a two-step process, it first constructs a model by applying a learning algorithm to training data, that is, data tuples described by a set of attributes. If the accuracy of the model is acceptable, the model can then be used to classify unknown tuples.
- Regression. This approach models the relationship between independent variables and dependent variables. The independent variables are the attributes already known, and the dependent (response) variables are what we want to predict. For some prediction tasks, more complex techniques may be necessary.
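Three of the methods above can be shown in a few lines each. These are minimal sketches under stated assumptions: the cluster centroids are taken as already computed, the relationship-mining function only scores a single association rule by support and confidence, regression is limited to a least-squares straight line with one independent variable, and all data is hypothetical.

```python
import math

def closest_cluster(centroids, instance):
    """Clustering: index of the centroid nearest to a new instance."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], instance))

def rule_support_confidence(transactions, antecedent, consequent):
    """Relationship mining: score the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical centroids, e.g. (logins per week, forum posts per week).
centroids = [(1.0, 1.0), (8.0, 9.0)]
print(closest_cluster(centroids, (7.0, 8.0)))  # 1

# Hypothetical sets of course resources used by four students.
usage = [{"forum", "quiz"}, {"forum", "quiz", "video"}, {"forum"}, {"video"}]
print(rule_support_confidence(usage, {"forum"}, {"quiz"}))

# Hypothetical independent variable (x) and dependent variable (y).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

Support is the share of all records in which the rule holds, and confidence is the share of records containing the antecedent that also contain the consequent; the fitted line can then be used to predict the dependent variable for a new value of the independent variable.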
Applications of Educational Data Mining
Three applications of educational data mining are described below:
- Prediction of students’ performance. This includes predicting which types of students are likely to drop out of school and which are likely to return later. The prediction is based on classification and regression techniques. These models have been reported to predict accurately in the short term.
- Course management system. Course management systems such as Moodle record usage data about different activities, and this data can be analyzed with EDM. Garcia, Romero, Ventura and de Castro developed a simplified data mining toolkit that operates within the course management system and allows students to obtain data mining information for their course.
- Planning and scheduling. Recent research on mobile learning environments suggests that data mining can help provide personalized content to different mobile users, despite the differences between mobile devices and conventional PCs. EDM applications also allow non-technical users to engage with data mining tools and activities, making data processing more accessible to all EDM users.
Challenges
As the related technology develops, costs and challenges arise in implementing EDM applications, such as storing logged data and managing data systems. Deciding which data should be mined and analyzed can also be a challenge in some cases. A continuing concern in the application of EDM tools is individual privacy.
Conclusions and Future Insights
Data mining is a powerful analytical tool that enhances decision making and helps organizations analyze new patterns and relationships. EDM, in turn, draws on techniques from machine learning, data mining and statistics. Opportunities in EDM range from analysis at the organizational level to analysis at the individual level. This paper has presented a description of techniques and algorithms for data mining and educational data mining. In summary, data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses or other information repositories. One of the biggest challenges for data mining technology is managing uncertain data, which may be caused by outdated resources, sampling errors, or imprecise calculations.