Folk literature is an integral part of every society. It reflects the emotions, feelings, and experiences of its members. Assam, in the north-eastern part of India, is well known for its rich and diverse culture, and this richness is easily noticed in its many forms of folk music, since the people of the state belong to different tribes and communities. The variety of music forms found among the people of Assam gives expression to these diverse communities and their traditions. The tunes of most Assamese folk songs are in a pentatonic scale, similar to other traditional music of Asia, such as that of China and Mongolia.
A melody is made up of several musical notes or pitches that are joined together to form one whole. The majority of compositions are made up of several melodies that interact with one another. However, when we refer to a folk song's melody, we mean only the precise note transitions of the folk song form, without any accompaniment. The melodies of a specific class of folk songs have a distinctive note transition pattern that distinguishes them from those of other types. Research has been carried out on melodic similarity in Hindustani classical music using high-level music features.
In this work, melodies of five major varieties of folk songs are selected, and the type of each melody is identified using short-term features. Although this problem and the genre recognition problem are comparable, they are not the same. A genre of a particular class of music is characterized by several elements, such as accompaniment and polyphony. Here the focus is on the melody only, which is a major reason for not considering live music or pre-recorded songs. Moreover, only a few types of folk songs, such as lokageet and Bianam, are available in recorded format. Since these few types are recorded for commercial purposes, rawness is missing in the melodies of the songs. The melodies of the selected folk songs are therefore played on a harmonium with the help of expert musicians, and the samples are recorded.
The mel-frequency cepstrum is found to be highly effective in modeling the subjective pitch and frequency content of audio signals as well as in recognizing the structure of music signals. Based on Mel-frequency Cepstral Coefficients (MFCC), a number of musicological studies have been carried out to identify genre, singer, music style, musical instruments, etc. In this experiment, four popular supervised learning models, namely the Decision Tree Classifier (DTC), Linear Discriminant Analysis (LDA), the Random Forest Classifier, and the Support Vector Machine (SVM), are developed using the MFCC features extracted from the audio signals of the melodies, and the identification performance is assessed using different evaluation techniques.
The festival of
In human life, marriage is a memorable ceremony. The cheerful circumstances of a wedding are the source of
In every society, traditional religious beliefs are deeply rooted in the human mind. In ancient times, Assamese people believed that diseases such as smallpox and chickenpox were caused by an angry divine energy, called
Since ancient times, the people of the Kamrup area of Assam have expressed their feelings and emotions in the form of songs, which are popularly known as
It is a type of folk music sung by the
Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, is a rapidly growing field of research with several real-world applications. The growth of data mining techniques and signal processing has made it easier to study the different features of a piece of music. Classification and identification tasks, such as genre classification, mood identification, and instrument identification, have become very popular in musicological research.
To identify the melody in standard MIDI files, an algorithm based on a Bayesian maximum-likelihood approach and dynamic programming has been used, achieving an overall accuracy of 89%
In today’s machine learning applications, the Support Vector Machine (SVM) is regarded as one of the best algorithms for solving various classification and identification problems. To classify musical instruments, Prabavathy et al. use MFCC features and Sonogram features
The data for this study are the melodies of the selected types of folk songs. To collect the samples, four folk singers were contacted and briefed about our objectives. All of them consented to having their singing recorded. They were requested to sing songs with variation in their melodic structure. In the next stage, with the help of an expert harmonium player, the melodies of the recorded songs were played on a harmonium and recorded in .wav format at a sampling rate of 44100 Hz under the same acoustic environment. To maintain homogeneity, all the melodies were played in the same key or scale (note A) on the same harmonium. A total of 60 songs were recorded in this experiment, out of which 15 songs from
All of the analysis, including feature extraction, is conducted in the Python programming language. From each audio sample, a spectrogram is generated using Matplotlib, a plotting library built on NumPy, the fundamental package for scientific computing in Python. Features are extracted from the spectrograms using librosa, a Python library for music and audio analysis.
A spectrogram is a visual representation of signal strength over time at the various frequencies present in a waveform. The horizontal axis represents time, while the vertical axis represents the frequency of the signal. A third dimension, color, describes the amplitude (or energy) of a particular frequency at a particular time. In this study, a mel spectrogram (with mel frequency bins on the y-axis) is extracted from each of the samples. In
Mel-frequency Cepstral Coefficients (MFCCs), introduced by Davis and Mermelstein in 1980, are a set of features (usually 10 to 20) that are widely used in automatic speech and speaker recognition. The mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, and the coefficients that collectively make up an MFC are called Mel-frequency Cepstral Coefficients. In other words, they are the cepstral representation of a signal in which the frequency bands are distributed according to the mel scale. MFCCs have been applied in a wide range of audio analyses and have shown good performance compared to other features
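To make the phrase "distributed according to the mel scale" concrete, the following is a minimal sketch of how frequency bands can be spaced evenly in mels. It assumes the common conversion formula m = 2595 log10(1 + f / 700); the audio library used in the study may apply a different variant by default. The function names and the choice of 10 bands are illustrative, and the upper limit of 22050 Hz is simply half of the 44100 Hz sampling rate.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (assumed formula: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Return n_bands + 2 edge frequencies (Hz) that are equally spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]

# Illustrative choice: 10 bands between 0 Hz and the Nyquist frequency of 44100 Hz audio.
edges = mel_band_edges(n_bands=10, f_min=0.0, f_max=22050.0)
for low, high in zip(edges, edges[1:]):
    print(f"{low:9.1f} Hz to {high:9.1f} Hz")
```

The printed edges are close together at low frequencies and far apart at high frequencies, which is the property that lets the mel scale approximate perceived pitch.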
A decision tree is a tree-structured, multistage classification strategy in which each internal node represents a test on an attribute and each branch represents an outcome of the test. Each leaf node (or terminal node) holds a class label, i.e. a value of the dependent variable. For a tuple X, the attribute values of the tuple are tested against the decision tree, and a path is traced from the root to a leaf node that holds the tuple's class prediction. A decision tree can be easily converted into a set of classification rules. As a predictive modeling approach, it has wide applications in statistics, machine learning, and data mining.
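The root-to-leaf traversal described above can be sketched as follows. This is a hand-built toy tree, not a tree fitted in the study: the feature names (`mfcc_1`, `mfcc_2`), the thresholds, and the labels (`type_A` to `type_C`) are all hypothetical.

```python
# Hypothetical tree: internal nodes test one attribute, leaves hold a class label.
TREE = {
    "feature": "mfcc_1", "threshold": 0.0,
    "left": {"label": "type_A"},
    "right": {
        "feature": "mfcc_2", "threshold": 1.5,
        "left": {"label": "type_B"},
        "right": {"label": "type_C"},
    },
}

def classify(node, sample):
    """Follow the tests from the root to a leaf; return the label and the path taken."""
    path = []
    while "label" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        op = "<=" if goes_left else ">"
        path.append(f"{node['feature']} {op} {node['threshold']}")
        node = node["left"] if goes_left else node["right"]
    return node["label"], path

label, rule = classify(TREE, {"mfcc_1": 0.7, "mfcc_2": 2.0})
print(label, "because", " and ".join(rule))
```

The recorded path doubles as a classification rule, which illustrates why a decision tree converts so directly into rules.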
Linear Discriminant Analysis or discriminant function analysis is a multivariate technique to find a linear combination of features that characterizes or classifies two or more sets of objects or events. It is also used as a dimensionality reduction technique as a pre-processing step for machine learning. The term discrimination was introduced by R. A. Fisher in the first modern treatment of separative problems
The Support Vector Machine (SVM) is one of the most robust and accurate classification algorithms across a wide range of machine learning applications. It was developed by Vladimir Vapnik and his colleagues at AT&T Bell Laboratories. SVM has a sound theoretical foundation and can be trained with as few as a dozen examples. It determines the hyperplane in the input space that best separates the classes. The algorithm was originally developed for binary classification problems; in the case of multiclass classification, the problem is reduced to multiple binary classification problems.
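The reduction of a multiclass problem to binary problems can be done in more than one way, and the text does not state which is used here. As a minimal sketch, the two most common decompositions (one-vs-one and one-vs-rest) are enumerated below for five classes; the labels `type_1` to `type_5` are placeholders for the five folk song types.

```python
from itertools import combinations

def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

def one_vs_rest_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]

# Placeholder labels standing in for the five folk song types.
classes = ["type_1", "type_2", "type_3", "type_4", "type_5"]

ovo = one_vs_one_tasks(classes)
ovr = one_vs_rest_tasks(classes)
print(len(ovo), "one-vs-one problems;", len(ovr), "one-vs-rest problems")
```

With k classes, one-vs-one yields k(k-1)/2 binary classifiers trained on smaller subsets, while one-vs-rest yields k classifiers each trained on all the data.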
A random forest is a popular supervised machine learning algorithm that can be used for both classification and regression problems. It is an ensemble method that builds multiple decision trees, each on a randomly selected subset of the training samples. Instead of relying on a single decision tree, it combines the predictions of all the trees to improve accuracy: for classification, the final prediction is the class chosen by the majority of trees, and for regression it is the average of the trees' outputs. The algorithm has wide applications in fields such as banking, marketing, medicine, and remote sensing.
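The two ingredients described above, random subsets of the training samples and a majority vote over the trees, can be sketched as follows. The trees themselves are omitted; the training-set size of 10, the seed, and the vote labels are hypothetical values chosen only for illustration.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement: one tree's random training subset."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def majority_vote(tree_predictions):
    """Return the label predicted by most trees (ties go to the label seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)              # fixed seed so the sketch is repeatable
subset = bootstrap_indices(10, rng)  # hypothetical training set of 10 samples
print("indices used by one tree:", subset)

# Hypothetical predictions from five trees for a single melody.
votes = ["type_2", "type_5", "type_2", "type_2", "type_3"]
final_label = majority_vote(votes)
print("forest prediction:", final_label)
```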
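After fitting the models, the following measures are used to evaluate their performance:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × Precision × Recall / (Precision + Recall)
where TP = true positives, FP = false positives, TN = true negatives, and FN = false negatives.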
Precision, also called positive predictive value, is the fraction of relevant instances among the retrieved instances, and recall, or sensitivity, is the fraction of relevant instances that were retrieved. Where false positives and false negatives are equally serious, the decision is based on the F1 score.
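As a small worked example, the measures above can be computed from the four counts for a single class treated as the positive class. The counts used here are made up for illustration and are not results from this study; the function name is likewise hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts: 8 true positives, 2 false positives, 9 true negatives, 1 false negative.
m = binary_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In a multiclass setting such as this one, the same computation is typically repeated per class, treating that class as positive and all others as negative.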
A multivariate normality test is performed using the Henze-Zirkler test
| Model | Average accuracy | Accuracy interval |
| Decision Tree Classifier | 73.58% | 71.54% - 75.62% |
| Linear Discriminant Analysis | 85.58% | 83.65% - 87.51% |
| Support Vector Machine | 94.17% | 92.67% - 95.67% |
| Random Forest Classifier | 86.11% | 84.55% - 87.67% |
In this experiment, the Support Vector Machine identifies the melodies best, with the highest average accuracy of 94.17%, while the Decision Tree Classifier has the lowest average accuracy of 73.58%. The performances of Linear Discriminant Analysis (85.58%) and the Random Forest Classifier (86.11%) are very close to each other.
In most musicological research on genre or melody classification, the features are extracted from the original recorded version of a song or from the written score of the piece. Since most of these folk songs are not available in recorded form, and their note transitions are not available in written form either, the melodies are recorded one by one from primary sources. This sets the experiment apart from others. For feature extraction, instead of being sung, all the melodies of the different types of folk songs are played on the same harmonium in the same key (note A), so that homogeneity is maintained in the other characteristics. This is not possible with the original recorded versions of songs. Although recorded versions of some folk songs are available, they lack rawness due to the use of various modern musical instruments.
To identify melody, Standard MIDI files, as well as symbolic scores, are used
For a specific randomly chosen test sample, the results are visualized in the confusion matrices in
Out of the 19
The classification reports for the four fitted models are shown in
Identifying a folk song melody by ear is not an easy task, although listeners with knowledge of a particular type of folk music can readily identify melodies of that type. In this experiment, an attempt has been made to develop a computational approach to identifying folk song melodies using short-term features instead of the note transitions of the melody. The experiment provides a comparison between the popular classification algorithms DTC, LDA, SVM, and Random Forest Classifier in the identification of solo folk song melodies.
One of the biggest challenges in conducting this experiment was the collection of audio samples for each type of folk song under consideration. Because each melody had to be played on the harmonium after listening to the sung version, obtaining the final audio samples was very tedious. The results indicate that SVM performs considerably better than the other three models. However, increasing the training sample size may improve the performance of the other three models as well. Classification accuracy depends not only on the features selected to fit the models but also on the size of the training set. This work provides a direction for studying the important features in the identification of solo folk song melodies and the optimum size of the training data for each of the models.
We would like to express our sincere gratitude to the folk music experts Mr. Ajit Bora, Mr. Ghana Kanta Saikia, Mr. Sunil Bhuyan, Mrs. Mamaoni Saika, and Mr. Ranjit Bora for their active participation in the data collection process of the experiment. Without their contribution, it would not have been possible to conduct this study.