This dataset combines the MUltiplatform BAngla SEntiment (MUBASE) and SentNob datasets. The SentNob dataset consists of public comments on news and videos collected from social media, covering 13 different domains, including politics, education, and agriculture. The MUBASE dataset is a multiplatform dataset consisting of Tweets and Facebook posts that are manually annotated with sentiment polarity.
The dataset used in this shared task combines data from two distinct sources: MUBASE and SentNob. The SentNob dataset consists of public comments on news and video content, collected from various social media platforms. These comments were curated from 13 diverse domains, such as politics, education, and agriculture. The dataset was manually annotated, and its inter-annotator agreement score of 0.53 indicates moderate agreement. The MUBASE dataset, on the other hand, is a large multi-platform collection of manually annotated Tweets and Facebook posts, each labeled with its sentiment polarity.
This combination offers a rich, diverse, and detailed resource for studying sentiment analysis across various contexts in the Bangla language.
Both MUBASE and SentNob are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
You should have received a copy of the license along with this work. If not, see http://creativecommons.org/licenses/by-nc-sa/4.0/.