Although Nigeria is a relative newcomer to the global AI landscape, young talents such as Azeez Saheed, an undergraduate at the University of Lagos, are charting a new path in the field. Azeez’s work helps lay the foundation for Nigeria’s growing impact in AI.
Azeez recently developed Naijaweb, a dataset of 230 million GPT-2 tokens sourced from Nairaland, one of Nigeria’s largest online forums. His work seeks to improve AI’s understanding of Nigerian language and culture and, in the long term, opens the door to more localised language datasets.
Who built Naijaweb, and what is it?
Naijaweb comprises millions of conversations from the Nairaland forum, organised into categories such as politics and entertainment. Azeez web-scraped the forum’s posts and then tokenised the text so it could be used to train GPT-2 models.
The value of this dataset lies in its ability to familiarise AI models with Nigerian Pidgin (a vernacular language), local slang, and cultural references. These elements are frequently omitted or misinterpreted in datasets compiled globally, so AI systems trained on Naijaweb could provide more accurate, context-aware responses to Nigerian users.
Azeez chose Nairaland over other forums because of its rich and authentic representation of Nigerian people and culture. With millions of users engaging with the platform daily, it captures the speech and concerns of ordinary Nigerians, which makes it a more representative source of data for training AI models to communicate effectively with local populations.
Promoting AI’s localisation in Nigeria
AI technologies are evolving rapidly, but their effectiveness depends on the quality of the data used to train them. Global datasets feed models like ChatGPT, yet they sometimes fail to capture the nuances of specific languages and cultures, including Nigeria’s.
Localised datasets like Naijaweb could address this gap by:
Enhancing AI’s ability to understand and generate Nigerian Pidgin and regional dialects
Reducing the amount of new data that must be preprocessed before models can give users contextually relevant responses
In turn, this could accelerate growth in industries that stand to benefit from localised AI, such as education, customer service, and entertainment.
Although Nigeria boasts a diverse population, datasets for building AI systems tailored to local needs remain scarce.
Azeez Saheed’s project seeks to make AI more local and relevant in Nigeria and, more importantly, to show that technology built on data from platforms like Nairaland can be more valuable and appealing to Nigerians.
This brings something new to the AI ecosystem, as localised datasets will be crucial to building people-centred AI systems. Naijaweb shows that local innovation is needed to extend AI’s benefits to Nigerian users and the Nigerian market.