Mission and objectives
As the United Nations lead agency on international development, UNDP works in 170 countries and territories to eradicate poverty and reduce inequality. We help countries develop policies, leadership skills, partnering abilities, and institutional capabilities, and build resilience to achieve the Sustainable Development Goals. Our work is concentrated in three focus areas: sustainable development, democratic governance and peace building, and climate and disaster resilience.
Context
To contribute to knowledge-sharing efforts and the documentation of lived experiences, we collect video and audio interviews capturing personal stories from diverse individuals. These interviews serve as a rich source of qualitative data for understanding lived realities, cultural contexts, and social dynamics, contributing to evidence-based dialogue and informed decision-making for sustainable development. However, the volume of recordings collected has created a bottleneck in processing and analysis. Each recording needs to be transcribed, translated (where necessary), and analyzed with natural language processing (NLP) techniques to extract key insights. Given the sensitivity of the recordings, all processing must be conducted locally to ensure data security. The processing must accommodate multiple languages, starting with Setswana, English, and Russian. A language-detection feature or a user-enforced language setting should ensure accurate processing.
This assignment is aimed at establishing a pipeline for:
• Batch transcription of recordings.
• (Optional) Translation into English.
• (Optional) Basic NLP processing (e.g., named entity recognition and keyword extraction).
This initiative will contribute to advancing innovative methods for qualitative analysis while preserving data privacy and security. Volunteers will have the opportunity to use their skills to create a tool that enables meaningful insights and promotes dialogue for sustainable development.
Task Description
The online volunteers will create a pipeline for NLP processing of audio and video recordings. More specifically, this includes:
1. Develop a Python-based pipeline for processing video interviews that includes:
   a. Batch transcription of audio and video files.
   b. Language detection or user-enforced language selection.
   c. (Optional) Translation of non-English transcripts into English.
   d. (Optional) Basic NLP processing, including named entity recognition and keyword extraction.
2. Ensure the solution can handle multiple file formats (e.g., MP4, MKV, MP3, WAV).
3. Conduct all processing locally to ensure data security.
4. Build an interface or script settings to enable user configuration (e.g., enable/disable translation, language selection).
5. Test the pipeline with sample videos in Setswana, English, and Russian.
Deliverables:
• A Python-based pipeline script for local processing.
• Documentation on how to install and use the pipeline.
• Sample outputs demonstrating successful transcription, translation (optional), and NLP analysis.
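For orientation only, the batch and configuration requirements above could be sketched roughly as follows. This is a minimal illustration, not a prescribed design: the names `PipelineConfig`, `find_recordings`, and `run_batch` are hypothetical, and the speech-to-text and translation steps are passed in as plain callables because this description does not name a specific local model or library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

# File formats named in the task description (matched case-insensitively).
SUPPORTED_EXTENSIONS = {".mp4", ".mkv", ".mp3", ".wav"}


@dataclass
class PipelineConfig:
    """User-facing settings (hypothetical names, for illustration only)."""
    language: Optional[str] = None      # e.g. "tn", "en", "ru"; None = auto-detect
    translate_to_english: bool = False  # optional translation step


def find_recordings(input_dir: Path) -> List[Path]:
    """Return all supported audio/video files under input_dir, sorted by path."""
    return sorted(
        p for p in input_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def run_batch(
    input_dir: Path,
    output_dir: Path,
    config: PipelineConfig,
    transcribe: Callable[[Path, Optional[str]], str],
    translate: Optional[Callable[[str], str]] = None,
) -> List[Path]:
    """Transcribe every recording and write one text file per recording.

    `transcribe` and `translate` stand in for whatever locally running
    models the volunteers choose; nothing here sends data off the machine.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for recording in find_recordings(input_dir):
        text = transcribe(recording, config.language)
        if config.translate_to_english and translate is not None:
            text = translate(text)
        # Keep the original file name so "a.mp4" and "a.wav" do not collide.
        out_path = output_dir / (recording.name + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Keeping the transcription backend behind a callable means the file handling and user settings can be tested with a stub before any local speech model is selected.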
Competencies and values
Living conditions and remarks