
Person Extraction
4 months
1 FTE: ML Engineer

Project
The K2 project for MediaPress changes the way movie metadata is processed. Using advanced NLP techniques, it identifies and categorizes characters in movie synopses and distinguishes real people from fictional ones.
Results
The Person Extraction project has set a new benchmark in movie metadata analysis. It has successfully extracted and categorized thousands of characters from various movie synopses, accurately distinguishing between real and fictional characters. The system's adaptability enables it to be fine-tuned according to MediaPress's unique ontology, ensuring high accuracy in role assignment and character recognition.
MediaPress

In partnership with MediaPress, a leader in movie metadata, this project set out to transform the traditional methods of metadata collection and analysis. The goal was to automatically extract characters from movie synopses, classify each character as real or fictional, and assign each one a specific role from the client's ontology. The challenge was to develop a system that understands the complex narratives of movies and categorizes characters accurately, scalably, and efficiently.
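The three steps described above (extract, classify as real or fictional, assign a role) can be illustrated with a minimal sketch. It is not the model delivered to MediaPress: the regular expression stands in for a trained NER model, and `KNOWN_REAL_PEOPLE`, `ROLE_CUES`, and the role labels are hypothetical placeholders for a knowledge base and the client's ontology.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical lookup of real people; a real system would query a
# knowledge base or use a trained classifier instead.
KNOWN_REAL_PEOPLE = {"Abraham Lincoln", "Marie Curie"}

# Hypothetical ontology: cue words in a sentence mapped to role labels.
ROLE_CUES = {
    "detective": "Investigator",
    "president": "Political Leader",
    "scientist": "Scientist",
}

# Two or more consecutive capitalized words: a crude stand-in for NER.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


@dataclass
class Person:
    name: str
    is_real: bool
    role: str


def extract_persons(synopsis: str) -> List[Person]:
    """Extract names, flag real vs. fictional, and assign a role per sentence."""
    persons: List[Person] = []
    seen = set()
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", synopsis):
        lowered = sentence.lower()
        # The first matching cue word decides the role for names in this sentence.
        role = next(
            (label for cue, label in ROLE_CUES.items() if cue in lowered),
            "Unspecified",
        )
        for match in NAME_PATTERN.finditer(sentence):
            name = match.group()
            if name in seen:
                continue  # keep only the first mention of each name
            seen.add(name)
            persons.append(Person(name, name in KNOWN_REAL_PEOPLE, role))
    return persons
```

Calling `extract_persons` on a synopsis returns one `Person` record per distinct name. In this sketch, swapping the lookup set and cue table for a client-specific ontology only changes the data, not the pipeline.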
Delivering sustainable value.
Deliverables
Advanced NLP Model: Developed a state-of-the-art natural language processing model that understands movie synopses and extracts character information from them.
Ontology Alignment: Tailored the model to MediaPress's specific ontology, ensuring accurate role assignment and categorization.
Scalable Extraction System: Created a robust system that processes large volumes of text data efficiently and at scale.
Character Classification: Implemented an innovative approach to distinguish between real and fictional characters, adding depth to the metadata analysis.
Impact
The Person Extraction project has significantly enhanced the quality and depth of movie metadata analysis for MediaPress. It offers a highly accurate and scalable solution for character extraction and classification, tailored to the specific needs of the client. The project not only improves the efficiency of metadata processing but also enables MediaPress to increase its data coverage and enhance its offering.
Skills
Natural Language Processing (NLP), Machine Learning, Data Analysis, Character Recognition, Ontology Alignment