Yad Vashem uses AI to reveal previously unknown names of Holocaust victims
The innovation department of Yad Vashem World Holocaust Remembrance Center in Jerusalem has developed a language model capable of extracting new names and details from countless testimonies at a speed that manual work will never be able to match. Thanks to the project, 400 new names have already been added to the Hall of Names, and many more are on the way.
Yad Vashem World Holocaust Remembrance Center in Jerusalem is harnessing technological progress to locate, cross-reference and confirm hundreds of thousands of names of those murdered in the Holocaust that have not yet been identified.
The project, which has been ongoing for the past two years, uses a large language model (LLM) for this purpose. Through a recently completed pilot, 400 names were added to the Hall of Names, which today contains 4.9 million names. Yad Vashem estimates that from each of the 20,000 pieces of testimony in the institute's possession, it will be possible to extract about seven names of those who perished that were not known until now.
For example, Yad Vashem says that the testimony given by Holocaust survivor Olga Katz-Goldstein in 2017 included witness sheets for her parents and sisters, about whom she gave details, but no witness sheets for her extended family. With the help of a large language model developed at the institute and of various experts, Yad Vashem succeeded in extracting the names of additional family members of Katz-Goldstein who were murdered in Auschwitz, including her grandmother, uncles and nephews. Without this technology it would probably have taken a long time to discover their names, if they were discovered at all. In this case the technology located names that even the family member did not remember or did not know about.
The Holocaust victims' names database, initiated and led by Yad Vashem, works to collect the names of Holocaust victims and reconstruct their life stories. For decades, names were collected and validated only manually, and therefore slowly, from witness sheets, dispatch lists and archival documents. Experts in different languages read and analyzed these texts to extract details about those who perished and those who survived.
Yad Vashem has about 10 million records from various sources. Many of these records were never reviewed by the institute's staff due to a lack of manpower. The expectation now is to feed all of these millions of records into the AI system in order to extract new names and places, as well as the connections between them across the various documents. "The use of artificial intelligence helps the study of the Holocaust in many different ways. The technology helps Yad Vashem to review in a short time hundreds of hours of testimony given by Holocaust survivors over the years, to extract from them new names of the murdered and details that have never been revealed," says Dr. Alexander Avraham, Director of the Hall of Names at Yad Vashem.
With the developments in the field of artificial intelligence, Yad Vashem realized that the technology can help scan, extract and summarize much of the information in its possession. The data held by the institute poses unique challenges that existing solutions and products on the commercial market are not designed to deal with. These include the types of data, its quality, its many languages, its outdated style and its lack of uniformity, to name just a few. "First, we had to take each testimony - video or audio - and turn the speech into text. We tagged 30 testimonies in each language. This is a task that is not easy to complete at a high syntactic level, especially in Hebrew," Esther Fuxbrumer, manager of software development in the innovation department at Yad Vashem, told Calcalist. "Then the 'entities' must be extracted from the texts, which is also not an easy task because the language of the testimony is different, it is not the spoken language of today." The purpose of manually labeling the testimonies is to prepare the information used to train the model. Once the model has learned from the examples how to label, it can continue to label additional testimonies on its own.
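To illustrate what manual labeling for entity extraction can look like, here is a minimal sketch in Python. It is not Yad Vashem's system. The sentence, the names in it, the `PERSON` and `PLACE` labels, and the `Span` and `bio_tags` helpers are all invented for illustration. The sketch converts hand-marked character spans in a transcript into per-word "BIO" tags, a common format for training entity-extraction models.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int  # character offset where the marked entity begins
    end: int    # character offset one past the entity's last character
    label: str  # hypothetical label set: "PERSON" or "PLACE"


def bio_tags(text, spans):
    """Split text on whitespace and tag each word from the marked spans.

    B-<label> marks the first word of an entity, I-<label> a later word
    of the same entity, and O a word outside any entity.
    """
    tokens, tags = [], []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate this word in the text
        end = start + len(token)
        pos = end
        tag = "O"
        for span in spans:
            # The word counts as part of an entity only if it lies
            # fully inside the annotated span.
            if start >= span.start and end <= span.end:
                prefix = "B-" if start == span.start else "I-"
                tag = prefix + span.label
                break
        tokens.append(token)
        tags.append(tag)
    return tokens, tags


# Invented example sentence with two hand-marked entities.
text = "My aunt Rivka Levi lived in Lodz"
spans = [Span(8, 18, "PERSON"), Span(28, 32, "PLACE")]
tokens, tags = bio_tags(text, spans)
```

In this toy case, "Rivka" is tagged `B-PERSON`, "Levi" `I-PERSON` and "Lodz" `B-PLACE`, and every other word gets `O`. A real pipeline would need a proper tokenizer for each language, since a word with attached punctuation would fall outside its span under this simple whitespace split.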