Library blogVeriff crosses the 10K line - 10,000+ documents in our database!

Veriff crosses the 10K line - 10,000+ documents in our database!

In a major milestone for an identity verification company, Veriff passed the 10,000 document mark in our internal database. Find out how we did it, and how it helps us deliver super fast and accurate verifications.

A header image announcing over 10,000 documents in our database.

Karita Sall

November 18, 2021

Blog Post

Finserv

Announcement

Recently, Veriff turned 6 years old, having been founded back in 2015 with the ultimate goal of making the online world a safer and fairer place.

Since then, we’ve been working hard on making our external products, and our internal tools, the absolute best for everyone to use on a daily basis. And we’re here to highlight one of our most significant internal tools, and something which is the biggest in this industry, our proprietary document database.

We’ve just passed the 10,000 mark, with over 10,000 official document specimens now held within our specimen database (SDB) - each with additional information to help train our AI & Machine learning models, and assist our tireless verification team, to make difficult decisions in a quick and easy way. This level of database allows us to assist our clients in scaling across the world, helping them welcome customers from practically anywhere on the planet.

To celebrate this, we sat down with three of our team members who’ve been involved in building the database day-to-day:

Tesha, a Document Database Specialist (DDS) within our Document Research team, who upload documents to the database, conduct research, and much more
Triin, who works in Data Ops, making sure everything within the database is ready to be used by our automation systems
Camila, one of our brilliant Verification Specialists, the team using the database daily, and consistently offering feedback and asking questions to help the database grow

They were all kind enough to offer their time and answer a few questions about how we got here, along with small contributions from a few other team members.

First off - how did we get from 1 document to 10,000?

Tesha: The road to 10k has truly been a journey. It all started back in the summer of 2019 when the first version of our specimen database, affectionately known as ‘BOB’, was essentially born. I remember the day like it was yesterday. The drive and excitement to finally have an in-house catalog of document specimens which we could grow, make our own, as other databases were too limited to serve our business needs & hyper growth. Several facelifts later, and countless hours of researching and using the favourite photo/image editor of the DDS team (which we use to prepare our specimens before uploading), here we are. But we couldn't have made it to today if it wasn't for the help of several teams and individuals, especially the amazing verification specialists.

Triin: Our Document Database Specialist (DDS) team has done a really awesome job researching all the specimens and updating information for our annotations. From the Data Operations side we have been adding these annotations to all our supported specimens according to the information the DDS team has been researching.

Camila: Huge thanks must go to the DDS team for their tireless research and constant updates, and to the incredible verification specialists (VS) that always post about possible new specimens that we encounter. It’s fascinating and exciting, how we receive sessions with brand new, recently issued documents. The 10k milestone is definitely a product of strong teamwork.

“Our database has to be celebrated, because no matter where you’re from, whether it’s the remote Cook Islands in the middle of the Pacific, tiny Saint Helena in the South Atlantic, or even the far away Christmas Island in the Indian Ocean, Veriff has you covered. And that’s pretty cool.” - Taras Boyko, Documents Product Manager

What’s the significance of crossing the 10k line, and how much further is there to go?

Tesha: If I were to sum up the significance of finally hitting the 10k mark, I would say it's somewhat like reaching the halfway point in a marathon (something which I’ll concede I have no personal experience in). I say this because it's a huge achievement and there’s a small sense of relief to get here, but it is only the beginning (or middle in the marathon analogy). We’re hopeful that BOB/SDB will continuously grow. It can only keep getting better.

Triin: Crossing the 10k line is an awesome milestone. I think it’s ultimately an endless journey - as long as governments are issuing new documents then we’ll always have something new to support, annotate, and then automate. It hasn’t been just 10k documents, it’s been hours of teamwork from all the teams who are somehow related to BOB.

Camila: As someone who saw the birth of BOB, it’s mind-blowing to realize that we have now surpassed the old databases that we used to rely on, and surpassed by a lot, in number of specimens, quantity, and quality of the information provided. As long as there are new documents and people wanting to verify themselves, there is room for BOB to grow constantly. In the absence of an apocalyptic scenario, there is no end to this road.

“Over 10,000 documents is an unmatched number in our industry, which is remarkable. We can say with confidence that we are here to support businesses who have a truly global reach. And this only further powers our AI, with so much knowledge of so many documents, we can make fast, automated & accurate decisions. No matter where your user base is, or where you’re expanding to, we’ve got you covered.” - Ibrahim Al-Taie, Product Marketing Manager

Is it possible to say a little about how our document database came to be? And how important is it to your team on a daily basis?

Tesha: As mentioned earlier, it all started back in Summer 2019. It initially didn't have its name yet (BOB) but was known as “document-viewer” and only had a handful of specimens. The document database is essentially one of the core elements for our Document Research Team. From uploading new specimens and replacing existing ones, to adding and updating specimen information following research. Thanks to the database, we have updated one of our original processes. Before we solely relied on our “Mastersheet” (A Google sheet where we kept an extensive list of all the documents known to us) but now we are able to keep track using the database. The Document Research Team wouldn't be where we are today without the document database.

Triin: Thinking about the document database back at the end of 2019 when we annotated the first specimens in SDB, compared with the annotation functionalities we have now, then it really has been a long successful ride. Our document database is an essential tool for Data Ops. To improve the quality of data extraction, we have to make sure that all the gathered document information is not only benefited by verification specialists, but also consumable by our automation.

Camila: BOB came from a need to have more specimens and information readily and easily available for specialists during the flow of verification. BOB is a vital tool for new joiners, who have just been flooded with information from their onboarding. Most of the answers to the basic doubts they have can be found in BOB. For more senior VS, it’s a tool to check information about a specimen that doesn’t come up so often, or to be reminded about recent updates (“I remember there was an update about this document, but not exactly what. Better check BOB!”)

And how have we approached creating and curating the database - how many people have put work into this?

Tesha: There were several individuals involved in the development of the database across different departments and teams. If I was to put an actual number I would say 70+. This included DDS, Engineers, Designers, Data scientists, Data annotators and Verification specialists.

Triin: At one point we had 20+ dedicated VS helping the Data Ops team to cover all the specimens with data positions. We started this project with quite a huge backlog of specimens without annotations and in a few months managed to cover all the supported specimens with the main data fields. Today all template annotations are done by Data Ops.

Camila: I’ll admit, I don’t know much insider information about BOB’s creation, but I was one of the VSs that helped with annotation, as Triin mentioned above :)

Finally, after reaching such a significant milestone, what’s next?

Tesha: Next is the 15k or 20k milestone possibly. But hopefully we can continue to develop the interface and usability of the specimen database to make it even more amazing.

Triin: Hopefully soon we are gonna have even more functionalities to facilitate annotation processes and improve annotation quality. It’s all a work in progress. And with that work, our AI only continues to improve and gain accuracy.

Camila: There are already discussions about ways to make BOB more user-friendly and easy to navigate when you’re in a hurry. And of course, new specimens will always keep coming.