
To help identify potentially harmful content in Gmail, such as spam and phishing emails, Google has introduced a new multilingual text vectorizer called RETVec (short for Resilient and Efficient Text Vectorizer).

“RETVec is trained to be resilient against character-level manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more,” the project’s description on GitHub states.

“The RETVec model is trained on top of a novel character encoder which can encode all UTF-8 characters and words efficiently.”
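To make the idea of a vocabulary-free character encoder more concrete, the sketch below turns each character's Unicode code point into a fixed-width list of bits. This is only an illustration of the general idea and should not be read as RETVec's actual encoder: the function names, the 24-bit width, and the 16-character word limit are all assumptions chosen for the example.

```python
from typing import List


def encode_char(ch: str, bits: int = 24) -> List[int]:
    """Return the code point of ch as a fixed-width list of bits, most significant first.

    The 24-bit width is an assumption for this sketch; it is wide enough
    for every Unicode code point (the maximum is 0x10FFFF).
    """
    code_point = ord(ch)
    if code_point >= 1 << bits:
        raise ValueError("code point does not fit in the requested width")
    return [(code_point >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> List[List[int]]:
    """Encode a word as max_chars rows of bits, truncating or zero-padding as needed."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    rows += [[0] * bits for _ in range(max_chars - len(rows))]
    return rows


latin_e = encode_char("e")          # U+0065
cyrillic_e = encode_char("\u0435")  # U+0435, a homoglyph of Latin "e"
print(latin_e != cyrillic_e)        # the two look-alikes get different bit patterns
print(len(encode_word("prize")), len(encode_word("prize")[0]))
```

An encoding like this can represent any character in any script without a lookup table. On its own it does nothing to treat look-alike characters as similar; in the description quoted above, that resilience comes from the model trained on top of the encoder.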

Large platforms such as Gmail and YouTube rely on text classification models to identify phishing attempts, inappropriate comments, and scams, but threat actors are known to devise counter-strategies to circumvent these defenses.

Threat actors have been observed using adversarial text manipulations, including homoglyphs, keyword stuffing, and invisible characters, among others.
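The short sketch below shows why such manipulations are effective against simple defenses. It uses a deliberately naive keyword filter, which is a hypothetical stand-in and not how Gmail or any real classifier works; the blocklist, function name, and sample messages are all invented for illustration.

```python
# Hypothetical blocklist for a toy filter (not taken from any real system).
BLOCKLIST = {"free", "prize", "winner"}


def naive_is_spam(text: str) -> bool:
    """Flag text if any whitespace-separated token exactly matches a blocklisted word."""
    tokens = text.lower().split()
    return any(token.strip(".,!?") in BLOCKLIST for token in tokens)


original = "Claim your free prize now!"
# Cyrillic "е" (U+0435) replaces Latin "e"; the text looks the same to a reader.
homoglyph = "Claim your fr\u0435\u0435 priz\u0435 now!"
# LEET substitution: digits stand in for letters.
leet = "Claim your fr33 pr1ze now!"
# Zero-width spaces (U+200B) are invisible but break exact matching.
invisible = "Claim your fr\u200bee pri\u200bze now!"

for label, sample in [
    ("original", original),
    ("homoglyph", homoglyph),
    ("leet", leet),
    ("invisible", invisible),
]:
    print(label, naive_is_spam(sample))
```

Only the unmodified message is flagged. The three manipulated variants read the same to a human but no longer match the blocklisted words exactly, so the toy filter lets them through.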

Defense Against Spam and Malicious Emails

RETVec, which works with more than 100 languages out of the box, is designed to help build text classifiers that are more resilient and efficient, both on the server side and on-device.

Vectorization is a technique used in natural language processing (NLP) that maps words or phrases from a vocabulary to corresponding numerical representations. These representations are then used for further analysis, such as sentiment analysis, text classification, and named entity recognition.
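As a minimal sketch of what vectorization means in practice, the example below builds a small word vocabulary and turns text into count vectors. This is a generic bag-of-words approach and not RETVec's method; the function names and the two-sentence corpus are assumptions made up for the illustration.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercase word an integer index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in documents:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Return one count per vocabulary word; words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocab]


corpus = ["win a free prize", "meeting at noon"]
vocab = build_vocabulary(corpus)

print(vocab)
print(vectorize("free free prize", vocab))
print(vectorize("fr33 prize", vocab))
```

In the last call, the altered token "fr33" is not in the vocabulary, so it contributes nothing to the vector. A downstream classifier therefore never sees the signal carried by the word "free", which is the kind of gap that character-level manipulation exploits in word-level vectorizers.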

“Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments,” according to Google’s Elie Bursztein and Marina Zhang.

Google said that integrating the vectorizer into Gmail improved the spam detection rate by 38% over the baseline and reduced the false positive rate by 19.4%. It also lowered the model’s Tensor Processing Unit (TPU) usage by 83%.

Models trained with RETVec run inference faster because of its compact representation. Bursztein and Zhang added that smaller models reduce computing costs and latency, both of which are critical for large-scale applications and on-device models.


