What is ScreenAI: The Google technology explained

Recently introduced by Google Research, ScreenAI is an exciting new Google technology for understanding user interfaces and infographics.

ScreenAI is still in its research phase, but read on to learn more about what it is, how it works and when you might get a chance to try the technology yourself.

What is ScreenAI?

ScreenAI is described as being a new “vision-language model for user interfaces and infographics that achieves state-of-the-art results on UI and infographics-based tasks.”

In other words, ScreenAI is a vision-language model, which means it can comprehend image and text data together. It was built to take the complexity out of reading and understanding data from user interfaces (UIs) and infographics, such as charts, diagrams and tables.

Put simply, you can ask ScreenAI to summarise a screenshot or graphic and you should receive a clear and concise summary of it. You can also ask ScreenAI questions about the screenshot and receive answers based on the content it shows.

How does ScreenAI work?

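ScreenAI’s architecture is based on the multilingual language-image model PaLI, and adds the flexible patching strategy introduced by pix2struct. Pix2struct is a pretrained image-to-text model for visual language understanding, which can be fine-tuned on tasks containing visually-situated language. Rather than cutting every image into a fixed grid, its patching strategy chooses grid dimensions that preserve the image’s original aspect ratio, which helps ScreenAI cope with screenshots of different shapes.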
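To give a feel for what pix2struct-style patching means in practice, here is a minimal sketch of choosing a patch grid that keeps an image’s aspect ratio while staying within a fixed patch budget. The function name `patch_grid`, the 16-pixel patch size and the 1,024-patch budget are illustrative assumptions, not ScreenAI’s published configuration.

```python
import math


def patch_grid(height, width, patch_size=16, max_patches=1024):
    """Pick (rows, cols) of patches that roughly keep the image's aspect ratio.

    Hypothetical helper for illustration only: patch_size and max_patches
    are assumed values, not settings confirmed for ScreenAI.
    """
    # Scale factor so that rows * cols lands at or just under max_patches.
    scale = math.sqrt(max_patches * (patch_size / height) * (patch_size / width))
    rows = max(min(math.floor(scale * height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * width / patch_size), max_patches), 1)
    return rows, cols


# A tall phone screenshot gets more rows than columns;
# a wide desktop screenshot gets the opposite.
print(patch_grid(1920, 1080))
print(patch_grid(1080, 1920))
```

The image would then be resized to `rows * patch_size` by `cols * patch_size` pixels before being split into patches, so a tall phone screen and a wide desktop page are not squashed into the same square grid.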

ScreenAI is trained in two stages: a pre-training stage that applies self-supervised learning to data drawn from publicly accessible web pages, and a fine-tuning stage in which most of the data is manually labelled by human raters.

Google has also released three new datasets alongside ScreenAI to help evaluate the model more thoroughly. These are Screen Annotation, which evaluates ScreenAI’s layout understanding, and ScreenQA and Complex ScreenQA, which assess its question-answering (QA) capability.

What are the benefits of ScreenAI?

ScreenAI can handle tasks that have traditionally been difficult for AI models, such as answering questions about infographics and UIs, annotating screens, summarising their content and navigating them.

According to Google Research, ScreenAI achieves state-of-the-art results on UI and infographic-based tasks and “best-in-class” performance compared to similarly sized models.

Where can I try ScreenAI?

We’ll have to be patient to try this ground-breaking technology, as ScreenAI is still a research project and is not currently available for public use. There is also no indication yet as to when this may change. 
