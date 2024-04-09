

What is ScreenAI: The Google technology explained

Jessica Gorringe
Staff Writer

Recently introduced by Google Research, ScreenAI is a new AI model built to understand user interfaces and infographics.

Although ScreenAI is still in its research phase, read on to learn more about what it is, how it works and when you might get a chance to try the technology yourself.

What is ScreenAI?

ScreenAI is described as being a new “vision-language model for user interfaces and infographics that achieves state-of-the-art results on UI and infographics-based tasks.”

In other words, ScreenAI is a vision-language model, which means it can interpret images and text together. It was built to take the complexity out of reading and understanding user interfaces (UIs) and infographics, such as charts, diagrams and tables.

Put simply, you can ask ScreenAI to summarise a screenshot or graphic and you should receive a clear and concise summary of it. You can also ask ScreenAI questions about the screenshot and receive answers based on the data it contains.

How does ScreenAI work?

ScreenAI’s architecture is based on PaLI, a multilingual language-image model, and adds the flexible patching strategy introduced in pix2struct. Pix2struct is a pretrained image-to-text model for visual language understanding, which can be fine-tuned on tasks containing visually-situated language.
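
To make the pix2struct connection more concrete: according to Google Research, what ScreenAI takes from pix2struct is flexible patching, where the grid of image patches is chosen to preserve the screenshot's native aspect ratio instead of forcing every image into a fixed square grid. The Python sketch below illustrates that idea only. It is not ScreenAI's code: the function name `flexible_patch_grid`, the 16-pixel patch size and the 1,024-patch budget are assumptions made for illustration, and the scaling formula is modelled on the publicly available pix2struct approach.

```python
import math


def flexible_patch_grid(image_width, image_height, patch_size=16, max_patches=1024):
    """Return (rows, cols) for a patch grid that roughly keeps the image's aspect ratio.

    Hypothetical helper for illustration; patch_size and max_patches are assumed values.
    """
    # Pick a scale so that rows * cols stays within the patch budget.
    scale = math.sqrt(
        max_patches * (patch_size / image_height) * (patch_size / image_width)
    )
    # Round down so the budget is never exceeded, but keep at least one patch per axis.
    rows = max(min(math.floor(scale * image_height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_width / patch_size), max_patches), 1)
    return rows, cols


if __name__ == "__main__":
    # A tall phone screenshot and a landscape chart get differently shaped grids.
    print(flexible_patch_grid(1080, 2400))
    print(flexible_patch_grid(800, 600, max_patches=256))
```

With these assumed settings, a tall 1080 x 2400 phone screenshot gets a 47 x 21 grid, while an 800 x 600 landscape chart with a 256-patch budget gets 13 x 18. In both cases the grid follows the shape of the image, which matters for screens and infographics that come in widely varying aspect ratios.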

ScreenAI is trained in two stages: a pre-training stage, which applies self-supervised learning to automatically labelled data drawn from publicly accessible web pages, and a fine-tuning stage, which mostly uses data labelled manually by human raters.

Released alongside ScreenAI are three new datasets that allow for a more thorough evaluation of the model: Screen Annotation, which evaluates ScreenAI’s layout understanding, and ScreenQA Short and Complex ScreenQA, which assess its question-answering (QA) capability.
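
For a rough sense of how question-answering ability can be measured, the Python sketch below scores a model's short answer against one or more reference answers using token-level F1, a metric widely used for short-answer QA. This is an illustration under assumptions, not the documented scoring code for the datasets above: the function names `token_f1` and `best_f1` and the simple lowercase, whitespace-based tokenisation are hypothetical choices.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between a predicted answer and one reference answer."""
    # Assumed normalisation: lowercase and split on whitespace.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers match; one empty answer does not.
        return float(pred_tokens == ref_tokens)
    # Count tokens shared by both answers, respecting repeats.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, references):
    """Score against several acceptable answers and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)


if __name__ == "__main__":
    # Hypothetical question about a screenshot: "What is the app's rating?"
    print(best_f1("4.5 stars", ["4.5", "4.5 stars out of 5"]))
```

Taking the best score across several references reflects the fact that a question about a screen often has more than one acceptable phrasing of the right answer.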

What are the benefits of ScreenAI?

ScreenAI can handle tasks that have traditionally been difficult for AI models, such as general and UI-specific question answering, screen annotation, summarisation and navigation.

According to Google Research, ScreenAI achieves state-of-the-art results on UI and infographic-based tasks and “best-in-class” performance compared to similarly sized models.

Where can I try ScreenAI?

We’ll have to be patient to try this ground-breaking technology, as ScreenAI is still a research project and is not currently available for public use. There is also no indication yet as to when this may change. 

