Google develops an open source harassment filter for journalists and public figures

Google’s Jigsaw division is releasing the code for an open-source anti-harassment tool called Harassment Manager. The tool, intended for journalists and other public figures, uses Jigsaw’s Perspective API to let users triage potentially abusive comments on social media platforms, starting with Twitter. The source code debuts now for developers to build on, ahead of a functional app launching in June for Thomson Reuters Foundation journalists.

Harassment Manager currently works with Twitter’s API to combine moderation options — such as hiding replies and muting or blocking accounts — with a bulk filtering and reporting system. Perspective scores the linguistic “toxicity” of messages based on elements such as threats, insults, and profanity.
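For a sense of how that scoring works, here is a minimal sketch of a Perspective API `comments:analyze` request and response, assuming the publicly documented endpoint and the `TOXICITY` attribute; the sample response values are illustrative, not real output:

```python
import json

# Publicly documented Perspective API endpoint (requires an API key).
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_analyze_request(text: str) -> dict:
    """Build the JSON body for a comments:analyze call requesting TOXICITY."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_score(response: dict) -> float:
    """Extract the summary TOXICITY probability (0.0-1.0) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Abridged response in the shape the API documents; the value is made up.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.92, "type": "PROBABILITY"}}
    }
}

request_body = build_analyze_request("you are a disgrace")
print(json.dumps(request_body))
print(toxicity_score(sample_response))  # → 0.92
```

In practice the request body would be POSTed to `PERSPECTIVE_URL` with an API key; the score is a probability that a reader would perceive the comment as toxic, which is what makes threshold-based filtering possible.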


It sorts messages into queues on a dashboard that users can process in batches, rather than one at a time through Twitter’s default moderation tools. Users can choose to blur the text of messages as they work, so they don’t have to read every single one, and they can search for keywords in addition to using the automatically generated queues.
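The queue-and-blur workflow described above can be sketched as follows; the thresholds and queue names are my own illustrative assumptions, not Jigsaw’s actual cutoffs:

```python
def triage(scored_comments, high=0.8, low=0.5):
    """Split (text, toxicity_score) pairs into review queues.

    Thresholds are illustrative: scores above `high` go to the
    likely-toxic queue, scores between `low` and `high` to a
    borderline queue, and the rest to a low-risk queue.
    """
    queues = {"likely_toxic": [], "borderline": [], "low_risk": []}
    for text, score in scored_comments:
        if score >= high:
            queues["likely_toxic"].append(text)
        elif score >= low:
            queues["borderline"].append(text)
        else:
            queues["low_risk"].append(text)
    return queues

def blur(text: str) -> str:
    """Obscure a message so it can be batch-processed without being read."""
    return "█" * len(text)

queues = triage([("vile insult", 0.95), ("rude-ish reply", 0.6), ("nice comment", 0.1)])
print([blur(t) for t in queues["likely_toxic"]])
```

Batching by score is what lets a user mute, block, or report an entire queue at once instead of confronting each message individually.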

Harassment Manager also lets users download a standalone report containing abusive messages. This creates a documented record for the account owner and, in the case of illegal content such as direct threats, can help support a report to law enforcement.
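A report export of this kind might look like the following sketch, which writes flagged messages to a timestamped JSON file; the field names are assumptions for illustration, not Harassment Manager’s actual schema:

```python
import json
from datetime import datetime, timezone

def export_report(messages: list[dict], path: str) -> dict:
    """Write flagged messages to a JSON file as a standalone record.

    `messages` is a list of dicts (e.g. text, author, score) so the
    report remains useful outside the platform it came from.
    """
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "message_count": len(messages),
        "messages": messages,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, ensure_ascii=False, indent=2)
    return report

report = export_report(
    [{"text": "threatening reply", "author": "@example", "score": 0.97}],
    "harassment_report.json",
)
print(report["message_count"])  # → 1
```

Keeping the export independent of the platform matters because reported tweets can be deleted or accounts suspended, while the file persists as evidence.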

However, there is not yet a standalone app that users can download. Instead, developers are free to build applications that incorporate its functionality, and partners such as the Thomson Reuters Foundation will launch services based on it.

Jigsaw, which officially announced Harassment Manager on International Women’s Day, described the tool as particularly relevant to female journalists facing gendered abuse, citing input from “journalists and activists with large Twitter presences” as well as nonprofit organizations including the International Women’s Media Foundation and the Committee to Protect Journalists.

In a Medium post, the team said it wanted developers to tailor it to other at-risk social media users. “We hope this technology can provide a resource for those who face harassment online, especially female journalists, activists, politicians and other public figures who encounter a lot of malicious content online,” the post reads.

Google has previously leveraged Perspective for automated moderation. In 2019, it released a browser extension called Tune that lets social media users hide toxic messages, and the API is used by many comment platforms (including Vox Media’s Coral) to supplement human moderation. However, as became clear when Perspective and Tune were released, language analysis models have historically been far from perfect.

These models sometimes misclassify sarcasm or fail to detect abusive messages, and Jigsaw’s AI has been shown to associate words like “blind” or “deaf” with toxicity even when they aren’t used negatively. Jigsaw itself has also been accused of fostering a toxic workplace culture, although Google has disputed that claim.

However, unlike the AI-driven moderation built into services like Twitter and Instagram, Harassment Manager is not a platform-side control feature. It is explicitly a triage tool for managing the sometimes overwhelming volume of social media feedback, which could prove useful to people well beyond journalism — even if they can’t use it just yet.
