Thoughts of a Twitter User on Building a System to Flag Synthetic and Manipulated Media

Katnoria
Nov 21, 2019
Source: https://en.wikipedia.org/wiki/Deepfake

It is great to see Twitter planning to do something about the use of synthetic and manipulated media (deepfakes), and asking people to help shape their approach.

Link to the survey

While I appreciate the move, as a user of the platform I have some concerns (not exhaustive), most of which I voiced in the survey feedback:

Such a system gives a lot of power to Twitter: they get to decide whether content is real or fake, which in turn could shape the conversation and the flow of information on the platform.

MODEL

  • At what level of performance (in terms of evaluation metrics) do they consider the model good enough to deploy? How good is good? (A rough back-of-the-envelope sketch after this list shows why the answer matters at platform scale.)
  • Once deployed, the system could create automation bias: people may assume that content is acceptable simply because the AI did not flag it. We are only starting to understand the impact of the feedback loops that recommendation systems can create.
  • It is good to see that they are looking for potential partners, both commercial and not-for-profit. I hope the overall product will be a collaborative effort across several partners, including policymakers and teams that specialise in bias and fairness in machine learning.
  • Form to collaborate with Twitter
  • Given the power it gives to the platform (to decide what’s real and what is fake), I also hope they publicly share the evaluation criteria, tradeoffs, and the metrics used in production models.
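To make "how good is good?" concrete, here is a back-of-the-envelope sketch of how precision interacts with the base rate of manipulated media at platform scale. All the numbers are my own illustrative assumptions, not anything Twitter has published:

```python
# Back-of-the-envelope: how a flagging model behaves at platform scale.
# Every number below is an illustrative assumption, not Twitter's figure.

daily_media_posts = 100_000_000   # assumed volume of media tweets per day
base_rate = 0.0001                # assumed fraction that is actually manipulated
sensitivity = 0.95                # true positive rate of the hypothetical model
false_positive_rate = 0.01        # fraction of genuine media wrongly flagged

actually_fake = daily_media_posts * base_rate
true_positives = actually_fake * sensitivity
false_positives = (daily_media_posts - actually_fake) * false_positive_rate

precision = true_positives / (true_positives + false_positives)
print(f"Flagged posts per day : {true_positives + false_positives:,.0f}")
print(f"Correctly flagged     : {true_positives:,.0f}")
print(f"Wrongly flagged       : {false_positives:,.0f}")
print(f"Precision             : {precision:.1%}")
```

With a tiny base rate, even a seemingly strong model would flag far more genuine posts than fake ones, which is exactly why I would like the evaluation criteria, tradeoffs, and production metrics to be published.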

DATA

  • The survey is out in a handful of languages, so it seems they are initially targeting a small set of languages and could expand to others later. I hope they will be transparent about how they are going to collect the data.
Source: https://www.katnoria.com/world-languages/
  • Even if they implement the solution only for English content, how do they plan to handle bias in data representation (within a country, across a continent, and globally)? This in turn is related to how and where the data gets labeled (an example of labeling bias: data collected in one country but labeled in another). One way to surface this kind of skew is sketched after this list.
Source: http://techlist.com/mturk/global-mturk-worker-map.php
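As a rough illustration of the point above (not anything Twitter has proposed), a labeling pipeline could at least report how often content from one country is labeled in another. The field names and counts below are entirely made up:

```python
# A minimal sketch of one way to surface labeling-location skew.
# Records and field names are hypothetical, for illustration only.
from collections import Counter

# Each record: where the media was posted and where it was labeled.
labeled_samples = [
    {"content_country": "IN", "labeler_country": "US"},
    {"content_country": "IN", "labeler_country": "US"},
    {"content_country": "US", "labeler_country": "US"},
    {"content_country": "BR", "labeler_country": "US"},
    {"content_country": "BR", "labeler_country": "IN"},
]

content_dist = Counter(s["content_country"] for s in labeled_samples)
cross_labeled = sum(
    1 for s in labeled_samples
    if s["content_country"] != s["labeler_country"]
)

print("Content origin distribution:", dict(content_dist))
print(f"Labeled outside country of origin: {cross_labeled / len(labeled_samples):.0%}")
```

Reporting a simple statistic like this would not remove the bias, but it would at least make the mismatch between where content comes from and who labels it visible.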

As a user, I see great value in Twitter for my use case — “Engage and Interact with ML Twitter”.

However, the more I think about this, the more questions I have.

But this will do for now.
