Internet Media Content Analysis Program

Inspired by the recent speculation about the Webdriver Torso YouTube videos, I worked up an idea based on what I thought it could all be about: a method of teaching a computer program to read large volumes of web-based media content, that is, videos, and to make determinations about that content, such as whether it is offensive, sensitive, or useful for intelligence and security purposes.

That is, a program that can scan the internet on an ongoing, continual basis, and that can read not only the kind of information already intelligible to existing computer technology, but can also “understand” the meaning of media items such as videos in a way similar to a human being making value judgements about what they see and hear.
The benefit of such a capability is that you could process the enormous quantity of ever-changing and newly uploaded video data on the net at the speed and throughput of a large mainframe computer, but with the qualitative judgement of a human being, meaning that your ability to find offensive material, or criminal activity shown in videos, would increase enormously. If you could teach the program ever more sophisticated ways to interpret such data, it could do much more besides, such as extrapolating further information from what it sees with a statistically high degree of success: who posted the video, what it shows, where it was shot, the identities of the people in it, what they are doing with, or to, each other, and how to find them, almost instantly.

In order to do this, you first need to teach the computer how to make qualitative evaluations of this video data, which shows three-dimensional content displayed in two dimensions. You would do so by teaching it a basic visual, audio, and audio-visual language of content and context, starting with an almost abstract expressionist visual alphabet of at least two basic shapes and tones. Two objects are the minimum needed for any examination of relative position in space, from which deductions can then be made. The aim is to instruct the program on how to differentiate between the positions of the objects displayed relative to one another: whether, in the case of two basic shapes, it is to treat one as being bigger than the other and next to it, or whether they are the same size, with one positioned in this notional space at some distance beyond, or behind, the other. In other words, we teach it perspective and an appreciation of depth, which it can cross-reference against the pitch, volume, and quality of the audio tone to confirm these inferences.
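The ambiguity between "bigger and alongside" and "same size but further away" can be illustrated with a small sketch. It assumes a simple pinhole-camera model, which is my own choice for illustration and not something the videos themselves tell us; the function name and the numbers are hypothetical.

```python
import math

def apparent_height(real_height, distance, focal_length=1.0):
    """Height of an object's image under an assumed pinhole-camera model."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return focal_length * real_height / distance

# Hypothetical scene A: a tall shape far away.
far_and_tall = apparent_height(real_height=2.0, distance=10.0)

# Hypothetical scene B: a shape half as tall, at half the distance.
near_and_short = apparent_height(real_height=1.0, distance=5.0)

# Both shapes occupy the same height on screen, so size alone
# cannot tell the program which scene it is looking at.
same_on_screen = math.isclose(far_and_tall, near_and_short)
print(far_and_tall, near_and_short, same_on_screen)
```

Because both scenes produce the same flat image, the program would need some extra cue to choose between them, which is where the paired audio tone could come in.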

Repeating multiple scenarios, involving every conceivable permutation and arrangement of these most basic elements, could begin to furnish the program with a basic skill set, or language. That would then allow you to progress to higher degrees of sophistication: more complex shapes, a greater range of colours and shades, and more complex tonal arrangements, moving on to a non-photo-realistic cartoon, and then to actual video. That video could feature an object whose visual profile might, as far as a computer is concerned, resemble a person, in order to see if the program could tell the difference. An example would be the Eiffel Tower, which appears to be an object standing perpendicular to the earth and, visually, stands on two legs. These tests could then be elaborated upon to include videos with a dense, multi-layered audio track of voices, music, mechanical sounds and so on, all differentiated and understood by the program after only a brief scan.

This is a more complex process than you may think. In the case of two basic shapes, let's say one red and one blue rectangle, a computer left to itself would only interpret two of these which abut each other as being side by side. Only a human mind would infer from the relative size of each that one could be positioned behind the other, and therefore be partly concealed by it.
If you were, for instance, to arrange them in a cruciform pattern, with a thin vertical blue rectangle crossing a horizontal red rectangle near its top, so as to obscure the middle of the red one, we would infer that the red is positioned behind the blue, and that one arm of the cross is a continuation of the other. A computer, however, would perceive three separate and distinct shapes whose edges meet: two red and one blue.
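The cruciform example can be sketched in a few lines. This is only an illustration of how a naive program might count shapes, using a plain flood fill over a tiny hand-made grid; the grid, the function name `count_regions`, and the choice of 4-way connectivity are all my own assumptions.

```python
from collections import deque

# Hypothetical 5x4 "slide": B = blue, R = red, . = background.
GRID = [
    "..B..",
    "RRBRR",
    "..B..",
    "..B..",
]

def count_regions(grid, background="."):
    """Count connected same-colour regions using a 4-way flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    counts = {}
    for r in range(rows):
        for c in range(cols):
            colour = grid[r][c]
            if colour == background or (r, c) in seen:
                continue
            # A new, unvisited coloured cell starts a new region.
            counts[colour] = counts.get(colour, 0) + 1
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == colour):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
    return counts

regions = count_regions(GRID)
print(regions)
```

The count comes back as one blue region and two red ones. Nothing in this kind of pixel-level grouping suggests that the two red pieces are a single rectangle passing behind the blue one; that inference is exactly what the program would have to be taught.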

It would take a considerable amount of such testing, exposing the program to sample image and tone slides, before you began to see an increase in the success rate of its interpretive assessments when compared with a pre-prepared table of the results you expect for each slide.
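Checking the program's answers against the table of expected results might look something like the sketch below. The slide identifiers, the labels, and the function name `success_rate` are invented purely for illustration.

```python
def success_rate(predictions, expected):
    """Fraction of slides where the program's answer matches the expected one."""
    if not expected:
        raise ValueError("expected results table is empty")
    correct = sum(
        1 for slide_id, label in expected.items()
        if predictions.get(slide_id) == label
    )
    return correct / len(expected)

# Hypothetical pre-prepared table of expected results, one entry per slide.
expected = {
    "slide_001": "side_by_side",
    "slide_002": "red_behind_blue",
    "slide_003": "blue_behind_red",
    "slide_004": "side_by_side",
}

# Hypothetical answers from the program after one round of testing.
predictions = {
    "slide_001": "side_by_side",
    "slide_002": "side_by_side",
    "slide_003": "blue_behind_red",
    "slide_004": "side_by_side",
}

rate = success_rate(predictions, expected)
print(f"{rate:.0%} of slides interpreted as expected")
```

Tracking this figure from one round of slides to the next is what would show whether the program is actually improving.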

But by doing so, it could be possible to create a very powerful tool for quickly crunching through the vast amounts of video material posted on the internet, finding what may be undesirable or even criminal in nature, and either blocking it or tracing it to its source for policing purposes and the like, without having to rely solely on many man-hours of professional analysis, or on the public reporting it as abuse.

Basically, I am inclined to believe that someone is trying to teach a computer to think.

1 comment

  • janicerand

    Thank you badfaith for sharing your idea :)
