
Multimedia Final Report

Name : Al-fakeeh Nader


Student Id : 2021380082

How will AI technology be applied

in future multimedia applications?
The field of multimedia deals with the computer-controlled fusion of text,
animation, audio, drawings, graphics, and still and moving images (video), among
many other media, so that any kind of information can be digitally represented,
stored, processed, and conveyed. Multimedia can be part of a live presentation, but it
can also be played, displayed, recorded, interacted with, or accessed by information
processing equipment, such as automated and high-tech gadgets. Electronic devices
used to store and interact with multimedia content are known as multimedia devices.
In fine art, multimedia stands out from other media because it encompasses a wider
range of mediums. A vast amount of multimedia data has been created in the last ten
years as a result of the many newly developed multimedia services and apps, which
has advanced multimedia research. In turn, multimedia research has greatly advanced
the analysis of image and video content by providing tools such as multimedia
content distribution, multimedia search, and recommendation.

Artificial intelligence, or at least the modern concept of it, has been with us for
several decades, but only in the recent past has AI captured the collective psyche of
everyday business and society. AI is about the ability of computers and systems to
perform tasks that typically require human cognition. Our relationship with AI is
symbiotic: its reach extends into every aspect of our lives and livelihoods, from
earlier detection and better treatment for cancer patients to new revenue streams and
smoother operations for businesses of all shapes and sizes.

AI can be considered big data's great equalizer in collecting, analyzing,
democratizing, and monetizing information. The deluge of data we generate daily is
essential to training and improving AI systems for tasks such as automating processes
more efficiently, producing more reliable predictive outcomes, and providing greater
network security.

Take a stroll along the AI timeline


The introduction of AI in the 1950s very much paralleled the beginnings of the
Atomic Age. Though their evolutionary paths have differed, both technologies are
viewed as posing an existential threat to humanity.

Important keywords:

Multimedia
Technologies related to artificial intelligence
Applications for multimedia in the future
Introduction:
The development of science and technology over the past years shows huge progress
and change in artificial intelligence and multimedia technologies. The changes they
have brought to human lives are dynamically noticeable. These changes can advance
different industries and many other practices, for instance health care, education,
customer and emergency services, and the transport and agriculture industries. Even
though the concept of multimedia dates back to 1884, when Paul Nipkow, at the age
of 24, created the first scanning disc, the meaning of the term "multimedia" has
changed multiple times since. The past decade, and particularly the past few years,
has been transformative for artificial intelligence, not so much in terms of what we
can do with this technology as what we are doing with it. Some place the advent of
this era at 2007, with the introduction of smartphones. At its most essential,
intelligence is just intelligence, whether artifact or animal. It is a form of computation,
and as such, a transformation of information. The cornucopia of deeply personal
information that resulted from the willful tethering of a huge portion of society to the
internet has allowed us to pass immense explicit and implicit knowledge from human
culture, via human brains, into digital form. Here we can not only use it to operate
with human-like competence but also produce further knowledge and behavior by
means of machine-based computation.
Since multimedia involves many different scientific domains, such as signal
processing, computer vision, databases, networks, middleware, human-computer
interaction, social science, and the humanities, it is by definition multidisciplinary. To
properly envision "Future multimedia applications," we must first comprehend the
components and tools, and then we must comprehend the potential applications for
multimedia apps.

There are five basic elements of multimedia: text, images, audio, video, and
animation. Examples: text in faxes, photographic images, geographic information
system maps, voice commands, audio messages, music, graphics, moving graphics
animation, full-motion stored and live video, and holographic images.

Multimedia Tools and Applications


 A multimedia application is a program that makes use of several media
foundations, such as text, graphics, music, animation, and/or video.

 Key multimedia applications, and the specific technologies used in multimedia
systems, are covered at multimedia conferences. These include distributing audio
and video, multimedia, artificial intelligence, virtual reality (VR), and 3-D
imagery, as well as the ideas behind cutting-edge multimedia systems that deliver
tailored information to the user in a format for nonlinear communication.

 The fundamentals and cutting-edge aspects of programming, security,
human-computer interfaces, and multimedia application facilities are also
discussed at multimedia conferences.

 Virtual Reality (VR) and 3-D imaging


 Wireless, Mobile Computing
 Animation and Graphics
 Audio, video processing
 Education and training
 Visual Communication
 Multimedia analysis and Internet
 Artificial Intelligence and AI technologies
1. AI Technology - Machine Learning
As a scientific endeavor, machine learning grew out of the quest for artificial
intelligence (AI). In the early days of AI as an academic discipline, some researchers were
interested in having machines learn from data. They attempted to approach the problem with
various symbolic methods, as well as what were then termed "neural networks"; these were
mostly perceptrons and other models that were later found to be reinventions of
the generalized linear models of statistics. Probabilistic reasoning was also employed,
especially in automated medical diagnosis.

However, an increasing emphasis on the logical, knowledge-based approach caused a rift
between AI and machine learning. Probabilistic systems were plagued by theoretical and
practical problems of data acquisition and representation. By 1980, expert systems had
come to dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based
learning did continue within AI, leading to inductive logic programming, but the more
statistical line of research was now outside the field of AI proper, in pattern recognition
and information retrieval. Neural network research had been abandoned by AI
and computer science around the same time. This line, too, was continued outside the AI/CS
field, as "connectionism", by researchers from other disciplines including Hopfield,
Rumelhart, and Hinton. Their main success came in the mid-1980s with the reinvention
of backpropagation.

Machine learning (ML), reorganized and recognized as its own field, started to flourish in the
1990s. The field changed its goal from achieving artificial intelligence to tackling solvable
problems of a practical nature. It shifted focus away from the symbolic approaches it had
inherited from AI, and toward methods and models borrowed from statistics, fuzzy logic,
and probability theory.

Data mining
Machine learning and data mining often employ the same methods and overlap significantly,
but while machine learning focuses on prediction, based on known properties learned from
the training data, data mining focuses on the discovery of (previously) unknown properties in
the data (this is the analysis step of knowledge discovery in databases). Data mining uses
many machine learning methods, but with different goals; on the other hand, machine
learning also employs data mining methods as "unsupervised learning" or as a
preprocessing step to improve learner accuracy. Much of the confusion between these two
research communities (which do often have separate conferences and separate
journals, ECML PKDD being a major exception) comes from the basic assumptions they
work with: in machine learning, performance is usually evaluated with respect to the ability
to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the
key task is the discovery of previously unknown knowledge. Evaluated with respect to known
knowledge, an uninformed (unsupervised) method will easily be outperformed by other
supervised methods, while in a typical KDD task, supervised methods cannot be used due to
the unavailability of training data.

Machine learning also has intimate ties to optimization: many learning problems are
formulated as minimization of some loss function on a training set of examples. Loss
functions express the discrepancy between the predictions of the model being trained and
the actual problem instances (for example, in classification, one wants to assign a label to
instances, and models are trained to correctly predict the pre-assigned labels of a set of
examples).
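As a rough illustration of this idea, here is a minimal sketch (in Python, with invented data) of a loss function expressing the discrepancy between a model's predictions and the actual labels:

```python
# Squared-error loss of a linear model y = w * x on a tiny training set.
# The data and candidate weights below are illustrative, not from any real task.

def squared_loss(w, data):
    """Mean squared error of the model y = w * x over (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Toy training set that follows y = 2x exactly.
train = [(1, 2), (2, 4), (3, 6)]

# The loss is zero at the true weight and grows as predictions diverge.
print(squared_loss(2.0, train))  # 0.0
print(squared_loss(1.0, train))  # larger: predictions under-shoot every label
```

Formulating learning as minimization of such a loss over a training set is what connects machine learning to optimization.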
Generalization
The difference between optimization and machine learning arises from the goal
of generalization: while optimization algorithms can minimize the loss on a training set,
machine learning is concerned with minimizing the loss on unseen samples. Characterizing
the generalization of various learning algorithms is an active topic of current research,
especially for deep learning algorithms.

Statistics
Machine learning and statistics are closely related fields in terms of methods, but distinct in
their principal goal: statistics draws population inferences from a sample, while machine
learning finds generalizable predictive patterns. According to Michael I. Jordan, the ideas of
machine learning, from methodological principles to theoretical tools, have had a long pre-
history in statistics. He also suggested the term data science as a placeholder to call the
overall field.

Conventional statistical analyses require the a priori selection of a model most suitable for
the study data set. In addition, only significant or theoretically relevant variables based on
previous experience are included for analysis. In contrast, machine learning is not built on a
pre-structured model; rather, the data shape the model by detecting underlying patterns. The
more variables (input) used to train the model, the more accurate the ultimate model will be.

Leo Breiman distinguished two statistical modeling paradigms: data model and algorithmic
model, wherein "algorithmic model" means more or less the machine learning algorithms
like Random Forest.

Some statisticians have adopted methods from machine learning, leading to a combined field
that they call statistical learning.

Physics
Analytical and computational techniques derived from deep-rooted physics of disordered
systems can be extended to large-scale problems, including machine learning, e.g., to
analyze the weight space of deep neural networks. Statistical physics is thus finding
applications in the area of medical diagnostics.

Theory:
Main articles: Computational learning theory and Statistical learning theory
A core objective of a learner is to generalize from its experience. Generalization in this
context is the ability of a learning machine to perform accurately on new, unseen
examples/tasks after having experienced a learning data set. The training examples come
from some generally unknown probability distribution (considered representative of the space
of occurrences) and the learner has to build a general model about this space that enables it
to produce sufficiently accurate predictions in new cases.

The computational analysis of machine learning algorithms and their performance is a
branch of theoretical computer science known as computational learning theory, via
the Probably Approximately Correct (PAC) learning model. Because training sets are finite
and the future is uncertain, learning theory usually does not yield guarantees of the
performance of algorithms. Instead, probabilistic bounds on the performance are quite
common. The bias–variance decomposition is one way to quantify generalization error.

For the best performance in the context of generalization, the complexity of the hypothesis
should match the complexity of the function underlying the data. If the hypothesis is less
complex than the function, then the model has underfitted the data. If the complexity of the
model is increased in response, then the training error decreases. But if the hypothesis is too
complex, then the model is subject to overfitting and generalization will be poorer.
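The gap between training error and error on unseen samples can be shown with a toy sketch (the data and both models below are invented for illustration): a model that merely memorizes the training set achieves zero training error but generalizes poorly.

```python
# Contrast a model that memorizes the training set (zero training error,
# poor generalization) with a simpler hypothesis matched to the trend.

train = [(1, 2.1), (2, 3.9), (3, 6.2)]   # roughly y = 2x, with noise
test = [(4, 8.0), (5, 10.1)]             # unseen samples from the same trend

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# "Overfit" model: a lookup table that memorizes training labels
# and falls back to 0 on anything it has not seen.
table = dict(train)
def memorizer(x):
    return table.get(x, 0.0)

# Simpler hypothesis capturing the underlying trend: y = 2x.
def linear(x):
    return 2.0 * x

print(mse(memorizer, train))  # 0.0: perfect on the training set
print(mse(memorizer, test))   # huge: it predicts 0 for unseen inputs
print(mse(linear, test))      # small: it generalizes to the unseen samples
```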

In addition to performance bounds, learning theorists study the time complexity and
feasibility of learning. In computational learning theory, a computation is considered feasible
if it can be done in polynomial time. There are two kinds of time complexity results: Positive
results show that a certain class of functions can be learned in polynomial time. Negative
results show that certain classes cannot be learned in polynomial time.

Approaches:
Machine learning approaches are traditionally divided into three broad categories, which
correspond to learning paradigms, depending on the nature of the "signal" or "feedback"
available to the learning system:

 Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to
outputs.
 Unsupervised learning: No labels are given to the learning algorithm, leaving it on its
own to find structure in its input. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data) or a means towards an end (feature learning).
 Reinforcement learning: A computer program interacts with a dynamic environment in
which it must perform a certain goal (such as driving a vehicle or playing a game against
an opponent). As it navigates its problem space, the program is provided feedback
analogous to rewards, which it tries to maximize. Although each algorithm has
advantages and limitations, no single algorithm works for all problems.
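The supervised paradigm can be sketched in a few lines (a hypothetical 1-D task with hand-labeled data): the "teacher" supplies labeled examples, and the learner searches for a general rule that maps inputs to outputs.

```python
# Minimal supervised-learning sketch: learn a threshold that separates
# two classes on a one-dimensional feature. The data is invented.

def learn_threshold(examples):
    """Brute-force search for the class boundary (a midpoint between
    sorted feature values) that minimizes training error."""
    best_t, best_err = None, float("inf")
    xs = sorted(x for x, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for t in candidates:
        err = sum(int((x > t) != label) for x, label in examples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# "Teacher"-labeled inputs: label 1 when the feature exceeds roughly 5.
train = [(1, 0), (2, 0), (4, 0), (6, 1), (7, 1), (9, 1)]
t = learn_threshold(train)
print(t)            # 5.0, the midpoint between 4 and 6
print(int(8 > t))   # 1: the learned rule labels a new, unseen input
```

In unsupervised learning the labels would be absent, and in reinforcement learning the labels would be replaced by delayed reward signals.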

Machine Learning's Application in Multimedia:

Examples of Machine Learning Algorithms Used in Social Media


Machine learning algorithms have become an integral part of social media platforms,
enabling them to analyze vast amounts of user-generated data and make informed
decisions. Some of the commonly used ML algorithms are:

Natural Language Processing (NLP)


NLP aids in understanding and interpreting human language patterns. Social media
companies utilize NLP to analyze text data, including tweets, comments, and posts, to
extract sentiment, categorize content, or identify trends.
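As a toy illustration of sentiment extraction (real systems use trained models; the word lists here are a hand-picked stand-in):

```python
# Keyword-based sentiment scorer in the spirit of NLP sentiment analysis
# on social posts. The vocabulary below is invented for illustration.

POSITIVE = {"great", "love", "amazing", "good"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def sentiment(post):
    """Classify a post by counting positive vs. negative words."""
    words = [w.strip(".,!?") for w in post.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great stream"))   # positive
print(sentiment("terrible audio, bad video"))  # negative
```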


1. Simpler virtual studios

Virtual studios are revolutionizing the broadcasting and content creation
industries by fusing digital and physical elements to create or improve a studio
space. These setups provide an immersive and adaptable work environment with
green screen technology, real-time rendering, and innovative software.

Evolution and Technology


The concept of the virtual studio has evolved significantly with the
advancement of computer graphics and real-time video processing. One of the
main features of the first virtual studios was chroma keying, which substitutes
a digital backdrop for a single-color backdrop (usually blue or green). Modern
virtual studios, however, have more sophisticated elements:
Real-Time Rendering: Advanced software can be used to render complex 3D scenes in
real-time. With the help of this technology, the background becomes more dynamic and
captivating in response to changes in perspective and camera motions.

Motion Tracking: When cameras equipped with motion-tracking capabilities move around
the set, the virtual environment automatically adjusts to maintain perspective and depth.

Aspects of Augmented Reality (AR): With AR technology, digital images can be placed
over the actual studio space, seamlessly integrating them with the live-action footage.

LED Walls and Volumes: Rather than using green screens to portray virtual
surroundings, some studios use LED walls. This technique allows for more realistic
lighting and reflections, as well as more natural interactions between the actors and
their surroundings.
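The chroma-key technique described above can be sketched in a few lines of Python (frames are simplified to flat lists of RGB tuples, and the tolerance value is an arbitrary choice):

```python
# Sketch of the chroma-key idea behind early virtual studios: replace
# pixels close to the key color (green) with the corresponding pixel
# of a digital backdrop.

KEY = (0, 255, 0)  # pure green screen

def is_key(pixel, tol=60):
    """Treat a pixel as 'green screen' if every channel is near the key."""
    return all(abs(c - k) <= tol for c, k in zip(pixel, KEY))

def chroma_key(frame, backdrop):
    """Composite: keep foreground pixels, substitute backdrop for key pixels."""
    return [bg if is_key(px) else px for px, bg in zip(frame, backdrop)]

studio = [(10, 250, 12), (200, 30, 40), (5, 245, 8)]    # green, actor, green
scene  = [(80, 80, 200), (80, 80, 200), (80, 80, 200)]  # digital backdrop
print(chroma_key(studio, scene))
# [(80, 80, 200), (200, 30, 40), (80, 80, 200)]
```

Real keyers work on full images with soft edges and spill suppression; this only shows the per-pixel substitution at the core of the idea.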

2. Comment integration and aggregation

 Live events and video broadcasts can be viewed on a number of different
platforms, including YouTube, Instagram, TikTok, and other CDNs. Viewers watch
on the social networking site of their choice, with many favoring one platform
over another. This behavior creates a problem, however: viewer discussion and
comments are split up across the platforms. When all pertinent comments are kept
in one location, moderators can respond to viewers' comments more easily and
engagement levels increase significantly.

 Machine learning can be used to aggregate social media comments into the
stream, enabling hosts to respond to comments in real time and automatically
direct responses to the relevant social media site. Machine learning can also
make it easier to add dynamic content to a broadcast, such as a discussion on a
news feed or a Twitter hashtag. By detecting and learning important keywords
from a variety of digital media platforms, the system can dynamically include
that content in the livestream.
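A minimal sketch of the aggregation step, assuming upstream per-platform comment feeds exist (the platform names, timestamps, and keyword list below are invented):

```python
# Cross-platform comment aggregation: collect comments from several
# (hypothetical) platform feeds, keep only those matching the event's
# keywords, and merge them into one stream sorted by timestamp.

EVENT_KEYWORDS = {"#keynote2024", "@ourchannel"}

def aggregate(feeds):
    """feeds: {platform: [(timestamp_seconds, text), ...]}"""
    merged = []
    for platform, comments in feeds.items():
        for ts, text in comments:
            if any(k in text.lower() for k in EVENT_KEYWORDS):
                merged.append((ts, platform, text))
    return sorted(merged)  # chronological order across all platforms

feeds = {
    "youtube":   [(12, "Loving the demo #keynote2024"), (40, "unrelated chat")],
    "instagram": [(5, "@ourchannel when does Q&A start?")],
}
print(aggregate(feeds))
# [(5, 'instagram', '@ourchannel when does Q&A start?'),
#  (12, 'youtube', 'Loving the demo #keynote2024')]
```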

Potential applications of machine learning:

 Acquire keywords associated with the live event (such as hashtags (#),
handles (@), or the event name);
 Combine audience feedback into the live broadcast across all video platforms;
 Keep an eye on online conversations on designated platforms (such as Facebook,
YouTube, Twitter, TikTok, and news and media websites) and incorporate content
into the livestream dynamically.
3. Indexing using transcription, visual cues, and OCR
 With indexing, the audience can easily find a certain location inside the
presentation or lecture without having to search through it by hand, which
becomes more inconvenient as recordings get longer. This is especially useful
for presentations and seminars where the audience may wish to rewatch
important scenes or crucial lessons covered in the video. A live video
can be indexed using machine learning in a few ways:

 Visual/auditory cues: A live event or recorded lecture can also be indexed
according to visual or auditory cues, such as applause from the crowd, a slide
change, or the appearance of a new speaker.
 Audio transcription: Although it can be done by hand, it takes a lot of
time and labor to convert audio files into text.
 Optical Character Recognition, also known as OCR, is a technology that
enables you to transform a range of documents into text data that can be
searched, including scanned paper documents, PDF files, and digital
photographs. After that, readers will be able to quickly find particular
information within a document or media file thanks to the indexing of
this data.

 Machine learning can help automate each of these video indexing methods,
helping to save tremendous costs by reducing the need for manual
transcription. Human operators can instead use their time to verify
transcribed/converted text, therefore helping the software learn new words and
correct any grammar issues.
 Possibilities for machine learning include:
 Transcribing audio to text and using that text as a basis for indexing
important points in the VOD;
 Using OCR to turn overlays, lower thirds, and other on-screen text into
searchable data and automatically index important points in the video;
 Learning particular visual and auditory signals (such as clapping or the
presence of a presenter's face) and generating an index entry automatically
upon cue detection in the video.
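The transcript-based variant can be sketched as follows, assuming an upstream transcription step has already produced timestamped text segments (the transcript and keywords below are invented):

```python
# Transcript-based indexing: given (timestamp, text) segments from
# automatic transcription, build an index of where each keyword of
# interest occurs, so viewers can jump straight to that moment.

def build_index(transcript, keywords):
    """Return {keyword: [timestamps where it appears]}."""
    index = {k: [] for k in keywords}
    for ts, text in transcript:
        for k in keywords:
            if k in text.lower():
                index[k].append(ts)
    return index

transcript = [
    (0,   "welcome everyone to the lecture"),
    (95,  "let's talk about neural networks"),
    (310, "neural networks can also be used for video"),
]
print(build_index(transcript, ["neural networks", "video"]))
# {'neural networks': [95, 310], 'video': [310]}
```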

4. Smart live switching


A live production or lecture needs to switch between several different video
sources or unique layouts in order to be genuinely engaging. Switching draws the
audience in and helps emphasize key points of the presentation, but the
procedure is normally carried out by hand. Regular switching can also be
challenging for smaller livestreams with fewer staff, and these smaller teams
could miss a great chance to give viewers an engaging live video experience.

 With today's encoding technology, machine learning can be used to automate
the switching process in response to spoken or visual cues, such as audience
applause, presenter movement, or gestures. Is the speaker sharing a story from
their own life? Switch to the camera view. Is the speaker using presentation
slides to explain a concept? Switch to the slide view. Thanks to machine
learning, presenters and AV technicians need not exert as much effort to create
an interesting switched live production.
 Potential applications of machine learning

 Learn visual and audio cues for each video source or layout;
 Switch to each video source or layout based on learned cues.
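The two bullets above can be sketched as a tiny cue-to-layout state machine. The cue names and layout names are invented, and the cues are assumed to come from an upstream ML detector:

```python
# Cue-driven source switching: map detected audiovisual cues to the
# layout a switcher would cut to; unknown cues hold the current shot.

CUE_TO_LAYOUT = {
    "slide_change":     "slides",
    "presenter_moving": "camera",
    "applause":         "wide_shot",
}

def switch(cues, current="camera"):
    """Replay a sequence of detected cues, returning the layout timeline."""
    timeline = [current]
    for cue in cues:
        current = CUE_TO_LAYOUT.get(cue, current)  # unknown cue: hold the shot
        timeline.append(current)
    return timeline

print(switch(["slide_change", "cough", "presenter_moving", "applause"]))
# ['camera', 'slides', 'slides', 'camera', 'wide_shot']
```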
5. Dynamic image calibration
For viewers to see a presentation clearly, live streams and recordings need
their image settings (such as exposure and white balance) calibrated well.
Picture calibration can be a challenging procedure, especially if users lack
the knowledge to make the required corrections or if environmental conditions
(such as lighting) are unpredictable. By identifying the existing picture
settings and applying adjustments to enhance picture quality, machine learning
can expedite the calibration process.

Potential applications of machine learning include:
• Identifying the current picture settings;
• Learning the ideal picture settings to get the best possible shot;
• Offering recommendations for enhancing the current image (or even configuring
the settings automatically).
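The adjustment step can be illustrated with a simple auto-exposure sketch (real calibration also covers white balance and is model-driven; the target value and pixel data here are invented):

```python
# Auto-exposure sketch: scale pixel brightness so the frame's mean
# luminance approaches a target value, clamping to the valid 0-255 range.

TARGET_MEAN = 128

def calibrate(frame):
    """frame: flat list of grayscale pixel values (0-255)."""
    mean = sum(frame) / len(frame)
    gain = TARGET_MEAN / mean if mean else 1.0
    return [min(255, round(p * gain)) for p in frame]

dark_frame = [40, 50, 60, 50]   # underexposed pixels, mean = 50
fixed = calibrate(dark_frame)
print(sum(fixed) / len(fixed))  # 128.0, right at the target
```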

6. Automated audio optimization


Halfway down our list of machine learning applications for video production is
automated audio optimization. High-quality audio is essential when live
streaming or recording a presentation or lecture; without clear audio, viewers
cannot fully experience or understand the presentation. For the average
non-technical presenter or lecturer, however, audio problems such as inaudible
or distorted volume, or a misbehaving microphone, are difficult to resolve
quickly. These issues often require an AV technician to diagnose and resolve
before the presentation can proceed, which is not always convenient. Machine
learning can keep a watchful eye on the audio and automatically make adjustments
to ensure maximum audio quality, notifying technicians only when critical audio
issues are detected.
Machine learning possibilities:
 Streamline the audio diagnostic process;
 Indicate to technicians when there is an audio issue to address;
 Ensure high-quality audio is available at all times;
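One piece of such a monitor can be sketched as peak normalization plus a clipping check (the target level and sample values are invented; real pipelines work on streaming audio buffers):

```python
# Audio auto-leveling sketch: normalize samples to a target peak and
# flag clipping, the kind of check an automated monitor could run
# continuously during a stream. Samples are floats in [-1.0, 1.0].

TARGET_PEAK = 0.9

def optimize(samples):
    """Return (leveled_samples, clipped_flag)."""
    peak = max(abs(s) for s in samples)
    clipped = peak >= 1.0              # the input already hit full scale
    gain = TARGET_PEAK / peak if peak else 1.0
    return [s * gain for s in samples], clipped

quiet = [0.1, -0.2, 0.15]              # an overly quiet signal
leveled, clipped = optimize(quiet)
print(max(abs(s) for s in leveled))    # ~0.9, raised to the target peak
print(clipped)                         # False
```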

7. Smarter presenter tracking


A lecture or live event frequently features one or more presenters who command the
attention of the audience. While presenting their topic, presenters will come and go from
within the frame and move around the stage. Often, using a camera to follow the speaker
while they move makes the presentation more interesting as a whole. For smaller live
performances or lectures, tracking would have to be done manually by a human camera
operator, which can be expensive. In this case, machine learning can be used to recognize
presenter faces and follow their movements without requiring manual camera control.
The camera automatically repositions itself in real time as the presenter moves
around the stage, ensuring they remain clearly visible within the frame.
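The repositioning logic reduces to simple geometry once a face detector supplies a bounding box (the detector itself, the frame width, and the box coordinates below are assumptions):

```python
# Presenter-tracking sketch: given a detected face bounding box, compute
# how far to pan so the presenter stays centered in the frame.

FRAME_WIDTH = 1920

def pan_offset(face_box):
    """face_box: (left, right) x-coordinates of the detected face.
    Positive result means pan right; negative means pan left."""
    face_center = (face_box[0] + face_box[1]) / 2
    return face_center - FRAME_WIDTH / 2

print(pan_offset((900, 1020)))   # 0.0: already centered
print(pan_offset((1500, 1620)))  # 600.0: presenter drifted right
```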
8. Condensed videos

There may be occasional issues with a lecture or webcast, such as a speaker
change, a delay in preparing the presentation materials, or minor technical
problems. In the recorded version of a presentation, professionals can
eliminate this downtime and provide viewers with a polished, expert end
product. Through the identification and removal of gaps in the recorded content
during post-production, machine learning can assist in automating this process.

Potential applications of machine learning include:
• Recognizing visual and auditory cues based on predetermined criteria (e.g.,
more than 10 seconds of quiet, or the presenter leaving the stage);
• Automatically eliminating detected gaps from the finished result in
post-production;
• Helping video editors in high-volume video production environments save time
and effort on standard post-production activities.
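Assuming an upstream detector has already labeled the timeline as active or quiet, the gap-removal step itself is straightforward (the threshold and timeline below are invented):

```python
# Gap-removal sketch for condensed recordings: drop quiet stretches
# longer than a threshold and report the condensed duration (seconds).

MAX_QUIET = 10

def condense(segments):
    """segments: list of (start, end, label) with label 'active'/'quiet'.
    Keep all active segments and only short quiet pauses."""
    kept = [(s, e) for s, e, label in segments
            if label == "active" or (e - s) <= MAX_QUIET]
    return sum(e - s for s, e in kept)

timeline = [
    (0, 300, "active"),    # talk
    (300, 345, "quiet"),   # 45 s delay swapping speakers: removed
    (345, 600, "active"),  # talk resumes
    (600, 605, "quiet"),   # short pause: kept, under the threshold
]
print(condense(timeline))  # 560: 45 seconds of dead air removed
```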

9. Simplified recording control


When a lecture is being recorded, the lecturer must manually operate the
encoding system to start and stop the recording. Even though this task is quite
straightforward, machine learning offers the chance to automate it so professors
can concentrate on what they do best: teaching. By automatically determining the
start and finish of each lecture, machine learning technologies can streamline
the recording control process. For instance, machine learning can begin
recording based on cues from the environment, such as the lights in the room
turning on, sound being detected, or someone walking onto the stage.

The presenter's face, ambient lighting, presentation materials, and other
audiovisual signals can all be learned by machine learning, which can then
start and stop recording when it detects these cues.
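The control logic amounts to a small state machine over detected cues. The cue names are invented placeholders; which cues a model can reliably detect is assumed here, not demonstrated:

```python
# Recording-control sketch: start and stop recording on environmental
# cues, ignoring cues that do not change the current state.

START_CUES = {"lights_on", "speech_detected", "presenter_on_stage"}
STOP_CUES  = {"lights_off", "room_empty"}

def run(cues):
    """Replay detected cues and return the recorder actions taken."""
    recording, log = False, []
    for cue in cues:
        if not recording and cue in START_CUES:
            recording = True
            log.append("start")
        elif recording and cue in STOP_CUES:
            recording = False
            log.append("stop")
    return log

print(run(["lights_on", "speech_detected", "lights_off", "presenter_on_stage"]))
# ['start', 'stop', 'start']
```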

10. Automated lower thirds


Lower thirds are graphics, animations, or text overlays that are used in live video to
engage viewers and convey a message or other contextual information, such as a
presenter’s name or title. Created using special video editing software (such
as NewBlueFX), lower thirds can be applied in real time or can be manually
configured in post-production to appear at key moments during the presentation.
Machine learning in video editing applications can be used to recognize speaker
faces and other visual cues and automatically display the appropriate overlays without
the need for manual intervention by video editors.

Machine learning possibilities:
• Acquire visual and auditory cues, such as the face of each presenter as they
enter the frame;
• Using acquired cues, automatically show pertinent information in the lower
thirds.
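Once a face recognizer (assumed here) identifies who is on screen, choosing the overlay is a simple lookup; the face IDs, names, and titles below are invented placeholders:

```python
# Lower-thirds sketch: map a recognized presenter to the overlay text
# to display; unknown faces show no overlay at all.

OVERLAYS = {
    "face_001": "Dr. A. Example, Keynote Speaker",
    "face_002": "B. Example, Panel Moderator",
}

def lower_third(face_id):
    """Return the overlay text for a recognized face, or None."""
    return OVERLAYS.get(face_id)

print(lower_third("face_001"))  # Dr. A. Example, Keynote Speaker
print(lower_third("face_999"))  # None: unknown face, no overlay shown
```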

11. Highlight reels
The last item in the list of machine learning applications for video production
is highlight reels. A recorded presentation can be repurposed as marketing
collateral by editing the original material down to only the presentation
highlights, such as a speaker's key points or important moments in the event.
Machine learning can be applied to automatically search for and isolate key
moments in the recorded video(s) using visual cues (e.g., transcribed text) and
audio cues (e.g., audience applause). The software can then help create a
highlight reel from these isolated clips for video editors to review. This is
particularly helpful for saving video editors time and effort on routine
post-production tasks in a high-volume video setting.
Machine learning possibilities:
• Learn visual and audio cues that correspond to an important moment, such as
audience applause or keywords within transcribed text.
• Automatically isolate video clips based on learned cues.
• Stitch the isolated clips together to form a highlight reel.
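The clip-isolation step can be sketched as cutting a window around each detected cue and merging overlaps (the cue timestamps and padding values below are arbitrary illustrations):

```python
# Highlight-reel sketch: around each detected cue timestamp (applause,
# keyword hits), cut a short clip and merge overlapping clips.

PRE, POST = 5, 10  # seconds kept before/after each cue

def highlight_clips(cue_times):
    """Return merged (start, end) clip windows around cue timestamps."""
    clips = sorted((max(0, t - PRE), t + POST) for t in cue_times)
    merged = []
    for start, end in clips:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(highlight_clips([30, 38, 120]))
# [(25, 48), (115, 130)]  (the first two cues overlap into one clip)
```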

Machine learning offers many possibilities for automating, streamlining, and
personalizing your live streams and recordings. Whether you're a content
creator, an AV technician for an educational institution, or a live event
specialist, machine learning can help improve your live video experience. With
the rapid advancement of science and technology, and with the investment of
scholars in exploring AI and technologies like machine learning as applied to
multimedia applications, it will open new branches and bring more convenience
to human life.

References:

 The history of artificial intelligence: Complete AI timeline. TechTarget.
https://www.techtarget.com/searchenterpriseai/tip/The-history-of-artificial-intelligence-Complete-AI-timeline

 The future of AI's impact on society. MIT Technology Review.
https://www.technologyreview.com/2019/12/18/102365/the-future-of-ais-impact-on-society/

 "A survey on Chromecast digital device," Journal of Emerging Technologies
and Innovative Research (ISSN: 2349-5162), Volume 5, Issue 10, October 2018.

 Wikipedia: Multimedia. https://en.wikipedia.org/wiki/Multimedia

 Multimedia Applications by Klara Nahrstedt and Ralf Steinmetz.

 IBM: What is machine learning? https://www.ibm.com/cloud/learn/machine-learning

 Epiphan: Machine learning applications for video production.
https://www.epiphan.com/blog/machine-learning-applications/
