CLOUD-BASED MULTIMEDIA CONTENT PROTECTION SYSTEM
Abstract—We propose a new design for large-scale multimedia content protection
systems. Our design leverages cloud infrastructures to provide cost efficiency,
rapid deployment, scalability, and elasticity to accommodate varying workloads.
The proposed system can be used to protect different multimedia content types,
including 2-D videos, 3-D videos, images, audio clips, songs, and music clips.
The system can be deployed on private and/or public clouds. Our system has two
novel components: (i) a method to create signatures of 3-D videos, and (ii) a
distributed matching engine for multimedia objects. The signature method creates
robust and representative signatures of 3-D videos that capture the depth
signals in these videos. The signatures are computationally efficient to compute
and compare, and they require little storage. The distributed matching engine
achieves high scalability and it is designed to support different multimedia
objects. We implemented the proposed system and deployed it on two clouds: Amazon
cloud and our private cloud. Our experiments with more than 11,000 3-D videos
and 1 million images show the high accuracy and scalability of the proposed
system. In addition, we compared our system to the protection system used by
YouTube and our results show that the YouTube protection system fails to detect
most copies of 3-D videos, while our system detects more than 98% of them. This
comparison shows the need for the proposed 3-D signature method, since the
state-of-the-art commercial system was not able to handle 3-D videos.
EXISTING SYSTEM:
The problem of protecting various types
of multimedia content has attracted significant attention from academia and
industry. One approach to this problem is using watermarking, in which some
distinctive information is embedded in the content itself and a method is used
to search for this information in order to verify the authenticity of the
content. Watermarking requires inserting watermarks in the multimedia objects before
releasing them as well as mechanisms/systems to find objects and verify the
existence of correct watermarks in them. Thus, this approach may not be
suitable for already-released content that carries no watermarks. The
watermarking approach is better suited to somewhat controlled
environments, such as distribution of multimedia content on DVDs or through
special sites and custom players. Watermarking may not be effective for the
rapidly growing volume of online videos, especially those uploaded to sites such as
YouTube and played back by any video player. Watermarking is not the focus of
this paper. The focus of this paper is on the other approach for protecting
multimedia content, which is content-based copy detection (CBCD). In this
approach, signatures (or fingerprints) are extracted from original objects.
Signatures are also created from query (suspected) objects downloaded from
online sites. Then, the similarity is computed between original and suspected
objects to find potential copies. Many previous works proposed different
methods for creating and matching signatures. These methods can be classified
into four categories: spatial, temporal, color, and transform-domain. Spatial
signatures (particularly the block-based) are the most widely used. However,
their weakness is the lack of resilience against large geometric
transformations. Temporal and color signatures are less robust and can be used
to enhance spatial signatures. Transform-domain signatures are computationally
intensive and not widely used in practice. For more details, see the surveys on
audio fingerprinting and 2-D video fingerprinting.
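To make the idea of a block-based spatial signature concrete, the following is a minimal sketch of one well-known variant, the ordinal signature: the frame is divided into a grid, the mean intensity of each block is computed, and the blocks are ranked. The grid size, the function names, and the L1 distance are assumptions chosen for illustration; the text above does not prescribe a specific method.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame as rows of pixel intensities


def block_means(frame: Frame, rows: int, cols: int) -> List[float]:
    """Average intensity of each cell in a rows x cols grid over the frame."""
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def ordinal_signature(frame: Frame, rows: int = 2, cols: int = 2) -> List[int]:
    """Rank of each block by mean intensity (0 = darkest block)."""
    means = block_means(frame, rows, cols)
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, block in enumerate(order):
        ranks[block] = rank
    return ranks


def signature_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two rank vectors; 0 means identical ordering."""
    return sum(abs(x - y) for x, y in zip(a, b))


original = [[10, 10, 200, 200],
            [10, 10, 200, 200],
            [90, 90, 50, 50],
            [90, 90, 50, 50]]
# A brightened copy: every pixel shifted by a constant, block order unchanged.
brightened = [[p + 30 for p in row] for row in original]
# A horizontally flipped copy: a geometric transformation.
flipped = [list(reversed(row)) for row in original]

sig_original = ordinal_signature(original)
sig_brightened = ordinal_signature(brightened)
sig_flipped = ordinal_signature(flipped)
```

In this toy example the brightened copy keeps the same rank vector as the original, while the flipped copy does not, which mirrors the weakness against geometric transformations noted above.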
PROPOSED SYSTEM:
The proposed system is fairly complex
with multiple components, including: (i) a crawler to download thousands of
multimedia objects from online hosting sites, (ii) a signature method to create
representative fingerprints from multimedia objects, and (iii) a distributed
matching engine to store signatures of original objects and match them against
query objects. We propose novel methods for the second and third components,
and we utilize off-the-shelf tools for the crawler. We have developed a
complete running system of all components and tested it with more than 11,000
3-D videos and 1 million images. We deployed parts of the system on the Amazon
cloud with a varying number of machines (from eight to 128), and the other parts
of the system were deployed on our private cloud. This deployment model was
used to show the flexibility of our system, which enables it to efficiently
utilize varying computing resources and minimize the cost, since cloud
providers offer different pricing models for computing and network resources.
Through extensive experiments with real deployment, we show the high accuracy
(in terms of precision and recall) as well as the scalability and elasticity of
the proposed system.
Module 1
Distributed Matching Engine
Unlike many of the previous works, which designed a system for image matching,
our proposed matching engine is general and it can support different types of
multimedia objects, including images, 2-D videos, and 3-D videos. To achieve
this generality, we divide the engine into two main stages. The first stage
computes nearest neighbors for a given data point, and the second stage
post-processes the computed neighbors based on the object type. In addition,
our design supports high dimensionality, which is needed for multimedia objects
that are rich in features. Computing nearest neighbors is a common problem in
many applications. Our focus in this paper is on distributed techniques that
can scale to large datasets. Liao et al. build a
multi-dimensional index using an R-tree on top of the Hadoop distributed file
system (HDFS). Their index, however, can only handle low-dimensional
datasets; they performed their experiments with two-dimensional data. They solve
the nearest-neighbor problem over large datasets using MapReduce. Lu et al. construct a Voronoi-like diagram using some
selected pivot objects. They then group the data points around the closest
pivots and assign them to partitions, where searching can be done in parallel.
Their system is also designed for low-dimensional
datasets; it did not consider data with more than 30 dimensions. In
contrast, in our experiments we used images and videos with up to 128
dimensions. Aly et al. propose a
distributed system for image retrieval. A major drawback of this system is
using a single machine that directs all query points, which makes it a single
point of failure as well as a bottleneck that could slow down the whole system.
Our system does not use a central machine, and thus it is more robust and
scalable. The closest work to ours is the RankReduce system, which implements a
distributed LSH (Locality Sensitive Hashing) index on a computing cluster using
MapReduce. RankReduce maintains multiple hash tables over a distributed
cluster, which requires storing multiple replicas of the datasets in hash tables.
This incurs significant storage cost and it increases the number of I/O
operations. In contrast, our system stores the dataset only once. We compare
the proposed matching engine against RankReduce and we show that our system
returns more accurate neighbors and it is more efficient.
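The two-stage structure described above can be sketched as follows. This is an illustrative single-machine sketch, not the engine itself: the brute-force neighbor search, the distance threshold `max_dist`, and the frame-voting rule with `min_share` are assumptions introduced here to show how a generic first stage can feed a type-specific second stage.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]
# Reference entry: (object id, signature point). A video contributes many points.
Reference = Tuple[str, Point]
Neighbor = Tuple[float, str]  # (distance, object id)


def nearest_neighbors(query: Point, refs: List[Reference], k: int) -> List[Neighbor]:
    """Stage 1 (type-agnostic): the k closest reference points to the query."""
    scored = ((math.dist(query, point), obj_id) for obj_id, point in refs)
    return heapq.nsmallest(k, scored)


def match_image(neighbors: List[Neighbor], max_dist: float) -> List[str]:
    """Stage 2 for images: keep every neighbor within a distance threshold."""
    return [obj_id for dist, obj_id in neighbors if dist <= max_dist]


def match_video(per_frame: List[List[Neighbor]], max_dist: float,
                min_share: float) -> List[str]:
    """Stage 2 for videos: a reference matches if enough query frames vote for it."""
    votes: Counter = Counter()
    for neighbors in per_frame:
        # Each query frame votes at most once per reference object.
        votes.update({obj_id for dist, obj_id in neighbors if dist <= max_dist})
    needed = min_share * len(per_frame)
    return sorted(obj_id for obj_id, count in votes.items() if count >= needed)


refs = [("video_a", (0.0, 0.0)), ("video_a", (1.0, 1.0)), ("video_b", (5.0, 5.0))]
query_frames = [(0.1, 0.0), (1.0, 0.9), (9.0, 9.0)]
per_frame = [nearest_neighbors(q, refs, k=2) for q in query_frames]
copies = match_video(per_frame, max_dist=0.5, min_share=0.5)
```

Only the second stage changes with the object type; the neighbor search is shared, which is the source of the generality claimed above.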
Module 2
Design Goals and Approaches
A content protection system has three
main parties: (i) content owners (e.g., Disney), (ii) hosting sites (e.g.,
YouTube), and (iii) service providers (e.g., Audible Magic). The first party is
interested in protecting the copyright of some of its multimedia objects, by
finding whether these objects or parts of them are posted on hosting sites (the
second party). The third party is the entity that offers the copy finding
service to content owners by checking hosting sites. In some cases the hosting
sites offer the copy finding service to content owners. An example of this case
is YouTube, which offers content protection services. In other, less
common, cases the content owners develop and operate their own protection
systems.
We define and justify the following four
goals as the most important ones in multimedia content protection systems.
• Accuracy: The system should
have high accuracy in terms of finding all copies (high recall) while not
reporting false copies (high precision). Achieving high accuracy is
challenging, because copied multimedia objects typically undergo various
modifications (or transformations). For example, copied videos can be subjected
to cropping, embedding in other videos, changing bit rates, scaling, blurring,
and/or changing frame rates. Our
approach to achieve this goal is to extract signatures from multimedia objects
that are robust to as many transformations as possible.
• Computational Efficiency: The
system should have short response time to report copies, especially for timely
multimedia objects such as sports videos. In addition, since many multimedia
objects are continually added to online hosting sites, which need to be checked
against reference objects, the content protection system should be able to
process many objects over a short period of time. Our approach to achieve this
goal is to make the signatures compact and fast to compute and compare without
sacrificing their robustness against transformations.
• Scalability and Reliability: The
system should scale (up and down) to different numbers of multimedia objects.
Scaling up means adding more objects because of monitoring more online hosting
sites, having more content owners using the system, and/or the occurrence of
special events such as sports tournaments and release of new movies.
Conversely, it is also possible that the set of objects
handled by the system shrinks, because,
for example, some content owners may terminate their contracts for the
protection service. Our approach to handle scalability is to design a
distributed system that can utilize varying amounts of computing resources.
With large-scale distributed systems, failures occur frequently,
which requires the content
protection system to be reliable in the face of different failures. To address this
reliability requirement, we design the core parts of our system on top of the MapReduce
programming framework, which offers resiliency against different types of
failures.
• Cost Efficiency: The system
should minimize the cost of the needed computing infrastructure. Our approach
to achieve this goal is to design our system to effectively utilize cloud
computing infrastructures (public and/or private). Building on a cloud
computing infrastructure also achieves the scalability objective discussed
above and reduces the upfront cost of the computing infrastructure.
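The accuracy goal above is stated in terms of precision and recall. As a small worked example, the sketch below computes both from a set of reported copies and a set of true copies. The identifiers and the convention of returning 1.0 for an empty denominator are assumptions made for illustration.

```python
from typing import Set, Tuple


def precision_recall(reported: Set[str], true_copies: Set[str]) -> Tuple[float, float]:
    """Precision: share of reported objects that are real copies.
    Recall: share of real copies that were reported."""
    true_positives = len(reported & true_copies)
    # Assumed convention: an empty denominator yields 1.0 (nothing was missed or misreported).
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(true_copies) if true_copies else 1.0
    return precision, recall


reported = {"v1", "v2", "v3", "v9"}            # what the system flagged
true_copies = {"v1", "v2", "v3", "v4", "v5"}   # ground truth
precision, recall = precision_recall(reported, true_copies)
```

Here three of the four reported objects are real copies (precision 3/4) and three of the five real copies are found (recall 3/5).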
Module 3
Architecture and Operation
The system has multiple components; most
of them are hosted on cloud infrastructures. The figure shows the general case
where one or more cloud providers can be used by the system. This is because
some cloud providers are more efficient and/or provide more cost saving for
different computing and communication tasks. For example, a cloud provider offering
lower cost for inbound bandwidth and storage can be used for downloading and
temporarily storing videos from online sites, while another cloud provider (or
private cloud) offering better compute nodes at lower costs can be used to
maintain the distributed index and to perform the copy detection process. The
proposed system can be deployed and managed by any of the three parties
mentioned in the previous section: content owners, hosting sites, or service
providers. The system consists of the following main components:
• Distributed Index: Maintains signatures
of objects that need to be protected;
• Reference Registration: Creates
signatures from objects that content owners are interested in protecting, and
inserts them in the distributed index;
• Query Preparation: Creates signatures
from objects downloaded from online sites, which are called query signatures. It
then uploads these signatures to a common storage;
• Object Matching: Compares query
signatures versus reference signatures in the distributed index to find
potential copies. It also sends notifications to content owners if copies are
found;
• Parallel Crawling: Downloads
multimedia objects from various online hosting sites.
The Distributed Index and
Object Matching components form what we call the Matching Engine. The second
and third components deal with signature creation. For the Crawling component,
we designed and implemented a parallel crawler and used it to download videos
from YouTube. The details of the crawler are omitted due to space limitations.
Module 3
Signature Creation
The proposed system is designed to
handle different types of multimedia objects. The system abstracts the details
of different media objects into multi-dimensional signatures. The signature creation
and comparison component is media specific, while other parts of the system do
not depend on the media type. Our proposed design supports creating composite
signatures that consist of one or more of the following elements:
• Visual signature: Created based on the
visual parts in multimedia objects and how they change with time;
• Audio signature: Created based on the
audio signals in multimedia objects;
• Depth signature: If multimedia objects
are 3-D videos, signatures from their depth signals are created;
• Metadata: Created from information
associated with multimedia objects such as their names, tags, descriptions, format
types, and IP addresses of their uploaders or downloaders.
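One possible way to represent such a composite signature is sketched below. The class name, field names, and the choice to concatenate the numeric elements into a single point are assumptions for illustration only; the text above states which elements may be present, not how they are stored or combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Vector = List[float]


@dataclass
class CompositeSignature:
    """Hypothetical container: any subset of the four elements may be present."""
    object_id: str
    visual: Optional[Vector] = None
    audio: Optional[Vector] = None
    depth: Optional[Vector] = None  # only meaningful for 3-D videos
    metadata: Dict[str, str] = field(default_factory=dict)

    def elements(self) -> List[str]:
        """Names of the elements this signature actually carries."""
        names = [n for n in ("visual", "audio", "depth") if getattr(self, n) is not None]
        if self.metadata:
            names.append("metadata")
        return names

    def as_point(self) -> Vector:
        """Assumed combination rule: concatenate numeric parts into one point."""
        point: Vector = []
        for name in ("visual", "audio", "depth"):
            point.extend(getattr(self, name) or [])
        return point


sig = CompositeSignature("clip-1", visual=[0.1, 0.2], depth=[0.7],
                         metadata={"title": "demo"})
```

Keeping the media-specific parts behind one multi-dimensional point is what lets the rest of the system stay independent of the media type.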
Module 4
Constructing the Matching Engine
The matching engine has a data structure that we call the
distributed index as well as distributed processing operations. The index is
divided into two parts: (i) a directing tree, and (ii) bins. The directing tree is a
space-partitioning tree [19] that is used to group similar points in the same or
close-by bins. It is also used to forward query points to bins with potential
matches. Bins are leaf nodes of the directing tree, but they are stored as
files on the distributed file system. All processing of the matching engine is
performed in two distributed operations: (i) Build Index, and (ii) Match Objects.
The first creates the index from reference data points, and the second matches
query objects versus reference objects in the index. The design of our index
has two main features that make it simple to implement in a distributed manner,
yet efficient and scalable. First, data points are stored only at leaf nodes.
Intermediate nodes do not store any data; they only store metadata to guide
the search through the tree. This significantly reduces the size of the
directing tree and makes it fit easily in the main memory of a single machine
even for large datasets. This feature allows us to distribute copies of the
directing tree to distributed machines to process queries in parallel.
Replicating the directing tree on different machines not only facilitates
distributed processing, but it also greatly improves the robustness and
efficiency of the system. The robustness is improved because there is no single
point of failure. The efficiency is improved because there is no central
machine or set of machines that other machines need to contact during the
computation. The second feature of our index design is the separation of leaf
nodes (bins) and storing them as files on the distributed file system. This
increases reliability and simplifies the implementation of the distributed
computations in our system, because concurrent accesses of data points are
facilitated by the distributed file system. The distributed index is
constructed from reference objects, which is done before processing any
queries. Constructing the index involves two steps: (i) creating the directing
tree, and (ii) distributing the reference dataset to bins. Once created, the
directing tree is serialized as one object and stored on the distributed file
system. This serialized object can be loaded in memory by various computational
tasks running on multiple machines in parallel. Distribution of data is done in
parallel on multiple machines using a simple MapReduce job.
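The two construction steps can be sketched on a single machine as follows. This is an illustrative sketch under several assumptions: the directing tree is built as a kd-tree-style median split (one kind of space-partitioning tree), JSON stands in for the serialization format, an in-memory dictionary stands in for bin files on the distributed file system, and a plain loop stands in for the MapReduce job. All function names are hypothetical.

```python
import itertools
import json
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence

Point = Sequence[float]


def build_tree(sample: List[Point], depth: int, max_depth: int,
               bin_ids: Iterator[int]) -> dict:
    """Step 1: internal nodes keep only split metadata; leaves keep a bin id."""
    if depth == max_depth or len(sample) < 2:
        return {"bin": next(bin_ids)}
    dim = depth % len(sample[0])
    ordered = sorted(sample, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return {"dim": dim, "value": ordered[mid][dim],
            "left": build_tree(ordered[:mid], depth + 1, max_depth, bin_ids),
            "right": build_tree(ordered[mid:], depth + 1, max_depth, bin_ids)}


def route(tree: dict, point: Point) -> int:
    """Follow split metadata from the root down to a bin id."""
    node = tree
    while "bin" not in node:
        node = node["left"] if point[node["dim"]] < node["value"] else node["right"]
    return node["bin"]


def distribute(serialized_tree: str, points: List[Point]) -> Dict[int, List[Point]]:
    """Step 2: each worker loads its own copy of the tree and bins its points."""
    tree = json.loads(serialized_tree)
    bins: Dict[int, List[Point]] = defaultdict(list)
    for point in points:                       # map: point -> (bin id, point)
        bins[route(tree, point)].append(point)  # reduce: group points by bin id
    return dict(bins)


reference = [(1, 1), (2, 8), (8, 2), (9, 9), (3, 3), (7, 7)]
tree = build_tree(reference, depth=0, max_depth=2, bin_ids=itertools.count())
serialized = json.dumps(tree)  # small: no data points are stored in the tree
bins = distribute(serialized, reference)
```

Because the tree holds no data points, the serialized form stays small and can be copied to every worker, while the bins carry the actual reference data.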
Module 5
Matching Objects
The object matching process is done in
three steps: (i) partitioning the query dataset, (ii) finding nearest neighbors for
each data point in the query dataset, and (iii) performing application-specific
object matching using the found nearest neighbors. Each of these three steps is
executed in parallel on the MapReduce infrastructure. The first step partitions
the query dataset such that each partition contains a bin and a list of data points
that are likely to have neighbors in that bin. This is done using the directing
tree, which is used to create the list of data points that corresponds to each
bin.
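The first two steps can be sketched as below. The sketch is a single-machine illustration under assumptions: a query point is sent to both sides of a split when it lies within a hypothetical `margin` of the split value (one way to interpret "likely to have neighbors in that bin"), the tree and bins are hand-built dictionaries, and loops stand in for MapReduce tasks. The application-specific third step is omitted.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Hit = Tuple[float, Point]  # (distance, reference point)


def candidate_bins(node: dict, point: Point, margin: float) -> List[int]:
    """Bins that may hold neighbors of the point, found via the directing tree."""
    if "bin" in node:
        return [node["bin"]]
    diff = point[node["dim"]] - node["value"]
    bins: List[int] = []
    if diff < margin:       # on the left side, or close to the boundary
        bins += candidate_bins(node["left"], point, margin)
    if diff >= -margin:     # on the right side, or close to the boundary
        bins += candidate_bins(node["right"], point, margin)
    return bins


def partition_queries(tree: dict, queries: List[Point],
                      margin: float) -> Dict[int, List[int]]:
    """Step 1: for each bin, the indexes of the query points routed to it."""
    partitions: Dict[int, List[int]] = defaultdict(list)
    for qid, query in enumerate(queries):
        for bin_id in candidate_bins(tree, query, margin):
            partitions[bin_id].append(qid)
    return dict(partitions)


def match(tree: dict, bins: Dict[int, List[Point]], queries: List[Point],
          k: int, margin: float) -> Dict[int, List[Hit]]:
    """Step 2: search each partition independently, then merge per query."""
    found: Dict[int, List[Hit]] = defaultdict(list)
    for bin_id, query_ids in partition_queries(tree, queries, margin).items():
        for qid in query_ids:  # one independent task per bin in a real deployment
            scored = ((math.dist(queries[qid], ref), ref) for ref in bins.get(bin_id, []))
            found[qid].extend(heapq.nsmallest(k, scored))
    return {qid: heapq.nsmallest(k, hits) for qid, hits in found.items()}


tree = {"dim": 0, "value": 5.0, "left": {"bin": 0}, "right": {"bin": 1}}
bins = {0: [(1.0, 1.0), (3.0, 4.0)], 1: [(5.5, 4.0), (9.0, 9.0)]}
queries = [(4.8, 4.0), (9.0, 8.0)]
partitions = partition_queries(tree, queries, margin=1.0)
result = match(tree, bins, queries, k=1, margin=1.0)
strict = match(tree, bins, queries, k=1, margin=0.0)
```

In this toy data the first query lies just left of the split, yet its true nearest neighbor sits in the right bin. With the margin it is routed to both bins and the correct neighbor is found; with a margin of zero it is routed to one bin only and a farther point is returned.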
CONCLUSION AND FUTURE WORK
Distributing copyrighted multimedia
objects by uploading them to online hosting sites such as YouTube can result in
significant loss of revenues for content creators. Systems needed to find
illegal copies of multimedia objects are complex and large scale. In this
paper, we presented a new design for multimedia content protection systems
using multi-cloud infrastructures. The proposed system supports different
multimedia content types and it can be deployed on private and/or public
clouds. Two key components of the proposed system are presented. The first one
is a new method for creating signatures of 3-D videos. Our method constructs
coarse-grained disparity maps using stereo correspondence for a sparse set of
points in the image. Thus, it captures the depth signal of the 3-D video, without
explicitly computing the exact depth map, which is computationally expensive.
Our experiments showed that the proposed 3-D signature produces high accuracy
in terms of both precision and recall and it is robust to many video
transformations including new ones that are specific to 3-D videos such as
synthesizing new views. The second key component in our system is the
distributed index, which is used to match multimedia objects characterized by
high dimensions. The distributed index is implemented using the MapReduce
framework and our experiments showed that it can elastically utilize varying
amounts of computing resources and it produces high accuracy. The experiments
also showed that it outperforms the closest system in the literature in terms
of accuracy and computational efficiency. In addition, we evaluated the whole content
protection system with more than 11,000 3-D videos and the results showed the
scalability and accuracy of the proposed system. Finally, we compared our
system against the Content ID system used by YouTube. Our results showed that: (i)
there is a need for designing robust signatures for 3-D videos since the
current system used by the leading company in the industry fails to detect most
modified 3-D copies, and (ii) our proposed 3-D signature method can fill this
gap, because it is robust to many 2-D and 3-D video transformations. The work
in this paper can be extended in multiple directions. For example, our current
system is optimized for batch processing. Thus, it may not be suitable for
online detection of illegally distributed multimedia streams of live events
such as soccer games. In live events, only small segments of the video are
available and immediate detection of copyright infringement is crucial to
minimize financial losses. To support online detection, the matching engine of
our system needs to be implemented using a distributed programming framework that
supports online processing, such as Spark. In addition, composite signature
schemes that combine multiple modalities may be needed to quickly identify
short video segments. Furthermore, the crawler component needs to be customized
to find online sites that offer pirated video streams and obtain segments of these
streams for checking against reference streams, for which the signatures would
also need to be generated online. Another future direction for the work in this
paper is to design signatures for recent and complex formats of 3-D videos such
as multiview plus depth. A multiview plus depth video has multiple texture and
depth components, which allow users to view a scene from different angles.
Signatures for such videos would need to capture this complexity, while being
efficient to compute, compare, and store.
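The depth-signature idea summarized in this section (a coarse-grained disparity map built from a sparse set of matched points, without computing an exact depth map) can be illustrated with the sketch below. It is not the paper's method: it assumes the stereo correspondences are already given as matched point pairs, and the grid size, the per-block averaging, and the function name are choices made here for illustration.

```python
from typing import List, Sequence, Tuple

# One matched point pair: (x in left view, y, x in right view).
Match = Tuple[float, float, float]


def depth_signature(matches: Sequence[Match], width: int, height: int,
                    rows: int = 2, cols: int = 2) -> List[float]:
    """Mean horizontal disparity per grid block; blocks without matches get 0.0."""
    sums = [0.0] * (rows * cols)
    counts = [0] * (rows * cols)
    for x_left, y, x_right in matches:
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x_left * cols / width), cols - 1)
        sums[r * cols + c] += x_left - x_right  # disparity of this point pair
        counts[r * cols + c] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]


# Four sparse correspondences in a 100 x 100 frame (made-up values).
matches = [(10, 10, 4), (20, 30, 16), (80, 20, 78), (60, 90, 50)]
signature = depth_signature(matches, width=100, height=100)
```

The result is a short fixed-length vector, which keeps the signature cheap to store and compare regardless of how many points were matched.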
REFERENCES
[1] A. Abdelsadek, “Distributed index
for matching multimedia objects,” M.S. thesis, School of Comput. Sci., Simon
Fraser Univ., Burnaby, BC, Canada, 2014.
[2] A. Abdelsadek and M. Hefeeda, “Dimo:
Distributed index for matching multimedia objects using MapReduce,” in Proc. ACM Multimedia Syst. Conf.
(MMSys’14), Singapore, Mar. 2014, pp. 115–125.
[3] M. Aly, M. Munich, and P. Perona,
“Distributed Kd-Trees for retrieval from very large image collections,” in Proc.
Brit. Mach. Vis. Conf. (BMVC), Dundee, U.K., Aug. 2011.
[4] J. Bentley, “Multidimensional binary
search trees used for associative searching,” in Commun. ACM, Sep. 1975,
vol. 18, no. 9, pp. 509–517.
[5] P. Cano, E. Batlle, T. Kalker, and J.
Haitsma, “A review of algorithms for audio fingerprinting,” in Proc. IEEE
Workshop Multimedia Signal Process., Dec. 2002, pp. 169–173.
[6] J. Dean and S. Ghemawat, “MapReduce:
Simplified data processing on large clusters,” in Proc. Symp. Oper. Syst.
Design Implementation (OSDI’04), San Francisco, CA, USA, Dec. 2004, pp.
137–150.
[7] J. Deng, W. Dong, R. Socher, L. Li,
K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR’09), Miami, FL,
USA, Jun. 2009, pp. 248–255.
[8] A. Hampapur, K. Hyun, and R. Bolle,
“Comparison of sequence matching techniques for video copy detection,” in Proc.
SPIE Conf. Storage Retrieval Media Databases (SPIE’02), San Jose, CA, USA, Jan.
2002, pp. 194–201.
[9] S. Ioffe, “Full-length video
fingerprinting. Google Inc.,” U.S. Patent 8229219, Jul. 24, 2012.
[10] A. Kahng, J. Lach, W.
Mangione-Smith, S. Mantik, I. Markov, M. Potkonjak, P. Tucker, H. Wang, and G.
Wolfe, “Watermarking techniques for intellectual property protection,” in Proc.
35th Annu. Design Autom. Conf. (DAC’98), San Francisco, CA, USA, Jun. 1998,
pp. 776–781.
[11] N. Khodabakhshi and M. Hefeeda,
“Spider: A system for finding 3D video copies,” in ACM Trans. Multimedia
Comput., Commun., Appl. (TOMM), Feb. 2013, vol. 9, no. 1, pp. 7:1–7:20.
[12] S. Lee and C. Yoo, “Robust video
fingerprinting for content-based video identification,” IEEE Trans. Circuits
Syst. Video Technol., vol. 18, no. 7, pp. 983–988, Jul. 2008.