Discovering Emerging Topics in Social Streams via Link-Anomaly Detection
ABSTRACT:
The detection of emerging topics is receiving renewed interest, motivated by
the rapid growth of social networks. Conventional term-frequency-based
approaches may not be appropriate in this context, because the information
exchanged in social-network posts includes not only text but also images, URLs,
and videos. We focus on the emergence of topics signaled by the social aspects
of these networks. Specifically, we focus on mentions of users, that is, links
between users that are generated dynamically (intentionally or unintentionally)
through replies, mentions, and retweets. We propose a probability model of the
mentioning behavior of a social-network user, and propose to detect the
emergence of a new topic from the anomalies measured through the model.
Aggregating anomaly scores from hundreds of users, we show that we can detect
emerging topics based only on the reply/mention relationships in social-network
posts. We demonstrate our technique on several real data sets we gathered from
Twitter. The experiments show that the proposed mention-anomaly-based
approaches can detect new topics at least as early as text-anomaly-based
approaches, and in some cases much earlier when the topic is poorly identified
by the textual contents of posts.
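The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes per-post anomaly scores are already available, sums them over fixed-length time windows, and divides by the window length. It then raises an alarm with a naive "mean plus c standard deviations" rule, which is only a stand-in for a proper change-point detector. The class name ScoreAggregator, its method names, and the parameters windowSize, warmup, and c are all hypothetical.

```java
/** Hypothetical sketch: aggregates per-post anomaly scores over time windows. */
class ScoreAggregator {

    /**
     * Sums the scores of posts falling in each of `windows` consecutive windows of
     * length `windowSize` (starting at `start`) and divides by the window length.
     * Posts outside the covered range are ignored; empty windows yield 0.
     */
    static double[] aggregate(long[] times, double[] scores,
                              long start, long windowSize, int windows) {
        double[] agg = new double[windows];
        for (int i = 0; i < times.length; i++) {
            if (times[i] < start) continue;              // before the first window
            long idx = (times[i] - start) / windowSize;  // window this post falls in
            if (idx >= windows) continue;                // after the last window
            agg[(int) idx] += scores[i];
        }
        for (int j = 0; j < windows; j++) agg[j] /= windowSize;
        return agg;
    }

    /**
     * Naive stand-in detector (an assumption, not the paper's method): returns the
     * index of the first window, after `warmup` windows, whose aggregate exceeds the
     * mean of all earlier windows by more than c standard deviations, or -1 if none.
     */
    static int firstAlarm(double[] agg, int warmup, double c) {
        for (int j = Math.max(1, warmup); j < agg.length; j++) {
            double mean = 0.0;
            for (int i = 0; i < j; i++) mean += agg[i];
            mean /= j;
            double var = 0.0;
            for (int i = 0; i < j; i++) var += (agg[i] - mean) * (agg[i] - mean);
            double std = Math.sqrt(var / j);
            if (agg[j] > mean + c * std) return j;
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] times = {0, 1, 2, 10, 11, 12, 20, 21, 22, 23, 24};
        double[] scores = {1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5};
        double[] agg = aggregate(times, scores, 0, 10, 3);
        System.out.println(java.util.Arrays.toString(agg));
        System.out.println("first alarm at window " + firstAlarm(agg, 2, 3.0));
    }
}
```

With a perfectly flat history the standard deviation is zero, so this rule fires on any increase. A real deployment would need a more robust detector and a tuned threshold.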
EXISTING SYSTEM:
Ø A new (emerging) topic is something people feel like
discussing, commenting on, or forwarding to their friends.
Conventional approaches to topic detection have mainly been concerned with the
frequencies of (textual) words.
DISADVANTAGES OF EXISTING SYSTEM:
A term-frequency-based approach could suffer from the ambiguity caused by
synonyms or homonyms. It may also require complicated preprocessing (e.g.,
segmentation) depending on the target language. Moreover, it cannot be applied
when the contents of the messages are mostly nontextual information. On the
other hand, the “words” formed by mentions are unique, require little
preprocessing to obtain (the information is often separated from the contents),
and are available regardless of the nature of the contents.
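To illustrate how little preprocessing mentions need, here is a minimal sketch that recovers them from raw post text when they are not already supplied as separate metadata. The class name MentionExtractor and the pattern are assumptions made for illustration: a mention is taken to be "@" followed by letters, digits, or underscores, and not preceded by a word character, so that e-mail addresses are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls @-mentions out of the text of a post. */
class MentionExtractor {

    // Assumed mention syntax: '@' + word characters, not preceded by a word character.
    private static final Pattern MENTION = Pattern.compile("(?<!\\w)@(\\w+)");

    /** Returns the mentioned user names in order of appearance, lower-cased. */
    static List<String> extract(String post) {
        List<String> mentions = new ArrayList<>();
        Matcher m = MENTION.matcher(post);
        while (m.find()) {
            mentions.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return mentions;
    }

    public static void main(String[] args) {
        // prints [nasa, bbc_news]
        System.out.println(extract("RT @NASA: new images, thanks @BBC_News!"));
    }
}
```

No tokenization, stemming, or language-specific segmentation is involved, which is the contrast with term-frequency methods drawn above.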
PROPOSED SYSTEM:
Ø In this paper, we have proposed a new approach to
detect the emergence of topics in a social network stream.
Ø The basic idea of our approach is to focus on the
social aspect of the posts reflected in the mentioning behavior of users
instead of the textual contents.
Ø We have proposed a probability model that captures
both the number of mentions per post and the frequency of each mentionee.
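One way such a model could be realized is sketched below. The distributional choices are assumptions made for illustration, since the text above does not specify them: the number of mentions per post follows a geometric distribution with a Beta(alpha, beta) prior, and mentionee probabilities use Chinese-restaurant-process style smoothing with parameter gamma. The class name MentionModel and its method names are hypothetical. A post's anomaly score is its negative log predictive probability under the user's history.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-user model of mentioning behavior (illustrative assumptions only). */
class MentionModel {
    private final double alpha, beta, gamma;   // assumed hyperparameters, all > 0
    private int posts = 0;                     // n: number of training posts
    private int totalMentions = 0;             // m: total mentions in training posts
    private final Map<String, Integer> counts = new HashMap<>();

    MentionModel(double alpha, double beta, double gamma) {
        this.alpha = alpha;
        this.beta = beta;
        this.gamma = gamma;
    }

    /** Adds one past post (given as its list of mentionees) to the training history. */
    void observe(List<String> mentions) {
        posts++;
        totalMentions += mentions.size();
        for (String v : mentions) counts.merge(v, 1, Integer::sum);
    }

    /**
     * Log predictive probability of a post having k mentions. With a = alpha + n and
     * b = beta + m, the Beta-geometric predictive is
     * a / (a + b + k) * prod_{j=0}^{k-1} (b + j) / (a + b + j).
     */
    double logProbCount(int k) {
        double a = alpha + posts;
        double b = beta + totalMentions;
        double lp = Math.log(a) - Math.log(a + b + k);
        for (int j = 0; j < k; j++) lp += Math.log(b + j) - Math.log(a + b + j);
        return lp;
    }

    /** Log probability of mentioning v: m_v / (m + gamma) if seen, gamma / (m + gamma) if not. */
    double logProbMentionee(String v) {
        int mv = counts.getOrDefault(v, 0);
        double numerator = mv > 0 ? mv : gamma;
        return Math.log(numerator) - Math.log(totalMentions + gamma);
    }

    /** Anomaly score: negative log probability of the mention count and each mentionee. */
    double anomalyScore(List<String> mentions) {
        double score = -logProbCount(mentions.size());
        for (String v : mentions) score -= logProbMentionee(v);
        return score;
    }

    public static void main(String[] args) {
        MentionModel model = new MentionModel(1.0, 1.0, 1.0);
        model.observe(List.of("alice"));
        model.observe(List.of("alice"));
        model.observe(List.of("alice", "bob"));
        System.out.println("familiar mentionee: " + model.anomalyScore(List.of("alice")));
        System.out.println("new mentionee:      " + model.anomalyScore(List.of("carol")));
    }
}
```

In this sketch a post that mentions a previously unseen user, or unusually many users, receives a higher score than one that matches the user's past behavior.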
ADVANTAGES OF PROPOSED SYSTEM:
Ø Because the proposed method does not rely on the textual
contents of social network posts, it is robust to rephrasing and can be
applied when topics concern information other than text, such as images,
video, and audio.
Ø The proposed link-anomaly-based methods performed even
better than the keyword-based methods on the “NASA” and “BBC” data sets.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
Ø System : Pentium IV 2.4 GHz.
Ø Hard Disk : 40 GB.
Ø Floppy Drive : 1.44 MB.
Ø Monitor : 15-inch VGA Colour.
Ø Mouse : Logitech.
Ø RAM : 512 MB.
SOFTWARE REQUIREMENTS:
Ø Operating system : Windows XP/7.
Ø Coding Language : Java/J2EE
Ø IDE : NetBeans 7.4
Ø Database : MySQL
REFERENCE:
Toshimitsu Takahashi, Ryota Tomioka, and Kenji Yamanishi, Member, IEEE,
“Discovering Emerging Topics in Social Streams via Link-Anomaly Detection,”
IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, January 2014.