Abstract:
Spam URLs in emails and Online Social Media (OSM) are a growing problem. URL shortening
gained traction as a way to ease the sharing of long, complex URLs in emails and to fit
within the character limits imposed by various OSM (such as Twitter). A URL shortener takes
a long URL as input and returns a short URL that leads to the same landing page. Their
immense popularity has made URL shorteners a prime target for attackers, who use them to
conceal malicious content. Bitly,
a leading service in this domain, alone shortens close to 80 million links each day and marks
2-3 million as suspicious every week.¹ Some recent research highlights that services from Bitly
are being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic
content propagation, etc. In 2012, one major attack hijacked usa.gov, the U.S. federal
government's official short link service (run in collaboration with Bitly), to spread a
work-from-home scam.² Such attacks, which target seemingly secure and highly trusted web
sources, are alarming and bring to light the massive impact of exploiting shortening
services. All this puts additional performance pressure on Bitly and other URL shorteners
to detect illegitimate content and act against it in a timely manner. It therefore
becomes important to inspect and identify the root causes and the gaps in Bitly's
implementation that lead to such attacks. Over the years, multiple defense mechanisms have
been set up to handle traditional long URL spam, but detecting short URL spam at zero hour
remains a challenging task.
In this study, we analyzed a dataset marked as suspicious by Bitly in October 2013
to highlight some fundamental issues in its spam detection mechanism. Our results reveal
Bitly's inefficiency in using some of the spam detection services that it claims to use. We also
show how a suspicious Bitly account goes unnoticed despite prolonged, recurrent illegitimate
activity. Bitly only displays a warning page when it identifies a suspicious link, and we observed
this approach to be weak in controlling the overall problem of spam. In addition, we identified
some short URL based features and coupled them with two domain-specific features to classify
a Bitly URL as malicious or benign. The short URL based feature set that we used comprises
click-dependent as well as click-independent metrics, so our algorithm can identify a
malicious Bitly URL even before it is clicked. The proposed solution is independent
of any available blacklists or lexical URL based features. We used standard machine learning
classification techniques and were able to detect malicious Bitly URLs with a maximum accuracy
of 86.41%. Although our algorithm is designed specifically for Bitly, we believe that it can be
easily extended and used by other URL shortening services. To the best of our knowledge,
this is the first attempt to underline loopholes in the security mechanisms of the most popular
URL shortening service by analyzing only content that the service itself marks as suspicious,
and to propose a suitable countermeasure.
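
As a rough illustration of the classification idea summarized above, the following sketch labels a short URL from click-independent, click-dependent, and domain-specific features. It is not the method or feature set used in this study: the feature names (`encoder_count`, `total_clicks`, `domain_age_days`), the toy training data, and the nearest-centroid classifier are all assumptions chosen to keep the example small, and feature values are assumed to be pre-scaled to [0, 1].

```python
from dataclasses import dataclass
from math import dist


@dataclass
class ShortUrlSample:
    # Hypothetical features, assumed pre-scaled to [0, 1]; not the study's feature set.
    encoder_count: float    # click-independent: accounts that shortened the same long URL
    total_clicks: float     # click-dependent: clicks recorded on the short URL
    domain_age_days: float  # domain-specific: age of the landing domain

    def vector(self, use_clicks=True):
        """Return the feature vector, optionally without click-dependent features."""
        features = [self.encoder_count, self.domain_age_days]
        if use_clicks:
            features.append(self.total_clicks)
        return features


def train(samples, labels, use_clicks=True):
    """Compute one centroid (per-feature mean) for each class label."""
    rows_by_label = {}
    for sample, label in zip(samples, labels):
        rows_by_label.setdefault(label, []).append(sample.vector(use_clicks))
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(model, sample, use_clicks=True):
    """Return the label whose centroid is closest in Euclidean distance."""
    vector = sample.vector(use_clicks)
    return min(model, key=lambda label: dist(model[label], vector))


def accuracy(model, samples, labels, use_clicks=True):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(predict(model, s, use_clicks) == y for s, y in zip(samples, labels))
    return hits / len(samples)


# Toy, made-up training data for illustration only.
TRAIN = [
    ShortUrlSample(0.1, 0.9, 0.05),
    ShortUrlSample(0.2, 0.8, 0.10),
    ShortUrlSample(0.7, 0.3, 0.80),
    ShortUrlSample(0.9, 0.2, 0.90),
]
LABELS = ["malicious", "malicious", "benign", "benign"]

# A model trained without click-dependent features can score a link before any click.
pre_click_model = train(TRAIN, LABELS, use_clicks=False)
unclicked = ShortUrlSample(encoder_count=0.15, total_clicks=0.0, domain_age_days=0.08)
print(predict(pre_click_model, unclicked, use_clicks=False))  # malicious
```

Training a second model with `use_clicks=True` would add the click-dependent feature once click data exists. Real features would need scaling and a held-out test set before any accuracy figure is meaningful.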