Defacement Events

Defacement events are available to customers who have purchased RiskIQ Enterprise Digital Footprint (view product description). They alert customers to any sites within their inventory that have been defaced by hackers.

When a defaced site is found, a Defacement event is created in the workspace. The event can be viewed in the events dashboard and events list inside the RiskIQ web application, in an email alert, or via the RiskIQ events API.

Outlined below are tips on:

  1. How to read and interpret the information presented in a Defacement event
  2. Suggested best practices for Defacement event management, including user workflow and tagging

Example: a defaced site promoting the hackers' contact information on Facebook


Reading Defacement Events - Field Definitions

Event List Item

This is how Defacement events are represented in the Events section of the RiskIQ web application. Clicking on a list item brings up details for the event and user-initiated workflow actions. 

  • Event-Type: what kind of event it is.
  • Active: Defacement events are considered active if the page is live (has a 200 response code) and triggers the defacement policy.
  • Status: current status of the event.
  • Domain: the domain of the web page associated with the event.
  • First Seen: date the event was generated.
  • Active For: (if applicable) how much time has passed between the first and most recent crawls of this page in which the violation was active.
  • Tags (if any have been applied)
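
The Active and Active For definitions above can be read as simple computations over crawl samples. The following is a minimal sketch for illustration only: the CrawlSample type and its field names are hypothetical and are not part of the RiskIQ application or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CrawlSample:
    """Hypothetical record of one crawl of the event page."""
    crawled_at: datetime
    response_code: int
    matches_policy: bool  # True when the defacement policy triggered


def is_active(sample: CrawlSample) -> bool:
    # Active = page is live (200 response) AND the defacement policy triggers.
    return sample.response_code == 200 and sample.matches_policy


def active_for(samples: List[CrawlSample]) -> Optional[timedelta]:
    # Time between the first and most recent crawls that were active.
    # Returns None when no crawl was active ("if applicable").
    active_times = [s.crawled_at for s in samples if is_active(s)]
    if not active_times:
        return None
    return max(active_times) - min(active_times)
```

Under these assumptions, a page that was active on its first crawl, returned an error two days later, and was active again on day five would show an Active For value of five days.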

Event Header

At the top of each event's details is a header containing high-level information, as well as workflow actions.


  • Status: current event status and the ability to change the status of this event.
  • Tags: Tags applied to this event and the ability to add or remove tags (if any tags are configured for this event-type).
  • Owner: current event owner responsible for reviewing or tracking the event and the ability to assign a new owner for this event. 
  • Priority: current event priority and the ability to assign a new priority for this event.
  • Email Event Details (via envelope icon at top right)

Summary Tab

The Summary provides screenshots of the first and most recent crawls of the page and other information for assessing the event and deciding how to act on it. The Summary tab is organized into multiple sections:

ATTRIBUTES

  • Alexa: degree of web traffic indicated by the site’s Alexa rank (High = Top 1,000, Medium = Top 10,000, Low = 10,000+).
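
As an illustration of the High / Medium / Low buckets described above, here is a minimal sketch. The function name is hypothetical, and treating the boundaries as inclusive (rank 1,000 is High, rank 10,000 is Medium) is an assumption; the handling of unranked sites is not specified in this article and is left out.

```python
def alexa_tier(rank: int) -> str:
    """Map an exact Alexa rank to the traffic tier shown in the event.

    Assumption: boundaries are inclusive, so rank 1,000 is still "High"
    and rank 10,000 is still "Medium".
    """
    if rank < 1:
        raise ValueError("Alexa rank must be a positive integer")
    if rank <= 1_000:
        return "High"
    if rank <= 10_000:
        return "Medium"
    return "Low"
```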

SITE STATUS

  • A calendar showing crawl dates and times when the page was observed as an active defacement or a non-defaced page.

HISTORY

  • Timeline of changes made to the event with the date, time, and name of the user who took each action, including:
    • Status changes 
    • Emails sent (with recipients)
    • Notes added
    • Tags added/removed

Site Details

This section provides more information about the website associated with this event beyond what is shown in the Summary tab, including:

  • CName
  • Nameserver Information
  • ASN Information
  • Metro Code Information
  • Alexa Category and Exact Rank
  • Full WhoIs Record
  • Full IP WhoIs Record
  • Host Details
  • SSL Information
  • File Information

Classify Tab

This section details what the RiskIQ system flagged on the page, in relation to either the hacking classifier or the machine-learning-based RiskIQ Minhash Defacement Classifier.


Crawls Tab

This section houses information on each instance in which this page was analyzed by RiskIQ. Users can select any of the times that RiskIQ analyzed the page associated with this event to see details about the virtual user's interaction with the event page, and about the overall user session, at that point in time. A red arrow next to the timestamp indicates that the crawl was active, while grey indicates that it was inactive.


Details provided about the crawl include: 

  • An overview providing metadata on the crawl and the screenshot taken by the virtual user
    • Date and time
    • Initial URL where the virtual user began the crawl
    • Browser used
    • Geographic location of the virtual user
    • Total number of pages visited during the user session
    • Total number of pages visited that returned error messages
    • URL of the event page
    • IP address
    • Response code and message returned by the event page
    • Page Content-type
    • Page Content length
    • Page response time
    • Window name
  • The original HTML response of the page
  • The rendered document object model after the page loaded in the user's browser
  • Files
  • Cookies
  • Links
  • Headers

Defacement Event Detection Overview

Defacement events will only ever occur on webpages in a customer's inventory that are associated with an enterprise host. For information on how RiskIQ crawls pages in your inventory, collecting data to analyze for indicators of defacement, see FullSite Scanning.

RiskIQ's defacement detection analyzes each crawled page using two different detection types (both can apply to the same event simultaneously, as shown in the Classify tab example screenshot provided above). When a page on an enterprise host in inventory is crawled, an event is created if the page matches either of the following classifiers / detection models:

  1. Defacement Minhash Classifier: This is a proprietary machine-learning-based classifier developed by RiskIQ's in-house Data Science team. The model measures how similar a given page's overall structure and composition are to examples of known defaced pages. Defacement threat actors tend to make every page they deface look the same, publishing their calling card in place of the legitimate content that was there previously. Defacements by different actors also tend to resemble one another structurally, compared with legitimate pages, even when the specific text or images differ. The model is trained to exploit this pattern to detect likely defaced pages with a very high level of confidence and accuracy. It is limited, however, by its need for training examples, and is therefore less accurate when confronted with entirely new, never-before-seen types of defacement.
  2. Hacking Taxonomy Classifier: This is a text-based classifier intended as a secondary system that supplements the Minhash classifier when a defaced page is of a type the model has not been trained on and is not similar enough to other trained pages to trigger the machine learning model. This classifier looks for generic indications of defacement, such as the title of a page containing the word "hacked" or the name of a well-known hacking group appearing in page content. This system is more prone to false positives; however, it can also help prevent false negatives by pulling in suspicious pages that do not reach the confidence level needed to trigger the Minhash classifier.
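
To make the "match on either classifier" logic concrete, the sketch below pairs a textbook MinHash similarity check over a page's tag structure with a simple keyword check. It is not RiskIQ's proprietary model, which is a trained classifier whose internals are not described here. The shingling scheme, the 0.8 threshold, the group name, and all function names are assumptions chosen for illustration.

```python
import hashlib
import re
from typing import List, Set

NUM_HASHES = 64  # signature length; an arbitrary choice for this sketch
HACKING_GROUPS = ("example hacking crew",)  # hypothetical placeholder names


def tag_shingles(html: str, size: int = 3) -> Set[str]:
    # Represent page structure as overlapping runs of opening-tag names.
    tags = re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html.lower())
    if len(tags) < size:
        return {" ".join(tags)} if tags else set()
    return {" ".join(tags[i:i + size]) for i in range(len(tags) - size + 1)}


def minhash_signature(shingles: Set[str]) -> List[int]:
    # For each salted hash function, keep the minimum hash over all shingles.
    if not shingles:
        return []
    signature = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingles
        ))
    return signature


def estimated_similarity(a: List[int], b: List[int]) -> float:
    # Fraction of matching positions estimates the Jaccard similarity.
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def taxonomy_match(html: str) -> bool:
    # Generic indicators: "hacked" in the title, or a known group name in content.
    content = html.lower()
    title = re.search(r"<title[^>]*>(.*?)</title>", content, re.S)
    if title and "hacked" in title.group(1):
        return True
    return any(group in content for group in HACKING_GROUPS)


def should_create_event(html: str, known_defaced: List[List[int]],
                        threshold: float = 0.8) -> bool:
    # An event is created if EITHER detection type matches.
    signature = minhash_signature(tag_shingles(html))
    minhash_hit = any(
        estimated_similarity(signature, known) >= threshold
        for known in known_defaced
    )
    return minhash_hit or taxonomy_match(html)
```

In this sketch, a page that reuses the tag layout of a known defacement matches even if its text differs, while a structurally unfamiliar page is only caught if the keyword check fires. This mirrors the trade-off described above: the similarity check depends on prior examples, and the keyword check is broader but more prone to false positives.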

Managing Defacement Events - User Review Decision Workflow and Tagging Best Practices

  • Green represents steps taken automatically by the RiskIQ system
  • Pink represents steps taken by a human user
  • Blue represents a status and/or tag label

Tag Set

(Custom: there is typically no need to use tags for Defacement events, but custom labels can be made as needed.)

Monitoring and Resolution

  • Defacement events are re-crawled roughly every 48 hours (in addition to the normal scheduled full site scanning of asset pages). Additional samples can occur outside of this schedule if the same page was also crawled by RiskIQ for another reason.
    • Monitoring times are approximate: to balance load across the entire system, crawls may be slightly advanced or delayed to prevent load spikes.
  • Upon the first inactive sample of an event, an additional crawl will be scheduled 12 hours later to confirm whether the event should resolve or the inactive sample was an anomaly.
  • An event will automatically resolve after 2 consecutive inactive samples and at least 1 hour of continuous inactive time.
  • Events change from Resolved to Tenacious if the next crawl is found to be active.
  • All events are monitored using the metro and browser from the most recent active crawl sample. If there was no prior active crawl (e.g., for a manually submitted event), the default crawl settings are inherited from the auto-generated monitor project assigned to the event-type (typically a US-based metro and a recent desktop version of Chrome as the browser, unless otherwise specified).
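
The monitoring rules above can be summarized as a small state machine. The sketch below is an illustration under stated assumptions, not RiskIQ's implementation: the EventMonitor class, its field names, and the initial "New" status are hypothetical, and the load-balancing jitter on crawl times is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RECRAWL_INTERVAL = timedelta(hours=48)   # regular monitoring cadence
CONFIRM_DELAY = timedelta(hours=12)      # follow-up after first inactive sample
MIN_INACTIVE_TIME = timedelta(hours=1)   # continuous inactive time required
REQUIRED_INACTIVE_SAMPLES = 2            # consecutive inactive samples required


@dataclass
class EventMonitor:
    status: str = "New"  # hypothetical initial status for this sketch
    inactive_streak: int = 0
    inactive_since: Optional[datetime] = None

    def record_sample(self, crawled_at: datetime, active: bool) -> datetime:
        """Apply one crawl sample and return when the next crawl is due."""
        if active:
            # A resolved event that is seen active again becomes Tenacious.
            if self.status == "Resolved":
                self.status = "Tenacious"
            self.inactive_streak = 0
            self.inactive_since = None
            return crawled_at + RECRAWL_INTERVAL

        self.inactive_streak += 1
        if self.inactive_since is None:
            self.inactive_since = crawled_at

        # Resolve only after enough consecutive inactive samples AND
        # enough continuous inactive time.
        if (self.inactive_streak >= REQUIRED_INACTIVE_SAMPLES
                and crawled_at - self.inactive_since >= MIN_INACTIVE_TIME):
            self.status = "Resolved"

        # First inactive sample: schedule the confirmation crawl.
        if self.inactive_streak == 1:
            return crawled_at + CONFIRM_DELAY
        return crawled_at + RECRAWL_INTERVAL
```

For example, an event whose page goes inactive, stays inactive at the 12-hour confirmation crawl, and is later seen active again would move to Resolved and then to Tenacious. Two inactive samples taken less than an hour apart would not resolve the event, because the one-hour minimum has not been met.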