Content events classify a wide range of general webpages and content that are not already treated as suspected phish, customer-owned websites, social media profiles, or mobile app stores, and analyze them against the configured policy.
When a policy violation is found, a Content event is created in the workspace. The event can be viewed in the events dashboard and events list inside the RiskIQ web application, in an email alert, or via the RiskIQ events API.
For a general introduction to events and other parts of the RiskIQ system, please see RiskIQ Platform Architecture.
Outlined below are tips on:
- How to read and interpret the information presented in a Content event (field definitions)
- Suggested user review decision tree and tagging best practices for Content event management
- How it works: Custom Monitoring system workflow overview
Example: using Content events to monitor hacker forums and the sale of stolen credit card information.
Reading Content Events - Field Definitions
This is how Content events are represented in the Events section of the RiskIQ web application. Clicking on a list item brings up details for the event and user-initiated workflow actions.
Event List Item
- Thumbnail screenshot of the page that generated the event.
- Event-Type: what kind of event it is.
- URL: the web page associated to the event.
- Status: current status of the event.
- Created: date the event was generated.
- Updated: date of the most recent entry in the event history.
- Domain: the domain name of the URL that generated the event.
- Active: Content events are considered active as long as the URL remains live and contains content that triggers the custom policy. If that content is removed or the page is observed as inactive, the active flag no longer appears in the list item.
- Tags (if any have been applied; not pictured)
At the top of each event's details is a header containing workflow actions. See Event User Actions for information on these options.
The Summary page provides screenshots of the first and most recent crawls of the page and other information for assessing the event and deciding how to act on it. The Summary tab is organized into multiple sections:
- Domain: domain of the URL associated to this event.
- Alexa: degree of web traffic indicated by the site’s Alexa rank (High = Top 1,000, Medium = Top 10,000, Low = 10,000+).
- IP: IP address for the site associated to this event.
- Registrar: name of registrar for the domain associated to this event.
- Registrar Email: email address to contact the registrar for the domain.
- Registrar Phone: phone number to contact the registrar.
- Registrant: name of the registrant for the domain associated to this event.
- Registrant Email: email address to contact the registrant for the domain.
- Registrant Phone: phone number to contact the registrant for the domain.
- ASN: autonomous system number (ASN) associated to this event, with its country of origin and company.
- Hosting Provider: name, city, state, and country of the hosting provider for the site associated to this event.
- Name Servers: name servers for the domain associated to this event (click to expand if there are more than 5).
- MX Records: mail exchanger records specifying mail servers associated to the domain name.
- TXT Records: other text records associated to this domain's DNS entry.
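The Alexa traffic labels in the list above can be expressed as a small lookup. This is a minimal sketch in Python, not RiskIQ code: the function name `alexa_traffic_level` is hypothetical, the handling of a rank of exactly 1,000 or 10,000 is an assumption (the definition above does not say which bucket the boundary belongs to), and the "Unknown" label for unranked sites is also an assumption.

```python
def alexa_traffic_level(rank):
    """Map an Alexa rank to the High/Medium/Low label used on the Summary tab."""
    if rank is None or rank < 1:
        # Assumption: sites without a usable rank get no traffic label.
        return "Unknown"
    if rank <= 1_000:
        return "High"    # Top 1,000
    if rank <= 10_000:
        return "Medium"  # Top 10,000 (boundary treated as inclusive here)
    return "Low"         # 10,000+
```

For example, `alexa_traffic_level(4200)` falls in the Medium bucket under these assumptions.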
- Timeline of changes made to the event with the date, time, and name of the user who took each action, including:
- Status changes
- Emails sent (with recipients)
- Notes added
- Tags added/removed
- Owner changes
- Priority changes
This section provides more information about the website associated to this event beyond what is shown in the summary tab, including:
- Nameserver Information
- ASN Information
- Metro Code Information
- Alexa Category and Exact Rank
- Full WhoIs Record
- Full IP WhoIs Record
- Host Details
- SSL Information
- File Information
This section details what about the page was flagged by the RiskIQ system in relation to your business logic.
Classifiers score the characteristics of web pages seen by virtual users and determine whether an event should be created according to the logic described in the policy. Each classifier used in Content event analysis is listed here with the number of hits (instances), its total score, and the highlighted page content (if applicable) that produced the score for each field that the classifier targets.
All web pages share a set of common features that are extracted and analyzed by RiskIQ, and which can be targeted specifically within your classifiers. These features include those pictured below as well as many others:
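To make the hits, total score, and per-field highlights concrete, here is a minimal sketch in Python of how one classifier result could be assembled. It is an illustration only and does not reflect RiskIQ's implementation: the names `ClassifierResult` and `run_classifier`, the regular-expression matching, the fixed score per hit, and the field names `title` and `body` are all assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ClassifierResult:
    name: str
    hits: int = 0
    total_score: float = 0.0
    # Field name -> list of matched snippets (the "highlighted" content).
    highlights: dict = field(default_factory=dict)


def run_classifier(name, pattern, score_per_hit, target_fields, page):
    """Count pattern matches in each targeted page field and sum their score."""
    result = ClassifierResult(name=name)
    regex = re.compile(pattern, re.IGNORECASE)
    for field_name in target_fields:
        text = page.get(field_name, "")
        matches = [m.group(0) for m in regex.finditer(text)]
        if matches:
            result.hits += len(matches)
            result.total_score += score_per_hit * len(matches)
            result.highlights[field_name] = matches
    return result


# Hypothetical page features and classifier settings.
page = {
    "title": "Fresh CVV dumps for sale",
    "body": "Buy cvv and fullz. CVV shop open 24/7.",
}
result = run_classifier("stolen-card-terms", r"\bcvv\b", 2.0, ["title", "body"], page)
print(result.hits, result.total_score, result.highlights)
```

In this sketch a policy would then compare the scores from one or more classifiers against its own logic to decide whether to create an event; that comparison is left out because the policy rules are specific to each workspace.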
If this URL or its content is blacklisted, then this tab shows additional information on what reputation sources have reported it and for what type of activity.
If this event has been enforced, then this tab shows additional information about any enforcement actions this event has been involved in, such as the date, the user who initiated the action, the recipients of the request, and the full text of the sent message.
This section houses information on each instance this page was analyzed by RiskIQ. Users can select any of the times that RiskIQ analyzed the page associated to this event (a red arrow next to the timestamp indicates active, while grey signals inactive) to see details about the virtual user's interaction with the event page and the overall user session at that point in time.
Details provided about the crawl include:
- An overview providing metadata on the crawl and the screenshot taken by the virtual user
- Date and time
- Initial URL where the virtual user began the crawl
- Browser used
- Geographic location of the virtual user
- Total number of pages visited during the user session
- Total number of pages visited that returned error messages
- URL of the event page
- IP address
- Response code and message returned by the event page
- Page Content-type
- Page Content length
- Page response time
- Window name
- The original HTML response of the page
- The rendered document object model after the page loaded in the user's browser
Managing Content Events - User Review Decision Tree and Tagging Best Practices
Tags and User Review steps for Content Events are customized per client, given the non-standard nature of the detection criteria.
Your RiskIQ Technical Account Manager will work with you to build event categories and review/enforcement protocols that are appropriate for your use-case and reporting goals.
Content Events System Overview
Content events are detected via crawls. Unlike some event-types that only inspect crawls from specific sources (a.k.a. "processed projects"), Content events inspect all RiskIQ crawls inside and outside the workspace (with the exceptions of client inventory site monitoring and any projects outside the workspace that are explicitly labeled as private). The grain of the event is a unique URL.
Confirmed websites in the customer's inventory are whitelisted from generating Content events regardless of whether the crawl originates from their inventory site monitoring project, and a URL must successfully resolve to an IP (have some kind of web content on it) to potentially qualify as an event. Beyond these constraints, the detection logic for Content events is intentionally flexible so that it can support a wide variety of use cases involving web threats and undesirable content or behavior on third-party web pages about which the client organization wants to be alerted and potentially take action.
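The eligibility rules described above can be summarized as a single filter. The following Python sketch is illustrative only: the `Crawl` record, its field names, and the exact-hostname comparison against the confirmed inventory are assumptions made for the example, not RiskIQ data structures.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Crawl:
    url: str
    resolved_ip: Optional[str]      # None when the URL did not resolve
    project_in_workspace: bool
    project_is_private: bool
    is_inventory_monitoring: bool   # client inventory site monitoring crawl


def is_eligible_for_content_event(crawl, confirmed_inventory_hosts):
    """Return True if a crawl may be inspected for Content events."""
    # Client inventory site monitoring crawls are excluded.
    if crawl.is_inventory_monitoring:
        return False
    # Private projects outside the workspace are excluded.
    if crawl.project_is_private and not crawl.project_in_workspace:
        return False
    # Confirmed inventory websites are whitelisted whatever the crawl source.
    host = (urlsplit(crawl.url).hostname or "").lower()
    if host in confirmed_inventory_hosts:
        return False
    # The URL must have resolved to an IP address.
    return crawl.resolved_ip is not None
```

A crawl that passes this filter is only a candidate; whether it becomes an event still depends on the classifiers and policy configured for the workspace.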
The following diagram follows a Content event through the RiskIQ system from a virtual user first encountering a page, to the analysis of the crawl, and through the event monitoring based on user review, including enforcement procedures to resolution, and post-resolution monitoring.
- Green represents steps taken automatically by the RiskIQ system
- Pink represents steps taken by a human user
- Blue represents a status and/or tag label
If you are a RiskIQ admin user looking for step-by-step instructions on creating or modifying Content Event classifiers, policies, or projects, refer to Setting Up Content Events.
Monitoring and Auto-Resolution
- Content events are re-crawled roughly every 48 hours. Additional samples can occur outside of this schedule based on normal, non-monitoring-related virtual user activity (if, for example, the same pages also show up in searches for new pages).
- Monitoring times are approximate. To balance load across the entire system, crawls may be slightly advanced or delayed to prevent load spikes.
- Upon the first inactive sample of an event, an additional crawl will be scheduled 12 hours later to confirm whether it should resolve or the first crawl was an anomaly.
- An event will automatically resolve after 2 consecutive inactive samples and at least 1 hour of continuous inactive time.
- Events change from Resolved to Tenacious if the next crawl is found to be active.
- Events change from Monitor to Tenacious if there is a >10% difference in page content between the next crawl and the prior one.
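The monitoring rules above amount to a small state machine. The Python sketch below is one way to model them under stated assumptions and is not RiskIQ's implementation: `EventState` and `apply_sample` are hypothetical names, the status strings are taken from the rules above, and the 10% content difference is approximated with `difflib.SequenceMatcher` because the actual comparison method is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher
from typing import Optional

RECRAWL_INTERVAL = timedelta(hours=48)    # "roughly every 48 hours"
CONFIRMATION_DELAY = timedelta(hours=12)  # follow-up after first inactive sample
MIN_INACTIVE_TIME = timedelta(hours=1)    # continuous inactive time required
CHANGE_THRESHOLD = 0.10                   # >10% content difference


@dataclass
class EventState:
    status: str
    last_content: str = ""
    inactive_streak: int = 0
    inactive_since: Optional[datetime] = None


def content_difference(old, new):
    """Fraction of content that differs (assumed metric, 0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, old, new, autojunk=False).ratio()


def apply_sample(event, sampled_at, active, content):
    """Update the event for one crawl sample; return the next crawl time."""
    if active:
        event.inactive_streak = 0
        event.inactive_since = None
        if event.status == "Resolved":
            event.status = "Tenacious"
        elif (event.status == "Monitor"
              and content_difference(event.last_content, content) > CHANGE_THRESHOLD):
            event.status = "Tenacious"
        event.last_content = content
        return sampled_at + RECRAWL_INTERVAL

    event.inactive_streak += 1
    if event.inactive_since is None:
        event.inactive_since = sampled_at
    if event.inactive_streak == 1:
        # First inactive sample: schedule a confirmation crawl.
        return sampled_at + CONFIRMATION_DELAY
    if sampled_at - event.inactive_since >= MIN_INACTIVE_TIME:
        event.status = "Resolved"
    return sampled_at + RECRAWL_INTERVAL
```

As a usage example, an event in Monitor status that is sampled inactive at noon gets a confirmation crawl scheduled for midnight; if that crawl is also inactive, the event resolves, and a later active crawl moves it to Tenacious. The sketch ignores the scheduling jitter described above.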