Phish Events

Phish events are available to External Threats customers subscribed to the Brand Phishing module. They alert customers to phishing pages targeting their brands and customers. 

RiskIQ ingests millions of suspected phishing URLs every day from a broad range of sources, including third-party partner feeds, client abuse box integrations, client web server referrer log integrations, newly observed domains, and DMARC data, as well as direct URL submissions from users. We apply proprietary algorithms to the data aggregated from these sources to sort phish from non-phish and classify brand targets with industry-leading speed and accuracy.

When a new phish URL is found impersonating an organization, a Phish event is created in their workspace, which can be viewed in the events dashboard and events list inside the RiskIQ web application, in an email alert, or via the RiskIQ events API.

For a general introduction to events and other parts of the RiskIQ system, please see RiskIQ Platform Architecture.

Outlined below are tips on:

  1. How to read and interpret the information presented in a Phish event (field definitions)
  2. Suggested best practices for Phish event management, including user workflow and tagging
  3. How it works: Phishing Threats detection and system overview

Example: a phishing site targeting the OurTime dating service

Reading Phish Events - Field Definitions

Event List Item

This is how Phish events are represented in the Events section of the RiskIQ web application. Clicking on a list item brings up details for the event and user-initiated workflow actions. 

  • Thumbnail screenshot of the page that generated the event.
  • Event-Type: what kind of event it is.
  • Status: current status of the event.
  • URL: the web page associated to the event.
  • Created: date the event was generated.
  • Updated: date the event last changed status or was otherwise edited (most recent update recorded in the event history).
  • Brand Target: brand being impersonated by the phishing page.
  • Active: phish events are considered active if the page is live (has a 200 response code), is classified as a phish, and is targeting the organization in whose workspace the event exists (if the target changes to a brand owned by another organization, the event may deactivate).
  • Browser Blocked: whether the phish URL is currently browser blocked by Google Safe Browsing (not pictured).
  • Tags (if any have been applied--not pictured).
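The three conditions behind the Active flag can be read as a single boolean check. The sketch below is illustrative only: the CrawlResult type, its field names, and the is_active function are hypothetical and are not part of the RiskIQ product or API.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass
class CrawlResult:
    # Hypothetical field names, chosen for illustration only.
    response_code: int   # HTTP status returned by the page
    is_phish: bool       # whether the page was classified as a phish
    target_brand: str    # brand the page appears to impersonate


def is_active(crawl: CrawlResult, workspace_brands: Collection[str]) -> bool:
    """Return True only when all three 'Active' conditions hold."""
    live = crawl.response_code == 200
    targets_workspace = crawl.target_brand in workspace_brands
    return live and crawl.is_phish and targets_workspace
```

Under these assumptions, a page that returns a 404, is no longer classified as a phish, or now targets a brand outside the workspace would be treated as inactive.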

Event Header

At the top of each event's details is a header containing workflow actions. See Event User Actions for information on these options.

Summary Tab

The Summary provides screenshots of the first and most recent crawls of the page and other information for assessing the event and deciding how to act on it. The Summary tab is organized into multiple sections:

ATTRIBUTES

  • Alexa: degree of web traffic indicated by the site’s Alexa rank (High = Top 1,000, Medium = Top 10,000, Low = 10,000+).
  • Domain: domain associated to this event.
  • Domain Expires: what date the domain expires, or “Not Registered” if the domain is not registered.
  • Initial URLs: the URLs that the RiskIQ virtual user followed to reach this page. The initial URL and the URL of the Phish event (the final URL) may be different if the initial URL is a shortened URL or redirected to the final URL. Multiple initial URLs may be listed if different initial URLs have all been seen redirecting to the same final URL phishing page (click to expand the list if there are more than 5).
  • Cloaking: whether the final URL is the same as the initial URL or whether there was a redirect meant to obfuscate where the link leads to.
  • Source: where the URL for the Phish event was first obtained. This could be one of RiskIQ's global feeds of suspected phish that we comb through, a client-specific source, such as an abuse box, referrer logs, or a DMARC feed, or it could be a RiskIQ search project such as a similar domains project or social media search project.
  • Target Brand: the brand the phishing page is impersonating / the type of credentials the page asks the user to enter.
  • Target Country: the country of the users the page is intended to target (this can be difficult to determine reliably in an automated fashion for many brands, so the field can be manually set by users directly within the UI if they would like to assign or re-assign it).
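As a rough illustration of two of the attributes above, the following sketch buckets an Alexa rank into the High/Medium/Low tiers and flags a possible redirect by comparing the initial and final URLs. Both function names are hypothetical. Treating a rank of exactly 10,000 as Medium, and treating any difference between the two URLs as cloaking, are assumptions made for this example rather than documented RiskIQ behavior.

```python
def alexa_tier(rank: int) -> str:
    """Bucket an Alexa rank into the tiers listed under ATTRIBUTES."""
    if rank <= 1_000:
        return "High"      # Top 1,000
    if rank <= 10_000:
        return "Medium"    # Top 10,000 (boundary handling is an assumption)
    return "Low"           # 10,000+


def is_cloaked(initial_url: str, final_url: str) -> bool:
    """Assumed simplification: any initial/final mismatch counts as cloaking."""
    return initial_url != final_url
```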

WHOIS

  • IP: IP address for the site associated to this event.
  • Registrar: name of registrar for the domain associated to this event.
  • Registrar Email: email address to contact the registrar for the domain.
  • Registrar Phone: phone number to contact the registrar.
  • Registrant: name of the registrant for the domain associated to this event.
  • Registrant Email: email address to contact the registrant for the domain.
  • Registrant Phone: phone number to contact the registrant for the domain.
  • ASN: autonomous system number (ASN) associated to this event, along with its country of origin and company.
  • Hosting Provider: name, city, state, and country of the hosting provider for the site associated to this event.
  • Name Servers: name servers for the domain associated to this event (click to expand if there are more than 5).
  • MX Records: mail exchanger records specifying mail servers responsible for accepting email messages sent to this domain name.
  • TXT Records: other text records associated to this domain's DNS entry, including SPF records indicating which mail servers are allowed to send mail from this domain.

HISTORY

  • Timeline of changes made to the event with the date, time, and name of the user who took each action, including:
    • Status changes 
    • Emails sent (with recipients)
    • Notes added
    • Tags added/removed

Site Details

This section provides more information about the website associated to this event beyond what is shown in the summary tab, including:

  • CName
  • Nameserver Information
  • ASN Information
  • Metro Code Information
  • Alexa Category and Exact Rank
  • Full Whois Record
  • Full IP Whois Record
  • Host Details
  • SSL Information
  • File Information

Classify Tab

This section details why this event was created, and what about the page was flagged by the RiskIQ system in relation to your rogueBrandForPhish classifier, the machine-learning-based RiskIQ Minhash Classifier, and/or published Phishing Blacklists (see the section on 'Detection' for more details).

Classifiers score the characteristics of web pages seen by virtual users and determine whether or not an event should be made according to the logic described in the policy. Each classifier used in Phish event analysis is listed here with the number of hits (instances), its total score, and the highlighted page content (if applicable) that created the score per each available field that the classifier is targeted to.

All web pages, regardless of content, share a set of common features that are extracted and analyzed by RiskIQ, and which can be targeted specifically within your classifiers. These features include those pictured below as well as many others. 

If you are a RiskIQ admin user looking for step-by-step instructions on creating/modifying Phish classifiers, policies, or projects, refer to Setting Up Phish Events.

Crawls Tab

This section houses information on each time this page was analyzed by RiskIQ. Users can select any of the times that RiskIQ analyzed the page associated to this event to see details about the virtual user's interaction with the event page and the overall user session at that point in time. (A red arrow next to the timestamp indicates the event was active at the time of that crawl, while grey signals inactive.)

Details provided about the crawl include: 

  • An overview providing metadata on the crawl and the screenshot taken by the virtual user
    • Global Unique Identifiers for the user session and the page within the user session
    • Date and time
    • Initial URL where the virtual user began the crawl
    • Browser used
    • Geographic location of the virtual user
    • Total number of pages visited during the user session
    • Total number of pages visited that returned error messages
    • URL of the event page
    • IP address
    • Response code and message returned by the event page
    • Page Content-type
    • Page Content length
    • Page response time
    • Window name
  • The original HTML response of the page
  • The rendered document object model after the page loaded in the user's browser
  • Files
  • Cookies
  • Links
  • Headers

Blacklist Tab

This tab shows which blacklists and AV vendors currently report this phishing site. For any pages reported to Google Safe Browsing through RiskIQ, it also shows the time at which each page was submitted and the time at which it became blocked.

RiskIQ automatically reports phish URLs to Google Safe Browsing (GSB) and Microsoft SmartScreen upon confirmation and/or enforcement of an event via integrations with these services, so that consumers receive a warning before loading those pages in their browser. This ensures that phishers do not continue to gather credentials while any abuse complaints requesting to take the site fully offline are being processed. When a phish is blocked, a "Browser Blocked" label is visible at the top of the event. Similarly, if a URL is removed from the blocking list, this label disappears. Notes in the event history show timestamps for when blocking starts and ends, and events can be filtered by current blocking status. If a phish was removed from GSB's block list and later becomes active again, we automatically resubmit it while it is in the tenacious status, provided it was previously confirmed or enforced.

If Google fails to block a phish URL we submit, we automatically resubmit it every 24 hours for as long as the phish site remains active. (Google applies its own automated filtering to the submissions it receives to prevent false positive reports, which can at times inadvertently exclude true phish.) The "blocked at" timestamp is based on the time the URL was placed on Google's desktop Chrome blocking list.
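The resubmission rule described above can be sketched as a simple time check. This is a minimal illustration, not the RiskIQ implementation: the function name, its parameters, and the idea of tracking a single last-submitted timestamp are assumptions made for the example.

```python
from datetime import datetime, timedelta

# The 24-hour interval comes from the text above; everything else is hypothetical.
RESUBMIT_INTERVAL = timedelta(hours=24)


def should_resubmit(now: datetime, last_submitted: datetime,
                    is_active: bool, is_blocked: bool) -> bool:
    """Resubmit only while the phish is active, unblocked, and 24 hours have passed."""
    if not is_active or is_blocked:
        return False
    return now - last_submitted >= RESUBMIT_INTERVAL
```

A scheduler built on this check would stop resubmitting as soon as the URL is blocked or the site goes inactive.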

Google Safe Browsing and Microsoft SmartScreen together cover more than 95% of Internet users and all major browsers, including Chrome, Safari, Firefox, Internet Explorer, and their mobile equivalents. Plugins are also available for users of Opera and other browsers. Browsers other than Chrome and Chrome for Android may take slightly longer to block URLs because they receive updates to the blocking list downstream from Google, and some discrepancies may exist between Chrome mobile and desktop blocking due to device constraints on Google's side. The Google Play Services implementation of GSB, which Chrome uses on Android, is subject to both bandwidth and RAM constraints to facilitate usage across a wide variety of Android devices. In practice, this means that some mobile devices may not receive equivalent coverage when compared to a desktop implementation. However, Google does their best to ensure these devices have the most meaningful subset of lists to provide as much protection as possible in the face of these constraints.

Managing Phish Events - User Review Decision Workflow and Tagging Best Practices

The flow chart below describes a decision tree encompassing best practices for reviewing phish events. It describes in more detail the 'User Review' step in the system overview diagram at the end of this article.

  • Green represents steps taken automatically by the RiskIQ system
  • Pink represents steps taken by a human user
  • Blue represents a status and/or tag label

Tag Set

(Custom--tags are not typically used for Phish events)

Phishing Threats System Overview

Detection

Phish events are designed to detect pages that are impersonating your brands and targeting your customers to steal sensitive information. The system processing time between when a URL first appears in one of our sources and when an event notification is created is typically 5-10 minutes. 

RiskIQ determines whether or not a given URL is a phish vs. a non-phish using a proprietary machine learning algorithm developed by our in-house Data Science team. This algorithm looks at many dimensions of similarity comparing the page being analyzed to known phishing pages and the legitimate login pages likely to be copied by phishers. Examples of previously observed phish and/or legitimate login pages that were detected as similar to any particular event page (and thus, which contributed to the system's determination that the page is a phish) can be seen in the classify tab of each event.
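RiskIQ's algorithm is proprietary and its details are not described here. For readers unfamiliar with the general idea of similarity-based matching, the sketch below shows a generic, textbook MinHash comparison of two pages' text. It is not RiskIQ's classifier. The shingle size, the number of hash functions, and all function names are assumptions chosen for illustration.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping k-word shingles (k=3 is an arbitrary choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each salted hash function, keep the smallest hash value over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode()
        signature.append(min(
            int.from_bytes(hashlib.sha1(salt + item.encode()).digest()[:8], "big")
            for item in items
        ))
    return signature


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

In a scheme like this, a candidate page's signature would be compared against signatures of known phish and legitimate login pages, and a high estimated similarity to either would count as evidence that the page is a copy.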

Having first determined that something is likely to be a phish in general, RiskIQ next checks if it's a phish that is relevant to your organization specifically. The target brand of a phish is jointly determined through 3 possible methods. 

  1. Most often, this is based on the machine learning classifier determining the most likely target brand based on which other previously seen phish and/or official login pages the page is most similar to. 
  2. We also look for direct terms or regular expressions in the page content or URL, such as specific logos or brand names, as an alternate way of determining and/or confirming the target brand. This is called the 'Rogue Brand For Phish' classifier, and in the classify tab of each event you can see the parts of the page this classifier identified, highlighted in their surrounding context. The machine learning model can be biased towards labeling the target brand of pages only with brands we have seen examples of before or those that we see most often, so this "back-up method" ensures that a page that mentions a specific brand directly in a way that hits the classifier overrides the target brand label provided by the machine learning model, and the page will generate an event in your workspace. This can cause false positive events, however, if the entries of the classifier are overly broad and it hits on phishing pages for other brands that are correctly labeled by the machine learning model.
  3. Lastly, some of RiskIQ's open source intelligence partners who publish feeds of suspected phishing URLs also provide the suspected target brand as metadata in the feed. This is by far the least common method of brand classification used by RiskIQ, but if that information is available and states that the URL is targeting a specific brand, we can also automatically use it to attribute the phish to your workspace, even if the other 2 methods did not provide a brand match. Because RiskIQ cannot independently verify the accuracy of the brand information provided by these feeds and there may be false positives, the use of these brand matchers is optional for your workspace. This information does not appear in the classify tab.
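The way the three methods combine can be sketched as follows. This is an illustrative reading of the description above, not RiskIQ code: the function name, its parameters, the returned labels, and the example pattern are all hypothetical. The order of the checks only affects which method is reported when more than one matches.

```python
import re
from typing import Optional, Pattern, Sequence


def brand_match_method(workspace_brand: str,
                       patterns: Sequence[Pattern[str]],
                       page_text: str,
                       url: str,
                       ml_brand: Optional[str],
                       feed_brand: Optional[str] = None,
                       use_feed_brands: bool = False) -> Optional[str]:
    """Return which method (if any) attributes the phish to the workspace brand."""
    # Method 2: a direct term/regex hit overrides the machine learning label.
    if any(p.search(page_text) or p.search(url) for p in patterns):
        return "rogue_brand_classifier"
    # Method 1: the machine learning model's most likely target brand.
    if ml_brand == workspace_brand:
        return "ml_model"
    # Method 3: brand metadata from a partner feed, only if the workspace opts in.
    if use_feed_brands and feed_brand == workspace_brand:
        return "feed_metadata"
    return None
```

An overly broad pattern, such as one matching a common word, would make the first branch fire on phish for other brands, which is the false positive risk noted in method 2.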

System Overview

The following diagram follows a Phish event through the RiskIQ system from a virtual user first encountering a page, to the analysis of the crawl, and through the event monitoring, including enforcement procedures to resolution, and post-resolution monitoring. 

  • Green represents steps taken automatically by the RiskIQ system
  • Pink represents steps taken by a human user (see the User Review diagram above for more details on this step)
  • Blue represents a status and/or tag label

Monitoring and Auto-Resolution

  • Phish events are monitored according to a "burn-down" schedule, wherein the length of time between checks increases over time. This schedule is designed to mirror thresholds that, in an analysis of a large corpus of phish, were found to be statistically likely to correspond to the “lifespan” of various cohorts of phish. The time spans mentioned are rough, as load on RiskIQ systems can cause algorithmic adjustments in the intervals. The schedule is as follows:
    • First crawl
    • Again roughly 5 min later
    • Again roughly 30 min later
    • Again roughly 3 hours later (while a phish is enforced, it will not go longer than 3 hours between checks; if it is not enforced, the burn-down continues)
    • Again roughly 9 hours later
    • Again roughly a day later
    • Again roughly a day later
    • And then two, three, three, four, four, eight, and another eight days later, etc.
  • This "burn-down" cycle resets and starts over whenever a new enforcement notice is sent or an event switches between being active and inactive, as these are periods where we expect to see more changes occur and, thus, monitoring the event more frequently allows us to record these changes more precisely.
  • Other crawl samples of a given event may occur in between scheduled monitoring checks as well (e.g. if a phish URL is reported in another phish feed as a suspect URL after RiskIQ has already seen it, we crawl it from that feed as well as through the monitor project for the event). This can result in multiple initial URLs being observed and recorded for the same final URL.
  • Any initial URLs seen with a phish event are recorded as part of the event. Since some phish are only accessible by passing through a specific redirect chain (as a method of obfuscation), we attempt to crawl all initial URL redirect sequences as well as monitor the phish URL directly whenever we do a monitoring check. As long as at least one of those crawls results in a live observation of the final event URL, we consider the event active, and keep one of those crawls as the new sample in the crawls tab. If none of the crawls result in an active sample, again, we choose just one of the inactive crawls to show as a sample. If any of the crawls result in a phish page that is different from the event URL being monitored, we create a new event for that final URL.
  • Auto-resolving a phish event requires 2 consecutive inactive crawls and at least 1 hour of uninterrupted inactive time (where inactive is defined as meeting any of the following criteria: no longer being a live site, being live but no longer being a phishing page, or being a phishing page but having changed to target a brand other than those configured in the workspace).
  • Bringing a phish event back from resolved to tenacious requires just 1 subsequent active crawl after resolution has occurred.
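The monitoring rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the RiskIQ scheduler. All names are hypothetical. Repeating the 8-day interval after the listed steps, and measuring "uninterrupted inactive time" as the span between the first and latest crawl in the trailing inactive run, are assumptions made for the example. Actual intervals are approximate and can shift with system load.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Intervals between checks, taken from the schedule listed above.
BURN_DOWN = [
    timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=3),
    timedelta(hours=9), timedelta(days=1), timedelta(days=1),
] + [timedelta(days=d) for d in (2, 3, 3, 4, 4, 8, 8)]
ENFORCED_CAP = timedelta(hours=3)  # enforced events: at most 3 hours between checks


def next_interval(step: int, enforced: bool) -> timedelta:
    """Wait time before check number step + 1 (step 0 follows the first crawl)."""
    # Assumption: past the listed steps, keep using the last interval.
    interval = BURN_DOWN[min(step, len(BURN_DOWN) - 1)]
    return min(interval, ENFORCED_CAP) if enforced else interval


def should_auto_resolve(crawls: List[Tuple[datetime, bool]]) -> bool:
    """crawls: (timestamp, was_active) pairs, oldest first."""
    inactive_run = []
    for timestamp, was_active in reversed(crawls):
        if was_active:
            break
        inactive_run.append(timestamp)
    if len(inactive_run) < 2:          # needs 2 consecutive inactive crawls
        return False
    # inactive_run[0] is the latest crawl, inactive_run[-1] the start of the run.
    return inactive_run[0] - inactive_run[-1] >= timedelta(hours=1)


def should_mark_tenacious(is_resolved: bool, latest_crawl_active: bool) -> bool:
    """A single active crawl after resolution brings the event back."""
    return is_resolved and latest_crawl_active
```

Because the cycle resets when an enforcement notice is sent or the event switches between active and inactive, a scheduler using this sketch would set step back to 0 at those points.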