VisibleRisk Data Collection Process

By Sean Malone – VP Delivery and CISO at VisibleRisk

To provide an accurate assessment of our clients’ cyber risk, VisibleRisk uses a holistic methodology that relies on data from many different sources.

In this article, we’ll review the different types of data collected, and the approaches employed to collect that data. We want to be highly transparent about this process, so our clients know exactly what to expect when they work with us.

This article will cover:

  • Our approach to data selection and collection
  • External data
  • Internal data
  • Data validation
  • Security

Our Approach to Data Selection and Collection 

To understand and quantify a company’s cyber risk exposure, we first need to answer several questions about its situation:

  • Who would want to attack this company, and why? 
  • How sophisticated would that adversary be? 
  • What is the attack surface — both internal and external? 
  • What security controls are in place to mitigate the likelihood of that attack succeeding? 
  • If it does succeed, what impact will it have on the organization, and what controls are in place to respond and recover quickly to mitigate that impact? 
  • What insurance and cash reserves are available to offset the financial loss from this event?


Understanding all these sides of cyber risk requires collecting a significant amount of data about the assessed company. We refer to the individual data points as “features,” which serve as inputs to our quantification and rating model. Features do not apply judgment; they are simply validated data points about a client organization. A given feature can often be collected in more than one way.


For example, to find out how many Active Directory Domain Administrators a company has, we could include that as a survey question, interview an Active Directory administrator, request a screenshot of the applicable screen in the server management interface, or connect directly to the server and run the appropriate query to return this data. These approaches vary both in ease of collection and in the fidelity of the data collected.


At VisibleRisk, we deliberately collect each feature from the best point on the spectrum between “low effort, low fidelity” and “high effort, high fidelity.” Which approach is best depends on the specific feature in question. When two methods for collecting the data needed for a feature have the same level of fidelity, they are considered equivalent, and we use the approach that requires the least effort from our client. If the supporting data is not sufficiently reliable for a particular feature, we may collect multiple data points in order to reach our desired fidelity threshold.
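
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-5 effort and fidelity scores, the names CollectionMethod and choose_methods, and the fallback of corroborating the two highest-fidelity methods are all hypothetical and are not taken from VisibleRisk's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionMethod:
    name: str
    effort: int    # hypothetical 1-5 scale of client effort
    fidelity: int  # hypothetical 1-5 scale of data reliability


def choose_methods(methods, fidelity_threshold):
    """Return the method(s) to use when collecting one feature."""
    good_enough = [m for m in methods if m.fidelity >= fidelity_threshold]
    if good_enough:
        # Among methods that meet the threshold, prefer the least client
        # effort; break effort ties in favor of higher fidelity.
        return [min(good_enough, key=lambda m: (m.effort, -m.fidelity))]
    # Assumption: when no single method is reliable enough, corroborate
    # the two highest-fidelity methods against each other.
    return sorted(methods, key=lambda m: -m.fidelity)[:2]


# Illustrative scores for the Domain Administrator example.
domain_admin_count = [
    CollectionMethod("survey question", effort=1, fidelity=2),
    CollectionMethod("administrator interview", effort=2, fidelity=3),
    CollectionMethod("console screenshot", effort=3, fidelity=4),
    CollectionMethod("direct directory query", effort=3, fidelity=5),
]

chosen = choose_methods(domain_admin_count, fidelity_threshold=4)
print([m.name for m in chosen])
```

With these made-up scores, the screenshot and the direct query both meet the threshold at equal effort, so the sketch picks the direct query for its higher fidelity.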


Further, we continuously invest in the development of our proprietary technical collection tools, which accelerate and further automate the data collection process. These tools help maximize fidelity while minimizing the effort required of our clients to provide this data.

External Data Collection

We start every project with external data collection, which requires no effort on the part of our clients. External data sources include:

  • Business Profile Data – Understanding your business operations and financial profile provides context for the value at risk, and lets us establish a peer group for the assessment. 
  • Historical Breach and Loss Data – Analyzing past cyber attacks, particularly those from your peer group, allows us to form a basis for modeling the impact of future events. 
  • Threat Intelligence Data – Understanding which adversaries would target your organization, and the sophistication of those adversaries, is critical to evaluating the strength of defensive capabilities. 
  • Attack Surface Data – Visualizing your organization from the attacker’s perspective helps us evaluate your technical susceptibility to an external attack. 


We use this data to begin the modeling process before validating key elements with client representatives in subsequent stages.

Internal Data Collection

Next, we work directly with client representatives to augment the external data with several types of internal data collection. We collect internal data in the following ways:


Questionnaires

Client representatives provide key data points on security controls and business operations by completing a series of questionnaires in our client portal web application. These cover the following topics:

  • Business Profile
  • Loss Exposure
  • Governance
  • Security Culture
  • Third Party and Supply Chain Oversight
  • Asset Management
  • Secure Device Management
  • Vulnerability Management
  • Boundary Defense
  • Network Security
  • Data Protection
  • Identity and Access Management
  • Detection and Response
  • Business Continuity and Data Recovery

Security Solution Reports

Most of our clients have deployed industry-standard security solutions, and we leverage those solutions as a rich source of data. Common examples are vulnerability management, endpoint detection and response, and SIEM solutions. We implement connectors to extract reports from these solutions via APIs where feasible. In other cases, we’ll provide instructions for client representatives to generate and upload a report via the solution console.

Collector Tools

Most of our system hardening features, in addition to identity and access management and several other areas, are best collected directly from our clients’ information assets. We have developed a proprietary toolset to accomplish this. Our endpoint assessment tool is typically executed through existing software deployment solutions and runs on the system being assessed. To be clear, this is not yet another installed agent, but rather a utility that is run once and discarded, to minimize the load on the endpoints. Other tools can be run from a user workstation, such as our Active Directory tool and database server assessment tool.

Tester Tools

Finally, we execute empirical tests to validate the effectiveness of key controls. For example, we generate pseudo-sensitive data (data that is randomly generated, but looks like credit card numbers, personal identification numbers, etc.) and attempt to exfiltrate that data to points on the Internet through a variety of techniques commonly used by real adversaries. We also use similar approaches to test our clients’ email filtering capabilities.
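
To make the idea of pseudo-sensitive data concrete, here is a minimal Python sketch that produces random digit strings which pass the Luhn checksum used by payment card numbers. It is not VisibleRisk's tester tool. The function names, the default "4" prefix (the leading digit used by Visa cards), and the assumption that test data should be Luhn-valid are all illustrative.

```python
import secrets


def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left. The rightmost digit of `partial` sits next to the
    # check digit, so it is the first one to be doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Return True if the full number (including check digit) passes Luhn."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_card_number(prefix: str = "4", length: int = 16) -> str:
    """Build a random, card-like number; it belongs to no real account by design."""
    body_len = length - len(prefix) - 1
    body = prefix + "".join(str(secrets.randbelow(10)) for _ in range(body_len))
    return body + luhn_check_digit(body)


sample = fake_card_number()
print(sample, luhn_valid(sample))
```

Numbers built this way are random but structurally plausible, so a data loss prevention control that validates checksums would be expected to treat them like card data.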

Data Validation

As we collect these different types of data, we conduct several review sessions with client representatives to confirm the accuracy of these features and dive deeper into any areas requiring further refinement after the first round of data collection. After validation and normalization to account for differences in our clients’ operating environments, the complete set of features is ready to be used in our assessment, quantification, and rating model.
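
As one hypothetical illustration of normalization, a raw count can be expressed relative to the size of the environment so that organizations of different sizes become comparable. The function name, the per-1,000 scaling, and the sample figures below are assumptions made for this sketch, not a description of the actual model.

```python
def per_thousand(count: int, population: int) -> float:
    """Express a raw count as a rate per 1,000 members of a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * count / population


# Made-up figures: privileged accounts relative to total user accounts.
small_org = per_thousand(count=12, population=400)
large_org = per_thousand(count=40, population=20000)
print(small_org, large_org)
```

In this made-up case the larger organization has more privileged accounts in absolute terms, but far fewer relative to its size.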

A Word on Security

Our company is made up of security people. We understand that you care about protecting your company’s data and infrastructure, and you may be hesitant to run tools in your environment and share this data with us. 

To help you make this decision safely and confidently, our data collection is radically transparent. We share the full source code of all tools run in your environment, and you’re welcome to inspect and sandbox them to whatever extent you’d like. We also invite you to view the data generated by these tools before it’s transferred to VisibleRisk, so you can confirm that we’re extracting only these security-relevant features, and not unnecessary business, financial, or personal data.
