Do you know what data your organization processes and under what conditions?

Introduction

Many organizations conduct IT security audits with external auditor organizations for various reasons. Some perform audits as part of the operation of a management system, while others do so to ensure legal compliance. Recently, the drive for compliance with the NIS2 Directive has prompted numerous organizations to assess their maturity levels. Certain controls in standards and legal regulations necessitate understanding what data an organization processes and under what circumstances. Audit findings often reveal concerning issues, highlighting a need to explore the reasons behind these challenges, the current practices, potential solutions, and the expected benefits. This blog post aims to address these aspects comprehensively.

Current Situation Overview

During an audit, auditors always inquire about the types of data the organization processes and the information systems involved. Most organizations respond by mentioning that they have a data processing register. However, this often refers only to the GDPR Article 30 records of processing activities. A comprehensive data processing inventory is typically absent, regardless of whether the organization is a small or medium-sized enterprise or a large corporation.

Many organizations are also unaware of the full scope of the information systems they use, not just the systems they operate but also those maintained by external providers under contract or accessed through acceptance of general terms and conditions. This includes systems that fall under Shadow IT, which are often not officially managed or sanctioned.

Organizations generally lack knowledge about the data contained in these systems, the sources of input data, the processing workflows, and the outputs generated. They are also often unaware of what “data at rest” they hold and whether any of it is stored outside of the systems themselves (e.g., on devices or in memory). These gaps in awareness and management are widespread, highlighting a critical issue in organizational practices.

The concept of data, as well as the definitions of data categories and subcategories, is often missing or left unclear. As a result, organizations do not know which data categories they should maintain records for, or what data the auditor is referring to.

Good practices do exist, such as in certain documentation – like a System Security Plan (SSP) – where the categories of data handled in a system are defined, along with details on where the data originates and where it is transmitted. However, this documentation tends to focus primarily on the information systems operated by the organization itself. The data handled within systems merely “used” by the organization, particularly those maintained by external entities, is often not accounted for.

It is worth noting that even when data inventories are created, they typically reflect data categories relevant to a specific point in time and often lack mechanisms to track changes. However, having such an inventory, even with its limitations, is still better than the organization not knowing what data it processes at all.
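One way to add the missing change tracking is to keep dated snapshots of the inventory and compare them. The sketch below is only an illustration under assumptions: it supposes each snapshot maps a system name to the set of data categories it holds, and the function name `diff_inventories` and the sample systems are hypothetical.

```python
from typing import Dict, Set

Inventory = Dict[str, Set[str]]  # system name -> data categories (assumed shape)


def diff_inventories(old: Inventory, new: Inventory) -> Dict[str, Dict[str, Set[str]]]:
    """Report categories added to or removed from each system between two snapshots."""
    changes = {}
    for system in sorted(set(old) | set(new)):
        before = old.get(system, set())
        after = new.get(system, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[system] = {"added": added, "removed": removed}
    return changes


# Hypothetical snapshots taken at two points in time.
snapshot_q1 = {"crm": {"customer contact data"}, "payroll": {"salary data"}}
snapshot_q2 = {
    "crm": {"customer contact data", "payment data"},
    "file-share": {"contracts"},
}

changes = diff_inventories(snapshot_q1, snapshot_q2)
for system, change in changes.items():
    print(system, change)
```

A system that disappears from the newer snapshot shows up with all of its categories under "removed", which is a prompt to check whether the data was really disposed of or simply moved somewhere unrecorded.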

In many cases, there is a lack of a data owner who takes responsibility for ensuring that managed and processed data is handled appropriately and in compliance with regulations, as well as for performing related tasks. This often results in inadequate access control to the data, opening further gaps in compliance.

Data is often, but not always, classified by sensitivity according to the organization’s data classification policy. There is no perfect system or solution for maintaining a real-time inventory of data, but effective solutions can be implemented. They do, however, require resources.

The current situation can largely be traced back to this resource requirement. Because creating a data map does not generate directly visible value, few organizations invest significant effort in it, despite the many advantages of having such an inventory.

What are the disadvantages if the organization does not know the data it processes? (not an exhaustive list)

The organization cannot determine what needs to be protected, making it unable to ensure security measures aligned with the sensitivity of the data or legal requirements. Beyond legal non-compliance, data protection often falls on IT operations, even though defining protection needs is not solely their responsibility.

From a compliance perspective, if the organization is unaware of the data it processes, it cannot identify the legal requirements for managing unknown data, thus failing to ensure compliance.

It cannot precisely determine who is responsible for the data’s security regarding confidentiality, integrity, and availability requirements.

The organization cannot assess the risks associated with data processing and transfers, leading to an inability to ensure security through contracts or implement effective technological safeguards.

Data backups are typically handled by IT (if backups exist), which may not align with business continuity requirements. After data loss, availability needs are overlooked, and restoration depends on whatever backups are available, making recovery efforts prone to failure due to poorly defined expectations.

Non-compliance arises in the enforcement of information flow rules, for example when security filters are enabled and disabled ad hoc during incident handling, or when unencrypted sensitive information is compromised.

Establishing data leak prevention (DLP) measures becomes challenging as only general rules can be created, not data-specific ones. This results in generic DLP that lacks effectiveness.

Proper monitoring cannot be implemented because the organization does not know what to monitor, the associated risks, whose activities require monitoring, or who should receive alerts and reports.

The organization will not know where its data resides, what condition it is in, or what its quality is. Data integrity will not be verified either, leaving the organization unable to establish controls.

What are the advantages of knowing the data an organization processes? (not an exhaustive list)

The organization can identify risks, and if they are high-level risks, it can develop risk mitigation measures.

By understanding the data it processes, the organization can ensure compliance. It can implement measures such as data masking or other logical and access protections for certain data categories.

If the organization knows what data enters its information systems and from where, it can implement input validation and filtering processes.

Data leak prevention can only be effectively implemented if the organization understands the scope of the data it needs to protect. This function, along with others, must be closely integrated with risk management.

When the organization knows the data it processes and is aware of disposal and archiving requirements, it can handle these in a compliant manner. Data or related documents can be deleted, discarded, or archived in a timely manner.
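Once data categories and their retention requirements are recorded, checking what is due for disposal becomes a simple lookup. The following is a minimal sketch, assuming a hypothetical `RETENTION_DAYS` table and `Record` type; the retention periods shown are placeholders, not legal advice, and real values must come from the applicable law or internal policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Record:
    identifier: str
    category: str
    created: date


# Hypothetical retention periods per data category, in days.
RETENTION_DAYS: Dict[str, int] = {"job applications": 365, "access logs": 90}


def due_for_disposal(records: List[Record], today: date) -> List[str]:
    """Return identifiers of records whose retention period has expired."""
    due = []
    for record in records:
        limit = RETENTION_DAYS.get(record.category)
        if limit is None:
            continue  # unknown category: needs classification, not deletion
        if record.created + timedelta(days=limit) < today:
            due.append(record.identifier)
    return due


records = [
    Record("r1", "job applications", date(2023, 1, 10)),
    Record("r2", "access logs", date(2024, 5, 1)),
    Record("r3", "uncategorised", date(2020, 1, 1)),
]
expired = due_for_disposal(records, today=date(2024, 6, 1))
print(expired)
```

Records in a category with no defined retention period are deliberately skipped rather than deleted, since an undefined category is itself a gap in the data map.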

There are no hidden non-compliances; instead, the organization is aware of the risks it faces. In the event of an incident, it knows which data is affected, what actions to take to address the negative event, and what notification obligations apply, especially if personal data is involved.
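With a data map in place, the question of which data an incident affects can be answered by lookup rather than investigation. The sketch below assumes a hypothetical `DATA_MAP` and a hypothetical set of categories classified as personal data; whether a notification obligation actually applies still has to be decided against the relevant legislation.

```python
from typing import Dict, List, Set

# Hypothetical data map: system -> data categories stored or processed there.
DATA_MAP: Dict[str, Set[str]] = {
    "crm": {"customer contact data", "order history"},
    "build-server": {"source code"},
}

# Categories the organization has classified as personal data (assumed).
PERSONAL_DATA: Set[str] = {"customer contact data"}


def incident_impact(affected_systems: List[str]) -> Dict[str, object]:
    """Summarise which data an incident touches and whether personal data is involved."""
    categories: Set[str] = set()
    unknown: List[str] = []
    for system in affected_systems:
        if system in DATA_MAP:
            categories |= DATA_MAP[system]
        else:
            unknown.append(system)  # system missing from the map: a gap to close
    return {
        "categories": sorted(categories),
        "personal_data_involved": bool(categories & PERSONAL_DATA),
        "unmapped_systems": unknown,
    }


impact = incident_impact(["crm", "legacy-ftp"])
print(impact)
```

The "unmapped_systems" entry makes the limits of the map visible during an incident: for those systems the organization is back to not knowing what data is affected.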

Opinion on How to Build a Data Map

First of all, it is necessary to know which electronic information systems the organisation uses (whether it operates them, outsources them, or simply uses them). Once a list of systems is available, assess for each system where its input data comes from, what that data is, what transformations are applied to it, what data is stored, what data the organisation itself generates, and what output data is produced or passed on unchanged.
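The questions above can be captured in a small data structure before any dedicated tool is chosen. This is a minimal sketch, not a prescribed schema: the class names `System`, `DataFlow`, and `Ownership`, their fields, and the sample "billing" system are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Ownership(Enum):
    OPERATED = "operated"      # run by the organisation itself
    OUTSOURCED = "outsourced"  # run by a provider under contract
    USED = "used"              # e.g. a service accepted via general terms


@dataclass
class DataFlow:
    category: str      # agreed data category, e.g. "invoice data"
    source: str        # where the data comes from
    destination: str   # where it is sent or stored
    transformed: bool  # False when passed on without changes


@dataclass
class System:
    name: str
    ownership: Ownership
    stored_categories: List[str] = field(default_factory=list)
    flows: List[DataFlow] = field(default_factory=list)

    def inputs(self) -> List[DataFlow]:
        """Flows arriving at this system."""
        return [f for f in self.flows if f.destination == self.name]

    def outputs(self) -> List[DataFlow]:
        """Flows leaving this system."""
        return [f for f in self.flows if f.source == self.name]


billing = System(
    name="billing",
    ownership=Ownership.OUTSOURCED,
    stored_categories=["invoice data"],
    flows=[
        DataFlow("order data", "webshop", "billing", transformed=False),
        DataFlow("invoice data", "billing", "accounting", transformed=True),
    ],
)
print([f.category for f in billing.inputs()])
print([f.category for f in billing.outputs()])
```

Here `stored_categories` corresponds to data at rest and `flows` to data in transit, so the same record answers both questions for a given system.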

Everyone involved in the survey needs to know the basic concepts the organisation uses, such as the categories and types of data. All of these need to be defined so that there is no misunderstanding when carrying out the process. For example, if one person in the data survey takes “incoming data” to mean the emails themselves, while another takes it to mean the attachments or the plain-text content of those emails, the results will not be comparable.

Once we have identified the systems and the data by category and type, and we know which data is “data in transit” and which is “data at rest”, we can see what data we have to work with.

Where possible, it is worth managing the data map in a dedicated information system rather than an Excel spreadsheet (although a spreadsheet is still better than nothing). Being able to visualise data and its movement makes follow-up tasks more effective. Examples of such systems are the Collibra Data Governance Center or Dataedo. Beyond these, there are also systems that not only provide a visual representation but also automate processes, reduce the potential for errors, and make it easier to comply with data management rules. Where there is scope to implement such a solution, it can lead to significant savings in time and resources in the long term.

Once we know the data and how it is present in our activities, we can identify the legislation that needs to be complied with when handling and/or processing the data. Once the legal obligations have been fulfilled, the risks associated with the specific data processing should be assessed and, where necessary, security measures should be implemented to enhance protection (encryption, masking, etc.).
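The step from data map to security measures can be expressed as a gap check between what a policy requires and what is in place. The sketch below is illustrative only: the sensitivity levels, the `REQUIRED` policy table, and the sample entries are assumptions, and an actual policy would be derived from the organisation's own legal and risk assessment.

```python
from typing import Dict, Set

# Hypothetical policy: safeguards required per sensitivity level.
REQUIRED: Dict[str, Set[str]] = {
    "public": set(),
    "internal": {"access control"},
    "confidential": {"access control", "encryption"},
}


def missing_safeguards(sensitivity: str, implemented: Set[str]) -> Set[str]:
    """Return the safeguards the policy requires that are not yet in place."""
    return REQUIRED[sensitivity] - implemented


# Hypothetical data map entries: category -> (sensitivity, safeguards in place).
entries = {
    "marketing brochures": ("public", set()),
    "salary data": ("confidential", {"access control"}),
}

gaps = {
    category: missing_safeguards(level, in_place)
    for category, (level, in_place) in entries.items()
}
print(gaps)
```

An empty set for a category means the assumed policy is met; a non-empty set is a concrete, data-specific finding that can feed the risk assessment.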

Conclusion

Beyond compliance with laws and standards, there are many advantages to having a data map and many disadvantages to not having one. Building a data map is a long-term investment and is certainly resource intensive. It also requires a professional to manage and carry out the survey, someone who is familiar with data management, has an overview of the organization’s processes, and is aware of the context.

Decide how important it is for your organization to have a data map available, and set the goals you aim to achieve by establishing it. Beyond what has been mentioned in this post, carrying out this process brings many additional benefits that the organization might not even realize without it.

If your organization needs help implementing this process, please contact our Compliance colleagues who will be happy to assist you!

[email protected]

Author

Baranya Zsolt

Senior Information Security Auditor
