Efficient Data Collection

Correct and detailed carbon accounting requires a lot of activity input data. Think about data on energy consumption, employee commuting behavior, business travel, procured goods, etc. Without this data, a carbon footprint is incomplete and can lead to wrong insights. However, gathering all the data needed for a carbon footprint calculation is a struggle. We have yet to meet anyone who disagrees with this statement.

Although there is no silver bullet (spoiler: this blog will not provide you with one either), we draw on our experience working with multiple carbon footprint experts across Europe to bring some structure to the mess. We’ll explain the 5 phases of data collection, the common struggles, the common approaches and the role of technology. We end with 10 concrete tips to help you move forward.

Note: in this blog post, we focus on finding the correct activity data. Emission factor data is not addressed here. If you are interested in the latter, we refer you to the Resources/Insights section on our website for a dedicated blog on emission factor data.

The 5 phases of data collection

Data collection can be a chaotic process. To bring some structure, we cut the data collection process into 5 distinct phases. Each phase comes with its own challenges and possible solutions.

  1. Data need. Identifying the specific data requirements, depending on the operational and organizational boundaries of the footprint, and the data owners who can access the source data;

  2. Data extraction. Extracting relevant information from diverse sources, spanning company records, operational data, and specialized measuring devices. The data extraction is typically performed by the data owner;

  3. Data review. Validating the acquired data to ensure accuracy, consistency, and adherence to established standards;

  4. Data calculation. Converting the data into emission figures, by combining the reviewed activity data with appropriate emission factors;

  5. Process management. Orchestrating the 4 previous phases, including data governance, and maintaining an auditable trail to ensure transparency and credibility in the entire data collection process.
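
To make phase 4 concrete, here is a minimal Python sketch of the calculation step: each reviewed activity amount is multiplied by an emission factor and summed per category. The names `ActivityRecord`, `EMISSION_FACTORS` and `calculate_emissions` are hypothetical, and the factor values are placeholders chosen for illustration, not published emission factors.

```python
from dataclasses import dataclass

# Hypothetical emission factors in kg CO2e per unit of activity.
# The values are placeholders for illustration, not published factors.
EMISSION_FACTORS = {
    "electricity_kwh": 0.2,
    "natural_gas_kwh": 0.18,
    "car_km": 0.15,
}


@dataclass
class ActivityRecord:
    category: str  # key into EMISSION_FACTORS
    amount: float  # activity amount, in the unit implied by the category


def calculate_emissions(records):
    """Return total kg CO2e per category (activity amount x emission factor)."""
    totals = {}
    for record in records:
        # Raises KeyError when no factor is known, so gaps are not silently skipped.
        factor = EMISSION_FACTORS[record.category]
        totals[record.category] = totals.get(record.category, 0.0) + record.amount * factor
    return totals


records = [
    ActivityRecord("electricity_kwh", 10_000),
    ActivityRecord("car_km", 2_000),
    ActivityRecord("electricity_kwh", 5_000),
]
emissions = calculate_emissions(records)
for category, kg in emissions.items():
    print(f"{category}: {kg:.0f} kg CO2e")
```

Failing loudly on a missing emission factor is a deliberate choice in this sketch: an activity without a factor would otherwise drop out of the footprint unnoticed.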

Common struggles in data collection

We recently surveyed more than 100 carbon footprint experts about their struggles with collecting data for carbon accounting. See the figure below for the word cloud composed of their answers.

In summary, carbon accounting experts have 3 main struggles:

  1. Data availability. Data availability remains a significant hurdle, especially concerning scope 3 data. Often, this crucial information isn’t centrally available or exists only in formats that are challenging to process (e.g., scanned PDF files).

  2. Data quality. Carbon experts struggle to ensure an acceptable level of data quality. The sources of data might be ambiguous, leading to inconsistencies and gaps in the received information. Also, the completeness and accuracy of the data are often hard to assess.

  3. Stakeholder management. Coordinating with diverse stakeholders who are responsible for delivering disparate data sets adds complexity and lengthens lead times, which makes the data collection process less efficient.


What struggles do carbon experts face when collecting activity data? (word cloud)

Common approaches

Although all carbon accounting experts face the struggles mentioned above to some extent, their approach to them can differ. We distinguish 2 approaches: the expert-led approach vs. the data owner-led approach.

  • Expert-led. The expert, often an external sustainability consultant, takes the lead in the data collection process. They identify the data needed, review the data quality, perform the emission calculation and manage the process. They only need data owners for data extraction, as the experts themselves typically don’t have access to the source data. The main advantage of this approach is that the carbon footprint expert is involved in most phases, leaving less room for errors. The downside is additional workload (and cost) for the expert.

  • Data owner-led. The data owner, typically someone in an operational role in a company, takes the lead in the data collection process for the part they own. They extract the data, review it and sometimes do part of the calculation and process management. The role of the expert is limited to identifying the data need, supporting the data calculation and overseeing the process. The main advantage of this approach is a more even distribution of the workload, but the process is more prone to error.

Note that sometimes a combination of both approaches is used, for instance a data owner-led approach for scope 1 and 2 data and an expert-led approach for scope 3 data.


Who takes the lead? Expert-led approach vs. the data owner-led approach.

The role of technology

Technology can help to solve some of the struggles in the data collection process. It can speed up the data extraction, facilitate data review and automate data calculation. Let’s zoom in on data extraction.

There are different ways to extract data from its source (e.g., a company’s ERP system) into a GHG inventory where it can be reviewed and converted to emission data. The most common ways, ranked from low to high level of automation, are:

  • Manual: a person manually transfers data from the source to the GHG inventory;
  • Excel: a person transfers data in bulk from the source to the GHG inventory through an Excel export/import;
  • Document screening: automated reading of typically PDF documents, leveraging OCR technology;
  • File transfer: automated bulk data transfer from the source to the GHG inventory;
  • API (Application Programming Interface): automated and standardized data transfer from the source to the GHG inventory.

The best way to extract data from its source depends on 2 criteria: (1) data size and (2) data quality. Data size depends on the volume of the data (1 annual energy consumption number vs. 8760 hourly energy consumption data points) and the frequency of collecting the data (once a year vs. every week). Data quality depends on the completeness of the data (covering all sites vs. covering only 50% of the sites) and its format (scanned paper invoices vs. a machine-readable format such as JSON).

The higher the data quality and the larger the data size, the more attractive automated data collection becomes. Put differently, it’s a waste of time and money to set up an API integration with a company’s ERP system to read only 1 data point once a year.
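
The trade-off between data size and data quality can be sketched as a small decision helper. This is one possible reading of the framework, not a fixed rule: the function name `suggest_extraction_method`, the simplification of data quality to a single `machine_readable` flag, and the `automation_threshold` of 1000 data points per year are all assumptions made for illustration.

```python
def suggest_extraction_method(points_per_collection, collections_per_year,
                              machine_readable, automation_threshold=1000):
    """Suggest an extraction method from data size and data quality.

    automation_threshold is a hypothetical cut-off (data points per year)
    above which setting up automation is assumed to pay off.
    """
    # Data size combines volume per collection and collection frequency.
    data_size = points_per_collection * collections_per_year
    large = data_size >= automation_threshold

    if machine_readable:
        return "file transfer or API" if large else "Excel import"
    # Non machine-readable sources (e.g. scanned PDFs) need a person or OCR.
    return "document screening" if large else "manual"


# One annual number vs. hourly readings, both in a machine-readable format.
print(suggest_extraction_method(1, 1, machine_readable=True))
print(suggest_extraction_method(8760, 1, machine_readable=True))
```

In practice the threshold would depend on the cost of setting up and maintaining each integration, which this sketch does not model.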

Applying the framework outlined above to carbon accounting leads to 2 insights:

  1. Today, activity data often still scores low on data quality (i.e., not available in a machine-readable format) and data size (i.e., only required once a year). Therefore, the most cost-effective way of collecting data is manual entry and Excel import.

  2. Going forward, we can expect that both the data quality and data size will increase (i.e., more activity data that is centrally available in an ERP, in combination with quarterly updates in carbon footprints). As such, the future of data collection will move towards document screening and APIs.


How can technology help collecting activity data?

10 tips to improve data collection

To end this blog, we want to share 10 concrete actions you can take to improve the data collection process:

In phase I – Data need

  1. Organize a data kick-off to create a sense of urgency and to inform people.
  2. Provide data owners with examples of good data (e.g., pre-filled templates).

In phase II – Data extraction

  1. Ask for off-the-shelf available data, in particular in the first reporting year.
  2. Proactively schedule follow-up calls with data owners to set informal deadlines.

In phase III – Data review

  1. Check for outliers and orders of magnitude relative to turnover or FTE base.
  2. Be clear on what the consultant does vs. what the data owner does.
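
The outlier check from phase III can be sketched in a few lines of Python. The idea is to normalize each reported amount by the FTE base and flag sites whose intensity differs from the median by an order of magnitude or more. The function name `flag_outliers`, the `max_ratio` of 10 and the site data are hypothetical, and every site is assumed to have a positive FTE count.

```python
from statistics import median


def flag_outliers(sites, max_ratio=10.0):
    """Flag sites whose amount per FTE deviates from the median by max_ratio or more.

    sites maps a site name to a tuple (reported_amount, fte); fte must be > 0.
    """
    intensity = {name: amount / fte for name, (amount, fte) in sites.items()}
    typical = median(intensity.values())
    flagged = []
    for name, value in intensity.items():
        if value <= 0 or typical <= 0:
            # Zero or negative values cannot be compared as a ratio; review by hand.
            flagged.append(name)
        elif max(value / typical, typical / value) >= max_ratio:
            flagged.append(name)
    return flagged


# Hypothetical annual electricity consumption, intended to be in kWh.
sites = {
    "site_a": (120_000, 40),
    "site_b": (150_000, 50),
    "site_c": (90, 30),  # suspicious: possibly reported in MWh instead of kWh
    "site_d": (60_000, 20),
}
flagged = flag_outliers(sites)
print(flagged)
```

A flagged site is not necessarily wrong; the check only tells the reviewer where to ask the data owner a follow-up question, for example about units.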

In phase IV – Data calculation

  1. Put a “Chinese wall” between the data provider and the data calculation.
  2. Keep a link between the calculated GHG inventory and the original data source.
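
One way to keep the link between a calculated figure and its source is to store a source reference on every inventory line. The sketch below is an illustration under assumptions: the class names `SourceReference` and `InventoryLine`, their fields, the file name and the emission factor value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceReference:
    document: str      # e.g. file name of the invoice or export
    data_owner: str    # who extracted the data
    extracted_on: str  # ISO date of the extraction


@dataclass(frozen=True)
class InventoryLine:
    category: str
    activity_amount: float
    emission_factor: float  # kg CO2e per unit (placeholder value below)
    source: SourceReference

    @property
    def emissions(self):
        """Emissions in kg CO2e, always derived from the stored inputs."""
        return self.activity_amount * self.emission_factor


line = InventoryLine(
    category="electricity_kwh",
    activity_amount=10_000,
    emission_factor=0.2,
    source=SourceReference("electricity_2023.xlsx", "facility manager", "2024-01-15"),
)
print(f"{line.emissions:.0f} kg CO2e, traced to {line.source.document}")
```

Because the emission figure is computed from the stored activity amount rather than typed in separately, an auditor can follow any number in the inventory back to the document it came from.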

In phase V – Process management

  1. Keep track of data collection status (who, what, when, how).
  2. Use a data versioning system to ensure everyone is working with the latest version of the data.
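
Both phase V tips can be combined in a simple tracking structure: one record per data request that stores who, what, when and how, plus every submitted version. The class `DataRequest` and its fields are hypothetical; a real setup might use a spreadsheet or dedicated software instead of this minimal sketch.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataRequest:
    what: str   # data set requested
    who: str    # data owner
    when: date  # agreed deadline
    how: str    # extraction method, e.g. "Excel import"
    versions: list = field(default_factory=list)  # submissions, oldest first

    def submit(self, payload):
        """Store a new version and return its version number (starting at 1)."""
        self.versions.append(payload)
        return len(self.versions)

    @property
    def latest(self):
        """Most recent submission, or None if nothing was received yet."""
        return self.versions[-1] if self.versions else None

    def status(self, today):
        if self.versions:
            return "received"
        return "overdue" if today > self.when else "pending"


request = DataRequest("electricity 2023", "facility manager", date(2024, 1, 31), "Excel import")
print(request.status(date(2024, 2, 1)))  # no submission yet, deadline passed
request.submit({"kwh": 10_000})
request.submit({"kwh": 10_500})           # corrected figure from the data owner
print(request.status(date(2024, 2, 1)), request.latest)
```

Older versions are kept rather than overwritten, so a correction by the data owner remains visible in the audit trail.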

The carbon accounting software of Carbon+Alt+Delete can help streamline the data collection process. The insights described in this blog are the basis for how we have implemented data collection flows. Don’t hesitate to reach out for more information or a software demo.