Predicting war-related population and CO2 emission changes in Ukraine using social media

From the start of Russia’s large-scale invasion of Ukraine through February 2023, the Armed Conflict Location & Event Data Project recorded more than 42,000 political violence events across Ukraine. Three-quarters of these events are explosions or remote violence, including artillery and rocket attacks, primarily affecting the northeastern, eastern, and southern regions of Ukraine (Supplementary Figure S1A). Such conflict events generally cause widespread population shifts and abrupt spatiotemporal changes in anthropogenic CO2 emissions. To assess the impact of the Russia-Ukraine war on population and CO2 emissions in Ukraine, we combine social media, CO2 emissions, and baseline population data (Fig. 1).

Fig. 1: Workflow for the proposed method.
Predicting changes in population and CO2 emissions by using geotagged Twitter data, GRACED baseline data and population data.

Social media data

As one of the most popular social media platforms, Twitter had 368 million monthly active users in 2022. Twitter is widely used in social media research because of its comprehensive metadata, open messages, and publicly available APIs (Alshaabi et al. 2021; Baranowski et al. 2020). Tweets are user-generated Twitter messages of 140 characters or less, of which about 1% carry geotag information obtained from the built-in Global Positioning System (GPS) of users’ mobile phones. Geotagged tweets provide accurate latitude and longitude information in the WGS84 coordinate system. In this research, all geotagged tweets in Ukraine between January 1, 2022, and February 28, 2023 are retrieved via the Twitter Streaming API. The data include 119,046 geotagged tweets (Supplementary Figure S1B). Each geotagged tweet contains a user ID, latitude/longitude, and publication time.

Raw social media data face challenges such as inherent bias, random noise, and uncertain quality (Leasure et al. 2023). Here, we preprocess the collected geotagged tweets to reduce the influence of noisy data and to reflect the spatiotemporal changes in the number of active Twitter users. First, the geotagged tweets are grouped by month of the study period. Then, we deduplicate the tweets generated by the same user at the same location, based on the recorded user ID, latitude, and longitude. Finally, the processed geotagged tweets are counted per month at a spatial resolution of \(0.1^{\circ}\times 0.1^{\circ}\), and this count is taken as an indicator of the number of active Twitter users.
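The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the tuple layout of a tweet, the function names, and the use of integer cell indices obtained by flooring coordinates are assumptions made for the example.

```python
import math
from collections import Counter
from datetime import datetime

CELL = 0.1  # grid cell size in degrees, as in the text


def grid_index(lat, lon, cell=CELL):
    """Return the integer (row, col) index of the cell containing a point.

    Assumption: cells are aligned to multiples of `cell` degrees.
    """
    return (math.floor(lat / cell), math.floor(lon / cell))


def monthly_active_user_counts(tweets):
    """Count deduplicated geotagged tweets per month and grid cell.

    tweets: iterable of (user_id, lat, lon, published_at) tuples
    (hypothetical layout). Tweets by the same user at the same
    coordinates within the same month are counted once.
    """
    seen = set()
    counts = Counter()
    for user_id, lat, lon, published_at in tweets:
        month = (published_at.year, published_at.month)
        key = (month, user_id, lat, lon)
        if key in seen:
            continue  # duplicate: same user, same location, same month
        seen.add(key)
        counts[(month, grid_index(lat, lon))] += 1
    return counts


sample = [
    ("u1", 50.45, 30.52, datetime(2022, 1, 5)),
    ("u1", 50.45, 30.52, datetime(2022, 1, 9)),  # duplicate of the first
    ("u2", 50.47, 30.55, datetime(2022, 1, 9)),  # same cell, other user
    ("u1", 50.45, 30.52, datetime(2022, 2, 1)),  # new month
]
counts = monthly_active_user_counts(sample)
```

Whether duplicates are removed within each month or across the whole study period is not stated in the text. The sketch assumes per-month deduplication because grouping by month is described first.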

CO2 emissions data

GRACED has supported near-real-time monitoring of global CO2 emissions since 1 January 2019 by combining various spatial activity data, including hourly to daily electricity generation data from 31 countries, monthly industrial process production data from 62 countries/regions, daily mobility data from 416 cities, monthly fuel consumption data from 206 countries, and monthly maritime and aviation activity data (Dou et al. 2022, 2023). GRACED provides anthropogenic CO2 emissions data for different sectors at a spatial resolution of \(0.1^{\circ}\times 0.1^{\circ}\) and a temporal resolution of 1 day. This study uses only the CO2 emissions data from the residential consumption, industry, and ground transport sectors from 1 January 2021 to 28 February 2023, because data on CO2 emissions from other sectors in Ukraine are limited. We aggregate the daily CO2 emissions data into monthly averages for each sector (Supplementary Figure S1C).
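As a small illustration of the daily-to-monthly aggregation, the sketch below averages daily values for one grid cell and one sector. The dictionary input layout and the function name are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average daily emission values by calendar month.

    daily: dict mapping datetime.date -> emission value for one grid
    cell and one sector (hypothetical input layout).
    Returns a dict mapping (year, month) -> mean of the daily values.
    """
    totals = defaultdict(float)
    days = defaultdict(int)
    for day, value in daily.items():
        key = (day.year, day.month)
        totals[key] += value
        days[key] += 1
    return {key: totals[key] / days[key] for key in totals}


example = {
    date(2021, 1, 1): 2.0,
    date(2021, 1, 2): 4.0,
    date(2021, 2, 1): 5.0,
}
means = monthly_means(example)
```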

Population data

The LandScan dataset provided by Oak Ridge National Laboratory leverages geospatial data sources and spatial modeling techniques to create a comprehensive, high-resolution dataset of global population distribution (Dobson et al. 2000). With its accuracy and granularity, LandScan has proven to be a valuable resource for a wide range of applications, such as urban planning, emergency response, and environmental research (Smith et al. 2019; Tellman et al. 2021). We use the LandScan Global 2021 population counts, with a spatial resolution of approximately 1 km, as baseline population data before the Russia-Ukraine war (Supplementary Figure S1D). The 2021 LandScan data are then aggregated to a coarser spatial resolution of \(0.1^{\circ}\times 0.1^{\circ}\) by summing the population counts.
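The aggregation by summation can be sketched as a block sum over the fine grid. The example assumes the fine grid is a list of rows whose dimensions are exact multiples of the aggregation factor. For a 30 arc-second grid (the "approximately 1 km" resolution, an assumption here), a \(0.1^{\circ}\) cell would correspond to a factor of 12.

```python
def block_sum(grid, factor):
    """Aggregate a fine grid to a coarser one by summing factor x factor blocks.

    grid: list of equal-length rows of population counts.
    factor: number of fine cells per coarse cell along each axis.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = [[0] * (cols // factor) for _ in range(rows // factor)]
    for r in range(rows):
        for c in range(cols):
            coarse[r // factor][c // factor] += grid[r][c]
    return coarse


fine = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 1, 1],
    [0, 7, 1, 1],
]
coarse = block_sum(fine, 2)
```

Summing rather than averaging keeps the total population unchanged, which is what a per-cell head count requires.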

Estimating population changes

Based on the processed geotagged tweets and LandScan population data, we develop a deterministic model to estimate the monthly population change during the Russia-Ukraine war (i.e., February 2022 to February 2023) relative to the pre-war period (i.e., January 2022) at a spatial resolution of \(0.1^{\circ}\times 0.1^{\circ}\).

The first step is to calculate, for each grid cell, the change ratio of the number of geotagged tweets per month relative to January 2022:

$$\mathrm{CR}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}=\frac{\mathrm{T}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}-\mathrm{T}_{\mathrm{grid},\mathrm{January},2022}}{\mathrm{T}_{\mathrm{grid},\mathrm{January},2022}},$$

(1)

where \(\mathrm{T}_{\mathrm{grid},\mathrm{January},2022}\) is the number of geotagged tweets in a grid cell in January 2022, and \(\mathrm{T}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}\) is the number of geotagged tweets in that grid cell in month i of year j.

The estimate of monthly population changes compared to January 2022 is:

$$\mathrm{PC}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}=\mathrm{CR}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}\times \mathrm{Pop}_{\mathrm{grid},2021},$$

(2)

where \(\mathrm{Pop}_{\mathrm{grid},2021}\) is the gridded population aggregated from the 2021 LandScan data.
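Eqs. (1) and (2) can be illustrated for a single grid cell as follows. The function names are hypothetical, and returning `None` for cells with no tweets in January 2022 is an assumption: the text does not say how a zero baseline is handled.

```python
def change_ratio(tweets_month, tweets_jan_2022):
    """Eq. (1): relative change in tweet count versus January 2022.

    Returns None when the January 2022 count is zero, because the
    ratio is undefined there (handling chosen for this sketch only).
    """
    if tweets_jan_2022 == 0:
        return None
    return (tweets_month - tweets_jan_2022) / tweets_jan_2022


def population_change(cr, pop_2021):
    """Eq. (2): population change = change ratio x baseline population."""
    if cr is None:
        return None
    return cr * pop_2021


# A cell with 40 tweets in January 2022 and 30 in a later month,
# and a 2021 baseline population of 10,000.
cr = change_ratio(30, 40)
pc = population_change(cr, 10_000)
```

In this example the tweet count falls by a quarter, so the estimated population change is a loss of a quarter of the baseline population in that cell.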

Estimating CO2 emission changes

Here, the CO2 emission changes are calculated by combining the monthly population changes with the baseline CO2 emissions per capita in each emission sector. First, we calculate the baseline gridded CO2 emissions per capita in the residential consumption, ground transport, and industry sectors as follows:

$$\mathrm{EPC}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}=\frac{\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}}{\mathrm{Pop}_{\mathrm{grid},2021}},$$

(3)

$$\mathrm{EPC}_{\mathrm{transport},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}=\frac{\mathrm{E}_{\mathrm{transport},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}}{\mathrm{Pop}_{\mathrm{grid},2021}},$$

(4)

$$\mathrm{EPC}_{\mathrm{industry},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}=\frac{\mathrm{E}_{\mathrm{industry},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}}{\mathrm{Pop}_{\mathrm{grid},2021}},$$

(5)

where \(\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}\), \(\mathrm{E}_{\mathrm{transport},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}\), and \(\mathrm{E}_{\mathrm{industry},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}\) are the gridded CO2 emissions from the residential consumption, ground transport, and industry sectors in month i of 2021.

Based on the gridded CO2 emissions per capita in each sector, the monthly changes in CO2 emissions during the Russia-Ukraine war (i.e., February 2022 to February 2023) relative to the pre-war period (i.e., January 2022) are estimated by multiplying the per capita emissions by the monthly population change from Eq. (2):

$$\mathrm{EC}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}=\mathrm{PC}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}\times \mathrm{EPC}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}\times \mathrm{TA}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}}},$$

(6)

$$\mathrm{EC}_{\mathrm{transport},\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}=-\mathrm{PC}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}\times \mathrm{EPC}_{\mathrm{transport},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021},$$

(7)

$$\mathrm{EC}_{\mathrm{industry},\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}=\mathrm{PC}_{\mathrm{grid},\mathrm{m}_{\mathrm{i}},\mathrm{y}_{\mathrm{j}}}\times \mathrm{EPC}_{\mathrm{industry},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}.$$

(8)

Here we assume that the population shift leads to a decrease in CO2 emissions from residential consumption and industry, while traffic congestion caused by population movement increases CO2 emissions in the ground transport sector, hence the negative sign in Eq. (7). In Eq. (6), \(\mathrm{TA}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}}}\) is a temperature adjustment parameter. GRACED calculates residential consumption CO2 emissions based on heating emissions derived from air temperature data, which show obvious seasonal variations. The temperature adjustment parameter therefore compensates for this seasonality when calculating monthly changes in CO2 emissions for this sector and is given by:

$$\mathrm{TA}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}}}=\frac{\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{Jan},2021}}{\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{m}_{\mathrm{i}},2021}},$$

(9)

where \(\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{Jan},2021}\) is the gridded CO2 emissions from the residential consumption sector in January 2021. For example, the temperature adjustment parameters for February 2022 and February 2023 are the same: both are the ratio of \(\mathrm{E}_{\mathrm{residential},\mathrm{grid},\mathrm{Jan},2021}\) to the gridded CO2 emissions from the residential consumption sector in February 2021.
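Eqs. (3) to (9) can be combined for one grid cell and one month as in the sketch below. The function names and the dictionary layout are assumptions made for illustration, and cells with zero baseline population or zero residential emissions are not handled.

```python
def per_capita(emission_2021, pop_2021):
    """Eqs. (3)-(5): baseline emissions per capita for one sector."""
    return emission_2021 / pop_2021


def temperature_adjustment(e_res_jan_2021, e_res_month_2021):
    """Eq. (9): January 2021 over month-i 2021 residential emissions."""
    return e_res_jan_2021 / e_res_month_2021


def emission_changes(pc, pop_2021, e_month_2021, e_res_jan_2021):
    """Eqs. (6)-(8): sector emission changes for one grid cell.

    pc: population change from Eq. (2).
    e_month_2021: dict with keys "residential", "transport", "industry"
        holding the 2021 emissions of the same calendar month
        (hypothetical layout).
    """
    epc = {s: per_capita(v, pop_2021) for s, v in e_month_2021.items()}
    ta = temperature_adjustment(e_res_jan_2021, e_month_2021["residential"])
    return {
        "residential": pc * epc["residential"] * ta,  # Eq. (6)
        "transport": -pc * epc["transport"],          # Eq. (7), sign flipped
        "industry": pc * epc["industry"],             # Eq. (8)
    }


changes = emission_changes(
    pc=-2500.0,
    pop_2021=10_000,
    e_month_2021={"residential": 80.0, "transport": 40.0, "industry": 200.0},
    e_res_jan_2021=100.0,
)
```

With Eqs. (3) and (9) as written, the month-i residential emissions cancel in the product of per capita emissions and the temperature adjustment, so the residential change reduces to the population change multiplied by the January 2021 per capita residential emissions.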

Validation of CO2 emission changes

To validate the CO2 emission changes estimated from geotagged tweets (during the war), LandScan population data (pre-war), and GRACED data (pre-war), we compare the estimates with the CO2 emission changes of each sector calculated from GRACED data during the war (Supplementary Figures S2–S4). Specifically, we perform a linear regression between the estimated and reference CO2 emission changes in the residential consumption, ground transport, and industry sectors and assess the reliability of the proposed method using the \(R^2\) and slope of the fit.
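The validation step can be sketched with an ordinary least-squares fit that reports the slope and \(R^2\). Taking the reference changes as the independent variable and the estimated changes as the dependent variable is an assumption, since the text does not specify the direction of the regression. The function name is hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Requires at least two
    points, with some variation in both x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up values for illustration only, not results from the study.
reference = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 4.0, 6.0, 8.0]
slope, intercept, r_squared = ols_fit(reference, estimated)
```

A slope near 1 together with a high \(R^2\) would indicate that the estimated changes track the reference changes in both magnitude and pattern.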