Mismatch of total building numbers

The total building numbers in the files Main usage and construction material (12000 buildings) and Main usage and basement presence (10000 buildings) are different. I assume that the remaining buildings (the difference) were not surveyed.

For example:

data: number of buildings for main usage and construction material

main usage	construction material	number_buildings
office	total	12000
office	wooden	3000
office	steel	4000
office	steel reinforced concrete	5000
.....

number of buildings for main usage and presence of basement

main usage	presence of basement	number_buildings
office	total	10000
office	basement present	3000
office	basement non present	4000
office	unknown	3000
.....

While importing the Main usage and basement presence dataset, I added a clause that sets the construction_material column to total, i.e. 0, and vice versa for basement presence when importing the construction material dataset:

main usage	construction material	presence of basement	number_buildings
office	total	total	10000
office	total	basement present	3000
office	total	basement non present	4000
office	wooden	total	4000
office	steel reinforced concrete	total	5000

This works for most of the dataset. However, when main usage is total or any other type and basement presence is total, I run into an issue: the building numbers from the construction material dataset get replaced with the numbers from the basement presence dataset, because the latter is imported later.

So my solution would be to set construction material to -1 instead of 0 when it is not part of the imported dataset, and likewise for basement presence.
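A minimal sketch of why the -1 value avoids the overwrite, assuming the imported rows end up keyed by (main usage, construction material, basement presence). The function `import_rows`, the constant names, and the dictionary layout are hypothetical illustrations and not the actual import code:

```python
TOTAL = 0           # code for the "total" category, as described above
NOT_IMPORTED = -1   # assumed sentinel: dimension is absent from the source file

def import_rows(table, rows, fill):
    """Store rows keyed by (main_usage, construction_material, basement_presence).

    Each source file carries only one of the two dimensions (the other is
    None in the row); the missing one is set to `fill`.
    """
    for usage, material, basement, count in rows:
        key = (
            usage,
            fill if material is None else material,
            fill if basement is None else basement,
        )
        table[key] = count  # a later import overwrites an identical key
    return table

material_rows = [("office", TOTAL, None, 12000), ("office", "wooden", None, 3000)]
basement_rows = [("office", None, TOTAL, 10000), ("office", None, "basement present", 3000)]

# Missing dimension filled with 0: both "total" rows share the key ("office", 0, 0).
collide = {}
import_rows(collide, material_rows, fill=TOTAL)
import_rows(collide, basement_rows, fill=TOTAL)

# Missing dimension filled with -1: the two totals keep distinct keys.
separate = {}
import_rows(separate, material_rows, fill=NOT_IMPORTED)
import_rows(separate, basement_rows, fill=NOT_IMPORTED)
```

With the 0 fill, the 12000 total from the construction material file is lost to the 10000 from the basement file. With the -1 fill, both totals survive side by side.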

Once the frequency distribution calculations are done, I will make all the total building numbers uniform by adding an extra type unknown and rounding off the total building numbers so that they match those of the main usage and construction material dataset.
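One possible way to make the totals uniform, sketched under the assumption that the gap between the two totals is simply assigned to the unknown type. The function `reconcile` is hypothetical, and the unknown count of 3000 is an assumed value chosen so that the parts sum to the stated total of 10000:

```python
def reconcile(counts, reference_total):
    """Return a copy of `counts` whose total equals `reference_total`.

    The gap between the two totals is added to the "unknown" type,
    which is created if it does not exist yet.
    """
    result = dict(counts)
    gap = reference_total - result["total"]
    if gap < 0:
        raise ValueError("reference total is smaller than this dataset's total")
    result["unknown"] = result.get("unknown", 0) + gap
    result["total"] = reference_total
    return result

# Office counts from the basement presence dataset (unknown count assumed).
basement = {
    "total": 10000,
    "basement present": 3000,
    "basement non present": 4000,
    "unknown": 3000,
}

# 12000 is the office total from the construction material dataset.
reconciled = reconcile(basement, reference_total=12000)
```

Here the 2000 buildings missing from the basement presence dataset end up in unknown, so the individual types still sum to the new total.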

\rfc @ds @tara

Edited May 30, 2022 by Simantini Shinde