Mismatch of total building numbers
The total building numbers in the files `Main usage and construction material` (12000 buildings) and `Main usage and basement presence` (10000 buildings) differ. I assume that the remaining buildings (the difference of 2000) were not surveyed.
For example:

Number of buildings by main usage and construction material:

main usage | construction material | number_buildings |
---|---|---|
office | total | 12000 |
office | wooden | 3000 |
office | steel | 4000 |
office | steel reinforced concrete | 5000 |
..... |

Number of buildings by main usage and presence of basement:

main usage | presence of basement | number_buildings |
---|---|---|
office | total | 10000 |
office | basement present | 3000 |
office | basement non present | 4000 |
office | unknown | 3000 |
..... |

While importing the `Main usage and basement presence` dataset, I added a clause that sets the `construction_material` column to `total` (i.e. `0`), and vice versa for basement presence when importing the construction material dataset:

main usage | construction material | presence of basement | number_buildings |
---|---|---|---|
office | total | total | 10000 |
office | total | basement present | 3000 |
office | total | basement non present | 4000 |
office | wooden | total | 4000 |
office | steel reinforced concrete | total | 5000 |
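
The import clause described above could look roughly like the following. This is only a minimal sketch: the function name `with_placeholder`, the `source` argument and the tuple layout are hypothetical and not taken from the actual import code.

```python
def with_placeholder(rows, source):
    """Expand (usage, category, count) rows to the combined four-column layout.

    source is "material" or "basement" (hypothetical labels); the column
    that the source dataset does not carry is filled with "total".
    """
    combined = []
    for usage, category, count in rows:
        if source == "material":
            combined.append((usage, category, "total", count))
        elif source == "basement":
            combined.append((usage, "total", category, count))
        else:
            raise ValueError(f"unknown source: {source!r}")
    return combined


# A few rows from the two example tables.
material_rows = [("office", "wooden", 3000), ("office", "steel", 4000)]
basement_rows = [("office", "basement present", 3000)]

combined = (with_placeholder(material_rows, "material")
            + with_placeholder(basement_rows, "basement"))
```
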

This makes sense for most of the dataset, but when importing rows where main usage is `total` (or any other type) and basement presence is `total`, I run into an issue: the building numbers from construction material get replaced with the numbers from basement presence, because the latter are imported later.
So my solution would be to set the placeholder for construction material to `-1` instead of `0`, and to do the same for basement presence.
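
To illustrate why the placeholder value matters, here is a small sketch that stands in for the import with a plain dictionary keyed by (main usage, construction material, basement presence). The names `TOTAL`, `NOT_APPLICABLE` and `import_rows` are assumptions made for this example; the real import presumably writes to a database table, but the key collision works the same way.

```python
TOTAL = 0            # code for "total", as described above
NOT_APPLICABLE = -1  # proposed code for "column not present in this dataset"


def import_rows(store, rows):
    """Upsert rows keyed by (usage, material, basement); later rows win."""
    for usage, material, basement, count in rows:
        store[(usage, material, basement)] = count
    return store


# Placeholder 0: both "total" rows end up under the key ("office", 0, 0).
colliding = {}
import_rows(colliding, [("office", TOTAL, TOTAL, 12000)])  # material dataset
import_rows(colliding, [("office", TOTAL, TOTAL, 10000)])  # basement dataset, imported later
# Only one row survives here; the 12000 has been overwritten.

# Placeholder -1: the two rows get distinct keys and both survive.
separate = {}
import_rows(separate, [("office", TOTAL, NOT_APPLICABLE, 12000)])
import_rows(separate, [("office", NOT_APPLICABLE, TOTAL, 10000)])
```

With `-1` as the placeholder, a query for the construction material total can filter on basement presence `-1` without picking up the basement dataset's total.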

Once the frequency distribution calculations are done, I will make all the total building numbers uniform by adding an extra type `unknown` and adjusting the `total` building numbers to match those of the main usage and construction material dataset.
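
The adjustment could be sketched as follows. The helper `reconcile_to_reference` is hypothetical, and the example assumes that only the known basement categories are passed in (without the `total` row), with the whole shortfall against the reference total going to `unknown`.

```python
def reconcile_to_reference(counts, reference_total, filler="unknown"):
    """Return a copy of counts whose values sum to reference_total.

    The shortfall against the reference total is added to the filler type.
    """
    shortfall = reference_total - sum(counts.values())
    if shortfall < 0:
        raise ValueError("counts exceed the reference total")
    adjusted = dict(counts)
    adjusted[filler] = adjusted.get(filler, 0) + shortfall
    return adjusted


# Known basement categories for "office"; the reference total of 12000
# comes from the main usage and construction material dataset.
basement_counts = {"basement present": 3000, "basement non present": 4000}
adjusted = reconcile_to_reference(basement_counts, reference_total=12000)
```
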