Applying Geoparsing and Natural Language Processing to the Hamburg Parliament Database and Newsletters in the Context of Real Estate Management

 

Digital City Science, HafenCity University Hamburg

 

Geoparsing

Geoparsing is the process of turning unstructured free text into geographic coordinates: the text is cleaned, location names are extracted from it, and the extracted names are geocoded.

 

The LIG project is financed by the Landesbetrieb Immobilienmanagement und Grundvermögen (LIG), the official government institution for real estate management in Hamburg. We, the Digital City Science research group, have cooperated with LIG since 2020. The project began by building an interactive web tool for site selection. Last year, a geoparsing concept was introduced to the project, and we aim to build a semi-automatic process for analysing text data.

 

 

Workflow of Geoparsing


 

Workflow

 

  • Data cleaning and preprocessing: convert the text to lowercase; remove punctuation, special characters and numbers; remove German stop words.
  • Location extraction (Named Entity Recognition): we use the flair/ner-german-large model from Hugging Face.
  • Geocoding (conversion to geographic coordinates): we use Nominatim, an open-source geocoder.
  • Keyword analysis: visualised as word clouds.
  • Sentiment analysis: VADER and TextBlobDE.
  • Topic modeling: Latent Dirichlet Allocation (LDA).
  • Datasets: we process the Hamburg Parliament Database and the Elbvertiefung e-newsletter, and include the results as point clusters in the interactive web tool, the LIG Finder.
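The cleaning step at the top of this list can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function name `clean_text` is hypothetical, and the stop-word set is a tiny sample standing in for a full German list.

```python
import re

# Tiny illustrative sample (assumption); a real run would load a complete
# German stop-word list from an NLP library.
GERMAN_STOP_WORDS = {
    "der", "die", "das", "und", "in", "im", "am",
    "ein", "eine", "ist", "mit", "für", "von", "zu",
}


def clean_text(text: str, stop_words: set = GERMAN_STOP_WORDS) -> list:
    """Lowercase, strip punctuation/special characters/digits, drop stop words."""
    lowered = text.lower()
    # Replace every run of characters that are not letters (umlauts and ß
    # included) with a single space.
    letters_only = re.sub(r"[^a-zäöüß]+", " ", lowered)
    return [token for token in letters_only.split() if token not in stop_words]


tokens = clean_text("Der Senat plant 250 neue Wohnungen in der HafenCity!")
print(tokens)
```

The returned token list is the kind of input that the keyword and topic-modeling steps work on.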

     

    The LIG Finder

     

     

    LIG Finder Demo

    The interactive web tool, the LIG Finder

    Source Code: Github Link

    The software architecture of the LIG Finder

     

    Early this year we refactored the LIG Finder app and chose open-source solutions: Python Flask as the middleware, PostgreSQL with PostGIS as the database, and Vue.js as the JavaScript framework. The LIG Finder follows a client-server architecture. For the refactored version we preferred MapLibre GL JS over Leaflet and OpenLayers because of its superior rendering performance and its potential for 3D rendering. The application also has a geocoding toolbox that lets web users query a place name or an address. The user database makes it possible to mark land parcels as favourites and store them for later use.
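Resolving a place name or address to coordinates can be sketched as below. The source does not say which geocoder the toolbox calls; this sketch assumes the public Nominatim search endpoint, since Nominatim is the geocoder named in the geoparsing workflow. The function names, the `city_hint` parameter and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(place_name: str, city_hint: str = "Hamburg") -> str:
    """Build a Nominatim search URL that returns at most one JSON result."""
    params = {"q": f"{place_name}, {city_hint}", "format": "jsonv2", "limit": 1}
    return f"{NOMINATIM_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_first_hit(response_body: str):
    """Return (lat, lon) of the first result, or None if nothing matched."""
    results = json.loads(response_body)
    if not results:
        return None
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(place_name: str, user_agent: str = "lig-geoparsing-sketch"):
    """Query the public endpoint; requires network access."""
    request = urllib.request.Request(
        build_search_url(place_name), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_hit(response.read().decode("utf-8"))
```

The public Nominatim service expects an identifying User-Agent and at most one request per second, so batch geocoding of a whole dataset would need throttling or a self-hosted instance.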

    classification module

     

    Finding Parcels

    Finding parcels in the LIG Finder

     

     

    Finding parcels begins with selecting an Area of Interest (AoI). Users can define an AoI in three ways:
    (1) Administrative divisions: users can interactively select the AoI at four different administrative levels by clicking on the geometry or selecting it by name.
    (2) Drawing: the AoI can be defined by drawing a polygon on the map canvas.
    (3) Walking, biking and driving distance: travel distances are calculated with the pgRouting extension. For this, the corresponding OSM network data were retrieved and stored in the database, and their network topology and costs (travel time) were calculated in a preprocessing phase. The AoI polygon is then created from all nodes that can be reached from a central point within a specified time.
    Land parcels within the AoI are then retrieved by spatial intersection. Next, parcels can be filtered by their area and gross built-up area. Users can further refine the selection with 98 criteria in three categories (land use, property and special areas) by placing each criterion in the include or the exclude list. Finally, users can rank the filtered parcels by their proximity to city facilities using k-nearest-neighbour (KNN) analysis.
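The filter-and-rank logic described above can be illustrated in plain Python. This is only a sketch: the LIG Finder runs these steps in the database, and the `Parcel` fields, the tag names, the rule that a parcel must carry every include criterion, and the use of straight-line distance in projected metres are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Parcel:
    parcel_id: str
    x: float  # projected coordinates in metres (assumed)
    y: float
    area_m2: float
    tags: set = field(default_factory=set)  # criteria the parcel satisfies


def filter_parcels(parcels, min_area, max_area, include=(), exclude=()):
    """Keep parcels in the area range with every include tag and no exclude tag."""
    return [
        p for p in parcels
        if min_area <= p.area_m2 <= max_area
        and set(include) <= p.tags
        and not (set(exclude) & p.tags)
    ]


def mean_knn_distance(parcel, facilities, k=3):
    """Mean straight-line distance from a parcel to its k nearest facilities."""
    distances = sorted(
        math.hypot(parcel.x - fx, parcel.y - fy) for fx, fy in facilities
    )
    nearest = distances[:k]
    if not nearest:
        return math.inf  # no facilities: rank the parcel last
    return sum(nearest) / len(nearest)


def rank_parcels(parcels, facilities, k=3):
    """Order parcels from closest to farthest from their k nearest facilities."""
    return sorted(parcels, key=lambda p: mean_knn_distance(p, facilities, k))
```

A call such as `rank_parcels(filter_parcels(parcels, 400, 1000, include={"residential"}), facilities, k=1)` would return the remaining parcels ordered by distance to their single nearest facility.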

    find parcels module

    Keyword Analysis

    Keyword analysis: top 20 keywords from the Elbvertiefung newsletter

     

    Keyword clouds
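Extracting the most frequent keywords from cleaned tokens is a simple frequency count. The sketch below uses only the standard library; the function name `top_keywords` is hypothetical, and the token list is assumed to come from the preprocessing step (lowercased, stop words removed).

```python
from collections import Counter


def top_keywords(tokens, n=20):
    """Return the n most frequent tokens as (token, count) pairs."""
    return Counter(tokens).most_common(n)


sample_tokens = ["hafen", "elbe", "hafen", "stadt", "elbe", "hafen"]
print(top_keywords(sample_tokens, n=2))
```

A dictionary built from these pairs is the kind of frequency table that word-cloud libraries accept for rendering.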

     

    Sentiment Analysis


     

    TextBlobDE

     

    Sentiment analysis is a natural language processing technique that provides information on the polarity and subjectivity of a text.

     

    We tried two methods: TextBlobDE and VADER (Valence Aware Dictionary and sEntiment Reasoner). TextBlobDE provides information about polarity and subjectivity. VADER, on the other hand, gives the negative, positive and neutral proportions of the text, together with a compound score from which one can determine whether a text is positive, negative or neutral overall. We also tried translating the text from German into English with Google Translate, but this had no impact on the overall sentiment scores, so we kept our process in German.


    When we plot the sentiment analysis results on a 2D chart, as in the figure on the left, the newsletter articles are less polarised than the parliament dataset, and far fewer of them cluster along the X axis. The colour of each point compares the TextBlobDE and VADER results. Green points mean both methods yield a positive result, blue points mean both yield a negative result, and yellow points mean both yield a neutral result. Red points mean the two methods yield contradictory results.
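The colour coding can be expressed as a small comparison of two labels. In this sketch the VADER cut-offs of ±0.05 on the compound score are the thresholds conventionally recommended for VADER; the TextBlobDE neutral band, defaulting to a polarity of exactly zero, is an assumption, as are all function names.

```python
def vader_label(compound: float) -> str:
    """Conventional VADER cut-offs: >= 0.05 positive, <= -0.05 negative."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def textblob_label(polarity: float, neutral_band: float = 0.0) -> str:
    """Label a polarity in [-1, 1]; the neutral band is an assumption."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


AGREEMENT_COLOURS = {"positive": "green", "negative": "blue", "neutral": "yellow"}


def point_colour(vader_compound: float, textblob_polarity: float) -> str:
    """Green/blue/yellow when both methods agree, red when they contradict."""
    v_label = vader_label(vader_compound)
    t_label = textblob_label(textblob_polarity)
    return AGREEMENT_COLOURS[v_label] if v_label == t_label else "red"
```

Widening `neutral_band` would turn more weakly polarised texts yellow instead of red, so the share of contradictory points depends on that choice.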

     

     

     

     

    Visualization


    2D

     

    2D Pie Chart sentiment result comparison

     

    A pie chart is probably an efficient way to show the percentage distribution of multivariate, categorical values, especially when duplicated points fall on the same coordinates. As the figure to the left suggests, the bigger the pie, the more points were geocoded to the same location.
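The aggregation behind such pies can be sketched as grouping points by coordinate and counting categories per group. The function names and the rounding precision are assumptions; the categories are whatever labels the sentiment comparison assigns to each point.

```python
from collections import Counter, defaultdict


def aggregate_by_location(points, precision=5):
    """Group (lat, lon, category) points by rounded coordinate."""
    groups = defaultdict(Counter)
    for lat, lon, category in points:
        key = (round(lat, precision), round(lon, precision))
        groups[key][category] += 1
    return groups


def pie_summary(category_counts):
    """Return (total, shares): total sets the pie size, shares the slices."""
    total = sum(category_counts.values())
    shares = {cat: count / total for cat, count in category_counts.items()}
    return total, shares
```

Each key of the result is one pie on the map: its total gives the pie's size and the shares give the slice proportions.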

     

     

    3D

     

     

     

     

     

     

     

     

    Another geovisualization approach is a 3D stacked bar chart. The figure to the right shows the spatial distribution of the sentiment analysis results in 3D. The colour code is the same as in the 2D pie chart.

    Topic Modeling

    The LIG Team

     

    Project Manager: Martin Niggemann
    Geoparsing: Juiwen Chang
    Software Architect: Qasem Safariallahkheili

     

    Social Scientist: Filipe Mello Rose
    Student Assistant: Maksym Yermakovych

     
