2020-06-12 Fri 14:10

Towards Accurate Simulation of Global Challenges on Data Centers Infrastructures via Coupling of Models and Data Sources

Sergiy Gogolenko

ICCS'2020 @ Amsterdam (NL), remote, 2020-06-12 Fri 14:10

Computational global systems science applications and HiDALGO

Goal: evidence-based policy-making for current and upcoming situations via accurate simulations of global challenges (GCs)

Accurate digital twinning of GCs requires coupled simulations that combine:

  • models for diverse social and physical phenomena (often multiscale)
  • massive static and streaming data sets

Technical Challenges in simulating GC across data centers

HPC and data center environments:

  • static data on efficient parallel distributed file systems (PDFS)
  • security restrictions for external data
  • proprietary software, models, and data
  • expensive simulations

GC simulations:

  • combine computationally expensive models
  • hungry for "external" data
    • streaming data:
      • sensors, social networks (SN), telecommunications (TC), etc.

Technical Challenges:

  • integrate external data sources into static simulations
  • couple across data centers

Representative Global Challenges

Human migration

  • Data:
    • ECMWF weather/climate data
    • UNHCR refugee data
    • food security data
    • telecommunication data
  • Models and software:
    • macro- and micro-
    • ABM with location network
      • Flee framework (python3)
    • weather/climate forecasts
    • GIS: OSM driven toolkit
  • Usage: Burundi, CAR, S.Sudan, Mali
  • [Figure: South Sudan location graph]

Urban air pollution

  • Models and software:
    • ABMS for traffic
    • CFD for NOx spread in air
      • OpenFOAM, ANSYS Fluent
      • FEniCS-HPC, Nek5000
    • weather/climate forecasts
  • Data:
    • weather/climate data
    • streaming from sensors
    • OpenStreetMap, other
  • Usage: twins for
    • Győr (HU)
    • EU: Stuttgart (DE), Graz (AT)
    • US: Milwaukee (WI)

Social network analysis

  • Models and software:
    • ABMS for message spread
    • numerical linear algebra
      • PETSc, SLEPc
      • eigenvalues histogram
    • networks: NetworKit, SNAP
  • Data:
    • streaming from Twitter
    • telecommunication data
    • SNAP datasets
  • Usage:
    • COVID-19 tweets

Generalized Workflow

[Figure: generalized workflow diagram]

High-Level Design

[Figure: high-level design diagram]

Orchestrator & monitor

Cloudify:

  • cloud support out of the box
  • coupling mechanisms:
    • job_depends_on
  • OASIS TOSCA standard
  • Web GUI
  • many extensions

Croupier extension:

  • workload managers:
    • HPC: Slurm, Torque
    • HPDA: Mesos
  • coupling mechanisms:
    • job_mpi_coupled_with
    • job_data_coupled_with
      • streaming data with Kafka
  • data catalogues: CKAN
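
As a rough illustration of what a dependency-style coupling such as job_depends_on implies for scheduling, the sketch below groups jobs into submission waves with Python's standard graphlib. It is not the Cloudify or Croupier API (their blueprints are TOSCA documents); the job names and the submission_waves helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs of a coupled workflow; each job maps to the jobs it depends on.
job_depends_on = {
    "fetch_weather": [],
    "traffic_sim": [],
    "cfd_sim": ["fetch_weather", "traffic_sim"],
    "visualize": ["cfd_sim"],
}


def submission_waves(dependencies):
    """Group jobs into waves; every job in a wave has all its dependencies met."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for reproducible output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished so successors unlock
    return waves


waves = submission_waves(job_depends_on)
print(waves)
```

Jobs inside one wave could be handed to different workload managers at the same time, while a cyclic dependency is rejected up front, which is why cyclic couplings need a separate mechanism.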

Coupling: locally simulated models

  • Notation
    • acyclic coupling
    • cyclic coupling:
      • sequential
      • concurrent
  • SNA:
    • simulation coupled with verification/validation
  • Migration:
    • conflict model coupled with migration model
    • migration model coupled with validation activities
    • coarse-grained national coupled with refined local
  • UAP:
    • traffic model coupled with CFD model of NOx flows
    • WCD coupled with CFD model of NOx flows
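
The difference between sequential and concurrent cyclic coupling can be sketched with two toy models that feed each other. The models below (model_a, model_b) and their coefficients are invented for illustration and do not correspond to any HiDALGO model.

```python
def model_a(b_value):
    # Toy model A: its new state is driven by B's latest output.
    return 0.5 * b_value + 1.0


def model_b(a_value):
    # Toy model B: its new state is driven by A's latest output.
    return 0.5 * a_value


def run_sequential(a, b, steps):
    """Sequential cyclic coupling: B waits for A's result of the same step."""
    for _ in range(steps):
        a = model_a(b)
        b = model_b(a)  # uses the value A has just produced
    return a, b


def run_concurrent(a, b, steps):
    """Concurrent cyclic coupling: both models advance from the previous exchange."""
    for _ in range(steps):
        a, b = model_a(b), model_b(a)  # both right-hand sides use the old values
    return a, b


print(run_sequential(0.0, 0.0, 1))
print(run_concurrent(0.0, 0.0, 1))
```

In this toy setting both schemes approach the same fixed point (a = 4/3, b = 2/3), but the concurrent variant lets the two models run in parallel at the price of working with data that is one exchange old.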

Coupling: external data sources

CKAN (DMS/DC):

  • consistency in harvesting
  • adequate level of security
  • extensible via plugins
  • data delivery methods:
    • files
    • links to external sources
    • profiled harvester
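
A minimal sketch of how a simulation job might locate input files or links through a CKAN catalogue, assuming the standard CKAN Action API (package_search). The host ckan.example.org, the query, and the sample response are placeholders, and no network call is made here.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=5):
    """Build a CKAN Action API URL that searches datasets by free text."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def resource_urls(response_text):
    """Extract the resource URLs (files or external links) from a search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return [
        resource["url"]
        for dataset in payload["result"]["results"]
        for resource in dataset.get("resources", [])
    ]


# Hand-written sample in the shape of a package_search response (not real data).
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {"name": "example-dataset",
             "resources": [{"url": "https://ckan.example.org/files/a.csv"}]}
        ],
    },
})

url = package_search_url("https://ckan.example.org/", "air quality")
print(url)
print(resource_urls(sample_response))
```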

Apache Kafka:

  • real-time data pipelines
  • streaming data in HiDALGO
    • Twitter (with tweepy)
    • camera-based traffic data
    • monitor-based pollution data
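
The consumer side of such a streaming pipeline can be sketched without a broker. Below, a standard-library queue stands in for a Kafka topic, and the consumer keeps a rolling mean over the most recent sensor readings. The readings and the window size are made up, and a real deployment would use a Kafka client instead of queue.Queue.

```python
import queue
import threading
from collections import deque


def produce(topic, readings):
    """Push readings to the stand-in topic, then a sentinel marking end of stream."""
    for value in readings:
        topic.put(value)
    topic.put(None)


def consume_rolling_mean(topic, window=3):
    """Consume readings as they arrive and track the mean of the last `window` values."""
    recent = deque(maxlen=window)  # old readings fall out automatically
    means = []
    while True:
        value = topic.get()  # blocks until the producer delivers something
        if value is None:
            break
        recent.append(value)
        means.append(sum(recent) / len(recent))
    return means


topic = queue.Queue()  # in-memory stand-in for a Kafka topic
producer = threading.Thread(target=produce, args=(topic, [40.0, 44.0, 48.0, 52.0]))
producer.start()
rolling = consume_rolling_mean(topic)
producer.join()
print(rolling)
```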

Coupling: across HPC centres

  • specialized data center:
    • ECMWF: WCDF
  • vision:
    • bring users to the data
    • use data while it is hot
    • access using metadata
  • software: Polytope
  • goal:
    • enable coupling to build a workflow
  • implementation:
    • Step 1: Static coupling
      • static reanalysis data (calibration)
    • Step 2: Dynamic coupling
      • forecast data via a REST API
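
One way to keep the simulation code unchanged between the two steps is to hide the forcing data behind a common callable. The sketch below is an assumption about how this could be organised, not the Polytope interface: static_source, dynamic_source, simulate, and fake_fetch are hypothetical names, and fake_fetch stands in for a REST request.

```python
def static_source(reanalysis):
    """Step 1: forcing values come from a pre-staged reanalysis table."""
    return lambda step: reanalysis[step]


def dynamic_source(fetch):
    """Step 2: forcing values are fetched on demand and cached per time step."""
    cache = {}

    def get(step):
        if step not in cache:
            cache[step] = fetch(step)  # in practice this would be a REST call
        return cache[step]

    return get


def simulate(forcing, steps, state=0.0):
    """Toy simulation loop: accumulate the forcing over the requested steps."""
    for step in range(steps):
        state += forcing(step)
    return state


calls = []  # records which steps were actually requested from the remote side


def fake_fetch(step):
    calls.append(step)
    return float(step + 1)


static_result = simulate(static_source({0: 1.0, 1: 2.0, 2: 3.0}), steps=3)
dynamic_result = simulate(dynamic_source(fake_fetch), steps=3)
print(static_result, dynamic_result, calls)
```

With this split, calibration runs can use the static table while production runs switch to the forecast source without touching the simulation loop.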

Future work

  • develop mechanisms for moving/handling large simulation results
    • Simulation → HPDA (Apache Flink) → { DMS/DS | Visualization }
  • improve mechanisms for acyclic coupling across data centers
  • implement strong coupling in the case studies
  • evaluate performance for the proposed solutions

Contributors

  • BUL: Derek Groen, Diana Suleimenova, Imran Mahmood
  • PSNC: Marcin Lawenda
  • ATOS: F. Javier Nieto De Santos
  • ECMWF: John Hanley, Milana Vuckovic
  • KNOW: Mark Kroell, Bernhard Geiger
  • PLUS: Robert Elsaesser
  • SZE: Zoltán Horváth

Thank you for your attention!

https://hidalgo-project.eu

contact@hidalgo-project.eu

June 12

14:30-14:50 Zoltán Horváth Improving accuracy of multi-scale urban air pollution simulation via coupling with sensor data and meteorological forecasts MMS
14:50-15:10 Milana Vuckovic Building cloud-based data services to enable earth-science workflows across HPC centres MMS
15:10-15:30 Imran Mahmood An Agent-based Multiscale Simulation of Forced Migration: A case study of South Sudan