{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "vwMqHwWTthA4" }, "source": [ "# Analisi e Previsione della Produzione di Olio d'Oliva\n", "\n", "Questo notebook esplora la relazione tra i dati meteorologici e la produzione annuale di olio d'oliva, con l'obiettivo di creare un modello predittivo." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2024-10-29T15:15:51.992629Z", "start_time": "2024-10-29T15:15:51.940019Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease\n", "Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB] \n", "Get:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB] \n", "Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n", "Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB] \n", "Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2672 kB]\n", "Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1452 kB]\n", "Get:8 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3241 kB]\n", "Get:9 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2397 kB]\n", "Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1163 kB]\n", "Fetched 11.3 MB in 2s (5846 kB/s) \n", "Reading package lists... Done\n", "Reading package lists... Done\n", "Building dependency tree... Done\n", "Reading state information... Done\n", "graphviz is already the newest version (2.42.2-6ubuntu0.1).\n", "0 upgraded, 0 newly installed, 0 to remove and 120 not upgraded.\n", "Requirement already satisfied: tensorflow in /usr/local/lib/python3.11/dist-packages (2.14.0)\n", "Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.0.0)\n", "Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.6.3)\n", "Requirement already satisfied: flatbuffers>=23.5.26 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (23.5.26)\n", "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.5.4)\n", "Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.2.0)\n", "Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.9.0)\n", "Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (16.0.6)\n", "Requirement already satisfied: ml-dtypes==0.2.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.2.0)\n", "Requirement already satisfied: numpy>=1.23.5 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.26.0)\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.3.0)\n", "Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from tensorflow) (23.1)\n", "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (4.24.3)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from tensorflow) (68.2.2)\n", "Requirement already satisfied: six>=1.12.0 in /usr/lib/python3/dist-packages (from tensorflow) (1.16.0)\n", "Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.3.0)\n", "Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (4.8.0)\n", "Requirement already satisfied: wrapt<1.15,>=1.11.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.14.1)\n", "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.37.1)\n", "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.58.0)\n", "Requirement already satisfied: tensorboard<2.15,>=2.14 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n", "Requirement already satisfied: tensorflow-estimator<2.15,>=2.14.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n", "Requirement already satisfied: keras<2.15,>=2.14.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n", "Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from astunparse>=1.6.0->tensorflow) (0.41.2)\n", "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.23.1)\n", "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (1.0.0)\n", "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (3.4.4)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.31.0)\n", "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (0.7.1)\n", "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.3.7)\n", "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (5.3.1)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (0.3.0)\n", "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (4.9)\n", "Requirement already satisfied: urllib3>=2.0.5 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (2.0.5)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow) (1.3.1)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (3.2.0)\n", "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (3.4)\n", "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (2023.7.22)\n", "Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.11/dist-packages (from werkzeug>=1.0.1->tensorboard<2.15,>=2.14->tensorflow) (2.1.3)\n", "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.11/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (0.5.0)\n", "Requirement already satisfied: oauthlib>=3.0.0 in /usr/lib/python3/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow) (3.2.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (1.26.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (2.2.3)\n", "Requirement already satisfied: numpy>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (1.26.0)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas) (2024.2)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: keras in /usr/local/lib/python3.11/dist-packages (2.14.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.11/dist-packages (1.5.2)\n", "Requirement already satisfied: numpy>=1.19.5 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.26.0)\n", "Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.14.1)\n", "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.4.2)\n", "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (3.5.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.8.0)\n", "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.1.1)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.11.0)\n", "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.42.1)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.5)\n", "Requirement already satisfied: numpy<2,>=1.21 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.26.0)\n", "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (23.1)\n", "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (10.0.1)\n", "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (3.2.0)\n", "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.8.2)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: joblib in /usr/local/lib/python3.11/dist-packages (1.4.2)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: pyarrow in /usr/local/lib/python3.11/dist-packages (18.0.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: fastparquet in /usr/local/lib/python3.11/dist-packages (2024.5.0)\n", "Requirement already satisfied: pandas>=1.5.0 in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2.2.3)\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from fastparquet) (1.26.0)\n", "Requirement already satisfied: cramjam>=2.3 in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2.9.0)\n", "Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2024.10.0)\n", "Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from fastparquet) (23.1)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2024.2)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas>=1.5.0->fastparquet) (1.16.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: scipy in /usr/local/lib/python3.11/dist-packages (1.14.1)\n", "Requirement already satisfied: numpy<2.3,>=1.23.5 in /usr/local/lib/python3.11/dist-packages (from scipy) (1.26.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: seaborn in /usr/local/lib/python3.11/dist-packages (0.13.2)\n", "Requirement already satisfied: numpy!=1.24.0,>=1.20 in /usr/local/lib/python3.11/dist-packages (from seaborn) (1.26.0)\n", "Requirement already satisfied: pandas>=1.2 in /usr/local/lib/python3.11/dist-packages (from seaborn) (2.2.3)\n", "Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in /usr/local/lib/python3.11/dist-packages (from seaborn) (3.8.0)\n", "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.1.1)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.11.0)\n", "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.42.1)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.5)\n", "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (23.1)\n", "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (10.0.1)\n", "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.0)\n", "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2024.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2024.2)\n", "Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.16.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (4.67.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: pydot in /usr/local/lib/python3.11/dist-packages (3.0.2)\n", "Requirement already satisfied: pyparsing>=3.0.9 in /usr/local/lib/python3.11/dist-packages (from pydot) (3.2.0)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n", "Requirement already satisfied: tensorflow-io in /usr/local/lib/python3.11/dist-packages (0.37.1)\n", "Requirement already satisfied: tensorflow-io-gcs-filesystem==0.37.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow-io) (0.37.1)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "!apt-get update\n", "!apt-get install graphviz -y\n", "\n", "!pip install tensorflow\n", "!pip install numpy\n", "!pip install pandas\n", "\n", "!pip install keras\n", "!pip install scikit-learn\n", "!pip install matplotlib\n", "!pip install joblib\n", "!pip install pyarrow\n", "!pip install fastparquet\n", "!pip install scipy\n", "!pip install seaborn\n", "!pip install tqdm\n", "!pip install pydot\n", "!pip install tensorflow-io" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2024-10-25T21:05:00.337046Z", "start_time": "2024-10-25T21:04:03.960543Z" }, "colab": { "base_uri": "https://localhost:8080/" }, "id": "VqHdVCiJthA6", "outputId": "d8f830c1-5342-4e11-ac3c-96c535aad5fd" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-11-06 21:44:14.583940: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", "2024-11-06 21:44:14.584011: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", "2024-11-06 21:44:14.584064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", "2024-11-06 21:44:14.596853: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Keras version: 2.14.0\n", "TensorFlow version: 2.14.0\n", "TensorFlow version: 2.14.0\n", "CUDA available: True\n", "GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n", "1 Physical GPUs, 1 Logical GPUs\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2024-11-06 21:44:17.246902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 43622 MB memory: -> device: 0, name: NVIDIA L40S, pci bus id: 0000:01:00.0, compute capability: 8.9\n" ] } ], "source": [ "import tensorflow as tf\n", "import keras\n", "\n", "print(f\"Keras version: {keras.__version__}\")\n", "print(f\"TensorFlow version: {tf.__version__}\")\n", "print(f\"TensorFlow version: {tf.__version__}\")\n", "print(f\"CUDA available: {tf.test.is_built_with_cuda()}\")\n", "print(f\"GPU devices: {tf.config.list_physical_devices('GPU')}\")\n", "\n", "# GPU configuration\n", "gpus = tf.config.experimental.list_physical_devices('GPU')\n", "if gpus:\n", " try:\n", " for gpu in gpus:\n", " tf.config.experimental.set_memory_growth(gpu, True)\n", " logical_gpus = tf.config.experimental.list_logical_devices('GPU')\n", " print(len(gpus), \"Physical GPUs,\", len(logical_gpus), \"Logical GPUs\")\n", " except RuntimeError as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2024-10-25T21:05:14.642072Z", "start_time": "2024-10-25T21:05:11.794331Z" }, "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "id": "cz0NU95IthA7", "outputId": "eaf1939a-7708-49ad-adc9-bac4e2448e10" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow version: 2.14.0\n", "\n", "Dispositivi disponibili:\n", "[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n", "\n", "Shape del risultato: (10000, 10000)\n", "Device del tensore: /job:localhost/replica:0/task:0/device:GPU:0\n" ] }, { "data": { "text/plain": [ "'Test completato con successo!'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test semplice per verificare che la GPU funzioni\n", "def test_gpu():\n", " print(\"TensorFlow version:\", tf.__version__)\n", " print(\"\\nDispositivi disponibili:\")\n", " print(tf.config.list_physical_devices())\n", "\n", " # Creiamo e moltiplichiamo due tensori sulla GPU\n", " with tf.device('/GPU:0'):\n", " a = tf.random.normal([10000, 10000])\n", " b = tf.random.normal([10000, 10000])\n", " c = tf.matmul(a, b)\n", "\n", " print(\"\\nShape del risultato:\", c.shape)\n", " print(\"Device del tensore:\", c.device)\n", " return \"Test completato con successo!\"\n", "\n", "\n", "test_gpu()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2024-10-25T21:05:34.177059Z", "start_time": "2024-10-25T21:05:34.012517Z" }, "id": "VYNuYASythA8" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import MinMaxScaler, StandardScaler\n", "from tensorflow.keras.layers import Input, Dense, Dropout, Bidirectional, LSTM, LayerNormalization, Add, Activation, BatchNormalization, MultiHeadAttention, MaxPooling1D, Conv1D, GlobalMaxPooling1D, GlobalAveragePooling1D, \\\n", " Concatenate, ZeroPadding1D, Lambda, AveragePooling1D, concatenate\n", "from tensorflow.keras.layers import Dense, LSTM, Conv1D, Input, concatenate, Dropout, BatchNormalization, GlobalAveragePooling1D, Bidirectional, TimeDistributed, Attention, MultiHeadAttention\n", "from tensorflow.keras.models import Model\n", "from tensorflow.keras.regularizers import l2\n", "from tensorflow.keras.optimizers import Adam\n", "from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint\n", "from datetime import datetime\n", "import os\n", "import json\n", "import joblib\n", "import re\n", "import pyarrow as pa\n", "import pyarrow.parquet as pq\n", "from tqdm import tqdm\n", "from concurrent.futures import ProcessPoolExecutor, as_completed\n", "from functools import partial\n", "import psutil\n", "import multiprocessing\n", "\n", "random_state_value = 42\n", "\n", "base_project_dir = './kaggle/working/'\n", "data_project_dir = base_project_dir + 'data/'\n", "models_project_dir = base_project_dir + 'models/'\n", "\n", "os.makedirs(base_project_dir, exist_ok=True)\n", "os.makedirs(data_project_dir, exist_ok=True)\n", "os.makedirs(models_project_dir, exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "uHKkULSNthA8" }, "source": [ "## Funzioni di Plot" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "gzvYVaBPthA8" }, "outputs": [], "source": [ "def save_plot(plt, title, output_dir='./kaggle/working/plots'):\n", " os.makedirs(output_dir, exist_ok=True)\n", " filename = \"\".join(x for x in title if x.isalnum() or x in [' ', '-', '_']).rstrip()\n", " filename = filename.replace(' ', '_').lower()\n", " filepath = os.path.join(output_dir, f\"{filename}.png\")\n", " plt.savefig(filepath, bbox_inches='tight', dpi=300)\n", " print(f\"Plot salvato come: {filepath}\")\n", "\n", "\n", "def to_camel_case(text):\n", " \"\"\"\n", " Converte una stringa in camelCase.\n", " Gestisce stringhe con spazi, trattini o underscore.\n", " Se è una sola parola, la restituisce in minuscolo.\n", " \"\"\"\n", " # Rimuove eventuali spazi iniziali e finali\n", " text = text.strip()\n", "\n", " # Se la stringa è vuota, ritorna stringa vuota\n", " if not text:\n", " return \"\"\n", "\n", " # Sostituisce trattini e underscore con spazi\n", " text = text.replace('-', ' ').replace('_', ' ')\n", "\n", " # Divide la stringa in parole\n", " words = text.split()\n", "\n", " # Se non ci sono parole dopo lo split, ritorna stringa vuota\n", " if not words:\n", " return \"\"\n", "\n", " # Se c'è una sola parola, ritorna in minuscolo\n", " if len(words) == 1:\n", " return words[0].lower()\n", "\n", " # Altrimenti procedi con il camelCase\n", " result = words[0].lower()\n", " for word in words[1:]:\n", " result += word.capitalize()\n", "\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "id": "lhipxRbMthA8" }, "source": [ "## 1. Caricamento e preparazione dei Dati Meteo" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Function to convert csv to parquet\n", "def csv_to_parquet(csv_file, parquet_file, chunksize=100000):\n", " writer = None\n", "\n", " for chunk in pd.read_csv(csv_file, chunksize=chunksize):\n", " if writer is None:\n", "\n", " table = pa.Table.from_pandas(chunk)\n", " writer = pq.ParquetWriter(parquet_file, table.schema)\n", " else:\n", " table = pa.Table.from_pandas(chunk)\n", "\n", " writer.write_table(table)\n", "\n", " if writer:\n", " writer.close()\n", "\n", " print(f\"File conversion completed : {csv_file} -> {parquet_file}\")\n", "\n", "\n", "def read_json_files(folder_path):\n", " all_data = []\n", "\n", " file_list = sorted(os.listdir(folder_path))\n", "\n", " for filename in file_list:\n", " if filename.endswith('.json'):\n", " file_path = os.path.join(folder_path, filename)\n", " try:\n", " with open(file_path, 'r') as file:\n", " data = json.load(file)\n", " all_data.extend(data['days'])\n", " except Exception as e:\n", " print(f\"Error processing file '{filename}': {str(e)}\")\n", "\n", " return all_data\n", "\n", "\n", "def create_weather_dataset(data):\n", " dataset = []\n", " seen_datetimes = set()\n", "\n", " for day in data:\n", " date = day['datetime']\n", " for hour in day['hours']:\n", " datetime_str = f\"{date} {hour['datetime']}\"\n", "\n", " # Verifico se questo datetime è già stato visto\n", " if datetime_str in seen_datetimes:\n", " continue\n", "\n", " seen_datetimes.add(datetime_str)\n", "\n", " if isinstance(hour['preciptype'], list):\n", " preciptype = \"__\".join(hour['preciptype'])\n", " else:\n", " preciptype = hour['preciptype'] if hour['preciptype'] else \"\"\n", "\n", " conditions = hour['conditions'].replace(', ', '__').replace(' ', '_').lower()\n", "\n", " row = {\n", " 'datetime': datetime_str,\n", " 'temp': hour['temp'],\n", " 'feelslike': hour['feelslike'],\n", " 'humidity': hour['humidity'],\n", " 'dew': hour['dew'],\n", " 'precip': hour['precip'],\n", " 'snow': hour['snow'],\n", " 'preciptype': preciptype.lower(),\n", " 'windspeed': hour['windspeed'],\n", " 'winddir': hour['winddir'],\n", " 'pressure': hour['pressure'],\n", " 'cloudcover': hour['cloudcover'],\n", " 'visibility': hour['visibility'],\n", " 'solarradiation': hour['solarradiation'],\n", " 'solarenergy': hour['solarenergy'],\n", " 'uvindex': hour['uvindex'],\n", " 'conditions': conditions,\n", " 'tempmax': day['tempmax'],\n", " 'tempmin': day['tempmin'],\n", " 'precipprob': day['precipprob'],\n", " 'precipcover': day['precipcover']\n", " }\n", " dataset.append(row)\n", "\n", " dataset.sort(key=lambda x: datetime.strptime(x['datetime'], \"%Y-%m-%d %H:%M:%S\"))\n", "\n", " return pd.DataFrame(dataset)\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Crea le sequenze per LSTM\n", "def create_sequences(timesteps, X, y=None):\n", " \"\"\"\n", " Crea sequenze temporali dai dati.\n", " \n", " Parameters:\n", " -----------\n", " X : array-like\n", " Dati di input\n", " timesteps : int\n", " Numero di timestep per ogni sequenza\n", " y : array-like, optional\n", " Target values. Se None, crea sequenze solo per X\n", " \n", " Returns:\n", " --------\n", " tuple o array\n", " Se y è fornito: (X_sequences, y_sequences)\n", " Se y è None: X_sequences\n", " \"\"\"\n", " Xs = []\n", " for i in range(len(X) - timesteps):\n", " Xs.append(X[i:i + timesteps])\n", "\n", " if y is not None:\n", " ys = []\n", " for i in range(len(X) - timesteps):\n", " ys.append(y[i + timesteps])\n", " return np.array(Xs), np.array(ys)\n", "\n", " return np.array(Xs)\n", "\n", "def get_season(date):\n", " month = date.month\n", " day = date.day\n", " if (month == 12 and day >= 21) or (month <= 3 and day < 20):\n", " return 'Winter'\n", " elif (month == 3 and day >= 20) or (month <= 6 and day < 21):\n", " return 'Spring'\n", " elif (month == 6 and day >= 21) or (month <= 9 and day < 23):\n", " return 'Summer'\n", " elif (month == 9 and day >= 23) or (month <= 12 and day < 21):\n", " return 'Autumn'\n", " else:\n", " return 'Unknown'\n", "\n", "\n", "def get_time_period(hour):\n", " if 5 <= hour < 12:\n", " return 'Morning'\n", " elif 12 <= hour < 17:\n", " return 'Afternoon'\n", " elif 17 <= hour < 21:\n", " return 'Evening'\n", " else:\n", " return 'Night'\n", "\n", "\n", "def add_time_features(df):\n", " df['datetime'] = pd.to_datetime(df['datetime'])\n", " df['timestamp'] = df['datetime'].astype(np.int64) // 10 ** 9\n", " df['year'] = df['datetime'].dt.year\n", " df['month'] = df['datetime'].dt.month\n", " df['day'] = df['datetime'].dt.day\n", " df['hour'] = df['datetime'].dt.hour\n", " df['minute'] = df['datetime'].dt.minute\n", " df['hour_sin'] = np.sin(df['hour'] * (2 * np.pi / 24))\n", " df['hour_cos'] = np.cos(df['hour'] * (2 * np.pi / 24))\n", " df['day_of_week'] = df['datetime'].dt.dayofweek\n", " df['day_of_year'] = df['datetime'].dt.dayofyear\n", " df['week_of_year'] = df['datetime'].dt.isocalendar().week.astype(int)\n", " df['quarter'] = df['datetime'].dt.quarter\n", " df['is_month_end'] = df['datetime'].dt.is_month_end.astype(int)\n", " df['is_quarter_end'] = df['datetime'].dt.is_quarter_end.astype(int)\n", " df['is_year_end'] = df['datetime'].dt.is_year_end.astype(int)\n", " df['month_sin'] = np.sin(df['month'] * (2 * np.pi / 12))\n", " df['month_cos'] = np.cos(df['month'] * (2 * np.pi / 12))\n", " df['day_of_year_sin'] = np.sin(df['day_of_year'] * (2 * np.pi / 365.25))\n", " df['day_of_year_cos'] = np.cos(df['day_of_year'] * (2 * np.pi / 365.25))\n", " df['season'] = df['datetime'].apply(get_season)\n", " df['time_period'] = df['hour'].apply(get_time_period)\n", " return df\n", "\n", "\n", "def add_solar_features(df):\n", " # Calcolo dell'angolo solare\n", " df['solar_angle'] = np.sin(df['day_of_year'] * (2 * np.pi / 365.25)) * np.sin(df['hour'] * (2 * np.pi / 24))\n", "\n", " # Interazioni tra features rilevanti\n", " df['cloud_temp_interaction'] = df['cloudcover'] * df['temp']\n", " df['visibility_cloud_interaction'] = df['visibility'] * (100 - df['cloudcover'])\n", "\n", " # Feature derivate\n", " df['clear_sky_index'] = (100 - df['cloudcover']) / 100\n", " df['temp_gradient'] = df['temp'] - df['tempmin']\n", "\n", " return df\n", "\n", "\n", "def add_solar_specific_features(df):\n", " # Angolo solare e durata del giorno\n", " df['day_length'] = 12 + 3 * np.sin(2 * np.pi * (df['day_of_year'] - 81) / 365.25)\n", " df['solar_noon'] = 12 - df['hour']\n", " df['solar_elevation'] = np.sin(2 * np.pi * df['day_of_year'] / 365.25) * np.cos(2 * np.pi * df['solar_noon'] / 24)\n", "\n", " # Interazioni\n", " df['cloud_elevation'] = df['cloudcover'] * df['solar_elevation']\n", " df['visibility_elevation'] = df['visibility'] * df['solar_elevation']\n", "\n", " # Rolling features con finestre più ampie\n", " df['cloud_rolling_12h'] = df['cloudcover'].rolling(window=12).mean()\n", " df['temp_rolling_12h'] = df['temp'].rolling(window=12).mean()\n", "\n", " return df\n", "\n", "\n", "def add_advanced_features(df):\n", " # Features esistenti\n", " df = add_time_features(df)\n", " df = add_solar_features(df)\n", " df = add_solar_specific_features(df)\n", "\n", " # Aggiungi interazioni tra variabili meteorologiche\n", " df['temp_humidity'] = df['temp'] * df['humidity']\n", " df['temp_cloudcover'] = df['temp'] * df['cloudcover']\n", " df['visibility_cloudcover'] = df['visibility'] * df['cloudcover']\n", "\n", " # Features derivate per la radiazione solare\n", " df['clear_sky_factor'] = (100 - df['cloudcover']) / 100\n", " df['day_length'] = np.sin(df['day_of_year_sin']) * 12 + 12 # approssimazione della durata del giorno\n", "\n", " # Lag features\n", " df['temp_1h_lag'] = df['temp'].shift(1)\n", " df['cloudcover_1h_lag'] = df['cloudcover'].shift(1)\n", " df['humidity_1h_lag'] = df['humidity'].shift(1)\n", "\n", " # Rolling means\n", " df['temp_rolling_mean_6h'] = df['temp'].rolling(window=6).mean()\n", " df['cloudcover_rolling_mean_6h'] = df['cloudcover'].rolling(window=6).mean()\n", "\n", " return df\n", "\n", "# Preparazione dati\n", "def prepare_solar_data(weather_data, features):\n", " \"\"\"\n", " Prepara i dati per i modelli solari.\n", " \"\"\"\n", " # Aggiungi le caratteristiche temporali\n", " weather_data = add_advanced_features(weather_data)\n", " weather_data = pd.get_dummies(weather_data, columns=['season', 'time_period'], drop_first=True)\n", "\n", " # Dividi i dati\n", " data_after_2010 = weather_data[weather_data['year'] >= 2010].copy()\n", " data_after_2010 = data_after_2010.sort_values('datetime')\n", " data_after_2010.set_index('datetime', inplace=True)\n", "\n", " # Interpola valori mancanti\n", " target_variables = ['solarradiation', 'solarenergy', 'uvindex']\n", " for column in target_variables:\n", " data_after_2010[column] = data_after_2010[column].interpolate(method='time')\n", "\n", " # Rimuovi righe con valori mancanti\n", " data_after_2010.dropna(subset=features + target_variables, inplace=True)\n", "\n", " # Prepara X e y\n", " X = data_after_2010[features].values\n", " y = data_after_2010[target_variables].values\n", "\n", " # Normalizza features\n", " scaler_X = MinMaxScaler()\n", " X_scaled = scaler_X.fit_transform(X)\n", "\n", " return X_scaled, y, scaler_X, data_after_2010\n", "\n", "def prepare_model_specific_data(X_scaled, y, target_idx, timesteps):\n", " \"\"\"\n", " Prepara i dati specifici per ciascun modello.\n", " \"\"\"\n", " # Scaler specifico per il target\n", " scaler_y = MinMaxScaler()\n", " y_scaled = scaler_y.fit_transform(y[:, target_idx].reshape(-1, 1))\n", "\n", " # Split dei dati\n", " X_train, X_temp, y_train, y_temp = train_test_split(\n", " X_scaled, y_scaled, test_size=0.3, shuffle=False\n", " )\n", " X_val, X_test, y_val, y_test = train_test_split(\n", " X_temp, y_temp, test_size=0.5, shuffle=False\n", " )\n", "\n", " # Crea sequenze\n", " X_train_seq, y_train_seq = create_sequences(timesteps, X_train, y_train)\n", " X_val_seq, y_val_seq = create_sequences(timesteps, X_val, y_val)\n", " X_test_seq, y_test_seq = create_sequences(timesteps, X_test, y_test)\n", "\n", " return {\n", " 'train': (X_train_seq, y_train_seq),\n", " 'val': (X_val_seq, y_val_seq),\n", " 'test': (X_test_seq, y_test_seq)\n", " }, scaler_y\n", "\n", "def create_radiation_model(input_shape, solar_params_shape=(3,)):\n", " \"\"\"\n", " Modello per la radiazione solare con vincoli di non-negatività.\n", " \"\"\"\n", " # Input layers\n", " main_input = Input(shape=input_shape, name='main_input')\n", " solar_input = Input(shape=solar_params_shape, name='solar_params')\n", " \n", " # Branch CNN\n", " x1 = Conv1D(32, 3, padding='same')(main_input)\n", " x1 = BatchNormalization()(x1)\n", " x1 = Activation('relu')(x1)\n", " x1 = Conv1D(64, 3, padding='same')(x1)\n", " x1 = BatchNormalization()(x1)\n", " x1 = Activation('relu')(x1)\n", " x1 = GlobalAveragePooling1D()(x1)\n", " \n", " # Branch LSTM\n", " x2 = Bidirectional(LSTM(64, return_sequences=True))(main_input)\n", " x2 = Bidirectional(LSTM(32))(x2)\n", " x2 = BatchNormalization()(x2)\n", " \n", " # Solar parameters processing\n", " x3 = Dense(32)(solar_input)\n", " x3 = BatchNormalization()(x3)\n", " x3 = Activation('relu')(x3)\n", " \n", " # Combine all branches\n", " x = concatenate([x1, x2, x3])\n", " \n", " # Dense layers with non-negativity constraints\n", " x = Dense(64, kernel_constraint=tf.keras.constraints.NonNeg())(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " x = Dropout(0.2)(x)\n", " \n", " x = Dense(32, kernel_constraint=tf.keras.constraints.NonNeg())(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " \n", " # Output layer con vincoli di non-negatività\n", " output = Dense(1, \n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " activation='relu')(x)\n", " \n", " model = Model(inputs=[main_input, solar_input], outputs=output, name=\"SolarRadiation\")\n", " return model\n", "\n", "def create_energy_model(input_shape):\n", " \"\"\"\n", " Modello migliorato per l'energia solare che sfrutta la relazione con la radiazione.\n", " Include vincoli di non-negatività e migliore gestione delle dipendenze temporali.\n", " \"\"\"\n", " inputs = Input(shape=input_shape)\n", " \n", " # Branch 1: Elaborazione temporale con attention\n", " # Multi-head attention per catturare relazioni temporali\n", " x1 = MultiHeadAttention(num_heads=8, key_dim=32)(inputs, inputs)\n", " x1 = BatchNormalization()(x1)\n", " x1 = Activation('relu')(x1)\n", " \n", " # Temporal Convolution branch per catturare pattern locali\n", " x2 = Conv1D(\n", " filters=64,\n", " kernel_size=3,\n", " padding='same',\n", " kernel_constraint=tf.keras.constraints.NonNeg()\n", " )(inputs)\n", " x2 = BatchNormalization()(x2)\n", " x2 = Activation('relu')(x2)\n", " x2 = Conv1D(\n", " filters=32,\n", " kernel_size=3,\n", " padding='same',\n", " kernel_constraint=tf.keras.constraints.NonNeg()\n", " )(x2)\n", " x2 = BatchNormalization()(x2)\n", " x2 = Activation('relu')(x2)\n", " \n", " # LSTM branch per memoria a lungo termine\n", " x3 = LSTM(64, return_sequences=True)(inputs)\n", " x3 = LSTM(32, return_sequences=False)(x3)\n", " x3 = BatchNormalization()(x3)\n", " x3 = Activation('relu')(x3)\n", " \n", " # Global pooling per ogni branch\n", " x1 = GlobalAveragePooling1D()(x1)\n", " x2 = GlobalAveragePooling1D()(x2)\n", " \n", " # Concatena tutti i branch\n", " x = concatenate([x1, x2, x3])\n", " \n", " # Dense layers con vincoli di non-negatività\n", " x = Dense(\n", " 128,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " x = Dropout(0.3)(x)\n", " \n", " x = Dense(\n", " 64,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " x = Dropout(0.2)(x)\n", " \n", " # Output layer con vincolo di non-negatività\n", " output = Dense(\n", " 1,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " activation='relu', # Garantisce output non negativo\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " \n", " model = Model(inputs=inputs, outputs=output, name=\"SolarEnergy\")\n", " return model\n", "\n", "def create_uv_model(input_shape):\n", " \"\"\"\n", " Modello migliorato per l'indice UV che sfrutta sia radiazione che energia solare.\n", " Include vincoli di non-negatività e considera le relazioni non lineari tra le variabili.\n", " \"\"\"\n", " inputs = Input(shape=input_shape)\n", " \n", " # CNN branch per pattern locali\n", " x1 = Conv1D(\n", " filters=64,\n", " kernel_size=3,\n", " padding='same',\n", " kernel_constraint=tf.keras.constraints.NonNeg()\n", " )(inputs)\n", " x1 = BatchNormalization()(x1)\n", " x1 = Activation('relu')(x1)\n", " x1 = MaxPooling1D(pool_size=2)(x1)\n", " \n", " x1 = Conv1D(\n", " filters=32,\n", " kernel_size=3,\n", " padding='same',\n", " kernel_constraint=tf.keras.constraints.NonNeg()\n", " )(x1)\n", " x1 = BatchNormalization()(x1)\n", " x1 = Activation('relu')(x1)\n", " x1 = GlobalAveragePooling1D()(x1)\n", " \n", " # Attention branch per relazioni complesse\n", " # Specialmente utile per le relazioni con radiazione ed energia\n", " x2 = MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)\n", " x2 = BatchNormalization()(x2)\n", " x2 = Activation('relu')(x2)\n", " x2 = GlobalAveragePooling1D()(x2)\n", " \n", " # Dense branch per le feature più recenti\n", " x3 = GlobalAveragePooling1D()(inputs)\n", " x3 = Dense(\n", " 64,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " kernel_regularizer=l2(0.01)\n", " )(x3)\n", " x3 = BatchNormalization()(x3)\n", " x3 = Activation('relu')(x3)\n", " \n", " # Fusion dei branch\n", " x = concatenate([x1, x2, x3])\n", " \n", " # Dense layers con vincoli di non-negatività\n", " x = Dense(\n", " 128,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " x = Dropout(0.3)(x)\n", " \n", " x = Dense(\n", " 64,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " x = BatchNormalization()(x)\n", " x = Activation('relu')(x)\n", " x = Dropout(0.2)(x)\n", " \n", " # Output layer con vincolo di non-negatività\n", " output = Dense(\n", " 1,\n", " kernel_constraint=tf.keras.constraints.NonNeg(),\n", " activation='relu', # Garantisce output non negativo\n", " kernel_regularizer=l2(0.01)\n", " )(x)\n", " \n", " model = Model(inputs=inputs, outputs=output, name=\"SolarUV\")\n", " return model\n", "\n", "class CustomCallback(tf.keras.callbacks.Callback):\n", " \"\"\"\n", " Callback personalizzato per monitorare la non-negatività delle predizioni\n", " e altre metriche importanti durante il training.\n", " \"\"\"\n", " def __init__(self, validation_data=None):\n", " super().__init__()\n", " self.validation_data = validation_data\n", " \n", " def on_epoch_end(self, epoch, logs=None):\n", " try:\n", " # Controlla se abbiamo i dati di validazione\n", " if hasattr(self.model, 'validation_data'):\n", " val_x = self.model.validation_data[0]\n", " if isinstance(val_x, list): # Per il modello della radiazione\n", " val_pred = self.model.predict(val_x, verbose=0)\n", " else:\n", " val_pred = self.model.predict(val_x, verbose=0)\n", " \n", " # Verifica non-negatività\n", " if np.any(val_pred < 0):\n", " print(\"\\nWarning: Rilevati valori negativi nelle predizioni\")\n", " print(f\"Min value: {np.min(val_pred)}\")\n", " \n", " # Statistiche predizioni\n", " print(f\"\\nStatistiche predizioni epoca {epoch}:\")\n", " print(f\"Min: {np.min(val_pred):.4f}\")\n", " print(f\"Max: {np.max(val_pred):.4f}\")\n", " print(f\"Media: {np.mean(val_pred):.4f}\")\n", " \n", " # Aggiunge le metriche ai logs\n", " if logs is not None:\n", " logs['val_pred_min'] = np.min(val_pred)\n", " logs['val_pred_max'] = np.max(val_pred)\n", " logs['val_pred_mean'] = np.mean(val_pred)\n", " except Exception as e:\n", " print(f\"\\nWarning nel CustomCallback: {str(e)}\")\n", "\n", "def create_callbacks(target):\n", " \"\"\"\n", " Crea le callbacks per il training del modello.\n", " \n", " Parameters:\n", " -----------\n", " target : str\n", " Nome del target per cui creare le callbacks\n", " \n", " Returns:\n", " --------\n", " list : Lista delle callbacks configurate\n", " \"\"\"\n", " # Crea la directory per i checkpoint e i logs\n", " model_dir = f'./kaggle/working/models/{target}'\n", " checkpoint_dir = os.path.join(model_dir, 'checkpoints')\n", " log_dir = os.path.join(model_dir, 'logs')\n", " \n", " os.makedirs(checkpoint_dir, exist_ok=True)\n", " os.makedirs(log_dir, exist_ok=True)\n", " \n", " return [\n", " # Early Stopping\n", " EarlyStopping(\n", " monitor='val_loss',\n", " patience=10,\n", " restore_best_weights=True,\n", " min_delta=0.0001\n", " ),\n", " # Reduce LR on Plateau\n", " ReduceLROnPlateau(\n", " monitor='val_loss',\n", " factor=0.5,\n", " patience=5,\n", " min_lr=1e-6,\n", " verbose=1\n", " ),\n", " # Model Checkpoint\n", " ModelCheckpoint(\n", " filepath=os.path.join(checkpoint_dir, 'best_model_{epoch:02d}_{val_loss:.4f}.h5'),\n", " monitor='val_loss',\n", " save_best_only=True,\n", " save_weights_only=True,\n", " verbose=1\n", " ),\n", " # TensorBoard\n", " tf.keras.callbacks.TensorBoard(\n", " log_dir=log_dir,\n", " histogram_freq=1,\n", " write_graph=True,\n", " update_freq='epoch'\n", " ),\n", " # Custom callback\n", " CustomCallback()\n", " ]\n", "\n", "def train_solar_models(weather_data, features, timesteps=24):\n", " \"\"\"\n", " Training sequenziale dei modelli solari dove ogni modello usa \n", " le predizioni dei modelli precedenti come feature aggiuntive.\n", " \n", " Parameters:\n", " -----------\n", " weather_data : pd.DataFrame\n", " Dataset contenente i dati meteorologici\n", " features : list\n", " Lista delle feature da utilizzare\n", " timesteps : int, optional\n", " Numero di timesteps per le sequenze temporali\n", " \n", " Returns:\n", " --------\n", " tuple\n", " (models, histories, scalers) contenenti i modelli addestrati,\n", " le storie di training e gli scalers utilizzati\n", " \"\"\"\n", " print(\"Preparazione dati iniziale...\")\n", " X_scaled, y, scaler_X, data_processed = prepare_solar_data(weather_data, features)\n", " \n", " models = {}\n", " histories = {}\n", " scalers = {'X': scaler_X}\n", " feature_scalers = {} # Per tenere traccia degli scaler delle nuove features\n", " \n", " # Manteniamo un array delle feature che si espanderà con le predizioni\n", " current_features = X_scaled.copy()\n", " print(f\"Shape iniziale features: {current_features.shape}\")\n", " \n", " # Dizionario per mantenere le predizioni di ogni modello\n", " predictions_by_target = {}\n", " \n", " # Configurazione per ciascun modello in ordine specifico\n", " model_configs = {\n", " 'solarradiation': {\n", " 'creator': create_radiation_model,\n", " 'index': 0,\n", " 'needs_solar_params': True,\n", " 'previous_predictions_needed': []\n", " },\n", " 'solarenergy': {\n", " 'creator': create_energy_model,\n", " 'index': 1,\n", " 'needs_solar_params': False,\n", " 'previous_predictions_needed': ['solarradiation']\n", " },\n", " 'uvindex': {\n", " 'creator': create_uv_model,\n", " 'index': 2,\n", " 'needs_solar_params': False,\n", " 'previous_predictions_needed': ['solarradiation', 'solarenergy']\n", " }\n", " }\n", " \n", " # Training sequenziale\n", " for target, config in model_configs.items():\n", " print(f\"\\n{'='*50}\")\n", " print(f\"Training modello per: {target}\")\n", " print(f\"{'='*50}\")\n", " \n", " # 1. Aggiunta delle predizioni precedenti come features\n", " if config['previous_predictions_needed']:\n", " print(f\"\\nAggiunta predizioni precedenti da: {config['previous_predictions_needed']}\")\n", " new_features_list = []\n", " \n", " for prev_target in config['previous_predictions_needed']:\n", " if prev_target in predictions_by_target:\n", " print(f\"\\nProcessing predizioni di {prev_target}...\")\n", " prev_pred = predictions_by_target[prev_target]\n", " \n", " # Allineamento dimensioni\n", " if len(prev_pred) != len(current_features):\n", " print(\"Allineamento dimensioni necessario:\")\n", " print(f\"- Current features: {current_features.shape}\")\n", " print(f\"- Predictions: {prev_pred.shape}\")\n", " \n", " offset = len(current_features) - len(prev_pred)\n", " if offset > 0:\n", " print(f\"Aggiunta padding di {offset} elementi\")\n", " pad_width = ((offset, 0), (0, 0)) if len(prev_pred.shape) > 1 else (offset, 0)\n", " prev_pred = np.pad(prev_pred, pad_width, mode='edge')\n", " else:\n", " print(f\"Taglio di {abs(offset)} elementi\")\n", " prev_pred = prev_pred[-len(current_features):]\n", " \n", " # Scaling delle predizioni\n", " feature_scaler = MinMaxScaler()\n", " prev_pred_scaled = feature_scaler.fit_transform(prev_pred.reshape(-1, 1))\n", " feature_scalers[f\"{prev_target}_pred\"] = feature_scaler\n", " \n", " print(f\"Statistiche feature {prev_target}:\")\n", " print(f\"- Shape: {prev_pred_scaled.shape}\")\n", " print(f\"- Range: [{prev_pred_scaled.min():.4f}, {prev_pred_scaled.max():.4f}]\")\n", " \n", " new_features_list.append(prev_pred_scaled)\n", " \n", " # Aggiunta delle nuove features\n", " if new_features_list:\n", " print(\"\\nVerifica dimensioni prima della concatenazione:\")\n", " lengths = [feat.shape[0] for feat in [current_features] + new_features_list]\n", " if len(set(lengths)) > 1:\n", " print(\"WARNING: Lunghezze diverse rilevate, allineamento necessario\")\n", " min_length = min(lengths)\n", " current_features = current_features[-min_length:]\n", " new_features_list = [feat[-min_length:] for feat in new_features_list]\n", " \n", " try:\n", " current_features = np.column_stack([current_features] + new_features_list)\n", " print(f\"Nuove dimensioni features: {current_features.shape}\")\n", " except ValueError as e:\n", " print(f\"Errore nella concatenazione: {str(e)}\")\n", " print(\"\\nDimensioni:\")\n", " print(f\"- Current: {current_features.shape}\")\n", " for i, feat in enumerate(new_features_list):\n", " print(f\"- New {i}: {feat.shape}\")\n", " raise\n", " \n", " # 2. Preparazione dati per il training\n", " print(\"\\nPreparazione dati di training...\")\n", " data_dict, scaler_y = prepare_model_specific_data(\n", " current_features, y, config['index'], timesteps\n", " )\n", " scalers[target] = scaler_y\n", " \n", " # 3. Creazione e compilazione del modello\n", " print(\"\\nCreazione modello...\")\n", " input_shape = (timesteps, current_features.shape[1])\n", " print(f\"Input shape: {input_shape}\")\n", " \n", " if config['needs_solar_params']:\n", " model = config['creator'](input_shape, solar_params_shape=(3,))\n", " solar_params = data_processed[['solar_angle', 'clear_sky_index', 'solar_elevation']].values\n", " else:\n", " model = config['creator'](input_shape)\n", " \n", " model.compile(\n", " optimizer=Adam(learning_rate=0.001, clipnorm=1.0),\n", " loss='huber',\n", " metrics=['mae']\n", " )\n", " model.summary()\n", " \n", " # 4. Training\n", " print(\"\\nInizio training...\")\n", " callbacks = create_callbacks(target)\n", " \n", " try:\n", " if config['needs_solar_params']:\n", " history = model.fit(\n", " [data_dict['train'][0], solar_params[:len(data_dict['train'][0])]],\n", " data_dict['train'][1],\n", " validation_data=([\n", " data_dict['val'][0],\n", " solar_params[len(data_dict['train'][0]):len(data_dict['train'][0])+len(data_dict['val'][0])]\n", " ], data_dict['val'][1]),\n", " epochs=50,\n", " batch_size=32,\n", " callbacks=callbacks,\n", " verbose=1\n", " )\n", " \n", " # Genera predizioni complete\n", " print(\"\\nGenerazione predizioni complete...\")\n", " all_sequences = create_sequences(timesteps, current_features)\n", " predictions = model.predict(\n", " [all_sequences, solar_params[:len(all_sequences)]]\n", " )\n", " else:\n", " history = model.fit(\n", " data_dict['train'][0],\n", " data_dict['train'][1],\n", " validation_data=(data_dict['val'][0], data_dict['val'][1]),\n", " epochs=50,\n", " batch_size=32,\n", " callbacks=callbacks,\n", " verbose=1\n", " )\n", " \n", " # Genera predizioni complete\n", " print(\"\\nGenerazione predizioni complete...\")\n", " all_sequences = create_sequences(timesteps, current_features)\n", " predictions = model.predict(all_sequences)\n", " \n", " # Denormalizza e processa le predizioni\n", " predictions = scaler_y.inverse_transform(predictions)\n", " predictions = np.maximum(predictions, 0) # Assicura non-negatività\n", " predictions_by_target[target] = predictions\n", " \n", " print(f\"\\nStatistiche finali predizioni {target}:\")\n", " print(f\"- Min: {predictions.min():.4f}\")\n", " print(f\"- Max: {predictions.max():.4f}\")\n", " print(f\"- Media: {predictions.mean():.4f}\")\n", " \n", " models[target] = model\n", " histories[target] = history\n", " \n", " except Exception as e:\n", " print(f\"\\nERRORE nel training di {target}: {str(e)}\")\n", " raise\n", " \n", " # Aggiunta degli scaler delle feature al dizionario principale\n", " scalers.update(feature_scalers)\n", "\n", " model_info = {\n", " target: {\n", " 'input_shape': (timesteps, current_features.shape[1]),\n", " 'feature_order': feature_scalers.keys() if target != 'solarradiation' else None,\n", " 'needs_solar_params': config['needs_solar_params']\n", " }\n", " for target, config in model_configs.items()\n", " }\n", " \n", " # Salva il model_info insieme agli scaler\n", " scalers['model_info'] = model_info\n", " \n", " return models, histories, scalers" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "m7b_9nBJthA9" }, "outputs": [], "source": [ "def save_models_and_scalers(models, scalers, target_variables, base_path='./kaggle/working/models'):\n", " \"\"\"\n", " Salva i modelli Keras, gli scaler e gli artefatti aggiuntivi nella cartella models.\n", " \n", " Parameters:\n", " -----------\n", " models : dict\n", " Dizionario contenente i modelli Keras per ogni variabile target\n", " scalers : dict\n", " Dizionario contenente tutti gli scaler (compresi X, target e predizioni)\n", " target_variables : list\n", " Lista delle variabili target\n", " base_path : str\n", " Percorso base dove salvare i modelli\n", " \"\"\"\n", " if isinstance(base_path, list):\n", " base_path = './kaggle/working/models' # Path di default se viene passata una lista\n", " \n", " # Crea la cartella base se non esiste\n", " os.makedirs(base_path, exist_ok=True)\n", "\n", " # Salva tutti gli scaler\n", " scaler_path = os.path.join(base_path, 'scalers')\n", " os.makedirs(scaler_path, exist_ok=True)\n", " \n", " # Salva ogni scaler separatamente\n", " print(\"\\nSalvataggio scaler:\")\n", " for scaler_name, scaler in scalers.items():\n", " scaler_file = os.path.join(scaler_path, f'{scaler_name}.joblib')\n", " joblib.dump(scaler, scaler_file)\n", " print(f\"- Salvato scaler: {scaler_name}\")\n", "\n", " # Salva la configurazione dei modelli\n", " model_configs = {\n", " 'solarradiation': {'has_solar_params': True},\n", " 'solarenergy': {'has_solar_params': False},\n", " 'uvindex': {'has_solar_params': False}\n", " }\n", " config_path = os.path.join(base_path, 'model_configs.joblib')\n", " joblib.dump(model_configs, config_path)\n", "\n", " # Salva i modelli e gli artefatti per ogni variabile target\n", " print(\"\\nSalvataggio modelli e artefatti:\")\n", " for target in target_variables:\n", " print(f\"\\nProcessing {target}...\")\n", " # Crea una sottocartella per ogni target\n", " target_path = os.path.join(base_path, target)\n", " os.makedirs(target_path, exist_ok=True)\n", "\n", " try:\n", " # 1. Salva il modello completo\n", " model_path = os.path.join(target_path, 'model.keras')\n", " models[target].save(model_path, save_format='keras')\n", " print(f\"- Salvato modello completo: {model_path}\")\n", "\n", " # 2. Salva i pesi separatamente\n", " weights_path = os.path.join(target_path, 'weights')\n", " os.makedirs(weights_path, exist_ok=True)\n", " weight_file = os.path.join(weights_path, 'weights')\n", " models[target].save_weights(weight_file)\n", " print(f\"- Salvati pesi: {weight_file}\")\n", "\n", " # 3. Salva il plot del modello\n", " plot_path = os.path.join(target_path, f'{target}_architecture.png')\n", " tf.keras.utils.plot_model(\n", " models[target],\n", " to_file=plot_path,\n", " show_shapes=True,\n", " show_layer_names=True,\n", " rankdir='TB',\n", " expand_nested=True,\n", " dpi=150\n", " )\n", " print(f\"- Salvato plot architettura: {plot_path}\")\n", "\n", " # 4. Salva il summary del modello in un file di testo\n", " summary_path = os.path.join(target_path, f'{target}_summary.txt')\n", " with open(summary_path, 'w') as f:\n", " models[target].summary(print_fn=lambda x: f.write(x + '\\n'))\n", " print(f\"- Salvato summary modello: {summary_path}\")\n", "\n", " except Exception as e:\n", " print(f\"Errore nel salvataggio degli artefatti per {target}: {str(e)}\")\n", "\n", " # Salva la lista delle variabili target\n", " target_vars_path = os.path.join(base_path, 'target_variables.joblib')\n", " joblib.dump(target_variables, target_vars_path)\n", "\n", " # Salva un file README con la struttura e le informazioni\n", " readme_path = os.path.join(base_path, 'README.txt')\n", " with open(readme_path, 'w') as f:\n", " f.write(\"Model Artifacts Directory Structure\\n\")\n", " f.write(\"=================================\\n\\n\")\n", " f.write(\"Directory structure:\\n\")\n", " f.write(\"- scalers/: Contains all scalers used in the models\\n\")\n", " f.write(\"- model_configs.joblib: Configuration for each model\\n\")\n", " f.write(\"- target_variables.joblib: List of target variables\\n\")\n", " f.write(\"\\nFor each target variable:\\n\")\n", " f.write(\"- model.keras: Complete model\\n\")\n", " f.write(\"- weights/: Model weights\\n\")\n", " f.write(\"- *_architecture.png: Visual representation of model architecture\\n\")\n", " f.write(\"- *_summary.txt: Detailed model summary\\n\\n\")\n", " f.write(\"Saved scalers:\\n\")\n", " for scaler_name in scalers.keys():\n", " f.write(f\"- {scaler_name}\\n\")\n", "\n", " print(f\"\\nTutti gli artefatti salvati in: {base_path}\")\n", " print(f\"Consulta {readme_path} per i dettagli sulla struttura\")\n", "\n", " return base_path\n", "\n", "def load_models_and_scalers(base_path='./kaggle/working/models'):\n", " \"\"\"\n", " Carica i modelli Keras e tutti gli scaler dalla cartella models.\n", " \n", " Parameters:\n", " -----------\n", " base_path : str\n", " Percorso della cartella contenente i modelli salvati\n", " \n", " Returns:\n", " --------\n", " tuple\n", " (models, scalers, target_variables)\n", " \"\"\"\n", " try:\n", " # Carica la lista delle variabili target\n", " target_vars_path = os.path.join(base_path, 'target_variables.joblib')\n", " target_variables = joblib.load(target_vars_path)\n", "\n", " # Carica tutti gli scaler\n", " scaler_path = os.path.join(base_path, 'scalers')\n", " scalers = {}\n", " for scaler_file in os.listdir(scaler_path):\n", " if scaler_file.endswith('.joblib'):\n", " scaler_name = scaler_file[:-7] # rimuove '.joblib'\n", " scaler_file_path = os.path.join(scaler_path, scaler_file)\n", " scalers[scaler_name] = joblib.load(scaler_file_path)\n", "\n", " # Carica la configurazione dei modelli\n", " config_path = os.path.join(base_path, 'model_configs.joblib')\n", " model_configs = joblib.load(config_path)\n", "\n", " # Inizializza il dizionario dei modelli\n", " models = {}\n", "\n", " # Carica i custom layer se necessario\n", " custom_objects = {\n", " 'DataAugmentation': DataAugmentation,\n", " 'PositionalEncoding': PositionalEncoding\n", " }\n", "\n", " # Carica i modelli per ogni variabile target\n", " for target in target_variables:\n", " target_path = os.path.join(base_path, target)\n", " \n", " # Carica il model summary per ottenere le dimensioni corrette\n", " summary_path = os.path.join(target_path, f'{target}_summary.txt')\n", " input_shape = None\n", " if os.path.exists(summary_path):\n", " with open(summary_path, 'r') as f:\n", " for line in f:\n", " if 'Input Shape' in line:\n", " # Estrai la shape dal summary\n", " shape_str = line.split(':')[-1].strip()\n", " shape_tuple = eval(shape_str)\n", " input_shape = shape_tuple\n", " break\n", " \n", " if input_shape is None:\n", " # Fallback alle dimensioni di base\n", " base_features = len(scalers['X'].get_params()['feature_names_in_'])\n", " # Aggiungi feature per le predizioni precedenti\n", " additional_features = 0\n", " if target == 'solarenergy':\n", " additional_features = 1 # solarradiation\n", " elif target == 'uvindex':\n", " additional_features = 2 # solarradiation + solarenergy\n", " input_shape = (24, base_features + additional_features)\n", " \n", " # Carica il modello\n", " model_path = os.path.join(target_path, 'model.keras')\n", " try:\n", " # Prima prova a caricare il modello completo\n", " models[target] = tf.keras.models.load_model(\n", " model_path,\n", " custom_objects=custom_objects\n", " )\n", " print(f\"Caricato modello {target} da file\")\n", " except Exception as e:\n", " print(f\"Errore nel caricamento del modello {target}: {str(e)}\")\n", " print(\"Tentativo di ricostruzione del modello...\")\n", " \n", " # Se fallisce, ricostruisci il modello e carica i pesi\n", " if target == 'solarradiation':\n", " models[target] = create_radiation_model(input_shape)\n", " elif target == 'solarenergy':\n", " models[target] = create_energy_model(input_shape)\n", " else: # uvindex\n", " models[target] = create_uv_model(input_shape)\n", " \n", " # Carica i pesi\n", " weights_path = os.path.join(target_path, 'weights', 'weights')\n", " models[target].load_weights(weights_path)\n", " print(f\"Modello {target} ricostruito e pesi caricati\")\n", "\n", " print(f\"Modelli e scaler caricati da: {base_path}\")\n", " print(\"Scaler caricati:\")\n", " for scaler_name in scalers.keys():\n", " print(f\"- {scaler_name}\")\n", " \n", " return models, scalers, target_variables\n", "\n", " except Exception as e:\n", " print(f\"Errore nel caricamento dei modelli: {str(e)}\")\n", " raise" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def predict_solar_variables(data_before_2010, features, models, scalers, target_variables, timesteps=24):\n", " \"\"\"\n", " Effettua predizioni sequenziali per le variabili solari usando le informazioni\n", " salvate durante il training.\n", " \n", " Parameters:\n", " -----------\n", " data_before_2010 : pd.DataFrame\n", " Dati storici da predire\n", " features : list\n", " Lista delle feature da utilizzare\n", " models : dict\n", " Dizionario dei modelli per ogni target\n", " scalers : dict\n", " Dizionario contenente tutti gli scaler e le informazioni sui modelli\n", " target_variables : list\n", " Lista delle variabili target\n", " timesteps : int\n", " Numero di timestep per le sequenze\n", " \n", " Returns:\n", " --------\n", " pd.DataFrame\n", " DataFrame con le predizioni aggiunte\n", " \"\"\"\n", " import traceback\n", " \n", " # Crea copia dei dati\n", " data = data_before_2010.copy()\n", " \n", " # Prepara i dati di base\n", " X_before = data[features].values\n", " current_features = scalers['X'].transform(X_before)\n", " print(f\"Shape features iniziali: {current_features.shape}\")\n", " \n", " # Recupera le informazioni sui modelli\n", " model_info = scalers['model_info']\n", " \n", " # Dizionario per tenere traccia delle predizioni\n", " predictions_by_target = {}\n", " \n", " # Prepara i parametri solari\n", " solar_params = None\n", " if all(col in data.columns for col in ['solar_angle', 'clear_sky_index', 'solar_elevation']):\n", " solar_params = data[['solar_angle', 'clear_sky_index', 'solar_elevation']].values\n", " \n", " for target in target_variables:\n", " print(f\"\\n{'='*50}\")\n", " print(f\"Previsione di {target}\")\n", " print(f\"{'='*50}\")\n", " \n", " try:\n", " # Recupera info specifiche del modello\n", " target_info = model_info[target]\n", " expected_shape = target_info['input_shape']\n", " feature_order = target_info['feature_order']\n", " needs_solar_params = target_info['needs_solar_params']\n", " \n", " # Reset delle feature per ogni target\n", " X_current = current_features.copy()\n", " \n", " # Aggiungi le predizioni precedenti come features se necessario\n", " if feature_order is not None:\n", " print(\"Aggiunta predizioni precedenti come features\")\n", " new_features_list = []\n", " \n", " for feature_name in feature_order:\n", " if feature_name in scalers:\n", " base_target = feature_name.replace('_pred', '')\n", " if base_target in predictions_by_target:\n", " print(f\"Aggiunta predizione di {base_target}\")\n", " prev_pred = predictions_by_target[base_target]\n", " \n", " # Gestione NaN\n", " if np.isnan(prev_pred).any():\n", " print(f\"ATTENZIONE: Trovati NaN nelle predizioni di {base_target}\")\n", " prev_pred = np.nan_to_num(prev_pred, 0)\n", " \n", " # Scala le predizioni\n", " prev_pred_scaled = scalers[feature_name].transform(\n", " prev_pred.reshape(-1, 1)\n", " )\n", " \n", " # Allinea dimensioni\n", " if len(prev_pred_scaled) != len(X_current):\n", " if len(prev_pred_scaled) < len(X_current):\n", " pad_width = ((len(X_current) - len(prev_pred_scaled), 0), (0, 0))\n", " prev_pred_scaled = np.pad(prev_pred_scaled, pad_width, mode='edge')\n", " else:\n", " prev_pred_scaled = prev_pred_scaled[:len(X_current)]\n", " \n", " new_features_list.append(prev_pred_scaled)\n", " print(f\"Shape dopo aggiunta {base_target}: {prev_pred_scaled.shape}\")\n", " \n", " if new_features_list:\n", " X_current = np.column_stack([X_current] + new_features_list)\n", " print(f\"Shape finale features: {X_current.shape}\")\n", " \n", " # Verifica dimensioni\n", " if X_current.shape[1] != expected_shape[1]:\n", " raise ValueError(\n", " f\"Mismatch nelle dimensioni delle feature per {target}: \"\n", " f\"atteso {expected_shape[1]}, ottenuto {X_current.shape[1]}\"\n", " )\n", " \n", " # Crea le sequenze\n", " X_seq = create_sequences(timesteps, X_current)\n", " print(f\"Shape sequenze: {X_seq.shape}\")\n", " \n", " # Verifica NaN\n", " if np.isnan(X_seq).any():\n", " print(\"ATTENZIONE: Trovati NaN nelle sequenze di input\")\n", " X_seq = np.nan_to_num(X_seq, 0)\n", " \n", " # Effettua le predizioni\n", " if needs_solar_params and solar_params is not None:\n", " print(\"Utilizzo modello con parametri solari\")\n", " solar_params_seq = solar_params[timesteps:]\n", " if len(solar_params_seq) > len(X_seq):\n", " solar_params_seq = solar_params_seq[:len(X_seq)]\n", " \n", " y_pred_scaled = models[target].predict(\n", " [X_seq, solar_params_seq],\n", " batch_size=32,\n", " verbose=1\n", " )\n", " else:\n", " print(\"Utilizzo modello standard\")\n", " y_pred_scaled = models[target].predict(\n", " X_seq,\n", " batch_size=32,\n", " verbose=1\n", " )\n", " \n", " # Verifica e processa le predizioni\n", " if np.isnan(y_pred_scaled).any():\n", " print(\"ATTENZIONE: Trovati NaN nelle predizioni\")\n", " y_pred_scaled = np.nan_to_num(y_pred_scaled, 0)\n", " \n", " # Denormalizza\n", " y_pred = scalers[target].inverse_transform(y_pred_scaled)\n", " y_pred = np.maximum(y_pred, 0)\n", " \n", " # Salva le predizioni\n", " predictions_by_target[target] = y_pred\n", " \n", " # Aggiorna il DataFrame\n", " dates = data.index[timesteps:]\n", " if len(dates) > len(y_pred):\n", " dates = dates[:len(y_pred)]\n", " data.loc[dates, target] = y_pred\n", " \n", " print(f\"\\nStatistiche predizioni per {target}:\")\n", " print(f\"Media: {np.mean(y_pred):.2f}\")\n", " print(f\"Min: {np.min(y_pred):.2f}\")\n", " print(f\"Max: {np.max(y_pred):.2f}\")\n", " \n", " except Exception as e:\n", " print(f\"Errore nella predizione di {target}: {str(e)}\")\n", " print(\"Traceback completo:\", traceback.format_exc())\n", " # Inizializza con zeri in caso di errore\n", " y_pred = np.zeros(len(data) - timesteps)\n", " predictions_by_target[target] = y_pred\n", " dates = data.index[timesteps:]\n", " data.loc[dates, target] = y_pred\n", " continue\n", " \n", " # Gestisci valori mancanti\n", " print(\"\\nGestione valori mancanti...\")\n", " data[target_variables] = data[target_variables].fillna(0)\n", " missing_counts = data[target_variables].isnull().sum()\n", " if missing_counts.any():\n", " print(\"Valori mancanti rimanenti:\")\n", " print(missing_counts)\n", " \n", " return data\n", "\n", "def create_complete_dataset(data_before_2010, data_after_2010, predictions):\n", " \"\"\"\n", " Combina i dati predetti con i dati esistenti.\n", " \n", " Parameters:\n", " -----------\n", " data_before_2010 : pd.DataFrame\n", " Dati storici originali\n", " data_after_2010 : pd.DataFrame\n", " Dati più recenti\n", " predictions : pd.DataFrame\n", " Dati con predizioni\n", " \n", " Returns:\n", " --------\n", " pd.DataFrame\n", " Dataset completo combinato\n", " \"\"\"\n", " # Combina i dataset\n", " weather_data_complete = pd.concat([predictions, data_after_2010], axis=0)\n", " weather_data_complete = weather_data_complete.sort_index()\n", " \n", " # Verifica la continuità temporale\n", " time_gaps = weather_data_complete.index.to_series().diff().dropna()\n", " if time_gaps.max().total_seconds() > 3600: # gap maggiore di 1 ora\n", " print(\"Attenzione: Trovati gap temporali nei dati\")\n", " print(\"Gap massimo:\", time_gaps.max())\n", " \n", " return weather_data_complete" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def add_olive_water_consumption_correlation(dataset):\n", " # Dati simulati per il fabbisogno d'acqua e la correlazione con la temperatura\n", " fabbisogno_acqua = {\n", " \"Nocellara dell'Etna\": {\"Primavera\": 1200, \"Estate\": 2000, \"Autunno\": 1000, \"Inverno\": 500, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n", " \"Leccino\": {\"Primavera\": 1000, \"Estate\": 1800, \"Autunno\": 800, \"Inverno\": 400, \"Temperatura Ottimale\": 20, \"Resistenza\": \"Alta\"},\n", " \"Frantoio\": {\"Primavera\": 1100, \"Estate\": 1900, \"Autunno\": 900, \"Inverno\": 450, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"},\n", " \"Coratina\": {\"Primavera\": 1300, \"Estate\": 2200, \"Autunno\": 1100, \"Inverno\": 550, \"Temperatura Ottimale\": 17, \"Resistenza\": \"Media\"},\n", " \"Moraiolo\": {\"Primavera\": 1150, \"Estate\": 2100, \"Autunno\": 900, \"Inverno\": 480, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n", " \"Pendolino\": {\"Primavera\": 1050, \"Estate\": 1850, \"Autunno\": 850, \"Inverno\": 430, \"Temperatura Ottimale\": 20, \"Resistenza\": \"Alta\"},\n", " \"Taggiasca\": {\"Primavera\": 1000, \"Estate\": 1750, \"Autunno\": 800, \"Inverno\": 400, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"},\n", " \"Canino\": {\"Primavera\": 1100, \"Estate\": 1900, \"Autunno\": 900, \"Inverno\": 450, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n", " \"Itrana\": {\"Primavera\": 1200, \"Estate\": 2000, \"Autunno\": 1000, \"Inverno\": 500, \"Temperatura Ottimale\": 17, \"Resistenza\": \"Media\"},\n", " \"Ogliarola\": {\"Primavera\": 1150, \"Estate\": 1950, \"Autunno\": 900, \"Inverno\": 480, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n", " \"Biancolilla\": {\"Primavera\": 1050, \"Estate\": 1800, \"Autunno\": 850, \"Inverno\": 430, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"}\n", " }\n", "\n", " # Calcola il fabbisogno idrico annuale per ogni varietà\n", " for varieta in fabbisogno_acqua:\n", " fabbisogno_acqua[varieta][\"Annuale\"] = sum([fabbisogno_acqua[varieta][stagione] for stagione in [\"Primavera\", \"Estate\", \"Autunno\", \"Inverno\"]])\n", "\n", " # Aggiungiamo le nuove colonne al dataset\n", " dataset[\"Fabbisogno Acqua Primavera (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Primavera\"])\n", " dataset[\"Fabbisogno Acqua Estate (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Estate\"])\n", " dataset[\"Fabbisogno Acqua Autunno (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Autunno\"])\n", " dataset[\"Fabbisogno Acqua Inverno (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Inverno\"])\n", " dataset[\"Fabbisogno Idrico Annuale (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Annuale\"])\n", " dataset[\"Temperatura Ottimale\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Temperatura Ottimale\"])\n", " dataset[\"Resistenza alla Siccità\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Resistenza\"])\n", "\n", " return dataset" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "zOeyz5JHthA_" }, "outputs": [], "source": [ "def preprocess_weather_data(weather_df):\n", " # Calcola statistiche mensili per ogni anno\n", " monthly_weather = weather_df.groupby(['year', 'month']).agg({\n", " 'temp': ['mean', 'min', 'max'],\n", " 'humidity': 'mean',\n", " 'precip': 'sum',\n", " 'windspeed': 'mean',\n", " 'cloudcover': 'mean',\n", " 'solarradiation': 'sum',\n", " 'solarenergy': 'sum',\n", " 'uvindex': 'max'\n", " }).reset_index()\n", "\n", " monthly_weather.columns = ['year', 'month'] + [f'{col[0]}_{col[1]}' for col in monthly_weather.columns[2:]]\n", " return monthly_weather\n", "\n", "\n", "def get_growth_phase(month):\n", " if month in [12, 1, 2]:\n", " return 'dormancy'\n", " elif month in [3, 4, 5]:\n", " return 'flowering'\n", " elif month in [6, 7, 8]:\n", " return 'fruit_set'\n", " else:\n", " return 'ripening'\n", "\n", "\n", "def calculate_weather_effect(row, optimal_temp):\n", " # Effetti base\n", " temp_effect = -0.1 * (row['temp_mean'] - optimal_temp) ** 2\n", " rain_effect = -0.05 * (row['precip_sum'] - 600) ** 2 / 10000\n", " sun_effect = 0.1 * row['solarenergy_sum'] / 1000\n", "\n", " # Fattori di scala basati sulla fase di crescita\n", " if row['growth_phase'] == 'dormancy':\n", " temp_scale = 0.5\n", " rain_scale = 0.2\n", " sun_scale = 0.1\n", " elif row['growth_phase'] == 'flowering':\n", " temp_scale = 2.0\n", " rain_scale = 1.5\n", " sun_scale = 1.0\n", " elif row['growth_phase'] == 'fruit_set':\n", " temp_scale = 1.5\n", " rain_scale = 1.0\n", " sun_scale = 0.8\n", " else: # ripening\n", " temp_scale = 1.0\n", " rain_scale = 0.5\n", " sun_scale = 1.2\n", "\n", " # Calcolo dell'effetto combinato\n", " combined_effect = (\n", " temp_scale * temp_effect +\n", " rain_scale * rain_effect +\n", " sun_scale * sun_effect\n", " )\n", "\n", " # Aggiustamenti specifici per fase\n", " if row['growth_phase'] == 'flowering':\n", " combined_effect -= 0.5 * max(0, row['precip_sum'] - 50) # Penalità per pioggia eccessiva durante la fioritura\n", " elif row['growth_phase'] == 'fruit_set':\n", " combined_effect += 0.3 * max(0, row['temp_mean'] - (optimal_temp + 5)) # Bonus per temperature più alte durante la formazione dei frutti\n", "\n", " return combined_effect\n", "\n", "\n", "def calculate_water_need(weather_data, base_need, optimal_temp):\n", " # Calcola il fabbisogno idrico basato su temperatura e precipitazioni\n", " temp_factor = 1 + 0.05 * (weather_data['temp_mean'] - optimal_temp) # Aumenta del 5% per ogni grado sopra l'ottimale\n", " rain_factor = 1 - 0.001 * weather_data['precip_sum'] # Diminuisce leggermente con l'aumentare delle precipitazioni\n", " return base_need * temp_factor * rain_factor\n", "\n", "\n", "def clean_column_name(name):\n", " # Rimuove caratteri speciali e spazi, converte in snake_case e abbrevia\n", " name = re.sub(r'[^a-zA-Z0-9\\s]', '', name) # Rimuove caratteri speciali\n", " name = name.lower().replace(' ', '_') # Converte in snake_case\n", "\n", " # Abbreviazioni comuni\n", " abbreviations = {\n", " 'production': 'prod',\n", " 'percentage': 'pct',\n", " 'hectare': 'ha',\n", " 'tonnes': 't',\n", " 'litres': 'l',\n", " 'minimum': 'min',\n", " 'maximum': 'max',\n", " 'average': 'avg'\n", " }\n", "\n", " for full, abbr in abbreviations.items():\n", " name = name.replace(full, abbr)\n", "\n", " return name\n", "\n", "\n", "def create_technique_mapping(olive_varieties, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n", " # Estrai tutte le tecniche uniche dal dataset e convertile in lowercase\n", " all_techniques = olive_varieties['Tecnica di Coltivazione'].str.lower().unique()\n", "\n", " # Crea il mapping partendo da 1\n", " technique_mapping = {tech: i + 1 for i, tech in enumerate(sorted(all_techniques))}\n", "\n", " # Salva il mapping\n", " os.makedirs(os.path.dirname(mapping_path), exist_ok=True)\n", " joblib.dump(technique_mapping, mapping_path)\n", "\n", " return technique_mapping\n", "\n", "\n", "def encode_techniques(df, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n", " if not os.path.exists(mapping_path):\n", " raise FileNotFoundError(f\"Mapping not found at {mapping_path}. Run create_technique_mapping first.\")\n", "\n", " technique_mapping = joblib.load(mapping_path)\n", "\n", " # Trova tutte le colonne delle tecniche\n", " tech_columns = [col for col in df.columns if col.endswith('_tech')]\n", "\n", " # Applica il mapping a tutte le colonne delle tecniche\n", " for col in tech_columns:\n", " df[col] = df[col].str.lower().map(technique_mapping).fillna(0).astype(int)\n", "\n", " return df\n", "\n", "\n", "def decode_techniques(df, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n", " if not os.path.exists(mapping_path):\n", " raise FileNotFoundError(f\"Mapping not found at {mapping_path}\")\n", "\n", " technique_mapping = joblib.load(mapping_path)\n", " reverse_mapping = {v: k for k, v in technique_mapping.items()}\n", " reverse_mapping[0] = '' # Aggiungi un mapping per 0 a stringa vuota\n", "\n", " # Trova tutte le colonne delle tecniche\n", " tech_columns = [col for col in df.columns if col.endswith('_tech')]\n", "\n", " # Applica il reverse mapping a tutte le colonne delle tecniche\n", " for col in tech_columns:\n", " df[col] = df[col].map(reverse_mapping)\n", "\n", " return df\n", "\n", "\n", "def decode_single_technique(technique_value, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n", " if not os.path.exists(mapping_path):\n", " raise FileNotFoundError(f\"Mapping not found at {mapping_path}\")\n", "\n", " technique_mapping = joblib.load(mapping_path)\n", " reverse_mapping = {v: k for k, v in technique_mapping.items()}\n", " reverse_mapping[0] = ''\n", "\n", " return reverse_mapping.get(technique_value, '')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def get_optimal_workers():\n", " \"\"\"\n", " Calcola il numero ottimale di workers basandosi sulle risorse del sistema.\n", " \n", " Returns:\n", " int: Numero ottimale di workers\n", " \"\"\"\n", " # Ottiene il numero di CPU logiche (inclusi i thread virtuali)\n", " cpu_count = multiprocessing.cpu_count()\n", "\n", " # Ottiene la memoria totale e disponibile in GB\n", " memory = psutil.virtual_memory()\n", " total_memory_gb = memory.total / (1024 ** 3)\n", " available_memory_gb = memory.available / (1024 ** 3)\n", "\n", " # Stima della memoria necessaria per worker (esempio: 2GB per worker)\n", " memory_per_worker_gb = 2\n", "\n", " # Calcola il numero massimo di workers basato sulla memoria disponibile\n", " max_workers_by_memory = int(available_memory_gb / memory_per_worker_gb)\n", "\n", " # Usa il minimo tra:\n", " # - numero di CPU disponibili - 1 (lascia una CPU libera per il sistema)\n", " # - numero massimo di workers basato sulla memoria\n", " # - un limite massimo arbitrario (es. 16) per evitare troppo overhead\n", " optimal_workers = min(\n", " cpu_count - 1,\n", " max_workers_by_memory,\n", " 32 # limite massimo arbitrario\n", " )\n", "\n", " # Assicura almeno 1 worker\n", " return max(1, optimal_workers)\n", "\n", "\n", "def simulate_zone(base_weather, olive_varieties, year, zone, all_varieties, variety_techniques):\n", " \"\"\"\n", " Simula la produzione di olive per una singola zona.\n", " \n", " Args:\n", " base_weather: DataFrame con dati meteo di base per l'anno selezionato\n", " olive_varieties: DataFrame con le informazioni sulle varietà di olive\n", " zone: ID della zona\n", " all_varieties: Array con tutte le varietà disponibili\n", " variety_techniques: Dict con le tecniche disponibili per ogni varietà\n", " \n", " Returns:\n", " Dict con i risultati della simulazione per la zona\n", " \"\"\"\n", " # Crea una copia dei dati meteo per questa zona specifica\n", " zone_weather = base_weather.copy()\n", "\n", " # Genera variazioni meteorologiche specifiche per questa zona\n", " zone_weather['temp_mean'] *= np.random.uniform(0.95, 1.05, len(zone_weather))\n", " zone_weather['precip_sum'] *= np.random.uniform(0.9, 1.1, len(zone_weather))\n", " zone_weather['solarenergy_sum'] *= np.random.uniform(0.95, 1.05, len(zone_weather))\n", "\n", " # Genera caratteristiche specifiche della zona\n", " num_varieties = np.random.randint(1, 4) # 1-3 varietà per zona\n", " selected_varieties = np.random.choice(all_varieties, size=num_varieties, replace=False)\n", " hectares = np.random.uniform(1, 10) # Dimensione del terreno\n", " percentages = np.random.dirichlet(np.ones(num_varieties)) # Distribuzione delle varietà\n", "\n", " # Inizializzazione contatori annuali\n", " annual_production = 0\n", " annual_min_oil = 0\n", " annual_max_oil = 0\n", " annual_avg_oil = 0\n", " annual_water_need = 0\n", "\n", " # Inizializzazione dizionario dati varietà\n", " variety_data = {clean_column_name(variety): {\n", " 'tech': '',\n", " 'pct': 0,\n", " 'prod_t_ha': 0,\n", " 'oil_prod_t_ha': 0,\n", " 'oil_prod_l_ha': 0,\n", " 'min_yield_pct': 0,\n", " 'max_yield_pct': 0,\n", " 'min_oil_prod_l_ha': 0,\n", " 'max_oil_prod_l_ha': 0,\n", " 'avg_oil_prod_l_ha': 0,\n", " 'l_per_t': 0,\n", " 'min_l_per_t': 0,\n", " 'max_l_per_t': 0,\n", " 'avg_l_per_t': 0,\n", " 'olive_prod': 0,\n", " 'min_oil_prod': 0,\n", " 'max_oil_prod': 0,\n", " 'avg_oil_prod': 0,\n", " 'water_need': 0\n", " } for variety in all_varieties}\n", "\n", " # Simula produzione per ogni varietà selezionata\n", " for i, variety in enumerate(selected_varieties):\n", " # Seleziona tecnica di coltivazione casuale per questa varietà\n", " technique = np.random.choice(variety_techniques[variety])\n", " percentage = percentages[i]\n", "\n", " # Ottieni informazioni specifiche della varietà\n", " variety_info = olive_varieties[\n", " (olive_varieties['Varietà di Olive'] == variety) &\n", " (olive_varieties['Tecnica di Coltivazione'] == technique)\n", " ].iloc[0]\n", "\n", " # Calcola produzione base con variabilità\n", " base_production = variety_info['Produzione (tonnellate/ettaro)'] * 1000 * percentage * hectares / 12\n", " base_production *= np.random.uniform(0.9, 1.1)\n", "\n", " # Calcola effetti meteo sulla produzione\n", " weather_effect = zone_weather.apply(\n", " lambda row: calculate_weather_effect(row, variety_info['Temperatura Ottimale']),\n", " axis=1\n", " )\n", " monthly_production = base_production * (1 + weather_effect / 10000)\n", " monthly_production *= np.random.uniform(0.95, 1.05, len(zone_weather))\n", "\n", " # Calcola produzione annuale per questa varietà\n", " annual_variety_production = monthly_production.sum()\n", "\n", " # Calcola rese di olio con variabilità\n", " min_yield_factor = np.random.uniform(0.95, 1.05)\n", " max_yield_factor = np.random.uniform(0.95, 1.05)\n", " avg_yield_factor = (min_yield_factor + max_yield_factor) / 2\n", "\n", " min_oil_production = annual_variety_production * variety_info['Min Litri per Tonnellata'] / 1000 * min_yield_factor\n", " max_oil_production = annual_variety_production * variety_info['Max Litri per Tonnellata'] / 1000 * max_yield_factor\n", " avg_oil_production = annual_variety_production * variety_info['Media Litri per Tonnellata'] / 1000 * avg_yield_factor\n", "\n", " # Calcola fabbisogno idrico\n", " base_water_need = (\n", " variety_info['Fabbisogno Acqua Primavera (m³/ettaro)'] +\n", " variety_info['Fabbisogno Acqua Estate (m³/ettaro)'] +\n", " variety_info['Fabbisogno Acqua Autunno (m³/ettaro)'] +\n", " variety_info['Fabbisogno Acqua Inverno (m³/ettaro)']\n", " ) / 4\n", "\n", " monthly_water_need = zone_weather.apply(\n", " lambda row: calculate_water_need(row, base_water_need, variety_info['Temperatura Ottimale']),\n", " axis=1\n", " )\n", " monthly_water_need *= np.random.uniform(0.95, 1.05, len(monthly_water_need))\n", " annual_variety_water_need = monthly_water_need.sum() * percentage * hectares\n", "\n", " # Aggiorna totali annuali\n", " annual_production += annual_variety_production\n", " annual_min_oil += min_oil_production\n", " annual_max_oil += max_oil_production\n", " annual_avg_oil += avg_oil_production\n", " annual_water_need += annual_variety_water_need\n", "\n", " # Aggiorna dati varietà\n", " clean_variety = clean_column_name(variety)\n", " variety_data[clean_variety].update({\n", " 'tech': clean_column_name(technique),\n", " 'pct': percentage,\n", " 'prod_t_ha': variety_info['Produzione (tonnellate/ettaro)'] * np.random.uniform(0.95, 1.05),\n", " 'oil_prod_t_ha': variety_info['Produzione Olio (tonnellate/ettaro)'] * np.random.uniform(0.95, 1.05),\n", " 'oil_prod_l_ha': variety_info['Produzione Olio (litri/ettaro)'] * np.random.uniform(0.95, 1.05),\n", " 'min_yield_pct': variety_info['Min % Resa'] * min_yield_factor,\n", " 'max_yield_pct': variety_info['Max % Resa'] * max_yield_factor,\n", " 'min_oil_prod_l_ha': variety_info['Min Produzione Olio (litri/ettaro)'] * min_yield_factor,\n", " 'max_oil_prod_l_ha': variety_info['Max Produzione Olio (litri/ettaro)'] * max_yield_factor,\n", " 'avg_oil_prod_l_ha': variety_info['Media Produzione Olio (litri/ettaro)'] * avg_yield_factor,\n", " 'l_per_t': variety_info['Litri per Tonnellata'] * np.random.uniform(0.98, 1.02),\n", " 'min_l_per_t': variety_info['Min Litri per Tonnellata'] * min_yield_factor,\n", " 'max_l_per_t': variety_info['Max Litri per Tonnellata'] * max_yield_factor,\n", " 'avg_l_per_t': variety_info['Media Litri per Tonnellata'] * avg_yield_factor,\n", " 'olive_prod': annual_variety_production,\n", " 'min_oil_prod': min_oil_production,\n", " 'max_oil_prod': max_oil_production,\n", " 'avg_oil_prod': avg_oil_production,\n", " 'water_need': annual_variety_water_need\n", " })\n", "\n", " # Appiattisci i dati delle varietà\n", " flattened_variety_data = {\n", " f'{variety}_{key}': value\n", " for variety, data in variety_data.items()\n", " for key, value in data.items()\n", " }\n", "\n", " # Restituisci il risultato della zona\n", " return {\n", " 'year': year,\n", " 'zone_id': zone + 1,\n", " 'temp_mean': zone_weather['temp_mean'].mean(),\n", " 'precip_sum': zone_weather['precip_sum'].sum(),\n", " 'solar_energy_sum': zone_weather['solarenergy_sum'].sum(),\n", " 'ha': hectares,\n", " 'zone': f\"zone_{zone + 1}\",\n", " 'olive_prod': annual_production,\n", " 'min_oil_prod': annual_min_oil,\n", " 'max_oil_prod': annual_max_oil,\n", " 'avg_oil_prod': annual_avg_oil,\n", " 'total_water_need': annual_water_need,\n", " **flattened_variety_data\n", " }\n", "\n", "\n", "def simulate_olive_production_parallel(weather_data, olive_varieties, num_simulations=5, \n", " random_seed=None, max_workers=None, batch_size=500,\n", " output_path=\"./kaggle/working/data/simulated_data.parquet\"):\n", " \"\"\"\n", " Versione ottimizzata della simulazione che salva i risultati in un unico file parquet partizionato\n", " \n", " Parameters:\n", " -----------\n", " weather_data : DataFrame\n", " Dati meteorologici di input\n", " olive_varieties : DataFrame\n", " Dati sulle varietà di olive\n", " num_simulations : int\n", " Numero totale di simulazioni da eseguire\n", " random_seed : int, optional\n", " Seed per la riproducibilità\n", " max_workers : int, optional\n", " Numero massimo di workers per la parallelizzazione\n", " batch_size : int\n", " Dimensione di ogni batch di simulazioni\n", " output_path : str\n", " Percorso del file parquet di output (includerà le partizioni)\n", " \"\"\"\n", " import os\n", " from math import ceil\n", " \n", " if random_seed is not None:\n", " np.random.seed(random_seed)\n", " \n", " # Preparazione dati\n", " create_technique_mapping(olive_varieties)\n", " monthly_weather = preprocess_weather_data(weather_data)\n", " all_varieties = olive_varieties['Varietà di Olive'].unique()\n", " variety_techniques = {\n", " variety: olive_varieties[olive_varieties['Varietà di Olive'] == variety]['Tecnica di Coltivazione'].unique()\n", " for variety in all_varieties\n", " }\n", " \n", " # Calcolo workers ottimali se non specificati\n", " if max_workers is None:\n", " max_workers = get_optimal_workers() or 1\n", " print(f\"Utilizzando {max_workers} workers basati sulle risorse del sistema\")\n", " \n", " # Calcolo del numero di batch necessari\n", " num_batches = ceil(num_simulations / batch_size)\n", " print(f\"Elaborazione di {num_simulations} simulazioni in {num_batches} batch\")\n", " \n", " # Crea directory parent se non esiste\n", " os.makedirs(os.path.dirname(output_path), exist_ok=True)\n", " \n", " for batch_num in range(num_batches):\n", " start_sim = batch_num * batch_size\n", " end_sim = min((batch_num + 1) * batch_size, num_simulations)\n", " current_batch_size = end_sim - start_sim\n", " \n", " batch_results = []\n", " \n", " # Parallelizzazione usando ProcessPoolExecutor\n", " with ProcessPoolExecutor(max_workers=max_workers) as executor:\n", " with tqdm(total=current_batch_size * current_batch_size,\n", " desc=f\"Batch {batch_num + 1}/{num_batches}\") as pbar:\n", " \n", " future_to_sim_id = {}\n", " \n", " # Sottometti i lavori per il batch corrente\n", " for sim in range(start_sim, end_sim):\n", " selected_year = np.random.choice(monthly_weather['year'].unique())\n", " base_weather = monthly_weather[monthly_weather['year'] == selected_year].copy()\n", " base_weather.loc[:, 'growth_phase'] = base_weather['month'].apply(get_growth_phase)\n", " \n", " for zone in range(current_batch_size):\n", " future = executor.submit(\n", " simulate_zone,\n", " base_weather=base_weather,\n", " olive_varieties=olive_varieties,\n", " year=selected_year,\n", " zone=zone,\n", " all_varieties=all_varieties,\n", " variety_techniques=variety_techniques\n", " )\n", " future_to_sim_id[future] = sim + 1\n", " \n", " # Raccogli i risultati del batch\n", " for future in as_completed(future_to_sim_id.keys()):\n", " sim_id = future_to_sim_id[future]\n", " try:\n", " result = future.result()\n", " result['simulation_id'] = sim_id\n", " result['batch_id'] = batch_num # Aggiungiamo batch_id per il partizionamento\n", " batch_results.append(result)\n", " pbar.update(1)\n", " except Exception as e:\n", " print(f\"Errore nella simulazione {sim_id}: {str(e)}\")\n", " continue\n", " \n", " # Converti i risultati del batch in DataFrame\n", " batch_df = pd.DataFrame(batch_results)\n", " \n", " # Salva il batch come partizione del file parquet\n", " batch_df.to_parquet(\n", " output_path,\n", " partition_cols=['batch_id'], # Partiziona per batch_id\n", " append=batch_num > 0 # Appendi se non è il primo batch\n", " )\n", " \n", " # Libera memoria\n", " del batch_results\n", " del batch_df\n", " \n", " print(f\"Simulazione completata. I dati sono stati salvati in: {output_path}\")\n", "\n", "\n", "# Funzione per visualizzare il mapping delle tecniche\n", "def print_technique_mapping(mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n", " if not os.path.exists(mapping_path):\n", " print(\"Mapping file not found.\")\n", " return\n", "\n", " mapping = joblib.load(mapping_path)\n", " print(\"Technique Mapping:\")\n", " for technique, code in mapping.items():\n", " print(f\"{technique}: {code}\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def clean_column_names(df):\n", " # Funzione per pulire i nomi delle colonne\n", " new_columns = []\n", " for col in df.columns:\n", " # Usa regex per separare le varietà\n", " varieties = re.findall(r'([a-z]+)_([a-z_]+)', col)\n", " if varieties:\n", " new_columns.append(f\"{varieties[0][0]}_{varieties[0][1]}\")\n", " else:\n", " new_columns.append(col)\n", " return new_columns\n", "\n", "\n", "def prepare_comparison_data(simulated_data, olive_varieties):\n", " # Pulisci i nomi delle colonne\n", " df = simulated_data.copy()\n", "\n", " df.columns = clean_column_names(df)\n", " df = encode_techniques(df)\n", "\n", " all_varieties = olive_varieties['Varietà di Olive'].unique()\n", " varieties = [clean_column_name(variety) for variety in all_varieties]\n", " comparison_data = []\n", "\n", " for variety in varieties:\n", " olive_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_olive_prod')), None)\n", " oil_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_avg_oil_prod')), None)\n", " tech_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_tech')), None)\n", " water_need_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_water_need')), None)\n", "\n", " if olive_prod_col and oil_prod_col and tech_col and water_need_col:\n", " variety_data = df[[olive_prod_col, oil_prod_col, tech_col, water_need_col]]\n", " variety_data = variety_data[variety_data[tech_col] != 0] # Esclude le righe dove la tecnica è 0\n", "\n", " if not variety_data.empty:\n", " avg_olive_prod = pd.to_numeric(variety_data[olive_prod_col], errors='coerce').mean()\n", " avg_oil_prod = pd.to_numeric(variety_data[oil_prod_col], errors='coerce').mean()\n", " avg_water_need = pd.to_numeric(variety_data[water_need_col], errors='coerce').mean()\n", " efficiency = avg_oil_prod / avg_olive_prod if avg_olive_prod > 0 else 0\n", " water_efficiency = avg_oil_prod / avg_water_need if avg_water_need > 0 else 0\n", "\n", " comparison_data.append({\n", " 'Variety': variety,\n", " 'Avg Olive Production (kg/ha)': avg_olive_prod,\n", " 'Avg Oil Production (L/ha)': avg_oil_prod,\n", " 'Avg Water Need (m³/ha)': avg_water_need,\n", " 'Oil Efficiency (L/kg)': efficiency,\n", " 'Water Efficiency (L oil/m³ water)': water_efficiency\n", " })\n", "\n", " return pd.DataFrame(comparison_data)\n", "\n", "\n", "def plot_variety_comparison(comparison_data, metric):\n", " plt.figure(figsize=(12, 6))\n", " bars = plt.bar(comparison_data['Variety'], comparison_data[metric])\n", " plt.title(f'Comparison of {metric} across Olive Varieties')\n", " plt.xlabel('Variety')\n", " plt.ylabel(metric)\n", " plt.xticks(rotation=45, ha='right')\n", "\n", " for bar in bars:\n", " height = bar.get_height()\n", " plt.text(bar.get_x() + bar.get_width() / 2., height,\n", " f'{height:.2f}',\n", " ha='center', va='bottom')\n", "\n", " plt.tight_layout()\n", " plt.show()\n", " save_plot(plt, f'variety_comparison_{metric.lower().replace(\" \", \"_\").replace(\"/\", \"_\").replace(\"(\", \"\").replace(\")\", \"\")}')\n", " plt.close()\n", "\n", "\n", "def plot_efficiency_vs_production(comparison_data):\n", " plt.figure(figsize=(10, 6))\n", "\n", " plt.scatter(comparison_data['Avg Olive Production (kg/ha)'],\n", " comparison_data['Oil Efficiency (L/kg)'],\n", " s=100)\n", "\n", " for i, row in comparison_data.iterrows():\n", " plt.annotate(row['Variety'],\n", " (row['Avg Olive Production (kg/ha)'], row['Oil Efficiency (L/kg)']),\n", " xytext=(5, 5), textcoords='offset points')\n", "\n", " plt.title('Oil Efficiency vs Olive Production by Variety')\n", " plt.xlabel('Average Olive Production (kg/ha)')\n", " plt.ylabel('Oil Efficiency (L oil / kg olives)')\n", " plt.tight_layout()\n", " save_plot(plt, 'efficiency_vs_production')\n", " plt.close()\n", "\n", "\n", "def plot_water_efficiency_vs_production(comparison_data):\n", " plt.figure(figsize=(10, 6))\n", "\n", " plt.scatter(comparison_data['Avg Olive Production (kg/ha)'],\n", " comparison_data['Water Efficiency (L oil/m³ water)'],\n", " s=100)\n", "\n", " for i, row in comparison_data.iterrows():\n", " plt.annotate(row['Variety'],\n", " (row['Avg Olive Production (kg/ha)'], row['Water Efficiency (L oil/m³ water)']),\n", " xytext=(5, 5), textcoords='offset points')\n", "\n", " plt.title('Water Efficiency vs Olive Production by Variety')\n", " plt.xlabel('Average Olive Production (kg/ha)')\n", " plt.ylabel('Water Efficiency (L oil / m³ water)')\n", " plt.tight_layout()\n", " plt.show()\n", " save_plot(plt, 'water_efficiency_vs_production')\n", " plt.close()\n", "\n", "\n", "def plot_water_need_vs_oil_production(comparison_data):\n", " plt.figure(figsize=(10, 6))\n", "\n", " plt.scatter(comparison_data['Avg Water Need (m³/ha)'],\n", " comparison_data['Avg Oil Production (L/ha)'],\n", " s=100)\n", "\n", " for i, row in comparison_data.iterrows():\n", " plt.annotate(row['Variety'],\n", " (row['Avg Water Need (m³/ha)'], row['Avg Oil Production (L/ha)']),\n", " xytext=(5, 5), textcoords='offset points')\n", "\n", " plt.title('Oil Production vs Water Need by Variety')\n", " plt.xlabel('Average Water Need (m³/ha)')\n", " plt.ylabel('Average Oil Production (L/ha)')\n", " plt.tight_layout()\n", " plt.show()\n", " save_plot(plt, 'water_need_vs_oil_production')\n", " plt.close()\n", "\n", "\n", "def analyze_by_technique(simulated_data, olive_varieties):\n", " # Pulisci i nomi delle colonne\n", " df = simulated_data.copy()\n", "\n", " df.columns = clean_column_names(df)\n", " df = encode_techniques(df)\n", " all_varieties = olive_varieties['Varietà di Olive'].unique()\n", " varieties = [clean_column_name(variety) for variety in all_varieties]\n", "\n", " technique_data = []\n", "\n", " for variety in varieties:\n", " olive_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_olive_prod')), None)\n", " oil_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_avg_oil_prod')), None)\n", " tech_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_tech')), None)\n", " water_need_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_water_need')), None)\n", "\n", " if olive_prod_col and oil_prod_col and tech_col and water_need_col:\n", " variety_data = df[[olive_prod_col, oil_prod_col, tech_col, water_need_col]]\n", " variety_data = variety_data[variety_data[tech_col] != 0]\n", "\n", " if not variety_data.empty:\n", " for tech in variety_data[tech_col].unique():\n", " tech_data = variety_data[variety_data[tech_col] == tech]\n", "\n", " avg_olive_prod = pd.to_numeric(tech_data[olive_prod_col], errors='coerce').mean()\n", " avg_oil_prod = pd.to_numeric(tech_data[oil_prod_col], errors='coerce').mean()\n", " avg_water_need = pd.to_numeric(tech_data[water_need_col], errors='coerce').mean()\n", "\n", " efficiency = avg_oil_prod / avg_olive_prod if avg_olive_prod > 0 else 0\n", " water_efficiency = avg_oil_prod / avg_water_need if avg_water_need > 0 else 0\n", "\n", " technique_data.append({\n", " 'Variety': variety,\n", " 'Technique': tech,\n", " 'Technique String': decode_single_technique(tech),\n", " 'Avg Olive Production (kg/ha)': avg_olive_prod,\n", " 'Avg Oil Production (L/ha)': avg_oil_prod,\n", " 'Avg Water Need (m³/ha)': avg_water_need,\n", " 'Oil Efficiency (L/kg)': efficiency,\n", " 'Water Efficiency (L oil/m³ water)': water_efficiency\n", " })\n", "\n", " return pd.DataFrame(technique_data)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def get_full_data(simulated_data, olive_varieties):\n", " # Assumiamo che simulated_data contenga già tutti i dati necessari\n", " # Includiamo solo le colonne rilevanti\n", " relevant_columns = ['year', 'temp_mean', 'precip_sum', 'solar_energy_sum', 'ha', 'zone', 'olive_prod']\n", "\n", " # Aggiungiamo le colonne specifiche per varietà\n", " all_varieties = olive_varieties['Varietà di Olive'].unique()\n", " varieties = [clean_column_name(variety) for variety in all_varieties]\n", " for variety in varieties:\n", " relevant_columns.extend([f'{variety}_olive_prod', f'{variety}_tech'])\n", "\n", " return simulated_data[relevant_columns].copy()\n", "\n", "\n", "def analyze_correlations(full_data, variety):\n", " # Filtra i dati per la varietà specifica\n", " variety_data = full_data[[col for col in full_data.columns if not col.startswith('_') or col.startswith(f'{variety}_')]]\n", "\n", " # Rinomina le colonne per chiarezza\n", " variety_data = variety_data.rename(columns={\n", " f'{variety}_olive_prod': 'olive_production',\n", " f'{variety}_tech': 'technique'\n", " })\n", "\n", " # Matrice di correlazione\n", " plt.figure(figsize=(12, 10))\n", " corr_matrix = variety_data[['temp_mean', 'precip_sum', 'solar_energy_sum', 'olive_production']].corr()\n", " sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')\n", " plt.title(f'Matrice di Correlazione - {variety}')\n", " plt.tight_layout()\n", " plt.show()\n", " save_plot(plt, f'correlation_matrix_{variety}')\n", " plt.close()\n", "\n", " # Scatter plots\n", " fig, axes = plt.subplots(2, 2, figsize=(20, 20))\n", " fig.suptitle(f'Relazione tra Fattori Meteorologici e Produzione di Olive - {variety}', fontsize=16)\n", "\n", " for ax, var in zip(axes.flat, ['temp_mean', 'precip_sum', 'solar_energy_sum', 'ha']):\n", " sns.scatterplot(data=variety_data, x=var, y='olive_production', hue='technique', ax=ax)\n", " ax.set_title(f'{var.capitalize()} vs Produzione Olive')\n", " ax.set_xlabel(var.capitalize())\n", " ax.set_ylabel('Produzione Olive (kg/ettaro)')\n", "\n", " plt.tight_layout()\n", " plt.show()\n", " save_plot(plt, f'meteorological_factors_{variety}')\n", " plt.close()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T10:25:45.872651Z", "start_time": "2024-10-24T10:25:45.859503Z" }, "id": "2QXm2B51thBA" }, "outputs": [], "source": [ "def prepare_transformer_data(df, olive_varieties_df):\n", " # Crea una copia del DataFrame per evitare modifiche all'originale\n", " df = df.copy()\n", "\n", " # Ordina per zona e anno\n", " df = df.sort_values(['zone', 'year'])\n", "\n", " # Definisci le feature\n", " temporal_features = ['temp_mean', 'precip_sum', 'solar_energy_sum']\n", " static_features = ['ha'] # Feature statiche base\n", " target_features = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n", "\n", " # Ottieni le varietà pulite\n", " all_varieties = olive_varieties_df['Varietà di Olive'].unique()\n", " varieties = [clean_column_name(variety) for variety in all_varieties]\n", "\n", " # Crea la struttura delle feature per ogni varietà\n", " variety_features = [\n", " 'tech', 'pct', 'prod_t_ha', 'oil_prod_t_ha', 'oil_prod_l_ha',\n", " 'min_yield_pct', 'max_yield_pct', 'min_oil_prod_l_ha', 'max_oil_prod_l_ha',\n", " 'avg_oil_prod_l_ha', 'l_per_t', 'min_l_per_t', 'max_l_per_t', 'avg_l_per_t'\n", " ]\n", "\n", " # Prepara dizionari per le nuove colonne\n", " new_columns = {}\n", "\n", " # Prepara le feature per ogni varietà\n", " for variety in varieties:\n", " # Feature esistenti\n", " for feature in variety_features:\n", " col_name = f\"{variety}_{feature}\"\n", " if col_name in df.columns:\n", " if feature != 'tech': # Non includere la colonna tech direttamente\n", " static_features.append(col_name)\n", "\n", " # Feature binarie per le tecniche di coltivazione\n", " for technique in ['tradizionale', 'intensiva', 'superintensiva']:\n", " col_name = f\"{variety}_{technique}\"\n", " new_columns[col_name] = df[f\"{variety}_tech\"].notna() & (\n", " df[f\"{variety}_tech\"].str.lower() == technique\n", " ).fillna(False)\n", " static_features.append(col_name)\n", "\n", " # Aggiungi tutte le nuove colonne in una volta sola\n", " new_df = pd.concat([df] + [pd.Series(v, name=k) for k, v in new_columns.items()], axis=1)\n", "\n", " # Ordiniamo per zona e anno per mantenere la continuità temporale\n", " df_sorted = new_df.sort_values(['zone', 'year'])\n", "\n", " # Definiamo la dimensione della finestra temporale\n", " window_size = 41\n", "\n", " # Liste per raccogliere i dati\n", " temporal_sequences = []\n", " static_features_list = []\n", " targets_list = []\n", "\n", " # Iteriamo per ogni zona\n", " for zone in df_sorted['zone'].unique():\n", " zone_data = df_sorted[df_sorted['zone'] == zone].reset_index(drop=True)\n", "\n", " if len(zone_data) >= window_size: # Verifichiamo che ci siano abbastanza dati\n", " # Creiamo sequenze temporali scorrevoli\n", " for i in range(len(zone_data) - window_size + 1):\n", " # Sequenza temporale\n", " temporal_window = zone_data.iloc[i:i + window_size][temporal_features].values\n", " # Verifichiamo che non ci siano valori NaN\n", " if not np.isnan(temporal_window).any():\n", " temporal_sequences.append(temporal_window)\n", "\n", " # Feature statiche (prendiamo quelle dell'ultimo timestep della finestra)\n", " static_features_list.append(zone_data.iloc[i + window_size - 1][static_features].values)\n", "\n", " # Target (prendiamo quelli dell'ultimo timestep della finestra)\n", " targets_list.append(zone_data.iloc[i + window_size - 1][target_features].values)\n", "\n", " # Convertiamo in array numpy\n", " X_temporal = np.array(temporal_sequences)\n", " X_static = np.array(static_features_list)\n", " y = np.array(targets_list)\n", "\n", " print(f\"Dataset completo - Temporal: {X_temporal.shape}, Static: {X_static.shape}, Target: {y.shape}\")\n", "\n", " # Split dei dati (usando indici casuali per una migliore distribuzione)\n", " indices = np.random.permutation(len(X_temporal))\n", " #train_idx = int(len(indices) * 0.7)\n", " #val_idx = int(len(indices) * 0.85)\n", "\n", " train_idx = int(len(indices) * 0.65) # 65% training\n", " val_idx = int(len(indices) * 0.85) # 20% validation\n", " # Il resto rimane 15% test\n", "\n", " # Oppure versione con 25% validation:\n", " #train_idx = int(len(indices) * 0.60) # 60% training\n", " #val_idx = int(len(indices) * 0.85) # 25% validation\n", "\n", " train_indices = indices[:train_idx]\n", " val_indices = indices[train_idx:val_idx]\n", " test_indices = indices[val_idx:]\n", "\n", " # Split dei dati\n", " X_temporal_train = X_temporal[train_indices]\n", " X_temporal_val = X_temporal[val_indices]\n", " X_temporal_test = X_temporal[test_indices]\n", "\n", " X_static_train = X_static[train_indices]\n", " X_static_val = X_static[val_indices]\n", " X_static_test = X_static[test_indices]\n", "\n", " y_train = y[train_indices]\n", " y_val = y[val_indices]\n", " y_test = y[test_indices]\n", "\n", " # Standardizzazione\n", " scaler_temporal = StandardScaler()\n", " scaler_static = StandardScaler()\n", " scaler_y = StandardScaler()\n", "\n", " # Standardizzazione dei dati temporali\n", " X_temporal_train = scaler_temporal.fit_transform(X_temporal_train.reshape(-1, len(temporal_features))).reshape(X_temporal_train.shape)\n", " X_temporal_val = scaler_temporal.transform(X_temporal_val.reshape(-1, len(temporal_features))).reshape(X_temporal_val.shape)\n", " X_temporal_test = scaler_temporal.transform(X_temporal_test.reshape(-1, len(temporal_features))).reshape(X_temporal_test.shape)\n", "\n", " # Standardizzazione dei dati statici\n", " X_static_train = scaler_static.fit_transform(X_static_train)\n", " X_static_val = scaler_static.transform(X_static_val)\n", " X_static_test = scaler_static.transform(X_static_test)\n", "\n", " # Standardizzazione dei target\n", " y_train = scaler_y.fit_transform(y_train)\n", " y_val = scaler_y.transform(y_val)\n", " y_test = scaler_y.transform(y_test)\n", "\n", " print(\"\\nShape dopo lo split e standardizzazione:\")\n", " print(f\"Train - Temporal: {X_temporal_train.shape}, Static: {X_static_train.shape}, Target: {y_train.shape}\")\n", " print(f\"Val - Temporal: {X_temporal_val.shape}, Static: {X_static_val.shape}, Target: {y_val.shape}\")\n", " print(f\"Test - Temporal: {X_temporal_test.shape}, Static: {X_static_test.shape}, Target: {y_test.shape}\")\n", "\n", " # Prepara i dizionari di input\n", " train_data = {'temporal': X_temporal_train, 'static': X_static_train}\n", " val_data = {'temporal': X_temporal_val, 'static': X_static_val}\n", " test_data = {'temporal': X_temporal_test, 'static': X_static_test}\n", "\n", " base_path = './kaggle/working/models/oil_transformer/'\n", "\n", " os.makedirs(base_path, exist_ok=True)\n", "\n", " joblib.dump(scaler_temporal, os.path.join(base_path, 'scaler_temporal.joblib'))\n", " joblib.dump(scaler_static, os.path.join(base_path, 'scaler_static.joblib'))\n", " joblib.dump(scaler_y, os.path.join(base_path, 'scaler_y.joblib'))\n", "\n", " return (train_data, y_train), (val_data, y_val), (test_data, y_test), (scaler_temporal, scaler_static, scaler_y)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Per denormalizzare e calcolare l'errore reale\n", "def calculate_real_error(model, test_data, test_targets, scaler_y):\n", " # Fare predizioni\n", " predictions = model.predict(test_data)\n", "\n", " # Denormalizzare predizioni e target\n", " predictions_real = scaler_y.inverse_transform(predictions)\n", " targets_real = scaler_y.inverse_transform(test_targets)\n", "\n", " # Calcolare errore percentuale per ogni target\n", " percentage_errors = []\n", " absolute_errors = []\n", "\n", " for i in range(predictions_real.shape[1]):\n", " mae = np.mean(np.abs(predictions_real[:, i] - targets_real[:, i]))\n", " mape = np.mean(np.abs((predictions_real[:, i] - targets_real[:, i]) / targets_real[:, i])) * 100\n", " percentage_errors.append(mape)\n", " absolute_errors.append(mae)\n", "\n", " # Stampa risultati per ogni target\n", " target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n", "\n", " print(\"\\nErrori per target:\")\n", " print(\"-\" * 50)\n", " for i, target in enumerate(target_names):\n", " print(f\"{target}:\")\n", " print(f\"MAE assoluto: {absolute_errors[i]:.2f}\")\n", " print(f\"Errore percentuale medio: {percentage_errors[i]:.2f}%\")\n", " print(f\"Precisione: {100 - percentage_errors[i]:.2f}%\")\n", " print(\"-\" * 50)\n", "\n", " return percentage_errors, absolute_errors" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2024-10-25T21:05:45.017577Z", "start_time": "2024-10-25T21:05:34.194467Z" }, "id": "d_WHC4rJthA8" }, "outputs": [], "source": [ "folder_path = './data/weather'\n", "#raw_data = read_json_files(folder_path)\n", "#weather_data = create_weather_dataset(raw_data)\n", "#weather_data['datetime'] = pd.to_datetime(weather_data['datetime'], errors='coerce')\n", "#weather_data['date'] = weather_data['datetime'].dt.date\n", "#weather_data = weather_data.dropna(subset=['datetime'])\n", "#weather_data['datetime'] = pd.to_datetime(weather_data['datetime'])\n", "#weather_data['year'] = weather_data['datetime'].dt.year\n", "#weather_data['month'] = weather_data['datetime'].dt.month\n", "#weather_data['day'] = weather_data['datetime'].dt.day\n", "#weather_data.head()\n", "\n", "#weather_data.to_parquet('./data/weather_data.parquet')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2024-10-26T05:43:32.169183Z", "start_time": "2024-10-26T05:43:29.609044Z" }, "id": "uvIOrixethA9" }, "outputs": [], "source": [ "weather_data = pd.read_parquet('./kaggle/input/olive-oil/weather_data.parquet')\n", "\n", "features = [\n", " 'temp', 'tempmin', 'tempmax', 'humidity', 'cloudcover', 'windspeed', 'pressure', 'visibility',\n", " 'hour_sin', 'hour_cos', 'month_sin', 'month_cos', 'day_of_year_sin', 'day_of_year_cos',\n", " 'temp_humidity', 'temp_cloudcover', 'visibility_cloudcover', 'clear_sky_factor', 'day_length',\n", " 'temp_1h_lag', 'cloudcover_1h_lag', 'humidity_1h_lag', 'temp_rolling_mean_6h',\n", " 'cloudcover_rolling_mean_6h'\n", "] + [col for col in weather_data.columns if 'season_' in col or 'time_period_' in col]\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "start_time": "2024-10-26T05:43:33.294101Z" }, "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "7qF_3gVpthA9", "jupyter": { "is_executing": true }, "outputId": "0de98483-956b-45e2-f9f3-8410f79cd307" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Preparazione dati iniziale...\n", "Shape iniziale features: (129674, 24)\n", "\n", "==================================================\n", "Training modello per: solarradiation\n", "==================================================\n", "\n", "Preparazione dati di training...\n", "\n", "Creazione modello...\n", "Input shape: (24, 24)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2024-11-06 21:44:20.395277: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Model: \"SolarRadiation\"\n", "__________________________________________________________________________________________________\n", " Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", " main_input (InputLayer) [(None, 24, 24)] 0 [] \n", " \n", " conv1d (Conv1D) (None, 24, 32) 2336 ['main_input[0][0]'] \n", " \n", " batch_normalization (Batch (None, 24, 32) 128 ['conv1d[0][0]'] \n", " Normalization) \n", " \n", " activation (Activation) (None, 24, 32) 0 ['batch_normalization[0][0]'] \n", " \n", " conv1d_1 (Conv1D) (None, 24, 64) 6208 ['activation[0][0]'] \n", " \n", " solar_params (InputLayer) [(None, 3)] 0 [] \n", " \n", " batch_normalization_1 (Bat (None, 24, 64) 256 ['conv1d_1[0][0]'] \n", " chNormalization) \n", " \n", " bidirectional (Bidirection (None, 24, 128) 45568 ['main_input[0][0]'] \n", " al) \n", " \n", " dense (Dense) (None, 32) 128 ['solar_params[0][0]'] \n", " \n", " activation_1 (Activation) (None, 24, 64) 0 ['batch_normalization_1[0][0]'\n", " ] \n", " \n", " bidirectional_1 (Bidirecti (None, 64) 41216 ['bidirectional[0][0]'] \n", " onal) \n", " \n", " batch_normalization_3 (Bat (None, 32) 128 ['dense[0][0]'] \n", " chNormalization) \n", " \n", " global_average_pooling1d ( (None, 64) 0 ['activation_1[0][0]'] \n", " GlobalAveragePooling1D) \n", " \n", " batch_normalization_2 (Bat (None, 64) 256 ['bidirectional_1[0][0]'] \n", " chNormalization) \n", " \n", " activation_2 (Activation) (None, 32) 0 ['batch_normalization_3[0][0]'\n", " ] \n", " \n", " concatenate (Concatenate) (None, 160) 0 ['global_average_pooling1d[0][\n", " 0]', \n", " 'batch_normalization_2[0][0]'\n", " , 'activation_2[0][0]'] \n", " \n", " dense_1 (Dense) (None, 64) 10304 ['concatenate[0][0]'] \n", " \n", " batch_normalization_4 (Bat (None, 64) 256 ['dense_1[0][0]'] \n", " chNormalization) \n", " \n", " activation_3 (Activation) (None, 64) 0 ['batch_normalization_4[0][0]'\n", " ] \n", " \n", " dropout (Dropout) (None, 64) 0 ['activation_3[0][0]'] \n", " \n", " dense_2 (Dense) (None, 32) 2080 ['dropout[0][0]'] \n", " \n", " batch_normalization_5 (Bat (None, 32) 128 ['dense_2[0][0]'] \n", " chNormalization) \n", " \n", " activation_4 (Activation) (None, 32) 0 ['batch_normalization_5[0][0]'\n", " ] \n", " \n", " dense_3 (Dense) (None, 1) 33 ['activation_4[0][0]'] \n", " \n", "==================================================================================================\n", "Total params: 109025 (425.88 KB)\n", "Trainable params: 108449 (423.63 KB)\n", "Non-trainable params: 576 (2.25 KB)\n", "__________________________________________________________________________________________________\n", "\n", "Inizio training...\n", "Epoch 1/50\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2024-11-06 21:44:28.783921: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8905\n", "2024-11-06 21:44:28.896066: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n", "2024-11-06 21:44:31.089698: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x71e4e5b291f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n", "2024-11-06 21:44:31.089754: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA L40S, Compute Capability 8.9\n", "2024-11-06 21:44:31.096487: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n", "2024-11-06 21:44:31.334699: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 5/2836 [..............................] - ETA: 1:11 - loss: 0.9626 - mae: 1.2291 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0232s vs `on_train_batch_end` time: 0.0599s). Check your callbacks.\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0277 - mae: 0.0994\n", "Epoch 1: val_loss improved from inf to 0.00431, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_01_0.0043.h5\n", "2836/2836 [==============================] - 81s 24ms/step - loss: 0.0277 - mae: 0.0994 - val_loss: 0.0043 - val_mae: 0.0562 - lr: 0.0010\n", "Epoch 2/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0047 - mae: 0.0590\n", "Epoch 2: val_loss improved from 0.00431 to 0.00289, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_02_0.0029.h5\n", "2836/2836 [==============================] - 67s 23ms/step - loss: 0.0047 - mae: 0.0590 - val_loss: 0.0029 - val_mae: 0.0435 - lr: 0.0010\n", "Epoch 3/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0040 - mae: 0.0535\n", "Epoch 3: val_loss did not improve from 0.00289\n", "2836/2836 [==============================] - 67s 24ms/step - loss: 0.0040 - mae: 0.0534 - val_loss: 0.0035 - val_mae: 0.0478 - lr: 0.0010\n", "Epoch 4/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0036 - mae: 0.0495\n", "Epoch 4: val_loss improved from 0.00289 to 0.00282, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_04_0.0028.h5\n", "2836/2836 [==============================] - 67s 24ms/step - loss: 0.0036 - mae: 0.0495 - val_loss: 0.0028 - val_mae: 0.0410 - lr: 0.0010\n", "Epoch 5/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0034 - mae: 0.0472\n", "Epoch 5: val_loss did not improve from 0.00282\n", "2836/2836 [==============================] - 70s 25ms/step - loss: 0.0034 - mae: 0.0472 - val_loss: 0.0034 - val_mae: 0.0457 - lr: 0.0010\n", "Epoch 6/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0033 - mae: 0.0460\n", "Epoch 6: val_loss improved from 0.00282 to 0.00275, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_06_0.0028.h5\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0033 - mae: 0.0460 - val_loss: 0.0028 - val_mae: 0.0381 - lr: 0.0010\n", "Epoch 7/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0031 - mae: 0.0450\n", "Epoch 7: val_loss improved from 0.00275 to 0.00255, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_07_0.0026.h5\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0031 - mae: 0.0450 - val_loss: 0.0026 - val_mae: 0.0369 - lr: 0.0010\n", "Epoch 8/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0031 - mae: 0.0444\n", "Epoch 8: val_loss did not improve from 0.00255\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0031 - mae: 0.0444 - val_loss: 0.0037 - val_mae: 0.0442 - lr: 0.0010\n", "Epoch 9/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0029 - mae: 0.0431\n", "Epoch 9: val_loss did not improve from 0.00255\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0029 - mae: 0.0431 - val_loss: 0.0039 - val_mae: 0.0455 - lr: 0.0010\n", "Epoch 10/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0028 - mae: 0.0424\n", "Epoch 10: val_loss improved from 0.00255 to 0.00247, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_10_0.0025.h5\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0028 - mae: 0.0424 - val_loss: 0.0025 - val_mae: 0.0357 - lr: 0.0010\n", "Epoch 11/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0028 - mae: 0.0424\n", "Epoch 11: val_loss did not improve from 0.00247\n", "2836/2836 [==============================] - 64s 23ms/step - loss: 0.0028 - mae: 0.0424 - val_loss: 0.0026 - val_mae: 0.0362 - lr: 0.0010\n", "Epoch 12/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0419\n", "Epoch 12: val_loss improved from 0.00247 to 0.00240, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_12_0.0024.h5\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0027 - mae: 0.0419 - val_loss: 0.0024 - val_mae: 0.0359 - lr: 0.0010\n", "Epoch 13/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0410\n", "Epoch 13: val_loss did not improve from 0.00240\n", "2836/2836 [==============================] - 63s 22ms/step - loss: 0.0027 - mae: 0.0410 - val_loss: 0.0029 - val_mae: 0.0404 - lr: 0.0010\n", "Epoch 14/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0410\n", "Epoch 14: val_loss did not improve from 0.00240\n", "2836/2836 [==============================] - 63s 22ms/step - loss: 0.0027 - mae: 0.0410 - val_loss: 0.0034 - val_mae: 0.0403 - lr: 0.0010\n", "Epoch 15/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0026 - mae: 0.0406\n", "Epoch 15: val_loss improved from 0.00240 to 0.00225, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_15_0.0023.h5\n", "2836/2836 [==============================] - 63s 22ms/step - loss: 0.0026 - mae: 0.0406 - val_loss: 0.0023 - val_mae: 0.0336 - lr: 0.0010\n", "Epoch 16/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0026 - mae: 0.0402\n", "Epoch 16: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 60s 21ms/step - loss: 0.0026 - mae: 0.0402 - val_loss: 0.0026 - val_mae: 0.0367 - lr: 0.0010\n", "Epoch 17/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0026 - mae: 0.0401\n", "Epoch 17: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 63s 22ms/step - loss: 0.0026 - mae: 0.0401 - val_loss: 0.0025 - val_mae: 0.0352 - lr: 0.0010\n", "Epoch 18/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0025 - mae: 0.0397\n", "Epoch 18: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 67s 24ms/step - loss: 0.0025 - mae: 0.0397 - val_loss: 0.0024 - val_mae: 0.0364 - lr: 0.0010\n", "Epoch 19/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0025 - mae: 0.0393\n", "Epoch 19: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0025 - mae: 0.0393 - val_loss: 0.0024 - val_mae: 0.0339 - lr: 0.0010\n", "Epoch 20/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0025 - mae: 0.0393\n", "Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n", "\n", "Epoch 20: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0025 - mae: 0.0393 - val_loss: 0.0024 - val_mae: 0.0347 - lr: 0.0010\n", "Epoch 21/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0023 - mae: 0.0374\n", "Epoch 21: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0023 - mae: 0.0374 - val_loss: 0.0027 - val_mae: 0.0366 - lr: 5.0000e-04\n", "Epoch 22/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0022 - mae: 0.0371\n", "Epoch 22: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0022 - mae: 0.0371 - val_loss: 0.0026 - val_mae: 0.0349 - lr: 5.0000e-04\n", "Epoch 23/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0022 - mae: 0.0369\n", "Epoch 23: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 67s 24ms/step - loss: 0.0022 - mae: 0.0369 - val_loss: 0.0024 - val_mae: 0.0346 - lr: 5.0000e-04\n", "Epoch 24/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0022 - mae: 0.0368\n", "Epoch 24: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 64s 22ms/step - loss: 0.0022 - mae: 0.0368 - val_loss: 0.0025 - val_mae: 0.0359 - lr: 5.0000e-04\n", "Epoch 25/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0022 - mae: 0.0367\n", "Epoch 25: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n", "\n", "Epoch 25: val_loss did not improve from 0.00225\n", "2836/2836 [==============================] - 67s 24ms/step - loss: 0.0022 - mae: 0.0367 - val_loss: 0.0026 - val_mae: 0.0354 - lr: 5.0000e-04\n", "\n", "Generazione predizioni complete...\n", "4052/4052 [==============================] - 25s 6ms/step\n", "\n", "Statistiche finali predizioni solarradiation:\n", "- Min: 0.0000\n", "- Max: 1042.0900\n", "- Media: 176.6735\n", "\n", "==================================================\n", "Training modello per: solarenergy\n", "==================================================\n", "\n", "Aggiunta predizioni precedenti da: ['solarradiation']\n", "\n", "Processing predizioni di solarradiation...\n", "Allineamento dimensioni necessario:\n", "- Current features: (129674, 24)\n", "- Predictions: (129650, 1)\n", "Aggiunta padding di 24 elementi\n", "Statistiche feature solarradiation:\n", "- Shape: (129674, 1)\n", "- Range: [0.0000, 1.0000]\n", "\n", "Verifica dimensioni prima della concatenazione:\n", "Nuove dimensioni features: (129674, 25)\n", "\n", "Preparazione dati di training...\n", "\n", "Creazione modello...\n", "Input shape: (24, 25)\n", "Model: \"SolarEnergy\"\n", "__________________________________________________________________________________________________\n", " Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", " input_1 (InputLayer) [(None, 24, 25)] 0 [] \n", " \n", " conv1d_2 (Conv1D) (None, 24, 64) 4864 ['input_1[0][0]'] \n", " \n", " batch_normalization_7 (Bat (None, 24, 64) 256 ['conv1d_2[0][0]'] \n", " chNormalization) \n", " \n", " activation_6 (Activation) (None, 24, 64) 0 ['batch_normalization_7[0][0]'\n", " ] \n", " \n", " multi_head_attention (Mult (None, 24, 25) 26393 ['input_1[0][0]', \n", " iHeadAttention) 'input_1[0][0]'] \n", " \n", " conv1d_3 (Conv1D) (None, 24, 32) 6176 ['activation_6[0][0]'] \n", " \n", " lstm_2 (LSTM) (None, 24, 64) 23040 ['input_1[0][0]'] \n", " \n", " batch_normalization_6 (Bat (None, 24, 25) 100 ['multi_head_attention[0][0]']\n", " chNormalization) \n", " \n", " batch_normalization_8 (Bat (None, 24, 32) 128 ['conv1d_3[0][0]'] \n", " chNormalization) \n", " \n", " lstm_3 (LSTM) (None, 32) 12416 ['lstm_2[0][0]'] \n", " \n", " activation_5 (Activation) (None, 24, 25) 0 ['batch_normalization_6[0][0]'\n", " ] \n", " \n", " activation_7 (Activation) (None, 24, 32) 0 ['batch_normalization_8[0][0]'\n", " ] \n", " \n", " batch_normalization_9 (Bat (None, 32) 128 ['lstm_3[0][0]'] \n", " chNormalization) \n", " \n", " global_average_pooling1d_1 (None, 25) 0 ['activation_5[0][0]'] \n", " (GlobalAveragePooling1D) \n", " \n", " global_average_pooling1d_2 (None, 32) 0 ['activation_7[0][0]'] \n", " (GlobalAveragePooling1D) \n", " \n", " activation_8 (Activation) (None, 32) 0 ['batch_normalization_9[0][0]'\n", " ] \n", " \n", " concatenate_1 (Concatenate (None, 89) 0 ['global_average_pooling1d_1[0\n", " ) ][0]', \n", " 'global_average_pooling1d_2[0\n", " ][0]', \n", " 'activation_8[0][0]'] \n", " \n", " dense_4 (Dense) (None, 128) 11520 ['concatenate_1[0][0]'] \n", " \n", " batch_normalization_10 (Ba (None, 128) 512 ['dense_4[0][0]'] \n", " tchNormalization) \n", " \n", " activation_9 (Activation) (None, 128) 0 ['batch_normalization_10[0][0]\n", " '] \n", " \n", " dropout_1 (Dropout) (None, 128) 0 ['activation_9[0][0]'] \n", " \n", " dense_5 (Dense) (None, 64) 8256 ['dropout_1[0][0]'] \n", " \n", " batch_normalization_11 (Ba (None, 64) 256 ['dense_5[0][0]'] \n", " tchNormalization) \n", " \n", " activation_10 (Activation) (None, 64) 0 ['batch_normalization_11[0][0]\n", " '] \n", " \n", " dropout_2 (Dropout) (None, 64) 0 ['activation_10[0][0]'] \n", " \n", " dense_6 (Dense) (None, 1) 65 ['dropout_2[0][0]'] \n", " \n", "==================================================================================================\n", "Total params: 94110 (367.62 KB)\n", "Trainable params: 93420 (364.92 KB)\n", "Non-trainable params: 690 (2.70 KB)\n", "__________________________________________________________________________________________________\n", "\n", "Inizio training...\n", "Epoch 1/50\n", " 4/2836 [..............................] - ETA: 1:01 - loss: 2.3626 - mae: 1.3694 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0201s vs `on_train_batch_end` time: 0.0205s). Check your callbacks.\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0692 - mae: 0.1162\n", "Epoch 1: val_loss improved from inf to 0.00600, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_01_0.0060.h5\n", "2836/2836 [==============================] - 73s 22ms/step - loss: 0.0692 - mae: 0.1162 - val_loss: 0.0060 - val_mae: 0.0636 - lr: 0.0010\n", "Epoch 2/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0057 - mae: 0.0630\n", "Epoch 2: val_loss improved from 0.00600 to 0.00485, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_02_0.0048.h5\n", "2836/2836 [==============================] - 62s 22ms/step - loss: 0.0057 - mae: 0.0630 - val_loss: 0.0048 - val_mae: 0.0610 - lr: 0.0010\n", "Epoch 3/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0052 - mae: 0.0591\n", "Epoch 3: val_loss improved from 0.00485 to 0.00360, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_03_0.0036.h5\n", "2836/2836 [==============================] - 61s 22ms/step - loss: 0.0052 - mae: 0.0591 - val_loss: 0.0036 - val_mae: 0.0480 - lr: 0.0010\n", "Epoch 4/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0049 - mae: 0.0557\n", "Epoch 4: val_loss improved from 0.00360 to 0.00291, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_04_0.0029.h5\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0049 - mae: 0.0557 - val_loss: 0.0029 - val_mae: 0.0413 - lr: 0.0010\n", "Epoch 5/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0048 - mae: 0.0548\n", "Epoch 5: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 61s 22ms/step - loss: 0.0048 - mae: 0.0549 - val_loss: 0.0087 - val_mae: 0.0886 - lr: 0.0010\n", "Epoch 6/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0048 - mae: 0.0545\n", "Epoch 6: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 62s 22ms/step - loss: 0.0048 - mae: 0.0545 - val_loss: 0.0208 - val_mae: 0.1540 - lr: 0.0010\n", "Epoch 7/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0046 - mae: 0.0535\n", "Epoch 7: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 61s 22ms/step - loss: 0.0046 - mae: 0.0535 - val_loss: 0.0035 - val_mae: 0.0472 - lr: 0.0010\n", "Epoch 8/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0045 - mae: 0.0529\n", "Epoch 8: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 62s 22ms/step - loss: 0.0045 - mae: 0.0529 - val_loss: 0.0112 - val_mae: 0.1042 - lr: 0.0010\n", "Epoch 9/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0046 - mae: 0.0529\n", "Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n", "\n", "Epoch 9: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 61s 22ms/step - loss: 0.0045 - mae: 0.0529 - val_loss: 0.0035 - val_mae: 0.0498 - lr: 0.0010\n", "Epoch 10/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0043 - mae: 0.0521\n", "Epoch 10: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 66s 23ms/step - loss: 0.0043 - mae: 0.0521 - val_loss: 0.0115 - val_mae: 0.1027 - lr: 5.0000e-04\n", "Epoch 11/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0044 - mae: 0.0528\n", "Epoch 11: val_loss did not improve from 0.00291\n", "2836/2836 [==============================] - 62s 22ms/step - loss: 0.0044 - mae: 0.0528 - val_loss: 0.0069 - val_mae: 0.0714 - lr: 5.0000e-04\n", "Epoch 12/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0518\n", "Epoch 12: val_loss improved from 0.00291 to 0.00289, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_12_0.0029.h5\n", "2836/2836 [==============================] - 65s 23ms/step - loss: 0.0042 - mae: 0.0518 - val_loss: 0.0029 - val_mae: 0.0386 - lr: 5.0000e-04\n", "Epoch 13/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0516\n", "Epoch 13: val_loss did not improve from 0.00289\n", "2836/2836 [==============================] - 58s 20ms/step - loss: 0.0042 - mae: 0.0516 - val_loss: 0.0072 - val_mae: 0.0754 - lr: 5.0000e-04\n", "Epoch 14/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0511\n", "Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n", "\n", "Epoch 14: val_loss did not improve from 0.00289\n", "2836/2836 [==============================] - 62s 22ms/step - loss: 0.0042 - mae: 0.0511 - val_loss: 0.0117 - val_mae: 0.1028 - lr: 5.0000e-04\n", "\n", "Generazione predizioni complete...\n", "4052/4052 [==============================] - 18s 4ms/step\n", "\n", "Statistiche finali predizioni solarenergy:\n", "- Min: 0.0380\n", "- Max: 3.3664\n", "- Media: 0.6877\n", "\n", "==================================================\n", "Training modello per: uvindex\n", "==================================================\n", "\n", "Aggiunta predizioni precedenti da: ['solarradiation', 'solarenergy']\n", "\n", "Processing predizioni di solarradiation...\n", "Allineamento dimensioni necessario:\n", "- Current features: (129674, 25)\n", "- Predictions: (129650, 1)\n", "Aggiunta padding di 24 elementi\n", "Statistiche feature solarradiation:\n", "- Shape: (129674, 1)\n", "- Range: [0.0000, 1.0000]\n", "\n", "Processing predizioni di solarenergy...\n", "Allineamento dimensioni necessario:\n", "- Current features: (129674, 25)\n", "- Predictions: (129650, 1)\n", "Aggiunta padding di 24 elementi\n", "Statistiche feature solarenergy:\n", "- Shape: (129674, 1)\n", "- Range: [0.0000, 1.0000]\n", "\n", "Verifica dimensioni prima della concatenazione:\n", "Nuove dimensioni features: (129674, 27)\n", "\n", "Preparazione dati di training...\n", "\n", "Creazione modello...\n", "Input shape: (24, 27)\n", "Model: \"SolarUV\"\n", "__________________________________________________________________________________________________\n", " Layer (type) Output Shape Param # Connected to \n", "==================================================================================================\n", " input_2 (InputLayer) [(None, 24, 27)] 0 [] \n", " \n", " conv1d_4 (Conv1D) (None, 24, 64) 5248 ['input_2[0][0]'] \n", " \n", " batch_normalization_12 (Ba (None, 24, 64) 256 ['conv1d_4[0][0]'] \n", " tchNormalization) \n", " \n", " activation_11 (Activation) (None, 24, 64) 0 ['batch_normalization_12[0][0]\n", " '] \n", " \n", " max_pooling1d (MaxPooling1 (None, 12, 64) 0 ['activation_11[0][0]'] \n", " D) \n", " \n", " conv1d_5 (Conv1D) (None, 12, 32) 6176 ['max_pooling1d[0][0]'] \n", " \n", " multi_head_attention_1 (Mu (None, 24, 27) 14235 ['input_2[0][0]', \n", " ltiHeadAttention) 'input_2[0][0]'] \n", " \n", " global_average_pooling1d_5 (None, 27) 0 ['input_2[0][0]'] \n", " (GlobalAveragePooling1D) \n", " \n", " batch_normalization_13 (Ba (None, 12, 32) 128 ['conv1d_5[0][0]'] \n", " tchNormalization) \n", " \n", " batch_normalization_14 (Ba (None, 24, 27) 108 ['multi_head_attention_1[0][0]\n", " tchNormalization) '] \n", " \n", " dense_7 (Dense) (None, 64) 1792 ['global_average_pooling1d_5[0\n", " ][0]'] \n", " \n", " activation_12 (Activation) (None, 12, 32) 0 ['batch_normalization_13[0][0]\n", " '] \n", " \n", " activation_13 (Activation) (None, 24, 27) 0 ['batch_normalization_14[0][0]\n", " '] \n", " \n", " batch_normalization_15 (Ba (None, 64) 256 ['dense_7[0][0]'] \n", " tchNormalization) \n", " \n", " global_average_pooling1d_3 (None, 32) 0 ['activation_12[0][0]'] \n", " (GlobalAveragePooling1D) \n", " \n", " global_average_pooling1d_4 (None, 27) 0 ['activation_13[0][0]'] \n", " (GlobalAveragePooling1D) \n", " \n", " activation_14 (Activation) (None, 64) 0 ['batch_normalization_15[0][0]\n", " '] \n", " \n", " concatenate_2 (Concatenate (None, 123) 0 ['global_average_pooling1d_3[0\n", " ) ][0]', \n", " 'global_average_pooling1d_4[0\n", " ][0]', \n", " 'activation_14[0][0]'] \n", " \n", " dense_8 (Dense) (None, 128) 15872 ['concatenate_2[0][0]'] \n", " \n", " batch_normalization_16 (Ba (None, 128) 512 ['dense_8[0][0]'] \n", " tchNormalization) \n", " \n", " activation_15 (Activation) (None, 128) 0 ['batch_normalization_16[0][0]\n", " '] \n", " \n", " dropout_3 (Dropout) (None, 128) 0 ['activation_15[0][0]'] \n", " \n", " dense_9 (Dense) (None, 64) 8256 ['dropout_3[0][0]'] \n", " \n", " batch_normalization_17 (Ba (None, 64) 256 ['dense_9[0][0]'] \n", " tchNormalization) \n", " \n", " activation_16 (Activation) (None, 64) 0 ['batch_normalization_17[0][0]\n", " '] \n", " \n", " dropout_4 (Dropout) (None, 64) 0 ['activation_16[0][0]'] \n", " \n", " dense_10 (Dense) (None, 1) 65 ['dropout_4[0][0]'] \n", " \n", "==================================================================================================\n", "Total params: 53160 (207.66 KB)\n", "Trainable params: 52402 (204.70 KB)\n", "Non-trainable params: 758 (2.96 KB)\n", "__________________________________________________________________________________________________\n", "\n", "Inizio training...\n", "Epoch 1/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0938 - mae: 0.1971\n", "Epoch 1: val_loss improved from inf to 0.02909, saving model to ./kaggle/working/models/uvindex/checkpoints/best_model_01_0.0291.h5\n", "2836/2836 [==============================] - 60s 18ms/step - loss: 0.0938 - mae: 0.1971 - val_loss: 0.0291 - val_mae: 0.1910 - lr: 0.0010\n", "Epoch 2/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0098 - mae: 0.0821\n", "Epoch 2: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 51s 18ms/step - loss: 0.0098 - mae: 0.0821 - val_loss: 0.3685 - val_mae: 0.8389 - lr: 0.0010\n", "Epoch 3/50\n", "2833/2836 [============================>.] - ETA: 0s - loss: 0.0079 - mae: 0.0714\n", "Epoch 3: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 53s 19ms/step - loss: 0.0079 - mae: 0.0714 - val_loss: 0.8313 - val_mae: 1.3285 - lr: 0.0010\n", "Epoch 4/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0077 - mae: 0.0704\n", "Epoch 4: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 55s 19ms/step - loss: 0.0077 - mae: 0.0704 - val_loss: 0.2950 - val_mae: 0.7527 - lr: 0.0010\n", "Epoch 5/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0076 - mae: 0.0698\n", "Epoch 5: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 55s 20ms/step - loss: 0.0076 - mae: 0.0698 - val_loss: 2.0383 - val_mae: 2.5369 - lr: 0.0010\n", "Epoch 6/50\n", "2836/2836 [==============================] - ETA: 0s - loss: 0.0075 - mae: 0.0699\n", "Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n", "\n", "Epoch 6: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 54s 19ms/step - loss: 0.0075 - mae: 0.0699 - val_loss: 0.3982 - val_mae: 0.8782 - lr: 0.0010\n", "Epoch 7/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0067 - mae: 0.0668\n", "Epoch 7: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 53s 19ms/step - loss: 0.0067 - mae: 0.0668 - val_loss: 0.1131 - val_mae: 0.4568 - lr: 5.0000e-04\n", "Epoch 8/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0066 - mae: 0.0662\n", "Epoch 8: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 54s 19ms/step - loss: 0.0066 - mae: 0.0662 - val_loss: 1.1239 - val_mae: 1.6230 - lr: 5.0000e-04\n", "Epoch 9/50\n", "2834/2836 [============================>.] - ETA: 0s - loss: 0.0065 - mae: 0.0658\n", "Epoch 9: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 53s 19ms/step - loss: 0.0065 - mae: 0.0658 - val_loss: 0.4153 - val_mae: 0.8974 - lr: 5.0000e-04\n", "Epoch 10/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0064 - mae: 0.0651\n", "Epoch 10: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 53s 19ms/step - loss: 0.0064 - mae: 0.0651 - val_loss: 0.0937 - val_mae: 0.4185 - lr: 5.0000e-04\n", "Epoch 11/50\n", "2835/2836 [============================>.] - ETA: 0s - loss: 0.0063 - mae: 0.0648\n", "Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n", "\n", "Epoch 11: val_loss did not improve from 0.02909\n", "2836/2836 [==============================] - 53s 19ms/step - loss: 0.0063 - mae: 0.0648 - val_loss: 0.7356 - val_mae: 1.2348 - lr: 5.0000e-04\n", "\n", "Generazione predizioni complete...\n", "4052/4052 [==============================] - 11s 3ms/step\n", "\n", "Statistiche finali predizioni uvindex:\n", "- Min: 0.2790\n", "- Max: 20.2327\n", "- Media: 3.2535\n" ] } ], "source": [ "models, histories, scalers = train_solar_models(weather_data, features)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "ixAzWupmthA-", "outputId": "ee180137-1c9f-4eb1-8866-db1e1b1cb58c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Salvataggio scaler:\n", "- Salvato scaler: X\n", "- Salvato scaler: solarradiation\n", "- Salvato scaler: solarenergy\n", "- Salvato scaler: uvindex\n", "- Salvato scaler: solarradiation_pred\n", "- Salvato scaler: solarenergy_pred\n" ] }, { "ename": "TypeError", "evalue": "cannot pickle 'dict_keys' object", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[20], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m target_variables \u001b[38;5;241m=\u001b[39m [\u001b[38;5;124m'\u001b[39m\u001b[38;5;124msolarradiation\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124msolarenergy\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124muvindex\u001b[39m\u001b[38;5;124m'\u001b[39m]\n\u001b[1;32m 3\u001b[0m \u001b[38;5;66;03m# Salva tutto direttamente\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m \u001b[43msave_models_and_scalers\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodels\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmodels\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[43mscalers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mscalers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# Passiamo direttamente il dizionario degli scalers così com'è\u001b[39;49;00m\n\u001b[1;32m 7\u001b[0m \u001b[43m \u001b[49m\u001b[43mtarget_variables\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtarget_variables\u001b[49m\n\u001b[1;32m 8\u001b[0m \u001b[43m)\u001b[49m\n", "Cell \u001b[0;32mIn[8], line 30\u001b[0m, in \u001b[0;36msave_models_and_scalers\u001b[0;34m(models, scalers, target_variables, base_path)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m scaler_name, scaler \u001b[38;5;129;01min\u001b[39;00m scalers\u001b[38;5;241m.\u001b[39mitems():\n\u001b[1;32m 29\u001b[0m scaler_file \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(scaler_path, \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mscaler_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.joblib\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m---> 30\u001b[0m \u001b[43mjoblib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdump\u001b[49m\u001b[43m(\u001b[49m\u001b[43mscaler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mscaler_file\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 31\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m- Salvato scaler: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mscaler_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 33\u001b[0m \u001b[38;5;66;03m# Salva la configurazione dei modelli\u001b[39;00m\n", "File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:553\u001b[0m, in \u001b[0;36mdump\u001b[0;34m(value, filename, compress, protocol, cache_size)\u001b[0m\n\u001b[1;32m 551\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m is_filename:\n\u001b[1;32m 552\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(filename, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mwb\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[0;32m--> 553\u001b[0m \u001b[43mNumpyPickler\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mprotocol\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mprotocol\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdump\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 554\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 555\u001b[0m NumpyPickler(filename, protocol\u001b[38;5;241m=\u001b[39mprotocol)\u001b[38;5;241m.\u001b[39mdump(value)\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:487\u001b[0m, in \u001b[0;36m_Pickler.dump\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 485\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mproto \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m4\u001b[39m:\n\u001b[1;32m 486\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mframer\u001b[38;5;241m.\u001b[39mstart_framing()\n\u001b[0;32m--> 487\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 488\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(STOP)\n\u001b[1;32m 489\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mframer\u001b[38;5;241m.\u001b[39mend_framing()\n", "File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:560\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 558\u001b[0m f \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdispatch\u001b[38;5;241m.\u001b[39mget(t)\n\u001b[1;32m 559\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m f \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 560\u001b[0m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Call unbound method with explicit self\u001b[39;00m\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 563\u001b[0m \u001b[38;5;66;03m# Check private dispatch table if any, or else\u001b[39;00m\n\u001b[1;32m 564\u001b[0m \u001b[38;5;66;03m# copyreg.dispatch_table\u001b[39;00m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:972\u001b[0m, in \u001b[0;36m_Pickler.save_dict\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(MARK \u001b[38;5;241m+\u001b[39m DICT)\n\u001b[1;32m 971\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmemoize(obj)\n\u001b[0;32m--> 972\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_batch_setitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:998\u001b[0m, in \u001b[0;36m_Pickler._batch_setitems\u001b[0;34m(self, items)\u001b[0m\n\u001b[1;32m 996\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m tmp:\n\u001b[1;32m 997\u001b[0m save(k)\n\u001b[0;32m--> 998\u001b[0m \u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mv\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 999\u001b[0m write(SETITEMS)\n\u001b[1;32m 1000\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m n:\n", "File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:560\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 558\u001b[0m f \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdispatch\u001b[38;5;241m.\u001b[39mget(t)\n\u001b[1;32m 559\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m f \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 560\u001b[0m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Call unbound method with explicit self\u001b[39;00m\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 563\u001b[0m \u001b[38;5;66;03m# Check private dispatch table if any, or else\u001b[39;00m\n\u001b[1;32m 564\u001b[0m \u001b[38;5;66;03m# copyreg.dispatch_table\u001b[39;00m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:972\u001b[0m, in \u001b[0;36m_Pickler.save_dict\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(MARK \u001b[38;5;241m+\u001b[39m DICT)\n\u001b[1;32m 971\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmemoize(obj)\n\u001b[0;32m--> 972\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_batch_setitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:998\u001b[0m, in \u001b[0;36m_Pickler._batch_setitems\u001b[0;34m(self, items)\u001b[0m\n\u001b[1;32m 996\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m tmp:\n\u001b[1;32m 997\u001b[0m save(k)\n\u001b[0;32m--> 998\u001b[0m \u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mv\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 999\u001b[0m write(SETITEMS)\n\u001b[1;32m 1000\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m n:\n", "File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m/usr/lib/python3.11/pickle.py:578\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 576\u001b[0m reduce \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(obj, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__reduce_ex__\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n\u001b[1;32m 577\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m reduce \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 578\u001b[0m rv \u001b[38;5;241m=\u001b[39m reduce(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mproto)\n\u001b[1;32m 579\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 580\u001b[0m reduce \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(obj, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__reduce__\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n", "\u001b[0;31mTypeError\u001b[0m: cannot pickle 'dict_keys' object" ] } ], "source": [ "target_variables = ['solarradiation', 'solarenergy', 'uvindex']\n", "\n", "# Salva tutto direttamente\n", "save_models_and_scalers(\n", " models=models,\n", " scalers=scalers, # Passiamo direttamente il dizionario degli scalers così com'è\n", " target_variables=target_variables\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T16:14:44.770508Z", "start_time": "2024-10-24T13:29:15.181470Z" }, "id": "BlQK-7y7thA-" }, "outputs": [], "source": [ "data_after_2010 = weather_data[weather_data['year'] >= 2010].copy()\n", "data_before_2010 = weather_data[weather_data['year'] < 2010].copy()\n", "# Previsione delle variabili mancanti per data_before_2010\n", "# Prepara data_before_2010\n", "data_before_2010 = data_before_2010.sort_values('datetime')\n", "data_before_2010.set_index('datetime', inplace=True)\n", "\n", "data_after_2010 = data_after_2010.sort_values('datetime')\n", "data_after_2010.set_index('datetime', inplace=True)\n", "\n", "# Assicurati che le features non abbiano valori mancanti\n", "data_before_2010[features] = data_before_2010[features].ffill()\n", "data_before_2010[features] = data_before_2010[features].bfill()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T18:50:48.087413Z", "start_time": "2024-10-24T18:47:52.511763Z" }, "id": "r_hFmenDthA-", "outputId": "650f8755-f6f6-47b4-fc74-c194dd81bf64" }, "outputs": [], "source": [ "#models, scaler_X, scalers_y, target_variables = load_models_and_scalers()\n", "\n", "# Effettua predizioni\n", "predictions = predict_solar_variables(\n", " data_before_2010=data_before_2010,\n", " features=features,\n", " models=models,\n", " scalers=scalers, # dizionario completo degli scalers\n", " target_variables=target_variables\n", ")\n", "\n", "# Crea dataset completo\n", "weather_data_complete = create_complete_dataset(\n", " data_before_2010,\n", " data_after_2010,\n", " predictions\n", ")\n", "\n", "# Salva il risultato\n", "weather_data_complete.reset_index(inplace=True)\n", "weather_data_complete.to_parquet(\n", " './kaggle/working/data/weather_data_complete.parquet',\n", " index=False\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "IKObKOVEthA-" }, "source": [ "## 2. Esplorazione dei Dati Meteo" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-23T06:10:46.688323Z", "start_time": "2024-10-23T06:10:46.586185Z" }, "id": "Z64O5RD9thA-" }, "outputs": [], "source": [ "weather_data = pd.read_parquet('./kaggle/working/data/weather_data_complete.parquet')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-23T06:10:50.718574Z", "start_time": "2024-10-23T06:10:46.901554Z" }, "id": "f3j3IUvothA-", "outputId": "a7f38468-f2f4-491e-eda5-ba6e6b8064ee" }, "outputs": [], "source": [ "# Visualizzazione delle tendenze temporali\n", "fig, axes = plt.subplots(6, 1, figsize=(15, 20))\n", "weather_data.set_index('date')['temp'].plot(ax=axes[0], title='Temperatura Media Giornaliera')\n", "weather_data.set_index('date')['humidity'].plot(ax=axes[1], title='Umidità Media Giornaliera')\n", "weather_data.set_index('date')['solarradiation'].plot(ax=axes[2], title='Radiazione Solare Giornaliera')\n", "weather_data.set_index('date')['solarenergy'].plot(ax=axes[3], title='Radiazione Solare Giornaliera')\n", "weather_data.set_index('date')['uvindex'].plot(ax=axes[4], title='Precipitazioni Giornaliere')\n", "weather_data.set_index('date')['precip'].plot(ax=axes[4], title='Precipitazioni Giornaliere')\n", "plt.tight_layout()\n", "plt.show()\n", "save_plot(plt, 'weather_trends')\n", "plt.close()" ] }, { "cell_type": "markdown", "metadata": { "id": "DHcEwp3pthA_" }, "source": [ "## 3. Simulazione dei Dati di Produzione Annuale" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-23T06:10:51.081621Z", "start_time": "2024-10-23T06:10:51.044080Z" }, "id": "5oG_nhbMthA_" }, "outputs": [], "source": [ "olive_varieties = pd.read_csv('./kaggle/input/olive-oil/variety_olive_oil_production.csv')\n", "\n", "olive_varieties = add_olive_water_consumption_correlation(olive_varieties)\n", "\n", "olive_varieties.to_parquet(\"./kaggle/working/data/olive_varieties.parquet\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T10:59:32.356335Z", "start_time": "2024-10-24T10:59:32.229812Z" }, "id": "Y2IH37lAthA_", "outputId": "d14e77c8-a4fb-4328-f6c6-de788bca8188" }, "outputs": [], "source": [ "olive_varieties = pd.read_parquet(\"./kaggle/working/data/olive_varieties.parquet\")\n", "\n", "weather_data = pd.read_parquet('./kaggle/working/data/weather_data_complete.parquet')\n", "\n", "simulated_data = simulate_olive_production_parallel(weather_data, olive_varieties, 1000, random_state_value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualizza il mapping delle tecniche\n", "print_technique_mapping()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-23T06:10:54.639402Z", "start_time": "2024-10-23T06:10:52.895228Z" }, "id": "4izJmAsbthA_", "outputId": "9f871e9b-c9b5-406d-f482-b925befd9dad" }, "outputs": [], "source": [ "simulated_data = pd.read_parquet(\"./kaggle/working/data/simulated_data.parquet\")\n", "\n", "# Esecuzione dell'analisi\n", "comparison_data = prepare_comparison_data(simulated_data, olive_varieties)\n", "\n", "# Genera i grafici\n", "plot_variety_comparison(comparison_data, 'Avg Olive Production (kg/ha)')\n", "plot_variety_comparison(comparison_data, 'Avg Oil Production (L/ha)')\n", "plot_variety_comparison(comparison_data, 'Avg Water Need (m³/ha)')\n", "plot_variety_comparison(comparison_data, 'Oil Efficiency (L/kg)')\n", "plot_variety_comparison(comparison_data, 'Water Efficiency (L oil/m³ water)')\n", "plot_efficiency_vs_production(comparison_data)\n", "plot_water_efficiency_vs_production(comparison_data)\n", "plot_water_need_vs_oil_production(comparison_data)\n", "\n", "# Analisi per tecnica\n", "technique_data = analyze_by_technique(simulated_data, olive_varieties)\n", "\n", "print(technique_data)\n", "\n", "# Stampa un sommario statistico\n", "print(\"Comparison by Variety:\")\n", "print(comparison_data.set_index('Variety'))\n", "print(\"\\nBest Varieties by Water Efficiency:\")\n", "print(comparison_data.sort_values('Water Efficiency (L oil/m³ water)', ascending=False).head())" ] }, { "cell_type": "markdown", "metadata": { "id": "dwhl4ID_thBA" }, "source": [ "## 4. Analisi della Relazione tra Meteo e Produzione" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-23T06:10:55.903873Z", "start_time": "2024-10-23T06:10:54.655058Z" }, "id": "b28MG3NGthBA", "outputId": "ac0759ce-ee6e-49e0-9ddd-a70d01ea18ff" }, "outputs": [], "source": [ "# Uso delle funzioni\n", "full_data = get_full_data(simulated_data, olive_varieties)\n", "\n", "# Assumiamo che 'selected_variety' sia definito altrove nel codice\n", "# Per esempio:\n", "selected_variety = 'nocellara_delletna'\n", "\n", "analyze_correlations(full_data, selected_variety)" ] }, { "cell_type": "markdown", "metadata": { "id": "OZQ6hHFLthBA" }, "source": [ "## 5. Preparazione del Modello di Machine Learning" ] }, { "cell_type": "markdown", "metadata": { "id": "smX8MBhithBA" }, "source": [ "## Divisione train/validation/test:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T10:25:49.473595Z", "start_time": "2024-10-24T10:25:49.199833Z" }, "id": "tupaX2LNthBA", "outputId": "0a7968cd-9fef-4873-b834-d6b13fe805be" }, "outputs": [], "source": [ "simulated_data = pd.read_parquet(\"./kaggle/working/data/simulated_data.parquet\")\n", "olive_varieties = pd.read_parquet(\"./kaggle/working/data/olive_varieties.parquet\")\n", "\n", "(train_data, train_targets), (val_data, val_targets), (test_data, test_targets), scalers = prepare_transformer_data(simulated_data, olive_varieties)\n", "\n", "scaler_temporal, scaler_static, scaler_y = scalers\n", "\n", "print(\"Temporal data shape:\", train_data['temporal'].shape)\n", "print(\"Static data shape:\", train_data['static'].shape)\n", "print(\"Target shape:\", train_targets.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "kE7oohfsthBB" }, "source": [ "## OliveOilTransformer" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T09:32:37.506903Z", "start_time": "2024-10-24T09:32:36.905756Z" }, "id": "_l868dFFthBB", "outputId": "b67993d4-a49e-4b75-d346-bf7f362f932d" }, "outputs": [], "source": [ "@keras.saving.register_keras_serializable()\n", "class DataAugmentation(tf.keras.layers.Layer):\n", " \"\"\"Custom layer per l'augmentation dei dati\"\"\"\n", " def __init__(self, noise_stddev=0.03, **kwargs):\n", " super().__init__(**kwargs)\n", " self.noise_stddev = noise_stddev\n", "\n", " def call(self, inputs, training=None):\n", " if training:\n", " return inputs + tf.random.normal(\n", " shape=tf.shape(inputs), \n", " mean=0.0, \n", " stddev=self.noise_stddev\n", " )\n", " return inputs\n", "\n", " def get_config(self):\n", " config = super().get_config()\n", " config.update({\"noise_stddev\": self.noise_stddev})\n", " return config\n", "\n", "@keras.saving.register_keras_serializable()\n", "class PositionalEncoding(tf.keras.layers.Layer):\n", " \"\"\"Custom layer per l'encoding posizionale\"\"\"\n", " def __init__(self, d_model, **kwargs):\n", " super().__init__(**kwargs)\n", " self.d_model = d_model\n", " \n", " def build(self, input_shape):\n", " _, seq_length, _ = input_shape\n", " \n", " # Crea la matrice di encoding posizionale\n", " position = tf.range(seq_length, dtype=tf.float32)[:, tf.newaxis]\n", " div_term = tf.exp(\n", " tf.range(0, self.d_model, 2, dtype=tf.float32) * \n", " (-tf.math.log(10000.0) / self.d_model)\n", " )\n", " \n", " # Calcola sin e cos\n", " pos_encoding = tf.zeros((1, seq_length, self.d_model))\n", " pos_encoding_even = tf.sin(position * div_term)\n", " pos_encoding_odd = tf.cos(position * div_term)\n", " \n", " # Assegna i valori alle posizioni pari e dispari\n", " pos_encoding = tf.concat(\n", " [tf.expand_dims(pos_encoding_even, -1), \n", " tf.expand_dims(pos_encoding_odd, -1)], \n", " axis=-1\n", " )\n", " pos_encoding = tf.reshape(pos_encoding, (1, seq_length, -1))\n", " pos_encoding = pos_encoding[:, :, :self.d_model]\n", " \n", " # Salva l'encoding come peso non trainabile\n", " self.pos_encoding = self.add_weight(\n", " shape=(1, seq_length, self.d_model),\n", " initializer=tf.keras.initializers.Constant(pos_encoding),\n", " trainable=False,\n", " name='positional_encoding'\n", " )\n", " \n", " super().build(input_shape)\n", "\n", " def call(self, inputs):\n", " # Broadcast l'encoding posizionale sul batch\n", " batch_size = tf.shape(inputs)[0]\n", " pos_encoding_tiled = tf.tile(self.pos_encoding, [batch_size, 1, 1])\n", " return inputs + pos_encoding_tiled\n", "\n", " def get_config(self):\n", " config = super().get_config()\n", " config.update({\"d_model\": self.d_model})\n", " return config\n", "\n", "@keras.saving.register_keras_serializable()\n", "class WarmUpLearningRateSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):\n", " \"\"\"Custom learning rate schedule with linear warmup and exponential decay.\"\"\"\n", " \n", " def __init__(self, initial_learning_rate=1e-3, warmup_steps=500, decay_steps=5000):\n", " super().__init__()\n", " self.initial_learning_rate = initial_learning_rate\n", " self.warmup_steps = warmup_steps\n", " self.decay_steps = decay_steps\n", "\n", " def __call__(self, step):\n", " warmup_pct = tf.cast(step, tf.float32) / self.warmup_steps\n", " warmup_lr = self.initial_learning_rate * warmup_pct\n", " decay_factor = tf.pow(0.1, tf.cast(step, tf.float32) / self.decay_steps)\n", " decayed_lr = self.initial_learning_rate * decay_factor\n", " return tf.where(step < self.warmup_steps, warmup_lr, decayed_lr)\n", "\n", " def get_config(self):\n", " return {\n", " 'initial_learning_rate': self.initial_learning_rate,\n", " 'warmup_steps': self.warmup_steps,\n", " 'decay_steps': self.decay_steps\n", " }\n", "\n", "def create_olive_oil_transformer(temporal_shape, static_shape, num_outputs,\n", " d_model=128, num_heads=8, ff_dim=256,\n", " num_transformer_blocks=4, mlp_units=[256, 128, 64],\n", " dropout=0.2):\n", " \"\"\"\n", " Crea un transformer per la predizione della produzione di olio d'oliva.\n", " \"\"\"\n", " # Input layers\n", " temporal_input = tf.keras.layers.Input(shape=temporal_shape, name='temporal')\n", " static_input = tf.keras.layers.Input(shape=static_shape, name='static')\n", "\n", " # === TEMPORAL PATH ===\n", " x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(temporal_input)\n", " x = DataAugmentation()(x)\n", "\n", " # Temporal projection\n", " x = tf.keras.layers.Dense(\n", " d_model // 2,\n", " activation='gelu',\n", " kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n", " )(x)\n", " x = tf.keras.layers.Dropout(dropout)(x)\n", " x = tf.keras.layers.Dense(\n", " d_model,\n", " activation='gelu',\n", " kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n", " )(x)\n", "\n", " # Positional encoding\n", " x = PositionalEncoding(d_model)(x)\n", "\n", " # Transformer blocks\n", " skip_connection = x\n", " for _ in range(num_transformer_blocks):\n", " # Self-attention\n", " attention_output = tf.keras.layers.MultiHeadAttention(\n", " num_heads=num_heads,\n", " key_dim=d_model // num_heads,\n", " value_dim=d_model // num_heads\n", " )(x, x)\n", " attention_output = tf.keras.layers.Dropout(dropout)(attention_output)\n", "\n", " # Residual connection con pesi addestrabili\n", " residual_weights = tf.keras.layers.Dense(d_model, activation='sigmoid')(x)\n", " x = tf.keras.layers.Add()([x, residual_weights * attention_output])\n", " x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)\n", "\n", " # Feed-forward network\n", " ffn = tf.keras.layers.Dense(ff_dim, activation=\"gelu\")(x)\n", " ffn = tf.keras.layers.Dropout(dropout)(ffn)\n", " ffn = tf.keras.layers.Dense(d_model)(ffn)\n", " ffn = tf.keras.layers.Dropout(dropout)(ffn)\n", "\n", " # Second residual connection\n", " x = tf.keras.layers.Add()([x, ffn])\n", " x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)\n", "\n", " # Add final skip connection\n", " x = tf.keras.layers.Add()([x, skip_connection])\n", "\n", " # Temporal pooling\n", " attention_pooled = tf.keras.layers.MultiHeadAttention(\n", " num_heads=num_heads,\n", " key_dim=d_model // 4\n", " )(x, x)\n", " attention_pooled = tf.keras.layers.GlobalAveragePooling1D()(attention_pooled)\n", "\n", " # Additional pooling operations\n", " avg_pooled = tf.keras.layers.GlobalAveragePooling1D()(x)\n", " max_pooled = tf.keras.layers.GlobalMaxPooling1D()(x)\n", "\n", " # Combine pooling results\n", " temporal_features = tf.keras.layers.Concatenate()(\n", " [attention_pooled, avg_pooled, max_pooled]\n", " )\n", "\n", " # === STATIC PATH ===\n", " static_features = tf.keras.layers.LayerNormalization(epsilon=1e-6)(static_input)\n", " for units in [256, 128, 64]:\n", " static_features = tf.keras.layers.Dense(\n", " units,\n", " activation='gelu',\n", " kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n", " )(static_features)\n", " static_features = tf.keras.layers.Dropout(dropout)(static_features)\n", "\n", " # === FEATURE FUSION ===\n", " combined = tf.keras.layers.Concatenate()([temporal_features, static_features])\n", "\n", " # === MLP HEAD ===\n", " x = combined\n", " for units in mlp_units:\n", " x = tf.keras.layers.BatchNormalization()(x)\n", " x = tf.keras.layers.Dense(\n", " units,\n", " activation=\"gelu\",\n", " kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n", " )(x)\n", " x = tf.keras.layers.Dropout(dropout)(x)\n", "\n", " # Output layer\n", " outputs = tf.keras.layers.Dense(\n", " num_outputs,\n", " activation='linear',\n", " kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n", " )(x)\n", "\n", " # Create model\n", " model = tf.keras.Model(\n", " inputs={'temporal': temporal_input, 'static': static_input},\n", " outputs=outputs,\n", " name='OilTransformer'\n", " )\n", " \n", " return model\n", "\n", "\n", "def create_transformer_callbacks(target_names, val_data, val_targets):\n", " \"\"\"\n", " Crea i callbacks per il training del modello.\n", " \n", " Parameters:\n", " -----------\n", " target_names : list\n", " Lista dei nomi dei target per il monitoraggio specifico\n", " val_data : dict\n", " Dati di validazione\n", " val_targets : array\n", " Target di validazione\n", " \n", " Returns:\n", " --------\n", " list\n", " Lista dei callbacks configurati\n", " \"\"\"\n", "\n", " # Custom Metric per target specifici\n", " class TargetSpecificMetric(tf.keras.callbacks.Callback):\n", " def __init__(self, validation_data, target_names):\n", " super().__init__()\n", " self.validation_data = validation_data\n", " self.target_names = target_names\n", "\n", " def on_epoch_end(self, epoch, logs={}):\n", " x_val, y_val = self.validation_data\n", " y_pred = self.model.predict(x_val, verbose=0)\n", "\n", " for i, name in enumerate(self.target_names):\n", " mae = np.mean(np.abs(y_val[:, i] - y_pred[:, i]))\n", " logs[f'val_{name}_mae'] = mae\n", "\n", " # Crea le cartelle per i checkpoint e i log se non esistono\n", " os.makedirs('./kaggle/working/models/oil_transformer/checkpoints', exist_ok=True)\n", " os.makedirs('./kaggle/working/models/oil_transformer/logs', exist_ok=True)\n", "\n", " callbacks = [\n", " # Early Stopping\n", " tf.keras.callbacks.EarlyStopping(\n", " monitor='val_loss',\n", " patience=20,\n", " restore_best_weights=True,\n", " min_delta=0.0005,\n", " mode='min'\n", " ),\n", "\n", " # Model Checkpoint\n", " tf.keras.callbacks.ModelCheckpoint(\n", " filepath='./kaggle/working/models/oil_transformer/checkpoints/model_{epoch:02d}_{val_loss:.4f}.h5',\n", " monitor='val_loss',\n", " save_best_only=True,\n", " mode='min',\n", " save_weights_only=True\n", " ),\n", "\n", " # Metric per target specifici\n", " TargetSpecificMetric(\n", " validation_data=(val_data, val_targets),\n", " target_names=target_names\n", " ),\n", "\n", " # Reduce LR on Plateau\n", " tf.keras.callbacks.ReduceLROnPlateau(\n", " monitor='val_loss',\n", " factor=0.5,\n", " patience=10,\n", " min_lr=1e-6,\n", " verbose=1\n", " ),\n", "\n", " # TensorBoard logging\n", " tf.keras.callbacks.TensorBoard(\n", " log_dir='./kaggle/working/models/oil_transformer/logs',\n", " histogram_freq=1,\n", " write_graph=True,\n", " update_freq='epoch'\n", " )\n", " ]\n", "\n", " return callbacks\n", "\n", "def compile_model(model, learning_rate=1e-3):\n", " \"\"\"\n", " Compila il modello con le impostazioni standard.\n", " \"\"\"\n", " lr_schedule = WarmUpLearningRateSchedule(\n", " initial_learning_rate=learning_rate,\n", " warmup_steps=500,\n", " decay_steps=5000\n", " )\n", " \n", " model.compile(\n", " optimizer=tf.keras.optimizers.AdamW(\n", " learning_rate=lr_schedule,\n", " weight_decay=0.01\n", " ),\n", " loss=tf.keras.losses.Huber(),\n", " metrics=['mae']\n", " )\n", "\n", " return model\n", "\n", "\n", "def setup_transformer_training(train_data, train_targets, val_data, val_targets):\n", " \"\"\"\n", " Configura e prepara il transformer con dimensioni dinamiche basate sui dati.\n", " \"\"\"\n", " # Estrai le shape dai dati\n", " temporal_shape = (train_data['temporal'].shape[1], train_data['temporal'].shape[2])\n", " static_shape = (train_data['static'].shape[1],)\n", " num_outputs = train_targets.shape[1]\n", "\n", " print(f\"Shape rilevate:\")\n", " print(f\"- Temporal shape: {temporal_shape}\")\n", " print(f\"- Static shape: {static_shape}\")\n", " print(f\"- Numero di output: {num_outputs}\")\n", "\n", " # Target names basati sul numero di output\n", " target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n", "\n", " # Assicurati che il numero di target names corrisponda al numero di output\n", " assert len(target_names) == num_outputs, \\\n", " f\"Il numero di target names ({len(target_names)}) non corrisponde al numero di output ({num_outputs})\"\n", "\n", " # Crea il modello con le dimensioni rilevate\n", " model = create_olive_oil_transformer(\n", " temporal_shape=temporal_shape,\n", " static_shape=static_shape,\n", " num_outputs=num_outputs\n", " )\n", "\n", " # Compila il modello\n", " model = compile_model(model)\n", "\n", " # Crea i callbacks\n", " callbacks = create_transformer_callbacks(target_names, val_data, val_targets)\n", "\n", " return model, callbacks, target_names\n", "\n", "def train_transformer(train_data, train_targets, val_data, val_targets, epochs=150, batch_size=64, save_name='final_model'):\n", " \"\"\"\n", " Funzione principale per l'addestramento del transformer.\n", " \"\"\"\n", " # Setup del modello\n", " model, callbacks, target_names = setup_transformer_training(\n", " train_data, train_targets, val_data, val_targets\n", " )\n", "\n", " # Mostra il summary del modello\n", " model.summary()\n", " os.makedirs(f\"./kaggle/working/models/oil_transformer/\", exist_ok=True)\n", " keras.utils.plot_model(model, f\"./kaggle/working/models/oil_transformer/{save_name}.png\", show_shapes=True)\n", "\n", " # Training\n", " history = model.fit(\n", " x=train_data,\n", " y=train_targets,\n", " validation_data=(val_data, val_targets),\n", " epochs=epochs,\n", " batch_size=batch_size,\n", " callbacks=callbacks,\n", " verbose=1,\n", " shuffle=True\n", " )\n", "\n", " # Salva il modello finale\n", " save_path = f'./kaggle/working/models/oil_transformer/{save_name}.keras'\n", " model.save(save_path, save_format='keras')\n", " \n", " os.makedirs(f'./kaggle/working/models/oil_transformer/weights/', exist_ok=True)\n", " model.save_weights(f'./kaggle/working/models/oil_transformer/weights')\n", " print(f\"\\nModello salvato in: {save_path}\")\n", "\n", " return model, history" ] }, { "cell_type": "markdown", "metadata": { "id": "aytSjU1UthBB" }, "source": [ "## Model Training" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2024-10-24T09:33:43.625381Z", "start_time": "2024-10-24T09:33:34.088970Z" }, "id": "xE3iTWonthBB", "outputId": "a784254e-deea-4fd3-8578-6a0dbbd45bd7" }, "outputs": [], "source": [ "model, history = train_transformer(train_data, train_targets, val_data, val_targets)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hPPbvFYmthBB", "outputId": "e6570501-00e1-4dde-81e2-4712652a46b3" }, "outputs": [], "source": [ "# Calcola gli errori reali\n", "percentage_errors, absolute_errors = calculate_real_error(model, val_data, val_targets, scaler_y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def evaluate_model_performance(model, data, targets, set_name=\"\"):\n", " \"\"\"\n", " Valuta le performance del modello su un set di dati specifico.\n", " \"\"\"\n", " predictions = model.predict(data, verbose=0)\n", " \n", " target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n", " metrics = {}\n", " \n", " for i, name in enumerate(target_names):\n", " mae = np.mean(np.abs(targets[:, i] - predictions[:, i]))\n", " mse = np.mean(np.square(targets[:, i] - predictions[:, i]))\n", " rmse = np.sqrt(mse)\n", " mape = np.mean(np.abs((targets[:, i] - predictions[:, i]) / (targets[:, i] + 1e-7))) * 100\n", " \n", " metrics[f\"{name}_mae\"] = mae\n", " metrics[f\"{name}_rmse\"] = rmse\n", " metrics[f\"{name}_mape\"] = mape\n", " \n", " if set_name:\n", " print(f\"\\nPerformance sul set {set_name}:\")\n", " for metric, value in metrics.items():\n", " print(f\"{metric}: {value:.4f}\")\n", " \n", " return metrics\n", "\n", "def retrain_model(base_model, train_data, train_targets, \n", " val_data, val_targets, \n", " test_data, test_targets,\n", " epochs=50, batch_size=128):\n", " \"\"\"\n", " Implementa il retraining del modello con i dati combinati.\n", " \"\"\"\n", " print(\"Valutazione performance iniziali del modello...\")\n", " initial_metrics = {\n", " 'train': evaluate_model_performance(base_model, train_data, train_targets, \"training\"),\n", " 'val': evaluate_model_performance(base_model, val_data, val_targets, \"validazione\"),\n", " 'test': evaluate_model_performance(base_model, test_data, test_targets, \"test\")\n", " }\n", " \n", " # Combina i dati per il retraining\n", " combined_data = {\n", " 'temporal': np.concatenate([train_data['temporal'], val_data['temporal'], test_data['temporal']]),\n", " 'static': np.concatenate([train_data['static'], val_data['static'], test_data['static']])\n", " }\n", " combined_targets = np.concatenate([train_targets, val_targets, test_targets])\n", " \n", " # Crea una nuova suddivisione per la validazione\n", " indices = np.arange(len(combined_targets))\n", " np.random.shuffle(indices)\n", " \n", " split_idx = int(len(indices) * 0.9)\n", " train_idx, val_idx = indices[:split_idx], indices[split_idx:]\n", " \n", " # Prepara i dati per il retraining\n", " retrain_data = {k: v[train_idx] for k, v in combined_data.items()}\n", " retrain_targets = combined_targets[train_idx]\n", " retrain_val_data = {k: v[val_idx] for k, v in combined_data.items()}\n", " retrain_val_targets = combined_targets[val_idx]\n", " \n", " checkpoint_path = './kaggle/working/models/oil_transformer/retrain_checkpoints'\n", " os.makedirs(checkpoint_path, exist_ok=True)\n", " \n", " # Configura callbacks\n", " callbacks = [\n", " tf.keras.callbacks.EarlyStopping(\n", " monitor='val_loss',\n", " patience=10,\n", " restore_best_weights=True,\n", " min_delta=0.0001\n", " ),\n", " tf.keras.callbacks.ReduceLROnPlateau(\n", " monitor='val_loss',\n", " factor=0.2,\n", " patience=5,\n", " min_lr=1e-6,\n", " verbose=1\n", " ),\n", " tf.keras.callbacks.ModelCheckpoint(\n", " filepath=os.path.join(checkpoint_path, 'model_{epoch:02d}_{val_loss:.4f}.keras'),\n", " monitor='val_loss',\n", " save_best_only=True,\n", " mode='min',\n", " save_weights_only=True\n", " )\n", " ]\n", " \n", " # Imposta learning rate per il fine-tuning\n", " optimizer = tf.keras.optimizers.AdamW(\n", " learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(\n", " initial_learning_rate=1e-4,\n", " decay_steps=1000,\n", " decay_rate=0.9\n", " ),\n", " weight_decay=0.01\n", " )\n", " \n", " # Ricompila il modello con il nuovo optimizer\n", " base_model.compile(\n", " optimizer=optimizer,\n", " loss=tf.keras.losses.Huber(),\n", " metrics=['mae']\n", " )\n", " \n", " print(\"\\nAvvio retraining...\")\n", " history = base_model.fit(\n", " retrain_data,\n", " retrain_targets,\n", " validation_data=(retrain_val_data, retrain_val_targets),\n", " epochs=epochs,\n", " batch_size=batch_size,\n", " callbacks=callbacks,\n", " verbose=1\n", " )\n", " \n", " print(\"\\nValutazione performance finali...\")\n", " final_metrics = {\n", " 'train': evaluate_model_performance(base_model, train_data, train_targets, \"training\"),\n", " 'val': evaluate_model_performance(base_model, val_data, val_targets, \"validazione\"),\n", " 'test': evaluate_model_performance(base_model, test_data, test_targets, \"test\")\n", " }\n", " \n", " # Salva il modello finale\n", " save_path = './kaggle/working/models/oil_transformer/retrained_model.keras'\n", " os.makedirs(os.path.dirname(save_path), exist_ok=True)\n", " base_model.save(save_path, save_format='keras')\n", " print(f\"\\nModello riaddestrato salvato in: {save_path}\")\n", " \n", " # Report miglioramenti\n", " print(\"\\nMiglioramenti delle performance:\")\n", " for dataset in ['train', 'val', 'test']:\n", " print(f\"\\nSet {dataset}:\")\n", " for metric in initial_metrics[dataset].keys():\n", " initial = initial_metrics[dataset][metric]\n", " final = final_metrics[dataset][metric]\n", " improvement = ((initial - final) / initial) * 100\n", " print(f\"{metric}: {improvement:.2f}% di miglioramento\")\n", " \n", " return base_model, history, final_metrics\n", "\n", "def start_retraining(model_path, train_data, train_targets, \n", " val_data, val_targets, \n", " test_data, test_targets,\n", " epochs=50, batch_size=128):\n", " \"\"\"\n", " Avvia il processo di retraining in modo sicuro.\n", " \"\"\"\n", " try:\n", " print(\"Caricamento del modello...\")\n", " base_model = tf.keras.models.load_model(model_path, compile=False)\n", " print(\"Modello caricato con successo!\")\n", " \n", " return retrain_model(\n", " base_model=base_model,\n", " train_data=train_data,\n", " train_targets=train_targets,\n", " val_data=val_data,\n", " val_targets=val_targets,\n", " test_data=test_data,\n", " test_targets=test_targets,\n", " epochs=epochs,\n", " batch_size=batch_size\n", " )\n", " except Exception as e:\n", " print(f\"Errore durante il retraining: {str(e)}\")\n", " raise" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_path = './kaggle/working/models/oil_transformer/final_model.keras'\n", "\n", "retrained_model, retrain_history, final_metrics = start_retraining(\n", " model_path=model_path,\n", " train_data=train_data,\n", " train_targets=train_targets,\n", " val_data=val_data,\n", " val_targets=val_targets,\n", " test_data=test_data,\n", " test_targets=test_targets,\n", " epochs=50,\n", " batch_size=128\n", ")\n", "\n", "# Visualizza i risultati\n", "visualize_retraining_results(retrain_history, initial_metrics, final_metrics)" ] }, { "cell_type": "markdown", "metadata": { "id": "4BAI1zsJthBC" }, "source": [ "## 8. Conclusioni e Prossimi Passi\n", "\n", "In questo notebook, abbiamo:\n", "1. Caricato e analizzato i dati meteorologici\n", "2. Simulato la produzione annuale di olive basata sui dati meteo\n", "3. Esplorato le relazioni tra variabili meteorologiche e produzione di olive\n", "4. Creato e valutato un modello di machine learning per prevedere la produzione\n", "5. Utilizzato ARIMA per fare previsioni meteo\n", "6. Previsto la produzione di olive per il prossimo anno\n", "\n", "Prossimi passi:\n", "- Raccogliere dati reali sulla produzione di olive per sostituire i dati simulati\n", "- Esplorare modelli più avanzati, come le reti neurali o i modelli di ensemble\n", "- Incorporare altri fattori che potrebbero influenzare la produzione, come le pratiche agricole o l'età degli alberi\n", "- Sviluppare una dashboard interattiva basata su questo modello" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "A100", "provenance": [] }, "kaggle": { "accelerator": "none", "dataSources": [ { "datasetId": 5950719, "sourceId": 9725208, "sourceType": "datasetVersion" }, { "datasetId": 5954901, "sourceId": 9730815, "sourceType": "datasetVersion" } ], "dockerImageVersionId": 30787, "isGpuEnabled": false, "isInternetEnabled": true, "language": "python", "sourceType": "notebook" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0rc1" } }, "nbformat": 4, "nbformat_minor": 4 }