{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "vwMqHwWTthA4"
},
"source": [
"# Analysis and Forecasting of Olive Oil Production\n",
"\n",
"This notebook explores the relationship between weather data and annual olive oil production, with the goal of building a predictive model."
]
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-29T15:15:51.992629Z",
|
|
"start_time": "2024-10-29T15:15:51.940019Z"
|
|
}
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease\n",
|
|
"Get:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB] \n",
|
|
"Get:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB] \n",
|
|
"Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n",
|
|
"Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB] \n",
|
|
"Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2672 kB]\n",
|
|
"Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1452 kB]\n",
|
|
"Get:8 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [3241 kB]\n",
|
|
"Get:9 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2397 kB]\n",
|
|
"Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1163 kB]\n",
|
|
"Fetched 11.3 MB in 2s (5846 kB/s) \n",
|
|
"Reading package lists... Done\n",
|
|
"Reading package lists... Done\n",
|
|
"Building dependency tree... Done\n",
|
|
"Reading state information... Done\n",
|
|
"graphviz is already the newest version (2.42.2-6ubuntu0.1).\n",
|
|
"0 upgraded, 0 newly installed, 0 to remove and 120 not upgraded.\n",
|
|
"Requirement already satisfied: tensorflow in /usr/local/lib/python3.11/dist-packages (2.14.0)\n",
|
|
"Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.0.0)\n",
|
|
"Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.6.3)\n",
|
|
"Requirement already satisfied: flatbuffers>=23.5.26 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (23.5.26)\n",
|
|
"Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.5.4)\n",
|
|
"Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.2.0)\n",
|
|
"Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.9.0)\n",
|
|
"Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (16.0.6)\n",
|
|
"Requirement already satisfied: ml-dtypes==0.2.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.2.0)\n",
|
|
"Requirement already satisfied: numpy>=1.23.5 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.26.0)\n",
|
|
"Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (3.3.0)\n",
|
|
"Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from tensorflow) (23.1)\n",
|
|
"Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (4.24.3)\n",
|
|
"Requirement already satisfied: setuptools in /usr/local/lib/python3.11/dist-packages (from tensorflow) (68.2.2)\n",
|
|
"Requirement already satisfied: six>=1.12.0 in /usr/lib/python3/dist-packages (from tensorflow) (1.16.0)\n",
|
|
"Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.3.0)\n",
|
|
"Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (4.8.0)\n",
|
|
"Requirement already satisfied: wrapt<1.15,>=1.11.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.14.1)\n",
|
|
"Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (0.37.1)\n",
|
|
"Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (1.58.0)\n",
|
|
"Requirement already satisfied: tensorboard<2.15,>=2.14 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n",
|
|
"Requirement already satisfied: tensorflow-estimator<2.15,>=2.14.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n",
|
|
"Requirement already satisfied: keras<2.15,>=2.14.0 in /usr/local/lib/python3.11/dist-packages (from tensorflow) (2.14.0)\n",
|
|
"Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from astunparse>=1.6.0->tensorflow) (0.41.2)\n",
|
|
"Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.23.1)\n",
|
|
"Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (1.0.0)\n",
|
|
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (3.4.4)\n",
|
|
"Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.31.0)\n",
|
|
"Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (0.7.1)\n",
|
|
"Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from tensorboard<2.15,>=2.14->tensorflow) (2.3.7)\n",
|
|
"Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (5.3.1)\n",
|
|
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (0.3.0)\n",
|
|
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (4.9)\n",
|
|
"Requirement already satisfied: urllib3>=2.0.5 in /usr/local/lib/python3.11/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (2.0.5)\n",
|
|
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow) (1.3.1)\n",
|
|
"Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (3.2.0)\n",
|
|
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (3.4)\n",
|
|
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3,>=2.21.0->tensorboard<2.15,>=2.14->tensorflow) (2023.7.22)\n",
|
|
"Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.11/dist-packages (from werkzeug>=1.0.1->tensorboard<2.15,>=2.14->tensorflow) (2.1.3)\n",
|
|
"Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.11/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow) (0.5.0)\n",
|
|
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/lib/python3/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow) (3.2.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (1.26.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (2.2.3)\n",
|
|
"Requirement already satisfied: numpy>=1.23.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (1.26.0)\n",
|
|
"Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas) (2.8.2)\n",
|
|
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2024.2)\n",
|
|
"Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas) (2024.2)\n",
|
|
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: keras in /usr/local/lib/python3.11/dist-packages (2.14.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: scikit-learn in /usr/local/lib/python3.11/dist-packages (1.5.2)\n",
|
|
"Requirement already satisfied: numpy>=1.19.5 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.26.0)\n",
|
|
"Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.14.1)\n",
|
|
"Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (1.4.2)\n",
|
|
"Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn) (3.5.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.8.0)\n",
|
|
"Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.1.1)\n",
|
|
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.11.0)\n",
|
|
"Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.42.1)\n",
|
|
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.5)\n",
|
|
"Requirement already satisfied: numpy<2,>=1.21 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.26.0)\n",
|
|
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (23.1)\n",
|
|
"Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (10.0.1)\n",
|
|
"Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (3.2.0)\n",
|
|
"Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.8.2)\n",
|
|
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: joblib in /usr/local/lib/python3.11/dist-packages (1.4.2)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: pyarrow in /usr/local/lib/python3.11/dist-packages (18.0.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: fastparquet in /usr/local/lib/python3.11/dist-packages (2024.5.0)\n",
|
|
"Requirement already satisfied: pandas>=1.5.0 in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2.2.3)\n",
|
|
"Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from fastparquet) (1.26.0)\n",
|
|
"Requirement already satisfied: cramjam>=2.3 in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2.9.0)\n",
|
|
"Requirement already satisfied: fsspec in /usr/local/lib/python3.11/dist-packages (from fastparquet) (2024.10.0)\n",
|
|
"Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from fastparquet) (23.1)\n",
|
|
"Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2.8.2)\n",
|
|
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2024.2)\n",
|
|
"Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.5.0->fastparquet) (2024.2)\n",
|
|
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas>=1.5.0->fastparquet) (1.16.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: scipy in /usr/local/lib/python3.11/dist-packages (1.14.1)\n",
|
|
"Requirement already satisfied: numpy<2.3,>=1.23.5 in /usr/local/lib/python3.11/dist-packages (from scipy) (1.26.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: seaborn in /usr/local/lib/python3.11/dist-packages (0.13.2)\n",
|
|
"Requirement already satisfied: numpy!=1.24.0,>=1.20 in /usr/local/lib/python3.11/dist-packages (from seaborn) (1.26.0)\n",
|
|
"Requirement already satisfied: pandas>=1.2 in /usr/local/lib/python3.11/dist-packages (from seaborn) (2.2.3)\n",
|
|
"Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in /usr/local/lib/python3.11/dist-packages (from seaborn) (3.8.0)\n",
|
|
"Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.1.1)\n",
|
|
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.11.0)\n",
|
|
"Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.42.1)\n",
|
|
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.5)\n",
|
|
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (23.1)\n",
|
|
"Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (10.0.1)\n",
|
|
"Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.0)\n",
|
|
"Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.8.2)\n",
|
|
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2024.2)\n",
|
|
"Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas>=1.2->seaborn) (2024.2)\n",
|
|
"Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.16.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: tqdm in /usr/local/lib/python3.11/dist-packages (4.67.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: pydot in /usr/local/lib/python3.11/dist-packages (3.0.2)\n",
|
|
"Requirement already satisfied: pyparsing>=3.0.9 in /usr/local/lib/python3.11/dist-packages (from pydot) (3.2.0)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n",
|
|
"Requirement already satisfied: tensorflow-io in /usr/local/lib/python3.11/dist-packages (0.37.1)\n",
|
|
"Requirement already satisfied: tensorflow-io-gcs-filesystem==0.37.1 in /usr/local/lib/python3.11/dist-packages (from tensorflow-io) (0.37.1)\n",
|
|
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
|
|
"\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.3.1\u001b[0m\n",
|
|
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython3 -m pip install --upgrade pip\u001b[0m\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"!apt-get update\n",
|
|
"!apt-get install graphviz -y\n",
|
|
"\n",
|
|
"!pip install tensorflow\n",
|
|
"!pip install numpy\n",
|
|
"!pip install pandas\n",
|
|
"\n",
|
|
"!pip install keras\n",
|
|
"!pip install scikit-learn\n",
|
|
"!pip install matplotlib\n",
|
|
"!pip install joblib\n",
|
|
"!pip install pyarrow\n",
|
|
"!pip install fastparquet\n",
|
|
"!pip install scipy\n",
|
|
"!pip install seaborn\n",
|
|
"!pip install tqdm\n",
|
|
"!pip install pydot\n",
|
|
"!pip install tensorflow-io"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-25T21:05:00.337046Z",
|
|
"start_time": "2024-10-25T21:04:03.960543Z"
|
|
},
|
|
"colab": {
|
|
"base_uri": "https://localhost:8080/"
|
|
},
|
|
"id": "VqHdVCiJthA6",
|
|
"outputId": "d8f830c1-5342-4e11-ac3c-96c535aad5fd"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"2024-11-06 21:44:14.583940: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
|
|
"2024-11-06 21:44:14.584011: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
|
|
"2024-11-06 21:44:14.584064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
|
|
"2024-11-06 21:44:14.596853: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
|
|
"To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Keras version: 2.14.0\n",
|
|
"TensorFlow version: 2.14.0\n",
|
|
"TensorFlow version: 2.14.0\n",
|
|
"CUDA available: True\n",
|
|
"GPU devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n",
|
|
"1 Physical GPUs, 1 Logical GPUs\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"2024-11-06 21:44:17.246902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 43622 MB memory: -> device: 0, name: NVIDIA L40S, pci bus id: 0000:01:00.0, compute capability: 8.9\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import tensorflow as tf\n",
|
|
"import keras\n",
|
|
"\n",
|
|
"print(f\"Keras version: {keras.__version__}\")\n",
|
|
"print(f\"TensorFlow version: {tf.__version__}\")\n",
|
|
"print(f\"TensorFlow version: {tf.__version__}\")\n",
|
|
"print(f\"CUDA available: {tf.test.is_built_with_cuda()}\")\n",
|
|
"print(f\"GPU devices: {tf.config.list_physical_devices('GPU')}\")\n",
|
|
"\n",
|
|
"# GPU configuration\n",
|
|
"gpus = tf.config.experimental.list_physical_devices('GPU')\n",
|
|
"if gpus:\n",
|
|
" try:\n",
|
|
" for gpu in gpus:\n",
|
|
" tf.config.experimental.set_memory_growth(gpu, True)\n",
|
|
" logical_gpus = tf.config.experimental.list_logical_devices('GPU')\n",
|
|
" print(len(gpus), \"Physical GPUs,\", len(logical_gpus), \"Logical GPUs\")\n",
|
|
" except RuntimeError as e:\n",
|
|
" print(e)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-25T21:05:14.642072Z",
|
|
"start_time": "2024-10-25T21:05:11.794331Z"
|
|
},
|
|
"colab": {
|
|
"base_uri": "https://localhost:8080/",
|
|
"height": 160
|
|
},
|
|
"id": "cz0NU95IthA7",
|
|
"outputId": "eaf1939a-7708-49ad-adc9-bac4e2448e10"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"TensorFlow version: 2.14.0\n",
|
|
"\n",
|
|
"Dispositivi disponibili:\n",
|
|
"[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n",
|
|
"\n",
|
|
"Shape del risultato: (10000, 10000)\n",
|
|
"Device del tensore: /job:localhost/replica:0/task:0/device:GPU:0\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'Test completato con successo!'"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Test semplice per verificare che la GPU funzioni\n",
|
|
"def test_gpu():\n",
|
|
" print(\"TensorFlow version:\", tf.__version__)\n",
|
|
" print(\"\\nDispositivi disponibili:\")\n",
|
|
" print(tf.config.list_physical_devices())\n",
|
|
"\n",
|
|
" # Creiamo e moltiplichiamo due tensori sulla GPU\n",
|
|
" with tf.device('/GPU:0'):\n",
|
|
" a = tf.random.normal([10000, 10000])\n",
|
|
" b = tf.random.normal([10000, 10000])\n",
|
|
" c = tf.matmul(a, b)\n",
|
|
"\n",
|
|
" print(\"\\nShape del risultato:\", c.shape)\n",
|
|
" print(\"Device del tensore:\", c.device)\n",
|
|
" return \"Test completato con successo!\"\n",
|
|
"\n",
|
|
"\n",
|
|
"test_gpu()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-25T21:05:34.177059Z",
|
|
"start_time": "2024-10-25T21:05:34.012517Z"
|
|
},
|
|
"id": "VYNuYASythA8"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import pandas as pd\n",
|
|
"import numpy as np\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"import seaborn as sns\n",
|
|
"from sklearn.model_selection import train_test_split\n",
|
|
"from sklearn.preprocessing import MinMaxScaler, StandardScaler\n",
|
|
"from tensorflow.keras.layers import Input, Dense, Dropout, Bidirectional, LSTM, LayerNormalization, Add, Activation, BatchNormalization, MultiHeadAttention, MaxPooling1D, Conv1D, GlobalMaxPooling1D, GlobalAveragePooling1D, \\\n",
|
|
" Concatenate, ZeroPadding1D, Lambda, AveragePooling1D, concatenate\n",
|
|
"from tensorflow.keras.layers import Dense, LSTM, Conv1D, Input, concatenate, Dropout, BatchNormalization, GlobalAveragePooling1D, Bidirectional, TimeDistributed, Attention, MultiHeadAttention\n",
|
|
"from tensorflow.keras.models import Model\n",
|
|
"from tensorflow.keras.regularizers import l2\n",
|
|
"from tensorflow.keras.optimizers import Adam\n",
|
|
"from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint\n",
|
|
"from datetime import datetime\n",
|
|
"import os\n",
|
|
"import json\n",
|
|
"import joblib\n",
|
|
"import re\n",
|
|
"import pyarrow as pa\n",
|
|
"import pyarrow.parquet as pq\n",
|
|
"from tqdm import tqdm\n",
|
|
"from concurrent.futures import ProcessPoolExecutor, as_completed\n",
|
|
"from functools import partial\n",
|
|
"import psutil\n",
|
|
"import multiprocessing\n",
|
|
"\n",
|
|
"random_state_value = 42\n",
|
|
"\n",
|
|
"base_project_dir = './kaggle/working/'\n",
|
|
"data_project_dir = base_project_dir + 'data/'\n",
|
|
"models_project_dir = base_project_dir + 'models/'\n",
|
|
"\n",
|
|
"os.makedirs(base_project_dir, exist_ok=True)\n",
|
|
"os.makedirs(data_project_dir, exist_ok=True)\n",
|
|
"os.makedirs(models_project_dir, exist_ok=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "uHKkULSNthA8"
|
|
},
|
|
"source": [
|
|
"## Funzioni di Plot"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {
|
|
"id": "gzvYVaBPthA8"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def save_plot(plt, title, output_dir='./kaggle/working/plots'):\n",
|
|
" os.makedirs(output_dir, exist_ok=True)\n",
|
|
" filename = \"\".join(x for x in title if x.isalnum() or x in [' ', '-', '_']).rstrip()\n",
|
|
" filename = filename.replace(' ', '_').lower()\n",
|
|
" filepath = os.path.join(output_dir, f\"{filename}.png\")\n",
|
|
" plt.savefig(filepath, bbox_inches='tight', dpi=300)\n",
|
|
" print(f\"Plot salvato come: {filepath}\")\n",
|
|
"\n",
|
|
"\n",
|
|
"def to_camel_case(text):\n",
|
|
" \"\"\"\n",
|
|
" Converte una stringa in camelCase.\n",
|
|
" Gestisce stringhe con spazi, trattini o underscore.\n",
|
|
" Se è una sola parola, la restituisce in minuscolo.\n",
|
|
" \"\"\"\n",
|
|
" # Rimuove eventuali spazi iniziali e finali\n",
|
|
" text = text.strip()\n",
|
|
"\n",
|
|
" # Se la stringa è vuota, ritorna stringa vuota\n",
|
|
" if not text:\n",
|
|
" return \"\"\n",
|
|
"\n",
|
|
" # Sostituisce trattini e underscore con spazi\n",
|
|
" text = text.replace('-', ' ').replace('_', ' ')\n",
|
|
"\n",
|
|
" # Divide la stringa in parole\n",
|
|
" words = text.split()\n",
|
|
"\n",
|
|
" # Se non ci sono parole dopo lo split, ritorna stringa vuota\n",
|
|
" if not words:\n",
|
|
" return \"\"\n",
|
|
"\n",
|
|
" # Se c'è una sola parola, ritorna in minuscolo\n",
|
|
" if len(words) == 1:\n",
|
|
" return words[0].lower()\n",
|
|
"\n",
|
|
" # Altrimenti procedi con il camelCase\n",
|
|
" result = words[0].lower()\n",
|
|
" for word in words[1:]:\n",
|
|
" result += word.capitalize()\n",
|
|
"\n",
|
|
" return result"
|
|
]
|
|
},
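{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal usage sketch for the two helpers above (illustrative only; the plotted numbers and the title are made up):\n",
"\n",
"```python\n",
"plt.figure()\n",
"plt.plot([0, 1, 2], [10, 20, 15])  # placeholder data\n",
"plt.title('Example production plot')\n",
"save_plot(plt, 'Example production plot')  # -> ./kaggle/working/plots/example_production_plot.png\n",
"\n",
"print(to_camel_case('solar radiation'))  # 'solarRadiation'\n",
"print(to_camel_case('uvindex'))          # 'uvindex'\n",
"```"
]
},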
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "lhipxRbMthA8"
|
|
},
|
|
"source": [
|
|
"## 1. Caricamento e preparazione dei Dati Meteo"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Function to convert csv to parquet\n",
|
|
"def csv_to_parquet(csv_file, parquet_file, chunksize=100000):\n",
|
|
" writer = None\n",
|
|
"\n",
|
|
" for chunk in pd.read_csv(csv_file, chunksize=chunksize):\n",
|
|
" if writer is None:\n",
|
|
"\n",
|
|
" table = pa.Table.from_pandas(chunk)\n",
|
|
" writer = pq.ParquetWriter(parquet_file, table.schema)\n",
|
|
" else:\n",
|
|
" table = pa.Table.from_pandas(chunk)\n",
|
|
"\n",
|
|
" writer.write_table(table)\n",
|
|
"\n",
|
|
" if writer:\n",
|
|
" writer.close()\n",
|
|
"\n",
|
|
" print(f\"File conversion completed : {csv_file} -> {parquet_file}\")\n",
|
|
"\n",
|
|
"\n",
|
|
"def read_json_files(folder_path):\n",
|
|
" all_data = []\n",
|
|
"\n",
|
|
" file_list = sorted(os.listdir(folder_path))\n",
|
|
"\n",
|
|
" for filename in file_list:\n",
|
|
" if filename.endswith('.json'):\n",
|
|
" file_path = os.path.join(folder_path, filename)\n",
|
|
" try:\n",
|
|
" with open(file_path, 'r') as file:\n",
|
|
" data = json.load(file)\n",
|
|
" all_data.extend(data['days'])\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Error processing file '{filename}': {str(e)}\")\n",
|
|
"\n",
|
|
" return all_data\n",
|
|
"\n",
|
|
"\n",
|
|
"def create_weather_dataset(data):\n",
|
|
" dataset = []\n",
|
|
" seen_datetimes = set()\n",
|
|
"\n",
|
|
" for day in data:\n",
|
|
" date = day['datetime']\n",
|
|
" for hour in day['hours']:\n",
|
|
" datetime_str = f\"{date} {hour['datetime']}\"\n",
|
|
"\n",
|
|
" # Verifico se questo datetime è già stato visto\n",
|
|
" if datetime_str in seen_datetimes:\n",
|
|
" continue\n",
|
|
"\n",
|
|
" seen_datetimes.add(datetime_str)\n",
|
|
"\n",
|
|
" if isinstance(hour['preciptype'], list):\n",
|
|
" preciptype = \"__\".join(hour['preciptype'])\n",
|
|
" else:\n",
|
|
" preciptype = hour['preciptype'] if hour['preciptype'] else \"\"\n",
|
|
"\n",
|
|
" conditions = hour['conditions'].replace(', ', '__').replace(' ', '_').lower()\n",
|
|
"\n",
|
|
" row = {\n",
|
|
" 'datetime': datetime_str,\n",
|
|
" 'temp': hour['temp'],\n",
|
|
" 'feelslike': hour['feelslike'],\n",
|
|
" 'humidity': hour['humidity'],\n",
|
|
" 'dew': hour['dew'],\n",
|
|
" 'precip': hour['precip'],\n",
|
|
" 'snow': hour['snow'],\n",
|
|
" 'preciptype': preciptype.lower(),\n",
|
|
" 'windspeed': hour['windspeed'],\n",
|
|
" 'winddir': hour['winddir'],\n",
|
|
" 'pressure': hour['pressure'],\n",
|
|
" 'cloudcover': hour['cloudcover'],\n",
|
|
" 'visibility': hour['visibility'],\n",
|
|
" 'solarradiation': hour['solarradiation'],\n",
|
|
" 'solarenergy': hour['solarenergy'],\n",
|
|
" 'uvindex': hour['uvindex'],\n",
|
|
" 'conditions': conditions,\n",
|
|
" 'tempmax': day['tempmax'],\n",
|
|
" 'tempmin': day['tempmin'],\n",
|
|
" 'precipprob': day['precipprob'],\n",
|
|
" 'precipcover': day['precipcover']\n",
|
|
" }\n",
|
|
" dataset.append(row)\n",
|
|
"\n",
|
|
" dataset.sort(key=lambda x: datetime.strptime(x['datetime'], \"%Y-%m-%d %H:%M:%S\"))\n",
|
|
"\n",
|
|
" return pd.DataFrame(dataset)\n",
|
|
"\n"
|
|
]
|
|
},
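{
"cell_type": "markdown",
"metadata": {},
"source": [
"How these loaders fit together, as a sketch: the folder and file names below are hypothetical, and the expected JSON layout (a top-level `days` list whose items carry daily fields plus an `hours` list) is inferred from `create_weather_dataset`.\n",
"\n",
"```python\n",
"json_folder = data_project_dir + 'weather_json'   # hypothetical location of the raw API responses\n",
"raw_days = read_json_files(json_folder)           # concatenates the 'days' arrays of every *.json file\n",
"weather_df = create_weather_dataset(raw_days)     # one row per unique hourly observation, sorted by datetime\n",
"\n",
"weather_csv = data_project_dir + 'weather.csv'        # hypothetical file names\n",
"weather_parquet = data_project_dir + 'weather.parquet'\n",
"weather_df.to_csv(weather_csv, index=False)\n",
"csv_to_parquet(weather_csv, weather_parquet)       # chunked CSV -> Parquet conversion defined above\n",
"```"
]
},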
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Crea le sequenze per LSTM\n",
|
|
"def create_sequences(timesteps, X, y=None):\n",
|
|
" \"\"\"\n",
|
|
" Crea sequenze temporali dai dati.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" X : array-like\n",
|
|
" Dati di input\n",
|
|
" timesteps : int\n",
|
|
" Numero di timestep per ogni sequenza\n",
|
|
" y : array-like, optional\n",
|
|
" Target values. Se None, crea sequenze solo per X\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" tuple o array\n",
|
|
" Se y è fornito: (X_sequences, y_sequences)\n",
|
|
" Se y è None: X_sequences\n",
|
|
" \"\"\"\n",
|
|
" Xs = []\n",
|
|
" for i in range(len(X) - timesteps):\n",
|
|
" Xs.append(X[i:i + timesteps])\n",
|
|
"\n",
|
|
" if y is not None:\n",
|
|
" ys = []\n",
|
|
" for i in range(len(X) - timesteps):\n",
|
|
" ys.append(y[i + timesteps])\n",
|
|
" return np.array(Xs), np.array(ys)\n",
|
|
"\n",
|
|
" return np.array(Xs)\n",
|
|
"\n",
|
|
"def get_season(date):\n",
|
|
" month = date.month\n",
|
|
" day = date.day\n",
|
|
" if (month == 12 and day >= 21) or (month <= 3 and day < 20):\n",
|
|
" return 'Winter'\n",
|
|
" elif (month == 3 and day >= 20) or (month <= 6 and day < 21):\n",
|
|
" return 'Spring'\n",
|
|
" elif (month == 6 and day >= 21) or (month <= 9 and day < 23):\n",
|
|
" return 'Summer'\n",
|
|
" elif (month == 9 and day >= 23) or (month <= 12 and day < 21):\n",
|
|
" return 'Autumn'\n",
|
|
" else:\n",
|
|
" return 'Unknown'\n",
|
|
"\n",
|
|
"\n",
|
|
"def get_time_period(hour):\n",
|
|
" if 5 <= hour < 12:\n",
|
|
" return 'Morning'\n",
|
|
" elif 12 <= hour < 17:\n",
|
|
" return 'Afternoon'\n",
|
|
" elif 17 <= hour < 21:\n",
|
|
" return 'Evening'\n",
|
|
" else:\n",
|
|
" return 'Night'\n",
|
|
"\n",
|
|
"\n",
|
|
"def add_time_features(df):\n",
|
|
" df['datetime'] = pd.to_datetime(df['datetime'])\n",
|
|
" df['timestamp'] = df['datetime'].astype(np.int64) // 10 ** 9\n",
|
|
" df['year'] = df['datetime'].dt.year\n",
|
|
" df['month'] = df['datetime'].dt.month\n",
|
|
" df['day'] = df['datetime'].dt.day\n",
|
|
" df['hour'] = df['datetime'].dt.hour\n",
|
|
" df['minute'] = df['datetime'].dt.minute\n",
|
|
" df['hour_sin'] = np.sin(df['hour'] * (2 * np.pi / 24))\n",
|
|
" df['hour_cos'] = np.cos(df['hour'] * (2 * np.pi / 24))\n",
|
|
" df['day_of_week'] = df['datetime'].dt.dayofweek\n",
|
|
" df['day_of_year'] = df['datetime'].dt.dayofyear\n",
|
|
" df['week_of_year'] = df['datetime'].dt.isocalendar().week.astype(int)\n",
|
|
" df['quarter'] = df['datetime'].dt.quarter\n",
|
|
" df['is_month_end'] = df['datetime'].dt.is_month_end.astype(int)\n",
|
|
" df['is_quarter_end'] = df['datetime'].dt.is_quarter_end.astype(int)\n",
|
|
" df['is_year_end'] = df['datetime'].dt.is_year_end.astype(int)\n",
|
|
" df['month_sin'] = np.sin(df['month'] * (2 * np.pi / 12))\n",
|
|
" df['month_cos'] = np.cos(df['month'] * (2 * np.pi / 12))\n",
|
|
" df['day_of_year_sin'] = np.sin(df['day_of_year'] * (2 * np.pi / 365.25))\n",
|
|
" df['day_of_year_cos'] = np.cos(df['day_of_year'] * (2 * np.pi / 365.25))\n",
|
|
" df['season'] = df['datetime'].apply(get_season)\n",
|
|
" df['time_period'] = df['hour'].apply(get_time_period)\n",
|
|
" return df\n",
|
|
"\n",
|
|
"\n",
|
|
"def add_solar_features(df):\n",
|
|
" # Calcolo dell'angolo solare\n",
|
|
" df['solar_angle'] = np.sin(df['day_of_year'] * (2 * np.pi / 365.25)) * np.sin(df['hour'] * (2 * np.pi / 24))\n",
|
|
"\n",
|
|
" # Interazioni tra features rilevanti\n",
|
|
" df['cloud_temp_interaction'] = df['cloudcover'] * df['temp']\n",
|
|
" df['visibility_cloud_interaction'] = df['visibility'] * (100 - df['cloudcover'])\n",
|
|
"\n",
|
|
" # Feature derivate\n",
|
|
" df['clear_sky_index'] = (100 - df['cloudcover']) / 100\n",
|
|
" df['temp_gradient'] = df['temp'] - df['tempmin']\n",
|
|
"\n",
|
|
" return df\n",
|
|
"\n",
|
|
"\n",
|
|
"def add_solar_specific_features(df):\n",
|
|
" # Angolo solare e durata del giorno\n",
|
|
" df['day_length'] = 12 + 3 * np.sin(2 * np.pi * (df['day_of_year'] - 81) / 365.25)\n",
|
|
" df['solar_noon'] = 12 - df['hour']\n",
|
|
" df['solar_elevation'] = np.sin(2 * np.pi * df['day_of_year'] / 365.25) * np.cos(2 * np.pi * df['solar_noon'] / 24)\n",
|
|
"\n",
|
|
" # Interazioni\n",
|
|
" df['cloud_elevation'] = df['cloudcover'] * df['solar_elevation']\n",
|
|
" df['visibility_elevation'] = df['visibility'] * df['solar_elevation']\n",
|
|
"\n",
|
|
" # Rolling features con finestre più ampie\n",
|
|
" df['cloud_rolling_12h'] = df['cloudcover'].rolling(window=12).mean()\n",
|
|
" df['temp_rolling_12h'] = df['temp'].rolling(window=12).mean()\n",
|
|
"\n",
|
|
" return df\n",
|
|
"\n",
|
|
"\n",
|
|
"def add_advanced_features(df):\n",
|
|
" # Features esistenti\n",
|
|
" df = add_time_features(df)\n",
|
|
" df = add_solar_features(df)\n",
|
|
" df = add_solar_specific_features(df)\n",
|
|
"\n",
|
|
" # Aggiungi interazioni tra variabili meteorologiche\n",
|
|
" df['temp_humidity'] = df['temp'] * df['humidity']\n",
|
|
" df['temp_cloudcover'] = df['temp'] * df['cloudcover']\n",
|
|
" df['visibility_cloudcover'] = df['visibility'] * df['cloudcover']\n",
|
|
"\n",
|
|
" # Features derivate per la radiazione solare\n",
|
|
" df['clear_sky_factor'] = (100 - df['cloudcover']) / 100\n",
|
|
" df['day_length'] = np.sin(df['day_of_year_sin']) * 12 + 12 # approssimazione della durata del giorno\n",
|
|
"\n",
|
|
" # Lag features\n",
|
|
" df['temp_1h_lag'] = df['temp'].shift(1)\n",
|
|
" df['cloudcover_1h_lag'] = df['cloudcover'].shift(1)\n",
|
|
" df['humidity_1h_lag'] = df['humidity'].shift(1)\n",
|
|
"\n",
|
|
" # Rolling means\n",
|
|
" df['temp_rolling_mean_6h'] = df['temp'].rolling(window=6).mean()\n",
|
|
" df['cloudcover_rolling_mean_6h'] = df['cloudcover'].rolling(window=6).mean()\n",
|
|
"\n",
|
|
" return df\n",
|
|
"\n",
|
|
"# Preparazione dati\n",
|
|
"def prepare_solar_data(weather_data, features):\n",
|
|
" \"\"\"\n",
|
|
" Prepara i dati per i modelli solari.\n",
|
|
" \"\"\"\n",
|
|
" # Aggiungi le caratteristiche temporali\n",
|
|
" weather_data = add_advanced_features(weather_data)\n",
|
|
" weather_data = pd.get_dummies(weather_data, columns=['season', 'time_period'], drop_first=True)\n",
|
|
"\n",
|
|
" # Dividi i dati\n",
|
|
" data_after_2010 = weather_data[weather_data['year'] >= 2010].copy()\n",
|
|
" data_after_2010 = data_after_2010.sort_values('datetime')\n",
|
|
" data_after_2010.set_index('datetime', inplace=True)\n",
|
|
"\n",
|
|
" # Interpola valori mancanti\n",
|
|
" target_variables = ['solarradiation', 'solarenergy', 'uvindex']\n",
|
|
" for column in target_variables:\n",
|
|
" data_after_2010[column] = data_after_2010[column].interpolate(method='time')\n",
|
|
"\n",
|
|
" # Rimuovi righe con valori mancanti\n",
|
|
" data_after_2010.dropna(subset=features + target_variables, inplace=True)\n",
|
|
"\n",
|
|
" # Prepara X e y\n",
|
|
" X = data_after_2010[features].values\n",
|
|
" y = data_after_2010[target_variables].values\n",
|
|
"\n",
|
|
" # Normalizza features\n",
|
|
" scaler_X = MinMaxScaler()\n",
|
|
" X_scaled = scaler_X.fit_transform(X)\n",
|
|
"\n",
|
|
" return X_scaled, y, scaler_X, data_after_2010\n",
|
|
"\n",
|
|
"def prepare_model_specific_data(X_scaled, y, target_idx, timesteps):\n",
|
|
" \"\"\"\n",
|
|
" Prepara i dati specifici per ciascun modello.\n",
|
|
" \"\"\"\n",
|
|
" # Scaler specifico per il target\n",
|
|
" scaler_y = MinMaxScaler()\n",
|
|
" y_scaled = scaler_y.fit_transform(y[:, target_idx].reshape(-1, 1))\n",
|
|
"\n",
|
|
" # Split dei dati\n",
|
|
" X_train, X_temp, y_train, y_temp = train_test_split(\n",
|
|
" X_scaled, y_scaled, test_size=0.3, shuffle=False\n",
|
|
" )\n",
|
|
" X_val, X_test, y_val, y_test = train_test_split(\n",
|
|
" X_temp, y_temp, test_size=0.5, shuffle=False\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Crea sequenze\n",
|
|
" X_train_seq, y_train_seq = create_sequences(timesteps, X_train, y_train)\n",
|
|
" X_val_seq, y_val_seq = create_sequences(timesteps, X_val, y_val)\n",
|
|
" X_test_seq, y_test_seq = create_sequences(timesteps, X_test, y_test)\n",
|
|
"\n",
|
|
" return {\n",
|
|
" 'train': (X_train_seq, y_train_seq),\n",
|
|
" 'val': (X_val_seq, y_val_seq),\n",
|
|
" 'test': (X_test_seq, y_test_seq)\n",
|
|
" }, scaler_y\n",
|
|
"\n",
|
|
"def create_radiation_model(input_shape, solar_params_shape=(3,)):\n",
|
|
" \"\"\"\n",
|
|
" Modello per la radiazione solare con vincoli di non-negatività.\n",
|
|
" \"\"\"\n",
|
|
" # Input layers\n",
|
|
" main_input = Input(shape=input_shape, name='main_input')\n",
|
|
" solar_input = Input(shape=solar_params_shape, name='solar_params')\n",
|
|
" \n",
|
|
" # Branch CNN\n",
|
|
" x1 = Conv1D(32, 3, padding='same')(main_input)\n",
|
|
" x1 = BatchNormalization()(x1)\n",
|
|
" x1 = Activation('relu')(x1)\n",
|
|
" x1 = Conv1D(64, 3, padding='same')(x1)\n",
|
|
" x1 = BatchNormalization()(x1)\n",
|
|
" x1 = Activation('relu')(x1)\n",
|
|
" x1 = GlobalAveragePooling1D()(x1)\n",
|
|
" \n",
|
|
" # Branch LSTM\n",
|
|
" x2 = Bidirectional(LSTM(64, return_sequences=True))(main_input)\n",
|
|
" x2 = Bidirectional(LSTM(32))(x2)\n",
|
|
" x2 = BatchNormalization()(x2)\n",
|
|
" \n",
|
|
" # Solar parameters processing\n",
|
|
" x3 = Dense(32)(solar_input)\n",
|
|
" x3 = BatchNormalization()(x3)\n",
|
|
" x3 = Activation('relu')(x3)\n",
|
|
" \n",
|
|
" # Combine all branches\n",
|
|
" x = concatenate([x1, x2, x3])\n",
|
|
" \n",
|
|
" # Dense layers with non-negativity constraints\n",
|
|
" x = Dense(64, kernel_constraint=tf.keras.constraints.NonNeg())(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" x = Dropout(0.2)(x)\n",
|
|
" \n",
|
|
" x = Dense(32, kernel_constraint=tf.keras.constraints.NonNeg())(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" \n",
|
|
" # Output layer con vincoli di non-negatività\n",
|
|
" output = Dense(1, \n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" activation='relu')(x)\n",
|
|
" \n",
|
|
" model = Model(inputs=[main_input, solar_input], outputs=output, name=\"SolarRadiation\")\n",
|
|
" return model\n",
|
|
"\n",
|
|
"def create_energy_model(input_shape):\n",
|
|
" \"\"\"\n",
|
|
" Modello migliorato per l'energia solare che sfrutta la relazione con la radiazione.\n",
|
|
" Include vincoli di non-negatività e migliore gestione delle dipendenze temporali.\n",
|
|
" \"\"\"\n",
|
|
" inputs = Input(shape=input_shape)\n",
|
|
" \n",
|
|
" # Branch 1: Elaborazione temporale con attention\n",
|
|
" # Multi-head attention per catturare relazioni temporali\n",
|
|
" x1 = MultiHeadAttention(num_heads=8, key_dim=32)(inputs, inputs)\n",
|
|
" x1 = BatchNormalization()(x1)\n",
|
|
" x1 = Activation('relu')(x1)\n",
|
|
" \n",
|
|
" # Temporal Convolution branch per catturare pattern locali\n",
|
|
" x2 = Conv1D(\n",
|
|
" filters=64,\n",
|
|
" kernel_size=3,\n",
|
|
" padding='same',\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg()\n",
|
|
" )(inputs)\n",
|
|
" x2 = BatchNormalization()(x2)\n",
|
|
" x2 = Activation('relu')(x2)\n",
|
|
" x2 = Conv1D(\n",
|
|
" filters=32,\n",
|
|
" kernel_size=3,\n",
|
|
" padding='same',\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg()\n",
|
|
" )(x2)\n",
|
|
" x2 = BatchNormalization()(x2)\n",
|
|
" x2 = Activation('relu')(x2)\n",
|
|
" \n",
|
|
" # LSTM branch per memoria a lungo termine\n",
|
|
" x3 = LSTM(64, return_sequences=True)(inputs)\n",
|
|
" x3 = LSTM(32, return_sequences=False)(x3)\n",
|
|
" x3 = BatchNormalization()(x3)\n",
|
|
" x3 = Activation('relu')(x3)\n",
|
|
" \n",
|
|
" # Global pooling per ogni branch\n",
|
|
" x1 = GlobalAveragePooling1D()(x1)\n",
|
|
" x2 = GlobalAveragePooling1D()(x2)\n",
|
|
" \n",
|
|
" # Concatena tutti i branch\n",
|
|
" x = concatenate([x1, x2, x3])\n",
|
|
" \n",
|
|
" # Dense layers con vincoli di non-negatività\n",
|
|
" x = Dense(\n",
|
|
" 128,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" x = Dropout(0.3)(x)\n",
|
|
" \n",
|
|
" x = Dense(\n",
|
|
" 64,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" x = Dropout(0.2)(x)\n",
|
|
" \n",
|
|
" # Output layer con vincolo di non-negatività\n",
|
|
" output = Dense(\n",
|
|
" 1,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" activation='relu', # Garantisce output non negativo\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" \n",
|
|
" model = Model(inputs=inputs, outputs=output, name=\"SolarEnergy\")\n",
|
|
" return model\n",
|
|
"\n",
|
|
"def create_uv_model(input_shape):\n",
|
|
" \"\"\"\n",
|
|
" Modello migliorato per l'indice UV che sfrutta sia radiazione che energia solare.\n",
|
|
" Include vincoli di non-negatività e considera le relazioni non lineari tra le variabili.\n",
|
|
" \"\"\"\n",
|
|
" inputs = Input(shape=input_shape)\n",
|
|
" \n",
|
|
" # CNN branch per pattern locali\n",
|
|
" x1 = Conv1D(\n",
|
|
" filters=64,\n",
|
|
" kernel_size=3,\n",
|
|
" padding='same',\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg()\n",
|
|
" )(inputs)\n",
|
|
" x1 = BatchNormalization()(x1)\n",
|
|
" x1 = Activation('relu')(x1)\n",
|
|
" x1 = MaxPooling1D(pool_size=2)(x1)\n",
|
|
" \n",
|
|
" x1 = Conv1D(\n",
|
|
" filters=32,\n",
|
|
" kernel_size=3,\n",
|
|
" padding='same',\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg()\n",
|
|
" )(x1)\n",
|
|
" x1 = BatchNormalization()(x1)\n",
|
|
" x1 = Activation('relu')(x1)\n",
|
|
" x1 = GlobalAveragePooling1D()(x1)\n",
|
|
" \n",
|
|
" # Attention branch per relazioni complesse\n",
|
|
" # Specialmente utile per le relazioni con radiazione ed energia\n",
|
|
" x2 = MultiHeadAttention(num_heads=4, key_dim=32)(inputs, inputs)\n",
|
|
" x2 = BatchNormalization()(x2)\n",
|
|
" x2 = Activation('relu')(x2)\n",
|
|
" x2 = GlobalAveragePooling1D()(x2)\n",
|
|
" \n",
|
|
" # Dense branch per le feature più recenti\n",
|
|
" x3 = GlobalAveragePooling1D()(inputs)\n",
|
|
" x3 = Dense(\n",
|
|
" 64,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x3)\n",
|
|
" x3 = BatchNormalization()(x3)\n",
|
|
" x3 = Activation('relu')(x3)\n",
|
|
" \n",
|
|
" # Fusion dei branch\n",
|
|
" x = concatenate([x1, x2, x3])\n",
|
|
" \n",
|
|
" # Dense layers con vincoli di non-negatività\n",
|
|
" x = Dense(\n",
|
|
" 128,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" x = Dropout(0.3)(x)\n",
|
|
" \n",
|
|
" x = Dense(\n",
|
|
" 64,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" x = BatchNormalization()(x)\n",
|
|
" x = Activation('relu')(x)\n",
|
|
" x = Dropout(0.2)(x)\n",
|
|
" \n",
|
|
" # Output layer con vincolo di non-negatività\n",
|
|
" output = Dense(\n",
|
|
" 1,\n",
|
|
" kernel_constraint=tf.keras.constraints.NonNeg(),\n",
|
|
" activation='relu', # Garantisce output non negativo\n",
|
|
" kernel_regularizer=l2(0.01)\n",
|
|
" )(x)\n",
|
|
" \n",
|
|
" model = Model(inputs=inputs, outputs=output, name=\"SolarUV\")\n",
|
|
" return model\n",
|
|
"\n",
|
|
"class CustomCallback(tf.keras.callbacks.Callback):\n",
|
|
" \"\"\"\n",
|
|
" Callback personalizzato per monitorare la non-negatività delle predizioni\n",
|
|
" e altre metriche importanti durante il training.\n",
|
|
" \"\"\"\n",
|
|
" def __init__(self, validation_data=None):\n",
|
|
" super().__init__()\n",
|
|
" self.validation_data = validation_data\n",
|
|
" \n",
|
|
" def on_epoch_end(self, epoch, logs=None):\n",
|
|
" try:\n",
|
|
" # Controlla se abbiamo i dati di validazione\n",
|
|
" if hasattr(self.model, 'validation_data'):\n",
|
|
" val_x = self.model.validation_data[0]\n",
|
|
" if isinstance(val_x, list): # Per il modello della radiazione\n",
|
|
" val_pred = self.model.predict(val_x, verbose=0)\n",
|
|
" else:\n",
|
|
" val_pred = self.model.predict(val_x, verbose=0)\n",
|
|
" \n",
|
|
" # Verifica non-negatività\n",
|
|
" if np.any(val_pred < 0):\n",
|
|
" print(\"\\nWarning: Rilevati valori negativi nelle predizioni\")\n",
|
|
" print(f\"Min value: {np.min(val_pred)}\")\n",
|
|
" \n",
|
|
" # Statistiche predizioni\n",
|
|
" print(f\"\\nStatistiche predizioni epoca {epoch}:\")\n",
|
|
" print(f\"Min: {np.min(val_pred):.4f}\")\n",
|
|
" print(f\"Max: {np.max(val_pred):.4f}\")\n",
|
|
" print(f\"Media: {np.mean(val_pred):.4f}\")\n",
|
|
" \n",
|
|
" # Aggiunge le metriche ai logs\n",
|
|
" if logs is not None:\n",
|
|
" logs['val_pred_min'] = np.min(val_pred)\n",
|
|
" logs['val_pred_max'] = np.max(val_pred)\n",
|
|
" logs['val_pred_mean'] = np.mean(val_pred)\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"\\nWarning nel CustomCallback: {str(e)}\")\n",
|
|
"\n",
|
|
"def create_callbacks(target):\n",
|
|
" \"\"\"\n",
|
|
" Crea le callbacks per il training del modello.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" target : str\n",
|
|
" Nome del target per cui creare le callbacks\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" list : Lista delle callbacks configurate\n",
|
|
" \"\"\"\n",
|
|
" # Crea la directory per i checkpoint e i logs\n",
|
|
" model_dir = f'./kaggle/working/models/{target}'\n",
|
|
" checkpoint_dir = os.path.join(model_dir, 'checkpoints')\n",
|
|
" log_dir = os.path.join(model_dir, 'logs')\n",
|
|
" \n",
|
|
" os.makedirs(checkpoint_dir, exist_ok=True)\n",
|
|
" os.makedirs(log_dir, exist_ok=True)\n",
|
|
" \n",
|
|
" return [\n",
|
|
" # Early Stopping\n",
|
|
" EarlyStopping(\n",
|
|
" monitor='val_loss',\n",
|
|
" patience=10,\n",
|
|
" restore_best_weights=True,\n",
|
|
" min_delta=0.0001\n",
|
|
" ),\n",
|
|
" # Reduce LR on Plateau\n",
|
|
" ReduceLROnPlateau(\n",
|
|
" monitor='val_loss',\n",
|
|
" factor=0.5,\n",
|
|
" patience=5,\n",
|
|
" min_lr=1e-6,\n",
|
|
" verbose=1\n",
|
|
" ),\n",
|
|
" # Model Checkpoint\n",
|
|
" ModelCheckpoint(\n",
|
|
" filepath=os.path.join(checkpoint_dir, 'best_model_{epoch:02d}_{val_loss:.4f}.h5'),\n",
|
|
" monitor='val_loss',\n",
|
|
" save_best_only=True,\n",
|
|
" save_weights_only=True,\n",
|
|
" verbose=1\n",
|
|
" ),\n",
|
|
" # TensorBoard\n",
|
|
" tf.keras.callbacks.TensorBoard(\n",
|
|
" log_dir=log_dir,\n",
|
|
" histogram_freq=1,\n",
|
|
" write_graph=True,\n",
|
|
" update_freq='epoch'\n",
|
|
" ),\n",
|
|
" # Custom callback\n",
|
|
" CustomCallback()\n",
|
|
" ]\n",
|
|
"\n",
|
|
"def train_solar_models(weather_data, features, timesteps=24):\n",
|
|
" \"\"\"\n",
|
|
" Training sequenziale dei modelli solari dove ogni modello usa \n",
|
|
" le predizioni dei modelli precedenti come feature aggiuntive.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" weather_data : pd.DataFrame\n",
|
|
" Dataset contenente i dati meteorologici\n",
|
|
" features : list\n",
|
|
" Lista delle feature da utilizzare\n",
|
|
" timesteps : int, optional\n",
|
|
" Numero di timesteps per le sequenze temporali\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" tuple\n",
|
|
" (models, histories, scalers) contenenti i modelli addestrati,\n",
|
|
" le storie di training e gli scalers utilizzati\n",
|
|
" \"\"\"\n",
|
|
" print(\"Preparazione dati iniziale...\")\n",
|
|
" X_scaled, y, scaler_X, data_processed = prepare_solar_data(weather_data, features)\n",
|
|
" \n",
|
|
" models = {}\n",
|
|
" histories = {}\n",
|
|
" scalers = {'X': scaler_X}\n",
|
|
" feature_scalers = {} # Per tenere traccia degli scaler delle nuove features\n",
|
|
" \n",
|
|
" # Manteniamo un array delle feature che si espanderà con le predizioni\n",
|
|
" current_features = X_scaled.copy()\n",
|
|
" print(f\"Shape iniziale features: {current_features.shape}\")\n",
|
|
" \n",
|
|
" # Dizionario per mantenere le predizioni di ogni modello\n",
|
|
" predictions_by_target = {}\n",
|
|
" \n",
|
|
" # Configurazione per ciascun modello in ordine specifico\n",
|
|
" model_configs = {\n",
|
|
" 'solarradiation': {\n",
|
|
" 'creator': create_radiation_model,\n",
|
|
" 'index': 0,\n",
|
|
" 'needs_solar_params': True,\n",
|
|
" 'previous_predictions_needed': []\n",
|
|
" },\n",
|
|
" 'solarenergy': {\n",
|
|
" 'creator': create_energy_model,\n",
|
|
" 'index': 1,\n",
|
|
" 'needs_solar_params': False,\n",
|
|
" 'previous_predictions_needed': ['solarradiation']\n",
|
|
" },\n",
|
|
" 'uvindex': {\n",
|
|
" 'creator': create_uv_model,\n",
|
|
" 'index': 2,\n",
|
|
" 'needs_solar_params': False,\n",
|
|
" 'previous_predictions_needed': ['solarradiation', 'solarenergy']\n",
|
|
" }\n",
|
|
" }\n",
|
|
" \n",
|
|
" # Training sequenziale\n",
|
|
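"    # Ordine: solarradiation -> solarenergy -> uvindex; ogni modello riceve come feature aggiuntive le predizioni (riscalate) dei modelli precedenti\n",
|
|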
" for target, config in model_configs.items():\n",
|
|
" print(f\"\\n{'='*50}\")\n",
|
|
" print(f\"Training modello per: {target}\")\n",
|
|
" print(f\"{'='*50}\")\n",
|
|
" \n",
|
|
" # 1. Aggiunta delle predizioni precedenti come features\n",
|
|
" if config['previous_predictions_needed']:\n",
|
|
" print(f\"\\nAggiunta predizioni precedenti da: {config['previous_predictions_needed']}\")\n",
|
|
" new_features_list = []\n",
|
|
" \n",
|
|
" for prev_target in config['previous_predictions_needed']:\n",
|
|
" if prev_target in predictions_by_target:\n",
|
|
" print(f\"\\nProcessing predizioni di {prev_target}...\")\n",
|
|
" prev_pred = predictions_by_target[prev_target]\n",
|
|
" \n",
|
|
" # Allineamento dimensioni\n",
|
|
" if len(prev_pred) != len(current_features):\n",
|
|
" print(\"Allineamento dimensioni necessario:\")\n",
|
|
" print(f\"- Current features: {current_features.shape}\")\n",
|
|
" print(f\"- Predictions: {prev_pred.shape}\")\n",
|
|
" \n",
|
|
" offset = len(current_features) - len(prev_pred)\n",
|
|
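"                        # l'offset è tipicamente pari a 'timesteps': la creazione delle sequenze scarta i primi campioni\n",
|
|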
" if offset > 0:\n",
|
|
" print(f\"Aggiunta padding di {offset} elementi\")\n",
|
|
" pad_width = ((offset, 0), (0, 0)) if len(prev_pred.shape) > 1 else (offset, 0)\n",
|
|
" prev_pred = np.pad(prev_pred, pad_width, mode='edge')\n",
|
|
" else:\n",
|
|
" print(f\"Taglio di {abs(offset)} elementi\")\n",
|
|
" prev_pred = prev_pred[-len(current_features):]\n",
|
|
" \n",
|
|
" # Scaling delle predizioni\n",
|
|
" feature_scaler = MinMaxScaler()\n",
|
|
" prev_pred_scaled = feature_scaler.fit_transform(prev_pred.reshape(-1, 1))\n",
|
|
" feature_scalers[f\"{prev_target}_pred\"] = feature_scaler\n",
|
|
" \n",
|
|
" print(f\"Statistiche feature {prev_target}:\")\n",
|
|
" print(f\"- Shape: {prev_pred_scaled.shape}\")\n",
|
|
" print(f\"- Range: [{prev_pred_scaled.min():.4f}, {prev_pred_scaled.max():.4f}]\")\n",
|
|
" \n",
|
|
" new_features_list.append(prev_pred_scaled)\n",
|
|
" \n",
|
|
" # Aggiunta delle nuove features\n",
|
|
" if new_features_list:\n",
|
|
" print(\"\\nVerifica dimensioni prima della concatenazione:\")\n",
|
|
" lengths = [feat.shape[0] for feat in [current_features] + new_features_list]\n",
|
|
" if len(set(lengths)) > 1:\n",
|
|
" print(\"WARNING: Lunghezze diverse rilevate, allineamento necessario\")\n",
|
|
" min_length = min(lengths)\n",
|
|
" current_features = current_features[-min_length:]\n",
|
|
" new_features_list = [feat[-min_length:] for feat in new_features_list]\n",
|
|
" \n",
|
|
" try:\n",
|
|
" current_features = np.column_stack([current_features] + new_features_list)\n",
|
|
" print(f\"Nuove dimensioni features: {current_features.shape}\")\n",
|
|
" except ValueError as e:\n",
|
|
" print(f\"Errore nella concatenazione: {str(e)}\")\n",
|
|
" print(\"\\nDimensioni:\")\n",
|
|
" print(f\"- Current: {current_features.shape}\")\n",
|
|
" for i, feat in enumerate(new_features_list):\n",
|
|
" print(f\"- New {i}: {feat.shape}\")\n",
|
|
" raise\n",
|
|
" \n",
|
|
" # 2. Preparazione dati per il training\n",
|
|
" print(\"\\nPreparazione dati di training...\")\n",
|
|
" data_dict, scaler_y = prepare_model_specific_data(\n",
|
|
" current_features, y, config['index'], timesteps\n",
|
|
" )\n",
|
|
" scalers[target] = scaler_y\n",
|
|
" \n",
|
|
" # 3. Creazione e compilazione del modello\n",
|
|
" print(\"\\nCreazione modello...\")\n",
|
|
" input_shape = (timesteps, current_features.shape[1])\n",
|
|
" print(f\"Input shape: {input_shape}\")\n",
|
|
" \n",
|
|
" if config['needs_solar_params']:\n",
|
|
" model = config['creator'](input_shape, solar_params_shape=(3,))\n",
|
|
" solar_params = data_processed[['solar_angle', 'clear_sky_index', 'solar_elevation']].values\n",
|
|
" else:\n",
|
|
" model = config['creator'](input_shape)\n",
|
|
" \n",
|
|
" model.compile(\n",
|
|
" optimizer=Adam(learning_rate=0.001, clipnorm=1.0),\n",
|
|
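"            # clipnorm=1.0: clipping della norma del gradiente per stabilizzare il training\n",
|
|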
" loss='huber',\n",
|
|
" metrics=['mae']\n",
|
|
" )\n",
|
|
" model.summary()\n",
|
|
" \n",
|
|
" # 4. Training\n",
|
|
" print(\"\\nInizio training...\")\n",
|
|
" callbacks = create_callbacks(target)\n",
|
|
" \n",
|
|
" try:\n",
|
|
" if config['needs_solar_params']:\n",
|
|
" history = model.fit(\n",
|
|
" [data_dict['train'][0], solar_params[:len(data_dict['train'][0])]],\n",
|
|
" data_dict['train'][1],\n",
|
|
" validation_data=([\n",
|
|
" data_dict['val'][0],\n",
|
|
" solar_params[len(data_dict['train'][0]):len(data_dict['train'][0])+len(data_dict['val'][0])]\n",
|
|
" ], data_dict['val'][1]),\n",
|
|
" epochs=50,\n",
|
|
" batch_size=32,\n",
|
|
" callbacks=callbacks,\n",
|
|
" verbose=1\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Genera predizioni complete\n",
|
|
" print(\"\\nGenerazione predizioni complete...\")\n",
|
|
" all_sequences = create_sequences(timesteps, current_features)\n",
|
|
" predictions = model.predict(\n",
|
|
" [all_sequences, solar_params[:len(all_sequences)]]\n",
|
|
" )\n",
|
|
" else:\n",
|
|
" history = model.fit(\n",
|
|
" data_dict['train'][0],\n",
|
|
" data_dict['train'][1],\n",
|
|
" validation_data=(data_dict['val'][0], data_dict['val'][1]),\n",
|
|
" epochs=50,\n",
|
|
" batch_size=32,\n",
|
|
" callbacks=callbacks,\n",
|
|
" verbose=1\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Genera predizioni complete\n",
|
|
" print(\"\\nGenerazione predizioni complete...\")\n",
|
|
" all_sequences = create_sequences(timesteps, current_features)\n",
|
|
" predictions = model.predict(all_sequences)\n",
|
|
" \n",
|
|
" # Denormalizza e processa le predizioni\n",
|
|
" predictions = scaler_y.inverse_transform(predictions)\n",
|
|
" predictions = np.maximum(predictions, 0) # Assicura non-negatività\n",
|
|
" predictions_by_target[target] = predictions\n",
|
|
" \n",
|
|
" print(f\"\\nStatistiche finali predizioni {target}:\")\n",
|
|
" print(f\"- Min: {predictions.min():.4f}\")\n",
|
|
" print(f\"- Max: {predictions.max():.4f}\")\n",
|
|
" print(f\"- Media: {predictions.mean():.4f}\")\n",
|
|
" \n",
|
|
" models[target] = model\n",
|
|
" histories[target] = history\n",
|
|
" \n",
|
|
" except Exception as e:\n",
|
|
" print(f\"\\nERRORE nel training di {target}: {str(e)}\")\n",
|
|
" raise\n",
|
|
" \n",
|
|
" # Aggiunta degli scaler delle feature al dizionario principale\n",
|
|
" scalers.update(feature_scalers)\n",
|
|
"\n",
|
|
"    # Informazioni per ricostruire in fase di predizione gli input visti da ciascun modello durante il training\n",
|
|
"    model_info = {\n",
|
|
"        target: {\n",
|
|
"            'input_shape': (timesteps, X_scaled.shape[1] + len(config['previous_predictions_needed'])),\n",
|
|
"            'feature_order': [f'{prev}_pred' for prev in config['previous_predictions_needed']] if config['previous_predictions_needed'] else None,\n",
|
|
"            'needs_solar_params': config['needs_solar_params']\n",
|
|
"        }\n",
|
|
"        for target, config in model_configs.items()\n",
|
|
"    }\n",
|
|
" \n",
|
|
" # Salva il model_info insieme agli scaler\n",
|
|
" scalers['model_info'] = model_info\n",
|
|
" \n",
|
|
" return models, histories, scalers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"metadata": {
|
|
"id": "m7b_9nBJthA9"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def save_models_and_scalers(models, scalers, target_variables, base_path='./kaggle/working/models'):\n",
|
|
" \"\"\"\n",
|
|
" Salva i modelli Keras, gli scaler e gli artefatti aggiuntivi nella cartella models.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" models : dict\n",
|
|
" Dizionario contenente i modelli Keras per ogni variabile target\n",
|
|
" scalers : dict\n",
|
|
" Dizionario contenente tutti gli scaler (compresi X, target e predizioni)\n",
|
|
" target_variables : list\n",
|
|
" Lista delle variabili target\n",
|
|
" base_path : str\n",
|
|
" Percorso base dove salvare i modelli\n",
|
|
" \"\"\"\n",
|
|
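"    # Esempio d'uso (ipotetico): save_models_and_scalers(models, scalers, ['solarradiation', 'solarenergy', 'uvindex'])\n",
|
|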
" if isinstance(base_path, list):\n",
|
|
" base_path = './kaggle/working/models' # Path di default se viene passata una lista\n",
|
|
" \n",
|
|
" # Crea la cartella base se non esiste\n",
|
|
" os.makedirs(base_path, exist_ok=True)\n",
|
|
"\n",
|
|
" # Salva tutti gli scaler\n",
|
|
" scaler_path = os.path.join(base_path, 'scalers')\n",
|
|
" os.makedirs(scaler_path, exist_ok=True)\n",
|
|
" \n",
|
|
" # Salva ogni scaler separatamente\n",
|
|
" print(\"\\nSalvataggio scaler:\")\n",
|
|
" for scaler_name, scaler in scalers.items():\n",
|
|
" scaler_file = os.path.join(scaler_path, f'{scaler_name}.joblib')\n",
|
|
" joblib.dump(scaler, scaler_file)\n",
|
|
" print(f\"- Salvato scaler: {scaler_name}\")\n",
|
|
"\n",
|
|
" # Salva la configurazione dei modelli\n",
|
|
" model_configs = {\n",
|
|
" 'solarradiation': {'has_solar_params': True},\n",
|
|
" 'solarenergy': {'has_solar_params': False},\n",
|
|
" 'uvindex': {'has_solar_params': False}\n",
|
|
" }\n",
|
|
" config_path = os.path.join(base_path, 'model_configs.joblib')\n",
|
|
" joblib.dump(model_configs, config_path)\n",
|
|
"\n",
|
|
" # Salva i modelli e gli artefatti per ogni variabile target\n",
|
|
" print(\"\\nSalvataggio modelli e artefatti:\")\n",
|
|
" for target in target_variables:\n",
|
|
" print(f\"\\nProcessing {target}...\")\n",
|
|
" # Crea una sottocartella per ogni target\n",
|
|
" target_path = os.path.join(base_path, target)\n",
|
|
" os.makedirs(target_path, exist_ok=True)\n",
|
|
"\n",
|
|
" try:\n",
|
|
" # 1. Salva il modello completo\n",
|
|
" model_path = os.path.join(target_path, 'model.keras')\n",
|
|
" models[target].save(model_path, save_format='keras')\n",
|
|
" print(f\"- Salvato modello completo: {model_path}\")\n",
|
|
"\n",
|
|
" # 2. Salva i pesi separatamente\n",
|
|
" weights_path = os.path.join(target_path, 'weights')\n",
|
|
" os.makedirs(weights_path, exist_ok=True)\n",
|
|
" weight_file = os.path.join(weights_path, 'weights')\n",
|
|
" models[target].save_weights(weight_file)\n",
|
|
" print(f\"- Salvati pesi: {weight_file}\")\n",
|
|
"\n",
|
|
" # 3. Salva il plot del modello\n",
|
|
" plot_path = os.path.join(target_path, f'{target}_architecture.png')\n",
|
|
" tf.keras.utils.plot_model(\n",
|
|
" models[target],\n",
|
|
" to_file=plot_path,\n",
|
|
" show_shapes=True,\n",
|
|
" show_layer_names=True,\n",
|
|
" rankdir='TB',\n",
|
|
" expand_nested=True,\n",
|
|
" dpi=150\n",
|
|
" )\n",
|
|
" print(f\"- Salvato plot architettura: {plot_path}\")\n",
|
|
"\n",
|
|
" # 4. Salva il summary del modello in un file di testo\n",
|
|
" summary_path = os.path.join(target_path, f'{target}_summary.txt')\n",
|
|
" with open(summary_path, 'w') as f:\n",
|
|
" models[target].summary(print_fn=lambda x: f.write(x + '\\n'))\n",
|
|
" print(f\"- Salvato summary modello: {summary_path}\")\n",
|
|
"\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore nel salvataggio degli artefatti per {target}: {str(e)}\")\n",
|
|
"\n",
|
|
" # Salva la lista delle variabili target\n",
|
|
" target_vars_path = os.path.join(base_path, 'target_variables.joblib')\n",
|
|
" joblib.dump(target_variables, target_vars_path)\n",
|
|
"\n",
|
|
" # Salva un file README con la struttura e le informazioni\n",
|
|
" readme_path = os.path.join(base_path, 'README.txt')\n",
|
|
" with open(readme_path, 'w') as f:\n",
|
|
" f.write(\"Model Artifacts Directory Structure\\n\")\n",
|
|
" f.write(\"=================================\\n\\n\")\n",
|
|
" f.write(\"Directory structure:\\n\")\n",
|
|
" f.write(\"- scalers/: Contains all scalers used in the models\\n\")\n",
|
|
" f.write(\"- model_configs.joblib: Configuration for each model\\n\")\n",
|
|
" f.write(\"- target_variables.joblib: List of target variables\\n\")\n",
|
|
" f.write(\"\\nFor each target variable:\\n\")\n",
|
|
" f.write(\"- model.keras: Complete model\\n\")\n",
|
|
" f.write(\"- weights/: Model weights\\n\")\n",
|
|
" f.write(\"- *_architecture.png: Visual representation of model architecture\\n\")\n",
|
|
" f.write(\"- *_summary.txt: Detailed model summary\\n\\n\")\n",
|
|
" f.write(\"Saved scalers:\\n\")\n",
|
|
" for scaler_name in scalers.keys():\n",
|
|
" f.write(f\"- {scaler_name}\\n\")\n",
|
|
"\n",
|
|
" print(f\"\\nTutti gli artefatti salvati in: {base_path}\")\n",
|
|
" print(f\"Consulta {readme_path} per i dettagli sulla struttura\")\n",
|
|
"\n",
|
|
" return base_path\n",
|
|
"\n",
|
|
"def load_models_and_scalers(base_path='./kaggle/working/models'):\n",
|
|
" \"\"\"\n",
|
|
" Carica i modelli Keras e tutti gli scaler dalla cartella models.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" base_path : str\n",
|
|
" Percorso della cartella contenente i modelli salvati\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" tuple\n",
|
|
" (models, scalers, target_variables)\n",
|
|
" \"\"\"\n",
|
|
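"    # Esempio d'uso (ipotetico): models, scalers, target_variables = load_models_and_scalers()\n",
|
|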
" try:\n",
|
|
" # Carica la lista delle variabili target\n",
|
|
" target_vars_path = os.path.join(base_path, 'target_variables.joblib')\n",
|
|
" target_variables = joblib.load(target_vars_path)\n",
|
|
"\n",
|
|
" # Carica tutti gli scaler\n",
|
|
" scaler_path = os.path.join(base_path, 'scalers')\n",
|
|
" scalers = {}\n",
|
|
" for scaler_file in os.listdir(scaler_path):\n",
|
|
" if scaler_file.endswith('.joblib'):\n",
|
|
" scaler_name = scaler_file[:-7] # rimuove '.joblib'\n",
|
|
" scaler_file_path = os.path.join(scaler_path, scaler_file)\n",
|
|
" scalers[scaler_name] = joblib.load(scaler_file_path)\n",
|
|
"\n",
|
|
" # Carica la configurazione dei modelli\n",
|
|
" config_path = os.path.join(base_path, 'model_configs.joblib')\n",
|
|
" model_configs = joblib.load(config_path)\n",
|
|
"\n",
|
|
" # Inizializza il dizionario dei modelli\n",
|
|
" models = {}\n",
|
|
"\n",
|
|
" # Carica i custom layer se necessario\n",
|
|
" custom_objects = {\n",
|
|
" 'DataAugmentation': DataAugmentation,\n",
|
|
" 'PositionalEncoding': PositionalEncoding\n",
|
|
" }\n",
|
|
"\n",
|
|
" # Carica i modelli per ogni variabile target\n",
|
|
" for target in target_variables:\n",
|
|
" target_path = os.path.join(base_path, target)\n",
|
|
" \n",
|
|
" # Carica il model summary per ottenere le dimensioni corrette\n",
|
|
" summary_path = os.path.join(target_path, f'{target}_summary.txt')\n",
|
|
" input_shape = None\n",
|
|
" if os.path.exists(summary_path):\n",
|
|
" with open(summary_path, 'r') as f:\n",
|
|
" for line in f:\n",
|
|
" if 'Input Shape' in line:\n",
|
|
" # Estrai la shape dal summary\n",
|
|
" shape_str = line.split(':')[-1].strip()\n",
|
|
" shape_tuple = eval(shape_str)\n",
|
|
" input_shape = shape_tuple\n",
|
|
" break\n",
|
|
" \n",
|
|
" if input_shape is None:\n",
|
|
" # Fallback alle dimensioni di base\n",
|
|
"                base_features = scalers['X'].n_features_in_  # get_params() non espone feature_names_in_: si usa l'attributo n_features_in_ dello scaler\n",
|
|
" # Aggiungi feature per le predizioni precedenti\n",
|
|
" additional_features = 0\n",
|
|
" if target == 'solarenergy':\n",
|
|
" additional_features = 1 # solarradiation\n",
|
|
" elif target == 'uvindex':\n",
|
|
" additional_features = 2 # solarradiation + solarenergy\n",
|
|
" input_shape = (24, base_features + additional_features)\n",
|
|
" \n",
|
|
" # Carica il modello\n",
|
|
" model_path = os.path.join(target_path, 'model.keras')\n",
|
|
" try:\n",
|
|
" # Prima prova a caricare il modello completo\n",
|
|
" models[target] = tf.keras.models.load_model(\n",
|
|
" model_path,\n",
|
|
" custom_objects=custom_objects\n",
|
|
" )\n",
|
|
" print(f\"Caricato modello {target} da file\")\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore nel caricamento del modello {target}: {str(e)}\")\n",
|
|
" print(\"Tentativo di ricostruzione del modello...\")\n",
|
|
" \n",
|
|
" # Se fallisce, ricostruisci il modello e carica i pesi\n",
|
|
" if target == 'solarradiation':\n",
|
|
" models[target] = create_radiation_model(input_shape)\n",
|
|
" elif target == 'solarenergy':\n",
|
|
" models[target] = create_energy_model(input_shape)\n",
|
|
" else: # uvindex\n",
|
|
" models[target] = create_uv_model(input_shape)\n",
|
|
" \n",
|
|
" # Carica i pesi\n",
|
|
" weights_path = os.path.join(target_path, 'weights', 'weights')\n",
|
|
" models[target].load_weights(weights_path)\n",
|
|
" print(f\"Modello {target} ricostruito e pesi caricati\")\n",
|
|
"\n",
|
|
" print(f\"Modelli e scaler caricati da: {base_path}\")\n",
|
|
" print(\"Scaler caricati:\")\n",
|
|
" for scaler_name in scalers.keys():\n",
|
|
" print(f\"- {scaler_name}\")\n",
|
|
" \n",
|
|
" return models, scalers, target_variables\n",
|
|
"\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore nel caricamento dei modelli: {str(e)}\")\n",
|
|
" raise"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def predict_solar_variables(data_before_2010, features, models, scalers, target_variables, timesteps=24):\n",
|
|
" \"\"\"\n",
|
|
" Effettua predizioni sequenziali per le variabili solari usando le informazioni\n",
|
|
" salvate durante il training.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" data_before_2010 : pd.DataFrame\n",
|
|
" Dati storici da predire\n",
|
|
" features : list\n",
|
|
" Lista delle feature da utilizzare\n",
|
|
" models : dict\n",
|
|
" Dizionario dei modelli per ogni target\n",
|
|
" scalers : dict\n",
|
|
" Dizionario contenente tutti gli scaler e le informazioni sui modelli\n",
|
|
" target_variables : list\n",
|
|
" Lista delle variabili target\n",
|
|
" timesteps : int\n",
|
|
" Numero di timestep per le sequenze\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" pd.DataFrame\n",
|
|
" DataFrame con le predizioni aggiunte\n",
|
|
" \"\"\"\n",
|
|
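"    # Esempio d'uso (ipotetico): data_con_pred = predict_solar_variables(data_before_2010, features, models, scalers, target_variables)\n",
|
|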
" import traceback\n",
|
|
" \n",
|
|
" # Crea copia dei dati\n",
|
|
" data = data_before_2010.copy()\n",
|
|
" \n",
|
|
" # Prepara i dati di base\n",
|
|
" X_before = data[features].values\n",
|
|
" current_features = scalers['X'].transform(X_before)\n",
|
|
" print(f\"Shape features iniziali: {current_features.shape}\")\n",
|
|
" \n",
|
|
" # Recupera le informazioni sui modelli\n",
|
|
" model_info = scalers['model_info']\n",
|
|
" \n",
|
|
" # Dizionario per tenere traccia delle predizioni\n",
|
|
" predictions_by_target = {}\n",
|
|
" \n",
|
|
" # Prepara i parametri solari\n",
|
|
" solar_params = None\n",
|
|
" if all(col in data.columns for col in ['solar_angle', 'clear_sky_index', 'solar_elevation']):\n",
|
|
" solar_params = data[['solar_angle', 'clear_sky_index', 'solar_elevation']].values\n",
|
|
" \n",
|
|
" for target in target_variables:\n",
|
|
" print(f\"\\n{'='*50}\")\n",
|
|
" print(f\"Previsione di {target}\")\n",
|
|
" print(f\"{'='*50}\")\n",
|
|
" \n",
|
|
" try:\n",
|
|
" # Recupera info specifiche del modello\n",
|
|
" target_info = model_info[target]\n",
|
|
" expected_shape = target_info['input_shape']\n",
|
|
" feature_order = target_info['feature_order']\n",
|
|
" needs_solar_params = target_info['needs_solar_params']\n",
|
|
" \n",
|
|
" # Reset delle feature per ogni target\n",
|
|
" X_current = current_features.copy()\n",
|
|
" \n",
|
|
" # Aggiungi le predizioni precedenti come features se necessario\n",
|
|
" if feature_order is not None:\n",
|
|
" print(\"Aggiunta predizioni precedenti come features\")\n",
|
|
" new_features_list = []\n",
|
|
" \n",
|
|
" for feature_name in feature_order:\n",
|
|
" if feature_name in scalers:\n",
|
|
" base_target = feature_name.replace('_pred', '')\n",
|
|
" if base_target in predictions_by_target:\n",
|
|
" print(f\"Aggiunta predizione di {base_target}\")\n",
|
|
" prev_pred = predictions_by_target[base_target]\n",
|
|
" \n",
|
|
" # Gestione NaN\n",
|
|
" if np.isnan(prev_pred).any():\n",
|
|
" print(f\"ATTENZIONE: Trovati NaN nelle predizioni di {base_target}\")\n",
|
|
"                            prev_pred = np.nan_to_num(prev_pred, nan=0.0)  # il secondo argomento posizionale di nan_to_num è 'copy', non il valore di sostituzione dei NaN\n",
|
|
" \n",
|
|
" # Scala le predizioni\n",
|
|
" prev_pred_scaled = scalers[feature_name].transform(\n",
|
|
" prev_pred.reshape(-1, 1)\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Allinea dimensioni\n",
|
|
" if len(prev_pred_scaled) != len(X_current):\n",
|
|
" if len(prev_pred_scaled) < len(X_current):\n",
|
|
" pad_width = ((len(X_current) - len(prev_pred_scaled), 0), (0, 0))\n",
|
|
" prev_pred_scaled = np.pad(prev_pred_scaled, pad_width, mode='edge')\n",
|
|
" else:\n",
|
|
" prev_pred_scaled = prev_pred_scaled[:len(X_current)]\n",
|
|
" \n",
|
|
" new_features_list.append(prev_pred_scaled)\n",
|
|
" print(f\"Shape dopo aggiunta {base_target}: {prev_pred_scaled.shape}\")\n",
|
|
" \n",
|
|
" if new_features_list:\n",
|
|
" X_current = np.column_stack([X_current] + new_features_list)\n",
|
|
" print(f\"Shape finale features: {X_current.shape}\")\n",
|
|
" \n",
|
|
" # Verifica dimensioni\n",
|
|
" if X_current.shape[1] != expected_shape[1]:\n",
|
|
" raise ValueError(\n",
|
|
" f\"Mismatch nelle dimensioni delle feature per {target}: \"\n",
|
|
" f\"atteso {expected_shape[1]}, ottenuto {X_current.shape[1]}\"\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Crea le sequenze\n",
|
|
" X_seq = create_sequences(timesteps, X_current)\n",
|
|
" print(f\"Shape sequenze: {X_seq.shape}\")\n",
|
|
" \n",
|
|
" # Verifica NaN\n",
|
|
" if np.isnan(X_seq).any():\n",
|
|
" print(\"ATTENZIONE: Trovati NaN nelle sequenze di input\")\n",
|
|
"                X_seq = np.nan_to_num(X_seq, nan=0.0)\n",
|
|
" \n",
|
|
" # Effettua le predizioni\n",
|
|
" if needs_solar_params and solar_params is not None:\n",
|
|
" print(\"Utilizzo modello con parametri solari\")\n",
|
|
" solar_params_seq = solar_params[timesteps:]\n",
|
|
" if len(solar_params_seq) > len(X_seq):\n",
|
|
" solar_params_seq = solar_params_seq[:len(X_seq)]\n",
|
|
" \n",
|
|
" y_pred_scaled = models[target].predict(\n",
|
|
" [X_seq, solar_params_seq],\n",
|
|
" batch_size=32,\n",
|
|
" verbose=1\n",
|
|
" )\n",
|
|
" else:\n",
|
|
" print(\"Utilizzo modello standard\")\n",
|
|
" y_pred_scaled = models[target].predict(\n",
|
|
" X_seq,\n",
|
|
" batch_size=32,\n",
|
|
" verbose=1\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Verifica e processa le predizioni\n",
|
|
" if np.isnan(y_pred_scaled).any():\n",
|
|
" print(\"ATTENZIONE: Trovati NaN nelle predizioni\")\n",
|
|
"                y_pred_scaled = np.nan_to_num(y_pred_scaled, nan=0.0)\n",
|
|
" \n",
|
|
" # Denormalizza\n",
|
|
" y_pred = scalers[target].inverse_transform(y_pred_scaled)\n",
|
|
"            y_pred = np.maximum(y_pred, 0).ravel()  # non-negatività e appiattimento a 1D per l'assegnazione alla colonna del DataFrame\n",
|
|
" \n",
|
|
" # Salva le predizioni\n",
|
|
" predictions_by_target[target] = y_pred\n",
|
|
" \n",
|
|
" # Aggiorna il DataFrame\n",
|
|
" dates = data.index[timesteps:]\n",
|
|
" if len(dates) > len(y_pred):\n",
|
|
" dates = dates[:len(y_pred)]\n",
|
|
" data.loc[dates, target] = y_pred\n",
|
|
" \n",
|
|
" print(f\"\\nStatistiche predizioni per {target}:\")\n",
|
|
" print(f\"Media: {np.mean(y_pred):.2f}\")\n",
|
|
" print(f\"Min: {np.min(y_pred):.2f}\")\n",
|
|
" print(f\"Max: {np.max(y_pred):.2f}\")\n",
|
|
" \n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore nella predizione di {target}: {str(e)}\")\n",
|
|
" print(\"Traceback completo:\", traceback.format_exc())\n",
|
|
" # Inizializza con zeri in caso di errore\n",
|
|
" y_pred = np.zeros(len(data) - timesteps)\n",
|
|
" predictions_by_target[target] = y_pred\n",
|
|
" dates = data.index[timesteps:]\n",
|
|
" data.loc[dates, target] = y_pred\n",
|
|
" continue\n",
|
|
" \n",
|
|
" # Gestisci valori mancanti\n",
|
|
" print(\"\\nGestione valori mancanti...\")\n",
|
|
" data[target_variables] = data[target_variables].fillna(0)\n",
|
|
" missing_counts = data[target_variables].isnull().sum()\n",
|
|
" if missing_counts.any():\n",
|
|
" print(\"Valori mancanti rimanenti:\")\n",
|
|
" print(missing_counts)\n",
|
|
" \n",
|
|
" return data\n",
|
|
"\n",
|
|
"def create_complete_dataset(data_before_2010, data_after_2010, predictions):\n",
|
|
" \"\"\"\n",
|
|
" Combina i dati predetti con i dati esistenti.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" data_before_2010 : pd.DataFrame\n",
|
|
" Dati storici originali\n",
|
|
" data_after_2010 : pd.DataFrame\n",
|
|
" Dati più recenti\n",
|
|
" predictions : pd.DataFrame\n",
|
|
" Dati con predizioni\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" pd.DataFrame\n",
|
|
" Dataset completo combinato\n",
|
|
" \"\"\"\n",
|
|
" # Combina i dataset\n",
|
|
" weather_data_complete = pd.concat([predictions, data_after_2010], axis=0)\n",
|
|
" weather_data_complete = weather_data_complete.sort_index()\n",
|
|
" \n",
|
|
" # Verifica la continuità temporale\n",
|
|
" time_gaps = weather_data_complete.index.to_series().diff().dropna()\n",
|
|
" if time_gaps.max().total_seconds() > 3600: # gap maggiore di 1 ora\n",
|
|
" print(\"Attenzione: Trovati gap temporali nei dati\")\n",
|
|
" print(\"Gap massimo:\", time_gaps.max())\n",
|
|
" \n",
|
|
" return weather_data_complete"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def add_olive_water_consumption_correlation(dataset):\n",
|
|
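"    # Aggiunge al dataset colonne per varietà: fabbisogno idrico stagionale/annuale, temperatura ottimale e resistenza alla siccità\n",
|
|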
" # Dati simulati per il fabbisogno d'acqua e la correlazione con la temperatura\n",
|
|
" fabbisogno_acqua = {\n",
|
|
" \"Nocellara dell'Etna\": {\"Primavera\": 1200, \"Estate\": 2000, \"Autunno\": 1000, \"Inverno\": 500, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n",
|
|
" \"Leccino\": {\"Primavera\": 1000, \"Estate\": 1800, \"Autunno\": 800, \"Inverno\": 400, \"Temperatura Ottimale\": 20, \"Resistenza\": \"Alta\"},\n",
|
|
" \"Frantoio\": {\"Primavera\": 1100, \"Estate\": 1900, \"Autunno\": 900, \"Inverno\": 450, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"},\n",
|
|
" \"Coratina\": {\"Primavera\": 1300, \"Estate\": 2200, \"Autunno\": 1100, \"Inverno\": 550, \"Temperatura Ottimale\": 17, \"Resistenza\": \"Media\"},\n",
|
|
" \"Moraiolo\": {\"Primavera\": 1150, \"Estate\": 2100, \"Autunno\": 900, \"Inverno\": 480, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n",
|
|
" \"Pendolino\": {\"Primavera\": 1050, \"Estate\": 1850, \"Autunno\": 850, \"Inverno\": 430, \"Temperatura Ottimale\": 20, \"Resistenza\": \"Alta\"},\n",
|
|
" \"Taggiasca\": {\"Primavera\": 1000, \"Estate\": 1750, \"Autunno\": 800, \"Inverno\": 400, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"},\n",
|
|
" \"Canino\": {\"Primavera\": 1100, \"Estate\": 1900, \"Autunno\": 900, \"Inverno\": 450, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n",
|
|
" \"Itrana\": {\"Primavera\": 1200, \"Estate\": 2000, \"Autunno\": 1000, \"Inverno\": 500, \"Temperatura Ottimale\": 17, \"Resistenza\": \"Media\"},\n",
|
|
" \"Ogliarola\": {\"Primavera\": 1150, \"Estate\": 1950, \"Autunno\": 900, \"Inverno\": 480, \"Temperatura Ottimale\": 18, \"Resistenza\": \"Media\"},\n",
|
|
" \"Biancolilla\": {\"Primavera\": 1050, \"Estate\": 1800, \"Autunno\": 850, \"Inverno\": 430, \"Temperatura Ottimale\": 19, \"Resistenza\": \"Alta\"}\n",
|
|
" }\n",
|
|
"\n",
|
|
" # Calcola il fabbisogno idrico annuale per ogni varietà\n",
|
|
" for varieta in fabbisogno_acqua:\n",
|
|
" fabbisogno_acqua[varieta][\"Annuale\"] = sum([fabbisogno_acqua[varieta][stagione] for stagione in [\"Primavera\", \"Estate\", \"Autunno\", \"Inverno\"]])\n",
|
|
"\n",
|
|
" # Aggiungiamo le nuove colonne al dataset\n",
|
|
" dataset[\"Fabbisogno Acqua Primavera (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Primavera\"])\n",
|
|
" dataset[\"Fabbisogno Acqua Estate (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Estate\"])\n",
|
|
" dataset[\"Fabbisogno Acqua Autunno (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Autunno\"])\n",
|
|
" dataset[\"Fabbisogno Acqua Inverno (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Inverno\"])\n",
|
|
" dataset[\"Fabbisogno Idrico Annuale (m³/ettaro)\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Annuale\"])\n",
|
|
" dataset[\"Temperatura Ottimale\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Temperatura Ottimale\"])\n",
|
|
" dataset[\"Resistenza alla Siccità\"] = dataset[\"Varietà di Olive\"].apply(lambda x: fabbisogno_acqua[x][\"Resistenza\"])\n",
|
|
"\n",
|
|
" return dataset"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"metadata": {
|
|
"id": "zOeyz5JHthA_"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def preprocess_weather_data(weather_df):\n",
|
|
" # Calcola statistiche mensili per ogni anno\n",
|
|
" monthly_weather = weather_df.groupby(['year', 'month']).agg({\n",
|
|
" 'temp': ['mean', 'min', 'max'],\n",
|
|
" 'humidity': 'mean',\n",
|
|
" 'precip': 'sum',\n",
|
|
" 'windspeed': 'mean',\n",
|
|
" 'cloudcover': 'mean',\n",
|
|
" 'solarradiation': 'sum',\n",
|
|
" 'solarenergy': 'sum',\n",
|
|
" 'uvindex': 'max'\n",
|
|
" }).reset_index()\n",
|
|
"\n",
|
|
" monthly_weather.columns = ['year', 'month'] + [f'{col[0]}_{col[1]}' for col in monthly_weather.columns[2:]]\n",
|
|
" return monthly_weather\n",
|
|
"\n",
|
|
"\n",
|
|
"def get_growth_phase(month):\n",
|
|
" if month in [12, 1, 2]:\n",
|
|
" return 'dormancy'\n",
|
|
" elif month in [3, 4, 5]:\n",
|
|
" return 'flowering'\n",
|
|
" elif month in [6, 7, 8]:\n",
|
|
" return 'fruit_set'\n",
|
|
" else:\n",
|
|
" return 'ripening'\n",
|
|
"\n",
|
|
"\n",
|
|
"def calculate_weather_effect(row, optimal_temp):\n",
|
|
" # Effetti base\n",
|
|
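"    # penalità quadratica: effetto nullo alla temperatura ottimale, crescente allontanandosene\n",
|
|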
" temp_effect = -0.1 * (row['temp_mean'] - optimal_temp) ** 2\n",
|
|
" rain_effect = -0.05 * (row['precip_sum'] - 600) ** 2 / 10000\n",
|
|
" sun_effect = 0.1 * row['solarenergy_sum'] / 1000\n",
|
|
"\n",
|
|
" # Fattori di scala basati sulla fase di crescita\n",
|
|
" if row['growth_phase'] == 'dormancy':\n",
|
|
" temp_scale = 0.5\n",
|
|
" rain_scale = 0.2\n",
|
|
" sun_scale = 0.1\n",
|
|
" elif row['growth_phase'] == 'flowering':\n",
|
|
" temp_scale = 2.0\n",
|
|
" rain_scale = 1.5\n",
|
|
" sun_scale = 1.0\n",
|
|
" elif row['growth_phase'] == 'fruit_set':\n",
|
|
" temp_scale = 1.5\n",
|
|
" rain_scale = 1.0\n",
|
|
" sun_scale = 0.8\n",
|
|
" else: # ripening\n",
|
|
" temp_scale = 1.0\n",
|
|
" rain_scale = 0.5\n",
|
|
" sun_scale = 1.2\n",
|
|
"\n",
|
|
" # Calcolo dell'effetto combinato\n",
|
|
" combined_effect = (\n",
|
|
" temp_scale * temp_effect +\n",
|
|
" rain_scale * rain_effect +\n",
|
|
" sun_scale * sun_effect\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Aggiustamenti specifici per fase\n",
|
|
" if row['growth_phase'] == 'flowering':\n",
|
|
" combined_effect -= 0.5 * max(0, row['precip_sum'] - 50) # Penalità per pioggia eccessiva durante la fioritura\n",
|
|
" elif row['growth_phase'] == 'fruit_set':\n",
|
|
" combined_effect += 0.3 * max(0, row['temp_mean'] - (optimal_temp + 5)) # Bonus per temperature più alte durante la formazione dei frutti\n",
|
|
"\n",
|
|
" return combined_effect\n",
|
|
"\n",
|
|
"\n",
|
|
"def calculate_water_need(weather_data, base_need, optimal_temp):\n",
|
|
" # Calcola il fabbisogno idrico basato su temperatura e precipitazioni\n",
|
|
" temp_factor = 1 + 0.05 * (weather_data['temp_mean'] - optimal_temp) # Aumenta del 5% per ogni grado sopra l'ottimale\n",
|
|
" rain_factor = 1 - 0.001 * weather_data['precip_sum'] # Diminuisce leggermente con l'aumentare delle precipitazioni\n",
|
|
" return base_need * temp_factor * rain_factor\n",
|
|
"\n",
|
|
"\n",
|
|
"def clean_column_name(name):\n",
|
|
" # Rimuove caratteri speciali e spazi, converte in snake_case e abbrevia\n",
|
|
" name = re.sub(r'[^a-zA-Z0-9\\s]', '', name) # Rimuove caratteri speciali\n",
|
|
" name = name.lower().replace(' ', '_') # Converte in snake_case\n",
|
|
"\n",
|
|
" # Abbreviazioni comuni\n",
|
|
" abbreviations = {\n",
|
|
" 'production': 'prod',\n",
|
|
" 'percentage': 'pct',\n",
|
|
" 'hectare': 'ha',\n",
|
|
" 'tonnes': 't',\n",
|
|
" 'litres': 'l',\n",
|
|
" 'minimum': 'min',\n",
|
|
" 'maximum': 'max',\n",
|
|
" 'average': 'avg'\n",
|
|
" }\n",
|
|
"\n",
|
|
" for full, abbr in abbreviations.items():\n",
|
|
" name = name.replace(full, abbr)\n",
|
|
"\n",
|
|
" return name\n",
|
|
"\n",
|
|
"\n",
|
|
"def create_technique_mapping(olive_varieties, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n",
|
|
" # Estrai tutte le tecniche uniche dal dataset e convertile in lowercase\n",
|
|
" all_techniques = olive_varieties['Tecnica di Coltivazione'].str.lower().unique()\n",
|
|
"\n",
|
|
" # Crea il mapping partendo da 1\n",
|
|
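"    # si parte da 1 perché il codice 0 è riservato alle varietà assenti (tecnica mancante)\n",
|
|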
" technique_mapping = {tech: i + 1 for i, tech in enumerate(sorted(all_techniques))}\n",
|
|
"\n",
|
|
" # Salva il mapping\n",
|
|
" os.makedirs(os.path.dirname(mapping_path), exist_ok=True)\n",
|
|
" joblib.dump(technique_mapping, mapping_path)\n",
|
|
"\n",
|
|
" return technique_mapping\n",
|
|
"\n",
|
|
"\n",
|
|
"def encode_techniques(df, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n",
|
|
" if not os.path.exists(mapping_path):\n",
|
|
" raise FileNotFoundError(f\"Mapping not found at {mapping_path}. Run create_technique_mapping first.\")\n",
|
|
"\n",
|
|
" technique_mapping = joblib.load(mapping_path)\n",
|
|
"\n",
|
|
" # Trova tutte le colonne delle tecniche\n",
|
|
" tech_columns = [col for col in df.columns if col.endswith('_tech')]\n",
|
|
"\n",
|
|
" # Applica il mapping a tutte le colonne delle tecniche\n",
|
|
" for col in tech_columns:\n",
|
|
" df[col] = df[col].str.lower().map(technique_mapping).fillna(0).astype(int)\n",
|
|
"\n",
|
|
" return df\n",
|
|
"\n",
|
|
"\n",
|
|
"def decode_techniques(df, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n",
|
|
" if not os.path.exists(mapping_path):\n",
|
|
" raise FileNotFoundError(f\"Mapping not found at {mapping_path}\")\n",
|
|
"\n",
|
|
" technique_mapping = joblib.load(mapping_path)\n",
|
|
" reverse_mapping = {v: k for k, v in technique_mapping.items()}\n",
|
|
" reverse_mapping[0] = '' # Aggiungi un mapping per 0 a stringa vuota\n",
|
|
"\n",
|
|
" # Trova tutte le colonne delle tecniche\n",
|
|
" tech_columns = [col for col in df.columns if col.endswith('_tech')]\n",
|
|
"\n",
|
|
" # Applica il reverse mapping a tutte le colonne delle tecniche\n",
|
|
" for col in tech_columns:\n",
|
|
" df[col] = df[col].map(reverse_mapping)\n",
|
|
"\n",
|
|
" return df\n",
|
|
"\n",
|
|
"\n",
|
|
"def decode_single_technique(technique_value, mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n",
|
|
" if not os.path.exists(mapping_path):\n",
|
|
" raise FileNotFoundError(f\"Mapping not found at {mapping_path}\")\n",
|
|
"\n",
|
|
" technique_mapping = joblib.load(mapping_path)\n",
|
|
" reverse_mapping = {v: k for k, v in technique_mapping.items()}\n",
|
|
" reverse_mapping[0] = ''\n",
|
|
"\n",
|
|
" return reverse_mapping.get(technique_value, '')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def get_optimal_workers():\n",
|
|
" \"\"\"\n",
|
|
" Calcola il numero ottimale di workers basandosi sulle risorse del sistema.\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" int: Numero ottimale di workers\n",
|
|
" \"\"\"\n",
|
|
" # Ottiene il numero di CPU logiche (inclusi i thread virtuali)\n",
|
|
" cpu_count = multiprocessing.cpu_count()\n",
|
|
"\n",
|
|
" # Ottiene la memoria totale e disponibile in GB\n",
|
|
" memory = psutil.virtual_memory()\n",
|
|
" total_memory_gb = memory.total / (1024 ** 3)\n",
|
|
" available_memory_gb = memory.available / (1024 ** 3)\n",
|
|
"\n",
|
|
" # Stima della memoria necessaria per worker (esempio: 2GB per worker)\n",
|
|
" memory_per_worker_gb = 2\n",
|
|
"\n",
|
|
" # Calcola il numero massimo di workers basato sulla memoria disponibile\n",
|
|
" max_workers_by_memory = int(available_memory_gb / memory_per_worker_gb)\n",
|
|
"\n",
|
|
" # Usa il minimo tra:\n",
|
|
" # - numero di CPU disponibili - 1 (lascia una CPU libera per il sistema)\n",
|
|
" # - numero massimo di workers basato sulla memoria\n",
|
|
"    # - un limite massimo arbitrario (32) per evitare troppo overhead\n",
|
|
" optimal_workers = min(\n",
|
|
" cpu_count - 1,\n",
|
|
" max_workers_by_memory,\n",
|
|
" 32 # limite massimo arbitrario\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Assicura almeno 1 worker\n",
|
|
" return max(1, optimal_workers)\n",
|
|
"\n",
|
|
"\n",
|
|
"def simulate_zone(base_weather, olive_varieties, year, zone, all_varieties, variety_techniques):\n",
|
|
" \"\"\"\n",
|
|
" Simula la produzione di olive per una singola zona.\n",
|
|
" \n",
|
|
" Args:\n",
|
|
" base_weather: DataFrame con dati meteo di base per l'anno selezionato\n",
|
|
"        olive_varieties: DataFrame con le informazioni sulle varietà di olive\n",
|
|
"        year: Anno meteorologico selezionato per la simulazione\n",
|
|
" zone: ID della zona\n",
|
|
" all_varieties: Array con tutte le varietà disponibili\n",
|
|
" variety_techniques: Dict con le tecniche disponibili per ogni varietà\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" Dict con i risultati della simulazione per la zona\n",
|
|
" \"\"\"\n",
|
|
" # Crea una copia dei dati meteo per questa zona specifica\n",
|
|
" zone_weather = base_weather.copy()\n",
|
|
"\n",
|
|
" # Genera variazioni meteorologiche specifiche per questa zona\n",
|
|
" zone_weather['temp_mean'] *= np.random.uniform(0.95, 1.05, len(zone_weather))\n",
|
|
" zone_weather['precip_sum'] *= np.random.uniform(0.9, 1.1, len(zone_weather))\n",
|
|
" zone_weather['solarenergy_sum'] *= np.random.uniform(0.95, 1.05, len(zone_weather))\n",
|
|
"\n",
|
|
" # Genera caratteristiche specifiche della zona\n",
|
|
" num_varieties = np.random.randint(1, 4) # 1-3 varietà per zona\n",
|
|
" selected_varieties = np.random.choice(all_varieties, size=num_varieties, replace=False)\n",
|
|
" hectares = np.random.uniform(1, 10) # Dimensione del terreno\n",
|
|
" percentages = np.random.dirichlet(np.ones(num_varieties)) # Distribuzione delle varietà\n",
|
|
"\n",
|
|
" # Inizializzazione contatori annuali\n",
|
|
" annual_production = 0\n",
|
|
" annual_min_oil = 0\n",
|
|
" annual_max_oil = 0\n",
|
|
" annual_avg_oil = 0\n",
|
|
" annual_water_need = 0\n",
|
|
"\n",
|
|
" # Inizializzazione dizionario dati varietà\n",
|
|
" variety_data = {clean_column_name(variety): {\n",
|
|
" 'tech': '',\n",
|
|
" 'pct': 0,\n",
|
|
" 'prod_t_ha': 0,\n",
|
|
" 'oil_prod_t_ha': 0,\n",
|
|
" 'oil_prod_l_ha': 0,\n",
|
|
" 'min_yield_pct': 0,\n",
|
|
" 'max_yield_pct': 0,\n",
|
|
" 'min_oil_prod_l_ha': 0,\n",
|
|
" 'max_oil_prod_l_ha': 0,\n",
|
|
" 'avg_oil_prod_l_ha': 0,\n",
|
|
" 'l_per_t': 0,\n",
|
|
" 'min_l_per_t': 0,\n",
|
|
" 'max_l_per_t': 0,\n",
|
|
" 'avg_l_per_t': 0,\n",
|
|
" 'olive_prod': 0,\n",
|
|
" 'min_oil_prod': 0,\n",
|
|
" 'max_oil_prod': 0,\n",
|
|
" 'avg_oil_prod': 0,\n",
|
|
" 'water_need': 0\n",
|
|
" } for variety in all_varieties}\n",
|
|
"\n",
|
|
" # Simula produzione per ogni varietà selezionata\n",
|
|
" for i, variety in enumerate(selected_varieties):\n",
|
|
" # Seleziona tecnica di coltivazione casuale per questa varietà\n",
|
|
" technique = np.random.choice(variety_techniques[variety])\n",
|
|
" percentage = percentages[i]\n",
|
|
"\n",
|
|
" # Ottieni informazioni specifiche della varietà\n",
|
|
" variety_info = olive_varieties[\n",
|
|
" (olive_varieties['Varietà di Olive'] == variety) &\n",
|
|
" (olive_varieties['Tecnica di Coltivazione'] == technique)\n",
|
|
" ].iloc[0]\n",
|
|
"\n",
|
|
" # Calcola produzione base con variabilità\n",
|
|
" base_production = variety_info['Produzione (tonnellate/ettaro)'] * 1000 * percentage * hectares / 12\n",
|
|
" base_production *= np.random.uniform(0.9, 1.1)\n",
|
|
"\n",
|
|
" # Calcola effetti meteo sulla produzione\n",
|
|
" weather_effect = zone_weather.apply(\n",
|
|
" lambda row: calculate_weather_effect(row, variety_info['Temperatura Ottimale']),\n",
|
|
" axis=1\n",
|
|
" )\n",
|
|
" monthly_production = base_production * (1 + weather_effect / 10000)\n",
|
|
" monthly_production *= np.random.uniform(0.95, 1.05, len(zone_weather))\n",
|
|
"\n",
|
|
" # Calcola produzione annuale per questa varietà\n",
|
|
" annual_variety_production = monthly_production.sum()\n",
|
|
"\n",
|
|
" # Calcola rese di olio con variabilità\n",
|
|
" min_yield_factor = np.random.uniform(0.95, 1.05)\n",
|
|
" max_yield_factor = np.random.uniform(0.95, 1.05)\n",
|
|
" avg_yield_factor = (min_yield_factor + max_yield_factor) / 2\n",
|
|
"\n",
|
|
" min_oil_production = annual_variety_production * variety_info['Min Litri per Tonnellata'] / 1000 * min_yield_factor\n",
|
|
" max_oil_production = annual_variety_production * variety_info['Max Litri per Tonnellata'] / 1000 * max_yield_factor\n",
|
|
" avg_oil_production = annual_variety_production * variety_info['Media Litri per Tonnellata'] / 1000 * avg_yield_factor\n",
|
|
"\n",
|
|
" # Calcola fabbisogno idrico\n",
|
|
" base_water_need = (\n",
|
|
" variety_info['Fabbisogno Acqua Primavera (m³/ettaro)'] +\n",
|
|
" variety_info['Fabbisogno Acqua Estate (m³/ettaro)'] +\n",
|
|
" variety_info['Fabbisogno Acqua Autunno (m³/ettaro)'] +\n",
|
|
" variety_info['Fabbisogno Acqua Inverno (m³/ettaro)']\n",
|
|
" ) / 4\n",
|
|
"\n",
|
|
" monthly_water_need = zone_weather.apply(\n",
|
|
" lambda row: calculate_water_need(row, base_water_need, variety_info['Temperatura Ottimale']),\n",
|
|
" axis=1\n",
|
|
" )\n",
|
|
" monthly_water_need *= np.random.uniform(0.95, 1.05, len(monthly_water_need))\n",
|
|
" annual_variety_water_need = monthly_water_need.sum() * percentage * hectares\n",
|
|
"\n",
|
|
" # Aggiorna totali annuali\n",
|
|
" annual_production += annual_variety_production\n",
|
|
" annual_min_oil += min_oil_production\n",
|
|
" annual_max_oil += max_oil_production\n",
|
|
" annual_avg_oil += avg_oil_production\n",
|
|
" annual_water_need += annual_variety_water_need\n",
|
|
"\n",
|
|
" # Aggiorna dati varietà\n",
|
|
" clean_variety = clean_column_name(variety)\n",
|
|
" variety_data[clean_variety].update({\n",
|
|
" 'tech': clean_column_name(technique),\n",
|
|
" 'pct': percentage,\n",
|
|
" 'prod_t_ha': variety_info['Produzione (tonnellate/ettaro)'] * np.random.uniform(0.95, 1.05),\n",
|
|
" 'oil_prod_t_ha': variety_info['Produzione Olio (tonnellate/ettaro)'] * np.random.uniform(0.95, 1.05),\n",
|
|
" 'oil_prod_l_ha': variety_info['Produzione Olio (litri/ettaro)'] * np.random.uniform(0.95, 1.05),\n",
|
|
" 'min_yield_pct': variety_info['Min % Resa'] * min_yield_factor,\n",
|
|
" 'max_yield_pct': variety_info['Max % Resa'] * max_yield_factor,\n",
|
|
" 'min_oil_prod_l_ha': variety_info['Min Produzione Olio (litri/ettaro)'] * min_yield_factor,\n",
|
|
" 'max_oil_prod_l_ha': variety_info['Max Produzione Olio (litri/ettaro)'] * max_yield_factor,\n",
|
|
" 'avg_oil_prod_l_ha': variety_info['Media Produzione Olio (litri/ettaro)'] * avg_yield_factor,\n",
|
|
" 'l_per_t': variety_info['Litri per Tonnellata'] * np.random.uniform(0.98, 1.02),\n",
|
|
" 'min_l_per_t': variety_info['Min Litri per Tonnellata'] * min_yield_factor,\n",
|
|
" 'max_l_per_t': variety_info['Max Litri per Tonnellata'] * max_yield_factor,\n",
|
|
" 'avg_l_per_t': variety_info['Media Litri per Tonnellata'] * avg_yield_factor,\n",
|
|
" 'olive_prod': annual_variety_production,\n",
|
|
" 'min_oil_prod': min_oil_production,\n",
|
|
" 'max_oil_prod': max_oil_production,\n",
|
|
" 'avg_oil_prod': avg_oil_production,\n",
|
|
" 'water_need': annual_variety_water_need\n",
|
|
" })\n",
|
|
"\n",
|
|
" # Appiattisci i dati delle varietà\n",
|
|
" flattened_variety_data = {\n",
|
|
" f'{variety}_{key}': value\n",
|
|
" for variety, data in variety_data.items()\n",
|
|
" for key, value in data.items()\n",
|
|
" }\n",
|
|
"\n",
|
|
" # Restituisci il risultato della zona\n",
|
|
" return {\n",
|
|
" 'year': year,\n",
|
|
" 'zone_id': zone + 1,\n",
|
|
" 'temp_mean': zone_weather['temp_mean'].mean(),\n",
|
|
" 'precip_sum': zone_weather['precip_sum'].sum(),\n",
|
|
" 'solar_energy_sum': zone_weather['solarenergy_sum'].sum(),\n",
|
|
" 'ha': hectares,\n",
|
|
" 'zone': f\"zone_{zone + 1}\",\n",
|
|
" 'olive_prod': annual_production,\n",
|
|
" 'min_oil_prod': annual_min_oil,\n",
|
|
" 'max_oil_prod': annual_max_oil,\n",
|
|
" 'avg_oil_prod': annual_avg_oil,\n",
|
|
" 'total_water_need': annual_water_need,\n",
|
|
" **flattened_variety_data\n",
|
|
" }\n",
|
|
"\n",
|
|
"\n",
|
|
"def simulate_olive_production_parallel(weather_data, olive_varieties, num_simulations=5, \n",
|
|
" random_seed=None, max_workers=None, batch_size=500,\n",
|
|
" output_path=\"./kaggle/working/data/simulated_data.parquet\"):\n",
|
|
" \"\"\"\n",
|
|
" Versione ottimizzata della simulazione che salva i risultati in un unico file parquet partizionato\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" weather_data : DataFrame\n",
|
|
" Dati meteorologici di input\n",
|
|
" olive_varieties : DataFrame\n",
|
|
" Dati sulle varietà di olive\n",
|
|
" num_simulations : int\n",
|
|
" Numero totale di simulazioni da eseguire\n",
|
|
" random_seed : int, optional\n",
|
|
" Seed per la riproducibilità\n",
|
|
" max_workers : int, optional\n",
|
|
" Numero massimo di workers per la parallelizzazione\n",
|
|
" batch_size : int\n",
|
|
" Dimensione di ogni batch di simulazioni\n",
|
|
" output_path : str\n",
|
|
" Percorso del file parquet di output (includerà le partizioni)\n",
|
|
" \"\"\"\n",
|
|
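"    # Esempio d'uso (ipotetico): simulate_olive_production_parallel(weather_data_complete, olive_varieties, num_simulations=1000, random_seed=42)\n",
|
|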
" import os\n",
|
|
" from math import ceil\n",
|
|
" \n",
|
|
" if random_seed is not None:\n",
|
|
" np.random.seed(random_seed)\n",
|
|
" \n",
|
|
" # Preparazione dati\n",
|
|
" create_technique_mapping(olive_varieties)\n",
|
|
" monthly_weather = preprocess_weather_data(weather_data)\n",
|
|
" all_varieties = olive_varieties['Varietà di Olive'].unique()\n",
|
|
" variety_techniques = {\n",
|
|
" variety: olive_varieties[olive_varieties['Varietà di Olive'] == variety]['Tecnica di Coltivazione'].unique()\n",
|
|
" for variety in all_varieties\n",
|
|
" }\n",
|
|
" \n",
|
|
" # Calcolo workers ottimali se non specificati\n",
|
|
" if max_workers is None:\n",
|
|
" max_workers = get_optimal_workers() or 1\n",
|
|
" print(f\"Utilizzando {max_workers} workers basati sulle risorse del sistema\")\n",
|
|
" \n",
|
|
" # Calcolo del numero di batch necessari\n",
|
|
" num_batches = ceil(num_simulations / batch_size)\n",
|
|
" print(f\"Elaborazione di {num_simulations} simulazioni in {num_batches} batch\")\n",
|
|
" \n",
|
|
" # Crea directory parent se non esiste\n",
|
|
" os.makedirs(os.path.dirname(output_path), exist_ok=True)\n",
|
|
" \n",
|
|
" for batch_num in range(num_batches):\n",
|
|
" start_sim = batch_num * batch_size\n",
|
|
" end_sim = min((batch_num + 1) * batch_size, num_simulations)\n",
|
|
" current_batch_size = end_sim - start_sim\n",
|
|
" \n",
|
|
" batch_results = []\n",
|
|
" \n",
|
|
" # Parallelizzazione usando ProcessPoolExecutor\n",
|
|
" with ProcessPoolExecutor(max_workers=max_workers) as executor:\n",
|
|
" with tqdm(total=current_batch_size * current_batch_size,\n",
|
|
" desc=f\"Batch {batch_num + 1}/{num_batches}\") as pbar:\n",
|
|
" \n",
|
|
" future_to_sim_id = {}\n",
|
|
" \n",
|
|
" # Sottometti i lavori per il batch corrente\n",
|
|
" for sim in range(start_sim, end_sim):\n",
|
|
" selected_year = np.random.choice(monthly_weather['year'].unique())\n",
|
|
" base_weather = monthly_weather[monthly_weather['year'] == selected_year].copy()\n",
|
|
" base_weather.loc[:, 'growth_phase'] = base_weather['month'].apply(get_growth_phase)\n",
|
|
" \n",
|
|
" for zone in range(current_batch_size):\n",
|
|
" future = executor.submit(\n",
|
|
" simulate_zone,\n",
|
|
" base_weather=base_weather,\n",
|
|
" olive_varieties=olive_varieties,\n",
|
|
" year=selected_year,\n",
|
|
" zone=zone,\n",
|
|
" all_varieties=all_varieties,\n",
|
|
" variety_techniques=variety_techniques\n",
|
|
" )\n",
|
|
" future_to_sim_id[future] = sim + 1\n",
|
|
" \n",
|
|
" # Raccogli i risultati del batch\n",
|
|
" for future in as_completed(future_to_sim_id.keys()):\n",
|
|
" sim_id = future_to_sim_id[future]\n",
|
|
" try:\n",
|
|
" result = future.result()\n",
|
|
" result['simulation_id'] = sim_id\n",
|
|
" result['batch_id'] = batch_num # Aggiungiamo batch_id per il partizionamento\n",
|
|
" batch_results.append(result)\n",
|
|
" pbar.update(1)\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore nella simulazione {sim_id}: {str(e)}\")\n",
|
|
" continue\n",
|
|
" \n",
|
|
" # Converti i risultati del batch in DataFrame\n",
|
|
" batch_df = pd.DataFrame(batch_results)\n",
|
|
" \n",
|
|
" # Salva il batch come partizione del file parquet\n",
|
|
" batch_df.to_parquet(\n",
|
|
" output_path,\n",
|
|
" partition_cols=['batch_id'], # Partiziona per batch_id\n",
|
|
"            # nota: DataFrame.to_parquet non accetta un parametro 'append'; con partition_cols ogni batch viene scritto nella propria partizione (batch_id) della stessa directory\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Libera memoria\n",
|
|
" del batch_results\n",
|
|
" del batch_df\n",
|
|
" \n",
|
|
" print(f\"Simulazione completata. I dati sono stati salvati in: {output_path}\")\n",
|
|
"\n",
|
|
"\n",
|
|
"# Funzione per visualizzare il mapping delle tecniche\n",
|
|
"def print_technique_mapping(mapping_path='./kaggle/working/models/technique_mapping.joblib'):\n",
|
|
" if not os.path.exists(mapping_path):\n",
|
|
" print(\"Mapping file not found.\")\n",
|
|
" return\n",
|
|
"\n",
|
|
" mapping = joblib.load(mapping_path)\n",
|
|
" print(\"Technique Mapping:\")\n",
|
|
" for technique, code in mapping.items():\n",
|
|
" print(f\"{technique}: {code}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def clean_column_names(df):\n",
|
|
" # Funzione per pulire i nomi delle colonne\n",
|
|
" new_columns = []\n",
|
|
" for col in df.columns:\n",
|
|
" # Usa regex per separare le varietà\n",
|
|
" varieties = re.findall(r'([a-z]+)_([a-z_]+)', col)\n",
|
|
" if varieties:\n",
|
|
" new_columns.append(f\"{varieties[0][0]}_{varieties[0][1]}\")\n",
|
|
" else:\n",
|
|
" new_columns.append(col)\n",
|
|
" return new_columns\n",
|
|
"\n",
|
|
"\n",
|
|
"def prepare_comparison_data(simulated_data, olive_varieties):\n",
|
|
" # Pulisci i nomi delle colonne\n",
|
|
" df = simulated_data.copy()\n",
|
|
"\n",
|
|
" df.columns = clean_column_names(df)\n",
|
|
" df = encode_techniques(df)\n",
|
|
"\n",
|
|
" all_varieties = olive_varieties['Varietà di Olive'].unique()\n",
|
|
" varieties = [clean_column_name(variety) for variety in all_varieties]\n",
|
|
" comparison_data = []\n",
|
|
"\n",
|
|
" for variety in varieties:\n",
|
|
" olive_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_olive_prod')), None)\n",
|
|
" oil_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_avg_oil_prod')), None)\n",
|
|
" tech_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_tech')), None)\n",
|
|
" water_need_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_water_need')), None)\n",
|
|
"\n",
|
|
" if olive_prod_col and oil_prod_col and tech_col and water_need_col:\n",
|
|
" variety_data = df[[olive_prod_col, oil_prod_col, tech_col, water_need_col]]\n",
|
|
" variety_data = variety_data[variety_data[tech_col] != 0] # Esclude le righe dove la tecnica è 0\n",
|
|
"\n",
|
|
" if not variety_data.empty:\n",
|
|
" avg_olive_prod = pd.to_numeric(variety_data[olive_prod_col], errors='coerce').mean()\n",
|
|
" avg_oil_prod = pd.to_numeric(variety_data[oil_prod_col], errors='coerce').mean()\n",
|
|
" avg_water_need = pd.to_numeric(variety_data[water_need_col], errors='coerce').mean()\n",
|
|
" efficiency = avg_oil_prod / avg_olive_prod if avg_olive_prod > 0 else 0\n",
|
|
" water_efficiency = avg_oil_prod / avg_water_need if avg_water_need > 0 else 0\n",
|
|
"\n",
|
|
" comparison_data.append({\n",
|
|
" 'Variety': variety,\n",
|
|
" 'Avg Olive Production (kg/ha)': avg_olive_prod,\n",
|
|
" 'Avg Oil Production (L/ha)': avg_oil_prod,\n",
|
|
" 'Avg Water Need (m³/ha)': avg_water_need,\n",
|
|
" 'Oil Efficiency (L/kg)': efficiency,\n",
|
|
" 'Water Efficiency (L oil/m³ water)': water_efficiency\n",
|
|
" })\n",
|
|
"\n",
|
|
" return pd.DataFrame(comparison_data)\n",
|
|
"\n",
|
|
"\n",
|
|
"def plot_variety_comparison(comparison_data, metric):\n",
|
|
" plt.figure(figsize=(12, 6))\n",
|
|
" bars = plt.bar(comparison_data['Variety'], comparison_data[metric])\n",
|
|
" plt.title(f'Comparison of {metric} across Olive Varieties')\n",
|
|
" plt.xlabel('Variety')\n",
|
|
" plt.ylabel(metric)\n",
|
|
" plt.xticks(rotation=45, ha='right')\n",
|
|
"\n",
|
|
" for bar in bars:\n",
|
|
" height = bar.get_height()\n",
|
|
" plt.text(bar.get_x() + bar.get_width() / 2., height,\n",
|
|
" f'{height:.2f}',\n",
|
|
" ha='center', va='bottom')\n",
|
|
"\n",
|
|
" plt.tight_layout()\n",
|
|
"    # salva la figura prima di mostrarla: con il backend inline plt.show() chiude la figura corrente\n",
|
|
"    save_plot(plt, f'variety_comparison_{metric.lower().replace(\" \", \"_\").replace(\"/\", \"_\").replace(\"(\", \"\").replace(\")\", \"\")}')\n",
|
|
"    plt.show()\n",
|
|
" plt.close()\n",
|
|
"\n",
|
|
"\n",
|
|
"def plot_efficiency_vs_production(comparison_data):\n",
|
|
" plt.figure(figsize=(10, 6))\n",
|
|
"\n",
|
|
" plt.scatter(comparison_data['Avg Olive Production (kg/ha)'],\n",
|
|
" comparison_data['Oil Efficiency (L/kg)'],\n",
|
|
" s=100)\n",
|
|
"\n",
|
|
" for i, row in comparison_data.iterrows():\n",
|
|
" plt.annotate(row['Variety'],\n",
|
|
" (row['Avg Olive Production (kg/ha)'], row['Oil Efficiency (L/kg)']),\n",
|
|
" xytext=(5, 5), textcoords='offset points')\n",
|
|
"\n",
|
|
" plt.title('Oil Efficiency vs Olive Production by Variety')\n",
|
|
" plt.xlabel('Average Olive Production (kg/ha)')\n",
|
|
" plt.ylabel('Oil Efficiency (L oil / kg olives)')\n",
|
|
" plt.tight_layout()\n",
|
|
" save_plot(plt, 'efficiency_vs_production')\n",
|
|
" plt.close()\n",
|
|
"\n",
|
|
"\n",
|
|
"def plot_water_efficiency_vs_production(comparison_data):\n",
|
|
" plt.figure(figsize=(10, 6))\n",
|
|
"\n",
|
|
" plt.scatter(comparison_data['Avg Olive Production (kg/ha)'],\n",
|
|
" comparison_data['Water Efficiency (L oil/m³ water)'],\n",
|
|
" s=100)\n",
|
|
"\n",
|
|
" for i, row in comparison_data.iterrows():\n",
|
|
" plt.annotate(row['Variety'],\n",
|
|
" (row['Avg Olive Production (kg/ha)'], row['Water Efficiency (L oil/m³ water)']),\n",
|
|
" xytext=(5, 5), textcoords='offset points')\n",
|
|
"\n",
|
|
" plt.title('Water Efficiency vs Olive Production by Variety')\n",
|
|
" plt.xlabel('Average Olive Production (kg/ha)')\n",
|
|
" plt.ylabel('Water Efficiency (L oil / m³ water)')\n",
|
|
" plt.tight_layout()\n",
|
|
"    save_plot(plt, 'water_efficiency_vs_production')\n",
|
|
"    plt.show()\n",
|
|
" plt.close()\n",
|
|
"\n",
|
|
"\n",
|
|
"def plot_water_need_vs_oil_production(comparison_data):\n",
|
|
" plt.figure(figsize=(10, 6))\n",
|
|
"\n",
|
|
" plt.scatter(comparison_data['Avg Water Need (m³/ha)'],\n",
|
|
" comparison_data['Avg Oil Production (L/ha)'],\n",
|
|
" s=100)\n",
|
|
"\n",
|
|
" for i, row in comparison_data.iterrows():\n",
|
|
" plt.annotate(row['Variety'],\n",
|
|
" (row['Avg Water Need (m³/ha)'], row['Avg Oil Production (L/ha)']),\n",
|
|
" xytext=(5, 5), textcoords='offset points')\n",
|
|
"\n",
|
|
" plt.title('Oil Production vs Water Need by Variety')\n",
|
|
" plt.xlabel('Average Water Need (m³/ha)')\n",
|
|
" plt.ylabel('Average Oil Production (L/ha)')\n",
|
|
" plt.tight_layout()\n",
|
|
"    save_plot(plt, 'water_need_vs_oil_production')\n",
|
|
"    plt.show()\n",
|
|
" plt.close()\n",
|
|
"\n",
|
|
"\n",
|
|
"def analyze_by_technique(simulated_data, olive_varieties):\n",
|
|
" # Pulisci i nomi delle colonne\n",
|
|
" df = simulated_data.copy()\n",
|
|
"\n",
|
|
" df.columns = clean_column_names(df)\n",
|
|
" df = encode_techniques(df)\n",
|
|
" all_varieties = olive_varieties['Varietà di Olive'].unique()\n",
|
|
" varieties = [clean_column_name(variety) for variety in all_varieties]\n",
|
|
"\n",
|
|
" technique_data = []\n",
|
|
"\n",
|
|
" for variety in varieties:\n",
|
|
" olive_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_olive_prod')), None)\n",
|
|
" oil_prod_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_avg_oil_prod')), None)\n",
|
|
" tech_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_tech')), None)\n",
|
|
" water_need_col = next((col for col in df.columns if col.startswith(f'{variety}_') and col.endswith('_water_need')), None)\n",
|
|
"\n",
|
|
" if olive_prod_col and oil_prod_col and tech_col and water_need_col:\n",
|
|
" variety_data = df[[olive_prod_col, oil_prod_col, tech_col, water_need_col]]\n",
|
|
" variety_data = variety_data[variety_data[tech_col] != 0]\n",
|
|
"\n",
|
|
" if not variety_data.empty:\n",
|
|
" for tech in variety_data[tech_col].unique():\n",
|
|
" tech_data = variety_data[variety_data[tech_col] == tech]\n",
|
|
"\n",
|
|
" avg_olive_prod = pd.to_numeric(tech_data[olive_prod_col], errors='coerce').mean()\n",
|
|
" avg_oil_prod = pd.to_numeric(tech_data[oil_prod_col], errors='coerce').mean()\n",
|
|
" avg_water_need = pd.to_numeric(tech_data[water_need_col], errors='coerce').mean()\n",
|
|
"\n",
|
|
" efficiency = avg_oil_prod / avg_olive_prod if avg_olive_prod > 0 else 0\n",
|
|
" water_efficiency = avg_oil_prod / avg_water_need if avg_water_need > 0 else 0\n",
|
|
"\n",
|
|
" technique_data.append({\n",
|
|
" 'Variety': variety,\n",
|
|
" 'Technique': tech,\n",
|
|
" 'Technique String': decode_single_technique(tech),\n",
|
|
" 'Avg Olive Production (kg/ha)': avg_olive_prod,\n",
|
|
" 'Avg Oil Production (L/ha)': avg_oil_prod,\n",
|
|
" 'Avg Water Need (m³/ha)': avg_water_need,\n",
|
|
" 'Oil Efficiency (L/kg)': efficiency,\n",
|
|
" 'Water Efficiency (L oil/m³ water)': water_efficiency\n",
|
|
" })\n",
|
|
"\n",
|
|
" return pd.DataFrame(technique_data)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def get_full_data(simulated_data, olive_varieties):\n",
|
|
" # Assumiamo che simulated_data contenga già tutti i dati necessari\n",
|
|
" # Includiamo solo le colonne rilevanti\n",
|
|
" relevant_columns = ['year', 'temp_mean', 'precip_sum', 'solar_energy_sum', 'ha', 'zone', 'olive_prod']\n",
|
|
"\n",
|
|
" # Aggiungiamo le colonne specifiche per varietà\n",
|
|
" all_varieties = olive_varieties['Varietà di Olive'].unique()\n",
|
|
" varieties = [clean_column_name(variety) for variety in all_varieties]\n",
|
|
" for variety in varieties:\n",
|
|
" relevant_columns.extend([f'{variety}_olive_prod', f'{variety}_tech'])\n",
|
|
"\n",
|
|
" return simulated_data[relevant_columns].copy()\n",
|
|
"\n",
|
|
"\n",
|
|
"def analyze_correlations(full_data, variety):\n",
|
|
" # Filtra i dati per la varietà specifica\n",
|
|
" variety_data = full_data[[col for col in full_data.columns if not col.startswith('_') or col.startswith(f'{variety}_')]]\n",
|
|
"\n",
|
|
" # Rinomina le colonne per chiarezza\n",
|
|
" variety_data = variety_data.rename(columns={\n",
|
|
" f'{variety}_olive_prod': 'olive_production',\n",
|
|
" f'{variety}_tech': 'technique'\n",
|
|
" })\n",
|
|
"\n",
|
|
" # Matrice di correlazione\n",
|
|
" plt.figure(figsize=(12, 10))\n",
|
|
" corr_matrix = variety_data[['temp_mean', 'precip_sum', 'solar_energy_sum', 'olive_production']].corr()\n",
|
|
" sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')\n",
|
|
" plt.title(f'Matrice di Correlazione - {variety}')\n",
|
|
" plt.tight_layout()\n",
|
|
" plt.show()\n",
|
|
" save_plot(plt, f'correlation_matrix_{variety}')\n",
|
|
" plt.close()\n",
|
|
"\n",
|
|
" # Scatter plots\n",
|
|
" fig, axes = plt.subplots(2, 2, figsize=(20, 20))\n",
|
|
" fig.suptitle(f'Relazione tra Fattori Meteorologici e Produzione di Olive - {variety}', fontsize=16)\n",
|
|
"\n",
|
|
" for ax, var in zip(axes.flat, ['temp_mean', 'precip_sum', 'solar_energy_sum', 'ha']):\n",
|
|
" sns.scatterplot(data=variety_data, x=var, y='olive_production', hue='technique', ax=ax)\n",
|
|
" ax.set_title(f'{var.capitalize()} vs Produzione Olive')\n",
|
|
" ax.set_xlabel(var.capitalize())\n",
|
|
" ax.set_ylabel('Produzione Olive (kg/ettaro)')\n",
|
|
"\n",
|
|
" plt.tight_layout()\n",
|
|
" plt.show()\n",
|
|
" save_plot(plt, f'meteorological_factors_{variety}')\n",
|
|
" plt.close()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T10:25:45.872651Z",
|
|
"start_time": "2024-10-24T10:25:45.859503Z"
|
|
},
|
|
"id": "2QXm2B51thBA"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def prepare_transformer_data(df, olive_varieties_df):\n",
|
|
" # Crea una copia del DataFrame per evitare modifiche all'originale\n",
|
|
" df = df.copy()\n",
|
|
"\n",
|
|
" # Ordina per zona e anno\n",
|
|
" df = df.sort_values(['zone', 'year'])\n",
|
|
"\n",
|
|
" # Definisci le feature\n",
|
|
" temporal_features = ['temp_mean', 'precip_sum', 'solar_energy_sum']\n",
|
|
" static_features = ['ha'] # Feature statiche base\n",
|
|
" target_features = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n",
|
|
"\n",
|
|
" # Ottieni le varietà pulite\n",
|
|
" all_varieties = olive_varieties_df['Varietà di Olive'].unique()\n",
|
|
" varieties = [clean_column_name(variety) for variety in all_varieties]\n",
|
|
"\n",
|
|
" # Crea la struttura delle feature per ogni varietà\n",
|
|
" variety_features = [\n",
|
|
" 'tech', 'pct', 'prod_t_ha', 'oil_prod_t_ha', 'oil_prod_l_ha',\n",
|
|
" 'min_yield_pct', 'max_yield_pct', 'min_oil_prod_l_ha', 'max_oil_prod_l_ha',\n",
|
|
" 'avg_oil_prod_l_ha', 'l_per_t', 'min_l_per_t', 'max_l_per_t', 'avg_l_per_t'\n",
|
|
" ]\n",
|
|
"\n",
|
|
" # Prepara dizionari per le nuove colonne\n",
|
|
" new_columns = {}\n",
|
|
"\n",
|
|
" # Prepara le feature per ogni varietà\n",
|
|
" for variety in varieties:\n",
|
|
" # Feature esistenti\n",
|
|
" for feature in variety_features:\n",
|
|
" col_name = f\"{variety}_{feature}\"\n",
|
|
" if col_name in df.columns:\n",
|
|
" if feature != 'tech': # Non includere la colonna tech direttamente\n",
|
|
" static_features.append(col_name)\n",
|
|
"\n",
|
|
" # Feature binarie per le tecniche di coltivazione\n",
|
|
" for technique in ['tradizionale', 'intensiva', 'superintensiva']:\n",
|
|
" col_name = f\"{variety}_{technique}\"\n",
|
|
" new_columns[col_name] = df[f\"{variety}_tech\"].notna() & (\n",
|
|
" df[f\"{variety}_tech\"].str.lower() == technique\n",
|
|
" ).fillna(False)\n",
|
|
" static_features.append(col_name)\n",
|
|
"\n",
|
|
" # Aggiungi tutte le nuove colonne in una volta sola\n",
|
|
" new_df = pd.concat([df] + [pd.Series(v, name=k) for k, v in new_columns.items()], axis=1)\n",
|
|
"\n",
|
|
" # Ordiniamo per zona e anno per mantenere la continuità temporale\n",
|
|
" df_sorted = new_df.sort_values(['zone', 'year'])\n",
|
|
"\n",
|
|
" # Definiamo la dimensione della finestra temporale\n",
|
|
" window_size = 41\n",
|
|
"\n",
|
|
" # Liste per raccogliere i dati\n",
|
|
" temporal_sequences = []\n",
|
|
" static_features_list = []\n",
|
|
" targets_list = []\n",
|
|
"\n",
|
|
" # Iteriamo per ogni zona\n",
|
|
" for zone in df_sorted['zone'].unique():\n",
|
|
" zone_data = df_sorted[df_sorted['zone'] == zone].reset_index(drop=True)\n",
|
|
"\n",
|
|
" if len(zone_data) >= window_size: # Verifichiamo che ci siano abbastanza dati\n",
|
|
" # Creiamo sequenze temporali scorrevoli\n",
|
|
" for i in range(len(zone_data) - window_size + 1):\n",
|
|
" # Sequenza temporale\n",
|
|
" temporal_window = zone_data.iloc[i:i + window_size][temporal_features].values\n",
|
|
" # Verifichiamo che non ci siano valori NaN\n",
|
|
" if not np.isnan(temporal_window).any():\n",
|
|
" temporal_sequences.append(temporal_window)\n",
|
|
"\n",
|
|
" # Feature statiche (prendiamo quelle dell'ultimo timestep della finestra)\n",
|
|
" static_features_list.append(zone_data.iloc[i + window_size - 1][static_features].values)\n",
|
|
"\n",
|
|
" # Target (prendiamo quelli dell'ultimo timestep della finestra)\n",
|
|
" targets_list.append(zone_data.iloc[i + window_size - 1][target_features].values)\n",
|
|
"\n",
|
|
" # Convertiamo in array numpy\n",
|
|
" X_temporal = np.array(temporal_sequences)\n",
|
|
" X_static = np.array(static_features_list)\n",
|
|
" y = np.array(targets_list)\n",
|
|
"\n",
|
|
" print(f\"Dataset completo - Temporal: {X_temporal.shape}, Static: {X_static.shape}, Target: {y.shape}\")\n",
|
|
"\n",
|
|
" # Split dei dati (usando indici casuali per una migliore distribuzione)\n",
|
|
" indices = np.random.permutation(len(X_temporal))\n",
|
|
" #train_idx = int(len(indices) * 0.7)\n",
|
|
" #val_idx = int(len(indices) * 0.85)\n",
|
|
"\n",
|
|
" train_idx = int(len(indices) * 0.65) # 65% training\n",
|
|
" val_idx = int(len(indices) * 0.85) # 20% validation\n",
|
|
" # Il resto rimane 15% test\n",
|
|
"\n",
|
|
" # Oppure versione con 25% validation:\n",
|
|
" #train_idx = int(len(indices) * 0.60) # 60% training\n",
|
|
" #val_idx = int(len(indices) * 0.85) # 25% validation\n",
|
|
"\n",
|
|
" train_indices = indices[:train_idx]\n",
|
|
" val_indices = indices[train_idx:val_idx]\n",
|
|
" test_indices = indices[val_idx:]\n",
|
|
"\n",
|
|
" # Split dei dati\n",
|
|
" X_temporal_train = X_temporal[train_indices]\n",
|
|
" X_temporal_val = X_temporal[val_indices]\n",
|
|
" X_temporal_test = X_temporal[test_indices]\n",
|
|
"\n",
|
|
" X_static_train = X_static[train_indices]\n",
|
|
" X_static_val = X_static[val_indices]\n",
|
|
" X_static_test = X_static[test_indices]\n",
|
|
"\n",
|
|
" y_train = y[train_indices]\n",
|
|
" y_val = y[val_indices]\n",
|
|
" y_test = y[test_indices]\n",
|
|
"\n",
|
|
" # Standardizzazione\n",
|
|
" scaler_temporal = StandardScaler()\n",
|
|
" scaler_static = StandardScaler()\n",
|
|
" scaler_y = StandardScaler()\n",
|
|
"\n",
|
|
" # Standardizzazione dei dati temporali\n",
|
|
" X_temporal_train = scaler_temporal.fit_transform(X_temporal_train.reshape(-1, len(temporal_features))).reshape(X_temporal_train.shape)\n",
|
|
" X_temporal_val = scaler_temporal.transform(X_temporal_val.reshape(-1, len(temporal_features))).reshape(X_temporal_val.shape)\n",
|
|
" X_temporal_test = scaler_temporal.transform(X_temporal_test.reshape(-1, len(temporal_features))).reshape(X_temporal_test.shape)\n",
|
|
"\n",
|
|
" # Standardizzazione dei dati statici\n",
|
|
" X_static_train = scaler_static.fit_transform(X_static_train)\n",
|
|
" X_static_val = scaler_static.transform(X_static_val)\n",
|
|
" X_static_test = scaler_static.transform(X_static_test)\n",
|
|
"\n",
|
|
" # Standardizzazione dei target\n",
|
|
" y_train = scaler_y.fit_transform(y_train)\n",
|
|
" y_val = scaler_y.transform(y_val)\n",
|
|
" y_test = scaler_y.transform(y_test)\n",
|
|
"\n",
|
|
" print(\"\\nShape dopo lo split e standardizzazione:\")\n",
|
|
" print(f\"Train - Temporal: {X_temporal_train.shape}, Static: {X_static_train.shape}, Target: {y_train.shape}\")\n",
|
|
" print(f\"Val - Temporal: {X_temporal_val.shape}, Static: {X_static_val.shape}, Target: {y_val.shape}\")\n",
|
|
" print(f\"Test - Temporal: {X_temporal_test.shape}, Static: {X_static_test.shape}, Target: {y_test.shape}\")\n",
|
|
"\n",
|
|
" # Prepara i dizionari di input\n",
|
|
" train_data = {'temporal': X_temporal_train, 'static': X_static_train}\n",
|
|
" val_data = {'temporal': X_temporal_val, 'static': X_static_val}\n",
|
|
" test_data = {'temporal': X_temporal_test, 'static': X_static_test}\n",
|
|
"\n",
|
|
" base_path = './kaggle/working/models/oil_transformer/'\n",
|
|
"\n",
|
|
" os.makedirs(base_path, exist_ok=True)\n",
|
|
"\n",
|
|
" joblib.dump(scaler_temporal, os.path.join(base_path, 'scaler_temporal.joblib'))\n",
|
|
" joblib.dump(scaler_static, os.path.join(base_path, 'scaler_static.joblib'))\n",
|
|
" joblib.dump(scaler_y, os.path.join(base_path, 'scaler_y.joblib'))\n",
|
|
"\n",
|
|
" return (train_data, y_train), (val_data, y_val), (test_data, y_test), (scaler_temporal, scaler_static, scaler_y)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Per denormalizzare e calcolare l'errore reale\n",
|
|
"def calculate_real_error(model, test_data, test_targets, scaler_y):\n",
|
|
" # Fare predizioni\n",
|
|
" predictions = model.predict(test_data)\n",
|
|
"\n",
|
|
" # Denormalizzare predizioni e target\n",
|
|
" predictions_real = scaler_y.inverse_transform(predictions)\n",
|
|
" targets_real = scaler_y.inverse_transform(test_targets)\n",
|
|
"\n",
|
|
" # Calcolare errore percentuale per ogni target\n",
|
|
" percentage_errors = []\n",
|
|
" absolute_errors = []\n",
|
|
"\n",
|
|
" for i in range(predictions_real.shape[1]):\n",
|
|
" mae = np.mean(np.abs(predictions_real[:, i] - targets_real[:, i]))\n",
|
|
" mape = np.mean(np.abs((predictions_real[:, i] - targets_real[:, i]) / targets_real[:, i])) * 100\n",
|
|
" percentage_errors.append(mape)\n",
|
|
" absolute_errors.append(mae)\n",
|
|
"\n",
|
|
" # Stampa risultati per ogni target\n",
|
|
" target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n",
|
|
"\n",
|
|
" print(\"\\nErrori per target:\")\n",
|
|
" print(\"-\" * 50)\n",
|
|
" for i, target in enumerate(target_names):\n",
|
|
" print(f\"{target}:\")\n",
|
|
" print(f\"MAE assoluto: {absolute_errors[i]:.2f}\")\n",
|
|
" print(f\"Errore percentuale medio: {percentage_errors[i]:.2f}%\")\n",
|
|
" print(f\"Precisione: {100 - percentage_errors[i]:.2f}%\")\n",
|
|
" print(\"-\" * 50)\n",
|
|
"\n",
|
|
" return percentage_errors, absolute_errors"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-25T21:05:45.017577Z",
|
|
"start_time": "2024-10-25T21:05:34.194467Z"
|
|
},
|
|
"id": "d_WHC4rJthA8"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"folder_path = './data/weather'\n",
|
|
"#raw_data = read_json_files(folder_path)\n",
|
|
"#weather_data = create_weather_dataset(raw_data)\n",
|
|
"#weather_data['datetime'] = pd.to_datetime(weather_data['datetime'], errors='coerce')\n",
|
|
"#weather_data['date'] = weather_data['datetime'].dt.date\n",
|
|
"#weather_data = weather_data.dropna(subset=['datetime'])\n",
|
|
"#weather_data['datetime'] = pd.to_datetime(weather_data['datetime'])\n",
|
|
"#weather_data['year'] = weather_data['datetime'].dt.year\n",
|
|
"#weather_data['month'] = weather_data['datetime'].dt.month\n",
|
|
"#weather_data['day'] = weather_data['datetime'].dt.day\n",
|
|
"#weather_data.head()\n",
|
|
"\n",
|
|
"#weather_data.to_parquet('./data/weather_data.parquet')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-26T05:43:32.169183Z",
|
|
"start_time": "2024-10-26T05:43:29.609044Z"
|
|
},
|
|
"id": "uvIOrixethA9"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"weather_data = pd.read_parquet('./kaggle/input/olive-oil/weather_data.parquet')\n",
|
|
"\n",
|
|
"features = [\n",
|
|
" 'temp', 'tempmin', 'tempmax', 'humidity', 'cloudcover', 'windspeed', 'pressure', 'visibility',\n",
|
|
" 'hour_sin', 'hour_cos', 'month_sin', 'month_cos', 'day_of_year_sin', 'day_of_year_cos',\n",
|
|
" 'temp_humidity', 'temp_cloudcover', 'visibility_cloudcover', 'clear_sky_factor', 'day_length',\n",
|
|
" 'temp_1h_lag', 'cloudcover_1h_lag', 'humidity_1h_lag', 'temp_rolling_mean_6h',\n",
|
|
" 'cloudcover_rolling_mean_6h'\n",
|
|
"] + [col for col in weather_data.columns if 'season_' in col or 'time_period_' in col]\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"start_time": "2024-10-26T05:43:33.294101Z"
|
|
},
|
|
"colab": {
|
|
"base_uri": "https://localhost:8080/",
|
|
"height": 1000
|
|
},
|
|
"id": "7qF_3gVpthA9",
|
|
"jupyter": {
|
|
"is_executing": true
|
|
},
|
|
"outputId": "0de98483-956b-45e2-f9f3-8410f79cd307"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Preparazione dati iniziale...\n",
|
|
"Shape iniziale features: (129674, 24)\n",
|
|
"\n",
|
|
"==================================================\n",
|
|
"Training modello per: solarradiation\n",
|
|
"==================================================\n",
|
|
"\n",
|
|
"Preparazione dati di training...\n",
|
|
"\n",
|
|
"Creazione modello...\n",
|
|
"Input shape: (24, 24)\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"2024-11-06 21:44:20.395277: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Model: \"SolarRadiation\"\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
" Layer (type) Output Shape Param # Connected to \n",
|
|
"==================================================================================================\n",
|
|
" main_input (InputLayer) [(None, 24, 24)] 0 [] \n",
|
|
" \n",
|
|
" conv1d (Conv1D) (None, 24, 32) 2336 ['main_input[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization (Batch (None, 24, 32) 128 ['conv1d[0][0]'] \n",
|
|
" Normalization) \n",
|
|
" \n",
|
|
" activation (Activation) (None, 24, 32) 0 ['batch_normalization[0][0]'] \n",
|
|
" \n",
|
|
" conv1d_1 (Conv1D) (None, 24, 64) 6208 ['activation[0][0]'] \n",
|
|
" \n",
|
|
" solar_params (InputLayer) [(None, 3)] 0 [] \n",
|
|
" \n",
|
|
" batch_normalization_1 (Bat (None, 24, 64) 256 ['conv1d_1[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" bidirectional (Bidirection (None, 24, 128) 45568 ['main_input[0][0]'] \n",
|
|
" al) \n",
|
|
" \n",
|
|
" dense (Dense) (None, 32) 128 ['solar_params[0][0]'] \n",
|
|
" \n",
|
|
" activation_1 (Activation) (None, 24, 64) 0 ['batch_normalization_1[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" bidirectional_1 (Bidirecti (None, 64) 41216 ['bidirectional[0][0]'] \n",
|
|
" onal) \n",
|
|
" \n",
|
|
" batch_normalization_3 (Bat (None, 32) 128 ['dense[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" global_average_pooling1d ( (None, 64) 0 ['activation_1[0][0]'] \n",
|
|
" GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" batch_normalization_2 (Bat (None, 64) 256 ['bidirectional_1[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" activation_2 (Activation) (None, 32) 0 ['batch_normalization_3[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" concatenate (Concatenate) (None, 160) 0 ['global_average_pooling1d[0][\n",
|
|
" 0]', \n",
|
|
" 'batch_normalization_2[0][0]'\n",
|
|
" , 'activation_2[0][0]'] \n",
|
|
" \n",
|
|
" dense_1 (Dense) (None, 64) 10304 ['concatenate[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_4 (Bat (None, 64) 256 ['dense_1[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" activation_3 (Activation) (None, 64) 0 ['batch_normalization_4[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" dropout (Dropout) (None, 64) 0 ['activation_3[0][0]'] \n",
|
|
" \n",
|
|
" dense_2 (Dense) (None, 32) 2080 ['dropout[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_5 (Bat (None, 32) 128 ['dense_2[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" activation_4 (Activation) (None, 32) 0 ['batch_normalization_5[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" dense_3 (Dense) (None, 1) 33 ['activation_4[0][0]'] \n",
|
|
" \n",
|
|
"==================================================================================================\n",
|
|
"Total params: 109025 (425.88 KB)\n",
|
|
"Trainable params: 108449 (423.63 KB)\n",
|
|
"Non-trainable params: 576 (2.25 KB)\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
"\n",
|
|
"Inizio training...\n",
|
|
"Epoch 1/50\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"2024-11-06 21:44:28.783921: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8905\n",
|
|
"2024-11-06 21:44:28.896066: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory\n",
|
|
"2024-11-06 21:44:31.089698: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x71e4e5b291f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n",
|
|
"2024-11-06 21:44:31.089754: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA L40S, Compute Capability 8.9\n",
|
|
"2024-11-06 21:44:31.096487: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n",
|
|
"2024-11-06 21:44:31.334699: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n"
|
|
]
|
|
},
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" 5/2836 [..............................] - ETA: 1:11 - loss: 0.9626 - mae: 1.2291 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0232s vs `on_train_batch_end` time: 0.0599s). Check your callbacks.\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0277 - mae: 0.0994\n",
|
|
"Epoch 1: val_loss improved from inf to 0.00431, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_01_0.0043.h5\n",
|
|
"2836/2836 [==============================] - 81s 24ms/step - loss: 0.0277 - mae: 0.0994 - val_loss: 0.0043 - val_mae: 0.0562 - lr: 0.0010\n",
|
|
"Epoch 2/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0047 - mae: 0.0590\n",
|
|
"Epoch 2: val_loss improved from 0.00431 to 0.00289, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_02_0.0029.h5\n",
|
|
"2836/2836 [==============================] - 67s 23ms/step - loss: 0.0047 - mae: 0.0590 - val_loss: 0.0029 - val_mae: 0.0435 - lr: 0.0010\n",
|
|
"Epoch 3/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0040 - mae: 0.0535\n",
|
|
"Epoch 3: val_loss did not improve from 0.00289\n",
|
|
"2836/2836 [==============================] - 67s 24ms/step - loss: 0.0040 - mae: 0.0534 - val_loss: 0.0035 - val_mae: 0.0478 - lr: 0.0010\n",
|
|
"Epoch 4/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0036 - mae: 0.0495\n",
|
|
"Epoch 4: val_loss improved from 0.00289 to 0.00282, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_04_0.0028.h5\n",
|
|
"2836/2836 [==============================] - 67s 24ms/step - loss: 0.0036 - mae: 0.0495 - val_loss: 0.0028 - val_mae: 0.0410 - lr: 0.0010\n",
|
|
"Epoch 5/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0034 - mae: 0.0472\n",
|
|
"Epoch 5: val_loss did not improve from 0.00282\n",
|
|
"2836/2836 [==============================] - 70s 25ms/step - loss: 0.0034 - mae: 0.0472 - val_loss: 0.0034 - val_mae: 0.0457 - lr: 0.0010\n",
|
|
"Epoch 6/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0033 - mae: 0.0460\n",
|
|
"Epoch 6: val_loss improved from 0.00282 to 0.00275, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_06_0.0028.h5\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0033 - mae: 0.0460 - val_loss: 0.0028 - val_mae: 0.0381 - lr: 0.0010\n",
|
|
"Epoch 7/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0031 - mae: 0.0450\n",
|
|
"Epoch 7: val_loss improved from 0.00275 to 0.00255, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_07_0.0026.h5\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0031 - mae: 0.0450 - val_loss: 0.0026 - val_mae: 0.0369 - lr: 0.0010\n",
|
|
"Epoch 8/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0031 - mae: 0.0444\n",
|
|
"Epoch 8: val_loss did not improve from 0.00255\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0031 - mae: 0.0444 - val_loss: 0.0037 - val_mae: 0.0442 - lr: 0.0010\n",
|
|
"Epoch 9/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0029 - mae: 0.0431\n",
|
|
"Epoch 9: val_loss did not improve from 0.00255\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0029 - mae: 0.0431 - val_loss: 0.0039 - val_mae: 0.0455 - lr: 0.0010\n",
|
|
"Epoch 10/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0028 - mae: 0.0424\n",
|
|
"Epoch 10: val_loss improved from 0.00255 to 0.00247, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_10_0.0025.h5\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0028 - mae: 0.0424 - val_loss: 0.0025 - val_mae: 0.0357 - lr: 0.0010\n",
|
|
"Epoch 11/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0028 - mae: 0.0424\n",
|
|
"Epoch 11: val_loss did not improve from 0.00247\n",
|
|
"2836/2836 [==============================] - 64s 23ms/step - loss: 0.0028 - mae: 0.0424 - val_loss: 0.0026 - val_mae: 0.0362 - lr: 0.0010\n",
|
|
"Epoch 12/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0419\n",
|
|
"Epoch 12: val_loss improved from 0.00247 to 0.00240, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_12_0.0024.h5\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0027 - mae: 0.0419 - val_loss: 0.0024 - val_mae: 0.0359 - lr: 0.0010\n",
|
|
"Epoch 13/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0410\n",
|
|
"Epoch 13: val_loss did not improve from 0.00240\n",
|
|
"2836/2836 [==============================] - 63s 22ms/step - loss: 0.0027 - mae: 0.0410 - val_loss: 0.0029 - val_mae: 0.0404 - lr: 0.0010\n",
|
|
"Epoch 14/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0027 - mae: 0.0410\n",
|
|
"Epoch 14: val_loss did not improve from 0.00240\n",
|
|
"2836/2836 [==============================] - 63s 22ms/step - loss: 0.0027 - mae: 0.0410 - val_loss: 0.0034 - val_mae: 0.0403 - lr: 0.0010\n",
|
|
"Epoch 15/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0026 - mae: 0.0406\n",
|
|
"Epoch 15: val_loss improved from 0.00240 to 0.00225, saving model to ./kaggle/working/models/solarradiation/checkpoints/best_model_15_0.0023.h5\n",
|
|
"2836/2836 [==============================] - 63s 22ms/step - loss: 0.0026 - mae: 0.0406 - val_loss: 0.0023 - val_mae: 0.0336 - lr: 0.0010\n",
|
|
"Epoch 16/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0026 - mae: 0.0402\n",
|
|
"Epoch 16: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 60s 21ms/step - loss: 0.0026 - mae: 0.0402 - val_loss: 0.0026 - val_mae: 0.0367 - lr: 0.0010\n",
|
|
"Epoch 17/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0026 - mae: 0.0401\n",
|
|
"Epoch 17: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 63s 22ms/step - loss: 0.0026 - mae: 0.0401 - val_loss: 0.0025 - val_mae: 0.0352 - lr: 0.0010\n",
|
|
"Epoch 18/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0025 - mae: 0.0397\n",
|
|
"Epoch 18: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 67s 24ms/step - loss: 0.0025 - mae: 0.0397 - val_loss: 0.0024 - val_mae: 0.0364 - lr: 0.0010\n",
|
|
"Epoch 19/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0025 - mae: 0.0393\n",
|
|
"Epoch 19: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0025 - mae: 0.0393 - val_loss: 0.0024 - val_mae: 0.0339 - lr: 0.0010\n",
|
|
"Epoch 20/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0025 - mae: 0.0393\n",
|
|
"Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n",
|
|
"\n",
|
|
"Epoch 20: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0025 - mae: 0.0393 - val_loss: 0.0024 - val_mae: 0.0347 - lr: 0.0010\n",
|
|
"Epoch 21/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0023 - mae: 0.0374\n",
|
|
"Epoch 21: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0023 - mae: 0.0374 - val_loss: 0.0027 - val_mae: 0.0366 - lr: 5.0000e-04\n",
|
|
"Epoch 22/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0022 - mae: 0.0371\n",
|
|
"Epoch 22: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0022 - mae: 0.0371 - val_loss: 0.0026 - val_mae: 0.0349 - lr: 5.0000e-04\n",
|
|
"Epoch 23/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0022 - mae: 0.0369\n",
|
|
"Epoch 23: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 67s 24ms/step - loss: 0.0022 - mae: 0.0369 - val_loss: 0.0024 - val_mae: 0.0346 - lr: 5.0000e-04\n",
|
|
"Epoch 24/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0022 - mae: 0.0368\n",
|
|
"Epoch 24: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 64s 22ms/step - loss: 0.0022 - mae: 0.0368 - val_loss: 0.0025 - val_mae: 0.0359 - lr: 5.0000e-04\n",
|
|
"Epoch 25/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0022 - mae: 0.0367\n",
|
|
"Epoch 25: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n",
|
|
"\n",
|
|
"Epoch 25: val_loss did not improve from 0.00225\n",
|
|
"2836/2836 [==============================] - 67s 24ms/step - loss: 0.0022 - mae: 0.0367 - val_loss: 0.0026 - val_mae: 0.0354 - lr: 5.0000e-04\n",
|
|
"\n",
|
|
"Generazione predizioni complete...\n",
|
|
"4052/4052 [==============================] - 25s 6ms/step\n",
|
|
"\n",
|
|
"Statistiche finali predizioni solarradiation:\n",
|
|
"- Min: 0.0000\n",
|
|
"- Max: 1042.0900\n",
|
|
"- Media: 176.6735\n",
|
|
"\n",
|
|
"==================================================\n",
|
|
"Training modello per: solarenergy\n",
|
|
"==================================================\n",
|
|
"\n",
|
|
"Aggiunta predizioni precedenti da: ['solarradiation']\n",
|
|
"\n",
|
|
"Processing predizioni di solarradiation...\n",
|
|
"Allineamento dimensioni necessario:\n",
|
|
"- Current features: (129674, 24)\n",
|
|
"- Predictions: (129650, 1)\n",
|
|
"Aggiunta padding di 24 elementi\n",
|
|
"Statistiche feature solarradiation:\n",
|
|
"- Shape: (129674, 1)\n",
|
|
"- Range: [0.0000, 1.0000]\n",
|
|
"\n",
|
|
"Verifica dimensioni prima della concatenazione:\n",
|
|
"Nuove dimensioni features: (129674, 25)\n",
|
|
"\n",
|
|
"Preparazione dati di training...\n",
|
|
"\n",
|
|
"Creazione modello...\n",
|
|
"Input shape: (24, 25)\n",
|
|
"Model: \"SolarEnergy\"\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
" Layer (type) Output Shape Param # Connected to \n",
|
|
"==================================================================================================\n",
|
|
" input_1 (InputLayer) [(None, 24, 25)] 0 [] \n",
|
|
" \n",
|
|
" conv1d_2 (Conv1D) (None, 24, 64) 4864 ['input_1[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_7 (Bat (None, 24, 64) 256 ['conv1d_2[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" activation_6 (Activation) (None, 24, 64) 0 ['batch_normalization_7[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" multi_head_attention (Mult (None, 24, 25) 26393 ['input_1[0][0]', \n",
|
|
" iHeadAttention) 'input_1[0][0]'] \n",
|
|
" \n",
|
|
" conv1d_3 (Conv1D) (None, 24, 32) 6176 ['activation_6[0][0]'] \n",
|
|
" \n",
|
|
" lstm_2 (LSTM) (None, 24, 64) 23040 ['input_1[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_6 (Bat (None, 24, 25) 100 ['multi_head_attention[0][0]']\n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" batch_normalization_8 (Bat (None, 24, 32) 128 ['conv1d_3[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" lstm_3 (LSTM) (None, 32) 12416 ['lstm_2[0][0]'] \n",
|
|
" \n",
|
|
" activation_5 (Activation) (None, 24, 25) 0 ['batch_normalization_6[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" activation_7 (Activation) (None, 24, 32) 0 ['batch_normalization_8[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" batch_normalization_9 (Bat (None, 32) 128 ['lstm_3[0][0]'] \n",
|
|
" chNormalization) \n",
|
|
" \n",
|
|
" global_average_pooling1d_1 (None, 25) 0 ['activation_5[0][0]'] \n",
|
|
" (GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" global_average_pooling1d_2 (None, 32) 0 ['activation_7[0][0]'] \n",
|
|
" (GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" activation_8 (Activation) (None, 32) 0 ['batch_normalization_9[0][0]'\n",
|
|
" ] \n",
|
|
" \n",
|
|
" concatenate_1 (Concatenate (None, 89) 0 ['global_average_pooling1d_1[0\n",
|
|
" ) ][0]', \n",
|
|
" 'global_average_pooling1d_2[0\n",
|
|
" ][0]', \n",
|
|
" 'activation_8[0][0]'] \n",
|
|
" \n",
|
|
" dense_4 (Dense) (None, 128) 11520 ['concatenate_1[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_10 (Ba (None, 128) 512 ['dense_4[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" activation_9 (Activation) (None, 128) 0 ['batch_normalization_10[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" dropout_1 (Dropout) (None, 128) 0 ['activation_9[0][0]'] \n",
|
|
" \n",
|
|
" dense_5 (Dense) (None, 64) 8256 ['dropout_1[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_11 (Ba (None, 64) 256 ['dense_5[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" activation_10 (Activation) (None, 64) 0 ['batch_normalization_11[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" dropout_2 (Dropout) (None, 64) 0 ['activation_10[0][0]'] \n",
|
|
" \n",
|
|
" dense_6 (Dense) (None, 1) 65 ['dropout_2[0][0]'] \n",
|
|
" \n",
|
|
"==================================================================================================\n",
|
|
"Total params: 94110 (367.62 KB)\n",
|
|
"Trainable params: 93420 (364.92 KB)\n",
|
|
"Non-trainable params: 690 (2.70 KB)\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
"\n",
|
|
"Inizio training...\n",
|
|
"Epoch 1/50\n",
|
|
" 4/2836 [..............................] - ETA: 1:01 - loss: 2.3626 - mae: 1.3694 WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0201s vs `on_train_batch_end` time: 0.0205s). Check your callbacks.\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0692 - mae: 0.1162\n",
|
|
"Epoch 1: val_loss improved from inf to 0.00600, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_01_0.0060.h5\n",
|
|
"2836/2836 [==============================] - 73s 22ms/step - loss: 0.0692 - mae: 0.1162 - val_loss: 0.0060 - val_mae: 0.0636 - lr: 0.0010\n",
|
|
"Epoch 2/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0057 - mae: 0.0630\n",
|
|
"Epoch 2: val_loss improved from 0.00600 to 0.00485, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_02_0.0048.h5\n",
|
|
"2836/2836 [==============================] - 62s 22ms/step - loss: 0.0057 - mae: 0.0630 - val_loss: 0.0048 - val_mae: 0.0610 - lr: 0.0010\n",
|
|
"Epoch 3/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0052 - mae: 0.0591\n",
|
|
"Epoch 3: val_loss improved from 0.00485 to 0.00360, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_03_0.0036.h5\n",
|
|
"2836/2836 [==============================] - 61s 22ms/step - loss: 0.0052 - mae: 0.0591 - val_loss: 0.0036 - val_mae: 0.0480 - lr: 0.0010\n",
|
|
"Epoch 4/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0049 - mae: 0.0557\n",
|
|
"Epoch 4: val_loss improved from 0.00360 to 0.00291, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_04_0.0029.h5\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0049 - mae: 0.0557 - val_loss: 0.0029 - val_mae: 0.0413 - lr: 0.0010\n",
|
|
"Epoch 5/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0048 - mae: 0.0548\n",
|
|
"Epoch 5: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 61s 22ms/step - loss: 0.0048 - mae: 0.0549 - val_loss: 0.0087 - val_mae: 0.0886 - lr: 0.0010\n",
|
|
"Epoch 6/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0048 - mae: 0.0545\n",
|
|
"Epoch 6: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 62s 22ms/step - loss: 0.0048 - mae: 0.0545 - val_loss: 0.0208 - val_mae: 0.1540 - lr: 0.0010\n",
|
|
"Epoch 7/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0046 - mae: 0.0535\n",
|
|
"Epoch 7: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 61s 22ms/step - loss: 0.0046 - mae: 0.0535 - val_loss: 0.0035 - val_mae: 0.0472 - lr: 0.0010\n",
|
|
"Epoch 8/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0045 - mae: 0.0529\n",
|
|
"Epoch 8: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 62s 22ms/step - loss: 0.0045 - mae: 0.0529 - val_loss: 0.0112 - val_mae: 0.1042 - lr: 0.0010\n",
|
|
"Epoch 9/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0046 - mae: 0.0529\n",
|
|
"Epoch 9: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n",
|
|
"\n",
|
|
"Epoch 9: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 61s 22ms/step - loss: 0.0045 - mae: 0.0529 - val_loss: 0.0035 - val_mae: 0.0498 - lr: 0.0010\n",
|
|
"Epoch 10/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0043 - mae: 0.0521\n",
|
|
"Epoch 10: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 66s 23ms/step - loss: 0.0043 - mae: 0.0521 - val_loss: 0.0115 - val_mae: 0.1027 - lr: 5.0000e-04\n",
|
|
"Epoch 11/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0044 - mae: 0.0528\n",
|
|
"Epoch 11: val_loss did not improve from 0.00291\n",
|
|
"2836/2836 [==============================] - 62s 22ms/step - loss: 0.0044 - mae: 0.0528 - val_loss: 0.0069 - val_mae: 0.0714 - lr: 5.0000e-04\n",
|
|
"Epoch 12/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0518\n",
|
|
"Epoch 12: val_loss improved from 0.00291 to 0.00289, saving model to ./kaggle/working/models/solarenergy/checkpoints/best_model_12_0.0029.h5\n",
|
|
"2836/2836 [==============================] - 65s 23ms/step - loss: 0.0042 - mae: 0.0518 - val_loss: 0.0029 - val_mae: 0.0386 - lr: 5.0000e-04\n",
|
|
"Epoch 13/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0516\n",
|
|
"Epoch 13: val_loss did not improve from 0.00289\n",
|
|
"2836/2836 [==============================] - 58s 20ms/step - loss: 0.0042 - mae: 0.0516 - val_loss: 0.0072 - val_mae: 0.0754 - lr: 5.0000e-04\n",
|
|
"Epoch 14/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0042 - mae: 0.0511\n",
|
|
"Epoch 14: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n",
|
|
"\n",
|
|
"Epoch 14: val_loss did not improve from 0.00289\n",
|
|
"2836/2836 [==============================] - 62s 22ms/step - loss: 0.0042 - mae: 0.0511 - val_loss: 0.0117 - val_mae: 0.1028 - lr: 5.0000e-04\n",
|
|
"\n",
|
|
"Generazione predizioni complete...\n",
|
|
"4052/4052 [==============================] - 18s 4ms/step\n",
|
|
"\n",
|
|
"Statistiche finali predizioni solarenergy:\n",
|
|
"- Min: 0.0380\n",
|
|
"- Max: 3.3664\n",
|
|
"- Media: 0.6877\n",
|
|
"\n",
|
|
"==================================================\n",
|
|
"Training modello per: uvindex\n",
|
|
"==================================================\n",
|
|
"\n",
|
|
"Aggiunta predizioni precedenti da: ['solarradiation', 'solarenergy']\n",
|
|
"\n",
|
|
"Processing predizioni di solarradiation...\n",
|
|
"Allineamento dimensioni necessario:\n",
|
|
"- Current features: (129674, 25)\n",
|
|
"- Predictions: (129650, 1)\n",
|
|
"Aggiunta padding di 24 elementi\n",
|
|
"Statistiche feature solarradiation:\n",
|
|
"- Shape: (129674, 1)\n",
|
|
"- Range: [0.0000, 1.0000]\n",
|
|
"\n",
|
|
"Processing predizioni di solarenergy...\n",
|
|
"Allineamento dimensioni necessario:\n",
|
|
"- Current features: (129674, 25)\n",
|
|
"- Predictions: (129650, 1)\n",
|
|
"Aggiunta padding di 24 elementi\n",
|
|
"Statistiche feature solarenergy:\n",
|
|
"- Shape: (129674, 1)\n",
|
|
"- Range: [0.0000, 1.0000]\n",
|
|
"\n",
|
|
"Verifica dimensioni prima della concatenazione:\n",
|
|
"Nuove dimensioni features: (129674, 27)\n",
|
|
"\n",
|
|
"Preparazione dati di training...\n",
|
|
"\n",
|
|
"Creazione modello...\n",
|
|
"Input shape: (24, 27)\n",
|
|
"Model: \"SolarUV\"\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
" Layer (type) Output Shape Param # Connected to \n",
|
|
"==================================================================================================\n",
|
|
" input_2 (InputLayer) [(None, 24, 27)] 0 [] \n",
|
|
" \n",
|
|
" conv1d_4 (Conv1D) (None, 24, 64) 5248 ['input_2[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_12 (Ba (None, 24, 64) 256 ['conv1d_4[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" activation_11 (Activation) (None, 24, 64) 0 ['batch_normalization_12[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" max_pooling1d (MaxPooling1 (None, 12, 64) 0 ['activation_11[0][0]'] \n",
|
|
" D) \n",
|
|
" \n",
|
|
" conv1d_5 (Conv1D) (None, 12, 32) 6176 ['max_pooling1d[0][0]'] \n",
|
|
" \n",
|
|
" multi_head_attention_1 (Mu (None, 24, 27) 14235 ['input_2[0][0]', \n",
|
|
" ltiHeadAttention) 'input_2[0][0]'] \n",
|
|
" \n",
|
|
" global_average_pooling1d_5 (None, 27) 0 ['input_2[0][0]'] \n",
|
|
" (GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" batch_normalization_13 (Ba (None, 12, 32) 128 ['conv1d_5[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" batch_normalization_14 (Ba (None, 24, 27) 108 ['multi_head_attention_1[0][0]\n",
|
|
" tchNormalization) '] \n",
|
|
" \n",
|
|
" dense_7 (Dense) (None, 64) 1792 ['global_average_pooling1d_5[0\n",
|
|
" ][0]'] \n",
|
|
" \n",
|
|
" activation_12 (Activation) (None, 12, 32) 0 ['batch_normalization_13[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" activation_13 (Activation) (None, 24, 27) 0 ['batch_normalization_14[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" batch_normalization_15 (Ba (None, 64) 256 ['dense_7[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" global_average_pooling1d_3 (None, 32) 0 ['activation_12[0][0]'] \n",
|
|
" (GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" global_average_pooling1d_4 (None, 27) 0 ['activation_13[0][0]'] \n",
|
|
" (GlobalAveragePooling1D) \n",
|
|
" \n",
|
|
" activation_14 (Activation) (None, 64) 0 ['batch_normalization_15[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" concatenate_2 (Concatenate (None, 123) 0 ['global_average_pooling1d_3[0\n",
|
|
" ) ][0]', \n",
|
|
" 'global_average_pooling1d_4[0\n",
|
|
" ][0]', \n",
|
|
" 'activation_14[0][0]'] \n",
|
|
" \n",
|
|
" dense_8 (Dense) (None, 128) 15872 ['concatenate_2[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_16 (Ba (None, 128) 512 ['dense_8[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" activation_15 (Activation) (None, 128) 0 ['batch_normalization_16[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" dropout_3 (Dropout) (None, 128) 0 ['activation_15[0][0]'] \n",
|
|
" \n",
|
|
" dense_9 (Dense) (None, 64) 8256 ['dropout_3[0][0]'] \n",
|
|
" \n",
|
|
" batch_normalization_17 (Ba (None, 64) 256 ['dense_9[0][0]'] \n",
|
|
" tchNormalization) \n",
|
|
" \n",
|
|
" activation_16 (Activation) (None, 64) 0 ['batch_normalization_17[0][0]\n",
|
|
" '] \n",
|
|
" \n",
|
|
" dropout_4 (Dropout) (None, 64) 0 ['activation_16[0][0]'] \n",
|
|
" \n",
|
|
" dense_10 (Dense) (None, 1) 65 ['dropout_4[0][0]'] \n",
|
|
" \n",
|
|
"==================================================================================================\n",
|
|
"Total params: 53160 (207.66 KB)\n",
|
|
"Trainable params: 52402 (204.70 KB)\n",
|
|
"Non-trainable params: 758 (2.96 KB)\n",
|
|
"__________________________________________________________________________________________________\n",
|
|
"\n",
|
|
"Inizio training...\n",
|
|
"Epoch 1/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0938 - mae: 0.1971\n",
|
|
"Epoch 1: val_loss improved from inf to 0.02909, saving model to ./kaggle/working/models/uvindex/checkpoints/best_model_01_0.0291.h5\n",
|
|
"2836/2836 [==============================] - 60s 18ms/step - loss: 0.0938 - mae: 0.1971 - val_loss: 0.0291 - val_mae: 0.1910 - lr: 0.0010\n",
|
|
"Epoch 2/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0098 - mae: 0.0821\n",
|
|
"Epoch 2: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 51s 18ms/step - loss: 0.0098 - mae: 0.0821 - val_loss: 0.3685 - val_mae: 0.8389 - lr: 0.0010\n",
|
|
"Epoch 3/50\n",
|
|
"2833/2836 [============================>.] - ETA: 0s - loss: 0.0079 - mae: 0.0714\n",
|
|
"Epoch 3: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 53s 19ms/step - loss: 0.0079 - mae: 0.0714 - val_loss: 0.8313 - val_mae: 1.3285 - lr: 0.0010\n",
|
|
"Epoch 4/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0077 - mae: 0.0704\n",
|
|
"Epoch 4: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 55s 19ms/step - loss: 0.0077 - mae: 0.0704 - val_loss: 0.2950 - val_mae: 0.7527 - lr: 0.0010\n",
|
|
"Epoch 5/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0076 - mae: 0.0698\n",
|
|
"Epoch 5: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 55s 20ms/step - loss: 0.0076 - mae: 0.0698 - val_loss: 2.0383 - val_mae: 2.5369 - lr: 0.0010\n",
|
|
"Epoch 6/50\n",
|
|
"2836/2836 [==============================] - ETA: 0s - loss: 0.0075 - mae: 0.0699\n",
|
|
"Epoch 6: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.\n",
|
|
"\n",
|
|
"Epoch 6: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 54s 19ms/step - loss: 0.0075 - mae: 0.0699 - val_loss: 0.3982 - val_mae: 0.8782 - lr: 0.0010\n",
|
|
"Epoch 7/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0067 - mae: 0.0668\n",
|
|
"Epoch 7: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 53s 19ms/step - loss: 0.0067 - mae: 0.0668 - val_loss: 0.1131 - val_mae: 0.4568 - lr: 5.0000e-04\n",
|
|
"Epoch 8/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0066 - mae: 0.0662\n",
|
|
"Epoch 8: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 54s 19ms/step - loss: 0.0066 - mae: 0.0662 - val_loss: 1.1239 - val_mae: 1.6230 - lr: 5.0000e-04\n",
|
|
"Epoch 9/50\n",
|
|
"2834/2836 [============================>.] - ETA: 0s - loss: 0.0065 - mae: 0.0658\n",
|
|
"Epoch 9: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 53s 19ms/step - loss: 0.0065 - mae: 0.0658 - val_loss: 0.4153 - val_mae: 0.8974 - lr: 5.0000e-04\n",
|
|
"Epoch 10/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0064 - mae: 0.0651\n",
|
|
"Epoch 10: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 53s 19ms/step - loss: 0.0064 - mae: 0.0651 - val_loss: 0.0937 - val_mae: 0.4185 - lr: 5.0000e-04\n",
|
|
"Epoch 11/50\n",
|
|
"2835/2836 [============================>.] - ETA: 0s - loss: 0.0063 - mae: 0.0648\n",
|
|
"Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.\n",
|
|
"\n",
|
|
"Epoch 11: val_loss did not improve from 0.02909\n",
|
|
"2836/2836 [==============================] - 53s 19ms/step - loss: 0.0063 - mae: 0.0648 - val_loss: 0.7356 - val_mae: 1.2348 - lr: 5.0000e-04\n",
|
|
"\n",
|
|
"Generazione predizioni complete...\n",
|
|
"4052/4052 [==============================] - 11s 3ms/step\n",
|
|
"\n",
|
|
"Statistiche finali predizioni uvindex:\n",
|
|
"- Min: 0.2790\n",
|
|
"- Max: 20.2327\n",
|
|
"- Media: 3.2535\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"models, histories, scalers = train_solar_models(weather_data, features)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"metadata": {
|
|
"id": "ixAzWupmthA-",
|
|
"outputId": "ee180137-1c9f-4eb1-8866-db1e1b1cb58c"
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\n",
|
|
"Salvataggio scaler:\n",
|
|
"- Salvato scaler: X\n",
|
|
"- Salvato scaler: solarradiation\n",
|
|
"- Salvato scaler: solarenergy\n",
|
|
"- Salvato scaler: uvindex\n",
|
|
"- Salvato scaler: solarradiation_pred\n",
|
|
"- Salvato scaler: solarenergy_pred\n"
|
|
]
|
|
},
|
|
{
|
|
"ename": "TypeError",
|
|
"evalue": "cannot pickle 'dict_keys' object",
|
|
"output_type": "error",
|
|
"traceback": [
|
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
|
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
|
|
"Cell \u001b[0;32mIn[20], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m target_variables \u001b[38;5;241m=\u001b[39m [\u001b[38;5;124m'\u001b[39m\u001b[38;5;124msolarradiation\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124msolarenergy\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124muvindex\u001b[39m\u001b[38;5;124m'\u001b[39m]\n\u001b[1;32m 3\u001b[0m \u001b[38;5;66;03m# Salva tutto direttamente\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m \u001b[43msave_models_and_scalers\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 5\u001b[0m \u001b[43m \u001b[49m\u001b[43mmodels\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmodels\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[43mscalers\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mscalers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# Passiamo direttamente il dizionario degli scalers così com'è\u001b[39;49;00m\n\u001b[1;32m 7\u001b[0m \u001b[43m \u001b[49m\u001b[43mtarget_variables\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mtarget_variables\u001b[49m\n\u001b[1;32m 8\u001b[0m \u001b[43m)\u001b[49m\n",
|
|
"Cell \u001b[0;32mIn[8], line 30\u001b[0m, in \u001b[0;36msave_models_and_scalers\u001b[0;34m(models, scalers, target_variables, base_path)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m scaler_name, scaler \u001b[38;5;129;01min\u001b[39;00m scalers\u001b[38;5;241m.\u001b[39mitems():\n\u001b[1;32m 29\u001b[0m scaler_file \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mjoin(scaler_path, \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mscaler_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.joblib\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m---> 30\u001b[0m \u001b[43mjoblib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdump\u001b[49m\u001b[43m(\u001b[49m\u001b[43mscaler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mscaler_file\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 31\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m- Salvato scaler: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mscaler_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 33\u001b[0m \u001b[38;5;66;03m# Salva la configurazione dei modelli\u001b[39;00m\n",
|
|
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:553\u001b[0m, in \u001b[0;36mdump\u001b[0;34m(value, filename, compress, protocol, cache_size)\u001b[0m\n\u001b[1;32m 551\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m is_filename:\n\u001b[1;32m 552\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(filename, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mwb\u001b[39m\u001b[38;5;124m'\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[0;32m--> 553\u001b[0m \u001b[43mNumpyPickler\u001b[49m\u001b[43m(\u001b[49m\u001b[43mf\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mprotocol\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mprotocol\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdump\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvalue\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 554\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 555\u001b[0m NumpyPickler(filename, protocol\u001b[38;5;241m=\u001b[39mprotocol)\u001b[38;5;241m.\u001b[39mdump(value)\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:487\u001b[0m, in \u001b[0;36m_Pickler.dump\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 485\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mproto \u001b[38;5;241m>\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;241m4\u001b[39m:\n\u001b[1;32m 486\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mframer\u001b[38;5;241m.\u001b[39mstart_framing()\n\u001b[0;32m--> 487\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 488\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(STOP)\n\u001b[1;32m 489\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mframer\u001b[38;5;241m.\u001b[39mend_framing()\n",
|
|
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:560\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 558\u001b[0m f \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdispatch\u001b[38;5;241m.\u001b[39mget(t)\n\u001b[1;32m 559\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m f \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 560\u001b[0m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Call unbound method with explicit self\u001b[39;00m\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 563\u001b[0m \u001b[38;5;66;03m# Check private dispatch table if any, or else\u001b[39;00m\n\u001b[1;32m 564\u001b[0m \u001b[38;5;66;03m# copyreg.dispatch_table\u001b[39;00m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:972\u001b[0m, in \u001b[0;36m_Pickler.save_dict\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(MARK \u001b[38;5;241m+\u001b[39m DICT)\n\u001b[1;32m 971\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmemoize(obj)\n\u001b[0;32m--> 972\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_batch_setitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:998\u001b[0m, in \u001b[0;36m_Pickler._batch_setitems\u001b[0;34m(self, items)\u001b[0m\n\u001b[1;32m 996\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m tmp:\n\u001b[1;32m 997\u001b[0m save(k)\n\u001b[0;32m--> 998\u001b[0m \u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mv\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 999\u001b[0m write(SETITEMS)\n\u001b[1;32m 1000\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m n:\n",
|
|
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:560\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 558\u001b[0m f \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mdispatch\u001b[38;5;241m.\u001b[39mget(t)\n\u001b[1;32m 559\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m f \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 560\u001b[0m \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# Call unbound method with explicit self\u001b[39;00m\n\u001b[1;32m 561\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[1;32m 563\u001b[0m \u001b[38;5;66;03m# Check private dispatch table if any, or else\u001b[39;00m\n\u001b[1;32m 564\u001b[0m \u001b[38;5;66;03m# copyreg.dispatch_table\u001b[39;00m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:972\u001b[0m, in \u001b[0;36m_Pickler.save_dict\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 969\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mwrite(MARK \u001b[38;5;241m+\u001b[39m DICT)\n\u001b[1;32m 971\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmemoize(obj)\n\u001b[0;32m--> 972\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_batch_setitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43mobj\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mitems\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:998\u001b[0m, in \u001b[0;36m_Pickler._batch_setitems\u001b[0;34m(self, items)\u001b[0m\n\u001b[1;32m 996\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m tmp:\n\u001b[1;32m 997\u001b[0m save(k)\n\u001b[0;32m--> 998\u001b[0m \u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[43mv\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 999\u001b[0m write(SETITEMS)\n\u001b[1;32m 1000\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m n:\n",
|
|
"File \u001b[0;32m/usr/local/lib/python3.11/dist-packages/joblib/numpy_pickle.py:355\u001b[0m, in \u001b[0;36mNumpyPickler.save\u001b[0;34m(self, obj)\u001b[0m\n\u001b[1;32m 352\u001b[0m wrapper\u001b[38;5;241m.\u001b[39mwrite_array(obj, \u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 353\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m\n\u001b[0;32m--> 355\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mPickler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msave\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mobj\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m/usr/lib/python3.11/pickle.py:578\u001b[0m, in \u001b[0;36m_Pickler.save\u001b[0;34m(self, obj, save_persistent_id)\u001b[0m\n\u001b[1;32m 576\u001b[0m reduce \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(obj, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__reduce_ex__\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n\u001b[1;32m 577\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m reduce \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 578\u001b[0m rv \u001b[38;5;241m=\u001b[39m reduce(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mproto)\n\u001b[1;32m 579\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 580\u001b[0m reduce \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mgetattr\u001b[39m(obj, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__reduce__\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n",
|
|
"\u001b[0;31mTypeError\u001b[0m: cannot pickle 'dict_keys' object"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"target_variables = ['solarradiation', 'solarenergy', 'uvindex']\n",
|
|
"\n",
|
|
"# Salva tutto direttamente\n",
|
|
"save_models_and_scalers(\n",
|
|
" models=models,\n",
|
|
" scalers=scalers, # Passiamo direttamente il dizionario degli scalers così com'è\n",
|
|
" target_variables=target_variables\n",
|
|
")"
|
|
]
|
|
},
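  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above fails with `TypeError: cannot pickle 'dict_keys' object`: the traceback shows joblib stopping on a dict view stored somewhere inside the objects being saved. The cell below is a minimal workaround sketch, assuming the saving is ultimately done via joblib/pickle (as the traceback indicates): it converts any `dict_keys`/`dict_values` views to plain lists before dumping. The helper name and the output path are illustrative, not part of the original pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import joblib\n",
    "\n",
    "def to_picklable(obj):\n",
    "    # Recursively replace dict views (dict_keys / dict_values) with plain lists,\n",
    "    # leaving every other object untouched.\n",
    "    if isinstance(obj, (type({}.keys()), type({}.values()))):\n",
    "        return list(obj)\n",
    "    if isinstance(obj, dict):\n",
    "        return {k: to_picklable(v) for k, v in obj.items()}\n",
    "    return obj\n",
    "\n",
    "# Illustrative only: sanitize the scaler dictionary before dumping it.\n",
    "# joblib.dump(to_picklable(scalers), './kaggle/working/models/scalers.joblib')"
   ]
  },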
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T16:14:44.770508Z",
|
|
"start_time": "2024-10-24T13:29:15.181470Z"
|
|
},
|
|
"id": "BlQK-7y7thA-"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"data_after_2010 = weather_data[weather_data['year'] >= 2010].copy()\n",
|
|
"data_before_2010 = weather_data[weather_data['year'] < 2010].copy()\n",
|
|
"# Previsione delle variabili mancanti per data_before_2010\n",
|
|
"# Prepara data_before_2010\n",
|
|
"data_before_2010 = data_before_2010.sort_values('datetime')\n",
|
|
"data_before_2010.set_index('datetime', inplace=True)\n",
|
|
"\n",
|
|
"data_after_2010 = data_after_2010.sort_values('datetime')\n",
|
|
"data_after_2010.set_index('datetime', inplace=True)\n",
|
|
"\n",
|
|
"# Assicurati che le features non abbiano valori mancanti\n",
|
|
"data_before_2010[features] = data_before_2010[features].ffill()\n",
|
|
"data_before_2010[features] = data_before_2010[features].bfill()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T18:50:48.087413Z",
|
|
"start_time": "2024-10-24T18:47:52.511763Z"
|
|
},
|
|
"id": "r_hFmenDthA-",
|
|
"outputId": "650f8755-f6f6-47b4-fc74-c194dd81bf64"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#models, scaler_X, scalers_y, target_variables = load_models_and_scalers()\n",
|
|
"\n",
|
|
"# Effettua predizioni\n",
|
|
"predictions = predict_solar_variables(\n",
|
|
" data_before_2010=data_before_2010,\n",
|
|
" features=features,\n",
|
|
" models=models,\n",
|
|
" scalers=scalers, # dizionario completo degli scalers\n",
|
|
" target_variables=target_variables\n",
|
|
")\n",
|
|
"\n",
|
|
"# Crea dataset completo\n",
|
|
"weather_data_complete = create_complete_dataset(\n",
|
|
" data_before_2010,\n",
|
|
" data_after_2010,\n",
|
|
" predictions\n",
|
|
")\n",
|
|
"\n",
|
|
"# Salva il risultato\n",
|
|
"weather_data_complete.reset_index(inplace=True)\n",
|
|
"weather_data_complete.to_parquet(\n",
|
|
" './kaggle/working/data/weather_data_complete.parquet',\n",
|
|
" index=False\n",
|
|
")"
|
|
]
|
|
},
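  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optional sanity check (a small sketch, not part of the original pipeline): after merging the predicted solar variables with the post-2010 records, the reconstructed columns should contain no missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quick check: the reconstructed solar columns should have no missing values.\n",
    "for col in ['solarradiation', 'solarenergy', 'uvindex']:\n",
    "    print(f'{col}: {weather_data_complete[col].isna().sum()} missing values')"
   ]
  },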
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "IKObKOVEthA-"
|
|
},
|
|
"source": [
|
|
"## 2. Esplorazione dei Dati Meteo"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-23T06:10:46.688323Z",
|
|
"start_time": "2024-10-23T06:10:46.586185Z"
|
|
},
|
|
"id": "Z64O5RD9thA-"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"weather_data = pd.read_parquet('./kaggle/working/data/weather_data_complete.parquet')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-23T06:10:50.718574Z",
|
|
"start_time": "2024-10-23T06:10:46.901554Z"
|
|
},
|
|
"id": "f3j3IUvothA-",
|
|
"outputId": "a7f38468-f2f4-491e-eda5-ba6e6b8064ee"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Visualizzazione delle tendenze temporali\n",
|
|
"fig, axes = plt.subplots(6, 1, figsize=(15, 20))\n",
|
|
"weather_data.set_index('date')['temp'].plot(ax=axes[0], title='Temperatura Media Giornaliera')\n",
|
|
"weather_data.set_index('date')['humidity'].plot(ax=axes[1], title='Umidità Media Giornaliera')\n",
|
|
"weather_data.set_index('date')['solarradiation'].plot(ax=axes[2], title='Radiazione Solare Giornaliera')\n",
|
|
"weather_data.set_index('date')['solarenergy'].plot(ax=axes[3], title='Radiazione Solare Giornaliera')\n",
|
|
"weather_data.set_index('date')['uvindex'].plot(ax=axes[4], title='Precipitazioni Giornaliere')\n",
|
|
"weather_data.set_index('date')['precip'].plot(ax=axes[4], title='Precipitazioni Giornaliere')\n",
|
|
"plt.tight_layout()\n",
|
|
"plt.show()\n",
|
|
"save_plot(plt, 'weather_trends')\n",
|
|
"plt.close()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "DHcEwp3pthA_"
|
|
},
|
|
"source": [
|
|
"## 3. Simulazione dei Dati di Produzione Annuale"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-23T06:10:51.081621Z",
|
|
"start_time": "2024-10-23T06:10:51.044080Z"
|
|
},
|
|
"id": "5oG_nhbMthA_"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"olive_varieties = pd.read_csv('./kaggle/input/olive-oil/variety_olive_oil_production.csv')\n",
|
|
"\n",
|
|
"olive_varieties = add_olive_water_consumption_correlation(olive_varieties)\n",
|
|
"\n",
|
|
"olive_varieties.to_parquet(\"./kaggle/working/data/olive_varieties.parquet\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T10:59:32.356335Z",
|
|
"start_time": "2024-10-24T10:59:32.229812Z"
|
|
},
|
|
"id": "Y2IH37lAthA_",
|
|
"outputId": "d14e77c8-a4fb-4328-f6c6-de788bca8188"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"olive_varieties = pd.read_parquet(\"./kaggle/working/data/olive_varieties.parquet\")\n",
|
|
"\n",
|
|
"weather_data = pd.read_parquet('./kaggle/working/data/weather_data_complete.parquet')\n",
|
|
"\n",
|
|
"simulated_data = simulate_olive_production_parallel(weather_data, olive_varieties, 1000, random_state_value)"
|
|
]
|
|
},
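  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Later cells read the simulation back from `./kaggle/working/data/simulated_data.parquet`. If `simulate_olive_production_parallel` (defined elsewhere in the project) does not persist its result itself, it needs to be saved explicitly; a minimal sketch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Persist the simulation so the following cells can reload it from disk.\n",
    "simulated_data.to_parquet('./kaggle/working/data/simulated_data.parquet', index=False)"
   ]
  },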
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Visualizza il mapping delle tecniche\n",
|
|
"print_technique_mapping()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-23T06:10:54.639402Z",
|
|
"start_time": "2024-10-23T06:10:52.895228Z"
|
|
},
|
|
"id": "4izJmAsbthA_",
|
|
"outputId": "9f871e9b-c9b5-406d-f482-b925befd9dad"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"simulated_data = pd.read_parquet(\"./kaggle/working/data/simulated_data.parquet\")\n",
|
|
"\n",
|
|
"# Esecuzione dell'analisi\n",
|
|
"comparison_data = prepare_comparison_data(simulated_data, olive_varieties)\n",
|
|
"\n",
|
|
"# Genera i grafici\n",
|
|
"plot_variety_comparison(comparison_data, 'Avg Olive Production (kg/ha)')\n",
|
|
"plot_variety_comparison(comparison_data, 'Avg Oil Production (L/ha)')\n",
|
|
"plot_variety_comparison(comparison_data, 'Avg Water Need (m³/ha)')\n",
|
|
"plot_variety_comparison(comparison_data, 'Oil Efficiency (L/kg)')\n",
|
|
"plot_variety_comparison(comparison_data, 'Water Efficiency (L oil/m³ water)')\n",
|
|
"plot_efficiency_vs_production(comparison_data)\n",
|
|
"plot_water_efficiency_vs_production(comparison_data)\n",
|
|
"plot_water_need_vs_oil_production(comparison_data)\n",
|
|
"\n",
|
|
"# Analisi per tecnica\n",
|
|
"technique_data = analyze_by_technique(simulated_data, olive_varieties)\n",
|
|
"\n",
|
|
"print(technique_data)\n",
|
|
"\n",
|
|
"# Stampa un sommario statistico\n",
|
|
"print(\"Comparison by Variety:\")\n",
|
|
"print(comparison_data.set_index('Variety'))\n",
|
|
"print(\"\\nBest Varieties by Water Efficiency:\")\n",
|
|
"print(comparison_data.sort_values('Water Efficiency (L oil/m³ water)', ascending=False).head())"
|
|
]
|
|
},
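  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check of the efficiency columns used above (a sketch based only on the column names, assuming `Avg Oil Production (L/ha)` and `Avg Water Need (m³/ha)` are per-hectare averages), water efficiency should be roughly the oil produced per cubic metre of water:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Recompute water efficiency from the per-hectare averages and compare it\n",
    "# with the precomputed column (values should be close if the definitions match).\n",
    "check = comparison_data['Avg Oil Production (L/ha)'] / comparison_data['Avg Water Need (m³/ha)']\n",
    "print(pd.DataFrame({\n",
    "    'Variety': comparison_data['Variety'],\n",
    "    'Water Efficiency (reported)': comparison_data['Water Efficiency (L oil/m³ water)'],\n",
    "    'Oil per m³ (recomputed)': check\n",
    "}))"
   ]
  },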
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "dwhl4ID_thBA"
|
|
},
|
|
"source": [
|
|
"## 4. Analisi della Relazione tra Meteo e Produzione"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-23T06:10:55.903873Z",
|
|
"start_time": "2024-10-23T06:10:54.655058Z"
|
|
},
|
|
"id": "b28MG3NGthBA",
|
|
"outputId": "ac0759ce-ee6e-49e0-9ddd-a70d01ea18ff"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Uso delle funzioni\n",
|
|
"full_data = get_full_data(simulated_data, olive_varieties)\n",
|
|
"\n",
|
|
"# Assumiamo che 'selected_variety' sia definito altrove nel codice\n",
|
|
"# Per esempio:\n",
|
|
"selected_variety = 'nocellara_delletna'\n",
|
|
"\n",
|
|
"analyze_correlations(full_data, selected_variety)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "OZQ6hHFLthBA"
|
|
},
|
|
"source": [
|
|
"## 5. Preparazione del Modello di Machine Learning"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "smX8MBhithBA"
|
|
},
|
|
"source": [
|
|
"## Divisione train/validation/test:\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T10:25:49.473595Z",
|
|
"start_time": "2024-10-24T10:25:49.199833Z"
|
|
},
|
|
"id": "tupaX2LNthBA",
|
|
"outputId": "0a7968cd-9fef-4873-b834-d6b13fe805be"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"simulated_data = pd.read_parquet(\"./kaggle/working/data/simulated_data.parquet\")\n",
|
|
"olive_varieties = pd.read_parquet(\"./kaggle/working/data/olive_varieties.parquet\")\n",
|
|
"\n",
|
|
"(train_data, train_targets), (val_data, val_targets), (test_data, test_targets), scalers = prepare_transformer_data(simulated_data, olive_varieties)\n",
|
|
"\n",
|
|
"scaler_temporal, scaler_static, scaler_y = scalers\n",
|
|
"\n",
|
|
"print(\"Temporal data shape:\", train_data['temporal'].shape)\n",
|
|
"print(\"Static data shape:\", train_data['static'].shape)\n",
|
|
"print(\"Target shape:\", train_targets.shape)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "kE7oohfsthBB"
|
|
},
|
|
"source": [
|
|
"## OliveOilTransformer"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T09:32:37.506903Z",
|
|
"start_time": "2024-10-24T09:32:36.905756Z"
|
|
},
|
|
"id": "_l868dFFthBB",
|
|
"outputId": "b67993d4-a49e-4b75-d346-bf7f362f932d"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"@keras.saving.register_keras_serializable()\n",
|
|
"class DataAugmentation(tf.keras.layers.Layer):\n",
|
|
" \"\"\"Custom layer per l'augmentation dei dati\"\"\"\n",
|
|
" def __init__(self, noise_stddev=0.03, **kwargs):\n",
|
|
" super().__init__(**kwargs)\n",
|
|
" self.noise_stddev = noise_stddev\n",
|
|
"\n",
|
|
" def call(self, inputs, training=None):\n",
|
|
" if training:\n",
|
|
" return inputs + tf.random.normal(\n",
|
|
" shape=tf.shape(inputs), \n",
|
|
" mean=0.0, \n",
|
|
" stddev=self.noise_stddev\n",
|
|
" )\n",
|
|
" return inputs\n",
|
|
"\n",
|
|
" def get_config(self):\n",
|
|
" config = super().get_config()\n",
|
|
" config.update({\"noise_stddev\": self.noise_stddev})\n",
|
|
" return config\n",
|
|
"\n",
|
|
"@keras.saving.register_keras_serializable()\n",
|
|
"class PositionalEncoding(tf.keras.layers.Layer):\n",
|
|
" \"\"\"Custom layer per l'encoding posizionale\"\"\"\n",
|
|
" def __init__(self, d_model, **kwargs):\n",
|
|
" super().__init__(**kwargs)\n",
|
|
" self.d_model = d_model\n",
|
|
" \n",
|
|
" def build(self, input_shape):\n",
|
|
" _, seq_length, _ = input_shape\n",
|
|
" \n",
|
|
" # Crea la matrice di encoding posizionale\n",
|
|
" position = tf.range(seq_length, dtype=tf.float32)[:, tf.newaxis]\n",
|
|
" div_term = tf.exp(\n",
|
|
" tf.range(0, self.d_model, 2, dtype=tf.float32) * \n",
|
|
" (-tf.math.log(10000.0) / self.d_model)\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Calcola sin e cos\n",
|
|
" pos_encoding = tf.zeros((1, seq_length, self.d_model))\n",
|
|
" pos_encoding_even = tf.sin(position * div_term)\n",
|
|
" pos_encoding_odd = tf.cos(position * div_term)\n",
|
|
" \n",
|
|
" # Assegna i valori alle posizioni pari e dispari\n",
|
|
" pos_encoding = tf.concat(\n",
|
|
" [tf.expand_dims(pos_encoding_even, -1), \n",
|
|
" tf.expand_dims(pos_encoding_odd, -1)], \n",
|
|
" axis=-1\n",
|
|
" )\n",
|
|
" pos_encoding = tf.reshape(pos_encoding, (1, seq_length, -1))\n",
|
|
" pos_encoding = pos_encoding[:, :, :self.d_model]\n",
|
|
" \n",
|
|
" # Salva l'encoding come peso non trainabile\n",
|
|
" self.pos_encoding = self.add_weight(\n",
|
|
" shape=(1, seq_length, self.d_model),\n",
|
|
" initializer=tf.keras.initializers.Constant(pos_encoding),\n",
|
|
" trainable=False,\n",
|
|
" name='positional_encoding'\n",
|
|
" )\n",
|
|
" \n",
|
|
" super().build(input_shape)\n",
|
|
"\n",
|
|
" def call(self, inputs):\n",
|
|
" # Broadcast l'encoding posizionale sul batch\n",
|
|
" batch_size = tf.shape(inputs)[0]\n",
|
|
" pos_encoding_tiled = tf.tile(self.pos_encoding, [batch_size, 1, 1])\n",
|
|
" return inputs + pos_encoding_tiled\n",
|
|
"\n",
|
|
" def get_config(self):\n",
|
|
" config = super().get_config()\n",
|
|
" config.update({\"d_model\": self.d_model})\n",
|
|
" return config\n",
|
|
"\n",
|
|
"@keras.saving.register_keras_serializable()\n",
|
|
"class WarmUpLearningRateSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):\n",
|
|
" \"\"\"Custom learning rate schedule with linear warmup and exponential decay.\"\"\"\n",
|
|
" \n",
|
|
" def __init__(self, initial_learning_rate=1e-3, warmup_steps=500, decay_steps=5000):\n",
|
|
" super().__init__()\n",
|
|
" self.initial_learning_rate = initial_learning_rate\n",
|
|
" self.warmup_steps = warmup_steps\n",
|
|
" self.decay_steps = decay_steps\n",
|
|
"\n",
|
|
" def __call__(self, step):\n",
|
|
" warmup_pct = tf.cast(step, tf.float32) / self.warmup_steps\n",
|
|
" warmup_lr = self.initial_learning_rate * warmup_pct\n",
|
|
" decay_factor = tf.pow(0.1, tf.cast(step, tf.float32) / self.decay_steps)\n",
|
|
" decayed_lr = self.initial_learning_rate * decay_factor\n",
|
|
" return tf.where(step < self.warmup_steps, warmup_lr, decayed_lr)\n",
|
|
"\n",
|
|
" def get_config(self):\n",
|
|
" return {\n",
|
|
" 'initial_learning_rate': self.initial_learning_rate,\n",
|
|
" 'warmup_steps': self.warmup_steps,\n",
|
|
" 'decay_steps': self.decay_steps\n",
|
|
" }\n",
|
|
"\n",
|
|
"def create_olive_oil_transformer(temporal_shape, static_shape, num_outputs,\n",
|
|
" d_model=128, num_heads=8, ff_dim=256,\n",
|
|
" num_transformer_blocks=4, mlp_units=[256, 128, 64],\n",
|
|
" dropout=0.2):\n",
|
|
" \"\"\"\n",
|
|
" Crea un transformer per la predizione della produzione di olio d'oliva.\n",
|
|
" \"\"\"\n",
|
|
" # Input layers\n",
|
|
" temporal_input = tf.keras.layers.Input(shape=temporal_shape, name='temporal')\n",
|
|
" static_input = tf.keras.layers.Input(shape=static_shape, name='static')\n",
|
|
"\n",
|
|
" # === TEMPORAL PATH ===\n",
|
|
" x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(temporal_input)\n",
|
|
" x = DataAugmentation()(x)\n",
|
|
"\n",
|
|
" # Temporal projection\n",
|
|
" x = tf.keras.layers.Dense(\n",
|
|
" d_model // 2,\n",
|
|
" activation='gelu',\n",
|
|
" kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n",
|
|
" )(x)\n",
|
|
" x = tf.keras.layers.Dropout(dropout)(x)\n",
|
|
" x = tf.keras.layers.Dense(\n",
|
|
" d_model,\n",
|
|
" activation='gelu',\n",
|
|
" kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n",
|
|
" )(x)\n",
|
|
"\n",
|
|
" # Positional encoding\n",
|
|
" x = PositionalEncoding(d_model)(x)\n",
|
|
"\n",
|
|
" # Transformer blocks\n",
|
|
" skip_connection = x\n",
|
|
" for _ in range(num_transformer_blocks):\n",
|
|
" # Self-attention\n",
|
|
" attention_output = tf.keras.layers.MultiHeadAttention(\n",
|
|
" num_heads=num_heads,\n",
|
|
" key_dim=d_model // num_heads,\n",
|
|
" value_dim=d_model // num_heads\n",
|
|
" )(x, x)\n",
|
|
" attention_output = tf.keras.layers.Dropout(dropout)(attention_output)\n",
|
|
"\n",
|
|
" # Residual connection con pesi addestrabili\n",
|
|
" residual_weights = tf.keras.layers.Dense(d_model, activation='sigmoid')(x)\n",
|
|
" x = tf.keras.layers.Add()([x, residual_weights * attention_output])\n",
|
|
" x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)\n",
|
|
"\n",
|
|
" # Feed-forward network\n",
|
|
" ffn = tf.keras.layers.Dense(ff_dim, activation=\"gelu\")(x)\n",
|
|
" ffn = tf.keras.layers.Dropout(dropout)(ffn)\n",
|
|
" ffn = tf.keras.layers.Dense(d_model)(ffn)\n",
|
|
" ffn = tf.keras.layers.Dropout(dropout)(ffn)\n",
|
|
"\n",
|
|
" # Second residual connection\n",
|
|
" x = tf.keras.layers.Add()([x, ffn])\n",
|
|
" x = tf.keras.layers.LayerNormalization(epsilon=1e-6)(x)\n",
|
|
"\n",
|
|
" # Add final skip connection\n",
|
|
" x = tf.keras.layers.Add()([x, skip_connection])\n",
|
|
"\n",
|
|
" # Temporal pooling\n",
|
|
" attention_pooled = tf.keras.layers.MultiHeadAttention(\n",
|
|
" num_heads=num_heads,\n",
|
|
" key_dim=d_model // 4\n",
|
|
" )(x, x)\n",
|
|
" attention_pooled = tf.keras.layers.GlobalAveragePooling1D()(attention_pooled)\n",
|
|
"\n",
|
|
" # Additional pooling operations\n",
|
|
" avg_pooled = tf.keras.layers.GlobalAveragePooling1D()(x)\n",
|
|
" max_pooled = tf.keras.layers.GlobalMaxPooling1D()(x)\n",
|
|
"\n",
|
|
" # Combine pooling results\n",
|
|
" temporal_features = tf.keras.layers.Concatenate()(\n",
|
|
" [attention_pooled, avg_pooled, max_pooled]\n",
|
|
" )\n",
|
|
"\n",
|
|
" # === STATIC PATH ===\n",
|
|
" static_features = tf.keras.layers.LayerNormalization(epsilon=1e-6)(static_input)\n",
|
|
" for units in [256, 128, 64]:\n",
|
|
" static_features = tf.keras.layers.Dense(\n",
|
|
" units,\n",
|
|
" activation='gelu',\n",
|
|
" kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n",
|
|
" )(static_features)\n",
|
|
" static_features = tf.keras.layers.Dropout(dropout)(static_features)\n",
|
|
"\n",
|
|
" # === FEATURE FUSION ===\n",
|
|
" combined = tf.keras.layers.Concatenate()([temporal_features, static_features])\n",
|
|
"\n",
|
|
" # === MLP HEAD ===\n",
|
|
" x = combined\n",
|
|
" for units in mlp_units:\n",
|
|
" x = tf.keras.layers.BatchNormalization()(x)\n",
|
|
" x = tf.keras.layers.Dense(\n",
|
|
" units,\n",
|
|
" activation=\"gelu\",\n",
|
|
" kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n",
|
|
" )(x)\n",
|
|
" x = tf.keras.layers.Dropout(dropout)(x)\n",
|
|
"\n",
|
|
" # Output layer\n",
|
|
" outputs = tf.keras.layers.Dense(\n",
|
|
" num_outputs,\n",
|
|
" activation='linear',\n",
|
|
" kernel_regularizer=tf.keras.regularizers.l2(1e-5)\n",
|
|
" )(x)\n",
|
|
"\n",
|
|
" # Create model\n",
|
|
" model = tf.keras.Model(\n",
|
|
" inputs={'temporal': temporal_input, 'static': static_input},\n",
|
|
" outputs=outputs,\n",
|
|
" name='OilTransformer'\n",
|
|
" )\n",
|
|
" \n",
|
|
" return model\n",
|
|
"\n",
|
|
"\n",
|
|
"def create_transformer_callbacks(target_names, val_data, val_targets):\n",
|
|
" \"\"\"\n",
|
|
" Crea i callbacks per il training del modello.\n",
|
|
" \n",
|
|
" Parameters:\n",
|
|
" -----------\n",
|
|
" target_names : list\n",
|
|
" Lista dei nomi dei target per il monitoraggio specifico\n",
|
|
" val_data : dict\n",
|
|
" Dati di validazione\n",
|
|
" val_targets : array\n",
|
|
" Target di validazione\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" --------\n",
|
|
" list\n",
|
|
" Lista dei callbacks configurati\n",
|
|
" \"\"\"\n",
|
|
"\n",
|
|
" # Custom Metric per target specifici\n",
|
|
" class TargetSpecificMetric(tf.keras.callbacks.Callback):\n",
|
|
" def __init__(self, validation_data, target_names):\n",
|
|
" super().__init__()\n",
|
|
" self.validation_data = validation_data\n",
|
|
" self.target_names = target_names\n",
|
|
"\n",
|
|
" def on_epoch_end(self, epoch, logs={}):\n",
|
|
" x_val, y_val = self.validation_data\n",
|
|
" y_pred = self.model.predict(x_val, verbose=0)\n",
|
|
"\n",
|
|
" for i, name in enumerate(self.target_names):\n",
|
|
" mae = np.mean(np.abs(y_val[:, i] - y_pred[:, i]))\n",
|
|
" logs[f'val_{name}_mae'] = mae\n",
|
|
"\n",
|
|
" # Crea le cartelle per i checkpoint e i log se non esistono\n",
|
|
" os.makedirs('./kaggle/working/models/oil_transformer/checkpoints', exist_ok=True)\n",
|
|
" os.makedirs('./kaggle/working/models/oil_transformer/logs', exist_ok=True)\n",
|
|
"\n",
|
|
" callbacks = [\n",
|
|
" # Early Stopping\n",
|
|
" tf.keras.callbacks.EarlyStopping(\n",
|
|
" monitor='val_loss',\n",
|
|
" patience=20,\n",
|
|
" restore_best_weights=True,\n",
|
|
" min_delta=0.0005,\n",
|
|
" mode='min'\n",
|
|
" ),\n",
|
|
"\n",
|
|
" # Model Checkpoint\n",
|
|
" tf.keras.callbacks.ModelCheckpoint(\n",
|
|
" filepath='./kaggle/working/models/oil_transformer/checkpoints/model_{epoch:02d}_{val_loss:.4f}.h5',\n",
|
|
" monitor='val_loss',\n",
|
|
" save_best_only=True,\n",
|
|
" mode='min',\n",
|
|
" save_weights_only=True\n",
|
|
" ),\n",
|
|
"\n",
|
|
" # Metric per target specifici\n",
|
|
" TargetSpecificMetric(\n",
|
|
" validation_data=(val_data, val_targets),\n",
|
|
" target_names=target_names\n",
|
|
" ),\n",
|
|
"\n",
|
|
" # Reduce LR on Plateau\n",
|
|
" tf.keras.callbacks.ReduceLROnPlateau(\n",
|
|
" monitor='val_loss',\n",
|
|
" factor=0.5,\n",
|
|
" patience=10,\n",
|
|
" min_lr=1e-6,\n",
|
|
" verbose=1\n",
|
|
" ),\n",
|
|
"\n",
|
|
" # TensorBoard logging\n",
|
|
" tf.keras.callbacks.TensorBoard(\n",
|
|
" log_dir='./kaggle/working/models/oil_transformer/logs',\n",
|
|
" histogram_freq=1,\n",
|
|
" write_graph=True,\n",
|
|
" update_freq='epoch'\n",
|
|
" )\n",
|
|
" ]\n",
|
|
"\n",
|
|
" return callbacks\n",
|
|
"\n",
|
|
"def compile_model(model, learning_rate=1e-3):\n",
|
|
" \"\"\"\n",
|
|
" Compila il modello con le impostazioni standard.\n",
|
|
" \"\"\"\n",
|
|
" lr_schedule = WarmUpLearningRateSchedule(\n",
|
|
" initial_learning_rate=learning_rate,\n",
|
|
" warmup_steps=500,\n",
|
|
" decay_steps=5000\n",
|
|
" )\n",
|
|
" \n",
|
|
" model.compile(\n",
|
|
" optimizer=tf.keras.optimizers.AdamW(\n",
|
|
" learning_rate=lr_schedule,\n",
|
|
" weight_decay=0.01\n",
|
|
" ),\n",
|
|
" loss=tf.keras.losses.Huber(),\n",
|
|
" metrics=['mae']\n",
|
|
" )\n",
|
|
"\n",
|
|
" return model\n",
|
|
"\n",
|
|
"\n",
|
|
"def setup_transformer_training(train_data, train_targets, val_data, val_targets):\n",
|
|
" \"\"\"\n",
|
|
" Configura e prepara il transformer con dimensioni dinamiche basate sui dati.\n",
|
|
" \"\"\"\n",
|
|
" # Estrai le shape dai dati\n",
|
|
" temporal_shape = (train_data['temporal'].shape[1], train_data['temporal'].shape[2])\n",
|
|
" static_shape = (train_data['static'].shape[1],)\n",
|
|
" num_outputs = train_targets.shape[1]\n",
|
|
"\n",
|
|
" print(f\"Shape rilevate:\")\n",
|
|
" print(f\"- Temporal shape: {temporal_shape}\")\n",
|
|
" print(f\"- Static shape: {static_shape}\")\n",
|
|
" print(f\"- Numero di output: {num_outputs}\")\n",
|
|
"\n",
|
|
" # Target names basati sul numero di output\n",
|
|
" target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n",
|
|
"\n",
|
|
" # Assicurati che il numero di target names corrisponda al numero di output\n",
|
|
" assert len(target_names) == num_outputs, \\\n",
|
|
" f\"Il numero di target names ({len(target_names)}) non corrisponde al numero di output ({num_outputs})\"\n",
|
|
"\n",
|
|
" # Crea il modello con le dimensioni rilevate\n",
|
|
" model = create_olive_oil_transformer(\n",
|
|
" temporal_shape=temporal_shape,\n",
|
|
" static_shape=static_shape,\n",
|
|
" num_outputs=num_outputs\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Compila il modello\n",
|
|
" model = compile_model(model)\n",
|
|
"\n",
|
|
" # Crea i callbacks\n",
|
|
" callbacks = create_transformer_callbacks(target_names, val_data, val_targets)\n",
|
|
"\n",
|
|
" return model, callbacks, target_names\n",
|
|
"\n",
|
|
"def train_transformer(train_data, train_targets, val_data, val_targets, epochs=150, batch_size=64, save_name='final_model'):\n",
|
|
" \"\"\"\n",
|
|
" Funzione principale per l'addestramento del transformer.\n",
|
|
" \"\"\"\n",
|
|
" # Setup del modello\n",
|
|
" model, callbacks, target_names = setup_transformer_training(\n",
|
|
" train_data, train_targets, val_data, val_targets\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Mostra il summary del modello\n",
|
|
" model.summary()\n",
|
|
" os.makedirs(f\"./kaggle/working/models/oil_transformer/\", exist_ok=True)\n",
|
|
" keras.utils.plot_model(model, f\"./kaggle/working/models/oil_transformer/{save_name}.png\", show_shapes=True)\n",
|
|
"\n",
|
|
" # Training\n",
|
|
" history = model.fit(\n",
|
|
" x=train_data,\n",
|
|
" y=train_targets,\n",
|
|
" validation_data=(val_data, val_targets),\n",
|
|
" epochs=epochs,\n",
|
|
" batch_size=batch_size,\n",
|
|
" callbacks=callbacks,\n",
|
|
" verbose=1,\n",
|
|
" shuffle=True\n",
|
|
" )\n",
|
|
"\n",
|
|
" # Salva il modello finale\n",
|
|
" save_path = f'./kaggle/working/models/oil_transformer/{save_name}.keras'\n",
|
|
" model.save(save_path, save_format='keras')\n",
|
|
" \n",
|
|
" os.makedirs(f'./kaggle/working/models/oil_transformer/weights/', exist_ok=True)\n",
|
|
" model.save_weights(f'./kaggle/working/models/oil_transformer/weights')\n",
|
|
" print(f\"\\nModello salvato in: {save_path}\")\n",
|
|
"\n",
|
|
" return model, history"
|
|
]
|
|
},
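  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the `PositionalEncoding` layer above implements the standard sinusoidal encoding\n",
    "\n",
    "$$PE_{(pos,\\,2i)} = \\sin\\left(\\frac{pos}{10000^{2i/d_{model}}}\\right), \\qquad PE_{(pos,\\,2i+1)} = \\cos\\left(\\frac{pos}{10000^{2i/d_{model}}}\\right),$$\n",
    "\n",
    "while `WarmUpLearningRateSchedule` raises the learning rate linearly for the first `warmup_steps` steps and then decays it as $lr(t) = lr_0 \\cdot 0.1^{t/D}$, where $D$ is `decay_steps`. The cell below is an optional sketch (not part of the original training code) that evaluates the schedule so the warmup and decay behaviour can be inspected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: visualise the custom learning-rate schedule defined above.\n",
    "lr_schedule = WarmUpLearningRateSchedule(initial_learning_rate=1e-3,\n",
    "                                         warmup_steps=500, decay_steps=5000)\n",
    "steps = list(range(0, 6000, 10))\n",
    "lrs = [float(lr_schedule(tf.constant(s, dtype=tf.float32))) for s in steps]\n",
    "\n",
    "plt.figure(figsize=(8, 3))\n",
    "plt.plot(steps, lrs)\n",
    "plt.xlabel('step')\n",
    "plt.ylabel('learning rate')\n",
    "plt.title('Warmup + exponential decay')\n",
    "plt.show()"
   ]
  },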
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "aytSjU1UthBB"
|
|
},
|
|
"source": [
|
|
"## Model Training"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2024-10-24T09:33:43.625381Z",
|
|
"start_time": "2024-10-24T09:33:34.088970Z"
|
|
},
|
|
"id": "xE3iTWonthBB",
|
|
"outputId": "a784254e-deea-4fd3-8578-6a0dbbd45bd7"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"model, history = train_transformer(train_data, train_targets, val_data, val_targets)"
|
|
]
|
|
},
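  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick look at the training curves (an optional sketch that uses the `history` object returned above and the standard `loss`/`val_loss` keys recorded by `model.fit`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot training vs. validation loss from the History object.\n",
    "plt.figure(figsize=(10, 4))\n",
    "plt.plot(history.history['loss'], label='train loss')\n",
    "plt.plot(history.history['val_loss'], label='validation loss')\n",
    "plt.xlabel('epoch')\n",
    "plt.ylabel('Huber loss')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },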
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"id": "hPPbvFYmthBB",
|
|
"outputId": "e6570501-00e1-4dde-81e2-4712652a46b3"
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Calcola gli errori reali\n",
|
|
"percentage_errors, absolute_errors = calculate_real_error(model, val_data, val_targets, scaler_y)"
|
|
]
|
|
},
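  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`calculate_real_error` is defined elsewhere in the project; the cell below is only an illustrative sketch of the general idea of recovering errors on the original scale, assuming `scaler_y` exposes the usual scikit-learn `inverse_transform` interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative only: bring predictions and targets back to the original scale\n",
    "# and compute per-target MAE / MAPE.\n",
    "y_pred = scaler_y.inverse_transform(model.predict(val_data, verbose=0))\n",
    "y_true = scaler_y.inverse_transform(val_targets)\n",
    "\n",
    "mae_per_target = np.mean(np.abs(y_true - y_pred), axis=0)\n",
    "mape_per_target = np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + 1e-7), axis=0) * 100\n",
    "print('MAE per target:', mae_per_target)\n",
    "print('MAPE per target (%):', mape_per_target)"
   ]
  },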
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"def evaluate_model_performance(model, data, targets, set_name=\"\"):\n",
|
|
" \"\"\"\n",
|
|
" Valuta le performance del modello su un set di dati specifico.\n",
|
|
" \"\"\"\n",
|
|
" predictions = model.predict(data, verbose=0)\n",
|
|
" \n",
|
|
" target_names = ['olive_prod', 'min_oil_prod', 'max_oil_prod', 'avg_oil_prod', 'total_water_need']\n",
|
|
" metrics = {}\n",
|
|
" \n",
|
|
" for i, name in enumerate(target_names):\n",
|
|
" mae = np.mean(np.abs(targets[:, i] - predictions[:, i]))\n",
|
|
" mse = np.mean(np.square(targets[:, i] - predictions[:, i]))\n",
|
|
" rmse = np.sqrt(mse)\n",
|
|
" mape = np.mean(np.abs((targets[:, i] - predictions[:, i]) / (targets[:, i] + 1e-7))) * 100\n",
|
|
" \n",
|
|
" metrics[f\"{name}_mae\"] = mae\n",
|
|
" metrics[f\"{name}_rmse\"] = rmse\n",
|
|
" metrics[f\"{name}_mape\"] = mape\n",
|
|
" \n",
|
|
" if set_name:\n",
|
|
" print(f\"\\nPerformance sul set {set_name}:\")\n",
|
|
" for metric, value in metrics.items():\n",
|
|
" print(f\"{metric}: {value:.4f}\")\n",
|
|
" \n",
|
|
" return metrics\n",
|
|
"\n",
|
|
"def retrain_model(base_model, train_data, train_targets, \n",
|
|
" val_data, val_targets, \n",
|
|
" test_data, test_targets,\n",
|
|
" epochs=50, batch_size=128):\n",
|
|
" \"\"\"\n",
|
|
" Implementa il retraining del modello con i dati combinati.\n",
|
|
" \"\"\"\n",
|
|
" print(\"Valutazione performance iniziali del modello...\")\n",
|
|
" initial_metrics = {\n",
|
|
" 'train': evaluate_model_performance(base_model, train_data, train_targets, \"training\"),\n",
|
|
" 'val': evaluate_model_performance(base_model, val_data, val_targets, \"validazione\"),\n",
|
|
" 'test': evaluate_model_performance(base_model, test_data, test_targets, \"test\")\n",
|
|
" }\n",
|
|
" \n",
|
|
" # Combina i dati per il retraining\n",
|
|
" combined_data = {\n",
|
|
" 'temporal': np.concatenate([train_data['temporal'], val_data['temporal'], test_data['temporal']]),\n",
|
|
" 'static': np.concatenate([train_data['static'], val_data['static'], test_data['static']])\n",
|
|
" }\n",
|
|
" combined_targets = np.concatenate([train_targets, val_targets, test_targets])\n",
|
|
" \n",
|
|
" # Crea una nuova suddivisione per la validazione\n",
|
|
" indices = np.arange(len(combined_targets))\n",
|
|
" np.random.shuffle(indices)\n",
|
|
" \n",
|
|
" split_idx = int(len(indices) * 0.9)\n",
|
|
" train_idx, val_idx = indices[:split_idx], indices[split_idx:]\n",
|
|
" \n",
|
|
" # Prepara i dati per il retraining\n",
|
|
" retrain_data = {k: v[train_idx] for k, v in combined_data.items()}\n",
|
|
" retrain_targets = combined_targets[train_idx]\n",
|
|
" retrain_val_data = {k: v[val_idx] for k, v in combined_data.items()}\n",
|
|
" retrain_val_targets = combined_targets[val_idx]\n",
|
|
" \n",
|
|
" checkpoint_path = './kaggle/working/models/oil_transformer/retrain_checkpoints'\n",
|
|
" os.makedirs(checkpoint_path, exist_ok=True)\n",
|
|
" \n",
|
|
" # Configura callbacks\n",
|
|
" callbacks = [\n",
|
|
" tf.keras.callbacks.EarlyStopping(\n",
|
|
" monitor='val_loss',\n",
|
|
" patience=10,\n",
|
|
" restore_best_weights=True,\n",
|
|
" min_delta=0.0001\n",
|
|
" ),\n",
|
|
" tf.keras.callbacks.ReduceLROnPlateau(\n",
|
|
" monitor='val_loss',\n",
|
|
" factor=0.2,\n",
|
|
" patience=5,\n",
|
|
" min_lr=1e-6,\n",
|
|
" verbose=1\n",
|
|
" ),\n",
|
|
" tf.keras.callbacks.ModelCheckpoint(\n",
|
|
" filepath=os.path.join(checkpoint_path, 'model_{epoch:02d}_{val_loss:.4f}.keras'),\n",
|
|
" monitor='val_loss',\n",
|
|
" save_best_only=True,\n",
|
|
" mode='min',\n",
|
|
" save_weights_only=True\n",
|
|
" )\n",
|
|
" ]\n",
|
|
" \n",
|
|
" # Imposta learning rate per il fine-tuning\n",
|
|
" optimizer = tf.keras.optimizers.AdamW(\n",
|
|
" learning_rate=tf.keras.optimizers.schedules.ExponentialDecay(\n",
|
|
" initial_learning_rate=1e-4,\n",
|
|
" decay_steps=1000,\n",
|
|
" decay_rate=0.9\n",
|
|
" ),\n",
|
|
" weight_decay=0.01\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Ricompila il modello con il nuovo optimizer\n",
|
|
" base_model.compile(\n",
|
|
" optimizer=optimizer,\n",
|
|
" loss=tf.keras.losses.Huber(),\n",
|
|
" metrics=['mae']\n",
|
|
" )\n",
|
|
" \n",
|
|
" print(\"\\nAvvio retraining...\")\n",
|
|
" history = base_model.fit(\n",
|
|
" retrain_data,\n",
|
|
" retrain_targets,\n",
|
|
" validation_data=(retrain_val_data, retrain_val_targets),\n",
|
|
" epochs=epochs,\n",
|
|
" batch_size=batch_size,\n",
|
|
" callbacks=callbacks,\n",
|
|
" verbose=1\n",
|
|
" )\n",
|
|
" \n",
|
|
" print(\"\\nValutazione performance finali...\")\n",
|
|
" final_metrics = {\n",
|
|
" 'train': evaluate_model_performance(base_model, train_data, train_targets, \"training\"),\n",
|
|
" 'val': evaluate_model_performance(base_model, val_data, val_targets, \"validazione\"),\n",
|
|
" 'test': evaluate_model_performance(base_model, test_data, test_targets, \"test\")\n",
|
|
" }\n",
|
|
" \n",
|
|
" # Salva il modello finale\n",
|
|
" save_path = './kaggle/working/models/oil_transformer/retrained_model.keras'\n",
|
|
" os.makedirs(os.path.dirname(save_path), exist_ok=True)\n",
|
|
" base_model.save(save_path, save_format='keras')\n",
|
|
" print(f\"\\nModello riaddestrato salvato in: {save_path}\")\n",
|
|
" \n",
|
|
" # Report miglioramenti\n",
|
|
" print(\"\\nMiglioramenti delle performance:\")\n",
|
|
" for dataset in ['train', 'val', 'test']:\n",
|
|
" print(f\"\\nSet {dataset}:\")\n",
|
|
" for metric in initial_metrics[dataset].keys():\n",
|
|
" initial = initial_metrics[dataset][metric]\n",
|
|
" final = final_metrics[dataset][metric]\n",
|
|
" improvement = ((initial - final) / initial) * 100\n",
|
|
" print(f\"{metric}: {improvement:.2f}% di miglioramento\")\n",
|
|
" \n",
|
|
" return base_model, history, final_metrics\n",
|
|
"\n",
|
|
"def start_retraining(model_path, train_data, train_targets, \n",
|
|
" val_data, val_targets, \n",
|
|
" test_data, test_targets,\n",
|
|
" epochs=50, batch_size=128):\n",
|
|
" \"\"\"\n",
|
|
" Avvia il processo di retraining in modo sicuro.\n",
|
|
" \"\"\"\n",
|
|
" try:\n",
|
|
" print(\"Caricamento del modello...\")\n",
|
|
" base_model = tf.keras.models.load_model(model_path, compile=False)\n",
|
|
" print(\"Modello caricato con successo!\")\n",
|
|
" \n",
|
|
" return retrain_model(\n",
|
|
" base_model=base_model,\n",
|
|
" train_data=train_data,\n",
|
|
" train_targets=train_targets,\n",
|
|
" val_data=val_data,\n",
|
|
" val_targets=val_targets,\n",
|
|
" test_data=test_data,\n",
|
|
" test_targets=test_targets,\n",
|
|
" epochs=epochs,\n",
|
|
" batch_size=batch_size\n",
|
|
" )\n",
|
|
" except Exception as e:\n",
|
|
" print(f\"Errore durante il retraining: {str(e)}\")\n",
|
|
" raise"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"model_path = './kaggle/working/models/oil_transformer/final_model.keras'\n",
|
|
"\n",
|
|
"retrained_model, retrain_history, final_metrics = start_retraining(\n",
|
|
" model_path=model_path,\n",
|
|
" train_data=train_data,\n",
|
|
" train_targets=train_targets,\n",
|
|
" val_data=val_data,\n",
|
|
" val_targets=val_targets,\n",
|
|
" test_data=test_data,\n",
|
|
" test_targets=test_targets,\n",
|
|
" epochs=50,\n",
|
|
" batch_size=128\n",
|
|
")\n",
|
|
"\n",
|
|
"# Visualizza i risultati\n",
|
|
"visualize_retraining_results(retrain_history, initial_metrics, final_metrics)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"id": "4BAI1zsJthBC"
|
|
},
|
|
"source": [
|
|
"## 8. Conclusioni e Prossimi Passi\n",
|
|
"\n",
|
|
"In questo notebook, abbiamo:\n",
|
|
"1. Caricato e analizzato i dati meteorologici\n",
|
|
"2. Simulato la produzione annuale di olive basata sui dati meteo\n",
|
|
"3. Esplorato le relazioni tra variabili meteorologiche e produzione di olive\n",
|
|
"4. Creato e valutato un modello di machine learning per prevedere la produzione\n",
|
|
"5. Utilizzato ARIMA per fare previsioni meteo\n",
|
|
"6. Previsto la produzione di olive per il prossimo anno\n",
|
|
"\n",
|
|
"Prossimi passi:\n",
|
|
"- Raccogliere dati reali sulla produzione di olive per sostituire i dati simulati\n",
|
|
"- Esplorare modelli più avanzati, come le reti neurali o i modelli di ensemble\n",
|
|
"- Incorporare altri fattori che potrebbero influenzare la produzione, come le pratiche agricole o l'età degli alberi\n",
|
|
"- Sviluppare una dashboard interattiva basata su questo modello"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"accelerator": "GPU",
|
|
"colab": {
|
|
"gpuType": "A100",
|
|
"provenance": []
|
|
},
|
|
"kaggle": {
|
|
"accelerator": "none",
|
|
"dataSources": [
|
|
{
|
|
"datasetId": 5950719,
|
|
"sourceId": 9725208,
|
|
"sourceType": "datasetVersion"
|
|
},
|
|
{
|
|
"datasetId": 5954901,
|
|
"sourceId": 9730815,
|
|
"sourceType": "datasetVersion"
|
|
}
|
|
],
|
|
"dockerImageVersionId": 30787,
|
|
"isGpuEnabled": false,
|
|
"isInternetEnabled": true,
|
|
"language": "python",
|
|
"sourceType": "notebook"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.0rc1"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|