Data-extraction from WUnderground with Python-script


Toulon7559
Posts: 859
Joined: Sunday 23 February 2014 17:56
Target OS: Raspberry Pi / ODroid
Domoticz version: <2025
Location: Hengelo(Ov)/NL
Contact:

Data-extraction from WUnderground with Python-script

Post by Toulon7559 »

WUnderground has many weather stations under its umbrella, and their data is accessible if you are a Contributor:
therefore WU might be a suitable data source for other applications.

Simple scraping of WUnderground just for CurrentConditions is not difficult:
it can be done with a rather simple Lua script operating on the output of an API call like https://api.weather.com/v2/pws/observat ... yourApiKey
If you need additional info such as sunrise/sunset, it can be 'borrowed' from Domoticz.
For the timing of the scraping script, mind the quota of 500 calls/day for the applicable WU account!
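To translate the 500 calls/day quota into a repeat interval for the script, the arithmetic is 86400 s / 500 calls = 172.8 s per call, so roughly one cycle every 3 minutes with one call per cycle, or every 6 minutes when each cycle makes two calls. A small Python 3 helper for that; the function names are only illustrative, and the quota of 500 is taken from the line above, so check the actual limit of your own WU account.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def min_interval_seconds(calls_per_cycle, daily_quota=500):
    """Shortest repeat interval (s) that keeps all calls of one day within the quota."""
    if calls_per_cycle <= 0 or daily_quota <= 0:
        raise ValueError("calls_per_cycle and daily_quota must be positive")
    return SECONDS_PER_DAY * calls_per_cycle / daily_quota


def min_interval_minutes(calls_per_cycle, daily_quota=500):
    """Same interval rounded up to whole minutes (e.g. for a cron or Domoticz timer)."""
    return math.ceil(min_interval_seconds(calls_per_cycle, daily_quota) / 60)


if __name__ == "__main__":
    for calls in (1, 2):
        print(calls, "call(s)/cycle:", min_interval_seconds(calls), "s ->",
              min_interval_minutes(calls), "min")
```

Rounding up (not to the nearest minute) is deliberate: rounding down would push the day's total just over the quota.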

More extended scraping of WUnderground for CurrentConditions & Forecast has been offered in the form of a dzVents script.

A suitable approach to extract data from WU is also presented as a PHP script by the author of the so-called Saratoga Template.
It almost meets the desired setup, but:
- the PHP application demands webspace to run
- for the subsequent application of the data there is not yet a ready PHP script.

;-) Getting ambitious, so I am now starting an effort to make the following configuration:
#1. a Python script, because it should also be usable by non-Domoticz weather enthusiasts who have a Raspberry Pi (or similar), but not necessarily Domoticz/dzVents, nor PHP & webspace
#2. reading Today's data from a WUnderground weather station via the equivalent of https://api.weather.com/v2/pws/observat ... yourApiKey
That yields a JSON file like the one in the attachment (and that example file only covers the time until approx. 10:00 in the morning!!!)
#3. scanning the data in the resulting JSON file to get
a. current/actual/latest data
b. Extremes (= high values and low values of this day after 00:00)
c. the times at which the Extremes occurred
#4. application of the found values for functions, such as generation of a quasi-Cumulus data file as described in this thread.
Obviously, the characteristics of such a function feed back into the requirements for the extraction of information, meaning that most extracted values will be used.
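For step #2, a minimal Python 3 sketch of the download, rebuilding the 'all/1day' call-string that appears in the script fragment further on in this thread (station ID KMAHANOV10 and yourApiKey are the placeholders used there). In Python 3 the old urllib.urlopen lives in urllib.request. The function names are illustrative, and error handling (no network, invalid key, empty day) is left out but would be needed in practice.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.weather.com/v2/pws/observations/all/1day"


def build_today_url(station_id, api_key, units="m"):
    """Build the 'all/1day' call-string (parameter order as in the example URL)."""
    query = urllib.parse.urlencode({
        "stationId": station_id,
        "format": "json",
        "units": units,   # 'm' (metric), as in the example URL
        "apiKey": api_key,
    })
    return BASE_URL + "?" + query


def fetch_today(station_id, api_key, timeout=30):
    """Download Today's data and return the list of bins under 'observations'."""
    url = build_today_url(station_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as page:
        data = json.loads(page.read().decode("utf-8"))
    return data.get("observations", [])


# Usage (one API call, counts against the daily quota):
# bins = fetch_today("KMAHANOV10", "yourApiKey")
# print(len(bins), "bins; latest epoch:", bins[-1]["epoch"])
```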

As stated, the PHP script mentioned above is good guidance for a general setup, and probably example Python scripts also exist for each aspect, but I have not yet found one package that combines most of the above requirements .........
For #1, #2 and #4 I have bits & pieces available for recycling.
However, #3 is the 'critical kernel' when trying to apply the WU JSON file for Today's Data:
aspect a. requires dissection of the latest section (or 'bin')
=> find the number of the latest bin, and extract & process the desired data from that latest bin as current/actual values.
[a simpler, faster and probably more valid extraction of current/actual data is the use of an (example) URL call for CurrentData and extraction of its data, but that implies one more WU API call per cycle of the script]
aspect b. starts easy, because the initial values are in the first section/'bin 0' of the JSON file
=> extract the desired data from this bin 0 to serve as the initial reference for further checking of Extremes.
Subsequently scan the other sections/'bins' and register the higher values respectively the lower values.
aspect c. has the more difficult complication that time stamps related to extreme values are not included
=> the times of occurrence have to be derived from the section/bin in which the related highest or lowest value occurred
=> when finding a new Extreme, register the time of occurrence = epoch time of the bin in which it was found
Anybody with examples/hints for the mentioned aspects of #3?
Attachments
WU_IHENGE39_1day.txt
WU_Extract = JSON-file
Extension changed to txt to suit upload to forum
(74.63 KiB)
Last edited by Toulon7559 on Sunday 01 May 2022 15:49, edited 4 times in total.
Set1 = RPI-Zero+RFXCom433+S0PCM+Shield for BMP180/DS18B20/RS485+DDS238-1ZNs
Set2 = RPI-3A++RFLinkGTW+ESP8266s+PWS_WS7000
Common = KAKUs+3*PVLogger+PWS_TFA_Nexus
plus series of 'satellites' for dedicated interfacing, monitoring & control.

Re: Data-extraction from WUnderground with Python-script

Post by Toulon7559 »

I am trying to make the initial setup for the output file by means of a default Python dictionary, but am apparently missing a subtlety in the syntax.

I am testing the script line with the definition of the dictionary as part of a much longer script.
Everything is OK up to the definition of the dictionary.
The dictionary is an extended variation of a definition happily applied in multiple older Python scripts.

The error reports show the 'offending' line of the script with the definition of the dictionary.
The presentation of the error output in the forum display is not the same as in PuTTY's window:
if you want to see something closer to PuTTY's display, open this forum message in editing mode.

The label name time_stamp worked OK in other Python 2.7 scripts.
Deleting the character $ from all label names does not make a difference.

Where/what is the fault/mistake in this dictionary setup?

Test-running with Python 2.x results in a SyntaxError, and in PuTTY's full-screen window the error pointer (^) aims at an empty space far behind the text string.

Code:

    HWA_dict = {'$StationDate': 22-04-2022, '$StationTime': 20:22, '$tempUnit': C, '$humUnit': %, '$barUnit': hPa, '$rainUnit': mm, '$rateUnit': mm/hr, '$windUnit'= km/h, '$sunriseTime': 06:00, $sunsetTime': 18:00, '$outsideTemp': 20.0, '$hiOutsideTemp': 21, '$lowOutsideTempTime': 19.9, '$hiOutsideTempTime': 21:00, '$lowOutsideTempTime': 04:00, '$outsideHumidty': 88, '$hiOutsideHumidity': 90, '$lowOutsideHumidty': 80, '$hiOutsideHumidityTime': 04:00, '$lowOutsideHumidityTime': 04:00, '$outsideDewPt': 17.3, '$hiOutsideDewPtTime': 04:00, '$lowOutsideDewPtTime': 04:00, '$windSpeed': 10,'$wind10Avg': 11, '$hiWindSpeed': 22, '$hiWindSpeedTime': 20:22, '$windDir': 288, '$windDirection': NNW, '$windChill': 16, '$outsideHeatindex': 22, '$barometer': 1000, '$barTrend': up, '$rainRate': 2, '$dailyRain': 4.5, '$monthlyRain': 18, '$totalRain': 210, '$solarRad': 10, '$hiSolarRad': 1100,  '$hiSolarRadTime': 10:00, '$uv': 1, '$hiUV': 11, '$hiUVTime': 11:00, '$identification': Test voor WU-extract met Python, '$software': Python&Domoticz, 'time_stamp': 0}
                                                              ^
SyntaxError: invalid syntax
Test-running with Python 3.x results in another SyntaxError:

Code:

    HWA_dict = {'$StationDate': 22-04-2022, '$StationTime': 20:22, '$tempUnit': C, '$humUnit': %, '$barUnit': hPa, '$rainUnit': mm, '$rateUnit': mm/hr, '$windUnit'= km/h, '$sunriseTime': 06:00, $sunsetTime': 18:00, '$outsideTemp': 20.0, '$hiOutsideTemp': 21, '$lowOutsideTempTime': 19.9, '$hiOutsideTempTime': 21:00, '$lowOutsideTempTime': 04:00, '$outsideHumidty': 88, '$hiOutsideHumidity': 90, '$lowOutsideHumidty': 80, '$hiOutsideHumidityTime': 04:00, '$lowOutsideHumidityTime': 04:00, '$outsideDewPt': 17.3, '$hiOutsideDewPtTime': 04:00, '$lowOutsideDewPtTime': 04:00, '$windSpeed': 10,'$wind10Avg': 11, '$hiWindSpeed': 22, '$hiWindSpeedTime': 20:22, '$windDir': 288, '$windDirection': NNW, '$windChill': 16, '$outsideHeatindex': 22, '$barometer': 1000, '$barTrend': up, '$rainRate': 2, '$dailyRain': 4.5, '$monthlyRain': 18, '$totalRain': 210, '$solarRad': 10, '$hiSolarRad': 1100,  '$hiSolarRadTime': 10:00, '$uv': 1, '$hiUV': 11, '$hiUVTime': 11:00, '$identification': Test voor WU-extract met Python, '$software': Python&Domoticz, 'time_stamp': 0}
                                    ^
SyntaxError: invalid token
In the code above, the error pointer initially seemed to aim at the underscore in the last label name in the dictionary [= time_stamp], but taking out the underscore [=> timestamp] only changed the position of the pointer: it now aims at the space before timestamp.

Code:

    HWA_dict = {'$StationDate': 22-04-2022, '$StationTime': 20:22, '$tempUnit': C, '$humUnit': %, '$barUnit': hPa, '$rainUnit': mm, '$rateUnit': mm/hr, '$windUnit'= km/h, '$sunriseTime': 06:00, $sunsetTime': 18:00, '$outsideTemp': 20.0, '$hiOutsideTemp': 21, '$lowOutsideTempTime': 19.9, '$hiOutsideTempTime': 21:00, '$lowOutsideTempTime': 04:00, '$outsideHumidty': 88, '$hiOutsideHumidity': 90, '$lowOutsideHumidty': 80, '$hiOutsideHumidityTime': 04:00, '$lowOutsideHumidityTime': 04:00, '$outsideDewPt': 17.3, '$hiOutsideDewPtTime': 04:00, '$lowOutsideDewPtTime': 04:00, '$windSpeed': 10,'$wind10Avg': 11, '$hiWindSpeed': 22, '$hiWindSpeedTime': 20:22, '$windDir': 288, '$windDirection': NNW, '$windChill': 16, '$outsideHeatindex': 22, '$barometer': 1000, '$barTrend': up, '$rainRate': 2, '$dailyRain': 4.5, '$monthlyRain': 18, '$totalRain': 210, '$solarRad': 10, '$hiSolarRad': 1100,  '$hiSolarRadTime': 10:00, '$uv': 1, '$hiUV': 11, '$hiUVTime': 11:00, '$identification': Test voor WU-extract met Python, '$software': Python&Domoticz, 'timestamp': 0}
                                    ^
SyntaxError: invalid token
After another check that the spacing between label names was present everywhere, I corrected 1 remaining error for $wind10Avg [see above!], and then got the error pointer at the same position for both Python 2.x and Python 3.x: far behind the printed text string (in 'empty' space).
Below is the error report from the Python 3 run.

Code:

    HWA_dict = {'StationDate': 22-04-2022, 'StationTime': 20:22, 'tempUnit': C, 'humUnit': %, 'barUnit': hPa, 'rainUnit': mm, 'rateUnit': mm/hr, 'windUnit'= km/h, '$sunriseTime': 06:00, sunsetTime': 18:00, 'outsideTemp': 20.0, 'hiOutsideTemp': 21, 'lowOutsideTempTime': 19.9, 'hiOutsideTempTime': 21:00, 'lowOutsideTempTime': 04:00, 'outsideHumidty': 88, 'hiOutsideHumidity': 90, 'lowOutsideHumidty': 80, 'hiOutsideHumidityTime': 04:00, 'lowOutsideHumidityTime': 04:00, 'outsideDewPt': 17.3, 'hiOutsideDewPtTime': 04:00, 'lowOutsideDewPtTime': 04:00, 'windSpeed': 10, 'wind10Avg': 11, 'hiWindSpeed': 22, 'hiWindSpeedTime': 20:22, 'windDir': 288, 'windDirection': NNW, 'windChill': 16, 'outsideHeatindex': 22, 'barometer': 1000, 'barTrend': up, 'rainRate': 2, 'dailyRain': 4.5, 'monthlyRain': 18, 'totalRain': 210, 'solarRad': 10, 'hiSolarRad': 1100,  'hiSolarRadTime': 10:00, 'uv': 1, 'hiUV': 11, 'hiUVTime': 11:00, 'identification': Test voor WU-extract met Python, 'software': Python&Domoticz, 'time_stamp': 0}
                                   ^
SyntaxError: invalid token
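The error positions match two Python rules. In Python 3, the 04 in 22-04-2022 (a decimal integer with a leading zero) is an 'invalid token'; in Python 2, where 04 is accepted as an octal number, the parser instead stops at the second colon in 20:22. More generally, every value in a dict literal must be a valid Python expression, so dates, times, units and free text have to be quoted strings (unquoted words such as C or NNW would otherwise be looked up as variable names). Other spots in the line that would fail as well: '$windUnit'= (an = instead of a :) and $sunsetTime' (missing its opening quote). Also, '$lowOutsideTempTime' appears twice; in a dict the later entry silently overwrites the first, so below the 19.9 entry is assumed to be meant as '$lowOutsideTemp'. A corrected sketch that runs under Python 2.7 and 3, with all other label names kept exactly as posted:

```python
# Default (test) values for the output file. Dates, times, units and text are
# quoted strings now, so Python stores them instead of trying to evaluate them.
HWA_dict = {
    '$StationDate': '22-04-2022', '$StationTime': '20:22',
    '$tempUnit': 'C', '$humUnit': '%', '$barUnit': 'hPa',
    '$rainUnit': 'mm', '$rateUnit': 'mm/hr', '$windUnit': 'km/h',
    '$sunriseTime': '06:00', '$sunsetTime': '18:00',
    # '$lowOutsideTemp' is an assumed key: the post had '$lowOutsideTempTime' twice
    '$outsideTemp': 20.0, '$hiOutsideTemp': 21, '$lowOutsideTemp': 19.9,
    '$hiOutsideTempTime': '21:00', '$lowOutsideTempTime': '04:00',
    '$outsideHumidty': 88, '$hiOutsideHumidity': 90, '$lowOutsideHumidty': 80,
    '$hiOutsideHumidityTime': '04:00', '$lowOutsideHumidityTime': '04:00',
    '$outsideDewPt': 17.3, '$hiOutsideDewPtTime': '04:00', '$lowOutsideDewPtTime': '04:00',
    '$windSpeed': 10, '$wind10Avg': 11, '$hiWindSpeed': 22, '$hiWindSpeedTime': '20:22',
    '$windDir': 288, '$windDirection': 'NNW', '$windChill': 16, '$outsideHeatindex': 22,
    '$barometer': 1000, '$barTrend': 'up',
    '$rainRate': 2, '$dailyRain': 4.5, '$monthlyRain': 18, '$totalRain': 210,
    '$solarRad': 10, '$hiSolarRad': 1100, '$hiSolarRadTime': '10:00',
    '$uv': 1, '$hiUV': 11, '$hiUVTime': '11:00',
    '$identification': 'Test voor WU-extract met Python',
    '$software': 'Python&Domoticz',
    'time_stamp': 0,
}

print(len(HWA_dict), "labels;", HWA_dict['$StationTime'], HWA_dict['$windUnit'])
```

Splitting the literal over several lines inside the braces is allowed and makes a missing quote or colon much easier to spot than in one very long line.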

Re: Data-extraction from WUnderground with Python-script

Post by Toulon7559 »

Another aspect to be solved is scanning the JSON file from WUnderground to find the daily extreme values,
equivalent to the process in the PHP script mentioned before.

I am trying the approach below.

Code:

# ------------------------------------------------------------------------------------------
# Line 442 = in WU JSON output of Today's Data, find & process the next Bin up to/incl. the last Bin
# ------------------------------------------------------------------------------------------
# Example call-string for ToDay-info = https://api.weather.com/v2/pws/observations/all/1day?stationId=KMAHANOV10&format=json&units=m&apiKey=yourApiKey
#    page = urllib.urlopen('https://api.weather.com/v2/pws/observations/all/1day?stationId=KMAHANOV10&format=json&units=m&apiKey=yourApiKey')
#    content_test = page.read()
#    obj_test2 = json.loads(content_test)
# Extraction of a value is like 
# Epochcheck = obj_test2['observations'][Bin]['epoch']
# 'epoch' is present in all bins of the JSON file and is therefore chosen as key for scan & search

# Bin [0] earlier has been processed => initial contents at 00:00 = references for Extremes
# GENERAL setup of the scanning function:
# Bin = 0
# while 'epoch' exists
#     Read 'epoch' & translate to CET/LocalTime
#     Compare values of Extremes in that bin with earlier Extremes
#         if hi_value higher than hiExtreme => new hiExtreme & adapt HiTime
#         if low_value lower than LowExtreme => new lowExtreme & adapt LowTime
#     Bin = Bin + 1
print ('End of scanning bins for Extremes in TodayData')

# Approach1
Bin = 0
Epochcheck =  obj_test2['observations'][0]['epoch']
try:
     Epochcheck = obj_test2['observations'][Bin]['epoch']
     print(Bin)
     Bin += 1
except NameError:
     Epochcheck = None  

# Approach2
Bin = 0
Epochcheck = obj_test2['observations'][0]['epoch']
While Epochcheck is not None:
    Epochcheck = obj_test2['observations'][Bin]['epoch']
    Print(Bin)
    Bin += 1

Approach1 does not throw an error, but it steps out at Bin = 1.
Approach2 reports a syntax error.

Code:

  File "/home/pi/domoticz/scripts/python/URL_JSON_WU_to_HWA_Start01a_0186.py", line 476
    While Epochcheck is not None:
                   ^
SyntaxError: invalid syntax
Apparently in a Python script the check line below (with a variable value for Bin) cannot be applied in this way.

Code:

    Epochcheck = obj_test2['observations'][Bin]['epoch']

Somebody with a better idea?
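A likely explanation for both results: Python keywords and built-in names are case-sensitive, so While and Print are not recognised, which is enough for the SyntaxError at line 476. Approach1 has no loop at all, so its try-block runs exactly once and stops with Bin = 1. Also, reading a bin beyond the last one does not return None but raises IndexError (not NameError), so a test like 'Epochcheck is not None' never ends the scan by itself. Two ways to walk through all bins, sketched in Python 3 with a made-up three-bin object in place of the real obj_test2:

```python
def bin_epochs(obj):
    """Return (bin number, epoch) for every bin; a for-loop stops after the last bin."""
    return [(bin_nr, observation["epoch"])
            for bin_nr, observation in enumerate(obj["observations"])]


def count_bins_while(obj):
    """Counter-based walk, closest to Approach2 (note lowercase 'while' and 'print')."""
    bin_nr = 0
    while bin_nr < len(obj["observations"]):   # stop condition uses the list length
        epoch_check = obj["observations"][bin_nr]["epoch"]
        print(bin_nr, epoch_check)
        bin_nr += 1
    return bin_nr                              # = number of bins


def count_bins_try(obj):
    """try/except walk, closest to Approach1: stop when the next bin does not exist."""
    bin_nr = 0
    while True:
        try:
            obj["observations"][bin_nr]["epoch"]
        except IndexError:                     # raised for a bin past the last one
            break
        bin_nr += 1
    return bin_nr


# Hypothetical stand-in for obj_test2 (real bins contain many more fields)
obj_test2 = {"observations": [{"epoch": 1650578400},
                              {"epoch": 1650578700},
                              {"epoch": 1650579000}]}
print(bin_epochs(obj_test2))
print(count_bins_while(obj_test2), count_bins_try(obj_test2))
```

The for-loop variant is the usual Python idiom; the counter variants only matter if the bin number itself is needed for something else.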
Last edited by Toulon7559 on Sunday 01 May 2022 15:51, edited 4 times in total.

Re: Data-extraction from WUnderground with Python-script

Post by Toulon7559 »

I was pointed to this URL call, which provides the latest/actual high/low values of the selected day:

Code:

https://api.weather.com/v2/pws/history/daily?stationId=KCAOAKLA44&format=json&units=m&date=20181001&apiKey=yourApiKey
In the experience of the contributor, the result is equivalent to the more difficult scanning of the long file, with less work & time.
Only another method is required to register on the fly the related time of occurrence for the extremes.
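Registering the time on the fly could work like this: keep the extremes from the previous run, compare them with the high/low values returned by the daily call, and when a value has become more extreme, take the time of the current run as its time of occurrence. A Python 3 sketch; the function names and the 'Time' suffix on the keys are illustrative assumptions, and keeping the store between runs (e.g. in a JSON file) is not shown. The recorded time can lag the real moment by up to one script interval, and the store has to be cleared at the start of each day.

```python
import time


def reset_if_new_day(store, now_epoch):
    """Clear the stored extremes when the local date has changed."""
    today = time.strftime("%Y-%m-%d", time.localtime(now_epoch))
    if store.get("date") != today:
        store.clear()
        store["date"] = today


def register_extreme(store, name, new_value, now_epoch, higher_is_extreme=True):
    """Store a new extreme plus its time of occurrence; return True if it changed."""
    if new_value is None:          # no data in this call
        return False
    old_value = store.get(name)
    if old_value is None:
        is_new = True
    elif higher_is_extreme:
        is_new = new_value > old_value
    else:
        is_new = new_value < old_value
    if is_new:
        store[name] = new_value
        # Time of this run, used as an approximation of the time of occurrence
        store[name + "Time"] = time.strftime("%H:%M", time.localtime(now_epoch))
    return is_new


# Usage with invented daily-high values from three successive runs, 5 minutes apart
extremes = {}
now = time.time()
reset_if_new_day(extremes, now)
for i, temp_high in enumerate((20.1, 21.0, 20.7)):
    register_extreme(extremes, "$hiOutsideTemp", temp_high, now + i * 300)
print(extremes)
```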

After some experiments it also seems useful to apply 2 separate dictionaries to make a clear demarcation of data:
- registration of extreme values & times of occurrence
- reference for the output file(s)
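One way to keep those two roles apart: one dictionary holds only the tracked extremes and their times (carried from run to run), and a separate template dictionary holds every label of the output file, from which a fresh copy is filled on each run. The names, the '---' placeholder and the small label selection are hypothetical; the labels themselves are taken from the test dictionary earlier in this thread. Python 3:

```python
# Hypothetical template: every label of the output file with a default value
OUTPUT_TEMPLATE = {
    '$outsideTemp': '---', '$hiOutsideTemp': '---', '$hiOutsideTempTime': '---',
    '$lowOutsideTempTime': '---', '$tempUnit': 'C',
}


def build_output(template, current, extremes):
    """Fill a fresh copy of the template; current values take priority over extremes."""
    output = dict(template)          # the template itself stays unchanged
    for key in output:
        if key in current:
            output[key] = current[key]
        elif key in extremes:
            output[key] = extremes[key]
    return output


# Invented values for illustration
current = {'$outsideTemp': 20.0}
extremes = {'$hiOutsideTemp': 21.0, '$hiOutsideTempTime': '14:35'}
out = build_output(OUTPUT_TEMPLATE, current, extremes)
print(out)
```

Labels without a value yet keep their placeholder, so the output file always has the same complete set of lines.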

:( For progress I just need to solve the 'problem' with the dictionaries as mentioned in the previous message:
any hints?