# Outlier treatment using non-linear regression

I received my electrical engineering degree last month. In my work, I applied reliable meteorological data (from an NREL class I meteorological station) to the transposition models in the pvlib library and then used a non-linear regression model to treat irradiance outliers. In some scenarios, some hours had inconsistent irradiance values (such as 5000 W/m²). To treat these, I used the regression model, which adjusts the limits of the values according to a sinusoidal curve for each day of the year. My reference for the "right values" was the results of PVsyst and HelioScope, but I would like to know whether PVsyst itself deals with these possible outliers in some way, since it uses the Perez transposition model, for example. My aim was to adjust the plane-of-array irradiance in scenarios with a lot of outliers.
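To make the idea of a sinusoidal limit concrete, here is a minimal sketch in Python. It is not the model described above (which is not shown in this thread and fits the curve to the data); it only caps each value at a fixed half-sine profile for one day. The function names and the `sunrise`, `sunset`, and `peak` inputs are hypothetical and chosen for illustration.

```python
import math

def half_sine_limit(hour, sunrise, sunset, peak):
    """Upper limit (W/m^2) at a given hour from a half-sine daily profile."""
    if hour <= sunrise or hour >= sunset:
        return 0.0
    phase = math.pi * (hour - sunrise) / (sunset - sunrise)
    return peak * math.sin(phase)

def clip_to_envelope(hours, irradiance, sunrise, sunset, peak):
    """Floor each value at zero and cap it at the half-sine limit."""
    cleaned = []
    for h, g in zip(hours, irradiance):
        limit = half_sine_limit(h, sunrise, sunset, peak)
        cleaned.append(min(max(g, 0.0), limit))
    return cleaned

# Assumed example day: sunrise at 06:00, sunset at 18:00, 1100 W/m^2 peak limit.
hours = [5.5, 8.5, 12.0, 15.5, 19.5]
poa = [0.0, 450.0, 5000.0, 600.0, -3.0]  # 5000 W/m^2 is the outlier
cleaned = clip_to_envelope(hours, poa, sunrise=6.0, sunset=18.0, peak=1100.0)
print(cleaned)
```

In this sketch the 5000 W/m² reading at noon is capped at the assumed peak, while plausible values pass through unchanged. A data-driven version would estimate `peak` (and possibly the day length) per day instead of fixing them.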

##### Share on other sites

When importing custom data, PVsyst tests whether the irradiances are between -28 W/m^2 and 1600 W/m^2. Irradiances outside these bounds are treated as 0.
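As a rough illustration of that rule (this is not PVsyst code), the check could be reproduced in Python before import as follows. The names `LOWER_BOUND`, `UPPER_BOUND`, and `apply_bounds` are hypothetical, and treating the bounds as inclusive is an assumption.

```python
LOWER_BOUND = -28.0   # W/m^2, lower limit quoted above
UPPER_BOUND = 1600.0  # W/m^2, upper limit quoted above

def apply_bounds(values, lower=LOWER_BOUND, upper=UPPER_BOUND):
    """Replace every value outside [lower, upper] with 0 (inclusive bounds assumed)."""
    return [v if lower <= v <= upper else 0.0 for v in values]

raw = [-40.0, -10.0, 0.0, 850.0, 1600.0, 5000.0]
checked = apply_bounds(raw)
print(checked)
```

Note that a small negative reading such as -10 W/m² passes this test unchanged; only values beyond the bounds are zeroed.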

If you want to use the data in PVsyst, I would suggest using your filters for outliers before importing in PVsyst.

Finally, a sine may work well for clear-sky conditions, but how do you account for overcast conditions?

##### Share on other sites

Why is the lower limit a negative number (-28 W/m²)? Is it because of instrument measurement errors, sensor calibration, or simply data analysis? In my Python code I treated every negative value as 0. Your upper limit, close to 17% higher than the solar constant, seems logical to me. I don't entirely disagree with the negative value, but it really took me by surprise.
I will probably finish my English version this week and could send it to you. Even in overcast conditions my model should work, because it adjusts the sine wave according to the data. With a 15-minute dataset it probably won't fit very well, but with an hourly dataset it should fit on most overcast days.
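For reference, the "close to 17%" figure can be checked with a few lines of Python. The two solar constant values below are assumptions about which reference is meant: 1367 W/m² is the older commonly quoted value and 1361 W/m² is the more recent one.

```python
UPPER_LIMIT = 1600.0  # W/m^2, the upper bound quoted earlier in the thread

# Assumed reference values for the solar constant, in W/m^2.
SOLAR_CONSTANTS = {"older value": 1367.0, "more recent value": 1361.0}

# Percentage by which the upper limit exceeds each reference value.
excess = {
    label: (UPPER_LIMIT / value - 1.0) * 100.0
    for label, value in SOLAR_CONSTANTS.items()
}

for label, pct in excess.items():
    print(f"{label}: upper limit is {pct:.1f}% above the solar constant")
```

With either reference, the limit sits roughly 17 to 18% above the solar constant.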

##### Share on other sites

In my research, I employed statistical measures such as skewness, kurtosis, and the coefficient of variation, along with the envelope derived through the Hilbert Transform, to analyze variations in irradiation curves. For my upcoming master's degree, I aim to refine my model by adjusting the upper and lower limits of the sine waves based on these statistical and signal analysis descriptors. I am also considering the inclusion of additional metrics.
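For readers unfamiliar with these descriptors, below is a minimal, dependency-free Python sketch of how they can be computed for one day of hourly values. It is a generic illustration, not the implementation used in the research described above. The function names and the synthetic `day` series are hypothetical, the moments are population (biased) moments, and the kurtosis is the non-excess form (a normal distribution gives 3). The envelope follows the usual FFT-based analytic-signal construction, the same approach `scipy.signal.hilbert` uses, written here with a plain DFT for short series.

```python
import cmath
import math

def descriptors(x):
    """Coefficient of variation, skewness and (non-excess) kurtosis of a series."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "cv": math.sqrt(m2) / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

def hilbert_envelope(x):
    """Envelope |analytic signal| via a direct DFT (fine for short series)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    analytic = [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]
    return [abs(z) for z in analytic]

# Synthetic clear-sky-like day: half-sine between 06:00 and 18:00 (assumed shape).
day = [max(0.0, 1000.0 * math.sin(math.pi * (h - 6) / 12)) for h in range(24)]
stats = descriptors(day)
envelope = hilbert_envelope(day)
print(stats)
```

Comparing these descriptors between a smooth day and a noisy or overcast day gives a rough idea of how they might be used to widen or tighten the limits.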

##### Share on other sites

I don't know the reason for the -28 W/m^2 value specifically, but I imagine it is related to accommodating some measurements we used to validate the code.
Physically, considering the radiation balance at night (a clear sky has a low temperature), it is common for pyranometers to give negative readings at night. For PVsyst it doesn't matter too much, since there are other “nighttime” filters that set the production to zero.

Your research is definitely interesting, and we'll gladly give it a read once it is published or made public. I am sure it may appeal to other forum users as well.