
Calibration

Dear sir,
Thanks to the OSMOSE team's kind help, I have built the calibration for my project. However, I am puzzled by some aspects of the calibration. Could we discuss them at your convenience? The questions are listed below.
Q1: I set the EA seed parameter of my calibration to 63, and would like to know how the results are affected by changing the seed's value. Is there a principle for setting the seed in different studies?
Q2: The calibration guide says M.proxy is the total mortality (total mortality: Mdiverse+starvation+predation…); however, in the example parmin.csv I found it was lower than the natural mortality. Could you tell me what M.proxy refers to?
Q3: Since the value of gen.max (c(100, 100, 150, 200), for four phases) produces many generations, the calibration takes too much time. Is there a principle for setting gen.max in different studies? I would like to know what would happen if I reduced the value of gen.max.
Thank you in advance.
Best regards,
Lei Xing

Hello Lei Xing,

Q1, in the calibration example I sent, we set the seed parameter to 63 by default because we were running the calibration on a 64-CPU computer: 1 CPU for the master process and 63 CPUs for the slaves (each slave is dedicated to running an Osmose simulation with a specific set of parameters).

There is indeed a minimum value for this parameter that depends on the number of parameters you are estimating. Below this minimum, the search for the optimal solution will not be reliable. There is a formula giving this minimum as a function of the number of parameters, but I would have to ask Ricardo for it. You may find the answer in the attached paper, though.

Q2, I don't think the calibration example I sent you uses the M.proxy parameters; this is work in progress for a future version of the calibration. For now I suggest you stick to "more obvious" parameters to estimate, such as plankton accessibility, larval mortality, etc.

Q3, unfortunately there is no easy answer to your question. You have to find the number of generations per phase that works best for your calibration. That said, there are a few principles to keep in mind. There are two ways for the algorithm to move from one phase to the next: either it reaches the maximum number of generations set by the gen.max parameter, or the step size (the difference between one generation and the previous one) falls below the 'convergence' parameter (in calibration.conf). If gen.max is too small, you may jump to the next phase without exploring your parameter space sufficiently. In the first phases, however, you can let the algorithm explore the parameter space very quickly (fewer than 100 generations) to get closer to the optimal solution, knowing that you will not reach it yet. It is in the last phase, when all the parameters are estimated together, that you must let the algorithm explore thoroughly.
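The phase-switching rule described above can be illustrated with a small sketch. This is a simplified Python illustration, not the calibration code itself: `run_phases` and `step_size_for` are hypothetical names, and the step size that shrinks as 1/generation is a toy assumption standing in for what the EA would actually report.

```python
from typing import Callable, List, Sequence


def run_phases(
    gen_max: Sequence[int],
    convergence: float,
    step_size_for: Callable[[int, int], float],
) -> List[int]:
    """Return how many generations are run in each phase.

    A phase ends when it reaches gen_max[phase] generations or when the
    step size (change between consecutive generations) drops below
    `convergence`, whichever comes first. `step_size_for(phase, gen)` is a
    hypothetical stand-in for the step size reported by the EA.
    """
    used = []
    for phase, limit in enumerate(gen_max):
        gen = 0
        while gen < limit:
            gen += 1
            if step_size_for(phase, gen) < convergence:
                break  # converged early: move on to the next phase
        used.append(gen)
    return used


if __name__ == "__main__":
    # Toy step size that shrinks as 1/generation (an assumption, not real EA output).
    print(run_phases([100, 100, 150, 200], 0.015, lambda phase, gen: 1.0 / gen))
    # [67, 67, 67, 67]
```

With this toy step size every phase converges after 67 generations, well before gen.max; with a slower-converging problem the gen.max cap would be what ends each phase.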

How much can you reduce gen.max for the first phase? Check the evolution of the step size; you could even plot it and see after how many generations it decreases significantly. Perhaps in your case, after 50 generations the step size has already stabilized or is small enough to give an acceptable estimate of the parameters estimated first, and you can move on to the second phase.
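As a numeric complement to plotting, one could look for the first generation after which the step size stops changing much. The sketch below is a hypothetical helper, not part of the calibration tools: the stability criterion (relative change of at most `rel_tol` over `window` consecutive generations) is an assumption, and how you extract the step sizes from your calibration output is left to you.

```python
from typing import Optional, Sequence


def first_stable_generation(
    step_sizes: Sequence[float],
    rel_tol: float = 0.05,
    window: int = 5,
) -> Optional[int]:
    """Return the first generation (1-based) from which the step size changes
    by at most `rel_tol` (relative) over `window` consecutive generations,
    or None if that never happens."""
    for i in range(len(step_sizes) - window):
        chunk = step_sizes[i : i + window + 1]
        if all(abs(b - a) <= rel_tol * abs(a) for a, b in zip(chunk, chunk[1:])):
            return i + 1
    return None


if __name__ == "__main__":
    # Made-up step sizes for one phase; in practice they would come from the
    # calibration output.
    sizes = [1.0, 0.5, 0.25, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]
    print(first_stable_generation(sizes))  # 4
```

If the result is well below the gen.max of a phase, that phase is a candidate for a smaller gen.max.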

Cheers, Philippe

Dear Philippe,
I sincerely appreciate your patient reply. I understand the meaning of the parameters now, and clearly I couldn't have built my OSMOSE project without your kind help.
Thank you again for your generous help.
Best regards,
Lei Xing
Ocean University of China

Dear Philippe,
Recently I got the results of my project's calibration. Unfortunately, the biomasses calculated by the EA did not match the observed biomasses. I checked my project and reviewed the ea_adr example, and found some confusing issues, listed below. Could we discuss them at your convenience?
 
Q1: If I set natural mortality to NA in every phase, does it mean that this parameter is no longer calibrated, and that its value will be read from paropt.csv or adr_param-natural-mortality.csv?
 
Q2: The SP.biomass calculated by the EA in optim.csv was far lower than the observed biomasses I set in DATA/SP.biomass.csv (type=lnorm2, calibration=1). I would like to know whether I need to apply a conversion formula or a unit conversion to SP.biomass.
 
Q3: In my calibration, the time column was set to 30, 31, …, 49 in the example file DATA/shrimp.biomass.csv. What does time refer to (output.start.year=30, simulation.time.nyear=50)? What do the values in the shrimp column refer to (the annual average biomass, the biomass at time 30, or something else)? We survey our study area four times a year (Feb., May, Aug., and Nov.), so I would like to calibrate against four different values per simulation year. Could you tell me how to do this?
 
Q4: Apart from the type “equilibrium”, I would like to know the other types for fluxing and what they mean.
 
Thank you in advance and I am looking forward to your reply.
 
Best regards,
 
Lei Xing
Ocean University of China

Hello Lei Xing,

I'll go straight to your questions:

Q1 - It depends on your LIB/dynamics.R file, so you had better make sure that both paropt.csv and adr_param-natural-mortality.csv contain the correct values.

Q2 - DATA/SpBiomass.csv is compared against the biomass in the Osmose output files, in tonnes. I do not understand what SP.biomass in optim.csv refers to.

Q3.1 - The first column, Time, in the DATA/*.csv files is purely indicative; the EA does not even read it. But indeed, in the ADR calibration the Time column referred to years 30 to 49. This means that the Osmose biomass output file should be written every year for 20 years. One way to achieve this is to set output.start.year=30, simulation.time.nyear=50 and output.recordfrequency.ndt = 24. But output.start.year=50, simulation.time.nyear=70 and output.recordfrequency.ndt = 24 would also work since, as I said, the values of the Time column do not matter in the DATA/*.csv files.
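Since the EA ignores the Time column, what presumably matters is that the DATA file has one row per record in the Osmose biomass output. A minimal Python sketch for counting those records, assuming 24 time steps per year (simulation.time.ndtPerYear = 24, which is what output.recordfrequency.ndt = 24 for yearly output implies) and that records cover output.start.year up to simulation.time.nyear; `n_biomass_records` is a hypothetical helper.

```python
def n_biomass_records(
    output_start_year: int,
    nyear: int,
    ndt_per_year: int,
    record_frequency_ndt: int,
) -> int:
    """Expected number of records in the Osmose biomass output.

    Assumes records cover years output_start_year .. nyear - 1 and that
    ndt_per_year (simulation.time.ndtPerYear) is a multiple of
    record_frequency_ndt (output.recordfrequency.ndt).
    """
    if ndt_per_year % record_frequency_ndt != 0:
        raise ValueError("ndt_per_year must be a multiple of record_frequency_ndt")
    n_years = nyear - output_start_year
    return n_years * ndt_per_year // record_frequency_ndt


if __name__ == "__main__":
    # Both settings quoted above give 20 yearly records (assuming 24 steps/year).
    print(n_biomass_records(30, 50, 24, 24))  # 20
    print(n_biomass_records(50, 70, 24, 24))  # 20
```

Under the same assumption, output.recordfrequency.ndt = 6 gives 4 records per year and output.recordfrequency.ndt = 2 gives 12.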

Q3.2 - In your study area you have four biomass estimates every year and you would like to use this information. There are two ways to handle it (I'm not sure which is best; you will have to decide):

  • You assume that the biomass estimates are good averages for each quarter. Your DATA/*.csv will look like

30.16,biomass_estimate_Feb
30.33,biomass_estimate_May
30.50,biomass_estimate_Aug
30.66,biomass_estimate_Nov
31.16,biomass_estimate_Feb
31.33,biomass_estimate_May
etc.

Then set output.recordfrequency.ndt = 6.

  • You assume that the biomass estimate is only relevant for the corresponding month. Your DATA/*.csv will look like

30.0,NA
30.083,biomass_estimate_Feb
30.167,NA
30.25,NA
30.33,biomass_estimate_May
30.416,NA
30.5,NA
30.583,biomass_estimate_Aug
30.667,NA
30.75,NA
30.833,biomass_estimate_Nov
etc.

Then set output.recordfrequency.ndt = 2.
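Both layouts can be generated rather than typed by hand. The Python sketch below is illustrative only: `build_data_rows` and `to_csv` are hypothetical helpers, the survey values are made up, and it assumes 24 time steps per year so that output.recordfrequency.ndt = 6 gives 4 records per year and output.recordfrequency.ndt = 2 gives 12. Because the EA ignores the Time column, the time labels it writes (e.g. 30.250 for the second quarter) need not match the ones above; add whatever header row your existing DATA files use.

```python
import csv
import io
from typing import Dict, List, Tuple


def build_data_rows(
    first_year: int,
    n_years: int,
    records_per_year: int,
    observations: Dict[Tuple[int, int], float],
) -> List[List[str]]:
    """Build (time, biomass) rows for a DATA/*.csv file.

    `observations` maps (year, record index within the year, 0-based) to an
    observed biomass; records without an observation are written as NA.
    """
    rows = []
    for year in range(first_year, first_year + n_years):
        for k in range(records_per_year):
            time = f"{year + k / records_per_year:.3f}"  # indicative only
            value = observations.get((year, k))
            rows.append([time, "NA" if value is None else str(value)])
    return rows


def to_csv(rows: List[List[str]]) -> str:
    """Serialise rows as comma-separated lines."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical survey values in tonnes for year 30 (Feb, May, Aug, Nov).
    surveys = [1200.0, 950.0, 800.0, 1100.0]
    # Layout 1: quarterly averages, 4 records/year (output.recordfrequency.ndt = 6).
    quarterly = {(30, q): v for q, v in enumerate(surveys)}
    print(to_csv(build_data_rows(30, 1, 4, quarterly)))
    # Layout 2: monthly records, 12/year (output.recordfrequency.ndt = 2),
    # with surveys at 0-based months 1, 4, 7, 10 (Feb, May, Aug, Nov).
    monthly = {(30, m): v for m, v in zip([1, 4, 7, 10], surveys)}
    print(to_csv(build_data_rows(30, 1, 12, monthly)))
```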

Q4 - Unfortunately I cannot help here: these were experimental features included by Ricardo Oliveros Ramos for his configuration of the Humboldt Current, and I am not familiar with them. You should ask him directly.

Cheers, Philippe