tag:blogger.com,1999:blog-6061887630060661987.post4076341170268576483..comments2018-12-11T15:10:13.876-05:00Comments on dlib C++ Library: A Global Optimization Algorithm Worth UsingDavis Kingnoreply@blogger.comBlogger60125tag:blogger.com,1999:blog-6061887630060661987.post-80468402323955667312018-12-11T15:10:13.876-05:002018-12-11T15:10:13.876-05:00For now I have set all the additional parameters (...For now I have set all the additional parameters (put in a dict) as a global variable. It definitely solves my problem, but it doesn't 'feel' that great.<br />Could you think of a better solution? (where "better" here is quite arbitrary)zwephttps://www.blogger.com/profile/09240148981899264877noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-91596101994364988152018-12-11T10:02:51.887-05:002018-12-11T10:02:51.887-05:00Hello, I have some trouble using the dlib.find_min...Hello, I have some trouble using dlib.find_min_global in my NN model...<br /><br />I have set up a class that creates a model object and attaches it to 'self'. Then, a method is called where this model object is trained using self.model_obj.fit_generator from Keras.<br /><br />I have made it in such a way that this method needs additional parameters like the learning rate, and others, which I want to approximate using this dlib function.<br /><br />However, there is a 'self' argument in the method by the nature of my class... and I can't seem to find a way to have dlib.find_min_global ignore this first argument in its optimization. Do you have an idea maybe?zwephttps://www.blogger.com/profile/09240148981899264877noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-30371752653510522532018-11-29T07:20:16.150-05:002018-11-29T07:20:16.150-05:00I don't have any numbers to share, but I'v...I don't have any numbers to share, but I've used it plenty on problems of 10 dimensions and it works fine.
There isn't anything special about 5 vs 10 dimensions.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-87995254841296134992018-11-29T04:37:45.296-05:002018-11-29T04:37:45.296-05:00Hi Davis,
For a global optimization algorithm, th...Hi Davis,<br /><br />For a global optimization algorithm, this seems extremely easy to use, and seems to converge fast (from your tests up to 5 dimensions). I have a CFD optimization problem with 10 geometric design variables, and am considering applying this method for finding the global maximum in efficiency. However, since the CFD runs are quite computationally expensive, I am trying to be sure it would work well in higher dimensions. Do you have some test results in the range of 6-10 dimensions that you can share with us?<br /><br />Thanks :)praxhttps://www.blogger.com/profile/05305551077908245740noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-68776619694250445492018-08-11T07:30:53.675-04:002018-08-11T07:30:53.675-04:00Sorry for the late reply, for some reason I haven&...Sorry for the late reply, for some reason I haven't been getting notifications of new comments on the blog. <br /><br />Yes, you can do all those more complex use cases by using the global_function_search class directly rather than using find_min_global(). See the documentation for global_function_search.<br /><br />No, there is no built-in tool to visualize the surrogate surfaces; if you want to do that it's on you. I had hacked together something to make the video from this blog post, but the code is not readily reusable, so I'm not sharing it. It would be easier for someone to write it from scratch than to figure out how to use that bit of hacky code.<br /><br /><br />The extensions I made to the algorithm make it work with things that are not Lipschitz. There is a whole discussion in the blog post about functions that are discontinuous. Discontinuous functions are not Lipschitz.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-36521416562064596022018-07-13T23:13:30.728-04:002018-07-13T23:13:30.728-04:00Hi Davis! Thank you very much for sharing.
I ...Hi Davis! Thank you very much for sharing.<br /><br />I am trying to use your algorithm to tune a Machine Learning program that takes a day to get one data point (each with a specific set of hyper-parameters). I already have a few data points and would like to know if there is an option in your function to incorporate these data points to predefine the upper bound U(x) (rather than starting from a random point)?<br /><br />Also, is there an option to print out/save the value of the parameters x_i and the result y obtained in each iteration, so that I can decide whether I should iterate more?<br /><br />My calculation just takes so long to run and those features would help a lot!<br /><br />Thank you.<br />Unknownhttps://www.blogger.com/profile/03452827871556753687noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-13195488982386342612018-07-03T04:44:34.181-04:002018-07-03T04:44:34.181-04:00Is there an easy way to visualise the optimisation...Is there an easy way to visualise the optimisation/surrogate surface? Something similar to what one gets out of the skopt plot_objective and plot_evaluations functions?Fahiz Baba Yarahttps://www.blogger.com/profile/02509475568147194357noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-5784628056551123862018-05-30T11:51:46.905-04:002018-05-30T11:51:46.905-04:00Hi Davis!
Thank you for your amazing algorithm!
...Hi Davis! <br />Thank you for your amazing algorithm!<br /><br />I have a question about your suggestion to use it for hyperparameter tuning:<br /><br />"Shouldn't we first know that the neural network we are using is <br /> a Lipschitz function of its hyperparameters?"<br /><br />There is already work proving that neural nets are Lipschitz functions,<br />but not as functions of their hyperparameters. At least I cannot find any work<br />that proves so...<br /><br />Thanks!Ioannis Athanasiadishttps://www.blogger.com/profile/01749941832682059764noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-86131523205001996492018-05-28T22:25:18.047-04:002018-05-28T22:25:18.047-04:00How can I pass a dynamic array to bound1 and bound2?...How can I pass a dynamic array to bound1 and bound2? All my approaches are failing...Unknownhttps://www.blogger.com/profile/12397084472218321850noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-80869323547289921572018-05-17T09:30:17.981-04:002018-05-17T09:30:17.981-04:00Yes, it's all derivative free.
This kind of a...Yes, it's all derivative free.<br /><br />This kind of algorithm only works for problems with a relatively small number of parameters. There is no way it could possibly optimize a problem with tens of thousands of parameters (or more; some DNNs have millions to billions of parameters). This is true for any derivative-free algorithm. It's just not going to work.<br /><br />But if you have something like 5 parameters you want to optimize, then it's great.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-83472258503272970612018-05-17T04:35:53.921-04:002018-05-17T04:35:53.921-04:00Hi Davis
Apologies in advance if some of these are...Hi Davis<br /><br />Apologies in advance if some of these are very basic.<br /><br />RE:<br />"You could combine the trust region strategy with pretty much any other derivative free solver."<br />Just to be 100% clear: "the trust region strategy" is *itself* derivative-free (looking at the BOBYQA definition) - is my understanding right?<br /><br /><br />RE:<br />Q: "Is it possible to use this brilliant optimizer in the training of neural networks in machine learning? Is this optimizer something that could replace traditional optimizers like Adam, AdaGrad...?"<br />A: "No, it's not going to be reasonable to use this in place of adam or sgd."<br /><br />Is this true only of neural network hyperparam optimisations, or more in general of machine learning hyperparam optimisation problems? <br />Would you mind expanding on this?<br /><br />Reiterating my gratitude for your excellent work.Mario Riverahttps://www.blogger.com/profile/10835086690809997056noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-86823581733882319642018-05-16T11:07:35.170-04:002018-05-16T11:07:35.170-04:00No, it's not going to be reasonable to use thi...No, it's not going to be reasonable to use this in place of adam or sgd.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-73183556933442176722018-05-16T09:58:15.198-04:002018-05-16T09:58:15.198-04:00Is it possible to use this brilliant optimizer in ...Is it possible to use this brilliant optimizer in the training of neural networks in machine learning? 
Is this optimizer something that could replace traditional optimizers like Adam, AdaGrad...?Junwei Donghttps://www.blogger.com/profile/05066597905219881172noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-60839874572030610332018-05-15T07:08:02.946-04:002018-05-15T07:08:02.946-04:00You could combine the trust region strategy with p...You could combine the trust region strategy with pretty much any other derivative free solver. I used LIPO here because I find LIPO the most compelling. I have not compared it to HORD. Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-75865331939487111642018-05-15T03:50:15.331-04:002018-05-15T03:50:15.331-04:00Hi
Excellent work, thank you for sharing.
Two que...Hi<br /><br />Excellent work, thank you for sharing.<br />Two questions:<br />1) Can this be combined with hyperband https://archive.is/CDVR8 ?<br />2) How does this compare to HORD / pysot (https://archive.is/YgRvr)? Do you have any comment on that approach?<br /><br />Greetings from London<br />Mario Riverahttps://www.blogger.com/profile/10835086690809997056noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-80577189620255065472018-05-02T11:16:34.637-04:002018-05-02T11:16:34.637-04:00Thanks, glad you like it.
Yes, you can do this. ...Thanks, glad you like it.<br /><br />Yes, you can do this. See the extended discussion here: http://dlib.net/dlib/global_optimization/global_function_search_abstract.h.html#global_function_search. That's the C++ interface. There is a Python interface with essentially an identical API as well.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-23951438898824525942018-05-02T11:09:17.878-04:002018-05-02T11:09:17.878-04:00Hi Davis! Thank you very much for this impressive ...Hi Davis! Thank you very much for this impressive algorithm. I was recently struggling with Bayesian Optimization and its parameters, so I definitely got what you were talking about in the first paragraphs of this article.<br /><br />However, I'm wondering if it is possible to save the "current state" of the optimization process. BayesOpt (the C++ lib) has a very nice feature that allows you to save the current sampled points as well as the parameters of the optimization, and resume it later; I think that something similar could be possible in this case by saving the current bounds. <br /><br />I haven't explored DLib fully (I just discovered it some weeks ago), so I don't know if this is currently implemented, but it would be very useful if it is not.<br /><br />Thanks!Matías Mattamalahttps://www.blogger.com/profile/10353214905445083399noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-41523698931628125182018-05-01T22:17:06.596-04:002018-05-01T22:17:06.596-04:00I just evaluate 5000 random points and take the be...
And random search of U(x) is plenty good enough.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-44935214199704079172018-05-01T21:28:25.590-04:002018-05-01T21:28:25.590-04:00Great blog Davis and very intriguing! The solver p...Great blog Davis and very intriguing! The solver part of delegating it back to solve a similar SVM problem is so elegant!<br /><br />One question though: assuming we estimate a good U(x) with a minimized k in each step, how should we decide the next evaluation point? It looks to me that in the one-dimensional case, all candidate points are within a fixed set: {x | x = (x_i + x_j) / 2}, so it's only O(n^2) complexity to query U(x). How about higher-dimensional cases? Can we still only evaluate those boundary points? Or maybe we need to do random sampling?Yitong Zhouhttps://www.blogger.com/profile/15016020139807848788noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-3120892581465533772018-05-01T12:48:55.513-04:002018-05-01T12:48:55.513-04:00I would love to; however, in Spark mode, each ...I would love to; however, in Spark mode, each model may run on one or multiple executors, and I'm not sure how to wrap it with the function evaluationnew home Fox Chapel 15238https://www.blogger.com/profile/05534380060960060920noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-39024422281170807792018-05-01T09:42:51.617-04:002018-05-01T09:42:51.617-04:00Yes, you need that.
Why don't you just call d...Yes, you need that.<br /><br />Why don't you just call dlib? It's not hard to call C++ from Spark.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-54806470571976954452018-05-01T09:40:17.730-04:002018-05-01T09:40:17.730-04:00Hi Davis,
Thanks for the great work
I am doing...Hi Davis, <br /> Thanks for the great work<br />I am doing a port to Scala so that it can be used in a Spark environment; would you mind helping me understand some of the issues?<br />https://github.com/davisking/dlib/blob/master/dlib/global_optimization/upper_bound_function.h#L217 trainer.force_last_weight_to_1(true);<br />Any reason to set this? I am using liblinear-java, which does not have force_last_weight_to_1:<br />https://github.com/bwaldvogel/liblinear-java/<br /><br />Thanks!new home Fox Chapel 15238https://www.blogger.com/profile/05534380060960060920noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-49549508283812681242018-04-26T07:30:07.605-04:002018-04-26T07:30:07.605-04:00That's exactly what this optimizer is for. So...That's exactly what this optimizer is for. So it will work fine.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-32555630925300320002018-04-25T23:43:49.325-04:002018-04-25T23:43:49.325-04:00Super interesting. I was going to try to use this ...Super interesting. I was going to try to use this to optimize some neural network hyperparameters (LR, momentum, etc.) -- but I was wondering how well you'd expect this to work when there's variation in the function evaluation. Any thoughts?<br /><br />Thanksbkjhttps://www.blogger.com/profile/16751420649036201956noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-1012797626632147302018-04-18T20:53:08.353-04:002018-04-18T20:53:08.353-04:00If you square both sides of the constraints in the...If you square both sides of the constraints in the QP shown in the post, you get a QP in canonical form and can use any QP solver you want to solve it. That gives you U(x).Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.com
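Davis's remark earlier in the thread about choosing the next evaluation point (evaluate 5000 random points of U(x) and take the best) is easy to sketch. The following is an illustrative Python sketch, not dlib's actual implementation; `u` here is a hypothetical stand-in for a fitted upper bound, and any cheap-to-evaluate surrogate would work:

```python
import random

def random_search_max(f, lower, upper, n=5000, seed=0):
    """Approximately maximize f over the box [lower, upper] by
    sampling n uniform random points and keeping the best one."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(n):
        # Draw one uniform random point inside the bounding box.
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Illustrative surrogate: a smooth function peaking at (0.25, 0.75).
u = lambda x: -((x[0] - 0.25) ** 2 + (x[1] - 0.75) ** 2)
x_best, y_best = random_search_max(u, [0.0, 0.0], [1.0, 1.0])
```

With 5000 samples the returned point lands very close to the surrogate's maximizer, which is why exact maximization of U(x) isn't needed in practice.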