Comments on dlib C++ Library: "A Global Optimization Algorithm Worth Using" (by Davis King)
Unknown (2018-07-13):
Hi Davis! Thank you very much for sharing.

I am trying to use your algorithm to tune a machine learning program that takes a day to produce one data point (each with a specific set of hyper-parameters). I already have a few data points and would like to know if there is an option in your function to incorporate them to predefine the upper bound U(x), rather than starting from a random point.

Also, is there an option to print out or save the value of the parameters x_i and the result y obtained in each iteration, so that I can decide whether I should iterate more? My calculation takes so long to run that those features would help a lot.

Thank you.

Fahiz Baba Yara (2018-07-03):
Is there an easy way to visualise the optimisation/surrogate surface? Something similar to what one gets out of skopt's plot_objective and plot_evaluations functions?
Ioannis Athanasiadis (2018-05-30):
Hi Davis!
Thank you for your amazing algorithm!

I have a question about your suggestion to use it for hyperparameter tuning: "Shouldn't we first know that the neural network we are using is a Lipschitz function with its hyperparameters as inputs?"

There is already work proving that neural nets are Lipschitz functions, but with their inputs as arguments, not their hyperparameters. At least I cannot find any work that proves so.

Thanks!

Unknown (2018-05-28):
How can I pass a dynamic array to bound1 and bound2? All my approaches are failing.

Davis King (2018-05-17):
Yes, it's all derivative free.
This kind of a...Yes, it's all derivative free.<br /><br />This kind of algorithm only works for problems with a relatively small number of parameters. There is no way it could possibly optimize a problem with 10s of thousands of parameters (or more, some DNNs have millions to billions of parameters). This is true for any derivative free algorithm. It's just not going to work.<br /><br />But if you have something like 5 parameters you want to optimize then it's great.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-83472258503272970612018-05-17T04:35:53.921-04:002018-05-17T04:35:53.921-04:00Hi Davis
Apologies in advance if some of these is...Hi Davis<br /><br />Apologies in advance if some of these is very basic.<br /><br />RE:<br />"You could combine the trust region strategy with pretty much any other derivative free solver."<br />Just to be 100% clear: "the trust region strategy" is *itself* derivative-free (looking at BOBYQA definition) - is my understanding right?<br /><br /><br />RE:<br />Q: "Is it possible to use this brilliant optimizer in the training of neural network of machine learning? Is this optimzer is something that could replace traditional optimizer like Adam,AdaGrad...?"<br />A: "No, it's not going to be reasonable to use this in place of adam or sgd."<br /><br />Is this true only of neural network hyperparam optimisations, or more in genearl of machine learning hyperparam optimisation problems? <br />Would you mind expanding on this?<br /><br />Reiterating my gratitude for your excellent work.Mario Riverahttps://www.blogger.com/profile/10835086690809997056noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-86823581733882319642018-05-16T11:07:35.170-04:002018-05-16T11:07:35.170-04:00No, it's not going to be reasonable to use thi...No, it's not going to be reasonable to use this in place of adam or sgd.Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-73183556933442176722018-05-16T09:58:15.198-04:002018-05-16T09:58:15.198-04:00Is it possible to use this brilliant optimizer in ...Is it possible to use this brilliant optimizer in the training of neural network of machine learning? 
Is this optimzer is something that could replace traditional optimizer like Adam,AdaGrad...?Junwei Donghttps://www.blogger.com/profile/05066597905219881172noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-60839874572030610332018-05-15T07:08:02.946-04:002018-05-15T07:08:02.946-04:00You could combine the trust region strategy with p...You could combine the trust region strategy with pretty much any other derivative free solver. I used LIPO here because I find LIPO the most compelling. I have not compared it to HORD. Davis Kinghttps://www.blogger.com/profile/16577392965630448489noreply@blogger.comtag:blogger.com,1999:blog-6061887630060661987.post-75865331939487111642018-05-15T03:50:15.331-04:002018-05-15T03:50:15.331-04:00Hi
Excellent work, thank you for sharing.
Mario Rivera (2018-05-15):
Hi

Excellent work, thank you for sharing. Two questions:
1) Can this be combined with hyperband (https://archive.is/CDVR8)?
2) How does this compare to HORD / pySOT (https://archive.is/YgRvr)? Do you have any comment on that approach?

Greetings from London

Davis King (2018-05-02):
Thanks, glad you like it.

Yes, you can do this. See the extended discussion here: http://dlib.net/dlib/global_optimization/global_function_search_abstract.h.html#global_function_search. That's the C++ interface. There is a Python interface with an essentially identical API as well.

Matías Mattamala (2018-05-02):
Hi Davis! Thank you very much for this impressive algorithm. I was recently struggling with Bayesian Optimization and its parameters, so I definitely got what you were talking about in the first paragraphs of this article.

However, I'm wondering if it is possible to save the "current state" of the optimization process. BayesOpt (the C++ lib) has a very nice feature that saves the currently sampled points as well as the parameters of the optimization, making it possible to resume later; I think something similar could be done here by saving the current bounds.

I haven't explored dlib fully (I just discovered it a few weeks ago), so I don't know if this is currently implemented, but it would be very useful if it is not.

Thanks!

Davis King (2018-05-01):
I just evaluate 5000 random points and take the best. I thought about how you might try to optimize U(x) exactly, but I'm not sure there is an efficient way.
And random search of U(x) is plenty good enough.

Yitong Zhou (2018-05-01):
Great blog Davis, and very intriguing! The part where the solver delegates back to solving a similar SVM problem is so elegant!

One question though: after we estimate a good U(x) with minimized k in each step, how should we decide the next evaluation point? It looks to me that in the one-dimensional case all candidate points are within a fixed set, {x | x = (x_i + x_j) / 2}, so it is only O(n^2) complexity to query U(x). What about higher-dimensional cases? Can we still evaluate only those boundary points, or do we need to do random sampling?

new home Fox Chapel 15238 (2018-05-01):
I would love to; however, in Spark mode each model may run on one or multiple executors, and I am not sure how to wrap it with the function evaluation.
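Davis's random-search step (draw many random candidates and keep the one with the largest upper bound) can be sketched in plain Python. This is only an illustration of the idea, not dlib's implementation: dlib's real solver also handles noise via a QP, and `lipschitz_k`, `upper_bound`, and `next_point` are names invented here for the sketch.

```python
import random

def lipschitz_k(xs, ys):
    # Smallest k consistent with the Lipschitz condition on the observed data:
    # |f(a) - f(b)| <= k * |a - b| for every observed pair of points.
    return max(
        abs(ys[i] - ys[j]) / abs(xs[i] - xs[j])
        for i in range(len(xs)) for j in range(i)
    )

def upper_bound(x, xs, ys, k):
    # LIPO-style upper bound U(x): the tightest cone over all observed points.
    return min(y + k * abs(x - xi) for xi, y in zip(xs, ys))

def next_point(xs, ys, lo, hi, samples=5000, rng=random):
    # Random search over U(x): draw candidates, keep the one maximizing U.
    k = lipschitz_k(xs, ys)
    return max(
        (rng.uniform(lo, hi) for _ in range(samples)),
        key=lambda x: upper_bound(x, xs, ys, k),
    )
```

In higher dimensions the same random search applies, with a vector norm in place of `abs`; the objective would then be evaluated at the returned point and the new (x, y) pair appended before the next round.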
Davis King (2018-05-01):
Yes, you need that.

Why don't you just call dlib? It's not hard to call C++ from Spark.
new home Fox Chapel 15238 (2018-05-01):
Hi Davis,
Thanks for the great work. I am doing a port to Scala so that it can be used in a Spark environment. Would you mind helping me understand one of the issues? In https://github.com/davisking/dlib/blob/master/dlib/global_optimization/upper_bound_function.h#L217 you call trainer.force_last_weight_to_1(true); is there any reason to set this? I am using liblinear-java (https://github.com/bwaldvogel/liblinear-java/), which does not have force_last_weight_to_1.

Thanks!

Davis King (2018-04-26):
That's exactly what this optimizer is for. So it will work fine.

bkj (2018-04-25):
Super interesting. I was going to try to use this to optimize some neural network hyperparameters (learning rate, momentum, etc.), but I was wondering how well you'd expect this to work when there is variation in the function evaluation. Any thoughts?

Thanks

Davis King (2018-04-18):
If you square both sides of the constraints in the QP shown in the post, you get a QP in canonical form and can use any QP solver you want to solve it. That gives you U(x).
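Schematically, the squaring step works because both sides of such a constraint are nonnegative. Writing a_ij for the data-dependent left-hand side and z for the QP's decision variables (this is the generic pattern, not a reproduction of the post's exact QP):

```latex
\lvert a_{ij} \rvert \le \sqrt{c_{ij}^{\top} z}
\iff
a_{ij}^{2} \le c_{ij}^{\top} z ,
\qquad c_{ij}^{\top} z \ge 0 ,
```

and the squared form is linear in z, which is exactly the constraint shape a standard QP solver expects.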
Chavdar Papazov (2018-04-18):
Hi Davis,

You mentioned that in your MaxLIPO version of LIPO you select the maximum upper-bounding point, i.e., you maximize U(x). How exactly do you do this? Thank you.

Davis King (2018-04-07):
Yes, you can use it with parallel function evaluations. I specifically made the API in a way that supports that. Read the documentation here: http://dlib.net/optimization.html#global_function_search, and also click on the "more details" button and read the documentation there; there is a lot of discussion of this topic. The entire API is designed around the goal of supporting parallel function evaluation.
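The parallel-evaluation pattern described above can be driven with an ordinary worker pool. The sketch below uses Python's `concurrent.futures` with a toy random proposer standing in for the real solver; none of the names here are dlib's API, and `expensive_objective` is a placeholder for a slow evaluation such as training a model.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def expensive_objective(x):
    # Stand-in for a slow function evaluation (e.g. training one model).
    return -(x - 0.3) ** 2

def propose(rng, lo, hi):
    # Stand-in proposer; a real optimizer would pick points via its surrogate.
    return rng.uniform(lo, hi)

def parallel_search(lo, hi, rounds=10, workers=4, seed=0):
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            # Ask for several points at once and evaluate them concurrently...
            xs = [propose(rng, lo, hi) for _ in range(workers)]
            ys = list(pool.map(expensive_objective, xs))
            # ...then report all results back before proposing the next batch.
            for x, y in zip(xs, ys):
                if y > best_y:
                    best_x, best_y = x, y
    return best_x, best_y
```

The key point is the ask-several / evaluate-in-parallel / tell-all loop; a process pool or a cluster scheduler can replace the thread pool without changing the shape of the loop.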
Shubham Ashok Gandhi (2018-04-05):
Hello Davis,

Great work with the library. This could be a major relief for anyone who has to deal with parameter tuning.

I have a question for you. From reading your post, the method seems to be a sequential, direction-based method. Is that correct? We are looking to automate some of the model-building process, and hyper-parameter tuning is part of that, so I wanted to understand whether it is possible to distribute the computations of the algorithm across nodes. So, given the nature of the algorithm, is it possible to have a distributed version of this?

Hao Zhang (2018-04-05):
I just realized that my confusion originated from their notation. In that formula they have both k_i and X_i, but these are actually not the same "i"s.

Davis King (2018-04-05):
No, that's not what it says. It says they pick the smallest k that satisfies the Lipschitz condition everywhere.
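On the k_i versus X_i point above: in the LIPO paper the estimated constant is the smallest member of a fixed grid of candidate constants that satisfies the Lipschitz condition on every observed pair, so the grid index on k_i is unrelated to the sample index on X_i. In the noiseless case this is (my restatement of the definition, not a quote from the post):

```latex
\hat{k} \;=\; \min \Bigl\{ \, k_i \;:\;
  \lvert f(X_a) - f(X_b) \rvert \;\le\; k_i \, \lVert X_a - X_b \rVert
  \quad \forall\, a \neq b \, \Bigr\}
```

where a and b range over the evaluated sample points and i ranges over the candidate grid.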