Estimation of Gamma in K-Prototypes #185

crixus5678 · 2022-09-02T06:31:09Z

For the estimation of gamma in k-prototypes, the current implementation appears to compute gamma as 0.5 * (standard deviation of all numeric data).

However, the paper [Huang 1997] says that gamma is guided by the "average standard deviation of numeric attributes". If that is the case, shouldn't we compute the standard deviation of each numeric attribute separately and then take the mean of those values?

nicodv · 2022-09-06T05:51:55Z

For reference, from the paper:

"Generally speaking, γ_l is related to σ_l, the average standard deviation of numeric attributes in cluster l. In practice, σ_l can be used as a guidance to determine γ_l. However, since σ_l is unknown before clustering, the overall average standard deviation σ of numeric attributes can be used for all σ_l."

So yes, it appears you are correct in your statement.
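
To make the difference concrete, here is a minimal NumPy sketch of the two estimates discussed above. The function names `gamma_pooled` and `gamma_mean_of_stds` are hypothetical and are not taken from the library. The 0.5 factor is carried over from the description of the current implementation, and keeping it in the second variant is an assumption.

```python
import numpy as np


def gamma_pooled(x_num):
    """Hypothetical helper: 0.5 * std of all numeric values pooled together.

    This mirrors the behaviour described in the issue, where every numeric
    value is treated as one flat sample regardless of its attribute.
    """
    return 0.5 * np.std(x_num)


def gamma_mean_of_stds(x_num):
    """Hypothetical helper: 0.5 * mean of the per-attribute stds.

    This follows the reading of Huang (1997) proposed in the issue: compute
    the standard deviation of each numeric attribute (column), then average.
    """
    return 0.5 * np.mean(np.std(x_num, axis=0))


# Toy data with two numeric attributes on very different scales.
x_num = np.array([
    [0.0, 100.0],
    [1.0, 200.0],
    [2.0, 300.0],
])

print("pooled:      ", gamma_pooled(x_num))
print("mean of stds:", gamma_mean_of_stds(x_num))
```

When the attributes have different means or scales, as in this toy data, the pooled standard deviation also picks up the spread between attributes, so the two estimates can differ substantially.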

crixus5678 added the bug label Sep 2, 2022

nicodv mentioned this issue Sep 6, 2022

Improve estimation of gamma for k-prototypes #186

Merged

nicodv closed this as completed in #186 Sep 6, 2022
