From c72f14f50143ed37548a1cbe47700a9ded235481 Mon Sep 17 00:00:00 2001
From: William G Underwood <42812654+WGUNDERWOOD@users.noreply.github.com>
Date: Sat, 30 Mar 2024 13:23:09 -0400
Subject: [PATCH] proof

---
 abstract.tex        |  4 ++--
 acknowledgments.tex | 16 ++++++++--------
 introduction.tex    | 34 +++++++++++++++++-----------------
 3 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/abstract.tex b/abstract.tex
index d037ee6..574959e 100644
--- a/abstract.tex
+++ b/abstract.tex
@@ -18,14 +18,14 @@
 forests, including a central limit theorem for the estimated regression
 function and a characterization of the bias. We show how to conduct feasible
 and valid nonparametric inference by constructing confidence intervals, and
-further provide a debiasing procedure which enables minimax-optimal estimation
+further provide a debiasing procedure that enables minimax-optimal estimation
 rates for smooth function classes in arbitrary dimension.
 
 Next, we turn our attention to nonparametric kernel density estimation with
 dependent dyadic network data. We present results for minimax-optimal
 estimation, including a novel lower bound for the dyadic uniform convergence
 rate, and develop methodology for uniform inference via confidence bands and
-counterfactual analysis. Our methods are based on strong approximation and are
+counterfactual analysis. Our methods are based on strong approximations and are
 designed to be adaptive to potential dyadic degeneracy. We give empirical
 results with simulated and real-world economic trade data.
 
diff --git a/acknowledgments.tex b/acknowledgments.tex
index 405bb6d..90658d7 100644
--- a/acknowledgments.tex
+++ b/acknowledgments.tex
@@ -2,16 +2,16 @@
 I am extremely fortunate to have been surrounded by many truly wonderful
 people over the course of my career, and without their support this
 dissertation would
-not have been possible. While it is infeasible for me to identify every one of
-them individually, I would like to mention a few names in particular, to
+not have been possible. While it is impossible for me to identify every one of
+them individually, I would like to mention a few names in particular to
 recognize those who have been especially important to me during the last few
 years.
 
 Firstly, I would like to express my utmost gratitude to my Ph.D.\ adviser,
 Matias Cattaneo. Working with Matias has been genuinely inspirational for me,
 and I could not have asked for a more rewarding start to my journey as a
-researcher. From the very beginning he has guided me expertly through my
-studies, providing hands-on assistance when required, while also allowing me the
+researcher. From the very beginning, he has guided me expertly through my
+studies, providing hands-on assistance when required while also allowing me the
 independence necessary to develop as an academic. I hope that, during the four
 years we have worked together, I have acquired just a fraction of his formidable
 mathematical intuition, keen attention to detail, boundless creativity, and
@@ -28,14 +28,14 @@
 Amir Ali Ahmadi, Ramon van Handel, Mikl{\'o}s R{\'a}cz, and Mykhaylo
 Shkolnikov, my colleagues Sanjeev Kulkarni and Roc{\'i}o Titiunik, and my
 former supervisor Mihai Cucuringu.
-I am thankful also for the staff members at Princeton who have been
-perpetually helpful, and would like to identify Kim
+I am also thankful for the staff members at Princeton who have been
+perpetually helpful, and I would like to identify Kim
 Lupinacci in particular; her assistance in all things administrative has been
 invaluable.
 
 I am grateful to my fellow graduate students in the ORFE department for their
 technical expertise and generosity with their time, and for making Sherrerd
-Hall such a vibrant and exciting space; especially
+Hall such a vibrant and exciting space, especially
 Jose Avilez,
 Pier Beneventano,
 Ben Budway,
@@ -90,7 +90,7 @@
 and Anita Zhang.
 Thank you to the Princeton Chapel Choir for being such a wonderful community
 of musicians and a source of close friends,
-and to our directors Nicole Aldrich and Penna Rose, and organist Eric Plutz.
+and to our directors, Nicole Aldrich and Penna Rose, and organist Eric Plutz.
 
 Lastly, yet most importantly, I want to thank my family for their unwavering
 support throughout my studies. My visits back home have been a source of joy
diff --git a/introduction.tex b/introduction.tex
index 3d00ff1..53cc6f0 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -15,25 +15,25 @@ \chapter{Introduction}
 random forests, neural networks, and many more.
 
 % nonparametric estimation is good
-The benefits of the nonparametric framework are clear; statistical procedures
+The benefits of the nonparametric framework are clear: statistical procedures
 can be formulated in cases where the stringent assumptions of parametric models
 are untestable, demonstrably violated, or simply unreasonable.
-Further, the resulting
-methods often as a consequence inherit desirable robustness properties against
-various forms of misspecification or misuse. The class of problems which can be
+As a consequence,
+the resulting methods often inherit desirable robustness properties against
+various forms of misspecification or misuse. The class of problems that can be
 formulated is correspondingly larger: arbitrary distributions and relationships
 can be characterized and estimated in a principled manner.
 
 % nonparametric estimation is hard
 Nonetheless, these attractive properties do come at a price. In particular,
 as its name suggests, the nonparametric approach forgoes the ability to reduce
-a complex statistical problem to that of estimating a fixed finite number of
-parameters. Rather, nonparametric procedures typically involve making inference
+a complex statistical problem to that of estimating a fixed, finite number of
+parameters. Rather, nonparametric procedures typically involve making inferences
 about a growing number of parameters simultaneously, as witnessed in
 high-dimensional regimes, or even directly handling infinite-dimensional
 objects such as entire regression or density functions. As a consequence,
 nonparametric estimators are usually less efficient than their corresponding
-correctly specified parametric counterparts, when these are available; rates of
+correctly specified parametric counterparts, when they are available; rates of
 convergence tend to be slower, and confidence sets more conservative. Another
 challenge is that theoretical mathematical analyses of nonparametric estimators
 are often significantly more demanding than those required for low-dimensional
@@ -54,7 +54,7 @@ \chapter{Introduction}
 a central concept in classical statistics, and despite the rapid recent
 development of theory for modern nonparametric estimators, their
 applicability to statistical inference is in certain cases rather less well
-studied: theoretically sound and practically implementable inference procedures
+studied; theoretically sound and practically implementable inference procedures
 are sometimes absent in the literature.
 
 % complex data
@@ -64,11 +64,11 @@ \chapter{Introduction}
 framework of independent and identically distributed samples, and instead
 might consist of time series, stochastic processes, networks, or
 high-dimensional or functional data, to name just a few.
-Therefore it is important to understand how nonparametric methods might be
+Therefore, it is important to understand how nonparametric methods might be
 adapted to correctly handle these data types, maintaining fast estimation
 rates and valid techniques for statistical inference. The technical challenges
 associated with such an endeavor are non-trivial; many standard techniques are
-ineffective in the presence of dependent or infinite-dimensional data for
+ineffective in the presence of dependent or infinite-dimensional data, for
 example. As such, the development of new mathematical results in probability
 theory plays an important role in the comprehensive treatment of nonparametric
 statistics with complex data.
@@ -85,7 +85,7 @@ \section*{Overview of the dissertation}
 % what are random forests
 Random forests are popular ensembling-based methods for classification and
 regression, which are well known for their good performance, flexibility,
-robustness and efficiency. The majority of random forest models share the
+robustness, and efficiency. The majority of random forest models share the
 following common framework for producing estimates of a classification or
 regression function using covariates and a response variable. Firstly, the
 covariate space is partitioned in some algorithmic manner, possibly using a
@@ -103,9 +103,9 @@ \section*{Overview of the dissertation}
 % mondrian random forests
 One interesting such example is that of the Mondrian random forest, in which
 the underlying partitions (or trees) are constructed independently of the data.
-Naturally this restriction rules out many classical random forest models which
+Naturally, this restriction rules out many classical random forest models, which
 exhibit a complex and data-dependent partitioning scheme. Instead, trees are
-sampled from a canonical stochastic process, known as the Mondrian process,
+sampled from a canonical stochastic process known as the Mondrian process,
 which endows the resulting tree and forest estimators with various agreeable
 features.
 
@@ -117,7 +117,7 @@ \section*{Overview of the dissertation}
 consistent variance estimator, allows one to perform asymptotically valid
 statistical inference, such as constructing confidence intervals, on the
 unknown regression function. We also provide a debiasing procedure for Mondrian
-random forests which allows them to achieve minimax-optimal estimation rates
+random forests, which allows them to achieve minimax-optimal estimation rates
 with H{\"o}lder smooth regression functions, for any smoothness parameter and
 in arbitrary dimension.
 
@@ -136,7 +136,7 @@ \section*{Overview of the dissertation}
 
 % broad scope
 We focus on nonparametric estimation and inference with dyadic
-data, and in particular we seek methods which are robust in the sense that our
+data, and in particular we seek methods that are robust in the sense that our
 results should hold uniformly across the support of the data. Such uniformity
 guarantees allow for statistical inference in a broader range of settings,
 including specification testing and distributional counterfactual analysis. We
@@ -155,8 +155,8 @@ \section*{Overview of the dissertation}
 causal inference and program evaluation.
 % why it is difficult
 A crucial feature of dyadic distributions is that they may be ``degenerate'' at
-certain points in the support of the data, a property making our analysis
-somewhat delicate. Nonetheless our methods for uniform inference remain robust
+certain points in the support of the data, a property that makes our analysis
+somewhat delicate. Nonetheless, our methods for uniform inference remain robust
 to the potential presence of such points.
 % applications
 For implementation purposes, we discuss inference procedures based on positive