Point Cloud Encoding with Transformer in Michelangelo and CAD-Recode

About

There is an interesting paper, "CAD-Recode: Reverse Engineering CAD Code from Point Clouds" (https://arxiv.org/abs/2412.14042). I came across it two days after it was released on 17 Dec 2024.

This research is in the category of CAD data generation, similar to DeepCAD.
Unlike DeepCAD, it generates data using CadQuery and leverages synthetic data, which gives it a much richer three-dimensional representation.
There are several interesting aspects, and I’d like to introduce one of them.

(It seems that the strategy comes from Michelangelo: https://arxiv.org/abs/2306.17115)

3D Point Cloud Encoding in PointNet

PointNet

PointNet’s encoding includes a process (T-Net) that estimates an appropriate transformation from the point cloud to enhance its spatial consistency. Additionally, it achieves order-invariant processing of point clouds by applying global pooling to the features. However, PointNet does not infer viewpoints; instead, it focuses on spatial transformations and order invariance.
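The order invariance mentioned above can be illustrated without any deep learning library. The following is a minimal sketch, not PointNet itself: `point_feature` is a hypothetical stand-in for the shared per-point MLP, and the global pooling is an element-wise max over all points.

```python
import random

def point_feature(p):
    # Hypothetical stand-in for PointNet's shared per-point MLP.
    x, y, z = p
    return [x + y, y * z, x - z, x * x + y * y + z * z]

def global_max_pool(points):
    # Element-wise max over the per-point features.
    # The max of a set does not depend on the order of its elements.
    feats = [point_feature(p) for p in points]
    return [max(col) for col in zip(*feats)]

rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
shuffled = cloud[:]
rng.shuffle(shuffled)

# The pooled feature is identical for both orderings of the same cloud.
print(global_max_pool(cloud) == global_max_pool(shuffled))
```

Because every point goes through the same function and only the maximum survives, shuffling the input cannot change the pooled vector.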

Point Cloud Encoding in CAD-Recode

CAD-Recode takes a significantly different approach from PointNet’s encoding method.

Downsample the input point cloud with uniform random sampling
Sort the points lexicographically by their (z, y, x) coordinates
Apply Fourier positional encoding to the coordinates and concatenate the result with the point normals
Project the result with a linear layer

Because the points are sorted lexicographically by their coordinates, the resulting sequence has a consistent, deterministic order regardless of how the points were ordered in the input.
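The first two steps can be sketched in plain Python. This is only an illustration under my own assumptions: the function name `downsample_and_sort` is hypothetical, sampling is done without replacement, and I assume that z is the primary sort key, followed by y and then x.

```python
import random

def downsample_and_sort(points, num_points, seed=0):
    # Uniform random sampling without replacement
    # (assumes len(points) >= num_points).
    rng = random.Random(seed)
    sampled = rng.sample(points, num_points)
    # Lexicographic sort: z is the primary key, then y, then x (assumed).
    return sorted(sampled, key=lambda p: (p[2], p[1], p[0]))

cloud = [
    (1.0, 0.0, 0.5),
    (0.0, 1.0, 0.5),
    (0.0, 0.0, 0.5),
    (0.3, 0.3, 0.1),
    (0.9, 0.9, 0.9),
]
ordered = downsample_and_sort(cloud, num_points=5)
for p in ordered:
    print(p)
```

Here all five points are kept, so the output depends only on the sort: the point with the smallest z comes first, and ties in z are broken by y and then by x.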

It also uses positional encoding. I am not sure whether Fourier positional encoding was adopted specifically because a Transformer is used, but it probably was. In three-dimensional space, the positional encoding takes the following form.

\text{PE}(\mathbf{p}) = \begin{bmatrix} \sin(2\pi f_1 p_x), \cos(2\pi f_1 p_x), \sin(2\pi f_1 p_y), \cos(2\pi f_1 p_y), \sin(2\pi f_1 p_z), \cos(2\pi f_1 p_z), \\ \sin(2\pi f_2 p_x), \cos(2\pi f_2 p_x), \sin(2\pi f_2 p_y), \cos(2\pi f_2 p_y), \sin(2\pi f_2 p_z), \cos(2\pi f_2 p_z), \\ \vdots \\ \sin(2\pi f_N p_x), \cos(2\pi f_N p_x), \sin(2\pi f_N p_y), \cos(2\pi f_N p_y), \sin(2\pi f_N p_z), \cos(2\pi f_N p_z) \end{bmatrix}

Here f_k is the k-th frequency, and N is the number of frequencies (the order of the encoding).
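To make the formula concrete, here is a small sketch that evaluates PE(p) for a single point. The function name `fourier_encode` and the choice of frequencies (powers of two, N = 4) are my assumptions for illustration; the formula itself does not fix them.

```python
import math

def fourier_encode(p, freqs):
    # For each frequency f, emit sin and cos of 2*pi*f*c for c in (x, y, z),
    # in the same order as the rows of PE(p).
    out = []
    for f in freqs:
        for c in p:
            out.append(math.sin(2 * math.pi * f * c))
            out.append(math.cos(2 * math.pi * f * c))
    return out

freqs = [2.0 ** k for k in range(4)]  # assumed frequencies, N = 4
pe = fourier_encode((0.25, 0.0, 0.5), freqs)
print(len(pe))  # 6 values per frequency, so 6 * N = 24
```

Each frequency contributes six values (sin and cos for each of the three axes), so the encoding has 6N dimensions in total.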

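Their implementation is quite simple. It uses eight frequencies, 2^0 through 2^7, multiplies the coordinates by them directly without the 2π factor, and also keeps the raw coordinates.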

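import torch
from torch import nn

class FourierPointEncoder(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # 8 frequencies: 1, 2, 4, ..., 128
        frequencies = 2.0 ** torch.arange(8, dtype=torch.float32)
        self.register_buffer('frequencies', frequencies, persistent=False)
        # 54 input features per point (coordinates, sin, cos, and the extra channels)
        self.projection = nn.Linear(54, hidden_size)

    def forward(self, points):
        # points: (..., 6), coordinates in the first 3 channels, the rest (normals) after
        x = points[..., :3]
        # Scale every coordinate by every frequency: (..., 3, 8) -> (..., 24)
        x = (x.unsqueeze(-1) * self.frequencies).view(*x.shape[:-1], -1)
        x = torch.cat((points[..., :3], x.sin(), x.cos()), dim=-1)
        x = self.projection(torch.cat((x, points[..., 3:]), dim=-1))
        return x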
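It may not be obvious where the 54 in nn.Linear(54, hidden_size) comes from. The sketch below mirrors the feature layout of the encoder above in plain Python, without the linear projection. The function name `encode_point` is hypothetical, and I assume each point carries six values: three coordinates followed by a three-dimensional normal.

```python
import math

# Same frequency values as 2.0 ** torch.arange(8): 1, 2, 4, ..., 128
FREQUENCIES = [2.0 ** k for k in range(8)]

def encode_point(point):
    # point = (x, y, z, nx, ny, nz) (assumed layout)
    xyz, normal = point[:3], point[3:]
    # Coordinate-major order, like the unsqueeze/view in the encoder:
    # x with all 8 frequencies, then y, then z. 3 * 8 = 24 values.
    scaled = [c * f for c in xyz for f in FREQUENCIES]
    sines = [math.sin(s) for s in scaled]
    cosines = [math.cos(s) for s in scaled]
    return list(xyz) + sines + cosines + list(normal)

features = encode_point((0.1, 0.2, 0.3, 0.0, 0.0, 1.0))
print(len(features))  # 3 + 24 + 24 + 3 = 54
```

So the 54 input features are the 3 raw coordinates, 24 sine terms, 24 cosine terms, and the 3 normal components.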

I will share the other components in the paper next time.
