"Dimensionality Reduction for Data Visualization: PCA vs t-SNE vs UMAP vs LDA" Article Review in Korean


Original article: https://towardsdatascience.com/dimensionality-reduction-for-data-visualization-pca-vs-tsne-vs-umap-be4aa7b1cb29

	This post is a translated review of an overseas article, written for my own study!

What is Dimensionality Reduction?

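The biggest problems with data that has a very large number of features:

- Training becomes very slow.
- It is hard to find a good solution.

This is known as the curse of dimensionality.

Dimensionality reduction is therefore an important tool.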
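
One way to get a feel for the curse of dimensionality is to look at how distances behave as the number of features grows. The sketch below is only an illustration under assumed settings (uniform random points in a unit hypercube; the helper name `distance_contrast` is made up for this note): it compares the nearest and farthest distance from one query point.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Ratio of nearest to farthest distance from one query point to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform samples in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

for d in (2, 10, 100, 1000):
    print(f"{d:>5} dims: nearest/farthest = {distance_contrast(d):.3f}")
```

As the ratio moves toward 1, "near" and "far" points become harder to tell apart, which is one reason models struggle on very high-dimensional data.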

Where Dimensionality Reduction is used

Data Compression

Noise Reduction

Data Classification

Data Visualization

Main Approaches for Dimensionality Reduction

Projection

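Projects high-dimensional data down onto a lower-dimensional subspace.

It has the property of approximately preserving the distances between points.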

*PCA
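
A minimal sketch of the projection idea, assuming toy data that lies close to a 2D plane inside 3D space (the data, the noise level, and the names `plane_basis` and `pairwise_distances` are all invented for illustration). When the data really does sit near a lower-dimensional subspace, projecting onto it changes pairwise distances very little.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points that lie close to a 2D plane inside 3D space.
coords_2d = rng.normal(size=(200, 2))
plane_basis = np.linalg.qr(rng.normal(size=(3, 2)))[0]   # 3x2 matrix with orthonormal columns
X = coords_2d @ plane_basis.T + 0.01 * rng.normal(size=(200, 3))

# Projection: express every 3D point in the plane's 2D coordinate system.
X_proj = X @ plane_basis                                  # shape (200, 2)

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.linalg.norm(diff, axis=-1)

max_change = np.abs(pairwise_distances(X) - pairwise_distances(X_proj)).max()
print(X.shape, "->", X_proj.shape)
print("largest change in any pairwise distance:", max_change)
```

Here the plane is known in advance; PCA can be thought of as a way to find such a subspace from the data.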

Manifold Learning

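Manifold Learning is the approach of reducing dimensionality by modeling the manifold on which the training instances lie.
-> The region where the training instances lie represents the actual (lower-dimensional) subspace.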

What is Manifold Learning?

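Manifold = the subspace on which high-dimensional data lies
- When high-dimensional data is plotted as points in space, the subspace that encompasses those points = the manifold (within the original space)
- Finding the manifold = Manifold Learning
- Projecting onto a well-chosen manifold can reduce the dimensionality of the data.
- Manifold learning = starting from an untrained state and fitting the model from the data itself.
- Example: the Swiss roll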
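
The Swiss roll can be reproduced with scikit-learn. The sketch below is one possible illustration, not the method used in the original article: it generates a 3D Swiss roll and "unrolls" it with Locally Linear Embedding, which is just one of several manifold learning algorithms. The sample count, noise level, and `n_neighbors=10` are assumed values.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# X holds 3D points on a rolled-up 2D sheet; t is each point's position along the roll.
X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Assumed settings: 10 neighbors, 2 output dimensions.
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)

print(X.shape, "->", X_unrolled.shape)
```

A plain projection onto a flat plane would squash the layers of the roll together, whereas a manifold learning method tries to follow the sheet itself.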

Manifold Learning is based on the manifold assumption.

Manifold Hypothesis(assumption) in general
(a) Natural data in high dimensional spaces concentrates close to lower dimensional manifolds
: High-dimensional data is sparse, but there is a lower-dimensional manifold that contains the dataset.
(b) Probability density decreases very rapidly when moving away from the supporting manifold.
: The moment you move away from this lower-dimensional manifold, the density drops sharply.

				reference : https://junstar92.tistory.com/157

Manifold assumption in semi-supervised learning?
(a) the input space is composed of multiple lower-dimensional manifolds on which all data lie
(b) data points lying on the same manifold have the same label

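Manifold learning is useful for exploratory data analysis; because it is not itself used for supervised learning, it tends to be used in a semi-supervised setting.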

Its main goal is to preserve information about neighboring data points.

*t-SNE

PCA (Principal Component Analysis)

An "unsupervised" algorithm

Principal Components

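Principal component: the axis that accounts for the largest amount of variance in the training data

PCA means finding the first, second, third, ... components, each orthogonal to the previous one(s).

In other words, each direction of variance is taken as an axis, and axes orthogonal to the earlier ones are picked off one at a time.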
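
The two properties above (components ordered by variance, each orthogonal to the others) can be checked with scikit-learn's `PCA`. This is a small sketch on the Iris dataset, which is chosen here only for convenience and is not taken from the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                         # 150 samples, 4 features
pca = PCA(n_components=3).fit(X)

components = pca.components_                 # one row per principal component
print("explained variance ratio:", pca.explained_variance_ratio_)
print("components are orthonormal:",
      np.allclose(components @ components.T, np.eye(3)))

# For visualization, keep only the first two components.
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)
```

The explained variance ratio is sorted from largest to smallest, which matches the idea of picking the axis of maximum variance first.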

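t-SNE (t-distributed Stochastic Neighbor Embedding)

Used for high-dimensional datasets.

It gives each data point a location in a two- or three-dimensional map.

Rather than plotting distances over the whole dataset, it places each data point in 2D or 3D and looks for clusters among points with similar meaning.
This lets the embedding preserve the meaning in the data.

t-SNE reduces dimensionality while trying to keep similar instances close together and dissimilar instances apart.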
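
A minimal sketch of t-SNE with scikit-learn, assuming a 500-sample subset of the digits dataset and `perplexity=30` (both are illustrative choices, not values from the original article). Each 64-dimensional image ends up with a location on a 2D map.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
X = digits.data[:500]          # 500 images, 64 pixel features each
y = digits.target[:500]        # labels, only useful for coloring a plot afterwards

# Assumed settings: 2D output and perplexity 30 (roughly the neighborhood size considered).
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

print(X.shape, "->", X_embedded.shape)
```

Plotting `X_embedded` colored by `y` is the usual way to see whether images of the same digit ended up close together. Results depend on the perplexity and the random seed, so it is worth trying a few settings.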

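LDA (Linear Discriminant Analysis)

A "supervised" algorithm that computes the directions ("linear discriminants") that maximize the separation between classes.

A technique commonly used as a preprocessing step for pattern classification.

The goal is good class-separability, in order to avoid overfitting and reduce computational cost.

Class-separability: probably best described as the degree to which the classes are separated (?).

Similar to PCA, but it additionally maximizes the separation between the different classes.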
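
Because LDA is supervised, it needs the class labels when fitting, unlike PCA. A small sketch with scikit-learn on the Iris dataset (the dataset is an assumption made for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features, 3 classes

# With 3 classes, LDA can produce at most 2 linear discriminants.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)              # labels y are required here

print(X.shape, "->", X_lda.shape)
print("training accuracy:", lda.score(X, y))
```

The number of output dimensions is limited to the number of classes minus one, which is a practical difference from PCA when choosing a 2D or 3D view.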

UMAP (Uniform Manifold Approximation and Projection)

A nonlinear dimensionality reduction method

Effective for visualizing clusters or groups of data points and their relative proximities

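Difference from t-SNE = scalability

It can be applied directly to sparse matrices.

One advantage is that it can be applied as part of a preprocessing pipeline.

It is similar to t-SNE but faster.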
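
A hedged sketch of passing a sparse matrix straight to UMAP. It assumes the third-party `umap-learn` package is installed (the import is guarded in case it is not), and the random sparse matrix and the `n_neighbors=15` setting are invented purely for illustration.

```python
from scipy import sparse

try:
    import umap                # third-party package "umap-learn"; assumed to be installed
except ImportError:
    umap = None

# Hypothetical toy data: a sparse 300 x 50 matrix with about 10% non-zero entries.
X_sparse = sparse.random(300, 50, density=0.1, format="csr", random_state=0)

if umap is not None:
    reducer = umap.UMAP(n_components=2, n_neighbors=15, random_state=0)
    embedding = reducer.fit_transform(X_sparse)   # sparse input passed in directly
    print(X_sparse.shape, "->", embedding.shape)
else:
    embedding = None
    print("umap-learn is not installed; skipping the embedding")
```

In a real case the sparse matrix would typically come from something like a bag-of-words or one-hot encoding step rather than random numbers.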

Reference

More on this topic ("Dimensionality Reduction for Data Visualization: PCA vs t-SNE vs UMAP vs LDA", Article Review in Korean) can be found here: https://velog.io/@jee-9/Dimensionality-Reduction-for-Data-Visualization-PCA-vs-t-SNE-vs-UMAP-vs-LDA-Article-Review-in-한국어

