Model API
SRF Class
SRF
SRF(
rank: int = 10,
rho: float = 3.0,
max_outer: int = 30,
max_inner: int = 20,
tol: float = 0.0001,
verbose: int = 0,
init: str = "random_sqrt",
random_state: int | None = None,
missing_values: float | None = np.nan,
bounds: tuple[float, float] | None = (None, None),
)
Bases: TransformerMixin, BaseEstimator
Symmetric non-negative matrix factorization via ADMM.
Factorizes a symmetric similarity matrix S into WW' where W >= 0. Handles missing entries and optional bound constraints on the auxiliary variable V.
The objective is: min_{W>=0, V} ||M * (S - V)||^2_F + rho/2 * ||V - WW'||^2_F, where M is the observation mask (1 for observed entries, 0 for missing ones) and * denotes the elementwise product.
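As a rough illustration of the objective above, the following NumPy sketch evaluates both terms for given S, V, and W. The function name `srf_objective` and its argument layout are hypothetical and not part of the pysrf API; only the formula is taken from the text.

```python
import numpy as np

def srf_objective(s, v, w, mask, rho=3.0):
    """Evaluate ||M * (S - V)||_F^2 + rho/2 * ||V - W W'||_F^2.

    Hypothetical helper for illustration; `mask` is True where S is observed.
    """
    x_hat = w @ w.T
    # Only observed entries contribute to the data-fit term.
    data_fit = np.sum((s[mask] - v[mask]) ** 2)
    # The penalty couples V to the low-rank product on all entries.
    penalty = 0.5 * rho * np.sum((v - x_hat) ** 2)
    return float(data_fit + penalty)

w = np.eye(2)
s = np.array([[1.0, np.nan], [np.nan, 1.0]])
mask = ~np.isnan(s)
v = np.array([[1.0, 0.5], [0.5, 1.0]])
value = srf_objective(s, v, w, mask, rho=2.0)
```

Here V matches S on the observed diagonal, so only the penalty term is non-zero.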
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rank` | int | Number of factors (dimensionality of the latent space) | `10` |
| `rho` | float | Penalty parameter rho in the objective, controlling how strongly the constraint V = WW' is enforced | `3.0` |
| `max_outer` | int | Maximum number of outer iterations | `30` |
| `max_inner` | int | Maximum number of iterations for the W-subproblem per outer iteration | `20` |
| `tol` | float | Convergence tolerance for constraint violation | `1e-4` |
| `verbose` | int | Whether to print optimization progress | `0` |
| `init` | str | Method for factor initialization (`'random'` or `'random_sqrt'`) | `'random_sqrt'` |
| `random_state` | int or None | Random seed for reproducible initialization | `None` |
| `missing_values` | float or None | Value that marks an entry as missing; such entries are masked out of the data-fit term | `np.nan` |
| `bounds` | tuple of (float, float) or None | Tuple of (lower, upper) bounds for the auxiliary variable V. If None, the bounds are inferred from the data. In practice, one can also pass the expected bounds of the matrix, e.g. (0, 1) for cosine similarity | `(None, None)` |
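To make the `missing_values` and `bounds` parameters more concrete, here is a minimal NumPy sketch of how a mask and a pair of bounds could be derived from the input. The helpers `observation_mask` and `resolve_bounds` are hypothetical names, and taking the min and max of the observed entries is only an assumed inference rule; the text above does not say how pysrf infers bounds.

```python
import numpy as np

def observation_mask(x, missing_values=np.nan):
    """Return a boolean array that is True where x is observed."""
    if missing_values is None:
        return np.ones(x.shape, dtype=bool)
    if np.isnan(missing_values):
        return ~np.isnan(x)
    return x != missing_values

def resolve_bounds(x, mask, bounds=(None, None)):
    """Fill unspecified bounds from the observed entries (assumed rule)."""
    lower, upper = bounds if bounds is not None else (None, None)
    observed = x[mask]
    if lower is None:
        lower = float(observed.min())
    if upper is None:
        upper = float(observed.max())
    return lower, upper

x = np.array([[1.0, 0.2, np.nan],
              [0.2, 1.0, 0.7],
              [np.nan, 0.7, 1.0]])
mask = observation_mask(x)
inferred = resolve_bounds(x, mask)                      # taken from the data
explicit = resolve_bounds(x, mask, bounds=(0.0, 1.0))   # supplied by the caller
```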
Attributes:
| Name | Type | Description |
|---|---|---|
| `w_` | np.ndarray of shape (n_samples, rank) | Learned factor matrix W |
| `components_` | np.ndarray of shape (n_samples, rank) | Alias for `w_` (sklearn compatibility) |
| `n_iter_` | int | Number of SRF iterations performed |
| `history_` | dict | Dictionary containing optimization metrics per iteration |
Examples:
>>> # Basic usage with complete data
>>> # (similarity_matrix is a symmetric (n_samples, n_samples) array)
>>> import numpy as np
>>> from pysrf import SRF
>>> model = SRF(rank=10, random_state=42)
>>> w = model.fit_transform(similarity_matrix)
>>> reconstruction = w @ w.T
>>> # Usage with missing data (NaN values)
>>> # (mask is a boolean array marking the entries to hide)
>>> similarity_matrix[mask] = np.nan
>>> model = SRF(rank=10, missing_values=np.nan)
>>> w = model.fit_transform(similarity_matrix)
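The examples above leave `similarity_matrix` and `mask` undefined. One way to build synthetic inputs for experimentation is sketched below; the sizes, the 20% missing rate, and the random low-rank construction are illustrative assumptions, not something prescribed by pysrf.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank = 50, 5

# Non-negative ground-truth factors give a symmetric, non-negative matrix.
w_true = rng.random((n, rank))
similarity_matrix = w_true @ w_true.T

# Hide roughly 20% of the off-diagonal pairs, keeping the mask symmetric.
upper = np.triu(rng.random((n, n)) < 0.2, k=1)
mask = upper | upper.T
similarity_matrix[mask] = np.nan
```

Masking (i, j) and (j, i) together keeps the observed part of the matrix symmetric.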
References
[1] Shi et al. (2016). "Inexact Block Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization."
Source code in pysrf/model.py
_check_convergence
_check_convergence(
primal_res: float,
dual_res: float,
v: ndarray,
x_hat: ndarray,
lam: ndarray,
) -> bool
Check ADMM convergence using primal and dual residual norms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `primal_res` | float | Norm of the primal residual ||V - WW'||_F | required |
| `dual_res` | float | Norm of the dual residual rho * ||V - V_old||_F | required |
| `v` | ndarray of shape (n, n) | Current auxiliary variable | required |
| `x_hat` | ndarray of shape (n, n) | Current estimate WW' | required |
| `lam` | ndarray of shape (n, n) | Current Lagrange multipliers | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `converged` | bool | True if both primal and dual tolerances are satisfied |
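The arguments above suggest a stopping rule whose thresholds scale with the size of the iterates. A minimal sketch of a commonly used ADMM criterion with combined absolute and relative tolerances follows. It is an assumption that pysrf uses this exact form; the single `tol` used for both tolerances and the standalone function `check_convergence` are illustrative.

```python
import numpy as np

def check_convergence(primal_res, dual_res, v, x_hat, lam, tol=1e-4):
    """Common ADMM stopping rule (assumed form, not confirmed for pysrf)."""
    scale = np.sqrt(v.size)  # sqrt of the number of constraint entries
    # Primal threshold grows with the magnitude of the two constrained terms.
    eps_primal = scale * tol + tol * max(np.linalg.norm(v), np.linalg.norm(x_hat))
    # Dual threshold grows with the magnitude of the multipliers.
    eps_dual = scale * tol + tol * np.linalg.norm(lam)
    return bool(primal_res <= eps_primal and dual_res <= eps_dual)

v = np.eye(3)
x_hat = np.eye(3)
lam = np.zeros((3, 3))
converged = check_convergence(0.0, 0.0, v, x_hat, lam)
not_converged = check_convergence(1.0, 0.0, v, x_hat, lam)
```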
_compute_metrics
_compute_metrics(
x: ndarray,
v: ndarray,
x_hat: ndarray,
lam: ndarray,
primal_residual: ndarray,
v_old: ndarray | None = None,
) -> dict[str, float]
Compute ADMM optimization metrics for monitoring and convergence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | ndarray of shape (n, n) | Original similarity matrix | required |
| `v` | ndarray of shape (n, n) | Auxiliary variable | required |
| `x_hat` | ndarray of shape (n, n) | Current estimate WW' | required |
| `lam` | ndarray of shape (n, n) | Lagrange multipliers | required |
| `primal_residual` | ndarray of shape (n, n) | V - WW' | required |
| `v_old` | ndarray of shape (n, n) or None | Previous V, used to compute the dual residual | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `metrics` | dict[str, float] | Keys: total_objective, data_fit, penalty, lagrangian, rec_error, evar, primal_residual, dual_residual |
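To give a feel for what such a metrics dictionary might contain, here is a hedged sketch. The key names and the residual formulas (||V - WW'||_F and rho * ||V - V_old||_F) come from this page; the definitions of `lagrangian`, `rec_error`, `evar`, and `total_objective`, as well as the extra `mask` and `rho` arguments, are assumptions made for illustration.

```python
import numpy as np

def compute_metrics(x, v, x_hat, lam, primal_residual, mask, rho, v_old=None):
    """Hypothetical metric definitions; the library's own may differ."""
    data_fit = float(np.sum((x[mask] - v[mask]) ** 2))
    penalty = float(0.5 * rho * np.sum(primal_residual ** 2))
    lagrangian = float(np.sum(lam * primal_residual))
    # Squared reconstruction error and total variance on observed entries.
    rec_sq = float(np.sum((x[mask] - x_hat[mask]) ** 2))
    total_var = float(np.sum((x[mask] - x[mask].mean()) ** 2))
    if v_old is None:
        dual = float("nan")  # no previous iterate available yet
    else:
        dual = float(rho * np.linalg.norm(v - v_old))
    return {
        "total_objective": data_fit + penalty + lagrangian,
        "data_fit": data_fit,
        "penalty": penalty,
        "lagrangian": lagrangian,
        "rec_error": float(np.sqrt(rec_sq)),
        "evar": 1.0 - rec_sq / total_var,
        "primal_residual": float(np.linalg.norm(primal_residual)),
        "dual_residual": dual,
    }

x = np.array([[1.0, 0.5], [0.5, 1.0]])
mask = np.ones_like(x, dtype=bool)
zeros = np.zeros_like(x)
metrics = compute_metrics(x, x, x, zeros, zeros, mask, rho=2.0, v_old=zeros)
```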
_fit_complete_data
Fit model with complete data (no missing values).
Uses _frobenius_residual to compute ||X - WW'||_F without forming WW', so memory stays at O(n^2).
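The implementation of `_frobenius_residual` is not shown here. One way to obtain ||X - WW'||_F without materializing the n x n product is to expand the square: ||X - WW'||_F^2 = ||X||_F^2 - 2 tr(W'XW) + ||W'W||_F^2. The sketch below follows that expansion and is only an assumption about how such a helper could work.

```python
import numpy as np

def frobenius_residual(x, w):
    """Return ||x - w w'||_F using only (n, r) and (r, r) temporaries."""
    xw = x @ w        # shape (n, r)
    gram = w.T @ w    # shape (r, r)
    squared = np.vdot(x, x) - 2.0 * np.sum(w * xw) + np.sum(gram * gram)
    # Guard against tiny negative values caused by floating-point cancellation.
    return float(np.sqrt(max(squared, 0.0)))

rng = np.random.default_rng(0)
w = rng.random((40, 4))
a = rng.random((40, 40))
x = (a + a.T) / 2.0
fast = frobenius_residual(x, w)
direct = float(np.linalg.norm(x - w @ w.T))
```

The expansion can lose precision when the residual is very small relative to ||X||_F, so a direct computation may be preferable near convergence.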
_fit_missing_data
Fit model with missing data using ADMM.
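For the objective stated in the class description, the V-step of an ADMM iteration has a closed form because the problem separates over entries. The sketch below shows that step only, assuming an augmented Lagrangian with the multiplier term <Lambda, V - WW'>; the sign convention, the function name `update_v`, and the use of clipping for the bounds are assumptions, and the W-subproblem is omitted.

```python
import numpy as np

def update_v(s, x_hat, lam, mask, rho, bounds=(None, None)):
    """One V-update for fixed W and multipliers (illustrative sketch)."""
    # Unobserved entries: only the multiplier and penalty terms remain.
    v_missing = x_hat - lam / rho
    # Observed entries: set the derivative of
    # (s - v)^2 + lam * (v - x_hat) + rho/2 * (v - x_hat)^2 to zero.
    v_observed = (2.0 * s + rho * x_hat - lam) / (2.0 + rho)
    v = np.where(mask, v_observed, v_missing)
    lower, upper = bounds
    if lower is not None or upper is not None:
        # Each entry is a 1-D convex quadratic, so clipping gives the
        # constrained minimizer.
        v = np.clip(v, lower, upper)
    return v

s = np.array([[1.0, np.nan], [np.nan, 1.0]])
mask = ~np.isnan(s)
x_hat = np.zeros((2, 2))
lam = np.zeros((2, 2))
v_free = update_v(s, x_hat, lam, mask, rho=2.0)
v_bounded = update_v(s, x_hat, lam, mask, rho=2.0, bounds=(0.6, 1.0))
```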
_store_results
fit
Fit the symmetric NMF model to the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | array-like of shape (n_samples, n_samples) | Symmetric similarity matrix. Missing values are allowed and should be marked according to the `missing_values` parameter. | required |
| `y` | Ignored | Not used, present here for API consistency by convention. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `self` | object | Fitted estimator. |
fit_transform
Fit the model and return the learned factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | array-like of shape (n_samples, n_samples) | Symmetric similarity matrix | required |
| `y` | Ignored | Not used, present here for API consistency by convention. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `w` | array-like of shape (n_samples, rank) | Learned factor matrix |
reconstruct
Reconstruct the similarity matrix from factors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `w` | array-like of shape (n_samples, rank) or None | Factor matrix to use for reconstruction. If None, uses the fitted factors. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `s_hat` | array-like of shape (n_samples, n_samples) | Reconstructed similarity matrix |
score
Score the model using reconstruction error on observed entries only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | array-like of shape (n_samples, n_samples) | Symmetric similarity matrix. Missing values are allowed and should be marked according to the `missing_values` parameter. | required |
| `y` | Ignored | Not used, present here for API consistency by convention. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `mse` | float | Mean squared error of the reconstruction on observed entries. |
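A reconstruction error restricted to observed entries can be sketched in a few lines of NumPy. The function `masked_mse` below is a hypothetical standalone helper, not the `score` method itself, and assumes the reconstruction is WW'.

```python
import numpy as np

def masked_mse(x, w, missing_values=np.nan):
    """Mean squared error between x and w w' over observed entries only."""
    if missing_values is None:
        mask = np.ones(x.shape, dtype=bool)
    elif np.isnan(missing_values):
        mask = ~np.isnan(x)
    else:
        mask = x != missing_values
    residual = x[mask] - (w @ w.T)[mask]
    return float(np.mean(residual ** 2))

w = np.array([[1.0], [1.0]])                   # w w' is a 2 x 2 matrix of ones
x = np.array([[1.0, np.nan], [np.nan, 3.0]])   # two observed entries
mse = masked_mse(x, w)
```

Only the two diagonal entries are observed, with residuals 0 and 2, so the missing off-diagonal pair does not affect the result.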
transform
Project data onto the learned factor space.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | array-like of shape (n_samples, n_samples) | Symmetric matrix to transform | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `w` | array-like of shape (n_samples, rank) | Transformed data |