r/MachineLearning 24h High-Signal Summary
The 24h window was sparse but had one concrete systems signal: a report that cuBLAS batched FP32 GEMM underutilizes RTX 5090-class hardware, with reproducible comparisons against a custom kernel showing large gains. Secondary signal came from a new GBDT implementation (ibu-boost) applying absolute-threshold split rejection from recent screening work. Most other activity was at the request/discussion level rather than publishable benchmark results.
Papers & Benchmarks
- cuBLAS batched FP32 performance anomaly report (RTX 5090) is the highest-signal technical item. The post claims cuBLAS selects inefficient kernels across multiple matrix sizes/batches, with side-by-side results against a custom kernel indicating substantial speedups in many settings. If reproducible by others, this is operationally important for training/inference stacks relying on batched GEMM. https://www.reddit.com/r/MachineLearning/comments/1shtv0r/d_60_matmul_performance_bug_in_cublas_on_rtx_5090/
- No clearly validated new SOTA benchmark/paper result reached consensus in this 24h slice. https://www.reddit.com/r/MachineLearning/new/
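The batched-GEMM claim above reduces to a standard measurement methodology: sweep (batch, size) points, take best-of-N wall-clock timings, and convert to effective GFLOP/s for each implementation. A minimal sketch of that harness shape is below; NumPy on CPU stands in for the GPU path (it does not exercise cuBLAS kernel selection), and the function name and sweep points are illustrative, not from the post.

```python
import time
import numpy as np

def batched_gemm_gflops(batch, m, n, k, reps=5):
    """Time batched FP32 GEMM and return effective GFLOP/s.

    NumPy on CPU stands in for the GPU path here; a real
    cuBLAS-vs-custom-kernel comparison would keep the same harness
    shape (size/batch sweep, best-of-reps timing, FLOP accounting)
    but time device kernels with proper synchronization.
    """
    a = np.random.rand(batch, m, k).astype(np.float32)
    b = np.random.rand(batch, k, n).astype(np.float32)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        np.matmul(a, b)  # stacked matmul = batched GEMM
        best = min(best, time.perf_counter() - t0)
    flops = 2.0 * batch * m * n * k  # one multiply-add per output element
    return flops / best / 1e9

# sweep a few (batch, size) points, as the report does across shapes
for batch, size in [(8, 64), (8, 128), (32, 64)]:
    rate = batched_gemm_gflops(batch, size, size, size)
    print(f"batch={batch} size={size}: {rate:.1f} GFLOP/s")
```

Reporting effective GFLOP/s rather than raw latency is what makes cross-shape comparisons (and the claimed per-shape speedups) directly comparable.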
Open Source & Tools
- ibu-boost (new GBDT library) applies absolute-threshold rejection to split selection. The implementation adapts the screening-transform idea (arXiv:2604.01178) from the transformer context to boosted trees, aiming to reject low-value splits that would still win on relative ranking. https://www.reddit.com/r/MachineLearning/comments/1shpdm2/p_ibuboost_a_gbdt_library_where_splits_are/
- Large-scale OCR systems planning thread (50M pages / 1 week) surfaced practical deployment constraints. Not a release, but useful signal on real-world throughput/cost engineering priorities. https://www.reddit.com/r/MachineLearning/comments/1shg2ob/d_large_scale_ocr_d/
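The absolute-threshold rejection idea from the ibu-boost item can be sketched in a few lines: even the arg-max split (the relative-ranking winner) is discarded if its gain falls below an absolute floor. This is a minimal illustration using variance-reduction gain on a single feature; the actual ibu-boost internals and the exact threshold semantics from the cited screening work may differ, and the names `best_split`/`tau` are assumptions.

```python
import numpy as np

def best_split(x, y, tau=0.01):
    """Find the best threshold split on feature x, but return None if
    even the winning split's gain is below the absolute floor tau.

    Gain is variance reduction of the regression target y. Under pure
    relative ranking the arg-max split always wins; the absolute
    threshold instead rejects low-value splits outright, leaving the
    node unsplit.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    parent = y.var() * n  # total impurity before splitting
    best_gain, best_thr = -np.inf, None
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # no valid threshold between equal values
        left, right = y[:i], y[i:]
        gain = parent - left.var() * i - right.var() * (n - i)
        if gain > best_gain:
            best_gain, best_thr = gain, (x[i - 1] + x[i]) / 2.0
    if best_gain < tau:  # absolute rejection, not relative ranking
        return None
    return best_thr, best_gain

# a clean step function passes the floor; a constant target does not
x = np.linspace(0.0, 1.0, 100)
y_step = (x > 0.5).astype(np.float32)
print(best_split(x, y_step, tau=1.0))              # split near 0.5
print(best_split(x, np.ones_like(x), tau=1.0))     # None: rejected
```

The design point the post highlights: a relative-ranking tree grower would still split on the constant-target node above (some split is always arg-max); the absolute floor is what prunes it.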
Industry & Community
- High-volume community threads were mostly career/process/norms discussion, not technical breakthroughs. https://www.reddit.com/r/MachineLearning/comments/1shwc5d/is_google_deepmind_known_to_ghost_applicants_d/
- Dataset-demand discussion highlighted unmet data verticals (street scenes, shelf OCR, utility meters), but remains exploratory. https://www.reddit.com/r/MachineLearning/comments/1shibc9/what_imagevideo_training_data_is_hardest_to_find/