I want to get into ML with Python, and I’m wondering which ML library I should learn first as an ML beginner. I’ve been using Python for a few years now.

  • AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world
    1 year ago

    I’d say that since you’re a beginner, it’s much better to start by implementing your regression functions and any necessary helper functions (train/test split, etc.) yourself. Learn the necessary linear algebra and quadratic programming, then try to implement linear regression, logistic regression and SVMs using only numpy and cvxpy.
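
    As a rough idea of what “doing it yourself” can look like, here is a minimal sketch of a train/test split and an ordinary least-squares fit using only numpy. The helper names (`train_test_split`, `fit_linear_regression`, `predict`) and the synthetic data are made up for illustration, and this is just one of several reasonable ways to write it.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into a training and a test part."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def fit_linear_regression(X, y):
    """Ordinary least squares: find w minimising ||A w - y||^2 with A = [1, X]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w  # w[0] is the intercept, w[1:] are the slopes


def predict(w, X):
    """Apply the fitted intercept and slopes to new rows."""
    return w[0] + X @ w[1:]


# Synthetic data (an assumption for illustration): y = 3 + 2*x1 - x2 + noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y)
w = fit_linear_regression(X_train, y_train)
test_mse = np.mean((predict(w, X_test) - y_test) ** 2)
print("weights:", w)
print("test MSE:", test_mse)
```

    The fitted weights should land close to the true values (3, 2, -1) used to generate the data, which is a handy sanity check before moving on to real datasets.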

    Once you get the hang of it, you can jump straight into sklearn, confident that you roughly understand what those “black boxes” really do. That will also help you a lot with troubleshooting.

    For neural networks and deep learning, pytorch is establishing itself as an industry standard right now. Look up “adjoint automatic differentiation” (“backpropagation” doesn’t do it justice, as pytorch implements a very general, dynamic form of AAD) and you’ll understand the “magic” behind the gradients that pytorch gives you. Karpathy’s YouTube tutorials are a really good intro to AAD/autodiff in the context of deep learning.
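
    To make the AAD idea a bit more concrete, here is a toy sketch of reverse-mode autodiff on plain Python scalars. The `Value` class is hypothetical and written only for illustration; it is not pytorch’s implementation, which works on tensors and is far more general. The principle it shows is the same, though: record the computation as it runs, then walk it backwards applying the chain rule.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward_fn():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        """Propagate d(self)/d(node) to every node that self depends on."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)  # parents always come before children

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # outputs first, inputs last
            node._backward_fn()


# A single "neuron": y = tanh(w * x + b)
x, w, b = Value(0.5), Value(-2.0), Value(1.0)
y = (w * x + b).tanh()
y.backward()
# Here w * x + b = 0, so tanh'(0) = 1 and dy/dw = x, dy/dx = w, dy/db = 1.
print(y.data, w.grad, x.grad, b.grad)
```

    The graph is built “dynamically”, simply by running ordinary Python code, which is the property the comment above refers to.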

      • AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world
        1 year ago

        Linear and logistic regression are much easier (and less error-prone) to implement from scratch than neural network training with backpropagation.
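
        For a sense of scale, a from-scratch logistic regression fits in a couple of dozen lines of numpy. The sketch below uses plain batch gradient descent on the mean log-loss; the function names, the learning rate, the step count and the two-blob toy data are all assumptions chosen for illustration rather than tuned values.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real numbers to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.5, n_steps=2000):
    """Minimise the mean log-loss with plain batch gradient descent."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_steps):
        p = sigmoid(A @ w)                  # predicted P(y = 1) for each row
        gradient = A.T @ (p - y) / len(y)   # gradient of the mean log-loss
        w -= learning_rate * gradient
    return w


def predict_proba(w, X):
    """Predicted probability of class 1 for each row of X."""
    return sigmoid(w[0] + X @ w[1:])


# Toy data (an assumption for illustration): two Gaussian blobs, labels 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = fit_logistic_regression(X, y)
accuracy = np.mean((predict_proba(w, X) >= 0.5) == y)
print("weights:", w)
print("training accuracy:", accuracy)
```

        Note that the whole “training loop” is three lines, and the gradient has a simple closed form, which is a big part of why this is less error-prone than hand-written backpropagation.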

        That way you can still follow the progression I suggested: implement those regressions by hand using numpy -> compare against (and appreciate) sklearn -> implement SVMs by hand using cvxpy -> appreciate sklearn again.

        If you get the hang of “classical” ML, then deep learning becomes easy as it’s still machine learning, just with more complicated models and no closed-form solutions.

  • Artyom@lemm.ee
    1 year ago

    Sklearn for most of the data handling, pytorch for the model. They work well together, since both interoperate with numpy arrays.
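
    A minimal sketch of that division of labour might look like the following: sklearn generates, splits and scales the data, then pytorch trains a small network on the resulting arrays. The dataset, the layer sizes, the learning rate and the number of epochs are all arbitrary choices made for illustration.

```python
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data handling with sklearn: synthetic dataset, split, and feature scaling
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)  # fit on the training data only

# Hand the numpy arrays over to pytorch as float32 tensors
X_train_t = torch.tensor(scaler.transform(X_train), dtype=torch.float32)
X_test_t = torch.tensor(scaler.transform(X_test), dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
y_test_t = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Model and training loop with pytorch
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(10, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),  # one logit per sample
)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train_t), y_train_t)
    loss.backward()
    optimizer.step()

# Evaluate: a positive logit means the model predicts class 1
with torch.no_grad():
    predicted = model(X_test_t) > 0
    test_accuracy = (predicted == y_test_t.bool()).float().mean().item()
print("test accuracy:", test_accuracy)
```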