6

Why are vectorization libraries faster than hand-written for loops? I mean, somewhere down the line the matrices/vectors must be multiplied (or whatever the operation is), so the elements must still be calculated and stored one by one (a for loop??).

Why is it then faster to use these libraries than just manually writing for loops all over the place?

I guess some low-level magic (OpenBLAS?) goes on there, but I just don't see it...

P.S. [Would have posted this on Stack Overflow, but I'd be ripped apart, so I'm pioneering new ways]
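As a concrete illustration of the premise (a hypothetical benchmark of my own; the array size is arbitrary and timings will vary by machine):

```python
import time
import numpy as np

# Hypothetical benchmark: dot product via an explicit Python loop
# vs. NumPy's vectorized np.dot, which calls into compiled BLAS code.
n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
loop_result = 0.0
for i in range(n):              # element-by-element, interpreted
    loop_result += a[i] * b[i]
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
vec_result = np.dot(a, b)       # one call into the optimized library
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.3f}s")
```

On a typical machine the vectorized call wins by a large margin, even though both compute the same number.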

Comments
  • 0
    What language are you talking about?
  • 0
    Maybe threading, or GPU utilization?
  • 0
    @j4cobgarby A GPU is often slower than the CPU for small arrays, though, since the data has to be copied over first.
  • 0
    Probably SIMD instructions and cache-awareness.

    Also, if you write in C++, some of the libraries that accelerate your code, like LAPACK, are actually written in Fortran/C, which I think are inherently faster languages for numeric calculations.
  • 0
    @BigBoo Not a particular one. I just saw this as a general concept used everywhere from C++ to Python.

    @NickyBones This is what I think. The underlying code, written almost at the assembly level, gives it so much power.

    So pound for pound, are these libraries really only meant to simplify the code? Andrew Ng said in one of his videos that using for loops is actually SLOWER, so I'm just confused now.
  • 0
    @shinobiultra Check the implementation you use. Generally it's the same speed. Writing stuff in assembly does not make things faster than C++ just because; C++ compilers are usually well optimized.

    But using other people's implementations is nice for one reason, and it's apparent if you watch the CppCon talk about Facebook's strings.

    Other people's implementations can cover more use cases than you usually would on your own, unless you already have a really well-optimized structure.

    For example, it's beneficial for speed to keep things on the stack, but the stack is small. So one approach is to, for example, allocate strings below a certain size on the stack, and longer strings on the heap.

    This makes small strings faster, but it does not mean that all strings will be faster.

    There is no easy answer. It all comes down to the specific implementation of the specific library. There may be cases that perform better, but overall it's about the same.
  • 0
    @shinobiultra I know Andrew Ng from his DL work. In that area, there are cases where for loops can be replaced by matrix operations, and that indeed is a lot faster.

    LAPACK/BLAS/etc. are meant to save you the trouble of going super low-level and handling things like aliasing. By querying the hardware about its properties you can optimize your code per machine, and those libraries provide you with that abstraction layer.
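To make that concrete, here is a minimal sketch (matrix size and function names are my own) of a hand-written triple loop against NumPy's `@` operator, which dispatches to the underlying BLAS:

```python
import numpy as np

def matmul_loops(A, B):
    """Naive triple-loop matrix multiply, one element at a time."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.random((50, 50))
B = rng.random((50, 50))

C_loops = matmul_loops(A, B)
C_blas = A @ B  # one call; NumPy hands this to BLAS (e.g. OpenBLAS)

print(np.allclose(C_loops, C_blas))
```

Both produce the same matrix; the BLAS path is dramatically faster at realistic sizes because it is blocked for cache and uses SIMD.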
  • 1
    SIMD optimizations
  • 0
    @NickyBones Yeah, that's what I meant: matrix operations are actually faster than for loops, even though the result is the same.
  • 1
    @shinobiultra Matrix operations are like SIMD², since SIMD works on vectors. If your hardware supports matrix acceleration, like GPUs do, then matrix operations are the way to go.

    However, not everything can be done by converting to matrix operations, and not every machine has a powerful GPU (or a GPU at all), so libraries like LAPACK are still needed.
  • 1
    Modern CPUs are able to operate on registers wider than 64 bits. Let's say your CPU has 256-bit operations. If you wanted to add two vectors of four 64-bit integers each, done manually you'd need 4 add instructions. Vectorization libraries, however, abstract the low-level, non-portable instructions your CPU has for operating on 256 bits at once, which lets you add the two vectors in a single instruction. In a perfect world you'd get a 4x speedup in this particular case, although in practice it is rarely that high.
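The four-adds-collapse-into-one idea above, sketched in NumPy terms (whether the adds actually fuse into a single 256-bit instruction depends on your CPU and NumPy build):

```python
import numpy as np

# Two vectors of four 64-bit integers each.
x = np.array([1, 2, 3, 4], dtype=np.int64)
y = np.array([10, 20, 30, 40], dtype=np.int64)

# Manual version: four separate adds driven by the interpreter.
manual = np.empty(4, dtype=np.int64)
for i in range(4):
    manual[i] = x[i] + y[i]

# Vectorized version: one call into NumPy's compiled loop, which can
# use a single 256-bit SIMD add (e.g. AVX2) when the hardware has it.
vectorized = x + y

print(vectorized.tolist())  # [11, 22, 33, 44]
```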