DeepSeek: Behind the Media Headlines, Between Power, Hype, and Real Science

Attention: decision makers, academic staff, professionals, and stakeholders.

Summary: make sure to read the conclusion.

Introduction

The introduction of the large language model DeepSeek has sent shockwaves through multiple facets of global society. In this brief review, we will explore the topic, dissect its components, assess its impact, and provide a concise summary of our perspective on the matter.

A Brief History

In 1956, at Dartmouth College in Hanover, New Hampshire, USA, computer scientist John McCarthy and his colleagues introduced for the first time the concept of machines simulating human intelligence. This event is historically regarded as the foundation of AI research.

This idea, combined with decades of industrial progress and advances in computational power, has contributed significantly to the success of AI as we know it today. Throughout this period, various frameworks and architectures have embodied the core concept of AI, including data mining, machine learning, and reinforcement learning. One of the most recent and newsworthy developments, however, is the Large Language Model (LLM). The roots of LLMs in the tech industry can be traced back to the early 2010s. A pioneering precursor was Google's Word2Vec, introduced in 2013, which represented words as numerical vectors capturing aspects of their meaning. This laid the groundwork for more advanced models such as OpenAI's GPT (Generative Pre-trained Transformer), first introduced in 2018. These early models paved the way for the sophisticated LLMs we see today, such as GPT-3 and GPT-4.

The Technical Scheme

Much like other industrial and technological advancements we see today, the fundamental factor is the mathematical representation of the problem. Many engineering achievements, for example, rest on sophisticated mathematical models that made various challenges tractable. The field of AI research was similarly founded on mathematical models built around the concept of optimization.

To grasp this concept, one must understand the mathematical principles of polynomials, matrix algebra, and the notion of derivatives in calculus. A fundamental AI problem can be exemplified by Equation-1 below:

y = αX + β (Equation-1)

In this equation, y depends on two critical coefficients, α and β, while X is a variable that can take any value. Our goal is to determine the values of α and β that yield the best possible prediction of y.

Here’s an interesting point: X can be a very large matrix with millions, if not billions, of rows, where each row represents an experience or case. Additionally, the matrix can have millions or billions of columns, with each column representing a feature. On the right-hand side of this equation, we have a vector Y, which has the same number of rows, with each y value corresponding to a specific row. So, what is the task of AI? The task is to use this matrix, referred to as historical data, to find the optimal vectors of α and the corresponding β that create a model capable of predicting the value of a new y, given the vector values of X. The graph below illustrates this concept from a mathematical perspective.
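To make this concrete, here is a minimal sketch in pure Python, assuming the linear form y = αX + β described above. All data values and parameter values here are illustrative inventions, not taken from any real AI system; they only show how a matrix of historical rows, a coefficient vector α, and an intercept β combine to score a new case.

```python
# Historical data: each row is one experience (case), each column a feature.
X = [
    [1.0, 2.0],
    [2.0, 0.5],
    [3.0, 1.5],
]

# Illustrative parameters: one coefficient per feature, plus an intercept.
alpha = [2.0, 1.0]   # coefficient vector (one alpha per column of X)
beta = 0.5           # intercept

def predict(row, alpha, beta):
    """The model of Equation-1: y = alpha . x + beta (illustrative linear form)."""
    return sum(a * x for a, x in zip(alpha, row)) + beta

# Predictions for the historical rows:
y = [predict(row, alpha, beta) for row in X]
print(y)  # [4.5, 5.0, 8.0]

# A new, unseen case is scored with the same alpha and beta:
print(predict([4.0, 1.0], alpha, beta))  # 9.5
```

In practice X has millions or billions of rows and columns, so the same arithmetic is carried out with optimized matrix libraries on GPUs rather than Python loops, but the structure of the computation is identical.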

During this quest for the optimal solution, multiple iterations are carried out to minimize the error. The error in question pertains to the proposed vectors of α and the corresponding β. At each iteration, we use the knowledge of derivatives from calculus to compute the error under the current assumptions. This allows us to update the values iteratively until we arrive at the ideal vectors of α and β. Before concluding this section, a few important points are worth mentioning:

  1. It is essential to understand that the matrix X represents all the information one can gather about a specific problem. For example, if the problem involves object detection, you can envision this matrix as a large collection of images, each corresponding to a specific object we aim to predict.
  2. Creating the matrix X demands substantial resources, effort, and professional expertise. It involves converting all relevant information, at least in digital form, into a numerical representation. This process is crucial for accurately capturing the data necessary for the specific problem at hand.

Finally, the mathematical operation to determine the vectors α and β through numerous iterations of derivative calculations requires substantial computational power. Securing these resources typically involves utilizing high-performance computing systems, often found in data centers equipped with powerful processors and GPUs. Access to these advanced computational resources is crucial for effectively performing these intensive calculations.
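The iterative, derivative-driven search described above can be sketched as plain gradient descent on a squared-error objective. This is a toy illustration under assumed conditions (synthetic noiseless data, a hand-picked learning rate, a single feature), not the training recipe of DeepSeek or any other real model:

```python
import random

random.seed(0)

# Synthetic "historical data": y generated from a hidden alpha=3.0, beta=-1.0.
true_alpha, true_beta = 3.0, -1.0
X = [random.uniform(-1, 1) for _ in range(200)]
Y = [true_alpha * x + true_beta for x in X]

# Start from an arbitrary guess and descend the mean squared error.
alpha, beta = 0.0, 0.0
lr = 0.1  # learning rate (step size)
n = len(X)

for _ in range(500):
    # Derivatives of the mean squared error with respect to alpha and beta.
    grad_alpha = sum(2 * (alpha * x + beta - y) * x for x, y in zip(X, Y)) / n
    grad_beta  = sum(2 * (alpha * x + beta - y)     for x, y in zip(X, Y)) / n
    # Step against the gradient to reduce the error.
    alpha -= lr * grad_alpha
    beta  -= lr * grad_beta

print(round(alpha, 3), round(beta, 3))  # recovers values close to 3.0 and -1.0
```

Scaling this same loop to billions of rows and parameters is precisely what demands the high-performance data centers and GPUs mentioned above; the mathematics does not change, only the scale.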

The Tech Battle

As the mathematics of AI matured, businesses discovered a profitable way to turn it into a business model. At some point, the realm of data, processing, and computation became a lucrative area for many investors. With substantial promotion and investment, some began to present AI research in an almost magical light. This is where terms like neural networks, deep learning, and other buzzwords started to emerge.

Unfortunately, this led many to believe that AI was simply about mysterious machine capabilities. In reality, all these buzzwords address the same fundamental problem mentioned earlier in the technical scheme. The only difference lies in the mathematical methods used to determine the vectors of α and β.
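As a sketch of that claim, here is a hypothetical two-layer "neural network" with invented parameter values (not any production architecture): it is simply the α-weighted-sum-plus-β building block from the technical scheme, applied twice with a nonlinearity in between, and its coefficients are found by the same derivative-driven iteration described earlier.

```python
import math

def layer(inputs, alphas, betas):
    """One layer: each output is an alpha-weighted sum of the inputs plus a beta."""
    return [sum(a * x for a, x in zip(row, inputs)) + b
            for row, b in zip(alphas, betas)]

def sigmoid(values):
    """A standard squashing nonlinearity applied between layers."""
    return [1 / (1 + math.exp(-v)) for v in values]

# Hypothetical fixed parameters for a 2-input, 2-hidden, 1-output network.
hidden_alphas = [[1.0, -1.0], [0.5, 0.5]]
hidden_betas = [0.0, 0.1]
out_alphas = [[2.0, -2.0]]
out_betas = [0.25]

x = [0.3, 0.7]
hidden = sigmoid(layer(x, hidden_alphas, hidden_betas))
y = layer(hidden, out_alphas, out_betas)
print(y)  # a single output value
```

Strip away the terminology and what remains is the same optimization problem: choose the α and β values, layer by layer, that minimize the prediction error.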

From the outset, the USA has been a leading figure in the AI field, both scientifically and commercially. The architecture of AI has proven to be an exceptional method for solving various problems. Additionally, the industry for powerful computational chips attracted substantial investment. This shifted the focus to the infrastructure and capacity needed to develop such AI models.

Many educational institutions have also contributed to the narrative, asserting that building this capacity requires enormous effort, time, and computational power. However, the reality was a drive for market dominance and competition-free control. Given the hype, many have been convinced to invest in this line of business.  

Consequently, educational institutions began promoting products such as software, platforms, and tools, suggesting that mastering these would make one an AI creator, when in fact it makes one only an AI user. This approach often graduates the next generation with little understanding of the mathematical foundations of AI, focusing instead on tool and framework proficiency.

This trend is also evident in job advertisements, where companies seek candidates proficient in specific tools and frameworks, as if the ability to reason and think critically has become less valued. Naturally, the companies that develop these tools and software retain a dominant position and have the final say, and most of them are based in Silicon Valley. Around the world, many companies simply gave up the driver's seat in leading AI from computational and applied perspectives. In fact, they chose to use whatever AI solutions those giant tech companies made available; whoever tried to lead a revolution and be independent, one way or another, ended up out of business or completely crushed.

The Geopolitical Battle

U.S. politics has leveraged this hype over the years to promote the concept. Meanwhile, China has taken a different approach, leading the way in massive production capacity. It has used its manpower and vast land to expand on the concept, delving deep into AI architecture. With years of meticulous planning, China has demonstrated to the West and the world that things can be accomplished efficiently with a well-thought-out plan.

For years, U.S. Wall Street marketing campaigns have sought investments to fund AI solutions, with successive governments supporting the endeavor. Public knowledge and trends suggested that building a powerful AI model required a massive budget. Until recently, companies claimed that developing a robust AI model necessitated a nine-figure budget, hundreds of millions of dollars or more.

However, on January 20, 2025, the landscape changed dramatically. A small IT company, founded in 2023 in China with fewer than 200 employees, introduced a highly sophisticated, sleek, open-source LLM to the market. The company claimed the model was developed for under $6 million, with efficient use of power and energy. The news sent shockwaves through the U.S. marketplace, with companies like NVIDIA reportedly losing around $600 billion in market value in a single day. This release demonstrated China's ability to challenge the U.S. and its approach to global dominance, showing that competition is indeed possible.

Conclusion

The incident now dominates the news, but unfortunately, the scientific community is looking in the wrong direction and not drawing the proper lesson. News outlets are preoccupied with questions like how it happened and what kind of chips were used, when in fact the reality is quite different. What this small company achieved, despite the strict export controls the U.S. imposed on advanced semiconductor technology to limit China's access to critical components, was mastering the mathematics behind AI. Our conclusion is clear: it is all about solving the AI mathematical problem. Once one understands that, they can almost certainly use the machines available to them to complete the job effectively. Conversely, if someone is given all the ultimate computational power without a solid grasp of the underlying AI mathematics, the result will merely be a display of figures and a showcase business, rather than real progress.

