This blog is where I organize the new technologies and information I come across while working as a developer in the field. I have been fortunate to work as a consultant on projects for large companies in the United States, so I get many opportunities to encounter new technologies. I would like to share information about the tools used in US IT projects with as many people as possible.
솔웅


Jan 8 2024 OpenAI and journalism

2024. 1. 12. 09:24 | Posted by 솔웅



https://openai.com/blog/openai-and-journalism

 

OpenAI and journalism

We support journalism, partner with news organizations, and believe The New York Times lawsuit is without merit.


Our goal is to develop AI tools that empower people to solve problems that are otherwise out of reach. People worldwide are already using our technology to improve their daily lives. Millions of developers and more than 92% of Fortune 500 are building on our products today.

 


While we disagree with the claims in The New York Times lawsuit, we view it as an opportunity to clarify our business, our intent, and how we build our technology. Our position can be summed up in these four points, which we flesh out below:

 


  1. We collaborate with news organizations and are creating new opportunities

  2. Training is fair use, but we provide an opt-out because it’s the right thing to do

  3. “Regurgitation” is a rare bug that we are working to drive to zero

  4. The New York Times is not telling the full story

1. We collaborate with news organizations and are creating new opportunities

 


We work hard in our technology design process to support news organizations. We’ve met with dozens, as well as leading industry organizations like the News/Media Alliance, to explore opportunities, discuss their concerns, and provide solutions. We aim to learn, educate, listen to feedback, and adapt.

 


Our goals are to support a healthy news ecosystem, be a good partner, and create mutually beneficial opportunities. With this in mind, we have pursued partnerships with news organizations to achieve these objectives:

 


  1. Deploy our products to benefit and support reporters and editors, by assisting with time-consuming tasks like analyzing voluminous public records and translating stories.

  2. Teach our AI models about the world by training on additional historical, non-publicly available content.

  3. Display real-time content with attribution in ChatGPT, providing new ways for news publishers to connect with readers.

Our early partnerships with the Associated Press, Axel Springer, American Journalism Project and NYU offer a glimpse into our approach.

 


2. Training is fair use, but we provide an opt-out because it’s the right thing to do

 


Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.

 


The principle that training AI models is permitted as a fair use is supported by a wide range of academics, library associations, civil society groups, startups, leading US companies, creators, authors, and others that recently submitted comments to the US Copyright Office. Other regions and countries, including the European Union, Japan, Singapore, and Israel also have laws that permit training models on copyrighted content—an advantage for AI innovation, advancement, and investment.

 


That being said, legal right is less important to us than being good citizens. We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.
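The opt-out described here operates at the crawler level: OpenAI documents a crawler user-agent (GPTBot) that publishers can disallow in their site's robots.txt. As a minimal sketch, a publisher blocking that crawler site-wide would serve something like:

```text
# robots.txt — block OpenAI's GPTBot crawler from the entire site
User-agent: GPTBot
Disallow: /
```

This uses the standard Robots Exclusion Protocol; the exact user-agent tokens OpenAI honors are listed in its crawler documentation, and a publisher can also disallow only specific paths rather than the whole site.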

 


3. “Regurgitation” is a rare bug that we are working to drive to zero

 


Our models were designed and trained to learn concepts in order to apply them to new problems.

 


Memorization is a rare failure of the learning process that we are continually making progress on, but it’s more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites. So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.

 


Just as humans obtain a broad education to learn how to solve new problems, we want our AI models to observe the range of the world’s information, including from every language, culture, and industry. Because models learn from the enormous aggregate of human knowledge, any one sector—including news—is a tiny slice of overall training data, and any single data source—including The New York Times—is not significant for the model’s intended learning.

 


4. The New York Times is not telling the full story

 


Our discussions with The New York Times had appeared to be progressing constructively through our last communication on December 19. The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting. We had explained to The New York Times that, like any single source, their content didn't meaningfully contribute to the training of our existing models and also wouldn't be sufficiently impactful for future training. Their lawsuit on December 27—which we learned about by reading The New York Times—came as a surprise and disappointment to us.

 


Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues. We’ve demonstrated how seriously we treat this as a priority, such as in July when we took down a ChatGPT feature immediately after we learned it could reproduce real-time content in unintended ways.

 


Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

 


Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.

 


We regard The New York Times’ lawsuit to be without merit. Still, we are hopeful for a constructive partnership with The New York Times and respect its long history, which includes reporting the first working neural network over 60 years ago and championing First Amendment freedoms.

 


We look forward to continued collaboration with news organizations, helping elevate their ability to produce quality journalism by realizing the transformative potential of AI.

 


https://huggingface.co/learn/nlp-course/chapter4/6?fw=pt

 


End-of-chapter quiz

https://huggingface.co/learn/nlp-course/chapter4/5?fw=pt

 


Part 1 completed!

 

This is the end of the first part of the course! Part 2 will be released on November 15th with a big community event, see more information here.

 


You should now be able to fine-tune a pretrained model on a text classification problem (single or pairs of sentences) and upload the result to the Model Hub. To make sure you mastered this first section, you should do exactly that on a problem that interests you (and not necessarily in English if you speak another language)! You can find help in the Hugging Face forums and share your project in this topic once you’re finished.

 


We can’t wait to see what you will build with this!

 
